Drum sound recognition for polyphonic audio signals by adaptation and matching of spectrogram templates with harmonic structure suppression

Kazuyoshi Yoshii, Masataka Goto, Hiroshi G. Okuno

Research output: Contribution to journal › Article

41 Citations (Scopus)

Abstract

This paper describes a system that detects onsets of the bass drum, snare drum, and hi-hat cymbals in polyphonic audio signals of popular songs. Our system is based on a template-matching method that uses power spectrograms of drum sounds as templates. This method calculates the distance between a template and each spectrogram segment extracted from a song spectrogram, using Goto's distance measure originally designed to detect the onsets in drums-only signals. However, there are two main problems. The first problem is that appropriate templates are unknown for each song. The second problem is that it is more difficult to detect drum-sound onsets in sound mixtures including various sounds other than drum sounds. To solve these problems, we propose template-adaptation and harmonic-structure-suppression methods. First of all, an initial template of each drum sound, called a seed template, is prepared. The former method adapts it to actual drum-sound spectrograms appearing in the song spectrogram. To make our system robust to the overlapping of harmonic sounds with drum sounds, the latter method suppresses harmonic components in the song spectrogram before the adaptation and matching. Experimental results with 70 popular songs showed that our template-adaptation and harmonic-structure-suppression methods improved the recognition accuracy and achieved 83%, 58%, and 46% in detecting onsets of the bass drum, snare drum, and hi-hat cymbals, respectively.
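The template-matching and template-adaptation steps described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation, and it rests on several assumptions. First, it uses a plain mean-squared-difference distance as a stand-in for Goto's distance measure, which the abstract names but does not define. Second, the adaptation rule replaces the template with the bin-wise median of the best-matching song segments; this rule is assumed here for illustration and may differ from the paper's procedure. Third, all function names are hypothetical. Harmonic-structure suppression is not modeled.

```python
from statistics import median

# A spectrogram is a list of frames; each frame is a list of per-bin power values.
Spectrogram = list[list[float]]


def segment_distance(template: Spectrogram, song: Spectrogram, start: int) -> float:
    """Mean squared difference between the template and the song segment
    song[start:start + len(template)].

    Stand-in measure for illustration only; this is NOT Goto's distance measure.
    """
    total = 0.0
    count = 0
    for t_frame, s_frame in zip(template, song[start:start + len(template)]):
        for t_val, s_val in zip(t_frame, s_frame):
            total += (t_val - s_val) ** 2
            count += 1
    return total / count


def distance_curve(template: Spectrogram, song: Spectrogram) -> list[float]:
    """Distance for every start frame at which the whole template fits in the song."""
    last_start = len(song) - len(template)
    return [segment_distance(template, song, s) for s in range(last_start + 1)]


def local_minima(d: list[float]) -> list[int]:
    """Indices where the distance curve has a local minimum (one index per plateau)."""
    minima = []
    for i, value in enumerate(d):
        left = d[i - 1] if i > 0 else float("inf")
        right = d[i + 1] if i + 1 < len(d) else float("inf")
        if value <= left and value < right:
            minima.append(i)
    return minima


def detect_onsets(template: Spectrogram, song: Spectrogram, threshold: float) -> list[int]:
    """Start frames whose distance is a local minimum below `threshold`."""
    d = distance_curve(template, song)
    return [i for i in local_minima(d) if d[i] < threshold]


def adapt_template(seed: Spectrogram, song: Spectrogram, n_best: int,
                   n_iterations: int = 3) -> Spectrogram:
    """Hypothetical adaptation rule: repeatedly replace the template with the
    bin-wise median of the n_best closest segments (taken at distance minima)."""
    template = seed
    n_frames, n_bins = len(seed), len(seed[0])
    for _ in range(n_iterations):
        d = distance_curve(template, song)
        best = sorted(local_minima(d), key=lambda s: d[s])[:n_best]
        if not best:  # nothing to adapt to; keep the current template
            break
        template = [
            [median(song[s + f][b] for s in best) for b in range(n_bins)]
            for f in range(n_frames)
        ]
    return template
```

For example, a two-frame seed template `[[4.0, 2.0], [2.0, 1.0]]` matched against a ten-frame toy song that contains the slightly different hit `[[5.0, 3.0], [3.0, 1.0]]` at frames 2 and 6 yields onsets `[2, 6]` with a threshold of 2.0. After adaptation, the template equals the actual hit. Taking the median over several matched segments is meant to keep a single segment that overlaps other sounds from dominating the adapted template.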

Original language: English
Article number: 4032798
Pages (from-to): 333-345
Number of pages: 13
Journal: IEEE Transactions on Audio, Speech, and Language Processing
Volume: 15
Issue number: 1
DOI: 10.1109/TASL.2006.876754
Publication status: Published (January 2007)
Externally published: Yes

Keywords

  • Drum sound recognition
  • Harmonic structure suppression
  • Polyphonic audio signal
  • Spectrogram template
  • Template adaptation
  • Template matching

ASJC Scopus subject areas

  • Electrical and Electronic Engineering
  • Acoustics and Ultrasonics

Cite this

@article{5178154b3d6746eb951eadcff4fef348,
title = "Drum sound recognition for polyphonic audio signals by adaptation and matching of spectrogram templates with harmonic structure suppression",
abstract = "This paper describes a system that detects onsets of the bass drum, snare drum, and hi-hat cymbals in polyphonic audio signals of popular songs. Our system is based on a template-matching method that uses power spectrograms of drum sounds as templates. This method calculates the distance between a template and each spectrogram segment extracted from a song spectrogram, using Goto's distance measure originally designed to detect the onsets in drums-only signals. However, there are two main problems. The first problem is that appropriate templates are unknown for each song. The second problem is that it is more difficult to detect drum-sound onsets in sound mixtures including various sounds other than drum sounds. To solve these problems, we propose template-adaptation and harmonic-structure-suppression methods. First of all, an initial template of each drum sound, called a seed template, is prepared. The former method adapts it to actual drum-sound spectrograms appearing in the song spectrogram. To make our system robust to the overlapping of harmonic sounds with drum sounds, the latter method suppresses harmonic components in the song spectrogram before the adaptation and matching. Experimental results with 70 popular songs showed that our template-adaptation and harmonic-structure-suppression methods improved the recognition accuracy and achieved 83{\%}, 58{\%}, and 46{\%} in detecting onsets of the bass drum, snare drum, and hi-hat cymbals, respectively.",
keywords = "Drum sound recognition, Harmonic structure suppression, Polyphonic audio signal, Spectrogram template, Template adaptation, Template matching",
author = "Yoshii, Kazuyoshi and Goto, Masataka and Okuno, {Hiroshi G.}",
year = "2007",
month = jan,
doi = "10.1109/TASL.2006.876754",
language = "English",
volume = "15",
pages = "333--345",
journal = "IEEE Transactions on Audio, Speech, and Language Processing",
issn = "1558-7916",
publisher = "Institute of Electrical and Electronics Engineers Inc.",
number = "1",

}

TY  - JOUR
T1  - Drum sound recognition for polyphonic audio signals by adaptation and matching of spectrogram templates with harmonic structure suppression
AU  - Yoshii, Kazuyoshi
AU  - Goto, Masataka
AU  - Okuno, Hiroshi G.
PY  - 2007/1
Y1  - 2007/1
AB  - This paper describes a system that detects onsets of the bass drum, snare drum, and hi-hat cymbals in polyphonic audio signals of popular songs. Our system is based on a template-matching method that uses power spectrograms of drum sounds as templates. This method calculates the distance between a template and each spectrogram segment extracted from a song spectrogram, using Goto's distance measure originally designed to detect the onsets in drums-only signals. However, there are two main problems. The first problem is that appropriate templates are unknown for each song. The second problem is that it is more difficult to detect drum-sound onsets in sound mixtures including various sounds other than drum sounds. To solve these problems, we propose template-adaptation and harmonic-structure-suppression methods. First of all, an initial template of each drum sound, called a seed template, is prepared. The former method adapts it to actual drum-sound spectrograms appearing in the song spectrogram. To make our system robust to the overlapping of harmonic sounds with drum sounds, the latter method suppresses harmonic components in the song spectrogram before the adaptation and matching. Experimental results with 70 popular songs showed that our template-adaptation and harmonic-structure-suppression methods improved the recognition accuracy and achieved 83%, 58%, and 46% in detecting onsets of the bass drum, snare drum, and hi-hat cymbals, respectively.
KW  - Drum sound recognition
KW  - Harmonic structure suppression
KW  - Polyphonic audio signal
KW  - Spectrogram template
KW  - Template adaptation
KW  - Template matching
UR  - http://www.scopus.com/inward/record.url?scp=34547541093&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=34547541093&partnerID=8YFLogxK
U2  - 10.1109/TASL.2006.876754
DO  - 10.1109/TASL.2006.876754
M3  - Article
AN  - SCOPUS:34547541093
VL  - 15
SP  - 333
EP  - 345
JO  - IEEE Transactions on Audio, Speech, and Language Processing
JF  - IEEE Transactions on Audio, Speech, and Language Processing
SN  - 1558-7916
IS  - 1
M1  - 4032798
ER  - 