Effects of modelling within- and between-frame temporal variations in power spectra on non-verbal sound recognition

Nobuhide Yamakawa, Tetsuro Kitahara, Toru Takahashi, Kazunori Komatani, Tetsuya Ogata, Hiroshi G. Okuno

Research output: Paper

10 citations (Scopus)

Abstract

Research on environmental sound recognition has not advanced as far as research on speech and music signals. One reason is that the category of environmental sounds covers a broad range of acoustic characteristics. We classified these sounds by their characteristics in order to explore recognition techniques suited to each. We focus on impulsive sounds and their non-stationary features within and between analysis frames. We used matching pursuit as a framework for applying wavelet analysis to extract the temporal variation of audio features within a frame. We also investigated the validity of modeling the decay patterns of sounds with hidden Markov models (HMMs). Experimental results indicate that sounds containing multiple impulsive signals are recognized better with time-frequency analysis bases than with frequency-domain analysis. Classification of sound classes with a long, clear decay pattern improves when HMMs with multiple hidden states are applied.
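The abstract does not specify the dictionary, frame length, or features used. As a rough illustration of how matching pursuit can expose temporal variation inside a single analysis frame, the sketch below greedily decomposes one frame over a dictionary of Gabor atoms (Gaussian-windowed cosines) at different positions, frequencies, and widths. The Gabor dictionary, the parameter grid, the toy frame, and the helper names `gabor_atom`, `build_dictionary`, and `matching_pursuit` are hypothetical choices made for this example; they are not the paper's wavelet bases or implementation.

```python
import numpy as np


def gabor_atom(n, center, freq, width):
    """Gaussian-windowed cosine of length n, normalised to unit L2 norm.

    Hypothetical stand-in for the paper's wavelet bases.
    freq is in cycles per sample; center and width are in samples.
    """
    t = np.arange(n)
    envelope = np.exp(-0.5 * ((t - center) / width) ** 2)
    atom = envelope * np.cos(2 * np.pi * freq * (t - center))
    return atom / np.linalg.norm(atom)


def build_dictionary(n, centers, freqs, widths):
    """Stack every (center, freq, width) combination as one row."""
    params = [(c, f, w) for c in centers for f in freqs for w in widths]
    atoms = np.array([gabor_atom(n, c, f, w) for c, f, w in params])
    return atoms, params


def matching_pursuit(frame, atoms, n_iter):
    """Greedy matching pursuit: repeatedly pick the atom most correlated
    with the residual and subtract its projection."""
    residual = np.asarray(frame, dtype=float).copy()
    picks = []
    for _ in range(n_iter):
        corr = atoms @ residual            # inner product with every unit-norm atom
        k = int(np.argmax(np.abs(corr)))   # index of the best-matching atom
        picks.append((k, corr[k]))
        residual -= corr[k] * atoms[k]     # remove that component from the residual
    return picks, residual


# Toy frame: two short bursts at different positions inside one 256-sample frame.
N = 256
atoms, params = build_dictionary(N, centers=range(0, N, 16),
                                 freqs=[0.05, 0.1, 0.2], widths=[4, 16])
frame = 3.0 * gabor_atom(N, 64, 0.1, 4) + 2.0 * gabor_atom(N, 192, 0.2, 4)

picks, residual = matching_pursuit(frame, atoms, n_iter=2)
for k, coef in picks:
    center, freq, width = params[k]
    print(f"center={center} freq={freq} width={width} coef={coef:.3f}")
```

In this toy case the first pick should be the atom centred at sample 64 with a coefficient close to 3, and the second the atom at sample 192 with a coefficient close to 2. The centre positions of the selected atoms record where energy occurs within the frame, which is information a single frame-level power spectrum discards.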

Original language: English
Pages: 2342-2345
Number of pages: 4
Publication status: Published - 2010 Dec 1
Externally published: Yes
Event: 11th Annual Conference of the International Speech Communication Association: Spoken Language Processing for All, INTERSPEECH 2010 - Makuhari, Chiba, Japan
Duration: 2010 Sep 26 - 2010 Sep 30

Conference

Conference: 11th Annual Conference of the International Speech Communication Association: Spoken Language Processing for All, INTERSPEECH 2010
Country: Japan
City: Makuhari, Chiba
Period: 10/9/26 - 10/9/30

ASJC Scopus subject areas

  • Language and Linguistics
  • Speech and Hearing


Cite this

Yamakawa, N., Kitahara, T., Takahashi, T., Komatani, K., Ogata, T., & Okuno, H. G. (2010). Effects of modelling within- and between-frame temporal variations in power spectra on non-verbal sound recognition. 2342-2345. Paper presented at 11th Annual Conference of the International Speech Communication Association: Spoken Language Processing for All, INTERSPEECH 2010, Makuhari, Chiba, Japan.