Effects of modelling within- and between-frame temporal variations in power spectra on non-verbal sound recognition

Nobuhide Yamakawa*, Tetsuro Kitahara, Toru Takahashi, Kazunori Komatani, Tetsuya Ogata, Hiroshi G. Okuno

*Corresponding author for this work

Research output

12 Citations (Scopus)

Abstract

Research on environmental sound recognition has advanced less than research on speech and musical signals. One reason is that the category of environmental sounds covers a broad range of acoustic characteristics. We classified these sounds in order to explore suitable recognition techniques for each characteristic. We focus on impulsive sounds and their non-stationary behavior within and between analysis frames. We used matching pursuit as a framework for applying wavelet analysis to extract the temporal variation of audio features inside a frame. We also investigated the validity of modeling the decaying patterns of sounds using hidden Markov models (HMMs). Experimental results indicate that sounds with multiple impulsive signals are recognized better with time-frequency analysis bases than with frequency-domain analysis. Classification of sound classes with a long and clear decaying pattern improves when HMMs with multiple hidden states are applied.
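
The within-frame analysis mentioned in the abstract relies on matching pursuit. The following is a minimal sketch of the generic greedy algorithm only, not the authors' implementation. The atom shape (a Gaussian-windowed cosine), the dictionary size, and the names gabor_atom, build_dictionary and matching_pursuit are all assumptions made for illustration, since the abstract does not specify the dictionary or its parameters.

```python
import math


def unit_norm(vec):
    """Scale a vector to unit Euclidean norm."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def gabor_atom(length, center, width, freq):
    """Gaussian-windowed cosine (assumed atom shape), localized in time and frequency."""
    return unit_norm([
        math.exp(-0.5 * ((n - center) / width) ** 2)
        * math.cos(2.0 * math.pi * freq * (n - center))
        for n in range(length)
    ])


def build_dictionary(length):
    """Hypothetical small dictionary: 3 time positions x 3 normalized frequencies."""
    return [
        gabor_atom(length, center, 4.0, freq)
        for center in (8, 24, 40)
        for freq in (0.05, 0.15, 0.3)
    ]


def matching_pursuit(signal, dictionary, n_iter):
    """Repeatedly pick the atom most correlated with the residual and subtract it."""
    residual = list(signal)
    picks = []  # (atom index, coefficient) pairs, in order of selection
    for _ in range(n_iter):
        # Inner product of the current residual with every unit-norm atom.
        coefs = [sum(r * a for r, a in zip(residual, atom)) for atom in dictionary]
        best = max(range(len(dictionary)), key=lambda i: abs(coefs[i]))
        picks.append((best, coefs[best]))
        # Remove the selected atom's contribution from the residual.
        residual = [r - coefs[best] * a for r, a in zip(residual, dictionary[best])]
    return picks, residual
```

For a frame that is exactly 2.0 times one dictionary atom, the first iteration selects that atom with a coefficient close to 2.0 and leaves a residual near zero. The selected (index, coefficient) pairs record where in the frame, and at which frequency, the energy sits, which is the kind of within-frame temporal information a purely frequency-domain analysis discards.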

Original language: English
Title of host publication: Proceedings of the 11th Annual Conference of the International Speech Communication Association, INTERSPEECH 2010
Publisher: International Speech Communication Association
Pages: 2342-2345
Number of pages: 4
Publication status: Published - 2010
Externally published: Yes

Publication series

Name: Proceedings of the 11th Annual Conference of the International Speech Communication Association, INTERSPEECH 2010

ASJC Scopus subject areas

  • Language and Linguistics
  • Speech and Hearing
