Effects of increasing modalities in recognizing three simultaneous speeches

Hiroshi G. Okuno*, Kazuhiro Nakadai, Hiroaki Kitano

*Corresponding author for this work

Research output: Article (peer-reviewed)

Abstract

One of the essential problems of auditory processing in noisy real-world environments is that the number of sound sources is greater than the number of microphones. To model this situation, we attempt to separate three simultaneous speeches using two microphones. This problem is difficult because well-known microphone-array techniques, such as null forming, beamforming, and independent component analysis (ICA), require three or more microphones in practice. This paper reports the effects of increasing modalities in recognizing three simultaneous speeches with two microphones. We investigate four cases: monaural (one microphone), binaural (a pair of microphones embedded in a dummy head), binaural with ICA, and binaural with vision (two dummy head microphones and two cameras). The fourth method is called the "Direction-Pass Filter" (DPF), which separates sound sources originating from a specific direction given by auditory and/or visual processing. The direction of each auditory frequency component is determined using the Head-Related Transfer Function (HRTF) of the dummy head, so the DPF makes no assumption about the number of sound sources. With 200 benchmarks of three simultaneous utterances of Japanese words, the quality of each separated speech is evaluated by an automatic speech recognition system. Word recognition performance for three simultaneous speeches improves as modalities are added, progressing from monaural through binaural and binaural with ICA to binaural with vision. The average 1-best and 10-best recognition rates of separated speeches attained by the Direction-Pass Filter are 60% and 81%, respectively.
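
The idea of a direction-pass filter can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes a simple free-field delay model in place of the measured HRTF of the dummy head, and it uses only the interaural phase difference (IPD) of each frequency bin. The constants `SOUND_SPEED` and `MIC_DISTANCE`, the angle convention, and all function names are hypothetical choices made for illustration.

```python
import cmath
import math

SOUND_SPEED = 343.0   # m/s; assumed speed of sound in air
MIC_DISTANCE = 0.18   # m; hypothetical spacing between the two microphones


def expected_ipd(freq_hz, direction_deg):
    """Free-field interaural phase difference (radians, left minus right).

    Stand-in for an HRTF lookup. Convention assumed here: positive angles
    point toward the left microphone, so the right channel lags the left.
    """
    delay = MIC_DISTANCE * math.sin(math.radians(direction_deg)) / SOUND_SPEED
    return 2.0 * math.pi * freq_hz * delay


def wrap_phase(phi):
    """Wrap an angle in radians into [-pi, pi)."""
    return (phi + math.pi) % (2.0 * math.pi) - math.pi


def direction_pass_filter(left, right, freqs, target_deg, tolerance_deg=10.0):
    """Keep the left-channel bins whose IPD matches target_deg; zero the rest."""
    output = []
    for l_bin, r_bin, freq in zip(left, right, freqs):
        # Observed phase of the left bin relative to the right bin.
        observed = cmath.phase(l_bin * r_bin.conjugate())
        centre = expected_ipd(freq, target_deg)
        # Pass-band half-width in phase, derived from the angular tolerance.
        half_width = max(
            abs(expected_ipd(freq, target_deg - tolerance_deg) - centre),
            abs(expected_ipd(freq, target_deg + tolerance_deg) - centre),
        )
        matches = abs(wrap_phase(observed - centre)) <= half_width
        output.append(l_bin if matches else 0j)
    return output
```

In this sketch, `left` and `right` would be the complex spectra of one short-time frame from the two microphones, and the filter would be called once per target direction, so nothing in it depends on how many sources are present. With the assumed spacing, the phase difference becomes ambiguous above about 950 Hz (`SOUND_SPEED / (2 * MIC_DISTANCE)`), so a phase-only rule like this one is reliable only at lower frequencies.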

Original language: English
Pages (from-to): 347-359
Number of pages: 13
Journal: Speech Communication
Volume: 43
Issue number: 4 SPEC. ISS.
DOI
Publication status: Published - Sep 2004
Externally published: Yes

ASJC Scopus subject areas

  • Signal Processing
  • Electrical and Electronic Engineering
  • Experimental and Cognitive Psychology
  • Linguistics and Language
