Three simultaneous speech recognition by integration of active audition and face recognition for humanoid

Kazuhiro Nakadai, Daisuke Matsuura, Hiroshi G. Okuno, Hiroshi Tsujino

Research output: Conference contribution

1 Citation (Scopus)

Abstract

This paper addresses listening to three simultaneous talkers by a humanoid with two microphones. In such situations, sound separation and automatic speech recognition (ASR) of the separated speech are difficult because the number of simultaneous talkers exceeds the number of microphones, the signal-to-noise ratio is quite low (around -3 dB), and the noise is not stable owing to interfering voices. The humanoid audition system consists of sound separation, face recognition, and ASR. Sound sources are separated by an active direction-pass filter (ADPF), which extracts sounds from a specified direction in real time. Since the features of sounds separated by the ADPF vary according to the sound direction, ASR uses multiple direction- and speaker-dependent acoustic models. The system integrates the ASR results, using the sound direction, the speaker information obtained by face recognition, and a confidence measure of the ASR results to select the best one. The resulting system improves word recognition rates for three simultaneous utterances.
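
The integration step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not say how direction, speaker identity, and ASR confidence are combined, so the multiplicative score, the Gaussian direction weighting, and all names below (`Hypothesis`, `direction_weight`, `select_best`, `sigma`) are assumptions made only for illustration.

```python
import math
from dataclasses import dataclass


@dataclass
class Hypothesis:
    """One ASR result produced with one direction- and speaker-dependent model."""
    word: str               # recognized word
    model_direction: float  # direction (degrees) the acoustic model targets
    model_speaker: str      # speaker the acoustic model targets
    confidence: float       # ASR confidence measure, assumed to lie in [0, 1]


def direction_weight(model_dir: float, observed_dir: float, sigma: float = 10.0) -> float:
    """Assumed Gaussian weight: models near the observed direction count more."""
    diff = model_dir - observed_dir
    return math.exp(-0.5 * (diff / sigma) ** 2)


def select_best(hyps: list[Hypothesis], observed_dir: float,
                speaker_probs: dict[str, float]) -> Hypothesis:
    """Pick the hypothesis with the highest combined score.

    The score (an assumption, not taken from the paper) multiplies the ASR
    confidence, the direction weight, and the face-recognition probability
    of the speaker that the acoustic model was trained for.
    """
    def score(h: Hypothesis) -> float:
        return (h.confidence
                * direction_weight(h.model_direction, observed_dir)
                * speaker_probs.get(h.model_speaker, 0.0))

    return max(hyps, key=score)


# Hypothetical example: sound localized at 58 degrees, face recognition favors "B".
hyps = [
    Hypothesis("one", 0.0, "A", 0.9),
    Hypothesis("two", 60.0, "B", 0.7),
    Hypothesis("three", 60.0, "A", 0.6),
]
best = select_best(hyps, observed_dir=58.0, speaker_probs={"A": 0.2, "B": 0.8})
```

In this made-up example the hypothesis with the highest raw ASR confidence ("one") is rejected because its acoustic model targets a direction far from the localized source, which shows why direction and speaker cues might be combined with the confidence measure rather than relying on confidence alone.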

Original language: English
Title of host publication: EUROSPEECH 2003 - 8th European Conference on Speech Communication and Technology
Publisher: International Speech Communication Association
Pages: 2705-2708
Number of pages: 4
Publication status: Published - 2003
Externally published: Yes
Event: 8th European Conference on Speech Communication and Technology, EUROSPEECH 2003 - Geneva, Switzerland
Duration: 1 Sep 2003 to 4 Sep 2003

ASJC Scopus subject areas

  • Computer Science Applications
  • Software
  • Linguistics and Language
  • Communication

  • Cite this

    Nakadai, K., Matsuura, D., Okuno, H. G., & Tsujino, H. (2003). Three simultaneous speech recognition by integration of active audition and face recognition for humanoid. In EUROSPEECH 2003 - 8th European Conference on Speech Communication and Technology (pp. 2705-2708). International Speech Communication Association.