Real-time speaker localization and speech separation by audio-visual integration

Kazuhiro Nakadai, Ken Ichi Hidai, Hiroshi G. Okuno, Hiroaki Kitano

Research output: Conference contribution

30 Citations (Scopus)

Abstract

Robot audition in the real world must cope with motor and other noises caused by the robot's own movements, in addition to environmental noise and reverberation. This paper reports how auditory processing is improved by audio-visual integration with active movements. The key idea is the hierarchical integration of auditory and visual streams to disambiguate auditory or visual processing. The system runs in real time using distributed processing on four PCs connected by Gigabit Ethernet. Implemented on an upper-torso humanoid, the system tracks multiple talkers and extracts speech from a mixture of sounds. The performance of epipolar-geometry-based sound source localization and of sound source separation by active and adaptive direction-pass filtering is also reported.
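The direction-pass filtering mentioned in the abstract can be illustrated with a minimal sketch: estimate the interaural phase difference (IPD) between two microphones per frequency bin, and pass only the bins whose IPD matches the expected delay for a target direction. This is an illustrative simplification, not the paper's implementation; the sample rate, microphone spacing, FFT size, and pass-band width below are assumed values.

```python
import numpy as np

SR = 16000    # assumed sample rate (Hz)
D = 0.18      # assumed inter-microphone distance (m)
C = 343.0     # speed of sound (m/s)
NFFT = 512    # assumed FFT size

def expected_ipd(theta_deg, freqs):
    """Expected interaural phase difference for a far-field source at theta."""
    tau = D * np.sin(np.radians(theta_deg)) / C   # inter-microphone time delay
    return 2 * np.pi * freqs * tau

def direction_pass_filter(left, right, theta_deg, width=0.5):
    """Keep only spectral bins whose observed IPD matches the target direction."""
    L = np.fft.rfft(left, NFFT)
    R = np.fft.rfft(right, NFFT)
    freqs = np.fft.rfftfreq(NFFT, 1.0 / SR)
    observed = np.angle(L * np.conj(R))            # observed IPD per bin
    # wrap the phase mismatch to [-pi, pi] before thresholding
    diff = np.angle(np.exp(1j * (observed - expected_ipd(theta_deg, freqs))))
    mask = np.abs(diff) < width                    # pass band around the target direction
    return np.fft.irfft(L * mask, NFFT)
```

For a bin-aligned 500 Hz tone delayed to simulate a source at 30 degrees, the filter passes the tone when aimed at 30 degrees and suppresses it when aimed at -30 degrees; above roughly c/(2D) the IPD aliases, which is one reason the paper combines audition with vision.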

Original language: English
Title of host publication: Proceedings - IEEE International Conference on Robotics and Automation
Pages: 1043-1049
Number of pages: 7
Volume: 1
Publication status: Published - 2002
Externally published: Yes
Event: 2002 IEEE International Conference on Robotics and Automation - Washington, DC, United States
Duration: May 11–15, 2002


ASJC Scopus subject areas

  • Software
  • Control and Systems Engineering


Cite this

Nakadai, K., Hidai, K. I., Okuno, H. G., & Kitano, H. (2002). Real-time speaker localization and speech separation by audio-visual integration. In Proceedings - IEEE International Conference on Robotics and Automation (Vol. 1, pp. 1043-1049).