Real-time speaker localization and speech separation by audio-visual integration

Kazuhiro Nakadai*, Ken Ichi Hidai, Hiroshi G. Okuno, Hiroaki Kitano

*Corresponding author for this work

Research output: Conference contribution

37 Citations (Scopus)

Abstract

Robot audition in the real world must cope with motor and other noises caused by the robot's own movements, in addition to environmental noise and reverberation. This paper reports how auditory processing is improved by audio-visual integration with active movements. The key idea is the hierarchical integration of auditory and visual streams to disambiguate auditory or visual processing. The system runs in real time by using distributed processing on four PCs connected by Gigabit Ethernet. Implemented in an upper-torso humanoid, the system tracks multiple talkers and extracts speech from a mixture of sounds. The performance of epipolar-geometry-based sound source localization and of sound source separation by active and adaptive direction-pass filtering is also reported.

Original language: English
Title of host publication: Proceedings - IEEE International Conference on Robotics and Automation
Pages: 1043-1049
Number of pages: 7
1
Publication status: Published - 2002
Externally published: Yes
Event: 2002 IEEE International Conference on Robotics and Automation - Washington, DC, United States
Duration: 11 May 2002 to 15 May 2002

Other

Other: 2002 IEEE International Conference on Robotics and Automation
Country/Territory: United States
City: Washington, DC
Period: 11 May 2002 to 15 May 2002

ASJC Scopus subject areas

  • Software
  • Control and Systems Engineering
