Real-time speaker localization and speech separation by audio-visual integration

Kazuhiro Nakadai, Ken Ichi Hidai, Hiroshi G. Okuno, Hiroaki Kitano

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

31 Citations (Scopus)

Abstract

Robot audition in the real world must cope with motor and other noises caused by the robot's own movements, in addition to environmental noise and reverberation. This paper reports how auditory processing is improved by audio-visual integration with active movements. The key idea is the hierarchical integration of auditory and visual streams to disambiguate auditory or visual processing. The system runs in real time by using distributed processing on four PCs connected by Gigabit Ethernet. The system, implemented in an upper-torso humanoid, tracks multiple talkers and extracts speech from a mixture of sounds. The performance of epipolar-geometry-based sound source localization and of sound source separation by active and adaptive direction-pass filtering is also reported.
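The abstract's "direction-pass filtering" refers to separating a source by keeping only the time-frequency components consistent with a target direction. As a rough illustration (not the authors' implementation, which uses the humanoid's epipolar geometry and active head movements), the sketch below assumes a simple two-microphone setup: it keeps STFT bins whose interaural phase difference (IPD) matches the IPD expected for a chosen inter-channel delay, and masks the rest. All names, parameters, and the tolerance value are illustrative assumptions.

```python
import numpy as np

def direction_pass_filter(left, right, sr, target_itd, tol=0.2, n_fft=512):
    """Hypothetical direction-pass filter sketch: pass only the
    time-frequency bins whose observed interaural phase difference
    matches the IPD an ideal source with delay `target_itd` (seconds,
    right channel lagging) would produce."""
    hop = n_fft // 2
    win = np.hanning(n_fft)
    freqs = np.fft.rfftfreq(n_fft, 1.0 / sr)
    out = np.zeros(len(left))
    n_frames = (len(left) - n_fft) // hop + 1
    for t in range(n_frames):
        s = t * hop
        L = np.fft.rfft(win * left[s:s + n_fft])
        R = np.fft.rfft(win * right[s:s + n_fft])
        # Observed IPD per frequency bin vs. the IPD expected for target_itd.
        ipd = np.angle(L * np.conj(R))
        expected = 2 * np.pi * freqs * target_itd
        # Wrap the mismatch to [-pi, pi] and build a binary pass mask.
        diff = np.angle(np.exp(1j * (ipd - expected)))
        mask = np.abs(diff) < tol
        # Overlap-add resynthesis of the masked left channel.
        out[s:s + n_fft] += np.fft.irfft(L * mask, n_fft) * win
    return out
```

For example, a sinusoid delayed by a few samples in the right channel passes through when `target_itd` matches that delay and is strongly attenuated when it does not, which is the basic mechanism a direction-pass filter exploits.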

Original language: English
Title of host publication: Proceedings - IEEE International Conference on Robotics and Automation
Pages: 1043-1049
Number of pages: 7
Volume: 1
Publication status: Published - 2002
Externally published: Yes
Event: 2002 IEEE International Conference on Robotics and Automation - Washington, DC, United States
Duration: 2002 May 11 - 2002 May 15


Keywords

  • Audio-visual integration
  • Multiple speaker tracking
  • Robot audition
  • Sound source localization
  • Sound source separation

ASJC Scopus subject areas

  • Software
  • Control and Systems Engineering

