Robust localization and tracking of multiple speakers in real environments for binaural robot audition

Ui Hyun Kim, Hiroshi G. Okuno

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

This paper presents a multisource sound localization method based on the generalized cross-correlation (GCC) method weighted by the phase transform (PHAT), together with a novel multisource speech tracking method that combines voice activity detection (VAD) with the K-means clustering algorithm for binaural robot audition. The standard K-means clustering algorithm was adapted to multisource speech tracking by adding two steps. Experiments conducted on the SIG-2 humanoid robot in a real environment show that our method can track multiple speakers in real time with a tracking error below 4.35°.
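As a rough illustration of the GCC-PHAT step named in the abstract, the sketch below estimates the time difference of arrival (TDOA) between two microphone signals and converts it to an azimuth. It is not the authors' implementation: the function names `gcc_phat_tdoa` and `tdoa_to_azimuth_deg`, the free-field far-field angle model, the speed of sound, and the microphone spacing are all assumptions made for this example, and the VAD and modified K-means stages are not shown because the abstract does not specify them.

```python
import numpy as np


def gcc_phat_tdoa(sig, ref, fs, max_tau=None):
    """Estimate the delay of `sig` relative to `ref` (seconds) with GCC-PHAT.

    Hypothetical helper for illustration; a positive result means `sig` lags `ref`.
    """
    # Zero-pad so the circular correlation behaves like a linear one.
    n = len(sig) + len(ref)
    SIG = np.fft.rfft(sig, n=n)
    REF = np.fft.rfft(ref, n=n)

    # Cross-power spectrum, whitened by its magnitude (the PHAT weighting).
    R = SIG * np.conj(REF)
    R = R / (np.abs(R) + 1e-12)  # small constant avoids division by zero
    cc = np.fft.irfft(R, n=n)

    # Restrict the search to physically plausible lags if max_tau is given.
    max_shift = n // 2
    if max_tau is not None:
        max_shift = min(int(fs * max_tau), max_shift)

    # Reorder so index 0 corresponds to lag -max_shift.
    cc = np.concatenate((cc[-max_shift:], cc[:max_shift + 1]))
    shift = int(np.argmax(np.abs(cc))) - max_shift
    return shift / float(fs)


def tdoa_to_azimuth_deg(tau, mic_distance, speed_of_sound=343.0):
    """Map a TDOA to an azimuth in degrees.

    Assumes a far-field source and two microphones in free space; a real
    robot head would need a model of diffraction around the head instead.
    """
    x = np.clip(speed_of_sound * tau / mic_distance, -1.0, 1.0)
    return float(np.degrees(np.arcsin(x)))
```

A typical call would pass one frame from each ear, for example `tdoa_to_azimuth_deg(gcc_phat_tdoa(right, left, fs=16000), mic_distance=0.2)`, where the sampling rate and spacing are placeholder values. For multiple speakers, one would look for several peaks in the correlation rather than only the largest one.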

Original language: English
Title of host publication: International Workshop on Image Analysis for Multimedia Interactive Services
DOIs: https://doi.org/10.1109/WIAMIS.2013.6616137
Publication status: Published - 2013
Externally published: Yes
Event: 2013 14th International Workshop on Image Analysis for Multimedia Interactive Services, WIAMIS 2013 - Paris
Duration: 2013 Jul 3 → 2013 Jul 5

Other

Other: 2013 14th International Workshop on Image Analysis for Multimedia Interactive Services, WIAMIS 2013
City: Paris
Period: 13/7/3 → 13/7/5

ASJC Scopus subject areas

  • Computer Graphics and Computer-Aided Design
  • Human-Computer Interaction
  • Software

Cite this

Kim, U. H., & Okuno, H. G. (2013). Robust localization and tracking of multiple speakers in real environments for binaural robot audition. In International Workshop on Image Analysis for Multimedia Interactive Services [6616137]. https://doi.org/10.1109/WIAMIS.2013.6616137