Robust localization and tracking of multiple speakers in real environments for binaural robot audition

Ui Hyun Kim, Hiroshi G. Okuno

Research output: Conference contribution

Abstract

This paper presents a multisource sound localization method for binaural robot audition based on the generalized cross-correlation (GCC) method weighted by the phase transform (PHAT), together with a novel multisource speech tracking method that combines voice activity detection (VAD) with a K-means clustering algorithm. For multisource speech tracking, the standard K-means clustering algorithm was extended with two additional steps. Experiments conducted on the SIG-2 humanoid robot in a real environment show that our method can track multiple speakers in real time with a tracking error below 4.35°.
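The GCC-PHAT localization step mentioned in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the sampling rate (16 kHz), the microphone spacing (0.18 m), the speed of sound (343 m/s) and the far-field, free-field formula used to turn a delay into an azimuth are all assumptions for illustration. That formula ignores the acoustic shadowing of a robot head. The VAD stage and the two extra K-means steps are not shown, because the abstract does not describe them.

```python
import numpy as np

# Assumed values for illustration only; none of these come from the paper.
FS = 16000              # sampling rate in Hz (hypothetical)
MIC_DISTANCE = 0.18     # spacing between the two ear microphones in metres (hypothetical)
SPEED_OF_SOUND = 343.0  # m/s, roughly room temperature


def gcc_phat(x, y, fs, max_tau=None):
    """Return the delay (seconds) of x relative to y using GCC-PHAT.

    A positive result means x lags y.
    """
    n = len(x) + len(y)  # zero-pad so the circular correlation does not wrap around
    X = np.fft.rfft(x, n=n)
    Y = np.fft.rfft(y, n=n)
    cross = X * np.conj(Y)
    # PHAT weighting: normalise every frequency bin to unit magnitude, keeping only phase.
    cross /= np.abs(cross) + 1e-12
    cc = np.fft.irfft(cross, n=n)

    max_shift = n // 2
    if max_tau is not None:
        # Physically possible lags are bounded by the microphone spacing.
        max_shift = max(1, min(int(fs * max_tau), max_shift))
    # Reorder so that index 0 corresponds to lag -max_shift.
    cc = np.concatenate((cc[-max_shift:], cc[:max_shift + 1]))
    shift = int(np.argmax(np.abs(cc))) - max_shift
    return shift / fs


def tdoa_to_azimuth(tau, mic_distance=MIC_DISTANCE, c=SPEED_OF_SOUND):
    """Convert a delay to an azimuth in degrees (far-field, free-field assumption)."""
    s = np.clip(c * tau / mic_distance, -1.0, 1.0)
    return float(np.degrees(np.arcsin(s)))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    source = rng.standard_normal(4096)  # white noise as a stand-in for speech
    delay = 5                           # samples
    left = source
    # The right channel hears the same source `delay` samples later.
    right = np.concatenate((np.zeros(delay), source))[:len(source)]
    tau = gcc_phat(right, left, FS, max_tau=MIC_DISTANCE / SPEED_OF_SOUND)
    print(f"delay = {tau * 1e6:.1f} us, azimuth = {tdoa_to_azimuth(tau):.1f} deg")
```

Restricting the search to lags of at most `MIC_DISTANCE / SPEED_OF_SOUND` discards peaks that the assumed geometry cannot produce. Per-frame azimuths like these are the kind of input a clustering-based tracker would group by speaker.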

Original language: English
Host publication title: International Workshop on Image Analysis for Multimedia Interactive Services
Publication status: Published - 2013
Externally published: Yes
Event: 2013 14th International Workshop on Image Analysis for Multimedia Interactive Services, WIAMIS 2013 - Paris
Duration: 3 Jul 2013 to 5 Jul 2013

ASJC Scopus subject areas

  • Computer Graphics and Computer-Aided Design
  • Human-Computer Interaction
  • Software
