Robust localization and tracking of multiple speakers in real environments for binaural robot audition

Ui Hyun Kim, Hiroshi G. Okuno

Research output: Conference contribution

Abstract

This paper presents a multisource sound localization method based on generalized cross-correlation (GCC) weighted by the phase transform (PHAT), together with a novel multisource speech tracking method that combines voice activity detection (VAD) with a K-means clustering algorithm, for binaural robot audition. The standard K-means clustering algorithm was extended with two additional steps for multisource speech tracking. Experiments conducted on the SIG-2 humanoid robot in a real environment show that our method can track multiple speakers in real time with a tracking error below 4.35°.
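As a rough illustration of the GCC-PHAT step named in the abstract, the following Python sketch estimates the time delay between two microphone signals. It is a generic textbook formulation, not the authors' implementation: the function name `gcc_phat`, its parameters, the NumPy dependency, and the small regularization constant are all assumptions made for this example. The VAD stage and the modified K-means tracking are not shown, since the abstract does not describe their details.

```python
import numpy as np


def gcc_phat(left, right, fs, max_tau=None):
    """Estimate the delay (seconds) of `right` relative to `left` via GCC-PHAT.

    Hypothetical helper for illustration; a positive result means the sound
    reached the `right` channel later than the `left` channel.
    """
    # Zero-pad to the combined length so the correlation is linear, not circular.
    n = len(left) + len(right)
    spec_left = np.fft.rfft(left, n=n)
    spec_right = np.fft.rfft(right, n=n)

    # Cross-power spectrum of the two channels.
    cross = spec_right * np.conj(spec_left)
    # PHAT weighting: normalize the magnitude so only phase information remains.
    # The constant 1e-12 (an assumption) avoids division by zero.
    cross /= np.abs(cross) + 1e-12

    # Back to the lag domain: this is the generalized cross-correlation.
    cc = np.fft.irfft(cross, n=n)

    # Restrict the search to physically plausible lags when max_tau is given.
    max_shift = n // 2
    if max_tau is not None:
        max_shift = max(1, min(int(fs * max_tau), max_shift))

    # Reorder so that index 0 corresponds to lag -max_shift.
    cc = np.concatenate((cc[-max_shift:], cc[:max_shift + 1]))
    lag = int(np.argmax(cc)) - max_shift
    return lag / fs


if __name__ == "__main__":
    # Synthetic check: white noise delayed by 5 samples on the right channel.
    rng = np.random.default_rng(0)
    fs = 16000
    left = rng.standard_normal(4096)
    right = np.concatenate((np.zeros(5), left[:-5]))
    print(gcc_phat(left, right, fs, max_tau=0.001) * fs)
```

In a binaural setup the estimated delay would then be mapped to an azimuth angle using a model of the head and microphone geometry; that mapping depends on the robot and is left out here.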

Original language: English
Title of host publication: International Workshop on Image Analysis for Multimedia Interactive Services
DOI: https://doi.org/10.1109/WIAMIS.2013.6616137
Publication status: Published - 2013
Externally published: Yes
Event: 2013 14th International Workshop on Image Analysis for Multimedia Interactive Services, WIAMIS 2013 - Paris
Duration: 3 July 2013 to 5 July 2013

Fingerprint

Audition
Robots
Clustering algorithms
Correlation methods
Acoustic waves
Experiments

ASJC Scopus subject areas

  • Computer Graphics and Computer-Aided Design
  • Human-Computer Interaction
  • Software

Cite this

Kim, U. H., & Okuno, H. G. (2013). Robust localization and tracking of multiple speakers in real environments for binaural robot audition. In International Workshop on Image Analysis for Multimedia Interactive Services [6616137]. https://doi.org/10.1109/WIAMIS.2013.6616137

@inproceedings{06ec2365ae43400ab478e983e129e60c,
title = "Robust localization and tracking of multiple speakers in real environments for binaural robot audition",
abstract = "This paper presents a multisource sound localization method based on generalized cross-correlation (GCC) weighted by the phase transform (PHAT), together with a novel multisource speech tracking method that combines voice activity detection (VAD) with a K-means clustering algorithm, for binaural robot audition. The standard K-means clustering algorithm was extended with two additional steps for multisource speech tracking. Experiments conducted on the SIG-2 humanoid robot in a real environment show that our method can track multiple speakers in real time with a tracking error below 4.35°.",
author = "Kim, {Ui Hyun} and Okuno, {Hiroshi G.}",
year = "2013",
doi = "10.1109/WIAMIS.2013.6616137",
language = "English",
isbn = "9781479908332",
booktitle = "International Workshop on Image Analysis for Multimedia Interactive Services",

}

TY - GEN
T1 - Robust localization and tracking of multiple speakers in real environments for binaural robot audition
AU - Kim, Ui Hyun
AU - Okuno, Hiroshi G.
PY - 2013
Y1 - 2013
AB - This paper presents a multisource sound localization method based on generalized cross-correlation (GCC) weighted by the phase transform (PHAT), together with a novel multisource speech tracking method that combines voice activity detection (VAD) with a K-means clustering algorithm, for binaural robot audition. The standard K-means clustering algorithm was extended with two additional steps for multisource speech tracking. Experiments conducted on the SIG-2 humanoid robot in a real environment show that our method can track multiple speakers in real time with a tracking error below 4.35°.
UR - http://www.scopus.com/inward/record.url?scp=84887222569&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84887222569&partnerID=8YFLogxK
U2 - 10.1109/WIAMIS.2013.6616137
DO - 10.1109/WIAMIS.2013.6616137
M3 - Conference contribution
AN - SCOPUS:84887222569
SN - 9781479908332
BT - International Workshop on Image Analysis for Multimedia Interactive Services
ER -