Robust localization and tracking of multiple speakers in real environments for binaural robot audition

Ui Hyun Kim, Hiroshi G. Okuno

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

This paper presents a multisource sound localization method based on the generalized cross-correlation (GCC) method weighted by the phase transform (PHAT), together with a novel multisource speech tracking method that combines voice activity detection (VAD) with a K-means clustering algorithm, for binaural robot audition. The standard K-means clustering algorithm was extended with two additional steps for the purpose of multisource speech tracking. Experiments conducted on the SIG-2 humanoid robot in a real environment show that the method can track multiple speakers in real time with a tracking error below 4.35°.
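As a rough illustration of the GCC-PHAT step named in the abstract, the following is a minimal sketch of the textbook technique for a single source and one microphone pair. It is not the authors' implementation: the function names, the `mic_distance` default, and the free-field far-field arcsine mapping from delay to azimuth are all assumptions made for illustration, and the multisource handling, VAD, and modified K-means tracking are not reproduced here.

```python
import numpy as np

def gcc_phat(sig, ref, fs, max_tau=None):
    """Estimate the delay (in seconds) of `sig` relative to `ref` with GCC-PHAT."""
    n = len(sig) + len(ref)              # zero-pad to avoid circular wrap-around
    SIG = np.fft.rfft(sig, n=n)
    REF = np.fft.rfft(ref, n=n)
    cross = SIG * np.conj(REF)           # cross-power spectrum
    cross /= np.abs(cross) + 1e-12       # PHAT weighting: discard magnitude, keep phase
    cc = np.fft.irfft(cross, n=n)        # generalized cross-correlation over lags

    max_shift = n // 2
    if max_tau is not None:
        # Restrict the search to physically plausible lags (at least 1 sample).
        max_shift = max(1, min(int(fs * max_tau), max_shift))

    # Reorder so negative lags come first and lag 0 sits at index max_shift.
    cc = np.concatenate((cc[-max_shift:], cc[:max_shift + 1]))
    shift = int(np.argmax(np.abs(cc))) - max_shift
    return shift / fs

def tdoa_to_azimuth(tau, mic_distance=0.18, speed_of_sound=343.0):
    """Map a delay to an azimuth in degrees.

    Assumption: free-field, far-field model with a hypothetical microphone
    spacing; a real robot head would need a head-related model instead.
    """
    ratio = np.clip(speed_of_sound * tau / mic_distance, -1.0, 1.0)
    return float(np.degrees(np.arcsin(ratio)))

if __name__ == "__main__":
    fs = 16000
    rng = np.random.default_rng(0)
    ref = rng.standard_normal(1024)
    delay = 5                                        # true delay in samples
    sig = np.concatenate((np.zeros(delay), ref[:-delay]))
    tau = gcc_phat(sig, ref, fs, max_tau=0.001)
    print("estimated delay (samples):", round(tau * fs))
    print("azimuth (degrees):", tdoa_to_azimuth(tau))
```

In a multisource setting, per-frame estimates like these would typically be collected over time and grouped, which is where a clustering stage such as the K-means variant mentioned above would come in.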

Original language: English
Title of host publication: International Workshop on Image Analysis for Multimedia Interactive Services
DOIs: https://doi.org/10.1109/WIAMIS.2013.6616137
Publication status: Published - 2013
Externally published: Yes
Event: 2013 14th International Workshop on Image Analysis for Multimedia Interactive Services, WIAMIS 2013 - Paris
Duration: 2013 Jul 3 to 2013 Jul 5

Other

Other: 2013 14th International Workshop on Image Analysis for Multimedia Interactive Services, WIAMIS 2013
City: Paris
Period: 13/7/3 to 13/7/5

Fingerprint

Audition
Robots
Clustering algorithms
Correlation methods
Acoustic waves
Experiments

ASJC Scopus subject areas

  • Computer Graphics and Computer-Aided Design
  • Human-Computer Interaction
  • Software

Cite this

Kim, U. H., & Okuno, H. G. (2013). Robust localization and tracking of multiple speakers in real environments for binaural robot audition. In International Workshop on Image Analysis for Multimedia Interactive Services [6616137] https://doi.org/10.1109/WIAMIS.2013.6616137

Robust localization and tracking of multiple speakers in real environments for binaural robot audition. / Kim, Ui Hyun; Okuno, Hiroshi G.

International Workshop on Image Analysis for Multimedia Interactive Services. 2013. 6616137.

Kim, UH & Okuno, HG 2013, Robust localization and tracking of multiple speakers in real environments for binaural robot audition. in International Workshop on Image Analysis for Multimedia Interactive Services., 6616137, 2013 14th International Workshop on Image Analysis for Multimedia Interactive Services, WIAMIS 2013, Paris, 13/7/3. https://doi.org/10.1109/WIAMIS.2013.6616137
Kim UH, Okuno HG. Robust localization and tracking of multiple speakers in real environments for binaural robot audition. In International Workshop on Image Analysis for Multimedia Interactive Services. 2013. 6616137 https://doi.org/10.1109/WIAMIS.2013.6616137
Kim, Ui Hyun ; Okuno, Hiroshi G. / Robust localization and tracking of multiple speakers in real environments for binaural robot audition. International Workshop on Image Analysis for Multimedia Interactive Services. 2013.
@inproceedings{06ec2365ae43400ab478e983e129e60c,
title = "Robust localization and tracking of multiple speakers in real environments for binaural robot audition",
abstract = "This paper presents a multisource sound localization method based on the generalized cross-correlation (GCC) method weighted by the phase transform (PHAT) and a novel multisource speech tracking method consisting of voice activity detection (VAD) and K-means clustering algorithm for binaural robot audition. The standard K-means clustering algorithm was improved for the purpose of multisource speech tracking by adding two additional steps. Experiments conducted on the SIG-2 humanoid robot in a real environment show that our method can track multiple speakers in real-time with tracking error below 4.35°.",
author = "Kim, {Ui Hyun} and Okuno, {Hiroshi G.}",
year = "2013",
doi = "10.1109/WIAMIS.2013.6616137",
language = "English",
isbn = "9781479908332",
booktitle = "International Workshop on Image Analysis for Multimedia Interactive Services",

}

TY - GEN

T1 - Robust localization and tracking of multiple speakers in real environments for binaural robot audition

AU - Kim, Ui Hyun

AU - Okuno, Hiroshi G.

PY - 2013

Y1 - 2013

N2 - This paper presents a multisource sound localization method based on the generalized cross-correlation (GCC) method weighted by the phase transform (PHAT) and a novel multisource speech tracking method consisting of voice activity detection (VAD) and K-means clustering algorithm for binaural robot audition. The standard K-means clustering algorithm was improved for the purpose of multisource speech tracking by adding two additional steps. Experiments conducted on the SIG-2 humanoid robot in a real environment show that our method can track multiple speakers in real-time with tracking error below 4.35°.

UR - http://www.scopus.com/inward/record.url?scp=84887222569&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84887222569&partnerID=8YFLogxK

U2 - 10.1109/WIAMIS.2013.6616137

DO - 10.1109/WIAMIS.2013.6616137

M3 - Conference contribution

AN - SCOPUS:84887222569

SN - 9781479908332

BT - International Workshop on Image Analysis for Multimedia Interactive Services

ER -