Improvement of speaker localization by considering multipath interference of sound wave for binaural robot audition

Ui Hyun Kim, Takeshi Mizumoto, Tetsuya Ogata, Hiroshi G. Okuno

Research output: Chapter in Book/Report/Conference proceeding (Conference contribution)

8 Citations (Scopus)

Abstract

This paper presents an improved speaker localization method for binaural robot audition based on the generalized cross-correlation (GCC) method weighted by the phase transform (PHAT). The problem with conventional direction-of-arrival (DOA) estimation based on the GCC-PHAT method is multipath interference, whereby a sound wave travels to the microphones via both a front-head path and a back-head path. This paper describes a new time delay factor for the GCC-PHAT method that compensates for multipath interference under the assumption of a spherical robot head. In addition, the restriction that the sampling frequency places on time difference of arrival (TDOA) estimation is overcome by applying maximum likelihood (ML) estimation in the frequency domain. Experiments conducted on the SIG-2 humanoid robot show that the proposed method reduces localization errors by 17.8 degrees on average, and by over 35 degrees in side directions, compared with conventional DOA estimation.
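The conventional baseline that the abstract improves on (GCC-PHAT delay estimation followed by a free-field angle mapping) can be sketched as below. This is a minimal, stdlib-only Python sketch under stated assumptions, not the authors' method: the helper names `dft`, `gcc_phat_tdoa` and `doa_free_field`, the 0.1-sample lag grid, the speed of sound `c = 343.0` m/s and the microphone spacing are all hypothetical. The sub-sample step is a plain grid search over fractional lags of the PHAT-weighted cross-spectrum, used here as a simple stand-in for the paper's ML estimator, and the angle mapping deliberately ignores the head, which is exactly what the paper's spherical-head, multipath-aware delay factor replaces.

```python
import cmath
import math


def dft(x):
    """Naive O(N^2) DFT, enough for short demo frames without third-party libraries."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def gcc_phat_tdoa(left, right, fs, max_tau, step=0.1):
    """Delay of `right` relative to `left`, in seconds, via GCC-PHAT.

    The PHAT-weighted cross-spectrum is scanned over lags spaced `step`
    samples apart, evaluated directly in the frequency domain, so the
    result is not tied to whole-sample resolution. This grid search is an
    illustrative stand-in, NOT the paper's ML estimator.
    """
    n = len(left)
    cross = [sr * sl.conjugate() for sl, sr in zip(dft(left), dft(right))]
    floor = 1e-9 * max(abs(g) for g in cross)
    # PHAT weighting: keep only each bin's phase; drop (near-)empty bins.
    phat = [g / abs(g) if abs(g) > floor else 0j for g in cross]
    # Signed bin indices keep the correlation real for fractional lags.
    bins = [k if k <= n // 2 else k - n for k in range(n)]

    def corr(lag):
        return sum((g * cmath.exp(2j * math.pi * b * lag / n)).real
                   for g, b in zip(phat, bins))

    max_steps = round(max_tau * fs / step)
    best = max(range(-max_steps, max_steps + 1), key=lambda i: corr(i * step))
    return best * step / fs


def doa_free_field(tdoa, mic_distance, c=343.0):
    """Conventional far-field mapping tau = d * sin(theta) / c, in degrees.

    Assumption: two free-field microphones, no head model. Positive angles
    mean the sound reached `left` first. The paper replaces this kind of
    mapping with a spherical-head, multipath-aware delay factor.
    """
    s = max(-1.0, min(1.0, c * tdoa / mic_distance))
    return math.degrees(math.asin(s))
```

For instance, with 64-sample frames at 16 kHz and a circular 3-sample shift between the channels, `gcc_phat_tdoa(left, right, 16000, 8 / 16000)` returns about 3 / 16000 s. For realistic frame lengths the naive `dft` would be swapped for an FFT; the sketch only aims to show the PHAT weighting, the fractional-lag scan, and the head-free angle mapping whose errors in side directions motivate the paper.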

Original language: English
Title of host publication: IEEE International Conference on Intelligent Robots and Systems
Pages: 2910-2915
Number of pages: 6
DOIs: https://doi.org/10.1109/IROS.2011.6048364
Publication status: Published - 2011
Externally published: Yes
Event: 2011 IEEE/RSJ International Conference on Intelligent Robots and Systems: Celebrating 50 Years of Robotics, IROS'11 - San Francisco, CA
Duration: 2011 Sep 25 to 2011 Sep 30

Other

Other: 2011 IEEE/RSJ International Conference on Intelligent Robots and Systems: Celebrating 50 Years of Robotics, IROS'11
City: San Francisco, CA
Period: 11/9/25 to 11/9/30

Fingerprint

Audition
Acoustic waves
Robots
Direction of arrival
Correlation methods
Maximum likelihood estimation
Microphones
Time delay
Sampling
Experiments

ASJC Scopus subject areas

  • Control and Systems Engineering
  • Software
  • Computer Vision and Pattern Recognition
  • Computer Science Applications

Cite this

Kim, U. H., Mizumoto, T., Ogata, T., & Okuno, H. G. (2011). Improvement of speaker localization by considering multipath interference of sound wave for binaural robot audition. In IEEE International Conference on Intelligent Robots and Systems (pp. 2910-2915). [6048364] https://doi.org/10.1109/IROS.2011.6048364

Improvement of speaker localization by considering multipath interference of sound wave for binaural robot audition. / Kim, Ui Hyun; Mizumoto, Takeshi; Ogata, Tetsuya; Okuno, Hiroshi G.

IEEE International Conference on Intelligent Robots and Systems. 2011. p. 2910-2915, 6048364.


Kim, UH, Mizumoto, T, Ogata, T & Okuno, HG 2011, Improvement of speaker localization by considering multipath interference of sound wave for binaural robot audition. in IEEE International Conference on Intelligent Robots and Systems, 6048364, pp. 2910-2915, 2011 IEEE/RSJ International Conference on Intelligent Robots and Systems: Celebrating 50 Years of Robotics, IROS'11, San Francisco, CA, 11/9/25. https://doi.org/10.1109/IROS.2011.6048364
Kim UH, Mizumoto T, Ogata T, Okuno HG. Improvement of speaker localization by considering multipath interference of sound wave for binaural robot audition. In IEEE International Conference on Intelligent Robots and Systems. 2011. p. 2910-2915. 6048364 https://doi.org/10.1109/IROS.2011.6048364
Kim, Ui Hyun ; Mizumoto, Takeshi ; Ogata, Tetsuya ; Okuno, Hiroshi G. / Improvement of speaker localization by considering multipath interference of sound wave for binaural robot audition. IEEE International Conference on Intelligent Robots and Systems. 2011. pp. 2910-2915
@inproceedings{2cb172b79ae74bfc958a94463c104ccc,
title = "Improvement of speaker localization by considering multipath interference of sound wave for binaural robot audition",
abstract = "This paper presents an improved speaker localization method for binaural robot audition based on the generalized cross-correlation (GCC) method weighted by the phase transform (PHAT). The problem with conventional direction-of-arrival (DOA) estimation based on the GCC-PHAT method is multipath interference, whereby a sound wave travels to the microphones via both a front-head path and a back-head path. This paper describes a new time delay factor for the GCC-PHAT method that compensates for multipath interference under the assumption of a spherical robot head. In addition, the restriction that the sampling frequency places on time difference of arrival (TDOA) estimation is overcome by applying maximum likelihood (ML) estimation in the frequency domain. Experiments conducted on the SIG-2 humanoid robot show that the proposed method reduces localization errors by 17.8 degrees on average, and by over 35 degrees in side directions, compared with conventional DOA estimation.",
author = "Kim, {Ui Hyun} and Takeshi Mizumoto and Tetsuya Ogata and Okuno, {Hiroshi G.}",
year = "2011",
doi = "10.1109/IROS.2011.6048364",
language = "English",
isbn = "9781612844541",
pages = "2910--2915",
booktitle = "IEEE International Conference on Intelligent Robots and Systems",

}

TY - GEN

T1 - Improvement of speaker localization by considering multipath interference of sound wave for binaural robot audition

AU - Kim, Ui Hyun

AU - Mizumoto, Takeshi

AU - Ogata, Tetsuya

AU - Okuno, Hiroshi G.

PY - 2011

Y1 - 2011

N2 - This paper presents an improved speaker localization method for binaural robot audition based on the generalized cross-correlation (GCC) method weighted by the phase transform (PHAT). The problem with conventional direction-of-arrival (DOA) estimation based on the GCC-PHAT method is multipath interference, whereby a sound wave travels to the microphones via both a front-head path and a back-head path. This paper describes a new time delay factor for the GCC-PHAT method that compensates for multipath interference under the assumption of a spherical robot head. In addition, the restriction that the sampling frequency places on time difference of arrival (TDOA) estimation is overcome by applying maximum likelihood (ML) estimation in the frequency domain. Experiments conducted on the SIG-2 humanoid robot show that the proposed method reduces localization errors by 17.8 degrees on average, and by over 35 degrees in side directions, compared with conventional DOA estimation.

AB - This paper presents an improved speaker localization method for binaural robot audition based on the generalized cross-correlation (GCC) method weighted by the phase transform (PHAT). The problem with conventional direction-of-arrival (DOA) estimation based on the GCC-PHAT method is multipath interference, whereby a sound wave travels to the microphones via both a front-head path and a back-head path. This paper describes a new time delay factor for the GCC-PHAT method that compensates for multipath interference under the assumption of a spherical robot head. In addition, the restriction that the sampling frequency places on time difference of arrival (TDOA) estimation is overcome by applying maximum likelihood (ML) estimation in the frequency domain. Experiments conducted on the SIG-2 humanoid robot show that the proposed method reduces localization errors by 17.8 degrees on average, and by over 35 degrees in side directions, compared with conventional DOA estimation.

UR - http://www.scopus.com/inward/record.url?scp=84455195816&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84455195816&partnerID=8YFLogxK

U2 - 10.1109/IROS.2011.6048364

DO - 10.1109/IROS.2011.6048364

M3 - Conference contribution

AN - SCOPUS:84455195816

SN - 9781612844541

SP - 2910

EP - 2915

BT - IEEE International Conference on Intelligent Robots and Systems

ER -