Improved sound source localization in horizontal plane for binaural robot audition

Ui Hyun Kim, Kazuhiro Nakadai, Hiroshi G. Okuno

Research output: Article

13 Citations (Scopus)

Abstract

An improved sound source localization (SSL) method has been developed that is based on the generalized cross-correlation (GCC) method weighted by the phase transform (PHAT) for use with binaural robots equipped with two microphones inside artificial pinnae. The conventional SSL method based on the GCC-PHAT method has two main problems when used on a binaural robot platform: 1) diffraction of sound waves with multipath interference caused by the contours of the robot head, which affects localization accuracy, and 2) front-back ambiguity, which limits the localization range to half the horizontal space. The diffraction problem was overcome by incorporating a new time delay factor into the GCC-PHAT method under the assumption of a spherical robot head. The ambiguity problem was overcome by utilizing the amplification effect of the pinnae for localization over the entire azimuth. Experiments conducted using two dummy heads equipped with small or large pinnae showed that localization errors were reduced by 8.91° (3.21° vs. 12.12°) on average with the new time delay factor compared with the conventional GCC-PHAT method and that the success rate for front-back disambiguation using the pinnae amplification effect was 29.76 % (93.46 % vs. 72.02 %) better on average over the entire azimuth than with a conventional head related transfer function (HRTF)-based method.
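The conventional baseline named in the abstract can be sketched as follows. This is a minimal illustration of standard GCC-PHAT delay estimation between two microphone signals, followed by a free-field, far-field conversion from delay to azimuth. It is not the authors' method: the abstract does not specify the new time delay factor for a spherical head, so it is not reproduced here. The function names, the 0.18 m microphone spacing, and the 343 m/s speed of sound are assumptions chosen for illustration only.

```python
import numpy as np


def gcc_phat(sig, ref, fs, max_tau=None):
    """Estimate the delay (in seconds) of `sig` relative to `ref` using GCC-PHAT.

    A positive result means `sig` lags `ref`.
    """
    n = len(sig) + len(ref)  # zero-pad so the correlation is linear, not circular
    sig_f = np.fft.rfft(sig, n=n)
    ref_f = np.fft.rfft(ref, n=n)
    cross = sig_f * np.conj(ref_f)
    cross /= np.abs(cross) + 1e-12  # PHAT weighting: discard magnitude, keep phase
    cc = np.fft.irfft(cross, n=n)

    max_shift = n // 2
    if max_tau is not None:
        max_shift = max(1, min(int(fs * max_tau), max_shift))
    # Reorder so that lags run from -max_shift to +max_shift.
    cc = np.concatenate((cc[-max_shift:], cc[:max_shift + 1]))
    shift = int(np.argmax(np.abs(cc))) - max_shift
    return shift / fs


def tdoa_free_field(azimuth_deg, mic_distance, c=343.0):
    """Far-field delay for two microphones in free space (no head; assumed model)."""
    return mic_distance * np.sin(np.deg2rad(azimuth_deg)) / c


def azimuth_from_tdoa(tau, mic_distance, c=343.0):
    """Invert the free-field model; the result is limited to [-90, 90] degrees."""
    s = np.clip(c * tau / mic_distance, -1.0, 1.0)
    return float(np.rad2deg(np.arcsin(s)))


# Synthetic check: white noise delayed by 5 samples at 16 kHz.
rng = np.random.default_rng(0)
fs = 16000
ref = rng.standard_normal(4096)
sig = np.concatenate((np.zeros(5), ref[:-5]))
print("estimated delay in samples:", gcc_phat(sig, ref, fs) * fs)

# Front-back ambiguity: a source at 30 degrees (front) and one at 150 degrees
# (back) produce the same delay under this model.
print(tdoa_free_field(30.0, 0.18), tdoa_free_field(150.0, 0.18))
```

Because the free-field model depends only on the sine of the azimuth, the delay alone cannot separate a front source from its mirror image behind the head, which is the front-back ambiguity the abstract describes. The model also ignores diffraction around the head, the effect the paper's time delay factor is intended to address.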

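Original language: English
Pages (from-to): 63-74
Number of pages: 12
Journal: Applied Intelligence
Volume: 42
Issue number: 1
DOI: 10.1007/s10489-014-0544-y
Publication status: Published - 2014
Externally published: Yes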

Fingerprint

Audition
Acoustic waves
Robots
Amplification
Time delay
Diffraction
Correlation methods
Microphones
Transfer functions
Experiments

ASJC Scopus subject areas

  • Artificial Intelligence

Cite this

Improved sound source localization in horizontal plane for binaural robot audition. / Kim, Ui Hyun; Nakadai, Kazuhiro; Okuno, Hiroshi G.

In: Applied Intelligence, Vol. 42, No. 1, 2014, p. 63-74.

@article{01444b94ee59420db465e453c08c491d,
title = "Improved sound source localization in horizontal plane for binaural robot audition",
abstract = "An improved sound source localization (SSL) method has been developed that is based on the generalized cross-correlation (GCC) method weighted by the phase transform (PHAT) for use with binaural robots equipped with two microphones inside artificial pinnae. The conventional SSL method based on the GCC-PHAT method has two main problems when used on a binaural robot platform: 1) diffraction of sound waves with multipath interference caused by the contours of the robot head, which affects localization accuracy, and 2) front-back ambiguity, which limits the localization range to half the horizontal space. The diffraction problem was overcome by incorporating a new time delay factor into the GCC-PHAT method under the assumption of a spherical robot head. The ambiguity problem was overcome by utilizing the amplification effect of the pinnae for localization over the entire azimuth. Experiments conducted using two dummy heads equipped with small or large pinnae showed that localization errors were reduced by 8.91° (3.21° vs. 12.12°) on average with the new time delay factor compared with the conventional GCC-PHAT method and that the success rate for front-back disambiguation using the pinnae amplification effect was 29.76 {\%} (93.46 {\%} vs. 72.02 {\%}) better on average over the entire azimuth than with a conventional head related transfer function (HRTF)-based method.",
keywords = "Front-back disambiguation, Human-robot interaction, Intelligent robot audition, Sound source localization",
author = "Kim, {Ui Hyun} and Kazuhiro Nakadai and Okuno, {Hiroshi G.}",
year = "2014",
doi = "10.1007/s10489-014-0544-y",
language = "English",
volume = "42",
pages = "63--74",
journal = "Applied Intelligence",
issn = "0924-669X",
publisher = "Springer Netherlands",
number = "1",

}

TY - JOUR

T1 - Improved sound source localization in horizontal plane for binaural robot audition

AU - Kim, Ui Hyun

AU - Nakadai, Kazuhiro

AU - Okuno, Hiroshi G.

PY - 2014

Y1 - 2014

AB - An improved sound source localization (SSL) method has been developed that is based on the generalized cross-correlation (GCC) method weighted by the phase transform (PHAT) for use with binaural robots equipped with two microphones inside artificial pinnae. The conventional SSL method based on the GCC-PHAT method has two main problems when used on a binaural robot platform: 1) diffraction of sound waves with multipath interference caused by the contours of the robot head, which affects localization accuracy, and 2) front-back ambiguity, which limits the localization range to half the horizontal space. The diffraction problem was overcome by incorporating a new time delay factor into the GCC-PHAT method under the assumption of a spherical robot head. The ambiguity problem was overcome by utilizing the amplification effect of the pinnae for localization over the entire azimuth. Experiments conducted using two dummy heads equipped with small or large pinnae showed that localization errors were reduced by 8.91° (3.21° vs. 12.12°) on average with the new time delay factor compared with the conventional GCC-PHAT method and that the success rate for front-back disambiguation using the pinnae amplification effect was 29.76 % (93.46 % vs. 72.02 %) better on average over the entire azimuth than with a conventional head related transfer function (HRTF)-based method.

KW - Front-back disambiguation

KW - Human-robot interaction

KW - Intelligent robot audition

KW - Sound source localization

UR - http://www.scopus.com/inward/record.url?scp=84920708896&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84920708896&partnerID=8YFLogxK

U2 - 10.1007/s10489-014-0544-y

DO - 10.1007/s10489-014-0544-y

M3 - Article

AN - SCOPUS:84920708896

VL - 42

SP - 63

EP - 74

JO - Applied Intelligence

JF - Applied Intelligence

SN - 0924-669X

IS - 1

ER -