Improvement in listening capability for humanoid robot HRP-2

Toru Takahashi, Kazuhiro Nakadai, Kazunori Komatani, Tetsuya Ogata, Hiroshi G. Okuno

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

6 Citations (Scopus)

Abstract

This paper describes an improvement in sound source separation for a simultaneous automatic speech recognition (ASR) system on a humanoid robot. Recognition errors in such a system are caused by separation errors and by interference from other sound sources. To improve separability, we extend the original geometric source separation (GSS) method: our GSS estimates the separation matrix using the robot's measured head-related transfer function (HRTF). The original GSS instead uses a simulated HRTF computed from the distance between each microphone and the sound source, so there is a large mismatch between the simulated and measured transfer functions, and this mismatch severely degrades recognition performance. Faster convergence of the separation matrix reduces the separation error, and an initial separation matrix derived from the measured transfer function is closer to the optimal separation matrix than one derived from the simulated transfer function; we therefore expect our GSS to converge faster. Our GSS can also use an adaptive step-size parameter. These new features are included in the open-source robot audition software "HARK", newly updated to version 1.0.0. HARK has been installed on an HRP-2 humanoid with an 8-element microphone array. The listening capability of HRP-2 is evaluated by recognizing a target speech signal separated from simultaneous speech by three talkers. The word correct rate (WCR) of ASR improves by 5 points under normal acoustic conditions and by 10 points under noisy conditions. The experimental results show that HARK 1.0.0 improves robustness against noise.
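
The following sketch illustrates the idea for a single frequency bin: a GSS-style update whose separation matrix is initialized from a measured transfer-function (steering) matrix and whose step size adapts to the residual error. The function names, array shapes, and the exact update and step-size formulas are illustrative assumptions made here for clarity, not the HARK 1.0.0 implementation.

import numpy as np

def init_separation_matrix(A_measured):
    # Initialize W from the measured transfer-function (steering) matrix
    # A_measured (mics x sources), e.g. via a pseudo-inverse, rather than
    # from a simulated free-field matrix; this reflects the "nearer initial
    # separation matrix" idea described in the abstract.
    return np.linalg.pinv(A_measured)

def gss_step(W, X, A, mu_gc=1.0):
    # One illustrative GSS update for a single frequency bin.
    #   W: (sources x mics) current separation matrix
    #   X: (mics x frames) observed multichannel spectra in this bin
    #   A: (mics x sources) measured steering / transfer-function matrix
    T = X.shape[1]
    Y = W @ X                                  # separated source spectra
    Ryy = Y @ Y.conj().T / T                   # output correlation matrix
    E_ss = Ryy - np.diag(np.diag(Ryy))         # leakage (off-diagonal) error
    E_gc = W @ A - np.eye(W.shape[0])          # geometric-constraint error

    G_ss = E_ss @ W @ (X @ X.conj().T / T)     # gradient of the leakage cost
    G_gc = E_gc @ A.conj().T                   # gradient of the constraint cost

    # Adaptive step size (assumed form): shrink the update as the residual
    # leakage error decreases, so the matrix settles near convergence.
    mu_ss = np.linalg.norm(E_ss) ** 2 / (2.0 * np.linalg.norm(G_ss) ** 2 + 1e-12)
    return W - mu_ss * G_ss - mu_gc * G_gc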

Original language: English
Title of host publication: Proceedings - IEEE International Conference on Robotics and Automation
Pages: 470-475
Number of pages: 6
ISBN (Print): 9781424450381
DOI: 10.1109/ROBOT.2010.5509830
Publication status: Published - 2010
Externally published: Yes
Event: 2010 IEEE International Conference on Robotics and Automation, ICRA 2010 - Anchorage, AK
Duration: 2010 May 3 - 2010 May 7

ASJC Scopus subject areas

  • Software
  • Artificial Intelligence
  • Control and Systems Engineering
  • Electrical and Electronic Engineering

Cite this

Takahashi, T., Nakadai, K., Komatani, K., Ogata, T., & Okuno, H. G. (2010). Improvement in listening capability for humanoid robot HRP-2. In Proceedings - IEEE International Conference on Robotics and Automation (pp. 470-475). [5509830] https://doi.org/10.1109/ROBOT.2010.5509830
