Real-time robot audition system that recognizes simultaneous speech in the real world

Shun'ichi Yamamoto, Kazuhiro Nakadai, Mikio Nakano, Hiroshi Tsujino, Jean Marc Valin, Kazunori Komatani, Tetsuya Ogata, Hiroshi G. Okuno

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

49 Citations (Scopus)

Abstract

This paper presents a robot audition system that recognizes simultaneous speech in the real world using robot-embedded microphones. We previously reported a Missing Feature Theory (MFT) based integration of Sound Source Separation (SSS) and Automatic Speech Recognition (ASR) for building robust robot audition, and demonstrated that an MFT-based prototype system drastically improved speech recognition performance even when three speakers talked to a robot simultaneously. However, the prototype system had three problems: it ran offline, its system parameters were hand-tuned, and it suffered failures in Voice Activity Detection (VAD). To attain online processing, we introduced a FlowDesigner-based architecture to integrate sound source localization (SSL), SSS, and ASR. This architecture offers fast processing and easy implementation because it provides a simple framework of shared-object-based integration. To optimize the parameters, we developed Genetic Algorithm (GA) based parameter optimization, because it is difficult to build an analytical optimization model for mutually dependent system parameters. To improve VAD, we integrated a new VAD based on the power spectrum and location of a sound source into the system, since conventional VAD relying only on power often fails because of the low signal-to-noise ratio of simultaneous speech. We then constructed a robot audition system for Honda ASIMO. Experiments on recognition of simultaneous speech in a noisy and echoic environment showed that the system worked online and fast, and achieved better robustness and accuracy.
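The GA-based parameter optimization mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not describe the paper's fitness function, parameter encoding, or genetic operators, so everything below is an assumption made for illustration: the two-parameter `fitness` function is a toy stand-in for recognition accuracy, and `tournament` and `optimize` are hypothetical helper names. The toy objective contains a cross term so that the two parameters are mutually dependent, which is the situation the abstract gives as the reason for using a GA.

```python
import random

def fitness(params):
    """Toy stand-in for recognition accuracy (hypothetical, not from the paper).

    The cross term dx * dy couples the two parameters, so they cannot be
    tuned independently. The maximum value is 0, reached at (1.0, 2.0).
    """
    x, y = params
    dx, dy = x - 1.0, y - 2.0
    return -(dx * dx + dy * dy + 1.5 * dx * dy)

def tournament(population, rng, k=3):
    """Pick the fittest of k randomly sampled individuals."""
    return max(rng.sample(population, k), key=fitness)

def optimize(generations=40, pop_size=30, elite=2, sigma=0.3, seed=0):
    """Run a small real-valued GA and return (best_params, best_fitness_history)."""
    rng = random.Random(seed)
    population = [(rng.uniform(-5, 5), rng.uniform(-5, 5)) for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        history.append(fitness(population[0]))
        # Elitism: carry the best individuals over unchanged.
        next_gen = population[:elite]
        while len(next_gen) < pop_size:
            a, b = tournament(population, rng), tournament(population, rng)
            w = rng.random()
            # Blend crossover followed by Gaussian mutation on each parameter.
            child = tuple(w * p + (1 - w) * q + rng.gauss(0, sigma)
                          for p, q in zip(a, b))
            next_gen.append(child)
        population = next_gen
    best = max(population, key=fitness)
    history.append(fitness(best))
    return best, history

best, history = optimize()
print("best parameters:", best, "fitness:", history[-1])
```

Because the elite individuals are copied unchanged into each new generation, the best fitness in `history` never decreases. In a real system each fitness evaluation would presumably require running the recognizer on held-out recordings, which is far more expensive than this toy function.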

Original language: English
Title of host publication: IEEE International Conference on Intelligent Robots and Systems
Pages: 5333-5338
Number of pages: 6
DOIs: https://doi.org/10.1109/IROS.2006.282037
Publication status: Published - 2006
Externally published: Yes
Event: 2006 IEEE/RSJ International Conference on Intelligent Robots and Systems, IROS 2006 - Beijing
Duration: 2006 Oct 9 to 2006 Oct 15

Other

Other: 2006 IEEE/RSJ International Conference on Intelligent Robots and Systems, IROS 2006
City: Beijing
Period: 06/10/9 to 06/10/15

Fingerprint

Audition
Robots
Speech recognition
Acoustic waves
Source separation
Online systems
Microphones
Processing
Power spectrum
Signal to noise ratio
Tuning
Genetic algorithms

Keywords

  • Genetic algorithm
  • Missing feature theory
  • Parameter optimization
  • Real-time processing
  • Robot audition
  • Voice activity detection

ASJC Scopus subject areas

  • Control and Systems Engineering

Cite this

Yamamoto, S., Nakadai, K., Nakano, M., Tsujino, H., Valin, J. M., Komatani, K., ... Okuno, H. G. (2006). Real-time robot audition system that recognizes simultaneous speech in the real world. In IEEE International Conference on Intelligent Robots and Systems (pp. 5333-5338). [4059274] https://doi.org/10.1109/IROS.2006.282037

@inproceedings{6339fd010db44fb8a5a5e2c4fd6112c1,
title = "Real-time robot audition system that recognizes simultaneous speech in the real world",
keywords = "Genetic algorithm, Missing feature theory, Parameter optimization, Real-time processing, Robot audition, Voice activity detection",
author = "Shun'ichi Yamamoto and Kazuhiro Nakadai and Mikio Nakano and Hiroshi Tsujino and Valin, {Jean Marc} and Kazunori Komatani and Tetsuya Ogata and Okuno, {Hiroshi G.}",
year = "2006",
doi = "10.1109/IROS.2006.282037",
language = "English",
isbn = "142440259X",
pages = "5333--5338",
booktitle = "IEEE International Conference on Intelligent Robots and Systems",

}

TY - GEN

T1 - Real-time robot audition system that recognizes simultaneous speech in the real world

AU - Yamamoto, Shun'ichi

AU - Nakadai, Kazuhiro

AU - Nakano, Mikio

AU - Tsujino, Hiroshi

AU - Valin, Jean Marc

AU - Komatani, Kazunori

AU - Ogata, Tetsuya

AU - Okuno, Hiroshi G.

PY - 2006

Y1 - 2006

KW - Genetic algorithm

KW - Missing feature theory

KW - Parameter optimization

KW - Real-time processing

KW - Robot audition

KW - Voice activity detection

UR - http://www.scopus.com/inward/record.url?scp=34250652551&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=34250652551&partnerID=8YFLogxK

U2 - 10.1109/IROS.2006.282037

DO - 10.1109/IROS.2006.282037

M3 - Conference contribution

AN - SCOPUS:34250652551

SN - 142440259X

SN - 9781424402595

SP - 5333

EP - 5338

BT - IEEE International Conference on Intelligent Robots and Systems

ER -