Real-time robot audition system that recognizes simultaneous speech in the real world

Shun'ichi Yamamoto*, Kazuhiro Nakadai, Mikio Nakano, Hiroshi Tsujino, Jean Marc Valin, Kazunori Komatani, Tetsuya Ogata, Hiroshi G. Okuno

*Corresponding author for this work

Research output: Conference contribution

53 Citations (Scopus)

Abstract

This paper presents a robot audition system that recognizes simultaneous speech in the real world by using robot-embedded microphones. We have previously reported Missing Feature Theory (MFT) based integration of Sound Source Separation (SSS) and Automatic Speech Recognition (ASR) for building robust robot audition. We demonstrated that an MFT-based prototype system drastically improved the performance of speech recognition even when three speakers talked to a robot simultaneously. However, the prototype system had three problems: offline processing, hand-tuning of system parameters, and failures in Voice Activity Detection (VAD). To attain online processing, we introduced a FlowDesigner-based architecture to integrate sound source localization (SSL), SSS, and ASR. This architecture brings fast processing and easy implementation because it provides a simple framework of shared-object-based integration. To optimize the parameters, we developed Genetic Algorithm (GA) based parameter optimization, because it is difficult to build an analytical optimization model for mutually dependent system parameters. To improve VAD, we integrated a new VAD based on the power spectrum and location of a sound source into the system, since conventional VAD relying only on power often fails due to the low signal-to-noise ratio of simultaneous speech. We then constructed a robot audition system for Honda ASIMO. Through experiments on recognition of simultaneous speech in a noisy and echoic environment, we showed that the system ran online and fast, with better robustness and accuracy.
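The abstract's point about VAD can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the thresholds, and the use of time-domain frame energy in place of a power-spectrum feature are all assumptions made for illustration. A frame is marked as speech only when it is loud enough and a localized source lies near the direction of the speaker being tracked, so speech arriving from another direction does not trigger detection.

```python
import math
from typing import List, Optional, Sequence


def frame_power_db(frame: Sequence[float]) -> float:
    """Mean power of one frame in dB (the floor avoids log of zero)."""
    mean_sq = sum(x * x for x in frame) / len(frame)
    return 10.0 * math.log10(max(mean_sq, 1e-12))


def angular_distance(a_deg: float, b_deg: float) -> float:
    """Smallest absolute difference between two azimuths, in degrees."""
    diff = abs(a_deg - b_deg) % 360.0
    return min(diff, 360.0 - diff)


def location_aware_vad(
    frames: Sequence[Sequence[float]],
    doa_deg: Sequence[Optional[float]],
    target_deg: float,
    power_threshold_db: float = -40.0,  # hypothetical value
    tolerance_deg: float = 15.0,        # hypothetical value
) -> List[bool]:
    """Per-frame speech decision: power AND source direction must both agree.

    doa_deg holds one localized azimuth per frame, or None when the
    localizer (assumed to exist upstream) found no source.
    """
    decisions = []
    for frame, doa in zip(frames, doa_deg):
        loud_enough = frame_power_db(frame) > power_threshold_db
        on_target = doa is not None and angular_distance(doa, target_deg) <= tolerance_deg
        decisions.append(loud_enough and on_target)
    return decisions


# Toy input: one silent frame, then three equally loud frames whose
# localized direction differs (2, 90, and 355 degrees).
loud = [0.5, -0.5, 0.5, -0.5]
frames = [[0.0, 0.0, 0.0, 0.0], loud, loud, loud]
doa = [None, 2.0, 90.0, 355.0]

print(location_aware_vad(frames, doa, target_deg=0.0))
# → [False, True, False, True]
```

A power-only detector would accept all three loud frames. Here the frame localized at 90 degrees is rejected because it belongs to a different talker, which is the failure case the abstract attributes to conventional VAD under simultaneous speech.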

Original language: English
Title of host publication: 2006 IEEE/RSJ International Conference on Intelligent Robots and Systems, IROS 2006
Pages: 5333-5338
Number of pages: 6
DOI
Publication status: Published - 2006
Externally published: Yes
Event: 2006 IEEE/RSJ International Conference on Intelligent Robots and Systems, IROS 2006 - Beijing, China
Duration: 9 Oct 2006 to 15 Oct 2006

Publication series

Name: IEEE International Conference on Intelligent Robots and Systems

Conference

Conference: 2006 IEEE/RSJ International Conference on Intelligent Robots and Systems, IROS 2006
Country/Territory: China
City: Beijing
Period: 9 Oct 2006 to 15 Oct 2006

ASJC Scopus subject areas

  • Control and Systems Engineering
  • Software
  • Computer Vision and Pattern Recognition
  • Computer Science Applications
