Making a robot recognize three simultaneous sentences in real-time

Shun'Ichi Yamamoto, Kazuhiro Nakadai, Jean Marc Valin, Jean Rouat, François Michaud, Kazunori Komatani, Tetsuya Ogata, Hiroshi G. Okuno

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

29 Citations (Scopus)

Abstract

A humanoid robot in a real-world environment usually hears mixtures of sounds, so three capabilities are essential for robot audition: sound source localization, sound source separation, and recognition of the separated sounds. We adopted missing feature theory (MFT) for automatic recognition of separated speech and developed a robot audition system around it. A microphone array is used together with a dedicated real-time implementation of Geometric Source Separation (GSS) and a multi-channel post-filter that further reduces interference from other sources. The MFT-based automatic speech recognizer (ASR) recognizes the separated sounds using missing feature masks generated automatically from the post-filtering step. The main advantage of this approach for humanoid robots is that an ASR with a clean acoustic model can adapt to the distortion of the separated sound by consulting the post-filter feature masks. In this paper, we used an improved version of Julius, a real-time large vocabulary continuous speech recognition (LVCSR) system, as the MFT-based ASR. We performed an experiment to evaluate the robot audition system, in which the system recognizes whole sentences rather than isolated words. The results showed improved system performance when recognizing three simultaneous speech signals on the humanoid SIG2.
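The role of a missing feature mask in the recognizer can be illustrated with a minimal sketch. It is not the authors' implementation and does not use the Julius API: the function names `make_mask` and `masked_log_likelihood`, the energy-ratio reliability rule, and the threshold value are all assumptions chosen for illustration. The sketch assumes a diagonal-covariance Gaussian acoustic model, for which ignoring an unreliable feature dimension is equivalent to marginalizing it out.

```python
import math


def make_mask(input_energy, output_energy, threshold=0.5):
    """Hypothetical reliability rule (an assumption, not taken from the paper):
    a feature is marked reliable (1.0) when the post-filter kept at least
    `threshold` of its input energy, and unreliable (0.0) otherwise."""
    mask = []
    for e_in, e_out in zip(input_energy, output_energy):
        ratio = e_out / e_in if e_in > 0 else 0.0
        mask.append(1.0 if ratio >= threshold else 0.0)
    return mask


def masked_log_likelihood(features, means, variances, mask):
    """Log-likelihood of a feature vector under a diagonal Gaussian, with each
    dimension's contribution weighted by its mask value in [0, 1].
    A mask value of 0 removes that dimension from the score."""
    total = 0.0
    for x, mu, var, m in zip(features, means, variances, mask):
        log_pdf = -0.5 * (math.log(2.0 * math.pi * var) + (x - mu) ** 2 / var)
        total += m * log_pdf
    return total


# Example: the second feature is badly distorted by an interfering source.
features = [0.0, 10.0]
means = [0.0, 0.0]
variances = [1.0, 1.0]
mask = make_mask(input_energy=[1.0, 1.0], output_energy=[0.9, 0.1])

full_score = masked_log_likelihood(features, means, variances, [1.0, 1.0])
masked_score = masked_log_likelihood(features, means, variances, mask)
```

In this toy case the distorted second feature is masked out, so the clean acoustic model is scored only on the reliable dimension and the masked score is higher than the unmasked one.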

Original language: English
Title of host publication: 2005 IEEE/RSJ International Conference on Intelligent Robots and Systems, IROS
Pages: 897-902
Number of pages: 6
DOI: https://doi.org/10.1109/IROS.2005.1545094
Publication status: Published - 2005 Dec 1
Externally published: Yes
Event: IEEE IRS/RSJ International Conference on Intelligent Robots and Systems, IROS 2005 - Edmonton, AB, Canada
Duration: 2005 Aug 2 to 2005 Aug 6

Publication series

Name: 2005 IEEE/RSJ International Conference on Intelligent Robots and Systems, IROS

Conference

Conference: IEEE IRS/RSJ International Conference on Intelligent Robots and Systems, IROS 2005
Country: Canada
City: Edmonton, AB
Period: 05/8/2 to 05/8/6

Keywords

  • Automatic missing feature mask generation
  • Continuous speech recognition
  • Missing feature theory
  • Robot audition

ASJC Scopus subject areas

  • Artificial Intelligence
  • Computer Vision and Pattern Recognition
  • Human-Computer Interaction
  • Control and Systems Engineering

  • Cite this

    Yamamoto, SI., Nakadai, K., Valin, J. M., Rouat, J., Michaud, F., Komatani, K., Ogata, T., & Okuno, H. G. (2005). Making a robot recognize three simultaneous sentences in real-time. In 2005 IEEE/RSJ International Conference on Intelligent Robots and Systems, IROS (pp. 897-902). [1545094] (2005 IEEE/RSJ International Conference on Intelligent Robots and Systems, IROS). https://doi.org/10.1109/IROS.2005.1545094