Making a robot recognize three simultaneous sentences in real-time

Shun'Ichi Yamamoto, Kazuhiro Nakadai, Jean Marc Valin, Jean Rouat, François Michaud, Kazunori Komatani, Tetsuya Ogata, Hiroshi G. Okuno

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

29 Citations (Scopus)

Abstract

A humanoid robot in real-world environments usually hears mixtures of sounds, so three capabilities are essential for robot audition: sound source localization, source separation, and recognition of the separated sounds. We have adopted the missing feature theory (MFT) for automatic recognition of separated speech and developed a robot audition system around it. A microphone array is used together with a real-time, dedicated implementation of Geometric Source Separation (GSS) and a multi-channel post-filter that further reduces interference from other sources. The MFT-based automatic speech recognizer (ASR) recognizes the separated sounds using missing feature masks generated automatically from the post-filtering step. The main advantage of this approach for humanoid robots is that an ASR with a clean acoustic model can adapt to the distortion of the separated sound by consulting the post-filter feature masks. In this paper, we use an improved version of Julius, a real-time large-vocabulary continuous speech recognition (LVCSR) system, as the MFT-based ASR. We evaluated the robot audition system in an experiment in which it recognizes sentences rather than isolated words, and showed improved performance on recognition of three simultaneous sentences with the humanoid SIG2.
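
The abstract outlines a pipeline of GSS-based separation, multi-channel post-filtering, automatic missing-feature-mask generation, and MFT-based decoding. The sketch below is not the authors' implementation; it is a minimal illustration of the two MFT steps under assumptions of my own: a binary mask that marks a spectral band reliable when the post-filter kept most of its energy (the 0.25 threshold and the energy-ratio criterion are assumptions), and marginalization that drops unreliable bands from a diagonal-covariance Gaussian score trained on clean speech.

```python
# Minimal sketch (not the authors' implementation) of missing-feature-mask
# generation from post-filter attenuation and MFT-style marginalized scoring.
import numpy as np

def missing_feature_mask(gss_energy, postfilter_energy, threshold=0.25):
    """Mark a time-frequency feature as reliable (1) when the post-filter kept
    most of its energy, i.e. interference from other sources was weak there.
    The ratio criterion and threshold are illustrative assumptions."""
    ratio = postfilter_energy / np.maximum(gss_energy, 1e-10)
    return (ratio > threshold).astype(float)           # shape: (frames, bands)

def masked_log_likelihood(features, mask, mean, var):
    """Marginalization-style MFT scoring: unreliable bands are dropped from a
    diagonal-covariance Gaussian log-likelihood instead of being trusted."""
    ll = -0.5 * (np.log(2 * np.pi * var) + (features - mean) ** 2 / var)
    return np.sum(ll * mask, axis=1)                    # one score per frame

# Toy usage with random spectra standing in for real separated speech.
rng = np.random.default_rng(0)
gss = rng.uniform(0.1, 1.0, size=(100, 24))             # energy after GSS
post = gss * rng.uniform(0.0, 1.0, size=gss.shape)      # energy after post-filter
mask = missing_feature_mask(gss, post)
scores = masked_log_likelihood(np.log(post + 1e-10), mask,
                               mean=np.zeros(24), var=np.ones(24))
```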

Original language: English
Title of host publication: 2005 IEEE/RSJ International Conference on Intelligent Robots and Systems, IROS
Pages: 897-902
Number of pages: 6
DOIs: https://doi.org/10.1109/IROS.2005.1545094
ISBN (Print): 0780389123, 9780780389120
Publication status: Published - 2005
Externally published: Yes
Event: IEEE/RSJ International Conference on Intelligent Robots and Systems, IROS 2005 - Edmonton, AB
Duration: 2005 Aug 2 – 2005 Aug 6

Keywords

  • Automatic missing feature mask generation
  • Continuous speech recognition
  • Missing feature theory
  • Robot audition

ASJC Scopus subject areas

  • Artificial Intelligence
  • Computer Vision and Pattern Recognition
  • Human-Computer Interaction
  • Control and Systems Engineering

Cite this

Yamamoto, SI., Nakadai, K., Valin, J. M., Rouat, J., Michaud, F., Komatani, K., ... Okuno, H. G. (2005). Making a robot recognize three simultaneous sentences in real-time. In 2005 IEEE/RSJ International Conference on Intelligent Robots and Systems, IROS (pp. 897-902). [1545094] https://doi.org/10.1109/IROS.2005.1545094
