TY - GEN
T1 - Making a robot recognize three simultaneous sentences in real-time
AU - Yamamoto, Shun'ichi
AU - Nakadai, Kazuhiro
AU - Valin, Jean-Marc
AU - Rouat, Jean
AU - Michaud, François
AU - Komatani, Kazunori
AU - Ogata, Tetsuya
AU - Okuno, Hiroshi G.
N1 - Copyright 2020 Elsevier B.V., All rights reserved.
PY - 2005
Y1 - 2005
AB - A humanoid robot in real-world environments usually hears mixtures of sounds, so three capabilities are essential for robot audition: sound source localization, separation, and recognition of the separated sounds. We have adopted the missing feature theory (MFT) for automatic recognition of separated speech and developed a robot audition system. A microphone array is used together with a dedicated real-time implementation of Geometric Source Separation (GSS) and a multi-channel post-filter that further reduces interference from other sources. MFT-based automatic speech recognition (ASR) recognizes the separated sounds using missing feature masks that are generated automatically from the post-filtering step. The main advantage of this approach for humanoid robots is that an ASR system with a clean acoustic model can adapt to the distortion of separated sounds by consulting the feature masks derived from the post-filter. In this paper, we used an improved version of Julius as the MFT-based ASR. Julius is a real-time large vocabulary continuous speech recognition (LVCSR) system. We performed an experiment to evaluate our robot audition system in which the system recognizes whole sentences rather than isolated words. The results show improved system performance in recognizing three simultaneous sentences on the humanoid SIG2.
KW - Automatic missing feature mask generation
KW - Continuous speech recognition
KW - Missing feature theory
KW - Robot audition
UR - http://www.scopus.com/inward/record.url?scp=79957986619&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=79957986619&partnerID=8YFLogxK
DO - 10.1109/IROS.2005.1545094
M3 - Conference contribution
AN - SCOPUS:79957986619
SN - 0780389123
SN - 9780780389120
T3 - 2005 IEEE/RSJ International Conference on Intelligent Robots and Systems, IROS
SP - 4040
EP - 4045
BT - 2005 IEEE/RSJ International Conference on Intelligent Robots and Systems, IROS
PB - IEEE Computer Society
ER -