Robot audition is a critical technology for building intelligent robots that operate in everyday environments. We have developed a robot audition system based on a new interface between sound source separation and automatic speech recognition (ASR). A mixture of speech signals captured by a pair of microphones installed at the ear positions of a humanoid is separated into individual utterances by an active direction-pass filter (ADPF). The ADPF extracts a sound source originating from a specific direction in real time by using interaural phase and intensity differences. Each separated utterance is then recognized by a speech recognizer based on missing feature theory (MFT). By using a missing feature mask, the MFT-based ASR ignores features that were distorted or lost during speech separation. A missing feature mask for each separated utterance is generated during separation and is sent to the ASR together with the separated speech. This integration improves ASR performance. However, the generality of this robot audition system has not been assessed so far. In this paper, we assess its general applicability by implementing it on three humanoids, i.e., ASIMO of Honda, SIG2, and Replie of Kyoto University. Using three simultaneous speeches as benchmarks, the robot audition system improved the performance of ASR over 50% on every humanoid, and thus its general applicability was confirmed.
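The idea of passing only the time-frequency bins whose interaural phase difference matches a target direction, and reusing that decision as a missing feature mask, can be illustrated with a minimal sketch. This is not the ADPF described in the paper: it assumes a free-field two-microphone model (no head shadow), uses the phase difference only and ignores the intensity difference, and all names and values here (`expected_ipd`, `direction_pass_mask`, `separate`, the 0.18 m microphone spacing, the 0.3 rad tolerance) are hypothetical choices made for illustration.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s, assumed room-temperature value


def expected_ipd(freqs_hz, direction_deg, mic_distance_m=0.18):
    """Free-field interaural phase difference (left minus right), in radians.

    Assumption: plane wave, two omnidirectional microphones, no head model.
    The sign convention is arbitrary but used consistently below.
    """
    delay_s = mic_distance_m * np.sin(np.deg2rad(direction_deg)) / SPEED_OF_SOUND
    return 2.0 * np.pi * np.asarray(freqs_hz) * delay_s


def direction_pass_mask(left_spec, right_spec, freqs_hz, direction_deg,
                        tol_rad=0.3, mic_distance_m=0.18):
    """Boolean mask: True where the observed IPD matches the target direction."""
    observed = np.angle(left_spec * np.conj(right_spec))
    diff = observed - expected_ipd(freqs_hz, direction_deg, mic_distance_m)
    wrapped = np.angle(np.exp(1j * diff))  # wrap the difference into (-pi, pi]
    return np.abs(wrapped) <= tol_rad


def separate(left_spec, right_spec, freqs_hz, direction_deg, **kwargs):
    """Return (separated spectrum, mask).

    The mask plays the role of a binary missing feature mask: True marks
    bins treated as reliable, False marks bins treated as missing.
    """
    mask = direction_pass_mask(left_spec, right_spec, freqs_hz,
                               direction_deg, **kwargs)
    return left_spec * mask, mask


# Synthetic check: one source at +30 degrees, spectra built directly per bin.
freqs = np.linspace(500.0, 1500.0, 11)
left = np.ones_like(freqs, dtype=complex)
right = left * np.exp(-1j * expected_ipd(freqs, 30.0))

separated, mask = separate(left, right, freqs, 30.0)      # steer at the source
_, mask_away = separate(left, right, freqs, -30.0)        # steer elsewhere
```

In this sketch, steering the filter at the true direction keeps every bin, while steering at the mirrored direction rejects them. A recognizer that accepts a reliability mask could then ignore the rejected bins rather than treating them as silence.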
|Title of host publication||2004 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)|
|Publication status||Published - 2004|
|Event||2004 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) - Sendai|
Duration: Sep 28, 2004 → Oct 2, 2004