Improving speech recognition of two simultaneous speech signals by integrating ICA BSS and automatic missing feature mask generation

Rya Takeda, Shu N.Ichi Yamamoto, Kazunori Komatoni, Tetsuya Ogata, Hiroshi G. Okuno

Research output: Chapter in Book/Report/Conference proceedingConference contribution

1 Citation (Scopus)

Abstract

Robot audition systems require capabilities for sound source separation and the recognition of separated sounds, since we hear a mixture of sounds in our daily lives, especially mixed of speech. We report a robot audition system with a pair of omni-directional microphones embedded in a humanoid that recognizes two simultaneous talkers. It first separates the sound sources by Independent Component Analysis (ICA) with the single-input multiple-output (SIMO) model. Then, spectral distortion in the separated sounds is then estimated to generate missing feature masks. Finally, the separated sounds are recognized by missing-feature theory (MFT) for Automatic Speech Recognition (ASR). The novel aspects of our system involve estimates of spectral distortion in the temporal-frequency domain in terms of feature vectors and based on estimates error in SIMO-ICA signals. The resulting system outperformed the baseline robot audition system by 7 %.

Original languageEnglish
Title of host publicationINTERSPEECH 2006 and 9th International Conference on Spoken Language Processing, INTERSPEECH 2006 - ICSLP
PublisherInternational Speech Communication Association
Pages2302-2305
Number of pages4
ISBN (Print)9781604234497
Publication statusPublished - 2006 Jan 1
EventINTERSPEECH 2006 and 9th International Conference on Spoken Language Processing, INTERSPEECH 2006 - ICSLP - Pittsburgh, PA, United States
Duration: 2006 Sep 172006 Sep 21

Publication series

NameINTERSPEECH 2006 and 9th International Conference on Spoken Language Processing, INTERSPEECH 2006 - ICSLP
Volume5

Conference

ConferenceINTERSPEECH 2006 and 9th International Conference on Spoken Language Processing, INTERSPEECH 2006 - ICSLP
CountryUnited States
CityPittsburgh, PA
Period06/9/1706/9/21

    Fingerprint

Keywords

  • ICA
  • Missing-feature mask
  • Robot audition

ASJC Scopus subject areas

  • Computer Science(all)

Cite this

Takeda, R., Yamamoto, S. N. I., Komatoni, K., Ogata, T., & Okuno, H. G. (2006). Improving speech recognition of two simultaneous speech signals by integrating ICA BSS and automatic missing feature mask generation. In INTERSPEECH 2006 and 9th International Conference on Spoken Language Processing, INTERSPEECH 2006 - ICSLP (pp. 2302-2305). (INTERSPEECH 2006 and 9th International Conference on Spoken Language Processing, INTERSPEECH 2006 - ICSLP; Vol. 5). International Speech Communication Association.