Improving speech recognition of two simultaneous speech signals by integrating ICA BSS and automatic missing feature mask generation

Rya Takeda, ShuN'Ichi Yamamoto, Kazunori Komatoni, Tetsuya Ogata, Hiroshi G. Okuno

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

1 Citation (Scopus)

Abstract

Robot audition systems require the ability to separate sound sources and to recognize the separated sounds, because in daily life we hear mixtures of sounds, especially mixtures of speech. We report a robot audition system that recognizes two simultaneous talkers using a pair of omnidirectional microphones embedded in a humanoid. The system first separates the sound sources by Independent Component Analysis (ICA) with the single-input multiple-output (SIMO) model. Next, the spectral distortion in the separated sounds is estimated to generate missing-feature masks. Finally, the separated sounds are recognized by Automatic Speech Recognition (ASR) based on missing-feature theory (MFT). The novel aspects of our system are that spectral distortion is estimated in the time-frequency domain in terms of feature vectors, and that this estimate is based on the estimation error in the SIMO-ICA signals. The resulting system outperformed the baseline robot audition system by 7%.
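The abstract does not give the mask-generation or decoding equations, so the following is only a minimal Python sketch of the general idea behind the second and third steps, under assumptions of our own. Features are taken to be non-negative per-frame values (for example filterbank energies); the distortion is approximated by a fixed fraction (`leak_ratio`) of the competing separated signal, a simple leakage heuristic swapped in for the paper's SIMO-ICA error estimate; and the acoustic model is a single diagonal Gaussian. The names `estimate_distortion`, `missing_feature_mask`, `marginal_log_likelihood`, `leak_ratio` and `threshold` are hypothetical.

```python
import math
from typing import List, Sequence

Grid = List[List[float]]


def estimate_distortion(interference: Sequence[Sequence[float]],
                        leak_ratio: float = 0.25) -> Grid:
    """Hypothetical distortion proxy (an assumption, not the paper's
    SIMO-ICA error estimate): treat a fixed fraction of the competing
    separated signal's magnitude as leakage into the target."""
    return [[leak_ratio * abs(v) for v in row] for row in interference]


def missing_feature_mask(separated: Sequence[Sequence[float]],
                         distortion: Sequence[Sequence[float]],
                         threshold: float = 0.5) -> List[List[int]]:
    """Binary missing-feature mask, one value per (frame, dimension):
    1 = reliable, 0 = unreliable. A feature is reliable when its estimated
    distortion is at most `threshold` times its own magnitude."""
    return [
        [1 if d <= threshold * abs(s) else 0 for s, d in zip(s_row, d_row)]
        for s_row, d_row in zip(separated, distortion)
    ]


def marginal_log_likelihood(frame: Sequence[float],
                            mask_row: Sequence[int],
                            mean: Sequence[float],
                            var: Sequence[float]) -> float:
    """Log-likelihood of one feature frame under a diagonal Gaussian,
    marginalizing out the unreliable dimensions. With a diagonal
    covariance, marginalization simply drops their terms."""
    ll = 0.0
    for x, m, mu, v in zip(frame, mask_row, mean, var):
        if m:  # only reliable dimensions contribute to the score
            ll -= 0.5 * (math.log(2.0 * math.pi * v) + (x - mu) ** 2 / v)
    return ll


if __name__ == "__main__":
    # Toy 2-frame x 2-dimension features (made-up numbers for illustration).
    target = [[1.0, 0.2], [0.8, 0.1]]
    competing = [[0.4, 0.9], [0.2, 0.3]]
    mask = missing_feature_mask(target, estimate_distortion(competing))
    print(mask)  # prints [[1, 0], [1, 0]]
    # Score frame 0 against a hypothetical unit-variance Gaussian.
    print(marginal_log_likelihood(target[0], mask[0], [1.0, 0.0], [1.0, 1.0]))
```

Because the covariance is diagonal, marginalizing out an unreliable dimension is the same as omitting its term from the sum, which is why MFT decoding can be implemented by masking per-dimension log-likelihoods.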

Original language: English
Title of host publication: INTERSPEECH 2006 and 9th International Conference on Spoken Language Processing, INTERSPEECH 2006 - ICSLP
Pages: 2302-2305
Number of pages: 4
Volume: 5
Publication status: Published - 2006
Externally published: Yes
Event: INTERSPEECH 2006 and 9th International Conference on Spoken Language Processing, INTERSPEECH 2006 - ICSLP - Pittsburgh, PA
Duration: 2006 Sep 17 to 2006 Sep 21

Other

Other: INTERSPEECH 2006 and 9th International Conference on Spoken Language Processing, INTERSPEECH 2006 - ICSLP
City: Pittsburgh, PA
Period: 06/9/17 to 06/9/21

Fingerprint

Blind source separation
Independent component analysis
Speech recognition
Masks
Acoustic waves
Audition
Robots
Source separation
Microphones

Keywords

  • ICA
  • Missing-feature mask
  • Robot audition

ASJC Scopus subject areas

  • Computer Science(all)

Cite this

Takeda, R., Yamamoto, SI., Komatoni, K., Ogata, T., & Okuno, H. G. (2006). Improving speech recognition of two simultaneous speech signals by integrating ICA BSS and automatic missing feature mask generation. In INTERSPEECH 2006 and 9th International Conference on Spoken Language Processing, INTERSPEECH 2006 - ICSLP (Vol. 5, pp. 2302-2305)

@inproceedings{67ca185a924244699d812cbedaed9504,
title = "Improving speech recognition of two simultaneous speech signals by integrating ICA BSS and automatic missing feature mask generation",
abstract = "Robot audition systems require the ability to separate sound sources and to recognize the separated sounds, because in daily life we hear mixtures of sounds, especially mixtures of speech. We report a robot audition system that recognizes two simultaneous talkers using a pair of omnidirectional microphones embedded in a humanoid. The system first separates the sound sources by Independent Component Analysis (ICA) with the single-input multiple-output (SIMO) model. Next, the spectral distortion in the separated sounds is estimated to generate missing-feature masks. Finally, the separated sounds are recognized by Automatic Speech Recognition (ASR) based on missing-feature theory (MFT). The novel aspects of our system are that spectral distortion is estimated in the time-frequency domain in terms of feature vectors, and that this estimate is based on the estimation error in the SIMO-ICA signals. The resulting system outperformed the baseline robot audition system by 7{\%}.",
keywords = "ICA, Missing-feature mask, Robot audition",
author = "Rya Takeda and ShuN'Ichi Yamamoto and Kazunori Komatoni and Tetsuya Ogata and Okuno, {Hiroshi G.}",
year = "2006",
language = "English",
isbn = "9781604234497",
volume = "5",
pages = "2302--2305",
booktitle = "INTERSPEECH 2006 and 9th International Conference on Spoken Language Processing, INTERSPEECH 2006 - ICSLP",

}

TY - GEN

T1 - Improving speech recognition of two simultaneous speech signals by integrating ICA BSS and automatic missing feature mask generation

AU - Takeda, Rya

AU - Yamamoto, ShuN'Ichi

AU - Komatoni, Kazunori

AU - Ogata, Tetsuya

AU - Okuno, Hiroshi G.

PY - 2006

Y1 - 2006

N2 - Robot audition systems require the ability to separate sound sources and to recognize the separated sounds, because in daily life we hear mixtures of sounds, especially mixtures of speech. We report a robot audition system that recognizes two simultaneous talkers using a pair of omnidirectional microphones embedded in a humanoid. The system first separates the sound sources by Independent Component Analysis (ICA) with the single-input multiple-output (SIMO) model. Next, the spectral distortion in the separated sounds is estimated to generate missing-feature masks. Finally, the separated sounds are recognized by Automatic Speech Recognition (ASR) based on missing-feature theory (MFT). The novel aspects of our system are that spectral distortion is estimated in the time-frequency domain in terms of feature vectors, and that this estimate is based on the estimation error in the SIMO-ICA signals. The resulting system outperformed the baseline robot audition system by 7%.

AB - Robot audition systems require the ability to separate sound sources and to recognize the separated sounds, because in daily life we hear mixtures of sounds, especially mixtures of speech. We report a robot audition system that recognizes two simultaneous talkers using a pair of omnidirectional microphones embedded in a humanoid. The system first separates the sound sources by Independent Component Analysis (ICA) with the single-input multiple-output (SIMO) model. Next, the spectral distortion in the separated sounds is estimated to generate missing-feature masks. Finally, the separated sounds are recognized by Automatic Speech Recognition (ASR) based on missing-feature theory (MFT). The novel aspects of our system are that spectral distortion is estimated in the time-frequency domain in terms of feature vectors, and that this estimate is based on the estimation error in the SIMO-ICA signals. The resulting system outperformed the baseline robot audition system by 7%.

KW - ICA

KW - Missing-feature mask

KW - Robot audition

UR - http://www.scopus.com/inward/record.url?scp=44949199368&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=44949199368&partnerID=8YFLogxK

M3 - Conference contribution

SN - 9781604234497

VL - 5

SP - 2302

EP - 2305

BT - INTERSPEECH 2006 and 9th International Conference on Spoken Language Processing, INTERSPEECH 2006 - ICSLP

ER -