Missing-feature based speech recognition for two simultaneous speech signals separated by ICA with a pair of humanoid ears

Ryu Takeda, Shun'ichi Yamamoto, Kazunori Komatani, Tetsuya Ogata, Hiroshi G. Okuno

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

11 Citations (Scopus)

Abstract

Robot audition is a critical technology for enabling robots to live in symbiosis with people. Because we hear a mixture of sounds in our daily lives, three capabilities are essential: sound source localization, sound source separation, and recognition of the separated sounds. Sound source localization has recently been well studied for robots, while the other capabilities still need extensive study. This paper reports a robot audition system that uses a pair of omni-directional microphones embedded in a humanoid to recognize two simultaneous talkers. It first separates the sound sources by Independent Component Analysis (ICA) with a single-input multiple-output (SIMO) model. Then, the spectral distortion of the separated sounds is estimated to identify reliable and unreliable components of the spectrogram. This estimation generates missing-feature masks in the form of spectrographic masks. These masks are then used to avoid the influence of spectral distortion in automatic speech recognition based on the missing-feature method. The novelty of our system lies in estimating spectral distortion in the time-frequency domain in terms of feature vectors. In addition, we point out that voice-activity detection (VAD) is effective in overcoming the weakness of ICA when the number of talkers changes. The resulting system outperformed the baseline robot audition system by 15%.
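
The mask-then-recognize step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how distortion is estimated or how the recognizer uses the mask, so the hard threshold, the function names (missing_feature_mask, masked_log_likelihood), and the toy numbers below are all assumptions. The sketch uses the common marginalization form of the missing-feature method, in which a diagonal-covariance Gaussian is scored on the reliable feature dimensions only.

```python
import math


def missing_feature_mask(distortion, threshold):
    """Build a binary spectrographic mask from per-cell distortion estimates.

    distortion: list of frames, each a list of per-dimension distortion values.
    A cell is marked reliable (1) when its distortion is at or below the
    threshold, and unreliable (0) otherwise. The threshold rule is an
    illustrative assumption, not taken from the paper.
    """
    return [[1 if d <= threshold else 0 for d in frame] for frame in distortion]


def masked_log_likelihood(frame, mask, mean, var):
    """Log-likelihood of one feature frame under a diagonal Gaussian,
    summed over reliable dimensions only (marginalizing the rest)."""
    total = 0.0
    for x, m, mu, v in zip(frame, mask, mean, var):
        if m:  # skip unreliable dimensions entirely
            total += -0.5 * (math.log(2.0 * math.pi * v) + (x - mu) ** 2 / v)
    return total


# Toy data (hypothetical): 2 frames x 3 feature dimensions.
distortion = [[0.1, 0.9, 0.2],
              [0.8, 0.05, 0.3]]
mask = missing_feature_mask(distortion, threshold=0.5)

features = [[0.0, 5.0, 1.0],
            [4.0, 0.5, -1.0]]
mean = [0.0, 0.0, 0.0]
var = [1.0, 1.0, 1.0]
scores = [masked_log_likelihood(f, m, mean, var) for f, m in zip(features, mask)]
```

In this toy case the heavily distorted cells (values 5.0 and 4.0) do not contribute to the scores, which is the sense in which the masks avoid the influence of spectral distortion during recognition.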

Original language: English
Title of host publication: IEEE International Conference on Intelligent Robots and Systems
Pages: 878-885
Number of pages: 8
DOI: https://doi.org/10.1109/IROS.2006.281741
Publication status: Published - 2006
Externally published: Yes
Event: 2006 IEEE/RSJ International Conference on Intelligent Robots and Systems, IROS 2006 - Beijing
Duration: 2006 Oct 9 to 2006 Oct 15

Other

Other: 2006 IEEE/RSJ International Conference on Intelligent Robots and Systems, IROS 2006
City: Beijing
Period: 06/10/9 to 06/10/15

Fingerprint

Independent component analysis
Speech recognition
Acoustic waves
Robots
Audition
Masks
Microphones

Keywords

  • Automatic speech recognition
  • ICA
  • Missing-feature methods
  • Multiple speakers
  • Robot audition

ASJC Scopus subject areas

  • Control and Systems Engineering

Cite this

Takeda, R., Yamamoto, S., Komatani, K., Ogata, T., & Okuno, H. G. (2006). Missing-feature based speech recognition for two simultaneous speech signals separated by ICA with a pair of humanoid ears. In IEEE International Conference on Intelligent Robots and Systems (pp. 878-885). [4058472] https://doi.org/10.1109/IROS.2006.281741

@inproceedings{2397c5f9261544a6b7319fac15ec0206,
title = "Missing-feature based speech recognition for two simultaneous speech signals separated by ICA with a pair of humanoid ears",
keywords = "Automatic speech recognition, ICA, Missing-feature methods, Multiple speakers, Robot audition",
author = "Ryu Takeda and Shun'ichi Yamamoto and Kazunori Komatani and Tetsuya Ogata and Okuno, {Hiroshi G.}",
year = "2006",
doi = "10.1109/IROS.2006.281741",
language = "English",
isbn = "142440259X",
pages = "878--885",
booktitle = "IEEE International Conference on Intelligent Robots and Systems",

}

TY - GEN

T1 - Missing-feature based speech recognition for two simultaneous speech signals separated by ICA with a pair of humanoid ears

AU - Takeda, Ryu

AU - Yamamoto, Shun'ichi

AU - Komatani, Kazunori

AU - Ogata, Tetsuya

AU - Okuno, Hiroshi G.

PY - 2006

Y1 - 2006

KW - Automatic speech recognition

KW - ICA

KW - Missing-feature methods

KW - Multiple speakers

KW - Robot audition

UR - http://www.scopus.com/inward/record.url?scp=34250689497&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=34250689497&partnerID=8YFLogxK

U2 - 10.1109/IROS.2006.281741

DO - 10.1109/IROS.2006.281741

M3 - Conference contribution

SN - 142440259X

SN - 9781424402595

SP - 878

EP - 885

BT - IEEE International Conference on Intelligent Robots and Systems

ER -