Recognition of simultaneous speech by estimating reliability of separated signals for robot audition

Shun'ichi Yamamoto, Ryu Takeda, Kazuhiro Nakadai, Mikio Nakano, Hiroshi Tsujino, Jean Marc Valin, Kazunori Komatani, Tetsuya Ogata, Hiroshi G. Okuno

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

3 Citations (Scopus)

Abstract

"Listening to several things at once" is a long-standing human dream and one goal of AI and robot audition, because psychophysical observations indicate that people can listen to at most two things at once. Current noise reduction techniques cannot help to achieve this goal because they assume quasi-stationary noise rather than interfering speech signals. Since robots are used in various environments, robot audition systems should require minimal a priori information about their acoustic environments and speakers. We evaluate a missing feature theory approach that interfaces sound source separation (SSS) with automatic speech recognition. The essential part is estimating the reliability of each feature of the separated sounds. We tested two kinds of robot audition systems that use SSS: independent component analysis (ICA) with two microphones, and geometric source separation (GSS) with eight microphones. For each SSS method, we developed automatic missing feature mask generation. The recognition accuracy for two simultaneous speech signals improved to an average of 67.8% and 88.0% for ICA and GSS, respectively.
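
The abstract does not say how the masks are generated for ICA or GSS, so the following is only a minimal sketch of the general missing-feature idea, not the authors' method. It assumes a hypothetical rule in which a feature counts as reliable when its estimated target energy exceeds the estimated interference energy by a threshold, and it assumes a diagonal-Gaussian acoustic model in which unreliable features are simply left out of the likelihood (marginalization). The function names, the threshold, and the example values are all illustrative assumptions.

```python
import math


def reliability_mask(target_energy, interference_energy, threshold_db=0.0):
    """Return 1 for features judged reliable, 0 otherwise.

    Hypothetical rule (an assumption, not taken from the paper): a feature
    is reliable when its local SNR in dB exceeds threshold_db.
    """
    mask = []
    for t, i in zip(target_energy, interference_energy):
        # A small floor keeps the ratio finite when an energy is zero.
        snr_db = 10.0 * math.log10(max(t, 1e-12) / max(i, 1e-12))
        mask.append(1 if snr_db > threshold_db else 0)
    return mask


def masked_log_likelihood(features, mask, means, variances):
    """Diagonal-Gaussian log-likelihood summed over reliable features only.

    Features with mask value 0 contribute nothing, which corresponds to
    marginalizing them out of the acoustic score.
    """
    total = 0.0
    for x, m, mu, var in zip(features, mask, means, variances):
        if m:
            total += -0.5 * (math.log(2.0 * math.pi * var) + (x - mu) ** 2 / var)
    return total


# Illustrative values only: the second feature is dominated by interference.
mask = reliability_mask([4.0, 1.0, 9.0], [1.0, 2.0, 1.0])
score = masked_log_likelihood([0.2, 5.0, -0.1], mask, [0.0, 0.0, 0.0], [1.0, 1.0, 1.0])
```

In this sketch the corrupted second feature (value 5.0 against a zero-mean model) does not affect the score, which is the behavior a missing-feature recognizer relies on when separation leaves residual interference in some features.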

Original language: English
Title of host publication: PRICAI 2006
Subtitle of host publication: Trends in Artificial Intelligence - 9th Pacific Rim International Conference on Artificial Intelligence, Proceedings
Publisher: Springer Verlag
Pages: 484-494
Number of pages: 11
ISBN (Print): 3540366679, 9783540366676
Publication status: Published - 2006 Jan 1
Event: 9th Pacific Rim International Conference on Artificial Intelligence - Guilin, China
Duration: 2006 Aug 7 to 2006 Aug 11

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 4099 LNAI
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 9th Pacific Rim International Conference on Artificial Intelligence
Country: China
City: Guilin
Period: 06/8/7 to 06/8/11

ASJC Scopus subject areas

  • Theoretical Computer Science
  • Computer Science (all)

Cite this

Yamamoto, S., Takeda, R., Nakadai, K., Nakano, M., Tsujino, H., Valin, J. M., Komatani, K., Ogata, T., & Okuno, H. G. (2006). Recognition of simultaneous speech by estimating reliability of separated signals for robot audition. In PRICAI 2006: Trends in Artificial Intelligence - 9th Pacific Rim International Conference on Artificial Intelligence, Proceedings (pp. 484-494). (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 4099 LNAI). Springer Verlag.