Recognition of simultaneous speech by estimating reliability of separated signals for robot audition

Shun'ichi Yamamoto, Ryu Takeda, Kazuhiro Nakadai, Mikio Nakano, Hiroshi Tsujino, Jean Marc Valin, Kazunori Komatani, Tetsuya Ogata, Hiroshi G. Okuno

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

3 Citations (Scopus)

Abstract

"Listening to several things at once" is a human dream and one goal of AI and robot audition, because psychophysical observations indicate that people can listen to at most two things at once. Current noise reduction techniques cannot achieve this goal because they assume quasi-stationary noise rather than interfering speech signals. Since robots are used in various environments, robot audition systems should require only minimal a priori information about their acoustic environments and speakers. We evaluate a missing feature theory approach that interfaces sound source separation (SSS) with automatic speech recognition (ASR). Its essential part is estimating the reliability of each feature of the separated sounds. We tested two kinds of robot audition systems that use SSS: independent component analysis (ICA) with two microphones, and geometric source separation (GSS) with eight microphones. For each SSS method, we developed automatic generation of missing feature masks. The recognition accuracy for two simultaneous speech signals improved to an average of 67.8% with ICA and 88.0% with GSS.

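The core idea of the abstract, deciding per feature whether a separated signal is reliable and letting the recognizer ignore the unreliable features, can be illustrated with a minimal Python sketch. It assumes (hypothetically) that an estimate of the interference power leaking into each time-frequency bin of a separated signal is already available, thresholds the resulting local SNR to obtain a binary missing-feature mask, and scores a feature frame against a diagonal-Gaussian state model by marginalizing (skipping) the masked dimensions. The function names, the 0 dB threshold, the leakage-based reliability criterion, and the use of the same resolution for mask and features are illustrative assumptions, not the ICA- and GSS-specific mask generation methods developed in the paper.

```python
import numpy as np


def reliability_mask(sep_power, leak_power, threshold_db=0.0):
    """Hypothetical binary missing-feature mask for one separated source.

    sep_power:  (frames, bins) power of the separated signal.
    leak_power: (frames, bins) ASSUMED estimate of interference power that
                leaked into that signal (how to obtain it is not shown here).
    Returns 1.0 where the local SNR exceeds threshold_db (reliable), else 0.0.
    """
    eps = 1e-12  # avoids log of zero and division by zero
    local_snr_db = 10.0 * np.log10((sep_power + eps) / (leak_power + eps))
    return (local_snr_db > threshold_db).astype(np.float64)


def masked_log_likelihood(x, mask, mean, var):
    """Diagonal-Gaussian log-likelihood of feature vector x, marginalizing
    (i.e. skipping) every dimension whose mask value is 0."""
    x, mask, mean, var = map(np.asarray, (x, mask, mean, var))
    # Per-dimension Gaussian log density; with a diagonal covariance the
    # joint log-likelihood is the sum of these terms.
    per_dim = -0.5 * (np.log(2.0 * np.pi * var) + (x - mean) ** 2 / var)
    # Unreliable dimensions contribute nothing to the score.
    return float(np.sum(mask * per_dim))


if __name__ == "__main__":
    # Toy data: 2 frames x 3 bins (hypothetical values).
    sep = np.array([[4.0, 1.0, 0.5], [9.0, 0.2, 2.0]])
    leak = np.array([[1.0, 2.0, 0.5], [1.0, 1.0, 1.0]])
    mask = reliability_mask(sep, leak)
    print(mask)  # rows [1, 0, 0] and [1, 0, 1]

    # Score frame 0 (log-power features) against a hypothetical unit-variance state.
    x0 = np.log(sep[0] + 1e-12)
    print(masked_log_likelihood(x0, mask[0], np.zeros(3), np.ones(3)))
```

With a diagonal covariance, marginalizing unreliable features reduces to summing per-dimension log densities over the reliable dimensions only. Because the mask enters as a multiplicative weight, a soft mask with reliabilities in [0, 1] can be passed to the same function as a variant of the binary scheme.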
Original language: English
Title of host publication: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Pages: 484-494
Number of pages: 11
Volume: 4099 LNAI
Publication status: Published - 2006
Externally published: Yes
Event: 9th Pacific Rim International Conference on Artificial Intelligence - Guilin
Duration: 2006 Aug 7 - 2006 Aug 11

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 4099 LNAI
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Other

Other: 9th Pacific Rim International Conference on Artificial Intelligence
City: Guilin
Period: 06/8/7 - 06/8/11

ASJC Scopus subject areas

  • Computer Science (all)
  • Biochemistry, Genetics and Molecular Biology (all)
  • Theoretical Computer Science

Cite this

Yamamoto, S., Takeda, R., Nakadai, K., Nakano, M., Tsujino, H., Valin, J. M., ... Okuno, H. G. (2006). Recognition of simultaneous speech by estimating reliability of separated signals for robot audition. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 4099 LNAI, pp. 484-494). (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 4099 LNAI).

Recognition of simultaneous speech by estimating reliability of separated signals for robot audition. / Yamamoto, Shun'ichi; Takeda, Ryu; Nakadai, Kazuhiro; Nakano, Mikio; Tsujino, Hiroshi; Valin, Jean Marc; Komatani, Kazunori; Ogata, Tetsuya; Okuno, Hiroshi G.

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics). Vol. 4099 LNAI 2006. p. 484-494 (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 4099 LNAI).

Yamamoto, S, Takeda, R, Nakadai, K, Nakano, M, Tsujino, H, Valin, JM, Komatani, K, Ogata, T & Okuno, HG 2006, Recognition of simultaneous speech by estimating reliability of separated signals for robot audition. in Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics). vol. 4099 LNAI, Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), vol. 4099 LNAI, pp. 484-494, 9th Pacific Rim International Conference on Artificial Intelligence, Guilin, 06/8/7.
Yamamoto S, Takeda R, Nakadai K, Nakano M, Tsujino H, Valin JM et al. Recognition of simultaneous speech by estimating reliability of separated signals for robot audition. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics). Vol. 4099 LNAI. 2006. p. 484-494. (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)).
Yamamoto, Shun'ichi ; Takeda, Ryu ; Nakadai, Kazuhiro ; Nakano, Mikio ; Tsujino, Hiroshi ; Valin, Jean Marc ; Komatani, Kazunori ; Ogata, Tetsuya ; Okuno, Hiroshi G. / Recognition of simultaneous speech by estimating reliability of separated signals for robot audition. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics). Vol. 4099 LNAI 2006. pp. 484-494 (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)).
@inproceedings{5d7e6065a8d044baaea637748b7e70bd,
title = "Recognition of simultaneous speech by estimating reliability of separated signals for robot audition",
abstract = "{"}Listening to several things at once{"} is a human dream and one goal of AI and robot audition, because psychophysical observations indicate that people can listen to at most two things at once. Current noise reduction techniques cannot achieve this goal because they assume quasi-stationary noise rather than interfering speech signals. Since robots are used in various environments, robot audition systems should require only minimal a priori information about their acoustic environments and speakers. We evaluate a missing feature theory approach that interfaces sound source separation (SSS) with automatic speech recognition (ASR). Its essential part is estimating the reliability of each feature of the separated sounds. We tested two kinds of robot audition systems that use SSS: independent component analysis (ICA) with two microphones, and geometric source separation (GSS) with eight microphones. For each SSS method, we developed automatic generation of missing feature masks. The recognition accuracy for two simultaneous speech signals improved to an average of 67.8{\%} with ICA and 88.0{\%} with GSS.",
author = "Shun'ichi Yamamoto and Ryu Takeda and Kazuhiro Nakadai and Mikio Nakano and Hiroshi Tsujino and Valin, {Jean Marc} and Kazunori Komatani and Tetsuya Ogata and Okuno, {Hiroshi G.}",
year = "2006",
language = "English",
isbn = "3540366679",
volume = "4099 LNAI",
series = "Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)",
pages = "484--494",
booktitle = "Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)",

}

TY - GEN

T1 - Recognition of simultaneous speech by estimating reliability of separated signals for robot audition

AU - Yamamoto, Shun'ichi

AU - Takeda, Ryu

AU - Nakadai, Kazuhiro

AU - Nakano, Mikio

AU - Tsujino, Hiroshi

AU - Valin, Jean Marc

AU - Komatani, Kazunori

AU - Ogata, Tetsuya

AU - Okuno, Hiroshi G.

PY - 2006

Y1 - 2006

AB - "Listening to several things at once" is a human dream and one goal of AI and robot audition, because psychophysical observations indicate that people can listen to at most two things at once. Current noise reduction techniques cannot achieve this goal because they assume quasi-stationary noise rather than interfering speech signals. Since robots are used in various environments, robot audition systems should require only minimal a priori information about their acoustic environments and speakers. We evaluate a missing feature theory approach that interfaces sound source separation (SSS) with automatic speech recognition (ASR). Its essential part is estimating the reliability of each feature of the separated sounds. We tested two kinds of robot audition systems that use SSS: independent component analysis (ICA) with two microphones, and geometric source separation (GSS) with eight microphones. For each SSS method, we developed automatic generation of missing feature masks. The recognition accuracy for two simultaneous speech signals improved to an average of 67.8% with ICA and 88.0% with GSS.

UR - http://www.scopus.com/inward/record.url?scp=33749539191&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=33749539191&partnerID=8YFLogxK

M3 - Conference contribution

AN - SCOPUS:33749539191

SN - 3540366679

SN - 9783540366676

VL - 4099 LNAI

T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

SP - 484

EP - 494

BT - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

ER -