Recognition of simultaneous speech by estimating reliability of separated signals for robot audition

Shun'ichi Yamamoto, Ryu Takeda, Kazuhiro Nakadai, Mikio Nakano, Hiroshi Tsujino, Jean Marc Valin, Kazunori Komatani, Tetsuya Ogata, Hiroshi G. Okuno

Research output: Conference contribution

3 Citations (Scopus)

Abstract

"Listening to several things at once" is a people's dream and one goal of AI and robot audition, because people can listen to at most two things at once according to psychophysical observations. Current noise reduction techniques cannot help to achieve this goal because they assume quasi-stationary noises, not interfering speech signals. Since robots are used in various environments, robot audition systems require minimum a priori information about their acoustic environments and speakers. We evaluate a missing feature theory approach that interfaces between sound source separation (SSS) and automatic speech recognition. The essential part is the estimate of reliability of each feature of separated sounds. We tested two kinds of robot audition systems that use SSS: independent component analysis (ICA) with two microphones, and geometric source separation (GSS) with eight microphones. For each SSS, automatic missing feature mask generation is developed. The recognition accuracy of two simultaneous speech improved to an average of 67.8 and 88.0% for ICA and GSS, respectively.

Original language: English
Host publication title: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Pages: 484-494
Number of pages: 11
Volume: 4099 LNAI
Publication status: Published - 2006
Externally published: Yes
Event: 9th Pacific Rim International Conference on Artificial Intelligence - Guilin
Duration: Aug 7 2006 → Aug 11 2006

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 4099 LNAI
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Other

Other: 9th Pacific Rim International Conference on Artificial Intelligence
Guilin
Period: Aug 7 2006 → Aug 11 2006

ASJC Scopus subject areas

  • Computer Science (all)
  • Biochemistry, Genetics and Molecular Biology (all)
  • Theoretical Computer Science

Cite this

Yamamoto, S., Takeda, R., Nakadai, K., Nakano, M., Tsujino, H., Valin, J. M., ... Okuno, H. G. (2006). Recognition of simultaneous speech by estimating reliability of separated signals for robot audition. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 4099 LNAI, pp. 484-494).

@inproceedings{5d7e6065a8d044baaea637748b7e70bd,
title = "Recognition of simultaneous speech by estimating reliability of separated signals for robot audition",
abstract = "{"}Listening to several things at once{"} is a people's dream and one goal of AI and robot audition, because people can listen to at most two things at once according to psychophysical observations. Current noise reduction techniques cannot help to achieve this goal because they assume quasi-stationary noises, not interfering speech signals. Since robots are used in various environments, robot audition systems require minimum a priori information about their acoustic environments and speakers. We evaluate a missing feature theory approach that interfaces between sound source separation (SSS) and automatic speech recognition. The essential part is the estimate of reliability of each feature of separated sounds. We tested two kinds of robot audition systems that use SSS: independent component analysis (ICA) with two microphones, and geometric source separation (GSS) with eight microphones. For each SSS, automatic missing feature mask generation is developed. The recognition accuracy of two simultaneous speech improved to an average of 67.8 and 88.0{\%} for ICA and GSS, respectively.",
author = "Shun'ichi Yamamoto and Ryu Takeda and Kazuhiro Nakadai and Mikio Nakano and Hiroshi Tsujino and Valin, {Jean Marc} and Kazunori Komatani and Tetsuya Ogata and Okuno, {Hiroshi G.}",
year = "2006",
language = "English",
isbn = "3540366679",
volume = "4099 LNAI",
series = "Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)",
pages = "484--494",
booktitle = "Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)",

}

TY - GEN

T1 - Recognition of simultaneous speech by estimating reliability of separated signals for robot audition

AU - Yamamoto, Shun'ichi

AU - Takeda, Ryu

AU - Nakadai, Kazuhiro

AU - Nakano, Mikio

AU - Tsujino, Hiroshi

AU - Valin, Jean Marc

AU - Komatani, Kazunori

AU - Ogata, Tetsuya

AU - Okuno, Hiroshi G.

PY - 2006

Y1 - 2006

N2 - "Listening to several things at once" is a people's dream and one goal of AI and robot audition, because people can listen to at most two things at once according to psychophysical observations. Current noise reduction techniques cannot help to achieve this goal because they assume quasi-stationary noises, not interfering speech signals. Since robots are used in various environments, robot audition systems require minimum a priori information about their acoustic environments and speakers. We evaluate a missing feature theory approach that interfaces between sound source separation (SSS) and automatic speech recognition. The essential part is the estimate of reliability of each feature of separated sounds. We tested two kinds of robot audition systems that use SSS: independent component analysis (ICA) with two microphones, and geometric source separation (GSS) with eight microphones. For each SSS, automatic missing feature mask generation is developed. The recognition accuracy of two simultaneous speech improved to an average of 67.8 and 88.0% for ICA and GSS, respectively.

AB - "Listening to several things at once" is a people's dream and one goal of AI and robot audition, because people can listen to at most two things at once according to psychophysical observations. Current noise reduction techniques cannot help to achieve this goal because they assume quasi-stationary noises, not interfering speech signals. Since robots are used in various environments, robot audition systems require minimum a priori information about their acoustic environments and speakers. We evaluate a missing feature theory approach that interfaces between sound source separation (SSS) and automatic speech recognition. The essential part is the estimate of reliability of each feature of separated sounds. We tested two kinds of robot audition systems that use SSS: independent component analysis (ICA) with two microphones, and geometric source separation (GSS) with eight microphones. For each SSS, automatic missing feature mask generation is developed. The recognition accuracy of two simultaneous speech improved to an average of 67.8 and 88.0% for ICA and GSS, respectively.

UR - http://www.scopus.com/inward/record.url?scp=33749539191&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=33749539191&partnerID=8YFLogxK

M3 - Conference contribution

AN - SCOPUS:33749539191

SN - 3540366679

SN - 9783540366676

VL - 4099 LNAI

T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

SP - 484

EP - 494

BT - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

ER -