Interfacing sound stream segregation to automatic speech recognition - preliminary results on listening to several sounds simultaneously

Hiroshi G. Okuno, Tomohiro Nakatani, Takeshi Kawabata

Research output: Conference contribution

6 Citations (Scopus)

Abstract

This paper reports the preliminary results of experiments on listening to several sounds at once. Two issues are addressed: segregating speech streams from a mixture of sounds, and interfacing speech stream segregation with automatic speech recognition (ASR). Speech stream segregation (SSS) is modeled as a process of extracting harmonic fragments, grouping these extracted harmonic fragments, and substituting some sounds for non-harmonic parts of groups. This system is implemented by extending the harmonic-based stream segregation system reported at AAAI-94 and IJCAI-95. The main problem in interfacing SSS with HMM-based ASR is how to improve the recognition performance which is degraded by spectral distortion of segregated sounds caused mainly by the binaural input, grouping, and residue substitution. Our solution is to re-train the parameters of the HMM with training data binauralized for four directions, to group harmonic fragments according to their directions, and to substitute the residue of harmonic fragments for non-harmonic parts of each group. Experiments with 500 mixtures of two women's utterances of a word showed that the cumulative accuracy of word recognition up to the 10th candidate of each woman's utterance is, on average, 75%.
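The reported figure is a cumulative accuracy "up to the 10th candidate", i.e. the fraction of utterances whose correct word appears among the recognizer's top-k candidates. The following is a minimal sketch of how such a metric could be computed; it is not the authors' evaluation code, and the function name `cumulative_accuracy`, its parameters, and the toy words below are all hypothetical.

```python
from typing import List, Sequence


def cumulative_accuracy(
    ranked_candidates: Sequence[Sequence[str]],
    references: Sequence[str],
    max_rank: int = 10,
) -> List[float]:
    """Return a list whose entry k-1 is the fraction of utterances whose
    reference word appears among the top-k candidates, for k = 1..max_rank.

    Hypothetical helper for illustration only.
    """
    if len(ranked_candidates) != len(references):
        raise ValueError("need exactly one candidate list per reference word")
    if not references:
        raise ValueError("at least one utterance is required")

    hits = [0] * max_rank
    for candidates, reference in zip(ranked_candidates, references):
        top = list(candidates[:max_rank])
        if reference in top:
            # A hit at (0-based) rank r counts for every cutoff k > r.
            rank = top.index(reference)
            for k in range(rank, max_rank):
                hits[k] += 1

    total = len(references)
    return [h / total for h in hits]


# Toy data (invented): three utterances, candidates ordered best-first.
ranked = [["hai", "iie", "asa"], ["umi", "yama"], ["sora", "kumo"]]
truth = ["hai", "yama", "hoshi"]
accuracy_by_rank = cumulative_accuracy(ranked, truth, max_rank=3)
```

Under this definition the curve is non-decreasing in k, so the value at k = 10 is an upper bound on the top-1 recognition rate.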

Original language: English
Title of host publication: Proceedings of the National Conference on Artificial Intelligence
Editors: Anon
Place of publication: Menlo Park, CA, United States
Publisher: AAAI
Pages: 1082-1089
Number of pages: 8
Volume: 2
Publication status: Published - 1996
Externally published: Yes
Event: Proceedings of the 1996 13th National Conference on Artificial Intelligence. Part 2 (of 2) - Portland, OR, USA
Duration: 4 Aug 1996 → 8 Aug 1996

ASJC Scopus subject areas

  • Software

  • Cite this

    Okuno, H. G., Nakatani, T., & Kawabata, T. (1996). Interfacing sound stream segregation to automatic speech recognition - preliminary results on listening to several sounds simultaneously. In Anon (Ed.), Proceedings of the National Conference on Artificial Intelligence (Vol. 2, pp. 1082-1089). AAAI.