Dictation of multiparty conversation using statistical turn taking model and speaker model

Noriyuki Murai, Tetsunori Kobayashi

Research output: Conference contribution

6 Citations (Scopus)

Abstract

A new speech decoder for multiparty conversation is proposed. Multiparty conversation denotes a situation in which many speakers talk to each other. Almost all conventional speech recognition systems assume that the input data consist of a single speaker's voice. However, some applications, such as dialogue dictation and multi-user voice interfaces, have to deal with a mixture of speakers' voices. In such a situation, the system has to recognize not only the word sequence of the input speech but also the speaker of each part of it. Therefore, we propose a decoder that uses not only an acoustic model and a language model, which are the resources of a conventional single-user speech decoder, but also a statistical turn-taking model and speaker models to recognize speech. This framework realizes simultaneous maximum likelihood estimation of the spoken word sequence and the speaker sequence. Experimental results using TV sports news show that the proposed method reduces the word error rate by 7.7% and the speaker error rate by 97.8% compared to the conventional method.
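As a rough illustration of how a turn-taking model can be combined with speaker models, the toy sketch below runs a Viterbi search over the speaker sequence alone. It is not the paper's decoder, which searches words and speakers jointly. The function name `decode_speakers`, the segment-level likelihoods, and all probabilities are hypothetical values chosen for the example.

```python
import math

def decode_speakers(seg_loglik, turn_logprob, init_logprob):
    """Viterbi search for the most likely speaker sequence (toy sketch).

    seg_loglik[t][s]   : log-likelihood of segment t under speaker s's model
    turn_logprob[p][s] : log-probability that speaker s follows speaker p
    init_logprob[s]    : log-probability that speaker s talks first
    """
    n = len(init_logprob)
    # best[s] = best log-score of any path that ends in speaker s
    best = [init_logprob[s] + seg_loglik[0][s] for s in range(n)]
    backptrs = []
    for t in range(1, len(seg_loglik)):
        new_best, ptr = [], []
        for s in range(n):
            # Pick the predecessor that maximizes score plus turn-taking term
            prev = max(range(n), key=lambda p: best[p] + turn_logprob[p][s])
            new_best.append(best[prev] + turn_logprob[prev][s] + seg_loglik[t][s])
            ptr.append(prev)
        best = new_best
        backptrs.append(ptr)
    # Trace back from the best final speaker
    s = max(range(n), key=lambda k: best[k])
    path = [s]
    for ptr in reversed(backptrs):
        s = ptr[s]
        path.append(s)
    path.reverse()
    return path

log = math.log
# Hypothetical per-segment speaker-model likelihoods (2 speakers, 4 segments)
seg = [[log(0.9), log(0.1)],
       [log(0.6), log(0.4)],
       [log(0.1), log(0.9)],
       [log(0.55), log(0.45)]]
# Hypothetical turn-taking model: a speaker tends to keep the floor
turn = [[log(0.8), log(0.2)],
        [log(0.2), log(0.8)]]
init = [log(0.5), log(0.5)]

result = decode_speakers(seg, turn, init)
print(result)  # [0, 0, 1, 1]
```

In this example the last segment slightly favors speaker 0 under the speaker models alone, but the turn-taking term keeps speaker 1 on the floor, so the decoded sequence differs from a segment-by-segment choice.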

Original language: English
Title of host publication: Speech Processing II
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 1575-1578
Number of pages: 4
ISBN (Electronic): 0780362934
DOI: https://doi.org/10.1109/ICASSP.2000.861980
Publication status: Published - 1 Jan 2000
Event: 25th IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2000 - Istanbul, Turkey
Duration: 5 Jun 2000 to 9 Jun 2000

Publication series

Name: ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
Volume: 3
ISSN (Print): 1520-6149

Conference

Conference: 25th IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2000
Country: Turkey
City: Istanbul
Period: 5 Jun 2000 to 9 Jun 2000

ASJC Scopus subject areas

  • Software
  • Signal Processing
  • Electrical and Electronic Engineering

Cite this

Murai, N., & Kobayashi, T. (2000). Dictation of multiparty conversation using statistical turn taking model and speaker model. In Speech Processing II (pp. 1575-1578). [861980] (ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings; Vol. 3). Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/ICASSP.2000.861980