Dictation of multiparty conversation using statistical turn taking model and speaker model

Noriyuki Murai, Tetsunori Kobayashi

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

6 Citations (Scopus)

Abstract

A new speech decoder for multiparty conversation is proposed. Multiparty conversation denotes a situation in which many speakers talk to each other. Almost all conventional speech recognition systems assume that the input consists of a single speaker's voice. However, some applications, such as dialogue dictation and voice interfaces for multiple users, have to deal with the mixed voices of several speakers. In such a situation, the system has to recognize not only the word sequence of the input speech but also the speaker of each part of it. We therefore propose a decoder that uses not only an acoustic model and a language model, which are the resources of a conventional single-user speech decoder, but also a statistical turn-taking model and speaker models. This framework realizes simultaneous maximum likelihood estimation of the spoken word sequence and the speaker sequence. Experimental results on TV sports news show that the proposed method reduces the word error rate by 7.7% and the speaker error rate by 97.8% compared with the conventional method.
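The abstract does not spell out how the four models are combined. As a rough illustration only, the following Python sketch assumes a simple factorization: the input is pre-cut into segments that each carry one word and one speaker, the total log score is log P(W) + log P(S) + Σ_t [log P(x_t | w_t) + log P(x_t | s_t)], and a Viterbi search runs over joint (word, speaker) states. The word bigram, the speaker-bigram turn-taking model, the independence between acoustic and speaker evidence, and all names and scores (`joint_viterbi`, the toy tables) are hypothetical and not taken from the paper.

```python
import math
from itertools import product


def joint_viterbi(acoustic, speaker_ll, lm_bigram, turn_bigram, words, speakers):
    """Jointly pick the best word sequence and speaker sequence (toy sketch).

    All scores are natural-log probabilities. Segments are ASSUMED to be
    pre-cut so that each segment t carries exactly one word and one speaker.
      acoustic[t][w]           log P(x_t | w)      hypothetical acoustic score
      speaker_ll[t][s]         log P(x_t | s)      hypothetical speaker score
      lm_bigram[(w_prev, w)]   log P(w | w_prev)   w_prev=None at the start
      turn_bigram[(s_prev, s)] log P(s | s_prev)   s_prev=None at the start
    """
    states = list(product(words, speakers))  # every (word, speaker) pair
    # delta[(w, s)]: best total log score of any path ending in state (w, s)
    delta = {
        (w, s): lm_bigram[(None, w)] + turn_bigram[(None, s)]
        + acoustic[0][w] + speaker_ll[0][s]
        for (w, s) in states
    }
    backptrs = []  # backptrs[t-1][state at t] = best predecessor at t-1
    for t in range(1, len(acoustic)):
        new_delta, bp = {}, {}
        for (w, s) in states:
            # Best predecessor under the word LM plus the turn-taking model
            prev, score = max(
                ((p, delta[p] + lm_bigram[(p[0], w)] + turn_bigram[(p[1], s)])
                 for p in states),
                key=lambda item: item[1],
            )
            new_delta[(w, s)] = score + acoustic[t][w] + speaker_ll[t][s]
            bp[(w, s)] = prev
        delta = new_delta
        backptrs.append(bp)
    # Trace back from the best final state
    last = max(delta, key=delta.get)
    path = [last]
    for bp in reversed(backptrs):
        path.append(bp[path[-1]])
    path.reverse()
    return [w for w, _ in path], [s for _, s in path], delta[last]


# Toy usage with made-up scores (hypothetical, not data from the paper)
log = math.log
words, speakers = ["hi", "ok"], ["A", "B"]
# Uniform word bigram, including the start context None
lm = {(p, w): log(0.5) for p in [None] + words for w in words}
# Turn-taking model: switching speaker (0.7) is likelier than keeping it (0.3)
turn = {(None, s): log(0.5) for s in speakers}
turn.update({(p, s): log(0.3 if p == s else 0.7) for p in speakers for s in speakers})
acoustic = [{"hi": log(0.9), "ok": log(0.1)}, {"hi": log(0.2), "ok": log(0.8)}]
speaker_ll = [{"A": log(0.8), "B": log(0.2)}, {"A": log(0.3), "B": log(0.7)}]

best_words, best_speakers, best_score = joint_viterbi(
    acoustic, speaker_ll, lm, turn, words, speakers)
print(best_words, best_speakers)  # → ['hi', 'ok'] ['A', 'B']
```

In this toy run the turn-taking model favors a speaker change, which reinforces the speaker evidence in the second segment. A real decoder would presumably work on acoustic frames and hypothesize word and speaker boundaries jointly rather than rely on pre-cut segments.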

Original language: English
Title of host publication: Speech Processing II
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 1575-1578
Number of pages: 4
ISBN (Electronic): 0780362934
DOIs: https://doi.org/10.1109/ICASSP.2000.861980
Publication status: Published - 2000 Jan 1
Event: 25th IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2000 - Istanbul, Turkey
Duration: 2000 Jun 5 to 2000 Jun 9

Publication series

Name: ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
Volume: 3
ISSN (Print): 1520-6149

Conference

Conference: 25th IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2000
Country: Turkey
City: Istanbul
Period: 00/6/5 to 00/6/9

ASJC Scopus subject areas

  • Software
  • Signal Processing
  • Electrical and Electronic Engineering

Cite this

Murai, N., & Kobayashi, T. (2000). Dictation of multiparty conversation using statistical turn taking model and speaker model. In Speech Processing II (pp. 1575-1578). [861980] (ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings; Vol. 3). Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/ICASSP.2000.861980