Dictation of Multiparty Conversation Considering Speaker Individuality and Turn Taking

Noriyuki Murai, Tetsunori Kobayashi

    Research output: Contribution to journal, Article

    2 Citations (Scopus)

    Abstract

    This paper discusses an algorithm that recognizes multiparty speech with complex turn taking. In recognition of the conversation of multiple speakers, it is necessary to know not only what is spoken, as in the conventional system, but also who spoke up to what point. The purpose of this paper is to find a method to solve this problem. The representation of the likelihood of turn taking is included in the language model in the continuous speech recognition system, and the speech properties of each speaker are represented by a statistical model. Using this approach, two algorithms are proposed that estimate simultaneously and in parallel the speaker and the speech content. Recognition experiments using conversation in TV sports news show that the proposed method can correct a maximum of 29.5% of the errors in the recognition of speech content and 93.0% of the errors in recognition of the speaker.
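As a rough illustration of how a turn-taking likelihood and per-speaker statistical models can be combined, the sketch below decodes the most likely speaker for each frame with a Viterbi search. It is not the authors' algorithm: the paper estimates speaker and speech content jointly, whereas this toy decodes speakers only. The function name `decode_speakers`, the one-dimensional features, the single-Gaussian speaker models (a stand-in for GMMs), and all probabilities are assumptions chosen for illustration.

```python
import math

def log_gauss(x, mean, var):
    """Log density of a 1-D Gaussian (stand-in for a per-speaker GMM)."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)

def decode_speakers(frames, speaker_models, log_turn, log_init):
    """Viterbi search for the most likely speaker sequence.

    speaker_models: {speaker: (mean, var)}    hypothetical acoustic models
    log_turn[p][s]: log P(speaker s follows speaker p), the turn-taking model
    log_init[s]:    log P(speaker s speaks first)
    """
    speakers = list(speaker_models)
    # best[s] = best log score of any path ending in speaker s
    best = {s: log_init[s] + log_gauss(frames[0], *speaker_models[s])
            for s in speakers}
    back = []  # back-pointers, one dict per frame after the first
    for x in frames[1:]:
        new_best, ptr = {}, {}
        for s in speakers:
            prev = max(speakers, key=lambda p: best[p] + log_turn[p][s])
            new_best[s] = (best[prev] + log_turn[prev][s]
                           + log_gauss(x, *speaker_models[s]))
            ptr[s] = prev
        back.append(ptr)
        best = new_best
    # Trace back from the best final speaker
    path = [max(speakers, key=best.get)]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    path.reverse()
    return path

# Assumed toy setup: two speakers, and turns tend to persist (0.9 stay, 0.1 switch)
models = {"A": (0.0, 1.0), "B": (5.0, 1.0)}
log_turn = {p: {s: math.log(0.9 if p == s else 0.1) for s in models}
            for p in models}
log_init = {s: math.log(0.5) for s in models}

frames = [0.1, -0.2, 0.3, 5.2, 4.8, 5.1]
print(decode_speakers(frames, models, log_turn, log_init))
# ['A', 'A', 'A', 'B', 'B', 'B']
```

The turn-taking term acts as a prior on speaker changes: a single frame that is marginally closer to another speaker's model does not trigger a switch, because two low-probability turns would cost more than the small acoustic gain.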

    Original language: English
    Pages (from-to): 103-111
    Number of pages: 9
    Journal: Systems and Computers in Japan
    Volume: 34
    Issue number: 13
    DOI: 10.1002/scj.1223
    Publication status: Published - 2003 Nov 30

    Fingerprint

    Continuous speech recognition
    Language Model
    Speech Recognition
    Sports
    Statistical Model
    Likelihood
    Necessary
    Speech
    Estimate
    Experiment
    Experiments
    Statistical Models

    Keywords

    • GMM
    • MLLR
    • Multiparty conversation
    • Speaker individuality
    • Statistical turn taking model

    ASJC Scopus subject areas

    • Hardware and Architecture
    • Information Systems
    • Theoretical Computer Science
    • Computational Theory and Mathematics

    Cite this

    Dictation of Multiparty Conversation Considering Speaker Individuality and Turn Taking. / Murai, Noriyuki; Kobayashi, Tetsunori.

    In: Systems and Computers in Japan, Vol. 34, No. 13, 30.11.2003, p. 103-111.

    @article{e5f6f4a628d64d62b7506f8d30b19907,
    title = "Dictation of Multiparty Conversation Considering Speaker Individuality and Turn Taking",
    abstract = "This paper discusses an algorithm that recognizes multiparty speech with complex turn taking. In recognition of the conversation of multiple speakers, it is necessary to know not only what is spoken, as in the conventional system, but also who spoke up to what point. The purpose of this paper is to find a method to solve this problem. The representation of the likelihood of turn taking is included in the language model in the continuous speech recognition system, and the speech properties of each speaker are represented by a statistical model. Using this approach, two algorithms are proposed that estimate simultaneously and in parallel the speaker and the speech content. Recognition experiments using conversation in TV sports news show that the proposed method can correct a maximum of 29.5{\%} of the errors in the recognition of speech content and 93.0{\%} of the errors in recognition of the speaker.",
    keywords = "GMM, MLLR, Multiparty conversation, Speaker individuality, Statistical turn taking model",
    author = "Noriyuki Murai and Tetsunori Kobayashi",
    year = "2003",
    month = "11",
    day = "30",
    doi = "10.1002/scj.1223",
    language = "English",
    volume = "34",
    pages = "103--111",
    journal = "Systems and Computers in Japan",
    issn = "0882-1666",
    publisher = "John Wiley and Sons Inc.",
    number = "13",

    }

    TY - JOUR
    T1 - Dictation of Multiparty Conversation Considering Speaker Individuality and Turn Taking
    AU - Murai, Noriyuki
    AU - Kobayashi, Tetsunori
    PY - 2003/11/30
    Y1 - 2003/11/30
    N2 - This paper discusses an algorithm that recognizes multiparty speech with complex turn taking. In recognition of the conversation of multiple speakers, it is necessary to know not only what is spoken, as in the conventional system, but also who spoke up to what point. The purpose of this paper is to find a method to solve this problem. The representation of the likelihood of turn taking is included in the language model in the continuous speech recognition system, and the speech properties of each speaker are represented by a statistical model. Using this approach, two algorithms are proposed that estimate simultaneously and in parallel the speaker and the speech content. Recognition experiments using conversation in TV sports news show that the proposed method can correct a maximum of 29.5% of the errors in the recognition of speech content and 93.0% of the errors in recognition of the speaker.
    KW - GMM
    KW - MLLR
    KW - Multiparty conversation
    KW - Speaker individuality
    KW - Statistical turn taking model
    UR - http://www.scopus.com/inward/record.url?scp=0242364146&partnerID=8YFLogxK
    UR - http://www.scopus.com/inward/citedby.url?scp=0242364146&partnerID=8YFLogxK
    U2 - 10.1002/scj.1223
    DO - 10.1002/scj.1223
    M3 - Article
    AN - SCOPUS:0242364146
    VL - 34
    SP - 103
    EP - 111
    JO - Systems and Computers in Japan
    JF - Systems and Computers in Japan
    SN - 0882-1666
    IS - 13
    ER -