Mis-recognized utterance detection using hierarchical language model

Hirofumi Yamamoto, Genichiro Kikui, Yoshinori Sagisaka

    Research output: Chapter in Book/Report/Conference proceeding > Conference contribution


    In this paper, a mis-recognized utterance detection and modification scheme is proposed to recover from speech recognition errors in speech translation. Mis-recognition is frequently observed in the speech recognition stage. Most mis-recognitions result from acoustic model mismatch and out-of-vocabulary (OOV) words. To cope with both acoustic model mismatch and OOVs, we adopt a hierarchical language model to identify them. A hierarchical language model can generate hypotheses both with and without OOVs (or acoustically mismatched words). The likelihood difference between these hypotheses is used as an utterance confidence measure. To confirm the feasibility of this scheme, as a first experiment, we conducted speech recognition and mis-recognized utterance detection experiments. The experimental results showed a 99% detection rate for utterances with OOVs. This rate is considerably higher than the 94% achieved by a conventional detection method using a-posteriori probability. For utterances without OOVs, a detection rate of 80% was obtained, which is comparable to that of the conventional method. These results support the feasibility of the proposed error detection and modification scheme.
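
The confidence measure described in the abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not taken from the paper: the `Hypothesis` class, the function names, the sign convention (score of the hypothesis without OOVs minus score of the hypothesis with OOVs), and the threshold value. The abstract only states that the likelihood difference between the two hypotheses serves as the utterance confidence measure.

```python
from dataclasses import dataclass


@dataclass
class Hypothesis:
    """A recognition hypothesis with its total log-likelihood (hypothetical type)."""
    text: str
    log_likelihood: float


def confidence(without_oov: Hypothesis, with_oov: Hypothesis) -> float:
    """Likelihood difference used as an utterance confidence score.

    Assumed convention: a strongly negative value means the hypothesis
    that allows OOVs explains the speech much better than the
    in-vocabulary-only hypothesis.
    """
    return without_oov.log_likelihood - with_oov.log_likelihood


def is_misrecognized(without_oov: Hypothesis, with_oov: Hypothesis,
                     threshold: float = -5.0) -> bool:
    """Flag the utterance when the confidence falls below a threshold.

    The threshold is an illustrative value, not one reported in the paper.
    """
    return confidence(without_oov, with_oov) < threshold


if __name__ == "__main__":
    in_vocab = Hypothesis("hypothesis restricted to known words", -120.0)
    oov_allowed = Hypothesis("hypothesis containing an OOV segment", -100.0)
    print(confidence(in_vocab, oov_allowed))
    print(is_misrecognized(in_vocab, oov_allowed))
```

In practice the threshold would be tuned on held-out data to trade detection rate against false alarms; the paper's actual decision rule may differ from this sketch.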

    Original language: English
    Title of host publication: 8th International Conference on Spoken Language Processing, ICSLP 2004
    Publisher: International Speech Communication Association
    Number of pages: 4
    Publication status: Published - 2004
    Event: 8th International Conference on Spoken Language Processing, ICSLP 2004 - Jeju, Jeju Island, Korea, Republic of
    Duration: 2004 Oct 4 to 2004 Oct 8


    Other: 8th International Conference on Spoken Language Processing, ICSLP 2004
    Country/Territory: Korea, Republic of
    City: Jeju, Jeju Island

    ASJC Scopus subject areas

    • Language and Linguistics
    • Linguistics and Language

