MIS-recognized utterance detection using hierarchical language model

Hirofumi Yamamoto, Genichiro Kikui, Yoshinori Sagisaka

    Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

    Abstract

    In this paper, a mis-recognized utterance detection and modification scheme is proposed to recover from speech recognition errors in speech translation. Mis-recognition is frequently observed in the speech recognition stage. Most mis-recognitions result from acoustic model mismatch and out-of-vocabulary (OOV) words. To cope with both, we adopt a hierarchical language model to identify them. A hierarchical language model can generate hypotheses both with and without OOVs (or acoustically mismatched words). The likelihood difference between these hypotheses is used as an utterance confidence measure. To confirm the feasibility of this scheme, as a first experiment, we conducted speech recognition and mis-recognized utterance detection experiments. Experimental results showed a 99% detection rate for utterances with OOVs. This rate is considerably higher than the 94% of a conventional detection method using a-posteriori probability. For utterances without OOVs, a detection rate of 80% was obtained, which is comparable to the conventional method. These results support the feasibility of the proposed error detection and modification scheme.
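
    The confidence measure described in the abstract can be illustrated with a minimal sketch. The abstract only states that the likelihood difference between the hypotheses with and without OOVs is used; the sign convention, the threshold value, the example scores, and every name below (UtteranceScores, likelihood_difference, is_suspect) are assumptions made for illustration, not the authors' implementation.

```python
from dataclasses import dataclass


@dataclass
class UtteranceScores:
    """Log-likelihoods of two competing hypotheses for one utterance (hypothetical container)."""
    loglik_without_oov: float  # best hypothesis restricted to in-vocabulary words
    loglik_with_oov: float     # best hypothesis allowed to contain OOV / mismatched words


def likelihood_difference(scores: UtteranceScores) -> float:
    """Log-likelihood gained by permitting OOVs (sign convention is an assumption)."""
    return scores.loglik_with_oov - scores.loglik_without_oov


def is_suspect(scores: UtteranceScores, threshold: float = 0.0) -> bool:
    """Flag the utterance as likely mis-recognized when the OOV hypothesis
    beats the in-vocabulary hypothesis by more than `threshold` (assumed value)."""
    return likelihood_difference(scores) > threshold


if __name__ == "__main__":
    # Made-up scores, for illustration only.
    examples = {
        "in_vocabulary": UtteranceScores(loglik_without_oov=-120.0, loglik_with_oov=-125.0),
        "probable_oov": UtteranceScores(loglik_without_oov=-140.0, loglik_with_oov=-128.5),
    }
    for name, scores in examples.items():
        print(name, likelihood_difference(scores), is_suspect(scores))
```

    In practice the threshold would be tuned on held-out data to trade detection rate against false alarms; the abstract does not say how the authors set it.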

    Original language: English
    Title of host publication: 8th International Conference on Spoken Language Processing, ICSLP 2004
    Publisher: International Speech Communication Association
    Pages: 1025-1028
    Number of pages: 4
    Publication status: Published - 2004
    Event: 8th International Conference on Spoken Language Processing, ICSLP 2004 - Jeju, Jeju Island, Korea, Republic of
    Duration: 2004 Oct 4 to 2004 Oct 8

    Fingerprint

    acoustics
    language
    mismatch
    experiment
    vocabulary
    confidence
    Language Model
    Utterance
    Speech Recognition
    Conventional

    ASJC Scopus subject areas

    • Language and Linguistics
    • Linguistics and Language

    Cite this

    Yamamoto, H., Kikui, G., & Sagisaka, Y. (2004). MIS-recognized utterance detection using hierarchical language model. In 8th International Conference on Spoken Language Processing, ICSLP 2004 (pp. 1025-1028). International Speech Communication Association.

    @inproceedings{6f59ebfa7126487da2339f1851dbea04,
    title = "MIS-recognized utterance detection using hierarchical language model",
    author = "Hirofumi Yamamoto and Genichiro Kikui and Yoshinori Sagisaka",
    year = "2004",
    language = "English",
    pages = "1025--1028",
    booktitle = "8th International Conference on Spoken Language Processing, ICSLP 2004",
    publisher = "International Speech Communication Association",

    }

    TY - GEN

    T1 - MIS-recognized utterance detection using hierarchical language model

    AU - Yamamoto, Hirofumi

    AU - Kikui, Genichiro

    AU - Sagisaka, Yoshinori

    PY - 2004

    Y1 - 2004

    UR - http://www.scopus.com/inward/record.url?scp=85009083785&partnerID=8YFLogxK

    UR - http://www.scopus.com/inward/citedby.url?scp=85009083785&partnerID=8YFLogxK

    M3 - Conference contribution

    AN - SCOPUS:85009083785

    SP - 1025

    EP - 1028

    BT - 8th International Conference on Spoken Language Processing, ICSLP 2004

    PB - International Speech Communication Association

    ER -