Investigation of ASR systems for resource-deficient languages

I. Dawa, Yoshinori Sagisaka, Satoshi Nakamura

    Research output: Contribution to journal (Article)

    1 Citation (Scopus)

    Abstract

    Because minority languages in China have distinctive characteristics, the conventional automatic speech recognition (ASR) methods used for major languages such as Chinese, English, and Japanese cannot be adopted directly. In this paper, we take Mongolian, a resource-deficient language, as an example and build acoustic and language models for use with the ATRASR system. We focus in particular on language modeling, taking the special characteristics of Mongolian into account. We trained a multi-class N-gram language model based on similar-word clustering. With the proposed language model, system performance improved by 5.5% compared with the conventional word N-gram model.
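
To illustrate why clustering similar words can help when training text is scarce, the following is a minimal sketch of a plain class-based bigram model. It is not the authors' multi-class N-gram or their clustering procedure: the function name `train_class_bigram`, the toy tokens, and the hand-assigned `word2class` mapping are all assumptions made for illustration. The sketch estimates P(w | w_prev) as P(class(w) | class(w_prev)) * P(w | class(w)), so a word pair never seen in training can still receive a non-zero probability when its classes have been seen together.

```python
from collections import Counter


def train_class_bigram(sentences, word2class):
    """Return a function prob(prev_word, word) for a class-based bigram model.

    Hypothetical helper: classes are given by hand here, whereas a real
    system would derive them by clustering similar words.
    """
    class_bigrams = Counter()   # counts of (previous class, current class)
    history_counts = Counter()  # counts of each class used as a history
    word_counts = Counter()     # counts of each word
    class_counts = Counter()    # counts of each class over all tokens

    for sent in sentences:
        classes = [word2class[w] for w in sent]
        for w, c in zip(sent, classes):
            word_counts[w] += 1
            class_counts[c] += 1
        for prev_c, cur_c in zip(classes, classes[1:]):
            class_bigrams[(prev_c, cur_c)] += 1
            history_counts[prev_c] += 1

    def prob(prev_word, word):
        prev_c, cur_c = word2class[prev_word], word2class[word]
        if history_counts[prev_c] == 0 or class_counts[cur_c] == 0:
            return 0.0
        # P(class | previous class) * P(word | class), unsmoothed
        p_class = class_bigrams[(prev_c, cur_c)] / history_counts[prev_c]
        p_word = word_counts[word] / class_counts[cur_c]
        return p_class * p_word

    return prob


# Toy corpus with placeholder tokens (not real Mongolian data).
sentences = [["a1", "b1"], ["a2", "b1"], ["a1", "b2"]]
word2class = {"a1": "A", "a2": "A", "b1": "B", "b2": "B"}
prob = train_class_bigram(sentences, word2class)

# The pair ("a2", "b2") never occurs in the corpus, so an unsmoothed word
# bigram would assign it zero; the class-based estimate does not.
unseen_pair_probability = prob("a2", "b2")
```

In this toy setting the unseen pair inherits probability mass from the class pair (A, B), which is the general motivation for class-based models on resource-deficient languages; the estimates are unsmoothed, which a practical system would not rely on.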

    Original language: English
    Pages (from-to): 550-557
    Number of pages: 8
    Journal: Zidonghua Xuebao/Acta Automatica Sinica
    Volume: 36
    Issue number: 4
    DOI: 10.3724/SP.J.1004.2010.00550
    Publication status: Published - 2010 Apr

    Fingerprint

    Speech recognition
    Acoustics

    Keywords

    • Agglutinative language
    • Continuous speech recognition
    • Mongolian language
    • Multi-class N-gram model
    • Similar word clustering

    ASJC Scopus subject areas

    • Control and Systems Engineering
    • Software
    • Information Systems
    • Computer Graphics and Computer-Aided Design

    Cite this

    Investigation of ASR systems for resource-deficient languages. / Dawa, I.; Sagisaka, Yoshinori; Nakamura, Satoshi.

    In: Zidonghua Xuebao/ Acta Automatica Sinica, Vol. 36, No. 4, 04.2010, p. 550-557.

    @article{36e53d0cc42d4ce1856996ef0e23e21d,
    title = "Investigation of ASR systems for resource-deficient languages",
    abstract = "Because the minority languages in China have their special characteristics, it is not suitable to directly adopt the traditional automatic speech recognition (ASR) methods which are used for some major languages, such as Chinese, English, Japanese, etc. In this paper, we take Mongolian (a resource-deficient language) as an example and build the acoustic and language models for applying the ATRASR system. In this paper, we specially focus on the language modeling aspect by considering the special characteristics of the Mongolian. We trained a multi-class N-gram language model based on similar word clustering. By applying the proposed language model, the system could improve the performance by 5.5{\%} compared with the conventional word N-gram.",
    keywords = "Agglutinative language, Continuous speech recognition, Mongolian language, Multi-class N-gram model, Similar word clustering",
    author = "I. Dawa and Yoshinori Sagisaka and Satoshi Nakamura",
    year = "2010",
    month = "4",
    doi = "10.3724/SP.J.1004.2010.00550",
    language = "English",
    volume = "36",
    pages = "550--557",
    journal = "Zidonghua Xuebao/Acta Automatica Sinica",
    issn = "0254-4156",
    publisher = "Science Press",
    number = "4",

    }

    TY - JOUR

    T1 - Investigation of ASR systems for resource-deficient languages

    AU - Dawa, I.

    AU - Sagisaka, Yoshinori

    AU - Nakamura, Satoshi

    PY - 2010/4

    Y1 - 2010/4

    AB - Because the minority languages in China have their special characteristics, it is not suitable to directly adopt the traditional automatic speech recognition (ASR) methods which are used for some major languages, such as Chinese, English, Japanese, etc. In this paper, we take Mongolian (a resource-deficient language) as an example and build the acoustic and language models for applying the ATRASR system. In this paper, we specially focus on the language modeling aspect by considering the special characteristics of the Mongolian. We trained a multi-class N-gram language model based on similar word clustering. By applying the proposed language model, the system could improve the performance by 5.5% compared with the conventional word N-gram.

    KW - Agglutinative language

    KW - Continuous speech recognition

    KW - Mongolian language

    KW - Multi-class N-gram model

    KW - Similar word clustering

    UR - http://www.scopus.com/inward/record.url?scp=77952588984&partnerID=8YFLogxK

    UR - http://www.scopus.com/inward/citedby.url?scp=77952588984&partnerID=8YFLogxK

    U2 - 10.3724/SP.J.1004.2010.00550

    DO - 10.3724/SP.J.1004.2010.00550

    M3 - Article

    AN - SCOPUS:77952588984

    VL - 36

    SP - 550

    EP - 557

    JO - Zidonghua Xuebao/Acta Automatica Sinica

    JF - Zidonghua Xuebao/Acta Automatica Sinica

    SN - 0254-4156

    IS - 4

    ER -