Modeling characteristics of agglutinative languages with multi-class language model for ASR system

I. Dawa, Yoshinori Sagisaka, S. Nakamura

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

1 Citation (Scopus)

Abstract

In this paper, we discuss a new language model that takes into account the characteristics of agglutinative languages. We used Mongolian (written in Cyrillic script, as used in Mongolia) as the example language for building the model. We developed a multi-class N-gram language model based on a clustering of similar words that focuses on the variable suffixes of Mongolian words. With the proposed language model, the recognition system built on the ATRASR engine improved performance by 6.85% compared with a conventional word N-gram model. We also confirmed that the new model is convenient for the rapid development of ASR systems for resource-deficient languages, especially agglutinative languages such as Mongolian.
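The abstract gives no formulas, so the following is only an illustrative sketch of the general idea behind class-based N-grams, not the authors' model. It is a plain single-class bigram, P(w | v) = P(C(w) | C(v)) · P(w | C(w)), whereas the paper's multi-class N-gram is more elaborate (it is generally described as giving a word separate classes for its left and right contexts). Classes here are assigned by a hypothetical suffix list, the words are made-up toy strings rather than real Mongolian, and the names `SUFFIXES`, `word_class` and `ClassBigramLM` are hypothetical. Estimates are plain maximum likelihood with no smoothing. The point it shows is that a word pair never seen in training can still receive probability through its class pair, which is why suffix-based classes help when one stem yields many surface forms.

```python
from collections import Counter
from typing import List, Tuple

# Hypothetical suffix inventory for this toy example only; a real system would
# derive classes from morphological analysis or data-driven word clustering.
SUFFIXES: Tuple[str, ...] = ("iin", "aas", "d")


def word_class(word: str) -> str:
    """Assign a word to a class named after its longest matching suffix."""
    for suffix in sorted(SUFFIXES, key=len, reverse=True):
        if word.endswith(suffix) and len(word) > len(suffix):
            return "-" + suffix
    return "<none>"  # words without a listed suffix share one class


class ClassBigramLM:
    """Single-class bigram: P(w | v) = P(C(w) | C(v)) * P(w | C(w)), MLE, no smoothing."""

    def __init__(self, sentences: List[List[str]]) -> None:
        self.class_bigrams: Counter = Counter()  # counts of (C(v), C(w)) pairs
        self.class_history: Counter = Counter()  # counts of C(v) used as history
        self.word_counts: Counter = Counter()    # counts of each predicted word
        self.class_counts: Counter = Counter()   # counts of each predicted class
        for sent in sentences:
            tokens = ["<s>"] + sent
            classes = [self.cls(t) for t in tokens]
            for w, c in zip(tokens[1:], classes[1:]):
                self.word_counts[w] += 1
                self.class_counts[c] += 1
            for prev_c, c in zip(classes, classes[1:]):
                self.class_bigrams[(prev_c, c)] += 1
                self.class_history[prev_c] += 1

    @staticmethod
    def cls(token: str) -> str:
        # The sentence-start marker gets its own class.
        return "<s>" if token == "<s>" else word_class(token)

    def prob(self, word: str, prev: str) -> float:
        c, prev_c = self.cls(word), self.cls(prev)
        if self.class_history[prev_c] == 0 or self.class_counts[c] == 0:
            return 0.0
        p_class = self.class_bigrams[(prev_c, c)] / self.class_history[prev_c]
        p_word = self.word_counts[word] / self.class_counts[c]
        return p_class * p_word


corpus = [["kata", "lumiin", "ropa"], ["ropa", "kataiin"]]
lm = ClassBigramLM(corpus)
# "ropa lumiin" never occurs in the toy corpus, but the class pair (<none>, -iin) does.
print(lm.prob("lumiin", "ropa"))  # 0.5
```

A word-level bigram trained on the same toy corpus would assign zero to "ropa lumiin"; sharing counts across all words ending in the same suffix is what lets the class model cover it.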

Original language: English
Title of host publication: 2009 Oriental COCOSDA International Conference on Speech Database and Assessments, ICSDA 2009
Pages: 104-109
Number of pages: 6
DOI: 10.1109/ICSDA.2009.5278368
Publication status: Published - 2009
Externally published: Yes
Event: 2009 Oriental COCOSDA International Conference on Speech Database and Assessments, ICSDA 2009 - Urumqi
Duration: 2009 Aug 10 to 2009 Aug 12

Other

Conference: 2009 Oriental COCOSDA International Conference on Speech Database and Assessments, ICSDA 2009
City: Urumqi
Period: 2009-08-10 to 2009-08-12

Fingerprint

Language
Modeling
Language Model
Agglutinative Language
Mongolia
Engines
N-gram
Resources
Performance
Cyrillic
Conventional

ASJC Scopus subject areas

  • Language and Linguistics
  • Software
  • Linguistics and Language

Cite this

Dawa, I., Sagisaka, Y., & Nakamura, S. (2009). Modeling characteristics of agglutinative languages with multi-class language model for ASR system. In 2009 Oriental COCOSDA International Conference on Speech Database and Assessments, ICSDA 2009 (pp. 104-109). [5278368] https://doi.org/10.1109/ICSDA.2009.5278368

Modeling characteristics of agglutinative languages with multi-class language model for ASR system. / Dawa, I.; Sagisaka, Yoshinori; Nakamura, S.

2009 Oriental COCOSDA International Conference on Speech Database and Assessments, ICSDA 2009. 2009. p. 104-109 5278368.


Dawa, I, Sagisaka, Y & Nakamura, S 2009, Modeling characteristics of agglutinative languages with multi-class language model for ASR system. in 2009 Oriental COCOSDA International Conference on Speech Database and Assessments, ICSDA 2009, 5278368, pp. 104-109, 2009 Oriental COCOSDA International Conference on Speech Database and Assessments, ICSDA 2009, Urumqi, 09/8/10. https://doi.org/10.1109/ICSDA.2009.5278368
Dawa I, Sagisaka Y, Nakamura S. Modeling characteristics of agglutinative languages with multi-class language model for ASR system. In 2009 Oriental COCOSDA International Conference on Speech Database and Assessments, ICSDA 2009. 2009. p. 104-109. 5278368 https://doi.org/10.1109/ICSDA.2009.5278368
Dawa, I. ; Sagisaka, Yoshinori ; Nakamura, S. / Modeling characteristics of agglutinative languages with multi-class language model for ASR system. 2009 Oriental COCOSDA International Conference on Speech Database and Assessments, ICSDA 2009. 2009. pp. 104-109
@inproceedings{5a9d568a39a3423daf85ad43b585a8a3,
title = "Modeling characteristics of agglutinative languages with multi-class language model for ASR system",
abstract = "In this paper, we discuss a new language model that considers the characteristics of the agglutinative languages. We used Mongolian (a Cyrillic language system used in Mongolia) as an example from which to build the language model. We developed a Multi-class N-gram language model based on similar word clustering that focuses on the variable suffixes of a word in Mongolian. By applying our proposed language model, the resulting recognition system can improve performance by 6.85{\%} compared with a conventional word N-gram when applying the ATRASR engine. We also confirmed that our new model will be convenient for rapid development of an ASR system for resource-deficient languages, especially for agglutinative languages such as Mongolian.",
author = "I. Dawa and Yoshinori Sagisaka and S. Nakamura",
year = "2009",
doi = "10.1109/ICSDA.2009.5278368",
language = "English",
isbn = "9781424444007",
pages = "104--109",
booktitle = "2009 Oriental COCOSDA International Conference on Speech Database and Assessments, ICSDA 2009",

}

TY - GEN

T1 - Modeling characteristics of agglutinative languages with multi-class language model for ASR system

AU - Dawa, I.

AU - Sagisaka, Yoshinori

AU - Nakamura, S.

PY - 2009

Y1 - 2009

N2 - In this paper, we discuss a new language model that considers the characteristics of the agglutinative languages. We used Mongolian (a Cyrillic language system used in Mongolia) as an example from which to build the language model. We developed a Multi-class N-gram language model based on similar word clustering that focuses on the variable suffixes of a word in Mongolian. By applying our proposed language model, the resulting recognition system can improve performance by 6.85% compared with a conventional word N-gram when applying the ATRASR engine. We also confirmed that our new model will be convenient for rapid development of an ASR system for resource-deficient languages, especially for agglutinative languages such as Mongolian.

AB - In this paper, we discuss a new language model that considers the characteristics of the agglutinative languages. We used Mongolian (a Cyrillic language system used in Mongolia) as an example from which to build the language model. We developed a Multi-class N-gram language model based on similar word clustering that focuses on the variable suffixes of a word in Mongolian. By applying our proposed language model, the resulting recognition system can improve performance by 6.85% compared with a conventional word N-gram when applying the ATRASR engine. We also confirmed that our new model will be convenient for rapid development of an ASR system for resource-deficient languages, especially for agglutinative languages such as Mongolian.

UR - http://www.scopus.com/inward/record.url?scp=71249128067&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=71249128067&partnerID=8YFLogxK

U2 - 10.1109/ICSDA.2009.5278368

DO - 10.1109/ICSDA.2009.5278368

M3 - Conference contribution

AN - SCOPUS:71249128067

SN - 9781424444007

SP - 104

EP - 109

BT - 2009 Oriental COCOSDA International Conference on Speech Database and Assessments, ICSDA 2009

ER -