Statistical method of building dialect language models for ASR systems

Naoki Hirayama, Shinsuke Mori, Hiroshi G. Okuno

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

3 Citations (Scopus)

Abstract

This paper develops a new statistical method of building language models (LMs) of Japanese dialects for automatic speech recognition (ASR). One possible application is to recognize a variety of utterances in our daily lives. The most crucial problem in training language models for dialects is the shortage of linguistic corpora in dialects. Our solution is to transform linguistic corpora into dialects at the level of word pronunciations. We develop phoneme-sequence transducers based on weighted finite-state transducers (WFSTs). Each word in common language (CL) corpora is automatically labelled as dialect word pronunciations. For example, anta (Kansai dialect) is labelled anata (the most common representation of 'you' in Japanese). Phoneme-sequence transducers are trained from parallel corpora of a dialect and CL. We evaluate the word recognition accuracy of our ASR system. Our method outperforms the ASR system with LMs trained from untransformed corpora in written language by 9.9 points.
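The idea of a phoneme-sequence transducer can be illustrated with a minimal Python sketch. This is not the authors' implementation: it is a toy single-state weighted transducer that rewrites a CL pronunciation into weighted dialect pronunciations. The function name `transduce`, the rule table `RULES`, and the probability 0.7 are hypothetical, romanized letters stand in for phonemes, and the rule is written by hand, whereas the abstract states that the transducers are trained from parallel corpora.

```python
from collections import defaultdict

# Hypothetical rewrite rules: CL phoneme substring -> [(dialect substring, probability)].
# The values are illustrative only and are not taken from the paper.
RULES = {
    "nat": [("nt", 0.7)],  # lets "anata" surface as "anta"
}


def transduce(cl_pron, rules=RULES):
    """Return {dialect pronunciation: probability} for one CL pronunciation.

    At each input position the transducer either applies a rule that
    matches there or copies one phoneme with the remaining probability.
    """
    # partial[i] maps each output prefix to the probability of having
    # consumed the first i input phonemes while emitting that prefix.
    partial = [defaultdict(float) for _ in range(len(cl_pron) + 1)]
    partial[0][""] = 1.0
    for i in range(len(cl_pron)):
        matching = [
            (src, out, p)
            for src, outs in rules.items()
            if cl_pron.startswith(src, i)
            for out, p in outs
        ]
        copy_prob = 1.0 - sum(p for _, _, p in matching)
        for prefix, prob in partial[i].items():
            # Identity arc: copy the current phoneme unchanged.
            partial[i + 1][prefix + cl_pron[i]] += prob * copy_prob
            # Rewrite arcs: consume the rule's source, emit its dialect output.
            for src, out, p in matching:
                partial[i + len(src)][prefix + out] += prob * p
    return dict(partial[len(cl_pron)])


if __name__ == "__main__":
    for pron, prob in sorted(transduce("anata").items()):
        print(pron, round(prob, 3))
```

Under these assumed rules, the CL pronunciation anata yields both anta and anata as weighted candidates. A pronunciation dictionary built this way keeps the CL word as the label for each dialect pronunciation.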

Original language: English
Title of host publication: 24th International Conference on Computational Linguistics - Proceedings of COLING 2012: Technical Papers
Pages: 1179-1194
Number of pages: 16
Publication status: Published - 2012
Externally published: Yes
Event: 24th International Conference on Computational Linguistics, COLING 2012 - Mumbai
Duration: 2012 Dec 8 to 2012 Dec 15

Other

Other: 24th International Conference on Computational Linguistics, COLING 2012
City: Mumbai
Period: 12/12/8 to 12/12/15

Fingerprint

statistical method
Speech recognition
dialect
Transducers
Statistical methods
Linguistics
language
written language
Automatic Speech Recognition
Language Model
shortage
Linguistic Corpora
Common Language

Keywords

  • Dialect
  • Language model
  • Spoken language
  • Weighted finite-state transducer (WFST)

ASJC Scopus subject areas

  • Computational Theory and Mathematics
  • Language and Linguistics
  • Linguistics and Language

Cite this

Hirayama, N., Mori, S., & Okuno, H. G. (2012). Statistical method of building dialect language models for ASR systems. In 24th International Conference on Computational Linguistics - Proceedings of COLING 2012: Technical Papers (pp. 1179-1194)

@inproceedings{410c0be9308e436ba8acd51f5c33eba8,
title = "Statistical method of building dialect language models for ASR systems",
abstract = "This paper develops a new statistical method of building language models (LMs) of Japanese dialects for automatic speech recognition (ASR). One possible application is to recognize a variety of utterances in our daily lives. The most crucial problem in training language models for dialects is the shortage of linguistic corpora in dialects. Our solution is to transform linguistic corpora into dialects at the level of word pronunciations. We develop phoneme-sequence transducers based on weighted finite-state transducers (WFSTs). Each word in common language (CL) corpora is automatically labelled as dialect word pronunciations. For example, anta (Kansai dialect) is labelled anata (the most common representation of 'you' in Japanese). Phoneme-sequence transducers are trained from parallel corpora of a dialect and CL. We evaluate the word recognition accuracy of our ASR system. Our method outperforms the ASR system with LMs trained from untransformed corpora in written language by 9.9 points.",
keywords = "Dialect, Language model, Spoken language, Weighted finite-state transducer (WFST)",
author = "Naoki Hirayama and Shinsuke Mori and Okuno, {Hiroshi G.}",
year = "2012",
language = "English",
pages = "1179--1194",
booktitle = "24th International Conference on Computational Linguistics - Proceedings of COLING 2012: Technical Papers",

}

TY - GEN

T1 - Statistical method of building dialect language models for ASR systems

AU - Hirayama, Naoki

AU - Mori, Shinsuke

AU - Okuno, Hiroshi G.

PY - 2012

Y1 - 2012

N2 - This paper develops a new statistical method of building language models (LMs) of Japanese dialects for automatic speech recognition (ASR). One possible application is to recognize a variety of utterances in our daily lives. The most crucial problem in training language models for dialects is the shortage of linguistic corpora in dialects. Our solution is to transform linguistic corpora into dialects at the level of word pronunciations. We develop phoneme-sequence transducers based on weighted finite-state transducers (WFSTs). Each word in common language (CL) corpora is automatically labelled as dialect word pronunciations. For example, anta (Kansai dialect) is labelled anata (the most common representation of 'you' in Japanese). Phoneme-sequence transducers are trained from parallel corpora of a dialect and CL. We evaluate the word recognition accuracy of our ASR system. Our method outperforms the ASR system with LMs trained from untransformed corpora in written language by 9.9 points.

KW - Dialect

KW - Language model

KW - Spoken language

KW - Weighted finite-state transducer (WFST)

UR - http://www.scopus.com/inward/record.url?scp=84876815189&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84876815189&partnerID=8YFLogxK

M3 - Conference contribution

AN - SCOPUS:84876815189

SP - 1179

EP - 1194

BT - 24th International Conference on Computational Linguistics - Proceedings of COLING 2012: Technical Papers

ER -