Automatic estimation of dialect mixing ratio for dialect speech recognition

Naoki Hirayama, Koichiro Yoshino, Katsutoshi Itoyama, Shinsuke Mori, Hiroshi G. Okuno

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

1 Citation (Scopus)

Abstract

This paper proposes methods for determining an appropriate mixing ratio of dialects in automatic speech recognition (ASR) for dialects. To handle ASR for various dialects, training a language model on a dialect-mixed corpus has been reported to be effective. One reason behind this is the geographical continuity of spoken dialect; we regard spoken dialect as a mixture of various dialects. This mixing ratio changes from moment to moment and also depends on the speaker. We can improve recognition accuracy by giving an appropriate dialect mixing ratio for a speaker's dialect. The mixing ratio is generally unknown and must be estimated and updated with reference to input utterances. We examine two methods for updating it based on recognition results: one computes the contribution of each dialect to each recognized word, and the other predicts mixture information from a whole recognized sentence based on topic modeling. The experimental results show that the mixing ratio estimated by these methods achieved higher recognition accuracy than a fixed mixing ratio.
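As a rough illustration of the first idea in the abstract (computing each dialect's contribution to every recognized word), the sketch below re-estimates mixture weights with one expectation-maximization style step over toy unigram models. This is not the paper's implementation: the function name `update_mixing_ratio`, the dialect labels `A` and `B`, the word tokens, the probabilities, and the `floor` smoothing value are all assumptions made for the example.

```python
from typing import Dict, List


def update_mixing_ratio(
    words: List[str],
    dialect_lms: Dict[str, Dict[str, float]],
    ratio: Dict[str, float],
    floor: float = 1e-6,
) -> Dict[str, float]:
    """One EM-style re-estimation of dialect mixture weights (hypothetical helper).

    words       -- recognized word sequence
    dialect_lms -- per-dialect unigram probabilities (toy stand-in for a real LM)
    ratio       -- current mixing ratio, one weight per dialect, summing to 1
    floor       -- assumed probability for words missing from a dialect model
    """
    if not words:
        return dict(ratio)  # nothing observed, keep the current ratio

    totals = {d: 0.0 for d in ratio}
    for w in words:
        # Weighted likelihood of this word under each dialect model.
        joint = {d: ratio[d] * dialect_lms[d].get(w, floor) for d in ratio}
        norm = sum(joint.values())
        for d in ratio:
            # Per-word contribution of dialect d (posterior responsibility).
            totals[d] += joint[d] / norm

    # New weight = average contribution over all recognized words.
    return {d: totals[d] / len(words) for d in ratio}


# Toy data: dialect A favors "x", dialect B favors "y" (made-up values).
dialect_lms = {
    "A": {"x": 0.7, "y": 0.3},
    "B": {"x": 0.2, "y": 0.8},
}
ratio = {"A": 0.5, "B": 0.5}
new_ratio = update_mixing_ratio(["x", "x", "y"], dialect_lms, ratio)
print(new_ratio)
```

Because two of the three recognized tokens are more likely under dialect A, the updated weight for A rises above its initial 0.5. In a real system the unigram tables would presumably be replaced by the dialect language models used by the recognizer, and the update would be repeated as new utterances arrive.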

Original language: English
Title of host publication: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
Publisher: International Speech and Communication Association
Pages: 1492-1496
Number of pages: 5
Publication status: Published - 2013
Externally published: Yes
Event: 14th Annual Conference of the International Speech Communication Association, INTERSPEECH 2013 - Lyon, France
Duration: 2013 Aug 25 → 2013 Aug 29

Other

Other: 14th Annual Conference of the International Speech Communication Association, INTERSPEECH 2013
Country: France
City: Lyon
Period: 13/8/25 → 13/8/29

Keywords

  • Dialect
  • Mixing ratio
  • Supervised latent Dirichlet allocation (sLDA)

ASJC Scopus subject areas

  • Language and Linguistics
  • Human-Computer Interaction
  • Signal Processing
  • Software
  • Modelling and Simulation

  • Cite this

    Hirayama, N., Yoshino, K., Itoyama, K., Mori, S., & Okuno, H. G. (2013). Automatic estimation of dialect mixing ratio for dialect speech recognition. In Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH (pp. 1492-1496). International Speech and Communication Association.