Automatic estimation of dialect mixing ratio for dialect speech recognition

Naoki Hirayama, Koichiro Yoshino, Katsutoshi Itoyama, Shinsuke Mori, Hiroshi G. Okuno

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

1 Citation (Scopus)

Abstract

This paper proposes methods for determining an appropriate mixing ratio of dialects in automatic speech recognition (ASR) of dialect speech. Training a language model on a dialect-mixed corpus has been reported to be effective for handling ASR across various dialects. One reason is the geographical continuity of spoken dialects: we regard a spoken dialect as a mixture of various dialects. The mixing ratio changes from moment to moment and also depends on the speaker. Recognition accuracy can be improved by supplying a mixing ratio appropriate to the speaker's dialect. Because the mixing ratio is generally unknown, it must be estimated and updated from the input utterances. We examine two methods for updating it based on recognition results: one computes the contribution of each dialect to each recognized word, and the other predicts mixture information from a whole recognized sentence using topic modeling. The experimental results show that the mixing ratio estimated by these methods yields higher recognition accuracy than a fixed mixing ratio.
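
The abstract does not state the update rule, so the following is only an illustrative sketch of the first idea (a per-word dialect contribution), not the authors' method. It assumes one toy unigram model per dialect and an EM-style responsibility computation, smoothed by interpolation with the previous ratio. The function name `update_mixing_ratio`, the `step` and `floor` parameters, the dialect labels, the vocabulary, and all probabilities are hypothetical.

```python
from typing import Dict, List


def update_mixing_ratio(
    words: List[str],
    dialect_lms: Dict[str, Dict[str, float]],
    ratio: Dict[str, float],
    step: float = 0.5,
    floor: float = 1e-6,
) -> Dict[str, float]:
    """Return an updated dialect mixing ratio given one recognized word sequence.

    Assumes `ratio` has positive entries that sum to 1 (e.g. a uniform start).
    """
    if not words:
        return dict(ratio)
    totals = {d: 0.0 for d in ratio}
    for w in words:
        # Weighted likelihood of the word under each dialect's toy unigram model;
        # unseen words fall back to a small floor probability (an assumption).
        scores = {d: ratio[d] * dialect_lms[d].get(w, floor) for d in ratio}
        z = sum(scores.values())
        for d in ratio:
            totals[d] += scores[d] / z  # contribution of dialect d to this word
    # Average the per-word contributions over the utterance.
    estimate = {d: totals[d] / len(words) for d in ratio}
    # Interpolate with the previous ratio so it changes smoothly across utterances.
    return {d: (1.0 - step) * ratio[d] + step * estimate[d] for d in ratio}


# Hypothetical two-dialect example with made-up word probabilities.
lms = {
    "dialect_a": {"w_a": 0.05, "w_b": 0.001},
    "dialect_b": {"w_a": 0.001, "w_b": 0.05},
}
ratio = {"dialect_a": 0.5, "dialect_b": 0.5}
new_ratio = update_mixing_ratio(["w_a", "w_a"], lms, ratio)
print(new_ratio)
```

In this toy run the recognized words are far more probable under `dialect_a`, so its share of the ratio rises above the uniform starting value while the two shares still sum to one. A real system would use the recognizer's language models rather than unigram tables.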

Original language: English
Title of host publication: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
Publisher: International Speech and Communication Association
Pages: 1492-1496
Number of pages: 5
Publication status: Published - 2013
Externally published: Yes
Event: 14th Annual Conference of the International Speech Communication Association, INTERSPEECH 2013 - Lyon, France
Duration: 2013 Aug 25 to 2013 Aug 29

Keywords

  • Dialect
  • Mixing ratio
  • Supervised latent Dirichlet allocation (sLDA)

ASJC Scopus subject areas

  • Language and Linguistics
  • Human-Computer Interaction
  • Signal Processing
  • Software
  • Modelling and Simulation

Cite this

Hirayama, N., Yoshino, K., Itoyama, K., Mori, S., & Okuno, H. G. (2013). Automatic estimation of dialect mixing ratio for dialect speech recognition. In Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH (pp. 1492-1496). International Speech and Communication Association.

Automatic estimation of dialect mixing ratio for dialect speech recognition. / Hirayama, Naoki; Yoshino, Koichiro; Itoyama, Katsutoshi; Mori, Shinsuke; Okuno, Hiroshi G.

Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH. International Speech and Communication Association, 2013. p. 1492-1496.

Hirayama, N, Yoshino, K, Itoyama, K, Mori, S & Okuno, HG 2013, Automatic estimation of dialect mixing ratio for dialect speech recognition. in Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH. International Speech and Communication Association, pp. 1492-1496, 14th Annual Conference of the International Speech Communication Association, INTERSPEECH 2013, Lyon, France, 13/8/25.
Hirayama N, Yoshino K, Itoyama K, Mori S, Okuno HG. Automatic estimation of dialect mixing ratio for dialect speech recognition. In Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH. International Speech and Communication Association. 2013. p. 1492-1496
Hirayama, Naoki ; Yoshino, Koichiro ; Itoyama, Katsutoshi ; Mori, Shinsuke ; Okuno, Hiroshi G. / Automatic estimation of dialect mixing ratio for dialect speech recognition. Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH. International Speech and Communication Association, 2013. pp. 1492-1496
@inproceedings{bb6f6b8b6d1844cc80a1a7b3c4ba064c,
title = "Automatic estimation of dialect mixing ratio for dialect speech recognition",
abstract = "This paper proposes methods for determining an appropriate mixing ratio of dialects in automatic speech recognition (ASR) for dialects. To handle ASR for various dialects, it has been reported to be effective to train a language model using a dialect-mixed corpus. One reason behind this is geographical continuity of spoken dialect; we regard spoken dialect as a mixture of various dialects. This mixing ratio changes at every moment as well as depends on a speaker. We can improve recognition accuracy by giving an appropriate dialect mixing ratio for a speaker's dialect. The mixing ratio is generally unknown and requires to be estimated and updated referring to input utterances. We handle two methods for updating it based on recognition results; one is to compute contribution of dialects for each recognized word, and the other is to predict mixture information referring to a whole recognized sentence based on topic modeling. The experimental result shows that the mixing ratio estimated by these methods realized higher recognition accuracy than a fixed mixing ratio.",
keywords = "Dialect, Mixing ratio, Supervised latent Dirichlet allocation (sLDA)",
author = "Naoki Hirayama and Koichiro Yoshino and Katsutoshi Itoyama and Shinsuke Mori and Okuno, {Hiroshi G.}",
year = "2013",
language = "English",
pages = "1492--1496",
booktitle = "Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH",
publisher = "International Speech and Communication Association",

}

TY - GEN

T1 - Automatic estimation of dialect mixing ratio for dialect speech recognition

AU - Hirayama, Naoki

AU - Yoshino, Koichiro

AU - Itoyama, Katsutoshi

AU - Mori, Shinsuke

AU - Okuno, Hiroshi G.

PY - 2013

Y1 - 2013

N2 - This paper proposes methods for determining an appropriate mixing ratio of dialects in automatic speech recognition (ASR) for dialects. To handle ASR for various dialects, it has been reported to be effective to train a language model using a dialect-mixed corpus. One reason behind this is geographical continuity of spoken dialect; we regard spoken dialect as a mixture of various dialects. This mixing ratio changes at every moment as well as depends on a speaker. We can improve recognition accuracy by giving an appropriate dialect mixing ratio for a speaker's dialect. The mixing ratio is generally unknown and requires to be estimated and updated referring to input utterances. We handle two methods for updating it based on recognition results; one is to compute contribution of dialects for each recognized word, and the other is to predict mixture information referring to a whole recognized sentence based on topic modeling. The experimental result shows that the mixing ratio estimated by these methods realized higher recognition accuracy than a fixed mixing ratio.

AB - This paper proposes methods for determining an appropriate mixing ratio of dialects in automatic speech recognition (ASR) for dialects. To handle ASR for various dialects, it has been reported to be effective to train a language model using a dialect-mixed corpus. One reason behind this is geographical continuity of spoken dialect; we regard spoken dialect as a mixture of various dialects. This mixing ratio changes at every moment as well as depends on a speaker. We can improve recognition accuracy by giving an appropriate dialect mixing ratio for a speaker's dialect. The mixing ratio is generally unknown and requires to be estimated and updated referring to input utterances. We handle two methods for updating it based on recognition results; one is to compute contribution of dialects for each recognized word, and the other is to predict mixture information referring to a whole recognized sentence based on topic modeling. The experimental result shows that the mixing ratio estimated by these methods realized higher recognition accuracy than a fixed mixing ratio.

KW - Dialect

KW - Mixing ratio

KW - Supervised latent Dirichlet allocation (sLDA)

UR - http://www.scopus.com/inward/record.url?scp=84906247538&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84906247538&partnerID=8YFLogxK

M3 - Conference contribution

AN - SCOPUS:84906247538

SP - 1492

EP - 1496

BT - Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH

PB - International Speech and Communication Association

ER -