Semi-supervised learning of a pronunciation dictionary from disjoint phonemic transcripts and text

Takahiro Shinozaki, Shinji Watanabe, Daichi Mochihashi, Graham Neubig

研究成果: Conference article

2 引用 (Scopus)


While the performance of automatic speech recognition systems has recently approached human levels in some tasks, the application is still limited to specific domains. This is because system development relies on extensive supervised training and expert tuning in the target domain. To solve this problem, systems must become more self-sufficient, having the ability to learn directly from speech and adapt to new tasks. One open question in this area is how to learn a pronunciation dictionary containing the appropriate vocabulary. Humans can recognize words, even ones they have never heard before, by reading text and understanding the context in which a word is used. However, this ability is missing in current speech recognition systems. In this work, we propose a new framework that automatically expands an initial pronunciation dictionary using independently sampled acoustic and textual data. While the task is very challenging and in its initial stage, we demonstrate that a model based on Bayesian learning of Dirichlet processes can acquire word pronunciations from phone transcripts and text of the WSJ data set.

ジャーナルProceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
出版物ステータスPublished - 2017 1 1
イベント18th Annual Conference of the International Speech Communication Association, INTERSPEECH 2017 - Stockholm, Sweden
継続期間: 2017 8 202017 8 24


ASJC Scopus subject areas

  • Language and Linguistics
  • Human-Computer Interaction
  • Signal Processing
  • Software
  • Modelling and Simulation