Semi-supervised learning of a pronunciation dictionary from disjoint phonemic transcripts and text

Takahiro Shinozaki, Shinji Watanabe, Daichi Mochihashi, Graham Neubig

研究成果: Conference article査読

3 被引用数 (Scopus)

抄録

While the performance of automatic speech recognition systems has recently approached human levels in some tasks, the application is still limited to specific domains. This is because system development relies on extensive supervised training and expert tuning in the target domain. To solve this problem, systems must become more self-sufficient, having the ability to learn directly from speech and adapt to new tasks. One open question in this area is how to learn a pronunciation dictionary containing the appropriate vocabulary. Humans can recognize words, even ones they have never heard before, by reading text and understanding the context in which a word is used. However, this ability is missing in current speech recognition systems. In this work, we propose a new framework that automatically expands an initial pronunciation dictionary using independently sampled acoustic and textual data. While the task is very challenging and in its initial stage, we demonstrate that a model based on Bayesian learning of Dirichlet processes can acquire word pronunciations from phone transcripts and text of the WSJ data set.

本文言語English
ページ(範囲)2546-2550
ページ数5
ジャーナルProceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
2017-August
DOI
出版ステータスPublished - 2017
外部発表はい
イベント18th Annual Conference of the International Speech Communication Association, INTERSPEECH 2017 - Stockholm, Sweden
継続期間: 2017 8月 202017 8月 24

ASJC Scopus subject areas

  • 言語および言語学
  • 人間とコンピュータの相互作用
  • 信号処理
  • ソフトウェア
  • モデリングとシミュレーション

フィンガープリント

「Semi-supervised learning of a pronunciation dictionary from disjoint phonemic transcripts and text」の研究トピックを掘り下げます。これらがまとまってユニークなフィンガープリントを構成します。

引用スタイル