Multi-level language modeling and decoding for open vocabulary end-to-end speech recognition

Takaaki Hori, Shinji Watanabe, John R. Hershey

Research output: Conference contribution

24 Citations (Scopus)

Abstract

We propose a combination of character-based and word-based language models in an end-to-end automatic speech recognition (ASR) architecture. In our prior work, we combined a character-based LSTM RNN-LM with a hybrid attention/connectionist temporal classification (CTC) architecture. The character LMs improved recognition accuracy to rival state-of-the-art DNN/HMM systems in Japanese and Mandarin Chinese tasks. Although a character-based architecture can provide for open vocabulary recognition, character-based LMs generally underperform relative to word LMs for languages such as English with a small alphabet, because of the difficulty of modeling linguistic constraints across long sequences of characters. This paper presents a novel method for end-to-end ASR decoding with LMs at both the character and word level. Hypotheses are first scored with the character-based LM until a word boundary is encountered. Known words are then re-scored using the word-based LM, while the character-based LM provides scores for out-of-vocabulary words. In a standard Wall Street Journal (WSJ) task, we achieved 5.6% WER for the Eval'92 test set using only the SI284 training set and WSJ text data, which is the best score reported for end-to-end ASR systems on this benchmark.
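The decoding rule described in the abstract (character-level scoring up to a word boundary, then word-level re-scoring of known words) can be illustrated with a small sketch. This is not the authors' implementation: the uniform character LM, the unigram word LM, the tiny vocabulary, and the names `char_logprob`, `WORD_LOGPROB`, and `score_hypothesis` are all hypothetical stand-ins chosen to keep the example runnable. It also assumes that re-scoring means replacing the character-level score accumulated for a known word with the word-level score, and it omits word-history conditioning and any out-of-vocabulary penalty.

```python
import math

WORD_BOUNDARY = " "

def char_logprob(history, ch):
    """Toy character LM (assumption): uniform over 27 symbols, a-z plus space."""
    return -math.log(27.0)

# Toy word LM (assumption): unigram log-probabilities over a tiny vocabulary.
WORD_LOGPROB = {"the": math.log(0.5), "cat": math.log(0.25)}

def score_hypothesis(text):
    """Return a combined LM log-score for a hypothesis string.

    Characters are scored one by one with the character LM. When a word
    boundary is reached, a known word has its accumulated character-level
    score (including the boundary symbol) swapped for the word LM score.
    An unknown word simply keeps its character-level score.
    """
    total = 0.0
    word_chars = []        # characters of the word currently being spelled
    word_char_score = 0.0  # character LM score accumulated for that word
    for i, ch in enumerate(text):
        lp = char_logprob(text[:i], ch)
        total += lp
        if ch == WORD_BOUNDARY:
            word = "".join(word_chars)
            if word in WORD_LOGPROB:
                # Known word: replace the character-level score with the word-level one.
                total += WORD_LOGPROB[word] - (word_char_score + lp)
            word_chars, word_char_score = [], 0.0
        else:
            word_chars.append(ch)
            word_char_score += lp
    return total

in_vocab = score_hypothesis("the cat ")   # both words scored by the word LM
with_oov = score_hypothesis("the dog ")   # "dog" falls back to the character LM
```

In this sketch a hypothesis that ends in the middle of a word keeps the character-level score for the unfinished word, which is what allows partial hypotheses to be compared during beam search before the next boundary is reached.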

Original language: English
Title of host publication: 2017 IEEE Automatic Speech Recognition and Understanding Workshop, ASRU 2017 - Proceedings
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 287-293
Number of pages: 7
ISBN (electronic): 9781509047888
DOI
Publication status: Published - 24 Jan 2018
Externally published: Yes
Event: 2017 IEEE Automatic Speech Recognition and Understanding Workshop, ASRU 2017 - Okinawa, Japan
Duration: 16 Dec 2017 to 20 Dec 2017

Publication series

Name: 2017 IEEE Automatic Speech Recognition and Understanding Workshop, ASRU 2017 - Proceedings
2018-January

Other

Other: 2017 IEEE Automatic Speech Recognition and Understanding Workshop, ASRU 2017
Country/Territory: Japan
City: Okinawa
Period: 17/12/16 to 17/12/20

ASJC Scopus subject areas

  • Computer Vision and Pattern Recognition
  • Human-Computer Interaction
