Multilingual Sequence-to-Sequence Speech Recognition: Architecture, Transfer Learning, and Language Modeling

Jaejin Cho, Murali Karthick Baskar, Ruizhi Li, Matthew Wiesner, Sri Harish Mallidi, Nelson Yalta, Martin Karafiat, Shinji Watanabe, Takaaki Hori

Research output: Conference contribution

7 Citations (Scopus)

Abstract

The sequence-to-sequence (seq2seq) approach to low-resource ASR is a relatively new direction in speech research. Its advantage is that models can be trained without a lexicon or alignments. However, this comes with a new problem: it requires more data than conventional DNN-HMM systems. In this work, we use data from 10 BABEL languages to build a multilingual seq2seq model as a prior model, and then port it to 4 other BABEL languages using transfer learning. We also explore different architectures for improving the prior multilingual seq2seq model, and discuss the effect of integrating a recurrent neural network language model (RNNLM) with a seq2seq model during decoding. Experimental results show that transfer learning from the multilingual model yields substantial gains over monolingual models across all 4 BABEL languages. Incorporating an RNNLM also brings significant improvements in %WER and achieves recognition performance comparable to that of models trained with twice as much training data.
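The abstract does not say how the RNNLM is combined with the seq2seq model during decoding. One common way to do this is a log-linear interpolation of the two scores (often called shallow fusion). The sketch below is a minimal illustration under that assumption, not the authors' implementation; the function names, the `lm_weight` value, and the toy hypothesis scores are all hypothetical.

```python
def fused_score(s2s_logprob, lm_logprob, lm_weight=0.3):
    """Combine a seq2seq log-probability with an LM log-probability.

    Assumed form: score = log p_s2s + lm_weight * log p_lm.
    lm_weight is a hypothetical tuning parameter.
    """
    return s2s_logprob + lm_weight * lm_logprob


def pick_best(hypotheses, lm_weight=0.3):
    """Return the text of the highest-scoring hypothesis.

    hypotheses: list of (text, s2s_logprob, lm_logprob) tuples.
    """
    best = max(hypotheses, key=lambda h: fused_score(h[1], h[2], lm_weight))
    return best[0]


# Toy scores (made up): the seq2seq model slightly prefers "a",
# while the language model strongly prefers "b".
toy_hypotheses = [("a", -1.0, -10.0), ("b", -1.5, -2.0)]
without_lm = pick_best(toy_hypotheses, lm_weight=0.0)
with_lm = pick_best(toy_hypotheses, lm_weight=0.3)
```

With `lm_weight=0.0` the choice depends on the seq2seq score alone. A positive weight lets the language model overturn a close seq2seq decision. In a real decoder this combination would typically be applied at each beam-search step rather than to complete hypotheses.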

Original language: English
Title of host publication: 2018 IEEE Spoken Language Technology Workshop, SLT 2018 - Proceedings
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 521-527
Number of pages: 7
ISBN (Electronic): 9781538643341
DOI: https://doi.org/10.1109/SLT.2018.8639655
Publication status: Published - Feb 11, 2019
Event: 2018 IEEE Spoken Language Technology Workshop, SLT 2018 - Athens, Greece
Duration: Dec 18, 2018 to Dec 21, 2018

Publication series

Name: 2018 IEEE Spoken Language Technology Workshop, SLT 2018 - Proceedings

Conference

Conference: 2018 IEEE Spoken Language Technology Workshop, SLT 2018
Country: Greece
City: Athens
Period: Dec 18, 2018 to Dec 21, 2018

ASJC Scopus subject areas

  • Computer Vision and Pattern Recognition
  • Human-Computer Interaction
  • Linguistics and Language

  • Cite this

    Cho, J., Baskar, M. K., Li, R., Wiesner, M., Mallidi, S. H., Yalta, N., Karafiat, M., Watanabe, S., & Hori, T. (2019). Multilingual Sequence-to-Sequence Speech Recognition: Architecture, Transfer Learning, and Language Modeling. In 2018 IEEE Spoken Language Technology Workshop, SLT 2018 - Proceedings (pp. 521-527). [8639655] (2018 IEEE Spoken Language Technology Workshop, SLT 2018 - Proceedings). Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/SLT.2018.8639655