Multilingual Sequence-to-Sequence Speech Recognition: Architecture, Transfer Learning, and Language Modeling

Jaejin Cho, Murali Karthick Baskar, Ruizhi Li, Matthew Wiesner, Sri Harish Mallidi, Nelson Yalta, Martin Karafiat, Shinji Watanabe, Takaaki Hori

Research output: Conference contribution

31 Citations (Scopus)

Abstract

The sequence-to-sequence (seq2seq) approach to low-resource ASR is a relatively new direction in speech research. Its main benefit is that models can be trained without a lexicon or alignments. However, it poses a new problem: it requires more data than conventional DNN-HMM systems. In this work, we use data from 10 BABEL languages to build a multilingual seq2seq model as a prior model, and then port it to 4 other BABEL languages using a transfer learning approach. We also explore different architectures for improving the prior multilingual seq2seq model. The paper also discusses the effect of integrating a recurrent neural network language model (RNNLM) with a seq2seq model during decoding. Experimental results show that transfer learning from the multilingual model yields substantial gains over monolingual models across all 4 BABEL languages. Incorporating an RNNLM also brings significant improvements in terms of %WER, and achieves recognition performance comparable to that of models trained with twice as much training data.
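The abstract does not say how the RNNLM is combined with the seq2seq model during decoding. As an illustration only, the sketch below assumes a log-linear combination of the two models' token scores (often called shallow fusion). The function names, the `lm_weight` parameter, and the toy probabilities are hypothetical and are not taken from the paper.

```python
import math


def fuse_scores(s2s_log_probs, lm_log_probs, lm_weight=0.3):
    """Add weighted LM log-probabilities to seq2seq log-probabilities.

    Both arguments map candidate tokens to log-probabilities and are
    assumed to share the same keys. `lm_weight` is a hypothetical
    tuning parameter, not a value reported in the paper.
    """
    return {
        token: s2s_log_probs[token] + lm_weight * lm_log_probs[token]
        for token in s2s_log_probs
    }


def pick_next_token(s2s_log_probs, lm_log_probs, lm_weight=0.3):
    """Return the candidate token with the highest fused score."""
    fused = fuse_scores(s2s_log_probs, lm_log_probs, lm_weight)
    return max(fused, key=fused.get)


# Toy distributions over two candidate tokens (made-up numbers).
s2s = {"a": math.log(0.6), "b": math.log(0.4)}
lm = {"a": math.log(0.1), "b": math.log(0.9)}

choice_without_lm = pick_next_token(s2s, lm, lm_weight=0.0)
choice_with_lm = pick_next_token(s2s, lm, lm_weight=0.3)
```

In this toy case the seq2seq model alone prefers one token, while the fused score prefers the other because the language model strongly disfavors the first. In a real decoder the same combination would typically be applied to every hypothesis at each beam-search step.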

Original language: English
Title of host publication: 2018 IEEE Spoken Language Technology Workshop, SLT 2018 - Proceedings
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 521-527
Number of pages: 7
ISBN (Electronic): 9781538643341
DOI
Publication status: Published - 11 Feb 2019
Externally published: Yes
Event: 2018 IEEE Spoken Language Technology Workshop, SLT 2018 - Athens, Greece
Duration: 18 Dec 2018 to 21 Dec 2018

Publication series

Name: 2018 IEEE Spoken Language Technology Workshop, SLT 2018 - Proceedings

Conference

Conference: 2018 IEEE Spoken Language Technology Workshop, SLT 2018
Country/Territory: Greece
City: Athens
Period: 18/12/18 to 18/12/21

ASJC Scopus subject areas

  • Computer Vision and Pattern Recognition
  • Human-Computer Interaction
  • Linguistics and Language
