Multilingual Sequence-to-Sequence Speech Recognition: Architecture, Transfer Learning, and Language Modeling

Jaejin Cho, Murali Karthick Baskar, Ruizhi Li, Matthew Wiesner, Sri Harish Mallidi, Nelson Yalta, Martin Karafiat, Shinji Watanabe, Takaaki Hori

Research output: Chapter in Book/Report/Conference proceeding (Conference contribution)

6 Citations (Scopus)

Abstract

The sequence-to-sequence (seq2seq) approach to low-resource ASR is a relatively new direction in speech research. Its advantage is that models can be trained without a lexicon or alignments. However, it introduces a new problem: it requires more training data than conventional DNN-HMM systems. In this work, we use data from 10 BABEL languages to build a multilingual seq2seq model as a prior model, and then port it to 4 other BABEL languages using a transfer learning approach. We also explore different architectures for improving the prior multilingual seq2seq model. The paper also discusses the effect of integrating a recurrent neural network language model (RNNLM) with a seq2seq model during decoding. Experimental results show that transfer learning from the multilingual model yields substantial gains over monolingual models across all 4 BABEL languages. Incorporating an RNNLM also brings significant improvements in %WER, and achieves recognition performance comparable to models trained with twice as much training data.
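The abstract does not specify exactly how the RNNLM is combined with the seq2seq model during decoding. As a hedged illustration, the sketch below assumes shallow fusion, a common technique in which every beam-search step adds a weighted LM log-probability to the seq2seq log-probability, i.e. score = log p_s2s + λ log p_lm. The toy models, the token set, and the `lm_weight` parameter (λ) are hypothetical; this is not the authors' implementation.

```python
import math
from typing import Callable, Dict, List, Tuple

# A "model" here is any function mapping a token prefix to a next-token
# probability distribution. Both toy models below are hypothetical.
NextTokenDist = Callable[[Tuple[str, ...]], Dict[str, float]]
EOS = "</s>"


def beam_search_shallow_fusion(
    s2s: NextTokenDist,
    lm: NextTokenDist,
    lm_weight: float,
    beam_size: int = 2,
    max_len: int = 10,
) -> List[Tuple[Tuple[str, ...], float]]:
    """Beam search scoring each token as log p_s2s + lm_weight * log p_lm."""
    beams: List[Tuple[Tuple[str, ...], float]] = [((), 0.0)]
    finished: List[Tuple[Tuple[str, ...], float]] = []
    for _ in range(max_len):
        candidates = []
        for prefix, score in beams:
            p_lm = lm(prefix)
            for tok, p in s2s(prefix).items():
                # Floor the LM probability so tokens unseen by the LM do not hit log(0).
                fused = math.log(p) + lm_weight * math.log(max(p_lm.get(tok, 0.0), 1e-10))
                candidates.append((prefix + (tok,), score + fused))
        # Keep the best `beam_size` expansions; hypotheses ending in EOS are complete.
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = []
        for hyp in candidates[:beam_size]:
            if hyp[0][-1] == EOS:
                finished.append(hyp)
            else:
                beams.append(hyp)
        if not beams:
            break
    finished.extend(beams)  # hypotheses still open when max_len is reached
    return sorted(finished, key=lambda c: c[1], reverse=True)


def toy_s2s(prefix: Tuple[str, ...]) -> Dict[str, float]:
    # Hypothetical seq2seq decoder: slightly prefers "a" as the first token.
    return {"a": 0.5, "b": 0.4, "c": 0.1} if not prefix else {EOS: 1.0}


def toy_lm(prefix: Tuple[str, ...]) -> Dict[str, float]:
    # Hypothetical LM: strongly prefers "b" as the first token.
    return {"a": 0.1, "b": 0.8, "c": 0.1} if not prefix else {EOS: 1.0}


print(beam_search_shallow_fusion(toy_s2s, toy_lm, lm_weight=0.0)[0][0])  # ('a', '</s>')
print(beam_search_shallow_fusion(toy_s2s, toy_lm, lm_weight=0.5)[0][0])  # ('b', '</s>')
```

With `lm_weight=0.0` the decoder's slight preference for "a" wins; with `lm_weight=0.5` the LM's strong preference for "b" overturns it. This is the kind of correction an external LM, trained on text alone, can supply when paired audio data is scarce.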

Original language: English
Title of host publication: 2018 IEEE Spoken Language Technology Workshop, SLT 2018 - Proceedings
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 521-527
Number of pages: 7
ISBN (Electronic): 9781538643341
DOI: https://doi.org/10.1109/SLT.2018.8639655
Publication status: Published - 2019 Feb 11
Event: 2018 IEEE Spoken Language Technology Workshop, SLT 2018 - Athens, Greece
Duration: 2018 Dec 18 to 2018 Dec 21

Publication series

Name: 2018 IEEE Spoken Language Technology Workshop, SLT 2018 - Proceedings

Conference

Conference: 2018 IEEE Spoken Language Technology Workshop, SLT 2018
Country: Greece
City: Athens
Period: 18/12/18 to 18/12/21


Keywords

  • Automatic speech recognition (ASR)
  • language modeling
  • multilingual setup
  • sequence to sequence
  • transfer learning

ASJC Scopus subject areas

  • Computer Vision and Pattern Recognition
  • Human-Computer Interaction
  • Linguistics and Language

Cite this

Cho, J., Baskar, M. K., Li, R., Wiesner, M., Mallidi, S. H., Yalta, N., Karafiat, M., Watanabe, S., & Hori, T. (2019). Multilingual Sequence-to-Sequence Speech Recognition: Architecture, Transfer Learning, and Language Modeling. In 2018 IEEE Spoken Language Technology Workshop, SLT 2018 - Proceedings (pp. 521-527). [8639655] (2018 IEEE Spoken Language Technology Workshop, SLT 2018 - Proceedings). Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/SLT.2018.8639655