TY - GEN
T1 - Multilingual Sequence-to-Sequence Speech Recognition
T2 - 2018 IEEE Spoken Language Technology Workshop, SLT 2018
AU - Cho, Jaejin
AU - Baskar, Murali Karthick
AU - Li, Ruizhi
AU - Wiesner, Matthew
AU - Mallidi, Sri Harish
AU - Yalta, Nelson
AU - Karafiat, Martin
AU - Watanabe, Shinji
AU - Hori, Takaaki
N1 - Publisher Copyright:
© 2018 IEEE.
Copyright:
Copyright 2019 Elsevier B.V., All rights reserved.
PY - 2019/2/11
Y1 - 2019/2/11
N2 - The sequence-to-sequence (seq2seq) approach to low-resource ASR is a relatively new direction in speech research. The approach benefits from training models without a lexicon or alignments. However, this poses a new problem of requiring more data than conventional DNN-HMM systems. In this work, we use data from 10 BABEL languages to build a multilingual seq2seq model as a prior model, and then port it to 4 other BABEL languages using a transfer learning approach. We also explore different architectures for improving the prior multilingual seq2seq model. The paper also discusses the effect of integrating a recurrent neural network language model (RNNLM) with a seq2seq model during decoding. Experimental results show that transfer learning from the multilingual model yields substantial gains over monolingual models across all 4 BABEL languages. Incorporating an RNNLM also brings significant improvements in %WER and achieves recognition performance comparable to models trained with twice as much training data.
AB - The sequence-to-sequence (seq2seq) approach to low-resource ASR is a relatively new direction in speech research. The approach benefits from training models without a lexicon or alignments. However, this poses a new problem of requiring more data than conventional DNN-HMM systems. In this work, we use data from 10 BABEL languages to build a multilingual seq2seq model as a prior model, and then port it to 4 other BABEL languages using a transfer learning approach. We also explore different architectures for improving the prior multilingual seq2seq model. The paper also discusses the effect of integrating a recurrent neural network language model (RNNLM) with a seq2seq model during decoding. Experimental results show that transfer learning from the multilingual model yields substantial gains over monolingual models across all 4 BABEL languages. Incorporating an RNNLM also brings significant improvements in %WER and achieves recognition performance comparable to models trained with twice as much training data.
KW - Automatic speech recognition (ASR)
KW - language modeling
KW - multilingual setup
KW - sequence to sequence
KW - transfer learning
UR - http://www.scopus.com/inward/record.url?scp=85063077624&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85063077624&partnerID=8YFLogxK
U2 - 10.1109/SLT.2018.8639655
DO - 10.1109/SLT.2018.8639655
M3 - Conference contribution
AN - SCOPUS:85063077624
T3 - 2018 IEEE Spoken Language Technology Workshop, SLT 2018 - Proceedings
SP - 521
EP - 527
BT - 2018 IEEE Spoken Language Technology Workshop, SLT 2018 - Proceedings
PB - Institute of Electrical and Electronics Engineers Inc.
Y2 - 18 December 2018 through 21 December 2018
ER -