TY - JOUR
T1 - Advances in joint CTC-attention based end-to-end speech recognition with a deep CNN encoder and RNN-LM
AU - Hori, Takaaki
AU - Watanabe, Shinji
AU - Zhang, Yu
AU - Chan, William
N1 - Publisher Copyright: Copyright © 2017 ISCA.
PY - 2017
Y1 - 2017
N2 - We present a state-of-the-art end-to-end Automatic Speech Recognition (ASR) model. We learn to listen and write characters with a joint Connectionist Temporal Classification (CTC) and attention-based encoder-decoder network. The encoder is a deep Convolutional Neural Network (CNN) based on the VGG network. The CTC network sits on top of the encoder and is jointly trained with the attention-based decoder. During the beam search process, we combine the CTC predictions, the attention-based decoder predictions and a separately trained LSTM language model. We achieve a 5-10% error reduction compared to prior systems on spontaneous Japanese and Chinese speech, and our end-to-end model beats out traditional hybrid ASR systems.
KW - Attention model
KW - Connectionist temporal classification
KW - Encoder-decoder
KW - End-to-end speech recognition
UR - http://www.scopus.com/inward/record.url?scp=85039169903&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85039169903&partnerID=8YFLogxK
U2 - 10.21437/Interspeech.2017-1296
DO - 10.21437/Interspeech.2017-1296
M3 - Conference article
AN - SCOPUS:85039169903
VL - 2017-August
SP - 949
EP - 953
JO - Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
JF - Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
SN - 2308-457X
T2 - 18th Annual Conference of the International Speech Communication Association, INTERSPEECH 2017
Y2 - 20 August 2017 through 24 August 2017
ER -