TY - GEN
T1 - Acoustic Modeling for Overlapping Speech Recognition
T2 - 44th IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2019
AU - Manohar, Vimal
AU - Chen, Szu Jui
AU - Wang, Zhiqi
AU - Fujita, Y.
AU - Watanabe, Shinji
AU - Khudanpur, Sanjeev
N1 - Funding Information:
This work was partially supported by NSF Grant No. CRI-1513128 and IARPA MATERIAL award number FA8650-17-C-9115. Vimal Manohar was supported by an Alexa Graduate Fellowship.
Publisher Copyright:
© 2019 IEEE.
PY - 2019/5
Y1 - 2019/5
N2 - This paper summarizes our acoustic modeling efforts in the Johns Hopkins University speech recognition system for the CHiME-5 challenge to recognize highly overlapped dinner party speech recorded by multiple microphone arrays. We explore data augmentation approaches, neural network architectures, front-end speech dereverberation, beamforming, and robust i-vector extraction, comparing our in-house implementations with publicly available tools. We finally achieved a word error rate of 69.4% on the development set, which is an 11.7% absolute improvement over the previous baseline of 81.1%, and release this improved baseline with refined techniques/tools as an advanced CHiME-5 recipe.
AB - This paper summarizes our acoustic modeling efforts in the Johns Hopkins University speech recognition system for the CHiME-5 challenge to recognize highly overlapped dinner party speech recorded by multiple microphone arrays. We explore data augmentation approaches, neural network architectures, front-end speech dereverberation, beamforming, and robust i-vector extraction, comparing our in-house implementations with publicly available tools. We finally achieved a word error rate of 69.4% on the development set, which is an 11.7% absolute improvement over the previous baseline of 81.1%, and release this improved baseline with refined techniques/tools as an advanced CHiME-5 recipe.
KW - CHiME-5 challenge
KW - Kaldi
KW - Robust speech recognition
KW - acoustic modeling
UR - http://www.scopus.com/inward/record.url?scp=85068973715&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85068973715&partnerID=8YFLogxK
U2 - 10.1109/ICASSP.2019.8682556
DO - 10.1109/ICASSP.2019.8682556
M3 - Conference contribution
AN - SCOPUS:85068973715
T3 - ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
SP - 6665
EP - 6669
BT - 2019 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2019 - Proceedings
PB - Institute of Electrical and Electronics Engineers Inc.
Y2 - 12 May 2019 through 17 May 2019
ER -