Acoustic Modeling for Overlapping Speech Recognition

JHU CHiME-5 Challenge System

Vimal Manohar, Szu Jui Chen, Zhiqi Wang, Y. Fujita, Shinji Watanabe, Sanjeev Khudanpur

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

Abstract

This paper summarizes our acoustic modeling efforts in the Johns Hopkins University speech recognition system for the CHiME-5 challenge, whose task is to recognize highly overlapped dinner party speech recorded by multiple microphone arrays. We explore data augmentation approaches, neural network architectures, front-end speech dereverberation, beamforming, and robust i-vector extraction, comparing our in-house implementations with publicly available tools. Our final system achieves a word error rate of 69.4% on the development set, an 11.7% absolute improvement over the previous baseline of 81.1%. We release this improved baseline, with its refined techniques and tools, as an advanced CHiME-5 recipe.
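The abstract reports the gain as an absolute improvement, that is, a difference in percentage points. As a minimal sketch of that arithmetic, the snippet below recomputes the absolute figure from the two word error rates quoted above and also derives the corresponding relative reduction. The relative figure is not stated in the abstract, and the function name `wer_improvement` is a hypothetical helper introduced only for illustration.

```python
def wer_improvement(baseline_wer: float, new_wer: float) -> tuple[float, float]:
    """Return (absolute, relative) WER reduction, both expressed in percent.

    Hypothetical helper for illustration; not part of the released recipe.
    """
    if baseline_wer <= 0:
        raise ValueError("baseline_wer must be positive")
    absolute = baseline_wer - new_wer           # difference in percentage points
    relative = 100.0 * absolute / baseline_wer  # reduction as a share of the baseline
    return absolute, relative


# WER values taken from the abstract: 81.1% baseline, 69.4% final system.
absolute, relative = wer_improvement(81.1, 69.4)
print(f"absolute: {absolute:.1f} points, relative: {relative:.1f}%")
# prints "absolute: 11.7 points, relative: 14.4%"
```

Reporting both numbers is a common convention in speech recognition, since an 11.7-point absolute gain reads differently against an 81.1% baseline than it would against a low-error baseline.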

Original language: English
Title of host publication: 2019 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2019 - Proceedings
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 6665-6669
Number of pages: 5
ISBN (Electronic): 9781479981311
DOI: https://doi.org/10.1109/ICASSP.2019.8682556
Publication status: Published - 2019 May 1
Event: 44th IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2019 - Brighton, United Kingdom
Duration: 2019 May 12 to 2019 May 17

Publication series

Name: ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
Volume: 2019-May
ISSN (Print): 1520-6149

Conference

Conference: 44th IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2019
Country: United Kingdom
City: Brighton
Period: 19/5/12 to 19/5/17

Fingerprint

Speech recognition
Acoustics
Microphones
Beamforming
Network architecture
Neural networks

Keywords

  • acoustic modeling
  • CHiME-5 challenge
  • Kaldi
  • Robust speech recognition

ASJC Scopus subject areas

  • Software
  • Signal Processing
  • Electrical and Electronic Engineering

Cite this

Manohar, V., Chen, S. J., Wang, Z., Fujita, Y., Watanabe, S., & Khudanpur, S. (2019). Acoustic Modeling for Overlapping Speech Recognition: JHU CHiME-5 Challenge System. In 2019 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2019 - Proceedings (pp. 6665-6669). [8682556] (ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings; Vol. 2019-May). Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/ICASSP.2019.8682556

@inproceedings{713e7ce29b94489e834379ea7d7eba95,
title = "Acoustic Modeling for Overlapping Speech Recognition: {JHU} {CHiME}-5 Challenge System",
abstract = "This paper summarizes our acoustic modeling efforts in the Johns Hopkins University speech recognition system for the CHiME-5 challenge, whose task is to recognize highly overlapped dinner party speech recorded by multiple microphone arrays. We explore data augmentation approaches, neural network architectures, front-end speech dereverberation, beamforming, and robust i-vector extraction, comparing our in-house implementations with publicly available tools. Our final system achieves a word error rate of 69.4{\%} on the development set, an 11.7{\%} absolute improvement over the previous baseline of 81.1{\%}. We release this improved baseline, with its refined techniques and tools, as an advanced CHiME-5 recipe.",
keywords = "acoustic modeling, CHiME-5 challenge, Kaldi, Robust speech recognition",
author = "Vimal Manohar and Chen, {Szu Jui} and Zhiqi Wang and Y. Fujita and Shinji Watanabe and Sanjeev Khudanpur",
year = "2019",
month = "5",
day = "1",
doi = "10.1109/ICASSP.2019.8682556",
language = "English",
series = "ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings",
publisher = "Institute of Electrical and Electronics Engineers Inc.",
pages = "6665--6669",
booktitle = "2019 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2019 - Proceedings",

}

TY - GEN

T1 - Acoustic Modeling for Overlapping Speech Recognition

T2 - JHU CHiME-5 Challenge System

AU - Manohar, Vimal

AU - Chen, Szu Jui

AU - Wang, Zhiqi

AU - Fujita, Y.

AU - Watanabe, Shinji

AU - Khudanpur, Sanjeev

PY - 2019/5/1

Y1 - 2019/5/1

N2 - This paper summarizes our acoustic modeling efforts in the Johns Hopkins University speech recognition system for the CHiME-5 challenge, whose task is to recognize highly overlapped dinner party speech recorded by multiple microphone arrays. We explore data augmentation approaches, neural network architectures, front-end speech dereverberation, beamforming, and robust i-vector extraction, comparing our in-house implementations with publicly available tools. Our final system achieves a word error rate of 69.4% on the development set, an 11.7% absolute improvement over the previous baseline of 81.1%. We release this improved baseline, with its refined techniques and tools, as an advanced CHiME-5 recipe.

KW - acoustic modeling

KW - CHiME-5 challenge

KW - Kaldi

KW - Robust speech recognition

UR - http://www.scopus.com/inward/record.url?scp=85068973715&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85068973715&partnerID=8YFLogxK

U2 - 10.1109/ICASSP.2019.8682556

DO - 10.1109/ICASSP.2019.8682556

M3 - Conference contribution

T3 - ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings

SP - 6665

EP - 6669

BT - 2019 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2019 - Proceedings

PB - Institute of Electrical and Electronics Engineers Inc.

ER -