Recurrent deep neural networks for robust speech recognition

Chao Weng, Dong Yu, Shinji Watanabe, Biing Hwang Fred Juang

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

76 Citations (Scopus)

Abstract

In this work, we propose recurrent deep neural networks (DNNs) for robust automatic speech recognition (ASR). Full recurrent connections are added to a hidden layer of a conventional feedforward DNN, allowing the model to capture temporal dependencies in the deep representations. A new backpropagation through time (BPTT) algorithm is introduced to make minibatch stochastic gradient descent (SGD) on the proposed recurrent DNNs more efficient and effective. We evaluate the proposed recurrent DNN architecture under the hybrid setup on both the 2nd CHiME challenge (track 2) and the Aurora-4 task. Experimental results on the CHiME challenge data show that the proposed system obtains consistent 7% relative WER improvements over DNN systems, achieving state-of-the-art performance without front-end preprocessing, speaker-adaptive training, or multiple decoding passes. In the experiments on Aurora-4, the proposed system achieves a 4% relative WER improvement over a strong DNN baseline system.
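The architecture the abstract describes, a feedforward DNN with full recurrent connections at one hidden layer, can be sketched roughly as follows. This is a minimal NumPy illustration under assumed shapes; the names (`W_in`, `W_rec`, `U`, `W_out`) and the single-layer stacks above and below the recurrent layer are hypothetical simplifications, not the authors' implementation or their BPTT variant:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def recurrent_dnn_forward(x_seq, W_in, W_rec, U, W_out, h0=None):
    """Forward pass of a feedforward DNN whose middle hidden layer
    carries full recurrent connections (hypothetical shapes/names).

    x_seq : (T, d_in)     sequence of input feature frames
    W_in  : (d_1, d_in)   lower feedforward layer
    W_rec : (d_h, d_1)    feedforward weights into the recurrent layer
    U     : (d_h, d_h)    full recurrent weight matrix
    W_out : (d_out, d_h)  output layer (logits; softmax omitted)
    """
    T = x_seq.shape[0]
    d_h = W_rec.shape[0]
    h = np.zeros(d_h) if h0 is None else h0
    outputs = []
    for t in range(T):
        # lower feedforward transform of the current frame
        z = sigmoid(W_in @ x_seq[t])
        # recurrent hidden layer: mixes the current activation
        # with the previous time step's hidden state
        h = sigmoid(W_rec @ z + U @ h)
        # output layer (a single linear map here for brevity)
        outputs.append(W_out @ h)
    return np.stack(outputs)  # (T, d_out)
```

Training such a model with minibatch SGD requires unrolling the recurrent layer over time and backpropagating through the unrolled graph (BPTT); the paper's contribution is a variant of that procedure tailored to minibatching.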

Original language: English
Title of host publication: 2014 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2014
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 5532-5536
Number of pages: 5
ISBN (Print): 9781479928927
DOI: 10.1109/ICASSP.2014.6854661
Publication status: Published - 2014
Externally published: Yes
Event: 2014 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2014 - Florence
Duration: 2014 May 4 - 2014 May 9



Keywords

  • Aurora-4
  • CHiME
  • DNN
  • RNN
  • robust ASR

ASJC Scopus subject areas

  • Signal Processing
  • Software
  • Electrical and Electronic Engineering

Cite this

Weng, C., Yu, D., Watanabe, S., & Juang, B. H. F. (2014). Recurrent deep neural networks for robust speech recognition. In 2014 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2014 (pp. 5532-5536). [6854661] Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/ICASSP.2014.6854661

