Deep recurrent de-noising auto-encoder and blind de-reverberation for reverberated speech recognition

Felix Weninger, Shinji Watanabe, Yuuki Tachioka, Björn Schuller

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

43 Citations (Scopus)

Abstract

This paper describes our joint efforts to provide robust automatic speech recognition (ASR) for reverberated environments, such as in hands-free human-machine interaction. We investigate blind feature space de-reverberation and deep recurrent de-noising auto-encoders (DAE) in an early fusion scheme. Results on the 2014 REVERB Challenge development set indicate that the DAE front-end provides performance gains complementary to multi-condition training, feature transformations, and model adaptation. The proposed ASR system achieves word error rates of 17.62% and 36.6% on simulated and real data, respectively, a significant improvement over the Challenge baseline (25.16% and 47.2%).
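
The record gives no implementation details beyond the abstract, but a small sketch may help picture the kind of front-end it describes: a deep recurrent network trained to map reverberated feature frames to their clean counterparts. The PyTorch code below is purely illustrative; the class name RecurrentDAE and all hyper-parameters (feature dimension, layer sizes, learning rate) are assumptions, not taken from the paper.

```python
# Minimal sketch of a deep recurrent de-noising auto-encoder (DAE) for
# feature enhancement: regress reverberated feature frames onto parallel
# clean-speech frames. All sizes and names are illustrative assumptions.
import torch
import torch.nn as nn

class RecurrentDAE(nn.Module):
    def __init__(self, feat_dim: int = 40, hidden: int = 256, layers: int = 2):
        super().__init__()
        # Bidirectional LSTM stack models the temporal context of the utterance.
        self.rnn = nn.LSTM(feat_dim, hidden, num_layers=layers,
                           batch_first=True, bidirectional=True)
        # Linear layer projects the recurrent states back to feature space.
        self.out = nn.Linear(2 * hidden, feat_dim)

    def forward(self, noisy: torch.Tensor) -> torch.Tensor:
        # noisy: (batch, frames, feat_dim) reverberated features
        h, _ = self.rnn(noisy)
        return self.out(h)  # enhanced ("de-noised") features

# One training step: multi-condition data provides noisy/clean pairs.
model = RecurrentDAE()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
noisy = torch.randn(8, 100, 40)   # dummy reverberated batch
clean = torch.randn(8, 100, 40)   # dummy clean targets
optimizer.zero_grad()
loss = nn.functional.mse_loss(model(noisy), clean)
loss.backward()
optimizer.step()
```

In the early fusion scheme the abstract mentions, the enhanced features produced by such a network would presumably be combined with blindly de-reverberated features before being passed to the recognizer; the sketch above covers only the auto-encoder itself.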

Original language: English
Title of host publication: 2014 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2014
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 4623-4627
Number of pages: 5
ISBN (Print): 9781479928927
DOIs: 10.1109/ICASSP.2014.6854478
Publication status: Published - 2014
Externally published: Yes
Event: 2014 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2014 - Florence
Duration: 2014 May 4 - 2014 May 9



Keywords

  • automatic speech recognition
  • de-reverberation
  • feature enhancement
  • recurrent neural networks

ASJC Scopus subject areas

  • Signal Processing
  • Software
  • Electrical and Electronic Engineering

Cite this

Weninger, F., Watanabe, S., Tachioka, Y., & Schuller, B. (2014). Deep recurrent de-noising auto-encoder and blind de-reverberation for reverberated speech recognition. In 2014 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2014 (pp. 4623-4627). [6854478] Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/ICASSP.2014.6854478
