Robust speech recognition in unknown reverberant and noisy conditions

Roger Hsiao, Jeff Ma, William Hartmann, Martin Karafiát, František Grézl, Lukáš Burget, Igor Szöke, Jan Honza Černocky, Shinji Watanabe, Zhuo Chen, Sri Harish Mallidi, Hynek Hermansky, Stavros Tsakalidis, Richard Schwartz

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

12 Citations (Scopus)

Abstract

In this paper, we describe our work on the ASpIRE (Automatic Speech recognition In Reverberant Environments) challenge, which aims to assess the robustness of automatic speech recognition (ASR) systems. The main characteristic of the challenge is developing a high-performance system without access to matched training and development data. While the evaluation data are recorded with far-field microphones in noisy and reverberant rooms, the training data are telephone speech and close talking. Our approach to this challenge includes speech enhancement, neural network methods and acoustic model adaptation. We show that these techniques can successfully alleviate the performance degradation due to noisy audio and data mismatch.

Original language: English
Title of host publication: 2015 IEEE Workshop on Automatic Speech Recognition and Understanding, ASRU 2015 - Proceedings
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 533-538
Number of pages: 6
ISBN (Electronic): 9781479972913
DOI: https://doi.org/10.1109/ASRU.2015.7404841
Publication status: Published - 2016 Feb 10
Externally published: Yes
Event: IEEE Workshop on Automatic Speech Recognition and Understanding, ASRU 2015 - Scottsdale, United States
Duration: 2015 Dec 13 to 2015 Dec 17

Other

Other: IEEE Workshop on Automatic Speech Recognition and Understanding, ASRU 2015
Country: United States
City: Scottsdale
Period: 15/12/13 to 15/12/17

Fingerprint

Speech recognition
Speech enhancement
Microphones
Telephone
Acoustics
Neural networks
Degradation

Keywords

  • ASpIRE challenge
  • robust speech recognition

ASJC Scopus subject areas

  • Artificial Intelligence
  • Computer Networks and Communications
  • Computer Vision and Pattern Recognition

Cite this

Hsiao, R., Ma, J., Hartmann, W., Karafiát, M., Grézl, F., Burget, L., ... Schwartz, R. (2016). Robust speech recognition in unknown reverberant and noisy conditions. In 2015 IEEE Workshop on Automatic Speech Recognition and Understanding, ASRU 2015 - Proceedings (pp. 533-538). [7404841] Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/ASRU.2015.7404841

@inproceedings{d590dd9ca8a3478fbb350b25040f743c,
title = "Robust speech recognition in unknown reverberant and noisy conditions",
abstract = "In this paper, we describe our work on the ASpIRE (Automatic Speech recognition In Reverberant Environments) challenge, which aims to assess the robustness of automatic speech recognition (ASR) systems. The main characteristic of the challenge is developing a high-performance system without access to matched training and development data. While the evaluation data are recorded with far-field microphones in noisy and reverberant rooms, the training data are telephone speech and close talking. Our approach to this challenge includes speech enhancement, neural network methods and acoustic model adaptation. We show that these techniques can successfully alleviate the performance degradation due to noisy audio and data mismatch.",
keywords = "ASpIRE challenge, robust speech recognition",
author = "Roger Hsiao and Jeff Ma and William Hartmann and Martin Karafi{\'a}t and František Gr{\'e}zl and Luk{\'a}š Burget and Igor Sz{\"o}ke and Černocky, {Jan Honza} and Shinji Watanabe and Zhuo Chen and Mallidi, {Sri Harish} and Hynek Hermansky and Stavros Tsakalidis and Richard Schwartz",
year = "2016",
month = "2",
day = "10",
doi = "10.1109/ASRU.2015.7404841",
language = "English",
pages = "533--538",
booktitle = "2015 IEEE Workshop on Automatic Speech Recognition and Understanding, ASRU 2015 - Proceedings",
publisher = "Institute of Electrical and Electronics Engineers Inc.",
address = "United States",

}

TY - GEN

T1 - Robust speech recognition in unknown reverberant and noisy conditions

AU - Hsiao, Roger

AU - Ma, Jeff

AU - Hartmann, William

AU - Karafiát, Martin

AU - Grézl, František

AU - Burget, Lukáš

AU - Szöke, Igor

AU - Černocky, Jan Honza

AU - Watanabe, Shinji

AU - Chen, Zhuo

AU - Mallidi, Sri Harish

AU - Hermansky, Hynek

AU - Tsakalidis, Stavros

AU - Schwartz, Richard

PY - 2016/2/10

Y1 - 2016/2/10

N2 - In this paper, we describe our work on the ASpIRE (Automatic Speech recognition In Reverberant Environments) challenge, which aims to assess the robustness of automatic speech recognition (ASR) systems. The main characteristic of the challenge is developing a high-performance system without access to matched training and development data. While the evaluation data are recorded with far-field microphones in noisy and reverberant rooms, the training data are telephone speech and close talking. Our approach to this challenge includes speech enhancement, neural network methods and acoustic model adaptation. We show that these techniques can successfully alleviate the performance degradation due to noisy audio and data mismatch.

AB - In this paper, we describe our work on the ASpIRE (Automatic Speech recognition In Reverberant Environments) challenge, which aims to assess the robustness of automatic speech recognition (ASR) systems. The main characteristic of the challenge is developing a high-performance system without access to matched training and development data. While the evaluation data are recorded with far-field microphones in noisy and reverberant rooms, the training data are telephone speech and close talking. Our approach to this challenge includes speech enhancement, neural network methods and acoustic model adaptation. We show that these techniques can successfully alleviate the performance degradation due to noisy audio and data mismatch.

KW - ASpIRE challenge

KW - robust speech recognition

UR - http://www.scopus.com/inward/record.url?scp=84964470918&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84964470918&partnerID=8YFLogxK

U2 - 10.1109/ASRU.2015.7404841

DO - 10.1109/ASRU.2015.7404841

M3 - Conference contribution

SP - 533

EP - 538

BT - 2015 IEEE Workshop on Automatic Speech Recognition and Understanding, ASRU 2015 - Proceedings

PB - Institute of Electrical and Electronics Engineers Inc.

ER -