Effectiveness of dereverberation, feature transformation, discriminative training methods, and system combination approach for various reverberant environments

Yuuki Tachioka, Tomohiro Narita, Shinji Watanabe

Research output: Contribution to journal (Article)

Abstract

The recently released REverberant Voice Enhancement and Recognition Benchmark (REVERB) challenge includes a reverberant automatic speech recognition (ASR) task. This paper describes our proposed system based on multichannel speech enhancement preprocessing and state-of-the-art ASR techniques. For preprocessing, we propose a single-channel dereverberation method with reverberation time estimation, which is combined with multichannel beamforming that enhances the direct sound relative to the reflected sound. In addition, this paper focuses on state-of-the-art ASR techniques such as discriminative training of acoustic models, including Gaussian mixture models, subspace Gaussian mixture models, and deep neural networks, as well as various feature transformation techniques. Although the REVERB challenge requires handling various acoustic environments, a single ASR system tends to be overly tuned to a specific environment, which degrades its performance in mismatched environments. To overcome this mismatch problem, we use a system combination approach with multiple ASR systems that use different features and different model types, because combining systems with different error patterns is beneficial. In particular, we use our discriminative training technique for system combination, which achieves better generalization by making the systems complementary through modified discriminative criteria. Experiments show the effectiveness of these approaches, reaching word error rates of 6.76 and 18.60 % on the REVERB simulated and real test sets, respectively. These are relative improvements of 68.8 and 61.5 % over the baseline.
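The abstract reports word error rates (WERs) of 6.76 and 18.60 % together with relative improvements of 68.8 and 61.5 % over the baseline, but this record does not state the baseline WERs themselves. Assuming the conventional definition, relative improvement = (baseline WER - system WER) / baseline WER, the baselines can be back-computed. The Python snippet here is a minimal sketch of that arithmetic only; the helper names `relative_reduction` and `implied_baseline` are hypothetical and are not taken from the paper.

```python
# Sketch: back-compute the baseline WERs implied by the abstract's figures.
# ASSUMPTION: "relative improvement" = (baseline - system) / baseline.
# The baseline values are not given in this record; they are derived here.


def relative_reduction(baseline_wer: float, system_wer: float) -> float:
    """Fractional WER reduction of system_wer relative to baseline_wer."""
    return (baseline_wer - system_wer) / baseline_wer


def implied_baseline(system_wer: float, rel_reduction: float) -> float:
    """Baseline WER consistent with a system WER and a relative reduction."""
    return system_wer / (1.0 - rel_reduction)


# Figures from the abstract: (system WER in %, relative improvement as a fraction).
reported = {
    "simulated": (6.76, 0.688),
    "real": (18.60, 0.615),
}

for name, (wer, rel) in reported.items():
    base = implied_baseline(wer, rel)
    # Sanity check: plugging the implied baseline back in recovers the ratio.
    check = relative_reduction(base, wer)
    print(f"{name}: implied baseline WER ~ {base:.2f} %, relative reduction {check:.3f}")
```

Under this assumption the implied baselines are roughly 21.7 % (simulated) and 48.3 % (real). Because the reported relative improvements are rounded to 0.1 %, these back-computed values are only accurate to within about 0.1 percentage point.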

Original language: English
Article number: 52
Journal: EURASIP Journal on Advances in Signal Processing
Volume: 2015
Issue number: 1
DOI: 10.1186/s13634-015-0241-y
Publication status: Published - 2015 Dec 27
Externally published: Yes

Keywords

  • Dereverberation
  • Discriminative training
  • Feature transformation
  • REVERB challenge
  • Reverberant speech recognition
  • System combination

ASJC Scopus subject areas

  • Hardware and Architecture
  • Signal Processing
  • Electrical and Electronic Engineering

Cite this

@article{2fb0628d636e40bb89f3939085cb8bd3,
title = "Effectiveness of dereverberation, feature transformation, discriminative training methods, and system combination approach for various reverberant environments",
abstract = "The recently released REverberant Voice Enhancement and Recognition Benchmark (REVERB) challenge includes a reverberant automatic speech recognition (ASR) task. This paper describes our proposed system based on multi-channel speech enhancement preprocessing and state-of-the-art ASR techniques. For preprocessing, we propose a single-channel dereverberation method with reverberation time estimation, which is combined with multichannel beamforming that enhances direct sound compared with the reflected sound. In addition, this paper also focuses on state-of-the-art ASR techniques such as discriminative training of acoustic models including the Gaussian mixture model, subspace Gaussian mixture model, and deep neural networks, as well as various feature transformation techniques. Although, for the REVERB challenge, it is necessary to handle various acoustic environments, a single ASR system tends to be overly tuned for a specific environment, which degrades the performance in the mismatch environments. To overcome this mismatch problem with a single ASR system, we use a system combination approach using multiple ASR systems with different features and different model types because a combination of various systems that have different error patterns is beneficial. In particular, we use our discriminative training technique for system combination that achieves better generalization by making systems complementary with the modified discriminative criteria. Experiments show the effectiveness of these approaches, reaching 6.76 and 18.60 {\%} word error rates on the REVERB simulated and real test sets. These are 68.8 and 61.5 {\%} relative improvements over the baseline.",
keywords = "Dereverberation, Discriminative training, Feature transformation, REVERB challenge, Reverberant speech recognition, System combination",
author = "Yuuki Tachioka and Tomohiro Narita and Shinji Watanabe",
year = "2015",
month = "12",
day = "27",
doi = "10.1186/s13634-015-0241-y",
language = "English",
volume = "2015",
journal = "EURASIP Journal on Advances in Signal Processing",
issn = "1687-6172",
publisher = "Springer",
number = "1",
pages = "52",

}

TY  - JOUR
T1  - Effectiveness of dereverberation, feature transformation, discriminative training methods, and system combination approach for various reverberant environments
AU  - Tachioka, Yuuki
AU  - Narita, Tomohiro
AU  - Watanabe, Shinji
PY  - 2015/12/27
Y1  - 2015/12/27
N2  - The recently released REverberant Voice Enhancement and Recognition Benchmark (REVERB) challenge includes a reverberant automatic speech recognition (ASR) task. This paper describes our proposed system based on multi-channel speech enhancement preprocessing and state-of-the-art ASR techniques. For preprocessing, we propose a single-channel dereverberation method with reverberation time estimation, which is combined with multichannel beamforming that enhances direct sound compared with the reflected sound. In addition, this paper also focuses on state-of-the-art ASR techniques such as discriminative training of acoustic models including the Gaussian mixture model, subspace Gaussian mixture model, and deep neural networks, as well as various feature transformation techniques. Although, for the REVERB challenge, it is necessary to handle various acoustic environments, a single ASR system tends to be overly tuned for a specific environment, which degrades the performance in the mismatch environments. To overcome this mismatch problem with a single ASR system, we use a system combination approach using multiple ASR systems with different features and different model types because a combination of various systems that have different error patterns is beneficial. In particular, we use our discriminative training technique for system combination that achieves better generalization by making systems complementary with the modified discriminative criteria. Experiments show the effectiveness of these approaches, reaching 6.76 and 18.60 % word error rates on the REVERB simulated and real test sets. These are 68.8 and 61.5 % relative improvements over the baseline.
AB  - The recently released REverberant Voice Enhancement and Recognition Benchmark (REVERB) challenge includes a reverberant automatic speech recognition (ASR) task. This paper describes our proposed system based on multi-channel speech enhancement preprocessing and state-of-the-art ASR techniques. For preprocessing, we propose a single-channel dereverberation method with reverberation time estimation, which is combined with multichannel beamforming that enhances direct sound compared with the reflected sound. In addition, this paper also focuses on state-of-the-art ASR techniques such as discriminative training of acoustic models including the Gaussian mixture model, subspace Gaussian mixture model, and deep neural networks, as well as various feature transformation techniques. Although, for the REVERB challenge, it is necessary to handle various acoustic environments, a single ASR system tends to be overly tuned for a specific environment, which degrades the performance in the mismatch environments. To overcome this mismatch problem with a single ASR system, we use a system combination approach using multiple ASR systems with different features and different model types because a combination of various systems that have different error patterns is beneficial. In particular, we use our discriminative training technique for system combination that achieves better generalization by making systems complementary with the modified discriminative criteria. Experiments show the effectiveness of these approaches, reaching 6.76 and 18.60 % word error rates on the REVERB simulated and real test sets. These are 68.8 and 61.5 % relative improvements over the baseline.
KW  - Dereverberation
KW  - Discriminative training
KW  - Feature transformation
KW  - REVERB challenge
KW  - Reverberant speech recognition
KW  - System combination
UR  - http://www.scopus.com/inward/record.url?scp=84938824436&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=84938824436&partnerID=8YFLogxK
U2  - 10.1186/s13634-015-0241-y
DO  - 10.1186/s13634-015-0241-y
M3  - Article
AN  - SCOPUS:84938824436
VL  - 2015
JO  - EURASIP Journal on Advances in Signal Processing
JF  - EURASIP Journal on Advances in Signal Processing
SN  - 1687-6172
IS  - 1
M1  - 52
ER  -