Static and dynamic variance compensation for recognition of reverberant speech with dereverberation preprocessing

Marc Delcroix, Tomohiro Nakatani, Shinji Watanabe

Research output: Contribution to journal (Article)

48 Citations (Scopus)

Abstract

The performance of automatic speech recognition is severely degraded in the presence of noise or reverberation. Much research has been undertaken on noise robustness. In contrast, the problem of the recognition of reverberant speech has received far less attention and remains very challenging. In this paper, we use a dereverberation method to reduce reverberation prior to recognition. Such a preprocessor may remove most reverberation effects. However, it often introduces distortion, causing a dynamic mismatch between speech features and the acoustic model used for recognition. Model adaptation could be used to reduce this mismatch. However, conventional model adaptation techniques assume a static mismatch and may therefore not cope well with a dynamic mismatch arising from dereverberation. This paper proposes a novel adaptation scheme that is capable of managing both static and dynamic mismatches. We introduce a parametric model for variance adaptation that includes static and dynamic components in order to realize an appropriate interconnection between dereverberation and a speech recognizer. The model parameters are optimized using adaptive training implemented with the Expectation Maximization algorithm. An experiment using the proposed method with reverberant speech for a reverberation time of 0.5 s revealed that it was possible to achieve an 80% reduction in the relative error rate compared with the recognition of dereverberated speech (word error rate of 31%), and the final error rate was 5.4%, which was obtained by combining the proposed variance compensation and MLLR adaptation.
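
As a quick arithmetic check on the figures quoted in the abstract, the sketch below computes the relative reduction in word error rate between a baseline and an improved system. The function name `relative_error_reduction` is hypothetical and does not come from the paper. The abstract does not state exactly which configuration the 80% figure refers to, so pairing the 31% baseline with the final 5.4% rate is an assumption made here for illustration.

```python
def relative_error_reduction(baseline_wer: float, improved_wer: float) -> float:
    """Return the fraction of the baseline error rate removed by the improved system."""
    if baseline_wer <= 0:
        raise ValueError("baseline_wer must be positive")
    return (baseline_wer - improved_wer) / baseline_wer


# Figures quoted in the abstract: 31% WER for dereverberated speech and
# 5.4% WER with variance compensation combined with MLLR (pairing assumed).
reduction = relative_error_reduction(31.0, 5.4)
print(f"Relative error rate reduction: {reduction:.1%}")
```

Under this assumed pairing, the reduction works out to (31 - 5.4) / 31, or roughly 83%, which is in line with the approximately 80% relative reduction reported in the abstract.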

Original language: English
Pages (from-to): 324-334
Number of pages: 11
Journal: IEEE Transactions on Audio, Speech and Language Processing
Volume: 17
Issue number: 2
DOI: 10.1109/TASL.2008.2010214
Publication status: Published - Feb 2009
Externally published: Yes

Keywords

  • Dereverberation
  • Model adaptation
  • Robust automatic speech recognition (ASR)
  • Variance compensation

ASJC Scopus subject areas

  • Electrical and Electronic Engineering
  • Acoustics and Ultrasonics

Cite this

Static and dynamic variance compensation for recognition of reverberant speech with dereverberation preprocessing. / Delcroix, Marc; Nakatani, Tomohiro; Watanabe, Shinji.

In: IEEE Transactions on Audio, Speech and Language Processing, Vol. 17, No. 2, 02.2009, p. 324-334.

@article{8683fa420b1d4bc7960a6d62d7f9ebdd,
title = "Static and dynamic variance compensation for recognition of reverberant speech with dereverberation preprocessing",
abstract = "The performance of automatic speech recognition is severely degraded in the presence of noise or reverberation. Much research has been undertaken on noise robustness. In contrast, the problem of the recognition of reverberant speech has received far less attention and remains very challenging. In this paper, we use a dereverberation method to reduce reverberation prior to recognition. Such a preprocessor may remove most reverberation effects. However, it often introduces distortion, causing a dynamic mismatch between speech features and the acoustic model used for recognition. Model adaptation could be used to reduce this mismatch. However, conventional model adaptation techniques assume a static mismatch and may therefore not cope well with a dynamic mismatch arising from dereverberation. This paper proposes a novel adaptation scheme that is capable of managing both static and dynamic mismatches. We introduce a parametric model for variance adaptation that includes static and dynamic components in order to realize an appropriate interconnection between dereverberation and a speech recognizer. The model parameters are optimized using adaptive training implemented with the Expectation Maximization algorithm. An experiment using the proposed method with reverberant speech for a reverberation time of 0.5 s revealed that it was possible to achieve an 80{\%} reduction in the relative error rate compared with the recognition of dereverberated speech (word error rate of 31{\%}), and the final error rate was 5.4{\%}, which was obtained by combining the proposed variance compensation and MLLR adaptation.",
keywords = "Dereverberation, Model adaptation, Robust automatic speech recognition (ASR), Variance compensation",
author = "Marc Delcroix and Tomohiro Nakatani and Shinji Watanabe",
year = "2009",
month = "2",
doi = "10.1109/TASL.2008.2010214",
language = "English",
volume = "17",
pages = "324--334",
journal = "IEEE Transactions on Audio, Speech and Language Processing",
issn = "1558-7916",
publisher = "Institute of Electrical and Electronics Engineers Inc.",
number = "2",
}

TY - JOUR

T1 - Static and dynamic variance compensation for recognition of reverberant speech with dereverberation preprocessing

AU - Delcroix, Marc

AU - Nakatani, Tomohiro

AU - Watanabe, Shinji

PY - 2009/2

Y1 - 2009/2

AB - The performance of automatic speech recognition is severely degraded in the presence of noise or reverberation. Much research has been undertaken on noise robustness. In contrast, the problem of the recognition of reverberant speech has received far less attention and remains very challenging. In this paper, we use a dereverberation method to reduce reverberation prior to recognition. Such a preprocessor may remove most reverberation effects. However, it often introduces distortion, causing a dynamic mismatch between speech features and the acoustic model used for recognition. Model adaptation could be used to reduce this mismatch. However, conventional model adaptation techniques assume a static mismatch and may therefore not cope well with a dynamic mismatch arising from dereverberation. This paper proposes a novel adaptation scheme that is capable of managing both static and dynamic mismatches. We introduce a parametric model for variance adaptation that includes static and dynamic components in order to realize an appropriate interconnection between dereverberation and a speech recognizer. The model parameters are optimized using adaptive training implemented with the Expectation Maximization algorithm. An experiment using the proposed method with reverberant speech for a reverberation time of 0.5 s revealed that it was possible to achieve an 80% reduction in the relative error rate compared with the recognition of dereverberated speech (word error rate of 31%), and the final error rate was 5.4%, which was obtained by combining the proposed variance compensation and MLLR adaptation.

KW - Dereverberation

KW - Model adaptation

KW - Robust automatic speech recognition (ASR)

KW - Variance compensation

UR - http://www.scopus.com/inward/record.url?scp=70350450398&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=70350450398&partnerID=8YFLogxK

U2 - 10.1109/TASL.2008.2010214

DO - 10.1109/TASL.2008.2010214

M3 - Article

AN - SCOPUS:70350450398

VL - 17

SP - 324

EP - 334

JO - IEEE Transactions on Audio, Speech and Language Processing

JF - IEEE Transactions on Audio, Speech and Language Processing

SN - 1558-7916

IS - 2

ER -