Prior-based binary masking and discriminative methods for reverberant and noisy speech recognition using distant stereo microphones

Yuuki Tachioka, Shinji Watanabe, Jonathan Le Roux, John R. Hershey

Research output: Contribution to journal (Article)

Abstract

Reverberant and noisy automatic speech recognition (ASR) using distant stereo microphones is a very challenging but desirable scenario for home-environment speech applications. This scenario can often provide prior knowledge such as physical information about the sound sources and the environment in advance, which may then be used to reduce the influence of the interference. We propose a method to enhance the binary masking algorithm by using prior distributions of the time difference of arrival. This paper also validates state-of-the-art ASR techniques that include various discriminative training and feature transformation methods. Furthermore, we develop an efficient method to combine discriminative language modeling and minimum Bayes risk decoding in the ASR post-processing stage. We also investigate the effectiveness of this method when used for reverberant and noisy ASR with deep neural networks (DNNs), as well as when used in systems that combine multiple DNNs using different features. Experiments on the medium vocabulary sub-task of the second CHiME challenge show that the system submitted to the challenge achieved a 26.86% word error rate (WER); moreover, the DNN system with discriminative training, speaker adaptation, and system combination achieves a 20.40% WER.

Original language: English
Pages (from-to): 407-416
Number of pages: 10
Journal: Journal of Information Processing
Volume: 25
DOI: 10.2197/ipsjjip.25.407
Publication status: Published - 2017
Externally published: Yes

Fingerprint

Speech intelligibility
Microphones
Speech recognition
Decoding
Acoustic waves
Processing
Deep neural networks
Experiments

Keywords

  • CHiME challenge
  • Deep neural networks
  • Discriminative methods
  • Feature transformation
  • Noise-robust ASR
  • Prior-based binary masking
  • System combination

ASJC Scopus subject areas

  • Computer Science (all)

Cite this

Prior-based binary masking and discriminative methods for reverberant and noisy speech recognition using distant stereo microphones. / Tachioka, Yuuki; Watanabe, Shinji; Le Roux, Jonathan; Hershey, John R.

In: Journal of Information Processing, Vol. 25, 2017, p. 407-416.

@article{4570d61c33c840289a573b52371f63cd,
title = "Prior-based binary masking and discriminative methods for reverberant and noisy speech recognition using distant stereo microphones",
abstract = "Reverberant and noisy automatic speech recognition (ASR) using distant stereo microphones is a very challenging but desirable scenario for home-environment speech applications. This scenario can often provide prior knowledge such as physical information about the sound sources and the environment in advance, which may then be used to reduce the influence of the interference. We propose a method to enhance the binary masking algorithm by using prior distributions of the time difference of arrival. This paper also validates state-of-the-art ASR techniques that include various discriminative training and feature transformation methods. Furthermore, we develop an efficient method to combine discriminative language modeling and minimum Bayes risk decoding in the ASR post-processing stage. We also investigate the effectiveness of this method when used for reverberant and noisy ASR with deep neural networks (DNNs), as well as when used in systems that combine multiple DNNs using different features. Experiments on the medium vocabulary sub-task of the second CHiME challenge show that the system submitted to the challenge achieved a 26.86{\%} word error rate (WER); moreover, the DNN system with discriminative training, speaker adaptation, and system combination achieves a 20.40{\%} WER.",
keywords = "CHiME challenge, Deep neural networks, Discriminative methods, Feature transformation, Noise-robust ASR, Prior-based binary masking, System combination",
author = "Tachioka, Yuuki and Watanabe, Shinji and {Le Roux}, Jonathan and Hershey, {John R.}",
year = "2017",
doi = "10.2197/ipsjjip.25.407",
language = "English",
volume = "25",
pages = "407--416",
journal = "Journal of Information Processing",
issn = "0387-5806",
publisher = "Information Processing Society of Japan",

}

TY - JOUR

T1 - Prior-based binary masking and discriminative methods for reverberant and noisy speech recognition using distant stereo microphones

AU - Tachioka, Yuuki

AU - Watanabe, Shinji

AU - Le Roux, Jonathan

AU - Hershey, John R.

PY - 2017

Y1 - 2017

AB - Reverberant and noisy automatic speech recognition (ASR) using distant stereo microphones is a very challenging but desirable scenario for home-environment speech applications. This scenario can often provide prior knowledge such as physical information about the sound sources and the environment in advance, which may then be used to reduce the influence of the interference. We propose a method to enhance the binary masking algorithm by using prior distributions of the time difference of arrival. This paper also validates state-of-the-art ASR techniques that include various discriminative training and feature transformation methods. Furthermore, we develop an efficient method to combine discriminative language modeling and minimum Bayes risk decoding in the ASR post-processing stage. We also investigate the effectiveness of this method when used for reverberant and noisy ASR with deep neural networks (DNNs), as well as when used in systems that combine multiple DNNs using different features. Experiments on the medium vocabulary sub-task of the second CHiME challenge show that the system submitted to the challenge achieved a 26.86% word error rate (WER); moreover, the DNN system with discriminative training, speaker adaptation, and system combination achieves a 20.40% WER.

KW - CHiME challenge

KW - Deep neural networks

KW - Discriminative methods

KW - Feature transformation

KW - Noise-robust ASR

KW - Prior-based binary masking

KW - System combination

UR - http://www.scopus.com/inward/record.url?scp=85020911444&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85020911444&partnerID=8YFLogxK

U2 - 10.2197/ipsjjip.25.407

DO - 10.2197/ipsjjip.25.407

M3 - Article

VL - 25

SP - 407

EP - 416

JO - Journal of Information Processing

JF - Journal of Information Processing

SN - 0387-5806

ER -