Effectiveness of discriminative training and feature transformation for reverberated and noisy speech

Yuuki Tachioka, Shinji Watanabe, John R. Hershey

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

13 Citations (Scopus)

Abstract

Automatic speech recognition in the presence of non-stationary interference and reverberation remains a challenging problem. The 2nd 'CHiME' Speech Separation and Recognition Challenge introduces a new and difficult task with time-varying reverberation and non-stationary interference including natural background speech, home noises, or music. This paper establishes baselines using state-of-the-art ASR techniques such as discriminative training and various feature transformation on the middle-vocabulary sub-task of this challenge. In addition, we propose an augmented discriminative feature transformation that introduces arbitrary features to a discriminative feature transformation. We present experimental results showing that discriminative training of model parameters and feature transforms is highly effective for this task, and that the augmented feature transformation provides some preliminary benefits. The training code will be released as an advanced ASR baseline.
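The abstract only names the augmented discriminative feature transformation without detailing it. As a rough sketch under our own assumptions (all variable names and dimensions here are hypothetical, modeled on an fMPE-style additive feature transform), the idea of appending arbitrary auxiliary features to the high-dimensional expansion before a learned projection might look like:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions: 13-dim base features (e.g., MFCCs), a 30-dim
# discriminative expansion (e.g., Gaussian posteriors), and 5 arbitrary
# auxiliary features (e.g., a per-frame noise estimate).
T, D_BASE, D_POST, D_AUX = 100, 13, 30, 5

x = rng.standard_normal((T, D_BASE))     # base acoustic features, one row per frame
post = rng.standard_normal((T, D_POST))  # high-dimensional expansion of x
aux = rng.standard_normal((T, D_AUX))    # arbitrary augmented features

# Augmentation: simply append the auxiliary features to the expansion.
h = np.hstack([post, aux])

# M would be trained discriminatively (e.g., by gradients of an MMI/MPE
# objective); here it is random for illustration only.
M = 0.01 * rng.standard_normal((D_BASE, D_POST + D_AUX))

# fMPE-style additive correction: the projection of the augmented
# features is added to the original features.
y = x + h @ M.T
print(y.shape)  # (100, 13)
```

Because the correction is additive, the transform can only help: with M near zero it reduces to the baseline features, and training M discriminatively lets any appended feature contribute.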

Original language: English
Title of host publication: 2013 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2013 - Proceedings
Pages: 6935-6939
Number of pages: 5
DOIs: 10.1109/ICASSP.2013.6639006
Publication status: Published - 2013 Oct 18
Externally published: Yes
Event: 2013 38th IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2013 - Vancouver, BC
Duration: 2013 May 26 - 2013 May 31

Keywords

  • Augmented discriminative feature transformation
  • CHiME challenge
  • Discriminative training
  • Feature transformation
  • Kaldi

ASJC Scopus subject areas

  • Signal Processing
  • Software
  • Electrical and Electronic Engineering

Cite this

Tachioka, Y., Watanabe, S., & Hershey, J. R. (2013). Effectiveness of discriminative training and feature transformation for reverberated and noisy speech. In 2013 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2013 - Proceedings (pp. 6935-6939). [6639006] https://doi.org/10.1109/ICASSP.2013.6639006

@inproceedings{33dddb9d89fc47b999891f102de15a6f,
title = "Effectiveness of discriminative training and feature transformation for reverberated and noisy speech",
abstract = "Automatic speech recognition in the presence of non-stationary interference and reverberation remains a challenging problem. The 2nd 'CHiME' Speech Separation and Recognition Challenge introduces a new and difficult task with time-varying reverberation and non-stationary interference including natural background speech, home noises, or music. This paper establishes baselines using state-of-the-art ASR techniques such as discriminative training and various feature transformation on the middle-vocabulary sub-task of this challenge. In addition, we propose an augmented discriminative feature transformation that introduces arbitrary features to a discriminative feature transformation. We present experimental results showing that discriminative training of model parameters and feature transforms is highly effective for this task, and that the augmented feature transformation provides some preliminary benefits. The training code will be released as an advanced ASR baseline.",
keywords = "Augmented discriminative feature transformation, CHiME challenge, Discriminative training, Feature transformation, Kaldi",
author = "Yuuki Tachioka and Shinji Watanabe and Hershey, {John R.}",
year = "2013",
month = "10",
day = "18",
doi = "10.1109/ICASSP.2013.6639006",
language = "English",
isbn = "9781479903566",
pages = "6935--6939",
booktitle = "2013 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2013 - Proceedings",

}
