Feature enhancement with joint use of consecutive corrupted and noise feature vectors with discriminative region weighting

Masayuki Suzuki, Takuya Yoshioka, Shinji Watanabe, Nobuaki Minematsu, Keikichi Hirose

Research output: Contribution to journal › Article


Abstract

This paper proposes a feature enhancement method that can achieve high speech recognition performance in a variety of noise environments at a feasible computational cost. Like the well-known Stereo-based Piecewise Linear Compensation for Environments (SPLICE) algorithm, the proposed method learns a piecewise linear transformation that maps corrupted feature vectors to the corresponding clean features, which enables efficient operation. To make the feature enhancement process adaptive to changes in noise, the piecewise linear transformation is performed by using a subspace of the joint space of corrupted and noise feature vectors, where the subspace is chosen such that the classes (i.e., Gaussian mixture components) of the underlying clean feature vectors can be best predicted. In addition, we propose utilizing temporally adjacent frames of corrupted and noise features to leverage the dynamic characteristics of feature vectors. To prevent overfitting caused by the high dimensionality of the extended feature vectors covering the neighboring frames, we introduce a regularized weighted minimum mean square error criterion. The proposed method achieved relative improvements of 34.2% and 22.2% over SPLICE under the clean and multi-style conditions, respectively, on the Aurora 2 task.
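
The kind of piecewise linear mapping the abstract describes can be illustrated with a generic SPLICE-style sketch. This is not the authors' implementation: it omits the noise features, the subspace selection, and the discriminative region weighting, and it uses a plain ridge penalty as a stand-in for the regularized criterion. All function names (`stack_frames`, `posteriors`, `train_transforms`, `enhance`) and the `ridge` and `context` parameters are assumptions made for illustration. It assumes stereo training data, i.e. time-aligned corrupted frames `Y` and clean frames `X`, and a diagonal-covariance Gaussian mixture that defines the regions.

```python
import numpy as np


def stack_frames(F, context=1):
    """Concatenate each frame with its +/- `context` neighbours (edges repeated)."""
    padded = np.pad(F, ((context, context), (0, 0)), mode="edge")
    return np.hstack([padded[i:i + len(F)] for i in range(2 * context + 1)])


def posteriors(Y, means, variances, weights):
    """Posterior p(k | y_t) under a diagonal-covariance GMM; returns shape (T, K)."""
    diff = Y[:, None, :] - means[None, :, :]
    log_lik = -0.5 * np.sum(
        diff ** 2 / variances + np.log(2.0 * np.pi * variances), axis=2
    )
    log_post = np.log(weights) + log_lik
    log_post -= log_post.max(axis=1, keepdims=True)  # numerical stability
    post = np.exp(log_post)
    return post / post.sum(axis=1, keepdims=True)


def train_transforms(Y, X, post, ridge=1e-3):
    """Fit one affine map per region by posterior-weighted, ridge-regularized
    least squares. Returns W with shape (K, D_in + 1, D_out)."""
    T, D = Y.shape
    Y1 = np.hstack([Y, np.ones((T, 1))])  # append a bias column
    W = []
    for k in range(post.shape[1]):
        G = (Y1 * post[:, [k]]).T  # weighted inputs, shape (D + 1, T)
        A = G @ Y1 + ridge * np.eye(D + 1)  # ridge also shrinks the bias here
        B = G @ X
        W.append(np.linalg.solve(A, B))
    return np.stack(W)


def enhance(Y, post, W):
    """Clean-feature estimate: posterior-weighted sum of per-region affine maps."""
    Y1 = np.hstack([Y, np.ones((len(Y), 1))])
    per_region = np.einsum("td,kde->tke", Y1, W)
    return np.einsum("tk,tke->te", post, per_region)
```

In this sketch, using adjacent frames amounts to passing `stack_frames(Y, context)` as the input in place of `Y`; the larger input dimension is what makes the `ridge` term useful.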

Original language: English
Article number: 6544587
Pages (from-to): 2172-2181
Number of pages: 10
Journal: IEEE Transactions on Audio, Speech and Language Processing
Volume: 21
Issue number: 10
DOI: 10.1109/TASL.2013.2270407
Publication status: Published - 2013
Externally published: Yes

Keywords

  • Feature enhancement
  • noise robust automatic speech recognition
  • non-stationary noise
  • SPLICE
  • vector Taylor series

ASJC Scopus subject areas

  • Electrical and Electronic Engineering
  • Acoustics and Ultrasonics

Cite this

Feature enhancement with joint use of consecutive corrupted and noise feature vectors with discriminative region weighting. / Suzuki, Masayuki; Yoshioka, Takuya; Watanabe, Shinji; Minematsu, Nobuaki; Hirose, Keikichi.

In: IEEE Transactions on Audio, Speech and Language Processing, Vol. 21, No. 10, 6544587, 2013, p. 2172-2181.


@article{15b5023cc83b478fa7e5d528e8657d02,
title = "Feature enhancement with joint use of consecutive corrupted and noise feature vectors with discriminative region weighting",
abstract = "This paper proposes a feature enhancement method that can achieve high speech recognition performance in a variety of noise environments with feasible computational cost. As the well-known Stereo-based Piecewise Linear Compensation for Environments (SPLICE) algorithm, the proposed method learns piecewise linear transformation to map corrupted feature vectors to the corresponding clean features, which enables efficient operation. To make the feature enhancement process adaptive to changes in noise, the piecewise linear transformation is performed by using a subspace of the joint space of corrupted and noise feature vectors, where the subspace is chosen such that classes (i.e., Gaussian mixture components) of underlying clean feature vectors can be best predicted. In addition, we propose utilizing temporally adjacent frames of corrupted and noise features in order to leverage dynamic characteristics of feature vectors. To prevent overfitting caused by the high dimensionality of the extended feature vectors covering the neighboring frames, we introduce regularized weighted minimum mean square error criterion. The proposed method achieved relative improvements of 34.2{\%} and 22.2{\%} over SPLICE under the clean and multi-style conditions, respectively, on the Aurora 2 task.",
keywords = "Feature enhancement, noise robust automatic speech recognition, non-stationary noise, SPLICE, vector Taylor series",
author = "Masayuki Suzuki and Takuya Yoshioka and Shinji Watanabe and Nobuaki Minematsu and Keikichi Hirose",
year = "2013",
doi = "10.1109/TASL.2013.2270407",
language = "English",
volume = "21",
pages = "2172--2181",
journal = "IEEE Transactions on Audio, Speech and Language Processing",
issn = "1558-7916",
publisher = "Institute of Electrical and Electronics Engineers Inc.",
number = "10"
}

TY - JOUR

T1 - Feature enhancement with joint use of consecutive corrupted and noise feature vectors with discriminative region weighting

AU - Suzuki, Masayuki

AU - Yoshioka, Takuya

AU - Watanabe, Shinji

AU - Minematsu, Nobuaki

AU - Hirose, Keikichi

PY - 2013

Y1 - 2013

AB - This paper proposes a feature enhancement method that can achieve high speech recognition performance in a variety of noise environments with feasible computational cost. As the well-known Stereo-based Piecewise Linear Compensation for Environments (SPLICE) algorithm, the proposed method learns piecewise linear transformation to map corrupted feature vectors to the corresponding clean features, which enables efficient operation. To make the feature enhancement process adaptive to changes in noise, the piecewise linear transformation is performed by using a subspace of the joint space of corrupted and noise feature vectors, where the subspace is chosen such that classes (i.e., Gaussian mixture components) of underlying clean feature vectors can be best predicted. In addition, we propose utilizing temporally adjacent frames of corrupted and noise features in order to leverage dynamic characteristics of feature vectors. To prevent overfitting caused by the high dimensionality of the extended feature vectors covering the neighboring frames, we introduce regularized weighted minimum mean square error criterion. The proposed method achieved relative improvements of 34.2% and 22.2% over SPLICE under the clean and multi-style conditions, respectively, on the Aurora 2 task.

KW - Feature enhancement

KW - noise robust automatic speech recognition

KW - non-stationary noise

KW - SPLICE

KW - vector Taylor series

UR - http://www.scopus.com/inward/record.url?scp=84881054746&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84881054746&partnerID=8YFLogxK

U2 - 10.1109/TASL.2013.2270407

DO - 10.1109/TASL.2013.2270407

M3 - Article

AN - SCOPUS:84881054746

VL - 21

SP - 2172

EP - 2181

JO - IEEE Transactions on Audio, Speech and Language Processing

JF - IEEE Transactions on Audio, Speech and Language Processing

SN - 1558-7916

IS - 10

M1 - 6544587

ER -