MFCC enhancement using joint corrupted and noise feature space for highly non-stationary noise environments

Masayuki Suzuki, Takuya Yoshioka, Shinji Watanabe, Nobuaki Minematsu, Keikichi Hirose

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

5 Citations (Scopus)

Abstract

One of the most effective approaches to noise-robust speech recognition is to remove the effect of noise directly from corrupted MFCC vectors. However, vector Taylor series (VTS) enhancement, a typical MFCC enhancement method, provides limited improvement when the noise is highly non-stationary, because it must forgo a time-varying noise model to keep the computational cost at an acceptable level. This paper proposes a method that enhances MFCC vectors and their dynamic parameters using noise estimates that change frame by frame, at a practical computational cost. Like the well-known SPLICE algorithm, the proposed method employs feature mapping learned from stereo data. Its novelty lies in using the joint space spanned by concatenated vectors of corrupted and noise features. Linear discriminant analysis is also proposed as a way to reduce the dimensionality of the joint space effectively. The proposed method achieves relative error reductions of 19.1% and 8.3% over the SPLICE and noise-mean normalized SPLICE algorithms, respectively.
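The idea of mapping features in a joint corrupted-and-noise space can be illustrated with a small sketch. The code below is not the authors' implementation; it is a simplified SPLICE-style bias correction written under several assumptions: clusters are found with plain k-means instead of a trained GMM, posteriors come from equal-weight isotropic Gaussians, the per-frame noise feature is known exactly, the corruption is a toy additive offset in the feature domain, and the LDA dimensionality reduction is omitted. All function names (`fit_joint_splice`, `enhance`) are hypothetical.

```python
import numpy as np

def _posteriors(z, centers, var):
    # Soft cluster posteriors under equal-weight isotropic Gaussians (assumption).
    d2 = ((z[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    logp = -0.5 * d2 / var
    logp -= logp.max(axis=1, keepdims=True)  # subtract the row max for stability
    p = np.exp(logp)
    return p / p.sum(axis=1, keepdims=True)

def fit_joint_splice(corrupted, noise, clean, n_clusters=2, n_iter=20, seed=0):
    """Learn per-cluster bias vectors from stereo (clean, corrupted) data."""
    z = np.hstack([corrupted, noise])  # joint vector [corrupted; noise]
    rng = np.random.default_rng(seed)
    centers = z[rng.choice(len(z), size=n_clusters, replace=False)].copy()
    for _ in range(n_iter):  # plain k-means in the joint space
        d2 = ((z[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        for k in range(n_clusters):
            if np.any(labels == k):
                centers[k] = z[labels == k].mean(axis=0)
    d2 = ((z[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    var = max(d2.min(axis=1).mean() / z.shape[1], 1e-8)  # shared variance
    post = _posteriors(z, centers, var)
    # Bias of cluster k: posterior-weighted mean of (clean - corrupted).
    weights = np.maximum(post.sum(axis=0), 1e-12)[:, None]
    biases = post.T @ (clean - corrupted) / weights
    return centers, var, biases

def enhance(corrupted, noise, model):
    """Add the posterior-weighted bias to each corrupted frame."""
    centers, var, biases = model
    z = np.hstack([corrupted, noise])
    return corrupted + _posteriors(z, centers, var) @ biases

# Toy stereo data: the noise level switches between two values frame by frame.
rng = np.random.default_rng(1)
T, D = 400, 3
clean = rng.normal(size=(T, D))
level = rng.choice([0.0, 3.0], size=(T, 1))
noise = np.repeat(level, D, axis=1)
corrupted = clean + noise  # toy additive corruption, not a real MFCC model

model = fit_joint_splice(corrupted, noise, clean)
enhanced = enhance(corrupted, noise, model)
mse_corrupted = np.mean((corrupted - clean) ** 2)
mse_enhanced = np.mean((enhanced - clean) ** 2)
print(mse_corrupted, mse_enhanced)
```

Because the cluster posteriors are computed from the concatenated vector, the selected correction can follow the noise estimate of each frame, which is the property the abstract attributes to the joint space. A mapping that clusters on the corrupted features alone would have to infer the noise condition from them.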

Original language: English
Title of host publication: 2012 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2012 - Proceedings
Pages: 4109-4112
Number of pages: 4
DOI: https://doi.org/10.1109/ICASSP.2012.6288822
Publication status: Published - 2012
Externally published: Yes
Event: 2012 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2012 - Kyoto
Duration: 2012 Mar 25 to 2012 Mar 30

Other

Other: 2012 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2012
City: Kyoto
Period: 2012 Mar 25 to 2012 Mar 30

Fingerprint

Acoustic noise
Discriminant analysis
Speech recognition
Costs

Keywords

  • Noise robust ASR
  • non-stationary noise
  • SPLICE

ASJC Scopus subject areas

  • Signal Processing
  • Software
  • Electrical and Electronic Engineering

Cite this

Suzuki, M., Yoshioka, T., Watanabe, S., Minematsu, N., & Hirose, K. (2012). MFCC enhancement using joint corrupted and noise feature space for highly non-stationary noise environments. In 2012 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2012 - Proceedings (pp. 4109-4112). [6288822] https://doi.org/10.1109/ICASSP.2012.6288822


@inproceedings{0498922e1309410bbbde606af4d6ce24,
title = "MFCC enhancement using joint corrupted and noise feature space for highly non-stationary noise environments",
abstract = "One of the most effective approaches to noise robust speech recognition is to remove the noise effect directly from corrupted MFCC vectors. However, VTS enhancement, which is a typical method for performing MFCC enhancement, provides limited improvement when the noise is highly non-stationary. This is because the VTS enhancement method cannot use a time-varying noise model to keep the computational cost at an acceptable level. This paper proposes a method that can enhance MFCC vectors and their dynamic parameters by using noise estimates that change on a frame-by-frame basis at a practical computational cost. The proposed method employs stereo data-based feature mapping like the well known SPLICE algorithm. The novelty of the proposed method lies in that it uses the joint space spanned by a concatenated vector of corrupted and noise features. It is also proposed to use linear discriminant analysis to effectively reduce the dimensionality of the joint space. The proposed method achieves 19.1{\%} and 8.3{\%} relative error reduction from the SPLICE and noise-mean normalized SPLICE algorithms, respectively.",
keywords = "Noise robust ASR, non-stationary noise, SPLICE",
author = "Masayuki Suzuki and Takuya Yoshioka and Shinji Watanabe and Nobuaki Minematsu and Keikichi Hirose",
year = "2012",
doi = "10.1109/ICASSP.2012.6288822",
language = "English",
isbn = "9781467300469",
pages = "4109--4112",
booktitle = "2012 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2012 - Proceedings",

}

TY  - GEN
T1  - MFCC enhancement using joint corrupted and noise feature space for highly non-stationary noise environments
AU  - Suzuki, Masayuki
AU  - Yoshioka, Takuya
AU  - Watanabe, Shinji
AU  - Minematsu, Nobuaki
AU  - Hirose, Keikichi
PY  - 2012
Y1  - 2012
N2  - One of the most effective approaches to noise robust speech recognition is to remove the noise effect directly from corrupted MFCC vectors. However, VTS enhancement, which is a typical method for performing MFCC enhancement, provides limited improvement when the noise is highly non-stationary. This is because the VTS enhancement method cannot use a time-varying noise model to keep the computational cost at an acceptable level. This paper proposes a method that can enhance MFCC vectors and their dynamic parameters by using noise estimates that change on a frame-by-frame basis at a practical computational cost. The proposed method employs stereo data-based feature mapping like the well known SPLICE algorithm. The novelty of the proposed method lies in that it uses the joint space spanned by a concatenated vector of corrupted and noise features. It is also proposed to use linear discriminant analysis to effectively reduce the dimensionality of the joint space. The proposed method achieves 19.1% and 8.3% relative error reduction from the SPLICE and noise-mean normalized SPLICE algorithms, respectively.
AB  - One of the most effective approaches to noise robust speech recognition is to remove the noise effect directly from corrupted MFCC vectors. However, VTS enhancement, which is a typical method for performing MFCC enhancement, provides limited improvement when the noise is highly non-stationary. This is because the VTS enhancement method cannot use a time-varying noise model to keep the computational cost at an acceptable level. This paper proposes a method that can enhance MFCC vectors and their dynamic parameters by using noise estimates that change on a frame-by-frame basis at a practical computational cost. The proposed method employs stereo data-based feature mapping like the well known SPLICE algorithm. The novelty of the proposed method lies in that it uses the joint space spanned by a concatenated vector of corrupted and noise features. It is also proposed to use linear discriminant analysis to effectively reduce the dimensionality of the joint space. The proposed method achieves 19.1% and 8.3% relative error reduction from the SPLICE and noise-mean normalized SPLICE algorithms, respectively.
KW  - Noise robust ASR
KW  - non-stationary noise
KW  - SPLICE
UR  - http://www.scopus.com/inward/record.url?scp=84867614789&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=84867614789&partnerID=8YFLogxK
U2  - 10.1109/ICASSP.2012.6288822
DO  - 10.1109/ICASSP.2012.6288822
M3  - Conference contribution
SN  - 9781467300469
SP  - 4109
EP  - 4112
BT  - 2012 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2012 - Proceedings
ER  -