TY - GEN
T1 - MFCC enhancement using joint corrupted and noise feature space for highly non-stationary noise environments
AU - Suzuki, Masayuki
AU - Yoshioka, Takuya
AU - Watanabe, Shinji
AU - Minematsu, Nobuaki
AU - Hirose, Keikichi
PY - 2012/10/23
Y1 - 2012/10/23
AB - One of the most effective approaches to noise-robust speech recognition is to remove the noise effect directly from corrupted MFCC vectors. However, VTS enhancement, a typical MFCC enhancement method, provides limited improvement when the noise is highly non-stationary, because keeping its computational cost at an acceptable level rules out a time-varying noise model. This paper proposes a method that enhances MFCC vectors and their dynamic parameters using noise estimates that change on a frame-by-frame basis, at a practical computational cost. Like the well-known SPLICE algorithm, the proposed method performs feature mapping learned from stereo data. Its novelty lies in using the joint space spanned by concatenated vectors of corrupted and noise features. The paper also proposes using linear discriminant analysis to reduce the dimensionality of this joint space effectively. The proposed method achieves relative error reductions of 19.1% and 8.3% over the SPLICE and noise-mean normalized SPLICE algorithms, respectively.
KW - Noise robust ASR
KW - SPLICE
KW - non-stationary noise
UR - http://www.scopus.com/inward/record.url?scp=84867614789&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84867614789&partnerID=8YFLogxK
DO - 10.1109/ICASSP.2012.6288822
M3 - Conference contribution
AN - SCOPUS:84867614789
SN - 9781467300469
T3 - ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
SP - 4109
EP - 4112
BT - 2012 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2012 - Proceedings
T2 - 2012 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2012
Y2 - 25 March 2012 through 30 March 2012
ER -