TY - GEN
T1 - Generalized weighted-prediction-error dereverberation with varying source priors for reverberant speech recognition
AU - Taniguchi, Toru
AU - Subramanian, Aswin Shanmugam
AU - Wang, Xiaofei
AU - Tran, Dung
AU - Fujita, Yuya
AU - Watanabe, Shinji
N1 - Publisher Copyright: © 2019 IEEE.
PY - 2019/10
Y1 - 2019/10
AB - Weighted-prediction-error (WPE) is a well-known dereverberation method, used especially to alleviate the degradation of automatic speech recognition (ASR) performance in distant-speaker scenarios. WPE usually assumes that the desired source signals follow a predefined source prior, such as a Gaussian with time-varying variance (TVG). Although WPE works well in practice under this assumption, the appropriate prior generally depends on the source and cannot be known before processing. On-demand estimation of the source prior, e.g. for each utterance, is therefore required. For this purpose, we extend WPE by introducing a complex-valued generalized Gaussian (CGG) prior, together with an estimator of its shape parameter inside the processing, to handle a variety of super-Gaussian sources. Blind estimation of the shape parameter is realized by adding a shape parameter estimator as a sub-network to WPE-CGG, which is treated as a differentiable neural network. The sub-network can be trained by backpropagation from the outputs of the whole network using any criterion, such as signal-level mean square error, or even ASR errors if the WPE-CGG computational graph is connected to that of the ASR network. Experimental results show that the proposed method outperforms conventional baseline methods with the TVG prior, without careful setting of the shape parameter value during evaluation.
KW - Single-channel Dereverberation
KW - WPE
KW - complex generalized Gaussian
KW - reverberant speech recognition
KW - shape parameter
UR - http://www.scopus.com/inward/record.url?scp=85078572064&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85078572064&partnerID=8YFLogxK
U2 - 10.1109/WASPAA.2019.8937270
DO - 10.1109/WASPAA.2019.8937270
M3 - Conference contribution
AN - SCOPUS:85078572064
T3 - IEEE Workshop on Applications of Signal Processing to Audio and Acoustics
SP - 293
EP - 297
BT - 2019 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, WASPAA 2019
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2019 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, WASPAA 2019
Y2 - 20 October 2019 through 23 October 2019
ER -