Uncertainty training and decoding methods of deep neural networks based on stochastic representation of enhanced features

Yuuki Tachioka, Shinji Watanabe

Research output: Article, peer-reviewed

8 Citations (Scopus)

Abstract

Speech enhancement is an important front-end technique for improving automatic speech recognition (ASR) in noisy environments. However, incorrect noise suppression during speech enhancement often introduces additional distortions into the speech signal, which degrade ASR performance. To compensate for these distortions, ASR needs to consider the uncertainty of the enhanced features, which can be achieved by taking the expectation of the ASR decoding/training process with respect to a probabilistic representation of the input features. However, unlike with Gaussian mixture models, it is difficult for a deep neural network (DNN) to handle this expectation analytically because of its nonlinear activations. This paper proposes efficient Monte Carlo approximation methods for calculating this expectation, realizing DNN-based uncertainty decoding and training. The uncertainty of the input features is first modeled as a linear interpolation between the original and enhanced feature vectors with a random interpolation coefficient. By sampling input features from this stochastic process during training, the DNN can learn to generalize over the variations of enhanced features. The method also samples input features during decoding and integrates the multiple recognition hypotheses obtained from the samples. Experiments on reverberant noisy speech recognition tasks (the second CHiME and REVERB challenges) show the effectiveness of these techniques.
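The sampling idea described in the abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the interpolation coefficient is drawn uniformly from [0, 1], `toy_dnn_posteriors` is a hypothetical one-layer stand-in for a trained acoustic model, and the decoding step simply averages frame-level posteriors over samples as a Monte Carlo estimate of the expectation. That averaging is a simpler substitute for the hypothesis-level integration the abstract mentions.

```python
import numpy as np


def sample_features(original, enhanced, rng, low=0.0, high=1.0):
    """Draw one stochastic input on the line between original and enhanced features.

    The uniform distribution for the coefficient is an assumption of this sketch.
    """
    alpha = rng.uniform(low, high)
    return alpha * enhanced + (1.0 - alpha) * original


def toy_dnn_posteriors(features, weights):
    """Hypothetical stand-in for a trained DNN: one linear layer plus softmax."""
    logits = features @ weights
    logits = logits - logits.max(axis=-1, keepdims=True)  # numerical stability
    probs = np.exp(logits)
    return probs / probs.sum(axis=-1, keepdims=True)


def monte_carlo_posteriors(original, enhanced, weights, num_samples, rng):
    """Average posteriors over sampled inputs (Monte Carlo estimate of the expectation)."""
    total = np.zeros((original.shape[0], weights.shape[1]))
    for _ in range(num_samples):
        sampled = sample_features(original, enhanced, rng)
        total += toy_dnn_posteriors(sampled, weights)
    return total / num_samples


rng = np.random.default_rng(0)
original = rng.normal(size=(4, 3))  # toy data: 4 frames, 3-dimensional features
enhanced = original + rng.normal(scale=0.1, size=(4, 3))  # toy "enhanced" features
weights = rng.normal(size=(3, 5))  # toy model with 5 output classes
posteriors = monte_carlo_posteriors(original, enhanced, weights, num_samples=8, rng=rng)
```

In training, the same `sample_features` call could be used to draw a fresh input for each minibatch, so that the network sees many points between the original and enhanced features rather than a single fixed input.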

Original language: English
Pages (from-to): 3541-3545
Number of pages: 5
Journal: Unknown Journal
2015-January
Publication status: Published - 2015
Externally published: Yes

ASJC Scopus subject areas

  • Language and Linguistics
  • Human-Computer Interaction
  • Signal Processing
  • Software
  • Modeling and Simulation
