Uncertainty propagation through deep neural networks

Ahmed Hussen Abdelaziz, Shinji Watanabe, John R. Hershey, Emanuel Vincent, Dorothea Kolossa

Research output: Contribution to journal › Article

17 Citations (Scopus)

Abstract

To improve automatic speech recognition (ASR) performance in noisy environments, distorted speech is typically pre-processed by a speech enhancement algorithm, which usually yields a speech estimate that still contains residual noise and distortion. Measures of the uncertainty, or variance, of this estimate may also be available. Uncertainty decoding is a framework that uses this knowledge of the uncertainty in the input features during acoustic model scoring. Such frameworks have been well explored for traditional probabilistic models, but how best to use them in deep neural network (DNN)-based ASR systems is not yet clear. In this paper, we study the propagation of observation uncertainties through the layers of a DNN-based acoustic model. Since exact propagation is intractable due to the nonlinearities of the DNN, we employ approximate propagation methods, including Monte Carlo sampling, the unscented transform, and a piecewise exponential approximation of the activation function, to estimate the distribution of acoustic scores. Finally, the expected value of the acoustic score distribution is used for decoding, which is shown to further improve ASR accuracy on the CHiME database relative to a highly optimized DNN baseline.
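As an illustration of two of the approximate propagation methods named in the abstract, the following is a minimal sketch, not the authors' implementation. It assumes a Gaussian input feature vector with a diagonal covariance and a single hypothetical sigmoid layer; the names `layer`, `monte_carlo_mean`, `unscented_mean`, the weights, and the scaling parameter `kappa` are all illustrative assumptions. Both functions estimate the expected value of the layer output under the input uncertainty.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def layer(x, W, b):
    """Hypothetical DNN layer: affine transform followed by a sigmoid."""
    return sigmoid(x @ W.T + b)


def monte_carlo_mean(mu, var, W, b, n_samples=20000, seed=0):
    """Estimate E[layer(x)] for x ~ N(mu, diag(var)) by random sampling."""
    rng = np.random.default_rng(seed)
    samples = mu + np.sqrt(var) * rng.standard_normal((n_samples, mu.size))
    return layer(samples, W, b).mean(axis=0)


def unscented_mean(mu, var, W, b, kappa=1.0):
    """Estimate E[layer(x)] from 2n+1 deterministic sigma points."""
    n = mu.size
    # For a diagonal covariance, the sigma points lie along the coordinate axes.
    spread = np.sqrt((n + kappa) * var)
    points = np.vstack([mu, mu + np.diag(spread), mu - np.diag(spread)])
    weights = np.full(2 * n + 1, 1.0 / (2.0 * (n + kappa)))
    weights[0] = kappa / (n + kappa)  # weight of the central point
    return weights @ layer(points, W, b)


# Illustrative values only (assumed, not taken from the paper).
W = np.array([[0.5, -0.3, 0.2, 0.1],
              [-0.4, 0.6, 0.0, 0.3],
              [0.2, 0.2, -0.5, 0.4]])
b = np.zeros(3)
mu = np.array([0.2, -0.1, 0.4, 0.0])   # enhanced feature estimate
var = np.full(4, 0.1)                  # per-dimension uncertainty

print("Monte Carlo:", monte_carlo_mean(mu, var, W, b))
print("Unscented:  ", unscented_mean(mu, var, W, b))
```

In this sketch the unscented transform evaluates the layer at only 2n+1 points, whereas Monte Carlo sampling needs many random draws to reach a comparable estimate; when the variance is zero, both reduce to an ordinary forward pass on the point estimate.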

Original language: English
Pages (from-to): 3561-3565
Number of pages: 5
Journal: Unknown Journal
Volume: 2015-January
Publication status: Published - 2015
Externally published: Yes

Fingerprint

Uncertainty Propagation
Acoustics
Neural Networks
Uncertainty
Acoustic Model
Propagation
Decoding
Estimate
Speech Enhancement
Monte Carlo Sampling
Scoring
Feature Model
Activation Function
Expected Value

Keywords

  • Deep Neural Networks
  • Noise-robust ASR
  • Observation Uncertainty
  • Uncertainty Propagation

ASJC Scopus subject areas

  • Language and Linguistics
  • Human-Computer Interaction
  • Signal Processing
  • Software
  • Modelling and Simulation

Cite this

Abdelaziz, A. H., Watanabe, S., Hershey, J. R., Vincent, E., & Kolossa, D. (2015). Uncertainty propagation through deep neural networks. Unknown Journal, 2015-January, 3561-3565.

Uncertainty propagation through deep neural networks. / Abdelaziz, Ahmed Hussen; Watanabe, Shinji; Hershey, John R.; Vincent, Emanuel; Kolossa, Dorothea.

In: Unknown Journal, Vol. 2015-January, 2015, p. 3561-3565.

Abdelaziz, AH, Watanabe, S, Hershey, JR, Vincent, E & Kolossa, D 2015, 'Uncertainty propagation through deep neural networks', Unknown Journal, vol. 2015-January, pp. 3561-3565.
Abdelaziz AH, Watanabe S, Hershey JR, Vincent E, Kolossa D. Uncertainty propagation through deep neural networks. Unknown Journal. 2015;2015-January:3561-3565.
Abdelaziz, Ahmed Hussen ; Watanabe, Shinji ; Hershey, John R. ; Vincent, Emanuel ; Kolossa, Dorothea. / Uncertainty propagation through deep neural networks. In: Unknown Journal. 2015 ; Vol. 2015-January. pp. 3561-3565.
@article{f0a5842909da4417a237d7ddd132a0ac,
title = "Uncertainty propagation through deep neural networks",
abstract = "In order to improve the ASR performance in noisy environments, distorted speech is typically pre-processed by a speech enhancement algorithm, which usually results in a speech estimate containing residual noise and distortion. We may also have some measures of uncertainty or variance of the estimate. Uncertainty decoding is a framework that utilizes this knowledge of uncertainty in the input features during acoustic model scoring. Such frameworks have been well explored for traditional probabilistic models, but their optimal use for deep neural network (DNN)-based ASR systems is not yet clear. In this paper, we study the propagation of observation uncertainties through the layers of a DNN-based acoustic model. Since this is intractable due to the nonlinearities of the DNN, we employ approximate propagation methods, including Monte Carlo sampling, the unscented transform, and the piecewise exponential approximation of the activation function, to estimate the distribution of acoustic scores. Finally, the expected value of the acoustic score distribution is used for decoding, which is shown to further improve the ASR accuracy on the CHiME database, relative to a highly optimized DNN baseline.",
keywords = "Deep Neural Networks, Noise-robust ASR, Observation Uncertainty, Uncertainty Propagation",
author = "Abdelaziz, {Ahmed Hussen} and Shinji Watanabe and Hershey, {John R.} and Emanuel Vincent and Dorothea Kolossa",
year = "2015",
language = "English",
volume = "2015-January",
pages = "3561--3565",
journal = "Unknown Journal",
}

TY - JOUR

T1 - Uncertainty propagation through deep neural networks

AU - Abdelaziz, Ahmed Hussen

AU - Watanabe, Shinji

AU - Hershey, John R.

AU - Vincent, Emanuel

AU - Kolossa, Dorothea

PY - 2015

Y1 - 2015

N2 - In order to improve the ASR performance in noisy environments, distorted speech is typically pre-processed by a speech enhancement algorithm, which usually results in a speech estimate containing residual noise and distortion. We may also have some measures of uncertainty or variance of the estimate. Uncertainty decoding is a framework that utilizes this knowledge of uncertainty in the input features during acoustic model scoring. Such frameworks have been well explored for traditional probabilistic models, but their optimal use for deep neural network (DNN)-based ASR systems is not yet clear. In this paper, we study the propagation of observation uncertainties through the layers of a DNN-based acoustic model. Since this is intractable due to the nonlinearities of the DNN, we employ approximate propagation methods, including Monte Carlo sampling, the unscented transform, and the piecewise exponential approximation of the activation function, to estimate the distribution of acoustic scores. Finally, the expected value of the acoustic score distribution is used for decoding, which is shown to further improve the ASR accuracy on the CHiME database, relative to a highly optimized DNN baseline.

KW - Deep Neural Networks

KW - Noise-robust ASR

KW - Observation Uncertainty

KW - Uncertainty Propagation

UR - http://www.scopus.com/inward/record.url?scp=84959121946&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84959121946&partnerID=8YFLogxK

M3 - Article

VL - 2015-January

SP - 3561

EP - 3565

JO - Unknown Journal

JF - Unknown Journal

ER -