Variational Bayesian estimation and clustering for speech recognition

Shinji Watanabe*, Yasuhiro Minami, Atsushi Nakamura, Naonori Ueda

*Corresponding author for this work

Research output: Article (peer-reviewed)

78 Citations (Scopus)

Abstract

In this paper, we propose variational Bayesian estimation and clustering for speech recognition (VBEC), which is based on the variational Bayesian (VB) approach. VBEC is a total Bayesian framework: all speech recognition procedures (acoustic modeling and speech classification) are based on VB posterior distributions, unlike the maximum likelihood (ML) approach, which is based on ML parameters. The total Bayesian framework offers two major advantages over the ML approach in mitigating over-training effects: it can select an appropriate model structure without any condition on the data set size, and it can classify categories robustly using a predictive posterior distribution. By using these advantages, VBEC 1) allows the automatic construction of acoustic models along two separate dimensions, namely, clustering triphone hidden Markov model states and determining the number of Gaussians, and 2) enables robust speech classification based on Bayesian predictive classification using VB posterior distributions. The capabilities of the VBEC functions were confirmed in large vocabulary continuous speech recognition experiments for read and spontaneous speech tasks. The experiments confirmed that VBEC automatically constructed accurate acoustic models and robustly classified speech, i.e., the VBEC functions totally mitigated the over-training effects and yielded high word accuracies.
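The contrast between ML plug-in scoring and Bayesian predictive classification can be illustrated with a toy sketch. The code below is not the VBEC implementation: it assumes a single one-dimensional Gaussian per class with a conjugate Normal-Gamma prior, so the posterior is exact and stands in for the VB posterior that the paper uses for HMMs with Gaussian mixtures. The hyperparameters (`mu0`, `kappa0`, `alpha0`, `beta0`), the function names, and the sample data are all hypothetical.

```python
import math

def normal_gamma_posterior(xs, mu0=0.0, kappa0=1.0, alpha0=1.0, beta0=1.0):
    """Posterior hyperparameters of a Normal-Gamma prior after observing xs.

    The prior hyperparameters are illustrative assumptions, not values
    taken from the paper.
    """
    n = len(xs)
    xbar = sum(xs) / n
    s = sum((x - xbar) ** 2 for x in xs)  # scatter around the sample mean
    kappa_n = kappa0 + n
    mu_n = (kappa0 * mu0 + n * xbar) / kappa_n
    alpha_n = alpha0 + n / 2.0
    beta_n = beta0 + 0.5 * s + kappa0 * n * (xbar - mu0) ** 2 / (2.0 * kappa_n)
    return mu_n, kappa_n, alpha_n, beta_n

def log_predictive(x, post):
    """Log posterior predictive density: a Student-t obtained by
    integrating the Gaussian over the posterior of mean and precision."""
    mu_n, kappa_n, alpha_n, beta_n = post
    nu = 2.0 * alpha_n                                   # degrees of freedom
    scale2 = beta_n * (kappa_n + 1.0) / (alpha_n * kappa_n)
    z2 = (x - mu_n) ** 2 / (nu * scale2)
    return (math.lgamma((nu + 1.0) / 2.0) - math.lgamma(nu / 2.0)
            - 0.5 * math.log(nu * math.pi * scale2)
            - (nu + 1.0) / 2.0 * math.log1p(z2))

def log_ml_plugin(x, xs):
    """Log Gaussian density with ML point estimates plugged in."""
    n = len(xs)
    mean = sum(xs) / n
    var = sum((v - mean) ** 2 for v in xs) / n           # ML (biased) variance
    return -0.5 * math.log(2.0 * math.pi * var) - (x - mean) ** 2 / (2.0 * var)

def classify(x, training, scorer):
    """Return the class label whose score for x is highest."""
    return max(training, key=lambda label: scorer(x, training[label]))

# Hypothetical training data: class "a" has only three tightly grouped samples.
training = {
    "a": [0.0, 0.1, -0.1],
    "b": [2.0, 3.0, 4.0, 2.5, 3.5, 3.0, 1.5, 4.5],
}

def predictive_scorer(x, xs):
    return log_predictive(x, normal_gamma_posterior(xs))

if __name__ == "__main__":
    x = 1.0
    for label, xs in training.items():
        print(label, log_ml_plugin(x, xs), predictive_scorer(x, xs))
    print("ML plug-in :", classify(x, training, log_ml_plugin))
    print("Predictive :", classify(x, training, predictive_scorer))
```

With only three samples, the ML variance estimate for class "a" is very small, so the plug-in density drops sharply away from the sample mean. The predictive density has heavier tails because it accounts for the remaining uncertainty in the parameters. How much the two scores differ depends on the assumed prior.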

Original language: English
Pages (from-to): 365-381
Number of pages: 17
Journal: IEEE Transactions on Speech and Audio Processing
Volume: 12
Issue number: 4
DOI
Publication status: Published - 1 July 2004
Externally published: Yes

ASJC Scopus subject areas

  • Software
  • Acoustics and Ultrasonics
  • Computer Vision and Pattern Recognition
  • Electrical and Electronic Engineering
