Stream Weight Estimation Using Higher Order Statistics in Multi-modal Speech Recognition

Kazuto Ukai, Satoshi Tamura, Satoru Hayamizu

Research output: Paper, peer-reviewed

Abstract

This paper examines stream weight optimization for multi-modal speech recognition that uses both audio and visual information. A conventional multi-stream Hidden Markov Model (HMM) for multi-modal speech recognition constrains the audio and visual stream weights to sum to one, which fixes the balance between the transition and observation probabilities of the HMM. We study an effective indicator for estimating the weights when this constraint is released. Recognition experiments were conducted using the audio-visual corpus CENSREC-1-AV [1]. In noisy environments, releasing the constraint is shown to improve recognition accuracy. Stream weights based on a higher-order statistical parameter (kurtosis) are then proposed and tested, and the recognition experiments show that the proposed stream weights are effective.
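
As a rough illustration of the two ingredients the abstract names, the sketch below combines per-stream log-likelihoods with stream weights and computes the kurtosis of a set of values. It assumes the usual log-linear form of a multi-stream HMM observation score. The function names, the example numbers, and the choice of what the kurtosis is computed over are hypothetical, and the abstract does not say how kurtosis is mapped to stream weights, so that step is deliberately left out.

```python
def combined_log_likelihood(log_b_audio, log_b_visual, w_audio, w_visual):
    """Stream-weighted observation log-likelihood for one HMM state.

    Assumes the common log-linear combination
    log b = w_audio * log b_audio + w_visual * log b_visual.
    """
    return w_audio * log_b_audio + w_visual * log_b_visual


def excess_kurtosis(values):
    """Population excess kurtosis: m4 / m2**2 - 3 (zero for a Gaussian)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n  # second central moment
    m4 = sum((v - mean) ** 4 for v in values) / n  # fourth central moment
    if m2 == 0.0:
        raise ValueError("kurtosis is undefined for constant input")
    return m4 / (m2 * m2) - 3.0


# Hypothetical per-stream log-likelihoods for a single frame.
log_b_audio, log_b_visual = -10.0, -20.0

# Conventional setting: the two weights sum to one.
constrained = combined_log_likelihood(log_b_audio, log_b_visual, 0.7, 0.3)

# Constraint released: same audio/visual ratio, but the sum is 0.5, so the
# observation term counts for less against the transition log-probabilities.
relaxed = combined_log_likelihood(log_b_audio, log_b_visual, 0.35, 0.15)
```

With the sum-to-one constraint only the ratio between the streams can change. Releasing it also lets the overall scale of the observation term vary, which is the balance against the transition probabilities that the abstract refers to.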

Original language: English
Pages: 181-184
Number of pages: 4
Publication status: Published - 2015
Event: 1st Joint Conference on Facial Analysis, Animation, and Auditory-Visual Speech Processing, FAAVSP 2015 - Vienna, Austria
Duration: 11 Sep 2015 to 13 Sep 2015

Conference

Conference: 1st Joint Conference on Facial Analysis, Animation, and Auditory-Visual Speech Processing, FAAVSP 2015
Country/Territory: Austria
City: Vienna
Period: 15/9/11 to 15/9/13

ASJC Scopus subject areas

  • Language and Linguistics
  • Speech and Hearing
  • Otorhinolaryngology
