In this paper, stream weight optimization for multi-modal speech recognition using audio information and visual information is examined. In a conventional multi-stream Hidden Markov Model (HMM) used in multi-modal speech recognition, a constraint in which the summation of audio and visual weight factors should be one is employed. This means balance between transition and observation probabilities of HMM is fixed. We study an effective weight estimation indicator when releasing the constraint. Recognition experiments were conducted using an audio-visual corpus CENSREC-1-AV . In noisy environments, effectiveness of deactivating the constraint is clarified for improving recognition accuracy. Subsequently higher-order statistical parameter (kurtosis) based stream weights were proposed and tested. Through recognition experiments, it is found proposed stream weights are successful.
|出版ステータス||Published - 2015|
|イベント||1st Joint Conference on Facial Analysis, Animation, and Auditory-Visual Speech Processing, FAAVSP 2015 - Vienna, Austria|
継続期間: 2015 9月 11 → 2015 9月 13
|Conference||1st Joint Conference on Facial Analysis, Animation, and Auditory-Visual Speech Processing, FAAVSP 2015|
|Period||15/9/11 → 15/9/13|
ASJC Scopus subject areas