Abstract
This paper examines stream weight optimization for multi-modal speech recognition that uses both audio and visual information. A conventional multi-stream Hidden Markov Model (HMM) for multi-modal speech recognition constrains the audio and visual stream weights to sum to one, which fixes the balance between the HMM's transition and observation probabilities. We study an effective indicator for estimating the weights when this constraint is released. Recognition experiments were conducted using the audio-visual corpus CENSREC-1-AV [1]. In noisy environments, releasing the constraint proved effective for improving recognition accuracy. Stream weights based on a higher-order statistic (kurtosis) were then proposed and tested, and the recognition experiments showed that the proposed stream weights are successful.
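To make the role of the sum-to-one constraint concrete, the following is a minimal sketch, not the authors' implementation. It assumes the common log-domain formulation in which a state's observation score is the weighted sum of the audio and visual log-likelihoods, added to an unweighted transition log-probability. The function names and all numeric values are hypothetical. The `kurtosis` helper computes only the standard statistic; the abstract does not say which quantity the kurtosis is taken over or how it is mapped to stream weights, so no such mapping is shown.

```python
def path_score(log_trans, log_b_audio, log_b_visual, w_audio, w_visual):
    """Log-domain score of one HMM transition under a multi-stream model.

    The observation term is the stream-weighted sum of the per-stream
    log-likelihoods; the transition term is not weighted (an assumption
    of this sketch).
    """
    return log_trans + w_audio * log_b_audio + w_visual * log_b_visual


def kurtosis(values):
    """Population kurtosis (fourth standardized moment); 3.0 for a Gaussian.

    Undefined (division by zero) when all values are identical.
    """
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    return m4 / (m2 * m2)


# Hypothetical log-probabilities for two competing transitions (illustrative only).
# A: likely transition, poor audio match. B: unlikely transition, good audio match.
path_a = dict(log_trans=-0.5, log_b_audio=-6.0, log_b_visual=-3.0)
path_b = dict(log_trans=-2.0, log_b_audio=-2.0, log_b_visual=-3.0)

# First pair satisfies the sum-to-one constraint; second pair releases it.
for w_audio, w_visual in [(0.5, 0.5), (0.1, 0.1)]:
    score_a = path_score(w_audio=w_audio, w_visual=w_visual, **path_a)
    score_b = path_score(w_audio=w_audio, w_visual=w_visual, **path_b)
    winner = "A" if score_a > score_b else "B"
    print(f"weights=({w_audio}, {w_visual}) sum={w_audio + w_visual:.1f} -> path {winner}")
```

With these illustrative numbers, both weight pairs keep the same audio-to-visual ratio, yet the preferred path changes: shrinking the total weight lets the transition probability dominate the observation term. This is the degree of freedom that the sum-to-one constraint removes.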
Original language | English |
---|---|
Pages | 181-184 |
Number of pages | 4 |
Publication status | Published - 2015 |
Event | 1st Joint Conference on Facial Analysis, Animation, and Auditory-Visual Speech Processing, FAAVSP 2015, Vienna, Austria. Duration: 2015 Sep 11 → 2015 Sep 13 |
Conference
Conference | 1st Joint Conference on Facial Analysis, Animation, and Auditory-Visual Speech Processing, FAAVSP 2015 |
---|---|
Country/Territory | Austria |
City | Vienna |
Period | 15/9/11 → 15/9/13 |
Keywords
- Kurtosis
- Multi-modal speech recognition
- Multi-stream HMM
- Stream weight optimization
ASJC Scopus subject areas
- Language and Linguistics
- Speech and Hearing
- Otorhinolaryngology