Stream Weight Estimation Using Higher Order Statistics in Multi-modal Speech Recognition

Kazuto Ukai, Satoshi Tamura, Satoru Hayamizu

Research output: Contribution to conference › Paper › peer-review

Abstract

This paper examines stream weight optimization for multi-modal speech recognition that uses audio and visual information. A conventional multi-stream Hidden Markov Model (HMM) for multi-modal speech recognition constrains the audio and visual weight factors to sum to one, which fixes the balance between the transition and observation probabilities of the HMM. We study an effective indicator for estimating the weights when this constraint is released. Recognition experiments were conducted on the audio-visual corpus CENSREC-1-AV [1]. In noisy environments, releasing the constraint is shown to improve recognition accuracy. Stream weights based on a higher-order statistical parameter (kurtosis) are then proposed and tested, and the recognition experiments show that the proposed stream weights are effective.
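As a rough illustration of the quantities the abstract refers to, the sketch below shows the usual weighted combination of per-stream observation log-likelihoods in a multi-stream HMM state, with and without the sum-to-one constraint, together with a plain sample kurtosis. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names and the numeric values are hypothetical, and the abstract does not say how kurtosis is mapped to stream weights, so that step is deliberately left out.

```python
from typing import Sequence


def combined_log_likelihood(log_b_audio: float, log_b_visual: float,
                            w_audio: float, w_visual: float) -> float:
    """Weighted sum of per-stream observation log-likelihoods for one state.

    Hypothetical helper: with w_audio + w_visual == 1 the overall scale of the
    observation score is fixed relative to the transition log-probabilities;
    dropping that constraint lets the scale change as well.
    """
    return w_audio * log_b_audio + w_visual * log_b_visual


def kurtosis(samples: Sequence[float]) -> float:
    """Population kurtosis m4 / m2**2 (about 3 for Gaussian data)."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean = sum(samples) / n
    m2 = sum((x - mean) ** 2 for x in samples) / n  # second central moment
    m4 = sum((x - mean) ** 4 for x in samples) / n  # fourth central moment
    if m2 == 0.0:
        raise ValueError("kurtosis is undefined for constant data")
    return m4 / (m2 * m2)


# Illustrative values only (not taken from the paper).
log_b_audio, log_b_visual = -10.0, -4.0

# Conventional setting: the two weights sum to one.
constrained = combined_log_likelihood(log_b_audio, log_b_visual, 0.5, 0.5)

# Constraint released: the weights are chosen independently.
unconstrained = combined_log_likelihood(log_b_audio, log_b_visual, 0.5, 1.0)
```

In the unconstrained call the visual stream contributes more to the state score while the audio contribution is unchanged, which also changes how strongly the observation score counts against the transition probabilities.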

Original language: English
Pages: 181-184
Number of pages: 4
Publication status: Published - 2015
Event: 1st Joint Conference on Facial Analysis, Animation, and Auditory-Visual Speech Processing, FAAVSP 2015 - Vienna, Austria
Duration: 2015 Sep 11 to 2015 Sep 13

Conference

Conference: 1st Joint Conference on Facial Analysis, Animation, and Auditory-Visual Speech Processing, FAAVSP 2015
Country/Territory: Austria
City: Vienna
Period: 15/9/11 to 15/9/13

Keywords

  • Kurtosis
  • Multi-modal speech recognition
  • Multi-stream HMM
  • Stream weight optimization

ASJC Scopus subject areas

  • Language and Linguistics
  • Speech and Hearing
  • Otorhinolaryngology
