This paper introduces upcoming evaluation frameworks for bimodal speech recognition under noisy conditions and in real environments. To achieve robust speech recognition in noisy environments, bimodal speech recognition, which uses both acoustic and visual information, has attracted particular attention over the past decade. Because many methods and techniques for bimodal speech recognition have been proposed, a common evaluation framework, including audio-visual speech data and a baseline system, is needed to evaluate and compare these techniques and recognition schemes. Two audio-visual evaluation frameworks, CENSREC-1-AV and CENSREC-2-AV, are being built by the CENSREC project in Japan. CENSREC-1-AV contains waveforms with artificially added noise together with the corresponding image sequences, whereas CENSREC-2-AV consists of audio-visual data recorded in in-car environments. A baseline method and its recognition results will also be provided with these corpora.
|Publication status||Published - 2008|
|Event||2008 International Conference on Auditory-Visual Speech Processing, AVSP 2008 - Moreton Island, Australia|
|Duration||26 September 2008 → 29 September 2008|