CENSREC-AV: Evaluation frameworks for audio-visual speech recognition

Satoshi Tamura, Chiyomi Miyajima, Norihide Kitaoka, Satoru Hayamizu, Kazuya Takeda

Research output: Contribution to conference, Paper, peer-review

Abstract

This paper introduces forthcoming evaluation frameworks for bimodal speech recognition in noisy conditions and real environments. As a route to robust speech recognition in noisy environments, bimodal speech recognition, which uses both acoustic and visual information, has received particular attention over the past decade. Because many methods and techniques for bimodal speech recognition have been proposed, a common evaluation framework, including audio-visual speech data and a baseline system, is needed to assess and compare these techniques and bimodal speech recognition schemes. Two audio-visual evaluation frameworks, CENSREC-1-AV and CENSREC-2-AV, are being built by the CENSREC project in Japan: CENSREC-1-AV includes waveforms and image sequences with artificially added noise, whereas CENSREC-2-AV consists of audio-visual data recorded in in-car environments. A baseline method and its recognition results will also be provided with these corpora.

Original language: English
Pages: 51-54
Number of pages: 4
Publication status: Published - 2008
Event: 2008 International Conference on Auditory-Visual Speech Processing, AVSP 2008 - Moreton Island, Australia
Duration: 2008 Sep 26 to 2008 Sep 29

Conference

Conference: 2008 International Conference on Auditory-Visual Speech Processing, AVSP 2008
Country/Territory: Australia
City: Moreton Island
Period: 08/9/26 to 08/9/29

Keywords

  • Audio-visual speech corpus
  • Bimodal speech recognition
  • Evaluation framework
  • Noisy environments

ASJC Scopus subject areas

  • Language and Linguistics
  • Speech and Hearing
  • Otorhinolaryngology
