Development of audio-visual speech corpus toward speaker-independent Japanese LVCSR

Kazuto Ukai, Satoshi Tamura, Satoru Hayamizu

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

1 Citation (Scopus)

Abstract

Building corpora is essential to research on Large Vocabulary Continuous Speech Recognition (LVCSR). In addition, visual information such as lip images is effective in overcoming the performance degradation caused by noise. In this paper, therefore, we focus on collecting speech and lip-image data for audio-visual LVCSR. Audio-visual speech data were obtained from 12 speakers, each of whom uttered the ATR503 phonetically balanced sentences. These data were recorded in acoustically and visually clean environments. Using the data, we conducted recognition experiments. Mel Frequency Cepstral Coefficients (MFCCs) and eigenlip features were extracted, and multi-stream Hidden Markov Models (HMMs) were built. We compared the performance in the clean condition with that in noisy environments. We found that visual information can compensate for the performance degradation. The results also indicate that visual speech recognition must be improved to achieve high-performance audio-visual LVCSR.
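The abstract names eigenlip features and multi-stream HMMs without giving details. The following is a minimal sketch of both ideas, not the authors' implementation. It assumes eigenlips are obtained by PCA over flattened grayscale lip images, and that the multi-stream HMM combines per-stream state log-likelihoods with weights that sum to one, which is a common convention. The function names, the `audio_weight` parameter, and the image sizes are hypothetical.

```python
import numpy as np


def fit_eigenlips(images, n_components):
    """Estimate an eigenlip basis by PCA (assumed method).

    images: array of shape (n_frames, height * width), one flattened
    grayscale lip image per row.
    Returns the mean image and the top principal axes as rows.
    """
    mean = images.mean(axis=0)
    centered = images - mean
    # Rows of vt are the principal axes, ordered by decreasing variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return mean, vt[:n_components]


def eigenlip_features(images, mean, basis):
    """Project lip images onto the eigenlip basis to get visual features."""
    return (images - mean) @ basis.T


def multistream_log_likelihood(log_b_audio, log_b_visual, audio_weight):
    """Combine per-stream state log-likelihoods with stream weights.

    Assumes the weights are audio_weight and (1 - audio_weight);
    the actual weighting used in the paper is not stated in the abstract.
    """
    visual_weight = 1.0 - audio_weight
    return audio_weight * log_b_audio + visual_weight * log_b_visual


# Illustrative usage with random data standing in for real lip images.
rng = np.random.default_rng(0)
lip_images = rng.random((20, 64))          # 20 frames of 8x8 pixels
mean_image, basis = fit_eigenlips(lip_images, n_components=5)
visual_features = eigenlip_features(lip_images, mean_image, basis)
```

Under this weighting, lowering `audio_weight` in noisy conditions shifts the decision toward the visual stream, which is one way visual information could offset degraded audio.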

Original language: English
Title of host publication: 2016 Conference of the Oriental Chapter of International Committee for Coordination and Standardization of Speech Databases and Assessment Techniques, O-COCOSDA 2016
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 12-15
Number of pages: 4
ISBN (Electronic): 9781509035168
Publication status: Published - 2017 May 3
Externally published: Yes
Event: 19th Annual Conference of the Oriental Chapter of International Committee for Coordination and Standardization of Speech Databases and Assessment Techniques, O-COCOSDA 2016 - Bali, Indonesia
Duration: 2016 Oct 26 to 2016 Oct 28

Publication series

Name: 2016 Conference of the Oriental Chapter of International Committee for Coordination and Standardization of Speech Databases and Assessment Techniques, O-COCOSDA 2016

Other

Other: 19th Annual Conference of the Oriental Chapter of International Committee for Coordination and Standardization of Speech Databases and Assessment Techniques, O-COCOSDA 2016
Country: Indonesia
City: Bali
Period: 16/10/26 to 16/10/28

Keywords

  • audio-visual speech recognition
  • lipreading
  • LVCSR
  • multi-stream HMM

ASJC Scopus subject areas

  • Information Systems
  • Signal Processing
  • Information Systems and Management
  • Linguistics and Language
