Lipreading using deep bottleneck features for optical and depth images

Satoshi Tamura, Koichi Miyazaki, Satoru Hayamizu

Research output: Contribution to conference › Paper › peer-review

Abstract

This paper investigates a lipreading scheme that employs optical and depth modalities using deep bottleneck features. Optical and depth data are captured by a Microsoft Kinect v2, and an appearance-based feature set is computed for each modality. Each basic feature set is then converted into a deep bottleneck feature using a deep neural network with a bottleneck layer. Multi-stream hidden Markov models are used for recognition. We evaluated the method on our connected-digit corpus, comparing it with our previous method, and found that employing deep bottleneck features improves lipreading performance.
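The deep bottleneck features described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the layer widths, the 30-dimensional input, the 9-dimensional bottleneck, and the 11-class output are all assumed for the example, and a real system would first train the full network as a classifier before discarding the layers above the bottleneck.

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    return np.maximum(0.0, x)

# Illustrative layer widths (not from the paper):
# input -> hidden -> bottleneck -> hidden -> output classes.
sizes = [30, 64, 9, 64, 11]
weights = [rng.standard_normal((m, n)) * 0.1
           for m, n in zip(sizes[:-1], sizes[1:])]
biases = [np.zeros(n) for n in sizes[1:]]

def bottleneck_features(x, bottleneck_index=2):
    """Forward pass; return activations of the narrow (bottleneck) layer.

    After training the network, the layers above the bottleneck are
    discarded and the bottleneck activations serve as compact features
    for a downstream recognizer (here, multi-stream HMMs).
    """
    h = x
    for i, (W, b) in enumerate(zip(weights, biases), start=1):
        h = relu(h @ W + b)
        if i == bottleneck_index:
            return h
    return h

frame = rng.standard_normal(30)   # one appearance-based feature vector
dbnf = bottleneck_features(frame)
print(dbnf.shape)                 # compact 9-dim bottleneck feature
```

In a two-modality setup such as this one, the same extraction would be run separately on the optical and depth feature streams, and the resulting bottleneck features fed to the multi-stream HMM recognizer.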

Original language: English
Pages: 76-77
Number of pages: 2
Publication status: Published - 2017
Event: 14th International Conference on Auditory-Visual Speech Processing, AVSP 2017 - Stockholm, Sweden
Duration: 2017 Aug 25 - 2017 Aug 26

Conference

Conference: 14th International Conference on Auditory-Visual Speech Processing, AVSP 2017
Country/Territory: Sweden
City: Stockholm
Period: 17/8/25 - 17/8/26

Keywords

  • deep bottleneck feature
  • depth information
  • lipreading
  • multi-stream HMM

ASJC Scopus subject areas

  • Language and Linguistics
  • Otorhinolaryngology
  • Speech and Hearing
