Multi-angle lipreading using angle classification and angle-specific feature integration

Shinnosuke Isobe, Satoshi Tamura, Satoru Hayamizu, Yuuto Gotoh, Masaki Nose

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

1 Citation (Scopus)

Abstract

Recently, visual speech recognition (VSR), also known as lipreading, has been widely researched owing to the development of deep learning (DL). Most lipreading research focuses only on frontal face images. In real scenes, however, a lipreading system should correctly recognize spoken content not only from frontal faces but also from side faces. In this paper, we propose a novel lipreading method that is applicable to faces captured at any angle, using convolutional neural networks (CNNs), one of the key deep-learning techniques. Our method consists of three parts: the view classification part, the feature extraction part, and the integration part. First, we apply angle classification to the input faces. Second, based on the classification results, we determine the best combination of pre-trained angle-specific feature extraction schemes. Finally, we integrate these features, followed by DL-based lipreading. We evaluated our method on the open OuluVS2 dataset, which includes multi-angle audiovisual data. We confirmed that our approach achieved the best performance among conventional and other DL-based lipreading schemes in the phrase classification task.
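
The three-part flow described in the abstract can be sketched roughly as follows. This is only an illustrative outline, not the authors' implementation: the abstract does not specify network architectures, how the best extractor combination is chosen, or how features are integrated. All function names, the `combos` lookup table, and the use of plain concatenation for integration are assumptions made for this sketch, and the trained CNNs are represented by ordinary callables. The five angles are the camera views of OuluVS2.

```python
from typing import Callable, Dict, List, Sequence

Frame = List[float]   # toy stand-in for one flattened mouth-region image
Clip = List[Frame]    # a sequence of frames for one utterance
FeatureFn = Callable[[Clip], List[float]]

# Camera views in degrees (the five views of OuluVS2).
ANGLES = (0, 30, 45, 60, 90)


def classify_angle(clip: Clip,
                   angle_scores: Callable[[Clip], Sequence[float]]) -> int:
    """Part 1 (view classification): return the most likely angle.

    `angle_scores` is a hypothetical stand-in for a trained CNN that
    returns one score per entry of ANGLES.
    """
    scores = angle_scores(clip)
    best = max(range(len(ANGLES)), key=lambda i: scores[i])
    return ANGLES[best]


def select_extractors(angle: int,
                      extractors: Dict[int, FeatureFn],
                      combos: Dict[int, Sequence[int]]) -> List[FeatureFn]:
    """Part 2 (feature extraction): pick the angle-specific extractors.

    `combos` is a hypothetical table mapping a predicted angle to the
    angles of the pre-trained extractors to combine for that view.
    """
    return [extractors[a] for a in combos[angle]]


def integrate(clip: Clip, selected: Sequence[FeatureFn]) -> List[float]:
    """Part 3 (integration): concatenate the selected features.

    Concatenation is an assumption; the abstract only says the
    features are integrated.
    """
    fused: List[float] = []
    for extract in selected:
        fused.extend(extract(clip))
    return fused


def lipread(clip: Clip,
            angle_scores: Callable[[Clip], Sequence[float]],
            extractors: Dict[int, FeatureFn],
            combos: Dict[int, Sequence[int]],
            phrase_classifier: Callable[[List[float]], int]) -> int:
    """Run the full pipeline and return a phrase class index."""
    angle = classify_angle(clip, angle_scores)
    selected = select_extractors(angle, extractors, combos)
    return phrase_classifier(integrate(clip, selected))
```

In this sketch the angle classifier acts purely as a router: a misclassified view selects a different set of extractors, which is one reason a combination of several angle-specific extractors, rather than a single one, may be preferable.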

Original language: English
Title of host publication: ICCSPA 2020 - 4th International Conference on Communications, Signal Processing, and their Applications
Publisher: Institute of Electrical and Electronics Engineers Inc.
ISBN (Electronic): 9781728165356
Publication status: Published - 2021 Mar 16
Externally published: Yes
Event: 4th International Conference on Communications, Signal Processing, and their Applications, ICCSPA 2020 - Sharjah, United Arab Emirates
Duration: 2021 Mar 16 → 2021 Mar 18

Publication series

Name: ICCSPA 2020 - 4th International Conference on Communications, Signal Processing, and their Applications
Volume: 2021-January

Conference

Conference: 4th International Conference on Communications, Signal Processing, and their Applications, ICCSPA 2020
Country/Territory: United Arab Emirates
City: Sharjah
Period: 21/3/16 → 21/3/18

Keywords

  • Deep-learning
  • Multi-angle lipreading
  • View classification
  • Visual speech recognition

ASJC Scopus subject areas

  • Signal Processing
  • Computer Networks and Communications
  • Computer Science Applications
