Automatic lip reading by using multimodal visual features

Shohei Takahashi*, Jun Ohya

*Corresponding author for this work

Research output: Conference contribution

Abstract

Speech recognition has been studied for a long time, but it does not work well in noisy places such as cars or trains. In addition, people who are hearing-impaired or have difficulty hearing cannot benefit from speech recognition. Visual information is also important for recognizing speech automatically: people understand speech not only from audio information but also from visual information such as temporal changes in lip shape. A vision-based speech recognition method could work well in noisy places and could also be useful for people with hearing disabilities. In this paper, we propose an automatic lip-reading method that recognizes speech using multimodal visual information, without using any audio information. First, an ASM (Active Shape Model) is used to detect and track the face and lips in a video sequence. Second, shape, optical flow, and spatial frequency features are extracted from the lip region detected by the ASM. Next, the extracted multimodal features are ordered chronologically, and a Support Vector Machine is trained on them to learn and classify the spoken words. Experiments on classifying several words show promising results for the proposed method.
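The chronological ordering step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the key names `shape`, `flow`, and `freq`, the function names, and the nearest-frame resampling to a fixed number of frames are all assumptions made here so that utterances of different lengths yield vectors of equal size. The resulting vectors could then be passed to any Support Vector Machine implementation.

```python
from typing import Dict, List

# Hypothetical per-frame record: one list of numbers per visual modality.
Frame = Dict[str, List[float]]


def frame_vector(frame: Frame) -> List[float]:
    """Concatenate the three modalities of a single frame (key names are assumed)."""
    return list(frame["shape"]) + list(frame["flow"]) + list(frame["freq"])


def sequence_vector(frames: List[Frame], num_frames: int) -> List[float]:
    """Build one fixed-length vector from a video, keeping frames in time order.

    Assumption: frames are sampled at evenly spaced positions so that every
    utterance produces a vector of the same length.
    """
    if not frames:
        raise ValueError("frames must not be empty")
    if num_frames <= 0:
        raise ValueError("num_frames must be positive")

    last = len(frames) - 1
    vector: List[float] = []
    for i in range(num_frames):
        # Index of the source frame closest to the i-th evenly spaced position.
        src = round(i * last / (num_frames - 1)) if num_frames > 1 else 0
        vector.extend(frame_vector(frames[src]))
    return vector


if __name__ == "__main__":
    # Toy data: five frames whose feature values encode the frame index.
    video = [{"shape": [t], "flow": [t * 10], "freq": [t * 100]} for t in range(5)]
    print(sequence_vector(video, num_frames=3))
```

Sampling a fixed number of frames is only one way to obtain equal-length inputs; the abstract does not say how variable-length utterances were handled.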

Original language: English
Title of host publication: Proceedings of SPIE-IS and T Electronic Imaging - Intelligent Robots and Computer Vision XXXI
Subtitle of host publication: Algorithms and Techniques
DOI
Publication status: Published - 2014-03-17
Event: Intelligent Robots and Computer Vision XXXI: Algorithms and Techniques - San Francisco, CA, United States
Duration: 2014-02-04 to 2014-02-06

Publication series

Name: Proceedings of SPIE - The International Society for Optical Engineering
Volume: 9025
ISSN (Print): 0277-786X
ISSN (Electronic): 1996-756X

Conference

Conference: Intelligent Robots and Computer Vision XXXI: Algorithms and Techniques
Country/Territory: United States
City: San Francisco, CA
Period: 2014-02-04 to 2014-02-06

ASJC Scopus subject areas

  • Electronic, Optical and Magnetic Materials
  • Condensed Matter Physics
  • Computer Science Applications
  • Applied Mathematics
  • Electrical and Electronic Engineering
