Audio-visual processing toward robust speech recognition in cars

Satoshi Tamura, Hiroshi Ninomiya, Norihide Kitaoka, Shin Osuga, Yurie Iribe, Kazuya Takeda, Satoru Hayamizu

Research output: Conference contribution

Abstract

This paper reports our recent efforts to develop robust speech recognition in cars. Speech recognition is expected to be used to operate many devices in cars. However, in-car environments contain many kinds of acoustic noise, such as engine noise and the car stereo, and this noise degrades speech recognition performance. To overcome this degradation, we develop a high-performance audio-visual speech recognition method. Lip images are extracted from captured face images using our face detection scheme. Basic visual features are computed from these images and then converted into visual features for speech recognition using a deep neural network. Audio features are extracted as well. The audio and visual features are then concatenated into audio-visual features. As the recognition model, we employ a multi-stream hidden Markov model, which can adjust the contributions of the audio and visual modalities. We evaluated the proposed method on the audio-visual corpus CENSREC-1-AV. To simulate driving conditions, we prepared driving noise and music noise. Experimental results show that our method significantly improves recognition performance under in-car conditions.
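The abstract says that the multi-stream HMM adjusts the contributions of the audio and visual modalities. In a common formulation, each state's log output probability is a weighted sum of per-stream log probabilities, with stream weights that sum to one. The sketch below shows that idea on a concatenated audio-visual feature vector. It is a minimal illustration under stated assumptions: one diagonal Gaussian per stream per state, and hypothetical names (`multistream_log_likelihood`, `audio_weight`, and the state dictionary keys). It is not the authors' implementation, which this record does not describe.

```python
import math


def diag_gauss_logpdf(x, mean, var):
    """Log density of a diagonal-covariance Gaussian (assumed per-stream model)."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )


def multistream_log_likelihood(av_feature, n_audio, state, audio_weight):
    """Stream-weighted log output probability of one HMM state.

    av_feature: concatenated [audio features..., visual features...]
    n_audio: number of audio dimensions at the front of av_feature
    state: hypothetical dict holding per-stream Gaussian parameters
    audio_weight: audio stream weight; the visual weight is 1 - audio_weight
    """
    # Split the concatenated audio-visual vector back into its two streams.
    o_a, o_v = av_feature[:n_audio], av_feature[n_audio:]
    lam_a = audio_weight
    lam_v = 1.0 - audio_weight
    log_b_a = diag_gauss_logpdf(o_a, state["audio_mean"], state["audio_var"])
    log_b_v = diag_gauss_logpdf(o_v, state["visual_mean"], state["visual_var"])
    return lam_a * log_b_a + lam_v * log_b_v


# Usage: a toy 2-D audio feature and 1-D visual feature, concatenated.
audio = [0.0, 0.0]
visual = [0.0]
av = audio + visual
state = {
    "audio_mean": [0.0, 0.0], "audio_var": [1.0, 1.0],
    "visual_mean": [0.0], "visual_var": [1.0],
}
score = multistream_log_likelihood(av, len(audio), state, audio_weight=0.7)
```

Lowering `audio_weight` moves the score's dependence toward the lip stream. In this kind of model, that is how one would compensate as acoustic noise in the car gets worse.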

Original language: English
Host publication title: 7th Biennial Workshop on Digital Signal Processing for In-Vehicle Systems and Safety 2015
Publisher: University of Texas at Dallas
Pages: 31-34
Number of pages: 4
ISBN (electronic): 9781510827844
Publication status: Published - 2015
Externally published: Yes
Event: 7th Biennial Workshop on Digital Signal Processing for In-Vehicle Systems and Safety 2015 - Berkeley, United States
Duration: 14 Oct 2015 to 16 Oct 2015

Publication series

Name: 7th Biennial Workshop on Digital Signal Processing for In-Vehicle Systems and Safety 2015

Other

Other: 7th Biennial Workshop on Digital Signal Processing for In-Vehicle Systems and Safety 2015
Country: United States
City: Berkeley
Period: 14 Oct 2015 to 16 Oct 2015

ASJC Scopus subject areas

  • Signal Processing
  • Automotive Engineering
  • Control and Systems Engineering
