A robust audio-visual speech recognition using Audio-Visual Voice Activity Detection

Satoshi Tamura*, Masato Ishikawa, Takashi Hashiba, Shin'ichi Takeuchi, Satoru Hayamizu

*Corresponding author for this work

Research output

8 Citations (Scopus)

Abstract

This paper proposes a novel speech recognition method combining Audio-Visual Voice Activity Detection (AVVAD) and Audio-Visual Automatic Speech Recognition (AVASR). AVASR has been developed to enhance the robustness of ASR in noisy environments by using visual information in addition to acoustic features. Similarly, AVVAD improves the precision of VAD, which detects the presence of speech in an audio signal, under noisy conditions. In our approach, AVVAD is applied as a preprocessing step before an AVASR system, yielding a significantly more robust speech recognizer. To evaluate the proposed system, recognition experiments were conducted on noisy audio-visual data, testing several AVVAD approaches. The results show that the proposed AVASR system using the model-free feature-fusion AVVAD method outperforms not only non-VAD audio-only ASR but also conventional AVASR.
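The model-free feature-fusion idea behind AVVAD can be sketched as follows: per-frame audio and visual activity scores are fused into a single score and thresholded, with no statistical model involved. This is a minimal illustrative sketch, assuming simple min-max normalization and a weighted sum; the function name, weights, and thresholding rule are assumptions for illustration, not the authors' actual configuration.

```python
import numpy as np

def fused_vad(audio_scores, visual_scores, w_audio=0.7, w_visual=0.3, threshold=0.5):
    """Model-free feature-fusion VAD sketch (illustrative, not the paper's
    exact method): fuse per-frame audio and visual activity scores with a
    weighted sum, then threshold to get a speech/non-speech decision."""
    def norm(x):
        # Min-max normalize a per-frame score stream to [0, 1]
        x = np.asarray(x, dtype=float)
        rng = x.max() - x.min()
        return (x - x.min()) / rng if rng > 0 else np.zeros_like(x)

    # Weighted fusion of the two modalities, one score per frame
    score = w_audio * norm(audio_scores) + w_visual * norm(visual_scores)
    # Boolean mask: True where the fused score indicates speech
    return score > threshold
```

Frames flagged as non-speech by such a detector would be discarded before the AVASR stage, which is what makes the combined system more robust than feeding the recognizer the raw noisy stream.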

Original language: English
Title of host publication: Proceedings of the 11th Annual Conference of the International Speech Communication Association, INTERSPEECH 2010
Publisher: International Speech Communication Association
Pages: 2694-2697
Number of pages: 4
Publication status: Published - 2010
Externally published: Yes

Publication series

Name: Proceedings of the 11th Annual Conference of the International Speech Communication Association, INTERSPEECH 2010

ASJC Scopus subject areas

  • Language and Linguistics
  • Speech and Hearing

