An improvement in audio-visual voice activity detection for automatic speech recognition

Takami Yoshida*, Kazuhiro Nakadai, Hiroshi G. Okuno

*Corresponding author for this work

Research output: Conference contribution

6 Citations (Scopus)

Abstract

Noise-robust Automatic Speech Recognition (ASR) is essential for robots that are expected to communicate with humans in everyday environments. In such environments, Voice Activity Detection (VAD) strongly affects ASR performance because there is a great deal of acoustic and visual noise. In this paper, we improve Audio-Visual VAD for our two-layered audio-visual integration framework for ASR by using hangover processing based on erosion and dilation. We implemented the proposed method in our audio-visual speech recognition system for a robot. Empirical results show the effectiveness of the proposed method in terms of VAD.
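
The abstract does not spell out how erosion and dilation are applied, so the following is only a minimal sketch of the general idea, assuming per-frame binary VAD decisions (1 = speech, 0 = non-speech). The function names (`erode`, `dilate`, `close_gaps`, `remove_spikes`), the `width` parameter, and the order of operations are illustrative assumptions, not the authors' implementation.

```python
def erode(frames, width):
    """A frame stays 1 only if every frame within `width` of it is 1."""
    n = len(frames)
    return [
        min(frames[max(0, i - width):min(n, i + width + 1)])
        for i in range(n)
    ]


def dilate(frames, width):
    """A frame becomes 1 if any frame within `width` of it is 1."""
    n = len(frames)
    return [
        max(frames[max(0, i - width):min(n, i + width + 1)])
        for i in range(n)
    ]


def close_gaps(frames, width=1):
    """Dilation then erosion: fills short non-speech gaps inside speech."""
    return erode(dilate(frames, width), width)


def remove_spikes(frames, width=1):
    """Erosion then dilation: removes short isolated speech detections."""
    return dilate(erode(frames, width), width)


if __name__ == "__main__":
    # Hypothetical frame-level decisions, not data from the paper.
    print(close_gaps([0, 0, 1, 1, 0, 1, 1, 0, 0, 0]))
    print(remove_spikes([0, 1, 0, 0, 1, 1, 1, 0, 0, 0]))
```

In this sketch, `width` plays the role of a hangover length in frames: larger values bridge longer pauses but also suppress shorter utterances.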

Original language: English
Title of host publication: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Pages: 51-61
Number of pages: 11
Volume: 6096 LNAI
PART 1
DOI
Publication status: Published - 2010
Externally published: Yes
Event: 23rd International Conference on Industrial Engineering and Other Applications of Applied Intelligence Systems, IEA/AIE 2010 - Cordoba
Duration: 2010 Jun 1 to 2010 Jun 4

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Number: PART 1
Volume: 6096 LNAI
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Other

Other: 23rd International Conference on Industrial Engineering and Other Applications of Applied Intelligence Systems, IEA/AIE 2010
City: Cordoba
Period: 10/6/1 to 10/6/4

ASJC Scopus subject areas

  • General Computer Science
  • Theoretical Computer Science
