An improvement in audio-visual voice activity detection for automatic speech recognition

Takami Yoshida, Kazuhiro Nakadai, Hiroshi G. Okuno

Research output: Conference contribution

6 Citations (Scopus)

Abstract

Noise-robust Automatic Speech Recognition (ASR) is essential for robots that are expected to communicate with humans in daily environments. In such environments, Voice Activity Detection (VAD) strongly affects ASR performance because there is a great deal of acoustic and visual noise. In this paper, we improved Audio-Visual VAD for our two-layered audio-visual integration framework for ASR by using hangover processing based on erosion and dilation. We implemented the proposed method in our audio-visual speech recognition system for a robot. Empirical results show the effectiveness of the proposed method in terms of VAD.
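The abstract does not give the details of the hangover processing, so the following is only a minimal sketch of how erosion and dilation can smooth a sequence of per-frame speech/non-speech decisions. The function names, the window radii, and the order of operations (closing to fill short gaps, then opening to drop short bursts) are assumptions made for illustration and are not taken from the paper.

```python
from typing import List


def dilate(frames: List[int], radius: int) -> List[int]:
    """Mark a frame as speech (1) if any frame within `radius` is speech."""
    n = len(frames)
    return [
        int(any(frames[max(0, i - radius):min(n, i + radius + 1)]))
        for i in range(n)
    ]


def erode(frames: List[int], radius: int) -> List[int]:
    """Keep a frame as speech (1) only if every frame within `radius` is speech."""
    n = len(frames)
    return [
        int(all(frames[max(0, i - radius):min(n, i + radius + 1)]))
        for i in range(n)
    ]


def smooth_vad(frames: List[int], fill_radius: int, prune_radius: int) -> List[int]:
    """Hypothetical hangover step built from 1-D binary morphology.

    Assumed order (not confirmed by the source):
      1. closing (dilate, then erode) fills short pauses inside speech;
      2. opening (erode, then dilate) removes short isolated detections.
    """
    closed = erode(dilate(frames, fill_radius), fill_radius)
    return dilate(erode(closed, prune_radius), prune_radius)


# Illustrative per-frame decisions: one short pause (index 5) inside a
# speech segment and one isolated false detection (index 12).
raw = [0, 0, 1, 1, 1, 0, 1, 1, 1, 0, 0, 0, 1, 0, 0, 0]
smoothed = smooth_vad(raw, fill_radius=1, prune_radius=1)
print(smoothed)  # [0, 0, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
```

In this toy input the one-frame pause is bridged and the one-frame burst is discarded, which is the kind of segment-level smoothing a hangover scheme aims for. In practice the radii would be tuned to the frame rate and to the expected length of pauses and noise bursts.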

Original language: English
Title of host publication: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Pages: 51-61
Number of pages: 11
Volume: 6096 LNAI
Edition: PART 1
DOI: https://doi.org/10.1007/978-3-642-13022-9_6
Publication status: Published - 2010
Externally published: Yes
Event: 23rd International Conference on Industrial Engineering and Other Applications of Applied Intelligence Systems, IEA/AIE 2010 - Cordoba
Duration: 1 June 2010 to 4 June 2010

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Number: PART 1
Volume: 6096 LNAI
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Other

Other: 23rd International Conference on Industrial Engineering and Other Applications of Applied Intelligence Systems, IEA/AIE 2010
Cordoba
Period: 1 June 2010 to 4 June 2010

Fingerprint

Voice Activity Detection
Automatic Speech Recognition
Speech recognition
Robot
Acoustic noise
Erosion
Robots
Speech Recognition
Dilation
Vision
Processing

ASJC Scopus subject areas

  • Computer Science (all)
  • Theoretical Computer Science

Cite this

Yoshida, T., Nakadai, K., & Okuno, H. G. (2010). An improvement in audio-visual voice activity detection for automatic speech recognition. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (PART 1 ed., Vol. 6096 LNAI, pp. 51-61). (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 6096 LNAI, No. PART 1). https://doi.org/10.1007/978-3-642-13022-9_6

@inproceedings{2c8917c0f4e0406a9df5b52c9227a020,
title = "An improvement in audio-visual voice activity detection for automatic speech recognition",
abstract = "Noise-robust Automatic Speech Recognition (ASR) is essential for robots that are expected to communicate with humans in daily environments. In such environments, Voice Activity Detection (VAD) strongly affects ASR performance because there is a great deal of acoustic and visual noise. In this paper, we improved Audio-Visual VAD for our two-layered audio-visual integration framework for ASR by using hangover processing based on erosion and dilation. We implemented the proposed method in our audio-visual speech recognition system for a robot. Empirical results show the effectiveness of the proposed method in terms of VAD.",
keywords = "Audio-Visual integration, Speech Recognition, Voice Activity Detection",
author = "Takami Yoshida and Kazuhiro Nakadai and Okuno, {Hiroshi G.}",
year = "2010",
doi = "10.1007/978-3-642-13022-9_6",
language = "English",
isbn = "3642130216",
volume = "6096 LNAI",
series = "Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)",
number = "PART 1",
pages = "51--61",
booktitle = "Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)",
edition = "PART 1",

}

TY - GEN

T1 - An improvement in audio-visual voice activity detection for automatic speech recognition

AU - Yoshida, Takami

AU - Nakadai, Kazuhiro

AU - Okuno, Hiroshi G.

PY - 2010

Y1 - 2010

N2 - Noise-robust Automatic Speech Recognition (ASR) is essential for robots that are expected to communicate with humans in daily environments. In such environments, Voice Activity Detection (VAD) strongly affects ASR performance because there is a great deal of acoustic and visual noise. In this paper, we improved Audio-Visual VAD for our two-layered audio-visual integration framework for ASR by using hangover processing based on erosion and dilation. We implemented the proposed method in our audio-visual speech recognition system for a robot. Empirical results show the effectiveness of the proposed method in terms of VAD.

AB - Noise-robust Automatic Speech Recognition (ASR) is essential for robots that are expected to communicate with humans in daily environments. In such environments, Voice Activity Detection (VAD) strongly affects ASR performance because there is a great deal of acoustic and visual noise. In this paper, we improved Audio-Visual VAD for our two-layered audio-visual integration framework for ASR by using hangover processing based on erosion and dilation. We implemented the proposed method in our audio-visual speech recognition system for a robot. Empirical results show the effectiveness of the proposed method in terms of VAD.

KW - Audio-Visual integration

KW - Speech Recognition

KW - Voice Activity Detection

UR - http://www.scopus.com/inward/record.url?scp=79551526836&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=79551526836&partnerID=8YFLogxK

U2 - 10.1007/978-3-642-13022-9_6

DO - 10.1007/978-3-642-13022-9_6

M3 - Conference contribution

AN - SCOPUS:79551526836

SN - 3642130216

SN - 9783642130212

VL - 6096 LNAI

T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

SP - 51

EP - 61

BT - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

ER -