An improvement in audio-visual voice activity detection for automatic speech recognition

Takami Yoshida, Kazuhiro Nakadai, Hiroshi G. Okuno

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

6 Citations (Scopus)

Abstract

Noise-robust Automatic Speech Recognition (ASR) is essential for robots that are expected to communicate with humans in everyday environments. In such environments, Voice Activity Detection (VAD) strongly affects ASR performance because there are many sources of acoustic and visual noise. In this paper, we improve Audio-Visual VAD for our two-layered audio-visual integration framework for ASR by using hangover processing based on erosion and dilation. We implemented the proposed method in our audio-visual speech recognition system for a robot. Empirical results show the effectiveness of the proposed method in terms of VAD.
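The hangover processing mentioned in the abstract can be illustrated with one-dimensional morphological operators applied to per-frame speech/non-speech decisions. The following is only a minimal sketch under stated assumptions, not the authors' implementation: the function names (`dilate`, `erode`, `smooth_vad`), the half-window parameter `k`, the order of the operations, and the example frame sequence are all hypothetical and chosen for illustration.

```python
def dilate(frames, k):
    """A frame becomes 1 if any frame within k positions of it is 1."""
    n = len(frames)
    return [int(any(frames[max(0, i - k):min(n, i + k + 1)])) for i in range(n)]


def erode(frames, k):
    """A frame stays 1 only if every frame within k positions of it is 1.

    Windows are truncated at the sequence borders (an assumption of this sketch).
    """
    n = len(frames)
    return [int(all(frames[max(0, i - k):min(n, i + k + 1)])) for i in range(n)]


def smooth_vad(frames, k=1):
    """Smooth binary VAD decisions (assumed order: closing, then opening)."""
    # Closing (dilation then erosion) fills short gaps inside a speech segment.
    closed = erode(dilate(frames, k), k)
    # Opening (erosion then dilation) removes short isolated detections.
    return dilate(erode(closed, k), k)


# Hypothetical per-frame decisions: one short gap (index 4), one spurious hit (index 10).
raw = [0, 0, 1, 1, 0, 1, 1, 0, 0, 0, 1, 0, 0, 0]
print(smooth_vad(raw, k=1))
# -> [0, 0, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
```

In this toy run the one-frame gap inside the speech segment is filled and the isolated one-frame detection is discarded; a larger `k` would tolerate longer gaps at the cost of removing longer genuine segments.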

Original language: English
Title of host publication: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Pages: 51-61
Number of pages: 11
Volume: 6096 LNAI
Edition: PART 1
DOI: https://doi.org/10.1007/978-3-642-13022-9_6
Publication status: Published - 2010
Externally published: Yes
Event: 23rd International Conference on Industrial Engineering and Other Applications of Applied Intelligence Systems, IEA/AIE 2010 - Cordoba
Duration: 2010 Jun 1 → 2010 Jun 4

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Number: PART 1
Volume: 6096 LNAI
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Other

Other: 23rd International Conference on Industrial Engineering and Other Applications of Applied Intelligence Systems, IEA/AIE 2010
City: Cordoba
Period: 10/6/1 → 10/6/4

Keywords

  • Audio-Visual integration
  • Speech Recognition
  • Voice Activity Detection

ASJC Scopus subject areas

  • Computer Science (all)
  • Theoretical Computer Science

Cite this

Yoshida, T., Nakadai, K., & Okuno, H. G. (2010). An improvement in audio-visual voice activity detection for automatic speech recognition. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (PART 1 ed., Vol. 6096 LNAI, pp. 51-61). (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 6096 LNAI, No. PART 1). https://doi.org/10.1007/978-3-642-13022-9_6

@inproceedings{2c8917c0f4e0406a9df5b52c9227a020,
title = "An improvement in audio-visual voice activity detection for automatic speech recognition",
abstract = "Noise-robust Automatic Speech Recognition (ASR) is essential for robots that are expected to communicate with humans in everyday environments. In such environments, Voice Activity Detection (VAD) strongly affects ASR performance because there are many sources of acoustic and visual noise. In this paper, we improve Audio-Visual VAD for our two-layered audio-visual integration framework for ASR by using hangover processing based on erosion and dilation. We implemented the proposed method in our audio-visual speech recognition system for a robot. Empirical results show the effectiveness of the proposed method in terms of VAD.",
keywords = "Audio-Visual integration, Speech Recognition, Voice Activity Detection",
author = "Yoshida, Takami and Nakadai, Kazuhiro and Okuno, Hiroshi G.",
year = "2010",
doi = "10.1007/978-3-642-13022-9_6",
language = "English",
isbn = "3642130216",
volume = "6096 LNAI",
series = "Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)",
number = "PART 1",
pages = "51--61",
booktitle = "Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)",
edition = "PART 1",

}

TY - GEN

T1 - An improvement in audio-visual voice activity detection for automatic speech recognition

AU - Yoshida, Takami

AU - Nakadai, Kazuhiro

AU - Okuno, Hiroshi G.

PY - 2010

Y1 - 2010

N2 - Noise-robust Automatic Speech Recognition (ASR) is essential for robots that are expected to communicate with humans in everyday environments. In such environments, Voice Activity Detection (VAD) strongly affects ASR performance because there are many sources of acoustic and visual noise. In this paper, we improve Audio-Visual VAD for our two-layered audio-visual integration framework for ASR by using hangover processing based on erosion and dilation. We implemented the proposed method in our audio-visual speech recognition system for a robot. Empirical results show the effectiveness of the proposed method in terms of VAD.

AB - Noise-robust Automatic Speech Recognition (ASR) is essential for robots that are expected to communicate with humans in everyday environments. In such environments, Voice Activity Detection (VAD) strongly affects ASR performance because there are many sources of acoustic and visual noise. In this paper, we improve Audio-Visual VAD for our two-layered audio-visual integration framework for ASR by using hangover processing based on erosion and dilation. We implemented the proposed method in our audio-visual speech recognition system for a robot. Empirical results show the effectiveness of the proposed method in terms of VAD.

KW - Audio-Visual integration

KW - Speech Recognition

KW - Voice Activity Detection

UR - http://www.scopus.com/inward/record.url?scp=79551526836&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=79551526836&partnerID=8YFLogxK

U2 - 10.1007/978-3-642-13022-9_6

DO - 10.1007/978-3-642-13022-9_6

M3 - Conference contribution

AN - SCOPUS:79551526836

SN - 3642130216

SN - 9783642130212

VL - 6096 LNAI

T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

SP - 51

EP - 61

BT - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

ER -