TY - GEN
T1 - A robust audio-visual speech recognition using Audio-Visual Voice Activity Detection
AU - Tamura, Satoshi
AU - Ishikawa, Masato
AU - Hashiba, Takashi
AU - Takeuchi, Shin'ichi
AU - Hayamizu, Satoru
PY - 2010
Y1 - 2010
N2 - This paper proposes a novel speech recognition method combining Audio-Visual Voice Activity Detection (AVVAD) and Audio-Visual Automatic Speech Recognition (AVASR). AVASR has been developed to enhance the robustness of ASR in noisy environments by using visual information in addition to acoustic features. Similarly, AVVAD increases the precision of VAD, which detects the presence of speech in an audio signal, in noisy conditions. In our approach, AVVAD is conducted as a preprocessing step followed by an AVASR system, yielding a significantly more robust speech recognizer. To evaluate the proposed system, recognition experiments were conducted using noisy audio-visual data, testing several AVVAD approaches. The results show that the proposed AVASR system using the model-free feature-fusion AVVAD method outperforms not only non-VAD audio-only ASR but also conventional AVASR.
AB - This paper proposes a novel speech recognition method combining Audio-Visual Voice Activity Detection (AVVAD) and Audio-Visual Automatic Speech Recognition (AVASR). AVASR has been developed to enhance the robustness of ASR in noisy environments by using visual information in addition to acoustic features. Similarly, AVVAD increases the precision of VAD, which detects the presence of speech in an audio signal, in noisy conditions. In our approach, AVVAD is conducted as a preprocessing step followed by an AVASR system, yielding a significantly more robust speech recognizer. To evaluate the proposed system, recognition experiments were conducted using noisy audio-visual data, testing several AVVAD approaches. The results show that the proposed AVASR system using the model-free feature-fusion AVVAD method outperforms not only non-VAD audio-only ASR but also conventional AVASR.
KW - Audio-visual
KW - Decision fusion
KW - Feature fusion
KW - Speech recognition
KW - Voice Activity Detection
UR - http://www.scopus.com/inward/record.url?scp=79959837545&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=79959837545&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:79959837545
T3 - Proceedings of the 11th Annual Conference of the International Speech Communication Association, INTERSPEECH 2010
SP - 2694
EP - 2697
BT - Proceedings of the 11th Annual Conference of the International Speech Communication Association, INTERSPEECH 2010
PB - International Speech Communication Association
ER -