The purpose of this study is to build an evaluation framework for robust bimodal speech recognition in real environments, such as in-car conditions. Bimodal speech recognition, which uses lip images in addition to speech, has been studied as a way to prevent the deterioration of recognition performance in noisy environments, and lip-reading technology plays a major role in it. Building a bimodal speech recognizer and evaluating its performance therefore requires a database containing both speech signals and lip images. Tamura et al. constructed CENSREC-1-AV, an evaluation framework for noisy bimodal speech recognition, in which a subject in front of a blue-screen background spoke Japanese connected digits in a quiet office environment. Because CENSREC-1-AV was recorded in clean conditions, a database recorded in real environments is still required to evaluate a bimodal speech recognizer under realistic conditions. We have therefore constructed a new audio-visual corpus, CENSREC-2-AV, recorded in in-car environments: a subject sitting in the driver's seat of a car uttered Japanese connected digits under various driving conditions, for example while driving through a tunnel with background music, or on an expressway with the window open. CENSREC-2-AV makes it possible to develop and evaluate bimodal speech recognition methods that remain robust in real environments.
|Publication status||Published - 2013|
|Event||6th Biennial Workshop on Digital Signal Processing for In-Vehicle Systems and Safety 2013, DSP 2013 - Seoul, Korea, Republic of|
Duration: 29 Sep 2013 → 2 Oct 2013
|Country/Territory||Korea, Republic of|