In this paper, we propose a multi-modal voice activity detection (VAD) system that uses both audio and visual information. In multi-modal speech signal processing, there are two methods for fusing audio and visual information: concatenating the audio and visual features, or employing audio-only and visual-only classifiers and then fusing their unimodal decisions. We investigate the effectiveness of decision fusion based on the results of AdaBoost. AdaBoost is a machine learning method that constructs an effective classifier by combining weak classifiers. It classifies input data into two classes based on the weighted results of the weak classifiers. In the proposed method, this fusion scheme is applied to the decision fusion of multi-modal VAD. Experimental results show that the proposed method is generally more effective.
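The weighted combination of weak classifiers described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the features or weak classifiers, so the per-frame audio and visual scores, the threshold "stumps" used as weak classifiers, and the names `stump_predict`, `train_adaboost`, and `fuse` are all assumptions made for illustration. The toy data is chosen so that no single threshold on either modality separates speech from non-speech, while the boosted combination does.

```python
import math

# Hypothetical toy data: each frame is (audio_score, visual_score) as might be
# produced by unimodal classifiers; label +1 = speech, -1 = non-speech.
frames = [(0.9, 0.8), (0.8, 0.3), (0.4, 0.9),
          (0.1, 0.2), (0.3, 0.1), (0.5, 0.4)]
labels = [1, 1, 1, -1, -1, -1]


def stump_predict(stump, x):
    """Weak classifier: threshold one modality's score (assumed form)."""
    modality, threshold, polarity = stump
    return polarity if x[modality] > threshold else -polarity


def train_adaboost(frames, labels, rounds):
    """Discrete AdaBoost over threshold stumps on the two modality scores."""
    n = len(frames)
    weights = [1.0 / n] * n
    candidates = [(m, t, p)
                  for m in (0, 1)                      # 0 = audio, 1 = visual
                  for t in sorted({x[m] for x in frames})
                  for p in (1, -1)]
    ensemble = []
    for _ in range(rounds):
        # Pick the stump with the lowest weighted error (first one wins ties).
        best, best_err = None, float("inf")
        for stump in candidates:
            err = sum(w for w, x, y in zip(weights, frames, labels)
                      if stump_predict(stump, x) != y)
            if err < best_err:
                best, best_err = stump, err
        # Clamp to avoid division by zero or log(0) for a perfect stump.
        err = min(max(best_err, 1e-10), 1 - 1e-10)
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, best))
        if best_err < 1e-10:
            break
        # Increase the weight of misclassified frames, then renormalize.
        weights = [w * math.exp(-alpha * y * stump_predict(best, x))
                   for w, x, y in zip(weights, frames, labels)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return ensemble


def fuse(ensemble, x):
    """Fused decision: sign of the alpha-weighted vote of the weak classifiers."""
    score = sum(alpha * stump_predict(stump, x) for alpha, stump in ensemble)
    return 1 if score >= 0 else -1


ensemble = train_adaboost(frames, labels, rounds=3)
decisions = [fuse(ensemble, x) for x in frames]
print(decisions)  # [1, 1, 1, -1, -1, -1]
```

On this toy set, the best single stump misclassifies one of the six frames, whereas three boosting rounds (two audio stumps and one visual stump) classify all six correctly. This only illustrates the mechanism and says nothing about the performance reported in the paper.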
|Publication status||Published - 2010|
|Event||2010 International Conference on Auditory-Visual Speech Processing, AVSP 2010 - Hakone, Japan|
Duration: 30 Sep 2010 → 3 Oct 2010