This paper investigates audio-visual interaction, i.e., inter-modal influences, in linear-regressive model adaptation for multi-modal speech recognition. In multi-modal adaptation, inter-modal information may contribute to speech recognition performance, so the influence and advantage of inter-modal elements should be examined. Experiments were conducted on noisy data from an audio-visual corpus to evaluate several transformation matrices that include or exclude inter-modal and intra-modal elements. The experimental results clarify the importance of making effective use of audio-visual interaction.
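As a rough illustration of what including or excluding inter-modal and intra-modal elements of a transformation matrix can mean, here is a minimal sketch. It assumes an MLLR-style affine transform applied to a mean vector that concatenates audio and visual dimensions. The abstract does not state the exact formulation, so that assumption, the function names `mask_transform` and `adapt_mean`, and the toy numbers are all hypothetical and are not taken from the paper.

```python
def mask_transform(W, n_audio, keep_intra=True, keep_inter=True):
    """Return a copy of W with intra- and/or inter-modal elements zeroed.

    Assumption: rows and columns 0..n_audio-1 are audio dimensions and the
    remaining ones are visual dimensions.
    """
    masked = []
    for i, row in enumerate(W):
        new_row = []
        for j, w in enumerate(row):
            # An element is intra-modal when its row and column belong to
            # the same modality, and inter-modal otherwise.
            same_modality = (i < n_audio) == (j < n_audio)
            keep = keep_intra if same_modality else keep_inter
            new_row.append(w if keep else 0.0)
        masked.append(new_row)
    return masked


def adapt_mean(W, b, mu):
    """Apply the affine transform mu' = W mu + b to one mean vector."""
    return [sum(w * m for w, m in zip(row, mu)) + b_i
            for row, b_i in zip(W, b)]


# Toy example: 2 audio dimensions followed by 1 visual dimension.
W_full = [[1.0, 0.0, 0.5],
          [0.0, 1.0, 0.0],
          [0.2, 0.0, 1.0]]
bias = [0.0, 0.0, 0.0]
mu = [1.0, 2.0, 4.0]

# Full matrix: the visual dimension can influence the audio ones and vice versa.
adapted_full = adapt_mean(W_full, bias, mu)

# Intra-modal only: a block-diagonal matrix with no audio-visual interaction.
W_intra = mask_transform(W_full, n_audio=2, keep_inter=False)
adapted_intra = adapt_mean(W_intra, bias, mu)
```

With the full matrix, the adapted audio dimensions depend on the visual component of the mean. With the block-diagonal variant, each modality is adapted independently. Comparing such variants is the kind of contrast the abstract describes.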
|Publication status||Published - 2011|
|Event||Asia-Pacific Signal and Information Processing Association Annual Summit and Conference 2011, APSIPA ASC 2011 - Xi'an, China|
Duration: 18 Oct 2011 → 21 Oct 2011