Cross-modal binding in auditory-visual speech perception was investigated using the McGurk effect, a phenomenon in which what is heard is altered by incongruent visual mouth movements. We used functional magnetic resonance imaging (fMRI) and positron emission tomography (PET). In each experiment, subjects were asked to identify spoken syllables ('ba', 'da', 'ga') presented auditorily, visually, or audiovisually (incongruent stimuli). The auditory component of the stimuli was presented at two levels of intelligibility (High versus Low), determined by the signal-to-noise (SN) ratio. The control task was visual talker identification of still faces. In the Low intelligibility condition, in which the auditory component of the speech was harder to hear, the visual influence was much stronger. Brain imaging data showed bilateral activations specific to the unimodal auditory stimuli (in the temporal cortex) and to the unimodal visual stimuli (in area MT/V5). For the bimodal audiovisual stimuli, activation in the left temporal cortex extended more posteriorly toward the visual-specific area in the Low intelligibility condition. A direct comparison between the Low and High audiovisual conditions showed increased activation in the posterior part of the left superior temporal sulcus (STS), indicating that this activation is related to the stronger visual influence. We suggest that this region is likely to be involved in cross-modal binding of auditory-visual speech.