Mobile robots with auditory perception usually adopt the “stop-perceive-act” principle to avoid the sounds made while moving, such as motor noise or noise from bumpy roads. Although this principle reduces the complexity of auditory processing for mobile robots, it restricts their auditory capabilities. In this paper, sound and visual tracking are investigated to attain robust object tracking, with each modality compensating for the other's drawbacks. Visual tracking may be difficult in the case of occlusion, while sound tracking may be ambiguous in localization due to the nature of auditory processing. For this purpose, we present an active audition system for a humanoid robot. The audition system of an intelligent humanoid requires localization of sound sources and identification of the meanings of sounds in the auditory scene. The active audition reported in this paper focuses on improved sound source tracking by integrating audition, vision, and motor movements. Given multiple sound sources in the auditory scene, SIG the humanoid actively moves its head to improve localization by aligning its microphones orthogonal to the sound source and by capturing the possible sound sources by vision. The system adaptively cancels motor noise using motor control signals. The experimental results demonstrate the effectiveness and robustness of sound and visual tracking.
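As a rough illustration of why turning the head so that the microphone pair is orthogonal to the source direction can help, the sketch below uses a generic far-field interaural time difference (ITD) model. This model is an assumption made here for illustration; the abstract does not state which localization cue the system uses. The microphone spacing, the speed of sound, the 10-microsecond timing error, and all function names are hypothetical.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed value for air at room temperature
MIC_SPACING = 0.18      # m, hypothetical distance between the two microphones


def itd_from_azimuth(azimuth_rad, spacing=MIC_SPACING, c=SPEED_OF_SOUND):
    """Far-field ITD in seconds; azimuth 0 means the source is straight ahead,
    i.e. the microphone axis is orthogonal to the source direction."""
    return spacing * math.sin(azimuth_rad) / c


def azimuth_from_itd(itd_s, spacing=MIC_SPACING, c=SPEED_OF_SOUND):
    """Invert the far-field model, clamping so asin stays in its domain."""
    ratio = max(-1.0, min(1.0, c * itd_s / spacing))
    return math.asin(ratio)


def azimuth_error(azimuth_rad, timing_error_s):
    """Angular error (radians) caused by a fixed error in the measured ITD."""
    measured = itd_from_azimuth(azimuth_rad) + timing_error_s
    return abs(azimuth_from_itd(measured) - azimuth_rad)


# Compare the effect of the same timing error at several source directions.
for deg in (0, 30, 60, 80):
    err = azimuth_error(math.radians(deg), 10e-6)
    print(f"azimuth {deg:2d} deg -> error {math.degrees(err):.2f} deg")
```

Under this simplified model, the same timing error produces the smallest angular error when the source is straight ahead and a growing error toward the sides, which is consistent with the motivation for actively reorienting the head.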