Social interaction is essential in improving robot human interface. Such behaviors for social interaction may include paying attention to a new sound source, moving toward it, or keeping face to face with a moving speaker. Some sound-centered behaviors may be difficult to attain, because the mixture of sounds is not well treated or auditory processing is too slow for real-time applications. Recently, Nakadai et al have developed real-time auditory and visual multiple-talker tracking technology by associating auditory and visual streams. The system is implemented on an upper-torso humanoid and the real-time talker tracking is attained with 200 msec of delay by distributed processing on four PCs connected by Gigabit Ethernet. Focus-of-attention is programmable and allows a variety of behaviors. The system demonstrates non-verbal social interaction by realizing a receptionist robot by focusing on an associated stream, while a companion robot on an auditory stream.