Abstract
Sound is essential for enhancing visual experience and human-robot interaction, yet most research and development effort has gone into sound generation, speech synthesis, and speech recognition. Auditory scene analysis has received little attention because real-time perception of a mixture of sounds is difficult. Recently, Nakadai et al. developed real-time auditory and visual multiple-talker tracking technology. In this paper, this technology is applied to human-robot interaction, including a receptionist robot and a companion robot at a party. The system comprises face identification, speech recognition, focus-of-attention control, and sensorimotor tasks in tracking multiple talkers. It is implemented on an upper-torso humanoid, and talker tracking is achieved by distributed processing on three nodes connected by a 100Base-TX network, with a tracking delay of 200 msec. Focus-of-attention is controlled by associating auditory and visual streams, using the sound source direction and talker position as clues. Once an association is established, the humanoid keeps its face turned toward the associated talker.
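The association step described above can be pictured as matching each auditory stream's estimated sound-source direction against the azimuths of visually tracked talkers. Below is a minimal sketch of such direction-based matching; the angular threshold, the degree-valued azimuth representation, and all identifiers are illustrative assumptions, not the paper's actual implementation.

```python
import math

# Hypothetical tolerance for declaring an audio-visual match;
# the paper's actual association criterion is not given here.
ASSOC_THRESHOLD_DEG = 10.0

def associate_streams(audio_directions, talker_positions):
    """Pair each auditory stream with the nearest visually tracked talker
    whose azimuth agrees within the threshold.

    audio_directions: dict stream_id -> sound-source azimuth (degrees)
    talker_positions: dict talker_id -> talker azimuth (degrees)
    Returns dict stream_id -> talker_id for each established association.
    """
    associations = {}
    for stream_id, audio_az in audio_directions.items():
        best_id, best_diff = None, ASSOC_THRESHOLD_DEG
        for talker_id, visual_az in talker_positions.items():
            # Wrap-around-safe angular difference in [0, 180]
            diff = abs((audio_az - visual_az + 180.0) % 360.0 - 180.0)
            if diff <= best_diff:
                best_id, best_diff = talker_id, diff
        if best_id is not None:
            associations[stream_id] = best_id
    return associations

# Once an association is established, focus-of-attention control would
# turn the humanoid's face toward the associated talker's direction.
streams = {"audio-0": 32.0}
talkers = {"talker-A": 30.0, "talker-B": -45.0}
print(associate_streams(streams, talkers))  # {'audio-0': 'talker-A'}
```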
Original language | English |
---|---|
Title of host publication | IEEE International Conference on Intelligent Robots and Systems |
Pages | 1402-1409 |
Number of pages | 8 |
Volume | 3 |
Publication status | Published - 2001 |
Externally published | Yes |
Event | 2001 IEEE/RSJ International Conference on Intelligent Robots and Systems - Maui, HI |
Duration | 2001 Oct 29 → 2001 Nov 3 |
ASJC Scopus subject areas
- Control and Systems Engineering