Human-robot non-verbal interaction empowered by real-time auditory and visual multiple-talker tracking

Hiroshi G. Okuno, Kazuhiro Nakadai*, Ken Ichi Hidai, Hiroshi Mizoguchi, Hiroaki Kitano

*Corresponding author for this work

Research output: Article › peer-review

12 Citations (Scopus)

Abstract

Sound is essential for enhancing the visual experience and human-robot interaction, but most research and development efforts have been directed mainly toward sound generation, speech synthesis and speech recognition. Auditory scene analysis has received little attention because real-time perception of a mixture of sounds is difficult. Recently, Nakadai et al. developed real-time auditory and visual multiple-talker tracking technology. In this paper, this technology is applied to human-robot verbal and non-verbal interaction, including a receptionist robot and a companion robot at a party. The system includes face identification, speech recognition, focus-of-attention control and a sensorimotor task in tracking multiple talkers. It is implemented on an upper-torso humanoid called SIG, and talker tracking is attained by distributed processing on three nodes connected by a 100Base-TX network. The overall tracking delay is 200 ms. Focus-of-attention is controlled by associating auditory and visual streams, using the sound source direction and talker position as clues. Once an association is established, the humanoid keeps its face turned toward the associated talker.
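The focus-of-attention mechanism described in the abstract matches the direction of each auditory stream against the positions of visually tracked talkers. The following is a minimal Python sketch of that kind of direction-based association; the data structures, tolerance value, and function names are illustrative assumptions, not the authors' implementation.

import math
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class AuditoryStream:
    azimuth_deg: float          # estimated sound-source direction

@dataclass
class VisualStream:
    talker_id: str
    azimuth_deg: float          # direction of the tracked face

# Assumed tolerance for treating the two directions as the same talker.
ASSOCIATION_TOLERANCE_DEG = 10.0

def associate(audio: AuditoryStream,
              faces: List[VisualStream]) -> Optional[VisualStream]:
    """Return the visual stream whose direction best matches the sound,
    or None if no tracked face lies within the tolerance."""
    best = None
    best_err = ASSOCIATION_TOLERANCE_DEG
    for face in faces:
        # Wrap-around angular difference in degrees.
        err = abs((audio.azimuth_deg - face.azimuth_deg + 180.0) % 360.0 - 180.0)
        if err <= best_err:
            best, best_err = face, err
    return best

def focus_of_attention(current_head_deg: float,
                       audio: AuditoryStream,
                       faces: List[VisualStream]) -> float:
    """Once an association is established, turn the head toward the
    associated talker; otherwise keep the current heading."""
    talker = associate(audio, faces)
    return talker.azimuth_deg if talker is not None else current_head_deg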

Original language: English
Pages (from-to): 115-130
Number of pages: 16
Journal: Advanced Robotics
Volume: 17
Issue number: 2
DOI
Publication status: Published - 2003
Externally published: Yes

ASJC Scopus subject areas

  • Control and Systems Engineering
