Human-robot interaction through real-time auditory and visual multiple-talker tracking

Hiroshi G. Okuno, Kazuhiro Nakadai, Ken Ichi Hidai, Hiroshi Mizoguchi, Hiroaki Kitano

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

49 Citations (Scopus)

Abstract

Sound is essential to enhance visual experience and human-robot interaction, yet most research and development efforts have been directed at sound generation, speech synthesis, and speech recognition. Auditory scene analysis has received little attention because real-time perception of a mixture of sounds is difficult. Recently, Nakadai et al. developed real-time auditory and visual multiple-talker tracking technology. In this paper, this technology is applied to human-robot interaction, including a receptionist robot and a companion robot at a party. The system includes face identification, speech recognition, focus-of-attention control, and sensorimotor tasks in tracking multiple talkers. It is implemented on an upper-torso humanoid, and talker tracking is attained by distributed processing on three nodes connected by a 100Base-TX network; the tracking delay is 200 msec. Focus-of-attention is controlled by associating auditory and visual streams, using the sound source direction and talker position as cues. Once an association is established, the humanoid keeps its face turned toward the associated talker.
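The core step the abstract describes is binding an auditory stream (a sound-source bearing) to a visual stream (a tracked face) when their directions agree, then letting the bound talker drive the head. The following is a minimal Python sketch of such direction-based association, written only from what the abstract states: the Stream class, the 10-degree tolerance, and the greedy nearest-bearing matching are illustrative assumptions, not the paper's actual implementation.

    # Minimal sketch of auditory-visual stream association by bearing.
    # All data structures and thresholds here are assumptions for
    # illustration; the paper's real system is not specified this far.

    ASSOC_THRESHOLD_DEG = 10.0  # hypothetical angular tolerance

    class Stream:
        """A perceptual stream with a direction estimate in degrees (robot frame)."""
        def __init__(self, stream_id, direction_deg):
            self.stream_id = stream_id
            self.direction_deg = direction_deg

    def angular_distance(a_deg, b_deg):
        """Smallest absolute angle between two bearings, in degrees."""
        return abs((a_deg - b_deg + 180.0) % 360.0 - 180.0)

    def associate_streams(auditory, visual, threshold=ASSOC_THRESHOLD_DEG):
        """Greedily pair each auditory stream with the nearest visual stream
        whose bearing lies within the threshold; unmatched streams stay single."""
        pairs = []
        unmatched_visual = list(visual)
        for a in auditory:
            best = min(unmatched_visual,
                       key=lambda v: angular_distance(a.direction_deg, v.direction_deg),
                       default=None)
            if best and angular_distance(a.direction_deg, best.direction_deg) <= threshold:
                pairs.append((a, best))
                unmatched_visual.remove(best)
        return pairs

    def head_command(pair):
        """Bearing the head should servo to for an associated (auditory, visual) pair."""
        a, v = pair
        # Prefer the visual (face) bearing once bound; vision typically
        # localizes a talker more precisely than a binaural estimate.
        return v.direction_deg

    # Example: one voice at 42 deg, faces at 45 deg and -90 deg.
    auditory = [Stream("voice0", 42.0)]
    visual = [Stream("face0", 45.0), Stream("face1", -90.0)]
    for a, v in associate_streams(auditory, visual):
        print(f"associated {a.stream_id} with {v.stream_id}; face to {head_command((a, v))} deg")

Preferring the visual bearing once a pair is bound mirrors the behavior described in the abstract, where the humanoid keeps its face toward the associated talker.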

Original language: English
Title of host publication: IEEE International Conference on Intelligent Robots and Systems
Pages: 1402-1409
Number of pages: 8
Volume: 3
Publication status: Published - 2001
Externally published: Yes
Event: 2001 IEEE/RSJ International Conference on Intelligent Robots and Systems - Maui, HI
Duration: 2001 Oct 29 - 2001 Nov 3

ASJC Scopus subject areas

  • Control and Systems Engineering

Cite this

Okuno, H. G., Nakadai, K., Hidai, K. I., Mizoguchi, H., & Kitano, H. (2001). Human-robot interaction through real-time auditory and visual multiple-talker tracking. In IEEE International Conference on Intelligent Robots and Systems (Vol. 3, pp. 1402-1409).
