Human tracking system integrating sound and face localization using an expectation-maximization algorithm in real environments

Hyun Don Kim*, Kazunori Komatani, Tetsuya Ogata, Hiroshi G. Okuno

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

11 Citations (Scopus)


We have developed a human tracking system for robots that integrates sound and face localization. Conventional systems usually require many microphones and/or prior information to localize several sound sources, and they cannot cope with various types of background noise. Our system localizes sound sources by cross-power spectrum phase analysis of signals obtained with only two microphones, without needing prior information such as impulse response data. An expectation-maximization (EM) algorithm helps the system cope with several moving sound sources. The problem of distinguishing whether a sound comes from the front or the back is also solved with only two microphones, by rotating the robot's head. A method we developed that uses facial skin colors, classified by another EM algorithm, enables the system to detect faces in various poses. By detecting a human face, the system can compensate for errors in localizing a speaker's voice and can also identify noise signals arriving from undesired directions. A probability-based method we developed integrates the auditory and visual information to produce a reliable tracking path in real time. Experiments using a robot showed that our system can localize two sounds at the same time and track a communication partner while dealing with various types of background noise.
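The two-microphone localization step rests on cross-power spectrum phase (CSP) analysis, also known as GCC-PHAT: the cross-spectrum of the two channels is whitened so that only phase remains, and its inverse transform peaks at the time delay between the microphones. The following is a minimal, stdlib-only sketch under stated assumptions, not the authors' implementation. The function names, the 0.2 m microphone spacing, the 343 m/s speed of sound, and the far-field model θ = arcsin(cτ/d) are hypothetical choices made here, and a naive O(N²) DFT stands in for an FFT to keep it dependency-free.

```python
import cmath
import math
import random


def dft(x, inverse=False):
    """Naive O(N^2) discrete Fourier transform (the inverse includes the 1/N factor)."""
    n = len(x)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        s = sum(x[t] * cmath.exp(sign * 2j * cmath.pi * k * t / n) for t in range(n))
        out.append(s / n if inverse else s)
    return out


def csp(x1, x2, eps=1e-12):
    """CSP coefficients: inverse DFT of the phase-only (whitened) cross-spectrum."""
    cross = [a * b.conjugate() for a, b in zip(dft(x1), dft(x2))]
    # Dividing by |X1||X2| = |X1 X2*| keeps only the phase difference in each bin.
    whitened = [g / (abs(g) + eps) for g in cross]
    return [v.real for v in dft(whitened, inverse=True)]


def estimate_delay(x1, x2):
    """Lag (in samples) of the CSP peak; a positive value means x1 lags x2."""
    c = csp(x1, x2)
    n = len(c)
    k = max(range(n), key=lambda i: c[i])
    # Indices above N/2 correspond to negative (circular) lags.
    return k - n if k > n // 2 else k


def delay_to_azimuth(delay_samples, fs, mic_distance, speed_of_sound=343.0):
    """Hypothetical far-field model: azimuth in degrees, limited to -90..90."""
    s = speed_of_sound * delay_samples / (fs * mic_distance)
    s = max(-1.0, min(1.0, s))  # clamp in case noise pushes |s| past 1
    return math.degrees(math.asin(s))


if __name__ == "__main__":
    rng = random.Random(0)
    x2 = [rng.gauss(0.0, 1.0) for _ in range(64)]
    x1 = [x2[(n - 3) % 64] for n in range(64)]  # x1 is x2 delayed by 3 samples
    d = estimate_delay(x1, x2)
    print(d, delay_to_azimuth(d, fs=16000, mic_distance=0.2))
```

Because arcsin only returns angles between -90° and 90°, a source at azimuth θ in front of a lateral microphone pair and one at 180° - θ behind it produce the same delay. This is the front/back ambiguity that the abstract says is resolved by rotating the robot's head.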

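The abstract does not say exactly what the EM algorithm for several moving sound sources models. One common reading, assumed here and not taken from the paper, is a one-dimensional Gaussian mixture over direction estimates collected in a short time window, with one component per talker. Re-running EM on each new window, starting from the previous window's means, would let the estimates follow moving sources. The function name `em_directions`, the initial variance, the variance floor, and the iteration count are hypothetical.

```python
import math
import random


def em_directions(angles, init_means, init_var=25.0, iters=50, var_floor=1e-3):
    """Fit a 1-D Gaussian mixture to direction estimates (degrees) with EM.

    Returns (means, variances, weights), one entry per assumed sound source.
    """
    k = len(init_means)
    mu = list(init_means)
    var = [init_var] * k
    w = [1.0 / k] * k
    for _ in range(iters):
        # E-step: responsibility of each component for each direction estimate.
        resp = []
        for x in angles:
            p = [
                w[j] * math.exp(-((x - mu[j]) ** 2) / (2.0 * var[j]))
                / math.sqrt(2.0 * math.pi * var[j])
                for j in range(k)
            ]
            total = sum(p) or 1e-300  # guard against all-zero underflow
            resp.append([pj / total for pj in p])
        # M-step: re-estimate the weight, mean and variance of each component.
        for j in range(k):
            nj = sum(r[j] for r in resp) + 1e-12
            mu[j] = sum(r[j] * x for r, x in zip(resp, angles)) / nj
            spread = sum(r[j] * (x - mu[j]) ** 2 for r, x in zip(resp, angles)) / nj
            var[j] = max(spread, var_floor)
            w[j] = nj / len(angles)
    return mu, var, w


if __name__ == "__main__":
    rng = random.Random(1)
    # Synthetic per-frame directions from two talkers near -30 and +40 degrees.
    angles = [rng.gauss(-30.0, 3.0) for _ in range(100)]
    angles += [rng.gauss(40.0, 3.0) for _ in range(100)]
    means, _, weights = em_directions(angles, init_means=[-10.0, 10.0])
    print([round(m, 1) for m in means], [round(v, 2) for v in weights])
```

Seeding each window from the previous means is one simple way to keep component identities stable as talkers move; choosing the number of components is a separate problem that this sketch does not address.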
Original language: English
Pages (from-to): 629-653
Number of pages: 25
Journal: Advanced Robotics
Issue number: 6
Publication status: Published - 2009
Externally published: Yes

Keywords

  • Face localization
  • Human tracking
  • Human-robot interaction
  • Humanoid robot
  • Sound source localization

ASJC Scopus subject areas

  • Software
  • Control and Systems Engineering
  • Human-Computer Interaction
  • Hardware and Architecture
  • Computer Science Applications

