Human tracking system integrating sound and face localization using an expectation-maximization algorithm in real environments

Hyun Don Kim, Kazunori Komatani, Tetsuya Ogata, Hiroshi G. Okuno

Research output: Contribution to journal › Article

8 Citations (Scopus)

Abstract

We have developed a human tracking system for robots that integrates sound and face localization. Conventional systems usually require many microphones and/or prior information to localize several sound sources, and they cannot cope with various types of background noise. Our system uses cross-power spectrum phase (CSP) analysis of the signals obtained with only two microphones to localize sound sources without prior information such as impulse response data. An expectation-maximization (EM) algorithm enables the system to cope with several moving sound sources. The problem of distinguishing whether sounds come from the front or the back is also solved with only two microphones by rotating the robot's head. A method that classifies facial skin colors with another EM algorithm enables the system to detect faces in various poses; detecting a human face compensates for errors in localizing a speaker and identifies noise signals arriving from undesired directions. A probability-based method integrates the auditory and visual information to produce a reliable tracking path in real time. Experiments with a robot showed that our system can localize two sounds simultaneously and track a communication partner while dealing with various types of background noise.
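
The core acoustic step named in the abstract, cross-power spectrum phase (CSP) analysis of two microphone signals, is closely related to GCC-PHAT time-delay estimation. The Python sketch below illustrates only that single-source case; it is not the authors' implementation, and the sampling rate, microphone spacing, and function name are assumptions made for this example. The paper's EM-based handling of multiple moving sources and the front/back disambiguation by head rotation are not shown.

    import numpy as np

    FS = 16000            # sampling rate in Hz (assumed)
    MIC_DISTANCE = 0.2    # spacing between the two microphones in metres (assumed)
    SOUND_SPEED = 340.0   # speed of sound in m/s

    def csp_azimuth(left, right, fs=FS, d=MIC_DISTANCE, c=SOUND_SPEED):
        """Estimate a single source's azimuth from two microphone signals
        using cross-power spectrum phase (GCC-PHAT) time-delay estimation."""
        n = len(left) + len(right)               # zero-pad to avoid circular wrap-around
        L = np.fft.rfft(left, n)
        R = np.fft.rfft(right, n)
        cross = L * np.conj(R)                   # cross-power spectrum
        cross /= np.abs(cross) + 1e-12           # phase transform: keep phase, drop magnitude
        corr = np.fft.irfft(cross, n)            # CSP coefficients as a function of lag
        max_lag = int(fs * d / c)                # only physically possible delays
        lags = np.concatenate((np.arange(max_lag + 1), np.arange(-max_lag, 0)))
        peaks = np.concatenate((corr[:max_lag + 1], corr[-max_lag:]))
        tau = lags[np.argmax(peaks)] / fs        # time difference of arrival in seconds
        return np.degrees(np.arcsin(np.clip(c * tau / d, -1.0, 1.0)))

    # Toy check: the same white-noise burst, one channel delayed by three samples.
    sig = np.random.randn(2048)
    print(f"estimated azimuth: {csp_azimuth(sig, np.roll(sig, 3)):.1f} deg")

Dividing the cross-power spectrum by its magnitude keeps only phase information, so the correlation peak reflects the inter-microphone delay rather than the spectral shape of the source; this is why the approach needs no prior data such as measured impulse responses.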

Original language: English
Pages (from-to): 629-653
Number of pages: 25
Journal: Advanced Robotics
Volume: 23
Issue number: 6
DOIs: 10.1163/156855309X431659
Publication status: Published - 2009
Externally published: Yes

Fingerprint

Acoustic waves
Microphones
Robots
Information use
Power spectrum
Impulse response
Acoustic noise
Skin
Color
Communication
Experiments

Keywords

  • Face localization
  • Human tracking
  • Human-robot interaction
  • Humanoid robot
  • Sound source localization

ASJC Scopus subject areas

  • Control and Systems Engineering
  • Human-Computer Interaction
  • Computer Science Applications
  • Hardware and Architecture
  • Software

Cite this

Human tracking system integrating sound and face localization using an expectation-maximization algorithm in real environments. / Kim, Hyun Don; Komatani, Kazunori; Ogata, Tetsuya; Okuno, Hiroshi G.

In: Advanced Robotics, Vol. 23, No. 6, 2009, p. 629-653.

Research output: Contribution to journal › Article

@article{51531a2480664bfe8bb03f92648d7f79,
title = "Human tracking system integrating sound and face localization using an expectation-maximization algorithm in real environments",
abstract = "We have developed a human tracking system for robots that integrates sound and face localization. Conventional systems usually require many microphones and/or prior information to localize several sound sources, and they cannot cope with various types of background noise. Our system uses cross-power spectrum phase (CSP) analysis of the signals obtained with only two microphones to localize sound sources without prior information such as impulse response data. An expectation-maximization (EM) algorithm enables the system to cope with several moving sound sources. The problem of distinguishing whether sounds come from the front or the back is also solved with only two microphones by rotating the robot's head. A method that classifies facial skin colors with another EM algorithm enables the system to detect faces in various poses; detecting a human face compensates for errors in localizing a speaker and identifies noise signals arriving from undesired directions. A probability-based method integrates the auditory and visual information to produce a reliable tracking path in real time. Experiments with a robot showed that our system can localize two sounds simultaneously and track a communication partner while dealing with various types of background noise.",
keywords = "Face localization, Human tracking, Human-robot interaction, Humanoid robot, Sound source localization",
author = "Kim, {Hyun Don} and Kazunori Komatani and Tetsuya Ogata and Okuno, {Hiroshi G.}",
year = "2009",
doi = "10.1163/156855309X431659",
language = "English",
volume = "23",
pages = "629--653",
journal = "Advanced Robotics",
issn = "0169-1864",
publisher = "Taylor and Francis Ltd.",
number = "6",

}

TY - JOUR

T1 - Human tracking system integrating sound and face localization using an expectation-maximization algorithm in real environments

AU - Kim, Hyun Don

AU - Komatani, Kazunori

AU - Ogata, Tetsuya

AU - Okuno, Hiroshi G.

PY - 2009

Y1 - 2009

N2 - We have developed a human tracking system for robots that integrates sound and face localization. Conventional systems usually require many microphones and/or prior information to localize several sound sources, and they cannot cope with various types of background noise. Our system uses cross-power spectrum phase (CSP) analysis of the signals obtained with only two microphones to localize sound sources without prior information such as impulse response data. An expectation-maximization (EM) algorithm enables the system to cope with several moving sound sources. The problem of distinguishing whether sounds come from the front or the back is also solved with only two microphones by rotating the robot's head. A method that classifies facial skin colors with another EM algorithm enables the system to detect faces in various poses; detecting a human face compensates for errors in localizing a speaker and identifies noise signals arriving from undesired directions. A probability-based method integrates the auditory and visual information to produce a reliable tracking path in real time. Experiments with a robot showed that our system can localize two sounds simultaneously and track a communication partner while dealing with various types of background noise.

AB - We have developed a human tracking system for robots that integrates sound and face localization. Conventional systems usually require many microphones and/or prior information to localize several sound sources, and they cannot cope with various types of background noise. Our system uses cross-power spectrum phase (CSP) analysis of the signals obtained with only two microphones to localize sound sources without prior information such as impulse response data. An expectation-maximization (EM) algorithm enables the system to cope with several moving sound sources. The problem of distinguishing whether sounds come from the front or the back is also solved with only two microphones by rotating the robot's head. A method that classifies facial skin colors with another EM algorithm enables the system to detect faces in various poses; detecting a human face compensates for errors in localizing a speaker and identifies noise signals arriving from undesired directions. A probability-based method integrates the auditory and visual information to produce a reliable tracking path in real time. Experiments with a robot showed that our system can localize two sounds simultaneously and track a communication partner while dealing with various types of background noise.

KW - Face localization

KW - Human tracking

KW - Human-robot interaction

KW - Humanoid robot

KW - Sound source localization

UR - http://www.scopus.com/inward/record.url?scp=67649363920&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=67649363920&partnerID=8YFLogxK

U2 - 10.1163/156855309X431659

DO - 10.1163/156855309X431659

M3 - Article

AN - SCOPUS:67649363920

VL - 23

SP - 629

EP - 653

JO - Advanced Robotics

JF - Advanced Robotics

SN - 0169-1864

IS - 6

ER -