Auditory and visual integration based localization and tracking of multiple moving sounds in daily-life environments

Hyun Don Kim, Kazunori Komatani, Tetsuya Ogata, Hiroshi G. Okuno

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

2 Citations (Scopus)

Abstract

This paper presents techniques that enable talker tracking for effective human-robot interaction. Tracking moving people in daily-life environments requires localizing multiple moving sound sources so that a robot can locate talkers. The conventional method, however, requires a microphone array and measured impulse response data. We therefore propose integrating the cross-power spectrum phase (CSP) analysis method with an expectation-maximization (EM) algorithm: CSP localizes sound sources using only two microphones and needs no impulse response data, while the EM algorithm makes the system more effective and lets it cope with multiple sound sources. We confirmed that the proposed method outperforms the conventional one. We also added a particle filter to the tracking process; it produces a reliable tracking path, integrates audio-visual information effectively, and keeps tracking people despite various noises, including loud sounds, in daily-life environments.
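The abstract describes a three-stage pipeline: CSP analysis localizes sources from two microphones, an EM algorithm resolves multiple simultaneous sources, and a particle filter fuses audio and visual cues into a track. As a rough illustration of the CSP step only (a sketch, not the paper's implementation), the following minimal Python example estimates a time difference of arrival with a CSP/GCC-PHAT-style phase transform and converts it to a direction; the function names, the far-field geometry, and the 343 m/s speed of sound are our illustrative assumptions.

import numpy as np

def csp_tdoa(x_left, x_right, fs, max_tau=None):
    # Cross-power spectrum phase (CSP / GCC-PHAT): whiten the
    # cross-spectrum so only phase information remains, then take
    # the lag of the correlation peak as the time difference of
    # arrival (TDOA) between the two microphone signals.
    n = len(x_left) + len(x_right)
    X = np.fft.rfft(x_left, n=n)
    Y = np.fft.rfft(x_right, n=n)
    cross = X * np.conj(Y)
    cross /= np.abs(cross) + 1e-12          # phase transform (whitening)
    corr = np.fft.irfft(cross, n=n)
    max_shift = int(fs * max_tau) if max_tau else n // 2
    corr = np.concatenate((corr[-max_shift:], corr[:max_shift + 1]))
    return (np.argmax(np.abs(corr)) - max_shift) / fs   # TDOA in seconds

def tdoa_to_azimuth(tau, mic_distance, c=343.0):
    # Far-field approximation: the TDOA maps to an arrival angle
    # relative to the broadside of the two-microphone baseline.
    return np.degrees(np.arcsin(np.clip(c * tau / mic_distance, -1.0, 1.0)))

# Toy usage: a white-noise source delayed by 5 samples between mics.
fs = 16000
src = np.random.randn(fs)
left, right = src[5:], src[:-5]
print(tdoa_to_azimuth(csp_tdoa(left, right, fs, max_tau=0.001), 0.3))

In the paper, per-frame CSP directions then feed the EM step (to separate the contributions of multiple sources) and the particle filter (to fuse the audio directions with visual detections into a stable track); those stages are omitted from this sketch.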

Original language: English
Title of host publication: Proceedings - IEEE International Workshop on Robot and Human Interactive Communication
Pages: 399-404
Number of pages: 6
DOI: 10.1109/ROMAN.2007.4415117
ISBNs: 1424416345, 9781424416349
Publication status: Published - 2007
Externally published: Yes
Event: 16th IEEE International Conference on Robot and Human Interactive Communication, RO-MAN - Jeju
Duration: 2007 Aug 26 - 2007 Aug 29

Fingerprint

  • Acoustic waves
  • Microphones
  • Power spectrum
  • Impulse response
  • Human robot interaction
  • Acoustic noise
  • Robots

ASJC Scopus subject areas

  • Engineering (all)

Cite this

Kim, H. D., Komatani, K., Ogata, T., & Okuno, H. G. (2007). Auditory and visual integration based localization and tracking of multiple moving sounds in daily-life environments. In Proceedings - IEEE International Workshop on Robot and Human Interactive Communication (pp. 399-404). Article 4415117. https://doi.org/10.1109/ROMAN.2007.4415117
