TY - GEN
T1 - Auditory and visual integration based localization and tracking of multiple moving sounds in daily-life environments
AU - Kim, Hyun Don
AU - Komatani, Kazunori
AU - Ogata, Tetsuya
AU - Okuno, Hiroshi G.
PY - 2007/12/1
Y1 - 2007/12/1
N2 - This paper presents techniques that enable talker tracking for effective human-robot interaction. To track moving people in daily-life environments, a robot must localize multiple moving sound sources so that it can locate talkers. However, the conventional method requires a microphone array and impulse response data. We therefore propose integrating a cross-power spectrum phase analysis (CSP) method with an expectation-maximization (EM) algorithm. CSP can localize sound sources using only two microphones and does not require impulse response data, while the EM algorithm improves the system's effectiveness and allows it to cope with multiple sound sources. We confirmed that the proposed method outperforms the conventional method. In addition, we added a particle filter to the tracking process to produce a reliable tracking path and to integrate audio-visual information effectively. The particle filter also enables the system to track people despite various noises, even loud sounds, in daily-life environments.
AB - This paper presents techniques that enable talker tracking for effective human-robot interaction. To track moving people in daily-life environments, a robot must localize multiple moving sound sources so that it can locate talkers. However, the conventional method requires a microphone array and impulse response data. We therefore propose integrating a cross-power spectrum phase analysis (CSP) method with an expectation-maximization (EM) algorithm. CSP can localize sound sources using only two microphones and does not require impulse response data, while the EM algorithm improves the system's effectiveness and allows it to cope with multiple sound sources. We confirmed that the proposed method outperforms the conventional method. In addition, we added a particle filter to the tracking process to produce a reliable tracking path and to integrate audio-visual information effectively. The particle filter also enables the system to track people despite various noises, even loud sounds, in daily-life environments.
UR - http://www.scopus.com/inward/record.url?scp=48749120358&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=48749120358&partnerID=8YFLogxK
U2 - 10.1109/ROMAN.2007.4415117
DO - 10.1109/ROMAN.2007.4415117
M3 - Conference contribution
AN - SCOPUS:48749120358
SN - 1424416345
SN - 9781424416349
T3 - Proceedings - IEEE International Workshop on Robot and Human Interactive Communication
SP - 399
EP - 404
BT - 16th IEEE International Conference on Robot and Human Interactive Communication, RO-MAN
T2 - 16th IEEE International Conference on Robot and Human Interactive Communication, RO-MAN
Y2 - 26 August 2007 through 29 August 2007
ER -