Auditory fovea based speech separation and its application to dialog system

Kazuhiro Nakadai, Hiroshi G. Okuno, Hiroaki Kitano

Research output: Conference contribution

14 Citations (Scopus)

Abstract

Robots, and mobile robots in particular, should listen to and recognize speech with their own ears in the real world in order to communicate smoothly with people. This paper presents an active direction-pass filter (ADPF) that separates sounds originating from a specified direction using a pair of microphones, and reports its application to front-end processing for speech recognition. Since the performance of sound source separation by the ADPF depends on the accuracy of sound source localization, various localization modules, including interaural phase difference (IPD) and interaural intensity difference (IID) for each sub-band as well as other visual and auditory processing, are integrated hierarchically. The resulting accuracy of auditory localization varies with the relative position of the sound source: resolution directly in front of the robot is much higher than at the periphery, a property analogous to the visual fovea (the high-resolution center of the human retina). To exploit this property, the ADPF controls the direction of the head by motor movement. To recognize the sound streams separated by the ADPF, a Hidden Markov Model (HMM) based automatic speech recognizer is built with multiple acoustic models trained on the output of the ADPF under different conditions. A preliminary dialog system is implemented on an upper-torso humanoid. Experimental results show that it works well even when two speakers speak simultaneously.
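To make the direction-pass idea above concrete, here is a minimal single-frame sketch in Python. It is an illustration under stated assumptions, not the authors' implementation: the microphone spacing, sampling rate, pass-band width, and function names are all hypothetical, a simple free-field far-field model stands in for the robot's real head geometry, and the hierarchical integration of IID and visual cues described in the abstract is omitted.

```python
import numpy as np

# Illustrative constants -- assumptions, not values from the paper.
SPEED_OF_SOUND = 343.0   # m/s
MIC_SPACING = 0.18       # m between the two microphones ("ears")
FS = 16000               # sampling rate, Hz
N_FFT = 512              # analysis frame length, samples

def ipd_to_angle(ipd, freq):
    """Map an interaural phase difference at one frequency to an arrival
    angle in radians from the frontal axis (free-field, far-field model)."""
    tdoa = ipd / (2.0 * np.pi * freq)   # time delay implied by the IPD
    s = np.clip(tdoa * SPEED_OF_SOUND / MIC_SPACING, -1.0, 1.0)
    return np.arcsin(s)

def direction_pass_filter(left, right, target_deg, pass_width_deg=10.0):
    """One frame of direction-pass filtering: keep only the sub-bands
    whose IPD is consistent with the target direction, zero the rest,
    and resynthesize. Expects at least N_FFT samples per channel."""
    win = np.hanning(N_FFT)
    L = np.fft.rfft(left[:N_FFT] * win)
    R = np.fft.rfft(right[:N_FFT] * win)
    freqs = np.fft.rfftfreq(N_FFT, 1.0 / FS)
    passed = np.zeros_like(L)
    target = np.deg2rad(target_deg)
    width = np.deg2rad(pass_width_deg)
    for k in range(1, len(freqs)):
        # Above c / (2 * spacing) the IPD wraps and becomes ambiguous;
        # the abstract mentions IID cues for localization, which could
        # cover such sub-bands but are omitted in this sketch.
        if freqs[k] >= SPEED_OF_SOUND / (2.0 * MIC_SPACING):
            break
        ipd = np.angle(L[k] * np.conj(R[k]))
        if abs(ipd_to_angle(ipd, freqs[k]) - target) <= width:
            passed[k] = 0.5 * (L[k] + R[k])   # pass this sub-band
    return np.fft.irfft(passed)               # resynthesized frame
```

A full front end would run this frame by frame with overlap-add and, following the auditory-fovea idea, turn the head so that the target direction stays near 0°, where the angular resolution of localization is highest.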

Original language: English
Host publication title: IEEE International Conference on Intelligent Robots and Systems
Pages: 1320-1325
Number of pages: 6
Volume: 2
Publication status: Published - 2002
Externally published: Yes
Event: 2002 IEEE/RSJ International Conference on Intelligent Robots and Systems - Lausanne
Duration: Sep 30 2002 → Oct 4 2002

Other

Other: 2002 IEEE/RSJ International Conference on Intelligent Robots and Systems
Location: Lausanne
Period: 9/30/02 → 10/4/02

Fingerprint

Acoustic waves
Speech recognition
Robots
Source separation
Hidden Markov models
Microphones
Processing
Mobile robots
Acoustics
Communication

ASJC Scopus subject areas

  • Control and Systems Engineering

Cite this

Nakadai, K., Okuno, H. G., & Kitano, H. (2002). Auditory fovea based speech separation and its application to dialog system. In IEEE International Conference on Intelligent Robots and Systems (Vol. 2, pp. 1320-1325).

Auditory fovea based speech separation and its application to dialog system. / Nakadai, Kazuhiro; Okuno, Hiroshi G.; Kitano, Hiroaki.

IEEE International Conference on Intelligent Robots and Systems. Vol. 2 2002. p. 1320-1325.

Research output: Conference contribution

Nakadai, K, Okuno, HG & Kitano, H 2002, Auditory fovea based speech separation and its application to dialog system. in IEEE International Conference on Intelligent Robots and Systems. vol. 2, pp. 1320-1325, 2002 IEEE/RSJ International Conference on Intelligent Robots and Systems, Lausanne, 9/30/02.
Nakadai K, Okuno HG, Kitano H. Auditory fovea based speech separation and its application to dialog system. In IEEE International Conference on Intelligent Robots and Systems. Vol. 2. 2002. p. 1320-1325
Nakadai, Kazuhiro ; Okuno, Hiroshi G. ; Kitano, Hiroaki. / Auditory fovea based speech separation and its application to dialog system. IEEE International Conference on Intelligent Robots and Systems. Vol. 2 2002. pp. 1320-1325
@inproceedings{e353f60438f04644b7fe9ad394af9cee,
title = "Auditory fovea based speech separation and its application to dialog system",
abstract = "Robots, in particular, mobile robots should listen to and recognize speeches with their own ears in a real world to attain smooth communications with people. This paper presents an active direction-pass filter (ADPF) that separates sounds originating from the specified direction by using a pair of microphones. Its application to front-end processing for speech recognition is also reported. Since the performance of sound source separation by the ADPF depends on the accuracy of sound source localization (direction), various localization modules including interaural phase difference (IPD), interaural intensity difference (IID) for each sub-band, other visual and auditory processing is integrated hierarchically. The resulting performance of auditory localization varies according to the relative position of sound source. The resolution of the center of the robot is much higher than that of peripherals, indicating similar property of visual fovea (high resolution in the center of human eye). To make the best use of this property, the ADPF controls the direction of a head by motor movement. In order to recognize sound streams separated by the ADPF, a Hidden Markov Model (HMM) based automatic speech recognition is built with multiple acoustic models trained by the output of the ADPF under different conditions. A preliminary dialog system is thus implemented on an upper-torso humanoid. The experimental results prove that it works well even when two speakers speak simultaneously.",
author = "Kazuhiro Nakadai and Okuno, {Hiroshi G.} and Hiroaki Kitano",
year = "2002",
language = "English",
volume = "2",
pages = "1320--1325",
booktitle = "IEEE International Conference on Intelligent Robots and Systems",

}

TY - GEN

T1 - Auditory fovea based speech separation and its application to dialog system

AU - Nakadai, Kazuhiro

AU - Okuno, Hiroshi G.

AU - Kitano, Hiroaki

PY - 2002

Y1 - 2002

UR - http://www.scopus.com/inward/record.url?scp=0036448776&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=0036448776&partnerID=8YFLogxK

M3 - Conference contribution

VL - 2

SP - 1320

EP - 1325

BT - IEEE International Conference on Intelligent Robots and Systems

ER -