Auditory fovea based speech enhancement and its application to human-robot dialog system

Kazuhiro Nakadai, Hiroshi G. Okuno, Hiroaki Kitano

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

This paper presents an active direction-pass filter (ADPF) that separates sound arriving from a specified direction using a pair of microphones. Its application to front-end processing for speech recognition is also reported. The ADPF improves sound source separation by exploiting the accurate sound direction obtained through multi-modal integration and by active motor control that keeps the robot facing the sound source; because the directional resolution is much higher at the center than at the periphery, the filter exhibits a property similar to that of the visual fovea. To recognize the separated sound streams, a Hidden Markov Model (HMM) based automatic speech recognizer is built with multiple acoustic models trained on the output of the ADPF under various conditions. Experimental results with a preliminary dialog system show that it works well even when two speakers speak simultaneously.
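The core mechanism the abstract describes, passing only those spectral components whose interaural phase difference (IPD) matches the target direction, can be sketched as follows. This is a minimal illustration, not the authors' implementation: the microphone spacing, the fixed pass-range tolerance, and the free-field far-field model are all assumptions, and the paper's direction-dependent pass-range width and multi-modal localization are omitted.

```python
import numpy as np

def direction_pass_filter(left, right, sr, theta_deg, mic_dist=0.15,
                          tol=np.pi / 6, n_fft=512, hop=128):
    """Keep spectral bins whose interaural phase difference matches
    the IPD expected for a source at theta_deg (0 = straight ahead)."""
    c = 343.0  # speed of sound (m/s)
    # Expected inter-microphone delay for a far-field source at theta.
    tau = mic_dist * np.sin(np.radians(theta_deg)) / c

    win = np.hanning(n_fft)
    freqs = np.fft.rfftfreq(n_fft, d=1.0 / sr)
    expected_ipd = 2.0 * np.pi * freqs * tau

    n_frames = 1 + (len(left) - n_fft) // hop
    out = np.zeros(len(left))
    norm = np.zeros(len(left))
    for i in range(n_frames):
        s = i * hop
        L = np.fft.rfft(win * left[s:s + n_fft])
        R = np.fft.rfft(win * right[s:s + n_fft])
        # Observed IPD per frequency bin, wrapped to (-pi, pi].
        ipd = np.angle(L * np.conj(R))
        diff = np.angle(np.exp(1j * (ipd - expected_ipd)))
        # Pass only bins whose IPD lies within the pass range.
        mask = np.abs(diff) < tol
        out[s:s + n_fft] += np.fft.irfft(L * mask) * win
        norm[s:s + n_fft] += win ** 2
    return out / np.maximum(norm, 1e-8)
```

A signal arriving from the front reaches both microphones simultaneously (IPD near zero at all frequencies), so filtering with `theta_deg=0` passes it largely intact, while filtering the same signal with an off-axis target direction suppresses it. Note that phase wrapping causes spatial aliasing at high frequencies for a two-microphone array; the tolerance `tol` here plays the role of the paper's pass range.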

Original language: English
Title of host publication: 7th International Conference on Spoken Language Processing, ICSLP 2002
Publisher: International Speech Communication Association
Pages: 1817-1820
Number of pages: 4
Publication status: Published - 2002
Externally published: Yes
Event: 7th International Conference on Spoken Language Processing, ICSLP 2002 - Denver, United States
Duration: 2002 Sep 16 - 2002 Sep 20



ASJC Scopus subject areas

  • Language and Linguistics
  • Linguistics and Language

Cite this

Nakadai, K., Okuno, H. G., & Kitano, H. (2002). Auditory fovea based speech enhancement and its application to human-robot dialog system. In 7th International Conference on Spoken Language Processing, ICSLP 2002 (pp. 1817-1820). International Speech Communication Association.

@inproceedings{2855381d6e544937a48b067408a0fe51,
title = "Auditory fovea based speech enhancement and its application to human-robot dialog system",
abstract = "This paper presents an active direction-pass filter (ADPF) that separates sound arriving from a specified direction using a pair of microphones. Its application to front-end processing for speech recognition is also reported. The ADPF improves sound source separation by exploiting the accurate sound direction obtained through multi-modal integration and by active motor control that keeps the robot facing the sound source; because the directional resolution is much higher at the center than at the periphery, the filter exhibits a property similar to that of the visual fovea. To recognize the separated sound streams, a Hidden Markov Model (HMM) based automatic speech recognizer is built with multiple acoustic models trained on the output of the ADPF under various conditions. Experimental results with a preliminary dialog system show that it works well even when two speakers speak simultaneously.",
author = "Nakadai, Kazuhiro and Okuno, {Hiroshi G.} and Kitano, Hiroaki",
year = "2002",
language = "English",
pages = "1817--1820",
booktitle = "7th International Conference on Spoken Language Processing, ICSLP 2002",
publisher = "International Speech Communication Association",
}

TY - GEN

T1 - Auditory fovea based speech enhancement and its application to human-robot dialog system

AU - Nakadai, Kazuhiro

AU - Okuno, Hiroshi G.

AU - Kitano, Hiroaki

PY - 2002

Y1 - 2002

N2 - This paper presents an active direction-pass filter (ADPF) that separates sound arriving from a specified direction using a pair of microphones. Its application to front-end processing for speech recognition is also reported. The ADPF improves sound source separation by exploiting the accurate sound direction obtained through multi-modal integration and by active motor control that keeps the robot facing the sound source; because the directional resolution is much higher at the center than at the periphery, the filter exhibits a property similar to that of the visual fovea. To recognize the separated sound streams, a Hidden Markov Model (HMM) based automatic speech recognizer is built with multiple acoustic models trained on the output of the ADPF under various conditions. Experimental results with a preliminary dialog system show that it works well even when two speakers speak simultaneously.

AB - This paper presents an active direction-pass filter (ADPF) that separates sound arriving from a specified direction using a pair of microphones. Its application to front-end processing for speech recognition is also reported. The ADPF improves sound source separation by exploiting the accurate sound direction obtained through multi-modal integration and by active motor control that keeps the robot facing the sound source; because the directional resolution is much higher at the center than at the periphery, the filter exhibits a property similar to that of the visual fovea. To recognize the separated sound streams, a Hidden Markov Model (HMM) based automatic speech recognizer is built with multiple acoustic models trained on the output of the ADPF under various conditions. Experimental results with a preliminary dialog system show that it works well even when two speakers speak simultaneously.

UR - http://www.scopus.com/inward/record.url?scp=85009238148&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85009238148&partnerID=8YFLogxK

M3 - Conference contribution

SP - 1817

EP - 1820

BT - 7th International Conference on Spoken Language Processing, ICSLP 2002

PB - International Speech Communication Association

ER -