Three simultaneous speech recognition by integration of active audition and face recognition for humanoid

Kazuhiro Nakadai, Daisuke Matsuura, Hiroshi G. Okuno, Hiroshi Tsujino

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

This paper addresses listening to three simultaneous talkers by a humanoid with two microphones. In such situations, sound separation and automatic speech recognition (ASR) of the separated speech are difficult, because the number of simultaneous talkers exceeds the number of microphones, the signal-to-noise ratio is quite low (around -3 dB), and the noise is not stationary because of the interfering voices. The humanoid audition system consists of sound separation, face recognition and ASR. Sound sources are separated by an active direction-pass filter (ADPF), which extracts sounds from a specified direction in real time. Since the features of sounds separated by the ADPF vary with the sound direction, ASR uses multiple direction- and speaker-dependent acoustic models. The system integrates the ASR results, using the sound direction, the speaker identity obtained by face recognition, and the confidence measure of each ASR result, to select the best one. The resulting system improves word recognition rates for three simultaneous utterances.
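The integration step described in the abstract (several direction- and speaker-dependent acoustic models each produce a result, and direction, face-recognition speaker identity and ASR confidence are used to pick one) can be illustrated with a minimal sketch. It assumes a simple lexicographic rule: prefer results whose acoustic model matches the localized direction within a tolerance, then those whose model matches the speaker identified by face recognition, then the higher confidence. The names `Hypothesis`, `select_best`, `angular_distance` and the 15-degree tolerance are hypothetical; the paper's actual integration rule may weigh these cues differently.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Hypothesis:
    """One ASR result from a single direction- and speaker-dependent acoustic model."""
    text: str
    model_direction_deg: float  # direction the acoustic model was trained for (hypothetical field)
    model_speaker: str          # speaker the acoustic model was trained for (hypothetical field)
    confidence: float           # ASR confidence measure, assumed to lie in [0, 1]


def angular_distance(a: float, b: float) -> float:
    """Smallest absolute difference between two angles, in degrees."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)


def select_best(
    hypotheses: Sequence[Hypothesis],
    source_direction_deg: float,
    face_speaker: Optional[str],
    direction_tolerance_deg: float = 15.0,  # hypothetical tolerance, not taken from the paper
) -> Hypothesis:
    """Pick one ASR result for a separated stream.

    Ranking used here (an illustrative assumption):
      1. model direction within tolerance of the localized source direction,
      2. model speaker equal to the face-recognition result (if one is available),
      3. higher ASR confidence.
    """
    if not hypotheses:
        raise ValueError("no hypotheses to integrate")

    def score(h: Hypothesis) -> tuple:
        direction_ok = (
            angular_distance(h.model_direction_deg, source_direction_deg)
            <= direction_tolerance_deg
        )
        speaker_ok = face_speaker is not None and h.model_speaker == face_speaker
        # Tuples compare lexicographically, so earlier criteria dominate later ones.
        return (direction_ok, speaker_ok, h.confidence)

    return max(hypotheses, key=score)


if __name__ == "__main__":
    candidates = [
        Hypothesis("turn left", 0.0, "alice", 0.62),
        Hypothesis("turn right", 60.0, "bob", 0.91),
        Hypothesis("turn left please", 60.0, "alice", 0.74),
    ]
    best = select_best(candidates, source_direction_deg=55.0, face_speaker="alice")
    print(best.text)  # prints "turn left please"
```

In the usage example the most confident result ("turn right") loses because its model's speaker does not match the face-recognition result, which shows how the hypothetical ranking lets direction and speaker cues override raw confidence.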

Original language: English
Title of host publication: EUROSPEECH 2003 - 8th European Conference on Speech Communication and Technology
Publisher: International Speech Communication Association
Pages: 2705-2708
Number of pages: 4
Publication status: Published - 2003
Externally published: Yes
Event: 8th European Conference on Speech Communication and Technology, EUROSPEECH 2003 - Geneva, Switzerland
Duration: 1 Sep 2003 to 4 Sep 2003

ASJC Scopus subject areas

  • Computer Science Applications
  • Software
  • Linguistics and Language
  • Communication

Cite this

Nakadai, K., Matsuura, D., Okuno, H. G., & Tsujino, H. (2003). Three simultaneous speech recognition by integration of active audition and face recognition for humanoid. In EUROSPEECH 2003 - 8th European Conference on Speech Communication and Technology (pp. 2705-2708). International Speech Communication Association.

Three simultaneous speech recognition by integration of active audition and face recognition for humanoid. / Nakadai, Kazuhiro; Matsuura, Daisuke; Okuno, Hiroshi G.; Tsujino, Hiroshi.

EUROSPEECH 2003 - 8th European Conference on Speech Communication and Technology. International Speech Communication Association, 2003. p. 2705-2708.

Nakadai, K, Matsuura, D, Okuno, HG & Tsujino, H 2003, Three simultaneous speech recognition by integration of active audition and face recognition for humanoid. in EUROSPEECH 2003 - 8th European Conference on Speech Communication and Technology. International Speech Communication Association, pp. 2705-2708, 8th European Conference on Speech Communication and Technology, EUROSPEECH 2003, Geneva, Switzerland, 1 Sep 2003.

Nakadai K, Matsuura D, Okuno HG, Tsujino H. Three simultaneous speech recognition by integration of active audition and face recognition for humanoid. In EUROSPEECH 2003 - 8th European Conference on Speech Communication and Technology. International Speech Communication Association. 2003. p. 2705-2708

Nakadai, Kazuhiro ; Matsuura, Daisuke ; Okuno, Hiroshi G. ; Tsujino, Hiroshi. / Three simultaneous speech recognition by integration of active audition and face recognition for humanoid. EUROSPEECH 2003 - 8th European Conference on Speech Communication and Technology. International Speech Communication Association, 2003. pp. 2705-2708

@inproceedings{42a4b2dbfb1343c3bf8a586e479ea99f,
title = "Three simultaneous speech recognition by integration of active audition and face recognition for humanoid",
abstract = "This paper addresses listening to three simultaneous talkers by a humanoid with two microphones. In such situations, sound separation and automatic speech recognition (ASR) of the separated speech are difficult, because the number of simultaneous talkers exceeds that of its microphones, the signal-to-noise ratio is quite low (around -3 dB) and noise is not stable due to interfering voices. Humanoid audition system consists of sound separation, face recognition and ASR. Sound sources are separated by an active direction-pass filter (ADPF), which extracts sounds from a specified direction in real-time. Since features of sounds separated by ADPF vary according to the sound direction, ASR uses multiple direction- and speaker-dependent acoustic models. The system integrates ASR results by using the sound direction and speaker information by face recognition as well as confidence measure of ASR results to select the best one. The resulting system improves word recognition rates against three simultaneous utterances.",
author = "Nakadai, Kazuhiro and Matsuura, Daisuke and Okuno, Hiroshi G. and Tsujino, Hiroshi",
year = "2003",
language = "English",
pages = "2705--2708",
booktitle = "EUROSPEECH 2003 - 8th European Conference on Speech Communication and Technology",
publisher = "International Speech Communication Association",

}

TY  - GEN
T1  - Three simultaneous speech recognition by integration of active audition and face recognition for humanoid
AU  - Nakadai, Kazuhiro
AU  - Matsuura, Daisuke
AU  - Okuno, Hiroshi G.
AU  - Tsujino, Hiroshi
PY  - 2003
Y1  - 2003
N2  - This paper addresses listening to three simultaneous talkers by a humanoid with two microphones. In such situations, sound separation and automatic speech recognition (ASR) of the separated speech are difficult, because the number of simultaneous talkers exceeds that of its microphones, the signal-to-noise ratio is quite low (around -3 dB) and noise is not stable due to interfering voices. Humanoid audition system consists of sound separation, face recognition and ASR. Sound sources are separated by an active direction-pass filter (ADPF), which extracts sounds from a specified direction in real-time. Since features of sounds separated by ADPF vary according to the sound direction, ASR uses multiple direction- and speaker-dependent acoustic models. The system integrates ASR results by using the sound direction and speaker information by face recognition as well as confidence measure of ASR results to select the best one. The resulting system improves word recognition rates against three simultaneous utterances.
AB  - This paper addresses listening to three simultaneous talkers by a humanoid with two microphones. In such situations, sound separation and automatic speech recognition (ASR) of the separated speech are difficult, because the number of simultaneous talkers exceeds that of its microphones, the signal-to-noise ratio is quite low (around -3 dB) and noise is not stable due to interfering voices. Humanoid audition system consists of sound separation, face recognition and ASR. Sound sources are separated by an active direction-pass filter (ADPF), which extracts sounds from a specified direction in real-time. Since features of sounds separated by ADPF vary according to the sound direction, ASR uses multiple direction- and speaker-dependent acoustic models. The system integrates ASR results by using the sound direction and speaker information by face recognition as well as confidence measure of ASR results to select the best one. The resulting system improves word recognition rates against three simultaneous utterances.
UR  - http://www.scopus.com/inward/record.url?scp=85009193773&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=85009193773&partnerID=8YFLogxK
M3  - Conference contribution
AN  - SCOPUS:85009193773
SP  - 2705
EP  - 2708
BT  - EUROSPEECH 2003 - 8th European Conference on Speech Communication and Technology
PB  - International Speech Communication Association
ER  -