Robust recognition of simultaneous speech by a mobile robot

Jean Marc Valin, Shun'ichi Yamamoto, Jean Rouat, François Michaud, Kazuhiro Nakadai, Hiroshi G. Okuno

Research output: Contribution to journal (Article)

58 Citations (Scopus)

Abstract

This paper describes a system that gives a mobile robot the ability to perform automatic speech recognition with simultaneous speakers. A microphone array is used along with a real-time implementation of geometric source separation (GSS) and a postfilter that gives a further reduction of interference from other sources. The postfilter is also used to estimate the reliability of spectral features and compute a missing feature mask. The mask is used in a missing feature theory-based speech recognition system to recognize the speech from simultaneous Japanese speakers in the context of a humanoid robot. Recognition rates are presented for three simultaneous speakers located at 2 m from the robot. The system was evaluated on a 200-word vocabulary at different azimuths between sources, ranging from 10° to 90°. Compared to the use of the microphone array source separation alone, we demonstrate an average reduction in relative recognition error rate of 24% with the postfilter and of 42% when the missing features approach is combined with the postfilter. We demonstrate the effectiveness of our multisource microphone array postfilter and the improvement it provides when used in conjunction with the missing features theory.
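The two ideas in the abstract that are easiest to misread are the missing feature mask and the "relative" error reduction. The Python sketch below illustrates both under stated assumptions and is not the authors' implementation. It assumes that reliability can be approximated by the ratio of postfilter output energy to input energy in each time-frequency bin. The function names, the 0.5 threshold, and the 50% baseline error rate are hypothetical values chosen only for illustration.

```python
from typing import List

Spectrogram = List[List[float]]  # frames x frequency bins, energy values


def reliability_mask(energy_in: Spectrogram,
                     energy_out: Spectrogram,
                     threshold: float = 0.5) -> List[List[int]]:
    """Return a binary mask: 1 marks a bin treated as reliable.

    Assumption (not from the paper): a bin is reliable when the postfilter
    kept at least `threshold` of its input energy, i.e. little of the bin
    was attributed to interference.
    """
    mask = []
    for frame_in, frame_out in zip(energy_in, energy_out):
        row = []
        for e_in, e_out in zip(frame_in, frame_out):
            # Bins with no input energy carry no evidence: mark unreliable.
            ratio = e_out / e_in if e_in > 0.0 else 0.0
            row.append(1 if ratio >= threshold else 0)
        mask.append(row)
    return mask


def error_after_relative_reduction(baseline_error: float,
                                   relative_reduction: float) -> float:
    """Apply a relative (not absolute) reduction to an error rate."""
    return baseline_error * (1.0 - relative_reduction)


if __name__ == "__main__":
    # Toy energies: 2 frames x 2 bins (made-up numbers).
    e_in = [[1.0, 2.0], [4.0, 0.0]]
    e_out = [[0.9, 0.4], [2.0, 0.0]]
    print(reliability_mask(e_in, e_out))

    # Hypothetical 50% baseline error with separation alone.
    baseline = 0.50
    print(error_after_relative_reduction(baseline, 0.24))  # postfilter
    print(error_after_relative_reduction(baseline, 0.42))  # postfilter + mask
```

A recognizer based on missing feature theory would then use only the bins marked 1, or weight them more heavily, when scoring acoustic models. The second function shows that a 42% relative reduction scales the baseline error rate by 0.58 rather than subtracting 42 percentage points.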

Original language: English
Pages (from-to): 742-752
Number of pages: 11
Journal: IEEE Transactions on Robotics
Volume: 23
Issue number: 4
DOI: https://doi.org/10.1109/TRO.2007.900612
Publication status: Published - 2007 Aug
Externally published: Yes

Fingerprint

Microphones
Mobile robots
Source separation
Speech recognition
Masks
Robots

Keywords

  • Cocktail party
  • Geometric source separation (GSS)
  • Microphone array
  • Missing feature theory
  • Robot audition
  • Speech recognition

ASJC Scopus subject areas

  • Control and Systems Engineering
  • Electrical and Electronic Engineering

Cite this

Robust recognition of simultaneous speech by a mobile robot. / Valin, Jean Marc; Yamamoto, Shun'ichi; Rouat, Jean; Michaud, François; Nakadai, Kazuhiro; Okuno, Hiroshi G.

In: IEEE Transactions on Robotics, Vol. 23, No. 4, 08.2007, p. 742-752.

Valin, JM, Yamamoto, S, Rouat, J, Michaud, F, Nakadai, K & Okuno, HG 2007, 'Robust recognition of simultaneous speech by a mobile robot', IEEE Transactions on Robotics, vol. 23, no. 4, pp. 742-752. https://doi.org/10.1109/TRO.2007.900612
Valin, Jean Marc ; Yamamoto, Shun'ichi ; Rouat, Jean ; Michaud, François ; Nakadai, Kazuhiro ; Okuno, Hiroshi G. / Robust recognition of simultaneous speech by a mobile robot. In: IEEE Transactions on Robotics. 2007 ; Vol. 23, No. 4. pp. 742-752.
@article{196a6252016b47bd9b363a37d7389bb9,
title = "Robust recognition of simultaneous speech by a mobile robot",
abstract = "This paper describes a system that gives a mobile robot the ability to perform automatic speech recognition with simultaneous speakers. A microphone array is used along with a real-time implementation of geometric source separation (GSS) and a postfilter that gives a further reduction of interference from other sources. The postfilter is also used to estimate the reliability of spectral features and compute a missing feature mask. The mask is used in a missing feature theory-based speech recognition system to recognize the speech from simultaneous Japanese speakers in the context of a humanoid robot. Recognition rates are presented for three simultaneous speakers located at 2 m from the robot. The system was evaluated on a 200-word vocabulary at different azimuths between sources, ranging from 10° to 90°. Compared to the use of the microphone array source separation alone, we demonstrate an average reduction in relative recognition error rate of 24{\%} with the postfilter and of 42{\%} when the missing features approach is combined with the postfilter. We demonstrate the effectiveness of our multisource microphone array postfilter and the improvement it provides when used in conjunction with the missing features theory.",
keywords = "Cocktail party, Geometric source separation (GSS), Microphone array, Missing feature theory, Robot audition, Speech recognition",
author = "Valin, {Jean Marc} and Shun'ichi Yamamoto and Jean Rouat and Fran{\c c}ois Michaud and Kazuhiro Nakadai and Okuno, {Hiroshi G.}",
year = "2007",
month = "8",
doi = "10.1109/TRO.2007.900612",
language = "English",
volume = "23",
pages = "742--752",
journal = "IEEE Transactions on Robotics",
issn = "1552-3098",
publisher = "Institute of Electrical and Electronics Engineers Inc.",
number = "4",

}

TY - JOUR

T1 - Robust recognition of simultaneous speech by a mobile robot

AU - Valin, Jean Marc

AU - Yamamoto, Shun'ichi

AU - Rouat, Jean

AU - Michaud, François

AU - Nakadai, Kazuhiro

AU - Okuno, Hiroshi G.

PY - 2007/8

Y1 - 2007/8

N2 - This paper describes a system that gives a mobile robot the ability to perform automatic speech recognition with simultaneous speakers. A microphone array is used along with a real-time implementation of geometric source separation (GSS) and a postfilter that gives a further reduction of interference from other sources. The postfilter is also used to estimate the reliability of spectral features and compute a missing feature mask. The mask is used in a missing feature theory-based speech recognition system to recognize the speech from simultaneous Japanese speakers in the context of a humanoid robot. Recognition rates are presented for three simultaneous speakers located at 2 m from the robot. The system was evaluated on a 200-word vocabulary at different azimuths between sources, ranging from 10° to 90°. Compared to the use of the microphone array source separation alone, we demonstrate an average reduction in relative recognition error rate of 24% with the postfilter and of 42% when the missing features approach is combined with the postfilter. We demonstrate the effectiveness of our multisource microphone array postfilter and the improvement it provides when used in conjunction with the missing features theory.

KW - Cocktail party

KW - Geometric source separation (GSS)

KW - Microphone array

KW - Missing feature theory

KW - Robot audition

KW - Speech recognition

UR - http://www.scopus.com/inward/record.url?scp=34548176123&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=34548176123&partnerID=8YFLogxK

U2 - 10.1109/TRO.2007.900612

DO - 10.1109/TRO.2007.900612

M3 - Article

AN - SCOPUS:34548176123

VL - 23

SP - 742

EP - 752

JO - IEEE Transactions on Robotics

JF - IEEE Transactions on Robotics

SN - 1552-3098

IS - 4

ER -