Improvement of robot audition by interfacing sound source separation and automatic speech recognition with missing feature theory

Shun'ichi Yamamoto, Kazuhiro Nakadai, Hiroshi Tsujino, Toshio Yokoyama, Hiroshi G. Okuno

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

23 Citations (Scopus)

Abstract

We have developed a robot audition system using the active direction-pass filter (ADPF) with the Scattering Theory, and demonstrated that the humanoid SIG could separate and recognize three simultaneous speech signals originating from different directions. This is the first result showing that a robot can listen to several things simultaneously. However, the system's general applicability to other robots had not been confirmed. In addition, because automatic speech recognition (ASR) requires direction- and speaker-dependent acoustic models, the system is difficult to adapt to various environments, and ASR with many acoustic models is slow. In this paper, these three problems are resolved. First, we confirmed the generality of the ADPF by applying it to two humanoids, SIG2 and Replie, under different environments. Next, we present a new interface between the ADPF and ASR based on the Missing Feature Theory, which masks broken features of the separated sound so that ASR does not use them. This new interface improved the recognition performance for three simultaneous speech signals to about 90%. Finally, since the ASR uses only a single direction- and speaker-independent acoustic model trained under clean conditions, the whole system became very light and fast.
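The abstract describes masking "broken" features of the separated sound so that the recognizer ignores them. A common way to realize such a missing-feature interface is a binary reliability mask over time-frequency feature cells: cells where the separated signal dominates the estimated interference leakage are kept, the rest are masked out. The sketch below illustrates this idea only; it is not the authors' implementation, and the function names, the dB threshold, and the use of simple zero-masking (rather than full marginalization inside the decoder) are illustrative assumptions.

```python
import numpy as np

def missing_feature_mask(separated, leakage, threshold_db=3.0):
    """Binary reliability mask over spectral feature cells.

    A cell is marked reliable (1.0) when the separated signal's energy
    exceeds the estimated interference leakage by threshold_db decibels;
    otherwise it is unreliable (0.0) and should be hidden from ASR.
    Both inputs are non-negative energy arrays of the same shape
    (e.g. frames x frequency bins).
    """
    eps = 1e-12  # avoid log(0) / division by zero
    ratio_db = 10.0 * np.log10((separated + eps) / (leakage + eps))
    return (ratio_db > threshold_db).astype(float)

def apply_mask(features, mask):
    """Zero out unreliable cells so they carry no evidence for ASR."""
    return features * mask

# Toy example: first cell is dominated by the target, second by leakage.
separated = np.array([[10.0, 1.0]])
leakage = np.array([[1.0, 10.0]])
mask = missing_feature_mask(separated, leakage)
masked = apply_mask(separated, mask)
```

In a full missing-feature recognizer the mask would typically drive marginalization over the unreliable dimensions of the acoustic model's likelihoods rather than plain zeroing, but the reliability decision itself takes this form.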

Original language: English
Title of host publication: Proceedings - IEEE International Conference on Robotics and Automation
Pages: 1517-1523
Number of pages: 7
Volume: 2004
Edition: 2
Publication status: Published - 2004
Externally published: Yes
Event: 2004 IEEE International Conference on Robotics and Automation - New Orleans, LA, United States
Duration: 2004 Apr 26 - 2004 May 1



ASJC Scopus subject areas

  • Software
  • Control and Systems Engineering

Cite this

Yamamoto, S., Nakadai, K., Tsujino, H., Yokoyama, T., & Okuno, H. G. (2004). Improvement of robot audition by interfacing sound source separation and automatic speech recognition with missing feature theory. In Proceedings - IEEE International Conference on Robotics and Automation (2nd ed., Vol. 2004, pp. 1517-1523).
