Design and implementation of robot audition system 'HARK' - Open source software for listening to three simultaneous speakers

Kazuhiro Nakadai, Toru Takahashi, Hiroshi G. Okuno, Hirofumi Nakajima, Yuji Hasegawa, Hiroshi Tsujino

Research output: Contribution to journal › Article

135 Citations (Scopus)

Abstract

This paper presents the design and implementation of HARK, a robot audition software system that works on any robot with any microphone configuration. HARK consists of sound source localization modules, sound source separation modules, and modules for automatic speech recognition of the separated speech signals. Since a robot with ears may be deployed in various auditory environments, a robot audition system should provide an easy way to adapt to them. HARK provides a set of modules for coping with various auditory environments by using the open-source middleware FlowDesigner, and it reduces the overhead of data transfer between modules. HARK has been open source since April 2008. The resulting implementation of HARK, which combines MUSIC (MUltiple SIgnal Classification)-based sound source localization, GSS (Geometric Source Separation)-based sound source separation, and Missing Feature Theory-based automatic speech recognition, runs on Honda ASIMO, SIG2, and Robovie R2. It recognizes three simultaneous utterances with a delay of 1.9 s at a word correct rate of 80-90% for three speakers.
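As a rough illustration of the first stage of the pipeline described above, MUSIC-based sound source localization, the following is a minimal narrowband sketch in Python with NumPy. It is not HARK's implementation. It assumes a uniform linear array, free-field far-field steering vectors, a single frequency bin, a speed of sound of 343 m/s, and a known number of sources; the function names (`steering_vector`, `music_spectrum`, `pick_peaks`) and all array and scene parameters are hypothetical and chosen only for illustration.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s, assumed room-temperature value (hypothetical parameter)


def steering_vector(theta_deg, n_mics, spacing, freq):
    """Far-field steering vector of a uniform linear array (assumed geometry)."""
    theta = np.deg2rad(theta_deg)
    # Propagation delay of each microphone relative to microphone 0
    delays = np.arange(n_mics) * spacing * np.sin(theta) / SPEED_OF_SOUND
    return np.exp(-2j * np.pi * freq * delays)


def music_spectrum(X, n_sources, spacing, freq, angles):
    """MUSIC pseudo-spectrum from narrowband snapshots X (n_mics x n_snapshots)."""
    n_mics, n_snapshots = X.shape
    R = X @ X.conj().T / n_snapshots          # spatial correlation matrix
    _, eigvecs = np.linalg.eigh(R)            # eigenvalues come back in ascending order
    # Eigenvectors of the smallest eigenvalues span the noise subspace
    noise_subspace = eigvecs[:, : n_mics - n_sources]
    spectrum = []
    for angle in angles:
        a = steering_vector(angle, n_mics, spacing, freq)
        # Small projection onto the noise subspace -> large pseudo-spectrum value
        proj = noise_subspace.conj().T @ a
        spectrum.append(1.0 / np.real(np.vdot(proj, proj)))
    return np.array(spectrum)


def pick_peaks(spectrum, angles, k):
    """Return the k directions with the highest local maxima, sorted by angle."""
    peaks = [i for i in range(1, len(spectrum) - 1)
             if spectrum[i] > spectrum[i - 1] and spectrum[i] >= spectrum[i + 1]]
    peaks.sort(key=lambda i: spectrum[i], reverse=True)
    return sorted(float(angles[i]) for i in peaks[:k])


# Simulated scene (hypothetical): three simultaneous sources, 8 microphones, one 2 kHz bin.
rng = np.random.default_rng(0)
n_mics, spacing, freq = 8, 0.08, 2000.0   # spacing below half a wavelength (~0.086 m)
true_angles = [-40.0, 10.0, 50.0]
A = np.stack([steering_vector(t, n_mics, spacing, freq) for t in true_angles], axis=1)
S = rng.standard_normal((3, 200)) + 1j * rng.standard_normal((3, 200))
noise = 0.05 * (rng.standard_normal((n_mics, 200)) + 1j * rng.standard_normal((n_mics, 200)))
X = A @ S + noise

angles = np.arange(-90.0, 91.0, 1.0)
spectrum = music_spectrum(X, n_sources=3, spacing=spacing, freq=freq, angles=angles)
estimated = pick_peaks(spectrum, angles, k=3)
print(estimated)  # estimated directions in degrees
```

The idea MUSIC relies on is that the steering vectors of the true source directions are orthogonal to the noise subspace of the spatial correlation matrix, so the pseudo-spectrum 1/‖E_nᴴ a(θ)‖² peaks sharply there. A broadband system would typically combine such spectra across many frequency bins rather than using the single bin shown here.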

Original language: English
Pages (from-to): 739-761
Number of pages: 23
Journal: Advanced Robotics
Volume: 24
Issue number: 5-6
DOI: 10.1163/016918610X493561
Publication status: Published - 2010 Apr 14
Externally published: Yes

Keywords

  • Automatic speech recognition
  • Open source software
  • Robot audition
  • Sound source localization
  • Sound source separation

ASJC Scopus subject areas

  • Control and Systems Engineering
  • Human-Computer Interaction
  • Computer Science Applications
  • Hardware and Architecture
  • Software

Cite this

Design and implementation of robot audition system 'HARK' - Open source software for listening to three simultaneous speakers. / Nakadai, Kazuhiro; Takahashi, Toru; Okuno, Hiroshi G.; Nakajima, Hirofumi; Hasegawa, Yuji; Tsujino, Hiroshi.

In: Advanced Robotics, Vol. 24, No. 5-6, 14.04.2010, p. 739-761.

Nakadai, Kazuhiro ; Takahashi, Toru ; Okuno, Hiroshi G. ; Nakajima, Hirofumi ; Hasegawa, Yuji ; Tsujino, Hiroshi. / Design and implementation of robot audition system 'HARK' - Open source software for listening to three simultaneous speakers. In: Advanced Robotics. 2010 ; Vol. 24, No. 5-6. pp. 739-761.
@article{a043c0c778cc4d11bb98a0eba381b993,
title = "Design and implementation of robot audition system 'HARK' - Open source software for listening to three simultaneous speakers",
abstract = "This paper presents the design and implementation of the HARK robot audition software system consisting of sound source localization modules, sound source separation modules and automatic speech recognition modules of separated speech signals that works on any robot with any microphone configuration. Since a robot with ears may be deployed to various auditory environments, the robot audition system should provide an easy way to adapt to them. HARK provides a set of modules to cope with various auditory environments by using an open-sourced middleware, FlowDesigner, and reduces the overheads of data transfer between modules. HARK has been open-sourced since April 2008. The resulting implementation of HARK with MUSIC-based sound source localization, GSS-based sound source separation and Missing Feature Theory-based automatic speech recognition on Honda ASIMO, SIG2 and Robovie R2 attains recognizing three simultaneous utterances with the delay of 1.9 s at the word correct rate of 80-90{\%} for three speakers.",
keywords = "Automatic speech recognition, Open source software, Robot audition, Sound source localization, Sound source separation",
author = "Kazuhiro Nakadai and Toru Takahashi and Okuno, {Hiroshi G.} and Hirofumi Nakajima and Yuji Hasegawa and Hiroshi Tsujino",
year = "2010",
month = "4",
day = "14",
doi = "10.1163/016918610X493561",
language = "English",
volume = "24",
pages = "739--761",
journal = "Advanced Robotics",
issn = "0169-1864",
publisher = "Taylor and Francis Ltd.",
number = "5-6",

}

TY  - JOUR
T1  - Design and implementation of robot audition system 'HARK' - Open source software for listening to three simultaneous speakers
AU  - Nakadai, Kazuhiro
AU  - Takahashi, Toru
AU  - Okuno, Hiroshi G.
AU  - Nakajima, Hirofumi
AU  - Hasegawa, Yuji
AU  - Tsujino, Hiroshi
PY  - 2010/4/14
Y1  - 2010/4/14
N2  - This paper presents the design and implementation of the HARK robot audition software system consisting of sound source localization modules, sound source separation modules and automatic speech recognition modules of separated speech signals that works on any robot with any microphone configuration. Since a robot with ears may be deployed to various auditory environments, the robot audition system should provide an easy way to adapt to them. HARK provides a set of modules to cope with various auditory environments by using an open-sourced middleware, FlowDesigner, and reduces the overheads of data transfer between modules. HARK has been open-sourced since April 2008. The resulting implementation of HARK with MUSIC-based sound source localization, GSS-based sound source separation and Missing Feature Theory-based automatic speech recognition on Honda ASIMO, SIG2 and Robovie R2 attains recognizing three simultaneous utterances with the delay of 1.9 s at the word correct rate of 80-90% for three speakers.
AB  - This paper presents the design and implementation of the HARK robot audition software system consisting of sound source localization modules, sound source separation modules and automatic speech recognition modules of separated speech signals that works on any robot with any microphone configuration. Since a robot with ears may be deployed to various auditory environments, the robot audition system should provide an easy way to adapt to them. HARK provides a set of modules to cope with various auditory environments by using an open-sourced middleware, FlowDesigner, and reduces the overheads of data transfer between modules. HARK has been open-sourced since April 2008. The resulting implementation of HARK with MUSIC-based sound source localization, GSS-based sound source separation and Missing Feature Theory-based automatic speech recognition on Honda ASIMO, SIG2 and Robovie R2 attains recognizing three simultaneous utterances with the delay of 1.9 s at the word correct rate of 80-90% for three speakers.
KW  - Automatic speech recognition
KW  - Open source software
KW  - Robot audition
KW  - Sound source localization
KW  - Sound source separation
UR  - http://www.scopus.com/inward/record.url?scp=77951808166&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=77951808166&partnerID=8YFLogxK
U2  - 10.1163/016918610X493561
DO  - 10.1163/016918610X493561
M3  - Article
AN  - SCOPUS:77951808166
VL  - 24
SP  - 739
EP  - 761
JO  - Advanced Robotics
JF  - Advanced Robotics
SN  - 0169-1864
IS  - 5-6
ER  -