An open source software system for robot audition HARK and its evaluation

Kazuhiro Nakadai, Hiroshi G. Okuno, Hirofumi Nakajima, Yuji Hasegawa, Hiroshi Tsujino

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

50 Citations (Scopus)

Abstract

Robot capability of listening to several things at once by its own ears, that is, robot audition, is important in improving human-robot interaction. The critical issue in robot audition is real-time processing in noisy environments with high flexibility to support various kinds of robots and hardware configurations. This paper presents open-source robot audition software, called "HARK", which includes sound source localization, separation, and automatic speech recognition (ASR). Since separated sounds suffer from spectral distortion due to separation, HARK generates a temporal-frequency map of reliability, called "missing feature mask", for features of separated sounds. Then separated sounds are recognized by the Missing-Feature Theory (MFT) based ASR with missing feature masks. HARK is implemented on the middleware called "FlowDesigner" to share intermediate audio data, which provides real-time processing. HARK's performance in recognition of noisy/simultaneous speech is shown by using three humanoid robots, Honda ASIMO, SIG2 and Robovie with different microphone layouts.

Original language: English
Title of host publication: 2008 8th IEEE-RAS International Conference on Humanoid Robots, Humanoids 2008
Pages: 561-566
Number of pages: 6
DOIs: https://doi.org/10.1109/ICHR.2008.4756031
Publication status: Published - 2008
Externally published: Yes
Event: 2008 8th IEEE-RAS International Conference on Humanoid Robots, Humanoids 2008 - Daejeon
Duration: 2008 Dec 1 to 2008 Dec 3

Other

Other: 2008 8th IEEE-RAS International Conference on Humanoid Robots, Humanoids 2008
City: Daejeon
Period: 08/12/1 to 08/12/3

Fingerprint

Audition
Robots
Acoustic waves
Speech recognition
Masks
Source separation
Human robot interaction
Microphones
Processing
Middleware
Open source software
Hardware

ASJC Scopus subject areas

  • Artificial Intelligence
  • Computer Vision and Pattern Recognition
  • Human-Computer Interaction

Cite this

Nakadai, K., Okuno, H. G., Nakajima, H., Hasegawa, Y., & Tsujino, H. (2008). An open source software system for robot audition HARK and its evaluation. In 2008 8th IEEE-RAS International Conference on Humanoid Robots, Humanoids 2008 (pp. 561-566). [4756031] https://doi.org/10.1109/ICHR.2008.4756031

An open source software system for robot audition HARK and its evaluation. / Nakadai, Kazuhiro; Okuno, Hiroshi G.; Nakajima, Hirofumi; Hasegawa, Yuji; Tsujino, Hiroshi.

2008 8th IEEE-RAS International Conference on Humanoid Robots, Humanoids 2008. 2008. p. 561-566 4756031.

Nakadai, K, Okuno, HG, Nakajima, H, Hasegawa, Y & Tsujino, H 2008, An open source software system for robot audition HARK and its evaluation. in 2008 8th IEEE-RAS International Conference on Humanoid Robots, Humanoids 2008., 4756031, pp. 561-566, 2008 8th IEEE-RAS International Conference on Humanoid Robots, Humanoids 2008, Daejeon, 08/12/1. https://doi.org/10.1109/ICHR.2008.4756031
Nakadai K, Okuno HG, Nakajima H, Hasegawa Y, Tsujino H. An open source software system for robot audition HARK and its evaluation. In 2008 8th IEEE-RAS International Conference on Humanoid Robots, Humanoids 2008. 2008. p. 561-566. 4756031 https://doi.org/10.1109/ICHR.2008.4756031
Nakadai, Kazuhiro ; Okuno, Hiroshi G. ; Nakajima, Hirofumi ; Hasegawa, Yuji ; Tsujino, Hiroshi. / An open source software system for robot audition HARK and its evaluation. 2008 8th IEEE-RAS International Conference on Humanoid Robots, Humanoids 2008. 2008. pp. 561-566
@inproceedings{ee1e3abd8a8b402bbaa3748a7aa761ab,
title = "An open source software system for robot audition HARK and its evaluation",
abstract = "Robot capability of listening to several things at once by its own ears, that is, robot audition, is important in improving human-robot interaction. The critical issue in robot audition is real-time processing in noisy environments with high flexibility to support various kinds of robots and hardware configurations. This paper presents open-source robot audition software, called {"}HARK{"}, which includes sound source localization, separation, and automatic speech recognition (ASR). Since separated sounds suffer from spectral distortion due to separation, HARK generates a temporal-frequency map of reliability, called {"}missing feature mask{"}, for features of separated sounds. Then separated sounds are recognized by the Missing-Feature Theory (MFT) based ASR with missing feature masks. HARK is implemented on the middleware called {"}FlowDesigner{"} to share intermediate audio data, which provides real-time processing. HARK's performance in recognition of noisy/simultaneous speech is shown by using three humanoid robots, Honda ASIMO, SIG2 and Robovie with different microphone layouts.",
author = "Kazuhiro Nakadai and Okuno, {Hiroshi G.} and Hirofumi Nakajima and Yuji Hasegawa and Hiroshi Tsujino",
year = "2008",
doi = "10.1109/ICHR.2008.4756031",
language = "English",
isbn = "9781424428229",
pages = "561--566",
booktitle = "2008 8th IEEE-RAS International Conference on Humanoid Robots, Humanoids 2008",

}

TY - GEN
T1 - An open source software system for robot audition HARK and its evaluation
AU - Nakadai, Kazuhiro
AU - Okuno, Hiroshi G.
AU - Nakajima, Hirofumi
AU - Hasegawa, Yuji
AU - Tsujino, Hiroshi
PY - 2008
Y1 - 2008
N2 - Robot capability of listening to several things at once by its own ears, that is, robot audition, is important in improving human-robot interaction. The critical issue in robot audition is real-time processing in noisy environments with high flexibility to support various kinds of robots and hardware configurations. This paper presents open-source robot audition software, called "HARK", which includes sound source localization, separation, and automatic speech recognition (ASR). Since separated sounds suffer from spectral distortion due to separation, HARK generates a temporal-frequency map of reliability, called "missing feature mask", for features of separated sounds. Then separated sounds are recognized by the Missing-Feature Theory (MFT) based ASR with missing feature masks. HARK is implemented on the middleware called "FlowDesigner" to share intermediate audio data, which provides real-time processing. HARK's performance in recognition of noisy/simultaneous speech is shown by using three humanoid robots, Honda ASIMO, SIG2 and Robovie with different microphone layouts.
AB - Robot capability of listening to several things at once by its own ears, that is, robot audition, is important in improving human-robot interaction. The critical issue in robot audition is real-time processing in noisy environments with high flexibility to support various kinds of robots and hardware configurations. This paper presents open-source robot audition software, called "HARK", which includes sound source localization, separation, and automatic speech recognition (ASR). Since separated sounds suffer from spectral distortion due to separation, HARK generates a temporal-frequency map of reliability, called "missing feature mask", for features of separated sounds. Then separated sounds are recognized by the Missing-Feature Theory (MFT) based ASR with missing feature masks. HARK is implemented on the middleware called "FlowDesigner" to share intermediate audio data, which provides real-time processing. HARK's performance in recognition of noisy/simultaneous speech is shown by using three humanoid robots, Honda ASIMO, SIG2 and Robovie with different microphone layouts.
UR - http://www.scopus.com/inward/record.url?scp=63549118078&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=63549118078&partnerID=8YFLogxK
U2 - 10.1109/ICHR.2008.4756031
DO - 10.1109/ICHR.2008.4756031
M3 - Conference contribution
AN - SCOPUS:63549118078
SN - 9781424428229
SP - 561
EP - 566
BT - 2008 8th IEEE-RAS International Conference on Humanoid Robots, Humanoids 2008
ER -