Computational auditory scene analysis and its application to robot audition

Hiroshi G. Okuno, Kazuhiro Nakadai

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

9 Citations (Scopus)

Abstract

The capability of a robot to hear sounds, in particular a mixture of sounds, with its own microphones, that is, robot audition, is important for improving human-robot interaction. This paper presents open-source robot audition software called "HARK" (HRI-JP Audition for Robots with Kyoto University), which provides the primitive functions of computational auditory scene analysis: sound source localization, sound source separation, and recognition of the separated sounds. Because separated sounds suffer from spectral distortion caused by the separation process, HARK generates a time-spectral map of reliability, called a "missing feature mask", for the features of each separated sound. The separated sounds are then recognized by Missing-Feature-Theory (MFT) based automatic speech recognition (ASR) using these masks. HARK is implemented on the middleware "FlowDesigner", which shares intermediate audio data among modules and enables near-real-time processing.
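The missing-feature idea described in the abstract can be sketched as follows. This is an illustrative toy, not the HARK API: a binary mask marks each time-spectral cell of a separated source as reliable or unreliable (here by thresholding an estimated signal-to-interference ratio, an assumption for the sketch), and an MFT-style recognizer scores only the reliable cells, marginalizing out the rest.

```python
import math

def missing_feature_mask(separated, interference, snr_threshold_db=0.0):
    """Mark a time-spectral cell reliable (1) when the separated energy
    exceeds the estimated interference energy by snr_threshold_db,
    otherwise unreliable (0). Inputs are per-frame lists of energies."""
    mask = []
    for frame_sep, frame_int in zip(separated, interference):
        row = []
        for s, n in zip(frame_sep, frame_int):
            snr_db = 10.0 * math.log10(max(s, 1e-12) / max(n, 1e-12))
            row.append(1 if snr_db > snr_threshold_db else 0)
        mask.append(row)
    return mask

def mft_log_likelihood(features, means, variances, mask):
    """Diagonal-Gaussian log-likelihood that skips (marginalizes) the
    feature dimensions the mask flags as unreliable, so distorted cells
    cannot corrupt the acoustic score."""
    total = 0.0
    for frame, mrow in zip(features, mask):
        for x, mu, var, m in zip(frame, means, variances, mrow):
            if m:  # only reliable cells contribute to the score
                total += -0.5 * (math.log(2 * math.pi * var)
                                 + (x - mu) ** 2 / var)
    return total

# Toy usage: one frame, two spectral bins; the second bin is dominated
# by interference, so the mask drops it from the likelihood.
m = missing_feature_mask([[4.0, 0.5]], [[1.0, 1.0]])
ll = mft_log_likelihood([[0.0, 100.0]], [0.0, 0.0], [1.0, 1.0], m)
```

A real MFT-based ASR integrates such per-cell scores into an HMM decoder and may use soft (probabilistic) masks rather than the hard 0/1 mask shown here.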

Original language: English
Title of host publication: 2008 Hands-free Speech Communication and Microphone Arrays, Proceedings, HSCMA 2008
Pages: 124-127
Number of pages: 4
DOI: 10.1109/HSCMA.2008.4538702
Publication status: Published - 2008
Externally published: Yes
Event: 2008 Hands-free Speech Communication and Microphone Arrays, HSCMA 2008, Trento
Duration: 2008 May 6 - 2008 May 8

Keywords

  • Computational auditory scene analysis
  • Missing feature theory
  • Robot audition
  • Simultaneous speakers

ASJC Scopus subject areas

  • Hardware and Architecture
  • Electrical and Electronic Engineering
  • Communication

Cite this

Okuno, H. G., & Nakadai, K. (2008). Computational auditory scene analysis and its application to robot audition. In 2008 Hands-free Speech Communication and Microphone Arrays, Proceedings, HSCMA 2008 (pp. 124-127). [4538702] https://doi.org/10.1109/HSCMA.2008.4538702

@inproceedings{a4f94496f3e8458c897c391c165568d5,
  title     = "Computational auditory scene analysis and its application to robot audition",
  author    = "Okuno, {Hiroshi G.} and Kazuhiro Nakadai",
  year      = "2008",
  doi       = "10.1109/HSCMA.2008.4538702",
  language  = "English",
  isbn      = "9781424423385",
  pages     = "124--127",
  booktitle = "2008 Hands-free Speech Communication and Microphone Arrays, Proceedings, HSCMA 2008",
}
