Robot audition from the viewpoint of computational auditory scene analysis

Hiroshi G. Okuno, Tetsuya Ogata, Kazunori Komatani

Research output: Conference contribution

1 Citation (Scopus)

Abstract

We have been engaged in research on computational auditory scene analysis to attain sophisticated robot/computer human interaction by manipulating real-world sound signals. The objective of our research is the understanding of an arbitrary sound mixture including music and environmental sounds as well as voiced speech, obtained by robot's ears (microphones) embedded on the robot. Three main issues in computational auditory scene analysis are sound source localization, separation, and recognition of separated sounds for a mixture of speech signals as well as polyphonic music signals. The Missing Feature Theory (MFT) approach integrates sound source separation and automatic speech recognition by generating missing feature masks. This robot audition system has been successfully ported to three kinds of robots, SIG2, Robovie R2 and Honda ASIMO. A robot recognizes three simultaneous speeches such as placing a meal order or a referee for Rock-Paper-Scissors Sound Games with a delay of less than 2 seconds. The real-time beat tracking system is also developed for robot audition. A robot hears music, understands and predicts its musical beats to behave in accordance with the beat times in real-time.

Original language: English
Title of host publication: Proceedings - International Conference on Informatics Education and Research for Knowledge-Circulating Society, ICKS 2008
Pages: 35-40
Number of pages: 6
DOI: https://doi.org/10.1109/ICKS.2008.10
Publication status: Published - 2008
Externally published: Yes
Event: International Conference on Informatics Education and Research for Knowledge-Circulating Society, ICKS 2008 - Kyoto
Duration: 2008-01-17 to 2008-01-17

Fingerprint

Audition
robot
Robots
Acoustic waves
Source separation
music
referee
meals
Human computer interaction
Microphones
Speech recognition
Masks
interaction
time

ASJC Scopus subject areas

  • Information Systems
  • Education

Cite this

Okuno, H. G., Ogata, T., & Komatani, K. (2008). Robot audition from the viewpoint of computational auditory scene analysis. In Proceedings - International Conference on Informatics Education and Research for Knowledge-Circulating Society, ICKS 2008 (pp. 35-40). [4460465] https://doi.org/10.1109/ICKS.2008.10

Robot audition from the viewpoint of computational auditory scene analysis. / Okuno, Hiroshi G.; Ogata, Tetsuya; Komatani, Kazunori.

Proceedings - International Conference on Informatics Education and Research for Knowledge-Circulating Society, ICKS 2008. 2008. p. 35-40 4460465.

Okuno, HG, Ogata, T & Komatani, K 2008, Robot audition from the viewpoint of computational auditory scene analysis. in Proceedings - International Conference on Informatics Education and Research for Knowledge-Circulating Society, ICKS 2008., 4460465, pp. 35-40, International Conference on Informatics Education and Research for Knowledge-Circulating Society, ICKS 2008, Kyoto, 08/1/17. https://doi.org/10.1109/ICKS.2008.10
Okuno HG, Ogata T, Komatani K. Robot audition from the viewpoint of computational auditory scene analysis. In Proceedings - International Conference on Informatics Education and Research for Knowledge-Circulating Society, ICKS 2008. 2008. p. 35-40. 4460465 https://doi.org/10.1109/ICKS.2008.10
Okuno, Hiroshi G. ; Ogata, Tetsuya ; Komatani, Kazunori. / Robot audition from the viewpoint of computational auditory scene analysis. Proceedings - International Conference on Informatics Education and Research for Knowledge-Circulating Society, ICKS 2008. 2008. pp. 35-40
@inproceedings{239adac1534a455fa9cea0b9d5ec1a23,
title = "Robot audition from the viewpoint of computational auditory scene analysis",
abstract = "We have been engaged in research on computational auditory scene analysis to attain sophisticated robot/computer human interaction by manipulating real-world sound signals. The objective of our research is the understanding of an arbitrary sound mixture including music and environmental sounds as well as voiced speech, obtained by robot's ears (microphones) embedded on the robot. Three main issues in computational auditory scene analysis are sound source localization, separation, and recognition of separated sounds for a mixture of speech signals as well as polyphonic music signals. The Missing Feature Theory (MFT) approach integrates sound source separation and automatic speech recognition by generating missing feature masks. This robot audition system has been successfully ported to three kinds of robots, SIG2, Robovie R2 and Honda ASIMO. A robot recognizes three simultaneous speeches such as placing a meal order or a referee for Rock-Paper-Scissors Sound Games with a delay of less than 2 seconds. The real-time beat tracking system is also developed for robot audition. A robot hears music, understands and predicts its musical beats to behave in accordance with the beat times in real-time.",
author = "Okuno, {Hiroshi G.} and Tetsuya Ogata and Kazunori Komatani",
year = "2008",
doi = "10.1109/ICKS.2008.10",
language = "English",
isbn = "0769531288",
pages = "35--40",
booktitle = "Proceedings - International Conference on Informatics Education and Research for Knowledge-Circulating Society, ICKS 2008",

}

TY - GEN

T1 - Robot audition from the viewpoint of computational auditory scene analysis

AU - Okuno, Hiroshi G.

AU - Ogata, Tetsuya

AU - Komatani, Kazunori

PY - 2008

Y1 - 2008

N2 - We have been engaged in research on computational auditory scene analysis to attain sophisticated robot/computer human interaction by manipulating real-world sound signals. The objective of our research is the understanding of an arbitrary sound mixture including music and environmental sounds as well as voiced speech, obtained by robot's ears (microphones) embedded on the robot. Three main issues in computational auditory scene analysis are sound source localization, separation, and recognition of separated sounds for a mixture of speech signals as well as polyphonic music signals. The Missing Feature Theory (MFT) approach integrates sound source separation and automatic speech recognition by generating missing feature masks. This robot audition system has been successfully ported to three kinds of robots, SIG2, Robovie R2 and Honda ASIMO. A robot recognizes three simultaneous speeches such as placing a meal order or a referee for Rock-Paper-Scissors Sound Games with a delay of less than 2 seconds. The real-time beat tracking system is also developed for robot audition. A robot hears music, understands and predicts its musical beats to behave in accordance with the beat times in real-time.

AB - We have been engaged in research on computational auditory scene analysis to attain sophisticated robot/computer human interaction by manipulating real-world sound signals. The objective of our research is the understanding of an arbitrary sound mixture including music and environmental sounds as well as voiced speech, obtained by robot's ears (microphones) embedded on the robot. Three main issues in computational auditory scene analysis are sound source localization, separation, and recognition of separated sounds for a mixture of speech signals as well as polyphonic music signals. The Missing Feature Theory (MFT) approach integrates sound source separation and automatic speech recognition by generating missing feature masks. This robot audition system has been successfully ported to three kinds of robots, SIG2, Robovie R2 and Honda ASIMO. A robot recognizes three simultaneous speeches such as placing a meal order or a referee for Rock-Paper-Scissors Sound Games with a delay of less than 2 seconds. The real-time beat tracking system is also developed for robot audition. A robot hears music, understands and predicts its musical beats to behave in accordance with the beat times in real-time.

UR - http://www.scopus.com/inward/record.url?scp=50149115357&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=50149115357&partnerID=8YFLogxK

U2 - 10.1109/ICKS.2008.10

DO - 10.1109/ICKS.2008.10

M3 - Conference contribution

AN - SCOPUS:50149115357

SN - 0769531288

SN - 9780769531281

SP - 35

EP - 40

BT - Proceedings - International Conference on Informatics Education and Research for Knowledge-Circulating Society, ICKS 2008

ER -