An active audition framework for auditory-driven HRI

Application to interactive robot dancing

Joao Lobato Oliveira, Gokhan Ince, Keisuke Nakamura, Kazuhiro Nakadai, Hiroshi G. Okuno, Luis Paulo Reis, Fabien Gouyon

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

9 Citations (Scopus)

Abstract

In this paper we propose a general active audition framework for auditory-driven Human-Robot Interaction (HRI). The proposed framework simultaneously processes speech and music on-the-fly, integrates perceptual models for robot audition, and supports verbal and non-verbal interactive communication by means of (pro)active behaviors. To ensure a reliable interaction, on top of the framework, a behavior decision mechanism based on active audition polices the robot's actions according to the reliability of the acoustic signals for auditory processing. To validate the framework's application to general auditory-driven HRI, we propose the implementation of an interactive robot dancing system. This system integrates three preprocessing robot audition modules: sound source localization, sound source separation, and ego noise suppression; two modules for auditory perception: live audio beat tracking and automatic speech recognition; and multi-modal behaviors for verbal and non-verbal interaction: music-driven dancing and speech-driven dialoguing. To fully assess the system, we set up experimental and interactive real-world scenarios with highly dynamic acoustic conditions, and defined a set of evaluation criteria. The experimental tests revealed accurate and robust beat tracking and speech recognition, and convincing dance beat-synchrony. The interactive sessions confirmed the fundamental role of the behavior decision mechanism in actively maintaining a robust and natural human-robot interaction.
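The record gives no implementation details, but the behavior decision mechanism is the architectural core of the abstract: it gates verbal and non-verbal behaviors on how reliable each acoustic stream is after preprocessing. As a minimal illustrative sketch only (the `AuditoryState` fields, thresholds, and action names below are hypothetical, not taken from the paper), such a reliability-driven policy might look like:

```python
from dataclasses import dataclass

@dataclass
class AuditoryState:
    """Hypothetical summary of the preprocessed auditory scene."""
    speech_snr_db: float       # separated-speech SNR after ego noise suppression
    beat_confidence: float     # 0..1 confidence of the live audio beat tracker
    source_azimuth_deg: float  # localized direction of the dominant sound source

def decide_behavior(state: AuditoryState,
                    speech_snr_floor: float = 6.0,
                    beat_conf_floor: float = 0.5) -> str:
    """Choose an action from the reliability of the auditory streams.

    Illustrative policy only: prefer dialogue when separated speech is
    clean, dance when the beat estimate is trustworthy, and otherwise
    act to improve the input, in the spirit of active audition.
    """
    if state.speech_snr_db >= speech_snr_floor:
        return "dialogue"            # speech-driven verbal interaction
    if state.beat_confidence >= beat_conf_floor:
        return "dance"               # music-driven, beat-synchronous motion
    if abs(state.source_azimuth_deg) > 30.0:
        return "turn_toward_source"  # reorient to improve localization and SNR
    return "stop_and_listen"         # pause noisy ego-motion until signals recover

# Example: noisy speech but a confident beat estimate -> the robot dances.
print(decide_behavior(AuditoryState(speech_snr_db=2.0,
                                    beat_confidence=0.8,
                                    source_azimuth_deg=10.0)))
```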

Original language: English
Title of host publication: Proceedings - IEEE International Workshop on Robot and Human Interactive Communication
Pages: 1078-1085
Number of pages: 8
DOI: 10.1109/ROMAN.2012.6343892
Publication status: Published - 2012
Externally published: Yes
Event: 2012 21st IEEE International Symposium on Robot and Human Interactive Communication, RO-MAN 2012 - Paris
Duration: 2012 Sep 9 - 2012 Sep 13

Other

Other: 2012 21st IEEE International Symposium on Robot and Human Interactive Communication, RO-MAN 2012
City: Paris
Period: 2012 Sep 9 - 2012 Sep 13

Fingerprint

  • Human robot interaction
  • Audition
  • Robots
  • Speech recognition
  • Acoustics
  • Acoustic waves
  • Source separation
  • Acoustic noise
  • Communication
  • Processing

ASJC Scopus subject areas

  • Software
  • Artificial Intelligence
  • Human-Computer Interaction

Cite this

Oliveira, J. L., Ince, G., Nakamura, K., Nakadai, K., Okuno, H. G., Reis, L. P., & Gouyon, F. (2012). An active audition framework for auditory-driven HRI: Application to interactive robot dancing. In Proceedings - IEEE International Workshop on Robot and Human Interactive Communication (pp. 1078-1085). [6343892] https://doi.org/10.1109/ROMAN.2012.6343892

@inproceedings{995791bf65d74a3ea8da6f0e810b2cec,
title = "An active audition framework for auditory-driven HRI: Application to interactive robot dancing",
abstract = "In this paper we propose a general active audition framework for auditory-driven Human-Robot Interaction (HRI). The proposed framework simultaneously processes speech and music on-the-fly, integrates perceptual models for robot audition, and supports verbal and non-verbal interactive communication by means of (pro)active behaviors. To ensure a reliable interaction, on top of the framework a behavior decision mechanism based on active audition policies the robot's actions according to the reliability of the acoustic signals for auditory processing. To validate the framework's application to general auditory-driven HRI, we propose the implementation of an interactive robot dancing system. This system integrates three preprocessing robot audition modules: sound source localization, sound source separation, and ego noise suppression; two modules for auditory perception: live audio beat tracking and automatic speech recognition; and multi-modal behaviors for verbal and non-verbal interaction: music-driven dancing and speech-driven dialoguing. To fully assess the system, we set up experimental and interactive real-world scenarios with highly dynamic acoustic conditions, and defined a set of evaluation criteria. The experimental tests revealed accurate and robust beat tracking and speech recognition, and convincing dance beat-synchrony. The interactive sessions confirmed the fundamental role of the behavior decision mechanism for actively maintaining a robust and natural human-robot interaction.",
author = "Oliveira, {Joao Lobato} and Gokhan Ince and Keisuke Nakamura and Kazuhiro Nakadai and Okuno, {Hiroshi G.} and Reis, {Luis Paulo} and Fabien Gouyon",
year = "2012",
doi = "10.1109/ROMAN.2012.6343892",
language = "English",
isbn = "9781467346054",
pages = "1078--1085",
booktitle = "Proceedings - IEEE International Workshop on Robot and Human Interactive Communication",

}

TY - GEN

T1 - An active audition framework for auditory-driven HRI

T2 - Application to interactive robot dancing

AU - Oliveira, Joao Lobato

AU - Ince, Gokhan

AU - Nakamura, Keisuke

AU - Nakadai, Kazuhiro

AU - Okuno, Hiroshi G.

AU - Reis, Luis Paulo

AU - Gouyon, Fabien

PY - 2012

Y1 - 2012

AB - In this paper we propose a general active audition framework for auditory-driven Human-Robot Interaction (HRI). The proposed framework simultaneously processes speech and music on-the-fly, integrates perceptual models for robot audition, and supports verbal and non-verbal interactive communication by means of (pro)active behaviors. To ensure a reliable interaction, on top of the framework, a behavior decision mechanism based on active audition polices the robot's actions according to the reliability of the acoustic signals for auditory processing. To validate the framework's application to general auditory-driven HRI, we propose the implementation of an interactive robot dancing system. This system integrates three preprocessing robot audition modules: sound source localization, sound source separation, and ego noise suppression; two modules for auditory perception: live audio beat tracking and automatic speech recognition; and multi-modal behaviors for verbal and non-verbal interaction: music-driven dancing and speech-driven dialoguing. To fully assess the system, we set up experimental and interactive real-world scenarios with highly dynamic acoustic conditions, and defined a set of evaluation criteria. The experimental tests revealed accurate and robust beat tracking and speech recognition, and convincing dance beat-synchrony. The interactive sessions confirmed the fundamental role of the behavior decision mechanism in actively maintaining a robust and natural human-robot interaction.

UR - http://www.scopus.com/inward/record.url?scp=84870795054&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84870795054&partnerID=8YFLogxK

U2 - 10.1109/ROMAN.2012.6343892

DO - 10.1109/ROMAN.2012.6343892

M3 - Conference contribution

SN - 9781467346054

SP - 1078

EP - 1085

BT - Proceedings - IEEE International Workshop on Robot and Human Interactive Communication

ER -