Design and implementation of 3D auditory scene visualizer towards auditory awareness with face tracking

Yuji Kubota, Masatoshi Yoshida, Kazunori Komatani, Tetsuya Ogata, Hiroshi G. Okuno

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

11 Citations (Scopus)

Abstract

If machine audition can recognize an auditory scene containing simultaneous and moving talkers, what kinds of awareness will people gain from an auditory scene visualizer? This paper presents the design and implementation of a 3D Auditory Scene Visualizer based on the visual information-seeking mantra, i.e., "overview first, zoom and filter, then details on demand". The machine audition system called HARK captures 3D sounds with a microphone array, localizes and separates sounds, and recognizes separated sounds by automatic speech recognition (ASR). The 3D visualizer implemented in Java 3D displays each sound stream as a beam originating from the center of the microphones (overview mode), shows temporal snapshots with/without specifying focusing areas (zoom and filter mode), and shows detailed information about a particular sound stream (details-on-demand mode). In the details-on-demand mode, ASR results are displayed in a "karaoke" manner, i.e., character-by-character. This three-mode visualization will give the user auditory awareness enhanced by HARK. In addition, a face-tracking system automatically changes the focus of attention by tracking the user's face. The resulting system is portable and can be deployed anywhere, so it is expected to give more vivid awareness than expensive high-fidelity auditory scene reproduction systems.

Original language: English
Title of host publication: Proceedings - 10th IEEE International Symposium on Multimedia, ISM 2008
Pages: 468-476
Number of pages: 9
DOIs: https://doi.org/10.1109/ISM.2008.107
Publication status: Published - 2008 Dec 1
Externally published: Yes
Event: 10th IEEE International Symposium on Multimedia, ISM 2008 - Berkeley, CA, United States
Duration: 2008 Dec 15 → 2008 Dec 17

Publication series

Name: Proceedings - 10th IEEE International Symposium on Multimedia, ISM 2008

Conference

Conference: 10th IEEE International Symposium on Multimedia, ISM 2008
Country: United States
City: Berkeley, CA
Period: 08/12/15 → 08/12/17

ASJC Scopus subject areas

  • Computer Graphics and Computer-Aided Design
  • Computer Science Applications
  • Electrical and Electronic Engineering

  • Cite this

    Kubota, Y., Yoshida, M., Komatani, K., Ogata, T., & Okuno, H. G. (2008). Design and implementation of 3D auditory scene visualizer towards auditory awareness with face tracking. In Proceedings - 10th IEEE International Symposium on Multimedia, ISM 2008 (pp. 468-476). [4741208] (Proceedings - 10th IEEE International Symposium on Multimedia, ISM 2008). https://doi.org/10.1109/ISM.2008.107