Design and implementation of 3D auditory scene visualizer towards auditory awareness with face tracking

Yuji Kubota*, Masatoshi Yoshida, Kazunori Komatani, Tetsuya Ogata, Hiroshi G. Okuno

*Corresponding author for this work

Research output: Conference contribution

11 Citations (Scopus)

Abstract

If machine audition can recognize an auditory scene containing simultaneous and moving talkers, what kinds of awareness will people gain from an auditory scene visualizer? This paper presents the design and implementation of a 3D Auditory Scene Visualizer based on the visual information seeking mantra, i.e., "overview first, zoom and filter, then details on demand". The machine audition system called HARK captures 3D sounds with a microphone array, localizes and separates sounds, and recognizes separated sounds by automatic speech recognition (ASR). The 3D visualizer implemented in Java 3D displays each sound stream as a beam originating from the center of the microphones (overview mode), shows temporal snapshots with/without specifying focusing areas (zoom and filter mode), and shows detailed information about a particular sound stream (details-on-demand mode). In the details-on-demand mode, ASR results are displayed in a "karaoke" manner, i.e., character-by-character. This three-mode visualization will give the user auditory awareness enhanced by HARK. In addition, a face-tracking system automatically changes the focus of attention by tracking the user's face. The resulting system is portable and can be deployed in any place, so it is expected to give more vivid awareness than expensive high-fidelity auditory scene reproduction systems.

Original language: English
Title of host publication: Proceedings - 10th IEEE International Symposium on Multimedia, ISM 2008
Pages: 468-476
Number of pages: 9
DOI
Publication status: Published - 2008
Externally published: Yes
Event: 10th IEEE International Symposium on Multimedia, ISM 2008 - Berkeley, CA, United States
Duration: 15 Dec 2008 to 17 Dec 2008

Publication series

Name: Proceedings - 10th IEEE International Symposium on Multimedia, ISM 2008

Conference

Conference: 10th IEEE International Symposium on Multimedia, ISM 2008
Country/Territory: United States
City: Berkeley, CA
Period: 08/12/15 to 08/12/17

ASJC Scopus subject areas

  • Computer Graphics and Computer-Aided Design
  • Computer Science Applications
  • Electrical and Electronic Engineering
