Low-Latency Real-Time Meeting Recognition and Understanding Using Distant Microphones and Omni-Directional Camera

Takaaki Hori, Shoko Araki, Takuya Yoshioka, Masakiyo Fujimoto, Shinji Watanabe, Takanobu Oba, Atsunori Ogawa, Keisuke Kinoshita, Tomohiro Nakatani, Atsushi Nakamura, Kazuhiro Otsuka, Dan Mikami, Junji Yamato

Research output: Article › peer-review

67 Citations (Scopus)

Abstract

This paper presents our real-time meeting analyzer for monitoring conversations in an ongoing group meeting. The goal of the system is to automatically recognize “who is speaking what” in an online manner for meeting assistance. Our system continuously captures the utterances and face poses of each speaker using a microphone array and an omni-directional camera positioned at the center of the meeting table. Through a series of advanced audio processing operations, the overlapping speech signal is enhanced and its components are separated into individual speakers' channels. The utterances are then sequentially transcribed by our speech recognizer with low latency. In parallel with speech recognition, the activity of each participant (e.g., speaking, laughing, watching someone) and the circumstances of the meeting (e.g., topic, activeness, casualness) are detected and displayed on a browser together with the transcripts. In this paper, we describe our techniques and our attempt to achieve low-latency monitoring of meetings, and we show our experimental results for real-time meeting transcription.
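
The abstract outlines an online pipeline: continuous capture, enhancement and separation of overlapping speech into per-speaker channels, incremental recognition, and browser display. The following is a minimal structural sketch of such a loop under assumed interfaces; every component name (mic_array, separator, recognizer, browser_ui) and method is a hypothetical placeholder for illustration, not the system described in the paper.

    # Minimal sketch of an online capture -> separate -> recognize -> display loop.
    # All component names and methods below are hypothetical placeholders.
    import queue

    audio_blocks = queue.Queue()   # short multichannel blocks from the microphone array
    transcripts = queue.Queue()    # (speaker_id, text) results for the browser view

    def capture_loop(mic_array):
        # Continuously grab short audio blocks so later stages can run with low latency.
        while True:
            audio_blocks.put(mic_array.read_block())

    def recognize_loop(separator, recognizer):
        # Enhance the overlapping signal, split it into per-speaker channels,
        # and decode each channel incrementally, emitting partial hypotheses.
        while True:
            block = audio_blocks.get()
            for speaker_id, channel in separator.process(block):
                for hypothesis in recognizer.decode_incremental(speaker_id, channel):
                    transcripts.put((speaker_id, hypothesis))

    def display_loop(browser_ui):
        # Show transcripts (and, in the real system, activity and meeting-state labels).
        while True:
            speaker_id, text = transcripts.get()
            browser_ui.update(speaker=speaker_id, text=text)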

Original language: English
Pages (from-to): 499-513
Number of pages: 15
Journal: IEEE Transactions on Audio, Speech and Language Processing
Volume: 20
Issue number: 2
DOI
Publication status: Published - 2012
Externally published: Yes

ASJC Scopus subject areas

  • Acoustics and Ultrasonics
  • Electrical and Electronic Engineering
