Low-Latency Real-Time Meeting Recognition and Understanding Using Distant Microphones and Omni-Directional Camera

Takaaki Hori, Shoko Araki, Takuya Yoshioka, Masakiyo Fujimoto, Shinji Watanabe, Takanobu Oba, Atsunori Ogawa, Keisuke Kinoshita, Tomohiro Nakatani, Atsushi Nakamura, Kazuhiro Otsuka, Dan Mikami, Junji Yamato

Research output: Contribution to journal › Article

53 Citations (Scopus)

Abstract

This paper presents our real-time meeting analyzer for monitoring conversations in an ongoing group meeting. The goal of the system is to recognize automatically “who is speaking what” in an online manner for meeting assistance. Our system continuously captures the utterances and face poses of each speaker using a microphone array and an omni-directional camera positioned at the center of the meeting table. Through a series of advanced audio processing operations, an overlapping speech signal is enhanced and the components are separated into individual speakers' channels. Then the utterances are sequentially transcribed by our speech recognizer with low latency. In parallel with speech recognition, the activity of each participant (e.g., speaking, laughing, watching someone) and the circumstances of the meeting (e.g., topic, activeness, casualness) are detected and displayed on a browser together with the transcripts. In this paper, we describe our techniques and our attempt to achieve the low-latency monitoring of meetings, and we show our experimental results for real-time meeting transcription.
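
As a rough illustration of the front end described in the abstract, the sketch below shows delay-and-sum beamforming, one of the simplest ways a microphone array at the center of a meeting table can emphasize a single talker in distant, overlapping speech. It is a toy example, not the authors' actual enhancement and separation pipeline, and every array-geometry, sampling-rate, and direction value in it is an assumption made up for the example.

# Minimal, illustrative sketch (not the paper's actual method): delay-and-sum
# beamforming with a table-center microphone array. All geometry, sampling-rate,
# and direction values below are assumptions made for the example.
import numpy as np

SPEED_OF_SOUND = 343.0  # meters per second

def delay_and_sum(signals, mic_positions, look_direction, sample_rate=16000):
    """Time-align the channels toward one look direction and average them.

    signals:        (n_mics, n_samples) time-domain recordings
    mic_positions:  (n_mics, 3) microphone coordinates in meters
    look_direction: unit vector from the array toward the target talker
    """
    n_mics, n_samples = signals.shape
    # Far-field arrival-time offset of each microphone for this look direction.
    delays = mic_positions @ look_direction / SPEED_OF_SOUND
    delays -= delays.min()  # keep all steering delays non-negative

    # Fractional delays are applied as phase shifts in the frequency domain.
    freqs = np.fft.rfftfreq(n_samples, d=1.0 / sample_rate)
    spectra = np.fft.rfft(signals, axis=1)
    steering = np.exp(-2j * np.pi * freqs[None, :] * delays[:, None])
    aligned = np.fft.irfft(spectra * steering, n=n_samples, axis=1)

    # Averaging keeps the aligned target speech and attenuates uncorrelated noise.
    return aligned.mean(axis=0)

# Toy usage: a 440-Hz "talker" on the x-axis, four mics in a 10-cm cross array.
if __name__ == "__main__":
    rate = 16000
    t = np.arange(rate) / rate
    mics = np.array([[0.1, 0, 0], [0, 0.1, 0], [-0.1, 0, 0], [0, -0.1, 0]])
    direction = np.array([1.0, 0.0, 0.0])
    taus = mics @ direction / SPEED_OF_SOUND  # true arrival-time offsets
    noisy = np.stack([np.sin(2 * np.pi * 440 * (t + tau))
                      + 0.3 * np.random.randn(t.size) for tau in taus])
    enhanced = delay_and_sum(noisy, mics, direction, rate)
    reference = np.sin(2 * np.pi * 440 * (t + taus.min()))
    print("per-mic noise std ~0.30, after 4-mic delay-and-sum: %.2f"
          % np.std(enhanced - reference))

In the system the paper describes, such a front end would be followed by per-speaker separation, low-latency recognition, and activity detection; this sketch stops at enhancing a single look direction.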

Original language: English
Pages (from-to): 499-513
Number of pages: 15
Journal: IEEE Transactions on Audio, Speech and Language Processing
Volume: 20
Issue number: 2
DOIs: 10.1109/TASL.2011.2164527
Publication status: Published - 2012
Externally published: Yes

Keywords

  • Distant microphones
  • meeting analysis
  • speaker diarization
  • speech enhancement
  • speech recognition
  • topic tracking

ASJC Scopus subject areas

  • Acoustics and Ultrasonics
  • Electrical and Electronic Engineering

Cite this

Low-Latency Real-Time Meeting Recognition and Understanding Using Distant Microphones and Omni-Directional Camera. / Hori, Takaaki; Araki, Shoko; Yoshioka, Takuya; Fujimoto, Masakiyo; Watanabe, Shinji; Oba, Takanobu; Ogawa, Atsunori; Kinoshita, Keisuke; Nakatani, Tomohiro; Nakamura, Atsushi; Otsuka, Kazuhiro; Mikami, Dan; Yamato, Junji.

In: IEEE Transactions on Audio, Speech and Language Processing, Vol. 20, No. 2, 2012, pp. 499-513.

Research output: Contribution to journal › Article

Hori, T, Araki, S, Yoshioka, T, Fujimoto, M, Watanabe, S, Oba, T, Ogawa, A, Kinoshita, K, Nakatani, T, Nakamura, A, Otsuka, K, Mikami, D & Yamato, J 2012, 'Low-Latency Real-Time Meeting Recognition and Understanding Using Distant Microphones and Omni-Directional Camera', IEEE Transactions on Audio, Speech and Language Processing, vol. 20, no. 2, pp. 499-513. https://doi.org/10.1109/TASL.2011.2164527
Hori, Takaaki ; Araki, Shoko ; Yoshioka, Takuya ; Fujimoto, Masakiyo ; Watanabe, Shinji ; Oba, Takanobu ; Ogawa, Atsunori ; Kinoshita, Keisuke ; Nakatani, Tomohiro ; Nakamura, Atsushi ; Otsuka, Kazuhiro ; Mikami, Dan ; Yamato, Junji. / Low-Latency Real-Time Meeting Recognition and Understanding Using Distant Microphones and Omni-Directional Camera. In: IEEE Transactions on Audio, Speech and Language Processing. 2012 ; Vol. 20, No. 2. pp. 499-513.
@article{0151f987b1a04c8791745d594755d2a8,
title = "Low-Latency Real-Time Meeting Recognition and Understanding Using Distant Microphones and Omni-Directional Camera",
abstract = "This paper presents our real-time meeting analyzer for monitoring conversations in an ongoing group meeting. The goal of the system is to recognize automatically “who is speaking what” in an online manner for meeting assistance. Our system continuously captures the utterances and face poses of each speaker using a microphone array and an omni-directional camera positioned at the center of the meeting table. Through a series of advanced audio processing operations, an overlapping speech signal is enhanced and the components are separated into individual speakers' channels. Then the utterances are sequentially transcribed by our speech recognizer with low latency. In parallel with speech recognition, the activity of each participant (e.g., speaking, laughing, watching someone) and the circumstances of the meeting (e.g., topic, activeness, casualness) are detected and displayed on a browser together with the transcripts. In this paper, we describe our techniques and our attempt to achieve the low-latency monitoring of meetings, and we show our experimental results for real-time meeting transcription.",
keywords = "Distant microphones, meeting analysis, speaker diarization, speech enhancement, speech recognition, topic tracking",
author = "Takaaki Hori and Shoko Araki and Takuya Yoshioka and Masakiyo Fujimoto and Shinji Watanabe and Takanobu Oba and Atsunori Ogawa and Keisuke Kinoshita and Tomohiro Nakatani and Atsushi Nakamura and Kazuhiro Otsuka and Dan Mikami and Junji Yamato",
year = "2012",
doi = "10.1109/TASL.2011.2164527",
language = "English",
volume = "20",
pages = "499--513",
journal = "IEEE Transactions on Audio, Speech and Language Processing",
issn = "1558-7916",
publisher = "Institute of Electrical and Electronics Engineers Inc.",
number = "2",

}

TY - JOUR

T1 - Low-Latency Real-Time Meeting Recognition and Understanding Using Distant Microphones and Omni-Directional Camera

AU - Hori, Takaaki

AU - Araki, Shoko

AU - Yoshioka, Takuya

AU - Fujimoto, Masakiyo

AU - Watanabe, Shinji

AU - Oba, Takanobu

AU - Ogawa, Atsunori

AU - Kinoshita, Keisuke

AU - Nakatani, Tomohiro

AU - Nakamura, Atsushi

AU - Otsuka, Kazuhiro

AU - Mikami, Dan

AU - Yamato, Junji

PY - 2012

Y1 - 2012

N2 - This paper presents our real-time meeting analyzer for monitoring conversations in an ongoing group meeting. The goal of the system is to recognize automatically “who is speaking what” in an online manner for meeting assistance. Our system continuously captures the utterances and face poses of each speaker using a microphone array and an omni-directional camera positioned at the center of the meeting table. Through a series of advanced audio processing operations, an overlapping speech signal is enhanced and the components are separated into individual speakers' channels. Then the utterances are sequentially transcribed by our speech recognizer with low latency. In parallel with speech recognition, the activity of each participant (e.g., speaking, laughing, watching someone) and the circumstances of the meeting (e.g., topic, activeness, casualness) are detected and displayed on a browser together with the transcripts. In this paper, we describe our techniques and our attempt to achieve the low-latency monitoring of meetings, and we show our experimental results for real-time meeting transcription.

AB - This paper presents our real-time meeting analyzer for monitoring conversations in an ongoing group meeting. The goal of the system is to recognize automatically “who is speaking what” in an online manner for meeting assistance. Our system continuously captures the utterances and face poses of each speaker using a microphone array and an omni-directional camera positioned at the center of the meeting table. Through a series of advanced audio processing operations, an overlapping speech signal is enhanced and the components are separated into individual speakers' channels. Then the utterances are sequentially transcribed by our speech recognizer with low latency. In parallel with speech recognition, the activity of each participant (e.g., speaking, laughing, watching someone) and the circumstances of the meeting (e.g., topic, activeness, casualness) are detected and displayed on a browser together with the transcripts. In this paper, we describe our techniques and our attempt to achieve the low-latency monitoring of meetings, and we show our experimental results for real-time meeting transcription.

KW - Distant microphones

KW - meeting analysis

KW - speaker diarization

KW - speech enhancement

KW - speech recognition

KW - topic tracking

UR - http://www.scopus.com/inward/record.url?scp=85008590333&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85008590333&partnerID=8YFLogxK

U2 - 10.1109/TASL.2011.2164527

DO - 10.1109/TASL.2011.2164527

M3 - Article

AN - SCOPUS:85008590333

VL - 20

SP - 499

EP - 513

JO - IEEE Transactions on Audio, Speech and Language Processing

JF - IEEE Transactions on Audio, Speech and Language Processing

SN - 1558-7916

IS - 2

ER -