Real-time meeting recognition and understanding using distant microphones and omni-directional camera

Takaaki Hori, Shoko Araki, Takuya Yoshioka, Masakiyo Fujimoto, Shinji Watanabe, Takanobu Oba, Atsunori Ogawa, Kazuhiro Otsuka, Dan Mikami, Keisuke Kinoshita, Tomohiro Nakatani, Atsushi Nakamura, Junji Yamato

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

9 Citations (Scopus)

Abstract

This paper presents our newly developed real-time meeting analyzer for monitoring conversations in an ongoing group meeting. The goal of the system is to automatically recognize "who is speaking what" in an online manner for meeting assistance. Our system continuously captures the utterances and the face pose of each speaker using a distant microphone array and an omni-directional camera at the center of the meeting table. Through a series of advanced audio processing operations, an overlapping speech signal is enhanced and the components are separated into individual speaker's channels. Then the utterances are sequentially transcribed by our speech recognizer with low latency. In parallel with speech recognition, the activity of each participant (e.g. speaking, laughing, watching someone) and the situation of the meeting (e.g. topic, activeness, casualness) are detected and displayed on a browser together with the transcripts. In this paper, we describe our techniques and our attempt to achieve the low-latency monitoring of meetings, and we show our experimental results for real-time meeting transcription.
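
As a concrete illustration of the pipeline the abstract describes (capture, separation of overlapped speech into per-speaker channels, low-latency transcription, and parallel activity detection feeding a shared display), a minimal Python sketch follows. It is not the authors' implementation; every name in it (AudioFrame, SpeakerSegment, separate_speakers, transcribe_incremental, detect_activity) is a hypothetical placeholder for the corresponding component.

# Hypothetical sketch of a low-latency meeting-analysis pipeline (not the authors' code).
import queue
import threading
from dataclasses import dataclass

@dataclass
class AudioFrame:
    timestamp: float
    samples: list   # one short block of multichannel samples from the distant mic array

@dataclass
class SpeakerSegment:
    speaker_id: int
    timestamp: float
    samples: list   # enhanced, single-speaker audio after separation

def separate_speakers(frame):
    # Placeholder for speech enhancement + separation of overlapped speech per speaker.
    return [SpeakerSegment(speaker_id=0, timestamp=frame.timestamp, samples=frame.samples)]

def transcribe_incremental(segment):
    # Placeholder for a streaming recognizer that emits partial hypotheses with low latency.
    return "[partial hypothesis for speaker %d]" % segment.speaker_id

def detect_activity(segment):
    # Placeholder for participant-activity detection (speaking, laughing, watching someone).
    return "speaking"

def asr_worker(segments, display):
    # Recognition and activity detection run in parallel with capture and separation.
    while True:
        seg = segments.get()
        if seg is None:
            break
        display.put({"speaker": seg.speaker_id,
                     "time": seg.timestamp,
                     "text": transcribe_incremental(seg),
                     "activity": detect_activity(seg)})

def run_pipeline(frames):
    segments = queue.Queue()
    display = queue.Queue()   # stands in for the browser-based meeting view
    worker = threading.Thread(target=asr_worker, args=(segments, display))
    worker.start()
    for frame in frames:                       # capture loop (camera/face pose omitted)
        for seg in separate_speakers(frame):   # split overlapped speech per speaker
            segments.put(seg)
    segments.put(None)                         # end of meeting
    worker.join()
    while not display.empty():
        print(display.get())

if __name__ == "__main__":
    run_pipeline([AudioFrame(timestamp=t * 0.1, samples=[0.0] * 160) for t in range(3)])

In the real system the separation, recognizer, and activity detector are far more elaborate (microphone-array enhancement, online ASR, visual face-pose tracking from the omni-directional camera), but the queue-and-worker structure shows how partial results can reach the display while the meeting is still in progress.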

Original language: English
Title of host publication: 2010 IEEE Workshop on Spoken Language Technology, SLT 2010 - Proceedings
Pages: 424-429
Number of pages: 6
DOIs: https://doi.org/10.1109/SLT.2010.5700890
Publication status: Published - 2010
Externally published: Yes
Event: 2010 IEEE Workshop on Spoken Language Technology, SLT 2010 - Berkeley, CA, United States
Duration: 2010 Dec 12 - 2010 Dec 15

Other

Other: 2010 IEEE Workshop on Spoken Language Technology, SLT 2010
Country: United States
City: Berkeley, CA
Period: 10/12/12 - 10/12/15

Keywords

  • Distant microphones
  • Meeting analysis
  • Speaker diarization
  • Speech enhancement
  • Speech recognition
  • Topic tracking

ASJC Scopus subject areas

  • Language and Linguistics

Cite this

Hori, T., Araki, S., Yoshioka, T., Fujimoto, M., Watanabe, S., Oba, T., ... Yamato, J. (2010). Real-time meeting recognition and understanding using distant microphones and omni-directional camera. In 2010 IEEE Workshop on Spoken Language Technology, SLT 2010 - Proceedings (pp. 424-429). [5700890] https://doi.org/10.1109/SLT.2010.5700890

@inproceedings{ac929446e16242388ab9dcce2b7d4e7c,
title = "Real-time meeting recognition and understanding using distant microphones and omni-directional camera",
abstract = "This paper presents our newly developed real-time meeting analyzer for monitoring conversations in an ongoing group meeting. The goal of the system is to automatically recognize {"}who is speaking what{"} in an online manner for meeting assistance. Our system continuously captures the utterances and the face pose of each speaker using a distant microphone array and an omni-directional camera at the center of the meeting table. Through a series of advanced audio processing operations, an overlapping speech signal is enhanced and the components are separated into individual speaker's channels. Then the utterances are sequentially transcribed by our speech recognizer with low latency. In parallel with speech recognition, the activity of each participant (e.g. speaking, laughing, watching someone) and the situation of the meeting (e.g. topic, activeness, casualness) are detected and displayed on a browser together with the transcripts. In this paper, we describe our techniques and our attempt to achieve the low-latency monitoring of meetings, and we show our experimental results for real-time meeting transcription.",
keywords = "Distant microphones, Meeting analysis, Speaker diarization, Speech enhancement, Speech recognition, Topic tracking",
author = "Takaaki Hori and Shoko Araki and Takuya Yoshioka and Masakiyo Fujimoto and Shinji Watanabe and Takanobu Oba and Atsunori Ogawa and Kazuhiro Otsuka and Dan Mikami and Keisuke Kinoshita and Tomohiro Nakatani and Atsushi Nakamura and Junji Yamato",
year = "2010",
doi = "10.1109/SLT.2010.5700890",
language = "English",
isbn = "9781424479030",
pages = "424--429",
booktitle = "2010 IEEE Workshop on Spoken Language Technology, SLT 2010 - Proceedings",

}

TY - GEN
T1 - Real-time meeting recognition and understanding using distant microphones and omni-directional camera
AU - Hori, Takaaki
AU - Araki, Shoko
AU - Yoshioka, Takuya
AU - Fujimoto, Masakiyo
AU - Watanabe, Shinji
AU - Oba, Takanobu
AU - Ogawa, Atsunori
AU - Otsuka, Kazuhiro
AU - Mikami, Dan
AU - Kinoshita, Keisuke
AU - Nakatani, Tomohiro
AU - Nakamura, Atsushi
AU - Yamato, Junji
PY - 2010
Y1 - 2010
N2 - This paper presents our newly developed real-time meeting analyzer for monitoring conversations in an ongoing group meeting. The goal of the system is to automatically recognize "who is speaking what" in an online manner for meeting assistance. Our system continuously captures the utterances and the face pose of each speaker using a distant microphone array and an omni-directional camera at the center of the meeting table. Through a series of advanced audio processing operations, an overlapping speech signal is enhanced and the components are separated into individual speaker's channels. Then the utterances are sequentially transcribed by our speech recognizer with low latency. In parallel with speech recognition, the activity of each participant (e.g. speaking, laughing, watching someone) and the situation of the meeting (e.g. topic, activeness, casualness) are detected and displayed on a browser together with the transcripts. In this paper, we describe our techniques and our attempt to achieve the low-latency monitoring of meetings, and we show our experimental results for real-time meeting transcription.
AB - This paper presents our newly developed real-time meeting analyzer for monitoring conversations in an ongoing group meeting. The goal of the system is to automatically recognize "who is speaking what" in an online manner for meeting assistance. Our system continuously captures the utterances and the face pose of each speaker using a distant microphone array and an omni-directional camera at the center of the meeting table. Through a series of advanced audio processing operations, an overlapping speech signal is enhanced and the components are separated into individual speaker's channels. Then the utterances are sequentially transcribed by our speech recognizer with low latency. In parallel with speech recognition, the activity of each participant (e.g. speaking, laughing, watching someone) and the situation of the meeting (e.g. topic, activeness, casualness) are detected and displayed on a browser together with the transcripts. In this paper, we describe our techniques and our attempt to achieve the low-latency monitoring of meetings, and we show our experimental results for real-time meeting transcription.
KW - Distant microphones
KW - Meeting analysis
KW - Speaker diarization
KW - Speech enhancement
KW - Speech recognition
KW - Topic tracking
UR - http://www.scopus.com/inward/record.url?scp=79951797950&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=79951797950&partnerID=8YFLogxK
U2 - 10.1109/SLT.2010.5700890
DO - 10.1109/SLT.2010.5700890
M3 - Conference contribution
AN - SCOPUS:79951797950
SN - 9781424479030
SP - 424
EP - 429
BT - 2010 IEEE Workshop on Spoken Language Technology, SLT 2010 - Proceedings
ER -