Online meeting recognizer with multichannel speaker diarization

Shoko Araki, Takaaki Hori, Masakiyo Fujimoto, Shinji Watanabe, Takuya Yoshioka, Tomohiro Nakatani, Atsushi Nakamura

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

10 Citations (Scopus)

Abstract

We present our newly developed real-time conversation analyzer for group meetings. The goal of the system is to estimate automatically "who speaks when and what" in an online manner. In our system, "who speaks when" information is first obtained by estimating the directions of arrival (DOAs) of signals. Then, "who speaks what" is estimated with our automatic speech recognition (ASR) system, after suppressing reverberation, background noise, and interference speakers' voices. In this paper, we focus particularly on the speaker diarization ("who speaks when" estimation) method, and we show that the speaker diarization information helps the ASR to reduce insertion errors.

Original language: English
Title of host publication: Conference Record of the 44th Asilomar Conference on Signals, Systems and Computers, Asilomar 2010
Pages: 1697-1701
Number of pages: 5
DOIs: https://doi.org/10.1109/ACSSC.2010.5757829
Publication status: Published - 2010
Externally published: Yes
Event: 44th Asilomar Conference on Signals, Systems and Computers, Asilomar 2010 - Pacific Grove, CA, United States
Duration: 2010 Nov 7 - 2010 Nov 10

Other

Other: 44th Asilomar Conference on Signals, Systems and Computers, Asilomar 2010
Country: United States
City: Pacific Grove, CA
Period: 10/11/7 - 10/11/10

ASJC Scopus subject areas

  • Computer Networks and Communications
  • Signal Processing

Cite this

Araki, S., Hori, T., Fujimoto, M., Watanabe, S., Yoshioka, T., Nakatani, T., & Nakamura, A. (2010). Online meeting recognizer with multichannel speaker diarization. In Conference Record of the 44th Asilomar Conference on Signals, Systems and Computers, Asilomar 2010 (pp. 1697-1701). [5757829] https://doi.org/10.1109/ACSSC.2010.5757829