Online meeting recognizer with multichannel speaker diarization

Shoko Araki, Takaaki Hori, Masakiyo Fujimoto, Shinji Watanabe, Takuya Yoshioka, Tomohiro Nakatani, Atsushi Nakamura

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

10 Citations (Scopus)

Abstract

We present our newly developed real-time conversation analyzer for group meetings. The goal of the system is to estimate automatically "who speaks when and what" in an online manner. In our system, "who speaks when" information is first obtained by estimating the directions of arrival (DOAs) of signals. Then, "who speaks what" is estimated with our automatic speech recognition (ASR) system, after suppressing reverberation, background noise, and interference speakers' voices. In this paper, we focus particularly on the speaker diarization ("who speaks when" estimation) method, and we show that the speaker diarization information helps the ASR to reduce insertion errors.
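The "who speaks when" stage described above — estimating a direction of arrival per frame from a microphone pair and grouping frames by direction — can be sketched with a GCC-PHAT delay estimator. This is an illustrative toy under assumed conditions (two microphones, a fixed frame length, anechoic toy signals), not the authors' actual implementation, which handles reverberation, noise, and overlapping speakers:

```python
import numpy as np

def gcc_phat_delay(x, y, max_delay):
    """Estimate the inter-channel delay (in samples) between x and y
    using the GCC-PHAT cross-correlation."""
    n = len(x) + len(y)
    X = np.fft.rfft(x, n=n)
    Y = np.fft.rfft(y, n=n)
    cross = X * np.conj(Y)
    cross /= np.abs(cross) + 1e-12        # phase transform: keep phase only
    cc = np.fft.irfft(cross, n=n)
    # Restrict the search to lags in [-max_delay, +max_delay].
    cc = np.concatenate((cc[-max_delay:], cc[:max_delay + 1]))
    return int(np.argmax(np.abs(cc))) - max_delay

def diarize(ch0, ch1, frame_len, max_delay):
    """Label each frame with its estimated delay; frames sharing a delay
    correspond to the same arrival direction, i.e. the same speaker."""
    labels = []
    for start in range(0, len(ch0) - frame_len + 1, frame_len):
        d = gcc_phat_delay(ch0[start:start + frame_len],
                           ch1[start:start + frame_len], max_delay)
        labels.append(d)
    return labels

# Toy scene: speaker A reaches mic 1 three samples late, speaker B
# reaches mic 0 three samples late (opposite directions).
rng = np.random.default_rng(0)
src_a = rng.standard_normal(4000)
src_b = rng.standard_normal(4000)
ch0 = np.concatenate((src_a, np.roll(src_b, 3)))
ch1 = np.concatenate((np.roll(src_a, 3), src_b))
print(diarize(ch0, ch1, frame_len=1000, max_delay=8))
```

The printed labels cluster into two constant runs, one per speaker direction; a real system would map such delay clusters to speaker identities and hand the segmentation to the ASR front end.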

Original language: English
Title of host publication: Conference Record of the 44th Asilomar Conference on Signals, Systems and Computers, Asilomar 2010
Pages: 1697-1701
Number of pages: 5
ISBN (Print): 9781424497218
DOIs: https://doi.org/10.1109/ACSSC.2010.5757829
Publication status: Published - 2010
Externally published: Yes
Event: 44th Asilomar Conference on Signals, Systems and Computers, Asilomar 2010 - Pacific Grove, CA, United States
Duration: 2010 Nov 7 - 2010 Nov 10

Other

Other: 44th Asilomar Conference on Signals, Systems and Computers, Asilomar 2010
Country: United States
City: Pacific Grove, CA
Period: 10/11/7 - 10/11/10

Fingerprint

  • Speech recognition
  • Reverberation
  • Direction of arrival
  • Acoustic noise

ASJC Scopus subject areas

  • Computer Networks and Communications
  • Signal Processing

Cite this

Araki, S., Hori, T., Fujimoto, M., Watanabe, S., Yoshioka, T., Nakatani, T., & Nakamura, A. (2010). Online meeting recognizer with multichannel speaker diarization. In Conference Record of the 44th Asilomar Conference on Signals, Systems and Computers, Asilomar 2010 (pp. 1697-1701). [5757829] https://doi.org/10.1109/ACSSC.2010.5757829

