Online End-To-End Neural Diarization with Speaker-Tracing Buffer

Yawen Xue, Shota Horiguchi, Yusuke Fujita, Shinji Watanabe, Paola Garcia, Kenji Nagamatsu

研究成果: Conference contribution

1 被引用数 (Scopus)

抄録

This paper proposes a novel online speaker diarization algorithm based on a fully supervised self-attention mechanism (SA-EEND). Online diarization inherently presents a speaker's permutation problem due to the possibility to assign speaker regions incorrectly across the recording. To circumvent this inconsistency, we proposed a speaker-tracing buffer mechanism that selects several input frames representing the speaker permutation information from previous chunks and stores them in a buffer. These buffered frames are stacked with the input frames in the current chunk and fed into a self-attention network. Our method ensures consistent diarization outputs across the buffer and the current chunk by checking the correlation between their corresponding outputs. Additionally, we trained SA-EEND with variable chunk-sizes to mitigate the mismatch between training and inference introduced by the speaker-tracing buffer mechanism. Experimental results, including online SA-EEND and variable chunk-size, achieved DERs of 12.54 % for CALLHOME and 20.77 % for CSJ with 1.4 s actual latency.

本文言語English
ホスト出版物のタイトル2021 IEEE Spoken Language Technology Workshop, SLT 2021 - Proceedings
出版社Institute of Electrical and Electronics Engineers Inc.
ページ841-848
ページ数8
ISBN(電子版)9781728170664
DOI
出版ステータスPublished - 2021 1 19
イベント2021 IEEE Spoken Language Technology Workshop, SLT 2021 - Virtual, Shenzhen, China
継続期間: 2021 1 192021 1 22

出版物シリーズ

名前2021 IEEE Spoken Language Technology Workshop, SLT 2021 - Proceedings

Conference

Conference2021 IEEE Spoken Language Technology Workshop, SLT 2021
CountryChina
CityVirtual, Shenzhen
Period21/1/1921/1/22

ASJC Scopus subject areas

  • Linguistics and Language
  • Language and Linguistics
  • Artificial Intelligence
  • Computer Science Applications
  • Computer Vision and Pattern Recognition
  • Hardware and Architecture

フィンガープリント 「Online End-To-End Neural Diarization with Speaker-Tracing Buffer」の研究トピックを掘り下げます。これらがまとまってユニークなフィンガープリントを構成します。

引用スタイル