Online streaming end-to-end neural diarization handling overlapping speech and flexible numbers of speakers

Yawen Xue*, Shota Horiguchi, Yusuke Fujita, Yuki Takashima, Shinji Watanabe, Paola García, Kenji Nagamatsu

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceedingConference contribution

1 Citation (Scopus)

Abstract

We propose a streaming diarization method based on an end-to-end neural diarization (EEND) model, which handles flexible numbers of speakers and overlapping speech. In our previous study, the speaker-tracing buffer (STB) mechanism was proposed to achieve a chunk-wise streaming diarization using a pre-trained EEND model. STB traces the speaker information in previous chunks to map the speakers in a new chunk. However, it only worked with two-speaker recordings. In this paper, we propose an extended STB for flexible numbers of speakers, FLEX-STB. The proposed method uses a zero-padding followed by speaker-tracing, which alleviates the difference in the number of speakers between a buffer and a current chunk. We also examine buffer update strategies to select important frames for tracing multiple speakers. Experiments on CALLHOME and DIHARD II datasets show that the proposed method achieves comparable performance to the offline EEND method with 1-second latency. The results also show that our proposed method outperforms recently proposed chunk-wise diarization methods based on EEND (BW-EDA-EEND).

Original languageEnglish
Title of host publication22nd Annual Conference of the International Speech Communication Association, INTERSPEECH 2021
PublisherInternational Speech Communication Association
Pages2483-2487
Number of pages5
ISBN (Electronic)9781713836902
DOIs
Publication statusPublished - 2021
Externally publishedYes
Event22nd Annual Conference of the International Speech Communication Association, INTERSPEECH 2021 - Brno, Czech Republic
Duration: 2021 Aug 302021 Sep 3

Publication series

NameProceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
Volume4
ISSN (Print)2308-457X
ISSN (Electronic)1990-9772

Conference

Conference22nd Annual Conference of the International Speech Communication Association, INTERSPEECH 2021
Country/TerritoryCzech Republic
CityBrno
Period21/8/3021/9/3

Keywords

  • EEND
  • Flexible numbers of speakers
  • Online speaker diarization
  • Overlapping speech

ASJC Scopus subject areas

  • Language and Linguistics
  • Human-Computer Interaction
  • Signal Processing
  • Software
  • Modelling and Simulation

Fingerprint

Dive into the research topics of 'Online streaming end-to-end neural diarization handling overlapping speech and flexible numbers of speakers'. Together they form a unique fingerprint.

Cite this