End-to-End Neural Speaker Diarization with Self-Attention

Yusuke Fujita, Naoyuki Kanda, Shota Horiguchi, Yawen Xue, Kenji Nagamatsu, Shinji Watanabe

Research output: Conference contribution

37 citations (Scopus)

Abstract

Speaker diarization has mainly been developed based on the clustering of speaker embeddings. However, the clustering-based approach has two major problems: (i) it is not optimized to minimize diarization errors directly, and (ii) it cannot handle speaker overlaps correctly. To solve these problems, End-to-End Neural Diarization (EEND), in which a bidirectional long short-term memory (BLSTM) network directly outputs speaker diarization results given a multi-talker recording, was recently proposed. In this study, we enhance EEND by introducing self-attention blocks instead of BLSTM blocks. In contrast to BLSTM, which is conditioned only on its previous and next hidden states, self-attention is directly conditioned on all the other frames, making it much more suitable for dealing with the speaker diarization problem. We evaluated our proposed method on simulated mixtures, real telephone calls, and real dialogue recordings. The experimental results revealed that self-attention was the key to achieving good performance and that our proposed method performed significantly better than the conventional BLSTM-based method. Our method even outperformed the state-of-the-art x-vector clustering-based method. Finally, by visualizing the latent representation, we show that self-attention can capture global speaker characteristics in addition to local speech activity dynamics. Our source code is available online at https://github.com/hitachi-speech/EEND.
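
The abstract's central contrast (a BLSTM passes state only between neighbouring frames, whereas self-attention conditions every frame on all other frames) can be illustrated with a small sketch. It is a toy under stated assumptions, not the authors' EEND model (see the repository linked above for that): it assumes a single attention head, random untrained weights, toy sizes, and a 0.5 decision threshold, and it assumes per-speaker sigmoid outputs so that more than one speaker can be active in a frame. All helper names (`self_attention`, `speaker_activities`, `rand_matrix`, `matmul`) are hypothetical.

```python
import math
import random


def matmul(a, b):
    """Multiply an (n x k) matrix by a (k x m) matrix, both lists of lists."""
    return [[sum(a[i][p] * b[p][j] for p in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def softmax(row):
    """Numerically stable softmax over one row of scores."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(x, wq, wk, wv):
    """Single-head scaled dot-product self-attention over T frames.

    Each output frame is a weighted sum over *all* T input frames,
    unlike a BLSTM whose recurrent state only moves between adjacent frames.
    Returns (outputs, attention_weights) with weights of shape T x T.
    """
    q, k, v = matmul(x, wq), matmul(x, wk), matmul(x, wv)
    d = len(wq[0])
    k_t = [list(col) for col in zip(*k)]      # transpose K: d x T
    scores = matmul(q, k_t)                   # T x T frame-to-frame scores
    weights = [softmax([s / math.sqrt(d) for s in row]) for row in scores]
    return matmul(weights, v), weights


def speaker_activities(h, w_out):
    """Hypothetical output layer: per-frame, per-speaker sigmoid posteriors.

    Independent sigmoids (rather than a softmax over speakers) let several
    speakers be active in the same frame, i.e. overlapping speech.
    """
    logits = matmul(h, w_out)
    return [[1.0 / (1.0 + math.exp(-z)) for z in row] for row in logits]


def rand_matrix(rows, cols, rng):
    """Random Gaussian weights standing in for trained parameters."""
    return [[rng.gauss(0.0, 0.5) for _ in range(cols)] for _ in range(rows)]


if __name__ == "__main__":
    rng = random.Random(0)
    T, D, S = 6, 4, 2  # frames, feature dimension, speakers (toy sizes)
    x = rand_matrix(T, D, rng)
    wq, wk, wv = (rand_matrix(D, D, rng) for _ in range(3))
    w_out = rand_matrix(D, S, rng)
    h, attn = self_attention(x, wq, wk, wv)
    probs = speaker_activities(h, w_out)
    # Assumed 0.5 threshold turns posteriors into speech/non-speech decisions.
    decisions = [[int(p > 0.5) for p in row] for row in probs]
    print(decisions)
```

Each row of `attn` is a probability distribution over all T frames, which is the property the abstract credits for capturing global speaker characteristics; in a trained model these weights would be learned rather than sampled.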

Original language: English
Host publication title: 2019 IEEE Automatic Speech Recognition and Understanding Workshop, ASRU 2019 - Proceedings
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 296-303
Number of pages: 8
ISBN (electronic): 9781728103068
Publication status: Published - Dec 2019
Externally published: Yes
Event: 2019 IEEE Automatic Speech Recognition and Understanding Workshop, ASRU 2019 - Singapore, Singapore
Duration: 15 Dec 2019 to 18 Dec 2019

Publication series

Name: 2019 IEEE Automatic Speech Recognition and Understanding Workshop, ASRU 2019 - Proceedings

Conference

Conference: 2019 IEEE Automatic Speech Recognition and Understanding Workshop, ASRU 2019
Country/Territory: Singapore
City: Singapore
Period: 15 Dec 2019 to 18 Dec 2019

ASJC Scopus subject areas

  • Computer Networks and Communications
  • Signal Processing
  • Linguistics and Language
  • Communication
