TY - GEN
T1 - Semi-supervised training with pseudo-labeling for end-to-end neural diarization
AU - Takashima, Yuki
AU - Fujita, Yusuke
AU - Horiguchi, Shota
AU - Watanabe, Shinji
AU - García, Paola
AU - Nagamatsu, Kenji
N1 - Funding Information:
We thank Desh Raj, Zili Huang, Sanjeev Khudanpur, Nelson Yalta, and Yawen Xue for their contributions to the evaluation on DIHARD III.
Publisher Copyright:
Copyright © 2021 ISCA.
PY - 2021
Y1 - 2021
AB - In this paper, we present a semi-supervised training technique using pseudo-labeling for end-to-end neural diarization (EEND). The EEND system has shown promising performance compared with traditional clustering-based methods, especially on overlapping speech. However, to obtain a well-tuned model, EEND requires labels for the joint speech activities of every speaker at each time frame in a recording. In this paper, we explore a pseudo-labeling approach that leverages unlabeled data. First, we propose an iterative pseudo-labeling method for EEND that trains the model on unlabeled data from a target condition. We then propose a committee-based training method to further improve performance. To evaluate the proposed methods, we conduct model adaptation experiments using labeled and unlabeled data. Experimental results on the CALLHOME dataset show that our pseudo-labeling method achieves a 37.4% relative reduction in diarization error rate compared with a seed model. We further analyze the results of semi-supervised adaptation with pseudo-labeling and demonstrate the effectiveness of our approach on the Third DIHARD dataset.
KW - End-to-end neural diarization
KW - Pseudo-labeling
KW - Self-training
KW - Speaker diarization
UR - http://www.scopus.com/inward/record.url?scp=85119180123&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85119180123&partnerID=8YFLogxK
U2 - 10.21437/Interspeech.2021-384
DO - 10.21437/Interspeech.2021-384
M3 - Conference contribution
AN - SCOPUS:85119180123
T3 - Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
SP - 2498
EP - 2502
BT - 22nd Annual Conference of the International Speech Communication Association, INTERSPEECH 2021
PB - International Speech Communication Association
T2 - 22nd Annual Conference of the International Speech Communication Association, INTERSPEECH 2021
Y2 - 30 August 2021 through 3 September 2021
ER -
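
A minimal Python sketch of the two ideas named in the abstract (iterative pseudo-labeling on target-condition data, and committee-based filtering of pseudo-labels), assuming generic train/predict/agreement callables; all names here are hypothetical and do not come from the paper's implementation.

# Hypothetical sketch, not the authors' code; all interfaces are assumed.

def iterative_pseudo_labeling(seed_model, labeled, unlabeled,
                              train, predict, n_iters=3):
    """Repeatedly pseudo-label the unlabeled target-condition data with
    the current model, then retrain on labeled + pseudo-labeled data."""
    model = seed_model
    for _ in range(n_iters):
        pseudo = [(x, predict(model, x)) for x in unlabeled]  # pseudo-labels
        model = train(labeled + pseudo)                       # adapt model
    return model


def committee_filter(models, unlabeled, predict, agreement, threshold=0.8):
    """Keep only recordings whose pseudo-labels a committee of models
    agrees on; disagreement suggests the labels are unreliable."""
    kept = []
    for x in unlabeled:
        labels = [predict(m, x) for m in models]
        if agreement(labels) >= threshold:  # e.g. mean pairwise frame overlap
            kept.append((x, labels[0]))
    return kept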