TY - GEN
T1 - Dual-Path RNN for Long Recording Speech Separation
AU - Li, Chenda
AU - Luo, Yi
AU - Han, Cong
AU - Li, Jinyu
AU - Yoshioka, Takuya
AU - Zhou, Tianyan
AU - Delcroix, Marc
AU - Kinoshita, Keisuke
AU - Boeddeker, Christoph
AU - Qian, Yanmin
AU - Watanabe, Shinji
AU - Chen, Zhuo
N1 - Funding Information:
The work presented here was carried out during the 2020 Jelinek Memorial Summer Workshop on Speech and Language Technologies at Johns Hopkins University, which was supported with unrestricted gifts from Microsoft (Research and Azure), Amazon (Alexa and AWS), and Google. Chenda Li and Yanmin Qian are also supported by the China NSFC project (No. 62071288 and No. U1736202).
Publisher Copyright:
© 2021 IEEE.
PY - 2021/1/19
Y1 - 2021/1/19
N2 - Continuous speech separation (CSS) is an emerging task in speech separation that aims to separate overlap-free targets from a long, partially overlapped recording. A straightforward extension of previously proposed sentence-level separation models to this task is to segment the long recording into fixed-length blocks and perform separation on them independently. However, such a simple extension does not fully address the cross-block dependencies, and the separation performance may not be satisfactory. In this paper, we focus on how the block-level separation performance can be improved by exploring methods to utilize the cross-block information. Based on the recently proposed dual-path RNN (DPRNN) architecture, we investigate how DPRNN can help the block-level separation via its interleaved intra- and inter-block modules. Experimental results show that DPRNN is able to significantly outperform the baseline block-level model in both offline and block-online configurations under certain settings.
AB - Continuous speech separation (CSS) is an emerging task in speech separation that aims to separate overlap-free targets from a long, partially overlapped recording. A straightforward extension of previously proposed sentence-level separation models to this task is to segment the long recording into fixed-length blocks and perform separation on them independently. However, such a simple extension does not fully address the cross-block dependencies, and the separation performance may not be satisfactory. In this paper, we focus on how the block-level separation performance can be improved by exploring methods to utilize the cross-block information. Based on the recently proposed dual-path RNN (DPRNN) architecture, we investigate how DPRNN can help the block-level separation via its interleaved intra- and inter-block modules. Experimental results show that DPRNN is able to significantly outperform the baseline block-level model in both offline and block-online configurations under certain settings.
KW - Continuous speech separation
KW - dual-path RNN
KW - long recording speech separation
UR - http://www.scopus.com/inward/record.url?scp=85102374739&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85102374739&partnerID=8YFLogxK
U2 - 10.1109/SLT48900.2021.9383514
DO - 10.1109/SLT48900.2021.9383514
M3 - Conference contribution
AN - SCOPUS:85102374739
T3 - 2021 IEEE Spoken Language Technology Workshop, SLT 2021 - Proceedings
SP - 865
EP - 872
BT - 2021 IEEE Spoken Language Technology Workshop, SLT 2021 - Proceedings
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2021 IEEE Spoken Language Technology Workshop, SLT 2021
Y2 - 19 January 2021 through 22 January 2021
ER -