Dual-path modeling for long recording speech separation in meetings

Chenda Li, Zhuo Chen, Yi Luo, Cong Han, Tianyan Zhou, Keisuke Kinoshita, Marc Delcroix, Shinji Watanabe, Yanmin Qian

Research output: Contribution to journalConference articlepeer-review

Abstract

The continuous speech separation (CSS) is a task to separate the speech sources from a long, partially overlapped recording, which involves a varying number of speakers. A straightforward extension of conventional utterance-level speech separation to the CSS task is to segment the long recording with a size-fixed window and process each window separately. Though effective, this extension fails to model the long dependency in speech and thus leads to sub-optimum performance. The recent proposed dual-path modeling could be a remedy to this problem, thanks to its capability in jointly modeling the cross-window dependency and the local-window processing. In this work, we further extend the dual-path modeling framework for CSS task. A transformer-based dual-path system is proposed, which integrates transform layers for global modeling. The proposed models are applied to LibriCSS, a real recorded multi-talk dataset, and consistent WER reduction can be observed in the ASR evaluation for separated speech. Also, a dual-path transformer equipped with convolutional layers is proposed. It significantly reduces the computation amount by 30% with better WER evaluation. Furthermore, the online processing dual-path models are investigated, which shows 10% relative WER reduction compared to the baseline.

Original languageEnglish
Pages (from-to)5739-5743
Number of pages5
JournalICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
Volume2021-June
DOIs
Publication statusPublished - 2021
Externally publishedYes
Event2021 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2021 - Virtual, Toronto, Canada
Duration: 2021 Jun 62021 Jun 11

Keywords

  • Continuous speech separation
  • Dual-path modeling
  • Long recording speech separation
  • Online processing

ASJC Scopus subject areas

  • Software
  • Signal Processing
  • Electrical and Electronic Engineering

Cite this