Analysis of robustness of deep single-channel speech separation using corpora constructed from multiple domains

Matthew Maciejewski, Gregory Sell, Yusuke Fujita, Leibny Paola Garcia-Perera, Shinji Watanabe, Sanjeev Khudanpur

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

3 Citations (Scopus)

Abstract

Deep-learning based single-channel speech separation has been studied with great success, though evaluations have typically been limited to relatively controlled environments based on clean, near-field, and read speech. This work investigates the robustness of such representative techniques in more realistic environments with multiple and diverse conditions. To this end, we first construct datasets from the Mixer 6 and CHiME-5 corpora, featuring studio interviews and dinner parties respectively, using a procedure carefully designed to generate desirable synthetic overlap data sufficient for evaluation as well as for training deep learning models. Using these new datasets, we demonstrate the substantial shortcomings in mismatched conditions of these separation techniques. Though multi-condition training greatly mitigated the performance degradation in near-field conditions, one of the important findings is that both matched and multi-condition training have significant gaps from the oracle performance in far-field conditions, which advocates a need for extending existing separation techniques to deal with far-field/highly-reverberant speech mixtures.

Original language: English
Title of host publication: 2019 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, WASPAA 2019
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 165-169
Number of pages: 5
ISBN (Electronic): 9781728111230
Publication status: Published - 2019 Oct
Externally published: Yes
Event: 2019 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, WASPAA 2019 - New Paltz, United States
Duration: 2019 Oct 20 to 2019 Oct 23

Publication series

Name: IEEE Workshop on Applications of Signal Processing to Audio and Acoustics
Volume: 2019-October
ISSN (Print): 1931-1168
ISSN (Electronic): 1947-1629

Conference

Conference: 2019 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, WASPAA 2019
Country/Territory: United States
City: New Paltz
Period: 19/10/20 to 19/10/23

Keywords

  • deep learning
  • far-field speech
  • single-channel speech separation

ASJC Scopus subject areas

  • Electrical and Electronic Engineering
  • Computer Science Applications
