Closing the Gap between Time-Domain Multi-Channel Speech Enhancement on Real and Simulation Conditions

Wangyou Zhang, Jing Shi, Chenda Li, Shinji Watanabe, Yanmin Qian

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

6 Citations (Scopus)

Abstract

Deep-learning-based time-domain models, e.g., Conv-TasNet, have shown great potential in both single-channel and multi-channel speech enhancement. However, many experiments on time-domain speech enhancement models are conducted in simulated conditions, and it is not well studied whether the good performance generalizes to real-world scenarios. In this paper, we investigate applying multi-channel Conv-TasNet based speech enhancement to both simulated and real data. Our preliminary experiments show a large gap between the two conditions in terms of ASR performance. Several approaches are applied to close this gap, including integrating multi-channel Conv-TasNet into the beamforming model with various strategies, and jointly training the speech enhancement and speech recognition models. Our experiments on the CHiME-4 corpus show that the proposed approaches can greatly reduce the speech recognition performance discrepancy between simulated and real data, while preserving the strong speech enhancement capability of the frontend.

Original language: English
Title of host publication: 2021 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, WASPAA 2021
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 146-150
Number of pages: 5
ISBN (Electronic): 9781665448703
DOIs
Publication status: Published - 2021
Externally published: Yes
Event: 2021 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, WASPAA 2021 - New Paltz, United States
Duration: 2021 Oct 17 to 2021 Oct 20

Publication series

Name: IEEE Workshop on Applications of Signal Processing to Audio and Acoustics
Volume: 2021-October
ISSN (Print): 1931-1168
ISSN (Electronic): 1947-1629

Conference

Conference: 2021 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, WASPAA 2021
Country/Territory: United States
City: New Paltz
Period: 21/10/17 to 21/10/20

Keywords

  • automatic speech recognition
  • beamforming
  • multi-channel speech enhancement
  • time domain

ASJC Scopus subject areas

  • Electrical and Electronic Engineering
  • Computer Science Applications
