Closing the Gap between Time-Domain Multi-Channel Speech Enhancement on Real and Simulation Conditions

Wangyou Zhang, Jing Shi, Chenda Li, Shinji Watanabe, Yanmin Qian

Research output: Conference contribution

Abstract

The deep learning based time-domain models, e.g. Conv-TasNet, have shown great potential in both single-channel and multi-channel speech enhancement. However, many experiments on the time-domain speech enhancement model are done in simulated conditions, and it is not well studied whether the good performance can generalize to real-world scenarios. In this paper, we aim to provide an insightful investigation of applying multi-channel Conv-TasNet based speech enhancement to both simulation and real data. Our preliminary experiments show a large performance gap between the two conditions in terms of the ASR performance. Several approaches are applied to close this gap, including the integration of multi-channel Conv-TasNet into the beamforming model with various strategies, and the joint training of speech enhancement and speech recognition models. Our experiments on the CHiME-4 corpus show that our proposed approaches can greatly reduce the speech recognition performance discrepancy between simulation and real data, while preserving the strong speech enhancement capability in the frontend.

Original language: English
Title of host publication: 2021 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, WASPAA 2021
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 146-150
Number of pages: 5
ISBN (Electronic): 9781665448703
DOI
Publication status: Published - 2021
Externally published: Yes
Event: 2021 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, WASPAA 2021 - New Paltz, United States
Duration: 17 Oct 2021 to 20 Oct 2021

Publication series

Name: IEEE Workshop on Applications of Signal Processing to Audio and Acoustics
2021-October
ISSN (Print): 1931-1168
ISSN (Electronic): 1947-1629

Conference

Conference: 2021 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, WASPAA 2021
Country/Territory: United States
City: New Paltz
Period: 21/10/17 to 21/10/20

ASJC Scopus subject areas

  • Electrical and Electronic Engineering
  • Computer Science Applications
