ESPnet-SE: End-To-End Speech Enhancement and Separation Toolkit Designed for ASR Integration

Chenda Li, Jing Shi, Wangyou Zhang, Aswin Shanmugam Subramanian, Xuankai Chang, Naoyuki Kamo, Moto Hira, Tomoki Hayashi, Christoph Boeddeker, Zhuo Chen, Shinji Watanabe

Research output: Chapter in Book/Report/Conference proceedingConference contribution

Abstract

We present ESPnet-SE, which is designed for the quick development of speech enhancement and speech separation systems in a single framework, along with the optional downstream speech recognition module. ESPnet-SE is a new project which integrates rich automatic speech recognition related models, resources and systems to support and validate the proposed front-end implementation (i.e. speech enhancement and separation). It is capable of processing both single-channel and multi-channel data, with various functionalities including dereverberation, denoising and source separation. We provide all-in-one recipes including data pre-processing, feature extraction, training and evaluation pipelines for a wide range of benchmark datasets. This paper describes the design of the toolkit, several important functionalities, especially the speech recognition integration, which differentiates ESPnet-SE from other open source toolkits, and experimental results with major benchmark datasets.

Original languageEnglish
Title of host publication2021 IEEE Spoken Language Technology Workshop, SLT 2021 - Proceedings
PublisherInstitute of Electrical and Electronics Engineers Inc.
Pages785-792
Number of pages8
ISBN (Electronic)9781728170664
DOIs
Publication statusPublished - 2021 Jan 19
Event2021 IEEE Spoken Language Technology Workshop, SLT 2021 - Virtual, Shenzhen, China
Duration: 2021 Jan 192021 Jan 22

Publication series

Name2021 IEEE Spoken Language Technology Workshop, SLT 2021 - Proceedings

Conference

Conference2021 IEEE Spoken Language Technology Workshop, SLT 2021
CountryChina
CityVirtual, Shenzhen
Period21/1/1921/1/22

Keywords

  • end-to-end
  • Open-source
  • source separation
  • speech enhancement
  • speech recognition

ASJC Scopus subject areas

  • Linguistics and Language
  • Language and Linguistics
  • Artificial Intelligence
  • Computer Science Applications
  • Computer Vision and Pattern Recognition
  • Hardware and Architecture

Fingerprint Dive into the research topics of 'ESPnet-SE: End-To-End Speech Enhancement and Separation Toolkit Designed for ASR Integration'. Together they form a unique fingerprint.

Cite this