Improving Frame-Online Neural Speech Enhancement With Overlapped-Frame Prediction

Zhong-Qiu Wang*, Shinji Watanabe

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

Abstract

Frame-online speech enhancement systems in the short-time Fourier transform (STFT) domain usually have an algorithmic latency equal to the window size due to the use of overlap-add in the inverse STFT (iSTFT). This algorithmic latency allows the enhancement models to leverage future contextual information up to a length equal to the window size. However, this information is only partially leveraged by current frame-online systems. To fully exploit it, we propose an overlapped-frame prediction technique for deep learning based frame-online speech enhancement, where at each frame our deep neural network (DNN) predicts the current and several past frames that are necessary for overlap-add, instead of only predicting the current frame. In addition, we propose a loss function to account for the scale difference between predicted and oracle target signals. Experiments on a noisy-reverberant speech enhancement task show the effectiveness of the proposed algorithms.
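The abstract's core idea, overlap-add synthesis where the network predicts several overlapping frames at each step, can be illustrated with a minimal numpy sketch. This is not the paper's implementation: the `model` below is a hypothetical placeholder that simply re-emits clean frames for the current and previous time step (a real DNN would map noisy features to these estimates), the FFT is omitted since it is irrelevant to the overlap-add argument, and the window/hop sizes are arbitrary.

```python
import numpy as np

hop, win_size = 8, 16  # 50% overlap (hop = window size / 2)
window = np.sqrt(np.hanning(win_size + 1)[:win_size])  # sqrt-Hann analysis/synthesis pair

def analyze(x):
    """Slice a signal into windowed overlapping frames."""
    n = (len(x) - win_size) // hop + 1
    return np.stack([x[t*hop : t*hop + win_size] * window for t in range(n)])

def synthesize(frames):
    """Overlap-add with synthesis windowing and squared-window normalization."""
    n = len(frames)
    out = np.zeros((n - 1) * hop + win_size)
    norm = np.zeros_like(out)
    for t, f in enumerate(frames):
        out[t*hop : t*hop + win_size] += f * window
        norm[t*hop : t*hop + win_size] += window ** 2
    return out / np.maximum(norm, 1e-8)

def enhance_overlapped(frames, model):
    """Overlapped-frame prediction: at step t the model returns estimates
    for frame t AND frame t-1 (the past frame still needed by overlap-add);
    multiple estimates of the same frame are averaged before synthesis."""
    n = len(frames)
    est = np.zeros_like(frames)
    cnt = np.zeros(n)
    for t in range(n):
        preds = model(frames, t)      # {frame_index: estimate}
        for i, p in preds.items():
            est[i] += p
            cnt[i] += 1
    est /= cnt[:, None]               # average overlapping predictions
    return synthesize(est)

# Toy "model" (assumption, not the paper's DNN): identity on clean frames.
model = lambda F, t: {t: F[t], **({t - 1: F[t - 1]} if t > 0 else {})}

x = np.sin(0.1 * np.arange(200))
y = enhance_overlapped(analyze(x), model)
# with identity predictions, interior samples are reconstructed
err = np.max(np.abs(y[win_size:-win_size] - x[win_size:-win_size]))
```

Because each inner frame receives two predictions that are averaged, the overlap-add stage no longer has to wait a full window for its only estimate of a frame, which is how the paper exploits the algorithmic latency already paid for by the iSTFT.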

Original language: English
Pages (from-to): 1422-1426
Number of pages: 5
Journal: IEEE Signal Processing Letters
Volume: 29
DOIs
Publication status: Published - 2022
Externally published: Yes

Keywords

  • Deep learning
  • online speech enhancement

ASJC Scopus subject areas

  • Signal Processing
  • Applied Mathematics
  • Electrical and Electronic Engineering

