Sequential Multi-Frame Neural Beamforming for Speech Separation and Enhancement

Zhong Qiu Wang, Hakan Erdogan, Scott Wisdom, Kevin Wilson, Desh Raj, Shinji Watanabe, Zhuo Chen, John R. Hershey

Research output: Conference contribution

3 Citations (Scopus)

Abstract

This work introduces sequential neural beamforming, which alternates between neural network based spectral separation and beamforming based spatial separation. Our neural networks for separation use an advanced convolutional architecture trained with a novel stabilized signal-to-noise ratio loss function. For beamforming, we explore multiple ways of computing time-varying covariance matrices, including factorizing the spatial covariance into a time-varying amplitude component and a time-invariant spatial component, as well as using block-based techniques. In addition, we introduce a multi-frame beamforming method which improves the results significantly by adding contextual frames to the beamforming formulations. We extensively evaluate and analyze the effects of window size, block size, and multi-frame context size for these methods. Our best method utilizes a sequence of three neural separation and multi-frame time-invariant spatial beamforming stages, and demonstrates an average improvement of 2.75 dB in scale-invariant signal-to-noise ratio and 14.2% absolute reduction in a comparative speech recognition metric across four challenging reverberant speech enhancement and separation tasks. We also use our three-speaker separation model to separate real recordings in the LibriCSS evaluation set into non-overlapping tracks, and achieve a better word error rate as compared to a baseline mask based beamformer.
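The multi-frame idea mentioned in the abstract (adding contextual frames to the beamforming formulation) can be illustrated with a small NumPy sketch. This is not the authors' implementation. It assumes one common formulation, a time-invariant multichannel Wiener-style filter computed for a single frequency bin, with the neural network's separated output used as the target estimate. The function names `stack_context` and `multiframe_filter`, the `diag_load` parameter, and the zero-padding at the sequence edges are all hypothetical choices made for illustration.

```python
import numpy as np

def stack_context(Y, left, right):
    """Stack neighbouring frames of a (channels, frames) STFT slice.

    Returns an array of shape (channels * (left + right + 1), frames) in which
    each column holds the current frame plus its context frames.
    Edges are zero-padded (an assumption of this sketch).
    """
    C, T = Y.shape
    padded = np.pad(Y, ((0, 0), (left, right)))
    shifts = [padded[:, left + d : left + d + T] for d in range(-left, right + 1)]
    return np.concatenate(shifts, axis=0)

def multiframe_filter(Y, s_hat, left=1, right=1, diag_load=1e-6):
    """Time-invariant multi-frame filter for one frequency bin.

    Y:     (channels, frames) complex mixture STFT at this frequency.
    s_hat: (frames,) complex target estimate, e.g. from a separation network.
    Solves min_w sum_t |s_hat[t] - w^H y_t|^2 over the stacked frames y_t.
    """
    Yt = stack_context(Y, left, right)           # (D, T) stacked observations
    D, T = Yt.shape
    R = Yt @ Yt.conj().T / T                     # (D, D) covariance of stacked frames
    r = Yt @ s_hat.conj() / T                    # (D,) cross-correlation with target
    # Diagonal loading keeps the solve well conditioned (assumed, not from the paper).
    R = R + diag_load * np.trace(R).real / D * np.eye(D)
    w = np.linalg.solve(R, r)                    # filter coefficients
    return w.conj() @ Yt                         # (T,) beamformed output

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    C, T = 4, 400
    s = rng.standard_normal(T) + 1j * rng.standard_normal(T)
    a = rng.standard_normal(C) + 1j * rng.standard_normal(C)
    noise = 0.5 * (rng.standard_normal((C, T)) + 1j * rng.standard_normal((C, T)))
    Y = a[:, None] * s[None, :] + noise
    out = multiframe_filter(Y, s, left=1, right=1)
    print(out.shape)
```

Setting `left=0` and `right=0` reduces the sketch to an ordinary single-frame filter, which makes it easy to compare the two settings on the same data. In a sequential setup such as the one the abstract describes, the filter output would be passed to the next neural separation stage, whose output in turn would serve as the new `s_hat`.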

Original language: English
Title of host publication: 2021 IEEE Spoken Language Technology Workshop, SLT 2021 - Proceedings
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 905-911
Number of pages: 7
ISBN (Electronic): 9781728170664
DOI
Publication status: Published - Jan 19 2021
Externally published: Yes
Event: 2021 IEEE Spoken Language Technology Workshop, SLT 2021 - Virtual, Shenzhen, China
Duration: Jan 19 2021 to Jan 22 2021

Publication series

Name: 2021 IEEE Spoken Language Technology Workshop, SLT 2021 - Proceedings

Conference

Conference: 2021 IEEE Spoken Language Technology Workshop, SLT 2021
Country/Territory: China
City: Virtual, Shenzhen
Period: Jan 19 2021 to Jan 22 2021

ASJC Scopus subject areas

  • Linguistics and Language
  • Language and Linguistics
  • Artificial Intelligence
  • Computer Science Applications
  • Computer Vision and Pattern Recognition
  • Hardware and Architecture
