Improved MVDR beamforming using single-channel mask prediction networks

Hakan Erdogan, John Hershey, Shinji Watanabe, Michael Mandel, Jonathan Le Roux

Research output: Conference article, peer-reviewed

206 citations (Scopus)

Abstract

Recent studies on multi-microphone speech databases indicate that it is beneficial to perform beamforming to improve speech recognition accuracy, especially when there is a high level of background noise. Minimum variance distortionless response (MVDR) beamforming is an important beamforming method that performs quite well for speech recognition purposes, especially if the steering vector is known. However, steering the beamformer to focus on speech in unknown acoustic conditions remains a challenging problem. In this study, we use single-channel speech enhancement deep networks to form masks that can be used for noise spatial covariance estimation, which steers the MVDR beamformer toward the speech. We analyze how mask prediction affects performance and also discuss various ways to use masks to obtain reliable speech and noise spatial covariance estimates. We show that using a single mask across microphones for covariance prediction with minima-limited post-masking yields the best result in terms of signal-level quality measures and speech recognition word error rates in a mismatched training condition.
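
The pipeline described in the abstract can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names (`masked_covariance`, `mvdr_weights`, `beamform_and_postmask`), the use of `1 - speech_mask` as the noise mask, the reference-microphone MVDR formula w = (Φn⁻¹ Φs) u / tr(Φn⁻¹ Φs), the diagonal loading, and the floor value `mask_floor=0.1` are all choices made here for illustration. The mask itself is assumed to come from some single-channel enhancement network and is passed in as an array.

```python
import numpy as np


def masked_covariance(stft, mask, eps=1e-10):
    """Mask-weighted spatial covariance per frequency bin.

    stft: complex array of shape (M, F, T) = (mics, freq bins, frames).
    mask: real array of shape (F, T) with values in [0, 1], shared by all mics.
    Returns an array of shape (F, M, M).
    """
    weighted = stft * mask[None, :, :]
    # Sum over frames of mask * y y^H, for every frequency bin.
    cov = np.einsum("mft,nft->fmn", weighted, stft.conj())
    return cov / (mask.sum(axis=1)[:, None, None] + eps)


def mvdr_weights(cov_speech, cov_noise, ref_mic=0, diag_load=1e-6):
    """Reference-microphone MVDR weights, shape (F, M).

    Assumed formulation: w = (Phi_n^-1 Phi_s) u / trace(Phi_n^-1 Phi_s),
    where u selects the reference microphone.
    """
    n_mics = cov_noise.shape[-1]
    # Small trace-scaled diagonal loading keeps the noise covariance invertible.
    scale = np.trace(cov_noise, axis1=-2, axis2=-1).real / n_mics
    cov_noise = cov_noise + diag_load * scale[:, None, None] * np.eye(n_mics)
    numerator = np.linalg.solve(cov_noise, cov_speech)  # (F, M, M)
    trace = np.trace(numerator, axis1=-2, axis2=-1)[:, None]
    return numerator[:, :, ref_mic] / (trace + 1e-10)


def beamform_and_postmask(stft, speech_mask, mask_floor=0.1, ref_mic=0):
    """MVDR beamforming followed by a minima-limited post-mask.

    Returns the enhanced single-channel STFT of shape (F, T).
    """
    noise_mask = 1.0 - speech_mask  # assumption: complementary noise mask
    cov_speech = masked_covariance(stft, speech_mask)
    cov_noise = masked_covariance(stft, noise_mask)
    weights = mvdr_weights(cov_speech, cov_noise, ref_mic=ref_mic)
    # Apply w^H y in every time-frequency bin.
    beamformed = np.einsum("fm,mft->ft", weights.conj(), stft)
    # Limit the mask from below so the post-filter never fully zeroes a bin.
    return np.maximum(speech_mask, mask_floor) * beamformed
```

With a multi-channel STFT `stft` of shape `(M, F, T)` and a predicted mask `speech_mask` of shape `(F, T)`, calling `beamform_and_postmask(stft, speech_mask)` returns an `(F, T)` enhanced spectrogram. Passing one mask for all microphones mirrors the "single mask across microphones" setting named in the abstract; how that mask is combined from per-channel predictions is not specified here.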

Original language: English
Pages (from-to): 1981-1985
Number of pages: 5
Journal: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
08-12-September-2016
DOI
Publication status: Published - 2016
Externally published: Yes
Event: 17th Annual Conference of the International Speech Communication Association, INTERSPEECH 2016 - San Francisco, United States
Duration: 8 Sep 2016 → 16 Sep 2016

ASJC Scopus subject areas

  • Language and Linguistics
  • Human-Computer Interaction
  • Signal Processing
  • Software
  • Modeling and Simulation
