Online integration of DNN-Based and spatial clustering-based mask estimation for robust MVDR beamforming

Yutaro Matsui, Tomohiro Nakatani, Marc Delcroix, Keisuke Kinoshita, Nobutaka Ito, Shoko Araki, Shoji Makino

Research output: Conference contribution

9 Citations (Scopus)

Abstract

This paper discusses the online estimation of time-frequency masks, which enables us to perform mask-based beamforming by online processing for robust automatic speech recognition (ASR). Two approaches to online mask estimation have been separately developed for this purpose. One is based on a deep neural network (DNN), which exploits the spectral features of the signal. The other is based on spatial clustering (SC), which exploits the spatial features of the signal. This paper proposes a new method that integrates the two online estimation approaches to further improve online mask estimation by exploiting the advantages of both approaches. Experiments using the real data of the CHiME-3 multichannel noisy speech corpus show that the proposed method greatly outperforms the conventional approaches in terms of improving the word error rate (WER).

Original language: English
Title of host publication: 16th International Workshop on Acoustic Signal Enhancement, IWAENC 2018 - Proceedings
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 71-75
Number of pages: 5
ISBN (Electronic): 9781538681510
DOI
Publication status: Published - 2 Nov 2018
Externally published: Yes
Event: 16th International Workshop on Acoustic Signal Enhancement, IWAENC 2018 - Tokyo, Japan
Duration: 17 Sep 2018 to 20 Sep 2018

Publication series

Name: 16th International Workshop on Acoustic Signal Enhancement, IWAENC 2018 - Proceedings

Other

Other: 16th International Workshop on Acoustic Signal Enhancement, IWAENC 2018
Country/Territory: Japan
City: Tokyo
Period: 17 Sep 2018 to 20 Sep 2018

ASJC Scopus subject areas

  • Signal Processing
  • Acoustics and Ultrasonics
