Speech enhancement with LSTM recurrent neural networks and its application to noise-robust ASR

Felix Weninger, Hakan Erdogan, Shinji Watanabe, Emmanuel Vincent, Jonathan Le Roux, John R. Hershey, Björn Schuller

Research output: Conference contribution

286 Citations (Scopus)

Abstract

We evaluate some recent developments in recurrent neural network (RNN) based speech enhancement in the light of noise-robust automatic speech recognition (ASR). The proposed framework is based on Long Short-Term Memory (LSTM) RNNs which are discriminatively trained according to an optimal speech reconstruction objective. We demonstrate that LSTM speech enhancement, even when used ‘naïvely’ as front-end processing, delivers competitive results on the CHiME-2 speech recognition task. Furthermore, simple, feature-level fusion based extensions to the framework are proposed to improve the integration with the ASR back-end. These yield a best result of 13.76% average word error rate, which is, to our knowledge, the best score to date.

Original language: English
Title of host publication: Latent Variable Analysis and Signal Separation - 12th International Conference, LVA/ICA 2015, Proceedings
Editors: Zbyněk Koldovský, Emmanuel Vincent, Arie Yeredor, Petr Tichavský
Publisher: Springer Verlag
Pages: 91-99
Number of pages: 9
ISBN (Print): 9783319224817
DOI
Publication status: Published - 2015
Externally published: Yes
Event: 12th International Conference on Latent Variable Analysis and Signal Separation, LVA/ICA 2015 - Liberec, Czech Republic
Duration: 2015-08-25 to 2015-08-28

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 9237
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Other

Other: 12th International Conference on Latent Variable Analysis and Signal Separation, LVA/ICA 2015
Country/Territory: Czech Republic
City: Liberec
Period: 2015-08-25 to 2015-08-28

ASJC Scopus subject areas

  • Theoretical Computer Science
  • General Computer Science
