Noise-robust attention learning for end-to-end speech recognition

Yosuke Higuchi*, Naohiro Tawara, Atsunori Ogawa, Tomoharu Iwata, Tetsunori Kobayashi, Tetsuji Ogawa

*Corresponding author for this work

Research output: Conference contribution

Abstract

We propose a method for improving the noise robustness of an end-to-end automatic speech recognition (ASR) model using attention weights. Several studies have combined recurrent neural networks with attention mechanisms to map speech directly to text. In real-world environments, however, noise makes it difficult for the attention mechanism to estimate an accurate alignment between the input speech frames and the output characters, which degrades the recognition performance of the end-to-end model. In this work, we propose noise-robust attention learning (NRAL), which explicitly tells the attention mechanism where to “listen” in a sequence of noisy speech features. Specifically, we train the attention weights estimated from noisy speech to approximate the weights estimated from clean speech. Experimental results on the CHiME-4 task indicate that the proposed NRAL approach effectively improves the noise robustness of the end-to-end ASR model.
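The abstract states the core idea of NRAL (attention weights computed from noisy speech are trained to approximate those computed from clean speech) but does not give the distance measure or how it is combined with the recognition loss. The following is a minimal sketch under stated assumptions, not the authors' implementation: it uses the KL divergence as the distance, averages it over output steps, and adds it to the ASR loss with a hypothetical weight `lam`. It also assumes a parallel clean and noisy version of each training utterance with the same number of input frames. The function names are illustrative only.

```python
import math
from typing import List

# Each attention matrix is a list of rows: one row per output step (character),
# one column per input speech frame; each row is a probability distribution.
Attention = List[List[float]]


def softmax(scores: List[float]) -> List[float]:
    """Turn raw attention scores for one output step into weights that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p: List[float], q: List[float], eps: float = 1e-12) -> float:
    """KL(p || q) between two distributions over the same input frames."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def attention_matching_loss(clean_attn: Attention, noisy_attn: Attention) -> float:
    """Assumed NRAL-style term: mean KL from clean-speech to noisy-speech attention.

    Assumption: clean and noisy utterances are parallel, so both matrices have
    the same number of output steps and input frames.
    """
    assert len(clean_attn) == len(noisy_attn), "output lengths must match"
    total = 0.0
    for c_row, n_row in zip(clean_attn, noisy_attn):
        assert len(c_row) == len(n_row), "frame counts must match"
        total += kl_divergence(c_row, n_row)
    return total / len(clean_attn)


def total_loss(asr_loss: float, attn_loss: float, lam: float = 0.5) -> float:
    """Hypothetical combined objective: ASR loss plus weighted attention term."""
    return asr_loss + lam * attn_loss


if __name__ == "__main__":
    # Toy example: 2 output characters attending over 3 input frames.
    clean = [softmax([4.0, 1.0, 0.0]), softmax([0.0, 4.0, 1.0])]  # sharp alignment
    noisy = [softmax([1.0, 1.0, 1.0]), softmax([0.0, 2.0, 2.0])]  # blurred by noise
    attn = attention_matching_loss(clean, noisy)
    print("attention matching loss:", attn)
    print("total loss:", total_loss(asr_loss=1.0, attn_loss=attn))
```

In an actual training loop the clean-speech weights would typically be treated as a fixed target (for example, excluded from gradient computation) so that only the noisy-speech attention is pulled toward them; whether the original work does exactly this is not stated in the abstract.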

Original language: English
Host publication title: 28th European Signal Processing Conference, EUSIPCO 2020 - Proceedings
Publisher: European Signal Processing Conference, EUSIPCO
Pages: 311-315
Number of pages: 5
ISBN (electronic): 9789082797053
DOI
Publication status: Published - 24 Jan 2021
Event: 28th European Signal Processing Conference, EUSIPCO 2020 - Amsterdam, Netherlands
Duration: 24 Aug 2020 - 28 Aug 2020

Publication series

Name: European Signal Processing Conference
Volume: 2021-January
ISSN (print): 2219-5491

Conference

Conference: 28th European Signal Processing Conference, EUSIPCO 2020
Country/Territory: Netherlands
City: Amsterdam
Period: 24 Aug 2020 - 28 Aug 2020

ASJC Scopus subject areas

  • Signal Processing
  • Electrical and Electronic Engineering

