Noise-robust attention learning for end-to-end speech recognition

Yosuke Higuchi*, Naohiro Tawara, Atsunori Ogawa, Tomoharu Iwata, Tetsunori Kobayashi, Tetsuji Ogawa

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution


We propose a method for improving the noise robustness of an end-to-end automatic speech recognition (ASR) model using attention weights. Several studies have adopted a combination of recurrent neural networks and attention mechanisms to achieve direct speech-to-text conversion. In real-world environments, however, noisy conditions make it difficult for the attention mechanism to estimate accurate alignments between the input speech frames and output characters, degrading the recognition performance of the end-to-end model. In this work, we propose noise-robust attention learning (NRAL), which explicitly tells the attention mechanism where to “listen at” in a sequence of noisy speech features. Specifically, we train the attention weights estimated from noisy speech to approximate the weights estimated from the corresponding clean speech. The experimental results on the CHiME-4 task indicate that the proposed NRAL approach effectively improves the noise robustness of the end-to-end ASR model.
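The abstract describes training noisy-speech attention weights to approximate clean-speech attention weights. As a minimal illustrative sketch (not the authors' implementation), this can be framed as adding a divergence penalty between the two attention distributions to the ASR training loss; the function names, the choice of KL divergence, and the weight `alpha` below are assumptions for illustration only.

```python
import math

def attention_matching_loss(noisy_attn, clean_attn, eps=1e-8):
    """Mean row-wise KL divergence pulling noisy attention toward clean attention.

    Both arguments are (output_steps x input_frames) matrices whose rows are
    attention distributions summing to 1. The clean weights act as the target.
    """
    total = 0.0
    for noisy_row, clean_row in zip(noisy_attn, clean_attn):
        # KL(clean || noisy), clipped by eps for numerical stability.
        total += sum(max(c, eps) * math.log(max(c, eps) / max(n, eps))
                     for n, c in zip(noisy_row, clean_row))
    return total / len(noisy_attn)

def total_loss(asr_loss, noisy_attn, clean_attn, alpha=0.5):
    # alpha is a hypothetical interpolation weight balancing the ASR
    # objective against the attention-matching penalty.
    return asr_loss + alpha * attention_matching_loss(noisy_attn, clean_attn)
```

In practice the two attention matrices would come from parallel forward passes over time-aligned clean and noisy versions of the same utterance, with gradients flowing only through the noisy branch.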

Original language: English
Title of host publication: 28th European Signal Processing Conference, EUSIPCO 2020 - Proceedings
Publisher: European Signal Processing Conference, EUSIPCO
Number of pages: 5
ISBN (Electronic): 9789082797053
Publication status: Published - 2021 Jan 24
Event: 28th European Signal Processing Conference, EUSIPCO 2020 - Amsterdam, Netherlands
Duration: 2020 Aug 24 - 2020 Aug 28

Publication series

Name: European Signal Processing Conference
ISSN (Print): 2219-5491


Conference: 28th European Signal Processing Conference, EUSIPCO 2020


Keywords

  • Attention mechanism
  • Deep neural networks
  • Noise robustness
  • Speech recognition

ASJC Scopus subject areas

  • Signal Processing
  • Electrical and Electronic Engineering


