Speaker Adaptation for Multichannel End-to-End Speech Recognition

Tsubasa Ochiai, Shinji Watanabe, Shigeru Katagiri, Takaaki Hori, John Hershey

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

3 Citations (Scopus)

Abstract

Recent work on multichannel end-to-end automatic speech recognition (ASR) has shown that multichannel speech enhancement and speech recognition functions can be integrated into a deep neural network (DNN)-based system, and promising experimental results have been shown using the CHiME-4 and AMI corpora. In other recent DNN-based hidden Markov model (DNN-HMM) hybrid architectures, the effectiveness of speaker adaptation has been well established. Motivated by these results, we propose a multi-path adaptation scheme for end-to-end multichannel ASR, which combines the unprocessed noisy speech features with a speech-enhanced pathway to improve upon previous end-to-end ASR approaches. Experimental results using CHiME-4 show that (1) our proposed multi-path adaptation scheme improves ASR performance and (2) adapting the encoder network is more effective than adapting the neural beamformer, attention mechanism, or decoder network.
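As a rough illustration only, the sketch below shows two generic ingredients named in the abstract, under assumptions that are not taken from the paper. First, the speech-enhanced feature stream and the unprocessed noisy feature stream are joined by frame-wise concatenation (a hypothetical combination rule; the paper's actual multi-path design may differ). Second, speaker adaptation is modeled as a gradient step on speaker-specific data that updates only the selected component, for example the encoder, while the neural beamformer, attention, and decoder parameters stay frozen. The function names `combine_paths` and `adapt_step`, the dictionary-of-lists parameter layout, and all numbers are hypothetical; this is not the authors' implementation.

```python
from typing import Dict, Iterable, List

Frame = List[float]


def combine_paths(enhanced: List[Frame], noisy: List[Frame]) -> List[Frame]:
    """Join the speech-enhanced path and the unprocessed noisy path frame by frame.

    Assumption: plain concatenation of per-frame feature vectors; the paper's
    actual combination rule may differ.
    """
    if len(enhanced) != len(noisy):
        raise ValueError("both paths must have the same number of frames")
    return [list(e) + list(n) for e, n in zip(enhanced, noisy)]


def adapt_step(
    params: Dict[str, List[float]],
    grads: Dict[str, List[float]],
    adapt_modules: Iterable[str],
    lr: float = 0.01,
) -> Dict[str, List[float]]:
    """One gradient-descent step that updates only the modules in `adapt_modules`.

    Every other module is copied unchanged (frozen). The input dict is not mutated.
    """
    selected = set(adapt_modules)
    unknown = selected - set(params)
    if unknown:
        raise KeyError(f"unknown modules: {sorted(unknown)}")
    updated: Dict[str, List[float]] = {}
    for name, values in params.items():
        if name in selected:
            g = grads[name]
            if len(g) != len(values):
                raise ValueError(f"gradient shape mismatch for {name}")
            # Plain SGD update applied to the adapted module only.
            updated[name] = [w - lr * gi for w, gi in zip(values, g)]
        else:
            # Frozen module: keep the speaker-independent parameters as-is.
            updated[name] = list(values)
    return updated


if __name__ == "__main__":
    # Toy speaker-independent parameters for the four components (made-up numbers).
    params = {"beamformer": [1.0, 2.0], "encoder": [0.5, -0.5],
              "attention": [0.25], "decoder": [3.0]}
    # Toy gradients from hypothetical speaker-specific adaptation data.
    grads = {name: [1.0] * len(v) for name, v in params.items()}
    print(adapt_step(params, grads, {"encoder"}, lr=0.25))
    print(combine_paths([[1.0, 2.0]], [[3.0]]))
```

Passing `{"beamformer"}`, `{"attention"}`, or `{"decoder"}` instead of `{"encoder"}` mirrors the kind of per-component comparison the abstract reports; the toy code itself says nothing about which choice works better.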

Original language: English
Title of host publication: 2018 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2018 - Proceedings
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 6707-6711
Number of pages: 5
Volume: 2018-April
ISBN (Print): 9781538646588
DOIs: https://doi.org/10.1109/ICASSP.2018.8462161
Publication status: Published - 2018 Sep 10
Externally published: Yes
Event: 2018 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2018 - Calgary, Canada
Duration: 2018 Apr 15 → 2018 Apr 20

Other

Other: 2018 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2018
Country: Canada
City: Calgary
Period: 18/4/15 → 18/4/20

Fingerprint

Speech recognition
Speech enhancement
Hidden Markov models
Deep neural networks

Keywords

  • Attention-based encoder-decoder
  • Multichannel end-to-end ASR
  • Neural beamformer
  • Speaker adaptation

ASJC Scopus subject areas

  • Software
  • Signal Processing
  • Electrical and Electronic Engineering

Cite this

Ochiai, T., Watanabe, S., Katagiri, S., Hori, T., & Hershey, J. (2018). Speaker Adaptation for Multichannel End-to-End Speech Recognition. In 2018 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2018 - Proceedings (Vol. 2018-April, pp. 6707-6711). [8462161] Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/ICASSP.2018.8462161

Speaker Adaptation for Multichannel End-to-End Speech Recognition. / Ochiai, Tsubasa; Watanabe, Shinji; Katagiri, Shigeru; Hori, Takaaki; Hershey, John.

2018 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2018 - Proceedings. Vol. 2018-April. Institute of Electrical and Electronics Engineers Inc., 2018. p. 6707-6711 8462161.


Ochiai, T, Watanabe, S, Katagiri, S, Hori, T & Hershey, J 2018, Speaker Adaptation for Multichannel End-to-End Speech Recognition. in 2018 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2018 - Proceedings. vol. 2018-April, 8462161, Institute of Electrical and Electronics Engineers Inc., pp. 6707-6711, 2018 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2018, Calgary, Canada, 18/4/15. https://doi.org/10.1109/ICASSP.2018.8462161
Ochiai T, Watanabe S, Katagiri S, Hori T, Hershey J. Speaker Adaptation for Multichannel End-to-End Speech Recognition. In 2018 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2018 - Proceedings. Vol. 2018-April. Institute of Electrical and Electronics Engineers Inc. 2018. p. 6707-6711. 8462161 https://doi.org/10.1109/ICASSP.2018.8462161
Ochiai, Tsubasa ; Watanabe, Shinji ; Katagiri, Shigeru ; Hori, Takaaki ; Hershey, John. / Speaker Adaptation for Multichannel End-to-End Speech Recognition. 2018 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2018 - Proceedings. Vol. 2018-April. Institute of Electrical and Electronics Engineers Inc., 2018. pp. 6707-6711
@inproceedings{ba69d5bded0f46a3a75e6ee9a89f550f,
title = "Speaker Adaptation for Multichannel End-to-End Speech Recognition",
abstract = "Recent work on multichannel end-to-end automatic speech recognition (ASR) has shown that multichannel speech enhancement and speech recognition functions can be integrated into a deep neural network (DNN)-based system, and promising experimental results have been shown using the CHiME-4 and AMI corpora. In other recent DNN-based hidden Markov model (DNN-HMM) hybrid architectures, the effectiveness of speaker adaptation has been well established. Motivated by these results, we propose a multi-path adaptation scheme for end-to-end multichannel ASR, which combines the unprocessed noisy speech features with a speech-enhanced pathway to improve upon previous end-to-end ASR approaches. Experimental results using CHiME-4 show that (1) our proposed multi-path adaptation scheme improves ASR performance and (2) adapting the encoder network is more effective than adapting the neural beamformer, attention mechanism, or decoder network.",
keywords = "Attention-based encoder-decoder, Multichannel end-to-end ASR, Neural beamformer, Speaker adaptation",
author = "Tsubasa Ochiai and Shinji Watanabe and Shigeru Katagiri and Takaaki Hori and John Hershey",
year = "2018",
month = "9",
day = "10",
doi = "10.1109/ICASSP.2018.8462161",
language = "English",
isbn = "9781538646588",
volume = "2018-April",
pages = "6707--6711",
booktitle = "2018 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2018 - Proceedings",
publisher = "Institute of Electrical and Electronics Engineers Inc.",

}

TY  - GEN
T1  - Speaker Adaptation for Multichannel End-to-End Speech Recognition
AU  - Ochiai, Tsubasa
AU  - Watanabe, Shinji
AU  - Katagiri, Shigeru
AU  - Hori, Takaaki
AU  - Hershey, John
PY  - 2018/9/10
Y1  - 2018/9/10
N2  - Recent work on multichannel end-to-end automatic speech recognition (ASR) has shown that multichannel speech enhancement and speech recognition functions can be integrated into a deep neural network (DNN)-based system, and promising experimental results have been shown using the CHiME-4 and AMI corpora. In other recent DNN-based hidden Markov model (DNN-HMM) hybrid architectures, the effectiveness of speaker adaptation has been well established. Motivated by these results, we propose a multi-path adaptation scheme for end-to-end multichannel ASR, which combines the unprocessed noisy speech features with a speech-enhanced pathway to improve upon previous end-to-end ASR approaches. Experimental results using CHiME-4 show that (1) our proposed multi-path adaptation scheme improves ASR performance and (2) adapting the encoder network is more effective than adapting the neural beamformer, attention mechanism, or decoder network.
AB  - Recent work on multichannel end-to-end automatic speech recognition (ASR) has shown that multichannel speech enhancement and speech recognition functions can be integrated into a deep neural network (DNN)-based system, and promising experimental results have been shown using the CHiME-4 and AMI corpora. In other recent DNN-based hidden Markov model (DNN-HMM) hybrid architectures, the effectiveness of speaker adaptation has been well established. Motivated by these results, we propose a multi-path adaptation scheme for end-to-end multichannel ASR, which combines the unprocessed noisy speech features with a speech-enhanced pathway to improve upon previous end-to-end ASR approaches. Experimental results using CHiME-4 show that (1) our proposed multi-path adaptation scheme improves ASR performance and (2) adapting the encoder network is more effective than adapting the neural beamformer, attention mechanism, or decoder network.
KW  - Attention-based encoder-decoder
KW  - Multichannel end-to-end ASR
KW  - Neural beamformer
KW  - Speaker adaptation
UR  - http://www.scopus.com/inward/record.url?scp=85054252839&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=85054252839&partnerID=8YFLogxK
U2  - 10.1109/ICASSP.2018.8462161
DO  - 10.1109/ICASSP.2018.8462161
M3  - Conference contribution
SN  - 9781538646588
VL  - 2018-April
SP  - 6707
EP  - 6711
BT  - 2018 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2018 - Proceedings
PB  - Institute of Electrical and Electronics Engineers Inc.
ER  -