Speaker Adaptation for Multichannel End-to-End Speech Recognition

Tsubasa Ochiai, Shinji Watanabe, Shigeru Katagiri, Takaaki Hori, John Hershey

Research output: Conference contribution

14 Citations (Scopus)

Abstract

Recent work on multichannel end-to-end automatic speech recognition (ASR) has shown that multichannel speech enhancement and speech recognition functions can be integrated into a deep neural network (DNN)-based system, and promising experimental results have been shown using the CHiME-4 and AMI corpora. In other recent DNN-based hidden Markov model (DNN-HMM) hybrid architectures, the effectiveness of speaker adaptation has been well established. Motivated by these results, we propose a multi-path adaptation scheme for end-to-end multichannel ASR, which combines the unprocessed noisy speech features with a speech-enhanced pathway to improve upon previous end-to-end ASR approaches. Experimental results using CHiME-4 show that (1) our proposed multi-path adaptation scheme improves ASR performance and (2) adapting the encoder network is more effective than adapting the neural beamformer, attention mechanism, or decoder network.

Original language: English
Title of host publication: 2018 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2018 - Proceedings
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 6707-6711
Number of pages: 5
ISBN (Print): 9781538646588
DOI
Publication status: Published - Sep 10, 2018
Externally published: Yes
Event: 2018 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2018 - Calgary, Canada
Duration: Apr 15, 2018 to Apr 20, 2018

Publication series

Name: ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
2018-April
ISSN (Print): 1520-6149

Other

Other: 2018 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2018
Country/Territory: Canada
City: Calgary
Period: Apr 15, 2018 to Apr 20, 2018

ASJC Scopus subject areas

  • Software
  • Signal Processing
  • Electrical and Electronic Engineering
