Does speech enhancement work with end-to-end ASR objectives? Experimental analysis of multichannel end-to-end ASR

Tsubasa Ochiai, Shinji Watanabe, Shigeru Katagiri

Research output: Conference contribution

8 Citations (Scopus)

Abstract

We recently proposed a novel multichannel end-to-end speech recognition architecture that integrates multichannel speech enhancement and speech recognition into a single neural-network-based architecture, and we demonstrated its fundamental utility for automatic speech recognition (ASR). However, the behavior of this integrated system has not yet been sufficiently clarified. An open question is whether the speech enhancement component actually acquires speech enhancement (noise suppression) ability, because it is optimized with end-to-end ASR objectives rather than speech enhancement objectives. In this paper, we answer this question through systematic evaluation experiments on the CHiME-4 corpus. We first show that the integrated end-to-end architecture acquires adequate speech enhancement ability, superior to that of a conventional alternative (a delay-and-sum beamformer), as measured by two signal-level metrics: the signal-to-distortion ratio and the perceptual evaluation of speech quality. Our findings suggest that further improving the performance of the integrated system requires strengthening the latter-stage speech recognition component. However, only a limited amount of multichannel noisy speech data is available. Given this situation, we next investigate the effect of using a large amount of single-channel clean speech data, e.g., the WSJ corpus, for additional training of the speech recognition component. We show that this use of clean speech significantly improves the overall performance of the multichannel end-to-end architecture on multichannel noisy ASR tasks.
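To make the baseline and one of the metrics in the abstract concrete, the following is a minimal sketch (not the authors' implementation) of a textbook delay-and-sum beamformer and a simplified signal-to-distortion ratio. It assumes the target's arrival delay at each microphone is known and is a whole number of samples; practical systems estimate these delays (for example with GCC-PHAT). The SDR here is a plain energy ratio, which is simpler than the BSS Eval SDR typically reported in enhancement papers, and PESQ (ITU-T P.862) is not covered. The function names `delay_and_sum` and `simple_sdr_db` and the simulated signals are hypothetical, chosen for illustration only.

```python
import numpy as np


def delay_and_sum(channels: np.ndarray, delays: list[int]) -> np.ndarray:
    """Align each microphone signal to the target and average them.

    channels: array of shape (num_mics, num_samples).
    delays:   assumed-known, non-negative integer arrival delay (in samples)
              of the target at each microphone.
    """
    num_mics, num_samples = channels.shape
    out = np.zeros(num_samples)
    for m in range(num_mics):
        d = delays[m]
        # Advance channel m by d samples so the target lines up; zero-pad the tail.
        aligned = np.zeros(num_samples)
        aligned[: num_samples - d] = channels[m, d:]
        out += aligned
    return out / num_mics


def simple_sdr_db(reference: np.ndarray, estimate: np.ndarray) -> float:
    """Plain energy-ratio SDR in dB (no distortion filter, unlike BSS Eval)."""
    error = reference - estimate
    return float(10.0 * np.log10(np.sum(reference**2) / np.sum(error**2)))


# Hypothetical simulation: one source, four microphones, independent sensor noise.
rng = np.random.default_rng(0)
num_samples, delays = 16000, [0, 2, 4, 6]
source = rng.standard_normal(num_samples)
mics = np.zeros((len(delays), num_samples))
for m, d in enumerate(delays):
    mics[m, d:] = source[: num_samples - d]             # delayed copy of the source
    mics[m] += 0.5 * rng.standard_normal(num_samples)   # independent noise per mic

enhanced = delay_and_sum(mics, delays)
valid = num_samples - max(delays)  # samples where every channel contributes
sdr_single = simple_sdr_db(source[:valid], mics[0, :valid])
sdr_ds = simple_sdr_db(source[:valid], enhanced[:valid])
print(f"single-channel SDR: {sdr_single:.1f} dB, delay-and-sum SDR: {sdr_ds:.1f} dB")
```

Because the noise is independent across the four microphones while the aligned target adds coherently, averaging should reduce the noise power by roughly a factor of four, i.e. an SDR gain of about 10·log10(4) ≈ 6 dB. The abstract's claim is that the enhancement front end learned only from ASR objectives outperforms this kind of fixed beamformer on such signal-level measures.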

Original language: English
Host publication title: 2017 IEEE International Workshop on Machine Learning for Signal Processing, MLSP 2017 - Proceedings
Editors: Naonori Ueda, Jen-Tzung Chien, Tomoko Matsui, Jan Larsen, Shinji Watanabe
Publisher: IEEE Computer Society
Pages: 1-5
Number of pages: 5
ISBN (electronic): 9781509063413
DOI
Publication status: Published - 2017 Dec 5
Externally published: Yes
Event: 2017 IEEE International Workshop on Machine Learning for Signal Processing, MLSP 2017 - Tokyo, Japan
Duration: 2017 Sep 25 to 2017 Sep 28

Publication series

Name: IEEE International Workshop on Machine Learning for Signal Processing, MLSP
Volume: 2017-September
ISSN (print): 2161-0363
ISSN (electronic): 2161-0371

Other

Other: 2017 IEEE International Workshop on Machine Learning for Signal Processing, MLSP 2017
Country/Territory: Japan
City: Tokyo
Period: 17/9/25 to 17/9/28

ASJC Scopus subject areas

  • Human-Computer Interaction
  • Signal Processing
