Student-teacher network learning with enhanced features

Shinji Watanabe, Takaaki Hori, Jonathan Le Roux, John R. Hershey

Research output: Conference contribution

36 Citations (Scopus)

Abstract

Recent advances in distant-talking ASR research have confirmed that speech enhancement is an essential technique for improving the ASR performance, especially in the multichannel scenario. However, speech enhancement inevitably distorts speech signals, which can cause significant degradation when enhanced signals are used as training data. Thus, distant-talking ASR systems often resort to using the original noisy signals as training data and the enhanced signals only at test time, and give up on taking advantage of enhancement techniques in the training stage. This paper proposes to make use of enhanced features in the student-teacher learning paradigm. The enhanced features are used as input to a teacher network to obtain soft targets, while a student network tries to mimic the teacher network's outputs using the original noisy features as input, so that speech enhancement is implicitly performed within the student network. Compared with conventional student-teacher learning, which uses a better network as teacher, the proposed self-supervised method uses better (enhanced) inputs to a teacher. This setup matches the above scenario of making use of enhanced features in network training. Experiments with the CHiME-4 challenge real dataset show significant ASR improvements with an error reduction rate of 12% in the single-channel track and 15% in the 2-channel track, respectively, by using 6-channel beamformed features for the teacher model.
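The training setup described in the abstract can be illustrated with a minimal per-frame sketch. The abstract does not state the loss function, so this sketch assumes the common choice of cross-entropy between the teacher's soft targets and the student's output distribution. The function names, the three-class output, and the logit values are hypothetical and chosen only for illustration.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def log_softmax(logits):
    """Numerically stable log-softmax over a list of logits."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def student_teacher_loss(teacher_logits, student_logits):
    """Cross-entropy between teacher soft targets and student outputs.

    Assumption: the teacher logits come from the enhanced features and
    the student logits come from the original noisy features of the
    same frame. Only the student would be updated with this loss.
    """
    soft_targets = softmax(teacher_logits)
    student_log_probs = log_softmax(student_logits)
    return -sum(p * lq for p, lq in zip(soft_targets, student_log_probs))


# Hypothetical logits for one frame over three output classes.
teacher_logits = [2.0, 0.5, -1.0]   # teacher output on enhanced input
student_match = [2.0, 0.5, -1.0]    # student that mimics the teacher
student_off = [-1.0, 0.5, 2.0]      # student that disagrees with it

loss_match = student_teacher_loss(teacher_logits, student_match)
loss_off = student_teacher_loss(teacher_logits, student_off)
```

Under this assumed loss, a student whose outputs match the teacher's reaches the minimum (the entropy of the soft targets), so `loss_match` is smaller than `loss_off`.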

Original language: English
Title of host publication: 2017 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2017 - Proceedings
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 5275-5279
Number of pages: 5
ISBN (Electronic): 9781509041176
DOI
Publication status: Published - 16 Jun 2017
Externally published: Yes
Event: 2017 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2017 - New Orleans, United States
Duration: 5 Mar 2017 to 9 Mar 2017

Other

Other: 2017 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2017
Country: United States
City: New Orleans
Period: 5 Mar 2017 to 9 Mar 2017

ASJC Scopus subject areas

  • Software
  • Signal Processing
  • Electrical and Electronic Engineering
