Acoustic Modeling for Distant Multi-talker Speech Recognition with Single- and Multi-channel Branches

Naoyuki Kanda, Yusuke Fujita, Shota Horiguchi, Rintaro Ikeshita, Kenji Nagamatsu, Shinji Watanabe

Research output: Conference contribution

4 Citations (Scopus)

Abstract

This paper presents a novel heterogeneous-input multi-channel acoustic model (AM) that has both single-channel and multi-channel input branches. In our proposed training pipeline, a single-channel AM is trained first, then a multi-channel AM is trained starting from the single-channel AM with a randomly initialized multi-channel input branch. Our model uniquely uses the power of a complemental speech enhancement (SE) module while exploiting the power of jointly trained AM and SE architecture. Our method was the foundation for the Hitachi/JHU CHiME-5 system that achieved the second-best result in the CHiME-5 competition, and this paper details various investigation results that we were not able to present during the competition period. We also evaluated and reconfirmed our method's effectiveness with the AMI Meeting Corpus. Our AM achieved a 30.12% word error rate (WER) for the development set and a 32.33% WER for the evaluation set for the AMI Corpus, both of which are the best results ever reported to the best of our knowledge.
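
The two-stage initialization described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the parameter layout, the function names `init_single_channel_am` and `init_multi_channel_am`, the layer sizes, and the Gaussian initialization are all assumptions made for illustration, and the actual training steps are omitted.

```python
import copy
import random


def init_single_channel_am(feat_dim, hidden_dim, rng):
    """Stage 1 stand-in: parameters of a single-channel AM.

    Hypothetical layout; in practice these weights would come from training.
    """
    return {
        # Input branch that consumes single-channel features.
        "single_branch": [[rng.gauss(0.0, 0.1) for _ in range(feat_dim)]
                          for _ in range(hidden_dim)],
        # Layers shared by every input branch.
        "shared": [[rng.gauss(0.0, 0.1) for _ in range(hidden_dim)]
                   for _ in range(hidden_dim)],
    }


def init_multi_channel_am(single_am, num_channels, feat_dim, rng):
    """Stage 2: start from the single-channel AM and add a randomly
    initialized multi-channel input branch (assumed shape)."""
    hidden_dim = len(single_am["single_branch"])
    model = copy.deepcopy(single_am)  # keep the stage-1 weights unchanged
    model["multi_branch"] = [
        [rng.gauss(0.0, 0.1) for _ in range(num_channels * feat_dim)]
        for _ in range(hidden_dim)
    ]
    return model


rng = random.Random(0)
single_am = init_single_channel_am(feat_dim=40, hidden_dim=8, rng=rng)
multi_am = init_multi_channel_am(single_am, num_channels=4, feat_dim=40, rng=rng)
```

After this step, `multi_am` would be trained further on multi-channel data; how the two branches are combined inside the network is not specified in the abstract and is left out here.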

Original language: English
Title of host publication: 2019 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2019 - Proceedings
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 6630-6634
Number of pages: 5
ISBN (Electronic): 9781479981311
DOI: https://doi.org/10.1109/ICASSP.2019.8682273
Publication status: Published - 1 May 2019
Externally published: Yes
Event: 44th IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2019 - Brighton, United Kingdom
Duration: 12 May 2019 to 17 May 2019

Publication series

Name: ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
Volume: 2019-May
ISSN (Print): 1520-6149

Conference

Conference: 44th IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2019
United Kingdom
Brighton
Period: 12 May 2019 to 17 May 2019

ASJC Scopus subject areas

  • Software
  • Signal Processing
  • Electrical and Electronic Engineering

  • Cite this

    Kanda, N., Fujita, Y., Horiguchi, S., Ikeshita, R., Nagamatsu, K., & Watanabe, S. (2019). Acoustic Modeling for Distant Multi-talker Speech Recognition with Single- and Multi-channel Branches. In 2019 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2019 - Proceedings (pp. 6630-6634). [8682273] (ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings; Vol. 2019-May). Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/ICASSP.2019.8682273