A purely end-to-end system for multi-speaker speech recognition

Hiroshi Seki, Takaaki Hori, Shinji Watanabe, Jonathan Le Roux, John R. Hershey

Research output: Conference contribution

50 Citations (Scopus)

Abstract

Recently, there has been growing interest in multi-speaker speech recognition, where the utterances of multiple speakers are recognized from their mixture. Promising techniques have been proposed for this task, but earlier works have required additional training data such as isolated source signals or senone alignments for effective learning. In this paper, we propose a new sequence-to-sequence framework to directly decode multiple label sequences from a single speech sequence by unifying source separation and speech recognition functions in an end-to-end manner. We further propose a new objective function to improve the contrast between the hidden vectors to avoid generating similar hypotheses. Experimental results show that the model is directly able to learn a mapping from a speech mixture to multiple label sequences, achieving 83.1% relative improvement compared to a model trained without the proposed objective. Interestingly, the results are comparable to those produced by previous end-to-end works featuring explicit separation and recognition modules.
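The abstract does not give the form of the proposed objective, so the following is only an illustrative sketch of how a term that encourages contrast between two streams of hidden vectors could be added to a recognition loss. The use of a symmetric KL divergence over softmax-normalized vectors, the function names, and the `weight` parameter are all assumptions made for illustration, not details taken from this record.

```python
import math

def softmax(vec):
    # Numerically stable softmax over a list of floats.
    m = max(vec)
    exps = [math.exp(x - m) for x in vec]
    total = sum(exps)
    return [e / total for e in exps]

def kl(p, q, eps=1e-12):
    # KL divergence between two discrete distributions (eps avoids log(0)).
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

def contrast_penalty(hidden_a, hidden_b):
    # Hypothetical contrast term: negative symmetric KL divergence,
    # averaged over frames. More dissimilar hidden vectors give a
    # lower (more negative) value.
    total = 0.0
    for ha, hb in zip(hidden_a, hidden_b):
        pa, pb = softmax(ha), softmax(hb)
        total += kl(pa, pb) + kl(pb, pa)
    return -total / len(hidden_a)

def total_loss(recognition_loss, hidden_a, hidden_b, weight=0.1):
    # Assumed combination: recognition loss plus a weighted contrast term.
    return recognition_loss + weight * contrast_penalty(hidden_a, hidden_b)

# Two frames of 3-dimensional hidden vectors for each output stream.
stream_a = [[2.0, 0.0, 0.0], [0.0, 2.0, 0.0]]
stream_b = [[0.0, 0.0, 2.0], [2.0, 0.0, 0.0]]
loss_same = total_loss(1.0, stream_a, stream_a)  # no contrast between streams
loss_diff = total_loss(1.0, stream_a, stream_b)  # contrast lowers the loss
```

In this sketch, identical hidden vectors leave the loss unchanged, while dissimilar ones lower it, which gives a model an incentive to produce distinct hypotheses for each speaker.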

Original language: English
Title of host publication: ACL 2018 - 56th Annual Meeting of the Association for Computational Linguistics, Proceedings of the Conference (Long Papers)
Publisher: Association for Computational Linguistics (ACL)
Pages: 2620-2630
Number of pages: 11
ISBN (Electronic): 9781948087322
Publication status: Published - 2018
Externally published: Yes
Event: 56th Annual Meeting of the Association for Computational Linguistics, ACL 2018 - Melbourne, Australia
Duration: 15 Jul 2018 to 20 Jul 2018

Publication series

Name: ACL 2018 - 56th Annual Meeting of the Association for Computational Linguistics, Proceedings of the Conference (Long Papers)
Volume: 1

Conference

Conference: 56th Annual Meeting of the Association for Computational Linguistics, ACL 2018
Country/Territory: Australia
City: Melbourne
Period: 15 Jul 2018 to 20 Jul 2018

ASJC Scopus subject areas

  • Software
  • Computational Theory and Mathematics
