End-to-End Multi-Speaker Speech Recognition

Shane Settle, Jonathan Le Roux, Takaaki Hori, Shinji Watanabe, John R. Hershey

研究成果

42 被引用数 (Scopus)

抄録

Current advances in deep learning have resulted in a convergence of methods across a wide range of tasks, opening the door for tighter integration of modules that were previously developed and optimized in isolation. Recent ground-breaking works have produced end-to-end deep network methods for both speech separation and end-to-end automatic speech recognition (ASR). Speech separation methods such as deep clustering address the challenging cocktail-party problem of distinguishing multiple simultaneous speech signals. This is an enabling technology for real-world human machine interaction (HMI). However, speech separation requires ASR to interpret the speech for any HMI task. Likewise, ASR requires speech separation to work in an unconstrained environment. Although these two components can be trained in isolation and connected after the fact, this paradigm is likely to be sub-optimal, since it relies on artificially mixed data. In this paper, we develop the first fully end-to-end, jointly trained deep learning system for separation and recognition of overlapping speech signals. The joint training framework synergistically adapts the separation and recognition to each other. As an additional benefit, it enables training on more realistic data that contains only mixed signals and their transcriptions, and thus is suited to large scale training on existing transcribed data.

本文言語English
ホスト出版物のタイトル2018 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2018 - Proceedings
出版社Institute of Electrical and Electronics Engineers Inc.
ページ4819-4823
ページ数5
ISBN(印刷版)9781538646588
DOI
出版ステータスPublished - 2018 9 10
外部発表はい
イベント2018 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2018 - Calgary, Canada
継続期間: 2018 4 152018 4 20

出版物シリーズ

名前ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
2018-April
ISSN(印刷版)1520-6149

Other

Other2018 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2018
国/地域Canada
CityCalgary
Period18/4/1518/4/20

ASJC Scopus subject areas

  • ソフトウェア
  • 信号処理
  • 電子工学および電気工学

フィンガープリント

「End-to-End Multi-Speaker Speech Recognition」の研究トピックを掘り下げます。これらがまとまってユニークなフィンガープリントを構成します。

引用スタイル