A purely end-to-end system for multi-speaker speech recognition

Hiroshi Seki, Takaaki Hori, Shinji Watanabe, Jonathan Le Roux, John R. Hershey

Research output: Chapter in Book/Report/Conference proceedingConference contribution

50 Citations (Scopus)

Abstract

Recently, there has been growing interest in multi-speaker speech recognition, where the utterances of multiple speakers are recognized from their mixture. Promising techniques have been proposed for this task, but earlier works have required additional training data such as isolated source signals or senone alignments for effective learning. In this paper, we propose a new sequence-to-sequence framework to directly decode multiple label sequences from a single speech sequence by unifying source separation and speech recognition functions in an end-to-end manner. We further propose a new objective function to improve the contrast between the hidden vectors to avoid generating similar hypotheses. Experimental results show that the model is directly able to learn a mapping from a speech mixture to multiple label sequences, achieving 83.1% relative improvement compared to a model trained without the proposed objective. Interestingly, the results are comparable to those produced by previous end-to-end works featuring explicit separation and recognition modules.

Original languageEnglish
Title of host publicationACL 2018 - 56th Annual Meeting of the Association for Computational Linguistics, Proceedings of the Conference (Long Papers)
PublisherAssociation for Computational Linguistics (ACL)
Pages2620-2630
Number of pages11
ISBN (Electronic)9781948087322
DOIs
Publication statusPublished - 2018
Externally publishedYes
Event56th Annual Meeting of the Association for Computational Linguistics, ACL 2018 - Melbourne, Australia
Duration: 2018 Jul 152018 Jul 20

Publication series

NameACL 2018 - 56th Annual Meeting of the Association for Computational Linguistics, Proceedings of the Conference (Long Papers)
Volume1

Conference

Conference56th Annual Meeting of the Association for Computational Linguistics, ACL 2018
Country/TerritoryAustralia
CityMelbourne
Period18/7/1518/7/20

ASJC Scopus subject areas

  • Software
  • Computational Theory and Mathematics

Fingerprint

Dive into the research topics of 'A purely end-to-end system for multi-speaker speech recognition'. Together they form a unique fingerprint.

Cite this