Sequence to multi-sequence learning via conditional chain mapping for mixture signals

Jing Shi, Xuankai Chang, Pengcheng Guo, Shinji Watanabe*, Yusuke Fujita, Jiaming Xu, Bo Xu, Lei Xie

*Corresponding author for this work

Research output: Contribution to journal > Conference article > peer-review

12 Citations (Scopus)

Abstract

Neural sequence-to-sequence models are well established for applications that can be cast as mapping a single input sequence to a single output sequence. In this work, we focus on one-to-many sequence transduction problems, such as extracting multiple sequential sources from a mixture sequence. We extend the standard sequence-to-sequence model to a conditional multi-sequence model, which explicitly models the relevance between multiple output sequences with the probabilistic chain rule. Based on this extension, our model can conditionally infer output sequences one by one, making use of both the input and the previously estimated contextual output sequences. The model additionally has a simple and efficient stop criterion for the end of the transduction, which allows it to infer a variable number of output sequences. We take speech data as the primary test field for evaluating our methods, since observed speech data is often composed of multiple sources owing to the superposition principle of sound waves. Experiments on several different tasks, including speech separation and multi-speaker speech recognition, show that our conditional multi-sequence models lead to consistent improvements over conventional non-conditional models.
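
The one-by-one inference described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the names `conditional_chain_decode` and `toy_step` are hypothetical, and the toy step function, with its residual-based stop rule, is an assumed stand-in for a trained neural model and for the paper's actual stop criterion. The loop only shows the chain-rule structure, in which each output is inferred from the input mixture together with all previously estimated outputs until the step function signals a stop.

```python
from typing import Callable, List, Optional, Sequence

Seq = List[float]
# A step function maps (mixture, previous outputs) to the next output,
# or to None when it decides that no further sequence should be emitted.
StepFn = Callable[[Seq, Sequence[Seq]], Optional[Seq]]


def conditional_chain_decode(mixture: Seq, step: StepFn, max_outputs: int = 10) -> List[Seq]:
    """Infer output sequences one by one, each conditioned on the mixture
    and on every output estimated so far (chain-rule style factorization)."""
    outputs: List[Seq] = []
    for _ in range(max_outputs):
        nxt = step(mixture, outputs)
        if nxt is None:  # stop signal: the number of outputs is not fixed in advance
            break
        outputs.append(nxt)
    return outputs


def toy_step(mixture: Seq, previous: Sequence[Seq]) -> Optional[Seq]:
    """Hypothetical stand-in for a trained model (assumption, not from the paper).

    It subtracts the previous outputs from the mixture, stops when nothing is
    left, and otherwise returns the residual entries of largest magnitude."""
    residual = [m - sum(p[t] for p in previous) for t, m in enumerate(mixture)]
    if all(abs(r) < 1e-9 for r in residual):
        return None
    peak = max(abs(r) for r in residual)
    return [r if abs(r) == peak else 0.0 for r in residual]


mixture = [3.0, 1.0, 3.0, 2.0]
outputs = conditional_chain_decode(mixture, toy_step)
```

In this toy setting the decoder emits three sequences that sum back to the mixture, without being told the number of sources beforehand. A real system would replace `toy_step` with a learned conditional model.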

Original language: English
Journal: Advances in Neural Information Processing Systems
Volume: 2020-December
Publication status: Published - 2020
Externally published: Yes
Event: 34th Conference on Neural Information Processing Systems, NeurIPS 2020 - Virtual, Online
Duration: 2020 Dec 6 to 2020 Dec 12

ASJC Scopus subject areas

  • Computer Networks and Communications
  • Information Systems
  • Signal Processing
