Speaker-conditional chain model for speech separation and extraction

Jing Shi*, Jiaming Xu, Yusuke Fujita, Shinji Watanabe, Bo Xu

*Corresponding author for this work

Research output: Conference article › peer-review

8 Citations (Scopus)

Abstract

Speech separation has been extensively explored to tackle the cocktail party problem. However, these studies are still far from having sufficient generalization capability for real scenarios. In this work, we propose a general strategy named the Speaker-Conditional Chain Model to process complex speech recordings. In the proposed method, the model first infers the identities of a variable number of speakers from the observation using a sequence-to-sequence model. It then takes the information from the inferred speakers as conditions to extract their speech sources. Because the speaker information is predicted from the whole observation, the model helps address the problems of conventional speech separation and speaker extraction for multi-round long recordings. Experiments on standard fully overlapped speech separation benchmarks show results comparable to prior studies, while the proposed model adapts better to multi-round long recordings.
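
The abstract describes a two-stage chain: a sequence-to-sequence module first infers a variable number of speaker identities from the whole mixture, and an extraction module then uses each inferred speaker as a condition to recover that speaker's source. The sketch below is only a minimal PyTorch illustration of that idea; the module names, feature dimensions, LSTM-based encoder/decoder, and mask-based extractor are assumptions made for illustration, not the authors' actual architecture.

```python
# Minimal sketch of a speaker-conditional chain: (1) infer speaker embeddings
# one by one from the whole mixture, (2) extract each speaker's source
# conditioned on the corresponding embedding. All sizes are illustrative.
import torch
import torch.nn as nn


class SpeakerInference(nn.Module):
    """Seq2seq-style head: encode the mixture, then emit one speaker
    embedding per decoding step plus a stop logit (variable speaker count)."""

    def __init__(self, feat_dim=80, hidden=256, spk_dim=128, max_speakers=4):
        super().__init__()
        self.encoder = nn.LSTM(feat_dim, hidden, batch_first=True, bidirectional=True)
        self.decoder_cell = nn.LSTMCell(spk_dim, 2 * hidden)
        self.to_spk = nn.Linear(2 * hidden, spk_dim)
        self.to_stop = nn.Linear(2 * hidden, 1)
        self.max_speakers = max_speakers
        self.spk_dim = spk_dim

    def forward(self, mix_feats):                      # (B, T, feat_dim)
        enc, _ = self.encoder(mix_feats)               # (B, T, 2*hidden)
        h = enc.mean(dim=1)                            # summary of the whole mixture
        c = torch.zeros_like(h)
        prev = mix_feats.new_zeros(mix_feats.size(0), self.spk_dim)
        spk_embs, stop_logits = [], []
        for _ in range(self.max_speakers):             # decode speakers one by one
            h, c = self.decoder_cell(prev, (h, c))
            emb = self.to_spk(h)
            spk_embs.append(emb)
            stop_logits.append(self.to_stop(h))
            prev = emb                                  # feed back the last embedding
        return torch.stack(spk_embs, dim=1), torch.cat(stop_logits, dim=1)


class ConditionalExtractor(nn.Module):
    """Mask-based extractor conditioned on a single inferred speaker embedding."""

    def __init__(self, feat_dim=80, hidden=256, spk_dim=128):
        super().__init__()
        self.rnn = nn.LSTM(feat_dim + spk_dim, hidden, batch_first=True)
        self.mask = nn.Sequential(nn.Linear(hidden, feat_dim), nn.Sigmoid())

    def forward(self, mix_feats, spk_emb):             # (B, T, F), (B, spk_dim)
        cond = spk_emb.unsqueeze(1).expand(-1, mix_feats.size(1), -1)
        out, _ = self.rnn(torch.cat([mix_feats, cond], dim=-1))
        return self.mask(out) * mix_feats              # masked feature estimate


if __name__ == "__main__":
    mix = torch.randn(2, 300, 80)                      # batch of mixture features
    infer, extract = SpeakerInference(), ConditionalExtractor()
    embs, stops = infer(mix)                           # (2, 4, 128), (2, 4)
    first_source = extract(mix, embs[:, 0])            # extract the first inferred speaker
    print(first_source.shape)                          # torch.Size([2, 300, 80])
```

In practice the stop logits would decide how many decoding steps to keep, so the same model can handle recordings with different numbers of speakers; that chaining over inferred speakers is the point of the strategy described in the abstract.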

Original language: English
Pages (from-to): 2707-2711
Number of pages: 5
Journal: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
Volume: 2020-October
DOI
Publication status: Published - 2020
Externally published: Yes
Event: 21st Annual Conference of the International Speech Communication Association, INTERSPEECH 2020 - Shanghai, China
Duration: 25 Oct 2020 → 29 Oct 2020

ASJC Scopus subject areas

  • Language and Linguistics
  • Human-Computer Interaction
  • Signal Processing
  • Software
  • Modeling and Simulation
