Speech Enhancement Using Self-Adaptation and Multi-Head Self-Attention

Yuma Koizumi, Kohei Yatabe, Marc Delcroix, Yoshiki Masuyama, Daiki Takeuchi

研究成果: Conference contribution

15 被引用数 (Scopus)

抄録

This paper investigates a self-adaptation method for speech enhancement using auxiliary speaker-aware features; we extract a speaker representation used for adaptation directly from the test utterance. Conventional studies of deep neural network (DNN)-based speech enhancement mainly focus on building a speaker independent model. Meanwhile, in speech applications including speech recognition and synthesis, it is known that model adaptation to the target speaker improves the accuracy. Our research question is whether a DNN for speech enhancement can be adopted to unknown speakers without any auxiliary guidance signal in test-phase. To achieve this, we adopt multi-task learning of speech enhancement and speaker identification, and use the output of the final hidden layer of speaker identification branch as an auxiliary feature. In addition, we use multi-head self-attention for capturing long-term dependencies in the speech and noise. Experimental results on a public dataset show that our strategy achieves the state-of-the-art performance and also outperform conventional methods in terms of subjective quality.

本文言語English
ホスト出版物のタイトル2020 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2020 - Proceedings
出版社Institute of Electrical and Electronics Engineers Inc.
ページ181-185
ページ数5
ISBN(電子版)9781509066315
DOI
出版ステータスPublished - 2020 5
イベント2020 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2020 - Barcelona, Spain
継続期間: 2020 5 42020 5 8

出版物シリーズ

名前ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
2020-May
ISSN(印刷版)1520-6149

Conference

Conference2020 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2020
国/地域Spain
CityBarcelona
Period20/5/420/5/8

ASJC Scopus subject areas

  • ソフトウェア
  • 信号処理
  • 電子工学および電気工学

フィンガープリント

「Speech Enhancement Using Self-Adaptation and Multi-Head Self-Attention」の研究トピックを掘り下げます。これらがまとまってユニークなフィンガープリントを構成します。

引用スタイル