Frame-Level Phoneme-Invariant Speaker Embedding for Text-Independent Speaker Recognition on Extremely Short Utterances

Naohiro Tawara, Atsunori Ogawa, Tomoharu Iwata, Marc Delcroix, Tetsuji Ogawa

Research output: Conference contribution

1 Citation (Scopus)

Abstract

This paper investigates a phoneme-invariant speaker embedding approach for speaker recognition on extremely short utterances. Intuitively, phonemes are nuisance information for the text-independent speaker recognition task, since the content of the speech is usually mismatched between enrollment and test time. However, many studies have shown that incorporating phoneme information is quite effective in improving the performance of speaker recognition systems. One reasonable explanation for this counter-intuitive result is that the pooling mechanism of segment-based speaker embedding can focus on the specific phonemes that contain rich speaker information, and phoneme information may help it do so. From this insight, we hypothesize that the pooling mechanism and phoneme-aware training are harmful when extracting speaker embeddings from extremely short utterances. To verify this hypothesis, an adversarial framework is introduced to remove phoneme variability from the frame-wise speaker embeddings. The experimental results on the LibriSpeech corpus confirm that our frame-wise, phoneme-adversarial approach outperforms the conventional segment-wise, phoneme-aware approach for short utterances of less than about 1.4 seconds.
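
The abstract does not spell out the adversarial objective, so the following is only a generic sketch of how phoneme variability is commonly removed from frame-wise embeddings, not necessarily the formulation used in the paper. All symbols are assumptions introduced for illustration: $x_t$ is the acoustic feature of frame $t$, $e_t = f_\theta(x_t)$ is the frame-level embedding, $s$ is the speaker label, $p_t$ is the phoneme label of frame $t$, $q_\phi$ and $r_\psi$ are speaker and phoneme classifiers, and $\lambda \ge 0$ is a trade-off weight.

```latex
% Frame-wise speaker and phoneme classification losses (assumed cross-entropy)
\mathcal{L}_{\mathrm{spk}}(\theta, \phi) = -\sum_{t} \log q_\phi\!\left(s \mid f_\theta(x_t)\right),
\qquad
\mathcal{L}_{\mathrm{phn}}(\theta, \psi) = -\sum_{t} \log r_\psi\!\left(p_t \mid f_\theta(x_t)\right)

% Adversarial (saddle-point) training:
% the phoneme classifier tries to predict phonemes from the embedding ...
\hat{\psi} = \arg\min_{\psi} \; \mathcal{L}_{\mathrm{phn}}(\hat{\theta}, \psi)

% ... while the encoder keeps speaker information and makes phonemes unpredictable
(\hat{\theta}, \hat{\phi}) = \arg\min_{\theta, \phi} \;
  \mathcal{L}_{\mathrm{spk}}(\theta, \phi) - \lambda \, \mathcal{L}_{\mathrm{phn}}(\theta, \hat{\psi})
```

Under this sketch, setting $\lambda = 0$ recovers plain speaker training. A phoneme-aware (multi-task) variant would instead add $+\lambda \, \mathcal{L}_{\mathrm{phn}}$ to the encoder objective, which is the contrast the abstract draws.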

Original language: English
Title of host publication: 2020 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2020 - Proceedings
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 6799-6803
Number of pages: 5
ISBN (Electronic): 9781509066315
DOI
Publication status: Published - May 2020
Event: 2020 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2020 - Barcelona, Spain
Duration: 4 May 2020 to 8 May 2020

Publication series

Name: ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
Volume: 2020-May
ISSN (Print): 1520-6149

Conference

Conference: 2020 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2020
Country: Spain
City: Barcelona
Period: 4 May 2020 to 8 May 2020

ASJC Scopus subject areas

  • Software
  • Signal Processing
  • Electrical and Electronic Engineering
