Speaker recognition benchmark using the CHiME-5 corpus

Daniel Garcia-Romero, David Snyder, Shinji Watanabe, Gregory Sell, Alan McCree, Daniel Povey, Sanjeev Khudanpur

Research output: Contribution to journal; Conference article; Peer-reviewed

Abstract

In this paper, we introduce a speaker recognition benchmark derived from the publicly available CHiME-5 corpus. Our goal is to foster research that tackles the challenging artifacts introduced by far-field multi-speaker recordings of naturally occurring spoken interactions. The benchmark comprises four tasks that involve enrollment and test conditions with single-speaker and/or multi-speaker recordings. Additionally, it supports performance comparisons between close-talking vs. distant/far-field microphone recordings, and single-microphone vs. microphone-array approaches. We validate the evaluation design with a single-microphone state-of-the-art DNN speaker recognition and diarization system (that we are making publicly available). The results show that the proposed tasks are very challenging, and can be used to quantify the performance gap due to the degradations present in far-field multi-speaker recordings.

Original language: English
Pages (from-to): 1506-1510
Number of pages: 5
Journal: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
Volume: 2019-September
Publication status: Published - 2019
Externally published: Yes
Event: 20th Annual Conference of the International Speech Communication Association: Crossroads of Speech and Language, INTERSPEECH 2019 - Graz, Austria
Duration: 2019 Sep 15 to 2019 Sep 19

Keywords

  • Far-field speech
  • Multi-speaker
  • Robustness
  • Speaker recognition

ASJC Scopus subject areas

  • Language and Linguistics
  • Human-Computer Interaction
  • Signal Processing
  • Software
  • Modelling and Simulation
