The fifth 'CHiME' speech separation and recognition challenge

Dataset, task and baselines

Jon Barker, Shinji Watanabe, Emmanuel Vincent, Jan Trmal

Research output: Contribution to journal › Conference article

20 Citations (Scopus)

Abstract

The CHiME challenge series aims to advance robust automatic speech recognition (ASR) technology by promoting research at the interface of speech and language processing, signal processing, and machine learning. This paper introduces the 5th CHiME Challenge, which considers the task of distant multi-microphone conversational ASR in real home environments. Speech material was elicited using a dinner party scenario with efforts taken to capture data that is representative of natural conversational speech and recorded by 6 Kinect microphone arrays and 4 binaural microphone pairs. The challenge features a single-array track and a multiple-array track and, for each track, distinct rankings will be produced for systems focusing on robustness with respect to distant-microphone capture vs. systems attempting to address all aspects of the task including conversational language modeling. We discuss the rationale for the challenge and provide a detailed description of the data collection procedure, the task, and the baseline systems for array synchronization, speech enhancement, and conventional and end-to-end ASR.

Original language: English
Pages (from-to): 1561-1565
Number of pages: 5
Journal: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
Volume: 2018-September
DOI: 10.21437/Interspeech.2018-1768
Publication status: Published - 2018 Jan 1
Externally published: Yes
Event: 19th Annual Conference of the International Speech Communication Association, INTERSPEECH 2018 - Hyderabad, India
Duration: 2018 Sep 2 to 2018 Sep 6

Fingerprint

Microphones
Automatic Speech Recognition
Baseline
Speech recognition
Microphone Array
Speech Enhancement
Language Modeling
Learning systems
Signal Processing
Data acquisition
Ranking
Machine Learning
Synchronization
Robustness
Distinct
Scenarios
Series
Speech

Keywords

  • 'CHiME' challenge
  • Conversational speech
  • Microphone array
  • Noise
  • Reverberation
  • Robust ASR

ASJC Scopus subject areas

  • Language and Linguistics
  • Human-Computer Interaction
  • Signal Processing
  • Software
  • Modelling and Simulation

Cite this

The fifth 'CHiME' speech separation and recognition challenge: Dataset, task and baselines. / Barker, Jon; Watanabe, Shinji; Vincent, Emmanuel; Trmal, Jan.

In: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH, Vol. 2018-September, 01.01.2018, p. 1561-1565.


@article{0f8225e1c6ee4ab0819415ec1a80c378,
title = "The fifth '{CHiME}' speech separation and recognition challenge: Dataset, task and baselines",
abstract = "The CHiME challenge series aims to advance robust automatic speech recognition (ASR) technology by promoting research at the interface of speech and language processing, signal processing, and machine learning. This paper introduces the 5th CHiME Challenge, which considers the task of distant multi-microphone conversational ASR in real home environments. Speech material was elicited using a dinner party scenario with efforts taken to capture data that is representative of natural conversational speech and recorded by 6 Kinect microphone arrays and 4 binaural microphone pairs. The challenge features a single-array track and a multiple-array track and, for each track, distinct rankings will be produced for systems focusing on robustness with respect to distant-microphone capture vs. systems attempting to address all aspects of the task including conversational language modeling. We discuss the rationale for the challenge and provide a detailed description of the data collection procedure, the task, and the baseline systems for array synchronization, speech enhancement, and conventional and end-to-end ASR.",
keywords = "'CHiME' challenge, Conversational speech, Microphone array, Noise, Reverberation, Robust ASR",
author = "Jon Barker and Shinji Watanabe and Emmanuel Vincent and Jan Trmal",
year = "2018",
month = "1",
day = "1",
doi = "10.21437/Interspeech.2018-1768",
language = "English",
volume = "2018-September",
pages = "1561--1565",
journal = "Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH",
issn = "2308-457X",
}

TY - JOUR
T1 - The fifth 'CHiME' speech separation and recognition challenge
T2 - Dataset, task and baselines
AU - Barker, Jon
AU - Watanabe, Shinji
AU - Vincent, Emmanuel
AU - Trmal, Jan
PY - 2018/1/1
Y1 - 2018/1/1
AB - The CHiME challenge series aims to advance robust automatic speech recognition (ASR) technology by promoting research at the interface of speech and language processing, signal processing, and machine learning. This paper introduces the 5th CHiME Challenge, which considers the task of distant multi-microphone conversational ASR in real home environments. Speech material was elicited using a dinner party scenario with efforts taken to capture data that is representative of natural conversational speech and recorded by 6 Kinect microphone arrays and 4 binaural microphone pairs. The challenge features a single-array track and a multiple-array track and, for each track, distinct rankings will be produced for systems focusing on robustness with respect to distant-microphone capture vs. systems attempting to address all aspects of the task including conversational language modeling. We discuss the rationale for the challenge and provide a detailed description of the data collection procedure, the task, and the baseline systems for array synchronization, speech enhancement, and conventional and end-to-end ASR.

KW - 'CHiME' challenge
KW - Conversational speech
KW - Microphone array
KW - Noise
KW - Reverberation
KW - Robust ASR
UR - http://www.scopus.com/inward/record.url?scp=85054986374&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85054986374&partnerID=8YFLogxK
U2 - 10.21437/Interspeech.2018-1768
DO - 10.21437/Interspeech.2018-1768
M3 - Conference article
VL - 2018-September
SP - 1561
EP - 1565
JO - Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
JF - Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
SN - 2308-457X
ER -