The third 'CHiME' speech separation and recognition challenge: Analysis and outcomes

Jon Barker, Ricard Marxer, Emmanuel Vincent, Shinji Watanabe

Research output: Contribution to journal › Article

25 Citations (Scopus)

Abstract

This paper presents the design and outcomes of the CHiME-3 challenge, the first open speech recognition evaluation designed to target the increasingly relevant multichannel, mobile-device speech recognition scenario. The paper serves two purposes. First, it provides a definitive reference for the challenge, including full descriptions of the task design, data capture and baseline systems, along with a description and evaluation of the 26 systems that were submitted. The best systems re-engineered every stage of the baseline, reducing the word error rate from 33.4% to as low as 5.8%. By comparing across systems, techniques that are essential for strong performance are identified. Second, the paper considers the problem of drawing conclusions from evaluations that use speech recorded directly in noisy environments. The degree of challenge presented by the resulting material is hard to control and hard to fully characterise. We attempt to dissect the various 'axes of difficulty' by correlating estimated signal properties with typical system performance on a per-session and per-utterance basis. We find strong evidence of a dependence on signal-to-noise ratio and channel quality, whereas systems are less sensitive to variations in the degree of speaker motion. The paper concludes by discussing the outcomes of CHiME-3 in relation to the design of future mobile speech recognition evaluations.
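As a rough illustration of the per-utterance analysis mentioned in the abstract (this is not code from the paper), the sketch below computes word error rate for a few hypothetical utterances and correlates it with an assumed per-utterance SNR estimate; all transcripts, SNR values, and variable names are placeholders.

```python
# Illustrative sketch only: correlate a per-utterance signal property
# (here, a hypothetical estimated SNR in dB) with per-utterance word
# error rate, in the spirit of the paper's "axes of difficulty" analysis.
import numpy as np


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Levenshtein word error rate: (substitutions + deletions + insertions) / N."""
    ref, hyp = reference.split(), hypothesis.split()
    # Dynamic-programming table of edit distances between prefixes.
    d = np.zeros((len(ref) + 1, len(hyp) + 1), dtype=int)
    d[:, 0] = np.arange(len(ref) + 1)
    d[0, :] = np.arange(len(hyp) + 1)
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i, j] = min(d[i - 1, j] + 1,          # deletion
                          d[i, j - 1] + 1,          # insertion
                          d[i - 1, j - 1] + cost)   # substitution
    return d[len(ref), len(hyp)] / max(len(ref), 1)


# Hypothetical per-utterance results: (estimated SNR in dB, reference, hypothesis).
results = [
    (12.0, "turn the volume up", "turn the volume up"),
    (3.5,  "read the next message", "reach the next massage"),
    (-1.0, "call the office now", "fall the off"),
]

snrs = np.array([snr for snr, _, _ in results])
wers = np.array([word_error_rate(ref, hyp) for _, ref, hyp in results])

# Pearson correlation between estimated SNR and WER; a negative value
# indicates that lower SNR goes with higher error rate.
r = np.corrcoef(snrs, wers)[0, 1]
print(f"Pearson r(SNR, WER) = {r:.2f}")
```

In the paper itself this kind of correlation is reported per session and per utterance across the submitted systems; the snippet only shows the shape of such a computation on toy data.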

Original language: English
Journal: Computer Speech and Language
DOI: 10.1016/j.csl.2016.10.005
Publication status: Accepted/In press - 25 Apr 2016
Externally published: Yes

Keywords

  • 'CHiME' challenge
  • Microphone array
  • Noise-robust ASR

ASJC Scopus subject areas

  • Theoretical Computer Science
  • Software
  • Human-Computer Interaction

Cite this

The third 'CHiME' speech separation and recognition challenge: Analysis and outcomes. / Barker, Jon; Marxer, Ricard; Vincent, Emmanuel; Watanabe, Shinji.

In: Computer Speech and Language, 25.04.2016.

Research output: Contribution to journal › Article

@article{ea1337bd014148a4a05d5101a3479f56,
title = "The third 'CHiME' speech separation and recognition challenge: Analysis and outcomes",
abstract = "This paper presents the design and outcomes of the CHiME-3 challenge, the first open speech recognition evaluation designed to target the increasingly relevant multichannel, mobile-device speech recognition scenario. The paper serves two purposes. First, it provides a definitive reference for the challenge, including full descriptions of the task design, data capture and baseline systems along with a description and evaluation of the 26 systems that were submitted. The best systems re-engineered every stage of the baseline resulting in reductions in word error rate from 33.4{\%} to as low as 5.8{\%}. By comparing across systems, techniques that are essential for strong performance are identified. Second, the paper considers the problem of drawing conclusions from evaluations that use speech directly recorded in noisy environments. The degree of challenge presented by the resulting material is hard to control and hard to fully characterise. We attempt to dissect the various 'axes of difficulty' by correlating various estimated signal properties with typical system performance on a per session and per utterance basis. We find strong evidence of a dependence on signal-to-noise ratio and channel quality. Systems are less sensitive to variations in the degree of speaker motion. The paper concludes by discussing the outcomes of CHiME-3 in relation to the design of future mobile speech recognition evaluations.",
keywords = "'CHiME' challenge, Microphone array, Noise-robust ASR",
author = "Jon Barker and Ricard Marxer and Emmanuel Vincent and Shinji Watanabe",
year = "2016",
month = "4",
day = "25",
doi = "10.1016/j.csl.2016.10.005",
language = "English",
journal = "Computer Speech and Language",
issn = "0885-2308",
publisher = "Academic Press Inc.",

}
