TY - JOUR
T1 - The third ‘CHiME’ speech separation and recognition challenge
T2 - Analysis and outcomes
AU - Barker, Jon
AU - Marxer, Ricard
AU - Vincent, Emmanuel
AU - Watanabe, Shinji
N1 - Publisher Copyright: © 2016 Elsevier Ltd
PY - 2017/11
Y1 - 2017/11
AB - This paper presents the design and outcomes of the CHiME-3 challenge, the first open speech recognition evaluation designed to target the increasingly relevant multichannel, mobile-device speech recognition scenario. The paper serves two purposes. First, it provides a definitive reference for the challenge, including full descriptions of the task design, data capture and baseline systems along with a description and evaluation of the 26 systems that were submitted. The best systems re-engineered every stage of the baseline resulting in reductions in word error rate from 33.4% to as low as 5.8%. By comparing across systems, techniques that are essential for strong performance are identified. Second, the paper considers the problem of drawing conclusions from evaluations that use speech directly recorded in noisy environments. The degree of challenge presented by the resulting material is hard to control and hard to fully characterise. We attempt to dissect the various ‘axes of difficulty’ by correlating various estimated signal properties with typical system performance on a per session and per utterance basis. We find strong evidence of a dependence on signal-to-noise ratio and channel quality. Systems are less sensitive to variations in the degree of speaker motion. The paper concludes by discussing the outcomes of CHiME-3 in relation to the design of future mobile speech recognition evaluations.
KW - Microphone array
KW - Noise-robust ASR
KW - ‘CHiME’ challenge
UR - http://www.scopus.com/inward/record.url?scp=85008625955&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85008625955&partnerID=8YFLogxK
U2 - 10.1016/j.csl.2016.10.005
DO - 10.1016/j.csl.2016.10.005
M3 - Article
AN - SCOPUS:85008625955
VL - 46
SP - 605
EP - 626
JO - Computer Speech and Language
JF - Computer Speech and Language
SN - 0885-2308
ER -