Crowdsourcing for evaluating machine translation quality

Shinsuke Goto, Donghui Lin, Toru Ishida

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

2 Citations (Scopus)

Abstract

The recent popularity of machine translation has increased the demand for the evaluation of translations. However, the traditional evaluation approach, manual checking by a bilingual professional, is too expensive and too slow. In this study, we confirm the feasibility of crowdsourcing by analyzing the accuracy of crowdsourced translation evaluations. We compare crowdsourcing scores to professional scores with regard to three metrics: translation-score, sentence-score, and system-score. A Chinese-to-English translation evaluation task was designed around the NTCIR-9 PATENT parallel corpus, with the goal being 5-point evaluations of adequacy and fluency. The experiment shows that the average score of crowdsourcing workers matches the professional evaluation results well. The system-score comparison strongly indicates that crowdsourcing can be used to find the best translation system given an input of 10 source sentences.
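
To make the three metrics concrete, the sketch below shows one plausible way to aggregate crowdsourced 5-point adequacy/fluency ratings into translation-scores and system-scores and compare them with professional judgments. It is a minimal illustration, not the authors' code: the per-translation averaging, the Pearson-correlation comparison, the data layout, and all names (translation_score, system_score, the example systems and ratings) are assumptions made for illustration; sentence-score is omitted because the abstract does not spell out its exact definition.

# Minimal sketch (Python), not the authors' implementation.
from statistics import mean

def translation_score(ratings):
    """Average of the 5-point worker ratings one translation received (assumed definition)."""
    return mean(ratings)

def system_score(per_sentence_ratings):
    """Average translation-score over all source sentences of one system (assumed definition)."""
    return mean(translation_score(r) for r in per_sentence_ratings)

def pearson(xs, ys):
    """Pearson correlation between crowd-based and professional score lists."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

if __name__ == "__main__":
    # Hypothetical 5-point ratings: system -> one list of worker ratings per source sentence.
    crowd = {
        "system_A": [[4, 5, 4], [3, 4, 4], [5, 5, 4]],
        "system_B": [[2, 3, 3], [3, 3, 2], [4, 3, 3]],
        "system_C": [[3, 4, 3], [4, 4, 3], [3, 3, 4]],
    }
    professional = {"system_A": 4.3, "system_B": 2.8, "system_C": 3.5}  # hypothetical scores

    crowd_scores = {s: system_score(r) for s, r in crowd.items()}
    systems = sorted(crowd)
    print("crowd system-scores:", {s: round(v, 2) for s, v in crowd_scores.items()})
    print("best system (crowd):", max(crowd_scores, key=crowd_scores.get))
    print("Pearson vs. professionals:",
          round(pearson([crowd_scores[s] for s in systems],
                        [professional[s] for s in systems]), 3))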

Original language: English
Title of host publication: Proceedings of the 9th International Conference on Language Resources and Evaluation, LREC 2014
Editors: Nicoletta Calzolari, Khalid Choukri, Sara Goggi, Thierry Declerck, Joseph Mariani, Bente Maegaard, Asuncion Moreno, Jan Odijk, Helene Mazo, Stelios Piperidis, Hrafn Loftsson
Publisher: European Language Resources Association (ELRA)
Pages: 3456-3463
Number of pages: 8
ISBN (Electronic): 9782951740884
Publication status: Published - 2014 Jan 1
Externally published: Yes
Event: 9th International Conference on Language Resources and Evaluation, LREC 2014 - Reykjavik, Iceland
Duration: 2014 May 26 - 2014 May 31

Publication series

Name: Proceedings of the 9th International Conference on Language Resources and Evaluation, LREC 2014

Other

Other: 9th International Conference on Language Resources and Evaluation, LREC 2014
Country: Iceland
City: Reykjavik
Period: 14/5/26 - 14/5/31

Keywords

  • Crowdsourcing
  • Evaluation
  • Machine translation

ASJC Scopus subject areas

  • Linguistics and Language
  • Library and Information Sciences
  • Education
  • Language and Linguistics

Cite this

Goto, S., Lin, D., & Ishida, T. (2014). Crowdsourcing for evaluating machine translation quality. In N. Calzolari, K. Choukri, S. Goggi, T. Declerck, J. Mariani, B. Maegaard, A. Moreno, J. Odijk, H. Mazo, S. Piperidis, ... H. Loftsson (Eds.), Proceedings of the 9th International Conference on Language Resources and Evaluation, LREC 2014 (pp. 3456-3463). (Proceedings of the 9th International Conference on Language Resources and Evaluation, LREC 2014). European Language Resources Association (ELRA).
