Crowdsourcing for evaluating machine translation quality

Shinsuke Goto, Donghui Lin, Toru Ishida

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

3 Citations (Scopus)

Abstract

The recent popularity of machine translation has increased the demand for the evaluation of translations. However, the traditional evaluation approach, manual checking by a bilingual professional, is too expensive and too slow. In this study, we confirm the feasibility of crowdsourcing by analyzing the accuracy of crowdsourced translation evaluations. We compare crowdsourcing scores to professional scores with regard to three metrics: translation-score, sentence-score, and system-score. A Chinese-to-English translation evaluation task was designed around the NTCIR-9 PATENT parallel corpus, with the goal being 5-range evaluations of adequacy and fluency. The experiment shows that the average score of crowdsourcing workers matches professional evaluation results well. The system-score comparison strongly indicates that crowdsourcing can be used to find the best translation system given an input of 10 source sentences.
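
The metrics named in the abstract lend themselves to a simple aggregation, sketched below in Python. This is not the authors' code; it illustrates one plausible reading: average the 5-point crowd ratings per system output to obtain a translation-score, then average those per system to obtain a system-score that can be compared against a professional reference. The data, variable names, and exact metric definitions are illustrative assumptions, not taken from the paper.

```python
# Minimal sketch (assumed aggregation, not the authors' implementation):
# average 5-point adequacy ratings from crowd workers per translation,
# then roll them up to system-level scores for comparison against
# professional ratings. All data below is illustrative.
from statistics import mean
from collections import defaultdict

# Hypothetical records: (system, sentence_id, worker_id, adequacy, fluency)
crowd_ratings = [
    ("system_A", 1, "w1", 4, 5), ("system_A", 1, "w2", 5, 4),
    ("system_B", 1, "w1", 3, 3), ("system_B", 1, "w2", 2, 3),
]
# Hypothetical professional reference scores (adequacy only).
professional = {("system_A", 1): 4.5, ("system_B", 1): 2.5}

# Translation-score (assumed): mean worker rating for one system's output
# of one source sentence.
translation_raw = defaultdict(list)
for system, sent_id, _worker, adequacy, _fluency in crowd_ratings:
    translation_raw[(system, sent_id)].append(adequacy)
translation_scores = {key: mean(vals) for key, vals in translation_raw.items()}

# System-score (assumed): mean of a system's translation-scores over the
# evaluated source sentences; the ranking it induces is what the abstract
# compares against professional judgments.
system_raw = defaultdict(list)
for (system, _sent_id), score in translation_scores.items():
    system_raw[system].append(score)
system_scores = {system: mean(vals) for system, vals in system_raw.items()}

print(translation_scores)  # crowd averages per (system, sentence)
print(system_scores)       # system ranking implied by crowd judgments
print(professional)        # professional reference for comparison
```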

Original language: English
Title of host publication: Proceedings of the 9th International Conference on Language Resources and Evaluation, LREC 2014
Editors: Nicoletta Calzolari, Khalid Choukri, Sara Goggi, Thierry Declerck, Joseph Mariani, Bente Maegaard, Asuncion Moreno, Jan Odijk, Helene Mazo, Stelios Piperidis, Hrafn Loftsson
Publisher: European Language Resources Association (ELRA)
Pages: 3456-3463
Number of pages: 8
ISBN (Electronic): 9782951740884
Publication status: Published - 2014 Jan 1
Externally published: Yes
Event: 9th International Conference on Language Resources and Evaluation, LREC 2014 - Reykjavik, Iceland
Duration: 2014 May 26 - 2014 May 31

Publication series

Name: Proceedings of the 9th International Conference on Language Resources and Evaluation, LREC 2014

Other

Other: 9th International Conference on Language Resources and Evaluation, LREC 2014
Country: Iceland
City: Reykjavik
Period: 14/5/26 - 14/5/31

Keywords

  • Crowdsourcing
  • Evaluation
  • Machine translation

ASJC Scopus subject areas

  • Linguistics and Language
  • Library and Information Sciences
  • Education
  • Language and Linguistics


Cite this

Goto, S., Lin, D., & Ishida, T. (2014). Crowdsourcing for evaluating machine translation quality. In N. Calzolari, K. Choukri, S. Goggi, T. Declerck, J. Mariani, B. Maegaard, A. Moreno, J. Odijk, H. Mazo, S. Piperidis, & H. Loftsson (Eds.), Proceedings of the 9th International Conference on Language Resources and Evaluation, LREC 2014 (pp. 3456-3463). (Proceedings of the 9th International Conference on Language Resources and Evaluation, LREC 2014). European Language Resources Association (ELRA).