Automatic grader of MT outputs in colloquial style by using multiple edit distances

Yasuhiro Akiba, Kenji Imamura, Eiichiro Sumita, Hiromi Nakaiwa, Seiichi Yamamoto, Hiroshi G. Okuno

Research output: Contribution to journal › Article

Abstract

This paper addresses the challenging problem of automating the human ability to evaluate output from machine translation (MT) systems, which are subsystems of Speech-to-Speech MT (SSMT) systems. Conventional automatic MT evaluation methods include BLEU, which MT researchers have used frequently. BLEU is unsuitable for SSMT evaluation for two reasons. First, BLEU penalizes errors lightly at the beginning or end of a translation and heavily in the middle, although the penalty should be independent of position. Second, BLEU is intolerant of colloquial sentences with small errors, even though such errors do not prevent a conversation from continuing. In this paper, the authors report a new evaluation method called RED that automatically grades each MT output by using a decision tree (DT). The DT is learned from training examples that are encoded by using multiple edit distances together with their grades. The multiple edit distances are the normal edit distance (ED), defined by insertion, deletion, and replacement, as well as extensions of ED. Using multiple edit distances allows more tolerance than either ED or BLEU alone. Each evaluated MT output is assigned a grade by the DT. RED and BLEU were compared on the task of evaluating SSMT systems of varying performance on a spoken-language corpus, ATR's Basic Travel Expression Corpus (BTEC). Experimental results showed that RED significantly outperformed BLEU.
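The idea can be pictured with a short sketch (illustrative only, not the authors' implementation): compute a word-level edit distance against each of several reference translations, normalize by reference length and keep the best score (as in mWER), then feed several such distance variants as features to a decision tree that predicts a grade. The feature encoding, the lowercasing "tolerant variant", and the toy grades below are assumptions for illustration; RED's actual ED extensions and grading scheme are defined in the paper.

# Minimal sketch of the idea behind RED (not the authors' code): score an MT
# output against multiple references with edit-distance variants, then let a
# learned decision tree map the scores to a grade.

from sklearn.tree import DecisionTreeClassifier


def edit_distance(hyp, ref):
    """Word-level edit distance with insertion, deletion, and replacement."""
    h, r = hyp.split(), ref.split()
    d = [[0] * (len(r) + 1) for _ in range(len(h) + 1)]
    for i in range(len(h) + 1):
        d[i][0] = i
    for j in range(len(r) + 1):
        d[0][j] = j
    for i in range(1, len(h) + 1):
        for j in range(1, len(r) + 1):
            cost = 0 if h[i - 1] == r[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # replacement
    return d[len(h)][len(r)]


def mwer(hyp, refs):
    """Multi-reference word error rate: best length-normalized ED."""
    return min(edit_distance(hyp, r) / max(len(r.split()), 1) for r in refs)


def encode(hyp, refs):
    """One feature per distance variant; lowercasing stands in for one of
    the paper's more tolerant ED extensions (an assumption here)."""
    return [mwer(hyp, refs), mwer(hyp.lower(), [r.lower() for r in refs])]


refs = ["Could you take our picture", "Would you take a picture of us"]
train = [("Could you take our picture", "A"),   # toy graded examples
         ("you take picture us please", "C")]
X = [encode(h, refs) for h, _ in train]
clf = DecisionTreeClassifier().fit(X, [g for _, g in train])
print(clf.predict([encode("Would you take our picture", refs)]))

In this toy run, the tree learned from two graded examples assigns the held-out output the grade of its nearest distance profile; the paper's DT is trained on many human-graded examples in the same spirit.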

Original language: English
Pages (from-to): 139-148
Number of pages: 10
Journal: Transactions of the Japanese Society for Artificial Intelligence
ISSN: 1346-0714
Publisher: Japanese Society for Artificial Intelligence
Volume: 20
Issue number: 3
DOI: 10.1527/tjsai.20.139
Publication status: Published - 2005
Externally published: Yes


Keywords

  • BLEU
  • Decision tree
  • Edit distances
  • Machine translation evaluation
  • mWER
  • Reference translations

ASJC Scopus subject areas

  • Artificial Intelligence

Cite this

Akiba, Y., Imamura, K., Sumita, E., Nakaiwa, H., Yamamoto, S., & Okuno, H. G. (2005). Automatic grader of MT outputs in colloquial style by using multiple edit distances. Transactions of the Japanese Society for Artificial Intelligence, 20(3), 139-148. https://doi.org/10.1527/tjsai.20.139

