Automatic detection and semi-automatic revision of non-machine-translatable parts of a sentence

Kiyotaka Uchimoto, Naoko Hayashida, Toru Ishida, Hitoshi Isahara

Research output: Contribution to conference (Paper)

3 Citations (Scopus)

Abstract

We developed a method for automatically distinguishing the machine-translatable and non-machine-translatable parts of a given sentence for a particular machine translation (MT) system. The two kinds of parts can be distinguished by calculating, for each part of the sentence, the similarity between the source-language sentence and its back translation. Parts with low similarity are highly likely to be non-machine-translatable. We showed that the parts of a sentence that are automatically identified as non-machine-translatable provide useful information for paraphrasing or revising the sentence in the source language to improve the quality of the translation by the MT system. We also developed a method of providing knowledge useful for effectively paraphrasing or revising the detected non-machine-translatable parts. Two types of knowledge were extracted from the EDR dictionary: one for transforming a lexical entry into an expression used in its definition, and the other for the reverse paraphrasing, which transforms an expression found in a definition into the lexical entry. We found that the information provided by the methods helped improve the machine translatability of the originally input sentences.
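The back-translation comparison described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the similarity measure or how a sentence is split into parts, so the sketch assumes whitespace tokens, fixed-size token windows as "parts", and token alignment via Python's standard `difflib.SequenceMatcher`. The function names, the `segment_size` and `threshold` parameters, and the example sentences are all hypothetical, and the back translation is passed in as a string rather than produced by an MT system.

```python
from difflib import SequenceMatcher


def segment_similarities(source, back_translation, segment_size=3):
    """Score each fixed-size token window of `source` by the fraction of its
    tokens that align with `back_translation` (assumed similarity measure)."""
    src = source.lower().split()
    back = back_translation.lower().split()

    # Align the two token sequences and mark which source tokens survive.
    matcher = SequenceMatcher(a=src, b=back, autojunk=False)
    matched = [False] * len(src)
    for block in matcher.get_matching_blocks():
        for i in range(block.a, block.a + block.size):
            matched[i] = True

    # Score each window: share of its tokens found in the back translation.
    results = []
    for start in range(0, len(src), segment_size):
        segment = src[start:start + segment_size]
        score = sum(matched[start:start + len(segment)]) / len(segment)
        results.append((" ".join(segment), score))
    return results


def low_similarity_parts(source, back_translation, threshold=0.5, segment_size=3):
    """Return the windows whose score falls below `threshold`; these are the
    candidates for non-machine-translatable parts under the assumptions above."""
    scored = segment_similarities(source, back_translation, segment_size)
    return [segment for segment, score in scored if score < threshold]


if __name__ == "__main__":
    # Hypothetical example: "table" is mistranslated and lost on the round trip.
    source = "the committee will table the motion tomorrow"
    back = "the committee will put the motion on a desk tomorrow"
    print(low_similarity_parts(source, back, threshold=0.5, segment_size=1))
```

Parts flagged this way would then be the ones a user is asked to paraphrase or revise in the source language. A real system would likely need a linguistically motivated segmentation and a similarity measure that tolerates synonyms, neither of which this sketch attempts.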

Original language: English
Pages: 703-708
Number of pages: 6
Publication status: Published - 2006 Jan 1
Externally published: Yes
Event: 5th International Conference on Language Resources and Evaluation, LREC 2006 - Genoa, Italy
Duration: 2006 May 22 to 2006 May 28

Fingerprint

Language
Dictionary
Paraphrasing
Machine Translation System
Source Language
Lexical Entries
Conducting
Translatability

ASJC Scopus subject areas

  • Education
  • Library and Information Sciences
  • Linguistics and Language
  • Language and Linguistics

Cite this

Uchimoto, K., Hayashida, N., Ishida, T., & Isahara, H. (2006). Automatic detection and semi-automatic revision of non-machine-translatable parts of a sentence. 703-708. Paper presented at 5th International Conference on Language Resources and Evaluation, LREC 2006, Genoa, Italy.
