Abstract
We developed a method for automatically distinguishing the machine-translatable and non-machine-translatable parts of a given sentence for a particular machine translation (MT) system. These parts can be distinguished by calculating the similarity between a source-language sentence and its back translation for each part of the sentence. Parts with low similarity are highly likely to be non-machine-translatable. We showed that the parts of a sentence automatically identified as non-machine-translatable provide useful information for paraphrasing or revising the sentence in the source language to improve the quality of its translation by the MT system. We also developed a method of providing knowledge useful for effectively paraphrasing or revising the detected non-machine-translatable parts. Two types of knowledge were extracted from the EDR dictionary: one for transforming a lexical entry into an expression used in its definition, and the other for the reverse paraphrasing, which transforms an expression found in a definition into the corresponding lexical entry. We found that the information provided by these methods helped improve the machine translatability of the original input sentences.
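The abstract does not specify how a sentence is divided into parts, which similarity measure is used, or how the EDR-derived rules are stored. The Python sketch below therefore rests on stated assumptions: the parts are given in advance, similarity is approximated by character-bigram coverage, the 0.5 threshold is arbitrary, and the dictionary is a hypothetical toy stand-in, not EDR data. All function names (`flag_non_translatable`, `build_paraphrase_rules`, `suggest_paraphrases`) are illustrative, not the authors' implementation.

```python
"""Hypothetical sketch: flag low-similarity parts and look up paraphrase rules.

Assumptions (not from the paper): parts are pre-segmented, similarity is
character-bigram coverage, and the dictionary is a toy stand-in for EDR.
"""


def char_bigrams(text: str) -> set[str]:
    """Lowercased character bigrams of text, ignoring whitespace."""
    s = "".join(text.lower().split())
    return {s[i:i + 2] for i in range(len(s) - 1)}


def part_similarity(part: str, back_translation: str) -> float:
    """Share of the part's bigrams that also occur in the back translation."""
    grams = char_bigrams(part)
    if not grams:
        return 1.0  # too short to judge; treat as translatable
    return len(grams & char_bigrams(back_translation)) / len(grams)


def flag_non_translatable(parts: list[str], back_translation: str,
                          threshold: float = 0.5) -> list[tuple[str, float, bool]]:
    """Return (part, score, flagged) triples; flagged means score < threshold."""
    results = []
    for part in parts:
        score = part_similarity(part, back_translation)
        results.append((part, score, score < threshold))
    return results


def build_paraphrase_rules(dictionary: dict[str, str]) -> tuple[dict[str, str], dict[str, str]]:
    """Forward rules map headword -> definition; reverse rules map definition -> headword."""
    forward = dict(dictionary)
    reverse = {definition: headword for headword, definition in dictionary.items()}
    return forward, reverse


def suggest_paraphrases(part: str, forward: dict[str, str],
                        reverse: dict[str, str]) -> list[str]:
    """Apply every rule whose source string occurs verbatim in the part."""
    suggestions = []
    for source, target in [*forward.items(), *reverse.items()]:
        if source in part:
            suggestions.append(part.replace(source, target))
    return suggestions


if __name__ == "__main__":
    parts = ["the cat", "kicked the bucket", "last night"]
    back_translation = "the cat died last night"  # hypothetical MT round trip
    for part, score, flagged in flag_non_translatable(parts, back_translation):
        print(f"{part!r}: {score:.2f} {'FLAG' if flagged else 'ok'}")
    # Hypothetical toy entry, not taken from the EDR dictionary.
    forward, reverse = build_paraphrase_rules({"kicked the bucket": "died"})
    print(suggest_paraphrases("kicked the bucket", forward, reverse))
```

With these toy inputs, only "kicked the bucket" falls below the threshold (score 0.25), and the forward rule suggests "died" as a rewrite. Rules are matched on surface strings only; a real system would need lemmatization or morphological analysis to match inflected forms.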
Original language | English |
---|---|
Pages | 703-708 |
Number of pages | 6 |
Publication status | Published - 2006 |
Externally published | Yes |
Event | 5th International Conference on Language Resources and Evaluation, LREC 2006 - Genoa, Italy. Duration: 22 May 2006 → 28 May 2006 |
Other
Other | 5th International Conference on Language Resources and Evaluation, LREC 2006 |
---|---|
Country/Territory | Italy |
City | Genoa |
Period | 22 May 2006 → 28 May 2006 |
ASJC Scopus subject areas
- Education
- Library and Information Sciences
- Linguistics and Language
- Language and Linguistics