Extraction of Potentially Useful Phrase Pairs for Statistical Machine Translation

Juan Luo, Yves Lepage

Research output: Contribution to journal › Article

Abstract

Over the last decade, a growing body of work has advanced the phrase-based statistical machine translation model, in which phrase pairs are extracted in two steps: word alignment followed by phrase extraction. In this paper, we show that, for Japanese-English and Chinese-English statistical machine translation systems, this method misses potentially useful phrase pairs that could lead to better translation scores. These potentially useful phrase pairs can be detected by inspecting the segmentation traces produced after decoding. We treat the extraction of potentially useful phrase pairs as a two-class classification problem: among all possible phrase pairs, distinguish the useful ones from the not-useful ones. As with any classification problem, the question is which features are the most relevant. Extracting potentially useful phrase pairs yielded a statistically significant improvement of 7.65 BLEU points in English-Chinese and 7.61 BLEU points in Chinese-English experiments. A slight increase of 0.94 BLEU points and 0.4 BLEU points was also observed for the English-Japanese and Japanese-English systems, respectively.
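
The two-class formulation described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not name the classifier or the features, so the logistic-regression model, the two feature names (`length_ratio`, `lexical_score`) and the toy values below are all assumptions chosen only to show the shape of the problem.

```python
import math

def train_logistic(samples, labels, epochs=200, lr=0.5):
    """Fit a logistic-regression classifier with plain stochastic gradient descent."""
    weights = [0.0] * len(samples[0])
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            z = bias + sum(w * v for w, v in zip(weights, x))
            p = 1.0 / (1.0 + math.exp(-z))   # predicted probability of "useful"
            err = p - y                      # gradient of the log-loss w.r.t. z
            weights = [w - lr * err * v for w, v in zip(weights, x)]
            bias -= lr * err
    return weights, bias

def predict(weights, bias, x):
    """Return 1 (useful) or 0 (not useful) for one feature vector."""
    z = bias + sum(w * v for w, v in zip(weights, x))
    return 1 if z > 0 else 0

# Hypothetical training data: each candidate phrase pair is described by
# [length_ratio, lexical_score]. Both features and all values are invented
# for illustration; the paper's actual feature set is not given here.
train_features = [
    [0.9, 0.8], [0.8, 0.9], [1.0, 0.7],   # labeled useful (1)
    [0.2, 0.1], [0.3, 0.2], [0.1, 0.3],   # labeled not useful (0)
]
train_labels = [1, 1, 1, 0, 0, 0]

weights, bias = train_logistic(train_features, train_labels)

# Classify two unseen (again hypothetical) candidate phrase pairs.
candidates = {"pair_a": [0.95, 0.85], "pair_b": [0.15, 0.15]}
decisions = {name: predict(weights, bias, feats) for name, feats in candidates.items()}
print(decisions)
```

In a setting like the one the abstract describes, the labels would presumably come from the segmentation traces (phrase pairs actually used during decoding), and the candidates classified as useful would be added to the phrase table; both steps are outside this sketch.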

Original language: English
Pages (from-to): 344-352
Number of pages: 9
Journal: Journal of Information Processing
Volume: 23
Issue number: 3
DOI: 10.2197/ipsjjip.23.344
Publication status: Published - 2015

Keywords

  • Classification model
  • Phrase table
  • Statistical machine translation

ASJC Scopus subject areas

  • Computer Science (all)

Cite this

Extraction of Potentially Useful Phrase Pairs for Statistical Machine Translation. / Luo, Juan; Lepage, Yves.

In: Journal of Information Processing, Vol. 23, No. 3, 2015, p. 344-352.

@article{4202c859ea2e45bb9e7178b05d06323a,
title = "Extraction of Potentially Useful Phrase Pairs for Statistical Machine Translation",
keywords = "Classification model, Phrase table, Statistical machine translation",
author = "Juan Luo and Yves Lepage",
year = "2015",
doi = "10.2197/ipsjjip.23.344",
language = "English",
volume = "23",
pages = "344--352",
journal = "Journal of Information Processing",
issn = "0387-5806",
publisher = "Information Processing Society of Japan",
number = "3",
}

TY - JOUR

T1 - Extraction of Potentially Useful Phrase Pairs for Statistical Machine Translation

AU - Luo, Juan

AU - Lepage, Yves

PY - 2015

Y1 - 2015

KW - Classification model

KW - Phrase table

KW - Statistical machine translation

UR - http://www.scopus.com/inward/record.url?scp=84929393599&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84929393599&partnerID=8YFLogxK

U2 - 10.2197/ipsjjip.23.344

DO - 10.2197/ipsjjip.23.344

M3 - Article

AN - SCOPUS:84929393599

VL - 23

SP - 344

EP - 352

JO - Journal of Information Processing

JF - Journal of Information Processing

SN - 0387-5806

IS - 3

ER -