Statistical machine translation using hierarchical phrase alignment

Taro Watanabe, Kenji Imamura, Eiichiro Sumita, Hiroshi G. Okuno

Research output: Article

Abstract

Statistical machine translation is known to face three problems: (1) the modeling problem of specifying translation relations, (2) the training problem of estimating model parameters from a corpus of translations, and (3) the search problem of determining the output text (the translation) for a given statistical model and input text. In this paper, we align translations hierarchically using phrase-based units, and we use these hierarchical phrase alignments to address the modeling and training problems above. In the first method, we chunk the corpus on the basis of the hierarchical alignments and build translation models that use these chunks as translation units. In the second method, we convert the translation relations expressed in the hierarchical phrase alignments into correspondences in the translation model, initialize the model parameters with values obtained from these relations, and then perform additional training. Experiments on Japanese-to-English translation show that both methods improve performance. The second method is particularly effective, raising the translation rate from 61.3% to 70.0%.
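As a rough illustration of the second method (initializing model parameters from phrase-alignment correspondences, then continuing training with EM), the sketch below seeds a word translation table t(e|f) from aligned phrase pairs and then re-estimates it. It is a minimal sketch under assumptions not taken from the paper: IBM Model 1 stands in for the paper's translation model, the romanized Japanese-English sentences and the `phrase_pairs` list are hypothetical (imagined as the output of a hierarchical phrase aligner), and `init_from_phrase_pairs` and `model1_em` are illustrative helper names, not the authors' code.

```python
from collections import defaultdict

NULL = "NULL"  # empty source word, as in IBM Model 1


def init_from_phrase_pairs(corpus, phrase_pairs, smoothing=1e-3):
    """Seed t(e|f) from word co-occurrences inside aligned phrase pairs.

    Hypothetical stand-in for converting phrase alignments into model
    correspondences. Smoothing keeps every pair non-zero so EM can revise it.
    """
    f_vocab = {NULL} | {f for f_sent, _ in corpus for f in f_sent}
    e_vocab = {e for _, e_sent in corpus for e in e_sent}
    seed = defaultdict(float)
    for f_phrase, e_phrase in phrase_pairs:
        for f in f_phrase:
            for e in e_phrase:
                seed[(e, f)] += 1.0
    t = {}
    for f in f_vocab:
        denom = sum(seed[(e, f)] for e in e_vocab) + smoothing * len(e_vocab)
        for e in e_vocab:
            t[(e, f)] = (seed[(e, f)] + smoothing) / denom
    return t


def model1_em(corpus, t, iterations=5):
    """Standard IBM Model 1 EM re-estimation of t(e|f), starting from t."""
    for _ in range(iterations):
        count = defaultdict(float)  # expected counts c(e, f)
        total = defaultdict(float)  # expected counts c(f)
        for f_sent, e_sent in corpus:
            f_words = [NULL] + f_sent
            for e in e_sent:
                z = sum(t[(e, f)] for f in f_words)  # normalizer over alignments of e
                for f in f_words:
                    c = t[(e, f)] / z
                    count[(e, f)] += c
                    total[f] += c
        # M-step: renormalize expected counts per source word f
        t = {(e, f): c / total[f] for (e, f), c in count.items()}
    return t


# Hypothetical toy corpus (romanized Japanese, English)
corpus = [
    (["kore", "wa", "pen", "desu"], ["this", "is", "a", "pen"]),
    (["kore", "wa", "hon", "desu"], ["this", "is", "a", "book"]),
    (["sore", "wa", "hon", "desu"], ["that", "is", "a", "book"]),
]
# Hypothetical phrase pairs, imagined as output of a phrase aligner
phrase_pairs = [
    (["kore"], ["this"]),
    (["pen"], ["a", "pen"]),
    (["hon", "desu"], ["is", "a", "book"]),
]

t0 = init_from_phrase_pairs(corpus, phrase_pairs)
t = model1_em(corpus, t0, iterations=5)
print(round(t[("pen", "pen")], 3), round(t[("this", "pen")], 3))
```

The design point is that EM starts from the seeded table instead of a uniform one; because every pair keeps a small non-zero probability, training can still correct correspondences that the phrase pairs got wrong.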

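Original language: English
Pages (from-to): 70-79
Number of pages: 10
Journal: Systems and Computers in Japan
Volume: 38
Issue number: 6
DOI: 10.1002/scj.20271
Publication status: Published - 15 June 2007
Externally published: Yes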

ASJC Scopus subject areas

  • Hardware and Architecture
  • Information Systems
  • Theoretical Computer Science
  • Computational Theory and Mathematics

Cite this

Statistical machine translation using hierarchical phrase alignment. / Watanabe, Taro; Imamura, Kenji; Sumita, Eiichiro; Okuno, Hiroshi G.

In: Systems and Computers in Japan, Vol. 38, No. 6, 15.06.2007, p. 70-79.

Watanabe, Taro ; Imamura, Kenji ; Sumita, Eiichiro ; Okuno, Hiroshi G. / Statistical machine translation using hierarchical phrase alignment. In: Systems and Computers in Japan. 2007 ; Vol. 38, No. 6. pp. 70-79.

@article{7e51aaf3d597482284b3f9e4f969bcf5,
title = "Statistical machine translation using hierarchical phrase alignment",
keywords = "EM algorithm, Phrase alignments, Statistical machine translation",
author = "Taro Watanabe and Kenji Imamura and Eiichiro Sumita and Okuno, {Hiroshi G.}",
year = "2007",
month = "6",
day = "15",
doi = "10.1002/scj.20271",
language = "English",
volume = "38",
pages = "70--79",
journal = "Systems and Computers in Japan",
issn = "0882-1666",
publisher = "John Wiley and Sons Inc.",
number = "6",

}

TY - JOUR

T1 - Statistical machine translation using hierarchical phrase alignment

AU - Watanabe, Taro

AU - Imamura, Kenji

AU - Sumita, Eiichiro

AU - Okuno, Hiroshi G.

PY - 2007/6/15

Y1 - 2007/6/15

KW - EM algorithm

KW - Phrase alignments

KW - Statistical machine translation

UR - http://www.scopus.com/inward/record.url?scp=34248143419&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=34248143419&partnerID=8YFLogxK

U2 - 10.1002/scj.20271

DO - 10.1002/scj.20271

M3 - Article

AN - SCOPUS:34248143419

VL - 38

SP - 70

EP - 79

JO - Systems and Computers in Japan

JF - Systems and Computers in Japan

SN - 0882-1666

IS - 6

ER -