HSSA tree structures for BTG-based preordering in machine translation

Yujia Zhang, Hao Wang, Yves Lepage

Research output: Conference contribution (Chapter in Book/Report/Conference proceeding)

Abstract

The Hierarchical Sub-Sentential Alignment (HSSA) method obtains aligned binary tree structures for two sentences in translation correspondence. We propose to use the binary aligned tree structures produced by this method as training data for preordering prior to machine translation. To do so, we learn a Bracketing Transduction Grammar (BTG) from these binary aligned tree structures. In two oracle experiments, English to Japanese and Japanese to English translation, we show that preordering the source sentences with the learnt preordering model and using a distortion limit of 0 could theoretically outperform a baseline system with the default distortion limit of 6 by about 2.5 and 5 BLEU points, and by about 7 and 10 RIBES points, respectively. An attempt at learning a preordering model and its results are also reported.
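To illustrate what BTG-based preordering does with a binary tree, here is a minimal sketch. It assumes a tree whose internal nodes are labelled either straight (keep child order) or inverted (swap children); the class and function names (`Leaf`, `Node`, `reorder`) and the toy English sentence are hypothetical and do not reproduce the authors' learnt model. Reading the leaves while swapping the children of every inverted node yields a source sentence in target-like word order.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class Leaf:
    word: str  # a single source-language token


@dataclass
class Node:
    inverted: bool  # hypothetical label: True = inverted (swap children), False = straight
    left: "Tree"
    right: "Tree"


Tree = Union[Leaf, Node]


def reorder(tree: Tree) -> List[str]:
    """Return the source tokens in the order implied by the straight/inverted labels."""
    if isinstance(tree, Leaf):
        return [tree.word]
    left = reorder(tree.left)
    right = reorder(tree.right)
    # An inverted node emits its right child before its left child.
    return right + left if tree.inverted else left + right


# Toy example: an inverted node moves the English verb after its object (SOV, as in Japanese).
tree = Node(False, Leaf("John"),
            Node(True, Leaf("ate"),
                 Node(False, Leaf("an"), Leaf("apple"))))
print(" ".join(reorder(tree)))  # → John an apple ate
```

Once the source is reordered this way, the decoder can translate nearly monotonically, which is why the abstract pairs preordering with a distortion limit of 0.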

Original language: English
Title of host publication: Proceedings of the 30th Pacific Asia Conference on Language, Information and Computation, PACLIC 2016
Publisher: Institute for the Study of Language and Information
Pages: 123-132
Number of pages: 10
ISBN (Electronic): 9788968174285
Publication status: Published - 2016
Event: 30th Pacific Asia Conference on Language, Information and Computation, PACLIC 2016, Seoul, Korea, Republic of
Duration: 2016 Oct 28 to 2016 Oct 30


ASJC Scopus subject areas

  • Language and Linguistics
  • Computer Science (miscellaneous)
  • Information Systems

Citation

Zhang, Y., Wang, H., & Lepage, Y. (2016). HSSA tree structures for BTG-based preordering in machine translation. In Proceedings of the 30th Pacific Asia Conference on Language, Information and Computation, PACLIC 2016 (pp. 123-132). Institute for the Study of Language and Information.