Improving the distribution of N-grams in phrase tables obtained by the sampling-based method

Juan Luo, Adrien Lardilleux, Yves Lepage

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

Abstract

We describe an approach to improve the performance of the sampling-based sub-sentential alignment method on translation tasks by investigating the distribution of n-grams in phrase tables. The approach consists in enforcing the alignment of n-grams. We compare the quality of the phrase translation tables output by this approach with that of the state-of-the-art estimation approach in statistical machine translation tasks. We report significant improvements for this approach and show that merging phrase tables outperforms the state-of-the-art techniques.

Original language: English
Title of host publication: Human Language Technology Challenges for Computer Science and Linguistics - 5th Language and Technology Conference, LTC 2011, Revised Selected Papers
Publisher: Springer Verlag
Pages: 419-431
Number of pages: 13
ISBN (Print): 9783319089577
DOIs: https://doi.org/10.1007/978-3-319-08958-4_34
Publication status: Published - 2014 Jan 1
Event: 5th Language and Technology Conference, LTC 2011 - Poznan, Poland
Duration: 2011 Nov 25 to 2011 Nov 27

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 8387 LNAI
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 5th Language and Technology Conference, LTC 2011
Country: Poland
City: Poznan
Period: 11/11/25 to 11/11/27

Keywords

  • Statistical machine translation
  • Sub-sentential alignment

ASJC Scopus subject areas

  • Theoretical Computer Science
  • Computer Science (all)


Cite this

Luo, J., Lardilleux, A., & Lepage, Y. (2014). Improving the distribution of N-grams in phrase tables obtained by the sampling-based method. In Human Language Technology Challenges for Computer Science and Linguistics - 5th Language and Technology Conference, LTC 2011, Revised Selected Papers (pp. 419-431). (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 8387 LNAI). Springer Verlag. https://doi.org/10.1007/978-3-319-08958-4_34