Improving the distribution of N-grams in phrase tables obtained by the sampling-based method

Juan Luo, Adrien Lardilleux, Yves Lepage

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

We describe an approach to improving the performance of the sampling-based sub-sentential alignment method on translation tasks by investigating the distribution of n-grams in the phrase tables. The approach consists of enforcing the alignment of n-grams. We compare the quality of the phrase translation tables output by this approach with that of tables produced by the state-of-the-art estimation approach on statistical machine translation tasks. We report significant improvements for this approach and show that merging phrase tables outperforms the state-of-the-art techniques.

Original language: English
Title of host publication: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Publisher: Springer Verlag
Pages: 419-431
Number of pages: 13
Volume: 8387 LNAI
ISBN (Print): 9783319089577
DOI: https://doi.org/10.1007/978-3-319-08958-4_34
Publication status: Published - 2014
Event: 5th Language and Technology Conference, LTC 2011 - Poznan
Duration: 2011 Nov 25 to 2011 Nov 27

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 8387 LNAI
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Other

Other: 5th Language and Technology Conference, LTC 2011
City: Poznan
Period: 11/11/25 to 11/11/27

Fingerprint

N-gram
Tables
Sampling
Alignment
Merging
Statistical Machine Translation
Output

Keywords

  • Statistical machine translation
  • Sub-sentential alignment

ASJC Scopus subject areas

  • Computer Science (all)
  • Theoretical Computer Science

Cite this

Luo, J., Lardilleux, A., & Lepage, Y. (2014). Improving the distribution of N-grams in phrase tables obtained by the sampling-based method. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 8387 LNAI, pp. 419-431). Springer Verlag. https://doi.org/10.1007/978-3-319-08958-4_34

@inproceedings{592b93620a344b19ab77ba8dc00ceeae,
title = "Improving the distribution of N-grams in phrase tables obtained by the sampling-based method",
abstract = "We describe an approach to improve the performance of sampling-based sub-sentential alignment method on translation tasks by investigating the distribution of n-grams in the phrase tables. This approach consists in enforcing the alignment of n-grams. We compare the quality of phrase translation tables output by this approach and that of the state-of-the-art estimation approach in statistical machine translation tasks. We report significant improvements for this approach and show that merging phrase tables outperforms the state-of-the-art techniques.",
keywords = "Statistical machine translation, Sub-sentential alignment",
author = "Juan Luo and Adrien Lardilleux and Yves Lepage",
year = "2014",
doi = "10.1007/978-3-319-08958-4_34",
language = "English",
isbn = "9783319089577",
volume = "8387 LNAI",
series = "Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)",
publisher = "Springer Verlag",
pages = "419--431",
booktitle = "Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)",

}

TY  - GEN
T1  - Improving the distribution of N-grams in phrase tables obtained by the sampling-based method
AU  - Luo, Juan
AU  - Lardilleux, Adrien
AU  - Lepage, Yves
PY  - 2014
Y1  - 2014
N2  - We describe an approach to improve the performance of sampling-based sub-sentential alignment method on translation tasks by investigating the distribution of n-grams in the phrase tables. This approach consists in enforcing the alignment of n-grams. We compare the quality of phrase translation tables output by this approach and that of the state-of-the-art estimation approach in statistical machine translation tasks. We report significant improvements for this approach and show that merging phrase tables outperforms the state-of-the-art techniques.
AB  - We describe an approach to improve the performance of sampling-based sub-sentential alignment method on translation tasks by investigating the distribution of n-grams in the phrase tables. This approach consists in enforcing the alignment of n-grams. We compare the quality of phrase translation tables output by this approach and that of the state-of-the-art estimation approach in statistical machine translation tasks. We report significant improvements for this approach and show that merging phrase tables outperforms the state-of-the-art techniques.
KW  - Statistical machine translation
KW  - Sub-sentential alignment
UR  - http://www.scopus.com/inward/record.url?scp=84905855861&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=84905855861&partnerID=8YFLogxK
U2  - 10.1007/978-3-319-08958-4_34
DO  - 10.1007/978-3-319-08958-4_34
M3  - Conference contribution
AN  - SCOPUS:84905855861
SN  - 9783319089577
VL  - 8387 LNAI
T3  - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP  - 419
EP  - 431
BT  - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
PB  - Springer Verlag
ER  - 