Hapax legomena: Their contribution in number and efficiency to word alignment

Adrien Lardilleux, Yves Lepage

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

2 Citations (Scopus)

Abstract

Current word alignment techniques disregard low-frequency words on the assumption that they are not useful. Against this belief, this paper shows that hapax legomena in particular can contribute to word alignment to a large extent. In an experiment, we show that pairs of corpus hapaxes account for the majority of the best word alignments. In addition, we show that the notion of sentence hapax justifies a practical and common simplification of standard alignment methods.
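To make the notion of a corpus hapax concrete, the following is a minimal sketch, not the authors' method. It assumes a sentence-aligned, pre-tokenized parallel corpus, and it assumes (for illustration only) that a sentence pair yields a candidate alignment when it contains exactly one corpus hapax on each side. The function names `corpus_hapaxes` and `hapax_pairs` are hypothetical.

```python
from collections import Counter


def corpus_hapaxes(sentences):
    """Return the set of tokens that occur exactly once in the whole corpus."""
    counts = Counter(tok for sent in sentences for tok in sent)
    return {tok for tok, n in counts.items() if n == 1}


def hapax_pairs(source, target):
    """Pair source and target corpus hapaxes found in the same sentence pair.

    Illustrative assumption (hypothetical, not from the paper): a pair is
    emitted only when the sentence pair contains exactly one source-side
    hapax and exactly one target-side hapax, so the pairing is unambiguous.
    """
    src_hapax = corpus_hapaxes(source)
    tgt_hapax = corpus_hapaxes(target)
    pairs = []
    for src_sent, tgt_sent in zip(source, target):
        # Keep only the corpus hapaxes present in this sentence pair.
        s = [tok for tok in src_sent if tok in src_hapax]
        t = [tok for tok in tgt_sent if tok in tgt_hapax]
        if len(s) == 1 and len(t) == 1:
            pairs.append((s[0], t[0]))
    return pairs


source = [["the", "cat", "sleeps"], ["the", "dog", "sleeps"]]
target = [["le", "chat", "dort"], ["le", "chien", "dort"]]
print(hapax_pairs(source, target))  # [('cat', 'chat'), ('dog', 'chien')]
```

In this toy corpus, "cat" and "dog" each occur once on the English side and "chat" and "chien" once on the French side, so each sentence pair links its two hapaxes without any frequency statistics.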

Original language: English
Title of host publication: Human Language Technology
Subtitle of host publication: Challenges of the Information Society - Third Language and Technology Conference, LTC 2007, Revised Selected Papers
Pages: 440-450
Number of pages: 11
DOI: https://doi.org/10.1007/978-3-642-04235-5_38
Publication status: Published - 2009 Sep 28
Externally published: Yes
Event: 3rd Language and Technology Conference, LTC 2007 - Poznan, Poland
Duration: 2007 Oct 5 - 2007 Oct 7

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 5603 LNAI
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 3rd Language and Technology Conference, LTC 2007
Country: Poland
City: Poznan
Period: 2007 Oct 5 - 2007 Oct 7

Keywords

  • Hapax
  • Low frequency term
  • Word alignment

ASJC Scopus subject areas

  • Theoretical Computer Science
  • Computer Science (all)


Cite this

    Lardilleux, A., & Lepage, Y. (2009). Hapax legomena: Their contribution in number and efficiency to word alignment. In Human Language Technology: Challenges of the Information Society - Third Language and Technology Conference, LTC 2007, Revised Selected Papers (pp. 440-450). (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 5603 LNAI). https://doi.org/10.1007/978-3-642-04235-5_38