Hapax legomena: Their contribution in number and efficiency to word alignment

Adrien Lardilleux*, Yves Lepage

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

2 Citations (Scopus)

Abstract

Current techniques in word alignment disregard low-frequency words on the assumption that they are not useful. Contrary to this belief, this paper shows that hapax legomena in particular can contribute to word alignment to a large extent. In an experiment, we show that pairs of corpus hapaxes account for the majority of the best word alignments. In addition, we show that the notion of sentence hapax justifies a practical and common simplification of standard alignment methods.
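
As a rough illustration of the term "corpus hapax" used in the abstract, the sketch below collects the words that occur exactly once on each side of a toy parallel corpus and pairs those that appear in the same sentence pair. It is a minimal sketch under stated assumptions, not the procedure used in the paper: the function names `corpus_hapaxes` and `candidate_hapax_pairs`, the whitespace tokenization, and the toy English/French sentences are all hypothetical choices made for this example.

```python
from collections import Counter


def corpus_hapaxes(sentences):
    """Return the set of words that occur exactly once in the whole corpus.

    Assumption: tokens are separated by whitespace.
    """
    counts = Counter(word for sentence in sentences for word in sentence.split())
    return {word for word, n in counts.items() if n == 1}


def candidate_hapax_pairs(source_sentences, target_sentences):
    """Pair each source hapax with each target hapax in the same sentence pair.

    Hypothetical helper for illustration only; the paper's alignment
    method is not described in the abstract.
    """
    src_hapaxes = corpus_hapaxes(source_sentences)
    tgt_hapaxes = corpus_hapaxes(target_sentences)
    pairs = []
    for src, tgt in zip(source_sentences, target_sentences):
        src_words = [w for w in src.split() if w in src_hapaxes]
        tgt_words = [w for w in tgt.split() if w in tgt_hapaxes]
        pairs.extend((s, t) for s in src_words for t in tgt_words)
    return pairs


# Toy parallel corpus (invented for this example).
source = ["the cat sleeps", "the dog sleeps"]
target = ["le chat dort", "le chien dort"]

print(candidate_hapax_pairs(source, target))
```

In this toy corpus the only words that occur once are "cat" and "dog" on the source side and "chat" and "chien" on the target side, so each sentence pair yields a single candidate pair. Real corpora would yield several hapaxes per sentence pair, and some further criterion would be needed to choose among the candidates.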

Original language: English
Title of host publication: Human Language Technology
Subtitle of host publication: Challenges of the Information Society - Third Language and Technology Conference, LTC 2007, Revised Selected Papers
Pages: 440-450
Number of pages: 11
Publication status: Published - 2009
Externally published: Yes
Event: 3rd Language and Technology Conference, LTC 2007 - Poznan, Poland
Duration: 2007 Oct 5 to 2007 Oct 7

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 5603 LNAI
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 3rd Language and Technology Conference, LTC 2007
Country/Territory: Poland
City: Poznan
Period: 07/10/5 to 07/10/7

Keywords

  • Hapax
  • Low frequency term
  • Word alignment

ASJC Scopus subject areas

  • Theoretical Computer Science
  • Computer Science (all)
