TY - GEN
T1 - Hapax legomena
T2 - 3rd Language and Technology Conference, LTC 2007
AU - Lardilleux, Adrien
AU - Lepage, Yves
PY - 2009
Y1 - 2009
N2 - Current techniques in word alignment disregard words with a low frequency because they would not be useful. Against this belief, this paper shows that, in particular, the notion of hapax legomena may contribute to word alignment to a large extent. In an experiment, we show that pairs of corpus hapaxes contribute to the majority of the best word alignments. In addition, we show that the notion of sentence hapax justifies a practical and common simplification of standard alignment methods.
KW - Hapax
KW - Low frequency term
KW - Word alignment
UR - http://www.scopus.com/inward/record.url?scp=70349330145&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=70349330145&partnerID=8YFLogxK
U2 - 10.1007/978-3-642-04235-5_38
DO - 10.1007/978-3-642-04235-5_38
M3 - Conference contribution
AN - SCOPUS:70349330145
SN - 3642042341
SN - 9783642042348
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 440
EP - 450
BT - Human Language Technology
Y2 - 5 October 2007 through 7 October 2007
ER -