Hapax legomena: Their contribution in number and efficiency to word alignment

Adrien Lardilleux*, Yves Lepage

*Corresponding author for this work

Research output: Conference contribution

2 Citations (Scopus)

Abstract

Current techniques in word alignment disregard words with a low frequency because they would not be useful. Against this belief, this paper shows that, in particular, the notion of hapax legomena may contribute to word alignment to a large extent. In an experiment, we show that pairs of corpus hapaxes contribute to the majority of the best word alignments. In addition, we show that the notion of sentence hapax justifies a practical and common simplification of standard alignment methods.
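To make the two notions in the abstract concrete, here is a minimal Python sketch. It takes a corpus hapax to be a word occurring exactly once in a whole corpus, and assumes a sentence hapax to be a word occurring exactly once within a single sentence. The toy parallel corpus, the function names, and the pairing heuristic (keep a pair only when an aligned sentence pair contains exactly one corpus hapax on each side) are assumptions made for illustration, not the authors' alignment method.

```python
from collections import Counter


def corpus_hapaxes(sentences):
    """Return the set of words that occur exactly once in the whole corpus."""
    counts = Counter(word for sentence in sentences for word in sentence.split())
    return {word for word, n in counts.items() if n == 1}


def sentence_hapaxes(sentence):
    """Return the set of words that occur exactly once in one sentence."""
    counts = Counter(sentence.split())
    return {word for word, n in counts.items() if n == 1}


def hapax_pairs(source, target):
    """Pair corpus hapaxes found in the same aligned sentence pair.

    Illustrative heuristic only: a pair is kept when the source sentence
    and the target sentence each contain exactly one corpus hapax.
    """
    src_hapaxes = corpus_hapaxes(source)
    tgt_hapaxes = corpus_hapaxes(target)
    pairs = []
    for src_sentence, tgt_sentence in zip(source, target):
        src_words = [w for w in src_sentence.split() if w in src_hapaxes]
        tgt_words = [w for w in tgt_sentence.split() if w in tgt_hapaxes]
        if len(src_words) == 1 and len(tgt_words) == 1:
            pairs.append((src_words[0], tgt_words[0]))
    return pairs


# Hypothetical toy parallel corpus (English / French), one sentence per entry.
source = ["the cat sleeps", "the dog sleeps", "the dog eats"]
target = ["le chat dort", "le chien dort", "le chien mange"]

print(hapax_pairs(source, target))  # [('cat', 'chat'), ('eats', 'mange')]
```

In this toy corpus, the frequent words ("the", "dog", "sleeps") yield no pairs, while the words that occur only once line up directly with their translations.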

Original language: English
Title of host publication: Human Language Technology
Subtitle of host publication: Challenges of the Information Society - Third Language and Technology Conference, LTC 2007, Revised Selected Papers
Pages: 440-450
Number of pages: 11
DOI
Publication status: Published - 2009
Externally published: Yes
Event: 3rd Language and Technology Conference, LTC 2007 - Poznan, Poland
Duration: 2007-10-05 to 2007-10-07

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 5603 LNAI
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 3rd Language and Technology Conference, LTC 2007
Country/Territory: Poland
City: Poznan
Period: 2007-10-05 to 2007-10-07

ASJC Scopus subject areas

  • Theoretical Computer Science
  • General Computer Science

