The structure of unseen trigrams and its application to language models: A first investigation

Yves Lepage, Julien Gosme, Adrien Lardilleux

Research output: Conference contribution

1 Citation (Scopus)

Abstract

In a series of preparatory experiments in 4 languages on subsets of the Europarl corpus, we show that a large number of unseen trigrams can be reconstructed by proportional analogy with trigrams having the lowest frequencies. We derive a very simple smoothing scheme from this empirical result and show that it outperforms the Good-Turing and Kneser-Ney smoothing schemes on trigram models in all 11 languages of the common multilingual part of the Europarl corpus, except Finnish.

Original language: English
Title of host publication: 2010 4th International Universal Communication Symposium, IUCS 2010 - Proceedings
Pages: 273-280
Number of pages: 8
Publication status: Published - 2010
Event: 2010 4th International Universal Communication Symposium, IUCS 2010 - Beijing, China
Duration: 18 Oct 2010 to 19 Oct 2010

Publication series

Name: 2010 4th International Universal Communication Symposium, IUCS 2010 - Proceedings

Conference

Conference: 2010 4th International Universal Communication Symposium, IUCS 2010
Country/Territory: China
City: Beijing
Period: 10/10/18 to 10/10/19

ASJC Scopus subject areas

  • Computer Networks and Communications
  • Communication
