Fully-automatic marker-based chunking in 11 European languages and counts of the number of analogies between chunks

Kota Takeya, Yves Lepage

Research output: Conference contribution

Abstract

Analogy has been proposed as a possible principle for example-based machine translation. For such a framework to work properly, the training data should contain a large number of analogies between sentences. Consequently, such a framework can only work properly with short and repetitive sentences. To handle longer and more varied sentences, cutting the sentences into chunks could be a solution, provided the number of analogies between chunks is confirmed to be large. This paper thus reports counts of the number of analogies obtained using different numbers of chunk markers in 11 European languages. These experiments confirm that the number of analogies between chunks is very large: several tens of thousands of analogies were found between chunks extracted from sentences among which very few analogies, if any, were found.
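The two ideas in the abstract, cutting sentences at marker words and testing chunks for analogy, can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes that the marker set is given by hand, that a new chunk starts at a marker once the current chunk already contains a non-marker word, and that an analogy A : B :: C : D is screened only by a necessary condition on character counts (each character occurs as often in A and D together as in B and C together). The names `chunk_by_markers` and `may_be_analogy` are hypothetical.

```python
from collections import Counter


def chunk_by_markers(sentence, markers):
    """Cut a whitespace-tokenized sentence into chunks.

    Assumption: a chunk begins at a marker word, and consecutive
    markers stay together in the same chunk.
    """
    chunks, current, has_content = [], [], False
    for word in sentence.split():
        is_marker = word.lower() in markers
        # Close the current chunk only if it already holds a non-marker word.
        if is_marker and has_content:
            chunks.append(" ".join(current))
            current, has_content = [], False
        current.append(word)
        if not is_marker:
            has_content = True
    if current:
        chunks.append(" ".join(current))
    return chunks


def may_be_analogy(a, b, c, d):
    """Necessary, not sufficient, test for the analogy a : b :: c : d.

    Checks that every character occurs as many times in a and d
    together as in b and c together.
    """
    return Counter(a) + Counter(d) == Counter(b) + Counter(c)


markers = {"the", "on", "in", "to"}  # hand-picked for illustration only
print(chunk_by_markers("the cat sat on the mat", markers))
print(may_be_analogy("in the house", "in the garden",
                     "to the house", "to the garden"))
```

Because the character-count test is only a necessary condition, it can accept quadruples that are not true analogies; a full check would need further constraints, which this sketch leaves out.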

Original language: English
Title of host publication: PACLIC 25 - Proceedings of the 25th Pacific Asia Conference on Language, Information and Computation
Pages: 567-576
Number of pages: 10
Publication status: Published - 1 Dec 2011
Event: 25th Pacific Asia Conference on Language, Information and Computation, PACLIC 25 - Singapore
Duration: 16 Dec 2011 to 18 Dec 2011

Publication series

Name: PACLIC 25 - Proceedings of the 25th Pacific Asia Conference on Language, Information and Computation

Conference

Conference: 25th Pacific Asia Conference on Language, Information and Computation, PACLIC 25
Country: Singapore
Period: 16 Dec 2011 to 18 Dec 2011

ASJC Scopus subject areas

  • Language and Linguistics
  • Computer Science (miscellaneous)
