Fully-automatic marker-based chunking in 11 European languages and counts of the number of analogies between chunks

Kota Takeya, Yves Lepage

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

Abstract

Analogy has been proposed as a possible principle for example-based machine translation. For such a framework to work properly, the training data should contain a large number of analogies between sentences. Consequently, such a framework can only work properly with short and repetitive sentences. To handle longer and more varied sentences, cutting the sentences into chunks could be a solution, provided the number of analogies between chunks is confirmed to be large. This paper thus reports counts of the number of analogies obtained using different numbers of chunk markers in 11 European languages. These experiments confirm that the number of analogies between chunks is very large: several tens of thousands of analogies between chunks extracted from sentences among which very few analogies, if any, were found.

Original language: English
Title of host publication: PACLIC 25 - Proceedings of the 25th Pacific Asia Conference on Language, Information and Computation
Pages: 567-576
Number of pages: 10
Publication status: Published - 2011 Dec 1
Event: 25th Pacific Asia Conference on Language, Information and Computation, PACLIC 25 - Singapore
Duration: 2011 Dec 16 to 2011 Dec 18

Publication series

Name: PACLIC 25 - Proceedings of the 25th Pacific Asia Conference on Language, Information and Computation

Conference

Conference: 25th Pacific Asia Conference on Language, Information and Computation, PACLIC 25
Country: Singapore
Period: 11/12/16 to 11/12/18

Keywords

  • Analogy
  • Branching entropy
  • Marker hypothesis
  • Marker-based chunking

ASJC Scopus subject areas

  • Language and Linguistics
  • Computer Science (miscellaneous)

  • Cite this

    Takeya, K., & Lepage, Y. (2011). Fully-automatic marker-based chunking in 11 European languages and counts of the number of analogies between chunks. In PACLIC 25 - Proceedings of the 25th Pacific Asia Conference on Language, Information and Computation (pp. 567-576).