Fully-automatic marker-based chunking in 11 European languages and counts of the number of analogies between chunks

Kota Takeya, Yves Lepage

Research output: Chapter in Book/Report/Conference proceeding › Chapter

Abstract

Analogy has been proposed as a possible principle for example-based machine translation. For such a framework to work properly, the training data should contain a large number of analogies between sentences. Consequently, such a framework can only work properly with short and repetitive sentences. To handle longer and more varied sentences, cutting the sentences into chunks could be a solution, provided the number of analogies between chunks is confirmed to be large. This paper thus reports counts of the number of analogies obtained with different numbers of chunk markers in 11 European languages. These experiments confirm that the number of analogies between chunks is very large: several tens of thousands of analogies were found between chunks extracted from sentences among which very few analogies, if any, were found.
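The chunking step mentioned in the abstract can be illustrated with a minimal sketch. It assumes a simplified reading of the marker hypothesis: a new chunk opens at each marker word, as long as the chunk being closed already holds at least one non-marker word. The function name `marker_chunk`, the hand-written `MARKERS` set, and the sample sentence are hypothetical; the keywords suggest that the paper selects markers automatically (branching entropy), which this sketch does not attempt.

```python
def marker_chunk(sentence, markers):
    """Split a sentence into chunks, opening a new chunk at each marker word.

    Simplifying assumption: a chunk is closed only once it contains at
    least one non-marker word, so runs of markers stay together.
    """
    chunks = []
    current = []
    has_content = False  # True once the current chunk holds a non-marker word
    for word in sentence.split():
        is_marker = word.lower() in markers
        if is_marker and has_content:
            # A marker after some content word starts a new chunk.
            chunks.append(" ".join(current))
            current = []
            has_content = False
        current.append(word)
        if not is_marker:
            has_content = True
    if current:
        chunks.append(" ".join(current))
    return chunks


# Hypothetical, hand-picked marker set for English (illustration only).
MARKERS = {"the", "a", "in", "of", "to", "and"}

if __name__ == "__main__":
    sample = "The commission adopted a proposal in the course of the debate"
    for chunk in marker_chunk(sample, MARKERS):
        print(chunk)
```

Enlarging the marker set tends to produce shorter chunks, and shorter, more repetitive chunks are the setting in which the abstract reports many more analogies than between whole sentences.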

Original language: English
Title of host publication: PACLIC 25 - Proceedings of the 25th Pacific Asia Conference on Language, Information and Computation
Pages: 567-576
Number of pages: 10
Publication status: Published - 2011
Event: 25th Pacific Asia Conference on Language, Information and Computation, PACLIC 25 -
Duration: 2011 Dec 16 to 2011 Dec 18

Keywords

  • Analogy
  • Branching entropy
  • Marker hypothesis
  • Marker-based chunking

ASJC Scopus subject areas

  • Language and Linguistics
  • Computer Science (miscellaneous)

Cite this

Takeya, K., & Lepage, Y. (2011). Fully-automatic marker-based chunking in 11 European languages and counts of the number of analogies between chunks. In PACLIC 25 - Proceedings of the 25th Pacific Asia Conference on Language, Information and Computation (pp. 567-576).

@inbook{ce65af80d24e4e72b32008d1217f225f,
title = "Fully-automatic marker-based chunking in 11 European languages and counts of the number of analogies between chunks",
keywords = "Analogy, Branching entropy, Marker hypothesis, Marker-based chunking",
author = "Kota Takeya and Yves Lepage",
year = "2011",
language = "English",
isbn = "9784905166023",
pages = "567--576",
booktitle = "PACLIC 25 - Proceedings of the 25th Pacific Asia Conference on Language, Information and Computation",

}

TY - CHAP

T1 - Fully-automatic marker-based chunking in 11 European languages and counts of the number of analogies between chunks

AU - Takeya, Kota

AU - Lepage, Yves

PY - 2011

Y1 - 2011

N2 - Analogy has been proposed as a possible principle for example-based machine translation. For such a framework to work properly, the training data should contain a large number of analogies between sentences. Consequently, such a framework can only work properly with short and repetitive sentences. To handle longer and more varied sentences, cutting the sentences into chunks could be a solution, provided the number of analogies between chunks is confirmed to be large. This paper thus reports counts of the number of analogies obtained with different numbers of chunk markers in 11 European languages. These experiments confirm that the number of analogies between chunks is very large: several tens of thousands of analogies were found between chunks extracted from sentences among which very few analogies, if any, were found.

KW - Analogy

KW - Branching entropy

KW - Marker hypothesis

KW - Marker-based chunking

UR - http://www.scopus.com/inward/record.url?scp=84863876755&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84863876755&partnerID=8YFLogxK

M3 - Chapter

AN - SCOPUS:84863876755

SN - 9784905166023

SP - 567

EP - 576

BT - PACLIC 25 - Proceedings of the 25th Pacific Asia Conference on Language, Information and Computation

ER -