Constraint-based bilingual lexicon induction for closely related languages

Arbi Haza Nasution, Yohei Murakami, Toru Ishida

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

7 Citations (Scopus)

Abstract

The lack or absence of parallel and comparable corpora makes bilingual lexicon extraction a difficult task for low-resource languages. Pivot language and cognate recognition approaches have proven useful for inducing bilingual lexicons for such languages. We analyze the features of closely related languages and define a semantic constraint assumption. Based on this assumption, we propose a constraint-based bilingual lexicon induction method for closely related languages that extends the constraints and translation pair candidates of a recent pivot language approach. We further define three constraint sets based on language characteristics. We conduct two controlled experiments: the first involves four closely related language pairs with different degrees of language pair similarity, and the second focuses on sense connectivity between non-pivot words and pivot words. We evaluate the results with F-measure. The results indicate that our method works better on voluminous input dictionaries and high-similarity languages. Finally, we introduce a strategy for choosing appropriate constraint sets for different goals and language characteristics.
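
As a rough illustration of two ideas named in the abstract, the sketch below generates translation pair candidates through a shared pivot language and scores them with F-measure. It is not the authors' constraint-based method, which adds constraints on top of such candidates; the function names `pivot_candidates` and `f_measure`, the placeholder words (`a1`, `b1`, `p1`, ...), and the toy gold set are all assumptions made for this example.

```python
from itertools import product


def pivot_candidates(src_to_pivot, tgt_to_pivot):
    """Pair a source word with a target word when they share a pivot translation.

    Both arguments map a word to the set of pivot-language words it
    translates to (hypothetical toy dictionaries, not real data).
    """
    candidates = set()
    for (s, s_piv), (t, t_piv) in product(src_to_pivot.items(), tgt_to_pivot.items()):
        if s_piv & t_piv:  # at least one pivot word in common
            candidates.add((s, t))
    return candidates


def f_measure(predicted, gold):
    """Harmonic mean of precision and recall over sets of translation pairs."""
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Placeholder dictionaries: a* = source words, b* = target words, p* = pivot words.
src_to_pivot = {"a1": {"p1", "p2"}, "a2": {"p2"}, "a3": {"p3"}}
tgt_to_pivot = {"b1": {"p1"}, "b2": {"p2", "p3"}}

# Assumed gold-standard pairs for the toy example.
gold = {("a1", "b1"), ("a2", "b2"), ("a3", "b2")}

candidates = pivot_candidates(src_to_pivot, tgt_to_pivot)
score = f_measure(candidates, gold)
print(sorted(candidates))
print(round(score, 3))
```

In this toy case the pivot step over-generates (`a1` is also paired with `b2` through `p2`), which lowers precision while recall stays at 1. Filtering such spurious pairs is the kind of problem that constraints on translation pair candidates are meant to address.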

Original language: English
Title of host publication: Proceedings of the 10th International Conference on Language Resources and Evaluation, LREC 2016
Editors: Nicoletta Calzolari, Khalid Choukri, Helene Mazo, Asuncion Moreno, Thierry Declerck, Sara Goggi, Marko Grobelnik, Jan Odijk, Stelios Piperidis, Bente Maegaard, Joseph Mariani
Publisher: European Language Resources Association (ELRA)
Pages: 3291-3298
Number of pages: 8
ISBN (Electronic): 9782951740891
Publication status: Published - 2016 Jan 1
Externally published: Yes
Event: 10th International Conference on Language Resources and Evaluation, LREC 2016 - Portoroz, Slovenia
Duration: 2016 May 23 to 2016 May 28

Publication series

Name: Proceedings of the 10th International Conference on Language Resources and Evaluation, LREC 2016

Other

Other: 10th International Conference on Language Resources and Evaluation, LREC 2016
Country: Slovenia
City: Portoroz
Period: 16/5/23 to 16/5/28

Keywords

  • Bilingual lexicon induction
  • Constraint satisfaction
  • Weighted partial MaxSAT

ASJC Scopus subject areas

  • Linguistics and Language
  • Library and Information Sciences
  • Language and Linguistics
  • Education

Cite this

Nasution, A. H., Murakami, Y., & Ishida, T. (2016). Constraint-based bilingual lexicon induction for closely related languages. In N. Calzolari, K. Choukri, H. Mazo, A. Moreno, T. Declerck, S. Goggi, M. Grobelnik, J. Odijk, S. Piperidis, B. Maegaard, ... J. Mariani (Eds.), Proceedings of the 10th International Conference on Language Resources and Evaluation, LREC 2016 (pp. 3291-3298). (Proceedings of the 10th International Conference on Language Resources and Evaluation, LREC 2016). European Language Resources Association (ELRA).

@inproceedings{49ad71425f9b4450bb5f491c3fedec6b,
title = "Constraint-based bilingual lexicon induction for closely related languages",
keywords = "Bilingual lexicon induction, Constraint satisfaction, Weighted partial MaxSAT",
author = "Nasution, {Arbi Haza} and Yohei Murakami and Toru Ishida",
year = "2016",
month = "1",
day = "1",
language = "English",
series = "Proceedings of the 10th International Conference on Language Resources and Evaluation, LREC 2016",
publisher = "European Language Resources Association (ELRA)",
pages = "3291--3298",
editor = "Nicoletta Calzolari and Khalid Choukri and Helene Mazo and Asuncion Moreno and Thierry Declerck and Sara Goggi and Marko Grobelnik and Jan Odijk and Stelios Piperidis and Bente Maegaard and Joseph Mariani",
booktitle = "Proceedings of the 10th International Conference on Language Resources and Evaluation, LREC 2016",

}

TY - GEN

T1 - Constraint-based bilingual lexicon induction for closely related languages

AU - Nasution, Arbi Haza

AU - Murakami, Yohei

AU - Ishida, Toru

PY - 2016/1/1

Y1 - 2016/1/1

KW - Bilingual lexicon induction

KW - Constraint satisfaction

KW - Weighted partial MaxSAT

UR - http://www.scopus.com/inward/record.url?scp=85034636317&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85034636317&partnerID=8YFLogxK

M3 - Conference contribution

T3 - Proceedings of the 10th International Conference on Language Resources and Evaluation, LREC 2016

SP - 3291

EP - 3298

BT - Proceedings of the 10th International Conference on Language Resources and Evaluation, LREC 2016

A2 - Calzolari, Nicoletta

A2 - Choukri, Khalid

A2 - Mazo, Helene

A2 - Moreno, Asuncion

A2 - Declerck, Thierry

A2 - Goggi, Sara

A2 - Grobelnik, Marko

A2 - Odijk, Jan

A2 - Piperidis, Stelios

A2 - Maegaard, Bente

A2 - Mariani, Joseph

PB - European Language Resources Association (ELRA)

ER -