A constraint approach to lexicon induction for low-resource languages

Mairidan Wushouer, Donghui Lin, Toru Ishida, Yohei Murakami

Research output: Chapter in Book/Report/Conference proceeding > Chapter

Abstract

A bilingual lexicon is a useful language resource, but such data are rarely available for lower-density language pairs, especially for those that are closely related. The lack or absence of parallel and comparable corpora makes bilingual lexicon extraction a difficult task. Using a third language to link two other languages is a well-known solution in low-resource situations, and it usually requires only two input bilingual lexicons to automatically induce a new one. This approach, however, is weak at measuring the semantic distance between bilingual word pairs: it has never been demonstrated to utilize the complete structures of the input bilingual lexicons, and dropped meanings negatively influence the result. This research discusses a constraint approach to pivot-based lexicon induction for the case where the target language pair is closely related. We create constraints from language similarity and model the structures of the input dictionaries as an optimization problem whose solution produces an optimally correct target bilingual lexicon. In addition, we make the created bilingual lexicons of low-resource languages accessible through service grid federation.
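The pivot idea in the abstract can be illustrated with a minimal sketch. It is not the chapter's formulation: the toy dictionaries, the Jaccard overlap score, the one-to-one restriction (a stand-in for a constraint drawn from language similarity), and the function names `candidate_scores` and `best_one_to_one` are all assumptions made for illustration, and the exhaustive search below replaces a real weighted partial max-SAT solver.

```python
# Illustrative sketch only: pivot-based candidate generation followed by a
# constrained selection. All data and names here are hypothetical.

def candidate_scores(a_to_p, b_to_p):
    """Score every (a, b) pair that shares at least one pivot translation.

    The score is the Jaccard overlap of the two pivot-translation sets,
    an assumed heuristic rather than the measure used in the chapter.
    """
    scores = {}
    for a, pivots_a in a_to_p.items():
        for b, pivots_b in b_to_p.items():
            shared = pivots_a & pivots_b
            if shared:
                scores[(a, b)] = len(shared) / len(pivots_a | pivots_b)
    return scores


def best_one_to_one(scores):
    """Pick the highest-scoring set of pairs in which no word is used twice.

    The one-to-one restriction plays the role of a hard constraint and the
    pair scores play the role of soft-constraint weights. The search is
    exhaustive, so it is only suitable for toy inputs.
    """
    a_words = sorted({a for a, _ in scores})
    best = (0.0, [])

    def search(i, used_b, total, chosen):
        nonlocal best
        if i == len(a_words):
            if total > best[0]:
                best = (total, list(chosen))
            return
        a = a_words[i]
        search(i + 1, used_b, total, chosen)  # option: leave `a` unpaired
        for (a2, b), weight in scores.items():
            if a2 == a and b not in used_b:
                chosen.append((a, b))
                search(i + 1, used_b | {b}, total + weight, chosen)
                chosen.pop()

    search(0, frozenset(), 0.0, [])
    return best


# Toy input dictionaries: each word maps to its set of pivot-language translations.
A_TO_P = {"a1": {"p1", "p2"}, "a2": {"p2", "p3"}}
B_TO_P = {"b1": {"p1", "p2"}, "b2": {"p3"}}

scores = candidate_scores(A_TO_P, B_TO_P)
total, pairs = best_one_to_one(scores)
print(total, pairs)
```

In this toy case the pair (a2, b1) shares a pivot word but is rejected because b1 is better explained by a1, which shows how a structural constraint can filter out a candidate that a purely pairwise score would keep.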

Original language: English
Title of host publication: Cognitive Technologies
Publisher: Springer-Verlag
Pages: 109-123
Number of pages: 15
Edition: 9789811077920
DOI: https://doi.org/10.1007/978-981-10-7793-7_7
Publication status: Published - 2018 Jan 1
Externally published: Yes

Publication series

Name: Cognitive Technologies
Number: 9789811077920
ISSN (Print): 1611-2482

Fingerprint

Glossaries
Semantics

Keywords

  • Bilingual dictionary induction
  • Constraint satisfaction problem
  • Low-resource languages
  • Pivot language
  • Weighted partial max-SAT

ASJC Scopus subject areas

  • Software
  • Artificial Intelligence

Cite this

Wushouer, M., Lin, D., Ishida, T., & Murakami, Y. (2018). A constraint approach to lexicon induction for low-resource languages. In Cognitive Technologies (9789811077920 ed., pp. 109-123). (Cognitive Technologies; No. 9789811077920). Springer-Verlag. https://doi.org/10.1007/978-981-10-7793-7_7

