Pivot-based bilingual dictionary extraction from multiple dictionary resources

Mairidan Wushouer, Donghui Lin, Toru Ishida, Katsutoshi Hirayama

Research output: Contribution to journal (Article)

3 Citations (Scopus)

Abstract

High-quality bilingual dictionaries are rarely available for lower-density language pairs, especially for those that are closely related. Using a third language as a pivot to link two other languages is a well-known solution, and usually requires only two input bilingual dictionaries to automatically induce the new one. This approach, however, produces many incorrect translation pairs because dictionary entries are normally not transitive, owing to polysemy and ambiguous words in the pivot language. Utilizing the complete structures of the input bilingual dictionaries positively influences the result, since dropped meanings can be countered. Moreover, an additional input dictionary may provide more complete information for calculating the semantic distance between word senses, which is key to suppressing wrong sense matches. This paper proposes an extended constraint optimization model for inducing new dictionaries of closely related languages from multiple input dictionaries, and its formalization based on Integer Linear Programming. Evaluations indicated that the proposal not only outperforms the baseline method, but also shows improvements in performance and scalability as more dictionaries are utilized.
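The naive pivot approach that the abstract describes as error-prone can be sketched as follows. This is only an illustration of the baseline idea (pair two words whenever they share a pivot translation), not the constraint optimization model the paper proposes. The dictionaries, the placeholder words such as `a_riverside`, and the function names are all hypothetical and chosen purely for the example.

```python
from collections import defaultdict


def invert(dictionary):
    """Turn a word -> pivot-words mapping into pivot-word -> words."""
    inverted = defaultdict(set)
    for word, translations in dictionary.items():
        for pivot_word in translations:
            inverted[pivot_word].add(word)
    return inverted


def pivot_candidates(a_to_p, c_to_p):
    """Pair every A word with every C word that shares a pivot word.

    Returns {(a_word, c_word): set of shared pivot words}.
    """
    p_to_c = invert(c_to_p)
    pairs = defaultdict(set)
    for a_word, pivots in a_to_p.items():
        for pivot_word in pivots:
            for c_word in p_to_c.get(pivot_word, ()):
                pairs[(a_word, c_word)].add(pivot_word)
    return dict(pairs)


# Hypothetical toy data: "bank" is a polysemous pivot word
# (river bank vs. financial institution).
a_to_p = {"a_riverside": {"bank", "shore"}, "a_moneyhouse": {"bank"}}
c_to_p = {"c_riverside": {"bank", "shore"}, "c_moneyhouse": {"bank"}}

candidates = pivot_candidates(a_to_p, c_to_p)
for pair, shared in sorted(candidates.items()):
    print(pair, sorted(shared))
```

On this toy input the procedure yields four candidate pairs, two of which (`a_riverside` with `c_moneyhouse`, and `a_moneyhouse` with `c_riverside`) are wrong sense matches introduced solely by the ambiguous pivot word. Counting shared pivot words does not separate the correct `a_moneyhouse` pair from the wrong ones, which illustrates why the abstract argues for using fuller dictionary structure and additional input dictionaries.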

Original language: English
Pages (from-to): 221-234
Number of pages: 14
Journal: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 8862
DOIs
Publication status: Published - 2014

Keywords

  • Bilingual dictionary induction
  • Constraint satisfaction
  • Pseudo-boolean optimization

ASJC Scopus subject areas

  • Theoretical Computer Science
  • Computer Science (all)
