Pivot-based bilingual dictionary extraction from multiple dictionary resources

Mairidan Wushouer, Donghui Lin, Toru Ishida, Katsutoshi Hirayama

Research output: Contribution to journal (Article)

2 Citations (Scopus)

Abstract

High-quality bilingual dictionaries are rarely available for lower-density language pairs, especially for those that are closely related. Using a third language as a pivot to link two other languages is a well-known solution, and usually requires only two input bilingual dictionaries to automatically induce the new one. This approach, however, produces many incorrect translation pairs because dictionary entries are normally not transitive, owing to polysemy and ambiguous words in the pivot language. Utilizing the complete structures of the input bilingual dictionaries positively influences the result since dropped meanings can be countered. Moreover, an additional input dictionary may provide more complete information for calculating the semantic distance between word senses, which is key to suppressing wrong sense matches. This paper proposes an extended constraint optimization model for inducing new dictionaries of closely related languages from multiple input dictionaries, and its formalization based on Integer Linear Programming. Evaluations indicated that the proposal not only outperforms the baseline method, but also shows improvements in performance and scalability as more dictionaries are utilized.
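The non-transitivity problem described in the abstract can be illustrated with a minimal sketch of naive pivot composition. This is not the paper's constraint optimization model; it is only the simple two-dictionary baseline idea, and the function names (`invert`, `pivot_pairs`) and the toy Spanish/French/English entries are assumptions made up for illustration.

```python
from collections import defaultdict


def invert(dictionary):
    """Turn a word -> {pivot words} mapping into pivot word -> {words}."""
    inverted = defaultdict(set)
    for word, translations in dictionary.items():
        for pivot_word in translations:
            inverted[pivot_word].add(word)
    return inverted


def pivot_pairs(a_to_p, b_to_p):
    """Pair every A word with every B word sharing at least one pivot word.

    Returns {(a_word, b_word): set of shared pivot words}.
    """
    p_to_b = invert(b_to_p)
    pairs = defaultdict(set)
    for a_word, pivots in a_to_p.items():
        for pivot_word in pivots:
            for b_word in p_to_b.get(pivot_word, ()):
                pairs[(a_word, b_word)].add(pivot_word)
    return dict(pairs)


# Hypothetical toy dictionaries with English as the pivot language.
es_to_en = {"banco": {"bank", "bench"}, "orilla": {"bank", "shore"}}
fr_to_en = {"banque": {"bank"}, "rive": {"bank", "shore"}, "banc": {"bench"}}

pairs = pivot_pairs(es_to_en, fr_to_en)
for (es_word, fr_word), shared in sorted(pairs.items()):
    print(es_word, fr_word, sorted(shared))
```

In this toy data the polysemous pivot word "bank" links "banco" to "rive" and "orilla" to "banque", neither of which is a correct translation pair. The number of shared pivot words per pair hints at why additional structure helps: the pair ("orilla", "rive") is supported by two pivot words, while the wrong pairs are supported by only one.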

Original language: English
Pages (from-to): 221-234
Number of pages: 14
Journal: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 8862
DOIs: 10.1007/978-3-319-13560-1
Publication status: Published - 2014 Jan 1
Externally published: Yes

Fingerprint

Pivot
Glossaries
Resources
Dictionary
Integer Linear Programming
Ambiguous
Formalization
Optimization Model
Linear programming
Scalability
Baseline
Semantics
Language
Evaluation

Keywords

  • Bilingual dictionary induction
  • Constraint satisfaction
  • Pseudo-boolean optimization

ASJC Scopus subject areas

  • Theoretical Computer Science
  • Computer Science (all)

Cite this

Pivot-based bilingual dictionary extraction from multiple dictionary resources. / Wushouer, Mairidan; Lin, Donghui; Ishida, Toru; Hirayama, Katsutoshi.

In: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), Vol. 8862, 01.01.2014, p. 221-234.

@article{c2cd6738aa434c7995497c4c6a7b96e6,
title = "Pivot-based bilingual dictionary extraction from multiple dictionary resources",
abstract = "High-quality bilingual dictionaries are rarely available for lower-density language pairs, especially for those that are closely related. Using a third language as a pivot to link two other languages is a well-known solution, and usually requires only two input bilingual dictionaries to automatically induce the new one. This approach, however, produces many incorrect translation pairs because dictionary entries are normally not transitive, owing to polysemy and ambiguous words in the pivot language. Utilizing the complete structures of the input bilingual dictionaries positively influences the result since dropped meanings can be countered. Moreover, an additional input dictionary may provide more complete information for calculating the semantic distance between word senses, which is key to suppressing wrong sense matches. This paper proposes an extended constraint optimization model for inducing new dictionaries of closely related languages from multiple input dictionaries, and its formalization based on Integer Linear Programming. Evaluations indicated that the proposal not only outperforms the baseline method, but also shows improvements in performance and scalability as more dictionaries are utilized.",
keywords = "Bilingual dictionary induction, Constraint satisfaction, Pseudo-boolean optimization",
author = "Mairidan Wushouer and Donghui Lin and Toru Ishida and Katsutoshi Hirayama",
year = "2014",
month = "1",
day = "1",
doi = "10.1007/978-3-319-13560-1",
language = "English",
volume = "8862",
pages = "221--234",
journal = "Lecture Notes in Computer Science",
issn = "0302-9743",
publisher = "Springer Verlag",

}

TY - JOUR

T1 - Pivot-based bilingual dictionary extraction from multiple dictionary resources

AU - Wushouer, Mairidan

AU - Lin, Donghui

AU - Ishida, Toru

AU - Hirayama, Katsutoshi

PY - 2014/1/1

Y1 - 2014/1/1

N2 - High-quality bilingual dictionaries are rarely available for lower-density language pairs, especially for those that are closely related. Using a third language as a pivot to link two other languages is a well-known solution, and usually requires only two input bilingual dictionaries to automatically induce the new one. This approach, however, produces many incorrect translation pairs because dictionary entries are normally not transitive, owing to polysemy and ambiguous words in the pivot language. Utilizing the complete structures of the input bilingual dictionaries positively influences the result since dropped meanings can be countered. Moreover, an additional input dictionary may provide more complete information for calculating the semantic distance between word senses, which is key to suppressing wrong sense matches. This paper proposes an extended constraint optimization model for inducing new dictionaries of closely related languages from multiple input dictionaries, and its formalization based on Integer Linear Programming. Evaluations indicated that the proposal not only outperforms the baseline method, but also shows improvements in performance and scalability as more dictionaries are utilized.

KW - Bilingual dictionary induction

KW - Constraint satisfaction

KW - Pseudo-boolean optimization

UR - http://www.scopus.com/inward/record.url?scp=84911909033&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84911909033&partnerID=8YFLogxK

U2 - 10.1007/978-3-319-13560-1

DO - 10.1007/978-3-319-13560-1

M3 - Article

AN - SCOPUS:84911909033

VL - 8862

SP - 221

EP - 234

JO - Lecture Notes in Computer Science

JF - Lecture Notes in Computer Science

SN - 0302-9743

ER -