Bilingual dictionary induction as an optimization problem

Mairidan Wushouer, Donghui Lin, Toru Ishida, Katsutoshi Hirayama

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

1 Citation (Scopus)

Abstract

Bilingual dictionaries are vital in many areas of natural language processing, but such resources are rarely available for lower-density language pairs, especially closely related ones. Pivot-based induction uses a third language to bridge a language pair. As an approach to creating new dictionaries, it can generate wrong translations because of polysemous and ambiguous words. In this paper we propose a constraint-based approach to pivot-based dictionary induction for the case of two closely related languages. To take word senses into account, we use an approach based on semantic distances in which possibly missing translations are considered, and each instance of induction is encoded as an optimization problem to generate a new dictionary. Evaluations show that the proposal achieves 83.7% accuracy and approximately 70.5% recall, thus outperforming the baseline pivot-based method.
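The baseline that the abstract compares against can be illustrated with a minimal sketch. It shows only naive pivot-based induction (pairing two words whenever they share a pivot translation), not the constraint-based method proposed in the paper. The function name `pivot_induce` and the Spanish/English/French toy entries are hypothetical and chosen purely for illustration.

```python
from collections import defaultdict


def pivot_induce(a_to_pivot, pivot_to_b):
    """Naive pivot-based induction (hypothetical helper, not the paper's method).

    Pairs a word of language A with a word of language B whenever
    both are linked to the same pivot-language word.
    """
    induced = defaultdict(set)
    for a_word, pivot_words in a_to_pivot.items():
        for pivot_word in pivot_words:
            # Every B-side translation of the shared pivot word becomes a candidate.
            for b_word in pivot_to_b.get(pivot_word, ()):
                induced[a_word].add(b_word)
    return {a_word: sorted(b_words) for a_word, b_words in induced.items()}


# Hypothetical toy dictionaries with English as the pivot language.
es_en = {"banco": ["bank", "bench"], "orilla": ["bank"]}
en_fr = {"bank": ["banque", "rive"], "bench": ["banc"]}

candidates = pivot_induce(es_en, en_fr)
for source_word, translations in candidates.items():
    print(source_word, "->", translations)
```

Because the pivot word "bank" is polysemous, the sketch pairs "orilla" (riverbank) with "banque" (financial institution), which is the kind of wrong translation the abstract attributes to polysemy and ambiguous words.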

Original language: English
Title of host publication: Proceedings of the 9th International Conference on Language Resources and Evaluation, LREC 2014
Editors: Nicoletta Calzolari, Khalid Choukri, Sara Goggi, Thierry Declerck, Joseph Mariani, Bente Maegaard, Asuncion Moreno, Jan Odijk, Helene Mazo, Stelios Piperidis, Hrafn Loftsson
Publisher: European Language Resources Association (ELRA)
Pages: 2122-2129
Number of pages: 8
ISBN (Electronic): 9782951740884
Publication status: Published - 2014 Jan 1
Externally published: Yes
Event: 9th International Conference on Language Resources and Evaluation, LREC 2014 - Reykjavik, Iceland
Duration: 2014 May 26 to 2014 May 31

Publication series

Name: Proceedings of the 9th International Conference on Language Resources and Evaluation, LREC 2014

Other

Other: 9th International Conference on Language Resources and Evaluation, LREC 2014
Country: Iceland
City: Reykjavik
Period: 14/5/26 to 14/5/31

Fingerprint

induction
dictionary
language
semantics
Bilingual Dictionary
evaluation
resources

Keywords

  • Bilingual Dictionary Induction
  • Constraint Satisfaction
  • Weighted Partial Max-SAT

ASJC Scopus subject areas

  • Linguistics and Language
  • Library and Information Sciences
  • Education
  • Language and Linguistics

Cite this

Wushouer, M., Lin, D., Ishida, T., & Hirayama, K. (2014). Bilingual dictionary induction as an optimization problem. In N. Calzolari, K. Choukri, S. Goggi, T. Declerck, J. Mariani, B. Maegaard, A. Moreno, J. Odijk, H. Mazo, S. Piperidis, ... H. Loftsson (Eds.), Proceedings of the 9th International Conference on Language Resources and Evaluation, LREC 2014 (pp. 2122-2129). (Proceedings of the 9th International Conference on Language Resources and Evaluation, LREC 2014). European Language Resources Association (ELRA).

@inproceedings{e090b059891d46c0a3679771acc8b387,
title = "Bilingual dictionary induction as an optimization problem",
abstract = "Bilingual dictionaries are vital in many areas of natural language processing, but such resources are rarely available for lower-density language pairs, especially closely related ones. Pivot-based induction uses a third language to bridge a language pair. As an approach to creating new dictionaries, it can generate wrong translations because of polysemous and ambiguous words. In this paper we propose a constraint-based approach to pivot-based dictionary induction for the case of two closely related languages. To take word senses into account, we use an approach based on semantic distances in which possibly missing translations are considered, and each instance of induction is encoded as an optimization problem to generate a new dictionary. Evaluations show that the proposal achieves 83.7{\%} accuracy and approximately 70.5{\%} recall, thus outperforming the baseline pivot-based method.",
keywords = "Bilingual Dictionary Induction, Constraint Satisfaction, Weighted Partial Max-SAT",
author = "Mairidan Wushouer and Donghui Lin and Toru Ishida and Katsutoshi Hirayama",
year = "2014",
month = "1",
day = "1",
language = "English",
series = "Proceedings of the 9th International Conference on Language Resources and Evaluation, LREC 2014",
publisher = "European Language Resources Association (ELRA)",
pages = "2122--2129",
editor = "Nicoletta Calzolari and Khalid Choukri and Sara Goggi and Thierry Declerck and Joseph Mariani and Bente Maegaard and Asuncion Moreno and Jan Odijk and Helene Mazo and Stelios Piperidis and Hrafn Loftsson",
booktitle = "Proceedings of the 9th International Conference on Language Resources and Evaluation, LREC 2014",

}

TY - GEN

T1 - Bilingual dictionary induction as an optimization problem

AU - Wushouer, Mairidan

AU - Lin, Donghui

AU - Ishida, Toru

AU - Hirayama, Katsutoshi

PY - 2014/1/1

Y1 - 2014/1/1

N2 - Bilingual dictionaries are vital in many areas of natural language processing, but such resources are rarely available for lower-density language pairs, especially closely related ones. Pivot-based induction uses a third language to bridge a language pair. As an approach to creating new dictionaries, it can generate wrong translations because of polysemous and ambiguous words. In this paper we propose a constraint-based approach to pivot-based dictionary induction for the case of two closely related languages. To take word senses into account, we use an approach based on semantic distances in which possibly missing translations are considered, and each instance of induction is encoded as an optimization problem to generate a new dictionary. Evaluations show that the proposal achieves 83.7% accuracy and approximately 70.5% recall, thus outperforming the baseline pivot-based method.

KW - Bilingual Dictionary Induction

KW - Constraint Satisfaction

KW - Weighted Partial Max-SAT

UR - http://www.scopus.com/inward/record.url?scp=85029169432&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85029169432&partnerID=8YFLogxK

M3 - Conference contribution

T3 - Proceedings of the 9th International Conference on Language Resources and Evaluation, LREC 2014

SP - 2122

EP - 2129

BT - Proceedings of the 9th International Conference on Language Resources and Evaluation, LREC 2014

A2 - Calzolari, Nicoletta

A2 - Choukri, Khalid

A2 - Goggi, Sara

A2 - Declerck, Thierry

A2 - Mariani, Joseph

A2 - Maegaard, Bente

A2 - Moreno, Asuncion

A2 - Odijk, Jan

A2 - Mazo, Helene

A2 - Piperidis, Stelios

A2 - Loftsson, Hrafn

PB - European Language Resources Association (ELRA)

ER -