Exploiting parallel corpus for handling out-of-vocabulary words

Juan Luo, John Tinsley, Yves Lepage

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

1 Citation (Scopus)

Abstract

This paper presents a hybrid model for handling out-of-vocabulary (OOV) words in Japanese-to-English statistical machine translation output by exploiting a parallel corpus. Because the Japanese writing system uses four different scripts (kanji, hiragana, katakana, and romaji), we treat these scripts differently. A machine transliteration model is built to transliterate out-of-vocabulary Japanese katakana words into English words. A Japanese dependency structure analyzer is employed to tackle out-of-vocabulary kanji and hiragana words. The evaluation results demonstrate that this is an effective approach for addressing out-of-vocabulary word problems and decreasing the OOV rate in Japanese-to-English machine translation tasks.
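The script-dependent treatment described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `script_of` and `route_oov`, the handler labels, and the choice to leave mixed-script or romaji words as "unhandled" are all assumptions made for illustration. The sketch only shows how an OOV word might be routed by script using Unicode block ranges.

```python
def script_of(ch):
    """Return a coarse script label for one character (hypothetical helper)."""
    cp = ord(ch)
    # Katakana block, Katakana Phonetic Extensions, half-width katakana
    if 0x30A0 <= cp <= 0x30FF or 0x31F0 <= cp <= 0x31FF or 0xFF66 <= cp <= 0xFF9F:
        return "katakana"
    # Hiragana block
    if 0x3040 <= cp <= 0x309F:
        return "hiragana"
    # CJK Unified Ideographs, Extension A, and the iteration mark
    if 0x4E00 <= cp <= 0x9FFF or 0x3400 <= cp <= 0x4DBF or ch == "々":
        return "kanji"
    # ASCII letters only; full-width romaji is ignored in this sketch
    if ch.isascii() and ch.isalpha():
        return "romaji"
    return "other"


def route_oov(word):
    """Pick a handler label for an OOV word based on the scripts it contains.

    The labels are illustrative assumptions:
      - "transliteration": katakana-only words
      - "dependency_analysis": words made of kanji and/or hiragana
      - "unhandled": anything else (romaji, mixed katakana, empty input)
    """
    scripts = {script_of(ch) for ch in word} - {"other"}
    if scripts == {"katakana"}:
        return "transliteration"
    if scripts and scripts <= {"kanji", "hiragana"}:
        return "dependency_analysis"
    return "unhandled"
```

For example, `route_oov("コンピュータ")` selects the transliteration path, while `route_oov("食べる")` selects the dependency-analysis path. How each path actually produces an English translation is specific to the paper and is not reproduced here.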

Original language: English
Title of host publication: 27th Pacific Asia Conference on Language, Information, and Computation, PACLIC 27
Publisher: National Chengchi University
Pages: 399-408
Number of pages: 10
ISBN (Electronic): 9789860385670
Publication status: Published - 2013 Jan 1
Event: 27th Pacific Asia Conference on Language, Information, and Computation, PACLIC 2013 - Taipei, Taiwan, Province of China
Duration: 2013 Nov 21 to 2013 Nov 24

Publication series

Name: 27th Pacific Asia Conference on Language, Information, and Computation, PACLIC 27

Conference

Conference: 27th Pacific Asia Conference on Language, Information, and Computation, PACLIC 2013
Country: Taiwan, Province of China
City: Taipei
Period: 13/11/21 to 13/11/24

ASJC Scopus subject areas

  • Language and Linguistics
  • Computer Science (all)

Cite this

Luo, J., Tinsley, J., & Lepage, Y. (2013). Exploiting parallel corpus for handling out-of-vocabulary words. In 27th Pacific Asia Conference on Language, Information, and Computation, PACLIC 27 (pp. 399-408). (27th Pacific Asia Conference on Language, Information, and Computation, PACLIC 27). National Chengchi University.