Unsupervised bilingual segmentation using MDL for machine translation

Bin Shan, Hao Wang, Yves Lepage

Research output: Paper (peer-reviewed)

Abstract

In statistical machine translation systems, word alignment often performs poorly because languages differ in word form and in segmentation granularity. To address this problem, we propose an unsupervised bilingual segmentation method based on the minimum description length (MDL) principle. Our aim is to improve translation quality by using an appropriate segmentation model (lexicon). To generate bilingual lexica, we implement a heuristic, iterative algorithm. Each entry in the bilingual lexicon must have an appropriate length and fit the data well. The results show that this bilingual segmentation significantly improves translation quality on the Chinese-Japanese and Japanese-Chinese sub-tasks.
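To make the MDL trade-off concrete, here is a minimal sketch, assuming a simple two-part code: the model cost spells out every lexicon entry character by character, and the data cost encodes the segmented corpus with a unigram code. A greedy loop merges adjacent tokens only when the merge lowers the total description length, so entries grow only while they pay for themselves. This is not the authors' algorithm: it handles a single language rather than a bilingual lexicon, and the functions `description_length`, `merge_pair` and `greedy_mdl_segment` and the `max_iter` parameter are hypothetical names introduced for illustration.

```python
import math
from collections import Counter


def description_length(corpus):
    """Two-part MDL code length (in bits) of a corpus given as lists of tokens.

    Assumed coding scheme (illustrative only):
    - model cost: each distinct token (lexicon entry) is spelled out with a
      uniform code over the character alphabet plus an end-of-entry marker;
    - data cost: each token occurrence is coded with a unigram code whose
      probabilities are the relative token frequencies.
    """
    counts = Counter(tok for sent in corpus for tok in sent)
    total = sum(counts.values())
    alphabet = {ch for tok in counts for ch in tok}
    bits_per_char = math.log2(len(alphabet) + 1)  # +1 for the end-of-entry marker
    model_bits = sum((len(tok) + 1) * bits_per_char for tok in counts)
    data_bits = -sum(c * math.log2(c / total) for c in counts.values())
    return model_bits + data_bits


def merge_pair(corpus, pair):
    """Replace every adjacent occurrence of `pair` with a single merged token."""
    a, b = pair
    merged = []
    for sent in corpus:
        out, i = [], 0
        while i < len(sent):
            if i + 1 < len(sent) and sent[i] == a and sent[i + 1] == b:
                out.append(a + b)
                i += 2
            else:
                out.append(sent[i])
                i += 1
        merged.append(out)
    return merged


def greedy_mdl_segment(sentences, max_iter=100):
    """Greedy, iterative segmentation: start from characters and accept a merge
    of adjacent tokens only if it lowers the total description length."""
    corpus = [list(s) for s in sentences]
    best = description_length(corpus)
    for _ in range(max_iter):
        pairs = Counter(
            (sent[i], sent[i + 1]) for sent in corpus for i in range(len(sent) - 1)
        )
        improved = False
        # Try candidate merges from most to least frequent; take the first gain.
        for pair, _count in pairs.most_common():
            candidate = merge_pair(corpus, pair)
            dl = description_length(candidate)
            if dl < best:
                corpus, best = candidate, dl
                improved = True
                break
        if not improved:
            break
    return corpus, best
```

On the toy input `["abab", "abab", "abab"]`, merging `a`+`b` shrinks the total code length, but merging `ab`+`ab` into `abab` makes the lexicon entry longer without any saving on the data side, so the loop stops with the segmentation `ab ab`. This mirrors the requirement that each lexicon entry have an appropriate length and fit the data well.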

Original language: English
Pages: 89-96
Number of pages: 8
Publication status: Published - 2019
Event: 31st Pacific Asia Conference on Language, Information and Computation, PACLIC 2017 - Cebu City, Philippines
Duration: 2017 Nov 16 → 2017 Nov 18

Conference

Conference: 31st Pacific Asia Conference on Language, Information and Computation, PACLIC 2017
Country/Territory: Philippines
City: Cebu City
Period: 17/11/16 → 17/11/18

ASJC Scopus subject areas

  • Language and Linguistics
  • Computer Science (miscellaneous)
