Abstract
In statistical machine translation, word alignment often performs poorly when the languages involved differ in word form or granularity. To address this problem, we propose an unsupervised bilingual segmentation method based on the minimum description length (MDL) principle. Our aim is to improve translation quality by using an appropriate segmentation model (lexicon). To generate the bilingual lexica, we implement a heuristic, iterative algorithm. Each entry in the bilingual lexicon is required to have a suitable length and to fit the data well. The results show that this bilingual segmentation significantly improves translation quality on the Chinese-Japanese and Japanese-Chinese sub-tasks.
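As a rough illustration of the trade-off the abstract describes (lexicon entries of a suitable length that still fit the data well), the following is a minimal sketch of a two-part MDL score. It is not the paper's algorithm: it is monolingual, it assumes a uniform per-character code for the lexicon and a unigram code for the corpus, and the names `description_length` and `alphabet_size` are hypothetical.

```python
import math
from collections import Counter


def description_length(segmented_corpus, alphabet_size=256):
    """Two-part MDL cost in bits: L(lexicon) + L(corpus | lexicon).

    Assumptions (illustrative only, not taken from the paper):
    - each lexicon entry is spelled out with a uniform code over
      `alphabet_size` symbols, plus one end-of-entry marker;
    - the corpus is encoded with a unigram model estimated from
      the token counts of the segmentation itself.
    """
    counts = Counter(tok for sent in segmented_corpus for tok in sent)
    total = sum(counts.values())

    # Cost of writing down the lexicon: longer entries cost more bits.
    bits_per_char = math.log2(alphabet_size)
    lexicon_bits = sum((len(entry) + 1) * bits_per_char for entry in counts)

    # Cost of the corpus given the lexicon: Shannon code length per token.
    data_bits = -sum(c * math.log2(c / total) for c in counts.values())

    return lexicon_bits + data_bits


# Three candidate segmentations of the same toy corpus.
char_level = [list("abababab")] * 4      # units too short
word_level = [["ab"] * 4] * 4            # units of a suitable length
sentence_level = [["abababab"]] * 4      # units too long

for name, seg in [("char", char_level),
                  ("word", word_level),
                  ("sentence", sentence_level)]:
    print(name, description_length(seg))
```

Under these assumptions the middle segmentation has the smallest total cost: very short units make the corpus expensive to encode, while very long units make the lexicon expensive. An iterative search would repeatedly propose changes to the segmentation and keep those that lower this score.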
Original language | English |
---|---|
Pages | 89-96 |
Number of pages | 8 |
Publication status | Published - 2019 |
Event | 31st Pacific Asia Conference on Language, Information and Computation, PACLIC 2017 - Cebu City, Philippines. Duration: 16 Nov 2017 → 18 Nov 2017 |
Conference
Conference | 31st Pacific Asia Conference on Language, Information and Computation, PACLIC 2017 |
---|---|
Country/Territory | Philippines |
City | Cebu City |
Period | 16 Nov 2017 → 18 Nov 2017 |
ASJC Scopus subject areas
- Language and Linguistics
- Computer Science (miscellaneous)