Abstract
In statistical machine translation, word alignment often performs poorly because languages differ in word form and segmentation granularity. To address this problem, we propose an unsupervised bilingual segmentation method based on the minimum description length (MDL) principle. Our aim is to improve translation quality by using an appropriate segmentation model (lexicon). To generate bilingual lexica, we implement a heuristic, iterative algorithm. Each entry in the bilingual lexicon is required to have an appropriate length and to fit the data well. The results show that this bilingual segmentation significantly improved translation quality on the Chinese-Japanese and Japanese-Chinese sub-tasks.
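The trade-off the abstract describes, between lexicon entries of appropriate length and a good fit to the data, matches the general two-part MDL form: total cost = cost of the lexicon + cost of the data encoded with that lexicon. The sketch below is only a generic, monolingual illustration of that form and not the authors' bilingual algorithm. The function name `description_length`, the fixed `bits_per_char` cost, and the unigram coding of tokens are all assumptions made for this example.

```python
import math
from collections import Counter


def description_length(segmented_corpus, bits_per_char=8.0):
    """Two-part MDL cost in bits: L(lexicon) + L(data | lexicon).

    Hypothetical helper for illustration only: the lexicon is the set of
    distinct tokens, and tokens are coded with a unigram model estimated
    from the corpus itself.
    """
    counts = Counter(tok for sent in segmented_corpus for tok in sent)
    total = sum(counts.values())

    # Model cost: spell out each lexicon entry character by character,
    # plus one terminator symbol per entry (assumed fixed-width code).
    model_bits = sum((len(entry) + 1) * bits_per_char for entry in counts)

    # Data cost: Shannon code length of every token occurrence under
    # the maximum-likelihood unigram distribution.
    data_bits = -sum(c * math.log2(c / total) for c in counts.values())

    return model_bits + data_bits


# Two candidate segmentations of the same string "abababab".
coarse = [["ab", "ab", "ab", "ab"]]                  # one two-character entry
fine = [["a", "b", "a", "b", "a", "b", "a", "b"]]    # two one-character entries

print(description_length(coarse))  # 24.0
print(description_length(fine))    # 40.0
```

In this toy case the coarser segmentation has the shorter total description, so an MDL-driven search would prefer it. A bilingual variant would additionally need to score source and target segments jointly, which this sketch does not attempt.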
Original language | English |
---|---|
Pages | 89-96 |
Number of pages | 8 |
Publication status | Published - 2019 |
Event | 31st Pacific Asia Conference on Language, Information and Computation, PACLIC 2017 - Cebu City, Philippines; Duration: 2017 Nov 16 → 2017 Nov 18 |
Conference
Conference | 31st Pacific Asia Conference on Language, Information and Computation, PACLIC 2017 |
---|---|
Country/Territory | Philippines |
City | Cebu City |
Period | 2017 Nov 16 → 2017 Nov 18 |
ASJC Scopus subject areas
- Language and Linguistics
- Computer Science (miscellaneous)