Abstract
Unknown words and word segmentation granularity are two main problems in Chinese word segmentation for Chinese-Japanese Machine Translation (MT). In this paper, we propose an approach that exploits the Chinese characters shared between Chinese and Japanese to optimize Chinese word segmentation for MT, with the aim of solving both problems. We augment the system dictionary of a Chinese segmenter by extracting Chinese lexicons from a parallel training corpus. In addition, we adjust the granularity of the training data for the Chinese segmenter to that of Japanese. Experimental results of Chinese-Japanese MT on a phrase-based SMT system show that our approach improves MT performance significantly.
Original language | English |
---|---|
Pages | 35-42 |
Number of pages | 8 |
Publication status | Published - 2012 |
Externally published | Yes |
Event | 16th Annual Conference of the European Association for Machine Translation, EAMT 2012 - Trento, Italy Duration: 28 May 2012 → 30 May 2012 |
Other
Other | 16th Annual Conference of the European Association for Machine Translation, EAMT 2012 |
---|---|
Country/Territory | Italy |
City | Trento |
Period | 28 May 2012 → 30 May 2012 |
ASJC Scopus subject areas
- Language and Linguistics
- Human-Computer Interaction
- Software