Improved Chinese-Japanese phrase-based MT quality using an extended quasi-parallel corpus

Hao Wang, Wei Yang, Yves Lepage

Research output: Conference contribution

Abstract

State-of-the-art phrase-based machine translation (MT) systems usually require large parallel corpora for training. The quality and quantity of the training data directly influence the performance of such translation systems. For some language pairs, the lack of open-source bilingual corpora results in lower reported translation scores. This is the case for Chinese-Japanese. In this paper, we propose to extend an initial parallel corpus with quasi-parallel sentences instead of adding new parallel sentences. The extension is obtained by using monolingual analogical associations. Our experiments show that the use of such quasi-parallel corpora improves the performance of Chinese-Japanese translation systems.

Original language: English
Title of host publication: PIC 2014 - Proceedings of 2014 IEEE International Conference on Progress in Informatics and Computing
Editors: Yinglin Wang, Xuelong Li, Hongming Cai
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 6-10
Number of pages: 5
ISBN (Electronic): 9781479920334
Publication status: Published - 2 Dec 2014
Event: 2014 2nd IEEE International Conference on Progress in Informatics and Computing, PIC 2014 - Shanghai, China
Duration: 16 May 2014 to 18 May 2014

Publication series

Name: PIC 2014 - Proceedings of 2014 IEEE International Conference on Progress in Informatics and Computing

Conference

Conference: 2014 2nd IEEE International Conference on Progress in Informatics and Computing, PIC 2014
Country: China
City: Shanghai
Period: 16 May 2014 to 18 May 2014

ASJC Scopus subject areas

  • Computational Theory and Mathematics
  • Computer Science Applications
  • Information Systems
