Inflating a small parallel corpus into a large quasi-parallel corpus using monolingual data for Chinese–Japanese machine translation

Wei Yang, Hanfei Shen, Yves Lepage

Research output: Peer-reviewed

2 Citations (Scopus)

Abstract

Increasing the size of parallel corpora for less-resourced language pairs is essential for machine translation (MT). To address the shortage of Chinese–Japanese parallel corpora, we propose a method that constructs a quasi-parallel corpus by inflating a small Chinese–Japanese parallel corpus, with the aim of improving statistical machine translation (SMT) quality. We generate new sentences using analogical associations based on large amounts of monolingual data and a small amount of parallel data. We filter over-generated sentences using two methods: one based on BLEU and the other based on N-sequences. We add the resulting aligned quasi-parallel corpus to a small Chinese–Japanese parallel corpus and perform SMT experiments. We obtain significant improvements over a baseline system.
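The abstract names an N-sequence filter for over-generated sentences but does not define it. The sketch below is one plausible reading, not the authors' implementation: it assumes that a generated sentence is kept only if every character sequence of length N it contains is attested in a monolingual reference corpus. The function names (`char_ngrams`, `build_attested_ngrams`, `filter_candidates`) and the choice of character-level sequences are assumptions made for illustration.

```python
from typing import Iterable, List, Set


def char_ngrams(sentence: str, n: int) -> Set[str]:
    """Return the set of all character sequences of length n in a sentence."""
    return {sentence[i:i + n] for i in range(len(sentence) - n + 1)}


def build_attested_ngrams(corpus: Iterable[str], n: int) -> Set[str]:
    """Collect every length-n character sequence seen in the reference corpus."""
    attested: Set[str] = set()
    for sentence in corpus:
        attested |= char_ngrams(sentence, n)
    return attested


def filter_candidates(candidates: Iterable[str], attested: Set[str], n: int) -> List[str]:
    """Keep candidates whose length-n sequences are all attested.

    Assumption: candidates shorter than n have no sequences to check
    and are rejected rather than accepted by default.
    """
    kept: List[str] = []
    for candidate in candidates:
        grams = char_ngrams(candidate, n)
        if grams and grams <= attested:
            kept.append(candidate)
    return kept
```

In this reading, a larger N makes the filter stricter, since longer sequences are less likely to be attested; the value of N used in the paper is not stated in the abstract.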

Original language: English
Pages (from-to): 88-99
Number of pages: 12
Journal: Journal of Information Processing
Volume: 25
DOI
Publication status: Published - 2017

ASJC Scopus subject areas

  • General Computer Science
