Extraction of bilingual technical terms for Chinese-Japanese patent translation

Wei Yang, Jinghui Yan, Yves Lepage

Research output: Conference contribution

3 Citations (Scopus)

Abstract

The translation of patents or scientific papers is a key task that statistical machine translation (SMT) should support. In this paper, we propose a method to improve Chinese-Japanese patent SMT by pre-marking the training corpus with aligned bilingual multi-word terms. We automatically extract multi-word terms from monolingual corpora by combining statistical and linguistic filtering methods. We use the sampling-based alignment method to identify aligned terms and set a threshold on translation probabilities to select the most promising bilingual multi-word terms. We then pre-mark a Chinese-Japanese training corpus with the selected aligned bilingual multi-word terms. In our experiments on a Chinese-Japanese patent parallel corpus, we obtain over 70% precision in bilingual term extraction and a significant improvement in BLEU scores.
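The selection and pre-marking steps named in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the candidate table layout, the requirement that both translation directions reach the threshold, the threshold value, and the underscore used to join term tokens are all assumptions made for illustration, as are the function names `select_term_pairs` and `premark`.

```python
from typing import Dict, List, Tuple

# Hypothetical candidate table (an assumption, not the paper's format):
# (zh_term, ja_term) -> (P(ja | zh), P(zh | ja)); terms are space-separated tokens.
Candidates = Dict[Tuple[str, str], Tuple[float, float]]


def select_term_pairs(candidates: Candidates, threshold: float) -> List[Tuple[str, str]]:
    """Keep pairs whose two translation probabilities both reach the threshold."""
    return sorted(
        pair
        for pair, (p_forward, p_backward) in candidates.items()
        if p_forward >= threshold and p_backward >= threshold
    )


def premark(tokens: List[str], terms: List[str], joiner: str = "_") -> List[str]:
    """Merge each occurrence of a multi-word term into a single token.

    Longer terms are tried first so that nested terms do not split a longer match.
    The joiner is an assumed marking convention.
    """
    term_tokens = sorted(
        (t.split() for t in terms if t.strip()), key=len, reverse=True
    )
    marked: List[str] = []
    i = 0
    while i < len(tokens):
        for term in term_tokens:
            if tokens[i:i + len(term)] == term:
                marked.append(joiner.join(term))
                i += len(term)
                break
        else:
            # No term starts at this position: copy the token unchanged.
            marked.append(tokens[i])
            i += 1
    return marked


# Toy data, invented for illustration only.
candidates = {
    ("半导体 装置", "半導体 装置"): (0.9, 0.8),
    ("的 方法", "の 方法"): (0.3, 0.6),
}
pairs = select_term_pairs(candidates, threshold=0.5)
zh_terms = [zh for zh, _ in pairs]
zh_marked = premark("本 发明 涉及 半导体 装置".split(), zh_terms)
```

With the toy data, only the first pair survives the threshold, and the Chinese sentence keeps its other tokens while the two term tokens are merged into one. The same `premark` call would be applied to the Japanese side with the Japanese halves of the selected pairs.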

Original language: English
Title of host publication: HLT-NAACL 2016 - 2016 Conference of the North American Chapter of the Association for Computational Linguistics
Subtitle of host publication: Human Language Technologies, Proceedings of the Student Research Workshop
Editors: Jacob Andreas, Eunsol Choi, Angeliki Lazaridou
Publisher: Association for Computational Linguistics (ACL)
Pages: 81-87
Number of pages: 7
ISBN (electronic): 9781941643815
Publication status: Published - 2016
Event: 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, HLT-NAACL 2016 - San Diego, United States
Duration: 12 June 2016 to 17 June 2016

Publication series

Name: HLT-NAACL 2016 - 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Proceedings of the Student Research Workshop

Conference

Conference: 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, HLT-NAACL 2016
Country/Territory: United States
City: San Diego
Period: 12 June 2016 to 17 June 2016

ASJC Scopus subject areas

  • Computer Science Applications
  • Artificial Intelligence
  • Language and Linguistics
  • Linguistics and Language
