Extraction of bilingual technical terms for Chinese-Japanese patent translation

Wei Yang, Jinghui Yan, Yves Lepage

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

3 Citations (Scopus)

Abstract

The translation of patents or scientific papers is a key issue that should be helped by the use of statistical machine translation (SMT). In this paper, we propose a method to improve Chinese-Japanese patent SMT by pre-marking the training corpus with aligned bilingual multi-word terms. We automatically extract multi-word terms from monolingual corpora by combining statistical and linguistic filtering methods. We use the sampling-based alignment method to identify aligned terms and set a threshold on translation probabilities to select the most promising bilingual multi-word terms. We pre-mark a Chinese-Japanese training corpus with the selected aligned bilingual multi-word terms. We obtain over 70% precision in bilingual term extraction and a significant improvement in BLEU scores in our experiments on a Chinese-Japanese patent parallel corpus.
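The selection and pre-marking steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `select_term_pairs` and `premark`, the `__` marker, the requirement that both translation directions reach the threshold, and the toy English tokens are all assumptions made for illustration. In the paper's setting the tokens would be segmented Chinese and Japanese words, and the probabilities would come from the alignment step.

```python
from typing import Dict, List, Tuple

TermPair = Tuple[str, str]  # (source term, target term)


def select_term_pairs(
    candidates: Dict[TermPair, Tuple[float, float]], threshold: float
) -> List[TermPair]:
    """Keep candidate pairs whose translation probabilities reach the threshold.

    Each value is (forward probability, backward probability). Requiring both
    directions to pass is an assumption of this sketch.
    """
    return [
        pair
        for pair, (p_forward, p_backward) in candidates.items()
        if p_forward >= threshold and p_backward >= threshold
    ]


def premark(tokens: List[str], terms: List[List[str]], marker: str = "__") -> List[str]:
    """Join the tokens of each known multi-word term into one marked token.

    Longer terms are tried first so that nested terms do not split a longer match.
    The marker string is hypothetical.
    """
    by_length = sorted((t for t in terms if len(t) > 1), key=len, reverse=True)
    marked: List[str] = []
    i = 0
    while i < len(tokens):
        for term in by_length:
            if tokens[i : i + len(term)] == term:
                marked.append(marker.join(term))
                i += len(term)
                break
        else:
            # No term starts at this position: copy the token unchanged.
            marked.append(tokens[i])
            i += 1
    return marked


if __name__ == "__main__":
    # Toy candidates with made-up probabilities, for illustration only.
    candidates = {("a b", "x y"): (0.9, 0.8), ("c d", "z w"): (0.4, 0.9)}
    print(select_term_pairs(candidates, threshold=0.5))

    sentence = ["the", "fuel", "cell", "stack", "is", "sealed"]
    terms = [["fuel", "cell"], ["fuel", "cell", "stack"]]
    print(premark(sentence, terms))
```

Marking both sides of the training corpus in the same way turns each selected term into a single token, so the aligner and the translation model treat it as one unit. The threshold trades coverage for precision: a higher value keeps fewer but more reliable term pairs.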

Original language: English
Title of host publication: HLT-NAACL 2016 - 2016 Conference of the North American Chapter of the Association for Computational Linguistics
Subtitle of host publication: Human Language Technologies, Proceedings of the Student Research Workshop
Editors: Jacob Andreas, Eunsol Choi, Angeliki Lazaridou
Publisher: Association for Computational Linguistics (ACL)
Pages: 81-87
Number of pages: 7
ISBN (Electronic): 9781941643815
Publication status: Published - 2016
Event: 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, HLT-NAACL 2016 - San Diego, United States
Duration: 2016 Jun 12 to 2016 Jun 17

Publication series

Name: HLT-NAACL 2016 - 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Proceedings of the Student Research Workshop

Conference

Conference: 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, HLT-NAACL 2016
Country/Territory: United States
City: San Diego
Period: 16/6/12 to 16/6/17

ASJC Scopus subject areas

  • Computer Science Applications
  • Artificial Intelligence
  • Language and Linguistics
  • Linguistics and Language
