TY - GEN
T1 - Improving Patent Translation using Bilingual Term Extraction and Re-tokenization for Chinese-Japanese
AU - Yang, Wei
AU - Lepage, Yves
N1 - Publisher Copyright:
© WAT 2016 - 3rd Workshop on Asian Translation, Proceedings of the Workshop.
PY - 2016
Y1 - 2016
N2 - Unlike European languages, many Asian languages, such as Chinese and Japanese, do not have typographic boundaries in their writing systems. Word segmentation (tokenization), which breaks sentences down into individual words (tokens), is normally treated as the first step in machine translation (MT). For Chinese and Japanese, different rules and segmentation tools lead to segmentation results at different levels of granularity in the two languages. To improve translation accuracy, we adjust and balance the granularity of segmentation results around terms in a Chinese-Japanese patent corpus used for training the translation model. In this paper, we describe a statistical machine translation (SMT) system built on a re-tokenized Chinese-Japanese patent training corpus using extracted bilingual multi-word terms.
AB - Unlike European languages, many Asian languages, such as Chinese and Japanese, do not have typographic boundaries in their writing systems. Word segmentation (tokenization), which breaks sentences down into individual words (tokens), is normally treated as the first step in machine translation (MT). For Chinese and Japanese, different rules and segmentation tools lead to segmentation results at different levels of granularity in the two languages. To improve translation accuracy, we adjust and balance the granularity of segmentation results around terms in a Chinese-Japanese patent corpus used for training the translation model. In this paper, we describe a statistical machine translation (SMT) system built on a re-tokenized Chinese-Japanese patent training corpus using extracted bilingual multi-word terms.
UR - http://www.scopus.com/inward/record.url?scp=85032869205&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85032869205&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:85032869205
T3 - WAT 2016 - 3rd Workshop on Asian Translation, Proceedings of the Workshop
SP - 194
EP - 202
BT - WAT 2016 - 3rd Workshop on Asian Translation, Proceedings of the Workshop
PB - Association for Computational Linguistics (ACL)
T2 - 3rd Workshop on Asian Translation, WAT 2016
Y2 - 11 December 2016 through 16 December 2016
ER -