Improving Patent Translation using Bilingual Term Extraction and Re-tokenization for Chinese-Japanese

Wei Yang, Yves Lepage

Research output: Conference contribution

2 Citations (Scopus)

Abstract

Unlike European languages, many Asian languages, such as Chinese and Japanese, do not mark word boundaries typographically in their writing systems. Word segmentation (tokenization), which breaks sentences down into individual words (tokens), is therefore normally treated as the first step for machine translation (MT). For Chinese and Japanese, different rules and segmentation tools lead to segmentation results at different levels of granularity between the two languages. To improve translation accuracy, we adjust and balance the granularity of the segmentation results around terms in a Chinese-Japanese patent corpus used for training the translation model. In this paper, we describe a statistical machine translation (SMT) system built on a Chinese-Japanese patent training corpus re-tokenized using extracted bilingual multi-word terms.
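
A minimal sketch in Python of the re-tokenization idea described above, assuming the extracted bilingual multi-word terms are available as plain strings; the function name, the greedy longest-match strategy, and the example term are illustrative assumptions, not the paper's exact procedure:

    # Merge adjacent tokens whose concatenation is a known multi-word term,
    # so each term becomes a single token before SMT training.
    def retokenize(tokens, terms, max_span=5):
        out, i = [], 0
        while i < len(tokens):
            merged = False
            # Try the longest span first so full terms win over sub-terms.
            for j in range(min(len(tokens), i + max_span), i + 1, -1):
                candidate = "".join(tokens[i:j])  # no spaces in Chinese/Japanese text
                if candidate in terms:
                    out.append(candidate)
                    i, merged = j, True
                    break
            if not merged:
                out.append(tokens[i])
                i += 1
        return out

    # Hypothetical example: the extracted term list contains "机器翻译".
    print(retokenize(["机器", "翻译", "系统"], {"机器翻译"}))
    # -> ['机器翻译', '系统']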

Original language: English
Title of host publication: WAT 2016 - 3rd Workshop on Asian Translation, Proceedings of the Workshop
Publisher: Association for Computational Linguistics (ACL)
Pages: 194-202
Number of pages: 9
ISBN (electronic): 9784879747143
Publication status: Published - 2016
Event: 3rd Workshop on Asian Translation, WAT 2016 - Osaka, Japan
Duration: 11 Dec 2016 - 16 Dec 2016

Publication series

Name: WAT 2016 - 3rd Workshop on Asian Translation, Proceedings of the Workshop

Conference

Conference: 3rd Workshop on Asian Translation, WAT 2016
Country/Territory: Japan
City: Osaka
Period: 16/12/11 - 16/12/16

ASJC Scopus subject areas

  • Language and Linguistics
  • Computer Science (miscellaneous)

