Consistent word segmentation, part-of-speech tagging and dependency labelling annotation for Chinese language

Mo Shen, Wingmui Li, Hyunjeong Choe, Chenhui Chu, Daisuke Kawahara, Sadao Kurohashi

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

In this paper, we propose a new annotation approach to Chinese word segmentation, part-of-speech (POS) tagging and dependency labelling that aims to overcome the two major issues in traditional morphology-based annotation: inconsistency and data sparsity. We re-annotate the Penn Chinese Treebank 5.0 (CTB5) and demonstrate the advantages of this approach compared to the original CTB5 annotation through word segmentation, POS tagging and machine translation experiments.

Original language: English
Title of host publication: COLING 2016 - 26th International Conference on Computational Linguistics, Proceedings of COLING 2016
Subtitle of host publication: Technical Papers
Publisher: Association for Computational Linguistics, ACL Anthology
Pages: 298-308
Number of pages: 11
ISBN (Print): 9784879747020
Publication status: Published - 2016
Externally published: Yes
Event: 26th International Conference on Computational Linguistics, COLING 2016 - Osaka, Japan
Duration: 2016 Dec 11 to 2016 Dec 16

Publication series

Name: COLING 2016 - 26th International Conference on Computational Linguistics, Proceedings of COLING 2016: Technical Papers

Other

Other: 26th International Conference on Computational Linguistics, COLING 2016
Country: Japan
City: Osaka
Period: 16/12/11 to 16/12/16

ASJC Scopus subject areas

  • Computational Theory and Mathematics
  • Language and Linguistics
  • Linguistics and Language
