Transformation-based Khmer Part-of-Speech tagger

Chenda Nou*, Wataru Kameyama

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding - Conference contribution

3 Citations (Scopus)

Abstract

This paper presents initial research on a Khmer Part-of-Speech (POS) tagger based on the transformation-based approach. Because little research has been done on natural language processing for Khmer, many pre-processing tasks are needed before automatic tagging can take place. The first Khmer annotated corpus is tagged with 27 tags based on traditional and modern grammar theories. The learner, based on the learning algorithm introduced by Brill [2], is built with 32 transformation templates. After applying the transformation rules with our sophisticated ranking algorithm, the tagging error rate on trained and untrained data is reduced by around 41% and 18%, respectively, relative to the baseline. The experimental results are very encouraging; however, future work is outlined to further improve the accuracy and performance of the tagger.
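The general idea of transformation-based learning mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes a single hypothetical template ("change tag A to tag B when the previous tag is Z") instead of the paper's 32 templates, and it ranks candidate rules by plain net error reduction rather than the paper's own ranking algorithm. It also uses a toy English sentence with Penn-style tags in place of the Khmer corpus and its 27-tag set. All function and variable names are illustrative.

```python
def baseline_tags(words, lexicon, default="NN"):
    """Initial-state tagger: give each word its most likely tag from a lexicon."""
    return [lexicon.get(w, default) for w in words]


def apply_rule(tags, rule):
    """Apply one rule (from_tag, to_tag, prev_tag).

    A tag is changed from from_tag to to_tag when the preceding tag is prev_tag.
    Conditions are checked against the tags as they were before this rule ran.
    """
    from_tag, to_tag, prev_tag = rule
    out = list(tags)
    for i in range(1, len(tags)):
        if tags[i] == from_tag and tags[i - 1] == prev_tag:
            out[i] = to_tag
    return out


def count_errors(tags, gold):
    """Number of positions where the predicted tag differs from the gold tag."""
    return sum(t != g for t, g in zip(tags, gold))


def learn_rules(tags, gold, max_rules=10):
    """Greedy learner: repeatedly keep the rule with the largest net error reduction."""
    rules = []
    current = list(tags)
    for _ in range(max_rules):
        # Instantiate the template only at positions that are currently wrong.
        candidates = {
            (current[i], gold[i], current[i - 1])
            for i in range(1, len(current))
            if current[i] != gold[i]
        }
        base = count_errors(current, gold)
        best_rule, best_gain = None, 0
        for rule in sorted(candidates):
            gain = base - count_errors(apply_rule(current, rule), gold)
            if gain > best_gain:
                best_rule, best_gain = rule, gain
        if best_rule is None:  # no rule reduces the error any further
            break
        rules.append(best_rule)
        current = apply_rule(current, best_rule)
    return rules, current


# Toy data (assumed for illustration only, not from the paper).
words = ["I", "want", "to", "race", "the", "race"]
gold = ["PRP", "VBP", "TO", "VB", "DT", "NN"]
lexicon = {"I": "PRP", "want": "VBP", "to": "TO", "race": "NN", "the": "DT"}

initial = baseline_tags(words, lexicon)
rules, final = learn_rules(initial, gold)

baseline_errors = count_errors(initial, gold)
final_errors = count_errors(final, gold)
# Relative error reduction over the baseline, the kind of figure the abstract reports.
reduction = (baseline_errors - final_errors) / baseline_errors
```

On this toy input the baseline mis-tags the verb "race" as a noun, and the learner recovers the single rule that retags a noun as a verb after "to". A real tagger would train on a large annotated corpus and measure the reduction separately on held-out data.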

Original language: English
Title of host publication: Proceedings of the 2007 International Conference on Artificial Intelligence, ICAI 2007
Pages: 581-587
Number of pages: 7
Publication status: Published - 2007
Event: 2007 International Conference on Artificial Intelligence, ICAI 2007 - Las Vegas, NV, United States
Duration: 2007 Jun 25 to 2007 Jun 28

Publication series

Name: Proceedings of the 2007 International Conference on Artificial Intelligence, ICAI 2007
Volume: 2

Conference

Conference: 2007 International Conference on Artificial Intelligence, ICAI 2007
Country/Territory: United States
City: Las Vegas, NV
Period: 07/6/25 to 07/6/28

Keywords

  • Automatic learning
  • Corpus-based
  • Khmer Part-of-Speech tagging
  • Natural language processing
  • Transformation-based tagger

ASJC Scopus subject areas

  • Artificial Intelligence
