Khmer POS tagger: A transformation-based approach with hybrid unknown word handling

Chenda Nou*, Wataru Kameyama

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

5 Citations (Scopus)

Abstract

This paper presents initial research on a Khmer part-of-speech tagger. We propose modifications to the rule-application algorithm of the transformation-based approach to adapt it to the Khmer language, which differs morphologically and syntactically from English. Furthermore, to overcome the limited coverage of the rule-based approach in handling unknown words, we propose a hybrid approach that combines the rule-based and trigram models. Although trained on a very small corpus, both proposed approaches achieve higher accuracy than the conventional methods. The tagger achieves 95.27% accuracy on the training data and 91.96% on the test data, in which 9% of the words are unknown.
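The abstract does not spell out the algorithm, so the following is only a minimal sketch of the general idea of combining a transformation-based (Brill-style) tagger with a trigram fallback for unknown words. It is not the authors' implementation and does not include their Khmer-specific modifications. Everything in it is an assumption made for illustration: the function names (`train`, `initial_tags`, `apply_rules`), the rule format (change tag A to B when the previous tag is C), the use of tag trigrams with greedy selection, the `NN` default, and the English toy corpus.

```python
from collections import Counter, defaultdict

START = "<s>"  # hypothetical sentence-start marker


def train(tagged_sentences):
    """Collect a word->tag lexicon and tag-trigram counts from a toy corpus."""
    lexicon = defaultdict(Counter)   # word -> counts of tags seen with it
    trigrams = defaultdict(Counter)  # (tag[i-2], tag[i-1]) -> counts of tag[i]
    for sentence in tagged_sentences:
        prev2, prev1 = START, START
        for word, tag in sentence:
            lexicon[word][tag] += 1
            trigrams[(prev2, prev1)][tag] += 1
            prev2, prev1 = prev1, tag
    return lexicon, trigrams


def initial_tags(words, lexicon, trigrams, default="NN"):
    """Known words get their most frequent tag; unknown words get the tag
    that most often followed the two preceding tags (greedy trigram guess)."""
    tags = []
    for word in words:
        prev2 = tags[-2] if len(tags) >= 2 else START
        prev1 = tags[-1] if tags else START
        if word in lexicon:
            tag = lexicon[word].most_common(1)[0][0]
        elif trigrams.get((prev2, prev1)):
            tag = trigrams[(prev2, prev1)].most_common(1)[0][0]
        else:
            tag = default  # assumed fallback when the tag context is unseen
        tags.append(tag)
    return tags


def apply_rules(tags, rules):
    """Apply each (from_tag, to_tag, prev_tag) rule in order, left to right."""
    tags = list(tags)
    for from_tag, to_tag, prev_tag in rules:
        for i in range(1, len(tags)):
            if tags[i] == from_tag and tags[i - 1] == prev_tag:
                tags[i] = to_tag
    return tags


def tag(words, lexicon, trigrams, rules):
    """Initial tagging (lexicon plus trigram fallback), then rule correction."""
    return apply_rules(initial_tags(words, lexicon, trigrams), rules)


# Toy English data, used only to keep the example readable.
corpus = [
    [("the", "DT"), ("dog", "NN"), ("runs", "VB")],
    [("the", "DT"), ("cat", "NN"), ("sleeps", "VB")],
    [("to", "TO"), ("run", "VB")],
]
rules = [("NN", "VB", "TO")]  # change NN to VB when the previous tag is TO

lexicon, trigrams = train(corpus)
unknown_result = tag(["the", "bird", "runs"], lexicon, trigrams, rules)
rule_result = tag(["to", "dog"], lexicon, trigrams, rules)
print(unknown_result)  # ['DT', 'NN', 'VB'] ("bird" is unknown)
print(rule_result)     # ['TO', 'VB'] (the rule rewrites NN after TO)
```

In this sketch the trigram counts are consulted only when a word is missing from the lexicon, and the transformation rules then run over the whole sequence. A real system would learn the rules from data and would likely smooth the trigram estimates instead of taking raw counts.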

Original language: English
Title of host publication: ICSC 2007 International Conference on Semantic Computing
Pages: 482-489
Number of pages: 8
Publication status: Published - 2007 Dec 1
Event: ICSC 2007 International Conference on Semantic Computing - Irvine CA, United States
Duration: 2007 Sept 17 → 2007 Sept 19

Publication series

Name: ICSC 2007 International Conference on Semantic Computing

Conference

Conference: ICSC 2007 International Conference on Semantic Computing
Country/Territory: United States
City: Irvine CA
Period: 07/9/17 → 07/9/19

ASJC Scopus subject areas

  • Computer Science(all)
  • Computer Science Applications
