Khmer POS tagger: A transformation-based approach with hybrid unknown word handling

Chenda Nou, Wataru Kameyama

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

4 Citations (Scopus)

Abstract

This paper presents initial research on a Khmer part-of-speech tagger. We propose modifications to the rule-application algorithm of the transformation-based approach to adapt it to the Khmer language, which is morphologically and syntactically different from English. Furthermore, to overcome the limited coverage of the rule-based approach in handling unknown words, we propose a hybrid approach that combines the rule-based and trigram models. Although trained on a very small corpus, both proposed approaches achieve higher accuracy than the conventional methods. The tagger achieves 95.27% accuracy on training data and 91.96% on test data, which includes 9% unknown words.
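For readers unfamiliar with transformation-based tagging, the following is a minimal generic sketch of the idea in Python: assign each token an initial tag, then apply an ordered list of contextual rewrite rules. It is not the authors' modified rule-application algorithm, which the abstract does not describe. The lexicon, tag names, example rule, and the `default_tag` fallback for unknown words are all hypothetical, and the fallback is a plain default rather than the paper's hybrid rule-based and trigram handling.

```python
from typing import NamedTuple


class Rule(NamedTuple):
    """Hypothetical contextual rule: rewrite from_tag to to_tag
    when the preceding token is tagged prev_tag."""
    from_tag: str
    to_tag: str
    prev_tag: str


def initial_tags(tokens, lexicon, default_tag="NOUN"):
    # Known words get their lexicon tag; unknown words get a default tag.
    # (Assumption: a simple stand-in, not the paper's unknown-word handling.)
    return [lexicon.get(tok, default_tag) for tok in tokens]


def apply_rules(tags, rules):
    # Apply each rule in order, scanning the sentence left to right.
    tags = list(tags)
    for rule in rules:
        for i in range(1, len(tags)):
            if tags[i] == rule.from_tag and tags[i - 1] == rule.prev_tag:
                tags[i] = rule.to_tag
    return tags


# Toy English data, chosen only for illustration.
lexicon = {"the": "DET", "can": "MODAL", "rusted": "VERB"}
rules = [Rule(from_tag="MODAL", to_tag="NOUN", prev_tag="DET")]

tokens = ["the", "can", "rusted"]
tags = apply_rules(initial_tags(tokens, lexicon), rules)
print(list(zip(tokens, tags)))
```

In this toy case the initial tag of "can" is corrected because it follows a determiner. In a trained tagger the rule list would be learned from an annotated corpus rather than written by hand.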

Original language: English
Title of host publication: ICSC 2007 International Conference on Semantic Computing
Pages: 482-489
Number of pages: 8
DOIs: https://doi.org/10.1109/ICSC.2007.104
Publication status: Published - 2007 Dec 1
Event: ICSC 2007 International Conference on Semantic Computing - Irvine CA, United States
Duration: 2007 Sep 17 to 2007 Sep 19

Publication series

Name: ICSC 2007 International Conference on Semantic Computing

Conference

Conference: ICSC 2007 International Conference on Semantic Computing
Country: United States
City: Irvine CA
Period: 07/9/17 to 07/9/19

ASJC Scopus subject areas

  • Computer Science (all)
  • Computer Science Applications

Cite this

Nou, C., & Kameyama, W. (2007). Khmer POS tagger: A transformation-based approach with hybrid unknown word handling. In ICSC 2007 International Conference on Semantic Computing (pp. 482-489). [4338385] (ICSC 2007 International Conference on Semantic Computing). https://doi.org/10.1109/ICSC.2007.104