Khmer POS tagger: A transformation-based approach with hybrid unknown word handling

Chenda Nou*, Wataru Kameyama

*Corresponding author for this work

Research output: Conference contribution

5 Citations (Scopus)

Abstract

This paper presents initial research on a Khmer part-of-speech tagger. We propose modifications to the rule-application algorithm of the transformation-based approach to adapt it to the Khmer language, which differs morphologically and syntactically from English. Furthermore, to overcome the limited coverage of the rule-based approach in handling unknown words, we propose a hybrid approach that combines the rule-based and trigram models. Although trained on a very small corpus, both proposed approaches achieve higher accuracy than the conventional methods. The tagger achieves 95.27% accuracy on the training data and 91.96% on the test data, which includes 9% unknown words.

Original language: English
Title of host publication: ICSC 2007 International Conference on Semantic Computing
Pages: 482-489
Number of pages: 8
DOI
Publication status: Published - Dec 1 2007
Event: ICSC 2007 International Conference on Semantic Computing - Irvine CA, United States
Duration: Sep 17 2007 to Sep 19 2007

Publication series

Name: ICSC 2007 International Conference on Semantic Computing

Conference

Conference: ICSC 2007 International Conference on Semantic Computing
Country/Territory: United States
City: Irvine CA
Period: 07/9/17 to 07/9/19

ASJC Scopus subject areas

  • General Computer Science
  • Computer Science Applications
