TY - GEN
T1 - Khmer POS tagger
T2 - ICSC 2007 International Conference on Semantic Computing
AU - Nou, Chenda
AU - Kameyama, Wataru
PY - 2007/12/1
Y1 - 2007/12/1
N2 - This paper presents initial research on a Khmer part-of-speech tagger. We propose modifications to the rule-application algorithm of the transformation-based approach to adapt it to the Khmer language, which is morphologically and syntactically different from English. Furthermore, to overcome the limited coverage of the rule-based approach in handling unknown words, we propose a hybrid approach that combines the rule-based and trigram models. Although trained on a very small corpus, both proposed approaches achieve higher accuracy than the conventional methods. The tagger achieves 95.27% accuracy on training data and 91.96% on test data, which includes 9% unknown words.
AB - This paper presents initial research on a Khmer part-of-speech tagger. We propose modifications to the rule-application algorithm of the transformation-based approach to adapt it to the Khmer language, which is morphologically and syntactically different from English. Furthermore, to overcome the limited coverage of the rule-based approach in handling unknown words, we propose a hybrid approach that combines the rule-based and trigram models. Although trained on a very small corpus, both proposed approaches achieve higher accuracy than the conventional methods. The tagger achieves 95.27% accuracy on training data and 91.96% on test data, which includes 9% unknown words.
UR - http://www.scopus.com/inward/record.url?scp=47749121029&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=47749121029&partnerID=8YFLogxK
U2 - 10.1109/ICSC.2007.104
DO - 10.1109/ICSC.2007.104
M3 - Conference contribution
AN - SCOPUS:47749121029
SN - 0769529976
SN - 9780769529974
T3 - ICSC 2007 International Conference on Semantic Computing
SP - 482
EP - 489
BT - ICSC 2007 International Conference on Semantic Computing
Y2 - 17 September 2007 through 19 September 2007
ER -