Hybrid approach for Khmer unknown word POS guessing

Chenda Nou, Wataru Kameyama

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Because new words are created every day and the lexicon is not large enough to cover all words, unknown words are a serious problem in part-of-speech tagging. This paper presents a hybrid approach to handling unknown words in Khmer part-of-speech tagging. The approach combines a rule-based model and a trigram model, using both the internal structure of a word and its surrounding context to predict the part of speech of unknown words. It achieves accuracies of 88.9% and 78.2% on the training and test data, respectively.
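The abstract does not specify the rules, the tag set, or exactly how the two models are combined. Purely as an illustration, the sketch below assumes a toy tag set, two Khmer nominalizing prefixes (ការ and អ្នក) plus a digit check as the "internal structure" rules, an add-one smoothed tag-trigram model as the "contextual" part, and combines the two by multiplying the rule score by the trigram probability. All names (`rule_scores`, `train_trigrams`, `trigram_prob`, `guess_pos`) and rules are hypothetical; this is not the authors' implementation.

```python
from collections import Counter, defaultdict

# Illustrative tag set and rules (assumptions, not the paper's actual ones).
TAGS = ["NOUN", "VERB", "ADJ", "NUM"]
OPEN_CLASS = ["NOUN", "VERB", "ADJ"]
PREFIX_RULES = {
    "ការ": "NOUN",   # nominalizing prefix, e.g. ការងារ "work"
    "អ្នក": "NOUN",  # agentive prefix, e.g. អ្នកនិពន្ធ "writer"
}


def rule_scores(word):
    """Rule-based part: score candidate tags from the word's internal structure."""
    if word.isdigit():  # Unicode-aware, so Khmer digits ០-៩ also match
        return {"NUM": 1.0}
    for prefix, tag in PREFIX_RULES.items():
        if word.startswith(prefix) and len(word) > len(prefix):
            return {tag: 1.0}
    # No rule fired: every open-class tag stays a candidate with equal weight.
    return {t: 1.0 / len(OPEN_CLASS) for t in OPEN_CLASS}


def train_trigrams(tagged_sentences):
    """Count tag trigrams in sentences given as lists of (word, tag) pairs."""
    tri = defaultdict(Counter)
    for sent in tagged_sentences:
        tags = ["<s>", "<s>"] + [tag for _, tag in sent]
        for t2, t1, t in zip(tags, tags[1:], tags[2:]):
            tri[(t2, t1)][t] += 1
    return tri


def trigram_prob(tri, t2, t1, t, alpha=1.0):
    """Contextual part: add-alpha smoothed estimate of P(t | t2, t1)."""
    counts = tri.get((t2, t1), Counter())
    total = sum(counts.values())
    return (counts[t] + alpha) / (total + alpha * len(TAGS))


def guess_pos(word, t2, t1, tri):
    """Pick the tag maximizing rule score times trigram probability."""
    scores = rule_scores(word)
    return max(scores, key=lambda t: scores[t] * trigram_prob(tri, t2, t1, t))
```

In this sketch, an unknown word that matches no rule is assigned whichever open-class tag most often followed the two preceding tags in training, while a word beginning with ការ is restricted to NOUN by the rule regardless of context. Multiplying the two scores is only one simple way to let the word-internal evidence narrow the candidates and the context choose among them.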

Original language: English
Title of host publication: 2007 IEEE International Conference on Information Reuse and Integration, IEEE IRI-2007
Pages: 215-220
Number of pages: 6
DOIs: https://doi.org/10.1109/IRI.2007.4296623
Publication status: Published - 2007 Dec 1
Event: 2007 IEEE International Conference on Information Reuse and Integration, IEEE IRI-2007 - Las Vegas, NV, United States
Duration: 2007 Aug 13 to 2007 Aug 15

Publication series

Name: 2007 IEEE International Conference on Information Reuse and Integration, IEEE IRI-2007

Conference

Conference: 2007 IEEE International Conference on Information Reuse and Integration, IEEE IRI-2007
Country: United States
City: Las Vegas, NV
Period: 07/8/13 to 07/8/15

ASJC Scopus subject areas

  • Information Systems
  • Information Systems and Management
  • Electrical and Electronic Engineering


Cite this

    Nou, C., & Kameyama, W. (2007). Hybrid approach for Khmer unknown word POS guessing. In 2007 IEEE International Conference on Information Reuse and Integration, IEEE IRI-2007 (pp. 215-220). [4296623] (2007 IEEE International Conference on Information Reuse and Integration, IEEE IRI-2007). https://doi.org/10.1109/IRI.2007.4296623