Hybrid approach for Khmer unknown word POS guessing

Chenda Nou, Wataru Kameyama

    Research output: Conference contribution

    Abstract

    Because new words are created every day and no lexicon is large enough to cover them all, unknown words are a serious problem in part-of-speech tagging. This paper presents a hybrid approach to handling the unknown word problem in Khmer part-of-speech tagging. The hybrid approach, which combines a rule-based model and a trigram model, uses both the internal structure of the word and the surrounding contextual information to predict the part of speech of unknown words. The proposed approach achieves accuracies of 88.9% and 78.2% on training and test data, respectively.

    Original language: English
    Title of host publication: 2007 IEEE International Conference on Information Reuse and Integration, IEEE IRI-2007
    Pages: 215-220
    Number of pages: 6
    DOI: https://doi.org/10.1109/IRI.2007.4296623
    Publication status: Published - 2007
    Event: 2007 IEEE International Conference on Information Reuse and Integration, IEEE IRI-2007 - Las Vegas, NV
    Duration: 13 Aug 2007 to 15 Aug 2007

    ASJC Scopus subject areas

    • Information Systems
    • Information Systems and Management
    • Electrical and Electronic Engineering

    Cite this

    Nou, C., & Kameyama, W. (2007). Hybrid approach for Khmer unknown word POS guessing. In 2007 IEEE International Conference on Information Reuse and Integration, IEEE IRI-2007 (pp. 215-220). [4296623] https://doi.org/10.1109/IRI.2007.4296623