Hybrid approach for Khmer unknown word POS guessing

Chenda Nou, Wataru Kameyama

    Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

    Abstract

    Because new words are created every day and the lexicon is not large enough to cover all of them, unknown words become a serious problem in part-of-speech tagging. This paper presents a hybrid approach to handling the unknown word problem in Khmer part-of-speech tagging. The approach combines a rule-based model with a trigram model and uses both the internal structure of a word and its surrounding context to predict the part of speech of unknown words. It achieves accuracies of 88.9% and 78.2% on the training and test data, respectively.
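To make the idea of combining word-internal rules with trigram context concrete, here is a minimal Python sketch. It is not the authors' implementation: the prefix rules, the tag set, the toy corpus, and the names `rule_candidates`, `train_trigrams`, and `guess_tag` are all hypothetical, and the English-like examples stand in for Khmer data that the abstract does not provide. The rules narrow the candidate tags for an unknown word, and a smoothed tag-trigram score then chooses among those candidates.

```python
from collections import Counter

# Hypothetical toy rules: a word-internal cue (here a prefix) maps to the
# tags it allows. These are placeholders, not real Khmer morphology.
PREFIX_RULES = {
    "un": {"ADJ", "VERB"},
    "re": {"VERB", "NOUN"},
}
# Assumption: unknown words are only assigned open-class tags.
OPEN_TAGS = {"NOUN", "VERB", "ADJ"}


def rule_candidates(word):
    """Return the tags allowed by the first matching prefix rule, else all open tags."""
    for prefix, tags in PREFIX_RULES.items():
        if word.startswith(prefix):
            return set(tags)
    return set(OPEN_TAGS)


def train_trigrams(tagged_sentences):
    """Count tag trigrams (t1, t2, t3) and their two-tag contexts (t1, t2)."""
    tri, bi = Counter(), Counter()
    for sent in tagged_sentences:
        tags = ["<s>", "<s>"] + [tag for _, tag in sent]
        for i in range(2, len(tags)):
            tri[tuple(tags[i - 2:i + 1])] += 1
            bi[tuple(tags[i - 2:i])] += 1
    return tri, bi


def guess_tag(word, prev_tags, tri, bi):
    """Pick the rule-allowed tag with the highest add-one-smoothed trigram score."""
    # Pad with sentence-start markers so short histories still give two tags.
    context = tuple((["<s>", "<s>"] + list(prev_tags))[-2:])

    def score(tag):
        return (tri[context + (tag,)] + 1) / (bi[context] + len(OPEN_TAGS))

    # sorted() makes tie-breaking deterministic (alphabetical order).
    return max(sorted(rule_candidates(word)), key=score)


# Toy training corpus (hypothetical, for illustration only).
corpus = [
    [("the", "DET"), ("dog", "NOUN"), ("runs", "VERB")],
    [("the", "DET"), ("cat", "NOUN"), ("sleeps", "VERB")],
    [("a", "DET"), ("big", "ADJ"), ("dog", "NOUN")],
]
tri, bi = train_trigrams(corpus)

# "rebuilds" is unknown; the "re" rule allows VERB or NOUN, and the
# context (DET, NOUN) favours VERB in the toy corpus.
print(guess_tag("rebuilds", ["DET", "NOUN"], tri, bi))
```

In this sketch the rules act as a filter and the trigram counts act as a ranker. The abstract does not say how the two models are actually combined in the paper, so other combinations are equally plausible, for example falling back to the trigram model only when no rule fires.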

    Original language: English
    Title of host publication: 2007 IEEE International Conference on Information Reuse and Integration, IEEE IRI-2007
    Pages: 215-220
    Number of pages: 6
    DOI: https://doi.org/10.1109/IRI.2007.4296623
    Publication status: Published - 2007
    Event: 2007 IEEE International Conference on Information Reuse and Integration, IEEE IRI-2007 - Las Vegas, NV
    Duration: 2007 Aug 13 to 2007 Aug 15

    Fingerprint

    Hybrid approach
    Tagging
    Rule-based

    ASJC Scopus subject areas

    • Information Systems
    • Information Systems and Management
    • Electrical and Electronic Engineering

    Cite this

    Nou, C., & Kameyama, W. (2007). Hybrid approach for Khmer unknown word POS guessing. In 2007 IEEE International Conference on Information Reuse and Integration, IEEE IRI-2007 (pp. 215-220). [4296623] https://doi.org/10.1109/IRI.2007.4296623

    @inproceedings{eb19badae40e4e7e8de0748be66c74eb,
    title = "Hybrid approach for Khmer unknown word POS guessing",
    abstract = "Because new words are created every day and the lexicon is not large enough to cover all of them, unknown words become a serious problem in part-of-speech tagging. This paper presents a hybrid approach to handling the unknown word problem in Khmer part-of-speech tagging. The approach combines a rule-based model with a trigram model and uses both the internal structure of a word and its surrounding context to predict the part of speech of unknown words. It achieves accuracies of 88.9{\%} and 78.2{\%} on the training and test data, respectively.",
    author = "Chenda Nou and Wataru Kameyama",
    year = "2007",
    doi = "10.1109/IRI.2007.4296623",
    language = "English",
    isbn = "1424414997",
    pages = "215--220",
    booktitle = "2007 IEEE International Conference on Information Reuse and Integration, IEEE IRI-2007",
    }

    TY - GEN
    T1 - Hybrid approach for Khmer unknown word POS guessing
    AU - Nou, Chenda
    AU - Kameyama, Wataru
    PY - 2007
    Y1 - 2007
    N2 - Because new words are created every day and the lexicon is not large enough to cover all of them, unknown words become a serious problem in part-of-speech tagging. This paper presents a hybrid approach to handling the unknown word problem in Khmer part-of-speech tagging. The approach combines a rule-based model with a trigram model and uses both the internal structure of a word and its surrounding context to predict the part of speech of unknown words. It achieves accuracies of 88.9% and 78.2% on the training and test data, respectively.
    UR - http://www.scopus.com/inward/record.url?scp=47949109029&partnerID=8YFLogxK
    UR - http://www.scopus.com/inward/citedby.url?scp=47949109029&partnerID=8YFLogxK
    U2 - 10.1109/IRI.2007.4296623
    DO - 10.1109/IRI.2007.4296623
    M3 - Conference contribution
    AN - SCOPUS:47949109029
    SN - 1424414997
    SN - 9781424414994
    SP - 215
    EP - 220
    BT - 2007 IEEE International Conference on Information Reuse and Integration, IEEE IRI-2007
    ER -