Transformation-based Khmer Part-of-Speech tagger

Chenda Nou, Wataru Kameyama

    Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

    2 Citations (Scopus)

    Abstract

    This paper presents initial research on a Khmer Part-of-Speech (POS) tagger based on the transformation-based approach. Because little research has been done on natural language processing for Khmer, many pre-processing tasks are needed before automatic tagging can take place. The first annotated Khmer corpus is tagged with 27 tags based on traditional and modern grammar theories. The learner, based on the learning algorithm introduced by Brill [2], is built with 32 transformation templates. After applying the transformation rules with our ranking algorithm, the tagging error rate on trained and untrained data is reduced by around 41% and 18%, respectively, relative to the baseline. The experimental results are encouraging; however, future work is identified to improve the accuracy and performance of the tagger.
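The abstract names Brill-style transformation-based learning without showing it, so the following is a minimal sketch of the general idea, not the authors' implementation. It assumes a single hypothetical template ("change tag A to B when the previous tag is Z") rather than the paper's 32 templates, uses a toy three-tag set (`DET`, `N`, `V`) rather than the 27-tag Khmer tagset, and ranks rules by plain net error reduction, which is a stand-in for the paper's ranking algorithm. All names (`Rule`, `apply_rule`, `learn`, `relative_error_reduction`) and the toy data are illustrative assumptions.

```python
from typing import List, NamedTuple, Set, Tuple


class Rule(NamedTuple):
    """Hypothetical template: change from_tag to to_tag when the previous tag is prev_tag."""
    from_tag: str
    to_tag: str
    prev_tag: str


def apply_rule(tags: List[str], rule: Rule) -> List[str]:
    """Apply the rule at every matching position, testing context on the input tags."""
    out = list(tags)
    for i in range(1, len(tags)):
        if tags[i] == rule.from_tag and tags[i - 1] == rule.prev_tag:
            out[i] = rule.to_tag
    return out


def errors(tags: List[str], gold: List[str]) -> int:
    """Count positions where the current tag differs from the gold tag."""
    return sum(t != g for t, g in zip(tags, gold))


def candidate_rules(tags: List[str], gold: List[str]) -> Set[Rule]:
    """Instantiate the template at every position that is currently mistagged."""
    return {
        Rule(tags[i], gold[i], tags[i - 1])
        for i in range(1, len(tags))
        if tags[i] != gold[i]
    }


def learn(tags: List[str], gold: List[str]) -> Tuple[List[Rule], List[str]]:
    """Greedily pick the rule with the largest net error reduction until none helps."""
    learned: List[Rule] = []
    current = list(tags)
    while True:
        base = errors(current, gold)
        best, best_gain = None, 0
        for rule in sorted(candidate_rules(current, gold)):
            gain = base - errors(apply_rule(current, rule), gold)
            if gain > best_gain:
                best, best_gain = rule, gain
        if best is None:
            break
        learned.append(best)
        current = apply_rule(current, best)
    return learned, current


def relative_error_reduction(errors_before: int, errors_after: int) -> float:
    """Fraction of baseline errors removed: (before - after) / before."""
    return (errors_before - errors_after) / errors_before


# Toy data (invented for illustration): the baseline mistags nouns after DET as verbs.
gold = ["DET", "N", "V", "DET", "N", "V", "N"]
baseline = ["DET", "V", "V", "DET", "V", "V", "N"]

if __name__ == "__main__":
    rules, final_tags = learn(baseline, gold)
    print(rules)
    print(relative_error_reduction(errors(baseline, gold), errors(final_tags, gold)))
```

If the 41% and 18% figures in the abstract are read as relative reductions (an assumption, since the abstract does not define them), they correspond to the quantity computed by `relative_error_reduction` on the trained and untrained data respectively.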

    Original language: English
    Title of host publication: Proceedings of the 2007 International Conference on Artificial Intelligence, ICAI 2007
    Pages: 581-587
    Number of pages: 7
    Volume: 2
    Publication status: Published - 2007
    Event: 2007 International Conference on Artificial Intelligence, ICAI 2007 - Las Vegas, NV
    Duration: 2007 Jun 25 to 2007 Jun 28

    Fingerprint

    Processing
    Learning algorithms
    Experiments

    Keywords

    • Automatic learning
    • Corpus-based
    • Khmer Part-of-Speech tagging
    • Natural language processing
    • Transformation-based tagger

    ASJC Scopus subject areas

    • Artificial Intelligence

    Cite this

    Nou, C., & Kameyama, W. (2007). Transformation-based Khmer Part-of-Speech tagger. In Proceedings of the 2007 International Conference on Artificial Intelligence, ICAI 2007 (Vol. 2, pp. 581-587)

    Transformation-based Khmer Part-of-Speech tagger. / Nou, Chenda; Kameyama, Wataru.

    Proceedings of the 2007 International Conference on Artificial Intelligence, ICAI 2007. Vol. 2 2007. p. 581-587.

    Nou, C & Kameyama, W 2007, Transformation-based Khmer Part-of-Speech tagger. in Proceedings of the 2007 International Conference on Artificial Intelligence, ICAI 2007. vol. 2, pp. 581-587, 2007 International Conference on Artificial Intelligence, ICAI 2007, Las Vegas, NV, 07/6/25.
    Nou C, Kameyama W. Transformation-based Khmer Part-of-Speech tagger. In Proceedings of the 2007 International Conference on Artificial Intelligence, ICAI 2007. Vol. 2. 2007. p. 581-587
    Nou, Chenda ; Kameyama, Wataru. / Transformation-based Khmer Part-of-Speech tagger. Proceedings of the 2007 International Conference on Artificial Intelligence, ICAI 2007. Vol. 2 2007. pp. 581-587
    @inproceedings{21f97252ab0243f796306756a5571e66,
    title = "Transformation-based Khmer Part-of-Speech tagger",
    abstract = "This paper presents initial research on a Khmer Part-of-Speech (POS) tagger based on the transformation-based approach. Because little research has been done on natural language processing for Khmer, many pre-processing tasks are needed before automatic tagging can take place. The first annotated Khmer corpus is tagged with 27 tags based on traditional and modern grammar theories. The learner, based on the learning algorithm introduced by Brill [2], is built with 32 transformation templates. After applying the transformation rules with our ranking algorithm, the tagging error rate on trained and untrained data is reduced by around 41{\%} and 18{\%}, respectively, relative to the baseline. The experimental results are encouraging; however, future work is identified to improve the accuracy and performance of the tagger.",
    keywords = "Automatic learning, Corpus-based, Khmer Part-of-Speech tagging, Natural language processing, Transformation-based tagger",
    author = "Chenda Nou and Wataru Kameyama",
    year = "2007",
    language = "English",
    isbn = "9781601320254",
    volume = "2",
    pages = "581--587",
    booktitle = "Proceedings of the 2007 International Conference on Artificial Intelligence, ICAI 2007",

    }

    TY - GEN

    T1 - Transformation-based Khmer Part-of-Speech tagger

    AU - Nou, Chenda

    AU - Kameyama, Wataru

    PY - 2007

    Y1 - 2007

    N2 - This paper presents initial research on a Khmer Part-of-Speech (POS) tagger based on the transformation-based approach. Because little research has been done on natural language processing for Khmer, many pre-processing tasks are needed before automatic tagging can take place. The first annotated Khmer corpus is tagged with 27 tags based on traditional and modern grammar theories. The learner, based on the learning algorithm introduced by Brill [2], is built with 32 transformation templates. After applying the transformation rules with our ranking algorithm, the tagging error rate on trained and untrained data is reduced by around 41% and 18%, respectively, relative to the baseline. The experimental results are encouraging; however, future work is identified to improve the accuracy and performance of the tagger.

    KW - Automatic learning

    KW - Corpus-based

    KW - Khmer Part-of-Speech tagging

    KW - Natural language processing

    KW - Transformation-based tagger

    UR - http://www.scopus.com/inward/record.url?scp=84866524184&partnerID=8YFLogxK

    UR - http://www.scopus.com/inward/citedby.url?scp=84866524184&partnerID=8YFLogxK

    M3 - Conference contribution

    AN - SCOPUS:84866524184

    SN - 9781601320254

    VL - 2

    SP - 581

    EP - 587

    BT - Proceedings of the 2007 International Conference on Artificial Intelligence, ICAI 2007

    ER -