Transformation-based Khmer Part-of-Speech tagger

Chenda Nou*, Wataru Kameyama

*Corresponding author for this work

Research output: Conference contribution

2 Citations (Scopus)

Abstract

This paper introduces initial research on a Khmer Part-of-Speech (POS) tagger based on the transformation-based approach. Because little research has been done on natural language processing for Khmer, many pre-processing tasks are needed before automatic tagging can take place. The first Khmer annotated corpus is tagged with 27 tags based on traditional and modern grammar theories. The learner, based on the learning algorithm introduced by Brill [2], is built with 32 transformation templates. After applying the transformation rules with our sophisticated ranking algorithm, the tagging error rate on trained and untrained data is reduced by around 41% and 18%, respectively, relative to the baseline. The experiments provide very encouraging results; however, some future work is outlined to further improve the accuracy and performance of the tagger.
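The learning loop that the abstract refers to can be illustrated with a minimal sketch of Brill-style transformation-based learning. This is not the authors' implementation: it uses a single hypothetical template ("change tag A to tag B when the previous tag is Z") instead of the paper's 32 templates, toy English words and tags instead of the 27-tag Khmer tagset, and it ranks candidate rules simply by net error reduction rather than by the paper's ranking algorithm. All function names are assumptions made for illustration.

```python
def baseline_tags(words, lexicon, default="NN"):
    """Initial-state annotator: look up each word's tag, else use a default."""
    return [lexicon.get(w, default) for w in words]


def count_errors(tags, gold):
    """Number of positions where the current tags disagree with the gold tags."""
    return sum(1 for t, g in zip(tags, gold) if t != g)


def apply_rule(tags, rule):
    """Apply (from_tag, to_tag, prev_tag); context is read from the input tags."""
    from_tag, to_tag, prev_tag = rule
    out = list(tags)
    for i in range(1, len(tags)):
        if tags[i] == from_tag and tags[i - 1] == prev_tag:
            out[i] = to_tag
    return out


def learn_rules(initial, gold, max_rules=10):
    """Greedily pick the rule with the largest net error reduction each round."""
    current = list(initial)
    rules = []
    for _ in range(max_rules):
        # Instantiate the single template at every position that is wrong.
        candidates = {
            (current[i], gold[i], current[i - 1])
            for i in range(1, len(current))
            if current[i] != gold[i]
        }
        best_rule, best_gain = None, 0
        for rule in sorted(candidates):  # sorted for deterministic tie-breaking
            gain = count_errors(current, gold) - count_errors(apply_rule(current, rule), gold)
            if gain > best_gain:
                best_rule, best_gain = rule, gain
        if best_rule is None:  # no candidate improves the tagging; stop
            break
        current = apply_rule(current, best_rule)
        rules.append(best_rule)
    return rules, current


# Toy data (hypothetical, English for readability).
words = ["the", "race", "is", "on", "to", "race", "home"]
gold = ["DT", "NN", "VB", "IN", "TO", "VB", "NN"]
lexicon = {"the": "DT", "race": "NN", "is": "VB", "on": "IN", "to": "TO", "home": "NN"}

start = baseline_tags(words, lexicon)
rules, tagged = learn_rules(start, gold)
print(rules)
print(tagged)
```

In this toy run the baseline mis-tags the second "race" as a noun, and the learner recovers one rule that retags a noun as a verb when it follows "to". A real tagger would learn an ordered list of many such rules from a much larger annotated corpus and then apply them, in order, to unseen text.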

Original language: English
Title of host publication: Proceedings of the 2007 International Conference on Artificial Intelligence, ICAI 2007
Pages: 581-587
Number of pages: 7
Publication status: Published - Dec 1, 2007
Event: 2007 International Conference on Artificial Intelligence, ICAI 2007 - Las Vegas, NV, United States
Duration: Jun 25, 2007 to Jun 28, 2007

Publication series

Name: Proceedings of the 2007 International Conference on Artificial Intelligence, ICAI 2007
2

Conference

Conference: 2007 International Conference on Artificial Intelligence, ICAI 2007
Country/Territory: United States
City: Las Vegas, NV
Period: 07/6/25 to 07/6/28

ASJC Scopus subject areas

  • Artificial Intelligence
