Refinement of index term set and improvement of classification accuracy on text categorization

Makoto Suzuki*, Takashi Ishida, Masayuki Goto

*Corresponding author for this work

Research output: Conference contribution

4 Citations (Scopus)

Abstract

In our previous paper, we proposed a classification technique called the Frequency Ratio Accumulation Method (FRAM). It is a simple technique that adds up the ratios of term frequency among categories. In FRAM, however, the use of index terms is unlimited, so we adopt character N-grams as index terms, taking advantage of this property of FRAM. In the present paper, we refine the database of the index term set using mutual information and frequency ratio in order to improve classification accuracy. The proposed method is evaluated in several experiments in which we classify English newspaper articles from Reuters-21578, a benchmark data set for automatic text categorization, using FRAM. The results show that the method achieves good classification accuracy: the macro-averaged F-measure of the proposed method is 92.3% for Reuters-21578. The method is language-independent, offers a new perspective on text categorization, and shows strong potential.
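The abstract describes FRAM only in one sentence, so the following Python sketch is an illustration under stated assumptions, not the authors' implementation. It assumes that the frequency ratio of a term for a category is that term's count in the category divided by its count over all categories, that a document's score for a category is the sum of those ratios over its character N-grams, and that the highest score wins. The function names, the N-gram length of 3, and the `min_ratio` threshold are hypothetical; the mutual-information filter mentioned in the abstract is not shown.

```python
from collections import Counter, defaultdict


def char_ngrams(text, n=3):
    """Return all overlapping character n-grams of `text`."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def train(labeled_docs, n=3):
    """Build term -> {category: frequency ratio} from (text, category) pairs.

    Assumed definition: ratio = count of the term in the category
    divided by the count of the term over all categories.
    """
    freq = defaultdict(Counter)  # term -> Counter mapping category -> count
    for text, category in labeled_docs:
        for term in char_ngrams(text, n):
            freq[term][category] += 1
    ratios = {}
    for term, per_cat in freq.items():
        total = sum(per_cat.values())
        ratios[term] = {c: k / total for c, k in per_cat.items()}
    return ratios


def refine(ratios, min_ratio=0.6):
    """Hypothetical refinement: keep terms whose top ratio reaches min_ratio."""
    return {t: r for t, r in ratios.items() if max(r.values()) >= min_ratio}


def classify(text, ratios, n=3):
    """Accumulate frequency ratios per category and return the best category."""
    scores = Counter()
    for term in char_ngrams(text, n):
        for category, ratio in ratios.get(term, {}).items():
            scores[category] += ratio
    return scores.most_common(1)[0][0] if scores else None


# Toy training data (invented for illustration, not taken from Reuters-21578).
docs = [
    ("wheat harvest grain export", "grain"),
    ("corn and wheat prices rise", "grain"),
    ("crude oil barrel prices", "crude"),
    ("oil pipeline crude output", "crude"),
]
model = refine(train(docs))
print(classify("wheat grain shipment", model))
print(classify("crude oil tanker", model))
```

Because the index terms are raw character N-grams, the sketch needs no tokenizer or stemmer, which is consistent with the abstract's claim that the method is language-independent.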

Original language: English
Title of host publication: 2008 International Symposium on Information Theory and its Applications, ISITA2008
Publication status: Published - 1 Dec 2008
Externally published: Yes
Event: 2008 International Symposium on Information Theory and its Applications, ISITA2008 - Auckland, New Zealand
Duration: 7 Dec 2008 to 10 Dec 2008

Publication series

Name: 2008 International Symposium on Information Theory and its Applications, ISITA2008

Conference

Conference: 2008 International Symposium on Information Theory and its Applications, ISITA2008
Country/Territory: New Zealand
City: Auckland
Period: 7 Dec 2008 to 10 Dec 2008

ASJC Scopus subject areas

  • General Computer Science
