Refinement of index term set and improvement of classification accuracy on text categorization

Makoto Suzuki, Takashi Ishida, Masayuki Goto

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

4 Citations (Scopus)

Abstract

In our previous paper, we proposed a new classification technique called the Frequency Ratio Accumulation Method (FRAM), a simple technique that adds up the ratios of term frequency among categories. FRAM places no limit on the index terms that can be used. Building on this property, we adopt character N-grams as index terms. In the present paper, we refine the database of the index term set using mutual information and frequency ratio in order to improve classification accuracy. We evaluate the proposed method in several experiments in which we classify English newspaper articles from Reuters-21578, a benchmark data set for automatic text categorization, using FRAM. The results show good classification accuracy: the macro-averaged F-measure of the proposed method is 92.3% on Reuters-21578. The method is language-independent and offers a new perspective with considerable potential.
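The accumulation step described in the abstract can be illustrated with a minimal Python sketch. It is one plausible reading of "adds up the ratios of term frequency among categories", not the authors' implementation: the frequency ratio of a term for a category is assumed to be that term's count in the category divided by its count across all categories, and a document is assigned to the category with the largest sum of ratios over its character N-grams. The function names (`char_ngrams`, `train_fram`, `classify`), the choice of N = 3, and the toy training texts are all hypothetical, and the refinement of the index term set by mutual information is not shown.

```python
from collections import Counter, defaultdict


def char_ngrams(text, n=3):
    """Return all overlapping character n-grams of text."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def train_fram(labeled_docs, n=3):
    """Build frequency ratios from (text, category) pairs.

    Assumption: ratio[term][cat] = count(term in cat) / count(term in all cats).
    """
    freq = defaultdict(Counter)  # freq[term][category] = occurrence count
    for text, category in labeled_docs:
        for term in char_ngrams(text, n):
            freq[term][category] += 1

    ratios = {}
    for term, per_category in freq.items():
        total = sum(per_category.values())
        ratios[term] = {c: k / total for c, k in per_category.items()}
    return ratios


def classify(text, ratios, n=3):
    """Accumulate frequency ratios per category and return the best category.

    Returns None when no n-gram of the text was seen in training.
    """
    scores = Counter()
    for term in char_ngrams(text, n):
        for category, ratio in ratios.get(term, {}).items():
            scores[category] += ratio
    return max(scores, key=scores.get) if scores else None


# Toy training data, invented for illustration only.
training = [
    ("wheat corn harvest", "grain"),
    ("crude oil barrel", "energy"),
]
model = train_fram(training)
label = classify("corn harvest", model)
```

Because the index terms here are raw character sequences, the sketch needs no tokenizer or stemmer, which is consistent with the abstract's claim of language independence.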

Original language: English
Title of host publication: 2008 International Symposium on Information Theory and its Applications, ISITA2008
DOI: https://doi.org/10.1109/ISITA.2008.4895455
Publication status: Published - 2008 Dec 1
Externally published: Yes
Event: 2008 International Symposium on Information Theory and its Applications, ISITA2008 - Auckland, New Zealand
Duration: 2008 Dec 7 to 2008 Dec 10

Publication series

Name: 2008 International Symposium on Information Theory and its Applications, ISITA2008

Conference

Conference: 2008 International Symposium on Information Theory and its Applications, ISITA2008
Country: New Zealand
City: Auckland
Period: 2008 Dec 7 to 2008 Dec 10

ASJC Scopus subject areas

  • Computer Science (all)

  • Cite this

    Suzuki, M., Ishida, T., & Goto, M. (2008). Refinement of index term set and improvement of classification accuracy on text categorization. In 2008 International Symposium on Information Theory and its Applications, ISITA2008 [4895455] (2008 International Symposium on Information Theory and its Applications, ISITA2008). https://doi.org/10.1109/ISITA.2008.4895455