TY - GEN
T1 - Refinement of index term set and improvement of classification accuracy on text categorization
AU - Suzuki, Makoto
AU - Ishida, Takashi
AU - Goto, Masayuki
PY - 2008/12/1
Y1 - 2008/12/1
N2 - In our previous paper, we proposed a new classification technique called the Frequency Ratio Accumulation Method (FRAM). This simple technique sums the ratios of term frequency among categories. In FRAM, any type of index term can be used. Exploiting this property, we adopt character N-grams as index terms. In the present paper, we refine the database (DB) of index terms using mutual information and the frequency ratio, and thereby improve classification accuracy. The proposed method is evaluated through several experiments in which FRAM is used to classify English newspaper articles from Reuters-21578, a benchmark dataset for automatic text categorization. The results show that the method achieves good classification accuracy: its macro-averaged F-measure on Reuters-21578 is 92.3%. Our method is language-independent, offers a new perspective on text categorization, and shows considerable potential.
AB - In our previous paper, we proposed a new classification technique called the Frequency Ratio Accumulation Method (FRAM). This simple technique sums the ratios of term frequency among categories. In FRAM, any type of index term can be used. Exploiting this property, we adopt character N-grams as index terms. In the present paper, we refine the database (DB) of index terms using mutual information and the frequency ratio, and thereby improve classification accuracy. The proposed method is evaluated through several experiments in which FRAM is used to classify English newspaper articles from Reuters-21578, a benchmark dataset for automatic text categorization. The results show that the method achieves good classification accuracy: its macro-averaged F-measure on Reuters-21578 is 92.3%. Our method is language-independent, offers a new perspective on text categorization, and shows considerable potential.
UR - http://www.scopus.com/inward/record.url?scp=77951126824&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=77951126824&partnerID=8YFLogxK
U2 - 10.1109/ISITA.2008.4895455
DO - 10.1109/ISITA.2008.4895455
M3 - Conference contribution
AN - SCOPUS:77951126824
SN - 9781424420698
T3 - 2008 International Symposium on Information Theory and its Applications, ISITA2008
BT - 2008 International Symposium on Information Theory and its Applications, ISITA2008
T2 - 2008 International Symposium on Information Theory and its Applications, ISITA2008
Y2 - 7 December 2008 through 10 December 2008
ER -