English and Taiwanese text categorization using N-gram based on Vector Space Model

Makoto Suzuki*, Naohide Yamagishi, Yi Ching Tsai, Takashi Ishida, Masayuki Goto

*Corresponding author for this work

Research output: Conference contribution

1 Citation (Scopus)

Abstract

In this paper, we present a new mathematical model for text categorization based on the Vector Space Model and consider its implications. We evaluate the proposed method in several experiments that classify newspaper articles from two sources: the English Reuters-21578 data set, a standard benchmark for automatic text categorization, and the Taiwanese China Times 2005 data set. The results show that FRAM achieves good classification accuracy. The micro-averaged F-measure of the proposed method is 94.5% for English but only 78.0% for Taiwanese. Although the proposed method is language-independent and offers a new perspective, improving its classification accuracy for Taiwanese remains future work.
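The abstract does not spell out the proposed model, so the following is only a generic sketch of the ingredients named in the title and abstract. It shows how character N-grams fit into a vector space model, and how the micro-averaged F-measure used to report the 94.5% and 78.0% figures is computed. It is not FRAM or the authors' model. The nearest-centroid cosine classifier, the bigram size `n=2`, the helper names (`char_ngrams`, `train_centroids`, `classify`, `micro_f1`), and the toy data are all illustrative assumptions.

```python
import math
from collections import Counter

# Hypothetical baseline for illustration only; NOT the paper's FRAM/model.

def char_ngrams(text: str, n: int = 2) -> Counter:
    """Count overlapping character N-grams. No word segmentation is needed,
    so the same code applies to English and to Chinese-script text."""
    text = text.lower()
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(v * b[k] for k, v in a.items() if k in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def train_centroids(docs, labels, n=2):
    """Sum the N-gram vectors of each category's training documents.
    Cosine is scale-invariant, so a summed vector ranks like the mean."""
    centroids = {}
    for doc, label in zip(docs, labels):
        centroids.setdefault(label, Counter()).update(char_ngrams(doc, n))
    return centroids

def classify(doc, centroids, n=2):
    """Assign the category whose centroid is most cosine-similar to doc."""
    vec = char_ngrams(doc, n)
    return max(centroids, key=lambda c: cosine(vec, centroids[c]))

def micro_f1(gold, pred):
    """Micro-averaged F-measure over per-document label sets: pool TP, FP
    and FN across all documents and categories, then compute P, R and F."""
    tp = fp = fn = 0
    for g, p in zip(gold, pred):
        g, p = set(g), set(p)
        tp += len(g & p)
        fp += len(p - g)
        fn += len(g - p)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

if __name__ == "__main__":
    # Toy, made-up training data (hypothetical categories).
    docs = ["oil prices rise", "crude oil output",
            "wheat harvest grows", "grain and wheat exports"]
    labels = ["crude", "crude", "grain", "grain"]
    centroids = train_centroids(docs, labels)
    print(classify("crude oil prices", centroids))
    print(micro_f1([["crude"], ["grain", "wheat"]], [["crude"], ["grain"]]))
```

Micro-averaging pools the counts before dividing, so frequent categories dominate the score. This matters on skewed collections such as Reuters-21578. When every document has exactly one gold label and one predicted label, micro-averaged F equals plain accuracy.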

Original language: English
Host publication title: ISITA/ISSSTA 2010 - 2010 International Symposium on Information Theory and Its Applications
Pages: 106-111
Number of pages: 6
Publication status: Published - 1 Dec 2010
Event: 2010 20th International Symposium on Information Theory and Its Applications, ISITA 2010 and the 2010 20th International Symposium on Spread Spectrum Techniques and Applications, ISSSTA 2010 - Taichung, Taiwan, Province of China
Duration: 17 Oct 2010 to 20 Oct 2010

Publication series

Name: ISITA/ISSSTA 2010 - 2010 International Symposium on Information Theory and Its Applications

Conference

Conference: 2010 20th International Symposium on Information Theory and Its Applications, ISITA 2010 and the 2010 20th International Symposium on Spread Spectrum Techniques and Applications, ISSSTA 2010
Country/Territory: Taiwan, Province of China
City: Taichung
Period: 17 Oct 2010 to 20 Oct 2010

ASJC Scopus subject areas

  • Computational Theory and Mathematics
  • Information Systems
