Statistical evaluation of measure and distance on document classification problems in text mining

Masayuki Goto, Takashi Ishida, Shigeichi Hirasawa

Research output: Conference contribution

4 Citations (Scopus)

Abstract

This paper discusses document classification problems in text mining from the viewpoint of asymptotic statistical analysis. By formulating a statistical hypothesis test specific to the text mining problem, some interesting properties can be made visible. In text mining, several heuristics are applied in practical analysis because of their experimental effectiveness in many case studies. A theoretical explanation of the performance of text mining techniques is required, and this approach gives a clear picture of it. Distance measures in word vector space are used to classify documents. In this paper, the performance of distance measures is also analyzed from the new viewpoint of asymptotic analysis.

Original language: English
Title of host publication: CIT 2007
Subtitle of host publication: 7th IEEE International Conference on Computer and Information Technology
Pages: 674-679
Number of pages: 6
DOI
Publication status: Published - December 1, 2007
Externally published: Yes
Event: CIT 2007: 7th IEEE International Conference on Computer and Information Technology - Aizu-Wakamatsu, Fukushima, Japan
Duration: October 16, 2007 to October 19, 2007

Publication series

Name: CIT 2007: 7th IEEE International Conference on Computer and Information Technology

Conference

Conference: CIT 2007: 7th IEEE International Conference on Computer and Information Technology
Country/Territory: Japan
City: Aizu-Wakamatsu, Fukushima
Period: October 16, 2007 to October 19, 2007

ASJC Scopus subject areas

  • Computer Science Applications
  • Information Systems
  • Software
  • Mathematics (all)

