Statistical evaluation of measure and distance on document classification problems in text mining

Masayuki Goto*, Takashi Ishida, Shigeichi Hirasawa

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

4 Citations (Scopus)

Abstract

This paper discusses document classification problems in text mining from the viewpoint of asymptotic statistical analysis. By formulating a statistical hypothesis test specified as a text mining problem, some interesting properties can be made visible. In text mining, several heuristics are applied in practical analysis because they have proved experimentally effective in many case studies. A theoretical explanation of the performance of text mining techniques is therefore required, and this approach gives a very clear picture. A distance measure in word vector space is used to classify documents. In this paper, the performance of the distance measure is also analyzed from the new viewpoint of asymptotic analysis.

Original language: English
Title of host publication: CIT 2007
Subtitle of host publication: 7th IEEE International Conference on Computer and Information Technology
Pages: 674-679
Number of pages: 6
Publication status: Published - 2007
Externally published: Yes
Event: CIT 2007: 7th IEEE International Conference on Computer and Information Technology - Aizu-Wakamatsu, Fukushima, Japan
Duration: 2007 Oct 16 to 2007 Oct 19

Publication series

Name: CIT 2007: 7th IEEE International Conference on Computer and Information Technology

Conference

Conference: CIT 2007: 7th IEEE International Conference on Computer and Information Technology
Country/Territory: Japan
City: Aizu-Wakamatsu, Fukushima
Period: 07/10/16 to 07/10/19

ASJC Scopus subject areas

  • Computer Science Applications
  • Information Systems
  • Software
  • Mathematics(all)
