Statistical evaluation of measure and distance on document classification problems in text mining

Masayuki Goto, Takashi Ishida, Shigeichi Hirasawa

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

4 Citations (Scopus)

Abstract

This paper discusses document classification problems in text mining from the viewpoint of asymptotic statistical analysis. By formulating a statistical hypothesis test that corresponds to a text mining problem, some interesting properties can be made visible. In text mining, several heuristics are applied in practical analysis because of their empirical effectiveness in many case studies. A theoretical explanation of the performance of text mining techniques is therefore needed, and this approach gives a clear picture of it. A distance measure in the word vector space is used to classify documents. In this paper, the performance of the distance measure is also analyzed from the new viewpoint of asymptotic analysis.
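
The abstract says that a distance measure in the word vector space is used to classify documents, but it does not name the measure or the classification rule. As a minimal sketch, assuming a nearest-centroid rule over relative term-frequency vectors (a common textbook setup, not necessarily the authors' method), the code below compares cosine and Euclidean distance. The function names (`term_vector`, `cosine_distance`, `euclidean`, `centroid`, `classify`) and the toy corpus are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter


def term_vector(text, vocab):
    """Relative term-frequency vector of `text` over a fixed vocabulary (hypothetical helper)."""
    counts = Counter(text.lower().split())
    total = sum(counts[w] for w in vocab) or 1  # avoid division by zero for empty docs
    return [counts[w] / total for w in vocab]


def euclidean(u, v):
    """Euclidean distance between two word vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def cosine_distance(u, v):
    """1 - cosine similarity; returns 1.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0 or nv == 0:
        return 1.0
    return 1.0 - dot / (nu * nv)


def centroid(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def classify(doc, class_docs, vocab, distance=cosine_distance):
    """Assign `doc` to the class whose centroid is nearest under `distance` (assumed rule)."""
    x = term_vector(doc, vocab)
    centroids = {
        label: centroid([term_vector(d, vocab) for d in docs])
        for label, docs in class_docs.items()
    }
    return min(centroids, key=lambda label: distance(x, centroids[label]))


# Toy training corpus (invented for illustration only).
training = {
    "sports": ["the team won the match", "a late goal won the game"],
    "finance": ["the stock market fell", "bond yields and stock prices rose"],
}
vocab = sorted({w for docs in training.values() for d in docs for w in d.split()})

print(classify("the market and stock prices", training, vocab))  # finance
```

Passing `distance=euclidean` instead of the default lets the two measures be compared on the same vectors; on this toy corpus both assign the query to `finance`. Which measure behaves better in general is the kind of question the paper addresses analytically, and this sketch makes no claim about its results.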

Original language: English
Title of host publication: CIT 2007: 7th IEEE International Conference on Computer and Information Technology
Pages: 674-679
Number of pages: 6
DOIs: https://doi.org/10.1109/CIT.2007.4385162
Publication status: Published - 2007
Externally published: Yes
Event: CIT 2007: 7th IEEE International Conference on Computer and Information Technology - Aizu-Wakamatsu, Fukushima
Duration: 2007 Oct 16 to 2007 Oct 19

Fingerprint

Document Classification
Asymptotic analysis
Statistical tests
Text Mining
Vector spaces
Classification Problems
Statistical methods
Evaluation
Distance Measure
Hypothesis Test
Statistical Analysis
Classify
Heuristics
Formulation

ASJC Scopus subject areas

  • Computer Science Applications
  • Information Systems
  • Software
  • Mathematics(all)

Cite this

Goto, M., Ishida, T., & Hirasawa, S. (2007). Statistical evaluation of measure and distance on document classification problems in text mining. In CIT 2007: 7th IEEE International Conference on Computer and Information Technology (pp. 674-679). [4385162] https://doi.org/10.1109/CIT.2007.4385162

Statistical evaluation of measure and distance on document classification problems in text mining. / Goto, Masayuki; Ishida, Takashi; Hirasawa, Shigeichi.

CIT 2007: 7th IEEE International Conference on Computer and Information Technology. 2007. p. 674-679. 4385162.

Goto, M, Ishida, T & Hirasawa, S 2007, Statistical evaluation of measure and distance on document classification problems in text mining. in CIT 2007: 7th IEEE International Conference on Computer and Information Technology, 4385162, pp. 674-679, CIT 2007: 7th IEEE International Conference on Computer and Information Technology, Aizu-Wakamatsu, Fukushima, 07/10/16. https://doi.org/10.1109/CIT.2007.4385162
Goto M, Ishida T, Hirasawa S. Statistical evaluation of measure and distance on document classification problems in text mining. In CIT 2007: 7th IEEE International Conference on Computer and Information Technology. 2007. p. 674-679. 4385162 https://doi.org/10.1109/CIT.2007.4385162
Goto, Masayuki ; Ishida, Takashi ; Hirasawa, Shigeichi. / Statistical evaluation of measure and distance on document classification problems in text mining. CIT 2007: 7th IEEE International Conference on Computer and Information Technology. 2007. pp. 674-679
@inproceedings{07f599ff8e5542bbad04cb5cd252a796,
title = "Statistical evaluation of measure and distance on document classification problems in text mining",
abstract = "This paper discusses document classification problems in text mining from the viewpoint of asymptotic statistical analysis. By formulating a statistical hypothesis test that corresponds to a text mining problem, some interesting properties can be made visible. In text mining, several heuristics are applied in practical analysis because of their empirical effectiveness in many case studies. A theoretical explanation of the performance of text mining techniques is therefore needed, and this approach gives a clear picture of it. A distance measure in the word vector space is used to classify documents. In this paper, the performance of the distance measure is also analyzed from the new viewpoint of asymptotic analysis.",
author = "Masayuki Goto and Takashi Ishida and Shigeichi Hirasawa",
year = "2007",
doi = "10.1109/CIT.2007.4385162",
language = "English",
isbn = "0769529836",
pages = "674--679",
booktitle = "CIT 2007: 7th IEEE International Conference on Computer and Information Technology",

}

TY - GEN

T1 - Statistical evaluation of measure and distance on document classification problems in text mining

AU - Goto, Masayuki

AU - Ishida, Takashi

AU - Hirasawa, Shigeichi

PY - 2007

Y1 - 2007

N2 - This paper discusses document classification problems in text mining from the viewpoint of asymptotic statistical analysis. By formulating a statistical hypothesis test that corresponds to a text mining problem, some interesting properties can be made visible. In text mining, several heuristics are applied in practical analysis because of their empirical effectiveness in many case studies. A theoretical explanation of the performance of text mining techniques is therefore needed, and this approach gives a clear picture of it. A distance measure in the word vector space is used to classify documents. In this paper, the performance of the distance measure is also analyzed from the new viewpoint of asymptotic analysis.

AB - This paper discusses document classification problems in text mining from the viewpoint of asymptotic statistical analysis. By formulating a statistical hypothesis test that corresponds to a text mining problem, some interesting properties can be made visible. In text mining, several heuristics are applied in practical analysis because of their empirical effectiveness in many case studies. A theoretical explanation of the performance of text mining techniques is therefore needed, and this approach gives a clear picture of it. A distance measure in the word vector space is used to classify documents. In this paper, the performance of the distance measure is also analyzed from the new viewpoint of asymptotic analysis.

UR - http://www.scopus.com/inward/record.url?scp=38049023026&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=38049023026&partnerID=8YFLogxK

U2 - 10.1109/CIT.2007.4385162

DO - 10.1109/CIT.2007.4385162

M3 - Conference contribution

SN - 0769529836

SN - 9780769529837

SP - 674

EP - 679

BT - CIT 2007: 7th IEEE International Conference on Computer and Information Technology

ER -