Asymptotic evaluation of distance measure on high dimensional vector spaces in text mining

Masayuki Goto, Takashi Ishida, Makoto Suzuki, Shigeichi Hirasawa

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

2 Citations (Scopus)

Abstract

This paper discusses document classification problems in text mining from the viewpoint of asymptotic statistical analysis. In text mining, several heuristics are applied in practical analysis because they have proved experimentally effective in many case studies. A theoretical explanation of the performance of text mining techniques is nevertheless needed, and such an analysis gives a clearer understanding of these techniques. In this paper, the performance of distance measures used to classify documents is analyzed from the new viewpoint of asymptotic analysis. We also discuss the asymptotic performance of the IDF measure used in the information retrieval field.

Original language: English
Title of host publication: 2008 International Symposium on Information Theory and its Applications, ISITA2008
DOIs: https://doi.org/10.1109/ISITA.2008.4895453
Publication status: Published - 2008
Externally published: Yes
Event: 2008 International Symposium on Information Theory and its Applications, ISITA2008 - Auckland
Duration: 2008 Dec 7 to 2008 Dec 10

Fingerprint

Asymptotic analysis
Vector spaces
Information retrieval
Statistical methods

ASJC Scopus subject areas

  • Computer Science(all)

Cite this

Goto, M., Ishida, T., Suzuki, M., & Hirasawa, S. (2008). Asymptotic evaluation of distance measure on high dimensional vector spaces in text mining. In 2008 International Symposium on Information Theory and its Applications, ISITA2008 [4895453] https://doi.org/10.1109/ISITA.2008.4895453

@inproceedings{2354fc4170fb4a06a0a6f7897212712f,
title = "Asymptotic evaluation of distance measure on high dimensional vector spaces in text mining",
abstract = "This paper discusses document classification problems in text mining from the viewpoint of asymptotic statistical analysis. In text mining, several heuristics are applied in practical analysis because they have proved experimentally effective in many case studies. A theoretical explanation of the performance of text mining techniques is nevertheless needed, and such an analysis gives a clearer understanding of these techniques. In this paper, the performance of distance measures used to classify documents is analyzed from the new viewpoint of asymptotic analysis. We also discuss the asymptotic performance of the IDF measure used in the information retrieval field.",
author = "Masayuki Goto and Takashi Ishida and Makoto Suzuki and Shigeichi Hirasawa",
year = "2008",
doi = "10.1109/ISITA.2008.4895453",
language = "English",
isbn = "9781424420698",
booktitle = "2008 International Symposium on Information Theory and its Applications, ISITA2008",

}

TY - GEN

T1 - Asymptotic evaluation of distance measure on high dimensional vector spaces in text mining

AU - Goto, Masayuki

AU - Ishida, Takashi

AU - Suzuki, Makoto

AU - Hirasawa, Shigeichi

PY - 2008

Y1 - 2008

N2 - This paper discusses document classification problems in text mining from the viewpoint of asymptotic statistical analysis. In text mining, several heuristics are applied in practical analysis because they have proved experimentally effective in many case studies. A theoretical explanation of the performance of text mining techniques is nevertheless needed, and such an analysis gives a clearer understanding of these techniques. In this paper, the performance of distance measures used to classify documents is analyzed from the new viewpoint of asymptotic analysis. We also discuss the asymptotic performance of the IDF measure used in the information retrieval field.

UR - http://www.scopus.com/inward/record.url?scp=77951132642&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=77951132642&partnerID=8YFLogxK

U2 - 10.1109/ISITA.2008.4895453

DO - 10.1109/ISITA.2008.4895453

M3 - Conference contribution

SN - 9781424420698

BT - 2008 International Symposium on Information Theory and its Applications, ISITA2008

ER -