A study of Bayesian clustering of a document set based on GA

Keiko Aoki, Kazunori Matsumoto, Keiichiro Hoashi, Kazuo Hashimoto

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

In this paper, we propose a new approximate clustering algorithm that improves the precision of top-down clustering. Top-down clustering was proposed by Iwayama et al. to improve clustering speed: the cluster tree is generated by sampling some documents, making a cluster from them, assigning the other documents to the nearest node and, if the number of documents assigned to a node is large, repeating the sampling and clustering from the top down. To improve the precision of the top-down clustering method, we propose selecting documents by applying a GA to decide a quasi-optimum layer and using an MDL criterion to evaluate the layer structure of the cluster tree.
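
The top-down loop described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, and several parts are assumptions made for illustration: documents are bag-of-words counts, nearness is cosine similarity rather than a Bayesian measure, each sampled document seeds one child node, and samples are drawn at random where the paper proposes GA-based selection. The MDL evaluation of the layer structure is not shown. The names `top_down_cluster`, `leaves`, `sample_size` and `max_leaf` are hypothetical.

```python
import random
from collections import Counter
from math import sqrt


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def top_down_cluster(docs, ids, sample_size=2, max_leaf=3, rng=None):
    """Build a cluster tree over docs[i] for i in ids (illustrative only).

    Assumption: random sampling stands in for the GA-based selection,
    and every sampled document becomes the seed of one child node.
    """
    rng = rng or random.Random(0)
    if len(ids) <= max_leaf:
        return {"docs": sorted(ids), "children": []}
    # Sample a few documents to seed the next layer.
    seeds = rng.sample(ids, min(sample_size, len(ids)))
    groups = {s: [s] for s in seeds}
    # Assign every remaining document to its most similar seed.
    for i in ids:
        if i in groups:
            continue
        best = max(seeds, key=lambda s: cosine(docs[i], docs[s]))
        groups[best].append(i)
    children = []
    for members in groups.values():
        if len(members) == len(ids):
            # The sample did not split the node; stop instead of recursing forever.
            children.append({"docs": sorted(members), "children": []})
        else:
            # A node that received many documents is sampled and clustered again.
            children.append(top_down_cluster(docs, members, sample_size, max_leaf, rng))
    return {"docs": sorted(ids), "children": children}


def leaves(tree):
    """Return the document-id lists stored at the leaves of the tree."""
    if not tree["children"]:
        return [tree["docs"]]
    return [leaf for child in tree["children"] for leaf in leaves(child)]


# Toy corpus (made up for this example).
texts = [
    "genetic algorithm crossover mutation",
    "genetic algorithm fitness selection",
    "document retrieval query ranking",
    "document retrieval index query",
    "cluster tree node layer",
    "cluster tree sampling layer",
    "genetic mutation fitness",
    "query ranking index",
]
docs = [Counter(t.split()) for t in texts]
tree = top_down_cluster(docs, list(range(len(docs))))
```

Calling `leaves(tree)` lists the final clusters. In this sketch the quality of the tree depends entirely on which documents are sampled at each layer, which is the choice the paper proposes to make with a GA and to score with an MDL criterion.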

Original language: English
Title of host publication: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Publisher: Springer Verlag
Pages: 260-267
Number of pages: 8
Volume: 1585
ISBN (Print): 3540659072, 9783540659075
Publication status: Published - 1999
Externally published: Yes
Event: 2nd Asia-Pacific Conference on Simulated Evolution and Learning, SEAL 1998 - Canberra, Australia
Duration: 1998 Nov 24 to 1998 Nov 27

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 1585
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Other

Other: 2nd Asia-Pacific Conference on Simulated Evolution and Learning, SEAL 1998
Country: Australia
City: Canberra
Period: 98/11/24 to 98/11/27

Fingerprint

Clustering
Sampling
Clustering algorithms
Approximate Algorithm
Clustering Methods
Clustering Algorithm
Gas
Vertex of a graph

Keywords

  • Bayesian clustering
  • Document retrieval
  • Genetic algorithm
  • Minimum description length criteria

ASJC Scopus subject areas

  • Computer Science (all)
  • Theoretical Computer Science

Cite this

Aoki, K., Matsumoto, K., Hoashi, K., & Hashimoto, K. (1999). A study of Bayesian clustering of a document set based on GA. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 1585, pp. 260-267). (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 1585). Springer Verlag.

@inproceedings{baa898c130224c908c1c5bd046915286,
title = "A study of Bayesian clustering of a document set based on GA",
abstract = "In this paper, we propose a new approximate clustering algorithm that improves the precision of top-down clustering. Top-down clustering was proposed by Iwayama et al. to improve clustering speed: the cluster tree is generated by sampling some documents, making a cluster from them, assigning the other documents to the nearest node and, if the number of documents assigned to a node is large, repeating the sampling and clustering from the top down. To improve the precision of the top-down clustering method, we propose selecting documents by applying a GA to decide a quasi-optimum layer and using an MDL criterion to evaluate the layer structure of the cluster tree.",
keywords = "Bayesian clustering, Document retrieval, Genetic algorithm, Minimum description length criteria",
author = "Keiko Aoki and Kazunori Matsumoto and Keiichiro Hoashi and Kazuo Hashimoto",
year = "1999",
language = "English",
isbn = "3540659072",
volume = "1585",
series = "Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)",
publisher = "Springer Verlag",
pages = "260--267",
booktitle = "Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)",

}

TY - GEN

T1 - A study of Bayesian clustering of a document set based on GA

AU - Aoki, Keiko

AU - Matsumoto, Kazunori

AU - Hoashi, Keiichiro

AU - Hashimoto, Kazuo

PY - 1999

Y1 - 1999

N2 - In this paper, we propose a new approximate clustering algorithm that improves the precision of top-down clustering. Top-down clustering was proposed by Iwayama et al. to improve clustering speed: the cluster tree is generated by sampling some documents, making a cluster from them, assigning the other documents to the nearest node and, if the number of documents assigned to a node is large, repeating the sampling and clustering from the top down. To improve the precision of the top-down clustering method, we propose selecting documents by applying a GA to decide a quasi-optimum layer and using an MDL criterion to evaluate the layer structure of the cluster tree.

AB - In this paper, we propose a new approximate clustering algorithm that improves the precision of top-down clustering. Top-down clustering was proposed by Iwayama et al. to improve clustering speed: the cluster tree is generated by sampling some documents, making a cluster from them, assigning the other documents to the nearest node and, if the number of documents assigned to a node is large, repeating the sampling and clustering from the top down. To improve the precision of the top-down clustering method, we propose selecting documents by applying a GA to decide a quasi-optimum layer and using an MDL criterion to evaluate the layer structure of the cluster tree.

KW - Bayesian clustering

KW - Document retrieval

KW - Genetic algorithm

KW - Minimum description length criteria

UR - http://www.scopus.com/inward/record.url?scp=84956858012&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84956858012&partnerID=8YFLogxK

M3 - Conference contribution

AN - SCOPUS:84956858012

SN - 3540659072

SN - 9783540659075

VL - 1585

T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

SP - 260

EP - 267

BT - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

PB - Springer Verlag

ER -