Analysis and improvement of HITS algorithm for detecting Web communities

S. Nomura, S. Oyama, T. Hayamizu, Toru Ishida

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

10 Citations (Scopus)

Abstract

We discuss problems with the HITS (Hyperlink-Induced Topic Search) algorithm, which capitalizes on hyperlinks to extract topic-bound communities of Web pages. Despite its theoretically sound foundations, we observed that the HITS algorithm has failed in real applications. In order to understand this problem, we developed a visualization tool LinkViewer, which graphically presents the extraction process. This tool helped reveal that a large and densely linked set of unrelated Web pages in the base set impeded the extraction. These pages were obtained when the root set was expanded into the base set. As a remedy to this topic drift problem, prior studies applied a textual analysis method. We propose two methods which only utilize the structural information of the Web: 1) the projection method, which projects eigenvectors on the root subspace, so that most elements in the root set will be relevant to the original topic; and 2) the base-set downsizing method, which filters out the pages without links to multiple pages in the root set. These methods are shown to be robust for broader types of topic and low in computation cost.
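The link-only ideas in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `hits` and `downsize_base_set`, the `min_root_links` parameter, and the toy graph are all assumptions made for illustration. The sketch runs plain HITS by power iteration, then applies a base-set filter that keeps a non-root page only if it is linked (in either direction, which is also an assumption) with at least two distinct root pages.

```python
from math import sqrt


def all_pages(links):
    """Collect every page that appears as a link source or target."""
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    return pages


def hits(links, iterations=50):
    """Plain HITS by power iteration.

    links maps each page to the set of pages it links to.
    Returns (hub, authority) score dictionaries, each L2-normalized.
    """
    pages = all_pages(links)
    hub = {p: 1.0 for p in pages}
    auth = {p: 1.0 for p in pages}
    for _ in range(iterations):
        # Authority score: sum of hub scores of the pages linking in.
        auth = {p: 0.0 for p in pages}
        for src, targets in links.items():
            for dst in targets:
                auth[dst] += hub[src]
        # Hub score: sum of authority scores of the pages linked to.
        hub = {p: sum(auth[d] for d in links.get(p, ())) for p in pages}
        for scores in (auth, hub):
            norm = sqrt(sum(v * v for v in scores.values())) or 1.0
            for p in scores:
                scores[p] /= norm
    return hub, auth


def downsize_base_set(links, root, min_root_links=2):
    """Hypothetical base-set filter: keep root pages, plus non-root pages
    linked with at least min_root_links distinct root pages."""
    pages = all_pages(links)
    root_neighbours = {p: set() for p in pages}
    for src, targets in links.items():
        for dst in targets:
            if dst in root:
                root_neighbours[src].add(dst)
            if src in root:
                root_neighbours[dst].add(src)
    keep = {p for p in pages
            if p in root or len(root_neighbours[p] - {p}) >= min_root_links}
    return {s: {d for d in t if d in keep}
            for s, t in links.items() if s in keep}


# Toy base set: root pages r1 and r2, a relevant hub h that links to both,
# and a densely interlinked off-topic cluster s1..s3 reached via r1 only.
links = {
    "h": {"r1", "r2"},
    "r1": {"s1"},
    "r2": set(),
    "s1": {"s2", "s3"},
    "s2": {"s1", "s3"},
    "s3": {"s1", "s2"},
}
root = {"r1", "r2"}

_, auth_full = hits(links)
downsized = downsize_base_set(links, root)
_, auth_small = hits(downsized)

print(max(auth_full, key=auth_full.get))  # top authority on the full base set
print(sorted(downsized))                  # pages that survive the filter
```

On this toy graph the off-topic cluster captures the top authority score when HITS runs on the full base set, which mirrors the topic drift described above. After the filter only `h`, `r1`, and `r2` remain, so the top authority is a root page.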

Original language: English
Title of host publication: Proceedings - 2002 Symposium on Applications and the Internet, SAINT 2002
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 132-140
Number of pages: 9
ISBN (Electronic): 0769514472, 9780769514475
DOIs: https://doi.org/10.1109/SAINT.2002.994467
Publication status: Published - 2002 Jan 1
Externally published: Yes
Event: Symposium on Applications and the Internet, SAINT 2002 - Nara City, Japan
Duration: 2002 Jan 28 to 2002 Feb 1

Publication series

Name: Proceedings - 2002 Symposium on Applications and the Internet, SAINT 2002

Other

Other: Symposium on Applications and the Internet, SAINT 2002
Country: Japan
City: Nara City
Period: 02/1/28 to 02/2/1

Fingerprint

Websites
Eigenvalues and eigenfunctions
Visualization
Acoustic waves
Costs

Keywords

  • Algorithm design and analysis
  • Computational efficiency
  • Data mining
  • Impedance
  • Informatics
  • Information filtering
  • Information filters
  • Visualization
  • Web pages
  • Web sites

ASJC Scopus subject areas

  • Computer Networks and Communications
  • Computer Science Applications

Cite this

Nomura, S., Oyama, S., Hayamizu, T., & Ishida, T. (2002). Analysis and improvement of HITS algorithm for detecting Web communities. In Proceedings - 2002 Symposium on Applications and the Internet, SAINT 2002 (pp. 132-140). [994467] (Proceedings - 2002 Symposium on Applications and the Internet, SAINT 2002). Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/SAINT.2002.994467

Analysis and improvement of HITS algorithm for detecting Web communities. / Nomura, S.; Oyama, S.; Hayamizu, T.; Ishida, Toru.

Proceedings - 2002 Symposium on Applications and the Internet, SAINT 2002. Institute of Electrical and Electronics Engineers Inc., 2002. p. 132-140 994467 (Proceedings - 2002 Symposium on Applications and the Internet, SAINT 2002).

Nomura, S, Oyama, S, Hayamizu, T & Ishida, T 2002, Analysis and improvement of HITS algorithm for detecting Web communities. in Proceedings - 2002 Symposium on Applications and the Internet, SAINT 2002., 994467, Proceedings - 2002 Symposium on Applications and the Internet, SAINT 2002, Institute of Electrical and Electronics Engineers Inc., pp. 132-140, Symposium on Applications and the Internet, SAINT 2002, Nara City, Japan, 02/1/28. https://doi.org/10.1109/SAINT.2002.994467
Nomura S, Oyama S, Hayamizu T, Ishida T. Analysis and improvement of HITS algorithm for detecting Web communities. In Proceedings - 2002 Symposium on Applications and the Internet, SAINT 2002. Institute of Electrical and Electronics Engineers Inc. 2002. p. 132-140. 994467. (Proceedings - 2002 Symposium on Applications and the Internet, SAINT 2002). https://doi.org/10.1109/SAINT.2002.994467
Nomura, S. ; Oyama, S. ; Hayamizu, T. ; Ishida, Toru. / Analysis and improvement of HITS algorithm for detecting Web communities. Proceedings - 2002 Symposium on Applications and the Internet, SAINT 2002. Institute of Electrical and Electronics Engineers Inc., 2002. pp. 132-140 (Proceedings - 2002 Symposium on Applications and the Internet, SAINT 2002).
@inproceedings{d8534efeec554ab59bfa02380a89e448,
title = "Analysis and improvement of HITS algorithm for detecting Web communities",
abstract = "We discuss problems with the HITS (Hyperlink-Induced Topic Search) algorithm, which capitalizes on hyperlinks to extract topic-bound communities of Web pages. Despite its theoretically sound foundations, we observed that the HITS algorithm has failed in real applications. In order to understand this problem, we developed a visualization tool LinkViewer, which graphically presents the extraction process. This tool helped reveal that a large and densely linked set of unrelated Web pages in the base set impeded the extraction. These pages were obtained when the root set was expanded into the base set. As a remedy to this topic drift problem, prior studies applied a textual analysis method. We propose two methods which only utilize the structural information of the Web: 1) the projection method, which projects eigenvectors on the root subspace, so that most elements in the root set will be relevant to the original topic; and 2) the base-set downsizing method, which filters out the pages without links to multiple pages in the root set. These methods are shown to be robust for broader types of topic and low in computation cost.",
keywords = "Algorithm design and analysis, Computational efficiency, Data mining, Impedance, Informatics, Information filtering, Information filters, Visualization, Web pages, Web sites",
author = "S. Nomura and S. Oyama and T. Hayamizu and Toru Ishida",
year = "2002",
month = "1",
day = "1",
doi = "10.1109/SAINT.2002.994467",
language = "English",
series = "Proceedings - 2002 Symposium on Applications and the Internet, SAINT 2002",
publisher = "Institute of Electrical and Electronics Engineers Inc.",
pages = "132--140",
booktitle = "Proceedings - 2002 Symposium on Applications and the Internet, SAINT 2002",

}

TY - GEN
T1 - Analysis and improvement of HITS algorithm for detecting Web communities
AU - Nomura, S.
AU - Oyama, S.
AU - Hayamizu, T.
AU - Ishida, Toru
PY - 2002/1/1
Y1 - 2002/1/1
AB - We discuss problems with the HITS (Hyperlink-Induced Topic Search) algorithm, which capitalizes on hyperlinks to extract topic-bound communities of Web pages. Despite its theoretically sound foundations, we observed that the HITS algorithm has failed in real applications. In order to understand this problem, we developed a visualization tool LinkViewer, which graphically presents the extraction process. This tool helped reveal that a large and densely linked set of unrelated Web pages in the base set impeded the extraction. These pages were obtained when the root set was expanded into the base set. As a remedy to this topic drift problem, prior studies applied a textual analysis method. We propose two methods which only utilize the structural information of the Web: 1) the projection method, which projects eigenvectors on the root subspace, so that most elements in the root set will be relevant to the original topic; and 2) the base-set downsizing method, which filters out the pages without links to multiple pages in the root set. These methods are shown to be robust for broader types of topic and low in computation cost.

KW - Algorithm design and analysis
KW - Computational efficiency
KW - Data mining
KW - Impedance
KW - Informatics
KW - Information filtering
KW - Information filters
KW - Visualization
KW - Web pages
KW - Web sites
UR - http://www.scopus.com/inward/record.url?scp=84886369304&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84886369304&partnerID=8YFLogxK
U2 - 10.1109/SAINT.2002.994467
DO - 10.1109/SAINT.2002.994467
M3 - Conference contribution
T3 - Proceedings - 2002 Symposium on Applications and the Internet, SAINT 2002
SP - 132
EP - 140
BT - Proceedings - 2002 Symposium on Applications and the Internet, SAINT 2002
PB - Institute of Electrical and Electronics Engineers Inc.
ER -