Analysis and improvement of HITS algorithm for detecting WEB communities

Saeko Nomura*, Satoshi Oyama, Tetsuo Hayamizu, Toru Ishida

*Corresponding author for this work

Research output: Contribution to journal, Review article, peer-review

20 Citations (Scopus)

Abstract

This paper discusses Kleinberg's HITS (hyperlink-induced topic search) algorithm, which extracts Web communities by analyzing the hyperlink structure inherent in the Web. The problems of the algorithm are analyzed and an improvement is proposed. For this purpose, a tool (Link Viewer) that visualizes the operation of the HITS algorithm was developed. The analysis revealed the following problem: when the base set contains pages that are entirely unrelated to the original topic but have a dense link structure, the Web community (authorities and hubs) matching the original topic cannot be extracted (the topic drift problem). Restricting themselves to link analysis, the authors propose the following modifications: (1) a technique that takes the projection onto the root subspace into account in the eigenvalue calculation; (2) a technique that performs the iterative calculation on only those pages of the base set that have link relations to multiple pages in the root set. A technique combining (1) and (2) is also considered. As a result, the topic drift problem is avoided for any topic with a relatively small amount of computation, and the HITS algorithm is improved using link information alone.
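The topic drift problem and modification (2) can be illustrated with a small sketch. This is not the authors' implementation: the function names `hits` and `filter_base_set`, the threshold of two root pages, and the toy graph are all assumptions made for illustration, and modification (1) is not shown because the abstract does not give enough detail to reconstruct it.

```python
import math


def _normalize(scores):
    """Scale scores to unit Euclidean length (left unchanged if all zero)."""
    norm = math.sqrt(sum(v * v for v in scores.values()))
    if norm == 0:
        return dict(scores)
    return {p: v / norm for p, v in scores.items()}


def _all_pages(links):
    """Collect every page that appears as a link source or target."""
    pages = set(links)
    for targets in links.values():
        pages |= set(targets)
    return pages


def hits(links, iterations=50):
    """Basic HITS iteration on a dict {page: set of pages it links to}."""
    pages = _all_pages(links)
    hub = {p: 1.0 for p in pages}
    auth = {p: 1.0 for p in pages}
    for _ in range(iterations):
        # Authority score: sum of hub scores of the pages linking in.
        new_auth = {p: 0.0 for p in pages}
        for src, targets in links.items():
            for dst in targets:
                new_auth[dst] += hub[src]
        # Hub score: sum of authority scores of the pages linked to.
        new_hub = {p: sum(new_auth[d] for d in links.get(p, ())) for p in pages}
        auth, hub = _normalize(new_auth), _normalize(new_hub)
    return auth, hub


def filter_base_set(links, root, min_root_links=2):
    """Keep root pages plus pages linked with >= min_root_links root pages.

    The threshold of 2 is an assumed reading of "multiple pages".
    """
    root = set(root)
    keep = set(root)
    for page in _all_pages(links) - root:
        linked_root = set(links.get(page, ())) & root
        linked_root |= {r for r in root if page in links.get(r, ())}
        if len(linked_root) >= min_root_links:
            keep.add(page)
    return {p: {d for d in links.get(p, ()) if d in keep} for p in keep}


# Hypothetical toy graph: r1..r3 are root pages, a1 and a2 are on-topic
# authorities, and x1..x5 form a dense off-topic clique that enters the
# base set only because r1 links to it.
root = {"r1", "r2", "r3"}
clique = ["x1", "x2", "x3", "x4", "x5"]
links = {r: {"a1", "a2"} for r in root}
links["r1"] |= set(clique)
for x in clique:
    links[x] = {y for y in clique if y != x}

auth_full, _ = hits(links)
top_full = max(auth_full, key=auth_full.get)

filtered = filter_base_set(links, root)
auth_filtered, _ = hits(filtered)
top_filtered = max(auth_filtered, key=auth_filtered.get)

print("top authority, full base set:", top_full)
print("top authority, filtered base set:", top_filtered)
```

In this toy graph the densely linked off-topic pages take the highest authority scores on the full base set, while the filtered graph retains only the root pages and the pages connected to several of them. Filtering before iterating also shrinks the graph, which is in line with the abstract's remark about a relatively small amount of computation.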

Original language: English
Pages (from-to): 32-42
Number of pages: 11
Journal: Systems and Computers in Japan
Volume: 35
Issue number: 13
Publication status: Published - 2004 Nov 30
Externally published: Yes

Keywords

  • Eigenvalue calculation
  • Information visualization
  • Link analysis
  • Web community
  • World Wide Web

ASJC Scopus subject areas

  • Theoretical Computer Science
  • Information Systems
  • Hardware and Architecture
  • Computational Theory and Mathematics
