Analysis and improvement of HITS algorithm for detecting WEB communities

Saeko Nomura, Satoshi Oyama, Tetsuo Hayamizu, Toru Ishida

Research output: Contribution to journal (Review article)

13 Citations (Scopus)

Abstract

This paper examines Kleinberg's HITS (hyperlink-induced topic search) algorithm, which extracts Web communities by analyzing the hyperlink structure inherent in the Web, identifies its problems, and proposes improvements. To support the analysis, the authors developed a tool, Link Viewer, that visualizes the operation of the HITS algorithm. The analysis revealed the following problem, known as topic drift: when the base set contains pages that are entirely unrelated to the original topic and have a dense link structure, HITS cannot extract the Web community (authorities and hubs) that matches the original topic. Relying on link analysis alone, the authors propose two modifications: (1) a technique that takes the projection onto the root subspace into account in the eigenvalue calculation; and (2) a technique that performs the iterative calculation using only those pages of the base set that have link relations to multiple pages in the root set. A technique combining (1) and (2) is also considered. As a result, topic drift is avoided for any topic with a relatively small amount of computation, and the HITS algorithm is improved using link information alone.
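
The HITS iteration and modification (2) described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names `hits` and `prune_base_set`, the `min_links` threshold of 2, the choice to count links in either direction as a "link relation", and the toy graph are all assumptions made for the example. Modification (1) is not sketched because the abstract does not specify the projection in enough detail.

```python
import math


def normalize(scores):
    """Scale a score dict to unit Euclidean length (left unchanged if all zero)."""
    norm = math.sqrt(sum(v * v for v in scores.values()))
    return scores if norm == 0 else {n: v / norm for n, v in scores.items()}


def hits(edges, nodes, iterations=50):
    """Plain HITS power iteration; returns (authority, hub) score dicts."""
    auth = {n: 1.0 for n in nodes}
    hub = {n: 1.0 for n in nodes}
    for _ in range(iterations):
        # Authority update: sum the hub scores of pages linking to each page.
        new_auth = {n: 0.0 for n in nodes}
        for src, dst in edges:
            new_auth[dst] += hub[src]
        # Hub update: sum the new authority scores of pages each page links to.
        new_hub = {n: 0.0 for n in nodes}
        for src, dst in edges:
            new_hub[src] += new_auth[dst]
        auth, hub = normalize(new_auth), normalize(new_hub)
    return auth, hub


def prune_base_set(edges, root, base, min_links=2):
    """Keep root pages plus base pages linked to at least `min_links` root pages.

    Assumption: a link in either direction counts as a link relation.
    """
    linked_roots = {n: set() for n in base}
    for src, dst in edges:
        if src != dst and src in linked_roots and dst in root:
            linked_roots[src].add(dst)
        if src != dst and dst in linked_roots and src in root:
            linked_roots[dst].add(src)
    kept = set(root) | {n for n in base if len(linked_roots[n]) >= min_links}
    kept_edges = [(s, d) for s, d in edges if s in kept and d in kept]
    return kept, kept_edges


# Hypothetical toy graph: h1, h2 are on-topic hubs; x* and y* form a dense
# off-topic cluster attached to the root set only through r1.
root = {"r1", "r2", "r3"}
edges = [(h, r) for h in ("h1", "h2") for r in sorted(root)]
for y in ("y1", "y2", "y3"):
    edges.append((y, "r1"))
    edges.extend((y, x) for x in ("x1", "x2", "x3"))
edges.extend(("r1", x) for x in ("x1", "x2", "x3"))
base = {n for edge in edges for n in edge}

auth_plain, hub_plain = hits(edges, base)
kept, kept_edges = prune_base_set(edges, root, base)
auth_pruned, hub_pruned = hits(kept_edges, kept)

print(sorted(auth_plain, key=auth_plain.get, reverse=True)[:4])
print(sorted(auth_pruned, key=auth_pruned.get, reverse=True)[:3])
```

In this toy graph the off-topic pages x1 to x3 outrank the root pages r2 and r3 under plain HITS, which is the topic drift pattern the abstract describes. After pruning, only the root pages and the two hubs that link to several root pages remain, so the top authorities all come from the root set.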

Original language: English
Pages (from-to): 32-42
Number of pages: 11
Journal: Systems and Computers in Japan
Volume: 35
Issue number: 13
DOI: 10.1002/scj.10425
Publication status: Published - 2004 Nov 30
Externally published: Yes

Fingerprint

Roots
Link Analysis
Subspace
Community
Projection
Eigenvalue

Keywords

  • Eigenvalue calculation
  • Information visualization
  • Link analysis
  • Web community
  • World Wide Web

ASJC Scopus subject areas

  • Theoretical Computer Science
  • Information Systems
  • Hardware and Architecture
  • Computational Theory and Mathematics

Cite this

Analysis and improvement of HITS algorithm for detecting WEB communities. / Nomura, Saeko; Oyama, Satoshi; Hayamizu, Tetsuo; Ishida, Toru.

In: Systems and Computers in Japan, Vol. 35, No. 13, 30.11.2004, p. 32-42.

@article{045cbdb40ce142ad8dc4fde75f83f831,
title = "Analysis and improvement of HITS algorithm for detecting WEB communities",
abstract = "This paper examines Kleinberg's HITS (hyperlink-induced topic search) algorithm, which extracts Web communities by analyzing the hyperlink structure inherent in the Web, identifies its problems, and proposes improvements. To support the analysis, the authors developed a tool, Link Viewer, that visualizes the operation of the HITS algorithm. The analysis revealed the following problem, known as topic drift: when the base set contains pages that are entirely unrelated to the original topic and have a dense link structure, HITS cannot extract the Web community (authorities and hubs) that matches the original topic. Relying on link analysis alone, the authors propose two modifications: (1) a technique that takes the projection onto the root subspace into account in the eigenvalue calculation; and (2) a technique that performs the iterative calculation using only those pages of the base set that have link relations to multiple pages in the root set. A technique combining (1) and (2) is also considered. As a result, topic drift is avoided for any topic with a relatively small amount of computation, and the HITS algorithm is improved using link information alone.",
keywords = "Eigenvalue calculation, Information visualization, Link analysis, Web community, World Wide Web",
author = "Saeko Nomura and Satoshi Oyama and Tetsuo Hayamizu and Toru Ishida",
year = "2004",
month = "11",
day = "30",
doi = "10.1002/scj.10425",
language = "English",
volume = "35",
pages = "32--42",
journal = "Systems and Computers in Japan",
issn = "0882-1666",
publisher = "John Wiley and Sons Inc.",
number = "13",

}

TY - JOUR

T1 - Analysis and improvement of HITS algorithm for detecting WEB communities

AU - Nomura, Saeko

AU - Oyama, Satoshi

AU - Hayamizu, Tetsuo

AU - Ishida, Toru

PY - 2004/11/30

Y1 - 2004/11/30

AB - This paper examines Kleinberg's HITS (hyperlink-induced topic search) algorithm, which extracts Web communities by analyzing the hyperlink structure inherent in the Web, identifies its problems, and proposes improvements. To support the analysis, the authors developed a tool, Link Viewer, that visualizes the operation of the HITS algorithm. The analysis revealed the following problem, known as topic drift: when the base set contains pages that are entirely unrelated to the original topic and have a dense link structure, HITS cannot extract the Web community (authorities and hubs) that matches the original topic. Relying on link analysis alone, the authors propose two modifications: (1) a technique that takes the projection onto the root subspace into account in the eigenvalue calculation; and (2) a technique that performs the iterative calculation using only those pages of the base set that have link relations to multiple pages in the root set. A technique combining (1) and (2) is also considered. As a result, topic drift is avoided for any topic with a relatively small amount of computation, and the HITS algorithm is improved using link information alone.

KW - Eigenvalue calculation

KW - Information visualization

KW - Link analysis

KW - Web community

KW - World Wide Web

UR - http://www.scopus.com/inward/record.url?scp=9744268136&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=9744268136&partnerID=8YFLogxK

U2 - 10.1002/scj.10425

DO - 10.1002/scj.10425

M3 - Review article

VL - 35

SP - 32

EP - 42

JO - Systems and Computers in Japan

JF - Systems and Computers in Japan

SN - 0882-1666

IS - 13

ER -