This paper discusses Kleinberg's HITS (hyperlink-induced topic search) algorithm, which extracts Web communities by analyzing the hyperlink structure inherent to the Web. The problems of the algorithm are analyzed and an improvement is proposed. To support the analysis, a tool (Link Viewer) that visualizes the operation of the HITS algorithm was developed. The analysis revealed the following problem: when the base set contains a page that is entirely unrelated to the original topic but has a dense link structure, the algorithm cannot extract the Web community (authorities and hubs) that matches the original topic (the topic drift problem). Focusing only on link analysis, the authors propose the following modifications: (1) a technique that takes the projection onto the root subspace into account in the eigenvalue calculation; (2) a technique that runs the iterative calculation only on those pages of the base set that have link relations to multiple pages in the root set. A technique combining (1) and (2) is also considered. As a result, the topic drift problem is avoided for any topic with a relatively small amount of computation, and the HITS algorithm is improved using link information alone.
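The following is a minimal sketch of how plain HITS and one possible reading of modification (2) might look on a toy graph. It is not the authors' implementation. The function names (`hits`, `filter_base_set`), the threshold of two root pages, and the choice to count both incoming and outgoing links as "link relations" are assumptions made for illustration. Modification (1) is not shown, because the abstract does not give enough detail to reconstruct it.

```python
from math import sqrt


def _normalize(scores):
    """Scale a score dictionary to unit Euclidean length."""
    norm = sqrt(sum(v * v for v in scores.values())) or 1.0
    return {page: v / norm for page, v in scores.items()}


def hits(pages, links, iterations=50):
    """Plain HITS power iteration; returns (authority, hub) score dicts."""
    pages = list(pages)
    page_set = set(pages)
    # Keep only links whose endpoints are both in the page set.
    edges = [(s, t) for s, t in links if s in page_set and t in page_set and s != t]
    auth = {p: 1.0 for p in pages}
    hub = {p: 1.0 for p in pages}
    for _ in range(iterations):
        # Authority of a page: sum of hub scores of pages linking to it.
        new_auth = {p: 0.0 for p in pages}
        for s, t in edges:
            new_auth[t] += hub[s]
        # Hub of a page: sum of authority scores of pages it links to.
        new_hub = {p: 0.0 for p in pages}
        for s, t in edges:
            new_hub[s] += new_auth[t]
        auth, hub = _normalize(new_auth), _normalize(new_hub)
    return auth, hub


def filter_base_set(root, base, links, min_root_links=2):
    """Assumed reading of modification (2): keep the root set plus every
    base-set page linked to or from at least `min_root_links` root pages."""
    root = set(root)
    linked_roots = {p: set() for p in base if p not in root}
    for s, t in links:
        if s in linked_roots and t in root:
            linked_roots[s].add(t)
        if t in linked_roots and s in root:
            linked_roots[t].add(s)
    kept = {p for p, r in linked_roots.items() if len(r) >= min_root_links}
    return root | kept


# Toy graph (hypothetical data): three on-topic root pages, two on-topic
# hubs, and a dense off-topic cluster attached to the root set via r1 only.
root = ["r1", "r2", "r3"]
off_hubs = ["x1", "x2", "x3", "x4"]
off_auths = ["y1", "y2", "y3"]
base = root + ["h1", "h2"] + off_hubs + off_auths
links = [(h, r) for h in ("h1", "h2") for r in root]
links += [(x, "r1") for x in off_hubs]
links += [(x, y) for x in off_hubs for y in off_auths]
links += [("r1", y) for y in off_auths]

plain_auth, _ = hits(base, links)
filtered_pages = filter_base_set(root, base, links)
filtered_auth, _ = hits(filtered_pages, links)

print(sorted(plain_auth, key=plain_auth.get, reverse=True)[:4])
print(sorted(filtered_pages))
```

In this toy graph the off-topic pages `y1` to `y3` outrank the root pages `r2` and `r3` under plain HITS, which illustrates topic drift. After filtering, every off-topic page is dropped because each one touches only the single root page `r1`, and the iteration runs on a much smaller graph.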
ASJC Scopus subject areas
- Theoretical Computer Science
- Information Systems
- Hardware and Architecture
- Computational Theory and Mathematics