Finding High Quality Documents through Link and Click Graphs

Linfeng Yu, Mizuho Iwaihara

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

1 Citation (Scopus)

Abstract

Link graphs of web pages have been used to evaluate the importance of each page. Existing link analysis algorithms, including HITS and PageRank, exploit static link connectivity between pages. On the other hand, service providers often record HTTP requests that contain the resource and referrer of each request, from which we can construct a click graph whose edge weights represent the number of clicks on each link, or link traffic. Click graphs reflect users' choices of interesting links, so they are useful for evaluating the importance of pages. However, clicks are often skewed toward highly popular links, so click graphs alone cannot properly evaluate less-clicked pages. In this paper, we propose the click count-weighted HITS algorithm, which integrates the HITS algorithm with click graphs, for finding high-quality documents. Our evaluation on finding featured articles of English Wikipedia shows that the click count-weighted HITS algorithm performs better on a large Wikipedia corpus than algorithms that use only link graphs or only click graphs.
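The abstract does not give the exact weighting scheme, so the following is only a minimal sketch of the general idea, not the authors' algorithm. It counts clicks per (referrer, resource) pair to form a click graph, then runs HITS over the static link graph with each link weighted by its click count. The function names, the `smoothing` parameter, and the choice of weight `smoothing + clicks` are all assumptions made for illustration; the smoothing term is one hypothetical way to keep unclicked links from being ignored entirely.

```python
import math
from collections import Counter


def build_click_graph(requests):
    """Count clicks per (referrer, resource) edge from request records.

    `requests` is an iterable of (referrer, resource) pairs, a simplified
    stand-in for parsed HTTP request logs (assumed format).
    """
    clicks = Counter()
    for referrer, resource in requests:
        # Skip requests without a referrer and self-referrals.
        if referrer and referrer != resource:
            clicks[(referrer, resource)] += 1
    return clicks


def _normalize(scores):
    """Scale scores to unit Euclidean length (no-op if all are zero)."""
    norm = math.sqrt(sum(s * s for s in scores.values())) or 1.0
    return {node: s / norm for node, s in scores.items()}


def click_weighted_hits(links, clicks, smoothing=1.0, iterations=50):
    """Run HITS on static links, weighting each link by its click count.

    Assumed weighting: weight(u, v) = smoothing + clicks[(u, v)], so a
    static link with no recorded clicks still contributes `smoothing`.
    Returns (hub_scores, authority_scores) as dictionaries.
    """
    weights = {edge: smoothing + clicks.get(edge, 0) for edge in links}
    nodes = {node for edge in links for node in edge}
    hub = dict.fromkeys(nodes, 1.0)
    auth = dict.fromkeys(nodes, 1.0)

    for _ in range(iterations):
        # Authority update: weighted sum of hub scores over incoming links.
        auth = dict.fromkeys(nodes, 0.0)
        for (u, v), w in weights.items():
            auth[v] += w * hub[u]
        auth = _normalize(auth)

        # Hub update: weighted sum of authority scores over outgoing links.
        hub = dict.fromkeys(nodes, 0.0)
        for (u, v), w in weights.items():
            hub[u] += w * auth[v]
        hub = _normalize(hub)

    return hub, auth


if __name__ == "__main__":
    # Toy data: three pages, with most clicks going from A to C.
    static_links = {("A", "B"), ("A", "C"), ("B", "C")}
    log = [("A", "C")] * 5 + [("A", "B"), ("", "A")]
    hub_scores, auth_scores = click_weighted_hits(
        static_links, build_click_graph(log)
    )
    for page in sorted(auth_scores, key=auth_scores.get, reverse=True):
        print(page, round(auth_scores[page], 3))
```

Setting `smoothing=0` would reduce this sketch to HITS on the click graph alone, while passing an empty click counter reduces it to plain HITS on the link graph, which makes the two baselines mentioned in the abstract easy to compare against in a toy setting.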

Original language: English
Title of host publication: Proceedings - 2018 7th International Congress on Advanced Applied Informatics, IIAI-AAI 2018
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 49-54
Number of pages: 6
ISBN (Electronic): 9781538674475
Publication status: Published - 2019 Apr 16
Event: 7th International Congress on Advanced Applied Informatics, IIAI-AAI 2018 - Yonago, Japan
Duration: 2018 Jul 8 to 2018 Jul 13

Publication series

Name: Proceedings - 2018 7th International Congress on Advanced Applied Informatics, IIAI-AAI 2018

Conference

Conference: 7th International Congress on Advanced Applied Informatics, IIAI-AAI 2018
Country/Territory: Japan
City: Yonago
Period: 18/7/8 to 18/7/13

Keywords

  • Click graph
  • Document ranking
  • HITS algorithm
  • Link analysis
  • Quality
  • Wikipedia

ASJC Scopus subject areas

  • Computer Networks and Communications
  • Communication
  • Information Systems
  • Information Systems and Management
  • Education
