Finding High Quality Documents through Link and Click Graphs

Linfeng Yu, Mizuho Iwaihara

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Link graphs of web pages have been used to evaluate the importance of each page. Existing link analysis algorithms, including HITS and PageRank, exploit the static link connectivity between pages. Service providers, on the other hand, often record HTTP requests that contain the resource and referrer of each request, from which we can construct a click graph whose edge weights represent the number of clicks on each link, that is, the link traffic. Click graphs reflect users' choices of interesting links, so they are useful for evaluating the importance of pages. However, clicks are often skewed toward highly popular links, so click graphs alone cannot properly evaluate less-clicked pages. In this paper, we propose the click count-weighted HITS algorithm, which integrates the HITS algorithm with click graphs, for finding high-quality documents. Our evaluations on finding featured articles of English Wikipedia show that the click count-weighted HITS algorithm performs better on a large Wikipedia corpus than algorithms that use link graphs or click graphs only.
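The abstract does not give the exact update rule, so the following is only a minimal sketch of the general idea, under stated assumptions: a click graph is counted from logged (referrer, resource) pairs, and HITS is run over the static link graph with each edge weighted by `1 + clicks`. The function names and the additive weighting are illustrative assumptions, not the authors' formulation.

```python
from collections import Counter
from math import sqrt


def build_click_graph(requests):
    """Count clicks per (referrer, resource) edge from logged request pairs."""
    clicks = Counter()
    for referrer, resource in requests:
        # Skip requests without a referrer (direct visits) and self-loops.
        if referrer and referrer != resource:
            clicks[(referrer, resource)] += 1
    return clicks


def _normalize(scores):
    """Scale scores to unit Euclidean length (no-op for an all-zero vector)."""
    norm = sqrt(sum(s * s for s in scores.values())) or 1.0
    return {node: s / norm for node, s in scores.items()}


def weighted_hits(links, clicks, iterations=50):
    """Run HITS over `links`, weighting edge (u, v) by 1 + clicks[(u, v)].

    ASSUMPTION: the additive weight keeps unclicked links in play; the
    paper's actual weighting may differ.
    """
    weights = {edge: 1.0 + clicks.get(edge, 0) for edge in links}
    nodes = sorted({node for edge in links for node in edge})
    hub = {node: 1.0 for node in nodes}
    auth = {node: 1.0 for node in nodes}
    for _ in range(iterations):
        # Authority update: weighted sum of hub scores of in-linking pages.
        new_auth = {node: 0.0 for node in nodes}
        for (u, v), w in weights.items():
            new_auth[v] += w * hub[u]
        auth = _normalize(new_auth)
        # Hub update: weighted sum of authority scores of linked-to pages.
        new_hub = {node: 0.0 for node in nodes}
        for (u, v), w in weights.items():
            new_hub[u] += w * auth[v]
        hub = _normalize(new_hub)
    return hub, auth


# Toy data: pages A and B both link to C and D, but only C receives clicks.
links = {("A", "C"), ("B", "C"), ("A", "D"), ("B", "D")}
requests = [("A", "C"), ("B", "C"), ("A", "C"), (None, "D")]
clicks = build_click_graph(requests)
hub, auth = weighted_hits(links, clicks)
```

With these toy inputs, C and D are indistinguishable by link structure alone, while the click weights raise C's authority score above D's. Keeping the constant term in the weight means a link with no recorded clicks still contributes, which is one simple way to soften the skew toward popular links that the abstract describes.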

Original language: English
Title of host publication: Proceedings - 2018 7th International Congress on Advanced Applied Informatics, IIAI-AAI 2018
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 49-54
Number of pages: 6
ISBN (Electronic): 9781538674475
DOIs: https://doi.org/10.1109/IIAI-AAI.2018.00020
Publication status: Published - 2019 Apr 16
Event: 7th International Congress on Advanced Applied Informatics, IIAI-AAI 2018 - Yonago, Japan
Duration: 2018 Jul 8 to 2018 Jul 13

Publication series

Name: Proceedings - 2018 7th International Congress on Advanced Applied Informatics, IIAI-AAI 2018

Conference

Conference: 7th International Congress on Advanced Applied Informatics, IIAI-AAI 2018
Country: Japan
City: Yonago
Period: 18/7/8 to 18/7/13

Keywords

  • Click graph
  • Document ranking
  • HITS algorithm
  • Link analysis
  • Quality
  • Wikipedia

ASJC Scopus subject areas

  • Computer Networks and Communications
  • Communication
  • Information Systems
  • Information Systems and Management
  • Education

Cite this

Yu, L., & Iwaihara, M. (2019). Finding High Quality Documents through Link and Click Graphs. In Proceedings - 2018 7th International Congress on Advanced Applied Informatics, IIAI-AAI 2018 (pp. 49-54). [8693372] (Proceedings - 2018 7th International Congress on Advanced Applied Informatics, IIAI-AAI 2018). Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/IIAI-AAI.2018.00020

@inproceedings{d9eb6ca709c042969bdea9156779c745,
title = "Finding High Quality Documents through Link and Click Graphs",
keywords = "Click graph, Document ranking, HITS algorithm, Link analysis, Quality, Wikipedia",
author = "Linfeng Yu and Mizuho Iwaihara",
year = "2019",
month = "4",
day = "16",
doi = "10.1109/IIAI-AAI.2018.00020",
language = "English",
series = "Proceedings - 2018 7th International Congress on Advanced Applied Informatics, IIAI-AAI 2018",
publisher = "Institute of Electrical and Electronics Engineers Inc.",
pages = "49--54",
booktitle = "Proceedings - 2018 7th International Congress on Advanced Applied Informatics, IIAI-AAI 2018",

}

TY - GEN

T1 - Finding High Quality Documents through Link and Click Graphs

AU - Yu, Linfeng

AU - Iwaihara, Mizuho

PY - 2019/4/16

Y1 - 2019/4/16

KW - Click graph

KW - Document ranking

KW - HITS algorithm

KW - Link analysis

KW - Quality

KW - Wikipedia

UR - http://www.scopus.com/inward/record.url?scp=85065214854&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85065214854&partnerID=8YFLogxK

U2 - 10.1109/IIAI-AAI.2018.00020

DO - 10.1109/IIAI-AAI.2018.00020

M3 - Conference contribution

T3 - Proceedings - 2018 7th International Congress on Advanced Applied Informatics, IIAI-AAI 2018

SP - 49

EP - 54

BT - Proceedings - 2018 7th International Congress on Advanced Applied Informatics, IIAI-AAI 2018

PB - Institute of Electrical and Electronics Engineers Inc.

ER -