Suggesting specific segments as link targets in Wikipedia

Renzhi Wang, Mizuho Iwaihara

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

1 Citation (Scopus)

Abstract

Wikipedia is the largest online encyclopedia, and its articles form a knowledge-rich semantic resource. A link within Wikipedia indicates that the texts at its origin and destination are semantically related. Existing link detection methods focus on article titles because most links in Wikipedia point to article titles. However, a number of links point to specific segments of an article, because the whole article is too general for readers to grasp the intention of the link. We propose a method that automatically predicts whether a link target should be a specific segment and, if so, suggests which segment is most relevant. The method combines Latent Dirichlet Allocation (LDA) and Maximum Likelihood Estimation (MLE) to represent every segment as a vector, computes the similarity of each segment pair, and then uses variance, standard deviation, and other statistical features to make the prediction. In evaluations on Wikipedia articles, our method outperforms existing methods.
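
The pipeline outlined in the abstract (vectorize segments, compare them, then decide from statistical features) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes whitespace tokenization, uses only MLE unigram vectors (the LDA component is omitted for brevity), compares the link-origin text with each candidate segment by cosine similarity, and applies a hypothetical threshold `min_std` to the standard deviation of the similarities. The function names and the threshold value are illustrative assumptions.

```python
import math
from collections import Counter
from statistics import pstdev, pvariance


def mle_vector(text):
    """MLE unigram estimate: P(w | text) = count(w) / total tokens."""
    tokens = text.lower().split()  # assumption: naive whitespace tokenizer
    total = len(tokens)
    return {w: n / total for w, n in Counter(tokens).items()} if total else {}


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * v.get(word, 0.0) for word, weight in u.items())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def suggest_segment(origin_text, segments, min_std=0.1):
    """Return (index of the best segment, or None) plus the features used.

    min_std is a hypothetical cutoff: if the similarities barely differ,
    no segment stands out and the whole article is kept as the target.
    """
    origin = mle_vector(origin_text)
    sims = [cosine(origin, mle_vector(s)) for s in segments]
    features = {
        "variance": pvariance(sims),
        "std": pstdev(sims),
        "max": max(sims),
    }
    best = sims.index(features["max"])
    return (best if features["std"] >= min_std else None), features


# Hypothetical example data, not taken from the paper.
segments = [
    "history of the city and its founding",
    "climate rainfall temperature and seasons",
    "economy trade and industry",
]
best, features = suggest_segment(
    "average rainfall and temperature in the rainy seasons", segments
)
print(best, features)
```

A fixed threshold keeps the sketch short. The abstract suggests that the features feed a prediction step, so a trained classifier over the same features would be the more faithful choice.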

Original language: English
Title of host publication: Digital Libraries: Knowledge, Information, and Data in an Open Access Society - 18th International Conference on Asia-Pacific Digital Libraries, ICADL 2016, Proceedings
Publisher: Springer Verlag
Pages: 394-405
Number of pages: 12
Volume: 10075 LNCS
ISBN (Print): 9783319493039
DOIs: https://doi.org/10.1007/978-3-319-49304-6_42
Publication status: Published - 2016
Event: 18th International Conference on Asia-Pacific Digital Libraries, ICADL 2016 - Tsukuba, Japan
Duration: 2016 Dec 7 to 2016 Dec 9

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 10075 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Other

Other: 18th International Conference on Asia-Pacific Digital Libraries, ICADL 2016
Country: Japan
City: Tsukuba
Period: 16/12/7 to 16/12/9

Fingerprint

Wikipedia
Semantics
Target
Maximum likelihood estimation
Predict
Standard deviation
Dirichlet
Resources
Evaluation

Keywords

  • LDA
  • Link suggestion
  • Text mining
  • Wikipedia

ASJC Scopus subject areas

  • Theoretical Computer Science
  • Computer Science (all)

Cite this

Wang, R., & Iwaihara, M. (2016). Suggesting specific segments as link targets in Wikipedia. In Digital Libraries: Knowledge, Information, and Data in an Open Access Society - 18th International Conference on Asia-Pacific Digital Libraries, ICADL 2016, Proceedings (Vol. 10075 LNCS, pp. 394-405). (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 10075 LNCS). Springer Verlag. https://doi.org/10.1007/978-3-319-49304-6_42

@inproceedings{1f678ed3585345b19f0d6d5638c9211b,
title = "Suggesting specific segments as link targets in Wikipedia",
abstract = "Wikipedia is the largest online encyclopedia, and its articles form a knowledge-rich semantic resource. A link within Wikipedia indicates that the texts at its origin and destination are semantically related. Existing link detection methods focus on article titles because most links in Wikipedia point to article titles. However, a number of links point to specific segments of an article, because the whole article is too general for readers to grasp the intention of the link. We propose a method that automatically predicts whether a link target should be a specific segment and, if so, suggests which segment is most relevant. The method combines Latent Dirichlet Allocation (LDA) and Maximum Likelihood Estimation (MLE) to represent every segment as a vector, computes the similarity of each segment pair, and then uses variance, standard deviation, and other statistical features to make the prediction. In evaluations on Wikipedia articles, our method outperforms existing methods.",
keywords = "LDA, Link suggestion, Text mining, Wikipedia",
author = "Renzhi Wang and Mizuho Iwaihara",
year = "2016",
doi = "10.1007/978-3-319-49304-6_42",
language = "English",
isbn = "9783319493039",
volume = "10075 LNCS",
series = "Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)",
publisher = "Springer Verlag",
pages = "394--405",
booktitle = "Digital Libraries: Knowledge, Information, and Data in an Open Access Society - 18th International Conference on Asia-Pacific Digital Libraries, ICADL 2016, Proceedings",
address = "Germany",
}

TY - GEN

T1 - Suggesting specific segments as link targets in Wikipedia

AU - Wang, Renzhi

AU - Iwaihara, Mizuho

PY - 2016

Y1 - 2016

AB - Wikipedia is the largest online encyclopedia, and its articles form a knowledge-rich semantic resource. A link within Wikipedia indicates that the texts at its origin and destination are semantically related. Existing link detection methods focus on article titles because most links in Wikipedia point to article titles. However, a number of links point to specific segments of an article, because the whole article is too general for readers to grasp the intention of the link. We propose a method that automatically predicts whether a link target should be a specific segment and, if so, suggests which segment is most relevant. The method combines Latent Dirichlet Allocation (LDA) and Maximum Likelihood Estimation (MLE) to represent every segment as a vector, computes the similarity of each segment pair, and then uses variance, standard deviation, and other statistical features to make the prediction. In evaluations on Wikipedia articles, our method outperforms existing methods.

KW - LDA

KW - Link suggestion

KW - Text mining

KW - Wikipedia

UR - http://www.scopus.com/inward/record.url?scp=85006063472&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85006063472&partnerID=8YFLogxK

U2 - 10.1007/978-3-319-49304-6_42

DO - 10.1007/978-3-319-49304-6_42

M3 - Conference contribution

AN - SCOPUS:85006063472

SN - 9783319493039

VL - 10075 LNCS

T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

SP - 394

EP - 405

BT - Digital Libraries: Knowledge, Information, and Data in an Open Access Society - 18th International Conference on Asia-Pacific Digital Libraries, ICADL 2016, Proceedings

PB - Springer Verlag

ER -