Finding co-occurring topics in Wikipedia article segments

Renzhi Wang, Jianmin Wu, Mizuho Iwaihara

Research output: Chapter in Book/Report/Conference proceeding › Chapter

2 Citations (Scopus)

Abstract

Wikipedia is the largest online encyclopedia, and its articles form a knowledge-rich semantic resource. When the same topic appears in different articles, those articles are related through that topic. Finding such co-occurring topics is useful for improving the accuracy of querying and clustering, and for contrasting related articles. Existing work on topic alignment and topic relevance detection is based on term occurrence. In our research, we discuss incorporating latent topics in article segments, using Latent Dirichlet Allocation (LDA), to detect topic relevance. We also study how segment proximities, arising from segment ordering and hyperlinks, should be incorporated into topic detection and alignment. Experimental data show that our method can find and distinguish three types of co-occurrence.

Original language: English
Title of host publication: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Publisher: Springer Verlag
Pages: 252-259
Number of pages: 8
Volume: 8839
ISBN (Print): 9783319128221
Publication status: Published - 2014
Event: 16th International Conference on Asia-Pacific Digital Libraries, ICADL 2014 - Chiang Mai
Duration: 2014 Nov 5 to 2014 Nov 7

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 8839
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Other

Other: 16th International Conference on Asia-Pacific Digital Libraries, ICADL 2014
City: Chiang Mai
Period: 14/11/5 to 14/11/7

Fingerprint

Wikipedia
Alignment
Semantics
Proximity
Dirichlet
Experimental Data
Clustering
Resources
Term
Relevance

Keywords

  • LDA
  • Link
  • MLE
  • Wikipedia

ASJC Scopus subject areas

  • Computer Science(all)
  • Theoretical Computer Science

Cite this

Wang, R., Wu, J., & Iwaihara, M. (2014). Finding co-occurring topics in Wikipedia article segments. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 8839, pp. 252-259). Springer Verlag.

@inbook{cea015e50ec34aab881967a5f7bc4401,
title = "Finding co-occurring topics in Wikipedia article segments",
abstract = "Wikipedia is the largest online encyclopedia, in which articles form knowledgeable and semantic resources. Identical topics in different articles indicate that the articles are related to each other about topics. Finding such co-occurring topics is useful to improve the accuracy of querying and clustering, and also to contrast related articles. Existing topic alignment work and topic relevance detection are based on term occurrence. In our research, we discuss incorporating latent topics existing in article segments by utilizing Latent Dirichlet Allocation (LDA), to detect topic relevance. We also study how segment proximities, arising from segment ordering and hyperlinks, shall be incorporated into topic detection and alignment. Experimental data show our method can find and distinguish three types of co-occurrence.",
keywords = "LDA, Link, MLE, Wikipedia",
author = "Renzhi Wang and Jianmin Wu and Mizuho Iwaihara",
year = "2014",
language = "English",
isbn = "9783319128221",
volume = "8839",
series = "Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)",
publisher = "Springer Verlag",
pages = "252--259",
booktitle = "Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)",

}

TY - CHAP

T1 - Finding co-occurring topics in Wikipedia article segments

AU - Wang, Renzhi

AU - Wu, Jianmin

AU - Iwaihara, Mizuho

PY - 2014

Y1 - 2014

N2 - Wikipedia is the largest online encyclopedia, in which articles form knowledgeable and semantic resources. Identical topics in different articles indicate that the articles are related to each other about topics. Finding such co-occurring topics is useful to improve the accuracy of querying and clustering, and also to contrast related articles. Existing topic alignment work and topic relevance detection are based on term occurrence. In our research, we discuss incorporating latent topics existing in article segments by utilizing Latent Dirichlet Allocation (LDA), to detect topic relevance. We also study how segment proximities, arising from segment ordering and hyperlinks, shall be incorporated into topic detection and alignment. Experimental data show our method can find and distinguish three types of co-occurrence.

KW - LDA

KW - Link

KW - MLE

KW - Wikipedia

UR - http://www.scopus.com/inward/record.url?scp=84909587341&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84909587341&partnerID=8YFLogxK

M3 - Chapter

AN - SCOPUS:84909587341

SN - 9783319128221

VL - 8839

T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

SP - 252

EP - 259

BT - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

PB - Springer Verlag

ER -