Detection of Bursty and Significant Keyphrases from Wikipedia edit history

Zihang Chen, Mizuho Iwaihara

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

1 Citation (Scopus)

Abstract

In an online collaboration system such as Wikipedia, edit history is stored as revisions. Topics of articles or categories grow and fade over time, and this evolutionary information is retained in their edit history. We consider that a great amount of information related to real-life events is hidden in the edit history of such documents. This paper focuses on a particular temporal text mining task: effectively extracting keyphrases from burst periods in the edit history of Wikipedia articles or categories. We first combine the ARIMA model with a decay function to find typical edit burst periods, then perform keyphrase extraction on the burst periods to reveal the topics of the bursts. However, keyphrase extraction methods such as TextRank do not consider temporal trends in a text stream. In this paper, we propose TextRank-nfidf, which reflects temporal trends in phrase node weights by computing the smoothed difference of editing frequency between revisions. We confirm that the detected bursts and keyphrases match well with events along the timeline.
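The abstract does not give the nfidf formula, so the following is only a minimal sketch of the general idea it describes: a TextRank-style ranking whose node weights are biased by a smoothed difference of term frequency between consecutive revisions. The function names (`temporal_weights`, `cooccurrence_graph`, `biased_textrank`), the exponential smoothing factor `alpha`, the co-occurrence window, and the toy revisions are all assumptions made for illustration; they are not taken from the paper, and the sketch ranks single words rather than full keyphrases.

```python
from collections import Counter, defaultdict


def temporal_weights(revisions, alpha=0.5):
    """Exponentially smoothed increase in term frequency between revisions.

    Assumption: only increases count, and smoothing is exponential.
    The paper's actual nfidf weighting may differ.
    """
    weights = defaultdict(float)
    prev = Counter()
    for tokens in revisions:
        cur = Counter(tokens)
        for term in set(cur) | set(prev):
            diff = max(cur[term] - prev[term], 0)
            weights[term] = alpha * diff + (1 - alpha) * weights[term]
        prev = cur
    return dict(weights)


def cooccurrence_graph(tokens, window=2):
    """Undirected graph linking terms that appear within `window` positions."""
    graph = defaultdict(set)
    for i, term in enumerate(tokens):
        for other in tokens[i + 1 : i + 1 + window]:
            if other != term:
                graph[term].add(other)
                graph[other].add(term)
    return graph


def biased_textrank(graph, prior, damping=0.85, iterations=50):
    """PageRank-style iteration whose teleport distribution follows `prior`."""
    nodes = list(graph)
    total = sum(prior.get(n, 0.0) for n in nodes)
    if total == 0:
        p = {n: 1.0 / len(nodes) for n in nodes}  # fall back to plain TextRank
    else:
        p = {n: prior.get(n, 0.0) / total for n in nodes}
    score = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iterations):
        score = {
            n: (1 - damping) * p[n]
            + damping * sum(score[m] / len(graph[m]) for m in graph[n])
            for n in nodes
        }
    return score


# Hypothetical revisions of one article, already tokenized.
revisions = [
    "the city hosts a festival".split(),
    "the city hosts a festival after the earthquake".split(),
    "the earthquake damaged the city and earthquake relief began".split(),
]

weights = temporal_weights(revisions)
scores = biased_textrank(cooccurrence_graph(revisions[-1]), weights)
ranked = sorted(scores, key=scores.get, reverse=True)
```

Terms whose frequency rises in recent revisions receive a larger teleport probability, so they tend to rank above terms that are merely well connected in the co-occurrence graph. A fuller pipeline would presumably restrict the graph to revisions inside a detected burst period and filter stopwords before ranking.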

Original language: English
Title of host publication: 2019 IEEE International Conference on Big Data and Smart Computing, BigComp 2019 - Proceedings
Publisher: Institute of Electrical and Electronics Engineers Inc.
ISBN (Electronic): 9781538677896
DOIs: https://doi.org/10.1109/BIGCOMP.2019.8679105
Publication status: Published - 2019 Apr 1
Event: 2019 IEEE International Conference on Big Data and Smart Computing, BigComp 2019 - Kyoto, Japan
Duration: 2019 Feb 27 to 2019 Mar 2

Publication series

Name: 2019 IEEE International Conference on Big Data and Smart Computing, BigComp 2019 - Proceedings

Conference

Conference: 2019 IEEE International Conference on Big Data and Smart Computing, BigComp 2019
Country: Japan
City: Kyoto
Period: 19/2/27 to 19/3/2

Keywords

  • burst detection
  • edit history
  • extraction
  • TextRank

ASJC Scopus subject areas

  • Information Systems and Management
  • Artificial Intelligence
  • Computer Networks and Communications
  • Information Systems


  • Cite this

    Chen, Z., & Iwaihara, M. (2019). Detection of Bursty and Significant Keyphrases from Wikipedia edit history. In 2019 IEEE International Conference on Big Data and Smart Computing, BigComp 2019 - Proceedings [8679105] (2019 IEEE International Conference on Big Data and Smart Computing, BigComp 2019 - Proceedings). Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/BIGCOMP.2019.8679105