Detection of Bursty and Significant Keyphrases from Wikipedia edit history

Zihang Chen, Mizuho Iwaihara

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

In an online collaboration system such as Wikipedia, edit history is stored as revisions. Topics of articles or categories grow and fade over time, and this evolutionary information is retained in their edit history. We consider that a great amount of information related to real-life events is hidden in such edit histories of documents. This paper focuses on a particular temporal text mining task: effectively extracting keyphrases from burst periods in the edit history of Wikipedia articles or categories. We first combine the ARIMA model with a decay function to find typical edit burst periods, then perform keyphrase extraction on the burst periods to reveal the topics of the bursts. However, keyphrase extraction methods such as TextRank do not consider temporal trends in a text stream. In this paper, we propose TextRank-nfidf, which incorporates temporal trends into phrase node weights by computing the smoothed difference of editing frequency between revisions. We confirm that the detected bursts and keyphrases match well with events along the timeline.
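As a rough illustration of the idea of folding temporal trends into phrase node weights, the following Python sketch biases a TextRank-style ranking with a per-phrase burst score. It is not the paper's TextRank-nfidf: the abstract does not give the nfidf formula, so `smoothed_gain`, its `alpha` parameter, the sentence-level co-occurrence graph, and the toy counts below are all assumptions made for illustration only.

```python
from collections import defaultdict
from itertools import combinations


def smoothed_gain(prev_count, curr_count, alpha=1.0):
    """Hypothetical burst score: smoothed relative increase in a phrase's
    frequency between two revisions. Decreases are clipped to zero."""
    return max(0.0, (curr_count - prev_count) / (prev_count + alpha))


def weighted_textrank(sentences, node_weight, damping=0.85, iterations=50):
    """Rank phrases on a co-occurrence graph, using node_weight as the
    teleport (prior) distribution instead of a uniform one."""
    # Link every pair of distinct phrases that share a sentence.
    neighbors = defaultdict(set)
    for phrases in sentences:
        for a, b in combinations(sorted(set(phrases)), 2):
            neighbors[a].add(b)
            neighbors[b].add(a)
    nodes = sorted(neighbors)
    if not nodes:
        return {}

    # Normalize the node weights; fall back to uniform if all are zero.
    total = sum(node_weight.get(n, 0.0) for n in nodes)
    if total > 0:
        prior = {n: node_weight.get(n, 0.0) / total for n in nodes}
    else:
        prior = {n: 1.0 / len(nodes) for n in nodes}

    # Power iteration of the weighted-prior PageRank update.
    score = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iterations):
        score = {
            n: (1 - damping) * prior[n]
            + damping * sum(score[m] / len(neighbors[m]) for m in neighbors[n])
            for n in nodes
        }
    return score


# Toy data (invented): phrase counts in two consecutive revisions.
prev_counts = {"election": 2, "candidate": 2, "stadium": 2}
curr_counts = {"election": 9, "candidate": 3, "stadium": 2}
sentences = [
    ["election", "candidate"],
    ["candidate", "stadium"],
    ["election", "stadium"],
]
weights = {p: smoothed_gain(prev_counts[p], curr_counts[p]) for p in curr_counts}
ranking = sorted(weighted_textrank(sentences, weights).items(),
                 key=lambda item: item[1], reverse=True)
print(ranking)
```

In this toy graph all three phrases are structurally equivalent, so an unweighted ranking would tie them; the burst-derived prior is what moves the phrase with the largest frequency increase to the top.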

Original language: English
Title of host publication: 2019 IEEE International Conference on Big Data and Smart Computing, BigComp 2019 - Proceedings
Publisher: Institute of Electrical and Electronics Engineers Inc.
ISBN (Electronic): 9781538677896
DOI: https://doi.org/10.1109/BIGCOMP.2019.8679105
Publication status: Published - 2019 Apr 1
Event: 2019 IEEE International Conference on Big Data and Smart Computing, BigComp 2019 - Kyoto, Japan
Duration: 2019 Feb 27 to 2019 Mar 2

Publication series

Name: 2019 IEEE International Conference on Big Data and Smart Computing, BigComp 2019 - Proceedings

Conference

Conference: 2019 IEEE International Conference on Big Data and Smart Computing, BigComp 2019
Country: Japan
City: Kyoto
Period: 19/2/27 to 19/3/2

Fingerprint

Online systems
Wikipedia
Decay
ARIMA models
Editing
Text mining
Node
Evolutionary
Life events

Keywords

  • burst detection
  • edit history
  • extraction
  • TextRank

ASJC Scopus subject areas

  • Information Systems and Management
  • Artificial Intelligence
  • Computer Networks and Communications
  • Information Systems

Cite this

Chen, Z., & Iwaihara, M. (2019). Detection of Bursty and Significant Keyphrases from Wikipedia edit history. In 2019 IEEE International Conference on Big Data and Smart Computing, BigComp 2019 - Proceedings [8679105] (2019 IEEE International Conference on Big Data and Smart Computing, BigComp 2019 - Proceedings). Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/BIGCOMP.2019.8679105

@inproceedings{c5ad8e8a77e840eebbc1745b47a9771e,
title = "Detection of Bursty and Significant Keyphrases from Wikipedia edit history",
abstract = "In an online collaboration system such as Wikipedia, edit history is stored as revisions. Topics of articles or categories grow and fade over time, and this evolutionary information is retained in their edit history. We consider that a great amount of information related to real-life events is hidden in such edit histories of documents. This paper focuses on a particular temporal text mining task: effectively extracting keyphrases from burst periods in the edit history of Wikipedia articles or categories. We first combine the ARIMA model with a decay function to find typical edit burst periods, then perform keyphrase extraction on the burst periods to reveal the topics of the bursts. However, keyphrase extraction methods such as TextRank do not consider temporal trends in a text stream. In this paper, we propose TextRank-nfidf, which incorporates temporal trends into phrase node weights by computing the smoothed difference of editing frequency between revisions. We confirm that the detected bursts and keyphrases match well with events along the timeline.",
keywords = "burst detection, edit history, extraction, TextRank",
author = "Zihang Chen and Mizuho Iwaihara",
year = "2019",
month = "4",
day = "1",
doi = "10.1109/BIGCOMP.2019.8679105",
language = "English",
series = "2019 IEEE International Conference on Big Data and Smart Computing, BigComp 2019 - Proceedings",
publisher = "Institute of Electrical and Electronics Engineers Inc.",
booktitle = "2019 IEEE International Conference on Big Data and Smart Computing, BigComp 2019 - Proceedings",

}