TY - GEN
T1 - Detection of Bursty and Significant Keyphrases from Wikipedia edit history
AU - Chen, Zihang
AU - Iwaihara, Mizuho
N1 - Publisher Copyright: © 2019 IEEE.
PY - 2019/4/1
Y1 - 2019/4/1
AB - In an online collaboration system such as Wikipedia, edit history is stored as revisions. Topics of articles or categories grow and fade over time, and this evolutionary information is retained in the edit history. We consider that a great amount of information related to real-life events is hidden in the edit history of such documents. This paper focuses on a particular temporal text mining task: effectively extracting keyphrases from burst periods in the edit history of Wikipedia articles or categories. We first combine the ARIMA model with a decay function to find typical edit burst periods, then perform keyphrase extraction on the burst periods to reveal the topics of the bursts. However, keyphrase extraction methods such as TextRank do not consider temporal trends in a text stream. In this paper, we propose TextRank-nfidf, which reflects temporal trends in phrase node weights by computing the smoothed difference of editing frequency between revisions. We confirm that the detected bursts and keyphrases match well with events along the timeline.
KW - TextRank
KW - burst detection
KW - edit history
KW - extraction
UR - http://www.scopus.com/inward/record.url?scp=85064598975&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85064598975&partnerID=8YFLogxK
U2 - 10.1109/BIGCOMP.2019.8679105
DO - 10.1109/BIGCOMP.2019.8679105
M3 - Conference contribution
AN - SCOPUS:85064598975
T3 - 2019 IEEE International Conference on Big Data and Smart Computing, BigComp 2019 - Proceedings
BT - 2019 IEEE International Conference on Big Data and Smart Computing, BigComp 2019 - Proceedings
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2019 IEEE International Conference on Big Data and Smart Computing, BigComp 2019
Y2 - 27 February 2019 through 2 March 2019
ER -