TY - GEN
T1 - Link Prediction for Wikipedia Articles based on Temporal Article Embedding
AU - Ma, Jiaji
AU - Iwaihara, Mizuho
N1 - Publisher Copyright:
Copyright © 2021 by SCITEPRESS – Science and Technology Publications, Lda. All rights reserved.
PY - 2021
Y1 - 2021
N2 - Wikipedia articles contain a vast number of hyperlinks (internal links) connecting their subjects to other Wikipedia articles. Predicting future links for newly created articles is useful: suggesting new links from or to existing articles can reduce editors' burden by alerting them to necessary or missing links in their updates. In this paper, we discuss link prediction on linked and versioned articles. We propose new graph embeddings based on a temporal random walk that is biased by the timestamp difference and the semantic difference between linked and versioned articles. We generate article sequences by concatenating the article titles and category names along each random walk path. A pretrained language model is then further trained to learn contextualized embeddings of these article sequences. Our link prediction experiments predict future links between new nodes and existing nodes. For evaluation, we compare our model's predictions with those of three random walk-based graph embedding models, DeepWalk, Node2vec, and CTDNE, using ROC AUC, PRC AUC, Precision@k, Recall@k, and F1@k as evaluation metrics. Our experimental results show that our proposed model, TLPRB, outperforms these baselines on all evaluation metrics.
AB - Wikipedia articles contain a vast number of hyperlinks (internal links) connecting their subjects to other Wikipedia articles. Predicting future links for newly created articles is useful: suggesting new links from or to existing articles can reduce editors' burden by alerting them to necessary or missing links in their updates. In this paper, we discuss link prediction on linked and versioned articles. We propose new graph embeddings based on a temporal random walk that is biased by the timestamp difference and the semantic difference between linked and versioned articles. We generate article sequences by concatenating the article titles and category names along each random walk path. A pretrained language model is then further trained to learn contextualized embeddings of these article sequences. Our link prediction experiments predict future links between new nodes and existing nodes. For evaluation, we compare our model's predictions with those of three random walk-based graph embedding models, DeepWalk, Node2vec, and CTDNE, using ROC AUC, PRC AUC, Precision@k, Recall@k, and F1@k as evaluation metrics. Our experimental results show that our proposed model, TLPRB, outperforms these baselines on all evaluation metrics.
KW - Graph Embedding
KW - Link Prediction
KW - Temporal Random Walk
UR - http://www.scopus.com/inward/record.url?scp=85146200624&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85146200624&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:85146200624
T3 - International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management, IC3K - Proceedings
SP - 87
EP - 94
BT - 13th International Conference on Knowledge Discovery and Information Retrieval, KDIR 2021 as part of IC3K 2021 - Proceedings of the 13th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management
A2 - Cucchiara, Rita
A2 - Fred, Ana
A2 - Filipe, Joaquim
PB - Science and Technology Publications, Lda
T2 - 13th International Conference on Knowledge Discovery and Information Retrieval, KDIR 2021 as part of 13th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management, IC3K 2021
Y2 - 25 October 2021 through 27 October 2021
ER -