Revision graph extraction in Wikipedia based on supergram decomposition and sliding update

Jianmin Wu, Mizuho Iwaihara

Research output: Contribution to journal (Article)

Abstract

As one of the popular social media platforms that many people have turned to in recent years, the collaborative encyclopedia Wikipedia provides information from a more "Neutral Point of View" than others. In pursuit of this core principle, considerable effort has been put into collaborative contribution and editing. The trajectories along which such collaboration unfolds across revisions are valuable for group dynamics and social media research, which suggests that the underlying derivation relationships among revisions should be extracted precisely from the chronologically sorted revision history. In this paper, we propose a revision graph extraction method based on supergram decomposition over a document collection of near-duplicates. The plain text of each revision is represented by its frequency distribution of supergrams, where a supergram is a variable-length token sequence that remains unchanged across revisions. We show that this method performs the task more effectively than existing methods.
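The abstract describes each revision as a frequency distribution of supergrams. The Python sketch below is not the authors' algorithm; it only illustrates that idea under stated assumptions. The hypothetical `decompose` treats a supergram as a maximal token run whose adjacent token pairs occur in every revision, revisions are compared by cosine similarity of their supergram counts (an assumed measure), and each revision is linked to its most similar earlier revision (an assumed parent rule). The "sliding update" named in the title is not modeled here.

```python
from collections import Counter
from math import sqrt


def bigrams(tokens):
    """Return the set of adjacent token pairs in one revision."""
    return set(zip(tokens, tokens[1:]))


def decompose(revisions):
    """Split each tokenized revision into maximal runs whose internal
    bigrams occur in every revision (an assumed, simplified 'supergram'),
    and return one supergram frequency Counter per revision."""
    if not revisions:
        return []
    shared = set.intersection(*(bigrams(r) for r in revisions))
    result = []
    for tokens in revisions:
        if not tokens:
            result.append(Counter())
            continue
        segments, current = [], [tokens[0]]
        for prev, tok in zip(tokens, tokens[1:]):
            if (prev, tok) in shared:
                current.append(tok)  # stable boundary: extend the run
            else:
                segments.append(tuple(current))  # unstable: close the run
                current = [tok]
        segments.append(tuple(current))
        result.append(Counter(segments))
    return result


def cosine(p, q):
    """Cosine similarity between two supergram frequency Counters."""
    dot = sum(p[k] * q[k] for k in p.keys() & q.keys())
    norm = sqrt(sum(v * v for v in p.values())) * sqrt(sum(v * v for v in q.values()))
    return dot / norm if norm else 0.0


def revision_graph(texts):
    """Link each revision to its most similar earlier revision
    (an assumed parent-selection rule); returns (parent, child) edges."""
    vectors = decompose([t.split() for t in texts])
    edges = []
    for i in range(1, len(vectors)):
        parent = max(range(i), key=lambda j: cosine(vectors[i], vectors[j]))
        edges.append((parent, i))
    return edges


history = [
    "wikipedia is a free online encyclopedia",
    "wikipedia is a free online encyclopedia written by volunteers",
    "wikipedia is a vandalized page",
    "wikipedia is a free online encyclopedia written by volunteers worldwide",
]
print(revision_graph(history))  # [(0, 1), (0, 2), (1, 3)]
```

In this toy history the vandalized revision 2 is linked back to revision 0, while revision 3 is linked to revision 1 rather than to its chronological predecessor, so the result is a tree rather than a simple chain.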

Original language: English
Pages (from-to): 770-778
Number of pages: 9
Journal: IEICE Transactions on Information and Systems
Volume: E97-D
Issue number: 4
DOI: 10.1587/transinf.E97.D.770
Publication status: Published - 2014

Keywords

  • Collaboration
  • Revision history
  • Wikipedia

ASJC Scopus subject areas

  • Electrical and Electronic Engineering
  • Software
  • Artificial Intelligence
  • Hardware and Architecture
  • Computer Vision and Pattern Recognition

Cite this

Revision graph extraction in Wikipedia based on supergram decomposition and sliding update. / Wu, Jianmin; Iwaihara, Mizuho.

In: IEICE Transactions on Information and Systems, Vol. E97-D, No. 4, 2014, p. 770-778.

@article{1c3a808fd74f40409024bcb5d2741b8c,
title = "Revision graph extraction in Wikipedia based on supergram decomposition and sliding update",
keywords = "Collaboration, Revision history, Wikipedia",
author = "Jianmin Wu and Mizuho Iwaihara",
year = "2014",
doi = "10.1587/transinf.E97.D.770",
language = "English",
volume = "E97-D",
pages = "770--778",
journal = "IEICE Transactions on Information and Systems",
issn = "0916-8532",
publisher = "Maruzen Co., Ltd/Maruzen Kabushikikaisha",
number = "4",

}

TY - JOUR

T1 - Revision graph extraction in Wikipedia based on supergram decomposition and sliding update

AU - Wu, Jianmin

AU - Iwaihara, Mizuho

PY - 2014

Y1 - 2014

KW - Collaboration

KW - Revision history

KW - Wikipedia

UR - http://www.scopus.com/inward/record.url?scp=84897393449&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84897393449&partnerID=8YFLogxK

U2 - 10.1587/transinf.E97.D.770

DO - 10.1587/transinf.E97.D.770

M3 - Article

VL - E97-D

SP - 770

EP - 778

JO - IEICE Transactions on Information and Systems

JF - IEICE Transactions on Information and Systems

SN - 0916-8532

IS - 4

ER -