Mining for personal name aliases on the web

Danushka Bollegala, Taiki Honma, Yutaka Matsuo, Mitsuru Ishizuka

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

20 Citations (Scopus)

Abstract

We propose a novel approach to find aliases of a given name from the web. We exploit a set of known names and their aliases as training data and extract lexical patterns that convey information related to aliases of names from text snippets returned by a web search engine. The patterns are then used to find candidate aliases of a given name. We use anchor texts and hyperlinks to design a word co-occurrence model and define numerous ranking scores to evaluate the association between a name and its candidate aliases. The proposed method outperforms numerous baselines and previous work on alias extraction on a dataset of personal names, achieving a statistically significant mean reciprocal rank of 0.6718. Moreover, the aliases extracted using the proposed method improve recall by 20% in a relation-detection task.
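The abstract reports its headline result as a mean reciprocal rank (MRR). As a minimal sketch of how that metric is conventionally computed (this is not the authors' evaluation code), the snippet below averages, over all query names, the reciprocal of the rank at which the first correct alias appears in the ranked candidate list. The function name `mean_reciprocal_rank` and the toy names and aliases are hypothetical and chosen only for illustration. Scoring a name with no correct alias in its list as 0 is a common convention assumed here; the abstract does not specify it.

```python
from typing import Dict, List, Set


def mean_reciprocal_rank(
    ranked_candidates: Dict[str, List[str]],
    gold_aliases: Dict[str, Set[str]],
) -> float:
    """Average of 1/rank of the first correct alias, over all query names.

    A name whose candidate list contains no correct alias contributes 0
    (an assumed convention, not stated in the abstract).
    """
    if not ranked_candidates:
        return 0.0

    total = 0.0
    for name, candidates in ranked_candidates.items():
        correct = gold_aliases.get(name, set())
        # Ranks are 1-based: the top candidate has rank 1.
        for rank, candidate in enumerate(candidates, start=1):
            if candidate in correct:
                total += 1.0 / rank
                break
    return total / len(ranked_candidates)


if __name__ == "__main__":
    # Hypothetical toy data: two query names with ranked candidate aliases.
    ranked = {
        "name_a": ["wrong_1", "alias_a"],             # first hit at rank 2
        "name_b": ["alias_b", "wrong_2", "wrong_3"],  # first hit at rank 1
    }
    gold = {"name_a": {"alias_a"}, "name_b": {"alias_b"}}
    print(mean_reciprocal_rank(ranked, gold))  # (1/2 + 1/1) / 2 = 0.75
```

Under this definition, an MRR of 0.6718 means that the average reciprocal rank of the first correct alias is about 0.67. That average can mix rank-1 hits with lower-ranked ones, so it does not translate into a single typical rank.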

Original language: English
Title of host publication: Proceeding of the 17th International Conference on World Wide Web 2008, WWW'08
Pages: 1107-1108
Number of pages: 2
DOIs: https://doi.org/10.1145/1367497.1367679
Publication status: Published - 2008
Externally published: Yes
Event: 17th International Conference on World Wide Web 2008, WWW'08 - Beijing
Duration: 2008 Apr 21 to 2008 Apr 25

Other

Other: 17th International Conference on World Wide Web 2008, WWW'08
City: Beijing
Period: 08/4/21 to 08/4/25

Fingerprint

Search engines
Anchors

Keywords

  • Name alias extraction
  • Semantic web
  • Web mining

ASJC Scopus subject areas

  • Computer Networks and Communications

Cite this

Bollegala, D., Honma, T., Matsuo, Y., & Ishizuka, M. (2008). Mining for personal name aliases on the web. In Proceeding of the 17th International Conference on World Wide Web 2008, WWW'08 (pp. 1107-1108) https://doi.org/10.1145/1367497.1367679

Mining for personal name aliases on the web. / Bollegala, Danushka; Honma, Taiki; Matsuo, Yutaka; Ishizuka, Mitsuru.

Proceeding of the 17th International Conference on World Wide Web 2008, WWW'08. 2008. p. 1107-1108.

Bollegala, D, Honma, T, Matsuo, Y & Ishizuka, M 2008, Mining for personal name aliases on the web. in Proceeding of the 17th International Conference on World Wide Web 2008, WWW'08. pp. 1107-1108, 17th International Conference on World Wide Web 2008, WWW'08, Beijing, 08/4/21. https://doi.org/10.1145/1367497.1367679
Bollegala D, Honma T, Matsuo Y, Ishizuka M. Mining for personal name aliases on the web. In Proceeding of the 17th International Conference on World Wide Web 2008, WWW'08. 2008. p. 1107-1108 https://doi.org/10.1145/1367497.1367679
Bollegala, Danushka ; Honma, Taiki ; Matsuo, Yutaka ; Ishizuka, Mitsuru. / Mining for personal name aliases on the web. Proceeding of the 17th International Conference on World Wide Web 2008, WWW'08. 2008. pp. 1107-1108
@inproceedings{e40a9a218bf24b01ad992b4ff6cf3904,
title = "Mining for personal name aliases on the web",
abstract = "We propose a novel approach to find aliases of a given name from the web. We exploit a set of known names and their aliases as training data and extract lexical patterns that convey information related to aliases of names from text snippets returned by a web search engine. The patterns are then used to find candidate aliases of a given name. We use anchor texts and hyperlinks to design a word co-occurrence model and define numerous ranking scores to evaluate the association between a name and its candidate aliases. The proposed method outperforms numerous baselines and previous work on alias extraction on a dataset of personal names, achieving a statistically significant mean reciprocal rank of 0.6718. Moreover, the aliases extracted using the proposed method improve recall by 20{\%} in a relation-detection task.",
keywords = "Name alias extraction, Semantic web, Web mining",
author = "Danushka Bollegala and Taiki Honma and Yutaka Matsuo and Mitsuru Ishizuka",
year = "2008",
doi = "10.1145/1367497.1367679",
language = "English",
isbn = "9781605580852",
pages = "1107--1108",
booktitle = "Proceeding of the 17th International Conference on World Wide Web 2008, WWW'08",

}

TY - GEN

T1 - Mining for personal name aliases on the web

AU - Bollegala, Danushka

AU - Honma, Taiki

AU - Matsuo, Yutaka

AU - Ishizuka, Mitsuru

PY - 2008

Y1 - 2008

N2 - We propose a novel approach to find aliases of a given name from the web. We exploit a set of known names and their aliases as training data and extract lexical patterns that convey information related to aliases of names from text snippets returned by a web search engine. The patterns are then used to find candidate aliases of a given name. We use anchor texts and hyperlinks to design a word co-occurrence model and define numerous ranking scores to evaluate the association between a name and its candidate aliases. The proposed method outperforms numerous baselines and previous work on alias extraction on a dataset of personal names, achieving a statistically significant mean reciprocal rank of 0.6718. Moreover, the aliases extracted using the proposed method improve recall by 20% in a relation-detection task.

KW - Name alias extraction

KW - Semantic web

KW - Web mining

UR - http://www.scopus.com/inward/record.url?scp=57349141388&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=57349141388&partnerID=8YFLogxK

U2 - 10.1145/1367497.1367679

DO - 10.1145/1367497.1367679

M3 - Conference contribution

SN - 9781605580852

SP - 1107

EP - 1108

BT - Proceeding of the 17th International Conference on World Wide Web 2008, WWW'08

ER -