Automatically extracting personal name aliases from the web

Danushka Bollegala, Taiki Honma, Yutaka Matsuo, Mitsuru Ishizuka

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

4 Citations (Scopus)

Abstract

Extracting aliases of an entity is important for various tasks such as identification of relations among entities, web search, and entity disambiguation. To extract relations among entities properly, one must first identify those entities. We propose a novel approach to find aliases of a given name using automatically extracted lexical patterns. We exploit a set of known names and their aliases as training data and extract lexical patterns that convey information related to aliases of names from text snippets returned by a web search engine. The patterns are then used to find candidate aliases of a given name. We use anchor texts to design a word co-occurrence model and use it to define various ranking scores to measure the association between a name and a candidate alias. The ranking scores are integrated with page-count-based association measures using support vector machines to obtain a robust alias detection method. The proposed method outperforms numerous baselines and previous work on alias extraction on a dataset of personal names, achieving a statistically significant mean reciprocal rank of 0.6718. Experiments carried out using a dataset of location names and Japanese personal names suggest the possibility of extending the proposed method to extract aliases for different types of named entities and for other languages. Moreover, the aliases extracted using the proposed method improve recall by 20% in a relation-detection task.
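
The abstract reports results as mean reciprocal rank (MRR). As a rough illustration of that metric only, and not the authors' evaluation code, the sketch below computes MRR over ranked candidate lists. The function names and the toy names and aliases are hypothetical; the sketch assumes a name contributes 0 when none of its correct aliases appears in the ranking.

```python
from typing import Dict, List, Set


def reciprocal_rank(ranked: List[str], correct: Set[str]) -> float:
    """Return 1/position of the first correct alias, or 0.0 if none is ranked."""
    for position, candidate in enumerate(ranked, start=1):
        if candidate in correct:
            return 1.0 / position
    return 0.0


def mean_reciprocal_rank(rankings: Dict[str, List[str]],
                         gold: Dict[str, Set[str]]) -> float:
    """Average the reciprocal rank over every name that has a ranking."""
    if not rankings:
        return 0.0
    total = sum(
        reciprocal_rank(candidates, gold.get(name, set()))
        for name, candidates in rankings.items()
    )
    return total / len(rankings)


# Hypothetical toy data: ranked candidate aliases per name, and the known aliases.
rankings = {
    "name_a": ["noise_1", "alias_a", "noise_2"],  # correct alias at rank 2
    "name_b": ["alias_b", "noise_3"],             # correct alias at rank 1
    "name_c": ["noise_4", "noise_5"],             # correct alias never ranked
}
gold = {
    "name_a": {"alias_a"},
    "name_b": {"alias_b"},
    "name_c": {"alias_c"},
}

print(mean_reciprocal_rank(rankings, gold))
```

Here the three names contribute 1/2, 1, and 0 respectively, so a higher MRR means correct aliases tend to appear nearer the top of the ranked candidate lists.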

Original language: English
Title of host publication: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Pages: 77-88
Number of pages: 12
Volume: 5221 LNAI
DOI: https://doi.org/10.1007/978-3-540-85287-2_8
Publication status: Published - 2008
Externally published: Yes
Event: 6th International Conference on Natural Language Processing, GoTAL 2008 - Gothenburg
Duration: 2008 Aug 25 to 2008 Aug 27

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 5221 LNAI
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Other

Other: 6th International Conference on Natural Language Processing, GoTAL 2008
City: Gothenburg
Period: 2008 Aug 25 to 2008 Aug 27

Fingerprint

Search engines
Anchors
Support vector machines
Web Search
Ranking
Experiments
Association Measure
Search Engine
Leverage
Baseline
Support Vector Machine
Count
Experiment
Text
Model

ASJC Scopus subject areas

  • Computer Science (all)
  • Theoretical Computer Science

Cite this

Bollegala, D., Honma, T., Matsuo, Y., & Ishizuka, M. (2008). Automatically extracting personal name aliases from the web. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 5221 LNAI, pp. 77-88). https://doi.org/10.1007/978-3-540-85287-2_8

@inproceedings{ee0ca199e05a48aaa152b491df57e04a,
title = "Automatically extracting personal name aliases from the web",
author = "Danushka Bollegala and Taiki Honma and Yutaka Matsuo and Mitsuru Ishizuka",
year = "2008",
doi = "10.1007/978-3-540-85287-2_8",
language = "English",
isbn = "3540852867",
volume = "5221 LNAI",
series = "Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)",
pages = "77--88",
booktitle = "Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)",
}

TY  - GEN
T1  - Automatically extracting personal name aliases from the web
AU  - Bollegala, Danushka
AU  - Honma, Taiki
AU  - Matsuo, Yutaka
AU  - Ishizuka, Mitsuru
PY  - 2008
Y1  - 2008
UR  - http://www.scopus.com/inward/record.url?scp=52149108487&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=52149108487&partnerID=8YFLogxK
U2  - 10.1007/978-3-540-85287-2_8
DO  - 10.1007/978-3-540-85287-2_8
M3  - Conference contribution
AN  - SCOPUS:52149108487
SN  - 3540852867
SN  - 9783540852865
VL  - 5221 LNAI
T3  - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP  - 77
EP  - 88
BT  - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
ER  -