Extracting key phrases to disambiguate personal names on the web

Danushka Bollegala*, Yutaka Matsuo, Mitsuru Ishizuka

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceedingConference contribution

6 Citations (Scopus)

Abstract

When you search for information regarding a particular person on the web, a search engine returns many pages. Some of these pages may be for people with the same name. How can we disambiguate these different people with the same name? This paper presents an unsupervised algorithm which produces key phrases for the different people with the same name. These key phrases could be used to further narrow down the search, leading to more person specific unambiguous information. The algorithm we propose does not require any biographical or social information regarding the person. Although there are some previous work in personal name disambiguation on the web, to our knowledge, this is the first attempt to extract key phrases to disambiguate the different persons with the same name. To evaluate our algorithm, we collected and hand labeled a dataset of over 1000 Web pages retrieved from Google using personal name queries. Our experimental results shows an improvement over the existing methods for namesake disambiguation.

Original languageEnglish
Title of host publicationLecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Pages223-234
Number of pages12
Volume3878 LNCS
DOIs
Publication statusPublished - 2006
Externally publishedYes
Event7th International Conference on Computational Linguistics and Intelligent Text Processing, CICLing 2006 - Mexico City
Duration: 2006 Feb 192006 Feb 25

Publication series

NameLecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume3878 LNCS
ISSN (Print)03029743
ISSN (Electronic)16113349

Other

Other7th International Conference on Computational Linguistics and Intelligent Text Processing, CICLing 2006
CityMexico City
Period06/2/1906/2/25

ASJC Scopus subject areas

  • Computer Science(all)
  • Biochemistry, Genetics and Molecular Biology(all)
  • Theoretical Computer Science

Fingerprint

Dive into the research topics of 'Extracting key phrases to disambiguate personal names on the web'. Together they form a unique fingerprint.

Cite this