Extracting key phrases to disambiguate personal names on the web

Danushka Bollegala, Yutaka Matsuo, Mitsuru Ishizuka

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

4 Citations (Scopus)

Abstract

When you search for information regarding a particular person on the web, a search engine returns many pages. Some of these pages may be about different people with the same name. How can we disambiguate these different people with the same name? This paper presents an unsupervised algorithm which produces key phrases for the different people with the same name. These key phrases could be used to further narrow down the search, leading to more person-specific, unambiguous information. The algorithm we propose does not require any biographical or social information regarding the person. Although there is some previous work in personal name disambiguation on the web, to our knowledge, this is the first attempt to extract key phrases to disambiguate the different persons with the same name. To evaluate our algorithm, we collected and hand-labeled a dataset of over 1000 Web pages retrieved from Google using personal name queries. Our experimental results show an improvement over the existing methods for namesake disambiguation.

Original language: English
Title of host publication: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Pages: 223-234
Number of pages: 12
Volume: 3878 LNCS
DOIs: https://doi.org/10.1007/11671299_24
Publication status: Published - 2006
Externally published: Yes
Event: 7th International Conference on Computational Linguistics and Intelligent Text Processing, CICLing 2006 - Mexico City
Duration: 2006 Feb 19 to 2006 Feb 25

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 3878 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Other

Other: 7th International Conference on Computational Linguistics and Intelligent Text Processing, CICLing 2006
City: Mexico City
Period: 06/2/19 to 06/2/25

Fingerprint

Names
Person
Search engines
World Wide Web
Websites
Search Engine
Query
Evaluate
Experimental Results
Hand

ASJC Scopus subject areas

  • Computer Science (all)
  • Biochemistry, Genetics and Molecular Biology (all)
  • Theoretical Computer Science

Cite this

Bollegala, D., Matsuo, Y., & Ishizuka, M. (2006). Extracting key phrases to disambiguate personal names on the web. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 3878 LNCS, pp. 223-234). (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 3878 LNCS). https://doi.org/10.1007/11671299_24

@inproceedings{94478d9ec11f4c5aa4ed61537d8d714b,
title = "Extracting key phrases to disambiguate personal names on the web",
abstract = "When you search for information regarding a particular person on the web, a search engine returns many pages. Some of these pages may be about different people with the same name. How can we disambiguate these different people with the same name? This paper presents an unsupervised algorithm which produces key phrases for the different people with the same name. These key phrases could be used to further narrow down the search, leading to more person-specific, unambiguous information. The algorithm we propose does not require any biographical or social information regarding the person. Although there is some previous work in personal name disambiguation on the web, to our knowledge, this is the first attempt to extract key phrases to disambiguate the different persons with the same name. To evaluate our algorithm, we collected and hand-labeled a dataset of over 1000 Web pages retrieved from Google using personal name queries. Our experimental results show an improvement over the existing methods for namesake disambiguation.",
author = "Danushka Bollegala and Yutaka Matsuo and Mitsuru Ishizuka",
year = "2006",
doi = "10.1007/11671299_24",
language = "English",
isbn = "3540322051",
volume = "3878 LNCS",
series = "Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)",
pages = "223--234",
booktitle = "Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)",

}

TY - GEN

T1 - Extracting key phrases to disambiguate personal names on the web

AU - Bollegala, Danushka

AU - Matsuo, Yutaka

AU - Ishizuka, Mitsuru

PY - 2006

Y1 - 2006

AB - When you search for information regarding a particular person on the web, a search engine returns many pages. Some of these pages may be about different people with the same name. How can we disambiguate these different people with the same name? This paper presents an unsupervised algorithm which produces key phrases for the different people with the same name. These key phrases could be used to further narrow down the search, leading to more person-specific, unambiguous information. The algorithm we propose does not require any biographical or social information regarding the person. Although there is some previous work in personal name disambiguation on the web, to our knowledge, this is the first attempt to extract key phrases to disambiguate the different persons with the same name. To evaluate our algorithm, we collected and hand-labeled a dataset of over 1000 Web pages retrieved from Google using personal name queries. Our experimental results show an improvement over the existing methods for namesake disambiguation.

UR - http://www.scopus.com/inward/record.url?scp=33745557469&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=33745557469&partnerID=8YFLogxK

U2 - 10.1007/11671299_24

DO - 10.1007/11671299_24

M3 - Conference contribution

SN - 3540322051

SN - 9783540322054

VL - 3878 LNCS

T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

SP - 223

EP - 234

BT - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

ER -