Measuring semantic similarity between words using web search engines

Danushka Bollegala, Yutaka Matsuo, Mitsuru Ishizuka

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

419 Citations (Scopus)

Abstract

Semantic similarity measures play important roles in information retrieval and Natural Language Processing. Previous work in semantic web-related applications such as community mining, relation extraction, and automatic metadata extraction has used various semantic similarity measures. Despite the usefulness of semantic similarity measures in these applications, robustly measuring semantic similarity between two words (or entities) remains a challenging task. We propose a robust semantic similarity measure that uses the information available on the Web to measure similarity between words or entities. The proposed method exploits page counts and text snippets returned by a Web search engine. We define various similarity scores for two given words P and Q, using the page counts for the queries P, Q and P AND Q. Moreover, we propose a novel approach to compute semantic similarity using automatically extracted lexico-syntactic patterns from text snippets. These different similarity scores are integrated using support vector machines, to leverage a robust semantic similarity measure. Experimental results on the Miller-Charles benchmark dataset show that the proposed measure outperforms all the existing web-based semantic similarity measures by a wide margin, achieving a correlation coefficient of 0.834. Moreover, the proposed semantic similarity measure significantly improves the accuracy (F-measure of 0.78) in a community mining task, and in an entity disambiguation task, thereby verifying the capability of the proposed measure to capture semantic similarity using web content.
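The abstract does not say which page-count scores the authors define. As a minimal sketch only, the following shows two standard set-overlap coefficients (Jaccard and Dice) computed from the three page counts the abstract mentions: the counts for P, for Q, and for P AND Q. The function names and the sample counts are hypothetical and are not taken from the paper or from any search engine API.

```python
def jaccard_from_page_counts(count_p: int, count_q: int, count_pq: int) -> float:
    """Jaccard-style score: pages containing both words over pages containing either."""
    union = count_p + count_q - count_pq
    if count_pq <= 0 or union <= 0:
        return 0.0  # no co-occurrence, or inconsistent counts
    return count_pq / union


def dice_from_page_counts(count_p: int, count_q: int, count_pq: int) -> float:
    """Dice-style score: twice the co-occurrence count over the sum of individual counts."""
    total = count_p + count_q
    if count_pq <= 0 or total <= 0:
        return 0.0
    return 2 * count_pq / total


# Hypothetical page counts, invented purely for illustration.
count_p, count_q, count_pq = 1000, 500, 250
print(jaccard_from_page_counts(count_p, count_q, count_pq))
print(dice_from_page_counts(count_p, count_q, count_pq))
```

Both scores lie between 0 and 1 when the counts are consistent, so they could serve as features alongside others in a downstream classifier, which is the role the abstract assigns to its similarity scores.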

Original language: English
Title of host publication: 16th International World Wide Web Conference, WWW2007
Pages: 757-766
Number of pages: 10
DOIs: https://doi.org/10.1145/1242572.1242675
Publication status: Published - 2007
Externally published: Yes
Event: 16th International World Wide Web Conference, WWW2007 - Banff, AB
Duration: 2007 May 8 to 2007 May 12

Other

Other: 16th International World Wide Web Conference, WWW2007
City: Banff, AB
Period: 07/5/8 to 07/5/12

ASJC Scopus subject areas

  • Computer Networks and Communications
  • Software

Cite this

Bollegala, D., Matsuo, Y., & Ishizuka, M. (2007). Measuring semantic similarity between words using web search engines. In 16th International World Wide Web Conference, WWW2007 (pp. 757-766). https://doi.org/10.1145/1242572.1242675