Abstract
Semantic similarity measures play important roles in information retrieval and Natural Language Processing. Previous work in Semantic Web-related applications such as community mining, relation extraction and automatic metadata extraction has used various semantic similarity measures. Despite the usefulness of these measures in such applications, robustly measuring semantic similarity between two words (or entities) remains a challenging task. We propose a robust semantic similarity measure that uses the information available on the Web to measure similarity between words or entities. The proposed method exploits page counts and text snippets returned by a Web search engine. We define various similarity scores for two given words P and Q, using the page counts for the queries P, Q and P AND Q. Moreover, we propose a novel approach to compute semantic similarity using lexico-syntactic patterns automatically extracted from text snippets. These different similarity scores are integrated using support vector machines to produce a robust semantic similarity measure. Experimental results on the Miller-Charles benchmark dataset show that the proposed measure outperforms all existing web-based semantic similarity measures by a wide margin, achieving a correlation coefficient of 0.834. Moreover, the proposed measure significantly improves accuracy in a community mining task (F-measure of 0.78) and in an entity disambiguation task, thereby verifying its capability to capture semantic similarity using web content.
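The abstract does not spell out the page-count scores. Below is a minimal sketch, assuming they are the standard Jaccard, Overlap, Dice and pointwise mutual information (PMI) coefficients computed over hit counts H(P), H(Q) and H(P AND Q). The function names, the noise threshold `C = 5` and the index size `N = 1e10` are illustrative assumptions, not values stated in this abstract.

```python
import math

# Assumptions (not given in the abstract): N approximates the number of
# documents indexed by the search engine; C is a minimum co-occurrence count
# below which the P AND Q hit count is treated as noise and scored as 0.
N = 1e10
C = 5


def web_jaccard(h_p: int, h_q: int, h_pq: int) -> float:
    """Jaccard coefficient on page counts: H(P AND Q) / H(P OR Q)."""
    if h_pq <= C:
        return 0.0
    return h_pq / (h_p + h_q - h_pq)


def web_overlap(h_p: int, h_q: int, h_pq: int) -> float:
    """Overlap (Simpson) coefficient: H(P AND Q) / min(H(P), H(Q))."""
    if h_pq <= C:
        return 0.0
    return h_pq / min(h_p, h_q)


def web_dice(h_p: int, h_q: int, h_pq: int) -> float:
    """Dice coefficient: 2 H(P AND Q) / (H(P) + H(Q))."""
    if h_pq <= C:
        return 0.0
    return 2 * h_pq / (h_p + h_q)


def web_pmi(h_p: int, h_q: int, h_pq: int) -> float:
    """Pointwise mutual information, treating H(x) / N as a probability."""
    if h_pq <= C:
        return 0.0
    return math.log2((h_pq / N) / ((h_p / N) * (h_q / N)))


def page_count_features(h_p: int, h_q: int, h_pq: int) -> list[float]:
    """Hypothetical helper: bundle the four scores into one feature list.

    Assumes consistent counts (H(P AND Q) <= min(H(P), H(Q))); real search
    engine estimates can violate this and would need extra guarding.
    """
    return [
        web_jaccard(h_p, h_q, h_pq),
        web_overlap(h_p, h_q, h_pq),
        web_dice(h_p, h_q, h_pq),
        web_pmi(h_p, h_q, h_pq),
    ]


if __name__ == "__main__":
    # Made-up hit counts for illustration only.
    print(page_count_features(1000, 500, 100))
```

In a pipeline like the one the abstract describes, such scores would form part of the feature vector handed to the support vector machine, alongside the pattern-based features extracted from snippets (not shown here).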
Original language | English |
---|---|
Title of host publication | 16th International World Wide Web Conference, WWW2007 |
Pages | 757-766 |
Number of pages | 10 |
Publication status | Published - 2007 |
Externally published | Yes |
Event | 16th International World Wide Web Conference, WWW2007 - Banff, AB Duration: 2007 May 8 → 2007 May 12 |
ASJC Scopus subject areas
- Computer Networks and Communications
- Software