Measuring the similarity between implicit semantic relations using Web search engines

Danushka Bollegala, Yutaka Matsuo, Mitsuru Ishizuka

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

6 Citations (Scopus)

Abstract

Measuring the similarity between implicit semantic relations is an important task in information retrieval and natural language processing. For example, consider the situation where you know an entity-pair (e.g. Google, YouTube), between which a particular relation holds (e.g. acquisition), and you are interested in retrieving other entity-pairs for which the same relation holds (e.g. Yahoo, Inktomi). Existing keyword-based search engines cannot be directly applied in this case because in keyword-based search, the goal is to retrieve documents that are relevant to the words used in the query - not necessarily to the relations implied by a pair of words. Accurate measurement of relational similarity is an important step in numerous natural language processing tasks such as identification of word analogies, and classification of noun-modifier pairs. We propose a method that uses Web search engines to efficiently compute the relational similarity between two pairs of words. Our method consists of three components: representing the various semantic relations that exist between a pair of words using automatically extracted lexical patterns, clustering the extracted lexical patterns to identify the different semantic relations implied by them, and measuring the similarity between different semantic relations using an inter-cluster correlation matrix. We propose a pattern extraction algorithm to extract a large number of lexical patterns that express numerous semantic relations. We then present an efficient clustering algorithm to cluster the extracted lexical patterns. Finally, we measure the relational similarity between word-pairs using inter-cluster correlation. We evaluate the proposed method in a relation classification task. Experimental results on a dataset covering multiple relation types show a statistically significant improvement over the current state-of-the-art relational similarity measures.
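
The three components named in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the abstract does not specify the algorithms, so the pattern form (tokens from X to Y within a short window), the greedy one-pass clustering, the cosine-based inter-cluster correlation, and all function names are assumptions. The snippets are hand-written stand-ins for text that would come from a Web search engine.

```python
import math
import re
from collections import Counter, defaultdict


def extract_patterns(pair, snippets):
    """Count lexical patterns 'X ... Y' for one word pair (assumed pattern form)."""
    a, b = (w.lower() for w in pair)
    counts = Counter()
    for text in snippets:
        tokens = re.findall(r"\w+", text.lower())
        tokens = ["X" if t == a else "Y" if t == b else t for t in tokens]
        for i, tok in enumerate(tokens):
            if tok != "X":
                continue
            # Accept the first Y found within a short window after X.
            for j in range(i + 1, min(i + 6, len(tokens))):
                if tokens[j] == "Y":
                    counts[" ".join(tokens[i:j + 1])] += 1
                    break
    return counts


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm = math.sqrt(sum(x * x for x in u.values())) * math.sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0


def cluster_patterns(pattern_vectors, threshold):
    """Greedy one-pass clustering: join the most similar cluster or open a new one."""
    clusters = []
    # Visit frequent patterns first; ties are broken alphabetically.
    order = sorted(pattern_vectors.items(), key=lambda kv: (-sum(kv[1].values()), kv[0]))
    for pattern, vec in order:
        best, best_sim = None, threshold
        for cluster in clusters:
            sim = cosine(vec, cluster["centroid"])
            if sim >= best_sim:
                best, best_sim = cluster, sim
        if best is None:
            clusters.append({"patterns": [pattern], "centroid": Counter(vec)})
        else:
            best["patterns"].append(pattern)
            best["centroid"].update(vec)
    return clusters


def relational_similarity(pair_p, pair_q, pair_patterns, clusters):
    """Compare two word pairs through their cluster-level pattern counts."""
    def features(pair):
        counts = pair_patterns[pair]
        return [sum(counts[p] for p in c["patterns"]) for c in clusters]

    x, y = features(pair_p), features(pair_q)
    # Assumed inter-cluster correlation: cosine between cluster centroids.
    corr = [[cosine(ci["centroid"], cj["centroid"]) for cj in clusters] for ci in clusters]

    def bilinear(u, v):
        return sum(u[i] * corr[i][j] * v[j] for i in range(len(u)) for j in range(len(v)))

    denom = math.sqrt(bilinear(x, x) * bilinear(y, y))
    return bilinear(x, y) / denom if denom else 0.0


# Hand-written toy snippets (not real search results).
snippets = {
    ("Google", "YouTube"): ["Google acquired YouTube in 2006", "Google bought YouTube"],
    ("Yahoo", "Inktomi"): ["Yahoo acquired Inktomi", "Yahoo bought Inktomi for cash"],
    ("ostrich", "bird"): ["the ostrich is a large bird", "an ostrich is a bird"],
}

pair_patterns = {pair: extract_patterns(pair, texts) for pair, texts in snippets.items()}

# Represent each pattern by the word pairs it occurs with.
pattern_vectors = defaultdict(Counter)
for pair, counts in pair_patterns.items():
    for pattern, n in counts.items():
        pattern_vectors[pattern][pair] += n

clusters = cluster_patterns(pattern_vectors, threshold=0.5)
sim_same = relational_similarity(("Google", "YouTube"), ("Yahoo", "Inktomi"), pair_patterns, clusters)
sim_diff = relational_similarity(("Google", "YouTube"), ("ostrich", "bird"), pair_patterns, clusters)
print(sim_same, sim_diff)
```

On this toy input the two acquisition pairs share all of their patterns and therefore score higher than the acquisition pair compared with the is-a pair. The similarity threshold of 0.5 is an arbitrary choice for the example.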

Original language: English
Title of host publication: Proceedings of the 2nd ACM International Conference on Web Search and Data Mining, WSDM'09
Pages: 104-113
Number of pages: 10
DOI: https://doi.org/10.1145/1498759.1498815
Publication status: Published - 2009
Externally published: Yes
Event: 2nd ACM International Conference on Web Search and Data Mining, WSDM'09 - Barcelona
Duration: 2009 Feb 9 to 2009 Feb 12

Other

Other: 2nd ACM International Conference on Web Search and Data Mining, WSDM'09
City: Barcelona
Period: 09/2/9 to 09/2/12

Fingerprint

Search engines
Semantics
Processing
Information retrieval
Clustering algorithms

Keywords

  • Relational similarity measures
  • Web mining

ASJC Scopus subject areas

  • Computer Networks and Communications
  • Software

Cite this

Bollegala, D., Matsuo, Y., & Ishizuka, M. (2009). Measuring the similarity between implicit semantic relations using Web search engines. In Proceedings of the 2nd ACM International Conference on Web Search and Data Mining, WSDM'09 (pp. 104-113) https://doi.org/10.1145/1498759.1498815

@inproceedings{88ad0234cf9d4b56912a1583ca1fe875,
title = "Measuring the similarity between implicit semantic relations using Web search engines",
keywords = "Relational similarity measures, Web mining",
author = "Danushka Bollegala and Yutaka Matsuo and Mitsuru Ishizuka",
year = "2009",
doi = "10.1145/1498759.1498815",
language = "English",
isbn = "9781605583907",
pages = "104--113",
booktitle = "Proceedings of the 2nd ACM International Conference on Web Search and Data Mining, WSDM'09",

}

TY - GEN

T1 - Measuring the similarity between implicit semantic relations using Web search engines

AU - Bollegala, Danushka

AU - Matsuo, Yutaka

AU - Ishizuka, Mitsuru

PY - 2009

Y1 - 2009

KW - Relational similarity measures

KW - Web mining

UR - http://www.scopus.com/inward/record.url?scp=70349089003&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=70349089003&partnerID=8YFLogxK

U2 - 10.1145/1498759.1498815

DO - 10.1145/1498759.1498815

M3 - Conference contribution

AN - SCOPUS:70349089003

SN - 9781605583907

SP - 104

EP - 113

BT - Proceedings of the 2nd ACM International Conference on Web Search and Data Mining, WSDM'09

ER -