Relational duality

Unsupervised extraction of semantic relations between entities on the web

Danushka Tarupathi Bollegala, Yutaka Matsuo, Mitsuru Ishizuka

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

53 Citations (Scopus)

Abstract

Extracting semantic relations among entities is an important first step in various tasks in Web mining and natural language processing, such as information extraction, relation detection, and social network mining. A relation can be expressed extensionally by stating all the instances of that relation, or intensionally by defining all the paraphrases of that relation. For example, consider the ACQUISITION relation between two companies. An extensional definition of ACQUISITION contains all pairs of companies in which one company is acquired by another (e.g. (YouTube, Google) or (Powerset, Microsoft)). On the other hand, we can intensionally define ACQUISITION as the relation described by lexical patterns such as "X is acquired by Y" or "Y purchased X", where X and Y denote two companies. We use this dual representation of semantic relations to propose a novel sequential co-clustering algorithm that can extract numerous relations efficiently from unlabeled data. We provide an efficient heuristic to find the parameters of the proposed co-clustering algorithm. Using the clusters produced by the algorithm, we train an L1-regularized logistic regression model to identify the representative patterns that describe the relation expressed by each cluster. We evaluate the proposed method in three different tasks: measuring relational similarity between entity pairs, open information extraction (Open IE), and classifying relations in a social network system. Experiments conducted using a benchmark dataset show that the proposed method improves existing relational similarity measures. Moreover, the proposed method significantly outperforms the current state-of-the-art Open IE systems in terms of both precision and recall. The proposed method correctly classifies 53 relation types in an online social network containing 470,671 nodes and 35,652,475 edges, thereby demonstrating its efficacy in real-world relation detection tasks.
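
The dual representation described in the abstract can be illustrated with a small sketch. This is not the authors' sequential co-clustering algorithm: it is a toy, assuming a handful of made-up (entity pair, lexical pattern) observations, a cosine similarity measure, and a single-pass threshold clustering applied separately to each view. The names `observations`, `build_views`, `cosine`, `sequential_cluster`, and the threshold value are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict
from math import sqrt

# Toy (entity pair, lexical pattern) observations; illustrative data only.
observations = [
    (("YouTube", "Google"), "X is acquired by Y"),
    (("YouTube", "Google"), "Y purchased X"),
    (("Powerset", "Microsoft"), "X is acquired by Y"),
    (("Powerset", "Microsoft"), "Y purchased X"),
    (("Tokyo", "Japan"), "X is the capital of Y"),
]


def build_views(obs):
    """Build the two dual views of the same co-occurrence data.

    by_pair:    entity pair -> {pattern: count}   (pairs described by patterns)
    by_pattern: pattern -> {entity pair: count}   (patterns described by pairs)
    """
    by_pair = defaultdict(lambda: defaultdict(int))
    by_pattern = defaultdict(lambda: defaultdict(int))
    for pair, pattern in obs:
        by_pair[pair][pattern] += 1
        by_pattern[pattern][pair] += 1
    return by_pair, by_pattern


def cosine(u, v):
    """Cosine similarity between two sparse count vectors stored as dicts."""
    dot = sum(x * v.get(k, 0) for k, x in u.items())
    norm = sqrt(sum(x * x for x in u.values())) * sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0


def sequential_cluster(vectors, threshold):
    """Single pass over the items: join the most similar existing cluster
    if its similarity reaches the threshold, otherwise open a new cluster.
    (Assumed simplification, not the procedure from the paper.)
    """
    clusters = []
    for key, vec in vectors.items():
        best, best_sim = None, threshold
        for cluster in clusters:
            sim = cosine(vec, cluster["centroid"])
            if sim >= best_sim:
                best, best_sim = cluster, sim
        if best is None:
            clusters.append({"members": [key], "centroid": dict(vec)})
        else:
            best["members"].append(key)
            # Keep the centroid as the running sum of member vectors.
            for k, x in vec.items():
                best["centroid"][k] = best["centroid"].get(k, 0) + x
    return [cluster["members"] for cluster in clusters]


by_pair, by_pattern = build_views(observations)
pair_clusters = sequential_cluster(by_pair, threshold=0.5)        # extensional view
pattern_clusters = sequential_cluster(by_pattern, threshold=0.5)  # intensional view
```

In this toy data the two acquisition pairs share the same patterns and the two acquisition patterns share the same pairs, so each view groups them together and leaves the capital-of example in its own cluster. A real system would need pattern extraction from text and a principled way to choose the threshold, neither of which is shown here.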

Original language: English
Title of host publication: Proceedings of the 19th International Conference on World Wide Web, WWW '10
Pages: 151-160
Number of pages: 10
DOIs: https://doi.org/10.1145/1772690.1772707
Publication status: Published - 2010
Externally published: Yes
Event: 19th International World Wide Web Conference, WWW2010 - Raleigh, NC
Duration: 2010 Apr 26 to 2010 Apr 30

Other

Other: 19th International World Wide Web Conference, WWW2010
City: Raleigh, NC
Period: 2010 Apr 26 to 2010 Apr 30

Keywords

  • relation extraction
  • relational duality
  • relational similarity
  • web mining

ASJC Scopus subject areas

  • Computer Networks and Communications
  • Computer Science Applications

Cite this

Bollegala, D. T., Matsuo, Y., & Ishizuka, M. (2010). Relational duality: Unsupervised extraction of semantic relations between entities on the web. In Proceedings of the 19th International Conference on World Wide Web, WWW '10 (pp. 151-160). https://doi.org/10.1145/1772690.1772707

@inproceedings{03c281d57aad4100abe72c42204eb474,
title = "Relational duality: Unsupervised extraction of semantic relations between entities on the web",
keywords = "relation extraction, relational duality, relational similarity, web mining",
author = "Bollegala, {Danushka Tarupathi} and Yutaka Matsuo and Mitsuru Ishizuka",
year = "2010",
doi = "10.1145/1772690.1772707",
language = "English",
isbn = "9781605587998",
pages = "151--160",
booktitle = "Proceedings of the 19th International Conference on World Wide Web, WWW '10",

}
