Relation extraction from Wikipedia using subtree mining

Dat P T Nguyen, Yutaka Matsuo, Mitsuru Ishizuka

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

67 Citations (Scopus)

Abstract

The exponential growth and reliability of Wikipedia have made it a promising data source for intelligent systems. The first challenge of Wikipedia is to make the encyclopedia machine-processable. In this study, we address the problem of extracting relations among entities from Wikipedia's English articles, which in turn can serve for intelligent systems to satisfy users' information needs. Our proposed method first anchors the appearance of entities in Wikipedia articles using some heuristic rules that are supported by their encyclopedic style. Therefore, it uses neither the Named Entity Recognizer (NER) nor the Coreference Resolution tool, which are sources of errors for relation extraction. It then classifies the relationships among entity pairs using SVM with features extracted from the web structure and subtrees mined from the syntactic structure of text. The innovations behind our work are the following: a) our method makes use of Wikipedia characteristics for entity allocation and entity classification, which are essential for relation extraction; b) our algorithm extracts a core tree, which accurately reflects a relationship between a given entity pair, and subsequently identifies key features with respect to the relationship from the core tree. We demonstrate the effectiveness of our approach through evaluation of manually annotated data from actual Wikipedia articles.
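The abstract does not define the core tree precisely. As a rough illustration only, the sketch below assumes it resembles the path that connects two entity mentions in a dependency parse, a structure commonly used in relation extraction. The toy sentence, its parse, and the names `PARENT`, `path_to_root`, and `connecting_path` are hypothetical and are not taken from the paper.

```python
from typing import Dict, List, Optional

# Toy dependency tree stored as a child -> head map; the root maps to None.
# The sentence "Bill Gates founded Microsoft in 1975" and this parse are
# made up for illustration, not taken from the paper.
PARENT: Dict[str, Optional[str]] = {
    "founded": None,
    "Gates": "founded",
    "Microsoft": "founded",
    "Bill": "Gates",
    "in": "founded",
    "1975": "in",
}


def path_to_root(node: str, parent: Dict[str, Optional[str]]) -> List[str]:
    """Return the nodes from `node` up to the root, inclusive."""
    path = [node]
    while parent[node] is not None:
        node = parent[node]
        path.append(node)
    return path


def connecting_path(a: str, b: str, parent: Dict[str, Optional[str]]) -> List[str]:
    """Return the tree path from `a` to `b` through their lowest common ancestor.

    Assumption: this path stands in for the paper's "core tree"; the actual
    definition in the paper may differ.
    """
    up_a = path_to_root(a, parent)
    up_b = path_to_root(b, parent)
    ancestors_b = set(up_b)
    # The lowest common ancestor is the first node on a's upward path
    # that also lies on b's upward path.
    lca = next(n for n in up_a if n in ancestors_b)
    left = up_a[: up_a.index(lca) + 1]   # a ... lca
    right = up_b[: up_b.index(lca)]      # b ... child of lca
    return left + right[::-1]


if __name__ == "__main__":
    print(connecting_path("Gates", "Microsoft", PARENT))
```

In a pipeline like the one the abstract describes, the nodes and edges of such a path could be turned into features for an SVM; how the paper actually mines subtrees from the core tree is not specified here.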

Original language: English
Title of host publication: Proceedings of the National Conference on Artificial Intelligence
Pages: 1414-1420
Number of pages: 7
Volume: 2
Publication status: Published - 2007
Externally published: Yes
Event: AAAI-07/IAAI-07 Proceedings: 22nd AAAI Conference on Artificial Intelligence and the 19th Innovative Applications of Artificial Intelligence Conference - Vancouver, BC
Duration: 2007 Jul 22 to 2007 Jul 26


Fingerprint

Intelligent systems
Syntactics
Anchors
Innovation

ASJC Scopus subject areas

  • Software

Cite this

Nguyen, D. P. T., Matsuo, Y., & Ishizuka, M. (2007). Relation extraction from Wikipedia using subtree mining. In Proceedings of the National Conference on Artificial Intelligence (Vol. 2, pp. 1414-1420)

@inproceedings{90879878bfc94230b9950693d705924b,
title = "Relation extraction from Wikipedia using subtree mining",
author = "Nguyen, {Dat P T} and Yutaka Matsuo and Mitsuru Ishizuka",
year = "2007",
language = "English",
isbn = "1577353234",
volume = "2",
pages = "1414--1420",
booktitle = "Proceedings of the National Conference on Artificial Intelligence",

}

TY - GEN

T1 - Relation extraction from Wikipedia using subtree mining

AU - Nguyen, Dat P T

AU - Matsuo, Yutaka

AU - Ishizuka, Mitsuru

PY - 2007

Y1 - 2007

UR - http://www.scopus.com/inward/record.url?scp=36348929337&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=36348929337&partnerID=8YFLogxK

M3 - Conference contribution

AN - SCOPUS:36348929337

SN - 1577353234

SN - 9781577353232

VL - 2

SP - 1414

EP - 1420

BT - Proceedings of the National Conference on Artificial Intelligence

ER -