Iterative algorithm for inferring entity types from enumerative descriptions

Qian Chen, Mizuho Iwaihara

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Entity type matching has many real-world applications, especially in entity clustering, de-duplication and efficient query processing. Current methods for extracting entities from text usually disregard regularities in the order in which entities appear in the text. In this paper, we focus on enumerative descriptions, which list entity names in a certain hidden order and often occur in web documents as listings and tables. We propose an algorithm to discover entity types from enumerative descriptions, where a type hierarchy is known but enumerating orders are hidden and heterogeneous, and partial entity-type mappings are given as seed instances. Our algorithm is iterative: we extract skeletons from syntactic patterns, then train a hidden Markov model on the seed instances and skeletons to find an optimal enumerating order, and use that order to find a best-fit entity-type assignment.
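The abstract does not detail how the hidden Markov model recovers an enumerating order. As a rough illustration only, the sketch below assumes that entity types are the hidden states of a first-order HMM, that each enumerated entity name is an observation, and that the most likely type sequence is found by standard Viterbi decoding. The flat type set (no hierarchy), the function name `viterbi`, the fallback probability `unknown_p`, and all probabilities and entity names in the example are hypothetical and are not taken from the paper; skeleton extraction and the iterative retraining from seed instances are not shown.

```python
import math
from typing import Dict, List, Tuple

def viterbi(
    observations: List[str],
    states: List[str],
    start_p: Dict[str, float],
    trans_p: Dict[str, Dict[str, float]],
    emit_p: Dict[str, Dict[str, float]],
    unknown_p: float = 1e-6,  # hypothetical emission probability for unseen names
) -> Tuple[List[str], float]:
    """Return the most probable hidden type sequence and its log-probability."""
    if not observations:
        return [], 0.0

    def log(p: float) -> float:
        return math.log(p) if p > 0 else float("-inf")

    # best[t][s]: best log-probability of any type path ending in type s at position t
    best: List[Dict[str, float]] = [{}]
    back: List[Dict[str, str]] = [{}]
    for s in states:
        best[0][s] = log(start_p.get(s, 0.0)) + log(emit_p[s].get(observations[0], unknown_p))
        back[0][s] = ""
    for t in range(1, len(observations)):
        best.append({})
        back.append({})
        for s in states:
            # Pick the predecessor type that maximizes path score times transition
            prev, score = max(
                ((r, best[t - 1][r] + log(trans_p[r].get(s, 0.0))) for r in states),
                key=lambda item: item[1],
            )
            best[t][s] = score + log(emit_p[s].get(observations[t], unknown_p))
            back[t][s] = prev
    # Trace back from the best final type
    last = max(states, key=lambda s: best[-1][s])
    path = [last]
    for t in range(len(observations) - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    return path, best[-1][last]

# Hypothetical seed-derived parameters: a listing that alternates country, city, ...
STATES = ["Country", "City"]
START_P = {"Country": 0.6, "City": 0.4}
TRANS_P = {"Country": {"Country": 0.2, "City": 0.8}, "City": {"Country": 0.7, "City": 0.3}}
EMIT_P = {
    "Country": {"Japan": 0.4, "Korea": 0.4, "Seoul": 0.05},
    "City": {"Tokyo": 0.4, "Seoul": 0.4, "Osaka": 0.1},
}

path, logp = viterbi(["Japan", "Tokyo", "Korea", "Busan"], STATES, START_P, TRANS_P, EMIT_P)
print(path)  # ['Country', 'City', 'Country', 'City']
```

In this toy run, "Busan" never appears in the emission tables, yet it is typed as City because the learned ordering (a country is usually followed by a city) dominates. That is the kind of order regularity the abstract says existing extraction methods tend to ignore.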

Original language: English
Title of host publication: WWW 2014 Companion - Proceedings of the 23rd International Conference on World Wide Web
Publisher: Association for Computing Machinery, Inc
Pages: 1285-1290
Number of pages: 6
ISBN (Electronic): 9781450327459
DOIs: https://doi.org/10.1145/2567948.2579706
Publication status: Published - 2014 Apr 7
Event: 23rd International Conference on World Wide Web, WWW 2014 - Seoul, Korea, Republic of
Duration: 2014 Apr 7 to 2014 Apr 11

Other

Other: 23rd International Conference on World Wide Web, WWW 2014
Country: Korea, Republic of
City: Seoul
Period: 14/4/7 to 14/4/11

Fingerprint

Seed
Query processing
Syntactics
Hidden Markov models

Keywords

  • Hidden Markov model
  • Information extraction
  • RDF graph

ASJC Scopus subject areas

  • Computer Networks and Communications
  • Software

Cite this

Chen, Q., & Iwaihara, M. (2014). Iterative algorithm for inferring entity types from enumerative descriptions. In WWW 2014 Companion - Proceedings of the 23rd International Conference on World Wide Web (pp. 1285-1290). Association for Computing Machinery, Inc. https://doi.org/10.1145/2567948.2579706

Iterative algorithm for inferring entity types from enumerative descriptions. / Chen, Qian; Iwaihara, Mizuho.

WWW 2014 Companion - Proceedings of the 23rd International Conference on World Wide Web. Association for Computing Machinery, Inc, 2014. p. 1285-1290.


Chen, Q & Iwaihara, M 2014, Iterative algorithm for inferring entity types from enumerative descriptions. in WWW 2014 Companion - Proceedings of the 23rd International Conference on World Wide Web. Association for Computing Machinery, Inc, pp. 1285-1290, 23rd International Conference on World Wide Web, WWW 2014, Seoul, Korea, Republic of, 14/4/7. https://doi.org/10.1145/2567948.2579706

Chen Q, Iwaihara M. Iterative algorithm for inferring entity types from enumerative descriptions. In WWW 2014 Companion - Proceedings of the 23rd International Conference on World Wide Web. Association for Computing Machinery, Inc. 2014. p. 1285-1290 https://doi.org/10.1145/2567948.2579706

Chen, Qian ; Iwaihara, Mizuho. / Iterative algorithm for inferring entity types from enumerative descriptions. WWW 2014 Companion - Proceedings of the 23rd International Conference on World Wide Web. Association for Computing Machinery, Inc, 2014. pp. 1285-1290

@inproceedings{563b370fe7384daa83d780e22936cec0,
title = "Iterative algorithm for inferring entity types from enumerative descriptions",
abstract = "Entity type matching has many real-world applications, especially in entity clustering, de-duplication and efficient query processing. Current methods for extracting entities from text usually disregard regularities in the order in which entities appear in the text. In this paper, we focus on enumerative descriptions, which list entity names in a certain hidden order and often occur in web documents as listings and tables. We propose an algorithm to discover entity types from enumerative descriptions, where a type hierarchy is known but enumerating orders are hidden and heterogeneous, and partial entity-type mappings are given as seed instances. Our algorithm is iterative: we extract skeletons from syntactic patterns, then train a hidden Markov model on the seed instances and skeletons to find an optimal enumerating order, and use that order to find a best-fit entity-type assignment.",
keywords = "Hidden Markov model, Information extraction, RDF graph",
author = "Qian Chen and Mizuho Iwaihara",
year = "2014",
month = "4",
day = "7",
doi = "10.1145/2567948.2579706",
language = "English",
pages = "1285--1290",
booktitle = "WWW 2014 Companion - Proceedings of the 23rd International Conference on World Wide Web",
publisher = "Association for Computing Machinery, Inc",

}

TY - GEN

T1 - Iterative algorithm for inferring entity types from enumerative descriptions

AU - Chen, Qian

AU - Iwaihara, Mizuho

PY - 2014/4/7

Y1 - 2014/4/7

N2 - Entity type matching has many real-world applications, especially in entity clustering, de-duplication and efficient query processing. Current methods for extracting entities from text usually disregard regularities in the order in which entities appear in the text. In this paper, we focus on enumerative descriptions, which list entity names in a certain hidden order and often occur in web documents as listings and tables. We propose an algorithm to discover entity types from enumerative descriptions, where a type hierarchy is known but enumerating orders are hidden and heterogeneous, and partial entity-type mappings are given as seed instances. Our algorithm is iterative: we extract skeletons from syntactic patterns, then train a hidden Markov model on the seed instances and skeletons to find an optimal enumerating order, and use that order to find a best-fit entity-type assignment.

AB - Entity type matching has many real-world applications, especially in entity clustering, de-duplication and efficient query processing. Current methods for extracting entities from text usually disregard regularities in the order in which entities appear in the text. In this paper, we focus on enumerative descriptions, which list entity names in a certain hidden order and often occur in web documents as listings and tables. We propose an algorithm to discover entity types from enumerative descriptions, where a type hierarchy is known but enumerating orders are hidden and heterogeneous, and partial entity-type mappings are given as seed instances. Our algorithm is iterative: we extract skeletons from syntactic patterns, then train a hidden Markov model on the seed instances and skeletons to find an optimal enumerating order, and use that order to find a best-fit entity-type assignment.

KW - Hidden Markov model

KW - Information extraction

KW - RDF graph

UR - http://www.scopus.com/inward/record.url?scp=84990911500&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84990911500&partnerID=8YFLogxK

U2 - 10.1145/2567948.2579706

DO - 10.1145/2567948.2579706

M3 - Conference contribution

SP - 1285

EP - 1290

BT - WWW 2014 Companion - Proceedings of the 23rd International Conference on World Wide Web

PB - Association for Computing Machinery, Inc

ER -