Abstract
Entity type matching has many real-world applications, especially in entity clustering, de-duplication and efficient query processing. Current methods for extracting entities from text usually disregard regularities in the order in which entities appear. In this paper, we focus on enumerative descriptions, which list entity names in a certain hidden order and often occur in web documents as listings and tables. We propose an algorithm that discovers entity types from enumerative descriptions in a setting where the type hierarchy is known, the enumerating orders are hidden and heterogeneous, and partial entity-type mappings are given as seed instances. Our algorithm is iterative: we extract skeletons from syntactic patterns, then train a hidden Markov model on the seed instances and skeletons to find an optimal enumerating order, and use that order to find the best-fit entity-type assignment.
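To make the hidden-order idea concrete, the following is a minimal sketch, not the authors' algorithm: it treats entity types as HMM hidden states and entity names as observations, estimates the model by simple counting over seed lists (with add-alpha smoothing, an assumption), and decodes an unlabeled enumeration with the standard Viterbi algorithm. Skeleton extraction and the paper's iterative training are not reproduced. The functions `train_hmm` and `viterbi`, the parameter `alpha`, and the toy seed data are all hypothetical.

```python
import math
from collections import defaultdict

def train_hmm(labeled_lists, types, alpha=1.0):
    """Estimate log-probabilities from seed lists of (entity, type) pairs.
    Supervised counting with add-alpha smoothing is an assumption here."""
    start = defaultdict(float)
    trans = defaultdict(lambda: defaultdict(float))
    emit = defaultdict(lambda: defaultdict(float))
    vocab = set()
    for seq in labeled_lists:
        prev = None
        for entity, t in seq:
            vocab.add(entity)
            emit[t][entity] += 1
            if prev is None:
                start[t] += 1  # type of the first item in an enumeration
            else:
                trans[prev][t] += 1  # type-to-type step in the hidden order
            prev = t
    n = len(types)
    v = len(vocab) + 1  # +1 reserves probability mass for unseen entities
    total_start = sum(start.values())
    log_start = {t: math.log((start[t] + alpha) / (total_start + alpha * n)) for t in types}
    log_trans = {}
    for a in types:
        total = sum(trans[a].values())
        log_trans[a] = {b: math.log((trans[a][b] + alpha) / (total + alpha * n)) for b in types}

    def log_emit(t, entity):
        total = sum(emit[t].values())
        return math.log((emit[t].get(entity, 0.0) + alpha) / (total + alpha * v))

    return log_start, log_trans, log_emit

def viterbi(entities, types, log_start, log_trans, log_emit):
    """Return the most probable type sequence for one enumerative list."""
    score = [{t: log_start[t] + log_emit(t, entities[0]) for t in types}]
    back = [{}]
    for i in range(1, len(entities)):
        score.append({})
        back.append({})
        for t in types:
            # Best previous type leading into type t at position i.
            best = max(types, key=lambda p: score[i - 1][p] + log_trans[p][t])
            score[i][t] = score[i - 1][best] + log_trans[best][t] + log_emit(t, entities[i])
            back[i][t] = best
    path = [max(types, key=lambda t: score[-1][t])]
    for i in range(len(entities) - 1, 0, -1):
        path.append(back[i][path[-1]])  # follow back-pointers to the start
    return list(reversed(path))

# Hypothetical toy seeds: each list enumerates a city followed by its country.
types = ["City", "Country"]
seeds = [
    [("Seoul", "City"), ("Korea", "Country")],
    [("Busan", "City"), ("Korea", "Country")],
    [("Tokyo", "City"), ("Japan", "Country")],
]
log_start, log_trans, log_emit = train_hmm(seeds, types)
print(viterbi(["Osaka", "Japan"], types, log_start, log_trans, log_emit))
# → ['City', 'Country']
```

Although "Osaka" never appears in the seeds, the learned City → Country ordering lets the decoder assign it a type from its position in the list; this is the kind of order regularity the abstract says conventional extractors ignore.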
Field | Value |
---|---|
Original language | English |
Title of host publication | WWW 2014 Companion - Proceedings of the 23rd International Conference on World Wide Web |
Publisher | Association for Computing Machinery, Inc |
Pages | 1285-1290 |
Number of pages | 6 |
ISBN (Electronic) | 9781450327459 |
Publication status | Published - 2014 Apr 7 |
Event | 23rd International Conference on World Wide Web, WWW 2014, Seoul, Korea, Republic of; Duration: 2014 Apr 7 → 2014 Apr 11 |
Other
Field | Value |
---|---|
Event | 23rd International Conference on World Wide Web, WWW 2014 |
Country | Korea, Republic of |
City | Seoul |
Period | 2014 Apr 7 → 2014 Apr 11 |
Keywords
- Hidden Markov model
- Information extraction
- RDF graph
ASJC Scopus subject areas
- Computer Networks and Communications
- Software