Cross-language latent relational search between Japanese and English languages using a web corpus

Nguyen Tuan Duc, Danushka Bollegala, Mitsuru Ishizuka

Research output: Contribution to journal › Article

5 Citations (Scopus)

Abstract

Latent relational search is a novel entity retrieval paradigm based on the proportional analogy between two entity pairs. Given the latent relational search query {(Japan, Tokyo), (France, ?)}, a latent relational search engine is expected to retrieve the entity "Paris" and rank it first in the result list. A latent relational search engine extracts entities, and the relations between those entities, from a corpus such as the Web. Moreover, from supporting sentences in the corpus (e.g., "Tokyo is the capital of Japan" and "Paris is the capital and biggest city of France"), the search engine must recognize the relational similarity between the two entity pairs. In cross-language latent relational search, the first entity pair and its supporting sentences are in one language, while the second entity pair and its supporting sentences are in another. The search engine must therefore recognize similar semantic relations across languages. In this article, we study the problem of cross-language latent relational search between Japanese and English using Web data. To perform cross-language latent relational search at high speed, we propose a multilingual indexing method for storing entities and the lexical patterns, extracted from Web corpora, that represent semantic relations. We then propose a hybrid lexical pattern clustering algorithm to capture the semantic similarity between lexical patterns across languages. Using this algorithm, we can precisely measure the relational similarity between entity pairs across languages, thereby achieving high precision in cross-language latent relational search. Experiments show that the proposed method achieves a mean reciprocal rank (MRR) of 0.605 on Japanese-English cross-language latent relational search query sets, and that it also achieves reasonable performance on the INEX Entity Ranking task.

Original language: English
Article number: 11
Journal: ACM Transactions on Asian Language Information Processing
Volume: 11
Issue number: 3
Publication status: Published - 2012 Sep
Externally published: Yes

Keywords

  • Analogical search
  • Cross-language relational search
  • Latent relational analysis
  • Latent relational search

ASJC Scopus subject areas

  • Computer Science (all)

