Cross-language latent relational search between Japanese and English languages using a web corpus

Nguyen Tuan Duc, Danushka Bollegala, Mitsuru Ishizuka

Research output: Article (peer-reviewed)

5 citations (Scopus)

Abstract

Latent relational search is a novel entity retrieval paradigm based on the proportional analogy between two entity pairs. Given a latent relational search query {(Japan, Tokyo), (France, ?)}, a latent relational search engine is expected to retrieve and rank the entity "Paris" as the first answer in the result list. A latent relational search engine extracts entities and the relations between those entities from a corpus, such as the Web. Moreover, from supporting sentences in the corpus (e.g., "Tokyo is the capital of Japan" and "Paris is the capital and biggest city of France"), the search engine must recognize the relational similarity between the two entity pairs. In cross-language latent relational search, the first entity pair and its supporting sentences are in a different language from the second entity pair and its supporting sentences. Therefore, the search engine must recognize similar semantic relations across languages. In this article, we study the problem of cross-language latent relational search between Japanese and English using Web data. To perform cross-language latent relational search at high speed, we propose a multi-lingual indexing method for storing entities and lexical patterns that represent the semantic relations extracted from Web corpora. We then propose a hybrid lexical pattern clustering algorithm to capture the semantic similarity between lexical patterns across languages. Using this algorithm, we can precisely measure the relational similarity between entity pairs across languages, thereby achieving high precision in the task of cross-language latent relational search. Experiments show that the proposed method achieves a mean reciprocal rank (MRR) of 0.605 on Japanese-English cross-language latent relational search query sets and that it also achieves reasonable performance on the INEX Entity Ranking task.
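
As a rough illustration of the query format and the MRR metric mentioned in the abstract, the following minimal sketch ranks candidate answers for {(Japan, Tokyo), (France, ?)} by the cosine similarity of lexical-pattern counts. It is not the authors' method: the `PATTERNS` table, its pattern strings and counts, and the function names are all hypothetical, and the patterns are assumed to be already mapped into a single shared vocabulary, so the multi-lingual indexing and hybrid pattern clustering described in the article are not reproduced here.

```python
from collections import Counter
from math import sqrt

# Hypothetical toy index: entity pair -> counts of lexical patterns observed
# between the two entities. All values are invented for illustration, and the
# patterns are assumed to be language-independent already.
PATTERNS = {
    ("Japan", "Tokyo"): Counter({"X capital Y": 3, "Y is in X": 1}),
    ("France", "Paris"): Counter({"X capital Y": 2, "Y is in X": 2}),
    ("France", "Lyon"): Counter({"Y is in X": 3}),
    ("France", "Euro"): Counter({"X currency Y": 4}),
}


def cosine(a, b):
    """Cosine similarity between two pattern-count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def search(source_pair, target_entity):
    """Rank candidates D for the query {source_pair, (target_entity, ?)}."""
    query = PATTERNS[source_pair]
    scored = [
        (candidate, cosine(query, patterns))
        for (entity, candidate), patterns in PATTERNS.items()
        if entity == target_entity
    ]
    # Highest relational similarity first.
    scored.sort(key=lambda item: item[1], reverse=True)
    return [candidate for candidate, _ in scored]


def mean_reciprocal_rank(rankings, answers):
    """Average of 1/rank of the correct answer; a missing answer scores 0."""
    if not answers:
        return 0.0
    total = 0.0
    for ranking, answer in zip(rankings, answers):
        if answer in ranking:
            total += 1.0 / (ranking.index(answer) + 1)
    return total / len(answers)


if __name__ == "__main__":
    ranking = search(("Japan", "Tokyo"), "France")
    print(ranking)
    print(mean_reciprocal_rank([ranking], ["Paris"]))
```

In this toy setting, an MRR of 1.0 means the correct entity is ranked first for every query, so the reported 0.605 indicates that the correct answer tends to appear near the top of the result list, though not always in first place.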

Original language: English
Article number: 11
Journal: ACM Transactions on Asian Language Information Processing
Volume: 11
Issue number: 3
Publication status: Published - Sep 2012
Externally published: Yes

ASJC Scopus subject areas

  • Computer Science (all)
