Low-cost, bottom-up measures for evaluating search result diversification

Zhicheng Dou, Xue Yang, Diya Li, Ji Rong Wen, Tetsuya Sakai

Research output: Contribution to journal › Article

Abstract

Search result diversification aims to cover different user intents by returning a diversified document list. Most existing diversity measures require a predefined set of intents for a given query and assume that there is no relationship among these intents. However, studies have shown that modeling a hierarchy of intents has some benefits over the standard approach of using a flat list of intents. Intuitively, having more layers in the intent hierarchy seems to imply that we can consider more intricate relationships between intents and thereby identify subtle differences between documents that cover different intents. On the other hand, manually building a rich intent hierarchy imposes extra cost and is probably not very practical. In light of these considerations, we first propose a measure that builds a hierarchy of intents from a given set of flat intents by clustering per-intent relevant documents and thereby identifying subintents. Second, we consider a variant of the first measure that clusters per-topic relevant documents rather than per-intent ones, which is intent-free. Third, we propose a simple, completely intent-free approach to search result diversity evaluation that leverages document similarities. Our experiments based on the TREC Web Track 2009–2013 test collections show that the proposed measures have advantages over existing diversity measures despite their low annotation costs.
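The abstract describes the first measure only at a high level. As a rough illustration of the bottom-up idea, the Python sketch below turns each flat intent into subintents by clustering that intent's relevant documents, then scores a ranked list by how many subintents it reaches. Everything concrete here is an assumption for illustration, not the paper's definition: documents are sparse term-weight dicts, clustering is average-linkage agglomerative clustering on cosine similarity with a hypothetical stopping `threshold`, and `subintent_recall` is a hypothetical coverage score. The function names and toy data are likewise invented.

```python
from math import sqrt

def cosine(u, v):
    """Cosine similarity between two sparse term-weight dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = sqrt(sum(w * w for w in u.values()))
    nv = sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def average_linkage_clusters(doc_ids, vectors, threshold):
    """Agglomerative clustering: repeatedly merge the pair of clusters with the
    highest average pairwise cosine similarity until that similarity falls
    below `threshold` (a hypothetical stopping rule)."""
    clusters = [[d] for d in doc_ids]
    while len(clusters) > 1:
        best, pair = -1.0, None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                sims = [cosine(vectors[a], vectors[b])
                        for a in clusters[i] for b in clusters[j]]
                avg = sum(sims) / len(sims)
                if avg > best:
                    best, pair = avg, (i, j)
        if best < threshold:
            break
        i, j = pair
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # j > i, so index i is unaffected
    return clusters

def build_intent_hierarchy(qrels, vectors, threshold=0.6):
    """Two-layer hierarchy: intent -> subintents, where each subintent is one
    cluster of the documents judged relevant to that intent."""
    return {intent: average_linkage_clusters(sorted(docs), vectors, threshold)
            for intent, docs in qrels.items()}

def subintent_recall(ranking, hierarchy, k):
    """Hypothetical score: fraction of subintents with at least one document
    in the top-k results."""
    top = set(ranking[:k])
    subs = [set(c) for clusters in hierarchy.values() for c in clusters]
    return sum(1 for s in subs if s & top) / len(subs) if subs else 0.0

# Toy data (invented for illustration): query "jaguar" with two flat intents.
vectors = {
    "d1": {"jaguar": 1.0, "car": 1.0},
    "d2": {"jaguar": 1.0, "car": 1.0, "price": 1.0},
    "d3": {"jaguar": 1.0, "engine": 1.0},
    "d4": {"jaguar": 1.0, "cat": 1.0, "habitat": 1.0},
    "d5": {"jaguar": 1.0, "cat": 1.0, "diet": 1.0},
}
qrels = {"car": {"d1", "d2", "d3"}, "animal": {"d4", "d5"}}

hierarchy = build_intent_hierarchy(qrels, vectors)
print(hierarchy)  # {'car': [['d1', 'd2'], ['d3']], 'animal': [['d4', 'd5']]}
print(round(subintent_recall(["d1", "d2", "d4"], hierarchy, 3), 3))  # 0.667
print(round(subintent_recall(["d1", "d3", "d4"], hierarchy, 3), 3))  # 1.0
```

In the toy data, the rankings d1, d2, d4 and d1, d3, d4 both cover both top-level intents, but only the second reaches all three subintents. That is the kind of subtle difference between documents that a hierarchy of intents is meant to expose.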

Original language: English
Journal: Information Retrieval Journal
Publication status: Published - 2019 Jan 1


Keywords

  • Evaluation measure
  • Hierarchical clustering
  • Search result diversification

ASJC Scopus subject areas

  • Information Systems
  • Library and Information Sciences
