Which diversity evaluation measures are “good”?

Tetsuya Sakai, Zhaohao Zeng

Research output: Conference contribution

19 Citations (Scopus)

Abstract

This study evaluates 30 IR evaluation measures or their instances, of which nine are for adhoc IR and 21 are for diversified IR, primarily from the viewpoint of whether their preferences of one SERP (search engine result page) over another actually align with users' preferences. The gold preferences were constructed by hiring 15 assessors, who independently examined 1,127 SERP pairs and made preference assessments. Two sets of preference assessments were obtained: one based on a relevance question “Which SERP is more relevant?” and the other based on a diversity question “Which SERP is likely to satisfy a higher number of users?” To our knowledge, our study is the first to have collected diversity preference assessments in this way and evaluated diversity measures successfully. Our main results are that (a) popular adhoc IR measures such as nDCG actually align quite well with the gold relevance preferences; and that (b) while the D#-measures align well with the gold diversity preferences, intent-aware measures perform relatively poorly. Moreover, as by-products of our analysis of existing evaluation measures, we define new adhoc measures called iRBU (intentwise Rank-Biased Utility) and EBR (Expected Blended Ratio); we demonstrate that an instance of iRBU performs as well as nDCG when compared to the gold relevance preferences. On the other hand, the original RBU, a recently proposed diversity measure, underperforms the best D#-measures when compared to the gold diversity preferences.
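The comparison described in the abstract, checking whether a measure prefers the same SERP in a pair as the assessors did, can be illustrated with a minimal sketch. The abstract does not state the exact agreement statistic, how ties are treated, or how the 15 assessors' judgments were combined into gold preferences, so everything below is an assumption: the function name `agreement_rate`, the SERP identifiers, the scores, and the choice to count a tie as a disagreement are all hypothetical.

```python
from typing import Dict, List, Tuple


def agreement_rate(scores: Dict[str, float],
                   gold: List[Tuple[str, str]]) -> float:
    """Fraction of gold pairs on which the measure agrees with the assessors.

    scores: measure score for each SERP (hypothetical values).
    gold:   (preferred, other) pairs; the first SERP is the one the
            assessors preferred (assumed to be already aggregated).
    A tie in the measure scores is counted as a disagreement, which is
    an assumption of this sketch, not something the abstract specifies.
    """
    if not gold:
        raise ValueError("gold preference list is empty")
    agree = sum(1 for preferred, other in gold
                if scores[preferred] > scores[other])
    return agree / len(gold)


# Hypothetical scores from some evaluation measure for three SERPs.
scores = {"serp_a": 0.82, "serp_b": 0.64, "serp_c": 0.71}

# Hypothetical gold preferences: the first SERP in each pair was preferred.
gold = [("serp_a", "serp_b"), ("serp_c", "serp_a"), ("serp_c", "serp_b")]

rate = agreement_rate(scores, gold)
print(f"agreement rate: {rate:.3f}")
```

Under these assumptions, running the same function once with the relevance-question preferences and once with the diversity-question preferences would give one agreement figure per measure for each of the two gold sets.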

Original language: English
Title of host publication: SIGIR 2019 - Proceedings of the 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval
Publisher: Association for Computing Machinery, Inc
Pages: 595-604
Number of pages: 10
ISBN (Electronic): 9781450361729
DOI
Publication status: Published - 18 Jul 2019
Event: 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 2019 - Paris, France
Duration: 21 Jul 2019 to 25 Jul 2019

Publication series

Name: SIGIR 2019 - Proceedings of the 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval

Conference

Conference: 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 2019
Country/Territory: France
City: Paris
Period: 19/7/21 to 19/7/25

ASJC Scopus subject areas

  • Information Systems
  • Applied Mathematics
  • Software
