Comparing metrics across TREC and NTCIR: The robustness to system bias

Tetsuya Sakai*

*Corresponding author for this work

Research output: Conference contribution

20 Citations (Scopus)

Abstract

Test collections are growing larger, and relevance data constructed through pooling are suspected of becoming more and more incomplete and biased. Several studies have used evaluation metrics specifically designed to handle this problem, but most of them have only examined the metrics under incomplete but unbiased conditions, using random samples of the original relevance data. This paper examines nine metrics in a more realistic setting, by reducing the number of pooled systems. Even though previous work has shown that metrics based on a condensed list, obtained by removing all unjudged documents from the original ranked list, are effective for handling very incomplete but unbiased relevance data, we show that these results do not hold in the presence of system bias. In our experiments using TREC and NTCIR data, we first show that condensed-list metrics overestimate new systems while traditional metrics underestimate them, and that the overestimation tends to be larger than the underestimation. We then show that, when relevance data is heavily biased towards a single team or a few teams, the condensed-list versions of Average Precision (AP), Q-measure (Q) and normalised Discounted Cumulative Gain (nDCG), which we call AP', Q' and nDCG', are not necessarily superior to the original metrics in terms of discriminative power, i.e., the overall ability to detect pairwise statistical significance. Nevertheless, even under system bias, AP' and Q' are generally more discriminative than bpref and the condensed-list version of Rank-Biased Precision (RBP), which we call RBP'.
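
As an illustration of the condensed-list idea described in the abstract, the following minimal Python sketch computes binary-relevance AP and its condensed-list counterpart AP' for a single topic. It is not the paper's code: the function names, the `qrels` layout (judged document ID mapped to a relevance grade, with unjudged documents simply absent) and the toy ranking are all assumptions made for this example.

```python
from typing import Dict, List


def average_precision(ranking: List[str], qrels: Dict[str, int]) -> float:
    """Binary-relevance AP; documents missing from qrels count as non-relevant."""
    num_relevant = sum(1 for grade in qrels.values() if grade > 0)
    if num_relevant == 0:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, doc_id in enumerate(ranking, start=1):
        if qrels.get(doc_id, 0) > 0:
            hits += 1
            precision_sum += hits / rank  # precision at this relevant rank
    return precision_sum / num_relevant


def condense(ranking: List[str], qrels: Dict[str, int]) -> List[str]:
    """Drop every unjudged document, keeping the original order of the rest."""
    return [doc_id for doc_id in ranking if doc_id in qrels]


def condensed_average_precision(ranking: List[str], qrels: Dict[str, int]) -> float:
    """AP': AP computed on the condensed list."""
    return average_precision(condense(ranking, qrels), qrels)


# Hypothetical toy data: u1 and u2 were never pooled, so they are unjudged.
qrels = {"d1": 1, "d2": 0, "d3": 1}
ranking = ["d1", "u1", "d2", "u2", "d3"]

ap = average_precision(ranking, qrels)
ap_prime = condensed_average_precision(ranking, qrels)
print(f"AP  = {ap:.4f}")
print(f"AP' = {ap_prime:.4f}")
```

In this toy case the two unjudged documents push the relevant document `d3` down to rank 5 under AP, while AP' sees it at rank 3, so the condensed-list score is the higher of the two (about 0.83 against 0.70). This only illustrates the direction of the effect; it says nothing about its size on real TREC or NTCIR data.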

Original language: English
Title of host publication: Proceedings of the 17th ACM Conference on Information and Knowledge Management, CIKM'08
Pages: 581-590
Number of pages: 10
Publication status: Published - 1 Dec 2008
Externally published: Yes
Event: 17th ACM Conference on Information and Knowledge Management, CIKM'08 - Napa Valley, CA, United States
Duration: 26 Oct 2008 to 30 Oct 2008

Publication series

Name: International Conference on Information and Knowledge Management, Proceedings

Conference

Conference: 17th ACM Conference on Information and Knowledge Management, CIKM'08
Country/Territory: United States
City: Napa Valley, CA
Period: 26 Oct 2008 to 30 Oct 2008

ASJC Scopus subject areas

  • General Decision Sciences
  • General Business, Management and Accounting
