Designing test collections for comparing many systems

Tetsuya Sakai*

*Corresponding author for this work

Research output: Conference contribution

12 Citations (Scopus)

Abstract

A researcher decides to build a test collection for comparing her new information retrieval (IR) systems with several state-of-the-art baselines. She wants to know in advance the number of topics (n) she needs to create, so that she can start looking for (say) a query log large enough for sampling n good topics, and estimate the relevance assessment cost. We provide practical solutions to researchers like her using power analysis and sample size design techniques, and demonstrate their usefulness for several IR tasks and evaluation measures. We consider not only the paired t-test but also one-way analysis of variance (ANOVA) for significance testing to accommodate comparison of m (≥ 2) systems under a given set of statistical requirements (α: the Type I error rate, β: the Type II error rate, and minD: the minimum detectable difference between the best and the worst systems). Using our simple Excel tools and some pooled variance estimates from past data, researchers can build statistically well-designed test collections. We demonstrate that, as different evaluation measures have different variances across topics, they inevitably require different topic set sizes. This suggests that the evaluation measures should be chosen at the test collection design phase. Moreover, through a pool depth reduction experiment with past data, we show how the relevance assessment cost can be reduced dramatically while freezing the set of statistical requirements. Based on the cost analysis and the available budget, researchers can determine the right balance between n and the pool depth pd. Our techniques and tools are applicable to test collections for non-IR tasks as well.
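To make the role of α, β, minD, and the variance estimate concrete, here is a minimal sketch of a topic set size calculation for the paired t-test case. It uses a textbook normal approximation and is not necessarily the procedure implemented in the Excel tools mentioned in the abstract. The function name `topics_for_paired_t_test`, the parameter `var_diff` (an assumed variance of the per-topic score difference between two systems), and the numbers in the usage lines are all hypothetical and chosen only for illustration.

```python
import math
from statistics import NormalDist


def topics_for_paired_t_test(alpha: float, beta: float,
                             min_d: float, var_diff: float) -> int:
    """Approximate topic set size n for a two-sided paired t-test.

    alpha    : Type I error rate
    beta     : Type II error rate (power = 1 - beta)
    min_d    : minimum detectable difference between two systems
    var_diff : ASSUMED variance of the per-topic score differences
               (in practice estimated from past data)

    Uses the normal approximation
        n ~ ((z_{alpha/2} + z_beta) / effect)^2 + z_{alpha/2}^2 / 2
    where effect = min_d / sqrt(var_diff).
    """
    if not (0.0 < alpha < 1.0 and 0.0 < beta < 1.0):
        raise ValueError("alpha and beta must lie strictly between 0 and 1")
    if min_d <= 0.0 or var_diff <= 0.0:
        raise ValueError("min_d and var_diff must be positive")

    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1.0 - alpha / 2.0)  # two-sided critical value
    z_beta = std_normal.inv_cdf(1.0 - beta)          # quantile for the target power
    effect = min_d / math.sqrt(var_diff)             # standardized effect size

    n = ((z_alpha + z_beta) / effect) ** 2 + z_alpha ** 2 / 2.0
    return math.ceil(n)


if __name__ == "__main__":
    # Illustrative inputs only; these variances are not taken from the paper.
    for var in (0.02, 0.04, 0.08):
        n = topics_for_paired_t_test(alpha=0.05, beta=0.20, min_d=0.10, var_diff=var)
        print(f"var_diff={var:.2f} -> n={n}")
```

Under this approximation, a measure with a larger variance across topics needs more topics for the same α, β, and minD, which is the effect the abstract describes. An exact design would check the result against the noncentral t distribution, and the ANOVA case for m > 2 systems needs a different calculation that this sketch does not cover.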

Original language: English
Title of host publication: CIKM 2014 - Proceedings of the 2014 ACM International Conference on Information and Knowledge Management
Publisher: Association for Computing Machinery, Inc
Pages: 61-70
Number of pages: 10
ISBN (Electronic): 9781450325981
DOI
Publication status: Published - Nov 3, 2014
Event: 23rd ACM International Conference on Information and Knowledge Management, CIKM 2014 - Shanghai, China
Duration: Nov 3, 2014 to Nov 7, 2014

Publication series

Name: CIKM 2014 - Proceedings of the 2014 ACM International Conference on Information and Knowledge Management

Other

Other: 23rd ACM International Conference on Information and Knowledge Management, CIKM 2014
Country/Territory: China
City: Shanghai
Period: 14/11/3 to 14/11/7

ASJC Scopus subject areas

  • Information Systems and Management
  • Computer Science Applications
  • Information Systems
