Evaluating evaluation metrics on the bootstrap

Tetsuya Sakai*

*Corresponding author for this work

Research output: Conference contribution

150 citations (Scopus)

Abstract

This paper describes how the Bootstrap approach to statistics can be applied to the evaluation of IR effectiveness metrics. First, we argue that Bootstrap Hypothesis Tests deserve more attention from the IR community, as they are based on fewer assumptions than traditional statistical significance tests. We then describe straightforward methods for comparing the sensitivity of IR metrics based on Bootstrap Hypothesis Tests. Unlike the heuristics-based "swap" method proposed by Voorhees and Buckley, our method estimates the performance difference required to achieve a given significance level directly from Bootstrap Hypothesis Test results. In addition, we describe a simple way of examining the accuracy of rank correlation between two metrics based on the Bootstrap Estimate of Standard Error. We demonstrate the usefulness of our methods using test collections and runs from the NTCIR CLIR track for comparing seven IR metrics, including those that can handle graded relevance and those based on the Geometric Mean.
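The paired Bootstrap Hypothesis Test mentioned in the abstract can be illustrated with a minimal sketch. It follows one common textbook formulation (shift the per-topic differences so the null hypothesis holds, resample with replacement, compare studentized means) and is not necessarily the exact procedure used in the paper. The function names, the default of 1000 resamples, and the per-topic scores are illustrative assumptions, not values from the paper.

```python
import math
import random
import statistics


def studentized_mean(values):
    """Return mean / (sample std / sqrt(n)), the usual paired t-statistic."""
    n = len(values)
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)
    if sd == 0.0:
        # Degenerate resample (all values identical): treat a nonzero mean
        # as maximally extreme, which errs on the conservative side.
        return 0.0 if mean == 0.0 else math.copysign(math.inf, mean)
    return mean / (sd / math.sqrt(n))


def paired_bootstrap_asl(scores_x, scores_y, num_resamples=1000, seed=0):
    """Estimate the achieved significance level (ASL) for two paired runs.

    scores_x and scores_y hold one metric value per topic, in the same order.
    """
    if len(scores_x) != len(scores_y):
        raise ValueError("both runs must be scored on the same topics")
    diffs = [x - y for x, y in zip(scores_x, scores_y)]
    if statistics.stdev(diffs) == 0.0:
        raise ValueError("per-topic differences have zero variance")

    rng = random.Random(seed)
    n = len(diffs)
    observed = abs(studentized_mean(diffs))

    # Shift the differences so that their mean is zero (the null hypothesis).
    mean_diff = statistics.fmean(diffs)
    shifted = [d - mean_diff for d in diffs]

    hits = 0
    for _ in range(num_resamples):
        # Draw n topics with replacement from the shifted differences.
        resample = [shifted[rng.randrange(n)] for _ in range(n)]
        if abs(studentized_mean(resample)) >= observed:
            hits += 1
    return hits / num_resamples


# Made-up per-topic scores for two hypothetical systems (illustration only).
run_a = [0.42, 0.31, 0.55, 0.60, 0.28, 0.47, 0.39, 0.51, 0.33, 0.58]
run_b = [0.30, 0.22, 0.41, 0.52, 0.15, 0.38, 0.27, 0.40, 0.25, 0.44]
asl = paired_bootstrap_asl(run_a, run_b)
print(f"estimated ASL: {asl:.3f}")
```

A small ASL suggests that the observed difference between the two runs is unlikely under the null hypothesis. The test makes no normality assumption about the per-topic differences, which is the property the abstract highlights.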

Original language: English
Title of host publication: Proceedings of the Twenty-Ninth Annual International ACM SIGIR Conference on Research and Development in Information Retrieval
Pages: 525-532
Number of pages: 8
Publication status: Published - 31 Oct 2006
Externally published: Yes
Event: 29th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval - Seattle, WA, United States
Duration: 6 Aug 2006 to 11 Aug 2006

Publication series

Name: Proceedings of the Twenty-Ninth Annual International ACM SIGIR Conference on Research and Development in Information Retrieval
2006

Conference

Conference: 29th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval
Country/Territory: United States
City: Seattle, WA
Period: 6 Aug 2006 to 11 Aug 2006

ASJC Scopus subject areas

  • Engineering (all)
  • Information Systems
  • Software
  • Applied Mathematics
