Bootstrap-based comparisons of IR metrics for finding one relevant document

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

11 Citations (Scopus)

Abstract

This paper compares the sensitivity of IR metrics designed for the task of finding one relevant document, using a method recently proposed at SIGIR 2006. The metrics are: P+-measure, P-measure, O-measure, Normalised Weighted Reciprocal Rank (NWRR) and Reciprocal Rank (RR). All of them except for RR can handle graded relevance. Unlike the ad hoc (but nevertheless useful) "swap" method proposed by Voorhees and Buckley, the new method derives the sensitivity and the performance difference required to guarantee a given significance level directly from Bootstrap Hypothesis Tests. We use four data sets from NTCIR to show that, according to this method, "P(+)-measure ≥ O-measure ≥ NWRR ≥ RR" generally holds, where "≥" means "is at least as sensitive as". These results generalise and reinforce previously reported ones based on the swap method. Therefore, we recommend the use of P(+)-measure and O-measure for practical tasks such as known-item search where recall is either unimportant or immeasurable.
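
As a rough illustration of the kind of test the abstract refers to, the sketch below computes Reciprocal Rank per topic and runs a paired bootstrap hypothesis test on two systems' per-topic scores. It is a minimal sketch under stated assumptions, not the paper's exact procedure: the studentised mean difference, the shift-to-null resampling scheme, the default of 1000 resamples, and all function names (`reciprocal_rank`, `paired_bootstrap_pvalue`) are assumptions made here for illustration.

```python
import math
import random


def reciprocal_rank(ranked_relevance):
    """Return 1/r for the rank r of the first relevant document, else 0.0.

    `ranked_relevance` is a list of relevance levels in ranked order;
    any level above 0 counts as relevant (RR ignores the grade itself).
    """
    for rank, rel in enumerate(ranked_relevance, start=1):
        if rel > 0:
            return 1.0 / rank
    return 0.0


def _studentised_mean(values):
    """Mean divided by its estimated standard error (needs len >= 2)."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / (n - 1)
    if var == 0.0:
        # Degenerate sample: no spread, so the statistic is 0 or infinite.
        return 0.0 if mean == 0.0 else math.copysign(math.inf, mean)
    return mean / math.sqrt(var / n)


def paired_bootstrap_pvalue(scores_x, scores_y, num_resamples=1000, seed=0):
    """Two-sided paired bootstrap test on per-topic score differences.

    Assumed scheme: shift the differences to have mean zero (the null
    hypothesis), resample topics with replacement, and count how often
    the resampled statistic is at least as extreme as the observed one.
    """
    rng = random.Random(seed)
    diffs = [x - y for x, y in zip(scores_x, scores_y)]
    n = len(diffs)
    mean = sum(diffs) / n
    t_observed = abs(_studentised_mean(diffs))
    shifted = [d - mean for d in diffs]  # enforce the null hypothesis
    extreme = 0
    for _ in range(num_resamples):
        sample = [shifted[rng.randrange(n)] for _ in range(n)]
        if abs(_studentised_mean(sample)) >= t_observed:
            extreme += 1
    return extreme / num_resamples
```

Under these assumptions, a metric's sensitivity could be estimated as the fraction of system pairs for which the returned value falls below a chosen significance level.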

Original language: English
Title of host publication: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Pages: 374-389
Number of pages: 16
Volume: 4182 LNCS
Publication status: Published - 2006
Externally published: Yes
Event: 3rd Asia Information Retrieval Symposium, AIRS 2006 - Singapore
Duration: 2006 Oct 16 → 2006 Oct 18

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 4182 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Other

Other: 3rd Asia Information Retrieval Symposium, AIRS 2006
City: Singapore
Period: 06/10/16 → 06/10/18

Fingerprint

Bootstrap
Metric
Swap
Bootstrap Test
Significance level
Hypothesis Test
Generalise

ASJC Scopus subject areas

  • Computer Science (all)
  • Biochemistry, Genetics and Molecular Biology (all)
  • Theoretical Computer Science

Cite this

Sakai, T. (2006). Bootstrap-based comparisons of IR metrics for finding one relevant document. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 4182 LNCS, pp. 374-389). (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 4182 LNCS).

Bootstrap-based comparisons of IR metrics for finding one relevant document. / Sakai, Tetsuya.

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics). Vol. 4182 LNCS 2006. p. 374-389 (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 4182 LNCS).

Sakai, T 2006, Bootstrap-based comparisons of IR metrics for finding one relevant document. in Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics). vol. 4182 LNCS, Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), vol. 4182 LNCS, pp. 374-389, 3rd Asia Information Retrieval Symposium, AIRS 2006, Singapore, 06/10/16.
Sakai T. Bootstrap-based comparisons of IR metrics for finding one relevant document. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics). Vol. 4182 LNCS. 2006. p. 374-389. (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)).
Sakai, Tetsuya. / Bootstrap-based comparisons of IR metrics for finding one relevant document. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics). Vol. 4182 LNCS 2006. pp. 374-389 (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)).
@inproceedings{bac31abc57c94ba2a0c589418375a838,
title = "Bootstrap-based comparisons of IR metrics for finding one relevant document",
abstract = "This paper compares the sensitivity of IR metrics designed for the task of finding one relevant document, using a method recently proposed at SIGIR 2006. The metrics are: P+-measure, P-measure, O-measure, Normalised Weighted Reciprocal Rank (NWRR) and Reciprocal Rank (RR). All of them except for RR can handle graded relevance. Unlike the ad hoc (but nevertheless useful) {"}swap{"} method proposed by Voorhees and Buckley, the new method derives the sensitivity and the performance difference required to guarantee a given significance level directly from Bootstrap Hypothesis Tests. We use four data sets from NTCIR to show that, according to this method, {"}P(+)-measure ≥ O-measure ≥ NWRR ≥ RR{"} generally holds, where {"}≥{"} means {"}is at least as sensitive as{"}. These results generalise and reinforce previously reported ones based on the swap method. Therefore, we recommend the use of P(+)-measure and O-measure for practical tasks such as known-item search where recall is either unimportant or immeasurable.",
author = "Tetsuya Sakai",
year = "2006",
language = "English",
isbn = "3540457801",
volume = "4182 LNCS",
series = "Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)",
pages = "374--389",
booktitle = "Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)",

}

TY - GEN

T1 - Bootstrap-based comparisons of IR metrics for finding one relevant document

AU - Sakai, Tetsuya

PY - 2006

Y1 - 2006

N2 - This paper compares the sensitivity of IR metrics designed for the task of finding one relevant document, using a method recently proposed at SIGIR 2006. The metrics are: P+-measure, P-measure, O-measure, Normalised Weighted Reciprocal Rank (NWRR) and Reciprocal Rank (RR). All of them except for RR can handle graded relevance. Unlike the ad hoc (but nevertheless useful) "swap" method proposed by Voorhees and Buckley, the new method derives the sensitivity and the performance difference required to guarantee a given significance level directly from Bootstrap Hypothesis Tests. We use four data sets from NTCIR to show that, according to this method, "P(+)-measure ≥ O-measure ≥ NWRR ≥ RR" generally holds, where "≥" means "is at least as sensitive as". These results generalise and reinforce previously reported ones based on the swap method. Therefore, we recommend the use of P(+)-measure and O-measure for practical tasks such as known-item search where recall is either unimportant or immeasurable.

AB - This paper compares the sensitivity of IR metrics designed for the task of finding one relevant document, using a method recently proposed at SIGIR 2006. The metrics are: P+-measure, P-measure, O-measure, Normalised Weighted Reciprocal Rank (NWRR) and Reciprocal Rank (RR). All of them except for RR can handle graded relevance. Unlike the ad hoc (but nevertheless useful) "swap" method proposed by Voorhees and Buckley, the new method derives the sensitivity and the performance difference required to guarantee a given significance level directly from Bootstrap Hypothesis Tests. We use four data sets from NTCIR to show that, according to this method, "P(+)-measure ≥ O-measure ≥ NWRR ≥ RR" generally holds, where "≥" means "is at least as sensitive as". These results generalise and reinforce previously reported ones based on the swap method. Therefore, we recommend the use of P(+)-measure and O-measure for practical tasks such as known-item search where recall is either unimportant or immeasurable.

UR - http://www.scopus.com/inward/record.url?scp=33751354079&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=33751354079&partnerID=8YFLogxK

M3 - Conference contribution

SN - 3540457801

SN - 9783540457800

VL - 4182 LNCS

T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

SP - 374

EP - 389

BT - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

ER -