Randomised vs. Prioritised Pools for Relevance Assessments: Sample Size Considerations

Tetsuya Sakai, Peng Xiao

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

The present study concerns depth-k pooling for building IR test collections. At TREC, pooled documents are traditionally presented in random order to the assessors to avoid judgement bias. In contrast, an approach that has been used widely at NTCIR is to prioritise the pooled documents based on “pseudorelevance,” in the hope of letting assessors quickly form an idea as to what constitutes a relevant document and thereby judge more efficiently and reliably. While the recent TREC 2017 Common Core Track went beyond depth-k pooling and adopted a method for selecting documents to judge dynamically, even this task let the assessors process the usual depth-10 pools first: the idea was to give the assessors a “burn-in” period, which actually appears to echo the view of the NTCIR approach. Our research questions are: (1) Which depth-k ordering strategy enables more efficient assessments? Randomisation, or prioritisation by pseudorelevance? (2) Similarly, which of the two strategies enables higher inter-assessor agreements? Our experiments based on two English web search test collections with multiple sets of graded relevance assessments suggest that randomisation outperforms prioritisation in both respects on average, although the results are statistically inconclusive. We then discuss a plan for a much larger experiment with sufficient statistical power to obtain the final verdict.
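The two orderings compared in the abstract can be illustrated with a minimal sketch. The abstract does not define "pseudorelevance", so the scoring used here is an assumption made for illustration only: documents returned by more runs come first, with ties broken by a smaller sum of ranks. The function names and the toy runs are likewise hypothetical.

```python
import random
from collections import defaultdict


def depth_k_pool(runs, k):
    """Return the union of the top-k documents of every run."""
    pool = set()
    for ranking in runs:
        pool.update(ranking[:k])
    return pool


def randomised_order(pool, seed=0):
    """Present the pooled documents in random order."""
    docs = sorted(pool)  # fixed starting order, so the result depends only on the seed
    random.Random(seed).shuffle(docs)
    return docs


def prioritised_order(runs, k):
    """Order the pooled documents by an ASSUMED pseudorelevance score:
    more runs returning the document first, then smaller sum of ranks,
    then document ID as a final tie-breaker."""
    votes = defaultdict(int)
    rank_sum = defaultdict(int)
    for ranking in runs:
        for rank, doc in enumerate(ranking[:k], start=1):
            votes[doc] += 1
            rank_sum[doc] += rank
    return sorted(votes, key=lambda d: (-votes[d], rank_sum[d], d))


# Toy runs: each list is one system's ranking of document IDs.
runs = [
    ["d1", "d2", "d3", "d4"],
    ["d2", "d1", "d5", "d6"],
    ["d2", "d7", "d1", "d8"],
]

pool = depth_k_pool(runs, k=3)
prioritised = prioritised_order(runs, k=3)
randomised = randomised_order(pool, seed=42)

print(prioritised)  # ['d2', 'd1', 'd7', 'd3', 'd5']
print(randomised)   # same documents, shuffled
```

Both orderings cover exactly the same depth-3 pool; only the sequence in which an assessor would see the documents differs.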

Original language: English
Title of host publication: Information Retrieval Technology - 15th Asia Information Retrieval Societies Conference, AIRS 2019, Proceedings
Editors: Fu Lee Wang, Haoran Xie, Wai Lam, Aixin Sun, Lun-Wei Ku, Tianyong Hao, Wei Chen, Tak-Lam Wong, Xiaohui Tao
Publisher: Springer
Pages: 94-105
Number of pages: 12
ISBN (Print): 9783030428341
DOIs: https://doi.org/10.1007/978-3-030-42835-8_9
Publication status: Published - 2020
Event: 15th Asia Information Retrieval Societies Conference, AIRS 2019 - Kowloon, Hong Kong
Duration: 2019 Nov 7 to 2019 Nov 9

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 12004 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 15th Asia Information Retrieval Societies Conference, AIRS 2019
Country: Hong Kong
City: Kowloon
Period: 19/11/7 to 19/11/9

Keywords

  • Evaluation
  • Graded relevance
  • Pooling
  • Relevance assessments
  • Web search

ASJC Scopus subject areas

  • Theoretical Computer Science
  • Computer Science (all)

Cite this

Sakai, T., & Xiao, P. (2020). Randomised vs. Prioritised Pools for Relevance Assessments: Sample Size Considerations. In F. L. Wang, H. Xie, W. Lam, A. Sun, L-W. Ku, T. Hao, W. Chen, T-L. Wong, & X. Tao (Eds.), Information Retrieval Technology - 15th Asia Information Retrieval Societies Conference, AIRS 2019, Proceedings (pp. 94-105). (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 12004 LNCS). Springer. https://doi.org/10.1007/978-3-030-42835-8_9