WWW3E8: 259,000 Relevance Labels for Studying the Effect of Document Presentation Order for Relevance Assessors

Tetsuya Sakai, Sijie Tao, Zhaohao Zeng

Research output: Conference contribution

1 citation (Scopus)

Abstract

In IR evaluation based on depth-k pooling, there are several strategies to order the pooled documents for relevance assessors. Among them, the simplest approach is to completely randomise the order "so assessors cannot tell if a document was highly ranked by some system or how many systems (or which systems) retrieved the document." An approach that is in sharp contrast to the above is the prioritisation approach taken by NTCIRPOOL, a tool widely used at NTCIR. NTCIRPOOL sorts the pooled documents by "pseudorelevance," a statistic that reflects the popularity of each document within the depth-k pools. Although these two strategies have coexisted for over two decades, the IR research community has yet to reach a consensus as to what advantages each of these two strategies actually offers. To help researchers directly address this question using their favourite methods of analysis, we have released a large-scale data set called WWW3E8. It comprises eight independent sets of qrels for the 160 English topics of the NTCIR-15 WWW-3 task: four qrels files constructed using the randomisation approach, and another four constructed using the prioritisation approach of NTCIRPOOL. Each qrels file covers 32,375 topic-document pairs; hence, WWW3E8 contains a total of 259,000 relevance labels. Moreover, the data set contains the raw English subtask run files from the WWW-3 task, the randomised and prioritised pool files, and topic-by-run score matrices of the official measures used in the task. Hence, researchers interested in the above research question regarding document ordering can utilise WWW3E8 as a common ground to directly compare the two strategies.
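
The two ordering strategies described in the abstract can be illustrated with a minimal Python sketch. It is not the NTCIRPOOL implementation: the abstract only says that pseudorelevance reflects the popularity of a document within the depth-k pools, so the sort key used here (number of runs that retrieved the document in their top k, ties broken by the sum of its ranks, then by document ID) is an assumption made for illustration. The function names, the `runs` dictionary, and the document IDs are hypothetical.

```python
import random
from collections import defaultdict


def depth_k_pool(runs, k):
    """Return the union of the top-k documents of every run (single topic)."""
    pool = set()
    for ranking in runs.values():
        pool.update(ranking[:k])
    return pool


def randomised_order(runs, k, seed=0):
    """Randomisation strategy: shuffle the pool so rank information is hidden."""
    docs = sorted(depth_k_pool(runs, k))  # sort first so the shuffle is reproducible
    random.Random(seed).shuffle(docs)
    return docs


def prioritised_order(runs, k):
    """Prioritisation strategy (assumed key): most widely retrieved documents first.

    Assumption: popularity = number of runs containing the document in their
    top k; ties are broken by the smaller sum of ranks, then by document ID.
    Each run is assumed to list a document at most once.
    """
    votes = defaultdict(int)
    rank_sum = defaultdict(int)
    for ranking in runs.values():
        for rank, doc in enumerate(ranking[:k], start=1):
            votes[doc] += 1
            rank_sum[doc] += rank
    return sorted(votes, key=lambda d: (-votes[d], rank_sum[d], d))


# Hypothetical toy runs for one topic (not taken from WWW3E8).
runs = {
    "runA": ["d1", "d2", "d3"],
    "runB": ["d2", "d1", "d4"],
    "runC": ["d2", "d5", "d1"],
}

print(prioritised_order(runs, k=2))  # d2 is in three top-2 lists, d1 in two, d5 in one
print(randomised_order(runs, k=2))   # same three documents, shuffled

# Size check from the abstract: eight qrels files of 32,375 pairs each.
assert 8 * 32_375 == 259_000
```

Both functions return the same set of pooled documents; only the order shown to assessors differs, which is the variable the data set is meant to isolate.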

Original language: English
Title of host publication: SIGIR 2021 - Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval
Publisher: Association for Computing Machinery, Inc
Pages: 2376-2382
Number of pages: 7
ISBN (Electronic): 9781450380379
Publication status: Published - 11 Jul 2021
Event: 44th International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 2021 - Virtual, Online, Canada
Duration: 11 Jul 2021 to 15 Jul 2021

Publication series

Name: SIGIR 2021 - Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval

Conference

Conference: 44th International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 2021
Country/Territory: Canada
City: Virtual, Online
Period: 11 Jul 2021 to 15 Jul 2021

ASJC Scopus subject areas

  • Software
  • Computer Graphics and Computer-Aided Design
  • Information Systems
