Evaluating Relevance Judgments with Pairwise Discriminative Power

Zhumin Chu, Jiaxin Mao, Fan Zhang, Yiqun Liu*, Tetsuya Sakai, Min Zhang, Shaoping Ma

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Relevance judgments play an essential role in the evaluation of information retrieval systems. As many different relevance judgment settings have been proposed in recent years, an evaluation metric that can compare relevance judgments collected under different annotation settings has become a necessity. Traditional metrics, such as Krippendorff's α and φ, have mainly focused on inter-assessor consistency to evaluate the quality of relevance judgments. They encounter a "reliable but useless" problem when employed to compare different annotation settings (e.g., binary judgment vs. 4-grade judgment). Meanwhile, other popular metrics such as discriminative power (DP) are not designed to compare relevance judgments across different annotation settings; they therefore suffer from limitations, such as requiring result ranking lists from different systems. How to design an evaluation metric that compares relevance judgments under different grade settings thus needs further investigation. In this work, we propose a novel metric named pairwise discriminative power (PDP) to evaluate the quality of relevance judgment collections. By leveraging a small amount of document-level preference tests, PDP estimates the discriminative ability of relevance judgments in separating ranking lists of various qualities. With comprehensive experiments on both synthetic and real-world datasets, we show that PDP maintains a high degree of consistency with annotation quality across various grade settings. Compared with existing metrics (e.g., Krippendorff's α, φ, DP), it provides reliable evaluation results with affordable additional annotation effort.
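As context for the inter-assessor consistency metrics the abstract contrasts with PDP, the following is a minimal sketch of Krippendorff's α in its nominal form, applied to per-document relevance labels. The function name and input layout are illustrative, not from the paper.

```python
from collections import Counter
from itertools import permutations

def krippendorff_alpha_nominal(units):
    """Nominal Krippendorff's alpha for inter-assessor agreement.

    units: list of lists; each inner list holds the relevance labels
    assigned to one document by its assessors. Documents with fewer
    than two labels are skipped (they contribute no pairable values).
    """
    o = Counter()    # coincidence matrix: o[(c, k)] over label pairs
    n_c = Counter()  # marginal totals per label
    n = 0            # total number of pairable labels
    for labels in units:
        m = len(labels)
        if m < 2:
            continue
        n += m
        for c in labels:
            n_c[c] += 1
        # each ordered pair of labels from different assessors
        # contributes 1/(m - 1) to the coincidence matrix
        for c, k in permutations(labels, 2):
            o[(c, k)] += 1.0 / (m - 1)
    if n < 2:
        raise ValueError("need at least one unit with two labels")
    # observed and expected disagreement (nominal delta: 1 iff c != k)
    d_o = sum(v for (c, k), v in o.items() if c != k)
    d_e = sum(n_c[c] * n_c[k] for c in n_c for k in n_c if c != k)
    if d_e == 0:
        return 1.0  # only one label ever used: perfect agreement
    return 1.0 - (n - 1) * d_o / d_e
```

On perfectly consistent labels this yields 1.0, and it can go negative under systematic disagreement. The "reliable but useless" problem the abstract points out is visible here: α measures only how often assessors agree, not whether the resulting labels can separate good ranking lists from bad ones.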

Original language: English
Title of host publication: CIKM 2021 - Proceedings of the 30th ACM International Conference on Information and Knowledge Management
Publisher: Association for Computing Machinery
Pages: 261-270
Number of pages: 10
ISBN (Electronic): 9781450384469
DOIs
Publication status: Published - 2021 Oct 26
Event: 30th ACM International Conference on Information and Knowledge Management, CIKM 2021 - Virtual, Online, Australia
Duration: 2021 Nov 1 - 2021 Nov 5

Publication series

Name: International Conference on Information and Knowledge Management, Proceedings

Conference

Conference: 30th ACM International Conference on Information and Knowledge Management, CIKM 2021
Country/Territory: Australia
City: Virtual, Online
Period: 21/11/1 - 21/11/5

Keywords

  • evaluation metric
  • preference test
  • relevance judgment

ASJC Scopus subject areas

  • Business, Management and Accounting (all)
  • Decision Sciences (all)
