Comparing two binned probability distributions for information access evaluation

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

1 Citation (Scopus)

Abstract

Some modern information access tasks such as natural language dialogue tasks are difficult to evaluate, for often there is no such thing as the ground truth: different users may have different opinions about the system's output. A few task designs for dialogue evaluation have been implemented and/or proposed recently, where both the ground truth data and the system's output are represented as a distribution of users' votes over bins on a non-nominal scale. The present study first points out that popular bin-by-bin measures such as Jensen-Shannon divergence and Sum of Squared Errors are clearly not adequate for such tasks, and that cross-bin measures should be used. Through experiments using artificial distributions as well as real ones from a dialogue evaluation task, we demonstrate that two cross-bin measures, namely, the Normalised Match Distance (NMD; a special case of the Earth Mover's Distance) and the Root Symmetric Normalised Order-aware Divergence (RSNOD), are indeed substantially different from the bin-by-bin measures. Furthermore, RSNOD lies between the popular bin-by-bin measures and NMD in terms of how it behaves. We recommend using both of these measures in the aforementioned type of evaluation tasks.
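
The contrast the abstract draws between bin-by-bin and cross-bin measures can be illustrated with a minimal sketch. The abstract gives no formulas, so the definitions below are assumptions: Sum of Squared Errors and Jensen-Shannon divergence (base-2 logarithm) in their textbook forms, and NMD taken as the sum of absolute differences between the two cumulative distributions divided by the number of bins minus one. The function names and the example distributions are hypothetical, and RSNOD is left out because the abstract does not define it.

```python
import math

def sse(p, q):
    """Sum of Squared Errors: compares each bin only with the same bin."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def jsd(p, q):
    """Jensen-Shannon divergence with base-2 logs (assumed textbook form)."""
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(x, y):
        # Bins with zero probability in x contribute nothing.
        return sum(a * math.log2(a / b) for a, b in zip(x, y) if a > 0)

    return (kl(p, m) + kl(q, m)) / 2

def nmd(p, q):
    """Normalised Match Distance (assumed form): L1 distance between the
    cumulative distributions, divided by (number of bins - 1)."""
    cum_p = cum_q = 0.0
    total = 0.0
    for a, b in zip(p, q):
        cum_p += a
        cum_q += b
        total += abs(cum_p - cum_q)
    return total / (len(p) - 1)

# Hypothetical vote distributions over five ordered bins.
gold = [1.0, 0.0, 0.0, 0.0, 0.0]   # all votes in the lowest bin
near = [0.0, 1.0, 0.0, 0.0, 0.0]   # system is off by one bin
far = [0.0, 0.0, 0.0, 0.0, 1.0]    # system is off by four bins

for name, system in (("near", near), ("far", far)):
    print(name, sse(gold, system), jsd(gold, system), nmd(gold, system))
```

Under these assumptions, SSE and JSD give `near` and `far` the same score, because neither shares a bin with `gold`, whereas NMD penalises `far` four times as heavily as `near`, reflecting the distance between bins on the ordered scale.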

Original language: English
Title of host publication: 41st International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 2018
Publisher: Association for Computing Machinery, Inc
Pages: 1073-1076
Number of pages: 4
ISBN (Electronic): 9781450356572
Publication status: Published - 2018 Jun 27
Event: 41st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 2018 - Ann Arbor, United States
Duration: 2018 Jul 8 to 2018 Jul 12

Publication series

Name: 41st International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 2018

Other

Other: 41st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 2018
Country: United States
City: Ann Arbor
Period: 18/7/8 to 18/7/12

Keywords

  • Dialogue evaluation
  • Earth mover's distance
  • Evaluation measures
  • Jensen-Shannon divergence
  • Kullback-Leibler divergence
  • Order-aware divergence
  • Wasserstein distance

ASJC Scopus subject areas

  • Software
  • Computer Graphics and Computer-Aided Design
  • Information Systems
