Comparing two binned probability distributions for information access evaluation

    Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

    1 Citation (Scopus)

    Abstract

    Some modern information access tasks such as natural language dialogue tasks are difficult to evaluate, for often there is no such thing as the ground truth: different users may have different opinions about the system's output. A few task designs for dialogue evaluation have been implemented and/or proposed recently, where both the ground truth data and the system's output are represented as a distribution of users' votes over bins on a non-nominal scale. The present study first points out that popular bin-by-bin measures such as Jensen-Shannon divergence and Sum of Squared Errors are clearly not adequate for such tasks, and that cross-bin measures should be used. Through experiments using artificial distributions as well as real ones from a dialogue evaluation task, we demonstrate that two cross-bin measures, namely, the Normalised Match Distance (NMD; a special case of the Earth Mover's Distance) and the Root Symmetric Normalised Order-aware Divergence (RSNOD), are indeed substantially different from the bin-by-bin measures. Furthermore, RSNOD lies between the popular bin-by-bin measures and NMD in terms of how it behaves. We recommend using both of these measures in the aforementioned type of evaluation tasks.
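
    As an illustration of the bin-by-bin versus cross-bin distinction discussed in the abstract, the sketch below contrasts Jensen-Shannon divergence with the Normalised Match Distance on a 5-point voting scale. This is not the authors' code: it assumes the standard definition of JSD (average base-2 KL divergence to the mixture) and assumes NMD is the L1 distance between the cumulative distributions divided by the number of bins minus one; RSNOD is omitted, since its exact formulation is given in the paper.

    # Minimal sketch (not the authors' implementation) of a bin-by-bin measure
    # (Jensen-Shannon divergence) vs. a cross-bin measure (Normalised Match Distance).
    import numpy as np

    def jensen_shannon_divergence(p, q):
        """Bin-by-bin: compares p and q one bin at a time (base-2 log, so JSD <= 1)."""
        p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
        m = (p + q) / 2.0
        def kl(a, b):
            mask = a > 0  # 0 * log(0/x) is taken to be 0
            return float(np.sum(a[mask] * np.log2(a[mask] / b[mask])))
        return 0.5 * kl(p, m) + 0.5 * kl(q, m)

    def normalised_match_distance(p, q):
        """Cross-bin: on a 1-D ordinal scale the Earth Mover's Distance reduces to
        the L1 distance between cumulative distributions (the Match Distance);
        here it is normalised by (number of bins - 1), an assumed normalisation."""
        p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
        md = float(np.sum(np.abs(np.cumsum(p) - np.cumsum(q))))
        return md / (len(p) - 1)

    # Two vote distributions that share no bins with the "truth" [1,0,0,0,0]:
    # the bin-by-bin measure scores them as equally wrong, the cross-bin one does not.
    truth = [1.0, 0.0, 0.0, 0.0, 0.0]
    near  = [0.0, 1.0, 0.0, 0.0, 0.0]   # all mass one bin away
    far   = [0.0, 0.0, 0.0, 0.0, 1.0]   # all mass four bins away
    print(jensen_shannon_divergence(truth, near), jensen_shannon_divergence(truth, far))  # 1.0 1.0
    print(normalised_match_distance(truth, near), normalised_match_distance(truth, far))  # 0.25 1.0

    The disjoint-support example makes the abstract's point concrete: JSD treats "near" and "far" as equally bad (both 1.0), whereas the cross-bin NMD distinguishes them (0.25 vs. 1.0).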

    Original language: English
    Title of host publication: 41st International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 2018
    Publisher: Association for Computing Machinery, Inc
    Pages: 1073-1076
    Number of pages: 4
    ISBN (Electronic): 9781450356572
    DOI: 10.1145/3209978.3210073
    Publication status: Published - 2018 Jun 27
    Event: 41st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 2018 - Ann Arbor, United States
    Duration: 2018 Jul 8 – 2018 Jul 12

    Other

    Other: 41st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 2018
    Country: United States
    City: Ann Arbor
    Period: 18/7/8 – 18/7/12

    Keywords

    • Dialogue evaluation
    • Earth mover's distance
    • Evaluation measures
    • Jensen-Shannon divergence
    • Kullback-Leibler divergence
    • Order-aware divergence
    • Wasserstein distance

    ASJC Scopus subject areas

    • Software
    • Computer Graphics and Computer-Aided Design
    • Information Systems


    Cite this

    Sakai, T. (2018). Comparing two binned probability distributions for information access evaluation. In 41st International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 2018 (pp. 1073-1076). Association for Computing Machinery, Inc. https://doi.org/10.1145/3209978.3210073