Comparing two binned probability distributions for information access evaluation

Tetsuya Sakai*

*Corresponding author for this work

Research output: Conference contribution

2 Citations (Scopus)

Abstract

Some modern information access tasks such as natural language dialogue tasks are difficult to evaluate, for often there is no such thing as the ground truth: different users may have different opinions about the system's output. A few task designs for dialogue evaluation have been implemented and/or proposed recently, where both the ground truth data and the system's output are represented as a distribution of users' votes over bins on a non-nominal scale. The present study first points out that popular bin-by-bin measures such as Jensen-Shannon divergence and Sum of Squared Errors are clearly not adequate for such tasks, and that cross-bin measures should be used. Through experiments using artificial distributions as well as real ones from a dialogue evaluation task, we demonstrate that two cross-bin measures, namely, the Normalised Match Distance (NMD; a special case of the Earth Mover's Distance) and the Root Symmetric Normalised Order-aware Divergence (RSNOD), are indeed substantially different from the bin-by-bin measures. Furthermore, RSNOD lies between the popular bin-by-bin measures and NMD in terms of how it behaves. We recommend using both of these measures in the aforementioned type of evaluation tasks.
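
The contrast between bin-by-bin and cross-bin measures described in the abstract can be illustrated with a small sketch. This record does not give any formulas, so the definitions below are assumptions: Sum of Squared Errors and Jensen-Shannon divergence (base 2) in their textbook forms, and NMD taken as the match distance (the sum of absolute differences between the two cumulative distributions) divided by the number of bins minus one. RSNOD is left out because its definition does not appear here. The function names and the three-bin example distributions are hypothetical.

```python
import math


def sse(p, q):
    """Sum of Squared Errors: a bin-by-bin measure."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def jsd(p, q):
    """Jensen-Shannon divergence (base 2): a bin-by-bin measure."""
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(x, y):
        # Bins with zero probability in x contribute nothing.
        return sum(a * math.log2(a / b) for a, b in zip(x, y) if a > 0)

    return (kl(p, m) + kl(q, m)) / 2


def nmd(p, q):
    """Normalised Match Distance (assumed definition): a cross-bin measure.

    Sums the absolute differences between the cumulative distributions
    and divides by (number of bins - 1) so the result lies in [0, 1].
    """
    cum_p = cum_q = 0.0
    total = 0.0
    for a, b in zip(p, q):
        cum_p += a
        cum_q += b
        total += abs(cum_p - cum_q)
    return total / (len(p) - 1)


# Hypothetical three-bin ordinal scale; all gold votes fall in the first bin.
gold = [1.0, 0.0, 0.0]
near = [0.0, 1.0, 0.0]  # system output is off by one bin
far = [0.0, 0.0, 1.0]   # system output is off by two bins

for name, system in (("near", near), ("far", far)):
    print(name, sse(gold, system), jsd(gold, system), nmd(gold, system))
```

Under these assumptions, SSE and JSD give the near miss and the far miss identical scores, because they compare each bin in isolation, whereas NMD scores the far miss as twice as bad (1.0 versus 0.5).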

Original language: English
Title of host publication: 41st International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 2018
Publisher: Association for Computing Machinery, Inc
Pages: 1073-1076
Number of pages: 4
ISBN (Electronic): 9781450356572
DOI
Publication status: Published - 27 Jun 2018
Event: 41st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 2018 - Ann Arbor, United States
Duration: 8 Jul 2018 to 12 Jul 2018

Publication series

Name: 41st International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 2018

Other

Other: 41st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 2018
Country/Territory: United States
City: Ann Arbor
Period: 8 Jul 2018 to 12 Jul 2018

ASJC Scopus subject areas

  • Software
  • Computer Graphics and Computer-Aided Design
  • Information Systems
