Abstract
At NTCIR-4, new retrieval effectiveness metrics called Q-measure and R-measure were proposed for evaluation based on multigrade relevance. This paper shows, through a theoretical analysis, that Q-measure inherits both the reliability of noninterpolated Average Precision and the multigrade relevance capability of Average Weighted Precision, and then verifies this claim experimentally by ranking the systems submitted to the NTCIR-3 CLIR Task. Our experiments confirm that the Q-measure ranking is very highly correlated with the Average Precision ranking and that it is more reliable than Average Weighted Precision.
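The abstract does not define the three metrics, so the following is only a minimal sketch of how they are commonly formulated, not the paper's own code. It assumes that Average Precision averages count(r)/r over relevant ranks, that Average Weighted Precision averages cg(r)/cig(r), and that Q-measure averages the blended ratio (cg(r) + count(r)) / (cig(r) + r). The function name `evaluate_ranking` and the gain values (for example 3, 2, 1 for decreasing relevance grades) are hypothetical choices made for illustration.

```python
from itertools import accumulate


def evaluate_ranking(ranked_gains, relevant_gains):
    """Return AP, AWP and Q-measure for one topic (illustrative sketch).

    ranked_gains:   gain of each retrieved document in rank order
                    (0 for a nonrelevant document).
    relevant_gains: gains of ALL relevant documents for the topic.
    """
    num_relevant = len(relevant_gains)
    if num_relevant == 0:
        raise ValueError("topic has no relevant documents")

    # Ideal ranking: all relevant documents sorted by decreasing gain,
    # padded with zeros so it is at least as long as the system output.
    ideal = sorted(relevant_gains, reverse=True)
    ideal += [0] * max(0, len(ranked_gains) - num_relevant)
    cig = list(accumulate(ideal))  # cumulative ideal gain per rank

    ap = awp = q = 0.0
    count = 0  # relevant documents seen so far
    cg = 0     # cumulative gain seen so far
    for rank, gain in enumerate(ranked_gains, start=1):
        if gain > 0:
            count += 1
            cg += gain
            ap += count / rank                         # binary precision
            awp += cg / cig[rank - 1]                  # weighted precision
            q += (cg + count) / (cig[rank - 1] + rank)  # blended ratio
    return {
        "AP": ap / num_relevant,
        "AWP": awp / num_relevant,
        "Q": q / num_relevant,
    }
```

Under these assumed definitions, a topic with a single relevant document retrieved at rank 4 gets an Average Weighted Precision of 1.0, because the cumulative ideal gain stops growing after rank 1, whereas Average Precision and Q-measure both penalize the late retrieval. This is one way to read the abstract's claim that Q-measure is the more reliable of the two multigrade metrics.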
Original language | English |
---|---|
Pages (from-to) | 251-262 |
Number of pages | 12 |
Journal | LECTURE NOTES IN COMPUTER SCIENCE |
Volume | 3411 |
Publication status | Published - 2005 |
Externally published | Yes |
Event | Asia Information Retrieval Symposium, AIRS 2004, Beijing, China (2004 Oct 18 → 2004 Oct 20) |
ASJC Scopus subject areas
- Theoretical Computer Science
- Computer Science (all)