Using graded-relevance metrics for evaluating community QA answer selection

Tetsuya Sakai, Yohei Seki, Daisuke Ishikawa, Kazuko Kuriyama, Noriko Kando, Chin Yew Lin

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

21 Citations (Scopus)

Abstract

Community Question Answering (CQA) sites such as Yahoo! Answers have emerged as rich knowledge resources for information seekers. However, answers posted to CQA sites can be irrelevant, incomplete, redundant, incorrect, biased, ill-formed or even abusive. Hence, automatic selection of "good" answers for a given posted question is a practical research problem that will help us manage the quality of accumulated knowledge. One way to evaluate answer selection systems for CQA would be to use the Best Answers (BAs) that are readily available from the CQA sites. However, BAs may be biased, and even if they are not, there may be other good answers besides BAs. To remedy these two problems, we propose system evaluation methods that involve multiple answer assessors and graded-relevance information retrieval metrics. Our main findings from experiments using the NTCIR-8 CQA task data are that, using our evaluation methods, (a) we can detect many substantial differences between systems that would have been overlooked by BA-based evaluation; and (b) we can better identify hard questions (i.e., those that are handled poorly by many systems and therefore require focussed investigation) compared to BA-based evaluation. We therefore argue that our approach is useful for building effective CQA answer selection systems despite the cost of manual answer assessments.
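
The record does not specify which graded-relevance metrics or grade scales the paper adopts, so the sketch below is only an illustration of the general setup the abstract describes: several assessors grade each posted answer, the grades are aggregated into per-answer gain values, and a graded-relevance metric (here nDCG, one common choice; the metric, the 0-2 grade scale, and averaging as the aggregation rule are assumptions for this example) scores a system's ranking of the answers.

```python
# Illustrative sketch only: the metric (nDCG), grade scale, and aggregation
# by averaging are assumptions; the paper's exact choices are not given here.
import math
from typing import Sequence


def aggregate_grades(assessor_grades: Sequence[Sequence[int]]) -> list[float]:
    """Average per-answer relevance grades (e.g. 0-2) over several assessors."""
    return [sum(g) / len(g) for g in zip(*assessor_grades)]


def dcg(gains: Sequence[float], k: int) -> float:
    """Discounted cumulative gain at cutoff k."""
    return sum(g / math.log2(i + 2) for i, g in enumerate(gains[:k]))


def ndcg(ranked_gains: Sequence[float], k: int) -> float:
    """nDCG@k: DCG of the system ranking over DCG of the ideal ranking."""
    ideal = sorted(ranked_gains, reverse=True)
    denom = dcg(ideal, k)
    return dcg(ranked_gains, k) / denom if denom > 0 else 0.0


# Hypothetical question with four posted answers a0..a3, judged by three
# assessors on a 0 (bad) to 2 (excellent) scale.
grades_by_assessor = [
    [0, 2, 1, 1],   # assessor 1: grades for answers a0..a3
    [1, 2, 1, 0],   # assessor 2
    [0, 1, 2, 1],   # assessor 3
]
per_answer_gain = aggregate_grades(grades_by_assessor)

# The system under evaluation ranks the answers [a2, a0, a3, a1].
system_ranking = [2, 0, 3, 1]
ranked_gains = [per_answer_gain[i] for i in system_ranking]
print(f"nDCG@3 = {ndcg(ranked_gains, 3):.3f}")
```

Under a setup like this, two answers that a binary BA-based evaluation would treat identically can receive different credit, which is what allows a graded-relevance evaluation to separate systems (and identify hard questions) that BA-based evaluation cannot.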

Original language: English
Title of host publication: Proceedings of the 4th ACM International Conference on Web Search and Data Mining, WSDM 2011
Pages: 187-196
Number of pages: 10
DOIs: https://doi.org/10.1145/1935826.1935864
Publication status: Published - 2011
Externally published: Yes
Event: 4th ACM International Conference on Web Search and Data Mining, WSDM 2011 - Hong Kong
Duration: 2011 Feb 9 - 2011 Feb 12

Other

Other: 4th ACM International Conference on Web Search and Data Mining, WSDM 2011
City: Hong Kong
Period: 11/2/9 - 11/2/12

Fingerprint

  • Information retrieval
  • Costs
  • Experiments

Keywords

  • Best answers
  • Community question answering
  • Evaluation
  • Graded relevance
  • NTCIR
  • Test collections

ASJC Scopus subject areas

  • Computer Networks and Communications
  • Computer Science Applications
  • Software

Cite this

Sakai, T., Seki, Y., Ishikawa, D., Kuriyama, K., Kando, N., & Lin, C. Y. (2011). Using graded-relevance metrics for evaluating community QA answer selection. In Proceedings of the 4th ACM International Conference on Web Search and Data Mining, WSDM 2011 (pp. 187-196). https://doi.org/10.1145/1935826.1935864

@inproceedings{71b10671f83445c2af4bf0c4de06c1e5,
title = "Using graded-relevance metrics for evaluating community QA answer selection",
abstract = "Community Question Answering (CQA) sites such as Yahoo ! Answers have emerged as rich knowledge resources for information seekers. However, answers posted to CQA sites can be irrelevant, incomplete, redundant, incorrect, biased, ill-formed or even abusive. Hence, automatic selection of {"}good{"} answers for a given posted question is a practical research problem that will help us manage the quality of accumulated knowledge. One way to evaluate answer selection systems for CQA would be to use the Best Answers (BAs) that are readily available from the CQA sites. However, BAs may be biased, and even if they are not, there may be other good answers besides BAs. To remedy these two problems, we propose system evaluation methods that involve multiple answer assessors and graded-relevance information retrieval metrics. Our main findings from experiments using the NTCIR-8 CQA task data are that, using our evaluation methods, (a) we can detect many substantial differences between systems that would have been overlooked by BA-based evaluation; and (b) we can better identify hard questions (i.e. those that are handled poorly by many systems and therefore require focussed investigation) compared to BAbased evaluation. We therefore argue that our approach is useful for building effective CQA answer selection systems despite the cost of manual answer assessments.",
keywords = "Best answers, Community question answering, Evaluation, Graded relevance, NTCIR, Test collections",
author = "Tetsuya Sakai and Yohei Seki and Daisuke Ishikawa and Kazuko Kuriyama and Noriko Kando and Lin, {Chin Yew}",
year = "2011",
doi = "10.1145/1935826.1935864",
language = "English",
isbn = "9781450304931",
pages = "187--196",
booktitle = "Proceedings of the 4th ACM International Conference on Web Search and Data Mining, WSDM 2011",

}
