Unanimity-aware gain for highly subjective assessments

    Research output: Contribution to journal › Conference article

    Abstract

    IR tasks have diversified: human assessments of items such as social media posts can be highly subjective, in which case it becomes necessary to hire many assessors per item to reflect their diverse views. For example, the value of a tweet for a given purpose may be judged by (say) ten assessors, and their ratings could be summed up to define its gain value for computing a graded-relevance evaluation measure. In the present study, we propose a simple variant of this approach, which takes into account the fact that some items receive unanimous ratings while others are more controversial. We generate simulated ratings based on data from a real social-media-based IR task to examine the effect of our unanimity-aware approach on the system ranking and on statistical significance. Our results show that incorporating unanimity can affect statistical significance test results even when its impact on the gain value is kept to a minimum. Moreover, since our simulated ratings do not consider the correlation present in the assessors' actual ratings, our experiments probably underestimate the effect of introducing unanimity into evaluation. Hence, if researchers accept that unanimous votes should be valued more highly than controversial ones, then our proposed approach may be worth incorporating.
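
    The abstract describes summing assessor ratings into a gain value and then rewarding unanimity, but it does not state the formula. The sketch below is therefore only a hypothetical illustration, not the paper's definition: the function names, the rating scale bounds, the weight p, and the bonus rule (p times the amount by which the rating spread falls short of the widest possible spread) are all assumptions made here for the example.

```python
from typing import Sequence


def sum_gain(ratings: Sequence[int]) -> int:
    """Baseline described in the abstract: the gain is the sum of all ratings."""
    return sum(ratings)


def unanimity_aware_gain(
    ratings: Sequence[int],
    min_rating: int,
    max_rating: int,
    p: float = 0.5,
) -> float:
    """Hypothetical unanimity-aware gain (an assumption, not the paper's formula).

    Adds a bonus that is largest when all assessors agree (spread of 0)
    and vanishes when the ratings span the whole scale.
    """
    if not ratings:
        raise ValueError("at least one rating is required")
    spread = max(ratings) - min(ratings)   # 0 means unanimous
    widest = max_rating - min_rating       # largest possible spread
    bonus = p * (widest - spread)          # small p keeps the impact minimal
    return sum(ratings) + bonus


# Ten assessors rating on an assumed 0..2 scale.
unanimous = [1] * 10          # everyone says 1
controversial = [0, 2] * 5    # split between 0 and 2

print(sum_gain(unanimous), sum_gain(controversial))
print(unanimity_aware_gain(unanimous, 0, 2),
      unanimity_aware_gain(controversial, 0, 2))
```

    Under these assumptions both items have the same summed gain, and only the unanimous item receives the bonus, which is the kind of distinction the abstract argues for.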

    Original language: English
    Pages (from-to): 39-42
    Number of pages: 4
    Journal: CEUR Workshop Proceedings
    Volume: 2008
    Publication status: Published - 2017 Jan 1
    Event: 8th International Workshop on Evaluating Information Access, EVIA 2017 - Tokyo, Japan
    Duration: 2017 Dec 5 → …

    Fingerprint

    Statistical tests
    Experiments

    Keywords

    • Effect sizes
    • Evaluation measures
    • Inter-assessor agreement
    • P-values
    • Social media
    • Statistical significance

    ASJC Scopus subject areas

    • Computer Science (all)

    Cite this

    Unanimity-aware gain for highly subjective assessments. / Sakai, Tetsuya.

    In: CEUR Workshop Proceedings, Vol. 2008, 01.01.2017, p. 39-42.

    @article{9e70de79a52944d4aa54c38453bdba49,
    title = "Unanimity-aware gain for highly subjective assessments",
    keywords = "Effect sizes, Evaluation measures, Inter-assessor agreement, P-values, Social media, Statistical significance",
    author = "Tetsuya Sakai",
    year = "2017",
    month = "1",
    day = "1",
    language = "English",
    volume = "2008",
    pages = "39--42",
    journal = "CEUR Workshop Proceedings",
    issn = "1613-0073",
    publisher = "CEUR-WS",

    }

    TY - JOUR

    T1 - Unanimity-aware gain for highly subjective assessments

    AU - Sakai, Tetsuya

    PY - 2017/1/1

    Y1 - 2017/1/1

    KW - Effect sizes

    KW - Evaluation measures

    KW - Inter-assessor agreement

    KW - P-values

    KW - Social media

    KW - Statistical significance

    UR - http://www.scopus.com/inward/record.url?scp=85038872488&partnerID=8YFLogxK

    UR - http://www.scopus.com/inward/citedby.url?scp=85038872488&partnerID=8YFLogxK

    M3 - Conference article

    VL - 2008

    SP - 39

    EP - 42

    JO - CEUR Workshop Proceedings

    JF - CEUR Workshop Proceedings

    SN - 1613-0073

    ER -