Topic set size design

    Research output: Contribution to journal › Article

    20 Citations (Scopus)

    Abstract

    Traditional pooling-based information retrieval (IR) test collections typically have 50–100 topics, but it is difficult for an IR researcher to say why the topic set size should really be n. The present study provides details on principled ways to determine the number of topics for a test collection to be built, based on a specific set of statistical requirements. We employ Nagata’s three sample size design techniques, which are based on the paired t test, one-way ANOVA, and confidence intervals, respectively. These topic set size design methods require topic-by-run score matrices from past test collections for the purpose of estimating the within-system population variance for a particular evaluation measure. While the previous work of Sakai incorrectly used estimates of the total variances, here we use the correct estimates of the within-system variances, which yield slightly smaller topic set sizes than those reported previously by Sakai. Moreover, this study provides a comparison across the three methods. Our conclusions nevertheless echo those of Sakai: as different evaluation measures can have vastly different within-system variances, they require substantially different topic set sizes under the same set of statistical requirements; by analysing the tradeoff between the topic set size and the pool depth for a particular evaluation measure in advance, researchers can build statistically reliable yet highly economical test collections.
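
    The abstract's recipe can be made concrete with a short sketch: pool a within-system variance estimate from a topic-by-run score matrix, then plug it into a sample size formula. The code below is an illustrative assumption, not the paper's implementation: Sakai follows Nagata's exact techniques, whereas this sketch uses the textbook normal-approximation sample size formula for a two-sided paired t test, and all function names and defaults are invented for the example.

    # Illustrative sketch only: NOT Sakai's/Nagata's exact procedure.
    import numpy as np
    from scipy.stats import norm

    def within_system_variance(scores: np.ndarray) -> float:
        """Pooled within-system variance of a topic-by-run score matrix.

        Rows are topics, columns are runs (systems); deviations are taken
        around each run's own mean and pooled across runs.
        """
        n_topics, n_runs = scores.shape
        deviations = scores - scores.mean(axis=0, keepdims=True)
        return float((deviations ** 2).sum() / (n_runs * (n_topics - 1)))

    def topic_set_size(variance: float, min_delta: float,
                       alpha: float = 0.05, beta: float = 0.20) -> int:
        """Topics needed for a two-sided paired t test to detect a mean
        score difference of min_delta with power 1 - beta.

        Normal approximation:
            n ~= (z_{1-alpha/2} + z_{1-beta})^2 * sigma_d^2 / min_delta^2,
        where the variance of the per-topic score differences, sigma_d^2,
        is approximated here as twice the within-system variance (i.e.
        assuming independent system scores) -- a simplifying assumption.
        """
        z_a = norm.ppf(1 - alpha / 2)
        z_b = norm.ppf(1 - beta)
        sigma_d2 = 2 * variance
        return int(np.ceil((z_a + z_b) ** 2 * sigma_d2 / min_delta ** 2))

    # Example with a fabricated 50-topic x 30-run score matrix:
    rng = np.random.default_rng(0)
    scores = rng.beta(2, 5, size=(50, 30))
    print(topic_set_size(within_system_variance(scores), min_delta=0.05))

    In practice the matrix would come from past test collections, and the calculation would be repeated per evaluation measure, since, as the abstract notes, within-system variances (and hence required topic set sizes) differ substantially across measures.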

    Original language: English
    Journal: Information Retrieval
    DOI: 10.1007/s10791-015-9273-z
    Publication status: Accepted/In press - 2015 Oct 27

    ASJC Scopus subject areas

    • Information Systems
    • Library and Information Sciences

    Cite this

    Topic set size design. / Sakai, Tetsuya.

    In: Information Retrieval, 27.10.2015.

    Research output: Contribution to journal › Article

    @article{c34ae00ee8724e2e83f4bf260d19826b,
    title = "Topic set size design",
    abstract = "Traditional pooling-based information retrieval (IR) test collections typically have (Formula presented.)–100 topics, but it is difficult for an IR researcher to say why the topic set size should really be n.The present study provides details on principled ways to determine the number of topics for a test collection to be built, based on a specific set of statistical requirements. We employ Nagata’s three sample size design techniques, which are based on the paired t test, one-way ANOVA, and confidence intervals, respectively. These topic set size design methods require topic-by-run score matrices from past test collections for the purpose of estimating the within-system population variance for a particular evaluation measure. While the previous work of Sakai incorrectly used estimates of the total variances, here we use the correct estimates of the within-system variances, which yield slightly smaller topic set sizes than those reported previously by Sakai. Moreover, this study provides a comparison across the three methods. Our conclusions nevertheless echo those of Sakai: as different evaluation measures can have vastly different within-system variances, they require substantially different topic set sizes under the same set of statistical requirements; by analysing the tradeoff between the topic set size and the pool depth for a particular evaluation measure in advance, researchers can build statistically reliable yet highly economical test collections.",
    author = "Tetsuya Sakai",
    year = "2015",
    month = "10",
    day = "27",
    doi = "10.1007/s10791-015-9273-z",
    language = "English",
    journal = "Information Retrieval",
    issn = "1386-4564",
    publisher = "Springer Netherlands",

    }

    TY - JOUR

    T1 - Topic set size design

    AU - Sakai, Tetsuya

    PY - 2015/10/27

    Y1 - 2015/10/27

    UR - http://www.scopus.com/inward/record.url?scp=84945280242&partnerID=8YFLogxK

    UR - http://www.scopus.com/inward/citedby.url?scp=84945280242&partnerID=8YFLogxK

    U2 - 10.1007/s10791-015-9273-z

    DO - 10.1007/s10791-015-9273-z

    M3 - Article

    AN - SCOPUS:84945280242

    JO - Information Retrieval

    JF - Information Retrieval

    SN - 1386-4564

    ER -