Topic set size design with the evaluation measures for short text conversation

Tetsuya Sakai, Lifeng Shang, Zhengdong Lu, Hang Li

    Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

    3 Citations (Scopus)

    Abstract

    Short Text Conversation (STC) is a new NTCIR task which tackles the following research question: given a microblog repository and a new post to that microblog, can systems reuse an old comment from the repository to satisfy the author of the new post? The official evaluation measures of STC are normalised gain at 1 (nG@1), normalised expected reciprocal rank at 10 (nERR@10), and P+, all of which can be regarded as evaluation measures for navigational intents. In this study, we apply the topic set size design technique of Sakai to decide on the number of test topics, using variance estimates of the above evaluation measures. Our main conclusion is to create 100 test topics, but what distinguishes our work from other tasks with similar topic set sizes is that we know what this topic set size means from a statistical viewpoint for each of our evaluation measures. We also demonstrate that, under the same set of statistical requirements, the topic set sizes required by nERR@10 and P+ are more or less the same, while nG@1 requires more than twice as many topics. To our knowledge, our task is the first among all efforts at TREC-like evaluation conferences to actually create a new test collection by using this principled approach.
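
The link between a measure's variance and the number of topics it needs can be illustrated with a generic power calculation. The sketch below is not the procedure used in the paper: it is a textbook normal approximation for a two-sided paired comparison, and the function name, the default error rates, and the variance values in the usage lines are all hypothetical. It only shows that, with the statistical requirements held fixed, the required topic count grows in proportion to the variance estimate.

```python
import math
from statistics import NormalDist


def topics_for_paired_test(var_measure, min_diff, alpha=0.05, beta=0.20):
    """Approximate topic count for a two-sided paired comparison.

    Normal approximation only (an assumption of this sketch):
      var_measure: estimated per-topic variance of the evaluation measure
      min_diff:    smallest mean difference between two systems worth detecting
      alpha, beta: Type I and Type II error probabilities
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(1 - beta)
    # Assumption: the two systems' scores are independent with equal variance,
    # so the variance of their per-topic difference is twice var_measure.
    var_diff = 2.0 * var_measure
    n = (z_alpha + z_beta) ** 2 * var_diff / min_diff ** 2
    return math.ceil(n)


# Hypothetical variance estimates, chosen only to show the scaling.
low_variance_n = topics_for_paired_test(var_measure=0.05, min_diff=0.10)
high_variance_n = topics_for_paired_test(var_measure=0.10, min_diff=0.10)
print(low_variance_n, high_variance_n)
```

Doubling the variance roughly doubles the result, which is the kind of relationship behind the observation that a higher-variance measure needs a larger topic set under the same requirements.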

    Original language: English
    Title of host publication: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
    Publisher: Springer Verlag
    Pages: 319-331
    Number of pages: 13
    Volume: 9460
    ISBN (Print): 9783319289397
    DOI: https://doi.org/10.1007/978-3-319-28940-3_25
    Publication status: Published - 2015
    Event: 11th Asia Information Retrieval Societies Conference, AIRS 2015 - Brisbane, Australia
    Duration: 2015 Dec 2 to 2015 Dec 4

    Publication series

    Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
    Volume: 9460
    ISSN (Print): 0302-9743
    ISSN (Electronic): 1611-3349

    Other

    Other: 11th Asia Information Retrieval Societies Conference, AIRS 2015
    Country: Australia
    City: Brisbane
    Period: 15/12/2 to 15/12/4

    ASJC Scopus subject areas

    • Computer Science(all)
    • Theoretical Computer Science

    Cite this

    Sakai, T., Shang, L., Lu, Z., & Li, H. (2015). Topic set size design with the evaluation measures for short text conversation. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 9460, pp. 319-331). (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 9460). Springer Verlag. https://doi.org/10.1007/978-3-319-28940-3_25

    @inproceedings{a04af267c2154c90ae933aa6653548bf,
    title = "Topic set size design with the evaluation measures for short text conversation",
    abstract = "Short Text Conversation (STC) is a new NTCIR task which tackles the following research question: given a microblog repository and a new post to that microblog, can systems reuse an old comment from the repository to satisfy the author of the new post? The official evaluation measures of STC are normalised gain at 1 (nG@1), normalised expected reciprocal rank at 10 (nERR@10), and P+, all of which can be regarded as evaluation measures for navigational intents. In this study, we apply the topic set size design technique of Sakai to decide on the number of test topics, using variance estimates of the above evaluation measures. Our main conclusion is to create 100 test topics, but what distinguishes our work from other tasks with similar topic set sizes is that we know what this topic set size means from a statistical viewpoint for each of our evaluation measures. We also demonstrate that, under the same set of statistical requirements, the topic set sizes required by nERR@10 and P+ are more or less the same, while nG@1 requires more than twice as many topics. To our knowledge, our task is the first among all efforts at TREC-like evaluation conferences to actually create a new test collection by using this principled approach.",
    author = "Tetsuya Sakai and Lifeng Shang and Zhengdong Lu and Hang Li",
    year = "2015",
    doi = "10.1007/978-3-319-28940-3_25",
    language = "English",
    isbn = "9783319289397",
    volume = "9460",
    series = "Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)",
    publisher = "Springer Verlag",
    pages = "319--331",
    booktitle = "Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)",

    }

    TY - GEN

    T1 - Topic set size design with the evaluation measures for short text conversation

    AU - Sakai, Tetsuya

    AU - Shang, Lifeng

    AU - Lu, Zhengdong

    AU - Li, Hang

    PY - 2015

    Y1 - 2015

    AB - Short Text Conversation (STC) is a new NTCIR task which tackles the following research question: given a microblog repository and a new post to that microblog, can systems reuse an old comment from the repository to satisfy the author of the new post? The official evaluation measures of STC are normalised gain at 1 (nG@1), normalised expected reciprocal rank at 10 (nERR@10), and P+, all of which can be regarded as evaluation measures for navigational intents. In this study, we apply the topic set size design technique of Sakai to decide on the number of test topics, using variance estimates of the above evaluation measures. Our main conclusion is to create 100 test topics, but what distinguishes our work from other tasks with similar topic set sizes is that we know what this topic set size means from a statistical viewpoint for each of our evaluation measures. We also demonstrate that, under the same set of statistical requirements, the topic set sizes required by nERR@10 and P+ are more or less the same, while nG@1 requires more than twice as many topics. To our knowledge, our task is the first among all efforts at TREC-like evaluation conferences to actually create a new test collection by using this principled approach.

    UR - http://www.scopus.com/inward/record.url?scp=84958044666&partnerID=8YFLogxK

    UR - http://www.scopus.com/inward/citedby.url?scp=84958044666&partnerID=8YFLogxK

    U2 - 10.1007/978-3-319-28940-3_25

    DO - 10.1007/978-3-319-28940-3_25

    M3 - Conference contribution

    SN - 9783319289397

    VL - 9460

    T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

    SP - 319

    EP - 331

    BT - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

    PB - Springer Verlag

    ER -