Test collections and measures for evaluating customer-helpdesk dialogues

Zhaohao Zeng, Cheng Luo, Lifeng Shang, Hang Li, Tetsuya Sakai

    Research output: Contribution to journal › Conference article

    1 Citation (Scopus)

    Abstract

    We address the problem of evaluating textual task-oriented dialogues between the customer and the helpdesk, such as those that take the form of online chats. As an initial step towards evaluating automatic helpdesk dialogue systems, we have constructed a test collection comprising 3,700 real Customer-Helpdesk multi-turn dialogues by mining Weibo, a major Chinese social media platform. We have annotated each dialogue with multiple subjective quality annotations and nugget annotations, where a nugget is a minimal sequence of posts by the same utterer that helps towards problem solving. In addition, 10% of the dialogues have been manually translated into English. We have made our test collection DCH-1 publicly available for research purposes. We also propose a simple nugget-based evaluation measure for task-oriented dialogue evaluation, which we call UCH, and explore its usefulness and limitations.
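
    The record above does not describe DCH-1's data format. Purely as an illustration of the nugget concept ("a minimal sequence of posts by the same utterer that helps towards problem solving"), the Python sketch below shows one hypothetical way a nugget-annotated dialogue could be represented; all class and field names are assumptions, not the actual DCH-1 schema.

    from dataclasses import dataclass, field
    from typing import List

    # Hypothetical structures for a nugget-annotated Customer-Helpdesk dialogue.
    # Names and fields are illustrative assumptions, NOT the DCH-1 schema.

    @dataclass
    class Post:
        utterer: str  # "customer" or "helpdesk"
        text: str

    @dataclass
    class Nugget:
        # A minimal sequence of posts by the same utterer that helps
        # towards problem solving (as defined in the abstract).
        posts: List[Post] = field(default_factory=list)

    @dataclass
    class Dialogue:
        dialogue_id: str
        posts: List[Post] = field(default_factory=list)
        nuggets: List[Nugget] = field(default_factory=list)
        quality_ratings: List[int] = field(default_factory=list)  # subjective quality annotations

    # Toy example: one customer post, one helpdesk post annotated as a nugget.
    d = Dialogue(
        dialogue_id="example-001",
        posts=[
            Post("customer", "My phone will not start after the latest update."),
            Post("helpdesk", "Please hold the power button for ten seconds to force a restart."),
        ],
    )
    d.nuggets.append(Nugget(posts=[d.posts[1]]))
    print(f"{len(d.nuggets)} nugget(s) annotated in dialogue {d.dialogue_id}")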

    Original language: English
    Pages (from-to): 1-9
    Number of pages: 9
    Journal: CEUR Workshop Proceedings
    Volume: 2008
    Publication status: Published - 2017 Jan 1
    Event: 8th International Workshop on Evaluating Information Access, EVIA 2017 - Tokyo, Japan
    Duration: 2017 Dec 5 → …

    Keywords

    • Dialogues
    • Evaluation
    • Helpdesk
    • Measures
    • Nuggets
    • Test collections

    ASJC Scopus subject areas

    • Computer Science (all)

    Cite this

    Test collections and measures for evaluating customer-helpdesk dialogues. / Zeng, Zhaohao; Luo, Cheng; Shang, Lifeng; Li, Hang; Sakai, Tetsuya.

    In: CEUR Workshop Proceedings, Vol. 2008, 01.01.2017, p. 1-9.

    @article{9494bfea029b429b8f9dbcf7d87b44b0,
        title = "Test collections and measures for evaluating customer-helpdesk dialogues",
        abstract = "We address the problem of evaluating textual task-oriented dialogues between the customer and the helpdesk, such as those that take the form of online chats. As an initial step towards evaluating automatic helpdesk dialogue systems, we have constructed a test collection comprising 3,700 real Customer-Helpdesk multi-turn dialogues by mining Weibo, a major Chinese social media platform. We have annotated each dialogue with multiple subjective quality annotations and nugget annotations, where a nugget is a minimal sequence of posts by the same utterer that helps towards problem solving. In addition, 10\% of the dialogues have been manually translated into English. We have made our test collection DCH-1 publicly available for research purposes. We also propose a simple nugget-based evaluation measure for task-oriented dialogue evaluation, which we call UCH, and explore its usefulness and limitations.",
        keywords = "Dialogues, Evaluation, Helpdesk, Measures, Nuggets, Test collections",
        author = "Zhaohao Zeng and Cheng Luo and Lifeng Shang and Hang Li and Tetsuya Sakai",
        year = "2017",
        month = "1",
        day = "1",
        language = "English",
        volume = "2008",
        pages = "1--9",
        journal = "CEUR Workshop Proceedings",
        issn = "1613-0073",
        publisher = "CEUR-WS",
    }
