LSTM vs. BM25 for Open-domain QA: A hands-on comparison of effectiveness and efficiency

Sosuke Kato, Riku Togashi, Hideyuki Maeda, Sumio Fujita, Tetsuya Sakai

    Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

    Abstract

    Recent advances in neural networks, along with the growth of rich and diverse community question answering (cQA) data, have enabled researchers to construct robust open-domain question answering (QA) systems. It is often claimed that such state-of-the-art QA systems far outperform traditional IR baselines such as BM25. However, most such studies rely on relatively small data sets, e.g., those extracted from the old TREC QA tracks. Given massive training data plus a separate corpus of Q&A pairs as the target knowledge source, how well would such a system really perform? How fast would it respond? In this demonstration, we provide the attendees of SIGIR 2017 an opportunity to experience a live comparison of two open-domain QA systems, one based on a long short-term memory (LSTM) architecture with over 11 million Yahoo! Chiebukuro (i.e., Japanese Yahoo! Answers) questions and over 27.4 million answers for training, and the other based on BM25. Both systems use the same Q&A knowledge source for answer retrieval. Our core demonstration system is a pair of Japanese monolingual QA systems, but we leverage machine translation for letting the SIGIR attendees enter English questions and compare the Japanese responses from the two systems after translating them into English.

    Original language: English
    Title of host publication: SIGIR 2017 - Proceedings of the 40th International ACM SIGIR Conference on Research and Development in Information Retrieval
    Publisher: Association for Computing Machinery, Inc
    Pages: 1309-1312
    Number of pages: 4
    ISBN (Electronic): 9781450350228
    DOIs: https://doi.org/10.1145/3077136.3084147
    Publication status: Published - 2017 Aug 7
    Event: 40th International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 2017 - Tokyo, Shinjuku, Japan
    Duration: 2017 Aug 7 → 2017 Aug 11

    Other

    Other: 40th International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 2017
    Country: Japan
    City: Tokyo, Shinjuku
    Period: 17/8/7 → 17/8/11

    Keywords

    • Community question answering
    • Long short-term memory
    • Question answering

    ASJC Scopus subject areas

    • Information Systems
    • Software
    • Computer Graphics and Computer-Aided Design

    Cite this

    Kato, S., Togashi, R., Maeda, H., Fujita, S., & Sakai, T. (2017). LSTM vs. BM25 for Open-domain QA: A hands-on comparison of effectiveness and efficiency. In SIGIR 2017 - Proceedings of the 40th International ACM SIGIR Conference on Research and Development in Information Retrieval (pp. 1309-1312). Association for Computing Machinery, Inc. https://doi.org/10.1145/3077136.3084147