LSTM vs. BM25 for Open-domain QA: A hands-on comparison of effectiveness and efficiency

Sosuke Kato, Riku Togashi, Hideyuki Maeda, Sumio Fujita, Tetsuya Sakai

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

1 Citation (Scopus)

Abstract

Recent advances in neural networks, along with the growth of rich and diverse community question answering (cQA) data, have enabled researchers to construct robust open-domain question answering (QA) systems. It is often claimed that such state-of-the-art QA systems far outperform traditional IR baselines such as BM25. However, most such studies rely on relatively small data sets, e.g., those extracted from the old TREC QA tracks. Given massive training data plus a separate corpus of Q&A pairs as the target knowledge source, how well would such a system really perform? How fast would it respond? In this demonstration, we provide the attendees of SIGIR 2017 an opportunity to experience a live comparison of two open-domain QA systems, one based on a long short-term memory (LSTM) architecture with over 11 million Yahoo! Chiebukuro (i.e., Japanese Yahoo! Answers) questions and over 27.4 million answers for training, and the other based on BM25. Both systems use the same Q&A knowledge source for answer retrieval. Our core demonstration system is a pair of Japanese monolingual QA systems, but we leverage machine translation to let the SIGIR attendees enter English questions and compare the Japanese responses from the two systems after translating them into English.
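
For readers unfamiliar with the BM25 baseline named in the abstract, the following is a minimal sketch of Okapi BM25 scoring over a small set of tokenized Q&A entries. It is not the authors' system: the abstract does not state the BM25 variant, parameters, or tokenizer used, so the function name `bm25_scores`, the defaults `k1=1.2` and `b=0.75`, the smoothed IDF form, and the pre-tokenized English toy data are all assumptions made for illustration. Japanese text would additionally require a morphological analyzer for tokenization.

```python
import math
from collections import Counter


def bm25_scores(query_tokens, docs_tokens, k1=1.2, b=0.75):
    """Score every document against the query with Okapi BM25.

    Hypothetical helper for illustration; k1 and b are common defaults,
    not values taken from the paper.
    """
    n_docs = len(docs_tokens)
    avg_len = sum(len(d) for d in docs_tokens) / n_docs

    # Document frequency: number of documents containing each term.
    df = Counter()
    for d in docs_tokens:
        df.update(set(d))

    scores = []
    for d in docs_tokens:
        tf = Counter(d)
        score = 0.0
        for term in query_tokens:
            if term not in tf:
                continue
            # Smoothed IDF that stays non-negative for frequent terms.
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Term-frequency saturation with document-length normalization.
            norm = tf[term] + k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


# Toy knowledge source (assumed data, already tokenized).
docs = [
    ["how", "to", "cook", "rice"],
    ["how", "to", "fix", "a", "bike"],
    ["best", "rice", "cooker"],
]
scores = bm25_scores(["cook", "rice"], docs)
ranking = sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)
```

In this toy example the first entry matches both query terms, the third matches one, and the second matches none, so `ranking` orders them accordingly. A production system would normally rely on an inverted index rather than scanning every document per query.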

Original language: English
Title of host publication: SIGIR 2017 - Proceedings of the 40th International ACM SIGIR Conference on Research and Development in Information Retrieval
Publisher: Association for Computing Machinery, Inc
Pages: 1309-1312
Number of pages: 4
ISBN (Electronic): 9781450350228
Publication status: Published - 2017 Aug 7
Event: 40th International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 2017 - Tokyo, Shinjuku, Japan
Duration: 2017 Aug 7 to 2017 Aug 11

Publication series

Name: SIGIR 2017 - Proceedings of the 40th International ACM SIGIR Conference on Research and Development in Information Retrieval

Other

Other: 40th International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 2017
Country: Japan
City: Tokyo, Shinjuku
Period: 17/8/7 to 17/8/11

Keywords

  • Community question answering
  • Long short-term memory
  • Question answering

ASJC Scopus subject areas

  • Information Systems
  • Software
  • Computer Graphics and Computer-Aided Design
