A method for building a commonsense inference dataset based on basic events

Kazumasa Omura, Daisuke Kawahara*, Sadao Kurohashi

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

We present a scalable, low-bias, and low-cost method for building a commonsense inference dataset that combines automatic extraction from a corpus with crowdsourcing. Each problem is a multiple-choice question that asks about the contingency between basic events. We applied the proposed method to a Japanese corpus and acquired 104k problems. While humans can solve the resulting problems with high accuracy (88.9%), the accuracy of a high-performance transfer learning model is substantially lower (76.0%). We also confirmed through dataset analysis that the resulting dataset has low bias. We have released the dataset to facilitate language understanding research.

Original language: English
Title of host publication: EMNLP 2020 - 2020 Conference on Empirical Methods in Natural Language Processing, Proceedings of the Conference
Publisher: Association for Computational Linguistics (ACL)
Pages: 2450-2460
Number of pages: 11
ISBN (Electronic): 9781952148606
Publication status: Published - 2020
Externally published: Yes
Event: 2020 Conference on Empirical Methods in Natural Language Processing, EMNLP 2020 - Virtual, Online
Duration: 2020 Nov 16 to 2020 Nov 20

Publication series

Name: EMNLP 2020 - 2020 Conference on Empirical Methods in Natural Language Processing, Proceedings of the Conference

Conference

Conference: 2020 Conference on Empirical Methods in Natural Language Processing, EMNLP 2020
City: Virtual, Online
Period: 20/11/16 to 20/11/20

ASJC Scopus subject areas

  • Information Systems
  • Computer Science Applications
  • Computational Theory and Mathematics
