A method for building a commonsense inference dataset based on basic events

Kazumasa Omura, Daisuke Kawahara*, Sadao Kurohashi

*Corresponding author for this work

Research output

Abstract

We present a scalable, low-bias, and low-cost method for building a commonsense inference dataset that combines automatic extraction from a corpus and crowdsourcing. Each problem is a multiple-choice question that asks about the contingency between basic events. We applied the proposed method to a Japanese corpus and acquired 104k problems. While humans can solve the resulting problems with high accuracy (88.9%), the accuracy of a high-performance transfer learning model is reasonably low (76.0%). We also confirmed through dataset analysis that the resulting dataset contains little bias. We released the dataset to facilitate language understanding research.

Original language: English
Title of host publication: EMNLP 2020 - 2020 Conference on Empirical Methods in Natural Language Processing, Proceedings of the Conference
Publisher: Association for Computational Linguistics (ACL)
Pages: 2450-2460
Number of pages: 11
ISBN (electronic): 9781952148606
Publication status: Published - 2020
Externally published: Yes
Event: 2020 Conference on Empirical Methods in Natural Language Processing, EMNLP 2020 - Virtual, Online
Duration: 2020-11-16 to 2020-11-20

Publication series

Name: EMNLP 2020 - 2020 Conference on Empirical Methods in Natural Language Processing, Proceedings of the Conference

Conference

Conference: 2020 Conference on Empirical Methods in Natural Language Processing, EMNLP 2020
City: Virtual, Online
Period: 2020-11-16 to 2020-11-20

ASJC Scopus subject areas

  • Information Systems
  • Computer Science Applications
  • Computational Theory and Mathematics
