COM2SENSE: A Commonsense Reasoning Benchmark with Complementary Sentences

06/02/2021
by   Shikhar Singh, et al.

Commonsense reasoning is intuitive for humans but has been a long-term challenge for artificial intelligence (AI). Recent advancements in pretrained language models have shown promising results on several commonsense benchmark datasets. However, the reliability and comprehensiveness of these benchmarks for assessing a model's commonsense reasoning ability remain unclear. To this end, we introduce a new commonsense reasoning benchmark dataset comprising natural language true/false statements, with each sample paired with its complementary counterpart, resulting in 4k sentence pairs. We propose a pairwise accuracy metric to reliably measure an agent's ability to perform commonsense reasoning over a given situation. The dataset is crowdsourced and enhanced with an adversarial model-in-the-loop setup to incentivize challenging samples. To facilitate a systematic analysis of commonsense capabilities, we design our dataset along the dimensions of knowledge domains, reasoning scenarios and numeracy. Experimental results demonstrate that our strongest baseline (UnifiedQA-3B), after fine-tuning, achieves approximately 71% standard accuracy and approximately 51% pairwise accuracy. The dataset is available at https://github.com/PlusLabNLP/Com2Sense.
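
To make the pairwise accuracy metric mentioned in the abstract concrete, the following is a minimal sketch, not the authors' evaluation code. It assumes that a pair counts as correct only when both the statement and its complementary counterpart are labeled correctly, and it contrasts this with per-statement (standard) accuracy. The function names, the `Pair` type, and the toy data are hypothetical and chosen only for illustration.

```python
from typing import List, Tuple

# Each item is a (statement_label, complement_label) pair of true/false values.
# This layout is an assumption for illustration, not the dataset's file format.
Pair = Tuple[bool, bool]


def standard_accuracy(preds: List[Pair], golds: List[Pair]) -> float:
    """Fraction of individual statements labeled correctly."""
    if not golds or len(preds) != len(golds):
        raise ValueError("preds and golds must be non-empty and equal in length")
    correct = sum((p[0] == g[0]) + (p[1] == g[1]) for p, g in zip(preds, golds))
    return correct / (2 * len(golds))


def pairwise_accuracy(preds: List[Pair], golds: List[Pair]) -> float:
    """Fraction of pairs in which BOTH statements are labeled correctly."""
    if not golds or len(preds) != len(golds):
        raise ValueError("preds and golds must be non-empty and equal in length")
    correct = sum(p == g for p, g in zip(preds, golds))
    return correct / len(golds)


# Toy example: four complementary pairs, two of them fully correct.
golds = [(True, False), (False, True), (True, False), (False, True)]
preds = [(True, False), (True, True), (True, True), (False, True)]

print(standard_accuracy(preds, golds))
print(pairwise_accuracy(preds, golds))
```

Under this reading, a model that answers "true" regardless of content can get half of the statements right while scoring zero on pairs, which is why the pairwise score can never exceed the standard one.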

research
12/16/2021

Commonsense Knowledge-Augmented Pretrained Language Models for Causal Reasoning Classification

Commonsense knowledge can be leveraged for identifying causal relations ...
research
05/22/2022

Commonsense Knowledge Salience Evaluation with a Benchmark Dataset in E-commerce

In e-commerce, the salience of commonsense knowledge (CSK) is beneficial...
research
05/22/2022

Housekeep: Tidying Virtual Households using Commonsense Reasoning

We introduce Housekeep, a benchmark to evaluate commonsense reasoning in...
research
09/03/2021

CREAK: A Dataset for Commonsense Reasoning over Entity Knowledge

Most benchmark datasets targeting commonsense reasoning focus on everyda...
research
05/24/2023

Editing Commonsense Knowledge in GPT

Memory editing methods for updating encyclopedic knowledge in transforme...
research
12/15/2021

KGR^4: Retrieval, Retrospect, Refine and Rethink for Commonsense Generation

Generative commonsense reasoning requires machines to generate sentences...
research
10/08/2020

Precise Task Formalization Matters in Winograd Schema Evaluations

Performance on the Winograd Schema Challenge (WSC), a respected English ...
