UNICORN on RAINBOW: A Universal Commonsense Reasoning Model on a New Multitask Benchmark

03/24/2021
by   Nicholas Lourie, et al.

Commonsense AI has long been seen as a near-impossible goal – until recently. Now, research interest has sharply increased with an influx of new benchmarks and models. We propose two new ways to evaluate commonsense models, emphasizing their generality on new tasks and building on diverse, recently introduced benchmarks. First, we propose a new multitask benchmark, RAINBOW, to promote research on commonsense models that generalize well over multiple tasks and datasets. Second, we propose a novel evaluation, the cost equivalent curve, that offers new insight into how the choice of source datasets, pretrained language models, and transfer learning methods impacts performance and data efficiency. We perform extensive experiments – over 200 experiments encompassing 4800 models – and report multiple valuable and sometimes surprising findings, e.g., that transfer almost always leads to better or equivalent performance if following a particular recipe; that QA-based commonsense datasets transfer well with each other, while commonsense knowledge graphs do not; and that, perhaps counter-intuitively, larger models benefit more from transfer than smaller ones. Last but not least, we introduce a new universal commonsense reasoning model, UNICORN, that establishes new state-of-the-art performance across 8 popular commonsense benchmarks, including aNLI (87.3%), PIQA (90.1%), and CommonsenseQA (79.3%).
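The abstract names the cost equivalent curve without defining it. A minimal sketch of one reading follows, assuming the curve maps each baseline training-set size to the number of examples a new method needs to reach the same score. The function names, the linear interpolation, and the learning-curve numbers are all hypothetical illustrations, not the authors' implementation or results.

```python
from bisect import bisect_left


def invert_learning_curve(sizes, scores, target):
    """Estimate the training-set size at which a learning curve reaches `target`.

    Assumes `sizes` and `scores` are both strictly increasing (a simplification;
    real learning curves are noisy). Uses linear interpolation between points.
    """
    if target <= scores[0]:
        return float(sizes[0])  # clamp: already reached at the smallest size
    if target > scores[-1]:
        return None  # never reached within the measured range
    i = bisect_left(scores, target)  # first index with scores[i] >= target
    x0, x1 = sizes[i - 1], sizes[i]
    y0, y1 = scores[i - 1], scores[i]
    return x0 + (x1 - x0) * (target - y0) / (y1 - y0)


def cost_equivalent_curve(sizes, baseline_scores, new_scores):
    """Pair each baseline size with the size the new method needs to match it."""
    return [
        (n, invert_learning_curve(sizes, new_scores, score))
        for n, score in zip(sizes, baseline_scores)
    ]


# Hypothetical learning curves, for illustration only.
sizes = [1000, 2000, 4000, 8000]
baseline = [0.60, 0.65, 0.70, 0.75]  # e.g. single-task fine-tuning
transfer = [0.65, 0.70, 0.75, 0.80]  # e.g. after multitask transfer

curve = cost_equivalent_curve(sizes, baseline, transfer)
for base_n, equiv_n in curve:
    print(f"baseline with {base_n} examples ~ new method with {equiv_n} examples")
```

Points below the diagonal (equivalent size smaller than baseline size) would indicate that the new method is more data-efficient at that performance level.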

research
06/08/2022

Modularized Transfer Learning with Multiple Knowledge Graphs for Zero-shot Commonsense Reasoning

Commonsense reasoning systems should be able to generalize to diverse re...
research
10/19/2020

Deriving Commonsense Inference Tasks from Interactive Fictions

Commonsense reasoning simulates the human ability to make presumptions a...
research
04/22/2019

SocialIQA: Commonsense Reasoning about Social Interactions

We introduce SocialIQa, the first large-scale benchmark for commonsense ...
research
11/18/2020

Do Fine-tuned Commonsense Language Models Really Generalize?

Recently, transformer-based methods such as RoBERTa and GPT-3 have led t...
research
04/16/2021

Back to Square One: Bias Detection, Training and Commonsense Disentanglement in the Winograd Schema

The Winograd Schema (WS) has been proposed as a test for measuring commo...
research
09/25/2016

Commonsense reasoning, commonsense knowledge, and the SP theory of intelligence

This paper describes how the "SP theory of intelligence", outlined in an...
research
12/21/2020

Exploring and Analyzing Machine Commonsense Benchmarks

Commonsense question-answering (QA) tasks, in the form of benchmarks, ar...
