Adversarial NLI: A New Benchmark for Natural Language Understanding

by Yixin Nie, et al.

We introduce a new large-scale NLI benchmark dataset, collected via an iterative, adversarial human-and-model-in-the-loop procedure. We show that training models on this new dataset leads to state-of-the-art performance on a variety of popular NLI benchmarks, while posing a more difficult challenge with its new test set. Our analysis sheds light on the shortcomings of current state-of-the-art models, and shows that non-expert annotators are successful at finding their weaknesses. The data collection method can be applied in a never-ending learning scenario, becoming a moving target for NLU, rather than a static benchmark that will quickly saturate.
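The iterative, adversarial human-and-model-in-the-loop procedure can be illustrated with a minimal sketch. This is not the authors' code: the "model" is a toy memorizing classifier and the annotators are simulated by a brute-force search for examples the current model gets wrong; in the real procedure, human annotators write new NLI examples that fool the model, and each round's data is added before retraining.

```python
# Illustrative sketch of a human-and-model-in-the-loop collection loop.
# All names and the toy model/annotator below are assumptions for
# demonstration only.

def train(dataset):
    """Toy 'model': memorizes (input, label) pairs seen so far, else predicts 0."""
    memory = dict(dataset)
    return lambda x: memory.get(x, 0)

def find_adversarial(model, pool, k):
    """Simulated annotators: return up to k examples the model mislabels."""
    return [(x, y) for x, y in pool if model(x) != y][:k]

def collect_rounds(pool, n_rounds=3, k=2):
    """Run n_rounds of: train on data so far, collect model-fooling examples."""
    dataset, rounds = [], []
    for _ in range(n_rounds):
        model = train(dataset)
        adversarial = find_adversarial(model, pool, k)
        if not adversarial:   # annotators can no longer fool the model
            break
        dataset.extend(adversarial)
        rounds.append(list(adversarial))
    return dataset, rounds

if __name__ == "__main__":
    # (input, gold label) pairs standing in for candidate examples.
    pool = [(1, 1), (2, 1), (3, 0), (4, 1), (5, 1)]
    data, rounds = collect_rounds(pool)
    print(len(rounds), len(data))   # prints: 2 4
```

Each round's adversarial examples become training data for the next round's model, so the benchmark is a moving target rather than a static test set.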




