ANLIzing the Adversarial Natural Language Inference Dataset

by Adina Williams, et al.

We perform an in-depth error analysis of Adversarial NLI (ANLI), a recently introduced large-scale human-and-model-in-the-loop natural language inference dataset collected over multiple rounds. We propose a fine-grained annotation scheme for the different aspects of inference responsible for the gold classification labels, and use it to hand-code all three of the ANLI development sets. We use these annotations to answer a variety of interesting questions: which inference types are most common, which models have the highest performance on each reasoning type, and which types are the most challenging for state-of-the-art models? We hope that our annotations will enable more fine-grained evaluation of models trained on ANLI, provide us with a deeper understanding of where models fail and succeed, and help us determine how to train better models in the future.
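The per-type evaluation the abstract describes can be sketched as follows: given dev-set examples hand-coded with inference-type tags, group model predictions by tag and compute accuracy per reasoning type. This is a minimal illustrative sketch, not the authors' code; the field names (`types`, `gold`, `pred`) and the type labels are assumptions.

```python
# Hypothetical sketch: compute per-inference-type accuracy from hand-coded
# annotations of ANLI dev examples. Field names and labels are assumed.
from collections import defaultdict

def per_type_accuracy(examples):
    """examples: list of dicts with 'types' (annotated inference-type tags),
    'gold' (gold label), and 'pred' (model prediction)."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for ex in examples:
        for t in ex["types"]:  # one example may carry multiple type tags
            total[t] += 1
            correct[t] += int(ex["pred"] == ex["gold"])
    return {t: correct[t] / total[t] for t in total}

examples = [
    {"types": ["Numerical"], "gold": "contradiction", "pred": "contradiction"},
    {"types": ["Numerical", "Reference"], "gold": "entailment", "pred": "neutral"},
    {"types": ["Reference"], "gold": "neutral", "pred": "neutral"},
]
print(per_type_accuracy(examples))  # accuracy per annotated type
```

Because a single example can carry several type tags, per-type counts sum to more than the number of examples; each tag is scored independently.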
