Annotation Artifacts in Natural Language Inference Data

03/06/2018
by Suchin Gururangan, et al.

Large-scale datasets for natural language inference are created by presenting crowd workers with a sentence (premise), and asking them to generate three new sentences (hypotheses) that it entails, contradicts, or is logically neutral with respect to. We show that, in a significant portion of such data, this protocol leaves clues that make it possible to identify the label by looking only at the hypothesis, without observing the premise. Specifically, we show that a simple text categorization model can correctly classify the hypothesis alone in about 67% of SNLI (Bowman et al., 2015) and 53% of MultiNLI (Williams et al., 2017). Our analysis reveals that specific linguistic phenomena such as negation and vagueness are highly correlated with certain inference classes. Our findings suggest that the success of natural language inference models to date has been overestimated, and that the task remains a hard open problem.
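To make the idea of a hypothesis-only baseline concrete, here is a minimal sketch. It is not the authors' model: it swaps in a small bag-of-words Naive Bayes classifier, and the training triples are invented for illustration. The names `TRAIN`, `train_hypothesis_only`, and `predict` are hypothetical. The point it illustrates is that the premise is never read, so any accuracy above chance must come from cues in the hypothesis itself.

```python
import math
from collections import Counter, defaultdict

# Toy (premise, hypothesis, label) triples, invented for illustration only.
TRAIN = [
    ("A man plays guitar on stage.", "A person is making music.", "entailment"),
    ("A man plays guitar on stage.", "Nobody is playing an instrument.", "contradiction"),
    ("A man plays guitar on stage.", "A tall man plays his favorite song.", "neutral"),
    ("Two dogs run in a field.", "Some animals are outside.", "entailment"),
    ("Two dogs run in a field.", "The dogs are not moving.", "contradiction"),
    ("Two dogs run in a field.", "The dogs are chasing a first prize ball.", "neutral"),
]


def tokenize(text):
    """Lowercase, drop periods, split on whitespace."""
    return text.lower().replace(".", "").split()


def train_hypothesis_only(examples):
    """Count hypothesis words per label; the premise is deliberately ignored."""
    word_counts = defaultdict(Counter)
    label_counts = Counter()
    for _premise, hypothesis, label in examples:
        label_counts[label] += 1
        word_counts[label].update(tokenize(hypothesis))
    vocab = {w for counts in word_counts.values() for w in counts}
    return word_counts, label_counts, vocab


def predict(model, hypothesis):
    """Return the label with the highest add-one-smoothed Naive Bayes score."""
    word_counts, label_counts, vocab = model
    total = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label in sorted(label_counts):
        score = math.log(label_counts[label] / total)
        denom = sum(word_counts[label].values()) + len(vocab)
        for word in tokenize(hypothesis):
            if word in vocab:  # skip words never seen in training
                score += math.log((word_counts[label][word] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


model = train_hypothesis_only(TRAIN)
print(predict(model, "Nobody is not sleeping."))
```

In this toy data, negation words such as "nobody" and "not" appear only in contradiction hypotheses, so the classifier picks up on them without any premise. That mirrors, in miniature, the kind of correlation the abstract describes.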


research
06/02/2018

Stress Test Evaluation for Natural Language Inference

Natural language inference (NLI) is the task of determining if a natural...
research
06/02/2021

MedNLI Is Not Immune: Natural Language Inference Artifacts in the Clinical Domain

Crowdworker-constructed natural language inference (NLI) datasets have b...
research
12/16/2021

Automatically Identifying Semantic Bias in Crowdsourced Natural Language Inference Datasets

Natural language inference (NLI) is an important task for producing usef...
research
09/07/2022

Investigating Reasons for Disagreement in Natural Language Inference

We investigate how disagreement in natural language inference (NLI) anno...
research
04/10/2020

A New Dataset for Natural Language Inference from Code-mixed Conversations

Natural Language Inference (NLI) is the task of inferring the logical re...
research
10/13/2020

Asking Crowdworkers to Write Entailment Examples: The Best of Bad Options

Large-scale natural language inference (NLI) datasets such as SNLI or MN...
research
03/05/2020

HypoNLI: Exploring the Artificial Patterns of Hypothesis-only Bias in Natural Language Inference

Many recent studies have shown that for models trained on datasets for n...
