NLI Data Sanity Check: Assessing the Effect of Data Corruption on Model Performance

04/10/2021
by Aarne Talman, et al.

Pre-trained neural language models achieve high performance on natural language inference (NLI) tasks, but whether they actually understand the meaning of the sequences they process remains unclear. We propose a new diagnostic test suite that allows one to assess whether a dataset constitutes a good testbed for evaluating the models' meaning understanding capabilities. Specifically, we apply controlled corruption transformations to widely used benchmarks (MNLI and ANLI), removing entire word classes and often producing nonsensical sentence pairs. If model accuracy on the corrupted data remains high, the dataset likely contains statistical biases and artefacts that guide prediction. Conversely, a large drop in model accuracy indicates that the original dataset provides a proper challenge to the models' reasoning capabilities. Our proposed controls can therefore serve as a crash test when developing high-quality data for NLI tasks.
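As a rough illustration of the kind of corruption described above, the sketch below deletes one word class from a premise-hypothesis pair using part-of-speech tags. This is a minimal sketch assuming spaCy's en_core_web_sm tagger; the function name remove_word_class and the example sentences are illustrative and not taken from the paper, whose exact transformation rules are not given in the abstract.

```python
# Minimal sketch of a word-class corruption transformation, assuming spaCy
# for POS tagging. The paper's actual transformation rules may differ.
import spacy

nlp = spacy.load("en_core_web_sm")

def remove_word_class(sentence: str, pos_tags=("VERB", "AUX")) -> str:
    """Delete every token whose universal POS tag is in pos_tags
    (here verbs and auxiliaries), often yielding a nonsensical sentence."""
    doc = nlp(sentence)
    return "".join(t.text_with_ws for t in doc if t.pos_ not in pos_tags).strip()

# Hypothetical NLI pair: corrupt both sides, then re-evaluate the model.
premise = "A man is playing a guitar on the street."
hypothesis = "Someone is performing music outdoors."
print(remove_word_class(premise))     # -> "A man a guitar on the street."
print(remove_word_class(hypothesis))  # -> "Someone music outdoors."
```

Re-evaluating a model on pairs corrupted this way, and comparing its accuracy against the uncorrupted baseline, yields the diagnostic signal the abstract describes: a small gap suggests the model is exploiting dataset artefacts rather than meaning.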


