How Does Data Corruption Affect Natural Language Understanding Models? A Study on GLUE datasets

01/12/2022
by Aarne Talman, et al.

A central question in natural language understanding (NLU) research is whether high performance demonstrates the models' strong reasoning capabilities. We present an extensive series of controlled experiments where pre-trained language models are exposed to data that have undergone specific corruption transformations. The transformations involve removing instances of specific word classes and often lead to nonsensical sentences. Our results show that performance remains high for most GLUE tasks when the models are fine-tuned or tested on corrupted data, suggesting that the models leverage other cues for prediction even in nonsensical contexts. Our proposed data transformations can be used as a diagnostic tool for assessing the extent to which a specific dataset constitutes a proper testbed for evaluating models' language understanding capabilities.
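
To make the kind of corruption described above concrete, the sketch below removes every token of a chosen word class from a sentence. It is a minimal illustration, not the authors' actual pipeline: it assumes spaCy's part-of-speech tagger and universal POS tags, and the remove_word_class helper and example sentence are illustrative choices of this writeup.

    # Minimal sketch of a word-class-removal corruption (illustrative only,
    # not the paper's exact transformation pipeline). Assumes spaCy with the
    # small English model: python -m spacy download en_core_web_sm
    import spacy

    nlp = spacy.load("en_core_web_sm")

    def remove_word_class(sentence: str, pos_to_drop: str) -> str:
        """Return the sentence with all tokens of the given universal POS tag removed."""
        doc = nlp(sentence)
        kept = [tok.text for tok in doc if tok.pos_ != pos_to_drop]
        return " ".join(kept)

    # Dropping all verbs usually yields a nonsensical sentence, yet a
    # fine-tuned classifier may still predict the original label.
    print(remove_word_class("She quickly closed the old wooden door.", "VERB"))
    # typically prints something like: "She quickly the old wooden door ."

Applying such a transformation to the training or evaluation split of a GLUE task, and then checking whether accuracy degrades, is the diagnostic use suggested in the abstract.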

Related research

04/10/2021
NLI Data Sanity Check: Assessing the Effect of Data Corruption on Model Performance
Pre-trained neural language models give high performance on natural lang...

03/01/2023
How Robust is GPT-3.5 to Predecessors? A Comprehensive Study on Language Understanding Tasks
The GPT-3.5 models have demonstrated impressive performance in various N...

08/30/2023
ToddlerBERTa: Exploiting BabyBERTa for Grammar Learning and Language Understanding
We present ToddlerBERTa, a BabyBERTa-like language model, exploring its ...

06/29/2023
A negation detection assessment of GPTs: analysis with the xNot360 dataset
Negation is a fundamental aspect of natural language, playing a critical...

07/16/2023
GeoGPT: Understanding and Processing Geospatial Tasks through An Autonomous GPT
Decision-makers in GIS need to combine a series of spatial algorithms an...

09/10/2021
Beyond the Tip of the Iceberg: Assessing Coherence of Text Classifiers
As large-scale, pre-trained language models achieve human-level and supe...

05/24/2022
FLUTE: Figurative Language Understanding and Textual Explanations
In spite of the prevalence of figurative language, transformer-based mod...
