BERT on a Data Diet: Finding Important Examples by Gradient-Based Pruning

11/10/2022
by Mohsen Fayyaz, et al.

Current pre-trained language models rely on large datasets to achieve state-of-the-art performance. However, past research has shown that not all examples in a dataset are equally important during training; in fact, a considerable fraction of the training set can sometimes be pruned while maintaining test performance. GraNd and its estimated variant, EL2N, are two gradient-based scoring metrics for finding important examples that were established on standard vision benchmarks. In this work, we employ these two metrics in NLP for the first time. We demonstrate that the scores need to be computed after at least one epoch of fine-tuning and are not reliable in early steps. Furthermore, we show that by pruning a small portion of the examples with the highest GraNd/EL2N scores, we can not only preserve test accuracy but also surpass it. This paper details the adjustments and implementation choices that enable GraNd and EL2N to be applied to NLP.
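As background, GraNd scores an example by the expected L2 norm of the per-example gradient of the loss with respect to the model parameters, and EL2N approximates it in output space as the L2 norm of the difference between the softmax output and the one-hot label (definitions from the original vision work by Paul et al., 2021). Below is a minimal sketch of computing EL2N scores for a BERT sequence classifier; it is an illustration under assumptions, not the paper's exact setup, and the checkpoint name, example texts, and labels are placeholders.

import torch
import torch.nn.functional as F
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Placeholder checkpoint; per the paper, scores should be computed after at
# least one epoch of fine-tuning, so a fine-tuned checkpoint would be loaded here.
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2
)
model.eval()

texts = ["a delightful film", "tedious and overlong"]  # illustrative examples
labels = torch.tensor([1, 0])

with torch.no_grad():
    batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
    logits = model(**batch).logits                      # (batch, num_labels)
    probs = F.softmax(logits, dim=-1)
    one_hot = F.one_hot(labels, num_classes=probs.size(-1)).float()
    # EL2N(x, y) = || softmax(f(x)) - y ||_2, one score per example
    el2n = torch.linalg.norm(probs - one_hot, dim=-1)

# Examples with the highest scores are candidates for pruning.
print(el2n)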


