NLU on Data Diets: Dynamic Data Subset Selection for NLP Classification Tasks

06/05/2023
by Jean-Michel Attendu, et al.

Finetuning large language models inflates the costs of NLU applications and remains the bottleneck of development cycles. Recent works in computer vision use data pruning to reduce training time. Pruned data selection with static methods is based on a score calculated for each training example prior to finetuning, which involves significant computational overhead. Moreover, the score may not necessarily be representative of sample importance throughout the entire training duration. We propose to address these issues with a refined version of dynamic data pruning, a curriculum which periodically scores and discards unimportant examples during finetuning. Our method leverages an EL2N metric that we extend to the joint intent and slot classification task, and an initial finetuning phase on the full training set. Our results on the GLUE benchmark and four joint NLU datasets show a better time-accuracy trade-off compared to static methods. Our method preserves full accuracy while training on 50% of the data points. One can tolerate instead a minor drop of accuracy of 1% when training on even fewer training examples, for a reduction in finetuning time reaching 66%.
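To make the method concrete, here is a minimal sketch of how EL2N scoring and the periodic pruning step could look in PyTorch. The function names (`el2n_scores`, `joint_el2n`, `select_subset`), the equal-weight averaging of intent and slot scores, the `pad_id=-100` padding convention, and the HuggingFace-style `out.logits` access are illustrative assumptions, not the paper's exact implementation.

```python
# Minimal sketch of EL2N scoring and periodic dynamic pruning (PyTorch).
# Function names, the 50/50 intent/slot averaging, the pad_id convention,
# and the `.logits` access are assumptions made for illustration.
import torch
import torch.nn.functional as F

def el2n_scores(logits: torch.Tensor, labels: torch.Tensor) -> torch.Tensor:
    """EL2N score: L2 norm of the error between the softmax output and the
    one-hot label; one score per example (or per token for slot labels)."""
    probs = F.softmax(logits, dim=-1)
    one_hot = F.one_hot(labels, num_classes=logits.size(-1)).float()
    return torch.linalg.vector_norm(probs - one_hot, dim=-1)

def joint_el2n(intent_logits, intent_labels, slot_logits, slot_labels,
               pad_id: int = -100) -> torch.Tensor:
    """Assumed extension to joint intent and slot classification: average the
    sentence-level intent EL2N with the mean EL2N over non-padded slot tokens."""
    intent_score = el2n_scores(intent_logits, intent_labels)            # (B,)
    mask = (slot_labels != pad_id).float()                              # (B, T)
    # clamp pad labels to 0 so one_hot is valid; they are masked out below
    token_scores = el2n_scores(slot_logits, slot_labels.clamp(min=0))   # (B, T)
    slot_score = (token_scores * mask).sum(-1) / mask.sum(-1).clamp(min=1.0)
    return 0.5 * (intent_score + slot_score)

@torch.no_grad()
def select_subset(model, dataloader, keep_fraction: float, device="cuda"):
    """Score every training example with the current model and return the
    indices of the highest-scoring `keep_fraction` of them. Run periodically
    during finetuning, after an initial phase on the full training set.
    The dataloader must iterate in dataset order (shuffle=False)."""
    model.eval()
    scores = []
    for batch in dataloader:
        out = model(input_ids=batch["input_ids"].to(device),
                    attention_mask=batch["attention_mask"].to(device))
        scores.append(el2n_scores(out.logits, batch["labels"].to(device)).cpu())
    scores = torch.cat(scores)
    k = max(1, int(keep_fraction * scores.numel()))
    return torch.topk(scores, k).indices  # feed to torch.utils.data.Subset
```

In the curriculum described above, `select_subset` would be called every few scoring periods; keeping the high-EL2N (hard) examples retains the informative portion of the data while easy examples are discarded until the next rescoring.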


Related research

10/28/2022
Coverage-centric Coreset Selection for High Pruning Rates
One-shot coreset selection aims to select a subset of the training data,...

05/21/2023
Infor-Coef: Information Bottleneck-based Dynamic Token Downsampling for Compact and Efficient language model
The prevalence of Transformer-based pre-trained language models (PLMs) h...

11/24/2021
Accelerating Deep Learning with Dynamic Data Pruning
Deep learning's success has been attributed to the training of large, ov...

02/17/2023
A New Baseline for GreenAI: Finding the Optimal Sub-Network via Layer and Channel Pruning
The concept of Green AI has been gaining attention within the deep learn...

11/10/2022
BERT on a Data Diet: Finding Important Examples by Gradient-Based Pruning
Current pre-trained language models rely on large datasets for achieving...

11/30/2018
Are All Training Examples Created Equal? An Empirical Study
Modern computer vision algorithms often rely on very large training data...

07/07/2022
A Study on the Predictability of Sample Learning Consistency
Curriculum Learning is a powerful training method that allows for faster...
