Prioritized Training on Points that are Learnable, Worth Learning, and Not Yet Learnt

06/14/2022
by Sören Mindermann, et al.

Training on web-scale data can take months. But most computation and time is wasted on redundant and noisy points that are already learnt or not learnable. To accelerate training, we introduce Reducible Holdout Loss Selection (RHO-LOSS), a simple but principled technique which selects approximately those points for training that most reduce the model's generalization loss. As a result, RHO-LOSS mitigates the weaknesses of existing data selection methods: techniques from the optimization literature typically select 'hard' (e.g. high loss) points, but such points are often noisy (not learnable) or less task-relevant. Conversely, curriculum learning prioritizes 'easy' points, but such points need not be trained on once learned. In contrast, RHO-LOSS selects points that are learnable, worth learning, and not yet learnt. RHO-LOSS trains in far fewer steps than prior art, improves accuracy, and speeds up training on a wide range of datasets, hyperparameters, and architectures (MLPs, CNNs, and BERT). On the large web-scraped image dataset Clothing-1M, RHO-LOSS trains in 18x fewer steps and reaches 2% higher final accuracy than uniform data shuffling.
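The abstract does not spell out the scoring rule, so the following is only a minimal sketch of how such a selection step might look. It assumes that each candidate point is scored by its current training loss minus the loss that a separate model, trained on holdout data, assigns to the same point, and that the top-scoring points in a batch are kept for the gradient step. The function names, the per-point loss lists, and the example values are hypothetical and chosen for illustration.

```python
from typing import List, Sequence


def reducible_losses(train_losses: Sequence[float],
                     holdout_model_losses: Sequence[float]) -> List[float]:
    """Score each point as (current training loss) - (holdout-model loss).

    Assumption: the holdout-model loss stands in for the part of the loss
    that cannot be reduced by further training (e.g. because of label noise).
    """
    if len(train_losses) != len(holdout_model_losses):
        raise ValueError("loss sequences must have the same length")
    return [t - h for t, h in zip(train_losses, holdout_model_losses)]


def select_top_k(train_losses: Sequence[float],
                 holdout_model_losses: Sequence[float],
                 k: int) -> List[int]:
    """Return the indices of the k points with the highest score."""
    scores = reducible_losses(train_losses, holdout_model_losses)
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return ranked[:k]


# Hypothetical per-point losses for a candidate batch of four points.
#   index 0: high loss under both models  -> likely noisy, not learnable
#   index 1: low loss under both models   -> already learnt
#   index 2: high training loss, low holdout-model loss -> not yet learnt
#   index 3: moderate gap between the two losses
train = [3.0, 0.25, 2.5, 1.0]
holdout = [2.75, 0.25, 0.5, 0.5]

selected = select_top_k(train, holdout, k=2)
```

Under these assumptions the noisy point (index 0) and the already-learnt point (index 1) both receive low scores, while the point with a large gap between the two losses (index 2) ranks first. Selecting by training loss alone would instead rank the noisy point highest.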
