How Much Data Are Augmentations Worth? An Investigation into Scaling Laws, Invariance, and Implicit Regularization

10/12/2022
by Jonas Geiping, et al.

Despite the clear performance benefits of data augmentations, little is known about why they are so effective. In this paper, we disentangle several key mechanisms through which data augmentations operate. Establishing an exchange rate between augmented and additional real data, we find that in out-of-distribution testing scenarios, augmentations that yield diverse samples, even samples inconsistent with the data distribution, can be even more valuable than additional training data. Moreover, we find that data augmentations which encourage invariances can be more valuable than invariance alone, especially on small and medium-sized training sets. Following this observation, we show that augmentations induce additional stochasticity during training, effectively flattening the loss landscape.
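
To make the exchange-rate idea concrete, here is a minimal sketch (not the paper's code; the dataset, model, augmentations, and hyperparameters are illustrative assumptions) of how one might compare a fixed budget of augmented real images against progressively larger unaugmented sets. The exchange rate is the size ratio at which the two test accuracies match.

```python
# Sketch: compare n augmented real images vs. m > n unaugmented real images.
# Assumptions: CIFAR-10, a small CNN, standard torchvision augmentations.
import torch
import torch.nn as nn
import torchvision
import torchvision.transforms as T
from torch.utils.data import DataLoader, Subset

def make_loader(n, augment, batch_size=128):
    tfs = [T.ToTensor()]
    if augment:
        # Diverse augmentations; stronger ones may push samples off the data distribution.
        tfs = [T.RandomResizedCrop(32, scale=(0.75, 1.0)),
               T.RandomHorizontalFlip(),
               T.ColorJitter(0.4, 0.4, 0.4)] + tfs
    data = torchvision.datasets.CIFAR10("./data", train=True, download=True,
                                        transform=T.Compose(tfs))
    return DataLoader(Subset(data, range(n)), batch_size=batch_size, shuffle=True)

def small_cnn():
    return nn.Sequential(
        nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        nn.Flatten(), nn.Linear(64 * 8 * 8, 10))

def train_and_eval(loader, epochs=20,
                   device="cuda" if torch.cuda.is_available() else "cpu"):
    model = small_cnn().to(device)
    opt = torch.optim.SGD(model.parameters(), lr=0.05, momentum=0.9)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(epochs):
        for x, y in loader:
            x, y = x.to(device), y.to(device)
            opt.zero_grad()
            loss_fn(model(x), y).backward()
            opt.step()
    test = torchvision.datasets.CIFAR10("./data", train=False, download=True,
                                        transform=T.ToTensor())
    correct = 0
    model.eval()
    with torch.no_grad():
        for x, y in DataLoader(test, batch_size=256):
            correct += (model(x.to(device)).argmax(1) == y.to(device)).sum().item()
    return correct / len(test)

# Fixed augmented budget vs. a sweep of larger unaugmented sets:
acc_aug = train_and_eval(make_loader(5_000, augment=True))
for m in (5_000, 10_000, 20_000):
    acc_plain = train_and_eval(make_loader(m, augment=False))
    print(f"{m:>6} plain: {acc_plain:.3f}   vs. 5000 augmented: {acc_aug:.3f}")
```

The smallest m at which the plain-data accuracy catches up to the augmented run gives a rough exchange rate of m/5000 real images per augmented set; averaging over seeds would be needed for a stable estimate.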


Related research

04/20/2021 · Neural Networks for Learning Counterfactual G-Invariances from Single Environments
Despite – or maybe because of – their astonishing capacity to fit data, ...

03/03/2023 · Unproportional mosaicing
Data shift is a gap between data distribution used for training and data...

07/21/2023 · General regularization in covariate shift adaptation
Sample reweighting is one of the most widely used methods for correcting...

07/02/2019 · FRODO: Free rejection of out-of-distribution samples: application to chest x-ray analysis
In this work, we propose a method to reject out-of-distribution samples ...

10/13/2021 · Scaling Laws for the Few-Shot Adaptation of Pre-trained Image Classifiers
Empirical science of neural scaling laws is a rapidly growing area of si...

02/02/2019 · Learned Indexes for Dynamic Workloads
The recent proposal of learned index structures opens up a new perspecti...
