A Data-Augmentation Is Worth A Thousand Samples: Exact Quantification From Analytical Augmented Sample Moments

02/16/2022
by Randall Balestriero, et al.

Data augmentation (DA) is known to improve performance across tasks and datasets. We propose a method to theoretically analyze the effect of DA and to study questions such as: how many augmented samples are needed to correctly estimate the information encoded by a given DA? How does the augmentation policy impact the final parameters of a model? We derive several quantities in closed form, such as the expectation and variance of an image, a loss, and a model's output under a given DA distribution. These derivations open new avenues to quantify the benefits and limitations of DA. For example, we show that common DAs require tens of thousands of samples for the loss at hand to be correctly estimated and for model training to converge. We also show that for a training loss to be stable under DA sampling, the model's saliency map (the gradient of the loss with respect to the model's input) must align with the eigenvector associated with the smallest eigenvalue of the sample covariance under the considered DA, hinting at a possible explanation for why models tend to shift their focus from edges to textures.
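The two quantitative claims above (many samples are needed to estimate DA moments, and loss stability depends on how the saliency aligns with the augmented covariance) can be illustrated on a toy case. The sketch below is not the authors' derivation; it assumes a 1-D signal, an augmentation that applies a circular shift drawn uniformly from all n shifts (a finite distribution, so enumerating every shift gives the exact moments), and a first-order Taylor (delta-method) approximation Var[L(x_aug)] ≈ gᵀΣg, where g is the saliency and Σ the augmented covariance. All function names are hypothetical.

```python
import numpy as np


def shift_augmentations(x: np.ndarray, shifts: np.ndarray) -> np.ndarray:
    """Return one circularly shifted copy of the 1-D signal x per entry of `shifts`.

    Row j equals np.roll(x, shifts[j]) (hypothetical helper, toy augmentation).
    """
    n = x.shape[0]
    idx = (np.arange(n)[None, :] - shifts[:, None]) % n
    return x[idx]


def exact_shift_moments(x: np.ndarray):
    """Exact mean and covariance of the augmented sample when the shift is uniform
    over all n circular shifts; enumerating the finite distribution is exact."""
    n = x.shape[0]
    samples = shift_augmentations(x, np.arange(n))
    mean = samples.mean(axis=0)
    cov = np.cov(samples, rowvar=False, bias=True)  # population covariance (divide by n)
    return mean, cov


def monte_carlo_mean(x: np.ndarray, k: int, rng: np.random.Generator) -> np.ndarray:
    """Estimate the augmented mean from k independently drawn shifts."""
    shifts = rng.integers(0, x.shape[0], size=k)
    return shift_augmentations(x, shifts).mean(axis=0)


def first_order_loss_variance(grad: np.ndarray, cov: np.ndarray) -> float:
    """Delta-method approximation Var[L(x_aug)] ~ grad^T cov grad, with grad the
    saliency dL/dx (assumed evaluated at the augmented mean)."""
    return float(grad @ cov @ grad)


rng = np.random.default_rng(0)
x = rng.normal(size=16)

mean, cov = exact_shift_moments(x)
# For uniform circular shifts, every coordinate of the exact mean equals x.mean().
print("max |exact mean - x.mean()|:", np.abs(mean - x.mean()).max())

# The Monte Carlo error shrinks roughly like 1/sqrt(k) for i.i.d. draws.
for k in (10, 100, 1_000, 10_000):
    err = np.abs(monte_carlo_mean(x, k, rng) - mean).max()
    print(f"k={k:>6}: max abs error of Monte Carlo mean = {err:.4f}")

# Among unit-norm saliency directions, grad^T cov grad is smallest for the
# eigenvector of the smallest eigenvalue (Rayleigh quotient); eigh sorts ascending.
evals, evecs = np.linalg.eigh(cov)
g_min, g_max = evecs[:, 0], evecs[:, -1]
print("loss variance, smallest-eigenvalue direction:", first_order_loss_variance(g_min, cov))
print("loss variance, largest-eigenvalue direction: ", first_order_loss_variance(g_max, cov))
```

In this toy setting every shifted copy has the same total intensity, so the constant direction lies in the null space of Σ: a saliency that only responds to the mean intensity yields (to first order) a loss that does not vary under the augmentation, while a saliency along the top eigenvector maximizes that variance.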
