Data-Efficient Augmentation for Training Neural Networks

10/15/2022
by Tian Yu Liu, et al.

Data augmentation is essential to achieve state-of-the-art performance in many deep learning applications. However, the most effective augmentation techniques become computationally prohibitive for even medium-sized datasets. To address this, we propose a rigorous technique to select subsets of data points that, when augmented, closely capture the training dynamics of full data augmentation. We first show that data augmentation, modeled as additive perturbations, improves learning and generalization by relatively enlarging and perturbing the smaller singular values of the network Jacobian, while preserving its prominent directions. This prevents overfitting and enhances learning of the harder-to-learn information. Then, we propose a framework to iteratively extract small subsets of training data that, when augmented, closely capture the alignment of the fully augmented Jacobian with labels/residuals. We prove that stochastic gradient descent applied to the augmented subsets found by our approach has training dynamics similar to those of fully augmented data. Our experiments demonstrate that our method achieves 6.3x speedup on CIFAR10 and 2.2x speedup on SVHN, and outperforms the baselines by up to 10% across various subset sizes. Similarly, on TinyImageNet and ImageNet, our method beats the baselines by up to 8% across subset sizes. Finally, training on and augmenting 50% subsets on a version of CIFAR10 corrupted with label noise even outperforms using the full dataset.
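
The claim that additive perturbations change the smaller singular values relatively more than the larger ones can be illustrated with a toy NumPy sketch. This is not the authors' code: a small synthetic matrix with a chosen spectrum stands in for the network Jacobian, Gaussian noise stands in for the augmentation, and the function name `singular_value_shift` and all parameter values are assumptions made for illustration only.

```python
import numpy as np


def singular_value_shift(n_rows=50,
                         singular_values=(10.0, 5.0, 1.0, 0.1, 0.01),
                         noise_std=0.05,
                         seed=0):
    """Compare the spectrum of a toy matrix before and after additive noise.

    Hypothetical helper: the matrix is a synthetic stand-in for a network
    Jacobian, and the Gaussian noise is a stand-in for data augmentation.
    """
    rng = np.random.default_rng(seed)
    k = len(singular_values)

    # Random orthonormal factors from reduced QR decompositions.
    u, _ = np.linalg.qr(rng.standard_normal((n_rows, k)))
    v, _ = np.linalg.qr(rng.standard_normal((k, k)))

    # Toy "Jacobian" whose singular values are exactly the ones requested
    # (they must be given in descending order to match the SVD output).
    s_clean = np.asarray(singular_values, dtype=float)
    jac = u @ np.diag(s_clean) @ v.T

    # Additive perturbation and the resulting singular values (descending).
    pert = noise_std * rng.standard_normal(jac.shape)
    s_aug = np.linalg.svd(jac + pert, compute_uv=False)

    # Spectral norm of the perturbation; by Weyl's inequality it bounds
    # the absolute shift of every singular value.
    pert_norm = np.linalg.norm(pert, 2)
    return s_clean, s_aug, pert_norm


if __name__ == "__main__":
    s_clean, s_aug, pert_norm = singular_value_shift()
    relative_shift = np.abs(s_aug - s_clean) / s_clean
    for before, after, rel in zip(s_clean, s_aug, relative_shift):
        print(f"sigma {before:7.3f} -> {after:7.3f}  (relative shift {rel:.3f})")
    print(f"spectral norm of perturbation: {pert_norm:.3f}")
```

Because every singular value moves by at most the spectral norm of the perturbation, the same absolute shift is a small fraction of a large singular value but can be many times the size of a small one. This matches the intuition in the abstract; it does not reproduce the paper's analysis or its subset selection method.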


