Data Augmentation Revisited: Rethinking the Distribution Gap between Clean and Augmented Data

09/19/2019
by Zhuoxun He, et al.

Data augmentation has been widely applied as an effective methodology to prevent over-fitting, particularly when training very deep neural networks. Its essential benefit comes from introducing additional priors on visual invariance, thereby generating images with different appearances but the same semantics. Recently, researchers have proposed several powerful data augmentation techniques that indeed improve accuracy, yet we observe that these methods also create a considerable distribution gap between clean and augmented data. This paper revisits the problem from an analytical perspective: we estimate an upper bound on the testing loss using two terms, named the empirical risk and the generalization error, respectively. Data augmentation significantly reduces the generalization error, but meanwhile leads to a larger empirical risk, which can be alleviated by a simple algorithm, i.e., using less-augmented data to refine the model trained on fully-augmented data. We validate our approach on several popular image classification datasets, including CIFAR and ImageNet, and demonstrate consistent accuracy gains. We also conjecture that this simple strategy implies a generalized approach to circumvent local minima, which is of value to future research on model optimization.
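To make the bound concrete, here is one way to write the decomposition the abstract describes. The notation below is our own sketch, not the paper's exact formulation: the testing loss is bounded by the empirical risk on the augmented training set plus the generalization error, i.e., the gap between training and testing loss.

$$
\mathcal{L}_{\mathrm{test}}(\theta)
\;\le\;
\underbrace{\frac{1}{N}\sum_{i=1}^{N}\ell\big(f_\theta(\tilde{x}_i),\,y_i\big)}_{\text{empirical risk}}
\;+\;
\underbrace{\big|\mathcal{L}_{\mathrm{test}}(\theta)-\mathcal{L}_{\mathrm{train}}(\theta)\big|}_{\text{generalization error}}
$$

where $\tilde{x}_i$ denotes an augmented training sample. Stronger augmentation shrinks the second term, but by pushing $\tilde{x}_i$ away from the clean distribution it inflates the first; the refinement step targets that residual empirical risk.

The abstract only sketches the refinement procedure. Below is a minimal PyTorch sketch of the two-stage idea, assuming a ResNet-18 on CIFAR-10 with RandAugment as the "strong" policy; the model, augmentation policy, and epoch/learning-rate schedule are all illustrative assumptions, not the authors' exact setup.

```python
# Sketch of the "refine on less-augmented data" idea (illustrative, not the
# paper's exact recipe): train under strong augmentation, then fine-tune the
# same weights briefly on weakly augmented data.
import torch
import torch.nn as nn
import torchvision.transforms as T
from torch.utils.data import DataLoader
from torchvision.datasets import CIFAR10
from torchvision.models import resnet18

strong_aug = T.Compose([            # "fully-augmented" pipeline
    T.RandomCrop(32, padding=4),
    T.RandomHorizontalFlip(),
    T.RandAugment(),                # one strong policy; assumed, not prescribed
    T.ToTensor(),
])
weak_aug = T.Compose([              # "less-augmented" pipeline for refinement
    T.RandomHorizontalFlip(),
    T.ToTensor(),
])

def run_epochs(model, loader, epochs, lr):
    """Plain SGD training loop over the given loader."""
    opt = torch.optim.SGD(model.parameters(), lr=lr, momentum=0.9)
    loss_fn = nn.CrossEntropyLoss()
    model.train()
    for _ in range(epochs):
        for x, y in loader:
            opt.zero_grad()
            loss_fn(model(x), y).backward()
            opt.step()

model = resnet18(num_classes=10)

# Stage 1: full augmentation drives down the generalization error.
train_full = CIFAR10("data", train=True, download=True, transform=strong_aug)
run_epochs(model, DataLoader(train_full, batch_size=128, shuffle=True),
           epochs=100, lr=0.1)

# Stage 2: a short refinement pass on less-augmented data reduces the
# empirical risk on (near-)clean inputs without retraining from scratch.
train_weak = CIFAR10("data", train=True, transform=weak_aug)
run_epochs(model, DataLoader(train_weak, batch_size=128, shuffle=True),
           epochs=10, lr=0.01)
```

The design point worth noting is that stage 2 reuses the stage-1 weights at a small learning rate, so the model keeps the invariances learned under heavy augmentation while re-fitting the near-clean distribution.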


Related research:

- Sample Efficiency of Data Augmentation Consistency Regularization (02/24/2022)
- Does Data Augmentation Benefit from Split BatchNorms (10/15/2020)
- Smart Augmentation - Learning an Optimal Data Augmentation Strategy (03/24/2017)
- On the Benefits of Invariance in Neural Networks (05/01/2020)
- Data Interpolating Prediction: Alternative Interpretation of Mixup (06/20/2019)
- Regularizing Deep Networks with Semantic Data Augmentation (07/21/2020)
- Data-Efficient Augmentation for Training Neural Networks (10/15/2022)
