Data Augmentation as Feature Manipulation: a story of desert cows and grass cows

03/03/2022
by Ruoqi Shen, et al.

Data augmentation is a cornerstone of the machine learning pipeline, yet its theoretical underpinnings remain unclear. Is it merely a way to artificially enlarge the data set? Or is it about encouraging the model to satisfy certain invariances? In this work we consider another angle and study the effect of data augmentation on the dynamics of the learning process. We find that data augmentation can alter the relative importance of various features, effectively making certain informative but hard-to-learn features more likely to be captured during learning. Importantly, we show that this effect is more pronounced for non-linear models, such as neural networks. Our main contribution is a detailed analysis of the effect of data augmentation on the learning dynamics of a two-layer convolutional neural network in the recently proposed multi-view model of Allen-Zhu and Li [2020]. We complement this analysis with further experimental evidence that data augmentation can be viewed as a form of feature manipulation.

