Towards Understanding How Data Augmentation Works with Imbalanced Data

04/12/2023
by Damien A. Dablain, et al.

Data augmentation (DA) forms the cornerstone of many modern machine learning training pipelines, yet the mechanisms by which it works are not clearly understood. Much of the research on DA has focused on improving existing techniques, examining its regularization effects in the context of neural network over-fitting, or investigating its impact on features. Here, we undertake a holistic examination of the effect of DA on three classifiers that are commonly used in supervised classification of imbalanced data: convolutional neural networks, support vector machines, and logistic regression models. We support our examination with testing on three image and five tabular datasets. Our research indicates that DA, when applied to imbalanced data, produces substantial changes in model weights, support vectors, and feature selection, even though it may yield only relatively modest changes to global metrics such as balanced accuracy or F1 measure. We hypothesize that DA works by introducing variation into the data, so that machine learning models can associate changes in the data with labels. By diversifying the range of feature amplitudes that a model must recognize to predict a label, DA improves a model's capacity to generalize when learning with imbalanced data.
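To make the setting concrete, the following is a minimal sketch of one common way to augment an imbalanced tabular dataset, together with the balanced accuracy metric mentioned above. It is an illustration under stated assumptions, not the authors' method: the function names `interpolate_minority` and `balanced_accuracy` and the toy data are hypothetical, and the oversampler is a simplified SMOTE-style interpolation that pairs minority rows at random rather than by nearest neighbors.

```python
import random


def interpolate_minority(X, y, minority_label, seed=0):
    """SMOTE-style sketch: add synthetic minority rows until classes balance.

    Each synthetic row is a random point on the segment between two randomly
    chosen minority rows. (Assumption: real SMOTE restricts the partner row
    to the k nearest minority neighbors; this sketch does not.)
    """
    rng = random.Random(seed)
    minority = [row for row, label in zip(X, y) if label == minority_label]
    # Size of the largest non-minority class is the balancing target.
    n_majority = max(
        sum(1 for label in y if label == other)
        for other in set(y) if other != minority_label
    )
    X_aug, y_aug = list(X), list(y)
    for _ in range(n_majority - len(minority)):
        a, b = rng.choice(minority), rng.choice(minority)
        t = rng.random()
        X_aug.append([ai + t * (bi - ai) for ai, bi in zip(a, b)])
        y_aug.append(minority_label)
    return X_aug, y_aug


def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall, so each class counts equally."""
    recalls = []
    for c in sorted(set(y_true)):
        idx = [i for i, label in enumerate(y_true) if label == c]
        recalls.append(sum(1 for i in idx if y_pred[i] == c) / len(idx))
    return sum(recalls) / len(recalls)


# Toy imbalanced data (hypothetical): six majority rows, two minority rows.
X = [[0.0, 0.1], [0.2, 0.0], [0.1, 0.3], [0.3, 0.2], [0.2, 0.2], [0.1, 0.1],
     [1.0, 1.2], [1.4, 0.9]]
y = [0, 0, 0, 0, 0, 0, 1, 1]

X_aug, y_aug = interpolate_minority(X, y, minority_label=1)
print(y_aug.count(0), y_aug.count(1))      # 6 6

# A classifier that always predicts the majority class has 75% raw accuracy
# on the original labels but only 0.5 balanced accuracy.
print(balanced_accuracy(y, [0] * len(y)))  # 0.5
```

Interpolation adds intermediate feature values for the minority class but never leaves the range already spanned by the observed minority rows. Whether and how that interacts with the hypothesis about feature amplitudes is a question for the full paper; the sketch only shows the mechanics.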
