Efficient Augmentation for Imbalanced Deep Learning

07/13/2022
by Damien Dablain, et al.

Deep learning models memorize training data, which hurts their ability to generalize to under-represented classes. We empirically study a convolutional neural network's internal representation of imbalanced image data and measure the generalization gap between a model's feature embeddings in the training and test sets, showing that the gap is wider for minority classes. This insight enables us to design an efficient three-phase CNN training framework for imbalanced data. The framework consists of (1) training the network end-to-end on imbalanced data to learn accurate feature embeddings, (2) performing data augmentation in the learned embedding space to balance the training distribution, and (3) fine-tuning the classifier head on the balanced, embedded training data. We propose Expansive Over-Sampling (EOS) as a data augmentation technique for use within this framework. To reduce the generalization gap, EOS forms synthetic training instances as convex combinations of minority class samples and their nearest enemies in the embedding space. The proposed framework improves accuracy over leading cost-sensitive and resampling methods commonly used in imbalanced learning. Moreover, it is more computationally efficient than standard data pre-processing methods, such as SMOTE and GAN-based oversampling, as it requires fewer parameters and less training time.
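The augmentation step described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function name `eos_oversample`, the neighbor count `k`, the uniform sampling of the mixing weight in [0, 1], the use of Euclidean distance, and the choice to give each synthetic point the minority label are all assumptions made for illustration. "Nearest enemies" is read here as the closest embeddings belonging to any other class.

```python
import numpy as np

def eos_oversample(emb, labels, k=5, seed=0):
    """Hypothetical sketch: balance classes in embedding space by mixing
    each sampled minority embedding with one of its nearest enemies."""
    rng = np.random.default_rng(seed)
    classes, counts = np.unique(labels, return_counts=True)
    target = counts.max()                      # size of the largest class
    out_x, out_y = [emb], [labels]
    for c, n in zip(classes, counts):
        deficit = int(target - n)
        if deficit == 0:
            continue                           # class is already at target size
        own = emb[labels == c]                 # embeddings of class c
        enemies = emb[labels != c]             # embeddings of all other classes
        # Squared Euclidean distance from every class-c point to every enemy.
        d = ((own[:, None, :] - enemies[None, :, :]) ** 2).sum(axis=-1)
        kk = min(k, len(enemies))
        nn_idx = np.argsort(d, axis=1)[:, :kk]  # kk nearest enemies per point
        # Sample a base point and one of its nearest enemies for each new instance.
        base = rng.integers(0, len(own), size=deficit)
        pick = nn_idx[base, rng.integers(0, kk, size=deficit)]
        lam = rng.uniform(0.0, 1.0, size=(deficit, 1))  # assumed weight range
        # Convex combination: (1 - lam) * minority + lam * enemy.
        synth = (1.0 - lam) * own[base] + lam * enemies[pick]
        out_x.append(synth)
        out_y.append(np.full(deficit, c, dtype=labels.dtype))  # assumed label
    return np.concatenate(out_x), np.concatenate(out_y)
```

In the three-phase framework, a function like this would sit in phase two: `emb` would hold the embeddings produced by the network trained in phase one, and the balanced output would be used to fine-tune only the classifier head in phase three.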

