From SMOTE to Mixup for Deep Imbalanced Classification

08/29/2023
by Wei-Chao Cheng, et al.

Given imbalanced data, it is hard to train a good classifier using deep learning because of poor generalization on the minority classes. Traditionally, the well-known synthetic minority oversampling technique (SMOTE) for data augmentation, a data mining approach for imbalanced learning, has been used to improve this generalization. However, it is unclear whether SMOTE also benefits deep learning. In this work, we study why the original SMOTE is insufficient for deep learning, and enhance SMOTE using soft labels. Connecting the resulting soft SMOTE with Mixup, a modern data augmentation technique, leads to a unified framework that puts traditional and modern data augmentation techniques under the same umbrella. A careful study within this framework shows that Mixup improves generalization by implicitly achieving uneven margins between majority and minority classes. We then propose a novel margin-aware Mixup technique that more explicitly achieves uneven margins. Extensive experimental results demonstrate that our proposed technique yields state-of-the-art performance on deep imbalanced classification while achieving superior performance on extremely imbalanced data. The code is available in our open-source package https://github.com/ntucllab/imbalanced-DL to foster future research in this direction.
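To make the link between SMOTE-style interpolation and Mixup concrete, here is a minimal NumPy sketch. It is not taken from the authors' package. The function names `interpolate_pair` and `mixup_batch` are hypothetical, the hard-label branch is only a simplified stand-in for SMOTE (which also restricts the partner to a nearest minority-class neighbor), and the margin-aware variant is not shown because the abstract does not specify it.

```python
import numpy as np


def interpolate_pair(x_i, y_i, x_j, y_j, lam, soft_labels=True):
    """Blend two examples with weight lam in [0, 1].

    y_i and y_j are one-hot label vectors (an assumption of this sketch).
    soft_labels=False keeps the label of x_i, as SMOTE-style
    interpolation does; soft_labels=True blends the labels with the
    same weight as the inputs, as Mixup does.
    """
    x_new = lam * x_i + (1.0 - lam) * x_j
    if soft_labels:
        y_new = lam * y_i + (1.0 - lam) * y_j
    else:
        y_new = y_i.copy()
    return x_new, y_new


def mixup_batch(X, Y, alpha=1.0, rng=None):
    """Standard Mixup on a batch: pair each row with a random partner.

    lam is drawn from Beta(alpha, alpha); alpha is a hyperparameter
    chosen here only for illustration.
    """
    rng = np.random.default_rng() if rng is None else rng
    lam = rng.beta(alpha, alpha)
    perm = rng.permutation(len(X))
    X_mix = lam * X + (1.0 - lam) * X[perm]
    Y_mix = lam * Y + (1.0 - lam) * Y[perm]
    return X_mix, Y_mix


# Toy usage: two 2-D points from different classes.
x_a, y_a = np.array([0.0, 0.0]), np.array([1.0, 0.0])
x_b, y_b = np.array([4.0, 8.0]), np.array([0.0, 1.0])
x_soft, y_soft = interpolate_pair(x_a, y_a, x_b, y_b, lam=0.75)
x_hard, y_hard = interpolate_pair(x_a, y_a, x_b, y_b, lam=0.75,
                                  soft_labels=False)
```

In this toy case both calls produce the same interpolated input, and only the label differs: the soft-label version records that the new point is partly of the second class, while the hard-label version does not.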

research
07/13/2022

Efficient Augmentation for Imbalanced Deep Learning

Deep learning models memorize training data, which hurts their ability t...
research
05/31/2023

Building Manufacturing Deep Learning Models with Minimal and Imbalanced Training Data Using Domain Adaptation and Data Augmentation

Deep learning (DL) techniques are highly effective for defect detection ...
research
01/26/2023

Experimenting with an Evaluation Framework for Imbalanced Data Learning (EFIDL)

Introduction Data imbalance is one of the crucial issues in big data ana...
research
01/16/2021

Improve Global Glomerulosclerosis Classification with Imbalanced Data using CircleMix Augmentation

The classification of glomerular lesions is a routine and essential task...
research
04/20/2023

Is augmentation effective to improve prediction in imbalanced text datasets?

Imbalanced datasets present a significant challenge for machine learning...
research
12/15/2022

Interpretable ML for Imbalanced Data

Deep learning models are being increasingly applied to imbalanced data i...
research
12/28/2022

Escaping Saddle Points for Effective Generalization on Class-Imbalanced Data

Real-world datasets exhibit imbalances of varying types and degrees. Sev...
