VOS: a Method for Variational Oversampling of Imbalanced Data

09/07/2018
by   Val Andrei Fajardo, et al.

Class-imbalanced datasets are common in real-world applications ranging from credit card fraud detection to rare disease diagnostics. Several popular classification algorithms assume that classes are approximately balanced, and hence build their objective function to maximize the overall accuracy rate. On imbalanced data, optimizing overall accuracy skews predictions heavily towards the majority class. Moreover, the negative business impact of false negatives (positive samples incorrectly classified as negative) can be severe. Many methods have been proposed to address the class imbalance problem, including over-sampling, under-sampling and cost-sensitive methods. In this paper, we consider over-sampling, where the aim is to augment the original dataset with synthetically created observations of the minority classes. In particular, inspired by recent advances in generative modelling techniques (e.g., Variational Inference and Generative Adversarial Networks), we introduce a new oversampling technique based on variational autoencoders. Our experiments show that the new method is superior to traditional oversampling methods in augmenting datasets for downstream classification tasks.
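The accuracy problem and the basic idea of over-sampling described in the abstract can be illustrated with a minimal sketch. It is not the VOS method: it uses plain random duplication of minority rows as a stand-in, whereas VOS would draw the extra rows from a variational autoencoder. The function names, the 1% positive rate and the toy data are assumptions made purely for illustration.

```python
import random


def majority_baseline_metrics(labels):
    """Accuracy and minority recall of a classifier that always predicts class 0."""
    n = len(labels)
    positives = sum(labels)
    accuracy = (n - positives) / n  # every negative is classified correctly
    recall = 0.0                    # no positive is ever predicted
    return accuracy, recall


def random_oversample(rows, labels, minority=1, seed=0):
    """Duplicate minority rows at random until both classes have equal size.

    Assumes a binary problem. A generative oversampler would replace the
    rng.choice call with samples drawn from a fitted model.
    """
    rng = random.Random(seed)
    minority_rows = [r for r, y in zip(rows, labels) if y == minority]
    n_majority = sum(1 for y in labels if y != minority)
    n_extra = max(n_majority - len(minority_rows), 0)
    extra = [rng.choice(minority_rows) for _ in range(n_extra)]
    return rows + extra, labels + [minority] * len(extra)


# Toy dataset: 10 positives (minority) and 990 negatives (majority).
labels = [1] * 10 + [0] * 990
rows = [[float(i)] for i in range(1000)]

accuracy, recall = majority_baseline_metrics(labels)
print(accuracy, recall)  # 0.99 0.0

bal_rows, bal_labels = random_oversample(rows, labels)
print(len(bal_rows), sum(bal_labels))  # 1980 990
```

The trivial classifier reaches 99% accuracy while detecting no positive samples, which is why accuracy alone is a poor objective here. After over-sampling, both classes contribute 990 rows to training.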

