robROSE: A robust approach for dealing with imbalanced data in fraud detection

03/22/2020
by Bart Baesens, et al.

A major challenge when trying to detect fraud is that the fraudulent activities form a minority class that makes up a very small proportion of the data set. In most data sets, fraud typically occurs in less than 0.5% of the cases. Detecting fraud in such a highly imbalanced data set typically leads to predictions that favor the majority group, causing fraud to remain undetected. We discuss some popular oversampling techniques that address the problem of imbalanced data by creating synthetic samples that mimic the minority class. A frequent problem when analyzing real data is the presence of anomalies or outliers. When such atypical observations are present in the data, most oversampling techniques are prone to create synthetic samples that distort the detection algorithm and spoil the resulting analysis. A useful tool for anomaly detection is robust statistics, which aims to find the outliers by first fitting the majority of the data and then flagging the observations that deviate from it. In this paper, we present a robust version of ROSE, called robROSE, which combines several promising approaches to cope simultaneously with the problem of imbalanced data and the presence of outliers. The proposed method enhances the presence of the fraud cases while ignoring anomalies. The good performance of our new sampling technique is illustrated on simulated and real data sets, and it is shown that robROSE can provide better insight into the structure of the data. The source code of the robROSE algorithm is made freely available.
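The abstract does not spell out the robROSE algorithm, so the sketch below only illustrates the two ideas it combines, under simplifying assumptions. Outliers are flagged with a coordinatewise median/MAD rule (a stand-in for a proper multivariate robust estimator), and new minority samples are drawn by a smoothed bootstrap in the spirit of ROSE: Gaussian noise is added around randomly chosen fraud cases, but only those not flagged as outliers. The function names, the cutoff of 3 and the bandwidth factor of 0.5 are hypothetical choices, not taken from the paper or from the released robROSE code.

```python
import random
import statistics
from typing import List, Sequence, Tuple

Point = List[float]

# Makes the MAD consistent with the standard deviation under normality.
MAD_TO_SD = 1.4826


def robust_scale(values: Sequence[float]) -> Tuple[float, float]:
    """Return (median, MAD-based scale) of a single feature."""
    med = statistics.median(values)
    mad = statistics.median([abs(v - med) for v in values])
    return med, MAD_TO_SD * mad


def flag_outliers(points: Sequence[Point], cutoff: float = 3.0) -> List[bool]:
    """Flag a point if any coordinate lies more than `cutoff` robust SDs
    from that feature's median (simplified, coordinatewise rule)."""
    n_features = len(points[0])
    stats = [robust_scale([p[j] for p in points]) for j in range(n_features)]
    flags = []
    for p in points:
        z = [abs(p[j] - med) / s if s > 0 else 0.0
             for j, (med, s) in enumerate(stats)]
        flags.append(max(z) > cutoff)
    return flags


def robust_smoothed_oversample(minority: Sequence[Point], n_new: int,
                               bandwidth: float = 0.5, cutoff: float = 3.0,
                               seed=None) -> List[Point]:
    """Hypothetical ROSE-style oversampler that ignores flagged outliers."""
    rng = random.Random(seed)
    flags = flag_outliers(minority, cutoff)
    clean = [p for p, is_out in zip(minority, flags) if not is_out]
    if not clean:
        raise ValueError("all minority points were flagged as outliers")
    # Kernel width per feature, proportional to the robust scale of the clean cases.
    n_features = len(clean[0])
    scales = [robust_scale([p[j] for p in clean])[1] for j in range(n_features)]
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(clean)  # smoothed bootstrap: pick a clean fraud case...
        # ...and perturb it with Gaussian noise.
        synthetic.append([x + rng.gauss(0.0, bandwidth * s)
                          for x, s in zip(base, scales)])
    return synthetic


if __name__ == "__main__":
    fraud = [[1.0, 2.0], [1.2, 1.9], [0.9, 2.1], [1.1, 2.0], [1.0, 2.2],
             [50.0, -40.0]]  # the last case is an anomaly
    print(flag_outliers(fraud))
    print(robust_smoothed_oversample(fraud, 3, seed=0))
```

Because the anomalous case is excluded before sampling, no synthetic fraud cases are generated near it. A plain smoothed bootstrap would sample it like any other fraud case and spread synthetic points into that region.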


