Improving fraud prediction with incremental data balancing technique for massive data streams

02/28/2019
by Rafiq Ahmed Mohammed, et al.

The performance of classification algorithms on a massive and highly imbalanced data stream depends on an efficient balancing strategy. Several balancing techniques have been applied in the past to batch data to resolve the class imbalance problem. This paper proposes a new incremental data balancing framework that can work with massive imbalanced data streams. We choose a Racing Algorithm as an automated data balancing technique that optimizes over the candidate balancing techniques. We applied the Random Forest classification algorithm, which can handle massive data streams, and investigated the suitability of the Racing Algorithm and Random Forest within the proposed framework. Applying the new technique in the proposed framework to the European Credit Card dataset gave better results than batch mode. The proposed framework is also more scalable for handling online massive data streams.
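The abstract does not spell out the exact procedure, so the following is only a minimal sketch of the two ideas it combines, under stated assumptions: the stream arrives as labeled chunks, label 1 marks fraud, and a racing procedure chooses among candidate balancing strategies. The racing method shown is a simplified Hoeffding race, one standard racing scheme; the paper's Racing Algorithm may differ. All names (`random_undersample`, `random_oversample`, `balanced_chunks`, `hoeffding_race`, `evaluate`) are hypothetical.

```python
import math
import random
from typing import Callable, Iterable, Iterator, List, Sequence, Tuple

# Hypothetical sample encoding: (feature vector, label), label 1 = fraud.
Sample = Tuple[List[float], int]


def random_undersample(chunk: Sequence[Sample], rng: random.Random) -> List[Sample]:
    """Drop majority-class (label 0) samples until both classes are equal in size."""
    pos = [s for s in chunk if s[1] == 1]
    neg = [s for s in chunk if s[1] == 0]
    if not pos or len(neg) <= len(pos):
        return list(chunk)
    return pos + rng.sample(neg, len(pos))


def random_oversample(chunk: Sequence[Sample], rng: random.Random) -> List[Sample]:
    """Duplicate random minority-class (label 1) samples until classes are equal."""
    pos = [s for s in chunk if s[1] == 1]
    neg = [s for s in chunk if s[1] == 0]
    if not pos or len(pos) >= len(neg):
        return list(chunk)
    extra = [rng.choice(pos) for _ in range(len(neg) - len(pos))]
    return neg + pos + extra


def balanced_chunks(
    stream: Iterable[Sequence[Sample]],
    strategy: Callable[[Sequence[Sample], random.Random], List[Sample]],
    rng: random.Random,
) -> Iterator[List[Sample]]:
    """Balance each chunk independently as it arrives (incremental, not batch)."""
    for chunk in stream:
        yield strategy(chunk, rng)


def hoeffding_race(
    candidates: Sequence[str],
    evaluate: Callable[[str, int], float],
    max_rounds: int,
    delta: float = 0.05,
) -> str:
    """Simplified Hoeffding race over candidates scored in [0, 1].

    Each round, every surviving candidate is evaluated once; a candidate is
    dropped when its upper confidence bound falls below the best lower bound.
    """
    alive = list(candidates)
    totals = {c: 0.0 for c in alive}
    n = 0
    for r in range(max_rounds):
        n += 1
        for c in alive:
            totals[c] += evaluate(c, r)
        # Hoeffding half-width for the mean of n values in [0, 1].
        eps = math.sqrt(math.log(2.0 / delta) / (2.0 * n))
        means = {c: totals[c] / n for c in alive}
        best_lower = max(means.values()) - eps
        alive = [c for c in alive if means[c] + eps >= best_lower]
        if len(alive) == 1:
            break
    # All survivors have n evaluations, so comparing totals compares means.
    return max(alive, key=lambda c: totals[c])
```

In a real pipeline, `evaluate(c, r)` would, for example, train a classifier such as a Random Forest on chunk `r` balanced with strategy `c` and return a [0, 1] score (recall, AUC, or similar) on the following chunk; that wiring is omitted here to keep the sketch dependency-free. The race spends evaluations only on strategies that are still statistically competitive, which is what makes it attractive when each evaluation means training on a large chunk.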

Related research

The Random Forest Classifier in WEKA: Discussion and New Developments for Imbalanced Data (12/19/2018)
Data analysis and machine learning have become an integrative part of th...

Balanced Random Forest Classifier in WEKA (12/19/2018)
Data analysis and machine learning have become an integrative part of th...

Credit Card Fraud Detection Using Enhanced Random Forest Classifier for Imbalanced Data (03/11/2023)
The credit card has become the most popular payment method for both onli...

Credit risk prediction in an imbalanced social lending environment (04/28/2018)
Credit risk prediction is an effective way of evaluating whether a poten...

Evaluating the Utility of GAN Generated Synthetic Tabular Data for Class Balancing and Low Resource Settings (06/24/2023)
The present study aimed to address the issue of imbalanced data in class...

Gamma distribution-based sampling for imbalanced data (09/22/2020)
Imbalanced class distribution is a common problem in a number of fields ...

A Framework for Fast Polarity Labelling of Massive Data Streams (03/23/2022)
Many of the existing sentiment analysis techniques are based on supervis...
