Classification of Imbalanced Credit scoring data sets Based on Ensemble Method with the Weighted-Hybrid-Sampling

02/09/2021
by Xiaofan Liu, et al.

In the era of big data, credit-scoring models are increasingly used to assess the credit risk of applicants accurately. Conventional machine learning methods tend to classify the minority class poorly on credit scoring data sets, which can cause large commercial losses for banks. To classify imbalanced data sets, we propose a new ensemble algorithm, Weighted-Hybrid-Sampling-Boost (WHSBoost). In the data sampling step, we process the weighted imbalanced data set with the Weighted-SMOTE method and the Weighted-Under-Sampling method, and thus obtain a balanced training set with equal weights. In the ensemble step, each base classifier is trained on a balanced data set produced by this sampling procedure. To verify the applicability and robustness of WHSBoost, we performed experiments on simulated data sets, real benchmark data sets and real credit scoring data sets, comparing WHSBoost with SMOTE, SMOTEBoost and HSBoost based on SVM, BPNN, DT and KNN.
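
The sampling step described in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact procedure, so everything below is an assumption rather than the authors' method: the function name `weighted_hybrid_sample`, the parameters `n_per_class` and `k`, the use of weight-proportional draws for both under-sampling and SMOTE seed selection, and the convention that label 1 marks the minority class are all hypothetical choices made for illustration.

```python
import math
import random


def weighted_hybrid_sample(X, y, w, n_per_class, k=5, seed=None):
    """Return a balanced, equally weighted sample (hypothetical helper).

    Assumes binary labels (1 = minority, 0 = majority), strictly positive
    weights, at least two minority rows, and
    len(minority) <= n_per_class <= len(majority).
    """
    rng = random.Random(seed)
    minority = [(list(x), wi) for x, yi, wi in zip(X, y, w) if yi == 1]
    majority = [(list(x), wi) for x, yi, wi in zip(X, y, w) if yi == 0]

    # Weighted under-sampling without replacement: give each majority row a
    # random key u ** (1 / weight) and keep the rows with the largest keys,
    # so rows with larger weights are more likely to be kept.
    keyed = sorted(majority, key=lambda r: rng.random() ** (1.0 / r[1]),
                   reverse=True)
    kept_majority = [x for x, _ in keyed[:n_per_class]]

    # Weighted SMOTE-style over-sampling: pick a seed minority row with
    # probability proportional to its weight, then interpolate toward one
    # of its k nearest minority neighbours.
    min_X = [x for x, _ in minority]
    min_w = [wi for _, wi in minority]
    k_eff = min(k, len(min_X) - 1)
    synthetic = []
    for _ in range(n_per_class - len(min_X)):
        i = rng.choices(range(len(min_X)), weights=min_w)[0]
        others = sorted((j for j in range(len(min_X)) if j != i),
                        key=lambda j: math.dist(min_X[i], min_X[j]))
        j = rng.choice(others[:k_eff])
        gap = rng.random()
        synthetic.append([a + gap * (b - a)
                          for a, b in zip(min_X[i], min_X[j])])

    # The balanced set is returned with equal weights, as in the abstract.
    X_bal = kept_majority + min_X + synthetic
    y_bal = [0] * len(kept_majority) + [1] * (len(min_X) + len(synthetic))
    w_bal = [1.0 / len(X_bal)] * len(X_bal)
    return X_bal, y_bal, w_bal
```

In a boosting loop of the kind the abstract outlines, a helper like this would presumably be called once per round with the current instance weights, and the base classifier would then be fitted on the returned balanced set. How the weights are updated between rounds is not stated in the abstract and is left out here.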


