A comparative study on machine learning models combining with outlier detection and balanced sampling methods for credit scoring

12/25/2021
by Hongyi Qian, et al.

Peer-to-peer (P2P) lending platforms have grown rapidly over the past decade as network infrastructure has improved and demand for personal lending has increased. Such platforms let users establish lending relationships directly, without traditional financial institutions as intermediaries. Assessing borrowers' credit is crucial to reducing the default rate and to the healthy development of P2P platforms. A machine learning model for personal credit scoring can effectively predict whether users will repay their loans on a P2P platform, but its performance depends on how data outliers and class imbalance are handled. Balanced sampling methods have received some study, but the effect of outlier detection methods, alone and in combination with balanced sampling, on model performance has not been fully studied. This paper investigates how different outlier detection methods and balanced sampling methods influence commonly used machine learning models. Experiments on 44,487 Lending Club samples show that proper outlier detection can improve model performance, whereas balanced sampling helps only a few models, such as MLP.
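As a rough illustration of the two preprocessing steps the abstract refers to, the sketch below removes outliers and then balances the classes before any model is trained. The abstract does not name the specific methods compared, so the interquartile-range rule and random oversampling used here are assumptions chosen for simplicity, and the function names `iqr_filter` and `random_oversample` are hypothetical.

```python
import random
import statistics


def iqr_filter(rows, labels, k=1.5):
    """Drop rows where any feature lies outside [Q1 - k*IQR, Q3 + k*IQR].

    Assumption: the IQR rule stands in for whichever outlier detectors
    the paper actually evaluates.
    """
    n_features = len(rows[0])
    bounds = []
    for j in range(n_features):
        column = [row[j] for row in rows]
        q1, _, q3 = statistics.quantiles(column, n=4)
        iqr = q3 - q1
        bounds.append((q1 - k * iqr, q3 + k * iqr))
    kept = [
        (row, y)
        for row, y in zip(rows, labels)
        if all(lo <= v <= hi for v, (lo, hi) in zip(row, bounds))
    ]
    return [row for row, _ in kept], [y for _, y in kept]


def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen minority rows until every class has equal size.

    Assumption: random oversampling stands in for the balanced sampling
    methods studied in the paper.
    """
    rng = random.Random(seed)
    by_class = {}
    for row, y in zip(rows, labels):
        by_class.setdefault(y, []).append(row)
    target = max(len(members) for members in by_class.values())
    out_rows, out_labels = [], []
    for y, members in by_class.items():
        extra = [rng.choice(members) for _ in range(target - len(members))]
        for row in members + extra:
            out_rows.append(row)
            out_labels.append(y)
    return out_rows, out_labels
```

In a pipeline of this shape, both steps would normally be applied to the training split only, so that the test set keeps its original distribution; the filtered and resampled rows would then be passed to a classifier such as an MLP.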

