CatBoost model with synthetic features in application to loan risk assessment of small businesses

06/15/2021
by Haoxue Wang, et al.

Loan risk for small businesses has long been a complex problem worth exploring. Predicting loan risk can benefit entrepreneurship and, in turn, create more jobs for society. CatBoost (Categorical Boosting) is a powerful machine learning algorithm suited to datasets with many categorical variables, such as the data used to forecast loan risk. In this paper, we identify the important risk factors that contribute to the loan status classification problem. We then compare the performance of boosting-type algorithms (especially CatBoost) with that of other traditional yet popular algorithms. The dataset used in this research comes from the U.S. Small Business Administration (SBA) and is very large (899,164 observations and 27 features). To make the best use of the important features in the dataset, we propose a technique named "synthetic generation" that builds additional combined features through arithmetic operations, which improves the accuracy and AUC of the original CatBoost model. We obtain a high accuracy of 95.84% and an AUC of 98.80%.
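The abstract describes "synthetic generation" only as building combined features from arithmetic operations on important features. The sketch below illustrates that idea under stated assumptions: it applies pairwise addition, subtraction, multiplication, and division to a chosen subset of numeric columns. The paper's exact operations and feature selection may differ, and the function name `synthesize_features`, the column names `GrAppv`, `SBA_Appv`, and `Term`, and the toy values are hypothetical illustrations, not data or code from the paper.

```python
from itertools import combinations
from typing import Dict, List, Optional

Row = Dict[str, Optional[float]]


def synthesize_features(rows: List[Row], columns: List[str]) -> List[Row]:
    """Return copies of rows extended with pairwise arithmetic combinations.

    Assumption: "synthetic generation" is modeled here as +, -, *, / over every
    unordered pair of the selected numeric columns. Division by zero yields None.
    """
    out: List[Row] = []
    for row in rows:
        new_row = dict(row)  # keep all original features
        for a, b in combinations(columns, 2):
            x, y = row[a], row[b]
            new_row[f"{a}_plus_{b}"] = x + y
            new_row[f"{a}_minus_{b}"] = x - y
            new_row[f"{a}_times_{b}"] = x * y
            new_row[f"{a}_div_{b}"] = x / y if y != 0 else None
        out.append(new_row)
    return out


# Hypothetical toy row; column names and values are illustrative only.
rows = [{"GrAppv": 60000.0, "SBA_Appv": 48000.0, "Term": 84.0}]
expanded = synthesize_features(rows, ["GrAppv", "SBA_Appv", "Term"])
print(expanded[0]["GrAppv_div_SBA_Appv"])
```

With three selected columns there are three pairs, so each row gains twelve synthetic features. In practice the expanded rows would be passed to the classifier alongside the original columns, and only combinations that improve validation accuracy or AUC would be kept; that selection step is likewise an assumption rather than a detail given in the abstract.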
