CatBoost model with synthetic features in application to loan risk assessment of small businesses

by Haoxue Wang et al.

Loan risk for small businesses has long been a complex problem worth exploring. Predicting loan risk can benefit entrepreneurship and, in turn, create more jobs for society. CatBoost (Categorical Boosting) is a powerful machine learning algorithm well suited to datasets with many categorical variables, such as the data used to forecast loan risk. In this paper, we identify the important risk factors that contribute to the loan status classification problem. We then compare the performance of boosting-type algorithms (especially CatBoost) with other traditional yet popular ones. The dataset we adopt comes from the U.S. Small Business Administration (SBA) and has a very large sample size (899,164 observations and 27 features). To make the best use of the important features in the dataset, we propose a technique named "synthetic generation" that builds additional combined features through arithmetic operations, which improves both the accuracy and the AUC of the original CatBoost model. We obtain a high accuracy of 95.84% and an AUC of 98.80%.
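The abstract describes "synthetic generation" only as building combined features through arithmetic operations on the important features. The sketch below is one plausible reading, not the authors' implementation: it assumes the operator set is {+, -, *, /} applied to every pair of a chosen list of numeric columns, and the helper names `synthesize_features` and `augment` are hypothetical. The column names follow the style of the SBA loan data but are illustrative assumptions here.

```python
from itertools import combinations


def synthesize_features(row, numeric_cols):
    """Build new features from every pair of numeric columns.

    Assumption: the operator set {+, -, *, /} is a guess; the abstract only
    says the combined features are "based on arithmetic operation".
    """
    new = {}
    for a, b in combinations(numeric_cols, 2):
        x, y = row[a], row[b]
        new[f"{a}+{b}"] = x + y
        new[f"{a}-{b}"] = x - y
        new[f"{a}*{b}"] = x * y
        # Guard the ratio against division by zero (e.g. a loan term of 0).
        new[f"{a}/{b}"] = x / y if y != 0 else float("nan")
    return new


def augment(rows, numeric_cols):
    """Return copies of the records with their synthetic features appended."""
    return [{**row, **synthesize_features(row, numeric_cols)} for row in rows]


# Usage with two made-up loan records (illustrative values, not SBA data).
loans = [
    {"SBA_Appv": 48000.0, "GrAppv": 60000.0, "Term": 84},
    {"SBA_Appv": 32000.0, "GrAppv": 40000.0, "Term": 0},
]
augmented = augment(loans, ["SBA_Appv", "GrAppv", "Term"])
print(augmented[0]["SBA_Appv/GrAppv"])  # 0.8, the SBA-guaranteed share
```

With n input columns this yields 4 · n(n − 1)/2 new features, so restricting the input list to the most important features (as the abstract suggests) keeps the expansion manageable. The augmented table would then be passed to a CatBoost classifier, for example `catboost.CatBoostClassifier` with its `cat_features` argument naming the categorical columns.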





Gradient Boosting on Decision Trees for Mortality Prediction in Transcatheter Aortic Valve Implantation

Current prognostic risk scores in cardiac surgery are based on statistic...

Predicting class-imbalanced business risk using resampling, regularization, and model ensembling algorithms

We aim at developing and improving the imbalanced business risk modeling...

Investigating Critical Risk Factors in Liver Cancer Prediction

We exploit liver cancer prediction model using machine learning algorith...

CatBoost: gradient boosting with categorical features support

In this paper we present CatBoost, a new open-sourced gradient boosting ...

Quality Measures of Binary Classifications for Fourth-Generation Criminal Prognostic Instruments

This master's thesis discusses an important issue regarding how algorith...

Comparing Clinical Judgment with MySurgeryRisk Algorithm for Preoperative Risk Assessment: A Pilot Study

Background: Major postoperative complications are associated with increa...

Applying economic measures to lapse risk management with machine learning approaches

Modeling policyholders lapse behaviors is important to a life insurer si...