Predicting class-imbalanced business risk using resampling, regularization, and model ensembling algorithms

03/13/2019
by   Yan Wang, et al.

We aim to develop and improve imbalanced business risk modeling by jointly using proper evaluation criteria, resampling, cross-validation, classifier regularization, and ensembling techniques. Area Under the Receiver Operating Characteristic Curve (AUC of ROC) is used for model comparison based on 10-fold cross-validation. Two undersampling strategies, random undersampling (RUS) and cluster centroid undersampling (CCUS), and two oversampling methods, random oversampling (ROS) and Synthetic Minority Oversampling Technique (SMOTE), are applied. Three highly interpretable classifiers are implemented: logistic regression without regularization (LR), L1-regularized LR (L1LR), and decision tree (DT). Two ensembling techniques, Bagging and Boosting, are applied to the DT classifier for further model improvement. The results show that Boosting on DT using the oversampled data containing 50 [...] model and it can achieve AUC, recall, and F1 score of 0.8633, 0.9260, and 0.8907, respectively.
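To make the resampling and evaluation ideas in the abstract concrete, here is a minimal pure-Python sketch of random undersampling, SMOTE-style oversampling, and AUC computed as a pairwise ranking probability. It is an illustration under stated assumptions, not the authors' implementation: the function names, the toy data, the choice of `k`, and the convention that label 1 marks the minority class are all hypothetical. In practice a library such as imbalanced-learn would normally be used instead.

```python
import random

def random_undersample(X, y, seed=0):
    """RUS: drop majority-class rows at random until both classes are equal in size."""
    rng = random.Random(seed)
    pos = [i for i, label in enumerate(y) if label == 1]
    neg = [i for i, label in enumerate(y) if label == 0]
    minority, majority = (pos, neg) if len(pos) <= len(neg) else (neg, pos)
    keep = sorted(minority + rng.sample(majority, len(minority)))
    return [X[i] for i in keep], [y[i] for i in keep]

def smote(X, y, k=3, seed=0):
    """SMOTE-style oversampling: create synthetic minority points by interpolating
    between a minority sample and one of its k nearest minority neighbours.
    Assumes label 1 is the minority class and that it has at least two samples."""
    rng = random.Random(seed)
    pos = [x for x, label in zip(X, y) if label == 1]
    n_new = sum(1 for label in y if label == 0) - len(pos)  # balance the classes
    synthetic = []
    for _ in range(max(n_new, 0)):
        base = rng.choice(pos)
        neighbours = sorted(
            (p for p in pos if p is not base),
            key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment from base to other
        synthetic.append([a + gap * (b - a) for a, b in zip(base, other)])
    return X + synthetic, y + [1] * len(synthetic)

def auc(y_true, scores):
    """AUC of ROC: probability that a random positive is scored above a random
    negative, with ties counted as one half."""
    pos = [s for s, t in zip(scores, y_true) if t == 1]
    neg = [s for s, t in zip(scores, y_true) if t == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Toy data (hypothetical): six majority rows (label 0), two minority rows (label 1).
X = [[0.0, 0.0], [0.1, 0.2], [0.2, 0.1], [0.3, 0.3], [0.1, 0.0], [0.2, 0.3],
     [1.0, 1.0], [1.2, 0.9]]
y = [0, 0, 0, 0, 0, 0, 1, 1]

X_rus, y_rus = random_undersample(X, y)  # 2 negatives kept, 2 positives
X_sm, y_sm = smote(X, y)                 # 6 negatives, 2 real + 4 synthetic positives
print(y_rus.count(0), y_rus.count(1), y_sm.count(0), y_sm.count(1))
print(auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))
```

Both resamplers here balance the classes exactly; a configurable target ratio would be a natural extension. In a cross-validated comparison, resampling would typically be applied only to the training folds so that the AUC on each held-out fold reflects the original class distribution.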

09/09/2020

Developing and Improving Risk Models using Machine-learning Based Algorithms

The objective of this study is to develop a good risk model for classify...
01/24/2019

A XGBoost risk model via feature selection and Bayesian hyper-parameter optimization

This paper aims to explore models based on the extreme gradient boosting...
10/29/2020

Limitations of ROC on Imbalanced Data: Evaluation of LVAD Mortality Risk Scores

Objective: This study illustrates the ambiguity of ROC in evaluating two...
06/15/2021

CatBoost model with synthetic features in application to loan risk assessment of small businesses

Loan risk for small businesses has long been a complex problem worthy of...
08/18/2019

Neural Network Based Undersampling Techniques

Class imbalance problem is commonly faced while developing machine learn...
09/21/2021

Accommodating heterogeneous missing data patterns for prostate cancer risk prediction

Objective: We compared six commonly used logistic regression methods for...