A Descriptive Study of Variable Discretization and Cost-Sensitive Logistic Regression on Imbalanced Credit Data

12/28/2018
by Lili Zhang, et al.

Training classification models on imbalanced data sets tends to bias them towards the majority class. In this paper, we demonstrate how variable discretization and Cost-Sensitive Logistic Regression help mitigate this bias on an imbalanced credit scoring data set. 10-fold cross-validation is used as the evaluation method, and performance is measured by ROC curves and the associated Area Under the Curve. The results show that good variable discretization and Cost-Sensitive Logistic Regression with the best class weight can reduce the model bias and/or variance. They also show that effective variable selection helps reduce the model variance. From the algorithm perspective, Cost-Sensitive Logistic Regression increases the predictive ability of predictors even when they are not in their best forms, and it keeps the multivariate and univariate effects of predictors consistent. From the predictor perspective, variable discretization performs slightly better than Cost-Sensitive Logistic Regression, provides more reasonable coefficient estimates for predictors that have a nonlinear relationship with the empirical logit, and is robust to the penalty weights for misclassifying events and non-events, which are determined by the class proportions.
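
The class-weighting idea behind Cost-Sensitive Logistic Regression can be illustrated with a minimal sketch. It is not the paper's code or data: the synthetic sample, the function names, the "balanced" weighting rule (weights inversely proportional to class proportions), and the learning-rate settings are all assumptions made for illustration. It fits a single predictor on the full sample and omits both the 10-fold cross-validation and the variable discretization described above.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def balanced_class_weights(y):
    # Assumed rule: weight each class by n / (2 * class count),
    # so the rarer class (events) gets the larger penalty.
    n = len(y)
    n_pos = sum(y)
    n_neg = n - n_pos
    return {0: n / (2.0 * n_neg), 1: n / (2.0 * n_pos)}


def fit_weighted_logistic(x, y, class_weight, lr=0.5, epochs=500):
    # Plain gradient descent on the class-weighted log-loss, one predictor.
    b0, b1 = 0.0, 0.0
    total_w = sum(class_weight[yi] for yi in y)
    for _ in range(epochs):
        g0 = g1 = 0.0
        for xi, yi in zip(x, y):
            err = class_weight[yi] * (sigmoid(b0 + b1 * xi) - yi)
            g0 += err
            g1 += err * xi
        b0 -= lr * g0 / total_w
        b1 -= lr * g1 / total_w
    return b0, b1


def auc(scores, y):
    # AUC as the share of (event, non-event) pairs ranked correctly;
    # ties count as half a correct pair.
    pos = [s for s, yi in zip(scores, y) if yi == 1]
    neg = [s for s, yi in zip(scores, y) if yi == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


# Synthetic imbalanced sample (hypothetical): 270 non-events, 30 events.
random.seed(0)
x = [random.gauss(0.0, 1.0) for _ in range(270)]
x += [random.gauss(1.5, 1.0) for _ in range(30)]
y = [0] * 270 + [1] * 30

weights = balanced_class_weights(y)
b0_w, b1_w = fit_weighted_logistic(x, y, weights)          # cost-sensitive fit
b0_u, b1_u = fit_weighted_logistic(x, y, {0: 1.0, 1: 1.0})  # unweighted fit

auc_w = auc([sigmoid(b0_w + b1_w * xi) for xi in x], y)
print("class weights:", weights)
print("weighted fit:", b0_w, b1_w, "unweighted fit:", b0_u, b1_u)
print("in-sample AUC (weighted):", auc_w)
```

In this sketch the heavier penalty on misclassified events mainly raises the fitted intercept relative to the unweighted fit. The in-sample AUC printed here is optimistic compared with a cross-validated estimate.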

research
05/07/2019

F-measure Maximizing Logistic Regression

Logistic regression is a widely used method in several fields. When appl...
research
06/19/2022

Primal Estimated Subgradient Solver for SVM for Imbalanced Classification

We aim to demonstrate in experiments that our cost sensitive PEGASOS SVM...
research
10/09/2020

CryptoCredit: Securely Training Fair Models

When developing models for regulated decision making, sensitive features...
research
01/02/2019

An Automatic Interaction Detection Hybrid Model for Bankcard Response Classification

In this paper, we propose a hybrid bankcard response model, which integr...
research
04/27/2022

Asymptotic Inference for Infinitely Imbalanced Logistic Regression

In this paper we extend the work of Owen (2007) by deriving a second ord...
research
01/03/2018

Modeling Interaction Effects in Logistic Regression: Information Analysis

The Akaike information criterion (AIC) is commonly used to select a logi...
research
02/18/2023

Identify local limiting factors of species distribution using min-linear logistic regression

Logistic regression is a commonly used building block in ecological mode...