Stability of decision trees and logistic regression

03/03/2019
by Nino Arsov, et al.

Decision trees and logistic regression are among the most popular and well-known machine learning algorithms, frequently used to solve a variety of real-world problems. The stability of a learning algorithm is a powerful tool for analyzing its performance and sensitivity, and it subsequently allows researchers to draw reliable conclusions. The stability of these two algorithms, however, has remained obscure. To that end, in this paper we derive two notions of stability for decision trees and logistic regression: hypothesis stability and pointwise hypothesis stability. Additionally, we derive these notions for L2-regularized logistic regression and confirm existing findings that it is uniformly stable. We show that the stability of decision trees depends on the number of leaves in the tree, i.e., its depth, while for logistic regression it depends on the smallest eigenvalue of the Hessian matrix of the cross-entropy loss. We show that unregularized logistic regression is not a stable learning algorithm. We construct upper bounds on the generalization error of all three algorithms. Moreover, we present a novel framework for measuring the aforementioned notions of stability: the measures estimate expected loss differences at an input example and leverage bootstrap sampling to yield statistically reliable estimates. Finally, we apply this framework to the three algorithms analyzed in this paper to confirm our theoretical findings, and we discuss possibilities for developing new training techniques that optimize the stability of logistic regression and hence decrease its generalization error.
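The bootstrap-based measurement idea in the abstract can be illustrated with a minimal sketch. This is not the paper's actual code: all function names, hyperparameters, and the simple gradient-descent fitter below are assumptions made for illustration. The sketch estimates pointwise hypothesis stability as the expected absolute change in the cross-entropy loss at a training example when that example is removed, averaged over bootstrap resamples.

```python
# Minimal sketch (illustrative, not the paper's implementation) of estimating
# pointwise hypothesis stability via bootstrap sampling: refit the model with
# one training point removed and record the loss difference at that point.
import numpy as np


def log_loss(p, y, eps=1e-12):
    """Cross-entropy loss of predicted probability p for a label y in {0, 1}."""
    p = np.clip(p, eps, 1 - eps)
    return -(y * np.log(p) + (1 - y) * np.log(1 - p))


class _LogReg:
    """Thin wrapper holding the weight vector of a fitted logistic model."""

    def __init__(self, w):
        self.w = w

    def predict_proba(self, x):
        return 1.0 / (1.0 + np.exp(-x @ self.w))


def fit_logreg(X, y, lam=0.0, lr=0.5, steps=300):
    """Fit (optionally L2-regularized) logistic regression by gradient descent."""
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-X @ w))
        w -= lr * (X.T @ (p - y) / len(y) + lam * w)
    return _LogReg(w)


def pointwise_stability(fit, X, y, n_boot=20, seed=0):
    """Bootstrap estimate of pointwise hypothesis stability.

    For each bootstrap resample S, refit with one point z_i removed and
    record |loss(f_S, z_i) - loss(f_{S \ i}, z_i)|; return the average.
    """
    rng = np.random.default_rng(seed)
    n = len(y)
    diffs = []
    for _ in range(n_boot):
        idx = rng.choice(n, size=n, replace=True)  # bootstrap resample S
        Xb, yb = X[idx], y[idx]
        i = rng.integers(n)  # index of the removed point z_i
        f = fit(Xb, yb)
        f_minus = fit(np.delete(Xb, i, axis=0), np.delete(yb, i))
        zx, zy = Xb[i], yb[i]
        diffs.append(abs(log_loss(f.predict_proba(zx), zy)
                         - log_loss(f_minus.predict_proba(zx), zy)))
    return float(np.mean(diffs))
```

On synthetic data, one would expect the estimate for the L2-regularized model (`lam > 0`) to be smaller than for the unregularized one, consistent with the uniform-stability finding; this sketch makes no claim of reproducing the paper's exact measures.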


