Loss-guided Stability Selection

02/10/2022
by Tino Werner, et al.

In modern data analysis, sparse model selection becomes inevitable when the number of predictor variables is very large. It is well known that model selection procedures like the Lasso or Boosting tend to overfit on real data. The celebrated Stability Selection overcomes these weaknesses by aggregating models fitted on subsamples of the training data and then choosing a stable predictor set, which is usually much sparser than the predictor sets of the raw models. The standard Stability Selection is based on a global criterion, namely the per-family error rate, and additionally requires expert knowledge to configure its hyperparameters suitably. Since model selection depends on the loss function, i.e., predictor sets selected w.r.t. one loss function differ from those selected w.r.t. another, we propose a Stability Selection variant that respects the chosen loss function via an additional validation step on out-of-sample validation data, optionally enhanced with an exhaustive search strategy. Our Stability Selection variants are widely applicable and user-friendly. Moreover, they can avoid the severe underfitting that affects the original Stability Selection on noisy high-dimensional data; our priority is therefore not to avoid false positives at all costs but to obtain a sparse, stable model with which one can make predictions. Experiments covering both regression and binary classification, with Boosting as the model selection algorithm, reveal a significant precision improvement over the raw Boosting models while not suffering from any of the mentioned issues of the original Stability Selection.
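As a rough illustration of the loss-guided idea (a minimal sketch, not the authors' implementation), the Python snippet below uses the Lasso as a stand-in for the Boosting selector: selection frequencies are computed over subsamples of the training data, and the stability threshold is then chosen to minimize the out-of-sample loss on held-out validation data rather than via a per-family error rate. The data, the choice of Lasso, and all parameter values are illustrative assumptions.

```python
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(0)

# Toy sparse regression data (hypothetical setup, not from the paper).
n, p, s = 200, 100, 5
X = rng.standard_normal((n, p))
beta = np.zeros(p)
beta[:s] = 2.0
y = X @ beta + 0.5 * rng.standard_normal(n)

X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.3, random_state=0)

# Step 1: selection frequencies from models fitted on B subsamples
#         (the Lasso stands in for the Boosting selector used in the paper).
B, frac = 100, 0.5
n_tr = X_train.shape[0]
m = int(frac * n_tr)
freq = np.zeros(p)
for _ in range(B):
    idx = rng.choice(n_tr, size=m, replace=False)
    fit = Lasso(alpha=0.1).fit(X_train[idx], y_train[idx])
    freq += (fit.coef_ != 0).astype(float)
freq /= B

# Step 2: loss-guided choice of the stability threshold. Instead of a fixed
#         per-family-error-rate criterion, pick the threshold whose stable
#         predictor set gives the smallest out-of-sample loss on validation data.
best_loss, best_thr, best_set = np.inf, None, None
for thr in np.arange(0.5, 1.0, 0.05):
    stable = np.where(freq >= thr)[0]
    if stable.size == 0:
        continue
    refit = Lasso(alpha=0.01).fit(X_train[:, stable], y_train)
    loss = mean_squared_error(y_val, refit.predict(X_val[:, stable]))
    if loss < best_loss:
        best_loss, best_thr, best_set = loss, thr, stable

print(f"chosen threshold: {best_thr:.2f}, validation MSE: {best_loss:.3f}")
print("stable predictors:", best_set)
```

The loop over candidate thresholds is where an exhaustive search over stable predictor sets could be plugged in, and the squared-error loss would be replaced by whatever loss function the final model is meant to optimize.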

