Cluster Stability Selection

01/03/2022
by Gregory Faletto, et al.

Stability selection (Meinshausen and Bühlmann, 2010) makes any feature selection method more stable by returning only those features that are consistently selected across many subsamples. We prove (in what is, to our knowledge, the first result of its kind) that for data containing highly correlated proxies for an important latent variable, the lasso typically selects one proxy, yet stability selection with the lasso can fail to select any proxy, leading to worse predictive performance than the lasso alone. We introduce cluster stability selection, which exploits the practitioner's knowledge that highly correlated clusters exist in the data, resulting in better feature rankings than stability selection in this setting. We consider several feature-combination approaches, including taking a weighted average of the features in each important cluster, where the weights are determined by the frequency with which cluster members are selected; we show that this leads to better predictive models than previous proposals. We present generalizations of theoretical guarantees from Meinshausen and Bühlmann (2010) and Shah and Samworth (2012) to show that cluster stability selection retains the same guarantees. In summary, cluster stability selection enjoys the best of both worlds, yielding a sparse selected set that is stable and has good predictive performance.
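The abstract's idea can be illustrated with a small sketch. It is not the authors' code, and several details are assumptions chosen for illustration. It uses only NumPy and a hand-written coordinate-descent lasso. Each of B subsamples uses half the rows. A cluster is assumed to count as selected on a subsample if any of its members is selected. Each stable cluster is replaced by an average of its members, weighted by the members' individual selection proportions. The helper names (`lasso_cd`, `subsample_selections`, `cluster_stability`, `weighted_cluster_features`) and the values of `lam`, `B` and `threshold` are hypothetical.

```python
import numpy as np


def lasso_cd(X, y, lam, n_iter=200):
    """Lasso via cyclic coordinate descent (illustrative, unoptimized).

    Minimizes (1 / (2n)) * ||y - X b||^2 + lam * ||b||_1 on centered data.
    """
    X = X - X.mean(axis=0)
    y = y - y.mean()
    n, p = X.shape
    beta = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    resid = y.copy()  # residual y - X @ beta, starting from beta = 0
    for _ in range(n_iter):
        for j in range(p):
            # Correlation of feature j with the partial residual that excludes feature j.
            rho = X[:, j] @ resid / n + col_sq[j] * beta[j]
            new = np.sign(rho) * max(abs(rho) - lam, 0.0) / col_sq[j]  # soft-threshold
            resid += X[:, j] * (beta[j] - new)
            beta[j] = new
    return beta


def subsample_selections(X, y, lam, B=50, seed=0):
    """Boolean (B, p) matrix: which features the lasso selects on each half-subsample."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    sel = np.zeros((B, p), dtype=bool)
    for b in range(B):
        idx = rng.choice(n, size=n // 2, replace=False)
        sel[b] = np.abs(lasso_cd(X[idx], y[idx], lam)) > 1e-8
    return sel


def cluster_stability(sel, clusters):
    """Per-feature and per-cluster selection proportions.

    Assumption: a cluster counts as selected on a subsample if any member is selected.
    """
    feat_prop = sel.mean(axis=0)
    clus_prop = np.array([sel[:, c].any(axis=1).mean() for c in clusters])
    return feat_prop, clus_prop


def weighted_cluster_features(X, clusters, feat_prop, clus_prop, threshold=0.6):
    """Replace each stable cluster by a selection-proportion-weighted average of its members."""
    reps, weights = [], []
    for c, cp in zip(clusters, clus_prop):
        if cp < threshold:
            continue
        w = feat_prop[c] / feat_prop[c].sum()  # sum > 0: some member was selected
        reps.append(X[:, c] @ w)
        weights.append(w)
    Z = np.column_stack(reps) if reps else np.empty((X.shape[0], 0))
    return Z, weights


# Usage on simulated data: three noisy proxies for one latent variable, plus 7 noise features.
rng = np.random.default_rng(1)
n = 200
z = rng.standard_normal(n)
proxies = z[:, None] + 0.1 * rng.standard_normal((n, 3))
X = np.hstack([proxies, rng.standard_normal((n, 7))])
y = 2.0 * z + 0.5 * rng.standard_normal(n)
clusters = [[0, 1, 2]] + [[j] for j in range(3, 10)]

sel = subsample_selections(X, y, lam=0.1, B=30)
feat_prop, clus_prop = cluster_stability(sel, clusters)
Z, w = weighted_cluster_features(X, clusters, feat_prop, clus_prop)
```

Because a cluster is treated as selected whenever any member is, the cluster proportion is never smaller than the largest individual member proportion. This is the property that lets the proxy cluster look stable even when the lasso spreads its selections across interchangeable proxies.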
