Solar: a least-angle regression for accurate and stable variable selection in high-dimensional data

by Ning Xu, et al.

We propose a new least-angle regression algorithm for variable selection in high-dimensional data, called subsample-ordered least-angle regression (solar). Solar relies on the average L_0 solution path computed across subsamples and largely alleviates several known high-dimensional issues with least-angle regression. Using examples based on directed acyclic graphs, we illustrate the advantages of solar in comparison to least-angle regression, forward regression and variable screening. Simulations demonstrate that, with a similar computation load, solar yields substantial improvements over two lasso solvers (least-angle regression for lasso and coordinate-descent) in terms of the sparsity (37-64% reduction in the average number of selected variables), stability and accuracy of variable selection. Simulations also demonstrate that solar enhances the robustness of variable selection to different settings of the irrepresentable condition and to variations in the dependence structures assumed in regression analysis. We provide a Python package solarpy for the algorithm.
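The idea of averaging a solution path across subsamples can be illustrated with a small sketch. It is not the solarpy API and not the authors' implementation. It assumes a greedy forward step (pick the variable most correlated with the current residual) as a stand-in for least-angle regression, and it assumes a simple score, (p - step) / p, to record how early each variable enters on each subsample. The function names `entry_order` and `average_entry_scores` and the parameters `n_subsamples` and `frac` are hypothetical.

```python
import random

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def center(v):
    m = sum(v) / len(v)
    return [x - m for x in v]

def entry_order(cols, y):
    """Order in which variables enter a greedy forward path.

    Stand-in for least-angle regression (an assumption of this sketch):
    at each step, pick the remaining column most correlated with the
    residual, then project it out of the residual and of the other columns.
    """
    resid = list(y)
    remaining = {j: list(c) for j, c in enumerate(cols)}
    order = []
    while remaining:
        def score(j):
            c = remaining[j]
            norm = dot(c, c) ** 0.5
            return abs(dot(c, resid)) / norm if norm > 1e-12 else 0.0
        best = max(remaining, key=score)
        q = remaining.pop(best)
        order.append(best)
        qq = dot(q, q)
        if qq > 1e-12:
            b = dot(q, resid) / qq
            resid = [r - b * v for r, v in zip(resid, q)]
            for j in remaining:
                g = dot(q, remaining[j]) / qq
                remaining[j] = [c - g * v for c, v in zip(remaining[j], q)]
    return order

def average_entry_scores(X, y, n_subsamples=20, frac=0.8, seed=0):
    """Average an entry-order score over random subsamples of the rows.

    The score (p - step) / p is an assumed scoring rule: a variable that
    enters first on every subsample gets an average of 1.0.
    """
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    m = max(2, int(frac * n))
    totals = [0.0] * p
    for _ in range(n_subsamples):
        idx = rng.sample(range(n), m)
        ys = center([y[i] for i in idx])
        cols = [center([X[i][j] for i in idx]) for j in range(p)]
        for step, j in enumerate(entry_order(cols, ys)):
            totals[j] += (p - step) / p
    return [t / n_subsamples for t in totals]

# Toy data: only variables 0 and 1 carry signal; the rest are noise.
rng = random.Random(1)
n, p = 100, 6
X = [[rng.gauss(0, 1) for _ in range(p)] for _ in range(n)]
y = [3 * row[0] - 2 * row[1] + rng.gauss(0, 0.1) for row in X]

scores = average_entry_scores(X, y)
ranked = sorted(range(p), key=lambda j: -scores[j])
print(ranked, [round(s, 2) for s in scores])
```

Variables that enter early on most subsamples end up with high averaged scores, so a selection rule could keep only the variables above some threshold. How that threshold is chosen is left out of this sketch.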





Accuracy and stability of solar variable selection comparison under complicated dependence structures

In this paper we focus on the variable-selection performance of solar on ...

Least angle and ℓ_1 penalized regression: A review

Least Angle Regression is a promising technique for variable selection a...

A critical review of LASSO and its derivatives for variable selection under dependence among covariates

We study the limitations of the well known LASSO regression as a variabl...

ENNS: Variable Selection, Regression, Classification and Deep Neural Network for High-Dimensional Data

High-dimensional, low sample-size (HDLSS) data problems have been a topi...

Mixed Effect Modeling and Variable Selection for Quantile Regression

It is known that the estimating equations for quantile regression (QR) c...

High-dimensional regression in practice: an empirical study of finite-sample prediction, variable selection and ranking

Penalized likelihood methods are widely used for high-dimensional regres...

When Does the First Spurious Variable Get Selected by Sequential Regression Procedures?

Applied statisticians use sequential regression procedures to produce a ...

Code Repositories


Subsample-ordered least-angle regression, an algorithm that performs quick, sparse, stable and accurate variable selection even under complicated dependence structures, harsh irrepresentable conditions and high multicollinearity.
