Sparsification and feature selection by compressive linear regression

10/21/2009
by Florin Popescu, et al.

The Minimum Description Length (MDL) principle states that the optimal model for a given data set is the one that compresses it best. Due to practical limitations the model can be restricted to a class such as linear regression models, which we address in this study. As in other formulations, such as the LASSO and forward step-wise regression, we are interested in sparsifying the feature set while preserving generalization ability. We derive a well-principled set of codes for both parameters and error residuals, along with smooth approximations to the lengths of these codes that allow gradient-descent optimization of description length. We then show that sparsification and feature selection using our approach is faster than the LASSO on several datasets from the UCI and StatLib repositories, with favorable generalization accuracy, while being fully automatic: it requires neither cross-validation nor tuning of regularization hyper-parameters, and it even allows for a nonlinear expansion of the feature set followed by sparsification.
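
The core recipe in the abstract, a smooth surrogate for total description length (code length of the parameters plus code length of the residuals) minimized by gradient-based optimization, can be sketched roughly as below. The paper's derived codes are not reproduced here: the Gaussian residual term, the log-type parameter penalty with precision eps, the final hard threshold, and the use of L-BFGS in place of plain gradient descent are all illustrative assumptions of this sketch.

```python
# Illustrative two-part MDL sketch, NOT the paper's derived codes:
# residual code length approximated by the Gaussian term 0.5*n*log(RSS/n),
# parameter code length by a smooth log-type penalty with precision `eps`.
import numpy as np
from scipy.optimize import minimize


def description_length(w, X, y, eps=0.05):
    """Smooth surrogate for total code length (in nats): residuals + parameters."""
    n = len(y)
    r = y - X @ w
    rss = r @ r + 1e-12
    residual_nats = 0.5 * n * np.log(rss / n)       # cost of encoding residuals
    param_nats = np.sum(np.log1p((w / eps) ** 2))   # cost of encoding coefficients
    return residual_nats + param_nats


def description_length_grad(w, X, y, eps=0.05):
    """Analytic gradient of the surrogate, enabling gradient-based minimization."""
    n = len(y)
    r = y - X @ w
    rss = r @ r + 1e-12
    grad_resid = -n * (X.T @ r) / rss               # d/dw of 0.5*n*log(rss/n)
    grad_param = 2.0 * w / (eps ** 2 + w ** 2)      # d/dw of the smooth penalty
    return grad_resid + grad_param


def mdl_sparse_regression(X, y, eps=0.05):
    """Minimize the description-length surrogate, then prune near-zero weights."""
    d = X.shape[1]
    result = minimize(description_length, np.zeros(d),
                      args=(X, y, eps), jac=description_length_grad,
                      method="L-BFGS-B")
    w = result.x
    w[np.abs(w) < eps] = 0.0   # weights cheaper to drop than to encode
    return w


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.standard_normal((200, 10))
    w_true = np.zeros(10)
    w_true[[0, 3, 7]] = [2.0, -1.5, 0.5]
    y = X @ w_true + 0.1 * rng.standard_normal(200)
    print("selected features:", np.nonzero(mdl_sparse_regression(X, y))[0])
```

Note that in this sketch the coding precision eps still acts as a tuning knob, whereas the abstract's claim is that the paper's derived codes make the fit-versus-sparsity trade-off fully automatic, with no cross-validation or regularization hyper-parameters.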
