Fast Bayesian Feature Selection for High Dimensional Linear Regression in Genomics via the Ising Approximation

07/30/2014
by   Charles K. Fisher, et al.

Feature selection, identifying a subset of variables that are relevant for predicting a response, is an important and challenging component of many methods in statistics and machine learning. Feature selection is especially difficult and computationally intensive when the number of variables approaches or exceeds the number of samples, as is often the case for many genomic datasets. Here, we introduce a new approach -- the Bayesian Ising Approximation (BIA) -- to rapidly calculate posterior probabilities for feature relevance in L2 penalized linear regression. In the regime where the regression problem is strongly regularized by the prior, we show that computing the marginal posterior probabilities for features is equivalent to computing the magnetizations of an Ising model. Using a mean field approximation, we show it is possible to rapidly compute the feature selection path described by the posterior probabilities as a function of the L2 penalty. We present simulations and analytical results illustrating the accuracy of the BIA on some simple regression problems. Finally, we demonstrate the applicability of the BIA to high dimensional regression by analyzing a gene expression dataset with nearly 30,000 features.
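The core computation described above can be sketched as follows. In the BIA, each feature gets a spin s_i ∈ {−1, +1} indicating irrelevance/relevance, and the marginal posterior probability of relevance is read off from the mean-field magnetization via p_i = (1 + m_i)/2, where the m_i solve the self-consistency equations m_i = tanh(h_i + Σ_j J_ij m_j). The paper derives the external fields h_i and couplings J_ij from the data correlations and the L2 penalty; the field and coupling values below are illustrative placeholders, not the paper's expressions.

```python
import numpy as np

def mean_field_magnetizations(h, J, n_iter=500, tol=1e-8):
    """Iterate the mean-field equations m_i = tanh(h_i + sum_j J_ij m_j)
    to a fixed point. h: external fields, J: symmetric couplings with
    zero diagonal."""
    m = np.zeros_like(h)
    for _ in range(n_iter):
        m_new = np.tanh(h + J @ m)
        if np.max(np.abs(m_new - m)) < tol:
            return m_new
        m = m_new
    return m

# Toy example with illustrative (not derived) fields and couplings:
# two positively-biased features that compete through a negative
# coupling, and one feature biased toward irrelevance.
h = np.array([0.8, 0.8, -0.5])
J = np.array([[ 0.0, -0.2, 0.0],
              [-0.2,  0.0, 0.0],
              [ 0.0,  0.0, 0.0]])
m = mean_field_magnetizations(h, J)

# Posterior probability that feature i is relevant.
p = (1.0 + m) / 2.0
```

Sweeping the L2 penalty rescales h and J, so re-solving these equations along a grid of penalty values traces out the feature selection path described in the abstract, at the cost of a few matrix-vector products per grid point rather than an exhaustive search over 2^p feature subsets.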

