Adaptive penalization in high-dimensional regression and classification with external covariates using variational Bayes

11/07/2018
by   Britta Velten, et al.

Penalization schemes like Lasso or ridge regression are routinely used to regress a response of interest on a high-dimensional set of potential predictors. Despite being decisive, the question of the relative strength of penalization is often glossed over and only implicitly determined by the scale of individual predictors. At the same time, additional information on the predictors is available in many applications but left unused. Here, we propose to make use of such external covariates to adapt the penalization in a data-driven manner. We present a method that differentially penalizes feature groups defined by the covariates and adapts the relative strength of penalization to the information content of each group. Using techniques from the Bayesian toolset, our procedure combines shrinkage with feature selection and provides a scalable optimization scheme. We demonstrate in simulations that the method accurately recovers the true effect sizes and sparsity patterns per feature group. Furthermore, it leads to improved prediction performance in situations where the groups differ strongly in dynamic range. In applications to data from high-throughput biology, the method enables re-weighting the importance of feature groups from different assays. Overall, using available covariates extends the range of applications of penalized regression, improves model interpretability, and can improve prediction performance. We provide an open-source implementation of the method in the R package graper.
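To make the idea of differential penalization concrete, the following is a minimal sketch of ridge regression with one penalty per feature group, solved in closed form. It is not the variational Bayes procedure of the paper and does not use the graper package: the penalties here are fixed by hand rather than learned from the data, there is no sparsity component, and the function name `group_ridge`, the simulated data, and the chosen penalty values are all assumptions made for illustration.

```python
import numpy as np


def group_ridge(X, y, groups, penalties):
    """Ridge regression with a separate, fixed penalty for each feature group.

    Hypothetical helper for illustration only: minimizes
    ||y - X b||^2 + sum_j penalties[groups[j]] * b_j^2.
    """
    groups = np.asarray(groups)
    # Expand the per-group penalties to one penalty per feature.
    lam = np.array([penalties[g] for g in groups], dtype=float)
    # Closed-form solution: (X^T X + diag(lam)) b = X^T y
    return np.linalg.solve(X.T @ X + np.diag(lam), X.T @ y)


# Simulated data (assumed setup): group 0 carries signal, group 1 is pure noise.
rng = np.random.default_rng(0)
n, p = 50, 20
X = rng.normal(size=(n, p))
groups = np.repeat([0, 1], p // 2)
beta_true = np.concatenate([rng.normal(size=p // 2), np.zeros(p // 2)])
y = X @ beta_true + rng.normal(scale=0.5, size=n)

# Same penalty for both groups (ordinary ridge) versus a much stronger
# penalty on the uninformative group.
beta_uniform = group_ridge(X, y, groups, {0: 1.0, 1: 1.0})
beta_adapted = group_ridge(X, y, groups, {0: 1.0, 1: 100.0})

norm_uniform = np.linalg.norm(beta_uniform[groups == 1])
norm_adapted = np.linalg.norm(beta_adapted[groups == 1])
print(f"group 1 coefficient norm, uniform penalty: {norm_uniform:.4f}")
print(f"group 1 coefficient norm, adapted penalty: {norm_adapted:.4f}")
```

With equal penalties the sketch reduces to ordinary ridge regression; raising the penalty on one group shrinks that group's coefficients more strongly. The method described in the abstract goes further by estimating the group-wise penalization strength from the data instead of requiring it as input.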

research
05/08/2020

Flexible co-data learning for high-dimensional prediction

Clinical research often focuses on complex traits in which many variable...
research
06/03/2020

Structure Adaptive Lasso

Lasso is of fundamental importance in high-dimensional statistics and ha...
research
03/11/2022

Optimal Covariate Weighting Increases Discoveries in High-throughput Biology

The large-scale multiple testing inherent to high throughput biological ...
research
10/29/2020

Group-regularized ridge regression via empirical Bayes noise level cross-validation

Features in predictive models are not exchangeable, yet common supervise...
research
10/31/2022

Prediction of Network Covariates Using Edge and Node Attributes

In this work we consider the setting where many networks are observed on...
research
09/08/2009

Tree-guided group lasso for multi-response regression with structured sparsity, with an application to eQTL mapping

We consider the problem of estimating a sparse multi-response regression...
research
10/02/2017

Scalable Bayesian regression in high dimensions with multiple data sources

Current applications of high-dimensional regression in biomedicine often...
