Estimation of variance components, heritability and the ridge penalty in high-dimensional generalized linear models

02/07/2019
by   Jurre R. Veerman, et al.

For high-dimensional linear regression models, we review and compare several estimators of the variances τ^2 and σ^2 of the random slopes and errors, respectively. These variances relate directly to the ridge regression penalty λ and the heritability index h^2, often used in genetics. Direct and indirect estimators of λ and h^2, based either on cross-validation (CV) or on maximum marginal likelihood (MML), are also discussed. The comparisons cover several types of n × p covariate matrix X with p ≫ n, such as multi-collinear covariates and data-derived ones. In addition, we study robustness against departures from the model, such as sparse instead of dense effects and non-Gaussian errors. An example on weight gain data with genomic covariates confirms the good performance of MML compared to CV. We present three extensions: first, to the high-dimensional linear mixed effects model, with REML as an alternative to MML; second, to the conjugate Bayesian setting, which proves to be a good alternative; third, and most prominently, to generalized linear models, for which we derive a computationally efficient MML estimator by rewriting the marginal likelihood as an n-dimensional integral. For Poisson and Binomial ridge regression, we demonstrate the superior accuracy of the resulting MML estimator of λ compared to CV. Software is provided to enable reproduction of all results presented here.
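To make the link between τ^2, σ^2, λ and h^2 concrete, here is a minimal Python sketch of MML estimation in the plain linear model. It is an illustration under stated assumptions, not the software provided with the paper. It assumes the Gaussian random-effects model y ~ N(0, τ^2 X X^T + σ^2 I), the usual correspondence λ = σ^2 / τ^2, and the convention h^2 = p τ^2 / (p τ^2 + σ^2) for covariates with roughly unit variance. The function name `mml_variance_components` and the simulated data are hypothetical.

```python
import numpy as np
from scipy.optimize import minimize


def mml_variance_components(X, y):
    """Maximize the Gaussian marginal likelihood of
    y ~ N(0, tau2 * X X^T + sigma2 * I) over (tau2, sigma2).

    Assumption: centered response and no fixed effects (illustrative only).
    """
    n = X.shape[0]
    # One eigendecomposition of the n x n Gram matrix; afterwards each
    # likelihood evaluation costs only O(n).
    evals, evecs = np.linalg.eigh(X @ X.T)
    evals = np.clip(evals, 0.0, None)  # guard against tiny negative values
    z2 = (evecs.T @ y) ** 2            # squared responses in the eigenbasis

    def neg_log_lik(theta):
        # Optimize on the log scale so both variances stay positive.
        tau2, sigma2 = np.exp(theta)
        v = tau2 * evals + sigma2      # eigenvalues of the marginal covariance
        return 0.5 * (np.sum(np.log(v)) + np.sum(z2 / v) + n * np.log(2 * np.pi))

    # Crude start: split the sample variance of y evenly between signal and noise.
    start = np.log([np.var(y) / (2 * max(evals.mean(), 1e-12)), np.var(y) / 2])
    res = minimize(neg_log_lik, start, method="Nelder-Mead")
    tau2, sigma2 = np.exp(res.x)
    return tau2, sigma2


# Simulated data with p >> n (all values hypothetical).
rng = np.random.default_rng(0)
n, p = 200, 1000
X = rng.standard_normal((n, p))
beta = rng.normal(0.0, np.sqrt(0.001), size=p)  # true tau2 = 0.001
y = X @ beta + rng.standard_normal(n)           # true sigma2 = 1

tau2_hat, sigma2_hat = mml_variance_components(X, y)
lambda_hat = sigma2_hat / tau2_hat                      # implied ridge penalty
h2_hat = p * tau2_hat / (p * tau2_hat + sigma2_hat)     # implied heritability
print(tau2_hat, sigma2_hat, lambda_hat, h2_hat)
```

The design choice worth noting is that X X^T is only n × n, so its eigendecomposition is cheap even when p ≫ n. A CV-based estimate of λ would instead refit the ridge model on each fold.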


