Efficient Penalized Generalized Linear Mixed Models for Variable Selection and Genetic Risk Prediction in High-Dimensional Data

06/24/2022
by   Julien St-Pierre, et al.

Sparse regularized regression methods are now widely used in genome-wide association studies (GWAS) to address the multiple testing burden that limits discovery of potentially important predictors. Linear mixed models (LMMs) have become an attractive alternative to principal components (PC) adjustment to account for population structure and relatedness in high-dimensional penalized models. However, their use in binary trait GWAS relies on the invalid assumption that the residual variance does not depend on the estimated regression coefficients. Moreover, LMMs use a single spectral decomposition of the covariance matrix of the responses, which is no longer possible in generalized linear mixed models (GLMMs). We introduce a new method called pglmm, a penalized GLMM that simultaneously selects genetic markers and estimates their effects, accounting for between-individual correlations and the binary nature of the trait. We develop a computationally efficient algorithm based on PQL estimation that scales regularized mixed models to high-dimensional binary trait GWAS (~300,000 SNPs). We show through simulations that, compared to pglmm, penalized LMM and logistic regression with PC adjustment fail to correctly select important predictors and/or lose prediction accuracy for a binary response when the dimensionality of the relatedness matrix is high. Further, we demonstrate through the analysis of two polygenic binary traits in the UK Biobank data that our method can achieve higher predictive performance, while also selecting fewer predictors than a sparse regularized logistic lasso with PC adjustment. Our method is available as a Julia package, PenalizedGLMM.jl.
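For orientation, the comparator named in the abstract, a logistic lasso with PC adjustment, can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the pglmm algorithm and not the PenalizedGLMM.jl interface: the function names (`soft_threshold`, `top_pcs`, `lasso_logistic_pc`), the simulated genotypes, and the tuning values are all hypothetical, and plain proximal gradient descent is swapped in for whatever solver the authors use.

```python
import numpy as np

def soft_threshold(z, t):
    """Proximal operator of t * ||.||_1 (elementwise soft-thresholding)."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def top_pcs(G, k):
    """Top-k principal component scores of the column-centered genotype matrix."""
    Gc = G - G.mean(axis=0)
    U, S, _ = np.linalg.svd(Gc, full_matrices=False)
    return U[:, :k] * S[:k]

def lasso_logistic_pc(G, y, n_pcs=2, lam=0.05, n_iter=500):
    """Logistic regression with an L1 penalty on SNP effects only.

    The intercept and the PCs are left unpenalized. Fitted by proximal
    gradient descent (ISTA) on the mean logistic loss.
    """
    n, p = G.shape
    Z = np.column_stack([np.ones(n), top_pcs(G, n_pcs)])  # unpenalized block
    X = np.hstack([Z, G])
    q = Z.shape[1]
    # The Hessian of the mean logistic loss is bounded by X'X / (4n),
    # so 4n / ||X||_2^2 is a safe constant step size.
    step = 4.0 * n / np.linalg.norm(X, 2) ** 2
    coef = np.zeros(q + p)
    for _ in range(n_iter):
        eta = np.clip(X @ coef, -30.0, 30.0)   # guard against overflow in exp
        mu = 1.0 / (1.0 + np.exp(-eta))        # fitted probabilities
        grad = X.T @ (mu - y) / n
        coef = coef - step * grad
        coef[q:] = soft_threshold(coef[q:], step * lam)  # shrink SNP effects only
    return coef[:q], coef[q:]

# Hypothetical simulated data: 200 individuals, 50 SNPs, 3 causal SNPs.
rng = np.random.default_rng(0)
n, p = 200, 50
G = rng.binomial(2, 0.3, size=(n, p)).astype(float)
beta_true = np.zeros(p)
beta_true[:3] = 1.0
eta_true = G @ beta_true
eta_true = eta_true - eta_true.mean()
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-eta_true))).astype(float)

alpha_hat, beta_hat = lasso_logistic_pc(G, y)
print(np.count_nonzero(beta_hat), "of", p, "SNP effects are nonzero")
```

Note what this baseline leaves out: it has no random effect for relatedness, so population structure enters only through the fixed PC covariates. That is the modelling gap the abstract says pglmm addresses.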


