Cross-trait prediction accuracy of high-dimensional ridge-type estimators in genome-wide association studies

11/22/2019
by Bingxin Zhao, et al.

Marginal association summary statistics have attracted great attention in statistical genetics, mainly because the primary results of most genome-wide association studies (GWAS) are produced by marginal screening. In this paper, we study the prediction accuracy of the marginal estimator in dense (or sparsity-free) high-dimensional settings with (n,p,m) → ∞, m/n → γ ∈ (0,∞), and p/n → ω ∈ (0,∞). We consider a general correlation structure among the p features and allow an unknown subset of m of them to be signals. As the marginal estimator can be viewed as a ridge estimator with regularization parameter λ → ∞, we further investigate a class of ridge-type estimators in a unifying framework, including the popular best linear unbiased prediction (BLUP) in genetics. We find that the influence of λ on out-of-sample prediction accuracy depends heavily on ω. Although selecting an optimal λ can be important when p and n are comparable, the out-of-sample R^2 of ridge-type estimators becomes near-optimal for any λ ∈ (0,∞) as ω increases. For example, when features are independent, the out-of-sample R^2 is always bounded above by 1/ω and is largely invariant to λ given large ω (say, ω > 5). We also find that in-sample R^2 follows completely different patterns and depends much more on λ than out-of-sample R^2 does. In practice, our analysis delivers useful messages for genome-wide polygenic risk prediction and for the computation-accuracy trade-off in dense high-dimensional settings. We numerically illustrate our results in simulation studies and a real data example.
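The relationship between ridge-type estimators and the marginal estimator, and the contrast between in-sample and out-of-sample R^2, can be explored with a small simulation. The sketch below is illustrative only and is not the authors' simulation design: it assumes independent, standardized Gaussian features, m randomly placed signals with total heritability h2, and it takes R^2 to be the squared Pearson correlation between observed and predicted phenotypes. The helper names `draw`, `ridge`, `marginal`, and `r2`, and all parameter values, are hypothetical.

```python
import numpy as np

# Hypothetical setting (not the paper's design): independent, standardized
# features, m randomly placed signals, heritability h2.
rng = np.random.default_rng(2019)
n, p, m, h2 = 500, 1000, 500, 0.5   # omega = p/n = 2, gamma = m/n = 1
omega = p / n

beta = np.zeros(p)
causal = rng.choice(p, size=m, replace=False)
beta[causal] = rng.normal(0.0, np.sqrt(h2 / m), size=m)


def draw(size):
    """Draw features and a phenotype with Var(X beta) ~ h2 and Var(noise) = 1 - h2."""
    X = rng.standard_normal((size, p))
    y = X @ beta + rng.normal(0.0, np.sqrt(1.0 - h2), size=size)
    return X, y


def ridge(X, y, lam):
    """Ridge estimate (X'X + lam I_p)^{-1} X'y, computed in its equivalent
    dual form X'(XX' + lam I_n)^{-1} y so only an n x n system is solved."""
    return X.T @ np.linalg.solve(X @ X.T + lam * np.eye(X.shape[0]), y)


def marginal(X, y):
    """Marginal (per-feature) estimator X'y / n for standardized features."""
    return X.T @ y / X.shape[0]


def r2(y, yhat):
    """R^2 taken here as the squared Pearson correlation (an assumption);
    it ignores any rescaling of the predictor."""
    return np.corrcoef(y, yhat)[0, 1] ** 2


X_tr, y_tr = draw(n)      # training sample (plays the role of the GWAS)
X_te, y_te = draw(2000)   # independent testing sample

results = {}
for lam in [0.1, 1.0, 10.0, 100.0, 1e3, 1e4, 1e8]:
    b = ridge(X_tr, y_tr, lam)
    results[lam] = (r2(y_tr, X_tr @ b), r2(y_te, X_te @ b))
    print(f"lambda={lam:>8g}  in-sample R2={results[lam][0]:.3f}  "
          f"out-of-sample R2={results[lam][1]:.3f}")

b_marg = marginal(X_tr, y_tr)
marg_out = r2(y_te, X_te @ b_marg)
print(f"marginal           out-of-sample R2={marg_out:.3f}  (1/omega = {1 / omega:.3f})")

# As lam grows, lam * ridge(lam) -> X'y, so the two predictors become collinear.
agree = np.corrcoef(X_te @ ridge(X_tr, y_tr, 1e8), X_te @ b_marg)[0, 1]
print(f"corr(ridge at lambda=1e8, marginal) on test set = {agree:.6f}")
```

Because p > n here, a small λ nearly interpolates the training data, so in-sample R^2 approaches 1 while out-of-sample R^2 stays far lower; at very large λ the ridge predictor becomes a rescaled marginal predictor. Raising p relative to n (for example p = 5n) is one way to probe the abstract's claim about insensitivity to λ for large ω, although this sketch by itself does not establish it.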

research
05/03/2018

REMI: Regression with marginal information and its application in genome-wide association studies

In this study, we consider the problem of variable selection and estimat...
research
07/10/2020

High heritability does not imply accurate prediction under the small additive effects hypothesis

Genome-Wide Association Studies (GWAS) explain only a small fraction of ...
research
05/08/2018

Hierarchical inference for genome-wide association studies: a view on methodology with software

We provide a view on high-dimensional statistical inference for genome-w...
research
04/30/2018

Joint Analysis of Individual-level and Summary-level GWAS Data by Leveraging Pleiotropy

A large number of recent genome-wide association studies (GWASs) for com...
research
08/05/2017

A simple genome-wide association study algorithm

A computationally simple genome-wide association study (GWAS) algorithm ...
research
09/13/2023

Tackling the dimensions in imaging genetics with CLUB-PLS

A major challenge in imaging genetics and similar fields is to link high...
research
03/01/2018

Probability-Scale Residuals in HIV/AIDS Research: Diagnostics and Inference

The probability-scale residual (PSR) is well defined across a wide varie...