On block-wise and reference panel-based estimators for genetic data prediction in high dimensions

03/22/2022
by Bingxin Zhao, et al.

Genetic prediction of complex traits and diseases has attracted enormous attention in precision medicine, mainly because it has the potential to translate discoveries from genome-wide association studies (GWAS) into medical advances. Because the high-dimensional covariance matrix of genetic variants (that is, their linkage disequilibrium (LD) pattern) has a block-diagonal structure, many existing methods attempt to account for the dependence among variants within predetermined local LD blocks or regions. Moreover, due to privacy restrictions and data protection concerns, the dependence among genetic variants within each LD block is typically estimated from external reference panels rather than from the original training dataset. This paper presents a unified analysis of block-wise and reference panel-based estimators in a high-dimensional prediction framework without sparsity restrictions. We find that, surprisingly, even when the covariance matrix has a block-diagonal structure with well-defined boundaries, block-wise estimation methods that adjust for local dependence can be substantially less accurate than methods that control for the whole covariance matrix. Further, estimators built on the original training dataset and those built on external reference panels are likely to perform differently in high dimensions, which may reflect the cost of having access only to summary-level data from the training dataset. This analysis is based on our novel random matrix theory results for block-diagonal covariance matrices. We evaluate our results numerically using extensive simulations and a large-scale real data analysis of 36 complex traits in the UK Biobank.
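
The four estimator variants contrasted above can be compared in a toy simulation. The abstract does not specify the paper's estimators, so this sketch substitutes a ridge-type summary-statistics estimator, beta_hat = (Sigma_hat + lam * I)^{-1} z with z = X^T y / n. It also simulates Gaussian features from a known block-diagonal covariance in place of real genotypes. All function names and settings (`block_diag_cov`, `blockwise_ridge`, `compare`, `rho`, `lam`, block sizes, sample sizes) are hypothetical illustrations and are not taken from the paper.

The "block" variants keep only the within-block entries of the estimated covariance. The "ref" variants pair training summary statistics with a covariance estimated from an independent reference sample.

```python
import numpy as np


def block_diag_cov(p, block_size, rho):
    """Hypothetical block-diagonal covariance: unit variances, common correlation rho within each block."""
    sigma = np.zeros((p, p))
    for start in range(0, p, block_size):
        stop = min(start + block_size, p)
        m = stop - start
        sigma[start:stop, start:stop] = rho * np.ones((m, m)) + (1.0 - rho) * np.eye(m)
    return sigma


def block_mask(p, block_size):
    """Boolean mask that keeps only within-block (predetermined LD block) entries."""
    mask = np.zeros((p, p), dtype=bool)
    for start in range(0, p, block_size):
        stop = min(start + block_size, p)
        mask[start:stop, start:stop] = True
    return mask


def ridge_from_summary(sigma_hat, z, lam):
    """Whole-matrix estimator (ridge stand-in): solve (sigma_hat + lam * I) beta = z."""
    return np.linalg.solve(sigma_hat + lam * np.eye(sigma_hat.shape[0]), z)


def blockwise_ridge(sigma_hat, z, lam, block_size):
    """Block-wise estimator: solve each diagonal block separately, ignoring off-block entries."""
    p = sigma_hat.shape[0]
    beta = np.empty(p)
    for start in range(0, p, block_size):
        stop = min(start + block_size, p)
        s = sigma_hat[start:stop, start:stop]
        beta[start:stop] = np.linalg.solve(s + lam * np.eye(stop - start), z[start:stop])
    return beta


def compare(n=300, n_ref=300, n_test=2000, p=200, block_size=50,
            rho=0.5, h2=0.5, lam=0.5, seed=0):
    """Squared test-set correlation for four hypothetical estimator variants."""
    rng = np.random.default_rng(seed)
    chol = np.linalg.cholesky(block_diag_cov(p, block_size, rho))

    def draw(m):
        # Rows are i.i.d. N(0, Sigma) with the block-diagonal Sigma above.
        return rng.standard_normal((m, p)) @ chol.T

    x_train, x_ref, x_test = draw(n), draw(n_ref), draw(n_test)
    beta = rng.normal(0.0, np.sqrt(h2 / p), size=p)  # dense effects, no sparsity
    noise_sd = np.sqrt(1.0 - h2)
    y_train = x_train @ beta + rng.normal(0.0, noise_sd, size=n)
    y_test = x_test @ beta + rng.normal(0.0, noise_sd, size=n_test)

    z = x_train.T @ y_train / n            # summary statistics (marginal associations)
    sigma_train = x_train.T @ x_train / n  # in-sample covariance (LD) estimate
    sigma_ref = x_ref.T @ x_ref / n_ref    # external reference-panel estimate

    estimates = {
        "full_train": ridge_from_summary(sigma_train, z, lam),
        "block_train": blockwise_ridge(sigma_train, z, lam, block_size),
        "full_ref": ridge_from_summary(sigma_ref, z, lam),
        "block_ref": blockwise_ridge(sigma_ref, z, lam, block_size),
    }
    return {name: float(np.corrcoef(x_test @ b, y_test)[0, 1] ** 2)
            for name, b in estimates.items()}


if __name__ == "__main__":
    for name, r2 in compare().items():
        print(f"{name:12s} R^2 = {r2:.3f}")
```

Solving each block separately is algebraically the same as solving the whole system after zeroing the off-block entries of Sigma_hat, because a block-diagonal matrix plus lam * I has a block-diagonal inverse. The printed accuracies depend on the chosen n, p, rho, and lam. They illustrate the setup only and are not meant to reproduce the paper's findings.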
