REMI: Regression with marginal information and its application in genome-wide association studies

05/03/2018
by Jian Huang, et al.

In this study, we consider the problem of variable selection and estimation in high-dimensional linear regression models when the complete data are not accessible and only certain marginal information or summary statistics are available. This problem is motivated by genome-wide association studies (GWAS), which have been widely used to identify risk variants underlying complex human traits and diseases. With a large number of completed GWAS, statistical methods that use summary statistics are becoming increasingly important because access to individual-level data sets is restricted. Methods with theoretical guarantees are in high demand to advance statistical inference from the large amount of available marginal information. Here we propose an ℓ_1-penalized approach, REMI, to estimate high-dimensional regression coefficients using marginal information and external reference samples. We establish an upper bound on the error of the REMI estimator, which has the same order as the minimax error bound of the Lasso with complete individual-level data. In particular, when marginal information is obtained from a large number of samples together with a small number of reference samples, REMI yields good estimation and prediction results and outperforms the Lasso, whose accessible individual-level sample size can be limited. Through simulation studies and real data analysis of the NFBC1966 GWAS data set, we demonstrate that REMI is widely applicable. The developed R package and the code to reproduce all the results are available at https://github.com/gordonliu810822/REMI
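Why summary statistics can stand in for individual-level data in an ℓ_1-penalized regression can be seen by expanding the Lasso loss: (1/2n)‖y − Xβ‖² = ½ βᵀ(XᵀX/n)β − βᵀ(Xᵀy/n) + const. The objective depends on the data only through the marginal statistics z = Xᵀy/n and the matrix XᵀX/n, and the latter can be approximated by R = X_refᵀX_ref/n_ref from a reference sample. The sketch below solves min_β ½ βᵀRβ − zᵀβ + λ‖β‖₁ by coordinate descent. It is a minimal illustration under stated assumptions (standardized genotypes, a reference sample drawn from the same population as the GWAS sample), not the algorithm or API of the REMI R package; the function names `soft_threshold` and `lasso_from_summary` are hypothetical.

```python
import numpy as np


def soft_threshold(x, t):
    """Soft-thresholding operator S(x, t) = sign(x) * max(|x| - t, 0)."""
    return np.sign(x) * max(abs(x) - t, 0.0)


def lasso_from_summary(z, R, lam, n_iter=500, tol=1e-8):
    """Coordinate descent for  min_beta 0.5*beta'R beta - z'beta + lam*||beta||_1.

    Assumptions (illustrative, not taken from the paper's software):
      z   : marginal statistics X^T y / n from the large GWAS sample,
            with standardized genotype columns.
      R   : X_ref^T X_ref / n_ref estimated from a small reference sample;
            must be positive semidefinite with a positive diagonal.
      lam : l1 penalty level.
    """
    p = len(z)
    beta = np.zeros(p)
    for _ in range(n_iter):
        max_change = 0.0
        for j in range(p):
            # Partial "residual correlation" for coordinate j, i.e. the
            # gradient term with beta_j's own contribution removed.
            rho = z[j] - R[j] @ beta + R[j, j] * beta[j]
            new = soft_threshold(rho, lam) / R[j, j]
            max_change = max(max_change, abs(new - beta[j]))
            beta[j] = new
        if max_change < tol:  # stop once a full sweep barely moves beta
            break
    return beta


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    n, n_ref, p = 5000, 300, 50          # hypothetical simulation sizes
    beta_true = np.zeros(p)
    beta_true[:3] = [0.5, -0.4, 0.3]

    # Large GWAS sample: only its marginal statistics are "released".
    X = rng.standard_normal((n, p))
    y = X @ beta_true + rng.standard_normal(n)
    z = X.T @ y / n

    # Small, independent reference sample from the same population.
    X_ref = rng.standard_normal((n_ref, p))
    R = X_ref.T @ X_ref / n_ref

    beta_hat = lasso_from_summary(z, R, lam=0.05)
    print(np.round(beta_hat[:5], 3))
```

When R is the identity (uncorrelated, standardized variants), the solution reduces to soft-thresholding each marginal statistic. With a small reference sample, R is only a noisy estimate of XᵀX/n; summary-statistics methods often shrink it toward the identity for stability, which is a common heuristic and not necessarily what REMI does.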

research
04/30/2018

Joint Analysis of Individual-level and Summary-level GWAS Data by Leveraging Pleiotropy

A large number of recent genome-wide association studies (GWASs) for com...
research
05/08/2018

Hierarchical inference for genome-wide association studies: a view on methodology with software

We provide a view on high-dimensional statistical inference for genome-w...
research
11/22/2019

Cross-trait prediction accuracy of high-dimensional ridge-type estimators in genome-wide association studies

Marginal association summary statistics have attracted great attention i...
research
10/01/2022

Paradoxes and resolutions for semiparametric fusion of individual and summary data

Suppose we have available individual data from an internal study and var...
research
02/01/2020

Higher Criticism Tuned Regression For Weak And Sparse Signals

Here we propose a novel searching scheme for a tuning parameter in high-...
research
11/27/2019

Class-Conditional VAE-GAN for Local-Ancestry Simulation

Local ancestry inference (LAI) allows identification of the ancestry of ...
research
08/04/2016

Iterative Hard Thresholding for Model Selection in Genome-Wide Association Studies

A genome-wide association study (GWAS) correlates marker variation with ...