High heritability does not imply accurate prediction under the small additive effects hypothesis

07/10/2020
by Arthur Frouin, et al.

Genome-Wide Association Studies (GWAS) explain only a small fraction of heritability for most complex human phenotypes. Genomic heritability, estimated with mixed models, measures the variance explained by all SNPs across the genome and thereby accounts for the many small contributions of individual SNPs to a phenotype. This paper approaches heritability from a machine learning perspective and examines the close link between mixed models and ridge regression. Our contribution is twofold. First, we propose estimating genomic heritability using a predictive approach via ridge regression and Generalized Cross Validation (GCV), and we show that this is consistent with classical mixed-model-based estimation. Second, we derive simple formulae that express prediction accuracy as a function of the ratio n/p, where n is the population size and p the total number of SNPs. These formulae clearly show that a high heritability does not imply an accurate prediction when p > n. Both the estimation of heritability via GCV and the prediction accuracy formulae are validated using simulated data and real data from UK Biobank.
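
The two ideas in the abstract can be illustrated with a small simulation. The sketch below is not the authors' code. It assumes independent SNPs, standardized genotypes, and a purely additive random-effects model, under which the ridge penalty and heritability are linked by lambda = p (1 - h2) / h2. That mapping follows from these assumptions and may differ from the normalization used in the paper. The function name `ridge_gcv_h2`, the grid, and all simulation settings are hypothetical choices made for illustration.

```python
import numpy as np


def ridge_gcv_h2(Z, y, h2_grid):
    """Pick h2 on a grid by minimizing the GCV criterion of ridge regression.

    Assumes the columns of Z are standardized and y is centered, and uses
    the mapping lambda = p * (1 - h2) / h2 (an assumption of this sketch).
    Returns the selected h2 and the corresponding ridge coefficients.
    """
    n, p = Z.shape
    U, d, Vt = np.linalg.svd(Z, full_matrices=False)
    Uty = U.T @ y
    best_gcv, best_h2 = np.inf, None
    for h2 in h2_grid:
        lam = p * (1.0 - h2) / h2
        shrink = d**2 / (d**2 + lam)  # nonzero eigenvalues of the hat matrix
        resid = y - U @ (shrink * Uty)
        gcv = (resid @ resid / n) / (1.0 - shrink.sum() / n) ** 2
        if gcv < best_gcv:
            best_gcv, best_h2 = gcv, h2
    lam = p * (1.0 - best_h2) / best_h2
    beta = Vt.T @ ((d / (d**2 + lam)) * Uty)
    return best_h2, beta


# Simulated data with p > n (all settings are illustrative assumptions).
rng = np.random.default_rng(0)
n_train, n_test, p, h2_true = 800, 400, 2400, 0.8

maf = rng.uniform(0.1, 0.5, size=p)  # minor allele frequencies
G = rng.binomial(2, maf, size=(n_train + n_test, p)).astype(float)

# Standardize every column with training-set statistics only.
mu, sd = G[:n_train].mean(axis=0), G[:n_train].std(axis=0)
Z = (G - mu) / sd

# Additive model: many small effects, total genetic variance close to h2_true.
u = rng.normal(0.0, np.sqrt(h2_true / p), size=p)
y = Z @ u + rng.normal(0.0, np.sqrt(1.0 - h2_true), size=n_train + n_test)

Z_train, Z_test = Z[:n_train], Z[n_train:]
y_mean = y[:n_train].mean()
y_train, y_test = y[:n_train] - y_mean, y[n_train:] - y_mean

h2_grid = np.linspace(0.01, 0.99, 99)
h2_hat, beta_hat = ridge_gcv_h2(Z_train, y_train, h2_grid)

# Out-of-sample accuracy: squared correlation between prediction and phenotype.
r2_test = np.corrcoef(Z_test @ beta_hat, y_test)[0, 1] ** 2

print(f"true h2 = {h2_true}, GCV estimate = {h2_hat:.2f}")
print(f"test-set squared correlation = {r2_test:.2f}")
```

Comparing the two printed values gives a feel for the gap between the heritability estimate and the out-of-sample squared correlation when p is several times larger than n. Real genotype data has linkage disequilibrium and other structure that this toy setup ignores.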

research
11/22/2019

Cross-trait prediction accuracy of high-dimensional ridge-type estimators in genome-wide association studies

Marginal association summary statistics have attracted great attention i...
research
03/29/2016

Locally Epistatic Models for Genome-wide Prediction and Association by Importance Sampling

In statistical genetics an important task involves building predictive m...
research
01/17/2021

Variance Estimation and Confidence Intervals from High-dimensional Genome-wide Association Studies Through Misspecified Mixed Model Analysis

We study variance estimation and associated confidence intervals for par...
research
04/25/2023

Subsample Ridge Ensembles: Equivalences and Generalized Cross-Validation

We study subsampling-based ridge ensembles in the proportional asymptoti...
research
10/30/2017

Contextual Regression: An Accurate and Conveniently Interpretable Nonlinear Model for Mining Discovery from Scientific Data

Machine learning algorithms such as linear regression, SVM and neural ne...
research
01/24/2023

Think before you shrink: Alternatives to default shrinkage methods can improve prediction accuracy, calibration and coverage

While shrinkage is essential in high-dimensional settings, its use for l...
research
01/09/2019

The Mahalanobis kernel for heritability estimation in genome-wide association studies: fixed-effects and random-effects methods

Linear mixed models (LMMs) are widely used for heritability estimation i...
