Iterative Hard Thresholding for Model Selection in Genome-Wide Association Studies

08/04/2016
by Kevin L. Keys, et al.

A genome-wide association study (GWAS) correlates marker variation with trait variation in a sample of individuals. Each study subject is genotyped at a multitude of single nucleotide polymorphisms (SNPs) spanning the genome. Here we assume that subjects are unrelated and collected at random, and that trait values are normally distributed or transformed to normality. Over the past decade, researchers have been remarkably successful in applying GWAS analysis to hundreds of traits. The massive amount of data produced in these studies presents unique computational challenges. Penalized regression with LASSO or MCP penalties can select a handful of associated SNPs from millions of candidates. Unfortunately, model selection can be corrupted by false positives and false negatives, obscuring the genetic underpinnings of a trait. This paper introduces the iterative hard thresholding (IHT) algorithm to the GWAS analysis of continuous traits. Our parallel implementation of IHT accommodates SNP genotype compression and exploits multiple CPU cores and graphics processing units (GPUs). This allows statistical geneticists to perform GWAS analysis on commodity desktop computers rather than supercomputers. We evaluate IHT performance on both simulated and real GWAS data and conclude that it reduces false positive and false negative rates while remaining competitive in computational time with penalized regression. Source code is freely available at https://github.com/klkeys/IHT.jl.
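The abstract does not spell out the IHT update, so the following is only a minimal sketch of the textbook iteration for sparse least squares, not the IHT.jl implementation (which, per the abstract, also handles compressed genotypes and CPU/GPU parallelism). Each step takes a gradient step on the squared error and then keeps only the k coefficients of largest magnitude, so at most k SNPs are ever in the model. The function names `hard_threshold` and `iht`, the fixed step size 1/‖X‖₂², and the toy design matrix are all illustrative assumptions.

```python
"""Minimal iterative hard thresholding (IHT) sketch for sparse least squares.

Hypothetical illustration only; not the IHT.jl code described in the paper.
"""
from typing import List


def hard_threshold(beta: List[float], k: int) -> List[float]:
    """Keep the k entries of largest magnitude and set the rest to zero."""
    keep = sorted(range(len(beta)), key=lambda j: abs(beta[j]), reverse=True)[:k]
    out = [0.0] * len(beta)
    for j in keep:
        out[j] = beta[j]
    return out


def iht(X: List[List[float]], y: List[float], k: int, step: float,
        iters: int = 200) -> List[float]:
    """Run IHT with a fixed step size (an assumption; other step rules exist)."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    for _ in range(iters):
        # Residual r = y - X beta.
        r = [y[i] - sum(X[i][j] * beta[j] for j in range(p)) for i in range(n)]
        # Negative gradient of 0.5 * ||y - X beta||^2 is X^T r.
        grad = [sum(X[i][j] * r[i] for i in range(n)) for j in range(p)]
        # Gradient step followed by projection onto k-sparse vectors.
        beta = hard_threshold([beta[j] + step * grad[j] for j in range(p)], k)
    return beta


# Toy example (hypothetical data): 4 subjects, 3 "SNP" columns with
# orthogonal columns, so X^T X = 4 I and ||X||_2^2 = 4.
X = [[1.0, 1.0, 1.0],
     [1.0, -1.0, 1.0],
     [1.0, 1.0, -1.0],
     [1.0, -1.0, -1.0]]
true_beta = [2.0, 0.0, -3.0]
y = [sum(X[i][j] * true_beta[j] for j in range(3)) for i in range(4)]

beta_hat = iht(X, y, k=2, step=1.0 / 4.0)
print(beta_hat)
```

With this noise-free, orthogonal toy design the sketch recovers the two nonzero coefficients exactly; real genotype matrices are correlated, which is where the choice of k and of the step size matters.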
