SEAGLE: A Scalable Exact Algorithm for Large-Scale Set-Based GxE Tests in Biobank Data

05/07/2021
by Jocelyn T. Chi, et al.

The explosion of biobank data offers immediate opportunities for gene-environment (GxE) interaction studies of complex diseases because of the large sample sizes and the rich collection of genetic and non-genetic information. However, the extremely large sample sizes also introduce new computational challenges in GxE assessment, especially for set-based GxE variance component (VC) tests, a widely used strategy to boost overall GxE signals and to evaluate the joint GxE effect of multiple variants from a biologically meaningful unit (e.g., a gene). In this work, we focus on continuous traits and present SEAGLE, a Scalable Exact AlGorithm for Large-scale set-based GxE tests, to permit GxE VC tests for biobank-scale data. SEAGLE employs modern matrix computations to obtain the same "exact" results as the original GxE VC tests without imposing additional assumptions or relying on approximations. SEAGLE can easily accommodate sample sizes on the order of 10^5, runs on standard laptops, and does not require specialized computing equipment. We demonstrate SEAGLE's performance through extensive simulations and illustrate its utility by conducting a genome-wide gene-based GxE analysis of the Taiwan Biobank data to explore the interaction between genes and physical activity status on body mass index.


