SEAGLE: A Scalable Exact Algorithm for Large-Scale Set-Based GxE Tests in Biobank Data

by Jocelyn T. Chi, et al.

The explosion of biobank data offers immediate opportunities for gene-environment (GxE) interaction studies of complex diseases because of the large sample sizes and the rich collection of genetic and non-genetic information. However, the extremely large sample size also introduces new computational challenges in GxE assessment, especially for set-based GxE variance component (VC) tests, which are a widely used strategy to boost overall GxE signals and to evaluate the joint GxE effect of multiple variants from a biologically meaningful unit (e.g., a gene). In this work, we focus on continuous traits and present SEAGLE, a Scalable Exact AlGorithm for Large-scale set-based GxE tests, to permit GxE VC tests for biobank-scale data. SEAGLE employs modern matrix computations to achieve the same "exact" results as the original GxE VC tests without imposing additional assumptions or relying on approximations. SEAGLE can easily accommodate sample sizes on the order of 10^5, is implementable on standard laptops, and does not require specialized computing equipment. We demonstrate SEAGLE's performance through extensive simulations. We illustrate its utility by conducting a genome-wide gene-based GxE analysis on the Taiwan Biobank data to explore the interaction between genes and physical activity status on body mass index.






