The xyz algorithm for fast interaction search in high-dimensional data

10/17/2016
by Gian-Andrea Thanei, et al.

When performing regression on a dataset with p variables, it is often of interest to go beyond linear main effects and include interactions, that is, products of pairs of individual variables. For small-scale problems these interactions can be computed explicitly, but doing so naively costs at least O(p^2) operations, which can be prohibitive when p is very large. We introduce a new randomised algorithm that discovers interactions with high probability and, under mild conditions, has a runtime that is subquadratic in p. We show that strong interactions can be discovered in almost linear time, whilst finding weaker interactions requires O(p^α) operations, with 1 < α < 2 depending on their strength. The underlying idea is to transform interaction search into a closest-pair problem, which can be solved efficiently in subquadratic time. The algorithm, called xyz, is implemented in R. We demonstrate its efficiency in genome-wide association studies, where more than 10^11 interactions can be screened in under 280 seconds on a single-core 1.2 GHz CPU.
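The reduction to a closest-pair problem can be illustrated with a small Python sketch. It rests on assumptions that this abstract does not spell out: features and response are coded as +1/-1, and the strength of the interaction between variables j and k is taken to be the fraction of observations with y_i = x_ij * x_ik. Because x_ij^2 = 1, that equality holds exactly when y_i * x_ij = x_ik, so with Z = diag(y) X a strong interaction means column j of Z is close in Hamming distance to column k of X. Close columns are found here by bucketing columns on their values over a random subset of rows and checking only colliding pairs exactly. The function names (`strength`, `naive_search`, `xyz_sketch`), the threshold and the hash-table bucketing are hypothetical choices for illustration; this is not the authors' R implementation and does not reproduce the paper's runtime analysis.

```python
import random
from collections import defaultdict
from itertools import combinations


def strength(X, y, j, k):
    """Fraction of observations i with y[i] == X[i][j] * X[i][k]; entries are +1/-1."""
    return sum(y[i] == X[i][j] * X[i][k] for i in range(len(y))) / len(y)


def naive_search(X, y, threshold):
    """Exhaustive O(n p^2) baseline: score every pair j < k."""
    p = len(X[0])
    return {(j, k) for j, k in combinations(range(p), 2)
            if strength(X, y, j, k) >= threshold}


def xyz_sketch(X, y, subset_size, n_runs, threshold, seed=0):
    """Hypothetical closest-pair sketch (illustrative, not the authors' R code).

    Z = diag(y) X, so Z[i][j] == X[i][k] exactly when y[i] == X[i][j] * X[i][k].
    A strong interaction (j, k) makes column j of Z and column k of X agree on
    most rows, so they tend to collide when restricted to a random row subset.
    Only colliding pairs are scored exactly.
    """
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    Z = [[y[i] * X[i][j] for j in range(p)] for i in range(n)]
    candidates = set()
    for _ in range(n_runs):
        rows = rng.sample(range(n), subset_size)
        # Bucket the columns of X by their values on the sampled rows.
        buckets = defaultdict(list)
        for k in range(p):
            buckets[tuple(X[i][k] for i in rows)].append(k)
        # Probe with each column of Z; an identical key marks a candidate pair.
        for j in range(p):
            for k in buckets.get(tuple(Z[i][j] for i in rows), []):
                if j != k:  # j == k collides whenever y is +1 on every sampled row
                    candidates.add((min(j, k), max(j, k)))
    # Exact verification costs O(n) per candidate instead of per pair.
    return {pair for pair in candidates if strength(X, y, *pair) >= threshold}


if __name__ == "__main__":
    # Toy data: y is exactly the product of variables 0 and 1.
    data_rng = random.Random(1)
    n, p = 200, 30
    X = [[data_rng.choice((-1, 1)) for _ in range(p)] for _ in range(n)]
    y = [X[i][0] * X[i][1] for i in range(n)]
    print(xyz_sketch(X, y, subset_size=10, n_runs=20, threshold=0.9))
    print(naive_search(X, y, threshold=0.9))
```

Each run of the sketch costs O(p · subset_size) to build and probe the buckets, compared with O(n p^2) for the exhaustive baseline. A larger subset makes spurious collisions rarer but also lowers the chance that a weak interaction collides, so weaker interactions need more runs; this is the kind of trade-off that makes them more expensive to find.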
