Genome-Wide Association Studies: Information Theoretic Limits of Reliable Learning

08/10/2018
by Behrooz Tahmasebi et al.

In Genome-Wide Association Study (GWAS) problems, the objective is to associate subsequences of individuals' genomes with observable characteristics called phenotypes. The genome, which contains the biological information of an individual, can be represented by a sequence of length G. Many observable characteristics of individuals can be related to a subsequence of a given length L, called the causal subsequence. Environmental effects make the relation between the causal subsequence and the observable characteristics a stochastic function. Our objective in this paper is to detect the causal subsequence of a specific phenotype using a dataset of N individuals and their observed characteristics. We introduce an abstract formulation of GWAS, which allows us to investigate the problem from an information-theoretic perspective. In particular, as the parameters N, G, and L grow, we observe a threshold effect at Gh(L/G)/N, where h(·) is the binary entropy function. This effect allows us to define the rate of the GWAS problem as Gh(L/G)/N and, with it, the capacity for recovering the causal subsequence. We develop an achievable scheme and a matching converse for this problem, and thus characterize its capacity in two scenarios: the zero-error-rate and the ϵ-error-rate.
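To make the rate concrete, the following minimal Python sketch evaluates Gh(L/G)/N for given G, L, and N. It also computes log2 C(G, L) under an assumption of ours, made only for illustration: that the causal subsequence may be any set of L positions among the G genome positions. Because log2 C(G, L) ≈ G h(L/G) for large G, the rate can be read, by analogy with the rate of a channel code, roughly as the number of bits per individual needed to single out the causal subsequence. The function names and example values are hypothetical, and the sketch does not implement the paper's achievable scheme or converse.

```python
import math

# Hypothetical helper names; G, L, N follow the abstract's notation.


def binary_entropy(p: float) -> float:
    """Binary entropy h(p) in bits, using the convention h(0) = h(1) = 0."""
    if p < 0.0 or p > 1.0:
        raise ValueError("p must lie in [0, 1]")
    if p == 0.0 or p == 1.0:
        return 0.0
    return -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)


def gwas_rate(G: int, L: int, N: int) -> float:
    """Rate G * h(L/G) / N for genome length G, causal length L, N individuals."""
    if not (0 <= L <= G) or N <= 0:
        raise ValueError("need 0 <= L <= G and N > 0")
    return G * binary_entropy(L / G) / N


def log2_num_candidates(G: int, L: int) -> float:
    """log2 C(G, L), assuming (for illustration) any L of the G positions
    may form the causal subsequence. Uses lgamma to avoid huge integers."""
    log_binom = math.lgamma(G + 1) - math.lgamma(L + 1) - math.lgamma(G - L + 1)
    return log_binom / math.log(2)


if __name__ == "__main__":
    G, L, N = 10_000, 50, 2_000  # illustrative values, not taken from the paper
    print(f"G*h(L/G)/N   = {gwas_rate(G, L, N):.4f}")
    print(f"log2 C(G,L)/N = {log2_num_candidates(G, L) / N:.4f}")
```

The two printed quantities agree closely for large G, which is the standard bound 2^(G h(L/G)) / (G + 1) ≤ C(G, L) ≤ 2^(G h(L/G)); computing through lgamma keeps the example fast even when G is in the millions.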

Related research

Distinguishing correlation from causation using genome-wide association studies (11/21/2018)
Genome-wide association studies (GWAS) have emerged as a rich source of ...

A simple genome-wide association study algorithm (08/05/2017)
A computationally simple genome-wide association study (GWAS) algorithm ...

ORBGRAND Is Almost Capacity-Achieving (02/13/2022)
Decoding via sequentially guessing the error pattern in a received noisy...

Saddlepoint approximations in binary genome-wide association studies (10/08/2021)
We investigate saddlepoint approximations applied to the score test stat...

The Birthday Problem and Zero-Error List Codes (02/13/2018)
As an attempt to bridge the gap between classical information theory and...

Information-Theoretic Approximation to Causal Models (07/29/2020)
Inferring the causal direction and causal effect between two discrete ra...

Causal Discovery and Optimal Experimental Design for Genome-Scale Biological Network Recovery (04/06/2023)
Causal discovery of genome-scale networks is important for identifying p...
