SAERMA: Stacked Autoencoder Rule Mining Algorithm for the Interpretation of Epistatic Interactions in GWAS for Extreme Obesity

One of the most important challenges in the analysis of high-throughput genetic data is the development of efficient computational methods to identify statistically significant single nucleotide polymorphisms (SNPs). Genome-wide association studies (GWAS) use single-locus analysis, in which each SNP is tested independently for association with phenotypes. A limitation of this approach, however, is its inability to explain genetic variation in complex diseases, so alternative approaches are required to model the intricate relationships between SNPs. Our proposed approach extends GWAS by combining deep learning stacked autoencoders (SAEs) and association rule mining (ARM) to identify epistatic interactions between SNPs. Following traditional GWAS quality control and association analysis, the most significant SNPs are selected and used in the subsequent analysis to investigate epistasis. SAERMA controls the classification results produced by the final fully connected multi-layer feedforward artificial neural network (MLP) by manipulating the interestingness measures, support and confidence, in the rule generation process. The best classification results were achieved with 204 SNPs compressed to 100 units (77 although it was possible to achieve 73 logloss=0.62, and MSE=0.21) with 50 hidden units; both results are supported by close model interpretation.
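To make the two interestingness measures concrete, the following is a minimal sketch of how support and confidence can be computed for rules of the form "SNP items imply case status", and how raising or lowering their thresholds changes which rules survive. It is an illustration under stated assumptions, not the SAERMA implementation: the function names, the item encoding (`"snpA=1"`, `"case"`), the toy records, and the brute-force enumeration are all hypothetical.

```python
from itertools import combinations


def support(transactions, itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = frozenset(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


def confidence(transactions, antecedent, consequent):
    """support(antecedent and consequent) / support(antecedent)."""
    joint = support(transactions, set(antecedent) | set(consequent))
    base = support(transactions, antecedent)
    return joint / base if base > 0 else 0.0


def mine_rules(transactions, target, min_support, min_confidence, max_len=2):
    """Brute-force search for rules `antecedent -> target`.

    Hypothetical helper: enumerates antecedents of up to `max_len` items
    and keeps those meeting both thresholds.
    """
    items = sorted({i for t in transactions for i in t if i != target})
    rules = []
    for k in range(1, max_len + 1):
        for antecedent in combinations(items, k):
            s = support(transactions, set(antecedent) | {target})
            if s < min_support:
                continue
            c = confidence(transactions, antecedent, {target})
            if c >= min_confidence:
                rules.append((antecedent, s, c))
    return rules


# Toy records (made up for illustration): each individual is a set of
# SNP items plus a phenotype label.
TOY = [
    frozenset({"snpA=1", "snpB=1", "case"}),
    frozenset({"snpA=1", "snpB=1", "case"}),
    frozenset({"snpA=1", "control"}),
    frozenset({"snpB=1", "control"}),
]

strict = mine_rules(TOY, "case", min_support=0.5, min_confidence=0.9)
loose = mine_rules(TOY, "case", min_support=0.5, min_confidence=0.6)
```

In this toy data, neither SNP item alone reaches the stricter confidence threshold, while the pair does, which is the kind of joint (epistatic-style) pattern that rule mining can surface. Lowering the confidence threshold admits the single-item rules as well.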
