Deep Learning Classification of Polygenic Obesity using Genome Wide Association Study SNPs

In this paper, association results from genome-wide association studies (GWAS) are combined with a deep learning framework to test the predictive capacity of statistically significant single nucleotide polymorphisms (SNPs) associated with the obesity phenotype. Our approach demonstrates the potential of deep learning as a powerful framework for GWAS analysis that can capture information about SNPs and the important interactions between them. Basic statistical methods and techniques for the analysis of genetic SNP data from population-based genome-wide studies are considered. Statistical association testing between individual SNPs and obesity was conducted under an additive model using logistic regression. After quality control (QC) and association analysis, four subsets of loci were selected by P-value threshold: lower than 1×10^-5 (5 SNPs), 1×10^-4 (32 SNPs), 1×10^-3 (248 SNPs) and 1×10^-2 (2465 SNPs). A deep learning classifier is initialised using these sets of SNPs and fine-tuned to classify obese and non-obese observations. Using the deep learning classifier with genetic variants at P-value < 1×10^-2 (2465 SNPs), the following results were obtained: sensitivity (SE)=0.9604, specificity (SP)=0.9712, Gini=0.9817, LogLoss=0.1150, AUC=0.9908 and MSE=0.0300. As the P-value threshold became more stringent, and fewer SNPs were retained, an evident deterioration in performance was observed. The results demonstrate that single-SNP analysis fails to capture the cumulative effect of less significant variants and their overall contribution to the outcome in disease prediction, an effect that the deep learning framework does capture.
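The evaluation metrics quoted above can be illustrated with a minimal sketch. It is not the authors' code: the function name `classification_metrics`, the 0.5 decision threshold, and the toy labels and probabilities are all assumptions made for illustration. The sketch also assumes Gini = 2·AUC − 1, which is consistent with the reported values (2 × 0.9908 − 1 ≈ 0.9816), and that MSE is computed on predicted probabilities.

```python
import math


def classification_metrics(y_true, y_prob, threshold=0.5):
    """Compute SE, SP, AUC, Gini, LogLoss and MSE for binary labels.

    y_true: list of 0/1 labels (1 = obese, by assumption).
    y_prob: list of predicted probabilities for class 1.
    threshold: hypothetical cut-off used for SE and SP only.
    """
    tp = fn = tn = fp = 0
    for y, p in zip(y_true, y_prob):
        pred = 1 if p >= threshold else 0
        if y == 1:
            tp += pred
            fn += 1 - pred
        else:
            fp += pred
            tn += 1 - pred
    sensitivity = tp / (tp + fn)  # true positive rate
    specificity = tn / (tn + fp)  # true negative rate

    # AUC as the probability that a random positive case is scored
    # higher than a random negative case (ties count as one half).
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    wins = sum(
        1.0 if a > b else 0.5 if a == b else 0.0
        for a in pos
        for b in neg
    )
    auc = wins / (len(pos) * len(neg))

    # Clip probabilities so the logarithm is always defined.
    eps = 1e-15
    clipped = [min(max(p, eps), 1 - eps) for p in y_prob]
    n = len(y_true)
    log_loss = -sum(
        y * math.log(p) + (1 - y) * math.log(1 - p)
        for y, p in zip(y_true, clipped)
    ) / n
    mse = sum((y - p) ** 2 for y, p in zip(y_true, y_prob)) / n

    return {
        "SE": sensitivity,
        "SP": specificity,
        "AUC": auc,
        "Gini": 2 * auc - 1,  # assumed relationship, see lead-in
        "LogLoss": log_loss,
        "MSE": mse,
    }


# Toy data, invented for illustration only (not from the study).
y_true = [1, 1, 1, 0, 0, 0]
y_prob = [0.9, 0.8, 0.4, 0.3, 0.6, 0.1]
metrics = classification_metrics(y_true, y_prob)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

The pairwise AUC computation is quadratic in the number of samples, which is acceptable for a small illustration; a rank-based implementation would be preferable for cohort-sized data.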
