Feature set optimization by clustering, univariate association, Deep Machine learning omics Wide Association Study (DMWAS) for Biomarkers discovery as tested on GTEx pilot

02/24/2021
by Abhishek Narain Singh, et al.

Univariate and multivariate methods for associating genomic variations with an end phenotype or endophenotype have been widely used in genome-wide association studies. In addition to encoding single-nucleotide polymorphisms (SNPs), we advocate clustering as a novel method to encode structural variations (SVs) in genomes, such as deletion and insertion polymorphisms (DIPs), copy number variations (CNVs), translocations and inversions, so that each SV can serve as an independent feature variable for downstream computation by artificial intelligence methods that predict the endophenotype or end phenotype. We introduce a clustering-based encoding scheme for structural variations and omics-based analysis. We conducted a complete association of all genomic variants with the phenotype using deep learning and other machine learning techniques, although other methods such as genetic algorithms could also be applied. Applying this encoding of SVs, together with one-hot encoding of SNPs, to the GTEx V7 pilot DNA variation dataset, we obtained high accuracy with various DMWAS methods and found logistic regression to work best for the death-due-to-heart-attack (MHHRTATT) phenotype. The genomic variants acting as features were then ranked in descending order of their impact on the disease or trait phenotype, a step we call optimization, which also takes the top univariate associations into account. Variant ID P1_M_061510_3_402_P, at chromosome 3 position 192063195, was found to be the most highly associated with MHHRTATT. We present here the top ten genomic variants of the optimized feature set for the MHHRTATT cause-of-death phenotype.
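A minimal sketch of the kind of pipeline the abstract describes, written in Python with only the standard library. The abstract does not say which clustering algorithm, which SV attribute, or which univariate statistic DMWAS uses, so this sketch assumes 1-D k-means on SV length (0 meaning no SV), ranks features by absolute Pearson correlation with a binary phenotype, and fits unregularised logistic regression by plain gradient descent. All function names (`one_hot_snp`, `cluster_encode_sv`, `rank_features`, `fit_logistic`) and the toy data are hypothetical illustrations, not the authors' code or the GTEx data.

```python
import math

# Hypothetical sketch of a DMWAS-style pipeline; not the authors' implementation.


def one_hot_snp(genotypes):
    """One-hot encode SNP genotypes coded as alternate-allele counts 0, 1 or 2."""
    return [[1.0 if g == k else 0.0 for k in (0, 1, 2)] for g in genotypes]


def cluster_encode_sv(lengths, k=3, iters=100):
    """Assumed SV encoder: 1-D k-means (k >= 2) on SV length, 0 meaning no SV.

    Returns one label per sample, ordered by cluster centre (0 = shortest SVs).
    """
    s, n = sorted(lengths), len(lengths)
    # Spread the initial centres evenly across the sorted lengths.
    centers = [s[round(i * (n - 1) / (k - 1))] for i in range(k)]

    def nearest(x):
        return min(range(k), key=lambda c: abs(x - centers[c]))

    for _ in range(iters):
        labels = [nearest(x) for x in lengths]
        new_centers = []
        for c in range(k):
            members = [x for x, lab in zip(lengths, labels) if lab == c]
            # An empty cluster keeps its previous centre.
            new_centers.append(sum(members) / len(members) if members else centers[c])
        if new_centers == centers:
            break
        centers = new_centers
    rank = {c: r for r, c in enumerate(sorted(range(k), key=lambda c: centers[c]))}
    return [rank[nearest(x)] for x in lengths]


def abs_corr(xs, ys):
    """Absolute Pearson correlation; 0.0 when either vector is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return 0.0 if sxx == 0 or syy == 0 else abs(sxy) / math.sqrt(sxx * syy)


def rank_features(X, y, names):
    """Rank feature columns by univariate association with y, strongest first."""
    scores = [abs_corr(col, y) for col in zip(*X)]
    return sorted(zip(names, scores), key=lambda t: t[1], reverse=True)


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(X, y, lr=0.5, epochs=3000):
    """Unregularised logistic regression by batch gradient descent on mean log-loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        gw, gb = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            gb += err
            for j in range(d):
                gw[j] += err * xi[j]
        b -= lr * gb / n
        w = [wj - lr * g / n for wj, g in zip(w, gw)]
    return w, b


def predict(w, b, X):
    """Predict class 1 when the linear score is positive (p > 0.5)."""
    return [1 if b + sum(wj * xj for wj, xj in zip(w, xi)) > 0 else 0 for xi in X]


def build_features(genotypes, sv_lengths):
    """Concatenate one-hot SNP columns with the clustered SV code per sample."""
    codes = cluster_encode_sv(sv_lengths, k=3)
    return [row + [float(c)] for row, c in zip(one_hot_snp(genotypes), codes)]


if __name__ == "__main__":
    # Invented toy data: one SNP, one SV length per sample, binary phenotype.
    genotypes = [0, 1, 2, 0, 1, 2, 0, 1]
    sv_lengths = [0, 0, 150, 160, 5000, 5100, 4900, 0]
    y = [0, 0, 0, 0, 1, 1, 1, 0]
    X = build_features(genotypes, sv_lengths)
    names = ["snp_0", "snp_1", "snp_2", "sv_cluster"]
    for name, score in rank_features(X, y, names):
        print(f"{name}\t{score:.3f}")
    w, b = fit_logistic(X, y)
    print("training predictions:", predict(w, b, X))
```

On this invented data the clustered SV code ranks first (score about 0.894), because the phenotype is 1 exactly when a sample falls in the longest-SV cluster. Keeping the cluster label as a single ordinal column gives one feature value per SV, in the spirit of the abstract; one-hot encoding the cluster labels would be an equally reasonable alternative.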

Related research
04/09/2018

Deep Learning Classification of Polygenic Obesity using Genome Wide Association Study SNPs

In this paper, association results from genome-wide association studies ...
12/11/2019

More for less: Predicting and maximizing genetic variant discovery via Bayesian nonparametrics

While the cost of sequencing genomes has decreased dramatically in recen...
01/04/2018

Generalized Similarity U: A Non-parametric Test of Association Based on Similarity

Second generation sequencing technologies are being increasingly used fo...
07/03/2020

Deep interpretability for GWAS

Genome-Wide Association Studies are typically conducted using linear mod...
12/12/2018

Association Analysis of Common and Rare SNVs using Adaptive Fisher Method to Detect Dense and Sparse Signals

The development of next generation sequencing (NGS) technology and genot...
01/06/2018

Utilising Deep Learning and Genome Wide Association Studies for Epistatic-Driven Preterm Birth Classification in African-American Women

Genome Wide Association Studies (GWAS) are used to identify statisticall...
