Deep neural networks with controlled variable selection for the identification of putative causal genetic variants

09/29/2021
by Peyman H. Kassani, et al.

Deep neural networks (DNNs) have been used successfully in many scientific problems for their high prediction accuracy, but their application to genetic studies remains challenging due to their poor interpretability. In this paper, we consider the problem of scalable, robust variable selection in DNNs for the identification of putative causal genetic variants in genome sequencing studies. We identified pronounced randomness in feature selection in DNNs due to their stochastic nature, which may hinder interpretability and give rise to misleading results. We propose an interpretable neural network model, stabilized using ensembling, with controlled variable selection for genetic studies. The merits of the proposed method include: (1) flexible modelling of the non-linear effects of genetic variants to improve statistical power; (2) multiple knockoffs in the input layer to rigorously control the false discovery rate; (3) hierarchical layers to substantially reduce the number of weight parameters and activations and thereby improve computational efficiency; (4) de-randomized feature selection to stabilize identified signals. We evaluated the proposed method in extensive simulation studies and applied it to the analysis of Alzheimer disease genetics. We showed that the proposed method, when compared to conventional linear and nonlinear methods, can lead to substantially more discoveries.
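To make the knockoff-based selection step concrete, here is a minimal Python sketch. It is not the authors' implementation. It assumes the simpler single-knockoff setting (the abstract describes multiple knockoffs, which use a different statistic), applies the standard knockoff+ threshold, and treats de-randomization as a plain average of importance statistics across repeated runs. The function names and the importance vector `w` are hypothetical.

```python
from typing import List, Sequence


def average_statistics(runs: Sequence[Sequence[float]]) -> List[float]:
    """Average per-feature importance statistics over repeated runs.

    Assumption: ensembling is modelled here as a simple mean; the
    paper's actual stabilization scheme may differ.
    """
    n = len(runs)
    return [sum(col) / n for col in zip(*runs)]


def knockoff_plus_threshold(w: Sequence[float], q: float) -> float:
    """Return the knockoff+ threshold for target FDR level q.

    w[j] > 0 suggests feature j beats its knockoff copy; w[j] < 0
    suggests the knockoff looked more important. The threshold is the
    smallest t with (1 + #{w_j <= -t}) / max(1, #{w_j >= t}) <= q.
    """
    candidates = sorted({abs(x) for x in w if x != 0})
    for t in candidates:
        negatives = sum(1 for x in w if x <= -t)
        positives = sum(1 for x in w if x >= t)
        if (1 + negatives) / max(1, positives) <= q:
            return t
    return float("inf")  # no threshold meets the target level


def select_features(w: Sequence[float], q: float) -> List[int]:
    """Indices of features whose statistic reaches the threshold."""
    t = knockoff_plus_threshold(w, q)
    return [j for j, x in enumerate(w) if x >= t]


if __name__ == "__main__":
    # Hypothetical importance statistics from two runs of a model.
    run_a = [5, 4, 3, 2, -1, 0.5, -0.2, 6, 7, 8]
    run_b = [5, 4, 3, 2, -1, 0.5, -0.2, 6, 7, 8]
    w = average_statistics([run_a, run_b])
    print(select_features(w, q=0.2))
```

Averaging the statistics before thresholding is one simple way to reduce run-to-run variation in the selected set. With very few features or a very small `q`, the `1 +` term in the numerator can make every threshold fail, and the selection is then empty.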


research
09/17/2019

Variable selection with false discovery rate control in deep neural networks

Deep neural networks (DNNs) are famous for their high prediction accurac...
research
05/15/2019

Efficient hinging hyperplanes neural network and its application in nonlinear system identification

In this paper, the efficient hinging hyperplanes (EHH) neural network is...
research
02/14/2020

A comparison of different types of Niching Genetic Algorithms for variable selection in solar radiation estimation

Variable selection problems generally present more than a single solutio...
research
01/11/2021

Statistical Methods for cis-Mendelian Randomization

Mendelian randomization is the use of genetic variants to assess the exi...
research
09/04/2018

DeepPINK: reproducible feature selection in deep neural networks

Deep learning has become increasingly popular in both supervised and uns...
research
01/30/2020

A Hybrid Two-layer Feature Selection Method Using Genetic Algorithm and Elastic Net

Feature selection, as a critical pre-processing step for machine learnin...
research
11/08/2018

A global-local approach for detecting hotspots in multiple-response regression

We tackle modelling and inference for variable selection in regression p...
