Bayesian Knockoff Generators for Robust Inference Under Complex Data Structure

11/12/2021
by Michael J. Martens, et al.

The recent proliferation of medical data, such as genetics and electronic health records (EHR), offers new opportunities to find novel predictors of health outcomes. Presented with a large set of candidate features, interest often lies in selecting those most likely to be predictive of an outcome for further study, with the goal of controlling the false discovery rate (FDR) at a specified level. Knockoff filtering is an innovative strategy for FDR-controlled feature selection. However, existing knockoff methods make strong distributional assumptions that hinder their applicability to real-world data. We propose Bayesian models for generating high-quality knockoff copies that utilize available knowledge about the data structure, thus improving the resolution of prognostic features. Applications to two kinds of feature sets are considered: those with categorical and/or continuous variables possibly having a population substructure, such as in EHR; and those with microbiome features having a compositional constraint and phylogenetic relatedness. Through simulations and real data applications, these methods are shown to identify important features with good FDR control and power.
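
To make the FDR-controlled selection step concrete, the following is a minimal sketch of the standard knockoff+ thresholding rule from the general knockoff filtering literature. It is not the Bayesian knockoff generator proposed here: it assumes that a vector of feature statistics W has already been computed from the original features and their knockoff copies, where a large positive W_j favors the original feature. The function names and the example values are hypothetical and serve illustration only.

```python
import numpy as np


def knockoff_plus_threshold(W, q):
    """Return the knockoff+ threshold for statistics W at target FDR q.

    The threshold is the smallest t among the nonzero |W_j| such that
    (1 + #{j: W_j <= -t}) / max(1, #{j: W_j >= t}) <= q.
    Returns np.inf when no candidate satisfies the bound.
    """
    W = np.asarray(W, dtype=float)
    # Candidate thresholds are the distinct nonzero magnitudes, ascending.
    candidates = np.unique(np.abs(W[W != 0]))
    for t in candidates:
        # Negative statistics act as a proxy count of false discoveries.
        fdp_hat = (1 + np.sum(W <= -t)) / max(1, np.sum(W >= t))
        if fdp_hat <= q:
            return t
    return np.inf


def knockoff_select(W, q):
    """Return indices of features whose statistic meets the threshold."""
    W = np.asarray(W, dtype=float)
    t = knockoff_plus_threshold(W, q)
    return np.flatnonzero(W >= t)


# Hypothetical statistics for ten features (illustrative values only).
W_example = [5, 4, 3, 2.5, 2, -1, 0.5, -0.3, 1.5, 6]
selected = knockoff_select(W_example, q=0.2)
```

The quality of the knockoff copies enters only through W: the better the knockoffs mimic the dependence structure of the real features, the more reliably the negative statistics estimate the number of false discoveries.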


research
08/05/2022

Feature Selection for Machine Learning Algorithms that Bounds False Positive Rate

The problem of selecting a handful of truly relevant variables in superv...
research
02/20/2020

False Discovery Rate Control via Data Splitting

Selecting relevant features associated with a given response variable is...
research
05/06/2019

Hybrid Density- and Partition-based Clustering Algorithm for Data with Mixed-type Variables

Clustering is an essential technique for discovering patterns in data. T...
research
09/04/2018

Bayesian Double Feature Allocation for Phenotyping with Electronic Health Records

We propose a categorical matrix factorization method to infer latent dis...
research
09/09/2019

Theory of Optimal Bayesian Feature Filtering

Optimal Bayesian feature filtering (OBF) is a supervised screening metho...
research
10/27/2020

Sequential knockoffs for continuous and categorical predictors: with application to a large Psoriatic Arthritis clinical trial pool

Knockoffs provide a general framework for controlling the false discover...
research
05/29/2020

Unsupervised Feature Selection via Multi-step Markov Transition Probability

Feature selection is a widely used dimension reduction technique to sele...
