Knockoffs for the mass: new feature importance statistics with false discovery guarantees

07/17/2018
by Jaime Roquero Gimenez, et al.

An important problem in machine learning and statistics is to identify features that causally affect the outcome. This is often impossible to do from purely observational data, so a natural relaxation is to identify features that remain correlated with the outcome even after conditioning on all other observed features. For example, we want to establish that smoking is correlated with cancer even after conditioning on demographics. The knockoff procedure is a recent breakthrough in statistics that, in theory, can identify such conditionally correlated features while guaranteeing that the false discovery rate is controlled. The idea is to create synthetic data (knockoffs) that capture the correlations among the features. However, there are substantial computational and practical challenges to generating and using knockoffs. This paper makes several key advances that make knockoff applications more efficient and more powerful. We develop an efficient algorithm to generate valid knockoffs from Bayesian networks. We then systematically evaluate knockoff test statistics and develop new statistics with improved power. The paper combines new mathematical guarantees with systematic experiments on real and synthetic data.
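To make the false discovery guarantee concrete, here is a minimal sketch of the selection step that knockoff methods commonly use once a statistic W_j has been computed for each feature (large positive values suggest the real feature matters more than its knockoff). It assumes the standard "knockoff+" threshold from the knockoff filter literature; the abstract does not say which rule or statistics the paper uses. The function name `knockoff_plus_select` and the parameter `q` (the target false discovery rate) are illustrative choices, not names from the paper.

```python
import numpy as np

def knockoff_plus_select(W, q=0.1):
    """Select features from knockoff statistics W at target FDR level q.

    Assumption: uses the standard knockoff+ rule, i.e. the smallest
    threshold t > 0 such that
        (1 + #{j : W_j <= -t}) / max(1, #{j : W_j >= t}) <= q,
    and then selects every feature with W_j >= t.
    """
    W = np.asarray(W, dtype=float)
    # Candidate thresholds are the distinct nonzero magnitudes of W, ascending.
    candidates = np.sort(np.unique(np.abs(W[W != 0])))
    for t in candidates:
        # Negative statistics act as a proxy count of false positives.
        fdp_estimate = (1 + np.sum(W <= -t)) / max(1, np.sum(W >= t))
        if fdp_estimate <= q:
            return np.flatnonzero(W >= t)
    # No threshold meets the target level: select nothing.
    return np.array([], dtype=int)

# Hypothetical statistics for ten features, purely for illustration.
W_example = [5, 4, 3, 2.5, 2, 1.5, -1, 0.5, -0.2, 0.1]
selected = knockoff_plus_select(W_example, q=0.2)
```

The sign symmetry of W_j for irrelevant features is what lets the count of large negative statistics stand in for the unknown number of false discoveries; generating knockoffs that make this symmetry hold is the difficult part the abstract refers to.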

