Powerful Knockoffs via Minimizing Reconstructability

11/30/2020
by   Asher Spector, et al.
Model-X knockoffs allow analysts to perform feature selection using almost any machine learning algorithm while still provably controlling the expected proportion of false discoveries. To apply model-X knockoffs, one must construct synthetic variables, called knockoffs, which effectively act as controls during feature selection. The gold standard for constructing knockoffs has been to minimize the mean absolute correlation (MAC) between features and their knockoffs, but, surprisingly, we prove this procedure can be powerless in extremely easy settings, including Gaussian linear models with correlated exchangeable features. The key problem is that minimizing the MAC creates strong joint dependencies between the features and knockoffs, which allow machine learning algorithms to partially or fully reconstruct the effect of the features on the response using the knockoffs. To improve the power of knockoffs, we propose generating knockoffs that minimize the reconstructability (MRC) of the features, and we demonstrate our proposal for Gaussian features by showing it is computationally efficient, robust, and powerful. We also prove that certain MRC knockoffs minimize a natural definition of estimation error in Gaussian linear models. Furthermore, in an extensive set of simulations, we find many settings with correlated features in which MRC knockoffs dramatically outperform MAC-minimizing knockoffs and no settings in which MAC-minimizing knockoffs outperform MRC knockoffs by more than a very slight margin. We implement our methods and a host of others from the knockoffs literature in a new open-source Python package, knockpy.
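The joint-dependency problem described in the abstract can be illustrated numerically. The following is a minimal sketch using NumPy only; it does not use knockpy, and all function names are hypothetical. It assumes standardized Gaussian features with equicorrelation rho >= 0.5, for which the MAC-minimizing diagonal matrix is taken to be S = 2(1 - rho) I. The alternative s = 1 - rho is chosen purely for illustration and is not claimed to be the MRC solution.

```python
import numpy as np

def equicorrelated(p, rho):
    """Correlation matrix with 1 on the diagonal and rho elsewhere."""
    return (1 - rho) * np.eye(p) + rho * np.ones((p, p))

def joint_covariance(Sigma, s):
    """Covariance of (X, knockoffs) for Gaussian X, where Cov(X, knockoffs) = Sigma - diag(s)."""
    S = np.diag(s)
    return np.block([[Sigma, Sigma - S], [Sigma - S, Sigma]])

def summarize(Sigma, s):
    """Return (MAC, smallest eigenvalue of the joint covariance, Var of a pairwise gap)."""
    p = Sigma.shape[0]
    G = joint_covariance(Sigma, s)
    # For standardized features, Corr(X_j, knockoff_j) = 1 - s_j.
    mac = float(np.mean(np.abs(1 - s)))
    min_eig = float(np.linalg.eigvalsh(G).min())
    # Variance of (X_1 + knockoff_1) - (X_2 + knockoff_2); zero means X_1 is an
    # exact linear function of X_2 and the two knockoffs.
    c = np.zeros(2 * p)
    c[[0, p]] = 1.0
    c[[1, p + 1]] = -1.0
    gap_var = float(c @ G @ c)
    return mac, min_eig, gap_var

p, rho = 5, 0.6
Sigma = equicorrelated(p, rho)
s_mac = np.full(p, 2 * (1 - rho))  # assumed MAC-minimizing choice when rho >= 0.5
s_alt = np.full(p, 1 - rho)        # illustrative alternative, not the MRC optimum

mac_stats = summarize(Sigma, s_mac)
alt_stats = summarize(Sigma, s_alt)
print("MAC-minimizing:", mac_stats)
print("alternative:   ", alt_stats)
```

Under these assumptions, the MAC-minimizing choice attains the lower MAC, but its joint covariance is singular and the gap variance is zero up to rounding. In other words, X_1 can be recovered exactly from X_2 and the two corresponding knockoffs, which is the kind of reconstructability the abstract identifies as the source of power loss. The illustrative alternative accepts a higher MAC and keeps the joint covariance well conditioned.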


