Discovering Conditionally Salient Features with Statistical Guarantees

05/29/2019
by   Jaime Roquero Gimenez, et al.

The goal of feature selection is to identify important features that help explain an outcome variable. Most work in this domain has focused on identifying globally relevant features, that is, features related to the outcome based on evidence from the entire dataset. We study a more fine-grained statistical problem: conditional feature selection, where a feature may or may not be relevant depending on the values of the other features. For example, in genetic association studies, variant A could be associated with the phenotype in the entire dataset, but conditioned on variant B being present it might be independent of the phenotype. In this sense, variant A is globally relevant but not locally relevant in the region of the feature space where B is present. We present a generalization of the knockoff procedure that performs conditional feature selection while controlling a generalization of the false discovery rate (FDR) to the conditional setting. Because the knockoff framework makes no assumptions about the model relating features to the response, the statistical FDR guarantee is not degraded even when we perform conditional feature selection. We implement this method and present an algorithm that automatically partitions the feature space so as to enhance the differences between the sets selected in different regions, and we validate the theoretical results with experiments.
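As background for the knockoff procedure mentioned above, the following is a minimal sketch of the standard (global) knockoff+ selection step that the abstract's method generalizes. It is not the paper's conditional procedure. It assumes feature statistics W_j have already been computed from the original features and their knockoffs, with large positive values indicating evidence that feature j is relevant. The function names `knockoff_plus_threshold` and `knockoff_select` and the example values are hypothetical, chosen only for illustration.

```python
def knockoff_plus_threshold(w, q):
    """Return the smallest t > 0 such that
    (1 + #{j: W_j <= -t}) / max(1, #{j: W_j >= t}) <= q,
    or infinity if no such t exists (standard knockoff+ rule)."""
    # Only the nonzero magnitudes |W_j| need to be checked as candidates.
    candidates = sorted({abs(x) for x in w if x != 0})
    for t in candidates:
        negatives = sum(1 for x in w if x <= -t)  # estimates false discoveries
        positives = sum(1 for x in w if x >= t)   # features that would be selected
        if (1 + negatives) / max(1, positives) <= q:
            return t
    return float("inf")


def knockoff_select(w, q):
    """Indices of features whose statistic reaches the knockoff+ threshold."""
    t = knockoff_plus_threshold(w, q)
    return [j for j, x in enumerate(w) if x >= t]


# Hypothetical statistics for ten features, target FDR level q = 0.25.
w_example = [5.0, 4.0, 3.5, 3.0, 2.5, -0.5, 0.3, -0.2, 2.0, 1.5]
selected = knockoff_select(w_example, 0.25)
print(selected)
```

The count of statistics at or below -t serves as an estimate of how many of the selected features are false discoveries, which is why symmetric feature statistics are central to the FDR guarantee. A conditional variant would need statistics defined per region of the feature space; how the paper constructs them is not specified in this abstract.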


