Controlling Costs: Feature Selection on a Budget

10/08/2019
by Guo Yu, et al.
The traditional framework for feature selection treats all features as costing the same amount. However, in reality, a scientist often has considerable discretion regarding what variables to measure, and the decision involves a tradeoff between model accuracy and cost (where cost can refer to money, time, difficulty, or intrusiveness). In particular, unnecessarily including an expensive feature in a model is worse than unnecessarily including a cheap feature. We propose a procedure, based on multiple knockoffs, for performing feature selection in a cost-conscious manner. The key idea behind our method is to force higher cost features to compete with more knockoffs than cheaper features. We derive an upper bound on the weighted false discovery proportion associated with this procedure, which corresponds to the fraction of the feature cost that is wasted on unimportant features. We prove that this bound holds simultaneously with high probability over a path of selected variable sets of increasing size. A user may thus select a set of features based, for example, on the overall budget, while knowing that no more than a particular fraction of feature cost is wasted. In a simulation study, we investigate the practical importance of incorporating cost considerations into the feature selection process.
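The core mechanism described above — forcing each feature to beat a number of knockoffs that grows with its cost, then measuring the fraction of selected cost wasted on null features — can be sketched in a toy simulation. This is an illustration under simplifying assumptions, not the authors' exact procedure: the design is i.i.d. Gaussian (so valid knockoffs are simply fresh Gaussian columns), the importance statistic is a marginal correlation, and the costs, knockoff counts, and selection rule are hypothetical choices made for the demo.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 500, 20
costs = rng.integers(1, 5, size=p)      # hypothetical per-feature costs in {1,...,4}
beta = np.zeros(p)
beta[:5] = 2.0                          # features 0-4 are truly important

X = rng.standard_normal((n, p))         # i.i.d. Gaussian design
y = X @ beta + rng.standard_normal(n)

selected = []
for j in range(p):
    k_j = int(costs[j])                 # pricier features face more knockoffs
    # With an i.i.d. N(0,1) design, independent Gaussian columns are valid knockoffs
    knockoffs = rng.standard_normal((n, k_j))
    real_stat = abs(X[:, j] @ y)        # marginal-correlation importance statistic
    knock_stats = np.abs(knockoffs.T @ y)
    if real_stat > knock_stats.max():   # the feature must beat ALL of its knockoffs
        selected.append(j)

# Weighted false discovery proportion: fraction of selected cost spent on nulls
wasted = sum(costs[j] for j in selected if beta[j] == 0)
total = max(sum(costs[j] for j in selected), 1)
wfdp = wasted / total
print("selected features:", selected)
print("weighted FDP:", round(wfdp, 3))
```

Because a null feature's statistic is exchangeable with its knockoffs' statistics, the chance that a null with cost c survives is roughly 1/(c + 1), so expensive nulls are selected less often and the *cost-weighted* error is what ends up controlled — the idea the bound in the abstract formalizes.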
