Feature Selection Methods for Cost-Constrained Classification in Random Forests

08/14/2020
by Rudolf Jagdhuber, et al.

Cost-sensitive feature selection describes a feature selection problem in which each feature incurs an individual cost for inclusion in a model. These costs make it possible to incorporate undesirable aspects of features, e.g. failure rates of a measuring device or patient harm, into the model selection process. Random Forests pose a particularly challenging problem for feature selection, as features are generally entangled in an ensemble of multiple trees, which makes post hoc removal of features infeasible. Feature selection methods therefore often either rely on simple pre-filtering, or require many Random Forest evaluations along their optimization path, which drastically increases the computational complexity. To address both issues, we propose Shallow Tree Selection, a novel, fast, multivariate feature selection method that selects features from small tree structures. Additionally, we adapt three standard feature selection algorithms for cost-sensitive learning by introducing a hyperparameter-controlled benefit-cost ratio (BCR) criterion for each method. In an extensive simulation study, we assess this criterion and compare the proposed methods to multiple performance-based baseline alternatives on four artificial data settings and seven real-world data settings. We show that all methods using a hyperparameterized BCR criterion outperform the baseline alternatives. In a direct comparison between the proposed methods, each method shows strengths in certain settings, but no one-size-fits-all solution exists. On a global average, we could identify preferable choices among our BCR-based methods. Nevertheless, we conclude that a practical analysis should never rely on a single method, but should always compare different approaches to obtain the best results.
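To make the idea of a hyperparameter-controlled benefit-cost ratio more concrete, the following is a minimal sketch and not the authors' implementation. The abstract does not state the exact form of the criterion, so the form benefit / cost**xi, the exponent name `xi`, the function names, and the toy benefit and cost values are all assumptions made for illustration. Under this assumed form, `xi = 0` ignores costs entirely (a purely performance-based ranking), while larger values penalize expensive features more strongly.

```python
def bcr_score(benefit, cost, xi=1.0):
    """Assumed BCR form: benefit divided by cost raised to the power xi."""
    if cost <= 0:
        raise ValueError("cost must be positive")
    return benefit / cost ** xi


def greedy_bcr_selection(benefits, costs, budget, xi=1.0):
    """Greedily add the affordable feature with the highest BCR score.

    benefits, costs: dicts mapping feature name -> value (hypothetical inputs).
    budget: total cost the selected feature set may not exceed.
    """
    remaining = budget
    selected = []
    candidates = set(benefits)
    while candidates:
        # Only features that still fit into the remaining budget are eligible.
        affordable = [f for f in candidates if costs[f] <= remaining]
        if not affordable:
            break
        # Ties are broken by feature name to keep the result deterministic.
        best = max(affordable, key=lambda f: (bcr_score(benefits[f], costs[f], xi), f))
        selected.append(best)
        remaining -= costs[best]
        candidates.remove(best)
    return selected


# Toy values, invented for illustration only.
benefits = {"a": 0.9, "b": 0.5, "c": 0.3}
costs = {"a": 10, "b": 2, "c": 1}

print(greedy_bcr_selection(benefits, costs, budget=10, xi=0.0))
print(greedy_bcr_selection(benefits, costs, budget=10, xi=1.0))
```

With `xi = 0.0` the single expensive feature consumes the whole budget, whereas `xi = 1.0` favors the two cheap features. In a real analysis the benefit of a feature would depend on the features already selected (for example via a Random Forest importance measure), which this static sketch deliberately leaves out.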


