On the utility of feature selection in building two-tier decision trees

12/29/2022
by Sergey A. Saltykov, et al.

Nowadays, feature selection is frequently used in machine learning when there is a risk of performance degradation due to overfitting or when computational resources are limited. During feature selection, the subset of features that is most relevant and least redundant is chosen. In recent years, it has become clear that, in addition to relevance and redundancy, feature complementarity must be considered. Informally, features are complementary if they are weak predictors of the target variable separately but strong predictors when combined. This paper demonstrates that the synergistic effect of complementary features mutually amplifying each other in the construction of two-tier decision trees can be disrupted by another, interfering feature, resulting in decreased performance. Using cross-validation on both synthetic and real datasets, for both regression and classification, it is shown that removing the interfering feature can improve performance by up to 24 times. It is also found that the less thoroughly the domain has been learned, the greater the performance gain: there is a statistically significant negative rank correlation between performance on the dataset before elimination of the interfering feature and the performance growth after its elimination. It is concluded that this broadens the scope of feature selection methods to cases where data and computational resources are sufficient.
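The interference effect described in the abstract can be illustrated with a minimal pure-Python sketch (this is not the authors' code; the data-generating process, the 70% correlation of the interfering feature, and all names are illustrative assumptions). Two binary features form an XOR-style complementary pair: each alone is uninformative about the target, but together they determine it exactly, so a two-tier (depth-2) tree can fit them perfectly. A third, weakly correlated feature offers a small immediate Gini gain, so a greedy builder splits on it at the root and never recovers the XOR structure:

```python
import random

def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    p = sum(labels) / n
    return 2 * p * (1 - p)

def best_feature(rows, labels, feats):
    """Greedily pick the binary feature whose split minimizes weighted Gini."""
    best, best_imp = None, float("inf")
    for f in feats:
        left = [y for x, y in zip(rows, labels) if x[f] == 0]
        right = [y for x, y in zip(rows, labels) if x[f] == 1]
        imp = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if imp < best_imp:
            best, best_imp = f, imp
    return best

def two_tier_tree(rows, labels, feats):
    """Build a depth-2 tree: one root split, one split per root branch."""
    root = best_feature(rows, labels, feats)
    tree = {"f": root, "children": {}}
    for v in (0, 1):
        sub_rows = [x for x in rows if x[root] == v]
        sub_lab = [y for x, y in zip(rows, labels) if x[root] == v]
        child_f = best_feature(sub_rows, sub_lab, feats)
        leaves = {}
        for w in (0, 1):
            leaf = [y for x, y in zip(sub_rows, sub_lab) if x[child_f] == w]
            leaves[w] = round(sum(leaf) / len(leaf)) if leaf else 0
        tree["children"][v] = {"f": child_f, "leaves": leaves}
    return tree

def predict(tree, x):
    child = tree["children"][x[tree["f"]]]
    return child["leaves"][x[child["f"]]]

def accuracy(rows, labels, feats):
    t = two_tier_tree(rows, labels, feats)
    return sum(predict(t, x) == y for x, y in zip(rows, labels)) / len(labels)

random.seed(0)
rows, labels = [], []
for _ in range(1000):
    x1, x2 = random.randint(0, 1), random.randint(0, 1)
    y = x1 ^ x2                                   # complementary pair: jointly perfect
    x3 = y if random.random() < 0.7 else 1 - y    # interfering feature: weakly correlated
    rows.append((x1, x2, x3))
    labels.append(y)

acc_full = accuracy(rows, labels, [0, 1, 2])  # root greedily splits on x3, synergy lost
acc_sel = accuracy(rows, labels, [0, 1])      # interfering feature removed, XOR learned
print(acc_full, acc_sel)
```

With all three features the greedy root split goes to the weakly correlated feature (it has the lowest immediate impurity), and accuracy stays near the 70% ceiling that feature provides; after removing it, the depth-2 tree splits on the complementary pair and classifies perfectly.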


