Feature Importance in Gradient Boosting Trees with Cross-Validation Feature Selection

09/12/2021
by Afek Ilay Adler, et al.

Gradient Boosting Machines (GBM) are among the go-to algorithms for tabular data, producing state-of-the-art results in many prediction tasks. Despite its popularity, the GBM framework suffers from a fundamental flaw in its base learners. Specifically, most implementations use decision trees that are typically biased towards categorical variables with large cardinalities. The effect of this bias has been studied extensively over the years, mostly in terms of predictive performance. In this work, we extend the scope and study the effect of biased base learners on GBM feature importance (FI) measures. We show that although these implementations demonstrate highly competitive predictive performance, they still, surprisingly, suffer from bias in FI. By using cross-validated (CV) unbiased base learners, we fix this flaw at a relatively low computational cost. We demonstrate the suggested framework in a variety of synthetic and real-world setups, showing a significant improvement in all GBM FI measures while maintaining roughly the same level of prediction accuracy.
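The cardinality bias the abstract describes can be reproduced with a few lines of scikit-learn. The sketch below (not the paper's CV method, just an illustration of the problem) fits a GBM on one genuinely informative binary feature and one pure-noise integer-encoded categorical with 100 levels; the impurity-based `feature_importances_` assign non-trivial weight to the noise feature simply because its many distinct values offer more split points. Dataset sizes and hyperparameters here are arbitrary choices for demonstration.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

rng = np.random.default_rng(0)
n = 2000

# One informative binary feature: the label equals it 90% of the time.
x_informative = rng.integers(0, 2, n)
# One pure-noise categorical with high cardinality (100 levels),
# integer-encoded, unrelated to the label.
x_noise = rng.integers(0, 100, n)
y = np.where(rng.random(n) < 0.9, x_informative, 1 - x_informative)

X = np.column_stack([x_informative, x_noise])
model = GradientBoostingClassifier(
    n_estimators=100, max_depth=3, random_state=0
).fit(X, y)

# Impurity-based FI: the noise feature receives a nonzero share
# despite carrying no signal, illustrating the cardinality bias.
print(model.feature_importances_)
```

An unbiased FI estimate for the same model would show the noise feature near zero; the paper's contribution is obtaining such estimates inside the boosting procedure itself via cross-validated base learners, rather than as a post-hoc correction.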


Related research:

- Gradient Boosting With Piece-Wise Linear Regression Trees (02/15/2018): Gradient boosting using decision trees as base learners, so called Gradi...
- VAT tax gap prediction: a 2-steps Gradient Boosting approach (12/08/2019): Tax evasion is the illegal non-payment of taxes by individuals, corporat...
- Conceptually Diverse Base Model Selection for Meta-Learners in Concept Drifting Data Streams (11/29/2021): Meta-learners and ensembles aim to combine a set of relevant yet diverse...
- Soft Gradient Boosting Machine (06/07/2020): Gradient Boosting Machine has proven to be one successful function appro...
- Online Local Boosting: improving performance in online decision trees (07/16/2019): As more data are produced each day, and faster, data stream mining is gr...
- On the utility of feature selection in building two-tier decision trees (12/29/2022): Nowadays, feature selection is frequently used in machine learning when ...
- Cross-Validated Variable Selection in Tree-Based Methods Improves Predictive Performance (12/10/2015): Recursive partitioning approaches producing tree-like models are a long ...
