Unbiased variable importance for random forests

03/04/2020
by Markus Loecher, et al.

The default variable-importance measure in random forests, Gini importance, has been shown to suffer from the bias of the underlying Gini-gain splitting criterion. While the alternative permutation importance is generally accepted as a reliable measure of variable importance, it is also computationally demanding and suffers from other shortcomings. We propose a simple remedy for the misleading/untrustworthy Gini importance, which can be viewed as an overfitting problem: we compute the loss reduction on the out-of-bag instead of the in-bag training samples.
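
To make the idea concrete, here is a minimal sketch of OOB-evaluated Gini importance built on scikit-learn; it is not the authors' reference implementation. The helper names (`oob_gini_importance`, `gini_impurity`) are illustrative, and reconstructing each tree's bootstrap sample mirrors scikit-learn's internal sampling, which assumes `bootstrap=True` and `max_samples=None`.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.utils import check_random_state


def gini_impurity(counts):
    """Gini impurity per node from a (n_nodes, n_classes) class-count matrix."""
    n = counts.sum(axis=1)
    nonzero = n > 0
    g = np.zeros(len(n))
    p = counts[nonzero] / n[nonzero, None]
    g[nonzero] = 1.0 - (p ** 2).sum(axis=1)
    return g, n


def oob_gini_importance(forest, X, y):
    """Accumulate each tree's Gini gains, evaluated on that tree's OOB rows."""
    n_samples, n_features = X.shape
    classes = np.unique(y)
    imp = np.zeros(n_features)
    for est in forest.estimators_:
        # Rebuild this tree's bootstrap sample to find its OOB rows; this
        # mirrors scikit-learn internals (assumes bootstrap=True, max_samples=None).
        rng = check_random_state(est.random_state)
        inbag = rng.randint(0, n_samples, n_samples)
        oob = np.setdiff1d(np.arange(n_samples), inbag)
        if oob.size == 0:
            continue
        t = est.tree_
        # Node membership of every OOB sample: sparse (n_oob, n_nodes) indicator.
        path = est.decision_path(X[oob])
        counts = np.vstack(
            [np.asarray(path[y[oob] == c].sum(axis=0)).ravel() for c in classes]
        ).T
        g, n = gini_impurity(counts)
        internal = t.children_left != -1
        l, r = t.children_left[internal], t.children_right[internal]
        # OOB-weighted impurity decrease at each internal node, credited to
        # the feature that node splits on.
        gain = n[internal] * g[internal] - n[l] * g[l] - n[r] * g[r]
        np.add.at(imp, t.feature[internal], gain / oob.size)
    return imp / len(forest.estimators_)


X, y = make_classification(n_samples=500, n_features=8, n_informative=3,
                           random_state=0)
rf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)
print(oob_gini_importance(rf, X, y))  # compare against rf.feature_importances_
```

Because the gains are evaluated on samples the tree never saw during fitting, spurious splits on noise features receive little or no credit, which is the overfitting correction the abstract describes.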

Related research

03/26/2020 · From unbiased MDI Feature Importance to Explainable AI for Trees
We attempt to give a unifying view of the various recent attempts to (i)...

12/06/2022 · The Importance of Variable Importance
Variable importance is defined as a measure of each regressor's contribu...

03/24/2021 · Dimension Reduction Forests: Local Variable Importance using Structured Random Forests
Random forests are one of the most popular machine learning methods due ...

03/15/2017 · Cost-complexity pruning of random forests
Random forests perform bootstrap-aggregation by sampling the training sa...

03/12/2019 · Unbiased Measurement of Feature Importance in Tree-Based Methods
We propose a modification that corrects for split-improvement variable i...

02/26/2021 · MDA for random forests: inconsistency, and a practical solution via the Sobol-MDA
Variable importance measures are the main tools to analyze the black-box...

05/12/2018 · A Simple and Effective Model-Based Variable Importance Measure
In the era of "big data", it is becoming more of a challenge to not only...
