Finding Influential Training Samples for Gradient Boosted Decision Trees

02/19/2018
by   Boris Sharchilev, et al.

We address the problem of finding influential training samples for a particular class of tree ensemble-based models, such as Random Forest (RF) or Gradient Boosted Decision Trees (GBDT). A natural way to formalize this problem is to study how the model's predictions change upon leave-one-out retraining, i.e., retraining with each individual training sample left out. Recent work has shown that, for parametric models, this analysis can be conducted in a computationally efficient way. We propose several ways of extending this framework to non-parametric GBDT ensembles under the assumption that tree structures remain fixed. Furthermore, we introduce a general scheme for obtaining further approximations to our method that balance the trade-off between accuracy and computational complexity. We evaluate our approaches on various experimental setups and use-case scenarios, demonstrating both the quality of our method for finding influential training samples relative to the baselines and its computational efficiency.
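The leave-one-out quantity the abstract refers to can be illustrated with a naive brute-force sketch: retrain the ensemble with each training sample removed and measure how a fixed test prediction changes. The paper's contribution is precisely avoiding this O(n) retraining; the code below only shows the quantity being approximated, using scikit-learn's GBDT implementation on synthetic data (all data, model settings, and variable names here are illustrative assumptions, not taken from the paper).

```python
# Brute-force leave-one-out influence for a GBDT, as a reference baseline.
# The influence of training sample i on a test point is the change in the
# model's prediction when i is left out of the training set and the model
# is retrained from scratch.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(40, 3))
y = X[:, 0] + 0.1 * rng.normal(size=40)
x_test = X[:1]  # fixed test point whose prediction we track

def fit_predict(X_train, y_train):
    model = GradientBoostingRegressor(n_estimators=20, random_state=0)
    model.fit(X_train, y_train)
    return model.predict(x_test)[0]

full_pred = fit_predict(X, y)

# Influence of sample i: prediction after leaving i out, minus the
# prediction of the model trained on the full data.
influences = np.array([
    fit_predict(np.delete(X, i, axis=0), np.delete(y, i)) - full_pred
    for i in range(len(X))
])

# The most influential samples are those with the largest absolute change.
top = np.argsort(-np.abs(influences))[:5]
```

Each of the n retrainings costs a full GBDT fit, which is why the paper instead derives efficient approximations under the assumption that tree structures stay fixed when one sample is removed (so only leaf values need updating).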

