On marginal feature attributions of tree-based models

02/16/2023
by Khashayar Filom, et al.

Due to their power and ease of use, tree-based machine learning models have become very popular. To interpret them, local feature attributions based on marginal expectations, e.g. marginal (interventional) Shapley, Owen or Banzhaf values, may be employed. Such feature attribution methods are true to the model and implementation invariant, i.e. they depend only on the input-output function of the model. By taking advantage of the internal structure of tree-based models, we prove that their marginal Shapley values, or more generally marginal feature attributions obtained from a linear game value, are simple (piecewise-constant) functions with respect to a certain finite partition of the input space determined by the trained model. The same is true for feature attributions obtained from the well-known TreeSHAP algorithm. Nevertheless, we show that the "path-dependent" TreeSHAP is not implementation invariant by presenting two (statistically similar) decision trees that compute the exact same function yet for which the algorithm yields different rankings of features, whereas the marginal Shapley values coincide. Furthermore, we discuss how the fact that marginal feature attributions are simple functions can potentially be exploited to compute them. An important observation, showcased by experiments with the XGBoost, LightGBM and CatBoost libraries, is that typically only a fraction of all features appears in any single tree of the ensemble; thus the complexity of computing marginal Shapley (or Owen or Banzhaf) feature attributions may be reduced. In particular, in the case of CatBoost models, the trees are oblivious (symmetric) and the number of features in each tree is no larger than its depth. We exploit this symmetry to derive an explicit formula, with improved complexity and expressed only in terms of the internal parameters of the CatBoost model, for marginal Shapley (and Banzhaf and Owen) values.
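To make the notion of marginal (interventional) feature attributions concrete, the following sketch computes marginal Shapley values by brute force: the value of a coalition S is the model's average prediction when the features in S are fixed to the explained point and the rest are drawn from a background sample. This is an illustrative reference implementation, not the paper's algorithm; `marginal_shapley` and its arguments are hypothetical names, and the exponential loop over coalitions is exactly the cost that structure-aware methods (such as the paper's formula for oblivious CatBoost trees) aim to avoid.

```python
from itertools import combinations
from math import factorial

import numpy as np

def marginal_shapley(f, x, background):
    """Brute-force marginal (interventional) Shapley values.

    f          : model; maps an (m, d) array to (m,) predictions
    x          : 1-D array of length d, the point being explained
    background : (m, d) array of reference points
    """
    d = len(x)
    phi = np.zeros(d)

    def v(coalition):
        # v(S): average prediction with features in S fixed to x,
        # remaining features taken from the background sample.
        X = background.copy()
        idx = list(coalition)
        X[:, idx] = x[idx]
        return f(X).mean()

    for i in range(d):
        others = [j for j in range(d) if j != i]
        for size in range(d):
            # Shapley weight |S|! (d - |S| - 1)! / d!
            weight = factorial(size) * factorial(d - size - 1) / factorial(d)
            for S in combinations(others, size):
                phi[i] += weight * (v(S + (i,)) - v(S))
    return phi
```

For an additive model such as `f(X) = X[:, 0] + 2 * X[:, 1]` with a zero background, the attributions reduce to the individual terms, and their sum recovers `f(x)` minus the average background prediction (the efficiency property).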


