Data-driven advice for interpreting local and global model predictions in bioinformatics problems

08/13/2021
by Markus Loecher, et al.

Tree-based algorithms such as random forests and gradient-boosted trees remain among the most popular and powerful machine learning models across multiple disciplines. The conventional approach to estimating a feature's impact in tree-based models is to measure the node-wise reduction of a loss function, which (i) yields only global importance measures and (ii) is known to suffer from severe biases. Conditional feature contributions (CFCs) provide local, case-by-case explanations of a prediction by following the decision path and attributing changes in the model's expected output to each feature along the path. However, Lundberg et al. pointed out a potential bias of CFCs that depends on a node's distance from the root of the tree. The now immensely popular alternative, SHapley Additive exPlanation (SHAP) values, appears to mitigate this bias but is computationally much more expensive. Here we contribute a thorough comparison of the explanations computed by both methods on a set of 164 publicly available classification problems, in order to provide data-driven algorithm recommendations to current researchers. For random forests, we find extremely high similarities and correlations between both local and global SHAP values and CFC scores, leading to very similar rankings and interpretations. Analogous conclusions hold for the fidelity of using global feature importance scores as a proxy for the predictive power associated with each feature.
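The CFC idea described in the abstract can be sketched for scikit-learn random forests as follows. This is a minimal illustration, not the authors' code: each sample's decision path is walked from the root, and the change in the node-wise mean prediction at every split is attributed to the splitting feature. The function name and the synthetic dataset are illustrative assumptions.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

def cfc_contributions(tree, X):
    """Conditional feature contributions (Saabas values) for one decision tree.

    Follows each sample's decision path and attributes the change in the
    node-wise class-1 probability to the feature split at each internal node.
    Returns the root ("bias") probability and an (n_samples, n_features)
    contribution matrix; bias + row sum equals the tree's leaf probability.
    """
    t = tree.tree_
    node_value = t.value[:, 0, :]                 # per-node class counts/fractions
    node_prob = node_value[:, 1] / node_value.sum(axis=1)  # class-1 probability
    contribs = np.zeros_like(X, dtype=float)
    bias = node_prob[0]
    for i, x in enumerate(X):
        node = 0
        while t.children_left[node] != -1:        # stop at a leaf
            f = t.feature[node]
            child = (t.children_left[node] if x[f] <= t.threshold[node]
                     else t.children_right[node])
            contribs[i, f] += node_prob[child] - node_prob[node]
            node = child
    return bias, contribs

X, y = make_classification(n_samples=300, n_features=5, random_state=0)
rf = RandomForestClassifier(n_estimators=10, random_state=0).fit(X, y)

# Average CFCs over all trees of the forest.
biases, contribs = zip(*(cfc_contributions(est, X) for est in rf.estimators_))
bias = np.mean(biases)
local_cfc = np.mean(contribs, axis=0)

# Local explanations are exactly additive: bias + contributions reconstruct
# the forest's predicted class-1 probability for every sample.
assert np.allclose(bias + local_cfc.sum(axis=1), rf.predict_proba(X)[:, 1])

# A global importance score, analogous to how SHAP values are aggregated,
# is the mean absolute local contribution per feature.
global_cfc = np.abs(local_cfc).mean(axis=0)
```

The key practical point, in line with the abstract, is that CFCs require only a single pass over each decision path, whereas tree SHAP must account for all feature subsets along the path, which makes it considerably more expensive while (for random forests) yielding very similar local and global scores.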


Related research:

- From global to local MDI variable importances for random forests and when they are Shapley values (11/03/2021)
- Instance-based Explanations for Gradient Boosting Machine Predictions with AXIL Weights (01/05/2023)
- Explainable AI for Trees: From Local Explanations to Global Understanding (05/11/2019)
- Unifying local and global model explanations by functional decomposition of low dimensional structures (08/12/2022)
- Grouping Shapley Value Feature Importances of Random Forests for explainable Yield Prediction (04/14/2023)
- From unbiased MDI Feature Importance to Explainable AI for Trees (03/26/2020)
