Consistent feature attribution for tree ensembles

06/19/2017
by Scott M. Lundberg, et al.

In many applications it is critical to understand which features are important to a model and why individual predictions are made. For tree ensemble methods, these questions are usually answered by attributing importance values to input features, either globally or for a single prediction. Here we show that current feature attribution methods are inconsistent: changing a model so that it relies more on a given feature can actually decrease the importance assigned to that feature. To address this problem, we develop fast exact algorithms for SHAP (SHapley Additive exPlanation) values, which were recently shown to be the unique additive feature attribution method based on conditional expectations that is both consistent and locally accurate. We integrate these improvements into the latest version of XGBoost, demonstrate the inconsistencies of current methods, and show that using SHAP values significantly improves supervised clustering performance. Feature importance values are a key part of understanding widely used models such as gradient boosted trees and random forests, so improvements to them have broad practical implications.
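As a minimal sketch of what the abstract describes (not code from the paper): XGBoost exposes the fast exact TreeSHAP computation through the `pred_contribs=True` option of `Booster.predict`. The toy dataset, labels, and hyperparameters below are illustrative assumptions.

```python
# Minimal sketch: exact per-prediction SHAP values from an XGBoost model via
# pred_contribs=True. Toy data and hyperparameters are illustrative, not
# taken from the paper.
import numpy as np
import xgboost as xgb

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))             # 200 samples, 4 features
y = (X[:, 0] + X[:, 1] > 0).astype(int)   # labels depend on the first two features

dtrain = xgb.DMatrix(X, label=y)
model = xgb.train({"objective": "binary:logistic"}, dtrain, num_boost_round=50)

# One row per sample: a SHAP value per feature plus a final bias column
# (the expected model output). Shape: (n_samples, n_features + 1).
shap_values = model.predict(dtrain, pred_contribs=True)

# Local accuracy: the attributions for each sample sum to the model's
# margin (log-odds) output for that sample.
margin = model.predict(dtrain, output_margin=True)
assert np.allclose(shap_values.sum(axis=1), margin, atol=1e-3)
```

The final assertion illustrates the local-accuracy property the abstract refers to: the per-feature attributions plus the bias term reconstruct the model's raw prediction exactly.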

Related research

02/12/2018 · Consistent Individualized Feature Attribution for Tree Ensembles
Interpreting predictions from tree ensemble methods such as gradient boo...

12/16/2021 · Exact Shapley Values for Local and Model-True Explanations of Decision Tree Ensembles
Additive feature explanations using Shapley values have become popular f...

05/18/2019 · Disentangled Attribution Curves for Interpreting Random Forests and Boosted Trees
Tree ensembles, such as random forests and AdaBoost, are ubiquitous mach...

03/23/2022 · On Understanding the Influence of Controllable Factors with a Feature Attribution Algorithm: a Medical Case Study
Feature attribution XAI algorithms enable their users to gain insight in...

10/18/2021 · RKHS-SHAP: Shapley Values for Kernel Methods
Feature attribution for kernel methods is often heuristic and not indivi...

02/16/2023 · The Inadequacy of Shapley Values for Explainability
This paper develops a rigorous argument for why the use of Shapley value...

11/08/2022 · Individualized and Global Feature Attributions for Gradient Boosted Trees in the Presence of ℓ_2 Regularization
While ℓ_2 regularization is widely used in training gradient boosted tre...
