Shapley Decomposition of R-Squared in Machine Learning Models

08/26/2019
by Nickalus Redell, et al.

In this paper we introduce a metric that helps machine learning practitioners quickly summarize and communicate the overall importance of each feature in any black-box machine learning prediction model. Our proposed metric, based on a Shapley-value variance decomposition of the familiar R^2 from classical statistics, is a model-agnostic approach to assessing feature importance that fairly allocates the proportion of model-explained variability in the data to each model feature. The metric has several desirable properties, including values bounded between 0 and 1 and a feature-level variance decomposition that sums to the overall model R^2. In contrast to related methods for computing feature-level R^2 variance decompositions with linear models, our method uses pre-computed Shapley values, which shifts the computational burden from iteratively fitting many models to computing the Shapley values themselves. With recent advances in Shapley value calculations for gradient boosted decision trees and neural networks, computing our proposed metric after model training can come with minimal computational overhead. Our implementation is available in the R package shapFlex.
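The properties named in the abstract (feature shares bounded between 0 and 1 that sum to the overall model R^2, computed from pre-computed Shapley values without refitting any model) can be illustrated with a minimal Python sketch. The allocation rule used here, splitting R^2 in proportion to how much the squared error grows when a feature's Shapley contribution is removed from the prediction, is an assumption made for illustration and may differ from the paper's exact formula. The function name `r2_shares` is hypothetical and is not part of shapFlex. The toy data uses a linear model with known coefficients, for which the Shapley value of feature j under independent features is its coefficient times the feature's deviation from its mean.

```python
import numpy as np


def r2_shares(y, shap_values, baseline):
    """Split a model's R^2 across features using precomputed Shapley values.

    y           : (n,) observed targets
    shap_values : (n, p) Shapley values, one column per feature
    baseline    : scalar with baseline + shap_values.sum(axis=1) == prediction

    Illustrative assumption: shares are proportional to the increase in
    squared error when a feature's Shapley column is removed from the
    prediction. This is not claimed to be the paper's formula.
    """
    y = np.asarray(y, dtype=float)
    phi = np.asarray(shap_values, dtype=float)
    y_hat = baseline + phi.sum(axis=1)

    ss_tot = np.sum((y - y.mean()) ** 2)
    ss_res = np.sum((y - y_hat) ** 2)
    r2 = max(0.0, 1.0 - ss_res / ss_tot)  # clipped so the total lies in [0, 1]

    # Squared error of the prediction with feature j's contribution removed.
    ss_without = np.sum((y[:, None] - (y_hat[:, None] - phi)) ** 2, axis=0)
    increase = np.clip(ss_without - ss_res, 0.0, None)

    total = increase.sum()
    if total == 0.0:
        return r2, np.zeros(phi.shape[1])
    return r2, r2 * increase / total  # non-negative shares summing to r2


# Toy data: linear model with known coefficients, so no model fitting is needed.
rng = np.random.default_rng(0)
n = 2000
X = rng.normal(size=(n, 3))
beta = np.array([3.0, 1.0, 0.0])
intercept = 2.0
y = intercept + X @ beta + rng.normal(size=n)

# Shapley values for a linear model with independent features.
phi = beta * (X - X.mean(axis=0))
baseline = intercept + X.mean(axis=0) @ beta

r2, shares = r2_shares(y, phi, baseline)
print("model R^2:", r2)
print("feature shares:", shares)
```

Because the shares are non-negative and scaled to sum to the clipped R^2, each one necessarily falls between 0 and 1. The only inputs are the targets and the Shapley value matrix, which is the sense in which the cost moves from repeated model fitting to the Shapley computation itself.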

research
12/20/2022

A Generalized Variable Importance Metric and Estimator for Black Box Machine Learning Models

The aim of this study is to define importance of predictors for black bo...
research
04/08/2019

Sampling, Intervention, Prediction, Aggregation: A Generalized Framework for Model Agnostic Interpretations

Non-linear machine learning models often trade off a great predictive pe...
research
08/26/2022

PDD-SHAP: Fast Approximations for Shapley Values using Functional Decomposition

Because of their strong theoretical properties, Shapley values have beco...
research
12/12/2022

Explainable Performance

We introduce the XPER (eXplainable PERformance) methodology to measure t...
research
06/16/2021

mSHAP: SHAP Values for Two-Part Models

Two-part models are important to and used throughout insurance and actua...
research
06/25/2020

Neural Decomposition: Functional ANOVA with Variational Autoencoders

Variational Autoencoders (VAEs) have become a popular approach for dimen...
research
09/06/2021

Bringing a Ruler Into the Black Box: Uncovering Feature Impact from Individual Conditional Expectation Plots

As machine learning systems become more ubiquitous, methods for understa...
