A Comparative Study of Methods for Estimating Conditional Shapley Values and When to Use Them

05/16/2023
by   Lars Henry Berge Olsen, et al.

Shapley values originated in cooperative game theory but are today used extensively as a model-agnostic framework for explaining predictions made by complex machine learning models in industry and academia. There are several algorithmic approaches for computing different versions of Shapley value explanations. Here, we focus on conditional Shapley values for predictive models fitted to tabular data. Estimating precise conditional Shapley values is difficult as they require the estimation of non-trivial conditional expectations. In this article, we develop new methods, extend earlier proposed approaches, and systematize the new, refined, and existing methods into different method classes for comparison and evaluation. The method classes use either Monte Carlo integration or regression to model the conditional expectations. We conduct extensive simulation studies to evaluate how precisely the different method classes estimate the conditional expectations, and thereby the conditional Shapley values, for different setups. We also apply the methods to several real-world data experiments and provide recommendations for when to use the different method classes and approaches. Roughly speaking, we recommend using parametric methods when we can specify the data distribution almost correctly, as they generally produce the most accurate Shapley value explanations. When the distribution is unknown, both generative methods and regression models with a similar form to the underlying predictive model are good and stable options. Regression-based methods are often slow to train but produce the Shapley value explanations quickly once trained. The reverse is true for Monte Carlo-based methods, making the different methods appropriate in different practical situations.
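
To make the Monte Carlo route concrete, here is a minimal sketch of how a conditional expectation E[f(x) | x_S = x_S*] might be estimated when the features are assumed to follow a multivariate Gaussian with known mean and covariance. This is an illustration under that stated assumption, not the authors' implementation: the function names `conditional_gaussian` and `mc_contribution`, their parameters, and the requirement that S is a non-empty proper subset of the features are all hypothetical choices made for this example.

```python
import numpy as np


def conditional_gaussian(mu, cov, S, x_S):
    """Mean and covariance of the unobserved features given x_S.

    Assumes x ~ N(mu, cov) and that S is a non-empty proper subset
    of the feature indices (an assumption of this sketch).
    """
    S = np.asarray(S, dtype=int)
    Sbar = np.setdiff1d(np.arange(len(mu)), S)
    cov_SS = cov[np.ix_(S, S)]
    cov_bS = cov[np.ix_(Sbar, S)]
    cov_bb = cov[np.ix_(Sbar, Sbar)]
    # Standard Gaussian conditioning formulas.
    K = cov_bS @ np.linalg.inv(cov_SS)
    cond_mu = mu[Sbar] + K @ (x_S - mu[S])
    cond_cov = cov_bb - K @ cov_bS.T
    return Sbar, cond_mu, cond_cov


def mc_contribution(f, mu, cov, S, x_star, n_samples=5000, seed=None):
    """Monte Carlo estimate of E[f(x) | x_S = x_star[S]] under the Gaussian assumption."""
    rng = np.random.default_rng(seed)
    x_star = np.asarray(x_star, dtype=float)
    S = np.asarray(S, dtype=int)
    Sbar, cond_mu, cond_cov = conditional_gaussian(mu, cov, S, x_star[S])
    # Sample the unobserved features from their conditional distribution.
    draws = rng.multivariate_normal(cond_mu, cond_cov, size=n_samples)
    # Keep the observed features fixed at the explained instance's values.
    X = np.tile(x_star, (n_samples, 1))
    X[:, Sbar] = draws
    # Average the model predictions over the sampled instances.
    return float(np.mean(f(X)))
```

Here `f` is any function mapping an array of shape `(n_samples, n_features)` to predictions. Repeating the estimate for each coalition S is the costly step, which is one way to read the abstract's remark that Monte Carlo-based methods need no training but are slower when producing explanations.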

research
10/14/2020

Conditional Monte Carlo revisited

Conditional Monte Carlo refers to sampling from the conditional distribu...
research
04/25/2021

Sampling Permutations for Shapley Value Estimation

Game-theoretic attribution techniques based on Shapley values are used e...
research
11/24/2021

An efficient estimation of nested expectations without conditional sampling

Estimating nested expectations is an important task in computational mat...
research
05/26/2023

Detecting Errors in Numerical Data via any Regression Model

Noise plagues many numerical datasets, where the recorded values in the ...
research
11/26/2021

Using Shapley Values and Variational Autoencoders to Explain Predictive Models with Dependent Mixed Features

Shapley values are today extensively used as a model-agnostic explanatio...
research
07/20/2023

Conditional expectation network for SHAP

A very popular model-agnostic technique for explaining predictive models...
research
10/20/2022

Iteratively Reweighted Least Squares Method for Estimating Polyserial and Polychoric Correlation Coefficients

An iteratively reweighted least squares (IRLS) method is proposed for es...
