Explaining the data or explaining a model? Shapley values that uncover non-linear dependencies

07/12/2020
by Daniel Vidali Fryer, et al.

Shapley values have become increasingly popular in the machine learning literature thanks to their attractive axiomatisation, flexibility, and uniqueness in satisfying certain notions of "fairness". The flexibility arises from the myriad potential forms of the Shapley value game formulation. One consequence of this flexibility is that many types of Shapley values are now being discussed, and this variety is a source of potential misunderstanding. To the best of our knowledge, all existing game formulations in the machine learning and statistics literature fall into a category which we name the model-dependent category of game formulations. In this work, we consider an alternative and novel formulation which leads to the first instance of what we call model-independent Shapley values. These Shapley values use a (non-parametric) measure of non-linear dependence as the characteristic function. The strength of these Shapley values is in their ability to uncover and attribute non-linear dependencies amongst features. We introduce and demonstrate the use of the energy distance correlations, affine-invariant distance correlation, and Hilbert-Schmidt independence criterion as Shapley value characteristic functions. In particular, we demonstrate their potential value for exploratory data analysis and model diagnostics. We conclude with an interesting expository application to a classical medical survey data set.
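
To make the idea of a dependence measure acting as the characteristic function concrete, here is a minimal sketch in Python. It is not the authors' code. It assumes the value of a feature subset S is the sample distance correlation between the response y and the columns X_S, with the empty set valued at 0, and it enumerates all subsets exactly, so it is only practical for a handful of features. The function names `distance_correlation` and `shapley_values` and the toy data are hypothetical choices made for illustration.

```python
import itertools
import math

import numpy as np


def _centered_distances(a):
    """Double-centred Euclidean distance matrix of the rows of a."""
    a = np.asarray(a, dtype=float)
    if a.ndim == 1:
        a = a[:, None]
    d = np.sqrt(((a[:, None, :] - a[None, :, :]) ** 2).sum(axis=-1))
    return d - d.mean(axis=0, keepdims=True) - d.mean(axis=1, keepdims=True) + d.mean()


def distance_correlation(x, y):
    """Sample distance correlation (V-statistic form), a value in [0, 1]."""
    A = _centered_distances(x)
    B = _centered_distances(y)
    dcov2 = (A * B).mean()
    denom = math.sqrt((A * A).mean() * (B * B).mean())
    return 0.0 if denom == 0 else math.sqrt(max(dcov2, 0.0) / denom)


def shapley_values(X, y, value=distance_correlation):
    """Exact Shapley values with v(S) = value(X[:, S], y) and v(empty) = 0.

    Assumption: this choice of v is an illustrative reading of the abstract,
    not a confirmed reproduction of the paper's formulation.
    """
    d = X.shape[1]
    cache = {}

    def v(S):
        key = tuple(sorted(S))
        if key not in cache:
            cache[key] = 0.0 if not key else value(X[:, list(key)], y)
        return cache[key]

    phi = np.zeros(d)
    for j in range(d):
        others = [k for k in range(d) if k != j]
        for r in range(d):
            # Weight of a coalition of size r in the Shapley formula.
            w = math.factorial(r) * math.factorial(d - r - 1) / math.factorial(d)
            for S in itertools.combinations(others, r):
                phi[j] += w * (v(S + (j,)) - v(S))
    return phi


# Toy data (hypothetical): y depends on feature 0 only, and non-linearly.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = X[:, 0] ** 2 + 0.1 * rng.normal(size=200)
phi = shapley_values(X, y)
```

By the efficiency axiom, the entries of `phi` sum to the distance correlation between y and all features taken together. No model is fitted at any point, which is the sense in which such values describe the data and not a model.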

research
03/25/2019

Explaining individual predictions when features are dependent: More accurate approximations to Shapley values

Explaining complex or seemingly simple machine learning models is a prac...
research
09/17/2019

The Explanation Game: Explaining Machine Learning Models with Cooperative Game Theory

Recently, a number of techniques have been proposed to explain a machine...
research
01/18/2022

Socioeconomic disparities and COVID-19: the causal connections

The analysis of causation is a challenging task that can be approached i...
research
04/09/2022

Generalised Mathematical Formulations for Non-Linear Optimized Scheduling

In practice, most of the optimization problems are non-linear requiring ...
research
02/12/2021

Explaining predictive models using Shapley values and non-parametric vine copulas

The original development of Shapley values for prediction explanation re...
research
08/22/2019

The many Shapley values for model explanation

The Shapley value has become a popular method to attribute the predictio...
research
08/16/2022

Measuring Statistical Dependencies via Maximum Norm and Characteristic Functions

In this paper, we focus on the problem of statistical dependence estimat...