Counterfactual Shapley Additive Explanations

by Emanuele Albini, et al.
J.P. Morgan

Feature attributions are a common paradigm for model explanations due to their simplicity in assigning a single numeric score for each input feature to a model. In the actionable recourse setting, wherein the goal of the explanations is to improve outcomes for model consumers, it is often unclear how feature attributions should be correctly used. With this work, we aim to strengthen and clarify the link between actionable recourse and feature attributions. Concretely, we propose a variant of SHAP, CoSHAP, that uses counterfactual generation techniques to produce a background dataset for use within the marginal (a.k.a. interventional) Shapley value framework. We motivate the need within the actionable recourse setting for careful consideration of background datasets when using Shapley values for feature attributions, alongside the requirement for monotonicity, with numerous synthetic examples. Moreover, we demonstrate the efficacy of CoSHAP by proposing and justifying a quantitative score for feature attributions, counterfactual-ability, showing that as measured by this metric, CoSHAP is superior to existing methods when evaluated on public datasets using monotone tree ensembles.




1. Introduction

Government regulators are placing increasing emphasis on fairness and discrimination issues in decision-making processes that use machine learning algorithms in high-stakes contexts such as finance and healthcare. For example, U.S. credit regulations (U.S. Congress, 2018) put particular emphasis on the need to explain automatic decisions in terms of the key factors that contributed to an adverse decision. Meanwhile, several techniques have been proposed in the academic literature to address this issue (see Barredo Arrieta et al., 2020; Guidotti et al., 2019; Adadi and Berrada, 2018; Čyras et al., 2021 for an overview).

In the context of local explainability, many of the approaches that researchers have focused on in recent years are based on the notion of feature attribution, i.e., distributing the output of the model for a specific input among its features (e.g., LIME (Ribeiro et al., 2016), SHAP (Lundberg and Lee, 2017), GIG (Merrill et al., 2019)). In this paper we focus in particular on SHAP, one of the most popular techniques for generating local explanations, which is based on the notion of Shapley value (Shapley, 1951) from game theory. Shapley value-based frameworks for Explainable AI (XAI) consider each feature as a player in an $n$-person game in order to fairly distribute the contribution of each feature to the output of the model. To do so, they compare the output of the (same) model when a feature is present with its output when the same feature is missing. Two main limitations of this approach have been raised in the literature:

  1. It is not clear how to define the output of the model when a feature is missing. The most common approach is to estimate it as an expectation over a background distribution of the input features (Merrick and Taly, 2020).

  2. There is no explicit guidance on how users might alter their behaviour in a desirable way (Kumar et al., 2020).

Another popular area of research has developed around counterfactual explanations, also known as algorithmic recourse, i.e., given a specific input one must find the “closest possible world (input)” (Wachter et al., 2018) that gives rise to a different outcome. In practice, this means that these approaches aim to find one (or more) points that are (1) close to the one we want to explain; and (2) “plausible” (plausibility is defined in different ways in the literature; see (Keane et al., 2021) for more insights). Counterfactual explanations have two main limitations:

  1. Most of the approaches in the literature are limited to finding a single counterfactual point. While this may give users a clear understanding of what they could do in order to reverse an adverse outcome, it does not allow them to choose the changes that are best suited to them.

  2. While there have been some attempts at generating diverse sets of counterfactual points (e.g., (Mothilal et al., 2020; Russell, 2019)), there is no consensus on how to limit the cognitive load imposed on the user by the sheer amount of information provided or, in other words, on how to provide a more amenable explanation (in terms of size), as also advocated from a social science perspective (Miller, 2019).

In this paper we present how these two general approaches for explainability can be combined in order to provide a counterfactual feature attribution grounded on the game-theoretic approach afforded by Shapley values that we call Counterfactual SHAP (CoSHAP). We are motivated by the desire to retain the simple form of explanation provided by feature attributions, while introducing the actionability properties of counterfactual explanations.

In particular, our contributions are as follows.

  • We enumerate the assumptions that are necessary to interpret Shapley values in a counterfactual sense and discuss what it means for a feature attribution method to demonstrate counterfactual behaviour.

  • We introduce the notion of counterfactual-ability of a feature attribution as a way to quantitatively evaluate its ability to suggest to the user how to act upon the input in order to overcome an adverse prediction.

  • We propose to use (a uniform distribution over) a set of counterfactual points as the background distribution for the computation of Shapley values in order to achieve higher counterfactual-ability, yielding the CoSHAP algorithm.

  • We benchmark the CoSHAP algorithm using a number of different counterfactual generation techniques from the literature against baseline feature attribution techniques. CoSHAP (using $k$-NN as the counterfactual generation technique) is shown to have the best counterfactual-ability on several public datasets taken from the financial domain.

We note that in this paper we concentrate on tree-based models for the following reasons: (1) in the context of classification and regression for tabular data, tree-based ensemble models such as XGBoost, CatBoost and LightGBM are considered state of the art in terms of performance and are therefore widely adopted in many industries, including finance (Sudjianto and Zoldi, 2021); (2) interventional Shapley values can be computed exactly for tree-based models using the algorithm proposed in (Lundberg et al., 2020).

We also note that in this paper we put particular emphasis on models that are monotone, i.e., models whose output is (positively or negatively) monotonic with respect to each of the input features. We do this for two reasons: (1) monotonicity is a key precondition for interpreting a feature attribution in counterfactual terms, as will be described in detail in Section 3.2; (2) high-stakes models (which are usually the ones most in need of explanations) are usually constrained to be simpler at design stage with the objective of making them more interpretable. One such common constraint is imposing some kind of monotonic relationship between the features and the model output (Sudjianto and Zoldi, 2021).

2. Background

In the remainder of this paper we consider a binary classification model $f$ that maps an input $\mathbf{x}$ to a real-valued score, together with a decision function $F$ that thresholds this score to obtain a binary prediction. We use lower-case bold symbols to indicate vectors.

We refer to $f(\mathbf{x})$ as the model output and to $F(\mathbf{x})$ as the model prediction or outcome. Note that, without loss of generality, we use a fixed decision threshold for the binary prediction. Moreover, without loss of generality, in the sequel we assume that an input whose model output exceeds the decision threshold receives an adverse outcome for the user. We also note that all the results in this paper can be trivially generalized to multi-class models.

2.1. Shapley values

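The Shapley values method is a technique used in classic game theory to fairly attribute the payoff to the players in an $n$-player cooperative game. Formally, given a set of players $N = \{1, \dots, n\}$ and the characteristic function of the game $v : 2^N \to \mathbb{R}$, the Shapley value of player $i \in N$ is defined as:

$$\phi_i = \sum_{S \subseteq N \setminus \{i\}} \frac{|S|! \, (n - |S| - 1)!}{n!} \left[ v(S \cup \{i\}) - v(S) \right]$$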

In the context of machine learning models the players are the features of the model and several ways have been proposed to simulate feature absence in the characteristic function (e.g., retraining the model without such feature (Strumbelj and Kononenko, 2010), or using the conditional or marginal expectations over a background distribution (Lundberg and Lee, 2017)).
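To make the definition concrete, the following minimal sketch enumerates every coalition of a small cooperative game and applies the Shapley formula directly. The function names and the toy game are illustrative assumptions, not part of the original work, and the enumeration is exponential in the number of players, so it is only suitable for very small games.

```python
from itertools import combinations
from math import factorial


def shapley_values(n, v):
    """Exact Shapley values for an n-player game.

    `v` maps a frozenset of player indices (0..n-1) to a real payoff.
    """
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Weight of a coalition S of this size that excludes player i.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = frozenset(subset)
                phi[i] += weight * (v(s | {i}) - v(s))
    return phi


def toy_game(coalition):
    """Illustrative game: the payoff is 1 only if players 0 and 1 cooperate."""
    return 1.0 if {0, 1} <= coalition else 0.0


phi = shapley_values(3, toy_game)
```

In this toy game players 0 and 1 share the payoff equally and player 2, who never changes the payoff, receives nothing.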

In this paper we use the approximation of the characteristic function proposed in (Lundberg and Lee, 2017) and (Lundberg et al., 2020) that simulates the absence of a feature using the marginal expectation over a background distribution $\mathcal{B}$:

$$v(S) = \mathbb{E}_{\mathbf{b} \sim \mathcal{B}} \left[ f(\mathbf{x}_S, \mathbf{b}_{\bar{S}}) \right]$$

where, with an abuse of notation, $f(\mathbf{x}_S, \mathbf{b}_{\bar{S}})$ indicates the output of the model with feature values $\mathbf{x}$ for the features in $S$ and feature values $\mathbf{b}$ for the features not in $S$. In the remainder of this paper we will refer to the space of Shapley values as $\Phi$ and to the Shapley values vector of $\mathbf{x}$ as $\boldsymbol{\phi}$.
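The marginal (interventional) characteristic function can be illustrated with a brute-force sketch in which the background distribution is a finite background dataset. The names `marginal_value`, `marginal_shapley` and `toy_model` are hypothetical, and this enumeration is not the exact tree-based algorithm of (Lundberg et al., 2020); it only shows what quantity that algorithm computes.

```python
from itertools import combinations
from math import factorial


def marginal_value(model, x, background, subset):
    """Mean model output with features in `subset` taken from x
    and all remaining features taken from each background point."""
    total = 0.0
    for b in background:
        hybrid = [x[j] if j in subset else b[j] for j in range(len(x))]
        total += model(hybrid)
    return total / len(background)


def marginal_shapley(model, x, background):
    """Brute-force Shapley values of x against a finite background dataset."""
    n = len(x)
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            w = factorial(size) * factorial(n - size - 1) / factorial(n)
            for combo in combinations(others, size):
                s = set(combo)
                phi[i] += w * (
                    marginal_value(model, x, background, s | {i})
                    - marginal_value(model, x, background, s)
                )
    return phi


def toy_model(z):
    """Hypothetical score with one additive term and one interaction."""
    return 2.0 * z[0] + z[1] * z[2]


x = [1.0, 2.0, 3.0]
background = [[0.0, 0.0, 0.0], [1.0, 1.0, 1.0]]
phi = marginal_shapley(toy_model, x, background)
```

The attributions sum to the difference between the model output at `x` and the mean output over the background dataset, which is why the choice of background determines the contrastive question being answered.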

Figure 1. Effect of different choices of background dataset on the Shapley values of the same input with the same model. Red regions correspond to areas of the feature space where the decision is adverse, and blue regions to areas where it is not. Coloured arrows and scatter points represent the directions of the Shapley values vector and the background dataset used for their computation, respectively.

2.2. Counterfactual Explanations

In its basic form, a (local) counterfactual explanation (CF) for an input $\mathbf{x}$ is a point $\mathbf{x}'$ such that (1) $\mathbf{x}'$ gives rise to a different prediction, i.e., $F(\mathbf{x}') \neq F(\mathbf{x})$, (2) $\mathbf{x}$ and $\mathbf{x}'$ are close (under some distance metric) and (3) $\mathbf{x}'$ is a “plausible” input. This last constraint has been interpreted in several ways in the literature; it may involve considerations about sparsity (e.g., (Smyth and Keane, 2021)), closeness to the data manifold (e.g., (Pawelczyk et al., 2020)), causality (e.g., (Karimi et al., 2021)), actionability (e.g., (Ustun et al., 2019; Poyiadzi et al., 2020)) or a combination thereof (e.g., (Dandl et al., 2020)). A plethora of techniques for the generation of counterfactuals exist in the literature, using search algorithms (e.g., (Wachter et al., 2018; Albini et al., 2020; Spooner et al., 2021)), optimization (e.g., (Kanamori et al., 2020)) and genetic algorithms (e.g., (Sharma et al., 2020)), among other methods (we refer the reader to (Keane et al., 2021; Stepin et al., 2021; Karimi et al., 2020; Verma et al., 2020) for recent surveys).

We note that, in the scope of this paper, we consider only counterfactual explanation methods that (1) are able to generate a diverse set of counterfactuals and (2) do not require the model to be differentiable since, as described in Section 1, we focus on tree-based models. We note that few counterfactual explanation techniques satisfying both of these requirements exist in the literature.

3. Interpreting Shapley values in counterfactual terms

In general, Shapley values do not have an obvious interpretation in counterfactual terms: they do not provide suggestions on how users can change their features in order to change the prediction (Kumar et al., 2020; Barocas et al., 2020). We argue that this is due to two main reasons: (1) the “arbitrary” choice of background distribution for the computation of Shapley values and (2) the lack of a monotonicity constraint on the model. We now discuss these two reasons in detail.

3.1. Choice of the background distribution

Shapley values describe the contributions of the players (features) to the game payoff (model output). In the context of machine learning model explainability an important assumption is made: each feature’s absence in the cooperative game is simulated using a background distribution $\mathcal{B}$. As pointed out in (Merrick and Taly, 2020), this means that Shapley values explain the prediction of an input $\mathbf{x}$ in contrast to a distribution of background points. In practice, the background distribution is taken as a uniform distribution over unit point masses at a finite number of points, called the background dataset.

Therefore, the background dataset should be chosen according to the contrastive question that we aim to answer. We list some of the most common distributions that have been proposed/used.

  • Training Set (TRAIN) (Lundberg and Lee, 2017). The whole training set, including the samples that are labelled and/or predicted as belonging to the same class as the input.

  • Differently-Labelled Samples (DIFF-LAB). The samples in the training set labelled differently from the input.

  • Differently-Predicted Samples (DIFF-PRED). The samples in the training set predicted as belonging to a different class.

  • Differently-Predicted Samples Median (DIFF-MEDIAN). A single point obtained as the feature-wise median of the points predicted as belonging to a different class.
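The four input-invariant choices listed above can be sketched as simple filters over the training set. The function name `build_backgrounds`, its arguments and the toy data are assumptions made for illustration; the dictionary keys follow the baseline names used in the experiments, and the sketch assumes that at least one training point is predicted differently from the input.

```python
from statistics import median


def build_backgrounds(X_train, y_train, predict, x_label, x_pred):
    """Build the input-invariant background datasets for an input with
    label `x_label` and prediction `x_pred`."""
    diff_lab = [row for row, y in zip(X_train, y_train) if y != x_label]
    diff_pred = [row for row in X_train if predict(row) != x_pred]
    n_features = len(X_train[0])
    # A single synthetic point: feature-wise median of DIFF-PRED.
    diff_median = [[median(row[j] for row in diff_pred) for j in range(n_features)]]
    return {
        "TRAIN": list(X_train),
        "DIFF-LAB": diff_lab,
        "DIFF-PRED": diff_pred,
        "DIFF-MEDIAN": diff_median,
    }


# Toy data and a hypothetical decision function (1 = adverse outcome).
X_train = [[0.1, 0.2], [0.4, 0.9], [0.8, 0.3], [0.9, 0.7]]
y_train = [0, 0, 1, 1]
predict = lambda row: int(row[0] + row[1] > 1.0)

backgrounds = build_backgrounds(X_train, y_train, predict, x_label=1, x_pred=1)
```

None of these datasets depends on the feature values of the input being explained, only on its label or prediction.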

These choices of background dataset have in common the fact that they are defined a priori, i.e., they are the same for all inputs. This means that we are contrasting an input $\mathbf{x}$ with an (input-invariant) distribution that may be very different from $\mathbf{x}$. This can give rise to explanations that are sometimes misleading for a user who is typically interested in understanding which features led to the adverse outcome (in order to reverse it). In other words, the contrastive question that we are answering with the Shapley values is not tailored to the specific input (user). Therefore, instead of answering the question “Why was a user rejected when compared to similar users that were accepted?”, we will be answering the question “Which features are most important in making my outcome different from that of other (accepted) users?” (users who are potentially very different from $\mathbf{x}$).

If we consider the example in Figure 1.i, which shows an explanation where the background dataset is the training set, we note that the Shapley values suggest that Feature A negatively contributed to the model output; this means that the current value of Feature A is “protective” against rejection when put in contrast with the expected output of the model obtained when using the training set as the background distribution. This may be useful information for the model developers, but it does not allow one to gain any (actionable) insight unless we assume access to the underlying distribution. In fact, this explanation only informs users that their Feature A is better than that of an average customer; it neither (a) advises them on how they can change their features in order to overcome the (adverse) outcome, nor (b) informs them of which features were most important in rejecting their application.

Figures 1.ii, 1.iii and 1.iv show how alternative (but still input-invariant) background distributions (DIFF-LAB, DIFF-PRED and DIFF-MEDIAN, respectively) may improve the explanations in terms of informing the user of which features were most important in rejecting their application when compared to other rejected samples, but they still lack the ability to give useful insight into which features were the most important and should therefore be acted upon in order to reverse the adverse decision. This is because the contrastive question is posed with respect to (a) samples that have much better (lower) model outputs and (b) samples that have a similar model output but are very different from $\mathbf{x}$.

Using a set of counterfactual points as the background dataset solves the issues mentioned in the preceding example. In particular, using counterfactuals as the background dataset allows one to answer a contrastive question that is (a) of interest for the user because it is comparing to samples that are similar to them (and implicitly more “reachable”) and (b) more amenable in terms of access to the underlying distributions. In fact, as mentioned earlier, a useful interpretation of Shapley values-based explanations requires access to the background distribution. Arguably, a user can relate to a set of similar customers more easily than the training set (that may contain very different users). For example, imagine a fraud-prone millionaire being rejected for a consumer trading account who is given an explanation that contrasts them with the average customer (who is very likely neither fraud-prone nor a millionaire).

For example, if we consider Figure 1.v that shows the direction of the Shapley values calculated using the 10 nearest neighbors of the input that were accepted, we note how both features are deemed as contributing to the rejection when compared to similar customers that were instead accepted. And in fact, Feature A has a higher importance than Feature B since the model is locally more sensitive to Feature A than Feature B as shown by the sharper color gradient in the horizontal direction.

As noted earlier, using a set of diverse counterfactual points as a background distribution (contrary to “classic” input-invariant background distributions) means that the background distribution depends on the input $\mathbf{x}$. Therefore, in the sequel we will denote such distributions by the name of the counterfactual technique used to generate the background dataset. For simplicity of exposition, from now on we refer to explanations that use a diverse set of counterfactual points as the background distribution for the computation of Shapley values as Counterfactual SHAP or, for short, CoSHAP.
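As an illustration of an input-dependent background, the sketch below selects the nearest training points whose prediction differs from that of the input, as in the nearest-neighbour example of Figure 1.v. The function name and the toy decision function are assumptions, and for brevity distances are computed in the raw feature space.

```python
from math import dist


def knn_counterfactual_background(x, X_train, predict, k):
    """Return the k training points closest to x whose prediction differs from x's."""
    target = predict(x)
    candidates = [row for row in X_train if predict(row) != target]
    return sorted(candidates, key=lambda row: dist(x, row))[:k]


# Hypothetical decision function (1 = adverse outcome).
predict = lambda row: int(row[0] + row[1] > 1.0)

x = [0.7, 0.6]  # rejected input
X_train = [[0.1, 0.1], [0.5, 0.4], [0.6, 0.3], [0.9, 0.9], [0.3, 0.5]]
background = knn_counterfactual_background(x, X_train, predict, k=2)
```

The resulting points can then be used as the background dataset when computing Shapley values for `x`, so that the explanation contrasts the input with similar, differently-predicted samples.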

Figure 2. Effect of the lack of a monotonicity constraint on the Shapley values of the same two inputs using the 10-NN as background dataset. White arrows indicate the monotonic trends of the model. Coloured arrows and scatter points represent the directions of the Shapley values vector and the background dataset used for their computation, respectively.

3.2. Monotonicity of the model

As remarked in (Barocas et al., 2020), feature attributions do not clearly provide guidance on how to alter the features in order to change the prediction of a model, and sometimes they can be even misleading in that respect because they make the assumption that the model is monotone.

For instance, consider Figure 2.i, which shows the Shapley values of two points for a non-monotonic model. Both explanations have a positive Shapley value for Feature A, but in order to overcome the adverse outcome Feature A must be increased for one point while it must be decreased for the other. This means that a user (who has no access to the model) is unable to tell in which direction the feature should be moved. In contrast, we observe in Figure 2.ii, where the model is monotone in both Features A and B, that the two points again have similar Shapley values. Changing Feature A in the same direction as the sign of its Shapley value will decrease the value of the model output for both points, thereby yielding a better outcome for both. In fact, if the model is monotone then moving a feature with a positive (negative) monotone trend in one direction will give rise to a predictable change in the model output in the same (opposite) direction.

Having described how the background distribution used for the computation of Shapley values and the monotonic behaviour of the model play a key role in giving a counterfactual interpretation to Shapley values, we now turn to the open question of how we can numerically measure this “counterfactual-ability” of a feature attribution. We tackle this problem in Section 4.

4. Counterfactual-ability

We seek to formalise the notion that certain feature attributions will be more useful for a model user in changing features to reverse an adverse outcome. It is important to emphasise that predicting how users might engage with explanations is a very challenging problem, and behaviour may vary dramatically depending on the context. We do not claim to resolve this problem. However, we aim to set up a flexible framework to measure the ability of an explanation to help a user reverse an adverse decision, before specialising this framework under certain plausible assumptions about how a user could act on the explanations that they receive.

Counterfactual-ability. To define the counterfactual-ability of a feature attribution we will measure the cost that a user will incur when changing the input $\mathbf{x}$ into a new input $\mathbf{x}'$ based on the information provided by the feature attribution $\boldsymbol{\phi}$.

We will measure the cost of changing an input into another input via a cost function. Formally, a cost function is a function $c$ where $c(\mathbf{x}, \mathbf{x}')$ is the cost for the user of moving from $\mathbf{x}$ to $\mathbf{x}'$. A very simple example of a cost function is the Euclidean distance defined in the input space.

In order to describe how a user acts upon the input based on the information provided by the feature attribution we use an action function. Formally, an action function is a function $A$ where $A(\mathbf{x}, \boldsymbol{\phi})$ is a subset of the input space describing the plausible changes the user may enact when provided with an explanation $\boldsymbol{\phi}$. We will refer to $A(\mathbf{x}, \boldsymbol{\phi})$ as the action subset. Note that we do not constrain the action subset to be finite.

Figure 3. Algorithmic intuition of the action function. The purple and orange lines correspond to the action subsets obtained for two different values of $k$, as per the definition in Section 3. The purple and orange points are the points in the corresponding action subsets with minimum cost according to the L2-norm. Note that the action subset (line) when taking the top-2 features has the same direction as the Shapley values vector. Also, note that the feature axes are in the quantile space.

Intuitively, the output of an action function can be interpreted as a subset of the possible options that a user may consider when changing the input based on the information provided by the feature attribution. For instance, a user may consider as possible options only changes to the most important feature according to the feature attribution. In the most extreme scenario a user may ignore the information provided by $\boldsymbol{\phi}$ and therefore consider any change as a possible option; this would correspond to a constant action function always returning the whole input space as the action subset. In a more realistic scenario, though, we expect the user to use the information provided by the feature attribution, and therefore we expect the action subset to be a restricted subset of the input space, e.g., allowing only changes to the top-3 features according to $\boldsymbol{\phi}$. Later in this section we will formally describe the action function that we use in the scope of this paper.

The counterfactual-ability of a feature attribution $\boldsymbol{\phi}$ is defined as the negation of the infimum cost to act upon $\mathbf{x}$ given the action subset for $\boldsymbol{\phi}$. Intuitively, the higher the cost, the lower the counterfactual-ability. We formally define the counterfactual-ability of a feature attribution given an input and an action function as follows.
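One way to write this definition, reconstructed here from the verbal description and using assumed symbol names ($A$ for the action function, $c$ for the cost function, $F$ for the decision function), is the following. The restriction of the infimum to points that change the prediction is inferred from the remark that, for a constant action function, the problem reduces to finding the minimum-cost counterfactual point.

```latex
\text{CF-ability}_{A}(\boldsymbol{\phi}, \mathbf{x})
  = - \inf_{\substack{\mathbf{x}' \in A(\mathbf{x}, \boldsymbol{\phi}) \\ F(\mathbf{x}') \neq F(\mathbf{x})}}
      c(\mathbf{x}, \mathbf{x}')
```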

Note that the action function is fixed for a given user; the goal is in fact to compare how different feature attributions perform under a (given) action function rather than to optimise the action function for a specific user. We note that, in the degenerate case in which the action function is a constant function always returning the whole input space, solving this optimisation problem is equivalent to finding the (possibly synthetic) counterfactual point with minimum cost from the input $\mathbf{x}$.

Note that the larger the action subset (containing the previously considered action subset), the smaller the counterfactual cost and therefore the larger the counterfactual-ability. However, if the action subset has multiple dimensions then the user must solve a difficult optimisation problem to realise the full potential of the counterfactual-ability; in many cases this will be unrealistic to expect. The assumptions we include below make the optimisation tractable for the user by restricting the action subset to a single line.

Choice of action function. Having defined the general concepts of action and cost functions, we now define the concrete instance that we use in this paper. To do so, we start with a number of assumptions. These assumptions are designed to create a sensible metric for the counterfactual-ability of an explanation in the context of algorithmic recourse. Intuitively, the assumptions aim to cast the feature attribution as a suggested direction for a user to move in feature space, and the counterfactual-ability of a feature attribution will therefore measure the distance (under a sensible metric) to the decision boundary along this line.

Action Function. In the scope of this paper, we use the following set of assumptions:

  • Monotonic recourse. When changing a feature a user will move its value in the opposite direction of the monotonic trend, e.g., to reduce the risk of default a user will try to increase their income (as opposed to reducing it).

  • Adverse factors recourse. A user will change the features with positive Shapley values, i.e., the features contributing to the adverse prediction (as opposed to also improving features that are already good).

  • Proportional recourse. A user will change the features with the highest (positive) Shapley values, changing them proportionally to their Shapley values.

  • Recourse cost. When moving features proportionally to their Shapley values we use the quantile shift as the metric to compare the cost of the recourse.

We will call the action function satisfying these assumptions $A_k$, where $k$ is the number of top features that a user is considering. Its formal definition uses the dot and element-wise products together with three ingredients: the trend vector $\mathbf{t}$, satisfying $t_i = 1$ if the trend of feature $i$ is monotonically positive and $t_i = -1$ if the trend is monotonically negative; a function $Q$ computing the quantile (marginal cumulative distribution function) of each of the features with respect to the distribution induced by the training data; and a (binary) indicator vector for the top-$k$ features in $\boldsymbol{\phi}$.

The intuition behind this choice of action function is that the feature attribution should provide a suggested direction to the user that takes them towards the decision boundary. However, realistic actions will not involve changes to every feature; rather, a user may focus on making changes to only the top-$k$ most important features, and we reflect this in our choice of action function. We use the quantile shift as a normalised metric for recourse cost, so that the action subset induced by our action function is a semi-infinite line in the normalised quantile input space in the direction of the Shapley vector with its sign adjusted to match the monotonic trend. To better understand this concept we can consider Figure 3, which shows an example of the action subset induced for an input $\mathbf{x}$ and an attribution $\boldsymbol{\phi}$.
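The following sketch is one plausible reading of the four assumptions above, not the authors' exact formula: it moves the top-k features with positive attributions against their monotonic trend, in proportion to their attributions, and searches along the resulting line in quantile space for the decision boundary. All names are hypothetical; `is_adverse` is assumed to take a point in quantile space, coordinates are clipped to [0, 1], and the bisection assumes a single crossing of the boundary along the line, which holds for a monotone model.

```python
def action_direction(phi, trend, k):
    """Direction of movement in quantile space for the top-k adverse features."""
    positive = [max(p, 0.0) for p in phi]  # adverse factors only
    top = sorted(range(len(phi)), key=lambda i: positive[i], reverse=True)[:k]
    # Move against the monotonic trend, proportionally to the attribution.
    return [-trend[i] * positive[i] if i in top else 0.0 for i in range(len(phi))]


def counterfactual_ability(q, phi, trend, k, is_adverse, norm=2, steps=60):
    """Negated minimum quantile shift needed to flip the decision along the line."""
    d = action_direction(phi, trend, k)
    if not any(d):
        return float("-inf")  # no feature to act upon

    def point(lam):
        return [min(1.0, max(0.0, qi + lam * di)) for qi, di in zip(q, d)]

    # Beyond this step size every moved coordinate is clipped at 0 or 1.
    lam_max = max(1.0 / abs(di) for di in d if di != 0.0)
    if is_adverse(point(lam_max)):
        return float("-inf")  # the decision cannot be reversed along this line
    lo, hi = 0.0, lam_max
    for _ in range(steps):
        mid = (lo + hi) / 2
        if is_adverse(point(mid)):
            lo = mid
        else:
            hi = mid
    shift = [abs(a - b) for a, b in zip(point(hi), q)]
    cost = sum(shift) if norm == 1 else sum(s * s for s in shift) ** 0.5
    return -cost


# Toy monotone model in quantile space: adverse if the two quantiles sum above 1.
is_adverse = lambda z: z[0] + z[1] > 1.0
q, phi, trend = [0.8, 0.6], [0.3, 0.1], [1, 1]
ca_top1 = counterfactual_ability(q, phi, trend, k=1, is_adverse=is_adverse)
ca_top2 = counterfactual_ability(q, phi, trend, k=2, is_adverse=is_adverse)
```

In this toy example, acting on both features reaches the boundary with a smaller L2 quantile shift than acting on the top feature alone, so the top-2 counterfactual-ability is higher.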

We note that our choice of action function is just one instantiation of the framework that we propose. We argue that casting the explanation as a direction in which an input point may move is a natural choice that allows for concrete comparisons between methods, but we acknowledge that there is no clear answer, in full generality, to the question of how different users may act upon an input given a feature attribution. We believe that this topic represents an interesting future research direction, and we discuss this further in Section 7.

Cost Function. We measured the cost using the quantile shift under L1 and L2-norm, common metrics in the actionable recourse literature (Ustun et al., 2019). Formally:
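The quantile-shift cost can be sketched with an empirical cumulative distribution function per feature. The helper names `make_quantile_fn` and `quantile_shift_cost` are assumptions introduced for illustration, as is the use of the empirical CDF of the training data as the quantile function.

```python
from bisect import bisect_right


def make_quantile_fn(X_train):
    """Return Q(x): the per-feature empirical CDF value of x under the training data."""
    columns = [sorted(col) for col in zip(*X_train)]

    def Q(x):
        return [bisect_right(col, v) / len(col) for col, v in zip(columns, x)]

    return Q


def quantile_shift_cost(Q, x, x_new, p):
    """Lp norm (p = 1 or 2) of the difference between the quantiles of x and x_new."""
    diffs = [abs(a - b) for a, b in zip(Q(x), Q(x_new))]
    return sum(d ** p for d in diffs) ** (1.0 / p)


X_train = [[1, 10], [2, 20], [3, 30], [4, 40]]
Q = make_quantile_fn(X_train)
cost_l1 = quantile_shift_cost(Q, [2, 30], [4, 10], p=1)
cost_l2 = quantile_shift_cost(Q, [2, 30], [4, 10], p=2)
```

Working in quantile space makes shifts comparable across features with very different scales, since every feature is mapped to the interval [0, 1].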

Method Name          Type         Distribution
Counterfactual SHAP
CoSHAP FT            Shapley      FT
CoSHAP k-NN          Shapley      k-NN
CoSHAP k-NN (DB)     Shapley      k-NN (DB)
Baselines
FREQ k-NN            Frequency    k-NN
FREQ FT              Frequency    FT

Table 1. Explanation methods used in the experiments, divided among Counterfactual SHAP variants and baselines. Type: type of feature attribution, i.e., Shapley values or frequency-based feature attribution. Distribution: distribution used as background (for Shapley values) or to generate a (diverse) set of counterfactual points (for frequency-based feature attribution); refer to Section 3.1 for details. k-NN (DB): variant of the k-NN distribution where points are projected on the decision boundary (see Section 5).
Figure 4. Percentage of times in which the counterfactual-ability of different versions of CoSHAP (CoSHAP k-NN, CoSHAP k-NN (DB) and CoSHAP FT) is higher (better) than the counterfactual-ability of the baseline feature attributions (SHAP TRAIN, SHAP DIFF-LAB, SHAP DIFF-PRED, SHAP DIFF-MEDIAN, FREQ k-NN and FREQ FT). The plots show how the counterfactual-ability changes when varying the number of top-$k$ features considered in the action function. Each line represents a baseline.

5. Experiments

In order to understand how different variants of CoSHAP perform in terms of counterfactual-ability we compared them against existing feature attribution techniques. Table 1 describes in detail the feature attributions that we considered in our experiments.

In particular, we considered three variants of Counterfactual SHAP that differ from each other in the technique used to generate the counterfactual points, as follows.

  • Feature Tweaking (FT) (Tolomei and Silvestri, 2019). We take all the $\epsilon$-perturbations of the input that lead to a different prediction. In our experiments we used a fixed value of $\epsilon$.

  • $k$-Nearest Neighbours ($k$-NN) (Nugent et al., 2009). We take the $k$ nearest points to $\mathbf{x}$ in the training set such that their predictions are different from that of $\mathbf{x}$. In our experiments we used a fixed value of $k$ and the Euclidean distance over the quantile space as the distance metric.

  • Decision Boundary $k$-Nearest Neighbours ($k$-NN (DB)). Since the counterfactual points generated using $k$-NN are samples extracted from the training set, they tend to be at a greater distance from the decision boundary (DB) than artificially generated counterfactual points (for instance those generated using FT). For this reason we generated a variant of $k$-NN obtained by intersecting the DB with the line segments connecting $\mathbf{x}$ with each of its $k$ nearest differently-predicted points. In practice, we obtained these intersections by simply applying the bisection method along each segment.
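The projection onto the decision boundary described in the last item can be sketched with a standard bisection along the segment between the input and a counterfactual point. The function name, tolerance and toy decision function are illustrative assumptions; if the segment crosses the boundary more than once, the bisection converges to one of the crossings.

```python
def project_to_boundary(x, x_cf, predict, tol=1e-6):
    """Return a point on the segment from x to x_cf that lies just on the
    counterfactual side of the decision boundary."""
    target = predict(x_cf)
    if predict(x) == target:
        raise ValueError("x_cf must be predicted differently from x")
    lo, hi = 0.0, 1.0  # fraction of the way from x to x_cf

    def at(t):
        return [a + t * (b - a) for a, b in zip(x, x_cf)]

    while hi - lo > tol:
        mid = (lo + hi) / 2
        if predict(at(mid)) == target:
            hi = mid  # still a counterfactual: move closer to x
        else:
            lo = mid
    return at(hi)


# Hypothetical decision function (1 = adverse outcome).
predict = lambda row: int(row[0] + row[1] > 1.0)
x, x_cf = [0.8, 0.6], [0.2, 0.2]
x_db = project_to_boundary(x, x_cf, predict)
```

Applying this projection to each nearest neighbour yields background points that keep the neighbour's direction from the input but sit as close as possible to the boundary.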

For comparison with CoSHAP, we considered baselines belonging to two broad families of feature attribution methods from the literature. On the one hand, we compared CoSHAP with Shapley values obtained using the common input-invariant background distributions described in Section 3.1. On the other hand, we compared CoSHAP with existing (non-Shapley values-based) feature attribution techniques that have a counterfactual intent. In particular, we considered the frequency-based approach proposed in (Mothilal et al., 2021), which generates the attribution score of each feature as the fraction of counterfactual points whose value for that feature differs from that of the input $\mathbf{x}$. In order to generate a (diverse) set of counterfactual points we used the same techniques that we used to generate the background datasets for CoSHAP described earlier in this section, i.e., FT, $k$-NN and $k$-NN (DB).
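The frequency-based baseline described above reduces to counting, for each feature, how many counterfactual points modify it. The sketch below uses a hypothetical function name and a small numerical tolerance (an assumption) to decide whether a feature value has changed.

```python
def frequency_attribution(x, counterfactuals, tol=1e-9):
    """Fraction of counterfactual points in which each feature differs from x."""
    n = len(x)
    return [
        sum(abs(cf[j] - x[j]) > tol for cf in counterfactuals) / len(counterfactuals)
        for j in range(n)
    ]


x = [1.0, 2.0, 3.0]
counterfactuals = [
    [1.0, 2.5, 3.0],
    [0.5, 2.5, 3.0],
    [1.0, 1.0, 3.0],
    [1.0, 2.0, 4.0],
]
scores = frequency_attribution(x, counterfactuals)
```

Unlike Shapley values, these scores carry no sign and do not depend on the model output, only on which features the counterfactual generator chose to modify.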

Setup. To run the experiments we used 4 publicly available datasets: GMSC (Give Me Some Credit) (Kaggle, 2011), HELOC (Home Equity Line Of Credit) (FICO Community, 2019), LC (Lending Club Loan Data) (Kaggle, 2019) and WINE (UCI Wine Quality) (Cortez et al., 2009). For each dataset we used a split. We trained a monotonic XGBoost model (Chen and Guestrin, 2016), using Spearman’s rho (with respect to the target variable) to determine the monotonic trend of the features. We tuned the hyper-parameters using Bayesian optimization via hyperopt (Bergstra et al., 2013) for 1000 iterations, maximizing the average ROC-AUC under 5-fold cross-validation. As described in Section 2, we used as decision threshold in the raw score space (log odds) for the binary prediction.
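One way to derive the monotonic trend of each feature from Spearman's rho is sketched below: the sign of the rank correlation between a feature and the target becomes that feature's constraint. The helper names, the toy columns, and the choice of 0 (unconstrained) for a constant feature are assumptions; the resulting string follows the format accepted by XGBoost's `monotone_constraints` parameter, but no model is trained here.

```python
def average_ranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2.0 + 1.0  # average of positions i..j, 1-based
        for idx in order[i:j + 1]:
            ranks[idx] = shared
        i = j + 1
    return ranks

def spearman_rho(a, b):
    """Spearman's rho: Pearson correlation of the rank-transformed values."""
    ra, rb = average_ranks(a), average_ranks(b)
    ma, mb = sum(ra) / len(ra), sum(rb) / len(rb)
    cov = sum((u - ma) * (v - mb) for u, v in zip(ra, rb))
    var_a = sum((u - ma) ** 2 for u in ra)
    var_b = sum((v - mb) ** 2 for v in rb)
    if var_a == 0 or var_b == 0:
        return 0.0  # constant column: no trend can be inferred
    return cov / (var_a * var_b) ** 0.5

def monotone_constraints(columns, target):
    """+1, -1 or 0 per feature, following the sign of Spearman's rho with the target."""
    signs = []
    for col in columns:
        rho = spearman_rho(col, target)
        signs.append(1 if rho > 0 else -1 if rho < 0 else 0)
    return signs

# Toy data (hypothetical): one list per feature, plus the binary target.
income = [10, 20, 30, 40, 50, 60]
debt = [9, 8, 7, 5, 6, 1]
constant = [3, 3, 3, 3, 3, 3]
target = [0, 0, 0, 1, 1, 1]

signs = monotone_constraints([income, debt, constant], target)
params = {
    "objective": "binary:logistic",
    "monotone_constraints": "(" + ",".join(str(s) for s in signs) + ")",
}
```

The `params` dictionary could then be passed to an XGBoost training call; the hyper-parameter search and cross-validation described above are outside the scope of this sketch.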

Results. We measured the percentage of times in which CoSHAP performs better in terms of counterfactual-ability than baselines over 1000 rejected (i.e. with ) random samples close to the decision boundary (for which ; here denotes the sigmoid function) for each of the 4 datasets. Figure 4 and Figure 5 show the results using the cost functions and , respectively. We report the main findings.

  • CoSHAP -NN and CoSHAP -NN consistently beat (i.e., ) all of the baselines, performing better than the baselines in between and of the cases. We note that further investigation is necessary to fully understand the effects of the hyper-parameters of -NN () on the resulting CoSHAP explanation and how they relate to the size and dimensionality of the training data.

  • For several datasets, CoSHAP FT does not perform well against the baselines (i.e., in more than of the considered samples the counterfactual-ability is lower than that of the baselines). In particular, CoSHAP FT fails to beat the performance of the (baseline) SHAP DIFF-PRED that uses the full set of samples predicted as belonging to a different class as background dataset. This is due to the sparsity of the counterfactual points generated by FT.

  • The results are robust with respect to the norm (L1 or L2) used to aggregate the cost of different features.
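The comparison reported in these findings boils down to a per-sample win rate. The sketch below computes, for paired counterfactual-ability scores, the percentage of samples on which one method scores strictly higher than a baseline. The function name, the treatment of ties as non-wins, and the toy scores are assumptions, and the counterfactual-ability computation itself is not reproduced here.

```python
def win_rate(method_scores, baseline_scores):
    """Percentage of samples on which the method's score is strictly higher."""
    if len(method_scores) != len(baseline_scores) or not method_scores:
        raise ValueError("score lists must be non-empty and of equal length")
    wins = sum(m > b for m, b in zip(method_scores, baseline_scores))
    return 100.0 * wins / len(method_scores)

# Hypothetical counterfactual-ability scores for five rejected samples.
coshap_scores = [0.8, 0.6, 0.9, 0.4, 0.7]
baseline_scores = [0.5, 0.6, 0.3, 0.6, 0.2]

rate = win_rate(coshap_scores, baseline_scores)  # 60.0
```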

Figure 5. Percentage of times in which the counterfactual-ability of different versions of CoSHAP (CoSHAP -NN, CoSHAP -NN and CoSHAP FT) is higher than the counterfactual-ability of the baseline feature attributions. This figure is the equivalent of Figure 4 using L2-norm instead of L1-norm in the cost function.

6. Related Work

There has recently been increasing interest in exploring the relationship between feature importance and counterfactual explanations.

A recent work (Watson, 2021) has proposed a Bayesian decision theory-based approach to the computation of the Shapley values. In particular, the idea of (Watson, 2021) is to optimize the choice of the background distribution for the computation of Shapley values, maximizing the expected reward for the user, i.e., , under a certain reward function . The work provides a theoretical framework for modelling user preferences and beliefs but lacks (by design) concrete guidance on (1) how to select , (2) how to update the reward function based on the observed Shapley values and (3) how to translate the feature attribution into practical actions on the input in order to (automatically) solve the optimisation problem without resorting to human-in-the-loop updates.

Other works have proposed to fill the gap between feature attribution techniques and counterfactual explanations by means other than Shapley values. In particular, (Mothilal et al., 2021) and (Sharma et al., 2020) propose techniques to generate feature attributions from a set of diverse counterfactual points but (contrary to us) they use frequency-based approaches, i.e., they give higher attribution to features that are more often changed in counterfactual points. This implies that features potentially ignored by the model may also receive a high feature importance because they are correlated with other features that are actually used by the model. As remarked in (Chen et al., 2020), this behaviour may be desirable in some contexts, such as the medical sciences, but not in others, such as the credit scoring scenario, in which users are ultimately interested in understanding why they have been rejected by the model rather than which features correlate with rejection in the data. We used (Mothilal et al., 2021) as a baseline in the experiments in Section 5. In (Chapman-Rounds et al., 2021), feature attributions are generated by approximating the minimal adversarial perturbation using an adversarially trained neural network on a (differentiable) neural network-based surrogate model. This approach tends to follow the strictest interpretation of the “true to the model” paradigm (Chen et al., 2020), enforcing only the class change, but does not directly allow for the enforcement of other constraints, e.g., regarding the plausibility of such changes, as we do by providing a background distribution that is based on counterfactuals.

Other works such as (Ramon et al., 2020; Rathi, 2019; Fernández-Loría et al., 2021) analyze the problem complementary to the one we analyze in this paper: they show how feature attributions can guide the search for counterfactual points (while we investigate how different techniques for the generation of counterfactual examples can enable better feature attributions).

In general, many works have explored how to evaluate counterfactual explanations (e.g., (Ustun et al., 2019; Rawal and Lakkaraju, 2020; Laugel et al., 2019)) and feature attributions (e.g., (Guidotti et al., 2018; Plumb et al., 2018; Lakkaraju et al., 2019)) but few have proposed a quantitative metric to evaluate feature attributions in counterfactual terms. In (White and Garcez, 2019) the authors propose to evaluate feature attributions with a fidelity error for each of the features that (unlike counterfactual-ability) is computed by changing only a single feature at a time. We also note that our definition of counterfactual-ability is inspired by the notion of quantile shift cost proposed in (Ustun et al., 2019), originally designed to evaluate counterfactual explanations (while counterfactual-ability is a metric for feature attributions).

7. Conclusion and future work

Towards the more general goal of unifying feature attribution techniques and counterfactual explanations, we have shown how using counterfactual points as the background distribution for the computation of Shapley values (CoSHAP) allows one to obtain feature attributions that can better advise towards useful changes of the input in order to overcome an adverse outcome. We proposed a new quantitative framework to evaluate such an effect that we called counterfactual-ability, and remarked on the role that monotonicity of the model plays in the generation of feature attributions with a counterfactual intent. We evaluated CoSHAP on 4 publicly available datasets and highlighted that using simpler counterfactual techniques such as those based on nearest-neighbours within CoSHAP performs better than existing feature attribution methods.
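The core idea summarized above, Shapley values computed against a background of counterfactual points, can be sketched for a handful of features by exact enumeration of coalitions. The toy `model`, the hand-picked counterfactual background and the function name `marginal_shapley` are assumptions for illustration; a practical implementation for tree ensembles would rely on a dedicated algorithm rather than this exponential-time loop.

```python
from itertools import combinations
from math import factorial

def model(x):
    """Toy raw-score model (an assumption): linear terms plus one interaction."""
    return 2.0 * x[0] + 1.0 * x[1] + 0.5 * x[0] * x[2]

def marginal_shapley(f, x, background):
    """Exact marginal (interventional) Shapley values of f at x.

    Features inside a coalition take their value from x; the remaining
    features take their value from a background point. The attributions
    are averaged over all background points.
    """
    n = len(x)
    phi = [0.0] * n
    for b in background:
        def value(coalition):
            return f([x[j] if j in coalition else b[j] for j in range(n)])
        for i in range(n):
            others = [j for j in range(n) if j != i]
            for size in range(n):
                weight = factorial(size) * factorial(n - size - 1) / factorial(n)
                for subset in combinations(others, size):
                    s = set(subset)
                    phi[i] += weight * (value(s | {i}) - value(s))
    return [p / len(background) for p in phi]

x = [0.0, 1.0, 2.0]  # input with an adverse outcome
counterfactual_background = [[1.0, 1.0, 2.0], [0.5, 2.0, 2.0]]  # hypothetical
phi = marginal_shapley(model, x, counterfactual_background)
mean_background_score = sum(model(b) for b in counterfactual_background) / len(
    counterfactual_background
)
```

By the efficiency property, the attributions sum to the gap between the score of the input and the average score of the counterfactual background, and a feature that takes the same value in the input and in every background point (the third feature here) receives zero attribution.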

Our proposal can be extended in several directions. Firstly, it would be interesting to explore alternative notions of action and cost function, grounding their definition with findings in psychology and the social sciences concerning how users interpret feature attributions and how they consequently change their behaviour. For example, one possibility would be to expand the definition of action function to take into account user preferences for certain actions – this could be achieved by coupling each “possible action” returned by the action function with a probability. Secondly, investigating additional metrics for the evaluation of a feature attribution in counterfactual terms would also be of interest. For instance, one could include considerations about the sparsity and/or the plausibility of the feature attributions, as well as other metrics drawn from the counterfactual explanations literature. Lastly, testing our approach on different models (e.g., neural networks) and using a wider variety of (potentially model-agnostic) counterfactual explanation techniques such as (Karimi et al., 2020, 2021; Dandl et al., 2020; Kanamori et al., 2020; Rawal and Lakkaraju, 2020; Hashemi and Fathi, 2020) represents another interesting future direction.

From a wider perspective, our work draws attention to some gaps in the literature that we believe are worthy of further investigation. On the one hand, it underlines the importance of techniques for the generation of a diverse set of counterfactuals, advocated by many practitioners (Mothilal et al., 2020; Russell, 2019; Smyth and Keane, 2021). On the other hand, it highlights how few techniques are capable of generating diverse counterfactual explanations in the context of non-differentiable models, such as ensembles of decision trees, which are among the most widely adopted in industry.

Disclaimer. This paper was prepared for informational purposes by the Artificial Intelligence Research group of JPMorgan Chase & Co. and its affiliates (“JP Morgan”), and is not a product of the Research Department of JP Morgan. JP Morgan makes no representation and warranty whatsoever and disclaims all liability, for the completeness, accuracy or reliability of the information contained herein. This document is not intended as investment research or investment advice, or a recommendation, offer or solicitation for the purchase or sale of any security, financial instrument, financial product or service, or to be used in any way for evaluating the merits of participating in any transaction, and shall not constitute a solicitation under any jurisdiction or to any person, if such solicitation under such jurisdiction or to such person would be unlawful.


  • A. Adadi and M. Berrada (2018) Peeking Inside the Black-Box: A Survey on Explainable Artificial Intelligence (XAI). IEEE Access 6, pp. 52138–52160. Cited by: §1.
  • E. Albini, A. Rago, P. Baroni, and F. Toni (2020) Relation-Based Counterfactual Explanations for Bayesian Network Classifiers. In Proc. of the 29th Int. J. Conf. on Artificial Intell., IJCAI, pp. 451–457. Cited by: §2.2.
  • S. Barocas, A. D. Selbst, and M. Raghavan (2020) The Hidden Assumptions Behind Counterfactual Explanations and Principal Reasons. In Proc. of the 2020 Conf. on Fairness, Accountability, and Transparency, FAccT, pp. 80–89. Cited by: §3.2, §3.
  • A. Barredo Arrieta, N. Díaz-Rodríguez, J. Del Ser, A. Bennetot, S. Tabik, A. Barbado, S. Garcia, S. Gil-Lopez, D. Molina, R. Benjamins, R. Chatila, and F. Herrera (2020) Explainable Artificial Intelligence (XAI): Concepts, taxonomies, opportunities and challenges toward responsible AI. Information Fusion 58, pp. 82–115. Cited by: §1.
  • J. Bergstra, D. Yamins, and D. Cox (2013) Making a science of model search: hyperparameter optimization in hundreds of dimensions for vision architectures. In Proc. of the 30th Int. Conf. on Machine Learning, ICML, pp. I–115–I–123. Cited by: §5.
  • M. Chapman-Rounds, U. Bhatt, E. Pazos, M. Schulz, and K. Georgatzis (2021) FIMAP: Feature Importance by Minimal Adversarial Perturbation. Proc. of the 35th AAAI Conf. on Artificial Intell. 35 (13), pp. 11433–11441. Cited by: §6.
  • H. Chen, J. D. Janizek, S. Lundberg, and S. Lee (2020) True to the Model or True to the Data? In ICML ’20 Workshop on Human Interpretability, External Links: 2006.16234 Cited by: §6.
  • T. Chen and C. Guestrin (2016) XGBoost: a scalable tree boosting system. In Proc. of the 22nd ACM SIGKDD Int. Conf. on Knowledge Discovery and Data Mining, KDD, pp. 785–794. Cited by: §5.
  • P. Cortez, A. Cerdeira, F. Almeida, T. Matos, and J. Reis (2009) Modeling wine preferences by data mining from physicochemical properties. Decision Support Syst. 47 (4), pp. 547–553. Cited by: §5.
  • K. Čyras, A. Rago, E. Albini, P. Baroni, and F. Toni (2021) Argumentative XAI: a Survey. In Proc. of the 30th Int. J. Conf. on Artificial Intell., IJCAI, Vol. 5, pp. 4392–4399. Cited by: §1.
  • S. Dandl, C. Molnar, M. Binder, and B. Bischl (2020) Multi-objective counterfactual explanations. In Proc. of the 16th Int. Conf. on Parallel Problem Solving from Nature, Vol. 12269 LNCS, pp. 448–469. Cited by: §2.2, §7.
  • C. Fernández-Loría, F. Provost, and X. Han (2021) Explaining Data-Driven Decisions made by AI Systems: The Counterfactual Approach. External Links: 2001.07417 Cited by: §6.
  • FICO Community (2019) Explainable Machine Learning Challenge. External Links: Link Cited by: §5.
  • R. Guidotti, A. Monreale, S. Ruggieri, D. Pedreschi, F. Turini, and F. Giannotti (2018) Local Rule-Based Explanations of Black Box Decision Systems. External Links: 1805.10820 Cited by: §6.
  • R. Guidotti, A. Monreale, S. Ruggieri, F. Turini, F. Giannotti, and D. Pedreschi (2019) A Survey of Methods for Explaining Black Box Models. ACM Computing Surveys 51 (5), pp. 1–42. Cited by: §1.
  • M. Hashemi and A. Fathi (2020) PermuteAttack: counterfactual explanation of machine learning credit scorecards. External Links: 2008.10138 Cited by: §7.
  • Kaggle (2011) Give Me Some Credit Competition. External Links: Link Cited by: §5.
  • Kaggle (2019) Lending Club Loan Data. External Links: Link Cited by: §5.
  • K. Kanamori, T. Takagi, K. Kobayashi, and H. Arimura (2020) DACE: Distribution-Aware Counterfactual Explanation by Mixed-Integer Linear Optimization. In Proc. of the 29th Int. J. Conf. on Artificial Intell., IJCAI, pp. 2855–2862. Cited by: §2.2, §7.
  • A. Karimi, G. Barthe, B. Balle, and I. Valera (2020) Model-Agnostic Counterfactual Explanations for Consequential Decisions. In Proc. of the 24th Int. Conf. on Artificial Intell. and Statistics, AISTATS, pp. 895–905. Cited by: §2.2, §7.
  • A. Karimi, B. Schölkopf, and I. Valera (2021) Algorithmic Recourse: from Counterfactual Explanations to Interventions. In Proc. of the 2021 ACM Conf. on Fairness, Accountability, and Transparency, FAccT, pp. 353–362. Cited by: §2.2, §7.
  • M. T. Keane, E. M. Kenny, E. Delaney, and B. Smyth (2021) If Only We Had Better Counterfactual Explanations: Five Key Deficits to Rectify in the Evaluation of Counterfactual XAI Techniques. In Proc. of the 30th Int. J. Conf. on Artificial Intell., IJCAI, pp. 4466–4474. Cited by: §1, §2.2.
  • I. E. Kumar, S. Venkatasubramanian, C. Scheidegger, and S. A. Friedler (2020) Problems with Shapley-value-based explanations as feature importance measures. In ICML 2020, pp. 5491–5500. Cited by: item 2, §3.
  • H. Lakkaraju, E. Kamar, R. Caruana, and J. Leskovec (2019) Faithful and Customizable Explanations of Black Box Models. In Proc. of the 2019 AAAI/ACM Conf. on AI, Ethics, and Society, AIES, pp. 131–138. Cited by: §6.
  • T. Laugel, M. Lesot, C. Marsala, X. Renard, and M. Detyniecki (2019) The dangers of post-hoc interpretability: unjustified counterfactual explanations. In Proc. of the 28th Int. J. Conf. on Artificial Intell., IJCAI, pp. 2801–2807. Cited by: §6.
  • S. M. Lundberg and S. Lee (2017) A Unified Approach to Interpreting Model Predictions. In Adv. in Neural Information Processing Syst., NeurIPS, pp. 4768–4777. Cited by: §1, §2.1, §2.1, 1st item.
  • S. M. Lundberg, G. Erion, H. Chen, A. DeGrave, J. M. Prutkin, B. Nair, R. Katz, J. Himmelfarb, N. Bansal, and S. Lee (2020) From local explanations to global understanding with explainable AI for trees. Nature Machine Intell. 2 (1), pp. 56–67. Cited by: §1, §2.1.
  • L. Merrick and A. Taly (2020) The Explanation Game: Explaining Machine Learning Models Using Shapley Values. In Int. Cross-Domain Conf. for Machine Learning and Knowledge Extraction, CD-MAKE, pp. 17–38. Cited by: item 1, §3.1.
  • J. W. L. Merrill, G. M. Ward, S. J. Kamkar, J. Budzik, and D. C. Merrill (2019) Generalized Integrated Gradients: A practical method for explaining diverse ensembles. J. of Machine Learning R. - Under Review. External Links: 1909.01869 Cited by: §1.
  • T. Miller (2019) Explanation in Artificial Intelligence: Insights from the Social Sciences. Artificial Intell. 267, pp. 1–38. Cited by: item 2.
  • R. K. Mothilal, D. Mahajan, C. Tan, and A. Sharma (2021) Towards Unifying Feature Attribution and Counterfactual Explanations: Different Means to the Same End. In Proc. of the 2021 AAAI/ACM Conf. on AI, Ethics, and Society, AIES, pp. 652–663. Cited by: §5, §6.
  • R. K. Mothilal, A. Sharma, and C. Tan (2020) Explaining machine learning classifiers through diverse counterfactual explanations. In Proc. of the 2020 ACM Conf. on Fairness, Accountability, and Transparency, FAccT, pp. 607–617. Cited by: item 2, §7.
  • C. Nugent, D. Doyle, and P. Cunningham (2009) Gaining Insight through Case-Based Explanation. J. of Intelligent Information Syst. 32, pp. 267–295. Cited by: 2nd item.
  • M. Pawelczyk, K. Broelemann, and G. Kasneci (2020) Learning Model-Agnostic Counterfactual Explanations for Tabular Data. In Proc. of The Web Conf. 2020, WWW, pp. 3126–3132. Cited by: §2.2.
  • G. Plumb, D. Molitor, and A. S. Talwalkar (2018) Model Agnostic Supervised Local Explanations. In Adv. in Neural Information Processing Syst., NeurIPS, pp. 2520–2529. Cited by: §6.
  • R. Poyiadzi, K. Sokol, R. Santos-Rodriguez, T. De Bie, and P. Flach (2020) FACE: Feasible and Actionable Counterfactual Explanations. In Proc. of the AAAI/ACM Conf. on AI, Ethics, and Society, AIES, pp. 344–350. Cited by: §2.2.
  • Y. Ramon, D. Martens, F. Provost, and T. Evgeniou (2020) A comparison of instance-level counterfactual explanation algorithms for behavioral and textual data: SEDC, LIME-C and SHAP-C. Adv. in Data Analysis and Classification 14, pp. 801–819. Cited by: §6.
  • S. Rathi (2019) Generating Counterfactual and Contrastive Explanations using SHAP. In 2nd Workshop on Humanizing AI (HAI) at IJCAI ’19, External Links: 1906.09293 Cited by: §6.
  • K. Rawal and H. Lakkaraju (2020) Beyond Individualized Recourse: Interpretable and Interactive Summaries of Actionable Recourses. In Adv. in Neural Information Processing Syst., NeurIPS, pp. 12187–12198. Cited by: §6, §7.
  • M. T. Ribeiro, S. Singh, and C. Guestrin (2016) “Why Should I Trust You?”: Explaining the Predictions of Any Classifier. In Proc. of the 22nd ACM SIGKDD Int. Conf. on Knowledge Discovery and Data Mining, KDD, pp. 1135–1144. Cited by: §1.
  • C. Russell (2019) Efficient Search for Diverse Coherent Explanations. In Proc. of the 2019 Conf. on Fairness, Accountability, and Transparency, FAccT, pp. 20–28. Cited by: item 2, §7.
  • L. S. Shapley (1951) Notes on the n-Person Game-II: The Value of an n-Person Game. U.S. Air Force, Project Rand. Cited by: §1.
  • S. Sharma, J. Henderson, and J. Ghosh (2020) CERTIFAI: A Common Framework to Provide Explanations and Analyse the Fairness and Robustness of Black-box Models. In Proc. of the AAAI/ACM Conf. on AI, Ethics, and Society, AIES, pp. 166–172. Cited by: §2.2, §6.
  • B. Smyth and M. T. Keane (2021) A Few Good Counterfactuals: Generating Interpretable, Plausible and Diverse Counterfactual Explanations. External Links: 2101.09056 Cited by: §2.2, §7.
  • T. Spooner, D. Dervovic, J. Long, J. Shepard, J. Chen, and D. Magazzeni (2021) Counterfactual Explanations for Arbitrary Regression Models. In ICML’21 Workshop on Algorithmic Recourse, External Links: 2106.15212 Cited by: §2.2.
  • I. Stepin, J. M. Alonso, A. Catala, and M. Pereira-Farina (2021) A Survey of Contrastive and Counterfactual Explanation Generation Methods for Explainable Artificial Intelligence. IEEE Access 9, pp. 11974–12001. Cited by: §2.2.
  • E. Strumbelj and I. Kononenko (2010) An Efficient Explanation of Individual Classifications using Game Theory. J. of Machine Learning R. 11, pp. 1–18. Cited by: §2.1.
  • A. Sudjianto and S. Zoldi (2021) The Case for Interpretable Models in Credit Underwriting. External Links: Link Cited by: §1, §1.
  • G. Tolomei and F. Silvestri (2019) Generating Actionable Interpretations from Ensembles of Decision Trees. IEEE Transactions on Knowledge and Data Engineering, pp. 1540–1553. Cited by: 1st item.
  • U.S. Congress (2018) 12 CFR Part 1002 - Equal Credit Opportunity Act (Regulation B). External Links: Link Cited by: §1.
  • B. Ustun, A. Spangher, and Y. Liu (2019) Actionable Recourse in Linear Classification. In Proc. of the Conf. on Fairness, Accountability, and Transparency, FAccT, pp. 10–19. Cited by: §2.2, §4, §6.
  • S. Verma, A. Ai, J. Dickerson, and K. Hines (2020) Counterfactual Explanations for Machine Learning: A Review. External Links: 2010.10596 Cited by: §2.2.
  • S. Wachter, B. Mittelstadt, and C. Russell (2018) Counterfactual Explanations Without Opening the Black Box: Automated Decisions and the GDPR. Harvard J. of Law & Technology 31, pp. 1–52. Cited by: §1, §2.2.
  • D. S. Watson (2021) Rational Shapley Values. External Links: 2106.10191 Cited by: §6.
  • A. White and A. d. Garcez (2019) Measurable Counterfactual Local Explanations for Any Classifier. In Proc. of the 24th European Conf. on Artificial Intell., ECAI, pp. 2529–2535. Cited by: §6.