Decisions, Counterfactual Explanations and Strategic Behavior

Data-driven predictive models are increasingly used to inform decisions that hold important consequences for individuals and society. As a result, decision makers are often obliged, even legally required, to provide explanations about their decisions. In this context, it has been increasingly argued that these explanations should help individuals understand what would have to change for these decisions to be beneficial ones. However, there has been little discussion on the possibility that individuals may use such counterfactual explanations to invest effort strategically in order to maximize their chances of receiving a beneficial decision. In this paper, our goal is to find policies and counterfactual explanations that are optimal in terms of utility in such a strategic setting. To this end, we first show that, given a pre-defined policy, the problem of finding the optimal set of counterfactual explanations is NP-hard. However, we further show that the corresponding objective is nondecreasing and satisfies submodularity. Therefore, a standard greedy algorithm offers an approximation factor of (1-1/e) for this problem. Additionally, we also show that the problem of jointly finding both the optimal policy and set of counterfactual explanations reduces to maximizing a non-monotone submodular function. As a result, we can use a recent randomized algorithm to solve the problem, which offers an approximation factor of 1/e. Finally, we illustrate our theoretical findings by performing experiments on synthetic and real lending data.

Code repository: strategic-decisions (code and data for decision making under strategic behavior).

1 Introduction

Whenever a bank decides to offer a loan to a customer, a judge decides to grant bail to a person, or a company decides to hire a new employee, the decision is increasingly informed by a data-driven predictive model. In all these high-stakes applications, the goal of the decision maker is to take decisions that maximize a given utility function while the goal of the predictive model is to provide accurate predictions of the outcomes from a set of observable features. For example, a bank may decide whether or not to offer a loan to a customer using the model’s estimate of the probability that the customer would repay the loan.

In this context, there has been tremendous excitement about the potential of data-driven predictive models to enhance decision making in high-stakes applications. However, there has also been a heated debate about their lack of transparency and explainability (Doshi-Velez and Kim, 2017; Weller, 2017; Lipton, 2018; Gunning and Aha, 2019; Rudin, 2019). As a result, there already exists a legal requirement to grant individuals who are subject to (semi)-automated decision making the right to explanation in the European Union (Voigt and Von dem Bussche, 2017; Wachter et al., 2017a). With this motivation, there has been a flurry of work on interpretable machine learning (Ribeiro et al., 2016; Koh and Liang, 2017; Lundberg and Lee, 2017; Chakraborty et al., 2017; Murdoch et al., 2019; Wachter et al., 2017b; Karimi et al., 2019; Mothilal et al., 2020), which has predominantly focused on developing methods to find explanations for the predictions made by a predictive model. Within this line of work, the work most closely related to ours (Wachter et al., 2017b; Karimi et al., 2019; Mothilal et al., 2020) aims to find counterfactual explanations that help individuals understand what would have to change for a predictive model to make a positive prediction about them. In our work, rather than explaining predictions, we pursue the development of methods to find counterfactual explanations for the decisions taken by a decision maker, which are ultimately what individuals who are subject to (semi)-automated decision making typically care about. These counterfactual explanations help individuals understand what would have to change in order to receive a beneficial decision, rather than a positive prediction.

Once we focus on explaining decisions, we cannot overlook the possibility that individuals may use these explanations to invest effort strategically in order to maximize their chances of receiving a beneficial decision. However, this is also an opportunity to find counterfactual explanations that help individuals self-improve and eventually increase the utility of a decision policy, as noted by several studies in economics (Coate and Loury, 1993; Fryer and Loury, 2013; Hu and Chen, 2018) and, more recently, in the computer science literature (Kleinberg and Raghavan, 2019; Tabibian et al., 2020). For example, if a bank explains to a customer that she will receive the loan she is applying for if she reduces her credit card debt by 20%, she may feel compelled to reduce her debt by the proposed percentage, paying less interest and improving her financial situation; this, in turn, increases the profit the bank makes when she successfully repays the loan. This is in contrast with previous work on interpretable machine learning, which has ignored the influence that (counterfactual) explanations of the predictions made by a predictive model may have on the accuracy of the model and the utility of the resulting decision policies.

Our contributions. We cast the above problem as a Stackelberg game in which the decision maker moves first and shares her counterfactual explanations before individuals best respond to these explanations and invest effort to receive a beneficial decision. Under this problem formulation, we first show that, given a pre-defined policy, the problem of finding the optimal set of counterfactual explanations is NP-hard, using a novel reduction from the Set Cover problem (Karp, 1972). However, we further show that the corresponding objective function is monotone and submodular and, as a direct consequence, it readily follows that a standard greedy algorithm offers a (1-1/e) approximation ratio for this problem. In addition, we show that, given a pre-defined set of counterfactual explanations, the optimal policy is deterministic and can be computed in polynomial time. Building on this result, we can reduce the problem of jointly finding both the optimal policy and set of counterfactual explanations to maximizing a non-monotone submodular function. As a consequence, we can use a recent randomized algorithm to solve the problem, which offers an approximation factor of 1/e. Finally, we perform a series of experiments with synthetic and real lending data to illustrate our theoretical findings and show that the counterfactual explanations and decision policies found by the above algorithms achieve higher utility than several competitive baselines.

Further related work. Our work builds upon previous work on interpretable machine learning, strategic machine learning, and machine-assisted decision making.

There is not yet an agreement on what constitutes a good post-hoc explanation in the literature on interpretable machine learning. However, most previous work focuses on one of the two following types of explanations: (i) feature-based explanations (Ribeiro et al., 2016; Koh and Liang, 2017; Lundberg and Lee, 2017) or (ii) counterfactual explanations (Wachter et al., 2017b; Karimi et al., 2019; Mothilal et al., 2020). Feature-based explanations help individuals understand the importance of each feature for a particular prediction, typically through local approximation, while counterfactual explanations help them understand what features would have to change for a predictive model to make a positive prediction about them. While our work also focuses on counterfactual explanations, it explains decisions rather than predictions. By doing so, it sheds light on the possibility of using explanations to increase the utility of a decision policy, uncovering a previously unexplored connection between interpretable machine learning and the nascent field of strategic machine learning.

As in our work, previous work on strategic machine learning assumes that individuals may use knowledge, gained through transparency, to invest effort strategically in order to receive either a positive prediction (Dalvi et al., 2004; Brückner and Scheffer, 2011; Hardt et al., 2016; Dong et al., 2018; Hu et al., 2019; Milli et al., 2019; Miller et al., 2019) or a beneficial decision (Kleinberg and Raghavan, 2019; Tabibian et al., 2020). However, none of this previous work focuses on finding (counterfactual) explanations, and it assumes full transparency: individuals who are subject to (semi)-automated decision making can observe the entire predictive model or decision policy. As a result, its formulation is fundamentally different and its technical contributions are orthogonal to ours.

In the machine-assisted decision making literature, the distinction between decisions and predictions has not been made explicit until very recently (Corbett-Davies et al., 2017; Kleinberg et al., 2018; Mitchell et al., 2018; Valera et al., 2018; Kilbertus et al., 2019; Tabibian et al., 2020). However, previous work has focused on the design of optimal decision policies rather than (counterfactual) explanations.

2 Preliminaries

Given an individual with a feature vector and a (ground-truth) label, we assume a decision controls whether the corresponding label is realized (without loss of generality, we assume each feature takes a finite number of different values). This setting fits a variety of real-world scenarios. For example, in university admissions, the decision specifies whether a student is admitted or rejected; the label indicates whether the student completes the program or drops out upon acceptance; and the feature vector may include her GRE scores, undergraduate GPA, or research experience. Throughout the paper, we work with the resulting finite set of feature values.

Each decision is sampled from a decision policy, which, for brevity, we write as the probability of a positive decision given the individual's feature value. For each individual, the label is sampled from a conditional probability distribution given her feature value and, without loss of generality, we index the feature values in decreasing order with respect to their corresponding outcome probability. Moreover, we adopt a Stackelberg game-theoretic formulation in which each individual with an initial feature value receives a (counterfactual) explanation from the decision maker by means of a feature value before she (best-)responds (in practice, individuals whose initial feature values already guarantee a positive decision may not receive any explanation). Here, note that each individual is guaranteed to receive a positive decision if she changes from her initial feature value to the recommended feature value. Inspired by Tabibian et al. (2020), we will assume that each individual's best response is to change from her initial feature value to the recommended one if and only if

(1)

and to keep her initial feature value otherwise. Eq. 1 compares the cost an individual pays for changing feature values (in practice, this cost may be given by a parameterized function over pairs of feature values) with the (immediate) benefit she obtains from the policy, where the benefit function is problem dependent. Here, for simplicity, we assume that the benefit is proportional to the probability that an individual receives a positive decision. Moreover, for each feature value, we refer to the recommended feature value as its counterfactual explanation and to the set of feature values the corresponding individuals would be willing to change to as its region of adaptation.
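To fix ideas, the best-response condition in Eq. 1 can be sketched explicitly under one assumed choice of notation that is not fixed by the text above: let π(x) denote the probability of a positive decision at feature value x, c(x, x') ≥ 0 the cost of changing from x to x', and take the benefit to be b(π, x) = π(x). Then Eq. 1 and the region of adaptation presumably read

\begin{align*}
  &\text{change from } x \text{ to the recommended } x' \quad\Longleftrightarrow\quad
    \pi(x') - c(x, x') \;\ge\; \pi(x),\\
  &\text{region of adaptation of } x:\quad
    \mathcal{A}(x) \;=\; \{\, x' \in \mathcal{X} \,:\, \pi(x') - c(x, x') \ge \pi(x) \,\}.
\end{align*}

The sketch assumes a non-strict inequality, with ties broken in favor of changing.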

At the population level, the above best responses result in a transportation of mass from the original feature distribution to a new feature distribution induced by the policy and the counterfactual explanations. More specifically, we can readily derive an analytical expression for the induced feature distribution in terms of the original one:

(2)

As in previous work (Corbett-Davies et al., 2017; Valera et al., 2018; Kilbertus et al., 2019; Tabibian et al., 2020), we will assume that the decision maker is rational and aims to maximize the (immediate) utility, which is the expected overall profit she obtains:

(3)
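Under the same assumed notation as above, and writing P(y = 1 | x) for the outcome probability, P_{π,E}(x) for the induced feature distribution of Eq. 2, and γ for the decision maker's constant introduced right below, the utility of Eq. 3 presumably takes the form

\begin{equation*}
  u(\pi, E)
  \;=\;
  \mathbb{E}_{x \sim P_{\pi, E}}\!\left[\, \pi(x)\,\big( P(y = 1 \mid x) - \gamma \big) \right]
  \;=\;
  \sum_{x \in \mathcal{X}} P_{\pi, E}(x)\, \pi(x)\, \big( P(y = 1 \mid x) - \gamma \big),
\end{equation*}

which matches the description that follows: the product π(x) P(y = 1 | x) captures successful positive decisions, while the term γ π(x) discounts every positive decision by the constant cost γ.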

where the constant in the above expression is given and reflects economic considerations of the decision maker. For example, in university admissions, one term is proportional to the expected number of students who are admitted and complete the program, another is proportional to the number of students who are admitted, and the constant measures the cost of education in units of graduated students. As a direct consequence, given a feature value and a set of counterfactual explanations, if the set contains at least one explanation within the corresponding region of adaptation, the decision maker will provide the counterfactual explanation that yields the largest utility gain under the assumption that individuals best respond:

(4)

and, if no explanation in the set lies within the region of adaptation, we fix the assigned counterfactual explanation arbitrarily (note that, in this case, the individual's best response is to keep her initial feature value and thus any choice of counterfactual explanation leads to the same utility).
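Putting Eqs. 1-4 together, the following minimal Python sketch computes the utility induced by a policy and a set of explanations under the assumed model above; all names (induced_utility, pi, py, px, cost, gamma) are placeholders introduced here rather than taken from the paper, and the loop simplifies the corner case in which every explanation an individual would follow decreases utility.

    def induced_utility(pi, py, px, cost, E, gamma):
        """Sketch of u(pi, E) under the assumed model.

        pi    : pi[i] = probability of a positive decision at feature value i
        py    : py[i] = P(y = 1 | feature value i)
        px    : px[i] = initial probability mass of feature value i
        cost  : cost[i][j] = cost of changing from feature value i to j
        E     : iterable of feature-value indices used as explanations
        gamma : the decision maker's constant
        """
        utility = 0.0
        for i in range(len(pi)):
            # Contribution if the individual keeps her initial feature value.
            best = pi[i] * (py[i] - gamma)
            for j in E:
                follows = pi[j] - cost[i][j] >= pi[i]   # best response, Eq. 1
                moved = pi[j] * (py[j] - gamma)         # contribution after moving
                if follows and moved > best:
                    best = moved                        # Eq. 4: largest utility gain
            # Simplification: assumes the decision maker can always recommend an
            # explanation the individual will not follow when moving would hurt.
            utility += px[i] * best
        return utility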

Given the above preliminaries, our goal is to help the decision maker find, first, the optimal set of counterfactual explanations for a pre-defined policy (Section 3) and, then, both the optimal policy and the set of counterfactual explanations jointly (Section 4).

Remarks. Given an individual with an initial feature value, one may think that, by providing the counterfactual explanation that gives the largest utility gain, the decision maker is not acting in the individual's best interest but rather selfishly. This is because there may exist another counterfactual explanation with a lower cost for the individual. However, in our work, we argue that the provided counterfactual explanations help the individual achieve greater self-improvement, which is likely to result in superior long-term well-being. For example, suppose a bank explains to a customer that she will receive the loan she is applying for if she reduces her credit card debt by a smaller percentage rather than a larger one, even though the feature values corresponding to both reductions lie within the region of adaptation of her original features. In that case, the customer will be more likely to default, and this will very negatively impact her long-term well-being.

As argued very recently (Miller et al., 2019; Tabibian et al., 2020), due to Goodhart's law, the conditional probability of the outcome given the features may change after individuals (best-)respond if the features are noncausal. Moreover, Miller et al. (2019) have argued that (best-)responses to noncausal and causal features correspond to gaming and improvement, respectively. In this work, for simplicity, we assume that this conditional probability does not change; it would be very interesting to lift this assumption in future work.

3 Finding the optimal counterfactual explanations for a policy

In this section, our goal is to find the optimal set of counterfactual explanations for a pre-defined policy, i.e.,

(5)

where the cardinality constraint limits the number of counterfactual explanations the decision maker is willing to provide to the population, balancing the right to explanation against trade secrets. As will become clearer in the experimental evaluation in Section 5, our results may persuade decision makers to be transparent about their decision policies, something they are typically reluctant to do despite increasing legal requirements, since we show that transparency increases the utility of the policies. Moreover, throughout this section, we will assume that the decision maker picking the pre-defined policy is rational (i.e., since her goal is to maximize the utility defined in Eq. 3, she assigns a positive decision probability only to feature values whose contribution to the utility is non-negative) and that the policy is outcome monotonic (Tabibian et al., 2020); if the policy is deterministic, our results also hold for policies that are not outcome monotonic. Outcome monotonicity simply means that the higher an individual's outcome probability, the higher her chances of receiving a positive decision.
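In the assumed notation introduced earlier, with C denoting the ground set of candidate explanations and k the maximum number of explanations (both placeholders not fixed by the text), the optimization problem of Eq. 5 and the outcome-monotonicity assumption can be sketched as

\begin{equation*}
  \max_{E \subseteq \mathcal{C},\; |E| \le k} \; u(\pi, E),
  \qquad
  \text{where } \pi \text{ satisfies } \;
  P(y = 1 \mid x) \ge P(y = 1 \mid x') \;\Longrightarrow\; \pi(x) \ge \pi(x')
  \;\text{ for all } x, x' \in \mathcal{X}.
\end{equation*}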

Unfortunately, using a novel reduction from the Set Cover problem (Karp, 1972), the following theorem reveals that we cannot expect to find the optimal set of counterfactual explanations in polynomial time: The problem of finding the optimal set of counterfactual explanations that maximizes utility under a cardinality constraint is NP-hard.

Consider an instance of the Set Cover problem with a universe of elements and a collection of subsets of that universe. In the decision version of the problem, given a budget, we need to decide whether or not there are at most that many sets from the collection whose union equals the universe. With the following procedure, we show that any instance of that problem can be transformed, in polynomial time, into an instance of the problem of finding the optimal set of counterfactual explanations defined in Eq. 5.

Consider feature values corresponding to the elements of and the sets of . Moreover, denote the first feature values as and the remaining as . We set the decision maker’s parameter to some positive constant less than . Then, we set the outcome probabilities and and the policy values and . This way, the portion of utility the decision-maker obtains from the first feature values is zero, while the portion of utility she obtains from the remaining is proportional to . Regarding the cost function, we set , , and all the remaining values of the cost function to . Finally, we set the initial feature value distribution to and . A toy example of this transformation is presented in Figure 1.

Figure 1: Toy example of the transformation, showing the feature values corresponding to the elements of the universe and to the sets of the collection, together with their initial populations. The edges represent the cost between feature values corresponding to sets and their respective elements, while all non-visible pairwise costs are equal to 2.

In this setting, it is easy to observe that an individual with an initial feature value corresponding to an element is always rejected at first and is able to move to a recommended feature value if and only if the corresponding set contains that element. Also, we can easily see that the transformation of instances can be done in polynomial time.

Now, assume there exists an algorithm that optimally solves the problem of finding the optimal set of counterfactual explanations in polynomial time. Given the aforementioned instance and a maximum number of counterfactual explanations , the utility achieved by the set of counterfactual explanations the algorithm returns can fall into one of the following two cases:

  1. The utility equals the maximum achievable value. This can happen only if all individuals, according to the induced distribution, have moved to some of the feature values corresponding to sets of the collection, i.e., for every feature value corresponding to an element, there exists a chosen explanation corresponding to a set that contains that element. As a consequence, if we take the sets corresponding to the chosen explanations, every element is contained in at least one of them, and therefore they form a set cover of size at most the budget.

  2. The utility is strictly smaller than the maximum achievable value. This can happen only if every possible set of counterfactual explanations leaves the individuals of at least one feature value with a best response of not following the counterfactual explanation they were given, i.e., for every choice of at most as many explanations as the budget allows, there exists an element not contained in any of the corresponding sets. Equivalently, there does not exist a set cover of size less than or equal to the budget.

The above directly implies that we could decide any instance of the Set Cover problem in polynomial time, which is a contradiction unless P = NP. This concludes the reduction and proves that the problem of finding the optimal set of counterfactual explanations for a given policy is NP-hard.

Even though Theorem 3 is a negative result, we will now show that the objective function in Eq. 5 satisfies a set of desirable properties, i.e., non-negativity, monotonicity and submodularity, which allow a standard greedy algorithm to enjoy a (1-1/e) approximation ratio for this problem. To this aim, with a slight abuse of notation, we first express the objective as a set function defined over the ground set of candidate counterfactual explanations. Then, we have the following proposition: The function is non-negative, submodular and monotone. That the function is non-negative readily follows from the fact that, if the decision maker is rational, every feature value with a positive decision probability contributes non-negatively to the utility.
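For completeness, the two properties used in the proposition can be stated as follows for a set function f defined over a ground set C (standard definitions, written in the assumed notation):

\begin{align*}
  \text{monotone:}\quad & f(A) \le f(B) && \text{for all } A \subseteq B \subseteq \mathcal{C},\\
  \text{submodular:}\quad & f(A \cup \{x\}) - f(A) \;\ge\; f(B \cup \{x\}) - f(B) && \text{for all } A \subseteq B \subseteq \mathcal{C},\; x \in \mathcal{C} \setminus B.
\end{align*}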

Now, consider two sets and a feature value . Also, let be the counterfactual explanation given to the individuals with initial feature value under a set of counterfactual explanations . It is easy to see that the marginal difference can only be affected by individuals with initial features such that , and . Moreover, we can divide all of these individuals into two cases:

  1. : in this case, the addition of to causes a change in their best-response from to contributing to the marginal difference of by a factor . However, considering the marginal difference of under the set of counterfactual explanations , three subcases are possible:

    1. : the contribution to the marginal difference of is zero.

    2. : the contribution to the marginal difference of is . Since is outcome monotonic, and , it holds that

      Therefore, it readily follows that

    3. : the contribution to the marginal difference of is .

  2. : In this case, the addition of to causes a change in their best-response from to contributing to the marginal difference of by a factor . Considering the marginal difference of under the set of counterfactual explanations , two subcases are possible:

    1. : the contribution to the marginal difference of is zero.

    2. . Then, the contribution of those individuals to the marginal difference of is . Since and , it readily follows that

Finally, it is easy to check that the aforementioned cases are sufficient. Combining all cases, we can see that the contribution of each individual to the marginal difference is always at least as large under the smaller set of counterfactual explanations as under the larger one. As a direct consequence, it follows that the function is submodular. Additionally, we can easily see that this contribution is always non-negative, leading to the conclusion that the function is also monotone.

The above result directly implies that the standard greedy algorithm by Nemhauser et al. (1978) will find a solution whose utility is at least a (1-1/e) fraction of that achieved by the optimal set of counterfactual explanations.

0:  Input: ground set of candidate counterfactual explanations, maximum number of counterfactual explanations, and utility function
0:  Output: set of counterfactual explanations
1:  initialize the solution set to the empty set
2:  while the solution set contains fewer explanations than the maximum number do
3:     find the candidate explanation, not yet in the solution set, with the maximum marginal utility gain
4:     add it to the solution set
5:  end while
6:  return the solution set
ALGORITHM 1 Greedy algorithm

Algorithm 1 summarizes the overall procedure. It starts from an empty solution set and iteratively adds the counterfactual explanation that provides the maximum marginal utility gain. Finally, note that, under oracle access to the utility function, the greedy algorithm makes a number of utility evaluations proportional to the product of the maximum number of explanations and the size of the ground set and, given a set of counterfactual explanations, it is easy to show that the utility itself can be computed in time polynomial in the number of feature values. As a consequence, for our problem, the greedy algorithm runs in polynomial time.
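For concreteness, the procedure can be sketched in a few lines of Python; the function below is a generic rendering of Algorithm 1 in which utility is any set-function oracle, e.g., the induced_utility placeholder sketched in Section 2.

    def greedy_explanations(candidates, k, utility):
        """Standard greedy (Algorithm 1): add, up to k times, the candidate with
        the largest marginal utility gain."""
        E = set()
        while len(E) < k:
            base = utility(E)
            best, best_gain = None, float("-inf")
            for x in candidates:
                if x in E:
                    continue
                gain = utility(E | {x}) - base   # marginal difference of adding x
                if gain > best_gain:
                    best, best_gain = x, gain
            if best is None:
                break                            # fewer than k candidates available
            E.add(best)
        return E

    # Example usage with the earlier sketch (placeholder arguments):
    # E = greedy_explanations(range(len(pi)), k,
    #                         lambda S: induced_utility(pi, py, px, cost, S, gamma))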

Remarks on fair decision making. In many cases, decision makers may like to ensure that individuals across the whole spectrum of the population are incentivized to self-improve. For example, in a loan scenario, the bank may use age group as a feature to estimate the probability that a customer repays the loan; however, it may like to deploy a decision policy that incentivizes individuals across all age groups in order to improve the financial situation of all. To this aim, rather than the cardinality constraint, the decision maker could incorporate a matroid constraint into the problem formulation. Formally, consider a partition of the ground set into disjoint groups of candidate explanations, together with one integer budget per group. Then, a partition matroid is the collection of all subsets of the ground set that contain at most the corresponding budget from each group. In the loan example, the decision maker could search for a set of counterfactual explanations within a partition matroid in which each group corresponds to the feature values covered by one age group. This way, the set of counterfactual explanations would include explanations for every age group. In this case, the decision maker could rely on a variety of polynomial-time algorithms with global guarantees for submodular function maximization under matroid constraints, e.g., the algorithm by Calinescu et al. (2011).
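As a concrete illustration, the sketch below adapts the greedy procedure to a partition-matroid constraint; groups and caps are placeholder names for the group membership and per-group budgets, and the guarantee differs from Algorithm 1 (plain greedy gives a 1/2 factor for monotone submodular maximization under a matroid constraint, while the continuous greedy of Calinescu et al. (2011) recovers (1-1/e)).

    def greedy_partition_matroid(candidates, groups, caps, utility):
        """Greedy selection that never exceeds the per-group budgets caps[g]."""
        E = set()
        used = {g: 0 for g in caps}
        while True:
            base = utility(E)
            best, best_gain = None, 0.0
            for x in candidates:
                g = groups[x]
                if x in E or used[g] >= caps[g]:
                    continue                      # would violate the matroid constraint
                gain = utility(E | {x}) - base
                if gain > best_gain:
                    best, best_gain = x, gain
            if best is None:                      # no feasible candidate improves utility
                break
            E.add(best)
            used[groups[best]] += 1
        return E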

4 Finding the optimal policy and counterfactual explanations

In this section, our goal is to jointly find the optimal decision policy and set of counterfactual explanations, i.e.,

(6)

where, as in the previous section, the cardinality constraint limits the number of counterfactual explanations the decision maker is willing to provide to the population, balancing the right to explanation against trade secrets. Figure 2 shows that, by jointly optimizing both the decision policy and the counterfactual explanations, we may obtain an additional gain in terms of utility in comparison with just optimizing the set of counterfactual explanations given the optimal decision policy in a non-strategic setting. Moreover, as we will show in the experimental evaluation in Section 5, this additional gain can be significant.

Figure 2: Jointly optimizing the decision policy and the counterfactual explanations can offer additional gains. The left panel (a) shows the optimal (deterministic) decision policy under non-strategic behavior, as given by Eq. 8; here, there does not exist a set of counterfactual explanations that increases the utility of the policy, because the regions of adaptation of the feature values that receive a negative decision do not include any feature value that receives a positive decision. The right panel (c) shows the decision policy and counterfactual explanations that are (jointly) optimal in terms of utility, as given by Eq. 6; here, the individuals at two of the feature values receive counterfactual explanations within their regions of adaptation and therefore change their initial feature values in order to receive a positive decision. Both panels correspond to the same problem instance.

As in Section 3, we cannot expect to find the optimal policy and set of counterfactual explanations in polynomial time. More specifically, we have the following negative result: The problem of jointly finding both the optimal policy and the set of counterfactual explanations that maximize utility under a cardinality constraint is NP-hard. The theorem can be easily proven by using Proposition 4 and slightly extending the proof of Theorem 3.

However, while the problem of finding both the policy and the set of counterfactual explanations appears significantly more challenging than the problem of finding just the set of counterfactual explanations given a pre-defined policy (refer to Eq. 5), the following proposition shows that the problem is not inherently harder. More specifically, it shows that, for each possible set of counterfactual explanations, the policy that maximizes the utility can be easily computed: Given a set of counterfactual explanations (since the decision maker is rational, she will never provide an explanation that contributes negatively to her utility), the policy that maximizes the utility is deterministic and can be found in polynomial time, i.e.,

(7)

By definition, since every counterfactual explanation guarantees a positive decision, it readily follows that the policy gives a positive decision to every feature value in the set of explanations. To find the remaining values of the decision policy, we first observe that the value of the policy at a feature value outside the set of explanations does not affect the best responses of individuals with other initial feature values. As a result, we can set the value of the policy independently for each such feature value, so that the best response of the respective individuals is the one that contributes maximally to the overall utility.

First, it is easy to see that, for every feature value outside the set of explanations whose outcome probability is below the decision maker's constant, we should reject, since accepting contributes negatively to the utility and rejecting also maximizes the corresponding region of adaptation. Next, consider the feature values outside the set of explanations whose outcome probability is at least the decision maker's constant. Here, we distinguish two cases. If there exists an explanation with a higher outcome probability that the corresponding individuals are able to reach, then their contribution to the utility will be higher if they move to that explanation; moreover, the value of the decision policy that maximizes their region of adaptation (and thus their chances of moving) is clearly a rejection. If there does not exist such an explanation, then the contribution of the corresponding individuals to the utility will be higher if they keep their initial feature values; moreover, the value of the decision policy that maximizes this contribution is clearly an acceptance. Finally, note that, to set all the values of the decision policy, we only need to perform a number of comparisons that is polynomial in the number of feature values and explanations.
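A minimal Python sketch of the resulting construction, under the same assumed model as before (in particular, that explanations are always accepted and that an individual at x is willing to move to an explanation x' iff pi(x') - cost(x, x') >= pi(x)); the function name and arguments are placeholders, not the paper's notation.

    def optimal_policy_given_explanations(E, py, cost, gamma):
        """Deterministic policy sketch in the spirit of Eq. 7 and the proposition above."""
        n = len(py)
        pi = [0.0] * n
        for i in range(n):
            if i in E:
                pi[i] = 1.0          # explanations guarantee a positive decision
            elif py[i] >= gamma:
                # Reject only if some reachable explanation has a strictly better
                # outcome, so that the individual self-improves by moving
                # (with pi[i] = 0, explanation j is reachable iff cost[i][j] <= 1).
                better_reachable = any(cost[i][j] <= 1.0 and py[j] > py[i] for j in E)
                pi[i] = 0.0 if better_reachable else 1.0
            else:
                # Below the break-even outcome: rejecting is at least as good and
                # maximizes the region of adaptation.
                pi[i] = 0.0
        return pi

Plugged into the utility oracle, this lets each candidate set of explanations be evaluated under its own induced optimal policy, which is what the set function defined below evaluates.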

The above result reveals that, in contrast with the non-strategic setting, the optimal policy given a set of counterfactual explanations is not a deterministic threshold rule with a single threshold (Corbett-Davies et al., 2017; Valera et al., 2018), i.e.,

(8)

but rather a more conservative deterministic decision policy that depends not only on the outcome probability and the decision maker's constant but also on the cost individuals pay to change features. Moreover, we can build on the above result to prove that the problem of finding the optimal decision policy and set of counterfactual explanations can be reduced to maximizing a non-monotone submodular function. To this aim, let the policy be the optimal policy induced by a given set of counterfactual explanations, as in Proposition 4, and define the corresponding set function over the ground set of candidate explanations. Then, we have the following proposition:

The function is non-negative and submodular.

It readily follows that the function is non-negative from the fact that, if the decision maker is rational, every feature value that receives a positive decision under the induced optimal policy contributes non-negatively to the utility.

Next, consider two sets such that and a feature value . Also, let be the counterfactual explanation given to the individuals with initial feature value under a set of counterfactual explanations . Then, it is clear that the marginal difference only depends on individuals with initial features such that either and or . Moreover, if and , the contribution to the marginal difference is positive and, if , the contribution to the marginal difference is negative.

Consider first the individuals with initial features such that and . We can divide all of these individuals into three cases:

  1. : in this case, and the individuals change their best-response from to . Moreover, under the set of counterfactual explanations , their best-response is either or and it changes to . Then, using a similar argument as in the proof of proposition 3, we can conclude that the contribution of the individuals to the marginal difference is greater or equal under the set of counterfactual explanations than under .

  2. : in this case, and . Therefore, under the set of counterfactual explanations , the individuals’ best-response changes from to and there is a positive contribution to the marginal difference while, under , the individuals’ best response does not change and the contribution to the marginal difference is zero.

  3. : in this case, . Therefore, the best-response changes from to under both sets of counterfactual explanations and there is an equal positive contribution to the marginal difference.

Now, consider the individuals with initial features such that . We can divide all of these individuals also into three cases:

  1. : in this case, under both sets of counterfactual explanations, the counterfactual explanation changes the value of the decision policy to . Moreover, the contribution to the marginal difference is less negative under the set of counterfactual explanations than under since and thus .

  2. : in this case, under the set of counterfactual explanations , the individuals’ best response does not change and thus the contribution to the marginal difference is zero and, under the set of counterfactual explanations , their best-response changes from to and thus there is a negative contribution to the marginal difference , .

  3. : in this case, under both sets of counterfactual explanations, the individuals’ best response does not change and thus the contribution to the marginal difference is zero.

As a direct consequence of the above observations, it readily follows that and therefore the function is submodular.

However, in contrast with Section 3, the function is non-monotone, since it can happen that the negative marginal contribution of an additional explanation exceeds the positive one. For example, one can construct a small instance in which, under a singleton set of counterfactual explanations, the optimal policy induces a movement from several feature values to the explanation, whereas, after adding a second explanation to the set, the optimal policy induces a movement from only one feature value and achieves a strictly lower utility. Therefore, the function is non-monotone.

Fortunately, there exist efficient algorithms with global approximation guarantees for maximizing a non-monotone submodular function under cardinality constraints. For example, Buchbinder et al. (2014) have proposed a randomized polynomial-time algorithm that finds a solution whose utility is at least a 1/e fraction of that achieved by the optimal set of counterfactual explanations and the corresponding optimal decision policy. Algorithm 2 summarizes the whole procedure, which is just a randomized variation of the standard greedy algorithm (Algorithm 1). It starts from an empty solution set and iteratively adds one counterfactual explanation at a time. However, instead of greedily choosing the element that provides the maximum marginal difference, it sorts all the candidate elements with respect to their marginal difference (line 3) and picks one uniformly at random among the top candidates, as many as the maximum number of explanations (line 4). Finally, note that, since the above randomized algorithm requires a number of utility evaluations comparable to the standard greedy algorithm, we can use Proposition 4 to conclude that, for our problem, it also runs in polynomial time.

0:  Input: ground set of candidate counterfactual explanations, maximum number of counterfactual explanations, and utility function
0:  Output: set of counterfactual explanations
1:  initialize the solution set to the empty set
2:  while the solution set contains fewer explanations than the maximum number do
3:     GetTopK: rank the remaining candidates by marginal utility gain and keep the top ones (as many as the maximum number of explanations)
4:     pick one of the top candidates uniformly at random
5:     add it to the solution set
6:  end while
7:  return the solution set
ALGORITHM 2 Randomized algorithm

Remarks. To enjoy a 1/e approximation guarantee, Algorithm 2 requires that there are sufficiently many candidate feature values whose marginal contribution to any set is zero. In our problem, this can be trivially satisfied by adding dummy feature values to the ground set that no individual would ever move to, e.g., with prohibitively high costs. If the algorithm adds some of those dummy counterfactual explanations to the solution set, it is easy to see that we can ignore them without causing any difference in utility or in the individuals' best responses.
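A minimal Python sketch of Algorithm 2, in the same placeholder notation as the earlier greedy sketch; it assumes the candidate list has already been padded with the zero-contribution dummies mentioned above, so that at least k candidates always remain.

    import random

    def random_greedy(candidates, k, utility):
        """Randomized greedy (Buchbinder et al., 2014): at each of the k steps,
        pick uniformly at random among the k candidates with the largest
        marginal utility gain."""
        E = set()
        for _ in range(k):
            base = utility(E)
            remaining = [x for x in candidates if x not in E]
            remaining.sort(key=lambda x: utility(E | {x}) - base, reverse=True)
            E.add(random.choice(remaining[:k]))    # lines 3-4 of Algorithm 2
        return E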

5 Experiments

In this section, we perform a series of experiments on both synthetic and real data to evaluate Algorithms 1 and 2. To this aim, we compare the utility of the following decision policies and counterfactual explanations:

Black box: decisions are taken by the optimal decision policy in the non-strategic setting, given by Eq. 8, and individuals do not receive any counterfactual explanations.

Minimum cost: decisions are taken by the optimal decision policy in the non-strategic setting, given by Eq. 8, and individuals receive counterfactual explanations of minimum cost with respect to their initial feature values, similarly to previous work (Tolomei et al., 2017; Karimi et al., 2019; Ustun et al., 2019). More specifically, we cast the problem of finding the set of counterfactual explanations as the minimization of the weighted average cost individuals pay to change their feature values to the closest counterfactual explanation, and we realize that this problem is a version of the k-median problem, which we can solve using a greedy heuristic (Solis-Oba, 2006).

Diverse: decisions are taken by the optimal decision policy in the non-strategic setting, given by Eq. 8, and individuals receive a set of diverse counterfactual explanations of minimum cost with respect to their initial feature values, similarly to previous work (Russell, 2019; Mothilal et al., 2020). To find this set, we realize that the problem can be reduced to the weighted version of the maximum coverage problem, which can be solved using a well-known greedy approximation algorithm (Hochbaum and Pathria, 1998).

Algorithm 1: decisions are taken by the optimal decision policy in the non-strategic setting, given by Eq. 8, and individuals receive counterfactual explanations given by Eq. 4, where the set of explanations is found using Algorithm 1.

Algorithm 2: decisions are taken by the decision policy given by Eq. 7 and individuals receive counterfactual explanations given by Eq. 4, where the set of explanations is found using Algorithm 2.

Figure 3: Results on synthetic data. Panels (a) and (b) show the utility achieved by the above types of decision policies and counterfactual explanations against the total number of feature values and the number of counterfactual explanations, respectively. Panel (c) shows the average cost individuals had to pay to change from their initial features to the feature value of the counterfactual explanation they receive, under the same types of decision policies and counterfactual explanations. In Panel (a), the number of counterfactual explanations is fixed and, in Panels (b) and (c), the number of feature values is fixed. In all panels, each experiment is repeated multiple times.

5.1 Experiments on Synthetic Data

Experimental setup. For simplicity, we consider a fixed set of feature values whose outcome probabilities are sampled from a Gaussian distribution truncated from below at zero. We also sample the pairwise costs between feature values, treating a given fraction of all pairs differently from the rest, and we set the decision maker's constant.

Results. Figures 3(a,b) show the utility achieved by each of the decision policies and counterfactual explanations for several numbers of feature values and counterfactual explanations . We find several interesting insights: (i) the decision policies given by Eq. 7 and the counterfactual explanations found by Algorithm 2 beat all other alternatives by large margins across the whole spectrum, showing that jointly optimizing the decision policy and the counterfactual explanations offer clear additional gains; (ii) the counterfactual explanations found by Algorithms 1 and 2 provide higher utility gains as the number of feature values increases and thus the search space of counterfactual explanations becomes larger; and, (iii) a small number of counterfactual explanations is enough to provide significant gains in terms of utility with respect to the optimal decision policy without counterfactual explanations.

Figure 3(c) shows the average cost individuals had to pay to change from their initial features to the feature value of the counterfactual explanation they receive. As one may have expected, the results show that, under the counterfactual explanations of minimum cost (Minimum cost and Diverse), the individuals invest less effort to change their initial features and the effort drops as the number of counterfactual explanations increases. In contrast, our methods incentivize the individuals to achieve the highest self-improvement, particularly when we jointly optimize the decision policy and the counterfactual explanations.

Figure 4: Results on LendingClub data. Panels (a) and (b) show the utility achieved by the above types of decision policies and counterfactual explanations against the value of the scaling parameter, which controls how difficult it is to change feature values, and against the number of counterfactual explanations, respectively. Panel (c) shows the utility achieved by the same types of decision policies and counterfactual explanations against the number of counterfactual explanations for several values of the leakage probability. In Panel (a), the number of counterfactual explanations is fixed and, in Panels (b) and (c), the scaling parameter is fixed. In all panels, experiments involving randomization are repeated multiple times.

5.2 Experiments on Real Data

Experimental setup. We use a publicly available dataset (https://www.kaggle.com/wordsforthewise/lending-club/version/3) with information about all accepted and rejected loan applications in LendingClub during the 2007-2018 period. For each application, the dataset contains various demographic features about the applicant and, for each accepted application, it contains the current loan status (e.g., Current, Late, Fully Paid), the latest payment information and the FICO scores.

Here, we follow the same pre-processing steps as in Tabibian et al. (2020), which we recall in what follows for completeness. We use information about accepted applications to train a decision tree classifier (DTC) with a fixed number of leaves. This classifier predicts whether an applicant fully pays a loan or defaults/has a charged-off debt on the basis of a set of raw features, i.e., the loan amount, employment length, state, debt to income ratio, zip code and credit score (more information about the raw features can be found in Appendix A.1). We estimate the classifier's accuracy using k-fold cross-validation. Then, for each (accepted or rejected) application, its unidimensional feature value is set to be the leaf of the DTC the application is mapped into, given its raw features, and the conditional probability of repayment is approximated using the prediction of the DTC for the corresponding leaf. Next, we compute the cost between feature values by comparing the raw features of the applications mapped to each leaf of the DTC, and we multiply the cost by a scaling factor representing the hardness of changing features (Appendix A.2 provides more information on this computation). Moreover, we set the decision maker's cost of giving a loan to a percentile of the estimated repayment probability among applicants who default or have a charged-off debt. Finally, we compare the utility achieved by our explanation methods with the same baselines we used on synthetic data.
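A minimal sketch of this preprocessing step, with placeholder data standing in for the LendingClub raw features and repayment labels and a placeholder number of leaves (none of the concrete values below are taken from the paper):

    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.model_selection import cross_val_score
    from sklearn.tree import DecisionTreeClassifier

    # Placeholder data standing in for the preprocessed LendingClub raw features
    # and repayment labels (1 = fully paid, 0 = default / charged-off debt).
    X_raw, y = make_classification(n_samples=5000, n_features=10, random_state=0)
    n_leaves = 200                         # placeholder for the number of DTC leaves

    dtc = DecisionTreeClassifier(max_leaf_nodes=n_leaves, random_state=0)
    print("CV accuracy:", cross_val_score(dtc, X_raw, y, cv=5).mean())
    dtc.fit(X_raw, y)

    # Each application's unidimensional feature value is the DTC leaf it is mapped
    # into, and the repayment probability is approximated by that leaf's prediction.
    leaf_of = dtc.apply(X_raw)                      # leaf index per application
    p_repay = dtc.predict_proba(X_raw)[:, 1]        # P(y = 1 | leaf), per application
    py = {leaf: p for leaf, p in zip(leaf_of, p_repay)}
    px = {leaf: np.mean(leaf_of == leaf) for leaf in set(leaf_of)}   # feature distribution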

Results. First, we compare the utility achieved by each of the decision policies and counterfactual explanations for several values of the scaling parameter and numbers of counterfactual explanations. Figures 4(a, b) summarize the results, which show that, as the cost of changing to feature values with higher outcomes decreases, the competitive advantage gained by jointly optimizing the decision policy and the counterfactual explanations (Algorithm 2) grows significantly. Moreover, as in the experiments on synthetic data, just a small number of counterfactual explanations is sufficient to provide significant gains in terms of utility with respect to the optimal decision policy without counterfactual explanations.

Next, we challenge the assumption that individuals do not share the counterfactual explanations they receive with other individuals with different feature values. To this end, we assume that, given the set of counterfactual explanations found by Algorithm 2, each individual receives the counterfactual explanation given by Eq. 4 and, with a leakage probability, she also receives an additional explanation picked at random from the set, and she follows the counterfactual explanation that benefits her the most. Figure 4(c) summarizes the results for several values of the leakage probability and numbers of counterfactual explanations, which show that, while providing counterfactual explanations is always better than providing none, if the leakage probability is large, we are better off providing a small number of counterfactual explanations.

6 Conclusions

In this paper, we have studied the problem of finding the decision policies and counterfactual explanations that maximize utility in a setting in which individuals who are subject to the decisions taken by the policies use the counterfactual explanations they receive to invest effort strategically. Given a pre-defined policy, we have first shown that finding the optimal set of counterfactual explanations is a hard computational problem but presents some favorable properties, which allow the standard greedy algorithm to enjoy approximation guarantees. Additionally, we have also shown that, given a set of counterfactual explanations, the optimal decision policy is deterministic and can be found in polynomial time. Finally, building on these results, we have shown that, perhaps surprisingly, the problem of jointly finding both the optimal policy and set of counterfactual explanations reduces to maximizing a non-monotone submodular function, a problem that can be solved using a randomized greedy algorithm with approximation guarantees.

By uncovering a previously unexplored connection between strategic machine learning and interpretable machine learning, our work opens up many interesting directions for future work. For example, we have adopted a specific type of mechanism to provide counterfactual explanations (i.e., one feature value per individual using a Stackelberg formulation). A natural next step would be to extend our analysis to other types of mechanisms fitting a variety of real-world applications. Moreover, we have assumed that the cost individuals pay to change features is given. However, our algorithms would be more effective if we could reliably estimate the cost function from real observational (or interventional) data. In our work, we have assumed that features take discrete values and that individuals who are subject to the decisions do not share information among themselves. It would be interesting to lift these assumptions, extend our analysis to real-valued features, and develop decision policies and counterfactual explanations that are robust to information sharing between individuals (refer to Figure 4(c)). Finally, by assuming that the conditional outcome distribution does not change after individuals best respond, we are implicitly assuming that the features are causal. However, in practice, this assumption is likely to be violated, as recently noted by Miller et al. (2019). It would be worth exploring the use of counterfactual explanations to distinguish between noncausal and causal features.

References

  • Brückner and Scheffer (2011) Michael Brückner and Tobias Scheffer. Stackelberg games for adversarial prediction problems. In Proceedings of the 17th ACM SIGKDD international conference on Knowledge discovery and data mining, pages 547–555, 2011.
  • Buchbinder et al. (2014) Niv Buchbinder, Moran Feldman, Joseph Naor, and Roy Schwartz. Submodular maximization with cardinality constraints. In Proceedings of the twenty-fifth annual ACM-SIAM symposium on Discrete algorithms, pages 1433–1452. SIAM, 2014.
  • Calinescu et al. (2011) Gruia Calinescu, Chandra Chekuri, Martin Pal, and Jan Vondrák. Maximizing a monotone submodular function subject to a matroid constraint. SIAM Journal on Computing, 40(6):1740–1766, 2011.
  • Chakraborty et al. (2017) Supriyo Chakraborty, Richard Tomsett, Ramya Raghavendra, Daniel Harborne, Moustafa Alzantot, Federico Cerutti, Mani Srivastava, Alun Preece, Simon Julier, Raghuveer M Rao, et al. Interpretability of deep learning models: a survey of results. In 2017 IEEE SmartWorld, Ubiquitous Intelligence & Computing, Advanced & Trusted Computed, Scalable Computing & Communications, Cloud & Big Data Computing, Internet of People and Smart City Innovation (SmartWorld/SCALCOM/UIC/ATC/CBDCom/IOP/SCI), pages 1–6. IEEE, 2017.
  • Coate and Loury (1993) S. Coate and G. Loury. Will affirmative-action policies eliminate negative stereotypes? The American Economic Review, 1993.
  • Corbett-Davies et al. (2017) Sam Corbett-Davies, Emma Pierson, Avi Feller, Sharad Goel, and Aziz Huq. Algorithmic decision making and the cost of fairness. In Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 797–806, 2017.
  • Dalvi et al. (2004) Nilesh Dalvi, Pedro Domingos, Sumit Sanghai, and Deepak Verma. Adversarial classification. In Proceedings of the tenth ACM SIGKDD international conference on Knowledge discovery and data mining, pages 99–108, 2004.
  • Dong et al. (2018) Jinshuo Dong, Aaron Roth, Zachary Schutzman, Bo Waggoner, and Zhiwei Steven Wu. Strategic classification from revealed preferences. In Proceedings of the 2018 ACM Conference on Economics and Computation, pages 55–70, 2018.
  • Doshi-Velez and Kim (2017) Finale Doshi-Velez and Been Kim. Towards a rigorous science of interpretable machine learning. arXiv preprint arXiv:1702.08608, 2017.
  • Fryer and Loury (2013) R. Fryer and G. Loury. Valuing diversity. Journal of Political Economy, 2013.
  • Gunning and Aha (2019) David Gunning and David W Aha. DARPA's explainable artificial intelligence program. AI Magazine, 40(2):44–58, 2019.
  • Hardt et al. (2016) Moritz Hardt, Nimrod Megiddo, Christos Papadimitriou, and Mary Wootters. Strategic classification. In Proceedings of the 2016 ACM conference on innovations in theoretical computer science, pages 111–122, 2016.
  • Hochbaum and Pathria (1998) Dorit S Hochbaum and Anu Pathria. Analysis of the greedy approach in problems of maximum k-coverage. Naval Research Logistics (NRL), 45(6):615–627, 1998.
  • Hu and Chen (2018) L. Hu and Y. Chen. A short-term intervention for long-term fairness in the labor market. In WWW, 2018.
  • Hu et al. (2019) Lily Hu, Nicole Immorlica, and Jennifer Wortman Vaughan. The disparate effects of strategic manipulation. In Proceedings of the Conference on Fairness, Accountability, and Transparency, pages 259–268, 2019.
  • Karimi et al. (2019) Amir-Hossein Karimi, Gilles Barthe, Borja Belle, and Isabel Valera. Model-agnostic counterfactual explanations for consequential decisions. arXiv preprint arXiv:1905.11190, 2019.