Bayesian Persuasion for Algorithmic Recourse

12/12/2021
by   Keegan Harris, et al.
Carnegie Mellon University

When subjected to automated decision-making, decision subjects may strategically modify their observable features in ways they believe will maximize their chances of receiving a favorable decision. In many practical situations, the underlying assessment rule is deliberately kept secret to avoid gaming and maintain competitive advantage. The resulting opacity forces the decision subjects to rely on incomplete information when making strategic feature modifications. We capture such settings as a game of Bayesian persuasion, in which the decision maker offers a form of recourse to the decision subject by providing them with an action recommendation (or signal) to incentivize them to modify their features in desirable ways. We show that when using persuasion, both the decision maker and decision subject are never worse off in expectation, while the decision maker can be significantly better off. While the decision maker's problem of finding the optimal Bayesian incentive-compatible (BIC) signaling policy takes the form of optimization over infinitely-many variables, we show that this optimization can be cast as a linear program over finitely-many regions of the space of possible assessment rules. While this reformulation simplifies the problem dramatically, solving the linear program requires reasoning about exponentially-many variables, even under relatively simple settings. Motivated by this observation, we provide a polynomial-time approximation scheme that recovers a near-optimal signaling policy. Finally, our numerical simulations on semi-synthetic data empirically illustrate the benefits of using persuasion in the algorithmic recourse setting.


1 Introduction

High-stakes decision-making systems increasingly utilize data-driven algorithms to assess individuals in such domains as education [kuvcak2018machine], employment [chalfin2016productivity, raghavan2020mitigating], and lending [jagtiani2019roles]. Individuals subjected to these assessments (henceforth, decision subjects) may strategically modify their observable features in ways they believe maximize their chances of receiving favorable decisions [homonoff2021does, citron2014scored]. The decision subject often has a set of actions/interventions available to them. Each of these actions leads to some measurable effect on their observable features and, subsequently, their decision. From the decision maker's perspective, some of these actions may be more desirable than others. Consider credit scoring as an example. (Other examples of strategic settings which arise as a result of decision-making include college admissions, in which a college/university (decision maker) decides whether or not to admit a prospective student (decision subject); hiring, in which a company decides whether or not to hire a job applicant; and lending, in which a banking institution decides to accept or reject someone applying for a loan. Oftentimes, the decision maker is aided by automated decision-making tools in these situations, e.g., [kuvcak2018machine, sanchez2020does, jagtiani2019roles].) Credit scores predict how likely an individual applicant is to pay back a loan on time. Financial institutions regularly utilize credit scores to decide whether to offer applicants their financial products and to determine the terms and conditions of their offers (e.g., by setting the interest rate or credit limit). Given their (partial) knowledge of credit scoring instruments, applicants regularly attempt to improve their scores. For instance, a business applying for a loan may improve its score by paying off existing debt or by cleverly manipulating its financial records to appear more profitable. While both of these interventions may improve the credit score, the former is more desirable than the latter from the perspective of the financial institution offering the loan. The question we are interested in answering in this work is: how can the decision maker incentivize decision subjects to take such beneficial actions while discouraging manipulations?

The strategic interactions between decision-making algorithms and decision subjects have motivated a growing literature known as strategic learning (see e.g., [hardt2016strategic, dongetal, shavit2020causal, kleinberg2020classifiers, harris2021stateful]). While much of the prior work in strategic learning operates under the assumption of full transparency (i.e., the assessment rule is public knowledge), we consider settings where full disclosure of the assessment rule is not a viable alternative. In many real-world situations, revealing the exact logic of the decision rule is either infeasible or irresponsible. For instance, credit scoring formulae are closely guarded trade secrets, in part to prevent the risk of default rates surging if applicants learn how to manipulate them. In such settings, the decision maker may still have a vested interest in providing some information about the decision rule to decision subjects to provide a certain level of transparency and recourse. In particular, the decision maker may be legally obliged, or economically motivated, to guide decision subjects to take actions that improve their underlying qualifications. To do so, the decision maker can recommend actions for decision subjects to take. Of course, such recommendations need to be chosen carefully and credibly; otherwise, self-interested decision subjects may not follow them or, even worse, they may utilize the recommendations to find pathways for manipulation.

We study a model of strategic learning in which the underlying assessment rule is not revealed to decision subjects. Our model captures several key aspects of the setting described above: First, even though the assessment rule is not revealed to the decision subjects, they often have prior knowledge about what the rule may be. Second, when the decision maker provides recommendations to decision subjects on which action to take, the recommendations should be compatible with the subjects' incentives to ensure they will follow the recommendation. Finally, our model assumes the decision maker discloses how they generate recommendations for recourse, an increasingly relevant requirement under recent regulations (e.g., [gdpr]).

Utilizing our model, we aim to design a mechanism for a decision maker to provide recourse to a decision subject with incomplete information about the underlying assessment rule. We assume the assessment rule makes predictions about some future outcome of the decision subject (e.g., whether they pay back the loan on time if granted). Before the assessment rule is trained (i.e., before the model parameters are fit), the decision maker and decision subject have some prior belief about the realization of the assessment rule. This prior represents the "common knowledge" about the importance of various observable features for making accurate predictions. After training, the assessment rule is revealed to the decision maker, who then recommends an action for the decision subject to take, based on their pre-determined signaling policy. Upon receiving this action recommendation, the decision subject updates their belief about the underlying assessment rule. They then take the action which they believe (according to the updated belief) will maximize their expected utility (i.e., the benefit from the decision they receive, minus the cost of taking their selected action). Finally, the decision maker uses the assessment rule to make a prediction about the decision subject.

The interaction described above is an instance of Bayesian persuasion, a game-theoretic model of information revelation originally due to kamenica2011bayesian. For background on the general Bayesian persuasion model, see Section 1.1. The specific instance of Bayesian persuasion we consider in this work is summarized below.

Interaction protocol for our setting   Before training, the decision maker and decision subject have some prior/belief about the true assessment rule. After training, the assessment rule is revealed to the decision maker. The decision maker then uses their signaling policy and knowledge of the assessment rule to recommend an action for the decision subject to take. The decision subject updates their belief given the recommendation. They then take a (possibly different) action, and receive a prediction through the assessment rule.

Our contributions

Our central conceptual contribution is to cast the problem of offering recourse under partial transparency as a game of Bayesian persuasion. Our key technical contributions consist of comparing optimal action-recommendation policies in this new setup with two natural alternatives: (1) fully revealing the assessment rule to the decision subjects, or (2) revealing no information at all about the assessment rule. We provide new insights about the potentially significant advantages of action recommendation over these baselines, and offer efficient formulations to derive the optimal recommendations. More specifically, our analysis offers the following takeaways:

  1. Using tools from Bayesian persuasion, we show that it is possible for the decision maker to provide incentive-compatible action recommendations that encourage rational decision subjects to modify their features through beneficial interventions (Section 2.1).

  2. Perhaps most importantly, we show that the optimal signaling policy is more effective than the above two baselines in encouraging positive interventions on the part of the decision subjects (Section 3).

  3. While the decision maker and decision subjects are never worse off in expectation from using optimal incentive-compatible recommendations, we show that situations exist in which the decision maker is significantly better off in expectation utilizing the optimal signaling policy (as opposed to the two baselines) (Section 3.1).

  4. We derive the optimal signaling policy for the decision maker. While computing this policy initially appears challenging (as it involves optimizing over a continuum of variables), we show that the problem can naturally be cast as a linear program (Section 4).

  5. We show that even for relatively simple examples, solving this linear program requires reasoning about exponentially-many variables. Motivated by this observation, we provide a polynomial-time algorithm to approximate the optimal signaling policy up to additive terms (Section 5).

  6. Finally, we empirically evaluate our persuasion mechanism on semi-synthetic data based on the Home Equity Line of Credit (HELOC) dataset, and find that the optimal signaling policy performs significantly better than the two natural alternatives in practice (Section 6).

1.1 Related Work

Bayesian Persuasion. In its most basic form, Bayesian persuasion [kamenica2011bayesian] is modeled as a game between a sender (with private information) and a receiver. At the beginning of the game, the sender and receiver share a prior over some unknown state of nature, which will eventually be revealed to the sender. Before the state of nature is revealed, the sender commits to a signaling policy, a (probabilistic) mapping from states of nature to action recommendations. (Such commitment is especially plausible when the sender is a software agent, as is the case in our setting, since the agent is committed to playing the policy prescribed by its code once it is deployed.) After the sender commits to a signaling policy, the state of nature is revealed to the sender, who then sends a signal (according to their policy) to the receiver. The receiver uses this signal to form a posterior over the possible states of nature, and then takes an action which affects the payoffs of both players. Several extensions to the original Bayesian persuasion model have been proposed, including persuasion with multiple receivers [arieli2019private], persuasion with multiple senders [li2018bayesian], and persuasion with heterogeneous priors [alonso2016bayesian]. There has been growing interest in persuasion in the computer science and machine learning communities in recent years. dughmi2017algorithmic2 and dughmi2019algorithmic characterize the computational complexity of computing the optimal signaling policy for several popular models of persuasion. castiglioni2020online study the problem of learning the receiver's utilities through repeated interactions. Work in the multi-armed bandit literature [mansour2015bayesian, MansourSSW16, immorlica2019bayesian, chen2018incentivizing, sellke2021price] leverages techniques from Bayesian persuasion to incentivize agents to perform bandit exploration.

Strategic responses to unknown predictive models. To the best of our knowledge, our work is the first to use tools from persuasion to model the strategic interaction between a decision maker and strategic decision subjects when the underlying predictive model is not public knowledge. Several prior articles have addressed similar problems through different models and techniques. For example, akyol2016price quantify the "price of transparency", a quantity which compares the decision maker's utility when the predictive model is fully known with their utility when the model is not revealed to the decision subjects. ghalme2021strategic compare the prediction error of a classifier when it is public knowledge with the error when decision subjects must learn a version of it, and label this difference the "price of opacity". They show that small errors in decision subjects' estimates of the true underlying model may lead to large errors in the performance of the model. The authors argue that their work provides formal incentives for decision makers to adopt full transparency as a policy. Our work, in contrast, is based on the observation that even if decision makers are willing to reveal their models, legal requirements, privacy concerns, and intellectual property restrictions may prohibit full transparency. So we instead study the consequences of partial transparency—a commonplace condition in real-world domains.

bechavod2021information study the effects of information discrepancy across different sub-populations of decision subjects on their ability to improve their observable features in strategic learning settings. Like us, they do not assume the predictive model is fully known to the decision subjects. Instead, the authors model decision subjects as trying to infer the underlying predictive model by learning from their social circle of family and friends, which naturally causes different groups to form within the population. In contrast to this line of work, we study a setting in which the decision maker provides customized feedback to each decision subject individually. Additionally, while the models proposed by [ghalme2021strategic, bechavod2021information] circumvent the assumption of full information about the deployed model, they restrict the decision subjects’ knowledge to be obtained only through past data.

Algorithmic recourse. Our work is closely related to recent work on algorithmic recourse [karimi2021survey]. Algorithmic recourse is concerned with providing explanations and recommendations to individuals who are unfavorably treated by automated decision-making systems. A line of algorithmic recourse methods, including [wachter2017counterfactual, ustun2019actionable, joshi2019towards], focuses on finding recourses that are actionable, or realistic, for decision subjects to take to improve their decision. In contrast, our action recommendations are "actionable" in the sense that they are interventions which promote long-term desirable behaviors while ensuring that the decision subject is not worse off in expectation. Finally, more recent work [slack2021counterfactual] shows that existing recourse methods based on counterfactual approaches are not robust to manipulations. Our approach to recourse is not counterfactual-based and instead uses a Bayesian persuasion mechanism to ensure decision subject compliance.

Transparency. Recent legal and regulatory frameworks, such as the General Data Protection Regulation (GDPR) [gdpr], motivate the development of forms of algorithmic transparency suitable for real-world deployment. While this work can be thought of as providing additional transparency into the decision-making process, it does not naturally fall into the existing organizations of explanation methods (e.g., as outlined in [chen2021towards]), as our policy does not simply recommend actions based on the decision rule. Rather, our goal is to incentivize actionable interventions on the decision subjects' observable features which are desirable to the decision maker, and we leverage persuasion techniques to ensure compliance. One of the most prevalent use cases of automated decision-making is credit scoring models (a widely used one is the FICO scoring model). These models evaluate an individual's creditworthiness based on that individual's payment history, credit utilization, credit history, and other factors, all of which are then weighted based on proprietary formulas. Two statutes, the Fair Credit Reporting Act (FCRA) and the Equal Credit Opportunity Act (ECOA), govern these models and enforce a requirement to provide individuals who are adversely impacted by such automated decision-making with a statement of reasons, or an outcome-based explanation [selbst2018intuitive].

Other strategic learning settings. The strategic learning literature [hardt2016strategic, ghalme2021strategic, chen2021strategic, levanon2021strategic, jagadeesan2021alternative, bechavod2021information, harris2021stateful, harris2021strategic, kleinberg2020classifiers, frankel2019improving] broadly studies machine learning questions in the presence of strategic decision subjects. A long line of work in this literature focuses on how strategic decision subjects adapt their input to a machine learning algorithm in order to receive a more desirable prediction. However, most prior work assumes that the underlying assessment rule is fully revealed to the decision subjects, an assumption that rarely holds in practice.

2 Setting and Background

Consider a setting in which a decision maker assigns a predicted label (e.g., whether or not someone will repay a loan if granted one) to a decision subject with observable features (e.g., amount of current debt, bank account balance, etc.). (We append a to the decision subject's feature vector for notational convenience.)

We assume the decision maker uses a linear decision rule to make predictions, i.e., , where the assessment rule is chosen by the decision maker. The goal of the decision subject is to receive a positive classification (e.g., get approved for a loan). Given this goal, the decision subject may choose to take some action from some set of possible actions to modify their observable features (for example, they may decide to pay off a certain amount of existing debt, or redistribute their debt to game the credit score). We assume that the decision subject has actions at their disposal in order to improve their outcomes. For convenience, we add to to denote taking "no action". By taking action , the decision subject incurs some cost . This could be an actual monetary cost, but it can also represent non-monetary notions of cost such as opportunity cost or the time/effort cost the decision subject may have to exert to take the action. We assume taking an action changes a decision subject’s observable feature values from to , where , and specifies the change in the th observable feature as the result of taking action . For the special case of , we have , . As a result of taking action , a decision subject, ds, receives utility . In other words, the decision subject receives some positive (negative) utility for a positive (negative) classification, subject to some cost for taking said action.

If the decision subject had exact knowledge of the assessment rule used by the decision maker, they could solve an optimization problem to determine the best action to take in order to maximize their utility. However, in many settings it is not realistic for a decision subject to have perfect knowledge of . Instead, we model the decision subject's information through a prior over , which can be thought of as "common knowledge" about the relative importance of each observable feature to the classifier. We will use to denote the probability density function of (so that denotes the probability of the deployed assessment rule being ). We assume the decision subject is rational and risk-neutral: at any point during the interaction, if they hold a belief about the underlying assessment rule, they pick an action that maximizes their expected utility with respect to that belief. More precisely, they solve:

From the decision maker’s perspective, some actions may be more desirable than others. For example, a bank may prefer that an applicant pay off more existing debt than less when applying for a loan. To formalize this notion of action preference, we say that the decision maker receives some utility when the decision subject takes action . In the loan example, .
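To make the decision subject's best-response problem concrete, the following Python sketch computes the expected-utility-maximizing action under a Monte Carlo approximation of the prior over linear assessment rules. All numerical values below (initial features, feature changes, costs, decision payoffs, and the prior's parameters) are illustrative assumptions rather than quantities from our model or data.

import numpy as np

rng = np.random.default_rng(0)

# Illustrative problem data (assumptions, not from the paper).
x0 = np.array([0.2, 0.5, 1.0])          # initial features, with a constant appended
deltas = {                               # feature change induced by each action
    "a0": np.zeros(3),                   # "no action"
    "a1": np.array([0.3, 0.0, 0.0]),     # e.g., pay off existing debt
    "a2": np.array([0.0, 0.4, 0.0]),     # e.g., open a new credit line
}
costs = {"a0": 0.0, "a1": 0.25, "a2": 0.10}
r_pos, r_neg = 1.0, 0.0                  # subject's payoff for a positive / negative decision

# Belief over the linear assessment rule, represented by Monte Carlo samples.
theta_samples = rng.normal(loc=[1.0, 0.5, -0.6], scale=0.3, size=(10_000, 3))

def expected_utility(action, theta_samples):
    """Subject's expected utility of an action under the sampled belief."""
    x_new = x0 + deltas[action]
    p_positive = np.mean(theta_samples @ x_new >= 0.0)   # P(positive classification)
    return p_positive * r_pos + (1 - p_positive) * r_neg - costs[action]

best = max(deltas, key=lambda a: expected_utility(a, theta_samples))
print({a: round(expected_utility(a, theta_samples), 3) for a in deltas}, "->", best)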

2.1 Bayesian Persuasion in the Algorithmic Recourse Setting

The decision maker has an information advantage over the decision subject, due to the fact that they know the true assessment rule , whereas the decision subject does not. The decision maker may be able to leverage this information advantage to incentivize the decision subject to take a more favorable action (compared to the one they would have taken according to their prior) by recommending an action to the decision subject according to a commonly known signaling policy.

Definition 2.1 (Signaling Policy).

A signaling policy is a (possibly stochastic) mapping from assessment rules to actions. (Note that since our model focuses on the decision maker's interactions with a single decision subject, we drop the dependence of on the decision subject's characteristics.)

We use to denote the action recommendation sampled from signaling policy , where is a realization from .

The decision maker's signaling policy is assumed to be fixed and common knowledge. This is because in order for the decision subject to perform a Bayesian update based on the observed recommendation, they must know the signaling policy. Additionally, the decision maker must have the power of commitment, i.e., the decision subject must believe that the decision maker will select actions according to their signaling policy. In our setting, this means that the decision maker must commit to their signaling policy before training their assessment rule. This can be seen as a form of transparency, as the decision maker is publicly committing to how they will use their assessment rule to provide action recommendations/recourse before they even train it. For simplicity, we assume that the decision maker shares the same prior beliefs as the decision subject over the assessment rule before the model is trained. These assumptions are standard in the Bayesian persuasion literature (see, e.g., [kamenica2011bayesian, mansour2015bayesian, MansourSSW16]).

In order for the decision subject to be incentivized to follow the actions recommended by the decision maker, the signaling policy needs to be Bayesian incentive-compatible.

Definition 2.2 (Bayesian incentive-compatibility).

Consider a decision subject ds with initial observable features and prior . A signaling policy is Bayesian incentive-compatible (BIC) for ds if

(1)

for all actions such that had positive support on .

In other words, a signaling policy is BIC if, given that the decision maker recommends action , the decision subject’s expected utility is at least as high as the expected utility of taking any other action under the posterior.

We remark that while, for ease of exposition, our model focuses on the interactions between the decision maker and a single decision subject, our results can be extended to a heterogeneous population of decision subjects. Under such a heterogeneous setting, the decision maker would publicly commit to a method of computing the signaling policy, given a decision subject's initial observable features as input. Once a decision subject arrives, their feature values are observed and the signaling policy is computed.
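As a concrete illustration of Definition 2.2, the sketch below numerically checks whether a given signaling policy is BIC for a discretized prior over finitely many assessment rules. The rules, prior weights, policy, feature changes, and costs are hypothetical placeholders, not values from our model.

import numpy as np

# A minimal BIC check for a discretized prior over assessment rules.
thetas = [np.array([1.0, -0.5]), np.array([0.2, 0.9]), np.array([-0.3, 0.4])]
prior = np.array([0.5, 0.3, 0.2])
actions = ["a0", "a1"]
# sigma[i][a] = probability of recommending action a when the realized rule is thetas[i]
sigma = [{"a0": 1.0, "a1": 0.0}, {"a0": 0.2, "a1": 0.8}, {"a0": 0.0, "a1": 1.0}]

def subject_utility(theta, action, x0, deltas, costs):
    """Decision outcome (1 if positive) minus the cost of the action."""
    x_new = x0 + deltas[action]
    return float(theta @ x_new >= 0.0) - costs[action]

def is_bic(x0, deltas, costs, tol=1e-9):
    for rec in actions:                                       # recommended action
        weights = prior * np.array([s[rec] for s in sigma])   # posterior weight: prior * P(rec | theta)
        if weights.sum() == 0:                                # recommendation never sent
            continue
        posterior = weights / weights.sum()
        u = {a: sum(posterior[i] * subject_utility(t, a, x0, deltas, costs)
                    for i, t in enumerate(thetas)) for a in actions}
        if u[rec] < max(u.values()) - tol:                    # following rec must be a best response
            return False
    return True

x0 = np.array([0.4, 1.0])
deltas = {"a0": np.zeros(2), "a1": np.array([0.5, 0.0])}
costs = {"a0": 0.0, "a1": 0.2}
print(is_bic(x0, deltas, costs))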

3 The Motivation Behind Persuasion

As is the case in the Bayesian persuasion literature [kamenica2011bayesian, kamenica2019bayesian, dughmi2019algorithmic], the decision maker can in general achieve a higher expected utility with an optimized signaling policy than they could had they provided no recommendation or fully disclosed the model. To characterize how much leveraging the decision maker's information advantage (by recommending actions according to a BIC signaling policy) may improve their expected utility, we study the following example.

Consider a simple setting under which a single decision subject has one observable feature (e.g., credit score) and two possible actions: “do nothing” (i.e., , , ) and “pay off existing debt” (i.e., , , ), which in turn raises their credit score. For the sake of our illustration, we assume credit-worthiness to be a mutually desirable trait, and credit scores to be a good measure of credit-worthiness. We assume the decision maker would like to design a signaling policy to maximize the chance of the decision subject taking action , regardless of whether or not the applicant will receive the loan. In this simple setting, the decision maker’s decision rule can be characterized by a single threshold parameter , i.e., the decision subject receives a positive classification if and a negative classification otherwise. Note that while the decision subject does not know the exact value of , they instead have some prior over it, denoted by .

Given the true value of , the decision maker recommends an action for the decision subject to take. The decision subject then takes a possibly different action , which changes their observable feature from to . Recall that the decision subject’s utility takes the form . Note that if , then holds for any value of , meaning that it is impossible to incentivize any rational decision subject to play action . Therefore, in order to give the decision maker a “fighting chance” at incentivizing action , we assume the cost of action is such that .

We observe that in this simple setting, we can bin values of into three different "regions", based on the outcome the decision subject would receive if were actually in that region. First, if , the decision subject will not receive a positive classification, even if they take action . In this region, the decision subject's initial feature value is "too low" for taking the desired action to make a difference in their classification. We refer to this region as region . Second, if , the decision subject will receive a positive classification no matter what action they take. In this region, is "too high" for the action they take to make any difference in their classification. We refer to this region as region . Third, if and , the decision subject will receive a positive classification if they take action and a negative classification if they take action . We refer to this region as region . Consider the following signaling policy.

Signaling policy 
  Case 1: Recommend action with probability and action with probability .
  Case 2: Recommend action with probability .
  Case 3: Recommend action with probability and action with probability .

In Case 2, recommends the action () that the decision subject would have taken had they known the true , with probability . However, in Case 1 and Case 3, the decision maker recommends, with probability , an action () that the decision subject would not have taken knowing , leveraging the fact that the decision subject does not know exactly which case they are currently in. If the decision subject follows the decision maker's recommendation from , then the decision maker's expected utility will increase from to if the realized or , and will remain the same otherwise. Intuitively, if is "small enough" (where the precise definition of "small" depends on the prior over and the cost of taking action ), then it will be in the decision subject's best interest to follow the decision maker's recommendation, even though they know that the decision maker may sometimes recommend taking action when it is not in their best interest to take that action! That is, the decision maker may recommend that a decision subject pay off existing debt with probability when it is unnecessary for them to do so in order to secure a loan. We now give a criterion on which ensures that the signaling policy is BIC.

Proposition 3.1.

Signaling policy is Bayesian incentive-compatible if , where .

Proof Sketch. We show that and . Since these conditions are satisfied, is BIC.

Proof.

Based on the decision subject’s prior over , they can calculate

  • , i.e., the probability the decision subject is in region according to the prior

  • , i.e., the probability the decision subject is in region according to the prior

  • , i.e., the probability the decision subject is in region according to the prior

Case 1: . Given the signal , the decision subject’s posterior probability density function over , , and will take the form

If the decision subject receives signal , they know with probability that they are not in region . Therefore, they know that taking action will not change their classification, so they will follow the decision maker's recommendation and take action .

Case 2: . Given the signal , the decision subject’s posterior density over , , and will take the form

The decision subject’s expected utility of taking actions and under the posterior induced by are

and

In order for to be BIC,

Plugging in our expressions for and , we see that

After canceling terms and simplifying, we see that

Next, we plug in for , , and . Note that the denominators of , , and cancel out.

Solving for , we see that

Note that always. Finally, in order for to be a valid probability, we restrict such that

This completes the proof. ∎

Figure 1: Illustration of how (the probability of recommending action when , left) and (the expected decision maker utility, right) change as a function of and . As increases, and the expected utility increase until , at which point they remain constant. As increases, and remain constant until taking action becomes prohibitively expensive, at which point both start to decay.

Under this setting, the decision maker will achieve expected utility . See Figure 1 for an illustration of how and vary with and .
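The example above can also be explored numerically. The sketch below assumes a uniform prior over the threshold and illustrative values for the initial feature, the feature change induced by the beneficial action, and its cost; it computes the largest probability of recommending the beneficial action outside the pivotal region that keeps the policy BIC, together with the decision maker's resulting expected utility, in the spirit of Proposition 3.1. The closed-form bound below is derived only for this illustrative parameterization (decision payoff 1 vs. 0, decision maker utility 1 for the beneficial action and 0 otherwise).

import numpy as np

def bic_policy_and_utility(x, dx, c, tau_lo=0.0, tau_hi=1.0):
    """BIC recommendation probability and decision maker utility for the 1-D example."""
    # Uniform prior over the threshold tau on [tau_lo, tau_hi].
    width = tau_hi - tau_lo
    q_free  = np.clip(x - tau_lo, 0, width) / width                               # positive no matter what
    q_pivot = np.clip(min(x + dx, tau_hi) - max(x, tau_lo), 0, width) / width     # beneficial action flips the decision
    q_waste = 1.0 - q_free - q_pivot                                              # beneficial action cannot help
    # Largest probability p of recommending the beneficial action outside the
    # pivotal region that keeps the recommendation Bayesian incentive-compatible.
    denom = c * (q_free + q_waste)
    p = 1.0 if denom == 0 else min(1.0, q_pivot * (1 - c) / denom)
    # Decision maker gets utility 1 whenever the (obedient) subject takes the beneficial action.
    dm_utility = q_pivot + p * (q_free + q_waste)
    return p, dm_utility

print(bic_policy_and_utility(x=0.3, dx=0.2, c=0.5))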

But how much better can the decision maker do by recommending actions via a BIC signaling policy, compared to natural alternatives? We answer this question concretely in the following section.

3.1 Unbounded Utility Improvements Using Persuasion

As we will see in Section 4, the expected utility of the decision maker when recommending actions via the optimal (BIC) signaling policy is trivially no worse than their expected utility if they had revealed full information about the assessment rule to the decision subject, or if they had revealed no information and let the decision subject act according to the prior. In this section, we show that the decision maker’s expected utility when recommending actions according to the optimal signaling policy can be arbitrarily higher than their expected utility from revealing full information or no information. In particular, we prove the following theorem.

Theorem 3.2.

For any , there exists a problem instance such that the expected decision maker utility from recommending actions according to the optimal signaling policy is and the expected decision maker utility for revealing full information or revealing no information is at most .

Proof.

Consider the example in Section 3.

Expected utility from revealing no information. If the decision subject acts exclusively according to the prior, they will select action with probability if and with probability otherwise. Plugging in our expressions for and , we see that the decision subject will select action only if

Canceling terms and simplifying, we see that

must hold for the decision subject to select action . Finally, substituting gives us the condition . Alternatively, if , the decision subject will select action with probability . Intuitively, this means that a rational decision subject would take action if the ratio of (the probability according to the prior that taking action is in the decision subject’s best interest) to (the cost of taking action ) is high, and would take action otherwise.

Expected utility from revealing full information. If the decision maker reveals the assessment rule to the decision subject, they will select action when and action otherwise. Therefore since and , the decision maker’s expected utility if they reveal full information is .

Expected utility from . Recall that the decision maker’s signaling policy from Section 3 sets . Under this setting, the decision maker’s expected utility is . Substituting in our expression for and simplifying, we see that the decision maker’s expected utility for recommending actions via is .

Suppose that and , for some small . The decision maker’s expected utility will always be from revealing no information because . The decision maker’s expected utility from recommending actions via will be . Since , the decision maker’s expected utility from revealing full information will be less than . Therefore, as approaches , the decision maker’s expected utility from revealing full information approaches (the smallest value possible), and the decision maker’s expected utility from approaches (the highest value possible). This completes the proof. ∎

The decision maker's expected utility as a function of their possible strategies is summarized in Table 1. Note that when , . Therefore, the decision maker's expected utility is always at least as good as the two natural alternatives of revealing no information about the assessment rule or revealing full information about the rule.

No information Signaling with Full information
Decision maker utility
Table 1: Decision maker’s expected utility when (1) revealing no information about the model, (2) recommending actions according to , and (3) revealing full information about the model. See Section 3.1 for the full derivations.

4 Optimal Signaling Policy

In Section 3, we showed a one-dimensional setting in which a signaling policy can obtain unboundedly better utility than revealing full information or revealing no information. We now derive the decision maker's optimal signaling policy for the general setting with arbitrary numbers of observable features and actions described in Section 2. Under the general setting, the decision maker's optimal signaling policy can be described by the following optimization:

(2)
s.t.

where we omit the valid probability constraints over for brevity. In words, the decision maker wants to design a signaling policy in order to maximize their expected utility, subject to the constraint that the signaling policy is BIC. At first glance, the optimization may seem hopeless as there are infinitely many values of (one for every possible ) that the decision maker's optimal policy must optimize over. However, we will show that the decision maker's optimal policy can actually be recovered by optimizing over finitely many variables.

By rewriting the BIC constraints as integrals over and applying Bayes’ rule, our optimization over takes the following form

s.t.

Note that if is the same for some “equivalence region” (which we formally define below), we can pull out of the integral and instead sum over the different equivalence regions. Intuitively, an equivalence region can be thought of as the set of all pairs that are indistinguishable from a decision subject’s perspective because they lead to the exact same utility for any possible action the decision subject could take. Based on this idea, we formally define a region of as follows.

Definition 4.1 (Equivalence Region).

Two assignments are equivalent (w.r.t. ) if , . An equivalence region is a subset of such that for any , all equivalent to are also in . We denote the set of all equivalence regions by .

In Figure 2, we show an example of how different equivalence regions might partition the space of possible assessment rules . In this example, there are two actions and two observable features, and the space of is partitioned into three different equivalence regions. Note that as long as the set of actions is finite, . After pulling the decision subject utility function out of the integral, our optimization takes the following form:

Figure 2: An illustration of the equivalence regions for a two action (, ) and two observable feature () setting, where . Consider an individual with , , and . The equivalence regions of are quadrants described by the set of actions the decision subject could take in order to receive a positive classification. Region contains the bottom-left and top-right quadrants of , region contains the bottom-right quadrant of , and region contains the top-left quadrant of .
s.t.

Now that the decision subject's utility no longer depends on , we can integrate over each equivalence region . We denote as the probability that the true lies in according to the prior.

s.t.

Since it is possible to write the constraints in terms of , , it suffices to optimize directly over these quantities. The final step is to rewrite the objective. For completeness, we include the constraints which make each , a valid probability distribution.

Theorem 4.2 (Optimal signaling policy).

The decision maker’s optimal signaling policy can be characterized by the following linear program OPT-LP:

(OPT-LP)
s.t.

where denotes the probability of sending recommendation if . Note that the linear program OPT-LP is always feasible, as the decision maker can always recommend the action the decision subject would play according to the prior, which is BIC.
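For a finite set of equivalence regions, OPT-LP can be written down directly. The sketch below sets up the linear program with scipy.optimize.linprog for a small hypothetical instance: the region probabilities, decision maker utilities, and region-wise decision subject utilities are illustrative placeholders, and the variables are the conditional probabilities of each recommendation within each region.

import numpy as np
from scipy.optimize import linprog

# mu[e] = prior mass of region e; u_dm[a] = decision maker's utility of action a;
# u_ds[e, a] = decision subject's utility of action a when the rule lies in region e.
# All numbers below are illustrative assumptions.
mu   = np.array([0.5, 0.3, 0.2])
u_dm = np.array([0.0, 1.0])                 # decision maker prefers action 1
u_ds = np.array([[0.0, -0.2],               # region 0: action 1 is wasted effort
                 [1.0,  0.8],               # region 1: positive decision either way
                 [0.0,  0.8]])              # region 2: action 1 flips the decision
E, A = u_ds.shape

# Decision variables sigma[e, a] = P(recommend a | region e), flattened row-major.
idx = lambda e, a: e * A + a
c = np.zeros(E * A)
for e in range(E):
    for a in range(A):
        c[idx(e, a)] = -mu[e] * u_dm[a]     # negate: linprog minimizes

# BIC constraints: for every recommended a and deviation b,
#   sum_e mu[e] * sigma[e, a] * (u_ds[e, a] - u_ds[e, b]) >= 0.
A_ub, b_ub = [], []
for a in range(A):
    for b in range(A):
        if a == b:
            continue
        row = np.zeros(E * A)
        for e in range(E):
            row[idx(e, a)] = -mu[e] * (u_ds[e, a] - u_ds[e, b])
        A_ub.append(row)
        b_ub.append(0.0)

# Each region's recommendation probabilities sum to one.
A_eq = np.zeros((E, E * A))
b_eq = np.ones(E)
for e in range(E):
    A_eq[e, e * A:(e + 1) * A] = 1.0

res = linprog(c, A_ub=np.array(A_ub), b_ub=np.array(b_ub),
              A_eq=A_eq, b_eq=b_eq, bounds=(0, 1))
print(res.x.reshape(E, A), -res.fun)        # recommendation probabilities per region, and DM utility

Note that the BIC constraints couple variables across regions, while the valid-probability constraints are per-region, mirroring the structure of OPT-LP.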

5 Computing the Optimal Signaling Policy

In Section 4, we show that the problem of determining the decision maker’s optimal signaling policy can be transformed from an optimization over infinitely many variables into an optimization over the set of finitely many equivalence regions (Theorem 4.2). However, as we will show in Section 5.1, computing the decision maker’s optimal signaling policy by solving (OPT-LP) requires reasoning over exponentially-many variables, even in relatively simple settings. This motivates the need for a computationally efficient algorithm to approximate (OPT-LP), which we present in Section 5.2.

5.1 Computational Barriers

In this section, we show that even in the setting where each action only affects one observable feature (e.g., as shown in Figure 3), the number of equivalence regions in (OPT-LP) is still exponential in the size of the input. While somewhat simplistic, we believe this action scheme reasonably reflects real-world settings in which the decision subjects are under time or resource constraints when deciding which action to take. For example, the decision subject may need to choose between paying off some amount of debt and opening a new credit card when strategically modifying their observable features before applying for a loan.

Figure 3: Graphical representation of special ordering over the actions available to each decision subject. Each branch corresponds to an observable feature and each node corresponds to a possible action the decision subject may take. The root corresponds to taking no action (denoted by ). Nodes further away from the root on branch correspond to higher , i.e., .

Under this setting, (OPT-LP) optimizes over variables, where is the number of actions available to each agent and is the number of equivalence regions. In order to determine the size of , we note that an equivalence region can be alternatively characterized by observing that assessment rules and belong to the same equivalence region if the difference in their predictions for any two actions and is the same. (This follows from straightforward algebraic manipulation of Definition 4.1.) As such, an equivalence region can essentially be characterized by the set of actions which receive a positive classification when . (Specifically, if taking action results in a positive classification for some and a negative classification for , the only way for and to be in the same equivalence region is if taking any action in results in a positive classification for and a negative classification for . Besides this special case, if and result in different classifications for the same action, they are in different equivalence regions.)

Armed with this new characterization of an equivalence region, we are now ready to show the scale of for the setting described in Figure 3.

Proposition 5.1.

For the setting described in Figure 3, there are equivalence regions, where is the number of observable features of the decision subject and () is the number of actions the decision subject has at their disposal to improve observable feature .

Proof.

In order to characterize the number of equivalence regions , we define the notion of a dominated action , where an action is dominated by some other action if , with strict inequality holding for at least one index. Using this notion of dominated actions and our refined characterization of an equivalence region, it is straightforward to see that if action is dominated by action , then for any equivalence region where . Proposition 5.1 then follows directly from the fact that each action only affects one observable feature. ∎

Proposition 5.1 shows that the computation of (OPT-LP) quickly becomes intractable as the number of observable features grows large, even in this relatively simple setting. This motivates the need for an approximation algorithm for (OPT-LP), which we present in Section 5.2.
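The growth in the number of equivalence regions can also be observed empirically. The following sketch samples assessment rules at random, indexes each sample by the set of actions that would receive a positive classification (the characterization used above, ignoring the boundary special case), and counts the distinct regions encountered. The sampling distribution, feature values, and action effects are illustrative assumptions, and the counts are only a lower bound on the true number of regions.

import numpy as np

rng = np.random.default_rng(1)

def count_regions(n_features, actions_per_feature, n_samples=20_000):
    """Count distinct equivalence regions observed among sampled assessment rules.

    Each region is indexed by the set of actions that would yield a positive
    classification; each action changes a single feature (illustrative assumption).
    """
    x0 = rng.normal(size=n_features)
    # One "do nothing" action plus actions_per_feature actions per feature.
    deltas = [np.zeros(n_features)]
    for j in range(n_features):
        for k in range(1, actions_per_feature + 1):
            d = np.zeros(n_features)
            d[j] = 0.5 * k
            deltas.append(d)
    thetas = rng.normal(size=(n_samples, n_features))
    regions = set()
    for theta in thetas:
        signature = tuple(int(theta @ (x0 + d) >= 0) for d in deltas)
        regions.add(signature)
    return len(regions)

for m in (2, 3, 4, 5):
    print(m, "features ->", count_regions(m, actions_per_feature=2), "regions observed")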

5.2 An Efficient Approximation Algorithm

Motivated by the results in Section 5.1, we aim to design a computationally efficient approximation scheme to compute an approximately optimal signaling policy for the decision maker. In particular, we adapt the sampling-based approximation algorithm of dughmi2019algorithmic to our setting in order to compute an -optimal and -approximate signaling policy in polynomial time, as shown in Algorithm 1. At a high level, Algorithm 1 samples polynomially-many times from the prior distribution over the space of assessment rules, and solves an empirical analogue of (OPT-LP). We show that the resulting signaling policy is -BIC, and is -optimal with high probability, for any .

Input: ,
Output: Signaling policy (where region contains )
Set
Pick uniformly at random. Set .
Sample .
Let denote the set of observed regions. Compute , , where is the empirical probability of .
Solve
(APPROX-LP)
s.t.
Return signaling policy .
ALGORITHM 1 Approximation Algorithm for (OPT-LP)
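A simplified version of the sampling step in Algorithm 1 is sketched below: it draws assessment rules from the prior, buckets them into empirical equivalence regions, and produces the empirical region probabilities and region-wise decision subject utilities needed to instantiate the LP from Section 4. It omits details of Algorithm 1 (such as the handling of rarely sampled regions), and all concrete inputs are placeholders.

import numpy as np
from collections import Counter

rng = np.random.default_rng(2)

def approx_signaling_lp_inputs(sample_theta, x0, deltas, costs, n_samples=5_000):
    """Build empirical (mu, u_ds) inputs for the LP of Section 4 from prior samples.

    sample_theta(n) should draw n assessment rules from the prior; the other
    arguments follow the illustrative conventions used in earlier sketches.
    """
    thetas = sample_theta(n_samples)
    # Bucket samples into empirical equivalence regions, indexed by which
    # actions yield a positive classification.
    signatures = [tuple(int(t @ (x0 + d) >= 0) for d in deltas) for t in thetas]
    counts = Counter(signatures)
    regions = sorted(counts)
    mu_hat = np.array([counts[r] / n_samples for r in regions])
    # Subject utility is constant within a region: decision outcome minus cost.
    u_ds_hat = np.array([[sig[a] - costs[a] for a in range(len(deltas))]
                         for sig in regions])
    return regions, mu_hat, u_ds_hat

# Example usage with an isotropic Gaussian prior (illustrative assumption).
x0 = np.array([0.4, -0.1, 1.0])
deltas = [np.zeros(3), np.array([0.5, 0, 0]), np.array([0, 0.5, 0])]
costs = [0.0, 0.2, 0.1]
sample_theta = lambda n: rng.normal(loc=[1.0, 0.5, -0.2], scale=0.4, size=(n, 3))
regions, mu_hat, u_ds_hat = approx_signaling_lp_inputs(sample_theta, x0, deltas, costs)
print(len(regions), "observed regions;", mu_hat.round(3))
# mu_hat and u_ds_hat can be passed to the OPT-LP construction sketched in Section 4.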
Theorem 5.2.

Algorithm 1 runs in poly() time (where ), and implements an -BIC signaling policy that is -optimal with probability at least .

Proof.

Our proof is similar to the approximation algorithm proof in dughmi2019algorithmic, and follows directly from the following lemmas, whose proofs are in Appendix A. First, since the approximation algorithm solves an approximation LP (APPROX-LP) of polynomial size, it runs in polynomial time.

Lemma 5.3.

Algorithm 1 runs in poly() time.

By bounding the approximation error in the BIC constraints of (APPROX-LP), we show that the resulting policy satisfies approximate BIC.

Lemma 5.4.

Algorithm 1 implements an -BIC signaling policy.

Next, we show that a feasible solution to (APPROX-LP) exists which achieves expected decision maker utility at least OPT - with probability at least . In order to do so, we first show that there exists an approximately optimal solution to (OPT-LP) such that each signal is either (i) large (i.e., output with probability above a certain threshold), or (ii) honest (i.e., the signal recommends the action the decision subject would take, had they known the true assessment rule ). Next, we show that is a feasible solution to (APPROX-LP) with high probability by applying McDiarmid’s inequality [mcdiarmid1989method] and a union bound.

Lemma 5.5.

There exists an -optimal signaling policy that is large or honest.

Lemma 5.6.

With probability at least , is a feasible solution to (APPROX-LP) and the expected decision maker utility from playing is at least OPT - .

By Lemmas 5.5 and 5.6, the decision maker’s expected utility will be at least OPT - with probability at least . ∎

Bi-criteria approximation. It is important to note that the signaling policy from Algorithm 1 is both -optimal and -incentive compatible. While one may wonder whether (i) an -optimal and exactly incentive compatible signaling policy exists, or (ii) an exactly optimal and -incentive compatible signaling policy exists, dughmi2019algorithmic show that this is generally not possible for sampling-based approximation algorithms for Bayesian persuasion (see Theorem 27 in dughmi2019algorithmic). Note that unlike the other results in dughmi2019algorithmic, these results directly apply to the setting we consider.

Computational complexity. Recall that the algorithm for computing the optimal policy runs in time polynomial in the number of equivalence regions , which can scale exponentially in the number of actions . However, without any structural assumptions, the input prior over the space of assessment rules can scale exponentially in the number of features . When and are comparable, our algorithm runs in time polynomial in the input size. We leave open the question of whether there are classes of succinctly represented prior distributions that permit efficient algorithms for computing the optimal policy in time polynomial in and . It is also plausible to design efficient algorithms that only require some form of query access to the prior distribution. However, information-theoretic lower bounds of [dughmi2019algorithmic] rule out the query access through sampling, as they show that no sampling-based algorithm can compute the optimal signaling policy with finite samples across all problem instances.

6 Experiments

In this section, we provide experimental results that validate our findings using a semi-synthetic setting where decision subjects are based on individuals in the Home Equity Line of Credit (HELOC) dataset [FICO]. We compare the decision maker's utility under different models of information revelation: our optimal signaling policy, revealing full information, and revealing no information. To do so, we first estimate agent costs using the Bradley-Terry model [10.2307/2334029] and compute the decision maker's expected utility for each information revelation scheme we consider. We find that the expected decision maker utility when recommending actions according to the optimal signaling policy either matches or exceeds the expected utility from revealing full information or no information about the assessment rule across all problem instances. Moreover, the expected decision maker utility from signaling is significantly higher on average. Next, we explore how the decision maker's expected utility changes when action costs and changes in observable features are varied jointly. Our results are summarized in Figures 4, 5, and 6.

The HELOC dataset contains information about 9,282 customers who received a Home Equity Line of Credit. Each individual in the dataset has 23 observable features related to an applicant's financial history (e.g., percentage of previous payments that were delinquent) and a label which characterizes their loan repayment status (repaid/defaulted). In order to adapt the HELOC dataset to our strategic setting, we select four features from the original 23 and define five hypothetical actions that decision subjects may take in order to improve their observable features. Actions result in changes to each of the decision subject's four observable features, whereas action does not. For simplicity, we view actions as equally desirable to the decision maker, and assume they are all more desirable than . See Table 2 for details about the observable features and actions we consider. Using these four features, we train a logistic regression model that predicts whether an individual is likely to pay back a loan if given one, which will serve as the decision maker's realized assessment rule.

Feature () | Action ()
# payments with high-utilization ratio | decrease this value
# satisfactory payments | increase this value
% payments that were not delinquent | increase this value
revolving balance to credit limit ratio | decrease this value
Table 2: Decision subject's observable features from the HELOC dataset and corresponding actions to improve each feature. For simplicity, we assume that each action only affects one observable feature, although our model generally allows for more intricate relationships between actions and changes in observable features.

Common prior. We assume the common prior over the realized assessment rule takes the form of a multivariate Gaussian before training. This captures the setting in which both the decision maker and decision subjects have a good estimate of what the true model will be, but are somewhat uncertain about their estimate. We note that our methods extend to more complicated priors beyond the isotropic Gaussian prior we consider in this setting.
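For concreteness, the sketch below shows one way to instantiate such a prior: an isotropic Gaussian centered at a point estimate of the assessment rule's coefficients, with the variance controlling how uncertain the decision subjects are about the deployed rule. The mean vector and variance are placeholders rather than values used in our experiments.

import numpy as np

rng = np.random.default_rng(3)

# Illustrative common prior over the assessment rule: an isotropic Gaussian
# centered at a guess of the trained logistic regression weights.
theta_hat = np.array([-0.8, 1.2, 0.9, -1.1, 0.3])   # 4 features plus an intercept (placeholder)
sigma2 = 0.5                                          # prior variance (uncertainty about the rule)

def sample_prior(n):
    """Draw n assessment rules from the common prior."""
    cov = sigma2 * np.eye(len(theta_hat))
    return rng.multivariate_normal(mean=theta_hat, cov=cov, size=n)

thetas = sample_prior(1_000)
print(thetas.mean(axis=0).round(2))   # should be close to theta_hat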

Changes in observable features. In order to examine the effects that different have on the decision maker's expected utility, we consider settings in which each takes a value in .

Utilities and costs of actions. As the decision maker views actions as equally desirable, we define , and . (We set for ease of exposition; in general, actions can have different utility values based on their relative importance.) Since there are 1,320 individuals in our test dataset, the maximum utility the decision maker can obtain is 1,320. As proposed in [rawal2020beyond], we use the Bradley-Terry model [10.2307/2334029] to generate the decision subject's cost of taking action , for . See Appendix C.2 for details on our exact generation methods.

Figure 4: Total decision maker utility averaged across all cost and configurations for three different prior variances (). See Figure 8 to view individual plots of the settings which were averaged in order to generate this plot. The optimal signaling policy (red) consistently yields higher utility compared to the two baselines: revealing full information (blue) and no information (green). This gap increases when the decision subjects are less certain about the model parameters being used (higher ).

Results. Given a instance and information revelation scheme, we calculate the decision maker's total expected utility by summing their expected utility for each applicant. Figure 4 shows the average total expected decision maker utility across different and cost configurations for priors with varying amounts of uncertainty. See Figure 8 in Appendix C.3 for plots of all instances which were used to generate Figure 4. Across all instances, the optimal signaling policy (red) achieves higher average total utility compared to the other information revelation schemes (blue and green). The difference is further amplified whenever the decision subjects are less certain about the true assessment rule (i.e., when is large). Intuitively, this is because the decision maker leverages the decision subjects' uncertainty about the true assessment rule in order to incentivize them to take desirable actions, and as the uncertainty increases, so does their ability to persuade.

6.1 Patterns under different and

Figure 5: Utility surface across different and pairs for . Optimal signaling policy (red) effectively upper-bounds the two baselines, revealing everything (blue) and revealing nothing (green) in all settings.
Figure 6: 2-D slices of Figure 5 across (left) and (right). Across these two axes, the optimal signaling policy (red) dominates the revealing full information (blue) and revealing no information (green), though it may be possible for (blue) and (green) to vary in terms of which provides higher decision maker utility.

To better understand how the decision maker’s expected utility changes as a function of and , we sweep through multiple tuples on a grid of for and measure the effectiveness of the three information revelation schemes. Figure 5 shows the surface of the decision maker utility as a function of for the optimal signaling policy (red), revealing full information (blue), and revealing no information (green). When is high and is low, the total expected decision maker utility is low as there is less incentive for the decision subject to take actions (although even under this setting, the optimal signaling policy outperforms the other two baselines). As decreases and increases, the total expected decision maker utility increases.

In Figure 6, we show 2-D slices of Figure 5 along the axis (left) and axis (right). As is expected, with small cost and sufficiently large (top row, right), the two baselines become as effective as the optimal signaling policy. Interestingly, we note that changes in different