1 Introduction
High-stakes decision-making systems increasingly rely on data-driven algorithms to assess individuals in domains such as education [kuvcak2018machine], employment [chalfin2016productivity, raghavan2020mitigating], and lending [jagtiani2019roles]. Individuals subjected to these assessments (henceforth, decision subjects) may strategically modify their observable features in ways they believe maximize their chances of receiving favorable decisions [homonoff2021does, citron2014scored]. The decision subject often has a set of actions/interventions available to them. Each of these actions has some measurable effect on their observable features and, subsequently, on their decision. From the decision maker's perspective, some of these actions may be more desirable than others. Consider credit scoring as an example. (Other examples of strategic settings that arise as a result of decision-making include college admissions, in which a college or university (decision maker) decides whether or not to admit a prospective student (decision subject); hiring, in which a company decides whether or not to hire a job applicant; and lending, in which a banking institution decides to accept or reject someone applying for a loan. Oftentimes, the decision maker is aided by automated decision-making tools in these situations, e.g., [kuvcak2018machine, sanchez2020does, jagtiani2019roles].) Credit scores predict how likely an individual applicant is to pay back a loan on time. Financial institutions regularly use credit scores to decide whether to offer applicants their financial products and to determine the terms and conditions of their offers (e.g., by setting the interest rate or credit limit). Given their (partial) knowledge of credit scoring instruments, applicants regularly attempt to improve their scores. For instance, a business applying for a loan may improve its score by paying off existing debt or by cleverly manipulating its financial records to appear more profitable.
While both of these interventions may improve credit score, the former is more desirable than the latter from the perspective of the financial institution offering the loan. The question we are interested in answering in this work is: how can the decision maker incentivize decision subjects to take such beneficial actions while discouraging manipulations?
The strategic interactions between decision-making algorithms and decision subjects have motivated a growing literature known as strategic learning (see, e.g., [hardt2016strategic, dongetal, shavit2020causal, kleinberg2020classifiers, harris2021stateful]). While much of the prior work in strategic learning operates under the assumption of full transparency (i.e., the assessment rule is public knowledge), we consider settings where full disclosure of the assessment rule is not a viable option. In many real-world situations, revealing the exact logic of the decision rule is either infeasible or irresponsible. For instance, credit scoring formulas are closely guarded trade secrets, in part to prevent default rates from surging if applicants learn how to manipulate them. In such settings, the decision maker may still have a vested interest in providing some information about the decision rule to decision subjects, in order to offer a certain level of transparency and recourse. In particular, the decision maker may be legally obliged, or economically motivated, to guide decision subjects toward actions that improve their underlying qualifications. To do so, the decision maker can recommend actions for decision subjects to take. Of course, such recommendations need to be chosen carefully and credibly; otherwise, self-interested decision subjects may not follow them or, even worse, may use the recommendations to find pathways for manipulation.
We study a model of strategic learning in which the underlying assessment rule is not revealed to decision subjects. Our model captures several key aspects of the setting described above. First, even though the assessment rule is not revealed to the decision subjects, they often have prior knowledge about what the rule may be. Second, when the decision maker provides recommendations to decision subjects on which action to take, the recommendations should be compatible with the subjects' incentives to ensure they will follow them. Finally, our model assumes the decision maker discloses how they generate recommendations for recourse, an increasingly relevant requirement under recent regulations (e.g., [gdpr]).
Using our model, we aim to design a mechanism for a decision maker to provide recourse to a decision subject who has incomplete information about the underlying assessment rule. We assume the assessment rule makes predictions about some future outcome of the decision subject (e.g., whether they will pay back a loan on time if granted one). Before the assessment rule is trained (i.e., before the model parameters are fit), the decision maker and decision subject share a prior belief about the realization of the assessment rule. This prior represents the “common knowledge” about the importance of various observable features for making accurate predictions. After training, the assessment rule is revealed to the decision maker, who then recommends an action for the decision subject to take, based on their predetermined signaling policy. Upon receiving this action recommendation, the decision subject updates their belief about the underlying assessment rule. They then take the action which they believe (according to the updated belief) will maximize their expected utility (i.e., the benefit from the decision they receive, minus the cost of taking their selected action). Finally, the decision maker uses the assessment rule to make a prediction about the decision subject.
The interaction described above is an instance of Bayesian persuasion, a game-theoretic model of information revelation originally due to [kamenica2011bayesian]. For background on the general Bayesian persuasion model, see Section 1.1. The specific instance of Bayesian persuasion we consider in this work is summarized below.
Interaction protocol for our setting. Before training, the decision maker and decision subject share a prior belief about the true assessment rule. After training, the assessment rule is revealed to the decision maker. The decision maker then uses their signaling policy and knowledge of the assessment rule to recommend an action for the decision subject to take. The decision subject updates their belief given the recommendation. They then take a (possibly different) action, and receive a prediction through the assessment rule.
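The protocol above can be sketched as a small simulation over a discretized rule space. This is a minimal illustration, not the paper's construction: the three candidate rules, the signaling matrix, and the subject's utility table are all hypothetical numbers chosen for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative setup (all numbers assumed): 3 candidate assessment rules, 2 actions.
prior = np.array([0.3, 0.5, 0.2])          # shared prior P(theta = theta_k)
# signal[k, a] = P(recommend action a | theta = theta_k): the signaling policy,
# committed to (and made public) before training.
signal = np.array([[0.9, 0.1],
                   [0.2, 0.8],
                   [0.5, 0.5]])
# subject_utility[k, a]: decision subject's utility for action a under rule theta_k
subject_utility = np.array([[0.0, -0.2],
                            [0.0,  0.7],
                            [0.0,  0.3]])

theta = rng.choice(3, p=prior)             # "training" reveals the true rule
rec = rng.choice(2, p=signal[theta])       # decision maker recommends an action

# Bayesian update: posterior over rules given the observed recommendation
posterior = prior * signal[:, rec]
posterior /= posterior.sum()

# The subject best-responds to the posterior (possibly deviating from rec)
chosen = int(np.argmax(posterior @ subject_utility))
```

A BIC policy (Definition 2.2 below) is one for which `chosen == rec` always holds for a rational subject.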
Our contributions
Our central conceptual contribution is to cast the problem of offering recourse under partial transparency as a game of Bayesian persuasion. Our key technical contributions consist of comparing optimal actionrecommendation policies in this new setup with two natural alternatives: (1) fully revealing the assessment rule to the decision subjects, or (2) revealing no information at all about the assessment rule. We provide new insights about the potentially significant advantages of action recommendation over these baselines, and offer efficient formulations to derive the optimal recommendations. More specifically, our analysis offers the following takeaways:

Using tools from Bayesian persuasion, we show that it is possible for the decision maker to provide incentivecompatible action recommendations that encourage rational decision subjects to modify their features through beneficial interventions (Section 2.1).

Perhaps most importantly, we show that the optimal signaling policy is more effective than the above two baselines in encouraging positive interventions on the part of the decision subjects (Section 3).

While the decision maker and decision subjects are never worse off in expectation from using optimal incentivecompatible recommendations, we show that situations exist in which the decision maker is significantly better off in expectation utilizing the optimal signaling policy (as opposed to the two baselines) (Section 3.1).

We derive the optimal signaling policy for the decision maker. While computing the decision maker's optimal signaling policy initially appears challenging (as it involves optimizing over continuously many variables), we show that the problem can naturally be cast as a linear program (Section 4).

We show that even for relatively simple examples, solving this linear program requires reasoning about exponentially many variables. Motivated by this observation, we provide a polynomial-time algorithm to approximate the optimal signaling policy up to additive error terms (Section 5).

Finally, we empirically evaluate our persuasion mechanism on semi-synthetic data based on the Home Equity Line of Credit (HELOC) dataset, and find that the optimal signaling policy performs significantly better than the two natural alternatives in practice (Section 6).
1.1 Related Work
Bayesian Persuasion. In its most basic form, Bayesian persuasion [kamenica2011bayesian] is modeled as a game between a sender (with private information) and a receiver. At the beginning of the game, the sender and receiver share a prior over some unknown state of nature, which will eventually be revealed to the sender. Before the state of nature is revealed, the sender commits to a signaling policy, a (probabilistic) mapping from states of nature to action recommendations. (Such commitment is especially plausible when the sender is a software agent, as is the case in our setting, since the agent is committed to playing the policy prescribed by its code once it is deployed.) After the sender commits to a signaling policy, the state of nature is revealed to the sender, who then sends a signal (according to their policy) to the receiver. The receiver uses this signal to form a posterior over the possible states of nature, and then takes an action which affects the payoffs of both players. Several extensions to the original Bayesian persuasion model have been proposed, including persuasion with multiple receivers [arieli2019private], persuasion with multiple senders [li2018bayesian], and persuasion with heterogeneous priors [alonso2016bayesian]. There has been growing interest in persuasion in the computer science and machine learning communities in recent years. dughmi2017algorithmic2 and dughmi2019algorithmic characterize the computational complexity of computing the optimal signaling policy for several popular models of persuasion. castiglioni2020online study the problem of learning the receiver's utilities through repeated interactions. Work in the multi-armed bandit literature [mansour2015bayesian, MansourSSW16, immorlica2019bayesian, chen2018incentivizing, sellke2021price] leverages techniques from Bayesian persuasion to incentivize agents to perform bandit exploration.

Strategic responses to unknown predictive models. To the best of our knowledge, our work is the first to use tools from persuasion to model the strategic interaction between a decision maker and strategic decision subjects when the underlying predictive model is not public knowledge. Several prior articles have addressed similar problems through different models and techniques. For example, akyol2016price quantify the “price of transparency”, which compares the decision maker's utility when the predictive model is fully known with their utility when the model is not revealed to the decision subjects. ghalme2021strategic
compare the prediction error of a classifier when it is public knowledge with the error when decision subjects must learn a version of it, and label this difference the “price of opacity”. They show that small errors in decision subjects' estimates of the true underlying model may lead to large errors in the performance of the model. The authors argue that their work provides formal incentives for decision makers to adopt full transparency as a policy. Our work, in contrast, is based on the observation that even if decision makers are willing to reveal their models, legal requirements, privacy concerns, and intellectual property restrictions may prohibit full transparency. We instead study the consequences of partial transparency, a commonplace condition in real-world domains.
bechavod2021information study the effects of information discrepancy across different subpopulations of decision subjects on their ability to improve their observable features in strategic learning settings. Like us, they do not assume the predictive model is fully known to the decision subjects. Instead, the authors model decision subjects as trying to infer the underlying predictive model by learning from their social circle of family and friends, which naturally causes different groups to form within the population. In contrast to this line of work, we study a setting in which the decision maker provides customized feedback to each decision subject individually. Additionally, while the models proposed by [ghalme2021strategic, bechavod2021information] circumvent the assumption of full information about the deployed model, they restrict the decision subjects’ knowledge to be obtained only through past data.
Algorithmic recourse. Our work is closely related to recent work on algorithmic recourse [karimi2021survey]. Algorithmic recourse is concerned with providing explanations and recommendations to individuals who are treated unfavorably by automated decision-making systems. A line of algorithmic recourse methods, including [wachter2017counterfactual, ustun2019actionable, joshi2019towards], focuses on finding recourses that are actionable, or realistic, for decision subjects to take to improve their decision. In contrast, our action recommendations are “actionable” in the sense that they are interventions which promote long-term desirable behaviors while ensuring that the decision subject is not worse off in expectation. Finally, more recent work [slack2021counterfactual] shows that existing recourse methods based on counterfactual approaches are not robust to manipulation. Our approach to recourse is not counterfactual-based; instead, it uses a Bayesian persuasion mechanism to ensure decision subject compliance.
Transparency. Recent legal and regulatory frameworks, such as the General Data Protection Regulation (GDPR) [gdpr], motivate the development of forms of algorithmic transparency suitable for real-world deployment. While this work can be thought of as providing additional transparency into the decision-making process, it does not fall naturally into existing taxonomies of explanation methods (e.g., as outlined in [chen2021towards]), as our policy does not simply recommend actions based on the decision rule. Rather, our goal is to incentivize actionable interventions on the decision subjects' observable features which are desirable to the decision maker, and we leverage persuasion techniques to ensure compliance. One of the most prevalent use cases of automated decision-making is credit scoring (the FICO scoring model being a widely used example). These models evaluate an individual's creditworthiness based on that individual's payment history, credit utilization, credit history, and other factors, all of which are weighted according to proprietary formulas. Two statutes, the Fair Credit Reporting Act (FCRA) and the Equal Credit Opportunity Act (ECOA), govern these models and require that individuals adversely impacted by such automated decision-making be provided with a statement of reasons, or an outcome-based explanation [selbst2018intuitive].
Other strategic learning settings. The strategic learning literature [hardt2016strategic, ghalme2021strategic, chen2021strategic, levanon2021strategic, jagadeesan2021alternative, bechavod2021information, harris2021stateful, harris2021strategic, kleinberg2020classifiers, frankel2019improving] broadly studies machine learning questions in the presence of strategic decision subjects. A long line of work in strategic learning focuses on how strategic decision subjects adapt their input to a machine learning algorithm in order to receive a more desirable prediction. However, most prior work in this literature assumes that the underlying assessment rule is fully revealed to the decision subjects, which is rarely true in reality.
2 Setting and Background
Consider a setting in which a decision maker assigns a predicted label (e.g., whether or not someone will repay a loan if granted one) to a decision subject with observable features (e.g., amount of current debt, bank account balance, etc.). (We append a constant 1 to the decision subject's feature vector for notational convenience.)
We assume the decision maker uses a linear decision rule to make predictions, where the assessment rule is chosen by the decision maker. The goal of the decision subject is to receive a positive classification (e.g., get approved for a loan). Given this goal, the decision subject may choose to take some action from a set of possible actions in order to modify their observable features (for example, they may decide to pay off a certain amount of existing debt, or redistribute their debt to game the credit score). We assume that the decision subject has a finite set of actions at their disposal to improve their outcomes. For convenience, we include a special action denoting “no action”. By taking an action, the decision subject incurs some cost. This could be an actual monetary cost, but it can also represent non-monetary notions of cost, such as opportunity cost or the time/effort the decision subject must exert to take the action. We assume taking an action changes the decision subject's observable feature values, with each action specifying the change in each observable feature; taking no action leaves the features unchanged and incurs no cost. As a result of taking an action, the decision subject receives some positive (negative) utility for a positive (negative) classification, minus the cost of taking said action.

If the decision subject had exact knowledge of the assessment rule used by the decision maker, they could solve an optimization problem to determine the best action to take in order to maximize their utility. However, in many settings it is not realistic for a decision subject to have perfect knowledge of the rule. Instead, we model the decision subject's information through a prior over assessment rules, which can be thought of as “common knowledge” about the relative importance of each observable feature to the classifier. We use a probability density function to describe this prior, so that it gives the probability of the deployed assessment rule being any particular rule. We assume the decision subject is rational and risk-neutral, so at any point during the interaction, if they hold a belief about the underlying assessment rule, they pick the action that maximizes their expected utility with respect to that belief.

From the decision maker's perspective, some actions may be more desirable than others. For example, a bank may prefer that an applicant pay off more existing debt rather than less when applying for a loan. To formalize this notion of action preference, we say that the decision maker receives some utility when the decision subject takes a given action; in the loan example, the decision maker's utility is higher when the applicant pays off debt.
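As a concrete sketch of this risk-neutral best response, the snippet below approximates the subject's expected utility for each action by sampling rules from their belief. The linear-threshold form of the rule, the Gaussian belief, and all numbers are illustrative assumptions, not the paper's parametrization.

```python
import numpy as np

rng = np.random.default_rng(1)

def best_response(x, deltas, costs, theta_samples):
    """Pick the action maximizing expected utility under a sampled belief.

    x: current features with a constant 1 appended, shape (d,)
    deltas: effect of each action on the features, shape (n_actions, d)
    costs: cost of each action, shape (n_actions,)
    theta_samples: draws from the subject's belief over the rule, shape (m, d)
    """
    utils = []
    for delta, cost in zip(deltas, costs):
        x_new = x + delta
        # estimated P(positive classification): fraction of sampled rules
        # with theta . x_new >= 0
        p_pos = np.mean(theta_samples @ x_new >= 0)
        utils.append(p_pos - cost)          # utility = benefit minus action cost
    return int(np.argmax(utils)), utils

# Illustrative numbers: one real feature plus the bias term.
x = np.array([0.4, 1.0])
deltas = np.array([[0.0, 0.0],             # a0: do nothing
                   [0.5, 0.0]])            # a1: improve the feature
costs = np.array([0.0, 0.3])
theta_samples = rng.normal([1.0, -0.6], 0.3, size=(10_000, 2))

action, utils = best_response(x, deltas, costs, theta_samples)
```

Here the costly action raises the acceptance probability enough to be worth its cost under this belief, so the best response is to take it.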
2.1 Bayesian Persuasion in the Algorithmic Recourse Setting
The decision maker has an information advantage over the decision subject, due to the fact that they know the true assessment rule , whereas the decision subject does not. The decision maker may be able to leverage this information advantage to incentivize the decision subject to take a more favorable action (compared to the one they would have taken according to their prior) by recommending an action to the decision subject according to a commonly known signaling policy.
Definition 2.1 (Signaling Policy).
A signaling policy is a (possibly stochastic) mapping from assessment rules to actions. (Note that since our model is focused on the decision maker's interactions with a single decision subject, we drop the dependence of the signaling policy on the decision subject's characteristics.)
We use to denote the action recommendation sampled from signaling policy , where is a realization from .
The decision maker's signaling policy is assumed to be fixed and common knowledge, because in order for the decision subject to perform a Bayesian update based on the observed recommendation, they must know the signaling policy. Additionally, the decision maker must have the power of commitment, i.e., the decision subject must believe that the decision maker will select actions according to their signaling policy. In our setting, this means that the decision maker must commit to their signaling policy before training their assessment rule. This can be seen as a form of transparency, as the decision maker publicly commits to how they will use their assessment rule to provide action recommendations/recourse before they even train it. For simplicity, we assume that, before the model is trained, the decision maker shares the same prior beliefs as the decision subject over the assessment rule. These assumptions are standard in the Bayesian persuasion literature (see, e.g., [kamenica2011bayesian, mansour2015bayesian, MansourSSW16]).
In order for the decision subject to be incentivized to follow the actions recommended by the decision maker, the signaling policy needs to be Bayesian incentivecompatible.
Definition 2.2 (Bayesian incentivecompatibility).
Consider a decision subject ds with initial observable features and prior . A signaling policy is Bayesian incentivecompatible (BIC) for ds if
(1) 
for all actions such that had positive support on .
In other words, a signaling policy is BIC if, given that the decision maker recommends action , the decision subject’s expected utility is at least as high as the expected utility of taking any other action under the posterior.
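For a finite set of candidate rules and actions, the BIC condition can be checked directly from this definition. The helper below is a sketch under that discretization; the function name and the matrix encoding are ours.

```python
import numpy as np

def is_bic(prior, signal, subject_utility, tol=1e-9):
    """Check Bayesian incentive-compatibility on a finite discretization.

    prior: shape (K,), prior over K candidate rules
    signal: shape (K, A), signal[k, a] = P(recommend a | rule k), rows sum to 1
    subject_utility: shape (K, A), subject's utility for action a under rule k
    """
    K, A = signal.shape
    for a in range(A):
        p_a = prior @ signal[:, a]          # total probability of recommending a
        if p_a <= tol:
            continue                        # recommendation never sent: vacuous
        posterior = prior * signal[:, a] / p_a
        u = posterior @ subject_utility     # posterior-expected utility per action
        if u[a] < u.max() - tol:            # recommended action must be a best response
            return False
    return True
```

For example, with two rules and two actions where each rule favors a different action, the fully revealing policy is BIC while the policy that always recommends the wrong action is not:

```python
prior = np.array([0.5, 0.5])
su = np.array([[1.0, 0.0], [0.0, 1.0]])
is_bic(prior, np.eye(2), su)                      # full revelation: BIC
is_bic(prior, np.array([[0, 1], [1, 0]]), su)     # anti-revelation: not BIC
```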
We remark that while, for ease of exposition, our model focuses on the interactions between the decision maker and a single decision subject, our results can be extended to a heterogeneous population of decision subjects. Under such a heterogeneous setting, the decision maker would publicly commit to a method of computing the signaling policy, given a decision subject's initial observable features as input. Once a decision subject arrives, their feature values are observed and the signaling policy is computed.
3 The Motivation Behind Persuasion
As is the case in the Bayesian persuasion literature [kamenica2011bayesian, kamenica2019bayesian, dughmi2019algorithmic], the decision maker can in general achieve a higher expected utility with an optimized signaling policy than they would by providing no recommendation or by fully disclosing the model. To characterize how much leveraging the decision maker's information advantage (by recommending actions according to a BIC signaling policy) may improve their expected utility, we study the following example.
Consider a simple setting under which a single decision subject has one observable feature (e.g., credit score) and two possible actions: “do nothing” (i.e., , , ) and “pay off existing debt” (i.e., , , ), which in turn raises their credit score. For the sake of our illustration, we assume creditworthiness to be a mutually desirable trait, and credit scores to be a good measure of creditworthiness. We assume the decision maker would like to design a signaling policy to maximize the chance of the decision subject taking action , regardless of whether or not the applicant will receive the loan. In this simple setting, the decision maker’s decision rule can be characterized by a single threshold parameter , i.e., the decision subject receives a positive classification if and a negative classification otherwise. Note that while the decision subject does not know the exact value of , they instead have some prior over it, denoted by .
Given the true value of , the decision maker recommends an action for the decision subject to take. The decision subject then takes a possibly different action , which changes their observable feature from to . Recall that the decision subject’s utility takes the form . Note that if , then holds for any value of , meaning that it is impossible to incentivize any rational decision subject to play action . Therefore, in order to give the decision maker a “fighting chance” at incentivizing action , we assume the cost of action is such that .
We observe that in this simple setting, we can bin values of into three different “regions”, based on the outcome the decision subject would receive if were actually in that region. First, if , the decision subject will not receive a positive classification, even if they take action . In this region, the decision subject’s initial feature value is “too low” for taking the desired action to make a difference in their classification. We refer to this region as region . Second, if , the decision subject will receive a positive classification no matter what action they take. In this region, is “too high” for the action they take to make any difference on their classification. We refer to this region as region . Third, if and , the decision subject will receive a positive classification if they take action and a negative classification if they take action . We refer to this region as region . Consider the following signaling policy.
Signaling policy:
Case 1: Recommend action with probability and action with probability .
Case 2: Recommend action with probability .
Case 3: Recommend action with probability and action with probability .
In Case 2, the policy recommends, with probability 1, the action that the decision subject would have taken had they known the true threshold. However, in Cases 1 and 3, the decision maker recommends, with some small probability, an action that the decision subject would not have taken had they known the threshold, leveraging the fact that the decision subject does not know exactly which case they are currently in. If the decision subject follows the decision maker's recommendation, then the decision maker's expected utility increases whenever the realized threshold falls in Case 1 or Case 3 and the costly action is recommended, and remains the same otherwise. Intuitively, if the deviation probability is “small enough” (where the precise definition of “small” depends on the prior over the threshold and the cost of the costly action), then it will be in the decision subject's best interest to follow the decision maker's recommendation, even though they know that the decision maker may sometimes recommend the costly action when it is not in their best interest to take it! That is, the decision maker may occasionally recommend that a decision subject pay off existing debt when it is unnecessary for them to do so in order to secure a loan. We now give a criterion on the deviation probability which ensures the signaling policy is BIC.
Proposition 3.1.
Signaling policy is Bayesian incentive-compatible if , where .
Proof Sketch. We show that, in each case, the decision subject's expected utility from following the recommendation is at least their expected utility from deviating. Since these conditions are satisfied, the policy is BIC.
Proof.
Based on the decision subject's prior over the threshold, they can calculate the prior probability that the threshold lies in each of the three regions.
Case 1: . Given the signal , the decision subject’s posterior probability density function over , , and will take the form
If the decision subject receives this signal, they know that the threshold lies in a region where taking the costly action would not change their classification. Therefore, they will follow the decision maker's recommendation and take no action.
Case 2: . Given the signal , the decision subject’s posterior density over , , and will take the form
The decision subject’s expected utility of taking actions and under the posterior induced by are
and
In order for to be BIC,
Plugging in our expressions for and , we see that
After canceling terms and simplifying, we see that
Next, we plug in for , , and . Note that the denominators of , , and cancel out.
Solving for , we see that
Note that always. Finally, in order for to be a valid probability, we restrict such that
This completes the proof. ∎
Under this setting, the decision maker will achieve expected utility . See Figure 1 for an illustration of how and vary with and .
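To make the incentive calculation concrete, the following sketch computes the largest admissible deviation probability for the three-region example. The region names (`p_low`, `p_high`, `p_mid`) and all numbers are our own illustrative assumptions, and the bound is derived as in the proof above under that labeling: the costly action is recommended with probability 1 when it is decisive, and with probability `eps` otherwise.

```python
# Prior mass of the three regions (illustrative numbers, labels are ours):
# p_low:  the costly action never changes the classification (feature too low)
# p_high: classification is positive regardless of the action
# p_mid:  the costly action is decisive
p_low, p_high, p_mid = 0.3, 0.4, 0.3
c = 0.5                                 # cost of the costly action, 0 < c < 1

# Largest eps keeping the policy BIC (derived under our labeling):
eps_star = p_mid * (1 - c) / (c * (p_low + p_high))
eps = min(1.0, eps_star)

# Posterior reasoning given a recommendation of the costly action:
denom = p_mid + eps * (p_low + p_high)          # P(costly action recommended)
u_follow = (p_mid + eps * p_high) / denom - c   # positive if decisive or always-positive
u_deviate = (eps * p_high) / denom              # positive only in the always-positive region

dm_utility = p_mid + eps * (p_low + p_high)     # P(costly action is taken)
```

At `eps = eps_star` the BIC constraint binds with equality, and the decision maker's expected utility works out to `p_mid / c`, strictly above the full-revelation utility `p_mid`.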
But how much better can the decision maker do by recommending actions via a BIC signaling policy, compared to natural alternatives? We answer this question concretely in the following section.
3.1 Unbounded Utility Improvements Using Persuasion
As we will see in Section 4, the expected utility of the decision maker when recommending actions via the optimal (BIC) signaling policy is trivially no worse than their expected utility if they had revealed full information about the assessment rule to the decision subject, or if they had revealed no information and let the decision subject act according to the prior. In this section, we show that the decision maker’s expected utility when recommending actions according to the optimal signaling policy can be arbitrarily higher than their expected utility from revealing full information or no information. In particular, we prove the following theorem.
Theorem 3.2.
For any , there exists a problem instance such that the expected decision maker utility from recommending actions according to the optimal signaling policy is and the expected decision maker utility for revealing full information or revealing no information is at most .
Proof.
Consider the example in Section 3.
Expected utility from revealing no information. If the decision subject acts exclusively according to the prior, they will select action with probability if and with probability otherwise. Plugging in our expressions for and , we see that the decision subject will select action only if
Canceling terms and simplifying, we see that
must hold for the decision subject to select action . Finally, substituting gives us the condition . Alternatively, if , the decision subject will select action with probability . Intuitively, this means that a rational decision subject would take action if the ratio of (the probability according to the prior that taking action is in the decision subject’s best interest) to (the cost of taking action ) is high, and would take action otherwise.
Expected utility from revealing full information. If the decision maker reveals the assessment rule to the decision subject, they will select action when and action otherwise. Therefore since and , the decision maker’s expected utility if they reveal full information is .
Expected utility from . Recall that the decision maker’s signaling policy from Section 3 sets . Under this setting, the decision maker’s expected utility is . Substituting in our expression for and simplifying, we see that the decision maker’s expected utility for recommending actions via is .
Suppose that and , for some small . The decision maker’s expected utility will always be from revealing no information because . The decision maker’s expected utility from recommending actions via will be . Since , the decision maker’s expected utility from revealing full information will be less than . Therefore, as approaches , the decision maker’s expected utility from revealing full information approaches (the smallest value possible), and the decision maker’s expected utility from approaches (the highest value possible). This completes the proof. ∎
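The gap in Theorem 3.2 can be reproduced numerically. The snippet below instantiates the construction with a small δ; the closed-form utilities and the expression for the BIC threshold follow the derivations above under our labeling of the three regions (`p_mid` is the prior mass where the costly action is decisive), and the specific constants are illustrative.

```python
delta = 0.01
c = delta * 1.01                      # cost just above p_mid: "no information" fails
p_mid = delta                         # costly action decisive
p_low = (1 - delta) / 2               # costly action never helps
p_high = (1 - delta) / 2              # positive classification regardless

no_info = 1.0 if p_mid >= c else 0.0  # subject takes the costly action iff p_mid >= c
full_info = p_mid                     # action taken only when it is decisive
eps_star = p_mid * (1 - c) / (c * (p_low + p_high))
signaling = p_mid + min(1.0, eps_star) * (p_low + p_high)
```

With these numbers, revealing no information yields utility 0, full revelation yields about 0.01, while the optimal signaling policy yields roughly `p_mid / c ≈ 0.99`, close to the maximum of 1.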
The decision maker's expected utility as a function of their possible strategies is summarized in Table 1. Note that when the deviation probability is chosen optimally, the decision maker's expected utility is always at least as good as under the two natural alternatives of revealing no information about the assessment rule, or revealing full information about the rule.
                         No information   Signaling (Section 3 policy)   Full information
Decision maker utility
4 Optimal Signaling Policy
In Section 3, we showed a one-dimensional setting in which a signaling policy can obtain unboundedly better utility than either revealing full information or revealing no information. We now derive the decision maker’s optimal signaling policy for the general setting with arbitrary numbers of observable features and actions described in Section 2. Under the general setting, the decision maker’s optimal signaling policy can be described by the following optimization:
(2)  
s.t. 
where we omit the valid probability constraints over for brevity. In words, the decision maker wants to design a signaling policy that maximizes their expected utility, subject to the constraint that the policy is BIC. At first glance, the optimization may seem hopeless, as there are infinitely many values of (one for every possible ) that the decision maker’s optimal policy must optimize over. However, we will show that the optimal policy can actually be recovered by optimizing over finitely many variables.
By rewriting the BIC constraints as integrals over and applying Bayes’ rule, our optimization over takes the following form
s.t. 
Note that if is the same for some “equivalence region” (which we formally define below), we can pull out of the integral and instead sum over the different equivalence regions. Intuitively, an equivalence region can be thought of as the set of all pairs that are indistinguishable from a decision subject’s perspective because they lead to the exact same utility for any possible action the decision subject could take. Based on this idea, we formally define an equivalence region as follows.
Definition 4.1 (Equivalence Region).
Two assignments are equivalent (w.r.t. ) if , . An equivalence region is a subset of such that for any , all equivalent to are also in . We denote the set of all equivalence regions by .
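To make Definition 4.1 concrete, the sketch below checks equivalence of two assessment rules under a hypothetical decision-subject utility (post-action score minus action cost). The utility form, effect vectors, costs, and rules are all invented for illustration; they are not the paper’s exact functional forms.

```python
import numpy as np

def subject_utility(theta, effect, cost):
    """Hypothetical utility: linear score of post-action features minus cost."""
    return float(theta @ effect) - cost

def equivalent(theta1, theta2, effects, costs, tol=1e-9):
    """True iff theta1 and theta2 yield the same utility for every action."""
    return all(
        abs(subject_utility(theta1, e, c) - subject_utility(theta2, e, c)) < tol
        for e, c in zip(effects, costs)
    )

# Two actions, each moving one of two observable features (illustrative).
effects = [np.array([1.0, 0.0]), np.array([0.0, 1.0])]
costs = [0.5, 0.3]
theta_a = np.array([2.0, 1.0])
theta_b = np.array([2.0, 1.0])   # same utilities for every action -> equivalent
theta_c = np.array([1.0, 2.0])   # different utilities -> not equivalent
```

Under this stand-in utility, `theta_a` and `theta_b` fall in the same equivalence region while `theta_c` does not, even though all three are distinct points in rule space.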
In Figure 2, we show an example of how different equivalence regions might partition the space of possible assessment rules . In this example, there are two actions and two observable features, and the space of is partitioned into three different equivalence regions. Note that as long as the set of actions is finite, . After pulling the decision subject utility function out of the integral, our optimization takes the following form:
s.t. 
Now that the decision subject’s utility no longer depends on , we can integrate over each equivalence region . We denote as the probability that the true according to the prior.
s.t. 
Since it is possible to write the constraints in terms of , , it suffices to optimize directly over these quantities. The final step is to rewrite the objective. For completeness, we include the constraints that make each a valid probability distribution.
Theorem 4.2 (Optimal signaling policy).
The decision maker’s optimal signaling policy can be characterized by the following linear program OPTLP:
(OPTLP)  
s.t.  
where denotes the probability of sending recommendation if . Note that the linear program OPTLP is always feasible, as the decision maker can always recommend the action the decision subject would play according to the prior, which is BIC.
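To illustrate Theorem 4.2, the following sketch solves a toy instance of (OPTLP) with two equivalence regions and two actions using scipy’s `linprog`. The prior, utility matrices, and region structure are invented for illustration; the variables are the per-region recommendation probabilities, the inequality rows encode the BIC constraints, and the equality rows make each region’s recommendations a valid distribution.

```python
import numpy as np
from scipy.optimize import linprog

prior = np.array([0.3, 0.7])                 # P(theta in region r), illustrative
sender = np.array([[1.0, 0.0], [1.0, 0.0]])  # decision maker utility u_DM[r, a]
recv = np.array([[1.0, 0.0], [0.0, 1.0]])    # decision subject utility u_DS[r, a]
R, A = sender.shape

idx = lambda r, a: r * A + a
c = np.zeros(R * A)                          # linprog minimizes, so negate
for r in range(R):
    for a in range(A):
        c[idx(r, a)] = -prior[r] * sender[r, a]

# BIC: following recommendation a must beat any deviation a2.
A_ub, b_ub = [], []
for a in range(A):
    for a2 in range(A):
        if a == a2:
            continue
        row = np.zeros(R * A)
        for r in range(R):
            row[idx(r, a)] = prior[r] * (recv[r, a2] - recv[r, a])
        A_ub.append(row)
        b_ub.append(0.0)

# Each region's recommendation probabilities sum to one.
A_eq = np.zeros((R, R * A))
for r in range(R):
    A_eq[r, idx(r, 0):idx(r, 0) + A] = 1.0

res = linprog(c, A_ub=np.array(A_ub), b_ub=np.array(b_ub),
              A_eq=A_eq, b_eq=np.ones(R), bounds=[(0, 1)] * (R * A))
opt_value = -res.fun
```

For this toy instance the LP value is 0.6, strictly better than revealing no information (the subject would always take the second action, giving the decision maker 0) and than revealing full information (value 0.3), echoing the comparison in Table 1.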
5 Computing the Optimal Signaling Policy
In Section 4, we showed that the problem of determining the decision maker’s optimal signaling policy can be transformed from an optimization over infinitely many variables into an optimization over the set of finitely many equivalence regions (Theorem 4.2). However, as we will show in Section 5.1, computing the optimal signaling policy by solving (OPTLP) requires reasoning over exponentially many variables, even in relatively simple settings. This motivates the need for a computationally efficient algorithm to approximate (OPTLP), which we present in Section 5.2.
5.1 Computational Barriers
In this section, we show that even in the setting where each action only affects one observable feature (e.g., as shown in Figure 3), the number of equivalence regions in (OPTLP) is still exponential in the size of the input. While somewhat simplistic, we believe this action scheme reasonably reflects real-world settings in which decision subjects face time or resource constraints when deciding which action to take. For example, a decision subject may need to choose between paying off some amount of debt and opening a new credit card when strategically modifying their observable features before applying for a loan.
Under this setting, (OPTLP) optimizes over variables, where is the number of actions available to each agent and is the number of equivalence regions. In order to determine the size of , we note that an equivalence region can be alternatively characterized by observing that assessment rules and belong to the same equivalence region if the difference in their predictions for any two actions and is the same. (This follows from straightforward algebraic manipulation of Definition 4.1.) As such, an equivalence region can essentially be characterized by the set of actions which receive a positive classification when .^{6}^{6}6Specifically, if taking action results in a positive classification for some and a negative classification for , the only way for and to be in the same equivalence region is if taking any action in results in a positive classification for and a negative classification for . Besides this special case, if and result in different classifications for the same action, they are in different equivalence regions.
Armed with this new characterization of an equivalence region, we are now ready to show the scale of for the setting described in Figure 3.
Proposition 5.1.
For the setting described in Figure 3, there are equivalence regions, where is the number of observable features of the decision subject and () is the number of actions the decision subject has at their disposal to improve observable feature .
Proof.
In order to characterize the number of equivalence regions , we define the notion of a dominated action , where an action is dominated by some other action if , with strict inequality holding for at least one index. Using this notion of dominated actions and our refined characterization of an equivalence region, it is straightforward to see that if action is dominated by action , then for any equivalence region where . Proposition 5.1 then follows directly from the fact that each action only affects one observable feature. ∎
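The dominance notion used in the proof can be sketched as a coordinate-wise comparison of action effect vectors; this reading of the definition, and the example effects below, are our illustrative assumptions rather than the paper’s exact formalization.

```python
import numpy as np

def dominates(effect_j, effect_i):
    """True iff effect_j >= effect_i coordinate-wise, strictly somewhere."""
    return bool(np.all(effect_j >= effect_i) and np.any(effect_j > effect_i))

# Hypothetical effect vectors over two observable features.
effects = {
    "pay_debt_small": np.array([1.0, 0.0]),
    "pay_debt_large": np.array([2.0, 0.0]),
    "open_card":      np.array([0.0, 1.0]),
}
large_dominates_small = dominates(effects["pay_debt_large"],
                                  effects["pay_debt_small"])
```

A dominated action (here, the smaller debt payment) is never the decision subject’s best response in any equivalence region, which is what lets the proof prune the count of regions.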
5.2 An Efficient Approximation Algorithm
Motivated by the results in Section 5.1, we aim to design a computationally efficient scheme for computing an approximately optimal signaling policy for the decision maker. In particular, we adapt the sampling-based approximation algorithm of dughmi2019algorithmic to our setting in order to compute an optimal and approximate signaling policy in polynomial time, as shown in Algorithm 1. At a high level, Algorithm 1 samples polynomially many times from the prior distribution over the space of assessment rules, and solves an empirical analogue of (OPTLP). We show that the resulting signaling policy is BIC, and is optimal with high probability, for any .
(APPROXLP)  
s.t.  
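A minimal sketch of the sampling idea behind Algorithm 1: draw i.i.d. assessment rules from the prior and solve the empirical, epsilon-relaxed analogue of (OPTLP). The two-region, two-action instance and all numbers below are invented for illustration, and Algorithm 1’s actual construction may differ in details.

```python
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(0)
prior = np.array([0.3, 0.7])                 # illustrative prior over regions
sender = np.array([[1.0, 0.0], [1.0, 0.0]])  # decision maker utility
recv = np.array([[1.0, 0.0], [0.0, 1.0]])    # decision subject utility
A = 2
K, eps = 1000, 0.01
samples = rng.choice(2, size=K, p=prior)     # empirical draws of the region

idx = lambda k, a: k * A + a
c = np.zeros(K * A)
for k, r in enumerate(samples):
    c[idx(k, 0):idx(k, 0) + A] = -sender[r] / K

# Epsilon-relaxed BIC constraints on the empirical measure.
A_ub, b_ub = [], []
for a in range(A):
    for a2 in range(A):
        if a == a2:
            continue
        row = np.zeros(K * A)
        for k, r in enumerate(samples):
            row[idx(k, a)] = (recv[r, a2] - recv[r, a]) / K
        A_ub.append(row)
        b_ub.append(eps)

# Each sample's recommendation probabilities sum to one.
A_eq = np.zeros((K, K * A))
for k in range(K):
    A_eq[k, idx(k, 0):idx(k, 0) + A] = 1.0

res = linprog(c, A_ub=np.array(A_ub), b_ub=np.array(b_ub),
              A_eq=A_eq, b_eq=np.ones(K), bounds=[(0, 1)] * (K * A))
approx_value = -res.fun
```

With enough samples the empirical LP value concentrates around the true optimum of the toy instance, which is the high-probability guarantee formalized in Theorem 5.2.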
Theorem 5.2.
Algorithm 1 runs in poly() time (where ), and implements an BIC signaling policy that is optimal with probability at least .
Proof.
Our proof is similar to the approximation algorithm proof in dughmi2019algorithmic, and follows directly from the following lemmas, whose proofs are in Appendix A. First, since the approximation algorithm solves an approximation LP (APPROXLP) of polynomial size, it runs in polynomial time.
Lemma 5.3.
Algorithm 1 runs in poly() time.
By bounding the approximation error in the BIC constraints of (APPROXLP), we show that the resulting policy satisfies approximate BIC.
Lemma 5.4.
Algorithm 1 implements an BIC signaling policy.
Next, we show that a feasible solution to (APPROXLP) exists which achieves expected decision maker utility at least OPT  with probability at least . In order to do so, we first show that there exists an approximately optimal solution to (OPTLP) such that each signal is either (i) large (i.e., output with probability above a certain threshold), or (ii) honest (i.e., the signal recommends the action the decision subject would take, had they known the true assessment rule ). Next, we show that is a feasible solution to (APPROXLP) with high probability by applying McDiarmid’s inequality [mcdiarmid1989method] and a union bound.
Lemma 5.5.
There exists an optimal signaling policy that is large or honest.
Lemma 5.6.
With probability at least , is a feasible solution to (APPROXLP) and the expected decision maker utility from playing is at least OPT  .
Bicriteria approximation. It is important to note that the signaling policy from Algorithm 1 is both optimal and incentive compatible. While one may wonder whether (i) an optimal and exactly incentive compatible signaling policy exists, or (ii) an exactly optimal and incentive compatible signaling policy exists, dughmi2019algorithmic show that this is generally not possible for samplingbased approximation algorithms for Bayesian persuasion (see Theorem 27 in dughmi2019algorithmic). Note that unlike the other results in dughmi2019algorithmic, these results directly apply to the setting we consider.
Computational complexity. Recall that the algorithm for computing the optimal policy runs in time polynomial in the number of equivalence regions , which can scale exponentially in the number of actions . However, without any structural assumptions, the input prior over the space of assessment rules can scale exponentially in the number of features . When and are comparable, our algorithm runs in time polynomial in the input size. We leave open the question of whether there are classes of succinctly represented prior distributions that permit efficient algorithms for computing the optimal policy in time polynomial in and . It is also plausible to design efficient algorithms that only require some form of query access to the prior distribution. However, information-theoretic lower bounds of [dughmi2019algorithmic] rule out query access via sampling, as they show that no sampling-based algorithm can compute the optimal signaling policy with finitely many samples across all problem instances.
6 Experiments
In this section, we provide experimental results that validate our findings in a semi-synthetic setting where decision subjects are based on individuals in the Home Equity Line of Credit (HELOC) dataset [FICO]. We compare the decision maker’s utility under different models of information revelation: our optimal signaling policy, revealing full information, and revealing no information. To do so, we first estimate agent costs using the Bradley-Terry model [10.2307/2334029] and compute the decision maker’s expected utility for each information revelation scheme we consider. We find that the expected decision maker utility when recommending actions according to the optimal signaling policy either matches or exceeds the expected utility from revealing full information or no information about the assessment rule across all problem instances. Moreover, the expected decision maker utility from signaling is significantly higher on average. Next, we explore how the decision maker’s expected utility changes when action costs and changes in observable features are varied jointly. Our results are summarized in Figures 4, 5, and 6.
The HELOC dataset contains information about 9,282 customers who received a Home Equity Line of Credit. Each individual in the dataset has 23 observable features related to an applicant’s financial history (e.g., percentage of previous payments that were delinquent) and a label which characterizes their loan repayment status (repaid/defaulted). In order to adapt the HELOC dataset to our strategic setting, we select four features from the original 23 and define five hypothetical actions that decision subjects may take in order to improve their observable features. Actions result in changes to each of the decision subject’s four observable features, whereas action does not. For simplicity, we view actions as equally desirable to the decision maker, and assume they are all more desirable than . See Table 2
for details about the observable features and actions we consider. Using these four features, we train a logistic regression model that predicts whether an individual is likely to pay back a loan if given one, which will serve as the decision maker’s realized assessment rule.
Pair   Feature                                     Action
1      # payments with high-utilization ratio      decrease this value
2      # satisfactory payments                     increase this value
3      % payments that were not delinquent         increase this value
4      revolving balance to credit limit ratio     decrease this value
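The assessment-rule setup described above can be sketched as follows: a logistic regression on four features predicting repayment. The synthetic data below merely stands in for the HELOC dataset, and the "true" coefficients are hypothetical, used only to generate labels.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 1000
X = rng.normal(size=(n, 4))                    # 4 observable features, standardized
true_w = np.array([-1.0, 1.0, 1.0, -1.0])      # hypothetical coefficients
# Logistic noise makes the labels follow a logistic-regression model exactly.
y = (X @ true_w + rng.logistic(size=n) > 0).astype(int)  # repay = 1

# The fitted model plays the role of the realized assessment rule.
model = LogisticRegression().fit(X, y)
accuracy = (model.predict(X) == y).mean()
```

The signs of the hypothetical coefficients mirror Table 2: features whose action is "decrease this value" get negative weight, and vice versa.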
Common prior. We assume the common prior over the realized assessment rule takes the form of a multivariate Gaussian before training. This captures the setting in which both the decision maker and decision subjects have a good estimate of what the true model will be, but are somewhat uncertain about their estimate. We note that our methods extend to more complicated priors beyond the isotropic Gaussian prior we consider in this setting.
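The isotropic Gaussian prior can be sketched as sampling coefficient vectors around a point estimate of the trained model; the center and scale below are illustrative, not the values used in the experiments.

```python
import numpy as np

rng = np.random.default_rng(0)
theta_hat = np.array([-1.0, 1.0, 1.0, -1.0])   # hypothetical point estimate
sigma = 0.5                                     # prior uncertainty (illustrative)
# Draws from an isotropic Gaussian prior over assessment rules.
theta_samples = rng.normal(theta_hat, sigma, size=(10000, 4))
empirical_mean = theta_samples.mean(axis=0)
```

Larger `sigma` corresponds to decision subjects who are less certain about the true rule, which is the regime where signaling helps most in Figure 4.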
Changes in observable features. In order to examine the effects that different changes have on the decision maker’s expected utility, we consider settings in which each takes a value in .
Utilities and costs of actions. As the decision maker views actions as equally desirable, we define , and .^{7}^{7}7We set for ease of exposition; in general, actions can have different utility values based on their relative importance. Since there are 1,320 individuals in our test dataset, the maximum utility the decision maker can obtain is 1,320. As proposed in [rawal2020beyond], we use the Bradley-Terry model [10.2307/2334029] to generate the decision subject’s cost of taking action , for . See Appendix C.2 for details on our exact generation methods.
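A sketch of Bradley-Terry cost generation: from pairwise comparisons of which action is costlier, each action gets a positive score s_i with P(i judged costlier than j) = s_i / (s_i + s_j), fitted here with Zermelo's fixed-point (MM) iteration. The comparison counts are made up for illustration, and the paper's exact procedure (Appendix C.2) may differ.

```python
import numpy as np

# wins[i, j] = times action i was judged costlier than action j (illustrative)
wins = np.array([[0, 8, 9],
                 [2, 0, 7],
                 [1, 3, 0]], dtype=float)
n = wins.shape[0]
total = wins + wins.T            # comparisons made between each pair
s = np.ones(n)
for _ in range(200):             # Zermelo/MM fixed-point iteration
    for i in range(n):
        num = wins[i].sum()
        den = sum(total[i, j] / (s[i] + s[j]) for j in range(n) if j != i)
        s[i] = num / den
    s /= s.sum()                 # normalize for identifiability

costs = s  # higher Bradley-Terry score -> higher perceived cost
```

Action 0 wins most comparisons, so it receives the largest fitted score and hence the highest cost under this stand-in procedure.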
Results. Given an instance and an information revelation scheme, we calculate the decision maker’s total expected utility by summing their expected utility over all applicants. Figure 4 shows the average total expected decision maker utility across different and cost configurations for priors with varying amounts of uncertainty. See Figure 8 in Appendix C.3 for plots of all instances used to generate Figure 4. Across all instances, the optimal signaling policy (red) achieves higher average total utility than the other information revelation schemes (blue and green). The difference is further amplified whenever the decision subjects are less certain about the true assessment rule (i.e., when is large). Intuitively, this is because the decision maker leverages the decision subjects’ uncertainty about the true assessment rule to incentivize them to take desirable actions, and as that uncertainty increases, so does the decision maker’s ability to persuade.
6.1 Patterns under different and
To better understand how the decision maker’s expected utility changes as a function of and , we sweep through multiple tuples on a grid of for and measure the effectiveness of the three information revelation schemes. Figure 5 shows the surface of the decision maker utility as a function of for the optimal signaling policy (red), revealing full information (blue), and revealing no information (green). When is high and is low, the total expected decision maker utility is low as there is less incentive for the decision subject to take actions (although even under this setting, the optimal signaling policy outperforms the other two baselines). As decreases and increases, the total expected decision maker utility increases.