Manipulation-Proof Machine Learning

04/08/2020 ∙ by Daniel Björkegren, et al. ∙ Brown University berkeley college 0

An increasing number of decisions are guided by machine learning algorithms. In many settings, from consumer credit to criminal justice, those decisions are made by applying an estimator to data on an individual's observed behavior. But when consequential decisions are encoded in rules, individuals may strategically alter their behavior to achieve desired outcomes. This paper develops a new class of estimator that is stable under manipulation, even when the decision rule is fully transparent. We explicitly model the costs of manipulating different behaviors, and identify decision rules that are stable in equilibrium. Through a large field experiment in Kenya, we show that decision rules estimated with our strategy-robust method outperform those based on standard supervised learning approaches.




1 Introduction

An increasing number of important decisions are being made by machine learning algorithms. Algorithms determine what information we see online (perlich_machine_2014); who is hired, fired, and promoted (brynjolfsson_what_2017); who gets a loan (bjorkegren_potential_2018), and whether to give bail and parole (kleinberg_human_2018). In the typical machine learning deployment, an individual’s observed behavior is used as input to an estimator that determines future decisions.

These applications of machine intelligence raise two related problems. First, when algorithms are used to make consequential decisions, they create incentives for people to reverse engineer or ‘game’ them. If agents understand how their behavior affects decisions, they may alter their behavior to achieve the outcome they desire. Second, society increasingly demands a ‘right to explanation’ about how algorithmic decisions are made (goodman_european_2016; barocas_fairness_2018). For instance, articles 13-15 of the European Union’s General Data Protection Regulation mandate that “meaningful information about the logic” of automated systems be made available to data subjects (european_union_eu_2016). However, such transparency increases the scope for gaming: the more clearly agents know how their behavior affects a decision, the easier it is to manipulate.

These problems result from a simple core. The standard estimators that are used to construct decision rules assume that the relationship between the outcome of interest and human behaviors is stable. But this assumption tends to be violated as soon as a decision rule is implemented: agents have incentives to change their behavior to achieve more favored outcomes. When decision rules are gamed, they can produce decisions that are arbitrarily poor or unsafe. Lenders’ portfolios may be swamped with fraud, social media may be overrun by nefarious actors, and self-driving cars can be tricked into crashing (eykholt_robust_2018). This problem can undermine the use of machine learning in critical applications.

There are two common approaches to deal with this problem. The first, familiar to economists, restricts models to predictors that are presumed to have a theoretical or structural relationship to the outcome of interest. [Footnote 4: An extreme version of this restricts to predictors that causally affect the outcome of interest (kleinberg_how_2018; milli_social_2019). This may make manipulation desirable: for example, an exam may induce students to study and learn general knowledge.] This theory-driven approach amounts to having a dogmatic prior that the cost of manipulation is either infinite (for included features) or zero (for excluded features). However, most behaviors are manipulable at some cost. The second approach, which we refer to as the ‘industry approach’, keeps decision rules secret, and periodically updates the model to account for changes in the relationship between features and outcomes (bruckner_stackelberg_2011). However, such ‘security through obscurity’ exposes current applications to substantial risk (NIST national_institute_of_standards_and_technology_guide_2008). It also limits the application of machine learning in settings where secrecy cannot be maintained (e.g., when regulations mandate transparency, or when consumers learn decision rules directly or through third parties) or feedback is noisy or delayed (e.g., it may take years for a social media platform to learn that its content prioritization algorithm was gamed by foreign actors). There is also no guarantee that the back and forth between estimation and agents will reach equilibrium.

This paper develops a new approach. We explicitly model the costs that agents incur to manipulate their behavior, and embed the resulting game theoretic model within a machine learning estimator. This allows us to derive estimators that anticipate strategic agents, and which produce stable decisions even when the decision rule is fully transparent. We demonstrate, using Monte Carlo simulations, that our ‘strategy-robust’ estimator outperforms standard models when these costs are known, and can still do so when costs are misspecified. We then test the theory in a real world environment, through an incentivized field experiment with 1,557 people in Kenya. We use the experiment to elicit costs of manipulating behavior, and to show that the strategy-robust approach leads to more robust machine decisions.

The paper is organized into two main parts. The first part develops a method to estimate strategy-robust decision rules that are stable under manipulation. We consider a supervised machine learning framework for a policymaker making a decision for each individual. Each individual prefers a larger decision. We observe a training subset of cases that possess both features and optimal decisions. The policymaker seeks to estimate a decision rule for cases in a testing subset where only features are observed. Standard methods assume that features are fixed: training and test samples are drawn from the same distribution. Our method allows individuals to adjust behavior in response to the incentives generated by the decision rule: behavior is a function of the decision rule. As a result, while our training samples come from an unincentivized distribution, test samples come from the distribution induced by the decision rule. We assume individuals pay quadratic costs for manipulating behavior, and that these costs can be parametrized by a matrix. We describe several methods to estimate this cost matrix, a new object needed to determine how behavior shifts when incentivized.

To sharpen intuition, we derive results for linear decision rules. The resulting estimator takes a simple nonlinear least squares form. Our method introduces a new notion of fit, which has analogues to other common linear regression approaches. Ordinary least squares (OLS) maximizes fit within sample; two stage least squares (2SLS) sacrifices fit within sample to estimate coefficients that have causal interpretations; penalized least squares (such as LASSO and ridge) sacrifice within-sample fit to better generalize to other samples drawn from the same population. Our method sacrifices fit within sample to maximize equilibrium fit in the counterfactual where the decision rule is used to allocate resources, and agents manipulate against it. Our estimator is an example of a new class of estimator that maximizes counterfactual fit, that is, predictive fit in a counterfactual state of the world.

We use Monte Carlo simulations to compare this new strategy-robust approach to common alternatives. OLS can perform extremely poorly when agents behave strategically. The industry approach, which periodically retrains the model, can also perform poorly and converge slowly, or not at all. By contrast, our method adjusts the model to anticipate manipulation. In simulations where agents respond to the decision rule, and manipulation costs are known, our approach exceeds the performance of other estimators. Our approach can exceed the performance of others even if manipulation costs are misspecified for some cases. Under certain parameters, the presence of manipulation can improve predictive performance, if it signals unobservables associated with the outcome of interest (in the spirit of spence_job_1973). In these cases, one may wish to use certain features that are manipulable by the types that you want to screen in, but not by those you want to screen out.

In the second part of the paper, we implement and test our method in the context of a field experiment in Kenya. This experiment allows us to compare the performance of the strategy-robust estimator to standard machine learning algorithms in a real-world environment. Specifically, we built a new smartphone app that passively collects data on how people use their phones, and disburses monetary rewards to users based on the data collected. The app is designed to mimic ‘digital credit’ products that are spreading dramatically through the developing world (francis_digital_2017). Digital credit products similarly collect user data, and convert it into a credit score using machine learning, based on the insight that historical patterns of mobile phone use can predict loan repayment (bjorkegren_big_2010; bjorkegren_behavior_2019). However, as these systems have scaled, manipulation has become commonplace as borrowers learn what behaviors will increase their credit limits (mccaffrey_m-shwari:_2013; bloomberg_phone_2015). [Footnote 5: A recent survey in Kenya and Tanzania found that one of the top five reasons people report saving money in digital accounts is to increase the loan amount qualified for (fsd_kenya_tech-enabled_2018).]

This field experiment produces several results. First, consistent with prior work, we show that a person’s mobile phone usage behaviors can be used to predict characteristics of the phone user, such as income, intelligence (Raven’s matrices), and overall activity. [Footnote 6: Prior work has used mobile phone data to predict income and wealth (blumenstock_predicting_2015; blumenstock_estimating_2018), gender (blumenstock_whos_2010; frias-martinez_gender-centric_2010), employment status (sundsoy_estimating_2016), and loan repayment (bjorkegren_potential_2018; bjorkegren_behavior_2019).] Second, through the use of randomly-assigned experiments, we structurally estimate the cost matrix in our model, i.e., the relative costs of manipulating a variety of observed behaviors. Our experiments offer financial incentives to participants for altering behaviors that are observed through the app, such as increasing the number of outgoing calls in a given week, or decreasing the number of incoming text messages. The pattern of costs is intuitive: outgoing communications are less costly to manipulate than incoming communications; text messages, which are relatively cheap to send, are more manipulated than calls, which are relatively expensive. We also find that complex behaviors (such as the standard deviation of talk time) are less manipulable than simpler behaviors (such as the average duration of talk time).

The next set of results demonstrate that strategy-robust decision rules, which account for the costs of manipulation, perform substantially better than standard machine learning algorithms. We make this comparison by offering rewards to people who use their phones like a person of a particular type. For instance, some people receive a message that says, “Earn up to 1000 Ksh if the Sensing app guesses that you are a high income earner, based on how you use your phone,” while others receive messages that offer rewards for acting like an “intelligent” person, and so forth. Across a variety of such decision rules, we show that classifications made with the strategy-robust algorithm are more accurate than classifications from standard algorithms.

Finally, we use our method to estimate the equilibrium cost of algorithmic transparency, i.e., the cost to the policymaker incurred for disclosing details of the decision rule. In the experiment, we vary the amount of information subjects have about the decision rule (e.g., the model used to predict the outcome), and show that the relative performance of the strategy-robust estimator increases with transparency. While predictive performance decreases by 23% on average under transparency for standard machine learning estimators, the strategy-robust estimator reduces this cost of transparency to approximately 8%. Overall, this suggests that the equilibrium cost of moving from a regime where the decision rules are secret to one where they are disclosed is less than 8% in our setting. Our model allows policymakers to bound this equilibrium cost of transparency even without disclosing decision rules to the world.

Taken together, the paper develops and tests a new approach to supervised learning when agents are strategic. This relates to papers from a variety of sub-literatures that have confronted the notion that agents will act strategically when their actions are used to determine allocations. Our paper aims to integrate these approaches by applying principles of mechanism design to the machine learning setting, where data may have many dimensions and traditional approaches to designing incentive-compatible allocations are not possible. To our knowledge, this is also the first paper to estimate and test a strategy-robust machine learning estimator using data from a field experiment.

1.1 Connection to Literature

The dilemma of manipulation is not new. goodhart_monetary_1975, in what has since become referred to as ‘Goodhart’s Law’, noted that once a measure becomes a target, it ceases to be a good measure. lucas_econometric_1976 also famously observed that historical patterns can warp when economic policy changes. More broadly, our approach connects with literatures in both economics and computer science.

Our problem can be viewed as a mechanism design problem. Canonical signaling models (spence_job_1973) rely on a single crossing condition to allow full revelation of individual types. In our setting, like the settings of frankel_muddled_2019 and ball_scoring_2019, there are two forms of heterogeneity: types and the costs of manipulating behavior. frankel_muddled_2019 show that unobserved heterogeneity in manipulation costs ‘muddles’ the relationship between behavior and types, causing the single crossing condition to fail. That paper shows that muddling reduces the information available in a market. ball_scoring_2019 extends that framework to multiple dimensions of behavior, and in a theoretical model similar to ours, characterizes and proves the existence of equilibrium. That paper also considers how the problem is affected by the degree of commitment available to the policymaker. Relative to this work, our paper builds a model that can be empirically estimated, which allows us to probabilistically separate types and costs. [Footnote 7: In a related setting, hussam_targeting_2017 implement an incentive compatible mechanism that collects peer reports to estimate an individual’s entrepreneurial ability. That method requires gathering peer reports from a community during implementation; in contrast, our approach produces stand in replacements for standard machine learning models, which can use arbitrary data on behavior.] Also related, holmstrom_moral_1979 shows that a principal should use any information that has signal when contracting with an agent. Our method suggests how manipulable information should be downweighted. eliaz_incentive-compatible_2018 study a related problem where a “statistician” is making decisions on behalf of an agent, with two-sided incomplete information: the agent knows his preferred behavior, but the statistician knows the decision rule. They focus on characterizing incentive-compatible estimators, and find that commonly-used regularized linear models create incentive issues.

Our paper is also related to the problem in public finance of setting taxes in environments where agents adapt their behaviors. Our method weights predictors by the inverse of the matrix of the costs of manipulating them, in a manner similar to ramsey_contribution_1927. Relatedly, mirrlees_exploration_1971 recommends using proxies when it is not possible to observe the true income earning ability of potential beneficiaries. niehaus_targeting_2013 find that when implementing agents can be corrupted, considering additional poverty indicators can worsen the targeting of benefits, by making it more difficult to verify eligibility.

Finally, our approach relates to existing strands in the computer science literature. The theoretical computer science community has recently considered this problem as one of ‘strategic classification’ (hardt_strategic_2016; dong_strategic_2018). This literature is focused primarily on obtaining computationally efficient learning algorithms, and how strategic behavior can affect statistical definitions of fairness (hu_disparate_2019; milli_social_2019). In computer security, ‘adversarial machine learning’ considers how strategic adversaries can systematically undermine supervised learning algorithms, typically by injecting erroneous data into the model fitting procedure. [Footnote 8: For instance, bruckner_stackelberg_2011 study adversarial prediction when the agent acts in response to an observed predictive model, with an application to spam filtering. dong_strategic_2018 model an iterated industry approach where a policymaker observes how agents manipulate in response to previous rules, but does not know their utility functions or costs.] Also related is the concept of ‘covariate shift’, which considers scenarios where a test distribution differs from the training distribution. However, it is common to assume that the conditional distribution of the outcome given the features is fixed, and that the distribution of the features changes exogenously (sayed-mouchaweh_learning_2012). The manipulation we consider induces the conditional distribution to change endogenously when action is taken based on the estimated relationship.

Thus, papers from a variety of sub-literatures have confronted the notion that agents will act strategically when their actions are used to determine allocations. Relative to prior work, our paper makes two main contributions. First, we develop an equilibrium model of manipulation that can be estimated using data, which produces a machine learning estimator that functions well under manipulation even when the decision rule is fully transparent. And second, to our knowledge for the first time in any literature, we design and implement a field experiment that stress-tests such an estimator in a real-world setting with incentivized agents.

1.2 Applications and Examples

Agents game decision rules in a wide variety of empirical settings. Manipulation has been documented in contexts ranging from New York high school exit exams (dee_causes_2019) and health provider report cards (dranove_is_2003), to pollution monitoring in China (greenstone_can_2019), to fish vendors in Chile (gonzalez-lira_slippery_2019). In the online advertising industry, firms spend many millions of dollars each year on search engine optimization, manipulating their websites in order to receive a higher ranking from search engine algorithms (borrell_associates_trends_2016). A quick Google search suggests over 50 thousand different websites (and 3,000 YouTube videos) contain the phrase “hack your credit score.”

We apply our method to an experiment that mimics poverty targeting. In developing countries, where income is difficult to observe, policymakers commonly target program eligibility based on easily observable characteristics or behaviors (hanna_universal_2018). The policymaker may infer a household’s type based on the levels of these variables, or, implicitly, on how they change in response to incentives. [Footnote 9: Our method thus nests this latter case of self-targeting (nichols_targeting_1982; alatas_self-targeting:_2016), which identifies beneficiaries based on willingness to engage with a costly “ordeal.”] There is evidence that such decision rules induce households to manipulate their observable features. For instance, banerjee_lack_2018 find that adding a question about flat screen TV ownership to a census caused people to underreport ownership by 16% on a follow-up survey, in order to appear less wealthy. [Footnote 10: In other examples from the development literature, camacho_manipulation_2011 find that after a program eligibility decision rule was made transparent to local officials in Colombia, it was manipulated by an amount corresponding to 7% of the National Health and Social Security budget. They note, “there is anecdotal evidence of people moving or hiding their assets, or of borrowing and lending children.”]

The method we develop is directly relevant to a variety of other settings where a policymaker derives a decision from a prediction based on agent behaviors. These include other supervised settings where it is possible to obtain a ground truth value of the outcome for a training sample of individuals. For instance, in credit scoring applications, a decision about whether it is prudent to provide a loan is made based on characteristics of the potential borrower (traditional credit scores are based on the borrower’s formal credit history, but increasingly the characteristics include private data like mobile phone usage (bjorkegren_behavior_2019) and social network structure (wei_credit_2015)). It also includes settings where no definite ground truth exists. Search engines, social media, and spam filters attempt to determine the quality of a piece of content based on features that can be observed (keywords, reputation of the sender, inbound links). Manipulating these features may be costly directly, or may undermine the author’s intent in distributing the content. Similarly, ‘report cards’ for universities, hospitals, and doctors attempt to determine quality based on indicators (alumni giving rates, endowment size, acceptance rates, graduation rates). [Footnote 11: Our model does not consider behaviors that have a causal relationship to the outcome, where manipulation can be productive (kleinberg_how_2019). It thus would not cover report card variables that directly influence quality, nor the case of a student who ‘games’ a test by studying, and as a result improves their knowledge. The approach could be extended to cover such cases.]

The remainder of the paper is organized as follows. The next section introduces our theory. Section 3 describes estimation. Section 4 describes the results of our field experiment. Section 5 discusses extensions. Section 6 concludes.

2 Theory

This section introduces the model underlying our estimator, and demonstrates the intuition with simulations.

2.1 Model

A policymaker observes a training subset of cases that possess both features and optimal decisions. The policymaker also obtains information on the costs of manipulating features, which will be detailed later. The policymaker would like to estimate the parameters of a decision rule for cases in a testing subset where only features are observed, and may be manipulated.

A policymaker has a preferred action for each individual, denominated in units of individuals’ utility. The action can be projected onto the individual’s bliss behavior, with the residual representing idiosyncratic preference.

However, the policymaker observes an individual’s actual behavior, which may differ from their bliss level. It selects a deterministic linear decision rule. [Footnote 12: Although randomizing a decision rule may make it harder to manipulate, it undermines a major goal of transparency: that people know how they are evaluated.]

Individuals can manipulate their behavior away from their bliss level at some cost. Each individual earns utility from the decision minus these costs.

For simplicity, we consider the case where the utility from the decision exactly coincides with the policymaker’s prediction. [Footnote 13: That is, we consider the case where the utility of the decision equals the decision itself, which holds in our experiment. Under more general utility functions, our model would represent a linear approximation. One could easily generalize our framework to allow for more general functional forms.]

Individuals are heterogeneous in two respects, bliss behaviors and gaming ability (as in frankel_muddled_2019).

Manipulation costs are quadratic in the deviation of behavior from its bliss level, and are parametrized by a cost matrix.

Different behaviors may be differentially hard to manipulate, by themselves (the diagonal elements of the matrix) or in conjunction with other behaviors (the off-diagonal elements). And different people may find it easier or harder to manipulate: for example, people with more technical savvy or lower opportunity cost of time may find it easier to game decision rules.

When an individual knows the decision rule and receives benefits according to it, he will optimally manipulate behavior up to the point where the marginal benefit from the decision equals the marginal cost of manipulation.

When behavior is not incentivized (the decision rule places zero weight on it), optimal behavior equals the bliss level. However, as the weight moves away from zero, behavior moves in the same direction, downweighted by the cost of manipulation.
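The best response described above can be sketched as a first-order condition. The notation here is assumed for illustration: $\underline{x}_i$ is bliss behavior, $x$ is chosen behavior, $\beta$ holds the coefficients of the linear decision rule, and $C_i$ is a symmetric positive definite cost matrix. The factor $\tfrac{1}{2}$ is a normalization chosen for convenience.

```latex
\begin{aligned}
u_i(x) &= \beta' x \;-\; \tfrac{1}{2}\,(x-\underline{x}_i)'\,C_i\,(x-\underline{x}_i) \\
\frac{\partial u_i}{\partial x} &= \beta - C_i\,(x-\underline{x}_i) = 0 \\
x_i^{*}(\beta) &= \underline{x}_i + C_i^{-1}\beta
\end{aligned}
```

Setting $\beta = 0$ returns $x_i^{*} = \underline{x}_i$, and a larger cost matrix shrinks the response. If heterogeneity in gaming ability enters as a scalar $\gamma_i$ that scales a common matrix, $C_i = C/\gamma_i$ (a further assumption), the response becomes $\underline{x}_i + \gamma_i C^{-1}\beta$.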

Decision rules. The policymaker faces expected squared loss between the preferred action and the decision, evaluated at manipulated behavior. The first term of this loss represents fit of the model in the counterfactual where the model is implemented and agents manipulate behavior. If the policymaker additionally cares about the costs that individuals incur manipulating, this manipulation cost results in an additional term.

Our strategy-robust decision rule minimizes this loss in the training sample. It deviates from ordinary least squares due to a term that captures manipulation in response to the decision rule. Additional terms can include any weight the policymaker places on manipulation costs incurred by agents, and any regularization terms.
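A minimal numerical sketch of such an estimator follows. It rests on assumptions that are not taken from the paper: behavior responds as `x_bliss + gamma_i * C^{-1} beta`, there is no intercept, no weight on agents' manipulation costs, and no regularization. The function names are hypothetical, and plain gradient descent with backtracking stands in for whatever optimizer one would use in practice. Starting from OLS and accepting only improving steps means the returned rule is never worse than OLS on the equilibrium loss in sample.

```python
import numpy as np


def best_response(x_bliss, gamma, C_inv, beta):
    """Manipulated behavior (assumed form): x_bliss_i + gamma_i * C^{-1} beta."""
    return x_bliss + np.outer(gamma, C_inv @ beta)


def equilibrium_loss(beta, x_bliss, y, gamma, C_inv):
    """Mean squared error when agents best respond to the announced rule."""
    resid = y - best_response(x_bliss, gamma, C_inv, beta) @ beta
    return float(np.mean(resid ** 2))


def strategy_robust(x_bliss, y, gamma, C, n_iter=500):
    """Descend on the equilibrium loss, starting from the OLS coefficients."""
    C_inv = np.linalg.inv(C)
    S = C_inv + C_inv.T
    beta = np.linalg.lstsq(x_bliss, y, rcond=None)[0]  # OLS starting point
    loss = equilibrium_loss(beta, x_bliss, y, gamma, C_inv)
    step = 1.0
    for _ in range(n_iter):
        resid = y - best_response(x_bliss, gamma, C_inv, beta) @ beta
        # d(beta' x_i(beta)) / d beta = x_bliss_i + gamma_i * (C^{-1} + C^{-T}) beta
        jac = x_bliss + np.outer(gamma, S @ beta)
        grad = -2.0 * jac.T @ resid / len(y)
        # Backtracking: accept a step only if it lowers the equilibrium loss.
        while step > 1e-12:
            cand = beta - step * grad
            cand_loss = equilibrium_loss(cand, x_bliss, y, gamma, C_inv)
            if cand_loss < loss:
                beta, loss = cand, cand_loss
                step *= 2.0
                break
            step *= 0.5
        else:
            break  # no improving step found: stop
    return beta


if __name__ == "__main__":
    # Hypothetical data: the first behavior is most predictive but cheap to manipulate.
    rng = np.random.default_rng(0)
    x_bliss = rng.normal(size=(2000, 3))
    y = x_bliss @ np.array([3.0, 0.1, 0.1]) + rng.normal(scale=0.5, size=2000)
    gamma = rng.uniform(0.0, 2.0, size=2000)
    C = np.diag([0.5, 8.0, 8.0])
    beta_ols = np.linalg.lstsq(x_bliss, y, rcond=None)[0]
    beta_sr = strategy_robust(x_bliss, y, gamma, C)
    C_inv = np.linalg.inv(C)
    print("OLS rule, loss under manipulation:",
          equilibrium_loss(beta_ols, x_bliss, y, gamma, C_inv))
    print("Strategy-robust rule, loss under manipulation:",
          equilibrium_loss(beta_sr, x_bliss, y, gamma, C_inv))
```

The sketch takes individual gaming abilities as known inputs, which is a simplification; in an application they would have to be estimated or integrated out.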


If the policymaker only cares about targeting performance and there are no additional regularization terms, then ours is a nonlinear least squares estimator. Its moment conditions imply that equilibrium errors in the counterfactual are less than orthogonal to individual types: they equal the negative of an adjustment factor that accounts for the fact that the decision rule induces a marginal incentive to respond. When behavior cannot be manipulated, the resulting estimator corresponds to OLS.
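These moment conditions can be sketched as follows. This is a reconstruction under assumed notation rather than the paper's own display: $y_i$ is the preferred action, $\underline{x}_i$ is bliss behavior, $C_i$ is a symmetric cost matrix, and agents respond as $x_i(\beta) = \underline{x}_i + C_i^{-1}\beta$.

```latex
\begin{aligned}
\hat\beta &= \arg\min_{\beta}\ \sum_i \bigl(y_i - \beta' x_i(\beta)\bigr)^2,
\qquad x_i(\beta) = \underline{x}_i + C_i^{-1}\beta \\
0 &= \sum_i \varepsilon_i(\beta)\,\bigl(\underline{x}_i + 2\,C_i^{-1}\beta\bigr),
\qquad \varepsilon_i(\beta) = y_i - \beta' x_i(\beta) \\
\sum_i \varepsilon_i(\beta)\,\underline{x}_i &= -\,2\sum_i \varepsilon_i(\beta)\,C_i^{-1}\beta
\end{aligned}
```

The second line differentiates $\beta' x_i(\beta) = \beta'\underline{x}_i + \beta' C_i^{-1}\beta$ with respect to $\beta$. As $C_i^{-1} \to 0$, the right-hand side of the last line vanishes and the condition reduces to the OLS normal equations $\sum_i (y_i - \beta'\underline{x}_i)\,\underline{x}_i = 0$.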

When the policymaker cares about not only the resulting allocation, but also the manipulation costs that individuals incur, this is accompanied by an additional term, which can take a different form depending on policymaker preferences. An entity that is narrowly concerned with its own objective (e.g., profits in the case of a firm) may thus select different decision rules from those that maximize social welfare (for example, a firm may be satisfied with an equilibrium where all individuals expend welfare gaming a test, where a social planner may not). [Footnote 14: For example, the policymaker may place weight on the sum of manipulation costs. The Supplemental Appendix derives a microfounded term for the case of proxy means testing.]

To reduce overfitting in small samples, one may also include common forms of regularization; for example, an L1 or L2 penalty on the coefficients. The penalty hyperparameter can be set with cross validation in the baseline sample. Under these regularization terms, when behavior cannot be manipulated, the resulting estimator corresponds to LASSO or ridge, respectively.

2.2 Intuition

We demonstrate the method with Monte Carlo simulations.

We derive desired payments from individual types and a payment rule, with idiosyncratic deviations. We then assess decision rules, generated with different estimators, that are based on observed behaviors. Our strategy-robust estimators anticipate that behaviors may change when they are used in a decision rule, factoring in manipulation costs. This section assumes that manipulation costs are known.

Comparative Statics

We consider a case where the first behavior is more predictive than another in baseline behavior, but would be easily manipulated if used in a decision rule.

Figure 1 compares our method to OLS and LASSO, which mistakenly place most weight on the first behavior. OLS maximizes predicted performance within the unincentivized sample; as shown in Figure 1a, it performs poorly as manipulation becomes easier. Figure 1b shows that for a given cost of manipulation, LASSO shrinks these coefficients. However, when LASSO selects variables, it does exactly the wrong thing: it kicks the less manipulable behavior out of the regression first. In contrast, our method considers how predictive features will be in equilibrium when the decision rule is implemented. As shown in Figure 1c, when manipulation costs are high, our method approaches OLS, but as manipulation becomes easier, our method substantially penalizes the first behavior. Our method can also be combined with LASSO or ridge penalization to fine tune out of sample fit. [Footnote 15: See Appendix Figure A2 for a comparison to ridge regression, as well as a demonstration of combining our method with ridge penalization.]

If each feature is equally costly to manipulate, our method shrinks them together, similar to ridge regression, as shown in Figure A2. If all individuals have the same gaming ability, then manipulation shifts behavior uniformly and does not affect predictive performance. However, even though predictive performance is high, individuals can spend substantial utility on manipulation. Figure A1 develops this intuition further, by showing how the strategy-robust method penalizes indicators that are easy to shift: Figure A1a shows the effect of scaling the cost of one behavior. As the cost of manipulating that particular behavior decreases, it is penalized, and weight is shifted to other behaviors. The method also penalizes indicators that make it easier to shift other predictive indicators (in a manner similar to ramsey_contribution_1927 taxation). Figure A1b shows the effect of cost interactions: when manipulating one behavior makes it easier to manipulate another (a sufficiently negative off-diagonal cost term), our method further reduces weight on the former.
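The point about equal gaming ability can be checked with a few lines of arithmetic. The sketch below assumes a diagonal cost matrix, so that behavior `k` moves by `gamma * beta_k / cost_k`. The function names and all numbers are hypothetical.

```python
def best_response_diag(x_bliss, gamma, cost_diag, beta):
    """Manipulated behavior under a diagonal cost matrix (assumed form):
    behavior k moves by gamma * beta_k / cost_k away from its bliss level."""
    return [xb + gamma * b / c for xb, b, c in zip(x_bliss, beta, cost_diag)]


def score(x, beta):
    """Linear decision rule without an intercept."""
    return sum(b * xk for b, xk in zip(beta, x))


def score_shifts(bliss_rows, gammas, cost_diag, beta):
    """Change in each individual's score once the rule is announced."""
    return [
        score(best_response_diag(xb, g, cost_diag, beta), beta) - score(xb, beta)
        for xb, g in zip(bliss_rows, gammas)
    ]


# Hypothetical numbers: two behaviors, the first cheaper to manipulate.
beta = [2.0, 1.0]
cost_diag = [1.0, 4.0]
bliss_rows = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0]]

same_ability = score_shifts(bliss_rows, [1.0, 1.0, 1.0], cost_diag, beta)
mixed_ability = score_shifts(bliss_rows, [0.0, 1.0, 2.0], cost_diag, beta)
print(same_ability)   # identical shifts: the ranking of individuals is unchanged
print(mixed_ability)  # shifts differ across individuals: scores are distorted
```

With equal ability every score moves by the same amount, which an intercept could absorb; it is heterogeneity in gaming ability that distorts predictions.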

Figure 1: Common vs. Strategy Robust Estimators. Panels: (a) OLS, (b) LASSO, (c) our method. Panel label: Manipulation Cost = 1.
Note: The first behavior is more predictive, but is easily manipulable. (a) OLS performance deteriorates substantially when behavior can be manipulated. (b) LASSO penalization favors the first behavior, which will be manipulated as soon as the decision rule is implemented. (c) Our method anticipates that the first behavior will be manipulated if it is incentivized. It shifts weight to the other behavior as behavior becomes manipulable. Squared error measured on an out of sample draw from the same population, incentivized to that decision rule.


Table 1 shows the results of an example Monte Carlo simulation, chosen to demonstrate how standard approaches can fail. In this simulation, the first type has a large weight in the desired payment relative to the other two dimensions; however, the resulting behavior is much easier to manipulate than the other two.

In this environment, OLS considers the static relationship in the unmanipulated data. This rule would perform well if behavior were held fixed (no manipulation column); however, once consumers adjust to the rule, it makes terrible decisions (manipulation column).

The industry approach would retrain (refresh) this model after this manipulation. If we observe how consumers adjust their behavior and reestimate OLS, we obtain a rule that places negative weight on the manipulated behavior. However, it also makes terrible decisions when consumers respond to it. We can try to do better by repeatedly allowing individuals to best respond, and then reestimating the decision rule. But even with perfect information and no changes in the environment, this process can make poor decisions en route to convergence, or may not converge at all. If we estimate using data from all prior periods, it continues to make terrible decisions over several iterations of the algorithm designer announcing decision rules to consumers, and learning from how they respond. While the performance of these decisions then begins to improve, it would require sequentially announcing over a thousand different rules to consumers, and learning from how they responded to each one, to approach equilibrium. (See the second set of estimates in Table 1.) If we instead rely on only recent data, estimating using only data from the prior period, this approach does not reach equilibrium: it alternates between decision rules that place high and low weight on the manipulable behavior (see Table A1). Thus standard approaches can perform poorly even in ideal cases. If there were noise or frictions in learning, the risks of this approach would be greater: the rule may appear to be performing well, and suddenly be devastatingly undermined (for example, gonzalez-lira_slippery_2019 find that increased enforcement of a ban on selling an endangered fish can lead vendors to learn about the decision rule, and more effectively undermine it).
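The retraining loop described above can be sketched as follows. This is an illustration under assumptions rather than the paper's simulation code: agents respond as `x_bliss + gamma_i * C^{-1} beta`, the rule has no intercept, and cumulative estimation pools the unincentivized baseline with every later round. The function name `industry_path` is hypothetical.

```python
import numpy as np


def industry_path(x_bliss, y, gamma, C, n_rounds=5, cumulative=True):
    """Announce a rule, let agents best respond, re-estimate OLS, and repeat.

    Returns an array whose row t holds the coefficients announced in round t.
    """
    C_inv = np.linalg.inv(C)
    # Round 0: OLS on unmanipulated (bliss) behavior.
    beta = np.linalg.lstsq(x_bliss, y, rcond=None)[0]
    betas = [beta]
    X_hist, y_hist = [x_bliss], [y]
    for _ in range(n_rounds):
        # Agents best respond to the rule currently in force (assumed form).
        x_obs = x_bliss + np.outer(gamma, C_inv @ beta)
        X_hist.append(x_obs)
        y_hist.append(y)
        if cumulative:
            # Pool all data observed so far, including the baseline.
            X_fit, y_fit = np.vstack(X_hist), np.concatenate(y_hist)
        else:
            # Use only the most recent period.
            X_fit, y_fit = x_obs, y
        beta = np.linalg.lstsq(X_fit, y_fit, rcond=None)[0]
        betas.append(beta)
    return np.array(betas)


if __name__ == "__main__":
    # Hypothetical data: the first behavior is predictive but cheap to manipulate.
    rng = np.random.default_rng(1)
    x_bliss = rng.normal(size=(500, 3))
    y = x_bliss @ np.array([3.0, 0.1, 0.1]) + rng.normal(scale=0.5, size=500)
    gamma = rng.uniform(0.0, 2.0, size=500)
    C = np.diag([0.5, 8.0, 8.0])
    print(industry_path(x_bliss, y, gamma, C, n_rounds=6, cumulative=False))
```

Printing the path for the cumulative and last-period variants side by side is a simple way to see whether the coefficients settle or keep moving for a given cost matrix.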

In contrast, our strategy-robust estimator () anticipates that including a behavior in the decision rule will shift that behavior. It penalizes the easily manipulable behavior , and shifts weight to behaviors that are harder to manipulate ( and ). It sacrifices performance in the environment in which it is trained (in sample, no manipulation) for performance in the counterfactual where there is manipulation. When individuals manipulate as described in the model, our estimator exceeds the performance of other estimators.
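A small numerical sketch may help make this tradeoff concrete. It is not the simulation reported in Table 1. It assumes two behaviors, a diagonal cost matrix, and a best response of the form "unmanipulated behavior plus gaming ability times coefficient divided by cost"; the data, the cost values in `COST`, and the grid search are illustrative assumptions.

```python
import random

random.seed(0)
COST = (0.5, 10.0)  # hypothetical costs: behavior 0 is cheap to manipulate, behavior 1 expensive

# Each person: (unmanipulated behavior 0, behavior 1, gaming ability, outcome).
people = []
for _ in range(200):
    t0, t1 = random.gauss(0, 1), random.gauss(0, 1)
    gamma = random.uniform(0.0, 2.0)
    y = t0 + t1 + random.gauss(0, 0.5)
    people.append((t0, t1, gamma, y))


def loss(beta, manipulate):
    """Squared loss of rule beta, with the intercept set to the mean residual."""
    resid = []
    for t0, t1, gamma, y in people:
        g = gamma if manipulate else 0.0
        x0 = t0 + g * beta[0] / COST[0]  # assumed best response to the rule
        x1 = t1 + g * beta[1] / COST[1]
        resid.append(y - beta[0] * x0 - beta[1] * x1)
    m = sum(resid) / len(resid)
    return sum((r - m) ** 2 for r in resid) / len(resid)


def ols():
    """Two-regressor OLS slopes on unmanipulated behavior (normal equations)."""
    n = len(people)
    m0 = sum(p[0] for p in people) / n
    m1 = sum(p[1] for p in people) / n
    my = sum(p[3] for p in people) / n
    s00 = sum((p[0] - m0) ** 2 for p in people)
    s11 = sum((p[1] - m1) ** 2 for p in people)
    s01 = sum((p[0] - m0) * (p[1] - m1) for p in people)
    s0y = sum((p[0] - m0) * (p[3] - my) for p in people)
    s1y = sum((p[1] - m1) * (p[3] - my) for p in people)
    det = s00 * s11 - s01 ** 2
    return ((s11 * s0y - s01 * s1y) / det, (s00 * s1y - s01 * s0y) / det)


beta_ols = ols()
# Grid search for the rule that performs best once people respond to it.
candidates = [(i / 20, j / 20) for i in range(31) for j in range(31)] + [beta_ols]
beta_sr = min(candidates, key=lambda b: loss(b, True))
```

Comparing `loss(beta_ols, False)` with `loss(beta_sr, False)`, and then the same pair with `True`, reproduces the qualitative pattern in the text: the robust rule gives up some in-sample fit and does at least as well once behavior responds.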

Our method can reduce risk even if manipulation costs are misestimated. We consider a case with two measurement mistakes: (a) all off diagonal elements are set to zero, and (b) the estimated costs of manipulation are two times too large. Performance deteriorates relative to the case where we know the true cost matrix, but our method still outperforms OLS in the presence of manipulation. One can use our method as a first step towards equilibrium, and then follow it with the industry approach; as shown in the bottom rows, doing so skips the terrible decisions made in the first two iterations of the industry approach.

Decision Rule Performance (squared loss)
No manip. Manipulation
Panel A: Data generating process
0.200 3.000 0.100 0.100 0.267 3745.046
Panel B: Standard Approaches
0.205 3.042 0.061 0.116 0.266 3961.225
    ‘Industry’ Approach (estimated cumulatively)
after -0.798 0.061 2.090 -1.675 3.275 625.762
after -2.174 0.174 0.436 0.143 12.861 8.369
after -1.376 0.165 0.573 0.483 9.343 4.415
after -1.619 0.316 0.753 -0.059 8.442 2.105
after -1.854 0.489 0.582 -0.124 9.211 1.959
Panel C: Strategy Robust Method
-1.813 0.503 0.536 -0.096 9.155 1.939
    If costs are misestimated:
-1.566 0.658 0.719 -0.352 6.893 10.826
    Followed by Industry Approach (estimated cumulatively):
after -2.045 0.800 0.042 0.418 10.891 4.447
after -2.022 0.558 0.327 0.137 10.685 2.453
  • Notes: Monte Carlo simulation results. Panel A shows the coefficients that relate the outcome ) to behaviors () under the data generating process (DGP). Panel B shows coefficients from OLS; Panel C shows coefficients estimated with the strategy robust method. Performance is assessed on the same sample of individuals, under behavior without manipulation: , or with: . Parameters:
    , , ,

Table 1: Manipulation Can Harm Prediction (Monte Carlo)

Manipulation can improve performance

Manipulation can improve performance, if ease of manipulation () is correlated with the outcome (). In that case, manipulation itself represents a signal of the underlying type, as in spence_job_1973, and applications of self-targeting (nichols_targeting_1982; alatas_self-targeting:_2016). An example is shown in Table A2: manipulation improves the performance even of naïve estimators, as shown in the first two rows. Our method can additionally exploit cost heterogeneity, and thus further improves performance as shown in the third row.

3 Estimation

Our model can be fully estimated with experimental data. To estimate manipulation costs, we hire study participants to undermine component parts of the model, and gauge how sensitive these manipulations are to incentives.

We observe multiple time periods. Each period, an individual may desire to deviate from bliss behavior due to manipulation, or shocks that are common () or individual specific ():

where both components are mean zero: and . Then, in week we will observe behavior:


We parameterize the inverse of the cost matrix as follows:

with elements of inverse costs defined for convenience as:

Gaming ability includes two types of heterogeneity:

It is allowed to vary with characteristics that are observable in the training sample (but need not be observed in an implementation sample; for example, we survey participants on tech savviness). It also includes unobserved heterogeneity with , which will enter the model as random effects.

We estimate strategy-robust decision rules in two steps.

3.1 Primitives

We first estimate primitives: types , cost parameters and , and the distribution of unobserved gaming ability .


We infer types by observing baseline behavior prior to the implementation of a decision rule. When , behavior will not be manipulated. We can estimate types and time period fixed effects with moment conditions derived from the equation:


including only time periods where .
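As an illustration of this step, the sketch below recovers types and week effects from unincentivized weeks. It assumes an additive form (behavior equals type plus a week effect plus an idiosyncratic shock), a balanced panel, and week effects normalized to average zero; the function name and the data are hypothetical.

```python
from statistics import mean


def estimate_types(panel):
    """panel[i][t] is the behavior of individual i in unincentivized week t."""
    n_weeks = len(panel[0])
    grand = mean(v for row in panel for v in row)
    # Week effects, normalized to average zero across weeks.
    week_fx = [mean(row[t] for row in panel) - grand for t in range(n_weeks)]
    # Type: the individual's average behavior net of week effects.
    types = [mean(row[t] - week_fx[t] for t in range(n_weeks)) for row in panel]
    return types, week_fx


# Hypothetical data: 3 individuals observed over 4 baseline weeks.
panel = [
    [10.0, 12.0, 9.0, 11.0],
    [20.0, 22.0, 19.0, 21.0],
    [30.0, 32.0, 29.0, 31.0],
]
types, week_fx = estimate_types(panel)
```

In a balanced panel this coincides with the two-way fixed effects solution; with missing person-weeks one would instead solve the moment conditions numerically.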


Our main specification recovers manipulation costs experimentally. Each week we randomly assign individuals to a decision rule . The decision rule may be a control, in which case . Or, it may be a treatment group that incentivizes one behavior , by disclosing a rule that pays incentives for : but not for other behaviors: for . These treatments make it possible to recover the inverse cost matrix (diagonal and off-diagonal elements), as well as heterogeneous gaming ability (observed and unobserved ).
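The identification argument can be sketched for a single behavior. The example assumes that an incentive of size b shifts behavior by a times b, where a is the corresponding diagonal element of the inverse cost matrix, that types are already known, and that week effects can be ignored. The function name and the numbers are hypothetical.

```python
def inverse_cost_from_moment(incentives, behavior, types):
    """Solve sum_i b_i * (x_i - type_i - a * b_i) = 0 for a.

    This is the sample analogue of 'incentives are orthogonal to shocks'
    under the assumed model x_i = type_i + a * b_i + shock_i.
    """
    num = sum(b * (x - t) for b, x, t in zip(incentives, behavior, types))
    den = sum(b * b for b in incentives)
    return num / den


# Hypothetical person-weeks: randomized incentives, known types, observed behavior.
incentives = [0.0, 0.0, 2.0, 2.0, 4.0, 4.0]
types = [10.0, 20.0, 10.0, 20.0, 10.0, 20.0]
behavior = [11.0, 19.0, 12.0, 24.0, 16.5, 25.5]  # built with a = 1.5 plus shocks

a_hat = inverse_cost_from_moment(incentives, behavior, types)
```

Because incentives are randomly assigned, the shocks are uncorrelated with them in expectation, which is what allows the slope to be read as an inverse cost. Off-diagonal elements would be recovered the same way, using the incentive on one behavior and the response of another.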

Moment Conditions

We recover all parameters jointly with the following moment conditions.

Incentives are orthogonal to idiosyncratic behavior shocks (). For each pair of behaviors (including ) this yields sample moment condition:

We also have : for each time period and behavior , we obtain:

For each individual and behavior , we obtain:

given observations.

Unobserved heterogeneity is mean zero (), yielding:

Each heterogeneity characteristic is orthogonal to unobserved heterogeneity (), yielding:

These moment conditions jointly identify , , and .

Joint Estimation

We jointly solve for the parameters to minimize the squared distance from zero:


represents the associated general method of moments (GMM) loss function.

Penalization and Cross Validation

We include two adjustments to reduce overfitting of the cost matrix to our limited dataset. First, we impose the constraint that incentivizing a behavior increases it: . Second, we regularize the cost estimates:

where we allow the possibility of using separate hyperparameters for diagonal and off diagonal costs. These penalize the cost of manipulation towards infinity (ease of manipulation towards zero), which will tend to penalize our method’s estimates towards standard methods (OLS/LASSO/etc).
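A minimal sketch of such a penalized objective follows. The squared (ridge-style) form of the penalty is an assumption made for illustration, since the exact functional form is not reproduced here. `inv_cost` stands for the inverse cost matrix, and `lam_diag` and `lam_off` are the two hyperparameters.

```python
def penalized_loss(gmm_loss, inv_cost, lam_diag, lam_off):
    """GMM loss plus separate penalties on diagonal and off-diagonal inverse costs.

    Shrinking inverse costs toward zero pushes manipulation costs toward
    infinity, so the resulting decision rule moves toward a standard estimator.
    """
    k = len(inv_cost)
    diag = sum(inv_cost[j][j] ** 2 for j in range(k))
    off = sum(inv_cost[j][m] ** 2 for j in range(k) for m in range(k) if j != m)
    return gmm_loss + lam_diag * diag + lam_off * off


# Hypothetical 2x2 inverse cost matrix and hyperparameters.
inv_cost = [[2.0, 0.5], [0.5, 1.0]]
total = penalized_loss(gmm_loss=1.0, inv_cost=inv_cost, lam_diag=0.1, lam_off=1.0)
```

Setting `lam_off` very large forces the off-diagonal elements to zero, which yields a diagonal cost matrix.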

We jointly solve for parameters , , and , and hyperparameters to minimize out of sample prediction error, using cross validation. Then, we impose the optimal and jointly estimate , , and on the full sample.

Unobserved Gaming Ability

After estimating these parameters, we back out the distribution of unobserved gaming ability in two steps. First we compute whether each individual manipulates more or less than predicted during incentivized weeks:

Second, to reduce the impact of noise and outliers, we shrink and winsorize these backed out shocks. We form the empirical distribution , where is the lowest value of that leads to a nonnegative implied gaming ability [Footnote 16: That is, .]. We set the shrinkage factor to 0.005 so that less than 5% of the distribution is winsorized [Footnote 17: After shrinkage, 4.1% of observations are winsorized.]. This yields a distribution of costs .
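The shrink-then-winsorize step might look as follows. This is a sketch under stated assumptions: implied gaming ability is taken to be the observed component plus the shock, shrinkage multiplies the raw shock by the factor, and winsorization applies only at the bottom. The function name and the example values are hypothetical.

```python
def shrink_and_winsorize(raw_shocks, observed_ability, shrink=0.005):
    """Shrink backed-out shocks, then floor them so implied ability stays nonnegative."""
    out = []
    for shock, observed in zip(raw_shocks, observed_ability):
        shrunk = shrink * shock
        floor = -observed  # lowest shock for which observed + shock >= 0
        out.append(max(shrunk, floor))
    return out


# Hypothetical raw shocks and observed ability (1.0 = low tech skills, 1.1 = high).
raw = [100.0, -40.0, -400.0]
observed = [1.0, 1.1, 1.0]
shocks = shrink_and_winsorize(raw, observed)
abilities = [o + s for o, s in zip(observed, shocks)]
```

Only the third individual is winsorized here: the shrunk shock of -2.0 would imply negative gaming ability, so it is floored at -1.0.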

3.2 Decision Rules

Given these primitives, a strategy robust decision rule is given by:

with the expectation taken over , and given decision rule regularization term . Hyperparameter is set through cross validation in the unmanipulated sample (where we can observe ground truth):

4 Experiment

We designed a field experiment to test the performance of our strategy-robust estimator in a real-world setting. Design started in 2017. Working with the Busara Center for Behavioral Economics in Nairobi, we developed and deployed a new smartphone-based application (‘app’) to 1,557 research subjects. The app was designed to mimic the key features of the ‘digital credit’ apps that are quickly transforming consumer credit in developing countries (francis_digital_2017). In Kenya, at the time of our study, cgap_kenyas_2018 estimates that 27% of all adults had an outstanding ‘digital credit’ loan. These phone-based apps construct an alternative credit score () based on how each applicant uses their phone (; bjorkegren_big_2010; bjorkegren_behavior_2019). The app we built similarly collects data on how each subject uses their phone, and uses that data to make cash transfer decisions. This section describes the app and experimental design (Section 4.1); estimates costs of manipulation and derives strategy-robust decision rules using our method; and compares the performance of these new estimators to traditional learning algorithms (Section 4.3). Our design was pre-specified in a pre-analysis plan registered in the AEA RCT registry under AEARCTR-0004649.

4.1 Experimental design and smartphone app

Our experiment is intended to create an environment with incentives similar to those of a ‘digital credit’ lending app. These apps run in the background on a smartphone, and collect rich data on phone use (including data on communications, mobility, social media behavior, and much more). Digital credit apps use this information to allocate loans to people who appear creditworthy (i.e., for whom exceeds some threshold). Since financial regulations prevented us from actually underwriting loans to research subjects, we instead focused on analogous problems where a decisionmaker wishes to allocate resources to individuals with specific characteristics, for instance by paying individuals who have a certain income level, or other characteristic (e.g., intelligence, level of activity, education). [Footnote 18: While these target predictions may bear little resemblance to credit-worthiness, there are many settings where characteristics like these are being inferred by digital traces (for example, welfare programs that target unmarried women, or digital advertisers who target college students).] This allows us to focus on the mechanics of manipulation in a prediction task, which is the same regardless of which outcome is predicted.

Smartphone app

The ‘Smart Sensing’ app we built has two key features. First, it runs in the background on the smartphone to capture anonymized metadata on how individuals use their phones, such as when calls or texts are placed, which apps are installed and used, geolocation, battery usage, wifi connections, and when the screen was on. In total, we extract over behavioral features; Appendix Figure A2 shows the correlation between 80 different behavioral indicators (“features”) collected through the app. [Footnote 19: The app is designed to capture this data with minimal impact on battery life and performance. Data is uploaded to secure Busara servers at a set frequency, or can be uploaded manually.] Second, the app provides a platform to deliver weekly “challenges” to research subjects (see Figure 2). These challenges appear on the subject’s phone, and offer financial incentives based on their behavior. The challenges can be very simple (‘You will receive 12 Ksh. for every incoming call you receive this week’) or more complex (‘Earn up to 1000 Ksh. if the Sensing app guesses you are a high-income earner’). Users are paid a base amount of 100 Ksh. for uploading data, plus any challenge winnings, directly via M-PESA at the conclusion of each week.

(a) Installation Screen

(b) Challenge with Hint

(c) Earnings Calculator

Figure 2: Smart Sensing App

Study population and recruitment

The subject population consists of Kenyans aged 18 years or older who own a smartphone and are able to travel to the Busara center in Nairobi. Participants were recruited through in person solicitations in public spaces in neighborhoods around Nairobi. From this master list of potential participants, every third individual was saved for a ‘top up’ sample; we drew invited individuals from this list to participate later in the experiment, to form a fresh test sample. The remaining sample was invited at the beginning. All individuals were sequentially invited for an enrollment session at the Busara center. (The center had a capacity to enroll 200 people per week.) During enrollment, participants complete a survey on a tablet on demographics and technology usage. These responses will form the ground truth about users that we seek to infer based on phone usage behavior.

Prospective participants were given the opportunity to install the Sensing App on their phones for about 16 weeks. Participants were told the dimensions of behavior that would be captured and used anonymously, and assured that no content of calls or text messages would be recorded. Participants were given the opportunity to ask questions. Participants showed understanding of the privacy tradeoffs involved, and voiced trust in Busara based on its positive reputation in this community. Participants who opted in to the study were offered help installing the Sensing App, which provided the main interaction of the study. During installation, participants had the opportunity to view the Android permissions required and to decide whether to accept. Our sample includes only participants who opted in. Participants could elect to receive challenges in English, Swahili, or both. 82.6% elected English, 15.9% elected Swahili, and 1.4% elected both.

Weekly rhythm

The study follows a weekly rhythm. Each Wednesday at noon, each user receives a generic notification, ‘Opt in to see this week’s challenge!’, via Android notifications and a text message. When a user opens the app, it will ask them to opt in to a challenge for that week. Only after a user opts in are the details of their challenge for that week revealed (see Figure 2). [Footnote 20: To minimize the possibility of differential attrition, the pre-opt-in notification was the same for all users regardless of their assigned challenge.] Challenges are valid until 6pm Tuesday. At the conclusion of the challenge, users have 16 hours to ensure that their data is uploaded (until 10am Wednesday). Busara then computes and sends any payments to users via M-PESA by noon Wednesday, and users receive the next challenge.

Each week, participants could attrit in two ways: by not uploading their data, or by not opting in to the challenge. [Footnote 21: As some participants may upload data sparsely throughout the week, only those who upload within the 21-hour window at the end of the challenge-week (between 1pm Tuesday and 10am Wednesday) will be counted as having fully uploaded all of their weekly data.] Participants who failed to upload or opt in were sent text message reminders, or called by Busara staff, following an attrition protocol detailed in Appendix A1.2. We include in our analysis only participant-weeks where the participant opted in, and uploaded during the end-of-week upload window.

4.2 Baseline predictions and model estimation

Predicting user characteristics

We begin the experiment with baseline weeks that have no incentives (no active challenges). These baseline weeks allow us to estimate each individual’s type in the absence of manipulation, . [Footnote 22: In these ‘control’ weeks, the subject receives a challenge of the form, ‘Dear user, you do not have to do anything for this week’s challenge. You will receive an extra Ksh 50 for accepting this challenge.’ Our method could also be used without these control weeks, as long as there is variation in incentives between weeks; one would then need to net out the manipulation in estimation.] We estimate each dimension of type using Equation 3, with week fixed effects to absorb idiosyncratic weekly shocks.

Consistent with prior work (blumenstock_predicting_2015; bjorkegren_behavior_2019), we find that characteristics of users can be predicted from phone behaviors. Results for several outcomes, based on OLS, are shown in Table 2. For characteristics such as monthly income, intelligence (Ravens Matrices), and overall phone activity, R2 values range from 0.02 to 0.15. To make these rules easier for participants to interpret, we will focus on three variable decision rules selected via LASSO; the last row of Table 2 shows that these obtain similar R2 when cross validated.

| OLS | Monthly Income | Intelligence (Ravens) | Activity PCA |
| --- | --- | --- | --- |
| Average Duration of Workday Calls | -6.877 (0.471) | 0.0009 (0.6) | -0.0007 (0.185) |
| Average Duration of Outgoing Calls | 5.746 (0.584) | -0.0005 (0.815) | 0.0003 (0.607) |
| Calls with Non-Contacts | -27.747 (0.005)*** | -0.006 (0.001)*** | 0.0002 (0.649) |
| # Unique Evening Text Contacts | 102.477 (0.129) | 0.016 (0.196) | 0.003 (0.435) |
| Incoming Call Count | 14.962 (0.065) | 0.001 (0.416) | 0.005 (0.0)*** |
| Evening Text Count | -5.904 (0.194) | -0.0007 (0.399) | -0.0002 (0.322) |
| Average Duration of Evening Calls | -1.739 (0.637) | 0.0004 (0.614) | 0.0007 (0.703) |
| Minimum Duration of Weekend Calls | 2.950 (0.874) | 0.003 (0.406) | -0.0008 (0.935) |
| Outgoing Texts on Weekdays | -7.130 (0.417) | -0.002 (0.225) | -0.0001 (0.791) |
| Outgoing Text Count | 3.666 (0.621) | 0.0008 (0.585) | 0.001 (0.001)*** |
| Outgoing Call Count | 14.556 (0.004)*** | -0.001 (0.14) | 0.004 (0.0)*** |
| Incoming Text Count | 1.762 (0.6) | 0.002 (0.013)** | 0.001 (0.0)*** |
| Intercept | 5259.547 (0.0)*** | 5.071 (0.0)*** | -0.956 (0.0)*** |
| N | 1539 | 1557 | 1415 |
| R2 | 0.0241 | 0.0223 | 0.7593 |
| LASSO: 3 covariate model, 10-fold CV R2 | 0.0180 | 0.0044 | 0.6173 |


Notes: Each column indicates a different prediction target. P-values in parentheses. N represents individuals. 10-fold cross-validated R2 is reported for a LASSO regression where the regularization parameter is set in order to achieve a 3-covariate model.

Table 2: Behavior Predicts Individual Characteristics

Evidence that app-based challenges induce manipulation

We will eventually use variation in behavior induced by our randomized experiment to estimate the cost of manipulating different behaviors, . This exogenous variation comes from weeks when subjects are assigned ‘simple’ challenges that incentivize modifying a single behavior, of the form, ‘We’ll pay you for each additional you do’, where amount and behavior are assigned randomly. For example, one challenge was, ‘You will receive 3 Ksh. for each text you send this week, up to Ksh. 250.’ In the long run, individuals may identify new, easier ways to manipulate these indicators. To mimic this, we held focus groups to identify the most effective ways to manipulate different features, and during onboarding, exposed each participant to a discussion of how one could change different types of behavior (this is similar to hiring ‘white hat’ hackers to uncover security weaknesses).

People respond to these challenges, as anticipated by our theory (Equation 2). For intuition, Table 3 shows how behavior changed in response to simple challenges. Each column shows a regression of an outcome on different incentives (randomly assigned). Individuals manipulate the particular behaviors that were incentivized, as shown by the diagonal, which is positive and significant for these outcomes. Incentivizing one behavior also affects others, as shown in the off diagonal elements. For example, incentivizing missed incoming calls also increases the number of texts sent (presumably requests to contacts to be called). Our method can theoretically exploit these cross elasticities.

Change in actions per ¢ of incentive (p-values in parentheses):

| Incentivized behavior | # Texts sent | # Missed calls (outgoing) | # Missed calls (incoming) | # People called (workday, M-F, 9am-5pm) | # Calls w non-contacts (weekend) |
| --- | --- | --- | --- | --- | --- |
| # Texts sent | 24.508 (0.0)*** | -0.052 (0.929) | -0.836 (0.337) | -0.305 (0.161) | -0.022 (0.953) |
| # Missed Outgoing Calls | 4.16 (0.058)* | 0.709 (0.079)* | 0.825 (0.167) | 0.128 (0.391) | -0.002 (0.995) |
| # Missed Incoming Calls | -0.206 (0.942) | 0.324 (0.536) | 1.187 (0.126) | 0.22 (0.255) | 0.502 (0.126) |
| # People Called during Workday | 2.307 (0.357) | 0.156 (0.734) | 0.68 (0.318) | 0.497 (0.003)*** | 0.108 (0.708) |
| # Calls w Non-Contacts on Weekend | -2.022 (0.481) | -0.056 (0.916) | 1.234 (0.113) | 0.015 (0.94) | 1.233 (0.0)*** |
| Week and Individual Fixed Effects | X | X | X | X | X |
| N (person-weeks) | 7976 | 7976 | 7976 | 7976 | 7976 |
| R2 | 0.705 | 0.637 | 0.552 | 0.604 | 0.491 |

Notes: P-values in parentheses. Bold indicates diagonal: effect on behavior when behavior is incentivized. N represents person-weeks when no “incentive challenge” was assigned to the given participant. Individual and weekly fixed effects included, excluding the first week and first individual hash. Each column represents a separate regression, over the full set of covariates assigned; only the first five coefficients reported here. * p < 0.1, ** p < 0.05, *** p < 0.01.

Table 3: Behavior Changes when Incentivized

Since we have a limited sample on which to estimate costs, our challenges focus on incentivizing a subset of focal behaviors (from the full set of ). Specifically, we select behaviors that are useful in predicting the set of user characteristics that form the basis for our ‘complex’ challenges. To identify this subset, we run LASSO regressions for each to induce variable selection, and include the selected variables . For each of these variables, we pair an additional behavior that measures a similar concept but which we anticipate may be differently easy to manipulate (for example, if a naïve regression selects outgoing calls, we will also include the variable incoming calls). [Footnote 23: We determined “similar” behaviors as those that met at least one of the following conditions: (1) correlated with the primary behavior with a coefficient of at least 0.75; (2) was a ‘close cousin’ of the primary behavior, in that it was a different transformation of a similar underlying behavior (e.g., for ‘weekly number of late-night calls’, ‘maximum number of late-night calls in a single day’ would be considered a close cousin); (3) a cross validated LASSO regression that excluded the principal behavior from the feature set then newly picked out this variable in its optimal set. From this list of similar behaviors, we picked alternates based on our intuition of which behaviors would substitute the best, and which would be the easiest to explain in a challenge.] Note that by including only a subset of variables, our procedure implicitly assumes that omitted variables are costless to manipulate (and therefore should not be included in any decision rule); we will thus underestimate the performance that could be attained with our method if costs were fully estimated. [Footnote 24: Note that this procedure will perform poorly if baseline predictiveness and manipulation cost are highly negatively correlated: in that case we may omit a behavior which is less predictive at baseline but is more predictive in the counterfactual because it is difficult to manipulate.] In Section 5, we evaluate other potential methods to lower the expense of measuring manipulation costs.


Finally, we use the data from all weeks of the experiment to jointly estimate types and manipulation costs (using GMM with the moment conditions outlined in Section 3.1). We allow manipulation cost to differ by behavior, by whether a person reports having high tech skills, and by an unobserved random effect by person. [Footnote 25: We have allowed for a single dimension of observed heterogeneity in costs ; with the rest absorbed into unobserved heterogeneity . Thus Spence signaling will only be captured in that dimension . With a larger sample one could estimate a more nuanced functional form for the observable portion, which would better capture the correlations between gaming ability and bliss behavior .] Table 4 summarizes these estimated costs. With our sample size, we find that off diagonal elements are noisily estimated, so we penalize them to zero (); this results in a diagonal cost matrix .

Several intuitive patterns can be discerned from the estimated manipulation costs in the top panel of Table 4 (here we present only behaviors selected by models; see Supplemental Appendix for all estimated costs). Outgoing communications are less costly to manipulate than incoming communications. Text messages, which are relatively cheap to send, are more manipulated than calls, which are relatively expensive. We also find that complex behaviors (such as the standard deviation of talk time; estimated but not shown on this summary diagram) are less manipulable than simpler behaviors (such as the average duration of talk time).

Costs are also heterogeneous across people, as shown in the bottom panel of Table 4. On average it is 10%pt easier for individuals who report advanced or higher tech skills to manipulate their mobile phone behaviors. Overall, including unobserved heterogeneity in gaming ability, the 90th percentile finds it 2.5 times easier to game than the 10th percentile.

Heterogeneity by Behavior ( diagonal; subset of behaviors selected by models)
Heterogeneity by Person ()
Low tech skills 1.00
High tech skills 1.10

In top panel: Red: used in a LASSO model; blue: used in SR model. Line segment represents standard error. Parameters estimated using GMM. In cost matrix, off diagonal elements regularized to zero (), diagonal elements regularized with , set via cross validation. Standard errors estimated from PD approximation of inverse Hessian. Shown here with winsorized at top and bottom of range; in implementation, only bottom is winsorized, to maintain assumption of non-negative . Only behaviors selected by models shown in Panel I; for all behaviors see Supplemental Appendix.

Table 4: Estimated Manipulation Costs

4.3 Results: Naive vs. Robust Decisions

The final and most important stage of the experiment compares decisions made by standard machine learning algorithms to the decisions made by our new strategy-robust estimator that accounts for the cost of manipulating behavior. The robust decision rules can be directly estimated with Equation 1, which relies on the estimates of and that come from previous stages of the experiment.

In this final stage, subjects receive complex challenges that reward them for their ultimate classification, of the form ‘We’ll pay you M if you are classified as .’ We consider a focal challenge of the form, ‘Earn up to 1000 Ksh. if the Sensing app guesses you are a high-income earner.’ These challenges are designed to mimic real world applications of machine learning, where depending on how they are classified, users may receive a loan (digital credit), grant (targeted aid), or other benefits.

Estimating Decision Rules

In order to keep decision rules simple and interpretable for our participants, we consider decision rules of up to three features. We regularize naïve decision rules to three features, selecting , where is the smallest hyperparameter that results in a 3 variable LASSO model. We use the same hyperparameter to penalize our strategy robust decision rule, and allow it to select only among three variable models. [Footnote 26: For a given that selects three variables in a LASSO model, the strategy robust model will tend to select more than three variables, because it induces some penalization on its own. Instead of restricting to three variable models, one could alternately increase .]
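The search for the smallest hyperparameter that yields a three variable model can be sketched as below. The solver (cyclic coordinate descent), the candidate grid, and the toy data are assumptions made for illustration; the experiment's actual software is not described here. The toy design uses orthogonal ±1 columns so that the answer can be checked by hand: with orthogonal columns, each LASSO coefficient is simply the true coefficient shrunk toward zero by the hyperparameter.

```python
def soft_threshold(rho, lam):
    if rho > lam:
        return rho - lam
    if rho < -lam:
        return rho + lam
    return 0.0


def lasso_cd(X, y, lam, sweeps=100):
    """Minimize (1/2n)*||y - Xb||^2 + lam*||b||_1 by cyclic coordinate descent."""
    n, k = len(X), len(X[0])
    b = [0.0] * k
    resid = list(y)  # residual y - Xb; b starts at zero
    for _ in range(sweeps):
        biggest_change = 0.0
        for j in range(k):
            z = sum(X[i][j] ** 2 for i in range(n)) / n
            rho = sum(X[i][j] * resid[i] for i in range(n)) / n + z * b[j]
            new = soft_threshold(rho, lam) / z
            delta = new - b[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= X[i][j] * delta
                b[j] = new
            biggest_change = max(biggest_change, abs(delta))
        if biggest_change < 1e-12:
            break
    return b


def smallest_lambda_for_k(X, y, grid, k_max=3):
    """Return the smallest grid value whose LASSO fit keeps at most k_max variables."""
    for lam in sorted(grid):
        b = lasso_cd(X, y, lam)
        selected = [j for j, bj in enumerate(b) if bj != 0.0]
        if len(selected) <= k_max:
            return lam, selected, b
    raise ValueError("no value on the grid gives a small enough model")


# Toy design: five mutually orthogonal +/-1 columns (8 observations).
X = [[(-1) ** bin(i & j).count("1") for j in range(1, 6)] for i in range(8)]
true_b = [4.0, 3.0, 2.0, 1.0, 0.5]
y = [sum(xij * bj for xij, bj in zip(row, true_b)) for row in X]
grid = [0.25 * m for m in range(1, 13)]
lam, selected, coefs = smallest_lambda_for_k(X, y, grid)
```

In this toy, a hyperparameter of 1.0 is the first grid value at which the fourth coefficient is shrunk exactly to zero, leaving the three largest coefficients in the model.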


Participants are randomly assigned into different targets (), decision rules (standard: , or robust ), and whether the decision rule is kept opaque or revealed transparently to the user. Under the opaque treatment, users are told only the outcome and the reward. Under the transparent treatment, users see the coefficients of the decision rule, which reveals how much they are rewarded for changing which behaviors. We included an interactive interface that can be used to compute the payments that would result from different behavior (see Figure 2c). Because the transparent treatment reveals information about potential decision rules, after a person has seen a transparent challenge for , we do not assign them to an opaque challenge for the same outcome.

Table 5 summarizes the effect of decision rule incentives on behavior. High income people make more outgoing calls, and send fewer texts but receive more. If we pay people to ‘act like a high-income earner,’ without revealing the decision rule, the response is noisy and often in the wrong direction (participants place fewer calls and send more texts). Participants who are transparently presented with the decision rule change their behavior, closer to the direction incentivized by the algorithm, though the response is still noisy.

Weekly Challenge: Use your phone like a high-income earner! (Standard errors in parentheses.)

| | # Calls (outgoing) | # Texts (outgoing) | # Texts (incoming) | # Calls w Non-Contacts (incoming + outgoing) | Mean Call Duration (evening, seconds) |
| --- | --- | --- | --- | --- | --- |
| Panel I: Incentives Generated by Algorithm (¢/action) | 0.625 | -0.395 | 0.065 | 0 | 0 |
| Panel II: | | | | | |
| Assigned to challenge, algorithm opaque | -6.5573 (9.949) | 14.3701 (16.405) | 12.0135 (20.583) | 1.1672 (3.473) | -6.8104 (7.002) |
| Assigned to challenge, algorithm transparent | 11.8231 (9.083) | -15.69 (14.976) | -11.907 (18.79) | 0.6706 (3.17) | -4.5744 (6.392) |
| N (Person-weeks) | 1664 | 1664 | 1664 | 1664 | 1664 |

Notes: The first panel reports the decision rule associated with the challenge. The second reports the results of a regression of behavior on challenge assignment. Regressions estimated based on dummy indicators for complex challenge assignment for participants assigned “income” challenge, over person-weeks when the income challenge was assigned or when no challenge was assigned (“control” weeks). Simple challenge assignment person-weeks, used in estimating costs, are not included. Standard errors in parentheses.

Table 5: Agents Game Algorithms

Performance of decision rules

We compare performance of naïve vs. robust decision rules in Table 6. The first two columns (under ‘Income’) show results for the challenge that incentivized participants to use their phones like a high-income earner; the last two columns show the performance averaged across several different challenges. The decision rules and associated manipulation costs are shown in the top panel (“Decision Rule”); the relative performance of the different estimators is shown below (under “Prediction Error”). We note several results.

Income Costs All Outcomes (Pooled)
Income, Intelligence, Activity PCA
Decision Rule
¢/action ¢/action2
# Calls (outgoing) 0.625 0.542 0.591 . .
# Texts (outgoing) -0.395 -0.107 0.035 . .
# Texts (incoming) 0.065 0 0.038 . .
# Texts (6pm-10pm) 0 -0.121 0.058 . .
Prediction Error RMSE ($) RMSE ($)
Baseline Data: Control 3.55 3.55 3.70 3.75
Baseline Data: Predicted Transparent 4.66 3.83 4.34 3.85
Implemented: Opaque 3.24 3.23 4.00 3.80
Implemented: Transparent 3.87 3.66 4.93 4.31
Predicted Cost of Transparency 0.28 0.15
Equilibrium Cost of Transparency 0.41 0.31
Average Payout ($) 3.30 3.24 3.23 2.98
N (Control Person-Weeks) 3781 3781 3781 3781
N (Treatment Person-Weeks, Opaque) 85 85 230 230
N (Treatment Person-Weeks, Trans.) 91 74 252 216

Notes: The first panel reports the decision rule associated with the challenge, and the costs associated with these behaviors. The second reports the performance of the different models over the groups they were assigned to; on the left, the naive LASSO regression, and on the right, this paper’s strategy-robust (SR) model. Performance figures estimated using a regression of model indicators on week-model RMSE, weighted by number of person-weeks. ‘Transparent Predicted’ RMSE denotes the RMSE that our theoretical model expected, given costs of manipulation and behavioral incentives. ‘Predicted Cost of Transparency’ denotes the difference between predicted transparent RMSE under the SR model and baseline RMSE under the naive LASSO. ‘Equilibrium Cost of Transparency’ denotes the difference between implemented transparent SR model RMSE and opaque naive model RMSE. Pooled performance is estimated using this same regression approach, after combining all model-weeks over the three outcomes investigated: a PCA of phone activity, intelligence, and monthly income. Full regression results and standard errors reported in appendix.

Table 6: Strategy Robust vs. Standard Decision Rules

First, in the top panel, we observe important differences in the decision rules estimated by vs. . LASSO places weight on the behaviors that were most correlated at baseline: outgoing calls, outgoing texts, and incoming texts. However, the estimated costs of manipulating some of these behaviors, and in particular the costs of manipulating text messaging behavior, are low, so these behaviors are likely to be manipulated when incentivized. Thus, our strategy robust decision rule both selects less manipulable behaviors (evening texts rather than incoming texts), and shrinks manipulable behaviors (especially outgoing texts).

We evaluate prediction error using root mean squared error (RMSE), in units of dollars, in the middle panel. The magnitude of error is similar to the average payout, around $3 for a week. The first row shows prediction error in the baseline data: LASSO performs slightly better than our strategy robust estimator when no manipulation is expected. But when people manipulate their behavior, our method is expected to perform better, as shown in the second row.

When actually implemented, our method performs better when the decision rule is transparent (average error $3.66 instead of $3.87 for income; or $4.31 vs. $4.93 for all outcomes pooled). When the decision rule is opaque, we find that our method performs comparably to or slightly better than LASSO, possibly due to increased shrinkage ($3.23 vs. $3.24 for income; $3.80 vs. $4.00 for all outcomes pooled). Table A3 reports results for all outcomes.

Even if a policymaker intended to keep the decision rule opaque, using our robust method can reduce systematic risk in the event that agents discover the decision rule. In practical implementations, policymakers could adaptively tweak the level of robustness to match the level of manipulation. An ad hoc approach could select a convex combination of the naive and robust models; a more nuanced approach could model consumers’ uncertainty about the model.
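The ad hoc approach of mixing the two rules can be sketched as follows. This is a minimal illustration, not the procedure used in the experiment: the function name `blend_rules`, the mixing weight `lam`, and the coefficient values are all hypothetical, and how `lam` should be chosen is left open.

```python
import numpy as np

def blend_rules(beta_naive, beta_robust, lam):
    """Convex combination of two linear decision rules.

    lam = 0 returns the naive rule, lam = 1 the robust rule.
    All inputs are hypothetical; nothing here prescribes how to pick lam.
    """
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    beta_naive = np.asarray(beta_naive, dtype=float)
    beta_robust = np.asarray(beta_robust, dtype=float)
    return (1.0 - lam) * beta_naive + lam * beta_robust

# Made-up weights on three behaviors, for illustration only.
beta_naive = [0.50, 0.30, 0.20]
beta_robust = [0.40, 0.10, 0.25]
blended = blend_rules(beta_naive, beta_robust, lam=0.5)
```

A policymaker who observes more manipulation than expected could raise `lam` toward 1, moving the deployed rule toward the robust weights.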

Cost of transparency

Our framework provides a way to bound a key cost of imposing algorithmic transparency (akyol_price_2016). Many tech firms argue that imposing transparency would reduce the quality of machine decisions, because rules may perform better if they can rely on opacity to prevent manipulation. Our method allows us to bound this performance cost. We can compare the performance arising from the optimal opaque rule (under the assumption that opacity will prevent it from being manipulated) to the optimal equilibrium transparent rule (factoring in equilibrium manipulation). Because the opaque rule also faces the threat of manipulation, this difference is the upper bound of the performance cost of imposing transparency, arising from increased manipulation.

The most straightforward way to measure this cost of transparency would require disclosing the decision rule to a subset of users, and assessing any drop in performance after a process of equilibration. But for the most consequential decisions, once the decision rule is revealed to some, it can leak out to the entire market. Such disclosure irreversibly tips the market to transparency, and thus is a nonstarter for policy discussions.

Crucially, under the assumptions of our model, this quantity can be estimated without revealing the decision rule: it only requires the estimation of types and costs (the first part of our experiment). [Footnote 27: Our method of estimating costs does require revealing the existence of features to users, but does not require specifying whether those features are included in the model, or with what weights (one could estimate costs for a large set of features, hiding the features critical to the model).] Our method makes it possible for regulators or firms to assess the cost that transparency would impose before making their model transparent. Our model-based estimates suggest that transparency introduces a performance cost of 8% of baseline error for our income targeting rule, or 4% for all outcomes pooled together. These numbers are shown in the final rows of the middle panel of Table 6.

When we actually implement transparency in our experiment, we find that the performance cost is similar to these model-based estimates: $0.41 (13%) for income, or $0.31 (8%) for all outcomes pooled together. (To mitigate the problem of leakage, we only assess opaque performance prior to each individual observing a transparent challenge for that outcome. Because our decision rules were not going to be used later in production, we were unconcerned about them leaking out after the experiment.)
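The equilibrium percentages can be checked against the figures reported in Table 6. The short sketch below assumes the percentages are taken relative to the opaque naive-rule RMSE ($3.24 for income, $4.00 pooled); that denominator is an assumption, chosen because it reproduces the reported values, and the function name is hypothetical.

```python
def relative_cost(cost_dollars, opaque_naive_rmse):
    """Cost of transparency as a share of the opaque naive rule's RMSE."""
    return cost_dollars / opaque_naive_rmse

# 'Equilibrium Cost of Transparency' row of Table 6, divided by the
# opaque LASSO RMSE quoted in the text (assumed denominator).
income = relative_cost(0.41, 3.24)
pooled = relative_cost(0.31, 4.00)

print(f"income: {income:.0%}, pooled: {pooled:.0%}")
```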

5 Extensions

5.1 Alternate methods to estimate manipulation costs

Our method requires estimating manipulation costs and individuals' types, which are new objects. The experimental approach we use is likely not feasible in many settings. We offer suggestions on alternative approaches to measure these costs.

Expert elicitations. We evaluate how well experts can predict the costs of manipulating different behaviors, using a method similar to dellavigna_predicting_2016. We sent a survey to 177 experts with different backgrounds (PhDs from different fields, research assistants, Busara staff who had not worked on the experiment, and Mechanical Turk workers in the US), asking them to predict how Kenyans would manipulate different phone behaviors when incentivized. Results are shown in Figure 3. In Panel A, we compare the predicted change in behavior from a given incentive to the actual experimental estimate. In Panel B, we compare the implied structural cost estimates; although experts predict costs that are too low, the correlation is 0.75. This suggests that it may be possible to use expert elicitations to estimate manipulation costs.

Panels: (a) Reduced Form Shift in Behavior; (b) Structural Cost Estimates.

Figure 3: Expert Elicited Manipulation Cost Estimates

Partially estimated. The cost of manipulating one behavior may be related to that of another. Because of this, we may be able to predict an unknown cost from correlations between types and known costs, using some prediction function.

5.2 Nonlinear decision rules

To sharpen intuition, this paper focuses on linear decision rules. While many modern machine learned decision rules are nonlinear, agents’ beliefs about those rules may be well approximated by linear functions. In such a context, our derivations could be viewed as linear approximations to both these beliefs, and the actual functions. Additionally, it may be that some benefits of extreme nonlinearities that can surface in modern machine learning are lessened when manipulation is taken into account: contract theory suggests that linear decision rules are more robust (holmstrom_aggregation_1987; carroll_robustness_2015). [Footnote 28: With the exception that linear models can be subject to the influence of outliers; one may thus want to tamp down inputs as they approach the boundaries of the distribution of training data.]
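One simple way to tamp down inputs near the boundaries of the training distribution is to clip each feature to percentile bounds computed on the training data. The sketch below is only one possible reading of that suggestion: hard clipping, the 1st/99th percentile defaults, and the function name are all assumptions, and smoother alternatives are possible.

```python
import numpy as np

def clip_to_training_range(X_train, X_new, lower_pct=1.0, upper_pct=99.0):
    """Clip each column of X_new to percentile bounds computed on X_train.

    Hard clipping and the default percentiles are illustrative assumptions.
    """
    X_train = np.asarray(X_train, dtype=float)
    X_new = np.asarray(X_new, dtype=float)
    lo = np.percentile(X_train, lower_pct, axis=0)  # per-feature lower bound
    hi = np.percentile(X_train, upper_pct, axis=0)  # per-feature upper bound
    return np.clip(X_new, lo, hi)

# Illustrative data: one feature taking the values 0, 1, ..., 100 in training.
X_train = np.arange(101, dtype=float).reshape(-1, 1)
X_new = np.array([[-5.0], [50.0], [500.0]])
X_clipped = clip_to_training_range(X_train, X_new)
```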

Our approach could also be extended to work in nonlinear settings. In nonlinear environments there may also be multiple equilibria. In such a setting, if iterative learning converges, it may converge to a local optimum, whereas an approach like ours could be used to select a global equilibrium. [Footnote 29: Thanks to Glen Weyl for this point.]

6 Conclusion

This paper considers the possibility that the implementation of machine decisions changes the world they describe. We focus on the case where individuals manipulate their behavior in order to game decision rules. Our chief contribution is to derive decision rules that anticipate this manipulation, by embedding a behavioral model of how individuals will respond. This structural approach makes it possible to decompose decision rules into constituent components, and to gather data on how those components can be manipulated. From these components, our structural model allows us to understand how any proposed decision rule of a given form would be manipulated. This allows us to compute decision rules that are optimal in equilibrium.

We demonstrate our method in a field experiment in Kenya, by deploying a tailor-made smartphone app that mimics the ‘digital credit’ loan products that are now commonplace in sub-Saharan Africa. We find that even some of the world’s poorest users of technology (who are relatively recent adopters of smartphones and to whom the concept of an ‘algorithm’ is quite foreign; musya_how_2018) are savvy enough to change their behavior to game machine decisions. In this setting, we show that our strategy robust estimator outperforms standard estimators on average by 13% when individuals are given information about the scoring rule. This framework also allows us to quantify the “cost of transparency”, i.e., the loss in predictive performance associated with moving from “security through obscurity” (with a naive decision rule) to a regime of full algorithmic transparency (with our strategy-robust rule). We estimate this loss to be roughly 8% in equilibrium, substantially less than the 23% loss associated with making the naive rule transparent.

Our discussion focuses on the simple case of linear models with a small number of predictor variables, where subjects have either no information or full transparency of the scoring rule. We envision useful extensions to more complex models and more nuanced beliefs. More generally, our approach of embedding a model of behavior within a machine learning estimator may be relevant to a wide range of contexts where machine learning systems face a changing human environment. In this sense, it offers a machine learning interpretation of lucas_econometric_1976, where algorithmic decisions change the context of the systems they model. For example, financial forecasts may affect the underlying financial processes they attempt to describe, personalized news recommendations may change the information seeking behaviors of consumers, and predictions about the intensity of a disease may affect individuals’ protective behaviors and thus its realized intensity.



A1 Experimental Design

A1.1 Pre Analysis Plan

This study was pre-registered with the AEA RCT Registry (AEARCTR-0004649) prior to the experiment (September 3, 2019). [Footnote 30: Prior to the collection of the main outcomes in phase 2, we amended the registration, adding one sentence that specifies that the focal performance measure will be mean squared error (which corresponds with the objective minimized by the method; January 15, 2020). We later noticed that the registration still contained text in another section that appeared to specify different focal measures, including AUC; prior to the completion of phase 2 and prior to analysis of the main outcomes, we amended the registration to delete that sentence (February 4, 2020).]

Our implementation deviated in several respects from the pre analysis plan: at the start of phase 2 the cloud server account ran out of storage space, and the Busara center was hit by a power outage due to construction on a nearby road. These two events disrupted servers for several hours during the upload window, and caused some participants’ phones to become overloaded with records. It took several weeks to recover the affected participants. Because of the disruption, we extended phase 2 and delayed the expert cost surveys.

A1.2 Attrition Management

Attrition in the context of this study had two dimensions: first, there were participants who did not regularly upload data through their app, and second, there were participants who did not participate in the assigned weekly challenges. (As some participants may have uploaded data sparsely throughout the week, only those who uploaded within the 21-hour window at the end of the challenge-week [between 1pm Tuesday and 10am Wednesday] were counted as having fully uploaded all of their weekly data.)

In order to minimize both such types of attrition, participants were sent regular reminders via text to encourage engagement. Every participant in the study was sent a text every Tuesday at 1pm to remind them to upload their data through the Smart Sensing app.

Additionally, on Wednesday, Thursday and Friday, participants who still had not uploaded data or activated their challenge respectively were contacted by phone and surveyed by the Busara team. Specifically, the protocol was as follows:

  • On Wednesday, participants who had not uploaded any data during the five-day period ending on Wednesday at 12pm were contacted and surveyed, as were those who uploaded some data in this period but not during the ‘end-of-week upload window’ (between 6pm Tuesday and 10am Wednesday).

  • On Thursday, participants whose phones showed that they did not receive a challenge by Thursday 12pm were contacted and surveyed, as were participants whose phones showed that they did receive a challenge but who had not opted in to accept the challenge.

  • On Friday, participants whose phones showed that they still had not received and opted in to a challenge were contacted and surveyed, as were participants whose phones showed that they did receive a challenge but who had not opted in to accept the challenge.

For all of the above categories, any participant who did not answer a call on the first attempt would be re-contacted once more by the surveyor after the rest of the calls were complete.

Finally, to mitigate the effects of attrition during the analysis stage, any participant-weeks wherein the participant did not opt in and/or did not upload during the end-of-week upload window were dropped from the sample prior to all analysis. During baseline weeks, a single passive challenge was assigned to all participants, offering a flat bonus to upload data within the upload window; in this way, we ensured that our analysis control groups would also be restricted to those who opt in to this passive challenge, and were thus a valid comparison group to the restricted panel during the challenge weeks.
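The sample restriction described above amounts to a simple filter over participant-weeks. The sketch below is illustrative only: the `PersonWeek` record and its field names are hypothetical and do not reflect the study's actual data schema.

```python
from dataclasses import dataclass

@dataclass
class PersonWeek:
    # Hypothetical record layout; not the study's actual schema.
    participant_id: str
    week: int
    opted_in: bool             # accepted that week's (possibly passive) challenge
    uploaded_in_window: bool   # uploaded during the end-of-week upload window

def analysis_sample(person_weeks):
    """Drop person-weeks without an opt-in or without an in-window upload."""
    return [pw for pw in person_weeks if pw.opted_in and pw.uploaded_in_window]

rows = [
    PersonWeek("a", 1, True, True),    # kept
    PersonWeek("b", 1, True, False),   # dropped: no upload in window
    PersonWeek("c", 1, False, True),   # dropped: did not opt in
]
kept = analysis_sample(rows)
```

Because the same filter applies in baseline weeks (through the passive challenge), control and treatment person-weeks are selected on the same criteria.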

A1.3 Communicating Decision Rules

In focus groups we found that individuals had difficulty understanding decimals or complicated mathematical operations (e.g., standard deviation). We stuck to simple behaviors and formatted decision rules as follows, to make it easier for participants to understand how their marginal behavior affects their payment:

  • Each coefficient was rounded to the nearest integer. If the nearest integer was zero, the denominator was inflated by factors of 10 until it became nonzero. (If the unit was seconds or minutes, the denominator was instead inflated by factors of 60.)

  • The order of indicators was randomized between three orderings (ABC, CAB, BCA for indicators A, B, and C).

  • The constant term was reported last, unless the first coefficient was negative, in which case the constant was reported first.
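The rounding and ordering rules above can be sketched as follows. This is one possible reading, not the app's actual code: we interpret the "denominator" as the number of units of behavior per which the payment is quoted (e.g., "4 per 100 texts"), and the function names and the `max_steps` safeguard are hypothetical.

```python
import random

ORDERINGS = ("ABC", "CAB", "BCA")  # the three orderings named in the text

def format_coefficient(coef, time_unit=False, max_steps=10):
    """Return (integer_amount, denominator), read as
    'integer_amount per denominator units of behavior'.

    The denominator grows by factors of 10 (or 60 for seconds/minutes)
    until the rounded amount is nonzero. Note that Python's round()
    rounds exact halves to the nearest even integer.
    """
    factor = 60 if time_unit else 10
    denominator = 1
    amount = round(coef * denominator)
    steps = 0
    while amount == 0 and coef != 0 and steps < max_steps:
        denominator *= factor
        amount = round(coef * denominator)
        steps += 1
    return amount, denominator

def pick_ordering(rng=random):
    """Choose one of the three indicator orderings at random."""
    return rng.choice(ORDERINGS)

per_text = format_coefficient(0.04)                    # small per-unit coefficient
per_second = format_coefficient(0.01, time_unit=True)  # time-based behavior
```

Quoting the rule per 10, 100, or 60 units keeps every displayed number an integer, which matches the focus-group finding that decimals were hard to interpret.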

A2 Appendix Figures

Note: The first behavior is more predictive at baseline, but is easily manipulable. Panels (a) and (b) show the weights on coefficients as manipulation costs are scaled; panel (b) scales the interaction term. As the first behavior becomes cheaper to manipulate, the strategy-robust rule places less weight on it and adjusts the weight placed on the other behavior. If manipulating one variable makes it easier to manipulate the other (a sufficiently negative interaction), the rule reduces weight on both. Squared error is measured on an out-of-sample draw from the same population, incentivized to that decision rule.
Figure A1: Comparative Statics

Additional Comparative Statics

Panels: heterogeneous costs and gaming ability; homogeneous costs (features equally costly to manipulate); homogeneous gaming ability (same gaming ability for everyone).
Note: The first behavior is more predictive at baseline, but is easily manipulable. Like LASSO, ridge places more weight on the first behavior. Our method can be combined with other forms of penalization (such as ridge, shown here) to more finely manage out-of-sample fit. When features are equally costly to manipulate, our method penalizes in a similar manner to ridge. When gaming ability is homogeneous, everyone shifts behavior equally: predictive performance remains high, but utility is wasted on manipulation. Squared error is measured on an out-of-sample draw from the same population, incentivized to that decision rule.

Each row and column represents a feature of behavior. Features are clustered into similar groups. The diagonal indicates that the correlation of a feature with itself is +1.

Figure A2:

A3 Appendix Tables

Decision Rule Performance (squared loss)
No manip. Manipulation
Panel A: Data generating process
0.200 3.000 0.100 0.100 0.267 3745.046
Panel B: Standard Approaches
0.205 3.042 0.061 0.116 0.266 3961.225
    ‘Industry’ Approach (estimated with just data from that period)
after -0.798 0.061 2.090 -1.675 3.275 625.762
after 0.172 3.111 -0.040 0.215 0.270 4332.208
after -0.755 0.120 2.077 -1.671 3.071 619.059
after -0.393 3.741 -1.341 1.566 1.375 11611.884