Strategic Classification from Revealed Preferences

by Jinshuo Dong, et al.

We study an online linear classification problem, in which the data is generated by strategic agents who manipulate their features in an effort to change the classification outcome. In rounds, the learner deploys a classifier, and an adversarially chosen agent arrives, possibly manipulating her features to optimally respond to the learner. The learner has no knowledge of the agents' utility functions or "real" features, which may vary widely across agents. Instead, the learner is only able to observe their "revealed preferences" --- i.e. the actual manipulated feature vectors they provide. For a broad family of agent cost functions, we give a computationally efficient learning algorithm that is able to obtain diminishing "Stackelberg regret" --- a form of policy regret that guarantees that the learner is obtaining loss nearly as small as that of the best classifier in hindsight, even allowing for the fact that agents will best-respond differently to the optimal classifier.



1 Introduction

Machine learning is typically studied under the assumption that the data distribution a classifier is deployed on is the same as the data distribution it was trained on. However, the outputs of many classification and regression problems are used to make decisions about human beings, such as whether an individual will receive a loan, be hired, be admitted to college, or whether their email will pass through a spam filter. In these settings, the individuals have a vested interest in the outcome, and so the data generating process is better modeled as part of a strategic game in which individuals edit their data to increase the likelihood of a certain outcome. Tax evaders may carefully craft their tax returns to decrease the likelihood of an audit. Home buyers may strategically sign up for more credit cards in an effort to increase their credit score. Email spammers may modify their emails in order to evade existing filters. In each of these settings, the individuals have a natural objective that they want to maximize — they want to increase their probability of being (say) positively classified. However, they also experience a cost from performing these manipulations (tax evaders may have to pay some tax to avoid an audit, and email spammers must balance their ability to evade spam filters with their original goal in crafting email text). These costs can be naturally modeled as the distance between the “true” features of the individual and the manipulated features that he ends up sending, according to some measure. In settings of this sort, learning can be viewed as a game between a learner and the set of individuals who generate the data, and the goal of the learner is to compute an equilibrium strategy of the game (according to some notion of equilibrium) that maximizes her utility.

The relevant notion of equilibrium depends on the order of information revelation in the game. Frequently, the learner will first deploy her classifier, and then the data generating players (agents) will get to craft their data with knowledge of the learner’s classifier. In a setting like this, the learner should seek to play a Stackelberg equilibrium of the game, i.e. she should deploy the classifier that minimizes her error after the agents are given an opportunity to best respond to the learner’s classifier. This is the approach taken by the most closely related prior work: BS11 and HMPW16. Both of these papers consider a one-shot game and study how to compute the Stackelberg equilibria of this game. To do this, they necessarily assume that the learner has (almost) full knowledge of the agents’ utility functions; in particular, it is assumed that the learner has access to the “true” distribution of agent features (before manipulation), and that the costs experienced by the agents for manipulating their data are the same for all agents and known to the learner.[1]

[1] The particulars of the models studied in BS11 and HMPW16 differ. Brückner and Scheffer model a single data generation player who manipulates the data distribution, and experiences cost equal to the squared distance of his manipulation. Hardt et al. study a model in which each agent can independently manipulate his own data point, but assume that all agents experience cost as a function of the same separable cost function, known to the learner.

The primary point of departure in our work is to study the strategic classification problem when the learner does not know the utility functions of the agents: neither their true features $x_t$, nor the cost they experience for manipulation (which can now differ for each agent). In this setting, it no longer makes sense to study learning as a one-shot game; we cannot compute an equilibrium when the utility functions of the agents are unknown. Instead, we model the learning process as an iterative, online procedure. In rounds $t = 1, \dots, T$, the learner proposes a classifier $\beta_t$. Then, an agent arrives, with an unknown set of “true” features $x_t$ and an unknown cost function $c_t$ for manipulation. The learner observes only the manipulated feature vector $\hat{x}_t$, which represents the agent’s best response to $\beta_t$. After classification, the learner observes the agent’s true label $y_t$ and suffers the corresponding loss. Crucially, the learner never gets to observe either $x_t$ or $c_t$, and we do not even assume that these are drawn from a distribution (they may be adversarially chosen). The only access that the learner has to these parameters is via the revealed preferences of the agents: the learner gets to observe the actions of the agents, which are the result of optimizing their unknown utility functions. The learner must use this information to improve her classifier.

We measure the performance of our algorithms via a quantity that we call Stackelberg regret: informally, by comparing the average loss of the learner to the loss she could have experienced with the best fixed classifier in hindsight, taking into account that the agents would have best-responded differently had she used a different classifier. If the learner were in fact interacting with the same agent repeatedly, or if the agents were drawn from a fixed distribution, then the guarantee of diminishing Stackelberg regret would imply the convergence to a Stackelberg equilibrium of the corresponding one-shot game. However, Stackelberg regret is more general, and applies even to settings in which the agents are adversarially chosen.

We add one further twist. Previous work on strategic classification has typically assumed that all agents are strategic. However, the equilibrium solutions that result from this assumption may be undesirable. For example, in a spam classification setting, the Stackelberg-optimal classifier may attain its optimal accuracy only if all agents, even legitimate (non-spam) senders, actively seek to manipulate their emails to avoid the spam filter. In these settings, it would be more desirable to compute a classifier that is optimal under the assumption that spammers will attempt to manipulate their emails in order to game the classifier, but that does not assume legitimate senders will. To capture this nuance, in our model, only agents whose true label is $y_t = -1$ (e.g. spammers) are strategic, and agents for whom $y_t = +1$ are non-strategic.

1.1 Our Results and Techniques

The problem that the learner must solve is a bi-level optimization problem in which the objective of the inner layer (the agents’ maximization problem) is unknown. Even with full information, bi-level optimization problems are often NP-hard. As a first step in our solution, we seek to identify conditions under which the learner’s optimization problem is convex. Under these conditions, computing an optimal solution would be tractable in the full information setting. The remaining difficulty is solving the optimization problem in our limited feedback model.

We study learners who deploy linear classifiers $\beta \in \mathbb{R}^d$, and consider two natural learner loss functions: logistic loss (corresponding to logistic regression) and hinge loss (corresponding to support vector machines). The agents in our model are parameterized by target feature vectors $x_t \in \mathbb{R}^d$ and cost functions $c_t$, and will play the modified feature vector $\hat{x}_t$ that maximizes their utility given $\beta$. We model agent utility functions as $u_t(\hat{x}, \beta) = \langle \beta, \hat{x} \rangle - c_t(x_t, \hat{x})$. Using tools from convex analysis, we give general conditions on the cost functions that suffice to make the learner’s objective convex, for both logistic and hinge loss, for all $t$. These conditions are satisfied by (among other classes of cost functions) any squared Mahalanobis distance and, more generally, by any norm-induced metric raised to a power greater than one.

Finally, we turn to the learner’s optimization problem. Once we have derived conditions under which the learner’s optimization problem is convex, we can in principle achieve quickly diminishing Stackelberg regret with any algorithm for online bandit (i.e. zeroth-order) optimization that works for adversarially chosen loss functions. However, we observe that when some of the agents are non-strategic (e.g. the non-spammers), there is additional structure that we can take advantage of. In particular, on rounds for which the agent is non-strategic, the reported features do not depend on the deployed classifier, so the learner can also derive gradients for her loss function, in contrast to rounds on which the agent is strategic, where she only has access to zeroth-order feedback. To take advantage of this, we analyze a variant of the bandit convex optimization algorithm of flaxman2005online which can make use of both kinds of feedback. The regret bound we obtain interpolates between the bound of flaxman2005online, obtained when all agents are strategic, and the regret bound of online gradient descent Zin03, obtained when no agents are strategic, as a function of the proportion of the observed agents which were strategic.
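The idea of mixing both kinds of feedback can be sketched as follows. This is a minimal illustration under our own assumptions, not the algorithm analyzed here: the feasible set is taken to be a Euclidean ball, the step size `eta` and perturbation radius `delta` are left to the caller rather than tuned to any regret bound, and all function names are hypothetical. On "grad" (non-strategic) rounds the exact gradient is used, as in online gradient descent; on "value" (strategic) rounds only the loss at a randomly perturbed point is observed, and the one-point estimator $(d/\delta)\, f(\beta + \delta u)\, u$ in the style of Flaxman et al. stands in for the gradient.

```python
import math
import random


def project_to_ball(v, radius):
    # Euclidean projection onto {b : ||b||_2 <= radius}
    norm = math.sqrt(sum(a * a for a in v))
    if norm <= radius:
        return list(v)
    return [a * radius / norm for a in v]


def random_unit_vector(d, rng):
    # Uniformly random direction: a normalized standard Gaussian vector
    while True:
        u = [rng.gauss(0.0, 1.0) for _ in range(d)]
        n = math.sqrt(sum(a * a for a in u))
        if n > 0.0:
            return [a / n for a in u]


def mixed_feedback_ogd(rounds, d, eta, delta, radius, rng):
    """Hypothetical sketch of projected descent with two feedback types.

    rounds: list of (kind, oracle). If kind == "grad", oracle(beta) returns the
    gradient of that round's loss at beta. If kind == "value", oracle(beta)
    returns only the loss value. The feasible set is the ball of radius `radius`.
    """
    beta = [0.0] * d
    played_points = []
    for kind, oracle in rounds:
        if kind == "grad":
            played = list(beta)
            g = oracle(played)
        else:
            # One-point gradient estimate: query a random perturbation of beta
            # and scale the observed loss value along the perturbation direction.
            u = random_unit_vector(d, rng)
            played = [b + delta * ui for b, ui in zip(beta, u)]
            value = oracle(played)
            g = [(d / delta) * value * ui for ui in u]
        played_points.append(played)
        # Project onto the shrunken ball so the next perturbed query stays feasible.
        beta = project_to_ball([b - eta * gi for b, gi in zip(beta, g)], radius - delta)
    return beta, played_points
```

Shrinking the projection radius by `delta` is what keeps every queried point $\beta + \delta u$ inside the feasible ball, which matters because strategic rounds must still deploy a feasible classifier.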

1.2 Further Related Work

In the adversarial or strategic learning literature, the most closely related works are BS11 and HMPW16, as discussed above, which also consider notions of Stackelberg equilibrium, and make other similar modelling choices. LC09 also model adversarial learning as a Stackelberg game. Other works in this line model the learning problem as a purely adversarial (zero sum) game as in DDSV04 for which the appropriate solution concept is a minmax equilibrium, study the simultaneous move equilibria of non-zero sum games as in BKS12; BS09, and study the Bayes-Nash equilibria of incomplete information games given Bayesian priors on the player types as in GSBS13. Common to all of these works is the assumption that the learner either has full knowledge of the data generator’s utility functions (when Nash and Stackelberg equilibria are computed), or else knowledge of a prior distribution (when Bayes-Nash equilibria are computed). The point of departure of the current paper is to assume that the learner does not have this knowledge, and instead only has the power to observe agent decisions in response to deployed learners.

There is a parallel literature in machine learning and algorithmic game theory focusing on the problem of learning from revealed preferences, which corresponds to learning from the choices that agents make when optimizing their (unknown) utility functions in response to some decision of the learner as in BV06. This literature has primarily focused on how buyers with unknown combinatorial valuation functions make purchases in response to prices. Learning problems studied in this literature include learning to predict what purchase decisions a buyer will make in response to a set of prices drawn from an unknown distribution as in ZR12; BDMUV14, finding prices that will maximize profit or welfare after buyers best respond as in ACDKR15; RUW16; RSUW17, and generalizations of these problems as in JRRW16. We study the problem of strategic classification with this sort of “revealed preferences” feedback.

Finally, Stackelberg games are studied extensively in the “security games” literature: see tambe2011security for an overview. Most closely related to this paper is the work of security, who develop no-regret algorithms for certain kinds of security games when “attackers” arrive online, using a notion of regret that is equivalent to the “Stackelberg regret” that we bound. This work is similar in motivation to ours: its goal is to give algorithms to compute equilibrium-like strategies for the “defender” without assuming that he has an unrealistic amount of knowledge about his opponents’ utility functions. Technically, the work is quite different, since we are operating in very different settings. In particular, security are primarily interested in information-theoretic bounds, and do not give computationally efficient algorithms (since in general, solving Stackelberg security games is NP-hard; see korzhyk2010complexity).

2 Model and Preliminaries

We study a sequential binary classification problem in which a learner wishes to classify a sequence of examples over $T$ rounds. The example at each round $t$ is associated with an agent $a_t = (x_t, y_t, c_t)$, where $x_t \in \mathbb{R}^d$ is the feature vector, $y_t \in \{-1, +1\}$ is the true label and $c_t : \mathbb{R}^d \times \mathbb{R}^d \to \mathbb{R}_{\ge 0}$ is a distance function that maps pairs of feature vectors to costs. We will think of $c_t(x, \hat{x})$ as the cost for the agent to change his feature vector $x$ to the feature vector $\hat{x}$. If the example is positive ($y_t = +1$), then we say the agent is non-strategic, and if the example is negative ($y_t = -1$), then we say the agent is strategic.

Each agent $a_t$ has a utility function $u_t(\hat{x}, \beta) = \langle \beta, \hat{x} \rangle - c_t(x_t, \hat{x})$. In each round $t$, the interaction between the learner and the agent is the following:

  1. The learner commits to a linear classifier parameterized by $\beta_t \in \mathcal{K}$, where $\mathcal{K} \subseteq \mathbb{R}^d$ is the set of feasible parameters.

  2. An adversary, oblivious to the learner’s choices up to round $t$, selects an agent $a_t$.

  3. The agent sends the data point $\hat{x}_t$ to the learner:

    • If the agent is strategic ($y_t = -1$), then the agent sends the learner his best response to $\beta_t$ (ties broken arbitrarily): $\hat{x}_t \in \arg\max_{\hat{x} \in \mathbb{R}^d} u_t(\hat{x}, \beta_t)$.

    • If the agent is non-strategic ($y_t = +1$), then the agent does not modify the feature vector, and so sends $\hat{x}_t = x_t$ to the learner.

  4. The learner observes $\hat{x}_t$ and $y_t$, and experiences classification loss $\ell(\beta_t, \hat{x}_t, y_t)$.

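We are mainly interested in two standard classification loss functions $\ell$. The first is logistic loss, which corresponds to logistic regression: $$\ell_{\log}(\beta, \hat{x}, y) = \log\left(1 + e^{-y \langle \beta, \hat{x} \rangle}\right).$$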

The second is hinge loss, which corresponds to a support vector machine: $$\ell_{h}(\beta, \hat{x}, y) = \max\{0, 1 - y \langle \beta, \hat{x} \rangle\}.$$
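To make the round structure and the two losses concrete, here is a small simulation of a single round. It assumes, purely for illustration, the quadratic cost $c_t(x, \hat{x}) = \frac{\alpha}{2}\|\hat{x} - x\|_2^2$ (a scaled squared Euclidean distance), for which the best response has the closed form $\hat{x} = x + \beta/\alpha$; the function names and the parameter `alpha` are hypothetical. In the model itself the learner never sees `x` or `alpha`; the simulator uses them only to generate the agent’s report.

```python
import math


def _margin(beta, x_hat, y):
    # y * <beta, x_hat>
    return y * sum(b * xi for b, xi in zip(beta, x_hat))


def logistic_loss(beta, x_hat, y):
    # log(1 + exp(-m)) computed stably as max(-m, 0) + log1p(exp(-|m|))
    m = _margin(beta, x_hat, y)
    return max(-m, 0.0) + math.log1p(math.exp(-abs(m)))


def hinge_loss(beta, x_hat, y):
    # max{0, 1 - y <beta, x_hat>}
    return max(0.0, 1.0 - _margin(beta, x_hat, y))


def best_response(beta, x, alpha):
    # Maximizes <beta, x_hat> - (alpha / 2) * ||x_hat - x||^2 (assumed cost);
    # setting the gradient beta - alpha * (x_hat - x) to zero gives x_hat = x + beta / alpha.
    return [xi + b / alpha for xi, b in zip(x, beta)]


def play_round(beta, x, y, alpha, loss=logistic_loss):
    # Strategic agents (y = -1) best-respond; non-strategic agents (y = +1) report x unchanged.
    x_hat = best_response(beta, x, alpha) if y == -1 else list(x)
    return x_hat, loss(beta, x_hat, y)
```

For example, with $\beta = (1, 0)$, $x = (0, 0)$, $y = -1$ and $\alpha = 2$, the agent reports $\hat{x} = (0.5, 0)$ and the learner’s logistic loss is $\log(1 + e^{0.5}) \approx 0.974$.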

The interaction between the learner and the agent in each strategic round can be viewed as a Stackelberg game, in which the learner, as the leader, plays her strategy first, and then the agent, as the follower, best responds. With this observation, we define a regret notion termed Stackelberg regret for measuring the performance of the learner.[2] In words, Stackelberg regret is the difference between the cumulative loss of the learner and the cumulative loss she would have experienced if she had deployed the single best classifier in hindsight $\beta^*$, for the same sequence of agents (who would have responded differently). More formally, for a history of play involving the agents $(a_1, \dots, a_T)$ and the sequence of classifiers $(\beta_1, \dots, \beta_T)$, the Stackelberg regret is defined to be

$$\mathcal{R}(T) = \sum_{t=1}^{T} \ell\big(\beta_t, \hat{x}_t(\beta_t), y_t\big) - \min_{\beta \in \mathcal{K}} \sum_{t=1}^{T} \ell\big(\beta, \hat{x}_t(\beta), y_t\big).$$

[2] The same regret notion has also appeared in the context of repeated security games (security).

Observe that the regret-minimizing classifier $\beta^*$ is a Stackelberg equilibrium strategy for the learner in the one-shot game in which the learner first commits to a classifier $\beta$, then all the agents simultaneously respond with $\hat{x}_1(\beta), \dots, \hat{x}_T(\beta)$, which results in the learner experiencing classification loss equal to $\sum_{t=1}^{T} \ell(\beta, \hat{x}_t(\beta), y_t)$.
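A small sketch of what distinguishes Stackelberg regret from ordinary regret: the comparator’s loss is computed with each agent’s best response to the comparator, not with the responses actually observed. For illustration only, each agent is a hypothetical triple `(x, y, alpha)` with quadratic cost $\frac{\alpha}{2}\|\hat{x} - x\|_2^2$, and the minimum over $\mathcal{K}$ is replaced by a minimum over a finite list of candidate classifiers. Because a finite list can only miss better classifiers, the returned value is a lower bound on the regret against all of $\mathcal{K}$.

```python
import math


def logistic(beta, x_hat, y):
    # Stable log(1 + exp(-y <beta, x_hat>))
    m = y * sum(b * xi for b, xi in zip(beta, x_hat))
    return max(-m, 0.0) + math.log1p(math.exp(-abs(m)))


def respond(beta, agent):
    # agent = (x, y, alpha) with assumed cost (alpha / 2) * ||x_hat - x||^2;
    # only strategic agents (y = -1) move, to x + beta / alpha.
    x, y, alpha = agent
    if y == -1:
        return [xi + b / alpha for xi, b in zip(x, beta)]
    return list(x)


def cumulative_loss(betas, agents):
    # Each agent responds to the classifier deployed in its own round.
    return sum(logistic(b, respond(b, a), a[1]) for b, a in zip(betas, agents))


def stackelberg_regret(betas, agents, candidates):
    # Realized loss minus the best fixed candidate's loss, where every agent is
    # re-simulated as best-responding to that fixed candidate.
    realized = cumulative_loss(betas, agents)
    best_fixed = min(cumulative_loss([c] * len(agents), agents) for c in candidates)
    return realized - best_fixed
```

For a single strategic agent with $x = 0$ and $\alpha = 1$ in one dimension, the loss of a fixed $\beta$ is $\log(1 + e^{\beta^2})$, so playing $\beta = 1$ against the candidates $\{-1, 0, 1\}$ incurs regret $\log(1 + e) - \log 2$.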

In order to derive efficient algorithms with sub-linear Stackelberg regret, we will impose some restrictions on the agents’ distance functions (formally described in Section 3), and we will assume the feature vectors have norm bounded by $R$, i.e. $\|x_t\|_2 \le R$. We also assume our feasible set $\mathcal{K}$ is convex, contains the unit $\ell_2$-ball, and has norm bounded by $D$, such that for any $\beta \in \mathcal{K}$, $\|\beta\|_2 \le D$.

3 Conditions for a Convex Learning Problem

In this section, we derive general conditions under which the learner’s problem in our setting can be cast as a convex minimization problem. At each round, the learner proposes some hypothesis $\beta$ and receives a loss that depends on the best response $\hat{x}(\beta)$ of the strategic agent to $\beta$. Even when the original loss function $\ell(\beta, \hat{x}, y)$ is a convex function of $\beta$ alone, holding $\hat{x}$ fixed, it may no longer be convex when $\hat{x}$ is chosen strategically as a function of $\beta$.

Since the learner’s objective is a sum of the losses over all rounds, and a sum of convex functions is convex, it suffices to show convexity for the loss experienced at each fixed round $t$. In the following, we omit the subscript $t$ to avoid notational clutter.

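Recall that each strategic agent (with $y = -1$), when facing a classifier $\beta$, will misreport his feature vector $x$ as $\hat{x}(\beta)$ by solving a maximization problem:

$$\hat{x}(\beta) \in \arg\max_{\hat{x} \in \mathbb{R}^d} \; \langle \beta, \hat{x} \rangle - c(x, \hat{x}),$$

where $c$ is a “distance” function modeling the cost of deviating from the intended $x$.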

We first show that both the logistic and hinge loss objectives are convex if $\langle \beta, \hat{x}(\beta) \rangle$ is convex in $\beta$, and then prove the convexity of $\langle \beta, \hat{x}(\beta) \rangle$ when the distance has the form $c(x, \hat{x}) = f(\hat{x} - x)$, for $f$ satisfying reasonable assumptions (stated in Theorem 2). In a later section, we give a large class of examples satisfying our assumptions, as well as several examples of functions that fail to satisfy our assumptions and for which the problem we are studying is ill-posed (for example, because strategic agents might not have finite best responses). This shows that it is necessary to impose constraints on $c$ when studying this problem.

As the first step, consider the learner’s objective function,

$$\ell\big(\beta, \hat{x}(\beta, x), y\big).$$

Note that we are now writing $\hat{x}(\beta, x)$ rather than $\hat{x}$ to make explicit the dependence on $\beta$ and $x$. Where $x$ is clear from context and fixed, we write simply $\hat{x}(\beta)$. We note also that $\hat{x}(\beta, x)$ is not necessarily a well-defined mapping, since strategic agents can break ties arbitrarily when the $\arg\max$ is not unique. However, as we show in Theorem 2, $\langle \beta, \hat{x}(\beta) \rangle$ is well-defined, which is all that is necessary for the learner’s cost to be well defined.

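We instantiate $\ell$ as either logistic loss $\ell_{\log}(\beta, \hat{x}, y) = \log\left(1 + e^{-y \langle \beta, \hat{x} \rangle}\right)$ or hinge loss $\ell_h(\beta, \hat{x}, y) = [1 - y \langle \beta, \hat{x} \rangle]_+$, where we write $[z]_+$ to denote $\max\{z, 0\}$. Note that both are non-increasing convex functions of $y \langle \beta, \hat{x} \rangle$.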

We will rely on the following fact in convex analysis.

Lemma 1 (e.g. rockafellar2015convex, Theorem 5.1).

Let $h : \mathbb{R} \to \mathbb{R}$ be a non-increasing convex function of a single variable and $g : \mathbb{R}^d \to \mathbb{R}$ be convex. Then $h(-g(\beta))$ is convex in $\beta$.

This yields the following approach.

Theorem 1.

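Suppose that, for all $x$, a strategic agent’s best-response function $\hat{x}(\beta)$ satisfies the condition that the function $\beta \mapsto \langle \beta, \hat{x}(\beta) \rangle$ is convex. Then logistic loss and hinge loss are convex functions of $\beta$, i.e. for all $x$ and $y \in \{-1, +1\}$, both of the following are convex:

  • (logistic) $\ell_{\log}(\beta, \hat{x}(\beta), y) = \log\left(1 + e^{-y \langle \beta, \hat{x}(\beta) \rangle}\right)$;

  • (hinge) $\ell_h(\beta, \hat{x}(\beta), y) = \max\{0, 1 - y \langle \beta, \hat{x}(\beta) \rangle\}$.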


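Proof. For strategic agents, i.e. when $y = -1$, this follows immediately from Lemma 1 with $g(\beta) = \langle \beta, \hat{x}(\beta) \rangle$, since both loss functions are non-increasing convex functions of the variable $y \langle \beta, \hat{x}(\beta) \rangle = -g(\beta)$. When $y = +1$, i.e. the agent is not strategic, $\hat{x}(\beta) = x$. Note that $g(\beta) = -\langle \beta, x \rangle$ is linear (and hence convex), so the lemma applies here as well. ∎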
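Theorem 1 can be sanity-checked numerically. The sketch below assumes a quadratic cost $\frac{\alpha}{2}\|\hat{x} - x\|_2^2$ (a scaled squared Euclidean distance, chosen for illustration), under which $\langle \beta, \hat{x}(\beta) \rangle = \langle \beta, x \rangle + \|\beta\|_2^2/\alpha$ is convex, so the strategic agent’s logistic loss should pass a random midpoint-convexity test. Passing such a test is evidence, not a proof; the helper names are hypothetical, and a concave function is included in the usage to show that the test does flag violations.

```python
import math
import random


def strategic_logistic_loss(beta, x, alpha):
    # A y = -1 agent with assumed cost (alpha / 2) * ||x_hat - x||^2 reports
    # x_hat = x + beta / alpha, so <beta, x_hat> = <beta, x> + ||beta||^2 / alpha.
    x_hat = [xi + b / alpha for xi, b in zip(x, beta)]
    m = -sum(b * xi for b, xi in zip(beta, x_hat))  # y * <beta, x_hat> with y = -1
    return max(-m, 0.0) + math.log1p(math.exp(-abs(m)))


def midpoint_violations(fn, d, trials, rng, scale=3.0):
    # Counts random pairs with fn(midpoint) > average of the endpoint values
    # (beyond a small tolerance); any count above zero rules out convexity.
    count = 0
    for _ in range(trials):
        b1 = [rng.uniform(-scale, scale) for _ in range(d)]
        b2 = [rng.uniform(-scale, scale) for _ in range(d)]
        mid = [(p + q) / 2 for p, q in zip(b1, b2)]
        if fn(mid) > (fn(b1) + fn(b2)) / 2 + 1e-9:
            count += 1
    return count
```

Swapping in a cost for which $\langle \beta, \hat{x}(\beta) \rangle$ is not convex would be the natural way to probe where the theorem’s hypothesis matters.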

Therefore, in order to show the convexity of the learner’s loss function for every round $t$, it suffices to show that for any fixed $x$, $\langle \beta, \hat{x}(\beta) \rangle$ is convex in $\beta$. In the next subsection, we give sufficient conditions for this to be the case.

3.1 Sufficient Conditions

We need to recall a definition before we present the main result of this section.

Definition 1.

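A function $f : \mathbb{R}^d \to \mathbb{R}$ is positive homogeneous of degree $k$ if for any scalar $\lambda > 0$ and vector $z \in \mathbb{R}^d$, we have $f(\lambda z) = \lambda^k f(z)$.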

Note that $k$ need not be an integer.
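Positive homogeneity is easy to spot-check numerically. The helper below (hypothetical names; a numerical check, not a proof) compares $f(\lambda z)$ with $\lambda^k f(z)$ for a few $\lambda > 0$. The usage takes $f(z) = \|z\|_1^{1.5}$, a norm raised to a non-integer power greater than one, and, as a non-example, $\|z\|_2^2 + \|z\|_1$, whose two terms have different degrees.

```python
def p_norm(z, p):
    # ||z||_p for p >= 1
    return sum(abs(zi) ** p for zi in z) ** (1.0 / p)


def is_homogeneous(f, z, k, lambdas, tol=1e-9):
    # Spot-checks f(lam * z) == lam**k * f(z) for each lam > 0, up to a relative tolerance.
    fz = f(z)
    for lam in lambdas:
        target = lam ** k * fz
        if abs(f([lam * zi for zi in z]) - target) > tol * max(1.0, abs(target)):
            return False
    return True
```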

Theorem 2.

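Let $u(\hat{x}, \beta) = \langle \beta, \hat{x} \rangle - c(x, \hat{x})$ be the strategic agent’s utility function. If $c$ has the form $c(x, \hat{x}) = f(\hat{x} - x)$, where $f : \mathbb{R}^d \to \mathbb{R}$ satisfies:

  • $f(z) > 0$ for all $z \neq 0$;

  • $f$ is convex over $\mathbb{R}^d$;

  • $f$ is positive homogeneous of degree $k > 1$,

then the function $\beta \mapsto \langle \beta, \hat{x}(\beta) \rangle$ is well-defined (i.e. finite and independent of the choice of maximizer $\hat{x}(\beta)$) and convex.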

Note that by this theorem, when the conditions hold we can speak of the function $\langle \beta, \hat{x}(\beta) \rangle$ without ambiguity even when there are multiple best responses $\hat{x}(\beta)$.


Proof. We break the proof into a series of steps, described here; the remainder of this subsection states each step in more detail and proves it.

  1. First, we show (in a corollary below) that any best response satisfies $\hat{x}(\beta) = x + g$, where $g \in \partial f^*(\beta)$ is some subgradient of $f^*$ at $\beta$ and $f^*$ is the convex conjugate of $f$.

  2. Next, we show (in a claim below) that $f^*(\beta)$ is finite for all $\beta \in \mathbb{R}^d$.

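  3. Next, we prove (in a claim below) that homogeneity of $f^*$ follows from homogeneity of $f$: if $f$ is positive homogeneous of degree $k > 1$, then $f^*$ is positive homogeneous of degree $k^* = k/(k-1)$.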

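  4. Finally, we apply a slight generalization of Euler’s Theorem, which says that if $f^*$ is homogeneous and convex, there is a constant $k^*$ (its degree of homogeneity) such that for any choice of subgradient $g \in \partial f^*(\beta)$ (the set of subgradients of $f^*$ at $\beta$), we have $\langle \beta, g \rangle = k^* f^*(\beta)$. This, together with steps 2 and 3, implies that $\langle \beta, \hat{x}(\beta) \rangle = \langle \beta, x \rangle + k^* f^*(\beta)$ for any choice of best response $\hat{x}(\beta)$, and this function is well-defined and convex.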
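The four steps can be checked numerically for one concrete cost. The sketch below assumes (an illustrative choice, not one singled out in the text) $f(z) = \frac{1}{k}\|z\|_2^k$ with $k > 1$, which satisfies the three conditions of Theorem 2. Its convex conjugate is $f^*(\beta) = \frac{1}{k^*}\|\beta\|_2^{k^*}$ with $\frac{1}{k} + \frac{1}{k^*} = 1$, which is differentiable, so the best response is unique: $\hat{x}(\beta) = x + \nabla f^*(\beta) = x + \|\beta\|_2^{k^*-2}\beta$. The usage confirms the Euler identity $\langle \beta, \hat{x}(\beta) \rangle = \langle \beta, x \rangle + k^* f^*(\beta)$ and that small perturbations of $\hat{x}(\beta)$ lower the agent’s utility. All function names are hypothetical.

```python
import math


def norm(v):
    return math.sqrt(sum(a * a for a in v))


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def conjugate_exponent(k):
    # k* with 1/k + 1/k* = 1: the homogeneity degree of f* when f has degree k > 1
    return k / (k - 1.0)


def best_response_power_norm(beta, x, k):
    # Assumed cost f(z) = ||z||_2^k / k with z = x_hat - x. The optimal shift is
    # grad f*(beta) = ||beta||^(k* - 2) * beta, which is 0 at beta = 0.
    nb = norm(beta)
    if nb == 0.0:
        return list(x)
    scale = nb ** (conjugate_exponent(k) - 2.0)
    return [xi + scale * b for xi, b in zip(x, beta)]


def utility(x_hat, beta, x, k):
    # u(x_hat, beta) = <beta, x_hat> - f(x_hat - x)
    z = [a - b for a, b in zip(x_hat, x)]
    return dot(beta, x_hat) - norm(z) ** k / k
```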

All missing proofs appear in the appendix.

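We begin with the first step by rewriting the utility function:

$$u(\hat{x}, \beta) = \langle \beta, \hat{x} \rangle - f(\hat{x} - x).$$

We perform a change of variable from $\hat{x}$ to $z = \hat{x} - x$ and write:

$$u(x + z, \beta) = \langle \beta, x \rangle + \langle \beta, z \rangle - f(z).$$

The first step only relies on the convexity of $f$. Let

$$Z^*(\beta) = \arg\max_{z \in \mathbb{R}^d} \left\{ \langle \beta, z \rangle - f(z) \right\}$$

be the set of maximizers of $\langle \beta, z \rangle - f(z)$ given $\beta$. We note that $Z^*(\beta)$ is a convex set since $\langle \beta, z \rangle - f(z)$ is concave in $z$ for any $\beta$. Note that if $f^*$ is differentiable at $\beta$, $Z^*(\beta)$ is a singleton set.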
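Steps 2 and 3 can be illustrated in one dimension with a brute-force conjugate. This is a numerical illustration rather than part of the proof, and the grid bounds and example costs are our own assumptions. For $f(z) = |z|^3/3$ the grid maximum approaches $f^*(\beta) = |\beta|^{3/2}/(3/2)$, doubling $\beta$ scales the value by $2^{3/2}$ (homogeneity of $f^*$ with degree $k/(k-1) = 3/2$), and the maximizing $z$ is the optimal manipulation $\hat{x} - x$. By contrast, $f(z) = |z|$ is homogeneous of degree only $k = 1$, and for $|\beta| > 1$ the supremum is unbounded: the grid value simply grows with the grid, mirroring an agent with no finite best response.

```python
import math


def conjugate_on_grid(f, beta, lo, hi, n=20001):
    # Approximates the 1-D convex conjugate f*(beta) = sup_z {beta * z - f(z)}
    # by brute-force maximization over n evenly spaced points in [lo, hi].
    # Returns (approximate value, maximizing grid point).
    best_val, best_z = -math.inf, lo
    step = (hi - lo) / (n - 1)
    for i in range(n):
        z = lo + i * step
        val = beta * z - f(z)
        if val > best_val:
            best_val, best_z = val, z
    return best_val, best_z


def power_cost(z, k=3.0):
    # f(z) = |z|^k / k: convex, positive for z != 0, homogeneous of degree k > 1
    return abs(z) ** k / k
```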