The Effect of Strategic Noise in Linear Regression

07/14/2020 ∙ by Safwan Hossain, et al. ∙ University of Toronto

We build on an emerging line of work which studies strategic manipulations in training data provided to machine learning algorithms. Specifically, we focus on the ubiquitous task of linear regression. Prior work focused on the design of strategyproof algorithms, which aim to prevent such manipulations altogether by aligning the incentives of data sources. However, algorithms used in practice are often not strategyproof, which induces a strategic game among the agents. We focus on a broad class of non-strategyproof algorithms for linear regression, namely ℓ_p norm minimization (p > 1) with convex regularization. We show that when manipulations are bounded, every algorithm in this class admits a unique pure Nash equilibrium outcome. We also shed light on the structure of this equilibrium by uncovering a surprising connection between strategyproof algorithms and pure Nash equilibria of non-strategyproof algorithms in a broader setting, which may be of independent interest. Finally, we analyze the quality of equilibria under these algorithms in terms of the price of anarchy.


1 Introduction

Linear regression aims to find a linear relationship between explanatory variables and response variables. Under certain assumptions, it is known that minimizing a suitable loss function on training data generalizes well to unseen test data [3]. However, traditional analysis assumes that the algorithm has access to untainted data drawn from the underlying distribution. Relaxing this assumption, a significant body of recent work has focused on making machine learning algorithms robust to stochastic or adversarial noise; the former is too benign [23, 16, 15, 27], while the latter is too pessimistic [20, 4, 9, 17]. A third model, more recent and prescient, is that of strategic noise, a game-theoretic model of noise that sits between the two. Here, it is assumed that the training set is provided by self-interested agents, who may manipulate their reports to minimize the loss on their own data.

We focus on strategic noise in linear regression. Dekel et al. [13] provide an example of retailer Zara, which uses regression to predict product demand at each store, partially based on self-reported data provided by the stores. Given limited supply of popular items, store managers may engage in strategic manipulation to ensure the distribution process benefits them, and there is substantial evidence that this is widespread [7]. Strategic behavior by even a small number of agents can significantly affect the overall system, including agents who have not participated in such behavior. Prior work has focused on designing strategyproof algorithms for linear regression [30, 13, 10], under which agents provably cannot benefit by misreporting their data. While strategyproofness is a strong guarantee, it is only satisfied by severely restricted algorithms. Indeed, as we observe later in the paper, most practical algorithms for linear regression are not strategyproof.

When strategic agents with competing interests manipulate the input data under a non-strategyproof algorithm, a game is induced among them. The game theory literature offers several tools to analyze such behavior, such as Nash equilibria and the price of anarchy [28]. We use these tools to answer three key questions:

  • Does the induced game always admit a pure Nash equilibrium?

  • What are the characteristics of these equilibria?

  • Is there a connection between strategyproof algorithms and equilibria of non-strategyproof algorithms?

We consider linear regression algorithms which minimize the ℓ_p norm of residuals (where p > 1) with convex regularization. This class includes most popular linear regression algorithms, including ordinary least squares (OLS), the lasso, the group lasso, ridge regression, and elastic net regression. Our key result is that the game induced by an algorithm in this class has three properties: a) it always has a pure Nash equilibrium, b) all pure Nash equilibria result in the same regression hyperplane, and c) there exists a strategyproof algorithm which returns this equilibrium regression hyperplane given non-manipulated data. We also analyze the quality of this equilibrium outcome, measured by the pure price of anarchy. We show that for a broad subset of algorithms in this class, the pure price of anarchy is unbounded.

1.1 Related Work

A special case of linear regression is facility location in one dimension [26], where each agent is located at some point on the real line. An algorithm elicits the preferred locations of the agents (who can misreport) and chooses a location to place a facility. A significant body of literature in game theory is devoted to understanding strategyproof algorithms in this domain [26, 6], which includes placing the facility at the median of the reported locations. A more recent line of work studies equilibria of non-strategyproof algorithms, such as placing the facility at the average of the reported locations [32, 33, 36]. Similarly, in the more general linear regression setting, prior work has focused on strategyproof algorithms [30, 13, 10]. We complete the picture by studying equilibria of non-strategyproof algorithms for linear regression.

We use a standard model of strategic manipulations in linear regression [30, 13, 10]. Perote and Perote-Peña [30] designed a strategyproof algorithm in two dimensions. Dekel et al. [13] proved that least absolute deviations (LAD), which minimizes the ℓ_1 norm of residuals without regularization, is strategyproof. Chen et al. [10] extended their result to include regularization, and designed a new family of strategyproof algorithms in high dimensions. They also analyzed the loss in mean squared error (MSE) under a strategyproof algorithm as compared to OLS, which minimizes the MSE. They showed that any strategyproof algorithm has at least twice as much MSE as OLS in the worst case, and analyzed this ratio for LAD. Our result (Theorem 6) shows that the ratio of the equilibrium MSE under the algorithms we study to the optimal MSE of OLS is unbounded. Through the connection we establish to strategyproof algorithms (Theorem 5), this also implies an unbounded ratio for the broad class of corresponding strategyproof algorithms.

Finally, we mention that strategic manipulations have been studied in various other machine learning contexts, e.g., manipulations of feature vectors [18, 14], strategic classification [25, 18, 14], competition among different algorithms [24, 19, 2, 1], and manipulations due to privacy concerns [11, 5].

2 Model

In linear regression, we are given n training data points of the form (x_i, y_i), where x_i ∈ ℝ^d are the explanatory variables and y_i ∈ ℝ is the response variable (in the regression literature, these are also called independent and dependent variables, respectively). Following the standard convention, we assume that the last component of each x_i is a constant, say 1. Let X be the d × n matrix with x_i as its i-th column, and let y = (y_1, …, y_n). The goal of a linear regression algorithm is to find a hyperplane with normal vector β ∈ ℝ^d such that X^⊤ β is a good estimate of y. The residual of point i is β^⊤ x_i − y_i.

Algorithms:

We focus on a broad class of algorithms parametrized by p ∈ (1, ∞) and a regularizing function f. The (p, f)-regression algorithm minimizes the following loss function over β:

    L(β) = Σ_{i=1}^{n} |β^⊤ x_i − y_i|^p + f(β).    (1)

We assume that f is convex and differentiable. For p > 1, this objective is strictly convex, admitting a unique optimum β*. When there is no regularization (f ≡ 0), we refer to it as the ℓ_p-regression algorithm.
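
To make the optimization concrete, the following Python snippet is a minimal sketch of a solver for the loss in Equation (1). It is our own illustration, not part of the paper: the function name, the use of scipy.optimize, and the example regularizer are assumptions, and any convex solver would do.

    import numpy as np
    from scipy.optimize import minimize

    def p_f_regression(X, y, p=2.0, reg=None):
        """Minimize sum_i |beta^T x_i - y_i|^p + f(beta), as in Equation (1).

        X   : (d, n) array whose columns are the points x_i (last row all ones).
        y   : (n,) array of response values.
        reg : optional convex, differentiable regularizer f(beta).
        """
        d, _ = X.shape

        def loss(beta):
            residuals = X.T @ beta - y
            value = np.sum(np.abs(residuals) ** p)
            return value + (reg(beta) if reg is not None else 0.0)

        return minimize(loss, x0=np.zeros(d), method="L-BFGS-B").x

    # Example usage with a ridge-style regularizer f(beta) = 0.1 * ||beta||_2^2:
    # X = np.vstack([np.random.rand(2, 50), np.ones((1, 50))])
    # y = np.random.rand(50)
    # beta_star = p_f_regression(X, y, p=1.5, reg=lambda b: 0.1 * np.sum(b ** 2))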

Strategic model:

We follow a standard model of strategic interactions studied in the literature [30, 13, 10]. A training data point (x_i, y_i) is provided by an agent i. N = {1, …, n} denotes the set of all agents. Each x_i is public information, which is non-manipulable, but y_i is held private by agent i. We assume that a subset H ⊆ N of agents (with |H| = h) are honest and always report ỹ_i = y_i. The remaining agents in M = N \ H (with |M| = m) are strategic and may report ỹ_i ≠ y_i. Note that we allow all agents in N to be strategic; that is, we allow H = ∅ and m = n. For convenience, we assume that M = {1, …, m} and H = {m+1, …, n}. However, we emphasize that our algorithms do not know which agents are strategic and which are honest. Given a set of reports ỹ = (ỹ_1, …, ỹ_n), the honest agents' reports are denoted by ỹ_H (note that ỹ_H = y_H) and the strategic agents' reports by ỹ_M. In accordance with the related literature, we restrict our analysis to the training set and do not consider strategic manipulation of test data, leaving this for future work.

The (p, f)-regression algorithm takes as input X and ỹ, and returns the β* minimizing the loss in Equation (1). We say that ŷ_i = (β*)^⊤ x_i is the outcome for agent i. Since X and ỹ_H are non-manipulable, we can treat them as fixed. Hence, ỹ_M is the only input which matters, and ŷ_M = (ŷ_1, …, ŷ_m) is the output for these manipulating agents. For an algorithm A, we use the notation ŷ_M = A(ỹ_M), and let A_i denote the function returning agent i's outcome ŷ_i. A strategic agent manipulates to ensure this outcome is as close to her true response variable as possible. Formally, agent i has single-peaked preferences (with strict preference denoted by ≻_i) over her outcome, with peak at y_i. That is, for all outcomes a and b with y_i ≤ a < b or b < a ≤ y_i, we have a ≻_i b. Agent i is perfectly happy when ŷ_i = y_i. In this work, we assume that for each agent i, both y_i and ỹ_i are bounded (WLOG, say they belong to [0, 1]).

Nash equilibria:

This strategic interaction induces a game among the agents in M, and we are interested in the pure Nash equilibria (PNE) of this game. We say that ỹ_M is a Nash equilibrium (NE) if no strategic agent can strictly gain by changing her report, i.e., if for every agent i ∈ M and every alternative report ỹ'_i, the outcome A_i(ỹ'_i, ỹ_{-i}) is not strictly preferred by agent i to A_i(ỹ_M). We say that ỹ_M is a pure Nash equilibrium (PNE) if it is a NE and each ỹ_i is deterministic. Let E(y_M) denote the set of pure Nash equilibria under A when the peaks of the agents' preferences are given by y_M (equilibria can in general depend on the full preferences, but our results in Section 4 show that only the peaks matter). For ỹ_M ∈ E(y_M), let A(ỹ_M) be the corresponding PNE outcome.

Strategyproofness:

We say that an algorithm A is strategyproof if no agent can benefit by misreporting her true response variable regardless of the reports of the other agents, i.e., if for every agent i, all reports ỹ_{-i} of the other agents, and every report ỹ_i, the outcome A_i(ỹ_i, ỹ_{-i}) is not strictly preferred by agent i to A_i(y_i, ỹ_{-i}). Note that strategyproofness implies that each agent reporting her true value (i.e., ỹ_M = y_M) is a pure Nash equilibrium.

Pure price of anarchy (PPoA):

It is natural to measure the cost of selfish behavior on the overall system. A classic notion is the pure price of anarchy (PPoA) [21, 28], which is defined as the ratio between the maximum social cost under any PNE and the optimal social cost under honest reporting, for an appropriate measure of social cost. Here, social cost is a measure of the overall fit. In regression, it is typical to measure fit using the ℓ_q norm of residuals for some q ≥ 1. While we study the equilibria of regression mechanisms for different values of p, we need to evaluate them using a single value of q so that the results are comparable. For our theoretical analysis, we use the mean squared error (which corresponds to q = 2), since it is the standard measure of fit in the literature [10]. One way to interpret our results is: if our goal were to minimize the MSE, which regression mechanism would we choose, assuming that the strategic agents would achieve equilibrium? We also present empirical results for other values of q. Slightly abusing notation by letting A map all reports to the outcomes of all agents (not just those in M), we write:

    PPoA(A) = max_{ỹ_M ∈ E(y_M)}  [ Σ_{i ∈ N} (A_i(ỹ_M) − y_i)² ] / [ Σ_{i ∈ N} (ŷ_i^OLS − y_i)² ],

where ŷ^OLS is the outcome of OLS (i.e., the (2, 0)-regression algorithm) under honest reporting, which minimizes the mean squared error. Note that the PPoA, as we have defined it, measures the impact of the behavior of strategic agents on all agents, including on the honest agents.
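
For concreteness, the short Python sketch below computes this ratio for a given instance and a given equilibrium outcome vector (the maximum over all equilibria is then taken outside). It is only an illustration of the definition above; the function names and the (d, n) data layout are our assumptions.

    import numpy as np

    def mse(outcomes, true_y):
        return float(np.mean((np.asarray(outcomes) - np.asarray(true_y)) ** 2))

    def empirical_ppoa(X, true_y, equilibrium_outcomes):
        """Ratio of the MSE at an equilibrium outcome vector to the MSE of the
        OLS fit on the honest (true) data, following the PPoA definition above."""
        beta_ols, *_ = np.linalg.lstsq(X.T, true_y, rcond=None)  # OLS on true data
        ols_outcomes = X.T @ beta_ols
        return mse(equilibrium_outcomes, true_y) / mse(ols_outcomes, true_y)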

3 Warm-Up: The 1D Case

As a warm-up, we review the more restricted facility location setting in one dimension. Here, each agent i has an associated scalar value y_i, and the algorithm must produce the same outcome for all agents (i.e., ŷ_i = ŷ for all i). Hence, the algorithm is a function A : [0, 1]^n → ℝ mapping the reported values to a single location. This is a special case of linear regression where agents have identical explanatory variables.

Much of the literature on facility location has focused on strategyproof algorithms. Moulin [26] showed that an algorithm is strategyproof and anonymous (a mild condition which requires treating the agents symmetrically) if and only if it is a generalized median given by A(ỹ) = med(ỹ_1, …, ỹ_n, α_0, α_1, …, α_n), where med denotes the median and each α_k is a fixed constant (called a phantom). Caragiannis et al. [6] focused on a notion of worst-case statistical efficiency, and provided a characterization of generalized medians which exhibit optimal efficiency. In particular, they showed that the uniform generalized median, given by the phantoms α_k = k/n, has optimal statistical efficiency.
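
As a quick illustration of this family of rules, here is a short Python sketch of the generalized median with phantoms. The function names are ours, and we assume reports and phantoms lie in [0, 1] as in the model above.

    import numpy as np

    def generalized_median(reports, phantoms):
        """Generalized median: the median of the n reports together with
        the n + 1 fixed phantom values alpha_0, ..., alpha_n."""
        values = np.concatenate([np.asarray(reports, dtype=float),
                                 np.asarray(phantoms, dtype=float)])
        return float(np.median(values))

    def uniform_generalized_median(reports):
        """Uniform generalized median: phantoms alpha_k = k/n for k = 0, ..., n."""
        n = len(reports)
        return generalized_median(reports, np.arange(n + 1) / n)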

A more recent line of literature has focused on manipulations under non-strategyproof rules. Recall that under a non-strategyproof rule A, each strategic agent i reports a value ỹ_i, which may be different from y_i. For the facility location setting, the ℓ_p-regression algorithm described in Section 2 reduces to A_p(ỹ) = argmin_{ŷ ∈ ℝ} Σ_{i ∈ N} |ŷ − ỹ_i|^p. For p = 1, this is known to be strategyproof [10]. When p > 1, which is the focus of our work, this rule is not strategyproof, as we observe in Section 4.

In this family, the most natural rule is the average rule, given by A(ỹ) = (1/n) Σ_{i ∈ N} ỹ_i. This corresponds to p = 2 with no honest agents or regularization. For this rule, Renault and Trannoy [32] showed that there is always a pure Nash equilibrium, and the pure Nash equilibrium outcome is unique. This outcome is given by med(y_1, …, y_n, 0, 1/n, …, (n−1)/n, 1), which coincides with the outcome of the uniform generalized median, which is strategyproof.

Generalizing this result, Yamamura and Kawasaki [36] proved that any algorithm A satisfying four natural axioms has a unique PNE outcome, which is given by the generalized median med(y_1, …, y_n, α_0, α_1, …, α_n), where, for each k ∈ {0, 1, …, n}, the phantom α_k is the outcome of A when k agents report 1 and the remaining n − k agents report 0.

We note that the ‘vanilla’ ℓ_p-norm algorithm with no honest agents or regularization satisfies the axioms of Yamamura and Kawasaki [36]. Using their result described above, this algorithm has a unique PNE outcome given by the generalized median med(y_1, …, y_n, α_0, …, α_n), where α_k is the outcome of the algorithm when k agents report 1 and n − k agents report 0. It is easy to see that α_0 = 0 and α_n = 1. For 0 < k < n, α_k is the minimizer argmin_{z ∈ [0,1]} [ k (1 − z)^p + (n − k) z^p ]. Taking the derivative w.r.t. z, we can see that the optimal solution is given by

    α_k = k^{1/(p−1)} / ( k^{1/(p−1)} + (n − k)^{1/(p−1)} ).    (2)
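
As a sanity check on Equation (2), the short Python sketch below computes the phantoms and the resulting predicted equilibrium outcome (the median of the true peaks and the phantoms). The function names are ours, and the prediction applies to the vanilla rule with all agents strategic and no regularization.

    import numpy as np

    def lp_phantoms(n, p):
        """Phantoms alpha_0, ..., alpha_n from Equation (2)."""
        k = np.arange(n + 1, dtype=float)
        e = 1.0 / (p - 1.0)
        return k ** e / (k ** e + (n - k) ** e)

    def predicted_equilibrium_outcome(peaks, p):
        """Unique PNE outcome: median of the true peaks and the phantoms."""
        peaks = np.asarray(peaks, dtype=float)
        phantoms = lp_phantoms(len(peaks), p)
        return float(np.median(np.concatenate([peaks, phantoms])))

    # For p = 2 (the average rule), the phantoms are 0, 1/n, ..., 1; e.g.,
    # predicted_equilibrium_outcome([0.1, 0.9, 0.9], p=2.0) returns 2/3.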

Below, we extend this to the general (p, f)-regression algorithm with p > 1 and a convex regularizer f, and with the possibility of honest agents. We omit the proof because, in the next section, we prove this more generally for the linear regression setting (Theorems 3, 4, and 5).

Theorem 1.

Consider facility location with n agents, of which a subset M of agents are strategic and have single-peaked preferences with peaks at y_M. Let A denote the (p, f)-regression algorithm with p > 1 and convex regularizer f. Then, the following statements hold for A.

  1. For each y_M, there is a pure Nash equilibrium ỹ_M.

  2. For each y_M, all pure Nash equilibria have the same outcome.

  3. There exists a strategyproof algorithm A′ such that for all y_M and all pure Nash equilibria ỹ_M of A, A′(y_M) = A(ỹ_M).

Theorem 1 guarantees the existence of a pure Nash equilibrium and highlights an interesting structure of the equilibrium. The next immediate question is to analyze the quality of this equilibrium. We show that the PPoA of any ℓ_p-regression algorithm (i.e., without regularization) is n. Interestingly, this holds even if only a single agent is strategic, and the bound is independent of p.

Theorem 2.

Consider facility location with n agents, of which a subset M of agents are strategic. Let A denote the ℓ_p-regression algorithm with p > 1. When m ≥ 1, PPoA(A) = n.

Proof.

Define y_min = min_{i ∈ N} y_i and y_max = max_{i ∈ N} y_i. As the PPoA is measured with the MSE, the optimal social cost is achieved at the location ȳ = (1/n) Σ_{i ∈ N} y_i. Let ŷ* denote the unique PNE outcome of the algorithm. Note that ŷ* ∈ [y_min, y_max]. To see this, WLOG let ŷ* > y_max. Then all manipulating agents must be reporting 0, and the honest agents maintain their honest reports, all of which are at most y_max (see Lemma 5). However, the loss-optimal outcome on this input cannot then be ŷ*, as y_max would have a strictly lower loss. A symmetric argument holds for ŷ* < y_min. Thus, ŷ* ∈ [y_min, y_max].

We first show a lower bound of n. Suppose a strategic agent i has preferences with peak at y_i = α_1, where α_1 is the phantom given by Equation (2), and the remaining agents have preferences with peak at 0. Note that ȳ = α_1/n and the optimal MSE is (n − 1)α_1²/n². We note that a PNE is given by ỹ_i = 1 and ỹ_j = 0 for all j ≠ i, regardless of which agents other than i are strategic. By Equation (2), the outcome on this input is α_1, so agent i is perfectly happy, while every other agent wants to decrease the outcome and is already reporting 0. Now, we have that the MSE in the equilibrium is

    (1/n) [ (α_1 − α_1)² + (n − 1)(α_1 − 0)² ] = (n − 1)α_1²/n,

whereas the optimal MSE under honest reports is

    (1/n) [ (α_1 − α_1/n)² + (n − 1)(α_1/n)² ] = (n − 1)α_1²/n².

Hence, we have that PPoA(A) ≥ n.

For the upper bound, since the MSE is a strictly convex function of the common outcome location with a minimum at the sample mean ȳ, and since ŷ* ∈ [y_min, y_max], the maximum possible value of the equilibrium MSE is achieved at one of the end-points y_min or y_max. Hence, we have

    PPoA(A) ≤ max { Σ_{i ∈ N} (y_i − y_min)², Σ_{i ∈ N} (y_i − y_max)² } / Σ_{i ∈ N} (y_i − ȳ)².

We show that each quantity inside the maximum in the last expression is at most n · Σ_{i ∈ N} (y_i − ȳ)². Let us prove this for the first quantity. The argument is symmetric for the second. Note that for each location c ∈ ℝ, we have

    Σ_{j ∈ N} (y_j − c)² = Σ_{j ∈ N} (y_j − ȳ)² + n(ȳ − c)²,

so it suffices to show that n(ȳ − y_min)² ≤ (n − 1) Σ_{j ∈ N} (y_j − ȳ)². Let i* be an agent with y_{i*} = y_min. Since Σ_{j ≠ i*} (y_j − ȳ) = −(y_{i*} − ȳ), the Cauchy-Schwarz inequality gives (y_{i*} − ȳ)² ≤ (n − 1) Σ_{j ≠ i*} (y_j − ȳ)², which rearranges to n(y_{i*} − ȳ)² ≤ (n − 1) Σ_{j ∈ N} (y_j − ȳ)². Combining the two displays, we get Σ_{i ∈ N} (y_i − y_min)² ≤ n · Σ_{i ∈ N} (y_i − ȳ)², as desired. ∎

We remark that both Theorems 1 and 2, due to their generality, are novel results in the facility location setting.

4 Linear Regression

We now turn to the more general linear regression setting, which is the focus of our work, and highlight interesting similarities and differences to the facility location setting. Recall that for linear regression, the (p, f)-regression algorithm finds the optimal β minimizing the loss function

    L(β) = Σ_{i ∈ N} |β^⊤ x_i − ỹ_i|^p + f(β).

Let i ∈ M be a strategic agent. Recall that her outcome is denoted by ŷ_i. Let BR_i(ỹ_{-i}) denote the set of her best responses as a function of the reports ỹ_{-i} of the other agents. Informally, it is the set of reports that agent i can submit to induce her most preferred outcome.

4.1 Properties of the Algorithm, Best Responses, and Pure Nash Equilibria

We begin by establishing intuitive properties of (p, f)-regression algorithms. We first derive the following lemmas.

Lemma 1.

Fix a strategic agent i and the reports ỹ_{-i} of the other agents. Let ỹ_i and ỹ'_i be two possible reports of agent i, and let β and β' be the corresponding optimal regression coefficients, respectively. Then, ỹ_i ≠ ỹ'_i implies β ≠ β'.

Proof.

Suppose for contradiction that β = β'. We note that at the optimal regression coefficients, the gradient of our strictly convex loss function must vanish. Let the loss functions on the two instances be given by L_1 and L_2, respectively. So for every z ∈ ℝ^d,

    L_2(z) − L_1(z) = |z^⊤ x_i − ỹ'_i|^p − |z^⊤ x_i − ỹ_i|^p.

Since β is optimal for L_1, we have ∇L_1(β) = 0. Since β' = β is optimal for L_2, taking the derivative, we have

    ∇L_2(β) = ∇L_1(β) + p [ sgn(β^⊤ x_i − ỹ'_i) |β^⊤ x_i − ỹ'_i|^{p−1} − sgn(β^⊤ x_i − ỹ_i) |β^⊤ x_i − ỹ_i|^{p−1} ] x_i ≠ 0,

where the last inequality follows because ỹ_i ≠ ỹ'_i (the map t ↦ sgn(t)|t|^{p−1} is strictly increasing, so the bracketed term is non-zero) and x_i is not the zero vector (its last element is a non-zero constant). Hence, the gradient of L_2 at β' is not zero, which contradicts the optimality of β' for L_2. ∎

Lemma 2.

For p > 1 and real numbers a ≤ b and c ≤ d, we have

    |a − c|^p + |b − d|^p ≤ |a − d|^p + |b − c|^p.

Proof.

Note that the vector (b − c, a − d) majorizes the vector consisting of a − c and b − d sorted in non-increasing order: the two vectors have the same sum, and b − c is the largest of the four differences. For p > 1, t ↦ |t|^p is a convex function. Hence, by the Karamata majorization inequality, the result follows. ∎

Lemma 3.

The outcome ŷ_i of agent i is continuous in the reports ỹ, and strictly increasing in her own report ỹ_i for any fixed reports ỹ_{-i} of the other agents.

Proof.

For continuity, we refer to Corollary 7.43 in Rockafellar and Wets [35], which states that the argmin mapping of a parametrized function is single-valued and continuous on its domain when the function is proper (i.e., finite somewhere), strictly convex, lower semi-continuous, and has a horizon function that is positive everywhere except at the origin. It is easy to check that our loss function given in Equation (1) satisfies these conditions. Hence, its minimizer β* is continuous in ỹ. Since ŷ_i = (β*)^⊤ x_i, it follows that ŷ_i is also continuous in ỹ.

For strict monotonicity, first note that ŷ_i = (β*)^⊤ x_i. Now consider two instances of (p, f)-regression that differ only in agent i's reported response, denoted ỹ_i and ỹ'_i respectively in the two instances; the reports of all other agents are identical in both. Let β and β' be the corresponding optimal regression parameters, and let L_1 and L_2 be the corresponding loss functions. Without loss of generality, assume ỹ_i < ỹ'_i, and for contradiction, suppose that β'^⊤ x_i ≤ β^⊤ x_i. Using Lemma 1, we get that β ≠ β'. Because our strictly convex loss function has a unique minimizer, we have L_1(β) < L_1(β') and L_2(β') < L_2(β). Let us define C(z) = Σ_{j ≠ i} |z^⊤ x_j − ỹ_j|^p + f(z) as the part of the loss common to both instances; we get

    C(β) + |β^⊤ x_i − ỹ_i|^p < C(β') + |β'^⊤ x_i − ỹ_i|^p,    (3)

    C(β') + |β'^⊤ x_i − ỹ'_i|^p < C(β) + |β^⊤ x_i − ỹ'_i|^p.    (4)

Adding Equations (4) and (3), we have:

    |β^⊤ x_i − ỹ_i|^p + |β'^⊤ x_i − ỹ'_i|^p < |β'^⊤ x_i − ỹ_i|^p + |β^⊤ x_i − ỹ'_i|^p.    (5)

Note that because we assumed ỹ_i < ỹ'_i and β'^⊤ x_i ≤ β^⊤ x_i, using Lemma 2 (with a = β'^⊤ x_i, b = β^⊤ x_i, c = ỹ_i, and d = ỹ'_i), we get

    |β'^⊤ x_i − ỹ_i|^p + |β^⊤ x_i − ỹ'_i|^p ≤ |β'^⊤ x_i − ỹ'_i|^p + |β^⊤ x_i − ỹ_i|^p,

which contradicts Equation (5). ∎

The last lemma demonstrates that (p, f)-regression cannot be strategyproof. Consider an instance where each strategic agent has y_i ∈ (0, 1) and the true data points do not all lie on a common hyperplane. Then, under honest reporting, not all strategic agents can be perfectly happy, and any agent with ŷ_i > y_i (or ŷ_i < y_i) can slightly decrease (or increase) her report to achieve a strictly more preferred outcome. Next, we show that the best response of an agent is always unique and continuous in the reports of the other agents.

Lemma 4.

For each strategic agent i ∈ M, the following hold about the best response correspondence BR_i.

  1. The best response is unique, i.e., |BR_i(ỹ_{-i})| = 1 for any reports ỹ_{-i} of the other agents.

  2. BR_i is a continuous function of ỹ_{-i}.

Proof.

We first show uniqueness of the best response. By Lemma 3, ŷ_i is continuous and strictly increasing in ỹ_i. Consider the minimization problem min_{ỹ_i ∈ [0,1]} |ŷ_i − y_i|, where ỹ_{-i} is held constant. So for now, let us consider ŷ_i to be a function of ỹ_i only. Since ŷ_i is continuous and strictly increasing on [0, 1], it achieves its minimum at ỹ_i = 0 and its maximum at ỹ_i = 1. If ŷ_i(0) ≥ y_i, then the minimum of the problem is achieved at ỹ_i = 0. A symmetric case holds for ŷ_i(1) ≤ y_i, where the minimum is achieved at ỹ_i = 1. Lastly, if ŷ_i(0) < y_i < ŷ_i(1), then by the intermediate value theorem there exists a report r with ŷ_i(r) = y_i, which is then the minimum. In all cases, the minimum is unique since ŷ_i is strictly increasing. We now show that this unique minimum is indeed the unique best response. If ŷ_i(0) < y_i < ŷ_i(1), then reporting r makes agent i perfectly happy, as her outcome matches the peak of her preference, which is clearly the unique best response. If ŷ_i(0) ≥ y_i, then the best response is ỹ_i = 0 and her outcome is ŷ_i(0). Under any other report, her outcome would be strictly larger than ŷ_i(0) ≥ y_i, which cannot be more preferred. A symmetric argument holds for the case ŷ_i(1) ≤ y_i.

Now we can use the uniqueness of the best response to argue its continuity. More specifically, we want to show that the function ỹ_{-i} ↦ BR_i(ỹ_{-i}) is continuous; note that the objective |ŷ_i(ỹ_i, ỹ_{-i}) − y_i| is jointly continuous due to the continuity of ŷ_i. We use the sequential definition of continuity. Fix a convergent sequence ỹ_{-i}^t → ỹ_{-i}. Since there is always a unique minimum, the sequence r^t = BR_i(ỹ_{-i}^t) is well-defined. We want to show r^t → BR_i(ỹ_{-i}). By the Bolzano-Weierstrass theorem, every bounded sequence in ℝ has a convergent subsequence. Therefore, (r^t) has a convergent subsequence (r^{t_j}) that converges to some r*. Let r = BR_i(ỹ_{-i}). We first show that r* = r. By continuity of the objective, |ŷ_i(r^{t_j}, ỹ_{-i}^{t_j}) − y_i| → |ŷ_i(r*, ỹ_{-i}) − y_i|. Also, by optimality, every element of the subsequence satisfies |ŷ_i(r^{t_j}, ỹ_{-i}^{t_j}) − y_i| ≤ |ŷ_i(r, ỹ_{-i}^{t_j}) − y_i|. Again by continuity, both of the above sequences converge, and we have |ŷ_i(r*, ỹ_{-i}) − y_i| ≤ |ŷ_i(r, ỹ_{-i}) − y_i|. Since r is the unique minimizer for ỹ_{-i}, we have that r* = r. So every convergent subsequence of (r^t) converges to r. Since (r^t) is a bounded sequence, it follows that r^t → r. Thus, BR_i is continuous. ∎

We remark that part 1 of Lemma 4 is a strong result: it establishes a unique best response for every possible single-peaked preference that an agent may have (in fact, our proof shows that this best response depends only on the peak and not on the full preferences). This allows us to avoid further assumptions on the structure of the agent preferences.
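
Lemma 4's argument also suggests a simple way to compute the best response numerically: since the agent's outcome is strictly increasing in her own report (Lemma 3), a binary search over [0, 1] suffices. The sketch below is our own illustration under the notation above; the helper lp_fit, the (d, n) data layout, and the function names are assumptions rather than part of the paper.

    import numpy as np
    from scipy.optimize import minimize

    def lp_fit(X, y, p):
        """Minimize sum_i |beta^T x_i - y_i|^p over beta (no regularizer)."""
        obj = lambda b: np.sum(np.abs(X.T @ b - y) ** p)
        return minimize(obj, np.zeros(X.shape[0]), method="L-BFGS-B").x

    def best_response(i, peak, reports, X, p, tol=1e-8):
        """Unique best response of agent i: binary search on her own report,
        using the fact that her outcome is strictly increasing in it."""
        def outcome(r):
            rep = np.array(reports, dtype=float)
            rep[i] = r
            return float(lp_fit(X, rep, p) @ X[:, i])
        lo, hi = 0.0, 1.0
        if outcome(lo) >= peak:   # even the lowest report overshoots the peak
            return lo
        if outcome(hi) <= peak:   # even the highest report undershoots the peak
            return hi
        while hi - lo > tol:      # otherwise, find the report hitting the peak
            mid = 0.5 * (lo + hi)
            lo, hi = (mid, hi) if outcome(mid) < peak else (lo, mid)
        return 0.5 * (lo + hi)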

Finally, we derive a simple characterization of pure Nash equilibria in our setting. We show that under a PNE, each strategic agent must be in one of three states: either she is perfectly happy (ŷ_i = y_i), or she wants to decrease her outcome (ŷ_i > y_i) but is already reporting the lowest possible value ỹ_i = 0, or she wants to increase her outcome (ŷ_i < y_i) but is already reporting the highest possible value ỹ_i = 1.

Lemma 5.

ỹ_M is a pure Nash equilibrium if and only if, for all i ∈ M, either ŷ_i = y_i, or ŷ_i > y_i and ỹ_i = 0, or ŷ_i < y_i and ỹ_i = 1.

Proof.

For the ‘if’ direction, we check that in each case, agent i cannot change her report to attain a strictly better outcome. When ŷ_i > y_i and ỹ_i = 0, every other report will result in an outcome strictly larger than ŷ_i (Lemma 3), which the agent prefers even less. A symmetric argument holds for the case ŷ_i < y_i and ỹ_i = 1. Finally, when ŷ_i = y_i, the agent is already perfectly happy.

For the ‘only if’ direction, suppose ỹ_M is a PNE. Consider an agent i ∈ M. The only way the condition is violated is if ŷ_i < y_i and ỹ_i < 1, or ŷ_i > y_i and ỹ_i > 0. In the former case, Lemma 3 implies that for a sufficiently small ε > 0, agent i increasing her report to ỹ_i + ε must result in an outcome in (ŷ_i, y_i], which the agent strictly prefers over ŷ_i. This contradicts the assumption that ỹ_M is a PNE. A symmetric argument holds for the second case. ∎

Note that Lemma 5 immediately suggests a naïve but simple approach to finding a pure Nash equilibrium: since each strategic agent must be in one of the three states above, one can enumerate the resulting candidate report vectors, compute the outcome of the mechanism for each, and check whether the conditions of Lemma 5 are satisfied. This might lead one to believe that the strategic game that we study is equivalent to the finite game induced by restricting agents to finitely many strategy profiles. However, this is not true, because limiting the strategy set of the agents can give rise to new equilibria which are not equilibria of the original game. We give an explicit example illustrating this below. We further discuss the issue of computing a PNE in Section 5.

Example 1: Finite game leading to different equilibria.

We use an example from 1D facility location with the average rule (recall that this is a special case of linear regression) to illustrate this point. Consider an example with two agents, 1 and 2, with true points y_1 and y_2 satisfying y_1 < 1/2 < y_2, respectively, whose preferences are such that each agent i strictly prefers outcome a to outcome b when |a − y_i| < |b − y_i|.

If the agents are allowed to report values in the range [0, 1], then the unique PNE of the game is agent 1 reporting 0 and agent 2 reporting 1, and the PNE outcome is 1/2.

Now, consider the version with finite strategy spaces, where each agent must report a value in {y_1, y_2}. Suppose the agents report honestly, i.e., ỹ_i = y_i for each i. Then, the outcome is (y_1 + y_2)/2. The only way agent 1 could possibly improve is by reporting y_2, but in that case the outcome would be y_2, increasing her distance from y_1. A similar argument holds for agent 2. Hence, honest reporting is a PNE of the finite game, but not of the original game.
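
The short Python sketch below makes the Lemma 5 check explicit for the average rule and verifies the example above. The concrete values 0.3 and 0.7 are our own illustrative choice of y_1 and y_2, and the function names are ours.

    import numpy as np

    def average_rule(reports):
        return float(np.mean(reports))

    def is_pne_avg(reports, peaks, tol=1e-9):
        """Check the Lemma 5 conditions for the average rule (all agents
        strategic, reports restricted to [0, 1])."""
        out = average_rule(reports)
        for r, y in zip(reports, peaks):
            happy = abs(out - y) <= tol
            wants_lower = out > y and abs(r - 0.0) <= tol   # must report the minimum
            wants_higher = out < y and abs(r - 1.0) <= tol  # must report the maximum
            if not (happy or wants_lower or wants_higher):
                return False
        return True

    peaks = [0.3, 0.7]
    print(is_pne_avg([0.0, 1.0], peaks))   # True: the PNE of the original game
    print(is_pne_avg([0.3, 0.7], peaks))   # False: honest reporting is not a PNE
                                           # of the original game, although it is
                                           # one of the finite game with {0.3, 0.7}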

4.2 Analysis of Pure Nash Equilibria

We are now ready to prove the main results of our work. We begin by showing that a PNE always exists, generalizing the first statement of Theorem 1 from 1D facility location to linear regression.

Theorem 3.

For p > 1 and convex regularizer f, the (p, f)-regression algorithm admits a pure Nash equilibrium.

Proof.

Consider the mapping from the reports of strategic agents to their best responses, i.e., BR(ỹ_M) = (BR_1(ỹ_{-1}), …, BR_m(ỹ_{-m})). Recall that best responses are unique due to Lemma 4. Also, note that pure Nash equilibria are precisely the fixed points of this mapping.

Brouwer's fixed point theorem states that any continuous function from a convex compact set to itself has a fixed point [31]. Note that BR is a function from [0, 1]^m to [0, 1]^m, and [0, 1]^m is a convex compact set. Further, BR is a continuous function since each BR_i is a continuous function (Lemma 4). Hence, by Brouwer's fixed point theorem, BR has a fixed point (i.e., a pure Nash equilibrium). ∎
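
Theorem 3 establishes existence via a fixed point of the joint best-response map, but does not by itself give an algorithm for computing one. A natural heuristic is to iterate best responses, as in the Python sketch below. This is our own illustration: it reuses the best_response function from the sketch after Lemma 4, and Brouwer's theorem does not guarantee that this iteration converges.

    import numpy as np

    def best_response_dynamics(peaks, X, p, strategic, rounds=200, tol=1e-6):
        """Iterate the joint best-response map; any fixed point is a PNE."""
        reports = np.array(peaks, dtype=float)    # start from honest reports
        for _ in range(rounds):
            new = reports.copy()
            for i in strategic:
                # best_response is defined in the earlier sketch (after Lemma 4)
                new[i] = best_response(i, peaks[i], reports, X, p)
            if np.max(np.abs(new - reports)) < tol:
                return new                        # approximate fixed point found
            reports = new
        return reports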

Next, we show that there is a unique pure Nash equilibrium outcome (i.e., all pure Nash equilibria lead to the same hyperplane β*), generalizing the second statement in Theorem 1.

Theorem 4.

For any p > 1 and convex regularizer f, the (p, f)-regression algorithm has a unique pure Nash equilibrium outcome.

Proof.

Assume for contradiction that there are two equilibria ỹ_M and ỹ'_M, which result in distinct outcomes ŷ_M and