Designing the Game to Play: Optimizing Payoff Structure in Security Games

05/05/2018 ∙ by Zheyuan Ryan Shi, et al. ∙ University of Southampton, Carnegie Mellon University, Swarthmore College

Effective game-theoretic modeling of defender-attacker behavior is becoming increasingly important. In many domains, the defender functions not only as a player but also as the designer of the game's payoff structure. We study Stackelberg Security Games where the defender, in addition to allocating defensive resources to protect targets from the attacker, can strategically manipulate the attacker's payoff under budget constraints in weighted L^p-norm form regarding the amount of change. Focusing on problems with the weighted L^1-norm form constraint, we present (i) a mixed integer linear program-based algorithm with an approximation guarantee; (ii) a branch-and-bound based algorithm with improved efficiency achieved by effective pruning; (iii) a polynomial time approximation scheme for a special but practical class of problems. In addition, we show that problems under budget constraints in L^0-norm form and weighted L^∞-norm form can be solved in polynomial time. We provide an extensive experimental evaluation of our proposed algorithms.




1 Introduction

Research efforts in security games have led to success in various domains, ranging from protecting critical infrastructure [Letchford and Conitzer2013, Wang et al.2016] and catching fare evaders in metro systems [Yin et al.2012], to combating poaching [Fang et al.2016] and preventing cyber intrusions [Durkota et al.2015, Basilico et al.2016]. In these games, a defender protects a set of targets from an attacker by allocating defensive resources. One key element that characterizes the strategies of the players is the payoff structure. Existing work in this area typically treats the payoff structure of the players as given parameters, sometimes with uncertainties known a priori given the nature of the domain. However, under various circumstances, the defender is able to change the attacker’s payoff, rendering the existing models inadequate in expressiveness. For example, in wildlife poaching, the law enforcement agency may charge a variable fine if the poacher is caught at different locations, e.g., in the core area vs. the non-core area. In cybersecurity, the network administrator may change the actual or apparent value of any network node for a potential attacker. In these cases, the defender’s decision making is two-staged: she chooses the payoff structure as well as the strategy for allocating defensive resources. With a properly chosen payoff structure, the defender may achieve much better utility with the same or even fewer resources.

As existing work in security games does not provide adequate tools to deal with this problem (see Section 2 for more details), we aim to fill this gap as follows. We study how to design the attacker’s payoff structure in security games given budget constraints in weighted L^p-norm form. That is, the distance between the original payoff structure and the modified payoff structure is bounded, using distance metrics such as the Manhattan distance (i.e., L^1-norm) with varying weights for the reward and penalty of different targets. The intuition behind this setting is that the defender can change the payoffs to make a target that is preferable to the defender more attractive to the attacker, and to disincentivize the attacker from attacking targets that can lead to a significant loss to the defender. More change incurs a higher cost to the defender, and the defender has a fixed budget for making the changes. Our findings can be summarized as follows:

L^1-norm case: When the budget constraint is in weighted L^1-norm form, i.e., additive cost, our contribution is threefold. (i) We exploit several key properties of the optimal manipulation and propose a mixed integer linear program (MILP)-based algorithm with an approximation guarantee. (ii) We propose a novel branch-and-bound approach with improved efficiency achieved by effective pruning for the general case. (iii) Finally, we show that a polynomial time approximation scheme (PTAS) exists for a special but practical case where the budget is very limited and the manipulation cost is uniform across targets. The PTAS is built upon the key observation that there is an optimal solution in which no more than two targets’ payoffs are changed in this restricted case.

L^0 and L^∞-norm cases: We propose polynomial-time algorithms for problems under budget constraints in L^0-norm form and weighted L^∞-norm form, respectively, where n is the total number of targets. (The L^0-norm is not actually a norm, but we use the term for simplicity; its definition is given in Sec. 4.2. The definition of weighted L^∞-norm is given in Sec. 4.1.) For L^0-norm form budget, i.e., a limited number of targets to manipulate, our algorithm converts the problem into O(n^2) subproblems and reduces each subproblem to a problem of finding a subset of items with the maximum average weight. The latter can be solved in O(n) time. For L^∞-norm form budget, i.e., a limited range of manipulation on each target, we reduce the problem to traditional Stackelberg Security Games with fixed payoff structure, which again admits an efficient algorithm.

Numerical evaluation: We provide an extensive experimental evaluation of the proposed algorithms. For problems with L^1-norm form budget constraint, we show that the branch-and-bound approach with an additive approximation guarantee can solve instances with up to hundreds of targets in a few minutes. This is faster than the other baseline algorithms we compare to. Somewhat surprisingly, naively solving the non-convex subproblems using an interior point method achieves good performance in practice even though there is no theoretical guarantee on solution quality. We also evaluate the proposed algorithm for the L^0-norm form case and show its superior performance over two greedy algorithms and a MILP-based algorithm.

2 Preliminaries and Related Work

The security game that we consider in this paper features a set of n targets, T = {1, …, n}. The defender has m units of defensive resources, each of which can protect one target. The attacker chooses one target to attack after observing the defender’s strategy. If the defender covers target i when it is attacked, the defender gets a reward R^d_i and the attacker gets a penalty P^a_i. Otherwise, the defender gets a penalty P^d_i and the attacker gets a reward R^a_i. When the defender commits to a mixed strategy c = (c_1, …, c_n), that is, covering target i with probability c_i, the defender’s and attacker’s expected utilities when target i is attacked are U^d_i(c) = c_i R^d_i + (1 − c_i) P^d_i and U^a_i(c) = c_i P^a_i + (1 − c_i) R^a_i, respectively.
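As a minimal sketch of these expected-utility definitions (the helper names are ours, not from the paper):

```python
def defender_utility(c_i, Rd_i, Pd_i):
    # Defender's expected utility when target i is attacked: reward Rd_i
    # with coverage probability c_i, penalty Pd_i otherwise.
    return c_i * Rd_i + (1 - c_i) * Pd_i

def attacker_utility(c_i, Ra_i, Pa_i):
    # Attacker's expected utility at target i: penalty Pa_i if the target
    # is covered, reward Ra_i otherwise.
    return c_i * Pa_i + (1 - c_i) * Ra_i
```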

We adopt the commonly used solution concept of Strong Stackelberg Equilibrium (SSE) [Kiekintveld et al.2009]. At an SSE, the defender chooses an optimal strategy that leads to the highest expected utility for her when the attacker chooses a best response (assumed to be a pure strategy w.l.o.g.), breaking ties in favor of the defender. Given a coverage c, the attack set Γ(c) contains all the targets which have a weakly higher attacker’s expected utility than any other target, i.e.,

Γ(c) = {i ∈ T : U^a_i(c) ≥ U^a_j(c), ∀j ∈ T}.

[Kiekintveld et al.2009] show that there exists an SSE where the defender only covers the targets in the attack set.
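For concreteness, a small helper (ours, not the paper's code) that enumerates the attack set Γ(c) for a given coverage vector:

```python
def attack_set(c, Ra, Pa, tol=1e-9):
    # Gamma(c): the targets whose attacker expected utility is weakly maximal.
    utils = [ci * p + (1 - ci) * r for ci, r, p in zip(c, Ra, Pa)]
    best = max(utils)
    return [i for i, u in enumerate(utils) if u >= best - tol]
```

For example, with zero coverage everywhere, the attack set is simply the set of targets with the highest attacker reward.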

Given the game parameters, the optimal defender strategy in such a game can be computed using multiple linear programs (LPs) [Conitzer and Sandholm2006] or an efficient algorithm called ORIGAMI [Kiekintveld et al.2009] based on enumerating the possible attack sets. We leverage insights from both works to devise our algorithms.

Although many algorithms have been developed for security games under various settings, in most of the existing literature, the payoff structure is treated as fixed and cannot be changed by the defender, either in the full information case  [Korzhyk et al.2010, Paruchuri et al.2008, Laszka et al.2017], or in the presence of payoff uncertainties [Kiekintveld et al.2013, Kiekintveld et al.2011, Yin and Tambe2012, Letchford et al.2009, Blum et al.2014]. As mentioned earlier, in many real-world scenarios the defender has control over the attacker’s payoffs. The approaches above ignore this aspect and thus leave room for further optimization.

Indeed, despite its significance, jointly optimizing the payoff structure and the resource allocation is still under-explored. A notable exception, and the work most directly related to ours, is the audit game model [Blocki et al.2013, Blocki et al.2015]. There, the defender can choose target-specific “punishment rates” in order to maximize her expected utility offset by the cost of setting the punishment rate. Compared with their model, ours is more general in that we allow manipulation not only of the attacker’s penalty but also of the attacker’s reward. This realistic extension makes their core techniques inapplicable. Also, we treat the manipulation cost as a constraint instead of a regularization term in the objective function, because in some real-world settings payoffs can be manipulated only once, yet the defender may face multiple attacks afterwards, which makes it hard to determine the regularization coefficient. Another closely related work [Schlenker et al.2018] focuses on the use of honeypots [Kiekintveld et al.2015, Durkota et al.2015, Píbil et al.2012]. It studies the problem of deceiving a cyber attacker by manipulating the attacker’s (believed) payoff. However, it assumes the defender can only change the payoff structure, ignoring the allocation of defensive resources after the manipulation. [Horák et al.2017] study the manipulation of the attacker’s belief in a repeated game. They assume an actively engaged attacker and defender, which is not the case in our problem.

If we conceptually decouple the payoff manipulation from resource allocation, the defender faces a two-stage decision. She first chooses the structure of the game, and then plays the game. Thus, our problem may be viewed as a mechanism design problem, albeit not in a conventional setting. Most work in mechanism design considers private information games [Fujishima et al.1999, Myerson1989], while in our work, and in most security game literature, the payoff information is public. Some design the incentive mechanism using a Stackelberg game [Kang and Wu2015], with applications to network routing [Sharma and Williamson2007], mobile phone sensing [Yang et al.2012], and ecology surveillance [Xue et al.2016]. However, these works solve the Stackelberg game to design the mechanism, rather than designing the structure of the Stackelberg game.

3 Optimizing Payoff with Budget Constraint in Weighted L^1-norm Form

In this section, we focus on computing the optimal way of manipulating the attacker’s payoffs and allocating defensive resources when the defender can change the attacker’s reward and penalty at a cost that grows linearly in the amount of change, and the defender has a limited budget for making the changes. The cost rates, referred to as weights, may differ across targets. This is an abstraction of several domains. For example, a network administrator may change the actual or apparent value of any network node, although such changes often incur time and hardware costs.

Let R^a = (R^a_i), P^a = (P^a_i) and R̃^a = (R̃^a_i), P̃^a = (P̃^a_i) denote the attacker’s reward and penalty vectors before and after the manipulation, respectively. Similar to the initial payoff structure, we require that P̃^a_i ≤ 0 ≤ R̃^a_i, and denote the changes by δ^r_i = R̃^a_i − R^a_i and δ^p_i = P̃^a_i − P^a_i. Let δ^r_i and δ^p_i be the amounts of change in the attacker’s reward and penalty, and w^r_i and w^p_i the weights on them, respectively. The budget constraint is in weighted L^1-norm form, i.e., Σ_i (w^r_i |δ^r_i| + w^p_i |δ^p_i|) ≤ B, where B is the budget. The defender’s strategy is characterized by (c, δ^r, δ^p). Given this strategy, in the manipulated game the attacker attacks some target t, which belongs to the attack set Γ. We first show some properties of the optimal solution.
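The weighted L^1 budget constraint can be checked directly; a sketch with our own helper names, where dr, dp are the change vectors and wr, wp the weights:

```python
def weighted_l1_cost(dr, dp, wr, wp):
    # Weighted L^1 cost of a payoff manipulation: each unit of change on a
    # target's reward (penalty) costs its reward (penalty) weight.
    return sum(wri * abs(dri) + wpi * abs(dpi)
               for dri, dpi, wri, wpi in zip(dr, dp, wr, wp))

def within_budget(dr, dp, wr, wp, B, tol=1e-9):
    # Feasibility of a manipulation under budget B.
    return weighted_l1_cost(dr, dp, wr, wp) <= B + tol
```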

Theorem 1.

There is an optimal solution (c, δ^r, δ^p) with corresponding attack target t and attack set Γ which satisfies the following conditions:

  1. c_i = 0 for every target i ∉ Γ.

  2. δ^r_t ≥ 0 and δ^p_t ≥ 0; δ^r_i ≤ 0 and δ^p_i ≤ 0 for every i ≠ t.

  3. For each target i, at most one of δ^r_i and δ^p_i is nonzero, i.e., δ^r_i · δ^p_i = 0.

Proof sketch.

Condition 1: If c_i > 0 for some i ∉ Γ, we may either set c_i = 0 or push i into the attack set; there is no need to protect a target that is not in the attack set. Condition 2: Flipping the sign of δ^r_i and δ^p_i leads to no better solution. Condition 3: For each target i, we can shift the change on δ^r_i to δ^p_i and vice versa, while one is more budget efficient than the other depending on the coverage. We can show that these manipulations can be done simultaneously. (Due to limited space, the full proofs are included in the full version of the paper.)

Similar to the multiple LPs formulation in [Conitzer and Sandholm2006], we consider n subproblems SP_t, each assuming some target t is the attack target, and the best solution among all subproblems is the optimal defender strategy. Condition 2 in Theorem 1 shows it is possible to infer the signs of δ^r_i and δ^p_i given the attack target. So in the sequel, we abuse the notation by treating δ^r_i and δ^p_i as the absolute values of the amount of change, and assume w.l.o.g. that in SP_t, the changes on target t increase the attacker’s utility while the changes on every other target decrease it. Thus, a straightforward formulation for SP_t is

max_{c, δ^r, δ^p}   c_t R^d_t + (1 − c_t) P^d_t   (2)

s.t.   c_t (P^a_t + δ^p_t) + (1 − c_t)(R^a_t + δ^r_t) ≥ c_j (P^a_j − δ^p_j) + (1 − c_j)(R^a_j − δ^r_j),   ∀j ≠ t   (3)

       Σ_i (w^r_i δ^r_i + w^p_i δ^p_i) ≤ B,   Σ_i c_i ≤ m,   0 ≤ c_i ≤ 1,   δ^r_i ≥ 0,   δ^p_i ≥ 0   (4-8)

The above formulation is non-convex due to the quadratic terms (such as c_j δ^p_j) in Constraint 3, which lead to an indefinite Hessian matrix (see Appendix B), and thus no existing solvers can guarantee global optimality for the above formulation.

3.1 A MILP-based Solution with Approximation Guarantee

To find a defender strategy with solution quality guarantee, we solve the atomic version of the subproblems with MILPs. We show an approximation guarantee which improves as the fineness of discretization grows. We further propose a branch-and-bound-like framework for pruning subproblems to improve runtime efficiency.

In the atomic version of the payoff manipulation problem, we assume the defender can only make atomic changes, with the minimum amount of change given as α. We refer to the atomic version of SP_t as ASP_t. ASP_t can be formulated as the MILP in Equations 9-19. We simplify the objective function as c_t, since U^d_t(c) is monotonically increasing in c_t. All constraints involving sub/super-scripts without a summation apply to all proper ranges of the summation indices. We use binary representations for δ^r_j and δ^p_j in Constraints 10-14. The binary representation results in bilinear terms like c_j b_{jk}. We introduce auxiliary variables and Constraints 18-19 to linearize them.

(Constraints 4-8 from the formulation of SP_t are included in the MILP unchanged.)

The optimal defender strategy for the atomic payoff manipulation problem can be found by solving all the subproblems ASP_t and comparing the corresponding objective values. We can also combine all the subproblems by constructing a single MILP, with additional variables indicating which subproblem is optimal. The details can be found in the full version.
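The linearization of a bilinear term y = c·b (a coverage probability times a binary bit of the discretized change) can be done with standard big-M constraints. The sketch below is our illustration of the idea, not the paper's exact Constraints 18-19:

```python
def linearized_feasible(y, c, b, M=1.0, tol=1e-9):
    # Standard big-M constraints forcing y = c * b for binary b and 0 <= c <= M:
    #   y <= M*b,   y <= c,   y >= c - M*(1 - b),   y >= 0
    assert b in (0, 1)
    return (y <= M * b + tol and y <= c + tol
            and y >= c - M * (1 - b) - tol and y >= -tol)

# With b = 1 the constraints force y = c; with b = 0 they force y = 0.
assert linearized_feasible(0.7, 0.7, 1)
assert not linearized_feasible(0.3, 0.7, 0)
```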

A natural idea for approximating the global optimum of the original L^1-constrained payoff manipulation problem is, for each attack target t, to approximate SP_t with ASP_t using a small enough α. Theorem 2 below shows such an approximation bound.

Theorem 2.

The solution of the atomic problem ASP_t is an additive approximation to the original problem SP_t, with an error that shrinks linearly in the discretization step α.

Proof sketch.

The floor and ceiling notations below are with respect to the “integral grid” defined by α. Suppose (c*, δ*) is an optimal solution to SP_t. Round each amount of change in δ* down to the grid, so that the budget constraint is preserved, and adjust the coverage so that t remains the attack target. We can show such feasible solutions yield the desired approximation bound. ∎
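The rounding step can be sketched as follows (our helper; rounding the spent amounts down keeps the manipulation within budget):

```python
import math

def round_down_to_grid(x, alpha):
    # Largest multiple of the atomic step alpha not exceeding x; rounding
    # an optimal continuous change this way loses at most alpha per entry.
    return alpha * math.floor(x / alpha)
```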

We note that the idea of discretizing the manipulation space is similar to [Blocki et al.2013, Blocki et al.2015]. Yet allowing changes in both reward and penalty and the difference in objective function make our formulation different and the reduction to SOCP used in  [Blocki et al.2015] inapplicable.

We can further improve the practical runtime of the MILPs by pruning and prioritizing subproblems, as shown in Alg. 1. We first compute a global lower bound by checking a sequence of greedy manipulations. Inspired by Conditions 2 and 3 in Theorem 1, we greedily spend all the budget on one target to increase its attacker reward or penalty, leaving all other targets’ payoff parameters unchanged (Lines 2 - 8).

Upper bounds UB_t in Alg. 1 can be computed with budget reuse: we independently spend the full amount of budget on each payoff parameter, increasing R^a_t and P^a_t and decreasing R^a_i and P^a_i for i ≠ t, as much as possible. For ease of notation, in Alg. 1 we assume manipulations have uniform cost. The weighted case can be easily extended.

The subproblem SP_t is pruned if its upper bound is lower than the global lower bound. To make the pruning more efficient, we solve subproblems in descending order of their corresponding lower bounds, hoping for an early increase in the global lower bound. For subproblems that cannot be pruned, we set α to the desired accuracy and solve the MILP to approximate the subproblem optimum. We also add to the MILP the linear constraint on the objective derived from the global lower bound.

To get the bounds, we call an improved version of the ORIGAMI algorithm of [Kiekintveld et al.2009], which does a binary search on the size of the attack set Γ and solves the resulting linear system; it is denoted ORIGAMI-BS in Alg. 1. Recall m is the defender’s total resource. Let u be the attacker’s expected utility for attack set Γ. From U^a_i(c) = u for all i ∈ Γ and Σ_{i∈Γ} c_i = m, we obtain

u = (Σ_{i∈Γ} R^a_i / (R^a_i − P^a_i) − m) / (Σ_{i∈Γ} 1 / (R^a_i − P^a_i)).

We iteratively cut the search space in half based on the feasibility of the induced coverage. The complexity improves from O(n^2) to O(n log n). A complete description can be found in the full version.

0:   Input: payoffs R^a, P^a, R^d, P^d; budget B
1:   Initialize P = ∅, the set of indices of pruned subproblems. Set α to a desired accuracy.
2:   for Subproblem SP_t do
3:       Greedy Modifications (GM):
4:           Spend all of B to increase R^a_t (resp. P^a_t), leaving all other payoffs unchanged.
5:           Compute the defender utility of each greedy modification with ORIGAMI-BS.
6:           LB_t ← the better of the two values.
7:   end for
8:   LB ← max_t LB_t
9:   Sort SP_t in decreasing LB_t.
10:   for sorted SP_t do
11:       Overuse Modifications (OM):
12:           Independently spend the full budget B on each payoff parameter: increase R^a_t and P^a_t, decrease R^a_i and P^a_i for i ≠ t.
13:           UB_t ← defender utility of the resulting game via ORIGAMI-BS.
14:       LB ← max(LB, value of the best solution found so far).
15:       if UB_t < LB then
16:           Prune: P ← P ∪ {t}
17:       else
18:           run MILP of ASP_t with additional constraint U^d_t ≥ LB
19:       end if
20:   end for
21:   Output:  Best solution among GM and all ASP_t’s.
Algorithm 1 Branch-and-bound

We end this subsection by remarking that atomic payoff manipulation arises in many real-world applications. For example, it is infeasible for the wildlife ranger to charge the poacher a fine of an arbitrary real-valued amount. In those cases, our proposed MILP formulation can be directly applied.

3.2 PTAS for Limited Budget and Uniform Costs

We show that for a special but practical class of problems, there exists a PTAS. In many applications, the defender has only a limited budget B. Additionally, the weights on δ^r_i and δ^p_i are the same across targets. W.l.o.g., we assume the uniform weight is 1. We first show a structural theorem below and then discuss its algorithmic implication.

Theorem 3.

When the budget B is sufficiently small and the weights are uniform, there exists an optimal solution which manipulates the attack target and at most one other target.

Proof sketch.

Since B is limited, either δ^r_i or δ^p_i is unchanged for each target, according to Condition 3 of Theorem 1. Assume all manipulations happen on the attacker’s rewards. If some three targets get manipulated, we can simultaneously increase δ^r_t for the attack target t and shrink the change on one of the other manipulated targets, raising that target’s utility to match another target’s, until some change becomes 0. After such a change, the defender’s utility does not decrease, and the number of manipulated targets decreases. The other cases hold by symmetry. ∎

The theorem above is tight, i.e., we show in the full version an instance where two targets are manipulated in the optimal solution. Under the conditions of Theorem 3, the theorem naturally suggests a PTAS: we can use a linear search over manipulations on all ordered pairs of targets, as shown in Alg. 2, where e_i is a unit vector with a single one at position i. Theorem 4 shows the approximation guarantee, with a proof similar to that of Theorem 2, which is included in the full version.

0:   Input: payoffs R^a, P^a, R^d, P^d; budget B; tolerance ε.
1:   Initialize u* ← −∞
2:   for all ordered pairs of targets (i, j), i ≠ j do
3:       for β ∈ {0, ε, 2ε, …, B} do
4:           R̃^a ← R^a + β e_i − (B − β) e_j
5:           u ← defender utility of the game (R̃^a, P^a) via ORIGAMI-BS
6:           Repeat Lines 4-5 with the manipulation applied to P^a instead
7:           u* ← max of u* and the values just computed
8:       end for
9:   end for
10:   Output:  u*
Algorithm 2 PTAS for a special case in L^1-norm form
Theorem 4.

Alg. 2 returns an additive approximate solution, with an error proportional to the tolerance ε.
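The search space of Alg. 2 can be sketched as an enumeration (our illustration; evaluating each candidate by solving the induced fixed-payoff game is omitted): every ordered pair (i, j), with target i's attacker payoff raised by β and target j's lowered by B − β, for β stepping through an ε-grid.

```python
def ptas_candidates(n, B, eps):
    # All (i, j, beta): raise target i's attacker payoff by beta and lower
    # target j's by B - beta, with beta on an eps-grid.
    steps = int(round(B / eps))
    return [(i, j, s * eps)
            for i in range(n) for j in range(n) if i != j
            for s in range(steps + 1)]
```

The number of candidates is n(n − 1)(B/ε + 1), polynomial in n for any fixed ε.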

4 Optimizing Payoff with Budget Constraint in Other Forms

In this section, we explore budget constraints in other forms and give corresponding polynomial-time algorithms.

4.1 Weighted L^∞-norm Form

Consider the case where the defender can make changes to R^a_i and P^a_i for every target i, up to extents specified by the weights w^r_i and w^p_i and the budget B. Following previous notation, this requirement can be represented by a budget constraint in weighted L^∞-norm form, i.e., max_i max(w^r_i |δ^r_i|, w^p_i |δ^p_i|) ≤ B. Equivalently, the defender can choose R̃^a_i and P̃^a_i from a given range. A real-world setting for this problem is when a higher level of authority specifies a range of penalties for activities incurring pollution and allows the local agencies to determine the concrete level of penalty for different activities.

We observe that Condition 2 of Theorem 1 still holds in this setting. Therefore, the problem can be solved by simply solving n subproblems. In the subproblem which assumes t is the attack target, we may set the attacker’s reward and penalty of t to the upper bounds of the given ranges and choose the lower bounds for all other targets. With our improved ORIGAMI-BS algorithm, this problem can be solved in O(n^2 log n) time.
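A sketch of the subproblem construction (our helper names; the lo/hi vectors are the per-target payoff ranges induced by the L^∞ budget): the attack target t gets its most attractive admissible payoffs, every other target its least attractive, after which any fixed-payoff solver applies.

```python
def linf_subproblem_payoffs(Ra_lo, Ra_hi, Pa_lo, Pa_hi, t):
    # Attack target t: upper bounds (most attractive to the attacker);
    # all other targets: lower bounds (least attractive).
    n = len(Ra_lo)
    Ra = [Ra_hi[i] if i == t else Ra_lo[i] for i in range(n)]
    Pa = [Pa_hi[i] if i == t else Pa_lo[i] for i in range(n)]
    return Ra, Pa
```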

Theorem 5.

With a budget constraint in weighted L^∞-norm form, solving for the defender’s optimal strategy reduces to solving for the defender’s optimal coverage in fixed-payoff security games.

4.2 L^0-norm Form

In some domains, the defender can make some of the targets special. For example, in wildlife protection, legislators can designate some areas as “core zones”, where no human activity is allowed and much more severe punishment can be carried out. But the defender cannot set all the areas to be core zones. We model such restrictions as setting a limit on the Hamming distance between the original penalty vector and the manipulated penalty vector for the attacker, i.e., ‖P̃^a − P^a‖_0 ≤ B, where B is the budget. Following [Donoho and Elad2003], we refer to it as an L^0-norm form budget constraint for simplicity, even though it is not technically a norm. That is, the defender pays a unit cost to manipulate P^a_i on target i, but the magnitude of change can be arbitrary. The defender needs to choose on which targets to make changes. We do not consider the case where the defender can arbitrarily modify the attacker’s rewards, as it is not practical and leads to a trivial solution: the defender will place all coverage on one attack target and raise its attacker reward so that it is attacked with certainty.

We assume the defender has a budget B which allows her to change the penalties of at most B targets. As before, we first observe that the defender will choose an extreme penalty value once she decides to change the penalty of a target.

Property 1.

There exists an optimal solution in which every manipulated target has its penalty set to the extreme (most severe) value, which effectively removes it from the game; at most B targets are removed this way.


Proof sketch.

When t is the attack target, the defender would like to maximize U^a_t(c) and minimize U^a_j(c) for all j ≠ t. If target j’s penalty is set to a sufficiently severe extreme value, target j will not be attacked, as U^a_j(c) < U^a_t(c) under the optimal coverage. In such a case, target j is effectively removed from the game. ∎

The defender’s problem becomes non-trivial when the budget does not allow removing all targets except the attack target, and we now provide an O(n^3) algorithm (Alg. 3) for solving it. We note that several intuitive greedy algorithms do not work, even in more restrictive game settings. A detailed comparison of our algorithm, several greedy algorithms, and a baseline MILP is provided in Section 5.

First, we sort the targets in decreasing attacker’s reward R^a_i, and let T_j = {1, …, j} for all j. When t is the attack target, by Property 1, the penalty of each target is either unchanged or set to the extreme value; let x_i ∈ {0, 1} indicate whether target i ∈ T_j keeps its original penalty (x_i = 1) or is removed (x_i = 0). We notice that one of the T_j’s encapsulates the attack set in the optimal solution to our problem. That is, in the optimal solution, each target in the attack set is in T_j; those targets not in the attack set either are outside T_j, or, if they are in T_j, have x_i = 0. A proof can be found in the full version. This allows us to formally define a subproblem Q_{j,t}: assuming (i) the optimal attack set is encapsulated by T_j, (ii) the attack target is t, and (iii) no target is covered with certainty, what is the defender’s optimal strategy? A subproblem may be infeasible. First, we show that Q_{j,t} can be solved in O(n) time. From Equation 21, for subproblem Q_{j,t}, we have

c_t = a_t · (m + Σ_{i∈T_j} x_i a_i (R^a_t − R^a_i)) / (Σ_{i∈T_j} x_i a_i),   where a_i = 1 / (R^a_i − P^a_i).   (22)

Then Q_{j,t} reduces to finding B of the x_i’s to set to 0, and setting the rest to 1, so as to maximize the above quotient. As a result, Q_{j,t} is closely connected to the problem of choosing subsets with maximum weighted average, which can be solved efficiently.

Proposition 1.

[Eppstein and Hirschberg1997] Given a set S = {s_1, …, s_n}, real values v_i and positive weights w_i for each s_i ∈ S, and an integer k ≤ n, a subset A ⊆ S of order n − k which maximizes (Σ_{s_i∈A} w_i v_i) / (Σ_{s_i∈A} w_i) can be found in O(n) time.
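For intuition, a brute-force version of this selection problem (exponential time, unlike the linear-time algorithm of Proposition 1, but handy for checking small cases):

```python
from itertools import combinations

def best_weighted_average(values, weights, k):
    # Among subsets obtained by removing exactly k elements, maximize
    # sum(w_i * v_i) / sum(w_i) over the kept elements.
    n = len(values)
    best_avg, best_keep = None, None
    for keep in combinations(range(n), n - k):
        avg = (sum(weights[i] * values[i] for i in keep)
               / sum(weights[i] for i in keep))
        if best_avg is None or avg > best_avg:
            best_avg, best_keep = avg, keep
    return best_avg, best_keep
```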

Lemma 1.

The subproblem Q_{j,t} can be solved in O(n) time.

Proof sketch.

Consider Equation 22. We equate its quotient with a weighted average: for i ∈ T_j, let v_i = R^a_t − R^a_i and w_i = a_i, absorbing the constant m into a mandatory element. By Property 1, we may assume B targets will be removed. Finding a subset S ⊆ T_j, |S| = j − B, to maximize (Σ_{i∈S} w_i v_i) / (Σ_{i∈S} w_i) is then equivalent to maximizing the quotient in Equation 22. ∎

After we find the optimal choices of the x_i’s, we need to verify on Line 6 of Alg. 3 that the attack set is valid. Since c_i = 0 for targets outside the attack set, we need R^a_i ≤ u for every such target i, where u is the attacker’s expected utility as defined in Equation 20. We also need valid coverage probabilities c_i ∈ [0, 1]. These could have been violated by setting some x_i’s to 0.

0:   Input: payoffs R^a, P^a, R^d, P^d; budget B
1:   Initialize u* ← −∞.
2:   for attack set T_j, j ∈ T do
3:       for attack target t ∈ T_j do
4:           x ← MaxWeightedAverage(T_j, t, B)   (Proposition 1)
5:           Set P̃^a_i to the extreme penalty value for x_i = 0 (removing i), and P̃^a_i = P^a_i for x_i = 1.
6:           If solution is valid, i.e. c_i ∈ [0, 1] for all i and R^a_i ≤ u for i ∉ Γ, then update u*
7:           Repeat the inner iteration with budget B − 1 and target t’s penalty manipulated as well.
8:       end for
9:   end for
10:   for target i with largest R^d_i do
11:       for attack target t do
12:           Update u* if m is big enough to cover t with certainty
13:       end for
14:   end for
15:   Output:  u*
Algorithm 3 Algorithm for budget in L^0-norm form

We are now ready to show the main result of this section.

Theorem 6.

There is an O(n^3)-time algorithm for finding the optimal defender strategy with a budget constraint in L^0-norm form.

Proof sketch.

Consider Alg. 3. Since the sorted order of targets is fixed, the T_j’s cover all the attack sets that need to be checked. There are O(n^2) subproblems Q_{j,t}. For each Q_{j,t}, we run a randomized algorithm for the maximum weighted average problem with expected running time O(n) (Line 4); a deterministic alternative exists in [Eppstein and Hirschberg1997]. The subproblems miss the solutions where some target is covered with certainty. In this case, Γ = {t} is the only possible attack set, and the solution is found on Lines 10-14. Such a solution is feasible if, by removing B targets, we can keep the sum of coverage probabilities below the defender’s resources. ∎

5 Experimental Results

5.1 Simulation Results for the L^1 Budget Problem

We compare our branch-and-bound (BnB) algorithm (Alg. 1) with three baseline algorithms: NonConv, multiple MILPs, and a single MILP. NonConv refers to solving the non-convex optimization problems shown in Equations 2-8 using the IPOPT solver [Wächter and Biegler2006] with default parameter settings, which converges to local optima with no global optimality guarantee. The multiple MILPs, as specified by Equations 9-19, and the single MILP formulation, given in the full version, are equivalent and have the approximation guarantee specified in Thm. 2. The original payoff structures are randomly generated integers (in a range that scales with the number of targets n), with penalties obtained by negation. The budget and the weights of the manipulations are randomly generated integers in a fixed range.

We set α small enough to give the desired additive approximation guarantee. Gurobi is used for solving the MILPs and is terminated when either the time limit (15 min) or the optimality gap threshold is reached. For each problem size, we run the experiments on a PC with an Intel Core i7 processor. The solution quality of a particular algorithm A is measured by the multiplicative gap between that algorithm and BnB, i.e., (v_A − v_BnB) / |v_BnB|, where v_A is the best solution value found by algorithm A. Thus a positive (negative) gap indicates a better (worse) solution value than BnB. We report the mean and the standard deviation of the mean of runtime and solution quality in Fig. 1. Small instances refer to problem sizes from 5 to 25; large instances refer to problem sizes from 50 to 250.

[Figure 1: Runtime and solution quality for the L^1 case, with the standard deviation of the mean shown as vertical lines. Panels: (a) Runtime, small instances; (b) Gap, small instances; (c) Runtime, large instances; (d) Gap, large instances.]

For problems of small size (Fig. 1a and 1b), BnB finds better solutions in nearly the same time as NonConv, faster than the other two. Since the budget can easily be indivisible by α, the atomic change we can make, the greedy manipulation cannot be reproduced by the MILPs when such indivisibility happens. On the other hand, BnB first computes a global lower bound using such greedy manipulations, thus creating a gap between BnB and the other two MILP-based algorithms. Indeed, the multiplicative gap between the greedy solution and the optimal solution is small, with low variance, in our experiments. For problems of large size (Fig. 1c and 1d), we only compare BnB and NonConv, as the other two algorithms timed out solving the MILPs. BnB runs faster than NonConv. It returns better solutions for three problem sizes and nearly the same solutions for the other two. The MILP-based solutions, including BnB, also have a larger standard deviation in runtime than NonConv.

5.2 Simulation Results for the L^0 Budget Problem

We compare the performance of our O(n^3) algorithm with a baseline MILP and two greedy algorithms. Greedy1 removes, one at a time, the target whose removal leads to the largest increase in solution quality. Greedy2 starts from the target with the highest attacker reward and decides whether to remove it by comparing the solution quality before and after removal. Details of these algorithms can be found in the full version.

Initial payoffs are generated in the same way as in the previous subsection. In Fig. 2a, we assume the defender has m = 1 unit of resource and a budget that is worst case for our algorithm. The runtime of the MILP starts to explode with more than 100 targets, while our polynomial-time algorithm solves the problem rather efficiently. We also note that the MILP exhibits high variance in runtime. The variances of the other algorithms, including ours, are negligible and thus not plotted. We then test the algorithms with multiple defender resources, as shown in Fig. 2b. With n targets, we assume the defender has n/10 units of resources. Most MILP instances reach the time limit of 5 minutes as n grows. Yet our algorithm’s runtime is almost the same as in the single-resource case.

Our algorithm and the MILP are guaranteed to provide the optimal solution. In contrast, the greedy algorithms run fast but provide no solution guarantee. We measure the solution quality in Fig. 2c and 2d using (v_A − v_OPT) / |v_OPT|, where v_OPT is the optimal solution value. Greedy1, which runs slightly slower than Greedy2, achieves higher solution quality, but both greedy algorithms can lead to a significant loss. In fact, extreme examples exist, as shown in the full version.

[Figure 2: Runtime and solution quality for the L^0 case, averaged over 22 trials. MILP has a time limit of 300 seconds. Error bars are standard deviations of the mean. Panels: (a) Runtime, resource m = 1; (b) Runtime, resource m = n/10; (c) Solution quality, m = 1; (d) Solution quality, m = n/10.]


Acknowledgments

This research was initiated with the support of the CAIS summer scholar program.


  • [Basilico et al.2016] N. Basilico, A. Lanzi, and M. Monga. A security game model for remote software protection. In ARES ’16, pages 437–443, Aug 2016.
  • [Blocki et al.2013] Jeremiah Blocki, Nicolas Christin, Anupam Datta, Ariel D. Procaccia, and Arunesh Sinha. Audit games. In IJCAI ’13, pages 41–47, 2013.
  • [Blocki et al.2015] Jeremiah Blocki, Nicolas Christin, Anupam Datta, Ariel D. Procaccia, and Arunesh Sinha. Audit games with multiple defender resources. In AAAI’15, pages 791–797, 2015.
  • [Blum et al.2014] Avrim Blum, Nika Haghtalab, and Ariel D Procaccia. Learning optimal commitment to overcome insecurity. In NIPS, 2014.
  • [Conitzer and Sandholm2006] Vincent Conitzer and Tuomas Sandholm. Computing the optimal strategy to commit to. In EC, 2006.
  • [Donoho and Elad2003] David L Donoho and Michael Elad. Optimally sparse representation in general (nonorthogonal) dictionaries via l1 minimization. Proceedings of the National Academy of Sciences, 100(5):2197–2202, 2003.
  • [Durkota et al.2015] Karel Durkota, Viliam Lisý, Branislav Bošanský, and Christopher Kiekintveld. Optimal network security hardening using attack graph games. In IJCAI, 2015.
  • [Eppstein and Hirschberg1997] David Eppstein and Daniel S. Hirschberg. Choosing subsets with maximum weighted average. J. Algorithms, 24(1):177–193, 1997.
  • [Fang et al.2016] Fei Fang, Thanh H. Nguyen, Rob Pickles, Wai Y. Lam, Gopalasamy R. Clements, Bo An, Amandeep Singh, Milind Tambe, and Andrew Lemieux. Deploying PAWS: Field optimization of the protection assistant for wildlife security. In AAAI’16, pages 3966–3973, 2016.
  • [Fujishima et al.1999] Yuzo Fujishima, Kevin Leyton-Brown, and Yoav Shoham. Taming the computational complexity of combinatorial auctions: Optimal and approximate approaches. In IJCAI’99, pages 548–553, 1999.
  • [Horák et al.2017] Karel Horák, Quanyan Zhu, and Branislav Bošanský. Manipulating adversary’s belief: A dynamic game approach to deception by design for proactive network security. In Decision and Game Theory for Security. Springer, 2017.
  • [Kang and Wu2015] Xin Kang and Yongdong Wu. Incentive mechanism design for heterogeneous peer-to-peer networks: A Stackelberg game approach. IEEE Transactions on Mobile Computing, 14(5):1018–1030, 2015.
  • [Kiekintveld et al.2009] Christopher Kiekintveld, Manish Jain, Jason Tsai, James Pita, Fernando Ordóñez, and Milind Tambe. Computing optimal randomized resource allocations for massive security games. In AAMAS ’09, pages 689–696, 2009.
  • [Kiekintveld et al.2011] Christopher Kiekintveld, Janusz Marecki, and Milind Tambe. Approximation methods for infinite Bayesian Stackelberg games: Modeling distributional payoff uncertainty. In AAMAS, 2011.
  • [Kiekintveld et al.2013] Christopher Kiekintveld, Towhidul Islam, and Vladik Kreinovich. Security games with interval uncertainty. In AAMAS ’13, 2013.
  • [Kiekintveld et al.2015] Christopher Kiekintveld, Viliam Lisý, and Radek Píbil. Game-theoretic foundations for the strategic use of honeypots in network security. In Cyber Warfare, pages 81–101. Springer, 2015.
  • [Korzhyk et al.2010] Dmytro Korzhyk, Vincent Conitzer, and Ronald Parr. Complexity of computing optimal Stackelberg strategies in security resource allocation games. In AAAI, 2010.
  • [Laszka et al.2017] Aron Laszka, Yevgeniy Vorobeychik, Daniel Fabbri, Chao Yan, and Bradley Malin. A game-theoretic approach for alert prioritization. In AAAI-17 Workshop on Artificial Intelligence for Cyber Security (AICS), 2017.
  • [Letchford and Conitzer2013] Joshua Letchford and Vincent Conitzer. Solving security games on graphs via marginal probabilities. In AAAI, 2013.
  • [Letchford et al.2009] Joshua Letchford, Vincent Conitzer, and Kamesh Munagala. Learning and approximating the optimal strategy to commit to. In Marios Mavronicolas and Vicky G. Papadopoulou, editors, Algorithmic Game Theory, pages 250–262. Springer Berlin Heidelberg, 2009.
  • [Myerson1989] Roger B Myerson. Mechanism design. The New Palgrave: Allocation, Information, and Markets, 1989.
  • [Paruchuri et al.2008] Praveen Paruchuri, Jonathan P. Pearce, Janusz Marecki, Milind Tambe, Fernando Ordonez, and Sarit Kraus. Playing games for security: An efficient exact algorithm for solving Bayesian Stackelberg games. In AAMAS ’08, 2008.
  • [Píbil et al.2012] Radek Píbil, Viliam Lisý, Christopher Kiekintveld, Branislav Bošanský, and Michal Pěchouček. Game theoretic model of strategic honeypot selection in computer networks. In International Conference on Decision and Game Theory for Security, pages 201–220. Springer, 2012.
  • [Schlenker et al.2018] Aaron Schlenker, Omkar Thakoor, Haifeng Xu, Milind Tambe, Phebe Vayanos, Fei Fang, Long Tran-Thanh, and Yevgeniy Vorobeychik. Deceiving cyber adversaries: A game theoretic approach. In AAMAS, 2018.
  • [Sharma and Williamson2007] Yogeshwer Sharma and David P. Williamson. Stackelberg thresholds in network routing games or the value of altruism. In EC ’07, pages 93–102, 2007.
  • [Wächter and Biegler2006] Andreas Wächter and Lorenz T Biegler. On the implementation of an interior-point filter line-search algorithm for large-scale nonlinear programming. Mathematical programming, 106(1):25–57, 2006.
  • [Wang et al.2016] Zhen Wang, Yue Yin, and Bo An. Computing optimal monitoring strategy for detecting terrorist plots. In AAAI’16, pages 637–643, 2016.
  • [Xue et al.2016] Yexiang Xue, Ian Davies, Daniel Fink, Christopher Wood, and Carla P. Gomes. Behavior identification in two-stage games for incentivizing citizen science exploration. In Principles and Practice of Constraint Programming. Springer, 2016.
  • [Yang et al.2012] Dejun Yang, Guoliang Xue, Xi Fang, and Jian Tang. Crowdsourcing to smartphones: Incentive mechanism design for mobile phone sensing. In Mobicom ’12, 2012.
  • [Yin and Tambe2012] Zhengyu Yin and Milind Tambe. A unified method for handling discrete and continuous uncertainty in Bayesian Stackelberg games. In AAMAS ’12, 2012.
  • [Yin et al.2012] Zhengyu Yin, Albert Xin Jiang, Matthew Paul Johnson, Christopher Kiekintveld, Kevin Leyton-Brown, Tuomas Sandholm, Milind Tambe, and John P Sullivan. Trusts: Scheduling randomized patrols for fare inspection in transit systems. In IAAI, 2012.

Appendix A Omitted Algorithms

A.1 ORIGAMI with Binary Search

Input: payoffs
Initialize the binary search bounds on the attacker’s utility.
while the bounds have not converged do
    Let z be the midpoint of the bounds, and compute the induced attack set.
    Calculate the required coverage and its total using Equations 20 - 21.
    if the required coverage exceeds 1 for some target then
        raise the lower bound to z
    else if the total required coverage exceeds the available resources then
        raise the lower bound to z
    else
        lower the upper bound to z; break once within tolerance
    end if
end while
If unused resources remain, redistribute the excess coverage over the attack set.
Assign the computed coverage to targets in the attack set, and zero coverage to all other targets.
Output: the coverage vector
Algorithm 4 ORIGAMI-BS
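The binary search underlying ORIGAMI-BS can be sketched in Python. This is a simplified sketch under standard security-game conventions, not the paper’s exact algorithm: it assumes the attacker receives reward R[t] if target t is attacked while uncovered and penalty P[t] if covered (with R[t] > P[t]), and the defender has m resources; the function name and tolerance parameter are illustrative.

```python
def origami_bs(R, P, m, eps=1e-9):
    """Binary-search the smallest attacker expected utility z enforceable
    with m resources, then recover the induced coverage vector.

    Coverage needed to bring target t's attacker EU down to z:
        c_t = (R[t] - z) / (R[t] - P[t]),  clipped to [0, 1].
    """
    def coverage(z):
        return [max(0.0, min(1.0, (r - z) / (r - p))) for r, p in zip(R, P)]

    lo, hi = min(P), max(R)
    while hi - lo > eps:
        z = (lo + hi) / 2
        if sum(coverage(z)) > m:
            lo = z  # enforcing utility z needs more coverage than available
        else:
            hi = z  # feasible: try to push the attacker's utility lower
    return hi, coverage(hi)
```

For example, with R = [5, 3], P = [0, 0], and one resource, the search converges to z = 1.875 with coverage (0.625, 0.375), which uses exactly one unit of coverage in total.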

A.2 A Single MILP for the Budget Problem


All constraints involving a sub-/superscript that is not summed over apply to the entire valid range of that index.

We first introduce non-negative integer variables and accompanying constraints to replace the absolute value of the payoff change. We use a binary representation for these variables in constraints 24-25: each such variable is written as a weighted sum of 0-1 variables, with powers of two as weights. Recall from Prop. 1 that we can make these assumptions without loss of generality.
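The binary representation step can be illustrated with a short sketch (the names here are illustrative, not the paper’s notation): an integer variable x with 0 ≤ x < 2^K is written as x = Σ_k 2^k b_k with 0-1 variables b_k, so that a product of x with another variable reduces to products with binary variables.

```python
def binary_expand(x, K):
    """Binary expansion of an integer 0 <= x < 2**K into bits b_0..b_{K-1},
    so that x == sum(2**k * b[k])."""
    assert 0 <= x < 2 ** K
    return [(x >> k) & 1 for k in range(K)]

def reconstruct(b):
    """Recover the integer from its bits: x = sum of 2^k * b_k."""
    return sum(2 ** k * bk for k, bk in enumerate(b))
```

For instance, binary_expand(13, 5) yields the bits [1, 0, 1, 1, 0], and reconstruct recovers 13.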

After the above reformulation, the formulation still contains bilinear terms (products of two decision variables). We introduce real-valued auxiliary variables and constraints 31-32 to enforce that each auxiliary variable equals the corresponding product. We then introduce binary variables to indicate whether each target is in the attack set (the set of targets with the highest attacker utility), and constraints 34-35 to enforce that the attacked target has the highest attacker expected utility. Constraint 36 upper-bounds the objective variable by the defender’s expected utility at the attacked target; therefore, maximizing it yields the defender’s optimal expected utility.
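The product-linearization step follows the standard pattern for a product w = x·y of a binary variable x and a bounded continuous variable y ∈ [0, U]. The sketch below (with illustrative names; the paper’s constraints 31-32 play an analogous role) checks that the four linear constraints pin w down to exactly x·y.

```python
def product_bounds(x, y, U):
    """Feasible range of w under the standard linearization of w = x*y,
    with x binary and 0 <= y <= U:
        w <= U*x,   w <= y,   w >= y - U*(1 - x),   w >= 0.
    """
    lo = max(0.0, y - U * (1 - x))
    hi = min(U * x, y)
    return lo, hi

# The constraints are exact: for every binary x and feasible y,
# the feasible interval collapses to the single point x*y.
U = 10.0
for x in (0, 1):
    for y in (0.0, 2.5, 10.0):
        lo, hi = product_bounds(x, y, U)
        assert lo == hi == x * y
```

Intuitively, when x = 0 the first constraint forces w = 0, and when x = 1 the second and third constraints squeeze w to equal y.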

A.3 Baseline Algorithms for the Budget Problem