1 Introduction
Stackelberg security games played between a defender (leader) and an attacker (follower) have been widely studied in the past few years [9, 11, 10, 15]. Most models, including all the deployed security systems in [15], assume that the attacker cannot observe, even partially, the defender’s instantiated pure strategy (i.e., which targets are being protected), so he makes decisions based only on his knowledge of the defender’s mixed strategy. This fails to capture the attacker’s real-time surveillance, by which he may partially observe the deployed pure strategy. For example, the attacker may observe the protection status of a certain target while approaching for an attack; in some security domains, information regarding the protection status of certain targets may leak to the attacker due to real-time surveillance or even an insider threat; further, well-prepared attackers may approach certain adversarially chosen targets to collect information before committing an attack.
Unfortunately, this problem, an issue we refer to as information leakage, has not received much attention in Stackelberg security games. In the literature on patrolling games, attackers’ real-time surveillance is indeed considered [2, 1, 3, 4, 5, 17]. However, all these papers study settings of patrols carried out over space and time, i.e., the defender follows a schedule of visits to multiple targets over time. In addition, they assume that it takes time for the attacker to execute an attack, during which the defender can interrupt the attacker by visiting the attacked target. Therefore, even if the attacker can fully observe the current position of the defender (in essence, the status of all targets), he may not have enough time to complete an attack on a target before being interrupted by the defender. The main challenge there is to create patrolling schedules with the smallest possible time between any two target visits. In contrast, we consider information leakage in standard security game models, where the attack is instantaneous and cannot be interrupted by the defender’s resource reallocation. Furthermore, as may be more realistic in our settings, we assume that information is leaked from a limited number of targets. As a result, our setting necessitates novel models and techniques. We also provide efficient algorithms with complexity analysis.
This paper considers the design of an optimal defender strategy in the presence of partial information leakage. Considering that real-time surveillance is costly in practice, we explicitly assume that information leaks from only one target, though our model and algorithms can be generalized. We start from the basic security game model where the defender allocates resources to protect targets without any scheduling constraint. Such models have applications in real security systems like ARMOR for LAX airport and GUARDS for airports in general [15]. We first show via a concrete example in Section 2 how ignoring information leakage can lead to significant utility loss. This motivates our design of an optimal defending strategy given the possibility of information leakage. We start with a linear program formulation. However, surprisingly, we show that it is difficult to solve the LP even for this basic case, whereas the optimal mixed strategy without leakage can be computed easily. In particular, we show that the defender oracle, a key subproblem used in the column generation technique employed for most security games, is NP-hard. This shows the intrinsic difficulty of handling information leakage. We then approach the problem from three directions: efficient algorithms for special cases, approximation algorithms, and heuristic sampling algorithms that improve upon the status quo. Our experiments support our hypothesis that ignoring information leakage can result in significant loss of utility for the defender, and demonstrate the value of our algorithms.
2 Model of Information Leakage
Consider a standard zero-sum Stackelberg security game with a defender and an attacker. The defender allocates security resources to protect targets, which are denoted by the set . In this paper we consider the case where the security resources do not have scheduling constraints. That is, the defender’s pure strategy is to protect any subset of of size at most . For any , let be the reward [ be the cost] of the defender when the attacked target is protected [unprotected]. We consider zero-sum games, therefore the attacker’s utility is the negation of the defender’s utility. Let denote a pure strategy and be the set of all possible pure strategies. With some abuse of notation, we sometimes regard as a subset of denoting the protected targets; and sometimes view it as an dimensional vector with ’s specifying the protected targets. The intended interpretation should be clear from context. The support of a mixed strategy is defined to be the set of pure strategies with nonzero probabilities. Without information leakage, the problem of computing the defender’s optimal mixed strategy can be compactly formulated as linear program (1) with each variable as the marginal probability of covering target. The resulting marginal vector is a convex combination of the indicator vectors of pure strategies, and a mixed strategy with small support can be efficiently sampled, e.g., by Comb Sampling [16].
(1)
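The following is a minimal sketch of how the no-leakage optimum can be computed, assuming LP (1) takes the standard zero-sum coverage form: maximize u subject to u <= r_i x_i + c_i (1 - x_i) for every target i, sum_i x_i <= k and 0 <= x_i <= 1, with r_i > c_i. Rather than calling an LP solver, the sketch uses bisection on u, a substituted technique that applies to this assumed form. The function name and the symbols n, k, r, c, x, u are ours and are introduced only for illustration.

```python
def solve_no_leak(reward, cost, k, iters=60):
    """Bisection on the defender utility u for the assumed coverage LP.

    Assumes reward[i] > cost[i] for every target i (hypothetical notation).
    Returns (u, x) where x is the marginal coverage vector.
    """
    def coverage(u):
        # Smallest coverage x_i that keeps target i's utility at or above u.
        return [max(0.0, (u - c) / (r - c)) for r, c in zip(reward, cost)]

    # u = min(cost) is always feasible with x = 0; u cannot exceed min(reward).
    lo, hi = min(cost), min(reward)
    for _ in range(iters):
        mid = (lo + hi) / 2
        if sum(coverage(mid)) <= k:
            lo = mid   # mid is achievable with at most k resources
        else:
            hi = mid   # mid needs more than k resources
    return lo, coverage(lo)
```

For instance, with four identical targets (reward 0, cost -1) and two resources, the sketch returns a utility of -0.5 with coverage 0.5 on every target.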
Building on this basic security game, our model goes one step further and considers the possibility that the protection status of one target leaks to the attacker. Here, by “protection status” we mean whether this target is protected or not in an instantiation of the mixed strategy. We consider two related models of information leakage:

PRobabilistic Information Leakage (PRIL): with probability a single target leaks information; and with probability no targets leak information. So we have where is the dimensional simplex. In practice, is usually given by domain experts and may be determined by the nature or property of targets.

ADversarial Information Leakage (ADIL): with probability , one adversarially chosen target leaks information. This model captures the case where the attacker will strategically choose a target for surveillance and with certain probability he succeeds in observing the protection status of the surveyed target.
Given either model – PRIL with any or ADIL – we are interested in computing the optimal defender patrolling strategy. The first question to ask is: why does the issue of information leakage matter and how does it affect the computation of the optimal defender strategy? To answer this question we employ a concrete example.
Consider a zero-sum security game with targets and resources. The profile of rewards [costs ] is [], where the coordinates are indexed by target ids. If there is no information leakage, it is easy to see that the optimal marginal coverage is . The attacker will attack an arbitrary target, resulting in a defender utility of . Now, let us consider a simple case of information leakage. Assume the attacker observes whether target is protected or not in any instantiation of the mixed strategy, i.e., . As we will argue, how the marginal probability is implemented now matters. One way to implement is to protect target with probability and protect with probability . However, this implementation is “fragile” in the presence of the above information leakage. In particular, if the attacker observes that target is protected (which occurs with probability ), he infers that the defender is protecting target and will attack or , resulting in a defender utility of ; if target is not protected, the attacker will just attack, resulting in a defender utility of . Therefore, the defender gets expected utility .
Now consider another way to implement the same marginal by the following mixed strategy:
If the attacker observes that target is protected (which occurs with probability ), then he infers that target is protected with probability , and target are both protected with probability . Some calculation shows that the attacker will have the same utility on target and thus will choose an arbitrary one to attack, resulting in a defender utility of . On the other hand, if target is observed to be unprotected, the defender gets utility . In expectation, the defender gets utility .
As seen above, though implementing the same marginals, the latter mixed strategy achieves better defender utility than the former one in the presence of information leakage. However, is it optimal? It turns out that the following mixed strategy achieves an even better defender utility of , which can be proved to be optimal: protect with probability , with probability and with probability .
This example shows that compact representation by marginal coverage probabilities is not sufficient for computing the optimal defending strategy assuming information leakage. This naturally raises new computational challenges: how can we formulate the defender’s optimization problem and compute the optimal solution? Is there still a compact formulation or is it necessary to enumerate all the exponentially many pure strategies? What is the computational complexity of this problem? We answer these questions in the next section.
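To make the effect concrete, the following sketch evaluates the defender's expected utility when one fixed target always reveals its protection status and the attacker best-responds to the resulting posterior. The instance used in the usage note (four targets indexed 0 to 3, two resources, reward 0 and cost -1 everywhere) is a hypothetical one chosen for illustration, not the example discussed above, and the function name is ours.

```python
from fractions import Fraction

def utility_under_leak(mixed, reward, cost, leak):
    """Expected defender utility when target `leak` always reveals its status.

    `mixed` maps each pure strategy (a frozenset of protected targets) to its
    probability. The game is zero-sum, so the attacker picks the target that
    minimizes the defender's utility given what he observed.
    """
    n = len(reward)
    total = Fraction(0)
    for status in (True, False):
        # Pure strategies consistent with the observed status of `leak`.
        consistent = {s: p for s, p in mixed.items() if (leak in s) == status}
        mass = sum(consistent.values(), Fraction(0))
        if mass == 0:
            continue
        worst = None
        for i in range(n):
            covered = sum((p for s, p in consistent.items() if i in s), Fraction(0))
            # Joint (unnormalized) utility of an attack on i under this observation.
            u = covered * reward[i] + (mass - covered) * cost[i]
            if worst is None or u < worst:
                worst = u
        total += worst
    return total
```

On the hypothetical instance, protecting {0, 1} or {2, 3} with probability 1/2 each and mixing uniformly over all six pairs both give marginal coverage 1/2 on every target. With target 0 leaking, the first implementation yields utility -1 and the second yields -5/6, so the choice of implementation matters even though the marginals coincide.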
3 Computing Optimal Defender Strategy
We will focus on the derivation of the PRIL model. The formulation for the ADIL model is provided at the end of this section since it admits a similar derivation. Fixing the defender’s mixed strategy, let () denote the event that target is protected (unprotected). For the PRIL model, the defender’s utility equals
where is the defender’s utility when there is no information leakage; and
is the defender’s utility when target leaks out its protection status as (i.e., protected) multiplied by probability . Similarly
is the defender’s expected utility multiplied by probability when target leaks status (i.e., unprotected).
Define variables (setting ). Using the fact that and , we obtain the following linear program which computes the defender’s optimal patrolling strategy:
(2) 
where are variables; denotes a pure strategy and the sum condition “” means summing over all the pure strategies that protect both targets and (or if ); denotes the probability of choosing strategy .
Unfortunately, LP (2) suffers from an exponential explosion of variables, specifically, . For the sake of computational efficiency, one natural idea is to find a compact representation of the defender’s mixed strategy. As suggested by LP (2), the variables , indicating the probability that targets are both protected, are sufficient to describe the defender’s objective and the attacker’s incentive constraints.
Let us call variables the pairwise marginals and think of them as a matrix , i.e., the ’th row and ’th column of is (not to be confused with the marginals ). We say is feasible if there exists a mixed strategy, i.e., a distribution over pure strategies, that achieves the pairwise marginals . Clearly, not all are feasible. Let be the set of all feasible . The following lemma shows a structural property of .
Lemma 1.
is a polytope and any is a symmetric positive semidefinite (PSD) matrix.
Proof.
Notice that is feasible if and only if there exists for any pure strategy such that the following linear constraints hold:
(3) 
These constraints define a polytope for variables , therefore its projection to the lower dimension , which is precisely , is also a polytope.
To prove is PSD, we first observe that any vertex of , characterizing a pure strategy, is PSD. In fact, let be any pure strategy, then the pairwise marginal w.r.t. is , which is PSD. Therefore, any , which is a convex combination of its vertices, is also PSD. ∎
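The argument can be checked numerically on small instances. The sketch below builds the pairwise marginal matrix of a mixed strategy as a probability-weighted sum of outer products of 0-1 indicator vectors, and evaluates the quadratic form that the PSD property requires to be nonnegative. The function names and the dictionary representation of a mixed strategy are our own illustrative choices.

```python
from fractions import Fraction

def pairwise_marginals(mixed, n):
    """M[i][j] = probability that targets i and j are both protected.

    `mixed` maps each pure strategy (a frozenset of targets) to its probability.
    The diagonal entry M[i][i] is the ordinary marginal of target i.
    """
    M = [[Fraction(0)] * n for _ in range(n)]
    for s, p in mixed.items():
        for i in s:
            for j in s:
                M[i][j] += p   # adds p times the outer product of s with itself
    return M

def quadratic_form(M, v):
    """v^T M v, which equals the expected value of (v . s)^2 and is thus >= 0."""
    n = len(v)
    return sum(v[i] * M[i][j] * v[j] for i in range(n) for j in range(n))
```

For the uniform distribution over all pairs of four targets, the matrix has 1/2 on the diagonal and 1/6 elsewhere, and the quadratic form at v = (1, -1, 1, -1) evaluates to 4/3.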
(4) 
With Lemma 1, we may rewrite LP (2) compactly as LP (4) with variables , , and . Therefore, we would be able to compute the optimal strategy efficiently in polynomial time if the constraints determining the polytope were only polynomially many – recall that this is the approach we took with LP (1) in the case of no information leakage. However, perhaps surprisingly, the problem turns out to be much harder in the presence of leakage.
Lemma 2.
Optimizing over is NP-hard.
Proof.
We prove this by reduction from the densest subgraph problem. Given any graph instance , let be the adjacency matrix of . Consider the following linear program:
(5) 
This linear program must have a vertex optimal solution which satisfies for some pure strategy . Therefore, the linear objective satisfies
Notice that equals the density of a subgraph of with nodes indicated by . Since is the optimal solution to LP (5), it also maximizes the density over all subgraphs with nodes. In other words, the ability to optimize LP (5) implies the ability to compute the densest subgraph of a given size, which is NP-hard. Therefore, optimizing over is NP-hard. ∎
Lemma 2 suggests that there is no hope of finding polynomially many linear constraints which determine or, more generally, an efficient separation oracle for , assuming P ≠ NP. In fact, is closely related to a fundamental geometric object, known as the correlation polytope, which has applications in quantum mechanics, statistics, machine learning and combinatorial problems. We show a connection between and the correlation polytope in Appendix B. For further information, we refer the reader to [13].
Another approach for computing the optimal defender strategy is to use the technique of column generation, which is a master/slave decomposition of an optimization problem. The essential part of this approach is the slave problem, which is also called the “defender best response oracle” or “defender oracle” for short [6]. We defer the derivation of the defender oracle to Appendix A, and only mention that a reduction similar to the one in the proof of Lemma 2 also implies the following.
Lemma 3.
The defender oracle is NP-hard.
By now, we have shown evidence of the difficulty of solving LP (2) using either marginals or the technique of column generation. For the ADIL model, a similar argument shows that the following LP formulation computes the optimal defender strategy. It is easy to check that it shares the same marginals and defender oracle as the PRIL model.
(6) 
where variable is the defender’s expected utility when an adversarially chosen target is observed by the attacker.
3.1 Leakage from Small Support of Targets
Despite the hardness results for the general case, we show that the defender oracle admits a polynomial time algorithm if information only leaks from a small subset of targets; we call this set the leakage support. By reordering the targets, we may assume without loss of generality that only the first targets, denoted by set , could possibly leak information in both the PRIL and ADIL model. For the PRIL model, this means for any and for the ADIL model, this means the attacker only chooses a target in for surveillance.
Why does this make the problem tractable? Intuitively the reason is as follows: when information leaks from a small set of targets, we only need to consider the correlations between these leaking targets and others, which is a much smaller set of variables than in LP (2) or (6). Restricted to a leakage support of size , the defender oracle is the following problem (see Appendix A for the derivation). Let be a symmetric matrix of the following block form
(7) 
where ; for any integers is a submatrix and, crucially, is a diagonal matrix. Given of form (7), find a pure strategy such that is maximized. That is, the defender oracle identifies the size principal submatrix with maximum entry sum for any of form (7). Note that in the general case.
Before detailing the algorithm, we first describe some notation. Let be the ’th row of matrix and be the vector consisting of the diagonal entries of . For any subset of , let be the submatrix of consisting of rows in and columns in , and be the entry sum of . The following lemma shows that Algorithm 1 solves the defender oracle. Our main insight is that for a pure strategy to be optimal, once the set is decided, its complement can be explicitly identified, therefore we can simply use brute-force search to find the best . Lemma 4 provides the algorithm guarantee, which then yields the polynomial solvability for the case of small (Theorem 1).
Lemma 4.
Let be the size of the leakage support. Algorithm 1 solves the defender oracle and runs in time. In particular, the defender oracle admits a time algorithm if is a constant.
Proof.
First, it is easy to see that Algorithm 1 runs in time since the for-loop is executed at most times. We show that it solves the defender oracle problem.
Let denote the indices of the principal submatrix of with maximum entry sum. Notice that can also be viewed as a pure strategy. Let and . We claim that, given , must be the set of indices of the largest values from the set , where is defined as . In other words, if we know , the set can be easily identified. To prove the claim, we rewrite the as follows:
where and is the subvector of with indices in . Given , is fixed, therefore must be the set of indices of the largest elements from . Algorithm 1 then loops over all the possible ( many ) and identifies the optimal one, i.e., the one achieving the maximum . ∎
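The idea of the proof can be sketched in code as follows. This is our own illustrative rendering of the enumerate-then-select idea, not a transcription of Algorithm 1: it assumes a symmetric n-by-n matrix A whose lower-right (n - m)-by-(n - m) block is diagonal, where the first m indices form the leakage support, and it assumes a pure strategy protects exactly k targets. The names `oracle_small_support` and `oracle_brute_force` are hypothetical.

```python
from itertools import combinations

def oracle_small_support(A, m, k):
    """Maximize the entry sum of a size-k principal submatrix of A.

    Enumerates the chosen subset C1 of the first m indices; for each C1 the
    remaining indices are picked greedily by a per-index score.
    """
    n = len(A)
    best_val, best_set = None, None
    for size in range(0, min(m, k) + 1):
        need = k - size
        if need > n - m:
            continue
        for C1 in combinations(range(m), size):
            inside = sum(A[i][j] for i in C1 for j in C1)
            # Adding index i outside the support contributes its diagonal
            # entry plus twice its interaction with C1 (A is symmetric and
            # indices outside the support do not interact with each other).
            scores = sorted(
                ((2 * sum(A[i][j] for j in C1) + A[i][i], i) for i in range(m, n)),
                reverse=True,
            )
            val = inside + sum(v for v, _ in scores[:need])
            if best_val is None or val > best_val:
                C2 = sorted(i for _, i in scores[:need])
                best_val, best_set = val, tuple(C1) + tuple(C2)
    return best_val, best_set

def oracle_brute_force(A, k):
    """Reference answer by enumerating all size-k index sets."""
    n = len(A)
    return max(sum(A[i][j] for i in S for j in S) for S in combinations(range(n), k))
```

The outer enumeration visits every subset of the leakage support of size at most k, so the number of iterations is exponential only in m. On small matrices of the assumed form, the result can be compared against `oracle_brute_force`.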
Theorem 1.
(Polynomial Solvability) There is an efficient time algorithm which computes the optimal defender strategy in the PRIL and ADIL model, if the size of the leakage support is a constant.
3.2 An Approximation Algorithm
We now consider approximation algorithms. Recall that information leakage is due to the correlation between targets, thus one natural way to minimize leakage is to allocate each resource independently with certain distributions. Naturally, the normalized marginal becomes a choice, where is the solution to LP (1). To avoid the waste of using multiple resources to protect the same target, we sample without replacement. Formally, the independent sampling without replacement algorithm proceeds as follows: 1. compute the optimal solution of LP (1); 2. independently sample elements from without replacement using distribution .
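The two steps above can be sketched as follows, taking the marginal vector from step 1 as given. The sketch draws targets independently from the normalized marginal and discards repeats until the required number of distinct targets has appeared, which is one common way to realize sampling without replacement. The function name and the parameter names `x`, `k` and `rng` are ours.

```python
import random

def sample_without_replacement(x, k, rng=random):
    """Sample k distinct targets using weights proportional to x.

    Targets are drawn independently with probability x[i] / sum(x); a draw
    that repeats an already chosen target is discarded.
    """
    if sum(1 for w in x if w > 0) < k:
        raise ValueError("need at least k targets with positive weight")
    targets = range(len(x))
    chosen = []
    while len(chosen) < k:
        i = rng.choices(targets, weights=x)[0]
        if i not in chosen:
            chosen.append(i)
    return frozenset(chosen)
```

Passing a seeded `random.Random` instance as `rng` makes the draws reproducible. Note that the marginals induced by this procedure need not equal `x` exactly.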
Zero-sum games exhibit negative utilities, therefore an approximation ratio in terms of utility is not meaningful. To analyze the performance of this algorithm we shift all the payoffs by a constant, , and get an equivalent constant-sum game with all nonnegative payoffs. Theorem 2 shows that this algorithm is “almost” a to the optimal solution in the PRIL model, assuming information leaks out from any target with equal probability . We note that proving a general approximation ratio for any turns out to be very challenging, intuitively because the optimal strategy adjusts according to different while the sampling algorithm does not depend on . However, experiments empirically show that the ratio does not vary much for different on average (see Section 5).
Theorem 2.
Assume each target leaks information with equal probability . Let be the shifted cost and be the defender utility achieved by independent sampling without replacement. Then we have:
where is an additive loss to , which is usually small in security games.
Proof of Theorem 2
Let be a function of any , where is the probability that target are both protected using independent sampling without replacement. We first prove Lemma 5, which provides a lower bound regarding how well the pairwise marginals in approximate the given marginals . The difficulty of proving Lemma 5 lies in the fact that does not have a closed form in terms of if we sample without replacement. Our proof is based on a coupling argument that relates the algorithm to independent sampling with replacement.
Lemma 5.
Given , satisfies the following (in)equalities:
(8)  
(9)  
(10) 
Proof.
The first equation is easy to see, since each sampled pure strategy has different targets due to sampling without replacement. To prove the other two inequalities, we instead consider independent sampling with replacement. Similarly, define function to be a function of , where is the probability that target are protected together when sampling with replacement. Contrary to , has succinct closed forms, therefore we can lower bound entries in . We first consider .
where we used the fact for any . Now we lower bound as follows.
where all the equations follow from arithmetic, while the inequality uses the fact that and is a decreasing function of . We now upper bound the term using the formula , as follows
Plugging the above upper bound back into Inequality 3, we thus have
where the last inequality is due to the fact that is a decreasing function for and is upper bounded by .
Therefore, we have . We then conclude the proof by claiming that and .
To prove our claim, we use a coupling argument. Consider the following two stochastic processes (StoP):

: at time independently sample a random value () with probability for any until precisely different elements from show up.

: at time independently sample a random value () with probability for .
Let [] denote all the possible random sequences generated by [], and [] denote the subset of [], which consists of all the sequences including at least one . For any , let be the subset of sequences in , whose first element is precisely . Notice that any sequence in has length at least while any sequence in has precisely elements. Furthermore, and for any and .
Now, think of each sequence as a probabilistic event generated by the stochastic process. Notice that due to the independence of the sampling procedure, therefore, we have
However, and . This proves .
Notice that is equivalent to . To prove this inequality, we claim that it is without loss of generality to assume the first sample is in both processes. This is because, if the first shows up at the ’th sample, moving to the first position would not change the probability of the sequence due to independence between the sampling steps. Conditioned on being sampled first, an argument similar to the one above shows that the probability of stochastic process generating is at least the probability of stochastic process generating . ∎
Let be the optimal solution to LP (1) and be the corresponding objective value, i.e., the defender’s optimal utility with no leakage. To prove Theorem 2, we start by comparing with . From the objective of LP (2), we know that , since is the best possible utility using resources, and since if target is uncovered, the defender gets utility at most . Therefore, we have
(12)  
where we used the equation . We now examine . A simple argument yields that, if for all and each target is covered with probability at least for any , then the defender utility is at least . Therefore, by Lemma 5 we have
(13)  
where we used the fact that . Comparing Inequalities (12) and (13), we have . This concludes our proof of Theorem 2.
4 Sampling Algorithms
From Carathéodory’s theorem we know that, given any marginal coverage , there are many different mixed strategies achieving the same marginal (e.g., see examples in Section 2). Another way to handle information leakage is to generate the optimal marginal coverage , computed by LP (1), with low correlation between targets. Such a “good” mixed strategy, e.g., the mixed strategy with maximum entropy, is usually supported on a pure strategy set of exponential size. In this section, we propose two sampling algorithms, which efficiently generate a mixed strategy with exponentially large support and are guaranteed to achieve any given marginal .
4.1 MaxEntropy Sampling
Perhaps the most natural choice to achieve low correlation is the distribution with maximum entropy restricted to achieving the marginal , which can be formulated as the solution of Convex Program (CP) (14). However, naive approaches for CP (14) require exponential running time since there are variables. Interestingly, it turns out that this can be resolved.
(14) 
where variable is the probability of taking pure strategy .
Theorem 3.
There is an efficient algorithm which runs in time and outputs a pure strategy with probability for any pure strategy , where is the optimal solution to Convex Program (14) (within machine precision; computers cannot solve general convex programs exactly due to possible irrational solutions, so our algorithm is optimal within machine precision, and we simply call it “solved”).
The proof of Theorem 3 relies on Lemmas 6 and 7. Lemma 6 presents a compact representation of based on the KKT conditions of CP (14) and its dual – the unconstrained Convex Program (15):
(15) 
where variables and . We note that the dual program (15) as well as the characterization of in Lemma 6 are not new (e.g., see [14]), and we state them for completeness. Our contribution lies in proving that CP (15) can be solved efficiently in time in our security game setting despite the summation of terms.
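As an illustration of how a sum over exponentially many pure strategies can be evaluated in polynomial time, the sketch below assumes the sum ranges over all subsets of exactly k targets and that each term is a product of per-target weights. Under that assumption the sum is an elementary symmetric polynomial, and a standard dynamic program over targets computes it with O(nk) arithmetic operations. The function name and the weight vector `alpha` are hypothetical; this is a sketch of the general technique rather than the exact program used in the proof.

```python
def subset_product_sum(alpha, k):
    """Sum over all size-k subsets S of the product of alpha[i] for i in S."""
    # T[j] holds the sum over size-j subsets of the targets processed so far.
    T = [1] + [0] * k
    for a in alpha:
        # Iterate j downward so that each target is used at most once.
        for j in range(k, 0, -1):
            T[j] += a * T[j - 1]
    return T[k]
```

With weights (1, 2, 3, 4) and k = 2, the six pairwise products sum to 35, which is what the sketch returns.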
Lemma 6.
Let be the optimal solution to CP (15) and set for any , then the optimal solution of CP (14) satisfies
(16) 
where for any pure strategy .
Furthermore, can be computed in time.
Proof.
As proved in [14], the above is precisely where is the optimal solution to CP (14). We show that can be computed in time. Notice that CP (15) has variables but an expression of exponentially many terms, specifically, . The essential difficulty of computing lies in computing the sum , since the other parts can be explicitly calculated in polynomial time. Fortunately the sum exhibits some combinatorial structure, and combinatorial algorithms can be employed for its computation. In particular, we show that a dynamic program computes the sum in time. The algorithm for computing can be designed in a similar fashion, and is hence left to the reader. Since a convex program can be solved efficiently to machine precision given access to its function value and derivatives, we then conclude our proof by describing the following dynamic program to compute , given any .
Notice that the set of all pure strategies consists of