
Learning and Planning in Feature Deception Games

by   Zheyuan Ryan Shi, et al.

Today's high-stakes adversarial interactions feature attackers who constantly breach the ever-improving security measures. Deception mitigates the defender's loss by misleading the attacker to make suboptimal decisions. In order to formally reason about deception, we introduce the feature deception game (FDG), a domain-independent game-theoretic model and present a learning and planning framework. We make the following contributions. (1) We show that we can uniformly learn the adversary's preferences using data from a modest number of deception strategies. (2) We propose an approximation algorithm for finding the optimal deception strategy and show that the problem is NP-hard. (3) We perform extensive experiments to empirically validate our methods and results.



1 Introduction

The world today poses more challenges to security than ever before. Consider cyberspace or the financial world where a defender is protecting a collection of targets, e.g. servers or accounts. Despite the ever-improving security measures, malicious attackers work diligently and creatively to outstrip the defense [Potter and Day2009]. Against an attacker with previously unseen exploits and abundant resources, the attempt to protect any target is almost surely a lost cause [Hurlburt2016]. However, the defender could induce the attacker to attack a less harmful, or even fake, target. This can be seen as a case of deception.

Deception has been recognized as an important tactic in military operations for millennia [Latimer2001]. More recently, deception has been extensively studied in cybersecurity [Jajodia et al.2016, Horák et al.2017]. Cyberattackers use tools such as Nmap [Lyon2009] to probe the target network. Security researchers have proposed many deceptive measures to manipulate the network’s replies to these probes [Jajodia et al.2017, Albanese et al.2016], which can confound and mislead an attack attempt. In addition, honey-X, such as honeypots, honey users, and honey files, have been proposed and implemented to attract and induce attackers to attack these fake targets [Spitzner2003]. For example, Nakashima [Nakashima2013] reported that country A once created encrypted, but fake, files labeled with the names of country B’s military systems and put them in folders marked for sharing with country A’s intelligence agency. Using these sensitive filenames as bait, country A successfully lured the hackers on the other end to these decoy targets.

Be it commanding an army or protecting a computer network, a common characteristic is that the attacker gathers information about the defender’s system via surveillance to make decisions, and the defender can (partly) control how her system appears to the attacker. We formalize this view of the defender’s system as features, and propose the feature deception game (FDG) to model the strategic interaction between the defender and the attacker.

It is evident that the FDG model can be applied to many domains, by appropriately defining the relevant set of features. To be concrete, we ground our discussion in cybersecurity, where a cyberattacker observes the features of each network node and then chooses a node to compromise. The left column of Table 1 presents some relevant features. Note that these features can be continuous or discrete. If an intruder has an exploit for Windows machines, a Linux server might not be attractive to him. If the attacker is interested in exfiltration, he might choose a machine running database services. Based on such information, the defender could strategically make machines that lead to extensive harm if compromised appear undesirable to the attacker, by changing the feature values, as illustrated in Table 1. However, before doing so, she needs to learn the attacker’s preferences from attack patterns in order to make an informed decision.

Feature                                          Observable value   Hidden value
Operating system                                 Windows 2016       RHEL 7
Service version                                  v1.2               v1.4
IP address
Open ports                                       22, 445            22, 1433
Round trip time for probes [Shamsi et al.2014]   16 ms              84 ms

Table 1: Some relevant features for cybersecurity

Our Contributions

We make four key contributions. First, we propose the FDG model, which abstracts the information relevant to decision-making as features. In an FDG, the defender manipulates the features of each target in the system. Upon observing the features, the attacker chooses a target to attack based on his preferences in a stochastic way.

Our second contribution is a sample complexity analysis of learning in FDG. We show that to learn a restricted class of attacker’s preferences, the defender needs only gather a polynomial number of data points on a modest number of feature configurations. The sample complexity is dependent on the number of features and the norm of the inverse feature difference matrix (as introduced later).

Third, we analyze the planning problem of finding the optimal deception strategy once the attacker’s preferences are learned. We show that the problem is NP-hard in general and propose an approximation algorithm.

Finally, we perform extensive experiments to validate our results. We demonstrate that we can learn the attacker’s behavior model using a modest amount of data. Our planning algorithm can solve FDGs with 200 targets in 1 minute. Our combined learning and planning framework can find a deception strategy that is almost as good as the optimal strategy had we known the attacker’s true preferences.

2 The Model

In an FDG, a defender aims to protect a set of n targets from an adversary. Each target has a set of features which the defender can strategically manipulate. The adversary observes these features and then chooses a target to attack. The defender incurs a loss l_i if the adversary chooses to attack target i. (Typically, the loss is non-negative, but it might be negative if, for example, the target is set up as a decoy or honeypot, and allows the defender to gain information about the attacker.) The defender’s objective is to minimize her expected loss.


Features are the key element of the FDG model. Each feature has an observable value and a hidden value. The hidden value is fixed, while the defender controls the observable value. Only the observable values are visible to the adversary. This ties into the notion of deception, where one may think of the hidden value as representing the ‘ground truth,’ whereas the observable value is what the defender would like the attacker to see. Table 1 shows an example of the observable and hidden values of different features in cybersecurity.

Deception means that the defender manipulates the attacker’s perceived value of a target, not the actual value. Thus, changing the observable values does not affect the defender’s loss at each target.

Feature representation

We represent the observable feature values of target i by a vector x_i = (x_i^1, ..., x_i^m), where m is the number of features. We denote the corresponding hidden values by y_i = (y_i^1, ..., y_i^m). We allow for both continuous and discrete features. In practice, we may have categorical features, such as the type of operating system, and they can be represented using one-hot encoding with binary features.

Feasibility constraints

For a feature with hidden value y, the defender can set its observable value x in F(y), where the feasible set F(y) is determined by the hidden value. In the sequel, for continuous features, we assume F(y) takes the form of an interval around y. This captures the feasibility constraint in setting up the observable value of a feature based on its hidden value. For binary features, F(y) is a subset of {0, 1}. In addition to these feasibility constraints for individual features, we also allow for linear constraints over multiple features, which can encode natural constraints for categorical features with one-hot encoding, e.g. sum_k x_i^k = 1 over the binary features encoding one categorical feature. They may also encode realistic considerations when setting up the observable features. For example, x_i^Linux + x_i^ActiveX <= 1 could mean that a Linux machine cannot possibly have ActiveX available.
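As a small illustration, the individual feasibility checks above can be sketched as follows (the interval form for continuous features, the [0, 1] feature range, and the function names are our assumptions, not the paper’s):

```python
def feasible_continuous(x, y, delta):
    """Observable value x must stay within delta of the hidden value y, inside [0, 1]."""
    return abs(x - y) <= delta and 0.0 <= x <= 1.0

def feasible_one_hot(bits):
    """One-hot-encoded categorical feature: exactly one category may be active."""
    return all(b in (0, 1) for b in bits) and sum(bits) == 1
```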

Budget constraint

Deception comes at a cost. We assume the cost is additive across targets and features: C(x, y) = sum_i sum_k c^k |x_i^k - y_i^k|, where c^k >= 0. For a continuous feature k, c^k represents the cost associated with a unit of change from the hidden value to the observable value. If feature k is binary, c^k defines the cost of switching states. The defender has a budget B to cover these costs: C(x, y) <= B.
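The additive cost and budget check can be sketched as follows (function names and example numbers are ours, not the paper’s):

```python
def deception_cost(x, y, c):
    """Additive deception cost: x, y are per-target feature vectors
    (observable and hidden values); c holds per-feature unit costs."""
    return sum(
        c[k] * abs(xi[k] - yi[k])
        for xi, yi in zip(x, y)
        for k in range(len(c))
    )

def within_budget(x, y, c, B):
    """A configuration is feasible only if its total cost fits the budget B."""
    return deception_cost(x, y, c) <= B
```

For example, flipping one feature with unit cost 2.0 yields total cost 2.0, which fits a budget of 2.0 but not 1.9.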

Defender strategies

The defender’s strategy is an observable feature configuration x = (x_1, ..., x_n). We assume the defender uses only pure strategies. We discuss the relaxation of this assumption in Section 7.

Attacker strategies

The attacker’s pure strategy is to choose a target to attack. Since human behavior is not perfectly rational, we reason about the adversary using a general class of bounded rationality models. We assume the attacker’s utilities are characterized by a score function f over the observable features of a target. Given the observable feature configuration x, he attacks target i with probability p_i(x) = f(x_i) / sum_j f(x_j). We assume that f is parameterized by a vector w, and takes the form of various function classes, e.g., a neural network. Given the strong expressive power of neural networks, such a model can approximate a large class of actual behavioral models of the attacker.
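A minimal sketch of this attacker model, instantiated with the exponential score f(x_i) = exp(w · x_i) used later in Eq. (1) (names and numbers are illustrative):

```python
import math

def score(w, xi):
    """Exponential score of one target's observable feature vector xi."""
    return math.exp(sum(wk * xk for wk, xk in zip(w, xi)))

def attack_probs(w, x):
    """Attack probabilities: scores normalized across all targets."""
    s = [score(w, xi) for xi in x]
    z = sum(s)
    return [si / z for si in s]
```

With w = [1, -1], a target showing features [1, 0] draws most of the attack probability, while a target showing [0, 1] draws little.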

The ultimate goal of the defender is to find the optimal feature configuration against an unknown attacker. This may be decomposed into two subtasks. First, she learns the attacker’s behavior model from attack data (learning). Then, she manipulates the feature configuration to minimize her expected loss (planning), based on the learned attacker’s preferences. In the following sections, we first analyze the sample complexity of the learning task and then propose algorithms for the planning task.

3 Learning the Adversary’s Preferences

The defender attempts to learn the adversary’s score function f from a set of labeled data points. A data point is denoted (N, x, i): N is the set of targets in this example, x is the observable feature configuration of the targets in N, and the label i indicates that the adversary chooses target i to attack.

A general approach for learning the score function f, parameterized by w, is maximum-likelihood estimation, i.e.,

max_w sum_{(N, x, i)} log [ f(x_i; w) / sum_{j in N} f(x_j; w) ].

Assuming f is Lipschitz and differentiable in w, we can apply any gradient-based optimizer, e.g. RMSProp [Hinton et al.2012], to solve the optimization problem for w. This approach can be applied to general score functions, though it is not guaranteed to find the optimal solution given the non-convexity of the objective.
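For the exponential score, the objective above is a conditional-logit log-likelihood, whose gradient for a sample (x, i) is x_i - sum_j p_j(x) x_j. A hedged sketch using plain gradient ascent in place of RMSProp (step size and iteration count are arbitrary choices of ours):

```python
import math

def attack_probs(w, x):
    """Normalized exponential scores, as in the attacker model."""
    s = [math.exp(sum(a * b for a, b in zip(w, xi))) for xi in x]
    z = sum(s)
    return [si / z for si in s]

def mle_fit(data, m, steps=2000, lr=0.1):
    """data: list of (x, i) pairs; x is a list of feature vectors, i the attacked target."""
    w = [0.0] * m
    for _ in range(steps):
        grad = [0.0] * m
        for x, i in data:
            p = attack_probs(w, x)
            for k in range(m):
                # gradient of the log-likelihood: x_i - sum_j p_j x_j
                grad[k] += x[i][k] - sum(p[j] * x[j][k] for j in range(len(x)))
        w = [wk + lr * g / len(data) for wk, g in zip(w, grad)]
    return w
```

On a toy dataset where the target showing feature 1 is always attacked, the fitted w grows positive, putting most attack probability on that target.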

To analyze the sample complexity of learning the adversary’s preferences, we now proceed with a different learning algorithm and a special form of score function, namely a one-layer neural network followed by an exponential function, parameterized by w:

f(x_i) = exp(w · x_i). (1)


We show that, in an FDG with m features, the defender can learn the attacker’s behavior model correctly with high probability, using only m observable feature configurations and a polynomial number of samples. We view this condition as very mild: even if the network administrator’s historical dataset does not meet the requirement, she could set up a honeynet to elicit attacks, where she can control the feature configurations at each target [Spitzner2003].

Haghtalab et al. [Haghtalab et al.2016] studied a closely related problem on the sample complexity in Stackelberg security games. However, their techniques cannot be directly applied: in security games the coverage probability is the only “feature” that the defender controls, while in FDG there can be an arbitrary number of features. We leverage the high-level idea in [Haghtalab et al.2016], and introduce the inverse feature difference matrix D^{-1}. Specifically, given m observable feature configurations x_(1), ..., x_(m) and a fixed pair of targets, say targets 1 and 2, let D be the m x m matrix whose (j, k)-entry is the difference in feature k between targets 1 and 2 under configuration x_(j). D captures the correlation among the different feature configurations. We use the matrix norm of D^{-1} to bound the learning error. In doing so, we also eliminate the technical conditions they imposed on defender strategies. Our result is formally stated as the following theorem; its proof is deferred to Appendix B.1.

Theorem 3.1.

Consider m observable feature configurations x_(1), ..., x_(m) with D invertible. Let beta = ||D^{-1}||_inf, where ||·||_inf is the matrix norm induced by the l_inf vector norm, i.e. ||A||_inf = max_{||z||_inf = 1} ||Az||_inf. With poly(n, m, beta, 1/epsilon, log(1/delta)) samples for each of the m feature configurations, with probability at least 1 - delta, we can uniformly learn f within multiplicative error epsilon.

Proof sketch.

For feature configuration x_(j), let p_(j),i be the attack probability on target i, and assume p_(j),i > 0 for all j and i. Construct a system of equations Dw = b, where b_j = log(p_(j),1 / p_(j),2). To find w, we solve the equations D w-hat = b-hat, where b-hat is based on the empirical attack distributions. Using a concentration argument, we can bound the difference ||b - b-hat||_inf. With the norm of D^{-1}, we can then bound the difference ||w - w-hat||_inf. ∎

The beta in the theorem above need not be large. Consider a sequence of feature configurations x_(1), ..., x_(m), and focus on targets 1 and 2. For each k, let the features on target 1 be identical to those on target 2, except for the k-th feature, where x_(k),1^k = 1 and x_(k),2^k = 0. This leads to D = I, and thus beta = ||D^{-1}||_inf = 1. This is compatible with the binary encoding of categorical features, if we represent the default category as all 0’s.
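Under this D = I construction, solving Dw = b reduces to reading off w_k = log(p_1 / p_2) from the k-th configuration. A sketch using exact attack probabilities (in practice b would be formed from empirical attack frequencies; function names are ours):

```python
import math

def attack_probs(w, x):
    """Normalized exponential scores f(x_i) = exp(w . x_i)."""
    s = [math.exp(sum(a * b for a, b in zip(w, xi))) for xi in x]
    z = sum(s)
    return [si / z for si in s]

def closed_form_recover(true_w, m):
    """Recover w one coordinate at a time: targets 1 and 2 differ
    only in feature k, so log(p_1 / p_2) = w_k exactly (D = I)."""
    w_hat = []
    for k in range(m):
        x1 = [1.0 if j == k else 0.0 for j in range(m)]  # target 1
        x2 = [0.0] * m                                   # target 2
        p = attack_probs(true_w, [x1, x2])
        w_hat.append(math.log(p[0] / p[1]))              # entry b_k of b
    return w_hat
```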

When the score function is approximately learned, the optimal feature configuration assuming the learned score function f-hat is also near optimal against the true score function f. Let L(x; f) be the defender’s expected loss using feature configuration x assuming score function f, and L(x; f-hat) the corresponding loss assuming f-hat. We can adapt Theorem 3.7 of Haghtalab et al. [Haghtalab et al.2016] to obtain the following result, whose proof is included in Appendix B.2.

Theorem 3.2.

Suppose that for some epsilon > 0, |f-hat(x_i) - f(x_i)| <= epsilon f(x_i) for all feature vectors x_i. Then, |L(x; f-hat) - L(x; f)| = O(epsilon) max_i |l_i| for all x. Furthermore, let x* minimize L(x; f) and x-hat minimize L(x; f-hat); then L(x-hat; f) - L(x*; f) = O(epsilon) max_i |l_i|.

4 Computing the Optimal Feature Configuration

We now embark on our second task: assuming the (learned) adversary’s behavior model, compute the optimal observable feature configuration to minimize the defender’s expected loss. This can be formulated as an optimization problem.

In general, this optimization problem is non-convex and very difficult to solve. We show that the decision version of FDG is NP-complete; hence, finding the optimal feature configuration is NP-hard. In fact, this holds even when there is only a single binary feature and the score function takes the form in Eq. (1).

Theorem 4.1.

FDG is NP-complete.


We reduce from the Knapsack problem: given item weights w_1, ..., w_n, values v_1, ..., v_n, capacity W, and threshold V, decide whether there exists S ⊆ {1, ..., n} such that sum_{j in S} w_j <= W and sum_{j in S} v_j >= V.

We construct an instance of FDG. Let the targets correspond to the items, and let there be a single binary feature, i.e. x_i in {0, 1} for each target i; since there is only one feature, we drop the feature index. Suppose each target’s hidden value of the feature is 0. Consider a score function f such that f(1) = M · f(0), where the constant M is chosen large enough. The losses l_i are derived from the item values, the cost of flipping target i’s feature is c_i = w_i, and the budget is B = W.

For a solution S to the Knapsack instance, we construct a solution to the above FDG where x_i = 1 for i in S, and x_i = 0 otherwise. The budget constraint is satisfied if and only if sum_{i in S} w_i <= W. With M large enough, the attack probability concentrates on the flipped targets, and the defender’s expected loss is at most the decision threshold if and only if sum_{i in S} v_i >= V. Thus, S is a certificate of Knapsack if and only if the corresponding x is feasible for FDG and the defender’s expected loss is at most the threshold. ∎

We present an approximation algorithm based on mixed integer linear programming (MILP). Given w, scaling the score by a constant factor does not affect the attack probability. Thus, letting ||w||_1 denote the l_1 norm of w, we treat the score function as f(x_i) = exp(w · x_i - ||w||_1), whose exponent lies in the interval [-2||w||_1, 0].

Let W = ||w||_1. We divide the interval [-2W, 0] into K subintervals, each of length 2W/K. On the subinterval [a_k, a_{k+1}], with a_k = -2W + 2Wk/K, we approximate the function e^z with the line segment connecting the points (a_k, e^{a_k}) and (a_{k+1}, e^{a_{k+1}}). We use this method to approximate the score of each target, f(x_i), in the following mathematical program. We represent the exponent as w · x_i - W = -2W + sum_k lambda_i,k, where each variable lambda_i,k in [0, 2W/K] indicates the quantity the exponent takes up on the k-th subinterval. The constraints in Eq. (6)-(7) ensure that lambda_i,k > 0 only if lambda_i,k-1 is at its maximum 2W/K. While the formulation presented in Eq. (2)-(9) is not technically a MILP, we can linearize the objective and the constraint involving an absolute value; we avoid doing so here for clarity. The MILP formulation is relegated to Appendix A.
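The secant approximation of e^z can be sketched as follows (the interval [-2W, 0] and the secant construction follow our reading of the text; K controls the error, and since e^z is convex, the secant never falls below it):

```python
import math

def pwl_exp(z, W, K):
    """Piecewise-linear (secant) approximation of e^z on [-2W, 0] with K subintervals."""
    lo, hi = -2.0 * W, 0.0
    h = (hi - lo) / K
    k = min(int((z - lo) / h), K - 1)          # index of z's subinterval
    a, b = lo + k * h, lo + (k + 1) * h        # subinterval endpoints
    slope = (math.exp(b) - math.exp(a)) / h    # secant slope
    return math.exp(a) + slope * (z - a)
```

With W = 1 and K = 100, the approximation error over [-2, 0] is already far below 10^-3.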

[Mathematical program, Eq. (2)-(9): minimize the approximated expected loss subject to the budget constraint, the per-feature feasibility constraints, the piecewise-linear score constraints (6)-(7), and the categorical constraints (8).]

We can now establish the following bound, whose proof appears in Appendix B.3.

Theorem 4.2.

Given epsilon > 0 and a suitable number of subintervals K, the MILP above is a (1 + epsilon)-approximation to the original problem.

Proof sketch.

We first analyze the tightness of the linear approximation. Using the inequalities 1 + z <= e^z for all z and e^z <= 1/(1 - z) for z < 1, we can uniformly bound the multiplicative approximation error of the piecewise-linear score. This allows us to bound the difference between the true and approximated defender’s expected utility on any feature configuration. We then use the triangle inequality to connect the approximate solution and the true solution. ∎

While the mathematical program in Eq. (2)-(9) could be transformed into a MILP, the necessary linearization introduces many additional variables, increasing the size of the problem. To improve scalability, we perform binary search on the objective value t. Specifically, the objective at each iteration of the binary search becomes

minimize sum_i (l_i - t) f(x_i), (10)

since the expected loss is at most t if and only if the minimum of Eq. (10) is at most 0.
The complete procedure is given as Alg. 1 in Appendix A. With the objective in Eq. (10), we no longer need to perform linearization to obtain a MILP. This leads to significant performance improvement as we show later in the experiments. We also preserve the approximation bound; the proof appears in Appendix B.4.
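The binary-search idea can be sketched on a toy instance; here the inner minimization is brute force over a small explicit feasible set, whereas the paper solves a MILP (all names and numbers are illustrative):

```python
import math
from itertools import product

def best_loss_binary_search(losses, w, feasible, tol=1e-4):
    """Binary search on the loss threshold t. A threshold is achievable iff
    some feasible x satisfies sum_i (l_i - t) f(x_i) <= 0, which is
    equivalent to its expected loss being at most t."""
    def f(xi):
        return math.exp(w * xi)  # single-feature exponential score

    def achievable(t):
        return any(
            sum((l - t) * f(xi) for l, xi in zip(losses, x)) <= 0
            for x in feasible
        )

    lo, hi = min(losses), max(losses)  # expected loss always lies in this range
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if achievable(mid):
            hi = mid
        else:
            lo = mid
    return hi
```

On two targets with losses 1 and 10 and one binary feature, the search recovers the loss of the best configuration, which makes the low-loss target attractive.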

Theorem 4.3.

Given the approximation parameter epsilon and binary-search tolerance tau, the binary search procedure gives a (1 + epsilon)-approximation up to an additive tau.

In addition, we propose two exact algorithms for special cases of FDG, which can be found in Appendix C. When the deception cost is associated with discrete features only, we provide an exact MILP formulation. When there are no budget and feasibility constraints, we can find the optimal defender strategy in polynomial time.

5 Experiments

We present the experimental results for our learning and planning algorithms separately, and then combine them to demonstrate the effectiveness of our learning and planning framework. All experiments are carried out on a 3.8GHz Intel Core i5 CPU with 32GB RAM. We use RMSProp as our gradient-based optimizer, Ipopt as our non-convex solver, and CPLEX 12.8 as the MILP solver. All results are averaged over 20 instances; error bars represent the standard deviations.

5.1 Learning

Simple score function

First, we assume the adversary uses the score function specified in Eq. (1). The defender learns this score function using either the closed-form estimation (CF) in Theorem 3.1, or a gradient-based algorithm (GD).

We study how the learning accuracy changes with the training set size. We sample the parameters w of the true score function uniformly at random, and then generate feature configurations uniformly at random. For each configuration, we repeatedly sample the attacked target according to the true attack distribution, obtaining the training set. We also generate a test set of feature configurations sampled uniformly at random. We measure the learning error as the mean total variation distance, over the test set, between the attack distribution induced by the learned model and that of the true model.
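The error metric can be sketched as follows (function names are ours):

```python
def tv_distance(p, q):
    """Total variation distance between two attack distributions."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))

def mean_tv_error(true_dists, learned_dists):
    """Mean TV distance over the test set of feature configurations."""
    pairs = list(zip(true_dists, learned_dists))
    return sum(tv_distance(p, q) for p, q in pairs) / len(pairs)
```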

Figure 1(a) shows that for both learning approaches, the learning error decreases as we increase the number of samples. In general, GD outperforms CF. This is expected, as the size of the training dataset is far smaller than that implied by Theorem 3.1. The learning error increases for both methods when we have more features, and for CF, more targets as well.

Figure 1: Experimental results. (a) Learning the 1-layer score function; (b) learning the 3-layer score function; (c)-(d) planning with the 1-layer score function; (e)-(h) planning with the 3-layer score function; (i)-(j) learning + planning with the 1-layer score function; (k)-(l) learning + planning with the 3-layer score function.

Complex score function

We assume the adversary uses a 3-layer neural network score function, whose details are in Appendix E. Since CF is no longer applicable, for each training set size we generate that many feature configurations and sample one attacked target for each of them. Fig. 1(b) shows GD can reduce the learning error to below 0.15.

We also measured the error in the score function parameter w itself, which directly relates to the sample complexity bound in Theorem 3.1. We include the results in Appendix D.

5.2 Planning

We test our algorithms on finding the optimal feature configuration against a known attacker model. The FDG parameter distributions are included in Appendix E. Experiments on our special case algorithms are included in Appendix D.

Simple score function

Fig. 1(c) shows that the MILP in Eq. (2)-(9) becomes impractical as the number of targets grows, whereas the binary search extension (MILPBS) scales up to problems with 200 targets. We note that this is already at the scale of many real-world problems. Fig. 1(d) shows that MILPBS also scales better in the number of features. We set the MILP’s error bound at 0.005; the difference between the two algorithms’ solution qualities is negligible.

Complex score function

When the features are continuous without feasibility constraints, planning becomes a non-convex optimization problem. We can apply the gradient-based optimizer (GD) or a non-convex solver. We also introduce a greedy heuristic inspired by the special case algorithm in Appendix C.2. Greedy (Alg. 2 in Appendix A) finds the feature vectors that maximize and minimize the score, respectively, by GD. It then greedily applies these feature vectors to the targets with extreme losses. Recall that L(x) denotes the defender’s expected loss using feature configuration x. We measure the solution gap of an algorithm alg as L(x_alg) - L(x_GD), where x_alg is the solution from alg. We choose GD as the baseline as it typically yields the best solution.
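Our reading of the Greedy heuristic can be sketched as follows (the actual Alg. 2 is in the paper’s appendix; this simplified version switches the highest-loss targets to the score-minimizing feature vector whenever that lowers the expected loss):

```python
import math

def greedy_assign(losses, x_hi, x_lo, w):
    """x_hi / x_lo: feature vectors maximizing / minimizing the score f = exp(w . x)."""
    def f(x):
        return math.exp(sum(a * b for a, b in zip(w, x)))

    def expected_loss(assign):
        s = [f(x) for x in assign]
        return sum(l * si for l, si in zip(losses, s)) / sum(s)

    # Start with the score-maximizing vector everywhere, then greedily
    # switch targets, highest loss first, to the score-minimizing vector
    # whenever the switch lowers the expected loss.
    assign = [list(x_hi) for _ in losses]
    for i in sorted(range(len(losses)), key=lambda i: -losses[i]):
        trial = [list(x) for x in assign]
        trial[i] = list(x_lo)
        if expected_loss(trial) < expected_loss(assign):
            assign = trial
    return assign, expected_loss(assign)
```

On two targets with losses 10 and 1, the heuristic hides the high-loss target (low score) and exposes the low-loss one.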

Figs. 1(e)-1(h) show the running time and solution gap, fixing either the number of targets or the number of features. GD does well in both dimensions. Greedy lags in solution quality, while the non-convex solver struggles in both. However, these algorithms do not provide any solution guarantees, so we cannot conclude that any of them (including GD) yields a good defender strategy.

5.3 Combining Learning and Planning

We integrate the learning and planning algorithms to examine our full framework. The defender learns a score function f-hat using learning algorithm L. Then, she uses planning algorithm P to find an optimal configuration x-hat assuming f-hat. We measure the solution gap as L(x-hat) - L(x*), where x* is the optimal feature configuration against the true attacker model, computed using MILPBS or GD. We choose them as baselines as they had better solution quality in the previous experiments.

Simple score function

We test the learning algorithms CF and GD, and the planning algorithms MILP and MILPBS. As shown in Fig. 1(i), more training data lead to smaller solution gaps. Consistent with the results in Fig. 1(a), GD learning performs better. Consistent with the planning results, the difference between MILP and MILPBS in solution quality is negligible. With few targets, all algorithms yield solution gaps below 0.1 (Fig. 1(j)). With more targets, CF learning shows unsatisfactory results: compared to Fig. 1(a), we have more targets, which increases the sample complexity. GD learning performs well on all problem sizes.

Complex score function

We test the learning algorithm GD and the planning algorithms GD, Greedy, and the non-convex solver. Fig. 1(k) shows that all algorithms benefit from having more training data, and GD performs the best. Compared to the case with the 1-layer score function, more data are required here to achieve a small solution gap. Since the learning error is small in both cases (Fig. 1(a), 1(b)), this suggests planning is more sensitive to learning error under complex score functions than under simple ones. When given enough data, Fig. 1(l) shows that GD can achieve a solution gap below 0.2 with as many as 50 targets.

6 Related Work

Deception as a game

Deception has been studied in many domains, and of immediate relevance is its use in cybersecurity [Rowe2007]. Studies have suggested that deceptively responding to an attacker’s scanning and probing could be a useful defensive measure [Jajodia et al.2017, Albanese et al.2016]. Schlenker et al. [Schlenker et al.2018] and Wang and Zeng [Wang and Zeng2018] propose game-theoretic models where the defender strategically manipulates the query response to a known attacker. In addition to proposing a domain-independent model, we advance the state of the art by (1) providing a unified learning and planning framework with theoretical guarantees that can deal with unknown attackers, (2) extending the finite “type” space in both papers, where a “type” is defined by the combination of feature values, to an infinite feature space that allows for both continuous and discrete features, and (3) incorporating a highly expressive bounded rationality model, whereas both papers assume perfectly rational attackers.

On the game-theoretic modeling of deception in general, Horák et al. [Horák et al.2017] study a defender that engages an attacker in a sequential interaction. Nguyen et al. [Nguyen et al.2019] offer a complementary view where the attacker aims at deceiving the defender. In [Yin et al.2014, Guo et al.2017, Nguyen et al.2019], deception is defined as deceptively allocating defensive resources. We study feature deception in settings where no effective tools can thwart an attack, which is arguably more realistic in high-stakes interactions. When such tools exist, feature deception is still valuable for strategic defense.

Learning in Stackelberg games

Much work has also been devoted to learning in Stackelberg games. Our work is most directly related to that of Haghtalab et al. [Haghtalab et al.2016]. They consider an SUQR adversary behavior model, and provide a learning guarantee similar to our Theorem 3.1. The only decision variable in their model, the coverage probability, may be viewed as a single feature in FDG. FDG allows for an arbitrary number of features, and this realistic extension makes their key technique inapplicable. Our main learning result also removes the technical constraints on defender strategies present in their work. Sinha et al. [Sinha et al.2016] study learning the adversary’s preferences in a probably approximately correct (PAC) setting. However, their learning accuracy depends heavily on the quality of the distribution from which they sample the defender’s strategies. We provide a uniform guarantee in a distribution-free context. Other papers [Blum et al.2014, Marecki et al.2012, Letchford et al.2009, Peng et al.2019] study the online learning setting with rational attackers. As pointed out by Haghtalab et al. [Haghtalab et al.2016], considering the more realistic bounded rationality scenario allows us to make use of historical data and apply our algorithms more easily in practice.

Planning with boundedly rational attackers

Yang et al. [Yang et al.2012] propose a MILP-based solution similar to our approximation algorithm. We generalize from the coverage probability to arbitrary features, and adopt a more expressive behavior model. Subsequent papers that incorporate learning with such bounded rationality models do not provide theoretical guarantees [Yang et al.2014, Fang et al.2015]. In these papers, manipulating coverage probabilities affects the defender’s utility. This is not the case in FDG, due to the very notion of deception. However, this does not make our model technically easier to analyze in any substantial way.

7 Discussion and Conclusion

We conclude with a few remarks regarding the generality and limitations of our work. First, our model allows for more sophisticated attackers who can see through deception: a singleton feasible set F(y) = {y} indicates that the defender knows the attacker is able to find out the hidden value of that feature.

Second, we assumed the hidden feature values are fixed, as they often represent environmental parameters beyond the defender’s control, or are at least very costly to manipulate. Altering them also does not align conceptually with deception. As a result, we treat the defender’s losses as fixed.

Third, we assumed the defender uses only pure strategies. In many domains such as cybersecurity, it is often too costly to frequently perform system reconfiguration. Thus, the system appears to the attacker as static. We leave to future work to explore the additional strength of mixed strategies in applications where they are appropriate.

Finally, it would be interesting to consider the attacker’s ability to recognize deception after repeated interactions.


This research was sponsored by the Combat Capabilities Development Command Army Research Laboratory and was accomplished under Cooperative Agreement Number W911NF-13-2-0045 (ARL Cyber Security CRA). The views and conclusions contained in this document are those of the authors and should not be interpreted as representing the official policies, either expressed or implied, of the Combat Capabilities Development Command Army Research Laboratory or the U.S. Government. The U.S. Government is authorized to reproduce and distribute reprints for Government purposes notwithstanding any copyright notation hereon.


  • [Albanese et al.2016] Massimiliano Albanese, Ermanno Battista, and Sushil Jajodia. Deceiving attackers by creating a virtual attack surface. In Cyber Deception. Springer, 2016.
  • [Blum et al.2014] Avrim Blum, Nika Haghtalab, and Ariel D Procaccia. Learning optimal commitment to overcome insecurity. In NIPS, 2014.
  • [Fang et al.2015] Fei Fang, Peter Stone, and Milind Tambe. When security games go green: Designing defender strategies to prevent poaching and illegal fishing. In IJCAI, 2015.
  • [Guo et al.2017] Qingyu Guo, Bo An, Branislav Bosanskỳ, and Christopher Kiekintveld. Comparing strategic secrecy and stackelberg commitment in security games. In IJCAI, pages 3691–3699, 2017.
  • [Haghtalab et al.2016] Nika Haghtalab, Fei Fang, Thanh H Nguyen, Arunesh Sinha, Ariel D Procaccia, and Milind Tambe. Three strategies to success: Learning adversary models in security games. In IJCAI, 2016.
  • [Hinton et al.2012] Geoffrey Hinton, Nitish Srivastava, and Kevin Swersky. Neural networks for machine learning, lecture 6a: Overview of mini-batch gradient descent. 2012.
  • [Horák et al.2017] Karel Horák, Quanyan Zhu, and Branislav Bošanskỳ. Manipulating adversary’s belief: A dynamic game approach to deception by design for proactive network security. In GameSec, 2017.
  • [Hurlburt2016] George Hurlburt. “Good enough” security: The best we’ll ever have. Computer, 49(7):98–101, 2016.
  • [Jajodia et al.2016] Sushil Jajodia, VS Subrahmanian, Vipin Swarup, and Cliff Wang. Cyber deception. Springer, 2016.
  • [Jajodia et al.2017] Sushil Jajodia, Noseong Park, Fabio Pierazzi, Andrea Pugliese, Edoardo Serra, Gerardo I Simari, and VS Subrahmanian. A probabilistic logic of cyber deception. IEEE Transactions on Information Forensics and Security, 12(11), 2017.
  • [Latimer2001] Jon Latimer. Deception in War. John Murray, 2001.
  • [Letchford et al.2009] Joshua Letchford, Vincent Conitzer, and Kamesh Munagala. Learning and approximating the optimal strategy to commit to. In SAGT, 2009.
  • [Lyon2009] Gordon Fyodor Lyon. Nmap network scanning: The official Nmap project guide to network discovery and security scanning. Insecure, 2009.
  • [Marecki et al.2012] Janusz Marecki, Gerry Tesauro, and Richard Segal. Playing repeated stackelberg games with unknown opponents. In AAMAS, 2012.
  • [Nakashima2013] Ellen Nakashima. To thwart hackers, firms salting their servers with fake data, 2013.
  • [Nguyen et al.2019] Thanh Hong Nguyen, Yongzhao Wang, Arunesh Sinha, and Michael P. Wellman. Deception in finitely repeated security games. In AAAI, 2019.
  • [Peng et al.2019] Binghui Peng, Weiran Shen, Pingzhong Tang, and Song Zuo. Learning optimal strategies to commit to. In AAAI, 2019.
  • [Potter and Day2009] Bruce Potter and Greg Day. The effectiveness of anti-malware tools. Computer Fraud & Security, 2009(3):12–13, 2009.
  • [Rowe2007] Neil C Rowe. Deception in defense of computer systems from cyber attack. In Cyber Warfare and Cyber Terrorism. IGI Global, 2007.
  • [Schlenker et al.2018] Aaron Schlenker, Omkar Thakoor, Haifeng Xu, Fei Fang, Milind Tambe, Long Tran-Thanh, Phebe Vayanos, and Yevgeniy Vorobeychik. Deceiving cyber adversaries: A game theoretic approach. In AAMAS, 2018.
  • [Shamsi et al.2014] Zain Shamsi, Ankur Nandwani, Derek Leonard, and Dmitri Loguinov. Hershel: Single-packet OS fingerprinting. In ACM SIGMETRICS Performance Evaluation Review, 2014.
  • [Sinha et al.2016] Arunesh Sinha, Debarun Kar, and Milind Tambe. Learning adversary behavior in security games: A pac model perspective. In AAMAS, 2016.
  • [Spitzner2003] Lance Spitzner. The honeynet project: Trapping the hackers. IEEE Security & Privacy, 99(2):15–23, 2003.
  • [Wang and Zeng2018] Wei Wang and Bo Zeng. A two-stage deception game for network defense. In GameSec, 2018.
  • [Yang et al.2012] Rong Yang, Fernando Ordonez, and Milind Tambe. Computing optimal strategy against quantal response in security games. In AAMAS, 2012.
  • [Yang et al.2014] Rong Yang, Benjamin Ford, Milind Tambe, and Andrew Lemieux. Adaptive resource allocation for wildlife protection against illegal poachers. In AAMAS, 2014.
  • [Yin et al.2014] Yue Yin, Bo An, Yevgeniy Vorobeychik, and Jun Zhuang. Optimal deceptive strategies in security games: A preliminary study. In AAAI Spring Symposium on Applied Computational Game Theory, 2014.

Appendix A Deferred Algorithms

We show the MILP formulation for the mathematical program in Eq. (2)-(9). We distinguish the set of continuous features from the set of discrete features. For each discrete feature, we assume that the feature values and the budget have been preprocessed so that Constraint (5) is expressed over binary indicator variables. This transformation simplifies our presentation below.

Categorical constraints: the binary indicators of each discrete feature sum to one, so that exactly one value is selected. (22)

We relate the variables in the MILP above to the FDG variables as follows.


All equations above that involve an index without a summation over it should be interpreted as holding for all values of that index.
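To illustrate the discrete-to-binary transformation assumed above, the following sketch (the function name and the example feature domain are hypothetical, not from the paper) encodes a discrete feature as indicator variables whose sum is one, which is exactly what the categorical constraints enforce:

```python
# Hypothetical illustration of the discrete-to-binary transformation:
# a discrete feature taking one of k values becomes k binary indicators,
# and the categorical constraint requires the indicators to sum to one.

def one_hot(value, domain):
    """Encode a discrete feature value as binary indicators over its domain."""
    if value not in domain:
        raise ValueError("value not in feature domain")
    return [1 if v == value else 0 for v in domain]

# Example: a hypothetical "operating system" feature with three values.
domain = ["linux", "windows", "bsd"]
indicators = one_hot("windows", domain)
assert sum(indicators) == 1  # categorical constraint: exactly one value active
```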

1 Initialize the lower and upper bounds on the defender's utility.
2 while the gap between the bounds exceeds the tolerance do
3       Solve the MILP in Eq. (2)-(9) with objective in Eq. (10).
4       if the objective value reaches the current threshold then
5             Raise the lower bound to the threshold.
6      else
7             Lower the upper bound to the threshold.
8 return the MILP solution from the last iteration in which the lower bound was raised
Algorithm 1 Milp-bs
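The binary-search structure of Milp-bs can be sketched as follows; `solve_milp` is a placeholder for a routine that solves the MILP in Eq. (2)-(9) with the objective in Eq. (10) at a given threshold and returns the objective value together with the solution. All names here are illustrative, not the paper's.

```python
def milp_binary_search(solve_milp, lo, hi, tol=1e-6):
    """Binary search over the defender's utility threshold.

    solve_milp(u) -> (value, solution): placeholder for solving the MILP
    at threshold u. Returns the solution from the last iteration in which
    the lower bound was raised.
    """
    best_solution = None
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        value, solution = solve_milp(mid)
        if value >= mid:   # threshold achievable: raise the lower bound
            lo, best_solution = mid, solution
        else:              # threshold too ambitious: lower the upper bound
            hi = mid
    return best_solution

# Toy stand-in for the solver whose true optimum is 0.7: the returned
# solution converges to the optimum as the bounds tighten.
result = milp_binary_search(lambda u: (0.7, u), 0.0, 1.0)
```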
1 Use a gradient-based method to find the most attractive and the least attractive feature configurations.
2 Sort the targets by the defender's valuation of them.
3 Initialize the feature configuration.
4 while targets remain and the budget is not exhausted do
5       Let the desired configuration for the current target be the attractive or the unattractive one, depending on its valuation.
6       if the cost of this change is at most the remaining budget then
7             Apply the change, decrease the budget, and advance to the next target.
8      if the cost of the alternative change is at most the remaining budget then
9             Apply it, decrease the budget, and advance to the next target.
return the feature configuration
Algorithm 2 Greedy
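The greedy loop can be sketched as follows: candidate changes are processed in decreasing order of a score, and each is applied only if its cost fits in the remaining budget. The data layout and scoring are hypothetical simplifications of the algorithm above.

```python
def greedy_config(candidates, budget):
    """candidates: list of (score, cost, change) tuples.

    Applies changes greedily in decreasing score order, skipping any
    change whose cost exceeds the remaining budget."""
    applied = []
    for score, cost, change in sorted(candidates, key=lambda c: -c[0]):
        if cost <= budget:        # change fits in the remaining budget
            applied.append(change)
            budget -= cost
    return applied

# Toy example: three candidate changes competing for a budget of 5.
changes = [(3.0, 2, "a"), (1.0, 1, "b"), (2.0, 4, "c")]
chosen = greedy_config(changes, 5)
```

Note that, as in many budgeted greedy heuristics, a skipped expensive change does not stop the loop: cheaper changes further down the order may still fit.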

Appendix B Deferred Proofs

B.1 Proof of Theorem 3.1

We require the following lemma.

Lemma B.1.

[Haghtalab et al.2016] Given the observable features and sufficiently many samples, with high probability the empirical attack probabilities are uniformly close to the true attack probabilities, for all targets.
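The flavor of this uniform-convergence lemma can be illustrated empirically: with enough i.i.d. samples, empirical attack frequencies concentrate around the true probabilities. The distribution and sample sizes below are arbitrary illustrations, not the lemma's actual constants.

```python
import random

def empirical_gap(probs, m, seed=0):
    """Max deviation between empirical frequencies and true probabilities
    after m i.i.d. draws from a categorical distribution."""
    rng = random.Random(seed)
    counts = [0] * len(probs)
    for _ in range(m):
        r, acc = rng.random(), 0.0
        for i, p in enumerate(probs):
            acc += p
            if r <= acc:
                counts[i] += 1
                break
        else:
            counts[-1] += 1  # guard against floating-point round-off
    return max(abs(c / m - p) for c, p in zip(counts, probs))

# With many samples the worst-case gap is small, as the lemma suggests.
gap = empirical_gap([0.5, 0.3, 0.2], 20000)
```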

Proof of Theorem 3.1.

Fix . Fix two nodes . For each where , we have


The system of equations above can be represented in matrix form as a linear system in the preference parameters. Let $\|\cdot\|_\infty$ denote the matrix norm induced by the $\ell_\infty$ vector norm, that is, $\|A\|_\infty = \sup_{x \neq 0} \|Ax\|_\infty / \|x\|_\infty$.

It is known that $\|A\|_\infty$ equals the maximum absolute row sum of $A$. In our case, the feature values are bounded in $[0,1]$, and thus the row sums of the coefficient matrix are bounded as well. This yields a bound on the induced norm. Now, choose the deception strategies accordingly, and suppose the coefficient matrix is invertible.
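The row-sum characterization invoked here, that the matrix norm induced by the $\ell_\infty$ vector norm equals the maximum absolute row sum, can be checked numerically. Since the induced norm is attained at an extreme point of the unit ball, enumerating sign vectors gives the exact value:

```python
from itertools import product

def max_row_sum(A):
    """Maximum absolute row sum of a matrix given as a list of rows."""
    return max(sum(abs(a) for a in row) for row in A)

def induced_inf_norm(A):
    """Brute-force sup of ||Ax||_inf over ||x||_inf = 1.

    For the l-infinity norm the supremum is attained at a vertex of the
    unit cube, so enumerating x in {-1, +1}^n is exact."""
    n = len(A[0])
    best = 0.0
    for x in product((-1.0, 1.0), repeat=n):
        Ax = [sum(a * v for a, v in zip(row, x)) for row in A]
        best = max(best, max(abs(y) for y in Ax))
    return best

A = [[1.0, -2.0], [0.5, 0.25]]
assert max_row_sum(A) == induced_inf_norm(A) == 3.0
```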

Let and . Suppose we have samples. From Lemma B.1, for any node and any feature configuration where , with probability . The bound holds for all strategies simultaneously with probability at least , using a union bound argument. In particular, for our chosen nodes and , we have

Define the empirical analogue similarly, using the empirical distribution in place of the true distribution. Then, for each feature, we have

Therefore, we have . Let be such that , i.e. . Observe that

This leads to

For any observable feature configuration ,


B.2 Proof of Theorem 3.2


Let and . Since

we get

That is, . The proof of Theorem 3.7 in [Haghtalab et al.2016] now follows if we redefine their