“If deception can be used for cyber-attacks, can it also be used for defense?” This question was raised by the renowned hacker Kevin Mitnick in his famous book The Art of Deception. (According to the Oxford English Dictionary, to deceive means to “deliberately cause (someone) to believe something that is not true, especially for personal gain.”) The need for such advanced cyber-defense mechanisms stems from the ever-escalating progress of cyber-infiltration. Furthermore, the heavy reliance of governments, industries, and individuals on cyber-infrastructure, driven by the growth of applications of AI and machine learning tools, makes them particularly attractive targets for cyber-terrorism and cyber-crime.
Deception is a familiar technique for war zone strategists and for cyber-infiltrators or hackers. In cyber-infiltration, for example, deception can be achieved by changing malware signatures, social engineering, concealing code, and encrypting exploits. On the other hand, deception defense strategies can use deceits and feints to thwart an infiltrator’s cognitive processes, disrupt the breach process, and delay infiltration activities. Such deception can be carried out via misdirection, obfuscation, and fake responses. These methods rely on the infiltrator’s trust in the network responses and data. For instance, honeypot servers (fake servers that mimic actual servers) are commonly used to actively detect malicious activity and reveal the infiltrator’s strategies. From a cyber-deception standpoint, two factors, namely the amount of deception and the frequency of deception, characterize the cost of implementing a deception mechanism.
Cyber-deception can be mathematically formalized as a non-cooperative two-player dynamic game [1, 24, 29], which can represent the adversarial, sequential decision-making nature of a deception/infiltration scenario and the limited cyber-defense/infiltration resources. In particular, games with imperfect information can represent the information asymmetry that is at the heart of cyber-deception, i.e., the deceiver has full knowledge of the cyber-infrastructure, such as servers, but the infiltrator does not. Recently, a modeling framework for cyber-deception was proposed in terms of one-sided partially observable stochastic games, a special class of partially observable stochastic games (POSGs) in which one player has full observability and the second one does not. Despite this unique modeling paradigm, POSGs are notoriously intractable to solve in general. Three approximate methods have been proposed in the literature for solving POSGs: using a memory-bounded representation of the value function, approximating the POSG by a series of smaller, related Bayesian games using heuristics, or using heuristic search value iteration.
For POSGs, the optimal strategies for the deceiver may require unbounded or even infinite memory. To obtain a tractable formulation, we use finite-memory strategies for the infiltrator, which admit a compact representation. We then search for strong finite-memory infiltrator strategies that induce a low cost for the infiltrator. The set of all finite-memory infiltrator strategies is uncountable, and the measure of the finite-memory strategies that induce a low cost for the infiltrator may be small compared to the set of all finite-memory strategies. Therefore, methods such as evolutionary algorithms or Bayesian methods may not be applicable in this setting. We define the infiltrator strategy synthesis problem as a synthesis problem in parametric MDPs, and we use our previous work to synthesize strategies in a parametric MDP using a convex-concave approach. Finally, using the infiltrator strategies, we compute a robust deceiver strategy that maximizes the worst-case cost over all of these strategies using mixed-integer linear programming (MILP). We demonstrate that we can compute strategies for deception games with a higher number of states and actions than those that can be tackled by state-of-the-art methods.
The rest of the paper is organized as follows. In Section II, we motivate the problem studied in this paper with an example. In Section III, we provide some preliminary definitions and formulate the problem. In Section IV, we describe our two-stage tractable approach. In Section V, we apply the proposed approach to the motivating example. Finally, we conclude the paper in Section VI and give directions for future research.
II Motivating Example: Active Deception for Network Security
We motivate the problem under study in this paper with a computer network security example adapted from prior work. Consider a multilayer network of the kind typically used in critical operations such as power plants and manufacturing facilities [20, 4] (see Figure 1). The infiltrator starts the attack from outside of the network and proceeds by accessing deeper layers, where more sensitive and valuable assets are located. The middle layers are composed of databases containing confidential data, and the last layer provides access to physical devices. Infiltration of the last layer of the network can lead to critical damage to the facility.
The network we consider consists of multiple layers, as illustrated in Figure 2. Each column in Figure 2 corresponds to a network layer, e.g., emails, databases, actuators and sensors, etc.
The deceiver’s task here is to manipulate the infiltrator’s belief about whether he/she has been detected. The deceived infiltrator is hence led to take wrong actions due to the uncertainty about the infiltration progress. The deceiver has to manipulate the infiltrator’s belief and keep him/her engaged in the network.
One way to represent the network security problem depicted in Figure 2 is to represent it by a one-sided POSG (to be described formally in the sequel), which is comprised of two parts. The upper-half corresponds to the states, wherein the presence of the infiltrator in the network has not been detected () and the deception mechanism cannot be implemented; whereas, the lower half corresponds to the case wherein the infiltrator is detected and therefore we can choose either actions or .
The arrows illustrate the transitions in the game characterized by a transition function . We assume all transitions are deterministic, except transitions from to .
The infiltrator starts the attack at a computer outside the network . The infiltrator attempts to access computers in deeper layers by ing them. He/she also has the option to at any layer, , which reveals the presence of the infiltrator and the infiltrator is forced to exit the network and restart the attack. Another course of action is to incur small amount of damage by ing of data, which does not attract the deceiver’s attention.
The infiltrator does not receive observations for detection state , but is aware of (the network Layer being infiltrated). The deceiver has no information over the infiltrator until the infiltrator has been detected.
III Problem Statement
In this section, we first lay the formal foundations for POSGs, followed by a formal problem statement together with the underlying optimization problem.
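A probability distribution over a finite or countably infinite set $X$ is a function $\mu\colon X \rightarrow [0,1]$ with $\sum_{x \in X} \mu(x) = 1$. The set of all distributions on $X$ is $\mathit{Distr}(X)$. The support of a distribution $\mu$ is $\mathrm{supp}(\mu) = \{x \in X \mid \mu(x) > 0\}$. A distribution is Dirac if $|\mathrm{supp}(\mu)| = 1$. Let $V = \{x_1, \ldots, x_n\}$ be a finite set of variables over the real numbers $\mathbb{R}$. The set of multivariate polynomials over $V$ is $\mathbb{Q}[V]$. An instantiation for $V$ is a function $u\colon V \rightarrow \mathbb{R}$.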
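The notions of distribution, support, and Dirac distribution above can be made concrete with a small sketch. It assumes distributions over finite sets are stored as Python dictionaries; the helper names `is_distribution`, `support`, and `is_dirac` are illustrative and not part of the paper.

```python
from fractions import Fraction


def is_distribution(mu):
    """True if every entry lies in [0, 1] and the entries sum to exactly 1."""
    return all(0 <= p <= 1 for p in mu.values()) and sum(mu.values()) == 1


def support(mu):
    """Set of outcomes that receive strictly positive probability."""
    return {x for x, p in mu.items() if p > 0}


def is_dirac(mu):
    """A distribution is Dirac if its support contains exactly one outcome."""
    return is_distribution(mu) and len(support(mu)) == 1


# Exact rationals avoid floating-point rounding in the sum-to-one check.
half_half = {"a": Fraction(1, 2), "b": Fraction(1, 2), "c": Fraction(0)}
point_mass = {"a": Fraction(0), "b": Fraction(1)}
```

Here `half_half` is a distribution with support `{"a", "b"}`, while `point_mass` is Dirac.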
Definition 1 (SG)
A stochastic game (SG) is a tuple with a finite set of states, a set of Player 1 states, a set of Player 2 states, the initial state , a finite set of actions, and a transition function . We define costs using a state-action cost function .
A Markov decision process (MDP) is an SG in which , and consequently . A path of an SG is an (in)finite sequence , where , , , and for all . For finite , denotes the last state of . The set of (in)finite paths of is ().
Definition 2 (SG strategy)
A strategy for is a pair of functions such that for all , .
A Player- strategy (for ) is memoryless if implies for all . It is deterministic if is a Dirac distribution for all .
Definition 3 (One-sided POSG)
A one-sided partially observable stochastic game (POSG) is a tuple , with the underlying SG of , a finite set of observations, and the observation function for Player 2.
A partially observable Markov decision process (POMDP) is a one-sided POSG in which , and consequently . In a POSG, players cannot make any choice when . Thus, for a POSG with states , the POSG with states and and all transitions unchanged is equivalent w.r.t. the properties considered in the paper. Consequently, a POSG where one player never has a choice, i.e. is also an MDP.
Without loss of generality, we assume Player 2 can observe its own available actions, thus for all with . Essentially, in a one-sided POSG, Player 1 has full observability, while Player 2 has only partial observability. We lift the observation function to paths: For , the associated observation sequence is .
Definition 4 (POSG Strategy)
An observation-based strategy for a one-sided POSG is a strategy for the underlying SG such that for all with . is the Player 1 strategy, and is the (observation-based) Player 2 strategy.
Applying the strategy to a POSG resolves all nondeterminism and partial observability, resulting in the induced Markov chain. POSGs extend POMDPs, and optimal strategies require infinite memory in general. To represent observation-based strategies with finite memory, we use finite-state controllers (FSCs). If such an FSC has $k$ memory states, we speak of memory size $k$ for the underlying strategy.
Definition 5 (FSC)
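A finite-state controller (FSC) for a POMDP with observations $Z$ and actions $\mathit{Act}$ is a tuple $\mathcal{A} = (N, n_I, \gamma, \delta)$, where $N$ is a finite set of memory nodes, $n_I \in N$ is the initial memory node, $\gamma$ is the action mapping $\gamma\colon N \times Z \rightarrow \mathit{Distr}(\mathit{Act})$, and $\delta$ is the memory update $\delta\colon N \times Z \times \mathit{Act} \rightarrow \mathit{Distr}(N)$, with $\mathit{Distr}(\cdot)$ denoting the set of distributions over a set. The set $\mathit{FSC}_k$ denotes the set of FSCs with $k$ memory nodes, called $k$-FSCs.
From a node $n$ and the observation $z$ in the current state of the POMDP, the next action $a$ is chosen from the available actions randomly as given by $\gamma(n, z)$. Then, the successor node of the FSC is determined randomly via $\delta(n, z, a)$.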
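The two-step behavior of an FSC described above (first sample an action from the action mapping, then sample the next memory node from the memory update) can be sketched as follows. This is a minimal illustration under assumptions: the class name, the dictionary encoding, and the action labels `probe` and `wait` are hypothetical and do not come from the paper.

```python
import random


class FiniteStateController:
    """Sketch of an FSC with an action mapping and a memory update."""

    def __init__(self, initial_node, action_map, memory_update, seed=0):
        self.node = initial_node
        self.action_map = action_map        # (node, obs) -> {action: prob}
        self.memory_update = memory_update  # (node, obs, action) -> {node: prob}
        self.rng = random.Random(seed)

    def _sample(self, dist):
        """Draw one outcome from a finite distribution given as a dict."""
        outcomes = list(dist)
        weights = [dist[o] for o in outcomes]
        return self.rng.choices(outcomes, weights=weights, k=1)[0]

    def step(self, obs):
        """Choose an action for the observation, then update the memory node."""
        action = self._sample(self.action_map[(self.node, obs)])
        self.node = self._sample(self.memory_update[(self.node, obs, action)])
        return action


# Hypothetical 2-FSC with Dirac choices: it alternates between two actions.
action_map = {(0, "layer1"): {"probe": 1.0}, (1, "layer1"): {"wait": 1.0}}
memory_update = {(0, "layer1", "probe"): {1: 1.0}, (1, "layer1", "wait"): {0: 1.0}}
controller = FiniteStateController(0, action_map, memory_update)
chosen = [controller.step("layer1") for _ in range(3)]
```

Although the observation never changes, the controller alternates its action because the memory node changes, which a memoryless strategy could not do.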
Specifications. For a POSG and a set of target states, we consider the maximal (or minimal) probability () to reach , as well as the maximal (or minimal) expected cost (). For a probability bound and an expected cost bound , we also consider specifications of the form and , where the probability or the expected cost to reach shall be at most or , respectively. The specification is satisfied for a strategy and the POSG if the probability of reaching a target state in is at most , denoted by . This satisfaction relation is analogous for expected cost.
Sufficiently strong strategies. For a POSG and a property , a strategy for Player 1 is called sufficiently strong for against a set of strategies for Player 2, if for each strategy of Player 2, .
III-B Deception Games
Definition 6 (Deception Game)
A deception game is a one-sided POSG, where Player 1 (with full observability) is called the deceiver and Player 2 (with partial observability) is called the infiltrator. See Figure 3 for an illustration of the robust deception problem.
In the POSG that captures the motivating example, we identify the target states as the states that correspond to Layer . The goal is to deceive the infiltrator such that the expected cost to reach those states is maximized. We state the formal problem.
Problem 1: Given a deception game , a set of target states , memory bound , and specification , compute a sufficiently strong strategy for the deceiver, against infiltration-strategies with memory .
The straightforward solution to Problem 1 can be given by solving a robust mixed-integer linear program (MILP) with variables as described next.
For and each action , the strategy variable denotes which action is active in each state . If , then Player 1 takes action in state . For and , the cost variables and represent the expected cost of reaching . is a large constant that automatically satisfies the constraints in (6) and (7) if , which means the deceiver does not select action in state . is the discount factor to ensure that we have finite expected cost.
We then have the following robust MILP that solves Problem 1.
The objective in (1) minimizes the expected cost of the deceiver. We assign the expected cost of the states in the target set to 0 by the constraints in (2). We ensure that the strategies of the infiltrator and the deceiver are well-defined with the constraints in (3) and (4). The constraints in (5)–(7) give the computation for the expected cost in the states of the POSG.
Unfortunately, robust MILPs are notoriously hard to solve, except in special cases such as small problems with few constraints or problems with only binary variables. Recently, a heuristic relaxation for robust MILPs was proposed using the affinely adjustable robust counterpart. However, large robust MILPs, such as the robust MILP above for Problem 1, remain intractable in general.
IV Tractable Approach
In this section, we propose a two-stage approach for solving Problem 1. We begin by giving some key formalisms.
IV-1 Parametric MDPs
Instead of having fixed probabilities in the transition function of an MDP, we allow the transition probabilities to be described as polynomials over a fixed set of variables. Different values for the parameters induce different MDPs.
Definition 7 (pMDP)
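A parametric Markov decision process (pMDP) is a tuple $\mathcal{M} = (S, s_I, \mathit{Act}, V, \mathcal{P})$ with a finite (or countably infinite) set $S$ of states, initial state $s_I \in S$, a finite set $\mathit{Act}$ of actions, a finite set $V$ of parameters, and a transition function $\mathcal{P}\colon S \times \mathit{Act} \times S \rightarrow \mathbb{Q}[V]$, where $\mathbb{Q}[V]$ is the set of multivariate polynomials over $V$.
Applying an instantiation $u\colon V \rightarrow \mathbb{R}$ to a pMDP $\mathcal{M}$, denoted $\mathcal{M}[u]$, replaces each polynomial $f$ in $\mathcal{M}$ by $f[u]$. $\mathcal{M}[u]$ is also called the instantiation of $\mathcal{M}$ at $u$. An instantiation $u$ is well-defined for $\mathcal{M}$ if the replacement yields probability distributions, i.e., if $\mathcal{M}[u]$ is an MDP.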
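To illustrate instantiation and well-definedness, the following sketch encodes a tiny pMDP with a single hypothetical parameter `p`. Transition polynomials are represented as Python functions of the parameter values; this encoding and the names `instantiate` and `is_well_defined` are assumptions made for the example only.

```python
def instantiate(pmdp, values):
    """Evaluate every transition polynomial at the given parameter values."""
    return {
        key: {succ: poly(values) for succ, poly in row.items()}
        for key, row in pmdp.items()
    }


def is_well_defined(mdp, tol=1e-9):
    """Each (state, action) row must be a probability distribution."""
    return all(
        all(-tol <= p <= 1 + tol for p in row.values())
        and abs(sum(row.values()) - 1) <= tol
        for row in mdp.values()
    )


# Toy pMDP: from s0, action "a" reaches s1 with probability p and s2 with 1 - p.
pmdp = {
    ("s0", "a"): {"s1": lambda v: v["p"], "s2": lambda v: 1 - v["p"]},
    ("s1", "a"): {"s1": lambda v: 1.0},
    ("s2", "a"): {"s2": lambda v: 1.0},
}

valid = instantiate(pmdp, {"p": 0.3})    # entries 0.3 and 0.7: an MDP
invalid = instantiate(pmdp, {"p": 1.5})  # entries 1.5 and -0.5: not an MDP
```

The second instantiation still sums to one in each row, but it is rejected because individual entries leave the interval [0, 1].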
Definition 8 (pMDP Synthesis Problem)
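Given a pMDP $\mathcal{M}$ and a property $\varphi$, the synthesis problem is to compute an instantiation $u$ such that the instantiated pMDP $\mathcal{M}[u]$ satisfies $\varphi$, written $\mathcal{M}[u] \models \varphi$, if one exists.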
The pMDP synthesis problem amounts to solving the following NLP.
For , the cost variable represents an upper bound of expected cost of reaching target set , and the parameters in set enter the NLP as part of the functions from in the transition function .
One particularly efficient method for finding parameter instantiations solves the NLP in (8)–(13) via a reformulation to a convex-concave programming problem and an efficient integration of model checking calls to improve its performance. We are now ready to outline the proposed tractable approach.
IV-A Stage 1: Computing Infiltration Strategies
In this section, we are given a POSG and are interested in computing several sufficiently strong infiltrator strategies. For an overview of the approach in Stage 1, see Figure 4.
Given a POSG , compute several sufficiently strong strategies for the infiltrator.
We detail the approach first for an infiltrator that uses memoryless strategies. We then generalize the approach to any (finite but fixed) amount of memory for the infiltrator. The approach extends the reduction from POMDPs to pMCs in .
IV-A1 Memoryless strategies
A memoryless strategy maps observations to distributions over actions. Equivalently, such a strategy maps observation-action pairs to probabilities (ObAct-probabilities). Any memoryless strategy is uniquely defined by its ObAct-probabilities. Instead of finding suitable strategies for the infiltrator, we can thus reformulate our approach as finding suitable ObAct-probabilities. Hence, a suitable strategy can be described by parameter values that satisfy the property of a pMDP. To find the ObAct-probabilities, we construct an equivalent pMDP as illustrated in the following example.
Consider the POSG in Fig. 5(a). The states for Player 1 are given by circles. Similarly, the rectangles depict the states for Player 2. We indicate the actions from the states with solid lines to black dots. The corresponding probabilities from the black dots give the probability of transitioning to the next state. Thus, we draw dashed (direct) lines from the Player 1 (Player 2) nodes. The colors of states indicate the corresponding observations.
The corresponding pMDP for a memoryless, observation-based strategy for player 2 is given in Fig. 5(b). The actions of player 1 are unchanged. As there is no more nondeterminism in the states of player 2, we can also view these states as player 1 states. Moreover, to avoid clutter, we omit action indications. Consider state with outgoing action , which leads with probability to . Under a strategy, we take this action with probability : Thus, in the pMDP there is an arc with probability from to . If we take action with probability , we take action with probability . The same probabilities for taking the actions have to hold in state . Observe that the probabilities are not immediately reflected, as there are two paths from to . The parameters in are different. Indeed, there are parameters as there are three actions to take.
We formalize the construction below for situations, where the infiltrator has three available actions in each state, to simplify the notation.
Given a POSG with SG . Let . Without loss of generality, refer to if . The pMDP is the corresponding pMDP for POSG , if , , , and
where denotes , and , respectively.
The construction is straightforward to adapt to a varying number of actions per state: Indeed, we just need additional variables indicating that we take action in a state with observation .
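The effect of fixing ObAct-probabilities can be sketched numerically. The example below assumes a fragment with two states that share one observation and three actions, so that two free parameters determine the third probability. The function `induced_row`, the state and action names, and all numbers are hypothetical.

```python
def induced_row(state, transitions, observation, obact):
    """Successor distribution of `state` once actions are drawn from the
    ObAct-probabilities attached to the observation of that state."""
    z = observation[state]
    row = {}
    for (s, a), successors in transitions.items():
        if s != state:
            continue
        for succ, prob in successors.items():
            row[succ] = row.get(succ, 0.0) + obact[(z, a)] * prob
    return row


transitions = {
    ("s1", "a1"): {"x": 0.5, "y": 0.5},
    ("s1", "a2"): {"x": 1.0},
    ("s1", "a3"): {"y": 1.0},
    ("s2", "a1"): {"x": 1.0},
    ("s2", "a2"): {"y": 1.0},
    ("s2", "a3"): {"y": 1.0},
}
observation = {"s1": "blue", "s2": "blue"}  # both states look the same

# Two free parameters; the third probability is fixed by summing to one.
p1, p2 = 0.2, 0.3
obact = {("blue", "a1"): p1, ("blue", "a2"): p2, ("blue", "a3"): 1 - p1 - p2}

row_s1 = induced_row("s1", transitions, observation, obact)
row_s2 = induced_row("s2", transitions, observation, obact)
```

Both states use the same three ObAct-probabilities because they share an observation, yet the induced successor distributions differ since the underlying transitions differ.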
We denote the memoryless strategy for the infiltrator in the deception game that is induced by a valuation in the infiltration-pMDP.
Given a deception game , with its corresponding infiltration-pMDP , and a property . Let be a solution for the pMDP synthesis problem for the property . Then is a sufficiently strong strategy for the infiltrator on the and .
IV-A2 Adding Memory to the Infiltrator Strategy
The ideas for the synthesis of memoryless strategies can be lifted to a finite () memory setting. We apply an unfolding of the POSG, and then search for a memoryless infiltrator strategy. The idea is to create copies of the POSG, and allow the infiltrator to freely switch between copies. The different copies correspond to the internal memory of the infiltrator.
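One possible way to picture the unfolding is as a product of the state space with the memory nodes. The sketch below is an assumption about how such copies could be encoded, not the paper's exact construction: each state is paired with a memory node, the infiltrator additionally observes its own memory node, and each action is paired with the memory node to switch to.

```python
from itertools import product


def unfold_states(states, observation, k):
    """Create k copies of every state; the copy index is the memory node."""
    unfolded = list(product(states, range(k)))
    # The infiltrator sees the original observation plus its own memory node.
    unfolded_obs = {(s, n): (observation[s], n) for s, n in unfolded}
    return unfolded, unfolded_obs


def unfold_actions(actions, k):
    """Pair each action with the memory node (copy) to switch to next."""
    return list(product(actions, range(k)))


states = ["s0", "s1", "s2"]
observation = {"s0": "z0", "s1": "z1", "s2": "z1"}
unfolded, unfolded_obs = unfold_states(states, observation, k=2)
switching_actions = unfold_actions(["a", "b"], k=2)
```

A memoryless strategy on the unfolded model then corresponds to a strategy with `k` memory nodes on the original model, which is why the memoryless synthesis can be reused.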
IV-B Stage 2: Computing a Robust Deception Strategy
In this section, we assume that the deceiver obtained strategies for the infiltrator. Then, the task of the deceiver is to compute a strategy that minimizes the worst-case expected cost to reach a target set against any of the infiltration strategies under all deceiver policies. If the deception strategy has access to the memory node of the infiltration strategy, we call the infiltration memory transparent.
Given a deception game, a set of transparent infiltration strategies, compute a deception strategy that is optimal under all deception strategies against the set of infiltration strategies.
IV-B1 Deceiving against Transparent-Memory Infiltrator Strategies
We focus on memoryless infiltration strategies. Memoryless infiltration strategies are the most prominent transparent memory case, and for transparent memory, each finite memory strategy can be reduced to a memoryless strategy by unfolding the memory into the MDP, as in Stage 1.
To compute a robust deception strategy against transparent infiltrator strategies, we remove the uncertain constraints of the robust MILP in (1)–(7) by using the infiltrator strategies that we obtained in Stage 1. Then, we reduce the problem of synthesizing a robust deception strategy to solving an MILP instead of solving a robust MILP. We now give the details of the resulting MILP with the infiltrator strategies.
We define the following variables for the MILP: For , , and for each infiltrator strategy , the cost variables and give the expected cost of reaching for each infiltrator strategy. For and each action , the strategy variable denotes which action is active in each state . If , then the deceiver takes action in state .
For observation , for each action and for each infiltrator strategy , represents the probability of choosing action upon observation for each strategy of Player 2. Similar to Problem 1, is a large constant that automatically satisfies the constraints in (18) and (19) if , which means the deceiver does not select action in state .
We thus have the following MILP:
The objective in (14) minimizes the worst-case expected cost of deception against the infiltrator strategies. The constraint in (15) sets the expected cost of the states in to be 0. Constraint (16) ensures that the deceiver picks one of the actions in for each state in . The constraints (17)–(19) concern the cost computation for each state in , and .
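The min-max structure of this objective can be illustrated on a toy instance without an MILP solver. The sketch below swaps the MILP for brute-force enumeration of deterministic deceiver policies, which is only feasible for very small models. It assumes each infiltrator strategy has already been folded into a model mapping (state, deceiver action) to a cost and a successor distribution. The action labels `engage` and `block` follow the experiments, while all costs and probabilities are invented.

```python
from itertools import product


def evaluate(policy, model, targets, gamma=0.9, iters=500):
    """Discounted expected cost until a target is reached, for one fixed
    deceiver policy and one infiltrator model (iterative policy evaluation)."""
    states = {s for (s, _) in model} | targets
    value = {s: 0.0 for s in states}
    for _ in range(iters):
        updated = {}
        for s in states:
            if s in targets:
                updated[s] = 0.0  # target states carry zero cost
            else:
                cost, successors = model[(s, policy[s])]
                updated[s] = cost + gamma * sum(
                    p * value[t] for t, p in successors.items()
                )
        value = updated
    return value


def robust_policy(models, actions, init, targets, gamma=0.9):
    """Return (worst-case cost, policy) minimizing the worst case over models."""
    states = sorted({s for m in models for (s, _) in m})
    best = None
    for choice in product(actions, repeat=len(states)):
        policy = dict(zip(states, choice))
        worst = max(evaluate(policy, m, targets, gamma)[init] for m in models)
        if best is None or worst < best[0]:
            best = (worst, policy)
    return best


# Two hypothetical infiltrator strategies, each inducing its own model.
model_a = {("s0", "engage"): (1.0, {"goal": 1.0}),
           ("s0", "block"): (3.0, {"goal": 1.0})}
model_b = {("s0", "engage"): (5.0, {"goal": 1.0}),
           ("s0", "block"): (2.0, {"s0": 0.5, "goal": 0.5})}

worst_cost, policy = robust_policy(
    [model_a, model_b], ["engage", "block"], "s0", {"goal"}
)
```

In this instance `engage` is better against the first model but costs 5 against the second, so the robust choice is `block`, whose worst case is 2 / (1 - 0.45), roughly 3.64.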
Table I: Elements of the one-sided deception POSG (columns: Infiltrator’s Position, Detection mode, Infiltrator, Deceiver).
V Numerical Experiments
At this point, we are ready to return to the motivating example. We define the expected discounted cost for the deceiver corresponding to each action the infiltrator takes by
where is the discount factor and denotes the loss at stage . For this example, the specification of the deceiver is , which is to minimize against all infiltration strategies. Table I outlines the elements of the one-sided deception POSG. Note that since the players take actions concurrently, the costs depend on their joint actions. Furthermore, some of the costs are dependent on the layer index the infiltrator has accessed. For more details about the example, the interested reader is referred to .
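As a small numeric illustration of a discounted cost, the sketch below sums the per-stage losses of one finite trajectory, each weighted by the discount factor raised to the stage index. The function name and the sample losses are assumptions for illustration; the expected cost used in the paper would average this quantity over the trajectories induced by the players' strategies.

```python
def discounted_cost(losses, gamma):
    """Sum of gamma**t * loss_t over the stages t = 0, 1, 2, ... of a trajectory."""
    return sum((gamma ** t) * loss for t, loss in enumerate(losses))


# Three stages with unit loss and discount factor 0.5: 1 + 0.5 + 0.25.
example_cost = discounted_cost([1, 1, 1], 0.5)
```

With a discount factor strictly below one, the sum stays finite even for arbitrarily long trajectories with bounded losses.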
We first describe our results on a 4-layer network. Then, we demonstrate our approach on a 12-layer network. For each network, we first compute infiltration strategies using the approach in Stage 1. Then, using these infiltration strategies, we compute a robust deceiver strategy, which is given by the MILP in Stage 2. The pMDP problems arising from Stage 1 are solved using the convex-concave method described in Section IV. We solve the MILP problems from Stage 2 using the MILP solver GUROBI.
For the 4-layer network, we first construct the one-sided deception POSG with 2 memory nodes. The POSG consists of 49 states, 8 action choices for the deceiver, and 34 actions for the infiltrator. After computing an optimal deceiver strategy, the worst-case induced cost against infiltration strategies obtained from the approach in Stage 1 is 282.22. The obtained cost is comparable to that of the approach given by . The procedure for obtaining the infiltration strategies took 193.29 seconds, and the time to compute the optimal deceiver strategy was 216.92 seconds.
The optimal deceiver strategy that we obtained is to engage the infiltrator in the first two layers and then block the infiltrator in the last layers. This seems to be beneficial compared to always blocking the infiltrator, which leads to a worst-case expected cost of 341.72, and always engaging the infiltrator, which leads to a worst-case expected cost of 304.05.
We also give the results for different numbers of infiltrator strategies on a 4-layer and a 12-layer network in Table II. The expected cost of the deceiver increases with the number of infiltrator strategies in all cases. Also, with more layers, the deceiver can craft an optimal strategy in a 12-layer network that incurs a lower expected loss than the optimal strategy in a 4-layer network. However, if the deceiver always engages, the expected cost increases with the number of layers; the same holds for always blocking, except against 1000 infiltrator strategies, where the reported cost drops from 341.72 to 308.12.
|Number of layers & deceiver strategy||10 infiltrator strategies||100 infiltrator strategies||1000 infiltrator strategies|
|4 & Always engage||260.17||297.45||304.05|
|4 & Always block||246.12||286.19||341.72|
|4 & Optimal||228.64||256.81||282.22|
|12 & Always engage||277.71||301.46||313.91|
|12 & Always block||256.39||294.65||308.12|
|12 & Optimal||217.76||239.71||261.49|
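The gap between the optimal strategy and the two fixed strategies can be read off Table II with a short calculation. The sketch below copies the costs from the 1000-strategy column; the helper `relative_reduction` is an illustrative name, not something defined in the paper.

```python
# Worst-case expected costs against 1000 infiltrator strategies (Table II).
costs_1000 = {
    (4, "Always engage"): 304.05,
    (4, "Always block"): 341.72,
    (4, "Optimal"): 282.22,
    (12, "Always engage"): 313.91,
    (12, "Always block"): 308.12,
    (12, "Optimal"): 261.49,
}


def relative_reduction(layers, baseline):
    """Fraction by which the optimal strategy lowers the baseline's cost."""
    base = costs_1000[(layers, baseline)]
    return (base - costs_1000[(layers, "Optimal")]) / base


reductions = {
    (layers, baseline): relative_reduction(layers, baseline)
    for layers in (4, 12)
    for baseline in ("Always engage", "Always block")
}
```

For the 4-layer network this gives a reduction of roughly 7% relative to always engaging and roughly 17% relative to always blocking.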
VI Conclusion
We presented an approach to solve a partially observable stochastic game (POSG) in which one of the players has full observability over the states and the other player only has partial observability. We formulated the problem as a robust mixed-integer linear program, which is intractable to solve in general. To obtain a more scalable approach, we computed a robust optimal strategy for the deceiver by synthesizing a set of infiltration strategies using parameter synthesis in parametric Markov decision processes. Using a mixed-integer linear program and the infiltration strategies, we computed the robust deception strategy. We illustrated our approach on a POSG model for network security, and we showed that we can handle larger networks than previous approaches in the literature.
Future work concerns removing the transparency assumption, under which the deceiver has access to the memory node of the infiltration strategy if the infiltration strategy is memory-based. We will also explore some of the recent methods for solving mixed-integer linear problems proposed in the literature, such as those in [9, 10].
[1] T. Basar and Geert J. Olsder. Dynamic noncooperative game theory, volume 23. SIAM, 1999.
[2] A. Ben-Tal, A. Goryashko, E. Guslitzer, and A. Nemirovski. Adjustable robust solutions of uncertain linear programs. Mathematical Programming, 99(2):351–376, 2004.
[3] Dimitris Bertsimas and Melvyn Sim. Robust discrete optimization and network flows. Mathematical Programming, 98(1-3):49–71, 2003.
[4] E. Byres and J. Lowe. The myths and facts behind cyber security risks for industrial control systems. In Proceedings of the VDE Kongress, volume 116, pages 213–218, 2004.
[5] Krishnendu Chatterjee, Martin Chmelík, and Mathieu Tracol. What is decidable about partially observable Markov decision processes with ω-regular objectives. Journal of Computer and System Sciences, 82(5):878–911, 2016.
[6] Murat Cubuktepe, Nils Jansen, Sebastian Junges, Joost-Pieter Katoen, and Ufuk Topcu. Synthesis in pMDPs: A tale of 1001 parameters. CoRR, abs/1803.02884, 2018.
[7] Donald C Daniel and Katherine L Herbig. Strategic Military Deception: Pergamon Policy Studies on Security Affairs. Elsevier, 2013.
[8] Dorothy Elizabeth Robling Denning. Information warfare and security, volume 4. Addison-Wesley Reading, MA, 1999.
[9] Souradeep Dutta, Susmit Jha, Sriram Sankaranarayanan, and Ashish Tiwari. Output range analysis for deep neural networks. arXiv preprint arXiv:1709.09130, 2017.
[10] Souradeep Dutta, Susmit Jha, Sriram Sankaranarayanan, and Ashish Tiwari. Learning and verification of feedback control systems using feedforward neural networks. IFAC-PapersOnLine, 51(16):151–156, 2018.
[11] R. Emery-Montemerlo, G. Gordon, J. Schneider, and S. Thrun. Approximate solutions for partially observable stochastic games with common payoffs. In Autonomous Agents and Multiagent Systems, 2004. AAMAS 2004. Proceedings of the Third International Joint Conference on, pages 136–143. IEEE, 2004.
[12] Kai-Simon Goetzmann, Sebastian Stiller, and Claudio Telha. Optimization over integers with robustness in cost and few constraints. In International Workshop on Approximation and Online Algorithms, pages 89–101. Springer, 2011.
[13] Gurobi Optimization, Inc. Gurobi optimizer reference manual. http://www.gurobi.com, 2013.
[14] Ernst Moritz Hahn, Tingting Han, and Lijun Zhang. Synthesis for PCTL in parametric Markov decision processes. In NASA Formal Methods Symposium, pages 146–161. Springer, 2011.
[15] E. A. Hansen, D. S. Bernstein, and S. Zilberstein. Dynamic programming for partially observable stochastic games. In AAAI, volume 4, pages 709–715, 2004.
[16] J. P. Holdren and E. S. Lander. Big data and privacy: a technological perspective. Report to the President (Executive Office of the President President’s Council of Advisors on Science and Technology), 2014.
[17] K. Horák, B. Bosanskỳ, and M. Pechoucek. Heuristic search value iteration for one-sided partially observable stochastic games. In AAAI, pages 558–564, 2017.
[18] K. Horák, Q. Zhu, and B. Bošanskỳ. Manipulating adversary’s belief: A dynamic game approach to deception by design for proactive network security. In International Conference on Decision and Game Theory for Security, pages 273–294. Springer, 2017.
[19] Sebastian Junges, Nils Jansen, Ralf Wimmer, Tim Quatmann, Leonore Winterer, Joost-Pieter Katoen, and Bernd Becker. Finite state controllers for POMDPs via parameter synthesis. In UAI, 2018.
[20] D. Kuipers and M. Fabro. Control systems cyber security: Defense in depth strategies. United States. Department of Energy, 2006.
[21] A. Kumar and S. Zilberstein. Dynamic programming approximations for partially observable stochastic games. In FLAIRS Conference, 2009.
[22] D. Kushner. The real story of Stuxnet. IEEE Spectrum, 3(50):48–53, 2013.
[23] Thomas Lipp and Stephen Boyd. Variations and extension of the convex–concave procedure. Optimization and Engineering, 17(2):263–287, 2016.
[24] M. H. Manshaei, Q. Zhu, T. Alpcan, T. Basar, and J. Hubaux. Game theory meets network security and privacy. ACM Computing Surveys (CSUR), 45(3):25, 2013.
[25] K. D. Mitnick and W. L. Simon. The art of deception: Controlling the human element of security. John Wiley & Sons, 2011.
[26] Jean Pauphilet, Diego Kiner, Damien Faille, and Laurent El Ghaoui. A tractable numerical strategy for robust MILP and application to energy management. In Decision and Control (CDC), 2016 IEEE 55th Conference on, pages 1490–1495. IEEE, 2016.
[27] Lance Spitzner. Honeypots: tracking hackers, volume 1. Addison-Wesley Reading, 2003.
[28] Colin Tankard. Advanced persistent threats and how to monitor and deter them. Network Security, 2011(8):16–19, 2011.
[29] Q. Zhu and T. Basar. Game-theoretic methods for robustness, security, and resilience of cyberphysical control systems: games-in-games principle for optimal cross-layer resilient control systems. IEEE Control Systems, 35(1):46–65, 2015.