
Active Deception using Factored Interactive POMDPs to Recognize Cyber Attacker's Intent

This paper presents an intelligent and adaptive agent that employs deception to recognize a cyber adversary's intent. Unlike previous approaches to cyber deception, which mainly focus on delaying or confusing the attackers, we focus on engaging with them to learn their intent. We model cyber deception as a sequential decision-making problem in a two-agent context. We introduce factored finitely nested interactive POMDPs (I-POMDPx) and use this framework to model the problem with multiple attacker types. Our approach models cyber attacks on a single honeypot host across multiple phases from the attacker's initial entry to reaching its adversarial objective. The defending I-POMDPx-based agent uses decoys to engage with the attacker at multiple phases to form increasingly accurate predictions of the attacker's behavior and intent. The use of I-POMDPs also enables us to model the adversary's mental state and investigate how deception affects their beliefs. Our experiments in both simulation and on a real host show that the I-POMDPx-based agent performs significantly better at intent recognition than commonly used deception strategies on honeypots.



1 Introduction

An important augmentation of conventional cyber defense utilizes deception-based cyber defense strategies (Pingree, 2018). These are typically based on the use of decoy systems called honeypots (Spitzner, 2003) with additional monitoring capabilities. Currently, honeypots tend to be passive systems with the purpose of consuming the attacker’s CPU cycles and time, and possibly logging the attacker’s actions. However, the information inferred about the attackers’ precise intent and capability is usually minimal.

On the other hand, honeypots equipped with fine-grained logging abilities offer an opportunity to better understand attackers’ intent and capabilities. We may achieve this by engaging and manipulating the attacker to perform actions that reveal his or her true intent. One way of accomplishing this is to employ active deception. Active strategies entail adaptive deception that seeks to influence the attackers’ beliefs and manipulate the attackers into performing desired actions (Jajodia et al., 2016). We investigate how multi-agent decision making can be used to automate adaptive deception strategies and thereby better understand the attacker.

We represent cyber deception on a single host as a decision-making problem between a defender and an attacker. We introduce a factored variant of the well-known interactive partially observable Markov decision process (Gmytrasiewicz and Doshi, 2005), labeled I-POMDPx, to computationally model the decision making of the defender while reasoning about the attacker’s beliefs and capabilities as it acts and observes. I-POMDPx exploits the factored structure of the problem, representing the dynamics and observation function using algebraic decision diagrams (Bahar et al., 1997), and solving the model using a method that directly operates on these factored representations. This brings some level of tractability to an otherwise intractable framework, sufficient to adequately solve the cyber deception domain. I-POMDPx explicitly models the beliefs of the attacker and the defender throughout the interaction. This allows for detailed inferences about how specific deceptive actions affect the attacker’s subjective view of the system. We evaluate the performance of I-POMDPx in promoting active deception with multiple attacker types both in simulation and on a real host. Our results show that the I-POMDPx-based agent learns the intent of the attacker much more accurately compared to baselines that do not engage the attacker or that immediately deploy all decoys en masse.

2 Background on I-POMDPs

Interactive POMDPs (I-POMDPs) are a generalization of POMDPs to sequential decision-making in multi-agent environments (Gmytrasiewicz and Doshi, 2005; Doshi, 2012). Formally, an I-POMDP for agent $i$ in an environment with one other agent $j$ is defined as,

$$\text{I-POMDP}_i = \langle IS_i, A, T_i, \Omega_i, O_i, R_i \rangle$$

$IS_i$ denotes the interactive state space, $IS_i = S \times M_j$. This includes the physical state space $S$ as well as models $M_j$ of the other agent $j$, which may be intentional or subintentional (Dennett, 1986). In this paper, we ascribe intentional models to the other agent as they model the other agent’s beliefs and capabilities as a rational agent. $A = A_i \times A_j$ is the set of joint actions of both agents. $T_i$ represents the transition function, $T_i : S \times A \times S \rightarrow [0, 1]$. The transition function is defined over the physical states and excludes the other agent’s models. This is a consequence of the model non-manipulability assumption – an agent’s actions do not directly influence the other agent’s models. $\Omega_i$ is the set of agent $i$’s observations. $O_i$ is the observation function, $O_i : S \times A \times \Omega_i \rightarrow [0, 1]$. The observation function is defined over the physical state space only as a consequence of the model non-observability assumption – the other agent’s model parameters may not be observed directly. $R_i$ defines the reward function for agent $i$, $R_i : IS_i \times A \rightarrow \mathbb{R}$. The reward function for I-POMDPs usually assigns utilities over the physical states rather than the other agent’s models.

We limit our attention to a finitely nested I-POMDP, in which the interactive state space at strategy level $l$ is defined bottom up as,

$$IS_{i,0} = S, \qquad \Theta_{j,0} = \{ \langle b_{j,0}, \hat{\theta}_j \rangle : b_{j,0} \in \Delta(IS_{j,0}) \}$$
$$IS_{i,1} = S \times M_{j,0}, \qquad \Theta_{j,1} = \{ \langle b_{j,1}, \hat{\theta}_j \rangle : b_{j,1} \in \Delta(IS_{j,1}) \}$$
$$\vdots$$
$$IS_{i,l} = S \times M_{j,l-1}, \qquad \Theta_{j,l} = \{ \langle b_{j,l}, \hat{\theta}_j \rangle : b_{j,l} \in \Delta(IS_{j,l}) \}$$

Above, $\hat{\theta}_j$ represents agent $j$’s frame, defined as $\hat{\theta}_j = \langle A, \Omega_j, T_j, O_j, R_j, OC_j \rangle$. Here, $OC_j$ represents $j$’s optimality criterion and the other terms are as defined previously. $\Theta_{j,l}$ is the set of agent $j$’s intentional models, defined as $\theta_{j,l} = \langle b_{j,l}, \hat{\theta}_j \rangle$. The interactive state space is typically restricted to a finite set of $j$’s models, which are updated after every interaction to account for the belief update of agent $j$. The interactive state space for agent $i$ at level $l$ can then be defined as,

$$IS_{i,l} = S \times \text{Reach}(\Theta_{j,l-1}, H)$$

Here, $\text{Reach}(\Theta_{j,l-1}, H)$ is the set of level $l-1$ models that $j$ could have in $H$ steps; $\text{Reach}(\Theta_{j,l-1}, 0) = \Theta_{j,l-1}$. We obtain $\text{Reach}(\Theta_{j,l-1}, H)$ by repeatedly updating $j$’s beliefs in the models in $\Theta_{j,l-1}$.
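To make the Reach operator concrete, the sketch below enumerates the beliefs a modeled agent could hold after up to $H$ updates of a flat (unfactored) POMDP belief. The function and variable names are ours for illustration; the paper's solver operates on factored ADD representations instead.

```python
from itertools import product

def belief_update(b, a, o, T, O, states):
    """Standard discrete POMDP belief update for the modeled agent j.
    T[s][a][s2] and O[s2][a][o] are transition/observation probabilities."""
    unnorm = {s2: O[s2][a][o] * sum(b[s] * T[s][a][s2] for s in states)
              for s2 in states}
    z = sum(unnorm.values())
    if z == 0:
        return None  # impossible action-observation pair
    return {s2: p / z for s2, p in unnorm.items()}

def reach(initial_beliefs, actions, observations, T, O, states, H):
    """Enumerate the beliefs agent j could hold within H update steps,
    mirroring Reach(Theta_j, H): repeatedly update every frontier belief
    for every action-observation pair."""
    frontier = [tuple(sorted(b.items())) for b in initial_beliefs]
    seen = set(frontier)
    for _ in range(H):
        nxt = []
        for bt in frontier:
            b = dict(bt)
            for a, o in product(actions, observations):
                b2 = belief_update(b, a, o, T, O, states)
                if b2 is not None:
                    key = tuple(sorted((s, round(p, 10)) for s, p in b2.items()))
                    if key not in seen:
                        seen.add(key)
                        nxt.append(key)
        frontier = nxt
    return [dict(bt) for bt in seen]
```

Because every action-observation pair spawns a new belief, the size of the reachable set grows quickly with $H$, which is why bounding $H$ is essential for tractability.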

3 Modeling Cyber Deception using Factored I-POMDPs

Engaging and deceiving human attackers into intruding controlled systems and accessing obfuscated data offers a proactive approach to computer and information security. It wastes attacker resources and potentially misleads the attacker. Importantly, it offers an untapped opportunity to understand attackers’ beliefs, capabilities, and preferences and how they evolve by sifting the detailed activity logs. Identifying these mental and physical states not only informs the defender about the attacker’s intent, but also guides new ways of deceiving the attacker. In this section, we first introduce our domain of cyber deception and subsequently discuss how it can be modeled in a factored I-POMDP.

3.1 Cyber Deception Domain

The cyber deception domain models the interactions between the attacker and the defender on a single honeypot host system. A state of the interaction is modeled using 11 state variables defining a total of 4,608 states. Table 1 briefly summarizes the state space. The S_DATA_DECOYS and C_DATA_DECOYS state variables represent the presence of sensitive data decoys and critical data decoys, respectively. The HOST_HAS_DATA variable represents the true type of valuable data on the system. We assume that a system cannot have two different types of valuable data simultaneously. This is a reasonable assumption because different hosts on enterprise networks usually possess different assets. We differentiate between sensitive_data and critical_data as distinct targets. Sensitive data includes, for example, private data of employees or high-ranking officials, or any data that the attacker would profit from stealing. In practical scenarios, honeypots never contain any real valuable data. Consequently, in the cyber deception domain in this paper, HOST_HAS_DATA is none. However, the attacker is unaware of the honeypot and the data decoys and hence forms a belief over this state variable. The HOST_HAS_DATA variable thus captures the subjective view of the attacker being deceived.

State Variable Name Values Description
PRIVS_DECEPTION user, root, none Deceptive reporting of privileges
S_DATA_DECOYS yes, no Presence of sensitive data decoys
C_DATA_DECOYS yes, no Presence of critical data decoys
HOST_HAS_DATA sensitive_data, critical_data, none Type of valuable data on the system
DATA_ACCESS_PRIVS user, root Privileges required to access or find data
ATTACKER_PRIVS user, root Attacker’s highest privileges
DATA_FOUND yes, no Valuable data found by the attacker
VULN_FOUND yes, no Local PrivEsc discovered by attacker
IMPACT_CAUSED yes, no Attack successful
ATTACKER_STATUS active, inactive Presence of attacker on the host
HOST_HAS_VULN yes, no Presence of local PrivEsc vulnerability
Table 1: The state of the cyber deception domain is comprised of 11 variables.
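As a sanity check on the state-space size, the variable domains in Table 1 multiply out to 3 × 3 × 2⁹ = 4,608 joint states. A short enumeration (ours, for illustration) confirms the count:

```python
from itertools import product

# Domains of the 11 state variables from Table 1.
STATE_VARS = {
    "PRIVS_DECEPTION": ["user", "root", "none"],
    "S_DATA_DECOYS": ["yes", "no"],
    "C_DATA_DECOYS": ["yes", "no"],
    "HOST_HAS_DATA": ["sensitive_data", "critical_data", "none"],
    "DATA_ACCESS_PRIVS": ["user", "root"],
    "ATTACKER_PRIVS": ["user", "root"],
    "DATA_FOUND": ["yes", "no"],
    "VULN_FOUND": ["yes", "no"],
    "IMPACT_CAUSED": ["yes", "no"],
    "ATTACKER_STATUS": ["active", "inactive"],
    "HOST_HAS_VULN": ["yes", "no"],
}

def enumerate_states(variables):
    """Yield every joint assignment of the factored state variables."""
    names = list(variables)
    for values in product(*(variables[n] for n in names)):
        yield dict(zip(names, values))

n_states = sum(1 for _ in enumerate_states(STATE_VARS))  # 3 * 3 * 2**9 = 4608
```

Enumerating the flat space is feasible here, but the factored ADD representation used in the paper avoids ever materializing it.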

There are 5 observation variables for the attacker, which make a total of 48 unique observations. We include three different types of attackers: the data exfil attacker, the data manipulator, and the persistent threat. The data exfil attacker represents a threat that aims to steal valuable private data from the host. The data manipulator represents a threat that seeks to manipulate data that is critical for the operation of a business or a physical target. Thus, the data exfil attacker targets sensitive_data in the system and the data manipulator targets critical_data. The persistent threat attacker wants to establish a strong presence in the system at a high privilege level.

Action name States affected Description
FILE_RECON_SDATA DATA_FOUND Search for sensitive data for theft
FILE_RECON_CDATA DATA_FOUND Search for critical data for manipulation
VULN_RECON VULN_FOUND Search for local PrivEsc vulnerability
PRIV_ESC ATTACKER_PRIVS Exploit local PrivEsc vulnerability
CHECK_ROOT none Check availability of root privileges
START_EXFIL IMPACT_CAUSED Exfiltrate sensitive data over network
MANIPULATE_DATA IMPACT_CAUSED Manipulate critical data on the system
PERSIST IMPACT_CAUSED Establish a permanent presence in the system
EXIT ATTACKER_STATUS Terminate the attack
Table 2: The actions available to the attacker.

The attacker in the interaction can perform one of 9 actions to gather information about the system, manipulate the system, or act on objectives. Table 2 briefly summarizes the actions available to the attacker. The FILE_RECON_SDATA and FILE_RECON_CDATA actions cause the DATA_FOUND variable to transition to yes. The FILE_RECON_SDATA action is slightly worse at finding data than FILE_RECON_CDATA. This reflects the fact that private sensitive information is somewhat harder to find because it is often stored in user directories at arbitrary locations. Critical data, on the other hand, like service configuration or database files, is stored in well-known locations on the system. The attacker receives information about the DATA_FOUND transition through the DATA observation variable. This simulates the data discovery phase of an attack. VULN_RECON is another action that works similarly and causes VULN_FOUND to transition to yes. This transition depicts the attacker looking for vulnerabilities to raise privileges. Depending on the type of the attacker, the START_EXFIL, MANIPULATE_DATA, or PERSIST actions can be performed to achieve the attacker’s main objectives. We assume that the attacker is unable to discern between decoy data and real data, and hence unable to determine which variable influences the DATA_FOUND state transition during file discovery. The attacker, however, can distinguish between different types of valuable data. So, if the system contains data that is different from what the attacker expects, the attacker can observe this from the DISCREPANCY observation variable. As DATA and DISCREPANCY are separate observation variables, the attacker can observe a discrepancy even when data has been found. When this occurs, the attacker develops a belief over the decoy data states, since the host can have only one type of data. This realistically models a situation in which the attacker encounters multiple decoys of different types and suspects deception.
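As an illustration, the effect of the reconnaissance actions on DATA_FOUND can be written as a single DBN factor. The success probabilities below are placeholders (the paper does not list its CPT values here); only the qualitative ordering, FILE_RECON_SDATA being slightly worse than FILE_RECON_CDATA, is taken from the text:

```python
# Hypothetical success probabilities; the paper's actual CPT values are not given.
P_FIND = {"FILE_RECON_SDATA": 0.6, "FILE_RECON_CDATA": 0.8}

def data_found_cpt(action, state):
    """P(DATA_FOUND' = yes | action, state): a sketch of one DBN factor.
    Data can be found only if real data of the sought type, or a matching
    decoy, is present on the host."""
    if state["DATA_FOUND"] == "yes":
        return 1.0  # found data stays found
    wanted = "sensitive_data" if action == "FILE_RECON_SDATA" else "critical_data"
    decoy_var = "S_DATA_DECOYS" if wanted == "sensitive_data" else "C_DATA_DECOYS"
    discoverable = state["HOST_HAS_DATA"] == wanted or state[decoy_var] == "yes"
    return P_FIND[action] if discoverable else 0.0
```

Because the factor only reads three of the eleven state variables, an ADD encoding of it stays small regardless of the size of the joint state space.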

The defender in the interaction starts with complete information about the system. The defender’s actions mostly govern the deployment and removal of different types of decoys. These actions influence the S_DATA_DECOYS and C_DATA_DECOYS states. Additionally, the defender can influence the attacker’s observations about his privileges through the PRIVS_DECEPTION state. The defender gets perfect observations whenever the attacker interacts with a decoy. Additionally, the defender gets stochastic observations about the attacker’s actions through the LOG_INFERENCE observation variable. The attacker is rewarded for exiting the system after causing an impact. For the data exfil and data manipulator attacker types, this is achieved by performing the START_EXFIL and MANIPULATE_DATA actions respectively. The persistent threat attacker is rewarded for getting root level persistence in the system.

Figure 1: The attacker starts with a low prior belief on the existence of decoys and an active defender. If decoys are indistinguishable from real data, the attacker attributes his observation to the existence of real data even when the host has none.

Figure 1 illustrates a scenario taken from an actual simulation run with the data manipulator attacker type. Initially, the attacker has a non-zero belief in the existence of data on the system. However, the true state of the system on the left shows that the system does not actually contain any data. In the absence of the defender or any static data decoys, the attacker would eventually update his beliefs to accurately reflect reality by performing the FILE_RECON_CDATA action and observing the result. To avoid this belief state, the defender deploys data decoys as the attacker acts. The attacker’s inability to tell the difference between decoy data and real data, together with his prior belief in the absence of decoys, leads him to attribute his observations to the existence of real data; the attacker is thus deceived.
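The deception in Figure 1 is, at its core, a Bayesian update under indistinguishable hypotheses. A minimal sketch, with illustrative numbers (the priors and likelihoods below are ours, not the paper's):

```python
def posterior_real_data(prior_real, prior_decoy, p_find_real, p_find_decoy):
    """Attacker's posterior that the host holds real data, after observing
    DATA = found during reconnaissance. Hypotheses under which nothing can
    be found receive zero likelihood and drop out. Decoys are
    indistinguishable from real data, so the likelihoods are (near-)equal
    and the low prior on decoys dominates the posterior."""
    num = prior_real * p_find_real
    den = num + prior_decoy * p_find_decoy
    return num / den

# The attacker starts nearly certain there is no active deception:
post = posterior_real_data(prior_real=0.5, prior_decoy=0.05,
                           p_find_real=0.8, p_find_decoy=0.8)
```

With equal likelihoods, the posterior odds equal the prior odds (0.5 : 0.05), so the attacker ends up over 90% convinced the data is real even though the host holds none.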

(a) Dynamics compactly represented as a two time-slice DBN for select joint actions and observation variables.
(b) An ADD representing the observation function
Figure 2: I-POMDP representation of the cyber deception domain.

3.2 Factored I-POMDPs for Modeling Cyber Deception

Factored POMDPs have been effective toward solving structured problems with large state and observation spaces (Feng and Hansen, 2014; Poupart, 2005). Motivated by this observation, we extend the finitely nested I-POMDP reviewed in Section 2 to its factored representation, I-POMDPx. Formally, this extension is defined as:

$$\text{I-POMDP}_{\text{x},i} = \langle IS_i, A, T_i, \Omega_i, O_i, R_i \rangle$$

$IS_i$ is the factored interactive state space consisting of physical state factors $X = \{X^1, X^2, \ldots, X^n\}$ and agent $j$’s models $M_j$. In a finitely nested I-POMDPx, the set $M_j$ is bounded similarly to finitely nested I-POMDPs. Action set $A$ is defined exactly as before. We use algebraic decision diagrams (ADDs) (Bahar et al., 1997) to compactly represent the factors for agent $i$’s transition, observation, and reward functions. $T_i$ defines the transition function, represented using ADDs as $T_i(X, a_i, a_j, X')$ for $a_i \in A_i$ and $a_j \in A_j$. $\Omega_i$ is the set of observation variables which make up the observation space. $O_i$ is the observation function represented as ADDs, $O_i(X', a_i, a_j, \Omega_i)$. $R_i$ defines the reward function for agent $i$. The reward function is also represented as an ADD, $R_i(X, a_i, a_j)$.

We illustrate I-POMDPx by modeling the cyber deception domain of Section 3.1 in the framework. Figure 2(a) shows the DBN for select state and observation variables given that the attacker engages in reconnaissance actions. The two slices in the DBN represent the sets of pre- and post-action state variables, $X$ and $X'$, where $X^k$ represents a single state variable. Similarly, $\Omega_i$ and $\Omega_j$ denote the sets of observation variables for agents $i$ and $j$ respectively. The ADD $T^a$ represents the complete transition function for action $a$. This is analogous to the complete action diagram defined by Hoey et al. (1999) for MDPs. Similarly, the observation function is represented using the ADD $O^a$ (Fig. 2(b)), which is analogous to the complete observation diagram (Feng and Hansen, 2014). Additionally, in an I-POMDPx, agent $i$ also recursively updates the beliefs of agent $j$. The attacker types are modeled as frames in $M_j$. Let $\mathcal{M}_j$ be the set of all models in $M_j$. Because neither $A_j$ nor $\Omega_j$ is directly accessible to agent $i$, they are represented as the ADDs $P(A_j \mid M_j)$ and $P(\Omega_j' \mid X', A_j)$. Using these factors, we can now define the distribution over $X'$ and $M_j'$ given action $a_i$ and observation $o_i$ as a single ADD using existential abstraction:

$$P(X', M_j' \mid X, M_j, a_i) = \sum_{A_j} \sum_{\Omega_j'} P(A_j \mid M_j)\; T(X, a_i, A_j, X')\; P(\Omega_j' \mid X', A_j)\; P(M_j' \mid M_j, A_j, \Omega_j') \quad (1)$$

Here, the ADD $T(X, a_i, A_j, X')$ compactly represents the transition function over the physical state factors, $P(A_j \mid M_j)$ represents the probabilities of $j$’s actions given its models, $P(\Omega_j' \mid X', A_j)$ represents $j$’s observation function, and $P(M_j' \mid M_j, A_j, \Omega_j')$ represents the recursive belief-update transitions of the original I-POMDP. Thus, the constructed ADD contains the transition probabilities for all interactive state variables given action $a_i$ and observation $o_i$. The I-POMDPx belief update can then be computed as:

$$b'(X', M_j') = \alpha\; O_i(X', a_i, o_i) \sum_{X} \sum_{M_j} b(X, M_j)\; P(X', M_j' \mid X, M_j, a_i) \quad (2)$$

where $\alpha$ is a normalizing constant and the ADD $P(X', M_j' \mid X, M_j, a_i)$ is obtained as in Eq. 1.
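A tabular analogue of this factored update may help build intuition: with two state factors and dense arrays standing in for ADDs, the existential abstraction becomes a sum over the previous-slice variables, followed by a pointwise product with the observation factor and normalization. This sketch (ours) omits the model variable and fixes the joint action:

```python
import numpy as np

def factored_belief_update(b, T1, T2, O, o):
    """Belief update over two state factors X1, X2 for a fixed joint action.
    T1[x1, x1p] and T2[x2, x2p] are per-variable transition factors (the DBN
    assumes X1', X2' independent given the previous slice); O[x1p, x2p, o]
    is the observation factor. Summing out x1, x2 is the tabular analogue of
    the existential abstraction performed on ADDs."""
    joint = np.einsum('ij,ik,jl->kl', b, T1, T2)  # sum out the previous slice
    unnorm = joint * O[:, :, o]                   # weight by the observation
    return unnorm / unnorm.sum()                  # normalize
```

ADD-based solvers perform exactly these product and sum-out operations, but on decision diagrams whose size tracks the structure of the factors rather than the full joint table.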

Symbolic Perseus (Poupart, 2005) offers a relatively scalable point-based approximation technique that exploits the ADD structure of factored POMDPs. Toward generalizing this technique for I-POMDPx, we are aided by the existence of point-based value iteration for I-POMDPs (Doshi and Perez, 2008). Subsequently, we may generalize the $\alpha$-vectors and their backup from the latter to the factored representation of I-POMDPx:

$$\alpha^{a_i, o_i}(X, M_j) = \sum_{X'} \sum_{M_j'} P(X', M_j' \mid X, M_j, a_i)\; O_i(X', a_i, o_i)\; \alpha'(X', M_j'), \qquad \alpha' \in \Gamma'$$

$$\alpha_b = \operatorname{argmax}_{a_i \in A_i} \left( R_i(X, M_j, a_i) + \gamma \sum_{o_i \in \Omega_i} \operatorname{argmax}_{\alpha^{a_i, o_i}} \left( b \cdot \alpha^{a_i, o_i} \right) \right)$$

Here, $\Gamma'$ is the set of $\alpha$-vectors from the next time step and $b$ is a belief point from the set of considered beliefs $B$. A popular way of building $B$ is to project an initial set of belief points forward for $H$ time steps using the belief update of Eq. 2.
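For intuition, the point-based backup can be sketched in flat (non-ADD) form over an enumerated state space; Symbolic Perseus performs the same computation symbolically on ADDs. This is an illustrative implementation under that flattening assumption, not the solver used in the paper:

```python
import numpy as np

def point_based_backup(b, alphas, T, O, R, gamma=0.95):
    """One point-based backup of the alpha-vector set at belief point b.
    T[a] is an |S| x |S| transition matrix, O[a] an |S| x |Z| observation
    matrix over next states, R[a] an |S| reward vector, and alphas a list
    of |S| vectors from the next time step."""
    best_val, best_vec = -np.inf, None
    for a in range(len(T)):
        vec = R[a].astype(float).copy()
        for z in range(O[a].shape[1]):
            # alpha^{a,z}_k(s) = sum_{s'} T(s,a,s') O(s',a,z) alpha_k(s')
            cand = [T[a] @ (O[a][:, z] * al) for al in alphas]
            vals = [b @ c for c in cand]
            # keep the next-step vector maximizing value at this belief
            vec += gamma * cand[int(np.argmax(vals))]
        v = b @ vec
        if v > best_val:
            best_val, best_vec = v, vec
    return best_vec
```

Perseus-style solvers repeat this backup only at sampled belief points, improving the value at each point without enumerating the full belief simplex.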

4 Experiments and Analysis

We modeled the full cyber deception domain described in Section 3.1 from the perspective of a level-1 defender using the I-POMDPx framework. We implemented the generalized Symbolic Perseus using the point-based updates of the $\alpha$-vectors and the belief set projection given in Section 3.2, in order to solve the I-POMDPx. The solver has several enhancements, such as cached ADD computations and ADD approximations, for additional speedup.

We evaluate the deception policy generated by the I-POMDPx in simulations and on an actual system consisting of a standalone attacker programmed via Metasploit (Maynor, 2011) and a defender workstation. We simulate each attacker type using the optimal policy computed by the corresponding level-0 attacker POMDP. We show these policies for each type of attacker in the supplementary material. For the simulations, we randomly sample the frame and the starting privileges of the attacker to simulate a threat with unknown intentions and privileges. The defender begins knowing about the existence of decoys on the system. The attacker, on the other hand, has no prior knowledge about any vulnerabilities or data on the system. The defender engages with the attacker by deploying decoys, facilitating deceptive observations, or adding known vulnerabilities to the system. In the simulations, the state transitions and observations for both agents are generated by sampling from the joint transition functions and individual observation functions.


We compare the I-POMDPx policy against two passive baselines: one that does not engage and passively observes the attacker, and another that uses deception indiscriminately, deploying both sensitive and critical data decoys and all vulnerabilities in the honeypot at the beginning. We label the first baseline NO-OP(no decoy) and the second NO-OP(all decoys). We perform the simulations for 30 trials with an attacker type randomly picked in each trial. During each trial, the defender begins without knowing the type or the privileges of the attacker. We set the horizon $H$ to 5. The generalized Symbolic Perseus is then run on 200 projected belief points until convergence to obtain the policy, which prescribes the subsequent actions for the defender until the end of the trial. It converges in about 6 minutes with a mean time per backup of 37 secs on Ubuntu 18 with an Intel i7 and 64 GB RAM.

The NO-OP(no decoy) and NO-OP(all decoys) baselines yielded a mean (± std. err.) of 4.30 ± 0.16 and 3.26 ± 0.20 steps of engagement with the attacker, respectively. The longest engagements among these consisted of 7 and 5 steps, respectively. With NO-OP(no decoy), the attacker spends time searching for data and attempting to escalate his privileges but without much success, finally exiting the system. With NO-OP(all decoys), the attacker either quickly exploits the vulnerabilities or encounters the data decoys but quickly exits, often because the encountered data is not as expected. In contrast, the I-POMDPx agent engaged with the attacker for a mean duration of 5.90 ± 0.24 steps, with the longest interaction lasting 9 steps. It leverages the information gained from the first few observations to avoid using decoys that the attacker would find suspicious. For example, the defender first manipulates the attacker’s observations about his privileges. This increases her chances of observing file enumeration or vulnerability discovery activity, forming a belief over the frames. Subsequently, the defender baits the attacker using decoys and observes the interaction to solidify her belief. This minimizes the risk of the attacker encountering unexpected decoys or noticing discrepancies.

The simulations above are predicated on the level-1 defender believing that none of the level-0 attacker types is aware of the deception, which is the typical case. However, if the defender believes that the attacker assigns a small probability of 0.1 to decoys being used, we observed that the attacker often quickly exited the system, as one would expect.

(a) When engaging a deception-unaware attacker, the I-POMDPx-based defender outperforms the passive agents in engaging the attackers and recognizing their intent.
(b) In the actual host deployment, the I-POMDPx-based agent uses the implemented deception techniques to engage with the attacker for a longer duration than the other agents.
Figure 3: Cross entropy (equivalently, KL divergence from the true type) of the beliefs of the I-POMDPx agent and the baselines. Cross entropies near zero signify good intent recognition.

Do the extended engagements facilitated by the I-POMDPx agent help in intent recognition? Figure 3(a) shows the cross-entropy between the defender’s belief over the attacker’s frame and the attacker’s true type, as it varies across the steps of the interaction. The defender’s I-POMDPx policy eventually yields the lowest cross-entropy values compared to the baselines, often reaching zero in 6 steps. We show the cross-entropy for more steps because the attacker remains in the system performing a few more actions. The sharp decrease in cross-entropy in the first three steps occurs because the attacker’s decoy interactions (if the attacker is of type data exfil or data manipulator) are perfectly observed by the defender (some other interactions generate noisy observations). Multiple consecutive data reconnaissance steps filter out the persistent threat attacker type, and the final step of either exfiltrating the data or manipulating it allows the defender to distinguish between the remaining two attacker types. For NO-OP(no decoy), with no deception, the only source of information about the attacker is observations of his general actions, which are noisy. Hence, such a defender is unable to form accurate beliefs before the attacker leaves the interaction. For the NO-OP(all decoys) agent that indiscriminately uses deception, observations from decoy interactions are perfect, but the risk of the attacker encountering contradicting decoys and suspecting deception is also high, leading to early exits.
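Since the attacker's true type is a one-hot distribution, the cross-entropy plotted in Figure 3 reduces to the negative log of the belief mass the defender places on the true frame (and coincides with the KL divergence to the true type). A small sketch with illustrative belief numbers of our own:

```python
import numpy as np

def frame_cross_entropy(belief, true_frame, eps=1e-12):
    """Cross-entropy between the defender's belief over attacker frames and
    a one-hot distribution on the true frame. With a one-hot target this is
    -log b(true frame): zero exactly when the defender is certain of the
    right type, large when the true type gets little belief mass."""
    return float(-np.log(belief[true_frame] + eps))

frames = ["data_exfil", "data_manipulator", "persistent_threat"]
b = {"data_exfil": 0.7, "data_manipulator": 0.2, "persistent_threat": 0.1}
ce = frame_cross_entropy(b, "data_exfil")
```

The epsilon guards against a zero belief on the true frame, which would otherwise produce an infinite cross-entropy on the plots.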

Figure 4: System architecture of the testbed used to deploy the agents. The defender manipulates the system through decoys and commonly used coreutils binaries to give deviant observations.
Host deployment

In our next phase of experimentation, we evaluated the real-world feasibility of deploying an operational I-POMDPx-based agent on a host system and tested its efficacy. The testbed consists of 3 separate hosts: the attacker, the adaptive honeypot, and the defender. Figure 4 shows the overall architecture of our testbed implementation. The attacker system runs a Kali Linux distribution, which is well known for the variety of offensive and defensive cybersecurity tools preinstalled on it. The adaptive honeypot on which the interaction takes place runs a Metasploitable 3 Linux distribution. This distribution has a wide range of built-in vulnerabilities and is commonly used to simulate victim workstations in cyber attack simulations. The adaptive honeypot also contains an attacker agent that executes the attacks and communicates with the attacker host. The attacker agent implements the actions given by the attacker’s optimal plan, located on the attacker host, using realistic techniques commonly employed by real attackers. We implement real exploits to facilitate privilege escalation on the host. The adaptive honeypot also has a defender agent that implements the defender’s actions and gathers observations.

The defender AI, located on the defender workstation, solves the I-POMDPx and computes the optimal action. For implementing the observation function, the I-POMDPx agent monitors and analyzes the system logs to obtain information about the attacker’s actions (i.e., observations). To enable this, we use GrAALF (Setayeshfar et al., 2019), a graphical framework for processing and querying system call logs. GrAALF analyzes system call logs in real time and provides the stochastic LOG_INFERENCE observation variable values (pertaining to file and vulnerability searches) as well as the perfectly observed DATA_DECOY_INTERACTION variable values to the defender.

Our results in Fig. 3(b) show that the adaptive deception strategy employed by the I-POMDPx agent is better at engaging adversaries on a honeypot than the commonly used passive strategies. While the cross entropy does not reach zero, owing to the challenge of accurately inferring the attacker’s actions from the logs (which yields noisier observations), it gets close to zero, indicating accurate intent recognition.

5 Related Work

AI methods are beginning to be explored for use in cyber deception. An area of significant recent interest has been game-theoretic multi-agent modeling of cyber deception, which contrasts with the decision-theoretic modeling adopted in this paper.

Schlenker et al. (Schlenker et al., 2018) introduced cyber deception games based on Stackelberg games (Simaan and J.B. Cruz, 1973). These model deception during the network reconnaissance phase when the attacker is deceived into intruding a honeypot. Another similar approach (Durkota et al., 2015) allocates honeypots in a network using a Stackelberg game. The game uses attack graphs to model the attacker and creates an optimal honeypot allocation strategy to lure attackers. Jajodia et al. (Jajodia et al., 2017) develop probabilistic logic to model deception during network scanning. While these efforts focus on static deployment of deception strategies at the network level, we seek active deception at the host level – once the attacker has entered the honeypot. Further, we model individual phases of the attack in greater detail, which allows us to employ realistic deception techniques at each phase.

At the host level, Carroll and Grosu (2011) model deception as a signaling game, while Horák et al. (2017) create a model for active deception using partially observable stochastic games. However, both of these take a high-level view, modeling defender actions rather abstractly. In contrast, our defender actions are realistic and can be implemented on honeypots, as demonstrated in Section 4. Ferguson-Walter et al. (2019) model possible differences between the attacker’s and defender’s perceptions of the interaction by modeling cyber deception as a hypergame (Kovach et al., 2015). Hypergames model different views of the game being played from the perspective of each player. While this approach, like ours, represents the attacker’s perspective of the game, we explicitly model the adversary using a subjective decision-theoretic approach and do not solve for equilibrium.

6 Conclusion

Our approach of utilizing automated decision making for deception to recognize attacker intent is a novel application of AI and decision making in cyber security. It elevates extant security methods from anomaly and threat detection to intent recognition. We introduced a factored variant of the well-known I-POMDP framework, which exploits the environment structure, and utilized it to model the new cyber deception domain. Our experiments revealed that the I-POMDPx-based agent succeeds in engaging various types of attackers for a longer duration than passive honeypot strategies, which facilitates intent recognition. Importantly, the agent is practical on a real system with logging capabilities, paving the way for its deployment in actual honeypots.

Broader Impact

On a broader scale, the I-POMDPx framework that we introduce makes I-POMDPs tractable enough to be applied to larger problems. I-POMDPs are well suited to modeling multi-agent interactions due to their ability to model opponents from the perspective of an individual agent, with applications in areas such as negotiation and the study of human behavior and cognition. Through our work, we hope to make I-POMDPs tractable for such domains. Another area that we hope to motivate through our research is deception in human interactions. Modeling other agents explicitly will help us understand how deceptive or real information influences an individual’s beliefs. This has a wide range of potential applications, such as studying how biases can be exploited, the effect of fake news on individuals, and how individuals can detect deception. We hope our research will eventually motivate further work in areas like counter-deception and deception resilience in agents.

At an application level, our work aims to motivate the use of AI and decision making to create informed cyber defense strategies. Our work provides a new perspective different from the traditional action-reaction dynamic that has defined interactions between cyber attackers and defenders for years. Our framework models the opponent’s mental states and preferences. This will aid security teams in understanding threats at a deeper level. We hope our framework will motivate the development of adaptive and intelligent deceptive solutions that can study and predict attackers at a deeper level. Understanding attackers’ mental models, inherent biases, and preferences will go a long way in forming flexible cyber defense strategies that can adapt to different threats.




Attacker Policies

We model the attackers using the optimal policies of their level-0 POMDPs. For our problem, we define three distinct attacker types, each modeled as a separate frame in the I-POMDP. Below, we discuss the optimal policy for each type.
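The three frames can be summarized by the objective each attacker is rewarded for and the reconnaissance action its optimal policy recommends first. The sketch below is illustrative only; the class and field names are hypothetical, not from the paper's implementation.

```python
# Hypothetical sketch of the three level-0 attacker frames in the I-POMDP.
# Each frame pairs the objective the attacker is rewarded for with the
# initial action its optimal policy recommends (action names from the paper).
from dataclasses import dataclass

@dataclass(frozen=True)
class AttackerFrame:
    name: str
    objective: str      # what the attacker's reward function targets
    initial_action: str # first action recommended by its optimal policy

FRAMES = [
    AttackerFrame("data_exfil", "steal sensitive_data", "FILE_RECON_SDATA"),
    AttackerFrame("data_manipulator", "manipulate critical_data", "FILE_RECON_CDATA"),
    AttackerFrame("persistent_threat", "root-level persistence", "VULN_RECON"),
]
```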

Figure 5: Optimal policy for data exfil type attacker

The data exfil attacker frame

The data exfil type attacker is rewarded for stealing sensitive_data on the host. We model this type on threats that steal private and otherwise sensitive data from systems. The attacker starts with no knowledge of whether such data exists on the system. The optimal policy initially recommends the FILE_RECON_SDATA action, which simulates sensitive-data discovery on computers. If the first few attempts fail to find data, the attacker escalates privileges and searches again. If the attacker encounters unexpected types of decoys, it leaves, since there is no reward for stealing data that is not sensitive. Observing discrepancies when data is found also alerts the attacker to the possibility of deception, because the system contains only a single type of data. On being alerted to the possibility of being deceived, the attacker leaves the system.
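The behavior described above can be sketched as a reactive policy. This is a simplification of the computed POMDP policy in Figure 5, not the policy itself; the observation labels and the EXFIL action name are hypothetical placeholders.

```python
# Illustrative reactive sketch of the data-exfil attacker's policy.
# FILE_RECON_SDATA and PRIV_ESC follow the paper; the observation labels
# and EXFIL are hypothetical placeholders for this sketch.
def data_exfil_policy(obs, failed_searches, has_root):
    if obs in ("DISCREPANCY", "UNEXPECTED_DECOY"):
        return "EXIT"              # suspects deception; no reward remains
    if obs == "SDATA_FOUND":
        return "EXFIL"             # steal the discovered sensitive data
    if failed_searches >= 2 and not has_root:
        return "PRIV_ESC"          # escalate privileges, then widen the search
    return "FILE_RECON_SDATA"      # keep searching for sensitive data
```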

The data manipulator attacker frame

Figure 6: Optimal policy for data manipulator type attacker

The data manipulator type attacker is rewarded for manipulating critical_data on the host. This type is modeled after attackers who intrude into systems to tamper with data that is critical to business operations. As with the data exfil type, the attacker starts with no information about the data. The optimal policy for this type recommends the FILE_RECON_CDATA action in the initial steps. Because critical data such as service configurations and databases is usually stored in well-known locations, FILE_RECON_CDATA is modeled to find critical_data more quickly than sensitive data is found. In subsequent interaction steps, the attacker escalates privileges to continue the search if no data was found initially. Like the data exfil attacker, the data manipulator leaves the system on observing discrepancies, suspecting deception, or failing to find data.

The persistent threat attacker frame

Figure 7: Optimal policy for persistent threat type attacker

The persistent threat type attacker aims to establish root-level persistence on the host. Such attacks are common: attackers establish a strong foothold in an organization's network and stay dormant for an extended duration. For this attacker type, the policy consists of vulnerability discovery actions in the initial steps. On finding a vulnerability, the attacker escalates privileges by performing the PRIV_ESC action. Once the attacker has the required privileges, it performs the PERSIST action to complete its objective.
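The persistent-threat policy of Figure 7 is the simplest of the three and can be sketched as a small state machine. The boolean state flags are hypothetical simplifications of the attacker's belief state; the action names follow the paper.

```python
# Illustrative sketch of the persistent-threat policy (Figure 7):
# discover a vulnerability, escalate, then persist. The boolean flags
# are hypothetical stand-ins for the attacker's belief state.
def persistent_threat_policy(vuln_found, has_root, persisted):
    if persisted:
        return "DONE"          # objective complete
    if has_root:
        return "PERSIST"       # establish root-level persistence
    if vuln_found:
        return "PRIV_ESC"      # exploit the discovered vulnerability
    return "VULN_RECON"        # enumerate scripts, services, system info
```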

While the three attacker policies differ significantly in their actions, the defender's observations of those actions are noisy. The observation errors stem from the noisy nature of real-time log analysis. For example, the VULN_RECON action models vulnerability discovery on a host: it involves searching the local file system for vulnerable scripts, enumerating system information, listing services, and so on. A VULN_RECON can therefore be mistaken for a FILE_RECON_CDATA or a FILE_RECON_SDATA in real-time log analysis. Similarly, FILE_RECON_CDATA and FILE_RECON_SDATA are difficult to distinguish from logs alone. Hence, without baiting the attacker into performing further actions, it is challenging to infer the attacker's intent from its first few actions.
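This observation noise, and why it makes early intent inference hard, can be sketched as a confusion matrix over logged actions together with a Bayesian update of the defender's belief over attacker frames. The confusion probabilities and frame names are invented for illustration; they merely encode the point that the three reconnaissance actions look alike in logs.

```python
# Illustrative defender-side observation noise: P(logged action | true action).
# Probabilities are made up, chosen so the recon actions are easily confused.
OBS_MODEL = {
    "VULN_RECON":       {"VULN_RECON": 0.5, "FILE_RECON_CDATA": 0.25, "FILE_RECON_SDATA": 0.25},
    "FILE_RECON_CDATA": {"VULN_RECON": 0.2, "FILE_RECON_CDATA": 0.4,  "FILE_RECON_SDATA": 0.4},
    "FILE_RECON_SDATA": {"VULN_RECON": 0.2, "FILE_RECON_CDATA": 0.4,  "FILE_RECON_SDATA": 0.4},
}

def update_frame_belief(belief, action_given_frame, observed):
    """Bayes update of P(frame) after one noisy logged action.

    belief: {frame: prob}; action_given_frame: {frame: true action it takes}.
    """
    posterior = {f: belief[f] * OBS_MODEL[action_given_frame[f]][observed]
                 for f in belief}
    z = sum(posterior.values())
    return {f: p / z for f, p in posterior.items()}
```

Note that after observing one recon action, the exfil and manipulator frames remain equally likely under this model, mirroring the paper's point that further engagement (baiting) is needed to separate them.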