A Game Theoretic Approach for Dynamic Information Flow Tracking to Detect Multi-Stage Advanced Persistent Threats

Advanced Persistent Threats (APTs) infiltrate cyber systems and compromise specifically targeted data and/or resources through a sequence of stealthy attacks consisting of multiple stages. Dynamic information flow tracking has been proposed to detect APTs. In this paper, we develop a dynamic information flow tracking game for resource-efficient detection of APTs via multi-stage dynamic games. The game evolves on an information flow graph, whose nodes are processes and objects (e.g., files, network endpoints) in the system and whose edges capture the interactions between different processes and objects. Each stage of the game has pre-specified targets, which are characterized by a set of nodes of the graph, and the goal of the APT is to evade detection and reach a target node of that stage. The goal of the defender is to maximize the detection probability while minimizing performance overhead on the system. The resource costs of the players are different and the information structure is asymmetric, resulting in a nonzero-sum imperfect information game. We first calculate the best responses of the players and characterize the set of Nash equilibria for single-stage attacks. Subsequently, we provide a polynomial-time algorithm to compute a correlated equilibrium for the multi-stage attack case. Finally, we evaluate our model and algorithms on real-world nation state attack data obtained from the Refinable Attack INvestigation (RAIN) system.

1 Introduction

Advanced Persistent Threats (APTs) are long-term stealthy attacks mounted by intelligent and resourceful adversaries with the goal of sabotaging critical infrastructures and/or exfiltrating critical information. Typically, APTs target companies and organizations that deal with high-value information and intellectual property. APTs monitor the system for a long time and perform tailored attacks that consist of multiple stages. In the first stage of the attack, APTs start with an initial reconnaissance step followed by an initial compromise. Once the attacker establishes a foothold in the system, the attacker tries to elevate privileges in the subsequent stages and proceeds to the target through more internal compromises. The attacker then performs data exfiltration at an ultra-low rate.

Detecting APTs is a challenging task as these attacks are stealthy and customized. However, APTs introduce information flows, in the form of data-flow and control-flow commands, while interacting with the system. Dynamic Information Flow Tracking (DIFT) is a promising detection mechanism against APTs, as DIFT detects adversaries in a system by tracking the traces of the information flows introduced in the system [1]. DIFT taints or tags sensitive information flows across the system as suspicious, tracks the propagation of tagged flows through the system, and generates security analyses, referred to as traps, which are based on certain pre-specified security rules, for any unauthorized usage of tagged data [2].

Our objective in this paper is to obtain a resource-efficient analytical model of DIFT to detect multi-stage APTs by an optimal tagging and trapping procedure. There is an inherent trade-off between the effectiveness of DIFT and the resource costs incurred due to memory overhead for tagging and tracking non-adversarial information flows. Adversarial interaction makes game theory a promising framework to characterize this trade-off and develop an optimal DIFT defense, which is the contribution of this paper. Each stage of the APT attack is a stage in our multi-stage game model, which is characterized by a unique set of critical locations and critical infrastructures of the system, referred to as destinations. Note that the intermediate stages in the attack hold information that is critical to the adversary for achieving its goals in the final stage.

The contributions of this paper are the following:

  1. We model the interaction of APTs and DIFT with the system as a two-player multi-stage nonzero-sum game with imperfect information structure. The adversary strategizes in each stage of the game to reach a destination node of that specific stage and the defender strategizes to detect the APT in a resource-efficient manner. A solution to this game gives an optimal policy for DIFT that performs selective tagging that minimizes both overtagging and undertagging, a tag propagation rule with tag sanitization, and an optimal selection of security rules and trap locations to conduct security analysis to maximize the probability of APT detection while minimizing memory and performance overhead on the system.

  2. We provide algorithms to compute best responses of the adversary and the defender. The best response of the adversary is obtained by reducing it to a shortest path problem on a directed graph such that a shortest path gives a sequence of transitions of the attacker that has maximum probability of reaching the final target. The best response of the defender, which is a subset of nodes, is obtained by using the submodularity property of its payoff function.

  3. We consider a special case of the problem where the attack is a single-stage attack. For this case, we characterize the set of Nash equilibria of the game. This characterization is obtained by proving the equivalence of the sequential game to a suitably defined bimatrix-game formulation.

  4. We provide a polynomial-time iterative algorithm to compute a correlated equilibrium of the game for the multi-stage attack. The correlated equilibrium yields locally optimal equilibrium strategies for both players; it is obtained by transforming the two-player game into a multi-player game whose number of players is determined by the number of processes and objects in the system, the number of stages of the APT attack, and the cardinality of the set of security rules.

  5. We perform experimental analysis of our model on the real-world multi-stage attack data obtained using the Refinable Attack INvestigation (RAIN) framework [3], [4] for a three-day nation state attack.
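
The reduction of the adversary's best response to a shortest path problem can be illustrated with a small sketch. It is not the paper's algorithm: it only shows the standard trick that maximizing a product of per-edge probabilities is the same as minimizing the sum of their negative logarithms, so Dijkstra's algorithm applies. The function name `most_probable_path`, the edge probabilities, and the node names are hypothetical.

```python
import heapq
import math


def most_probable_path(edges, source, targets):
    """Return (probability, path) of the most probable path from source to a target.

    edges maps (u, v) to an assumed probability that the transition u -> v
    succeeds. Maximizing a product of probabilities equals minimizing the
    sum of -log(p), so Dijkstra's algorithm can be used.
    """
    graph = {}
    for (u, v), p in edges.items():
        if p > 0.0:
            graph.setdefault(u, []).append((v, -math.log(p)))

    dist = {source: 0.0}
    parent = {}
    heap = [(0.0, source)]
    done = set()
    while heap:
        d, u = heapq.heappop(heap)
        if u in done:
            continue
        done.add(u)
        if u in targets:
            # Walk the parent pointers back to the source.
            path = [u]
            while path[-1] != source:
                path.append(parent[path[-1]])
            return math.exp(-d), path[::-1]
        for v, w in graph.get(u, []):
            nd = d + w
            if nd < dist.get(v, math.inf):
                dist[v] = nd
                parent[v] = u
                heapq.heappush(heap, (nd, v))
    return 0.0, []


# Hypothetical four-node example: two routes from the entry point to "d".
edges = {
    ("s0", "a"): 0.9, ("a", "d"): 0.5,
    ("s0", "b"): 0.8, ("b", "d"): 0.7,
}
prob, path = most_probable_path(edges, "s0", {"d"})
```

In this toy instance the route through `b` has probability 0.8 × 0.7 = 0.56, which beats 0.9 × 0.5 = 0.45 through `a`, so the shortest path in the log-transformed graph is the most probable one.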

Related Work

| Reference | Tag source | Tag propagation | Security analysis |
| --- | --- | --- | --- |
| Newsome et al. [1] | inputs from network sockets | data-flow-based | attacks altering jump targets, format string attacks, attacks using system call arguments, and attacks targeted at specific libraries |
| Clause et al. [5] | data from network hosts | data- and control-flow-based | each instance of call, return, or branch instruction |
| Suh et al. [2] | all I/O except initial program | data-flow-based | use of tainted data as load addresses, store addresses, jump targets, and branch conditions |
| Yin et al. [6] | text, password, HTTP, ICMP, FTP, document, and directory | data-flow-based | anomalous information access, anomalous information leakage, and excessive information access |
| Vogt et al. [7] | all inputs specified by Netscape | data-flow-based | whenever tainted data is transferred to a third party |
| Dalton et al. [8] | every word of memory | data-flow-based | high level semantic attacks, memory corruption, low-overhead security exceptions |

Table I: An overview of the DIFT architectures for data-flow and control-flow based tracking for different choices of tag sources and security analyses

There are different architectures for DIFT available in the literature to prevent a wide range of attacks [5]. The fundamental concepts in these architectures remain the same; however, they differ in the choice of tagging units, the tag propagation rules (rules based on data-flow and control-flow dependencies), and the set of security rules used for verifying the authenticity of the information flows [9]. Table I gives a brief overview of the different DIFT architectures used in some representative papers. While the papers in Table I give software models of the DIFT architecture, we provide an analytical model of DIFT. Specifically, we model DIFT to detect APTs by tracking information flows using data-flow dependencies.

Game theory has been widely used in the literature to analyze and design security in cyber systems against different types of adversaries [10], [11]. For instance, the FlipIt game modeled in [12] captures the interaction between APTs and the defender when both players are trying to take control of a cyber system. In [12], both the APT and the defender take actions periodically and pay a cost for each of their actions. Lee et al. [13] introduced a control-theoretic approach to model competing malware in the FlipIt game. Game models are also available for APT attacks in cloud storage [14] and cyber systems [15]. The interaction between an APT and a defender that allocate Central Processing Units (CPUs) over multiple storage devices in a cloud storage system is formulated as a Colonel Blotto (zero-sum) game in [14]. Another zero-sum game model is given in [15] to model the competition between the APT and the defender in a cyber system.

Often in practice, the resource costs for the defender and the adversary are not the same, and hence the game model is nonzero-sum. In this direction, a nonzero-sum game model is given in [16] to capture the interplay between the defender, the APT attacker, and the insiders for joint attacks. The approach in [16] models the incursion stage of the APT attack, while our model in this paper captures the different stages of an APT attack. More precisely, we provide a multi-stage game model that detects APTs by implementing a data-flow-based DIFT detection mechanism while minimizing resource costs.

A DIFT-based game model for single-stage attacks is given in the recent work [17]. Later, [18] extended the model in [17] to the case of multi-stage attacks. The approaches in [17] and [18] consider a DIFT architecture in which the locations in the system at which security analysis is performed, called traps or tag sinks, are pre-specified and the defender selects the data channels that are to be tagged. In this paper, we provide an analytical model for a data-flow-based DIFT architecture that selects not only the data channels to be tagged but also the locations at which to conduct security analysis and the security rules that are to be verified. The proposed model hence captures a general model of data-flow-based DIFT.

Organization of the Paper

The rest of the paper is organized as follows: Section 2 describes the preliminaries of DIFT and the system. Section 3 introduces the notations used in the paper and then presents the game formulation. Section 4 discusses the solution concept for the game we consider. Section 6 presents a solution approach to the game for the single-stage attack. Section 7 presents a solution to the game for the multi-stage attack. Section 8 explains the experimental results of the model and results on real-world data. Finally, Section 9 gives the concluding remarks.

2 Preliminaries

In this section, we discuss the detection mechanism DIFT and the graphical representation of the system referred to as information flow graph.

2-A Dynamic Information Flow Tracking

A DIFT detection system has three major components: 1) tag sources, 2) tag propagation rules, and 3) tag sinks or traps. A tag is a single-bit or multiple-bit marking, depending on the level of granularity, that denotes the sensitivity of a data flow. Data channels, such as keyboards, network interfaces, and hard disks, are considered sensitive and hence tagged by DIFT when they hold information that could be exploited by an APT [7]. All information flows emanating from a tagged channel are tagged flows. The tag status of the information flows propagates through the system based on the pre-specified propagation rules, which are either data-flow-based or data- and control-flow-based. Hence, whenever a tagged flow mixes with a benign flow, the resulting flow gets tagged [5].
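
As a rough illustration of a data-flow-based propagation rule, the sketch below walks a time-ordered list of flows and marks every entity that receives information from an already tagged entity. The function name `propagate_tags` and the entity names are hypothetical, and the rule shown is a simplification that ignores control-flow dependencies and tag sanitization.

```python
def propagate_tags(flows, tagged_sources):
    """Propagate tag status along a time-ordered list of flows (src, dst).

    Assumed data-flow rule: once information from a tagged entity reaches
    dst, dst becomes tagged as well.
    """
    tagged = set(tagged_sources)
    for src, dst in flows:
        if src in tagged:
            tagged.add(dst)
    return tagged


# Hypothetical log excerpt: the editor mixes a benign flow with a tagged one.
flows = [
    ("net_socket", "browser"),
    ("config_file", "editor"),
    ("browser", "tmp_file"),
    ("tmp_file", "editor"),
]
tagged = propagate_tags(flows, {"net_socket"})
```

Here `editor` ends up tagged because it receives a flow from `tmp_file`, even though its earlier input from `config_file` was benign; `config_file` itself stays untagged.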

Tagged flows are inspected at specific locations called tag sinks, also referred to as traps, in order to determine the runtime behavior of the system. Tag sinks are specified either using the memory and code locations (like tag sources) or using types of instructions where the users want to analyze a tagged flow before executing certain types of instructions [5]. Tag sinks are generated in the system when an unusual usage of tagged information is detected. The system then obtains the details of the associated flow, such as the terminal points of the flow and the path traversed, and concludes whether the flow is spurious or not based on the system's or program's security rules. If the system concludes that the flow is spurious, it terminates the system operation. On the other hand, if the flow is found to be not spurious, then the system continues its operation.

Conventional DIFT tags all the sensitive channels in the system. This, however, results in the tagging of numerous authentic flows, referred to as overtagging [9], which leads to false alarms and performance overhead resulting in system slowdown. On the other hand, untagged spurious flows due to undertagging are security threats to the system. Moreover, conventional DIFT only adds tags and never removes them, leading to tag spread [9]. To reduce tag spread and the overhead caused by tagging, the notion of tag sanitization was introduced in [9]. The output of constant operations (where the output is independent of the source data) and a tagged flow that successfully passes all security rules can be untagged. An efficient tagging policy must incorporate tag sanitization and perform selective tagging in such a way that both overtagging and undertagging are minimized. Also, the selection of security rules and the locations of the tag sinks must be optimal to reduce performance and memory overhead on the system.

2-B Information Flow Graph

The information flow graph $\mathcal{G} = (V_\mathcal{G}, E_\mathcal{G})$ is a graphical representation of the system in which the node set $V_\mathcal{G} = \{s_1, \ldots, s_N\}$ corresponds to the processes, objects, and files in the system and the edge set $E_\mathcal{G} \subseteq V_\mathcal{G} \times V_\mathcal{G}$ represents interactions between different nodes. More precisely, the edges of the graph represent information flows captured from the system log data for the whole-system execution and workflow during the entire period of logging. The node set $\mathcal{D} \subset V_\mathcal{G}$ denotes the subset of nodes that correspond to critical data centers and critical infrastructure sites of the system, known as destinations. We consider multi-stage attacks consisting of, say, $M$ stages, where each stage is characterized by a unique set of destinations. The set $\mathcal{D}_j$ denotes the set of destinations in the $j$-th stage of the attack, and hence $\mathcal{D} = \bigcup_{j=1}^{M} \mathcal{D}_j$. The interaction of DIFT and APTs, which we formally model in Section 3, evolves through $\mathcal{G}$.
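
A minimal sketch of how such a graph might be assembled from log records is given below. The record format (one `(source, destination)` pair per logged interaction), the function name `build_flow_graph`, and all node names are assumptions made for illustration; real system logs carry far more detail.

```python
def build_flow_graph(log_events):
    """Build an adjacency-set graph from (source, destination) log events."""
    graph = {}
    for src, dst in log_events:
        graph.setdefault(src, set()).add(dst)
        graph.setdefault(dst, set())  # keep nodes that only receive flows
    return graph


# Hypothetical log excerpt.
log_events = [
    ("firefox", "download.tmp"),
    ("download.tmp", "bash"),
    ("bash", "passwd_file"),
    ("bash", "net_endpoint"),
]
graph = build_flow_graph(log_events)

# Hypothetical per-stage destination sets for a two-stage attack.
destinations = {1: {"bash"}, 2: {"passwd_file", "net_endpoint"}}
all_destinations = set().union(*destinations.values())
```

The dictionary `destinations` plays the role of the per-stage destination sets, and `all_destinations` is their union.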

3 Problem Formulation: Game Model

In this section, we model a two-player multi-stage game between APTs and DIFT. We model the different stages of the game in such a way that each stage of the APT attack translates to a stage in the game.

3-A System Model

We denote the adversarial player of the game by $P_A$ and the defender player by $P_D$. In the $j$-th stage of the attack, the objective of $P_A$ is to evade detection and reach a destination node of stage $j$, given by the set $\mathcal{D}_j$. The objective of $P_D$ is to detect $P_A$ before $P_A$ reaches a node in $\mathcal{D}_j$. In order to detect $P_A$, $P_D$ identifies a set of processes $\mathcal{Y} \subset V_\mathcal{G}$ as the tag sources such that any information flow passing through a process $s_i \in \mathcal{Y}$ is marked as sensitive. $P_D$ tracks the traversal of a tagged flow through the system and generates security analysis at tag sinks, denoted as $\mathcal{T} \subset V_\mathcal{G}$, using pre-specified rules.

Let $\Lambda$ be the set of security rules. We consider security policies that are based on the terminal points of the flow. Therefore, $\Lambda$ assigns to each pair of nodes a value in $\{0, 1\}$, where 1 represents that the pair of terminal points of the flow violates the security policy of the system and 0 otherwise. Here, the number of rules is less than $N^2$, where $N$ is the number of nodes, since not all node pairs in the graph have a directed path between them. Hence the number of security rules that are relevant to a node is at most $N$. Without loss of generality, we assume that each node in the graph is associated with $N$ security rules. As $N$ is large, applying all security rules at every tag sink is often not required. In our game model, DIFT selects a subset of rules at every tag sink to perform security analysis.

3-B State Space of the Game

Let $\lambda \subseteq V_\mathcal{G}$ denote the subset of nodes in the information flow graph that are susceptible (vulnerable) to attacks. In order to characterize the entry point of the attack by a unique node, we introduce a pseudo-process $s_0$ such that $s_0$ is connected to all the processes in the set $\lambda$. The modified graph is obtained by adding the node $s_0$ to the node set and the edges $(s_0, s_i)$, for all $s_i \in \lambda$, to the edge set. Note that $s_0$ is the root node of the modified graph, and hence transitions are allowed from $s_0$ and no transition is allowed into $s_0$.
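
The augmentation with a pseudo-process can be sketched as follows, assuming the graph is stored as a dictionary of adjacency sets. The function name `add_pseudo_process`, the root label `"s0"`, and the node names are hypothetical.

```python
def add_pseudo_process(graph, vulnerable, root="s0"):
    """Return a copy of graph with a root node linked to every vulnerable node."""
    if root in graph:
        raise ValueError("root name already used by a node")
    augmented = {node: set(nbrs) for node, nbrs in graph.items()}
    augmented[root] = set(vulnerable)  # edges leave the root only
    return augmented


# Hypothetical three-process system with two vulnerable entry points.
graph = {"p1": {"p2"}, "p2": {"p3"}, "p3": set()}
augmented = add_pseudo_process(graph, vulnerable={"p1", "p2"})
has_incoming = any("s0" in nbrs for nbrs in augmented.values())
```

Because no existing adjacency set is modified, the root has outgoing edges only, which matches the requirement that no transition is allowed into the pseudo-process.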

Now we define the state space of the game. Each decision point in the game is a state of the state space $\mathbf{S}$ and is defined by the source of the flow in the set $\lambda$, the stage of the attack, and the current process $s_i$ along with its tag status, its trap status, and the status of the security rules applicable at $s_i$. We use $s_i^j$ to denote the process $s_i$ at the $j$-th stage of the attack. The first state of $\mathbf{S}$ corresponds to the pseudo-node $s_0$. Each remaining state records a source in $\lambda$, a process $s_i^j$ with $s_i \in V_\mathcal{G}$ and $j \in \{1, \ldots, M\}$, a tag bit, a tag-sink bit, and a binary vector for the selection of security rules. The tag bit is 1 if $s_i$ is tagged and 0 otherwise. Similarly, the tag-sink bit is 1 if $s_i$ is a tag sink and 0 otherwise, and in the rule-selection vector a bit 1 denotes that a rule is selected and a bit 0 denotes that the rule is not selected. Note that $\mathbf{S}$ has exponential cardinality. Tagging $s_0$ means tagging all sensitive flows, which is not desirable on account of the performance overhead. Therefore, $s_0$ is neither a tag source nor a tag sink, and it is always in stage 1 with origin at itself. We give the following definition for an adversarial flow in the state space $\mathbf{S}$ originating at the state of $s_0$.

Definition 3.1.

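An information flow in $\mathbf{S}$ that originates at the state of the pseudo-process $s_0$ and terminates at a state of stage $j$ is said to satisfy the stage-constraint if the flow passes through some destinations in $\mathcal{D}_1, \ldots, \mathcal{D}_{j-1}$, in order.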
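
One way to read the stage-constraint is sketched below: a flow that ends in a given stage must have visited a destination of each earlier stage, in stage order. This reading of "in order", the function name `satisfies_stage_constraint`, and the example sets are assumptions, not the paper's formal statement.

```python
def satisfies_stage_constraint(path, destinations, final_stage):
    """Check that path visits destinations of stages 1 .. final_stage - 1 in order.

    path is the sequence of visited nodes and destinations maps a stage
    number to its destination set.
    """
    stage = 1
    for node in path:
        # Advance to the next stage when a destination of the current stage is hit.
        if stage < final_stage and node in destinations[stage]:
            stage += 1
    return stage == final_stage


# Hypothetical destination sets for stages 1 and 2.
destinations = {1: {"a"}, 2: {"b"}}
in_order = satisfies_stage_constraint(["s0", "a", "x", "b"], destinations, 3)
out_of_order = satisfies_stage_constraint(["s0", "b", "a"], destinations, 3)
```

The first path reaches `a` before `b` and therefore qualifies for stage 3, while the second visits them in the wrong order and only qualifies for stage 2.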

3-C Actions of the Players

The players $P_A$ and $P_D$ have finite action sets over the state space $\mathbf{S}$, denoted by $\mathcal{A}_A$ and $\mathcal{A}_D$, respectively. The action of $P_A$ at a state selects the next node in the graph that is reached by the flow. $P_A$ can also end the game by dropping the information flow at any point of time by transitioning to a null state $\phi$. Thus the action set of $P_A$ consists of the nodes of the graph together with $\phi$. Note that the source of the flow recorded in a state of $\mathbf{S}$ is decided by the process in $\lambda$ to which the adversary transitions from $s_0$, i.e., by the first transition in the state space. Further, the source of a particular adversarial flow remains fixed for all states in $\mathbf{S}$ that the flow traverses. As the tag propagation rules are pre-specified by the user, the action set of $P_D$ includes the selection of tag sources, tag sinks, security check rules, and tag sanitization. Hence the action of the defender at a node is a binary tuple that specifies whether the node is tagged, whether it is a tag sink, and which security rules are selected. While the objective of $P_A$ is to exploit the vulnerable processes of the system to successfully launch an attack, the objective of $P_D$ is to select an optimal set of tagged nodes, an optimal set of tag sinks, and a set of security rules such that any spurious information flow in the system is detected at some tag sink before reaching the destination.

3-D Information of the Game

Both the adversary and the defender know the graph $\mathcal{G}$. At any state in the game, the defender has the information about the tag source status of the current process $s_i$, the tag sink status of $s_i$, and the set of security rules chosen at $s_i$. However, the adversary is unaware of the tag source status, the tag sink status, and the security rules chosen at that state. On the other hand, while the adversary knows the stage of the attack, the defender does not know the stage of the attack and hence the unique set of destinations targeted by $P_A$ in that particular stage. Thus, the players $P_A$ and $P_D$ have asymmetric knowledge, resulting in an imperfect information game.

3-E Strategies of the Players

Now we define the strategies of both the players. A strategy is a rule that the player uses to select actions at every step of the game. Since the action sets of the players are lower level processes with memory constraints and computational limitations, we consider stationary strategies which are defined below for both the players.

Definition 3.2.

A player strategy is stationary if it depends only on the current state.

Additionally, we consider mixed strategies, and hence there are probability distributions over the action sets $\mathcal{A}_A$ and $\mathcal{A}_D$. The defender strategy at a process $s_i$ is a tuple $p_D(s_i)$ that consists of the probability that $s_i$ is tagged, the probability that $s_i$ is a tag sink, and the probability of selecting each rule in $\Lambda$ corresponding to $s_i$; its length is therefore two plus the number of rules associated with $s_i$. The pseudo-process $s_0$ is neither a tag source nor a tag sink, so all entries of its tuple are zero. Note that the defender strategy does not depend on the stage, as the defender is unaware of the stage of the attack. The adversary, on the other hand, knows the stage of the attack, and hence the strategy of $P_A$, i.e., the transition probability distribution $p_A$, depends on the attack stage. Consider a process $s_i$ and let $\mathcal{N}(s_i)$ denote the set of neighbors of $s_i$, i.e., the nodes to which $s_i$ has an outgoing edge. A transition of the adversary from $s_i$ at stage $j$ has positive probability only in one of the following cases: 1) the transition stays in stage $j$ and goes to a neighbor in $\mathcal{N}(s_i)$ or to the null state $\phi$, and 2) the transition stays at $s_i$ and moves from stage $j$ to stage $j+1$. Here, case 1) corresponds to a transition in the same stage to a neighbor node or dropping out of the game, and case 2) corresponds to a transition at a destination from one stage to the next stage. Note that in case 2) $s_i$ is a destination of stage $j$, i.e., $s_i \in \mathcal{D}_j$.

Taken together, the strategies of $P_D$ and $P_A$ are given by the vectors $p_D$ and $p_A$, respectively. Note that $p_A$ is a vector whose length equals the number of edges in the state space $\mathbf{S}$, while $p_D$ is a vector with one entry per node, each entry being the tuple described above. Notice that $p_A$ is defined in such a way that a flow that originates at $s_0$ in the state space reaches a state of stage $j$, for some process and some $j$, only after passing through some destinations of stages $1, \ldots, j-1$. By this definition of the state space and the strategies of the game, all information flows in $\mathbf{S}$ satisfy the stage-constraint given in Definition 3.1, and can affect the performance of the system and even result in system breakdown, if malicious.

3-F Payoffs to the Players

Now we define the payoff functions of the players $P_A$ and $P_D$, denoted by $U_A$ and $U_D$, respectively. The payoff functions of both players include penalties and rewards at every stage of the attack. If the adversarial flow reaches a destination in the $j$-th stage satisfying the stage-constraint, then the adversary earns an intermediate reward and the defender incurs an intermediate penalty. On the other hand, if the adversary gets detected at some stage $j$, then the adversary incurs a penalty, the defender receives a reward, and the game terminates. In addition to this, the defender is also associated with costs for tagging the nodes, setting tag sinks at the nodes, and selecting security rules from the set $\Lambda$, as tagging and security analysis of information flows lead to resource overhead such as memory and storage.

More precisely, $U_A$ consists of: (i) a reward $\beta_j^A$ for successfully reaching a destination in the $j$-th stage satisfying the stage-constraint, and (ii) a cost $\alpha^A$ if the adversary is detected by the defender. Similarly, $U_D$ consists of: (a) a memory cost $C_D(s_i)$ for tagging node $s_i$, (b) a memory cost $W_D(s_i)$ for setting a tag sink at node $s_i$, (c) a cost $\gamma_r$, for each rule $r$ in $\Lambda$, for selecting the $r$-th security check rule at a tag sink, (d) a cost $\beta_j^D$ if the adversary reaches a destination in the $j$-th stage satisfying the stage-constraint, and (e) a reward $\alpha^D$ for detecting the adversary. We assume that the cost of tagging a node and the cost of setting a tag sink at a node, $C_D(s_i)$ and $W_D(s_i)$, respectively, are independent of the attack stage. However, $C_D(s_i)$ and $W_D(s_i)$ depend on the average traffic at process $s_i$, and hence $C_D(s_i) = c_1 B(s_i)$ and $W_D(s_i) = c_2 B(s_i)$. Here, $c_1 \in \mathbb{R}_-$ is a fixed tagging cost and $c_2 \in \mathbb{R}_-$ is a fixed cost for setting a tag sink, where $\mathbb{R}_-$ is the set of negative real numbers and $B(s_i)$ denotes the average traffic at node $s_i$.
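
The traffic-dependent costs can be sketched in a few lines, under the assumption that each cost is a fixed negative constant multiplied by the average traffic at the node. The function name `defender_costs`, the parameter names, and the numbers are hypothetical.

```python
def defender_costs(avg_traffic, c_tag, c_sink):
    """Return {node: (tagging_cost, sink_cost)} scaled by average traffic.

    c_tag and c_sink are fixed negative constants, so busier nodes are
    more expensive to tag or to use as tag sinks.
    """
    if c_tag >= 0 or c_sink >= 0:
        raise ValueError("fixed costs are expected to be negative")
    return {
        node: (c_tag * traffic, c_sink * traffic)
        for node, traffic in avg_traffic.items()
    }


# Hypothetical average traffic per node.
costs = defender_costs({"p1": 10, "p2": 3}, c_tag=-2, c_sink=-1)
```

With these numbers, tagging the busy node `p1` costs -20 while tagging `p2` costs -6, which reflects why selective tagging matters for overhead.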

Recall that the origin of any adversarial information flow in the state space is the state of the pseudo-process $s_0$. For a flow originating at that state, let $p_T(j)$ denote the probability that the flow gets detected at stage $j$ and let $p_R(j)$ denote the probability that the flow reaches some destination in the set $\mathcal{D}_j$. Note that $p_T(j)$ and $p_R(j)$ depend on the tag source status, the tag sink status, and the set of security rules selected. For given strategies $p_D$ and $p_A$, the payoffs $U_A$ and $U_D$ are given by

(1)
(2)

3-G Preliminary Analysis of the Model

In this subsection, we perform an initial analysis of our model. A multi-stage attack consisting of $M$ stages belongs to one of the following scenarios.

  1. The adversary drops out of the game before reaching some destination in $\mathcal{D}_1$.

  2. The adversary reaches some destination each in $\mathcal{D}_1, \ldots, \mathcal{D}_j$ and then drops out of the game, for $j = 1, \ldots, M-1$ ($M-1$ possibilities).

  3. The adversary reaches some destination each in $\mathcal{D}_1, \ldots, \mathcal{D}_M$.

  4. The defender detects the adversary at some stage.

The utility of the game is different for each of the cases listed above. In scenario 1), $P_A$ and $P_D$ incur zero payoff. In scenario 2), the adversary earns the stage rewards $\beta_1^A, \ldots, \beta_j^A$ for reaching stages $1, \ldots, j$, respectively, the defender incurs the stage penalties $\beta_1^D, \ldots, \beta_j^D$ for not detecting the adversary at stages $1, \ldots, j$, respectively, and the game terminates. In scenario 3), the adversary earns the rewards $\beta_1^A, \ldots, \beta_M^A$ for reaching destinations in all stages and wins the game, and the defender incurs a total penalty $\sum_{j=1}^{M} \beta_j^D$ for not detecting the adversary at any of the stages. In scenario 4), the adversary incurs the detection penalty $\alpha^A$ and the defender earns the detection reward $\alpha^D$ and wins the game.
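
The four scenarios can be turned into a small payoff calculator. The sketch below is an illustration under two assumptions that are not taken from the text: the defender's tagging, tag sink, and rule selection costs are left out, and rewards earned before a detection are kept by the adversary. The function name `scenario_payoffs` and all numbers are hypothetical.

```python
def scenario_payoffs(stages_reached, detected, beta_A, beta_D, alpha_A, alpha_D):
    """Return (adversary, defender) payoffs for one play of the game.

    stages_reached: number of stages whose destination was reached.
    beta_A[k], beta_D[k]: adversary reward / defender penalty for stage k + 1.
    alpha_A, alpha_D: adversary penalty / defender reward upon detection.
    Assumption: rewards earned before a detection are kept.
    """
    adversary = sum(beta_A[:stages_reached])
    defender = sum(beta_D[:stages_reached])
    if detected:
        adversary += alpha_A
        defender += alpha_D
    return adversary, defender


# Hypothetical three-stage attack.
beta_A, beta_D = [2, 3, 5], [-2, -3, -5]
alpha_A, alpha_D = -10, 10

dropped_early = scenario_payoffs(0, False, beta_A, beta_D, alpha_A, alpha_D)
dropped_after_two = scenario_payoffs(2, False, beta_A, beta_D, alpha_A, alpha_D)
completed = scenario_payoffs(3, False, beta_A, beta_D, alpha_A, alpha_D)
caught_in_stage_one = scenario_payoffs(0, True, beta_A, beta_D, alpha_A, alpha_D)
```

The four calls correspond to scenarios 1) through 4) in the list above and give the payoff pairs (0, 0), (5, -5), (10, -10), and (-10, 10), respectively.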

(3)
(4)

For calculating the payoffs of and at a decision point in the game (i.e., at a state in ), we define utility functions and for the adversary and defender, respectively, at every state in the state space . Let denotes the probability with which the adversary drops out of the game at state , for any and . Let denotes the probability that an information flow originating at reaches a destination in and then drops out before reaching a destination in , without getting detected by the defender, when the current state is . Also let denote the probability that an information flow is detected by the defender when the current state is . To characterize the utility of the players at a state in , we now introduce few notations. For notational brevity, let us denote by , for . For state , define

Then,

Using the definitions of and at a state in , the payoffs of the defender and the adversary at a state is given by Eqs. (3) and (4) respectively.

In Eqs. (3) and (4), denotes the probability that node is tagged in a flow whose current state is and denotes the probability that node is a tag sink in a flow whose current state is . Similarly, denotes the probability that the security rule is selected for inspecting authenticity of a flow whose current state is . Eqs. (3) and (4) give a system of linear equations each for the utility vectors and , where , denote the utilities at the state in . Now we give the following result, which relates global payoffs with local payoffs , respectively.

Lemma 3.3.

Consider the defender and adversary strategies and , respectively. Then, the following hold: (i) , and (ii) .

Proof.

(i): By definition, . Here,

(5)

Where, is the total probability that a flow originating at reach some destination in . Similarly, is the total probability that a flow originating at reach some destination in . Thus

(6)

From Eqs. (5) and (6), we get

(7)

Since ,

(8)

From Eqs. (7) and (8), we get .

(ii): Notice that is the probability that the process is a tag source in a flow originating at . Thus . Similarly, we get and for . This along with Eqs. (7) and (8) implies that . This completes the proof of (i) and (ii). ∎

4 Game Model: Solution Concept

This section presents an overview of the notions of equilibrium considered in this work. We first describe the concept of a player’s best response to a given mixed policy of an opponent.

Definition 4.1.

Let $p_A$ denote an adversary strategy (transition probabilities) and $p_D$ denote a defender strategy (probabilities of tagging, tag sink selection, and security rule selection at every node in the graph). The set of best responses of the defender is given by

$$\mathrm{BR}(p_A) = \arg\max_{p_D} U_D(p_D, p_A).$$

Similarly, the best responses of the adversary are given by

$$\mathrm{BR}(p_D) = \arg\max_{p_A} U_A(p_D, p_A).$$

Intuitively, the best responses of the defender are the set of tagging strategies, the set of tag sink selection strategies, and the set of security rule selection strategies that jointly maximize the defender’s utility for a given adversary strategy. At the same time, the best responses of the adversary are the sets of transition probabilities that maximize the adversary’s utility for a given defender (tagging, tag sink selection, and security rule selection) strategy. A mixed policy profile is a Nash equilibrium (NE) if the mixed policy of each player is a best response to the fixed mixed policy of the rest of the players. Formal definition of Nash equilibrium is as follows.

Definition 4.2.

A pair of mixed policies $(p_D, p_A)$ is a Nash equilibrium if $p_D \in \mathrm{BR}(p_A)$ and $p_A \in \mathrm{BR}(p_D)$.
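
For a finite two-player game in bimatrix form, the best-response condition can be checked directly. The sketch below is a generic textbook check, not the paper's characterization; the function name `is_nash_equilibrium` and the 2x2 payoff numbers (a defender who inspects or skips, an adversary who attacks or waits) are hypothetical.

```python
def is_nash_equilibrium(payoff_row, payoff_col, x, y, tol=1e-9):
    """Check whether mixed strategies (x, y) form a Nash equilibrium.

    payoff_row[i][j] and payoff_col[i][j] are the payoffs of the row and
    column player when row action i meets column action j. Comparing
    against pure deviations suffices, because the payoff of a mixed
    deviation is an average of pure-deviation payoffs.
    """
    rows, cols = len(x), len(y)
    row_vals = [sum(payoff_row[i][j] * y[j] for j in range(cols)) for i in range(rows)]
    col_vals = [sum(payoff_col[i][j] * x[i] for i in range(rows)) for j in range(cols)]
    row_now = sum(x[i] * row_vals[i] for i in range(rows))
    col_now = sum(y[j] * col_vals[j] for j in range(cols))
    return max(row_vals) <= row_now + tol and max(col_vals) <= col_now + tol


# Rows: defender inspects / skips. Columns: adversary attacks / waits.
defender = [[2, -1], [-4, 0]]
adversary = [[-3, 0], [3, 0]]

mixed_ok = is_nash_equilibrium(defender, adversary, [0.5, 0.5], [1 / 7, 6 / 7])
pure_ok = is_nash_equilibrium(defender, adversary, [1, 0], [0, 1])
```

In this toy game the mixed profile makes both players indifferent between their actions and passes the check, whereas the pure profile (inspect, wait) fails because the defender would rather skip the inspection.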

A Nash equilibrium captures the notion of a stable solution, as it occurs when neither player can improve its payoff by unilaterally changing its strategy. A unilateral deviation of the adversary's strategy is a change in one of the transition probabilities for a fixed defender strategy, and a unilateral deviation of the defender's strategy is a change in either the tagging probability, the tag sink selection probability, or the probability of selecting a security rule at a node, for a fixed adversary strategy. Kuhn's equivalence result [19] between mixed and stochastic policies, along with Nash's result in [20] that proves the existence of a Nash equilibrium (NE) for a finite game with mixed strategies, guarantees the existence of an NE for the game we consider in this paper. While there exists a Nash equilibrium for games with rational, noncooperative players, it is NP-hard to compute it in general, especially for nonzero-sum dynamic games of the type considered in this paper. Also note that, for the game considered in this paper, the utility functions of the players are nonlinear in the probabilities. A weaker solution concept, which is a relaxation of the Nash equilibrium, is the correlated equilibrium defined as follows.

Definition 4.3.

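Let $P$ denote a joint probability distribution over the set of defender and adversary actions. The distribution $P$ is a correlated equilibrium if for all strategies $p_D'$ and $p_A'$,

$$\mathbf{E}\big[U_D(p_D, p_A)\big] \geq \mathbf{E}\big[U_D(p_D', p_A)\big] \quad \text{and} \quad \mathbf{E}\big[U_A(p_D, p_A)\big] \geq \mathbf{E}\big[U_A(p_D, p_A')\big].$$

Here, $\mathbf{E}$ denotes the expectation with respect to $P$. We next consider a simpler version of the correlated equilibrium that models the local policies at each process.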
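
For intuition, the standard correlated equilibrium conditions for a finite two-player game can be verified numerically. The sketch below uses the textbook formulation (no player gains by deviating from a recommended action, conditional on that recommendation); it is not the paper's iterative algorithm. The function name `is_correlated_equilibrium` is hypothetical and the payoffs are those of the classic game of Chicken.

```python
def is_correlated_equilibrium(payoff_row, payoff_col, joint, tol=1e-9):
    """Check the correlated equilibrium inequalities for a bimatrix game.

    joint[a][b] is the probability that the mediator recommends row action
    a and column action b. For every recommended action and every possible
    deviation, the expected gain from deviating must not be positive.
    """
    rows, cols = len(joint), len(joint[0])
    for a in range(rows):
        for dev in range(rows):
            gain = sum(
                joint[a][b] * (payoff_row[dev][b] - payoff_row[a][b])
                for b in range(cols)
            )
            if gain > tol:
                return False
    for b in range(cols):
        for dev in range(cols):
            gain = sum(
                joint[a][b] * (payoff_col[a][dev] - payoff_col[a][b])
                for a in range(rows)
            )
            if gain > tol:
                return False
    return True


# Game of Chicken, actions ordered as (dare, chicken) for both players.
payoff_row = [[0, 7], [2, 6]]
payoff_col = [[0, 2], [7, 6]]

balanced = [[0, 1 / 3], [1 / 3, 1 / 3]]  # never recommend (dare, dare)
always_chicken = [[0, 0], [0, 1]]

balanced_ok = is_correlated_equilibrium(payoff_row, payoff_col, balanced)
always_chicken_ok = is_correlated_equilibrium(payoff_row, payoff_col, always_chicken)
```

The balanced distribution satisfies all the inequalities, while always recommending (chicken, chicken) does not, since either player would gain by daring. The inequalities are linear in the joint distribution, which is one reason correlated equilibria are easier to compute than Nash equilibria.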

Definition 4.4.

Let denote a joint probability distribution over the set of defender and adversary actions. The distribution is a local correlated equilibrium if for all states , , and strategies and , we have