1 Introduction
Penetration testing (pentesting) evaluates the security of an IT infrastructure by trying to identify and exploit vulnerabilities. It constitutes a central, often mandatory component of a security audit; e.g., the Payment Card Industry Data Security Standard prescribes ‘network vulnerability scans at least quarterly and after any significant change in the network’ [10]. Network penetration tests are frequently conducted on networks with hundreds of machines. Here, the vulnerability of the network is a combination of host-specific weaknesses that compose into an attack. Consequently, an exhaustive search is out of the question, as the search space for these combinations grows exponentially with the number of hosts. Choosing the right attack vector requires a vast amount of experience, arguably making network pentesting more art than science.
While it is conceivable that an experienced analyst comes up with several of the most severe attack vectors, this is not sufficient to provide for a sound mitigation strategy, as the evaluation of a mitigation strategy requires a holistic security assessment. So far, there is no rigorous foundation for what is arguably the most important step, the step after the penetration test: how to mitigate these vulnerabilities.
In practice, the severity of weaknesses is assessed more or less in isolation, proposed countermeasures all too often focus on single vulnerabilities, and the mitigation path is left to the customer. There are exceptions, but they require considerable manual effort.
Simulated pentesting was proposed to automate large-scale network testing by simulating the attack-finding process on a logical model of the network. The model may be generated from network scans, public vulnerability databases, and manual inspection, with various degrees of automation and detail. To this end, AI planning methods have been proposed [4, 33] and in fact used commercially, at a company called Core Security, since at least 2010 [11]. These approaches, which derive from earlier approaches based on attack graphs [40, 45, 46], assume complete knowledge of the network configuration, which is often unavailable to the modeller, as well as to the attacker. We follow a more recent approach favouring Markov decision processes (MDPs) as the underlying state model, to obtain a good middle ground between accuracy and practicality [12, 19] (we discuss this in detail as part of our related-work discussion, Section 2).

Simulated penetration testing has been used to great success, but an important feature has been overlooked so far. If a model of the network is given, one can reason about possible mitigations without implementing them – namely, by simulating the attacker on a modified model. This allows for analysing and comparing different mitigation strategies in terms of the (hypothetical) network resulting from their application. Algorithmically, the attacker-planning problem now becomes part of a larger what-if planning problem, in which the best mitigation plans are constructed.
The algorithm we propose optimizes the mitigation strategy based on a set of possible mitigation actions. Mitigation actions can represent, but are not limited to, changes to the network topology, e.g., adding a packet filter; system updates that remove vulnerabilities; and configuration changes or application-level firewalls that work around issues. While, e.g., an application-level firewall might be an efficient temporary workaround for a vulnerability that affects a single host, contracting a software vendor to provide a patch might be more cost-efficient in case the vulnerability appears throughout the network. To reflect cases like this, mitigation actions are assigned a cost for their first application (setup cost), and another, potentially different, cost for all subsequent applications (application cost). The algorithm computes optimal combinations w.r.t. minimizing the maximal attacker success for a given budget, and proposes dominant mitigation strategies with respect to cost and attacker success probability. This min-max notion is similar to a Stackelberg game, a model frequently used in security games [29]. The foundational assumption is that the defender acts first, while the adversary can choose her best response after observing this choice, similar to a market leader and her followers. The algorithm thus provides a well-founded basis for a holistic mitigation strategy.

After discussing related work in Section 2 and giving a running example in Section 3, we present our mitigation-analysis models and algorithms in Sections 4 to 6, framed in a formalism suited for a large range of mitigation/attack planning problems. In Section 7, we show that a particular class of these models can be derived by scanning a given network with the Nessus network-vulnerability scanner. The attacker action model is then derived using a vulnerability database and the associated Common Vulnerability Scoring System (CVSS) data. This methodology provides a largely automated way of deriving a model (only the network topology needs to be given by hand), which can then be used as is, or refined further. In Section 8, we evaluate our algorithms on problems from this class, derived from a vulnerability database and a simple scalable network topology.
2 Related Work
Our work is rooted in a long line of research on network security modeling and analysis, starting with the consideration of attack graphs. The simulated pentesting branch of this research essentially formulates attack graphs in terms of standard sequential decision making models – attack planning – from AI. We give a brief background on the latter first, before considering the history of attack graph models.
Automated Planning is one of the oldest subareas of AI (see [13] for a comprehensive introduction). The area is concerned with general-purpose planning mechanisms that automatically find a plan, when given as input a high-level description of the relevant world properties (the state variables), the initial state, a goal condition, and a set of actions, where each action is described in terms of a precondition and a postcondition over state-variable values. In classical planning, the initial state is completely known and the actions are deterministic, so the underlying state model is a directed graph (the state space) and the plan is a path from the initial state to a goal state in that graph. In probabilistic planning, the initial state is completely known but the action outcomes are probabilistic, so the underlying state model is a Markov decision process (MDP) and the plan is an action policy mapping states to actions. In partially observable probabilistic planning, we are in addition given a probability distribution over the possible initial states, so the underlying state model is a partially observable MDP (POMDP).
The founding motivation for Automated Planning mechanisms is flexible decision taking in autonomous systems, yet the generality of the models considered lends itself to applications as diverse as the control of modular printers [42], natural language sentence generation [25, 26], greenhouse logistics [18], and, in particular, network security penetration testing [4, 33, 43, 12, 19]. This latter branch of research – network attack planning as a tool for automated security testing – has been coined simulated pentesting, and is what we continue here.
Simulated pentesting is rooted in the consideration of attack graphs, first introduced by Phillips and Swiler [40]. An attack graph breaks down the space of possible attacks into atomic components, often referred to as attack actions, where each action is described by a conjunctive precondition and postcondition over relevant properties of the system under attack. This is closely related to the syntax of classical planning formalisms. Furthermore, the attack graph is intended as an analysis of threats that arise through the possible combinations of these actions. This is, again, much as in classical planning. That said, attack graphs come in many different variants, and the term “attack graph” is rather overloaded. From our point of view here, the relevant distinction lines are the following.
In several early works (e. g. [45, 50]), the attack graph is the attack-action model itself, presented to the human as an abstracted overview of (atomic) threats. It was then proposed to instead reason about combinations of atomic threats, where the attack graph (also: “full” attack graph) is the state space arising from all possible sequencings of attack actions (e. g. [41, 46]). Later, positive formulations – positive preconditions and postconditions only – were suggested as a relevant special case, where attackers keep gaining new assets, but never lose any assets during the course of the attack [50, 2, 23, 37, 36, 14]. This restriction drastically simplifies the computational problem of non-probabilistic attack-graph analysis, yet it also limits expressive power, especially in probabilistic models where a stochastic effect of an attack action (e. g., crashing a machine) may be detrimental to the attacker’s objectives.¹

¹ The restriction to positive preconditions and postconditions is actually known in Automated Planning not as a planning problem of interest in its own right, but as a problem relaxation, serving for the estimation of goal distance to guide search on the actual problem [6, 20].

A close relative of attack graphs are attack trees (e. g. [45, 34]). These arose from early attack-graph variants, and developed into “Graphical Security Models” [27]: directed acyclic AND/OR graphs organizing known possible attacks into a top-down refinement hierarchy. The human user writes that hierarchy, and the computer analyzes how attack costs and probabilities propagate through it. In comparison to attack graphs and planning formulations, this has computational advantages, but cannot find unexpected attacks arising from unforeseen combinations of atomic actions.
Probabilistic models of attack graphs/trees have been considered widely (e. g. [8, 35, 44, 47, 9, 30, 21]), yet they were not, at first, given a formal semantics in terms of standard sequential decision making formalisms. The latter was done later on by the AI community in the simulated pentesting branch of research. After initial works linking non-probabilistic attack graphs to classical planning [4, 33], Sarraute et al. [43] devised a comprehensive model based on POMDPs, designed to capture penetration testing as precisely as possible, explicitly modeling the incomplete knowledge on the attacker’s side, as well as the development of that knowledge during the attack. As POMDPs do not scale – neither in terms of modeling nor in terms of computation – it was thereafter proposed to use MDPs as a more scalable intermediate model [12, 19]. Here we build upon this latter model, extending it with automated support for mitigation analysis.
Mitigation analysis models not only the attacker, but also the defender, and in that sense relates to game-theoretic security models. The most prominent application of such models thus far concerns physical infrastructures and defenses (e. g. [49]), quite different from the network security setting. A line of research considers attack-defense trees (e. g. [28, 27]), not based on standard sequential decision making formalisms. Some research considers pentesting, but from a very abstract theoretical perspective [5]. A basic difference to most game-theoretic models is that our mitigation analysis does not consider arbitrarily long exchanges of action and counteraction, but only a single such exchange: the defender applies network fixes, then the attacker attacks the fixed network. The latter relates to Stackelberg competitions, yet with interacting state-space search models underlying each side of the game.
3 Running Example
We will use the following running example to ease the introduction of our formalism and to foreshadow the modelling of networks which we will use in Section 7. Let us consider a network of five hosts, i.e., computers that are assigned an address at the network layer. It consists of a webserver, an application server, a database server, and a workstation. We partition the network into four zones as follows: 1) the sensitive zone, which contains important assets, i.e., the database server and the information it stores, 2) the DMZ, which contains the services that need to be available from the outside, i.e., the webserver and the application server, 3) the user zone, in which the workstation is placed, and 4) the internet, which is assumed to be under adversarial control by default and contains at least one host.
These zones are later (cf. Section 8) used to define the adversarial goals and may consist of several subnets. For now, each zone except the internet consists of exactly one subnet. These subnets are interconnected, with the exception of the internet, which is only connected to the DMZ. Firewalls filter some packets transmitted between the zones. We will assume that the webserver can be accessed via HTTPS (port 443) from the internet.
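The topology of the running example can be sketched as plain data. All identifiers (host, zone and port names) besides HTTPS/443 are illustrative assumptions of this sketch, not given in the text:

```python
# Hypothetical encoding of the example topology; names are illustrative.
hosts = {"web": "dmz", "app": "dmz", "db": "sensitive",
         "ws": "user", "inet": "internet"}

# Firewall policy: (source zone, destination zone, destination port)
# triples that are allowed to pass.  Only HTTPS may enter the DMZ
# from the internet, as assumed in the running example.
allowed = {("internet", "dmz", 443),
           ("dmz", "sensitive", 5432),   # assumed database port
           ("user", "dmz", 443),
           ("user", "sensitive", 5432)}

def can_reach(src_host, dst_host, port):
    """True if a packet from src_host to dst_host on port passes the filters."""
    return (hosts[src_host], hosts[dst_host], port) in allowed
```

For instance, `can_reach("inet", "web", 443)` holds, while direct access from the internet to the sensitive zone does not.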
4 Penetration Testing Formalism
Intuitively, the attacks we consider might make a service unavailable, but not physically remove a host from the network or add a physical connection between two hosts. We thus distinguish between network propositions and attacker propositions, where the former describes the network infrastructure and persistent configuration, while the latter describes the attacker’s advance through the network. By means of this distinction, we may assume the state of the network to be fixed, while everything else can be manipulated by the attacker. The network state will, however, be altered during mitigation analysis, which we will discuss in more detail in Section 5.
Networks are logically described through a finite set of network propositions. A concrete network state is the subset of network propositions that are true in this state; all other propositions are considered to be false.
Example 1
In the running example, the network topology is described in terms of network propositions assigning each host to a subnet. Connectivity is defined between subnets; a proposition may, e.g., indicate that TCP packets with destination port 443 (HTTPS) can pass from the internet into the DMZ. We assume that the webserver, the workstation and the database server are vulnerable, e.g., to a vulnerability with a CVE identifier affecting the webserver on TCP port 443, which compromises integrity.
We formalize network penetration tests in terms of a probabilistic planning problem:
Definition 1 (penetration testing task)
A penetration testing task is a tuple consisting of:

- a finite set of attacker propositions,
- a finite set of (probabilistic) attacker actions (cf. Definition 2),
- the attacker’s initial state,
- a conjunction over attacker-proposition literals, called the attacker goal, and
- a non-negative attacker budget, including the special case of an unlimited budget.
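A minimal sketch of Definition 1 as a data structure; the field and proposition names are illustrative assumptions, not the paper's notation:

```python
import math
from dataclasses import dataclass
from typing import Any, FrozenSet, Tuple

@dataclass(frozen=True)
class PentestTask:
    """Sketch of Definition 1; field and proposition names are illustrative."""
    propositions: FrozenSet[str]    # finite set of attacker propositions
    actions: Tuple[Any, ...]        # attacker actions (cf. Definition 2)
    init: FrozenSet[str]            # attacker's initial state
    goal: Tuple[str, ...]           # conjunction over attacker literals
    budget: float = math.inf        # non-negative; math.inf = unlimited

# A tiny instance: the attacker starts on the internet and wants a
# privacy loss on the database server, with a budget of 2 units.
task = PentestTask(propositions=frozenset({"ctrl_inet", "priv_db"}),
                   actions=(),
                   init=frozenset({"ctrl_inet"}),
                   goal=("priv_db",),
                   budget=2)
```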
The objective in solving such a task – the attacker’s objective – will be to maximize attack probability, i. e., to find action strategies maximizing the likelihood of reaching the goal. We now specify this in detail.
The attacker propositions are used to describe the state of the attack, e. g., dynamic aspects of the network and which hosts the attacker has gained access to.
Example 2
Consider an attacker that initially controls the internet and has not yet caused the webserver to crash. The attacker’s aim might be to inflict a privacy loss on the database server, within a budget whose units relate to the attacker actions below.
The attacks themselves are described in terms of actions which can depend on both network and attacker propositions, but only influence the attacker state.
Definition 2 (attacker actions)
An attacker action is a tuple consisting of:

- a conjunction over network-proposition literals called the network-state precondition,
- a conjunction over attacker-proposition literals called the attacker-state precondition,
- the action cost, and
- a finite set of outcomes, each consisting of an outcome probability and a postcondition over attacker-proposition literals. We assume that the outcome probabilities sum up to 1.
The network-state precondition, the attacker-state precondition and the postconditions represent the conditions under which an action can be applied, as well as the stochastic effect of its application: each outcome's postcondition holds after the application with the respective outcome probability. This can be used to model attacks that are probabilistic by nature, as well as to model incomplete knowledge (on the attacker’s side) about the actual network configuration.

Because postconditions are limited to attacker propositions, we implicitly assume that the attacker cannot have a direct influence on the network itself. Although this is very restrictive, it is a common assumption in the penetration testing literature (e. g. [23, 37, 36, 14]). The attacker action cost can be used to represent the effort the attacker has to put into executing what is being abstracted by the action. This can, for example, be the estimated amount of time an action requires to be carried out, or the actual cost in terms of monetary expenses.
Example 3
If an attacker controls a host which can access a second host that runs a vulnerable service, it can compromise the second host w.r.t. privacy, integrity or availability, depending on the vulnerability. This is reflected, e.g., by an attacker action which requires access to a vulnerable webserver within the DMZ, via the internet. In addition, the attacking host on the internet needs to be under adversarial control (which is the case initially), and the webserver needs to be available.

The cost of exploiting this known vulnerability may be set to one unit, in which case the adversarial budget above relates to the number of such vulnerabilities used. More elaborate models are possible to distinguish known vulnerabilities from zero-day exploits, which may exist but can only be bought or developed at high cost, or from threats arising from social engineering.

There could be three different outcomes with different probabilities: one in case the exploit succeeds, one in case the exploit has no effect, and one if it crashes the webserver. For example, the success probability may be high while the crash probability is small (but not negligible), because the exploit is of a stochastic nature.
Regarding the first action outcome, note that we step here from a vulnerability that affects integrity to the adversary gaining control over the affected host. This is, of course, not a requirement of our formalism; it is a practical design decision that we make in our current model-acquisition setup (and that was made by previous works on attack graphs with similar model-acquisition machinery, e. g. [37, 47]), because the available vulnerability databases do not distinguish between a privilege escalation and other forms of integrity violation. We get back to this in Section 7.
Regarding the third action outcome, note that negation is used to denote the removal of literals, i. e., the following attacker state will no longer contain the webserver's availability proposition, so that all vulnerabilities on the webserver cease to be useful to the attacker.
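The attacker action discussed in Example 3 can be sketched as follows; the proposition names and all probability and cost values are illustrative assumptions, with the outcome structure (success, no effect, crash) taken from the example:

```python
from dataclasses import dataclass
from typing import Tuple

@dataclass(frozen=True)
class Outcome:
    prob: float
    post: Tuple[str, ...]           # attacker literals; "!p" removes p

@dataclass(frozen=True)
class AttackerAction:
    """Sketch of Definition 2; names and numbers below are assumptions."""
    name: str
    pre_net: Tuple[str, ...]        # network-state precondition
    pre_att: Tuple[str, ...]        # attacker-state precondition
    cost: float
    outcomes: Tuple[Outcome, ...]

    def __post_init__(self):
        total = sum(o.prob for o in self.outcomes)
        assert abs(total - 1.0) < 1e-9, "outcome probabilities must sum to 1"

exploit_web = AttackerAction(
    name="exploit_web",
    pre_net=("vuln_web", "conn_inet_dmz_443"),
    pre_att=("ctrl_inet", "avail_web"),
    cost=1,
    outcomes=(Outcome(0.80, ("ctrl_web",)),     # exploit succeeds
              Outcome(0.15, ()),                # no effect
              Outcome(0.05, ("!avail_web",))))  # crashes the webserver
```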
Assume a fixed penetration testing task. Given some network state, we can now define the state space, in which attacks are computed.
Definition 3 (state space)
The state space of a penetration testing task in a given network state is the probabilistic transition system consisting of:

- the set of attacker states, or states for short, where each state is associated with the set of attacker propositions true in it and the remaining budget;
- the transition probability function, corresponding to the application of attacker actions to states. An attacker action is applicable to a state if its network precondition is satisfied in the network state, its attacker precondition is satisfied in the state, and there is enough remaining budget for its application. The result of an outcome is the state that contains all propositions contained in the previous state or in the outcome's postcondition, except those whose negation occurs in the postcondition; the remaining budget is reduced by the action cost. For an applicable action, the transition probability from a state to each outcome's resulting state is the respective outcome probability,² and 0 otherwise;
- the initial state, associated with the attacker's initial propositions and the full attacker budget; and
- the set of goal states: those states in which the attacker goal is satisfied.

² We assume here that each outcome leads to a different state.
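The successor computation of Definition 3 can be sketched for a single outcome; the literal encoding (`"!p"` for a negated proposition) is an assumption of this sketch:

```python
def apply_outcome(att_state, budget, action_cost, post):
    """Successor of one outcome per Definition 3 (sketch).  att_state is a
    frozenset of attacker propositions; post lists literals, where "!p"
    removes proposition p.  Returns None if the budget does not suffice."""
    if budget < action_cost:
        return None
    removed = {lit[1:] for lit in post if lit.startswith("!")}
    added = {lit for lit in post if not lit.startswith("!")}
    return frozenset((att_state - removed) | added), budget - action_cost

s0 = frozenset({"ctrl_inet", "avail_web"})
succ, rest = apply_outcome(s0, 2, 1, ("ctrl_web",))      # success outcome
crashed, _ = apply_outcome(s0, 2, 1, ("!avail_web",))    # crash outcome
```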
Viewing this state space as a Markov decision process (MDP), an attack for the network state is a solution to that MDP, i. e., a policy. A policy is a partial function from states to actions where (1) for every state on which the policy is defined, the selected action is applicable to that state; and (2) the policy is closed, i. e., it is defined for every state reachable under the policy from the initial state.
There are various objectives for MDP policies, i. e., notions of optimality, in the literature. For attack planning, arguably the most natural objective is success probability: the likelihood that the attack policy will reach a goal state.
Unfortunately, finding such an optimal policy is EXPTIME-complete in general [32]. Furthermore, recent experiments have shown that, even with very specific restrictions on the action model, finding an optimal policy for a penetration testing task is feasible only for small networks of up to 25 hosts [48]. For the sake of scalability, we thus focus on finding critical attack paths instead of entire policies.³

³ Similar approximations have been made in the attack-graph literature. Huang et al. [22], e.g., identify critical parts of the attack graph by analysing only a fraction thereof, in effect identifying only the most probable attacks.
Definition 4 (critical attack path)
A critical attack path in a given network state is a path within the state space that starts in the initial state, ends in a goal state, and maximizes the product of the outcome probabilities along the path among all paths from the initial state to any goal state.
In other words, a critical attack path is a sequence of actions whose success probability is maximal. We will also refer to such paths as optimal attack plans, or optimal attack action sequences. In contrast to policies, if any action within a critical attack path does not result in the desired outcome, we consider the attack to have failed. Critical attack paths are conservative approximations of optimal policies, i. e., the success probability of a critical attack path is a lower bound on the success probability of an optimal policy.
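Critical attack paths can be computed, e.g., by running Dijkstra's algorithm on edge costs −log p, so that a cheapest path maximizes the product of outcome probabilities along it. The graph below is a toy example, not the running example:

```python
import heapq, math

def critical_path(init, is_goal, successors):
    """Sketch: maximum-success-probability path (Definition 4) via Dijkstra
    on edge costs -log(p); successors(state) yields (action, p, next_state)
    triples.  A cheapest path then maximizes the probability product."""
    frontier = [(0.0, 0, init, [])]          # (cost, tiebreak, state, actions)
    best, tick = {init: 0.0}, 0
    while frontier:
        cost, _, state, path = heapq.heappop(frontier)
        if is_goal(state):
            return math.exp(-cost), path     # success probability, action list
        for action, p, nxt in successors(state):
            c = cost - math.log(p)
            if c < best.get(nxt, math.inf):
                best[nxt] = c
                tick += 1
                heapq.heappush(frontier, (c, tick, nxt, path + [action]))
    return 0.0, None                         # the goal is unreachable

# Toy graph: two exploits via "b" (0.8 * 0.5 = 0.4) beat a direct
# attack with success probability 0.3.
graph = {"a": [("e1", 0.8, "b"), ("direct", 0.3, "g")],
         "b": [("e2", 0.5, "g")],
         "g": []}
prob, path = critical_path("a", lambda s: s == "g", lambda s: graph[s])
```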
Example 4
Reconsider the outcomes of the action from Example 3. Assuming a reasonable set of attacker actions similar to the previous examples, no critical path will rely on the no-effect or the crash outcome, as otherwise the action would be redundant or even counterproductive. Thus the distinction between these two kinds of failures becomes unnecessary, which is reflected in the models we generate in Sections 7 and 8. The downside of considering only single paths instead of policies can be observed in the following example. Consider the case where a second action has outcomes similar to those of the first, with a considerably smaller crash probability at the price of a smaller success probability. Assuming that the affected host is the only host that can be used to reach the goal, an optimal policy might choose the second action over the first, since it can retry after a failed attempt without crashing the host, while a critical attack path will insist on the action with the maximal success probability.
5 Mitigation Analysis Formalism
Finding possible attacks, e. g., through a penetration testing task as defined above, is only the first step in securing a network. Once these are identified, the analyst or the operator needs to come up with a plan to mitigate or contain the identified weaknesses. This task can be formalized as follows.
Definition 5 (mitigation-analysis task)
Let a set of network propositions and a penetration testing task over them be given. A mitigation-analysis task is a triple consisting of:

- the initial network state,
- a finite set of fix-actions, and
- the mitigation budget.
The objective in solving such a task – the defender’s objective – will be to find dominant mitigation strategies within the budget, i. e., fix-action sequences that, at any given cost, reduce the attack probability as much as possible. We now specify this in detail.
Fix-actions encode modifications of the network, mitigating the attacks simulated through the penetration testing task.
Definition 6 (fix-actions)
Each fix-action is a triple consisting of a precondition and a postcondition, both conjunctions over network-proposition literals, and a fix-action cost.

We call a fix-action applicable to a network state if its precondition is satisfied in that state. The result of this application is the network state that contains all propositions occurring positively in the postcondition, and all propositions of the previous state whose negation is not contained in the postcondition.
Example 5
Removing a vulnerability by, e.g., applying a patch, is modelled as a fix-action whose postcondition removes the corresponding vulnerability proposition, at a certain cost. Another fix-action may represent adding a firewall between the DMZ and the internet (assuming it was not present before, which its precondition requires). It is much cheaper to add a rule to an existing firewall than to add a firewall, which can be represented by a similar fix-action that instead requires the firewall to be present in its precondition, at a lower cost.
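The application of fix-actions, as in Example 5, can be sketched as follows; the proposition names are illustrative, and `"!p"` again encodes a negated literal:

```python
def apply_fix(net_state, pre, post):
    """Apply a deterministic fix-action (Definition 6, sketched).  net_state
    is a frozenset of true network propositions; pre/post are literal tuples
    where "!p" means p must be false (in pre) or p is removed (in post)."""
    for lit in pre:                          # applicability check
        if lit.startswith("!"):
            if lit[1:] in net_state:
                return None
        elif lit not in net_state:
            return None
    removed = {l[1:] for l in post if l.startswith("!")}
    added = {l for l in post if not l.startswith("!")}
    return frozenset((net_state - removed) | added)

net = frozenset({"vuln_web", "conn_inet_dmz_443"})
patched = apply_fix(net, ("vuln_web",), ("!vuln_web",))   # apply a patch
with_fw = apply_fix(net, ("!fw_dmz",), ("fw_dmz",))       # install a firewall
```

Note that installing the firewall a second time is not applicable, matching the precondition discussed in Example 5.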
Note that, in contrast to attacker actions, fix-actions are deterministic. A sequence of fix-actions can be applied to a network in order to lower the success probability of an attacker.
Definition 7 (mitigation strategy)
A sequence of fix-actions is called a mitigation strategy if it is applicable to the initial network state and its application cost is within the available mitigation budget, where

- a sequence of fix-actions is applicable to a network state if its first action is applicable to that state and the remaining sequence is applicable to the resulting state, the resulting state of the full sequence being defined accordingly, and
- the application cost of a sequence is the sum of its fix-actions' costs.
To evaluate and compare different mitigation strategies, we consider their effect on the optimal attack. As discussed in the previous section, for the sake of scalability we use critical attack paths (optimal, i. e., maximum-success-probability attack-action sequences) to gauge this effect, rather than full optimal MDP policies. As attacker actions may contain a precondition on the network state, changing the network state affects the applicable attacker actions, and consequently the critical attack paths. To measure the impact of a mitigation strategy, we define its value to be the success probability of a critical attack path in the resulting network state, or 0 if there is no critical attack path (and thus no way in which the attacker can achieve its goal).
Definition 8 (dominance, solution)
Let two mitigation strategies be given. The first dominates the second if

- it lowers the attacker success probability strictly more at no higher cost, or
- it lowers the attacker success probability at least as much at a strictly lower cost.

The solution to the mitigation-analysis task is the Pareto frontier of mitigation strategies: the set of mitigation strategies that are not dominated by any other mitigation strategy.
In other words, we consider a mitigation strategy better than another one if it either reduces the probability of a successful attack on the network more, while not imposing a higher cost, or costs less while lowering the success probability of an attack at least by the same amount. The solution to our mitigation-analysis task is the set of dominant (non-dominated) mitigation strategies.
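The dominance test and the Pareto frontier of Definition 8 translate directly into code when each mitigation strategy is represented by its (attacker success probability, cost) pair; the candidate values below are illustrative:

```python
def dominates(a, b):
    """a dominates b per Definition 8, with strategies represented by
    (attacker success probability, cost) pairs."""
    (pa, ca), (pb, cb) = a, b
    return (pa < pb and ca <= cb) or (pa <= pb and ca < cb)

def pareto_frontier(strategies):
    """Keep exactly the strategies not dominated by any other one."""
    return [s for s in strategies
            if not any(dominates(t, s) for t in strategies if t is not s)]

candidates = [(0.4, 0),                       # do nothing
              (0.2, 3), (0.3, 5), (0.0, 8), (0.2, 4)]
frontier = pareto_frontier(candidates)
```

Here (0.3, 5) and (0.2, 4) are dominated by (0.2, 3); the remaining three strategies form the frontier.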
This is similar to Stackelberg games, in which a market leader moves first and is followed by one or more market followers, and thus optimises her strategy w.r.t. their best response. Stackelberg games in the two-player setting are frequently used in security settings [29].
It is easy to see that our notion of solutions is well-defined, in the following way:
Theorem 1
The solution to a mitigation-analysis task always exists, is non-empty, is unique, and is finite.
Proof.
As we assumed that all fix-actions have positive cost, it immediately follows that the empty mitigation strategy is not dominated by any other mitigation strategy, and hence the solution is non-empty. The solution is unique since it must contain exactly the non-dominated mitigation strategies. Coming to the last part, assume for contradiction that the solution is not finite. As the number of network propositions is finite, the number of network states is finite as well. Therefore, there must be a network state that is reached from the initial network state by infinitely many mitigation strategies in the solution. As all fix-actions have positive cost, there must in particular be two such strategies that reach the same network state (and thus yield the same attacker success probability) at different costs, i. e., the cheaper one dominates the other, in contradiction to the definition of the solution. ∎
6 Analysis Algorithms
We want to compute solutions with reasonable efficiency. We thus specify how we compute critical attack paths, and how we solve mitigation-analysis tasks as a whole.
6.1 Penetration Testing
We compute critical attack paths through a compilation from network penetration testing tasks to classical, deterministic planning formalisms. The latter can then be solved using standard algorithms for finding minimal-cost plans. This compilation comprises two steps. First, in order to get rid of stochastic action outcomes, we apply the all-outcome determinization (e. g. [53, 31]). That is, we create a deterministic action for every attacker action and stochastic outcome, where the deterministic action has the same precondition as the original action, and the postcondition of the respective outcome. Second, to make classical planning methods output attack sequences with the highest chance of success instead of minimal cost, we encode each outcome probability as an action cost via its negative logarithm (cf. [24]). The attack resulting from the classical planning method is then guaranteed to have minimal summed cost, hence the product of outcome probabilities along it must be maximal, i. e., it is a critical attack path.
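The two compilation steps can be sketched as follows, assuming actions are given as (name, outcomes) pairs; the input action is illustrative:

```python
import math

def determinize(actions):
    """All-outcome determinization with -log(p) action costs (sketch).
    actions: iterable of (name, outcomes), each outcome a (prob, post)
    pair.  Every stochastic outcome becomes one deterministic action;
    minimizing summed costs then maximizes the probability product."""
    det = []
    for name, outcomes in actions:
        for i, (p, post) in enumerate(outcomes):
            det.append((f"{name}_o{i}", -math.log(p), post))
    return det

det = determinize([("exploit_web", [(0.8, ("ctrl_web",)),
                                    (0.2, ())])])
```

Since −log(p₁) − log(p₂) = −log(p₁·p₂), a plan of minimal summed cost in the determinized task corresponds to a maximal probability product in the original one.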
Given this encoding, the attack-planning problem can be solved with standard planning algorithms and tools. The state of the art consists in heuristic search methods [39], which employ search guidance through heuristic functions – functions mapping states to estimates of cost-to-goal – to find optimal solutions quickly. In our implementation, we use an extension of the FD system [16], with the LM-cut heuristic function [17].

6.2 Mitigation Analysis
In this section, we formally define the mitigation-analysis algorithm used and the pruning techniques employed to improve performance. Finally, we show that these techniques are correct.
We compute the solution to a given mitigation-analysis task, i.e., the Pareto frontier w.r.t. Definition 8, using a depth-oriented tree search algorithm. While a naïve implementation needs to consider every sequence of fix-actions for inclusion in the global Pareto frontier, often enough it is sufficient to consider subsets instead, as most fix-actions are commutative and the analysis is thus invariant w.r.t. permutations. This is particularly relevant for attack mitigation, as fixes are often local and independent. However, commutativity is not always given; consider, e.g., the firewall rules discussed in Example 5: as a firewall needs to be acquired before firewall rules can be added cheaply, an ordering constraint arises, which we formalize in the notion of commutativity. We define commutativity on top of interference [52], which we will also need later on.
Definition 9 (interference, commutativity)
Let a mitigation-analysis task be given, and consider two of its fix-actions.

- The first action disables the second if some literal in the postcondition of the first occurs negated in the precondition of the second.
- The two actions conflict if some literal in the postcondition of one occurs negated in the postcondition of the other.
- The two actions interfere if they conflict or either disables the other.
- The first action enables the second if some literal in the postcondition of the first occurs in the precondition of the second.
- The two actions are commutative if they do not interfere and neither enables the other.
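A sketch of the interference and commutativity tests, with fix-actions represented as (precondition, postcondition) literal tuples; the firewall and patch actions are illustrative, mirroring Example 5:

```python
def _neg(lit):
    return lit[1:] if lit.startswith("!") else "!" + lit

def disables(a, b):
    """a disables b: some literal in post(a) occurs negated in pre(b).
    Fix-actions are (pre, post) tuples; names below are illustrative."""
    return any(_neg(l) in b[0] for l in a[1])

def conflict(a, b):
    """Some literal in post(a) occurs negated in post(b)."""
    return any(_neg(l) in b[1] for l in a[1])

def interfere(a, b):
    return conflict(a, b) or disables(a, b) or disables(b, a)

def enables(a, b):
    """Some literal in post(a) occurs in pre(b)."""
    return any(l in b[0] for l in a[1])

def commutative(a, b):
    return not interfere(a, b) and not enables(a, b) and not enables(b, a)

add_fw = ((), ("fw_dmz",))                # installs the firewall
add_rule = (("fw_dmz",), ("block_443",))  # cheap rule, needs the firewall
patch = (("vuln_web",), ("!vuln_web",))   # independent local fix
```

Here installing the firewall enables the cheap rule, so the two are not commutative, while the patch is commutative with both.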
The interference and commutativity relations on fix-actions can both be computed up front. To avoid considering permutations of commutative actions, we apply a transition-reduction technique based on so-called sleep sets [15, 52]. A sleep set for a fix-action sequence is a set of actions that are applicable after the sequence but skipped during search. When expanding successor actions for a sequence, we only consider applicable actions outside its sleep set. Let these actions be ordered in the same way as they are considered by the search algorithm. For the successor path extending the sequence by the i-th of these actions, the new sleep set then consists of the previous sleep set together with the first i−1 actions, minus all actions interfering with the chosen action.
We globally maintain a) the current bound for the cost of lowering the attacker success probability to zero, in order to prune sequences that are dominated from the start; b) a map from network states to cheapest fix-action sequences, in order to prune cases where a fix-action sequence has reached a network state in a cheaper way before; and c) a map from network states to optimal attack-action sequences, in order to spare the search for an attack-action sequence if we have already saved one.
is always equal to the cost of the cheapest fixaction sequence found so far which leads to a state with zero attacker success probability, i.e. such that . Any fixaction sequence with higher cost is dominated by definition and can thus be safely pruned.
maps each already considered network state to the cheapest fixaction sequence reaching this state found so far. If is defined in the current network state and , we can stop right away and prune as well as all successors, as is more expensive than the already known sequence leading to the same network state. Even if this is not the case, but is defined, we can save an additional search in the attacker spate space.
maps each already considered network state to the computed optimal attack action sequence, i.e., if leading to was considered before, we store the corresponding optimal attack plan , . We can similarly also make use of the optimal attack plan for the parent state of . This can be done by letting be , then computing the parent state and afterwards once again using the map: . Having the parent attack plan is useful, because we can also spare an additional search in the attacker state space if is still applicable to the state space induced by the current network state .
The mitigation-analysis algorithm ParetoFrontier (Figure 3) expects as arguments a network state N, the corresponding fix-action sequence σ leading to N, the sleep set for σ, and the mitigation budget b. ParetoFrontier explores the space of applicable fix sequences in an iterative deepening search (IDS) manner as described in Figure 2. This means we keep executing ParetoFrontier with an increasing mitigation budget b until a termination criterion is satisfied. We initialize b to the cost of the most expensive fix-action. We maintain the global Boolean flag cut_off to indicate in ParetoFrontier that the search was cut off because of a too low budget. The Pareto frontier PF under construction is initially empty, and B is initially equal to ∞. In each iteration, b is increased by multiplying it with a factor γ > 1. The IDS terminates if one of the following conditions holds: 1) we have already found a state with zero attacker success probability within the budget, i.e., B ≤ b, 2) during the last call to ParetoFrontier, the search was not cut off because of a too low budget, or 3) we already tried the maximal budget b_max.
procedure IDSParetoFrontier()
1:  global: B, S, Π, PF, b, cut_off
2:  B := ∞; PF := ∅; b := max_{a ∈ A_F} c(a);
3:  loop
4:  cut_off := false;
5:  call ParetoFrontier(N_0, ⟨⟩, ∅, b);
6:  if B ≤ b then return; endif
7:  if not cut_off then return; endif
8:  if b = b_max then return; endif
9:  b := min(γ · b, b_max);
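The driver loop of Figure 2 can be sketched as follows. This is a minimal illustration, not the paper's implementation: `search` is a stand-in for a bounded ParetoFrontier call, assumed to return the current bound B and the cut_off flag; all names are ours.

```python
def ids_pareto_frontier(search, max_cost_action, b_max, gamma=2.0):
    """Iterative-deepening driver: rerun the budget-bounded search with a
    growing mitigation budget until a termination criterion holds.

    search(budget) -> (B, cut_off): B is the cheapest cost found so far that
    lowers the attacker success probability to zero (infinity if none was
    found), cut_off says whether some branch was pruned only by the budget.
    """
    b = max_cost_action            # start with the most expensive single fix
    while True:
        B, cut_off = search(b)
        if B <= b:                 # 1) zero-probability state found within budget
            return B
        if not cut_off:            # 2) budget was never the limiting factor
            return B
        if b >= b_max:             # 3) maximal budget already tried
            return B
        b = min(gamma * b, b_max)  # deepen: multiply budget by factor gamma
```

For instance, if the cheapest mitigation achieving zero attacker probability costs 5 and the most expensive single fix costs 2, the loop tries budgets 2, 4, 8 and terminates at 8 by criterion 1.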
procedure ParetoFrontier(N, σ, sleep, b)
1:  let N′ be the parent of N w.r.t. σ;
2:  π′ := Π(N′);
3:  if π′ is applicable in N and achieves the attacker goal then
4:  π := π′;
5:  p := p(π′);
6:  else if S(N) and Π(N) are defined then
7:  π := Π(N);
8:  p := p(π);
9:  else
10:  compute p := p_N and a corresponding optimal attack plan π;
11:  endif
12:  if σ is not dominated by any sequence in PF then
13:  remove all σ′ ∈ PF dominated by σ;
14:  add σ to PF;
15:  endif
16:  S(N) := σ;
17:  Π(N) := π;
18:  if p = 0 then
19:  B := c(σ);
20:  return;
21:  endif
22:  A := ⟨a_1, …, a_n⟩, the fix-actions applicable in N, in search order;
23:  for a_i ∈ A do
24:  σ_i := σ · a_i;
25:  if c(σ_i) ≥ B then continue; endif
26:  if a_i ∈ sleep then continue; endif
27:  N_i := a_i(N);
28:  if S(N_i) is defined then
29:  if c(S(N_i)) ≤ c(σ_i) then continue; endif; endif
30:  if c(σ_i) > b then
31:  cut_off := true;
32:  continue;
33:  endif
34:  sleep_i := (sleep ∪ {a_1, …, a_{i−1}}) ∖ interf(a_i);
35:  call ParetoFrontier(N_i, σ_i, sleep_i, b);
36:  endfor
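Lines 12–15 of ParetoFrontier maintain the frontier under dominance. A minimal sketch of that update, assuming each entry is a (cost, attacker-probability) pair where lower is better in both dimensions (representation and names are ours):

```python
def dominates(x, y):
    """(cost, prob) pair x dominates y: at least as good in both dimensions,
    strictly better in at least one."""
    return x[0] <= y[0] and x[1] <= y[1] and x != y

def update_frontier(frontier, candidate):
    """Add `candidate` unless it is dominated by (or equal to) an existing
    entry; remove every entry the candidate dominates. Returns a new list."""
    if any(dominates(e, candidate) or e == candidate for e in frontier):
        return frontier
    return [e for e in frontier if not dominates(candidate, e)] + [candidate]
```

A cheaper-but-weaker and a pricier-but-stronger mitigation coexist on the frontier; a new entry that is at least as good in both dimensions evicts them.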
Theorem 2
IDSParetoFrontier always terminates and computes PF such that it is equal to the Pareto frontier of the given mitigation-analysis task, modulo permutations of commutative fix-actions.
Proof.
We first argue for the correctness of ParetoFrontier and lastly for IDSParetoFrontier itself.
The plain ParetoFrontier algorithm without any of the optimizations would do the following: enumerate all fix-action sequences σ within the budget, compute for every sequence the corresponding network state N, check whether N was already seen, compute p_N if this is not the case, and finally store σ in PF unless it is dominated by another sequence in PF. The plain algorithm indeed terminates because of the duplicate check and the finiteness of the network state space, which we already argued in the proof of Theorem 1. Finally, it computes PF such that it is equal to the Pareto frontier, because all fix-action sequences within the budget (modulo duplicate states) are checked. It remains to explain why the optimizations preserve this fact. We argue for each optimization step by step.

Checking the applicability of π′ in lines 2–5: Consider a fix-action sequence σ leading to network state N with parent network state N′ and attack plan π′ for the state space of N′. We can get π′ from the map Π, which is always correctly set in line 17. If π′ is still applicable and leads to a goal state, which is checked in line 3, then π′ is a) either an optimal plan for N, i.e., p(π′) = p_N, or b) there is another plan optimal for N with higher success probability than π′. Everything is fine in case a). In case b), π′ rather gives us a lower bound for p_N, i.e., p(π′) ≤ p_N, which is also fine. The reason is that all we want to know is whether we can add σ to PF, which we only could if σ is not dominated w.r.t. p_N.

Taking π from Π(N) and S(N), and not further considering σ if c(S(N)) ≤ c(σ), in lines 6–8 and 28–29: it is clearly not necessary to compute p_N again if we have done so already for exactly the same network state N. It is further safe to prune σ if c(S(N)) ≤ c(σ), because for all fix-action suffixes σ′′, the sequence σσ′′ will always be dominated by S(N)σ′′ if c(S(N)) ≤ c(σ).

Pruning fix-action sequences σ such that c(σ) ≥ B in line 25: this can be done because σ is clearly dominated by another sequence already in PF with zero attacker success probability, by positivity of fix-action costs. B is only assigned in line 19, and only if p = 0 and c(σ) < B. The latter is in turn enforced by the check in line 25 itself.

Not considering permutations of commutative fix-actions by applying the sleep set method in lines 26 and 34: we can easily derive from Definition 9 that commutativity of a and a′ implies a(a′(N)) = a′(a(N)) for all network states N. With the sleep set method, we enforce that only one ordering of a and a′ is considered if they are both applicable in a state. Let a = a_i and a′ = a_j with i < j in the successor ordering; then we first call ParetoFrontier for σ·a_i, where a_j is not in the sleep set and can thus still be considered in the recursion. Later, we call ParetoFrontier for σ·a_j, where a_i is in the sleep set and not considered in the recursion, such that we effectively only consider action sequences in which a_i is ordered before a_j. This preserves Pareto optimality because a_i and a_j are commutative. As a side remark: because we prune permutations of commutative fix-actions, the resulting Pareto frontier can contain at most one permutation of every subset of A_F, even though the permutations do not dominate each other. That is why we have stated "modulo permutations of commutative fix-actions" in Theorem 2.

Calling ParetoFrontier in an IDS manner in Figure 2: we observe that ParetoFrontier is always called with the initial network state N_0 and the empty fix-action sequence in IDSParetoFrontier, and recursive calls are constructed in lines 24 and 27 such that effectively, for all calls to ParetoFrontier, σ and N have the relation N = σ(N_0). The correctness of the sleep sets is established by calling ParetoFrontier with the empty sleep set in IDSParetoFrontier and the correctness of the construction in line 34. The only remaining problem with the IDS approach could be that we terminate the loop in IDSParetoFrontier even though PF is not yet complete, or that we do not terminate at all. In fact, it does not matter with which budgets ParetoFrontier is exactly called, as long as it is called with a budget large enough such that PF is complete. We will argue why we only terminate if this is the case.
It is safe to terminate as soon as B ≤ b, since increasing b can never result in finding another sequence σ with p = 0 and c(σ) < B. Further, increasing b does not change anything if there was not a single fix-action left unconsidered because of a too low budget, i.e., if cut_off = false. Lastly, b cannot be increased if it is already equal to b_max.
IDSParetoFrontier guarantees termination because b is increased in every iteration and will thus eventually be equal to b_max. In case b = b_max, the algorithm will eventually come to a point where all reachable network states have been expanded, line 29 in Figure 3 fires for all applicable fix-actions, cut_off remains false, and finally the loop in IDSParetoFrontier terminates.
∎
6.3 Strong Stubborn Sets for Mitigation Analysis
The number of applicable fix-actions A branched over in a given network state N, cf. line 22 of Figure 3, is a critical scaling parameter, as it is the branching factor in a tree search (over fix-action sequences) where each node in the search tree contains another worst-case exponential tree search for an optimal attack plan. It is therefore highly relevant to reduce A as much as possible. We next observe that, to this end, one can adapt a prominent pruning technique called strong stubborn sets (SSS), which allows the search to consider only a subset of the applicable actions in each state. SSS were invented in verification and later adapted to AI planning [51, 52]; their known variants are limited to single-agent search, like the attack planning in our setting, i.e., move-countermove setups were not considered. We provide an extension to such a setup – our setup of fix-action planning vs. attack-action planning – here. Our key observation is that, where standard SSS notions identify the considered subset through actions contributing to achieving the goal, we can here identify it through a subset of actions contributing to invalidating the current critical attack path.
To lower the success probability of a critical attack path, it is necessary to remove at least some precondition of one of its actions. In each execution of ParetoFrontier for network state N, we have a critical path π. Based on this, we can define a 'relevant' set of propositions R, which is the set of negated propositions preconditioned in π, i.e., R = {¬p | p ∈ pre(a) for some action a in π}. Relevant fix-actions then are ones helping to render π non-applicable; specifically, we define L_π as the set of those fix-actions that have an element from R in the postcondition. In line with previous AI planning terminology [52], we call L_π a disjunctive action landmark: a set of fix-actions so that every applicable fix-action sequence that starts in N and ends in a state where π is no longer applicable contains at least one action from L_π. Intuitively, a disjunctive action landmark is a set of actions at least one of which must be used to invalidate π.
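Collecting the landmark L_π is a straightforward scan over the critical path. A hedged sketch, assuming attack actions are dictionaries with a "pre" proposition set and fix-actions are given by the propositions they delete (this encoding is ours, not the paper's):

```python
def action_landmark(critical_path, fix_actions):
    """Return the disjunctive action landmark L_pi: the fix-actions whose
    postcondition negates some precondition of an action on the critical path.

    critical_path: attack actions, each a dict with a set of preconditions "pre".
    fix_actions: mapping fix name -> set of propositions the fix deletes.
    """
    # R corresponds to the negations of all propositions preconditioned in pi
    relevant = set()
    for attack_action in critical_path:
        relevant |= attack_action["pre"]
    # L_pi: fixes deleting at least one relevant proposition
    return {name for name, deletes in fix_actions.items() if deletes & relevant}
```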
Now, towards identifying a subset of applicable fix-actions to branch over in a network state N that suffices for Pareto optimality, using only L_π would be insufficient. This is because it is possible that no action from L_π is actually applicable in N, so we must first enable such an action. For this purpose, we define the notion of a necessary enabling set N_a for a fix-action a not applicable in N, as the set of fix-actions achieving a precondition p of a not true in N, i.e., N_a = {a′ ∈ A_F | p ∈ post(a′)}.
Finally, for the definition of SSS, remember the notion of interference from Definition 9 and that interf(a) is the set of fix-actions with which a interferes. We must also include interfering fix-actions in the set of fix-actions considered, because interfering actions represent alternative exclusive choices that the search needs to branch over.
Definition 10 (strong stubborn set [52])
Let a mitigation-analysis task with network propositions P and fix-actions A_F be given, let N be a network state, and let π be a critical attack path. A strong stubborn set (SSS) in N is an action set A_S ⊆ A_F such that:

A_S contains a disjunctive action landmark for π in N.

For each a ∈ A_S applicable in N, we have interf(a) ⊆ A_S.

For each a ∈ A_S not applicable in N, we have N_a ⊆ A_S for a necessary enabling set N_a of a in N.
The SSS computation algorithm in Figure 4 starts with the disjunctive action landmark L_π and adds actions to the candidate set until conditions 2 and 3 are satisfied. Hence, the algorithm indeed computes an SSS. It is called in ParetoFrontier in line 22 before iterating over the applicable operators A. Given the SSS A_S, it is sufficient for the algorithm to iterate solely over the operators in A ∩ A_S instead of the complete set A, while preserving the Pareto optimality of the algorithm. This statement is formally proven in Theorem 3.
procedure ComputeStubbornSet(N, π)
1:  A_S := L_π; /* L_π for some disj. action landmark */
2:  repeat
3:  for all a ∈ A_S do
4:  if a is applicable in N then
5:  A_S := A_S ∪ interf(a);
6:  else /* N_a for some nec. enabling set */
7:  A_S := A_S ∪ N_a;
8:  until A_S reaches a fix point
9:  return A_S
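The fixpoint computation of Figure 4 can be sketched as follows. This is an illustrative version under our own encoding: actions are names, and the interference and necessary-enabling-set relations are passed in as precomputed maps.

```python
def compute_stubborn_set(landmark, applicable, interferes, enabling_set):
    """Fixpoint computation in the style of Figure 4.

    landmark: disjunctive action landmark for the critical path (seed set).
    applicable: set of fix-actions applicable in the current network state N.
    interferes: maps an action a to interf(a), the set of interfering actions.
    enabling_set: maps an inapplicable action a to a necessary enabling set N_a.
    """
    stubborn = set(landmark)
    changed = True
    while changed:                  # repeat until a fix point is reached
        changed = False
        for a in list(stubborn):
            # applicable members pull in their interfering actions,
            # inapplicable members pull in a necessary enabling set
            new = interferes[a] if a in applicable else enabling_set[a]
            if not new <= stubborn:
                stubborn |= new
                changed = True
    return stubborn
```

Revisiting Example 5's flavor: if the landmark contains an inapplicable rule-adding fix whose necessary enabling set is a firewall-acquisition fix, the fixpoint pulls the latter in, so the search branches over both.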
Theorem 3
Using only A ∩ A_S instead of A in line 22 of ParetoFrontier preserves Theorem 2.
Proof.
For any state N reached by fix-action sequence σ with critical path π = Π(N), non-empty fix-action suffixes σ′′ which do not invalidate the attacker plan π need not be considered, as π would still be a plan for the resulting state, but (by positivity of fix-action costs) σσ′′ would be dominated by σ. We thus show that, for all states N from which a cheapest fix-action sequence leading to a state where π is not applicable anymore exists, A ∩ A_S contains an action starting such a sequence. A simple induction then shows that ParetoFrontier restricted to A ∩ A_S is Pareto optimal. The rest of the proof follows that of [1, Theorem 1].
Let A_S be an SSS computed by the algorithm in Figure 4 and σ = a_1 … a_n be a cheapest sequence invalidating π. Since A_S contains a disjunctive action landmark for the propositions preconditioned by π, σ contains an action from A_S. Let a_k be the action with smallest index in σ that is also contained in A_S, i.e., a_k ∈ A_S and a_1, …, a_{k−1} ∉ A_S. Then:

a_k is applicable in N: otherwise, by definition of SSS, a necessary enabling set for a_k would have to be contained in A_S, and some action from this set would have to occur before a_k in σ to enable a_k, contradicting that a_k was chosen with the smallest index.

a_k is independent of a_1, …, a_{k−1}: otherwise, using the applicability of a_k and the definition of SSS, at least one of a_1, …, a_{k−1} would have to be contained in A_S, again contradicting the assumption.
Hence, we can move a_k to the front: a_k a_1 … a_{k−1} a_{k+1} … a_n is also a sequence invalidating π. It has the same cost as σ and is hence a cheapest such sequence. Thus, we have found a cheapest fix plan of length n started by an action a_k ∈ A ∩ A_S, completing the proof. ∎
7 Practical Model Acquisition
The formalism and algorithm introduced in the previous sections encompass a broad range of network models. In this section, we describe a highly automated approach to acquiring a particular form of such network models in practice, demonstrating our method to be readily applicable. The general workflow is similar to MulVAL [38], which integrates machine-readable vulnerability descriptions and reports from network vulnerability scanners such as Nessus to derive a simple logical model specified in Datalog. Our workflow follows the same idea, but in addition we incorporate possible mitigation actions described in a concise and general schema. Moreover, our formalism considers the probabilistic/uncertain nature of exploits.⁴

⁴ Code is available at http://upload.soundadl.bplaced.net/whatif.zip