Network interdiction is typically posed as a one- or multi-stage decision-making problem in a network with two decision-makers, an interdictor and an evader. The evader traverses the network (e.g., between two fixed nodes, a source and a destination), while the interdictor aims to disrupt the evader’s movement to the maximum possible extent (or to stop it completely). Network interdiction forms a rather broad class of deterministic and stochastic optimization problems, with applications arising mostly in military, law-enforcement and infectious disease control contexts; see the surveys in smith2008algorithms ; smith2013modern and the references therein.
Perhaps the most well-known and well-studied network interdiction problem is the shortest path interdiction problem 5 , where the interdictor seeks a set of arcs whose removal, subject to some budgetary constraint, maximizes the cost of the shortest path between two specified nodes. That is, the evader is assumed to move along the shortest path between these nodes. If the interdictor is allowed to block at most $k$ arcs, then this problem is often referred to as the $k$-most vital arcs problem 11 . The shortest path interdiction problem is known to be NP-hard 11 . Nevertheless, rather effective exact solution approaches for various versions of the network interdiction problem exist in the literature; see, e.g., 5 ; sullivan2014exact .
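To make the $k$-most vital arcs problem concrete, the following is a minimal brute-force sketch rather than one of the exact approaches cited above. It assumes a graph stored as a dictionary from arcs to nonnegative costs; the function names `shortest_path_cost` and `most_vital_arcs` and the toy instance are illustrative assumptions. Enumeration is exponential in the budget and is only meant for tiny instances.

```python
import heapq
from itertools import combinations


def shortest_path_cost(arcs, source, target):
    """Dijkstra over a dict {(u, v): cost}; returns inf if target is unreachable."""
    adj = {}
    for (u, v), c in arcs.items():
        adj.setdefault(u, []).append((v, c))
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == target:
            return d
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry
        for v, c in adj.get(u, []):
            nd = d + c
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return float("inf")


def most_vital_arcs(arcs, source, target, k):
    """Try every arc subset of size at most k and keep the first best one found."""
    best_set, best_cost = (), shortest_path_cost(arcs, source, target)
    for size in range(1, k + 1):
        for blocked in combinations(sorted(arcs), size):
            remaining = {a: c for a, c in arcs.items() if a not in blocked}
            cost = shortest_path_cost(remaining, source, target)
            if cost > best_cost:
                best_set, best_cost = blocked, cost
    return best_set, best_cost


# Hypothetical toy instance: three s-t paths with costs 2, 4 and 5.
arcs = {("s", "a"): 1, ("a", "t"): 1, ("s", "b"): 2, ("b", "t"): 2, ("s", "t"): 5}
blocked, cost = most_vital_arcs(arcs, "s", "t", k=1)
```

On this toy instance, blocking a single arc of the cheapest path forces the evader onto the path of cost 4, and a budget of two arcs forces her onto the direct arc of cost 5.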
A number of studies consider the network interdiction problem in stochastic settings cormican1998stochastic ; 3 ; janjarassuk2008reformulation ; 9 , where either the outcomes of interdiction actions are uncertain, or there is uncertainty with respect to the evader’s actions. An interesting dynamic deterministic version of the problem has been recently considered in sefair2016dynamic , where the evader can dynamically adjust her movement at every node of her path in the network by observing the interdictor’s actions. (Note that in the remainder of the paper we refer to the interdictor and the evader as “he/his” and “she/her,” respectively.) The interdictor, in turn, can interdict arcs any time the evader arrives at a node in the network. It is also assumed that the interdictor has a limited interdiction budget.
The current study is motivated by the recent work of Borrero et al. 1 . Specifically, in their model the interdictor and the evader interact sequentially over multiple time periods. In each time period, the interdictor can block at most $k$ arcs only for the duration of the current period, while the evader is assumed to be greedy, i.e., in each time period the evader traverses the shortest path between two fixed nodes in the interdicted network.
The key feature of the model in 1 is that the interdictor has incomplete initial information about the network, including its structure and costs, but learns about the network structure and arc costs by observing the evader’s actions. In particular, it is assumed that the feedback is deterministic and perfect, that is, the interdictor learns about the existence and the exact costs of the arcs used by the evader in the previous time periods. The quality of an interdiction policy is measured using either cumulative regret or time stability. The former is defined as the difference between the total cost incurred by the evader under the current interdiction policy and the total cost incurred under the policy of an oracle interdictor with prior complete knowledge of the network. Clearly, the oracle interdictor implements an optimal solution of the $k$-most vital arcs problem in each time period. Time stability is defined as the number of time periods that are necessary for the policy to gain a sufficient amount of information in order to implement a solution of the $k$-most vital arcs problem that is also optimal in the full information network for the remainder of the time horizon.
The main results of 1 can be summarized as follows. First, it is shown that for their deterministic setting, in general, there do not exist policies that perform better than any other policy for any graph consistent with the initial information available to the interdictor. Thus, the focus of 1 is on the greedy and robust interdiction policies. They are greedy because they block a set of the $k$-most vital arcs from the network known to the interdictor in every time period; they are robust because whenever the exact cost of an arc is not known to the interdictor, the policies assume the worst-case scenario for the evader. These policies turn out to be “efficient” in the following sense: (i) they eventually find and maintain an optimal solution to the $k$-most vital arcs problem in the full information network (i.e., an optimal solution of the oracle interdictor) within a finite number of time periods (possibly instance dependent); and (ii) this class of policies is not dominated, that is, for any possible instance of the initial information available to the interdictor and any policy that is not greedy and robust, there exists a greedy and robust policy that is strictly better (with respect to either the cumulative regret or time stability) than the aforementioned non-greedy and/or non-robust policy for some graph that is consistent with the initial information available to the interdictor.
Property (i) of greedy and robust policies also implies that they have a finite regret. Furthermore, these policies detect when the instantaneous regret reaches zero in real time, i.e., when an optimal solution of the oracle interdictor is found. Finally, in addition to these attractive theoretical properties, the results of computational experiments in 1 also confirm the superiority of greedy and robust interdiction policies against several other benchmark policies.
Note that in 1 , similar to the vast majority of the related interdiction literature, the authors focus on the interdictor’s perspective. However, given that the outlined greedy and robust interdiction policies are very simple to implement (which is important for their practical applicability) and have attractive theoretical properties, it is natural to explore the evader’s perspective. In particular, if we assume that the interdictor is greedy and robust, what are good strategic policies for the evader against greedy and robust interdiction policies, and how can they be constructed? These research questions form the main focus of the current study.
In particular, we consider a deterministic setting similar to that of 1 with an additional modification. Specifically, with respect to the initial information available to the interdictor, in 1 it is assumed that for some arcs the interdictor is aware only of some upper and lower bounds on their costs. In contrast, we assume that whenever the existence of an arc is known to the interdictor initially, then its cost is also known, which, in fact, is a less favorable scenario for the evader, whose perspective we consider. Furthermore, recall that in the considered setting the feedback is perfect, that is, whenever an arc is used by the evader, its existence and precise cost are revealed to the interdictor. Therefore, under our rather mild assumption on the initial information available to the interdictor, the greedy and robust interdiction policies of 1 reduce to simply the greedy ones.
Our formulation of the evader’s problem can be viewed as a particular case of the online combinatorial optimization problem with sleeping experts 14 , which is a generalization of the classical multi-armed bandit formulation (see, e.g., 16 ). The problem with stochastic loss functions and adversarial availability of actions is discussed in 20 ; 15 . The evader’s problem coincides with the online adversarial shortest path problem 20 . Nevertheless, the uniform mixing assumption of 20 does not hold for deterministic strategies, and the notion of regret (for example, per-action regret 14 ; 17 ) compares an arbitrary policy with a policy that uses the best-ranked action in hindsight, i.e., with the greedy evasion policy. A distinctive feature of our setting is that the evader’s actions determine the information collected by the interdictor and, thus, influence the structure of the setting.
In view of the discussion above, the contribution of this paper (and its remaining structure) can be summarized as follows:
In Section 2 we formulate the evasion model under the assumption that the interdictor follows a greedy interdiction policy. In the deterministic setting with perfect feedback, this model can be viewed as a sequential bilevel combinatorial optimization problem.
In Section 3, we show that the evader’s problem is NP-hard even for the setting with two time periods as long as there are no restrictions on the initial information available to the interdictor. Note that if there is only one time period, then under our assumptions, further discussed in Section 2, the evader’s problem coincides with the shortest path problem.
In Section 4, we provide a theoretical analysis of evasion policies in the setting with two time periods, where the interdictor has no initial information about the network arcs. We show that under some mild assumption, it is optimal for the evader either to follow a greedy policy or to seek two arc-disjoint paths with minimal total cost. The latter problem is known to be polynomially solvable, which implies that if the interdictor has no initial information about the network arcs, then the evader’s problem is polynomially solvable for two time periods.
In Section 5, we exploit these theoretical properties to develop a heuristic algorithm for the strategic evader in a more general setting with an arbitrary time horizon and no restrictions on the initial information available to the interdictor. In Section 6, we perform computational experiments that demonstrate that the proposed heuristic consistently outperforms the greedy evasion policy on several classes of synthetic network instances.
Finally, Section 7 concludes the paper and outlines promising directions for future research.
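The problem of finding two arc-disjoint paths with minimal total cost, mentioned in the outline of Section 4, can be illustrated with a short sketch. The code below is not taken from the paper: it assumes a dictionary-based graph without parallel arcs and uses a textbook successive-shortest-path scheme on a unit-capacity residual network; the function name `two_disjoint_paths_cost` and the toy instance are hypothetical.

```python
def two_disjoint_paths_cost(arcs, source, target):
    """Minimal total cost of two arc-disjoint source-target paths.

    arcs: dict {(u, v): cost} with nonnegative costs and no parallel arcs.
    Returns (total_cost, set_of_used_arcs), or None if two such paths do not exist.
    Two units of flow are sent through unit-capacity arcs, each along a cheapest
    path in the residual graph, which is found with Bellman-Ford.
    """
    nodes = {n for a in arcs for n in a}
    flow = set()  # arcs currently carrying one unit of flow
    total = 0
    for _unit in range(2):
        # Unused arcs go forward at cost c; used arcs go backward at cost -c.
        residual = []
        for (u, v), c in arcs.items():
            if (u, v) in flow:
                residual.append((v, u, -c, (u, v)))
            else:
                residual.append((u, v, c, (u, v)))
        dist = {n: float("inf") for n in nodes}
        dist[source] = 0
        pred = {}
        for _sweep in range(len(nodes) - 1):
            changed = False
            for u, v, c, arc in residual:
                if dist[u] + c < dist[v]:
                    dist[v] = dist[u] + c
                    pred[v] = (u, arc)
                    changed = True
            if not changed:
                break
        if dist[target] == float("inf"):
            return None
        total += dist[target]
        node = target
        while node != source:
            prev, arc = pred[node]
            # Toggling: a backward traversal cancels flow on a used arc.
            flow.symmetric_difference_update({arc})
            node = prev
    return total, flow


# Hypothetical instance: the shortest path s-a-b-t (cost 3) must be abandoned,
# since the cheapest disjoint pair is s-a-t and s-b-t with total cost 8.
arcs = {("s", "a"): 1, ("a", "b"): 1, ("b", "t"): 1, ("s", "b"): 3, ("a", "t"): 3}
result = two_disjoint_paths_cost(arcs, "s", "t")
```

The toy instance also shows why taking the shortest path first and then the shortest path among the remaining arcs is not enough: after removing s-a-b-t, no second path is left at all.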
Notation. In the remainder of the paper we use the following notation. Let be a connected weighted directed graph, where and denote its sets of nodes and directed arcs, respectively. Denote by a nonnegative arc cost associated with each arc . For we define a subgraph of induced by this subset of arcs as .
We assume that there are
time periods (epochs), where. In every time period, the evader traverses between two specified nodes in , which are referred to as the source and destination nodes and denoted by and , respectively. Let be a set of all simple directed paths from to in . Hence, path is given by a sequence of arcs , which we denote by for convenience. Then is the cost of path , that is . Define as the cost of the shortest path from to in , i.e., .
2 Mathematical Model
We consider a sequential decision-making process, where at each time an evader and an interdictor interact. The evader has full information about the underlying network, while the interdictor has limited information about its structure and costs. In particular, we assume that the interdictor initially observes a subnetwork of the given network , i.e., he is informed only about the existence of arcs in along with their costs . Let
where we refer to as the initial information available to the interdictor, as it contains his initial knowledge about the structure and costs of the network. We assume that the set of nodes is known to the interdictor upfront including the evader’s source and destination nodes and , respectively.
Then in each time period the following sequence of events takes place:
The interdictor chooses a set of at most $k$ arcs to be blocked for exactly one period.
The evader traverses along path . We refer to as the evader’s instantaneous loss. The evader also reveals the arcs in and their costs to the interdictor.
The interdictor updates the information available to him, i.e., .
We make several assumptions with respect to the construction of the evader’s problem:
A1. In every time period the interdictor acts first. Furthermore, the interdictor is greedy in the sense that he always blocks a set of $k$-most vital arcs in the observed network, i.e.,
A2. The network is not trivially -separable, i.e., any subset of arcs in is not an - cut.
A3. If there is more than one possible choice for , then the interdictor blocks arcs following a well-defined deterministic rule, which is consistent in the sense that if is chosen from a collection of blocking solutions , then it is also chosen from any collection of solutions containing .
A4. The evader has full information about the graph’s structure and costs. The evader observes the interdictor’s actions before choosing a path and cannot use interdicted arcs.
A5. The interdictor is initially given information only about subnetwork . Each time period he observes path and cost of each arc used by the evader, i.e., , .
Assumption A1 is motivated by the fact that the interdictor has only limited information about the network. As we discuss in detail in Section 1, the study in 1 establishes a number of attractive and practically relevant features of greedy interdiction policies. Assumption A2 is technical as it ensures that the evader’s problem is feasible at every time period.
Assumption A3 implies that the interdictor’s policies are deterministic. The consistency assumption mimics an analogous assumption in 1 for the evader’s policies. For example, one can think that in each time period the interdictor ranks all feasible blocking solutions in the observed network based on some criterion, e.g., their costs to the evader, resolving ties according to any deterministic rule. Then the interdictor selects the highest-ranked blocking solution from such a list.
Assumption A4 implies that the evader has some degree of monitoring of the interdictor’s actions. As pointed out in 1 , this assumption can also be interpreted in the context of repeated interactions in a stochastic setting, where such monitoring might arise naturally from the evasion decisions by trial and error in multiple time periods. Assumption A5 represents the case of the perfect (or transparent) feedback (from the evader to the interdictor), which is common in the learning theory literature, see, e.g., 23 . This assumption is also made in 1 .
Finally, recall that the interdictor has full information about the costs of arcs in . Thus, A5 implies that whenever existence of the arc is known to the interdictor at any time period, then he is also aware of its cost.
In view of the discussion above, the evader’s problem can be formulated as the following -level combinatorial optimization problem:
where condition (2b) ensures that does not include arcs, which are blocked by the interdictor at time period . Constraint (2c) requires to be a set of -most vital arcs in by assumption A1. Then (2d) states that a set of arcs known to the interdictor at time period is updated according to assumption A5.
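The sequence of events and assumptions A1 to A5 can be sketched as a small simulation. This is an illustration under stated assumptions, not the authors' implementation: the interdictor finds most vital arcs by enumeration, ties are broken by taking the first candidate in sorted order (a simple stand-in for the deterministic rule of assumption A3, not the greedy semi-oracle), and all names (`shortest_path`, `greedy_block`, `simulate`, `greedy_evader`) as well as the toy instance are hypothetical.

```python
import heapq
from itertools import combinations


def shortest_path(arcs, s, t):
    """Return (cost, list of arcs) of a cheapest s-t path, or (inf, None)."""
    adj = {}
    for (u, v), c in arcs.items():
        adj.setdefault(u, []).append((v, c))
    dist, pred, heap = {s: 0}, {}, [(0, s)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry
        for v, c in adj.get(u, []):
            if d + c < dist.get(v, float("inf")):
                dist[v], pred[v] = d + c, u
                heapq.heappush(heap, (d + c, v))
    if t not in dist:
        return float("inf"), None
    path, node = [], t
    while node != s:
        path.append((pred[node], node))
        node = pred[node]
    return dist[t], path[::-1]


def greedy_block(known, s, t, k):
    """Most vital arcs (budget k) in the observed network, by enumeration."""
    best, best_cost = (), shortest_path(known, s, t)[0]
    for size in range(1, k + 1):
        for cand in combinations(sorted(known), size):
            rest = {a: c for a, c in known.items() if a not in cand}
            cost = shortest_path(rest, s, t)[0]
            if cost > best_cost:  # strict: the first best candidate wins ties
                best, best_cost = cand, cost
    return set(best)


def simulate(arcs, known0, s, t, k, horizon, choose_path):
    """Play `horizon` periods; return the cumulative loss and the history."""
    known, total, history = dict(known0), 0, []
    for period in range(horizon):
        blocked = greedy_block(known, s, t, k)           # interdictor acts first
        allowed = {a: c for a, c in arcs.items() if a not in blocked}
        path = choose_path(allowed, s, t, period)        # evader sees the blocking
        assert all(a in allowed for a in path)           # no interdicted arcs
        total += sum(arcs[a] for a in path)              # instantaneous loss
        known.update({a: arcs[a] for a in path})         # perfect feedback
        history.append((blocked, path))
    return total, history


def greedy_evader(allowed, s, t, period):
    """Greedy evasion: a shortest path in the interdicted network."""
    return shortest_path(allowed, s, t)[1]


arcs = {("s", "a"): 1, ("a", "t"): 1, ("s", "b"): 2, ("b", "t"): 2, ("s", "t"): 5}
total, history = simulate(arcs, {}, "s", "t", k=1, horizon=3, choose_path=greedy_evader)
```

A strategic policy can be plugged in by passing a different `choose_path` callable, for example one that returns a precomputed path for each period. The sketch relies on assumption A2 to guarantee that a path always exists in the interdicted network.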
Next, we provide formal definitions of evasion and interdiction policies along with some illustrative examples.
An evasion policy is a deterministic sequence of set functions such that for each , , , where summarizes the initial information as well as the history of the interdiction and evasion decisions up to time :
The evader is referred to as greedy if she chooses the shortest path in the interdicted network in each time period. The corresponding evader’s solution is referred to as greedy. Otherwise, the evader is considered to be strategic.
We denote by and , , the evasion decisions by the greedy and strategic evaders, respectively. Given and , define the cumulative loss of the evader under evasion policy over time periods as follows:
Let be a class of all feasible evasion policies for and . Given , , and , we say that evasion policy strongly dominates policy if .
The interdictor is referred to as a greedy semi-oracle, if he has complete knowledge of the evader’s policy and whenever the $k$-most vital arcs problem in , i.e., problem (2c), has multiple optimal solutions, then the interdictor selects the one that is least favorable for the evader at time . More precisely, he chooses a set of the $k$-most vital arcs in so as to maximize the evader’s loss, , in period under policy . Furthermore, among these solutions the one that maximizes the cardinality of is preferred.
Simply speaking, we assume that a greedy semi-oracle has full information about the network’s structure and costs, but is restricted to act in a greedy fashion, blocking a set of the $k$-most vital arcs in the currently observed network. However, given his knowledge of he can anticipate the evader’s actions and thus, select solutions that are preferable for him. Furthermore, in Definition 3 we assume that the blocking decision of a greedy semi-oracle is always myopic since it takes into account only the evader’s decision in the current time period.
The concept of a semi-oracle is introduced in 1 . We provide an additional requirement that the interdictor maximizes as his auxiliary objective. Note that for the evader’s problem in period , i.e., problem (2) with and , the concept of the greedy semi-oracle corresponds to the pessimistic version of the bilevel problem, that is, whenever the lower-level decision-maker (the interdictor) has multiple optimal solutions, he prefers the one that is least favorable to the upper-level decision-maker (the evader); see Colson2007 for further details on bilevel optimization.
Next, we provide two illustrative examples comparing the greedy evader against a strategic one. These examples provide us with further motivation of exploring the structural properties of strategic evasion policies discussed in Section 4. In the examples, we assume that the interdictor is a greedy semi-oracle.
Example 1 (see Figure 1).
The graph used in this example is provided in Figure 1. Let and be the evader’s source and destination nodes, respectively, and be a real number such that . We also set , and .
First, we assume that the evader is greedy. As , then we have and the greedy evader follows the shortest path from to given by . Note that any subset of arcs of is a set of the -most vital arcs in , where . Recall that the interdictor is a greedy semi-oracle and thus, he blocks arcs and in order to maximize . It implies and . Consequently, the cumulative loss of the greedy evader is .
Consider a strategic evader, who traverses and sequentially. Note that and the cumulative loss of the strategic evader is given by .
Example 1 illustrates that if the evader is aware that the interdictor is greedy, then she can exploit this fact to decrease her cumulative loss. Furthermore, observe that the paths used by the strategic evader have some arcs in common with the shortest path. In Section 4 we formally establish that this observation is, in fact, a necessary condition for the strategic evasion policy whenever it outperforms the greedy one under the assumption that , and . Also note that, if is sufficiently large, then for any given constant we can guarantee that . Thus, the greedy evasion solution does not approximate the optimal solution within any constant factor.
Nonetheless, one may check by a straightforward calculation that in Example 1 the greedy policy turns out to be optimal for any . In Example 2 provided below we demonstrate that the greedy evasion policy can be dominated by a strategic policy for arbitrarily large , while does not necessarily need to be empty.
Example 2 (see Figure 1).
As in the previous example consider the graph depicted in Figure 1. In contrast to Example 1, we only change in a particular way and assume that is arbitrarily large, i.e., . More precisely, let be non-empty and given by .
First, we have regardless of the evasion policy as the interdictor acts first. Then the greedy evader traverses along path . Next, at the interdictor blocks , which is a set of the -most vital arcs in . It implies that . Furthermore, , for all . Thus, the cumulative loss of the greedy evader is given by for any .
For a strategic evader, assume that she traverses through at . Then and . Consequently, and all . It implies that the cumulative loss of the strategic evader is . Therefore, for any we have:
and thus, the greedy evasion policy is suboptimal for arbitrarily large values of parameter .
3 Computational Complexity
In this section we show that the evader’s problem is NP-hard in the case of two time periods even for instances where the interdiction problem is polynomially solvable. Furthermore, we prove that it is NP-hard to distinguish instances where the evader’s regret is zero from those where it is strictly positive. The latter proposition rules out any type of approximation algorithm for the evader’s problem. Henceforth, we assume that the interdictor is a greedy semi-oracle.
Observe that the evader’s problem in the case of can be solved efficiently whenever the $k$-most vital arcs problem in admits a polynomial time algorithm. However, in general the $k$-most vital arcs problem is known to be NP-hard 5 and, thus, checking feasibility of an evasion solution is already NP-hard. Therefore, an interesting question is to determine the complexity of the evader’s problem in the cases when the interdiction problem is “easy” to solve. We make the following remark.
Alternatively, the evader may have access to an interdiction oracle which provides the optimal blocking decision in the network currently observed by the interdictor. The latter implies that the evader’s problem with coincides with the shortest path problem in the interdicted network, which is known to be polynomially solvable Ahuja . However, the existence of such an oracle is somewhat impractical, unless the evader makes her decisions sequentially given the oracle’s response as a feedback.
In our complexity reduction below we assume that the arc set consists of two disjoint subsets, namely, the arcs that are removable and the arcs that are unremovable by the interdictor. The notion of unremovable arcs is technical and is made without loss of generality, as we can make all arcs removable by a polynomial time modification of the original graph. Specifically, we can simply replace each unremovable arc by $k+1$ parallel arcs of equal costs. This construction guarantees that after the removal of at most $k$ arcs, at least one of these arcs remains intact and thus, the endpoints of the original arc remain connected by a directed arc. Alternatively (e.g., if parallel arcs are not allowed), we can replace each unremovable arc by $k+1$ paths of length two by adding $k+1$ new nodes and $2(k+1)$ new arcs, which results in the same outcome.
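The second construction (paths of length two) can be sketched as follows. The text does not say how the original cost is split between the two arcs of each replacement path; the sketch assumes, purely for illustration, that the first arc carries the full cost and the second arc costs zero. The function name `make_unremovable` and the tuple labels of the new nodes are hypothetical.

```python
def make_unremovable(arcs, arc, k):
    """Replace `arc` = (u, v) by k + 1 two-arc paths u -> w_i -> v.

    arcs: dict {(u, v): cost}. Each replacement path costs as much as the
    original arc (assumed split: full cost on the first arc, zero on the second).
    Blocking at most k arcs leaves at least one replacement path intact.
    """
    u, v = arc
    cost = arcs[arc]
    new_arcs = {a: c for a, c in arcs.items() if a != arc}
    for i in range(k + 1):
        w = (u, v, i)  # hypothetical label of the i-th new intermediate node
        new_arcs[(u, w)] = cost
        new_arcs[(w, v)] = 0
    return new_arcs


# Hypothetical usage with interdiction budget k = 2.
arcs = {("u", "v"): 4, ("v", "t"): 1}
protected = make_unremovable(arcs, ("u", "v"), k=2)
```

The modification adds k + 1 nodes and 2(k + 1) arcs per unremovable arc, so the transformed graph has size polynomial in the size of the original instance.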
Next, we define the classical 3-SAT problem, which is known to be NP-complete GJ :
Instance: collection of clauses on a finite set of variables , , such that for .
Question: is there a truth assignment for that satisfies all the clauses in ?
Boolean formula is satisfied under assignment if and only if each clause is true. Clause is true if and only if it contains either literal such that or literal such that .
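The satisfaction rule above can be checked mechanically. The sketch below assumes a signed-integer encoding of literals (a positive integer i for the variable with index i, a negative integer for its negation), which is a common convention but not the notation of this paper; the function names and the exhaustive search are illustrative only.

```python
from itertools import product


def clause_is_true(clause, assignment):
    """A clause holds if at least one of its literals is true under the assignment."""
    return any(assignment[abs(lit)] == (lit > 0) for lit in clause)


def formula_is_satisfied(clauses, assignment):
    """The formula holds if and only if every clause is true."""
    return all(clause_is_true(c, assignment) for c in clauses)


def find_assignment(clauses, n):
    """Exhaustive search over all 2^n assignments; exponential, for tiny inputs only."""
    for values in product([False, True], repeat=n):
        assignment = {i + 1: values[i] for i in range(n)}
        if formula_is_satisfied(clauses, assignment):
            return assignment
    return None


# Hypothetical instance: (x1 or x2 or not x3) and (not x1 or x2 or x3).
clauses = [(1, 2, -3), (-1, 2, 3)]
ok = formula_is_satisfied(clauses, {1: True, 2: False, 3: True})
```

Verifying a given assignment takes time linear in the size of the formula, whereas the search routine is exponential in the number of variables.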
We also define the decision version of the evader’s problem (EP) for :
Instance: network together with source and destination nodes, subset of arcs known to the interdictor, the interdiction budget and threshold .
Question: are there two paths of total cost at most that can be traversed sequentially by the evader given that the interdictor is a greedy semi-oracle?
The proof of our main result below is based on a reduction from the 3-SAT problem, where for any instance of 3-SAT we construct a particular instance of the 2-EP problem. Following the discussion above, we construct the instance of 2-EP such that feasibility of any evasion solution can be checked in polynomial time with respect to the number of arcs, i.e., the $k$-most vital arcs problem in both and is polynomially solvable. Our construction is inspired by and similar to the one used in 18 , where it is shown that the problem of finding two minimum-cost arc-disjoint paths with non-uniform costs (e.g., changing over time or type of flow) is strongly NP-complete. However, our problem setting requires a somewhat different arc cost structure and the use of unremovable arcs defined at the beginning of this section.
Specifically, given boolean formula , let be the number of occurrences of variable in . For each variable we construct a lobe as illustrated in Figure 2.
The lobes are connected to one another in series with and . Recall that and are source and destination nodes, respectively. For each clause , we add two nodes , , together with arcs , , and of cost . Finally, to connect clauses to variables we add the following arcs with zero costs: and , if the -th occurrence of variable is the literal , which is a literal in clause ; and , if the -th occurrence of variable is the literal , which is a literal in clause . We refer to Figure 3 that illustrates the constructed graph for .
Theorem 1. Problem 2-EP is strongly NP-complete for the class of instances where the interdiction problem is polynomially solvable.
Consider a “yes”-instance of 3-SAT. Thus, there exists assignment such that boolean formula is satisfied under . Assume that we construct a graph for this instance as outlined in the discussion above. Next, let , where is the set of unremovable arcs, and is such that , where .
Observe that , since the interdictor is a greedy semi-oracle. Actually, is sufficiently large to block all arcs in and interdiction of maximizes the cost of the shortest path in , i.e., .
At let the evader choose path that is constructed in the following way. It traverses through the lower part of the -th lobe, if , and it traverses through the upper part, if . Observe that . The information collected by the interdictor is updated and thus, . Note that set of arcs consists of path together with parallel unremovable arcs and set that contains distinct arcs that do not form a path from to . Furthermore, any evasion decision at does not include arcs in and goes through all the lobes. We conclude that the interdiction problem in both and is polynomially solvable.
The presence of unremovable arcs of sufficiently large costs that are parallel to arcs in forces the interdictor to remove arcs in the order of their costs. That is, he blocks arcs of zero cost first and then blocks arcs of unit cost. Recall that . Therefore, the blocking decision at consists of at least arcs of including all arcs with zero cost. We conclude that and due to the fact .
Then there exist at least un-blocked subpaths or that correspond to the variable assignments in . Together with arcs and they form path that is arc-disjoint with by their construction and can be traversed by the evader at . The cumulative loss of the evader equals . Hence, the answer to the evader’s problem is “yes.”
Consider a “yes”-instance of the evader’s problem with , and is such that . Then there exist two paths that can be traversed sequentially by the evader and their total cost is at most .
Observe that the choice of does not depend on the evader’s actions. Path goes through either the upper or the lower part of each lobe. Therefore, path in the evader’s decision has cost , while has zero cost. Furthermore, the fact that implies that is arc-disjoint with .
Next, we note that as the cost of is zero, then it needs to contain arcs and . Furthermore, needs to traverse through arcs , and each lobe through the zero cost arcs. We construct assignment in the following way: if traverses through the lower part of the -th lobe, let and, if it traverses through the upper part, then let . According to this assignment due to the existence of each clause , , is satisfied, which implies the necessary result. ∎
Consider the class of instances of the problem 2-EP where the interdiction problem is polynomially solvable. Then no polynomial time heuristic has a fixed worst-case bound, unless P = NP.
Consider a modification of the network structure used in the above-mentioned reduction from the 3-SAT problem. Specifically, we modify the graph corresponding to a 3-SAT instance in the following way: each arc with a unit cost is replaced by an arc with a zero cost. Furthermore, we add an unremovable path . Assume that all arcs in have cost . We refer to Figures 4 and 5 that illustrate the constructed graph for . First, we show that Theorem 1 holds true for the modified graph, if we change in a particular way.
Consider a “yes”-instance of 3-SAT. Thus, there exists assignment such that boolean formula is satisfied under . Assume that we construct a graph for this instance as outlined in the discussion above. Let be the set of all unremovable arcs and define set of arcs as follows. Assume that consists of arcs in the upper part of the -th lobe, if , and of arcs in the lower part of the -th lobe, otherwise. Consider the evader’s problem with , , , and .
Note that the blocking decision in the first round is given by since the interdictor is a greedy semi-oracle. We define path as in the proof of Theorem 1 and observe that ; see Figure 4, for example. Furthermore, the interdiction problem in and can be solved in polynomial time with respect to any feasible evasion decision at .
Then we show that with blocking decision in the second round satisfies and . Actually, if the interdictor blocks and all arcs in , then the cost of the shortest path in the currently observed network equals . If we choose so that inequality holds, then satisfies and forms a set of -most vital arcs in the network currently observed by the interdictor. Specifically, if any arc between is blocked, then the cost of the shortest path is strictly less than . We form path following the proof of Theorem 1. Then and . We conclude that the evader’s problem has answer “yes”.
Consider a “yes”-instance of the evader’s problem with , , , and , where is the set of all unremovable arcs and contains either arcs in the upper part of the -th lobe, or arcs in the lower part of the -th lobe.
We conclude that there exist two paths, i.e., and , that can be traversed sequentially by the evader with total cost at most . Note that the interdiction decision in the first round does not depend on the evader’s actions and is given by . Furthermore, observe that path goes through each lobe by construction of set .
Following our discussion above, we conclude that and thus, is arc-disjoint with . Observe that as the cost of is zero, then it needs to contain arcs and . Furthermore, needs to traverse through arcs , and each lobe through the zero cost arcs. Then a truth assignment can be constructed precisely as in the proof of Theorem 1.
The key observation is that the optimal objective value of the evader’s problem is zero, while any nonoptimal solution has value of at least . So, for any given constant approximation factor, the existence of a polynomial time approximation algorithm with this factor for the evader’s problem implies that the corresponding 3-SAT problem can be solved in polynomial time, which contradicts the assumption that P ≠ NP. This concludes the proof. ∎
We conclude that the evader’s problem is hard even in the case of two time periods, assuming that the interdiction problem is polynomially solvable. Moreover, we cannot construct a polynomial time heuristic with a fixed worst-case bound. The latter motivates us either to explore analytical properties of the optimal evasion policies in a simple case, i.e., two time periods and no initial information available to the interdictor, or to propose a heuristic algorithm that takes into account alternative solutions and outperforms a naive greedy approach; see Examples 1 and 2 in Section 2.
4 Analysis of Greedy Policies
In this section we analyse the case when there are two time epochs and the interdictor has no initial information about the network, i.e., and . Under some mild assumption we show that the optimal solution of the evader’s problem in this case is either greedy, or consists of two distinct paths that intersect with the overall shortest path. Furthermore, in the specific case of we prove that the greedy evasion policy is optimal.
In the remainder of this section for simplicity of exposition we assume that the costs of all possible paths from to are distinct. Thus, we can enumerate them in the strictly increasing order of their costs, i.e.,
where . Denote by the index of path in the above ordering.
Let , , and assume the evader is greedy. Then, for any
path is blocked at by the interdictor.
Remark 2 is implied by the definition of a greedy evader. Furthermore, since and , we have . Next, we formulate necessary conditions for an evasion solution to be optimal in the case of and .
Assume that the interdictor is a greedy semi-oracle with and . Let and , . If and is an optimal solution of the evader’s problem for , then either and , or and satisfy the following conditions:
First, suppose that the greedy policy is optimal. As , then by definition of we have that the greedy evader follows and sequentially. Hence, and .