I Introduction
Resilience is a critical concern for communication networks that are deployed in support of cloud systems. However, the recent trend towards the virtualization of network equipment and functions potentially introduces new fragility into such systems due to layering [1].
Many studies reveal the fragility unique to layered networks [2, 3]. For example, a network system may be realized by the combination of virtualized functions and infrastructure (physical) nodes. The nodes in the infrastructure layer host some functions including the orchestrator function that manages the life cycle of virtualized functions and the mapping between the two layers [4]. Therefore, the functionality of the orchestrator function depends on the infrastructure node hosting it; at the same time, it is necessary to have a working virtualized orchestrator to manage the physical computation resources on every infrastructure node. This interdependency between two layers results in increased fragility.
Furthermore, the interdependency has an influence on recovery decisions after a massive failure. After massive failures, it is critical to start providing necessary connections as soon as possible, even when available resources, such as manpower or backup equipment, to repair the system are limited. The prioritization of specific connections or services is well studied in [5, 6, 7] for single-layer networks. However, this prioritization becomes more complex when there is interdependency between layers, since the role of each node is determined not only by the topology of a network but also by the interdependency [8, 9]. The following example characterizes the inherent complexity of the problem.
Let us consider an example illustrated in Fig. 1. The network consists of two constituent layers, which represent a virtualized function layer and an infrastructure layer . Each server on can host one function . Suppose that either or hosts a virtualized orchestration function among the four servers; i.e. or can be an orchestration function. As explained above, at least one orchestration function needs to be available for servers to be functional. The demand of each server shows the amount of resources needed to repair it.
Our problem is to determine the recovery order of the servers, considering the number of functions available during the recovery process. Here, the following two recovery orders are compared in terms of the total number of functions available over recovery time steps: and . For simplicity, it is assumed that only one unit of resource is available at each time step .
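Table I: Number of available functions (utility) at each time step under the two recovery orders.
First recovery order: 0 1 1 1 2 3 4 (total 12)
Second recovery order: 0 0 0 0 3 3 4 (total 10)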
Table I describes the number of available functions at each time step when following each recovery order. Note that an integer in each cell represents the number of functions available (utility) at the time step. For instance, in , we first recover and obtain 1 available function (utility) at , since it takes two steps to satisfy the demand of the node. A recovered node stays functional until the last step and continues providing the same utility at every step after the step in which it was recovered. Therefore, the computation capability at and is 1, as there are no other nodes recovered during these steps. Since is recovered after three steps, another unit of utility is added at . In , the interdependency between the virtualized function layer and the infrastructure layer plays an interesting role in the recovery process. Even though sufficient resources are assigned to and in the first two steps, the utility remains 0 until is recovered. This is because the two nodes () cannot receive the orchestration messages due to the unreachability to , which is an orchestration function. Hence, the total utility jumps to 3, once is recovered at . As a result, the total utility over time of is 12, while the total utility of is 10.
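The totals quoted above can be checked directly from the per-step values of Table I. The following minimal sketch only sums the two rows; the names `order_a` and `order_b` are illustrative labels for the first and second recovery orders and do not appear in the original text.

```python
# Per-step utilities copied from Table I (one list per recovery order).
order_a = [0, 1, 1, 1, 2, 3, 4]
order_b = [0, 0, 0, 0, 3, 3, 4]


def accumulated_utility(per_step):
    """Sum the utility available at every time step of the recovery."""
    return sum(per_step)


total_a = accumulated_utility(order_a)
total_b = accumulated_utility(order_b)
print(total_a, total_b)
```

Both orders end with the same four functions available, yet the accumulated values differ because the second order obtains no utility until the orchestration function becomes reachable.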
Hence, the total utility available during recovery is different depending on which recovery order we adopt. Motivated by this simple example, the question addressed in this paper is the following. How do we find a recovery order that maximizes the accumulated utility during the recovery process in networks with interdependency between layers? This problem is a variant of the progressive recovery problem [5], which aims at maximizing the amount of flows going through a network during the recovery process. However, the fundamental difference lies in the consideration of the interconnectedness between nodes in different layers.
In order to answer this question, we first prove that the progressive recovery problem with a general graph always has an equivalent progressive recovery problem with a simpler graph. Additionally, the NP-hardness of the simpler problem is shown, which implies that the general case cannot be solved in polynomial time unless P = NP. We then propose a Deep reinforcement learning-based algorithm for Progressive Recovery (DeepPR) to solve the problem. Deep reinforcement learning seems suitable for this problem, because it is easy to calculate the total available computation capability when given a recovery order, as seen in the previous example, even though determining the optimum order is intractable.
II Related Works
Pioneering work [5] on the progressive recovery problem focuses on determining the recovery order of communication links that maximizes the amount of flows on the recovered network with limited resources. As an extension, the work [6] proposes node evaluation indices to decide the recovery order to maximize the number of virtual networks accommodated. Considering the necessity of monitoring to observe failure situations, the joint problem of progressive recovery and monitor placement is discussed in [7].
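Related recovery problems have also been studied for datacenters, with a focus on reachability to contents [10], for the learning-based prediction of failure cascades [11], and for convex cases, for which an optimization algorithm is given [12]. An analytical treatment is provided by Buldyrev et al. in [13].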
The fragility induced by dependency between network layers has been pointed out in the context of interdependent network research [2, 3, 14]. In particular, the interdependency between virtualized nodes and physical nodes in optical networks is considered in [2]. A similar dependency caused by VNF orchestration is discussed in [3].
The works in [15, 16, 17] analyze the behaviors of failure propagations in such interdependent networks when each node performs local recovery (healing), where a functioning node substitutes for the failed node by establishing new connections with its neighbors.
Progressive recovery problems in interdependent networks have been discussed in [18, 8, 9, 19]. Classifying the progressive recovery problems by the types of interdependency, the work [18] proposes the optimum algorithm for a special case and heuristics for other cases. ILP- and DP-based algorithms are employed to solve a variant of the progressive recovery problem in [8].
III Model
III-A Network Model
A network, which consists of virtualized functions and infrastructure nodes hosting the functions, is modeled by an interdependent network that is formed by two constituent graphs , which correspond to the virtualized orchestration function layer and the infrastructure node layer . A pair of nodes in different constituent graphs can be connected by an arc representing their dependency relationships: . Edges in are called intra-edges because they connect pairs of nodes in a constituent network. In contrast, arcs in are called inter-arcs or dependency arcs. An arc indicates that a node has dependency on a node . The node is called a supporting node, and is a supported node.
Two node attribute functions are defined to capture the characteristics of each node: demand and utility functions. The demand function represents how many resources need to be assigned to fully recover a given node. This demand can be interpreted as the cost or manpower to repair a specific node in the context of recovery problems. The utility function indicates the computational capability of a given node, such as the number of functions it can host, when it is fully recovered.
III-B Network Failure and Progressive Recovery Plan
When a network failure event occurs at time , some nodes in the network become nonfunctional. Let denote a set of nonfunctional nodes at time . With this notation, the nonfunctional nodes right after the failure are represented as . A failure is represented by a node set in this paper, because any failure of an edge can be converted to a node failure by replacing the nonfunctional edge with a nonfunctional node and two functional edges .
In progressive recovery scenarios, we receive a limited amount of resources at each time step after a failure. The resource function indicates the amount of the repair resources available at time .
A progressive recovery plan is an assignment of the available resources to the nonfunctional nodes. Formally, is a matrix whose entries indicate the amount of resources assigned to a specific node at a specific time. Because of the limitation on the available resource amount, for every .
During the recovery process, nodes can be classified by two measures: the amount of resources assigned to the node and the functionality of the node. A node is saturated when it has received enough recovery resources: . Let denote a set of saturated nodes at time . A node is said to be functional if and only if it is (1) saturated and (2) reachable from at least one saturated supporting node in the other constituent graph via a simple path consisting of functional nodes. When a node is functional at time , the node state function ; otherwise . A node is recovered at only when it becomes functional by assigning . In real networks, a nonfunctional saturated node can be interpreted as either an infrastructure node unreachable from an orchestration function or a virtualized function that is hosted on an infrastructure node that is nonfunctional.
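The distinction between saturated and functional nodes can be illustrated with a short reachability check. This is a minimal sketch under a simplifying assumption that is not part of the definition above: there is a single supporting node, called `root` here, and a node is treated as functional when it is saturated and connected to `root` through saturated nodes only. The function name and the adjacency-dictionary encoding are hypothetical.

```python
from collections import deque


def functional_nodes(adj, root, saturated):
    """Return the saturated nodes reachable from `root` via saturated nodes only."""
    if root not in saturated:
        return set()
    seen = {root}
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for neighbor in adj[node]:
            if neighbor in saturated and neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return seen


# Path graph o - a - b - c, where "o" plays the role of the supporting node.
adj = {"o": ["a"], "a": ["o", "b"], "b": ["a", "c"], "c": ["b"]}
partial = functional_nodes(adj, "o", {"o", "a", "c"})
complete = functional_nodes(adj, "o", {"o", "a", "b", "c"})
print(sorted(partial), sorted(complete))
```

In the first call, node `c` is saturated but stays nonfunctional because the only path to the supporting node passes through the unsaturated node `b`.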
A resource assignment at each step is called a splitting assignment when it prevents any node from being saturated or recovered, even though there exists a node that can be saturated or recovered at . In contrast, a concentrating assignment saturates or recovers some node if possible, and provides all the extra resources, which can neither saturate nor recover any node, to one unsaturated node.
IV Problem Formulation
This section formulates the progressive recovery problem in interdependent networks, and discusses and theoretically proves some properties of the problem.
IV-A The Problem and Special Cases
The progressive recovery problem is to find a recovery plan represented by a (time step × node) matrix that maximizes the sum of utility provided by functional nodes during the recovery.
Problem 1.
Progressive Recovery Problem (PR): Given a graph , a demand function , a utility function , a set of initially failed nodes , and a resource function , maximize the network-wide utility by deciding a resource assignment matrix .
For clear representation of special cases, the problem is characterized by a five-tuple . When or returns the same constant for any node, the function is called node-invariant. Similarly, is time-invariant when the amount of available resources does not change over time.
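The objective of Problem 1 is straightforward to evaluate once a plan is fixed. The sketch below assumes a single, always-functional supporting node `root` and encodes each row of the assignment matrix as a dictionary from node to assigned resources; this encoding, the function name, and the example values are illustrative assumptions. The per-step resource limit is not checked here.

```python
from collections import deque


def total_utility(adj, root, demand, utility, plan):
    """Accumulate the utility of functional nodes over all time steps.

    plan[t] maps node -> resources assigned at step t (hypothetical encoding
    of one row of the assignment matrix).
    """
    received = {v: 0 for v in demand}
    total = 0
    for row in plan:
        for v, amount in row.items():
            received[v] += amount
        saturated = {v for v in demand if received[v] >= demand[v]}
        # Functional = saturated and reachable from the root via saturated nodes.
        seen, queue = {root}, deque([root])
        while queue:
            node = queue.popleft()
            for neighbor in adj[node]:
                if neighbor in saturated and neighbor not in seen:
                    seen.add(neighbor)
                    queue.append(neighbor)
        total += sum(utility[v] for v in seen if v != root)
    return total


adj = {"o": ["a"], "a": ["o", "b"], "b": ["a"]}
demand = {"a": 2, "b": 1}
utility = {"a": 1, "b": 3}
near_first = [{"a": 1}, {"a": 1}, {"b": 1}]
far_first = [{"b": 1}, {"a": 1}, {"a": 1}]
print(total_utility(adj, "o", demand, utility, near_first))
print(total_utility(adj, "o", demand, utility, far_first))
```

Saturating `b` first yields no utility until `a` is also saturated, which is why the second plan accumulates less.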
A simpler case of the problem is one in which it is assumed that the functionality of virtualized functions totally depends on the functionality of a physical server hosting the function. In other words, there is no need for recovery (resource allocations) to repair virtualized functions, since the unavailability of the functions occurs only due to the loss of physical servers hosting them. In our terminology, when virtualized function nodes are nonfunctional, they are always saturated.
The interdependency between the virtual and physical layer still exists even with the above assumption, since any physical machine needs at least an indirect connection with a virtual control function. Obviously, a virtual function needs at least one physical machine, which can host it, to be functional.
Definition 1.
A graph in the progressive recovery problem is said to be one-layered when nodes in never require repair resources to be functional. In other words, nodes in are nonfunctional only because of the loss of supporting nodes in the other constituent graph; that is, their demand is zero.
Problem 2.
One-layered star case (StarPR): Assume that the graph topology is a star whose nodes are in , except for the center node ; also, each node is bi-connected with : .
Problem 3.
One-layered rooted tree case: Extend Problem 2 by adding more nodes to that are not adjacent to the node in ; i.e., the graph is a tree rooted at the node in : .
IV-B Intractability
Definition 2.
Time-Invariant Incremental Knapsack Problem (IIK) [22]: Let denote a set of items, each of which has value and weight . For any subset of , the value and weight are defined as follows: , and . IIK is to find a sequence of subsets of , from time 1 to , that maximizes subject to , where is the available capacity of the knapsack at time . Note that IIK is known to be NP-hard.
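The following sketch evaluates a candidate IIK solution. Since the formulas are not reproduced above, it assumes the commonly used formulation: the subsets must be nested over time, the total weight at each time must not exceed that time's capacity, and the objective is the sum over time of the value of the chosen subset. All names and example numbers are hypothetical.

```python
def iik_objective(values, weights, capacities, subsets):
    """Return the summed value of a nested subset sequence, or None if infeasible."""
    previous = set()
    total = 0
    for capacity, chosen in zip(capacities, subsets):
        chosen = set(chosen)
        if not previous <= chosen:
            return None  # items may never be removed once packed
        if sum(weights[i] for i in chosen) > capacity:
            return None  # knapsack capacity exceeded at this time
        total += sum(values[i] for i in chosen)
        previous = chosen
    return total


values = {"a": 3, "b": 2}
weights = {"a": 2, "b": 1}
capacities = [1, 2, 3]
light_first = iik_objective(values, weights, capacities, [{"b"}, {"b"}, {"a", "b"}])
heavy_later = iik_objective(values, weights, capacities, [set(), {"a"}, {"a", "b"}])
too_heavy = iik_objective(values, weights, capacities, [{"a"}, {"a"}, {"a", "b"}])
print(light_first, heavy_later, too_heavy)
```

Packing the light item early collects its value at every later time, which mirrors how an early-recovered node keeps contributing utility in the recovery problem.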
Theorem 1.
The one-layered star case (StarPR) is NP-hard.
Proof.
What needs to be shown is that IIK is polynomial-time reducible to StarPR.
Given an instance of IIK, an instance of StarPR is constructed as follows. We construct a graph with ’s that correspond to each item and a special node . Edges are added so that each is adjacent to : . Formally, . The set of failed nodes consists of ’s. The demand and utility functions are defined using the given weight and value functions, respectively. The available resource function value for time is defined by the given capacity function . This conversion is obviously executed in polynomial time.
Clearly, IIK reaches the optimum if and only if StarPR reaches the optimum, since the objective functions of these two problems are identical with the settings above. The progressive property of StarPR, which accumulates utility over time, is inherited in the property of IIK solutions that . ∎
Therefore, the PR problem is, in general, an NP-hard problem. This proof also implies that the intractability of a progressive recovery problem changes, depending on the , , and functions. The work [18] provides a polynomial time optimum algorithm for the one-layered star case (Case 1 in [18]) with and , where is a constant. This means that is in class P.
IV-C Relations among PR with Different Topology
This section first characterizes the optimum recovery plan in special types of graphs (one-layered graphs). Also, it is proven that the optimum recovery plan of a general network topology shares the same property with that of one-layered graphs, by showing the conversion of the general case into one-layered graph cases.
Lemma 1.
The optimum recovery plan for any one-layered star graph only consists of concentrating assignments when : .
Proof.
First, we argue that the statement is true for a star graph with two nodes , , and . Suppose only consists of concentrating assignments and includes some splitting assignment.
Because concentrate resources on a node , the node becomes functional after steps. After the step, it takes more steps to recover the other node . Note that during these steps, the networkwide utility is always . Therefore, .
Let us think about that contains a splitting assignment at one time step and concentrating assignments for the other steps. The splitting must be conducted before becomes functional, since there are only two nodes. Then, it takes steps for and steps for to be recovered. Note that receives one unit of resource at both and th step. Thus, . The same discussion can be applied to the cases with more splitting, and decreases when more splitting assignments are included in . When only consists of the splitting, it takes steps for both nodes to be recovered. Therefore the networkwide utility is .
Second, we change the settings by allowing more general demands . Without loss of generality, suppose . There are three recovery plans to be compared. Let denote the recovery plan only consisting of concentrations with the prioritization of and be a plan including splitting. Based on the previous discussion, , and .
When uses the splitting assignment at one step, is recovered at th step, and it takes more steps to recover , where . This is because the splitting assigns one unit of resources to , and the ceiling function at th step may assign another excess unit, depending on whether is divisible by . Therefore, is at most . When exploits more splitting assignments, the network-wide utility decreases as observed in the previous setting.
It is easily shown by similar discussion that, for any , a recovery plan that only includes concentrating assignments is better than plans including splitting assignments. This is because the difference in resource amounts is just a problem of scaling of and . Thus, the inherent property of the splitting and concentrating assignments does hold even with any different .
It is also obvious that similar discussions hold for general star graphs with nodes. The key property here is that the splitting delays recovery of a certain node by assigning resources to more nodes, even though the number of steps required to recover all nodes is fixed: .
∎
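The effect described in Lemma 1 can be observed numerically on a small star. The sketch below assumes one resource unit per time step and a center node that is already functional; the function name and the list-based encoding of an assignment are hypothetical, and the example values are chosen only for illustration.

```python
def star_utility(demands, utilities, assignment):
    """Total utility over time in a star whose center is already functional.

    assignment[t] is the index of the leaf that receives the single
    resource unit available at step t.
    """
    received = [0] * len(demands)
    total = 0
    for leaf in assignment:
        received[leaf] += 1
        # Every saturated leaf is adjacent to the center, hence functional.
        total += sum(u for r, d, u in zip(received, demands, utilities) if r >= d)
    return total


demands, utilities = [2, 2], [1, 1]
concentrating = [0, 0, 1, 1]  # finish leaf 0, then leaf 1
splitting = [0, 1, 0, 1]      # alternate between the two leaves
print(star_utility(demands, utilities, concentrating))
print(star_utility(demands, utilities, splitting))
```

Both plans finish after four steps, but the splitting plan delays the first recovery by one step and therefore accumulates less utility.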
Lemma 2.
The optimum recovery plan for any one-layered rooted tree never saturates any node that is not adjacent to a functional node; i.e., the candidate nodes for resource assignments are always adjacent to a functional node when : .
Proof.
For contradiction, let us consider the case where saturation gives us better network-wide utility. Suppose there are two adjacent nodes in a rooted tree, such that is adjacent to an independent node, but is not.
First, we consider the case only with concentrating assignments. After saturating , it takes steps to recover . During these steps, the utility provided by remains 0. In contrast, when is recovered before , it takes to recover , and will provide utility of at each of these steps. This yields a contradiction, since the number of total steps in both scenarios stays the same.
Second, let us try to improve the total utility, by introducing the splitting assignments, from . However, it is impossible based on the discussion in star graphs. When exploiting the splitting at one step, the duration that is functional is strictly less than . ∎
Theorem 2.
The optimum recovery plan for any one-layered rooted tree only consists of concentrating assignments that allocate resources to nodes adjacent to a functional node when : .
Proof.
When a network has only one functional node, Lemma 2 eliminates the possibilities to assign resources beyond the neighbors of the functional node. Then, the network can be considered as a star graph consisting of the functional node and its neighbors. Hence, the statement holds because of Lemma 1.
Accordingly, the node that becomes functional next is adjacent to a functional node. By contracting the edge between the two functional nodes, the problem is reduced to the original problem with one functional node.
∎
Definition 3.
Pseudo star graph : Given a graph and a node state function at time , the logical star graph consists of one logical functional node and the nodes adjacent to any of the functional nodes in original graph, and edges connecting and the others. Formally, , and .
The same statement holds for the case where has more nodes, and there exist more bi-connected pairs of nodes between and .
Theorem 3.
For any one-layered graph, the optimum recovery plan only consists of concentrating assignments that allocate resources to nodes adjacent to a functional node when . .
Proof.
It is trivial that the optimum recovery plan does not saturate any node that is not adjacent to a functional node, even when a graph has more than one independent node or any cycle. Based on a discussion similar to Lemma 2, an assignment of resources to a node adjacent to a functional node always provides more network-wide utility over time, since the node assigned resources starts contributing to the utility in an earlier step. Thus, the candidate nodes for resource assignment at each step are the nodes adjacent to any functional node.
Therefore, a resource assignment decision at each time step is equivalent to the progressive recovery problem in a logical star graph , where is a node state function reflecting recovery from to . That is, it can be considered as the recovery problem in a star graph with a single logical functional node at the center and surrounding leaf nodes .
Hence, it is easily provable, by the argument in Lemma 1, that the optimum plan does not involve splitting assignments, since the concentration of the split resources to a node can always recover the node in an earlier time step and provide more networkwide utility. ∎
Lemma 3.
Algorithm 1 is the optimum (exponential) algorithm to solve the progressive recovery problem in a general graph with , where is a constant representing the amount of available resource.
Proof.
When , any recovery plan recovers at most one node at each time step. The duration that a node is functional is the duration that the rest of nonfunctional nodes are recovered: , where is a set of nonfunctional nodes. Hence, the total utility to which the recovered node contributes until the last step is .
The remaining problem is the same problem with to recover the rest of nonfunctional nodes. The problems with smaller subsets are already solved in previous loops. Thus, the algorithm can reuse the precalculated results stored in . Therefore, the value of , which is always the maximum for subsets of the same size, reaches the optimum when .
The recovery plan is restored by traversing (lines 17-22), and the optimum network-wide utility is .
This algorithm does not depend on any assumption on specific graph topology, since it solves the problem in a logical star graph. The set , which represents a set of nodes adjacent to any functional nodes, implicitly composes the logical star graph for each time step. ∎
Lemma 4.
The complexity of Algorithm 1 is .
Proof.
The two for-loops in lines 3-4 collectively go through all subsets in the power set of . Also, in line 7, the algorithm calculates the network-wide utility by attempting to recover each node that is adjacent to a functional node. In the worst case, the size of is , and it requires a traversal of to check the adjacency. Thus, the computational complexity is . ∎
Next, we claim that the progressive recovery problem with any network topology can be converted into the case in a one-layered graph.
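Algorithm 1 itself is not reproduced in this text, so the following is only a sketch of a subset-based dynamic program in the same spirit, not the authors' listing. It assumes a connected one-layered graph with one always-functional node `root` and assumes that every failed node is recovered in exactly one time step (its demand equals the per-step resource amount). The table `best` and the function name are hypothetical.

```python
from itertools import combinations


def dp_recovery(adj, root, utility):
    """Subset DP: best[S] is the maximum utility when the nodes in S are recovered first."""
    nodes = [v for v in adj if v != root]
    n = len(nodes)
    best = {frozenset(): 0}
    choice = {}
    for size in range(1, n + 1):
        for combo in combinations(nodes, size):
            subset = frozenset(combo)
            for v in subset:
                rest = subset - {v}
                if rest not in best:
                    continue  # rest cannot be recovered in any valid order
                if not any(nb == root or nb in rest for nb in adj[v]):
                    continue  # v is not adjacent to a functional node
                # v is recovered at step `size` and stays functional for n - size + 1 steps.
                value = best[rest] + utility[v] * (n - size + 1)
                if subset not in best or value > best[subset]:
                    best[subset] = value
                    choice[subset] = v
    # Walk back through the stored choices to restore the recovery order.
    order, subset = [], frozenset(nodes)
    while subset:
        v = choice[subset]
        order.append(v)
        subset = subset - {v}
    return best[frozenset(nodes)], order[::-1]


adj = {"o": ["a", "b"], "a": ["o", "c"], "b": ["o"], "c": ["a"]}
utility = {"a": 1, "b": 2, "c": 5}
best_value, best_order = dp_recovery(adj, "o", utility)
print(best_value, best_order)
```

The table holds at most one entry per subset of failed nodes, so the sketch remains exponential in the number of nodes, in line with the exponential complexity stated for Algorithm 1.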
Definition 4.
A pair of nodes and is called a support pair when and .
Lemma 5.
When and are the first support pair recovered in a given graph , the order of saturation of these two nodes does not influence the total utility.
Proof.
Let us assume that a recovery plan saturates first and later. Note that there may be some nodes saturated before and between and . Since are the first support pair to be recovered, there is no functional node in before is saturated. The total utility generated until the step when is saturated is , where is a set of saturated nodes that are reachable from or . When we exchange the ordering of and , the total utility until the step when is saturated remains the same, because the saturated nodes until are the same. Therefore, the order of saturation of and does not change the total utility. ∎
Lemma 6.
In any graph, the first two nodes saturated by the optimum recovery plan are always the nodes in a support pair. .
Proof.
For contradiction, assume that a node is the first node saturated by the optimum recovery plan , and that the two nodes in a support pair are recovered right after . By Lemma 5, it can be assumed without loss of generality that is saturated first. Then, the total utility until the step when is saturated is , where iff is adjacent to or ; otherwise, 0.
However, another recovery plan , which saturates and first and later, provides the total utility until is , since and are already functional at . It contradicts the fact that is the optimum. ∎
Lemma 7.
Let us think about the recovery of a one-layered rooted tree satisfying the following properties.
(1) There is only one node that is saturated.
(2) Any node adjacent to the node has utility of zero: .
In the one-layered rooted tree, the second node recovered by the optimum recovery plan has utility strictly greater than zero: .
Proof.
All the nodes adjacent to have utility of zero. Therefore, the first node recovered by the is one of these nodes. For contradiction, assume the second node is also one of these zero-utility nodes, and let be the first node recovered, whose utility is greater than 0 (th node recovered in the plan).
In order to recover , it is necessary to have a zero-utility node that is already recovered for the reachability to . There are two possible scenarios: (1) is adjacent to , or (2) is adjacent to .
For the first scenario, we can exchange the recovery order of and . This exchange has no influence on the candidate nodes at each step after th recovery, because the recovered nodes until th recovery stay the same. However, it increases the utility and contradicts the fact that is optimum.
For the second scenario, we can exchange the recovery order of and . Again, this does not change any candidate sets for recovery after th recovery. Since is recovered at the very beginning, we can use the same discussion as in the first scenario, which provides a contradiction. Therefore, the second node recovered in the optimum plan should have nonzero utility. ∎
Theorem 4.
A progressive recovery problem with any general graph with has an equivalent progressive recovery problem with a one-layered graph. .
Proof.
For clarification, a new one-layered graph consists of a layer consisting of only one node and another layer that can be any graph.
The problem with a general graph is converted into the problem with a one-layered graph as follows. We add a new node to and put all the nodes and edges in the original into . An edge is added between and each , where consists of nodes that are originally in of ; i.e. .
Lemma 6 shows that the first two nodes to be saturated (recovered) are the ones in a support pair. Also, according to Lemma 5, it can be assumed without loss of generality that a node in in each support pair is the first node to be saturated.
The newly added edges ensure that the first node recovered is one of the nodes in , since is the only saturated node in the initial step. The other correspondence between two problems to be checked is that the second node recovered in is that forms a support pair with in the original graph , and Lemma 7 guarantees this. ∎
Therefore, it is enough to think about the cases of one-layered graphs. Also, it is possible to aggregate multiple nodes in into one logical node in to decide the resource assignment, as the proof of Theorem 3 suggests. Thus, without loss of generality, the rest of this paper only deals with the one-layered graphs with one node in .
Corollary 1.
Algorithm 1 is the optimum algorithm for the progressive recovery problem with any general graph with , a time-invariant , and , where is a constant representing the amount of available resource.
V Reinforcement Learning for Progressive Recovery Plan
V-A Q-Learning
Reinforcement Learning (RL) is a method to learn the best mapping of scenarios to actions . The key elements of RL are an agent, who learns the mapping by numerical rewards for its trial actions, and an environment, which updates scenarios and returns the numerical reward depending on actions the agent takes.
In Q-learning, the mapping is learned using the action-value function that represents the quality of each pair of a state and an action. In theory, the Q-value converges after infinite trial actions (experiences): , which means the expected reward achievable by following the optimum action sequence (policy) from state taking action at time . Note that is a discount factor for future rewards that defines the scope of learning. For each experience, the update of the Q-value is performed by , where , and is a learning rate. is called the target, since should be equal to at convergence.
V-B Deep Q-Network (DQN)
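As an illustration of the update rule, the sketch below performs one tabular Q-learning step using the standard form: the target is the immediate reward plus the discounted maximum Q-value of the next state, and the stored Q-value moves toward the target by the learning rate. The parameter names `alpha` and `gamma`, their default values, and the dictionary-based table are assumptions made for this example.

```python
from collections import defaultdict


def q_update(q, state, action, reward, next_state, next_actions, alpha=0.1, gamma=0.9):
    """Apply one tabular Q-learning update and return the target value."""
    best_next = max((q[(next_state, a)] for a in next_actions), default=0.0)
    target = reward + gamma * best_next
    # Move the current estimate toward the target by the learning rate.
    q[(state, action)] += alpha * (target - q[(state, action)])
    return target


q = defaultdict(float)
q[("s1", "x")] = 2.0
target = q_update(q, "s0", "a", reward=1.0, next_state="s1", next_actions=["x", "y"])
print(target, q[("s0", "a")])
```

When the next state has no legal action (a terminal state), the sketch treats its future value as zero, so the target reduces to the immediate reward.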
Mnih et al. [23] report a significant improvement in RL by introducing Deep Q-Network (DQN). Instead of explicitly calculating the Q-values, DQN uses neural networks (NNs), which are parametrized by a weight function , as a function approximator to estimate the optimum Q-values: .
The dramatic improvement by DQN in learning performance is achieved mainly by introducing experience replay and TargetNet [23]. Also, ε-greedy exploration is used to effectively traverse state-action pairs.
V-B1 Experience Replay
It is known that the correlation among experiences causes fluctuations of the learning process. Experience Replay buffers the experiences and randomly takes samples from for the learning. This random sampling prevents DQN from undergoing fluctuation due to correlated experiences.
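A replay buffer of this kind can be sketched in a few lines. The class name, the tuple layout of an experience, and the capacity value are illustrative assumptions; the original text does not specify how the buffer is implemented.

```python
import random
from collections import deque


class ReplayBuffer:
    """Fixed-size store of experiences with uniform random sampling."""

    def __init__(self, capacity):
        # The oldest experiences are dropped automatically once capacity is reached.
        self.buffer = deque(maxlen=capacity)

    def add(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        # Sampling without replacement breaks the temporal order of experiences.
        pool = list(self.buffer)
        return random.sample(pool, min(batch_size, len(pool)))

    def __len__(self):
        return len(self.buffer)


replay = ReplayBuffer(capacity=3)
for step in range(5):
    replay.add(step, "a", 1.0, step + 1, False)
batch = replay.sample(2)
print(len(replay), batch)
```

Because a mini-batch is drawn uniformly from the buffer, consecutive experiences from the same episode are unlikely to be used together in one update.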
V-B2 TargetNet and EvalNet
The learning by DQN updates not only but also the target value , since involves the estimate of the Q-value. In order to stabilize the learning, it is proposed to use two separate NNs: one, named EvalNet, for the learning for each experience, and the other, named TargetNet, for calculating the target value . The weight function of TargetNet is periodically updated by copying the weight function of EvalNet.
V-B3 ε-greedy Exploration
The tradeoff between exploration and exploitation is one of the crucial challenges in RL. The ε-greedy exploration is a commonly used approach to address this challenge. In this ε-greedy approach, the agent follows the current best action known in a current state to reinforce the previous learning (exploitation) with probability 1 − ε. With probability ε, it tries a different action that can potentially return a better reward (exploration).
V-C Applying DQN to PR
In our problem, the agent tries to learn the optimum resource allocations to nonfunctional nodes. Therefore, the legal actions for our agent are selecting a subset of nonfunctional nodes. Here, we assume a situation where at most one node is recovered at a time step by setting . Therefore, the number of legal actions is always the number of 2-permutations of , , since the available resources are used to saturate the first node in the permutation and may be assigned to the second if remaining. Each state is represented as a boolean vector in which th element indicates the remaining demand of the corresponding node . The sum of utilities of the functional nodes is recognized as a reward of the state.
One of the biggest challenges in our problem is the size of the state space, which grows exponentially in the number of nodes. Even with a graph with 20 nodes, over 1 million (2^20) possible states exist, and the number of Q-values is . In order to improve the performance of exploration in such a huge state space, we take a random action among a set of legal actions with probability ε.
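The action set and the exploration rule described above can be sketched as follows. The sketch assumes that a legal action is an ordered pair of distinct nonfunctional nodes and that exploration follows the usual ε-greedy rule; the function names, the dictionary of Q-value estimates, and the example numbers are hypothetical and do not come from DeepPR itself.

```python
import random
from itertools import permutations


def legal_actions(nonfunctional):
    """Ordered pairs (first, second) of distinct nonfunctional nodes."""
    return list(permutations(sorted(nonfunctional), 2))


def epsilon_greedy(q_values, actions, epsilon, rng=random):
    """Pick a random legal action with probability epsilon, else the best-known one."""
    if rng.random() < epsilon:
        return rng.choice(actions)
    return max(actions, key=lambda a: q_values.get(a, 0.0))


actions = legal_actions({"a", "b", "c"})
estimates = {("b", "a"): 3.0, ("a", "c"): 1.5}
greedy_pick = epsilon_greedy(estimates, actions, epsilon=0.0)
random_pick = epsilon_greedy(estimates, actions, epsilon=1.0)
print(len(actions), greedy_pick, random_pick)
```

Restricting the random draw to legal actions keeps exploration inside the set of assignments that can actually be applied in the current state.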
VI Evaluations
Simulations are conducted with different topologies, node attributes, and resource amounts, as explained below. DeepPR is evaluated by comparing it with a few baseline algorithms, including the theoretical optimum and heuristics.
VI-A Simulation Settings
VI-A1 Network Topology
GNP random graphs [24] and the BT North America graph [25] are used as network topologies. Since our theoretical results indicate it is enough to test the algorithm performance in one-layered graphs with a single node in , a node in is randomly selected among the nodes in each graph. For GNP random graphs, we used the following ranges: , and . Note that only connected GNP random graphs are fed into our simulations. The BT North America graph is based on an IP backbone network with 36 nodes and 76 edges.
VI-A2 Node Attributes and Available Resource
The utility, demand, and resource values are randomly selected among the integers within given ranges. Here, the following setting is used: (utility range, demand range, resource amount available at each time step).
ViB Baseline Algorithms
DeepPR is compared with three baseline algorithms named DPOPT, RANDOM, and RATIO. DPOPT is the optimum networkwide utility calculated by a bottomup dynamic programming technique, which enables us to obtain the optimum until relatively larger graphs compared to simple enumerations. RANDOM is a heuristic algorithm that randomly selects one of the nonfunctional nodes adjacent to functional nodes. RATIO is a greedy heuristic algorithm inspired by the approximation algorithm of the set cover problem. This heuristic assigns resources to the most costeffective node among the nodes adjacent to functional nodes at each time step by calculating . The detailed explanation of both algorithms are available on our technical report.
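A greedy heuristic in the style of RATIO can be sketched as below. The exact cost-effectiveness measure is not reproduced in this text, so the sketch assumes the ratio of utility to demand; it also assumes a one-layered graph with a single always-functional node `root` and that the selected node receives resources until it is recovered. The function name and the example values are hypothetical.

```python
def ratio_order(adj, root, demand, utility):
    """Greedy order: repeatedly recover the frontier node with the largest utility/demand ratio."""
    functional = {root}
    remaining = set(adj) - functional
    order = []
    while remaining:
        # Candidates are nonfunctional nodes adjacent to a functional node.
        frontier = [v for v in remaining if any(nb in functional for nb in adj[v])]
        if not frontier:
            break  # the rest is unreachable from any functional node
        pick = max(sorted(frontier), key=lambda v: utility[v] / demand[v])
        order.append(pick)
        functional.add(pick)
        remaining.remove(pick)
    return order


adj = {"o": ["a", "b"], "a": ["o", "c"], "b": ["o"], "c": ["a"]}
demand = {"a": 2, "b": 1, "c": 1}
utility = {"a": 1, "b": 2, "c": 5}
greedy_order = ratio_order(adj, "o", demand, utility)
print(greedy_order)
```

In this example the heuristic recovers `b` first because of its high ratio, although `a` gives access to the most valuable node `c`; such short-sighted choices are one reason a greedy rule can fall short of the optimum.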
VI-C Results and Discussions
Fig. 4 illustrates a sample learning curve of DeepPR over episodes, which are alternating sequences of states and actions from the initial network state to the fully recovered state. This sample is obtained in a GNP graph with 19 nodes, and similar curves are also observed in other graphs. Since the NNs are randomly initialized, the initial Q-values do not reflect the actual rewards. Through the updates of Q-values during exploration, the NNs, which estimate the Q-values, are gradually trained to produce more accurate reward estimates. In the figure, the utility (total reward) that DeepPR achieves stays at approximately 725 until around the 250th episode, and after that, it keeps increasing towards around 900. Because of the exploration by random actions, utility values fluctuate during the entire training period. Note that each episode takes 1.057 seconds on average on a computer with a 2.5 GHz Intel Core i5 CPU, Intel HD Graphics 4000 (1536 MB), and 8 GB of memory.
Fig. 4 compares the four algorithms in terms of total utility in GNP random graphs. In smaller graphs, the utility obtained by DeepPR always matches the theoretical optimum (DPOPT). In theory, Q-learning is guaranteed to achieve the optimum by visiting each state-action pair an infinite number of times. Since it is easier to visit each state-action pair many times in graphs with fewer states and action choices, the estimates of the Q-values seem to converge to more accurate values, which leads to the optimum. In contrast, the difference between DPOPT and DeepPR becomes notable in some larger graphs for the same reason. Compared to RATIO, DeepPR performs slightly better in those larger graphs. RANDOM is the worst of the four methods over all graph sizes, and its performance keeps degrading as the graph size grows because the number of recovery choices increases.
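The convergence remark above concerns value estimates that are refined each time a state-action pair is visited. A minimal tabular sketch of such an update is given below, assuming the standard Q-learning rule; DeepPR itself estimates values with NNs rather than a table, and the learning rate `alpha` and discount factor `gamma` shown here are illustrative values, not those used in the simulations.

```python
from collections import defaultdict

def q_update(q, state, action, reward, next_state, next_actions,
             alpha=0.1, gamma=0.9):
    """One tabular Q-learning step.

    Moves Q(state, action) toward reward + gamma * max over next actions
    of Q(next_state, a). A terminal next state has no next actions, so its
    future value is treated as 0.
    """
    best_next = max((q[(next_state, a)] for a in next_actions), default=0.0)
    target = reward + gamma * best_next
    q[(state, action)] += alpha * (target - q[(state, action)])
    return q[(state, action)]

# Table of estimates, initialized to 0 for unseen state-action pairs.
q = defaultdict(float)
```

Each call changes only one table entry, which is why pairs that are visited rarely, as in large graphs, keep inaccurate estimates for longer.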
Fig. 4 shows the utility obtained by three algorithms in the BT North America graph. Here, DPOPT is not included since it is infeasible due to the number of nodes. A trend similar to that in the GNP graphs is also observed in this practical topology.
In general, RATIO performs well since it decides the recovery ordering based on cost-effectiveness, which is used in an approximation algorithm for the set cover problem. It is known that greedy choices based on cost-effectiveness can reach a nearly optimal solution in many cases. At the same time, the performance of DeepPR is quite interesting, considering the fact that the RL agent does not know the meaning of states, actions, and the condition for recovery.
VII Conclusion
This paper discusses a progressive recovery problem of interdependent networks to maximize the total available computation utility of the networks, where a limited amount of resources arrives in a time sequence. It is proved that the recovery problem with a general network topology always has an equivalent progressive recovery problem with a one-layered graph, which is much simpler but still NP-hard. In order to solve the intractable recovery problem, a deep reinforcement learning-based algorithm, DeepPR, is introduced by taking node state vectors and total utility over time as its states and discounted rewards, respectively. The simulation results indicate that it achieves 98.4% of the theoretical optimum in smaller GNP random networks.
References
 [1] S. G. Kulkarni, G. Liu, K. K. Ramakrishnan, M. Arumaithurai, T. Wood, and X. Fu, “Reinforce: Achieving efficient failure resiliency for network function virtualization based services,” in Proceedings of the 14th International Conference on Emerging Networking EXperiments and Technologies, CoNEXT ’18, (New York, NY, USA), pp. 41–53, ACM, 2018.
 [2] H. Rastegarfar, D. C. Kilper, M. Glick, and N. Peyghambarian, “Cyber-physical interdependency in dynamic software-defined optical transmission networks,” IEEE/OSA Journal of Optical Communications and Networking, vol. 7, pp. 1126–1134, Dec 2015.
 [3] J. Liu, Z. Jiang, N. Kato, O. Akashi, and A. Takahara, “Reliability evaluation for NFV deployment of future mobile broadband networks,” IEEE Wireless Communications, vol. 23, pp. 90–96, June 2016.
 [4] ETSI, “ETSI GS NFV-MAN 001 V1.1.1 (2014-12),” Retrieved on March 27, 2019.
 [5] J. Wang, C. Qiao, and H. Yu, “On progressive network recovery after a major disruption,” in 2011 Proceedings IEEE INFOCOM, pp. 1925–1933, April 2011.
 [6] M. Pourvali, K. Liang, F. Gu, H. Bai, K. Shaban, S. Khan, and N. Ghani, “Progressive recovery for network virtualization after large-scale disasters,” in 2016 International Conference on Computing, Networking and Communications (ICNC), pp. 1–5, Feb 2016.
 [7] S. Ciavarella, N. Bartolini, H. Khamfroush, and T. L. Porta, “Progressive damage assessment and network recovery after massive failures,” in IEEE INFOCOM 2017 - IEEE Conference on Computer Communications, pp. 1–9, May 2017.
 [8] Y. Zhao, M. Pithapur, and C. Qiao, “On progressive recovery in interdependent cyber physical systems,” in 2016 IEEE Global Communications Conference (GLOBECOM), pp. 1–6, Dec 2016.
 [9] A. Majdandzic, L. A. Braunstein, C. Curme, I. Vodenska, S. Levy-Carciente, H. Eugene Stanley, and S. Havlin, “Multiple tipping points and optimal repairing in interacting networks,” Nature Communications, vol. 7:10850, March 2016.
 [10] S. Ferdousi, F. Dikbiyik, M. Tornatore, and B. Mukherjee, “Progressive datacenter recovery over optical core networks after a large-scale disaster,” in 2016 12th International Conference on the Design of Reliable Communication Networks (DRCN), pp. 47–54, March 2016.
 [11] T. Pan, A. Kuhnle, X. Li, and M. Thai, “Vulnerability of interdependent networks with heterogeneous cascade models and timescales,” in 2018 IEEE 38th International Conference on Distributed Computing Systems (ICDCS), pp. 290–299, July 2018.
 [12] A. Tajer, M. Zohdy, and K. Alnajjar, “Resource allocation under sequential resource access,” IEEE Transactions on Communications, vol. 66, pp. 5608–5620, Nov 2018.
 [13] S. V. Buldyrev, R. Parshani, G. Paul, H. E. Stanley, and S. Havlin, “Catastrophic cascade of failures in interdependent networks,” Nature, vol. 464, pp. 1025–1028, Apr 2010.
 [14] D. H. Shin, D. Qian, and J. Zhang, “Cascading effects in interdependent networks,” IEEE Network, vol. 28, pp. 82–87, July 2014.
 [15] M. Stippinger and J. Kertész, “Enhancing resilience of interdependent networks by healing,” Physica A: Statistical Mechanics and its Applications, vol. 416, pp. 481–487, 2014.
 [16] L. K. Gallos and N. H. Fefferman, “Simple and efficient self-healing strategy for damaged complex networks,” Phys. Rev. E, vol. 92, p. 052806, Nov 2015.
 [17] A. Behfarnia and A. Eslami, “Error correction coding meets cyber-physical systems: Message-passing analysis of self-healing interdependent networks,” IEEE Transactions on Communications, vol. 65, pp. 2753–2768, July 2017.
 [18] A. Mazumder, C. Zhou, A. Das, and A. Sen, “Progressive recovery from failure in multi-layered interdependent network using a new model of interdependency,” in Critical Information Infrastructures Security (C. G. Panayiotou, G. Ellinas, E. Kyriakides, and M. M. Polycarpou, eds.), (Cham), pp. 368–380, Springer International Publishing, 2016.
 [19] E. E. Lee II, J. E. Mitchell, and W. A. Wallace, “Restoration of services in interdependent infrastructure systems: A network flows approach,” IEEE Transactions on Systems, Man, and Cybernetics, Part C (Applications and Reviews), vol. 37, pp. 1303–1317, Nov 2007.
 [20] A. Sen, A. Mazumder, J. Banerjee, A. Das, and R. Compton, “Identification of K most vulnerable nodes in multi-layered network using a new model of interdependency,” in 2014 IEEE Conference on Computer Communications Workshops (INFOCOM WKSHPS), pp. 831–836, April 2014.
 [21] D. T. Nguyen, Y. Shen, and M. T. Thai, “Detecting critical nodes in interdependent power networks for vulnerability assessment,” IEEE Transactions on Smart Grid, vol. 4, pp. 151–159, March 2013.
 [22] D. Bienstock, J. Sethuraman, and C. Ye, “Approximation algorithms for the incremental knapsack problem via disjunctive programming,” 2013.
 [23] V. Mnih, K. Kavukcuoglu, D. Silver, A. A. Rusu, J. Veness, M. G. Bellemare, A. Graves, M. Riedmiller, A. K. Fidjeland, G. Ostrovski, et al., “Human-level control through deep reinforcement learning,” Nature, vol. 518, no. 7540, p. 529, 2015.
 [24] NetworkX, “networkx.generators.random_graphs.gnp_random_graph.” https://networkx.github.io/, Retrieved on March 27, 2019.
 [25] The Internet Topology Zoo, “BT North America.” http://www.topologyzoo.org/dataset.html, Retrieved on March 27, 2019.