1 Introduction
In the modern era, the functionality of infrastructure systems is of significant importance in providing continuous services to communities and in supporting their public health and safety. Natural and anthropogenic hazards pose significant challenges to infrastructure systems and cause undesirable system malfunctions and consequences. Past experiences show that these malfunctions are not always inevitable despite design strategies like increasing system redundancy and reliability nach . Therefore, a sequential rational decisionmaking framework should enable malfunctioned systems to be restored in a timely manner after the hazards. Further, postevent stressors and chaotic circumstances, time limitations, budget and resource constraints, and complexities in the community recovery process, which are twinned with catastrophe, highlight the necessity for a comprehensive riskinformed decisionmaking framework for recovery management at the community level. A comprehensive decisionmaking framework must take into account indirect and delayed consequences of decisions (also called the posteffect property of decisions), which requires foresight or planning. Such a comprehensive decisionmaking system must also be able to handle largescale scheduling problems that encompass large combinatorial decision spaces to make the most rational plans at the community level.
Developing efficient computational methodologies for sequential decisionmaking problems has been a subject of significant interest kochenderfer ; bertsekasbook . In the context of civil engineering, several studies have utilized the framework of dynamic programming for management of bridges and pavement maintenance meidani ; frangopol ; jiang1 ; jiang2 ; Shaf . Typical methodological formulations employ principles of dynamic programming that utilize stateaction pairs. In this study, we develop a powerful and relatively unexplored methodological framework of formulating large infrastructure problems as stringactions, which will be described in Section 5.2. Our formulation does not require an explicit statespace model; therefore, it is shielded against the common problem of state explosion when such methodologies are employed. The sequential decisionmaking methodology presented here not only manages networklevel infrastructure but also considers the interconnectedness and cascading effects in the entire recovery process that have not been addressed in the past studies.
Dynamic programming formulations frequently suffer from the curse of dimensionality. This problem is further aggravated when we have to deal with large combinatorial decision spaces characteristic of community recovery. Therefore, using approximation techniques in conjunction with the dynamic programming formalism is essential. There are several approximation techniques available in the literature Bertsekasneuro ; kuhn ; medury ; busoniu . Here, we use a promising class of approximation techniques called rollout algorithms. We show how rollout algorithms blend naturally with our stringaction formulation. Together, they form a robust tool to overcome some of the limitations faced in the application of dynamic programming techniques to massive realworld problems. The proposed approach is able to handle the curse of dimensionality in its analysis and management of multistate, largescale infrastructure systems and data sources. The proposed methodology is also able to consider and improve the current recovery policies of responsible public and private entities within the community.
Among infrastructure systems, electrical power networks (EPNs) are particularly critical insofar as the functionality of most other networks, and critical facilities depend on EPN functionality and management. Hence, the method is illustrated in an application to recovery management of the modeled EPN in Gilroy, California following a severe earthquake. The illustrative example shows how the proposed approach can be implemented efficiently to identify nearoptimal recovery decisions. The computed nearoptimal decisions restored the EPN of Gilroy in a timely manner, for residential buildings as well as main food retailers, as an example of critical facilities that need electricity to support public health in the aftermath of hazards.
The remainder of this study is structured as follows. In Section 2, we introduce the background of system resilience and the system modeling used in this study. In Section 3, we introduce the case study used in this paper. In Section 4, we describe the earthquake modeling, fragility, and restoration assessments. In Section 5, we provide a mathematical formulation of our optimization problem. In Section 6, we describe the solution method to solve the optimization problem. In Section 7, we demonstrate the performance of the rollout algorithm with the stringaction formulation through multiple simulations. In Section 8, we present a brief conclusion of this research.
2 System Resilience
The term resilience is defined in a variety of ways. Generally speaking, resilience can be defined as “the ability to prepare for and adapt to changing conditions and withstand and recover rapidly from disruptions”directive2013critical . Hence, resilience of a community (or a system) is usually delineated with the measure of community functionality, shown by the vertical axis of Fig. 1 and four attributes of robustness, rapidity, redundancy, and resourcefulness bruneau1 . Fig. 1 illustrates the concept of functionality, which can be defined as the ability of a system to support its planned mission, for example, by providing electricity to people and facilities. The understanding of interdependencies among the components of a system is essential to quantify system functionality and resilience. These interdependencies produce cascading failures where a largescale cascade may be triggered by the malfunction of a single or few components buldyrev2010catastrophic . Further, they contribute to the recovery rate and difficulty of the entire recovery process of a system. Different factors affect the recovery rate of a system, among which modification before disruptive events (exante mitigations), different recovery policies (expost actions), and nature of the disruption are prominent barabadi2018post . Fig. 1 also highlights different sources of uncertainty that are associated with community functionality assessment and have remarkable impacts in different stages from prior to the event to the end of the recovery process. Therefore, any employed model to assess the recovery process should be able to consider the impacts of the influencing parameters.
In this study, the dependency of networks is modeled through an adjacency matrix , where indicates the magnitude of dependency between components i and j watts1998collective . In this general form, the adjacency matrix A
can be a timedependent stochastic matrix to capture the uncertainties in the dependencies and probable timedependent variations.
According to the literature, the resilience index for each system is defined by the following equation bruneau1 ; cimellaro2014community :
(1) 
where is the functionality of a system at time t, is the control time of the system, and is the time of occurrence of event e, as shown in Fig. 1.
3 Description of Case Study
In the case study of this paper, the community in Gilroy, California, USA is used as an example to illustrate the proposed approach. Gilroy is located approximately 50 kilometers (km) south of the city of San Jose with a population of 48,821 at the time of the 2010 census (see Fig. 2) Gilroy1 . The study area is divided into 36 gridded rectangles to define the community and encompasses 41.9 km^{2} area of Gilroy. In this study, we do not cover all the characteristics of Gilroy; however, the adopted model has a resolution that is sufficient to study the methodology at the community level under hazard events.
Gilroy contains 13 food retailers, but six main retailers, each of which has more than 100 employees, provide the main food requirements of Gilroy inhabitants Gilroy2 , as shown in Fig. 3 and summarized in Table 1.
Food Retailer  Walmart  Costco  Target  Mi Pueblo Food  Nob Hill Foods  Safeway 

Number of Employees  395  220  130  106  100  130 
To assign the probabilities of shopping activity to each urban grid rectangle, the gravity model Popgen is used. The gravity model identifies the shopping location probabilistically, given the location of residences. These probabilities are assigned to be proportional to food retailersﬂ capacities and inversely corresponding to retailersﬂ distances from centers of urban grid rectangles. Consequently, distant small locations are less likely to be selected than close large locations.
If the center of an urban grid is , then food retailer is chosen according to the following distribution Popgen :
(2) 
where is the capacity of food retailer , determined by Table 1, is a negative constant, and is the travel time from urban grid rectangle to food retailer . Googleﬂs Distance Matrix API was called from within R by using the ggmap package ggmap to provide distances and travel times for the assumed transportation mode of driving.
Fig. 4 depicts the EPN components, located within the defined boundary. Llagas power substation, the main source of power in the defined boundary, is supplied by an 115 kV transmission line. Distribution line components are positioned at 100 m and modeled from the substation to the urban grids centers, food retailers, and water network facilities. In this study, the modeled EPN has 327 components.
4 Hazard and Damage Assessment
4.1 Earthquake Simulation
The seismicity of the Gilroy region of California is mainly controlled by the San Andreas Fault (SAF), which caused numerous destructive earthquakes like the Loma Prieta earthquake Loma1
. The spatial estimation of groundmotion amplitudes from earthquakes is an essential element of risk assessment, typically characterized by groundmotion prediction equations (GMPEs). GMPEs require several parameters, such as earthquake magnitude
, fault properties (), soil conditions (i.e., the average shearwave velocity in the top 30 m of soil, ), and epicentral distances () to compute the seismic intensity measure () at any point. Modern GMPEs typically take the form(3) 
where and reflect the intraevent (within event) and interevent (eventtoevent) uncertainty respectively jayaram . In this study, the GMPE proposed by Ambrahamson et al. abrahamson2013update is used, and a ground motion similar to the Loma Prieta earthquake, one of the most devastating hazards that Gilroy has experienced Loma1 , with epicenter approximately 12 km of Gilroy downtown on the SAF projection is simulated. Figs. 5 and 6 show the map of and ground motion field for Peak Ground Acceleration (PGA), respectively.
4.2 Fragility Function and Restoration
In the event of an earthquake, the relations between groundmotion intensities and earthquake damage are pivotal elements in the loss estimation and the risk analysis of a community. Fragility curves describe the probability of experiencing or exceeding a particular level of damage as a function of hazard intensity. It is customary to model component fragilities with lognormal distributions kennedy1984seismic . The conditional probability of being in or exceeding a particular damage state (), conditioned on a particular level of intensity measure , is defined by
(4) 
where
is the standard normal distribution;
andare the mean and standard deviation of
. The fragility curves can be obtained based on a) postearthquake damage evaluation data (empirical curves) emp1 b) structural modeling (analytical curves) anl1 c) expert opinions (heuristics curves) hu1 . In the present study, the seismic fragility curves included in hazus ; xie are used for illustration.To restore a network, a number of available resource units, , as a generic single number including equipment, replacement components, and repair crews are considered for assignment to damaged components, and each damaged component is assumed to require only one unit of resource ouyang . The restoration times based on the level of damage, used in this study, are presented in Table 2, based on hazus ; saeedLA .
Damage States  

Component  Undamaged  Minor  Moderate  Extensive  Complete 
Electric substation  0  1  3  7  30 
Transmission line component  0  0.5  1  1  2 
Distribution line component  0  0.5  1  1  1 
5 Optimization Problem Description
5.1 Introduction
After an earthquake event occurs, each EPN component ends up in one of the damage states as shown in Table 2. Let the total number of damaged components be . Note that . Both and are nonnegative integers. Also, in this study, . This assumption is justified by the availability of limited resources with the planner where large number of components are damaged in the aftermath of a severe hazard.
A decision maker or planner has the task of assigning units of resources to these damaged components. A decision maker has a heuristic or expert policy on the basis of which he can make his decisions to optimize multiple objectives. The precise nature of the objective of the planner can vary, which will be described in detail in Section 5.2
. Particularly, at the first decision epoch, the decision maker or a resource planner deploys
unit of resources at out of damaged components. Each unit of resource is assigned to a distinct damaged component. At every subsequent decision epoch, the planner must have an option of reassigning some or all of the resources to new locations based on his heuristics and objectives. He must have the flexibility of such a reassignment even if the repair work at the currently assigned locations is not complete. At every decision epoch, it is possible to forestall the reassignment of the units of resource that have not completed the repair work; however, we choose to solve the more general problem of preemptive assignment, where nonpreemption at few or all the locations is a special case of our problem. The preemptive assignment problem is a richer decision problem than the nonpreemptive case in the sense that the process of optimizing the decision actions is a more complex task because the size of the decision space is bigger.In this study, we assume that the outcome of the decisions is fully predictable emi ; iEMSs . We are preparing a separate study to address the relaxation of this assumption, i.e., when outcomes of decisions exhibit uncertainty. The modified methods to deal with the stochastic problem will form a part of a separate paper ieeeconf .
We improve upon the solutions offered by heuristics of the planner by formulating our optimization problem as a dynamic program and solving it using the rollout approach.
5.2 Optimization Problem Formulation
Suppose that the decision maker starts making decisions and assigning repair locations to different units of resource. The number of such nontrivial decisions to be made is less than or equal to . When becomes less than or equal to (because of sequential application of repair actions to the damaged components), the assignment of units of resource becomes a trivial problem in our setting because each unit can simply be assigned one to one, in any order, to the damaged components. Consequently, a strict optimal assignment can be achieved in the trivial case. The size of this trivial assignment problem reduces by one for every new decision epoch until all the remaining damaged components are repaired. The additional units of resources retire because deploying more than one unit of resource to the same location does not decrease the repair time associated with that damaged component. Henceforth, we focus on the nontrivial assignment problem.
Let the variable denote the decision epoch, and let be the set of all damaged components before a repair action is performed. Let denote the decision epoch at which repair action is selected so that . Note that . Let represent the string of actions owing to the nontrivial assignment. We say that a repair action is completed when at least 1 out of the N damaged components is repaired. Let be the powerset of . Let,
(5) 
so that . Let be the set of all repaired components after the repair action is completed. Note that , where , and the decisionmaking problem moves into the trivial assignment problem previously discussed.
We wish to calculate a string of repair actions that optimizes our objective functions . We deal with two objective functions in this study denoted by mapping and .

Objective 1: Let the variable represent the population of Gilroy and represent a constant threshold. Let be the string of repair actions that results in restoration of electricity to number of people. Here, where is the number of damaged component at the decision epoch. Let represent the time required to restore electricity to number of people as a result of repair actions . Formally,
(6) Objective 1 is to compute the optimal solution given by
(7) We explain the precise meaning of restoration of electricity to people in more detail in Section 7.1. To sum up, in objective 1, our aim is to find a string of actions that minimizes the number of days needed to restore electricity to a certain fraction () of the total population of Gilroy.

Objective 2: We define the mapping in terms of number of people who have electricity per unit of time; our objective is to maximize this mapping over a string of repair actions. Let the variable denote the total time elapsed between the completion of repair action and , in which is the time elapsed between the start and completion of repair action . Let be the total number of people who have electricity after the repair action is complete. Then,
(8) We are interested in the optimal solution given by
(9)
Note that our objective function in the second case mimics the resilience index and can be interpreted in terms of (1). Particularly, the integral in (1) is replaced by a sum because of discrete decision epochs, is replaced by the product , is analogous to , and the integral limits are changed to represent the discrete decision epochs.
6 Optimization Problem Solution
Calculating or is a sequential optimization problem. The decision maker applies the repair action at the decision epoch to maximize or minimize a cumulative objective function. The string of actions, as represented in or , are an outcome of this sequential decisionmaking process. This is particularly relevant in the context of dynamic programming where numerous solution techniques are available for the sequential optimization problem. Rollout is one such method that originated in dynamic programming. It is possible to use the dynamic programming formalism to describe the method of rollout, but here we accomplish this by starting from first principles Bertsekas2013 . We will draw comparisons between rollout with first principles and rollout in dynamic programming at suitable junctions. The description of the rollout algorithm is inherently tied with the notion of approximate dynamic programming.
6.1 Approximate Dynamic Programming
Let’s focus our attention on objective 1. The extension of this methodology to objective 2 is straightforward; we need to adapt notation used for objective 2 in the methodology presented below, and a maximization problem replaces a minimization problem. Recall that we are interested in the optimal solution given by (7). This can be calculated in the following manner:
First calculate as follows:
(10) 
where the function is defined by
(11) 
Next, calculate as:
(12) 
where the function is defined by
(13) 
Similarly, we calculate the solution as follows:
(14) 
where the function is defined by
(15) 
The functions are called the optimal costtogo functions and are defined by the following recursion:
(16) 
where the boundary condition is given by:
(17) 
Note that is a standard notation used to represent costtogo functions in the dynamic programming literature.
The approach discussed above to calculate the optimal solutions is typical of the dynamic programming formulation. However, except for very special problems, such a formulation cannot be solved exactly because calculating and storing the optimal costtogo functions can be numerically intensive. Particularly, for our problem, let ; then the storage of requires a table of size
(18) 
where for objective 1, and for objective 2. In the dynamic programming literature, this is called as the curse of dimensionality. If we consider objective 2 and wish to calculate such that (we assume for the sake of this example that only a single damaged component is repaired at each ), then for 50 damaged components and 10 unit of resources, . In practice, in (14) is replaced by an approximation denoted by . In the literature, is called as a scoring function or approximate costtogo function Bertsekas1997 . One way to calculate is with the aid of a heuristic; there are several ways to approximate that do not utilize heuristic algorithms. All such approximation methods fall under the category of approximate dynamic programming.
The method of rollout utilizes a heuristic in the approximation process. We provide a more detailed discussion on the heuristic in Section 6.2. Suppose that a heuristic is used to approximate the minimization in (15), and let denote the corresponding approximate optimal value; then rollout yields the suboptimal solution by replacing with in (14):
(19) 
The heuristic used in the rollout algorithm is usually termed as the base heuristic. In many practical problems, rollout results in a significant improvement over the underlying base heuristic to solve the approximate dynamic programming problem Bertsekas1997 .
6.2 Rollout Algorithm
It is possible to define the base heuristic in several ways:

The current recovery policy of regionally responsible public and private entities,

The importance analyses that prioritize the importance of components based on the considered importance factors limnios ,

A random policy without any preassumption,
The rollout method described in Section 6.1, using first principles and stringaction formulation, for a discrete, deterministic, and sequential optimization problem has interpretations in terms of the policy iteration algorithm in dynamic programming. The policy iteration algorithm (see howard for the details of the policy iteration algorithm including the definition of policy in the dynamic programming sense) computes an improved policy (policy improvement step), given a base policy (stationary), by evaluating the performance of the base policy. The policy evaluation step is typically performed through simulations ieeeconf . Rollout policy can be viewed as the improved policy calculated using the policy iteration algorithm after a single iteration of the policy improvement step. For a discrete and deterministic optimization problem, the base policy used in the policy iteration algorithm is equivalent to the base heuristic, and the rollout policy consists of the repeated application of this heuristic. This approach was used by the authors in Bertsekas1997 where they provide performance guarantees on the basic rollout approach and discuss variations to the rollout algorithm. Henceforth, for our purposes, base policy and base heuristic will be considered indistinguishable.
On a historical note, the term rollout was first coined by Tesauro in reference to creating computer programs that play backgammon Tesauro . An approach similar to rollout was also shown much earlier in abramson .
Ideally, we would like the rollout method to never perform worse than the underlying base heuristic (guarantee performance). This is possible under each of the following three cases Bertsekas1997 :

The rollout method is terminating (called as optimized rollout).

The rollout method utilizes a base heuristic that is sequentially consistent (called as rollout).

The rollout method is terminating and utilizes a base heuristic that is sequentially improving (extended rollout and fortified rollout).
A sequentially consistent heuristic guarantees that the rollout method is terminating. It also guarantees that the base heuristic is sequentially improving. Therefore, 3 and 1 are the special cases of 2 with a less restrictive property imposed on the base heuristic (that of sequential improvement or termination). When the base heuristic is sequentially consistent, the fortified and extended rollout method are the same as the rollout method.
A heuristic must posses the property of termination to be used as a base heuristic in the rollout method. Even if the base heuristic is terminating, the rollout method need not be terminating. Apart from the sequential consistency of the base heuristic, the rollout method is guaranteed to be terminating if it is applied on problems that exhibit special structure. Our problem exhibits such a structure. In particular, a finite number of damaged components in our problem are equivalent to the finite node set in Bertsekas1997 . Therefore, the rollout method in this study is terminating. In such a scenario, we could use the optimized rollout algorithm to guarantee performance without putting any restriction on the base heuristic to be used in the proposed formulation; however, a wiser base heuristic can potentially enhance further the computed rollout policy. Nevertheless, our problem does not require any special structure on the base heuristic for the rollout method to be sequentially improving, which is justified later in this section.
In the terminology of dynamic programming, a base heuristic that admits sequential consistency is analogous to the Markov or stationary policy. Similarly, the terminating rollout method defines a rollout policy that is stationary.
Two different base heuristics are considered in this study. The first base heuristic is a random heuristic denoted by . The idea behind consideration of this heuristic is that in actuality there are cases where there is no thoughtout strategy or the computation of such a scheme is computationally expensive. We will show though simulations that the rollout formulation can accept a random base policy at the community level from a decision maker and improve it significantly. The second base heuristic is called a smart heuristic because it is based on the importance of components and expert judgment, denoted by . The importance factors used in prioritizing the importance of the components can accommodate the contribution of each component in the network. This base heuristic is similar in spirit to the items 2 and 5 listed above. More description on the assignment of units of resources based on and is described in Section 7.1.1. We also argue there that and are sequentially consistent. Therefore, in this study, and for our choice of heuristics, the extended, fortified, and rollout method are equivalent.
Let be any heuristic algorithm; the state of this algorithm at the first decision epoch is , where . Similarly, the state of the algorithm at the ^{th} decision epoch is the solution given by , i.e., the algorithm generates the path of the states . Note that is the dummy initial state of the algorithm . The algorithm terminates when for objective 1, and for objective 2. Henceforth, in this section, we consider only objective 1 without any loss of generality. Let denote the costtogo starting from the solution, generated by applying (i.e., is used to evaluate the costtogo). The costtogo associated with the algorithm is equal to the terminal reward, i.e., . Therefore, we have: . We use this heuristic costtogo in (14) to find an approximate solution to our problem. This approximation algorithm is termed as “Rollout on ” () owing to its structure that is similar to the approximate dynamic programming approach rollout. The algorithm generates the path of the states as follows:
(20) 
where, , and
(21) 
The algorithm is sequentially improving with respect to and outperforms (see shankar2014 for the details of the proof).
The algorithm described above is termed as onestep lookahead approach because the repair action at any decision epoch (current step) is optimized by minimizing the costtogo given the repair action at (see (20)). It is possible to generalize this approach to incorporate multistep lookahead. Suppose that we optimize the repair actions at any decision epoch and (current and the next step combined) by minimizing the costtogo given the repair actions for the current and next steps. This can be viewed as a twostep lookahead approach. Note the similarity of this approach with the dynamic programming formulation from first principles in Section 6.1, except for the difficulty of estimating the costtogo values exactly. Also, note that a twostep lookahead approach is computationally more intensive than the onestep approach. In principle, it is possible to extend it to step size , where . However, as increases, the computational complexity of the algorithm increases exponentially. Particularly, when is selected equal to at the first decision epoch, the algorithm finds the exact optimal solution by exhaustively searching through all possible combinations of repair action at each , with computational complexity . Also, note that provides a tighter upper bound on the optimal objective value compared to the bound obtained from the original heuristic approach.
7 Results
7.1 Discussion
We show simulation results for two different cases. In Case 1, we assume that people have electricity when their household units have electricity. Recall that the city is divided into different gridded rectangles according to population heat maps (Fig. 2), and different components of the EPN network serving these grids are depicted in Fig. 4. The entire population living in a particular gridded rectangle will not have electricity until all the EPN components serving that grid are either undamaged or repaired posthazard (functional). Conversely, if the EPN components serving a particular gridded rectangle are functional, all household units in that gridded rectangle are assumed to have electricity.
In Case 2, along with household units, we incorporate food retailers into the analysis. We say that people have the benefit of electric utility only when the EPN components serving their gridded rectangles are functional, and they go to a food retailer that is functional. A food retailer is functional (in the electric utility sense) when all the EPN components serving the retailer are functional. The mapping of number of people who access a particular food retailer is done at each urban grid rectangle and follows the gravity model explained in Section 3.
In both the cases, the probability that a critical facility like a food retailer or an urban grid rectangle has electricity is
(22) 
where is the minimum number of EPN components required to supply electricity to , is the event that has electricity, and is the event that the EPN component has electricity. The sample space is a singleton set that has the outcome, “has electricity.”
For all the simulation results provided henceforth, the number of units of resource available with the planner is fixed at 10.
7.1.1 Case 1: Repair Action Optimization of EPN for Household Units
The search space undergoes a combinatorial explosion for modest values of and , at each , until few decision epochs before moving into the trivial assignment problem, where the value of is small. Because of the combinatorial nature of the assignment problem, we would like to reduce the search space for the rollout algorithm, at each , without sacrificing on the performance. Because we consider EPN for only household units in this section, it is possible to illustrate techniques, to reduce the size of the search space for our rollout algorithm, that provide a good insight into formulating such methods for other similar problems. We present two representative methods to deal with the combinatorial explosion of the search space, namely, 1step heuristic and Nstep heuristic. Note that these heuristics are not the same as the base heuristic or .
Before we describe the 1step and Nstep heuristic, we digress to discuss and .
Both, and , have a preordained method of assigning units of resources to the damaged locations. This order of assignment remains fixed at each . In , this order is decided randomly; while in , it is decided based on importance factors. Let’s illustrate this further with the help of an example. Suppose that we name each of the components of the EPN with serial numbers from 1 to 327 as shown partially in Fig. 7; the assignment of these numbers to the EPN components is based on and remains fixed at each , where a damaged component with a lower number is always assigned unit of resource before a damaged component with a higher number, based on the availability of units of resource. Therefore, the serial numbers depict the preordained priority assigned to components that is decided before decisionmaking starts. E.g., if the component number 21 and 22 are both damaged, the decision maker will assign one unit of resource to component 21 first and then schedule repair of component 22, contingent on availability of resources. Such a fixed predecided assignment of unit of resource by heuristic algorithm and matches the definition of a consistent path generation in Bertsekas1997 . Therefore, and are sequentially consistent. Note that the assignment of numbers 1 to 327 in Fig. 7 is assumed only for illustration purposes; the rollout method can incorporate a different preordained order defined by and .
We now discuss the 1step and Nstep heuristic. In Fig. 7, note that each successive EPN component (labeled 1327), is dependent upon the prior EPN component for electricity. E.g., component 227 is dependent upon component 50. Similarly, component 55 is dependent upon component 50. The components in the branch 5357 and 225231 depend upon the component 52 for electricity. We exploit this serial nature of an EPN by representing the EPN network as a tree structure as shown in Fig. 8. Each number in the EPN tree represents an EPN component; each node represents a group of components determined by the label of the node, and the arcs of the tree capture the dependence of the nodes.
If the number of damaged components in the root node of the EPN tree is greater than , then it would be unwise to assign a unit of resource at the first decision epoch to the fringe nodes of our EPN tree because we do not get any benefit until we repair damaged components in the root node. As soon as the number of damaged components in the root node of the EPN tree becomes less than , only then we explore the assignment problem at other levels of the EPN tree.
1step Heuristic: We increase the pool of candidate damaged components, where the assignment of units of resources must be considered, to all the damaged components of the next level of the EPN tree if and only if the number of damaged components at the current level of the EPN tree is less than . Even after considering the next level of the EPN tree, if the number of damaged components is less than , we take one more step and account for all damaged components two levels below the current level. We repeat this until the pool of candidate damaged components is greater than or equal to , or the levels of EPN tree are exhausted.
Nstep Heuristic: Note that it might be possible to ignore few nodes at each level of the EPN tree and assign units of resources to only some promising nodes. This is achieved in the Nstep heuristic (here N in Nstep is not same as number of workers). Specifically, if the number of the damaged components at the current level of the EPN tree is less than , then the algorithm searches for a node at the next level that has the least number of damaged components, adds these damaged components to the pool of damaged components, and checks if the total number of damaged components at all explored levels is less than . If the total number of candidate damaged components is still less than , the previous process is repeated at unexplored nodes until the pool of damaged components is greater than or equal to , or the levels of the EPN tree are exhausted. Essentially, we do not consider the set () of all damaged components, at each , but only a subset of denoted by (1step heuristic) and (Nstep heuristic).
We simulate multiple damage scenarios following an earthquake with the models discussed previously in Section 4. On average, the total number of damaged components in any scenario exceeds 60%.
We show the performance of in Fig. 9. The faint lines depict plots of EPN recovery for multiple scenarios when is used for decision making. Here the objective pursued by the decision maker is objective 2. The black line shows the mean of all the recovery trajectories, and the red lines show the standard deviation. Henceforth, in various plots, instead of plotting the recovery trajectories for all the scenarios, we compare the mean of the different trajectories.
Fig. 10 shows the performance of our algorithm with respect to . Our simulation results demonstrate significant improvement over algorithm when is used, for both the 1step case and the Nstep case. Another result is the performance shown by the 1step heuristic with respect to the Nstep heuristic. Even though the Nstep heuristic skips some of the nodes at each level in EPN tree, to accommodate more promising nodes into the search process, the performance improvement shown is minimal. Even though all the damaged components are not used to define the search space of the rollout algorithm, only a small subset is chosen with the use of either 1step and Nstep heuristic (limited EPN tree search), the improvement shown by over is significant. This is because pruning the search space of rollout algorithm using a subset of (restricting an exhaustive search), is only a small part of the entire rollout algorithm. Further explanation for such a behavior is suitably explained in Section 7.1.2.
Fig. 11 shows the histogram of values of for multiple scenarios, as a result of application of stringactions computed using and (1step and Nstep heuristic). The rollout algorithms show substantial improvement over , for our problem.
Fig. 12 shows simulation results for multiple scenarios when objective 2 is optimized, but when is considered instead of . This simulation study highlights interesting behavior exhibited by the rollout algorithm. In the initial phase of decision making, the algorithm competes with the rollout algorithm , slightly outperforming the rollout algorithm in many instances. However, after a period of 10 days, rollout (both 1step and Nstep heuristic) comprehensively outperforms . Because rollout has the lookahead property, it offers conservative repair decisions initially (despite staying competitive with ), in anticipation of overcoming the loss suffered due to initial conservative decisions. Optimizing with foresight is an emergent behavior exhibited by our optimization methodology, which can offer significant advantages in critical decisionmaking scenarios.
7.1.2 Case 2: Repair Action Optimization of EPN for Household Units & Retailers
It is difficult to come up with techniques similar to 1step and Nstep heuristic to reduce the size of the search space for objective 1 and objective 2, when multiple networks are considered simultaneously in the analysis. This is because any such technique would have to simultaneously balance the pruning of candidate damaged components in Fig. 8 (to form subsets like and ), serving both food retailers and household units. There is no natural/simple way of achieving this as in the case of the 1step and Nstep heuristics where only household units were considered. Whenever we consider complex objective functions and interaction between networks, it is difficult to prune the action space in a physically meaningful way just by applying heuristics. For our case, any such heuristic will have to incorporate the gravity model explained in Section 3. The heuristic must also consider the network topology and actual physical position of important EPN components within the network. As previously seen, our methodology works well even if we select only a small subset of ( and ) to construct to avoid huge computational costs. This is because our methodology leverages the use of the onestep lookahead property with consistent and sequential improvement over algorithm or , to overcome any degradation in performance as a result of choosing .
In Figs. 13 and 14, is used as the base heuristic. This base heuristic is the same as the one used in the simulations shown in Case 1, which is not particularly well tuned for Case 2. Despite this, shows a stark improvement over . Fig. 13 shows that the rollout algorithm, with the random selection of candidate damaged components (), significantly outperforms for objective 1. When we select the candidate damaged locations randomly, in addition to the randomly selected damaged components, we add to the set the damaged components selected by at each . For the rollout algorithm, the mean number of days, over multiple damage scenarios, to provide electricity to times the total number of people is approximately 8 days, whereas for the base heuristic it is approximately 30. Similarly, Fig. 14 shows that for objective 2 the benefit of EPN recovery per unit of time with rollout is significantly better than .
Figs. 15 and 16 provide the synopsis of the results for both the objectives when is considered. In Fig. 15, the number of days to reach a threshold as a result of algorithm is better than . However, note that the number of days to achieve objective 1 is still fewer using the rollout algorithm. The key inference from this observation is that rollout might not always significantly outperform the base heuristic but will never perform worse than the underlying heuristic.
For simulations in Figs. 15 and 16, the candidate damaged locations in defining the search space for the rollout algorithm are again chosen randomly and are a subset of . As in the case of simulations in Fig. 13, we add damaged components selected by to the set . Note that the number of days required to restore electricity to 80% of people in Fig. 13 is a day less than that required in Fig. 15 despite performing rollout on a random base heuristic instead of the smart base heuristic used in the later. This can be attributed to 3 reasons: a) The damaged components in the set are chosen randomly for each simulation case. b) was designed for the simulations in Section. 7.1.1 and is not particularly well tuned for simulations when both household units and retailers are considered simultaneously. c) is used in simulations in Fig. 15 to approximate the costtogo function whereas is used in simulations in Fig. 13 for the approximation of the scoring function.
7.2 Computational Efforts
We provide a brief description of the computational efforts undertaken to optimize our decisionmaking problem. We have simulated multiple damage scenarios for each simulation result provided in this work. The solution methodology was implemented in MATLAB. The rollout algorithm gives multiple calls to the base heuristic function and searches for stringactions over the set , , or . This is akin to giving millions of calls to the simulator. Therefore, our implementation of the code had to achieve a run time on the order of few micro seconds () so that millions of calls to the simulator are possible. A single call to the simulator will be defined as evaluating some repair action , , or and completing the rollout process until all the damaged components are repaired (complete rollout Bertsekasconst ). Despite the use of best software practices, to mitigate large action spaces by coding a fast simulator, it is imperative to match it with good hardware capability for simulations of significant size. It is possible to parallelize the simulation process to fully exploit modern day hardware capabilities. We ran our simulations on the Summit super computer (see Section 9). Specifically, (100/376) Poweredge C6320 Haswell nodes were used, where each node has 2x e52680v3 (2400/9024) cores. In Case 1, we never encountered any or exceeding for any damage scenario, whereas for case 2 we limited the to at most , for all damage scenarios. On this system, with a simulator call run time of 1 , we managed to run simulator calls per node for different simulation plots. Our computational efforts emphasize that for massive combinatorial decisionmaking problems, it is possible to achieve nearoptimal performance with our methodology by harnessing powerful present day computational systems and implementing sound software practices.
8 Concluding Remarks
In this study, we proposed an optimization formulation based on the method of rollout, for the identification of nearoptimal community recovery actions, following a disaster. Our methodology utilizes approximate dynamic programming algorithms along with heuristics to overcome the curse of dimensionality in its analysis and management of largescale infrastructure systems. We have shown that the proposed approach can be implemented efficiently to identify nearoptimal recovery decisions following a severe earthquake for the electrical power serving Gilroy. Different base heuristics, used at the communitylevel recovery, are used in the simulation studies. The intended methodology and the rollout policies significantly enhanced these base heuristics. Further, the efficient performance of the rollout formulation in optimizing different common objective functions for community resilience is demonstrated. We believe that the methodology is adaptable to other infrastructure systems and hazards.
9 Acknowledgement
“The research herein has been funded by the National Science Foundation under Grant CMMI1638284. This support is gratefully acknowledged. Any opinions, findings, conclusions, or recommendations presented in this material are solely those of the authors and do not necessarily reflect the views of the National Science Foundation.”
“This work utilized the RMACC Summit supercomputer, which is supported by the National Science Foundation (awards ACI1532235 and ACI1532236), the University of Colorado Boulder and Colorado State University. The RMACC Summit supercomputer is a joint effort of the University of Colorado Boulder and Colorado State University.”
10 References
References
 (1) J. A. Nachlas, Reliability Engineering: Probabilistic Models and Maintenance Methods, CRC Press, 2017.
 (2) M. J. Kochenderfer, Decision Making Under Uncertainty: Theory and Application, MIT Press, 2015.
 (3) D. P. Bertsekas, Dynamic Programming and Optimal Control, Vol. 1, Athena Sci. Belmont, MA, 1995.

(4)
H. Meidani, R. Ghanem, Random Markov decision processes for sustainable infrastructure systems, Struct. and Infrastruct. Eng. 11 (5) (2015) 655–667.
 (5) D. M. Frangopol, M. J. Kallen, J. M. Van Noortwijk, Probabilistic models for lifecycle performance of deteriorating structures: review and future directions, Prog. in Struct. Eng. and Mater. 6 (4) (2004) 197–212.
 (6) H. Ellis, M. Jiang, R. B. Corotis, Inspection, Maintenance, and Repair with Partial Observability, J. of Infrastruct. Syst. 1 (2) (1995) 92–99.
 (7) R. B. Corotis, J. Hugh Ellis, M. Jiang, Modeling of riskbased inspection, maintenance and lifecycle cost with partially observable Markov decision processes, Struct. and Infrastruct. Eng. 1 (1) (2005) 75–84.
 (8) E. Fereshtehnejad, A. Shafieezadeh, A randomized pointbased value iteration POMDP enhanced with a counting process technique for optimal management of multistate multielement systems, Struct. Saf. 65 (2017) 113–125.
 (9) D. P. Bertsekas, J. N. Tsitsiklis, NeuroDynamic Programming, 1st Edition, Athena Sci., 1996.
 (10) K. D. Kuhn, NetworkLevel Infrastructure Management Using Approximate Dynamic Programming, J. of Infrastruct. Syst. 16 (2) (2009) 103–111.
 (11) A. Medury, S. Madanat, Incorporating network considerations into pavement management systems: A case for approximate dynamic programming, Transp. Res. Part C: Emerg. Technol. 33 (2013) 134–150.

(12)
L. Busoniu, R. Babuska, B. De Schutter, D. Ernst, Reinforcement Learning and Dynamic Programming Using Function Approximators, Vol. 39, CRC Press, 2010.
 (13) Office of the Press Secretary, The White House, Presidential Policy Directive – Critical Infrastructure Security and Resilience (Feb 2013).
 (14) M. Bruneau, S. E. Chang, R. T. Eguchi, G. C. Lee, T. D. O’Rourke, A. M. Reinhorn, M. Shinozuka, K. Tierney, W. A. Wallace, D. Von Winterfeldt, A Framework to Quantitatively Assess and Enhance the Seismic Resilience of Communities, Earthq. Spectra 19 (4) (2003) 733–752.
 (15) S. V. Buldyrev, R. Parshani, G. Paul, H. E. Stanley, S. Havlin, Catastrophic cascade of failures in interdependent networks, Nat. 464 (7291) (2010) 1025.
 (16) A. Barabadi, Y. Z. Ayele, Postdisaster infrastructure recovery: Prediction of recovery rate using historical data, Reliab. Eng. & Syst. Saf. 169 (2018) 209–223.
 (17) D. J. Watts, S. H. Strogatz, Collective dynamics of ‘smallworld’ networks, Nat. 393 (6684) (1998) 440.
 (18) T. McAllister, Research Needs for Developing a RiskInformed Methodology for Community Resilience, J. of Struct. Eng. 142 (8).
 (19) G. P. Cimellaro, D. Solari, V. Arcidiacono, C. S. Renschler, A. Reinhorn, M. Bruneau, Community resilience assessment integrating network interdependencies, Earthq. Eng. Res. Inst., 2014. doi:10.4231/D3930NV8W.

(20)
The
Association of Bay Area Governments (City of Gilroy Annex) (2011).
URL http://resilience.abag.ca.gov/wpcontent/documents/2010LHMP/GilroyAnnex2011.pdf 
(21)
M. Harnish,
20152023
HOUSING ELEMENT POLICY DOCUMENT AND BACKGROUND REPORT (Dec 2014).
URL http://www.gilroy2040.com/wpcontent/uploads/2015/03/GilHE_Adopted_HE_Compiled_web.pdf 
(22)
A. Adigaa, A. Agashea, S. Arifuzzamana, C. L. Barretta, R. Beckmana,
K. Bisseta, J. Chena, Y. Chungbaeka, S. Eubanka, E. Guptaa, M. Khana, C. J.
Kuhlmana, H. S. Mortveita, E. Nordberga, C. Riversa, P. Stretza, S. Swarupa,
A. Wilsona, D. Xiea,
Generating
a synthetic population of the United States, Tech. rep. (Jan 2015).
URL http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.712.1618&rep=rep1&type=pdf 
(23)
D. Kahle, H. Wickham,
ggmap:
Spatial Visualization with ggplot2, The R J. 5 (2013) 144––162.
URL http://journal.rproject.org/archive/20131/kahlewickham.pdf  (24) National Research Council, Practical Lessons from the Loma Prieta Earthquake, The Natl. Acad. Press, Washington, DC, 1994. doi:10.17226/2269.
 (25) N. Jayaram, J. W. Baker, Correlation model for spatially distributed groundmotion intensities, Earthq. Eng. & Struct. Dyn. 38 (15) (2009) 1687–1708.
 (26) N. A. Abrahamson, W. J. Silva, R. Kamai, Update of the AS08 groundmotion prediction equations based on the NGAWest2 data set, Pac. Earthq. Eng. Res. Cent., 2013.
 (27) R. P. Kennedy, M. K. Ravindra, Seismic fragilities for nuclear power plant risk studies, Nucl. Eng. and Des. 79 (1) (1984) 47–68.
 (28) M. Colombi, B. Borzi, H. Crowley, M. Onida, F. Meroni, R. Pinho, Deriving vulnerability curves using Italian earthquake damage data, Bull. of Earthq. Eng. 6 (3) (2008) 485–504.
 (29) A. Singhal, A. S. Kiremidjian, Method for Probabilistic Evaluation of Seismic Structural Damage, J. of Struct. Eng. 122 (12) (1996) 1459–1467.
 (30) K. Jaiswal, W. Aspinall, D. Perkins, D. Wald, K. Porter, Use of Expert Judgment Elicitation to Estimate Seismic Vulnerability of Selected Building Types, in: Proc. of the 15th World Conf. on Earthq. Eng., Lisbon, Portugal, 2012.

(31)
Department of Homeland Security, Emergency Preparedness and Response
Directorate, FEMA, Mitigation Division,
Multihazard Loss Estimation
Methodology, Earthquake Model: HAZUSMH MR1, Advanced Engineering Building
Module, Washington, DC (Jan 2003).
URL https://www.hsdl.org/?view&did=11343  (32) L. Xie, J. Tang, H. Tang, Q. Xie, S. Xue, Seismic Fragility Assessment of Transmission Towers via Performancebased Analysis, in: Proc. of the 15th World Conf. on Earthq. Eng., Lisbon, Portugal.
 (33) M. Ouyang, L. DueñasOsorio, X. Min, A threestage resilience analysis framework for urban infrastructure systems, Struc. Saf. 36 (2012) 23–31.
 (34) S. Nozhati, B. R. Ellingwood, H. Mahmoud, J. W. van de Lindt, Identifying and Analyzing Interdependent Critical Infrastructure in PostEarthquake Urban Reconstruction, in: Proc. of the 11th Natl. Conf. in Earthq. Eng., Earthq. Eng. Res. Inst., Los Angeles, CA, 2018.
 (35) S. Nozhati, B. R. Ellingwood, H. Mahmoud, Y. Sarkale, E. K. P. Chong, N. Rosenheim, An approximate dynamic programming approach to community recovery management, in: Eng. Mech. Inst., 2018.
 (36) S. Nozhati, Y. Sarkale, B. R. Ellingwood, E. K. P. Chong, H. Mahmoud, A combined approximate dynamic programming & simulated annealing optimization method to address communitylevel food security in the aftermath of disasters, in: submitted to iEMSs 2018. arXiv:1804.00250.
 (37) Y. Sarkale, S. Nozhati, E. K. P. Chong, B. R. Ellingwood, H. Mahmoud, Solving Markov decision processes for networklevel posthazard recovery via simulation optimization and rollout, in: submitted to IEEE CASE 2018 (Invited Paper). arXiv:1803.04144.
 (38) D. P. Bertsekas, Rollout Algorithms for Discrete Optimization: A Survey, SpringerVerlag New York, 2013, pp. 2989–3013. doi:10.1007/9781441979971_8.
 (39) D. P. Bertsekas, J. N. Tsitsiklis, C. Wu, Rollout Algorithms for Combinatorial Optimization, J. of Heuristics 3 (3) (1997) 245–262. doi:10.1023/A:1009635226865.
 (40) N. Limnios, Fault Trees, WileyISTE, 2013.
 (41) Y. Ouyang, S. Madanat, Optimal scheduling of rehabilitation activities for multiple pavement facilities: exact and approximate solutions, Transportation Res. Part A: Policy and Pract. 38 (5) (2004) 347–365. doi:10.1016/j.tra.2003.10.007.
 (42) Y. Liu, E. K. P. Chong, A. Pezeshki, Bounding the greedy strategy in finitehorizon string optimization, in: 2015 IEEE 54th Annu. Conf. on Decis. and Control (CDC), IEEE, 2015, pp. 3900–3905.
 (43) H. Masoomi, J. W. van de Lindt, Restoration and functionality assessment of a community subjected to tornado hazard, Struct. and Infrastruct. Eng. 14 (3) (2018) 275–291.
 (44) R. A. Howard, Dynamic Programming and Markov Processes, MIT Press, Cambridge, MA, 1960.

(45)
G. Tesauro, G. R. Galperin,
Online
Policy Improvement using MonteCarlo Search, in: M. C. Mozer,
M. I. Jordan, T. Petsche (Eds.), Advances in Neural Information Processing
Systems 9, MIT Press, 1997, pp. 1068–1074.
URL http://papers.nips.cc/paper/1302onlinepolicyimprovementusingmontecarlosearch.pdf  (46) B. Abramson, ExpectedOutcome: A General Model of Static Evaluation, IEEE Trans. on Pattern Anal. and Mach. Intell. 12 (2) (1990) 182–193. doi:10.1109/34.44404.
 (47) S. Ragi, H. D. Mittelmann, E. K. P. Chong, Directional Sensor Control: Heuristic Approaches, IEEE Sens. J. 15 (1) (2015) 374–381. doi:10.1109/JSEN.2014.2343019.
 (48) D. P. Bertsekas, Rollout Algorithms for Constrained Dynamic Programming, Tech. rep., Lab. for Inf. & Decis. Syst. (Apr 2005).