Synthesis of Optimal Resilient Control Strategies

07/11/2017
by   Christel Baier, et al.
Masarykova univerzita
TU Dresden

Repair mechanisms are important within resilient systems to maintain the system in an operational state after an error occurred. Usually, constraints on the repair mechanisms are imposed, e.g., concerning the time or resources required (such as energy consumption or other kinds of costs). For systems modeled by Markov decision processes (MDPs), we introduce the concept of resilient schedulers, which represent control strategies guaranteeing that these constraints are always met within some given probability. Assigning rewards to the operational states of the system, we then aim towards resilient schedulers which maximize the long-run average reward, i.e., the expected mean payoff. We present a pseudo-polynomial algorithm that decides whether a resilient scheduler exists and if so, yields an optimal resilient scheduler. We show also that already the decision problem asking whether there exists a resilient scheduler is PSPACE-hard.

1 Introduction

Computer systems are resilient when they incorporate mechanisms to adapt to changing conditions and to recover rapidly or at low cost from disruptions. The latter property of resilient systems is usually maintained through repair mechanisms, which push the system towards an operational state after an error has occurred. Resilient systems and repair mechanisms have been widely studied in the literature and are an active field of research (see, e.g., [2] for an overview). Errors such as measurement errors, read/write errors, or connection errors do not necessarily cause a system failure, but may be repaired to keep the system operational. Examples of repair mechanisms include rejuvenation procedures that counter the degradation of software over time [12], the evaluation of checksums to repair communication errors, or methods to counter an attack from outside a security system. The repair of a degraded software system could be achieved, e.g., by clearing caches (fast, very good availability), by running maintenance methods (more time, less availability, but higher success), or by a full restart (slow, cutting off availability, but guaranteed success). Depending on the situation the system faces, there is a trade-off between these characteristics, and a choice has to be made as to which repair mechanism should be executed in order to fulfill further constraints on the repair, which errors should be avoided, and how to optimize an overall goal. Usually, suitable control strategies that make these repair choices are found in an ad-hoc manner, which requires considerable engineering effort.

In this paper, we address the question of automated synthesis of resilient control strategies that maximize the long-run average availability of the system. Inspired by the use of probabilistic response patterns to describe resilience [7], we focus on control strategies that are probabilistically resilient, i.e., with high probability, repair mechanisms succeed within a given amount of time or other kinds of costs. The formal model we use to describe resilient systems is provided by Markov decision processes (MDPs, see, e.g., [18, 16]), that is, directed graphs over states with edges annotated by actions that stand for non-deterministic choices, and by stochastic information about the probabilistic choices resolved after taking an action. Following [3, 15], we distinguish between three kinds of states: error, repair, and operational states. Error states stand for states where a disruption of the system is discovered, initiating a repair mechanism modeled by repair states. Operational states are those states where the system is available and no repair is required. To reason about the trade-off between control strategies, we annotate error and repair states with cost values, and operational states with payoff values. Assigned costs formalize, e.g., the time required or the energy consumed for leaving an error or repair state. Likewise, assigned payoff values quantify the benefit of an operational state, e.g., the number of tasks completed successfully while being operational. We define the long-run average availability as the mean payoff. Control strategies in MDPs are provided by (randomized) schedulers that, depending on the history of the system execution, choose the probability of the next action to fire. When the probabilities for action choices are Dirac, i.e., exactly one action is chosen almost surely, the scheduler is called deterministic. Schedulers which select an action depending only on the current state, i.e., independently of the history, are called memoryless. For a given cost bound and a given probability threshold, we call a scheduler resilient if, for every error, it ensures a recovery within costs at most the bound with probability at least the threshold.

Our Contribution.

We show that if the cost bound is represented in unary, the existence of a resilient scheduler is decidable in polynomial time. Further, we show that if there is at least one resilient scheduler, then there also exists an optimal resilient scheduler computable in polynomial time. Here, optimality means that the scheduler achieves the maximal long-run average availability among all resilient schedulers. The constructed scheduler is randomized and uses finite memory. The example below illustrates that deterministic or memoryless randomized schedulers are less powerful. If the cost bound is encoded in binary, our algorithms are exponential, and we show that deciding the existence of a resilient scheduler becomes PSPACE-hard. Let us note that all numerical constants (such as the probability threshold or MDP transition probabilities) except for the cost bound are represented as fractions of binary numbers. The key technical ingredients of our results are non-trivial observations about the structure of resilient schedulers, which connect the studied problems to the existing works on MDPs with multiple objectives and optimal strategy synthesis [16, 10, 6]. The PSPACE-hardness result is obtained by a simple reduction from the cost-bounded reachability problem in acyclic MDPs [14]. More details are given at appropriate places in Section 3 and in the appendix.

Figure 1: Optimal resilient schedulers might require finite memory and randomization

Example.

As a simple example, consider an MDP model of a resilient system depicted in Fig. 1. Operational states are depicted by thin rounded boxes, error states are shown as rectangles and repair states are depicted by thick-rounded boxes. Assigned cost and payoff values are indicated above the nodes of the MDP. For edges without any action name or probability, we assume one action with probability one. The system starts its execution in the operational state , from which it reaches the error state and directly invokes a repair mechanism by switching to the repair state , where either action or can be chosen. After taking , an operational state is reached that, however, does not grant any payoff. When choosing , a fair coin is flipped and either the repair mechanism has to be tried again or the operational state is reached, while providing the payoff value 1 for each visit of . Assume that we have given the cost bound and probability threshold . The memoryless deterministic strategy always choosing yields the maximal possible mean payoff of , but is not resilient as . The memoryless randomized scheduler that chooses with probability is resilient and achieves the maximal mean payoff of , when ranging over all memoryless randomized schedulers. Differently, the finite-memory randomized scheduler playing with probability in the second step and with probability 1 in all other steps yields the mean payoff of , which is optimal within all resilient schedulers. As this example shows, optimal resilient schedulers might require randomization and finite memory in terms of remembering the accumulated costs spent so far after an error occurred.

Related work.

Concerning the analysis of resilient systems, [3] presented algorithms to reason about trade-offs between costs and payoffs using (probabilistic) model-checking techniques. In [17], several metrics to quantify resiliency and their applications to large-scale systems have been detailed.

The synthesis of control strategies for resilient systems has mainly been considered in the non-probabilistic setting. In [15], a game-theoretic approach towards synthesizing strategies that maintain a certain resilience level has been presented. The resilience level is defined in terms of the number of errors from which the system can recover simultaneously. Automatic synthesis of Pareto-optimal implementations of resilient systems was detailed in [9]. Robust synthesis procedures with both qualitative and mean-payoff objectives have been presented in [5]. In [13], the authors present algorithms to synthesize controllers for fault-tolerant systems compliant with constraints on power consumption.

Optimization problems for MDPs with mean-payoff objectives and constraints on cost structures have been widely studied in the field of constrained Markov decision processes (see, e.g., [18] and [1] for an overview). MDPs with multiple constraints on the probabilities for satisfying $\omega$-regular specifications were studied in [10]. This work has been extended to also allow for (multiple) constraints on the expected total reward in MDPs with rewards in [11]. Synthesis of optimal schedulers with multiple long-run average objectives in MDPs has been considered in [8, 6]. All of the mentioned approaches have in common that they adapt well-known linear programs to synthesize optimal memoryless randomized schedulers (see, e.g., [16, 18]). We also use combinations of similar techniques to find optimal resilient schedulers. As far as we know, we are the first to consider mean-payoff optimization problems under cost-bounded reachability probability constraints. Although we investigate these problems in the context of resilient systems, they are interesting in their own right.

2 Notations and problem statement

Given a finite set $X$, we denote by $\mathit{Dist}(X)$ the set of probability distributions on $X$, i.e., the set of functions $\mu : X \to [0,1]$ where $\sum_{x \in X} \mu(x) = 1$. By $X^{\infty}$ we denote finite or infinite sequences of elements of $X$. We assume that the reader is familiar with the basic principles of probabilistic systems, logics, and model-checking techniques and refer to [4] for an introduction to these subjects.

2.1 Markov decision processes

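A Markov decision process (MDP) is a tuple $\mathcal{M} = (S, s_{\mathit{init}}, \mathit{Act}, P)$, where $S$ is a finite state space, $s_{\mathit{init}} \in S$ an initial state, $\mathit{Act}$ a finite set of actions, and $P$ a transition probability function, i.e., a function $P : S \times \mathit{Act} \times S \to [0,1]$ where $\sum_{s' \in S} P(s, \alpha, s') \in \{0, 1\}$ for all $s \in S$ and $\alpha \in \mathit{Act}$. For $s \in S$, let $\mathit{Act}(s)$ denote the set of actions that are enabled in $s$, i.e., $\alpha \in \mathit{Act}(s)$ iff $P(s, \alpha, \cdot)$ is a probability distribution over $S$. Unless stated differently, we suppose that no MDP has trap states, i.e., states $s$ where $\mathit{Act}(s) = \emptyset$. Paths in $\mathcal{M}$ are alternating sequences $s_0\,\alpha_0\,s_1\,\alpha_1 \ldots$ of states and actions such that $P(s_i, \alpha_i, s_{i+1}) > 0$ for all $i$. The set of all finite paths starting in state $s$ is denoted by $\mathit{FinPaths}(s)$, where we omit $s$ when the finite paths from all states are meant.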

A (randomized, history-dependent) scheduler for is a function . A -path in is a path in where for all we have that . We write for the probability measure on infinite paths of induced by a scheduler and starting in . For a scheduler and , denotes the residual scheduler given by for each finite path where the first state of equals the last state of . Here is used for the concatenation operator on finite paths. is called memoryless if for all and all finite paths where the last state of is . We abbreviate memoryless (randomized) schedulers as MR-schedulers.
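The definitions above can be made concrete with a small data-structure sketch. The nested-dictionary encoding, the function names `enabled` and `path_probability`, and the toy MDP are illustrative assumptions and do not come from the paper; only memoryless randomized schedulers are covered.

```python
from typing import Dict, List

State = str
Action = str
# P[s][a] maps successor states to probabilities; in this encoding an action
# is enabled in s exactly when it has an entry in P[s].
Transitions = Dict[State, Dict[Action, Dict[State, float]]]
# A memoryless randomized (MR) scheduler: state -> distribution over actions.
MRScheduler = Dict[State, Dict[Action, float]]


def enabled(P: Transitions, s: State) -> List[Action]:
    """Actions enabled in state s."""
    return sorted(P.get(s, {}))


def path_probability(P: Transitions, sched: MRScheduler,
                     path: List[str]) -> float:
    """Probability of a finite path s0 a0 s1 a1 ... sn under an MR-scheduler."""
    prob = 1.0
    for i in range(0, len(path) - 2, 2):
        s, a, t = path[i], path[i + 1], path[i + 2]
        # probability of choosing a in s, times probability of moving to t
        prob *= sched[s].get(a, 0.0) * P[s].get(a, {}).get(t, 0.0)
    return prob


# Hypothetical toy MDP: a repair state "r" with a safe and a risky action.
P: Transitions = {
    "r": {"alpha": {"op0": 1.0}, "beta": {"r": 0.5, "op1": 0.5}},
    "op0": {"stay": {"op0": 1.0}},
    "op1": {"stay": {"op1": 1.0}},
}
sched: MRScheduler = {
    "r": {"alpha": 0.25, "beta": 0.75},
    "op0": {"stay": 1.0},
    "op1": {"stay": 1.0},
}
print(enabled(P, "r"))
print(path_probability(P, sched, ["r", "beta", "r", "beta", "op1"]))
```

A history-dependent scheduler would instead take the whole finite path as its argument; the dictionary form is chosen here only because it keeps the sketch short.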

2.2 Markov decision processes with repair

Consider an MDP together with two disjoint sets of states, the error states and the operational states. Intuitively, the error states are those where an error occurs, and the operational states are those where the modeled system is operational. In all other states, we assume that a repair mechanism is running, triggered directly within the next transition after an error has occurred. We formalize the latter assumption by

(*)

where $\bigcirc$ and $\mathsf{W}$ stand for the standard next and weak-until operators, respectively, borrowed from computation tree logic (CTL, see, e.g., [4]). Assumption (*) also asserts that as soon as a repair protocol has been started, the system does not enter a new error state before a successful repair, i.e., until the system switches to its operational mode.

Further, we suppose that states in are amended with non-negative integer values, i.e., we are given a non-negative integer reward function . For an operational state , the value is viewed as the payoff value of state , while for the non-operational states , the value is viewed as the repairing costs caused by state . To reflect this intuitive meaning of the reward values, we shall write instead of for and instead of for . Furthermore, we assume if and if . For a finite path , let and be and , respectively.

Formally, an MDP with repair consists of an MDP together with the sets of error and operational states and the reward function, such that assumption (*) is satisfied and the transition probability function is rational, with probabilities represented as fractions of binary numbers.

2.3 Long-run availability and resilient schedulers

Given an MDP with repair and a scheduler for , we define the long-run availability of , denoted by , as the expected long-run average (mean-payoff) of the payoff function. That is, for any ,

agrees with the expectation of the random variable

under that assigns to each infinite path the value
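For orientation, the mean payoff of an infinite path is conventionally defined as below. The use of $\liminf$, the normalization by $n+1$, and the symbol names $\mathrm{MP}$ and $\mathit{payoff}$ are assumptions of this sketch rather than the authors' confirmed formulation.

```latex
\[
  \mathrm{MP}(\pi) \;=\; \liminf_{n \to \infty} \; \frac{1}{n+1} \sum_{i=0}^{n} \mathit{payoff}(s_i),
  \qquad \text{where } \pi = s_0\,\alpha_0\,s_1\,\alpha_1 \ldots
\]
```

Under this reading, the long-run availability of a scheduler is the expectation of $\mathrm{MP}$ with respect to the probability measure the scheduler induces on infinite paths.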

Let us further assume that we are given a rational probability threshold and a cost bound. The threshold is always represented as a fraction of two binary numbers. The cost bound is represented either in binary or in unary, which significantly influences the computational complexity of the studied problems.

Definition 1 (Resilient schedulers)

A scheduler is said to be probabilistically resilient with respect to and if the following conditions (Res) and (ASRep) hold for all finite -paths from to an error state :

(Res)
(ASRep)

Here, denotes the set of infinite paths for which there exist a finite path  and an infinite path such that and the last state of is in . Further, denotes the set restricted to paths satisfying .
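For a memoryless scheduler, the cost-bounded part of the resilience requirement (recovery within the cost bound with probability at least the threshold, as described in the introduction) can be checked on the induced Markov chain by recursion over the remaining budget. The sketch below is a minimal illustration; the function name, the toy chain, and the assumption that every non-operational state has cost at least 1 are not taken from the paper.

```python
from functools import lru_cache
from typing import Dict, Set

Chain = Dict[str, Dict[str, float]]  # state -> successor -> probability


def repair_probability(chain: Chain, cost: Dict[str, int],
                       operational: Set[str], start: str, bound: int) -> float:
    """Probability of reaching an operational state from `start` while
    accumulating cost at most `bound`.

    Assumption of this sketch: every non-operational state has cost >= 1,
    so the remaining budget strictly decreases and the recursion terminates.
    """
    @lru_cache(maxsize=None)
    def prob(state: str, budget: int) -> float:
        if state in operational:
            return 1.0                      # repair finished within budget
        if cost[state] > budget:
            return 0.0                      # cannot afford to leave this state
        rest = budget - cost[state]
        return sum(p * prob(t, rest) for t, p in chain[state].items())

    return prob(start, bound)


# Hypothetical chain: error state "err", repair state "rep" that succeeds
# with probability 1/2 per attempt, operational state "op".
chain: Chain = {
    "err": {"rep": 1.0},
    "rep": {"rep": 0.5, "op": 0.5},
    "op": {"op": 1.0},
}
cost = {"err": 1, "rep": 1}
p = repair_probability(chain, cost, {"op"}, "err", 3)
threshold = 0.7                              # hypothetical probability threshold
print(p, p >= threshold)
```

Zero-cost repair states would require a fixed-point computation instead of plain recursion, which is why they are excluded here.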

The task addressed in this paper is to check the existence of resilient schedulers (i.e., schedulers that are probabilistically resilient w.r.t. and ), and if so, construct an optimal resilient scheduler that has maximal long-run availability amongst all resilient schedulers, i.e., , where

3 The results

In the following, we present and prove the main result of this paper:

Theorem 3.1

Consider an MDP with repair, a rational probability threshold, and a cost bound encoded in unary. The existence of a scheduler that is probabilistically resilient w.r.t. the threshold and the bound is decidable in polynomial time. If such a scheduler exists, then an optimal probabilistically resilient scheduler (w.r.t. the same threshold and bound) is computable in polynomial time.

If  is encoded in binary, our algorithms are exponential, and we show that even the existence of a probabilistically resilient scheduler w.r.t. and becomes PSPACE-hard. The optimal scheduler is randomized and history dependent, which is unavoidable (see the example in the introduction). More precisely, the memory requirements of are finite with at most memory elements, and this memory is only used in the repairing phase where the scheduler needs to remember the error state and the total costs accumulated since visiting this error state.

For the rest of this section, we fix an MDP with repair where , a rational probability threshold , and a cost bound . We say that a scheduler is resilient if it is probabilistically resilient w.r.t. and .

The proof of Theorem 3.1 is obtained in two steps. First, the given MDP with repair is transformed into a suitable MDP in which the total costs accumulated since the last error are explicitly remembered in the states. Hence, the size of the transformed MDP is polynomial in the input size if the cost bound is encoded in unary. We will show that the problem of computing an optimal resilient scheduler can be safely considered in the transformed MDP instead of the original one. In the second step, it is shown that there exists an optimal memoryless resilient scheduler for the transformed MDP computable in time polynomial in its size. This is the very core of our paper, requiring non-trivial observations and constructions. Roughly speaking, we start by connecting our problem to the problem of multiple mean-payoff optimization, and use the results and algorithms presented in [6] to analyze the limit behavior of resilient schedulers. First, we show how to compute a set of end components such that resilient schedulers can stay only in these end components without losing availability. We also compute memoryless schedulers for these end components that can safely be adopted by resilient schedulers. Then, we show that the behavior of a resilient scheduler prior to entering an end component can also be modified so that it becomes memoryless and the achieved availability does not decrease. After understanding the structure of resilient schedulers, we can compute an optimal memoryless resilient scheduler for the transformed MDP by solving suitable linear programs.

The first step (i.e., the transformation of the original MDP into the transformed MDP) is described in Section 3.1, and the second step in Section 3.2.

3.1 Transformation

Let be an MDP with repair where is an MDP such that with

Intuitively, a triple consisting of a state, an error state, and a cost value indicates that the system is in the given state, executing a repair procedure that was triggered by visiting the error state at some point in the past, with the given costs accumulated so far. For technical reasons, we also include triples whose first component is an operational state, in which case a repair mode with the recorded total cost has just finished. The sets of error and operational states in the transformed MDP are:

   and    .

The action set of is the same as for . In what follows, we write for the set of actions that are enabled in state of . Then, . Let and . Then, if . If and , then

For, , , and we have:

In all remaining cases, we set . The reward function of is given by and . Note that assumption (*) ensures that for all states .
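The idea of remembering the triggering error state and the accumulated repair cost in the state space can be sketched as a reachability-driven unfolding. This is a simplified illustration, not the paper's exact construction: the function name `unfold`, the order of the triple components, and the choice to collapse all costs above the bound into `bound + 1` are assumptions made to keep the state space finite.

```python
from typing import Dict, Set


def unfold(P: Dict[str, Dict[str, Dict[str, float]]], cost: Dict[str, int],
           errors: Set[str], operational: Set[str], init: str, bound: int):
    """Unfold an MDP so that repair states carry (state, error, cost).

    Assumes `init` is an error or operational state and that repair states
    are entered only from error or repair states.
    """
    def successor(node, t):
        if isinstance(node, tuple):
            s, e, c = node
            if s in operational:
                return t                  # repair just finished; forget the cost
            return (t, e, min(c + cost[s], bound + 1))
        if node in errors:
            # the repair phase starts; remember the triggering error state
            return (t, node, min(cost[node], bound + 1))
        return t                          # plain operational state

    unfolded, todo = {}, [init]
    while todo:
        node = todo.pop()
        if node in unfolded:
            continue
        s = node[0] if isinstance(node, tuple) else node
        unfolded[node] = {}
        for a, dist in P[s].items():      # same enabled actions as in s
            new_dist = {}
            for t, p in dist.items():
                n = successor(node, t)
                new_dist[n] = new_dist.get(n, 0.0) + p
                todo.append(n)
            unfolded[node][a] = new_dist
    return unfolded


# Hypothetical example: one error state, one repair state, one operational state.
P = {
    "op": {"go": {"err": 1.0}},
    "err": {"fix": {"rep": 1.0}},
    "rep": {"try": {"rep": 0.5, "op": 0.5}},
}
M2 = unfold(P, {"err": 0, "rep": 1}, {"err"}, {"op"}, "op", bound=2)
print(len(M2))
```

With a unary cost bound the number of triples is polynomial in the input, which matches the size argument made for the transformed MDP.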

There is a one-to-one correspondence between the paths in and in . More precisely, given a (finite or infinite) path in , let denote the unique path in that arises from by replacing each repair state with . Vice versa, each path in can be lifted to a path in such that . Next lemmas follow directly from definitions of and .

Lemma 1

For each finite path in starting in some state we have .

Lemma 2

For each infinite path in , .

The one-to-one correspondence between the paths in and in carries over to the schedulers for and . Given a scheduler for , let denote the scheduler for given by for all finite paths of . This yields a scheduler transformation that maps each scheduler for to a scheduler for . Vice versa, given a scheduler for there exists a scheduler such that .

Due to assumption (*) we have that for all repair states that are reachable from in . Thus, with Lemma 1 and Lemma 2, we obtain:

Lemma 3

Let be a scheduler for and a scheduler for such that . Then:

  1. For each state :   and

    where .

Corollary 1

Proof

The above transformations and for paths and schedulers of to paths and schedulers of , and the inverse mappings and for paths and schedulers of to paths and schedulers of are compatible with the residual operator for schedulers in the following sense:

Thus, part (a) of Lemma 3 yields that is resilient for if and only if is resilient for . Part (b) of Lemma 3 then yields the claim.∎

The following mainly technical lemma shows that residual schedulers arising from resilient schedulers maintain the resilience property.

Lemma 4

Let be a resilient scheduler for , and let be a state of such that . Let be a set of finite -paths initiated in and terminating in , and let be a scheduler for resilient for the initial state changed to . Consider the scheduler which is the same as except that for every finite path such that where we have that . Then is resilient (for the initial state ).

3.2 Solving the resilience-availability problem for

In this section, we analyze the structure of resilient schedulers for and prove the following proposition:

Proposition 1

The existence of a resilient scheduler for can be decided in polynomial time. The existence of some resilient scheduler for implies the existence of an optimal memoryless resilient scheduler for computable in polynomial time.

Note that Theorem 3.1 follows immediately from Proposition 1 and Corollary 1.

We start by introducing some notions. A fragment of is a pair where and is a function such that and for every . An MR-scheduler for is a function assigning a probability distribution over to every . We say that a scheduler for is consistent with if for every ending in a state of we have that .

An end component of is a fragment of such that

  • is strongly connected, i.e., for all there is a finite path from to such that and for all ;

  • for all , , and such that we have .
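The two conditions above (strong connectivity via the selected actions, and closedness under them) translate directly into a check. The sketch below assumes the nested-dictionary MDP encoding `P[s][a][t]` and represents a fragment as a set of states `T` with a map `A` from states to selected actions; these names are illustrative.

```python
from typing import Dict, Set


def is_end_component(P: Dict[str, Dict[str, Dict[str, float]]],
                     T: Set[str], A: Dict[str, Set[str]]) -> bool:
    """Check whether the fragment (T, A) is an end component of the MDP P."""
    if not T:
        return False
    # Fragment conditions: non-empty sets of enabled actions for every state.
    for s in T:
        if not A.get(s) or not set(A[s]) <= set(P[s]):
            return False
    # Closedness: selected actions never leave T.
    for s in T:
        for a in A[s]:
            if any(p > 0 and t not in T for t, p in P[s][a].items()):
                return False
    # Strong connectivity of the graph induced by the selected actions.
    succ = {s: {t for a in A[s] for t, p in P[s][a].items() if p > 0} for s in T}
    pred = {s: set() for s in T}
    for s, ts in succ.items():
        for t in ts:
            pred[t].add(s)

    def reach(edges, root):
        seen, todo = {root}, [root]
        while todo:
            for t in edges[todo.pop()]:
                if t not in seen:
                    seen.add(t)
                    todo.append(t)
        return seen

    root = next(iter(T))
    return reach(succ, root) == set(T) and reach(pred, root) == set(T)


# Hypothetical MDP: "a" and "b" form a cycle, "b" may also escape to "c".
P = {
    "a": {"x": {"b": 1.0}},
    "b": {"y": {"a": 1.0}, "z": {"c": 1.0}},
    "c": {"w": {"c": 1.0}},
}
print(is_end_component(P, {"a", "b"}, {"a": {"x"}, "b": {"y"}}))
```

Checking reachability forwards and backwards from a single root is enough for strong connectivity, so no full SCC decomposition is needed for this test.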

Let be a scheduler for (not necessarily resilient). For every infinite path , let be the set of states occurring infinitely often in . For every , let be the set of all actions executed infinitely often from along . For a fragment , let be the set of all infinite paths such that and , and let be the probability of all starting in . If is not an end component, then clearly . Hence, there are end components such that:

for all , and

We say that stays in these end components.

Proposition 1 is proved as follows. We show that there is a set , computable in time polynomial in , consisting of triples of the form such that is an end component of and is an MR-scheduler for , satisfying the following conditions (E1) and (E2):

(E1)

If , then the two triples are either the same or .

(E2)

Every is strongly connected, i.e., the directed graph , where iff there is some such that and , is strongly connected. (In this case, is a bottom strongly connected component of the Markov chain induced by .)

Further, we can safely restrict ourselves to resilient schedulers whose long-run behavior is captured by some subset in the following sense:

Lemma 5

Given the set , for every resilient scheduler there exist a set and a resilient scheduler such that

  • almost all -paths starting in visit a state of ,

  • is consistent with for every ,

  • .

Using Lemma 5, we prove the following:

Lemma 6

Given the set , there is a linear program computable in time polynomial in satisfying the following: If is not feasible, then there is no resilient scheduler for . Otherwise, there is a subset and an MR-scheduler for the fragment with and for every such that

  • and are computable in time polynomial in ,

  • the scheduler consistent with and for every is resilient, and

  • for every resilient scheduler we have that .

In the next subsections, we show how to compute the set satisfying conditions (E1) and (E2) in polynomial time and provide proofs for Lemmas 5 and 6. Note that Proposition 1 then follows from Lemma 6 and the polynomial-time computability of .

3.2.1 Constructing the set .

For each , we define the weight function given by

and otherwise (in particular, for all states in that do not have the form ). For every scheduler , let be the expected value (under ) of the random variable assigning to each infinite -path the value

.

We say that a scheduler for is average-resilient if for all . Note that if is a resilient scheduler for , then for almost all (this follows by a straightforward application of the strong law of large numbers). Thus, we obtain:

Lemma 7

Every resilient scheduler for is average-resilient.

Although an average-resilient scheduler for is not necessarily resilient, we show that the problems of maximizing the long-run availability under resilient and average-resilient schedulers are to some extent related. The latter problem can be solved by the algorithm of [6]. More precisely, by Theorem 4.1 of [6], one can compute a linear program  in time polynomial in such that:

  • if is not feasible, then there is no average-resilient scheduler for ;

  • otherwise, there is a 2-memory stochastic update scheduler for , constructible in time polynomial in , which is average-resilient and achieves the maximal long-run availability among all average-resilient schedulers.

The scheduler almost surely “switches” from its initial mode to its second mode where it behaves memoryless. Hence, there is a set (computable in time polynomial in ) comprising triples that enjoy the following properties (H1) and (H2):

(H1)

is an end component of and is an MR-scheduler for achieving the maximal long-run availability among all average-resilient schedulers for every initial state .

(H2)

If , then the two triples are either the same or . Further, every is strongly connected.

We show that for every and every , the scheduler is resilient when the initial state is changed to  (see Lemma 10). So, starts to behave like a resilient scheduler after a “switch” to some . However, in the initial transient phase, may violate the resilience condition, which may disallow a resilient scheduler to enter some of the end components of . Thus, a resilient scheduler can in general be forced to stay in an end component that does not appear in . So, the set needs to be larger than , and we show that a sufficiently large is computable in polynomial time by Algorithm 1.

Algorithm 1 starts by initializing to , to , and to . Then, it computes the linear program and checks its feasibility. If is not feasible, the initial state of is removed from in the way described below. Otherwise, the algorithm constructs the scheduler , adds to , and “prunes” into . If the state is deleted from , some state of is chosen as a new initial state. This goes on until becomes empty. Here, the MDP is the largest MDP subsumed by which does not contain the states in . Note that when a state of is deleted, all actions leading to this state must be disabled; and if all outgoing actions of a state are disabled, then must be deleted. Hence, deleting the states appearing in may enforce deleting additional states and disabling further actions. Note that every is obtained in some iteration of the repeat-until cycle of Algorithm 1 by constructing the scheduler for the current value of . We denote this MDP as (note that is not necessarily connected). The set returned by Algorithm 1 indeed satisfies conditions (E1) and (E2). The outcome is possible, in which case there is no resilient scheduler for as the linear program of Lemma 6 is not feasible for .
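The pruning step described above (disable every action that may lead to a deleted state, delete every state left without enabled actions, and repeat) is a fixed-point computation. A minimal sketch, assuming the nested-dictionary encoding `P[s][a][t]` and a hypothetical function name `prune`:

```python
from typing import Dict, Iterable


def prune(P: Dict[str, Dict[str, Dict[str, float]]],
          remove: Iterable[str]) -> Dict[str, Dict[str, Dict[str, float]]]:
    """Return the largest sub-MDP of P that avoids the states in `remove`."""
    removed = set(remove)
    current = {s: {a: dict(d) for a, d in acts.items()}
               for s, acts in P.items() if s not in removed}
    changed = True
    while changed:
        changed = False
        for s in list(current):
            # Disable actions that may lead to a deleted state.
            for a in list(current[s]):
                if any(p > 0 and t not in current
                       for t, p in current[s][a].items()):
                    del current[s][a]
                    changed = True
            # A state without enabled actions must be deleted as well.
            if not current[s]:
                del current[s]
                changed = True
    return current


# Hypothetical MDP: deleting "c" cascades to "b" and "d", while "a" survives
# through its self-loop action "y".
P = {
    "a": {"x": {"b": 1.0}, "y": {"a": 1.0}},
    "b": {"z": {"c": 0.5, "b": 0.5}},
    "c": {"w": {"c": 1.0}},
    "d": {"v": {"b": 1.0}},
}
print(prune(P, {"c"}))
```

Each pass removes at least one action or state, so the loop terminates after at most as many passes as there are state-action pairs.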

An immediate consequence of property (H1) is the following:

Lemma 8

Let and . Then achieves the maximal long-run availability for the initial state among all average-resilient schedulers for .

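input : the transformed MDP
output : the set satisfying (E1) and (E2)
initialize the result set to empty and the current MDP to the transformed MDP
repeat
      compute the linear program for the current MDP
      if the linear program is feasible then
            compute the scheduler and the set satisfying (H1) and (H2),
            add this set to the result, and prune its states from the current MDP
      else
            remove the current initial state from the current MDP
      if the current initial state has been deleted then
            choose some remaining state of the current MDP as the new initial state
until the current MDP becomes empty
return the result set
Algorithm 1 Computing the set .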

The next lemma follows easily from the construction of .

Lemma 9

Let be a scheduler for (not necessarily resilient) and let be an end component where stays with positive probability. Then there is such that is an end component of and .

Let . Since is an MR-scheduler, the behavior of in an error state (for an arbitrary initial state ) is independent of the history. That is, the resilience condition is either simultaneously satisfied or simultaneously violated for all visits to . However, if the second case holds, is not even average-resilient, which is a contradiction. Thus, we obtain:

Lemma 10

Let , and let . Then the scheduler is resilient when the initial state is changed to . Further, if is a resilient scheduler for with the initial state , then .

3.2.2 Proof of Lemma 5.

Let be a resilient scheduler for . We show that there is another resilient scheduler satisfying the conditions of Lemma 5. First, let us consider the end components where stays. For every , let be a triple with the maximal such that (such a triple exists due to Lemma 9). We say that is associated to . Let be the conditional availability w.r.t. scheduler under the condition that an infinite path initiated in stays in . Given a triple , we use to denote the availability achieved by scheduler for . Note that is independent of .

Lemma 11

, where is the triple associated to .

Further, we say that is offending if there is a finite -path initiated in ending in a state , where is associated to , such that and the availability achieved by the scheduler in is strictly larger than . Note that if no is offending, we can choose as the set of triples associated to , and redefine the scheduler into a resilient scheduler as follows: behaves exactly like until a state of some is visited. Then, switches to immediately. The scheduler is resilient because (a visit to a repair state is preceded by a visit to the associated fail state which also belongs to ) and hence we can apply Lemma 4. Clearly, is consistent with every such that . It remains to show that the availability achieved by in is not smaller than the one achieved by . This follows immediately by observing that whenever makes a switch to after performing a finite -path initiated in ending in , the availability achieved by the resilient scheduler for the initial state  must be bounded by , because otherwise some would be offending. So, the introduced “switch” can only increase the availability.

Now assume that is offending, and let be the triple associated to . We construct a resilient scheduler which stays in and achieves availability not smaller than the one achieved by . This completes the proof of Lemma 5, because we can then successively remove all offending pairs. Since is offending, there is a finite -path initiated in ending in a state such that and the availability achieved by in is larger than . Since , there is a state such that . Note that is resilient for the initial state , and almost all infinite paths initiated in visit the state  under the scheduler .

Now, we construct a resilient scheduler achieving availability at least in  such that all components where stays (for the initial state ) are among . Let be the probability that an infinite path initiated in stays in under the scheduler . If , we put . Now assume . We cannot have , because then is bounded by (see Lemma 11). Let be the conditional availability achieved in by