Reactive synthesis is an exciting and promising approach to solving a crucial problem, whose importance is ever-increasing due to the ubiquitous deployment of embedded systems: obtaining correct and verified controllers for safety-critical systems. Instead of an engineer programming a controller by hand and then verifying it against a formal specification, synthesis automatically constructs a correct-by-design controller from the given specification (or reports that no such controller exists).
Typically, reactive synthesis is modeled as a two-player zero-sum game on a finite graph that is played between the system, which seeks to satisfy the specification, and its environment, which seeks to violate it. Although this model is well understood, there are still multiple obstacles to overcome before synthesis can be realistically applied in practice. These obstacles include not only the high computational complexity of the problem, but also more fundamental ones. Among the most prohibitive issues in this regard is the need for a complete model of the interaction between the system and its environment, including an accurate model of the environment, the actions available to both antagonists, as well as the effects of these actions.
This modeling task often places an insurmountable burden on engineers, as the environments in which real-life controllers are intended to operate tend to be highly complex or not fully known at design time. Moreover, when a controller is deployed in the real world, a common source of errors is a mismatch between the controller's intended result of an action and the result that actually manifests. Such situations arise, for instance, in the presence of disturbances, when the effect of an action is not precisely known, or when the intended control action of the controller cannot be executed, e.g., when an actuator malfunctions. Slightly abusing terminology from control theory, we subsume all such errors under the generic term disturbance.
To obtain controllers that can handle disturbances, one has to yield control over their occurrence to the environment. However, due to the antagonistic setting of the two-player zero-sum game, this would allow the environment to violate the specification by causing disturbances at will. Overcoming this requires the engineer to develop a realistic disturbance model, which is a highly complex task, as such disturbances are assumed to be rare events. Also, incorporating such a model into the game leads to a severe blowup in the size of the game, which can lead to intractability due to the high computational complexity of synthesis.
To overcome these difficulties, Dallal, Neider, and Tabuada [DBLP:conf/cdc/DallalNT16] proposed a conceptually simple, yet powerful extension of infinite games termed “games with unmodeled intermittent disturbances”. Such games are played similarly to classical infinite games: two players, called Player 0 and Player 1, move a token through a finite graph, whose vertices are partitioned into vertices under the control of Player 0 and Player 1, respectively; the winner is declared based on the resulting play. In contrast to classical games, however, the graph is augmented with additional disturbance edges that originate in vertices of Player 0 and may lead to any other vertex. Moreover, the mechanics of how Player 0 moves is modified: whenever Player 0 moves the token, her move might be overridden, and the token instead moves along a disturbance edge. This change in outcome implicitly models the occurrence of a disturbance (the intended result of the controller and the actual result differ), but it is not considered to be antagonistic. Instead, the occurrence of a disturbance is treated as a rare event without any assumptions on frequency, distribution, etc.
This non-antagonistic nature of disturbances is different from existing approaches in the literature and causes many interesting phenomena that do not occur in the classical theory of infinite games. Some of these already manifest in the parity game shown in Figure 1, in which vertices are labeled with non-negative integers, so-called colors, and Player 0 wins if the highest color seen infinitely often is even. Consider, for instance, vertex . In the classical setting without disturbances, Player 0 wins every play reaching this vertex by simply looping in it forever (since the highest color seen infinitely often is even). However, this is no longer true in the presence of disturbances: a disturbance here causes a play to proceed to a vertex from which Player 0 can no longer win. In vertex , Player 0 is in a similar, yet less severe situation: she wins every play with finitely many disturbances but loses if infinitely many disturbances occur. Finally, vertex falls into a third category: from this vertex, Player 0 wins every play even if infinitely many disturbances occur. In fact, disturbances partition the set of vertices from which Player 0 can guarantee to win into three disjoint regions (indicated as shaded boxes in Figure 1): (A) vertices from which she can win if at most a fixed finite number of disturbances occurs, (B) vertices from which she can win if any finite number of disturbances occurs but not if infinitely many occur, and (C) vertices from which she can win even if infinitely many disturbances occur.
The observation above gives rise to a question that is both theoretically interesting and practically important: if Player 0 can tolerate different numbers of disturbances from different vertices, how should she play to be resilient to as many disturbances as possible, i.e., to tolerate as many disturbances as possible while still winning? (We have deliberately chosen the term resilience so as to avoid confusion with the already highly ambiguous notions of robustness and fault tolerance.) Put slightly differently, disturbances induce an order on the space of winning strategies (“a winning strategy is better if it is more resilient”), and the natural problem is to compute optimally resilient winning strategies, yielding optimally resilient controllers. Note that this is in stark contrast to the classical theory of infinite games, where the space of winning strategies is unstructured.
Dallal, Neider, and Tabuada [DBLP:conf/cdc/DallalNT16] have already solved the problem of computing optimally resilient winning strategies for safety games. Their approach exploits the existence of maximally permissive winning strategies in safety games, which allows Player 0 to avoid “harmful” disturbance edges during a play. In games with more expressive winning conditions, however, this is no longer possible, as witnessed by vertex in the example of Figure 1: although Player 0 can avoid a disturbance edge by looping forever, she eventually needs to move on in order to see an even color (otherwise she loses), thereby risking a loss if a disturbance occurs. In fact, the problem of constructing optimally resilient winning strategies for games other than safety games is still open.
In this paper, we solve this problem for an extensive class of infinite games, including parity games and Muller games. In particular, our contributions are as follows:
We introduce a novel concept, termed resilience, which captures for each vertex how many disturbances need to occur for Player 0 to lose. This concept generalizes the notion of determinacy and allows us to derive optimally resilient winning strategies.
We present an algorithm for computing the resilience of vertices and optimally resilient winning strategies. Our algorithm uses solvers for the underlying game without disturbances as a subroutine, which it invokes a linear number of times on various subgames. For many winning conditions, the time complexity of our algorithm thus falls into the same complexity class as solving the original game without disturbances. In particular, we obtain a quasipolynomial algorithm for parity games with disturbances, which matches the currently best known upper bound for classical parity games.
In addition to natural assumptions on the winning condition, e.g., that games are determined and effectively solvable, our algorithm requires the winning condition to be prefix-independent. However, we show that the classical notion of game reduction carries over to the setting of games with disturbances. As a consequence, our algorithm can be applied to an extensive class of infinite games (using a reduction from prefix-dependent games to prefix-independent ones if necessary), including all ω-regular games.
Finally, we discuss various further phenomena that arise in the presence of disturbances. Among other things, we illustrate how the additional goal of avoiding disturbances whenever possible affects the memory requirements of strategies. Moreover, we raise the question of how benevolent disturbances can be leveraged to recover from losing a play. However, an in-depth investigation of these phenomena is outside the scope of this paper and left for future work.
This paper is structured as follows: after setting up definitions and notations in Section 2, we present our algorithm for computing optimally resilient strategies in Section 3. In Section 4, we discuss the necessary assumptions on the winning condition in detail and show that the notion of game reduction carries over to games with disturbances. In Section 5, we identify further interesting research questions arising in the context of disturbances. Finally, we discuss related work in Section 6.
For notational convenience, we employ some ordinal notation à la von Neumann: the non-negative integers are defined inductively as 0 = ∅ and n + 1 = n ∪ {n}. Now, the first limit ordinal is ω = {0, 1, 2, …}, the set of the non-negative integers. The next two successor ordinals are ω + 1 = ω ∪ {ω} and ω + 2 = (ω + 1) ∪ {ω + 1}. These ordinals are ordered by set inclusion, i.e., we have 0 < 1 < 2 < ⋯ < ω < ω + 1 < ω + 2. For convenience of notation, we also denote the cardinality of ω by ω.
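Since the only ordinals used here are the non-negative integers together with ω, ω + 1, and ω + 2, they admit a very concrete representation. The following sketch (ours, not part of the paper; all names are illustrative) encodes each such ordinal as a pair so that Python's lexicographic tuple comparison realizes exactly the order above:

```python
# Encode an ordinal below omega + 3 as a pair (limit_part, offset):
# the finite ordinal n is (0, n); omega + k is (1, k).
def fin(n):
    """The finite ordinal n, i.e., a non-negative integer."""
    return (0, n)

OMEGA = (1, 0)          # the first limit ordinal
OMEGA_PLUS_1 = (1, 1)   # omega + 1
OMEGA_PLUS_2 = (1, 2)   # omega + 2

# Tuples compare lexicographically, so every finite ordinal is
# smaller than OMEGA, which is smaller than its successors:
assert fin(0) < fin(10**6) < OMEGA < OMEGA_PLUS_1 < OMEGA_PLUS_2
```

Such an encoding suffices, e.g., to take maxima over resilience values without any special-casing of the infinite values.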
2.1 Infinite Games with Disturbances
An arena (with unmodeled disturbances) A = (V, V_0, V_1, E, D) consists of a finite directed graph (V, E), a partition {V_0, V_1} of V into the set V_0 of vertices of Player 0 (denoted by circles) and the set V_1 of vertices of Player 1 (denoted by squares), and a set D ⊆ V_0 × V of disturbance edges (denoted by dashed arrows). Note that only vertices of Player 0 have outgoing disturbance edges. We require that every vertex v ∈ V has a successor v′ with (v, v′) ∈ E to avoid finite plays.
A play in A is an infinite sequence ρ = (v_0, b_0)(v_1, b_1)(v_2, b_2)⋯ ∈ (V × {0, 1})^ω such that b_0 = 0 and for all j > 0: b_j = 0 implies (v_{j−1}, v_j) ∈ E, and b_j = 1 implies (v_{j−1}, v_j) ∈ D. Hence, the additional bits b_j for j > 0 denote whether a standard or a disturbance edge has been taken to move from v_{j−1} to v_j. We say ρ starts in v_0. A play prefix (v_0, b_0)⋯(v_j, b_j) is defined similarly and ends in v_j. The number of disturbances in a play ρ is defined as #(ρ) = |{j ∈ ω | b_j = 1}|, which is either some k ∈ ω (if there are finitely many disturbances, namely k) or it is equal to ω (if there are infinitely many). A play ρ is disturbance-free, if #(ρ) = 0.
A game (with unmodeled disturbances) G = (A, Win) consists of an arena A with vertex set V and a winning condition Win ⊆ V^ω. A play ρ = (v_0, b_0)(v_1, b_1)(v_2, b_2)⋯ is winning for Player 0, if v_0 v_1 v_2 ⋯ ∈ Win, otherwise it is winning for Player 1. Hence, winning is oblivious to occurrences of disturbances. A winning condition Win is prefix-independent if for all ρ ∈ V^ω and all w ∈ V* we have ρ ∈ Win if and only if wρ ∈ Win.
In examples, we often use the parity condition, the canonical ω-regular winning condition. Let Ω: V → ω be a coloring of a set V of vertices. The (max-) parity condition

Parity(Ω) = {v_0 v_1 v_2 ⋯ ∈ V^ω | the maximal color occurring infinitely often in Ω(v_0)Ω(v_1)Ω(v_2)⋯ is even}

requires the maximal color occurring infinitely often during a play to be even. A game G = (A, Win) is a parity game, if Win = Parity(Ω) for some coloring Ω of the vertices of A. In figures, we label vertices of a parity game by a pair v/c, where v is the name of the vertex and c = Ω(v) its color.
Furthermore, in our proofs we make use of the safety condition

Safety(U) = {v_0 v_1 v_2 ⋯ ∈ V^ω | v_j ∉ U for all j ∈ ω}

for a given set U ⊆ V of unsafe vertices. It requires Player 0 to only visit safe vertices, i.e., Player 1 wins a play if it visits at least one unsafe vertex.
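Safety conditions are dual to reachability: Player 1 wins a safety game exactly from those vertices from which he can force a visit to an unsafe vertex, a set computed by the classical attractor construction. A minimal sketch (ours; the encoding of vertex sets as Python sets and edges as pairs, as well as all names, are assumptions, not the paper's notation):

```python
def attractor(vertices, edges, controlled, target):
    """Vertices from which the player owning `controlled` can force a
    visit to `target`: one of her vertices is attracted if SOME successor
    is attracted, any other vertex if ALL of its successors are."""
    attr = set(target)
    changed = True
    while changed:
        changed = False
        for v in vertices - attr:
            succs = {w for (u, w) in edges if u == v}
            if v in controlled:
                attracted = bool(succs & attr)
            else:
                attracted = bool(succs) and succs <= attr
            if attracted:
                attr.add(v)
                changed = True
    return attr

# Player 1 wins the safety game with unsafe vertices U from exactly
# attractor(V, E, V1, U); Player 0 wins from the complement.
```

The fixed point is reached after at most |V| rounds, since each round either adds a vertex or terminates the loop.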
A strategy for Player i ∈ {0, 1} is a function σ: V*V_i → V such that (v, σ(wv)) ∈ E for every wv ∈ V*V_i. A play (v_0, b_0)(v_1, b_1)(v_2, b_2)⋯ is consistent with σ, if v_{j+1} = σ(v_0 ⋯ v_j) for every j with v_j ∈ V_i and b_{j+1} = 0, i.e., the next vertex is the one prescribed by the strategy unless a disturbance edge is used. A strategy σ is positional, if σ(wv) = σ(v) for all wv ∈ V*V_i.
Note that a strategy does not have access to the bits indicating whether a disturbance occurred or not. However, this is not a restriction: let ρ = (v_0, b_0)(v_1, b_1)(v_2, b_2)⋯ be a play with b_{j+1} = 1 for some j with v_j ∈ V_0. We say that this disturbance is consequential (w.r.t. σ), if v_{j+1} ≠ σ(v_0 ⋯ v_j), i.e., if the disturbance edge traversed by the play did not lead to the vertex the strategy prescribed. Such consequential disturbances can be detected by comparing the actual vertex v_{j+1} to σ's output σ(v_0 ⋯ v_j). On the other hand, inconsequential disturbances will just be ignored. In particular, the number of consequential disturbances is always at most the number of disturbances.
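Detecting consequential disturbances thus needs only the strategy itself, not the disturbance bits of the play. A small sketch (ours; the encoding of a play as (vertex, bit) pairs follows the definition above, while the strategy interface is an assumption):

```python
def consequential_disturbances(play, strategy):
    """Count the disturbances in `play` that the strategy can observe:
    positions with bit 1 whose vertex differs from the strategy's output
    on the preceding history.  `play` is a finite list of (vertex, bit)
    pairs; `strategy` maps a tuple of vertices to the prescribed vertex.
    (Disturbance edges only leave Player 0's vertices, so the strategy
    is defined on every history we query.)"""
    count = 0
    for j in range(1, len(play)):
        vertex, bit = play[j]
        history = tuple(v for v, _ in play[:j])
        if bit == 1 and vertex != strategy(history):
            count += 1
    return count
```

For instance, with a strategy that always prescribes some fixed vertex, a disturbance that happens to land exactly there is inconsequential and is not counted.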
2.2 Infinite Games without Disturbances
We characterize the classical notion of infinite games, i.e., those without disturbances (see, e.g., [GraedelThomasWilke02]), as a special case of games with disturbances. Let G = (A, Win) be a game with vertex set V. A strategy σ for Player i in G is said to be a winning strategy for her from v ∈ V, if every disturbance-free play that starts in v and that is consistent with σ is winning for Player i.
The winning region W_i(G) of Player i in G contains those vertices v ∈ V such that Player i has a winning strategy from v. Thus, the winning regions of G are independent of the disturbance edges, i.e., we obtain the classical notion of infinite games. We say that Player i wins G from v, if v ∈ W_i(G). Solving a game amounts to determining its winning regions. Note that every game has disjoint winning regions. In contrast, a game is determined, if every vertex is in either winning region, i.e., if W_0(G) ∪ W_1(G) = V.
2.3 Resilient Strategies
Let G be a game with vertex set V and let α ∈ ω + 2. A strategy σ for Player 0 in G is α-resilient from v ∈ V if every play ρ that starts in v, that is consistent with σ, and with #(ρ) < α, is winning for Player 0. Thus, a (k + 1)-resilient strategy with k ∈ ω is winning even under at most k disturbances, an ω-resilient strategy is winning even under any finite number of disturbances, and an (ω + 1)-resilient strategy is winning even under infinitely many disturbances. Note that every strategy is 0-resilient, as no play has strictly less than zero disturbances. Furthermore, a strategy is 1-resilient from v if and only if it is winning for Player 0 from v.
We define the resilience of a vertex v of G as

r_G(v) = sup {α ∈ ω + 2 | Player 0 has an α-resilient strategy for G from v}.
Note that the definition is not antagonistic, i.e., it is not defined via strategies of Player 1. Nevertheless, due to the remarks above, resilient strategies generalize winning strategies.
Let G be a determined game. Then, r_G(v) = 0 if and only if v ∈ W_1(G).
A strategy σ is optimally resilient, if it is r_G(v)-resilient from every vertex v ∈ V. Every such strategy is a uniform winning strategy for Player 0, i.e., a strategy that is winning from every vertex in her winning region. Hence, positional optimally resilient strategies can only exist in games which have uniform positional winning strategies for Player 0.
Our goal is to determine the mapping r_G and to compute an optimally resilient strategy.
3 Computing Optimally Resilient Strategies
To compute optimally resilient strategies, we first characterize the vertices of finite resilience in Subsection 3.1. All other vertices have resilience either ω or ω + 1. To distinguish between these two possibilities, we show how to determine the vertices with resilience ω + 1 in Subsection 3.2. In Subsection 3.3, we show how to compute optimally resilient strategies using the results of the first two subsections. We only consider prefix-independent winning conditions in Subsections 3.1 and 3.3. In Section 4, we show how to overcome this restriction.
3.1 Characterizing Vertices of Finite Resilience
Our goal in this subsection is to characterize vertices with finite resilience in a game with prefix-independent winning condition, i.e., those vertices from which Player 0 can win even under k disturbances, but not under k + 1 disturbances, for some k ∈ ω. To illustrate our approach, consider the parity game in Figure 1 (on Page 1). The winning region of Player 1 only contains the vertex . Thus, by Remark 2, this is the only vertex with resilience zero; every other vertex has a larger resilience.
Now, consider the vertex , which has a disturbance edge leading into the winning region of Player 1. Due to this edge, it has resilience one: from the vertex reached via the disturbance edge, a disturbance-free play violating the winning condition starts that is consistent with every strategy for Player 0. Due to prefix-independence, prepending the disturbance edge changes neither the winner nor consistency with every strategy for her. Hence, this play witnesses that the resilience is at most one, while being in Player 0's winning region yields the matching lower bound.
However, this is the only vertex to which this reasoning applies. Next, consider the vertex from which Player 1 can force a play to visit the former vertex using a standard edge. From this property, one can argue that it has resilience one as well. Again, this is the only vertex to which this reasoning is applicable.
In particular, from Player 0 can avoid reaching the vertices for which we have determined the resilience by using the self-loop. However, this comes at a steep price for her: doing so results in a losing play, as the color of this vertex is odd. Thus, if she wants to have a chance at winning, she has to take a risk by moving on to a vertex from which she has a strategy that is winning if no more disturbances occur. For this reason, this vertex has resilience one as well. The same reasoning applies to : Player 1 can force the play into the situation just described, from where Player 0 has to take the risk described above.
The vertices , , and share the property that Player 1 can enforce a play that either violates the winning condition or reaches a vertex with already determined finite resilience. These three vertices are the only ones currently satisfying this property. They all have resilience one, since Player 1 can enforce reaching a vertex of resilience one, but he cannot enforce reaching a vertex of resilience zero. Now, we can also determine the resilience of : the disturbance edge from it to a vertex of resilience one witnesses that it is two.
Afterwards, these two arguments no longer apply to new vertices: no disturbance edge leads from a remaining vertex to a vertex whose resilience is already determined, and Player 0 has a winning strategy from each of the remaining vertices that additionally avoids all vertices whose resilience is already determined. Thus, our reasoning cannot determine their resilience. This is consistent with our goal, as all four remaining vertices have non-finite resilience, i.e., two of them have resilience ω and the other two have resilience ω + 1. Note that our reasoning here cannot distinguish these two values. We solve this problem in Subsection 3.2.
In this subsection, we formalize the reasoning described above: starting from the vertices in Player 1's winning region, which have resilience zero, we use a disturbance update and a risk update to determine all vertices of finite resilience. To simplify our proofs, we describe both as monotone operators updating partial rankings mapping vertices to ω, which might update already defined values. We show that alternatingly applying these updates eventually yields a stable ranking that indeed characterizes the vertices of finite resilience.
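The overall fixed-point computation can be sketched as follows (this is our reading of the two updates, not the paper's pseudocode). The oracle `win1_region(targets)` is an assumed subroutine returning Player 1's winning region in the game where he wins by reaching a vertex in `targets` or by violating the winning condition; it would be instantiated with a solver for the underlying game without disturbances:

```python
def finite_resilience_ranking(dist_edges, win1_region):
    """Iterate disturbance and risk updates until the partial ranking
    (a dict: vertex -> candidate finite resilience) is stable under both.
    `dist_edges` is an iterable of disturbance edges (v, w)."""
    rank = {v: 0 for v in win1_region(set())}  # sound initial ranking
    while True:
        updated = dict(rank)
        # Disturbance update: a disturbance edge into a ranked vertex w
        # bounds the rank of its source v by rank(w) + 1.
        for v, w in dist_edges:
            if w in rank and rank[w] + 1 < updated.get(v, rank[w] + 2):
                updated[v] = rank[w] + 1
        # Risk update: v gets rank k if Player 1 can force reaching a
        # vertex of rank <= k (or a violation of the winning condition).
        for k in sorted(set(rank.values())):
            targets = {v for v, r in rank.items() if r <= k}
            for v in win1_region(targets):
                if k < updated.get(v, k + 1):
                    updated[v] = k
        if updated == rank:
            return rank
        rank = updated
```

Each sweep only lowers or newly defines ranks, mirroring the refinement property of the two operators, so the loop terminates.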
Throughout this section, we fix a game G = (A, Win) with vertex set V and with prefix-independent Win satisfying the following condition: the game (A, Win ∩ Safety(U)) is determined for every U ⊆ V. We discuss this requirement in Section 4.
A ranking for G is a partial mapping r: V ⇀ ω. The domain of r is denoted by dom(r), its image by im(r). Let r and r′ be two rankings. We say that r′ refines r if dom(r′) ⊇ dom(r) and if r′(v) ≤ r(v) for all v ∈ dom(r). A ranking r is sound, if we have r(v) = 0 if and only if v ∈ W_1(G) (cf. Remark 2).
Let r be a ranking for G. We define the ranking r′ as

r′(v) = min({r(v)} ∪ {r(v′) + 1 | (v, v′) ∈ D and v′ ∈ dom(r)}),

with r′(v) undefined, if the set on the right-hand side is empty. We call r′ the disturbance update of r.
The disturbance update r′ of a sound ranking r is sound and refines r.
As the minimization defining r′(v) ranges over a superset of {r(v)}, we have r′(v) ≤ r(v) for every v ∈ dom(r). This immediately implies refinement. From this inequality, we also obtain r′(v) = 0 for every v ∈ W_1(G), due to soundness of r. Finally, consider some v ∈ dom(r′) \ W_1(G). Then, r(v) > 0 by soundness of r, if r(v) is defined at all. Thus, r′(v) > 0 as well, as both r(v) and every r(v′) + 1 are greater than zero. Altogether, r′ is sound as well. ∎
Again, let r be a ranking for G. For every k ∈ im(r) let

W_k = W_1(A, Win ∩ Safety({v ∈ dom(r) | r(v) ≤ k})),

the winning region of Player 1 in the game where he either wins by reaching a vertex v with r(v) ≤ k or by violating the winning condition. Now, define r′(v) = min {k ∈ im(r) | v ∈ W_k}, where r′(v) is undefined if v is in none of the W_k. We call r′ the risk update of r.
The risk update r′ of a sound ranking r is sound and refines r.
We will show r′(v) ≤ r(v) for every v ∈ dom(r), which implies both refinement and r′(v) = 0 for every v ∈ W_1(G), as argued in the proof of Lemma 1.

Thus, let v ∈ dom(r) with r(v) = k. Trivially, v ∈ {v′ ∈ dom(r) | r(v′) ≤ k}. Thus, Player 1 wins the game (A, Win ∩ Safety({v′ ∈ dom(r) | r(v′) ≤ k})) from v by violating the safety condition right away. Hence, v ∈ W_k and thus r′(v) ≤ k = r(v).

To complete the proof of soundness of r′, we just have to show r′(v) > 0 for every v ∈ dom(r′) \ W_1(G). Towards a contradiction, assume r′(v) = 0, i.e., v ∈ W_0. Thus, Player 1 has a strategy τ from v that ensures that either the winning condition is violated or that a vertex v′ with r(v′) = 0 is reached, i.e., v′ ∈ W_1(G) by soundness of r. Hence, Player 1 has a winning strategy τ′ for G from v′. This implies that he also has a winning strategy from v: play according to τ until a vertex v′ with r(v′) = 0 is reached. From there, mimic τ′ when starting from v′. Every resulting disturbance-free play has a suffix that violates Win. Thus, by prefix-independence, the whole play violates Win as well, i.e., it is winning for Player 1. Thus, v ∈ W_1(G), which yields the desired contradiction, as winning regions are always disjoint. ∎
Let r_0 be the unique sound ranking with domain W_1(G), i.e., r_0 maps exactly the vertices in Player 1's winning region to zero. Starting with r_0, we inductively define a sequence of rankings (r_j)_{j ∈ ω} such that r_j for an odd (even) j > 0 is the disturbance update (the risk update) of r_{j−1}, i.e., we alternate between disturbance and risk updates.
Due to refinement, the r_j eventually stabilize, i.e., there is some j_0 such that r_j = r_{j_0} for all j ≥ j_0. Define r* = r_{j_0}. Due to r_0 being sound and by Lemma 1 and Lemma 2, each r_j, and in particular r*, is sound. If v ∈ dom(r*), let j_v be the minimal j with v ∈ dom(r_j); otherwise, j_v is undefined.
If , then for all .
We show the following stronger result for every :
If is odd, then for every .
If is even, then for every .
The disturbance update increases the maximal rank by at most one and the risk update does not increase the maximal rank at all. Furthermore, due to refinement, the rank of is set and then only decreases. Hence, we obtain and for odd and even , respectively. In the remainder of the proof, we show a matching lower bound.
We say that a vertex is updated to in if and either or both and (here, is the unique ranking with empty domain). Now, we show the following by induction over , which implies the matching lower bound.
If is odd, then no is updated in to some .
If is even, then no is updated in to some .
For , we have , and clearly, no vertex is assigned a negative rank by . For and , we obtain . As , , and are sound, neither nor update some to zero.
Now let and first consider the case where is odd. Towards a contradiction, assume that is updated in to some value less than . Since is odd, is the disturbance update of . Further, as is updated in , there exists some disturbance edge such that . Thus, , i.e., . First, we show , i.e., the rank of is stable during the last two updates.
First assume towards a contradiction . Then, is updated in to some rank of at most , which is in turn smaller than , violating the induction hypothesis for . Hence, . The same reasoning yields a contradiction to the assumption . Thus, we indeed obtain .
Since is the disturbance update of , we obtain . Due to refinement, we obtain , i.e., altogether . The latter equality contradicts our initial assumption, namely being updated in to .
Now consider the case where is even. Again, assume towards a contradiction that is updated in to some value less than . Since is even, is the risk update of . Further, as is updated in , Player wins the game from , where . Hence, he has a strategy such that every play starting in and consistent with either violates or eventually visits some vertex with . We claim for all .
Towards a contradiction, assume for some . Note that we have . Thus, is updated in to some value strictly less than , which contradicts the induction hypothesis for . Hence, we indeed obtain for all .
Thus, there are two types of vertices in : those for which is defined, which implies due to the induction hypothesis and refinement, and those where is undefined, which implies due to the claim above.
We claim that Player wins from , which implies . This contradicts being updated in , our initial assumption.
To this end, we construct a strategy from that either violates or reaches a vertex with as follows. From , mimics until a vertex in is reached (if it is at all). If is of the first type, then we have . If is of the second type, then is updated in to some rank . As is the risk update of , Player has a strategy from that either violates or reaches a vertex with . Thus, starting in , mimics from until such a vertex is reached (if it is reached at all). Thus, every play that starts in and is consistent with either violates or reaches a vertex with , which proves our claim.∎
From the proof of Lemma 3, we obtain an upper bound on the maximal rank of r*. This in turn implies that the r_j stabilize quickly, as r_{j+1} = r_j implies r_{j′} = r_j for all j′ ≥ j.
We have for some .
We have .
Lemma 3 also shows that an algorithm computing the rankings r_j does not need to implement the definition of the two updates as presented above, but can be optimized by taking into account that a rank is never updated once set.
The main result of this section shows that r* characterizes the resilience of vertices of finite resilience.
Let r* be defined for G as described above, and let v ∈ V.

If v ∈ dom(r*), then r_G(v) = r*(v).

If v ∉ dom(r*), then r_G(v) ∈ {ω, ω + 1}.
1.) Let v ∈ dom(r*). We prove r_G(v) ≤ r*(v) and r_G(v) ≥ r*(v).
“≤”: An α-resilient strategy from v is also α′-resilient from v for every α′ ≤ α. Thus, to prove r_G(v) ≤ r*(v), we just have to show that Player 0 has no (r*(v) + 1)-resilient strategy from v. By definition, for every strategy σ for Player 0, we have to show that there is a play starting in v and consistent with σ that has at most r*(v) disturbances and is winning for Player 1. So, fix an arbitrary strategy σ.
We define a play with the desired properties by constructing longer and longer finite prefixes before finally appending an infinite suffix. During the construction, we ensure that each such prefix ends in a vertex of dom(r*) in order to be able to proceed with our construction.

The first prefix just contains the starting vertex v; as v ∈ dom(r*), the prefix does indeed end in dom(r*). Now, assume we have produced a prefix ending in some vertex v′ ∈ dom(r*), which implies that r*(v′) is defined. We consider three cases:
If r*(v′) = 0, then v′ ∈ W_1(G) by soundness of r*, i.e., Player 1 has a winning strategy τ from v′. Thus, we extend the prefix by the unique disturbance-free play that starts in v′ and is consistent with σ and τ, without its first vertex. In that case, the construction of the infinite play is complete.
If r*(v′) > 0 and v′ received its rank during a disturbance update, then there is some disturbance edge (v′, v″) ∈ D with v″ ∈ dom(r*) and r*(v″) < r*(v′). In this case, we extend the prefix by such a vertex v″, which satisfies the invariant, as v″ is in dom(r*). Further, the rank of v″ had to be defined in order to be considered during the disturbance update assigning a rank to v′.
If r*(v′) > 0 and v′ received its rank during a risk update, we claim that Player 1 has a strategy τ that guarantees one of the following outcomes from v′: either the resulting play violates Win or it encounters a vertex v″ ∈ dom(r*) with r*(v″) ≤ r*(v′) that received its rank in an earlier update.
In that case, consider the unique disturbance-free play that starts in v′ and is consistent with σ and the strategy τ as above. If this play violates Win, then we extend the prefix by it without its first vertex. In that case, the construction of the infinite play is complete.
If the play does not violate Win, then we extend the prefix by its prefix without the first vertex and up to (and including) the first occurrence of a vertex v″ satisfying the properties described above. Note that this again satisfies the invariant.
It remains to argue our claim: v′ was assigned its rank because it is in Player 1's winning region in the game in which he wins by violating the winning condition or by reaching a vertex that had already received a rank of at most r*(v′). Hence, from v′, Player 1 has a strategy to either violate the winning condition or to reach such a vertex v″. Finally, each such v″ received its rank before v′ did, as the rank of v′ was assigned due to the vertices in the target set already having ranks.
Note that only in two of the cases do we extend the prefix to an infinite play; in the other cases, we just extend it to a longer finite one. Thus, we first show that this construction always results in an infinite play. To this end, consider two of the prefixes constructed above such that the second extends the first: a simple induction shows that the rank of the last vertex never increases from the first to the second. Hence, as this value can only decrease finitely often, at some point an infinite suffix is added. Thus, we indeed construct an infinite play.
Finally, we have to show that the resulting play has the desired properties: by construction, the play starts in v and is consistent with σ. Furthermore, by construction, it has a disturbance-free suffix that violates Win. Thus, by prefix-independence, the whole play also violates Win. It remains to show that it has at most r*(v) disturbances. To this end, consider a prefix that is obtained by extending another one once. If the extension consists of taking a disturbance edge, then the rank of the last vertex strictly decreases. The only other possibility is the extension consisting of a finite play prefix that is consistent with the strategy τ; then, by construction, the rank of the last vertex does not increase.
Thus, there are at most r*(v) disturbances in the play, as the current rank decreases with every disturbance edge and does not increase with the other type of extension, but is always non-negative.
“≥”: Here, we construct a strategy σ for Player 0 that is r*(v)-resilient from every v ∈ dom(r*), i.e., from v, σ has to be winning even under fewer than r*(v) disturbances. As every strategy is 0-resilient, we only have to consider those v with r*(v) > 0.
The proof is based on the fact that r* is stable under both the disturbance update and the risk update, i.e., the disturbance update and the risk update of r* are r* itself, which yields the following properties. Let (v, v′) ∈ D be a disturbance edge such that v, v′ ∈ dom(r*). Then, we have r*(v′) ≥ r*(v) − 1. Also, for every v ∈ dom(r*) with r*(v) > 0, Player 0 has a winning strategy σ_v from v for the game (A, Win ∩ Safety({v′ ∈ dom(r*) | r*(v′) < r*(v)})) (note the strict inequality). Here, we apply the determinacy of this game, as the risk update is formulated in terms of Player 1's winning region.
Now, we define σ as follows: it always mimics a strategy σ_{v*} for some v* ∈ dom(r*), where v* is initialized by the starting vertex. The strategy σ_{v*} is mimicked until a consequential (w.r.t. σ_{v*}) disturbance edge is taken, say by reaching the vertex v′. In that case, the strategy discards the history of the play constructed so far, updates v* to v′, and begins mimicking σ_{v′}. This is repeated ad infinitum.
Now, consider a play that starts in some v ∈ dom(r*), is consistent with σ, and has fewer than r*(v) disturbances. The part up to the first consequential disturbance edge (if one exists at all) is consistent with σ_v. Now, let (v_1, v_1′) be the corresponding disturbance edge. Then, we have r*(v_1) ≥ r*(v), as σ_v is winning for the safety condition and hence never visits vertices with a rank smaller than r*(v). Thus, we conclude r*(v_1′) ≥ r*(v) − 1. Similarly, the part between the first and the second consequential disturbance edge (if it exists at all) is consistent with σ_{v_1′}. Again, if (v_2, v_2′) is the corresponding disturbance edge, then we have r*(v_2′) ≥ r*(v_1′) − 1. Continuing this reasoning shows that the fewer than r*(v) (consequential) disturbance edges always lead to a vertex with rank at least one, as the rank is decreased by at most one for every disturbance edge. The suffix starting in the last such vertex is disturbance-free and consistent with the corresponding strategy. Hence, the suffix satisfies Win, i.e., by prefix-independence, the whole play satisfies Win as well. Thus, σ is indeed r*(v)-resilient from every v ∈ dom(r*).
2.) Let v ∉ dom(r*). The disturbance update of r* being r* implies that every disturbance edge starting in V \ dom(r*) leads back to V \ dom(r*). Similarly, the risk update of r* being r* implies v ∉ W_k for every k. Thus, from every v ∉ dom(r*), Player 0 has a strategy σ_v such that every disturbance-free play that starts in v and is consistent with σ_v satisfies the winning condition and never leaves V \ dom(r*). Using these properties, we construct a strategy σ that is ω-resilient from every v ∉ dom(r*), which implies r_G(v) ≥ ω.
The definition of the strategy σ here is similar to the one above yielding the lower bound on the resilience. Again, σ always mimics a strategy σ_{v*} for some v* ∉ dom(r*), where v* is initialized by the starting vertex. The strategy σ_{v*} is mimicked until a consequential (w.r.t. σ_{v*}) disturbance edge is taken, say by reaching the vertex v′. In that case, the strategy discards the history of the play constructed so far, updates v* to v′, and begins mimicking σ_{v′}. This is repeated ad infinitum.
Due to the properties of the disturbance edges and the strategies σ_{v*}, such a play never leaves V \ dom(r*), even if disturbances occur. Furthermore, if only finitely many disturbances occur, then the resulting play has a disturbance-free suffix that starts in some v′ ∉ dom(r*) and is consistent with σ_{v′}. As σ_{v′} is winning from v′, this suffix satisfies Win. Hence, by prefix-independence of Win, the whole play also satisfies Win. Thus, σ is indeed an ω-resilient strategy from every v ∉ dom(r*). ∎
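The reset construction used in both parts of the proof can be sketched as a strategy wrapper (ours, with an assumed interface: each σ_v maps a history tuple to the prescribed next vertex; for simplicity the sketch treats every vertex as belonging to Player 0, so every observed vertex follows a prescription or a disturbance):

```python
def make_reset_strategy(strategies):
    """Mimic sigma_v for the vertex v where the play (re)started; on a
    consequential disturbance, i.e., when the observed vertex differs
    from the last prescription, discard the history and restart with the
    strategy of the vertex the disturbance led to."""
    state = {"anchor": None, "history": [], "prescribed": None}

    def sigma(vertex):
        if state["anchor"] is None or (
            state["prescribed"] is not None and vertex != state["prescribed"]
        ):
            # First move, or a consequential disturbance: reset.
            state["anchor"], state["history"] = vertex, []
        state["history"].append(vertex)
        state["prescribed"] = strategies[state["anchor"]](tuple(state["history"]))
        return state["prescribed"]

    return sigma
```

Inconsequential disturbances leave the wrapper's state unchanged by construction, matching the observation in Subsection 2.1 that a strategy cannot (and need not) observe them.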
We have for some .
3.2 Characterizing Vertices of Resilience ω + 1
Our goal in this subsection is to determine the vertices of resilience ω + 1, i.e., those from which Player 0 can win even under an infinite number of disturbances. Intuitively, in this setting, we give Player 1 control over the occurrence of disturbances, as he cannot execute more than infinitely many disturbances during a play. To this end, consider again the parity game in Figure 1 (on Page 1). From , Player 0 wins even if Player 1 controls whether the disturbance edge is taken, as both of its endpoints have color zero. On the other hand, giving Player 1 control over the disturbance edges implies that he wins from , as he can use the disturbance edge incident to it infinitely often to move to a vertex of color one.