Synthesis of Admissible Shields

04/15/2019 ∙ by Laura Humphrey, et al. ∙ 0

Shield synthesis is an approach to enforce a set of safety-critical properties of a reactive system at runtime. A shield monitors the system and corrects any erroneous output values instantaneously. The shield deviates from the given outputs as little as it can and recovers to hand back control to the system as soon as possible. This paper takes its inspiration from a case study on mission planning for unmanned aerial vehicles (UAVs) in which k-stabilizing shields, which guarantee recovery in a finite time, could not be constructed. We introduce the notion of admissible shields, which improves k-stabilizing shields in two ways: (1) whereas k-stabilizing shields take an adversarial view on the system, admissible shields take a collaborative view. That is, if there is no shield that guarantees recovery within k steps regardless of system behavior, the admissible shield will attempt to work with the system to recover as soon as possible. (2) Admissible shields can handle system failures during the recovery phase. In our experimental results we show that for UAVs, we can generate admissible shields, even when k-stabilizing shields do not exist.




1 Introduction

Technological advances enable the development of increasingly sophisticated systems. Smaller and faster microprocessors, wireless networking, and new theoretical results in areas such as machine learning and intelligent control are paving the way for transformative technologies across a variety of domains – self-driving cars that have the potential to reduce accidents, traffic, energy consumption, and pollution; and unmanned systems that can safely and efficiently operate on land, under water, in the air, and in space. However, in each of these domains, concerns about safety are being raised [16], [7]. Specifically, there is a concern that due to the complexity of such systems, traditional test and evaluation approaches will not be sufficient for finding errors, and alternative approaches such as those provided by formal methods are needed [17].

Formal methods are often used to verify systems at design time, but this is not always realistic. Some systems are simply too large to be fully verified. Others, especially systems that operate in rich dynamic environments or those that continuously adapt their behavior through methods such as machine learning, cannot be fully modeled at design time. Still others may incorporate components that have not been previously verified and cannot be modeled, e.g., proprietary components or pre-compiled code libraries.

Also, even systems that have been fully verified at design time may be subject to external faults such as those introduced by unexpected hardware failures or human inputs. One way to address this issue is to model nondeterministic behaviours (such as faults) as disturbances, and verify the system with respect to this disturbance model [18]. However, it is impossible to model all potential unexpected behavior at design time.

An alternative in such cases is to perform runtime verification to detect violations of a set of specified properties while a system is executing [14]. An extension of this idea is to perform runtime enforcement of specified properties, in which violations are not only detected but also overwritten in a way that specified properties are maintained.

A general approach for runtime enforcement of specified properties is shield synthesis, in which a shield monitors the system and instantaneously overwrites incorrect outputs. A shield must ensure both correctness, i.e., it corrects system outputs such that all properties are always satisfied, as well as minimum deviation, i.e., it deviates from system outputs only if necessary and as rarely as possible. The latter requirement is important because the system may satisfy additional noncritical properties that are not considered by the shield but should be retained as much as possible.

Bloem et al. [4] proposed the notion of k-stabilizing shields. Since we are given a safety specification, we can identify wrong outputs, that is, outputs after which the specification is violated (more precisely: after which the environment can force the specification to be violated). A wrong trace is then a trace that ends in a wrong output. The idea of shields is that they may modify the outputs so that the specification always holds, but that such deviations last for at most k consecutive steps after a wrong output. If a second violation happens during the k-step recovery phase, the shield enters a mode where it only enforces correctness, but no longer minimizes the deviation. This proposed approach has two limitations with significant impact in practice. (1) The k-stabilizing shield synthesis problem is unrealizable for many safety-critical systems, because a finite number of deviations cannot be guaranteed. (2) k-stabilizing shields make the assumption that there are no further system errors during the recovery phase.

In this paper, we introduce admissible shields, which overcome the two issues of k-stabilizing shields. To address shortcoming (1), we guarantee the following: (a) Admissible shields are subgame optimal. That is, for any wrong trace, if there is a finite number k of steps within which the recovery phase can be guaranteed to end, the shield will always achieve this. (b) The shield is admissible, that is, if there is no such number k, it always picks a deviation that is optimal in that it ends the recovery phase as soon as possible for some possible future inputs. (This is defined in more detail below.) As a result, admissible shields work well in settings in which finite recovery cannot be guaranteed, because they guarantee correctness and may well end the recovery period if the system does not pick adversarial outputs. To address shortcoming (2), admissible shields allow arbitrary failure frequencies and in particular failures that arrive during recovery, without losing the ability to recover.

As a second contribution, we demonstrate the use of admissible shields through a case study involving mission planning for an unmanned aerial vehicle (UAV). Manually creating and executing mission plans that meet mission objectives while addressing all possible contingencies is a complex and error-prone task. Therefore, having a shield that changes the mission only if absolutely necessary to enforce certain safety properties has the potential to lower the burden on human operators, and ensures safety during mission execution. We show that admissible shields are applicable in this setting, whereas k-stabilizing shields are not.

Related Work: Our work builds on synthesis of reactive systems [20], [3] and reactive mission plans [9] from formal specifications, and our method is related to synthesis of robust [1] and error-resilient [10] systems. However, our approach differs in that we do not synthesize an entire system, but rather a shield that considers only a small set of properties and corrects the output of the system at runtime. Li et al. [15] focused on the problem of synthesizing a semi-autonomous controller that expects occasional human intervention for correct operation. A human-in-the-loop controller monitors past and current information about the system and its environment. The controller invokes the human operator only when necessary, but as soon as a specification violation is predicted ahead of time, so that the human operator has sufficient time to respond. Similarly, our shields monitor the behavior of systems at runtime, and interfere as little as possible. Our work relates to more general work on runtime enforcement of properties [12], but shield synthesis [4] is the first appropriate work for reactive systems, since shields act on erroneous system outputs immediately without delay. While [4] focuses on shield synthesis for systems assumed to make no more than one error every k steps, this work assumes only that systems generally have cooperative behavior with respect to the shield, i.e., the shield ensures a finite number of deviations if the system chooses certain outputs. This is similar in concept to cooperative synthesis as considered in [2], in which a synthesized system has to satisfy a set of properties (called guarantees) only if certain environment assumptions hold. The authors present a synthesis procedure that maximizes the cooperation between system and environment for satisfying both guarantees and assumptions as far as possible.

Outline: In what follows, we begin in Section 2 by motivating the need for admissible shields through a case study involving mission planning for a UAV. In Sections 3, 4, 5, we define preliminary concepts, review the general shield synthesis framework, and describe our approach for synthesizing admissible shields. Section 6 provides experimental results, and Section 7 concludes.

2 Motivating Example

In this section, we apply shields to a scenario in which a UAV must maintain certain properties while performing a surveillance mission in a dynamic environment. We show how a shield can be used to enforce the desired properties, where a human operator in conjunction with a lower-level autonomous planner is considered as the reactive system that sends commands to the UAV’s autopilot. We discuss how we would intuitively want a shield to behave in such a situation. We show that admissible shields provide the desired behaviors and address the limitations of k-stabilizing shields.

To begin, note that a common UAV control architecture consists of a ground control station that communicates with an autopilot onboard the UAV [5]. The ground control station receives and displays updates from the autopilot on the UAV’s state, including position, heading, airspeed, battery level, and sensor imagery. It can also send commands to the UAV’s autopilot, such as waypoints to fly to. A human operator can then use the ground control station to plan waypoint-based routes for the UAV, possibly making modifications during mission execution to respond to events observed through the UAV’s sensors. However, mission planning and execution can be very workload intensive, especially when operators are expected to control multiple UAVs simultaneously [8]. To address this issue, methods for UAV command and control have been explored in which operators issue high-level commands, and automation carries out low-level execution details.

Several errors can occur in this type of human-automation paradigm [6]. For instance, in issuing high-level commands to the low-level planner, a human operator might neglect required safety properties due to high workload, fatigue, or an incomplete understanding of exactly how the autonomous planner might execute the command. The planner might also neglect these safety properties either because of software errors or by design. Waypoint commands issued by the operator or planner could also be corrupted by software that translates waypoint messages between ground station and autopilot specific formats or during transmission over the communication link.

As the mission unfolds, waypoint commands will be sent periodically to the autopilot. If a waypoint that violates the properties is received, a shield that monitors the system inputs and can overwrite the waypoint outputs to the autopilot would be able to make corrections to ensure the satisfaction of the desired properties.

Consider the mission map in Fig. 1 [13], which contains three tall buildings (illustrated as blue blocks), over which a UAV should not attempt to fly. It also includes two unattended ground sensors (UGS) that provide data on possible nearby targets, one at location and one at , as well as two locations of interest, and . The UAV can monitor , , and from several nearby vantage points. The map also contains a restricted operating zone (ROZ), illustrated with a red box, in which flight might be dangerous, and the path of a possible adversary that should be avoided (the pink dashed line). Inside the communication relay region (large green area), communication links are highly reliable. Outside this region, communication relies on relay points with lower reliability.

Figure 1: A map for UAV mission planning.

Given this scenario, properties of interest include:

  1. Connected waypoints. The UAV is only allowed to fly to directly connected waypoints.

  2. No communication. The UAV is not allowed to stay in a location with reduced communication reliability.

  3. Restricted operating zones. The UAV has to leave a ROZ within 2 time steps.

  4. Detected by an adversary. Locations on the adversary’s path cannot be visited more than once over any window of 3 time steps.

  5. UGS. If a UGS reports a possible nearby target, the UAV should visit a respective waypoint within 7 steps (for visit , for visit , , , or ).

  6. Go home. Once the UAV’s battery is low, it should return to a designated landing site at within 10 time steps.
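To make the bounded-time flavor of these properties concrete, the following minimal Python sketch checks Properties 3 and 4 on a finite waypoint trace. It is an illustration only: the waypoint numbers, the sets `roz` and `adversary_path`, and the reading of "within 2 time steps" as "at most two consecutive steps inside the ROZ" are assumptions made here, not details taken from the map in Fig. 1.

```python
def violates_roz(trace, roz, max_stay=2):
    """True if the UAV stays inside the ROZ for more than max_stay consecutive steps."""
    consecutive = 0
    for waypoint in trace:
        consecutive = consecutive + 1 if waypoint in roz else 0
        if consecutive > max_stay:
            return True
    return False


def violates_adversary(trace, adversary_path, window=3):
    """True if some window of `window` consecutive steps visits the adversary's path twice."""
    for start in range(len(trace)):
        visits = sum(1 for waypoint in trace[start:start + window]
                     if waypoint in adversary_path)
        if visits > 1:
            return True
    return False


# Hypothetical waypoint identifiers, chosen only for this example.
roz = {5}
adversary_path = {7}

ok_roz = violates_roz([1, 5, 5, 2], roz)              # leaves after two steps
bad_roz = violates_roz([1, 5, 5, 5], roz)             # third consecutive step in the ROZ
ok_adv = violates_adversary([7, 1, 2, 7], adversary_path)
bad_adv = violates_adversary([7, 1, 7], adversary_path)
```

Each check only needs a bounded amount of history, which is why such properties can be represented by finite safety automata.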

The task of the shield is to ensure these properties during operation. In this setting, the operator in conjunction with a lower-level planner acts as a reactive system that responds to mission-relevant inputs; in this case data from the UGSs and a signal indicating whether the battery is low. In each step, the next waypoint is sent to the autopilot, which is encoded in a bit representation via outputs , , , and . We attach the shield as shown in Fig. 2. The shield monitors mission inputs and waypoint outputs, correcting outputs immediately if a violation of the safety properties becomes unavoidable.

We represent each of the properties by a safety automaton, the product of which serves as the shield specification. Fig. 3 models the “connected waypoints” property, where each state represents a waypoint with the same number. Edges are labeled by the values of the variables . For example, the edge leading from state to state is labeled by . For clarity, we drop the labels of edges in Fig. 3. The automaton also includes an error state, which is not shown. Missing edges lead to this error state, denoting forbidden situations.

How should a shield behave in this scenario? If the human operator wants to monitor a location in a ROZ, he or she would like to simply command the UAV to “monitor the location in the ROZ and stay there”, with the planner handling the execution details. If the planner cannot do this while meeting all the safety properties, it is appropriate for the shield to revise its outputs. Yet, the operator would still expect his or her commands to be followed to the maximum extent possible, leaving the ROZ when necessary and returning whenever possible. Thus, the shield should minimize deviations from the operator’s directives as executed by the planner.

Figure 2: The interaction between the operator/planner (acting as a reactive system) and the shield.

Figure 3: Safety automaton of Property 1 over the map in Fig. 1.

Using a k-stabilizing shield. As a concrete example, assume the UAV is currently at , and the operator commands it to monitor . The planner then sends commands to fly to then , which are accepted by the shield. The planner then sends a command to loiter at , but the shield must overwrite it to maintain Property 3, which requires the UAV to leave the ROZ within two time steps. The shield instead commands the UAV to go to . Suppose the operator then commands the UAV to fly to , while the planner is still issuing commands as if the UAV is at . The planner then commands the UAV to fly to , but since the actual UAV cannot fly from to directly, the shield directs the UAV to on its way to . The operator might then respond to a change in the mission and command the UAV to fly back to , and the shield again deviates from the route assumed by the planner, and directs the UAV back to , and so on. Therefore, a single specification violation can lead to an infinitely long deviation between the UAV’s actual position and the UAV’s assumed position. A k-stabilizing shield is allowed to deviate from the planner’s commands for at most k consecutive time steps. Hence, no k-stabilizing shield exists.

Using an admissible shield. Recall the situation in which the shield caused the actual position of the UAV to “fall behind” the position assumed by the planner, so that the next waypoint the planner issues is two or more steps away from the UAV’s current waypoint position. The shield should then implement a best-effort strategy to “synchronize” the UAV’s actual position with that assumed by the planner. Though this cannot be guaranteed, the operator and planner are not adversarial towards the shield, so it will likely be possible to achieve this re-synchronization, for instance when the UAV goes back to a previous waypoint or remains at the current waypoint for several steps. This possibility motivates the concept of an admissible shield. Assume that the actual position of the UAV is and its assumed position is . If the operator commands the UAV to loiter at , the shield will be able to catch up with the state assumed by the planner and to end the deviation by the next specification violation.

3 Preliminaries

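We denote the Boolean domain by $\mathbb{B} = \{\text{true}, \text{false}\}$, the set of natural numbers by $\mathbb{N}$, and abbreviate $\mathbb{N} \cup \{\infty\}$ by $\mathbb{N}^\infty$. We consider a reactive system with a finite set $I = \{i_1, \ldots, i_m\}$ of Boolean inputs and a finite set $O = \{o_1, \ldots, o_n\}$ of Boolean outputs. The input alphabet is $\Sigma_I = 2^I$, the output alphabet is $\Sigma_O = 2^O$, and $\Sigma = \Sigma_I \times \Sigma_O$. The set of finite (infinite) words over $\Sigma$ is denoted by $\Sigma^*$ ($\Sigma^\omega$), and $\Sigma^\infty = \Sigma^* \cup \Sigma^\omega$. We will also refer to words as (execution) traces. We write $|\overline{\sigma}|$ for the length of a trace $\overline{\sigma} \in \Sigma^*$. For $\overline{\sigma_I} = x_0 x_1 \ldots \in \Sigma_I^\infty$ and $\overline{\sigma_O} = y_0 y_1 \ldots \in \Sigma_O^\infty$, we write $\overline{\sigma_I} \| \overline{\sigma_O}$ for the composition $(x_0, y_0)(x_1, y_1) \ldots \in \Sigma^\infty$. A set $L \subseteq \Sigma^\infty$ of words is called a language. We denote the set of all languages as $\mathcal{L} = 2^{\Sigma^\infty}$.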

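Reactive Systems. A Mealy machine (reactive system, design) is a 6-tuple $\mathcal{D} = (Q, q_0, \Sigma_I, \Sigma_O, \delta, \lambda)$, where $Q$ is a finite set of states, $q_0 \in Q$ is the initial state, $\delta: Q \times \Sigma_I \rightarrow Q$ is a complete transition function, and $\lambda: Q \times \Sigma_I \rightarrow \Sigma_O$ is a complete output function. Given the input trace $\overline{\sigma_I} = x_0 x_1 \ldots \in \Sigma_I^\infty$, the system $\mathcal{D}$ produces the output trace $\overline{\sigma_O} = \mathcal{D}(\overline{\sigma_I}) = \lambda(q_0, x_0) \lambda(q_1, x_1) \ldots \in \Sigma_O^\infty$, where $q_{i+1} = \delta(q_i, x_i)$ for all $i \geq 0$. The set of words produced by $\mathcal{D}$ is denoted $L(\mathcal{D}) = \{\overline{\sigma_I} \| \overline{\sigma_O} \in \Sigma^\infty \mid \mathcal{D}(\overline{\sigma_I}) = \overline{\sigma_O}\}$.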
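The way a Mealy machine turns an input trace into an output trace can be sketched in a few lines of Python. This is a minimal illustration, not part of the original construction: the class name `Mealy`, the use of plain Python callables for the transition and output functions, and the toy machine at the end are all assumptions made for this example.

```python
from dataclasses import dataclass
from typing import Any, Callable, Iterable, List


@dataclass
class Mealy:
    initial: Any                          # initial state
    delta: Callable[[Any, Any], Any]      # transition function: (state, input) -> state
    lam: Callable[[Any, Any], Any]        # output function: (state, input) -> output

    def run(self, inputs: Iterable[Any]) -> List[Any]:
        """Produce the output trace for a finite input trace."""
        state, outputs = self.initial, []
        for x in inputs:
            outputs.append(self.lam(state, x))   # output depends on current state and input
            state = self.delta(state, x)         # then the state is updated
        return outputs


# Toy machine: the state is one bit that flips whenever the input is True;
# the output is the state before the flip.
toggle = Mealy(initial=False,
               delta=lambda q, x: q != x,
               lam=lambda q, x: q)

outputs = toggle.run([True, False, True, True])
```

In each step the output is computed from the current state and the current input before the state is updated, which is the property that lets a shield react to an output in the same time step.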

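Let $\mathcal{D} = (Q, q_0, \Sigma_I, \Sigma_O, \delta, \lambda)$ and $\mathcal{D}' = (Q', q_0', \Sigma, \Sigma_O, \delta', \lambda')$ be reactive systems. A serial composition of $\mathcal{D}$ and $\mathcal{D}'$ is realized if the input and output of $\mathcal{D}$ are fed to $\mathcal{D}'$. We denote such composition as $\mathcal{D} \circ \mathcal{D}' = (\hat{Q}, \hat{q_0}, \Sigma_I, \Sigma_O, \hat{\delta}, \hat{\lambda})$, where $\hat{Q} = Q \times Q'$, $\hat{q_0} = (q_0, q_0')$, $\hat{\delta}((q, q'), \sigma_I) = (\delta(q, \sigma_I), \delta'(q', (\sigma_I, \lambda(q, \sigma_I))))$, and $\hat{\lambda}((q, q'), \sigma_I) = \lambda'(q', (\sigma_I, \lambda(q, \sigma_I)))$.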

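Specifications. A specification $\varphi$ defines a set $L(\varphi) \subseteq \Sigma^\infty$ of allowed traces. $\mathcal{D}$ realizes $\varphi$, denoted by $\mathcal{D} \models \varphi$, iff $L(\mathcal{D}) \subseteq L(\varphi)$. A specification $\varphi$ is realizable if there exists a design $\mathcal{D}$ that realizes it. A safety specification $\varphi^s$ is represented by an automaton $\varphi^s = (Q, q_0, \Sigma, \delta, F)$, where $\Sigma = \Sigma_I \times \Sigma_O$, $\delta: Q \times \Sigma \rightarrow Q$, and $F \subseteq Q$ is a set of safe states. The run induced by trace $\overline{\sigma} = \sigma_0 \sigma_1 \ldots \in \Sigma^\infty$ is the state sequence $\overline{q} = q_0 q_1 \ldots$ such that $q_{i+1} = \delta(q_i, \sigma_i)$; the run is accepting if $q_i \in F$ for all $i \geq 0$. Trace $\overline{\sigma}$ (of a design $\mathcal{D}$) satisfies $\varphi^s$ if the induced run is accepting. The language $L(\varphi^s)$ is the set of all traces satisfying $\varphi^s$.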

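Games. A (2-player, alternating) game is a tuple $\mathcal{G} = (G, g_0, \Sigma_I, \Sigma_O, \delta, \mathit{win})$, where $G$ is a finite set of game states, $g_0 \in G$ is the initial state, $\delta: G \times \Sigma_I \times \Sigma_O \rightarrow G$ is a complete transition function, and $\mathit{win}: G^\omega \rightarrow \mathbb{B}$ is a winning condition. The game is played by two players: the system and the environment. In every state $g \in G$ (starting with $g_0$), the environment first chooses an input letter $\sigma_I \in \Sigma_I$, and then the system chooses some output letter $\sigma_O \in \Sigma_O$. This defines the next state $g' = \delta(g, \sigma_I, \sigma_O)$, and so on. Thus, a (finite or infinite) word over $\Sigma$ results in a (finite or infinite) play, a sequence $\overline{g} = g_0 g_1 \ldots$ of game states. A play is won by the system iff $\mathit{win}(\overline{g})$ is true. A safety game defines $\mathit{win}$ via a set $F^g \subseteq G$ of safe states: $\mathit{win}(\overline{g})$ is true iff $g_i \in F^g$ for all $i \geq 0$, i.e., if only safe states are visited. Let $\inf(\overline{g})$ denote the states that occur infinitely often in $\overline{g}$. A Büchi game defines $\mathit{win}$ via a set $F^g \subseteq G$ of accepting states: $\mathit{win}(\overline{g})$ is true iff $\inf(\overline{g}) \cap F^g \neq \emptyset$.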

It is easy to transform a safety specification into a safety game such that a trace satisfies the specification iff the corresponding play is won. Given a safety specification $\varphi$, a finite trace $\overline{\sigma} \in \Sigma^*$ is wrong if the corresponding play is not won, i.e., if there is no way for the system to guarantee that any extension of the trace satisfies the specification. An output is called wrong if it makes a trace wrong; i.e., given $\varphi$, a trace $\overline{\sigma} \in \Sigma^*$, an input $\sigma_I \in \Sigma_I$, and an output $\sigma_O \in \Sigma_O$, $\sigma_O$ is wrong iff $\overline{\sigma}$ is not wrong, but $\overline{\sigma} \cdot (\sigma_I, \sigma_O)$ is.

A deterministic (memoryless) strategy for the environment is a function $\rho_e: G \rightarrow \Sigma_I$. A deterministic (memoryless) strategy for the system is a function $\rho_s: G \times \Sigma_I \rightarrow \Sigma_O$. A strategy $\rho_s$ is winning for the system if, for all strategies $\rho_e$ of the environment, the play $\overline{g}$ that is constructed when defining the outputs using $\rho_e$ and $\rho_s$ satisfies $\mathit{win}(\overline{g})$. The winning region $W$ is the set of states from which a winning strategy exists. A strategy $\rho_s$ is cooperatively winning if there exists a strategy $\rho_e$ such that the play $\overline{g}$ constructed by $\rho_e$ and $\rho_s$ satisfies $\mathit{win}(\overline{g})$.
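For safety games, the winning region can be computed by the standard greatest-fixpoint iteration: start from the safe states and repeatedly remove every state in which the environment has some input for which no system output stays inside the current candidate set. The Python sketch below illustrates this under the turn order described above (environment first, then system). The dictionary encoding of the transition function and the three-state toy game are assumptions made for this example.

```python
def winning_region(inputs, outputs, delta, safe):
    """Greatest fixpoint: states from which the system can stay in `safe` forever.

    delta maps (state, input, output) -> next state and must be complete.
    """
    win = set(safe)
    while True:
        # Keep g only if, for every input, some output leads back into `win`.
        shrunk = {g for g in win
                  if all(any(delta[(g, i, o)] in win for o in outputs)
                         for i in inputs)}
        if shrunk == win:
            return win
        win = shrunk


# Toy game (hypothetical): state 2 is unsafe and absorbing.
inputs, outputs = (0, 1), (0, 1)
delta = {}
for i in inputs:
    delta[(0, i, 0)] = 0          # in state 0 the system can always stay put
    delta[(0, i, 1)] = 1
    for o in outputs:
        delta[(1, i, o)] = 2 if i == 1 else 1   # in state 1 the environment can force 2
        delta[(2, i, o)] = 2

region = winning_region(inputs, outputs, delta, safe={0, 1})
```

State 1 is safe but not winning, since the environment can force the play into the unsafe state in one step. In the terminology above, an output that moves the play from state 0 to state 1 is a wrong output even though no unsafe state has been visited yet.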

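For a Büchi game $\mathcal{G}$ with accepting states $F$, consider a strategy $\rho_e$ of the environment, a strategy $\rho_s$ of the system, and a state $g \in G$. We set the distance $\mathit{dist}(g, \rho_e, \rho_s) = k$ if the play $\overline{g}$ defined by $\rho_e$ and $\rho_s$ reaches from $g$ an accepting state that occurs infinitely often in $\overline{g}$ in $k$ steps. If no such state is visited, we set $\mathit{dist}(g, \rho_e, \rho_s) = \infty$. Given two strategies $\rho_s$ and $\rho_s'$ of the system, we say that $\rho_s'$ dominates $\rho_s$ if: (i) for all $\rho_e$ and all $g \in G$, $\mathit{dist}(g, \rho_e, \rho_s') \leq \mathit{dist}(g, \rho_e, \rho_s)$, and (ii) there exist $\rho_e$ and $g \in G$ such that $\mathit{dist}(g, \rho_e, \rho_s') < \mathit{dist}(g, \rho_e, \rho_s)$.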

A strategy $\rho_s$ is admissible if there is no strategy $\rho_s'$ that dominates it.

4 Admissible Shields

Bloem et al. [4] presented the general framework for shield synthesis. A shield has two main properties: (i) For any design, a shield ensures correctness with respect to a specification. (ii) A shield ensures minimum deviation. We revisit these properties in Sec. 4.1. The definition of minimum deviation is designed to be flexible and different notions of minimum deviation can be realized. k-stabilizing shields represent one concrete realization. In Sec. 4.2, we present a new realization of the minimum deviation property resulting in admissible shields.

4.1 Definition of Shields

A shield reads the input and output of a design as shown in Fig. 2. We then address the two properties, correctness and minimum deviation, to be ensured by a shield.

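The Correctness Property. With correctness we refer to the property that the shield corrects any design’s output such that a given safety specification is satisfied. Formally, let $\varphi$ be a safety specification and $\mathcal{S} = (Q', q_0', \Sigma, \Sigma_O, \delta', \lambda')$ be a Mealy machine. We say that $\mathcal{S}$ ensures correctness if for any design $\mathcal{D} = (Q, q_0, \Sigma_I, \Sigma_O, \delta, \lambda)$, it holds that $(\mathcal{D} \circ \mathcal{S}) \models \varphi$.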

Since a shield must work for any design, the synthesis procedure does not need to consider the design’s implementation. This property is crucial because the design may be unknown or too complex to analyze. On the other hand, the design may satisfy additional (noncritical) specifications that are not specified in $\varphi$ but should be retained as much as possible.

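The Minimum Deviation Property. Minimum deviation requires a shield to deviate only if necessary, and as infrequently as possible. To ensure minimum deviation, a shield can only deviate from the design if a property violation becomes unavoidable. Given a safety specification $\varphi$, a Mealy machine $\mathcal{S}$ does not deviate unnecessarily if for any design $\mathcal{D}$ and any trace $\overline{\sigma_I} \| \overline{\sigma_O}$ of $\mathcal{D}$ that is not wrong, we have that $\mathcal{S}(\overline{\sigma_I} \| \overline{\sigma_O}) = \overline{\sigma_O}$. In other words, if $\mathcal{D}$ does not violate $\varphi$, $\mathcal{S}$ keeps the output of $\mathcal{D}$ intact.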

A Mealy machine $\mathcal{S}$ is a shield if $\mathcal{S}$ ensures correctness and does not deviate unnecessarily.
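The two defining properties can be seen in a deliberately naive Python sketch for a single property in the style of "connected waypoints". The adjacency map, the waypoint numbers, and the rule of correcting to the smallest allowed waypoint are assumptions made for this illustration. The sketch enforces correctness and passes allowed outputs through unchanged, but it makes no attempt to recover quickly, so it is not an admissible shield in the sense defined in the next section.

```python
def shield_outputs(start, proposals, adjacency):
    """Forward each proposed waypoint if it is allowed, otherwise substitute one.

    adjacency maps a waypoint to the set of waypoints reachable in one step.
    """
    current, sent = start, []
    for proposed in proposals:
        allowed = adjacency[current]
        # No deviation when the design's output is allowed; otherwise pick an
        # arbitrary but deterministic allowed waypoint.
        chosen = proposed if proposed in allowed else min(allowed)
        sent.append(chosen)
        current = chosen
    return sent


# Hypothetical three-waypoint line graph with loitering allowed everywhere.
ADJ = {1: {1, 2}, 2: {1, 2, 3}, 3: {2, 3}}

proposals = [2, 3, 1, 1]          # the third command jumps from 3 to 1
sent = shield_outputs(1, proposals, ADJ)
```

Only the third command is overwritten. After the correction the UAV is at waypoint 2, from which the fourth command happens to be allowed again, so the deviation ends after one step in this particular run.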

Ideally, shields end phases of deviations as soon as possible, recovering quickly. This property leaves room for interpretation. Different types of shields differentiate on how this property is realized.

4.2 Defining Admissible Shields

In this section we define admissible shields using their speed of recovery. We distinguish between two situations. In states of the design in which a finite number of deviations can be guaranteed, an admissible shield takes an adversarial view on the design: it guarantees recovery within k steps regardless of system behavior, for the smallest k possible. In these states, the strategy of an admissible shield conforms to the strategy of a k-stabilizing shield. In all other states, admissible shields take a collaborative view: the admissible shield will attempt to work with the design to recover as soon as possible. In particular, an admissible shield plays an admissible strategy, that is, a strategy that cannot be beaten in recovery speed if the design acts cooperatively.

We will now define admissible shields. For failures of the system that are corrected by the shield, we consider four phases:

  1. The innocent phase consisting of inputs and outputs , in which no failure occurs; i.e., .

  2. The misstep phase consisting of an input and a wrong output ; i.e., .

  3. The deviation phase consisting of inputs and outputs in which the shield is allowed to deviate, and for a correct output we have .

  4. The final phase consisting of and in which the shield does not deviate, and .

Adversely k-stabilizing shields have a deviation phase of length at most k.

Definition 1

A shield adversely k-stabilizes a trace , if for any input and any wrong output , for any correct output and for any correct trace there exists a trace such that for any trace such that , we have


Note that it is not always possible to adversely k-stabilize a shield for a given k or even for any k.

Definition 2 (Adversely k-Stabilizing Shields [4])

A shield is adversely k-stabilizing if it adversely k-stabilizes any finite trace.

An adversely k-stabilizing shield guarantees to end deviations after at most k steps and produces a correct trace under the assumption that the failure of the design consists of a transmission error in the sense that the wrong letter is substituted for a correct one. We use the term adversely to emphasize that finitely long deviations are guaranteed for any future inputs and outputs of the design.

Definition 3 (Adversely Subgame Optimal Shield)

A shield $\mathcal{S}$ is adversely subgame optimal if for any trace $\overline{\sigma} \in \Sigma^*$, $\mathcal{S}$ adversely $k$-stabilizes $\overline{\sigma}$ and there exists no shield that adversely $l$-stabilizes $\overline{\sigma}$ for any $l < k$.

An adversely subgame optimal shield guarantees to deviate in response to an error for at most k time steps, for the smallest k possible.

Definition 4

A shield collaboratively k-stabilizes a trace , if for any input and any wrong output , there exists a correct output , a correct trace , and a trace such that for any trace such that , we have


Definition 5 (Collaborative k-Stabilizing Shield)

A shield is collaboratively k-stabilizing if it collaboratively k-stabilizes any finite trace.

A collaborative k-stabilizing shield requires that it must be possible to end deviations after k steps, for some future input and output of the design. It is not necessary that this is possible for all future behavior of the design, which allows infinitely long deviations.

Definition 6 (Collaborative Subgame Optimal Shield)

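A shield $\mathcal{S}$ is collaborative subgame optimal if for any trace $\overline{\sigma} \in \Sigma^*$, $\mathcal{S}$ collaboratively $k$-stabilizes $\overline{\sigma}$ and there exists no shield that adversely $l$-stabilizes $\overline{\sigma}$ for any $l < k$.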

Definition 7 (Admissible Shield)

A shield $\mathcal{S}$ is admissible if for any trace $\overline{\sigma}$, whenever there exists a $k$ and a shield $\mathcal{S}'$ such that $\mathcal{S}'$ adversely $k$-stabilizes $\overline{\sigma}$, then $\mathcal{S}$ adversely $k$-stabilizes $\overline{\sigma}$. If such a $k$ does not exist for trace $\overline{\sigma}$, then $\mathcal{S}$ collaboratively $k$-stabilizes $\overline{\sigma}$ for a minimal $k$.

An admissible shield ends deviations whenever possible. In all states of the design where a finite number of deviations can be guaranteed, an admissible shield deviates for each violation for at most k steps, for the smallest k possible. In all other states, the shield corrects the output in such a way that there exist design inputs and outputs such that deviations end after k steps, for the smallest k possible.

5 Synthesizing Admissible Shields

The flow of the synthesis procedure is illustrated in Fig. 4. Starting from a safety specification $\varphi = (Q, q_0, \Sigma, \delta, F)$ with $\Sigma = \Sigma_I \times \Sigma_O$, the admissible shield synthesis procedure consists of five steps.

Figure 4: Outline of our admissible shield synthesis procedure.

5.0.1 Step 1. Constructing the Violation Monitor .

From we build the automaton to monitor property violations by the design. The goal is to identify the latest point in time from which a specification violation can still be corrected with a deviation by the shield. This constitutes the start of the recovery period, in which the shield is allowed to deviate from the design. In this phase the shield monitors the design from all states that the design could reach under the current input and a correct output. A second violation occurs only if the next design’s output is inconsistent with all states that are currently monitored. In case of a second violation, the shield monitors the set of all input-enabled states that are reachable from the current set of monitored states.

The first phase of the construction of the violation monitor considers as a safety game and computes its winning region so that every reactive system must produce outputs such that the next state of stays in . Only in cases in which the next state of is outside of the shield is allowed to interfere.

The second phase expands the state space to via a subset construction, with the following rationale. If the design makes a mistake (i.e., picks outputs such that enters a state ), we have to “guess” what the design actually meant to do and we consider all output letters that would have avoided leaving and continue monitoring the design from all the corresponding successor states in parallel. Thus, is essentially a subset construction of , where a state of represents a set of states in .

The third phase expands the state space of the violation monitor by adding a counter $d \in \{0, 1, 2\}$ and an output variable $z$. Initially $d$ is 0. Whenever a property is violated, $d$ is set to 2. If $d \geq 1$, the shield is in the recovery phase and can deviate. If $d = 1$ and there is no other violation, $d$ is decremented to 0. In order to decide when to decrement $d$ from 2 to 1, we add the output $z$ to the shield. If this output is set to true and $d = 2$, then $d$ is set to 1.

The final violation monitor is , with the set of states , the initial state , the input/output alphabet with , and the next-state function , which obeys the following rules:

  1. if , and

  2. if , and , and if is then , else .

Our construction sets whenever the design leaves the winning region, and not when it enters an unsafe state. Hence, the shield can take a remedial action as soon as “the crime is committed”, before the damage is detected, which would have been too late to correct the erroneous outputs of the design.
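The counter update described in the third phase can be written as a small function. In this Python sketch the names `d` (the counter) and `z` (the extra shield output) are labels chosen for illustration, and the Boolean `violation` flag stands for "the design has just left the winning region". It is a sketch of the update rule as stated in the text, not of the full monitor, which also tracks the subset of specification states.

```python
def next_counter(d, violation, z):
    """Update the recovery counter d in {0, 1, 2}.

    violation: the design's current output is a (new) property violation.
    z: shield output signalling that the counter may drop from 2 to 1.
    """
    if violation:
        return 2                  # every violation (re)starts the recovery phase
    if d == 2:
        return 1 if z else 2      # stay at 2 until the shield sets z
    return 0                      # d == 1 or d == 0: recovery ends or stays ended


def run_counter(events, d=0):
    """Apply next_counter along a list of (violation, z) pairs."""
    history = []
    for violation, z in events:
        d = next_counter(d, violation, z)
        history.append(d)
    return history


history = run_counter([(True, False), (False, False), (False, True), (False, False)])
```

In the example run the counter goes to 2 on the violation, stays at 2 while `z` is false, drops to 1 once `z` is set, and returns to 0 in the following step. A nonzero counter is exactly the condition under which the shield is allowed to deviate.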

Figure 5: Safety automaton of Example 1.

Figure 6: The deviation monitor .

Example 1. We illustrate the construction of the violation monitor using the specification from Fig. 5 over the outputs and . (Fig. 5 represents a safety automaton if we make all missing edges point to an (additional) unsafe state.) The winning region consists of all safe states, i.e., . The resulting violation monitor is . The transition relation is illustrated in Table 1 and lists the next states for all possible present states and outputs. Lightning bolts denote specification violations. The update of the counter $d$, which is not included in Table 1, is as follows: Whenever the design commits a violation, $d$ is set to 2. If no violation exists, $d$ is decremented in the following way: if $d = 1$ or $d = 0$, $d$ is set to 0. If $d = 2$ and $z$ is true, $d$ is set to 1, else $d$ remains 2. In this example, $z$ is set to true whenever we are positive about the current state of the design (i.e. in , , and ).

Let us take a closer look at some entries of Table 1. If the current state is and we observe the output , a specification violation occurs. We assume that meant to give an allowed output, either or . The shield continues to monitor both and ; thus, enters the state . If the next observation is , which is allowed from both possible current states, the possible next states are and , therefore traverses to state . However, if the next observation is again , which is neither allowed in nor in , we know that a second violation occurs. Therefore, the shield monitors the design from all three states and enters the state .

{F} {F} {F,S} {S}
{S} {T} {T} {T}
{T} {F} {F} {F}
{F,S} {F} {F,S,T} {S,T}
{S,T} {F,T} {F,T} {F,T}
{F,T} {F} {F,S,T} {F,S}
{F,S,T} {F} {F,S,T} {F,S,T}
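Table 1: Transition relation of the violation monitor of Example 1. The first column lists the present state; the remaining columns list the next state for each output.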

5.0.2 Step 2. Constructing the Deviation Monitor .

We build to monitor deviations between the shield and design outputs. Here, and iff . That is, if there is a deviation in the current time step, then will be in in the next time step. Otherwise, it will be in . This deviation monitor is shown in Fig. 6.
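The deviation monitor is a two-state automaton whose next state depends only on whether the design's output and the shield's output agree in the current step. A minimal Python sketch follows; the state labels `NO_DEVIATION` and `DEVIATION` and the list-based trace encoding are assumptions made for this illustration.

```python
NO_DEVIATION, DEVIATION = "no_deviation", "deviation"


def deviation_next(design_output, shield_output):
    """Next monitor state: DEVIATION iff the two outputs differ in this step."""
    return NO_DEVIATION if design_output == shield_output else DEVIATION


def run_deviation_monitor(design_outputs, shield_outputs):
    """Return the sequence of monitor states, starting in NO_DEVIATION."""
    states = [NO_DEVIATION]
    for design_out, shield_out in zip(design_outputs, shield_outputs):
        states.append(deviation_next(design_out, shield_out))
    return states


# Hypothetical outputs: the shield overrides only the third command.
states = run_deviation_monitor([2, 3, 1, 1], [2, 3, 2, 1])
```

Because the monitor records a deviation one step after it happens, the synthesis can constrain deviations through conditions on states alone.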

5.0.3 Step 3. Constructing and Solving the Safety Game

Given the automata and and the safety automaton , we construct a safety game , which is the synchronous product of , , and , such that is the state space, is the initial state, is the input of the shield, is the output of the shield, is the next-state function, and is the set of safe states such that

and .

We require , which ensures that the output of the shield satisfies , and that the shield can only deviate in the recovery period (i.e., if , no deviation is allowed). We use standard algorithms for safety games (cf. [11]) to compute the winning region and the most permissive non-deterministic winning strategy that is not only winning for the system, but also contains all deterministic winning strategies.
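The standard fixed-point computation for safety games can be sketched on an explicit game graph as follows. This is a minimal sketch, assuming that the shield chooses its output after seeing the input and that the next-state function is given as a dictionary; the function name and the data layout are hypothetical, and the paper's tool uses BDDs rather than explicit sets for this step.

```python
def solve_safety_game(inputs, outputs, delta, safe):
    """Return (winning region, most permissive strategy).

    delta maps (state, input, output) to the next state.
    A state is winning if, for every input, some output stays in the region.
    """
    win = set(safe)
    changed = True
    while changed:
        changed = False
        for s in list(win):
            ok = all(
                any(delta[(s, i, o)] in win for o in outputs) for i in inputs
            )
            if not ok:
                win.discard(s)
                changed = True
    # Most permissive strategy: every output that stays inside the region.
    strategy = {
        (s, i): {o for o in outputs if delta[(s, i, o)] in win}
        for s in win
        for i in inputs
    }
    return win, strategy


# Small hypothetical game: state 2 is unsafe, state 1 cannot avoid it on input "x".
INPUTS, OUTPUTS = ("x", "y"), ("p", "q")
DELTA = {
    (0, "x", "p"): 0, (0, "x", "q"): 1, (0, "y", "p"): 0, (0, "y", "q"): 2,
    (1, "x", "p"): 2, (1, "x", "q"): 2, (1, "y", "p"): 0, (1, "y", "q"): 0,
    (2, "x", "p"): 2, (2, "x", "q"): 2, (2, "y", "p"): 2, (2, "y", "q"): 2,
}
WIN, STRATEGY = solve_safety_game(INPUTS, OUTPUTS, DELTA, safe={0, 1})
```

The returned strategy is non-deterministic: it keeps every output that remains in the winning region, so it contains all deterministic winning strategies.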

5.0.4 Step 4. Constructing the Büchi Game

Implementing the safety game ensures correctness () and that the shield keeps the output of the design intact, if does not violate . The shield still has to keep the number of deviations per violation to a minimum. Therefore, we would like the recovery period to be over infinitely often. This can be formalized as a Büchi winning condition. We construct the Büchi game by applying the non-deterministic safety strategy to the game graph .

Given the safety game with the non-deterministic winning strategy and the winning region , we construct a Büchi game such that is the state space, the initial state and the input/output alphabet and remain unchanged, is the transition function, and is the set of accepting states. A play is winning if infinitely often.

5.0.5 Step 5. Solving the Büchi Game

Most likely, the Büchi game contains reachable states, for which cannot be enforced infinitely often. We implement an admissible strategy that enforces to visit infinitely often whenever possible. This criterion essentially asks for a strategy that is winning with the help of the design.

The admissible strategy for a Büchi game can be computed as follows [11]:

  1. Compute the winning region and a winning strategy for (cf. [19]).

  2. Remove all transitions that start in and do not belong to from . This results in a new Büchi game with if or if .

  3. In the resulting game , compute a cooperatively winning strategy . In order to compute , one first has to transform all input variables to output variables. This results in the Büchi game . Afterwards, can be computed with the standard algorithm for the winning strategy on .

The strategy is an admissible strategy of the game , since it is winning and cooperatively winning [11]. Whenever the game starts in a state of the winning region , any play created by is winning. Since coincides with in all states of the winning region , is winning. We know that is cooperatively winning in the game . A proof that is also cooperatively winning in the original game can be found in [11].
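The cooperative part of the third step can be sketched for an explicit graph. Once inputs are treated as outputs, the game has a single player, so a state is cooperatively winning exactly when it can reach an accepting state that lies on a cycle. This is a minimal sketch under that assumption; the successor map `succ` and both function names are hypothetical, and the restriction of transitions from the second step is assumed to have been applied already.

```python
def reachable_in_one_or_more(succ, start):
    """States reachable from start using at least one transition."""
    seen = set()
    stack = list(succ.get(start, ()))
    while stack:
        s = stack.pop()
        if s not in seen:
            seen.add(s)
            stack.extend(succ.get(s, ()))
    return seen


def cooperative_buchi_region(succ, accepting):
    """States with some path that visits accepting states infinitely often."""
    # Accepting states that can return to themselves can be repeated forever.
    recurrent = {f for f in accepting if f in reachable_in_one_or_more(succ, f)}
    # Backward reachability: add every state with a successor in the region.
    region = set(recurrent)
    changed = True
    while changed:
        changed = False
        for s, targets in succ.items():
            if s not in region and any(t in region for t in targets):
                region.add(s)
                changed = True
    return region


# Hypothetical graph: 0 and 1 form a cycle through accepting state 0.
SUCC = {0: {1}, 1: {0, 2}, 2: {2}, 3: {2}}
REGION = cooperative_buchi_region(SUCC, accepting={0})
```

A cooperative strategy then only needs to pick, in each state of the region, a transition that moves closer to a recurrent accepting state.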

Theorem 5.1

A shield that implements the admissible strategy in the Büchi game in a new reactive system with is an admissible shield.

Proof 1

First, the admissible strategy is winning for all winning states of the Büchi game. Since winning strategies for Büchi games are subgame optimal, a shield that implements this strategy ends deviations after the smallest number of steps possible, for all states of the design in which a finite number of deviations can be guaranteed. Second, the strategy is cooperatively winning in the Büchi game. Therefore, in all states in which a finite number of deviations cannot be guaranteed, a shield that implements the strategy recovers with the help of the design as soon as possible.

The standard algorithm for solving Büchi games contains the computation of attractors; the i-th attractor for the system contains all states from which the system can “force” a visit of an accepting state in i steps. For every state of the game, the attractor number of that state corresponds to the smallest number of steps within which the recovery phase can be guaranteed to end, or can end with the help of the design if a finite number of deviations cannot be guaranteed.
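The layered attractor computation can be sketched as follows. This is a minimal sketch, assuming an explicit game in which the system picks an output after seeing the input; the function name and the dictionary layout are hypothetical. It covers only the attractor step, not the full nested fixed point needed to solve a Büchi game.

```python
def attractor_levels(states, inputs, outputs, delta, targets):
    """Map each state in the system's attractor of targets to its level.

    Level 0 holds the target states. Level n holds the states from which,
    for every input, some output leads into a level below n.
    """
    level = {s: 0 for s in targets}
    step = 0
    while True:
        step += 1
        # Evaluate against the levels found so far, then add the new layer.
        new = [
            s for s in states
            if s not in level
            and all(
                any(delta[(s, i, o)] in level for o in outputs) for i in inputs
            )
        ]
        if not new:
            return level
        for s in new:
            level[s] = step


# Hypothetical game: state 3 can never be forced into the target state 0.
STATES, INPUTS, OUTPUTS = (0, 1, 2, 3), ("x",), ("p", "q")
DELTA = {
    (0, "x", "p"): 0, (0, "x", "q"): 0,
    (1, "x", "p"): 0, (1, "x", "q"): 1,
    (2, "x", "p"): 1, (2, "x", "q"): 2,
    (3, "x", "p"): 3, (3, "x", "q"): 3,
}
LEVELS = attractor_levels(STATES, INPUTS, OUTPUTS, DELTA, targets={0})
```

States that never receive a level, such as state 3 here, are those from which a visit to the target cannot be forced.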

Theorem 5.2

Let be a safety specification and be the cardinality of the state space of . An admissible shield with respect to can be synthesized in time, if it exists.

Proof 2

Our safety game and our Büchi game have at most states and at most edges. Safety games can be solved in time and Büchi games in time [19].

6 Experimental Results

We implemented our admissible shield synthesis procedure in Python. The tool takes as input a set of safety automata defined in a textual representation. The first step in our synthesis procedure is to build the product of all safety automata and construct the violation monitor (Section 5.0.1). This step is performed on an explicit representation. For the remaining steps we use Binary Decision Diagrams (BDDs) for symbolic representation. The synthesized shields are encoded in Verilog format.

To evaluate the performance of our tool, we constructed three sets of experiments, the basis of which is the safety specification of Fig. 1. This example represents a map with 15 waypoints and the six safety properties 1-6. First, we reduced the complexity of the example by only considering 8 out of 15 waypoints. This new example, called Map, consists of the waypoints to with their corresponding properties. The second series of experiments, called Map, considers the original specification of Fig. 1 over all 15 waypoints. The synthesized shields behave as described in Section 2. The third series of experiments, called Map, considers a map with waypoints, essentially adding a duplicate of the map in Fig. 1.

All results are summarized in Table 2 and in Table 3. For both tables, the first columns list the set of specification automata and the number of states, inputs, and outputs of their product automata. The next column lists the smallest number of steps under which the shield is able to recover with the help of the design. The last column lists the synthesis time in seconds. All computation times are for a computer with a 2.6 GHz Intel i5-3320M CPU with 8 GB RAM running a 64-bit distribution of Linux. Source code, input files, and instructions to reproduce our experiments are available for download.

Table 2: Results of and .

Example  Property  States  Inputs  Outputs  Steps  Time [sec]
Map      1         9       0       3        3      0.52
         1+4       12      0       3        3      1.2
         1+5a      46      1       3        4      6.2
         1+5b      32      1       3        3      7
         1+4+5a    55      1       3        4      17
         1+4+5b    36      1       3        3      12
Map