# A New Simulation Metric to Determine Safe Environments and Controllers for Systems with Unknown Dynamics

We consider the problem of extracting safe environments and controllers for reach-avoid objectives for systems with known state and control spaces, but unknown dynamics. In a given environment, a common approach is to synthesize a controller from an abstraction or a model of the system (potentially learned from data). However, in many situations, the relationship between the dynamics of the model and the actual system is not known; and hence it is difficult to provide safety guarantees for the system. In such cases, the Standard Simulation Metric (SSM), defined as the worst-case norm distance between the model and the system output trajectories, can be used to modify a reach-avoid specification for the system into a more stringent specification for the abstraction. Nevertheless, the obtained distance, and hence the modified specification, can be quite conservative. This limits the set of environments for which a safe controller can be obtained. We propose SPEC, a specification-centric simulation metric, which overcomes these limitations by computing the distance using only the trajectories that violate the specification for the system. We show that modifying a reach-avoid specification with SPEC allows us to synthesize a safe controller for a larger set of environments compared to SSM. We also propose a probabilistic method to compute SPEC for a general class of systems. Case studies using simulators for quadrotors and autonomous cars illustrate the advantages of the proposed metric for determining safe environment sets and controllers.

## 1. Introduction

Recent research in robotics and control theory has focused on developing complex autonomous systems, such as robotic manipulators, autonomous vehicles, and surgical robots. Since many of these systems are safety-critical, it is important to design provably-safe controllers while determining environments in which safety can be guaranteed. In this work, we focus on reach-avoid objectives, where the goal is to design a controller to reach a target set of states (referred to as the reach set) while avoiding unsafe states (the avoid set). Reach-avoid problems are common for autonomous vehicles in the real world; for example, a drone flying in an indoor setting. Here the reach set could be a desired goal position and the avoid set could be the set of obstacles. In such a setting, it is important to determine the environments in which the drone can safely navigate, as well as the corresponding safe controllers.

Typically, a mathematical model of the system, such as a physics-based first principles model, is used for synthesizing a safe controller in different environments (e.g., (Tomlin et al., 2000; Tabuada, 2009)). However, when the system dynamics are unknown, synthesizing such a controller becomes challenging. In such cases, it is a common practice to identify a model for the system. This model represents an abstraction of the system behavior. Recently, there has been an increased interest in using machine learning (ML) based tools, such as neural networks and Gaussian processes, for learning abstractions directly from the data collected on the system (Bansal et al., 2016, 2017; Lenz et al., 2015). One of the many verification challenges for ML-based systems (Seshia et al., 2016) is that such abstractions cannot be directly used for verification, since it is not clear a priori how representative the abstraction is of the actual system. Hence, to use the abstraction to provide guarantees for the system, we need to first quantify the differences between it and the system.

One approach is to use model identification techniques that provide bounds on the mismatch between the dynamics of the system and its abstraction both in time and frequency domains (see (Gevers et al., 2003; Hjalmarsson and Ljung, 1992; Ljung, 1987) and references therein). This bound is then used to design a provably stabilizing controller for the system. These approaches have largely been limited to linear abstractions and systems, and the focus has been on designing asymptotically stabilizing controllers.

Another way to quantify the difference between a general non-linear system and its abstraction relies on the notion of an (approximate) simulation metric  (Alur et al., 2000; Girard and Pappas, 2007; Baier et al., 2008). Such a metric measures the maximal distance between the system and the abstraction output trajectories over all finite horizon control sequences. Standard simulation metrics (referred to as SSM here on) have been used for a variety of purposes such as safety verification (Girard and Pappas, 2011), abstraction design for discrete (Larsen and Skou, 1991), nonlinear (Pola et al., 2008), switched (Girard et al., 2010) systems, piecewise deterministic and labelled Markov processes (Desharnais et al., 2002; Strubbe and Van Der Schaft, 2005), and stochastic hybrid systems (Abate and Prandini, 2011; Garatti and Prandini, 2012; Julius and Pappas, 2009; Bujorianu et al., 2005), model checking (Baier et al., 2008; Katoen et al., 2007), and model reduction (Dean and Givan, 1997; Papadopoulos and Prandini, 2016).

Once computed, the SSM is used to expand the unsafe set (or avoid set) in (Abate and Prandini, 2011). For reach-avoid scenarios, we additionally use it to contract the reach set as shown in Figure 1. If we can synthesize a safe controller that ensures the abstraction trajectory avoids the expanded avoid set and reaches the contracted reach set, then the system trajectory is guaranteed to avoid and reach the original avoid set and reach set respectively. This follows from the property that SSM captures the worst case distance between the trajectories of the system and the abstraction. Consequently, the set of safe environments for the system can be obtained by finding the set of environments for which we can design a safe controller for the abstraction with the modified specification.

Even though powerful in its approach, SSM computes the maximal distance between the system and the abstraction trajectories across all possible controllers. We show in this paper that this is unnecessary and might lead to a conservative bound on the quality of the abstraction for the purposes of controller synthesis. In particular, the larger the distance between the system and the abstraction, the larger the expansion (contraction) of the avoid (reach) set. In many cases, this results in unrealizability wherein there does not exist a safe controller for the abstraction for the modified specification.

In this paper, we propose SPEC, SPEcification-Centric simulation metric, that overcomes these limitations. SPEC achieves this by computing the distance across

1. only those controllers that can be synthesized by a particular control scheme and that are safe for the abstraction (in the context of the original reach-avoid specification) — these are the only potential safe controllers for the system;

2. only those abstraction and system trajectories for which the system violates the reach-avoid specification, and

3. only between the abstraction trajectory and the reach and the avoid sets.

If the reach-avoid specification is changed using SPEC in a similar fashion as that for SSM, it is guaranteed that if a controller is safe for the abstract model, it remains safe for the system. SPEC can be significantly less conservative than SSM, and can be used to design safe controllers for the system for a broader range of reach-avoid specifications. In fact, we show that, among all uniform distance bounds (i.e., a single distance bound is used to modify the specification in all environments), SPEC provides the largest set of environments such that a safe controller for the abstraction is also safe for the system.

Note that a similar metric has been used earlier (Ghosh et al., 2016) to find tight environment assumptions for temporal logic specifications. However, it applies in much more restricted settings since it relies on having simple linear representations of the abstraction which can be expressed as a linear optimization problem.

In general, it is challenging to compute both SSM and SPEC when the dynamics of the system are not available. Several approaches have been proposed in the literature for computing SSM (Abate, 2009; Girard and Pappas, 2007; Julius and Pappas, 2009); however, restrictive assumptions on the dynamics of the systems are often required to compute it. More recently, a randomized approach has been proposed to compute SSM (Abate and Prandini, 2011; Garatti and Prandini, 2012) for finite-horizon properties that relies on “scenario optimization”, which was first introduced for solving robust convex programs via randomization (Calafiore and Campi, 2005) and then extended to semi-infinite chance-constrained optimization problems (Campi and Garatti, 2011). Scenario optimization is a sampling-based method to solve semi-infinite optimization problems, and has been used for system and control design (Calafiore and Campi, 2006; Campi et al., 2009). In this work, we propose a scenario optimization-based computational method for SPEC that has general applicability and is not restricted to a specific class of systems. Indeed, the only assumption is that the system is available as an oracle, with known state and control spaces, which we can simulate to determine the corresponding output trajectory. Given that the distance metric is obtained via randomization and, hence, is a random quantity, we provide probabilistic guarantees on the performance of SPEC; the confidence level is a design parameter and can be chosen as close to 1 as desired (within a simulation budget). To summarize, this paper’s main contributions are:

• SPEC, a new simulation metric that is less conservative than SSM, and provides the largest set of environments such that a safe controller for the abstraction is also safe for the system;

• a method to compute SPEC that is not restricted to a specific class of systems, and

• a demonstration of the proposed approach on numerical examples and simulations of real-world autonomous systems, such as a quadrotor and an autonomous car.

## 2. Mathematical Preliminaries

Let S be an unknown, discrete-time, potentially non-linear, dynamical system with known state space and control space. Let M be an abstraction of S with the same state and control spaces as S, whose dynamics are known. We also assume that bounds between the dynamics of S and M are not available beforehand (i.e., we cannot a priori quantify how different the two are). ξS(t;x0,u) denotes the trajectory of S at time t starting from the initial state x0 and applying the controller u. ξM(t;x0,u) is similarly defined. For ease of notation, we drop x0 and u from the trajectory arguments wherever convenient.

We denote by E the set of all reach-avoid scenarios (also referred to as environment scenarios here on) for which we want to synthesize a controller for S. A reach-avoid scenario is a three-tuple, e = (x0, A(⋅), R(⋅)), where x0 is the initial state of S, and A(⋅) and R(⋅) are (potentially time varying) sequences of avoid and reach sets respectively. We leave A(⋅) and R(⋅) abstract except where necessary. If the sets are not time varying, we can replace A(t) (respectively R(t)) by the stationary A (respectively R). Similarly, if there is no avoid or reach set at a particular time, we can set A(t) = ∅ and take R(t) to be the entire state space.

For each e ∈ E, we define a reach-avoid specification,

 (1) φ(e):={ξ(⋅):∀t∈Tξ(t)∉A(t)∧ξ(t)∈R(t)},

where T denotes the time horizon {0,1,…,H}. We say ξ(⋅) satisfies the specification φ(e), denoted ξ(⋅) ⊨ φ(e), if ξ(⋅) ∈ φ(e).

The reader might observe that our use of R(t) in (1) differs somewhat from the intuitive notion of a reach set (depicted, e.g., in Fig. 1). Specifically, (1) defines the reach-avoid specification such that the output trajectory must remain within R(t) at all times t, while the usual notion involves eventually reaching a desired set of states. Note, however, that for the purposes of defining φ(e), these notions are equivalent if R(t) in (1) represents the backwards reachable tube corresponding to the desired reach set: if a state is reachable eventually, then the trajectory stays within the backwards reachable tube at all time points. We henceforth use the reach set in the latter sense, since it simplifies the mathematics in the paper.
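As a concrete illustration, the membership test in (1) can be sketched as follows. Representing the sets A(t) and R(t) by membership predicates is an illustrative encoding choice, not the paper's:

```python
def satisfies(traj, avoid, reach):
    """Check xi(.) |= phi(e) as in (1): for every t, xi(t) is outside A(t)
    and inside R(t). traj is the list [xi(0), ..., xi(H)]; avoid and reach
    map (t, x) to membership in A(t) and R(t)."""
    return all(not avoid(t, x) and reach(t, x) for t, x in enumerate(traj))

# Example in the spirit of the running example below: no avoid set, and a
# reach tube that is the whole plane before the final time and a
# gamma-ball around x_star at t = H.
H, gamma, x_star = 3, 0.5, (1.0, 0.0)
avoid = lambda t, x: False                 # A(t) is empty at all times
reach = lambda t, x: t < H or sum((a - b) ** 2
                                  for a, b in zip(x, x_star)) ** 0.5 < gamma

traj_ok = [(0.0, 0.0), (0.4, 0.0), (0.8, 0.0), (1.1, 0.0)]
traj_bad = [(0.0, 0.0), (0.4, 0.0), (0.8, 0.0), (2.0, 0.0)]
```

Encoding the "eventually reach" requirement as a time-varying tube that only constrains the final state mirrors the backwards-reachable-tube interpretation above.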

Finally, we define Π to be the space of all permissible controllers for M, and UΠ(e) to be the space of all finite horizon control sequences over Π for the scenario e. For example, if we restrict ourselves to linear feedback controllers, UΠ(e) represents the set of all linear feedback controllers that are defined over the time horizon T.

## 3. Problem Formulation

Given the set of reach-avoid scenarios E, the controller scheme UΠ, and the abstraction M, our goal is two-fold:

1. to find the environment scenarios e ∈ E for which it is possible to design a controller u ∈ UΠ(e) such that ξS(⋅;x0,u) satisfies the corresponding reach-avoid specification φ(e),

2. to find a corresponding safe controller for each scenario identified in (1).

Mathematically, we are interested in computing the set

 (2) ES={e∈E:∃u∈UΠ(e),ξS(⋅;x0,u)⊨φ(e)},

and the corresponding set of safe controllers for each e ∈ ES,

 (3) US(e)={u∈UΠ(e):ξS(⋅;x0,u)⊨φ(e)}.

When a dynamics model of S is known, several methods have been studied in the literature to compute the sets ES and US(e) for reach-avoid problems (Tomlin et al., 1998; Mitsch et al., 2013; Tomlin et al., 2000). However, since a dynamics model of S is unknown, the computation of these sets is challenging in general. To overcome this problem, one generally relies on the abstraction M. We make the following assumptions on S and M:

###### Assumption 1.

S is available as an oracle that can be simulated, i.e., we can run an execution (or experiment) on S and obtain the corresponding system trajectory ξS(⋅).

###### Assumption 2.

For any e ∈ E, we can determine whether there exists a controller u ∈ UΠ(e) such that ξM(⋅;x0,u) ⊨ φ(e), and can compute such a controller.

Assumption 1 states that even though we do not know the dynamics of S, we can run an execution of S. Assumption 2 states that it is possible to verify whether M satisfies a given specification. Although this is not a straightforward problem, since the dynamics of M are known, several existing methods can be used for obtaining a safe controller for M.

Under these assumptions, we show that we can convert a verification problem on S into a verification problem on M. In particular, we compute a distance bound, SPEC, between S and M which, along with M, allows us to compute a conservative approximation of ES and US(e).

## 4. Running Example

We now introduce a very simple example that we will use to illustrate our approach: a two-state linear system in which the system and the abstraction differ in only one parameter. Although simple, this example illustrates several facets of SPEC. We present more realistic case studies in Section 8.

Consider a system S whose dynamics are given as

 (4) x(t+1)=[x1(t+1); x2(t+1)]=[2 0; 0 0.1][x1(t); x2(t)]+[1; 0]u(t).

We are interested in designing a controller for S to regulate it from the initial state [0,0] to a desired state x* over a time horizon of 20 steps, i.e., H = 20. In particular, we have

 X0={[0,0]}, A=∅, R=⋃_{−4≤x*1≤4} R(⋅;x*),

where

 R(t;x*)=ℝ², t∈{0,1,…,H−1},
 R(H;x*)={x:∥x−x*∥2<γ}.

We use a fixed radius γ in our simulations. Thus, each e ∈ E consists of a final state x* (equivalently, a reach set R(⋅;x*)) to which we want the system to regulate, starting from the origin. Consequently, the system trajectory satisfies the reach-avoid specification in this case if ∥ξS(H;x0,u)−x*∥2 < γ.

For the purpose of this example, we assume that the system dynamics in (4) are unknown; only the dynamics of the abstraction M are known and given as

 (5) x(t+1)=[x1(t+1); x2(t+1)]=[2 0; 0 0.1][x1(t); x2(t)]+[1; 0.1]u(t).

In this example, we use the class of linear feedback controllers as the control scheme, although other control schemes can very well be used. In particular, for any given environment scenario e, the space of controllers is given by

 UΠ(e)={LQR(q,x∗):0.1≤q≤100},

where LQR(q, x*) is a Linear Quadratic Regulator (LQR) designed for the abstraction dynamics in (5) to regulate the abstraction trajectory to x* (that is, we penalize the deviation of the trajectory from the desired state in the LQR cost function), with state penalty matrix qI and a scalar control penalty coefficient, where I is the 2×2 identity matrix. Thus, for different values of q we get different controllers, which affect various characteristics of the resultant trajectory, such as overshoot, undershoot, and final state. LQR(q, x*) for any given q can be obtained by solving the discrete-time Riccati equation (Kwakernaak and Sivan, 1972). Our goal thus is to use the dynamics in (5) to find the set of final states to which S can be regulated and the corresponding regulator in UΠ(e).
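The construction above can be sketched in code. The cost weights (state penalty qI with q = 10 and a unit control penalty) and the nonzero initial state are illustrative assumptions for this sketch; the Riccati recursion and the closed-loop rollouts follow (4) and (5):

```python
H = 20                                    # time horizon
A = [[2.0, 0.0], [0.0, 0.1]]              # shared state matrix
B_S = [1.0, 0.0]                          # input vector of the system S, eq. (4)
B_M = [1.0, 0.1]                          # input vector of the abstraction M, eq. (5)

def riccati_gain(A, B, q, r=1.0, iters=500):
    """Iterate P <- Q + A'PA - A'PB (r + B'PB)^{-1} B'PA with Q = q*I and
    return the (near-)converged LQR gain K, so that u = -K x."""
    P = [[q, 0.0], [0.0, q]]
    K = [0.0, 0.0]
    for _ in range(iters):
        PA = [[sum(P[i][k] * A[k][j] for k in range(2)) for j in range(2)]
              for i in range(2)]
        PB = [sum(P[i][k] * B[k] for k in range(2)) for i in range(2)]
        s = r + B[0] * PB[0] + B[1] * PB[1]              # scalar r + B'PB
        K = [(B[0] * PA[0][j] + B[1] * PA[1][j]) / s for j in range(2)]
        AtPA = [[sum(A[k][i] * PA[k][j] for k in range(2)) for j in range(2)]
                for i in range(2)]
        AtPB = [sum(A[k][i] * PB[k] for k in range(2)) for i in range(2)]
        P = [[(q if i == j else 0.0) + AtPA[i][j] - AtPB[i] * K[j]
              for j in range(2)] for i in range(2)]
    return K

def simulate(B, K, x0, H):
    """Closed-loop trajectory xi(.; x0, u) under state feedback u = -Kx."""
    x, traj = list(x0), [list(x0)]
    for _ in range(H):
        u = -(K[0] * x[0] + K[1] * x[1])
        x = [A[0][0] * x[0] + A[0][1] * x[1] + B[0] * u,
             A[1][0] * x[0] + A[1][1] * x[1] + B[1] * u]
        traj.append(x)
    return traj

K = riccati_gain(A, B_M, q=10.0)          # gain designed on the abstraction
xi_M = simulate(B_M, K, [1.0, 1.0], H)    # abstraction trajectory
xi_S = simulate(B_S, K, [1.0, 1.0], H)    # system trajectory (an oracle call in practice)
```

Note that x2 of S is unaffected by the input (its entry in the input vector is 0), which is exactly the structural mismatch this running example exploits.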

## 5. Solution Approach

### 5.1. Computing approximate safe sets using M and a simulation metric

Computing the sets ES and US(e) exactly can be challenging since the dynamics of S are unknown a priori. Generally, we use the abstraction M as a replacement for S to synthesize and analyze safe controllers for S. However, to provide guarantees on S using M, we need to quantify how different the two are.

We quantify this difference through a distance bound, d, between S and M. The bound d is used to modify the specification φ(e) into a more stringent specification φ(e;d) such that if ξM(⋅) ⊨ φ(e;d) then ξS(⋅) ⊨ φ(e). Thus, the set of safe controllers for M for φ(e;d) can be used as an approximation of US(e). In particular, if we define the sets Uφ(e;d) and Eφ(d) as

 (6) Uφ(e;d):={u∈UΠ(e):ξM(⋅;x0,u)⊨φ(e;d)},
 Eφ(d):={e∈E:Uφ(e;d)≠∅},

then Uφ(e;d) and Eφ(d) can be used as approximations of US(e) and ES respectively. Consequently, a verification problem on S can be converted into a verification problem on M using the modified specification.

One such distance bound is given by the standard simulation metric, SSM, between S and M, defined as

 (7) da=maxe∈Emaxu∈UΠ(e)∥ξS(⋅;x0,u)−ξM(⋅;x0,u)∥∞

Here, the ∞-norm gives the maximum distance between the trajectories across all timesteps. Typically, SSM is computed over the space of all finite horizon control sequences instead of UΠ(e) (Girard and Pappas, 2011); since we are interested in a given control scheme, we restrict this computation to UΠ(e). In general, da is difficult to compute, because it requires searching over the (potentially infinite) spaces of controllers and environments. An approximate technique to compute da with probabilistic guarantees for systems with unknown dynamics was presented in (Abate and Prandini, 2011).
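In the spirit of the sampling-based approach mentioned above, a sup-norm estimate of da can be sketched on a hypothetical one-dimensional pair; the dynamics below are purely illustrative, not the running example:

```python
# Toy pair: "system" x+ = 0.5x + u versus abstraction x+ = 0.5x + 0.9u.
# Sample open-loop control sequences, roll out both models, and keep the
# maximum sup-norm distance between the resulting trajectories.
import random

H = 10

def rollout(input_gain, us, x0=0.0):
    """Open-loop rollout of x+ = 0.5 x + input_gain * u."""
    x, traj = x0, [x0]
    for u in us:
        x = 0.5 * x + input_gain * u
        traj.append(x)
    return traj

random.seed(0)
d_a_hat = 0.0
for _ in range(200):                      # sampled finite-horizon controls
    us = [random.uniform(-1.0, 1.0) for _ in range(H)]
    xi_S = rollout(1.0, us)               # "system" rollout
    xi_M = rollout(0.9, us)               # abstraction rollout
    d_a_hat = max(d_a_hat, max(abs(a - b) for a, b in zip(xi_S, xi_M)))
```

For this toy pair the error dynamics e+ = 0.5e + 0.1u with |u| ≤ 1 keep the true da below 0.2, so the estimate stays in (0, 0.2].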

However, if da can be computed, it can be used to modify a specification φ(e) to φ(e;da) as follows: “expand” the avoid set to get the augmented avoid set A(t;da) = A(t) ⊕ da, and “contract” the reach set to obtain a conservative reach set R(t;da) = R(t) ⊖ da (see Figure 1). Here, ⊕ (⊖) is the Minkowski sum (difference). (The Minkowski sum of a set K and a scalar d is the set of all points that are the sum of any point in K and any point in a disc of radius d around the origin.) Consequently, φ(e;da) is the set of trajectories which avoid A(t;da) and are always contained in R(t;da),

 (8) φ(e;da):={ξ(⋅):ξ(t)∉A(t;da),ξ(t)∈R(t;da)∀t∈T}.

Then it can be shown that any controller for which ξM satisfies the specification φ(e;da) also ensures that ξS satisfies the specification φ(e).

###### Proposition 1.

For any e ∈ E and controller u ∈ UΠ(e), ξM(⋅;x0,u) ⊨ φ(e;da) implies ξS(⋅;x0,u) ⊨ φ(e).

The proof of Proposition 1 can be found in the Appendix. Proposition 1 implies that Uφ(e;da) and Eφ(da) can be used as approximations of US(e) and ES respectively. However, the distance bound in (7) does not take into account the reach-avoid specification (environment) for which a controller needs to be synthesized. Thus, da can be quite conservative. As a result, the modified specification can be so stringent that the set of environments for which we can synthesize a provably safe controller for the abstraction (and hence for the system) will itself be very small, resulting in a very conservative approximation of ES.
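The expand/contract construction above can be sketched as follows for disc-shaped sets; using signed distances to realize the Minkowski operations is an illustrative encoding choice:

```python
def in_expanded_avoid(x, center, radius, d):
    """Membership in A (+) d for a disc A: distance to center <= radius + d."""
    dist = sum((a - c) ** 2 for a, c in zip(x, center)) ** 0.5
    return dist <= radius + d

def in_contracted_reach(x, center, radius, d):
    """Membership in R (-) d for a disc R: distance to center <= radius - d."""
    dist = sum((a - c) ** 2 for a, c in zip(x, center)) ** 0.5
    return dist <= radius - d
```

A point just outside the original avoid disc falls inside the expanded one, while a point near the boundary of the reach disc falls outside the contracted one, which is exactly how the modified specification becomes more stringent.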

### 5.2. Specification-Centric Simulation Metric (SPEC)

To overcome these limitations, we propose SPEC,

 (9) dspec=maxe∈Emaxu∈Uφ(e)d(ξS(⋅;x0,u),ξM(⋅;x0,u)),

where

 (10) d(ξS(⋅),ξM(⋅))=mint∈T(min{h(ξM(t;x0,u),A(t)),−h(ξM(t;x0,u),R(t))})1(ξS(⋅)⊭φ(e))

Here, Uφ(e) is the set of all controls u ∈ UΠ(e) such that ξM(⋅;x0,u) satisfies the specification φ(e). 1(⋅) represents the indicator function, which is 1 if its argument is true and 0 otherwise, and h is the signed distance function defined as

 h(x,K):={infk∈K∥x−k∥,if x∉K−infk∈KC∥x−k∥,otherwise.

If, for any e, the set Uφ(e) is empty, we define the distance to be zero. Similarly, if there is no avoid or reach set at a particular t, the corresponding signed distance term is defined to be +∞. There are four major differences between (7) and (9):

1. To compute the distance, we only consider the feasible set of controllers that can be synthesized by the control scheme and that are safe for the abstraction, Uφ(e), as all other controllers do not help us in synthesizing a safe controller for S (they are not even safe for M).

2. To compute the distance between S and M, we only consider those trajectory pairs for which S violates the specification. A non-zero distance between the trajectories of S and M when the system satisfies the specification does not give us any additional information for synthesizing a safe controller.

3. Within a falsifying trajectory pair, we compute the minimum distance of the abstraction trajectory from the avoid and reach sets, rather than its distance from the system trajectory, as that is sufficient to obtain a margin that discards behaviors which are safe for the abstraction but unsafe for the system.

4. Finally, a minimum over time of this distance is sufficient to discard an unsafe trajectory, as a trajectory is unsafe if it is unsafe at any t ∈ T.

These considerations ensure that dspec is far less conservative than da and allows us to synthesize a safe controller for the system for a wider set of environments. We first prove that dspec can be used to compute an approximation of ES.
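The signed distance h and the per-trajectory distance d in (10) can be sketched as follows; disc-shaped sets and the (center, radius) representation are illustrative assumptions of this sketch:

```python
def h_disc(x, center, radius):
    """Signed distance to a disc: positive outside, negative inside."""
    dist = sum((a - c) ** 2 for a, c in zip(x, center)) ** 0.5
    return dist - radius

def spec_distance(xi_M, avoid, reach, system_falsifies):
    """d(xi_S, xi_M) per (10): min over t of min{h(xi_M(t), A(t)),
    -h(xi_M(t), R(t))}, gated by the indicator that xi_S falsifies phi(e).
    avoid(t) / reach(t) return (center, radius), or None when no set is
    active at t (treated as a +infinity signed-distance term)."""
    if not system_falsifies:          # indicator 1(xi_S |/= phi(e)) is 0
        return 0.0
    best = float("inf")
    for t, x in enumerate(xi_M):
        if avoid(t) is not None:
            best = min(best, h_disc(x, *avoid(t)))
        if reach(t) is not None:
            best = min(best, -h_disc(x, *reach(t)))
    return best

# Example: reach set only at the final time H, a 0.5-disc around (1, 0);
# the abstraction trajectory ends 0.4 inside the reach boundary.
H = 3
avoid = lambda t: None
reach = lambda t: ((1.0, 0.0), 0.5) if t == H else None
xi_M = [(0.0, 0.0), (0.4, 0.0), (0.8, 0.0), (1.1, 0.0)]
d_val = spec_distance(xi_M, avoid, reach, system_falsifies=True)
```

In the example, the abstraction's smallest safety margin is 0.4, so contracting the reach set by that amount would rule this (falsifying) behavior out.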

###### Proposition 2.

If Uφ(e;dspec) ⊆ Uφ(e), then ξM(⋅;x0,u) ⊨ φ(e;dspec) implies ξS(⋅;x0,u) ⊨ φ(e).

The proof of Proposition 2 can be found in the Appendix. Thus, if we define Uφ(e;dspec) and Eφ(dspec) as in (6), they can be used as approximations of US(e) and ES respectively. Proposition 2 requires that the set of controllers that satisfy the modified specification, Uφ(e;dspec), is a subset of the set of controllers that satisfy the actual specification, Uφ(e). When UΠ(e) is the space of all finite horizon control sequences, this condition is trivially satisfied, as the modified specification is more stringent than the actual specification. Other control schemes, such as the set of linear feedback controllers and feasibility-based optimization schemes, also satisfy this condition. In fact, in such cases, the proposed metric, dspec, quantifies the tightest (largest) approximation of ES, i.e., Eφ(dspec), such that Eφ(dspec) ⊆ ES.

###### Theorem 3.

Let UΠ(e) be such that Uφ(e;d) ⊆ Uφ(e) whenever d ≥ 0. Let d be any distance bound such that

 (11) ∀e∈E,∀u∈UΠ(e),ξM(⋅)⊨φ(e;d)→ξS(⋅)⊨φ(e).

Then dspec ≤ d. Moreover, Eφ(d) ⊆ Eφ(dspec). Hence, Eφ(dspec) and Uφ(e;dspec) quantify the tightest (largest) approximations of ES and US(e) respectively among all uniform distance bounds d.

Theorem 3 states that dspec is the smallest among all (uniform) distance bounds between S and M such that a safe controller synthesized on M is also safe for S. Even though this is a stricter condition than we need for defining ES, where we care only about the existence of at least one safe controller for S, it allows us to use any safe controller for M as a safe controller for S. Formally, Uφ(e;d) ⊆ Uφ(e;dspec) for all d such that (11) holds.

Intuitively, to compute (9), we collect all trajectory pairs (ξS(⋅), ξM(⋅)) (across all e ∈ E and u ∈ Uφ(e)) where ξM(⋅) ⊨ φ(e) and ξS(⋅) ⊭ φ(e). We then evaluate (10) for each pair and take the maximum to compute dspec. By expanding (contracting) every A(t) (R(t)) uniformly by dspec, we ensure that none of the ξM(⋅) collected above remains feasible once the specification is modified, and hence, ξS(⋅) will never falsify φ(e). To ensure this, we prove that dspec is the minimum distance by which the avoid sets should be augmented (or the reach sets contracted). Thus, dspec can also be interpreted as the minimum margin by which the specification should be modified to ensure that a safe controller for M is also safe for S for all e ∈ E.

###### Corollary 4.

Let d ≥ dspec. Then ξM(⋅;x0,u) ⊨ φ(e;d) implies ξS(⋅;x0,u) ⊨ φ(e), i.e., d satisfies (11).

We conclude this section by discussing the relative advantages and limitations of SPEC and SSM, and a few remarks.

#### Comparing SPEC and SSM

SSM is specification-independent (and hence environment-independent), so it can be reused across different tasks and environments. This is ensured by computing the distance between trajectories across all input control sequences; however, that very same aspect can make SSM overly conservative. Making SPEC specification-dependent trades generalizability for a less conservative measure. Although environment-dependent, the set of safe environments obtained using SPEC is larger than that obtained with SSM. This is an important trade-off for any distance metric: the utility of a distance metric is limited if it is too conservative.

The computational complexities for computing SPEC and SSM are the same since they both can be computed using Algorithm 1. To compute SSM we sample from a domain of all finite horizon controls. To compute SPEC we additionally need to be able to define and sample from the set of environment scenarios, but we believe that some representation of the environment scenarios is important for practical applications.

###### Remark 1.

The proposed framework can also be used in scenarios where there is a single deterministic controller for each environment. In such cases, UΠ(e) (and hence Uφ(e)) is a singleton set for every environment (see Section 8.2 for an example). However, from a control-theory perspective, it might be useful to have a set of safe controllers with different transient behaviors that the system designer can choose from without recomputing the distance metric.

## 6. Distance Metric Computation

Since a dynamics model of S is not available, the computation of the distance bound dspec is difficult in general. Interestingly, this computational issue can be resolved using a randomized approach, such as scenario optimization (Calafiore and Campi, 2006). Scenario optimization has been used for a variety of purposes (Campi et al., 2009; Campi and Garatti, 2011), such as robust control and model reduction, as well as for the computation of SSM (Abate and Prandini, 2011).

Computing dspec by scenario optimization is summarized in Algorithm 1. We start by (randomly) extracting N realizations of the environment, e1,…,eN (Line 2). Each realization ei consists of an initial state and a sequence of reach and avoid sets. For each ei, we extract a controller ui ∈ Uφ(ei) (Line 5). If such a controller does not exist, we denote ui to be a null controller u∅. ui (if not u∅) is then applied to both the system and the abstraction to obtain the corresponding trajectories ξS(⋅) and ξM(⋅) (Line 6). We next compute the distance between these two trajectories, di, using (10) (Line 7). If ui = u∅, no satisfying controller exists for ei, and hence di is trivially zero. The maximum across all these distances, dN = maxi di, is then used as an estimate for dspec (Line 10).
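A compact sketch of Algorithm 1 on a hypothetical one-dimensional pair: an integrator system with a mismatched input gain in the abstraction. The scenario distribution, the proportional-controller family, and the reach specification are all illustrative assumptions of this sketch:

```python
import random

H = 10

def rollout(b, k, x_ref):
    """Closed-loop rollout of x+ = x + b*u under u = k*(x_ref - x)."""
    x, traj = 0.0, [0.0]
    for _ in range(H):
        x = x + b * k * (x_ref - x)
        traj.append(x)
    return traj

def spec_ok(traj, x_ref, gamma):
    """phi(e): reach a gamma-ball around x_ref at the final time."""
    return abs(traj[-1] - x_ref) < gamma

random.seed(1)
GAMMA = 0.1
d_hat = 0.0
for _ in range(100):                        # N sampled environment scenarios
    x_ref = random.uniform(-2.0, 2.0)       # e: the target state
    k = None                                # rejection-sample u in U_phi(e)
    for _ in range(50):
        cand = random.uniform(0.1, 2.0)
        if spec_ok(rollout(0.9, cand, x_ref), x_ref, GAMMA):
            k = cand
            break
    if k is None:
        continue                            # no satisfying controller: d_i = 0
    xi_M = rollout(0.9, k, x_ref)           # abstraction: x+ = x + 0.9u
    xi_S = rollout(1.0, k, x_ref)           # "system" oracle: x+ = x + u
    if not spec_ok(xi_S, x_ref, GAMMA):     # indicator in (10) is 1
        # only the reach set at t = H is active here, so (10) reduces to
        # -h(xi_M(H), R(H)) = GAMMA - |xi_M(H) - x_ref|
        d_hat = max(d_hat, GAMMA - abs(xi_M[-1] - x_ref))
```

Since every retained controller is feasible for the abstraction, each collected margin lies in (0, GAMMA], and the running maximum d_hat is the scenario estimate of dspec.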

Although simple in its approach, scenario optimization provides provable approximation guarantees. In Algorithm 1, we have to sample both an environment e and a corresponding controller u. We define a joint sample space

 (12) Δ:={(e,u):e∈E,u∈Uφ(e)}∪{(e,u∅):e∈E,Uφ(e)=∅},

so that Δ contains all feasible (e,u) pairs; we create a dummy sample (e,u∅) for every e where a satisfying controller does not exist for the abstraction. We next define a probability distribution P(e,u) = P(e)P(u|e) on Δ, where P(e) is the probability of sampling e and P(u|e) is the probability of sampling u given e. This distribution is key to capture the sequential nature of sampling u only after sampling e. For e where Uφ(e) = ∅, P(u|e) puts all its mass on u∅, since Δ has only a single entry for such e, namely (e,u∅). In Algorithm 1, in Line 2, we sample e from P(e). In Line 5, we sample u from P(u|e).

###### Proposition 5.

Let Δ be the joint sample space as defined in (12), with the probability distribution P. Select a ‘violation parameter’ ϵ ∈ (0,1) and a ‘confidence parameter’ β ∈ (0,1). Pick N such that

 (13) N≥2ϵ(ln1β+1),

then, with probability at least 1−β, the solution dN returned by Algorithm 1 satisfies the following conditions:

1. P{(e,u) ∈ Δ : d(ξS(⋅;x0,u),ξM(⋅;x0,u)) > dN} ≤ ϵ; and

2. ξM(⋅;x0,u) ⊨ φ(e;dN) implies ξS(⋅;x0,u) ⊨ φ(e), for all (e,u) ∈ Δ outside a set of probability at most ϵ.

The proof of Proposition 5 can be found in the Appendix. Intuitively, Proposition 5 states that dN is a high-confidence estimate of dspec, provided a large enough N is chosen. If we discard the confidence parameter β for a moment, the proposition states that the size of the violation set (the set of (e,u) pairs whose distance exceeds dN) is at most the prescribed ϵ. As ϵ tends to zero, dN approaches the desired optimal solution dspec; in turn, the simulation effort grows unbounded, since N is inversely proportional to ϵ.

As for the confidence parameter β, one should note that dN is a random quantity that depends on the randomly extracted (e,u) pairs. It may happen that the extracted samples are not representative enough, in which case the size of the violation set will be larger than ϵ. The parameter β controls the probability that this happens, and the final result holds with probability 1−β. Since N in (13) depends only logarithmically on 1/β, β can be pushed down to extremely small values, making 1−β so close to 1 as to lose any practical importance.
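The sample-size bound (13) is straightforward to evaluate:

```python
from math import ceil, log

def scenario_samples(eps, beta):
    """Smallest integer N satisfying (13): N >= (2/eps) * (ln(1/beta) + 1)."""
    return ceil((2.0 / eps) * (log(1.0 / beta) + 1.0))
```

For instance, ϵ = 0.01 and β = 10⁻¹⁰ give N = 4806; because N grows only logarithmically in 1/β, tightening the confidence is cheap, while halving ϵ doubles the simulation effort.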

Finally, once we have a high-confidence estimate of dspec, we can use it with Proposition 2 to provide guarantees on the safety of a controller for the system, provided that it is safe for the abstraction (Statement (2) in Proposition 5).

Note that the controller ui is extracted randomly from the set Uφ(ei) (Line 5). Obtaining Uφ(e) and randomly sampling from it can itself be challenging, depending on the control scheme, UΠ(e), and the specification, φ(e). However, one way to randomly extract ui is rejection sampling: we randomly sample controllers from UΠ(ei) until we find one that satisfies the specification for the model. Since the controller performance is evaluated only on the model during this process, it is often cheap and does not put the system at risk. Nevertheless, choosing a good control scheme makes this process more efficient, as fewer samples are rejected before a feasible controller is found (see Section 7 for further discussion). Rejection sampling, however, poses a problem when Uφ(e) = ∅, since there is no way of knowing that beforehand. In such cases, one can impose a limit on the number of rejected samples to make sure the algorithm terminates. This problem can also be overcome easily when there is a single safe controller for each environment, i.e., when Uφ(e) is a singleton set (see Remark 1).
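The capped rejection-sampling procedure can be sketched as follows; the controller sampler and the abstraction-based safety check are placeholders for a concrete control scheme and specification:

```python
import random

def sample_feasible(sample_controller, is_safe_for_abstraction, max_tries=100):
    """Rejection sampling of u in U_phi(e): draw candidates from the control
    scheme and keep the first one that satisfies the specification on the
    abstraction M. Returns None if max_tries candidates are rejected, which
    guards against U_phi(e) being empty."""
    for _ in range(max_tries):
        u = sample_controller()
        if is_safe_for_abstraction(u):   # evaluated on M only: cheap, no risk to S
            return u
    return None

# Toy usage: candidates drawn from U[0, 2]; suppose exactly k >= 1 is feasible.
random.seed(0)
k = sample_feasible(lambda: random.uniform(0.0, 2.0), lambda u: u >= 1.0)
```

With an acceptance rate of about one half, the cap of 100 draws fails only with negligible probability, while a sampler that can never produce a feasible candidate terminates with None.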

###### Remark 2.

Even though we have presented scenario optimization to estimate dspec, alternative derivative-free optimization approaches, such as Bayesian optimization, simulated annealing, evolutionary algorithms, and covariance matrix adaptation, can be used as well. However, for many of these algorithms, it might be challenging to provide formal guarantees on the quality of the resultant estimate of the distance bound.

Algorithm 1 samples environment scenarios and corresponding controllers prior to running any executions on S and M. Imagine that at iteration i we have a running estimate equal to the maximum distance seen so far; if at a later iteration l the computed distance dl does not exceed this running maximum, then the lth sample is not informative for approximating dspec. A simple way to overcome this issue is to restrict the set of feasible controllers at the lth iteration to those whose samples can produce a distance greater than the running maximum. This variant of Algorithm 1 reduces the number of executions of the system and ensures that each execution is informative for estimating dspec. To implement this scheme, we maintain a running maximum d̄ of the distances seen so far. In iteration i, instead of sampling from Uφ(ei) in Line 5, we sample only controllers in Uφ(ei) for which (10) can exceed d̄. Further, before the end of the loop, in Line 7, we update d̄ to max(d̄, di).
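One way to realize this variant: the abstraction-side margin (the min over t in (10), before the indicator) upper-bounds the distance whatever the system does, so the system execution can be skipped whenever that margin does not exceed the running maximum. The margin and falsification checks below are placeholders:

```python
def scenario_loop(scenarios, margin_fn, falsifies_fn, d0=0.0):
    """scenarios: iterable of (e, u) pairs sampled as in Algorithm 1;
    margin_fn(e, u): abstraction-side margin, computable from xi_M alone;
    falsifies_fn(e, u): runs the real system, True iff xi_S falsifies phi(e)."""
    d_max, executions = d0, 0
    for e, u in scenarios:
        if margin_fn(e, u) <= d_max:
            continue                  # this sample cannot raise the estimate
        executions += 1               # only now is the real system executed
        if falsifies_fn(e, u):
            d_max = margin_fn(e, u)
    return d_max, executions

# Toy illustration with hand-picked margins; scenario 3's system run is safe.
margins = [0.3, 0.1, 0.5, 0.4, 0.6]
result = scenario_loop([(i, None) for i in range(5)],
                       lambda e, u: margins[e],
                       lambda e, u: e != 3)
```

In this toy run, two of the five scenarios are skipped without ever executing the system, yet the final estimate is unchanged.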

## 7. Running Example: Distance Computation

We now apply the proposed algorithm to compute the distance bound for the setting described in Section 4. The set of feasible controllers Uφ(e) in this case is given as

 Uφ(e)= {u∈UΠ(e):∥ξM(H;x0,u)−x∗∥2<γ},

where UΠ(e) is the set of LQR controllers (see Section 4). To illustrate the importance of the choice of distance metric, we compute two different distance metrics between the system and the abstraction: the SSM distance in (7) and the SPEC distance in (9). To compute SPEC, we use Algorithm 1. To compute SSM, we modify Algorithm 1 to sample a random controller from UΠ(e) in Line 5 and to compute the distance using (7) in Line 7.

Following the scenario approach with the chosen confidence parameters, we extract the required number of different reach-avoid scenarios (i.e., different final states to reach). For each scenario, we obtain a feasible LQR controller using rejection sampling. In particular, we randomly sample a penalty parameter r, solve the corresponding Riccati equation to obtain the controller gains, and apply the controller on the abstraction. If the corresponding model trajectory satisfies the specification, we use this controller as our feasible controller sample; otherwise, we sample a new r and repeat the procedure until a feasible controller is found. This procedure tends to be fast and requires simulating only the abstraction. A feasible controller was found within 3 samples of r for all scenarios in this case. For SSM, we randomly sample a penalty parameter and use the corresponding LQR controller directly.
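As a hedged illustration of this rejection-sampling loop, the sketch below uses a toy scalar model x(t+1) = x(t) + u(t) rather than the paper's actual dynamics; the horizon, cost weights, terminal tolerance, and the sampled range of the penalty parameter r are all illustrative assumptions:

```python
import random

def lqr_gains(H, q, r):
    """Finite-horizon LQR gains for the toy scalar model x(t+1) = x(t) + u(t)
    with stage cost q*x^2 + r*u^2, via the backward Riccati recursion."""
    P = q                      # terminal cost weight
    gains = []
    for _ in range(H):
        K = P / (r + P)        # scalar Riccati gain
        P = q + P - P * P / (r + P)
        gains.append(K)
    return gains[::-1]         # time-ordered gains

def feasible_lqr(H=20, x0=1.0, gamma=1e-2, max_rejects=50, seed=0):
    """Rejection-sample the penalty r until the closed-loop *model*
    trajectory ends within gamma of the target state 0; only the model is
    ever simulated, mirroring the procedure in the text."""
    rng = random.Random(seed)
    for _ in range(max_rejects):
        r = rng.uniform(0.1, 10.0)   # random penalty parameter (toy range)
        Ks = lqr_gains(H, 1.0, r)
        x = x0
        for K in Ks:
            x = x - K * x            # closed-loop model rollout
        if abs(x) < gamma:
            return r, Ks
    return None

result = feasible_lqr()
assert result is not None
```

With this toy range of r the very first sample is typically accepted; widening the range would exercise more rejections, as in the paper's experiment.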

The obtained SPEC distance is significantly smaller than the obtained SSM distance. Since SSM is a valid bound, it can be used to synthesize a safe controller for the system; however, we can then synthesize a controller only for those reach-avoid scenarios where the model satisfies a much more stringent specification: it must reach within a ball of radius 0.07 around the target state. Consequently, the resulting set of safe environments is likely to be very small. In contrast, the SPEC distance is small enough that Proposition 1 ensures that any controller designed on the abstraction that satisfies the specification is guaranteed to satisfy it for the system as well. In particular, the dynamics of the system and the abstraction are the same in the controlled state, while the remaining state is uncontrollable for both and stays constant at all times. Thus, any controller that reaches within a ball of a given radius around a desired state for the abstraction, if applied on the system, also ensures that the system state reaches within the same ball. Even though this relationship between the system and the abstraction is unknown a priori, SPEC is able to capture it purely through simulations of the system. This example also illustrates that SPEC significantly reduces the conservativeness of SSM and does not unnecessarily contract the set of safe environments.

## 8. Case Studies

We now demonstrate how SPEC can be used to obtain the safe set of environments and controllers for an autonomous quadrotor and an autonomous car. In Section 8.1, we demonstrate how SPEC provides much larger safe sets compared to SSM. In Section 8.2, we demonstrate how SPEC captures not only the differences between the dynamics of the system and the abstraction, but also other aspects of the system, in particular the sensor error, that might affect the satisfaction of a specification.

### 8.1. Safe Altitude Control for Quadrotor

Our first example illustrates how the proposed distance metric behaves when the only difference between the system and the abstraction is the value of one parameter. However, unlike the running example, the system and the abstraction dynamics are non-linear. Moreover, we illustrate how SPEC can be used in cases where not all safe controllers for the abstraction are safe for the system.

We use the reach-avoid setting described in (Fisac et al., 2017), where the authors are interested in controlling the altitude of a quadrotor in an indoor setting while ensuring that it does not go too close to the ceiling or the floor, which are obstacles in our experiments.

A dynamic model of quadrotor vertical flight can be written as:

 (14) z(t+1)=z(t)+T⋅vz(t)
      vz(t+1)=vz(t)+T⋅(k⋅u(t)+g),

where z is the vehicle’s altitude, vz is its vertical velocity, and u is the commanded average thrust. The gravitational acceleration is g and the discretization step is T = 0.01. The control input is bounded. We are interested in designing a controller for the system that ensures safety over a horizon of 100 timesteps. The avoid set at any time consists of the states too close to the ceiling or the floor. We again assume that the dynamics in (14) are unknown, and consider an abstraction of the system with the same dynamics as (14), except that the value of the thrust parameter k in the abstraction dynamics is different.
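A quick sanity check of the model (14) can be coded directly; the values of k, g, and the floor/ceiling bounds below are illustrative assumptions, not the paper's exact constants:

```python
def simulate_altitude(z0, vz0, controls, k, T=0.01, g=-9.8):
    """Euler-discretized vertical dynamics from (14): altitude z integrates
    the vertical velocity vz, and vz integrates the net acceleration."""
    z, vz = z0, vz0
    traj = [(z, vz)]
    for u in controls:
        z = z + T * vz
        vz = vz + T * (k * u + g)
        traj.append((z, vz))
    return traj

def violates_avoid(traj, z_min=0.0, z_max=2.8):
    """Avoid set: altitudes too close to the floor or the ceiling
    (the bounds are hypothetical)."""
    return any(not (z_min < z < z_max) for z, _ in traj)

# Hovering thrust u = -g/k cancels gravity, so altitude stays constant
# over the 100-step horizon and the avoid set is never entered.
traj = simulate_altitude(1.0, 0.0, [9.8] * 100, k=1.0)
assert not violates_avoid(traj)
```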

The space of controllers is given by all possible control sequences over the time horizon (i.e., open-loop control sequences). For computing the feasible controllers, we use the Level Set Toolbox (Mitchell, 2008), which gives us both the set of initial states from which there exists a controller that keeps the abstraction outside the avoid set at all times (also called the reachable set), as well as the corresponding least restrictive controller. In particular, we can apply any control when the abstraction trajectory is inside the reachable set, and the safety-preserving control (given by the toolbox) when the trajectory is near the boundary of the reachable set. For the computation of the distance bounds, we sample a random controller sequence according to this safety-preserving control law. If an initial state lies outside the reachable set, then it is guaranteed that no feasible controller exists for it, so we do not need to do any rejection sampling in this case.

When the abstraction has strictly less control authority than the system, any controller that satisfies the specification for the abstraction will also satisfy the specification for the system; the abstraction's safe set is thus itself an under-approximation of the system's safe set. SPEC is again able to capture this behavior: we computed estimates of the distance bounds using Algorithm 1, and the SPEC estimate is much smaller than the SSM estimate. Note that not only is SSM conservative, it may not be particularly useful for synthesizing a safe controller for the system. The SSM bound computed using Algorithm 1 ensures that a safe controller designed on the abstraction is also safe for the system with high probability only when this controller is randomly selected from the full set UΠ(e). However, a random controller selected from UΠ(e) is unlikely to satisfy the specification for the abstraction itself, and thus nothing can be said about the system either; it is therefore hard to actually compute an approximation of the safe set this way. In contrast, SPEC samples a controller from the feasible set Uφ(e) in Algorithm 1. Therefore, to synthesize a controller, we randomly select a controller from Uφ(e), which is guaranteed to be safe on both the abstraction and the system with high probability. Consequently, it might be fairer to compare SPEC to a variant of SSM, defined as in (7) except that the inner maximum is computed over Uφ(e) instead; this variant can likewise be estimated with Algorithm 1.

Note that if we could instead compute the distance metrics exactly, this variant of SSM would be at least as large as SPEC, since SPEC maximizes the deviation over only the subset of trajectories that violate the specification for the system. However, a random-sampling-based estimate of one metric can exceed that of the other if the controllers corresponding to a large distance between the abstraction and the system are sparse in one sampled controller set compared to the other.

For illustration purposes, we also compute the contracted reachable sets by augmenting the avoid set with the computed distance bounds and recomputing the reachable sets using the Level Set Toolbox. As shown in Figure 2, the SPEC-based reachable set (the area within the blue contour) is indeed contained within the true reachable set (the area within the red contour); the latter has been computed using the system dynamics. Even though the SSM-based reachable set (the area within the magenta contour) is also contained in the true set, it is significantly smaller in size compared to the SPEC-based set.

When the system has strictly less control authority than the abstraction, there might exist some environments for which it is possible to synthesize a safe controller for the abstraction, but the same controller, when deployed on the system, might lead to unsafe behavior. We again compute the distance bounds using Algorithm 1; the corresponding reachable sets are shown in Figure 3. Even though we start with an overly optimistic abstraction, both SPEC and SSM are able to compute an under-approximation of the system's safe set; however, the set estimated by SSM is, once again, overly conservative.

### 8.2. Webots: Lane Keeping

We now show the application of the proposed metric for designing a safe lane keeping controller for an autonomous car.

In this example, we use the Webots simulator (Webots, 1998). The car model within the simulator is our system. For the abstraction, we consider the bicycle model,

 (15) ˙x=v⋅sinθ
      ˙y=v⋅cosθ
      ˙v=a
      ˙θ=(v/l)⋅tanω

where (x, y, v, θ) is the state, representing the perpendicular deviation from the center of the lane, the position along the road, the speed, and the heading, respectively. The maximum speed is capped. We have two inputs: (1) a discrete acceleration control a; and (2) a continuous steering control ω. For our experiments, we use a fixed horizon, which translates to several seconds of simulated trajectory. The dynamics of the car in the simulator are much more complex than the bicycle model and include physical effects like friction and slip on the road.
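An Euler-discretized sketch of the bicycle abstraction (15); the wheelbase l and step size dt are illustrative assumptions, not the paper's values:

```python
import math

def bicycle_step(state, a, omega, l=2.5, dt=0.05):
    """One Euler step of the bicycle model (15): x is the lateral deviation
    from the lane center, y the position along the road, v the speed, and
    theta the heading."""
    x, y, v, theta = state
    return (x + dt * v * math.sin(theta),
            y + dt * v * math.cos(theta),
            v + dt * a,
            theta + dt * (v / l) * math.tan(omega))

# Driving straight (theta = 0, omega = 0): only the along-road position y
# and the speed v change; the lateral deviation x stays at zero.
s = (0.0, 0.0, 5.0, 0.0)
for _ in range(20):
    s = bicycle_step(s, a=1.0, omega=0.0)
assert s[0] == 0.0 and s[2] > 5.0
```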

In this case, the initial state of the car is fixed. The reach set corresponds to keeping the car within a band around the center of the lane. To keep the car in the lane, it is equipped with two sensors: a camera (to capture the lane ahead) and a compass (to measure the heading of the car). An on-board perception module first captures an image of the road ahead, then processes it to detect the lane edges and provide an estimate of the deviation of the car from the center of the lane.

There is another car (referred to as the environment car hereon) driving in front of the autonomous car, which might obstruct the lane and cause the perception module to incorrectly detect the lane center. Each environment specifies a set of possible initial positions of the environment car. We set the initial speed and heading of the environment car to fixed values. We want to make sure that the car remains within the lane despite all possible initial positions of the environment car. For this purpose, we compute the worst-case distance bound across all of them.

If the environment car or its shadow covers the lane edges (see Figure 5 for some possible scenarios), then the lane detection fails. If such a scenario occurs, the car should slow down and come to a stop until the image processing starts detecting the lane again. Consequently, our control scheme is a hybrid controller, shown in Figure 4, where in each mode the controller is given by an LQR controller (with fixed Q and R matrices) corresponding to the (linearized) dynamics in that mode. In this example, the controller is deterministic since the Q and R matrices are fixed, and hence the set of feasible controllers is a singleton.

In Figure 4, in mode (1), the lane is detected and the car tracks the center of the lane. When a switching condition on the car's state is met, we transition to mode (2), provided the lane is still detected. When the lane is no longer detected, we transition to mode (3) or mode (4), depending on the car's state. In modes (3) and (4), the car slows down until the lane is detected again.
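A minimal sketch of this mode logic follows; since the exact switching guards are not reproduced here, the boolean `fast` is a hypothetical stand-in for the condition on the car's state:

```python
def next_mode(mode, lane_detected, fast):
    """Mode-switching logic sketched from Figure 4: modes (1) and (2) track
    the lane, modes (3) and (4) brake until the lane is detected again.
    The guard `fast` (e.g., speed above a threshold) is a hypothetical
    stand-in for the paper's elided switching condition."""
    if lane_detected:
        if mode == 1 and fast:
            return 2              # switching condition met, lane still seen
        return 1 if mode in (3, 4) else mode   # lane re-detected: resume
    # Lane lost: brake, choosing mode (3) or (4) by the current state.
    return 3 if fast else 4

assert next_mode(1, lane_detected=True, fast=True) == 2
assert next_mode(2, lane_detected=False, fast=False) == 4
assert next_mode(3, lane_detected=True, fast=False) == 1
```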

By choosing appropriate scenario-optimization parameters, we obtain 2964 environment scenarios. We used Algorithm 1 to sample these different initial states of the car and of the environment car in the simulator. Since the controller is deterministic, the set of feasible controllers is a singleton, and hence we do not need to sample a feasible controller (Line 5 in Algorithm 1). Among these environment scenarios, the controller designed on the abstraction is also able to safely control the system for 2519 scenarios. SPEC is determined entirely by the remaining 445 scenarios, and is computed to be 0.34 m. We show the application of the computed SPEC for a sample environment scenario in Figure 6. The green lines represent the original reach set. The yellow shaded region represents the contracted reach set for the model computed using SPEC. The model's trajectory (shown in blue) is contained in the yellow region and hence satisfies the more constrained specification. As a result, even though the system's trajectory (shown in dotted red) leaves the yellow region, it is contained within the original reach set at all times.
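The computation of SPEC from logged executions can be sketched as follows: only the runs in which the system violated the specification enter the maximum, mirroring how the 445 violating scenarios alone determine the bound. The run format and scalar trajectories are illustrative assumptions:

```python
def spec_distance(runs):
    """Estimate SPEC from logged runs. Each run is a tuple of a model
    trajectory, a system trajectory, and a flag marking whether the system
    violated the specification; only violating runs enter the maximum."""
    d = 0.0
    for xi_m, xi_s, violated in runs:
        if violated:
            d = max(d, max(abs(a - b) for a, b in zip(xi_m, xi_s)))
    return d

runs = [
    ([0.0, 0.1], [0.0, 0.12], False),  # safe on the system: ignored
    ([0.0, 0.1], [0.2, 0.44], True),   # violating: contributes 0.34
]
assert abs(spec_distance(runs) - 0.34) < 1e-9
```

SSM, by contrast, would take the maximum over all runs, including the safe ones, which is what makes it more conservative.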

We also analyze the 445 environment scenarios that contribute to SPEC, and notice that the fault lies within the perception module. In Figure 7, we show one such scenario, in which the car is rotated to the left. Because of this left rotation, the rightmost lane appears smaller and farther away due to perspective distortion. Furthermore, the presence of the environment car completely covers the rightmost lane in the image. The image processing module now detects the leftmost lane as the center lane and the center lane as the rightmost lane. Consequently, the module returns an inaccurate estimate of the center of the lane, causing the car to leave the center lane. This example illustrates that the samples in Algorithm 1 that contribute to SPEC can also be used to analyze the reasons behind the violation of the safety specification by the system.

## 9. Conclusion and Future Work

Determining safe environments and synthesizing safe controllers for autonomous systems is an important problem. Typically, we rely on an abstraction of the system to synthesize controllers in different environments. However, when a mathematical model of the system is not available, for example when the abstraction is learned from data, it becomes challenging to provide safety guarantees for the system based on the abstraction. In this paper, we propose a specification-centric simulation metric, SPEC, that can be used to determine the set of safe environments and to synthesize a safe controller using such data-driven abstractions. We also present an algorithm to compute this metric using executions on the system without knowing its true dynamics. The proposed metric is less conservative and allows controller synthesis for reach-avoid specifications over a broader range of environments compared to the standard simulation metric.

In the future, it would be interesting to extend the proposed framework to more general specifications and to study its application in runtime-assurance frameworks like (Herbert* et al., 2017) and (Desai et al., 2018). Another interesting direction would be to explore active sampling methods, such as Bayesian optimization, for the computation of SPEC.

## References

• Abate (2009) A. Abate. 2009. A contractivity approach for probabilistic bisimulations of diffusion processes. In Proceedings of the 48th IEEE Conference on Decision and Control.
• Abate and Prandini (2011) A. Abate and M. Prandini. 2011. Approximate abstractions of stochastic systems: A randomized method. In Conference on Decision and Control and European Control Conference.
• Alur et al. (2000) R. Alur, T. A. Henzinger, G. Lafferriere, and G. J. Pappas. 2000. Discrete abstractions of hybrid systems. In Proceedings of the IEEE.
• Baier et al. (2008) C. Baier, J. Katoen, and K. G. Larsen. 2008. Principles of model checking. MIT press.
• Bansal et al. (2016) S. Bansal, A. K. Akametalu, F. J. Jiang, F. Laine, and C. J. Tomlin. 2016. Learning quadrotor dynamics using neural network for flight control. In Conference on Decision and Control.
• Bansal et al. (2017) S. Bansal, R. Calandra, T. Xiao, S. Levine, and C. J. Tomlin. 2017. Goal-Driven Dynamics Learning via Bayesian Optimization. In Conference on Decision and Control.
• Bujorianu et al. (2005) M. L. Bujorianu, J. Lygeros, and M. C. Bujorianu. 2005. Bisimulation for general stochastic hybrid systems. In International Workshop on Hybrid Systems: Computation and Control.
• Calafiore and Campi (2005) G. C. Calafiore and M. C. Campi. 2005. Uncertain convex programs: randomized solutions and confidence levels. In Mathematical Programming.
• Calafiore and Campi (2006) G. C. Calafiore and M. C. Campi. 2006. The scenario approach to robust control design. In IEEE Transactions on Automatic Control.
• Campi and Garatti (2011) M. C. Campi and S. Garatti. 2011. A sampling-and-discarding approach to chance-constrained optimization: feasibility and optimality. In Journal of Optimization Theory and Applications.
• Campi et al. (2009) M. C. Campi, S. Garatti, and M. Prandini. 2009. The scenario approach for systems and control design. In Annual Reviews in Control.
• Dean and Givan (1997) T. Dean and R. Givan. 1997. Model minimization in Markov decision processes. In AAAI/IAAI.
• Desai et al. (2018) Ankush Desai, Shromona Ghosh, Sanjit A. Seshia, Natarajan Shankar, and Ashish Tiwari. 2018. SOTER: Programming Safe Robotics System using Runtime Assurance. In arXiv:1808.07921.
• Desharnais et al. (2002) J. Desharnais, A. Edalat, and P. Panangaden. 2002. Bisimulation for labelled Markov processes. In Information and Computation.
• Fisac et al. (2017) J. F. Fisac, A. K. Akametalu, M. N. Zeilinger, S. Kaynama, J. Gillula, and C. J. Tomlin. 2017. A general safety framework for learning-based control in uncertain robotic systems. In arXiv:1705.01292.
• Garatti and Prandini (2012) S. Garatti and M. Prandini. 2012. A simulation-based approach to the approximation of stochastic hybrid systems. In Analysis and design of hybrid systems.
• Gevers et al. (2003) M. Gevers, X. Bombois, B. Codrons, G. Scorletti, and B. D. O. Anderson. 2003. Model Validation for Control and Controller Validation in a Prediction Error Identification framework-Part I: Theory. In Automatica.
• Ghosh et al. (2016) S. Ghosh, D. Sadigh, P. Nuzzo, V. Raman, A. Donzé, A. L. Sangiovanni-Vincentelli, S. Sastry, and S. A. Seshia. 2016. Diagnosis and Repair for Synthesis from Signal Temporal Logic Specifications. In Proceedings of the 19th International Conference on Hybrid Systems: Computation and Control.
• Girard and Pappas (2007) A. Girard and G. J. Pappas. 2007. Approximate bisimulation relations for constrained linear systems. In Automatica.
• Girard and Pappas (2011) A. Girard and G. J. Pappas. 2011. Approximate Bisimulation: A bridge between computer science and control theory. In European Journal of Control.
• Girard et al. (2010) A. Girard, G. Pola, and P. Tabuada. 2010. Approximately bisimilar symbolic models for incrementally stable switched systems. In IEEE Transactions on Automatic Control.
• Herbert* et al. (2017) Sylvia L Herbert*, Mo Chen*, SooJean Han, Somil Bansal, Jaime F Fisac, and Claire J Tomlin. 2017. FaSTrack: a Modular Framework for Fast and Guaranteed Safe Motion Planning. In IEEE Conference on Decision and Control.
• Estimating model variance in the case of undermodeling. In IEEE Transactions on Automatic Control.
• Julius and Pappas (2009) A. A. Julius and G. J. Pappas. 2009. Approximations of stochastic hybrid systems. In IEEE Transactions on Automatic Control.
• Katoen et al. (2007) J. P. Katoen, T. Kemna, I. Zapreev, and D. N. Jansen. 2007. Bisimulation minimisation mostly speeds up probabilistic model checking. In International Conference on tools and algorithms for the construction and analysis of systems.
• Kwakernaak and Sivan (1972) H. Kwakernaak and R. Sivan. 1972. Linear optimal control systems.
• Larsen and Skou (1991) K. G. Larsen and A. Skou. 1991. Bisimulation through probabilistic testing. In Information and computation.
• Lenz et al. (2015) I. Lenz, R. A. Knepper, and A. Saxena. 2015. DeepMPC: Learning Deep Latent Features for Model Predictive Control.. In Robotics: Science and Systems.
• Ljung (1987) L. Ljung. 1987. System identification: theory for the user.
• Mitchell (2008) I. M. Mitchell. 2008. The flexible, extensible and efficient toolbox of level set methods. In Journal of Scientific Computing.
• Mitsch et al. (2013) S. Mitsch, K. Ghorbal, and A. Platzer. 2013. On Provably Safe Obstacle Avoidance for Autonomous Robotic Ground Vehicles. In Robotics: Science and Systems.
• Papadopoulos and Prandini (2016) A. V. Papadopoulos and M. Prandini. 2016. Model reduction of switched affine systems. In Automatica.
• Pola et al. (2008) G. Pola, A. Girard, and P. Tabuada. 2008. Approximately bisimilar symbolic models for nonlinear control systems. In Automatica.
• Seshia et al. (2016) Sanjit A. Seshia, Dorsa Sadigh, and S. Shankar Sastry. 2016. Towards Verified Artificial Intelligence. In arXiv:1606.08514.
• Strubbe and Van Der Schaft (2005) S. Strubbe and A. Van Der Schaft. 2005. Bisimulation for communicating piecewise deterministic Markov processes (CPDPs). In International Workshop on Hybrid Systems: Computation and Control.
• Tabuada (2009) Paulo Tabuada. 2009. Verification and Control of Hybrid Systems: A Symbolic Approach. Springer Science & Business Media.
• Tomlin et al. (2000) C. J. Tomlin, J. Lygeros, and S. Sastry. 2000. A game theoretic approach to controller design for hybrid systems. In Proceedings of the IEEE.
• Tomlin et al. (1998) C. J. Tomlin, G. J. Pappas, and S. Sastry. 1998. Conflict resolution for air traffic management: A study in multiagent hybrid systems. In IEEE Transactions on automatic control.
• Webots (1998) Webots. 1998. Commercial Mobile Robot Simulation Software.

## 10. Appendix

### 10.1. Proof of Proposition 1

Proof. Consider, for a given environment and controller, the corresponding model and system trajectories ξM and ξS, and suppose the model trajectory satisfies the specification contracted by da. We would like to prove that the system trajectory satisfies the original specification.

From (7), we have

 (16) ∥ξS(t)−ξM(t)∥≤da, ∀t∈T.

From the definition of the specification in (8), ξM satisfies the contracted specification if and only if it stays at least da away from the avoid set and at least da inside the reach set at all times. Since ξM satisfies the contracted specification,

 (17) ∥ξM(t)−a∥>da,∀t∈T,∀a∈A(t).

Combining (16) and (17) implies that

 (18) ∥ξS(t)−a∥>0,∀t∈T,∀a∈A(t).

Equation (18) implies that ξS(t)∉A(t) for any t∈T. Similarly, it can be shown that

 ∥ξS(t)−r∥>0,∀t∈T,∀r∈R(t)c,

where R(t)c denotes the complement of the set R(t). Therefore, ξS(t)∈R(t) for all t∈T.

Since ξS(t)∉A(t) and ξS(t)∈R(t) for all t∈T, ξS satisfies the original specification.

### 10.2. Proof of Proposition 2

Proof. We prove the desired result by contradiction. Suppose there exists an environment and a controller such that but