ShieldNN: A Provably Safe NN Filter for Unsafe NN Controllers

06/16/2020
by James Ferlez, et al.

In this paper, we consider the problem of creating a safe-by-design Rectified Linear Unit (ReLU) Neural Network (NN), which, when composed with an arbitrary control NN, makes the composition provably safe. In particular, we propose an algorithm to synthesize such NN filters that safely correct control inputs generated for the continuous-time Kinematic Bicycle Model (KBM). ShieldNN contains two main novel contributions: first, it is based on a novel Barrier Function (BF) for the KBM model; and second, it is itself a provably sound algorithm that leverages this BF to design a safety-filter NN with safety guarantees. Moreover, since the KBM is known to approximate the dynamics of four-wheeled vehicles well, we show the efficacy of ShieldNN filters in CARLA simulations of four-wheeled vehicles. In particular, we examined the effect of ShieldNN filters on Deep Reinforcement Learning trained controllers in the presence of individual pedestrian obstacles. The safety properties of ShieldNN were borne out in our experiments: the ShieldNN filter reduced the number of obstacle collisions by 99.4%-100%. We also studied the effect of incorporating ShieldNN during training: for a constant number of episodes, 28% less reward was observed when ShieldNN wasn't used during training. This suggests that ShieldNN has the further property of improving sample efficiency during RL training.



1 Introduction

In this paper, we will consider the safety of data-trained memoryless feedback controllers (e.g., RL-trained feed-forward neural networks) in the following context: we will assume an autonomous vehicle that is described by the Kinematic Bicycle Model (KBM) dynamics (a good approximation for four-wheeled vehicles KongKinematicDynamicVehicle2015), and a safety property that the autonomous vehicle should avoid a stationary, fixed-radius disk in the plane. In particular, we propose ShieldNN, an algorithm to design Rectified Linear Unit (ReLU) safety networks for this scenario: a ShieldNN network composed in series with any memoryless feedback controller makes the composition of the two controllers provably safe by the aforementioned criterion. This structure itself distinguishes ShieldNN from most other work on safe data-trained controllers: instead of designing a single safe controller, ShieldNN uses the KBM dynamics to design a controller-agnostic NN that corrects – in real time – unsafe control actions generated by any controller. In other words, ShieldNN designs a “safety-filter” NN that takes as input the instantaneous control action generated by a controller (along with the state of the system) and outputs a safe control for the KBM dynamics; this safety-filter NN thus replaces unsafe controls generated by the original controller with safe controls, whereas safe controls generated by the original controller are passed through unaltered – i.e., unsafe controls are “filtered” out. A block diagram of this structure is depicted in Fig. 1.
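To make this architecture concrete, the following is a minimal Python sketch of the composition (our own pseudocode, not the authors' implementation; `controller` and `shield` stand in for the trained policy network and the ShieldNN filter, respectively):

```python
def shielded_controller(controller, shield):
    """Compose an arbitrary (possibly unsafe) controller with a safety
    filter: the filter passes safe actions through unaltered and replaces
    unsafe ones with safe actions."""
    def composed(state):
        raw_action = controller(state)    # proposal from the trained policy
        return shield(state, raw_action)  # filtered action applied to the plant
    return composed
```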

The benefits of this approach are manifest, especially for data-trained controllers. On the one hand, existing controllers that were designed without safety in mind can be made safe merely by incorporating the safety filter in the control loop. In this scenario, the safety filter can also be seen as a countervailing factor for controllers trained to mimic experts: the expert learning provides a design for “performance”, and the safety filter corrects unanticipated unsafe control behavior as needed.

Figure 1: Block diagram of ShieldNN in the control loop for a four-wheeled autonomous vehicle.

On the other hand, the controller-agnostic nature of our proposed filter means that ShieldNN itself may be incorporated into training. In this way, the safety filter can be seen to function as a kind of “safety expert” during training, and this can potentially improve sample efficiency by eliminating training runs that end in unsafe states.

The main theoretical contribution of this paper is thus the development of the ShieldNN algorithm. The central pillar of ShieldNN is the notion of a barrier function, because a barrier function allows the safety problem for a feedback controller to be recast as a set membership problem for the outputs of said controller. In particular, this recasting reduces the safety-filter design problem to one of designing a prediction-style NN whose outputs are constrained to lie in a specific set. As a prerequisite for ShieldNN, then, we propose a novel class of candidate barrier functions for the KBM dynamics that is characterized by three real-valued parameters (one of which is the safety radius). The necessity for a class of candidate barrier functions stems from the difficulty of analytically designing a single barrier function for the KBM dynamics. Thus, ShieldNN is functionally divided into two parts: a verifier, which soundly verifies a particular choice of barrier function (from the aforementioned class), and a synthesizer, which designs the actual ShieldNN filter.

Furthermore, we validated these theoretical results experimentally on four-wheeled vehicle models. In particular, we apply ShieldNN safety filters both before and after RL training for an autonomous vehicle simulated in CARLA Dosovitskiy17. Our results show that incorporating ShieldNN dramatically improved the safety of the vehicle: it reduced the number of obstacle collisions by 99.4%-100% in our safety experiments. We also studied the effect of incorporating ShieldNN during training: for a constant number of episodes, 28% less reward was observed when ShieldNN wasn't used during training. This suggests that ShieldNN has the further property of improving sample efficiency during RL training.

Related Work.

Motivated by the lack of safety guarantees in Deep RL, recent works in the safe-RL literature have focused on designing new RL algorithms that can take safety into account. The work in this area can be classified into three categories. The works in the first category focus on how to modify the training algorithm to take safety constraints into account. Representative examples of this work include reward shaping saunders2018trial, Bayesian and robust regression liu2019robust; berkenkamp2016bayesian; pauli2020training, and policy optimization with constraints gaskett2003reinforcement; moldovan2012safe; turchetta2016safe; wen2020safe. Unfortunately, such approaches do not provide provable guarantees on the safety of the trained controller. The second category of literature focuses on using ideas from control theory to augment the RL agent and provide safety guarantees. Examples of this literature include the use of Lyapunov methods berkenkamp2017safe; chow2019lyapunov; chow2018lyapunov, safe model predictive control koller2018learning, reachability analysis akametalu2014reachability; govindarajan2017data; fisac2018general, barrier certificates wang2018safe; cheng2019end; wabersich2018scalable; srinivasan2020synthesis; taylor2020control; li2019temporal; cheng2020safe; robey2020learning, and online learning of uncertainties shi2019neural. Unfortunately, such methods suffer from being computationally expensive, from being specific to certain controller structures or training algorithms, or from requiring certain assumptions on the system model. Finally, the third category focuses on applying formal verification techniques (e.g., model checking) to verify formal safety properties of pretrained RL agents. Representative examples of this category are the use of SMT-like solvers sun2019formal; dutta2018output; liu2019algorithms and hybrid-system verification fazlyab2019efficient; xiang2019reachable; ivanov2019verisig. These techniques only assess the safety of a given RL agent instead of designing a safe agent.

2 Problem Statement

As described above, we consider the kinematic bicycle model (KBM) as the dynamical model for our autonomous car. However, the usual KBM is defined in terms of the absolute Cartesian position of the bicycle, which is inconsistent with the sensing modalities typically available to an autonomous robot. Thus, we instead describe the bicycle kinematics in terms of relative positional variables that are directly measurable via LiDAR or visual sensors. In particular, the dynamics for the distance from the bicycle to the obstacle and the angle of the bicycle with respect to the obstacle comprise a system of ordinary differential equations. These quantities describe a state-space model that is given by:

(1)
Figure 2: Obstacle specification and minimum barrier distance as a function of relative bicycle orientation.

where a is the linear acceleration input and δ_f is the front-wheel steering angle input (that is, we assume the steering angle can be set instantaneously, so that the dynamics of the steering rack can be ignored). For the sake of intuition, we note a few special cases: at certain orientation angles the bicycle is oriented tangentially to the obstacle, and at others it is pointing directly at or away from the obstacle (see Fig. 2). The slip angle β is an intermediate quantity, an invertible function of δ_f.
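For reference, the standard Cartesian form of the KBM (following KongKinematicDynamicVehicle2015) is given below; the relative-coordinate model (1) is obtained by rewriting these dynamics in terms of the distance and angle to the obstacle:

\[
\dot{x} = v\cos(\psi+\beta),\qquad
\dot{y} = v\sin(\psi+\beta),\qquad
\dot{\psi} = \frac{v}{\ell_r}\sin\beta,\qquad
\dot{v} = a,\qquad
\beta = \tan^{-1}\!\Big(\frac{\ell_r}{\ell_f+\ell_r}\tan\delta_f\Big),
\]

where (x, y) is the Cartesian position, ψ the heading, v the velocity, and ℓ_f, ℓ_r the distances from the center of mass to the front and rear axles.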

We make the further assumption that the KBM has a control constraint on the front-wheel steering angle δ_f. To simplify further notation, we will consider the slip angle β directly as a control variable; this is without loss of generality, since there is a bijection between β and the actual steering control angle, δ_f. Thus β is also constrained accordingly. Finally, we define the state and control vectors for the KBM, with the control pair (acceleration and steering) taking values in the set of admissible controls.

Problem 1.

Consider a KBM robot with a given maximum steering angle, length parameters, and maximum velocity (in our KBM model, the velocity bound technically requires a feedback controller on the velocity, but this won't affect our results). Consider also a disk-shaped region of fixed radius centered at the origin. Find a set of safe initial conditions and a ReLU NN:

(2)

such that for any globally Lipschitz continuous controller, the state feedback controller:

(3)

is guaranteed to prevent the robot from entering the unsafe region if it is started from a state in the safe set. Equivalently, applying the feedback controller (3) ensures that the robot remains outside the unsafe disk for all time when the initial condition is chosen in the set of safe initial conditions.
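In generic notation (our own; the paper's symbols are not reproduced here), the filter network in (2) and the composed controller in (3) have the following shape:

\[
\mathfrak{F} : X \times U_{\mathrm{adm}} \to U_{\mathrm{adm}},
\qquad
\mu_{\pi}(x) := \mathfrak{F}\big(x,\pi(x)\big),
\]

where π is the arbitrary globally Lipschitz controller being filtered.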

3 Approach

The most important feature of Problem 1 is that the filter is a memoryless function that must correct the output of a feedback controller instantaneously. The existence of such a corrective function is not a priori guaranteed for the KBM dynamics. However, the well-known theory of Barrier Functions (BFs) provides a mechanism for ensuring the safety of a dynamical system: in short, barrier functions are real-valued functions of the system state whose properties ensure that the value of the function remains greater than zero along trajectories of the system ames2019control; ames2016control. Thus, if a barrier function is designed so that its zero super-level set is contained inside the set of safe states, then that set is forward invariant; i.e., if the system starts from a safe state, then it will stay safe for all future time. In this way, barrier functions can be used to convert safety properties into an instantaneous – albeit state-dependent – set membership problem for control actions.
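In generic notation (ours, not necessarily the paper's), the zeroing-barrier condition underlying this reduction (XuRobustnessControlBarrier2015) is: if h is a continuously differentiable function with zero super-level set C_h = {x : h(x) ≥ 0}, and there is a class-K function α with

\[
\dot h(x(t)) \;\ge\; -\alpha\big(h(x(t))\big)
\]

along closed-loop trajectories, then C_h is forward invariant.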

Thus, in the spirit of Problem 1, we employ the usual theory of autonomous barrier functions applied to control systems under state-feedback control, i.e., a control system in closed loop with a state-feedback controller. In this scenario, the feedback controller converts the control system into an autonomous one, described by an autonomous closed-loop vector field. Moreover, the conditions for a barrier function can be translated into a set membership problem for the outputs of such a feedback controller. This is explained in the following corollary.

Corollary 1.

Let a control system be given that is Lipschitz continuous in both of its arguments on a set of states; furthermore, let a candidate barrier function be given whose zero super-level set is contained in that set, and let a class-K function be given. If the set

(4)

is non-empty for each state, and a feedback controller satisfies

(5)

then the zero super-level set is forward invariant for the closed-loop dynamics.

Proof.

This follows directly from an application of zeroing barrier functions (XuRobustnessControlBarrier2015, Theorem 1). ∎
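Concretely, under the same generic notation as above, the set in (4) and the condition in (5) take the standard zeroing-barrier form (a reconstruction consistent with XuRobustnessControlBarrier2015, not a verbatim restatement):

\[
R_h(x) \;:=\; \big\{\, u \in U_{\mathrm{adm}} \;:\; \nabla h(x)\cdot f(x,u) + \alpha(h(x)) \ge 0 \,\big\},
\qquad
\mu(x) \in R_h(x)\ \ \text{for all } x.
\]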

Corollary 1 is the foundation of ShieldNN: the only difference is that instead of designing a single safe controller, we will design a safe “combined” controller. In this usage, when a controller generates a control action that lies outside of the set of instantaneously safe controls, the filter must map it to a control within that set.

Thus, Corollary 1 admits the following three-step framework for developing ShieldNN filters.

ShieldNN Framework:

(1) Design a Candidate Barrier Function. For a function to be a barrier function for a specific safety property, its zero super-level set must be contained in the set of safe states.

(2) Verify the Existence of Safe Controls. (ShieldNN Verifier) Show that the set of instantaneously safe controls is non-empty for each state. This establishes that a safe feedback controller may exist.

(3) Design a Safety Filter. (ShieldNN Synthesizer) If possible, design a network whose output is an instantaneously safe control at every state; then obtain a safety filter as:

(6)

ShieldNN thus hinges on the design of a barrier function, and then on the design of two prediction-type NN functions: one that generates a safe control at each state, and the filter itself, which overrides any unsafe control for a state with the associated safe value.
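A minimal Python sketch of this two-network structure follows (our own naming, not the paper's: `safe_net` plays the role of the always-safe network and `is_safe` tests membership in the verified safe-control set):

```python
def shieldnn_filter(state, proposed_control, safe_net, is_safe):
    """Safety filter in the sense of step (3): pass through controls that
    lie in the verified safe set; otherwise substitute the output of the
    always-safe network."""
    if is_safe(state, proposed_control):  # membership test in the safe set
        return proposed_control
    return safe_net(state)                # verified-safe fallback control
```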

4 Barrier Function(s) for the KBM Dynamics: the Basis of ShieldNN

It is difficult to analytically derive a single barrier function as a function of a particular robot and safety radius for the KBM. Thus, we instead define a class of candidate barrier functions for a specific robot: this class is further parameterized by a unit-less scaling parameter and the safety radius, and it has the property that there are guaranteed parameter choices that actually result in a barrier function. However, since the analytically guaranteed parameter choices are impractically conservative, this necessitates a ShieldNN verifier algorithm to establish whether a particular (user-supplied) choice of barrier function parameters does indeed constitute a barrier function.

In particular, we propose the following class of candidate barrier functions to certify control actions so that the bicycle doesn't get within the safety radius of the origin (Problem 1):

(7)

where the additional unit-less scaling parameter appears, whose function we shall describe subsequently. First note that setting the candidate function to zero yields a unique solution in the distance variable for each value of the orientation angle:

(8)

so the smallest distance attained on the barrier boundary is the safety radius. Thus, the candidate function satisfies the requirements of step (1) in the ShieldNN framework: i.e., its zero super-level set is entirely contained in the set of safe states as prescribed by Problem 1, independent of the choice of the scaling parameter. See Fig. 2, which also depicts another crucial value of the barrier distance.

Remark 1.

Note that the candidate barrier function is independent of the velocity state. This will ultimately force ShieldNN filters to intervene only by altering the steering input.

A barrier function also requires a class-K function. For ShieldNN, we choose a linear function

(9)

whose slope involves the assumed maximum linear velocity (see Problem 1) and a constant selected according to the following theorem.
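(For example, any linear function with positive slope is class-K; a form consistent with this description, under our assumption on how the constants enter, is

\[
\alpha(h) = c\,\bar{v}\,h, \qquad c > 0,
\]

with v̄ the assumed maximum linear velocity and c the constant constrained by Theorem 1.)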

Theorem 1.

Consider any fixed KBM and barrier parameters, and assume the maximum-velocity bound (as specified by Problem 1). If the constant in (9) is chosen such that:

(10)

then the Lie derivative of the candidate barrier function is a monotonically increasing function of the steering control for each fixed choice of the remaining state and control variables.

In particular, for all states and admissible controls satisfying the resulting steering condition, it is the case that:

(11)

In addition to concretely defining our class of candidate barrier functions, Theorem 1 is the essential facilitator of the ShieldNN algorithm. In particular, note that

(12)

since the relevant factors are nonnegative. Hence, the set of instantaneously safe controls is independent of the velocity state, so (11) gives a sufficient condition for safe controls (step (2) of the framework) in terms of a single state variable (the orientation angle) and a single control variable (the steering input). This simplifies not only the ShieldNN verifier but also the ShieldNN synthesizer, as we shall demonstrate in the next section.

5 ShieldNN

(a) Safe/unsafe steering controls: the safe region is shown in light green and its boundary curves in dark green.
(b) Illustration of the synthesized safe-control function (orange) and two constant-orientation slices of the final ShieldNN filter (black).
Figure 3: Illustrated ShieldNN products for a representative choice of KBM and barrier parameters.

ShieldNN Verifier: The overall ShieldNN algorithm has three inputs: the specs for a KBM robot (length parameters, maximum steering angle, and maximum velocity); the desired safety radius; and the barrier scaling parameter. From these inputs, the ShieldNN verifier first soundly verifies that these parameters lead to an actual barrier function for Problem 1. As per Theorem 1, it suffices to show that the set of safe steering controls is non-empty for each orientation angle.

If the sets of safe controls have a complicated structure (both individually and relative to each other), then establishing this could in principle be quite difficult. However, the barrier functions under consideration actually appear to generate quite nice regions of safe controls. In particular, it appears to be the case that the set of safe steering angles in any particular orientation state is an interval clipped at the maximum/minimum steering inputs. That is, each such set can be written as an interval whose endpoints are continuous functions of the orientation angle. Even more helpfully, the boundary function generally appears to be concave, and the symmetry of the problem dictates a mirror-image relationship between the two boundaries. See Fig. 3(a) for an example with representative parameters; the safe region is shown in light green, and its boundary curves are shown in dark green.

Of course, these observations about the boundary functions are difficult to show analytically, given the nature of the equations (cf. (12)). Nevertheless, we can exhibit a sound algorithm to verify these claims for particular parameter values, and hence that the input parameters correspond to a legitimate barrier function. Due to space constraints, the details of this algorithm appear in the supplementary material.

ShieldNN Synthesizer: Given a verified barrier function, recall from step (3) in Section 3 that synthesizing a ShieldNN filter requires two components: a network that chooses a safe control for each state, and the filter itself, which overrides any unsafe controls with the output of that network.

Design of the safe-control network. This task is much easier than it otherwise would be, since the ShieldNN verifier also verifies that the safe controls lie between two continuous boundary functions of the orientation angle, one of which is concave. In particular, then, it is enough to design the safe-control network as any neural network whose output lies between these boundary functions for every orientation angle:

(13)

This property can be achieved in several ways, including training against samples of the boundary function, for example. However, we chose to synthesize the network directly in terms of tangent-line segments to the concave boundary (and thus exploit its concavity). A portion of just such a function is illustrated by the orange line in Fig. 3(b).
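As an illustration of the tangent-line idea, the following sketch (our own naming; `g` is a hypothetical stand-in for the concave boundary of the safe steering set, not the paper's actual function) builds a piecewise-linear approximation that never crosses to the unsafe side:

```python
import numpy as np

def make_tangent_segments(g, dg, knots):
    """Approximate a concave boundary function g from above by the pointwise
    minimum of its tangent lines at the given knots.  For concave g, every
    tangent line lies on or above g, so the minimum is still >= g (hence on
    the safe side) while touching g at each knot.  A minimum of affine
    functions is exactly representable by a small ReLU network."""
    lines = [(dg(k), g(k) - dg(k) * k) for k in knots]  # (slope, intercept)
    def s_hat(xi):
        return min(m * xi + b for (m, b) in lines)
    return s_hat

# Hypothetical usage with a stand-in concave boundary (not the paper's):
# g, dg = lambda x: -x**2, lambda x: -2.0 * x
# s_hat = make_tangent_segments(g, dg, np.linspace(-1.0, 1.0, 9))
```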

Design of the override function. Since the value of the safe-control network is designed to lie inside the interval of safe controls, that network can itself be used to decide when an unsafe control is supplied. In particular, using this property and the symmetry of the problem, we can simply choose

(14)

Note: in this construction, the closer the safe-control network approximates its lower bound, the less intrusive the safety filter will be. Two constant-orientation slices of such a filter are shown in Fig. 3(b).

6 ShieldNN Evaluation

We conduct a series of experiments to evaluate ShieldNN's performance when applied to unsafe RL controllers. The CARLA simulator Dosovitskiy17 is used as our RL environment, and we consider an RL agent whose goal is to drive a simulated vehicle while avoiding the obstacles in the environment. Video recordings and other details of the experiments are available online. The goals of the experiments are to assess the following:

  1. The effect of ShieldNN when applied during RL training (Experiment 1) in terms of the average collected reward, obstacle avoidance, etc.

  2. The safety of the RL agent when ShieldNN is applied after training (Experiment 2).

  3. The robustness of ShieldNN when applied in a different environment than that used in training (Experiment 3).

RL Task: The RL task is to drive a simulated four-wheeled vehicle from point A to point B on a curved road that is populated with obstacles. The obstacles are static CARLA pedestrian actors randomly spawned at different locations between the two points. We define unsafe states as those in which the vehicle hits an obstacle. As ShieldNN is designed for obstacle avoidance, we do not consider the states when the vehicle hits the sides of the roads to be unsafe with respect to ShieldNN. Technical details and graphical representations are included in the Supplementary Materials.

Reward function and termination criteria: If the vehicle reaches point B, the episode terminates and the RL agent receives a fixed terminal reward. The episode also terminates, with a fixed penalty, in the following cases: when the vehicle (i) hits an obstacle; (ii) hits one of the sides of the road; (iii) has a speed lower than 1 KPH after 5 seconds from the beginning of the episode; or (iv) has a speed that exceeds the maximum speed (45 KPH). The reward function is a weighted sum of four terms, and the weights were tuned during training. The four terms are designed to incentivize the agent to keep the vehicle's speed between a minimum speed (35 KPH) and a target speed (40 KPH), maintain the desired trajectory, align the vehicle's heading with the direction of travel, and keep the vehicle away from obstacles. The reward function is defined formally in the Supplementary Materials.

Proximal Policy Optimization (PPO) schulman2017proximal was used to train a neural network to perform the desired RL task. The network receives distance and orientation measurements, which are synthesized from CARLA position and orientation data to simulate LiDAR input. The network then outputs the control actions: a throttle command and a steering angle. The steering angle is subsequently processed by the ShieldNN filter to produce a corrected “safe” steering angle, which is applied to the simulated car along with the original throttle input generated by the PPO agent. The ShieldNN filter is synthesized according to Section 5 with a fixed barrier parameter and the KBM parameters of the simulated vehicle. The full details of the architecture and signal processing for the agent are provided in the supplementary materials.
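For the “Filter ON” configurations, the filter sits between the agent and the simulator. A minimal gym-style wrapper sketch (our own pseudocode, not the authors' training harness; `env` and `shield` are assumed interfaces) looks like:

```python
class ShieldedEnv:
    """Gym-style wrapper: every agent action passes through the safety
    filter before reaching the simulator, so the same shield can be used
    during RL training and at deployment."""
    def __init__(self, env, shield):
        self.env, self.shield = env, shield
        self._obs = None

    def reset(self):
        self._obs = self.env.reset()
        return self._obs

    def step(self, action):
        throttle, steering = action
        safe_steering = self.shield(self._obs, steering)  # steering only
        self._obs, reward, done, info = self.env.step((throttle, safe_steering))
        return self._obs, reward, done, info
```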

Experiment 1: Effect of ShieldNN During RL Training The goal of this experiment is to study the effect of applying ShieldNN to an RL agent during training. We train three RL agents for 6000 episodes each in order to compare (i) the collected reward and (ii) the obstacle hit rate after an equal number of training episodes. The three agents are characterized as follows: Agent 1 is trained with no obstacles and without the ShieldNN filter in place (Obstacles OFF + Filter OFF); Agent 2 is trained with obstacles spawned at random but without ShieldNN in place (Obstacles ON + Filter OFF); and Agent 3 is trained with obstacles spawned at random and with the ShieldNN filter in place (Obstacles ON + Filter ON).

When obstacles are not present (Agent 1), the RL agent quickly learns how to drive the vehicle, as indicated by the rapid growth in the reward function shown in Fig. 4(a). When obstacles are present but ShieldNN is not used (Agent 2), the RL agent's ability to learn the task degrades, as indicated by a 30% reduction in collected reward. However, when obstacles are present and the ShieldNN filter is in place (Agent 3), the agent collects 28% more reward on average than Agent 2, and collects a similar amount of reward to Agent 1. This is an indication that the ShieldNN filter improves the training of the system by reducing the number of episodes that are terminated early due to collisions.

Similar behavior can be observed in Fig. 4(b), which shows the obstacle collision rate (averaged across episodes). This figure shows that Agent 2 slowly learns how to avoid obstacles, since its average obstacle collision rate decreases from 80% to 47% over the 6000 episodes. However, Agent 3, which uses ShieldNN during training, has an obstacle collision rate of almost zero. In total, Agent 3 suffers only three collisions across all 6000 episodes. We believe that these three collisions are due to the discrepancy between the KBM and the dynamics of the vehicle used by the CARLA simulator.

(a) Reward (raw & smoothed data for 3 cases)
(b) Obstacle collision rate
Figure 4: Results of Experiment 1, evaluation of effect of ShieldNN during training.

Experiment 2: Safety Evaluation of ShieldNN The goal of this experiment is to validate the safety guarantees provided by ShieldNN when applied to unsafe controllers. To do this, we evaluate the three trained agents from Experiment 1 in the same environment they were trained in, with obstacles spawned randomly according to the same distribution used during training. With this setup, we consider two evaluation scenarios: (i) when the ShieldNN filter is in place (ShieldNN ON) and (ii) when the ShieldNN filter is not in place (ShieldNN OFF). Table 1 shows all six configurations of this experiment. For each configuration, we run 200 episodes and record three metrics: (i) the minimum distance between the center of the vehicle and the obstacles, (ii) the average percentage of track completion, and (iii) the obstacle hit rate across the 200 episodes.

Fig. 5(a) and 5(b) show the histograms of the minimum distance to obstacles for each configuration. The figures also show two vertical lines at 2.3 m and 4 m: the former is the minimum distance at which a collision can occur, given the length of the vehicle, and the latter is the value of the safe distance used to design the ShieldNN filter. Whenever ShieldNN was not used in the 200 testing episodes (ShieldNN OFF, Fig. 5(a)), the average of each histogram is close to the 2.3 m line, indicating numerous obstacle collisions. The exact obstacle hit rates are reported in Table 1. Upon comparing the histograms in Fig. 5(a) with those in Fig. 5(b), we conclude that ShieldNN nevertheless renders all three agents safe: note that the center of mass of the histograms shifts above the safety radius used to design the ShieldNN filter. In particular, Agents 2 and 3 were able to avoid all the obstacles spawned in all 200 episodes, while Agent 1 hit only 0.5% of the obstacles spawned. Again, we believe this is due to the difference between the KBM used to design the filter and the actual dynamics of the vehicle. In general, the obstacle hit rate is reduced by 99.4%-100% across Agents 1, 2, and 3.

                 Training            Testing    Experiment 2      Experiment 3A
Config   Obstacle   Filter           Filter     TC%      OHR%     TC%      OHR%
1        OFF        OFF              OFF        7.59     99.5     27.53    79.5
2        OFF        OFF              ON         98.82    0.5      98.73    0.5
3        ON         OFF              OFF        94.82    8.5      71.88    34
4        ON         OFF              ON         100      0        100      0
5        ON         ON               OFF        62.43    44       50.03    60
6        ON         ON               ON         100      0        100      0

TC% := Track Completion %   OHR% := Obstacle Hit Rate %

Table 1: Experiments 2 & 3, evaluation of safety and performance with and without ShieldNN.

Experiment 3: Robustness of ShieldNN in Different Environments The goal of this experiment is to test the robustness of ShieldNN when applied inside a different environment than the one used to train the RL agents. We split the experiment into two parts:

Part 3-A:

We use the same setup and metrics as in Experiment 2, but we perturb the locations of the spawned obstacles by Gaussian noise in the lateral and longitudinal directions. Fig. 5(c) and 5(d) show that, despite this obstacle perturbation, ShieldNN is still able to maintain a safe distance between the vehicle and the obstacles, whereas this is not the case when ShieldNN is OFF. Table 1 shows an overall increase in obstacle hit rate and a decrease in track completion rate when ShieldNN is OFF, compared to the previous experiment. This is expected, as the PPO agent was trained with the obstacles spawned at locations with a different distribution than the one used in testing. However, ShieldNN continues to demonstrate its performance and safety guarantees, achieving a near-perfect track completion rate and a near-zero obstacle hit rate.

Part 3-B:

We use a completely different environment and track than the ones used in training, but we spawn the obstacles at locations with the same distribution used in training. We first perform transfer learning and train the pretrained Agents 2 and 3 for 500 episodes in the new environment. In this case, ShieldNN still achieves the desired safety distance on average, achieving exactly zero obstacle hit rates in both cases along with high track completion rates. Implementation details and results for Experiment 3-B are included in the Supplementary Material.

(a) Experiment 2, ShieldNN OFF
(b) Experiment 2, ShieldNN ON
(c) Experiment 3A, ShieldNN OFF
(d) Experiment 3A, ShieldNN ON
Figure 5: Distributions of distance-to-obstacles for experiments 2 & 3, with and without ShieldNN.

Side Effects of ShieldNN: In our experiments, applying ShieldNN during training had the side effect of creating a higher curb hit rate during both training and testing, as compared to the case when the agent was trained with ShieldNN OFF. In particular, after training for 6000 episodes, the curb hit rate decreased over the course of training for both Agent 2 and Agent 3, but Agent 3 finished with the higher rate. This is due to the fact that ShieldNN forces the vehicle to steer away from facing an obstacle, which, in turn, increases the probability of hitting one of the sides of the road. This side effect suggests the possibility of future research in generalizing ShieldNN to provide safety guarantees against hitting environment boundaries as well.

References

Additional Notation

Throughout the rest of this appendix we will use the following notation:

(15)

where the newly defined function is built from the right-hand side of the ODE in (1), and one of its arguments is merely a placeholder, since (15) does not depend on it at all. In particular, (15) has the following relationship with (12):

(16)

Moreover, we define the following set:

(17)

which is the subset of the zero-level set of the barrier function that is compatible with our assumption on the distance to the obstacle (see Problem 1).

Proofs for Section 4

There are two claims from Section 4 that require proof.

  1. First, we stated Theorem 1 without proof.

  2. Second, we claimed that for any KBM parameters, there exist a safety radius and a barrier parameter such that the functions defined in (7) and (9) comprise a barrier function for the KBM.

We provide proofs for each of these in the next two subsections.

Proof of Theorem 1

We prove the first claim of Theorem 1 as the following Lemma.

Lemma 1.

Consider any fixed KBM and barrier parameters. Furthermore, define

(18)

Now suppose that the barrier candidate is as in (7) and the class-K function is as in (9), with its constant chosen as in (18).

Then, for each fixed orientation angle, the function

(19)

is increasing on its domain, the admissible steering interval.

Remark 2.

Note the relationship between the function in (19) and the function used to define the set of safe controls in Corollary 1. That set is exactly the one we are interested in characterizing in Theorem 1.

Proof.

We will show that, with the constant chosen as stated, each such function has a strictly positive derivative on its domain. In particular, differentiating gives:

(20)

To ensure that this derivative is strictly positive, it suffices to choose the constant such that

(21)

For this, we consider two cases.

In the first case, the relevant term is nonnegative for all admissible steering inputs. Thus, it suffices to choose the constant such that

(22)

which is assured, under the stated assumption, if

(23)

In the second case, choosing the constant according to (23) ensures that (21) holds over part of the relevant range; it remains to ensure that (21) holds over the rest, which will be the case if

(24)

Thus, the desired conclusion holds if we choose the constant as defined in the statement of the lemma. ∎

Now, we have the prerequisites to prove Theorem 1.

Proof.

(Theorem 1) The first claim of Theorem 1 is proved as Lemma 1. Thus, it remains to show that (11) holds for all states on the barrier boundary and all controls satisfying the steering condition. However, this follows from Lemma 1.

In particular, choose an arbitrary state on the barrier boundary and an arbitrary admissible control; as usual, we will only need to concern ourselves with the steering control. First, observe that by definition:

(25)

However, the conclusion of this implication can be rewritten using the definition in (19):

(26)

We now invoke Lemma 1: since the constant was chosen appropriately by construction, Lemma 1 indicates that the function in (19) is strictly increasing on its domain. Combining this conclusion with (26), we see that the desired inequality holds at the steering control in question. Again using the definition in (26), we conclude that

(27)

Thus, we conclude that the control belongs to the set of safe controls, by the definition thereof (see the statement of Theorem 1). Finally, since the state and control were chosen arbitrarily, we get the desired conclusion. ∎

Proof of That a Barrier Function Exists for Each KBM Instance

For (7) and (9) to constitute a useful class of barrier functions, it should be the case that at least one member of the class is in fact a barrier function for each instance of the KBM. We make this claim in the form of the following theorem.

Theorem 2.

Consider any KBM robot with given length parameters, maximum steering angle, and maximum velocity. Furthermore, suppose that the following two conditions hold:

  1. , or equivalently, ;

  2. ; and

Then the set of safe controls is non-empty for every admissible orientation angle. In particular, the feedback controller (interpreted as a function of the orientation angle only):

(28)

is safe.

Remark 3.

Note that there is always a choice of safety radius and barrier parameter such that condition (ii) can be satisfied. In particular, it suffices for them to be chosen such that:

(29)

Thus, by making one of these parameters large enough relative to the other, it is possible to choose values such that the inequality (29) holds, and (ii) is satisfied.

Proof.

(Theorem 2) As a consequence of Theorem 1, it is enough to show that the set of safe controls is non-empty for every admissible orientation angle.

The strategy of the proof will be to consider one extreme steering control, and to verify that for each orientation angle in one half of the admissible range, we have:

(30)

The symmetry of the problem will allow us to make a similar conclusion for the other half of the range.

We proceed by partitioning the admissible range of orientation angles into three subintervals, and we consider the case in which the orientation angle lies in each such subinterval separately.

Case 1: In this case, the relevant trigonometric quantities have definite signs by assumption. It is direct to show that:

(31)

and

(32)

Hence, one term in (12) can be lower bounded by zero, and the first term in (12) is identically zero by (32). Thus, in this case, (12) is lower bounded as:

(33)

which of course will be greater than zero by assumption (i).

Case 2: In this case, we have that:

(34)
(35)
(36)

Consequently, (30) is automatically satisfied, since all of the quantities in the Lie derivative are positive.

Case 3: In this case, the situation is as in Case 2; however, one of the terms is now negative:

(37)

Thus, since the other two terms are positive on this interval, we need to have:

(38)

This follows from the bounds that hold on this interval; i.e., we substituted the lower and upper endpoints of the interval, respectively. After simplification, we finally obtain:

(39)

The preceding is just another form of (ii), so we have the desired conclusion in (30).

The conclusion of the theorem then follows from the combined consideration of Cases 1-3 and Theorem 1 as claimed above. ∎

Proofs for Section 5

Additional Notation

Throughout the rest of this appendix we will use the following notation:

(40)

where the newly defined function is built from the right-hand side of the ODE in (1), and one of its arguments is merely a placeholder, since (40) does not depend on it at all. In particular, (40) has the following relationship with (12):

(41)

Moreover, we define the following set:

(42)

which is the subset of the zero-level set of the barrier function that is compatible with our assumption on the distance to the obstacle (see Problem 1).

Proofs

ShieldNN Verifier

Recall that the main function of the ShieldNN verifier is to soundly verify that

(43)

for a continuous, concave boundary function; the conclusion for the mirror-image boundary follows directly from the symmetry of the problem, so we will focus on verifying the claims for one boundary.

As a foundation for the rest of this subsection, we make the following observation.

Proposition 1.

Suppose that (43) holds. Then for any state on the barrier boundary, it is the case that

(44)
Proof.

This follows directly from the relevant definition, and the fact that we are considering it on the barrier boundary. ∎

This suggests that we should start from (44) in order to establish the claim in (43). To this end, let two real numbers be given, and define:

(45)

with the appropriate modifications for other interval types. We also define a related quantity:

(46)

We can thus develop a sound algorithm to verify (43) and the concavity of the boundary function by soundly verifying the following three properties in sequence:

Property 1. Show that the boundary curve intersects the lower control constraint at a single orientation angle, and likewise by symmetry.

Property 2. Verify that the relevant portion of the zero-level set is the graph of a function (likewise by symmetry), and define the boundary function accordingly.

Property 3. Verify that the function defined in Property 2 is concave.

The ShieldNN verifier algorithm expresses each of these properties as the sound verification that a particular function is greater than zero on a subset of its domain. Naturally, the functions associated with these properties are either the Lie-derivative expression itself or else derived from it (i.e., literally obtained by differentiating), and so each is an analytic function in which the variables appear only inside trigonometric functions. Thus, these surrogate verification problems are easily approachable by over-approximation and the Mean Value Theorem.

With this program in mind, the remainder of this appendix consists of one section each explaining how to express Properties 1-3 as minimum-verification problems. These are followed by a section that describes the main algorithmic component of the ShieldNN verifier, CertifyMin.
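The following is a minimal sketch of one sound way to certify positivity on an interval, assuming a known Lipschitz bound for the function being checked (our own illustration; the actual CertifyMin routine is specified in the sections below):

```python
def certify_positive(f, lip, lo, hi, tol=1e-9):
    """Soundly certify that f > 0 on [lo, hi], given a Lipschitz bound `lip`
    for f.  By the Mean Value Theorem, on an interval of width w with
    midpoint c, f(x) >= f(c) - lip * w / 2; if that lower bound is positive,
    the whole subinterval is certified, otherwise it is bisected."""
    stack = [(lo, hi)]
    while stack:
        a, b = stack.pop()
        c, w = 0.5 * (a + b), b - a
        if f(c) - lip * w / 2.0 > 0.0:
            continue                    # certified positive on [a, b]
        if w < tol:
            return False                # cannot certify; refine no further
        stack.append((a, c))
        stack.append((c, b))
    return True
```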

Verifying Property 1

To verify Property 1, we can start by using a numerical root-finding algorithm to find a zero of the relevant function of the orientation angle. However, there is no guarantee that this root is the only root on the set of interest. Thus, the property to be verified in this case is captured in the assumptions of the following proposition.

Proposition 2.

Suppose the setup above holds. Furthermore, suppose that there exists an orientation angle such that:

  1. ;

  2. and ;