# Learning Control Barrier Functions from Expert Demonstrations

Inspired by the success of imitation and inverse reinforcement learning in replicating expert behavior through optimal control, we propose a learning based approach to safe controller synthesis based on control barrier functions (CBFs). We consider the setting of a known nonlinear control affine dynamical system and assume that we have access to safe trajectories generated by an expert - a practical example of such a setting would be a kinematic model of a self-driving vehicle with safe trajectories (e.g. trajectories that avoid collisions with obstacles in the environment) generated by a human driver. We then propose and analyze an optimization-based approach to learning a CBF that enjoys provable safety guarantees under suitable Lipschitz smoothness assumptions on the underlying dynamical system. A strength of our approach is that it is agnostic to the parameterization used to represent the CBF, assuming only that the Lipschitz constant of such functions can be efficiently bounded. Furthermore, if the CBF parameterization is convex, then under mild assumptions, so is our learning process. We end with extensive numerical evaluations of our results on both planar and realistic examples, using both random feature and deep neural network parameterizations of the CBF. To the best of our knowledge, these are the first results that learn provably safe control barrier functions from data.

## 1 Introduction

Consider the following safety-critical scenarios: a self-driving car navigating through traffic, two unmanned aerial vehicles (UAVs) avoiding collision, and a robotic manipulator in a laboratory setting that must avoid injuring researchers. Although vastly different in terms of their environments, safety-specifications, and underlying dynamics, they share several key properties: (i) their dynamics are well understood and modeled, and can be accurately identified, (ii) their dynamics are inherently nonlinear, and (iii) expert demonstrations of safe and desirable behavior are readily available or can be easily collected. Motivated by these unifying properties, this paper proposes the design of safe controllers for known nonlinear dynamical systems based on control barrier functions learned from expert demonstrations.

Barrier functions, which are also referred to as barrier certificates, were first proposed in [12] as a means of certifying the safety of dynamical systems with respect to semi-algebraic safe sets. In that work, a sum-of-squares (SOS) programming [11] approach for synthesizing polynomial barrier functions for given polynomial systems was also described. The notion of control barrier functions (CBFs) for dynamical control systems was first introduced in [25] to guarantee the existence of a control law that renders a desired safe set forward invariant. The notion of CBFs was refined by introducing reciprocal [2] and zeroing CBFs [3], which do not require that sub-level sets of the CBF be invariant within the safe set. In particular, zeroing CBFs can be used to compute a minimally invasive “correction” to a nominal control law. Importantly, this correction maintains safety by computing the solution to a quadratic program (QP) [3].

One open problem that has not been fully addressed in prior work is how such CBFs can be synthesized for general classes of systems. This challenge is similar to that which arises when addressing stability using control Lyapunov functions (CLFs) as the analog to Lyapunov functions [16]. Notably, control Lyapunov functions are a subset of control barrier functions (see [3] and [28]). Analytic and SOS based approaches to synthesizing CBFs and CLFs are summarized in [1] and have appeared in [27, 23]. These approaches, however, are known to be limited in scope and scalability.

### 1.1 Related work on learning and CBFs

Methods using barrier and control barrier functions to ensure safety and guide exploration during episodic supervised learning of uncertain linear dynamics include [7, 24, 20, 19]. These approaches typically assume that a valid (control) barrier function is provided, and should be viewed as complementary to our results. In [29], an imitation learning based approach is used to train a deep neural network (DNN) to replicate a CBF based controller. While the authors of [29] present empirical validation of their results, no theoretical guarantees of correctness are provided. Most similar in spirit to our paper are the results in [18] and [15]. In [18], the authors parameterize a CBF by a support vector machine, and use a supervised learning approach to characterize regions of the state-space as safe or unsafe based on collected data. While conceptually appealing, we note that their training procedure does not ensure a priori that there exist control actions such that the learned safe set can be made forward invariant, and hence cannot guarantee safe execution of the system. (In particular, they do not ensure that the derivative condition $\langle \nabla h(x), f(x) + g(x)u \rangle \geq -\alpha(h(x))$ holds for the learned CBF $h$ at the observed data points, with $f$ and $g$ the system dynamics and $\alpha$ an extended class $\mathcal{K}$ function; see Section 2 for more details.) In [15], a method is proposed which incrementally learns a linear CBF by clustering expert demonstrations into linear subspaces and fitting low dimensional representations. While both papers [18, 15] empirically validate their methods, neither provides proofs of correctness of the learned CBF.

Contributions.

In this paper, we propose and analyze an optimization based approach to learning a zeroing CBF (henceforth referred to simply as a CBF) from expert trajectories for known control affine nonlinear systems. In particular, we provide precise and verifiable conditions on the expert trajectories, an additional auxiliary data-set, and the hyperparameters of the optimization problem so as to ensure that the learned CBF guarantees safe execution of the system. We further show how the underlying optimization problem can be efficiently solved when it is cast over different function spaces. In particular, we show that the problem can be solved via convex optimization when the function space lies within a (possibly infinite-dimensional) reproducing kernel Hilbert space (RKHS); alternatively, when we consider the function space of deep neural networks (DNNs), the problem can be solved via first-order stochastic methods such as Adam or SGD. To the best of our knowledge, these are the first such results that learn a CBF from expert demonstrations with provable safety guarantees.

Paper structure. The rest of this paper is structured as follows. In Section 2, we introduce notation and formulate the general problem of learning a CBF from expert demonstrations. In Section 3, we derive a set of sufficient conditions on the learned CBF and data-set that guarantee safety of the resulting closed-loop system, and we subsequently use these conditions to formulate an optimization problem for computing a function satisfying these conditions. We show in Section 3.4 that this optimization problem can be efficiently solved for CBFs embedded in RKHS and DNN function classes, and in Section 3.5, we provide further details on the expert trajectory collection process. We present three numerical studies in Section 4: (i) a two-dimensional planar problem for which we explicitly compute and verify all of the conditions of our main theorem, showing that the conditions are indeed satisfied in practice, (ii) a two UAV collision-avoidance example where expert trajectories are generated by the closed form CBF from [17], and (iii) the same two UAV collision avoidance example, where now expert trajectories are generated by human players of a video game interface. We end with conclusions and discussions of directions for future work in Section 5.

## 2 Preliminaries and problem formulation

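Let $\mathbb{R}$ and $\mathbb{R}_{\geq 0}$ be the set of real and non-negative real numbers, respectively, and $\mathbb{R}^n$ the set of $n$-dimensional real vectors. For $\epsilon > 0$ and $p \geq 1$, we let $B_{\epsilon,p}(\bar{x}) := \{x \in \mathbb{R}^n \mid \|x - \bar{x}\|_p \leq \epsilon\}$ denote the closed $p$-norm ball around $\bar{x} \in \mathbb{R}^n$. For a given set $\mathcal{C}$, we denote by $\mathrm{bd}(\mathcal{C})$, $\mathrm{int}(\mathcal{C})$, and $\mathcal{C}^c$ the boundary, interior, and complement of $\mathcal{C}$, respectively. For two sets $\mathcal{C}_1$ and $\mathcal{C}_2$, we denote their Minkowski sum by $\mathcal{C}_1 \oplus \mathcal{C}_2$. A continuous function $\alpha: \mathbb{R} \to \mathbb{R}$ is an extended class $\mathcal{K}$ function if it is strictly increasing with $\alpha(0) = 0$. The inner-product between two vectors $x, y \in \mathbb{R}^n$ is denoted by $\langle x, y \rangle$.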

### 2.1 Valid control barrier functions

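At time $t \geq 0$, let $x(t) \in \mathbb{R}^n$ and $u(t) \in \mathbb{R}^m$ be the state and input, respectively, of the dynamical control system described by the initial value problem

$$\dot{x}(t) = f(x(t)) + g(x(t))u(t), \quad x(0) \in \mathbb{R}^n, \tag{2.1}$$

where $f: \mathbb{R}^n \to \mathbb{R}^n$ and $g: \mathbb{R}^n \to \mathbb{R}^{n \times m}$ are locally Lipschitz continuous functions. Let the unique solution to (2.1) under a locally Lipschitz continuous control law $u: \mathbb{R}^n \to \mathbb{R}^m$ be $x: \mathcal{I} \to \mathbb{R}^n$, where $\mathcal{I} \subseteq \mathbb{R}_{\geq 0}$ is the maximum definition interval of $x$. Note that we do not explicitly assume forward completeness of (2.1) under $u$ here, i.e., $\mathcal{I}$ may be bounded.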

Consider next a continuously differentiable function $h: \mathbb{R}^n \to \mathbb{R}$, and define the set

$$\mathcal{C} := \{x \in \mathbb{R}^n \mid h(x) \geq 0\}, \tag{2.2}$$

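which defines a set that we wish to certify as safe, i.e., that it satisfies prescribed safety specifications and can be made forward invariant through an appropriate choice of control action. We further assume that $\mathcal{C}$ has non-empty interior, and let $\mathcal{D}$ be an open set such that $\mathcal{C} \subseteq \mathcal{D}$. To avoid technicalities (see [1, Remark 5]), we assume that $\mathcal{C}$ is strictly contained within $\mathcal{D}$, i.e., for each $x \in \mathcal{C}$ there exists $\epsilon > 0$ such that $B_{\epsilon,p}(x) \subset \mathcal{D}$. The function $h$ is said to be a valid control barrier function on $\mathcal{D}$ if there exists a locally Lipschitz continuous extended class $\mathcal{K}$ function $\alpha$ such that

$$\sup_{u \in \mathcal{U}} \langle \nabla h(x), f(x) + g(x)u \rangle \geq -\alpha(h(x)) \tag{2.3}$$

holds for all $x \in \mathcal{D}$, where $\mathcal{U} \subseteq \mathbb{R}^m$ defines constraints on the control input $u$. Consequently, we define the set of CBF consistent inputs induced by a valid CBF $h$ to be

$$K_{\mathrm{CBF}}(x) := \{u \in \mathbb{R}^m \mid \langle \nabla h(x), f(x) + g(x)u \rangle \geq -\alpha(h(x))\}.$$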

The next result follows from [3, 28].

###### Lemma 2.1.

Assume that is a valid control barrier function on and that with is locally Lipschitz continuous. Then is forward invariant under , i.e., implies for all . Moreover, it holds that is asymptotically stable under , which implies that approaches as when and if , which holds if is compact.
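To make the role of the set of CBF consistent inputs concrete, the following is a minimal sketch of a safety filter that projects a nominal input onto that set. It assumes no additional input constraints, so the quadratic program has a single affine constraint and a closed-form solution. The function name `cbf_filter`, the single-integrator example, the barrier $h(x) = 1 - \|x\|_2^2$, and the choice $\alpha(r) = r$ are illustrative assumptions and are not taken from the paper.

```python
import numpy as np

def cbf_filter(x, u_nom, f, g, h, grad_h, alpha=lambda r: r):
    """Closest input to u_nom (Euclidean norm) satisfying
    <grad_h(x), f(x) + g(x) u> >= -alpha(h(x)), with no other input limits."""
    a = g(x).T @ grad_h(x)                 # constraint reads a @ u >= b
    b = -alpha(h(x)) - grad_h(x) @ f(x)
    slack = a @ u_nom - b
    if slack >= 0.0:                       # nominal input is already CBF consistent
        return u_nom
    if np.allclose(a, 0.0):
        raise ValueError("the input cannot influence the constraint at this state")
    return u_nom - slack * a / (a @ a)     # project onto the constraint boundary

# Hypothetical example: single integrator x' = u kept inside the unit disc.
f = lambda x: np.zeros(2)
g = lambda x: np.eye(2)
h = lambda x: 1.0 - x @ x
grad_h = lambda x: -2.0 * x

x = np.array([0.9, 0.0])
u_nom = np.array([1.0, 0.5])               # pushes towards the boundary
u_safe = cbf_filter(x, u_nom, f, g, h, grad_h)
```

In this example only the component of the nominal input that points out of the disc is reduced; the tangential component is left untouched, which is the sense in which such a correction is minimally invasive.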

While the previous result provides strong guarantees of safety given a valid control barrier function, one is still left with the potentially daunting task of finding a continuously differentiable function $h$ such that (i) the set $\mathcal{C}$ defined in equation (2.2) captures a sufficiently large volume of “safe” states needed for the task at hand, and (ii) it satisfies the derivative constraint (2.3) on an open set $\mathcal{D}$. While safety constraints are often naturally specified on a subset of the configuration space of a system (e.g., to avoid collision, vehicles must maintain a minimum separating distance), ensuring that a CBF specified using such geometric intuition also satisfies constraint (2.3) can involve verifying complex relationships between the vector field of the system, the candidate control barrier function, and its gradient.

As described in the introduction, this challenge motivates the approach taken in this paper, wherein we propose an optimization based approach to learning a CBF from expert demonstrations for a system with known dynamics.

### 2.2 Problem formulation

To formalize the previous discussion, we explicitly distinguish between geometric safety specifications, i.e., those that can be directly specified on (a subset of) the state-space of the system, and the set $\mathcal{C}$ defined in equation (2.2) that is certified as safe by the CBF. To that end, let $\mathcal{S} \subseteq \mathbb{R}^n$ define the aforementioned geometric safe set.

Toward the goal of learning a valid CBF, we assume that we are given a set of expert trajectories consisting of $N_1$ discretized data-points $Z_{\mathrm{dyn}} := \{(x_i, u_i)\}_{i=1}^{N_1}$ such that $x_i \in \mathrm{int}(\mathcal{S})$. This is illustrated in Figure 1(a). For $\epsilon > 0$, we define the sets

$$\mathcal{D}' := \bigcup_{i=1}^{N_1} B_{\epsilon,p}(x_i) \quad \text{and} \quad \mathcal{D} := \mathcal{D}' \setminus \mathrm{bd}(\mathcal{D}'). \tag{2.4}$$

Several comments are in order. First, note that we define $\mathcal{D}$ based on expert trajectories for which control inputs are available so that the derivative constraint (2.3) can be enforced during learning. Second, by construction, the $x_i$ component of $Z_{\mathrm{dyn}}$ defines an $\epsilon$-net over $\mathcal{D}$, i.e., for all $x \in \mathcal{D}$, there exists (slightly abusing notation) $x_i \in Z_{\mathrm{dyn}}$ such that $\|x - x_i\|_p \leq \epsilon$. Finally, conditions on $\epsilon$ will be specified later to ensure the validity of the learned CBF, but we note now that at a minimum, we require that $\epsilon$ be such that $\mathcal{D} \subseteq \mathcal{S}$.

###### Remark 1.

We note that a conceptually similar approach, defined in terms of taking a point-wise union over previously seen safe trajectories, is used to define a safe terminal set in the Learning Model Predictive Control method of [14].

We next define the set $\mathcal{N}$, for $\sigma > 0$, as

$$\mathcal{N} := \{\mathrm{bd}(\mathcal{D}) \oplus B_{\sigma,p}(0)\} \setminus \mathcal{D},$$

which should be thought of as a “layer” of width $\sigma$ surrounding $\mathcal{D}$; see Figure 1(b) for a graphical depiction. As will be made clear in the sequel, by enforcing that the value of the learned CBF $h$ is negative on $\mathcal{N}$, which can be accomplished through appropriate sampling, we ensure that the zero level set of $h$ is contained within $\mathcal{D}$, which is a necessary condition for $h$ to be valid.
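The two sets built from the expert states can be tested for membership using only distances to those states. The sketch below assumes that a point outside the union of balls lies at distance (nearest-expert distance minus the ball radius) from its boundary, and it ignores measure-zero cases where a point on the surface of one ball is interior to the union. The helper names, the circular expert data, and the rejection sampler are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def min_dist(x, centers, p=2):
    """Smallest p-norm distance from x to any expert state."""
    return np.min(np.linalg.norm(centers - x, ord=p, axis=1))

def in_D(x, centers, eps, p=2):
    """x lies strictly inside at least one eps-ball around an expert state."""
    return min_dist(x, centers, p) < eps

def in_N(x, centers, eps, sigma, p=2):
    """x is outside D but within sigma of its boundary (the surrounding layer)."""
    d = min_dist(x, centers, p)
    return eps <= d <= eps + sigma

def sample_N(centers, eps, sigma, n_samples, rng, p=2):
    """Rejection-sample points of the layer from a bounding box around the data."""
    lo = centers.min(axis=0) - (eps + sigma)
    hi = centers.max(axis=0) + (eps + sigma)
    out = []
    while len(out) < n_samples:
        x = rng.uniform(lo, hi)
        if in_N(x, centers, eps, sigma, p):
            out.append(x)
    return np.array(out)

# Hypothetical expert states on a circle of radius 0.5.
rng = np.random.default_rng(0)
angles = np.linspace(0.0, 2.0 * np.pi, 40, endpoint=False)
expert_states = 0.5 * np.column_stack([np.cos(angles), np.sin(angles)])
X_N = sample_N(expert_states, eps=0.2, sigma=0.1, n_samples=50, rng=rng)
```

Rejection sampling is only practical in low dimensions; it is used here because it keeps the membership logic explicit.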

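While the above definition of a CBF is specified over all of $\mathbb{R}^n$, e.g., the definition of $\mathcal{C}$ in equation (2.2) considers all $x \in \mathbb{R}^n$ such that $h(x) \geq 0$, we make a minor modification to this definition in order to restrict the domain of interest to $\mathcal{N} \cup \mathcal{D}$, i.e., we will certify that $h$ is a valid local CBF over $\mathcal{D}$ with respect to the set

$$\mathcal{C} := \{x \in \mathcal{N} \cup \mathcal{D} \mid h(x) \geq 0\}. \tag{2.5}$$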

This restriction is natural, as we are learning a CBF from data sampled only over , and we will show that the inclusion holds. It then follows that if is shown to satisfy the derivative constraint (2.3) for all , then both , as defined in (2.5), and the compact set can be made forward invariant by some .

## 3 An optimization based approach

In this section, we define and analyze an optimization based approach to generating valid local control barrier functions from expert demonstrations. To this end, let $\mathcal{H}$ be a normed function space of continuously differentiable functions $h: \mathbb{R}^n \to \mathbb{R}$ for which local Lipschitz bounds

$$L_h(x) := \sup_{x_1, x_2 \in B_{\epsilon,p}(x)} \frac{|h(x_1) - h(x_2)|}{\|x_1 - x_2\|_p}$$

can be efficiently estimated. Commonly used examples of such spaces include infinite dimensional reproducing kernel Hilbert spaces (RKHS) such as those defined by random Fourier (RF) features [13], and more recently deep neural networks (DNNs) [9]. We defer a discussion of results specific to these two classes of CBFs to the end of this section, and focus now on a general method applicable to these, and other, spaces $\mathcal{H}$.

Recall the definition of $Z_{\mathrm{dyn}}$ and define the set $X_{\mathrm{safe}} := \{x_i \mid (x_i, u_i) \in Z_{\mathrm{dyn}}\}$. We also assume that points $X_N$ are sampled from the set $\mathcal{N}$ such that $X_N$ forms an $\bar{\epsilon}$-net of $\mathcal{N}$; conditions on $\bar{\epsilon}$ will be specified in the sequel. We emphasize that no associated inputs are needed for the samples $X_N$, as these points are not generated by the expert, and can instead be obtained by simple computational methods such as gridding or uniform sampling.

We begin by deriving a set of sufficient conditions in terms of constraints on the learned CBF $h$, as well as conditions on the data-sets $Z_{\mathrm{dyn}}$ and $X_N$, that ensure that $h$ is a valid local CBF on $\mathcal{D}$. We then use these constraints to formulate an optimization problem that can be efficiently solved for the aforementioned function classes $\mathcal{H}$.

### 3.1 Guaranteeing C⊂D⊆S

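We begin with the simple and intuitive requirement that the learned CBF $h$ satisfy

$$h(x_i) \geq \gamma_{\mathrm{safe}} \quad \forall x_i \in X_{\mathrm{safe}}, \tag{3.1}$$

for a yet to be specified parameter $\gamma_{\mathrm{safe}} > 0$. This in particular ensures that the set $\mathcal{C}$ over which $h(x) \geq 0$, as defined in equation (2.5), has non-empty interior.

We now derive conditions under which the learned CBF $h$ satisfies $h(x) < 0$ for all $x \in \mathcal{N}$, which, together with constraint (3.1), ensures that $\mathcal{C}$ is a non-empty subset of $\mathcal{D}$.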

###### Proposition 3.1.

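Let $h$ be Lipschitz continuous with local Lipschitz constant $L_h(x)$. Let $\gamma_{\mathrm{unsafe}} > 0$ and $X_N$ be an $\bar{\epsilon}$-net of $\mathcal{N}$ with $\bar{\epsilon} < \gamma_{\mathrm{unsafe}} / L_h(x_i)$ for all $x_i \in X_N$. Then, if

$$h(x_i) \leq -\gamma_{\mathrm{unsafe}} \quad \forall x_i \in X_N, \tag{3.2}$$

it holds that $h(x) < 0$ for all $x \in \mathcal{N}$.

###### Proof.

By equation (3.2), we have that $h(x_i) \leq -\gamma_{\mathrm{unsafe}}$ for each $x_i \in X_N$. We then have, for any $x \in \mathcal{N}$, that there exists a point $x_i \in X_N$ satisfying $\|x - x_i\|_p \leq \bar{\epsilon}$, from which the following chain of inequalities follows immediately:

$$\begin{aligned} h(x) &= h(x) - h(x_i) + h(x_i) \leq |h(x) - h(x_i)| - \gamma_{\mathrm{unsafe}} \\ &\leq L_h(x_i)\|x - x_i\|_p - \gamma_{\mathrm{unsafe}} \leq L_h(x_i)\bar{\epsilon} - \gamma_{\mathrm{unsafe}} < 0, \end{aligned}$$

where the first inequality follows from the assumption that $h(x_i) \leq -\gamma_{\mathrm{unsafe}}$ for all $x_i \in X_N$, the second by the local Lipschitz assumption on $h$, the third by the assumption that $X_N$ forms an $\bar{\epsilon}$-net of $\mathcal{N}$, and the final inequality by the condition on $\bar{\epsilon}$ of the proposition. ∎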
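The hypotheses of Proposition 3.1 can be checked numerically once the learned function has been evaluated on the net. The sketch below is a hedged illustration: it estimates the net radius against a dense set of reference points standing in for the whole layer, and it takes the Lipschitz bounds as given inputs. The function names, the one-dimensional layer $[1, 1.5]$, and the candidate $h(x) = 0.9 - x$ are hypothetical choices made for this example.

```python
import numpy as np

def covering_radius(targets, net, p=2):
    """Largest p-norm distance from any target point to its nearest net point."""
    d = np.linalg.norm(targets[:, None, :] - net[None, :, :], ord=p, axis=2)
    return d.min(axis=1).max()

def proposition_3_1_check(h_net, L_net, eps_bar, gamma_unsafe):
    """h_net[i] = h(x_i) and L_net[i] = upper bound on the local Lipschitz
    constant at x_i, for every net point x_i."""
    values_ok = np.all(h_net <= -gamma_unsafe)        # constraint (3.2)
    radius_ok = np.all(eps_bar * L_net < gamma_unsafe)  # net is fine enough
    return bool(values_ok and radius_ok)

# Hypothetical 1-D example: the layer is [1.0, 1.5] and h(x) = 0.9 - x,
# which has Lipschitz constant 1.
h = lambda X: 0.9 - X[:, 0]
X_N = np.linspace(1.05, 1.45, 5)[:, None]
dense_N = np.linspace(1.0, 1.5, 501)[:, None]   # stand-in for "all of the layer"
eps_bar = covering_radius(dense_N, X_N)
certified = proposition_3_1_check(h(X_N), np.ones(len(X_N)), eps_bar,
                                  gamma_unsafe=0.1)
```

Here the check passes and the dense evaluation confirms that the candidate is negative everywhere on the layer, as the proposition predicts.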

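We note that as stated, the constraints (3.1) and (3.2), as well as the condition of Proposition 3.1, may be incompatible, leading to infeasibility of an optimization problem built around them. This incompatibility arises from the fact that we are simultaneously asking for the value of $h$ to vary from $\gamma_{\mathrm{safe}}$ to $-\gamma_{\mathrm{unsafe}}$ over a short distance while having a low Lipschitz constant. In particular, as posed, the constraints require that $|h(x_s) - h(x_u)| \geq \gamma_{\mathrm{safe}} + \gamma_{\mathrm{unsafe}}$ for $x_s$ and $x_u$ safe and unsafe samples, respectively, but the sampling requirements imply that $\|x_s - x_u\|_p \lesssim \bar{\epsilon} + \epsilon$ for at least some pair $(x_s, x_u)$, which in turn implies that

$$L_h(x_u) \gtrsim \frac{|h(x_s) - h(x_u)|}{\|x_s - x_u\|_p} \gtrsim \frac{\gamma_{\mathrm{safe}} + \gamma_{\mathrm{unsafe}}}{\bar{\epsilon} + \epsilon}.$$

Thus, if $\gamma_{\mathrm{safe}}$ and $\gamma_{\mathrm{unsafe}}$ are chosen to be too large, we may exceed the required bound of $L_h(x_u) < \gamma_{\mathrm{unsafe}} / \bar{\epsilon}$, and the set $\mathcal{C}$ over which $h(x) \geq 0$ may be undesirably small (i.e., the volume of $\mathcal{C}$ would be too small).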

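We address this issue as follows: for fixed $\gamma_{\mathrm{safe}}$, $\gamma_{\mathrm{unsafe}}$, and $L_h$, constraint (3.1) is relaxed to

$$h(x_i) \geq \gamma_{\mathrm{safe}} \quad \forall x_i \in \bar{X}_{\mathrm{safe}}, \tag{3.3}$$

where now

$$\bar{X}_{\mathrm{safe}} := \left\{x_i \in X_{\mathrm{safe}} \;\middle|\; \inf_{x \in X_N} \|x - x_i\|_p \geq \frac{\gamma_{\mathrm{unsafe}} + \gamma_{\mathrm{safe}}}{L_h}\right\} \tag{3.4}$$

corresponds to an inner subset of expert trajectory samples. Intuitively, this introduces a buffer region across which $h$ can vary in value from $-\gamma_{\mathrm{unsafe}}$ to $\gamma_{\mathrm{safe}}$ without having an excessively large Lipschitz constant. A near identical argument as that used to prove Proposition 3.1 can now be used to guarantee that the set $\mathcal{C}$ defined in equation (2.5) contains the set

$$\bar{\mathcal{D}} := \bigcup_{x_i \in \bar{X}_{\mathrm{safe}}} B_{\epsilon,p}(x_i),$$

defined as the union of $\epsilon$-balls around the points in $\bar{X}_{\mathrm{safe}}$; this can be seen as a “minimum volume” guarantee on the set $\mathcal{C}$.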

###### Corollary 3.2.

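Let $h$ be Lipschitz continuous with local constant $L_h(x)$. Let $\gamma_{\mathrm{safe}} > 0$, and $\bar{X}_{\mathrm{safe}}$ be an $\epsilon$-net of $\bar{\mathcal{D}}$ with $\epsilon < \gamma_{\mathrm{safe}} / L_h(x_i)$ for all $x_i \in \bar{X}_{\mathrm{safe}}$. Then, if constraint (3.3) is satisfied, it holds that $h(x) \geq 0$ for all $x \in \bar{\mathcal{D}}$.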
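The inner subset of expert samples in equation (3.4) is a simple distance filter. The following sketch computes it with NumPy; the function name `inner_safe_subset` and the one-dimensional data are hypothetical and chosen only so that the buffer width is easy to read off.

```python
import numpy as np

def inner_safe_subset(X_safe, X_N, gamma_safe, gamma_unsafe, L_h, p=2):
    """Keep the expert states whose distance to every unsafe sample is at
    least (gamma_unsafe + gamma_safe) / L_h, as in equation (3.4)."""
    d = np.linalg.norm(X_safe[:, None, :] - X_N[None, :, :], ord=p, axis=2)
    keep = d.min(axis=1) >= (gamma_safe + gamma_unsafe) / L_h
    return X_safe[keep]

# Hypothetical 1-D data: expert states 0.0, 0.1, ..., 0.9 and one unsafe sample at 1.0.
X_safe = (np.arange(10) / 10.0)[:, None]
X_N = np.array([[1.0]])
X_bar = inner_safe_subset(X_safe, X_N, gamma_safe=0.1, gamma_unsafe=0.1, L_h=0.8)
```

With these numbers the buffer has width 0.25, so the expert states at 0.8 and 0.9 are dropped from the safe constraint. A larger admissible Lipschitz constant shrinks the buffer and keeps more expert states.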

### 3.2 Guaranteeing valid local control barrier functions

The conditions in the previous subsection guarantee that the level-sets of the learned CBF satisfy the desired properties. We now derive conditions that ensure that the derivative constraint (2.3) is also satisfied by the learned CBF.

Because we assume that the CBF functions $h \in \mathcal{H}$ are continuously differentiable over a compact domain $\mathcal{N} \cup \mathcal{D}$, we immediately have that $\nabla h$ is Lipschitz continuous with local Lipschitz constant $L_{\nabla h}(x)$. Note that to verify that a CBF satisfying the constraints of the previous section is valid, it suffices to show that there exists a single control input such that the derivative constraint (2.3) holds. Our approach is to use the control inputs provided by the expert demonstrations. We discuss the consequences of this choice further in Section 3.5.

To that end, note that for a fixed $u$, the function $q(x, u) := \langle \nabla h(x), f(x) + g(x)u \rangle + \alpha(h(x))$ is Lipschitz continuous in $x$, with Lipschitz constant denoted by $L_q(x)$, as $\nabla h$, the dynamics, and $\alpha$ are all assumed to be Lipschitz continuous. Following a similar argument as in the previous subsection, we then have the following result guaranteeing that the learned CBF satisfies the derivative constraint (2.3) for all $x \in \mathcal{D}$.

###### Proposition 3.3.

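Suppose $q(x, u)$ is Lipschitz continuous in $x$ with constant $L_q(x)$. Let $\gamma_{\mathrm{dyn}} > 0$, and $X_{\mathrm{safe}}$ be an $\epsilon$-net of $\mathcal{D}$ with $\epsilon \leq \gamma_{\mathrm{dyn}} / L_q(x_i)$ for all $x_i \in X_{\mathrm{safe}}$. Then if

$$\langle \nabla h(x_i), f(x_i) + g(x_i)u_i \rangle \geq -\alpha(h(x_i)) + \gamma_{\mathrm{dyn}} \tag{3.5}$$

for all $(x_i, u_i) \in Z_{\mathrm{dyn}}$, it holds that $q(x, u_i) \geq 0$ for all $x \in \mathcal{D}$, where $u_i$ is the expert input paired with a sample $x_i$ satisfying $\|x - x_i\|_p \leq \epsilon$.

###### Proof.

Following a similar argument as the proof of Proposition 3.1, we note that by equation (3.5), we have that $q(x_i, u_i) \geq \gamma_{\mathrm{dyn}}$ for each $(x_i, u_i) \in Z_{\mathrm{dyn}}$. We then have, for any $x \in \mathcal{D}$, that there exists a point $x_i \in X_{\mathrm{safe}}$ satisfying $\|x - x_i\|_p \leq \epsilon$, from which, writing $q(\cdot)$ for $q(\cdot, u_i)$, the following chain of inequalities follows immediately:

$$\begin{aligned} q(x) &= q(x_i) + q(x) - q(x_i) \geq \gamma_{\mathrm{dyn}} - |q(x) - q(x_i)| \\ &\geq \gamma_{\mathrm{dyn}} - L_q(x_i)\|x - x_i\|_p \geq \gamma_{\mathrm{dyn}} - L_q(x_i)\epsilon \geq 0, \end{aligned}$$

where the first inequality follows from the assumption that $q(x_i, u_i) \geq \gamma_{\mathrm{dyn}}$ for all $(x_i, u_i) \in Z_{\mathrm{dyn}}$, the second by the Lipschitz assumption on $q$, the third by the assumption that $X_{\mathrm{safe}}$ forms an $\epsilon$-net of $\mathcal{D}$, and the final inequality by the condition on $\epsilon$ of the proposition. ∎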

The following theorem, which follows immediately from the previous results, states a set of sufficient conditions guaranteeing that a learned CBF is locally valid on the domain . We next use these conditions to formulate an optimization based approach to learning a CBF from data.

###### Theorem 3.4.

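Let a continuously differentiable function $h$ be a candidate CBF, and let the sets $\mathcal{S}$, $\mathcal{D}$, $\mathcal{N}$, $\mathcal{C}$, and $\bar{\mathcal{D}}$, and the data-sets $Z_{\mathrm{dyn}}$, $X_{\mathrm{safe}}$, and $X_N$ be defined as above. Suppose that $X_N$ forms an $\bar{\epsilon}$-net of $\mathcal{N}$ satisfying the conditions of Proposition 3.1, and that $X_{\mathrm{safe}}$ forms an $\epsilon$-net of $\mathcal{D}$ satisfying the conditions of Corollary 3.2 & Proposition 3.3. Then if $h$ satisfies constraints (3.1), (3.2), and (3.5), it holds that the set $\mathcal{C}$ is non-empty, $\mathcal{C} \subset \mathcal{D} \subseteq \mathcal{S}$, and the function $h$ is a valid local control barrier function on $\mathcal{D}$ with domain $\mathcal{N} \cup \mathcal{D}$.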

### 3.3 Control barrier filters

We introduce a simple and natural extension to the notion of a local CBF. Consider the same scenario as above, together with an additional set that satisfies the following condition: for each there exists no continuous signal with for some and for all . This means that the set filters all trajectories starting from , i.e., each trajectory starting from has to pass through to reach and thereby renders safe (see Figure 1(c)). This follows in the spirit of set invariance [5]. As illustrated in Figure 1(c), this allows us to remove the perhaps counter-intuitive requirement of having to introduce “artificial” unsafe samples in a region that is clearly safe, further reducing the conservatism of the resulting controller.

### 3.4 Computing a Control Barrier Function

Using the results of the previous subsections, we propose solving the following optimization problem to learn a CBF from expert trajectories:

$$\begin{aligned}
\underset{h \in \mathcal{H}}{\text{minimize}} \quad & \|h\| \\
\text{subject to} \quad & h(x_i) \geq \gamma_{\mathrm{safe}}, && \forall x_i \in \bar{X}_{\mathrm{safe}}(L_h) \\
& h(x_i) \leq -\gamma_{\mathrm{unsafe}}, \;\; \mathrm{Lip}(h(x_i), \bar{\epsilon}) \leq L_h, && \forall x_i \in X_N && \text{(3.6a)} \\
& q(x_i, u_i) := \langle \nabla h(x_i), f(x_i) + g(x_i)u_i \rangle + \alpha(h(x_i)) \geq \gamma_{\mathrm{dyn}}, \;\; \mathrm{Lip}(q(x_i, u_i), \epsilon) \leq L_q, && \forall (x_i, u_i) \in Z_{\mathrm{dyn}} && \text{(3.6b)}
\end{aligned}$$

The positive constants $\gamma_{\mathrm{safe}}$, $\gamma_{\mathrm{unsafe}}$, $\gamma_{\mathrm{dyn}}$, $L_h$, and $L_q$ are hyperparameters that are set according to the conditions of Theorem 3.4 given data-sets $X_N$ and $Z_{\mathrm{dyn}}$ defining corresponding $\bar{\epsilon}$ and $\epsilon$-nets. Here the constraints defined in equations (3.6a) and (3.6b) assume that there exists a function $\mathrm{Lip}(\cdot, \epsilon)$ that returns an upper bound on the Lipschitz constant of its argument in an $\epsilon$ neighborhood. We note that it may be difficult to enforce these bounds while solving the optimization problem, in which case we must resort to bootstrapping the values of $L_h$ and $L_q$ by iteratively solving optimization problem (3.6), computing the values $L_h$ and $L_q$ for the learned CBF $h$, verifying if the conditions of Theorem 3.4 hold, and readjusting the hyperparameters accordingly if not. This is a standard approach to hyperparameter tuning, and we show in Section 4 that it can indeed be successfully applied to verify the conditions of Theorem 3.4 in practice.

#### 3.4.1 Convexity

We first note that optimization problem (3.6) is convex in $h$ if the function $\alpha$ is linear in its argument, and if we exclude the Lipschitz bounds in (3.6a) and (3.6b), and instead verify them via the bootstrapping method described above. Therefore, if $h$ is parameterized as $h(x) := \langle \phi(x), \theta \rangle$ with $\theta \in \Theta$, $\Theta$ a convex set, and $\phi$ a known but possibly nonlinear transformation, then problem (3.6) is convex in $\theta$, and can be solved efficiently using standard solvers. Note that very rich function classes such as infinite dimensional RKHS from statistical learning theory can be approximated to arbitrary accuracy in this form [13].

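In the more general case when $h(x) := h_\theta(x)$ depends nonlinearly on its parameters $\theta$, such as when $h_\theta$ is a DNN, or when $\alpha$ is a general nonlinear function of its argument, optimization problem (3.6) is non-convex. Due to the computational complexity of general nonlinear constrained programming, we propose an unconstrained relaxation of problem (3.6) which can be solved efficiently in practice by first order gradient based methods. Let $[r]_+ := \max\{r, 0\}$ for $r \in \mathbb{R}$. Our unconstrained relaxation is the following optimization problem:

$$\begin{aligned} \underset{\theta \in \Theta}{\text{minimize}} \quad & \|\theta\|^2 + \lambda_s \sum_{x_i \in \bar{X}_{\mathrm{safe}}} \big[\gamma_{\mathrm{safe}} - h_\theta(x_i)\big]_+ + \lambda_u \sum_{x_i \in X_N} \big[h_\theta(x_i) + \gamma_{\mathrm{unsafe}}\big]_+ \\ & + \lambda_d \sum_{(x_i, u_i) \in Z_{\mathrm{dyn}}} \big[\gamma_{\mathrm{dyn}} - \big(\langle \nabla h_\theta(x_i), f(x_i) + g(x_i)u_i \rangle + \alpha(h_\theta(x_i))\big)\big]_+ \end{aligned} \tag{3.7}$$

The positive parameters $\lambda_s$, $\lambda_u$, and $\lambda_d$ allow us to trade off the relative importance of each of the terms in the optimization. While equation (3.7) is in general a non-convex optimization problem, it can be solved efficiently in practice with stochastic first-order gradient methods such as Adam or SGD.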
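As a rough illustration of such an unconstrained relaxation, the sketch below evaluates a hinge-penalty objective for a random-feature parameterization $h_\theta(x) = \langle \phi(x), \theta \rangle$ with $\alpha(r) = r$. Several choices here are assumptions made for the example rather than details confirmed by the text: the squared-norm regularizer, the names and values of the penalty weights and margins, and the synthetic data. The expert velocities `Xdot_dyn` stand for the closed-loop vector field evaluated at each expert pair.

```python
import numpy as np

def features(X, W, b):
    """Random cosine features and their Jacobians for each row of X.
    W has shape (l, n), b has shape (l,)."""
    l = W.shape[0]
    Z = X @ W.T + b                                   # (N, l)
    phi = np.sqrt(2.0 / l) * np.cos(Z)                # (N, l)
    # d phi_j / dx = -sqrt(2/l) * sin(z_j) * w_j  ->  (N, l, n)
    dphi = -np.sqrt(2.0 / l) * np.sin(Z)[:, :, None] * W[None, :, :]
    return phi, dphi

def relaxed_loss(theta, W, b, X_safe, X_unsafe, X_dyn, Xdot_dyn,
                 gammas=(0.1, 0.1, 0.01), lambdas=(1.0, 1.0, 1.0)):
    """Regularizer plus hinge penalties on the safe, unsafe and derivative constraints."""
    g_s, g_u, g_d = gammas
    l_s, l_u, l_d = lambdas
    hinge = lambda r: np.maximum(r, 0.0)
    h_safe = features(X_safe, W, b)[0] @ theta
    h_unsafe = features(X_unsafe, W, b)[0] @ theta
    phi, dphi = features(X_dyn, W, b)
    grad_h = np.einsum("ijk,j->ik", dphi, theta)      # gradient of h at each x_i
    q = np.sum(grad_h * Xdot_dyn, axis=1) + phi @ theta   # alpha(r) = r
    return (theta @ theta
            + l_s * hinge(g_s - h_safe).sum()
            + l_u * hinge(h_unsafe + g_u).sum()
            + l_d * hinge(g_d - q).sum())

# Synthetic data, for illustration only.
rng = np.random.default_rng(0)
n, l = 2, 50
W = rng.standard_normal((l, n))
b = rng.uniform(0.0, 2.0 * np.pi, l)
X_safe = rng.uniform(-0.5, 0.5, (3, n))
X_unsafe = rng.uniform(1.0, 1.5, (2, n))
X_dyn, Xdot_dyn = X_safe, -X_safe         # hypothetical expert states and velocities
loss_at_zero = relaxed_loss(np.zeros(l), W, b, X_safe, X_unsafe, X_dyn, Xdot_dyn)
```

Because every term is piecewise differentiable in the parameters, an objective of this shape can be handed to any first-order optimizer; with an automatic differentiation library the hand-written Jacobian would not be needed.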

#### 3.4.2 Lipschitz continuity of H

As described earlier, because we assume that functions in $\mathcal{H}$ are continuously differentiable and we restrict ourselves to a compact domain $\mathcal{N} \cup \mathcal{D}$, we immediately have that $h$ and $\nabla h$ are both uniformly Lipschitz over $\mathcal{N} \cup \mathcal{D}$. We show here two examples where it is computationally efficient to estimate an upper bound on the Lipschitz constants of functions $h \in \mathcal{H}$.

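In the case of random Fourier features with $\ell$ random features, where $h(x) := \langle \phi(x), \theta \rangle$ and $\phi: \mathbb{R}^n \to \mathbb{R}^\ell$ is

$$\phi(x) = \sqrt{2/\ell}\,\big(\cos(\langle x, w_1 \rangle + b_1), \ldots, \cos(\langle x, w_\ell \rangle + b_\ell)\big),$$

we can analytically compute upper bounds as follows. First, we have by the Cauchy-Schwarz inequality $|h(x) - h(y)| \leq \|\theta\|_2 \|\phi(x) - \phi(y)\|_2$. To bound $\|\phi(x) - \phi(y)\|_2$, we bound the spectral norm of the Jacobian $D\phi(x)$, which is an $\ell \times n$ matrix where the $i$-th row is $-\sqrt{2/\ell}\,\sin(\langle x, w_i \rangle + b_i)\, w_i^\top$. Let $s_i := \sin(\langle x, w_i \rangle + b_i)$ and observe that

$$\|D\phi(x)\| = \sqrt{2/\ell} \sup_{\|v\|_2 = 1} \Big(\sum_{i=1}^{\ell} s_i^2 \langle w_i, v \rangle^2\Big)^{1/2} \leq \sqrt{2/\ell} \sup_{\|v\|_2 = 1} \Big(\sum_{i=1}^{\ell} \langle w_i, v \rangle^2\Big)^{1/2} = \sqrt{2/\ell}\, \|W\|,$$

where $W$ is an $\ell \times n$ matrix with the $i$-th row equal to $w_i^\top$. While the bound $\|D\phi(x)\| \leq \sqrt{2/\ell}\,\|W\|$ can be used in computations, we can further understand order-wise scaling of the bound as follows. For random Fourier features corresponding to the popular Gaussian radial basis function kernel, the $w_i$ are drawn i.i.d. from $\mathcal{N}(0, \sigma^2 I)$, where $\sigma^2$ is the (inverse) bandwidth of the Gaussian kernel. Therefore, by standard results in non-asymptotic random matrix theory [21], we have that

$$\|W\| \leq \sigma\big(\sqrt{\ell} + \sqrt{n} + \sqrt{2\log(1/\delta)}\big)$$

w.p. at least $1 - \delta$. Combining these calculations, we have that the Lipschitz constant of $h$ can be bounded by $\sqrt{2}\,\|\theta\|_2\,\sigma\big(1 + \sqrt{n/\ell} + \sqrt{2\log(1/\delta)/\ell}\big)$ w.p. at least $1 - \delta$.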

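We now bound the Lipschitz constant of the gradient $\nabla h$. We do this by bounding the spectral norm of the Hessian $\nabla^2 h(x)$, with $h(x) = \langle \phi(x), \theta \rangle$. A simple bound is

$$\|\nabla^2 h(x)\| \leq \sqrt{2/\ell}\, \|\theta\|_\infty \|W\|^2 \leq 3\sqrt{2}\, \|\theta\|_\infty\, \sigma^2 \big(\ell + n + 2\log(1/\delta)\big) / \sqrt{\ell},$$

where the last inequality holds w.p. at least $1 - \delta$.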
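The computable versions of these bounds only need the spectral norm of the frequency matrix and norms of the weight vector. The sketch below evaluates them for a randomly drawn model and compares the first bound against difference quotients on random pairs of points. The dimensions, kernel scale, random weights, and sampling box are illustrative values and are not those used in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
n, l, sigma = 2, 200, 1.5                  # illustrative values only
W = sigma * rng.standard_normal((l, n))    # rows are the random frequencies w_i
b = rng.uniform(0.0, 2.0 * np.pi, l)
theta = rng.standard_normal(l)

def h(X):
    """h(x) = <phi(x), theta> evaluated at each row of X."""
    return np.sqrt(2.0 / l) * np.cos(X @ W.T + b) @ theta

W_norm = np.linalg.norm(W, 2)              # spectral norm of W
L_h = np.sqrt(2.0 / l) * W_norm * np.linalg.norm(theta)          # bound for h
L_grad = np.sqrt(2.0 / l) * np.max(np.abs(theta)) * W_norm**2    # bound for grad h

# High-probability bound on the spectral norm (holds w.p. at least 1 - delta).
delta = 0.01
W_bound = sigma * (np.sqrt(l) + np.sqrt(n) + np.sqrt(2.0 * np.log(1.0 / delta)))

# Empirical difference quotients never exceed the analytic bound L_h.
X1 = rng.uniform(-1.0, 1.0, (1000, n))
X2 = rng.uniform(-1.0, 1.0, (1000, n))
ratios = np.abs(h(X1) - h(X2)) / np.linalg.norm(X1 - X2, axis=1)
```

The gap between the largest observed quotient and the analytic bound gives a feel for how conservative the bound is for a particular draw of features.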

When $h$ is a DNN, accurately estimating the Lipschitz constant is more involved. In general, the problem of exactly computing the Lipschitz constant of $h$ is known to be NP-hard [22]. Notably, because most commonly-used activation functions are known to be 1-Lipschitz (e.g. ReLU, tanh, sigmoid), a naive upper bound on the Lipschitz constant of $h$ is given by the product of the norms of the weight matrices; that is, $L_h \leq \prod_k \|W^k\|$, where $W^k$ denotes the weight matrix of the $k$-th layer. However, this bound is known to be quite loose [9]. Recently, the authors of [9] proposed a semidefinite-programming based approach to efficiently compute an accurate upper bound on $L_h$. In particular, this approach relies on incremental quadratic constraints to represent the couplings between pairs of neurons in the neural network. On the other hand, there are relatively few results that provide accurate upper bounds for the Lipschitz constant of the gradient of $h$ when $h$ is a neural network. While ongoing work looks to extend the results from [9] to compute upper bounds on $L_{\nabla h}$, to the best of our knowledge, the only general method for computing an upper bound on $L_{\nabla h}$ is through post-hoc sampling [26].
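The naive product-of-norms bound is easy to compute and to sanity-check by sampling. The following sketch builds a small tanh network with random weights, forms the product of layer spectral norms, and compares it with difference quotients on random input pairs. The layer sizes, initialization, and sampling box are hypothetical, and the sampled quotients only give a lower estimate of the true Lipschitz constant.

```python
import numpy as np

rng = np.random.default_rng(1)
sizes = [6, 64, 64, 1]                      # hypothetical tanh network
weights = [rng.standard_normal((m, k)) / np.sqrt(k)
           for k, m in zip(sizes[:-1], sizes[1:])]
biases = [np.zeros(m) for m in sizes[1:]]

def mlp(X):
    """Forward pass: tanh hidden layers followed by a linear output layer."""
    A = X
    for Wk, ck in zip(weights[:-1], biases[:-1]):
        A = np.tanh(A @ Wk.T + ck)
    return (A @ weights[-1].T + biases[-1])[:, 0]

# Product of spectral norms: valid because tanh is 1-Lipschitz.
naive_bound = np.prod([np.linalg.norm(Wk, 2) for Wk in weights])

# Post-hoc sampling: the largest observed difference quotient.
X1 = rng.uniform(-1.0, 1.0, (2000, sizes[0]))
X2 = rng.uniform(-1.0, 1.0, (2000, sizes[0]))
ratios = np.abs(mlp(X1) - mlp(X2)) / np.linalg.norm(X1 - X2, axis=1)
sampled_estimate = ratios.max()
```

The ratio of `naive_bound` to `sampled_estimate` brackets how loose the product bound can be for a given network, which is the looseness that tighter certified methods aim to reduce.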

### 3.5 Data Collection

We briefly comment on how data should be collected to ensure that the conditions of Theorem 3.4 are satisfied.

#### What should the experts do?

At a high level, our results state that if a smooth CBF can be found that satisfies the constraints (3.1), (3.2), and (3.5) over a sufficiently fine sampling of the state-space, then the resulting function is a valid CBF. We focus here on the derivative constraint (3.5), which must be verified to hold for some $u \in \mathcal{U}$, by using the expert example data $(x_i, u_i)$. In particular, the more transverse the vector field $f(x_i) + g(x_i)u_i$ is to the outward pointing normal of the level sets of the learned CBF $h$, the larger the inner-product term in constraint (3.5) without increasing the Lipschitz constant of $h$. In words, this says that the expert demonstrations should demonstrate how to move away from the unsafe set.

#### Constructing ϵ-nets

In order to construct an $\epsilon$-net of a given set, a simple randomized algorithm which repeatedly uniformly samples from the set works with high probability (see, for example, [21]). Hence, as long as we can efficiently sample from the set (e.g. when it is a basic primitive set or has an efficient set-membership oracle), uniform sampling is a viable strategy. Alternatively, a gridding approach can be taken. We note that in either case, for a set of diameter $r$ in $\mathbb{R}^n$, on the order of $(r/\epsilon)^n$ samples are required. While this exponential dependence is undesirable, we observe that in practice, the expert demonstrations allow us to focus on a subset of the state-space associated with desirable behavior, significantly reducing the diameters of the sets to be sampled.
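One simple way to thin a large uniform sample into a net is a greedy pass that keeps a point only if it is farther than the target radius from every point already kept. The sketch below does this for points drawn from a square. It is an illustrative variant rather than the procedure used by the authors, and it yields a net of the finite candidate sample, which approximates a net of the underlying set only when the candidates are dense enough.

```python
import numpy as np

def greedy_eps_net(candidates, eps, p=2):
    """Scan the candidates once, keeping a point only if it is farther than
    eps (in p-norm) from every point kept so far."""
    net = []
    for x in candidates:
        if not net or np.min(np.linalg.norm(np.array(net) - x, ord=p, axis=1)) > eps:
            net.append(x)
    return np.array(net)

# Hypothetical set to cover: the square [-1, 1]^2, represented by uniform samples.
rng = np.random.default_rng(0)
candidates = rng.uniform(-1.0, 1.0, (2000, 2))
net = greedy_eps_net(candidates, eps=0.25)
```

Every discarded candidate is within the radius of some kept point, and kept points are pairwise farther apart than the radius, so the size of the result is controlled by a packing argument and grows exponentially with the dimension.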

## 4 Numerical Experiments

All code will be publicly released at https://github.com/nikolaimatni/learningcbfs.

#### 4.0.1 Planar Example

Our first experiment is the following two dimensional planar system adapted from [10]:

$$\begin{aligned} \dot{x}_1 &= -x_1 + (x_1^2 + \delta)u_1, \\ \dot{x}_2 &= -x_2 + (x_2^2 + \delta)u_2, \end{aligned} \tag{4.1}$$

where is a fixed parameter guaranteeing that the system is globally feedback linearizable. We set in our experiments. The desired safe set is . We generate expert data for this system as follows. Because the system is feedback linearizable, given a desired trajectory , we can easily design a nominal controller which tracks . We can then construct a safe controller (w.r.t. ) by solving the CBF-QP problem [2, 3] with the CBF .
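A nominal tracking controller of the kind described can be sketched by cancelling the nonlinearity in (4.1) channel by channel. The code below is an assumption-laden illustration: the value of the parameter `DELTA`, the feedback gain, the circular reference, and the forward-Euler integration are all choices made for this example and are not taken from the experiments.

```python
import numpy as np

DELTA = 1.0   # assumed positive value; the paper's choice is not reproduced here

def dynamics(x, u):
    """Right-hand side of the planar system (4.1), applied elementwise."""
    return -x + (x**2 + DELTA) * u

def tracking_controller(x, x_des, v_des, gain=2.0):
    """Feedback-linearizing tracker: choose u so that x' equals v."""
    v = v_des - gain * (x - x_des)        # desired closed-loop velocity
    return (x + v) / (x**2 + DELTA)       # well defined because x^2 + DELTA > 0

# Track a circle of radius 0.5 with forward-Euler integration.
dt, n_steps = 1e-3, 5000
x = np.array([1.0, -0.5])
for k in range(n_steps):
    t = k * dt
    x_des = 0.5 * np.array([np.cos(t), np.sin(t)])
    v_des = 0.5 * np.array([-np.sin(t), np.cos(t)])
    x = x + dt * dynamics(x, tracking_controller(x, x_des, v_des))

t_final = n_steps * dt
final_error = np.linalg.norm(x - 0.5 * np.array([np.cos(t_final), np.sin(t_final)]))
```

Under the controller the tracking error obeys a stable linear equation, so it decays exponentially at the chosen gain; a CBF-based correction would then be applied on top of this nominal input.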

We design two sets of desired trajectories. Let the unit vector . The first set is defined for a fixed as from . We do this for , sampling time equi-spaced points along each curve. The second set of desired trajectories are for a fixed , where we consider a trajectory that starts at and ends up at , and one where and . We grid across both and to ensure a densely sampled set of points. All sample points are shown in Figure 2(left). We consider the corresponding to the circular trajectories (green in Fig. 2) as defining . We then set to be points sampled (red in Fig. 2) along the circle at and . Our samples are specifically chosen to form a net over and , with and , respectively.

We parameterize using random Fourier features corresponding to the Gaussian kernel with . We set and then solve the optimization problem with , , using cvxpy [8] with the MOSEK [4] backend. Next, we verify that our specific choices of satisfied the necessary conditions, computing and to obtain and , respectively. This verification is shown in Fig. 3. The level sets of the resulting CBF are plotted in Fig. 2(right).

#### 4.0.2 Aircraft Collision Avoidance

In this subsection, we apply the control barrier filter technique (Section 3.3) to the aircraft collision avoidance problem in [17]. The joint state vector of the two aircraft, indexed with and , is , denoting positions in the -plane and orientations. The controls are the translational and angular velocities with constraints and . The control goal is to reach the target states if , or if , . The safety specification is that the two airplanes should maintain a minimal distance $D_s$ to avoid collisions. To this end, we define the geometric safe set as

$$\mathcal{S} := \left\{ x \in \mathbb{R}^6 \;\middle|\; p_{x,r}^2 + p_{y,r}^2 \ge D_s^2 \right\}. \tag{4.2}$$

where is the relative position.
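A membership test for the geometric safe set in (4.2) can be sketched as follows. The ordering of the state vector and the distance value are assumptions made only for illustration.

```python
import numpy as np

def relative_position(x):
    """Relative planar position of the two aircraft.

    ASSUMED state layout: x = [px_a, py_a, theta_a, px_b, py_b, theta_b].
    """
    return x[0] - x[3], x[1] - x[4]

def in_safe_set(x, d_s):
    """True if the squared separation is at least d_s^2, as in (4.2)."""
    px_r, py_r = relative_position(x)
    return px_r ** 2 + py_r ** 2 >= d_s ** 2

# Hypothetical state: the aircraft are exactly 5 units apart.
x = np.array([0.0, 0.0, 0.0, 3.0, 4.0, np.pi])
on_boundary = in_safe_set(x, d_s=5.0)
too_close = not in_safe_set(x, d_s=5.1)
```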

#### Generating training data

We consider two ways of generating expert demonstrations. First, we used a standard tracking MPC as the nominal controller, equipped with the closed-form constructive CBF from [17] for collision avoidance (we refer to this combination as CBF-MPC). To generate the expert trajectories, we started the system from 400 randomly generated initial conditions inside the set . Each run terminated when the airplanes were sufficiently far away from each other. Furthermore, we uniformly sampled safe and unsafe states; these states as well as the expert trajectories are shown in Figure 4 in relative coordinates.

Second, we built a web-based simulator that allows a user to control two simulated aerial vehicles. As before, the goal is to control the two vehicles so that they do not collide. We emphasize that these trajectories were generated solely by human guidance; no nominal controller was used. The data are plotted in Figure 4.

#### Training procedure

We parameterized the CBF candidate with a fully connected neural network with two hidden layers of 64 neurons each and tanh activation functions. The training procedure was implemented using JAX [6] and the Adam algorithm with a cosine decay learning rate. We trained the neural network for epochs using the loss in (3.7) with , , , , and . Each of these hyperparameters was chosen via grid search. The learned CBFs and the closed-form CBF [17] evaluated at the training points are plotted in Figure 5 in relative coordinates.
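The architecture and learning-rate schedule described above can be sketched as below. This is a NumPy illustration only and not the JAX implementation used in the experiments; the input dimension, initialization scheme, initial learning rate, and step count are assumptions.

```python
import numpy as np

def init_mlp(rng, sizes=(6, 64, 64, 1)):
    """Two hidden layers of 64 units; the input size 6 is an assumption."""
    return [
        (rng.normal(size=(m, n)) / np.sqrt(m), np.zeros(n))
        for m, n in zip(sizes[:-1], sizes[1:])
    ]

def cbf_candidate(params, x):
    """Evaluate the scalar network h(x) with tanh hidden activations."""
    for W, b in params[:-1]:
        x = np.tanh(x @ W + b)
    W, b = params[-1]
    return (x @ W + b)[..., 0]

def cosine_decay(step, total_steps, lr_init):
    """Learning rate that follows half a cosine from lr_init down to zero."""
    frac = min(step, total_steps) / total_steps
    return 0.5 * lr_init * (1.0 + np.cos(np.pi * frac))

rng = np.random.default_rng(0)
params = init_mlp(rng)
batch = rng.normal(size=(5, 6))          # five hypothetical states
values = cbf_candidate(params, batch)    # one barrier value per state
lr_start = cosine_decay(0, 100, 1e-3)
lr_end = cosine_decay(100, 100, 1e-3)
```

Since tanh is 1-Lipschitz, the product of the spectral norms of the weight matrices gives a simple, if conservative, upper bound on the Lipschitz constant of such a network.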

#### Closed-loop control with learned CBF

To demonstrate the efficacy of the CBF learned from expert demonstrations, we used it in the aircraft collision avoidance problem with the same control goal and safety specification as in (4.2). The two airplanes were initialized at various symmetric initial positions on the circle such that they were facing each other. In this way, if both airplanes used the nominal MPC controller, they would collide.
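One way to construct such head-on initial conditions is sketched below. The circle radius, the number of pairs, and the state ordering are hypothetical choices for illustration.

```python
import numpy as np

def symmetric_initial_states(radius, num_pairs):
    """Place the two aircraft at antipodal points of a circle, facing each other.

    ASSUMED state layout: [px_a, py_a, theta_a, px_b, py_b, theta_b].
    """
    angles = np.linspace(0.0, np.pi, num_pairs, endpoint=False)
    states = []
    for phi in angles:
        pa = radius * np.array([np.cos(phi), np.sin(phi)])
        pb = -pa                      # antipodal point
        theta_a = phi + np.pi         # aircraft a heads toward the center
        theta_b = phi                 # aircraft b heads toward the center
        states.append([pa[0], pa[1], theta_a, pb[0], pb[1], theta_b])
    return np.array(states)

states = symmetric_initial_states(radius=10.0, num_pairs=8)
separation = np.hypot(states[:, 0] - states[:, 3], states[:, 1] - states[:, 4])
```

Each pair starts a full diameter apart with headings along the line joining the two aircraft, so a controller that ignores the other vehicle would drive both through the center.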

The closed-loop state trajectories using our learned CBF are shown in Figure 6. The CBFs learned on both datasets successfully steer the airplanes away from each other for all initial states, which experimentally validates the forward invariance of . As a comparison, we also plotted the state trajectories produced by the CBF from [17] under the same settings in Figure 6. Since this CBF is derived analytically, it appears to produce more aggressive control actions that separate the airplanes at a closer distance.

## 5 Conclusion

We proposed and analyzed an optimization-based approach to learning CBFs from expert demonstrations for known nonlinear control-affine dynamical systems. We showed that under suitable smoothness assumptions on the underlying dynamics and the learned CBF (which can be guaranteed using classic and recent [9] results for RKHS and DNNs), and under sufficiently fine sampling, the learned CBF is provably valid and therefore guarantees safety. This work provides a theoretical foundation for future work that leverages tools from statistical learning theory to reduce the sample complexity of the proposed method by guaranteeing safety for “typical” behaviors, as opposed to requiring uniform coverage of the state space.

## 6 Acknowledgements

This work was supported in part by the Knut and Alice Wallenberg Foundation (KAW) and the SSF COIN project.

## References

• [1] A. D. Ames, S. Coogan, M. Egerstedt, G. Notomista, K. Sreenath, and P. Tabuada (2019-06) Control barrier functions: theory and applications. In 2019 18th European Control Conference (ECC), Naples, Italy, pp. 3420–3431. Cited by: §1, §2.1.
• [2] A. D. Ames, J. W. Grizzle, and P. Tabuada (2014-12) Control barrier function based quadratic programs with application to adaptive cruise control. In Proc. Conf. Decis. Control, Los Angeles, CA, pp. 6271–6278. Cited by: §1, §4.0.1.
• [3] A. D. Ames, X. Xu, J. W. Grizzle, and P. Tabuada (2017) Control barrier function based quadratic programs for safety critical systems. IEEE Trans. Autom. Control 62 (8), pp. 3861–3876. Cited by: §1, §1, §2.1, §4.0.1.
• [4] MOSEK ApS (2019) The MOSEK optimization toolbox for MATLAB manual. Version 9.0. Cited by: §4.0.1.
• [5] F. Blanchini (1999) Set invariance in control. Automatica 35 (11), pp. 1747–1767. Cited by: §3.3.
• [6] JAX: composable transformations of Python+NumPy programs. Cited by: §4.
• [7] R. Cheng, G. Orosz, R. M. Murray, and J. W. Burdick (2019) End-to-end safe reinforcement learning through barrier functions for safety-critical continuous control tasks. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 33, pp. 3387–3395. Cited by: §1.1.
• [8] S. Diamond and S. Boyd (2016) CVXPY: a Python-embedded modeling language for convex optimization. Journal of Machine Learning Research 17 (83), pp. 1–5. Cited by: §4.0.1.
• [9] M. Fazlyab, A. Robey, H. Hassani, M. Morari, and G. Pappas (2019) Efficient and accurate estimation of lipschitz constants for deep neural networks. In Advances in Neural Information Processing Systems, pp. 11423–11434. Cited by: §3.4.2, §3, §5.
• [10] S. Kolathaya and A. D. Ames (2019) Input-to-state safety with control barrier functions. IEEE Control Systems Letters 3 (1), pp. 108–113. Cited by: §4.0.1.
• [11] P. A. Parrilo (2000) Structured semidefinite programs and semialgebraic geometry methods in robustness and optimization. Ph.D. Thesis, California Institute of Technology. Cited by: §1.
• [12] S. Prajna, A. Jadbabaie, and G. J. Pappas (2007) A framework for worst-case and stochastic safety verification using barrier certificates. IEEE Trans. Autom. Control 52 (8), pp. 1415–1428. Cited by: §1.
• [13] A. Rahimi and B. Recht (2008) Random features for large-scale kernel machines. In Advances in neural information processing systems, pp. 1177–1184. Cited by: §3.4.1, §3.
• [14] U. Rosolia and F. Borrelli (2017) Learning model predictive control for iterative tasks. a data-driven control framework. IEEE Transactions on Automatic Control 63 (7), pp. 1883–1896. Cited by: Remark 1.
• [15] M. Saveriano and D. Lee (2019) Learning barrier functions for constrained motion planning with dynamical systems. In IEEE International Conference on Intelligent Robots and Systems, Cited by: §1.1.
• [16] E. D. Sontag (1989) A ’universal’ construction of artstein’s theorem on nonlinear stabilization. Systems & control letters 13 (2), pp. 117–123. Cited by: §1.
• [17] E. Squires, P. Pierpaoli, and M. Egerstedt (2018) Constructive barrier certificates with applications to fixed-wing aircraft collision avoidance. In 2018 IEEE Conference on Control Technology and Applications (CCTA), pp. 1656–1661. Cited by: §1.1, Figure 5, Figure 6, §4.0.2, §4, §4, §4.
• [18] M. Srinivasan, A. Dabholkar, S. Coogan, and P. Vela (2020) Synthesis of control barrier functions using a supervised machine learning approach. arXiv preprint arXiv:2003.04950. Cited by: §1.1.
• [19] A. J. Taylor, A. Singletary, Y. Yue, and A. D. Ames (2020) A control barrier perspective on episodic learning via projection-to-state safety. arXiv preprint arXiv:2003.08028. Cited by: §1.1.
• [20] A. Taylor, A. Singletary, Y. Yue, and A. Ames (2019) Learning for safety-critical control with control barrier functions. arXiv preprint arXiv:1912.10099. Cited by: §1.1.
• [21] R. Vershynin (2018) High-dimensional probability: an introduction with applications in data science. Vol. 47, Cambridge University Press. Cited by: §3.4.2, §3.5.
• [22] A. Virmaux and K. Scaman (2018) Lipschitz regularity of deep neural networks: analysis and efficient estimation. In Advances in Neural Information Processing Systems, pp. 3835–3844. Cited by: §3.4.2.
• [23] L. Wang, D. Han, and M. Egerstedt (2018) Permissive barrier certificates for safe stabilization using sum-of-squares. In 2018 Annual American Control Conference (ACC), pp. 585–590. Cited by: §1.
• [24] L. Wang, E. A. Theodorou, and M. Egerstedt (2018) Safe learning of quadrotor dynamics using barrier certificates. In 2018 IEEE International Conference on Robotics and Automation (ICRA), pp. 2460–2465. Cited by: §1.1.
• [25] P. Wieland and F. Allgöwer (2007-08) Constructive safety using control barrier functions. In Proc. IFAC Symp. Nonlin. Control Syst., Pretoria, South Africa, pp. 462–467. Cited by: §1.
• [26] G. Wood and B. Zhang (1996) Estimation of the lipschitz constant of a function. Journ. of Global Opt. 8 (1), pp. 91–103. Cited by: §3.4.2.
• [27] X. Xu, J. W. Grizzle, P. Tabuada, and A. D. Ames (2017) Correctness guarantees for the composition of lane keeping and adaptive cruise control. IEEE Transactions on Automation Science and Engineering 15 (3), pp. 1216–1229. Cited by: §1.
• [28] X. Xu, P. Tabuada, J. W. Grizzle, and A. D. Ames (2015) Robustness of control barrier functions for safety critical control. In Proc. Conf. Analys. Design Hybrid Syst., Vol. 48, pp. 54–61. Cited by: §1, §2.1.
• [29] S. Yaghoubi, G. Fainekos, and S. Sankaranarayanan (2020) Training neural network controllers using control barrier functions in the presence of disturbances. arXiv preprint arXiv:2001.08088. Cited by: §1.1.