Robot control systems are challenging to design, not least because of the problems of task complexity and model uncertainty. Robotics control problems like those in Fig. 1 often involve both safety and stability requirements, where the controller must drive the system towards a goal state while avoiding unsafe regions. Complicating matters, the model used to design the controller is seldom a perfect representation of the physical plant, and so controllers must account for uncertainty in any parameters (e.g. mass, friction, or unmodeled effects) that vary between the engineering model and true plant. Automatically synthesizing safe, stable, and robust controllers for nonlinear reach-avoid tasks is a long-standing open problem in controls. In this paper, we address this problem with a novel approach to robust model-based learning. Our work presents a unified framework for handling both model uncertainty and complex safety and stability specifications.
Over the years, several approaches have been proposed to solve this problem. In one view, reach-avoid can be treated as an optimal control problem and solved using model predictive control (MPC) schemes and their robust variants. Robust MPC promises a method for general-purpose controller synthesis, finding an optimal control signal given only a model of the system and a specification of the task. However, there are a number of recognized disadvantages of robust MPC. First, there are currently no techniques for guaranteeing the safety, stability, or recursive feasibility of robust MPC beyond the linear case [rmpc]. Second, many sources of model uncertainty (e.g. mass or friction) are multiplicative in the dynamics, but robust MPC algorithms are typically limited to additive uncertainty [rmpc, Lofberg2003]. Finally, MPC is computationally expensive, making it difficult to achieve high control frequencies in practice [Levine2019].
An alternative method for synthesizing safe, stable controllers comes from Lyapunov theory, through the use of control Lyapunov functions and control barrier functions (CLFs and CBFs, respectively [Ames2017a]): certificates that prove the stability and safety of a control system. CLFs and CBFs are similar to standard Lyapunov and barrier functions, but they can be used to synthesize a controller rather than merely verify the performance of a closed-loop system. Unfortunately, CLF and CBF certificates are very difficult to construct in general, particularly for systems with nonlinear dynamics [Giesl2015].
The most recent set of methods promising general-purpose controller synthesis comes from the field of learning for control; for instance, using reinforcement learning [Cheng2019, Han2020] or learned certificates [Chang2019, Sun2020, Qin2021, Tsukamoto2020]. However, the introduction of learning-enabled components into safety-critical control tasks raises questions about soundness, robustness, and generalization. Some learning-based control techniques incorporate certificates such as Lyapunov functions [Chang2019], barrier functions [Dean2020a, Qin2021, Peruffo2020], and contraction metrics [Sun2020, Tsukamoto2020] to prove the soundness of learned controllers. Unfortunately, these certificates' guarantees are sensitive to uncertainties in the underlying model. In particular, if the model used during training differs from that encountered during deployment, then guarantees on safety and stability may no longer hold.
Our main contribution is a learning-based framework for synthesizing robust nonlinear feedback controllers from safety and stability specifications. This contribution has two parts. First, we provide a novel extension of control Lyapunov barrier functions to robust control, defining a robust control Lyapunov barrier function (robust CLBF). Second, we develop a model-based approach to learning robust CLBFs, which we use to derive a safe controller using techniques from robust convex optimization. Other methods for learning Lyapunov and barrier certificates exist, but a key advantage of our approach is that we learn certificates with explicit robustness guarantees, enabling generalization beyond the system parameters seen during training. We demonstrate our approach on a range of challenging control problems, including trajectory tracking, nonlinear control with obstacle avoidance, flight control with a learned model of ground effect, and a satellite rendezvous problem with non-convex safety constraints, comparing our approach with robust MPC. In all of these experiments, we find that our method either matches or exceeds the performance of robust MPC while reducing computational cost at runtime by at least a factor of 10.
2 Related Work
This work builds on a rich history of certificate-based control theory, including classical Lyapunov functions as well as more recent approaches such as control Lyapunov functions (CLFs [Artstein1983, Ames2014]) and control barrier functions (CBFs [Ames2019], a generalization of artificial potential fields [Singletary2020_cbf_apf]). The majority of classical certificate-based controllers rely on hand-designed certificates [Choi2020, Castaneda2020], but these can be difficult to obtain for nonlinear or high-dimensional systems. Some automated techniques exist for synthesizing CLFs and CBFs; however, many of these techniques (such as finding a Lyapunov function as the solution of a partial differential equation) are computationally intractable for many practical applications [Giesl2015]. Other automated synthesis techniques are based on convex optimization, particularly sum-of-squares programming (SOS, [Ahmadi2016]), but these are limited to systems with polynomial dynamics and do not scale favorably with the dimension of the system.
A promising line of work in this area is to use neural networks to learn certificate functions. These techniques range in complexity from verifying the stability of a given control system [Abate2020, Richards2018] to simultaneously learning a control policy and certificate [Sun2020, Chang2019, Qin2021]. Most of these works do not explicitly consider robustness to model uncertainty, although contraction metrics may be used to certify robustness to bounded additive disturbance [Sun2020].
Most approaches to handling model uncertainty in the context of certificate-guided learning for control involve online adaptation. For example, [Choi2020, Taylor2019] assume that a CLF or CBF is given and learn the unmodeled residuals in the CLF and CBF derivatives. When combined with a QP-based CLF/CBF controller, this technique enables adaptation to model uncertainty but relies on a potentially unsafe exploration phase. Although safe adaptation strategies exist, the main drawback of these techniques is their reliance on a hand-designed CLF and CBF, which are non-trivial to synthesize for nonlinear systems. Additionally, combined CLF/CBF controllers are prone to getting stuck when the feasible sets of the CLF and CBF no longer intersect.
Online optimization-based control techniques such as model-predictive control (MPC) are also relevant as a general-purpose control synthesis strategy. However, the computational complexity of MPC, and particularly robust MPC, is a widely-recognized issue, particularly when considering deployment to resource-constrained robotic systems such as UAVs [rmpc, Levine2019]. We revisit the computational cost of robust MPC, particularly as compared with the cost of our proposed method, in Section 6. Some approaches apply learning to characterize uncertainty in system dynamics and augment a robust MPC scheme [Fan2020], but these methods do not fundamentally change the computational burden of MPC. Other methods rely on imitation learning to recreate an MPC-based policy online [Kahn2016], but these can encounter difficulties generalizing beyond the training dataset.
A number of techniques from classical nonlinear control also deserve mention, such as sliding mode and adaptive controllers. These methods do not directly support state constraints and so must be paired with a separate trajectory planning layer [slotine_li_1991]. Another drawback is that these techniques require significant effort to manually derive appropriate feedback control laws, and we are primarily interested in automated techniques for controller synthesis.
3 Preliminaries and Background
We consider continuous-time, control-affine dynamical systems of the form $\dot{x} = f_\theta(x) + g_\theta(x)u$, where $x \in \mathcal{X} \subseteq \mathbb{R}^n$, $u \in \mathcal{U} \subseteq \mathbb{R}^m$, and $f_\theta$ and $g_\theta$ are smooth functions modeling control-affine nonlinear dynamics. We assume that $f_\theta$ and $g_\theta$ depend on model parameters $\theta \in \Theta$ and are affine in those parameters for any fixed $x$. This assumption on the dynamics is not restrictive; it covers many physical systems with uncertainty in inertia, damping, or friction (e.g. rigid-body dynamics or systems described by the manipulator equations), and it includes bounded additive and multiplicative disturbance as a special case. We also assume that $f_\theta$ and $g_\theta$ are Lipschitz but make no further assumptions, allowing us to consider cases when components of $f_\theta$ and $g_\theta$ are learned from experimental data. For concision, we will use $f$ and $g$ (without subscript) to refer to the dynamics evaluated with nominal parameters. In this paper, we consider the following control synthesis problem:
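As a concrete illustration of this assumption (a hypothetical example, not a system from the paper), consider a damped pendulum in which the inverse mass and damping coefficient enter the control-affine dynamics affinely:

```python
import numpy as np

def pendulum_dynamics(x, u, theta):
    """Control-affine dynamics x_dot = f_theta(x) + g_theta(x) u for a
    damped pendulum (illustrative example).

    State x = [angle, angular velocity]; theta = (inv_mass, damping)
    enters the dynamics affinely for any fixed x, as assumed in the text.
    """
    inv_mass, damping = theta
    angle, omega = x
    f = np.array([omega, -9.81 * np.sin(angle) - damping * omega])
    g = np.array([0.0, inv_mass])  # actuation scaled by 1/m
    return f + g * u

# Affine dependence on theta: the dynamics at a parameter midpoint equal
# the midpoint of the dynamics at the endpoint parameters (for fixed x, u).
x, u = np.array([0.3, -0.1]), 0.5
th_a, th_b = (1.0, 0.1), (2.0, 0.3)
th_mid = tuple(0.5 * (a + b) for a, b in zip(th_a, th_b))
lhs = pendulum_dynamics(x, u, th_mid)
rhs = 0.5 * (pendulum_dynamics(x, u, th_a) + pendulum_dynamics(x, u, th_b))
assert np.allclose(lhs, rhs)
```

This affine structure is what later allows robustness over a convex hull of parameters to be certified by checking only finitely many scenarios.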
Definition 1 (Robust Safe Control Problem).
Given a control-affine system with uncertain parameters $\theta \in \Theta$, a goal configuration $x_{goal}$, a set of unsafe states $\mathcal{X}_{unsafe}$, and a set of safe states $\mathcal{X}_{safe}$ (such that $x_{goal} \in \mathcal{X}_{safe}$ and $\mathcal{X}_{safe} \cap \mathcal{X}_{unsafe} = \emptyset$), find a control policy $u = \pi(x)$ such that all trajectories satisfying $\dot{x} = f_\theta(x) + g_\theta(x)u$ and $x(0) \in \mathcal{X}_{safe}$ have the following properties for any parameters $\theta \in \Theta$: Reachability of $x_{goal}$ with tolerance $\epsilon$: $\limsup_{t \to \infty} \|x(t) - x_{goal}\| \le \epsilon$. Safety: $x(0) \in \mathcal{X}_{safe}$ implies $x(t) \notin \mathcal{X}_{unsafe}$ for all $t \ge 0$.
Simply put, we wish to reach the goal while avoiding the unsafe states. We use the notion of reachability instead of asymptotic stability to permit (small) steady-state error; in the following, we will use "stable" as shorthand for reachability. Note that we do not require the safe and unsafe sets to cover the entire state space, as will be made clear in the following discussion: we need a non-empty boundary layer between them to allow for flexibility in finding a safety certificate.
Lyapunov theory provides tools that are naturally suited to reach-avoid problems: control Lyapunov functions (for stability) and control barrier functions (for safety [Ames2017a]). To avoid issues arising from learning two separate certificates, we rely on a single, unifying certificate known as a control Lyapunov barrier function (CLBF). Our definition of CLBFs is related to those in [Romdlony2016] and [Xiao2021] (differing from the formulation in [Romdlony2016] by a constant offset, and differing from [Xiao2021], where safety and reachability are proven using two separate CLBFs). We begin by providing a standard definition of a CLBF in the non-robust case; in the next section we provide a novel, robust extension of CLBF theory before demonstrating how neural networks may be used to synthesize these functions for a general class of dynamical system. In the following, we denote by $L_f V$ the Lie derivative of $V$ along $f$.
Definition 2 (CLBF).
A function $V: \mathcal{X} \to \mathbb{R}$ is a CLBF if, for some $\lambda, c > 0$:
(1a) $V(x_{goal}) = 0$,
(1b) $V(x) > 0$ for all $x \in \mathcal{X} \setminus \{x_{goal}\}$,
(1c) $V(x) < c$ for all $x \in \mathcal{X}_{safe}$,
(1d) $V(x) \ge c$ for all $x \in \mathcal{X}_{unsafe}$,
(1e) $\inf_{u \in \mathcal{U}} \left[ L_f V(x) + L_g V(x)\,u + \lambda V(x) \right] \le 0$ for all $x \in \mathcal{X}$.
Intuitively, we can think of a CLBF as a special case of a control Lyapunov function where the safe and unsafe regions are contained in sub- and super-level sets, respectively. If we define a set of admissible controls $K_V(x) = \{u \in \mathcal{U} : L_f V(x) + L_g V(x)\,u + \lambda V(x) \le 0\}$, then we arrive at a theorem proving the stability and safety of any controller that outputs elements of this set (the proof is included in the supplementary material).
Theorem 1. If $V$ is a CLBF, then any control policy taking values in $K_V(x)$ will be both safe and stable, in the sense of Definition 1.
Based on these results, we can define a CLBF-based controller, analogous to the CLF/CBF-based controller in [Choi2020] but without the risk of conflicts between the CLF and CBF conditions, relying on the CLBF and some nominal controller (e.g. the LQR policy):
Since the controller's output lies in $K_V(x)$ by construction, this controller will result in a system that is certifiably safe and stable (with the CLBF acting as the certificate). The nominal control signal is included to encourage smoothness in the solution, particularly near the desired fixed point $x_{goal}$, where $V$ becomes small. CLBFs provide a single, unified certificate of safety and stability; however, some significant issues remain. In particular, how do we guarantee that a CLBF will generalize beyond the nominal parameters?
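For intuition, the filtering performed by such a QP-based controller can be sketched in closed form for the simplest case of a single decrease constraint and no input limits (a sketch with illustrative names; the paper's full controller also handles input limits and relaxation):

```python
import numpy as np

def clbf_qp_controller(a, b, u_nom):
    """Minimal sketch of a CLBF-QP filter: project a nominal control onto
    the admissible half-space {u : a + b @ u <= 0}, where
    a = L_f V(x) + lambda * V(x) and b = L_g V(x).

    With one affine constraint, the QP
        min ||u - u_nom||^2  s.t.  a + b @ u <= 0
    has the closed-form projection solution below.
    """
    violation = a + b @ u_nom
    if violation <= 0:
        return u_nom                             # nominal control is admissible
    return u_nom - (violation / (b @ b)) * b     # projection onto the half-space

# The filtered control satisfies the decrease condition a + b @ u <= 0.
a, b, u_nom = 1.0, np.array([0.5, -1.0]), np.array([0.2, 0.2])
u = clbf_qp_controller(a, b, u_nom)
assert a + b @ u <= 1e-9
```

When the nominal control already satisfies the decrease condition, it passes through unchanged, which is what yields smooth behavior near the goal.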
4 Robust CLBF Certificates for Safe Control
In this section, we extend the definition of CLBFs to provide explicit robustness guarantees, and we present a key theorem proving the soundness of robust CLBF-based control.
Definition 3 (Robust CLBF, rCLBF).
A function $V: \mathcal{X} \to \mathbb{R}$ is a robust CLBF for a set of scenarios $\theta_1, \ldots, \theta_K \in \Theta$ if it satisfies conditions (1a)–(1d) and the robust decrease condition
$\inf_{u \in \mathcal{U}} \max_{i=1,\ldots,K} \left[ L_{f_{\theta_i}} V(x) + L_{g_{\theta_i}} V(x)\,u + \lambda V(x) \right] \le 0$ for all $x \in \mathcal{X}$. (3)
As in the non-robust case, we define the set of admissible controls for a robust CLBF and the corresponding QP-based controller, the soundness of which is given by Theorem 2:
Theorem 2. If $V$ is a robust CLBF, then any control policy taking values in the robust admissible set will be both safe and stable, in the sense of Definition 1, when executed on a system with uncertain parameters $\theta \in \Theta_{hull}$, where $\Theta_{hull}$ is the convex hull of the scenarios $\theta_1, \ldots, \theta_K$.

Proof. See the supplementary materials. ∎
This result demonstrates the soundness and robustness of an rCLBF-based controller, but does not provide a means to construct a valid rCLBF. In the next section, we will present an automated model-based learning approach to rCLBF synthesis, yielding a general framework for solving robust safe control problems even for systems with complex, nonlinear, or partially-learned dynamics.
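For intuition, the scenario-based robust decrease condition can be sketched as follows (a sketch; the function and variable names are illustrative, not the paper's API):

```python
import numpy as np

def robust_decrease_margin(u, scenarios, lam, V, gradV):
    """Worst-case CLBF decrease over a finite scenario set.

    Because the dynamics are affine in the parameters, enforcing the
    decrease condition at the scenario vertices certifies it over their
    convex hull (Theorem 2).
    """
    margins = []
    for f_th, g_th in scenarios:            # dynamics evaluated at the state x
        Vdot = gradV @ (f_th + g_th @ u)    # L_f V + L_g V u for this scenario
        margins.append(Vdot + lam * V)
    return max(margins)                     # <= 0 means u is robustly admissible

# Two scenarios for a double-integrator-like system at some state.
gradV = np.array([0.0, 1.0])
scenarios = [
    (np.array([1.0, 2.0]), np.array([[0.0], [1.0]])),
    (np.array([1.0, 3.0]), np.array([[0.0], [1.5]])),
]
u = np.array([-4.0])
margin = robust_decrease_margin(u, scenarios, lam=1.0, V=1.0, gradV=gradV)
assert margin <= 0.0
```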
5 Learning Robust CLBFs
A persistent challenge in using certificate-based controllers is the difficulty of finding valid certificates, especially for systems with nonlinear dynamics and complex specifications of the safe and unsafe sets (e.g. obstacle avoidance). Taking inspiration from recent advances in certificate-guided learning for control [Chang2019, Qin2021], we employ a model-based supervised learning framework to synthesize an rCLBF-based controller. The controller architecture comprises three main parts: the rCLBF $V$, a proof controller, and the QP-based controller (rCLBF-QP). We parameterize $V$ and the proof controller as neural networks. These networks are trained offline, where the proof controller is used to prove that the feasible set of (rCLBF-QP) is non-empty; $V$ is then evaluated online to provide the parameters of (rCLBF-QP), which is solved to find the control input. In the offline training stage, our primary goal is finding an rCLBF such that the conditions of Definition 3 are satisfied. To ensure (1b), we define $V$ in terms of the activation vector of the last hidden layer of the neural network. To train $V$ such that conditions (1a), (1c), (1d), and (3) are satisfied over the domain of interest, we sample points uniformly at random from the state space to yield a population of training points, then define the empirical loss:
where the coefficients on each term are positive tuning parameters, a small margin parameter allows us to encourage strict inequality satisfaction and enables generalization claims, the normalizing constants are the number of training points sampled in the safe and unsafe regions, respectively, and the ReLU function ensures that each penalty is active only when the corresponding condition is violated. The terms in this empirical loss are directly linked to conditions (1a), (1c), (1d), and (3), such that each term is zero if the corresponding condition is satisfied at all training points. For example, the final term in this loss is designed to encourage satisfaction of the robust CLBF decrease condition (3). The factor in the final term is computed by solving (rCLBF-QP) at each training point and taking the maximum violation of constraint (4), so that it vanishes whenever the QP has a feasible solution; this couples the training of the rCLBF and proof-controller networks. During training, we rely on the proof controller to compute the time derivative of $V$ in the final term of the loss. To provide a training signal for the proof controller, we define an additional imitation loss penalizing its deviation from a nominal controller (e.g. a policy derived from an LQR approximation). The parameters of both networks are optimized using the combined loss, with a small weight applied to the imitation term so that the training process prioritizes satisfying the CLBF conditions.
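The structure of this hinge-style loss can be sketched in NumPy (an illustrative sketch; the exact terms, margins, and weights are assumptions, not the paper's implementation):

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def clbf_empirical_loss(V_goal, V_safe, V_unsafe, decrease_margin,
                        c=1.0, eps=0.01, weights=(1.0, 1.0, 1.0, 1.0)):
    """Hinge-style empirical CLBF loss sketch.

    Each term is zero exactly when the corresponding condition holds with
    margin eps at every sampled point: V vanishes at the goal, V < c on
    safe samples, V >= c on unsafe samples, and the (robust) decrease
    condition holds at all samples.
    """
    a1, a2, a3, a4 = weights
    return (a1 * V_goal ** 2
            + a2 * relu(eps + np.asarray(V_safe) - c).mean()
            + a3 * relu(eps + c - np.asarray(V_unsafe)).mean()
            + a4 * relu(eps + np.asarray(decrease_margin)).mean())

# Zero loss when all conditions hold with margin; positive otherwise.
assert clbf_empirical_loss(0.0, [0.1, 0.5], [1.5], [-0.5]) == 0.0
assert clbf_empirical_loss(0.2, [0.99], [1.5], [-0.5]) > 0.0
```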
An important detail of our control architecture is that the learned control policy is used primarily to demonstrate that the feasible set of (rCLBF-QP) is non-empty. We are not required to use the proof controller at execution time; we can choose any control policy from the admissible set. In the online stage, we rely on an optimization-based controller (rCLBF-QP), which solves a small quadratic program with one decrease constraint per scenario and one decision variable per element of the control input. To ensure that this QP is feasible at execution, we permit a relaxation of the CLBF constraints (4) and penalize relaxation with a large coefficient in the objective. Once trained, $V$ can be verified using neural-network verification tools [Liu2021], sampling [Bobiti2018], or a generalization error bound [Qin2021]. More details on data collection, training, implementation, and verification strategies are included in the supplementary materials.
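The relaxation scheme can be illustrated with a small closed-form solve (a sketch with a single decrease constraint and no input limits; `rho` plays the role of the large penalty coefficient, and the names are illustrative):

```python
import numpy as np

def relaxed_clbf_qp(a, b, u_nom, rho=1e3):
    """Sketch of the relaxed CLBF-QP:
        min ||u - u_nom||^2 + rho * r^2   s.t.  a + b @ u <= r,
    which is always feasible; the large penalty rho drives r toward zero.

    With a single constraint, the optimum is a weighted projection with
    the closed form below (derived from the KKT conditions).
    """
    violation = a + b @ u_nom
    if violation <= 0:
        return u_nom, 0.0                    # no relaxation needed
    mu = violation / (b @ b + 1.0 / rho)     # multiplier of the active constraint
    return u_nom - mu * b, mu / rho          # (control, relaxation r >= 0)

a, b, u_nom = 2.0, np.array([1.0, 0.0]), np.array([0.0, 0.0])
u, r = relaxed_clbf_qp(a, b, u_nom, rho=1e3)
assert abs((a + b @ u) - r) < 1e-9 and r < 0.01  # constraint tight, tiny slack
```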
It is important to note that this training strategy encourages satisfying (3) only on the finite set of training points sampled uniformly from the state space; there is no learning mechanism that enforces dense satisfaction of (3). In the supplementary materials, we include plots of 2D sections of the state space showing that (3) is satisfied at the majority of points, but there is a relatively small violation on a sparse subset of the state space. Because these violation regions are sparse, the theory of almost Lyapunov functions applies [liu2020almost]: small violation regions may induce temporary overshoots (requiring shrinking the certified invariant set), but they do not invalidate the safety and stability assurances of the certificate. Strong empirical results on controller performance in Section 6 support this conclusion, though we admit that good empirical performance is not a substitute for guarantees based on rigorous verification, which we hope to revisit in future work.
6 Experiments

To evaluate the performance of our learned rCLBF-QP controller, we compare against min-max robust model predictive control (as described in [Lofberg2003, lofberg2012]) on a series of simulated benchmarks representing safe control problems of increasing complexity. The first two concern trajectory tracking, where we wish to limit the tracking error despite uncertainty in the reference trajectory. The next two benchmarks are UAV stabilization problems that add additional safety constraints and increasingly nonlinear dynamics. The last three benchmarks involve highly non-convex safety constraints. The first four benchmarks provide a solid basis for comparison between our proposed method and robust MPC, while the last three demonstrate the ability of our approach to maintain safety even in complex environments.
In each experiment, we vary the model parameters randomly within their uncertainty set, simulate the performance of the controller, and compute the rate of safety constraint violations and the average error relative to the goal across simulations. These data are reported along with the average evaluation time for each controller in Table 1. To examine the effect of control frequency on MPC performance, we include results for two different control periods for all robust MPC experiments (we also report the horizon length). In some cases we observed that the evaluation time for MPC exceeds the control period; in practice this would lead to the controller failing, but in our experiments we simply ran the simulation slower than real time. Our robust MPC comparison supports only linear models with bounded additive disturbance; we linearize the systems about the goal point and select an additive disturbance to approximate the disturbance from uncertain model parameters. The following sections will present results from each benchmark separately, and more details are provided in the supplementary materials, including the dynamics and constraints used for each benchmark, as well as the hardware used for training and execution.
| Task | Algorithm | Safety rate | Goal error | Evaluation time (ms) |
| --- | --- | --- | --- | --- |
| Car trajectory tracking (kinematic model) | rCLBF-QP | | 0.7523 | 10.4 |
| | Robust MPC | | 1.5148 | 194.6 |
| | Robust MPC | | 12.4438 | 172.8 |
| Car trajectory tracking (sideslip model) | rCLBF-QP | | 1.0340 | 9.6 |
| | Robust MPC | | 0.1560 | 336.5 |
| | Robust MPC | | 18.1939 | 316.9 |
| | Robust MPC | 100% | 0.0980 | 316.2 |
| | Robust MPC | 100% | 63.6303 | 291.0 |
| | Robust MPC | 100% | 0.2086 | 247.2 |
| | Robust MPC | 100% | 0.3267 | 253.2 |
| | Robust MPC | 21% | 1.3977 | 214.8 |
| | Robust MPC | 11% | 1.9725 | 239.1 |
| | Robust MPC | 53% | | 276.9 |
| | Robust MPC | 0% | | 265.2 |
| | Robust MPC | 39% | 6.3751 | 187.3 |
| | Robust MPC | 15% | 9.0592 | 197.4 |

For car trajectory tracking, we report the maximum tracking error over the trajectory. For the 2D quadrotor, we report the percentage of trials reaching the goal within tolerance without collision. Note: we also implemented SOS optimization to search for a CLBF and controller, but the bilinear optimization (as in [Majumdar2013]) did not converge with maximum polynomial degree 10 and a Taylor expansion of the nonlinear dynamics.
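The evaluation protocol described above (sample parameters, simulate the closed loop, then report safety rate and goal error) can be sketched as follows; the scalar system in the usage example is a hypothetical stand-in for the benchmark dynamics:

```python
import numpy as np

def evaluate_controller(controller, dynamics, x0s, thetas, is_unsafe,
                        x_goal, dt=0.01, steps=500):
    """Simulate the closed loop for sampled initial states and parameters,
    then report the fraction of safe trajectories and the mean final
    distance to the goal (a sketch of the experimental protocol)."""
    safe, errors = 0, []
    for x0, theta in zip(x0s, thetas):
        x, violated = np.asarray(x0, dtype=float), False
        for _ in range(steps):
            x = x + dt * dynamics(x, controller(x), theta)  # forward Euler
            violated = violated or is_unsafe(x)
        safe += not violated
        errors.append(np.linalg.norm(x - x_goal))
    return safe / len(x0s), float(np.mean(errors))

# Toy check: scalar system x_dot = theta * x + u with stabilizing u = -3x.
rng = np.random.default_rng(0)
x0s = rng.uniform(-1, 1, size=(20, 1))
thetas = rng.uniform(0.5, 1.5, size=20)
rate, err = evaluate_controller(
    lambda x: -3.0 * x, lambda x, u, th: th * x + u,
    x0s, thetas, lambda x: abs(x[0]) > 2.0, np.zeros(1))
assert rate == 1.0 and err < 0.01
```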
6.1 Car trajectory tracking
First, we consider the problem of tracking an a priori unknown trajectory using two different car models. In the first model (the kinematic model), the vehicle state represents error relative to the reference trajectory, including the steering angle. The second model (the sideslip model) additionally includes the sideslip angle in its state [Althoff2017a]. Both models have control inputs for the rates of change of the steering angle and velocity. We assume that the reference trajectory is parameterized by an uncertain curvature: at any point, the angular velocity of the reference point can vary within known bounds. The goal point is zero error relative to the reference, and the safety constraint requires maintaining bounded tracking error.
The performance of our controller is shown in Fig. 2. We see that for both models, both our controller and robust MPC are able to track the reference trajectory. However, robust MPC was only successful when run at slower than real-time speeds (with a control period roughly half the average evaluation time), and it became unstable when run at a slower control frequency. In contrast, our rCLBF-QP controller runs in real time on a laptop computer. This significant improvement in speed is due primarily to the reduction in the size of (rCLBF-QP) relative to that of the QPs used by robust MPC. For example, for the sideslip model, our controller solves a QP with 2 variables and 2 constraints, whereas the robust MPC controller solves a QP with 35 variables and 23 constraints (after pre-compiling using YALMIP [lofberg2012]). Because the learned rCLBF encodes long-term safety and stability constraints into local constraints on the rCLBF derivative, the rCLBF controller requires only a single-step horizon (as opposed to the receding horizon used by MPC).
By comparing performance between these two models, we can discern an important feature of our approach. Increasing the state dimension when moving between models does not substantially increase the evaluation time for our controller (as it does for robust MPC), but it does degrade the tracking performance, suggesting that the number of samples required to train the CLBF to any given level of performance increases with the size of the state space. These examples also highlight a potential drawback of our approach, which relies on a parameter-invariant robust CLBF. Because it attempts to find a common rCLBF for all possible parameter values, our controller exhibits some small steady-state error near the goal. This occurs because there is no single control input that renders the goal a fixed point for all possible parameter values, which motivates our use of a goal-reaching tolerance in Definition 1.
6.2 UAV stabilization
The next two examples involve stabilizing a quadrotor near the ground while maintaining a minimum altitude. Relative to the previous examples, these benchmarks increase the complexity of the state constraints, and we consider two models with increasingly challenging dynamics. The first model (referred to as the "3D quadrotor") has 9 state dimensions for position, velocity, and orientation, with control inputs for the net thrust and angular velocities [Sun2020]. The second model (the "neural lander") has lower state dimension, including only translation and velocity, with linear acceleration as an input, but its dynamics include a neural network trained to approximate the aerodynamic ground effect, which is particularly relevant to this safe hovering task [liu2020robust]. The mass of both models is uncertain but assumed to lie within known bounds for each model.
Fig. 3 shows simulation results on these two models. The trend from the previous benchmarks continues: our controller maintains safety while reducing evaluation time by a factor of 10 relative to MPC. Moreover, while the robust MPC method can achieve low error relative to the goal for the 3D quadrotor model, the nonlinear ground effect term prevents MPC from driving the neural lander to the goal. In contrast, the rCLBF-QP method can consider the full nonlinear dynamics of the system, including the learned ground effect, and achieves a much lower error relative to the goal.
6.3 Navigation with non-convex safety constraints
The preceding benchmarks all include convex safety constraints that can be easily encoded in a linear robust MPC scheme. Our next set of examples demonstrates the ability of our approach to generalize to complex environments. These problems are commonly solved by combining planning and robust tracking control, so in our comparisons we use robust MPC to track a safe reference path through each environment. In contrast, our rCLBF-QP controller is not provided with a reference path and instead synthesizes a safe controller using only the model dynamics and (non-convex) safety constraints, a more challenging problem than the tracking problem in Section 6.1. The three navigation problems we consider are: (a) controlling a Segway to duck under an obstacle to reach a goal [aastrom2021feedback], (b) navigating a 2D quadrotor model around obstacles [Sun2020], and (c) completing a satellite rendezvous that requires approaching the target satellite from a specific direction [jewison2016spacecraft]. For (a) and (c), we conducted additional comparisons with a Hamilton-Jacobi-based controller (HJ, [toolboxls]) and a policy trained via constrained policy optimization (CPO, [cpo]). Simulated trajectories are shown in Fig. 4. Note that in the Segway and satellite examples, robust MPC fails to track the reference path, while the rCLBF controller successfully navigates the environment. HJ preserves safety in the satellite example but fails to reach the goal (which is positioned near the border of the unsafe region), while HJ controller synthesis failed in the Segway example (the backwards reachable set did not reach the start location within the allotted horizon). Note that the HJ satellite controller requires different initial conditions, since it fails if started outside the safe region. The policy trained using CPO navigates to the goal in the satellite example, but it is not safe. In the Segway example, CPO does not learn a stable controller (details are given in the appendix).
7 Discussion & Conclusion
These results demonstrate two clear trends. First, the performance of our controller (in terms of both safety rate and error relative to the goal) is comparable to that of MPC when the MPC controller is stable. In some cases, our method achieves lower steady-state error due to its ability to consider highly nonlinear dynamics, as in the neural lander example. In other cases, the dynamics are well-approximated by the linearization and robust MPC achieves better steady-state error, but our approach still achieves a comparable safety rate. Second, we observe that the performance of the robust MPC algorithm is highly sensitive to the control frequency, and these controllers are only stable at control frequencies that cannot run in real time on a laptop computer. This highlights one benefit of our method over traditional MPC: trading increased offline computation for an order-of-magnitude reduction in online evaluation time. In all cases, we find that our proposed algorithm finds a controller that satisfies the safety constraints despite variation in model parameters, validating our claim of presenting a framework for robust safe controller synthesis.
In summary, we present a novel, learning-based approach to synthesizing robust nonlinear feedback controllers. Our approach is guided by a robust extension to the theory of control Lyapunov barrier functions that explicitly accounts for uncertainty in model parameters. Through experiments in simulation, we successfully demonstrate the performance of our approach on a range of challenging safe control problems. A number of interesting open questions remain, including scalable verification strategies for the learned certificate, the sample complexity of this learning method, and the relative convergence rates of the rCLBF, the proof controller, and the QP controller derived from them, which we hope to revisit in future work. We also plan to explore applications to hardware systems, including considerations of delay and state estimation uncertainty.
The NASA University Leadership Initiative (grant #80NSSC20M0163) and Defense Science and Technology Agency in Singapore provided funds to assist the authors with their research, but this article solely reflects the opinions and conclusions of its authors and not any NASA entity, DSTA Singapore, or the Singapore Government. C. Dawson is supported by the NSF Graduate Research Fellowship under Grant No. 1745302.
In addition to the sections below, we include a video demonstrating our controller's performance on the kinematic car trajectory tracking and 2D quadrotor obstacle avoidance benchmarks, as well as documented code for running several of our examples.
Proof of Theorem 1
The proof of Theorem 1 follows from the following lemmas, which prove stability and safety of CLBF-based control, respectively.
Lemma 1. If $V$ is a CLBF, then any control policy taking values in $K_V(x)$ will exponentially stabilize the system to $x_{goal}$.
Proof. Since $u(x) \in K_V(x)$, it follows that $\dot{V} \le -\lambda V$ for the closed-loop system. Thus, $V$ is a Lyapunov function and proves exponential stability about $x_{goal}$. ∎
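The exponential convergence claim follows from the comparison lemma; a sketch in the notation above, with $\lambda$ the decrease rate from Definition 2:

```latex
\dot{V}(x(t)) \le -\lambda V(x(t))
\;\Longrightarrow\;
V(x(t)) \le V(x(0))\, e^{-\lambda t} \xrightarrow[t \to \infty]{} 0,
```

and since $V$ is positive definite about $x_{goal}$ by (1a) and (1b), $V \to 0$ implies $x(t) \to x_{goal}$.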
Lemma 2. If $V$ is a CLBF, then for any control policy taking values in $K_V(x)$ and any initial condition $x(0) \in \mathcal{X}_{safe}$, $x(t) \notin \mathcal{X}_{unsafe}$ for all $t \ge 0$ (i.e. any trajectory starting in the safe set will never enter the unsafe region).
Proof. Since $x(0) \in \mathcal{X}_{safe}$, condition (1c) implies that $V(x(0)) < c$. Conditions (1b) and (1e) ensure that $V$ is strictly decreasing in time (except when $x = x_{goal}$, at which point $V$ is constant at zero). As a result, $V(x(t)) < c$ for all $t \ge 0$. If $x$ were to enter the unsafe region, there would exist a time $T$ such that $V(x(T)) \ge c$ by condition (1d). This is a contradiction, so we conclude that $x$ will never enter the unsafe region for $t \ge 0$. ∎
Proof of Theorem 2
By assumption, $f_\theta$ and $g_\theta$ are affine in $\theta$. Additionally, the Lie derivatives $L_{f_\theta}V$ and $L_{g_\theta}V$ are affine in $f_\theta$ and $g_\theta$, and the rCLBF constraint (4) is affine in these Lie derivatives. As a result, the overall mapping from $\theta$ to the left-hand side of (4) is affine, and thus maps the convex hull of the scenarios to the convex hull of the corresponding constraint values. It follows that if (4) is satisfied for each scenario, then it will be satisfied for any $\theta$ in the convex hull. We can conclude that the rCLBF satisfies the conditions of a standard CLBF for any particular realization of the system with parameters $\theta \in \Theta_{hull}$, so the safety and stability results of Theorem 1 apply. ∎
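The affine-dependence argument can be written explicitly. Writing the left-hand side of (4) as $h_\theta(x, u)$ (a hypothetical shorthand, affine in $\theta$), for any $\theta = \sum_i \alpha_i \theta_i$ with $\alpha_i \ge 0$ and $\sum_i \alpha_i = 1$:

```latex
h_{\theta}(x, u) \;=\; \sum_{i} \alpha_i\, h_{\theta_i}(x, u)
\;\le\; \max_i\, h_{\theta_i}(x, u) \;\le\; 0,
```

so satisfying the constraint at each scenario vertex certifies it on the entire convex hull.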
Implementation of Learning Approach
In this section, we describe several details of our implementation of the system used to train the rCLBF and proof-controller networks. At a high level, our system is implemented in PyTorch [pytorch] using PyTorch Lightning [falcon2019pytorch], and we used batched stochastic gradient descent with weight decay for optimization. The next paragraphs describe our training strategies.
Sampling of training data: we found that training performance could be improved by specifying fixed percentages of training points that must be sampled from the goal, safe, and unsafe regions. For example, instead of sampling all points uniformly from the state space, we might sample fixed fractions uniformly from the goal region, the unsafe region, and the safe region, with the remaining points drawn uniformly from the entire state space.
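This mixed sampling scheme can be sketched as follows (the region samplers and fractions below are illustrative, not the values used in the paper):

```python
import numpy as np

def stratified_training_points(samplers, fractions, n_total, rng):
    """Draw fixed fractions of the training set from the goal, unsafe, and
    safe regions, with the remainder drawn from the whole state space.

    `samplers` is a list of callables (one per region, plus one for the
    whole space); `fractions` gives the share for all but the last.
    """
    counts = [int(f * n_total) for f in fractions]
    counts.append(n_total - sum(counts))          # remainder: whole space
    batches = [sampler(n, rng) for sampler, n in zip(samplers, counts)]
    return np.concatenate(batches, axis=0)

rng = np.random.default_rng(0)
samplers = [
    lambda n, r: r.normal(0.0, 0.05, size=(n, 2)),   # near the goal
    lambda n, r: r.uniform(2.0, 3.0, size=(n, 2)),   # unsafe region
    lambda n, r: r.uniform(-1.0, 1.0, size=(n, 2)),  # safe region
    lambda n, r: r.uniform(-3.0, 3.0, size=(n, 2)),  # whole state space
]
pts = stratified_training_points(samplers, [0.2, 0.2, 0.3], 100, rng)
assert pts.shape == (100, 2)
```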
Network initialization: although it was not necessary for all experiments, we found that some experiments (particularly the car trajectory-tracking benchmarks) performed better if the CLBF network was initialized to match the quadratic Lyapunov function found by linearizing the system about the goal point. After training for several epochs to match this quadratic initial guess, we then alternated between training the CLBF network and the controller network, optimizing one for several epochs before switching to the other. We found that on some examples this stabilized the learning process. We did not notice an improvement from episodic learning, although it may be more useful when training on higher-dimensional systems.
Hyperparameter tuning: during the development process, we optimized hyperparameters (the convergence rate, the sizes of the CLBF and controller networks, and the penalty applied to relaxations of the QP constraints) based on a combination of the empirical loss on a test data set and the controller's performance in simulation. In most experiments, we found that the same default values were sensible, along with neural networks with 2 hidden layers of 64 units each. We found that tuning these parameters yields controllers that perform well in simulation.
Reach-avoid problem specification: when defining reach-avoid problems for this approach, care should be taken when specifying the safe and unsafe sets. We found that it is necessary to leave some region between the safe and unsafe sets where the neural rCLBF has the freedom to adjust the boundary of its safe level set as needed to find a valid rCLBF. In addition, we found that including a safety constraint that prevents the system from leaving the region where training data was gathered improves the controller's performance.
rCLBF-QP Relaxation: to ensure that the controller is always feasible, we permit the QP to relax the constraints on the CLBF derivative, and the extent of this relaxation is penalized with a large coefficient in the QP objective. The penalty coefficients used in different experiments are included below. This relaxation also provides a useful training signal for the controller network. To make use of this signal, we solve (rCLBF-QP) for each point at training time and scale the last term of the loss function point-wise by the relaxation, effectively increasing the penalty in regions where the feasible set of (rCLBF-QP) is empty and decreasing the penalty in regions where a feasible solution exists (even if the learned controller has not yet converged to that feasible solution).
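The relaxation-weighted penalty can be sketched as follows (our illustration; in the actual loss this term is combined with the other CLBF loss terms, and the relaxations come from solving the QP at each training point):

```python
# Sketch of the relaxation-weighted penalty: relaxations[i] is the slack the
# QP needed at training point i -- zero where (rCLBF-QP) was feasible,
# positive where its feasible set was empty.
def relaxation_weighted_penalty(base_penalties, relaxations):
    """Scale each point's penalty term by the QP relaxation at that point."""
    return sum(p * r for p, r in zip(base_penalties, relaxations)) / len(base_penalties)

# Points where the QP was feasible (relaxation 0) contribute nothing; points
# in infeasible regions are penalized in proportion to their slack.
loss = relaxation_weighted_penalty([1.0, 2.0, 3.0], [0.0, 0.0, 0.5])
assert loss == 0.5
```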
Verification of Learned CLBFs
Our focus in this paper is primarily on the use of robust CLBFs to automatically synthesize feedback controllers for nonlinear safe control tasks. We find that our learning method yields functions that satisfy the rCLBF conditions in the vast majority of the state space, and yields feedback controllers that are successful in simulation, but we do not claim to have exhaustively verified our learned rCLBFs. Indeed, scalable verification for learned certificate functions remains an open problem. Relevant verification techniques include neural network reachability analysis (see [Liu2021] for a recent survey), SMT solvers [Chang2019], Lipschitz-informed sampling methods [Bobiti2018], and probabilistic claims from learning theory [Qin2021].
Additionally, these verification techniques might be used in future work to inform the training of an rCLBF neural network. For instance, spectral normalization [miyato2018spectral] of the rCLBF network would allow us to tune the Lipschitz constant of the learned function, enabling more effective use of Lipschitz-informed sampling verification tools. Similarly, reachability tools and SMT solvers can provide counter-examples to augment the training data and make further failures less likely [Chang2019]. Further, work on almost-Lyapunov functions [liu2020almost, Boffi2020] shows that a system can still be provably stable even if the Lyapunov conditions do not hold everywhere; this result may generalize to CLBFs as well. These are all exciting directions that we hope to explore in our future work on this topic.
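To illustrate why spectral normalization gives control over the Lipschitz constant: the 2-norm Lipschitz constant of a linear layer is its largest singular value, which can be estimated cheaply by power iteration. The sketch below (pure Python, our illustration of the estimator rather than the normalization itself) computes this quantity; spectral normalization then divides the layer's weights by it:

```python
# Power-iteration estimate of a matrix's largest singular value, i.e. the
# 2-norm Lipschitz constant of the linear map x -> A x.
def matvec(A, v):
    return [sum(a * x for a, x in zip(row, v)) for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def norm(v):
    return sum(x * x for x in v) ** 0.5

def spectral_norm(A, iters=50):
    v = [1.0] * len(A[0])
    for _ in range(iters):
        u = matvec(A, v)              # u = A v
        v = matvec(transpose(A), u)   # v = A^T A v
        nv = norm(v)
        v = [x / nv for x in v]       # re-normalize each iteration
    return norm(matvec(A, v))         # ||A v|| with ||v|| = 1

# Diagonal example: singular values are 3 and 1, so the norm is 3.
assert abs(spectral_norm([[3.0, 0.0], [0.0, 1.0]]) - 3.0) < 1e-6
```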
Implementation of Robust MPC
We implemented our robust MPC scheme in Matlab following the example in the YALMIP documentation [lofberg2012], which is in turn based on the algorithm published in [Lofberg2003]. This MPC algorithm relies on a linearization of the system dynamics; we used a constant linearization about the goal state, and for trajectory-tracking examples we instead linearized the system about the reference trajectory.
The robust MPC problem was formulated in YALMIP and Gurobi [gurobi] was used as the underlying QP solver. When measuring evaluation times for robust MPC, we first use YALMIP to pre-compile the robust QP then measure the time needed to solve the compiled QP using Matlab’s built-in timeit function. We understand that additional optimizations (e.g. explicit MPC) might reduce the evaluation time of robust MPC further, but those optimizations can be applied equally well to speeding up the QP solution in our proposed controller. Effectively, for the purposes of measuring performance, we optimize both approaches to the point where a single quadratic program is being sent to the Gurobi QP solver, and so we believe we have provided a fair comparison in our results.
Implementation of Hamilton Jacobi Control Synthesis
To compute the Hamilton-Jacobi value function, we used the helperOC package at https://github.com/HJReachability/helperOC, which wraps the toolboxLS software [toolboxls]. We over-approximate the parametric uncertainty with an additive uncertainty. In the Segway example, where the unsafe set is not a polytope, we over-approximate it using a polytope defined on the state variables. We computed the HJ value function, then applied the optimal HJ controller forward in time, as described in [hj_overview]. We used a time step of 0.05 seconds and a maximum horizon of 5 seconds while computing the backwards reachable set. The HJ value function was approximated on a grid, with the resolution set to balance accuracy and running time.
Details on Simulation Experiments
This section reports the dynamics and hyperparameters used in our experiments. Note that in some of our examples, mass is an uncertain parameter but enters the dynamics through its reciprocal 1/m (and similarly for rotational inertia). In these cases we treat 1/m as the uncertain parameter and proceed with our method as described in Section 5. For clarity, we give the uncertainty ranges in terms of the mass itself rather than in terms of the reciprocal.
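The bookkeeping for these reciprocal parameters can be sketched as follows (a minimal illustration; the numerical bounds are hypothetical, not those used in our experiments):

```python
# When mass m is uncertain but enters the dynamics as 1/m, the uncertain
# parameter handled by the method is 1/m, and a mass range [m_lo, m_hi]
# (with 0 < m_lo) maps to the inverted, order-swapped range [1/m_hi, 1/m_lo].
def reciprocal_range(m_lo, m_hi):
    assert 0 < m_lo <= m_hi, "mass bounds must be positive"
    return (1.0 / m_hi, 1.0 / m_lo)

# e.g. a mass uncertain on [1.0, 2.0] kg gives 1/m uncertain on [0.5, 1.0]
assert reciprocal_range(1.0, 2.0) == (0.5, 1.0)
```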
Training was conducted on a workstation with a 32-core AMD 3970X CPU and four Nvidia GeForce 2080 Ti GPUs (one GPU was used for each training job, allowing us to parallelize our experiments). Runtime evaluation was conducted on a consumer laptop with an Intel i7-8565U CPU running at 1.8 GHz, and no GPU.
We use the kinematic single-track model of a car given in the CommonRoad benchmarks [Althoff2017a]. We modify this model to express position and orientation relative to a reference path parameterized by its linear velocity, linear acceleration, heading angle, and angular velocity. To model a reference path with uncertain curvature, we treat the reference angular velocity as the uncertain parameter and assume that it varies within a bounded interval.
The state of the path-centric kinematic car model collects the Cartesian position error, steering angle, velocity error, and heading error relative to the reference path, and the control inputs are the steering angle velocity and longitudinal acceleration. The dynamics are given in control-affine form, with