## I Introduction

The rapid growth of Artificial Intelligence (AI) and Machine Learning (ML) has had a tremendous impact on almost every engineering-related industry, from finance to medicine, biology, and general cyber-physical systems. The ability of most ML algorithms to deal with large amounts of data and learn high-dimensional dependencies has not only expanded the capabilities of traditional disciplines but has also opened up new opportunities for developing decision-making systems which operate in complex scenarios. Despite these recent successes [silver2017mastering], there is low acceptance of AI and ML algorithms in safety-critical domains, including human-centered robotics, and particularly in the flight and space industries. For example, both recent and near-future planned Mars rover missions largely rely on daily human decision making and piloting, due to a very low acceptable risk for trusting autonomy algorithms and an inherent distrust of black boxes. Therefore there is a need to develop computational tools and algorithms that bridge two worlds: the canonical structure of control theory, which is important for providing guarantees in safety-critical applications, and the data-driven abstraction and representational power of machine learning, which is necessary for adapting the system to achieve resiliency against unmodeled disturbances.

Towards this end, we propose a novel, lightweight framework for Bayesian adaptive control for safety-critical systems, which we call BALSA (BAyesian Learning-based Safety and Adaptation). This framework leverages ML algorithms for learning uncertainty representations of dynamics, which in turn are used to generate sufficient conditions for stability using stochastic Control Lyapunov Functions (CLFs) and for safety using stochastic Control Barrier Functions (CBFs). Treating the problem within a stochastic framework allows for a cleaner and more optimal approach to handling modeling uncertainty, in contrast to deterministic, discrete-time, or robust control formulations.

We apply our framework to the problem of high-speed agile autonomous vehicles, a domain where learning is especially important for dynamics which are complex and difficult to model (e.g., fast autonomous driving over rough terrain). Potential Mars Sample Return (MSR) missions are one example in this domain. Current Mars rovers (i.e., Opportunity and Curiosity) have driven on average 3 km/year [nasa_curiosity, nasa_opportunity]. In contrast, if MSR launches in 2028, then the rover has only 99 sols (102 days) to complete potentially 10 km [2014_klein, 2019_nelessen]. After factoring in the intermittent and heavily delayed communication with Earth, the need for adaptive, high-speed autonomous mobility could be crucial to mission success.

Along with the requirements for safety and adaptation, computational efficiency is of paramount importance for real systems. Hardware platforms often have severe power and weight requirements, which significantly reduce their computational power. Probabilistic learning and control over deep Bayesian models is a computationally intensive problem. In contrast, we shorten the planning horizon and rely on a high-level, lower fidelity planner to plan desired trajectories. Our method then guarantees safe trajectory tracking behavior, even if the given trajectory is not safe. This frees up the computational budget for other tasks, such as online model training and inference.

Related work - Machine-learning based planning and control is a quickly growing field. From Model Predictive Control (MPC) based learning [wagner, williams2018information], safety in reinforcement learning [berkenkamp2017safe], and belief-space learning and planning [kim2019bi], to imitation learning [ross2011reduction], these approaches all demand considerations of safety under learning [Ostafew2016a, Pereida2018, Hewing2017, Shi2018]. Closely related to our work is Gaussian Process-based Bayesian Model Reference Adaptive Control (GP-MRAC) [Chowdhary2015], where modeling error is approximated with a Gaussian Process (GP). However, the computational cost of GPs scales poorly with the amount of data ($O(N^3)$), and sparse approximations lack representational power. Another closely related work is that of [Nguyen], who showed how to formulate a robust CLF which is tolerant to bounded model error. Extensions to robust CBFs were given in [nguyen2016optimal]. A stated drawback of this approach is the conservative nature of the bounds on the model error. In contrast, we incorporate model learning into our formulation, which allows for more optimal behavior, and leverage stochastic CLF and CBF theory to guarantee safety and stability with probability 1. Other related works include [Cheng2019], which uses GPs in CBFs to learn the drift term in the dynamics, but uses a discrete-time, deterministic formulation. [Nguyen2015] combined L1 adaptive control and CLFs. Learning in CLFs has been considered in several works, e.g., [Taylor2019, gurriet2018towards].

Contributions - Here we take a unique approach to address the aforementioned issues, with the requirements of 1) adaptation to changes in the environment and the system, 2) adaptation which can take into account high-dimensional data, 3) guaranteed safety during adaptation, 4) guaranteed stability during adaptation and convergence of tracking errors, and 5) low computational cost and high control rates. Our contributions are as follows. First, we introduce a Bayesian adaptive control framework which explicitly uses the model uncertainty to guarantee stability, and is agnostic to the type of Bayesian model learning used. Second, we extend recent stochastic safety theory to systems with switched dynamics to guarantee safety with probability 1. In contrast to adaptive control, switching dynamics are used to account for model updates which may only occur intermittently. Third, we combine these approaches in a novel online-learning framework (BALSA). Fourth, we compare the performance of our framework using different Bayesian model learning and uncertainty quantification methods. Finally, we apply this framework to a high-speed driving task on rough terrain using an Ackermann-steering vehicle and validate our method in both simulation and hardware experiments.

## II Safety and Stability under Model Learning via Stochastic CLF/CBFs

Consider a system with dynamics:

(1) |

where , , and the controls are . For simplicity we restrict our analysis to systems of this form, but emphasize that our results are extensible to systems of higher relative degree [Nguyen2016], as well as hybrid systems with periodic orbits [ames2014rapidly]. A wide range of nonlinear control-affine systems in robotics can be transformed into this form. In general, on a real system, and may not be fully known. We assume is known and invertible, which makes the analysis more tractable. It will be interesting in future work to extend our approach to unknown, non-invertible control gains, or non-control affine systems (e.g. ). Let be a given approximate model of . We formulate a pre-control law with pseudo-control :

(2) |

which leads to the system dynamics being

(3) |

where is the modeling error, with .
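As an illustration of how the pre-control law isolates the modeling error, the following Python sketch works through a scalar example. It assumes the common control-affine form in which the second-order dynamics are `f(x) + g(x) u`, with a known nonzero gain `g` and an approximate drift model `f_hat`, so that the pre-control law is `u = (mu - f_hat) / g`. The functions `f_true`, `f_hat`, and `g` are hypothetical and chosen only for demonstration.

```python
import math

def f_true(x1, x2):
    # Hypothetical "true" drift, unknown to the controller.
    return -0.5 * x2 + math.sin(x1)

def f_hat(x1, x2):
    # Hypothetical approximate drift model available to the controller.
    return -0.3 * x2

def g(x1, x2):
    # Hypothetical known control gain; always nonzero, hence invertible.
    return 2.0 + math.cos(x1) ** 2

def pre_control(mu, x1, x2):
    """Map the pseudo-control mu to the actual control u = (mu - f_hat) / g."""
    return (mu - f_hat(x1, x2)) / g(x1, x2)

def closed_loop_accel(mu, x1, x2):
    """Acceleration f + g * u that results when u comes from pre_control."""
    u = pre_control(mu, x1, x2)
    return f_true(x1, x2) + g(x1, x2) * u

x1, x2, mu = 0.4, 1.2, 0.7
model_error = f_true(x1, x2) - f_hat(x1, x2)
# The closed loop should reduce to "pseudo-control plus modeling error".
residual = closed_loop_accel(mu, x1, x2) - (mu + model_error)
```

Under these assumptions `residual` is zero up to floating-point rounding, which is the scalar analogue of the statement that the pre-control law leaves only the pseudo-control and the modeling error in the dynamics.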

Suppose we are given a reference model and reference control from, for example, a path planner:

The utility of the methods outlined in this work is for adaptive tracking of this given trajectory with guaranteed safety and stability. We assume that is continuously differentiable in , . Further, is bounded and piecewise continuous, and that is bounded for a bounded . Define the error . We split the pseudo-control input into four separate terms:

(4) |

where we assign and to a PD controller:

(5) |

Additionally, we assign as a pseudo-control which we optimize for and as an adaptive element which will cancel out the model error. Then we can write the dynamics of the model error as:

(6) |

where the matrices and are used for ease of notation. The gains should be chosen such that is Hurwitz. When , the modeling uncertainty is canceled out and the system can be controlled without error.
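The Hurwitz requirement on the gains can be checked numerically. The sketch below assumes the usual double-integrator error form, in which the state matrix is `A = [[0, I], [-Kp, -Kd]]` and the input matrix is `G = [0; I]`. This block layout, the helper names, and the gain values are illustrative assumptions, not the values used in the experiments.

```python
import numpy as np

def error_matrices(Kp, Kd):
    """Build the assumed block matrices A and G of the tracking-error dynamics."""
    n = Kp.shape[0]
    A = np.block([[np.zeros((n, n)), np.eye(n)],
                  [-Kp, -Kd]])
    G = np.vstack([np.zeros((n, n)), np.eye(n)])
    return A, G

def is_hurwitz(A):
    """A matrix is Hurwitz when every eigenvalue has a strictly negative real part."""
    return bool(np.all(np.linalg.eigvals(A).real < 0.0))

# Hypothetical PD gains for a system with two position coordinates.
Kp = np.diag([2.0, 2.0])
Kd = np.diag([3.0, 3.0])
A, G = error_matrices(Kp, Kd)
gains_ok = is_hurwitz(A)

# A negative derivative gain destabilizes the error dynamics.
A_bad, _ = error_matrices(Kp, -Kd)
bad_gains_ok = is_hurwitz(A_bad)
```

With these gains each axis has the characteristic polynomial `s^2 + 3 s + 2`, whose roots are -1 and -2, so the check passes. Flipping the sign of the derivative gain moves the roots to the right half-plane and the check fails.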

We construct an approximation of the modeling error as a Bayesian regression problem. In discrete time, the learning problem is formulated as finding a mapping from input data to output data . Given the dataset with , we can construct the model . Note that we do not require updating the model at each timestep, which significantly reduces computational load requirements and allows for training more expressive models (e.g., neural networks).
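The idea of buffering data and retraining only intermittently can be sketched with a deliberately simple model. The example below uses scalar Bayesian linear regression with a Gaussian prior and a known noise precision as a stand-in for the more expressive models discussed in this paper. The class name, the retraining interval, and all numerical values are assumptions made for illustration.

```python
import random

class IntermittentBayesianModel:
    """Scalar Bayesian linear regression y = w * x + noise, retrained in batches."""

    def __init__(self, retrain_every=40, prior_precision=1.0, noise_precision=25.0):
        self.retrain_every = retrain_every
        self.prior_precision = prior_precision  # precision of the N(0, 1/alpha) prior on w
        self.noise_precision = noise_precision  # assumed known observation-noise precision
        self.inputs, self.targets = [], []
        self.mean_w = 0.0
        self.precision_w = prior_precision
        self.num_retrains = 0

    def add_sample(self, x, y):
        """Store a data point; refit only when a full batch has accumulated."""
        self.inputs.append(x)
        self.targets.append(y)
        if len(self.inputs) % self.retrain_every == 0:
            self.retrain()

    def retrain(self):
        """Closed-form Gaussian posterior over the weight w."""
        sxx = sum(x * x for x in self.inputs)
        sxy = sum(x * y for x, y in zip(self.inputs, self.targets))
        self.precision_w = self.prior_precision + self.noise_precision * sxx
        self.mean_w = self.noise_precision * sxy / self.precision_w
        self.num_retrains += 1

    def predict(self, x):
        """Return the predictive mean and variance at input x."""
        mean = self.mean_w * x
        variance = 1.0 / self.noise_precision + x * x / self.precision_w
        return mean, variance

rng = random.Random(0)
model = IntermittentBayesianModel()
_, prior_var = model.predict(1.0)
for _ in range(100):
    x = rng.uniform(-2.0, 2.0)
    y = 0.8 * x + rng.gauss(0.0, 0.2)  # hypothetical modeling error plus noise
    model.add_sample(x, y)
post_mean, post_var = model.predict(1.0)
```

Between retraining events the controller keeps querying `predict` on the most recent posterior, so the per-timestep cost is a single model evaluation. The predictive variance shrinks after retraining, which is the quantity an uncertainty-aware controller would consume.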

Our approximation will not be exact with limited data. We assume the use of a function approximation model which quantifies the uncertainty on its predictions, where the output of the model is a multivariate Gaussian random variable. This model should "know what it doesn’t know" [li2011knows], and should capture both the epistemic uncertainty of the model, i.e., the uncertainty from lack of data, as well as the aleatoric uncertainty, i.e., the uncertainty inherent in the system [Roy2011]. Methods for producing reliable confidence bounds include a large class of Bayesian neural networks ([Hafner2018, Harrison2018, gal2016dropout]), Gaussian Processes or their many approximate variants ([shahriari2015taking, pan2017prediction]), and many others. We anticipate that as this field matures the reliability and accuracy of these methods will continue to improve.

By setting , the error dynamics can be written as the following switching stochastic differential equation (SDE):

(7) |

with and where and is a zero-mean Wiener process. is a switching index which updates each time the model is updated. The main problem which we address is how to find a pseudo-control which provably drives the tracking error to while simultaneously guaranteeing safety.

Since is not known a priori, one approach is to assume that is bounded by some known term. The size of this bound will depend on the type of model used to represent the uncertainty, its training method, and the distribution of the data . See [Chowdhary2015] for such an analysis for sparse online Gaussian Processes. For neural networks in general there has been some work on analyzing these bounds [yarotsky2017error, shi2019neural]. Without loss of generality, we let the modeling error be fully captured in , i.e., . Then we have the following dynamics:

(8) |

with . This is valid as long as captures both the epistemic and aleatoric uncertainty accurately and as long as that uncertainty is distributed as a multivariate Gaussian. This is a special case of a more general infinite-dimensional SDE formulation where is replaced with , an infinite-dimensional random variable which captures the uncertainty of learning. For a more thorough discussion see [hennig2011optimal, pan2015sample]. Note also that if the bounds on are known, then our results are easily extensible to this case via (7).
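To build intuition for error dynamics driven by Gaussian model uncertainty, the following sketch integrates a single-axis version with the Euler-Maruyama scheme. It assumes a drift of the form `A e + G mu_qp` with PD gains `kp` and `kd`, and a constant diffusion magnitude `sigma` standing in for the predictive standard deviation of the learned model. In the framework itself this magnitude would change whenever the model is updated; the constant value and all numbers here are illustrative assumptions.

```python
import math
import random

def simulate_error_sde(e0, kp, kd, sigma, mu_qp=0.0, dt=0.01, steps=1000, seed=0):
    """Euler-Maruyama integration of a single-axis tracking-error SDE.

    State e = (e1, e2): position and velocity error.
    Noise enters only through the velocity-error channel.
    """
    rng = random.Random(seed)
    e1, e2 = e0
    for _ in range(steps):
        dw = rng.gauss(0.0, math.sqrt(dt))  # Wiener increment ~ N(0, dt)
        drift1 = e2
        drift2 = -kp * e1 - kd * e2 + mu_qp
        e1, e2 = e1 + drift1 * dt, e2 + drift2 * dt + sigma * dw
    return e1, e2

# Without noise the error decays; with noise it stays in a small neighborhood of 0.
quiet = simulate_error_sde((1.0, 0.0), kp=2.0, kd=3.0, sigma=0.0)
noisy = simulate_error_sde((1.0, 0.0), kp=2.0, kd=3.0, sigma=0.1)
```

The noise-free run converges towards zero, while the noisy run fluctuates around zero with a spread governed by `sigma` and the gains. This mirrors the distinction between convergence and ultimate boundedness in the mean square sense.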

### II-A Stochastic Control Lyapunov Functions for Switched Systems

We establish sufficient conditions on to guarantee convergence of the error process to 0. The result is a linear constraint similar to deterministic CLFs (e.g., [nguyen2016optimal]). The difference here is the construction of a stochastic CLF condition for switched systems. The switching is needed to account for online updates to the model as more data is accumulated.

In general, consider a switched SDE of Itô type [khasminskii2011stochastic] defined by:

(9) |

where , is a Wiener process, is a -vector function, a matrix, and is a switching index. The switching index may change a finite number of times in any finite time interval. For each switching index, and must satisfy the Lipschitz condition with compact. Then the solution of (9) is a continuous Markov process.

###### Definition II.1.

is said to be *exponentially mean square ultimately bounded uniformly in* i if there exist positive constants such that for all , we have that .

We first restate the following theorem from [Chowdhary2015]:

###### Theorem II.1.

Let be the process defined by the solution to (9), and let be a function of class with respect to , and class with respect to . Denote the Itô differential generator by . If 1) for real ; and 2) for real , and all i; then the process is exponentially mean square ultimately bounded uniformly in . Moreover, , , and .

###### Proof.

See [Chowdhary2015] Theorem 1. ∎

We use Theorem II.1 to derive a stochastic CLF sufficient condition on for the tracking error . Consider the stochastic Lyapunov candidate function where is the solution to the Lyapunov equation , where is any symmetric positive-definite matrix.
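The matrix in the quadratic Lyapunov candidate can be computed numerically. The sketch below assumes the standard continuous-time Lyapunov equation `A^T P + P A = -Q` and solves it by vectorization with Kronecker products. The equation form, the helper name `solve_lyapunov`, and the example gains are assumptions for illustration.

```python
import numpy as np

def solve_lyapunov(A, Q):
    """Solve A^T P + P A = -Q for P using column-major vectorization.

    With vec() stacking columns, vec(A^T P) = (I kron A^T) vec(P)
    and vec(P A) = (A^T kron I) vec(P).
    """
    n = A.shape[0]
    identity = np.eye(n)
    M = np.kron(identity, A.T) + np.kron(A.T, identity)
    p = np.linalg.solve(M, -Q.flatten(order="F"))
    P = p.reshape((n, n), order="F")
    return 0.5 * (P + P.T)  # remove rounding asymmetry

# Hypothetical single-axis error dynamics with PD gains kp = 2 and kd = 3.
A = np.array([[0.0, 1.0],
              [-2.0, -3.0]])
Q = np.eye(2)
P = solve_lyapunov(A, Q)

residual = A.T @ P + P @ A + Q            # should be numerically zero
P_eigs = np.linalg.eigvalsh(P)            # should all be positive
condition_number = np.linalg.cond(P)
```

For this example the solution is `P = [[1.25, 0.25], [0.25, 0.25]]`, which is symmetric positive definite because `A` is Hurwitz and `Q` is positive definite. If SciPy is available, `scipy.linalg.solve_continuous_lyapunov(A.T, -Q)` should return the same matrix.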

###### Theorem II.2.

###### Proof.

The Lyapunov candidate function is bounded above and below by . We have the following Itô differential of the Lyapunov candidate:

(11) |

Rearranging, (10) becomes . Setting , we see that the conditions for Theorem II.1 are satisfied and is exponentially mean square ultimately bounded uniformly in . Moreover,

(12) |

where is the condition number of the matrix . Therefore if for all , converges to 0 exponentially in the mean square sense. ∎

The relaxation variable allows us to find solutions for which may not always strictly satisfy a Lyapunov stability criterion . This allows us to incorporate additional constraints on at the cost of losing convergence of the error to 0. Fortunately, the error will remain bounded by the largest . In practice we re-optimize for a new at each timestep. This does not affect the result of Theorem II.2 as long as we re-optimize a finite number of times for any given finite interval.

One highly relevant set of constraints we want to satisfy are control constraints , where is a matrix and is a vector. Let . Recall the pre-control law (2). Then the control constraint is:

(13) |

Next we formulate additional constraints to guarantee safety.

### II-B Stochastic Control Barrier Functions for Switched Systems

We leverage recent results on stochastic control barrier functions [clark2019] to derive constraints linear in which guarantee the process satisfies a safety constraint, i.e., for all . The set is defined by a locally Lipschitz function as and . We first extend the results of [clark2019] to switched stochastic systems.

###### Definition II.2.

Let be a switched stochastic process defined by (9). Let the function be locally Lipschitz and twice-differentiable on . If there exist class-K functions and such that for all , , then is called a *candidate control barrier function*.

###### Definition II.3.

Let be a candidate control barrier function. If there exists a class-K function such that , then is called a *control barrier function (CBF)*.

###### Theorem II.3.

Suppose there exists a CBF for the switched stochastic process defined by (9). If , then for all , .

###### Proof.

[clark2019] Theorem 1 provides a proof of the result for non-switched stochastic processes. Let denote the switching times of , i.e., when , the process has diffusion matrix , and when for , the process has diffusion matrix . If , then for all with probability 1 since the process does not switch in the time interval . By a similar argument, for any if then for all with probability 1. This also implies that , since is a continuous Markov process. Then for all with probability 1. Then by induction, for all , . ∎

Next, we establish a linear constraint condition sufficient for to guarantee safety for (8). Rewrite (8) in terms of as:

(14) | |||

###### Theorem II.4.

### II-C Safety and Stability under Model Adaptation

We can now construct a CLF-CBF Quadratic Program (QP) in terms of incorporating both the adaptive stochastic CLF and CBF conditions, along with control limits (Equation (17)):

(17) | ||||

(Adaptive CLF) | ||||

(Adaptive CBF) | ||||

In practice, several modifications to this QP are often made ([Nguyen2016],[ames]). In addition to a relaxation term for the CLF in Theorem II.2, we also include a relaxation term for the CBF. This helps to ensure the QP is feasible and allows for slowing down as much as possible when the safety constraint cannot be avoided due to control constraints, creating, e.g., lower impact collisions. Safety is still guaranteed as long as the relaxation term is less than 0. For an example of guaranteed safety in the presence of this relaxation term see [nguyen2016optimal], also see [gurriet2018towards] for an approach to handling safety with control constraints. The emphasis of this work is on guaranteeing safety in the presence of adaptation so we leave these considerations for future work. Our entire framework is outlined in Algorithm 1.
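The interplay between a relaxed stability constraint and a hard safety constraint can be illustrated with a scalar pseudo-control, for which the QP has a closed-form solution. The sketch below assumes a generic problem of the form: minimize `mu^2 + penalty * d^2` subject to `a_clf * mu <= b_clf + d` (relaxed CLF) and `a_cbf * mu <= b_cbf` (hard CBF). This generic form, the function name, and the numbers are illustrative assumptions. The coefficients would in practice come from the adaptive CLF and CBF conditions, and the full multi-dimensional problem is handed to a QP solver.

```python
def solve_scalar_clf_cbf_qp(a_clf, b_clf, a_cbf, b_cbf, penalty=100.0):
    """Closed-form solution of a one-dimensional relaxed CLF / hard CBF QP."""
    # Step 1: minimize mu^2 + penalty * max(0, a_clf * mu - b_clf)^2, ignoring the CBF.
    if b_clf >= 0.0:
        mu = 0.0  # zero input already satisfies the CLF constraint
    else:
        mu = penalty * a_clf * b_clf / (1.0 + penalty * a_clf ** 2)
    # Step 2: the objective is convex in mu, so clipping to the CBF half-line is optimal.
    if a_cbf * mu > b_cbf:
        if a_cbf == 0.0:
            raise ValueError("CBF constraint is infeasible")
        mu = b_cbf / a_cbf
    # Step 3: the smallest relaxation consistent with the chosen mu.
    d = max(0.0, a_clf * mu - b_clf)
    return mu, d

# Case 1: the safety constraint is inactive, so stability is pursued almost exactly.
mu_free, d_free = solve_scalar_clf_cbf_qp(1.0, -1.0, 1.0, 10.0)

# Case 2: safety requires mu >= 0.5 while stability would prefer mu near -1.
mu_safe, d_safe = solve_scalar_clf_cbf_qp(1.0, -1.0, -1.0, -0.5)
```

In the second case the solution sits on the safety boundary and the relaxation grows to absorb the conflict. This is the intended priority: safety is enforced strictly, while convergence of the tracking error is traded away only as much as needed.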

## III Application to Fast Autonomous Driving

In this section we validate BALSA on a kinematic bicycle model for car-like vehicles. We model the state as position in x and y, heading, and velocity respectively, with dynamics . where is the input acceleration, is the vehicle length, and is the steering angle. We employ a simple transformation to obtain dynamics in the form of (1). Let where , , , , and . Let the controls . Then fits the canonical form of (1). To ascertain the importance of learning and adaptation, we add the following disturbance to to use as a “true” model:

(18) |

This constitutes a non-linearity in the forward velocity and a tendency to drift to the right.
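For readers who want to experiment with the nominal vehicle model, the following sketch implements one Euler step of the standard kinematic bicycle model with state (x, y, heading, velocity) and inputs (acceleration, steering angle). The textbook form used here, the step size, and the default wheelbase (borrowed from the hardware platform described later) are assumptions for illustration. The disturbance of the "true" model is not included.

```python
import math

def bicycle_step(state, accel, steer, wheelbase=0.48, dt=0.01):
    """One explicit Euler step of the standard kinematic bicycle model."""
    x, y, theta, v = state
    x_dot = v * math.cos(theta)
    y_dot = v * math.sin(theta)
    theta_dot = v * math.tan(steer) / wheelbase
    v_dot = accel
    return (x + x_dot * dt, y + y_dot * dt, theta + theta_dot * dt, v + v_dot * dt)

def rollout(state, accel, steer, steps, **kwargs):
    """Apply constant inputs for a number of steps and return the final state."""
    for _ in range(steps):
        state = bicycle_step(state, accel, steer, **kwargs)
    return state

start = (0.0, 0.0, 0.0, 2.0)  # at the origin, heading along +x, moving at 2 m/s
straight = rollout(start, accel=0.0, steer=0.0, steps=100)
turning = rollout(start, accel=0.0, steer=0.1, steps=100)
```

Driving straight for one second at 2 m/s advances the vehicle 2 m along x, while a small positive steering angle increases the heading and moves the vehicle towards positive y.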

We use the following barrier function for pointcloud-based obstacles. Similar to [nguyen2016optimal], we design this barrier function with an extra component to account for position-based constraints which have a relative degree greater than 1. Let our safety set , where is the position of an obstacle. Let where is the radius of a circle around the obstacle. Then construct a barrier function . As shown by [Nguyen2016], is a CBF, where helps to control the rate of convergence. We chose and .
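A barrier of this kind can be evaluated directly from the vehicle state and a list of obstacle points. The sketch below assumes a barrier of the form `h = gamma * d + d_dot`, where `d` is the distance to the obstacle circle and `d_dot` is its time derivative along the current velocity. This specific form, the radius, and the value of `gamma` are assumptions for illustration and are not the parameters chosen in the experiments.

```python
import math

def barrier_value(pos, vel, obstacle, radius, gamma):
    """Evaluate h = gamma * d + d_dot for a single circular obstacle."""
    dx, dy = pos[0] - obstacle[0], pos[1] - obstacle[1]
    dist = math.hypot(dx, dy)
    d = dist - radius                           # signed distance to the circle
    d_dot = (dx * vel[0] + dy * vel[1]) / dist  # rate of change of that distance
    return gamma * d + d_dot

def barrier_values(pos, heading, speed, obstacles, radius=1.0, gamma=2.0):
    """One barrier value per obstacle point, e.g. one per LiDAR return."""
    vel = (speed * math.cos(heading), speed * math.sin(heading))
    return [barrier_value(pos, vel, ob, radius, gamma) for ob in obstacles]

obstacles = [(3.0, 0.0), (-3.0, 0.0)]  # one point ahead, one behind
h_slow = barrier_values((0.0, 0.0), 0.0, 2.0, obstacles)
h_fast = barrier_values((0.0, 0.0), 0.0, 5.0, obstacles)
```

At 2 m/s both barrier values are positive. At 5 m/s the value for the obstacle ahead becomes negative even though the vehicle is still outside the circle, which shows how the velocity term makes the barrier react before contact.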

### III-A Validation of BALSA in Simulation

One iteration of the algorithm for this problem takes less than on a 3.7GHz Intel Core i7-8700K CPU, in Python code which has not been optimized for speed. We make our code publicly available at https://github.com/ddfan/balsa.git. Because training the model occurs on a separate thread and can be performed anytime online, we do not include the model training time in this benchmark. We use OSQP [osqp] as our QP solver.

In Figure 2, we compare BALSA with several different baseline algorithms. We place several obstacles in the direct path of the reference trajectory. We also place velocity barriers for driving too fast or too slow. We observe that the vehicle using our algorithm maintains good tracking errors while avoiding barriers and maintaining safety, while the other approaches suffer from various drawbacks. The adaptive controller (ad) and PD controller (pd) violate safety constraints. The (qp) controller with an inaccurate model also violates constraints and exhibits highly suboptimal behavior (Figure 3). A robust (rob) formulation, which uses a fixed bound meant to cover any model uncertainty [nguyen2016optimal], does not violate safety constraints but is too conservative and non-adaptive, and has trouble tracking the reference trajectory. In contrast, BALSA adapts to model error with guaranteed safety. We also plot the model uncertainty and error (Figure 3). For this experiment we used a neural network trained with dropout and a negative-log-likelihood loss function [gal2016dropout].

### III-B Comparing Different Modeling Methods in Simulation

Next we compared the performance of BALSA on three different Bayesian modeling algorithms: Gaussian Processes, a neural network with dropout, and ALPaCA [Harrison2018], a meta-learning approach which uses a hybrid neural network with Bayesian regression on the last layer. For all methods we retrained the model intermittently, every 40 new datapoints. In addition to the current state, we also included as input to the model the previous control, angular velocity in yaw, and the current roll and pitch of the vehicle. For the GP we re-optimized hyperparameters with each training. For the dropout NN, we used 4 fully-connected layers with 256 hidden units each, and trained for 50 epochs with a batch size of 64. Lastly, for ALPaCA we used 2 hidden layers, each with 128 units, and 128 basis functions. We used a batch size of 150, 20 context data points, and 20 test data points. The model was trained using 100 gradient steps and online adaptation (during prediction) was performed using 20 of the most recent context data points with the current observation. Figure 4 and Table I show a comparison of tracking error for these methods. We found GPs to be computationally intractable with more than 500 data points, although they exhibited good performance. Neural networks with dropout converged quickly and were efficient to train and run. ALPaCA exhibited slightly slower convergence but good tracking as well.

| Time | No learn | GP | Dropout | ALPaCA |
|---|---|---|---|---|
| 0-60s | 0.580 | 0.3992 | 0.408 | 0.390 |
| 60-120s | 0.522 | 0.097 | 0.105 | 0.110 |

### III-C Hardware Experiments on Martian Terrain

To validate that BALSA meets real-time computational requirements, we conducted hardware experiments on the platform depicted in Figure 5. We used an off-the-shelf RC car (Traxxas Xmaxx) in 1/5th scale (wheelbase 0.48 m), equipped with sensors such as a 3D LiDAR (Velodyne VLP-16) for obstacle avoidance and a stereo camera (RealSense T265) for on-board state estimation. The power train consists of a single brushless DC motor, which drives the front and rear differentials, operating in current control mode for controlling acceleration. Steering commands were fed to a servo position controller. The on-board computer (Intel NUC i7) ran Ubuntu 18.04 and ROS [quigley2009ros].

Experiments were conducted in a Martian simulation environment, which contains sandy soil, gravel, rocks, and rough terrain. We gave figure-eight reference trajectories at 2 m/s and evaluated the vehicle’s tracking performance (Figure 5). Due to large achieving good tracking performance at higher speeds is difficult. We observed that BALSA is able to adapt to bumps and changes in friction, wheel slip, etc., exhibiting improved tracking performance over a non-adaptive baseline (Table II).

| | Mean Err | Std Dev | Max |
|---|---|---|---|
| No Learn | 1.417 | 0.568 | 6.003 |
| Learning | 0.799 | 0.387 | 2.310 |

Table II: Mean, standard deviation, and max tracking error on our rover platform for a figure-8 task.

We also evaluated the safety of BALSA under adaptation. We used LiDAR pointclouds to create barriers at each LiDAR return location. Although this creates a large number of constraints, the QP solver is able to handle these in real-time. In order to remain invariant to localization drift, at each timestep of BALSA’s optimization we used the most recent LiDAR pointcloud only. Figure 6 shows what happens when an obstacle is placed in the path of the reference trajectory. The vehicle successfully slows down and comes to a stop if needed, avoiding the obstacle altogether.

## IV Conclusion

In this work, we have described a framework for safe, fast, and computationally efficient probabilistic learning-based control. The proposed approach satisfies several important real-world requirements and takes steps towards enabling safe deployment of high-dimensional data-driven control and planning algorithms. Further development on autonomous fast-driving vehicles is the immediate next step, while applications to other types of robots, including drones, legged robots, and manipulators, will be straightforward. Incorporating better uncertainty-representing modeling methods and training on higher-dimensional data (vision, LiDAR, etc.) will also be a fruitful direction of research.

## Acknowledgement

The authors would like to thank Joel Burdick’s group for their hardware support. This research was partially carried out at the Jet Propulsion Laboratory (JPL), California Institute of Technology, and was sponsored by the JPL Year Round Internship Program and the National Aeronautics and Space Administration (NASA). Jennifer Nguyen was supported in part by NASA EPSCoR Research Cooperative Agreement WV-80NSSC17M0053 and NASA West Virginia Space Grant Consortium, Training Grant #NX15AI01H. Evangelos A. Theodorou was supported by the C-STAR Faculty Fellowship at Georgia Institute of Technology. Copyright ©2019. All rights reserved.