Deep control policies trained via reinforcement learning (RL) and imitation learning (IL) make it possible to generate control outputs directly from sensor inputs. However, in contrast to simulations and games, applying such techniques to real-world safety-critical applications remains a very challenging task. Their strong reliance on deep neural networks makes these policies vulnerable to overconfident or unpredictable results when presented with data distributions unseen during training. Additionally, in real-life scenarios there may be environmental factors (e.g. friction, wind, viscosity) and uncertainties due to machine perception that were not explicitly modeled in the formulation. Robustness of predictions and actions to such exogenous variations is paramount in safety-critical real-world robotics.
Much of the previous work on safety in deep control policies has focused on modifying the training phase. This includes reward engineering [long2018towards], constrained optimization to incorporate safety constraints [bouton2019safe] and worst-case optimization [tang2019worst]. Barring simple safety constraints, most of the prior work does not provide safety guarantees that hold at the deployment phase (test time). This is because it is difficult to characterize or enumerate the complete state space in which the agent is required to operate. For example, it is impossible to characterize a priori all images a robot would see. Furthermore, the mathematical structure of deep policies makes them difficult to analyze.
In this work we explore a runtime alternative that aims to keep the system safe by providing minimal deviations of control signals stemming from an embedded deep control policy. The framework ensures that the structure of safety is always preserved via a barrier function, while the agent continues to make progress towards the task it was trained for. In particular, the work extends Safety Barrier Certificates (SBC) ([Wang_2017]) to handle safety considerations in deep control policies. We specifically focus on the problem of autonomous drone-racing, where a quadrotor needs to negotiate several gates without collisions while moving as fast as possible on a racing track.
There are several technical challenges that we encounter and address. First, deep control policies are often used when no explicit model of the system dynamics is available. In the absence of such a dynamics model it is non-trivial to use SBCs. Second, the safety constraints for applications such as drone-racing can be complex. For example, in our application the quadrotors are not allowed to collide with the gates or other objects in the environment. Representations such as Euclidean signed distance fields (ESDF) are popular and useful for formally defining the safety conditions. However, it is unclear how SBCs can be applied to them. Finally, the safety framework also needs to account for any uncertainty and non-determinism that might arise due to environmental factors.
The core insight in this work is that for many real-world applications, the system dynamics are well approximated by an ordinary differential equation that is uniformly continuous, bounded and Lipschitz continuous. This allows us to make a locally linear approximation while ensuring that the approximation error is small. Similarly, we incorporate safety constraints defined over ESDFs via smooth approximations, and finally discuss an extension to probabilistic safety in order to address uncertainty and non-determinism. We implement our framework in simulation and show that our method results in guaranteeable safety and improved avoidance compared to the original deep control policies, even under perception uncertainty.
2 Related Work
Recent research has focused on learning control policies directly from raw data with deep neural networks (DNN), using either imitation or reinforcement learning ([pan2018agile], [lillicrap2015continuous]). Much of the work in safe deep control focuses on training time and aims to induce risk-aversion via the reward function or through constrained optimization (garcia2012safe; achiam2017constrained; long2018towards). However, none of these approaches guarantee safety during the test or deployment phase. Formal verification and certification have also been proposed to address the application of deep neural networks in safety-critical applications. For example, (katz2017reluplex; liu2019algorithms) focus on verification procedures for DNNs through analysis of activation functions and layers. Similarly, dutta2018learning perform verification for a feedback control network using a receding horizon formulation that attempts to enforce properties such as reachability, safety and stability. fisac2019bridging discuss control-theoretic modifications to reinforcement learning for safety analysis. The notion of probabilistic safety under uncertainty has also been explored previously via formal methods (Sadigh-RSS-16). Much of this work results in computationally intensive procedures that cannot be easily used in real-time systems.
Safety Barrier Certificates (SBC) with permissive control barrier functions (CBF) have previously been used to guarantee runtime safety in both deterministic and non-deterministic settings (ames2016control; Ames_2019; Wang_2017). The key idea is to first define a barrier function by considering a set of unsafe states and the system dynamics, and then use it to minimally modify a given controller so that the resulting solution is safe. The framework can be extended to handle uncertainty in the environment to probabilistically guarantee safety (Wenhao2019). Recent work by bajcsy2019efficient also proposed a real-time safety framework on top of learning-based planners, based on Hamilton-Jacobi reachability. This paper builds upon this line of work, where the key idea centers on wrapping a deep control policy within the SBC framework. However, unlike prior applications of SBC, in our case there is no explicit system dynamics model available. Furthermore, in most of the cases where deep controllers are applied there is uncertainty that stems from the sensing and perception part of the system.
We demonstrate the framework on the task of autonomous drone-racing, where a quadrotor uses an RGB camera to perceive and negotiate multiple gates as fast as possible on a racing track. Existing approaches to this task infer a simple representation of the environment and then either use classical control and planning methods (kaufmann2019beauty; Li_2019) or build deep controllers (pmlr-v87-kaufmann18a; bonatti2019learning). In this paper we mostly explore deep policies and assume that at least one gate is always in the field of view of the quadrotor.
3 Proposed Framework
Assume we are given a policy that produces a control signal. The goal of the proposed framework is to provide a projection of this control signal such that the system is safe with respect to the safety constraints. In order to use the SBC framework we first need to (1) characterize the system dynamics, (2) characterize the safety constraints and (3) handle uncertainty. We describe these in detail below:
System Dynamic Model: Because an explicit system dynamics model is absent in most deep control scenarios, we need to make certain assumptions. Specifically, similar to herbert2017fastrack, we assume that the underlying unknown dynamics of the system are uniformly continuous, bounded and Lipschitz continuous. We further assume that the approximation error, uncertainty and non-determinism in the system can be explained via noise with finite support (Wenhao2019). This allows us to define a simplified dynamics model for a robot evolving as a continuous-time system:
The model consists of the system state, control-affine dynamics driven by the control input action, and a noisy observation of the state, together with process and measurement noise terms. We assume that the deviations arising from approximations, disturbances or noise are fully captured by these two noise terms.
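For orientation only, one common way to write a control-affine model with bounded additive noise is sketched below. Every symbol here (x, u, f, g, w, v and the two bounds) is an assumption introduced for illustration; it is not recovered from the original equation and may differ from the authors' notation.

```latex
% Illustrative sketch, assumed notation:
%   x: system state, u: control input, \hat{x}: noisy observation,
%   w: process noise, v: measurement noise, both with finite support.
\begin{aligned}
\dot{x}  &= f(x) + g(x)\,u + w, \\
\hat{x}  &= x + v, \\
\|w\| &\le \bar{w}, \qquad \|v\| \le \bar{v}.
\end{aligned}
```

Under this reading, the finite-support assumption is what the two bounds express: any modeling error is absorbed into w and any estimation error into v.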
Safety Constraints: Given our application domain of drone-racing, we propose safety constraints based on rich representations common in robotics. Our obstacle model is static and is represented through a distance transform inspired by the Euclidean signed distance field (ESDF). We define three regions for every obstacle with a given pose, each a subset of the 3D metric space: the inside of the obstacle, the outside of the obstacle, and its border. For any point in 3D space that is the position of the robot, we define a custom distance function to the obstacle as follows:
Under the assumption that a robot cannot physically be inside an obstacle in any state, the distance function defined in (2) has properties that are useful in defining the safety barrier. In particular, we make the following observation about the distance function, which is Lipschitz continuous, differentiable almost everywhere, and bounded under a finite support. We define a state to be safe with respect to an obstacle with a given pose if the following conditions hold:
The resulting safe set contains the states that are safe with respect to the obstacle, given a buffer safety radius. Naturally, the conditions also ensure that valid robot states lie only outside obstacles. For our application in drone-racing we consider square gates as obstacles (see Fig. 1). Additionally, considering the finite boundary of (2) and the fact that all our obstacles are the same, we precompute the signed distance field using (2) for all positions of the robot within a region of interest relative to the gate.
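The precomputation described above can be illustrated with a minimal sketch. It assumes a plain Euclidean distance to a square frame built from four axis-aligned bars, which stands in for the custom distance function in (2); the gate dimensions, grid bounds and all function names are hypothetical.

```python
import math

def box_distance(point, center, half_extent):
    """Euclidean distance from a point to an axis-aligned box (0 inside it)."""
    return math.sqrt(sum(
        max(abs(p - c) - h, 0.0) ** 2
        for p, c, h in zip(point, center, half_extent)
    ))

def make_square_gate(opening_half_width=1.0, bar_width=0.1, half_depth=0.05):
    """Four bars forming a square frame in the y-z plane, centred at the origin.
    All dimensions are illustrative assumptions, in metres."""
    r = opening_half_width + bar_width / 2.0   # gate centre to bar centreline
    span = opening_half_width + bar_width      # half-length of each bar
    half_bar = bar_width / 2.0
    return [
        ((0.0, r, 0.0), (half_depth, half_bar, span)),    # +y side bar
        ((0.0, -r, 0.0), (half_depth, half_bar, span)),   # -y side bar
        ((0.0, 0.0, r), (half_depth, span, half_bar)),    # top bar
        ((0.0, 0.0, -r), (half_depth, span, half_bar)),   # bottom bar
    ]

def gate_distance(point, gate):
    """Distance from a robot position to the nearest bar of the gate."""
    return min(box_distance(point, c, h) for c, h in gate)

def precompute_field(gate, lo=-2.0, hi=2.0, n=9):
    """Tabulate gate_distance on an n x n x n grid in the gate frame."""
    step = (hi - lo) / (n - 1)
    axis = [lo + i * step for i in range(n)]
    field = {
        (i, j, k): gate_distance((x, y, z), gate)
        for i, x in enumerate(axis)
        for j, y in enumerate(axis)
        for k, z in enumerate(axis)
    }
    return axis, field

def lookup(axis, field, point):
    """Nearest-grid-cell lookup of the precomputed distance."""
    idx = tuple(
        min(range(len(axis)), key=lambda i: abs(axis[i] - v)) for v in point
    )
    return field[idx]

gate = make_square_gate()
axis, field = precompute_field(gate)
centre_distance = lookup(axis, field, (0.1, 0.0, -0.1))
```

Because all gates are identical, a single table expressed in the gate frame can be reused for every gate; a real implementation would likely use a much finer grid and interpolation rather than nearest-cell lookup.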
Safety Under Uncertainty: In most deep control scenarios the state variable used to define and evaluate safety is latent. Consequently, we assume that a state estimation routine is available that provides the system state with a bounded error. In our work on drone-racing, we use a Variational Autoencoder (VAE) based module to estimate the pose of the gates. This estimate is noisy, but its error is assumed to have finite support. To address safety under the uncertainty arising from such state estimation, we perform a worst-case safety computation. Formally, we define a new distance function as follows:
Here the distance is taken to the set of points that is occupied when the obstacle is replicated at all possible poses within the error threshold of the predicted pose. We can represent this as a new obstacle that comprises all such positions. This new obstacle allows us to consider the worst-case scenarios under the uncertainty and can be handled using the same safety definition as in (4). A new distance map can also be precomputed. This simplifies the provision of safety under uncertainty, as the underlying machinery stays unchanged. Fig. 1 and Fig. 2 show the distance maps with and without considering uncertainty and the difference in the measurements of the obstacle. The safe set considering worst-case scenarios under uncertainty can now be defined by:
It is easy to show using Remark 5 that there exists an equivalent safety set that considers the original ESDF using the newly constructed obstacle:
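The worst-case construction can be illustrated with a minimal sketch under a simplifying assumption: the pose error is treated as a pure translation bounded by a known radius, so the inflated obstacle is the original one grown by that radius. Orientation error is ignored here, and the safety test "distance at least the buffer radius" is an assumed reading of the safety condition; all names are hypothetical.

```python
def worst_case_distance(nominal_distance, position_error_bound):
    """Distance to the obstacle inflated by a ball of the given radius.

    For a translation-only error bounded by position_error_bound, the
    distance to the inflated obstacle is the nominal distance minus the
    bound, clipped at zero.
    """
    return max(nominal_distance - position_error_bound, 0.0)

def is_safe(distance, safety_radius):
    """Assumed safety test: stay at least safety_radius away."""
    return distance >= safety_radius

nominal = 0.6          # metres, distance to the predicted gate pose
error_bound = 0.3      # metres, assumed bound on the pose estimate error
safety_radius = 0.5    # metres, assumed buffer radius

safe_without_uncertainty = is_safe(nominal, safety_radius)
safe_with_uncertainty = is_safe(
    worst_case_distance(nominal, error_bound), safety_radius
)
```

In this example the state passes the nominal check but fails the worst-case check, which is the situation in which accounting for uncertainty changes the control action.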
4 Probabilistic Safety Barrier Certificates
Barrier certificates, or barrier functions, are used to ensure that robots remain in safe sets for all time. Controllers are expected to satisfy the barrier certificates while taking control actions that are as close as possible to the nominal action. Under the assumption that at least one obstacle is in the field of view of the robot at any time, we simplify the notation and represent the safety set and constraints as a function of the next obstacle pose relative to the robot. The set is again defined by all states for which the center of the robot lies outside the obstacle. Thus the safety set in the new, simpler representation is defined similarly to (4):
Based on the theory of Zeroing Control Barrier Functions (ZCBF) and SBC, some conditions need to be imposed on the controller to guarantee forward invariance of the safety set. A continuously differentiable barrier function is a ZCBF, and the admissible control space can be defined as:
Any Lipschitz continuous controller in the admissible control space guarantees that the safety set is forward invariant. Given a particular choice of extended class-K function, and based on the admissible control space, the SBC that defines the constraints can be formulated as:
It is easy to show that this leads to linear constraints over the space of controllers (Wenhao2019). Furthermore, it is also possible to pre-compute a distance map over a fine grid of all possible poses, so the relevant constraints can be determined efficiently at runtime.
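To make the linearity concrete, the sketch below builds one constraint of the form a · u >= b from a distance function. It rests on several assumptions that are not taken from the source: the barrier is the distance minus a buffer radius, the local dynamics are approximated so that the action acts directly as the position derivative, the extended class-K function is a cubic with gain gamma, and the gradient is obtained by central finite differences. A spherical obstacle at the origin stands in for the gate.

```python
import math

def distance_to_point_obstacle(p):
    """Stand-in distance function: Euclidean distance to the origin."""
    return math.sqrt(sum(c * c for c in p))

def numerical_gradient(f, p, eps=1e-4):
    """Central finite-difference gradient of f at p."""
    grad = []
    for i in range(len(p)):
        hi = list(p)
        lo = list(p)
        hi[i] += eps
        lo[i] -= eps
        grad.append((f(hi) - f(lo)) / (2.0 * eps))
    return grad

def sbc_constraint(distance_fn, p, safety_radius, gamma=1.0):
    """Return (a, b) such that a safe action u satisfies dot(a, u) >= b.

    Assumed barrier: h(p) = distance_fn(p) - safety_radius.
    Assumed condition: grad h . u + gamma * h**3 >= 0.
    """
    h = distance_fn(p) - safety_radius
    a = numerical_gradient(distance_fn, p)
    b = -gamma * h ** 3
    return a, b

def satisfies(a, b, u):
    """Check the linear constraint for a candidate action."""
    return sum(ai * ui for ai, ui in zip(a, u)) >= b

a, b = sbc_constraint(distance_to_point_obstacle, (2.0, 0.0, 0.0), 1.0)
slow_approach_ok = satisfies(a, b, (-0.5, 0.0, 0.0))
fast_approach_ok = satisfies(a, b, (-2.0, 0.0, 0.0))
```

The constraint is linear in u because both a and b depend only on the current position. With a precomputed distance map, the barrier value and its gradient could be read from the table instead of being evaluated at runtime.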
First, to ensure that the function defined in (8) is continuously differentiable, a smooth approximation to (2) can be used (for example, using the softmax trick). In our experiments, we simply work with ESDFs, noting that the regions of non-differentiability arise only at places where the vehicle is guaranteed to be safe by a wide margin. Second, we transform the VAE's estimated gate position from spherical to Euclidean coordinates, where the quadrotor's yaw angle is equal to the predicted angle with respect to the obstacle. These steps help establish the continuity of the derivative of the barrier function. Thus we can formally rewrite the SBC as:
Further, under the assumption of locally linear dynamics, we can effectively equate the relevant term in equation (11) to the action. Finally, we formulate our safety problem as a Quadratic Program (QP) that minimally changes the action if needed, i.e. it modifies the original control action only if that action is found to violate the safety constraints. Formally, we solve the following program using the safety constraints defined in (8) and the SBC (11):
Here the program is constrained by the bounds on the control action, takes the original deep policy control as its nominal input, and returns the safe action. In practice, considering worst-case safety leads to a larger number of modifications to the original controller (for example see Fig. 1 and Fig. 2).
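A minimal sketch of such a minimally invasive projection follows. It assumes a single linear safety constraint dot(a, u) >= b and symmetric per-axis bounds on the action, and it uses Dykstra's alternating projection algorithm in place of a QP solver; the source does not say which solver was used, and all names and values are hypothetical.

```python
def dot(x, y):
    return sum(xi * yi for xi, yi in zip(x, y))

def project_halfspace(u, a, b):
    """Closest point to u in the half-space dot(a, u) >= b."""
    slack = b - dot(a, u)
    if slack <= 0.0:
        return list(u)
    scale = slack / dot(a, a)
    return [ui + scale * ai for ui, ai in zip(u, a)]

def project_box(u, u_max):
    """Closest point to u in the box |u_i| <= u_max."""
    return [min(max(ui, -u_max), u_max) for ui in u]

def safe_action(u_nominal, a, b, u_max, iterations=200):
    """Action closest to u_nominal satisfying both the safety constraint
    and the action bounds, computed with Dykstra's algorithm. Assumes the
    two sets intersect."""
    n = len(u_nominal)
    x = list(u_nominal)
    p = [0.0] * n   # correction term for the half-space projection
    q = [0.0] * n   # correction term for the box projection
    for _ in range(iterations):
        shifted = [xi + pi for xi, pi in zip(x, p)]
        y = project_halfspace(shifted, a, b)
        p = [si - yi for si, yi in zip(shifted, y)]
        shifted = [yi + qi for yi, qi in zip(y, q)]
        x = project_box(shifted, u_max)
        q = [si - xi for si, xi in zip(shifted, x)]
    return x

# Nominal action violates the safety constraint and is pulled back.
corrected = safe_action([-2.0, 0.0, 0.0], [1.0, 0.0, 0.0], -1.0, 3.0)
# Nominal action already safe and within bounds: returned unchanged.
unchanged = safe_action([0.5, 0.0, 0.0], [1.0, 0.0, 0.0], -1.0, 3.0)
# Both the safety constraint and the action bound are active.
both_active = safe_action([2.0, -1.0], [1.0, 1.0], 1.0, 1.5)
```

Plain alternating projection would return a feasible point but not necessarily the closest one; the correction terms in Dykstra's method are what make the result the true projection. With several gates or richer constraints, a dedicated QP solver would be the more natural choice.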
Fig. 3: (a) Performance on safety. (b) Performance on task success.
5 Experiments and Results
We performed experiments to verify the robustness of the method and understand its limitations via a drone-racing simulation built on top of AirSim (Shah_2017). Each experiment comprised a quadrotor navigating through a set of ten racing tracks for three loops, where each track was around 50m in length and had 8 gates positioned randomly. Each experiment was associated with one of four difficulty levels (ranging from 0 to 1.5 with a step size of 0.5), defined by the maximum offset between the centers of two consecutive gates, where a larger offset requires more maneuvering to stay on track.
We use two key metrics for evaluation: safety and the ability to solve the given task successfully. A trial consists of maneuvering through three consecutive laps of a track, and it is considered safe when the quadrotor stays collision-free over the entire trial. The percentage of gates negotiated safely during a trial is the measure of success on the task. We wish to explore whether the proposed framework allows us to be safe while still being competitive as defined by the success criterion.
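The two metrics can be computed as in the minimal sketch below. The trial record layout (a collision flag plus gate counts) and the sample values are hypothetical; only the definitions follow the text above.

```python
def safety_rate(trials):
    """Percentage of trials completed without any collision."""
    safe = sum(1 for t in trials if not t["collided"])
    return 100.0 * safe / len(trials)

def gate_success(trial):
    """Percentage of gates negotiated safely within one trial."""
    return 100.0 * trial["gates_passed"] / trial["gates_total"]

# Hypothetical records: 8 gates per track, 3 laps, so 24 gates per trial.
trials = [
    {"collided": False, "gates_passed": 24, "gates_total": 24},
    {"collided": False, "gates_passed": 24, "gates_total": 24},
    {"collided": True, "gates_passed": 18, "gates_total": 24},
    {"collided": False, "gates_passed": 24, "gates_total": 24},
]

overall_safety = safety_rate(trials)
third_trial_success = gate_success(trials[2])
```

Safety is a per-trial binary outcome while success is graded per gate, so a trial that ends in a collision still contributes partial credit to the success metric.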
For the perception module and baseline control policies, we use the networks from bonatti2019learning: a variational auto-encoder (VAE-constrained) that predicts next gate poses, and constrained and unconstrained Behavior Cloning (BC) policies, which are the best-performing control networks in that work. We compare both deep control policies against our proposed safety framework with and without uncertainty. Our uncertainty estimation uses errors in gate pose estimation computed empirically by bonatti2019learning.
Fig. 3 shows the performance of both baselines when augmented with our optimization method, in terms of the safety and success metrics. In our experiments, the success rate is similar for all methods, with a slight advantage for the safety method that considers uncertainty. As the track difficulty increases to 1.5, we observe that the safety performance of the original policies deteriorates drastically, while that of the safety policies decreases more slowly. The best safety rates were achieved when considering uncertainty.
Fig. 4: (a) Trajectories on four track difficulty levels. (b) Minimum distance, averaged over track difficulty level, over the 10 experiments for each difficulty level. The horizontal line indicates the median, while the boundaries of the boxes denote the 25% and 75% quantile levels respectively. The whiskers correspond to the most extreme data points not considered outliers, and the outliers are plotted individually as 'o'. For difficult tracks the proposed framework leads to a better separation between the gates and the quadrotor.
We also evaluated the experiments using the distance metric defined in (2). For every trial we recorded the minimum distance between the quadrotor and the next gate, which indicates how close the quadrotor came to a possible collision. If the trial ended in a collision, the score is zero. Fig. 4b shows the minimum distance values observed, averaged for each difficulty level. The results show that our proposed safety method considering uncertainty achieves the best performance overall. Visualizations of safe control commands and trajectories, applied to the BC unconstrained network, are shown in Fig. 4a and Fig. 5. In Fig. 4a, we show the differences between the original policy trajectories and the safety controls with and without considering uncertainty. For the first three track difficulty levels the trajectories are almost the same. When the difficulty level increases to 1.5, the original policy leads to a collision with gate 2 and the safety method leads to a collision with gate 4, whereas the trajectory that considers uncertainty is safe but leads off the track. A detailed control visualization is shown in Fig. 5, where the actions of the original policy violate the safety constraints and would lead to a possible collision with a gate, whereas the safety method with uncertainty computes a collision-free action.
We observed a few limitations of the proposed method in our experiments. For example, when the angle between the current gate and the next gate was too sharp but still within the field of view of the quadrotor, all methods occasionally caused a collision with the current gate. Another issue arose when the quadrotor started a trial facing a gate's pole at close proximity. In this situation, the quadrotor collided with the gate most of the time for all methods, possibly because of significant noise in the estimated gate position, with estimation errors exceeding the worst-case values considered. One way to overcome such an issue is to optimize over the next two gates instead of only one.
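The minimum-distance score and its per-level average can be sketched as follows. The record layout and the sample distances are hypothetical and are not results from the experiments; only the scoring rule (minimum over the trial, zero on collision) follows the text.

```python
def min_distance_score(distances, collided):
    """Minimum quadrotor-to-gate distance over a trial; zero on collision."""
    return 0.0 if collided else min(distances)

def average_by_level(records):
    """Mean score per difficulty level from (level, score) pairs."""
    totals = {}
    for level, score in records:
        total, count = totals.get(level, (0.0, 0))
        totals[level] = (total + score, count + 1)
    return {level: total / count for level, (total, count) in totals.items()}

# Hypothetical per-trial distance logs, in metres.
records = [
    (0.0, min_distance_score([0.9, 0.4, 0.7], collided=False)),
    (0.0, min_distance_score([0.8, 0.6, 1.1], collided=False)),
    (1.5, min_distance_score([0.5, 0.1], collided=True)),
    (1.5, min_distance_score([0.6, 0.2, 0.3], collided=False)),
]

averages = average_by_level(records)
```

Scoring a collision as zero keeps the metric monotone in safety: a trial that grazes a gate cannot score better than one that keeps its distance.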
6 Conclusions and Future Work
We have presented a framework for safe deep control policies for the task of drone-racing. At the heart of our method are safety barrier certificates, used to minimally change the controller to ensure forward invariance of the safety set. The main idea for overcoming uncertainty in obstacle position is to consider the worst case within the error threshold of the predicted obstacle pose and to build a pre-computed distance map through a Euclidean signed distance field. Our experiments show that the proposed method raises the safety rate of deep control policies while still achieving competitive results. Future work includes investigating a prediction process that covers more than one gate position. We would also be interested in exploring the use of this method during the training of deep control policies, to balance safety and performance before execution.
We would like to thank Ratnesh Madaan and Rogerio Bonatti for their inputs regarding the baseline perception and control policies; as well as Matthew Brown and Nicholas Gyde for their help with the simulations.