Modern applications of ground, aerial, and underwater mobile robots necessitate safe operation in unexplored and unstructured environments. Planning and control techniques that ensure safety in such environments, relying on limited field-of-view observations, are critical for autonomous navigation. A successful and widely used approach for safe control of dynamical systems is to encode safety constraints through control barrier functions (CBFs) [cbf] and combine the CBF constraints with the control objectives in a quadratic programming (QP) formulation. A common assumption in specifying CBF safety constraints is that the robot already has complete knowledge of the unsafe regions. However, navigation in unknown environments requires the online estimations of unsafe regions using onboard sensing.
This paper studies a navigation problem in which a robot, equipped with a LiDAR scanner, is tasked to follow a motion plan relying on streaming range measurements to ensure safety. A control Lyapunov function (CLF) [cbf] is utilized to specify the trajectory tracking objective. We construct CBFs approximating the shape of the objects in the environment and introduce associated robust safety constraints in the optimization of the robot’s control policy to ensure obstacle avoidance.
The signed distance function (SDF) of a set in a metric space determines the distance of a given point to the boundary of , with the sign determined by whether or not is in
. Observing that CBF and SDF are both level-set functions, we approximate the SDFs of observed objects and define the corresponding CBFs. Using the LiDAR measurements at each time instant, we incrementally train a multilayer perceptron (MLP) to model each obstacle’s SDF shape in real time. To balance the accuracy and efficiency of SDF reconstruction, we employ replay memory in training[prioritized].
After obtaining SDF approximations of the obstacles, we can efficiently estimate the distance and gradient information for the barrier functions. We take into account the errors in the estimation for the controller synthesis, resulting in safety constraints where the input appears linearly and also within an -norm term. The presence of such constraints means that the control synthesis problem can no longer be stated as a QP. Nevertheless, we show that the resulting optimization problem can be formulated as a second-order cone programming (SOCP) problem and therefore can still be solved efficiently. The proposed CLF-CBF-SOCP framework integrates the SDF estimation of the CBFs to synthesize safe controls at each time instant, allowing the robot to safely navigate the unknown environment.
This paper makes the following contributions.
An online incremental training approach, utilizing replay memory, is proposed for deep neural network approximation of the signed distance functions of objects observed with streaming distance measurements.
Our analysis shows that incorporating safety constraints, that account for the worst-case estimation error of the SDF approximations, into a control synthesis optimization problem leads to a convex SOCP formulation.
Ii Related Work
Occupancy mapping techniques aim to partition an environment into occupied and free subsets using sensor observations. Octomap [hornung2013octomap]
is a probabilistic occupancy mapping algorithm that uses a tree-based structure to store and update the probability of occupied and free space based on streaming range observations. Occupancy information alone may not be enough for safe planning and control, since many algorithms require distance (or even gradient) information to the obstacle surfaces. Voxblox[oleynikova2017voxblox] is an incremental truncated SDF mapping algorithm that approximates the distance to the nearest surface at a set of weighted support points. The approach also provides an efficient extension from truncated to complete (Euclidean) SDF using an online wavefront propagation algorithm. Fiesta [luxin2019fiesta] further improves the efficiency and accuracy of building online SDF maps by using two independent queues for inserting and deleting obstacles separately and doubly linked lists for map maintenance. Recently, impressive results have been achieved in object shape modeling using deep neural networks. DeepSDF [deepsdf] and Occupancy Nets [occupancynet]
implicitly represent 3D shapes by supervised learning of SDF approximations via fully connected neural networks. Groppet al. [implicitsurface] further extended the DeepSDF model by introducing the Eikonal constraint that any SDF function must satisfy in training the neural network. In our paper, we further extend this offline training method in two ways: enabling online training by introducing replay memory, which helps the network to remember the reconstructed shape from past observations, and adding truncated signed distance points to the training set to ensure the correct sign of the approximated SDF.
Quadratic programming with CBF constraints offers an elegant and efficient framework for safe control synthesis in robot navigation tasks [glotfelter2017nonsmooth, xu2017realizing]. This approach employs CLF constraints to stabilize the system and achieving the control objectives whenever possible, whereas the CBF constraints are used to ensure safety of the resulting controller at all times. CBFs are used for safe multi-robot navigation in [glotfelter2017nonsmooth]. CLF and CBF constraints are combined to solve a simultaneous lane-keeping (LK) and adaptive speed regulation (ASR) problem in [xu2017realizing]. Here, CLFs are used to ensure convergence to the control objectives for LK and ASR, where CBFs are used to meet safety requirements. However, the barrier functions in these approaches are assumed to be known. When the environment is unknown, a robot may only rely on the estimation of barrier functions using its sensors. A closely related work by Srinivasan et al. [srinivasan2020synthesis]
presents a Support Vector Machine (SVM) approach for the online synthesis of barrier functions from range data. By approximating the boundary between occupied and free space as the SVM decision boundary, the authors extract CBF constraints and solve a CBF-QP to generate safe control inputs. Our approach constructs CBFs directly from an approximation of the object SDFs and explicitly considers the potential errors in the CBF constraints when formulating the control synthesis optimization problem.
Iii Problem Formulation
Consider a robot whose dynamics are governed by a non-linear control-affine system:
where is the robot state, and are continuously differentiable functions, and is the control input. Define the admissible control input space as , where and . The state space is partitioned into a closed safe set and open obstacle set such that and . The obstacle set is further partitioned into obstacles, denoted , such that and for any . Each of these sets is defined implicitly through a continuously differentiable function , with gradient . Formally, . Note that if the robot state is in the open set , and if is on its boundary .
The robot is equipped with a range sensor, such as a LiDAR scanner, and aims to follow a desired path, relying on the noisy distance measurements to avoid collisions. The path and sensed information are defined as follows.
A path is a piece-wise continuous function .
Let be the field of view of the range sensor when the robot is in state . At discrete times , for , the range sensor provides a set of points on the boundary of each obstacle which are within its field-of-view.
Iv Obstacle Estimation via Online Signed Distance Function Approximation
Throughout the paper, we rely on the concept of Signed Distance Function (SDF) to describe each unsafe region . For each , the SDF function is:
where denotes the Euclidean distance between a point and a set. In this section, we describe an approach to construct approximations to the obstacle SDFs using the point cloud observations at time step .
Iv-a Data Pre-processing
Given the point cloud on the surface of the th obstacle at time , we can regard as the points on the zero level-set of a distance function. To reduce the scale of training data, we define the origin of the th obstacle . The points are centered around and have a measured distance of to the obstacle surface . Let be the Euclidean distance between the robot state and . Let be a small positive constant, define a point along the LiDAR ray from the robot state to that is approximately a small distance from the obstacle surface . We call the set as truncated SDF point set.
For each obstacle , the training set for the estimation task is constructed as a union of the points on the boundary and the truncated SDF points. The training set at time is .
Iv-B Loss Function
Inspired by the recent impressive results on multilayer perceptron approximation of SDF[deepsdf, implicitsurface], we introduce a fully connected neural network with parameters to approximate the SDF of each observed obstacle at time step . Our approach relies on the fact that an SDF has the property that its gradient norm satisfies the Eikonal equation
in its domain. We use a loss function that encouragesto approximate the measured distances in a training set (distance loss ) and to satisfy the Eikonal constraint for set of points (Eikonal loss ). For example, equals for points on the obstacle surface and equals for the truncated SDF points along the LiDAR rays. The loss function is defined as , with a parameter and:
The Eikonal training set can be generated arbitrarily in since needs to satisfy the Eikonal constraint everywhere. In practice, we generate
by sampling points from a mixture of a uniform distribution and Gaussians centered at the points in the distance training set
with standard deviation equal to the distance to the-th nearest neighbor with . For the distance training set , ideally, we should use as much data as possible to obtain an accurate approximation of the SDF . For example, at time , all observing data may be used as the distance training set . However, the online training time becomes longer and longer as the robot receives more and more LiDAR measurements, which makes it impractical for real-time mapping and navigation tasks. Therefore, we introduce an Incremental Training with Replay Memory approach in IV-C, which enables us to get accurate SDF estimates within reasonable training time for real-time tasks.
Iv-C Incremental Learning
After training the network when the obstacle was first observed at time step with the training set , we need to consider how to update the MLP parameters when we receive new observation sets for . We first consider an approach that updates based on the new observing set and uses the previous MLP parameters as initialization, we call it Incremental Training (IT) approach. Alternatively, we consider updating based on all observing sets up to time and uses as initialization. We call this update method as Batch Training (BT) approach.
The IT approach is efficient because at each time , it uses training data sets of approximately constant cardinality, leading to approximately constant update time during online training. However, discarding the old data
and using stochastic gradient descent to re-train the network parameterson the new data causes degradation in the neural network’s ability to represent the old data. In other words, the SDF approximation at time is good at approximating the latest observed obstacle surface but the approximation quality degrades at previously observed surfaces. The BT approach does not have this limitation since it uses all training data for training at time but, as a result, its training time increases over time. Hence, our motivation for introducing an Incremental Training with Replay Memory (ITRM) approach is to balance the trade-off between SDF estimation error and online training time, making it suitable for real-time robotic tasks.
Experience replay is an effective and commonly used technique for online reinforcement learning, which enables convergence of stochastic gradient descent to a critical point in policy and value function approximation[experiencereplay, prioritized]. This idea has not been explored for online supervised learning of geometric surfaces. The first contribution of this paper is to use replay memory for online incremental learning of SDF. At each time step , We construct an experience replay memory for each successive training step by utilizing the SDF approximations .
The replay memory associated with a signed distance function and truncation parameter is a set of points that are at most a distance from the zero-level set of : .
To construct the replay memory set , we need to generate samples from the level sets of the SDF approximation . Matan et al. [atzmon2019controlling] introduced a simple and scalable approach to sample from neural network level sets. In robotics applications, where the environments are commonly 2-D or 3-D, samples from the level sets of can also be obtained using the Marching Cubes algorithm [Lorensen87marchingcubes]. In our experiments in Sec. VI, we used the Marching Cubes algorithm to extract samples and from the zero and level-sets of , respectively, and construct the replay memory as .
Given the replay memory , our ITRM approach constructs a training set at time by combining the latest observation with a randomly sampled subset from . To make the algorithm efficient without losing significant information about the previous data, we let , which means that we pick the same number of points as the current observing set from the replay memory.
The three neural network training techniques, IT, BT, and ITRM, discussed in this section, are formally defined as follows. In all approaches, for obstacle , the MLP parameters obtained at the previous time step are used as an initial guess for the new MLP parameters at time . The SDF approximations are trained via the loss function in (3) but each method uses a different training set :
We use Softplus (SmoothReLU function) as the non-linear activations of our network, which is defined as . As the network is a combination of linear functions and Softplus, the output is a continuously differentiable function. The estimated gradient
is one input in our loss function and can be calculated via backpropagation through the neural network layers[implicitsurface].
Remark 1 (Errors in the network approximation).
We do not formally compute the error bounds in the SDF approximation and leave the theoretical computation of SDF approximation error for future work. Several recent works study the approximation power and error bounds of neural networks [Bailey2019ApproximationPO, Yarotsky2017ErrorBF]. Our ensuing analysis does take into account errors in this approximation to synthesize safe controllers. In our evaluation, cf. Sec. VI, we obtain the SDF error bounds by comparing the SDF estimations to the ground-truth object SDFs, and use them to help perform online navigation tasks.
V Safe Navigation with Estimated Obstacles
In this section, we rely on the estimated SDFs constructed in Sec. IV to formalize the synthesis of a controller that guarantees safety with respect to the exact obstacles, despite the errors in the approximation.
V-a Safe Control with Estimated Barrier Functions
A useful tool to ensure that the robot state remains in the safe set throughout its evolution is the notion of control barrier function (CBF).
Definition 4 ([ames2016control]).
A continuously differentiable function , with if and if , is a zeroing control barrier function (ZCBF) on if there exists an extended class function such that, for each , there exists with
where is the Lie derivative of along .
If the exact SDFs describing the obstacles were known, they could be used to define ZCBFs that ensure the forward invariance of . However, when the environment is observable only through online range measurements as described in Sec. IV, we only have the estimated SDFs at our disposal to define (note that is continuously differentiable). Our next result describes how to use this information to ensure the safety of . The statement is for a generic (e.g., a single obstacle) and we later particularize our discussion to the case of multiple obstacles.
Let be the error at in the approximation of , and likewise, let be the error at in the approximation of its gradient. Assume there are available known functions and such that
with and as . Let
Then, any locally Lipschitz continuous controller such that guarantees that the safe set is forward invariant.
We start by substituting in (4):
Note that since and is an extended class function, the maximum value in (7) is obtained by:
The minimum value is attained when is in the opposite direction to the gradient of , namely
Therefore, the inequality condition (7) can be rewritten as
which is equivalent to the condition in (6). ∎
Proposition 1 allows us to synthesize safe controllers even though the obstacles are not exactly known, provided that error bounds on the approximation of the barrier function and its gradient are available and get better as the robot state gets closer to the boundary of the safe set .
V-B Control Synthesis via Second-Order Cone Programming
We encode the control objective using the notion of a Lyapunov function. Formally, we assume the existence of a control Lyapunov function, as defined next.
Definition 5 ([cbf]).
A control Lyapunov function (CLF) for the system dynamics in (1) is a continuously differentiable function for which there exist a class function such that, for all :
The function may be used to encode a variety of control objectives, including for instance path following. We present a specific Lyapunov function for this purpose in Sec.VI.
When the barrier function is precisely known, one can combine CLF and CBF constraints to synthesize a safe controller via the following Quadratic Program:
where is a baseline controller, is a matrix penalizing control effort, and (with the corresponding penalty ) is a slack variable, introduced to relax the CLF constraints in order to ensure the feasibility of the QP. The baseline controller is used to specify additional control requirements such as desirable velocity or orientation (see Sec. VI) but may be set to if minimum control effort is the main objective. The QP formulation in (8) modifies online to ensure safety and liveness via the CBF and CLF constraints.
The optimization problem in (9) is equivalent to the following second-order cone program:
We first introduce a new variable so that the problem in (9) is equivalent to
The last constraint in (11) corresponds to a rotated second-order cone, , which can be converted into a standard SOC constraint [alizadeh2003second]:
Let , and consider the constraint . Multiplying both sides by and adding , the constraint is equivalent to
Taking a square root on both sides, we end up with , which is equivalent to the third constraint in (10). ∎
Vi Experiments and Results
Here we use the Pybullet simulator [coumans2020] to evaluate the performance of our proposed approach for online obstacle shape estimation and navigation with robust safety constraints. In our experiments, we use a two-wheeled ground TurtleBot, equipped with a LiDAR scanner, which has a degree field of view and meters maximum range. The simulation environments contain obstacles with various shapes that are unknown to the robot. The goal is to show that the robot can avoid obstacles with the estimated barrier functions while following the desired path whenever possible.
We model the robot motion using unicycle kinematics and take a small distance off the wheel axis as in [cortes2017coordinated] to obtain a relative-degree-one model:
where , represent the robot longitudinal and angular velocity, respectively. The state and input are , . We use a baseline controller to encourage the robot to drive at its maximum velocity in a straight line whenever possible.
We specify the desired robot path by a 2-D curve , . To capture the path-following task via a CLF constraint, we define an output function . The closest point on the reference path to the robot state is obtained by . According to the results in [ames2014rapidly], the following function is a valid CLF for the path following task:
where is a positive-definite matrix calculated by solving the Lyapunov equation of the input-output linearization.
Vi-B Network Architecture and Training Implementation
We used the MLP network architecture proposed by Gropp et al. [implicitsurface] and Park et al. [deepsdf] to represent the SDF approximator . The neural network has fully connected layers with neurons each and a single skip connection from the input to the middle layer. All internal layers use Softplus as nonlinear activations. We set the parameter in (3). For online incremental training with replay memory, at each time step , we performed iterations with the ADAM optimizer [Adam] with constant learning rate of .
|Case||in IT||in ITRM||in BT|
Vi-C SDF Estimation Results
To evaluate the accuracy and efficiency of the proposed ITRM approach for SDF estimation, we experimented with various obstacle shapes and compare the SDF prediction accuracy and the training time of our proposed ITRM approach and the IT and BT approaches described in Sec. IV-C. Fig. 3 and Fig. 4 show our experiment setup. The robot is equipped with a LiDAR with rays per scan. Zero-mean Gaussian noise with standard deviation was added to the distance measurements along each ray. To evaluate the accuracy of the SDF approximation , we sample points uniform on the surface of the ground-truth object and measure the SDF error ( varies for different objects):
The SDF error is shown in Table I for the eight object instance in Fig. 3. The SDF error of our ITRM approach is comparable with the error of BT approach and is much smaller than the error of the IT approach across the object instances. The training update time is the time needed for updating the MLP parameters from to . The average training update time across the object instances for the three methods is shown in Fig. 4. We see that as the robot moves around the environment, the average training update time of the ITRM approach remains at about while the BT approach requires more and more time for training.
Vi-D Safe Navigation Results
The second set of experiments demonstrates safe trajectory tracking using online SDF obstacle estimates to define constraints in the CLF-CBF-SOCP control synthesis optimization (10). The robot is asked to move along a reference path while avoiding a priori unknown obstacles in 8 different environments, simiar to the one shown in Fig. 1. To account for the fact that the robot body is not a point mass, we subtract the robot radius from the SDF estimate when defining the CBF: . The error bounds of CBF and its gradient are approximated based on Table I. We compare the performance of the proposed CLF-CBF-SOCP approach to a CLF-CBF-QP approach that assumes the estimated obstacle CBFs are accurate and ignores the estimation error. In the experiment, to avoid small linear velocity and ensure the feasibility of our problem, we set the relaxation penalty parameter on the diagonal of in Sec. V-B as at the beginning and make adaptive. are the penalty parameters for linear velocity, angular velocity and path following respectively. If there is no solution found at some time step, will be divided by until one feasible solution found.
We use the Fréchet distance between the reference path and the paths produced by the CLF-CBF-SOCP and CLF-CBF-QP frameworks to evaluate the path similarities. Let , be two paths in the environment as defined in Def.1, the Fréchet distance is a measure of similarity between and that takes into account the location and ordering of the points along the paths. Two reparameterizations of are continuous, non-decreasing, surjections . The Fréchet distance is computed by:
where is the Euclidean distance in our case.
In Fig. 5, we see that the robot can follow the prescribed reference path if no obstacles are nearby. Comparing the green and pink trajectories in the plots, we see that the green trajectory generated by CLF-CBF-SOCP is always more conservative since it takes the error in the CBF estimates into account. When there is an obstacle near or on the reference path, the robot controlled by the CLF-CBF-SOCP controller stays further away from the obstacles than the robot controlled by the CLF-CBF-QP controller. This agrees with the Fréchet distance results presented in Table II.
In Fig. 6, we see that the CLF-CBF-QP controller sometimes fails to avoid obstacles because it does not take the error in the CBF estimation into account. In contrast, the CLF-CBF-SOCP controller is guaranteed by Prop. 2 to remain safe if the CBF estimation is captured correctly in the SOC constraints.
To sum up, from Table II, we notice that the trajectory mismatch with respect to the reference path is larger under our proposed CLF-CBF-SOCP framework than under the CLF-CBF-QP framework if they both succeed. However, our proposed approach guarantees safe navigation while the CLF-CBF-QP controller sometimes leads to collision due to errors in SDF estimation.
|Environment||Online CLF-CBF-SOCP vs Groundtruth||Online CLF-CBF-QP vs Groundtruth|
We have proposed the Incremental Training with Replay Memory method to approximate control barrier functions. We introduced the first online incremental learning approach in estimating Signed Distance Functions that utilizes replay memory. By using replay memory in training, our proposed approach balances the estimation error and the training time, which ensures it suitable for real-time robotic tasks. The CBF approximations are then used to formulate a novel CLF-CBF-SOCP framework which ensures safe navigation in an unknown environment by taking measurement and estimation uncertainties into account to synthesize a robust navigation algorithm. Compared with the traditional CLF-CBF-QP framework, our proposed approach ensures safety in arbitrary environments.