Generating Robust Supervision for Learning-Based Visual Navigation Using Hamilton-Jacobi Reachability

12/20/2019 ∙ by Anjian Li, et al. ∙ Simon Fraser University berkeley college 20

In Bansal et al. (2019), a novel visual navigation framework that combines learning-based and model-based approaches has been proposed. Specifically, a Convolutional Neural Network (CNN) predicts a waypoint that is used by the dynamics model for planning and tracking a trajectory to the waypoint. However, the CNN inevitably makes prediction errors, ultimately leading to collisions, especially when the robot is navigating through cluttered and tight spaces. In this paper, we present a novel Hamilton-Jacobi (HJ) reachability-based method to generate supervision for the CNN for waypoint prediction. By modeling the prediction error of the CNN as disturbances in dynamics, the proposed method generates waypoints that are robust to these disturbances, and consequently to the prediction errors. Moreover, using globally optimal HJ reachability analysis leads to predicting waypoints that are time-efficient and do not exhibit greedy behavior. Through simulations and experiments on a hardware testbed, we demonstrate the advantages of the proposed approach for navigation tasks where the robot needs to navigate through cluttered, narrow indoor environments.



There are no comments yet.


page 2

page 8

This week in AI

Get the week's most popular data science and artificial intelligence research sent straight to your inbox every Saturday.

1 Introduction

Autonomous navigation is fundamental to control and robotics. Following the success of deep learning, visual navigation has gained popularity. One appeal of visual navigation – which involves using one or more cameras and computer vision to perceive the environment to reach navigational goals – is that cameras are cheap, light weight, and ubiquitous.

Typically, a geometric map of the environment is used for navigation ([thrun2005probabilistic, slam-survey:2015, lavalle2006planning]); however, real-time map generation can be challenging in texture-less environments or in the presence of transparent, shiny objects, or strong ambient lighting ([alhwarin2014ir]). In contrast, end-to-end (E2E) learning approaches have been used for locomotion ([gandhi2017learning, kahn2017uncertainty, sadeghi2016cadrl, kang2019generalization]) and goal-point navigation ([zhu2016target, gupta2017cognitive, khan2017memory, kim2015deep, pan2017agile]

) that side-step this explicit map estimation step, but suffer from data inefficiency and lack of robustness (

[recht2018tour]). Consequently, a number of papers seek to combine the best of learning with optimal control for high-speed navigation ([richter2018bayesian, jung2018perception, loquercio2018dronet, bansal2018chauffeurnet, muller2018driving, meng2019neural]), race-track driving ([pmlr-v78-drews17a, drews2019vision]), and drone racing ([kaufmann2018beauty, kaufmann2018deep]). In particular, [bansal2019combining] combines ideas from optimal control and computer vision by having a convolutional neural network (CNN) predict waypoints instead of control signals, and using optimal control to obtain the control for reaching the waypoints. This hybrid approach greatly improved generalizability: a CNN trained in simulation could be successfully deployed on a real robot without additional training or tuning. However, the inevitable errors in waypoint predictions during the test time could lead to unintended robot trajectories, ultimately resulting in collisions with the obstacles. This is particularly problematic when the robot needs to navigate through cluttered environments or narrow openings, as the error margin in such scenarios is often small.

Figure 1: Hardware experiment 1. On the left, 3rd and 1st person views of the Turtlebot2 testbed are shown along with a trajectory (red) to the goal (green). On the right, a birds-eye view of the environment is displayed. The Turtlebot’s starting location is shown as a blue circle.

Contributions: In this paper, we build on the framework in [bansal2019combining] and propose a novel reachability-based method to generate robust waypoints for supervising the CNN. Our key insight is to model the CNN prediction error as “disturbance” in the system dynamics and generate waypoints that are optimal under the worst-case disturbances, ensuring robustness despite the prediction errors. In the context of other work in the safe learning literature such as [Fisac2019] and [Bajcsy2019], which wrap reachability-based safety controllers around policies being learned, we provide an alternative that pre-emptively uses disturbances when generating training data to alleviate the effect of CNN prediction errors.

Unlike [bansal2019combining]

, which relies on distance-based heuristics, our method involves computing value functions by solving static Hamilton-Jacobi (HJ) (


) partial differential equations (PDEs). The value functions represent the time until goal-reaching and time until collision despite the worst-case disturbances, given system dynamics and a known environment. These value functions are combined into a cost map that precisely quantifies the quality of waypoints and considers all possible combinations of states by construction. This leads to less greedy navigation behavior and significant improvement in the success rate during the test time.

Overall, our approach leads to less greedy navigation behaviors that are robust to CNN prediction errors. Through simulations and real-world experiments we demonstrate that practical scenarios such as safely moving through narrow doorways become possible with the proposed approach.

2 Problem Setup

We consider the problem of autonomous navigation in a priori unknown static environment. Starting from an initial position, the robot needs to reach a goal position . We model our ground vehicle as a four-dimensional (4D) system with the following dynamics:


where the state at time consists of the position , speed , and heading . The control is acceleration and turn rate, . The robot is equipped with a forward-facing, monocular RGB camera mounted at a fixed height and oriented at a fixed pitch. At time , the robot receives an RGB image of the environment , , the state , and the target position . The objective is to obtain a control policy that uses these inputs to guide the robot to within a certain distance of .

3 Background

We build upon the learning-based waypoint approach to navigation (LB-WayPtNav) proposed in [bansal2019combining]. However, unlike LB-WayPtNav, we use a HJ Reachability-based framework to generate supervision data. We now provide a brief overview of LB-WayPtNav and HJ reachability.

3.1 LB-WayPtNav

Figure 2: LB-WayPtNav framework from [bansal2019combining].

LB-WayPtNav combines a learning-based perception module with a dynamics model-based planning and control module for navigation in a priori unknown environments (see Fig. 2).
Perception module. The perception module is implemented as a CNN that takes as input a pixel RGB image, , captured from the onboard camera, the target position, , specified in the vehicle’s current coordinate frame, and vehicle’s current linear and angular speed, , and outputs the desired state or a waypoint .
Planning and control module. Given a waypoint , the system dynamics in Eqn. (1) are used to plan a spline-based trajectory to , starting from the current state of the vehicle. This leads to a smooth, dynamically feasible, and computationally efficient trajectory to the waypoint. An LQR-based feedback controller tracks the trajectory. The commands generated by the LQR controller are executed on the system for seconds, and then a new image is used to generate another waypoint and trajectory. This process is repeated until the robot is within a certain distance of .

3.2 The Time-to-Reach Problem in Reachability Analysis

Consider a dynamical system described by , where is the state, and are the control and disturbance respectively. In this paper, the control represents the actions a robot can take to change its state, and the disturbance primarily models CNN prediction errors. Following [Yang2013] and [Takei10], we define the time-to-reach (TTR) value function denoted , which represents the minimum time required to drive the system to the goal set while avoiding the obstacle set , despite disturbances. The control minimizes this time and disturbance maximizes: . We also define the time-to-collision (TTC) value function, , which represents the maximum time until collision with the obstacle set assuming that the control is optimally avoiding obstacles under the worst-case disturbance: . Applying the dynamic programming principle, we can obtain and respectively as the viscosity solution for the following stationary HJ PDEs:


4 Reachability-based Supervision for Waypoints

In the perception module presented in Fig. 2, from the start position, the robot sequentially observes the image , linear speed and angular speed at time to predict a waypoint with a CNN. This waypoint prediction is conducted at a fixed replan frequency until the robot reaches the goal , defined to be positions within certain distance to .

To generate supervision for safe and efficient waypoints, we propose a novel reachability expert to autonomously navigate in simulation and collect training data. Specifically, given an obstacle map and a goal area , one can compute corresponding TTR and TTC value maps, which are integrated in the cost function of a model predictive control (MPC) optimization problem. By solving this MPC problem, the optimal waypoint is obtained, and at time , the image is rendered, and linear and angular speed are measured in simulation environment. Finally, we repeat the above procedure in different navigation tasks until sufficient data-label pairs are obtained for the training dataset. The entire procedure is illustrated in Fig. 3.

Figure 3: Workflow of training data generation using reachability expert.

4.1 TTR and TTC Computations

We add disturbances to Eqn. (1) to describe the expert’s system dynamics


with states , controls and disturbances . Also, , , and are upper bounds for acceleration, angular speed and disturbances in turn rate, and is the circular upper bound for disturbances in and components of speed.

For every navigation task, we initialize a goal position on an known obstacle map . We define the goal area to be , and the map as . and are then computed based on Eqn. (2). and guide the reachability expert in a 4D state space for goal reaching and collision avoidance.

Incorporating worst-case disturbances leads to and having more conservative values, which results in less greedy expert trajectories in a nuanced manner. Greediness often makes the robot incapable of going around obstacles and having a view of the environment that informs the robot about possible routes to the goal. Crucially, the conservatively safe trajectories address the prediction errors from neural networks, since a minor deviation will not lead to collision. Note that is assumed to be known only during training; no such assumption is made during the test time, during which the robot only relies on onboard sensors for navigation.

4.2 Waypoint Supervision Generation

We use an MPC framework to generate waypoints and expert trajectories. To achieve efficient and safe navigation, we trade off between reaching the goal faster and staying further from obstacles. Thus we design a novel cost function, ReachabilityCost , to be a combination of TTR and TTC:


where is the computational upper bound for TTC, and is the scale factor.

The expert uses MPC to plan, in a receding horizon manner, a trajectory of time horizon until the goal is reached. Starting from , in every MPC optimization problem, we discretize the time horizon to be . At any time index , we denote and . Then the following MPC problem is sequentially solved in :

subject to

To solve Eqn. (4.2) in , the reachability expert first samples a local waypoint grid in the heading direction as possible final states : , and compute dynamically feasible spline trajectories to each waypoint using differential flatness of Eqn. (3) ([WALAMBE2016601]). Next, the expert filters out the invalid waypoints whose trajectory violates control constraints. Finally, the solution trajectory with the minimum cost is chosen, and the corresponding waypoint is added to the training data set along with the image and speeds at time . By solving the MPC problem many times, we obtain the expert dataset , where is the index of the MPC problem, and is the total number of data points (and the total number of MPC problems solved).

5 Summary of Simulation Results

With the generated expert dataset , we train a CNN that implicitly learns how to predict good waypoints given the current input image and robot’s system dynamics. Then, we test our model in a novel environment in simulation without an a priori known map.

Dataset: We use Stanford large-scale 3D Indoor Spaces dataset [armeni20163d] as our simulation environment, which are 3D scans of real world buildings. Two buildings are used for data generation and training; the 3rd held-out building is used as the test environment, which has significant differences in the object appearance and layout. For navigation tasks in training and testing, we sample various start and goal positions that require the robot to go through narrow openings.

Implementation details: We train the CNN in Fig. 2 with 150k data points from the reachability expert,

k. The mean squared error loss is used and optimized using the Adaptive Moment Estimation (ADAM) algorithm with a learning rate of

and weight decay of .

Metrics: We use both statistics and trajectory plots to present the test results. For statistics, we use success rate, average time to reach the goal area (for successful tasks), acceleration and jerk to measure the quality of trajectories. With trajectory plots, we analyze the robot’s specific behaviors.

Baselines: We compare our approach with the HeuristicsCost designed in [bansal2019combining] for the MPC framework:


where is the distance to the nearest obstacle, is the distance to the goal, and and are scaling factors. We compare our ReachabilityCost in Eqn. (5) to the HeuristicsCost in Eqn. (7) for two different navigation frameworks: waypoint navigation (WayPtNav) and end-to-end (E2E) learning, resulting in a total of 4 methods. WayPtNav maps the current image and controls inputs to the waypoint (as in Fig. 2), where E2E learning directly outputs the control commands given the same inputs. We also compare against an additional baseline, ReachabilityCost-NoDstb, that does not incorporate disturbances in the system dynamics during the data generation.

5.1 Expert Performance

We select m/s, rad/s, and m/s to match the specifications of the Turtlebot 2 used in the hardware experiments (Sec. 7). We set m/s and rad/s to account for prediction errors, and to prioritize collision avoidance. All expert trajectories are generated according to Section 4.2, where the replanning is done every s to collect training data. In Fig. 6 (a) to (c), we compare the expert trajectories obtained by HeuristicsCost and ReachabilityCost. The reachability expert uses the full system dynamics for optimizing waypoints. As a result, it maintains an appropriate orientation and speed when going through the narrow openings. Moreover, due to the presence of disturbances, it takes a conservative path, always staying near the middle of narrow openings, resulting in collision-free trajetcories even when there is prediction error. In contrast, HeuristicsCost takes a greedier path to approach the goal. To address CNN prediction error, [bansal2019combining]

use an obstacle padding which makes narrow openings impossible to enter.

5.2 Test Results

Agent Success (%) Time taken (s) Acceleration (m/s) Jerk (m/s)
WayPtNav-ReachabilityCost 63.82 21.00 ±8.00 0.06 ±0.01 0.94 ±0.13
WayPtNav-HeuristicsCost 52.26 18.82 ±5.66 0.07 ±0.02 1.06 ±0.15
WayPtNav-ReachabilityCost-NoDstb 49.24 16.19 ±4.8 0.07 ±0.01 0.98 ±0.16
E2E-ReachabilityCost 8.04 19.55 ±4.72 0.07 ±0.01 2.16 ±0.30
E2E-HeuristicsCost 31.66 25.56 ±9.85 0.26 ±0.06 9.06 ±1.94
Table 1: Quantitative Comparisons in Simulation: We compute four metrics-success rate, average time to reach the goal, acceleration and jerk-on 200 navigation tasks with a replan frequency of 4Hz. The proposed method, WayPtNav-ReachabilityCost, is most successful at completing novel navigation tasks. Without the disturbances incorporated in the dynamics, ReachabilityCost takes the shortest time to reach the goal, but the success rate largely drops because of prediction errors. For E2E learning, the success rate is generally lower and the trajectories are less smooth (indicated by high average jerk).

We compare the different methods in Table 1. WayPtNav-ReachabilityCost achieves the highest success rate and least acceleration and jerk. WayPtNav-ReachabilityCost-NoDstb takes the shortest time to reach the goal, but experiences a notable drop on the success rate.

Figure 4: Success rate in tasks of different difficulties. Difficulties are assessed according to minimum distance to obstacles along trajectories, divided into Hard ( m, 242 tasks), Medium ( m m, 155 tasks) and Easy ( m, 102 tasks). ReachabilityCost has significant advantage in Hard tasks.

E2E learning results in lower success rate and more jerky trajectories for both methods. Notably, ReachabilityCost has a significant lower success rate with E2E learning compared to WayPtNav.

6 Analysis of Simulation Results

6.1 Comparison with WayPtNav-HeuristicsCost

As discussed in Sec. 5.1, the reachability expert is less greedy and more robust, which enables it to navigate through cluttered environments and narrow openings. This results in a higher success rate during test time, as shown in Fig. 4, where we compare the two methods on 500 navigation tasks with varying difficulties. Here, we measure the difficulty of a task by the opening size that the robot must navigate through on its way to the goal. ReachabilityCost has much higher success rate on the “Hard” level, indicating that it makes robot more adept at maneuvering in narrow environments.

6.2 Effect of Replanning Frequency

Figure 5: Success rates improve for both methods as the replanning frequency is increased (up to 4 Hz). ReachabilityCost benefits considerably more with higher replan frequencies.

Replanning frequency represents how often the robot predicts new waypoints. Fig. 5 shows the success rate on 200 navigation tasks for 5 different replanning frequencies: 0.67 Hz, 1 Hz, 2 Hz, 4 Hz and 6.67 Hz. As we increase the frequency, the success rate improves and reaches the peak at 4 Hz for both experts. Above 4 Hz, the success rate drops dramatically.

In general, a higher replanning frequency helps the learning system react faster to the unknown environment; however, if it is too high, the lack of visual memory in the WayPtNav framework results in myopic decisions, leading to a drop in the success rate. ReachabilityCost benefits more from the higher replanning frequencies compared to HeuristicsCost, as the greedy behaviors from the latter tend to drive the robot to cut corners, often leading to situations that are hard to recover from.

6.3 Effect of Adding Disturbances

Adding disturbances in the dynamics is crucial for improving test performance: it not only accounts for dynamics uncertainties, but also models neural network prediction errors. Without disturbances, trajectory can be “too optimal” so that a little deviation of robot’s state or minor errors from the CNN results in collision. This is evident from Table 1, where the success rate drops from to in the absence of disturbances.

We examine reachability expert trajectories with and without the disturbances in Fig. 6 (b) and (c). Without disturbances, the expert chooses a path close to the obstacles, while with disturbances, the expert stays near the middle of the road. Although both experts succeed in reaching the goal, disturbances lead to more robust trajectories, which translate to test time as shown in Fig. 6 (d) and (e)). Without disturbances, the robot tries to avoid the wall but fails, while with disturbances, the robot is able to stay in the middle of the opening, and pass through a very narrow doorway.

6.4 Comparison with E2E Learning

Our conclusions here are consistent with the findings in [bansal2019combining] – the model-based approach (WayPtNav) leads to a higher success rate and significantly smoother trajectories. The success rate of E2E learning declines further for the reachability expert as the control profiles are even more nuanced, making it challenging to learn them.

6.5 Failure Cases

Since we do not construct a map or have any visual memory in this framework, the robot struggles in the navigation scenarios where it needs to “backtrack”. In addition, when the room layout is too different from the training time, the CNN fails to predict good waypoints.

Figure 6: Trajectory comparison. (a), (b), and (c) are expert trajectories; (d) and (e) are test trajectories. In (a), the baseline expert starts with a greedier path and fails to enter the narrow opening due to hard obstacle padding (medium grey) used during waypoint optimization. Comparing (b) and (c), the reachability expert can safely go through the openings, but is more likely to stay in the middle of the road with disturbances incorporated in dynamics. This ability is transferred to test times in (d) and (e), where the robot with disturbances added is more resistant to the prediction errors from the neural network and manages to take a collision free path to the goal.

7 Hardware Experiments

We tested our framework on a Turtlebot 2 hardware testbed (Fig. 1), using monocular RGB images from the onboard camera for navigation. We tested the ReachabilityCost and HeuristicsCost CNNs directly on the Turtlebot without any additional training or fine-tuning.

Experiments were carried out in 3 separate areas of Simon Fraser University, each of which were absent from the training set. An outline of each experiment and the trajectories taken by the Turtlebot with each CNN are shown in Figures 1 and 7. Video footage of all experiments can be found at

Figure 7: Experimental scenarios 2 (top) and 3 (bottom), shown in 3rd and 1st person along with birds-eye view. The goal is marked in green, and the trajectory taken by the ReachabilityCost CNN to the goal is shown in red. The Turtlebot’s starting location is shown as a blue circle.

Each scenario required the Turtlebot to traverse narrow spaces and doorways, both of which have shown a low success rate for the HeuristicsCost CNN. For the first scenario, the Turtlebot maneuvered through a narrow doorway, and then around the nook behind the door to reach the goal (Fig. 1). The second and third scenarios required the Turtlebot to move from an open room into a narrow corridor and from a cluttered environment into a hallway, respectively (Fig. 7).

For each scenario, the HeuristicsCost CNN was unable to maneuver through the doorways, and collided with the wall at full speed, while the ReachabilityCost CNN successfully navigated through the doors and reached the goal. The ability to navigate through narrow environments and doorways is a key improvement seen in the ReachabilityCost CNN. For both methods, the neural networks trained using end-to-end learning were unable to reach the goal.

8 Conclusion

In this paper, we present a novel method to generate training data for waypoint prediction in visual navigation, using reachability analysis. Our method helps the neural network to learn to predict efficient and safe waypoints given system dynamics and observations of the environment. We also use disturbances in dynamics to model neural network prediction errors and greatly enhance its robustness. Simulation and real robot experiments demonstrate higher success rate and smoother trajectories in navigation tasks with our method, which crucially enables the robot to pass through narrow passages such as doorways. Immediate future work includes adding memory in our navigation framework, investigating the data mismatch between training and test scenarios to better transfer the expert performance in test time, and incorporating human agents.