Machine learning techniques have been recently applied to mobile robot navigation to develop robots that are capable of moving from one point to another within obstacle-occupied environments in a collision-free manner [pfeiffer2017perception, chen2017socially, faust2018prm, siva2019robot, xiao2020appld, xiao2020toward, liu2020lifelong]. Besides classical planning methods [quinlan1993elastic, fox1997dynamic]
, machine learning approaches can produce effective planners from data instead of hand-crafted rules and heuristics.
Among the thrust of learning to navigate, Imitation Learning (IL)[pfeiffer2017perception]
and Reinforcement Learning (RL)[faust2018prm] are the two main streams. While the former requires expert demonstration, the latter learns from trial-and-error. Their initial successes indicate a promising potential future for these data-driven approaches, which do not require sophisticated engineering and in-situ adjustment [xiao2017uav, xiao2020appld]. However, most learning approaches require a large amount of training data in order to produce good navigation behaviors, especially in challenging unseen environments.
Learning from Hallucination (LfH) [xiao2020toward] is a recently proposed paradigm to address the difficulty of obtaining high-quality training data. Using LfH, the robot collects training data in an obstacle-free, and thus completely safe, environment with a random exploration policy. During training, the most constrained surrounding obstacle configuration is synthetically projected onto the robot’s perception, which allows the effective action taken by the robot in the open space to be the only feasible, and therefore optimal, action. A control policy is learned with training data as if the robot had been moving in those constrained spaces. Thanks to the inherent safety of navigating in a completely open training environment, the robot can autonomously generate a large amount of training data with no human supervision or any costly failure during trial-and-error learning.
However, one major drawback of LfH is that the perception also needs to be hallucinated during deployment, with the help of a fine-resolution reference path (prior knowledge that is sometimes infeasible to obtain), and it requires other modules to address out-of-distribution scenarios. This runtime hallucination adds extra computation to the perception and becomes ineffective when only sparse future waypoints are available. Furthermore, hallucinating to be always in the most constrained environment during deployment causes the planner to become unnecessarily conservative and fail to adapt to some realistic situations, e.g., an actual open space.
The Hallucinated Learning and Sober Deployment (HLSD) approach proposed in this work eliminates the necessity of hallucination during deployment and allows the robot to perceive its actual surroundings. This “sober deployment” relaxes the requirement for a high-resolution reference path and enables the robot to adapt to the real deployment environment. Through a novel hallucination strategy during training, the robot is able to learn from many obstacle configurations as augmentations sampled in addition to the minimal unreachable set, which is the smallest set of obstacles required to cause the actions performed in the obstacle-free training environment to be optimal, given a specific goal. Our simulated experiment on 300 benchmark testbeds [perille2020benchmarking] and our real-world experiment using a physical robot show that after seeing an extensive amount of carefully designed hallucinated training data, the robot is able to efficiently produce agile maneuvers without runtime hallucination. Superior navigation performance is achieved compared to the original LfH approach with extra access to a high-resolution global path and runtime hallucination, and also to a classical navigation planner.
Ii Related Work
This section presents related work in mobile robot navigation, using classical motion planning and recent machine learning techniques.
Ii-a Classical Motion Planning
In terms of classical motion planning, mobile robot navigation is the problem of moving a robot from one point to another within an obstacle-occupied space in a collision-free manner. Such planning usually happens in the robot Configuration Space (C-Space) [latombe2012robot] and produces (asymptotically) optimal motion plans based on a predefined metric, such as maximum clearance, shortest path, or a combination thereof [lavalle2006planning]. These metrics are optimized in terms of C-space representation (e.g. cellular decomposition [lavalle2006planning]), global planning (e.g. Dijkstra’s), and local planning. For example, the optimization-based Elastic Bands (E-Band) planner [quinlan1993elastic] uses repulsive forces generated from obstacles to deform and optimize an initial trajectory, primarily to achieve maximum clearance. The Dynamic Window Approach (DWA) [fox1997dynamic] samples feasible actions and scores them based on a weighted score of distance to obstacles, closeness to a global path, and progress toward the goal. The new hallucination strategy proposed in this paper assumes the planner learned from hallucinated training data primarily seeks the shortest path. We leave hallucination strategies that optimize for maximum clearance for future work.
Ii-B Machine Learning for Navigation
Despite decades of effort to develop classical autonomous navigation systems [quinlan1993elastic, fox1997dynamic], machine learning has recently shown promise for creating competitive end-to-end planners with obstacle avoidance [pfeiffer2017perception], enabling terrain-based navigation [siva2019robot], allowing robots to move around humans [chen2017socially], and tuning parameters for classical navigation systems [xiao2020appld]. All these learning methods require either extensive or high-quality training data, such as that derived from trial-and-error exploration or from human demonstrations, respectively.
To address the difficulty in obtaining extensive or high-quality training data, self-supervised LfH [xiao2020toward] collects training data in an obstacle-free environment with complete safety, and then learns a local navigation policy through synthetically projecting the most constrained C-space () onto the robot perception. However, LfH also requires hallucination of during deployment. This hallucination “on-the-fly” requirement assumes prior knowledge such as a relatively high-resolution global path is available. Furthermore, this previous work [xiao2020toward] only works well when the linear velocity in the training set is mostly constant, due to the fact that the robot always hallucinates navigating in the most constrained scenarios and the trained local planner is therefore relatively conservative. Varying speeds in the same most constrained space will lead to ambiguity, as shown in our experiments (Section IV). In this work, we eliminate the necessity of hallucination “on-the-fly”, and therefore the requirement of an available high-resolution global path. During sober deployment, the local planner only needs a single local goal point, along with the real, rather than hallucinated, perception. In addition, the novel hallucination technique also teaches the robot to vary its speed in response to the real environment, instead of the hallucinated most constrained spaces.
In this section, we present our Hallucinated Learning and Sober Deployment (HLSD) technique for mobile robot navigation. In Sec. III-A, we start with a simplified example, which assumes the robot to be a point mass that follows a relatively simple path, to introduce the idea of a minimal unreachable set, , for a given optimal plan. We provide necessary conditions for being a minimal unreachable set, as the basis for many unreachable sets for the optimal plan. In Sec. III-B, we show how we use one special case, , to represent all other minimal unreachable sets. We also present how we adapt the point mass example to deal with real-world robots. In Sec. III-C, we introduce a sampling method to generate augmentations to the representative minimal unreachable set () to generalize for sober deployment.
Iii-a Point Robot Example
We first present a simplified example, which assumes a point mass robot going through three non-colinear configurations. In this case, the robot’s workspace is exactly the C-space. We use the same notation used by Xiao et al. to formalize LfH [xiao2020toward]: given a robot’s C-space partitioned by unreachable (obstacle) and reachable (free) configurations, , the classical motion planning problem is to find a function that can be used to produce optimal plans that result in the robot moving from the robot’s current configuration to a specified goal configuration without intersecting (the interior of) . Here, a plan is a sequence of low-level actions (i.e. is the robot’s action space). Considering the “dual” problem of finding , LfH [xiao2020toward] includes a method to find the (unique) most constrained unreachable set corresponding to , . In this work, however, we introduce the definition of a (not unique) minimal unreachable set, , corresponding to :
In other words, is a minimal set of obstacles that lead to being an optimal plan. Any path that arises from an optimal plan can be approximated by connected line segments. The simplest case of a robot path following an optimal plan to move from to (except a straight line) is composed of configurations on two straight line segments, and . The intermediate turning point is defined as . Since is one configuration on the robot’s path, we say goes through (Fig. 1). In the following, given a point-mass robot moving from to according to some optimal plan computed by , we show two necessary conditions () for an unreachable set () to be a minimal unreachable set (), to aid in identifying the representative unreachable set () used in Sec. III-B.
If , then the optimal plan for the unreachable set , , must go through .
Assume otherwise. Since (Definition III.1), for , the path arising from is shorter than the one arising from . Since we assume does not go through , adding back to does not affect the feasibility of for and does not change the path length. The path arising from is still shorter than the one arises from in , thus contradicting the optimality of for .
( 1) .111 is a topological space, is a closed set, and is an open set. is a boundary point of both and . The robot can come arbitrarily close to the obstacles while remaining in [lavalle2006planning].
Consider any circle that centers at with radius . intersects at and at (Fig. 1 a). Assume there exists no configuration such that and . Consider the path : since is feasible, then and are feasible. Moreover, by assumption is also feasible. But is shorter than since is shorter than due to triangle inequality. This contradicts the optimality of . Therefore, that belongs to some . Since this is true for , and is a closed set [lavalle2006planning], the limit point .
We name the union of all robot configurations in the triangle defined by , , and as . On the other side of line segment , we name the union of all configurations in the half ellipse, whose focal points locate at and , and whose major axis length is , as . We define (the grey area in Fig. 1).
( 2) , ,
Assume , . Then is a feasible path. The length of is the sum of distances from (inside ellipse) to the two ellipse focal points and , which is, per definition of an ellipse, shorter than its major axis length . This contradicts the optimality of .
Based on the two necessary conditions of (Proposition 1 and III.3), one class of minimal unreachable sets could be simply constructed by connecting with some point on the left boundary of the ellipse with a straight line ( in Fig. 1 a), or two line segments and ( in Fig. 1 b), if not all configurations on are in . In particular, for efficiency, we simply represent all minimal unreachable sets with a special case, (Fig. 1 d), which is all configurations along the straight line and (the reflective symmetry point of with respect to ). Empirical evidence of this approximation’s sufficiency for the purpose of learning will be provided in Sec. IV. Here, we further provide one more observation to help develop intuition regarding how to identify a .
, , .
Assume and , only two possibilities exist:
(1) is outside the entire (left and right half) ellipse whose focal points locate at and , and whose major axis length is : based on Proposition III.1, the optimal plan must go through . The shortest possible path, which goes through , is , if it exists. However, based on the definition of ellipse, for any outside the ellipse, . This contradicts the optimality of .
(2) is inside the entire ellipse, but outside : must be in the right half of the ellipse but outside (Fig. 1 e). Again, must go through . The shortest possible path which goes through is , if it exists, which must intersect either or at some point . Due to substructure optimality (i.e. a sub-path of a shortest path is still a shortest path), the shortest path between any points on must be its sub-path. If is on , then the shortest path from to must be . is longer than , and therefore not optimal. If is on , the shortest path between and must be . is longer than , and therefore not optimal. In both cases, the shortest possible path going through is longer than . This contradicts the optimality of .
Therefore, , , .
Iii-B Realistic Nonholonomic Robot
Based on the propositions discussed in Sec. III-A, we present the hallucination technique for a realistic robot. In realistic scenarios, we approximate all in with . We hypothesize that all are sufficiently similar that using just one leads to learning that is just as good as if we used them all, especially when (1) , , and extracted from realistic trajectories are close to each other, and (2) is instantiated in terms of discrete LiDAR beams. Empirical evidence of this sufficiency will be provided in Sec. IV. However, a realistic robot cannot be modeled as a simple point mass because: (1) the size is not negligible and we need a path for the center of the robot that causes no part of the robot to collide with an obstacle; (2) nonholonomic robots cannot change motion direction instantly, so we need to generalize to a continuously turning path from the piece-wise point mass example. In Sec. III-B, we aim to adapt the point mass example (Sec. III-A) to real-world robots, and therefore need to address these differences.
To address (1), we define one point to the left and another to the right of the centroid of the robot, offset by the robot width, as footprint points, as shown in Fig. 2. The instantaneous linear velocity along the line between these two footprint points is zero for nonholonomic vehicles. The polygon defined by a sequence of footprint points must belong to free space. The actual path executed by the optimal plan follows the middle of this area. To address (2), we define as configurations between the current and goal configurations with a non-zero angular velocity (). Given the current configuration, each , and each ’s next configuration, their left or right footprint points are treated as , , and in the point robot case: based on the sign of , the robot turns left or right, and the left or right footprint points are chosen. For each triple of point robot , , and , for efficiency, we approximate all different with (Fig. 1 d). For a realistic optimal plan with actual (current) and (goal) and multiple point robot triples in between, we define the union of all as .
In particular, we define an “opposite” function of : as the minimal hallucination function.222Note the inverse function does not exist. Technically, can be uniquely determined by and , but we include it as an input to for notational symmetry with . This function finds based on the fact that is an optimal plan. As visualized in Fig. 2, is instantiated in terms of discrete LiDAR range readings. For each LiDAR beam, we define a minimum and a maximum range, which limit the possible range readings of this particular beam based on the optimal plan . The minimum range of all beams is determined by the left and right boundary, as configurations within the boundary must be in . We project all onto the corresponding LiDAR beams and for the beams directly intersect any , the maximum range is set as the distance from the robot to the intersection point. For other beams, the maximum range is simply the LiDAR’s physical limit, or manually pre-processed to a certain threshold.
Iii-C Sampling between Min and Max Range
Given the representative minimal unreachable set, we want to find all possible unreachable sets, or their sensor readings, that could lead to being the optimal plan. Based on the LiDAR scan with a minimum and maximum range for each beam (end of Sec. III-B), a sampling strategy is devised to create many obstacle sets, , in which is optimal.
Our sampling strategy aims at creating different range readings that (1) resemble real-world obstacles, and (2) respect uncertainty/safety. For (1), most real-world obstacles have a certain footprint, and their surface contribute to continuity among neighboring beams. Starting from the first beam, a random range is sampled between min and max with a uniform distribution. Moving on to the neighboring beam, with a probability, we increase, or decrease, the previous range by a small random amount, and assign the value to the current beam. This practice is to simulate the continuity in neighboring beams. We make sure this value is within the min and max range of that beam. With probability , we start from scratch and randomly sample between this beam’s min and max values. This practice is to simulate the scenarios where the next beam misses the current obstacle, and reaches another one or does not reach any obstacle at all. For (2), we add an offset value as a function of the optimal plan to the ranges. For example, given a faster speed of , a larger positive offset is added to the range reading (obstacles are farther away), because faster motion is correlated with more uncertainty or less safety (details can be found in Sec. IV-A). One example scan sample is shown in Fig. 2 as blue ’s.
The entire HLSD pipeline is described in Alg. 1. The inputs to the algorithm are a random exploration policy in open space; the minimal hallucination function ; a sampling count of hallucinated to be generated per data point; the offset function; the probability ; and a parameterized planner . For every data point in collected using in open space (line 2), we hallucinate (line 5) and generate the min and max values for every LiDAR beam (line 6). Lines 8–15 correspond to the sampling technique to generate random laser scans. We instantiate as LiDAR readings and add it to (line 16). This process is repeated sampling count times for every data in . Finally, we train
with supervised learning (line 19). This hallucinated learning enables sober deployment with perception of the real configurationwithout runtime hallucination (Lines 21–22).
Simulated and physical experiments are conducted to validate our hypothesis that HLSD can achieve better performance (faster, smoother, safer) than a classical method and LfH with runtime hallucination. In our experiments, we use a Clearpath Jackal robot, a four-wheeled, differential-drive, Unmanned Ground Vehicle (UGV), running the Robot Operating System (ROS) move_base navigation stack. Its DWA local planner is replaced with HLSD.
: For , instead of requiring a high-resolution global path from the global planner (Dijkstra’s) to construct in LfH [xiao2020toward], we only query a single local goal on the global path 1m away from the robot at each time step, and is the origin in the robot frame (orientation is ignored for simplicity). The UGV is equipped with a 720-dimensional LiDAR with a 270 field of view, and we clip the maximum range to 1m to reduce the input space (). The planning horizon of is set to 1, i.e. only a single actionxiao2020toward].
: randomly picks a target pair and commands the robot to reach that speed with constant increments/decrements considering acceleration limit. After reaching , keeps that command with some probability (0.9) or otherwise generates a new target command. We limit the output m/s and rad/s. During training in a simulated open space, control inputs ( and ) and robot configurations (, , and ) are recorded. Unlike the dataset collected by LfH [xiao2020toward], which mostly contains m/s, our dataset contains a variety of values. 12585 data points are collected in a 505s real-time simulation, including a variety of motions in an open space.
sampling count, , and offset: We set sampling count to 10 and to 0.48. The offset() function linearly maps current in the range [0.3, 1.0]m/s to an offset value between [0, 1]m. Considering the fact that a real robot also needs to turn even in open space due to nonholonomic constraints, as opposed to an ideal point mass robot which does not, we also hallucinate for configurations where m/s. Considering highly constrained spaces, we additionally hallucinate the most constrained for configurations where m/s. So . Training takes less than three minutes on a NVIDIA GeForce GTX 1650 laptop GPU.
After computing with Alg. 1, we use the same Model Predictive Control model in LfH [xiao2020toward] to check for collisions. When a collision is detected, the robot enters a two-phase recovery behavior: it first queries the global path immediately in front of the robot, and rotates to align the robot heading with the tangential direction of the global path. If this recovery behavior is still not safe, the robot backs up at m/s. Since LfH learns only from the most constrained C-space, it requires runtime hallucination to match with training data, and a Turn in Place module to drive the robot out of out-of-distribution scenarios. Neither of those components are required by HLSD. Because our dataset contains varying m/s, the LfH speed modulation module to adapt mostly constant to real environments is not necessary either. The robot is able to react to the real obstacle configuration by using alone because covers a wide range of distributions.
Iv-B Simulated Experiments
We first use a metric ground navigation benchmark dataset with 300 navigation environments [perille2020benchmarking] to compare (1) DWA, (2) original LfH trained on the mostly constant 0.4m/s dataset with hallucinated (LfH 0.4), (3) LfH trained on our varying speed 1.0m/s dataset with (LfH 1.0), (4) HLSD trained on the 0.4m/s dataset with augmentations to (HLSD 0.4), and (5) HLSD trained on the 1.0m/s dataset with (HLSD 1.0). Although DWA’s default max linear velocity is 0.5m/s, for a fair comparison, we set DWA’s max linear velocity to the same as HLSD’s (1.0m/s). We find that by also doubling DWA’s default sampling rate for linear and angular velocity (12 and 40), the robot’s performance is roughly the same as when using the default parameters (but at double the speed). Simulated results are shown in Fig. 4.
In each of the 300 navigation environments generated by Cellular Automaton, the robot navigates between a specified start and goal location without a map (example trials shown in Fig. 3
). We record the traversal time for each trial, with a maximum of 50s. Three trials are conducted for each planner in each environment, resulting in a total of 4500 trials. The difficulty of the 300 environments are ordered from left to right based on the DWA performance (blue line). The performances of other planners are plotted as dots, for which a line is fit using linear regression. LfH 1.0 (red) fails many trials. The reason is learning from a dataset with varying speed and always hallucinating the most constrained spaces will cause ambiguity: the same most constrained C-space can map to different plans, which confuses the learner. LfH 0.4 (magenta) and HLSD 0.4 (cyan) achieve similar performance and are less sensitive to environment difficulty, especially LfH 0.4, but LfH 0.4 requires runtime hallucination and other components. Note DWA has 1.0m/s max speed, while LfH 0.4 and HLSD 0.4 have up to 0.6m/s max speed, modulated from mostly 0.4m/s learned from the dataset. DWA has advantage in easy environments (left) due to its fast speed, but in difficult ones (right), LfH 0.4 and HLSD 0.4 are more stable. Also having 1.0m/s max speed, HLSD 1.0 achieves the best result, outperforming all alternatives across the entire difficulty range. The means and standard deviations of all five planners calculated from all trials are shown in Tab.I
. LfH 1.0 has the largest average time and variance, because the learner is confused by the varying training labels in the same most constrained spaces. DWA has large time and also high variance due to its sampling nature. Again, HLSD 0.4 performs similarly as LfH 0.4 does. HLSD 1.0 still outperforms all other planners.
|DWA||LfH 0.4||LfH 1.0||HLSD 0.4||HLSD 1.0|
Iv-C Physcial Experiments
We also deploy the same set of local planners in a physical test course, five trials each (Fig. 5 top). The results are shown in Tab. II. The complicated obstacle course causes DWA to execute many recovery behaviors, and the robot takes a long average time with large variance to finish the traversal. LfH 0.4 and HLSD 0.4 are both able to successfully navigate the robot through, with similar traversal times. But note that LfH 0.4 requires a fine-resolution global path for hallucination during deployment and the Turn in Place module, while HLSD does not and only reacts to the real obstacles without any other extra components. In this physical obstacle course, LfH 1.0 fails every trial due to multiple collisions. Our HLSD 1.0 achieves the best performance both in terms of average time and standard deviation. HLSD deployment in a natural cluttered environment is shown in Fig. 5 bottom.
|DWA||LfH 0.4||LfH 1.0||HLSD 0.4||HLSD 1.0|
We introduce HLSD, a self-supervised machine learning technique for mobile robot navigation with safety guarantee during training. Similar to LfH [xiao2020toward], the robot safely learns in a completely obstacle-free environment and its perception is hallucinated with obstacle configurations where the actions taken in the free space remain optimal. Instead of overfitting to the most constrained spaces during training and requiring runtime hallucination and other modules to adapt to actual environments, the new HLSD pipeline allows the robot to navigate with the learned planner alone in response to realistic perception. By leveraging a large body of carefully designed hallucinations used for training, the learned planner does not need to deal with many out-of-distribution scenarios and has its own capability to address uncertainty/safety in the real-world during deployment.