Autonomous robotic navigation in off-road environments is important for many tasks, including planetary exploration [2, 18] and search-and-rescue missions . In an off-road setting, a fundamental challenge in the local planning problem is that both the geometry (e.g., positive and negative obstacles) and the semantics (e.g., terrain type, time of year) impact the speed and safety with which a robot can traverse the environment. For example, Fig. 1 shows a mobile robot surrounded by grass and foliage. While a purely geometry-based system might consider much of the scene as obstacles (e.g., since a lidar pointcloud would have many returns from the leaves and grass), the semantics suggest much of the vegetation is sufficiently soft that the vehicle could drive through it, but trees and rocks would likely stop the robot.
To capture these semantics, many approaches train semantic segmentation modules for camera images [35, 36] or lidar pointclouds , which can reduce the dimensionality of the input and enable learning in a low visual fidelity simulation environment [29, 7]. However, the ontology of existing labeled datasets for off-road navigation [35, 39, 11, 19] is a fundamental limitation for capturing traversability. For instance, the 20 and 24 classes in [39, 11] only contain the broad “bush”, “grass”, and “tree” classes for vegetation, so the varying degree of traversability within each class is not captured: some bushes can be driven through, some will slow the robot down, and others will stop the robot. Alternatively,  contains labels with finer granularity such as “traversable grass” and “non-traversable low vegetation”, but these labels are specific to the large vehicle that was in mind during the expensive manual labeling procedure, and thus would not generalize to other vehicles.
Furthermore, because each individual mission or operator can have different objectives, it is important that the risk tolerance of the planner (e.g., whether to take a shortcut that has a 1% chance of causing the robot to get stuck) can be quickly adjusted. Risk-aware planning has been extensively studied, e.g., [21, 27, 6, 3, 15], with different notions of risk such as collision probability and classification uncertainty. Recent works such as [10, 8] have adopted conditional value at risk (CVaR) as the risk metric, and  analyzes in depth why CVaR allows robots to assess risk rationally. Alternatively, other work tries to directly learn policies, dynamics models, or cost functions [31, 32, 4, 25, 37, 12, 17] that satisfy the desired risk tolerance. However, these off-road techniques often require a non-intuitive cost function tuning procedure or a complete re-training of the model to adjust the risk tolerance.
To address these issues and bridge the gap between semantic perception and risk-aware planning, this work proposes a new representation of traversability as a general distribution of robot speed conditioned on environment semantics and the commanded speed. The proposed pipeline first automatically labels a dataset of collected trajectories with realized vehicle speeds to capture the variability in speed outcomes associated with each semantic class. This dataset can then be used to learn a distribution of achievable speeds, conditioned on the commanded speed and semantics of the nearby terrain. The learned speed distribution is then converted into a speed map representation that can be leveraged with various planning paradigms. Notably, we incorporate risk-awareness into the planner via CVaR and show how to adjust the risk without collecting any extra data or re-training the learned model.
The contributions of this work include: i) a new representation of traversability as a probability distribution of speeds the robot could achieve, which can be learned from data and has a physically meaningful interpretation in m/s; ii) a new risk-aware minimum-time planner based on Model Predictive Path Integral (MPPI) control that uses the learned speed distribution map, allowing the risk level to be adjusted without re-training or collecting more trajectories; iii) a demonstration, using a full autonomy stack in a high-fidelity Unity simulation environment, of a robot reaching its goal with up to % improvement in success rate over a risk-unaware algorithm.
II Related Work
Traversability analysis can be achieved via both proprioceptive and exteroceptive  sensors, where the former category includes IMUs that measure vibration and orientation of the robot [20, 22], and the latter category includes lidar and RGB cameras that provide geometric and semantic understanding of the environment. Purely geometry-based analysis has been widely adopted, e.g., [14, 23, 24, 8], and often involves a weighted sum of costs extracted from geometric properties such as slope, roughness, and step height. Notably, a geometry-based methodology has been demonstrated successfully in the DARPA Subterranean Challenge , where the cost function is assumed to be Gaussian, allowing easy computation and adjustment of the risk level via CVaR. In contrast, this work proposes a new representation of traversability based on experience and brings semantics into the problem.
Among methods that combine semantic-based and geometry-based techniques,  proposes a fusion strategy that uses geometry-based cost (slope added to step height) for the terrain, unless the associated semantic label is known to be undesirable.  uses both geometric and semantic layers in a multi-layer costmap, and fuses the costs by accounting for layer uncertainty.  classifies a dense 3D pointcloud to extract traversability labels (Free, Low Cost, Medium Cost, Lethal), where the ground truth labels are designed based on human expertise. These methods either require human expertise in associating semantics with traversability, or require combining costs with different units, which makes tuning non-intuitive. In contrast, this work uses vehicle speed as a common unit in the cost function to enable intuitive risk level adjustment.
Other recent work proposes methods to learn navigation policies or cost functions from experience via imitation learning [31, 32, 4, 25], inverse reinforcement learning [33], or model-free reinforcement learning . While these methods leverage datasets or simulators to reduce some of the expert knowledge requirements, a key limitation is that adjusting the risk tolerance could require collecting a new set of expert trajectories and/or re-training the learned models. Alternatively, [12, 17] learn a model that predicts “events” (e.g., bumpy, collision, smooth) from a diverse dataset of experiences. Because the model predicts the probability of undesirable events, the riskiness of the planner in  can be adjusted by changing the penalty for these events without re-training the network. However, the cost terms in  have different units, such as terrain bumpiness and goal proximity, which makes it difficult to convert a risk tolerance into reward function weights. This work also learns from a dataset of experienced trajectories, but instead proposes a pipeline to produce speed maps that can be incorporated into many planning paradigms.
For fast off-road navigation, time-to-goal is a typical performance measure that depends on the quality of the planned trajectory and on how well the robot executes the planned maneuvers. A good assessment of terrain traversability allows a planner to generate trajectories that are both fast and executable. To this end, the discrepancy between the vehicle’s planned speed and realized speed provides a natural quantity to describe the traversability of terrain. Importantly, traversability is probabilistic in nature, due to imperfect sensing, broad semantic class labels for terrain, and the dynamics of vehicle-terrain interactions. Therefore, we propose to capture traversability via a conditional distribution of realized speed given the commanded speed and sensor observation.
III-A Traversability as a Conditional Speed Distribution
Denoting the set of realized speeds as $\mathcal{V}$ and the set of possible observations about a terrain patch as $\mathcal{O}$, we define traversability of the terrain as the conditional distribution

$$p_\theta(v \mid u, o), \tag{1}$$

where $v, u \in \mathcal{V}$ are the realized speed and the commanded speed, $o \in \mathcal{O}$ is the observation about the terrain, and $p_\theta$ is a probability distribution parameterized by $\theta$, which in practice can be learned via a neural network. This representation is general enough to capture multi-modal distributions and allows the planner to extract desired statistics for trajectory planning, such as the mean, modes, and variance.
For high-speed navigation in a cluttered environment (e.g., a forest), the planner often has to trade off opportunities to reduce navigation time against the risk of colliding with an obstacle or getting stuck. To quantify these risks, we adopt the Conditional Value at Risk (CVaR), which satisfies a group of axioms important for rational risk assessment . Letting $V$ denote the random realized speed, the Conditional Value at Risk at level $\alpha \in (0, 1]$ is defined as:

$$\mathrm{CVaR}_\alpha(V) = \mathbb{E}\left[\, V \mid V \le \mathrm{VaR}_\alpha(V) \,\right], \tag{2}$$

where $\mathrm{VaR}_\alpha(V) = \inf\{ v \mid P(V \le v) \ge \alpha \}$ is the Value at Risk, or the $\alpha$-quantile of the speed distribution.
Note that we define CVaR to capture the worst-case speed outcomes (i.e., the lowest speeds), as visualized in Fig. 2. Intuitively, $\mathrm{CVaR}_\alpha(V)$ measures the average of the speed outcomes that are lower than the $\alpha$-quantile of the speed distribution, capturing the worst-case expected speed. Notice that CVaR is the same as the mean when $\alpha = 1$, so a low $\alpha$ is often picked to obtain a value sufficiently distinct from the expectation.
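For a discretized speed distribution, such as a PMF over speed bins, these definitions can be evaluated directly. The Python sketch below is one possible way to do so, not the authors' implementation: it treats the distribution as discrete (speed, probability) atoms and, as a common convention for discrete distributions, lets the atom that straddles the $\alpha$-quantile contribute only the fraction of its mass needed to reach $\alpha$, so the result is the average of the lowest $\alpha$ fraction of outcomes. The function name `var_cvar` is hypothetical.

```python
# Illustrative sketch; the function name and conventions are hypothetical.
from typing import Sequence, Tuple


def var_cvar(speeds: Sequence[float], probs: Sequence[float],
             alpha: float) -> Tuple[float, float]:
    """Lower-tail VaR and CVaR of a discrete speed distribution.

    speeds: support points in m/s; probs: matching probabilities summing to 1.
    Low speeds are the bad outcomes, so the tail is taken from the bottom.
    """
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must lie in (0, 1]")
    pairs = sorted(zip(speeds, probs))  # ascending speed: worst outcomes first
    mass_left = alpha   # probability mass still to be collected from the tail
    tail_sum = 0.0      # probability-weighted sum of the collected speeds
    var = pairs[-1][0]  # fallback if rounding leaves a sliver of mass
    for v, p in pairs:
        take = min(p, mass_left)  # the straddling atom contributes partially
        tail_sum += take * v
        mass_left -= take
        if mass_left <= 1e-12:
            var = v  # smallest speed whose CDF reaches alpha
            break
    return var, tail_sum / alpha
```

For a hypothetical bimodal "stuck or not" cell with speeds {0, 2} m/s and probabilities {0.2, 0.8}, `var_cvar([0.0, 2.0], [0.2, 0.8], 0.1)` returns `(0.0, 0.0)`, while $\alpha = 1$ gives a CVaR of 1.6 m/s, equal to the mean.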
III-B Generating a Risk-Aware Traversability Map
Next, we show how a learned speed distribution from Eq. 1 can be used to convert a semantic map (built from the robot’s sensor data) into a representation of traversability for the planner.
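The architecture is illustrated in Fig. 3. The input to the pipeline is a semantic gridmap, $\mathcal{S}$, with $K$ semantic classes, width $W$, and height $H$. Let $\mathcal{M}$ represent an $L$-layer speed map, where each layer has width $W$ and height $H$. For map $\mathcal{M}$, we denote $\mathcal{M}_{l,i,j}$ as the cell value in layer $l$, row $i$, and column $j$. Given the speed limits of $[v_{\min}, v_{\max}]$, we let the $l$-th layer correspond to the commanded speed range of $[v_{\min} + (l-1)\Delta v,\; v_{\min} + l\,\Delta v)$, where $\Delta v = (v_{\max} - v_{\min})/L$. Lastly, we associate the distribution $p_\theta(v \mid u_l, o_{i,j})$ to each cell indexed by $(l, i, j)$, where $u_l$ is a commanded speed in the range of layer $l$ and $o_{i,j}$ denotes the observation about the terrain patch that lies in the cell. Note that any cell with unknown traversability should be marked, e.g., with a negative number.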
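The layered map structure can be sketched as follows. The sketch assumes 0-based indices in code (layer k here is the (k+1)-th layer in the text) and assumes the top speed v_max is assigned to the last layer so that the whole range [v_min, v_max] is covered. The helper names and the unknown-cell marker value of -1 are illustrative choices, not taken from the paper.

```python
# Illustrative sketch; helper names and the UNKNOWN marker are hypothetical.
from typing import List, Optional, Tuple

UNKNOWN = -1.0  # marker for cells with unknown traversability


def layer_bounds(v_min: float, v_max: float,
                 num_layers: int) -> List[Tuple[float, float]]:
    """Commanded-speed range [lo, hi) covered by each layer."""
    dv = (v_max - v_min) / num_layers
    return [(v_min + k * dv, v_min + (k + 1) * dv) for k in range(num_layers)]


def layer_index(v: float, v_min: float, v_max: float,
                num_layers: int) -> Optional[int]:
    """0-based layer for commanded speed v, or None if v is out of range."""
    if not v_min <= v <= v_max:
        return None
    dv = (v_max - v_min) / num_layers
    return min(int((v - v_min) // dv), num_layers - 1)  # v_max -> last layer


def empty_speed_map(num_layers: int, height: int,
                    width: int) -> List[List[List[float]]]:
    """L x H x W speed map indexed [layer][row][column], all UNKNOWN."""
    return [[[UNKNOWN] * width for _ in range(height)]
            for _ in range(num_layers)]
```

With v_min = 0, v_max = 2 m/s and four layers, `layer_index(1.0, 0.0, 2.0, 4)` selects layer 2, which covers [1.0, 1.5) m/s.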
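Although it is up to the user to populate the map cell values based on the associated speed distribution, it is important that the values provide interpretability and allow the user to easily adjust the riskiness of the planner. To this end, we propose to use the convex combination of the mean and the CVaR at level $\alpha$ of the underlying speed distribution:

$$\mathcal{M}_{l,i,j} = \beta \, \mathrm{CVaR}_\alpha(V_{l,i,j}) + (1 - \beta) \, \mathbb{E}[V_{l,i,j}],$$

where $V_{l,i,j}$ is the realized speed distributed according to the speed distribution associated with cell $(l, i, j)$, and $\beta \in [0, 1]$ is the risk weight. Intuitively, $\mathcal{M}_{l,i,j}$ can be interpreted as the risk-adjusted speed, which lies between the CVaR speed estimate and the mean speed estimate because it is a convex combination of the two: $\mathrm{CVaR}_\alpha(V_{l,i,j}) \le \mathcal{M}_{l,i,j} \le \mathbb{E}[V_{l,i,j}]$.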
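As a sketch of how a single cell value could be computed from a discretized speed PMF, the function below forms the convex combination of the mean and the lower-tail CVaR. It assumes the distribution is given as discrete (speed, probability) atoms and uses a fractional-atom convention for CVaR (the atom straddling the $\alpha$-quantile contributes only part of its mass); `risk_adjusted_speed` is a hypothetical name. Sweeping the risk weight β from 0 to 1 moves the output from the mean to the CVaR without touching the learned distribution.

```python
# Illustrative sketch; the function name is hypothetical.
from typing import Sequence


def risk_adjusted_speed(speeds: Sequence[float], probs: Sequence[float],
                        alpha: float, beta: float) -> float:
    """beta * CVaR_alpha + (1 - beta) * mean of a discrete speed distribution."""
    if not 0.0 < alpha <= 1.0 or not 0.0 <= beta <= 1.0:
        raise ValueError("need alpha in (0, 1] and beta in [0, 1]")
    mean = sum(v * p for v, p in zip(speeds, probs))
    # Lower-tail CVaR: average of the lowest alpha fraction of probability mass.
    mass_left, tail_sum = alpha, 0.0
    for v, p in sorted(zip(speeds, probs)):
        take = min(p, mass_left)
        tail_sum += take * v
        mass_left -= take
        if mass_left <= 1e-12:
            break
    cvar = tail_sum / alpha
    return beta * cvar + (1.0 - beta) * mean
```

For the hypothetical bimodal cell with speeds {0, 2} m/s, probabilities {0.2, 0.8}, and α = 0.1, the values β = 0, 0.5, 1 give 1.6, 0.8, and 0 m/s.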
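For convenience, we define a look-up function $f$ that returns the risk-adjusted speed estimate in the multi-layer speed map given a position $\mathbf{p}$ and speed $v$:

$$f(\mathbf{p}, v) = \begin{cases} \mathcal{M}_{l,i,j} & \text{if } (\mathbf{p}, v) \text{ lies in the map and the cell is not marked unknown}, \\ 0 & \text{otherwise}, \end{cases}$$

where $(l, i, j)$ are the map indices corresponding to position $\mathbf{p}$ and speed $v$ if $(\mathbf{p}, v)$ lies in the map. Note that speed $0$ is returned when the look-up is not valid, which is desirable when the unknown region can be dangerous, such as water. However, if domain knowledge suggests that the unknown region is benign, a more optimistic value can be returned, such as the query speed $v$.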
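A minimal Python sketch of such a look-up follows, under assumptions not stated in the text: the map is a nested list indexed [layer][row][column], rows run along y and columns along x from a hypothetical map `origin` with square cells of side `resolution`, the commanded-speed layers split [v_min, v_max] evenly (with v_max in the last layer), and negative cell values mark unknown traversability. The hypothetical `unknown_is_benign` flag switches between a pessimistic fallback of 0 m/s (an assumption consistent with treating unknown regions as dangerous) and the optimistic fallback of the query speed.

```python
# Illustrative sketch; the map layout and all parameter names are hypothetical.
import math
from typing import List, Tuple


def lookup(speed_map: List[List[List[float]]], p: Tuple[float, float], v: float,
           origin: Tuple[float, float], resolution: float,
           v_min: float, v_max: float, unknown_is_benign: bool = False) -> float:
    """Risk-adjusted speed at position p for commanded speed v."""
    # Value returned when the query is outside the map or the cell is unknown.
    fallback = v if unknown_is_benign else 0.0
    num_layers = len(speed_map)
    height, width = len(speed_map[0]), len(speed_map[0][0])
    if not v_min <= v <= v_max:
        return fallback
    dv = (v_max - v_min) / num_layers
    layer = min(int((v - v_min) // dv), num_layers - 1)
    col = math.floor((p[0] - origin[0]) / resolution)  # x -> column
    row = math.floor((p[1] - origin[1]) / resolution)  # y -> row
    if not (0 <= row < height and 0 <= col < width):
        return fallback
    value = speed_map[layer][row][col]
    return value if value >= 0.0 else fallback  # negative marks unknown
```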
IV Minimum Time Navigation
We adopt the MPPI controller proposed in [40, Algorithm 2] for minimum time planning. MPPI is an information theoretic model predictive control (MPC) algorithm that tries to approximate the mean of the optimal control distribution via weighted samples of a Gaussian proposal distribution in order to minimize the KL-divergence between the two. This approach is attractive because it is derivative-free and works with general cost functions and dynamics. Next, we follow the notation in  and propose a new cost function for min-time planning using the proposed risk-aware traversability map. Fig. 4 illustrates a high-level overview of our strategy and the notation used.
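Consider the discrete time stochastic system:

$$\mathbf{x}_{t+1} = F(\mathbf{x}_t, \tilde{\mathbf{u}}_t), \tag{6}$$

where $\mathbf{x}_t \in \mathbb{R}^n$ is the state vector and $\tilde{\mathbf{u}}_t \sim \mathcal{N}(\mathbf{u}_t, \Sigma)$ is the noisy realization of the nominal control input $\mathbf{u}_t \in \mathbb{R}^m$. Given the initial condition $\mathbf{x}_0$, a sequence of inputs $\tilde{\mathbf{u}}_{0:T-1}$ leads to the state trajectory $\mathbf{x}_{0:T}$ according to the dynamics (6). For the purpose of min-time planning, we assume it is possible to extract the planar position $\mathbf{p}_t$ and speed $v_t$ from state $\mathbf{x}_t$. Furthermore, let $\mathbb{1}_{\mathrm{goal}}(\mathbf{x}_{0:t})$ be an indicator function that returns 1 when any state $\mathbf{x}_k$ with $k \le t$ has reached the goal at position $\mathbf{p}_{\mathrm{goal}}$, and returns 0 otherwise.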
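The min-time state-dependent objective is defined as

$$S(\mathbf{x}_{0:T}) = \phi(\mathbf{x}_{0:T}) + \sum_{t=0}^{T-1} q(\mathbf{x}_{0:t}), \tag{7}$$

where $\phi$ and $q$ are the time-to-go and the stage cost, respectively:

$$\phi(\mathbf{x}_{0:T}) = \left(1 - \mathbb{1}_{\mathrm{goal}}(\mathbf{x}_{0:T})\right) \frac{\lVert \mathbf{p}_T - \mathbf{p}_{\mathrm{goal}} \rVert_2}{v_{\mathrm{default}}}, \qquad q(\mathbf{x}_{0:t}) = \left(1 - \mathbb{1}_{\mathrm{goal}}(\mathbf{x}_{0:t})\right) \frac{v_t}{f(\mathbf{p}_t, v_t)} \, \Delta t,$$

with $v_{\mathrm{default}}$ being the default speed for estimating time-to-go at the end of the rollout, and $\Delta t$ being the sampling duration. Intuitively, the lower the risk-adjusted speed $f(\mathbf{p}_t, v_t)$ is, the more the nominal stage cost $\Delta t$ is scaled up, indicating a longer time to travel. If any state reaches the goal within the horizon, all subsequent states incur neither stage cost nor time-to-go.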
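The cost of a single rollout can be sketched as below. This is an illustration under stated assumptions rather than the paper's code: the stage cost is taken to be the sampling duration scaled by the ratio of commanded speed to risk-adjusted speed (so a lower risk-adjusted speed inflates the cost, as described above), a zero or negative risk-adjusted speed makes the rollout infinitely expensive, and reaching the goal means entering a circle of radius `goal_tol` around it. The look-up is passed in as any callable `f(p, v)`; all names are hypothetical.

```python
# Illustrative sketch; all names and the stage-cost form are assumptions.
import math
from typing import Callable, Sequence, Tuple

Point = Tuple[float, float]


def min_time_cost(positions: Sequence[Point], speeds: Sequence[float],
                  goal: Point, goal_tol: float, dt: float, v_default: float,
                  f: Callable[[Point, float], float]) -> float:
    """Stage costs for t = 0..T-1 plus time-to-go from the final state.

    positions holds p_0..p_T; speeds holds at least v_0..v_{T-1}.
    """
    total = 0.0
    for t in range(len(positions) - 1):
        if math.dist(positions[t], goal) <= goal_tol:
            return total  # goal reached: no further stage cost or time-to-go
        v_risk = f(positions[t], speeds[t])
        if v_risk <= 0.0:
            return math.inf  # untraversable or unknown cell
        total += dt * speeds[t] / v_risk  # nominal dt inflated by slow terrain
    final_dist = math.dist(positions[-1], goal)
    if final_dist > goal_tol:
        total += final_dist / v_default  # time-to-go at a default speed
    return total
```

With a look-up that always halves the commanded speed, two 1 s steps at 1 m/s cost 2 s each, so a rollout ending 8 m short of the goal with a 1 m/s default speed costs 12 s.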
At each time step and given the nominal control sequence, MPPI estimates the mean of the optimal control distribution as the weighted sum of sampled control rollouts, where each perturbed control sequence is sampled from a Gaussian distribution centered at the nominal controls. The weight of each rollout is the exponentiated negative cost (7) evaluated along the induced state trajectory. The algorithm runs in a receding-horizon fashion, where the nominal control sequence in the next round is set to the newest estimate of the mean of the optimal control distribution.
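The weighted update can be sketched in a few lines. This simplified Python version assumes the rollout costs have already been computed (for example with a min-time cost) and omits the control-cost term of the full MPPI algorithm; the temperature `lam` and the function name are hypothetical. Subtracting the minimum cost before exponentiating is a standard trick for numerical stability and does not change the normalized weights.

```python
# Illustrative sketch; the name and the simplified weighting are assumptions.
import math
from typing import List, Sequence


def mppi_update(nominal: Sequence[Sequence[float]],
                noise: Sequence[Sequence[Sequence[float]]],
                costs: Sequence[float], lam: float) -> List[List[float]]:
    """Return the new nominal control sequence.

    nominal: T x m nominal controls; noise: one T x m perturbation per rollout;
    costs: the cost of each perturbed rollout; lam: temperature (> 0).
    """
    finite = [c for c in costs if math.isfinite(c)]
    if not finite:
        return [list(u_t) for u_t in nominal]  # every rollout failed: keep plan
    rho = min(finite)  # shift costs so the best rollout has weight exp(0) = 1
    raw = [math.exp(-(c - rho) / lam) if math.isfinite(c) else 0.0
           for c in costs]
    eta = sum(raw)
    weights = [r / eta for r in raw]
    # Weighted average of the perturbations, added to the nominal sequence.
    return [[u + sum(w * eps[t][d] for w, eps in zip(weights, noise))
             for d, u in enumerate(u_t)]
            for t, u_t in enumerate(nominal)]
```

Rollouts with infinite cost (e.g., entering an unknown region under a pessimistic look-up) receive zero weight and therefore do not pull the nominal sequence toward them.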
In an off-road environment, where the geometric and semantic properties are hard to assess, traversability can be highly non-Gaussian. For instance, if terrain is classified as vegetation but there is no distinction between dense bushes and soft grass, the speed outcome may exhibit a bimodal distribution (i.e., being stuck or not). With this core issue in mind, we first validate our approach of capturing speed distributions in a grid world (Section V-A), and then integrate the proposed planner into a full autonomy stack in a high-fidelity Unity environment (Section V-B). Given this environment and autonomy stack, Section V-C describes the process of collecting a training dataset and training a neural network to predict PMFs of the speed distribution. Then, Section V-D demonstrates the improved navigation performance resulting from the learned speed maps and risk-awareness.
V-A Grid World Navigation
To validate that a speed distribution representation of traversability can be incorporated into a risk-aware planner and lead to improved performance, we first designed a grid world (see upper left of Fig. 5), where each cell is associated with a semantic type (either vegetation or dirt). The task is to navigate from the start position to the goal position in minimum time by planning a sequence of actions chosen from  that move the robot to a neighboring cell in the corresponding directions with nominal speed of . Although every action is deterministic, the actual traversal time is stochastic due to the underlying speed distribution of the traversed cell, as shown in Fig. 5 (upper right). Note that we use a single-layer traversability map (i.e., $L = 1$) and denote each value as $\mathcal{M}_{i,j}$.
To find the min-time trajectory, we use a best-first search algorithm  with a prioritized search queue to which new nodes with reachable states given actions are added. The stage cost of the map cell at row $i$ and column $j$ is the estimated traversal time , where CVaR is fixed at level . We compare the performance of the planner with three values of the risk weight $\beta$, which correspond to using the mean speed, the risk-adjusted speed, and the (pure) risk speed. The optimal trajectories and their average time-to-goal values over trials are visualized in Fig. 5 (upper left and bottom). By only capturing the mean speeds ($\beta = 0$), the planner’s performance has high variance because it does not consider the worst case. By only accounting for risk ($\beta = 1$), the planner’s performance has very low variance, but it is also highly conservative. By utilizing both the risk and the mean of the speed distribution ($0 < \beta < 1$), the planner’s riskiness can be chosen to achieve a shorter average time-to-goal with a slight increase in variance. This experiment demonstrates that representing traversability as a speed distribution can be incorporated into a risk-aware planner and lead to improved performance.
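A compact sketch of a grid-world planner of this kind follows, using uniform-cost search (a best-first search ordered by accumulated traversal time) with Python's `heapq`. It assumes a 4-connected grid, a hypothetical `cell_size`, and that entering a cell costs `cell_size / speed`, where speed is the cell's risk-adjusted value for the chosen β; cells with non-positive speed are treated as blocked. These are illustrative choices, not the exact setup used in the experiment.

```python
# Illustrative sketch; connectivity, cell_size, and names are assumptions.
import heapq
import math
from typing import Dict, List, Optional, Tuple

Cell = Tuple[int, int]


def min_time_path(speed: List[List[float]], start: Cell, goal: Cell,
                  cell_size: float = 1.0) -> Tuple[float, List[Cell]]:
    """Uniform-cost search; entering cell (i, j) costs cell_size / speed[i][j]."""
    rows, cols = len(speed), len(speed[0])
    frontier: List[Tuple[float, Cell]] = [(0.0, start)]
    best: Dict[Cell, float] = {start: 0.0}
    parent: Dict[Cell, Optional[Cell]] = {start: None}
    while frontier:
        t, cell = heapq.heappop(frontier)
        if cell == goal:  # first pop of the goal is optimal for positive costs
            path: List[Cell] = []
            node: Optional[Cell] = cell
            while node is not None:
                path.append(node)
                node = parent[node]
            return t, path[::-1]
        if t > best[cell]:
            continue  # stale queue entry
        i, j = cell
        for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            ni, nj = i + di, j + dj
            if 0 <= ni < rows and 0 <= nj < cols and speed[ni][nj] > 0.0:
                nt = t + cell_size / speed[ni][nj]
                if nt < best.get((ni, nj), math.inf):
                    best[(ni, nj)] = nt
                    parent[(ni, nj)] = cell
                    heapq.heappush(frontier, (nt, (ni, nj)))
    return math.inf, []  # goal unreachable
```

Passing grids of risk-adjusted speeds computed with different β values lets the same search trade expected traversal time against worst-case traversal time.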
V-B Integration with Autonomy Stack in High-Fidelity Environment
Next, we integrate the proposed methods into a full autonomy stack  in a high-fidelity Unity simulation environment with a Clearpath Warthog platform , as shown in Fig. 6. To focus on the challenge of coarse semantic labeling and its impact on traversability analysis, the Unity environment contains dirt and vegetation terrain types, where roughly of the bushes (classified as vegetation) on grass are non-traversable (i.e., the robot cannot drive through them). These bushes slow down navigation, and the resulting wheel slip has the side effect of causing substantial drift in the vehicle’s pose estimate.
V-C Data Collection and Network Training
In order to learn the traversability model defined in (1), samples consisting of tuples of (commanded speed, terrain type, realized speed) were gathered with a joystick-controlled robot, where the joystick provided the commanded speeds, and the traversed terrain type and true speed were taken directly from the Unity simulation engine. The robot drove for minutes in the training area (Fig. 6 left), resulting in about and samples associated with the vegetation and dirt terrain, respectively. The dataset was used to train a multi-layer feedforward NN with hidden layers of and output speed bins between m/s and the maximum speed m/s via a softmax layer. The network was trained for epochs with the Adam optimizer and a learning rate of , resulting in the speed distribution maps visualized in Fig. 7, where the rows correspond to binned commanded speeds and their resultant PMFs of speed outcomes.
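The learning target can be illustrated without a neural network: the sketch below bins the collected (commanded speed, terrain type, realized speed) tuples and normalizes the counts into a PMF per terrain type and commanded-speed bin. This count-based estimator is a swapped-in stand-in for the feedforward network, not the method used in the paper, and it only makes sense when every (terrain, speed bin) combination has enough samples. The bin counts, `v_max`, and the optional additive smoothing are hypothetical parameters.

```python
# Illustrative stand-in for the learned model; names and bins are hypothetical.
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def speed_bin(v: float, v_max: float, num_bins: int) -> int:
    """Index of the equal-width bin in [0, v_max] containing v (clamped)."""
    v = min(max(v, 0.0), v_max)
    return min(int(v / v_max * num_bins), num_bins - 1)


def empirical_pmfs(samples: Iterable[Tuple[float, str, float]], v_max: float,
                   num_cmd_bins: int, num_out_bins: int,
                   smoothing: float = 0.0) -> Dict[Tuple[str, int], List[float]]:
    """Map (terrain, commanded-speed bin) -> PMF over realized-speed bins."""
    counts: Dict[Tuple[str, int], List[float]] = defaultdict(
        lambda: [0.0] * num_out_bins)
    for commanded, terrain, realized in samples:
        key = (terrain, speed_bin(commanded, v_max, num_cmd_bins))
        counts[key][speed_bin(realized, v_max, num_out_bins)] += 1.0
    pmfs = {}
    for key, c in counts.items():
        total = sum(c) + smoothing * num_out_bins  # additive (Laplace) smoothing
        pmfs[key] = [(x + smoothing) / total for x in c]
    return pmfs
```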
During deployment, top-down semantic images of the environment with m cell resolution were processed by the network, as illustrated in Fig. 3. The training and testing areas contain mostly dirt and vegetation terrain types, and unknown semantic types were assumed to induce m/s. Note that the semantic map and every commanded speed are paired and reshaped into a large batch input to the network. A layer speed distribution map with output speed bins can be evaluated in under ms using the CPU (all runtimes reported on a desktop computer with an Intel i7-7700K CPU and 32GB RAM). The risk-aware speed maps can be extracted from the speed distribution maps as the convex combination of the PMF mean and CVaR. When the entire autonomy stack was running and competing for CPU, the mean and CVaR took ms and ms to compute, respectively. The CVaR was computed via a rectangular approximation of the density within each output speed bin (software optimization and GPU parallelization could likely reduce the CVaR calculation times substantially). Due to computational constraints, the traversability map was published to the MPPI local planner at Hz.
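The rectangular approximation can be sketched as follows, assuming each output speed bin [lo, hi) spreads its probability uniformly over the bin (piecewise-constant density). Under that assumption the mean uses bin midpoints, and the lower-tail CVaR takes whole bins from the slowest upward plus only the required slice of the bin that contains the α-quantile. The function name and argument layout are illustrative, not the authors' code.

```python
# Illustrative sketch; the name and argument layout are hypothetical.
from typing import Sequence, Tuple


def pmf_mean_cvar(edges: Sequence[float], probs: Sequence[float],
                  alpha: float) -> Tuple[float, float]:
    """Mean and lower-tail CVaR_alpha of a binned speed PMF.

    edges: increasing bin edges b_0 < ... < b_N (m/s); probs: N bin masses.
    Density is assumed constant inside each bin.
    """
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must lie in (0, 1]")
    bins = list(zip(edges[:-1], edges[1:], probs))
    mean = sum(p * 0.5 * (lo + hi) for lo, hi, p in bins)
    mass_left, tail_sum = alpha, 0.0
    for lo, hi, p in bins:  # slowest bins first
        if p <= 0.0:
            continue
        take = min(p, mass_left)
        frac = take / p  # share of the bin's width that lies inside the tail
        # The mean of the uniform slice [lo, lo + frac * (hi - lo)] is its midpoint.
        tail_sum += take * (lo + 0.5 * frac * (hi - lo))
        mass_left -= take
        if mass_left <= 1e-12:
            break
    return mean, tail_sum / alpha
```

For example, with all mass uniformly in [1, 2) m/s, the lowest half of the mass spans [1, 1.5), so CVaR at α = 0.5 is 1.25 m/s while the mean is 1.5 m/s.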
V-D Min-Time Navigation Benchmark
In order to benchmark the effectiveness of learned risk-aware speed maps, the planner is tasked to navigate the robot from a set of pre-specified starting positions to goal positions in an unseen test environment, as illustrated in Fig. 6. Each pair of starting point and goal was repeated for 3 trials over a range of risk weights . The goal tolerance was set to be a m circle and the longest distance the robot had to travel was about m. A timeout period of s was imposed to terminate the trials where the robot got stuck or disoriented due to collisions with bushes. The planner models the Clearpath Warthog robot as a differential drive robot whose control input consists of two wheel speeds. During each MPPI optimization round, control rollouts were sampled over a s horizon at Hz with a noise standard deviation of rad/s for each wheel. The estimated optimal control distribution was iteratively refined as MPPI ran in a receding-horizon fashion. A small default speed of m/s was used for estimating time-to-go to encourage the robot to approach the goal.
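A differential-drive rollout model of this kind can be sketched with standard unicycle kinematics driven by left and right wheel angular speeds. The wheel radius, track width, forward-Euler integration, and the helper names are assumptions for illustration; `sigma` stands in for the per-wheel noise standard deviation mentioned above. The returned positions and forward speeds are the quantities a min-time cost needs from each rollout.

```python
# Illustrative sketch; geometry parameters and names are hypothetical.
import math
import random
from typing import List, Sequence, Tuple

WheelCmd = Tuple[float, float]  # (left, right) wheel angular speeds in rad/s


def sample_wheel_cmds(nominal: Sequence[WheelCmd], sigma: float,
                      rng: random.Random) -> List[WheelCmd]:
    """Perturb each wheel speed with independent Gaussian noise."""
    return [(wl + rng.gauss(0.0, sigma), wr + rng.gauss(0.0, sigma))
            for wl, wr in nominal]


def rollout(state: Tuple[float, float, float], cmds: Sequence[WheelCmd],
            dt: float, wheel_radius: float, track_width: float
            ) -> Tuple[List[Tuple[float, float]], List[float]]:
    """Forward-Euler differential-drive rollout from (x, y, heading)."""
    x, y, th = state
    positions, speeds = [(x, y)], []
    for w_l, w_r in cmds:
        v = wheel_radius * (w_r + w_l) / 2.0              # forward speed (m/s)
        omega = wheel_radius * (w_r - w_l) / track_width  # yaw rate (rad/s)
        speeds.append(v)
        x += v * math.cos(th) * dt
        y += v * math.sin(th) * dt
        th += omega * dt
        positions.append((x, y))
    return positions, speeds
```

Equal wheel speeds drive straight, while opposite wheel speeds turn in place with zero forward speed.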
The benchmark results are shown in Fig. 8, which contains the success rate and the average speed over the successful trials for a range of risk weights. As the risk weight $\beta$ increases from to , the robot has a lower average speed but a higher success rate (). Example rollouts produced by and are shown in Fig. 9, where the risk-aware trajectories () overlap more with the dirt terrain (lower risk), whereas the planner that only accounts for the expectation ($\beta = 0$) prefers shorter paths that overlap more with vegetation (higher risk). As a result, one of the trajectories of the latter planner led to a collision with bushes, which caused localization errors and a failure to reach the goal.
VI Conclusion & Future Work
This work proposed a new notion of traversability as the distribution of speeds achievable by a robot, conditioned on the environment semantics and the commanded speed. This representation can be learned directly from experienced trajectories and can be incorporated into various planning paradigms as a speed map. The proposed planning strategy was shown to lead to a shorter average time-to-goal compared to other methods that did not consider the worst case. Lastly, the proposed risk-aware strategy led to a higher success rate in a minimum-time navigation task in a high-fidelity simulator.
One area of future work is in automatically tuning the risk parameter online, based on differences between realized speed and commanded speed. Additionally, the work could be extended to capture other forms of uncertainty, such as from out-of-distribution inputs or from probabilistic outputs from the semantic segmentation module. Finally, a learned cost-to-go estimator could be used to improve the cost assigned to the end of each MPPI rollout for improved performance.
Research was sponsored by the Army Research Office and was accomplished under Cooperative Agreement Number W911NF-21-2-0150. The views and conclusions contained in this document are those of the authors and should not be interpreted as representing the official policies, either expressed or implied, of the Army Research Office or the U.S. Government. The U.S. Government is authorized to reproduce and distribute reprints for Government purposes notwithstanding any copyright notation herein.
-  (2022) ARL autonomy stack. Cited by: §V-B.
-  (1989) Ambler: an autonomous rover for planetary exploration. Computer 22 (6), pp. 18–26. Cited by: §I.
-  (2011) Chance-constrained optimal path planning with obstacles. IEEE Transactions on Robotics 27 (6), pp. 1080–1094. Cited by: §I.
-  (2016) End to end learning for self-driving cars. arXiv preprint arXiv:1604.07316. Cited by: §I, §II.
-  (2022) Clearpath warthog. Cited by: §V-B.
-  (2011) A minimum risk approach for path planning of uavs. Journal of Intelligent & Robotic Systems 61 (1), pp. 203–219. Cited by: §I.
-  (2019) Planning beyond the sensing horizon using a learned context. In 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 1064–1071. Cited by: §I.
-  (2021) STEP: stochastic traversability evaluation and planning for risk-aware off-road navigation. In Robotics: Science and Systems, pp. 1–21. Cited by: §I, §II.
-  (2021) TTM: terrain traversability mapping for autonomous excavator navigation in unstructured environments. arXiv preprint arXiv:2109.06250. Cited by: §II.
-  (2019) Risk-aware motion planning and control using cvar-constrained optimization. IEEE Robotics and Automation letters 4 (4), pp. 3924–3931. Cited by: §I.
-  (2021) Rellis-3d dataset: data, benchmarks and analysis. In 2021 IEEE International Conference on Robotics and Automation (ICRA), pp. 1110–1116. Cited by: §I.
-  Badgr: an autonomous self-supervised learning-based navigation system. IEEE Robotics and Automation Letters 6 (2), pp. 1312–1319. Cited by: §I, §II.
-  (2003) Distributed search and rescue with robot and sensor teams. In Field and Service Robotics, pp. 529–538. Cited by: §I.
-  (2011) Off-road terrain traversability analysis and hazard avoidance for ugvs. Technical report CALIFORNIA UNIV SAN DIEGO DEPT OF ELECTRICAL ENGINEERING. Cited by: §II.
-  (2010) Chance constrained rrt for probabilistic robustness to environmental uncertainty. In AIAA guidance, navigation, and control conference, pp. 8160. Cited by: §I.
-  (2020) How should a robot assess risk? towards an axiomatic theory of risk in robotics. In Robotics Research, pp. 75–84. Cited by: §I, §III-A.
-  (2020) Learning to drive off road on smooth terrain in unstructured environments using an on-board camera and sparse aerial images. In 2020 IEEE International Conference on Robotics and Automation (ICRA), pp. 1263–1269. Cited by: §I, §II.
-  (2004) Autonomous navigation system for planetary exploration rover based on artificial potential fields. In Proceedings of Dynamics and Control of Systems and Structures in Space (DCSSS) 6th Conference, pp. 153–162. Cited by: §I.
-  (2018) Real-time semantic mapping for autonomous off-road navigation. In Field and Service Robotics, pp. 335–350. Cited by: §I.
-  Three-dimensional mapping with augmented navigation cost through deep learning. Journal of Intelligent & Robotic Systems 101 (3), pp. 1–21. Cited by: §II.
-  (2015) Risk-aware planetary rover operation: autonomous terrain classification and path planning. In 2015 IEEE aerospace conference, pp. 1–10. Cited by: §I.
-  (2016) Recurrent neural networks for fast and robust vibration-based ground classification on mobile robots. In 2016 IEEE International Conference on Robotics and Automation (ICRA), pp. 5603–5608. Cited by: §II.
-  (2020) Fast local planning and mapping in unknown off-road terrain. In 2020 IEEE International Conference on Robotics and Automation (ICRA), pp. 5912–5918. Cited by: §II.
-  (2021) G-vom: a gpu accelerated voxel off-road mapping system. arXiv preprint arXiv:2109.13176. Cited by: §II.
-  (2017) Agile autonomous driving using end-to-end deep imitation learning. arXiv preprint arXiv:1709.07174. Cited by: §I, §II.
-  Terrain traversability analysis methods for unmanned ground vehicles: a survey. Engineering Applications of Artificial Intelligence 26 (4), pp. 1373–1385. Cited by: §II.
-  (2013) Risk-aware path planning for autonomous underwater vehicles using predictive ocean models. Journal of Field Robotics 30 (5), pp. 741–762. Cited by: §I.
-  (2002) Artificial intelligence: a modern approach. Cited by: §V-A.
-  (2017) Cad2rl: real single-image flight without a single real image. In Robotics: Science and Systems XIII, Cited by: §I.
-  (2022) Semantic terrain classification for off-road autonomous driving. In Conference on Robot Learning, pp. 619–629. Cited by: §I, §II.
-  (2010) Applied imitation learning for autonomous navigation in complex natural terrain. In Field and Service Robotics, pp. 249–259. Cited by: §I, §II.
-  (2008) High performance outdoor navigation from overhead data using imitation learning. Robotics: Science and Systems IV, Zurich, Switzerland 1. Cited by: §I, §II.
-  (2015) Traversability analysis for mobile robots in outdoor environments: a semi-supervised learning approach based on 3d-lidar data. In 2015 IEEE International Conference on Robotics and Automation (ICRA), pp. 3941–3946. Cited by: §II.
-  Risk-aware autonomous navigation. In Artificial Intelligence and Machine Learning for Multi-Domain Operations Applications III, Vol. 11746, pp. 117461D. Cited by: §II.
-  Deep multispectral semantic scene understanding of forested environments using multimodal fusion. In International Symposium on Experimental Robotics (ISER), Cited by: §I.
-  (2017) Adapnet: adaptive semantic segmentation in adverse environmental conditions. In 2017 IEEE International Conference on Robotics and Automation (ICRA), pp. 4644–4651. Cited by: §I.
-  (2021) APPLE: adaptive planner parameter learning from evaluative feedback. IEEE Robotics and Automation Letters 6 (4), pp. 7744–7749. Cited by: §I, §II.
-  (2021) Control of rough terrain vehicles using deep reinforcement learning. IEEE Robotics and Automation Letters 7 (1), pp. 390–397. Cited by: §II.
-  (2019) A rugd dataset for autonomous navigation and visual perception in unstructured outdoor environments. In International Conference on Intelligent Robots and Systems (IROS), Cited by: §I.
-  (2017) Information theoretic mpc for model-based reinforcement learning. In 2017 IEEE International Conference on Robotics and Automation (ICRA), pp. 1714–1721. Cited by: §I, §IV.