Functional Path Optimisation for Exploration in Continuous Occupancy Maps

by Gilad Francis, et al.
The University of Sydney

Autonomous exploration is a complex task where the robot moves through an unknown environment with the goal of mapping it. The desired output of such a process is a sequence of paths that efficiently and safely minimise the uncertainty of the resulting map. However, optimising over the entire space of possible paths is computationally intractable. Therefore, most exploration methods relax the general problem by optimising a simpler one, for example finding the single next best view. In this work, we formulate exploration as a variational problem which allows us to directly optimise in the space of trajectories using functional gradient methods, searching for the Next Best Path (NBP). We take advantage of the recently introduced Hilbert maps to devise an information-based functional that can be computed in closed-form. The resulting trajectories are continuous and maximise safety as well as mutual information. In experiments we verify the ability of the proposed method to find smooth and safe paths and compare these results with other exploration methods.





I Introduction

The objective of autonomous exploration is to produce a consistent representation of the environment. It involves complex decision-making, selecting the trajectories a robot should take in order to minimise the overall uncertainty in the model. Essentially, exploration is a path optimisation procedure to find trajectories that efficiently learn the environment. The difficulty lies in the dimensionality and shape of the search space which prohibits a closed-form solution to the general exploration problem, making autonomous exploration an active field of research. The plethora of exploration methods in the literature offer different strategies for relaxing this intractable problem, most commonly by reducing the search space dimensionality, for example by discretising the path.

In this work, we formalise exploration as a variational problem. We present a novel approach based on functional gradient descent (FGD) to efficiently optimise exploratory paths over continuous occupancy maps. We use stochastic FGD to overcome the limitations of standard FGD methods in order to ensure convergence. This process enables optimisation over the entire path, resulting in continuous smooth paths that maximise the overall map quality while keeping the robot safe from collisions. Our contributions are:

  1. A Next Best Path method. An information-driven variational framework for safe autonomous exploration in continuous occupancy maps. Path optimisation is performed directly in the space of trajectories using a combined objective that considers safety, efficiency and information. The method is invariant to the choice of path representation as it uses stochastic functional gradient descent to optimise the objective along the entire path.

  2. Developing a mutual information (MI) variational objective for continuous occupancy maps. This replaces the common and expensive approach of computing MI explicitly over the entire map for each evaluated path. Instead, our method modifies the path using MI functional gradients without the need to compute MI explicitly. These gradients are obtained from local perturbations on the map model which are derived in closed-form.

The remainder of this paper is organised as follows. Literature on autonomous exploration is surveyed in Section II. The basic building blocks, such as Hilbert maps and FGD are reviewed in Section III. Section IV describes in detail the functional exploration algorithm. Experimental results and analysis are presented in Section V. Finally, Section VI draws conclusions on the proposed method.

II Related Work

The goal of autonomous exploration is to produce a consistent environment model by minimising any uncertainties. In a mapping context, exploration is the process of producing high-fidelity maps [1]. This is a complex problem, mainly due to the dimensionality of the solution space. Most exploration methods use occupancy grid maps in their planning [2], rather than continuous occupancy maps. Regardless of the type of occupancy map, exploration methods take one of two forms: frontier-driven or information-theoretic. Juliá et al. [3] provide a quantitative comparison between these exploration methods.

Frontier-based exploration methods drive the robot toward the borders of the known space [4]. In a grid map, frontiers are clusters of free cells neighbouring unknown cells. Once frontiers are identified, a separate path planner finds a safe path toward a selected frontier. Various utility functions can be used when choosing the most desirable frontier. The simplest form considers only the travelling costs [4]. González-Baños and Latombe [5] choose a goal point based on a score of expected coverage penalised by the travelling costs. A generalised approach for goal point selection given several criteria was suggested by [6].

Information-theoretic exploration methods optimise a utility function associated with the uncertainty of the map. Early work optimised the selection of a discrete goal point rather than optimising an entire path. Elfes [7] suggested MI as an information metric for exploration, while [8] proposed a next best view (NBV) approach using the entire map entropy. Vallvé and Andrade-Cetto [9] use a potential field computed over the entire configuration space to find exploration candidates. However, this method assumes discrete steps, disregarding the reduction in entropy between consecutive robot poses. Charrow et al. [10] combined frontier and information-based methods. While they optimised the information heuristic over a continuous control input space, in effect the path consisted of a fixed number of time steps. Lauri and Ritala [11] formulated the exploration problem as a partially observable Markov decision process (POMDP) and used a sample-based approach to solve the POMDP. Similarly to the work of [10], the action space is continuous; however, the path consists of a finite set of time steps, in contrast to the proposed method, where optimisation is performed in the space of trajectories.

Only a handful of algorithms tackle exploration in continuous occupancy maps. Yang et al. [12] employed rapidly-exploring random trees (RRTs) to sample a set of feasible path candidates for Gaussian process (GP) maps. An adaptation of the frontier method for continuous occupancy maps was introduced by [13], where a discretised frontier map was built from the continuous map. More recently, Jadidi et al. [14] employed MI to rank these frontiers. Bayesian optimisation (BO) has also been used for exploration. Marchant and Ramos [15] utilised BO to optimise the selection of continuous informative paths over a continuous environmental model. Francis et al. [16] used constrained BO for safe exploration to learn an MI objective. While BO optimises over continuous paths, in practice it uses only a limited path parameterisation such as quadratic or cubic splines.

In summary, the exploration method proposed in this paper uses an information-based utility to optimise path selection. Employing calculus of variations, the optimisation procedure is invariant to the path representation. This is a major difference from existing methods that typically depend on a finite path parametrisation, such as finite sets of time steps or waypoints, or employ simple representations such as quadratic or cubic splines. Instead, our method can utilise a highly expressive path representation, such as non-parametric [17] or approximate kernel paths [18]. As our method uses continuous occupancy maps, the MI utility can be derived directly from the map model in closed form, which simplifies computations.

III Preliminaries

In this section, we review the basic building blocks of the functional exploration method: Hilbert maps and functional gradient path planning. In Section IV, we adapt these building blocks to support an information-driven safe exploration algorithm.

III-A Hilbert Maps

A Hilbert map [19] is a continuous discriminative model that predicts occupancy based on sensor observations. Unlike grid maps [20] that discretise space into a set of independent cells, Hilbert maps maintain neighbourhood information. Other continuous mapping methods based on Gaussian processes (GPs) [13, 21] hold similar properties. However, these methods are limited by the scalability of the associated GPs. Hilbert maps, on the other hand, use an approximate kernel logistic regression model, which renders updating and querying the map independent of the dataset size. In addition, stochastic gradient descent (SGD) is used to enable real-time performance.

Formally, a Hilbert map is a logistic regression (LR) classifier model that predicts occupancy anywhere in the map. It uses a nonlinear projection into a reproducing kernel Hilbert space (RKHS) in order to represent a real environment. Given a set of weights $\mathbf{w}$, the predictive occupancy posterior takes the form:

$$p(y \mid \mathbf{x}, \mathbf{w}) = \sigma\big(y\, \mathbf{w}^\top \Phi(\mathbf{x})\big) \qquad (1)$$

In an occupancy map context, $\mathbf{x}$ represents either a 2D or 3D location, while $y \in \{-1, +1\}$ denotes the two possible binary outputs, unoccupied and occupied. We define $\sigma(a) = (1 + e^{-a})^{-1}$ as the logistic sigmoid function. $\Phi(\mathbf{x})$ is a set of features that, in expectation, approximate the inner product defined by the kernel function $k$, i.e. $k(\mathbf{x}, \mathbf{x}') \approx \Phi(\mathbf{x})^\top \Phi(\mathbf{x}')$. This approximation enables fast training of a map model, suitable for use in a robotics setting. There are several methods to approximate the kernel matrix [19]. Given the desired approximation, the weights vector $\mathbf{w}$ can then be trained by minimising the regularised negative log-likelihood (NLL), as commonly performed in logistic regression methods [19].
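As an illustrative sketch (not the authors' implementation), the predictive model (1) can be written with random Fourier features approximating an RBF kernel; the feature dimension, kernel parameter and function names below are assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

# Random Fourier features approximating an RBF kernel -- one of several
# kernel approximation schemes discussed in [19]. D and gamma are
# illustrative choices, not values from the paper.
D, gamma = 100, 0.5
omega = rng.normal(scale=np.sqrt(2.0 * gamma), size=(D, 2))
bias = rng.uniform(0.0, 2.0 * np.pi, size=D)

def features(x):
    """Project 2D locations (N x 2) into the approximate RKHS."""
    x = np.atleast_2d(x)
    return np.sqrt(2.0 / D) * np.cos(x @ omega.T + bias)

def occupancy(x, w):
    """Predictive occupancy p(y = 1 | x, w) of Eq. (1)."""
    return 1.0 / (1.0 + np.exp(-features(x) @ w))

# An untrained map (w = 0) is maximally uncertain: p = 0.5 everywhere.
p = occupancy(np.array([1.0, 2.0]), np.zeros(D))
```

In practice $\mathbf{w}$ would be fitted by minimising the regularised NLL with SGD, as described above.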

III-B Functional Gradient Descent

Functional gradient descent (FGD) is a variational framework to optimise nonlinear models. It has been successfully applied to motion planning problems in recent years, with the main objective of producing safe, collision-free paths. It was recently suggested as an alternative approach to sampling-based methods for path planning using occupancy maps [17]. In this section, the general method is discussed, before the extension for autonomous exploration is described in Section IV.

We first introduce notation. A path, $\xi$, is a function that maps a time-like parameter $t \in [0, 1]$ into configuration space, $\xi(t) \in \mathcal{C}$. The objective functional $\mathcal{U}[\xi]$ returns a real number for each path $\xi$, corresponding to a cost or loss associated with $\xi$. $\mathcal{U}$ captures path properties such as smoothness and safety. The goal of the optimisation process is to find a path that minimises the overall cost:

$$\xi^* = \arg\min_\xi\; \mathcal{U}[\xi] \qquad (2)$$

Finding the optimal path is performed by following the functional gradient of the objective. This is an iterative process where the functional gradient update rule is derived from a linear approximation of the cost functional around the current trajectory, $\xi_n$:

$$\mathcal{U}[\xi] \approx \mathcal{U}[\xi_n] + \langle \nabla \mathcal{U}[\xi_n],\, \xi - \xi_n \rangle \qquad (3)$$

We enforce small updates by adding a regularisation term based on the norm of the update:

$$\xi_{n+1} = \arg\min_\xi\; \mathcal{U}[\xi_n] + \langle \nabla \mathcal{U}[\xi_n],\, \xi - \xi_n \rangle + \frac{1}{2\lambda} \|\xi - \xi_n\|_M^2 \qquad (4)$$

The regularisation term $\|\xi - \xi_n\|_M^2$ is the squared norm with respect to a metric tensor $M$, and $\lambda$ is a user-defined learning rate. By differentiating the right hand side of (4) with respect to $\xi$, we obtain the iterative update rule:

$$\xi_{n+1} = \xi_n - \lambda M^{-1} \nabla \mathcal{U}[\xi_n] \qquad (5)$$

We note that (5) forms a general update rule, regardless of the choice of the objective functional or the path representation. The only requirements are that $M$ is invertible and the gradient $\nabla \mathcal{U}$ exists.

The general rule for computing the objective functional gradient stems from the calculus of variations. As a variational method, the objective functional must take the form of an integral or sum. Generally, the objective functional takes the form $\mathcal{U}[\xi] = \int v\big(\xi(t), \xi'(t)\big)\, dt$, therefore the functional gradient can be computed using the Euler-Lagrange equation [22]:

$$\nabla \mathcal{U}[\xi] = \frac{\partial v}{\partial \xi} - \frac{d}{dt} \frac{\partial v}{\partial \xi'} \qquad (6)$$
These gradients are then used to compute the iterative update rule of (5). Constraints are incorporated using KKT conditions, similar to [23], and are an explicit part of the path representation.
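To make the update rule concrete, here is a minimal numerical sketch of the iterative rule in (5); the waypoint discretisation, identity metric tensor and quadratic cost are illustrative assumptions standing in for a real objective:

```python
import numpy as np

def fgd_step(xi, grad_U, M_inv, lam=0.1):
    """One functional gradient step, cf. Eq. (5):
    xi_{n+1} = xi_n - lam * M^{-1} grad_U(xi_n)."""
    return xi - lam * (M_inv @ grad_U(xi))

# Toy setup: a 1D path discretised into 5 waypoints, with a quadratic
# cost U(xi) = 0.5 * ||xi - target||^2 so that grad_U(xi) = xi - target.
target = np.linspace(0.0, 1.0, 5)
grad_U = lambda xi: xi - target
M_inv = np.eye(5)                       # identity metric tensor

xi = np.zeros(5)
for _ in range(200):
    xi = fgd_step(xi, grad_U, M_inv)    # iterates toward the minimiser
```

With a non-identity metric tensor $M$, the same step preconditions the gradient, which is what lets (5) favour, e.g., smooth deformations of the path.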

IV Functional Exploration

The following section introduces our proposed functional exploration method. The use of functional gradient descent on an information-based objective results in trajectories which are highly expressive and safe while optimising the amount of information gained along the path.

IV-A Notation

We first introduce the notation used throughout the following sections. The workspace of the robot, $\mathcal{W}$, defines the space where obstacles lie and the map is queried. In addition, to account for the robot's finite size or its pose uncertainty, a set of body points, $\mathcal{B}$, is defined. As the trajectory lies in configuration space $\mathcal{C}$, we use a forward kinematic transform that maps a robot configuration $\xi(t)$ and body point $b \in \mathcal{B}$ to a point in the workspace $\mathcal{W}$. To simplify notation, we define $x(\xi(t), b) \in \mathcal{W}$ as the workspace location for the pair $(\xi(t), b)$. We assume that the robot is equipped with a sensor, such as a laser range finder, with a maximum range $r_{max}$ and an angular field of view $\theta_{fov}$.

For a given function $\xi$, a functional $\mathcal{U}[\xi]$ returns a single value. Functionals are usually represented by an integral. However, whenever we compute the safety or MI functionals, the cost is computed over $\mathcal{W}$. As a result, the functional must be approximated by a reduce operator, e.g., average or maximum, that aggregates the cost along $\xi$. Given a workspace cost function $c(x)$, we can approximate the functional by a sum over a finite set of time and body points:

$$\mathcal{U}[\xi] \approx \sum_{t} \sum_{b \in \mathcal{B}} c\big(x(\xi(t), b)\big) \qquad (7)$$
In addition, we note the difference between gradient operators. We define $\nabla_\xi$ as a gradient with respect to the path $\xi$, while $\nabla_x$ is the workspace gradient.
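The reduce-operator approximation in (7) can be sketched as follows; the averaging reduce operator and the translation standing in for the forward kinematics $x(q, b)$ are illustrative assumptions:

```python
import numpy as np

def approximate_functional(xi, body_points, cost, t_samples):
    """Approximate a workspace functional by averaging a workspace cost
    c over finite sets of time samples and body points, cf. Eq. (7).
    The forward kinematics x(q, b) is modelled here as a translation."""
    total = 0.0
    for t in t_samples:
        q = xi(t)                      # configuration at time t
        for b in body_points:
            total += cost(q + b)       # workspace cost at the body point
    return total / (len(t_samples) * len(body_points))

# Straight-line 2D path and a quadratic workspace cost.
xi = lambda t: np.array([t, t])
body = [np.zeros(2), np.array([0.1, 0.0])]
cost = lambda x: float(x @ x)
u = approximate_functional(xi, body, cost, np.linspace(0.0, 1.0, 11))
```

A maximum reduce operator would simply replace the running sum with `max`, trading smoothness of the estimate for a worst-case bound along the path.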

IV-B Exploration Functional Objective

The goal of autonomous exploration is to safely reduce the uncertainty of the environment model. Some exploration methods compute a finite set of go-to points that locally maximise information gain. Other methods optimise an information-based objective over the entire path; however, these are highly dependent on the path parameterisation. Formulating exploration as a variational problem and solving it using functional gradient descent provides a general optimisation framework which is invariant to the choice of path parameterisation and can even take a non-parametric form, as shown in [17].

The approach taken in our work relies on maximising mutual information along the entire path while keeping the trajectory safe. This is attained by constructing an objective functional which is a weighted sum of three components:

  • $\mathcal{U}_{obs}$, which maintains path safety by penalising proximity to obstacles,

  • $\mathcal{U}_{dyn}$, which penalises based on the shape of the trajectory, keeping the path smooth and short, and

  • $\mathcal{U}_{MI}$, which rewards the mutual information gained along the path.

The overall objective takes the form of a weighted sum, as is also shown schematically in Figure 1:

$$\mathcal{U} = \alpha_{obs}\, \mathcal{U}_{obs} + \alpha_{dyn}\, \mathcal{U}_{dyn} - \alpha_{MI}\, \mathcal{U}_{MI} \qquad (8)$$

Here $\alpha_{obs}, \alpha_{dyn}, \alpha_{MI}$ are user-defined coefficients. In the following sections, we will introduce the different components of the functional objective: $\mathcal{U}_{obs}$, $\mathcal{U}_{dyn}$ and $\mathcal{U}_{MI}$. For each component we will derive its functional gradient assuming a Hilbert map as the environment model.

Fig. 1: Given a sample $t$, the functional $\mathcal{U}$ is the weighted sum of the various objectives: obstacle, dynamics and MI. The occupancy map and the path $\xi$ are global variables used to compute these objectives.

IV-B1 Obstacle Functional

Following our previous work [17], we define the workspace cost function as the map occupancy of (1), i.e. $c(x) = p(y = 1 \mid x, \mathbf{w})$. Given (7), the obstacle functional can be approximated by $\mathcal{U}_{obs}[\xi] \approx \sum_t \sum_b c\big(x(\xi(t), b)\big)$. The functional gradient can then be computed as

$$\nabla_\xi\, \mathcal{U}_{obs} = \sum_t \sum_b J^\top \nabla_x c \qquad (9)$$

Here, $J = \partial x / \partial \xi$ is the workspace Jacobian. Since the map model is continuous and at least twice differentiable [19], the spatial gradient of occupancy can be computed in closed-form from (1) as:

$$\nabla_x c = c\,(1 - c)\, \mathbf{w}^\top \nabla_x \Phi(x) \qquad (10)$$

IV-B2 Path Dynamics Functional

$\mathcal{U}_{dyn}$ penalises kinematic costs associated with $\xi$. The straightforward approach is to regularise on the trajectory length, which can be attained by optimising the integral over the squared velocity norm: $\mathcal{U}_{dyn}[\xi] = \frac{1}{2} \int_0^1 \|\xi'(t)\|^2\, dt$. Following the Euler-Lagrange equation (6), the functional gradient of $\mathcal{U}_{dyn}$ is

$$\nabla_\xi\, \mathcal{U}_{dyn} = -\xi''(t) \qquad (11)$$

IV-B3 Mutual Information Functional

Fig. 2: MI functional gradient. The MI gradient is computed for a time sample $t$ given a continuous occupancy map (a). The current path estimate is depicted in blue, while the expected pose at time $t$ is shown in green. The modified OM (b) is generated using hypothetical observations, shown as cyan diamonds, based on the robot's configuration at time $t$. Only observations at the edge of sensor range are considered. The entropy of the OM (c) and the modified OM (d) can be computed from the occupancy values (high entropy shown in white), which produces MI (e). The MI gradient is estimated by samples around observations (shown as black arrows). Accumulating gradient samples results in the overall MI gradient for $t$, shown by the yellow arrow on the robot. Note that the images of occupancy, entropy and MI are only given here for presentation purposes and full maps are never computed. The planner only accesses occupancy, entropy and gradient values through stochastic samples.

Mutual information (MI) is used in many autonomous exploration algorithms as an information-based objective function [24]. In this context, MI is defined as the reduction in entropy conditioned on expected observations. Given an occupancy map $m$ and a set of expected observations $\hat{z}$, we can define MI as:

$$I(m; \hat{z}) = H(m) - H(m \mid \hat{z}) \qquad (12)$$

where $H$ denotes Shannon's entropy. The main computational challenge of (12) is resolving the expected observations $\hat{z}$. These are emulated observations that are produced by ray casting based on the sensor model. Determining $\hat{z}$ and MI over an entire path is computationally intensive, leading most exploration methods to solve a relaxed MI optimisation problem, by either discretising or parameterising the paths.

The approach taken in this work uses the MI functional and its gradient to maximise MI efficiently over the entire trajectory. To compute $\mathcal{U}_{MI}$, an MI reward function is computed over the robot's workspace in a similar fashion to the computation of $\mathcal{U}_{obs}$. However, computation of the conditional entropy entails changes to the Hilbert map model.

In the following section, we will describe the stages involved in computing the MI functional gradient. The three stages executed when computing $\nabla_\xi\, \mathcal{U}_{MI}$ are:

  • Simulating expected observations

  • Creating a perturbed Hilbert map model

  • Obtaining MI functional gradient

Fig. 2 shows the steps required for MI gradient computations at a given time $t$.

Simulating expected observations

Similarly to the obstacle functional, the MI functional is approximated by a sum over a finite set of time and body points. Each sample is chosen in a way that will estimate the infinitesimal change in MI at $x(\xi(t), b)$. To compute it, we emulate observations by ray casting, as done in other information-driven exploration techniques [7, 14]. However, while other methods evaluate the MI reward over the entire field of view of the sensor, our method is only interested in the expected observations at the sensor's limits (maximum range). The rationale behind this approach is that new information about the environment will mainly be obtained at the sensor's sensing limits $r_{max}$. Fig. 3 illustrates the difference between the two approaches to computing MI.

The output of the ray casting process is a set of unoccupied expected observations $\hat{z}_{t,b}$ for any time $t$ and body point $b$. $\hat{z}_{t,b}$ differs from the expected observations of (12), as we are only interested in unoccupied (no obstacle) observations at the sensor maximum range. Fig. 2b depicts $\hat{z}_{t,b}$ as cyan diamonds.
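The placement of these maximum-range observations can be sketched as below, under the simplifying assumption of an obstacle-free sensing cone (a real implementation would ray cast against the map; the function name and defaults are illustrative):

```python
import numpy as np

def expected_observations(pose, r_max=5.0, fov=np.pi / 2, n_rays=9):
    """Place unoccupied expected observations at the sensor's maximum
    range, evenly spread across its angular field of view. Assumes the
    cone is obstacle-free, so every ray reaches r_max."""
    x, y, theta = pose
    angles = theta + np.linspace(-fov / 2, fov / 2, n_rays)
    return np.stack([x + r_max * np.cos(angles),
                     y + r_max * np.sin(angles)], axis=1)

obs = expected_observations((0.0, 0.0, 0.0))   # robot at origin, facing +x
```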

(a) Standard Exploration
(b) Functional Exploration
Fig. 3: Difference in MI calculation. $\xi(t)$ is the robot's pose for which MI is computed, the black lines depict the sensor field of view and the blue area is the region where MI is computed. (a) MI is computed over the entire field of view of the sensor. (b) MI is computed only at the sensor's range limits.
Input: $m$: occupancy map.
                $\xi(0)$: start state.
                $p_{free}$: no-obstacle threshold.
                Optional: boundary conditions, initial guess $\xi_{init}$.
while solution not converged do
       // Stochastic sampling:
       Draw mini-batch of time samples $T$
       foreach $t \in T$ do
             $p \leftarrow$ occupancy at $x(\xi(t), b)$, Eq. (1)
             if $p < p_{free}$ then
                   Simulate expected observations $\hat{z}_{t,b}$
                   Combine functional gradients using Eq. (8)
                   Update weights using the rule of Eq. (22)
             end if
       end foreach
end while
Algorithm 1: Stochastic Functional Exploration

Creating a perturbed Hilbert map model

In this step, we generate the modified Hilbert map conditioned on the expected observations. The straightforward approach is to train a new map model based on the augmented dataset. This approach is commonly used in exploration methods for occupancy grid maps. However, the computational costs of such an approach are high, as new maps must be generated along the entire trajectory during optimisation. Instead, we propose the use of a perturbed Hilbert map model.

The perturbed Hilbert map model modifies the predictive map model (1) with a perturbed model. This model uses the expected observations as a dataset $\{(x_i, f_i)\}$, where $x_i$ is a point in 2D or 3D space and $f_i$ is the log odds of the desired predictive occupancy posterior at $x_i$, to fit a Gaussian process (GP):

$$f(x) \sim \mathcal{GP}\big(\mathbf{w}^\top \Phi(x),\; k(x, x')\big) \qquad (13)$$

We note that the logit $\mathbf{w}^\top \Phi(x)$ of the current occupancy map is the mean function of the GP. The kernel function $k$ is the same function approximated by the Hilbert map features $\Phi$. The predictive probability of the perturbed map is given by:

$$\tilde{p}(y \mid x, \hat{z}) = \sigma\big(y\, \mathbb{E}[f(x) \mid \hat{z}]\big) \qquad (14)$$
Fig. 2b depicts the resulting occupancy map following the embedding of the expected observations in an existing map.

The computational cost of the perturbed Hilbert map is cubic in the number of expected observations, $O(|\hat{z}|^3)$. As we are only modifying the map in a small region, only a small set of observations is required to generate the perturbation, keeping the computational load low.
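The perturbation step can be sketched under explicit assumptions: an RBF kernel, a tiny noise term for numerical stability, and a uniform prior map. `perturbed_logit` and all parameters are illustrative names, not the authors' API:

```python
import numpy as np

def rbf(a, b, gamma=0.5):
    """RBF kernel matrix between point sets a (N x d) and b (M x d)."""
    d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def perturbed_logit(x_query, x_obs, f_obs, mean_fn, noise=1e-6):
    """GP posterior mean of the perturbed map's logit: the prior mean is
    the current map's logit mean_fn, and the GP is conditioned on the
    desired log-odds f_obs at the expected-observation locations x_obs."""
    K = rbf(x_obs, x_obs) + noise * np.eye(len(x_obs))
    k_star = rbf(x_query, x_obs)
    return mean_fn(x_query) + k_star @ np.linalg.solve(K, f_obs - mean_fn(x_obs))

sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

# A uniformly unknown prior map (logit 0, i.e. p = 0.5 everywhere); two
# "unoccupied" expected observations pull nearby occupancy toward free.
mean_fn = lambda x: np.zeros(len(x))
x_obs = np.array([[1.0, 0.0], [0.0, 1.0]])
f_obs = np.array([-4.0, -4.0])          # log odds of "unoccupied"
p_near = sigmoid(perturbed_logit(np.array([[1.0, 0.0]]), x_obs, f_obs, mean_fn))
p_far = sigmoid(perturbed_logit(np.array([[100.0, 100.0]]), x_obs, f_obs, mean_fn))
```

Far from the expected observations the perturbed map falls back to the prior map, so the perturbation stays local, which is what keeps the cubic GP cost manageable.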

Obtaining MI functional gradient

The workspace MI cost function for $\mathcal{U}_{MI}$ is defined as the MI summed over the entire map, i.e. $c_{MI}(x) = \sum_{x' \in \mathcal{W}} I(x')$. However, as we are only interested in the change at the limits of the sensor range, the sum over the map can be replaced by a sum over $\mathcal{A}$, where $\mathcal{A}$ denotes workspace locations which lie on the arc given by the maximum sensing range $r_{max}$ and the sensor's field of view $\theta_{fov}$, as shown in Fig. 3. $c_{MI}$ can be approximated with a sum using either a deterministic or a Monte-Carlo schedule:

$$c_{MI}(x) \approx \sum_{x_a \in \mathcal{A}} I(x_a) \qquad (15)$$

Given the approximations (7) and (15), the MI functional can be represented as

$$\mathcal{U}_{MI}[\xi] \approx \sum_t \sum_b \sum_{x_a \in \mathcal{A}} I(x_a) \qquad (16)$$

The MI functional gradient follows the same form as (9):

$$\nabla_\xi\, \mathcal{U}_{MI} = \sum_t \sum_b \sum_{x_a \in \mathcal{A}} J^\top \nabla_x I(x_a) \qquad (17)$$
The spatial gradient of MI, $\nabla_x I$, can be expressed in closed-form using the Hilbert maps' continuous model. Using the MI definition from (12), the gradient is defined as $\nabla_x I = \nabla_x H(m) - \nabla_x H(m \mid \hat{z})$, where $\nabla_x H$ is the spatial gradient of the entropy. Using the chain rule, we rewrite the spatial gradient of $H$ around a query point $x$ as:

$$\nabla_x H = \frac{\partial H}{\partial p}\, \nabla_x p = \log\!\Big(\frac{1 - p}{p}\Big)\, \nabla_x p \qquad (18)$$

where $p$ is the probability of occupancy at $x$ given by (1). As the occupancy at $x$ is a Bernoulli random variable, its entropy is simply $H = -p \log p - (1 - p) \log(1 - p)$. The occupancy gradient $\nabla_x p$ depends on the occupancy map used. For the unperturbed map, $\nabla_x p$ is given by (10). The spatial gradient of the perturbed map (14) can be computed similarly to the unperturbed map:

$$\nabla_x \tilde{p} = \tilde{p}\,(1 - \tilde{p})\, \nabla_x\, \mathbb{E}[f(x) \mid \hat{z}] \qquad (19)$$

As $f$ is a GP, its gradient can be computed in closed form.
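The chain-rule step in (18) can be sketched and sanity-checked numerically on a toy 1D occupancy field (illustrative code, not the paper's implementation):

```python
import numpy as np

def entropy_grad(p, dp_dx):
    """Spatial gradient of the Bernoulli entropy via the chain rule,
    cf. Eq. (18): dH/dp = log((1-p)/p), so grad H = log((1-p)/p) * grad p."""
    return np.log((1.0 - p) / p) * dp_dx

def mi_grad(p, dp_dx, p_pert, dp_pert_dx):
    """Spatial MI gradient as the difference between the entropy
    gradients of the current map and the perturbed map."""
    return entropy_grad(p, dp_dx) - entropy_grad(p_pert, dp_pert_dx)

# Finite-difference check on a toy occupancy field p(x) = sigmoid(x).
sig = lambda x: 1.0 / (1.0 + np.exp(-x))
H = lambda p: -p * np.log(p) - (1.0 - p) * np.log(1.0 - p)
x, h = 0.3, 1e-6
p, dp = sig(x), sig(x) * (1.0 - sig(x))   # occupancy and its gradient
g = entropy_grad(p, dp)
g_fd = (H(sig(x + h)) - H(sig(x - h))) / (2.0 * h)
```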

Fig. 2e shows the computation of the MI gradient by sampling along the arc defined by the sensor range. Each sample produces an MI gradient, schematically shown by the black arrows. The overall MI gradient, which pushes the optimisation toward exploratory trajectories, is computed from the sum of all samples and is shown in Fig. 2e as a yellow arrow.

Fig. 4: Functional planning iteration. Functional gradients deform the initial solution in blue. The obstacle gradient repels the path from obstacles and unknown space. The MI gradient pulls the path toward unexplored space. The resulting path, in green, is the optimised solution.

IV-C Functional Exploration Algorithm

In this section, we describe the functional exploration algorithm, which aims to find a safe path that maximises MI over its entire course.

Functional exploration is a general optimisation framework, meaning it is invariant to the choice of path representation. Functional optimisation methods have used waypoint parameterisation [22], Gaussian process representations [25, 26], or defined trajectories over an RKHS [23]. However, in all of these methods the objective is sampled via a deterministic schedule. Such approaches have proved unsatisfactory for planning using occupancy maps [17]. To ensure convergence to a safe solution, the path is stochastically sampled over the entire domain and any uninformative gradient updates, such as those coming from unsafe parts of the map, are rejected.

The path representation used in this work is based on kernel matrix approximations [18]. The path is essentially a weighted sum of nonlinear features, similar to a regression problem. This path model is highly expressive, yet concise. Most importantly, the path can be optimised by SGD [27]. Given a set of weights $\mathbf{w}_\xi$, the path is defined as:

$$\xi(t) = \xi_{init}(t) + b(t) + \mathbf{w}_\xi^\top \Phi_\xi(t) \qquad (20)$$

$\xi_{init}$ is an initial path, which can be randomly chosen or can be computed by a simple and fast planner. $b(t)$ is a term used to adjust boundary conditions, such as the start and goal pose. $\Phi_\xi(t)$ are nonlinear features that approximate the inner product defined by a kernel function $k$ by the following dot product [18]:

$$k(t, t') \approx \Phi_\xi(t)^\top \Phi_\xi(t') \qquad (21)$$

The kernel maintains the correlation between various time points.

Using (20), the general functional optimisation given in (2) is transformed into an optimisation of the weights, $\mathbf{w}_\xi$. The iterative update rule of (5) takes the following form:

$$\mathbf{w}_{n+1} = \mathbf{w}_n - \lambda M^{-1} \nabla_{\mathbf{w}} \mathcal{U} \qquad (22)$$

where $\nabla_{\mathbf{w}} \mathcal{U} = \Phi_\xi(t)\, \nabla_\xi\, \mathcal{U}$ is the functional gradient pulled back onto the weights through the features.
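The weight-space optimisation can be sketched as follows: an approximate-kernel path (random Fourier features of time standing in for the path features, with the boundary term omitted) updated by SGD against a toy cost. Every name and constant here is an illustrative assumption:

```python
import numpy as np

rng = np.random.default_rng(1)

# Random Fourier features of time, approximating a kernel over [0, 1].
D = 50
omega = rng.normal(scale=2.0, size=D)
bias = rng.uniform(0.0, 2.0 * np.pi, size=D)

def phi(t):
    """Nonlinear time features approximating k(t, t'), cf. Eq. (21)."""
    return np.sqrt(2.0 / D) * np.cos(omega * t + bias)

def path(t, w, xi_init):
    """xi(t) = xi_init(t) + w . phi(t), cf. Eq. (20), boundary term omitted."""
    return xi_init(t) + phi(t) @ w

def sgd_step(w, t, grad_cost, lr=0.05):
    """Stochastic weight update, cf. Eq. (22): a cost gradient sampled
    at time t is pulled back onto the weights through the features."""
    return w - lr * grad_cost * phi(t)

# Toy problem: a 1D path starting as xi_init(t) = t, with a cost
# 0.5 * (xi(t) - 2)^2 whose gradient pulls the whole path toward 2.
xi_init = lambda t: t
w = np.zeros(D)
for _ in range(2000):
    t = rng.uniform()                       # stochastic time sample
    w = sgd_step(w, t, path(t, w, xi_init) - 2.0)
```

Because the stochastic samples cover the whole domain $[0, 1]$, the update deforms the entire path rather than a fixed set of waypoints, which is the property the stochastic sampling in Algorithm 1 relies on.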

The algorithm for functional exploration is given in Algorithm 1. The essential inputs are the initial robot state and the occupancy map. Boundary conditions or an initial solution are optional inputs. In each iteration, a mini-batch is drawn. The occupancy of each sample is checked, and if it is below the threshold $p_{free}$, the sample is used to update the path. The gradients of the various components (obstacle, dynamics, MI) of the objective functional are computed and summed according to (8). Once the overall functional gradient is computed, the weights are updated according to (22).

Figure 4 provides an insight into the optimisation process of a single planning iteration. Each iteration starts with an initial guess, which is depicted in blue. The obstacle functional gradient repels the path from obstacles and unknown space. The MI objective pulls the path toward unexplored space. The intermediate path updates, following the functional gradient, are shown in grey. The final optimal path is shown in green.

Fig. 5: Comparison of exploration methods at various planning iterations using continuous occupancy maps. The hexagonal markers are the planning poses. Triangle and star represent start and end points, respectively. The grey lines are either candidate paths (RRT) or intermediate solutions (functional exploration). The pink line depicts the optimised path and the green line is the traversed path. The path is assessed during execution, and a re-plan step is invoked if a path is no longer safe.

V Experimental Results

In this section, we evaluate the performance of the stochastic functional gradient path planner and compare it to other exploration methods for continuous occupancy maps. As a benchmark, we chose to compare our method to the RRT-based exploration method of [12], which we modified to use Hilbert maps. This method optimises MI during path selection, and while its bottleneck lies in computational complexity, it takes advantage of the fact that RRTs are probabilistically complete. Another method used for comparison is based on frontier exploration [4], which takes a grid approximation of the continuous map in a similar approach to [13]. The path is then constructed by a smooth RRT planner which reasons only about the path safety. All methods, including Hilbert maps, are implemented in Python and tested on an Intel i5-6200U with 8GB RAM.

V-A Simulations

Figure 5 shows a qualitative comparison at various planning iterations. While all exploration methods successfully build the map, there are some clear differences. The frontier-based method produces jerky paths and tends to move towards the edges and corners of the map. Path optimisation methods, on the other hand, stay closer to obstacles. As the RRT-based method maximises MI only, its path moves closer to the edges of obstacles, while the proposed method keeps a bigger distance, as its objective includes obstacle safety explicitly. We note that increasing the safety margin by applying a blurring filter, as done with grid maps, is not applicable in continuous maps, as it requires expensive discretisation of the map.

Quantitative comparisons are shown in Fig. 6 and in Table I. Fig. 6 depicts the reduction in the entropy of the map. The rate of reduction is similar for both our method and the RRT-based exploration, albeit slightly better for the latter, mainly due to the fact that the paths generated by the RRT move closer to obstacles. As a result, the RRT planner covers the map faster, however with a higher probability of collision, as shown in Table I by the maximum occupancy values exceeding the 50% occupancy threshold. The convergence of the frontier planner is slower, as a result of the over-estimation of the path utility by the choice of a single goal at each iteration. In addition, paths are jerky, leading to a longer time to cover the same area. As the frontier planner does not explicitly minimise collision risk, the maximum occupancy over the path is high. In contrast, the maximum occupancy of the proposed method along the path is significantly below the 50% occupied threshold.

Fig. 6: Comparison of exploration methods - 2 repetitions; the proposed functional exploration method (red), RRT-based exploration (black) and frontier-based exploration (green). RRT and the proposed method converge in a similar fashion, as both optimise MI over the entire path. The goal-based approach of the frontier method converges more slowly, as it does not explicitly optimise MI.
Proposed method RRT [12] Frontier[4]
Mean occupancy
Max. occupancy
Median Iter. Plan Time [s]
Mean Iter. Plan Time [s]
Max. Iter. Plan Time [s]
TABLE I: Performance comparison (40 planning iterations)

Comparing the median runtime results shown in Table I reveals that the RRT planner is significantly slower than our method, mainly because MI is computed over the entire map. Since the frontier method only queries the map for occupancy, its runtime is significantly lower. However, its average runtime is similar to our method, as occasionally resolving a longer path is required. It is worth noting that the runtime results are limited by the Python implementation of the Hilbert map. A C++ implementation of the Hilbert map proved to be two to three orders of magnitude faster, which makes it suitable for online applications.

V-B Real World Scenario

To evaluate the performance of the functional exploration algorithm in a real world scenario, we simulated a robot exploring the Intel-Lab. We used the publicly available Intel-Lab dataset to generate a ground truth map, shown in Fig. 7, from which we emulate range observations. The exploring robot, however, does not have direct access to the ground truth map. The map in Fig. 7 reveals a relatively simple structure. Yet, the small rooms and narrow corridors pose a difficulty for a robot with limited manoeuvrability. To prevent a situation where the robot is stuck in a room, we added a reverse-on-path option: if the robot identifies a dead-end, it may reverse along the path that took it to that spot.

Figure 8 shows the exploration process at various planning iterations. The generated Hilbert map is overlaid with the ground truth map of Fig. 7 as a reference for map accuracy. The robot successfully explores the majority of the map, moving mainly along the main corridor. It enters only those rooms with enough clearance at the entrance. We note that the robot relies only on the occupancy around the entrance to assess safety.

Figure 8 also provides insight into the path optimisation process, shown by the intermediate paths (in grey), which reveal how the functional objective in Eq. (8) balances safety with exploration during path selection. The MI term in Eq. (8) pulls paths towards the border between known and unknown space. The safety functional, on the other hand, maintains a safe distance from obstacles and unknown space. Consequently, paths tend to run through the middle of corridors and end close to a frontier, but within some margin of it.

The main limitation of the functional exploration approach is the lack of global context during optimisation. As FGD is a local optimisation process, its outcome depends on the starting point of the optimisation, which makes FGD sensitive to dead-ends. An exploration dead-end scenario is shown in Fig. 8(c), where the robot is inside a room and unable to find its way out. When the robot is inside a room, it cannot identify any planning horizon in its local neighbourhood; the MI functional's contributions during optimisation are therefore negligible, which results in non-exploring paths. To mitigate this problem, we added a reverse-on-path option that is triggered when the algorithm identifies a dead-end. A more robust solution may use a global exploration initial guess, such as a frontier, to start the optimisation. This would require developing a frontier detection method for continuous occupancy maps, as current frontier exploration methods require discretising the occupancy map, which is computationally intensive.
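The reverse-on-path heuristic can be sketched as follows. The paper does not specify the exact trigger; `optimise_path`, `mi_along`, and `mi_threshold` below are hypothetical stand-ins for the FGD optimiser, the MI evaluation of a candidate path, and the dead-end criterion:

```python
def plan_with_reverse(optimise_path, mi_along, traversed, current_pose,
                      mi_threshold=1e-3):
    """Reverse-on-path heuristic: if the locally optimised path carries
    negligible MI (a dead-end, e.g. the robot is stuck in a room), retrace
    the traversed path instead of executing a non-exploring path.
    """
    path = optimise_path(current_pose)
    if mi_along(path) > mi_threshold:
        return path                      # normal exploration step
    # Dead-end: retrace the recorded poses in reverse order.
    return list(reversed(traversed))

# Toy check: a zero-information local optimum triggers backtracking.
history = [(0, 0), (1, 0), (2, 0)]
out = plan_with_reverse(lambda p: [p], lambda path: 0.0, history, (2, 0))
print(out)  # [(2, 0), (1, 0), (0, 0)]
```

Because the retraced path is known to be traversable, this fallback is safe by construction, at the cost of repeating already-explored terrain.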

Fig. 7: Intel-Lab - Ground truth map based on the Intel-Lab dataset. The robot does not have access to this map. It is only used to emulate range observations.
(a) 1, (b) 4, (c) 9, (d) 14, (e) 17, (f) 22, (g) 25, (h) 29, (i) 32
Fig. 8: Functional exploration in the Intel-Lab at various planning iterations using continuous occupancy maps. The blue overlay depicts the ground truth, as shown in Fig. 7. The grey lines are intermediate solutions and the green line is the optimal path. The path is assessed during execution, and a re-plan step is invoked if a path is no longer safe. The traversed path is plotted in cyan, the red hexagons are the planning poses and the blue hexagon is the current pose.

VI Conclusions

This paper introduces a novel method for exploration over continuous occupancy maps using stochastic functional gradient descent. This approach formalises exploration as a variational problem, where optimisation is performed directly in the space of trajectories. The functional objective of the proposed method explicitly optimises both safety and information collection over the entire path, finding the Next Best Path. While this approach can be used with any type of occupancy map, it is highly effective with Hilbert maps, where the introduced MI objective and its gradient can be computed from a perturbed model of the map. Our proposed approach eliminates the need to compute MI over the entire map, as done in other exploration techniques. Rather, it computes variations to the path based on the functional gradient of MI, which is efficiently derived in closed-form from the map model.

Comparisons with other exploration methods show that the proposed method improves on both safety and MI. Point exploration methods, such as frontier, which do not optimise the path selection, exhibit a slower exploration rate. On the other hand, sampling-based exploration methods, such as [12], do not include safety in their objective, hence the resulting paths tend to move closer to obstacles. Moreover, these methods are computationally expensive due to the need to repeatedly sample the MI objective over the entire path. In comparison, our proposed method achieves exploration rates similar to [12] while taking less time to compute and still maximising safety.


  • [1] C. Stachniss, Robotic Mapping and Exploration. Springer, 2009.
  • [2] S. Thrun, W. Burgard, and D. Fox, Probabilistic Robotics. MIT Press, 2005.
  • [3] M. Juliá, A. Gil, and O. Reinoso, “A Comparison of Path Planning Strategies for Autonomous Exploration and Mapping of Unknown Environments,” Autonomous Robots, vol. 33, no. 4, pp. 427–444, 2012.
  • [4] B. Yamauchi, “A Frontier-based Approach for Autonomous Exploration,” in Proceedings of the IEEE International Symposium on Computational Intelligence in Robotics and Automation, 1997.
  • [5] H. H. González-Baños and J.-C. Latombe, “Navigation Strategies for Exploring Indoor Environments,” The International Journal of Robotics Research, vol. 21, no. 10-11, pp. 829–848, 2002.
  • [6] D. Holz, N. Basilico, F. Amigoni, and S. Behnke, “A Comparative Evaluation of Exploration Strategies and Heuristics to Improve Them,” in Proceedings of the European Conference on Mobile Robots, 2011.
  • [7] A. Elfes, “Robot Navigation: Integrating Perception, Environmental Constraints and Task Execution within a Probabilistic Framework,” in Reasoning with Uncertainty in Robotics, 1996.
  • [8] P. Whaite and F. P. Ferrie, “Autonomous Exploration: Driven by Uncertainty,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 19, no. 3, pp. 193–205, 1997.
  • [9] J. Vallvé and J. Andrade-Cetto, “Potential Information Fields for Mobile Robot Exploration,” Robotics and Autonomous Systems, vol. 69, pp. 68–79, 2015.
  • [10] B. Charrow, G. Kahn, S. Patil, S. Liu, K. Goldberg, P. Abbeel, N. Michael, and V. Kumar, “Information-Theoretic Planning with Trajectory Optimization for Dense 3D Mapping,” in Proceeding of Robotics: Science and Systems, 2015.
  • [11] M. Lauri and R. Ritala, “Planning for Robotic Exploration based on Forward Simulation,” preprint arXiv:1502.02474, 2015.
  • [12] K. Yang, S. Keat Gan, and S. Sukkarieh, “A Gaussian Process-based RRT Planner for the Exploration of an Unknown and Cluttered Environment with a UAV,” Advanced Robotics, vol. 27, no. 6, pp. 431–443, 2013.
  • [13] M. G. Jadidi, J. V. Miro, R. Valencia, and J. Andrade-Cetto, “Exploration on Continuous Gaussian Process Frontier Maps,” in Proceedings of the IEEE International Conference on Robotics and Automation, 2014.
  • [14] M. G. Jadidi, J. V. Miro, and G. Dissanayake, “Mutual Information-based Exploration on Continuous Occupancy Maps,” in Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems, 2015.
  • [15] R. Marchant and F. Ramos, “Bayesian Optimisation for Informative Continuous Path Planning,” in Proceedings of the IEEE International Conference on Robotics and Automation, 2014.
  • [16] G. Francis, L. Ott, R. Marchant, and F. Ramos, “Occupancy Map Building through Bayesian Exploration,” preprint arXiv:1703.00227, 2017.
  • [17] G. Francis, L. Ott, and F. Ramos, “Stochastic Functional Gradient for Motion Planning in Continuous Occupancy Maps,” in Proceeding of the IEEE International Conference on Robotics and Automation, 2017.
  • [18] G. Francis, L. Ott, and F. Ramos, “Stochastic Functional Gradient Path Planning in Occupancy Maps,” preprint arXiv:1705.05987, May 2017.
  • [19] F. Ramos and L. Ott, “Hilbert maps: Scalable Continuous Occupancy Mapping with Stochastic Gradient Descent,” in Proceedings of Robotics: Science and Systems, 2015.
  • [20] A. Elfes, “Using Occupancy Grids for Mobile Robot Perception and Navigation,” Computer, vol. 22, no. 6, pp. 46–57, 1989.
  • [21] S. T. O’Callaghan and F. T. Ramos, “Gaussian Process Occupancy Maps,” The International Journal of Robotics Research, vol. 31, no. 1, pp. 42–62, 2012.
  • [22] M. Zucker, N. Ratliff, A. D. Dragan, M. Pivtoraiko, M. Klingensmith, C. M. Dellin, J. A. Bagnell, and S. S. Srinivasa, “CHOMP: Covariant Hamiltonian Optimization for Motion Planning,” The International Journal of Robotics Research, vol. 32, no. 9-10, pp. 1164–1193, 2013.
  • [23] Z. Marinho, B. Boots, A. Dragan, A. Byravan, G. J. Gordon, and S. Srinivasa, “Functional Gradient Motion Planning in Reproducing Kernel Hilbert Spaces,” in Proceedings of Robotics: Science and Systems, 2016.
  • [24] B. J. Julian, S. Karaman, and D. Rus, “On Mutual Information-based Control of Range Sensing Robots for Mapping Applications,” The International Journal of Robotics Research, vol. 33, no. 10, pp. 1375–1392, 2014.
  • [25] M. Mukadam, X. Yan, and B. Boots, “Gaussian Process Motion Planning,” in Proceedings of the IEEE International Conference on Robotics and Automation, 2016.
  • [26] J. Dong, M. Mukadam, F. Dellaert, and B. Boots, “Motion Planning as Probabilistic Inference using Gaussian Processes and Factor Graphs,” in Proceedeing of Robotics: Science and Systems, 2016.
  • [27] H. Robbins and S. Monro, “A Stochastic Approximation Method,” The Annals of Mathematical Statistics, vol. 22, no. 3, pp. 400–407, 1951.