Exploration is a fundamental task in autonomous robotics. The goal is to map an unknown environment [Yamauchi97icra], for example to establish how a robot can navigate through it. Exploration can be used in various scenarios from search and rescue, oil and gas exploration, 3D reconstruction to inspection tasks. Different applications have different requirements. In search and rescue, for instance, it is important to find survivors rapidly. Other requirements could be that the generated map has a high accuracy or that the path travelled by the robot is as short as possible. In exploration, a map representation is used to describe the space that the robot perceived so far. Based on this description, a planning algorithm decides where to move next. In this work, we focus on the map representation.
Most map representations currently used for exploration assume perfect state estimates. However, on-board state estimators, such as visual-inertial odometry, are prone to drift. Since these are the estimators most likely to be used in unknown environments in the real world, these map representations can in practice lead to incorrect maps, as illustrated in Fig. 3.
To some extent, drift can be removed when a robot revisits a place and a correction of the trajectory estimate can be performed with a loop closure. However, as Fig. 3 illustrates, not all drift error can be removed in this way. Furthermore, such a correction causes heavy computational load in typical grid-based maps, as every depth measurement ray needs to be re-cast with the new trajectory estimate [Wurm10icraw]. Sub-mapping approaches can mitigate this computational cost [Bosse03icra], [Schmuck16ifac], [Millane17arxiv].
In this paper, we propose a map representation for exploration that is robust to state estimate drift. We build our approach using ideas from [Howard06ieee], where free space is represented with local polygons connected by relative poses. Our contributions are as follows:
We apply the ideas from [Howard06ieee] to exploration by adding semantics to the local polygons, which incorporate information from other nearby polygons. Nearby polygons are determined based on vicinity in a pose graph, as opposed to vicinity in an estimate of space that is faulty due to drift.
We show that with this, global map consistency is not required for the robot to know when exploration is complete. With this, global map optimization is not needed, which is particularly interesting for multi-robot exploration.
We show how to apply our representation to an exploration algorithm that previously used a grid-based representation. Local navigation is performed in local polygons, while navigation between remote locations in the map is performed using teach-and-repeat [Es15crv].
Finally, we simulate our proposed map representation under different conditions, compare it to grid-based representations, and validate it in the real world. As we show, our method comes at a cost of longer exploration times, but given reliable place recognition, is able to deal with large drift in cases where grid-based representations fail.
2 Related Work
While a map representation is used to represent the space that the robot perceived so far, a planning algorithm has to plan where the robot should move next. In the following, we review these two aspects of exploration separately.
2.1 Map Representations
Exploration can be formalized as the problem of discovering all free space within a bounded volume. Thus, the map representation needs to represent the spatial information necessary to achieve this. Map representations are commonly divided into three categories [Wallgrun10book]: metric representations, topological representations and hybrid, topometric representations.
Metric Representations Metric representations express locations as coordinates in a global reference frame. The predominant type of metric representations used in exploration are grid-based representations. In a grid-based representation, the volume is quantized into voxels, arranged in a grid. The voxels are then assigned appropriate labels, such as “free”, “unknown” or “obstacle” [Yamauchi97icra]
. Voxels typically carry further information, such as occupancy probabilities[Moravec85icra, Wurm10icraw] or a signed distance to the closest obstacle [Izadi11siggraph, Oleynikova16rssw]. In two dimensions, an alternative way of representing known free space is to represent it with a polygon [Gonzalez02ijrr, Caccavale18iros]. Here, the inside of the polygon represents free space, while its boundaries can either represent obstacles or the interface to unknown space. Because the location of all entities in fully metric representations are expressed in a global reference frame, they are highly reliant on accurate pose estimates: wrong pose estimates would lead to depth measurements being inconsistent with the true scene geometry, and with each other. As a consequence, the robot could wrongly believe occupied space to be free or inversely, free space to be occupied, which could have disastrous consequences for navigation.
Topological Representations Unlike metric representations, topological representations do not express locations as coordinates in a reference frame. Instead, locations are represented as vertices in a graph. Relationships between locations, such as adjacency, are expressed as edges. In order to be navigable, these edges should carry enough information to allow a robot to move between adjacent vertices. Vision-based navigation in such maps has been demonstrated using Visual Teach and Repeat [Furgale10jfr, Es15crv]. While topological maps can also be derived from volumetric maps [Wallgrun10book, Bloechliger17arxiv], we are not aware of any work that uses purely topological maps to directly represent volumetric information. Pure topological maps are thus rarely used for exploration. A notable exception to this are [Bosse03icra, Akdeniz15icra]. Rather than explicitly keeping track of free space, their robots build a pose graph similar to [Es15crv]. At every vertex of this graph, the robot adds to the graph a finite amount of adjacent candidate locations where the robot could move to. Subsequently, these candidate locations are all visited and the process is repeated recursively unless a candidate turns out to correspond to a previously visited location. However, candidate robot locations are only a very crude representation of free space, which makes it hard to estimate the actual coverage.
Hybrid Representations Hybrid, or topometric, representations are a combination of metric and topological representations, aimed at combining their advantages. Strictly speaking, the aforementioned pose graphs were already topometric, since they contained metric positions and transformations. Nonetheless, in this section we focus on representations which augment topological graphs with volumetric information. An early topometric representation has been proposed in [Howard06ieee]. This representation consists of a set of polygons, each representing the free space measured around a specific pose of the robot. The locations of these polygons are expressed relative to each other. However, [Howard06ieee] does not consolidate the information of neighbouring polygons in a way that would be helpful for exploration. As a consequence, in [Howard06ijrr2], where [Howard06ieee] is used in an exploration system, the authors fall back on local occupancy grids for exploration. In contrast, we extend [Howard06ieee] with a consolidation procedure that allows its direct application to exploration. Furthermore, while [Howard06ijrr2] ends up building a globally consistent map, we show that global consistency is not required for exploration. The use of local occupancy grid submaps is also proposed in [Schmuck16ifac, Millane17arxiv], where the authors aim at producing a globally consistent map of the environment just like [Howard06ijrr2]. Achieving global consistency requires map optimization, which in turn leads to costly re-calculation of the occupancy grid submaps, even if it can sometimes be avoided for small loop closure corrections [Schmuck16ifac]. We show that all of this can be avoided since global consistency is not necessary for exploration.
2.2 Path Planning for Exploration
In order to understand how to build a good representation, we need to understand how it is used. In path planning for exploration, after every observation the robot has to decide where to move next. In order to achieve fast exploration, the robot should perceive unknown space as quickly as possible. There are two main planning approaches in the literature: Next-best-view (NBV) planning and frontier-based planning.
function. This problem has also been studied in the computer vision community[Connolly85icra]. Most NBV planners determine the next best view by sampling candidate views, predicting their utility [Gonzalez02ijrr, Papachristos17icra], and picking the view with the highest utility. The advantage is that the utility function can be adapted to contain arbitrary terms and constraints, such as the motion model of the robot, or terms describing map accuracy. The disadvantage is that finding the view with optimal utility is usually intractable, hence a sampling method has to be applied. Furthermore, and somewhat as a consequence, NBV planning commonly requires heavy computational load.
Frontier-Based Exploration Planning Frontier-based exploration planners follow a simple and efficient scheme. The strategy is to navigate to the closest boundary between known free and unknown space. These boundaries are referred to as frontiers. Frontier-based exploration methods assume that navigating to a frontier will result in the exploration of new space. The original idea was introduced in [Yamauchi97icra]. Unlike NBV planners, frontier-based planners do not natively support arbitrary terms or constraints. However, it was shown that they cover the space faster than their NBV counterparts [Holz10isr, Cieslewski17iros]. In this work, we will evaluate our representation using a modified version of the frontier-based exploration method presented in [Cieslewski17iros]. Frontiers are explicitly expressed in our method, so it is natural to adopt frontier-based exploration.
3 Problem Statement
The goal of exploration is to build a map of an unknown environment. Consider a bounded region of space that represents the unknown environment that we would like to explore. consists of free and occupied space . With a robot that can measure the free space around itself, we can explore . We denote a robot pose in the world frame as with orientation and translation . The Field of View (FOV) of the robot at pose is denoted as . While a robot is moving on a trajectory , it will measure at consecutive sampling times . We denote the corresponding poses as . The space explored up to time (known free space) is the union of all the FOVs measured so far:
We consider two goals for our representation: 1) represent the robots’ estimate of . 2) Allow to determine when exploration is complete, that is, when the whole free space is covered:
Without knowing the ground truth , this condition can be established given sufficient knowledge about the boundary of , . Generally, either consists of interfaces to occupied space or interfaces to unknown free space. As can be shown by contradiction, (2) holds if and only if consists only of interfaces to occupied space111 Trivial exceptions, such as disconnected free space, are omitted for brevity. (“no frontiers left”). Hence, the second goal can be reached by correctly representing .
4 Proposed Representation
As in [Howard06ieee], the proposed map representation is based on polygons. More specifically, local polygons individually represent the FOV of every pose, . The inside of a polygon represents known free space while its edges represent the boundary . Unlike [Howard06ieee], is labeled, which allows us to reach the second goal of the problem statement. The labels are obstacle for parts of the boundary that are common with occupied space, frontier for parts adjacent to unknown space, and free for parts that are considered to lie within another polygon. See Fig. 4 for a visualization of our representation.
4.1 Local Volumes from Depth Measurements
We assume a depth sensor as source to build the polygon of the robot’s Field of View (FOV). Each measurement consists of a set of simultaneously acquired depth samples. In our simulations, these samples are modelled as equally distributed within the FOV, as shown in Fig. 5.
We build the polygon from these samples by taking the position of adjacent samples and the robot’s position as vertices and connecting them with edges. Some samples correspond to measured depths while others correspond to samples where the closest obstacle is beyond the sensor range. If two adjacent depth samples are not beyond range, and have values with a difference below a threshold , the edge connecting them is considered an obstacle edge. In all other cases, the edge is considered a frontier edge. The threshold is intended to account for occlusions (see Fig. 5). Sometimes, these cannot be distinguished from obstacle surfaces that are nearly parallel to the measurement rays [Gonzalez02ijrr]. These incorrectly labeled frontiers can be resolved by approaching that obstacle from another angle. Furthermore, the two edges adjacent to the robot position are also considered frontiers.
4.2 Pose Graph Representation
In our representation, we store separate local volumes for each set of depth measurement samples, which are built as described in the previous section. These polygons are referenced in the vertices of a pose graph. Each vertex of this graph represents a robot pose at which a measurement was taken. The vertices are connected with edges that carry the relative transformation between the two poses. The edges are either established between subsequent measurements, in which case is estimated using odometry, or they are established due to place recognition, in which case comes from a relative pose estimation algorithm. The pose graph and our representation can be constructed incrementally and on-line. The relative transformation between non-adjacent poses can be estimated by integrating relative transformations along a path connecting the two corresponding vertices. Considering odometry drift, this estimate becomes less accurate as the path length increases. Without global consistency, estimates integrated along different paths can differ [Bosse03icra, Es15crv]. Thus, for our purposes, we use the estimate resulting from integrating along the shortest path.
4.3 Frontier Consolidation
When adding new depth measurements, we need to update the local polygons. New measurements can turn unknown space into known free space, and the corresponding frontier boundaries need to be converted into free boundaries. Similarly, frontiers might need to be re-assessed after place recognition. This process, which we call frontier consolidation, is illustrated in Fig. 8.
For every new depth measurement , a consolidation scope comprising the local volumes of nearby poses is established using Dijkstra’s algorithm. All poses within a distance of are added to that scope, where the norm of the estimated translation between two adjacent poses is used as the weight of the edge between them. Consequently, reflects the distance over which the pose estimation (odometry and relative pose estimation) has small drift and can be used to consolidate local volumes. Inside the consolidation scope, all local volumes are then consolidated pairwise among all possible pairs. To that end, all local volumes are temporarily expressed relative to by using the pose resulting from pose integration along the shortest path from , see Fig. 8. Given two local volumes transformed in this way, any frontier edge of one volume that lies inside the other volume is re-labeled as free edge, and vice versa. If a frontier edge is only partially inside, it is subdivided accordingly.
Consolidation is triggered for every new depth measurement. In our experiments, we assume that previously visited places are recognized at the same time as a vertex is inserted into the polygon. Otherwise, polygon consolidations would also need to be triggered as new edges are inserted into the pose graph due to place recognition.
So while our approach does not rely on odometry accuracy, it relies on good place recognition performance. An increasing rate of false negatives could potentially be dealt with by methods that expand existing matches, or by increasing the volume in which polygons are consolidated. An increasing rate of false positives (perceptual aliasing) would be harder to deal with. However, false positives have also been shown to be catastrophic for optimization-based approaches [Sunderhauf12iros] – remedies to false positives are equally applicable to both approaches.
4.4 Extension to 3D
While for simplicity, we focus on two dimensions in this paper, we believe that the presented approach is readily applicable in 3D. The main challenge here is the 3D representation of local volumes, and the intersection of those local volumes, where volume boundaries become surfaces, rather than edges. One could use local occupancy grids, or local coarse meshes [Teixeira16iros, Greene17iccv] for memory efficiency.
5 Application to Frontier-Based Exploration
To demonstrate how our map representation can be fitted to exploration approaches that previously used metric representations, we adapt the state-of-the-art method proposed in [Cieslewski17iros] to use our solution. Exploration in [Cieslewski17iros] is performed using a state machine with the following three states:
Reactive: While frontiers are present in the current FOV, navigate towards the frontier that incurs the least change to the current velocity.
Deliberate: If no frontiers are left in the FOV, but frontiers are left globally, plan a path to the closest frontier.
If no frontiers are left overall, exploration is considered complete.
Since the reactive state is based on the current field of view, it is trivial to adapt to our representation. In the original implementation, information was extracted from OctoMap [Wurm10icraw] updates to determine frontier voxels in the current FOV. With our representation, frontier candidates are simply selected from the edges of the current local polygon. The deliberate state cannot be adapted directly. In the original implementation, a path to the closest frontier was found using a Dijkstra search across adjacent free space voxels. With our representation, adjacency of free space is only meaningful among polygons that are also close to each other in the pose graph. Hence, instead of looking for the closest frontier in 2D space, we instead look for the closest vertex in the pose graph that has a local volume with unconsolidated frontiers. We then let the robot navigate to that vertex using the teach and repeat navigation method proposed in [Es15crv], as it does not require global consistency of the map. Once the robot arrives at that vertex, it switches back to the reactive state. Completion in the original method is assumed once there are no frontier voxels left. In our approach, this corresponds to none of the polygons having frontier edges left.
Using [Cieslewski17iros] as exploration algorithm, we compare our representation to a metric grid-based representation, with and without loop closure capability, in simulated 2D environments under varying conditions. The representations are compared in different environments, with simulated odometry drift of varying intensity and with simulated place recognition of varying recall.
6.1 Experimental Setup
We simulate a robot with a forward-looking depth sensor operating in either of the environments shown in Fig. 9.
The depth sensor has a FOV of in horizontal direction and a range of , similar to the sensor used in [Cieslewski17iros]. The simulation is performed in time steps , for each of which the robot’s true pose and pose estimate are updated as follows:
where is the relative pose command output provided by the exploration planner. In the true pose update, a pose increment is applied to . As it is the robot that executes the desired pose increment, it believes it has executed , and so noise needs to be applied to the true pose update, rather than the estimate. The translation of has coefficients sampled from a zero-mean Gaussian and its rotation angle is sampled from . Here,
is a function that ensures that the variance of the noise is proportional with the distance travelled:
We furthermore simulate place recognition. Whenever the current pose falls within a threshold distance to a previous pose , the identifier of that pose, , as well as are provided to the simulated robot. To prevent self-matches, all poses within of pose graph traversal are excluded from place recognition. Also, place recognition is only provided if is in line of sight from , to prevent place recognition across occlusions.
6.2 Baseline Implementation
As a baseline, we implement a simplified version of the grid representation [Wurm10icraw] using a regular grid instead of octrees, as we do not take computational performance into account. According to [Wurm10icraw, Moravec85icra], the occupation probability of each cell is updated using the most recent measurement with
We assume a precise depth sensor and let if a ray hits an obstacle inside at time , if cell is fully contained inside the polygon spanned by the sensor rays as defined in Section 4.1, and otherwise. Consequently, can only assume values , which we accordingly label as free, unknown and occupied. For cases where (6) is undefined, later measurements override earlier measurements. This baseline is implemented in two ways: a naive one (“grid”), which uses the state estimate as-is, and one with loop closure capability (“grid-lc”). The latter exploits place recognition events to optimize its pose-graph according to all relative pose estimates present in the pose graph using g2o [Kuemmerle11icra]. Every place recognition triggers an optimization after which all updates according to (6) are executed anew with the optimized pose estimates . We use a voxel length of 1m for all maps.
6.3 Evaluation Metrics
To quantify the performance of exploration, we measure the distance traveled until the robot believes full coverage has been achieved, , and the expected distance to be traveled to discover an arbitrary location in free space, . reflects the general speed of exploration. It is not severely affected if the robot takes a very long time at the end to navigate to the last couple of frontiers. Depending on the application, one metric is more important than the other. For more insight into the progress of exploration, we track the ratio between the volume of known free space and the full free space, , , which we call coverage ratio. It is initially and reaches when full coverage is achieved. Note that refers to the ground truth volume covered by the robot, not the estimate of that space by the robot. is represented using a grid – in our simulations we use this grid to define the environment in the first place, see Fig. 9. Unlike in the baseline grid representation, occupied cells are known beforehand, and only free cells are labeled as known free or unknown free. Known free space is approximated as the set of cells labeled known free. An unknown free cell is relabeled known free as soon as any sensor ray intersects it and remains known free until the end of the simulation. Given this approximation of , the coverage ratio is calculated by dividing the count of cells in by the count of cells in . Since full coverage will not be reached in all experiments, even if the robot believes this to be the case, we also report the final coverage ratio. In these cases, is undefined.
We perform experiments with the following parameter combinations: simulation in either of the environments shown in Fig. 9; pose estimate noise simulated with
with the noise multiplier ; and place recognition radius of or times the sensor range . For every parameter setting, ten different runs have been performed (due to the stochastic nature of the simulated noise). The place recognition radius is varied because a larger radius should result in the robot having to travel less in order to consolidate frontiers. It is limited to two times the sensor range as beyond that, polygon intersection between and is impossible.
Fig. 13 shows the performance of the evaluated methods as pose estimation noise is increased. As we can see, only the proposed representation always reaches full coverage in all environments. The other approaches perform particularly poorly in the maze environment, where the robot has to travel long distances in close proximity without the ability to close loops, see Fig. 3. We can also see that this comes at a cost of longer distances covered until the robot believes coverage is achieved. is between and times larger for our approach, , however, only up to larger. This happens because the proposed approach needs to consolidate frontiers of all spatially close polygons through place recognition, while the grid-based representation is able to consolidate frontiers from measurements even if the relative pose established between them comes from an estimate integrated over a very long path. A typical evolution of the coverage ratios for one sample of each method and noise multiplier is shown in Fig. 17.
Fig. 20 shows how our method is affected by the loop closure distance . As can be seen, a larger distance at which places can be recognized leads to faster exploration times in the open environment. This makes sense, as a larger loop closure distance leads to generally larger frontier consolidation scopes, allowing frontiers to be removed faster. Exploration speed in the maze environment, however, is not significantly affected, as there are generally not many place recognition events happening in that environment.
8 Validation in the Real World
We validate that our approach works in the real world with the experimental setup shown in Fig. 21.
The platform is a ClearPath Jackal equipped with a Hokuyo laser scanner as depth sensor and two fisheye cameras that provide a view of the environment. Ground truth is obtained with a motion tracking system using reflective IR markers. For the state estimate, we use the wheel odometry provided by the Jackal robot. We intentionally do not use the best available state estimate, to demonstrate robustness to drift.
The most important assumption that we have made in our simulations is that sufficient place recognition and relative pose estimation can be provided. To this end, we use visual place recognition from a panoramic image stitched from the two fisheye cameras. The vertical field of view is restricted to prevent place recognition from structure that is visible from everywhere in the room. Using a view allows place recognition independent of the orientation of the robot between the two matched places. First, the CNN full image desciptor NetVLAD [Arandjelovic16cvpr] is used to quickly determine the visually most similar previously recorded image. Fig. 21 (c) shows NetVLAD matches in a trajectory where the robot drives once around each obstacle. As can be seen, all place recognitions occur locally, with only two matches across a longer distance, but still within line of sight. Given a NetVLAD match, we match ORB features [Rublee11iccv] and use P3P [Gao03pami] and RANSAC [Fischler81cacm] for geometric verification and relative pose estimation. For P3P, the 3D locations of features in each frame are triangulated using KLT tracks [Lucas81ijcai] of those features in subsequent frames. The only component that we have not implemented, as it is out of the scope of this project, is teach-and-repeat [Es15crv]. To simulate teach-and-repeat, we instead use the motion tracking system to let the robot backtrack its trajectory. This, and ground truth for evaluation, are the only things for which the motion tracking system is used.
Fig. 24 (a) shows the final state of exploration in our experiment.
As we can see, there is drift in the estimated trajectory, yet our approach still manages to fully cover the environment. Note that there are parts of the environment where trajectories overlap. These are locations where our visual relative pose estimator fails to obtain enough feature matches. However, as long as relative pose estimations are obtained within the distance of the consolidation scope (see polygons shown in Fig. 24 (a)), this does not pose a significant problem. Fig. 24 (b) shows the corresponding coverage over the travelled distance. The behaviour is consistent with the results obtained in simulation, wherein the robot first quickly covers most of the map, after which it does a lot of backtracking to seek out frontiers that remain in the map.
In this paper, we present a map representation that allows exploration in spite of large drift in the pose estimate. We show that global consistency in the pose graph is not required to determine if exploration is complete. This alleviates the need for map optimization, which is particularly interesting for multi-robot exploration. In addition, the proposed method can be adapted to algorithms that currently use different representations. Using a state-of-the-art exploration algorithm, we compare our representation to a grid-based representation. In contrast to the latter, and at a cost of longer exploration time, all of the free space can be fully covered with our representation, even with large drift in the state estimate.
This work was supported by the National Centre of Competence in Research (NCCR) Robotics through the Swiss National Science Foundation and the SNSF-ERC Starting Grant.