Autonomous exploration has an increasing number of applications such as disaster relief, search and rescue, and environment monitoring. It is a fundamental capability for the mobile robot to gather information of an unknown environment. The primary goal is to autonomously gather required information with limited time or resource budget, which poses great challenges for exploration strategies.
Various past methodologies have been proposed to tackle the autonomous exploration problem, among which the most investigated approach is the nearest frontier method . The frontier in this method is defined as the boundary between the known and unknown regions on the incomplete map. In this scheme, the robot always pursuits the nearest frontier to explore. Besides the frontier-based method, the information gain is also frequently adopted in the context of autonomous exploration , where the NBT is determined by evaluating the amount of information gain at the target areas. Typically, these approaches use the metric map to represent the environment and extract the required information, which can be time-consuming in large-scale environments.
In contrast with the majority of frontier or information based methods, little effort has been devoted to developing special-designed path planning methods for the exploration. However, an efficient path planner is a prerequisite for the target evaluation involving path cost. Sampling-based path planning method  is a broadly adopted approach in the exploration. It is more efficient with respect to the large-scale or high-dimensional environments. However, these planning methods are not historical-aware. A path will be replanned using these methods on the whole map when the robot needs to backtrack to some places that is previously visited without further exploitation. Besides, the previous planning methods pay more attention to the path cost without considering the information collected along the path.
In this paper, we propose to use a novel hybrid map, the SRM, to guide the exploration. This topological map is a graph structure that is built incrementally over the course of exploration. As shown in Fig. 1, the node on the graph structure has two attributes: the semantic label and the information gain at the corresponding location. The NBT selection is performed directly on this topological map. Based on the SRM, we propose a hierarchy decision-making strategy, where the high-level decision is made according to the semantic information, and the low-level decision making is concerned with the path cost and the information gain. Taking advantage of such a process, not only can we get the information gain efficiently comparing with computing on the whole metric map, but we can also query a path cost directly on the graph structure. Furthermore, the queried path is optimized using the cross-entropy method. Hence the robot can get the most information along the generated trajectory. The experimental results demonstrate that our method can explore the unknown environment more efficiently than the conventional methods.
The remaining of this paper is organized as follows. In Section. II, we introduce some related work about autonomous exploration. Then in Section. III, we give some preliminaries about our problem. After that, we introduce our methodology in Section. IV. Our experiments results are presented in Section. V. Finally, we give the conclusion and future work in Section. VI.
Ii Related work
A large number of related approaches are developed to cope with different aspects of the autonomous exploration problem . In this section, we mainly focus on three tightly coupled aspects that our framework is built upon in the autonomous exploration. To start with, we give an overview of the map representation in robotic navigation. Then we describe different decision-making strategies in the exploration process. Finally, we review the literature on informative path planning during the exploration.
Ii-a Map representation
The study of map representation in robot tasks is one of the most active areas in robotics research. Generally speaking, the map can be divide into four groups: the metric map, the topological map, the semantic map, and the hybrid map . Various approaches have been proposed to enable the robot navigation with different kinds of maps , but few studies have been focused on the map representation during the exploration. Choset et al.  proposed a sensor based hierarchically generated Voronoi graph to represent the environment. One can query a path efficiently and completely on the graph during the exploration. Rezanejad et al.  suggest to use the topological map generated by the flux skeleton based method on the occupancy grid map, where the skeleton is generated online by utilizing image-based method instead of constructing incrementally with the exploration.
Ii-B Guidance of exploration
NBT evaluation is another critical issue in autonomous exploration. Two frequently used metrics for determining the NBT are the nearest frontier and the information gain of the candidate targets . Shen et al.  improve the exploration efficiency through evaluating the NBT according to a stochastic differential equation-based method. Charrow et al. proposed to use the information-theoretic objective function to guide the exploration. More recently, Umari et al.  present an exploration method based on rapidly exploring random tree (RRT). The frontier regions can be detected efficiently through sampling based method. Besides, some novel metrics are also proposed for the evaluation of NBT. Dornhege et al.  present a frontier-void-based method that uses both the frontier and the unexplored 3D volumes to guide the exploration. Visser et al.  propose to balance the path cost and the information gain during the exploration.
In recent years, there has been growing interests in making the autonomous exploration more intelligent. Owald et al.  present an exploration strategy by exploiting the structure of the environment. Semantic information has been widely used for the robot navigation as well 
. With semantic reasoning, a robot can achieve human-like performance. Moreover, some learning-based methods toward autonomous exploration have been proposed along with the developments in deep learning approaches.
Ii-C Informative path planning
Path planning is of great importance during the exploration but most work mainly pays attention to the motion constraints without considering the information gathered along the path. Heng et al.  propose an algorithm for visual exploration considering both the exploration and the coverage. The robot is pruning to select a path with more information. Davis et al.  present a coverage aware path planning method. This approach plans a path from a start point to the goal while maximizing the user defined regions. Tan et al.  propose an information gathering method based on cross-entropy. The trajectory is optimized using the cross-entropy method to maximize the objective function in information gathering.
Our main goal is to build a precise metric map of an unknown environment with as less path cost and exploration time as possible. We use the occupancy grid map to model the environment. The map, denoted as
, encodes the probability of being occupied for every grid. Suppose there arecandidate regions to be explored, the information gain of the candidate region is obtained using . The entropy over the occupancy grid map is denoted as:
where is the grid cell of the map at location . We use mutual information to represent the information gain for a sensing location , which is defined as:
We propose to use the SRM for the decision-making process over the course of exploration. The SRM, denoted as , is the topological map of the explored environment. It is a graph structure with representing the vertices and with represents the edges that connect the vertices. For every on the graph , it has two attributes: the semantic label and the information gain during the exploration. A semantic mapping process is maintained and the semantic label is continually attached to the nodes of the graph. The nodes on the graph are used to indicate the candidate regions to be explored, i.e. . The path from the robot current location to the target area can be queried efficiently on the graph . Then the path cost is calculated by:
The path will be optimized with respect to utilizing the function , which is the trajectory optimization method based on cross-entropy.
Iv-a Method Overview
To build a precise metric map of the environment, we build a topological map incrementally during the course of exploration. As Fig. 2 shows, the mapping process maintains three maps: the occupancy grid map , the semantic map , and the graph structure . The node on the graph contains the semantic label provided by and the information gain queried on . Robot can find a path efficiently on the graph
, which is then optimized by the cross-entropy based method. The determination of NBT involves not only the path cost and the information gain, but also the high-level semantic information. All these evaluation metrics are combined in the decision-making process to guide the robotic exploration.
Iv-B Semantic Road Map
Iv-B1 Tree construction
The graph structure is built incrementally along with the exploration process. Initially, there is only one original node root at the starting point where the robot starts to explore the environment, i.e. . The pseudo code of building the graph structure is shown in Alg. 1. We propose to use the sampling points under the sensor scope as the candidate vertices for building the graph. As shown in Fig. 3, the laser beams indicate the current sensor scope. Function takes as input the sensor scope and generate sampling points , as indicated by the yellow points in Fig. 3. For a random point , it will find the nearest vertex on graph according to:
where is the set of the vertex on the graph and the connection from to is not collision free. If , then is not a feasible random point since does not exist. Function will be called several times before a feasible sample point is obtained. Then, the feasible point connect the nearest vertex in the graph . The random point and the edge that connects and are added to the existing graph. Furthermore, each effective vertex is attached with a semantic label which corresponds to the room type detected by the semantic mapping pipeline. However, if is less than a threshold, then the random point will be dropped and a new iteration will start. Line 10-22 in Alg. 1 describe the details of adding a vertex to the graph. The algorithm runs online along with the exploration process.
We use A* to search a global path on the existing graph . Since we use the node on the graph to represent the target, it is simple and straightforward to get a sparse path on the graph. Some edges on the graph may become infeasible along with the exploration process. The reason for this is that the collision checking is performed on the incomplete occupancy grid map. The edges that lie in the unknown area in the map is assumed to be collision-free, but sometimes there could be collision in the unknown area. In our proposed method, we check the feasibility of the path only if a path is generated. If the generated path is not feasible, i.e. one or more edges for constructing the graph are not collision free, we delete those edges on the graph and restart the path-finding process.
Iv-B2 Efficient frontier detection
We propose an efficient frontier detection method based on the observation that the frontier cells are always appears grouply, as Fig. 4 shows. When one frontier cell is found, then the adjacent cell may also the frontier cell. A frontier clusters can be obtained efficiently through exploring the surrounding area of a frontier . To get the the cell , the nodes on are used for the evaluation. The graph exists and extends towards the collision free area, while the frontier cells lie between the collision free area and the unknown regions. Hence the frontier point can be found near the nodes on the graph inherently. We exploit the vertices on the graph to find the frontiers over the course of exploration.
As shown in Fig. 4, The shaded square indicates the frontier cells around the nodes detected by the ray-casting method. Two frontier cells are detected around node B on the . Note that there is another frontier cell that is not detected by the raycasting method. This cell can be found efficiently through checking the neighbors of detected frontier cells. Hence all the frontier cells can be obtained through this method. Unlike the conventional method that checks all the cells to get the frontiers. our approach is more efficient because we only need to check the cells around the nodes on the graph. To further improve the efficiency of our method, we divide the nodes on the graph into two groups: the closed nodes and working nodes . The closed nodes are those no frontiers can be detected with. If there is no frontier detected around a nodes using the ray-casting method, then this nodes is closed. We will not check these nodes when performing frontier detection. This prune method is a simplify that can further improve the efficiency since the ray-casting method can also be time consuming.
The proposed frontier detection method is shown in Alg. 2. A cluster of points can be found through function around node . For every point in the set , if it is a frontier cell, then we mark this point on the map and check its neighbor. If the neighbors are also frontiers, this process repeats until all the frontiers in the neighborhood are found. Line 2-13 in Alg. 2 describes the frontier detection process. This iteration ends when there is no frontier detected. This means that there is no frontier regions around the vertex , we will put this vertex into closed set and will not check it in the following iterations.
We use the amount of frontiers to quantify the expected information gain around a node. With the proposed frontier detection method, we can obtain the information efficiently.
Iv-C Informative path planning
The path connecting the start and the goal region contains the vertice and the edges . This coherent path is not optimal since the vertice are generated randomly without considering any constraints. In the proposed framework, the robot is required to gather more information when it moves from one point to another. The path should not only be with less path cost but it should also be more informative. The goal is to compute a trajectory that satisfies:
where is the reward function over the path. We consider both the path cost and the information gain along the path. Thus we define the reward function by: , as modeled in , where is a positive constant which determines trade-off between the exploration and exploitation, is the trajectory length of the path, and is the information gain along the path defined by Equation. 2. guarantees the path is collision-free and satisfies the motion constrains of the robot. Besides that, this function gives the upper bound of the path cost of the final path. is the path length of the sparse path . The optimized path length, should satisfy the constraint that: . This constraint can helps when the optimization of the path does no converge. Moreover, the information gain along a path is a function about time , which means that the sensor scan at time t should consider the effect of its previous scan . This can help the robot avoid the local minimum in information rich area using our proposed objective function.
The objective function is optimized using the cross-entropy method. In what follows we will mainly focus on how to optimize our problem using the cross-entropy method. We refer the interested reader to  for more details. Our development follows closely  using notation adapted to our setting. The path is parameterized by the function , where is the parameter space. The cost function can then be defined by:
This optimization in Equation. 6 can be solved by the cross-entropy method. Denote as the optimal reward, i.e.
where is the accumulated reward. The parameter space has a pdf . Here in our case the pdf is a mixture of Gaussian, where the number of Gaussian components, n, is determined by the vertices on the original path . Equation. 7 can be formulated as:
The optimal pairs can be updated by:
This iteration will end until a certain termination condition is fulfilled. Then we get an optimized trajectory , which is more informative with less path cost as well.
Iv-D Decision making
The target locations during the exploration are always encoded by the nodes on the graph , hence we can easily query a path to the target on the graph . The NBT is determined by the high level semantic information, the path cost and the information gain.
The semantic information can give the robot high-level information for guiding the exploration. We divide the scene into two categories, the room and the connection . is the place that connects multiple rooms, for example, the corridor. During the exploration, robot always tends to explore the rooms instead of the corridors. As Fig. 5 shows, the robot firstly explores the Room 1, Room 2, and Room 3 first according to the distance, then the Corridor 1 and Corridor 2 according to the distance and the information gain. Noticed that when there is no corridor, the exploration will become a Travel Salesman Problem (TSP). Robot will traverse all the rooms according to the effective methods of TSP. Overall the decision is made differently in two cases:
Case 1: If there are rooms detected in the environment, the robot will traverse these rooms according to the distance from its current location. When a new room comes up during this process, the robot will replan this traverse path every time it finishes exploring one room.
Case 2: If there are only corridors in the environment, the robot will select one target according to the reward function . Robot considers both the path cost and the information gain when selecting the target regions in this case.
Robot will make decisions according to the two cases above. In this simple but effective decision model, robot can distinguish the rooms from the connection places. Generally speaking, the robot is able to explore the room one by one without wasting much time in the corridor. This effectively help reduce the backtracking problem, in which the robot will come back and forth to explore unfinished previous tasks.
V Experiments and Results
To demonstrate the benefits of our proposed exploration framework, we conduct a number of experiments in both simulated and real-world environments. The experimental studies demonstrate the effectiveness of our proposed pipeline as well as the efficiency of each individual module of the proposed framework.
We test our proposed framework in a typical simulated indoor environment. As Fig. 6 shows, the blue line indicates the edges and the colorful points indicate the nodes of the graph. Different colors of the node represent different semantic labels. The frontier region, as marked by the red stripe in the Fig. 6(b)-Fig. 6(c), can be completely and efficiently detected using the generated SRM. The reason is that a lot of nodes that are near the frontiers are generated during SRM buliding process. This means the frontier’s location information is already encoded by these nodes, which makes the frontier detection task easier when performing the ray-casting method. This experiment demonstrates that our proposed topological map, which is built incrementally along with the exploration process, can be used for fast frontier detection and guiding the exploration.
To further explore our frontier detection method, we carried out several experiments. Firstly, we conduct ray-casting on all the nodes of the graph to get the frontier regions. This method is more accurate but it is more time-consuming. Note that there are no frontier regions around some nodes on the graph along with the exploration process, so we propose to prune those nodes. The experiment is performed in the simulated environment, as shown in Fig. 7(a). Overall, our proposed method can always detect the frontier region within 0.4s, as shown in Fig. 7(b). For the method without pruning, the frontier detection time increases significantly along with the exploration, corresponding to the increase of the number of nodes on the graph. The method with pruning is even more efficient after pruning extra nodes. As a quantitative measurement, the red line in Fig. 7 indicates the frontier detection time after pruning extra nodes. As we can see, the frontier detection time is increasing during the first 50 seconds. This is because there exists a lot of frontier regions and only few extra nodes can be pruned. However, the frontier detector can become increasingly efficient with more and more extra nodes being pruned.
|Env||Size ()||Our planner||RRT*|
Robot can query a path efficiently on the topological map . To demonstrate the efficiency of our proposed planner, we conduct a comparative study in four different environments. The size of each environment is reported in TABLE. I. The control variable is the planner used during the exploration and the nearest frontier based method is adopted for the decision-making module in all the experiments. We choose RRT* as the comparable planner. As reported in Table. I, our planner is more efficient than the RRT* planner in all the four experimental scenarios. The time spent on the test group and the control group are both positively related to the map size. Moreover, our method can even better in large-scale or high-dimensional environments.
The performance of our trajectory optimization method is demonstrated in Fig. 9. The proposed objective function is optimized using the cross-entropy method. As shown in Fig. 9, the sparse trajectory is queried directly on the generated graph structure. The black line indicates the shortest path satisfying the motion constraints. The useful information is evaluated by the unknown area covered by the robot sensor scope. The black path is short without considering the collected information along the path. Using our proposed cross-entropy based method, we can find a path, as the red line shows in Fig. 9(a), along which the robot can collect more information. With relatively longer path, the robot can explore more unknown areas when it executes the path from the start to the goal. Fig. 9(b) shows the variations of the objective function at different steps of the iterations using our cross-entropy based method. Obviously, the value of the objective function is reduced and we can get an optimal path after a few iterations.
We proposed to guide the exploration using the semantic information, the information gain and the path cost to the target. The experiments are performed in four environments to evaluate the efficiency of our exploration strategy. We compare our method with the Nearest frontier based method and the Max Entropy method. The nearest frontier method is a greedy method which always seeks the nearest frontier to explore during the exploration, while the Max Entropy selects the most informative area as the target to explore. Compared with these two methods, our method is always more efficient in all the four environments, as shown in Fig. 8. The entropy can be reduced to a smaller value with less time using our method. However, it is difficult to select a better one from the other two methods. The performances of these two methods are highly dependent on the environment.
The statistics of the path length and time spending are reported in Fig. 10
. Our proposed guidance strategy can explore the environment with less time and path cost. Besides, the standard deviation of our method is also less than the other ones, which means our method works more steadily. There is no much difference between the performance of the Nearest Frontier based method and the Max Entropy method. Furthermore, we can see from Fig.10 that longer path means longer exploration time. This is because of the fact that most exploration time is spent on the execution of the path. The less the path is, the less exploration time is needed. Our proposed guidance strategy is useful in reducing the path cost, as shown in Fig. 10. Besides, the proposed trajectory optimization method can also provide more information with less path cost, which in turn reduce the path length.
To evaluate the efficiency of our proposed framework, we compare our method with the method that considers both the path cost and the information gain, in which the two criteria are linearly combined weighted by two coefficients and . The combined method uses the RRT* for the path planning during the exploration. Fig. 11 shows the experiment environment, where one typical exploration trajectories of our method and the combined method are illustrated. We compare our method with the combined method at three different starting locations and the data is recorded in Table. II. As we can see, the performance of our proposed method is better than the comparison group in all these three scenarios.The combined method may achieve better performance with proper and . For example, there is no much difference of the mean path length when starting at location A. However, the combined method is not stable, which can be seen from the larger standard deviation of the path length.
|A||72.1 2.1||0.36||74.3 5.2||0.30|
|B||75.8 4.3||0.31||98.2 6.7||0.33|
|C||74.5 2.7||0.27||79.7 5.3||0.25|
Vi Conclusion and future work
In this paper, we propose a novel hybrid map, SRM, for autonomous exploration in indoor environments. This hybrid map features the combination of the semantic map and the topological map. It is a graph structure where a node contains the information gain and the semantic label corresponding to the node location. Both the information gain and the path cost can be obtained efficiently with this graph structure. Moreover, we use the cross-entropy based optimization method to optimize the queried sparse path. This optimization provides a more informative trajectory. The semantic information, combined with the information gain and the path cost, is used to guide the exploration. Our framework is not a simple integration of several modules but a tightly coupled system. The modules in this system contribute to each other to make the exploration more efficient. The experimental studies demonstrate that our method is effective and more efficient than the others.
Our method currently focuses on the 2D environment. In the future, we will extend our method into 3D exploration area. We plan to use the octomap to represent the environment. Our proposed method can efficiently detect the frontiers and plan a path in 3D environments.
-  Brian Yamauchi. A frontier-based approach for autonomous exploration. In Computational Intelligence in Robotics and Automation, 1997. CIRA’97., Proceedings., 1997 IEEE International Symposium on, pages 146–151. IEEE, 1997.
-  Cyrill Stachniss, Giorgio Grisetti, and Wolfram Burgard. Information gain-based exploration using rao-blackwellized particle filters. In Robotics: Science and Systems, volume 2, pages 65–72, 2005.
-  Steven M LaValle. Rapidly-exploring random trees: A new tool for path planning. 1998.
-  Angelos Mallios, Pere Ridao, David Ribas, Marc Carreras, and Richard Camilli. Toward autonomous exploration in confined underwater environments. Journal of Field Robotics, 33(7):994–1012, 2016.
Sebastian Thrun et al.
Robotic mapping: A survey.
Exploring artificial intelligence in the new millennium, 1(1-35):1, 2002.
-  Kurt Konolige, Eitan Marder-Eppstein, and Bhaskara Marthi. Navigation in hybrid metric-topological maps. In Robotics and Automation (ICRA), 2011 IEEE International Conference on, pages 3041–3047. IEEE, 2011.
-  Howie Choset and Joel Burdick. Sensor-based exploration: The hierarchical generalized voronoi graph. The International Journal of Robotics Research, 19(2):96–125, 2000.
-  Morteza Rezanejad, Babak Samari, I Rekleitis, Kaleem Siddiqi, and Gregory Dudek. Robust environment mapping using flux skeletons. In Intelligent Robots and Systems (IROS), 2015 IEEE/RSJ International Conference on, pages 5700–5705. IEEE, 2015.
-  Matan Keidar and Gal A Kaminka. Efficient frontier detection for robot exploration. The International Journal of Robotics Research, 33(2):215–236, 2014.
-  Maani Ghaffari Jadidi, Jaime Valls Miro, and Gamini Dissanayake. Gaussian processes autonomous mapping and exploration for range-sensing mobile robots. Autonomous Robots, 42(2):273–290, 2018.
-  Shaojie Shen, Nathan Michael, and Vijay Kumar. Autonomous indoor 3d exploration with a micro-aerial vehicle. In Robotics and Automation (ICRA), 2012 IEEE International Conference on, pages 9–15. IEEE, 2012.
-  Hassan Umari and Shayok Mukhopadhyay. Autonomous robotic exploration based on multiple rapidly-exploring randomized trees. In Intelligent Robots and Systems (IROS), 2017 IEEE/RSJ International Conference on, pages 1396–1402. IEEE, 2017.
-  Christian Dornhege and Alexander Kleiner. A frontier-void-based approach for autonomous exploration in 3d. Advanced Robotics, 27(6):459–468, 2013.
-  Arnoud Visser and Bayu A Slamet. Balancing the information gain against the movement cost for multi-robot frontier exploration. In European Robotics Symposium 2008, pages 43–52. Springer, 2008.
-  Stefan Oßwald, Maren Bennewitz, Wolfram Burgard, and Cyrill Stachniss. Speeding-up robot exploration by exploiting background information. IEEE Robotics and Automation Letters, 1(2):716–723, 2016.
-  Cyrill Stachniss. Multi-robot exploration using semantic place labels. In Robotic Mapping and Exploration, pages 73–90. Springer, 2009.
Delong Zhu, Tingguang Li, Danny Ho, and Max Q-H Meng.
Deep reinforcement learning supervised autonomous exploration in office environments.In Robotics and Automation (ICRA), 2018 IEEE International Conference on. IEEE, 2018.
-  Lionel Heng, Alkis Gotovos, Andreas Krause, and Marc Pollefeys. Efficient visual exploration and coverage with a micro aerial vehicle in unknown environments. In ICRA, volume 3, pages 3–5, 2015.
-  Bobby Davis, Ioannis Karamouzas, and Stephen J Guy. C-opt: Coverage-aware trajectory optimization under uncertainty. IEEE Robotics and Automation Letters, 1(2):1020–1027, 2016.
-  Y. T. Tan, Abhinav Kunapareddy, and Marin Kobilarov. Gaussian process adaptive sampling using the cross-entropy method for environmental sensing and monitoring. IEEE International Conference on Robotics and Automation (ICRA), Brisbane, Australia, 2018.
-  Héctor H González-Banos and Jean-Claude Latombe. Navigation strategies for exploring indoor environments. The International Journal of Robotics Research, 21(10-11):829–848, 2002.
-  Marin Kobilarov. Cross-entropy motion planning. The International Journal of Robotics Research, 31(7):855–871, 2012.