I. Introduction
The robot exploration problem is defined as making a robot or multiple robots autonomously explore unknown cluttered environments (e.g., offices, forests, ruins) with specific goals. The goals can be classified as: (1) maximizing the knowledge of the unknown environment, i.e., acquiring a map of the world
[19], [12]; (2) searching for a static or moving object without prior information about the world. Exploration with the second goal can exploit prior information about the target object, such as semantic information, but it also needs to fulfill the first goal. Frontier-based methods [21], [4] have been widely used to solve the robot exploration problem. [21] adopts a greedy strategy, which may lead to an inefficient overall path. Many approaches have been proposed to consider more performance metrics (e.g., information gain) [4], [2], [1]. However, these approaches are designed and evaluated in a limited number of environments, and may therefore fail to generalize to environments with different layouts.
Compared with these classical methods, machine learning-based methods exhibit the advantage of learning from various data. Deep Reinforcement Learning (DRL), which uses a neural network to approximate the agent's policy during its interactions with the environment
[17], has gained increasing attention in applications such as games [10] and robotics [6]. When applied to robot exploration problems, most research works [23], [3] design the state space in the form of an image and use Convolutional Neural Networks (CNNs). For example, in
[23], a CNN is utilized as a mapping from the robot's local observation to the next optimal action. However, the size of the CNN's input images is fixed, which results in the following limitations: (1) If input images represent global information, the size of the overall explored map needs to be predefined to prevent input images from failing to fully contain all the map information. (2) If input images represent local information, which fails to convey the complete state, recurrent neural networks (RNNs) or memory networks need to be adopted. Unfortunately, the robot exploration problem in this formulation requires relatively long-term planning, which is still a tough problem that has not been perfectly solved.
In this paper, to deal with the aforementioned problems, we present a novel state representation method that relies on 4D point-clouds-like information of variable size. This information has the same data structure as point clouds and consists of 2D point locations together with the corresponding 1D frontier and 1D distance information, as shown in Fig. 1. We also design the corresponding training framework, which is based on the deep Q-learning method with a variable action space. By replacing the image observation with 4D point-clouds-like information, our proposed exploration model can deal with unknown maps of arbitrary size. Based on dynamic graph CNN (DGCNN) [20], one of the typical neural network structures that process point cloud information, our proposed neural network takes the 4D point-clouds-like information as input and outputs the expected value of each frontier, which can guide the robot to the frontier with the highest value. This neural network is trained in a way similar to DQN in the HouseExpo environment [9], a fast exploration simulation platform that includes data of many 2D indoor layouts. The experiments show that our exploration model achieves relatively good performance compared with the baseline in [9], the state-of-the-art method in [11], the classical methods in [21], [4], and a random method.
I-A. Original Contributions
The contributions of this paper are threefold. First, we propose a novel state representation method using 4D point-clouds-like information to solve the problems mentioned in Section II. Although point clouds have been utilized in motion planning and navigation ([7], [16]), our work differs from these two papers in two main aspects: (1) we use point clouds to represent global information while they represent the local observation; (2) our action space is to select a frontier point from a frontier set of variable size, while their action spaces contain control commands. Second, we design the corresponding state-action value network based on DGCNN [20] and the training framework based on DQN [10]. The novelty is that the size of our action space is variable, which makes our neural network converge faster. Third, we demonstrate the performance of the proposed method in a wide variety of environments that the model has not seen before, including maps much larger than those in the training set.
The remainder of this paper is organized as follows. We first introduce related work in Section II. Then we formulate the frontier-based robot exploration problem and the DRL exploration problem in Section III. After that, the framework of our proposed method is detailed in Section IV. In Section V, we demonstrate the performance of our proposed method through a series of simulation experiments. Finally, we conclude the paper and discuss directions for future work in Section VI.
II. Related Work
In [21], the classical frontier method is defined, where an occupancy map is utilized in which each cell is placed into one of three classes: open, unknown, and occupied. Frontiers are then defined as the boundaries between open areas and unknown areas. The robot can constantly gain new information about the world by moving to successive frontiers, but the problem of selecting which frontier to visit at a specific stage remains to be solved. Therefore, in a frontier-based setting, solving the exploration problem is equivalent to finding an efficient exploration strategy that can determine the optimal frontier for the robot to explore. A greedy exploration strategy is utilized in [21] to select the nearest unvisited, accessible frontier. The experimental results in that paper show that the greedy strategy is short-sighted and can waste a lot of time, especially when it misses a nearby frontier that would disappear at once if selected (this case is illustrated in the experiment part).
Many DRL techniques have been applied to the robot exploration problem in several previous works. In a typical DRL framework [17], the agent interacts with the environment by taking actions and receiving rewards from the environment. Through this trial-and-error process, the agent eventually learns an optimal policy. In [18], a CNN is trained under the DQN framework with RGB-D sensor images as input to make the robot learn obstacle avoidance during exploration. Although avoiding obstacles is important, that paper does not apply DRL to learn the exploration strategy itself. The work in [11] combines frontier-based exploration with DRL to learn an exploration strategy directly. The state information includes the global occupancy map, robot locations, and frontiers, while the action is to output the weight of a cost function that evaluates the goodness of each frontier. The cost function includes distance and information gain terms; by adjusting the weight, the relative importance of each term can be changed. However, the terms of the cost function rely on human knowledge and may not be applicable in other situations. In [8], the state space is similar to the one in [11], while the action is to select a point from the global map. However, the map size can vary dramatically from one environment to the next, and setting a maximum map size before exploring loses generality.
In [23] and [3], a local map centered at the robot's current location is extracted from the global map to represent the current state. By using a local map, [23] trains the robot to select among the actions "turn left, turn right, move forward", while [3] learns to select points in the free space of the local map. Using a local map as the state space eliminates the limitation on the global map size, but the current local map fails to contain all the information. In [23], the robot tends to get trapped in an explored room when there is no frontier in the local map, because the robot has no idea where the remaining frontiers are. The training process in [3] needs to drive the robot to the nearest frontier when the local map contains no frontier information, even though an RNN is integrated into their framework. This human intervention adopts a greedy strategy and cannot guarantee an optimal or near-optimal solution. When utilizing local observations, the robot exploration problem requires the DRL approach to have a long-term memory. Neural Map [13] was proposed to tackle the simple memory architecture problem in DRL, and Neural SLAM [22] embeds traditional SLAM into an attention-based external memory architecture. However, the memory architectures in [13] and [22] are based on a fixed global map size, which makes them difficult to apply to unknown environments that may be quite large compared with the maps in the training set. Unlike the aforementioned methods, our method uses 4D point-clouds-like information to represent the global state, which suffers from neither the map size limitation nor the simple memory problem. As far as we know, our method is the first to apply point clouds to the robot exploration problem. Therefore, we also design a corresponding DRL training framework that maps the exploration strategy directly from point clouds.
III. Problem Formulation
Our work aims to develop and train a neural network that can take 4D point-clouds-like information as input and generate an efficient policy to guide the exploration process of a robot equipped with a laser scanner. The network should take into account the information about explored areas, occupied areas, and unexplored frontiers. In this paper, the robot exploration problem is to make a robot equipped with a limited-range laser scanner explore unknown environments autonomously.
III-A. Frontier-based Robot Exploration Problem
In the robot exploration problem, a 2D occupancy map is most frequently adopted to store the explored environment information. Define the explored 2D occupancy map at step $t$ as $M_t$. Each grid cell in $M_t$ can be classified into one of the following three states: free, occupied, and unknown. According to [21], frontiers $F_t$ are defined as the boundaries between the free space and the unknown space. Many existing DRL exploration frameworks learn a mapping from the explored map and robot location to robot movement commands that can avoid obstacles and navigate to specific locations. Although this end-to-end framework has a simple structure, it is difficult to train. Instead, our method learns a policy network that directly determines which frontier to explore, similar to [11] and [8]. At step $t$, a target frontier is selected from $F_t$ based on an exploration strategy and the current explored map $M_t$. Once the change of the explored map is larger than a threshold, the robot is stopped and the explored map $M_{t+1}$ at step $t+1$ is obtained. By moving to selected frontiers constantly, the robot explores more unknown areas until no accessible frontier remains.
Because $M_t$ can be represented as an image, it is commonly used to directly represent the state information. As explained in Section II, a novel state representation with 4D point-clouds-like information is proposed instead. The 4D point-clouds-like information at step $t$ is defined as a 4-dimensional point cloud set with $n$ points, denoted by $P_t = \{p_1, \dots, p_n\}$. Each point $p_i$ contains 4D coordinates $(x_i, y_i, d_i, f_i)$, where $(x_i, y_i)$ denotes the location of the point, $d_i$ denotes the distance from the point to the robot location without collision, and $f_i$ denotes whether the point belongs to a frontier or not.
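A minimal sketch of assembling this variable-size state is shown below; the function name and the `dist` map passed in are illustrative assumptions, with the collision-free distances assumed to come from the search described later:

```python
# Sketch: pack frontier and obstacle cells into one variable-size set of
# (x, y, d, f) points. "dist[(x, y)]" is assumed to hold the collision-free
# travel distance from the robot; obstacle points get d = 0 here since only
# frontier distances matter for action selection.
def build_state(frontier_cells, obstacle_cells, dist):
    points = [(x, y, dist[(x, y)], 1.0) for (x, y) in frontier_cells]
    points += [(x, y, 0.0, 0.0) for (x, y) in obstacle_cells]
    return points
```

The resulting list has one row per point and grows or shrinks with the explored map, which is exactly what a fixed-size image input cannot do.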
III-B. DRL Exploration Formulation
The robot exploration problem can be formulated as a Markov Decision Process (MDP), which can be modeled as a tuple $(S, A, P, R, \gamma)$. The state at step $t$ is defined by the 4D point cloud $P_t$, which can be divided into the frontier set $F_t$ and the obstacle set $O_t$. The action space at step $t$ is the frontier set $F_t$, and the action $a_t$ is to select a point from $F_t$, which becomes the goal of the navigation module implemented with A* [15]. When the robot takes an action from the action space, the state transits to state $s_{t+1}$ according to the stochastic state transition probability $p(s_{t+1} \mid s_t, a_t)$. Then the robot receives an immediate reward $r_t$. The discount factor $\gamma$ adjusts the relative importance of immediate and future rewards. The objective of DRL algorithms is to learn a policy that selects actions to maximize the expected return, defined as the accumulated discounted reward over time.

Because the action space varies with the size of the frontier set $F_t$, it is difficult to design a neural network that maps the state to the action directly. Value-based RL is more suitable for this formulation. In value-based RL, a vector of action values, i.e., the expected returns after taking actions in state $s_t$ under policy $\pi$, can be estimated by a deep Q-network (DQN) $Q(s, a; \theta)$, where $\theta$ are the parameters of the multi-layered neural network. The optimal policy is to take the action with the highest action value: $a_t = \arg\max_a Q(s_t, a; \theta)$. DQN [10] is a novel variant of Q-learning, which utilizes two key ideas: experience replay and a target network. DQN tends to overestimate action values, which can be tackled by double DQN [5]. Double DQN selects the action in the same way as DQN, while estimating this action's value with the target network.

IV. Algorithm
In this section, we present the framework of our method and illustrate its key components in detail.
IV-A. Framework Overview
The typical DRL framework is adopted, where the robot interacts with the environment step by step to learn an exploration strategy. The environment in this framework is based on HouseExpo [9]. The state and action space of the original environment are the local observation and the robot movement commands, respectively. When incorporated into our framework, the environment receives a sequence of robot movement commands and outputs the global map once the change of the explored map is larger than the threshold, as detailed in Section III-A. As shown in Fig. 2, the 4D point-clouds-like information can be obtained by a modified Dijkstra algorithm. After the policy network outputs the goal point, the A* algorithm is used to find a path connecting the robot location to the goal point. Then a simple path tracking algorithm is applied to generate the sequence of robot movement commands.
IV-B. Frontier Detection and Distance Computation
Computing the collision-free distances from the robot location to the points in the frontier set would be time-consuming if each distance were obtained by running a search algorithm once per point. Instead, we modify the Dijkstra algorithm to detect frontiers and compute distances at the same time by sharing the search information. Denote the "open list" and "close list" as $O$ and $C$, respectively. The open list contains points that need to be searched, while the close list contains points that have been searched. Define the contour list as $L$, which contains the locations and costs of points that belong to the frontier or to obstacles. Only points in the free space of map $M_t$ are walkable. The goal of this modified algorithm is to extract the contour of the free space and obtain the distance information simultaneously. As shown in Algorithm 1, the start point, decided by the robot location and with a cost of zero, is added to $O$. While the open list is not empty, the algorithm repeatedly searches the 8 points adjacent to the current point. The differences from the Dijkstra algorithm are: (1) if an adjacent point belongs to occupied or unknown space, the current point is added to the contour list $L$, as shown in line 10 of Algorithm 1; (2) instead of stopping when a goal is found, the algorithm terminates when $O$ contains zero points. After the algorithm ends, the contour list contains the points that are frontiers or boundaries between free space and obstacle space. Points in the contour list can be classified into the frontier or obstacle set by their neighboring information, as shown in Fig. 3.
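A minimal sketch of this modified search, under the assumptions of a 0/1/-1 grid encoding (free/occupied/unknown) and unit/diagonal step costs, could look like the following; it is an illustration of the idea, not the paper's Algorithm 1:

```python
import heapq

# Modified Dijkstra sketch: expand over free cells only from the robot cell,
# record travel distances, and collect contour cells (free cells with an
# occupied or unknown 8-neighbor) together with their distances.
def dijkstra_contour(grid, start):
    rows, cols = len(grid), len(grid[0])
    dist = {start: 0.0}          # settled distances (the "close list" role)
    contour = {}                 # contour cell -> distance to the robot
    open_heap = [(0.0, start)]   # the "open list"
    while open_heap:
        d, cur = heapq.heappop(open_heap)
        if d > dist.get(cur, float("inf")):
            continue             # stale heap entry, already settled cheaper
        r, c = cur
        for dr in (-1, 0, 1):
            for dc in (-1, 0, 1):
                if dr == 0 and dc == 0:
                    continue
                nr, nc = r + dr, c + dc
                if not (0 <= nr < rows and 0 <= nc < cols):
                    continue
                if grid[nr][nc] != 0:
                    # neighbor is occupied/unknown: cur lies on the contour
                    contour[cur] = d
                    continue
                step = 1.41421356 if dr and dc else 1.0
                nd = d + step
                if nd < dist.get((nr, nc), float("inf")):
                    dist[(nr, nc)] = nd
                    heapq.heappush(open_heap, (nd, (nr, nc)))
    return contour, dist
```

One pass yields both the contour (to be split into frontier and obstacle points by their neighborhood) and the collision-free distance of every reachable free cell, which is the sharing of search information described above.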
IV-C. Network with Point Clouds as Input
In this section, the architecture of the state-action value network with 4D point-clouds-like information as input is detailed. The architecture is modified from DGCNN for the segmentation task [20], which proposes the edge convolution (EdgeConv) operation to extract local information from point clouds. The EdgeConv operation includes two steps for each point in the input set: (1) construct a local graph including the center point and its k-nearest neighbors; (2) apply convolution-like operations on the edges connecting each neighbor to the center point. The EdgeConv operation takes a point set of size $n \times d_{in}$ as input and outputs a feature set of size $n \times d_{out}$, where $d_{in}$ and $d_{out}$ denote the dimensions of the input and output sets, and $n$ denotes the number of points in the input set. Different from DGCNN and other typical networks processing point clouds such as PointNet [14], which have the same input and output point numbers, our network takes the frontier and obstacle sets as input and only outputs the values of points from the frontier set. The reason for this special treatment is to decrease the size of the action space to make the network converge faster.
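The two EdgeConv steps can be illustrated with a toy sketch: build a k-NN graph, form the edge feature $(x_i, x_j - x_i)$ per neighbor, apply a shared map $h$, and max-aggregate. The fixed function `h` below stands in for the learned MLP; everything here is illustrative and not the trained network:

```python
# Toy EdgeConv step (after DGCNN [20]): per-point k-NN graph, per-edge
# feature h(x_i, x_j - x_i), max aggregation over neighbors.
def edgeconv(points, k, h):
    def d2(p, q):
        return sum((a - b) ** 2 for a, b in zip(p, q))
    out = []
    for i, p in enumerate(points):
        # k nearest neighbors of p (excluding p itself)
        nbrs = sorted((j for j in range(len(points)) if j != i),
                      key=lambda j: d2(p, points[j]))[:k]
        # max-aggregate the per-edge features h(x_i, x_j - x_i)
        feats = [h(p, tuple(a - b for a, b in zip(points[j], p)))
                 for j in nbrs]
        out.append(max(feats))
    return out
```

Because the neighborhood is recomputed from the current points, the operation applies to input sets of any size, which is what lets the network accept a variable number of frontier and obstacle points.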
The network takes as input $n$ points at time step $t$, consisting of frontier points and obstacle points, denoted as $F_t$ and $O_t$ respectively. The output is a vector of estimated values, one for each action. The network architecture contains multiple EdgeConv layers and multi-layer perceptron (mlp) layers. At an EdgeConv layer, the edge feature set is extracted from the input set, and all features in this set are aggregated to compute the output EdgeConv feature for each corresponding point. At an mlp layer, the data of each point is operated on independently to obtain per-point information, the same as the mlp in PointNet
[14]. After 4 EdgeConv layers, the outputs of all EdgeConv layers are aggregated and processed by an mlp layer to generate a 1D global descriptor, which encodes the global information of the input points. This 1D global descriptor is then concatenated with the outputs of all EdgeConv layers. After that, points belonging to the obstacle set are ignored, and the points from the frontier set are processed by three mlp layers to generate a score for each point.

IV-D. Learning Framework Based on DQN
As described in Section III-B, the DQN is a neural network that, for a given state, outputs a vector of action values. The network architecture of the DQN is detailed in Section IV-C, where the state in point cloud format contains the frontier and obstacle sets. Under a given policy $\pi$, the true value of an action $a$ in state $s$ is $Q^{\pi}(s, a) = \mathbb{E}\left[\sum_{k \ge 0} \gamma^{k} r_{t+k+1} \mid s_t = s, a_t = a, \pi\right]$. The target is to make the estimate from the frontier network converge to this true value. The parameters $\theta$ of the action-value function can be updated by gradient descent after taking action $a_t$ in state $s_t$ and observing the next state $s_{t+1}$ and immediate reward $r_{t+1}$:

$$\theta_{t+1} = \theta_t + \alpha \left( Y_t - Q(s_t, a_t; \theta_t) \right) \nabla_{\theta_t} Q(s_t, a_t; \theta_t) \qquad (1)$$

where $\alpha$ is the learning rate and the target $Y_t$ is defined as

$$Y_t = r_{t+1} + \gamma\, Q\!\left(s_{t+1}, \arg\max_{a} Q(s_{t+1}, a; \theta_t); \theta^{-}\right) \qquad (2)$$

where $\theta^{-}$ denotes the parameters of the target network, which are updated periodically by $\theta^{-} \leftarrow \theta$.

To make the estimate from our network converge to the true value, a typical DQN framework [10] can be adopted. For each step $t$, the tuple $(s_t, a_t, r_{t+1}, s_{t+1})$ is saved in a replay buffer. The parameters can be updated by equations (1) and (2) given tuples sampled from the replay buffer.
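The double DQN target of equation (2) for one transition can be sketched as follows, with the Q-functions given as callables over the variable-size frontier set; the function names are illustrative assumptions:

```python
# Double DQN target for a single transition: the online network picks the
# best next frontier, the target network evaluates it. An empty frontier set
# is treated as a terminal state (no bootstrap term).
def double_dqn_target(r, gamma, next_frontiers, q_online, q_target):
    if not next_frontiers:           # no frontier left: episode over
        return r
    a_star = max(next_frontiers, key=q_online)
    return r + gamma * q_target(a_star)
```

Note how nothing here depends on a fixed action count: the argmax simply ranges over however many frontiers the next state happens to have, which is why the value-based formulation suits the variable action space.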
The reward signal in the DRL framework helps the robot know whether an action is appropriate to take in a certain state. To make the robot able to explore unknown environments successfully, we define the following reward function:

$$r_t = r_t^{area} + r_t^{move} + r_t^{frontier} \qquad (3)$$

The term $r_t^{area}$ equals the newly discovered area at time $t$, which is designed to encourage the robot to explore unknown areas. The term $r_t^{move}$ provides a consistent penalization signal to the robot whenever a movement command is taken. This reward encourages the robot to explore unknown environments with a relatively short overall path. The term $r_t^{frontier}$ is designed to guide the robot to reduce the number of frontiers, as defined in equation (4):

$$r_t^{frontier} = N_{t-1} - N_t \qquad (4)$$

where $N_t$ denotes the number of frontier groups at time $t$.
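A hedged sketch of this reward is given below; only the three-term structure (new area, per-command penalty, frontier-group reduction) comes from the text, while the weighting coefficients are our assumptions:

```python
# Illustrative reward combining the three terms described above. The
# coefficients w_area, w_move, w_frontier are assumed values, not the
# paper's settings.
def reward(new_area, n_cmds, n_groups_prev, n_groups,
           w_area=1.0, w_move=0.01, w_frontier=1.0):
    r_area = w_area * new_area                            # newly discovered area
    r_move = -w_move * n_cmds                             # penalty per movement command
    r_frontier = w_frontier * (n_groups_prev - n_groups)  # fewer frontier groups
    return r_area + r_move + r_frontier
```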
V. Training Details and Simulation Experiments
In this section, we first detail the training process of our DRL agent. Then we validate the feasibility of our proposed method on the robot exploration problem through two sets of experiments: (1) a comparison with five other exploration methods, and (2) a test of scalability in which maps larger than those in the training set are to be explored.
V-A. Training Details
To learn a general exploration strategy, the robot is trained in the HouseExpo environment, where 100 maps with different shapes and features are to be explored. The robot is equipped with a 2 m range laser scanner with a 180 degree field of view. The noise of the laser range measurements is simulated by a Gaussian distribution with a mean of 0 and a standard deviation of 0.02 m. As an episode starts, a map is randomly selected from the training set, and the robot's start pose, including location and orientation, is also set randomly. In the beginning, the circular area centered at the start location is explored by making the robot rotate 360 degrees. Then, at each step, a goal frontier point is selected from the frontier set under the policy of our proposed method. The A* algorithm is applied to find a feasible path connecting the current robot location to the goal frontier, and a simple path tracking algorithm is used to generate the robot commands that follow the planned path: the robot moves to the nearest unvisited path point, and the path is replanned if the distance between the robot and this point exceeds a fixed threshold. An episode ends when the explored ratio of the whole map exceeds 0.95.

Table 1: HouseExpo settings and training parameters

HouseExpo                         Training
Laser range          2 m          Discount factor                  0.99
Laser field of view  180 deg      Target network update frequency  4000 steps
Robot radius                      Learning rate                    0.001
Linear step length                Replay buffer size               50000
Angular step length               Exploration rate                 15000
Meter2pixel                       Learning starts                  3000
Some training map samples are shown in Fig. 5. The largest map in the training set is 256 by 256 pixels. Because the size of the state information changes at each step and batch updating requires point clouds of the same size, it is currently not practical to train the model in a batch-update way. In typical point cloud classification training, all point cloud data are preprocessed to the same size; however, such operations would change the original data's spatial information in our method. Instead, at each step, the network parameters are updated 32 times with a batch size of 1. The learning parameters are listed in Table 1. The training is performed on a computer with an Intel i7-7820X CPU and a GeForce GTX 1080Ti GPU. The training starts at 2000 steps and ends after 90000 update steps, which takes 72 hours.
V-B. Comparison Study
Besides our proposed method, we also test the performance of the weight tuning method in [11], a cost-based method in [4], the greedy method in [21], a method utilizing a random policy, and the baseline in [9], which we denote as the weight method, cost method, greedy method, random method, and baseline, respectively. To compare the performance of the different methods, we use 4 maps as a testing set, as shown in Fig. 6. For each test map, we conduct 100 trials, setting the robot's initial location randomly for each trial.
The random method selects a frontier point from the frontier set randomly. The greedy method chooses the nearest frontier point. The baseline utilizes a CNN that directly determines the robot movement commands from the current local observation.
The cost method evaluates the scalar value of each frontier by a cost function considering distance and information gain:

$$V(f) = w \cdot g(f) - (1 - w) \cdot d(f) \qquad (5)$$

where $w$ is the weight that adjusts the relative importance of the costs, and $d(f)$ and $g(f)$ denote the normalized distance and information gain of frontier $f$. At each step, after obtaining the frontier set as detailed in Section IV-B, the k-means method is adopted to cluster the points in the frontier set into frontier centers. To reduce the runtime of computing information gains, we only compute the information gain for each frontier center. The information gain is computed as the area of the unknown space that would be explored if this frontier center were selected
[4]. The weight $w$ in the cost method is fixed to 0.5, which fails to be optimal in environments with different structures and features. The weight method can learn to adjust the value of the weight in equation (5) under the same training framework as our proposed method. The structure of the neural network in the weight method is presented in Fig. 4; it takes the 4D point-clouds-like information as input and outputs a scalar value.
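The selection step of the cost method can be sketched as follows, with clustering omitted and per-center normalized distance and gain assumed as inputs; the linear combination is our reading of the garbled equation (5), not a confirmed form:

```python
# Illustrative cost-method selection: given (id, normalized distance,
# normalized information gain) per frontier center, pick the center with the
# highest w*g - (1-w)*d. The trade-off form is an assumption.
def select_by_cost(centers, w=0.5):
    best_id, best_v = None, float("-inf")
    for cid, d, g in centers:
        v = w * g - (1 - w) * d
        if v > best_v:
            best_id, best_v = cid, v
    return best_id
```

The weight method, by contrast, makes `w` the output of a learned network instead of a constant, which is the only degree of freedom it can adapt.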
We select the length of the overall path, recorded after 95% of the area has been explored, as the metric to evaluate the relative performance of the six exploration methods; it indicates the efficiency of an exploration method. The box plot in Fig. 7 visualizes the path-length data of each exploration method on the four test maps. The baseline is not considered here because it sometimes fails to explore the whole environment, as explained later. Three statistics of this metric are used to analyze the experimental results: (1) the average, (2) the minimum, and (3) the variance. Our proposed method has the minimum average path length on all four test maps, and its minimum path length on each map is also smaller than those of the other five methods. This indicates that the exploration strategy of our proposed method is more effective and efficient than the other five methods. The random method has the largest variance and average for each map because it is more likely to sway between two or more frontier groups, which is why its overall path in Fig. 8 is the most disorganized. The weight method has a lower average and minimum than the cost method on all test maps, owing to the advantages of learning from rich data. However, the weight method only adjusts a weight value to change the relative importance of distance and information gain. If the computation of the information gain is inaccurate, or another relevant cost exists, the weight method fails to fully demonstrate the advantages of learning. Instead, our proposed method can learn useful information, including information gain, by learning to estimate the value of frontiers, which is why it outperforms the weight method.

Fig. 8 shows an example of the overall paths of the six different methods when exploring map 1 with the same starting pose. Each episode ended once the explored ratio exceeded 0.95. The proposed method explored the environment with the shortest path. The explored ratio of the total map as a function of the current path length is shown in Fig. 9. The greedy method's curve has a horizontal segment when the explored ratio is near 0.95. This is because the greedy method missed a nearby frontier that would have disappeared at once if selected; it chose the nearest frontier instead, which made the robot travel back to the missed frontier later and resulted in a longer path. For example, in Fig. 8, the greedy method chose to explore point C instead of point A when the robot was at point B. This decision forced the robot to travel to point A later, making the overall path longer. The baseline's curve also exhibits a horizontal segment in Fig. 9, which is quite long. The baseline's local observation contains no frontier information once the surroundings are fully explored (in the left part of the environment shown in Fig. 8). In this situation, the baseline can only take "random" actions (i.e., travelling along the wall) to find the remaining frontiers, which wastes a lot of travel distance and may fail to explore the whole environment.
V-C. Scalability Study
In this section, a map of size (531, 201) is used to test the performance of our proposed method in environments larger than the maps in the training set. If the network's input were fixed-size images, the map would need to be padded into a (531, 531) image, which is an inefficient state representation. A downscaling operation would then be required to match the image to the input size of the neural network, e.g., (256, 256). Although the neural network can process the state data after downscaling, the quality of the input data decreases, and the network fails to work once the scaled input contains much less of the necessary information than the original image. Fig. 10 presents the overall path generated by our method without downscaling the map. Our proposed method, which takes point clouds as input, is more robust on large maps for two reasons. First, we incorporate the distance information into the point clouds, which helps the neural network learn which parts of the point clouds are walkable. Although an image representation could also carry a fourth channel of distance information, the scaling operation can make some important obstacle or free points disappear, which changes the structure of the map. Second, the number of pixels in an image grows quadratically as the side length of the image increases, whereas the number of points in the point clouds equals the number of pixels that represent an obstacle or frontier, which grows much more slowly unless all the pixels in a map are obstacles or frontiers.

VI. Conclusions and Future Work
In this paper, we present a novel state representation method using 4D point-clouds-like information and design a framework to learn an efficient exploration strategy from it. Our proposed method solves the problems that come with using images as observations. The experiments demonstrate the effectiveness of our proposed method compared with five other commonly used methods. For future work, other network structures and RL algorithms can be adapted and applied to the robot exploration problem with point clouds as input. The convergence speed of training may also be improved by optimizing the training techniques. Besides, the multi-robot exploration problem may also use point clouds to represent the global information.
References
 [1] (2011) Exploration strategies based on multi-criteria decision making for searching environments in rescue operations. Autonomous Robots 31 (4), pp. 401–417. Cited by: §I.
 [2] (2002) Information based adaptive robotic exploration. In IEEE/RSJ International Conference on Intelligent Robots and Systems, Vol. 1, pp. 540–545. Cited by: §I.
 [3] (2019) Self-learning exploration and mapping for mobile robots via deep reinforcement learning. In AIAA SciTech 2019 Forum, pp. 0396. Cited by: §I, §II.
 [4] (2002) Navigation strategies for exploring indoor environments. The International Journal of Robotics Research 21 (10–11), pp. 829–848. Cited by: §I, §V-B.

 [5] (2016) Deep reinforcement learning with double Q-learning. In Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence, AAAI'16, pp. 2094–2100. Cited by: §III-B.
 [6] (2020) RLBench: the robot learning benchmark & learning environment. IEEE Robotics and Automation Letters 5 (2), pp. 3019–3026. Cited by: §I.
 [7] (2020) Robust RL-based mapless local planning: using 2D point clouds as observations. IEEE Robotics and Automation Letters 5 (4), pp. 5787–5794. Cited by: §I-A.
 [8] (2019) Deep reinforcement learning-based automatic exploration for navigation in unknown environment. IEEE Transactions on Neural Networks and Learning Systems 31 (6), pp. 2064–2076. Cited by: §II, §III-A.
 [9] (2019) HouseExpo: a large-scale 2D indoor layout dataset for learning-based algorithms on mobile robots. arXiv preprint arXiv:1903.09845. Cited by: §I, Fig. 2, §IV-A, §V-B.
 [10] (2015) Human-level control through deep reinforcement learning. Nature 518 (7540), pp. 529–533. Cited by: §I, §I-A, §III-B, §IV-D.
 [11] (2019) Deep reinforcement learning robot for search and rescue applications: exploration in unknown cluttered environments. IEEE Robotics and Automation Letters 4 (2), pp. 610–617. Cited by: §I, §II, §III-A, §V-B.
 [12] (2016) Speeding-up robot exploration by exploiting background information. IEEE Robotics and Automation Letters 1 (2), pp. 716–723. Cited by: §I.
 [13] (2018) Neural Map: structured memory for deep reinforcement learning. In International Conference on Learning Representations. Cited by: §II.

 [14] (2017) PointNet: deep learning on point sets for 3D classification and segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 652–660. Cited by: §IV-C.
 [15] (1968) A formal basis for the heuristic determination of minimum cost paths. IEEE Transactions on Systems Science and Cybernetics 4 (2), pp. 100–107. Cited by: §III-B.
 [16] (2020) Learning obstacle representations for neural motion planning. In Proceedings of the Conference on Robot Learning (CoRL). Cited by: §I-A.
 [17] (2018) Reinforcement learning: an introduction. MIT Press. Cited by: §I, §II.
 [18] (2016) A robot exploration strategy based on Q-learning network. In 2016 IEEE International Conference on Real-time Computing and Robotics (RCAR), pp. 57–62. Cited by: §II.
 [19] (2002) Probabilistic robotics. Communications of the ACM 45 (3), pp. 52–57. Cited by: §I.
 [20] (2019) Dynamic graph CNN for learning on point clouds. ACM Transactions on Graphics (TOG) 38 (5), pp. 1–12. Cited by: §I, §I-A, §IV-C.
 [21] (1997) A frontier-based approach for autonomous exploration. In Proceedings 1997 IEEE International Symposium on Computational Intelligence in Robotics and Automation (CIRA'97), pp. 146–151. Cited by: §I, §II, §III-A, §V-B.
 [22] (2017) Neural SLAM: learning to explore with external memory. arXiv preprint arXiv:1706.09520. Cited by: §II.
 [23] (2018) Deep reinforcement learning supervised autonomous exploration in office environments. In 2018 IEEE International Conference on Robotics and Automation (ICRA), pp. 7548–7555. Cited by: §I, §II.