Identifying previously encountered locations is fundamental to a variety of robotic applications such as loop closure in simultaneous localization and mapping (SLAM) , or merging observations from different robots in multi-robot systems 
. This problem is harder in GPS-denied or restricted settings where absolute information about locations is unavailable or unreliable. Forests are an interesting and challenging use of robot systems, with promising applications to automatic timber volume estimation, animal detection  and forest fire management . The problem of place recognition is critical in cluttered forests as the operational area is vast, GPS is typically unreliable due to dense forest canopy, infrastructure for long range communication is usually not available, and the environment is repetitive .
To address this, we propose a new method that defines polygons based on tessellations of the set of landmark detections to define unique signatures of parts of a robot observation that can be used to localize the agent and compute landmark correspondences.
Place recognition has been successfully solved in different applications in urban or indoor scenarios where features have meaningful and discriminative information [8, 17]. For the multi-view case, observations of an environment can be differentiated by encoding the semantic labels observed in a scene . However, in the context of navigation under forest canopy these methods are not viable since the only reliable information differentiating trees are their position in space.
In this type of scenario, instead of a bag of words approach, one can resort to encode the spatial relationships of the objects using a global descriptor that encodes the entire observation.
Graphs are a natural representation for these types of representations. The nodes of the graph are usually given by feature or landmarks detected in the observation, from which a kernel can be applied to extract a descriptor to be matched against other observations [9, 24]. GLARE , its rotation invariant extension GLAROT , and its 3D extension  first compute a descriptor for each observation based on the relative distances and angles between landmarks. Then, given a possible match, they recompute the same features for each landmark to get pairs to estimate an affine transformation between the observations. Even though this method has had some success in a forest environment , by discretizing the space to compute its descriptor it is susceptible to noise since the estimate of the same landmark can lie on different bins. For the 2D case, 
extends these works by representing the same metrics as probability density functions to avoid the discretization problem, but it can still suffer from noise and partial overlap.Since we encode spatial relationships with geometric primitives, our method does not require discretization.
On the other end of the spectrum, local methods compute descriptors based on small regions around feature points can successfully match even in the presence of partial overlap or occlusion. Gawel et al.  merges accumulated point cloud maps from vision and lidar systems by calculating structural descriptors based on the density of the neighborhood of feature points. Fast point feature histograms (FPFH) [22, 23], creates descriptors for each point by computing surface normals using neighboring points inside a user-defined radius, represented as angular features that are binned into a histogram.
The drawback of local methods is that the number of comparisons usually grows linearly with the number of points in the set, resulting in higher computational times. For robotics applications where speed and resource allocation is critical, this can be prohibitive.
By using polygons derived from the Urquhart graph, our method encodes information with less descriptors than a local method while also being robust to partial overlap and noise. We present experimental evidence in simulation and in real world environments that the method can be successfully used for these tasks, and is capable of handling more perturbations than current state-of-the-art methods. In summary, our contributions are:
A novel descriptor based on polygons capable of handling large scale repetitive environments while being robust to noise.
A framework for alignment of arbitrary observations using only the position of landmarks
Ii Problem Formulation
We model a robot with a noisy, limited range sensor traversing an environment filled with identical landmarks. We define a map as a set of landmarks , where the only information differentiating and , are their locations in . Due to the limited range sensor, a robot at time has the potential to detect a submap of landmarks .
However, due to sensor noise and landmark occlusion, some landmarks may not be successfully detected. Let
be the Boolean random variable representing successful detection of landmark,
Similarly to , we model, . We define the observed submap under the presence of detection noise
We assume that each landmark has a 2-D coordinate projection on the -plane in the map frame . Due to sensor noise and uncertainty in the landmark projection, the observed 2-D coordinate projection may differ from the true 2-D coordinate projection . We model this noise as , where
is a 2-D Gaussian random variable with zero mean and variance, . The 2-D observation of including both forms of sensor noise is
Finally, the robot will typically perceive its surroundings in its local frame, not the global map frame. Suppose that at time , the pose of the robot projected in the -plane in the map frame is . We define the noisy 2-D observation of in this local frame as
We will refer to the robot pose at time as for ease of notation.
Problem (Place recognition with identical landmarks). Given noisy 2-D observations and taken at times , determine if the corresponding sub-maps and if so, estimate the associated rigid transformation where
Let , be a set of discrete points in general position, the Delaunay triangulation  defines a triangulation on the space defined by the convex hull of such that no point is in the circumcicle of any polygon of . A tessellation is a finite family of polygons which cover without gaps or overlaps. The Delaunay triangulation of , is a tessellation.
We can represent as sets with different classes of geometric elements. The edges of are given by the set , where each edge is a line segment bounded by a pair of end points and , and its triangles are given by the set .
We define a function , for where
e.g. maps triangles of the Delaunay triangulation to its corresponding edges. The set
represents the largest edges of each triangle.
From a graph theoretic viewpoint, can be represented as a graph where and . The Urquhart Graph  of is a graph where and . is a sub-graph of where the longest edges of each triangle are removed.
A cycle of an arbitrary graph is a non-empty sequence of edges with a vertex sequence such that . A simple cycle is a cycle with a corresponding vertex sequence which has the property , i.e. there are no repeated vertices except for the first and the last.
The cycle basis of a graph is the minimal set of simple cycles such that for all cycles , , such that where represents the symmetric difference operation. Intuitively, any cycle can be computed with elements of the cycle basis.
For , the cycle basis of is equivalent to the set of triangles defined by the tessellation. Similarly, defines a new tessellation , where . Each cycle of can be viewed as a composition of cycles of that shared the longest edge that was dropped.
We refer to the set of polygons given by as and the union of all three sets as . Note that since elements of are derived from elements of , equation 2 can be used to map each polygon of the Urquhart tessellation to a subset of triangles of the Delaunay triangulation that compose it.
In this section, we describe our method for extracting from an observation , the polygons or for brevity. Moreover, we present a framework for solving Eq. 1 using landmark correspondences derived from . We assume that the distribution in space of the landmarks captured in the range of a lidar reading of the environment is unique, and leverage the position of these landmarks to define polygons that can be used to uniquely represent a sub-map .
For a given observation of a sub-map , is computed as follows: (1) Compute from ; (2) Compute from and ; (3) Extract from .
In this work we build off of the pipeline proposed in , and we derive the position projection of the landmarks from the tree models provided by the algorithm.
Iv-a Polygons from Landmark Detections
In general, a polygon of will have consistent metric properties such as perimeter and area across observations as long as other polygons that share a side with it are not perturbed with noise . However, similar triangles are likely to be found as the scale of the area covered by the robot grows. Moreover, the number of triangles to match has an upper bound of where is the number of points and are the points that lie in  which can become impracticable to match in real-time applications.
For two polygons and , , from different regions of the map to have similar metric properties, it would require that triangles that compose and also have similar metric properties and are arranged in space similarly. More formally, for all . For this reason, creates polygons that are less likely to repeat than and decreases the probability of false-positive correspondences.
As stated in section III, the set polygons of the tessellation is given by . We propose a new algorithm summarized in 1, where we loop through the elements of to compute while also updating as triangles are combined to efficiently compute both and .
For a given pair of cycles and that represent polygons from a tessellation, we compute the symmetric difference by moving the elements of its lists of edges and . The shared edge is moved to the first position in and the last position in . Then, we concatenate to the end of , excluding from both. This procedure will create a new cycle and maintain the order of the edges. In some cases, elements of will have a hanging edge, as depicted by the yellow polygon in 3. These elements can be detected and filtered out in post processing to respect the definition of a simple cycle. Moreover, polygons that have a side at the boundary of the tessellation are discarded since the probability that these shapes will be altered in a new observation with slightly different robot position is large as new landmarks can enter the field of view of the sensor.
Iv-B Robust Polygon Descriptor
We can view the place recognition problem as an instance of the sub-graph matching problem using , which is a NP-Complete . Instead, we use to derive descriptors for different regions of the observation.
We borrow techniques from the shape retrieval literature  and, for each polygon and triangle , we compute a centroid distance signature. The set of points that compose is where is the first element of the tuple of end points that define as explained in section III. The centroid of is computed by
Since the size of can vary for different polygons, we sample a constant number points relative to the perimeter size. The step size between sampled points is given by , where and is the length of the perimeter, creating a new set of points with the same number of elements regardless of the size of .
The new centroid signature with sampled points is given by
A large size will have the effect of smoothing out the polygon, while a small number will be more likely to capture details such as sharp corners. The optimal value balances these two properties to maximize precision while also being robust to noise.
Since the centroid distance signature uses only relative distances, it is translation invariant. However, the order of the elements of the descriptor can be different depending on what part of the polygon sampling starts. One approach to address this is to apply a permutation function to the descriptor and use the configuration with smallest distance such as GLARE, which can be inefficient as we may have to match many polygons per observation.
is a new signature in the frequency domain, and has the property that its magnitudewill be the same regardless of the order of the input, making it invariant to starting point of the sampling step.
We store for all polygons in , including triangles. A pair of polygons and is considered a match if
To increase robustness and speed, we only compare polygons if . That is, if the difference in number of points between polygons is smaller than or equal to . In the worst case, the initial comparison between elements of of a pair of observations will be where and .
In loop-closure or other scenarios where the number of comparisons scales over time with the number of stored observations, we may have to sample elements of with a threshold to have an upper bound on the time required for matching a pair of observations. With this process, the algorithm will still grow linearly with respect to the number of stored observations. However, like in global methods, the number of landmarks will no longer affect the performance since the number of polygons to match per observation will be constant.
As stated previously, a polygon defines a subset of triangles that were used to compose it and can be retrieved with . Given a target polygon , correspondences between elements of and are accepted if
where is the number of elements of with correspondences in according to Eq. IV-C, and is a user defined threshold.
Given a pair of corresponding triangles and , we match edges and by
where represents the length of each element of and is a permutation matrix reordering the elements of . Intuitively, the permutation matrix that generates the smallest difference between the lengths of the elements is the best assignment between them. It is trivial to extend this information to point correspondences by matching points that share corresponding edges.
Iv-D Affine Transformation Computation
Given a set of point correspondences, there are many approaches for solving the observation alignment problem. For the case, the assignments found by our algorithm can be propagated to the entire object and an optimization based approach can be used to align the instances, e.g. as an alternative to the data association methods presented in  Section III-B.
We limit our experiments to where is a degrees of freedom affine transformation estimated with RANSAC. For each iteration, we randomly sample two correspondences, and solve equation 1 analytically for . If the Euclidean distance between a pair of corresponding points after the transformation is below a threshold
, we consider the correspondence an inlier. If the ratio of inliers to outliers is above a thresholdor the maximum number of iterations is reached, we stop the algorithm and return the best estimate with respect to number of inliers.
In this section we present our experimental setup for the simulation and real-world experiments, results regarding the influence of key parameters to the performance of our method, comparison against other algorithms on place recognition and map-merging tasks and a discussion.
We run our experiments in two different environments. The first is a simulated forest, where we use ground truth pose measurements to evaluate precision and recall of our algorithm under different levels of noise and the effect on performance of different parameter configurations. In the second experiment, we evaluate the quality of the associations detected by our algorithm with a map merging task on real-world dataset collected with an UAV carrying an Ouster OS1-64lidar.
For benchmarking, we compare our method with GLARE descriptors with GLAROT distance using and as the angle and distance discretization resolution. Following , we compute a GLARE signature for each landmark in the observation and a global signature that is the accumulation of the point signatures. If a potential match is found with the global signature, the point descriptors are used to detect point-to-point associations. We also compare with Li et at. . Developed in parallel with this work, they propose a Delaunay triangulation only matching where descriptors for each triangle are computed by a concatenation of the areas and perimeters of itself and other triangles that share a side with it.
For our method, is set at in all experiments. For all methods, we use RANSAC to compute the affine transformation according to IV-D with , and . Since the original papers do not explicitly recommend values for distance thresholds, we use the best configuration found empirically for GLARE/GLAROT and for Li et al.  which are and respectively.
V-B Place Recognition in Simulation
We simulate a or approximately acres forest. To ensure a consistent density of trees across the map, the set of 2D landmarks is generated by Poisson-Disc sampling through Bridson’s algorithm  with minimum distance between points of . This algorithm will create a regular pattern of trees across the environment, which is not a realistic representation of how trees are distributed in a real forest. To account for this, each point in the set is perturbed with Gaussian noise with mean and standard deviation. After these steps, each radius observation will have approximately trees per lidar reading and the average distance from a tree to its nearest neighbor is , while in the accumulated map from our real-world dataset this distance is .
The simulated robot does a circular path
times. Each observation is rotated by a random angle sampled from a uniform distribution in the range, and every landmark in the observations subject to position noise and detection noise . Excluding the trivial match where , we consider a match a true positive if
and the rotation difference is smaller than which are similar constrains to related work . We consider a false negative when not enough matches are found but the distance between the ground truth poses is smaller than the lidar radius.
For all experiments we run all possible combinations of the detection success probability in the range and the standard deviation of the position estimation error in the range totalling 20 different experiments.
We compute Precision-Recall (PR) curves with respect to the minimum number of point to point associations in the range required to run RANSAC. For our method, we first store all polygons in and evaluate different configurations of the polygon descriptor distance . The values of the PR curve are the average precision and recall under different combinations of values of and . We consider the best configuration with respect to the F1-score, given by
By running the simulation experiment with different configurations of , we find that with , our method achieves the best balance between precision and recall as shown in Fig. 3(a).
With this result, we evaluate the effect of the parameter that controls the amount of polygons to be sampled from . We select polygons by prioritizing elements with , since polygons with large are usually more sensitive to noise and small polygons such as triangles encode less information about the space. If not enough polygons in this range are found, we continue randomly selecting from large polygons and if still necessary, from triangles.
We show the PR curves with different values of in Fig. 3(b) and observe that sampling from elements from has a direct impact on performance and all polygons should be used if processing power is not a constraint.
Finally, we combine the best results of previous experiments to evaluate the effect of different levels of noise to the precision and recall of each method. In table I we show the performance of all methods for each combination of noise using the same range for and as previous experiments.
Increasing the position noise or decreasing the number of detections have an impact on the performance of all methods. Li et al. is able to handle more detection failures than both our method and GLARE as triangles capture a smaller portion of the observation making it more likely to have consistent polygons even with a large percentage of unobserved landmarks. However, with of position noise, the performance of Li et al. significantly drops. The main factor for this is that the descriptor relies on the area of the triangle, which has high sensitivity to noise. While for our experiments we used the configuration that achieved the best F1-score, we found that for GLAROT that implied in either high precision and low recall or low precision and high recall. In this experiment, we observe a high number of false positives, making GLAROT have poor performance even on the scenario with no noise. For our method, as the polygons in capture a larger area of the observation, these elements are more likely to be altered as landmarks are not detected. For this reason, while we show significantly more robustness than the other methods, we observe that our method is more sensitive to detection failures.
|Position NoiseDetection Success Prob.|
|Ours||GLAROT||Li et al.||Ours||GLAROT||Li et al.||Ours||GLAROT||Li et al.||Ours||GLAROT||Li et al.|
V-C Map Merging in a Forest
In this experiment, the robot does two different flights across the same plot of a commercial pine tree forest. For each flight, we split the readings into sub-maps for every minute of data. The sub-maps have partial overlap, position noise, and cases of detection failure, making this tasks challenging for methods that are not robust to these factors.
For GLARE, since we know that subsequent sub-maps have overlap, we skip the global signature check for glare and directly match landmarks. With every method, we iteratively recompute its corresponding descriptor for the new accumulated map and merge with the next sub-map until all are combined into a single map.
We order the sub-maps by time of capture, and run the merging pipeline on pairs of subsequent observations. The associations could be used in a more complex data association pipeline such as CLEAR  that checks for cycle consistency across different observations to refine the matches, and consequently resulting in better map quality. In this experiment we use a simpler approach with DBSCAN  to cluster trees closer than after the alignment into a single landmark. This is preferable instead of simply removing duplicates based on the correspondences since some landmark matches may not have been detected.
Although the odometry provided by SLOAM  may contain drift, we use it as a reference to evaluate the alignments computed by associations provided by each method. As shown in Fig. 5, the output of our method is significantly closer to the ground truth than GLARE and Li et al. Moreover, the latter is unable to find correspondences for all sub-maps, resulting in incomplete maps. We also calculated the mean squared error (MSE) of the poses in each flight after the alignment versus the ground truth. We do not consider the poses of the first sub-map since it is used as the target frame and consequently is not affected by the transformations. For flight one, our method achieves MSE of meters and flight two meters. Since both benchmark methods fail to find good correspondences, their errors are above for both flights.
Finally, the accumulated trajectories of flights one and two are merged using the same process. Since GPS is not available we do not have ground truth measurements for this experiment. However, we know that the robot lands roughly on the same place on both flights. As the other methods already failed to merge the sub-maps, the alignment of both flights also fails. While for our method, the Euclidean distance between the last position of the robot in flight one and flight two is of meters. While this is an imperfect metric to analyze the overall quality of the alignment, it indicates that the result is close to the real alignment. The result of this experiment is depicted in Fig. 6.
For deployment on a robot, an algorithm has to be accurate and have a reasonable execution time. For all three methods, we report in table II the worst case big O and the average speed required to estimate landmark correspondences between two observations, times are reported using an Intel® i7-7500U CPU @ 2.70GHz.
Even though our method has a quadratic big O for matching in the worst case, it scales with the number of polygons of , which is smaller than the number of of landmarks in a given observation. Moreover, as discussed previously, we can sample polygons to have an upper bound on the time for matching if necessary.
For Li et at. , the complexity for computing the descriptors is bounded by the computation of the Delaunay triangulation. However, as stated in section IV-A, the number of triangles can be larger than the number of points.
It is worth noting that our method and Li et at are implemented in pure Python. For GLAROT, we ported the original author’s C++ code to Python. However, during our experiments we run the signature computation in pure Python and matching in C++ with Python bindings due to the high execution time of the Python version.
|Big O||Time (ms)||Big O||Time (ms)|
|Li et al. (Py)|
Vi Discussion and Conclusions
Place recognition and data association is a challenging problem, especially in a forest where trees which look identical are the only available landmarks or scenarios where classic features are not reliable due to weather or view-point changes.
We presented a method that defines sets of polygons based on tessellations computed on the position of landmarks. We proposed a framework to identify previously seen locations from these polygons, and landmark associations to compute an affine transformation to align observations.
In our simulation experiments, we show that the proposed method is more reliable and robust compared to the benchmark methods. The construction of polygons narrows the search space of possible point correspondences. Combined with consolidated shape retrieval techniques for matching, it yields a more robust framework than relying purely on geometric properties such as polygon area and perimeter.
In the real-world experiments, we show the advantages of having reliable features that describe only parts of the observation while merging sub-maps with partial overlap and noise, being the only method that generated consistent sub-maps with respect to the continuity of the trajectory and that is able to merge the accumulated maps.
One of the drawbacks of using local structures without a global signature such as GLARE, is that the number of elements to match grows significantly faster as we accumulate observations for tasks such as loop closure. Our method reduces the search space when compared with local method yet still grows in computational cost with the number of landmarks. We presented a solution based on sampling polygons but show that it decreases the performance of the method.
Directions for future work include better polygon sampling techniques, an extension of the descriptor computation and matching to 3D shapes, and ways to incorporate other properties, such as dimensions of the objects to improve accuracy.
-  (2007) Fast poisson disk sampling in arbitrary dimensions.. SIGGRAPH sketches 10, pp. 1278780–1278807. Cited by: §V-B.
-  (2020) SLOAM: semantic lidar odometry and mapping for forest inventory. IEEE Robotics and Automation Letters 5 (2), pp. 612–619. Cited by: §I, §IV-D, §IV, §V-C.
The complexity of theorem-proving procedures.
Proceedings of the third annual ACM symposium on Theory of computing, pp. 151–158. Cited by: §IV-B.
-  (1934) Sur la sphere vide. Izv. Akad. Nauk SSSR, Otdelenie Matematicheskii i Estestvennyka Nauk 7 (793-800), pp. 1–2. Cited by: §III, §IV-A.
-  (1992) Error correction in mobile robot map learning. In Proceedings 1992 IEEE International Conference on Robotics and Automation, pp. 2555–2556. Cited by: §II.
-  (1996) A density-based algorithm for discovering clusters in large spatial databases with noise.. In Kdd, Vol. 96, pp. 226–231. Cited by: §V-C.
-  (2019) CLEAR: a consistent lifting, embedding, and alignment rectification algorithm for multi-view data association. arXiv preprint. Cited by: §V-C.
-  (2012) Bags of binary words for fast place recognition in image sequences. IEEE Transactions on Robotics 28 (5), pp. 1188–1197. Cited by: §I.
-  (2018) LoST? appearance-invariant place recognition for opposite viewpoints using visual semantics. Proceedings of Robotics: Science and Systems XIV. Cited by: §I.
-  (2016) Structure-based vision-laser matching. In 2016 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 182–188. Cited by: §I.
-  (2018) X-view: graph-based semantic multi-view localization. IEEE Robotics and Automation Letters 3 (3), pp. 1687–1694. Cited by: §I, §V-B.
-  (2014-09) Large scale place recognition in 2D LIDAR scans using Geometrical Landmark Relations. In 2014 IEEE/RSJ International Conference on Intelligent Robots and Systems, Chicago, IL, USA, pp. 5030–5035. External Links: Cited by: §I, §V-A.
-  (2016) Efficient loop closure based on falko lidar features for online robot localization and mapping. In 2016 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 1206–1213. Cited by: §I.
-  (2013) Appearance-based loop closure detection for online large-scale and long-term operation. IEEE Transactions on Robotics 29 (3), pp. 734–745. Cited by: §I.
-  (2014) Online global loop closure detection for large-scale multi-session graph-based slam. In 2014 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 2661–2666. Cited by: §I.
-  (2020) Localization in unstructured environments: towards autonomous robots in forests with delaunay triangulation. Remote Sensing 12 (11), pp. 1870. Cited by: §IV-A, 4(d), §V-A, §V-A, §V-D.
-  (2015) Visual place recognition: a survey. IEEE Transactions on Robotics 32 (1), pp. 1–19. Cited by: §I.
-  (2012) An unmanned aircraft system for automatic forest fire monitoring and measurement. Journal of Intelligent & Robotic Systems 65 (1-4), pp. 533–548. Cited by: §I.
-  (2018) Mapping forests using an unmanned ground vehicle with 3d lidar and graph-slam. Computers and Electronics in Agriculture 145, pp. 217–225. Cited by: §II.
-  (2019) Geometric relation distribution for place recognition. IEEE Robotics and Automation Letters 4 (2), pp. 523–529. Cited by: §I.
-  (2017) Place recognition of 3d landmarks based on geometric relations. In 2017 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 648–654. Cited by: §I.
-  (2009) Fast point feature histograms (fpfh) for 3d registration. In 2009 IEEE international conference on robotics and automation, pp. 3212–3217. Cited by: §I.
-  (2010) Semantic 3d object maps for everyday manipulation in human living environments. KI-Künstliche Intelligenz 24 (4), pp. 345–348. Cited by: §I.
Data association aware semantic mapping and localization via a viewpoint-dependent classifier model. In 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 8 (en). Cited by: §I.
-  (2018) Search and rescue under the forest canopy using multiple uas. In International Symposium on Experimental Robotics, pp. 140–152. Cited by: §I, §I, §II.
-  (1980) Algorithms for computation of relative neighbourhood graph. Electronics Letters 16 (14), pp. 556–557. Cited by: §III.
Nature conservation drones for automatic localization and counting of animals.
European Conference on Computer Vision, pp. 255–270. Cited by: §I.
-  (2011) Viewpoint invariant 3d landmark model inference from monocular 2d images using higher-order priors. In 2011 International Conference on Computer Vision, pp. 319–326. Cited by: §II.
-  (2001) A comparative study on shape retrieval using fourier descriptors with different shape signatures. In Proc. of international conference on intelligent multimedia and distance education (ICIMADE01), pp. 1–9. Cited by: §IV-B, §IV-B.