Over the past years, many techniques have been introduced that can perform simultaneous localization and mapping (SLAM) using both regular cameras and depth-sensing technologies based on structured light, ToF, or LiDAR. Although these sensing technologies produce increasingly accurate depth measurements, they are still far from perfect, so existing methods continue to suffer from drift. Drift errors can be corrected by incorporating sensing information taken from previously visited places, i.e., loop-closure detection, which requires algorithms that can recognize revisited areas. Unfortunately, existing loop-detection solutions for 3D LiDAR data are not both robust and fast enough to meet the demands of real-world SLAM applications.
In this paper, a global descriptor for a LiDAR point cloud, called LiDAR Iris, is proposed for fast and accurate loop-closure detection. The name of our LiDAR descriptor originates from the field of person identification based on the human iris signature. As shown in Fig. 1, the commonly adopted Daugman's Rubber Sheet Model is used to remap each point within the iris region to a pair of polar coordinates (r, θ), where r lies on the unit interval [0, 1] and θ is the angle over [0, 2π]. We observe similar characteristics between the bird's eye view of a LiDAR point cloud and an iris image of a human: both can be represented in a polar-coordinate frame and transformed into a signature image. Figure 2 shows the bird's eye views of two LiDAR point clouds and the two extracted LiDAR-Iris images, respectively, based on Daugman's Rubber Sheet Model.
With the extracted LiDAR-Iris image representation, a binary signature image can be obtained for each point cloud through a series of LoG-Gabor filtering and thresholding operations on the LiDAR-Iris image. Given two point clouds, their similarity can then be calculated as the Hamming distance between the two corresponding binary signature images. Our LiDAR-Iris method can deal with the pose-variation problem in LiDAR-based loop-closure detection.
II Related Work
Compared with visual loop-closure detection, LiDAR-based loop-closure detection has received increasing attention due to its robustness to illumination changes and its high localization accuracy.
In general, loop detection using 3D data can roughly be categorized into four main classes. The first category is point-to-point matching, which operates directly on the point clouds. Popular methods in this category include the iterative closest point (ICP) algorithm and its variants [4, 1], applicable when the two point clouds are already roughly aligned.
To improve matching capability and robustness, the second category applies a corner (keypoint) detector to the 3D point cloud, extracts a local descriptor at each keypoint location, and conducts scene matching based on a bag-of-words (BoW) model. Many keypoint detection methods have been proposed in the literature, such as 3D SIFT, 3D Harris, 3D-SURF, and intrinsic shape signatures (ISS), as well as descriptors such as SHOT and B-SHOT.
However, detecting distinctive keypoints with high repeatability remains a challenge in 3D point cloud analysis. One way to deal with this problem is to extract global descriptors (represented in the form of histograms) from the point cloud, e.g., the point feature histogram, the ensemble of shape functions (ESF), the fast point feature histogram (FPFH), and the viewpoint feature histogram (VFH). Recently, a novel global 3D descriptor for loop detection, named multiview 2D projection (M2DP), was proposed. This descriptor suffers from a lack of rotation invariance: the PCA operation applied to align the point cloud is not a robust way to achieve invariance to rotation.
In short, global descriptor matching tends to suffer from a lack of descriptive power, while local descriptor matching struggles with invariance. More recently, convolutional neural network (CNN) models have been exploited to learn both feature descriptors [6, 20, 22, 7] and metrics for matching point clouds [2, 11, 21]. However, a severe limitation of these deep-learning-based methods is that they need a tremendous amount of training data. Moreover, they do not generalize well when trained and applied on data with varying topographies or acquired under different conditions.
The work most similar to our method is Scan-Context (SC) for loop-closure detection, which also exploits the expanded bird's-eye view of the LiDAR point cloud. Our method differs in three aspects. First, we encode the height information of the surroundings as the pixel intensity of the LiDAR-Iris image. Second, we extract a discriminative binary feature map from the LiDAR-Iris image for loop-closure detection. Third, our loop-closure detection step is rotation-invariant with respect to the LiDAR's pose. In contrast, the Scan-Context method encodes only the maximum-height information in its expanded images and has no feature extraction step. In addition, Scan-Context is not rotation-invariant and instead adopts a brute-force matching scheme.
III LiDAR Iris
This section describes the LiDAR-Iris representation in three parts: generation of the LiDAR-Iris image (discretization and encoding of the bird's-eye view), the Fourier transform for translation invariance, and binary feature extraction with LoG-Gabor filters.
III-A Generation of LiDAR-Iris Image Representation
Given a point cloud, we first project it to its bird's eye view, as shown in Fig. 3. We keep a square of area k × k m² as the valid sensing zone, with the LiDAR's position at the center of the square; typically, we set k to 80 m in all of our experiments. The sensing square is discretized into 80 (radial direction) × 360 (angular direction) bins, with the angular resolution being 1° and the radial resolution being 1 m (shown in Fig. 3).
To fully represent the point cloud, one could extract features such as height, range, reflectance, or ring index from the points within each bin. For simplicity, we encode all the points falling within the same bin with an eight-bit binary code. Given an n-channel LiDAR sensor L, its horizontal field-of-view (FOV) is 360° and its vertical FOV is V degrees. As shown in Fig. 3, if the largest and smallest pitch angles over all scan channels are θ_up and θ_down, respectively, then the highest and lowest points that a scan line can reach at horizontal range d are, in theory, about d·tan(θ_up) above ground and d·tan(θ_down) below ground. In practice, the height of the lowest scanning point is set to the negative of the vehicle's height, i.e., ground-plane level. We use h_high and h_low to represent these two quantities, respectively.
In this paper, we use the Velodyne HDL-64E (KITTI dataset) and the VLP-16 (our own dataset) as the LiDAR sensors to validate our work. Typically, V is 26.9° for the HDL-64E, and h_low and h_high are -3 m and 5 m, respectively, if k is set to 80 m and the height of the LiDAR is about 3 m. Likewise, V is 30° for the VLP-16, and h_low and h_high are -2 m and 22 m, respectively, if k is set to 80 m and the height of the LiDAR is about 2 m.
With these notations, we encode the points falling within each bin B_ij as follows. First, we linearly discretize the range between h_low and h_high into 8 sub-bins, denoted b_1, ..., b_8. Each point within B_ij is assigned to one of these sub-bins based on its z-coordinate (height). The k-th bit is set to zero if b_k is empty, and to one otherwise. Thus, we obtain an 8-bit binary code for each B_ij, as shown in Fig. 3. The binary code within each bin is then turned into a decimal number between 0 and 255.
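As a rough illustration, the discretization and height encoding described above can be sketched as follows. This is a minimal sketch, not the authors' implementation; the function name, the NumPy usage, and the default height bounds are our own assumptions.

```python
import numpy as np

def lidar_iris_image(points, radius=80.0, h_low=-3.0, h_high=5.0,
                     n_rad=80, n_ang=360):
    """Sketch: encode an (N, 3) point cloud (x, y, z in metres) into an
    80 x 360 image whose pixels are 8-bit height-occupancy codes."""
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    r = np.hypot(x, y)
    theta = np.mod(np.arctan2(y, x), 2 * np.pi)

    keep = r < radius                                    # valid sensing zone
    rad_idx = np.minimum((r[keep] / radius * n_rad).astype(int), n_rad - 1)
    ang_idx = np.minimum((theta[keep] / (2 * np.pi) * n_ang).astype(int),
                         n_ang - 1)
    # assign each point to one of 8 linear height bins in [h_low, h_high)
    h = np.clip(z[keep], h_low, h_high - 1e-9)
    bit = ((h - h_low) / (h_high - h_low) * 8).astype(int)

    img = np.zeros((n_rad, n_ang), dtype=np.uint8)
    # set bit k of a bin's code when any point falls into height bin k
    np.bitwise_or.at(img, (rad_idx, ang_idx), (1 << bit).astype(np.uint8))
    return img
```

A single point at (1, 0, 0) with the defaults falls into radial bin 1, angular bin 0, and height bin 3, giving the code 2³ = 8 at that pixel.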
Inspired by work on iris recognition, we can expand the LiDAR's bird's-eye view into an image strip, a.k.a. the LiDAR-Iris image, based on Daugman's Rubber Sheet Model. Each pixel's intensity in the LiDAR-Iris image is the decimal number calculated for the corresponding bin, so the size of the LiDAR-Iris image equals that of the 80 × 360 bin grid. The convenience of generating the LiDAR-Iris image representation thus follows directly from the above discretization and encoding procedures. Note that the LiDAR-Iris images obtained at the same geometrical location are identical up to a translation if we assume a 3D (x, y, and yaw) pose space. More generally, our method can be applied to loop-closure detection in a 6D pose space if the LiDAR is combined with an IMU sensor: with LiDAR-IMU calibration, we can re-align the point cloud so that the z-axis of the LiDAR coincides with the gravity direction.
In Fig. 2, the bottom two images are the LiDAR-Iris representations of the two corresponding LiDAR point clouds. The two point clouds were collected when the robot passed the same geometrical location twice, and they are approximately related by a rotation. Correspondingly, the two LiDAR-Iris images are mainly related by a cyclic translation.
III-B Fourier transform for a translation-invariant LiDAR Iris
The translation variation shown above can cause significant degradation when matching LiDAR-Iris images for LiDAR-based loop-closure detection. To deal with this problem, we adopt the Fourier transform to estimate the translation between two LiDAR-Iris images. Fourier-based schemes can estimate large rotations, scalings, and translations; note that the rotation and scaling factors are irrelevant in our case. Suppose that two LiDAR-Iris images I_1 and I_2 differ only by a shift (δ_x, δ_y) such that I_2(x, y) = I_1(x − δ_x, y − δ_y). The Fourier transforms of I_1 and I_2 are then related by

Î_2(w_x, w_y) = e^{−i(w_x δ_x + w_y δ_y)} · Î_1(w_x, w_y).
Correspondingly, the normalized cross power spectrum is given by

Ĉ(w_x, w_y) = (Î_2(w_x, w_y) Î_1*(w_x, w_y)) / |Î_2(w_x, w_y) Î_1*(w_x, w_y)| = e^{−i(w_x δ_x + w_y δ_y)},

where * indicates the complex conjugate. Taking the inverse Fourier transform yields C(x, y) = δ(x − δ_x, y − δ_y), meaning that C is nonzero only at (x, y) = (δ_x, δ_y). Figure 5 gives an example of alignment based on the Fourier transform, where the third row is a transformed version of the second LiDAR-Iris image.
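The normalized cross power spectrum above is standard phase correlation, which can be sketched in a few lines (a minimal sketch with assumed names, not the paper's code):

```python
import numpy as np

def estimate_shift(img1, img2):
    """Estimate the circular (row, col) shift between two same-sized images
    via the normalized cross power spectrum (phase correlation)."""
    f1 = np.fft.fft2(img1)
    f2 = np.fft.fft2(img2)
    cross = f2 * np.conj(f1)
    cross /= np.abs(cross) + 1e-12       # keep only the phase term
    corr = np.real(np.fft.ifft2(cross))  # a delta peaked at the shift
    return np.unravel_index(np.argmax(corr), corr.shape)
```

Rolling an image and recovering the shift illustrates the idea; for LiDAR-Iris images, the horizontal (column) component of the shift corresponds to a yaw rotation of the sensor.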
III-C Binary feature extraction with LoG-Gabor filters
To enhance the representational ability, we exploit LoG-Gabor filters to extract more features from LiDAR-Iris images. LoG-Gabor filters can decompose the data in the LiDAR-Iris region into components that appear at different resolutions, and they have the advantage over the traditional Fourier transform that the frequency data is localized, allowing features that occur at the same position and resolution to be matched up. We use only 1D LoG-Gabor filters to ensure the real-time capability of our method. A one-dimensional LoG-Gabor filter has the frequency response

G(f) = exp(−(log(f/f_0))² / (2 (log(σ/f_0))²)),
where f_0 and σ are the parameters of the filter: f_0 gives the center frequency, and σ affects the bandwidth. It is useful to maintain the same filter shape while the frequency parameter is varied; to do this, the ratio σ/f_0 should remain constant.
Eight 1D LoG-Gabor filters are used to convolve each row of the LiDAR-Iris image, where the wavelength of the filter is increased by the same factor from one filter to the next, resulting in a real and an imaginary part for each filter response. As shown in Fig. 4, the first image shows the eight LoG-Gabor filters, and the second image shows the real and imaginary parts of the convolution responses with the first four filters.
Empirically, we tried different numbers of LoG-Gabor filters for feature extraction and found that four LoG-Gabor filters achieve the best loop-closure detection accuracy at a low computational cost. Figure 6 shows the accuracy achieved on a validation dataset with different numbers of LoG-Gabor filters, where we obtain the best results with the first four filters; therefore, we use only the first four filters in our experiments. The convolution responses of the four filters are binarized by a simple thresholding operation, and we stack them into a large binary feature map for each LiDAR-Iris image. For example, the third image of Figure 4 shows one binary feature map for a LiDAR-Iris image.
IV Loop-Closure Detection with LiDAR Iris
In a full SLAM system, loop-closure detection is an important step that triggers the back-end optimization procedure to correct the already-estimated poses and maps. To apply LiDAR Iris to loop detection, we obtain a binary feature map from the LiDAR-Iris representation of each frame. We thereby build a history database of LiDAR-Iris binary features for all keyframes saved while the robot traverses a scene. The distance between the LiDAR-Iris binary feature maps of the current keyframe and each history keyframe is calculated as the Hamming distance. If the obtained Hamming distance is smaller than a threshold, it is regarded as a loop-closure event.
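The database lookup described above can be sketched as follows. The threshold value and the size of the excluded recent window are illustrative assumptions, not the paper's settings:

```python
import numpy as np

def hamming_distance(a, b):
    """Normalized Hamming distance between two binary feature maps."""
    return np.count_nonzero(a != b) / a.size

def detect_loop(curr_map, history, threshold=0.35, exclude_recent=30):
    """Match the current keyframe's binary feature map against the history
    database, skipping the most recent keyframes, and report the closest
    match (index, distance) if its distance is below the threshold."""
    candidates = history[:-exclude_recent] if len(history) > exclude_recent else []
    if not candidates:
        return None
    dists = [hamming_distance(curr_map, h) for h in candidates]
    best = int(np.argmin(dists))
    return (best, dists[best]) if dists[best] <= threshold else None
```

Two unrelated binary maps have an expected normalized Hamming distance near 0.5, so a revisit stands out as a pronounced dip below the threshold.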
V Experiments

In this section, we compare the LiDAR-Iris method with a few other popular loop-closure detection algorithms. Since LiDAR Iris is a global descriptor, the performance of our method is compared to three other methods that extract global descriptors from a 3D point cloud, namely Scan-Context, M2DP, and ESF. The code for all three compared methods is available: the ESF method is implemented in the Point Cloud Library (PCL), and the Matlab code for Scan-Context and M2DP can be downloaded from the authors' websites. All experiments are carried out on the same PC with an Intel i7-8550U CPU at 1.8 GHz and 8 GB of memory.
V-A Datasets

All experimental results are based on a performance comparison over three KITTI odometry sequences and two sequences collected on our campus. These datasets were chosen for diversity, covering different types of 3D LiDAR sensors (64 channels for the Velodyne HDL-64E and 16 channels for the Velodyne VLP-16) and different types of loops (e.g., loop events occurring at places where the robot moves in the same or in the opposite direction).
KITTI dataset: Among the 11 sequences with ground-truth poses (00 to 10), we selected the three sequences 00, 05, and 08, which contain the largest numbers of loop-closure events. In sequence 08, loop events occur only at locations where the robot/vehicle moves in opposite directions, whereas the other sequences contain only same-direction loop events. The scans of the KITTI dataset were obtained with the Velodyne HDL-64E. Since the KITTI dataset provides indexed scans, we can obtain loop-closure data easily.
Our own dataset: We collected our own data on campus using a Velodyne VLP-16 mounted on our mobile robot, shown in Fig. 8(c). We selected two scenarios of different sizes from our VLP-16 dataset for validation. The smaller scene, Fig. 8(b), contains only same-direction loop events, while the larger scene, Fig. 8(a), contains loop events in both the same and opposite directions. To obtain ground-truth poses and locations for our data, we used a high-fidelity IMU/GPS to record the pose and location of each LiDAR frame. We use only the keyframes and their ground-truth locations in our experiments. Note that the distance between two keyframe locations is set to 1 m.
V-B Experimental Settings
In order to demonstrate the performance of our method thoroughly, we adopt two different protocols when evaluating the recall and precision of each compared method.
The first protocol is a realistic one for loop-closure detection. To determine whether the current location has been traversed before, we match the current keyframe against all previous keyframes in the map database except the very close ones, e.g., the 30 keyframes immediately preceding the current one. By setting a threshold on the feature distance between the current keyframe and its closest match in the database, we can predict whether the current keyframe corresponds to an already-traversed place: if the feature distance is no larger than the threshold, it is predicted as a loop closure; otherwise, it is not. To obtain the true-positive and recall rates, the prediction is further verified against the ground truth. For example, if a keyframe is predicted as a loop-closure event, it is regarded as a true positive only if the ground-truth distance between it and its closest match in the database is less than 4 m. Note that 4 m is used as the default distance threshold.
The second protocol treats loop-closure detection as a place re-identification (re-ID) problem. First, we create positive and negative pairs according to the Euclidean distance between the two keyframes' locations: given two keyframes of the same sequence, if the distance between their ground-truth locations is no larger than 4 m, the pair is a positive match pair, and otherwise a negative pair. By calculating the pairwise feature distances between all keyframes, we obtain an affinity matrix for each sequence, shown in Fig. 7. Likewise, by setting a threshold on the affinity matrix, we can obtain the true-positive and recall rates.
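The precision/recall computation under this re-ID protocol can be sketched as follows (a minimal sketch; the function name and the use of a symmetric distance matrix are our own assumptions):

```python
import numpy as np

def precision_recall(feat_dists, gt_dists, feat_thresh, gt_thresh=4.0):
    """For every unordered keyframe pair, label it positive when the
    ground-truth locations are within gt_thresh metres, predict positive
    when the feature (affinity) distance is within feat_thresh, and
    compute precision and recall."""
    iu = np.triu_indices_from(feat_dists, k=1)   # each pair once, no self-pairs
    gt_pos = gt_dists[iu] <= gt_thresh
    pred_pos = feat_dists[iu] <= feat_thresh
    tp = np.count_nonzero(gt_pos & pred_pos)
    precision = tp / max(np.count_nonzero(pred_pos), 1)
    recall = tp / max(np.count_nonzero(gt_pos), 1)
    return precision, recall
```

Sweeping feat_thresh over the range of observed feature distances traces out one PR curve per method.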
V-C Performance Comparison
In all experiments, we set the parameters of Scan-Context to the values used in their paper, and we use the default parameters of the available code for M2DP and ESF. All methods use the raw point clouds without downsampling.
As shown in the top row of Fig. 9, for KITTI sequence 05, three segments of loop-closure events are labeled in different colors in the trajectory and in the corresponding affinity matrix, respectively. The bottom row similarly highlights the loop-closure parts in the shorter sequence of our own data. Figure 7 shows the affinity matrices obtained by the four compared methods on two exemplar sequences (KITTI 05 and the shorter sequence collected by us); from left to right are the ground truth, LiDAR Iris, Scan-Context, M2DP, and ESF, respectively. It is clear that the loop-closure areas are detected effectively by the LiDAR-Iris and Scan-Context methods. Intuitively, the global descriptor generated by our approach more easily distinguishes positive match pairs from negative ones. Although the affinity matrix obtained by Scan-Context also reveals the loop-closure areas, many negative pairs exhibit low matching distances as well, which can more easily cause false positives. In contrast, M2DP and ESF perform much worse than our method and Scan-Context.
The performance of our method can also be validated from the precision-recall (PR) curves in Fig. 10, where the top and bottom rows represent the results under the first and second protocols, respectively.
From left to right are the results on KITTI 00, KITTI 05, KITTI 08, and our smaller- and larger-scene data, respectively. The ESF approach shows the worst performance on all sequences under both protocols: this algorithm relies strongly on its histogram and distinguishes places only when the structure of the visible region is substantially different.
Under the first protocol, M2DP reports high precision on the sequences that contain only same-direction loop-closure events, which shows that M2DP can detect such loops correctly. However, when a sequence contains only opposite-direction loop-closure events (such as KITTI 08) or both same- and opposite-direction loop-closure events (our larger-scene dataset), it fails to detect loop closures. The Scan-Context method achieves promising results. In contrast, our approach demonstrates very competitive performance on all five sequences and achieves the best performance among the four compared methods. This superior performance originates from two merits of our method: first, the binary feature map of the LiDAR-Iris representation is highly discriminative; second, the translation invariance achieved by the Fourier transform can handle opposite-direction loop closures. Although the descriptor generated by Scan-Context can also achieve translation invariance, it relies on a rather brute-force view-alignment-based matching.
Under the second protocol, the number of negative pairs is much larger than the number of positive pairs, and the total number of matching pairs to be predicted is much larger than under the first protocol. For example, KITTI sequence 00 contains 4541 scans, of which 790 are true loops under the first protocol; however, if we set the distance threshold to 4 m, we generate 68,420 positive pairs and 20,547,720 negative pairs under the second protocol. Likewise, our method also achieves the best performance in this setting.
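The quoted pair counts are internally consistent: they sum to exactly 4541 × 4540, which suggests every ordered pair of distinct keyframes is counted.

```python
# 4541 keyframes -> 4541 * 4540 ordered pairs excluding self-matches,
# which should equal the 68,420 positive plus 20,547,720 negative pairs.
n_frames = 4541
n_pairs = n_frames * (n_frames - 1)
assert n_pairs == 68420 + 20547720  # 20,616,140
```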
V-D Computational Complexity
We compare our method only with Scan-Context in terms of computational complexity, since both methods perform much better than the other two. We evaluated the computational cost on KITTI sequence 00. The complexity of our method is measured as the time spent on extracting the binary feature map from the LiDAR-Iris image and matching two binary feature maps, excluding the time spent generating the LiDAR-Iris image; similarly, the complexity of Scan-Context is measured as the time spent matching two Scan-Context images. Both methods are implemented in Matlab. Specifically, we select one LiDAR frame from KITTI sequence 00, calculate its distance to every frame of the same sequence, and report the average time per frame. As shown in Fig. 11, the average computation time of our method is about 0.0231 s, versus 0.0257 s for Scan-Context.
VI Conclusions

In this paper, we proposed a global descriptor for LiDAR point clouds, LiDAR Iris, which summarizes a place as a binary signature image obtained through a series of LoG-Gabor filtering and thresholding operations on the LiDAR-Iris image representation. Compared to existing global point-cloud descriptors, LiDAR Iris showed higher loop-closure detection performance across various datasets under two different evaluation protocols.
-  Aleksandr Segal, D. Haehnel, and S. Thrun. Generalized-ICP. Robotics: Science and Systems, 2(4), 2009.
-  M. Angelina Uy and G. Hee Lee. PointNetVLAD: Deep point cloud based retrieval for large-scale place recognition. In IEEE Conference on Computer Vision and Pattern Recognition, pages 4470–4479, 2018.
-  Paul J. Besl and Neil D. McKay. A method for registration of 3-D shapes. Robotics-DL Tentative, pages 586–606, 1992.
-  S. Rusinkiewicz and M. Levoy. Efficient variants of the ICP algorithm. In International Conference on 3-D Digital Imaging and Modeling, 2001.
-  J. Daugman. How iris recognition works. In International Conference on Image Processing, 2002.
-  A. Dewan, T. Caselitz, and W. Burgard. Learning a local feature descriptor for 3d lidar scans. In IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2018.
-  R. Dube, A. Cramariuc, D. Dugas, J. Nieto, R. Siegwart, and C. Cadena. Segmap: 3d segment mapping using data-driven descriptors. In Robotics: Science and Systems, 2018.
-  L. He, X. Wang, and H. Zhang. M2dp: A novel 3d point cloud descriptor and its application in loop closure detection. In IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2016.
-  Giseop Kim and Ayoung Kim. Scan context: Egocentric spatial descriptor for place recognition within 3D point cloud map. In Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems, Madrid, Oct. 2018.
-  Jan Knopp, Mukta Prasad, Geert Willems, Radu Timofte, and Luc Van Gool. Hough transform and 3d surf for robust three dimensional classification. In European Conference on Computer Vision, 2010.
-  Z. Liu, S. Zhou, C. Suo, P. Yin, C. Wen, H. Wang, H. Li, and Y.-H. Liu. Lpd-net: 3d point cloud learning for large-scale place recognition. In IEEE Int. Conf. Comput. Vision, 2019.
-  S.M. Prakhya, B. Liu, and W. Lin. B-shot: A binary feature descriptor for fast and efficient keypoint matching on 3d point clouds. In IEEE/RSJ international conference on intelligent robots and systems (IROS), 2015.
-  R.B. Rusu, N. Blodow, and M. Beetz. Fast point feature histograms (FPFH) for 3D registration. In IEEE International Conference on Robotics and Automation, pages 3212–3217, 2009.
-  R.B. Rusu, N. Blodow, Z.C. Marton, and M. Beetz. Aligning point cloud views using persistent feature histograms. In IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pages 3384–3391, 2008.
-  R.B. Rusu, G.R. Bradski, R. Thibaux, and J.M. Hsu. Fast 3d recognition and pose using the viewpoint feature histogram. In IEEE/RSJ International Conference on Intelligent Robots and Systems, 2010.
-  S. Salti, F. Tombari, and L. di Stefano. Shot: Unique signatures of histograms for surface and texture description. In Computer Vision and Image Understanding, 2014.
-  P. Scovanner, S. Ali, and M. Shah. A 3-dimensional sift descriptor and its application to action recognition. In ACM Multimedia, 2007.
-  I. Sipiran and B. Bustos. A robust 3D interest points detector based on Harris operator. In Eurographics Workshop on 3D Object Retrieval, 2010.
-  W. Wohlkinger and M. Vincze. Ensemble of shape functions for 3d object classification. In 2011 IEEE International Conference on Robotics and Biomimetics, pages 2987–2992, Dec 2011.
-  H. Yin, X. Ding, L. Tang, Y. Wang, and R. Xiong. Efficient 3d lidar based loop closing using deep neural network. In IEEE International Conference on Robotics and Biomimetics (ROBIO), 2017.
-  H. Yin, Y. Wang, L. Tang, X. Ding, and R. Xiong. LocNet: Global localization in 3D point clouds for mobile robots. In IEEE Intelligent Vehicles Symposium (IV), 2018.
-  F. Yu, J. Xiao, and T. Funkhouser. Semantic alignment of LiDAR data at city scale. In IEEE Conference on Computer Vision and Pattern Recognition, pages 1722–1731, 2015.
-  Y Zhong. Intrinsic shape signatures: A shape descriptor for 3d object recognition. In ICCV Workshops, 2009.