Phase-SLAM: Phase Based Simultaneous Localization and Mapping for Mobile Structured Light Illumination Systems

by   Xi Zheng, et al.

Structured Light Illumination (SLI) systems have been used for reliable indoor dense 3D scanning via phase triangulation. However, mobile SLI systems for 360 degree 3D reconstruction demand 3D point cloud registration, which involves high computational complexity. In this paper, we propose a phase based Simultaneous Localization and Mapping (Phase-SLAM) framework for fast and accurate SLI sensor pose estimation and 3D object reconstruction. The novelty of this work is threefold: (1) developing a reprojection model from 3D points to 2D phase data towards phase registration with low computational complexity; (2) developing a local optimizer to achieve SLI sensor pose estimation (odometry) using the derived Jacobian matrix for the 6 DoF variables; (3) developing a compressive phase comparison method to achieve high-efficiency loop closure detection. The whole Phase-SLAM pipeline is then exploited using existing global pose graph optimization techniques. We build datasets from both the Unreal Engine simulation platform and a robotic arm based SLI system in the real world to verify the proposed approach. The experiment results demonstrate that the proposed Phase-SLAM outperforms other state-of-the-art methods in terms of the efficiency and accuracy of pose estimation and 3D reconstruction. The open-source code is available at




I Introduction

SLI technology, based on a camera-projector pair, has been widely used for high-precision 3D scanning in many industrial applications. There are usually two approaches for SLI systems to achieve 360 degree 3D reconstruction: controlled motion based and free motion based [19]. The former uses a servo motor to rotate the object along a pre-defined trajectory for multiple view scanning; the latter estimates sensor motions through local and global point cloud registration, such as Iterative Closest Point (ICP) and its variants [2, 11]. The free-motion approach is advantageous in its flexibility but incurs high computational complexity and demands a high storage capacity.

Fig. 1: A diagram of the proposed Phase-SLAM framework based on the camera-projector pair, which utilizes a 3D-to-2D reprojection model to predict the phase data for an assumed sensor pose, a local optimizer to achieve pose estimation, and a compressive method to enable fast loop closure detection.

Meanwhile, as the 2D phase data produced by SLI systems contain 3D information [22], it is appealing to utilize the phase to achieve high-efficiency pose estimation and loop closure detection. However, developing a fully functional Phase-SLAM system requires coping with the following technological challenges: (1) how to build the intrinsic relationship between the phase and the transformation of the SLI sensor; (2) how to develop a local optimization procedure for estimating 6 DoF motions of the SLI sensor (odometry); (3) how to achieve sparse representation and fast matching of phase data for loop closure detection. Our previous work [24] proposes a geometric reference plane to model the relationship between phases and each of the 6 DoF motions separately, which is complicated and inconvenient. Besides, if loop closure detection is based on whole phase images, the memory footprint will grow quickly as the number of scanning views increases. This paper presents an upgraded Phase-SLAM framework, which utilizes a 3D point to 2D phase reprojection method to build the model, a gradient based local optimizer to achieve odometry functionality, and a compression method to enable efficient loop detection (Fig. 1). The main contributions of this work include:

  1. proposing a reprojection model from 3D point to 2D phase data, which can be used to get phase estimations and measurements;

  2. constructing a local pose optimizer with the reprojection model, for which the analytical expression of the Jacobian matrix for pose estimation is derived;

  3. developing a complete pipeline of Phase-SLAM framework with a compressive loop closure detection scheme and the pose graph optimization;

  4. building simulation and real-world datasets and providing the open-source code for further development.

This paper is organized as follows. Section II introduces the related work. Section III gives an overview of the Phase-SLAM system pipeline. Section IV describes the proposed phase-based pose estimation and compressive loop detection methods. Section V provides experiment results and discussions. Section VI concludes the paper and outlines future work. The Appendix supplements the details of the Jacobian matrix in use.

II Related Work

Most visual SLAM systems are based on either direct or indirect schemes. Direct approaches [6, 5] sample pixels from image regions and minimize the photometric error. Indirect approaches [13, 15] require extra computational resources for detecting and describing features. In contrast, the proposed Phase-SLAM system is based on pixel-level phase data, which contain 3D depth information and can be extracted directly by selecting a region of interest (ROI).

II-A Point Cloud Registration

SLI systems often use point cloud registration methods, either local or global, to achieve large field-of-view scans. Classical local registration methods, such as Point-to-Point ICP [2], minimize the sum of distances between points and their nearest neighbours. Point-to-Plane ICP [11] assumes that each corresponding point is located on a plane and introduces surface normals into the objective function to achieve more efficient data registration. A symmetrized objective function (SymICP) has been proposed to extend the planar convergence into a quadratic one at extra computational cost [16].

Local methods are limited by their initial guesses, so structural features of point clouds are used to search for transformations globally. Point coordinates and surface normals have been used to compute Fast Point Feature Histograms (FPFH) [17], and coplanar 4-point sets have been chosen as features for registration (Super 4PCS) [12]. Besides, Go-ICP uses the branch-and-bound (BnB) scheme to avoid local optima [23]. Fast Global Registration (FGR) applies the Black-Rangarajan duality to achieve a more robust objective function [25]. BCPD++ formulates coherent point drift in a Bayesian setting to supervise the convergence of the algorithm [10]. Compared with the above 3D point cloud registration methods, our approach converts 3D point cloud registration into 2D phase data registration, resulting in much reduced computational complexity and memory footprint.

II-B Loop Closure Detection

Loop closure detection can effectively eliminate the accumulating error. A plain method is to randomly sample a number of keyframes to find loop closures [4]. Odometry based approaches judge whether there is a loop closure at the current position according to the calculated map [9]. Appearance based approaches determine the loop relationship based on the similarity of two scenes [13, 15]. The Bag-of-Words (BoW) based approach [7] uses descriptors (words) for loop closure detection instead of whole images. In this paper, our loop closure detection is based on compressed phase data to reduce both computational complexity and storage space without losing much detection performance.

III System Setup and Problem Statement

Fig. 2: The system diagram of the proposed Phase-SLAM. Based on the SLI phase image, 3D point clouds for each new sensor pose can be computed, which are used for phase data prediction (Section IV-B) and local pose estimation (Section IV-C). The compressive loop closure detection is performed to trigger the global pose graph optimizer (Section IV-E). Finally, the refined sensor poses are used to achieve 360 degree 3D point clouds of the object under scanning.

III-A System Setup

The proposed Phase-SLAM pipeline is shown in Fig. 2. Based on the SLI sensor data, the corresponding 3D point clouds and the reprojection model are used to obtain phase data (Section IV-B). Then, the local pose optimization module (Section IV-C) estimates sensor poses by minimizing errors between predictions and measurements of phase data. Local pose graphs are updated until the compressive loop closure detection (Section IV-E) is triggered. The pose graph optimizer then performs global optimization to eliminate the cumulative errors and revise sensor poses. Finally, the poses are used to align multi-view point clouds and achieve the overall 3D object reconstruction.

We define the notations used in this paper. The initial position of the projector is chosen as the origin of the world coordinate system. $\{W\}$ is the world frame, $\{C\}$ is the camera frame, $\{P\}$ is the projector frame, and the subscript $i$ means the $i$-th sensor pose. $\Phi$ and $\phi$ stand for the phase image and the phase value at each pixel location, respectively. $\hat{(\cdot)}$ denotes an estimated value. $\mathbf{X} = (X, Y, Z)^T$ is the 3D coordinate of a point. The transformation between two sensor poses is represented by the vector $\boldsymbol{\xi} = (\theta_x, \theta_y, \theta_z, t_x, t_y, t_z)^T$. The matrix $\mathbf{R}$ and vector $\mathbf{t}$ represent the rotation and translation from pose $i$ to pose $i+1$; $\mathbf{R}$ and $\mathbf{t}$ can be obtained for a given $\boldsymbol{\xi}$.

III-B Problem Statement

This work aims at developing a complete SLAM system that can estimate the SLI sensor pose transformation through phase data registration and achieve global 360 degree dense 3D reconstruction through pose graph optimization. At each step, 3D points are projected into the sensor imaging plane with the initialized pose rotation and translation by using

$\mathbf{p} = \pi\left(\mathbf{K}(\mathbf{R}\mathbf{X} + \mathbf{t})\right), \quad (1)$

where $\pi(\cdot)$ is the perspective transformation with the projection matrix $\mathbf{K}$, which projects a 3D point $\mathbf{X}$ onto the imaging plane. $\mathbf{p} = (u, v)$ is the pixel position, which is used for obtaining the phase data estimations $\hat{\phi}$ and measurements $\phi$. With $\hat{\phi}$ and $\phi$ obtained, the sensor pose transformation is estimated by

$\boldsymbol{\xi}^{\ast} = \arg\min_{\boldsymbol{\xi}} \sum_{j} \|\hat{\phi}_j(\boldsymbol{\xi}) - \phi_j\|^2. \quad (2)$

Such a local optimization procedure requires computing the Jacobian matrix iteratively until it converges. Loop closure detection and pose graph optimization are also needed to reduce estimation errors.
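The perspective projection above can be sketched in a few lines of Python. This is an illustrative stand-in, not the paper's implementation: the function name and the flat `fx, fy, cx, cy` intrinsic parameterization are assumptions.

```python
def project(X, R, t, fx, fy, cx, cy):
    """Sketch of the perspective transformation: map a 3D point X into the
    imaging plane with rotation R (3x3), translation t (3,), and pinhole
    intrinsics fx, fy, cx, cy."""
    # Rigid transform into the sensor frame: Xc = R X + t.
    Xc = [sum(R[i][j] * X[j] for j in range(3)) + t[i] for i in range(3)]
    # Perspective division followed by the intrinsic mapping.
    u = fx * Xc[0] / Xc[2] + cx
    v = fy * Xc[1] / Xc[2] + cy
    return u, v
```

The resulting pixel location is where the predicted and measured phase values are compared during the least-squares pose estimation.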

III-C SLI Scanning

In the camera-projector based SLI system, the Phase Measuring Profilometry (PMP) method is used to calculate the phase image, as shown in Fig. 3. The camera captures the raw images of sine patterns deformed by the scanned surface, given by

$I_n(u, v) = A(u, v) + B(u, v)\cos\left(\phi(u, v) - \dfrac{2\pi n}{N}\right),$

where $n = 1, \dots, N$ ($N$ is the number of patterns), and $A(u, v)$ and $B(u, v)$ are the background brightness and intensity modulation, respectively. The phase image can then be calculated by [22]

$\phi(u, v) = \arctan\left(\dfrac{\sum_{n=1}^{N} I_n(u, v)\sin(2\pi n/N)}{\sum_{n=1}^{N} I_n(u, v)\cos(2\pi n/N)}\right).$
Fig. 3: An illustration of SLI imaging system. PMP uses the images of projection patterns to compute phase images; the 3D point clouds are then obtained by triangulation with the calibrated camera-projector parameters.
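The N-step phase recovery described above can be sketched per pixel as follows. This is the standard PMP formulation rather than the paper's code; the function name is an assumption, and `atan2` is used so the full quadrant of the wrapped phase is recovered.

```python
import math

def pmp_phase(samples):
    """Recover the wrapped phase at one pixel from N phase-shifted intensity
    samples I_n = A + B*cos(phi - 2*pi*n/N), n = 0..N-1 (standard N-step PMP)."""
    N = len(samples)
    s = sum(I * math.sin(2 * math.pi * n / N) for n, I in enumerate(samples))
    c = sum(I * math.cos(2 * math.pi * n / N) for n, I in enumerate(samples))
    return math.atan2(s, c)  # wrapped to (-pi, pi]
```

Applying this at every pixel across the N captured images yields the phase image; the background $A$ and modulation $B$ cancel out in the sine/cosine ratio.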

IV Proposed Methods

This section investigates the geometric model among 3D points, phase data and sensor poses. Compared with our previous work [24], this work develops a more intuitive and simpler model based on a reprojective transformation method. After phase data pairing, the sensor pose motion can be estimated through least-squares optimization between phase predictions and measurements. A compressed sensing scheme is adopted to achieve fast loop closure detection for global pose graph optimization.

IV-A Phase Values under Epipolar Constraint

Fig. 4: An illustration of the SLI imaging principle. The projector is regarded as a camera with the phase pattern. Phase pairing should be performed under the epipolar constraint. The phase values of each column of the phase pattern increase linearly from 0 to $2\pi$.

In an SLI system, we regard the projector as another camera, which has similar projection parameters and perspective principles. As shown on the left of Fig. 4, according to the epipolar constraint, a phase value obtained from the phase image corresponds to a pixel location in the "camera" imaging plane (the phase pattern), like stereo vision [1]. In the PMP method, the phase pattern is actively projected by the projector, so pixel locations and phase values on the pattern plane have a fixed and known relation. As shown on the right of Fig. 4, the phase value of each column in the phase pattern increases linearly from 0 to $2\pi$, and each row in the pattern is the same. This means that when we get the phase value of a 3D point from the phase image, we know its ordinate in the projector's pattern coordinates. Vice versa, if we know the ordinate $v_p$ of a point in the pattern, we can get its corresponding phase value on the phase image by

$\phi = \dfrac{2\pi v_p}{H}, \quad (6)$

where $H$ is the row height of the projector's imaging plane.

IV-B Phase Pairing Based on Reprojective Transformation

As shown in Fig. 5, a 3D point is measured by the SLI sensor at pose $i$ with the coordinate $\mathbf{X}$. Assuming a transformation with rotation matrix $\mathbf{R}$ and translation vector $\mathbf{t}$, the SLI sensor moves to pose $i+1$, and the point will have a new coordinate $\mathbf{X}' = (X', Y', Z')^T$, given by

$\mathbf{X}' = \mathbf{R}\mathbf{X} + \mathbf{t}. \quad (7)$
Fig. 5: An illustration of reprojection from 3D points to 2D phase data. A 3D point P obtained at pose $i$ is reprojected into the imaging planes of the camera and projector at pose $i+1$ with an assumed rotation $\mathbf{R}$ and translation $\mathbf{t}$. Then the errors between the predicted and measured phase data ($\hat{\phi}$ and $\phi$) are minimized with respect to $\mathbf{R}$ and $\mathbf{t}$.

Based on the new coordinate, the point is reprojected into the camera and projector imaging planes at pose $i+1$ to get two pixel locations, $\mathbf{p}_c$ and $\mathbf{p}_p$, by the transformation in Eq. (1), respectively. On the projector imaging plane (the phase pattern in Fig. 4), once the reprojected ordinate $v_p$ of the point is known, the phase value prediction can be obtained by Eq. (6). So, combining Eqs. (1), (6) and (7), the phase value of a 3D point in the phase pattern can be estimated by

$\hat{\phi} = \dfrac{2\pi}{H}\left(f_p \dfrac{Y'}{Z'} + c_p\right), \quad (8)$

where $(X', Y', Z')$ are the elements of $\mathbf{X}'$, and $f_p$ and $c_p$ are calibration parameters (the projector's focal length and principal point along the row of the projector imaging plane, respectively).

On the camera imaging plane, the phase value measurement $\phi$ can be obtained from the phase image at the pixel location $\mathbf{p}_c = (u_c, v_c)$, which can be computed by

$u_c = f_x \dfrac{X'}{Z'} + c_x, \quad v_c = f_y \dfrac{Y'}{Z'} + c_y, \quad (9)$

where $f_x$, $f_y$, $c_x$ and $c_y$ are the elements of the projection matrix $\mathbf{K}$. When $u_c$ and $v_c$ are not integers, bilinear interpolation on the phase image, using its values at integer indices, can be used to calculate the phase data.
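The bilinear sampling step can be sketched as below; this is a generic pure-Python helper (the function name and the row-major list-of-rows layout are assumptions), valid for interior pixel locations.

```python
def bilinear(phase, u, v):
    """Bilinearly interpolate a phase image (row-major 2D list) at a
    non-integer pixel location (u: column index, v: row index).
    Assumes 0 <= u < W-1 and 0 <= v < H-1 (interior of the image)."""
    u0, v0 = int(u), int(v)
    du, dv = u - u0, v - v0
    # Weighted average of the four integer-indexed neighbours.
    return ((1 - du) * (1 - dv) * phase[v0][u0]
            + du * (1 - dv) * phase[v0][u0 + 1]
            + (1 - du) * dv * phase[v0 + 1][u0]
            + du * dv * phase[v0 + 1][u0 + 1])
```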

Fig. 6: An illustration of a simple local pose optimization process. (a) a raw object image acquired at the first pose; (c) the phase image acquired at the first pose with a white ROI; (d) the phase image acquired at the second pose; (b) the plot of the optimization objective cost between the two phase images (within the ROI) with respect to different sensor displacements, where the black dashed line indicates the ground truth of the minimum.

IV-C Local Pose Optimizer

In the local optimizer, the state variables are defined as $\boldsymbol{\xi} = (\theta_x, \theta_y, \theta_z, t_x, t_y, t_z)^T$, which is equivalent to $\mathbf{R}$ and $\mathbf{t}$. The error between $\hat{\phi}_j$ and $\phi_j$ is given by

$e_j(\boldsymbol{\xi}) = \hat{\phi}_j(\boldsymbol{\xi}) - \phi_j. \quad (10)$

The objective function is

$F(\boldsymbol{\xi}) = \sum_{j \in \Omega} \|e_j(\boldsymbol{\xi})\|^2, \quad (11)$

where $\Omega$ is an ROI in the phase images and $j$ indexes the $j$-th point in the ROI.

The objective function in Eq. (11) can be solved by iterative gradient-based methods [14]. Given the initial value $\boldsymbol{\xi}_0$, the cost function can be approximated by a Taylor expansion about $\boldsymbol{\xi}_0$, where

$\mathbf{J} = \dfrac{\partial e}{\partial \boldsymbol{\xi}} \quad (12)$

is the Jacobian matrix. The optimization increment is computed by $\Delta\boldsymbol{\xi} = -\lambda \mathbf{J}^T e$, which is along the negative gradient direction of $F$, and $\lambda$ controls the size of steps. The solution is updated by $\boldsymbol{\xi}_{k+1} = \boldsymbol{\xi}_k + \Delta\boldsymbol{\xi}$, where $k$ is the iterative index [1, 14].
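One update of this steepest-descent iteration can be sketched as follows. The function name and the fixed step size `lam` are illustrative assumptions, not the paper's implementation; in practice the step size would be tuned or replaced by a Gauss-Newton step.

```python
def descent_step(xi, residuals, jacobian, lam=0.1):
    """One steepest-descent update on F(xi) = sum_j e_j^2:
    xi <- xi - lam * J^T e, where jacobian[j][p] = d e_j / d xi_p."""
    # Gradient direction J^T e, accumulated over all residuals.
    grad = [sum(jacobian[j][p] * residuals[j] for j in range(len(residuals)))
            for p in range(len(xi))]
    # Move against the gradient, scaled by the step size.
    return [x - lam * g for x, g in zip(xi, grad)]
```

Iterating this step with freshly evaluated residuals and Jacobian each round drives the phase error toward a local minimum, as in Fig. 6 (b).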

Fig. 6 shows a simple example of the optimization process. Fig. 6 (a) shows an object image; (c) shows the corresponding phase image with an ROI; (d) shows the phase image acquired at a new sensor pose; (b) shows the plot of errors between the two sets of phase data (within the ROI) with respect to the displacements of the SLI sensor. It can be seen that such an optimization process converges to the local minimum [26].

IV-D The Jacobian Matrix

According to Eqs. (10) and (12), the Jacobian matrix of $e_j$ with respect to $\boldsymbol{\xi}$ is given by

$\mathbf{J}_j = \dfrac{\partial \hat{\phi}_j}{\partial \boldsymbol{\xi}} - \left[\dfrac{\partial \phi}{\partial u}, \dfrac{\partial \phi}{\partial v}\right] \dfrac{\partial \mathbf{p}_c}{\partial \boldsymbol{\xi}}, \quad (13)$

where $\partial\phi/\partial v$ and $\partial\phi/\partial u$ are the vertical and horizontal gradients of the phase image $\Phi$, computed by pixel difference. More details of Eq. (13) are provided in the Appendix, and the other term in Eq. (12) is substituted by the chain rule

$\dfrac{\partial \hat{\phi}_j}{\partial \boldsymbol{\xi}} = \dfrac{\partial \hat{\phi}_j}{\partial \mathbf{X}'} \dfrac{\partial \mathbf{X}'}{\partial \boldsymbol{\xi}}. \quad (14)$
IV-E Loop Closure Detection

Fig. 7: An illustration of the sparsity of phase and photo images. (a,c) A phase image and a photo image; (b,d) the corresponding wavelet coefficients of two images; (e) the wavelet coefficient L1 norms of two types of images within one SLAM loop.

The proposed Phase-SLAM utilizes the Compressive Sensing (CS) technique to reduce computational complexity and data storage space for loop closure detection. The compressibility of an image is determined by its sparsity: sparser images lose less information after compression, and a sparse image contains less high-frequency information [3]. Haar wavelet bases and the $\ell_1$ norm are used to illustrate the degree of sparsity of phase images. As shown in Fig. 7, a phase image and a photo image are first projected onto the wavelet bases. Then the $\ell_1$ norms of the wavelet coefficients of the two types of images within one SLAM loop are compared in Fig. 7 (e). It can be seen that the $\ell_1$ norms of the wavelet coefficients of phase images are much smaller than those of photo images, indicating that the degree of sparsity of phase images is much higher.
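The sparsity comparison can be illustrated with a one-level 1D Haar transform, a simplified stand-in for the 2D wavelet analysis in Fig. 7 (the function names and orthonormal normalization are assumptions):

```python
import math

def haar_level(x):
    """One level of the orthonormal 1D Haar transform on an even-length
    signal: pairwise averages followed by pairwise details."""
    s = 1.0 / math.sqrt(2.0)
    avg = [s * (x[2 * i] + x[2 * i + 1]) for i in range(len(x) // 2)]
    det = [s * (x[2 * i] - x[2 * i + 1]) for i in range(len(x) // 2)]
    return avg + det

def l1_norm(coeffs):
    """L1 norm used as a simple sparsity proxy."""
    return sum(abs(c) for c in coeffs)
```

A smooth, ramp-like phase signal concentrates its energy in the average band, so its detail coefficients carry little $\ell_1$ mass, while a textured photo-like signal does not; this is the effect compared in Fig. 7 (e).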

According to the CS theory [3], two signals $x_1$ and $x_2$ are distinguishable after compression if the measurement matrix $A$ satisfies

$(1 - \delta)\|x_1 - x_2\|_2^2 \le \|A(x_1 - x_2)\|_2^2 \le (1 + \delta)\|x_1 - x_2\|_2^2,$

where $\delta$ is a constant and $A$ is a Gaussian random matrix. The compressed signals $y_1 = A x_1$ and $y_2 = A x_2$ have a much smaller size than the original signals. For a 2D phase image $\Phi$, we first reshape it into a 1D vector $x$. The reshaped phase data can be recovered from the compressed signal $y = Ax$, and the error between two compressive phase vectors is

$e_{ij} = \|y_i - y_j\|_2.$

When $e_{ij}$ is smaller than a threshold, a loop closure is detected.
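A minimal sketch of this compressive comparison follows. The seeded Gaussian matrix, the row count `m`, and the function names are illustrative assumptions; the key point is that the same measurement matrix must be reused for every keyframe so the compressed signals are comparable.

```python
import math
import random

def compress(signal, m, seed=0):
    """Compute y = A x for an m-row Gaussian random measurement matrix A,
    drawn deterministically from `seed` so all frames share the same A."""
    rng = random.Random(seed)
    return [sum(rng.gauss(0.0, 1.0) * x for x in signal) for _ in range(m)]

def loop_closed(y_i, y_j, threshold):
    """Flag a loop closure when the compressed signals nearly coincide."""
    dist = math.sqrt(sum((a - b) ** 2 for a, b in zip(y_i, y_j)))
    return dist < threshold
```

Because only the length-`m` compressed vectors are stored and compared, both the memory footprint and the pairwise comparison cost shrink by the compression ratio.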

IV-F The Pipeline of Phase-SLAM

After a successful loop closure detection, the pose graph optimization technique [8] is used to eliminate the cumulative error and refine poses. The pose sequences in our system usually have a large interval during the scanning process, so every estimated pose is a vertex in the pose graph optimizer.

V Experiment Results and Discussions

The proposed Phase-SLAM system was evaluated with both the Unreal Engine 4 (UE4) simulator and real-world experiments. All experiments were implemented on a PC with an Intel Core i7-9800K CPU @ 3.6GHz.

V-A Simulation Experiments

Fig. 8: An illustration of the proposed compressive loop closure detection. (a) Three phase images; (b) the Gaussian pseudo-random matrix used for compressive projection; (c) three sets of compressed signals corresponding to the three phase images, used for loop closure detection.

Fig. 9: A comparison of global 3D point cloud registration results along with the ground truth. (a)-(c) Simulation datasets named David, Elephant and Dancing girl; (d)-(f) real-world datasets named David-6DoF, PiKaChu and Sona. (Top Row) The ground truth (in gray) and reconstruction results (in other colors) using the proposed Phase-SLAM; (Bottom Row) the 3D objects.

The simulation dataset was collected with the AirSim plugin in UE4. Different 3D models were used as targets, and the virtual SLI device moved around the target at a radius of 120 cm with a rotation interval of 20 degrees. The simulated dataset is based on three models, namely David, Elephant and Dancing girl, and contains calibration parameters, phase images and ground-truth poses. The baseline methods include four state-of-the-art local methods, namely Point-to-Point ICP [2], Point-to-Plane ICP [11], SymICP [16] and FPFH [17], and two state-of-the-art global methods, namely FGR [25] and BCPD++ [10]. The local methods were conducted based on the Point Cloud Library (PCL) implementation [18]; the global methods were based on open-source code. The numbers of iterations of Point-to-Point ICP, Point-to-Plane ICP, and SymICP were chosen as 30; that of FPFH was 10000. The implementations of FGR and BCPD++ used the recommended parameters.

Fig. 10: The plot of Relative Pose Errors [20] of (top) rotation and (bottom) translation in simulation. PhaseS-Loop denotes Phase-SLAM with pose graph optimization.

The compression of phase images is shown in Fig. 8. In the simulation experiments, the Gaussian compressive random matrix (Fig. 8 (b)) reduces each phase image (Fig. 8 (a)) at a compression ratio of 3070:1. Fig. 8 (c) shows the compressed signals corresponding to three phase images like those in Fig. 8 (a). It can be seen that the three sets of signals are distinguishable in terms of their peaks and valleys for loop closure detection. Furthermore, experiment results show that the time consumption of the back-end optimization using the CS technique is reduced compared with using the original phase images.

Fig. 9 (a)-(c) shows the 3D reconstruction results (top) and ground truth (bottom) for the three simulation targets (David, Elephant and Dancing girl) using the proposed Phase-SLAM with loop closure. More quantified reconstruction errors are shown in Fig. 15. Fig. 10 shows the relative pose errors (RPE) [20] in rotation (top) and translation (bottom) for five methods on the David dataset. It can be seen that the proposed Phase-SLAM method with loop closure (PhaseS-Loop) outperforms the other four methods. The median RPE of Phase-SLAM is 0.81 degree in rotation and 0.94 cm in translation. Table I shows the root mean squared error (RMSE) of the absolute trajectory error (ATE) [20] and the computation time for three different datasets. The average RMSE of our approach is 1.06 cm, which is better than the other methods; Phase-SLAM with loop closure also outperforms PhaseS. In simulations, the average number of 3D points per image is around 50000. BCPD++ has the highest computation speed among the six existing methods, and our approach is still almost two times faster than BCPD++. The average running time of the back-end optimization is 40.7 ms.

Method David Elephant Dancing Girl
PhaseS-Loop[ours] 1.39 / 1.52 0.72 / 1.58 1.07 / 0.82
PhaseS[ours] 2.32 / 1.49 2.40 / 1.23 2.05 / 0.76
BCPD++[10] 25.53 / 2.87 87.09 / 3.46 15.57 / 2.68
SymICP[16] 6.35 / 109.10 7.06 / 146.71 8.17 / 104.22
Point to Plane[11] 6.76 / 55.15 13.07 / 65.73 10.57 / 45.89
Point to Point[2] 17.17 / 32.56 19.35 / 44.68 40.26 / 30.75
FGR[25] 11.12 / 25.56 15.67 / 34.68 38.74 / 26.90
FPFH[17] 8.99 / 70.59 25.10 / 79.09 32.70 / 59.68
  • RMSE of ATE: The root mean squared error of absolute trajectory error.

TABLE I: RMSE of ATE (cm) / Computation time (s)

V-B Real-World Experiments

Fig. 11 shows the experiment setup, where the SLI sensor, consisting of a projector (DLP3000 DMD from TI) and an industrial camera (1280×1024 resolution, from HIKVISION), is mounted on a UR5 robotic arm. Fig. 9 (d)-(f) shows two plaster statues (David, Sona) and a plush toy (PiKaChu) used to build the real-world datasets, namely David-6DoF, David-3DoF, Sona-3DoF and PiKaChu-3DoF, where 6DoF and 3DoF stand for six and three degrees-of-freedom motions, respectively. The David-6DoF dataset includes 31 random poses; each 3DoF dataset has 37 poses at equal intervals of 10 degrees and a radius of 60 cm.

Fig. 12 shows the RPE results of five different methods using the David-6DoF dataset. It is clear that the proposed method (PhaseS-Loop) performs better than the other four methods. Table II shows the RMSE of ATE and the computation time for four different datasets using eight different methods. It can be seen that the proposed method outperforms the other six methods in terms of both accuracy and computation time. The 3D object reconstruction results using the proposed method on the real-world datasets are shown in Fig. 9 (d)-(f).

Fig. 11: (a) The real-world experiment setup, where an object is fixed on a bracket and the SLI sensor is installed on a UR5 robotic arm. (b) The SLI sensor consists of a DLP3000 projector and a HIKVISION camera.
Fig. 12: The plot of Relative Pose Errors [20] of (top) rotation and (bottom) translation in real-world experiments.
Method David-6DoF David-3DoF PiKaChu Sona
PhaseS-Loop 4.69/4.20 4.71/3.19 2.09/3.30 1.83/3.18
PhaseS 6.12/4.17 5.74/3.17 3.27/3.27 2.29/3.15
BCPD++ 244.41/2.79 53.01/2.93 21.72/3.72 22.39/3.81
SymICP 99.28/374.26 28.78/345.25 35.68/268.69 30.88/242.12
Point to Plane 101.5/168.53 33.66/152.13 33.97/119.23 36.65/109.15
Point to Point 170.2/118.42 89.28/101.54 70.36/81.34 84.44/77.9
FGR 282.3/238.25 224.21/213.15 231.17/302.45 149.92/191.7
FPFH 109.85/386.8 95.51/153.66 254.34/202.14 90.81/217.9
  • RMSE of ATE: The root mean squared error of absolute trajectory error.

TABLE II: RMSE of ATE (mm) / Computation Time (s)

Fig. 13 illustrates the estimated SLI sensor trajectory and the ground truth on the David-6DoF dataset, where the total trajectory length is 3.967 m. Fig. 14 shows the pose estimation errors of local methods (SymICP, Point-to-Plane and Point-to-Point ICP) and our method without global optimization under different initial values. We can see that the proposed method is the least sensitive to initial values. Fig. 15 shows a radar chart of seven methods without global optimization on all datasets using five performance metrics (Hausdorff distance, computation time, translation/rotation errors, and storage space) for a comprehensive evaluation. The Hausdorff distance describes the dissimilarity between reconstructed point clouds and the ground truth [21]. The proposed method shows superior performance in all these metrics.

Fig. 13: The plot of the estimated sensor trajectory by using the full pipeline of the proposed Phase-SLAM on David-6DoF. The ground-truth is obtained via the UR5 robotic arm.
Fig. 14: The plot of registration errors of (top) translation and (bottom) rotation with respect to different sensor pose initializations. The x-axis is the percentage of perturbation for pose initialization with respect to the ground-truth.
Fig. 15: The radar chart of 5 performance metrics for 7 different algorithms. The rotation and translation errors are measured via the Euler distances; the Hausdorff distance is used to measure the dissimilarity between two point clouds.

VI Conclusion

This paper presents a phase based Simultaneous Localization and Mapping (Phase-SLAM) pipeline for fast and accurate SLI sensor pose estimation and 3D object reconstruction. The proposed reprojection model and local pose optimizer can achieve the odometry functionality with high efficiency, accuracy and low sensitivity to initial pose knowledge. The proposed compressive loop closure detection technique can reduce both the loop closure computational time and data storage space. Even without global optimization, the proposed local data registration method outperforms six other existing 3D point cloud based methods in terms of sensor pose estimation accuracy, storage space, computation time and 3D reconstruction errors. The code of our framework and the dataset in use are available online.


Appendix

The analytic expression of the Jacobian of $e$ with respect to $\boldsymbol{\xi}$ is provided in this section. The intermediate terms are given by


The analytic expression of Jacobian is then given by



  • [1] A. M. Andrew (2001) Multiple view geometry in computer vision. Kybernetes. Cited by: §IV-A, §IV-C.
  • [2] P. J. Besl and N. D. McKay (1992) Method for registration of 3-d shapes. In Sensor fusion IV: control paradigms and data structures, Vol. 1611, pp. 586–606. Cited by: §I, §II-A, §V-A, TABLE I.
  • [3] E. J. Candes (2008) The restricted isometry property and its implications for compressed sensing. Comptes rendus mathematique 346 (9-10), pp. 589–592. Cited by: §IV-E, §IV-E.
  • [4] F. Endres, J. Hess, J. Sturm, D. Cremers, and W. Burgard (2013) 3-d mapping with an rgb-d camera. IEEE transactions on robotics 30 (1), pp. 177–187. Cited by: §II-B.
  • [5] J. Engel, V. Koltun, and D. Cremers (2017) Direct sparse odometry. IEEE transactions on pattern analysis and machine intelligence 40 (3), pp. 611–625. Cited by: §II.
  • [6] J. Engel, T. Schöps, and D. Cremers (2014) LSD-slam: large-scale direct monocular slam. In Computer Vision – ECCV 2014, D. Fleet, T. Pajdla, B. Schiele, and T. Tuytelaars (Eds.), Cham, pp. 834–849. Cited by: §II.
  • [7] D. Gálvez-López and J. D. Tardos (2012) Bags of binary words for fast place recognition in image sequences. IEEE Transactions on Robotics 28 (5), pp. 1188–1197. Cited by: §II-B.
  • [8] G. Grisetti, R. Kümmerle, C. Stachniss, and W. Burgard (2010) A tutorial on graph-based slam. IEEE Intelligent Transportation Systems Magazine 2 (4), pp. 31–43. Cited by: §IV-F.
  • [9] D. Hahnel, W. Burgard, D. Fox, and S. Thrun (2003) An efficient fastslam algorithm for generating maps of large-scale cyclic environments from raw laser range measurements. In Proceedings 2003 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2003)(Cat. No. 03CH37453), Vol. 1, pp. 206–211. Cited by: §II-B.
  • [10] O. Hirose (2020) Acceleration of non-rigid point set registration with downsampling and gaussian process regression. IEEE Transactions on Pattern Analysis and Machine Intelligence. Cited by: §II-A, §V-A, TABLE I.
  • [11] K. Low (2004) Linear least-squares optimization for point-to-plane icp surface registration. Chapel Hill, University of North Carolina 4 (10), pp. 1–3. Cited by: §I, §II-A, §V-A, TABLE I.
  • [12] N. Mellado, D. Aiger, and N. J. Mitra (2014) Super 4pcs fast global pointcloud registration via smart indexing. In Computer graphics forum, Vol. 33, pp. 205–215. Cited by: §II-A.
  • [13] R. Mur-Artal, J. M. M. Montiel, and J. D. Tardos (2015) ORB-slam: a versatile and accurate monocular slam system. IEEE transactions on robotics 31 (5), pp. 1147–1163. Cited by: §II-B, §II.
  • [14] J. Nocedal and S. Wright (2006) Numerical optimization. Springer Science & Business Media. Cited by: §IV-C.
  • [15] T. Qin, P. Li, and S. Shen (2018) Vins-mono: a robust and versatile monocular visual-inertial state estimator. IEEE Transactions on Robotics 34 (4), pp. 1004–1020. Cited by: §II-B, §II.
  • [16] S. Rusinkiewicz (2019) A symmetric objective function for icp. ACM Transactions on Graphics (TOG) 38 (4), pp. 1–7. Cited by: §II-A, §V-A, TABLE I.
  • [17] R. B. Rusu, N. Blodow, and M. Beetz (2009) Fast point feature histograms (fpfh) for 3d registration. In 2009 IEEE international conference on robotics and automation, pp. 3212–3217. Cited by: §II-A, §V-A, TABLE I.
  • [18] R. B. Rusu and S. Cousins (2011) 3d is here: point cloud library (pcl). In 2011 IEEE international conference on robotics and automation, pp. 1–4. Cited by: §V-A.
  • [19] J. Salvi, J. Pages, and J. Batlle (2004) Pattern codification strategies in structured light systems. Pattern recognition 37 (4), pp. 827–849. Cited by: §I.
  • [20] J. Sturm, N. Engelhard, F. Endres, W. Burgard, and D. Cremers (2012) A benchmark for the evaluation of rgb-d slam systems. In 2012 IEEE/RSJ international conference on intelligent robots and systems, pp. 573–580. Cited by: Fig. 10, Fig. 12, §V-A.
  • [21] A. A. Taha and A. Hanbury (2015) An efficient algorithm for calculating the exact hausdorff distance. IEEE transactions on pattern analysis and machine intelligence 37 (11), pp. 2153–2163. Cited by: §V-B.
  • [22] Y. Wang, K. Liu, Q. Hao, X. Wang, D. L. Lau, and L. G. Hassebrook (2012) Robust active stereo vision using kullback-leibler divergence. IEEE transactions on pattern analysis and machine intelligence 34 (3), pp. 548–563. Cited by: §I, §III-C.
  • [23] J. Yang, H. Li, D. Campbell, and Y. Jia (2015) Go-icp: a globally optimal solution to 3d icp point-set registration. IEEE transactions on pattern analysis and machine intelligence 38 (11), pp. 2241–2254. Cited by: §II-A.
  • [24] X. Zheng, R. Ma, R. Gao, and Q. Hao (2021) Phase-slam: mobile structured light illumination for full body 3d scanning. In 2021 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 1617–1624. Cited by: §I, §IV.
  • [25] Q. Zhou, J. Park, and V. Koltun (2016) Fast global registration. In European conference on computer vision, pp. 766–782. Cited by: §II-A, §V-A, TABLE I.
  • [26] Y. Zhou, G. Gallego, H. Rebecq, L. Kneip, H. Li, and D. Scaramuzza (2018) Semi-dense 3d reconstruction with a stereo event camera. In Proceedings of the European Conference on Computer Vision (ECCV), pp. 235–251. Cited by: §IV-C.