I Introduction
Structured light illumination (SLI) technology has been widely used for high-precision 3D scanning in many industrial applications with a camera-projector pair. SLI systems usually achieve 360-degree 3D reconstruction through one of two approaches: controlled-motion based or free-motion based [19]. The former uses a servo motor to rotate the object along a predefined trajectory for multiple-view scanning; the latter estimates sensor motions through local and global point cloud registration, such as Iterative Closest Point (ICP) and associated variants [2, 11]. The free-motion approach is advantageous in its flexibility but incurs high computational complexity and demands a high storage capacity.
Meanwhile, as the 2D phase data produced by SLI systems contain 3D information [22], it is appealing to use the phase to achieve high-efficiency pose estimation and loop closure detection. However, developing a fully functional PhaseSLAM system requires addressing the following technological challenges: (1) how to build the intrinsic relationship between the phase and the transformation of the SLI sensor; (2) how to develop a local optimization procedure for estimating 6-DoF motions of the SLI sensor (odometry); (3) how to achieve sparse representation and fast matching of phase data for loop closure detection. Our previous work [24] proposes a geometric reference plane to model the relationship between phases and 6-DoF motions separately, which is complicated and inconvenient. Besides, if loop closure detection is based on whole phase images, the memory footprint grows quickly as the number of scanning views increases. This paper presents an upgraded PhaseSLAM framework, which uses a 3D-point-to-2D-phase reprojection method to build the model, a gradient-based local optimizer to achieve odometry functionality, and a compression method to enable efficient loop detection (Fig. 1). The main contributions of this work include:

proposing a reprojection model from 3D point to 2D phase data, which can be used to get phase estimations and measurements;

constructing a local pose optimizer based on the reprojection model and deriving the analytical expression of the Jacobian matrix for pose estimation;

developing a complete PhaseSLAM pipeline with a compressive loop closure detection scheme and pose graph optimization;

building simulation and real-world datasets and providing the open-source code for further development.
This paper is organized as follows. Section II introduces the related work. Section III gives an overview of the PhaseSLAM system pipeline. Section IV describes the proposed phase-based pose estimation and compressive loop detection methods. Section V provides experiment results and discussions. Section VI concludes the paper and outlines future work. The Appendix supplements the details of the Jacobian matrix in use.
II Related Work
Most visual SLAM systems are based on either direct or indirect schemes. Direct approaches [6, 5] sample pixels from image regions and minimize the photometric error. Indirect approaches [13, 15] require extra computational resources for detecting and describing features. In contrast, the proposed PhaseSLAM system is based on pixel-level phase data, which contain 3D depth information and can be extracted directly by selecting a region of interest (ROI).
II-A Point Cloud Registration
SLI systems often use point cloud registration methods, either local or global, to achieve scans with a large field of view. Classical local registration methods, such as Point-to-Point ICP [2], minimize the sum of distances between points and their nearest neighbours. Point-to-Plane ICP [11] assumes that each corresponding point is located on a plane, and introduces surface normals into the objective function to achieve more efficient data registration. A symmetrized objective function (SymICP) has been proposed to extend the planar convergence into a quadratic one at extra computational cost [16].
Local methods are limited by initial guesses, so structural features of point clouds are used to search for transformations globally. Point coordinates and surface normals have been used to compute the Fast Point Feature Histograms (FPFH) [17], and coplanar 4-point sets have been chosen as features for registration (Super 4PCS) [12]. Besides, Go-ICP uses the branch-and-bound (BnB) scheme to avoid local optima [23]. Fast Global Registration (FGR) applies a Black-Rangarajan duality to achieve a more robust objective function [25]. BCPD++ formulates coherent point drift in a Bayesian setting to supervise the convergence of the algorithm [10]. Compared with the above 3D point cloud registration methods, our approach converts 3D point cloud registration into 2D phase data registration, resulting in much reduced computational complexity and memory footprint.
II-B Loop Closure Detection
Loop closure detection can effectively eliminate the accumulated error. A simple method is to randomly sample a number of keyframes to find loop closures [4]. Odometry-based approaches judge whether there is a loop closure at the current position according to the calculated map [9]. Appearance-based approaches determine the loop relationship based on the similarity of two scenes [13, 15]. The Bag-of-Words (BoW) based approach [7] uses descriptors (words) for loop closure detection instead of whole images. In this paper, our loop closure detection is based on compressed phase data to reduce both computational complexity and storage space without losing much detection performance.
III System Setup and Problem Statement
III-A System Setup
The proposed PhaseSLAM pipeline is shown in Fig. 2. Based on the SLI sensor data, the corresponding 3D point clouds and the reprojection model are used to obtain phase data (Section IV-B). Then, the local pose optimization module (Section IV-C) is used to estimate sensor poses by minimizing errors between predictions and measurements of phase data. Local pose graphs are updated until the compressive loop closure detection (Section IV-E) is triggered. The pose graph optimizer then performs global optimization to eliminate the cumulative errors and revise sensor poses. Finally, poses are used to align multi-view point clouds and achieve the overall 3D object reconstruction.
We define the notations used in this paper. The initial position of the projector is chosen as the origin of the world coordinate system. is the world frame, is the camera frame, is the projector frame, and means the th sensor pose. The and stand for the phase image and phase value at each pixel location, respectively. denotes the estimated value.
is the 3D coordinate of a point. The transformation between two sensor poses is represented by vector
, Matrix and vector represent rotation and translation from to . and can be obtained for a given .
III-B Problem Statement
This work aims at developing a complete SLAM system that can estimate the SLI sensor pose transformation through phase data registration and achieve global 360-degree dense 3D reconstruction through pose graph optimization. At each step, 3D points are projected into the sensor imaging plane with initialized pose rotation and translation by using
(1) 
where is the perspective transformation with the projection matrix , meaning that projects a 3D point onto the imaging plane. is the pixel position, which is used for obtaining phase data estimations and measurements . Given and , the sensor pose transformation is estimated by
(2) 
where
(3) 
Such a local optimization procedure requires computing the Jacobian matrix iteratively until it converges. Loop closure detection and pose graph optimization are also needed to reduce estimation errors.
III-C SLI Scanning
In the camera-projector based SLI system, the Phase Measuring Profilometry (PMP) method is used to calculate the phase image, as shown in Fig. 3. The camera captures the raw images of sine patterns deformed by the scanned surface, given by
(4) 
where (the number of patterns), and are the background brightness and intensity modulation, respectively. The phase image can then be calculated by [22]
(5) 
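As an illustration of the PMP computation, the following minimal Python sketch recovers a wrapped phase image from N phase-shifted captures. It assumes the common N-step model I_n = A + B cos(phi - 2*pi*n/N) with N >= 3; the function name `pmp_phase` and the sign convention are assumptions made for this sketch and may differ from Eqs. (4) and (5).

```python
import numpy as np

def pmp_phase(images):
    """Recover the wrapped phase in [0, 2*pi) from N phase-shifted images.

    images: array-like of shape (N, H, W), assumed to follow
            I_n = A + B * cos(phi - 2*pi*n/N) with N >= 3.
    """
    images = np.asarray(images, dtype=float)
    num = images.shape[0]
    # Phase shift applied to the n-th pattern, broadcast over the image grid.
    shift = 2.0 * np.pi * np.arange(num).reshape(-1, 1, 1) / num
    # The background term A cancels in both sums; B cancels in the ratio.
    s = np.sum(images * np.sin(shift), axis=0)
    c = np.sum(images * np.cos(shift), axis=0)
    return np.mod(np.arctan2(s, c), 2.0 * np.pi)
```

Because the background brightness and the intensity modulation cancel out, the recovered phase does not depend on surface reflectance under this idealized model.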
IV Proposed Methods
This section investigates the geometric model relating 3D points, phase data and sensor pose. Compared with our previous work [24], this work develops a simpler and more intuitive model based on a reprojective transformation. After phase data pairing, the sensor pose motion can be estimated through least-squares optimization between phase predictions and measurements. A compressed sensing scheme is adopted to achieve fast loop closure detection for global pose graph optimization.
IV-A Phase Values under Epipolar Constraint
In an SLI system, we regard the projector as another camera, since it has similar projection parameters and follows the same perspective principles. As shown in the left of Fig. 4, according to the epipolar constraint, a phase value obtained from the phase image corresponds to a pixel location in the imaging plane of this “camera” (the phase pattern), as in stereo vision [1]. In the PMP method, the phase pattern is actively projected by the projector, so pixel locations and phase values on the pattern plane have a fixed and known relationship. As shown in the right of Fig. 4, the phase value along each column of the phase pattern increases linearly from 0 to 2π, and the phase value within each row is constant. This means that once we obtain the phase value of a 3D point from the phase image, we know its ordinate in the projector’s pattern coordinate system. Conversely, if we know the projective coordinate (only ) of a point in the pattern, we can obtain its corresponding phase value on the phase image by
(6) 
where is the row height of the projector’s imaging plane.
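The linear relationship between the pattern ordinate and the phase value can be sketched in Python as follows. This is a minimal illustration that assumes the phase is 0 at ordinate 0 and reaches 2*pi at the full pattern height; the names `phase_from_ordinate`, `ordinate_from_phase` and `height` are hypothetical, and the exact form of Eq. (6) may include offsets not modeled here.

```python
import numpy as np

def phase_from_ordinate(y_p, height):
    """Map a projector-pattern ordinate to a phase value (assumed linear)."""
    return 2.0 * np.pi * np.asarray(y_p, dtype=float) / height

def ordinate_from_phase(phi, height):
    """Inverse mapping: recover the pattern ordinate from a phase value."""
    return np.asarray(phi, dtype=float) * height / (2.0 * np.pi)
```

For example, under these assumptions a point that reprojects to the middle row of the pattern receives a phase of pi.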
IV-B Phase Pairing Based on Reprojective Transformation
As shown in Fig. 5, a 3D point is measured by the SLI sensor at the with the coordinate . Assuming the transformation consists of the rotation matrix and translation vector , the SLI sensor moves to and the point has a new coordinate , given by
(7) 
Based on the new coordinate, the point is reprojected onto the camera and projector imaging planes in to get two pixel locations, and , by the transformation (Eq. (1)), respectively. On the projector imaging plane (the phase pattern in Fig. 4), when the reprojection ordinate of point : is known, the phase value prediction can be obtained by Eq. (6). So, combining Eqs. (1), (6) and (7), the phase value of a 3D point in the phase pattern can be estimated by using
(8) 
where is the element of , and , are calibration parameters (the projector’s focal length and principal point along the row of the projector imaging plane, respectively).
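The chain from a rigid transformation to a phase prediction can be sketched as below. This is a hedged illustration, not the authors' implementation: it assumes the point is expressed in the projector frame, that the new coordinate is obtained as R @ P + t, an undistorted pinhole model, and the linear phase-ordinate mapping sketched for Eq. (6). The names `predict_phase`, `fy`, `cy` and `height` are hypothetical.

```python
import numpy as np

def predict_phase(point, R, t, fy, cy, height):
    """Predict the phase of a 3D point after the sensor moves by (R, t).

    point  : 3D coordinate in the projector frame (assumption).
    R, t   : rotation matrix (3x3) and translation vector (3,).
    fy, cy : projector focal length and principal point along the ordinate.
    height : row height of the projector pattern in pixels.
    """
    p_new = np.asarray(R, dtype=float) @ np.asarray(point, dtype=float) + np.asarray(t, dtype=float)
    # Pinhole projection; only the ordinate is needed for the phase.
    y_p = fy * p_new[1] / p_new[2] + cy
    # Linear mapping from ordinate to phase (assumed form of Eq. (6)).
    return 2.0 * np.pi * y_p / height
```

Only the second and third components of the transformed point enter the prediction, which reflects the statement that the ordinate alone determines the phase.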
On the camera imaging plane, the phase value measurement can be obtained from the phase image at the pixel location , which can be computed by
(9) 
where is the element of projection matrix . When and are not integers, bilinear interpolation on can be used to calculate phase data at integer indices.
IV-C Local Pose Optimizer
In the local optimizer, the state variables are defined as , which is equivalent to and . The error between and is given by
(10) 
The objective function is shown in
(11) 
where is an ROI in the phase images and is the th point in the ROI. .
The objective function in Eq. (11) can be solved by iterative gradient-based methods [14]. Given the initial value , the cost function can be approximated by a Taylor expansion about , and , where
(12) 
is the Jacobian matrix. The optimization increment is computed by , which is the negative gradient direction of , and controls the step size. The solution is updated by , where is the iteration index [1, 14].
Fig. 6 shows a simple example of the optimization process. Fig. 6 (a) shows an object image. (c) shows the corresponding phase image with an ROI. (d) shows the phase image acquired at a new sensor pose. (b) plots the errors between the two sets of phase data (within the ROI) with respect to the displacements of the SLI sensor. It can be seen that such an optimization process converges to the local minimum [26].
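The iterative update described in this subsection can be sketched generically as follows. This is a minimal illustration under stated assumptions: the increment is taken as the scaled negative gradient of 0.5 * ||e(x)||^2, and the Jacobian is approximated by finite differences, whereas the paper derives an analytic Jacobian. The names `numerical_jacobian`, `gradient_descent` and `step` are hypothetical.

```python
import numpy as np

def numerical_jacobian(residual, x, eps=1e-6):
    """Forward-difference Jacobian of residual(x) (stand-in for the analytic one)."""
    r0 = residual(x)
    J = np.zeros((r0.size, x.size))
    for k in range(x.size):
        dx = np.zeros_like(x)
        dx[k] = eps
        J[:, k] = (residual(x + dx) - r0) / eps
    return J

def gradient_descent(residual, x0, step=0.2, iters=200, tol=1e-12):
    """Minimize 0.5 * ||residual(x)||^2 with fixed-step gradient descent."""
    x = np.asarray(x0, dtype=float).copy()
    for _ in range(iters):
        e = residual(x)
        J = numerical_jacobian(residual, x)
        delta = -step * (J.T @ e)  # negative gradient direction, scaled by step
        x = x + delta
        if np.linalg.norm(delta) < tol:
            break
    return x
```

In the PhaseSLAM setting, `residual` would return the differences between predicted and measured phase values over the ROI, and `x` would hold the six pose parameters; the step size must be small enough for the iteration to converge.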
IV-D The Jacobian Matrix
IV-E Loop Closure Detection
The proposed PhaseSLAM utilizes the Compressive Sensing (CS) technique to reduce computational complexity and data storage space for loop closure detection. The compressibility of an image is determined by its sparsity: a sparser image loses less information after compression and contains less high-frequency information [3]. Haar wavelet bases and the norm are used to illustrate the degree of sparsity of phase images. As shown in Fig. 7, a phase image and a photo image are first projected onto the wavelet bases. Then the norms of the wavelet coefficients of the two types of images within one SLAM loop are compared in Fig. 7 (e). It can be seen that the norms of the wavelet coefficients of phase images are much smaller than those of photo images, indicating that phase images are considerably sparser than photo images.
According to the CS theory [3], two signals and are distinguishable after compression if the matrix satisfies
(15) 
where is a constant, and is a Gaussian matrix. The compressed signals , have a much smaller size than the original signals. For a 2D phase image , we first reshape it into a 1D vector . The reshaped phase data can be recovered from the compressed signal , and the error between two compressed phase vectors is given by
(16) 
When is smaller than a threshold, a loop closure is detected.
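The compressive loop closure test can be sketched as follows. This is a minimal illustration that assumes a Gaussian random sensing matrix scaled by 1/sqrt(m) and an L2 distance between compressed vectors; the matrix size, the threshold and the names `make_sensing_matrix`, `compress` and `is_loop_closure` are hypothetical choices for this sketch.

```python
import numpy as np

def make_sensing_matrix(m, n, seed=0):
    """Gaussian random matrix mapping n-dimensional signals to m measurements."""
    rng = np.random.default_rng(seed)
    return rng.standard_normal((m, n)) / np.sqrt(m)

def compress(phase_image, sensing):
    """Flatten a 2D phase image and project it to the compressed domain."""
    return sensing @ np.asarray(phase_image, dtype=float).ravel()

def is_loop_closure(y_a, y_b, threshold):
    """Declare a loop closure when two compressed signals are close enough."""
    return float(np.linalg.norm(y_a - y_b)) < threshold
```

Only the compressed vectors need to be stored per keyframe, and the same sensing matrix must be reused for every frame so that distances remain comparable.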
IV-F The Pipeline of PhaseSLAM
After successful loop closure detection, the pose graph optimization technique [8] is used to eliminate the cumulative error and refine poses. The pose sequences in our system usually have a large interval during the scanning process, so every estimated pose is a vertex in the pose graph optimizer.
V Experiment Results and Discussions
The proposed PhaseSLAM system was evaluated with both the Unreal Engine 4 (UE4) simulator and real-world experiments. All experiments were implemented on a PC with an Intel Core i7-9800K CPU @ 3.6 GHz.
V-A Simulation Experiments
The simulation dataset was collected with the AirSim plugin in UE4. Different 3D models were used as targets, and the virtual SLI device moved around the target along a radius of 120 cm with a rotation interval of 20 degrees. The simulated dataset is based on three models, namely David, Elephant and Dancing girl, and contains calibration parameters, phase images and ground-truth poses. The baseline methods include four state-of-the-art local methods, namely Point-to-Point ICP [2], Point-to-Plane ICP [11], SymICP [16] and FPFH [17], and two state-of-the-art global methods, namely FGR [25] and BCPD++ [10]. Local methods were run using the Point Cloud Library (PCL) implementation [18]. Global methods were based on open-source code. The number of iterations for Point-to-Point ICP, Point-to-Plane ICP and SymICP was set to 30; for FPFH it was 10000. The implementations of FGR and BCPD++ used the recommended parameters.
The compression of phase images is shown in Fig. 8. In simulation experiments, the resolution of phase images is (Fig. 8 (a)), the size of Gaussian compressive random matrix is chosen as (Fig. 8 (b)), that is, the compression ratio is 3070:1 and the size of the compressed phase signal is . Fig. 8 (c) shows the compressed signals corresponding to three phase images like Fig. 8 (a). It can be seen that the three sets of signals are distinguishable in terms of the peaks and valleys for loop closure detection. Furthermore, experiment results show that the time consumption of the backend optimization using CS technique can be reduced by than using original phase images.
Fig. 9 (a)-(c) show the 3D reconstruction results (top) and ground truth (bottom) for the three simulation targets (David, Elephant and Dancing girl) using the proposed PhaseSLAM with loop closure. Quantitative reconstruction errors are shown in Fig. 15. Fig. 10 shows the relative pose errors (RPE) [20] in rotation (top) and translation (bottom) for five methods on the David dataset. It can be seen that the proposed PhaseSLAM method with loop closure (PhaseSLoop) outperforms the other four methods. The median RPE of PhaseSLAM is 0.81 degree in rotation and 0.94 cm in translation. Table I shows the root mean squared error (RMSE) of absolute trajectory error (ATE) [20] and the computation time for the three datasets. The average RMSE of our approach is 1.06 cm, which is better than the other methods. PhaseSLAM with loop closure outperforms PhaseS by . In simulations, the average number of 3D points corresponding to the image is around 50000. BCPD++ has the highest computation speed among the six existing methods, yet our approach is still almost two times faster than BCPD++. The average running time of the backend optimization is 40.7 ms.
TABLE I: RMSE of ATE (cm) / computation time on the simulation datasets
| Method | David | Elephant | Dancing Girl |
|---|---|---|---|
| PhaseSLoop [ours] | 1.39 / 1.52 | 0.72 / 1.58 | 1.07 / 0.82 |
| PhaseS [ours] | 2.32 / 1.49 | 2.40 / 1.23 | 2.05 / 0.76 |
| BCPD++ [10] | 25.53 / 2.87 | 87.09 / 3.46 | 15.57 / 2.68 |
| SymICP [16] | 6.35 / 109.10 | 7.06 / 146.71 | 8.17 / 104.22 |
| Point-to-Plane [11] | 6.76 / 55.15 | 13.07 / 65.73 | 10.57 / 45.89 |
| Point-to-Point [2] | 17.17 / 32.56 | 19.35 / 44.68 | 40.26 / 30.75 |
| FGR [25] | 11.12 / 25.56 | 15.67 / 34.68 | 38.74 / 26.90 |
| FPFH [17] | 8.99 / 70.59 | 25.10 / 79.09 | 32.70 / 59.68 |

RMSE of ATE: The root mean squared error of absolute trajectory error.
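For reference, the RMSE of ATE can be sketched in Python as below. This is a simplified illustration that assumes the estimated and ground-truth trajectories are already time-associated and expressed in the same frame; the benchmark in [20] additionally aligns the two trajectories before computing the error, which is omitted here. The function name `ate_rmse` is hypothetical.

```python
import numpy as np

def ate_rmse(est_positions, gt_positions):
    """RMSE of the translational absolute trajectory error.

    Both inputs are (N, 3) arrays of sensor positions, assumed to be
    associated pose-by-pose and already aligned to a common frame.
    """
    est = np.asarray(est_positions, dtype=float)
    gt = np.asarray(gt_positions, dtype=float)
    err = np.linalg.norm(est - gt, axis=1)  # per-pose position error
    return float(np.sqrt(np.mean(err ** 2)))
```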
V-B Real-World Experiments
Fig. 11 shows the experiment setup, where the SLI sensor, consisting of a projector (DLP3000 DMD from TI) and an industrial camera (1280×1024 resolution, from HIKVISION), is mounted on a UR5 robotic arm. Fig. 9 (d)-(f) show two plaster statues (David, Sona) and a plush toy (PiKaChu) used to build the real-world datasets, namely David6DoF, David3DoF, Sona3DoF and PiKaChu3DoF, where 6DoF and 3DoF stand for six and three degrees of freedom motions, respectively. The David6DoF dataset includes 31 random poses; each 3DoF dataset has 37 poses at equal intervals of 10 degrees and a radius of 60 cm.
Fig. 12 shows the RPE results of five different methods on the David6DoF dataset. The proposed method (PhaseSLoop) performs better than the other four methods. Table II lists the RMSE of ATE and the computation time for the four datasets using eight different methods. It can be seen that the proposed method outperforms the other six methods in terms of both accuracy and computation time. The 3D object reconstruction results using the proposed method on the real-world datasets are shown in Fig. 9 (d)-(f).
TABLE II: RMSE of ATE / computation time on the real-world datasets
| Method | David6DoF | David3DoF | PiKaChu | Sona |
|---|---|---|---|---|
| PhaseSLoop | 4.69 / 4.20 | 4.71 / 3.19 | 2.09 / 3.30 | 1.83 / 3.18 |
| PhaseS | 6.12 / 4.17 | 5.74 / 3.17 | 3.27 / 3.27 | 2.29 / 3.15 |
| BCPD++ | 244.41 / 2.79 | 53.01 / 2.93 | 21.72 / 3.72 | 22.39 / 3.81 |
| SymICP | 99.28 / 374.26 | 28.78 / 345.25 | 35.68 / 268.69 | 30.88 / 242.12 |
| Point-to-Plane | 101.5 / 168.53 | 33.66 / 152.13 | 33.97 / 119.23 | 36.65 / 109.15 |
| Point-to-Point | 170.2 / 118.42 | 89.28 / 101.54 | 70.36 / 81.34 | 84.44 / 77.9 |
| FGR | 282.3 / 238.25 | 224.21 / 213.15 | 231.17 / 302.45 | 149.92 / 191.7 |
| FPFH | 109.85 / 386.8 | 95.51 / 153.66 | 254.34 / 202.14 | 90.81 / 217.9 |

RMSE of ATE: The root mean squared error of absolute trajectory error.
Fig. 13 illustrates the estimated SLI sensor trajectory and the ground truth for the David6DoF dataset, where the total trajectory length is 3.967 m. Fig. 14 shows the pose estimation errors of the local methods (SymICP, Point-to-Plane and Point-to-Point ICP) and of our method without global optimization under different initial values. We can see that the proposed method is the least sensitive to initial values. Fig. 15 shows a radar chart of seven methods without global optimization for all datasets using five performance metrics (Hausdorff distance, computation time, translation/rotation errors, and storage space) for a comprehensive evaluation. The Hausdorff distance is used to describe the dissimilarity between reconstructed point clouds and the ground truth [21]. The proposed method shows superior performance across all of these metrics.
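The Hausdorff distance between two point clouds can be sketched with a brute-force computation as below. This is an illustrative sketch only: it builds the full pairwise distance matrix, so it is practical for small clouds, and it is not the efficient algorithm of [21]. The function name `hausdorff_distance` is hypothetical.

```python
import numpy as np

def hausdorff_distance(a, b):
    """Symmetric Hausdorff distance between point sets a (N, 3) and b (M, 3)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    # Pairwise Euclidean distances, shape (N, M).
    d = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=2)
    # Largest nearest-neighbour distance in each direction.
    return float(max(d.min(axis=1).max(), d.min(axis=0).max()))
```

A single outlying point in either cloud dominates this metric, which makes it a strict measure of reconstruction dissimilarity.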
VI Conclusion
This paper presents a phase-based Simultaneous Localization and Mapping (PhaseSLAM) pipeline for fast and accurate SLI sensor pose estimation and 3D object reconstruction. The proposed reprojection model and local pose optimizer achieve the odometry functionality with high efficiency, high accuracy and low sensitivity to initial pose knowledge. The proposed compressive loop closure detection technique reduces both the loop closure computation time and the data storage space. Even without global optimization, the proposed local data registration method outperforms six other existing 3D point cloud based methods in terms of sensor pose estimation accuracy, storage space, computation time and 3D reconstruction errors. The code of our framework and the dataset in use are available online.
Appendix
The analytic expression of the Jacobian of with respect to is provided in this section. The intermediate terms are given by
(17) 
The analytic expression of Jacobian is then given by
(18) 
References
[1] (2001) Multiple view geometry in computer vision. Kybernetes.
[2] (1992) Method for registration of 3d shapes. In Sensor fusion IV: control paradigms and data structures, Vol. 1611, pp. 586–606.
[3] (2008) The restricted isometry property and its implications for compressed sensing. Comptes rendus mathematique 346 (9-10), pp. 589–592.
[4] (2013) 3d mapping with an rgb-d camera. IEEE transactions on robotics 30 (1), pp. 177–187.
[5] (2017) Direct sparse odometry. IEEE transactions on pattern analysis and machine intelligence 40 (3), pp. 611–625.
[6] (2014) LSD-SLAM: large-scale direct monocular SLAM. In Computer Vision – ECCV 2014, D. Fleet, T. Pajdla, B. Schiele, and T. Tuytelaars (Eds.), Cham, pp. 834–849.
[7] (2012) Bags of binary words for fast place recognition in image sequences. IEEE Transactions on Robotics 28 (5), pp. 1188–1197.
[8] (2010) A tutorial on graph-based slam. IEEE Intelligent Transportation Systems Magazine 2 (4), pp. 31–43.
[9] (2003) An efficient fastslam algorithm for generating maps of large-scale cyclic environments from raw laser range measurements. In Proceedings 2003 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2003) (Cat. No. 03CH37453), Vol. 1, pp. 206–211.
[10] (2020) Acceleration of non-rigid point set registration with downsampling and gaussian process regression. IEEE Transactions on Pattern Analysis and Machine Intelligence.
[11] (2004) Linear least-squares optimization for point-to-plane icp surface registration. Chapel Hill, University of North Carolina 4 (10), pp. 1–3.
[12] (2014) Super 4pcs fast global pointcloud registration via smart indexing. In Computer graphics forum, Vol. 33, pp. 205–215.
[13] (2015) ORB-SLAM: a versatile and accurate monocular slam system. IEEE transactions on robotics 31 (5), pp. 1147–1163.
[14] (2006) Numerical optimization. Springer Science & Business Media.
[15] (2018) VINS-Mono: a robust and versatile monocular visual-inertial state estimator. IEEE Transactions on Robotics 34 (4), pp. 1004–1020.
[16] (2019) A symmetric objective function for icp. ACM Transactions on Graphics (TOG) 38 (4), pp. 1–7.
[17] (2009) Fast point feature histograms (fpfh) for 3d registration. In 2009 IEEE international conference on robotics and automation, pp. 3212–3217.
[18] (2011) 3d is here: point cloud library (pcl). In 2011 IEEE international conference on robotics and automation, pp. 1–4.
[19] (2004) Pattern codification strategies in structured light systems. Pattern recognition 37 (4), pp. 827–849.
[20] (2012) A benchmark for the evaluation of rgb-d slam systems. In 2012 IEEE/RSJ international conference on intelligent robots and systems, pp. 573–580.
[21] (2015) An efficient algorithm for calculating the exact hausdorff distance. IEEE transactions on pattern analysis and machine intelligence 37 (11), pp. 2153–2163.
[22] (2012) Robust active stereo vision using kullback-leibler divergence. IEEE transactions on pattern analysis and machine intelligence 34 (3), pp. 548–563.
[23] (2015) Go-ICP: a globally optimal solution to 3d icp point-set registration. IEEE transactions on pattern analysis and machine intelligence 38 (11), pp. 2241–2254.
[24] Phaseslam: mobile structured light illumination for full body 3d scanning. In 2021 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 1617–1624.
[25] (2016) Fast global registration. In European conference on computer vision, pp. 766–782.
[26] (2018) Semi-dense 3d reconstruction with a stereo event camera. In Proceedings of the European Conference on Computer Vision (ECCV), pp. 235–251.