1 Introduction
Deep learning has emerged as a powerful tool for 3D point cloud registration – where one wants to bring source and template point clouds into geometric alignment. Many “black box” strategies [40, 18, 32]
have been proposed that attempt to model the entire 3D point registration process as a neural network. Although exhibiting impressive results over a variety of benchmarks, such strategies tend to exhibit poor generalization performance if one needs to align point clouds that have not been seen during training (different object categories, different sensors, etc.). Iterative Closest Point (ICP)
[3] and its variants [42, 6, 15, 31, 39] still tend to fare much better in this regard, and as a result, are still the method of choice for many problems in robotics and vision. Although these classical methods tend to exhibit superior generalization performance, they have their drawbacks. In particular, when the point clouds lack distinct local geometric features, it becomes difficult to find effective correspondences – limiting the utility of the approach for many important problems.
Recently, Aoki et al. proposed a promising approach – PointNetLK [1] – to generalized 3D point registration that leverages insights from the classical Lucas & Kanade (LK) [19]
image alignment algorithm. Instead of using a neural network for modelling the entire registration pipeline, the approach learns only a point cloud embedding (PointNet). The actual registration process therein is deterministic and can be viewed as a modification of the classical LK algorithm. Another strength of the approach is that the embedding can be learned in a supervised end-to-end manner by unrolling the LK algorithm as a recurrent neural network. Unlike other comparable neural network strategies, PointNetLK exhibits remarkable generalization performance across a number of scenarios. Further, unlike ICP and its variants, the approach does not rely on the cumbersome step of establishing correspondences between point clouds.
A drawback to the approach, however, is the numerical manner in which gradients are estimated within the LK pipeline, which can often lead to poor registration performance. It is well understood that numerical gradients are intrinsically noisy, sensitive to step-size choice, and in their limit inherently ill-conditioned [13]. In this paper, we advocate for a completely deterministic derivation of PointNetLK which circumvents many of its current limitations in terms of robustness and fidelity. Further, the approach allows for the derivation of an analytical Jacobian matrix that can be decomposed into “feature” and “warp” components. An advantage of this decomposition is that it allows for application-specific modifications to the “warp Jacobian” without the need for retraining the entire pipeline (something previously impossible with conventional PointNetLK). Our approach also circumvents some inherent memory and efficiency issues that arise when employing a deterministic gradient within PointNetLK. Specifically, we propose a novel point sampling strategy using seminal insights on critical points from PointNet [28] that allows for efficiency and good registration performance. We demonstrate impressive empirical performance across a number of benchmarks, outperforming current state-of-the-art methods such as Deep Closest Point (DCP) [34] and PRNet [35].
2 Related Work
Full learning model. Deep learning and its successful applications in 3D vision have motivated researchers to tackle challenging 3D alignment problems. One approach that has been explored is the use of full learning models, in which deep neural networks model the entire registration pipeline. While some authors feed RGB-D data to neural networks to estimate alignment transformations [14, 26], others [9, 8, 10] extract local features from point cloud patches. Recent works [40, 18] have focused on large-scale registration, using the entire point cloud to extract correspondences. Despite this great progress, full learning models still lack generalizability to unseen data.
Hybrid learning model. Unlike traditional methods [17, 29, 30] that rely on handcrafted features to perform registration, hybrid learning models replace them with deep features. Elbaz et al. [11] proposed to extract deep features by projecting the 3D point cloud to 2D, and then apply RANSAC [12] and ICP for registration. Wang and Solomon proposed the Deep Closest Point (DCP) [34] method, which leverages DGCNN [36] for feature extraction and then solves for the transformation matrix using a differentiable SVD module. The same authors later proposed PRNet
[35] to extend DCP to handle partial registration. Recently, Yew and Lee proposed RPM-Net [41], which combines the PointNet model with a robust point matching technique to estimate the rigid transformation; however, this method requires additional face normals to extract point features. Although hybrid learning models generalize well to various 3D shapes, these methods still need adequate keypoints for correspondence search. Aoki et al. proposed PointNetLK [1], which uses PointNet to extract deep features and the LK algorithm for point matching. Huang et al. [16] further improved PointNetLK with a point distance loss. However, PointNetLK and its variant rely on numerical gradients, which are highly sensitive to the step-size choice and can result in poor and unstable performance.
Global registration. ICP-like registration methods are highly sensitive to initialization, which may produce unreliable results. Some methods seek a globally optimal solution by using branch-and-bound optimization [39], RANSAC-based expansion [24], correspondence-dependent search [43, 27], or probability-based registration [22]. Other approaches employ convex relaxation techniques to optimize the registration [23, 5]. However, these methods demand large amounts of computing resources. Yang et al. [38] recently proposed an outlier-robust registration method that improves efficiency. Beyond traditional methods, Choy et al. [7] applied deep learning-based features and a weighted Procrustes analysis to perform global optimization. A fundamental issue with global registration methods is that they rely on dense correspondences, which can produce low-fidelity results for 3D shapes that lack geometric features.
Lucas & Kanade algorithm. The image alignment framework proposed by Lucas and Kanade [19] and its derivatives [4, 2, 20, 25] seek to minimize the alignment error between two images, using either extracted distinct features or all the pixels in an image (photometric error). Lv et al. [21] used a neural network to extract pyramid features for image tracking. Wang et al. [33] proposed a regression-based object tracking framework that integrates the LK algorithm into an end-to-end deep learning paradigm. In PointNetLK [1], the authors extended the end-to-end LK tracking paradigm to 3D point clouds.
3 Background
Problem statement. Let $\mathbf{P}_T \in \mathbb{R}^{M \times 3}$ and $\mathbf{P}_S \in \mathbb{R}^{N \times 3}$ be the template and source point clouds respectively, where $M$ and $N$ are the number of points. The rigid transformation $\mathbf{G} \in SE(3)$ that aligns the observed source to the template can be defined as $\mathbf{G} = \exp\left(\sum_i \xi_i \mathbf{T}_i\right)$, where $\boldsymbol{\xi} = (\xi_1, \dots, \xi_6)^\top$ are the exponential map twist parameters and $\mathbf{T}_i$ the generator matrices. The PointNet embedding function $\phi : \mathbb{R}^{N \times 3} \rightarrow \mathbb{R}^{K}$ can be employed to encode a 3D point cloud into a $K$-dimensional feature descriptor. Thus the point cloud registration problem can be formulated as
$$\phi(\mathbf{P}_T) = \phi(\mathbf{G} \cdot \mathbf{P}_S) \quad (1)$$
where the symbol $\cdot$ denotes applying the rigid transformation to each point. For computational efficiency, the inverse compositional Lucas & Kanade (IC-LK) formulation can be employed, and it is defined as
$$\phi(\mathbf{P}_S) = \phi(\mathbf{G}^{-1} \cdot \mathbf{P}_T) \quad (2)$$
where $\mathbf{G}^{-1} = \exp\left(-\sum_i \xi_i \mathbf{T}_i\right)$.
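As a concrete sanity check of the inverse compositional identity in Eq. 2, the following NumPy sketch (a toy illustration, not the paper's implementation; the cloud size and twist magnitudes are arbitrary assumptions) builds a rigid transform from a twist via the exponential map and verifies that applying $\mathbf{G}^{-1}$ to the template recovers the source point-for-point:

```python
import numpy as np
from scipy.linalg import expm

rng = np.random.default_rng(0)

# Build a rigid transform G = expm(sum_i xi_i T_i) from a small random twist.
w = 0.3 * rng.normal(size=3)                    # rotational twist components
v = 0.1 * rng.normal(size=3)                    # translational twist components
twist = np.zeros((4, 4))
twist[:3, :3] = np.array([[0, -w[2], w[1]],
                          [w[2], 0, -w[0]],
                          [-w[1], w[0], 0]])    # skew-symmetric rotation part
twist[:3, 3] = v
G = expm(twist)                                 # 4x4 transform in SE(3)

# Template is the transformed source: P_T = G . P_S (Eq. 1 with a perfect phi).
P_S = rng.normal(size=(100, 3))
to_h = lambda P: np.hstack([P, np.ones((len(P), 1))])   # homogeneous coordinates
P_T = (G @ to_h(P_S).T).T[:, :3]

# Inverse-compositional identity: G^{-1} . P_T recovers P_S exactly, hence
# phi(P_S) = phi(G^{-1} . P_T) for any embedding phi.
P_rec = (np.linalg.inv(G) @ to_h(P_T).T).T[:, :3]
ic_err = float(np.abs(P_rec - P_S).max())
```

Because the exponential of an se(3) element is an exact rigid transform, the recovery error is at the level of floating-point round-off.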
PointNetLK. We review the canonical PointNetLK approach [1]. Instead of solving directly for a global solution $\mathbf{G}$, they iteratively solve for an incremental change $\Delta\mathbf{G}$ as
$$\phi(\mathbf{G}_k \cdot \mathbf{P}_S) = \phi(\Delta\mathbf{G}^{-1} \cdot \mathbf{P}_T) \quad (3)$$
where $\Delta\mathbf{G}^{-1}$ denotes the inverse composition, and the initial guess is set to $\mathbf{G}_0 = \mathbf{I}_{4 \times 4}$. To solve this, we linearize Eq. 3 as
$$\phi(\mathbf{G}_k \cdot \mathbf{P}_S) \approx \phi(\mathbf{P}_T) + \mathbf{J}\,\Delta\boldsymbol{\xi} \quad (4)$$
where $\mathbf{J}$ is the Jacobian matrix defined as
$$\mathbf{J} = \left.\frac{\partial \phi(\mathbf{G}^{-1} \cdot \mathbf{P}_T)}{\partial \boldsymbol{\xi}^\top}\right|_{\boldsymbol{\xi} = \mathbf{0}} \quad (5)$$
The twist parameters $\Delta\boldsymbol{\xi}$ can be solved for as
$$\Delta\boldsymbol{\xi} = \mathbf{J}^{\dagger}\,\mathbf{r} \quad (6)$$
where the symbol $\dagger$ denotes the Moore–Penrose pseudoinverse, and $\mathbf{r} = \phi(\mathbf{G}_k \cdot \mathbf{P}_S) - \phi(\mathbf{P}_T)$ is the feature residual. Finally, the twist parameters are updated iteratively as
$$\mathbf{G}_{k+1} = \Delta\mathbf{G}_k \cdot \mathbf{G}_k \quad (7)$$
where
$$\Delta\mathbf{G}_k = \exp\left(\sum_i \Delta\xi_i\,\mathbf{T}_i\right) \quad (8)$$
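The update loop of Eqs. 6–8 can be sketched in NumPy as follows. The generator ordering, feature dimension, and random stand-in Jacobian/residual are illustrative assumptions, not the paper's trained model; the point is that one step solves a linear least-squares problem and maps the result back onto $SE(3)$:

```python
import numpy as np
from scipy.linalg import expm

def generators():
    """The six se(3) generators T_i (our ordering: rotations, then translations)."""
    Ts = []
    for i, j in [(2, 1), (0, 2), (1, 0)]:       # rotations about x, y, z
        T = np.zeros((4, 4))
        T[i, j], T[j, i] = 1.0, -1.0
        Ts.append(T)
    for i in range(3):                           # translations along x, y, z
        T = np.zeros((4, 4))
        T[i, 3] = 1.0
        Ts.append(T)
    return Ts

rng = np.random.default_rng(0)
K = 1024                                   # assumed feature dimension
J = rng.normal(size=(K, 6))                # stand-in Jacobian of phi w.r.t. the twist
r = 0.01 * rng.normal(size=K)              # feature residual phi(G_k P_S) - phi(P_T)

# One LK step: delta_xi = J^dagger r, then map back to SE(3) and compose.
dxi = np.linalg.pinv(J) @ r                                 # Eq. 6
dG = expm(sum(x * T for x, T in zip(dxi, generators())))    # Eq. 8
G = np.eye(4)
G = dG @ G                                 # Eq. 7: G_{k+1} = dG_k . G_k

R = G[:3, :3]
rigid_err = float(np.abs(R.T @ R - np.eye(3)).max())   # update stays rigid
```

Because $\Delta\mathbf{G}$ is produced by exponentiating an se(3) element, the rotation block of the update is orthonormal by construction, which is why the iteration never leaves the manifold of rigid transforms.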
The numerical PointNetLK approximates each column, $\mathbf{J}_i$, of the Jacobian in Eq. 5 using finite differences as
$$\mathbf{J}_i = \frac{\phi\left(\exp(-t_i \mathbf{T}_i) \cdot \mathbf{P}_T\right) - \phi(\mathbf{P}_T)}{t_i} \quad (9)$$
where $t_i$ is the step size that infinitesimally perturbs the $i$-th twist parameter of $\boldsymbol{\xi}$. Instead of learning a step size through the network, the algorithm requires a predefined step size for the approximation. However, this finite-difference approximation is inherently problematic: when the step size is infinitesimally small, numerical issues arise and the gradient approximation becomes unstable; conversely, if the step size is relatively large, the approximation is inaccurate. Furthermore, the computational complexity of the numerical PointNetLK grows with the number of points and parameters.
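The step-size dilemma described above is easy to reproduce on a scalar function. The sketch below (using sin as a stand-in for the embedding, an assumption purely for illustration) shows the forward-difference error growing both when the step is tiny (round-off) and when it is large (truncation):

```python
import numpy as np

# Forward-difference approximation of f'(x) for f = sin (true derivative: cos).
f, x = np.sin, 1.0
true_grad = np.cos(x)

def fd(t):
    """One finite-difference quotient, analogous to Eq. 9, with step size t."""
    return (f(x + t) - f(x)) / t

err_tiny = abs(fd(1e-14) - true_grad)   # round-off dominates: unstable
err_good = abs(fd(1e-6) - true_grad)    # near the sweet spot
err_large = abs(fd(1e-1) - true_grad)   # truncation dominates: inaccurate
```

The "good" step size depends on the function and on machine precision, which is exactly the tuning burden the deterministic Jacobian removes.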
4 Deterministic PointNetLK
In this section, we introduce our deterministic PointNetLK approach. Rather than approximating the Jacobian using finite differences, we compute the exact Jacobian given the input point cloud and its learned point features. We further explore possible improvements to the efficiency of our algorithm through changes in: network architecture, feature extraction, computation of the Jacobian, and point sampling. We then discuss several ways to design the loss function.
4.1 How to compute a deterministic Jacobian?
To compute a deterministic Jacobian for PointNetLK, we factorize $\mathbf{J}$ from Eq. 5 into two parts with the chain rule as
$$\mathbf{J} = \frac{\partial \phi(\mathbf{G}^{-1} \cdot \mathbf{P}_T)}{\partial (\mathbf{G}^{-1} \cdot \mathbf{P}_T)} \cdot \frac{\partial (\mathbf{G}^{-1} \cdot \mathbf{P}_T)}{\partial \boldsymbol{\xi}^\top} \quad (10)$$
For efficiency, we apply the inverse compositional Lucas & Kanade (IC-LK) algorithm. Thus, the calculation of $\mathbf{J}$ in each iteration is not necessary, and the initial transformation can be defined as an identity matrix, $\mathbf{G}^{-1} = \mathbf{I}_{4 \times 4}$. Eq. 10 then becomes
$$\mathbf{J} = \frac{\partial \phi(\mathbf{P}_T)}{\partial \mathbf{P}_T} \cdot \left.\frac{\partial (\mathbf{G}^{-1} \cdot \mathbf{P}_T)}{\partial \boldsymbol{\xi}^\top}\right|_{\boldsymbol{\xi} = \mathbf{0}} \quad (11)$$
The first part is the “feature gradient” which describes the changes in direction of the feature descriptors learned from the point cloud. We unroll the neural network to compute the “feature gradient”. The second part is the “warp Jacobian” as defined in the ICLK algorithm. It can be precomputed when we apply an identity warp to the template point cloud.
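The "warp Jacobian" admits a simple closed form at the identity warp: differentiating $\exp\left(-\sum_i \xi_i \mathbf{T}_i\right)\mathbf{p}$ at $\boldsymbol{\xi} = \mathbf{0}$ gives $-\mathbf{T}_i\,\mathbf{p}$ for column $i$. The sketch below (with our own generator ordering, an assumed convention) checks this closed form against finite differences:

```python
import numpy as np
from scipy.linalg import expm

def generators():
    """The six se(3) generators T_i (rotations then translations; our ordering)."""
    Ts = []
    for i, j in [(2, 1), (0, 2), (1, 0)]:        # rotations about x, y, z
        T = np.zeros((4, 4))
        T[i, j], T[j, i] = 1.0, -1.0
        Ts.append(T)
    for i in range(3):                            # translations along x, y, z
        T = np.zeros((4, 4))
        T[i, 3] = 1.0
        Ts.append(T)
    return Ts

p = np.array([0.3, -0.5, 0.8, 1.0])               # one template point, homogeneous

# Analytic warp Jacobian at the identity: column i is -(T_i p)[:3].
Jw = np.stack([-(T @ p)[:3] for T in generators()], axis=1)   # 3 x 6

# Finite-difference check of d(exp(-t T_i) p)/dt at t = 0.
t = 1e-6
Jw_fd = np.stack([((expm(-t * T) @ p)[:3] - p[:3]) / t
                  for T in generators()], axis=1)
warp_err = float(np.abs(Jw - Jw_fd).max())
```

Since this per-point warp Jacobian depends only on the point coordinates and the generator matrices, it can indeed be precomputed once for the template.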
Let the template point cloud be $\mathbf{P}_T = \{\mathbf{p}_j\}_{j=1}^{M}$. By forward passing through the simplified PointNet [28] model (3 layers and without the T-Net), a per-point (before the pooling operation) feature is extracted as
$$\phi_l = \mathrm{ReLU}\big(\mathrm{BN}(\mathbf{A}_l\,\phi_{l-1} + \mathbf{b}_l)\big), \qquad \phi_0 = \mathbf{p}_j \quad (12)$$
where $\mathbf{A}_l$ is a matrix transformation, $\mathbf{b}_l$ represents the bias term, $\mathrm{BN}$ stands for the batch normalization layer, $\mathrm{ReLU}$ denotes the element-wise rectified linear unit function, and $l$ indexes the $l$-th layer. Thus our per-point embedding feature can be simplified as $\phi(\mathbf{p}_j) = \phi_L \circ \cdots \circ \phi_1(\mathbf{p}_j)$. We solve for the partial derivative of $\phi$ with respect to the input points as
$$\frac{\partial \phi(\mathbf{p}_j)}{\partial \mathbf{p}_j} = \frac{\partial \phi_L}{\partial \phi_{L-1}} \cdots \frac{\partial \phi_2}{\partial \phi_1} \cdot \frac{\partial \phi_1}{\partial \mathbf{p}_j} \quad (13)$$
where $l \in \{1, \dots, L\}$ and the number of layers $L = 3$.
Since PointNet extracts a global feature vector, we apply the max pooling operation, $\max_j$, to obtain the final Jacobian as
$$\mathbf{J} = \frac{\partial \max_j \phi(\mathbf{p}_j)}{\partial \mathbf{p}_j} \cdot \left.\frac{\partial (\mathbf{G}^{-1} \cdot \mathbf{p}_j)}{\partial \boldsymbol{\xi}^\top}\right|_{\boldsymbol{\xi} = \mathbf{0}} \quad (14)$$
Given the closed-form Jacobian in Eq. 14, the whole point cloud registration pipeline can be deterministic. Following Eq. 6, 7, 8, we get the update $\mathbf{G}_{k+1} = \Delta\mathbf{G}_k \cdot \mathbf{G}_k$, where $\Delta\mathbf{G}_k = \exp\left(\sum_i \Delta\xi_i\,\mathbf{T}_i\right)$ and $\Delta\boldsymbol{\xi} = \mathbf{J}^{\dagger}\mathbf{r}$. Note that our Jacobian formulation does not rely on finite differences: it is deterministic and does not require any step size to approximate the gradients. Thus, our method circumvents the numerical problems that afflict the canonical PointNetLK.
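A minimal NumPy sketch of the analytic "feature gradient" for a toy shared MLP (2 layers, batch normalization omitted, random weights — all illustrative assumptions, not the trained PointNet). Because the network is piecewise linear, ReLU contributes a 0/1 mask to the chain rule, and max pooling routes each feature channel's gradient to its critical point:

```python
import numpy as np

rng = np.random.default_rng(0)
P = rng.normal(size=(50, 3))                      # toy template points
A1, b1 = rng.normal(size=(16, 3)), rng.normal(size=16)
A2, b2 = rng.normal(size=(32, 16)), rng.normal(size=32)
relu = lambda x: np.maximum(x, 0.0)

# Per-point features through a tiny 2-layer shared MLP (batch norm omitted:
# at test time it is a fixed affine map, which only rescales the Jacobian).
h1 = relu(P @ A1.T + b1)                          # N x 16
h2 = relu(h1 @ A2.T + b2)                         # N x 32
feat = h2.max(axis=0)                             # max-pooled global feature
crit = h2.argmax(axis=0)                          # critical point per channel

def point_jacobian(n):
    """Analytic d h2[n] / d P[n] via the chain rule (Eq. 13 style)."""
    m1 = (P[n] @ A1.T + b1 > 0).astype(float)     # layer-1 ReLU mask
    m2 = (h1[n] @ A2.T + b2 > 0).astype(float)    # layer-2 ReLU mask
    return (m2[:, None] * A2) @ (m1[:, None] * A1)  # 32 x 3

# Max pooling routes each channel's gradient to that channel's critical point.
J_feat = np.zeros((32, 50, 3))
for k in range(32):
    J_feat[k, crit[k]] = point_jacobian(crit[k])[k]

# Finite-difference check on channel 0 at its critical point.
k, n, eps = 0, crit[0], 1e-6
fd = np.zeros(3)
for d in range(3):
    Pp = P.copy()
    Pp[n, d] += eps
    f = relu(relu(Pp @ A1.T + b1) @ A2.T + b2).max(axis=0)
    fd[d] = (f[k] - feat[k]) / eps
jac_err = float(np.abs(fd - J_feat[k, n]).max())
```

Composing `J_feat` with the per-point warp Jacobian (Eq. 14) yields the full analytic $\mathbf{J}$; the finite-difference check here is only a verification, not part of the method.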
4.2 Network design strategies
We noticed that the computational complexity of our deterministic PointNetLK grows with the number of points, which makes the naive implementation problematic for training. Here we propose several design strategies to make our method computationally efficient.
Feature aggregation. The input point cloud has $N$ points, which we randomly split into $S$ segments. Each segment has $N/S$ points and is fed to the network to get a feature vector $\phi_s$. Then, we use max pooling to aggregate all $\phi_s$ into a global feature vector $\phi$. Note that the aggregation strategy does not increase the time complexity.
Random feature. This strategy is similar to the previous one, except that we do not aggregate features from each segment. We treat each segment of the point cloud as a small mini-batch. Without feature aggregation, we consider each mini-batch as individual data, which enables the network to learn better representations. Moreover, the network converges faster to a solution.
Random points for the Jacobian computation. Instead of using the entire point cloud to estimate the deterministic Jacobian $\mathbf{J}$, we randomly sample 10% of the points to compute it. The dimension of each matrix in the Jacobian computation shrinks substantially. In principle, our deterministic PointNetLK can therefore process point clouds with a large number of points.
Compute Jacobian with aggregated points. When uniform point sampling through 3D space is not representative, we can instead aggregate the Jacobian of each point cloud segment to capture more important features in the point cloud. Note that the math for the Jacobian computation still holds because we employ the max pooling operation.
Critical points for feature encoding. PointNet [28] proposed the use of critical points, which are the points that contribute the most to the global feature vector. We therefore also use critical points to evaluate our method. Moreover, we find that using critical points improves the efficiency of the model without loss of generalizability or accuracy.
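The critical-point property that motivates this strategy is easy to verify: the argmax points of the max pooling reproduce the global feature exactly, so the Jacobian only needs to be evaluated on this much smaller subset. A toy sketch (single layer, random weights, assumed sizes):

```python
import numpy as np

rng = np.random.default_rng(1)
P = rng.normal(size=(200, 3))                  # toy point cloud
A, b = rng.normal(size=(64, 3)), rng.normal(size=64)

h = np.maximum(P @ A.T + b, 0.0)               # per-point features, 200 x 64
feat = h.max(axis=0)                           # max-pooled global feature
crit_idx = np.unique(h.argmax(axis=0))         # critical-point subset

# Restricting the cloud to its critical points leaves the global feature
# unchanged: every channel's maximizer is still present in the subset.
h_crit = np.maximum(P[crit_idx] @ A.T + b, 0.0)
feat_crit = h_crit.max(axis=0)
n_crit = len(crit_idx)
```

The critical set has at most one point per feature channel, so its size is bounded by the feature dimension regardless of how dense the input cloud is.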
4.3 Loss function
We employ different combinations of loss functions in our point cloud registration pipeline. The first loss is the error between the estimated transformation $\mathbf{G}_{est}$ and the ground-truth transformation $\mathbf{G}_{gt}$. The second loss measures the difference between the template feature vector $\phi(\mathbf{P}_T)$ and the source feature vector $\phi(\mathbf{G} \cdot \mathbf{P}_S)$. We also explore the use of a point-based distance loss.
Transformation error loss. We want to minimize the mean squared error (MSE) between the estimated and the ground-truth transformations. For efficiency, we formulate the transformation loss as
$$\mathcal{L}_{g} = \left\| \mathbf{G}_{est}^{-1}\,\mathbf{G}_{gt} - \mathbf{I}_{4} \right\|_F^2 \quad (15)$$
where $\mathbf{I}_4$ is an identity matrix and $\|\cdot\|_F$ is the Frobenius norm. This formulation is computationally efficient because it does not require matrix logarithm operations.
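A sketch of this loss on toy transforms (the particular transform parameters, and the form $\|\mathbf{G}_{est}^{-1}\mathbf{G}_{gt} - \mathbf{I}_4\|_F^2$, are our reading of the text, stated as an assumption):

```python
import numpy as np
from scipy.linalg import expm

def rigid(angle, tz):
    """A toy SE(3) transform: rotation by `angle` about z, translation tz along z."""
    T = np.zeros((4, 4))
    T[0, 1], T[1, 0] = -angle, angle   # skew-symmetric rotation generator
    T[2, 3] = tz                       # translation generator
    return expm(T)

G_gt = rigid(0.30, 0.10)    # ground-truth transform
G_est = rigid(0.28, 0.12)   # a slightly-off estimate

# Transformation error loss: || G_est^{-1} G_gt - I_4 ||_F^2.
# It vanishes iff G_est = G_gt, with no matrix logarithm required.
L_g = np.linalg.norm(np.linalg.inv(G_est) @ G_gt - np.eye(4), ord="fro") ** 2
L_exact = np.linalg.norm(np.linalg.inv(G_gt) @ G_gt - np.eye(4), ord="fro") ** 2
```

The loss is zero exactly when the residual transform $\mathbf{G}_{est}^{-1}\mathbf{G}_{gt}$ collapses to the identity, and grows smoothly with both rotational and translational error.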
Feature difference loss. To capture different feature signals for the transformed point clouds, we include a feature difference as another loss function during training. We want to minimize the error between the template point feature $\phi(\mathbf{P}_T)$ and the source point feature $\phi(\mathbf{G}_{est} \cdot \mathbf{P}_S)$. Given that the encoded point feature is deterministic, if the point clouds are aligned, the feature difference should reduce to zero. The feature loss is defined as
$$\mathcal{L}_{f} = \left\| \phi(\mathbf{P}_T) - \phi(\mathbf{G}_{est} \cdot \mathbf{P}_S) \right\|_2^2 \quad (16)$$
Point distance loss. In [16], the authors proposed using the Chamfer distance as a loss function for numerical PointNetLK training. Rather than directly calculating the distance between the template point cloud and the source point cloud, they used a decoder to first recover the point cloud from the feature vector, and then calculated the point distance on the reconstructed point clouds. The point distance loss defined in Eq. 17 implicitly combines the rotation and the translation through a 3D shape representation:
$$\mathcal{L}_{c} = \frac{1}{|\tilde{\mathbf{P}}_T|} \sum_{\mathbf{x} \in \tilde{\mathbf{P}}_T} \min_{\mathbf{y} \in \tilde{\mathbf{P}}_S} \|\mathbf{x} - \mathbf{y}\|_2^2 + \frac{1}{|\tilde{\mathbf{P}}_S|} \sum_{\mathbf{y} \in \tilde{\mathbf{P}}_S} \min_{\mathbf{x} \in \tilde{\mathbf{P}}_T} \|\mathbf{x} - \mathbf{y}\|_2^2 \quad (17)$$
where $\tilde{\mathbf{P}}_T$ and $\tilde{\mathbf{P}}_S$ are the reconstructed point clouds.
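A direct, decoder-free NumPy sketch of the symmetric Chamfer distance underlying Eq. 17, on toy clouds (in [16] the distance is taken on decoder reconstructions, which we skip here for brevity):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(128, 3))              # "template" cloud
Y = X + 0.01 * rng.normal(size=(128, 3))   # a slightly noisy reconstruction

def chamfer(Pa, Pb):
    """Symmetric Chamfer distance: mean squared nearest-neighbour gap, both ways."""
    d2 = ((Pa[:, None, :] - Pb[None, :, :]) ** 2).sum(-1)   # pairwise squared dists
    return d2.min(axis=1).mean() + d2.min(axis=0).mean()

cd_noisy = chamfer(X, Y)
cd_self = chamfer(X, X)    # identical clouds have zero Chamfer distance
```

The pairwise-distance tensor makes this O(|X||Y|) in memory, which is fine for small clouds; a KD-tree nearest-neighbour query is the usual choice at scale.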
We can combine the loss functions as follows: $\mathcal{L}_{g} + \mathcal{L}_{f}$ for supervised learning; $\mathcal{L}_{f} + \mathcal{L}_{c}$ for semi-supervised learning; and $\mathcal{L}_{c}$ for unsupervised learning. We employ $\mathcal{L}_{g} + \mathcal{L}_{f}$ for most of our experiments; the other two combinations are used for ablation studies.
5 Experiments
We trained all the methods on the ModelNet40 [37] dataset. ModelNet40 has 3D shapes from 40 categories, ranging from airplane and car to plant and lamp. We sampled point clouds from the mesh vertices, and all point clouds were centered at the origin within a unit box. To demonstrate the generalizability of our proposed method, we split the 40 categories into two parts: the first 20 categories are used for training, while the last 20 categories are used for testing. We also partitioned 20% of the training set for evaluation. The training transformations include rotations and translations randomly drawn from fixed ranges; we applied each transformation to the source point cloud to obtain the template point cloud. During testing, we sampled rotations and translations from the same ranges for fair comparisons. We set the maximum number of iterations to ten for all the iterative methods. Since some correspondence-based methods require heavy computation, we capped the number of sampled points during testing. All testing experiments were performed on a single NVIDIA TITAN X (Pascal) GPU or an Intel Core i7-8850H CPU at 2.60 GHz. We adapted the code released by the other methods for our experiments. Note that PRNet was trained on uniformly sampled points from the ModelNet40 dataset, since our training dataset is sparse, which would lead to ill-conditioned matrices and cause SVD convergence problems.
5.1 Accuracy and generalization
We report the accuracy of our method compared with other point cloud registration methods in Fig. 2 and Table 1. We set a range of maximum thresholds for the rotation and translation errors. For each threshold, we measured the ratio of successfully aligned point clouds to the total number of point clouds as the success ratio. As shown in Fig. 2, our method greatly outperforms the traditional registration method ICP [3], the deep feature-based method DCP [34], and PRNet [35]. Even with very small rotation and translation error thresholds, our method still achieves a high success ratio, while ICP achieves a much lower one, and DCP and PRNet fail for nearly all testing point clouds. These results indicate that our approach produces highly accurate alignments.
We also present results on the ModelNet40 dataset with different measurement metrics in Table 1. We used the root mean squared error (RMSE) to evaluate the variation of each error and the median error (Median) to better represent the error distribution. Compared to other methods, our proposed approach has an extremely low median error in both rotation and translation. This result reveals that our method achieves significant accuracy for most test cases, while only a small portion of them fall into larger errors.
We have shown that our method can generalize to other object categories during testing when trained on different object categories. In Table 2, we provide results on the ShapeNet Core.V2 dataset. Our method still achieves a remarkably small median error, which further highlights its generalizability.
Figure 4 (caption): Left: training time per epoch under the same GPU consumption; our method trains each epoch faster than the numerical PointNetLK and DCP. Right: testing time for one point cloud on a single CPU; our method is fast during testing and is hardly affected by the number of points, while the test time of correspondence-based methods grows quadratically as the number of points grows.
5.2 Fidelity analysis
According to the results presented in Section 5.1, our method achieves highly accurate alignments. We further demonstrate the high fidelity of our method by setting extremely small maximum rotation and translation error thresholds. In Fig. 3, we show that under such a strict fidelity criterion, our approach achieves higher fidelity than the canonical PointNetLK and ICP, attaining a high success ratio with infinitesimal registration errors. Other methods lose fidelity once a small error criterion is set. The advantage of our approach is attributable to the deterministic gradient computation: since we apply the LK algorithm to a 3D point cloud, we can directly process the spatial information with the deterministic “feature gradient” and the analytical “warp Jacobian”. Because our deterministic approach assures high-fidelity point cloud alignment, it can be used to refine the registration results given by other methods.
5.3 Efficiency
Fig. 4 demonstrates that our method is more computationally efficient than other methods during both training and testing. We trained each network using the same number of points on a single GPU. During testing, we varied the number of points. Using a simplified PointNet with 3 layers and only 100 points for the Jacobian computation, our method is faster than the numerical PointNetLK. It also requires less space and time than other methods, especially when the number of points is large. As the number of points increases, our method still maintains high efficiency, which suggests that our approach has the potential to efficiently cope with a large number of points.
5.4 Robustness to noise
To verify the robustness of our method to noise, we trained the model on noiseless data and then added Gaussian noise independently to each point at test time. Note that we only added noise to the source point cloud, which is a reasonable simulation of the real-world situation. We set the success criterion to be a small rotation error and a small translation error. Fig. 5 displays the area under the curve (AUC) results. Compared with the numerical PointNetLK, our approach is more robust to noise, even when the source point cloud has large noise (0.04). When the data is noisy, the deterministic Jacobian provides more accurate gradients than the numerical one. DCP and PRNet fail when large noise is applied.
5.5 Sparse point cloud
In real-world applications, especially in autonomous driving scenes, point clouds obtained from LiDAR sensors are sparse. To test the ability of our method to deal with sparse data, we simulated sparsity in the source point clouds of the ModelNet40 dataset. Starting from a dense and complete template point cloud, we gradually subtracted subsets from the entire point cloud, ending with a sparse source containing a decreasing percentage of the points. Fig. 7 shows that our method maintains a relatively high success ratio in sparse registration down to a small fraction of the points sampled from the source point cloud.
5.6 Partial data
To further explore the capacity of our method to register point clouds in real-world scenes, we performed a partial data experiment. We selected a complete shape as the template point cloud and obtained a partial source point cloud from the template by determining which points were visible from certain random camera poses. The simulation process was to set a camera at the origin facing the direction $(\theta, \varphi)$ in spherical coordinates, where $\theta$ was the polar angle and $\varphi$ was the azimuthal angle. We sampled $\theta$ and $\varphi$ from normal distributions. Then, we moved the source point cloud along the viewing direction by a fixed radial distance and determined which points were visible to the camera; these visible points formed the partial source point cloud. Although our method achieves the highest success ratio for both rotation and translation, as shown in Fig. 8, it could not retain high fidelity under translation changes. The main reason is that, for partial data, our approach is unable to know the real center of the object: when subtracting the mean and centering the partial point cloud at the origin, it infers a wrong point cloud center. Another reason lies in the feature loss function (Eq. 16) we used, which can be large when the template is complete and the source is a partial scan.
5.7 Decomposition of the Jacobian
An advantage of our method is that we can decompose the Jacobian function into a deterministic “feature gradient” and an analytical “warp Jacobian”. Since the “feature gradient” is deterministic and separated from the Jacobian function, we can reuse it for alternate alignment tasks. During testing, we are able to compute the Jacobian function without retraining the complete registration pipeline: we only need to compute the “warp Jacobian” and compose the final Jacobian with the precomputed “feature gradient”. Examples are showcased in Fig. 6.
5.8 Ablation study
We have introduced several network strategies and different loss functions for our approach. In this section, we compare these various settings (the aggregated Jacobian results are in the supplementary material). Table 3 lists errors under different metrics. The first two rows are the canonical PointNetLK; replacing the feature difference loss and transformation error loss with a single point distance loss in row 2 improved the accuracy. Rows 3–10 show results of our deterministic PointNetLK. It did not lose accuracy when aggregating features over the entire point cloud (rows 3, 5, 9, and 10). Using random features rather than aggregated features (rows 4, 6, 7, and 8) improves the fidelity. However, in semi-supervised learning (rows 8 and 10), aggregated features generate better results. Note that with random features, our method is faster. Adopting the point distance loss rather than the feature loss (rows 7, 8, 9, and 10) yields slight improvements.
6 Discussion and Future Work
The deterministic PointNetLK is a deep feature-based registration method that preserves high fidelity, generalization, and efficiency. Unlike other full learning methods, our approach uses PointNet to extract point cloud features and a deterministic LK algorithm for registration. Such a hybrid model leverages the point feature representation from a neural network and the intrinsic generalizability of the LK algorithm.
We advocate solving the Jacobian function using two separate deterministic components. We unroll the network to compute the exact “feature gradient” signal corresponding to spatial locations. Moreover, the decomposition of the Jacobian function enables the reuse of the “feature gradient” in different applications without retraining the entire registration pipeline. Furthermore, we propose different strategies to improve efficiency.
Our experiments highlight the high-fidelity property of our method and its robustness to different data settings, which can be exploited for registration refinement. In addition, we have noticed that choosing the proper data for training is crucial for our method. One direction for future work is finding ways to better preprocess 3D point clouds. Another is to improve the “feature gradient” using different encoding strategies. We also want to extend our method to a global registration framework. Finally, we expect our method to be applied in real-world scenarios, particularly in the SLAM community.
References

[1] Y. Aoki, H. Goforth, R. A. Srivatsan, and S. Lucey. PointNetLK: robust & efficient point cloud registration using PointNet. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 7163–7172, 2019.
 [2] S. Baker and I. Matthews. Lucas-Kanade 20 years on: a unifying framework. International Journal of Computer Vision, 56(3):221–255, 2004.
 [3] P. J. Besl and N. D. McKay. Method for registration of 3D shapes. In Sensor fusion IV: control paradigms and data structures, volume 1611, pages 586–606. International Society for Optics and Photonics, 1992.
 [4] J.-Y. Bouguet et al. Pyramidal implementation of the affine Lucas-Kanade feature tracker: description of the algorithm. Intel Corporation, 5(110):4, 2001.
 [5] J. Briales and J. Gonzalez-Jimenez. Convex global 3D registration with Lagrangian duality. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 4960–4969, 2017.
 [6] D. Chetverikov, D. Svirko, D. Stepanov, and P. Krsek. The trimmed iterative closest point algorithm. In Object recognition supported by user interaction for service robots, volume 3, pages 545–548. IEEE, 2002.
 [7] C. Choy, W. Dong, and V. Koltun. Deep global registration. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 2514–2523, 2020.
 [8] H. Deng, T. Birdal, and S. Ilic. PPF-FoldNet: unsupervised learning of rotation invariant 3D local descriptors. In Proceedings of the European Conference on Computer Vision (ECCV), pages 602–618, 2018.
 [9] H. Deng, T. Birdal, and S. Ilic. PPFNet: global context aware local features for robust 3D point matching. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 195–205, 2018.
 [10] H. Deng, T. Birdal, and S. Ilic. 3D local features for direct pairwise registration. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 3244–3253, 2019.
 [11] G. Elbaz, T. Avraham, and A. Fischer. 3D point cloud registration for localization using a deep neural network autoencoder. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 4631–4640, 2017.
 [12] M. A. Fischler and R. C. Bolles. Random sample consensus: a paradigm for model fitting with applications to image analysis and automated cartography. Communications of the ACM, 24(6):381–395, 1981.
 [13] B. Fornberg. Numerical differentiation of analytic functions. ACM Transactions on Mathematical Software (TOMS), 7(4):512–526, 1981.
 [14] Z. Gojcic, C. Zhou, J. D. Wegner, and A. Wieser. The perfect match: 3D point cloud matching with smoothed densities. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 5545–5554, 2019.
 [15] M. Greenspan and M. Yurick. Approximate KD tree search for efficient ICP. In Proceedings of the IEEE International Workshop on 3D Digital Imaging and Modeling (3DIM), pages 442–448. IEEE, 2003.
 [16] X. Huang, G. Mei, and J. Zhang. Feature-metric registration: a fast semi-supervised approach for robust point cloud registration without correspondences. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 11366–11374, 2020.
 [17] A. E. Johnson and M. Hebert. Using spin images for efficient object recognition in cluttered 3D scenes. IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI), 21(5):433–449, 1999.
 [18] W. Lu, G. Wan, Y. Zhou, X. Fu, P. Yuan, and S. Song. DeepVCP: an end-to-end deep neural network for point cloud registration. In Proceedings of the International Conference on Computer Vision (ICCV), pages 12–21, 2019.

[19] B. D. Lucas, T. Kanade, et al. An iterative image registration technique with an application to stereo vision. In Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI), 1981.
 [20] S. Lucey, R. Navarathna, A. B. Ashraf, and S. Sridharan. Fourier Lucas-Kanade algorithm. IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI), 35(6):1383–1396, 2012.
 [21] Z. Lv, F. Dellaert, J. M. Rehg, and A. Geiger. Taking a deeper look at the inverse compositional algorithm. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 4581–4590, 2019.
 [22] Y. Ma, Y. Guo, J. Zhao, M. Lu, J. Zhang, and J. Wan. Fast and accurate registration of structured point clouds with small overlaps. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 1–9, 2016.
 [23] H. Maron, N. Dym, I. Kezurer, S. Kovalsky, and Y. Lipman. Point registration via efficient convex relaxation. ACM Transactions on Graphics, 35(4):1–12, 2016.
 [24] N. Mellado, D. Aiger, and N. J. Mitra. Super 4PCS fast global point cloud registration via smart indexing. In Computer Graphics Forum, volume 33, pages 205–215. Wiley Online Library, 2014.
 [25] S. Oron, A. Bar-Hille, and S. Avidan. Extended Lucas-Kanade tracking. In Proceedings of the European Conference on Computer Vision (ECCV), pages 142–156. Springer, 2014.
 [26] G. D. Pais, S. Ramalingam, V. M. Govindu, J. C. Nascimento, R. Chellappa, and P. Miraldo. 3DRegNet: a deep neural network for 3D point registration. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 7193–7203, 2020.
 [27] Y. Pan, B. Yang, F. Liang, and Z. Dong. Iterative global similarity points: a robust coarsetofine integration solution for pairwise 3D point cloud registration. In Proceedings of the International Conference on 3D Vision (3DV), pages 180–189. IEEE, 2018.
 [28] C. R. Qi, H. Su, K. Mo, and L. J. Guibas. PointNet: deep learning on point sets for 3D classification and segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 652–660, 2017.
 [29] R. B. Rusu, N. Blodow, and M. Beetz. Fast point feature histograms (FPFH) for 3D registration. In Proceedings of the IEEE International Conference on Robotics and Automation (ICRA), pages 3212–3217. IEEE, 2009.
 [30] S. Salti, F. Tombari, and L. Di Stefano. SHOT: unique signatures of histograms for surface and texture description. Computer Vision and Image Understanding (CVIU), 125:251–264, 2014.
 [31] R. Sandhu, S. Dambreville, and A. Tannenbaum. Particle filtering for registration of 2D and 3D point sets with stochastic dynamics. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 1–8. IEEE, 2008.
[32] V. Sarode, X. Li, H. Goforth, Y. Aoki, R. A. Srivatsan, S. Lucey, and H. Choset. PCRNet: point cloud registration network using PointNet encoding. arXiv preprint arXiv:1908.07906, 2019.
[33] C. Wang, H. K. Galoogahi, C.-H. Lin, and S. Lucey. Deep-LK for efficient adaptive object tracking. In Proceedings of the IEEE International Conference on Robotics and Automation (ICRA), pages 627–634. IEEE, 2018.
 [34] Y. Wang and J. M. Solomon. Deep Closest Point: learning representations for point cloud registration. In Proceedings of the International Conference on Computer Vision (ICCV), pages 3523–3532, 2019.
[35] Y. Wang and J. M. Solomon. PRNet: self-supervised learning for partial-to-partial registration. In Neural Information Processing Systems (NIPS), 2019.
 [36] Y. Wang, Y. Sun, Z. Liu, S. E. Sarma, M. M. Bronstein, and J. M. Solomon. Dynamic graph CNN for learning on point clouds. ACM Transactions on Graphics (TOG), 38(5):146, 2019.
[37] Z. Wu, S. Song, A. Khosla, F. Yu, L. Zhang, X. Tang, and J. Xiao. 3D ShapeNets: a deep representation for volumetric shapes. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 1912–1920, 2015.
 [38] H. Yang, J. Shi, and L. Carlone. TEASER: fast and certifiable point cloud registration. arXiv preprint arXiv:2001.07715, 2020.
[39] J. Yang, H. Li, and Y. Jia. Go-ICP: solving 3D registration efficiently and globally optimally. In Proceedings of the International Conference on Computer Vision (ICCV), pages 1457–1464, 2013.
[40] Z. J. Yew and G. H. Lee. 3DFeat-Net: weakly supervised local 3D features for point cloud registration. In Proceedings of the European Conference on Computer Vision (ECCV), pages 630–646. Springer, 2018.
 [41] Z. J. Yew and G. H. Lee. RPMNet: robust point matching using learned features. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 11824–11833, 2020.
[42] Z. Zhang. Iterative point matching for registration of free-form curves and surfaces. International Journal of Computer Vision (IJCV), 13(2):119–152, 1994.
[43] Q.-Y. Zhou, J. Park, and V. Koltun. Fast global registration. In Proceedings of the European Conference on Computer Vision (ECCV), pages 766–782. Springer, 2016.
7 Supplementary Material
In this supplementary material, we further explain our network design strategies and present additional visual registration results for our deterministic PointNetLK.
8 Network Design Strategies
We introduced our network design strategies in Section 4.2 of the main paper. In this section, we include an illustrative figure (Fig. 10) to further explain these strategies.
As shown in Fig. 10 (a), the long green box represents the entire point cloud. We split it into segments, each denoted by a small green box. Each segment is encoded by a shared per-point PointNet embedding (orange box) to produce a feature segment, depicted as a blue box. We then concatenate all the feature segments and apply max pooling to obtain the final global feature vector. To speed up training, we can instead use the feature of a randomly chosen segment as the final feature (see Fig. 10 (b)). The same strategies can be applied to the Jacobian computation: we can randomly choose a point cloud segment and compute the Jacobian from it alone, as depicted in Fig. 10 (c), where the purple box denotes the parameters needed for the Jacobian computation. However, when a single segment is not a good representation of the 3D shape, e.g., under uniform sampling, we instead aggregate the Jacobians of all point cloud segments (shown in Fig. 10 (d)). The red dashed box represents the same operation as in Fig. 10 (c).
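The segment-wise feature computation above can be sketched in a few lines of NumPy. This is a minimal illustration only: a single-layer ReLU MLP stands in for the full PointNet encoder, and the names `pointnet_embed` and `segmented_global_feature` are ours, not from the paper. Because the per-point embedding is shared and max pooling is order-invariant, pooling over the concatenated segment features is equivalent to pooling over the whole cloud.

```python
import numpy as np

def pointnet_embed(points, W, b):
    # Shared per-point MLP (a single ReLU layer for illustration),
    # mapping (n, 3) points to (n, d_feat) features.
    return np.maximum(points @ W + b, 0.0)

def segmented_global_feature(cloud, W, b, num_segments=4, rng=None):
    """Split the cloud into segments, embed each with the shared per-point
    network, then max-pool over all points for the global feature.
    If `rng` is given, embed only one randomly chosen segment
    (the speed-up strategy of Fig. 10 (b))."""
    segments = np.array_split(cloud, num_segments)
    if rng is not None:
        segments = [segments[rng.integers(num_segments)]]
    feats = [pointnet_embed(seg, W, b) for seg in segments]  # per-segment features
    concat = np.concatenate(feats, axis=0)                   # stacked feature segments
    return concat.max(axis=0)                                # global max pooling

# Toy example with random weights.
rng = np.random.default_rng(0)
cloud = rng.normal(size=(1000, 3))
W, b = rng.normal(size=(3, 64)), np.zeros(64)
phi_full = segmented_global_feature(cloud, W, b)        # all segments
phi_fast = segmented_global_feature(cloud, W, b, rng=rng)  # one random segment
```

Note that `phi_full` equals the feature obtained by embedding the whole cloud at once, since the segments partition the cloud; `phi_fast` is only an approximation, which is the trade-off the random-segment strategy accepts for speed.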
We also provide quantitative results for the different Jacobian computation strategies to complete the ablation study of the main paper (Table 4). We find that using the aggregated Jacobian causes no significant change in performance.
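The aggregated-Jacobian strategy of Fig. 10 (d) can be sketched as follows. This is a hedged toy illustration: a single-layer ReLU MLP stands in for the PointNet encoder, the Jacobian is taken with respect to a 3-D translation by finite differences (the full method uses a 6-DoF twist parameterization), and all function names are ours.

```python
import numpy as np

def _embed(points, W, b):
    # Shared per-point MLP (a single ReLU layer for illustration).
    return np.maximum(points @ W + b, 0.0)

def segment_jacobian(seg, W, b, eps=1e-4):
    # Finite-difference Jacobian of a segment's max-pooled feature with
    # respect to a 3-D translation of the segment (illustrative only).
    f0 = _embed(seg, W, b).max(axis=0)
    cols = []
    for k in range(3):
        d = np.zeros(3)
        d[k] = eps
        fk = _embed(seg + d, W, b).max(axis=0)
        cols.append((fk - f0) / eps)
    return np.stack(cols, axis=1)  # (d_feat, 3)

def aggregated_jacobian(cloud, W, b, num_segments=4):
    # Average the per-segment Jacobians instead of relying on a single
    # randomly chosen segment (the strategy of Fig. 10 (d)).
    segs = np.array_split(cloud, num_segments)
    return np.mean([segment_jacobian(s, W, b) for s in segs], axis=0)

# Toy example with random weights.
rng = np.random.default_rng(0)
cloud = rng.normal(size=(400, 3))
W, b = rng.normal(size=(3, 64)), np.zeros(64)
J = aggregated_jacobian(cloud, W, b)
```

Averaging over segments trades a small amount of extra computation for a Jacobian that reflects the whole shape, which matters when any single segment is unrepresentative.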
9 Visual Results
In Section 5 of the main paper, we showed the robustness of our deterministic PointNetLK approach across different registration scenarios. In this section, we provide additional visual results.
9.1 Generalizability
Our approach generalizes well across different datasets. We show several registration results on the Stanford 3D scan dataset (http://graphics.stanford.edu/data/3Dscanrep) in Fig. 11.
9.2 Results for complete data
Fig. 12 shows visual registration results on the complete models.
9.3 Results for noisy data
Fig. 13 displays registration results on the noisy dataset; our method is robust to noise.
9.4 Results for sparse data
We present sparse-data registration results in Fig. 14. Even though only a sparse subset of points remains in the source point cloud, our method still performs well.
9.5 Results for partial data
Fig. 15 shows the partial-data registration results. Our approach performs comparatively well on partial registration.