Deterministic PointNetLK for Generalized Registration

by Xueqian Li, et al.
Carnegie Mellon University

There has been remarkable progress in the application of deep learning to 3D point cloud registration in recent years. Despite their success, these approaches tend to have poor generalization properties when attempting to align unseen point clouds at test time. PointNetLK has proven the exception to this rule by leveraging the intrinsic generalization properties of the Lucas Kanade (LK) image alignment algorithm to point cloud registration. The approach relies heavily upon the estimation of a gradient through finite differentiation – a strategy that is inherently ill-conditioned and highly sensitive to the step-size choice. To avoid these problems, we propose a deterministic PointNetLK method that uses analytical gradients. We also develop several strategies to improve large-volume point cloud processing. We compare our approach to canonical PointNetLK and other state-of-the-art methods and demonstrate how our approach provides accurate, reliable registration with high fidelity. Extended experiments on noisy, sparse, and partial point clouds depict the utility of our approach for many real-world scenarios. Further, the decomposition of the Jacobian matrix affords the reuse of feature embeddings for alternate warp functions.




1 Introduction

Deep learning has emerged as a powerful tool for 3D point cloud registration – where one wants to bring source and template point clouds into geometric alignment. Many “black box” strategies [40, 18, 32] have been proposed that attempt to model the entire 3D point registration process as a neural network. Although exhibiting impressive results over a variety of benchmarks, such strategies tend to exhibit poor generalization performance if one needs to align point clouds that have not been seen during training (different object categories, different sensors, etc.). Iterative Closest Point (ICP) [3] and its variants [42, 6, 15, 31, 39] still tend to fare much better in this regard, and as a result, are still the method of choice for many problems in robotics and vision. Although these classical methods tend to exhibit superior generalization performance, they have their drawbacks. In particular, when the point clouds lack distinct local geometric features, it becomes difficult to find effective correspondences – limiting the utility of the approach for many important problems.

Recently, Aoki et al. proposed a promising approach – PointNetLK [1] – to generalized 3D point registration that leverages insights from the classical Lucas & Kanade (LK) [19] image alignment algorithm. Instead of using a neural network for modelling the entire registration pipeline, the approach learns only a point cloud embedding (PointNet). The actual registration process therein is deterministic and can be viewed as a modification of the classical LK algorithm. Another strength of the approach is that the embedding can be learned in a supervised end-to-end manner by unrolling the LK algorithm as a recurrent neural network. Unlike other comparable neural network strategies, PointNetLK exhibits remarkable generalization performance across a number of scenarios. Further, unlike ICP and its variants, the approach does not rely on the cumbersome step of establishing correspondences between point clouds.

A drawback to the approach, however, is the numerical manner in which gradients are estimated within the LK pipeline which can often lead to poor registration performance. It is well understood that numerical gradients are intrinsically noisy, sensitive to step-size choice, and in their limit inherently ill-conditioned [13]. In this paper, we advocate for a completely deterministic derivation of PointNetLK which circumvents many of its current limitations in terms of robustness and fidelity. Further, the approach allows for the derivation of an analytical Jacobian matrix that can be decomposed into “feature” and “warp” components. An advantage of this decomposition is that it allows for application-specific modifications to the “warp Jacobian” without the need for re-training the entire pipeline (something previously impossible with conventional PointNetLK). Our approach also circumvents some inherent memory and efficiency issues that arise when employing a deterministic gradient within PointNetLK. Specifically, we propose a novel point sampling strategy using seminal insights on critical points from PointNet [28] that allow for efficiency and good registration performance. We demonstrate impressive empirical performance across a number of benchmarks outperforming current state-of-the-art methods such as Deep Closest Point (DCP) [34] and PRNet [35].

2 Related Work

Full learning model. Deep learning and its successful applications in 3D vision have motivated researchers to tackle challenging 3D alignment problems. One approach that has been explored is the use of full learning models in which deep neural networks are used to model the entire registration pipeline. While some authors use RGB-D data to feed neural networks to estimate alignment transformations [14, 26], others [9, 8, 10] extract local features from point cloud patches. Recent works [40, 18] have focused on large-scale registration using the entire point cloud to extract correspondences. Despite the great progress, full learning models still lack generalizability to unseen data.

Hybrid learning model. Unlike traditional methods [17, 29, 30] that rely on hand-crafted features to perform registration, hybrid learning models replace them with deep features. Elbaz et al. [11] proposed to extract deep features by projecting the 3D point cloud to 2D and then applying RANSAC [12] and ICP for registration. Wang and Solomon proposed the Deep Closest Point (DCP) [34] method, which leverages DGCNN [36] for feature extraction and then solves for the transformation matrix using a differentiable SVD module. The same authors later proposed PRNet [35] to extend DCP to handle partial registration. Recently, Yew and Lee proposed RPM-Net [41], which combines the PointNet model with a robust point matching technique to estimate the rigid transformation. However, this method requires additional face normals to extract point features. Although the hybrid learning models generalize well to various 3D shapes, these methods still need adequate keypoints for correspondence search. Aoki et al. proposed PointNetLK [1], which uses PointNet to extract deep features and the LK algorithm for point matching. Huang et al. [16] further improved PointNetLK with a point distance loss. However, PointNetLK and its variant rely on numerical gradients, which are highly sensitive to the step-size choice and could result in poor and unstable performance.

Global registration. ICP-like registration methods are highly sensitive to initialization, which may produce unreliable results. Some methods have tried to solve for a globally optimal solution by using branch-and-bound based optimization [39], RANSAC-based expansion [24], correspondence-dependent searching [43, 27], or probability-based registration [22]. Other novel ideas employed convex relaxation techniques to optimize the registration [23, 5]. However, these methods demand large amounts of computing resources. Yang et al. [38] recently proposed an outlier-free registration method that improves efficiency. Beyond traditional methods, Choy et al. [7] applied deep learning-based features and a weighted Procrustes analysis to perform global optimization. A fundamental issue with global registration methods is that they rely on dense correspondences, which might produce low-fidelity results for 3D shapes that lack geometric features.

Lucas & Kanade algorithm. The image alignment framework proposed by Lucas and Kanade [19] and its derivatives [4, 2, 20, 25] seek to minimize the alignment error between two images by using either extracted distinct features or all the pixels in an image (photometric error). Lv et al. [21] used a neural network to extract pyramid features for image tracking. Wang et al. [33] proposed a regression-based object tracking framework, which integrates the LK algorithm into an end-to-end deep learning paradigm. In PointNetLK [1], the authors extended the end-to-end LK tracking paradigm to 3D point clouds.

3 Background

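Problem statement. Let $P_T \in \mathbb{R}^{N_1 \times 3}$ and $P_S \in \mathbb{R}^{N_2 \times 3}$ be the template and source point clouds respectively, where $N_1$ and $N_2$ are the number of points. The rigid transformation $G \in SE(3)$ that aligns the observed source $P_S$ to the template $P_T$ can be defined as $G(\xi) = \exp\left(\sum_{p=1}^{6} \xi_p T_p\right)$, where $\xi \in \mathbb{R}^{6}$ are the exponential map twist parameters and $T_p$ the generator matrices. The PointNet embedding function $\phi: \mathbb{R}^{N \times 3} \rightarrow \mathbb{R}^{K}$ can be employed to encode a 3D point cloud into a $K$-dimensional feature descriptor. Thus the point cloud registration problem can be formulated as

$$\xi^{*} = \arg\min_{\xi} \left\| \phi\big(G(\xi) \cdot P_S\big) - \phi(P_T) \right\|_2^2, \tag{1}$$

where the symbol $(\cdot)$ denotes the rigid transformation. For computational efficiency, the inverse compositional Lucas & Kanade (IC-LK) formulation can be employed, and it is defined as

$$\xi^{*} = \arg\min_{\xi} \left\| \phi(P_S) - \phi\big(G^{-1}(\xi) \cdot P_T\big) \right\|_2^2, \tag{2}$$

where $G^{-1}(\xi) = \exp\left(-\sum_{p=1}^{6} \xi_p T_p\right)$.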

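PointNetLK. We review the canonical PointNetLK approach [1]. Instead of solving directly for a global solution $\xi$, they iteratively solve for an incremental change $\Delta\xi$ as

$$\Delta\xi = \arg\min_{\Delta\xi} \left\| \phi(P_S) - \phi\big(G^{-1}(\xi_i \circ^{-1} \Delta\xi) \cdot P_T\big) \right\|_2^2, \tag{3}$$

where $\circ^{-1}$ denotes the inverse composition, and the initial guess is set to $\xi_0 = 0$. To solve this, we linearize Eq. 3 as

$$\Delta\xi = \arg\min_{\Delta\xi} \left\| \phi(P_S) - \phi\big(G^{-1}(\xi_i) \cdot P_T\big) - J \Delta\xi \right\|_2^2, \tag{4}$$

where $J$ is the Jacobian matrix defined as

$$J = \left.\frac{\partial \phi\big(G^{-1}(\xi) \cdot P_T\big)}{\partial \xi^{\top}}\right|_{\xi = \xi_i} \in \mathbb{R}^{K \times 6}. \tag{5}$$

The twist parameters can be solved as

$$\Delta\xi = J^{\dagger} \left[ \phi(P_S) - \phi\big(G^{-1}(\xi_i) \cdot P_T\big) \right], \tag{6}$$

where the symbol $\dagger$ is the Moore-Penrose pseudo-inverse, $J^{\dagger} = (J^{\top} J)^{-1} J^{\top}$. Finally, the twist parameters are updated iteratively as

$$\xi_{i+1} = \xi_i \circ^{-1} \Delta\xi. \tag{7}$$
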
The numerical PointNetLK approximates each direction, $J_p$, of the Jacobian in Eq. 5, evaluated at the identity warp, using finite differences as

$$J_p = \frac{\phi\big(\exp(-t_p T_p) \cdot P_T\big) - \phi(P_T)}{t_p}, \tag{9}$$

where $t_p$ is the step size which infinitesimally perturbs the twist parameter $\xi_p$ of $\xi$. Instead of learning a step size through the network, the algorithm requires a pre-defined step size for the approximation. However, this finite difference approximation is inherently problematic: when the step size is infinitesimally small, numerical issues arise and the gradient approximation becomes unstable; when the step size is relatively large, the approximation is also inaccurate. Furthermore, the computational complexity of the numerical PointNetLK grows with the number of points and parameters.
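The step-size sensitivity described above can be seen on a toy problem. The following is a minimal sketch, not the PointNetLK code: a scalar function (`np.sin`, chosen only because its derivative is known exactly) stands in for one feature channel, and the step sizes are arbitrary illustrative values.

```python
import numpy as np

def forward_difference(f, x, t):
    """Forward-difference estimate of f'(x) with step size t."""
    return (f(x + t) - f(x)) / t

# Toy scalar stand-in for one feature channel; its exact derivative is cos(x).
f, x = np.sin, 1.0
exact = np.cos(x)

# Illustrative step sizes from "large" to "smaller than machine precision allows".
steps = [1e-1, 1e-4, 1e-8, 1e-12, 1e-15]
errors = {t: abs(forward_difference(f, x, t) - exact) for t in steps}

for t in steps:
    print(f"t = {t:.0e}   |error| = {errors[t]:.3e}")
```

A large step is dominated by truncation error and a very small step by floating-point round-off, so the error is smallest at an intermediate step size that has to be chosen by hand.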

4 Deterministic PointNetLK

In this section, we introduce our deterministic PointNetLK approach. Rather than approximating the Jacobian function using finite difference, we compute the exact Jacobian given the input point cloud and its learned point features. We further explore possible improvements regarding the efficiency of our algorithm through changes in: network architecture, feature extraction, computation of the Jacobian, and point sampling. We then discuss several ways to design the loss function.

4.1 How to compute a deterministic Jacobian?

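To compute a deterministic Jacobian for PointNetLK, we factorize $J$ from Eq. 5 into two parts with the chain rule as

$$J = \frac{\partial \phi\big(G^{-1}(\xi) \cdot P_T\big)}{\partial \big(G^{-1}(\xi) \cdot P_T\big)^{\top}} \, \frac{\partial \big(G^{-1}(\xi) \cdot P_T\big)}{\partial \xi^{\top}}. \tag{10}$$

For efficiency, we apply the inverse compositional Lucas & Kanade (IC-LK) algorithm. Thus, the calculation of $J$ in each iteration is not necessary, and the initial transformation can be defined as an identity matrix, $G^{-1}(\xi_0) = I$. Eq. 10 then becomes,

$$J = \frac{\partial \phi(P_T)}{\partial P_T^{\top}} \, \left.\frac{\partial \big(G^{-1}(\xi) \cdot P_T\big)}{\partial \xi^{\top}}\right|_{\xi = \xi_0}. \tag{11}$$

The first part is the “feature gradient” which describes the changes in direction of the feature descriptors learned from the point cloud. We unroll the neural network to compute the “feature gradient”. The second part is the “warp Jacobian” as defined in the IC-LK algorithm. It can be pre-computed when we apply an identity warp to the template point cloud.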
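As an illustration of the “warp Jacobian” at the identity warp, the sketch below writes it in closed form for each template point and checks it against central finite differences. It rests on assumptions that are not taken from the paper: the twist is ordered as (rotation, translation), the inverse warp is the matrix exponential of the negated twist, and the helper names (`hat`, `twist_matrix`, `expm_series`, `inverse_warp`, `warp_jacobian_at_identity`) are hypothetical.

```python
import numpy as np

def hat(w):
    """Skew-symmetric matrix such that hat(w) @ x == np.cross(w, x)."""
    return np.array([[0.0, -w[2], w[1]],
                     [w[2], 0.0, -w[0]],
                     [-w[1], w[0], 0.0]])

def twist_matrix(xi):
    """4x4 matrix sum_p xi_p T_p for xi = (w, v); this ordering is an assumption."""
    M = np.zeros((4, 4))
    M[:3, :3] = hat(xi[:3])
    M[:3, 3] = xi[3:]
    return M

def expm_series(M, terms=20):
    """Matrix exponential by truncated power series (adequate for small twists)."""
    out, term = np.eye(M.shape[0]), np.eye(M.shape[0])
    for k in range(1, terms):
        term = term @ M / k
        out = out + term
    return out

def inverse_warp(xi, points):
    """Apply exp(-sum_p xi_p T_p) to an (N, 3) array of points."""
    G_inv = expm_series(-twist_matrix(xi))
    return points @ G_inv[:3, :3].T + G_inv[:3, 3]

def warp_jacobian_at_identity(points):
    """Closed-form derivative of the inverse warp at xi = 0: [ hat(x) | -I ] per point."""
    J = np.zeros((points.shape[0], 3, 6))
    for n, x in enumerate(points):
        J[n, :, :3] = hat(x)        # d(-w x x)/dw = hat(x)
        J[n, :, 3:] = -np.eye(3)    # d(-v)/dv = -I
    return J

rng = np.random.default_rng(0)
P_T = rng.uniform(-1.0, 1.0, size=(5, 3))   # toy template points
J_warp = warp_jacobian_at_identity(P_T)

# Central finite-difference check of the closed-form expression.
t = 1e-6
J_num = np.zeros_like(J_warp)
for p in range(6):
    e = np.zeros(6)
    e[p] = t
    J_num[:, :, p] = (inverse_warp(e, P_T) - inverse_warp(-e, P_T)) / (2 * t)
max_abs_diff = np.abs(J_warp - J_num).max()
print("max |analytical - numerical| =", max_abs_diff)
```

Because this term depends only on the template coordinates and the chosen warp parameterization, it can be computed once before the iterations start.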

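Let the template point cloud, $P_T$, be $x$. By forward passing $x$ through the simplified PointNet [28] model (3 layers and without the T-Net), a per-point (before the pooling operation) feature is extracted as

$$z_i = \mathrm{ReLU}\big(\mathrm{BN}_i(A_i z_{i-1} + b_i)\big), \tag{12}$$

where $A_i$ is a matrix transformation, $b_i$ represents the bias term, $\mathrm{BN}(\cdot)$ stands for the batch normalization layer, $\mathrm{ReLU}(\cdot)$ denotes the element-wise rectified linear unit function, and $i$ is the $i$-th layer. Thus our per-point embedding feature can be simplified as $z_l$. We solve for the partial derivative of $z_l$ with respect to the input points $x$ as

$$\frac{\partial z_l}{\partial x^{\top}} = \prod_{i=0}^{l-1} \frac{\partial z_{l-i}}{\partial z_{l-i-1}^{\top}}, \qquad \frac{\partial z_i}{\partial z_{i-1}^{\top}} = \frac{\partial\, \mathrm{ReLU}(\mathrm{BN}_i)}{\partial\, \mathrm{BN}_i^{\top}} \, \frac{\partial\, \mathrm{BN}_i}{\partial (A_i z_{i-1} + b_i)^{\top}} \, A_i, \tag{13}$$

where $\mathrm{BN}_i$ abbreviates $\mathrm{BN}_i(A_i z_{i-1} + b_i)$, $z_0 = x$, and the number of layers $l = 3$.

Since PointNet extracts a global feature vector, we apply the max pooling operation, $\mathrm{Pool}(\cdot)$, to obtain the final Jacobian as

$$J = \mathrm{Pool}\left( \frac{\partial z_l}{\partial x^{\top}} \, \frac{\partial \big(G^{-1}(\xi) \cdot x\big)}{\partial \xi^{\top}} \right), \tag{14}$$

where, for each feature channel, $\mathrm{Pool}(\cdot)$ keeps the row belonging to the point that attains the maximum in the forward pass.

Given the closed-form Jacobian in Eq. 14, the whole point cloud registration pipeline can be deterministic. Following Eqs. 6, 7, and 8, we get the update as $\Delta\xi = J^{\dagger}\left[\phi(P_S) - \phi(P_T)\right]$, where $J^{\dagger} = (J^{\top} J)^{-1} J^{\top}$. Note that our Jacobian formulation does not rely on finite differences. Our Jacobian is deterministic and does not require any step size to approximate the gradients. Thus, our method circumvents the numerical problems that affect the canonical PointNetLK.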
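To make the chain of derivatives concrete, the sketch below computes an analytical Jacobian for a toy per-point network followed by max pooling, and compares it with a central finite-difference estimate. It is a sketch under stated assumptions rather than the authors' implementation: batch normalization is omitted, the layer widths and random weights are arbitrary, the twist ordering is (rotation, translation), and all function names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
dims = [3, 16, 32, 64]   # toy layer widths (assumption, not the paper's sizes)
A = [rng.normal(size=(dims[i + 1], dims[i])) for i in range(3)]
b = [rng.normal(size=dims[i + 1]) for i in range(3)]

def hat(x):
    """Skew-symmetric matrix such that hat(x) @ w == np.cross(x, w)."""
    return np.array([[0.0, -x[2], x[1]], [x[2], 0.0, -x[0]], [-x[1], x[0], 0.0]])

def per_point_forward(x):
    """Per-point features (N, K) and their gradients w.r.t. each point, (N, K, 3)."""
    z = x
    dz = np.tile(np.eye(3), (x.shape[0], 1, 1))      # d z_0 / d x = I
    for A_i, b_i in zip(A, b):
        a = z @ A_i.T + b_i                          # pre-activation
        mask = (a > 0).astype(float)                 # derivative of ReLU
        z = a * mask
        dz = mask[:, :, None] * (A_i @ dz)           # chain rule through one layer
    return z, dz

def phi(x):
    """Global feature: max pooling over the per-point features."""
    return per_point_forward(x)[0].max(axis=0)

def analytical_jacobian(x):
    """K x 6 Jacobian of phi(exp(-twist) . x) w.r.t. the twist, at the identity."""
    z, dz = per_point_forward(x)
    idx = z.argmax(axis=0)                           # point selected by max pooling
    J = np.zeros((z.shape[1], 6))
    for k, n in enumerate(idx):
        warp = np.hstack([hat(x[n]), -np.eye(3)])    # warp Jacobian of point n
        J[k] = dz[n, k] @ warp
    return J

def inverse_warp(xi, x, terms=20):
    """Apply exp(-sum_p xi_p T_p) to the points via a truncated power series."""
    M = np.zeros((4, 4))
    M[:3, :3] = hat(xi[:3])
    M[:3, 3] = xi[3:]
    G, term = np.eye(4), np.eye(4)
    for j in range(1, terms):
        term = term @ (-M) / j
        G = G + term
    return x @ G[:3, :3].T + G[:3, 3]

P_T = rng.uniform(-1.0, 1.0, size=(32, 3))           # toy template
J = analytical_jacobian(P_T)

# Central finite differences, one twist direction per column.
t = 1e-7
J_fd = np.stack([(phi(inverse_warp(t * e, P_T)) - phi(inverse_warp(-t * e, P_T))) / (2 * t)
                 for e in np.eye(6)], axis=1)
max_abs_diff = np.abs(J - J_fd).max()
print("Jacobian shape:", J.shape, " max |analytical - numerical| =", max_abs_diff)
```

The analytical version needs one forward pass and no step size, whereas the numerical check needs two extra forward passes per twist parameter.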

4.2 Network design strategies

We noticed that the computational complexity of our deterministic PointNetLK grows with the number of points, which makes the naive implementation problematic for training. Here we propose several design strategies to make our method computationally efficient.

Feature aggregation. We randomly split the input point cloud into segments of equal size. Each segment is fed to the network to obtain a per-segment feature vector, and max pooling then aggregates the per-segment feature vectors into a global feature vector. Note that the aggregation strategy does not increase the time complexity.
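The aggregation step can be sketched as follows. This is an illustration only: random numbers stand in for per-point PointNet features, the point count, feature width, and number of segments are arbitrary, and the per-point features are assumed to be independent of how the cloud is split (which would not hold exactly if batch statistics were computed per segment).

```python
import numpy as np

rng = np.random.default_rng(0)
per_point = rng.normal(size=(1000, 64))   # stand-in for per-point features (N, K)

# Randomly split the point indices into 10 equally sized segments.
segments = np.array_split(rng.permutation(1000), 10)

# Max pool inside each segment, then max pool across the segment features.
segment_features = np.stack([per_point[s].max(axis=0) for s in segments])   # (10, K)
global_from_segments = segment_features.max(axis=0)

# Max pooling over all points at once gives the same global feature.
global_direct = per_point.max(axis=0)
print(np.array_equal(global_from_segments, global_direct))
```

Since a maximum of per-segment maxima equals the maximum over all points, the segments can be processed one at a time to bound peak memory.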

Random feature. This strategy is similar to the previous one except that we do not aggregate features from each segment. We treat each segment of the point cloud as a small mini-batch. Without feature aggregation, we consider each mini-batch as individual data that enables the network to learn better representations. Moreover, the network converges faster to a solution.

Random points for the Jacobian computation. Instead of using the entire point cloud to estimate the deterministic Jacobian, we randomly sample 10% of the points to compute it. The dimension of each matrix in the Jacobian computation shrinks substantially, so in principle our deterministic PointNetLK can process point clouds with a large number of points.

Compute Jacobian with aggregated points. Because uniform random sampling in 3D space may not be representative, we can instead aggregate the Jacobians computed on each point cloud segment in order to capture more of the important features in a point cloud. Note that the math for the Jacobian computation still holds because we employ the max pooling operation.

Critical points for feature encoding. PointNet [28] proposed the use of critical points, which are the points that contribute the most to the global feature vector. Therefore, we also use critical points to evaluate our method. Moreover, we find that using critical points improves the efficiency of the model without loss of generalizability and accuracy.
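One way to read the critical-point idea is sketched below: the points that win the max pooling in at least one feature channel are identified, and pooling over only those points reproduces the global feature. Random numbers stand in for per-point features, and the sizes are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
per_point = rng.normal(size=(1000, 64))      # stand-in for per-point features (N, K)

# Index of the winning point in each channel; duplicates are merged.
critical_idx = np.unique(per_point.argmax(axis=0))

# Pooling over the critical points alone gives the same global feature.
global_full = per_point.max(axis=0)
global_critical = per_point[critical_idx].max(axis=0)
print(len(critical_idx), "critical points out of", per_point.shape[0])
```

There can be at most as many critical points as feature channels, which is why restricting later computations to them can reduce cost without changing the pooled feature.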

4.3 Loss function

We employ different combinations of loss functions in our point cloud registration pipeline. The first loss is the error between the estimated transformation $G_{est}$ and the ground-truth transformation $G_{gt}$. The second loss measures the difference between the template feature vector $\phi(P_T)$ and the source feature vector $\phi(P_S)$. Also, we explore the use of a point-based distance loss.

Transformation error loss. We want to minimize the mean squared error (MSE) between the estimated and the ground-truth transformations. For efficiency, we formulate the transformation loss as

$$\mathcal{L}_{G} = \left\| G_{est}^{-1} \, G_{gt} - I_4 \right\|_F^2, \tag{15}$$

where $I_4$ is an identity matrix and $\|\cdot\|_F$ is the Frobenius norm. This formulation is computationally efficient because it does not require matrix logarithm operations.
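A transformation loss of this kind can be sketched in a few lines. The function below is an assumption about the exact form (squared Frobenius norm, no averaging), and `transformation_loss` is a hypothetical name; the example transform is arbitrary.

```python
import numpy as np

def transformation_loss(G_est, G_gt):
    """Squared Frobenius norm of inv(G_est) @ G_gt - I for 4x4 rigid transforms."""
    R, t = G_est[:3, :3], G_est[:3, 3]
    G_est_inv = np.eye(4)
    G_est_inv[:3, :3] = R.T          # inverse of a rotation is its transpose
    G_est_inv[:3, 3] = -R.T @ t      # inverse translation
    return np.sum((G_est_inv @ G_gt - np.eye(4)) ** 2)

# Ground truth: 30 degree rotation about z plus a small translation.
theta = np.deg2rad(30.0)
G_gt = np.eye(4)
G_gt[:3, :3] = [[np.cos(theta), -np.sin(theta), 0.0],
                [np.sin(theta), np.cos(theta), 0.0],
                [0.0, 0.0, 1.0]]
G_gt[:3, 3] = [0.1, 0.2, 0.3]

loss_perfect = transformation_loss(G_gt, G_gt)          # estimate equals ground truth
loss_identity = transformation_loss(np.eye(4), G_gt)    # estimate is the identity
print(loss_perfect, loss_identity)
```

Only a transpose and matrix products are involved, so no matrix logarithm is needed to compare the two transforms.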

Feature difference loss. To capture different feature signals for the transformed point clouds, we include a feature difference as another loss function during training. We want to minimize the error between the template point feature $\phi(P_T)$ and the source point feature $\phi(G_{est} \cdot P_S)$. Given that the encoded point feature is deterministic, if the point clouds are aligned, the feature difference should reduce to zero. The feature loss is defined as

$$\mathcal{L}_{\phi} = \left\| \phi\big(G_{est} \cdot P_S\big) - \phi(P_T) \right\|_2^2. \tag{16}$$

                  Rot. Error (degrees)      Trans. Error
Algorithm         RMSE        Median        RMSE       Median
ICP [3]           39.3255     5.0363        0.4743     0.0579
DCP [34]          5.5000      1.2024        0.0216     0.0043
PRNet [35]        40.6498     3.8742        0.1257     0.0210
Ours              3.3502      2.17e-6       0.0307     4.47e-8

Table 1: Results on unseen categories of ModelNet40. Our method outperforms other methods in most metrics. Smaller values are better.

                  Rot. Error (degrees)      Trans. Error
Algorithm         RMSE        Median        RMSE       Median
ICP [3]           40.7131     5.8249        0.4778     0.0731
DCP [34]          8.5869      0.9295        0.0205     0.0029
PRNet [35]        60.9340     8.9274        0.1443     0.0274
Ours              4.2404      2.60e-6       0.0438     4.47e-8

Table 2: Results on unseen dataset ShapeNet Core V2. Our method can generalize to different 3D shapes and still preserves high fidelity. Smaller values are better.

Point distance loss. In [16], the authors mentioned using the Chamfer distance as a loss function for the numerical PointNetLK training. Rather than directly calculating the distance between the template point cloud and the source point cloud, they used a decoder to first retrieve the point cloud from the feature vector, then calculate the point distance of the reconstructed point cloud. The point distance loss defined in Eq. 17 implicitly combines the rotation and the translation through a 3D shape representation.


where is the reconstructed point cloud.

We can combine the loss functions as following:

for supervised learning;

for semi-supervised learning; and

for unsupervised learning. We employ

for most of our experiments. The and are used for ablation studies.

5 Experiments

We trained all the methods on the ModelNet40 [37] dataset. ModelNet40 has 3D shapes from 40 categories, ranging from airplanes and cars to plants and lamps. We sampled point clouds from vertices and all point clouds were centered at the origin within a unit box. To demonstrate the generalizability of our proposed method, we split the 40 categories into two parts. The first 20 categories are for training, while the last 20 categories are for testing. We also partitioned 20% of the training set for evaluation. The training transformations include rotations that were randomly drawn from and translations that were randomly sampled from . We applied the transformation to the source point cloud to get our template point cloud. During testing, we also sampled rotations from , and translations from for fair comparisons. We set the maximum number of iterations to ten for all the iterative methods. Since some correspondence-based methods require large computation, we only sampled up to points during testing. All testing experiments were performed on a single NVIDIA TITAN X (Pascal) GPU or an Intel Core i7-8850H CPU at 2.60 GHz. We adapted the code released by other methods for our experiments. Note that PRNet was trained on uniformly sampled points from the ModelNet40 dataset, because our training dataset is sparse, which would lead to ill-conditioned matrices and cause SVD convergence problems.

Figure 2: Accuracy. The maximum error threshold lies in for rotation and for translation. Purple line shows that our method achieved nearly of success for alignments with a small maximum error threshold which indicates an absolute advantage over other methods.
Figure 3: Fidelity. We set extremely low error thresholds for both rotation and translation during testing. The purple line shows that our method preserves the highest fidelity among other methods. Orange line denotes that the numerical PointNetLK also achieves reasonable high accuracy. However, with the approximated numerical Jacobian, it lacks fidelity when compared with our method. Green and olive lines indicate the complete failure of DCP and PRNet when looking at the fidelity.

5.1 Accuracy and generalization

We report the accuracy of our method compared with other point cloud registration methods in Fig. 2 and Table 1. We first set a maximum threshold for the rotation error ranging from to , and translation error up to . For each range, we measured the ratio of successfully aligned point cloud to the total number of point cloud as the success ratio. As shown in Fig. 2, our method greatly outperforms the traditional registration method ICP [3], deep feature-based method DCP [34], and PRNet [35]. Even with a rotation error threshold less than and a translation error threshold less than , our method can still achieve of success ratio, while ICP has only of success ratio, DCP and PRNet nearly fail for all testing point clouds. The results indicate that our approach has highly accurate alignment results.
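The success ratio used in this evaluation can be sketched as below. The sketch assumes that an alignment counts as successful only when both its rotation and its translation errors are within the thresholds; the function name, the error values, and the thresholds are hypothetical and are not results from the paper.

```python
import numpy as np

def success_ratio(rot_err_deg, trans_err, max_rot_deg, max_trans):
    """Fraction of test cases whose rotation and translation errors are both within bounds."""
    rot_err_deg = np.asarray(rot_err_deg, dtype=float)
    trans_err = np.asarray(trans_err, dtype=float)
    ok = (rot_err_deg <= max_rot_deg) & (trans_err <= max_trans)
    return ok.mean()

# Hypothetical per-sample errors for four test point clouds.
rot_err = [0.001, 0.5, 3.0, 40.0]
trans_err = [1e-4, 0.02, 0.01, 0.3]

ratio = success_ratio(rot_err, trans_err, max_rot_deg=1.0, max_trans=0.05)
print(ratio)
```

Sweeping the two thresholds and plotting the resulting ratios yields curves of the kind shown in the accuracy and fidelity figures.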

We also present results on ModelNet40 dataset with different measurement metrics in Table 1. We used the root mean squared error (RMSE) to evaluate the variation of each error and the median error (Median) to better represent the error distribution. Compared to other methods, our proposed approach has extremely low median error in both rotation and translation. This result reveals that our method achieves significant accuracy for most test cases, while only a small portion of them will fall into larger errors.

We have shown that our method can generalize to other object categories during testing when trained on different object categories. In Table 2, we provide results on ShapeNet Core.V2 dataset. Our method can still achieve remarkably small median error, which further highlights its generalizability.

Figure 4: Training and testing time. Left figure shows the training time per epoch when training with the same GPU consumption. Our method takes about minutes to train one epoch, while numerical PointNetLK takes minutes and DCP takes minutes. Right figure is the testing time of one point cloud on a single CPU. Purple line indicates that our method is fast during testing and is hardly affected by the number of points. As the number of points grows, the test time of correspondence-based methods grows quadratically.

5.2 Fidelity analysis

According to the results presented in Section 5.1, our method achieves high accuracy alignments. We further demonstrate the high fidelity of our method by setting the maximum rotation error threshold in , and the maximum translation error threshold in . In Fig. 3, we demonstrate that under an extremely small fidelity criterion, our approach achieves higher fidelity than the canonical PointNetLK and ICP, and also achieves a high success ratio with infinitesimal registration errors. Other methods lose fidelity when we set a small error criterion. The superior performance of our approach is attributable to the deterministic gradient computation. Considering that we applied the LK algorithm on a 3D point cloud, we can directly process the spatial information with the deterministic “feature gradient” and the analytical “warp Jacobian”. Since our deterministic approach ensures a high fidelity point cloud alignment, we can utilize it to refine the registration results given by other methods.

5.3 Efficiency

Fig. 4 demonstrates that our method is more computationally efficient than other methods during training and testing. We trained each network using points and a single GPU. During testing, we varied the number of points from to . Using a simplified PointNet with 3 layers and only 100 points for the Jacobian computation, our method is faster than the numerical PointNetLK. It also requires less space and time than other methods, especially when the number of points is large. With the number of points increasing, our method still maintains high efficiency. This suggests that our approach has the potential to efficiently cope with large number of points.

Figure 5: Robustness to noise. We add Gaussian noise with zero mean and different standard deviations to the source point cloud during testing. Note that the rotation error threshold is and the translation error threshold is . Our method is robust to noise as shown in the purple line. Even with relatively large Gaussian noise (std.), our method still has around successful registration cases under the success criterion.
Figure 6: Jacobian decomposition. We pre-compute the analytical “feature gradient” from point cloud. For different registration tasks, we do not need to re-train the entire registration pipeline. Only the “feature gradient” is learnable, with the “warp Jacobian” being defined analytically and easily modified. For example, in 3D registration, we can compose a 3D “warp Jacobian” to the pre-computed “feature gradient” in order to get the steepest descent point features as depicted in the upper row. If we impose a more constrained 2D rigid transformation as shown in the bottom row, a 2D “warp Jacobian” is computed and composed to get a 2D Jacobian.

5.4 Robustness to noise

To verify the robustness of our method to noise, we trained the model on noiseless data and then added Gaussian noise independently to each point during test time. Note that we only added noise to the source point cloud, which was a reasonable simulation of the real-world situation. We set the success registration criterion to be a rotation error smaller than and a translation error smaller than . Fig. 5 displays the area under the curve (AUC) result. Compared with the numerical PointNetLK, our approach is more robust to noise even when the source point cloud has large noise (0.04). When the data is noisy, the deterministic Jacobian provides more accurate gradients than the numerical one. DCP and PRNet fail when large noise is applied.

Figure 7: Sparse registration. Results on registration with different sparsity levels in the source point cloud. The success criterion is rotation error under and translation error under . All the models are trained with the complete point cloud. Purple line indicates that our method is relatively robust to the sparsity of the source. With points in the source point cloud, our method can still achieve AUC with current threshold. Numerical PointNetLK lacks accuracy, while DCP and PRNet fail even if 90% of the points are provided.

5.5 Sparse point cloud

In real-world applications, especially in the autonomous driving scenes, point clouds obtained from LiDAR sensors are sparse. To test the ability of our method to deal with sparse data, we simulated the sparsity in the source point clouds for the ModelNet40 dataset. Starting from a dense and complete template point cloud, we gradually subtracted a subset from the entire point cloud. In the end, we got the sparse source with a decreasing percentage of the points. Fig. 7 implies that our method maintains relatively high success ratio in sparse registration with more than of the points sampled from the source point cloud.

Figure 8: Partial registration. We use partial source point cloud during testing. The rotation error threshold is from to and the translation threshold is set between to . Left figure shows the success ratio of the rotation and the right figure is the translation success ratio. Our method is denoted in purple line, which surpasses other methods. Our approach preserves the high fidelity in rotation, but loses accuracy in translation.

                       Rot. Error (degrees)      Trans. Error
#     Jacobian         RMSE       Median         RMSE       Median
1     Canonical        8.1825     3.63e-6        0.0743     5.96e-8
2     Canonical        5.2323     2.47e-6        0.0580     5.96e-8
3     Deterministic    5.5578     2.83e-6        0.0493     5.96e-8
4     Deterministic    3.3502     2.17e-6        0.0307     4.47e-8
5     Deterministic    6.6980     2.90e-6        0.0533     5.96e-8
6     Deterministic    3.5657     2.25e-6        0.0318     4.47e-8
7     Deterministic    3.3234     2.18e-6        0.0380     4.47e-8
8     Deterministic    3.6901     2.12e-6        0.0382     3.73e-8
9     Deterministic    3.7418     1.76e-6        0.0339     2.98e-8
10    Deterministic    2.8975     1.90e-6        0.0286     2.98e-8

Table 3: Ablation study. Results on different network design strategies and loss functions. The deterministic PointNetLK achieved higher fidelity than the canonical PointNetLK, which highlights the advantage of our deterministic Jacobian. For supervised and semi-supervised training, using random features improved alignment results, while for unsupervised training, the aggregated feature was preferred. Replacing the feature difference loss with a point distance loss did not remarkably improve the performance.

5.6 Partial data

To further explore the capacity of our method to register point cloud in real-world scenes, we performed partial data experiment. We selected a complete shape as the template point cloud and obtained a partial source point cloud from the template by determining which points were visible from certain random camera poses. The simulation process was to set a camera at the origin facing at direction in the spherical coordinate, where was the polar angle, and was the azimuthal angle. We sampled

from a normal distribution

and from a normal distribution . Then, we moved the source point cloud along the vector , where the radial distance was set as . Next, we determined which points were visible to the camera. These visible points would be the partial source point cloud. Although our method achieves the highest success ratio for both rotation and translation, as shown in Fig. 8, it could not retain a high fidelity for translation changes. The main reason is that for the partial data, our approach is unable to know the real center of the object. When subtracting the mean and centering the partial point cloud at the origin, it infers a wrong point cloud center. Another reason lies in the feature loss function (Eq. 16) we used. The feature loss can be large when the template is complete and the source is a partial scan.

Figure 9: Robustness results. Visual results of our point cloud registration method under different conditions. We show the template models as 3D surfaces for better visualization and the source point clouds are in black. The registered point cloud is shown in purple. Our deterministic PointNetLK method is robust to noisy, sparse, and partial data. The 3D models are from ModelNet40.

5.7 Decomposition of the Jacobian

An advantage of our method is that we can decompose the Jacobian function into a deterministic “feature gradient” and an analytical “warp Jacobian”. Seeing that the “feature gradient” is deterministic and separated from the Jacobian function, we can reuse it for alternate alignment tasks. During testing, we are able to compute the Jacobian function without re-training the complete registration pipeline. We only need to compute the “warp Jacobian” and compose the final Jacobian function with pre-computed “feature gradient”. Examples are showcased in Fig. 6.
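The reuse described here can be sketched as a composition of one pre-computed per-point “feature gradient” with different “warp Jacobians”. Everything specific in the sketch is an assumption made for illustration: random numbers stand in for the feature gradient, the twist is ordered as (rotation, translation), and the constrained 2D case is taken to mean rotation about the z-axis plus translation in x and y.

```python
import numpy as np

def hat(x):
    """Skew-symmetric matrix such that hat(x) @ w == np.cross(x, w)."""
    return np.array([[0.0, -x[2], x[1]], [x[2], 0.0, -x[0]], [-x[1], x[0], 0.0]])

rng = np.random.default_rng(0)
N, K = 8, 4
points = rng.uniform(-1.0, 1.0, size=(N, 3))
feature_grad = rng.normal(size=(N, K, 3))   # stand-in for the pre-computed feature gradient

# 3D rigid warp: six twist parameters, per-point warp Jacobian of shape (3, 6).
warp_3d = np.stack([np.hstack([hat(x), -np.eye(3)]) for x in points])   # (N, 3, 6)

# Planar rigid warp: keep rotation about z and translation in x and y only.
warp_2d = warp_3d[:, :, [2, 3, 4]]                                      # (N, 3, 3)

# The same feature gradient is composed with either warp Jacobian.
J_3d = feature_grad @ warp_3d    # (N, K, 6)
J_2d = feature_grad @ warp_2d    # (N, K, 3)
print(J_3d.shape, J_2d.shape)
```

Only the warp term changes between the two cases, so the learned part of the pipeline is left untouched when the motion model is swapped.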

5.8 Ablation study

We have introduced several network strategies and different loss functions for our approach. In this section, we compare these various settings (the Aggregated Jacobian results are in the supplementary material). Table 3 lists errors for different metrics. The first two rows are the canonical PointNetLK. Replacing the feature difference loss and transformation error loss with a single point distance loss in row 2 improved the accuracy. Rows 3–10 show results of our deterministic PointNetLK. It did not lose accuracy when aggregating features for the entire point cloud (shown in rows 3, 5, 9, and 10). If we use random features rather than aggregated features (rows 4, 6, 7, and 8), we can improve the fidelity. However, in semi-supervised learning (rows 8 and 10), the aggregated feature generates better results.

Note that with random features, our method is faster. Adopting the point distance loss rather than the feature loss (rows 7, 8, 9, and 10) yields slight improvements.

6 Discussion and Future Work

The deterministic PointNetLK is a deep feature-based registration method that preserves high fidelity, generalization, and efficiency. Unlike fully learned methods, our approach uses PointNet to extract point cloud features and a deterministic LK algorithm for registration. Such a hybrid model leverages the point feature representation of a neural network and the intrinsic generalizability of the LK algorithm.

We advocate solving the Jacobian function using two separate deterministic gradients. We unroll the network to compute an accurate “feature gradient” signal corresponding to spatial locations. Moreover, the decomposition of the Jacobian function enables the reuse of the “feature gradient” in different applications without re-training the entire registration pipeline. Furthermore, we propose different strategies to improve efficiency.

Our experiments highlight the high fidelity of our method and its robustness to different data settings, which make it suitable for registration refinement. In addition, we have noticed that choosing the proper data for training is crucial for our method. One direction for future work is to find better ways to pre-process 3D point clouds. Another is to improve the “feature gradient” using different encoding strategies. We also want to extend our method to a global registration framework. Finally, we expect our method to be applied in real-world scenarios, particularly in the SLAM community.


  • [1] Y. Aoki, H. Goforth, R. A. Srivatsan, and S. Lucey. PointNetLK: robust & efficient point cloud registration using PointNet. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 7163–7172, 2019.
  • [2] S. Baker and I. Matthews. Lucas-Kanade 20 years on: a unifying framework. International journal of computer vision, 56(3):221–255, 2004.
  • [3] P. J. Besl and N. D. McKay. Method for registration of 3D shapes. In Sensor fusion IV: control paradigms and data structures, volume 1611, pages 586–606. International Society for Optics and Photonics, 1992.
  • [4] J.-Y. Bouguet et al. Pyramidal implementation of the affine Lucas-Kanade feature tracker description of the algorithm. Intel corporation, 5(1-10):4, 2001.
  • [5] J. Briales and J. Gonzalez-Jimenez. Convex global 3D registration with Lagrangian duality. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 4960–4969, 2017.
  • [6] D. Chetverikov, D. Svirko, D. Stepanov, and P. Krsek. The trimmed iterative closest point algorithm. In Object recognition supported by user interaction for service robots, volume 3, pages 545–548. IEEE, 2002.
  • [7] C. Choy, W. Dong, and V. Koltun. Deep global registration. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 2514–2523, 2020.
  • [8] H. Deng, T. Birdal, and S. Ilic. PPF-FoldNet: unsupervised learning of rotation invariant 3D local descriptors. In Proceedings of the European Conference on Computer Vision (ECCV), pages 602–618, 2018.
  • [9] H. Deng, T. Birdal, and S. Ilic. PPFNet: global context aware local features for robust 3D point matching. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 195–205, 2018.
  • [10] H. Deng, T. Birdal, and S. Ilic. 3D local features for direct pairwise registration. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 3244–3253, 2019.
  • [11] G. Elbaz, T. Avraham, and A. Fischer. 3D point cloud registration for localization using a deep neural network auto-encoder. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 4631–4640, 2017.
  • [12] M. A. Fischler and R. C. Bolles. Random sample consensus: a paradigm for model fitting with applications to image analysis and automated cartography. Communications of the ACM, 24(6):381–395, 1981.
  • [13] B. Fornberg. Numerical differentiation of analytic functions. ACM Transactions on Mathematical Software (TOMS), 7(4):512–526, 1981.
  • [14] Z. Gojcic, C. Zhou, J. D. Wegner, and A. Wieser. The perfect match: 3D point cloud matching with smoothed densities. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 5545–5554, 2019.
  • [15] M. Greenspan and M. Yurick. Approximate KD tree search for efficient ICP. In Proceedings of the IEEE International Workshop on 3D Digital Imaging and Modeling (3DIM), pages 442–448. IEEE, 2003.
  • [16] X. Huang, G. Mei, and J. Zhang. Feature-metric registration: a fast semi-supervised approach for robust point cloud registration without correspondences. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 11366–11374, 2020.
  • [17] A. E. Johnson and M. Hebert. Using spin images for efficient object recognition in cluttered 3D scenes. IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI), 21(5):433–449, 1999.
  • [18] W. Lu, G. Wan, Y. Zhou, X. Fu, P. Yuan, and S. Song. DeepVCP: an end-to-end deep neural network for point cloud registration. In Proceedings of the International Conference on Computer Vision (ICCV), pages 12–21, 2019.
  • [19] B. D. Lucas, T. Kanade, et al. An iterative image registration technique with an application to stereo vision. In Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI), 1981.
  • [20] S. Lucey, R. Navarathna, A. B. Ashraf, and S. Sridharan. Fourier Lucas-Kanade algorithm. IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI), 35(6):1383–1396, 2012.
  • [21] Z. Lv, F. Dellaert, J. M. Rehg, and A. Geiger. Taking a deeper look at the inverse compositional algorithm. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 4581–4590, 2019.
  • [22] Y. Ma, Y. Guo, J. Zhao, M. Lu, J. Zhang, and J. Wan. Fast and accurate registration of structured point clouds with small overlaps. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 1–9, 2016.
  • [23] H. Maron, N. Dym, I. Kezurer, S. Kovalsky, and Y. Lipman. Point registration via efficient convex relaxation. ACM Transactions on Graphics, 35(4):1–12, 2016.
  • [24] N. Mellado, D. Aiger, and N. J. Mitra. Super 4PCS fast global point cloud registration via smart indexing. In Computer Graphics Forum, volume 33, pages 205–215. Wiley Online Library, 2014.
  • [25] S. Oron, A. Bar-Hille, and S. Avidan. Extended Lucas-Kanade tracking. In Proceedings of the European Conference on Computer Vision (ECCV), pages 142–156. Springer, 2014.
  • [26] G. D. Pais, S. Ramalingam, V. M. Govindu, J. C. Nascimento, R. Chellappa, and P. Miraldo. 3DRegNet: a deep neural network for 3D point registration. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 7193–7203, 2020.
  • [27] Y. Pan, B. Yang, F. Liang, and Z. Dong. Iterative global similarity points: a robust coarse-to-fine integration solution for pairwise 3D point cloud registration. In Proceedings of the International Conference on 3D Vision (3DV), pages 180–189. IEEE, 2018.
  • [28] C. R. Qi, H. Su, K. Mo, and L. J. Guibas. PointNet: deep learning on point sets for 3D classification and segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 652–660, 2017.
  • [29] R. B. Rusu, N. Blodow, and M. Beetz. Fast point feature histograms (FPFH) for 3D registration. In Proceedings of the IEEE International Conference on Robotics and Automation (ICRA), pages 3212–3217. IEEE, 2009.
  • [30] S. Salti, F. Tombari, and L. Di Stefano. SHOT: unique signatures of histograms for surface and texture description. Computer Vision and Image Understanding (CVIU), 125:251–264, 2014.
  • [31] R. Sandhu, S. Dambreville, and A. Tannenbaum. Particle filtering for registration of 2D and 3D point sets with stochastic dynamics. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 1–8. IEEE, 2008.
  • [32] V. Sarode, X. Li, H. Goforth, Y. Aoki, R. A. Srivatsan, S. Lucey, and H. Choset. PCRNet: point cloud registration network using pointnet encoding. arXiv preprint arXiv:1908.07906, 2019.
  • [33] C. Wang, H. K. Galoogahi, C.-H. Lin, and S. Lucey. Deep-LK for efficient adaptive object tracking. In Proceedings of the IEEE International Conference on Robotics and Automation (ICRA), pages 627–634. IEEE, 2018.
  • [34] Y. Wang and J. M. Solomon. Deep Closest Point: learning representations for point cloud registration. In Proceedings of the International Conference on Computer Vision (ICCV), pages 3523–3532, 2019.
  • [35] Y. Wang and J. M. Solomon. PRNet: self-supervised learning for partial-to-partial registration. In Neural Information Processing Systems (NIPS), 2019.
  • [36] Y. Wang, Y. Sun, Z. Liu, S. E. Sarma, M. M. Bronstein, and J. M. Solomon. Dynamic graph CNN for learning on point clouds. ACM Transactions on Graphics (TOG), 38(5):146, 2019.
  • [37] Z. Wu, S. Song, A. Khosla, F. Yu, L. Zhang, X. Tang, and J. Xiao. 3D Shapenets: a deep representation for volumetric shapes. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 1912–1920, 2015.
  • [38] H. Yang, J. Shi, and L. Carlone. TEASER: fast and certifiable point cloud registration. arXiv preprint arXiv:2001.07715, 2020.
  • [39] J. Yang, H. Li, and Y. Jia. Go-ICP: solving 3D registration efficiently and globally optimally. In Proceedings of the International Conference on Computer Vision (ICCV), pages 1457–1464, 2013.
  • [40] Z. J. Yew and G. H. Lee. 3DFeat-Net: weakly supervised local 3D features for point cloud registration. In Proceedings of the European Conference on Computer Vision (ECCV), pages 630–646. Springer, 2018.
  • [41] Z. J. Yew and G. H. Lee. RPM-Net: robust point matching using learned features. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 11824–11833, 2020.
  • [42] Z. Zhang. Iterative point matching for registration of free-form curves and surfaces. International Journal of Computer Vision (IJCV), 13(2):119–152, 1994.
  • [43] Q.-Y. Zhou, J. Park, and V. Koltun. Fast global registration. In Proceedings of the European Conference on Computer Vision (ECCV), pages 766–782. Springer, 2016.

7 Supplementary Material

In this supplementary material, we further explain our network design strategies, and present more visual registration results of our deterministic PointNetLK.

8 Network Design Strategies

Figure 10: Network design strategies. We can choose different combinations of these strategies – (a)(c), (a)(d), (b)(c), (b)(d) – to improve the accuracy and the efficiency of our approach.

We have introduced network design strategies in Section 4.2 of our main paper. In this section, we include a demonstration figure (Fig. 10) to further explain our strategies.

As shown in Fig. 10 (a), the long green box is the entire point cloud. We split it into segments, each denoted as a small green box. Each segment is encoded through a per-point embedding PointNet (orange box) to get a feature, depicted as a blue box. We then concatenate all the feature segments and use max pooling to get our final feature vector. If we want to speed up the training, we can use a random feature as our final feature vector (see Fig. 10 (b)). The same strategies can be used to compute the Jacobian function. We can randomly choose a point cloud segment to compute the Jacobian, as depicted in Fig. 10 (c). The purple box denotes the parameters needed for the Jacobian computation. However, when the point cloud is not a good representation of the 3D shape, like uniformly sampling, we aggregate the Jacobian of each point cloud segment (shown in Fig. 10 (d)). The red dashed box represents the same operation as in Fig. 10 (c).
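The two feature strategies in Fig. 10 (a) and (b) can be sketched as follows. This is a minimal illustration that assumes a toy per-point embedding ReLU(W p) in place of PointNet and interprets the "random feature" as the max-pooled feature of one randomly chosen segment; the function names and the segment count are hypothetical.

```python
import numpy as np

def per_point_embed(points, W):
    """Toy stand-in for the per-point PointNet embedding: ReLU(W p)."""
    return np.maximum(points @ W.T, 0.0)

def aggregated_feature(points, W, num_segments):
    """Fig. 10 (a): embed each segment, concatenate the per-point
    features of all segments, then max pool over points."""
    segments = np.array_split(points, num_segments)
    embedded = [per_point_embed(seg, W) for seg in segments]
    return np.concatenate(embedded, axis=0).max(axis=0)

def random_feature(points, W, num_segments, rng):
    """Fig. 10 (b), as assumed here: max pool over a single random segment."""
    segments = np.array_split(points, num_segments)
    seg = segments[rng.integers(num_segments)]
    return per_point_embed(seg, W).max(axis=0)

rng = np.random.default_rng(2)
cloud = rng.normal(size=(1000, 3))   # toy point cloud
W = rng.normal(size=(16, 3))         # toy embedding weights

feat_agg = aggregated_feature(cloud, W, num_segments=8)
feat_rand = random_feature(cloud, W, num_segments=8, rng=rng)
feat_whole = per_point_embed(cloud, W).max(axis=0)
print(np.allclose(feat_agg, feat_whole))
```

Since max pooling is taken over points, processing the cloud in segments gives the same aggregated feature as embedding the whole cloud at once, while only one segment has to be held in memory at a time. The random feature touches a single segment, which is cheaper but only a lower bound (channel-wise) on the aggregated feature.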

We also provide quantitative results for different Jacobian computation strategies to complete our ablation study table (as shown in Table 4). We find that if we use the aggregated Jacobian, there is no significant change in the performance.

Strategy columns: Random Jacobian, Aggregated Jacobian, Feature aggregation, Random feature.

#     Rot. Error (degrees)       Trans. Error
      RMSE       Median          RMSE       Median
3     5.5578     2.83e-6         0.0493     5.96e-8
4     3.3502     2.17e-6         0.0307     4.47e-8
7     3.3234     2.18e-6         0.0380     4.47e-8
8     3.6901     2.12e-6         0.0382     3.73e-8
9     3.7418     1.76e-6         0.0339     2.98e-8
10    2.8975     1.90e-6         0.0286     2.98e-8
11    6.0874     2.77e-6         0.0665     5.96e-8
12    5.0043     2.13e-6         0.0546     4.47e-8
13    4.4247     1.71e-6         0.0481     2.98e-8
14    4.2186     1.91e-6         0.0457     2.98e-8

Table 4: Ablation study (continued). Results on different network design strategies and loss functions. All the models used the deterministic Jacobian and the entire point cloud (we did not use critical points in this experiment). Rows 3, 4, 5, 6, 7, 8, 9, and 10 are results extracted from our main paper. Smaller values are better.

9 Visual Results

In Section 5 of our main paper, we showed the robustness of our deterministic PointNetLK approach in different registration scenarios. We provide more visual results of our approach in this section.

9.1 Generalizability

Our approach generalizes well across different datasets. We show several registration results on the Stanford 3D scan dataset in Fig. 11.

Figure 11: Generalizability. Visual results on several Stanford 3D scans. Note that our model was trained on half of the ModelNet40 dataset. Gray surfaces indicate the template, the black point cloud is the source, and the purple point cloud denotes our registration result.

9.2 Results for complete data

Fig. 12 shows the visual registration results on the complete model.

Figure 12: Complete data registration. Registration results on complete ModelNet40 dataset. Our method has high fidelity registration results.

9.3 Results for noisy data

Fig. 13 displays the registration results on noisy data. Our method is robust to noise.

Figure 13: Noisy data. We add Gaussian noise independently to each point in the source point cloud. The visual results indicate that our method is robust to Gaussian noise.
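For reference, perturbing a source point cloud in this way can be sketched as below. The noise level `sigma` is a hypothetical value chosen only for illustration, and the helper name `add_gaussian_noise` is an assumption.

```python
import numpy as np

def add_gaussian_noise(points, sigma, rng):
    """Add zero-mean Gaussian noise independently to every coordinate
    of every point. `sigma` is the noise standard deviation."""
    return points + rng.normal(loc=0.0, scale=sigma, size=points.shape)

rng = np.random.default_rng(3)
source = rng.uniform(-1.0, 1.0, size=(4096, 3))   # toy source point cloud
sigma = 0.02                                      # hypothetical noise level
noisy_source = add_gaussian_noise(source, sigma, rng)

# Empirical standard deviation of the perturbation, close to sigma.
print(np.std(noisy_source - source))
```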

9.4 Results for sparse data

We present results on sparse data registration in Fig. 14. Although the source point cloud retains only a fraction of the points, our method still performs well.
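A sparse source of this kind can be produced by random subsampling, as in the sketch below. The fraction `keep_fraction` is a hypothetical value for illustration and the helper name `subsample` is an assumption; the template is left dense.

```python
import numpy as np

def subsample(points, keep_fraction, rng):
    """Keep a random subset of the points, sampled without replacement."""
    n_keep = max(1, int(round(keep_fraction * points.shape[0])))
    idx = rng.choice(points.shape[0], size=n_keep, replace=False)
    return points[idx]

rng = np.random.default_rng(4)
template = rng.uniform(-1.0, 1.0, size=(2000, 3))   # dense toy template
sparse_source = subsample(template, keep_fraction=0.1, rng=rng)
print(sparse_source.shape)
```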

9.5 Results for partial data

Fig. 15 shows the partial data registration results. Our approach has relatively good performance for the partial registration.

Figure 14: Sparse point cloud. We preserved only a subset of the points in the source point cloud, while the template is a complete, dense point cloud. The results indicate that our method is robust to sparse point cloud registration.
Figure 15: Partial data. The source point cloud is the partial data, where the template is the complete point cloud. The registration results show that our method is relatively robust to the partial point cloud registration.