Deep Models with Fusion Strategies for MVP Point Cloud Registration

by   Lifa Zhu, et al.
Shanghai Jiao Tong University
NetEase, Inc

The main goal of point cloud registration in the Multi-View Partial (MVP) Challenge 2021 is to estimate a rigid transformation that aligns a point cloud pair. The pairs in this competition are characterized by low overlap, non-uniform density, unrestricted rotations and ambiguity, which pose a huge challenge to the registration task. In this report, we introduce our solution to the registration task, which fuses two deep learning models, ROPNet and PREDATOR, with customized ensemble strategies. Finally, we achieved second place in the registration track with 2.96546, 0.02632 and 0.07808 under the metrics of Rot_Error, Trans_Error and MSE, respectively.




1 Introduction

The main goal of point cloud registration in the Multi-View Partial (MVP) Challenge 2021 [pan2021variational] is to estimate a rigid transformation that aligns a point cloud pair. The pairs in this competition are characterized by low overlap, non-uniform density, unrestricted rotations and ambiguity, which pose a huge challenge to the registration task.

In this report, we introduce our solution to the registration task, which fuses two deep learning models: ROPNet and PREDATOR [huang2020registration], with customized ensemble strategies. We propose ROPNet, a new deep learning model using Representative Overlapping Points with discriminative features for registration that transforms partial-to-partial registration into partial-to-complete registration. Specifically, we propose a context-guided module which uses an encoder to extract global features for predicting point overlap score. To better find representative overlapping points, we use the extracted global features for coarse alignment. Then, we introduce a Transformer [guo2020pct] to enrich point features and remove non-representative points based on point overlap score and feature matching. A similarity matrix is built in a partial-to-complete mode, and finally, weighted SVD is adopted to estimate a transformation matrix.
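The weighted SVD step mentioned above can be illustrated with a minimal NumPy sketch. It assumes that correspondences and their weights are already available as arrays; the function name `weighted_svd` and the input layout are illustrative choices and are not taken from the ROPNet code.

```python
import numpy as np

def weighted_svd(src, tgt, weights):
    """Estimate R, t minimizing sum_i w_i * ||R @ src_i + t - tgt_i||^2.

    src, tgt: (N, 3) arrays of corresponding points (assumed given).
    weights:  (N,) non-negative correspondence weights (assumed given).
    """
    w = weights / weights.sum()
    # Weighted centroids of both point sets.
    src_c = (w[:, None] * src).sum(axis=0)
    tgt_c = (w[:, None] * tgt).sum(axis=0)
    # Weighted 3x3 cross-covariance matrix.
    H = (src - src_c).T @ (w[:, None] * (tgt - tgt_c))
    U, _, Vt = np.linalg.svd(H)
    # Guard against a reflection so that det(R) = +1.
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = tgt_c - R @ src_c
    return R, t
```

With noise-free correspondences and uniform weights, this closed-form solution recovers the rigid transformation that maps `src` onto `tgt`; lower weights reduce the influence of unreliable correspondences.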

PREDATOR [huang2020registration] is a recent method that performs pairwise point cloud registration with deep attention on the overlap region. It performs well on both 3DMatch [zeng20173dmatch] and 3DLoMatch [huang2020registration]. However, the object-centric point clouds in MVP registration, which are often ambiguous or symmetric, differ from the scene-centric data in 3DMatch (see section 5 for details). That is why we chose an ensemble strategy. Furthermore, to make PREDATOR work better, we fixed a simple but important GNN bug, adjusted parameters for the MVP registration challenge, and solved registration in a partial-to-complete manner during RANSAC iterations.

Finally, we propose a few ensemble rules based on data characteristics to fuse ROPNet and PREDATOR, which help achieve better performance in a large variety of cases. We achieved 3.16656, 0.029237 and 0.08451 on the validation set under the metrics Rot_Error, Trans_Error and MSE, respectively. In the MVP Point Cloud Challenge 2021, we achieved second place in the registration track with error values of 2.96546, 0.02632 and 0.07808 on the test set.

2 Team Details

  • CodaLab user name


  • Team name


  • Team member names

    Lifa Zhu, Changwei Lin, Dongrui Liu, Xin Li, Francisco Gómez-Fernández

  • Affiliation(s)

    Deep Glint, Shanghai Jiaotong University, Sichuan University

  • CodaLab email address

  • Final rank of the team in the development phase

    in the registration track.

  • Link to the codes/executables of the solution(s)

    We released our ROPNet implementation at The official code of PREDATOR was released by the authors at We also released our unofficial implementation of PREDATOR at

3 Contribution Details

  • Title of the contribution

    Deep models with fusion strategies for partial-to-partial point cloud registration under unrestricted rotations.

  • General method description

    We propose to fuse ROPNet and PREDATOR[huang2020registration] to solve point cloud registration in the MVP Challenge. ROPNet and the pipeline of our solution to this challenge can be seen in Figure 1 and Figure 2.

    Figure 1: Overview of ROPNet registration pipeline. The CG module consumes the source (green) and target (red) point clouds, and outputs initial pose and overlapping points (non-overlapping points are in black). The TFMR module takes the output of CG module as input, and generates accurate correspondences. The FMR step removes false correspondences (blue lines) and keeps some positive correspondences (gray lines).
    Figure 2: The pipeline of our solution to the MVP Registration Challenge. The point cloud in green is the source point cloud, and the point cloud in red is the target point cloud.

    In ROPNet, we proposed a context-guided module which uses an encoder to extract global features for predicting the point overlap score, and introduced a Transformer to enrich point features and remove non-representative points based on the point overlap score and feature matching. Using ROPNet, we achieved low rotation and translation errors on the part of the validation set with rot_level=0, as shown in Table 7.

    Figure 3: Left: Architecture of the CG module. The CG module consumes source (green) and target (red) data, and outputs overlap scores for the two point clouds and an initial transformation matrix. Right: Details of information interaction. It takes point features and global features as input and outputs fused point features based on the pair.

    Figure 4: Left: Overview of the Transformer-based feature matching removal (TFMR) module. The TFMR module takes the transformed source and the target as input, and outputs representative points and their correspondences. The initial alignment and the overlap score it uses are outputs of the CG module. Right: Details of feature matching removal (FMR). It takes correspondences for overlapping source points and outputs accurate correspondences (gray lines).

    To handle unrestricted rotations, we used PREDATOR in our pipeline. In the PREDATOR source code, we found and fixed a simple but important GNN bug, which helps the network obtain higher performance. Then, we adjusted parameters in PREDATOR for the MVP registration challenge. Inspired by the partial-to-complete idea proposed in ROPNet, we remove some points in the source data based on the predicted scores and keep all points in the target data during RANSAC iterations. However, as shown in the results reported in Table 7, the registration error is still not ideal due to the data characteristics. We analyse the data characteristics in section 5.

    Based on the above discussion, we designed a few ensemble strategies based on data characteristics to fuse ROPNet and PREDATOR and estimate the final rigid transformation. Experiments on the validation set showed that this is effective in most cases, with few failures.

  • Representative image / diagram of the method(s)

    ROPNet can be seen in Figure 1. Our proposed context-guided (CG) module and Transformer-based Feature Matching Removal (TFMR) module in ROPNet can be seen in Figure 3 and Figure 4. The pipeline of our solution is shown in Figure 2.

4 Method and Data Details

  • Training description

    We trained ROPNet and PREDATOR independently. All 2048 points were used in training for both models. For ROPNet, we trained for 600 epochs using the Adam optimizer with an initial learning rate of 0.0001. The learning rate follows a cosine annealing schedule. We trained ROPNet in a non-iterative manner. However, we run 2 iterations of the TFMR module at test time. Note that we only trained ROPNet for small rotation angles ranging from 0° to 45°. Also, we did not use Point Pair Features, because we could not get accurate normal vectors in the MVP challenge data.

    For PREDATOR, following the code released by the authors, we trained on the MVP registration dataset for 200 epochs using SGD with 0.98 momentum. The initial learning rate was 0.01, with an exponential decay factor of 0.95 every epoch. We trained PREDATOR under unrestricted rotation angles ranging from 0° to 360°. In addition, we adjusted some parameters, such as setting the voxel size to 0.04 and changing the sampled points in the circle loss and other details of the loss implementation.
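The two learning rate schedules described above can be written out as plain functions. This is a sketch of the standard formulas only: the minimum learning rate of the cosine schedule and the choice of a single cosine period spanning all 600 epochs are assumptions, since the text does not state them.

```python
import math

def cosine_annealing_lr(epoch, total_epochs=600, base_lr=1e-4, min_lr=0.0):
    """Cosine annealing from base_lr down to min_lr over total_epochs.

    min_lr=0.0 and a single period of total_epochs are assumptions.
    """
    cos_term = 1.0 + math.cos(math.pi * epoch / total_epochs)
    return min_lr + 0.5 * (base_lr - min_lr) * cos_term

def exponential_decay_lr(epoch, base_lr=0.01, gamma=0.95):
    """Exponential decay: the rate is multiplied by gamma once per epoch."""
    return base_lr * gamma ** epoch
```

Under these assumptions, the ROPNet rate starts at 0.0001 and reaches half of that value at epoch 300, while the PREDATOR rate drops from 0.01 to 0.0095 after the first epoch.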

  • Testing description

    For each source and target point cloud pair, we estimate two transformations, one with ROPNet and one with PREDATOR. The ROPNet transformation is the output predicted from source to target by the end-to-end ROPNet model. The PREDATOR transformation also maps source to target and is obtained with RANSAC using the features and keypoints provided by PREDATOR. We select one of the two based on our proposed ensemble rules, which are introduced in section 5.

  • Results of the comparison to other approaches

    We compare our method with RPMNet_corr [zodage2020correspondence], a variant of RPMNet [yew2020rpm] designed to help solve registration with unrestricted rotations. The results in Table 1 show that ROPNet achieves much lower registration error than RPMNet_corr when rot_level is 0. When rot_level is not restricted, the ensemble of ROPNet and PREDATOR also achieves much lower registration error than RPMNet_corr.

    Model rot_level Error(R) Error(t) MSE
    RPMNet_corr 0 12.5560 0.1674 0.3865
    ROPNet 0 1.0449 0.0193 0.0375
    RPMNet_corr 0, 1 21.9685 0.2062 0.5896
    ROPNet + PREDATOR 0, 1 3.16656 0.029237 0.08451
    Table 1: Comparison to other approaches on val set.
  • Results on other standard benchmarks

    We compared ROPNet with several classic deep learning registration networks on the ModelNet40 [wu20153d] dataset, including DCP [wang2019deep], IDAM [li2019iterative], DeepGMR [yuan2020deepgmr] and RPMNet [yew2020rpm]. We generate partial point cloud pairs following RPMNet, use the 40 categories in ModelNet40 for training, and test over the 40 categories on the test set. We evaluate registration in terms of the isotropic rotation and translation errors

    $$\mathrm{Error}(R) = \arccos\left(\frac{\mathrm{tr}\left(R_{GT}^{-1} R_{pred}\right) - 1}{2}\right), \qquad \mathrm{Error}(t) = \lVert t_{GT} - t_{pred} \rVert_2$$

    proposed in RPMNet [yew2020rpm], where $\{R_{pred}, t_{pred}\}$ and $\{R_{GT}, t_{GT}\}$ represent the predicted and the ground truth transformation respectively, and $\mathrm{tr}(\cdot)$ means the trace of a matrix. Moreover, we evaluate the anisotropic rotation and translation errors used in DCP [wang2019deep] by calculating the mean absolute error of the Euler angles and of the translation vector. Both rotation errors are in degrees. The results in Table 2 indicate that ROPNet outperforms the other methods, exceeding DCP, IDAM and DeepGMR by a large margin. This is also the reason why we chose ROPNet as our baseline method.

    Model Error(R) Error(t) MAE(R) MAE(t)
    DCP-v2 11.1723 0.1356 5.6421 0.0657
    IDAM-GNN 14.2891 0.1909 7.4966 0.0877
    DeepGMR 14.3612 0.1589 7.0914 0.0775
    RPMNet 1.4239 0.0139 0.7304 0.0065
    ROPNet 1.1567 0.0108 0.5946 0.0051
    Table 2: Results on other standard benchmarks (ModelNet40 unseen shapes).
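The isotropic errors from RPMNet [yew2020rpm] can be computed with a few lines of NumPy. The sketch below is only an illustration of that metric: the function name is an arbitrary choice, and the Euler-angle based mean absolute error is not shown because it depends on an angle convention that the text does not specify.

```python
import numpy as np

def isotropic_errors(R_pred, t_pred, R_gt, t_gt):
    """Return (rotation error in degrees, translation error).

    Rotation error is the angle of the residual rotation R_gt^{-1} R_pred;
    for rotation matrices the inverse equals the transpose.
    """
    cos_angle = (np.trace(R_gt.T @ R_pred) - 1.0) / 2.0
    # Clip to [-1, 1] to guard against floating-point round-off.
    rot_err_deg = float(np.degrees(np.arccos(np.clip(cos_angle, -1.0, 1.0))))
    trans_err = float(np.linalg.norm(t_gt - t_pred))
    return rot_err_deg, trans_err
```

For instance, if the ground truth rotation is the identity and the prediction is a half turn about the z axis, the rotation error is 180°, regardless of whether the two aligned shapes look identical.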
  • Novelty degree of the solution and whether it has been previously published

    • We proposed an end-to-end network ROPNet which may be the first work to transform partial-to-partial registration to partial-to-complete registration.

    • A simple yet effective CG module is proposed to obtain overlapping points and an initial alignment.

    • We proposed the TFMR module, which uses a Transformer to enrich point features and removes non-representative points by feature matching.

    • We proposed a few customized ensemble strategies to fuse registration networks for MVP registration challenge.

    Our preprint version of ROPNet can be accessed at which has not yet been published.

  • Comment the robustness and generality of the proposed solution(s)?

    In order to show the robustness and generality of our work, we conducted further experiments that are discussed in the following.

    To validate the generalization ability of the models, we use the first 8 categories for training and the remaining 8 categories for testing. As shown in Table 3, both ROPNet and PREDATOR generalize well. Also, comparing the second and the third row, we can see that ROPNet generalizes better than PREDATOR.

    We conducted robustness experiments on the first 8 categories, mainly considering two aspects: noise and point cloud density. First, to validate robustness to noise, we add clipped random noise to each point independently. Second, we trained on 2048 points and evaluated on 1024 points sampled from the original 2048 points. As shown in the top 3 rows of Table 4, ROPNet is robust to density variation and noise. From the bottom 2 rows, we can see that the ensemble model is robust to noise.
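The two perturbations used in these robustness experiments can be sketched as follows. The noise distribution and its parameters are not recoverable from the text here, so the Gaussian form and the values `sigma=0.01` and `clip=0.05` are placeholders chosen only for illustration, as are the function names.

```python
import numpy as np

def jitter(points, sigma=0.01, clip=0.05, rng=None):
    """Add per-point noise, clipped to [-clip, clip].

    Gaussian noise, sigma and clip are hypothetical placeholder choices.
    """
    rng = np.random.default_rng() if rng is None else rng
    noise = np.clip(rng.normal(0.0, sigma, size=points.shape), -clip, clip)
    return points + noise

def subsample(points, n=1024, rng=None):
    """Randomly keep n of the input points, without replacement."""
    rng = np.random.default_rng() if rng is None else rng
    idx = rng.choice(points.shape[0], size=n, replace=False)
    return points[idx]
```

Applying `subsample` to a 2048-point cloud mimics the reduced-density setting, and `jitter` mimics the noisy setting; both can be applied to the source and target clouds independently.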

    Furthermore, we conducted generalization and robustness experiments on ModelNet40 partial-to-partial registration. To validate generalization, we use the first 20 categories for training and the remaining 20 categories for testing. We evaluate the robustness of the model in the presence of noise sampled in the same way as before. As shown in Table 5, ROPNet is robust to noise and generalizes well.

    However, there are still some hard or ambiguous cases that we cannot register well. The last row in Figure 6 shows some failure cases in the validation set whose rotation errors are larger than 10°.

    Model Error(R) Error(t) MSE
    ROPNet(8) + PREDATOR(8) 6.0865 0.0408 0.1470
    ROPNet(16) + PREDATOR(8) 5.5636 0.0303 0.1274
    ROPNet(8) + PREDATOR(16) 4.1196 0.0290 0.1009
    ROPNet(16) + PREDATOR(16) 3.8302 0.0251 0.0920
    Table 3: Results on MVP registration unseen categories. The number in brackets indicates the number of training categories.
    Model rot_level Noise npoints Error(R) Error(t) MSE
    ROPNet 0 2048 1.2599 0.0199 0.0418
    ROPNet 0 1024 1.5246 0.0264 0.0530
    ROPNet 0 1024 1.6637 0.0291 0.0581
    ROPNet + PREDATOR 0, 1 2048 1.9003 0.0260 0.0592
    ROPNet + PREDATOR 0, 1 2048 2.4904 0.0249 0.0684
    Table 4: Results on the first 8 categories of MVP to validate the robustness.
    Methods Noise
    ROPNet 1.1637 0.0116 0.6190 0.0055
    ROPNet 1.4656 0.0145 0.7799 0.0070
    Table 5: Results on ModelNet40 unseen categories.
  • Comment the efficiency of the proposed solution(s)?

    We evaluate the inference speed of ROPNet, PREDATOR and ROPNet + PREDATOR independently. Both the source and the target point cloud have 2048 points. Tests were run on a single GeForce GTX TITAN X with an Intel(R) Core(TM) i7-6700K CPU @ 4.00GHz and 32GB RAM, which is different from the training environment. As shown in Table 6, registering one pair takes 0.152s, 0.457s and 0.768s with ROPNet, PREDATOR and the ensemble model, respectively. By reducing the RANSAC iterations in PREDATOR from 2M to 0.1M, the ensemble model becomes faster with a slight performance decrease.

    Model rot_level time (s) Error(R)
    ROPNet 0 0.152 1.0449
    PREDATOR (RANSAC 0.1M) 0, 1 0.270 7.6118
    PREDATOR (RANSAC 2M) 0, 1 0.457 6.8290
    ROPNet + PREDATOR (RANSAC 0.1M) 0, 1 0.580 3.86217
    ROPNet + PREDATOR (RANSAC 2M) 0, 1 0.768 3.16656
    Table 6: Inference speed for different models.

5 Ensembles and Fusion Strategies / Ablation Studies (if any)

  • Describe in detail the use of ensembles and/or fusion strategies (if any).

    (a1) 1-89-predator-180 (a2) 1-89-ropnet-0.3 (b1) 4-308-predator-176 (b2) 4-308-ropnet-32
    (c1) 14-1081-predator-14 (c2) 14-1081-ropnet-0.8 (d1) 15-1155-predator-180 (d2) 15-1155-ropnet-1
    Figure 5: Visualization of registration on validation set. The source, target and predicted point cloud are in green, red and blue, respectively. The description below denotes category-id-model-Error(R).

    We applied ensemble models based on the observation that some point cloud pairs in the MVP dataset are ambiguous and challenging, as shown in Figure 5:

    • (a1)(a2) denotes registration of plane-oriented categories.

    • (b1)(b2) denotes registration of rotational-symmetry categories.

    • (c1)(c2) denotes registration with very low overlap.

    • (d1)(d2) denotes registration of axisymmetric categories.

    • (a1)(b1)(d1) are categories which are ambiguous for registration.

    For example, as shown in Figure 5 (a1)(a2), both registration results look reasonable intuitively. However, when we calculate the rotation error against the ground truth transformation, we obtain an error of 180° with PREDATOR and 0.3° with ROPNet. Figure 5 (b1)(b2) and (d1)(d2) lead to the same conclusion. We consider all of these to be ambiguous pairs. For the low-overlap pair in Figure 5 (c1), the RANSAC implementation of Open3D [zhou2018open3d] v0.9 used in PREDATOR, whose evaluation criterion is based on overlap, tends to favor a higher overlap, which is not reasonable for pairs like (c1). Besides, as shown in Figure 2, when the sampled points are concentrated in the same area during each RANSAC iteration, RANSAC can hardly reach a proper registration result.

    Based on the above observations, we try to fuse ROPNet and PREDATOR on the MVP registration challenge. Here, we designed four rules to select the final transformation based on overlap and rotation matrix.

    First, let us introduce the quantities involved. We reuse the two source-to-target transformations defined in section 4, one from ROPNet and one from PREDATOR. We also estimate the transformation from the target point cloud to the source point cloud with ROPNet. We then calculate the overlap between the transformed source point cloud and the target point cloud for each of the two source-to-target transformations. Each transformation consists of a rotation matrix and a translation vector. We designed the following rules:

    • : If rotation error between and is smaller than the defined threshold , we select with high confidence.

    • : If (in degree) is smaller than the threshold , we select with high confidence.

    • : If is smaller than the threshold , we select with high confidence

    • : If , we select with high confidence. is the pre-defined threshold.

    We set different thresholds for each category on the validation set. We fuse the above rules using the following predicate:

    to decide which model we should use.

    If the predicate evaluates to True, we select ROPNet; otherwise, we select PREDATOR. Experiments on the validation set showed that this is effective for most pairs, with few failure cases.
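The exact rules, thresholds and fused predicate are not recoverable from the text here, so the following is only a hypothetical sketch of how a selection of this kind could be wired together from the quantities named above: agreement between the two rotations, consistency between the forward and backward ROPNet estimates, and overlap after alignment. Every function name, threshold value and the way the conditions are combined is an assumption made for illustration, not the rule set used in the challenge.

```python
import numpy as np

def rotation_angle_deg(R_a, R_b):
    """Angle in degrees of the relative rotation between R_a and R_b."""
    cos_angle = (np.trace(R_a.T @ R_b) - 1.0) / 2.0
    return float(np.degrees(np.arccos(np.clip(cos_angle, -1.0, 1.0))))

def overlap_ratio(src, tgt, R, t, radius=0.05):
    """Fraction of transformed source points with a target point within radius.

    Brute-force nearest neighbour search; radius is a hypothetical value.
    """
    moved = src @ R.T + t
    d2 = ((moved[:, None, :] - tgt[None, :, :]) ** 2).sum(axis=-1)
    return float((d2.min(axis=1) < radius ** 2).mean())

def prefer_ropnet(R_rop, t_rop, R_rop_back, R_pred, t_pred, src, tgt,
                  tau_agree=5.0, tau_cycle=5.0, tau_overlap=0.0):
    """Hypothetical predicate: True selects ROPNet, False selects PREDATOR."""
    # The two models agree on the rotation.
    agree = rotation_angle_deg(R_rop, R_pred) < tau_agree
    # Backward estimate (target to source) roughly inverts the forward one.
    cycle_ok = rotation_angle_deg(R_rop_back @ R_rop, np.eye(3)) < tau_cycle
    # ROPNet alignment overlaps at least as well as the PREDATOR alignment.
    gain = (overlap_ratio(src, tgt, R_rop, t_rop)
            - overlap_ratio(src, tgt, R_pred, t_pred))
    return bool(agree or (cycle_ok and gain >= tau_overlap))
```

The brute-force distance matrix is quadratic in the number of points, which is acceptable for clouds of 2048 points but would call for a spatial index on larger inputs.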

  • What was the benefit over the single method?

    As observed in Figure 5, some pairs are ambiguous, have low overlap or are under unrestricted rotation. ROPNet handles registration problems with small rotation angles well, even when the pair has low overlap, but it struggles with registration under large rotations. PREDATOR can solve registration problems under unrestricted angles, but it is not ideal for registering object-centric point clouds, especially pairs with low overlap, planar structures or symmetric structures. Considering these observations, we fused the two models and achieved better performance in the MVP registration track.

  • What were the baseline and the fused methods?

    We can see the quantitative results in Table 7 and Table 8. More visualization results can be seen in Figure 6.

    Model rot_level Error(R) Error(t) MSE
    ROPNet 0 1.0449 0.0193 0.0375
    PREDATOR 0 6.5910 0.0430 0.1581
    PREDATOR 0, 1 6.8290 0.0413 0.1605
    Table 7: Evaluation on validation set based on single model.
    label 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 total(validation) total(test)
    Error(R) 0.4579 2.7508 0.6073 0.6676 8.2973 0.4111 5.3276 1.5040 5.1859 5.3121 8.7753 3.2399 0.2696 0.5272 1.5929 5.7384 3.16656 2.96546
    Error(t) 0.0198 0.0517 0.0152 0.0143 0.0392 0.0131 0.0855 0.0281 0.0258 0.0305 0.0434 0.0255 0.0115 0.0111 0.0192 0.0339 0.029237 0.02632
    MSE 0.0278 0.0997 0.0258 0.0260 0.1840 0.0202 0.1785 0.0544 0.1163 0.1233 0.1966 0.0820 0.0162 0.0203 0.0470 0.1341 0.08451 0.07808
    Table 8: Evaluation on validation and test set based on ensemble model.
    Figure 6: Registration visualization on MVP registration. The first 4 rows show registration results on the test set. The last row shows registration results on the validation set whose rotation errors are larger than 10°. The source and target point clouds are marked in green and red, respectively. The blue one is the source point cloud transformed with the estimated transformation.

6 Reproducibility Details

  • Implementation details (including language, platform, parallelization and memory requirements)

    We have tested our code on Ubuntu 16.04, Python 3.7, PyTorch (1.7.1+cu101), torchvision (0.8.2+cu101), GCC 5.4.0 and Open3D 0.9.0. Almost all experiments were run on a Tesla V100 GPU with an Intel 6133 CPU @ 2.50GHz and 320GB RAM. Some of the test experiments were run on a single GeForce GTX TITAN X with an Intel(R) Core(TM) i7-6700K CPU @ 4.00GHz and 32GB RAM.

  • Training/testing time?

    It took about 7 minutes to train ROPNet for one epoch on the MVP registration data (with 2048 points), and about 70 hours to train for 600 epochs. It took about 18 minutes to train PREDATOR for one epoch, and about 60 hours to train for 200 epochs. The training speed was measured on a shared server while many other tasks were also using the GPU, CPU and memory, so training may be faster on an otherwise idle machine. Finally, it took 0.768s to obtain the registration result for a pair of point clouds with 2048 points. More details about testing time can be seen in Table 6.

7 Feedback

  • General comments on the MVP Challenge 2021.

    We found participating in the point cloud registration track of the MVP Challenge 2021 very interesting and challenging. The data set covers different overlaps, densities and unrestricted rotation angles, and differs in several ways from ModelNet40, which is used as a registration benchmark in many recent works. We learned a lot from the competition and believe it will help us design more robust registration algorithms that take multiple factors into consideration.

    Unfortunately, we are a little disappointed with one thing: the data quality and metrics. As we observed in the validation set, there are some pairs with purely planar objects (category 1) or symmetrical objects (categories 4, 6, 9, 15). Some ground truth results are not reasonable under the metrics used in this challenge, as we described in section 5. However, by re-cleaning the data set and improving the evaluation metrics, we think this issue can be solved, and a new and unified point cloud registration benchmark can be produced for the community.

  • What do you expect on a new competition on point cloud related tasks?

    In our opinion, as mentioned before, re-cleaning the data set, improving the evaluation metrics, and then setting up a new, unified point cloud registration benchmark (like 3DMatch [zeng20173dmatch] or KITTI [geiger2012we]) for the registration community would be a good choice. As far as we know, some point cloud registration works are not compared fairly on ModelNet40, because they may use different sampled points, different transformation matrices and different partial point cloud generation procedures. Besides, the overlap size, rotation angles and point density are set to easier configurations.

  • Other comments (if any): encountered difficulties, proposed tracks, proposed evaluation metric(s), proposed challenge platform, etc.

    Evaluation metrics should be improved, for example by considering the chamfer distance between the transformed point cloud and the complete point cloud, or by evaluating registration performance on symmetric and asymmetric objects separately.
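To illustrate why a chamfer-style metric would treat symmetric objects more fairly, here is a minimal NumPy sketch. Chamfer distance has several common variants; the version below uses mean squared nearest-neighbour distances in both directions, which is an assumption rather than a definition taken from the challenge.

```python
import numpy as np

def chamfer_distance(a, b):
    """Symmetric chamfer distance between point sets a (N, 3) and b (M, 3).

    Uses squared distances averaged in both directions (one common variant).
    """
    d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1)
    return float(d2.min(axis=1).mean() + d2.min(axis=0).mean())

# A flat, symmetric square grid of points in the z = 0 plane.
xs, ys = np.meshgrid(np.linspace(-1.0, 1.0, 5), np.linspace(-1.0, 1.0, 5))
plane = np.stack([xs.ravel(), ys.ravel(), np.zeros(xs.size)], axis=1)

# A half turn about the z axis maps the grid onto itself.
half_turn = np.diag([-1.0, -1.0, 1.0])
flipped = plane @ half_turn.T
```

For this grid, `chamfer_distance(plane, flipped)` is zero even though the rotation error of the half turn relative to the identity is 180°, which mirrors the ambiguous planar and symmetric cases discussed in section 5.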

    Also, there is a small shortcoming in the platform, in which each user can hide their submissions or best result.