Multi-View Partial (MVP) Point Cloud Challenge 2021 on Completion and Registration: Methods and Results

12/22/2021
by   Liang Pan, et al.

As real-scanned point clouds are mostly partial due to occlusions and viewpoints, reconstructing complete 3D shapes from incomplete observations is a fundamental problem in computer vision. With a single incomplete point cloud, the task becomes partial point cloud completion; given multiple different observations, 3D reconstruction can be addressed by partial-to-partial point cloud registration. Recently, a large-scale Multi-View Partial (MVP) point cloud dataset has been released, consisting of over 100,000 high-quality virtual-scanned partial point clouds. Based on the MVP dataset, this paper reports the methods and results of the Multi-View Partial Point Cloud Challenge 2021 on Completion and Registration. In total, 128 participants registered for the competition, and 31 teams made valid submissions. We analyze the top-ranked solutions and then discuss future research directions.


1 Introduction

3D reconstruction for point clouds has been extensively explored over the past decades. Thanks to the recent rapid development of deep learning, many researchers have studied learning-based approaches to single-view completion [30, 10, 12, 31, 28] and multi-view registration [17, 6, 29, 3, 4, 11, 33] for high-quality 3D reconstruction. However, completion and registration for partial point clouds are far from fully resolved by existing methods.

Recently, we established a versatile multi-view partial (MVP) point cloud dataset [12], which contains over 100,000 high-quality virtual-scanned partial point clouds and complete point clouds. Employing the MVP dataset [12], we organized the Multi-View Partial Point Cloud Challenge 2021 on Completion and Registration (MVP Challenge; challenge website: https://competitions.codalab.org/competitions/33430), collocated with the Workshop on Sensing, Understanding and Synthesizing Humans at ICCV 2021 (workshop website: https://sense-human.github.io/). The MVP Challenge lasted nine weeks, from Jul. 12th, 2021 to Sep. 12th, 2021, and its goal was to boost research on point cloud completion and registration. A total of 128 participants registered for the competition, and 31 teams made valid submissions. For fair comparison, all participants were restricted to training their models on our prepared training data only. On Oct. 18th, 2021, the top-3 ranked approaches for each track were selected and awarded.

In the following sections, we will introduce the completion track (Sec. 2) and the registration track (Sec. 3) of the MVP challenge. For each track, we will describe the settings, analyze the top-ranked solutions, and discuss potential future research directions.

2 Single-View Partial Point Cloud Completion

Overview.

Given a partial observation, point cloud completion aims to reconstruct the complete 3D shape. After registering for the MVP challenge, each team could submit its completion results for evaluation on the CodaLab platform. All models are required to be trained using our prepared training data only, and models that achieve the best performance on the test set are expected to also provide high-quality completion results on the extra-test set. We highlight that no additional strategies, such as pre-training, are allowed.

Dataset.

The MVP Challenge 2021 on Point Cloud Completion mainly employs the MVP dataset [12] that we proposed at CVPR 2021. The MVP dataset is a large-scale multi-view partial point cloud dataset containing over 100,000 high-quality scans; it renders partial 3D shapes from 26 uniformly distributed camera poses for each 3D CAD model. It provides a training set with 62,400 partial-complete point cloud pairs and a test set with 41,800 pairs. In addition, we generated an extra-test set of 59,800 partial-complete point cloud pairs in the same fashion, which is used for evaluating the completion methods in this challenge. We suggest that future research use only the test set for evaluation instead of the extra-test set. Notably, each partial and each complete point cloud has 2,048 points.

Evaluation Metric.

Considering computation efficiency, we use the symmetric Chamfer Distance (CD) loss for evaluating completion methods. Formally, the CD loss can be formulated as:

$$\mathcal{L}_{CD}(P_1, P_2) = \frac{1}{|P_1|}\sum_{x \in P_1} \min_{y \in P_2} \lVert x - y \rVert_2^2 + \frac{1}{|P_2|}\sum_{y \in P_2} \min_{x \in P_1} \lVert x - y \rVert_2^2, \qquad (1)$$

where $x$ and $y$ denote points that belong to the two point clouds $P_1$ and $P_2$, respectively.
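For concreteness, a minimal PyTorch sketch of this symmetric CD loss is given below (brute-force pairwise distances; the function and variable names are ours, not part of the official evaluation code):

```python
import torch

def chamfer_distance(p1: torch.Tensor, p2: torch.Tensor) -> torch.Tensor:
    """Symmetric Chamfer Distance between point clouds p1 (N, 3) and p2 (M, 3).

    Brute-force O(N*M) pairwise distances, which is fine for 2,048-point clouds.
    """
    d = torch.cdist(p1, p2) ** 2              # (N, M) squared Euclidean distances
    return d.min(dim=1).values.mean() + d.min(dim=0).values.mean()

# Example with two random 2,048-point clouds:
p1, p2 = torch.rand(2048, 3), torch.rand(2048, 3)
print(chamfer_distance(p1, p2))
```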

Results.

The benchmark results are reported in Table 1. In the following subsections, we summarize the top teams' methods and experiments according to their submitted reports.

Method | Ranking | CD (×10^4)
PoinTr++ | 1 | 5.01
TSPCC | 2 | 5.01
SPTNet | 3 | 5.15
Table 1: Top team results in the MVP Completion Challenge 2021.

2.1 Solution of First Place

PoinTr++: Enhanced Geometry-Aware Transformers with Iterative Refinement


Team Members: Xumin Yu, Yongming Rao, Jiwen Lu, and Jie Zhou


General Method Description

Overall, the champion team uses PoinTr [28] to complete a point cloud from a partial input (shown in Fig. 1). Then they use multiple refinement blocks to iteratively denoise the prediction to produce the final point cloud.

  • Concatenation + refinement pipeline: Many previous works such as PCN [30] and TopNet [15] adopt a reconstruction-style pipeline, encoding the input point cloud as a single feature and reconstructing the completed point cloud with a decoder such as FoldingNet [26]. PoinTr [28], in contrast, adopts a concatenation strategy, where the final prediction is the concatenation of the input and the model output. The reconstruction pipeline struggles to keep the details of the input point clouds, while the concatenation pipeline faces the problem that the final prediction may be discontinuous in appearance. They propose a pipeline that combines these two strategies: reconstruction modules are added on top of the concatenated point clouds to further refine the results and make the final point cloud smooth and continuous. To keep the details of the input, the refinement blocks only predict a position shift vector for each point. The proposed pipeline thus effectively combines the advantages of concatenation-based and reconstruction-based methods.

  • Iterative refinement: They further investigate the refinement strategy and propose an iterative refinement method, which refines the predicted point clouds using several refinement blocks in sequence. They concatenate the original input with the prediction from the previous step and send it into the next refinement block.

    Figure 1: The pipeline of PoinTr++. PoinTr [28] completes a point cloud from a partial input; refinement blocks then iteratively denoise the predicted point cloud.

    A straightforward way to implement the iterative refinement is to add several refinement modules after PoinTr and optimize them in an end-to-end manner. However, this causes two problems: 1) a deeper model is harder to optimize; 2) end-to-end training brings heavy computational cost and GPU memory consumption. Therefore, they propose to iteratively add refinement blocks after the base model during training. When adding one more refinement block, they freeze the model before it and train only the newly added module (see the sketch below), which saves most of the computation cost and decomposes the optimization problem into several simpler sub-problems.

    In their experiments, they select RENet, proposed in the VRCNet paper [12], as their refinement block because of its high performance. The module takes the concatenated point clouds as input and refines the point cloud using a hierarchical encoder-decoder architecture with Edge-preserved Pooling (EP) and Edge-preserved Unpooling (EU) modules, which effectively learn multi-scale structural relations.
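The stage-wise scheme can be sketched as follows; the `IterativeRefiner` wrapper and the placeholder modules are illustrative, not the authors' code:

```python
import torch
import torch.nn as nn

class IterativeRefiner(nn.Module):
    """Schematic stage-wise wrapper: earlier modules are frozen, and only
    the most recently added refinement block receives gradients."""

    def __init__(self, base: nn.Module):
        super().__init__()
        self.stages = nn.ModuleList([base])   # stage 0: the PoinTr base model

    def add_stage(self, block: nn.Module):
        for p in self.parameters():
            p.requires_grad = False           # freeze base + earlier blocks
        self.stages.append(block)

    def forward(self, partial: torch.Tensor) -> torch.Tensor:
        pred = self.stages[0](partial)
        for block in self.stages[1:]:
            # concatenate the original input with the current prediction and
            # let the block predict a per-point position shift
            pred = pred + block(torch.cat([partial, pred], dim=1))
        return pred

# Each stage optimizes only the trainable (newly added) parameters:
# opt = torch.optim.AdamW((p for p in model.parameters() if p.requires_grad), lr=1e-4)
```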

Model | CD (×10^4) | F1-Score@1%
PCN [30] | 9.77 | 0.320
TopNet [15] | 10.11 | 0.308
CRN [16] | 7.25 | 0.434
ECG [10] | 6.64 | 0.476
VRCNet [12] | 5.96 | 0.499
PoinTr [28] | 6.15 | 0.456
PoinTr+ | 5.13 | 0.511
PoinTr++ | 4.93 | 0.525
Table 2: Comparison with existing methods on the MVP validation set. Both the input and output contain 2,048 points. PoinTr+ uses a single refinement module to produce the final prediction. CD loss multiplied by 10^4.

Training Description

The training process of the concatenation-refinement pipeline is multi-stage: each refinement module is trained in turn. In their experiments on the MVP benchmark, they add two refinement blocks after PoinTr, so two-stage training is used.

In the first stage of the training phase, they jointly train a refinement module and PoinTr [28], with the batch size set to 32 and weight decay applied. The hidden dimension of PoinTr is set to 384. During training, they use the AdamW optimizer with a WarmingUpCosLR scheduler. The L1 Chamfer distance (CD-$\ell_1$) is adopted as the training loss. Specifically, they calculate CD-$\ell_1$ between four predicted point clouds (three coarse-grained predictions and one fine-grained prediction) and the ground truth, and adaptively adjust the weights of these four loss terms to obtain the final weighted loss. In the first 10 epochs, they use the weights [1, 1, 0.5, 0.1]; in the second 10 epochs, the weights [1, 1, 1, 0.5]; for the remaining epochs, the weights [1, 1, 1, 1]. The total loss is normalized by dividing by the sum of the weights. They use an early-stop strategy in the first stage of training, stopping when the network converges on the test set.
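The epoch-dependent weighting can be written compactly; the helper names below are ours:

```python
def loss_weights(epoch: int):
    """Weights for the three coarse CD terms and one fine CD term."""
    if epoch < 10:
        return [1.0, 1.0, 0.5, 0.1]
    if epoch < 20:
        return [1.0, 1.0, 1.0, 0.5]
    return [1.0, 1.0, 1.0, 1.0]

def total_loss(cd_terms, epoch: int) -> float:
    """Weighted sum of the four CD-l1 terms, normalized by the weight sum."""
    w = loss_weights(epoch)
    return sum(wi * li for wi, li in zip(w, cd_terms)) / sum(w)
```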

In the second stage, they add a refinement block after the first one. The weights of PoinTr and the first refinement block come from the first stage, and the additional refinement block is initialized with the weights of the first refinement block. In this stage, they freeze PoinTr and the first refinement block and set the batch size to 16. All other hyper-parameters are the same as in the first stage.

They stop adding refinement blocks once performance no longer improves significantly on the MVP benchmark; therefore, they use only two refinement modules for the sake of efficiency.


Testing Description

In the test phase, they follow the standard point cloud completion procedure: a partial point cloud containing 2,048 points is sent through the trained model to obtain a completed point cloud with 2,048 points. Comparisons against other approaches are reported in Table 2.

Figure 2: The overall network architecture. The generation stage produces complete point clouds that are robust to diversified incomplete structures; the refinement stage then refines the complete point clouds using discriminative underlying attributes, namely the category label and the global representation.

2.2 Solution of Second Place

Robust and Discriminative Two-Stage Point Cloud Completion with Semantic Refinement and IOI augmentation


Team Members: Mingye Xu, Xiaoyuan Luo, Kexue Fu, Peng Gao, Manning Wang, Yali Wang, and Yu Qiao.

denotes corresponding author

Figure 3: Incompletion-of-Incompletion data augmentation. They crop the original training incomplete point clouds randomly as the new incomplete input, and treat the original incomplete point cloud as the ground truth.

General Method Description

Fig. 2 illustrates the structure of their two-stage point cloud completion network, which consists of a generation stage (Net1: VRCNet [12] with IOI data augmentation) and a refinement stage (Net2: their Conditional Refining Network). The IOI data augmentation increases the diversity of the incomplete point clouds to boost the generality of the generation network. The Conditional Refining Network (CRNet) then performs more detailed refinement with the aid of semantic category information and shape codes.

Figure 4: Their Conditional Refining Network includes three consecutive multi-scale SPD modules and uses a conditional modulation module to adjust the displacement features.

Robust Point Cloud Generation: VRCNet with IOI Augmentation.

Their generation network is built upon VRCNet [12], which consists of two consecutive encoder-decoder sub-networks that serve as "probabilistic modeling" (PMNet) and "relational enhancement" (RENet). PMNet embeds global shape representations and latent distributions from the partial inputs and generates coarse skeletons. RENet then strives to enhance structural relations by learning multi-scale local point features, reconstructing fine complete point clouds from the coarse skeletons.

  • IOI (Incompletion-of-Incompletion) Augmentation: To increase the robustness of point cloud generation, they propose a novel Incompletion-of-Incompletion (IOI) data augmentation method. As Figure 3 shows, they randomly crop the incomplete point cloud and feed it to the model to reconstruct the original incomplete point cloud (a minimal sketch of such cropping appears after this list). This augmentation aims to increase the diversity of the global features and latent distributions from PMNet and gives RENet greater generalization capacity to variations of incomplete structures. As verified by their experiments in Table 5, the proposed data augmentation indeed improves the performance of the completion network.

  • Self-Supervised Pretraining by Point Cloud Reconstruction: They also investigate a pre-training mechanism. Self-supervised reconstruction pre-training provides a good initialization for the downstream completion fine-tuning task; it leads to wider optima and is easier to optimize than training from scratch. This also improves performance, as shown in Table 5.
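A minimal sketch of the IOI cropping is shown below. The exact cropping scheme is not specified in the report; removing the points on one side of a random direction is our assumption:

```python
import torch

def ioi_crop(partial: torch.Tensor, keep_ratio: float = 0.75) -> torch.Tensor:
    """Incompletion-of-Incompletion: crop an already-partial cloud again.

    The cropped cloud becomes the new network input, while `partial` itself
    serves as the reconstruction ground truth. The cropping rule here
    (remove the slice of points farthest along a random direction) is an
    assumption, not necessarily the authors' exact method.
    """
    n = partial.shape[0]
    direction = torch.randn(3)
    direction = direction / direction.norm()
    proj = partial @ direction                    # scalar projection per point
    keep = proj.argsort()[: int(n * keep_ratio)]  # drop the far slice
    return partial[keep]
```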

Figure 5: Conditional Modulation module.

Discriminative Point Cloud Refinement: Condition Refining Network with Semantic Guidance.

Their refining network aims to refine the complete point cloud with more geometric details and more semantic information. Figure 4 shows the structure of their Conditional Refining Network (CRNet), where the Conditional Modulation module effectively adjusts the point-wise representation with semantic guidance, while the Multi-Scale SPD module refines the point cloud to expose more geometric structure via multi-scale context aggregation. Details are described below.

Figure 6: Multi-scale SPD Module with multi-scale skip transformers.
Method | CD (×10^4) | F1-Score@1%
PCN [30] | 9.77 | 0.320
TopNet [15] | 10.11 | 0.308
MSN [8] | 7.90 | 0.432
Wang et al. [16] | 7.25 | 0.434
ECG [10] | 6.64 | 0.476
VRCNet [12] | 5.96 | 0.499
CRNet | 5.27 | 0.535
Table 3: Shape completion results (CD loss multiplied by 10^4) on the MVP dataset at a resolution of 2,048 points.
Method | CD (×10^4) | F1-Score@1%
PCN [30] | 6.02 | 0.638
TopNet [15] | 6.36 | 0.601
MSN [8] | 4.90 | 0.710
Wang et al. [16] | 4.30 | 0.740
ECG [10] | 3.58 | 0.753
GRNet [25] | 3.87 | 0.692
NSFA [32] | 3.77 | 0.783
VRCNet [12] | 3.02 | 0.796
CRNet | 2.51 | 0.824
Table 4: Shape completion results (CD loss multiplied by 10^4) on the MVP dataset at a resolution of 16,384 points.
  • Conditional Modulation Module: The utilization of underlying shape attributes (global shape codes and semantic category information) can encourage the local representation to move closer to the global discrimination of the same object, and can thus guide point cloud refinement. Existing methods only merge the global information by concatenating it with the local representation; however, concatenation is not effective enough, and it largely increases the number of MLP weights (Model F in Table 5). These methods also ignore the important category information, which contains discriminative semantics. To this end, they propose a lightweight Conditional Modulation Module for point cloud refinement. Besides adjusting the global point cloud representation, the proposed module can easily be extended to learn local enhancement effects for point cloud refinement.

    Model | Base model | Generation (Net1) | Refinement (Net2) | Strategy description | CD (Test) | CD (Extra-Test)
    A (baseline) | - | VRCNet | - | - | 5.96 | 6.08
    B | A | VRCNet | - | Self-supervised pre-training | 5.78 (0.18) | 5.91 (0.17)
    C | A | VRCNet | - | IOI augmentation | 5.83 (0.13) | 5.93 (0.15)
    D | C | VRCNet | Spatial Refiner [7] | Add refining module | 5.66 (0.20) | 5.76 (0.17)
    E | C | VRCNet | SPD [24] | Add refining module | 5.41 (0.42) | 5.50 (0.43)
    F | E | VRCNet | CRNet (SPD) | Concatenate shape codes | 5.41 (0.00) | 5.50 (0.00)
    G | E | VRCNet | CRNet (SPD) | Add Conditional Modulation module | 5.32 (0.09) | 5.41 (0.09)
    H | G | VRCNet | CRNet (Multi-scale SPD) | Add multi-scale skip transformers | 5.27 (0.05) | 5.35 (0.06)
    Table 5: Ablation studies of the method on the MVP completion challenge; gains are shown in parentheses. Training data: MVP training set; Test: MVP test set; Extra-Test: MVP extra-test set. (CD loss multiplied by 10^4.)
    Model | Base model | Generation (Net1) | Refinement (Net2) | Strategy description | CD (Public Test)
    A (baseline) | - | VRCNet | - | - | 5.79
    C | A | VRCNet | - | IOI augmentation | 5.61 (0.18)
    D | C | VRCNet | Spatial Refiner [7] | Add refining module | 5.58 (0.03)
    E | C | VRCNet | SPD [24] | Add refining module | 5.33 (0.28)
    G | E | VRCNet | CRNet (SPD) | Conditional modulation module | 5.06 (0.27)
    H | G | VRCNet | CRNet (Multi-scale SPD) | Add multi-scale skip transformers | 5.01 (0.05)
    Table 6: Ablation studies of the method on the MVP completion challenge; gains are shown in parentheses. Training data: MVP training set and test set; Public Test: MVP extra-test set. (CD loss multiplied by 10^4.)

    As Figure 5 shows, to give the network the ability to handle operations that require semantic category information and global shape codes, they modulate the intermediate displacement features of the CRNet as follows:

    $$\hat{F} = \gamma \odot \phi(F) + \beta, \qquad (2)$$

    where $\odot$ denotes element-wise multiplication, $\phi$ is a stack of MLP layers, $F$ is the intermediate displacement feature from the Conditional Refining Network, and $\gamma, \beta$ are affine parameters estimated from the point cloud category label $c$ and the point cloud global code $g$ from the previous generation network:

    $$\gamma = \psi_{\gamma}([c; g]), \qquad (3)$$
    $$\beta = \psi_{\beta}([c; g]), \qquad (4)$$

    where $\psi_{\gamma}$ and $\psi_{\beta}$ are also MLP layers (a sketch of this modulation appears after this list).

    They use the conditional vector $\beta$ to shift the cluster centers of the local representation and the conditional vector $\gamma$ to fine-tune the variance in the feature space. Thus, they achieve global adjustment of point features with only a few parameters. Local features of the same object are encouraged to be closer to each other than to the features of other objects, so the local representation of each object is shaped by its distinct semantic information and global shape code. As a result, the model is less easily confused by similar local structures appearing under different semantic categories.

  • Multi-Scale SPD module: To recover fine local geometric details on the complete shape, existing methods [16, 30, 32] usually adopt a folding-based strategy [26] to learn different displacements for duplicated points. However, the folding-based strategy ignores the local shape characteristics contained in the original points, since the same 2D grids are used for sampling. Differently, SnowflakeNet [24] uses SPD (Snowflake Point Deconvolution) to reformulate the generation of child points from parent points as a snowflake-like growing process, where the shape characteristics embedded in the parent point features are extracted and inherited by the child points through a point-wise splitting operation. It also introduces a skip-transformer [24] to learn splitting patterns in the SPD module, capturing shape context and the spatial relationships between child and parent points.

    Figure 7: The framework of SPTNet.

    Their CRNet aims to refine the local geometric details of the complete point cloud. Inspired by SnowflakeNet [24], they use a structure similar to SPD. Different from SPD [24], their input is the complete point cloud predicted by Net1, and they do not use the point-wise splitting operation to increase the number of points. Instead, they only predict coordinate offsets for each point in their multi-scale SPD module, as Fig. 6 shows. To progressively refine the local geometric details, three multi-scale SPD modules are used in their Conditional Refining Network, as shown in Fig. 4. To help consecutive multi-scale SPD modules refine points coherently, they use a skip-transformer to learn and refine the spatial context across layers. Moreover, to improve robustness to variations in local structure, they apply multi-scale skip transformers with different local regions in their multi-scale SPD module.

    As Fig. 6 shows, in the $i$-th multi-scale SPD module, they take the refined point cloud from the previous layer as $P_{i-1}$ and extract a per-point feature $Q_i$ from $P_{i-1}$ and the global shape code with a basic PointNet [13]. They then send the displacement feature $H_{i-1}$ from the previous conditional modulation module together with $Q_i$ into two skip transformers with different local regions for local feature learning. The multi-scale local features are fed to an MLP to obtain the displacement feature $H_i$ of the current layer, from which the point displacement $\Delta P_i$ is generated:

    $$\Delta P_i = \tanh(\mathrm{MLP}(H_i)), \qquad (5)$$

    where $\tanh$ is the hyperbolic tangent activation. Finally, the output point cloud is updated as:

    $$P_i = P_{i-1} + \Delta P_i. \qquad (6)$$
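A sketch of the conditional modulation of Eqs. (2)-(4) in PyTorch is given below; the module name and layer sizes are illustrative assumptions, not the authors' exact configuration:

```python
import torch
import torch.nn as nn

class ConditionalModulation(nn.Module):
    """FiLM-style modulation following Eqs. (2)-(4): affine parameters gamma
    and beta are predicted from the category label and global shape code.
    Dimensions are illustrative, not the authors' exact configuration."""

    def __init__(self, feat_dim=128, num_classes=16, code_dim=1024):
        super().__init__()
        cond_dim = num_classes + code_dim
        self.phi = nn.Sequential(nn.Linear(feat_dim, feat_dim), nn.ReLU(),
                                 nn.Linear(feat_dim, feat_dim))
        self.to_gamma = nn.Linear(cond_dim, feat_dim)   # Eq. (3)
        self.to_beta = nn.Linear(cond_dim, feat_dim)    # Eq. (4)

    def forward(self, feat, category_onehot, shape_code):
        # feat: (B, N, C) displacement features; condition broadcast over N points
        cond = torch.cat([category_onehot, shape_code], dim=-1)
        gamma = self.to_gamma(cond).unsqueeze(1)        # (B, 1, C)
        beta = self.to_beta(cond).unsqueeze(1)
        return gamma * self.phi(feat) + beta            # Eq. (2)
```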

Training description

  • Stage one: They train the generation network with the IOI augmentation, following the same training strategy and settings as VRCNet [12]: the Adam optimizer, a batch size of 64, and 150 epochs in total.

  • Stage two: They add the refining network (their proposed CRNet) and remove the data augmentation. The batch size is set to 64, and 70 epochs are executed in total. In this stage, they train only the refining network; the generation part (Net1 in Fig. 2) is fixed after the first-stage training.


Testing Description

They follow the official testing strategy without any post-processing. Following [10], they compare their method with other evaluated methods on the original MVP dataset (2,048 points and 16,384 points); the evaluated CD loss and F-score are reported in Tables 3 and 4. Their method outperforms the other methods in terms of both CD and F-score@1%.

2.3 Solution of Third Place

Learning Spherical Point Transformation for Point Cloud Completion


Team Members: Junsheng Zhou, Xin Wen, Peng Xiang, Yu-Shen Liu, and Zhizhong Han

denotes equal contribution

denotes corresponding author


General Method Description

Inspired by PMPNet [21], which learns point moving paths between the incomplete and complete points, they design a novel deep neural network for point cloud completion by learning a spherical point transformation (SPTNet).

As shown in Fig. 7, they first randomly sample points on the standard sphere and generate a complete point cloud by moving each spherical point (a sketch of the spherical sampling follows the contribution list below). As a result, the network learns a strict, unique point-level correspondence and thus improves the quality of the predicted complete shape. Moreover, thanks to the uniformity of the spherical points, the predicted complete shape also retains this uniformity. To fine-tune the results, they add RENet [12] at the end of the network. Different from other supervised [20, 24] or unsupervised [19] point cloud completion methods, SPTNet makes full use of the spherical distribution and thus achieves better results, and they learn a non-end-to-end point-movement-based RENet for further fine-tuning. In summary, the main contributions of their work are as follows:

  • They propose a novel network for point cloud completion, named SPTNet, which moves spherical points to generate a complete point cloud in high accuracy.

  • They propose to generate complete shapes in a coarse-to-fine manner. After the spherical point transformation, they apply RENet in both end-to-end and non-end-to-end manners, which justifies their idea of point-movement-based shape generation for fine-tuning.

  • They explore the feasibility of leveraging a transformer-based network to learn point-wise features in the encoder, which captures more local information between points.
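A sketch of the spherical sampling that underlies SPTNet is shown below; the displacement network itself is omitted, and `displacement_net` is a hypothetical placeholder:

```python
import torch

def sample_sphere(n: int = 2048) -> torch.Tensor:
    """Sample n points uniformly on the unit sphere by normalizing Gaussians."""
    x = torch.randn(n, 3)
    return x / x.norm(dim=1, keepdim=True)

# SPTNet moves each spherical point onto the complete shape:
# sphere = sample_sphere()                                  # (2048, 3)
# complete = sphere + displacement_net(sphere, partial)     # hypothetical
```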


Training Description

They did not pre-train their network on any additional datasets or adopt any pre-trained models, and they did not apply any data augmentation. In the training process, they adopt a two-stage strategy. In the first stage, SPTNet with a generative RENet is trained end-to-end. In the second stage, based on the results of the first stage, they train an additional point-movement-based RENet for each class, which is non-end-to-end. Chamfer Distance is used as the loss function. The Adam optimizer is used for all networks with an initial learning rate of 0.0001, multiplied by 0.7 every 40 epochs. The batch size is set to 16, and the total number of training epochs is 100. They implement the method in PyTorch. All models are trained on an NVIDIA 2080 Ti GPU with 11 GB of memory; training takes about 24 hours and testing about 45 minutes. No human effort is required for implementation, training, or validation.


Testing Description

During testing, they first pass the test data through SPTNet, and then input the result into a different point-movement-based RENet according to the category of the model.

3 Partial-to-Partial Point Cloud Registration

Overview.

Besides completion of single-view partial point clouds, researchers often perform Partial-to-Partial Registration (PPR) for 3D reconstruction. However, previous methods usually perform PPR on the uniformly sampled ModelNet40 [23] under restricted rotations in [0°, 45°]. In contrast, the MVP registration challenge uses virtual-scanned partial point cloud pairs, many of which are under unrestricted rotations in [0°, 180°]. These settings are closer to observations in real applications, such as 6D pose estimation, but they challenge existing object-centric PPR methods.

Method | Ranking | Rot Error (°) | Trans Error | MSE
Hybrid Optimization | 1 | 2.92 | 0.021 | 0.072
ROPNet + PREDATOR | 2 | 2.97 | 0.026 | 0.078
IM-Net | 3 | 2.91 | 0.027 | 0.078
Table 7: Top team results in the MVP Registration Challenge 2021.

Dataset.

We generate partial point cloud pairs from the MVP dataset [12], and a successful pair is selected if sufficient overlapped areas are detected. In total, we generate a training set with 6,400 paired partial point clouds, a test set with 1,200 pairs, and an extra-test set with 2,000 pairs. In the test set and extra-test set, most relative rotations are within [0, 45°], and the rest have unrestricted rotations [0, 180°]. The ratio is roughly 4 : 1. Note that the source and the target are two different incomplete point clouds scanned from the same CAD model.

Evaluation Metric.

We mainly use three metrics: the isotropic rotation error $E_{rot}$, the $\ell_2$ translation error $E_{trans}$, and an MSE error that considers both rotations and translations. These metrics are defined as:

$$E_{rot} = \mathrm{rad2deg}\left(\arccos\left(\frac{\mathrm{tr}\big(R_{gt}^{\top} R_{pred}\big) - 1}{2}\right)\right), \qquad (7)$$
$$E_{trans} = \lVert t_{gt} - t_{pred} \rVert_2, \qquad (8)$$
$$E_{MSE} = \mathrm{deg2rad}(E_{rot}) + E_{trans}, \qquad (9)$$

where rad2deg converts angles from radians to degrees, deg2rad is the inverse operation, and $R_{gt}$, $R_{pred}$, $t_{gt}$, and $t_{pred}$ are the ground-truth rotations, predicted rotations, ground-truth translations, and predicted translations, respectively.
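These metrics can be computed directly from the predicted and ground-truth poses; a NumPy sketch (names ours) follows:

```python
import numpy as np

def registration_errors(R_gt, R_pred, t_gt, t_pred):
    """Isotropic rotation error in degrees (Eq. 7), l2 translation error
    (Eq. 8), and the combined MSE metric (Eq. 9)."""
    cos = (np.trace(R_gt.T @ R_pred) - 1.0) / 2.0
    rot_err = np.degrees(np.arccos(np.clip(cos, -1.0, 1.0)))  # Eq. (7)
    trans_err = np.linalg.norm(t_gt - t_pred)                 # Eq. (8)
    mse = np.radians(rot_err) + trans_err                     # Eq. (9)
    return rot_err, trans_err, mse
```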

Results.

The benchmark results are reported in Table 7.

3.1 Solution of First Place

Hybrid Optimization Method with Unconstrained Variables


Team Members: Yuanjie Yan and Junyi An


General Method Description

In the MVP registration challenge, they introduce a new set of variables $(\mathbf{k}, \theta, \mathbf{v}, d)$ to directly represent the transformation from the source point cloud to the target point cloud, where $\mathbf{k}$, $\theta$, $\mathbf{v}$, and $d$ denote the rotation direction, rotation angle, translation direction, and translation distance, respectively. They map restricted variables to unrestricted variables to optimize the translation vector, use a variant of the CD loss as the optimization objective, and finally initialize the variables and apply several strategies to optimize them.

Transformation matrix with optimized variables: The rotation matrix can be represented by the rotation direction and angle:

$$R = I + \sin\theta \, K + (1 - \cos\theta)\, K^2, \qquad (10)$$

where

$$K = \begin{bmatrix} 0 & -k_z & k_y \\ k_z & 0 & -k_x \\ -k_y & k_x & 0 \end{bmatrix}. \qquad (11)$$

Here $\mathbf{k} = (k_x, k_y, k_z)$ can be any direction vector (normalized to unit length), and $\theta$ lies in the interval $[0, \pi]$.
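Equations (10)-(11) are the standard Rodrigues formula; a small NumPy sketch:

```python
import numpy as np

def rotation_from_axis_angle(k: np.ndarray, theta: float) -> np.ndarray:
    """Rodrigues' formula, Eqs. (10)-(11)."""
    k = k / np.linalg.norm(k)             # any direction vector, normalized
    K = np.array([[0.0, -k[2], k[1]],
                  [k[2], 0.0, -k[0]],
                  [-k[1], k[0], 0.0]])    # skew-symmetric cross-product matrix
    return np.eye(3) + np.sin(theta) * K + (1.0 - np.cos(theta)) * (K @ K)
```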

Rot Error | Trans Error | MSE
3.2654 | 0.0230 | 0.0799
3.2350 | 0.0292 | 0.0796
9.8391 | 0.0239 | 0.2149
2.9497 | 0.0211 | 0.0726
Table 8: Ablation experiments of the proposed method over different combinations of the Local CD loss, the Projected CD loss, the unconstrained translation T, and the initialization strategies.

The translation vector can be represented by translation direction and distance.

(12)

To ensure the legitimacy of the translation, they provide a constraint on the translation distance: .

The constraint on such variables is known as a "box constraint" in the optimization literature. Following [1], there are three different methods of approaching this problem:

  • Projected gradient descent performs one step of standard gradient descent and then clips all the coordinates to be within the box.

  • Clipped gradient descent does not clip the original target on each iteration; rather, it incorporates the clipping into the objective function to be minimized.

  • Change of variables introduces a new variable instead of optimizing the original variable.

They choose the third approach, as a smoothing of clipped gradient descent that eliminates the problem of getting stuck in extreme regions. In this case, the translation distance is given by:

$$d = \frac{1}{2}\big(\tanh(w) + 1\big), \qquad (13)$$

where the box-constrained variable $d$ is replaced by the unconstrained variable $w$. In the same spirit, they also propose a second mapping of the same form based on a different bounded function:

$$d = \frac{1}{2}\big(g(w) + 1\big), \qquad (14)$$

where the bounded function $g$ is chosen empirically. They tried several mapping functions, and the results show that eq. (14) is the most effective mapping for reducing the Trans error.

Loss: To optimize the variables, they introduce the Chamfer distance (CD) loss, which helps find point correspondences during optimization. The standard CD loss is given by:

$$\mathcal{L}_{CD}(P_1, P_2) = \frac{1}{|P_1|}\sum_{x \in P_1} \min_{y \in P_2} \lVert x - y \rVert_2^2 + \frac{1}{|P_2|}\sum_{y \in P_2} \min_{x \in P_1} \lVert x - y \rVert_2^2. \qquad (15)$$

Moreover, they propose two variants of the CD loss to improve registration on MVP: a Local CD loss $\mathcal{L}_{local}$ and a Projected CD loss $\mathcal{L}_{proj}$.

Method | Rot Error | Trans Error | MSE
DCP | 27.4702 | 0.1042 | 0.5836
IDAM | 20.6607 | 0.1482 | 0.5088
Euler angles | 13.7002 | 0.0488 | 0.2879
6-D | 9.6956 | 0.0442 | 0.2134
Hybrid Optimization | 2.9187 | 0.0206 | 0.0716
Table 9: Registration comparison between different methods.
  • Local CD loss. The local CD loss relaxes the requirement of matching all points, improving the tolerance to differences in point cloud distribution:

    $$\mathcal{L}_{local}(P_1, P_2) = \frac{1}{|P_1'|}\sum_{x \in P_1'} \min_{y \in P_2} \lVert x - y \rVert_2^2 + \frac{1}{|P_2'|}\sum_{y \in P_2'} \min_{x \in P_1} \lVert x - y \rVert_2^2, \qquad (16)$$

    where $\lambda$ is a hyperparameter between 0 and 1, and $P_1' \subset P_1$ and $P_2' \subset P_2$ with $|P_1'| = \lambda |P_1|$ and $|P_2'| = \lambda |P_2|$.

  • Projected CD loss. The local CD loss only attends to local alignment and ignores global point cloud matching. They therefore project the point clouds onto the $xy$, $yz$, and $xz$ planes and calculate the projected CD loss:

    $$\mathcal{L}_{proj}(P_1, P_2) = \sum_{s \in \{xy,\, yz,\, xz\}} \mathcal{L}_{CD}\big(\Pi_s(P_1), \Pi_s(P_2)\big), \qquad (17)$$

    where $\Pi_s$ denotes projection onto plane $s$.

The final optimization objective is given by:

$$\mathcal{L} = \mathcal{L}_{local} + \eta\, \mathcal{L}_{proj}, \qquad (18)$$

where $\eta$ is a balance weight.
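Putting the pieces together, the whole method can be sketched as one gradient-descent loop over the unconstrained variables. For brevity, the sketch below optimizes the plain CD loss of Eq. (15) rather than the local/projected variants, and all names are ours:

```python
import torch

def cd(p, q):
    d = torch.cdist(p, q) ** 2
    return d.min(1).values.mean() + d.min(0).values.mean()

def optimize_pose(src, tgt, steps=500, lr=0.01):
    """Gradient-based registration with unconstrained variables (schematic)."""
    k = torch.randn(3, requires_grad=True)      # rotation direction
    theta = torch.rand(1, requires_grad=True)   # rotation angle
    v = torch.randn(3, requires_grad=True)      # translation direction
    w = torch.zeros(1, requires_grad=True)      # unconstrained distance variable
    opt = torch.optim.Adam([k, theta, v, w], lr=lr)
    for _ in range(steps):
        kn = k / k.norm()
        z = torch.zeros(1)
        K = torch.stack([torch.cat([z, -kn[2:3], kn[1:2]]),      # Eq. (11)
                         torch.cat([kn[2:3], z, -kn[0:1]]),
                         torch.cat([-kn[1:2], kn[0:1], z])])
        R = torch.eye(3) + torch.sin(theta) * K \
            + (1 - torch.cos(theta)) * (K @ K)                   # Eq. (10)
        d = 0.5 * (torch.tanh(w) + 1)                            # Eq. (13)
        t = d * v / v.norm()                                     # Eq. (12)
        loss = cd(src @ R.T + t, tgt)
        opt.zero_grad()
        loss.backward()
        opt.step()
    return R.detach(), t.detach()
```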

Rot Error | Trans Error | MSE
2.9751 ± 0.0254 | 0.02106 ± 0.0003 | 0.07299 ± 0.0004
2.9187* | 0.0206* | 0.0716*

Table 10: First row: results of repeated experiments (mean ± standard deviation). Second row: results of the proposed method with eight intervals.

Training description

The proposed method does not require training.


Testing Description

Predicting a rigid transformation by gradient descent is a non-convex optimization problem, so optimizing from multiple initializations is crucial. In their method, they initialize the rotation direction to several candidate directions and choose an initial value for the angle within its interval; the result with the smallest loss is selected as the rotation matrix. They also set a threshold on the optimization loss: when the loss exceeds the threshold, they re-initialize the angle in several divided sub-intervals and repeat the optimization. This divided-interval strategy targets the problem of symmetric point cloud models and tries to avoid over-rotating. The strategy also coincides with the prior probability of the dataset: most relative rotations are restricted to [0°, 45°], and the rest have unrestricted rotations in [0°, 180°], with a ratio of roughly 4 : 1.

Figure 8: Overview of ROPNet registration pipeline. The CG module consumes the source (green) and target (red) point clouds, and outputs initial pose and overlapping points (non-overlapping points are in black). The TFMR module takes the output of CG module as input, and generates accurate correspondences. The FMR step removes false correspondences (blue lines) and keeps some positive correspondences (gray lines).
Figure 9: The pipeline of our solution to the MVP Registration Challenge. The point cloud in green is the source point cloud, and the point cloud in red is the target point cloud.

3.2 Solution of Second Place

Deep Models with Fusion Strategies


Team Members: Lifa Zhu, Changwei Lin, Dongrui Liu, Xin Li, Francisco Gómez-Fernández.


Figure 10: Left: Architecture of the CG module, which consumes the source (in green) and target (in red) data and outputs overlap scores and an initial transformation matrix. Right: Details of the information interaction, which takes point features and global features as input and outputs fused point features for the pair.

Figure 11: Left: Overview of the Transformer-based feature matching removal (TFMR) module, which takes the transformed source and the target as input and outputs representative points and their correspondences; the initial alignment and overlap scores output by the CG module are used here. Right: Details of feature matching removal (FMR), which takes the correspondences of overlapping source points and outputs accurate correspondences (gray lines).

General Method Description

They propose to fuse ROPNet [33] and PREDATOR[4] to solve point cloud registration in the MVP Challenge. ROPNet and the pipeline of their solution to this challenge can be seen in Fig. 8 and Fig. 9.

In ROPNet, they propose a context-guided module that uses an encoder to extract global features for predicting point-wise overlap scores, and introduce a Transformer to enrich point features and remove non-representative points based on the overlap scores and feature matching.

Considering the unrestricted rotations, they include PREDATOR in their pipeline. In the PREDATOR source code, they found and fixed a simple but important GNN bug, which helps the network obtain higher performance. They then adjusted PREDATOR's parameters for the MVP registration challenge. Inspired by the partial-to-complete idea proposed in ROPNet, they remove some points from the source data based on the predicted scores and keep all points in the target data during the RANSAC iterations.

Based on the above discussion, they designed several ensemble strategies based on data characteristics to fuse ROPNet and PREDATOR when estimating the final rigid transformation. Experiments on the test set showed this is effective in most cases, with few failures. ROPNet is shown in Fig. 8; their proposed context-guided (CG) module and Transformer-based Feature Matching Removal (TFMR) module are shown in Fig. 10 and Fig. 11, and the overall pipeline of their solution in Fig. 9.

Model | rot_level | Rot Error | Trans Error | MSE
RPMNet_corr [34] | 0 | 12.5560 | 0.1674 | 0.3865
ROPNet [33] | 0 | 1.0449 | 0.0193 | 0.0375
RPMNet_corr [34] | 0, 1 | 21.9685 | 0.2062 | 0.5896
ROPNet [33] + PREDATOR [4] | 0, 1 | 3.1666 | 0.0292 | 0.0845
Table 11: Comparison to other approaches on the validation set.
Figure 12: Overview of the proposed method. Feature extraction is a shared module; its inputs are the two point clouds after sampling.

Training description

They trained ROPNet and PREDATOR independently, with all 2,048 points involved in training both models. For ROPNet, they trained for 600 epochs using the Adam optimizer with an initial learning rate of 0.0001, decayed with a cosine annealing schedule. They trained ROPNet in a non-iterative manner but run 2 iterations of the TFMR module at test time. Note that they trained ROPNet only for small rotation angles ranging from 0° to 45°. Also, they did not use Point Pair Features [2], because accurate normal vectors could not be obtained for the MVP challenge data.

For PREDATOR, following the code444https://github.com/overlappredator/OverlapPredator released by the authors, they trained on the MVP registration dataset for 200 epochs using SGD with 0.98 momentum. The initial learning rate was 0.01, with an exponential decay factor of 0.95 every epoch. They trained PREDATOR under unrestricted rotation angles ranging from 0° to 360°. In addition, they adjusted some parameters, such as the voxel size (0.04), the number of sampled points in the circle loss, and other details of the loss implementation.


Testing Description

For each source and target point cloud pair, they estimate transformations $T_1$ and $T_2$ from ROPNet and PREDATOR, respectively. $T_1$ is predicted end-to-end by the ROPNet model from source to target; $T_2$ is also a source-to-target transformation, obtained with RANSAC using the features and keypoints provided by PREDATOR. They select $T_1$ or $T_2$ based on their proposed ensemble rules (an illustrative selection rule is sketched below). They compare their method with RPMNet_corr [34], a variant of RPMNet [27] adapted to registration with unrestricted rotations. The results in Table 11 show that ROPNet achieves a much lower registration error than RPMNet_corr when rot_level is 0, and when the rotation level is not restricted, the ensemble of ROPNet and PREDATOR also achieves a much lower registration error than RPMNet_corr.
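The team's exact ensemble rules are not spelled out in the report; one plausible fallback rule, shown purely for illustration, is to keep whichever candidate aligns the pair with the lower Chamfer distance:

```python
import torch

def fuse(src, tgt, T1, T2, chamfer):
    """Illustrative selection between two candidate 4x4 rigid transforms:
    keep the one whose alignment achieves the lower Chamfer distance.
    This is an assumption, not the team's actual ensemble rule."""
    def apply(T, p):
        return p @ T[:3, :3].T + T[:3, 3]
    e1 = chamfer(apply(T1, src), tgt)
    e2 = chamfer(apply(T2, src), tgt)
    return T1 if e1 <= e2 else T2
```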

3.3 Solution of Third Place

IM-Net: Partial Point Cloud Registration with Inliers Prediction and Matching


Team Members: Qinlong Wang and Yang Yang

Figure 13: Details of the ISS module; one parameter is learnable and shared across all points.

General Method Description

Their work, titled IM-Net: Partial Point Cloud Registration with Inliers Prediction and Matching, is built from four main components, as illustrated in Figure 12.

  • Firstly, they use DGCNN [18] to extract an initial per-point feature for each point cloud.

  • To extract distinctive matching features, the point features of the two point clouds are then transformed by a graph neural network constructed by interleaving intra- and cross-graph aggregations; edges of the intra-graph connect all points within the same point cloud, and edges of the cross-graph connect each point with all points of the other point cloud. They introduce a fully connected, equally weighted intra-graph to learn the relative structural information of each point cloud, where a large receptive field helps to disambiguate ambiguous features, and an attention-based cross-graph to communicate information between the two point clouds for distinctive point features. To better learn structural information, they embed the coordinates of each point with an MLP to the same dimension as the initial feature and concatenate the two features before feeding them into the GNN.

  • Then an Inlier points Sampling Sinkhorn (ISS) module samples and matches confident inlier points in the overlapping parts. First, inlier scores are predicted for all points to obtain sampling candidates. In the training phase, they sample half of the points from those with higher scores and half from those with lower scores; in the test phase, they simply select the candidates with the highest scores. They then construct a new similarity matrix with a dustbin from the sampled features. To further handle outliers among the sampled points, the dustbin score is initialized with the inlier scores. Finally, they feed the similarity matrix into the Sinkhorn algorithm to generate a matching confidence map.

  • Finally, bidirectional correspondences are constructed by concatenating the source-to-target correspondences with the symmetric target-to-source correspondences, selected as the matches with the highest confidence in each row and column of the matching confidence map. The bidirectional correspondences yield more matching pairs for low-overlap data and show better results than correspondences based on a mutual check. Each correspondence is weighted by its confidence, and the transformation is computed with SVD (a sketch of this weighted closed-form step follows this list).
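The final closed-form step is the weighted Procrustes solution; a sketch (variable names ours) follows:

```python
import torch

def weighted_svd(src, tgt, w):
    """Weighted Procrustes: closed-form R, t from correspondences
    src[i] <-> tgt[i] with confidence weights w (shape (N,))."""
    w = w / w.sum()
    src_c = (w[:, None] * src).sum(0)              # weighted centroids
    tgt_c = (w[:, None] * tgt).sum(0)
    H = ((src - src_c) * w[:, None]).T @ (tgt - tgt_c)
    U, S, Vt = torch.linalg.svd(H)
    diag = torch.ones(3)
    diag[2] = torch.sign(torch.linalg.det(Vt.T @ U.T))   # reflection guard
    R = Vt.T @ torch.diag(diag) @ U.T
    t = tgt_c - R @ src_c
    return R, t
```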

Model Details: For the submitted model, they interleave the intra- and cross-graph aggregations for 10 layers. The dimension of the initial features extracted by DGCNN [18] is 64, and the feature dimension in the GNN and ISS module remains 128 after concatenation. The ISS module samples 1/6 of the points of each point cloud. They use the same evaluation metrics as the official MVP benchmark implementation.

Dataset | Preprocess | Rot. Error | Trans. Error | MSE | RMSE | Recall
MVP test [11] | - | 2.5008 | 0.0305 | 0.0742 | 0.0382 | 0.9583
MVP test [11] | add noise | 3.2183 | 0.0311 | 0.0872 | 0.0447 | 0.9633
ModelNet40 [23] | sample rotation | 3.7722 | 0.0120 | 0.0779 | 0.0236 | 0.9617
Table 12: Performance on MVP test data with Gaussian noise and on ModelNet40.

Implementation Details: The loss function used for training consists of four parts. The first two are a cross-entropy loss on the inlier scores and a peaky loss, inspired by the loss function in R2D2 [14], that maximizes the peakiness of the overlap scores. The last two are a cross-entropy loss on the matching confidence map, following the RPMNet loss implementation in [34], and a cross-entropy loss on the dustbin confidence. The peaky loss is weighted by 0.25 and the dustbin loss by 0.5.


Training description

They train their network on the MVP training data and validate the model on the MVP test data. During training, they randomly sample each point cloud down to 1,024 points to save memory and to augment the overlap rate of the data. Moreover, the ISS module samples half positive and half negative samples, i.e., matching and non-matching points. The network is trained using the Adam optimizer [5] with a learning rate of 0.0005; the submitted model is first trained for 1,000 epochs and then refined for up to 500 more epochs with a learning rate of 0.0001. The network often converges after 500 epochs.


Testing Description

At test time, they retain all 2,048 points for inference, and the ISS module samples the points with the highest inlier scores. They further select the half of the bidirectional correspondences with the highest correspondence confidence to compute the transformation.

4 Discussion

4.1 Completion

Summary. Generally, the top-3 methods significantly outperform the baseline methods. PoinTr++ employs a global transformer to generate the missing parts, followed by RENet refinement. CRNet proposes the IOI augmentation and a multi-scale SPD module to achieve semantic-aware completion. SPTNet makes full use of the spherical distribution and learns a non-end-to-end point-movement-based RENet for further fine-tuning.

Future Directions. In this challenge, we use the CD loss for evaluating the completion methods. However, CD loss is not sensitive to the global distribution, and completion results can be biased toward the partial observations. New metrics, such as BCD [22] and EMD, can be leveraged for evaluation. Recently, diffusion models [9] have shown impressive point cloud completion results, which also circumvent the imbalance issue. In addition, unsupervised [31] or self-supervised [19] point cloud completion can be studied further.

4.2 Registration

Summary. In a nutshell, the top-3 methods achieve surprisingly good results for partial-to-partial point cloud registration, especially for pairs with unrestricted rotations. The 1st-place method uses a non-learning hybrid optimization. The 2nd place uses an ensemble strategy combining ROPNet [33] and PREDATOR [4]. The 3rd place uses cross-graph connections and an ISS module.

Future Directions. Although they achieve outstanding registration results, the PPR problem has not been fully resolved: 1) the hybrid optimization requires multiple initializations; 2) the ensemble method is neither elegant nor efficient; 3) the 3rd-place method heavily relies on cross-graph connections and requires about 1,500 epochs of training in total. Moreover, these registration methods did not take full advantage of pose-invariant features (e.g., PPF), which can facilitate full-range PPR [11].

Acknowledgement

We sincerely thank Yuanhan Zhang for helpful discussions.

References