1 Introduction
3D reconstruction from point clouds has been extensively explored in past decades. Thanks to the recent rapid development of deep learning, many researchers study learning-based approaches to single-view completion [30, 10, 12, 31, 28] and multi-view registration [17, 6, 29, 3, 4, 11, 33] for high-quality 3D reconstruction. However, completion and registration for partial point clouds are far from being fully resolved by existing methods.

Recently, we established a versatile multi-view partial (MVP) point cloud dataset [12], which contains over 100,000 high-quality virtual-scanned partial point clouds and complete point clouds. Employing the MVP dataset [12], we organized the Multi-View Partial Point Cloud Challenge 2021 on Completion and Registration (MVP Challenge; challenge website: https://competitions.codalab.org/competitions/33430), collocated with the Workshop on Sensing, Understanding and Synthesizing Humans at ICCV 2021 (workshop website: https://sensehuman.github.io/). The MVP Challenge lasted for nine weeks, from Jul. 12th, 2021 to Sep. 12th, 2021. The goal of this challenge is to boost research on point cloud completion and registration. A total of 128 participants registered for the competition, and 31 teams made valid submissions. For fair comparisons, all participants were restricted to training their models using our prepared training data only. On Oct. 18th, 2021, the top-3 ranked approaches for each track were selected and rewarded.
2 SingleView Partial Point Cloud Completion
Overview.
Given a partial observation, point cloud completion aims to reconstruct the complete 3D shape. After registering for the MVP challenge, each team can submit its completion results for evaluation on the CodaLab platform. All models are required to be trained using our prepared training data only, and trained models that achieve the best performance on the test set are likely to provide high-quality completion results on the extra-test set as well. We highlight that no additional strategies, such as pre-training, are allowed.
Dataset.
The MVP Challenge 2021 on Point Cloud Completion mainly employs the MVP dataset [12]
that we proposed in CVPR 2021. The MVP dataset is a large-scale multi-view partial point cloud dataset containing over 100,000 high-quality scans, which renders partial 3D shapes from 26 uniformly distributed camera poses for each 3D CAD model. It provides a Training set with 62,400 partial-complete point cloud pairs and a Test set with 41,800 pairs. In addition, we generate an extra-test set consisting of 59,800 partial-complete point cloud pairs following the same fashion, which is used for evaluating the different completion methods in this challenge. We suggest that future research works should only use the Test set for evaluation instead of the extra-test set. Notably, each partial and/or complete point cloud has 2,048 points.
Evaluation Metric.
Considering the computation efficiency, we use the symmetric Chamfer Distance (CD) Loss for evaluating completion methods. Formally, the CD loss can be formulated as:
\[ \mathcal{L}_{CD}(P_1, P_2) = \frac{1}{|P_1|}\sum_{p \in P_1}\min_{q \in P_2}\lVert p - q\rVert_2^2 + \frac{1}{|P_2|}\sum_{q \in P_2}\min_{p \in P_1}\lVert q - p\rVert_2^2 \tag{1} \]

where $p$ and $q$ denote points that belong to the two point clouds $P_1$ and $P_2$, respectively.
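Equation (1) can be evaluated with brute-force NumPy, which is adequate for 2,048-point clouds (larger clouds would call for KD-trees or GPU kernels):

```python
import numpy as np

def chamfer_distance(p1, p2):
    """Symmetric Chamfer Distance between point clouds p1 (N, 3) and p2 (M, 3).

    Uses squared Euclidean nearest-neighbour distances, averaged over each
    cloud and summed, matching the L2 form of Eq. (1).
    """
    # Pairwise squared distances, shape (N, M).
    d = np.sum((p1[:, None, :] - p2[None, :, :]) ** 2, axis=-1)
    return d.min(axis=1).mean() + d.min(axis=0).mean()
```

Identical clouds give a CD of exactly zero, and the metric is symmetric in its two arguments by construction.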
Results.
The benchmark results are reported in Table 1. In the following subsections, we summarize the methods and experiments of the top-ranked teams according to their submitted reports.
| Method | Ranking | CD |
|---|---|---|
| SPTNet | 3 | 5.15 |
| TSPCC | 2 | 5.01 |
| PoinTr++ | 1 | 5.01 |
2.1 Solution of First Place
PoinTr++: Enhanced GeometryAware Transformers with Iterative Refinement
Team Members: Xumin Yu, Yongming Rao, Jiwen Lu, and Jie Zhou
General Method Description
Overall, the champion team uses PoinTr [28] to complete a point cloud from a partial input (shown in Fig. 1). They then use multiple refinement blocks to iteratively denoise the prediction and produce the final point cloud.

Concatenation + refinement pipeline: Many previous works like PCN [30] and TopNet [15] adopt a pipeline that is closer to reconstruction, encoding the input point cloud as a single feature and reconstructing the completed point cloud with a decoder like FoldingNet [26]. In contrast, PoinTr [28] adopts a concatenation strategy, where the final prediction is the concatenation of the input and the output of the model. The reconstruction pipeline struggles to keep the details of the input point cloud, while the concatenation pipeline faces the problem that the final prediction may be discontinuous in appearance. They propose a pipeline that combines these two strategies: they add reconstruction modules on top of the concatenated point clouds to further refine the results and make the final point cloud smooth and continuous. To keep the details of the input, the refinement block only predicts a position-shift vector for each point. The newly proposed pipeline thus effectively combines the advantages of concatenation-based and reconstruction-based methods.

Iterative refinement: They further investigate the refinement strategy and propose an iterative refinement method, which refines the predicted point cloud with several refinement blocks: they concatenate the original input and the prediction from the previous step and feed the result into the next refinement block.
A straightforward way to implement iterative refinement is to add several refinement modules after PoinTr and optimize them in an end-to-end manner. However, this causes two problems: 1) a deeper model is harder to optimize; 2) end-to-end training brings heavy computational cost and GPU memory consumption. Therefore, they propose to iteratively add refinement blocks after the base model during training. When adding one more refinement block, they freeze the model before it and only train the newly added module, which saves most of the computation cost and decomposes the optimization problem into several simpler sub-problems.
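The freeze-and-extend training schedule can be sketched as follows, with `nn.Linear` stand-ins for PoinTr and the refinement blocks (which are far more complex in practice):

```python
import torch
import torch.nn as nn

# Hypothetical stand-ins: `base` plays the role of PoinTr,
# each appended block the role of a refinement module (RENet in the report).
base = nn.Linear(3, 3)
refine_blocks = nn.ModuleList()

def add_refine_block():
    """Freeze everything trained so far, then append a new trainable block."""
    for p in base.parameters():
        p.requires_grad = False
    for blk in refine_blocks:
        for p in blk.parameters():
            p.requires_grad = False
    blk = nn.Linear(3, 3)  # new refinement block: the only trainable part
    refine_blocks.append(blk)
    # Optimizer only sees the new block's parameters.
    return torch.optim.AdamW(blk.parameters(), lr=1e-4)

opt = add_refine_block()  # stage 2: train block 1 only
opt = add_refine_block()  # stage 3: block 1 frozen, train block 2 only
```

Because each stage optimizes a single block, GPU memory and gradient computation stay roughly constant as the pipeline deepens.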
In their experiments, they select RENet, proposed in the VRCNet paper [12], as their refinement block because of its high performance. The module takes the concatenated point clouds as input and refines the point cloud using a hierarchical encoder-decoder architecture with Edge-preserved Pooling (EP) and Edge-preserved Unpooling (EU) modules, which effectively learn multi-scale structural relations.
| Model | CD | F1-Score@1% |
|---|---|---|
| PCN [30] | 9.77 | 0.32 |
| TopNet [15] | 10.11 | 0.308 |
| CRN [16] | 7.25 | 0.434 |
| ECG [10] | 6.64 | 0.476 |
| VRCNet [12] | 5.96 | 0.499 |
| PoinTr [28] | 6.15 | 0.456 |
| PoinTr+ | 5.13 | 0.511 |
| PoinTr++ | 4.93 | 0.525 |
Training Description
The training process of the concatenation-refinement pipeline is multi-stage: they iteratively train each refinement module during the training phase. In their experiments on the MVP benchmark, they add two refinement blocks after PoinTr, so a two-stage training is used.
In the first stage of the training phase, they jointly train a refinement module and PoinTr [28]. They set the learning rate to , the batch size to 32, and the weight decay to . The hidden dimension of PoinTr is set to 384. During training, they use the AdamW optimizer with a WarmingUpCosLR scheduler. The L1 Chamfer Distance (CD) is adopted as the training loss. Specifically, they calculate the CD between four predicted point clouds (three coarse-grained predictions and one fine-grained prediction) and the ground truth. They adaptively adjust the weights of these four loss terms to obtain the final weighted loss during training: in the first 10 epochs, they use weights of [1, 1, 0.5, 0.1]; in the second 10 epochs, weights of [1, 1, 1, 0.5]; and for the remaining epochs, weights of [1, 1, 1, 1]. They normalize the total loss by dividing by the sum of the weights. They use an early-stop strategy in the first stage of training.
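The reported weight schedule and normalization can be captured directly in code (a sketch of the schedule; the four CD terms themselves are assumed to be given):

```python
def loss_weights(epoch):
    """Weights for the four CD terms (three coarse + one fine),
    following the reported schedule: epochs 0-9, 10-19, then the rest."""
    if epoch < 10:
        return [1.0, 1.0, 0.5, 0.1]
    elif epoch < 20:
        return [1.0, 1.0, 1.0, 0.5]
    return [1.0, 1.0, 1.0, 1.0]

def total_loss(cd_terms, epoch):
    """Weighted sum of the four CD terms, normalized by the sum of weights."""
    w = loss_weights(epoch)
    return sum(wi * ci for wi, ci in zip(w, cd_terms)) / sum(w)
```

Normalizing by the weight sum keeps the loss magnitude comparable across the three phases, so the learning-rate schedule does not need per-phase retuning.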
(They stop the training when the network converges on the test set.) In the second phase, they add a refinement block after the first one. The weights of PoinTr and the first refinement block come from the first stage, and they initialize the additional refinement block with the weights of the first refinement block. In this phase, they freeze PoinTr and the first refinement block, and set the batch size to 16. All other hyperparameters are the same as in the first phase.
They stop adding refinement blocks once performance no longer improves significantly on the MVP benchmark; therefore, they use only two refinement modules for the sake of efficiency.
Testing Description
In the test phase, they follow the standard procedure of point cloud completion: they feed a partial point cloud containing 2,048 points to the trained model and obtain a completed point cloud with 2,048 points. Results compared against other methods are reported in Table 2.
2.2 Solution of Second Place
Robust and Discriminative TwoStage Point Cloud Completion with Semantic Refinement and IOI augmentation
Team Members: Mingye Xu, Xiaoyuan Luo, Kexue Fu, Peng Gao, Manning Wang, Yali Wang, and Yu Qiao.
denotes corresponding author
General Method Description
Fig. 2 illustrates the structure of their two-stage point cloud completion network, which consists of a generation stage (Net1: VRCNet [12] with IOI data augmentation) and a refinement stage (Net2: their Conditional Refining Network). The IOI data augmentation increases the diversity of the incomplete point clouds to boost the generalization of the generation network. They then use their Conditional Refining Network (CRNet) to perform more detailed refinement with the aid of semantic category information and shape codes.
Robust Point Cloud Generation: VRCNet with IOI Augmentation.
Their generation network is built upon VRCNet [12], which consists of two consecutive encoder-decoder sub-networks that serve as "probabilistic modeling" (PMNet) and "relational enhancement" (RENet). PMNet embeds the global shape representation and latent distributions from the partial inputs and generates coarse skeletons. RENet then strives to enhance structural relations by learning multi-scale local point features and reconstructs the fine complete point cloud from the coarse skeleton.

IOI (Incompletion-of-Incompletion) Augmentation: To increase the robustness of point cloud generation, they propose a novel Incompletion-of-Incompletion (IOI) data augmentation method. As Figure 3 shows, they randomly crop the incomplete point cloud and feed it to the model to reconstruct the original incomplete point cloud. This augmentation aims to increase the diversity of the global features and latent distributions from PMNet and gives RENet more generalization capacity over variations of incomplete structures. As verified by their experiments in Table 5, the proposed data augmentation indeed improves the performance of the completion network.
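A sketch of such a re-cropping augmentation follows. The report does not specify the exact cropping rule; keeping the points on one side of a random view direction is one plausible choice, so treat everything here as an illustrative assumption:

```python
import numpy as np

def ioi_crop(partial, keep_ratio=0.75, rng=np.random):
    """IOI-style augmentation sketch: crop the already-incomplete cloud again.

    Keeps the fraction `keep_ratio` of points that project furthest along a
    random direction (a half-space crop).  `partial` has shape (N, 3).
    """
    direction = rng.randn(3)
    direction /= np.linalg.norm(direction)
    proj = partial @ direction              # scalar projection per point
    n_keep = int(len(partial) * keep_ratio)
    idx = np.argsort(proj)[-n_keep:]        # keep the "visible" side
    return partial[idx]
```

The network is then trained to reconstruct the original incomplete cloud from this doubly-incomplete input, increasing the diversity of partial structures seen by PMNet and RENet.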

Self-Supervised Pre-training by Point Cloud Reconstruction: They also investigate a pre-training mechanism. Self-supervised reconstruction pre-training provides a good initialization for the downstream completion fine-tuning task; it can lead to wider optima and is easier to optimize compared with training from scratch. This also improves performance, as shown in Table 5.
Discriminative Point Cloud Refinement: Condition Refining Network with Semantic Guidance.
Their refining network aims to refine the complete point cloud with more geometric details and more semantic information. Figure 4 shows the structure of their Conditional Refining Network (CRNet), where the Conditional Modulation module effectively adjusts point-wise representations with semantic guidance, while the Multi-Scale SPD module refines the point cloud to reveal more geometric structures with multi-scale context aggregation. Details are described below.
Table 3: Results on the MVP dataset (2,048 points).

| Method | CD | F1-Score@1% |
|---|---|---|
| PCN [30] | 9.77 | 0.320 |
| TopNet [15] | 10.11 | 0.308 |
| MSN [8] | 7.90 | 0.432 |
| Wang et al. [16] | 7.25 | 0.434 |
| ECG [10] | 6.64 | 0.476 |
| VRCNet [12] | 5.96 | 0.499 |
| CRNet | 5.27 | 0.535 |

Table 4: Results on the MVP dataset (16,384 points).

| Method | CD | F1-Score@1% |
|---|---|---|
| PCN [30] | 6.02 | 0.638 |
| TopNet [15] | 6.36 | 0.601 |
| MSN [8] | 4.90 | 0.710 |
| Wang et al. [16] | 4.30 | 0.740 |
| ECG [10] | 3.58 | 0.753 |
| GRNet [25] | 3.87 | 0.692 |
| NSFA [32] | 3.77 | 0.783 |
| VRCNet [12] | 3.02 | 0.796 |
| CRNet | 2.51 | 0.824 |

Conditional Modulation Module: Utilizing underlying shape attributes (global shape codes and semantic category information) encourages the local representation to stay close to the global discrimination of the same object, and can thus serve as guidance for point cloud refinement. Existing methods only merge the global information by concatenating it with the local representation; however, concatenation is not effective enough and largely increases the parameter count of the MLPs (Model F in Table 5). These methods also ignore the important category information, which contains discriminative semantics. To this end, they propose a lightweight Conditional Modulation module for point cloud refinement. In addition to adjusting the global point cloud representation, the proposed module can easily be extended to learn local enhancement effects for point cloud refinement.
| Model | Based model | Generation (Net1) | Refinement (Net2) | Strategy description | CD (Test) | CD (Extra-Test) |
|---|---|---|---|---|---|---|
| A (baseline) | – | VRCNet | – | – | 5.96 | 6.08 |
| B | A | VRCNet | – | Self-supervised pre-training | 5.78 (0.18) | 5.91 (0.17) |
| C | A | VRCNet | – | IOI augmentation | 5.83 (0.13) | 5.93 (0.15) |
| D | C | VRCNet | Spatial Refiner [7] | Add refining module | 5.66 (0.20) | 5.76 (0.17) |
| E | C | VRCNet | SPD [24] | Add refining module | 5.41 (0.42) | 5.50 (0.43) |
| F | E | VRCNet | CRNet (SPD) | Concatenate shape codes | 5.41 (0.00) | 5.50 (0.00) |
| G | E | VRCNet | CRNet (SPD) | Add Conditional Modulation module | 5.32 (0.09) | 5.41 (0.09) |
| H | G | VRCNet | CRNet (Multi-scale SPD) | Add multi-scale skip transformers | 5.27 (0.05) | 5.35 (0.06) |

Table 5: Ablation studies of their method on the MVP completion challenge. Training data: MVP training set; validation data: MVP test set; Public Test: MVP extra-test set. (Scaled CD loss.)

| Model | Based model | Generation (Net1) | Refinement (Net2) | Strategy description | CD (Public Test) |
|---|---|---|---|---|---|
| A (baseline) | – | VRCNet | – | – | 5.79 |
| C | A | VRCNet | – | IOI augmentation | 5.61 (0.18) |
| D | C | VRCNet | Spatial Refiner [7] | Add refining module | 5.58 (0.03) |
| E | C | VRCNet | SPD [24] | Add refining module | 5.33 (0.28) |
| G | E | VRCNet | CRNet (SPD) | Conditional Modulation module | 5.06 (0.27) |
| H | G | VRCNet | CRNet (Multi-scale SPD) | Add multi-scale skip transformers | 5.01 (0.05) |

Table 6: Ablation studies of their method on the MVP completion challenge. Training data: MVP training set and test set; Public Test: MVP extra-test set. (Scaled CD loss.)

As Figure 6 shows, to give the network the ability to handle operations that require semantic category information and global shape codes, they modulate the intermediate displacement features of CRNet as follows:
\[ \hat{F} = \gamma \odot F + \beta \tag{2} \]

where $\odot$ denotes the element-wise multiplication operation, $F$ is the intermediate displacement feature from the Conditional Refining Network, and $\gamma$ and $\beta$ are affine parameters estimated from the point cloud category label $c$ and the global shape code $g$ produced by the preceding generation network:

\[ \gamma = \mathrm{MLP}_{\gamma}([c, g]) \tag{3} \]

\[ \beta = \mathrm{MLP}_{\beta}([c, g]) \tag{4} \]

where $\mathrm{MLP}_{\gamma}$ and $\mathrm{MLP}_{\beta}$ are MLP layers.
They use the conditional vector $\beta$ to shift the cluster centers of the local representation and the conditional vector $\gamma$ to fine-tune the variance in the feature space. Thus, they achieve global adjustment of the point features with only a few parameters. The local features of the same object are encouraged to be closer to each other than to features of other objects, such that the local representations of each object are affected by its distinct semantic information and global shape codes. Therefore, the model is less easily confused by similar local structures under different semantic information.
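Equation (2) is a FiLM-style affine modulation. A minimal sketch follows, with illustrative feature and condition dimensions that are not taken from the report:

```python
import torch
import torch.nn as nn

class ConditionalModulation(nn.Module):
    """Sketch of Eq. (2): F_hat = gamma * F + beta, with gamma and beta
    predicted from the category label and global shape code.
    All dimensions below are illustrative assumptions."""
    def __init__(self, feat_dim=128, n_classes=16, code_dim=1024):
        super().__init__()
        cond_dim = n_classes + code_dim
        self.to_gamma = nn.Linear(cond_dim, feat_dim)  # MLP_gamma
        self.to_beta = nn.Linear(cond_dim, feat_dim)   # MLP_beta

    def forward(self, feats, one_hot_label, shape_code):
        # feats: (B, N, C); condition: concatenated label + shape code
        cond = torch.cat([one_hot_label, shape_code], dim=-1)
        gamma = self.to_gamma(cond).unsqueeze(1)  # (B, 1, C), broadcast over N
        beta = self.to_beta(cond).unsqueeze(1)
        return gamma * feats + beta
```

Because the modulation is a single per-channel scale and shift, it adds far fewer parameters than concatenating the condition to every point feature.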

Multi-Scale SPD module: To reveal fine local geometric details on the complete shape, existing methods [16, 30, 32] usually adopt a folding-based strategy [26] to learn different displacements for duplicated points. However, the folding-based strategy ignores the local shape characteristics contained in the original points, because the same 2D grids are used for sampling. Differently, SnowflakeNet [24] uses SPD (Snowflake Point Deconvolution) to reformulate the generation of child points from parent points as a snowflake-like growing process, where the shape characteristics embedded in the parent point features are extracted and inherited by the child points through a point-wise splitting operation. SnowflakeNet also introduces a novel skip-transformer [24] to learn splitting patterns in the SPD module, which captures the shape context and the spatial relationship between child and parent points.
Their CRNet aims to refine the local geometric details of the complete point cloud. Inspired by SnowflakeNet [24], they use a structure similar to SPD. Different from SPD [24], their input is the completed point cloud predicted by Net1, and they do not use the point-wise splitting operation to increase the number of points; instead, they only predict coordinate offsets for each point in their Multi-Scale SPD module, as Fig. 6 shows. To progressively refine the local geometric details, three Multi-Scale SPDs are used in their Conditional Refining Network, as shown in Fig. 4. To let consecutive Multi-Scale SPDs refine points in a coherent manner, they use a skip-transformer to learn and refine the spatial context across layers. Moreover, to improve robustness to the diversity of local structures, they apply multi-scale skip transformers with different local regions in their Multi-Scale SPD module.
As Fig. 6 shows, in the $i$-th Multi-Scale SPD module, they take the refined point cloud from the previous layer as $P_i$ and extract the per-point feature from $P_i$ and the global shape code with a basic PointNet [13]. They then send the displacement feature $H_{i-1}$ from the previous Conditional Modulation module, together with the per-point feature, into two skip transformers with different local regions for local feature learning. The multi-scale local features are fed to an MLP to obtain the displacement feature $H_i$ of the current layer, which is used to generate the point displacement $\Delta P_i$:

\[ \Delta P_i = \tanh(\mathrm{MLP}(H_i)) \tag{5} \]

where $\tanh$ is the hyperbolic tangent activation. Finally, the output point cloud is updated as:

\[ P_{i+1} = P_i + \Delta P_i \tag{6} \]
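Equations (5)-(6) amount to a bounded per-point offset. A minimal sketch, where the MLP head and feature size are illustrative assumptions:

```python
import torch
import torch.nn as nn

# Hypothetical head mapping displacement features to 3D offsets.
mlp = nn.Linear(64, 3)

def refine_step(points, disp_feat):
    """One refinement step of Eq. (5)-(6): add a tanh-bounded offset
    to each point.  points: (B, N, 3); disp_feat: (B, N, 64)."""
    delta = torch.tanh(mlp(disp_feat))  # offsets bounded to (-1, 1) per axis
    return points + delta
```

The tanh bound keeps each refinement step small, so stacking several modules moves points gradually rather than re-generating the shape from scratch.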
Training description

Stage two: They add the refining network (their proposed CRNet) and remove the data augmentation. The learning rate is initialized as . The batch size is set to 64, and a total of 70 epochs are executed. In this stage, they only train the refining network, while the generation part (Net1 in Fig. 2) of the model stays fixed after the first-stage training.
Testing description
They follow the official testing strategy without any post-processing. Following [10], they compare their method with other evaluated methods on the original MVP dataset (2,048 points and 16,384 points); the evaluated CD loss and F-score are reported in Tables 3 and 4. Their method outperforms the other methods in terms of CD and F-score@1%.

2.3 Solution of Third Place
Learning Spherical Point Transformation for Point Cloud Completion
Team Members: Junsheng Zhou, Xin Wen, Peng Xiang, YuShen Liu, and Zhizhong Han
denotes equally contribution
denotes corresponding author
General Method Description
Inspired by PMP-Net [21], which learns point moving paths between incomplete and complete points, they design a novel deep neural network for point cloud completion that learns a spherical point transformation (SPTNet).
As shown in Fig. 7, they first randomly sample points on the standard sphere and generate a complete point cloud by moving each spherical point. As a result, the network learns a strict and unique point-level correspondence, which improves the quality of the predicted complete shape. Moreover, due to the uniformity of the spherical points, the predicted complete shape retains this uniformity. To fine-tune the results, they add RENet [12] at the end of the network. Different from other supervised [20, 24] or unsupervised [19] point cloud completion methods, SPTNet makes full use of the spherical distribution and thus achieves better results. They additionally train a non-end-to-end point-movement-based RENet for further fine-tuning. In summary, the main contributions of their work are as follows:

They propose a novel network for point cloud completion, named SPTNet, which moves spherical points to generate a complete point cloud in high accuracy.

They propose to generate complete shapes in a coarse-to-fine manner. After the spherical point transformation, they apply RENet in both end-to-end and non-end-to-end manners, which justifies their idea of point-movement-based shape generation for fine-tuning.

They explore the feasibility of leveraging a transformer-based network to learn point-wise features in the encoder, which captures more local information between points.
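The spherical sampling step underlying SPTNet can be sketched in a few lines. Normalizing Gaussian samples is one standard recipe for uniform points on the sphere, though not necessarily the team's exact sampler:

```python
import numpy as np

def sample_sphere(n, rng=np.random):
    """Sample n points uniformly on the unit sphere by normalising
    isotropic Gaussian samples.  Returns an (n, 3) array."""
    pts = rng.randn(n, 3)
    return pts / np.linalg.norm(pts, axis=1, keepdims=True)
```

Each sampled sphere point is then moved by the network onto the target surface, giving the strict one-to-one point correspondence described above.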
Training Description
They did not pre-train their network on any additional datasets or adopt any pre-trained models, and they did not apply any data augmentation. In the training process, they adopt a two-stage training strategy: in the first stage, SPTNet with a generative RENet is trained end-to-end; in the second stage, based on the results generated in the first stage, they train an additional point-movement-based RENet for each class, which is non-end-to-end. They use the Chamfer Distance as the loss function during training. The Adam optimizer is used for all networks with an initial learning rate of 0.0001, and the learning rate is multiplied by 0.7 every 40 epochs. The batch size is set to 16, and the total number of training epochs is set to 100. They use PyTorch to implement the method. All models are trained on an NVIDIA 2080Ti GPU with 11 GB of memory. Training takes about 24 hours and testing about 45 minutes, and no human effort is required for implementation, training, or validation.
Testing Description
During testing, they first pass the test data through SPTNet, and then input the result into a different point-movement-based RENet according to the category of the model.
3 PartialtoPartial Point Cloud Registration
Overview.
Besides completing single-view partial point clouds, researchers often perform Partial-to-Partial Registration (PPR) for 3D reconstruction. However, previous methods usually perform PPR on ModelNet40 [23] under restricted rotations in [0°, 45°]. In contrast, we use virtual-scanned partial point cloud pairs in the MVP registration challenge, and many partial point cloud pairs are under unrestricted rotations in [0°, 180°]. These settings are more similar to observations in real applications, such as 6D pose estimation, but they challenge existing object-centric PPR methods.
| Method | Ranking | Rot Error | Trans Error | MSE |
|---|---|---|---|---|
| IMNet | 3 | 2.91 | 0.027 | 0.078 |
| ROPNet + PREDATOR | 2 | 2.97 | 0.026 | 0.078 |
| Hybrid Optimization | 1 | 2.92 | 0.021 | 0.072 |
Dataset.
We generate partial point cloud pairs from the MVP dataset [12], and a successful pair is selected if sufficient overlapped areas are detected. In total, we generate a training set with 6,400 paired partial point clouds, a test set with 1,200 pairs, and an extratest set with 2,000 pairs. In the test set and extratest set, most relative rotations are within [0, 45°], and the rest have unrestricted rotations [0, 180°]. The ratio is roughly 4 : 1. Note that the source and the target are two different incomplete point clouds scanned from the same CAD model.
Evaluation Metric.
We mainly use three metrics: the isotropic rotation error $E_{rot}$, the $\ell_2$ translation error $E_{trans}$, and an MSE error that considers both rotations and translations. These metric functions are defined as:

\[ E_{rot} = \mathrm{rad2deg}\!\left(\arccos\frac{\mathrm{tr}(R_{gt}^{T} R_{pred}) - 1}{2}\right) \tag{7} \]

\[ E_{trans} = \lVert t_{gt} - t_{pred} \rVert_2 \tag{8} \]

\[ \mathrm{MSE} = \mathrm{deg2rad}(E_{rot}) + E_{trans} \tag{9} \]

where rad2deg converts angles from radians to degrees, and deg2rad is the inverse operation. $R_{gt}$, $R_{pred}$, $t_{gt}$, and $t_{pred}$ are the ground-truth rotation, predicted rotation, ground-truth translation, and predicted translation, respectively.
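These metrics can be computed as follows; the combined MSE as a sum of the radian rotation error and the translation error is a sketch that is consistent with the values in Table 7:

```python
import numpy as np

def rotation_error_deg(R_gt, R_pred):
    """Isotropic rotation error in degrees: the angle of the residual
    rotation R_gt^T @ R_pred."""
    cos = (np.trace(R_gt.T @ R_pred) - 1.0) / 2.0
    # Clip to guard against floating-point drift outside [-1, 1].
    return np.degrees(np.arccos(np.clip(cos, -1.0, 1.0)))

def translation_error(t_gt, t_pred):
    """l2 translation error."""
    return np.linalg.norm(t_gt - t_pred)

def mse(R_gt, R_pred, t_gt, t_pred):
    """Combined error (assumed form, matching Table 7's magnitudes)."""
    return np.radians(rotation_error_deg(R_gt, R_pred)) + \
        translation_error(t_gt, t_pred)
```

For example, a rotation error of 2.91° and a translation error of 0.027 combine to roughly 0.051 + 0.027 ≈ 0.078, matching the IMNet row of Table 7.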
Results.
The benchmark results are reported in Table 7.
3.1 Solution of First Place
Hybrid Optimization Method with Unconstrained Variables
Team Members: Yuanjie Yan and Junyi An
General Method Description
In the MVP registration challenge, they introduce a new set of variables to directly represent the transformation from the source point cloud to the target point cloud: $r$, $\theta$, $d$, and $l$, which denote the rotation direction (axis), rotation angle, translation direction, and translation distance, respectively. They map restricted variables to unrestricted variables to optimize the translation vector, use a variant of the CD loss as the optimization objective, and finally initialize the variables and apply several strategies to optimize them.
Transformation matrix with optimized variables: The rotation matrix can be represented by the rotation direction $r$ and angle $\theta$ (Rodrigues' rotation formula):

\[ R = \cos\theta\, I + (1-\cos\theta)\, r r^{T} + \sin\theta\, [r]_{\times} \tag{10} \]

where

\[ [r]_{\times} = \begin{pmatrix} 0 & -r_3 & r_2 \\ r_3 & 0 & -r_1 \\ -r_2 & r_1 & 0 \end{pmatrix} \tag{11} \]

Here $r$ can be any (normalized) direction vector, and $\theta$ is restricted to a bounded interval.
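Equations (10)-(11) are Rodrigues' rotation formula; in code:

```python
import numpy as np

def rodrigues(axis, angle):
    """Rotation matrix from an axis-angle pair (Rodrigues' formula)."""
    a = np.asarray(axis, dtype=float)
    a = a / np.linalg.norm(a)
    # Skew-symmetric cross-product matrix [a]_x.
    K = np.array([[0.0, -a[2], a[1]],
                  [a[2], 0.0, -a[0]],
                  [-a[1], a[0], 0.0]])
    # Equivalent form: I + sin(t) K + (1 - cos(t)) K @ K.
    return np.eye(3) + np.sin(angle) * K + (1.0 - np.cos(angle)) * (K @ K)
```

This axis-angle parameterisation keeps the optimization variables low-dimensional compared with optimizing the nine matrix entries under orthogonality constraints.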



| Strategies | Rot Error | Trans Error | MSE |
|---|---|---|---|
| ✓ ✓ ✓ | 3.2654 | 0.0230 | 0.0799 |
| ✓ ✓ ✓ | 3.2350 | 0.0292 | 0.0796 |
| ✓ ✓ ✓ | 9.8391 | 0.0239 | 0.2149 |
| ✓ ✓ ✓ ✓ | 2.9497 | 0.0211 | 0.0726 |
The translation vector can be represented by the translation direction $d$ and distance $l$:

\[ t = l \cdot d \tag{12} \]

To ensure the legitimacy of the translation, they impose a constraint on the translation distance $l$.
A constraint of this form is known as a "box constraint" in the optimization literature. Following [1], there are three different methods of approaching this problem:

Projected gradient descent performs one step of standard gradient descent and then clips all the coordinates to be within the box.

Clipped gradient descent does not clip the iterate on each step; rather, it incorporates the clipping into the objective function to be minimized.

Change of variables introduces a new variable instead of optimizing the original variable.
They choose the third approach, a smoothing of clipped gradient descent that eliminates the problem of getting stuck in extreme regions. In this case, the translation distance is given by:

\[ l = \tfrac{1}{2}\left(\tanh(w) + 1\right) \cdot l_{max} \tag{13} \]

where the constrained variable $l$ is replaced by the unconstrained variable $w$, and $l_{max}$ denotes the upper bound of the translation distance. With a similar idea, they also propose a mapping based on the sin function:

\[ l = \tfrac{1}{2}\left(\sin(w) + 1\right) \cdot l_{max} \tag{14} \]

They tried several mapping functions; the results show that Eq. (14) is the most effective mapping for reducing the translation error.
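The tanh change of variables can be sketched as follows; the bounds are placeholders, not values from the report:

```python
import numpy as np

def box_constrained(w, lo=0.0, hi=1.0):
    """Map an unconstrained variable w to the interval [lo, hi] via tanh,
    so plain gradient descent on w respects the box constraint without
    clipping.  tanh(w) lies in (-1, 1), so the result lies in (lo, hi)."""
    return lo + (np.tanh(w) + 1.0) / 2.0 * (hi - lo)
```

Because the mapping is smooth and monotonic, gradients flow through it everywhere, avoiding the dead zones that hard clipping introduces at the interval boundaries.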
Loss: To optimize the variables, they employ the Chamfer Distance (CD) loss, which helps find point correspondences during the optimization process. The standard CD loss between the transformed source $X$ and the target $Y$ is given by:

\[ \mathcal{L}_{CD}(X, Y) = \frac{1}{|X|}\sum_{x \in X}\min_{y \in Y}\lVert x - y\rVert_2^2 + \frac{1}{|Y|}\sum_{y \in Y}\min_{x \in X}\lVert y - x\rVert_2^2 \tag{15} \]
Moreover, they propose two variants of the CD loss to improve registration on MVP: a Local CD loss and a Projected CD loss.
| Method | Rot Error | Trans Error | MSE |
|---|---|---|---|
| DCP | 27.4702 | 0.1042 | 0.5836 |
| IDAM | 20.6607 | 0.1482 | 0.5088 |
| Euler angles | 13.7002 | 0.0488 | 0.2879 |
| 6D | 9.6956 | 0.0442 | 0.2134 |
| Hybrid Optimization | 2.9187 | 0.0206 | 0.0716 |

Local CD loss. The local CD loss relaxes the requirement of matching all points, improving tolerance to differences in point cloud distribution.

Projected CD loss. The local CD loss only attends to local alignment and ignores global point cloud matching. They therefore project the point clouds onto the $xy$, $yz$, and $xz$ planes and calculate the projected CD loss as follows:

\[ \mathcal{L}_{proj} = \sum_{s} \mathcal{L}_{CD}\big(\mathrm{proj}_s(X), \mathrm{proj}_s(Y)\big) \tag{17} \]

where $s$ is a projected plane and $\mathrm{proj}_s$ denotes the projection onto it.

The final optimization function is given by:

\[ \mathcal{L} = \mathcal{L}_{local} + \lambda\, \mathcal{L}_{proj} \tag{18} \]

where $\lambda$ is a balance weight.
| Rot Error | Trans Error | MSE |
|---|---|---|
| 2.9751 ± 0.0254 | 0.02106 ± 0.0003 | 0.07299 ± 0.0004 |
| 2.9187* | 0.0206* | 0.0716* |

Training description
The proposed method does not require training.
Testing Description
Predicting a rigid transformation by gradient descent is a non-convex optimization problem, so optimizing from multiple initializations is crucial. In their method, they initialize the rotation direction $r$ with multiple candidate directions. For the initial angle, they set the initial value within a fixed interval, and the result with the smallest loss is selected as the rotation matrix. They also set a threshold on the optimization loss: when the optimization loss exceeds the threshold, they re-initialize the angle in several sub-intervals and repeat the optimization. This divided-interval optimization strategy addresses the problem of symmetric point cloud models and tries to avoid over-rotating past the true angle. The strategy coincides with the prior probability of the dataset: most relative rotations are restricted to [0°, 45°], and the rest have unrestricted rotations in [0°, 180°]; the ratio is roughly 4 : 1.

3.2 Solution of Second Place
Deep Models with Fusion Strategies
Team Members: Lifa Zhu, Changwei Lin, Dongrui Liu, Xin Li, Francisco GómezFernández.
General Method Description
They propose to fuse ROPNet [33] and PREDATOR [4] to solve point cloud registration in the MVP Challenge. ROPNet and the pipeline of their solution are shown in Fig. 8 and Fig. 9.
In ROPNet, they propose a context-guided module that uses an encoder to extract global features for predicting point overlap scores, and introduce a Transformer to enrich point features and remove non-representative points based on the overlap scores and feature matching.
Considering unrestricted rotations, they use PREDATOR in their pipeline. In the PREDATOR source code, they found and fixed a simple but important GNN bug, which helps the network achieve higher performance. They then adjusted PREDATOR's parameters for the MVP registration challenge. Inspired by the partial-to-complete idea proposed in ROPNet, they remove some points from the source data based on the predicted scores and keep all points in the target data during the RANSAC iterations.
Based on the above, they designed a few ensemble strategies based on data characteristics to fuse ROPNet and PREDATOR for estimating the final rigid transformation. Experiments on the test set showed that the approach is effective in most cases, with few failures. The proposed context-guided (CG) module and Transformer-based Feature Matching Removal (TFMR) module of ROPNet are shown in Fig. 10 and Fig. 11.
| Model | rot_level | Rot Error | Trans Error | MSE |
|---|---|---|---|---|
| RPMNet_corr [34] | 0 | 12.5560 | 0.1674 | 0.3865 |
| ROPNet [33] | 0 | 1.0449 | 0.0193 | 0.0375 |
| RPMNet_corr [34] | 0, 1 | 21.9685 | 0.2062 | 0.5896 |
| ROPNet [33] + PREDATOR [4] | 0, 1 | 3.1666 | 0.0292 | 0.0845 |
Training description
They trained ROPNet and PREDATOR independently; all 2,048 points were used for training both models. For ROPNet, they trained for 600 epochs using the Adam optimizer with an initial learning rate of 0.0001, decayed with a cosine annealing schedule. They trained ROPNet in a non-iterative manner but run 2 iterations of the TFMR module at test time. Note that they trained ROPNet only for small rotation angles ranging from 0° to 45°. Also, they did not use Point Pair Features [2], because accurate normal vectors cannot be obtained from the MVP challenge data.
For PREDATOR, following the code released by the authors (https://github.com/overlappredator/OverlapPredator), they trained on the MVP registration dataset for 200 epochs using SGD with 0.98 momentum. The initial learning rate was 0.01, with an exponential decay factor of 0.95 every epoch. They trained PREDATOR under unrestricted rotation angles ranging from 0° to 360°. In addition, they adjusted some parameters, such as setting the voxel size to 0.04 and the number of sampled points in the circle loss, among other details of the loss implementation.
Testing Description
For each source and target point cloud pair, they estimate transformations T1 and T2 based on ROPNet and PREDATOR, respectively. T1 is the source-to-target transformation predicted by the end-to-end ROPNet model; T2 is also a source-to-target transformation, obtained with RANSAC using the features and keypoints provided by PREDATOR. They then select T1 or T2 based on their proposed ensemble rules. They compare their method with RPMNet_corr [34], a variant of RPMNet [27] designed to handle registration with unrestricted rotations. The results in Table 11 show that their ROPNet achieves a much lower registration error than RPMNet_corr when rot_level is 0. When the rotation is not restricted, the ensemble of ROPNet and PREDATOR also achieves a much lower registration error than RPMNet_corr.
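The team's ensemble rules are not detailed in the report. Purely as an illustration, one simple stand-in rule is to keep whichever candidate transformation aligns the source closer to the target under Chamfer Distance; every name below is hypothetical:

```python
import numpy as np

def chamfer(p1, p2):
    """Symmetric Chamfer Distance between (N, 3) and (M, 3) clouds."""
    d = np.sum((p1[:, None, :] - p2[None, :, :]) ** 2, axis=-1)
    return d.min(axis=1).mean() + d.min(axis=0).mean()

def apply_T(T, pts):
    """Apply a 4x4 rigid transformation to an (N, 3) cloud."""
    R, t = T[:3, :3], T[:3, 3]
    return pts @ R.T + t

def select_transform(T1, T2, source, target):
    """Stand-in ensemble rule (assumption, not the team's actual rule):
    keep the estimate whose transformed source is closer to the target."""
    c1 = chamfer(apply_T(T1, source), target)
    c2 = chamfer(apply_T(T2, source), target)
    return T1 if c1 <= c2 else T2
```

A rule of this kind only needs the two candidate transforms and the raw clouds, so it composes with any pair of registration back-ends.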
3.3 Solution of Third Place
IMNet: Partial Point Cloud Registration with Inliers Prediction and Matching
Team Members: Qinlong Wang and Yang Yang
General Method Description
Their work, titled IMNet: Partial Point Cloud Registration with Inliers Prediction and Matching, is built from 4 main components, as illustrated in Figure 12.

Firstly, they use DGCNN [18] to extract an initial per-point feature in each point cloud.

To extract discriminative matching features, the per-point features of the two point clouds are then transformed by a graph neural network constructed by interleaving intra- and cross-graph aggregations: edges of the intra-graph connect all points within the same point cloud, while edges of the cross-graph connect each point to all points in the other point cloud. They introduce a fully connected, equal-weighted intra-graph to learn the relative structure of each point cloud, where the large receptive field helps disambiguate ambiguous features. An attention-based cross-graph is adopted to exchange information between the two point clouds and make the point features more discriminative. To better learn structural information, they embed the coordinates of each point with an MLP to the same dimension as the initial feature, and concatenate the two features before feeding them into the GNN.
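As a rough sketch of the interleaved aggregation (learned projections and MLPs are omitted: the equal-weighted intra-graph reduces to a mean over the cloud and the cross-graph to dot-product attention, which are simplifications rather than the authors' exact layers):

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def intra_aggregate(feat):
    """Equal-weighted, fully connected intra-graph: every point receives
    the mean feature of its own cloud (maximal receptive field)."""
    return feat + feat.mean(axis=0, keepdims=True)

def cross_aggregate(feat_a, feat_b):
    """Attention-based cross-graph: each point in cloud A attends to all
    points in cloud B and aggregates their features."""
    attn = softmax(feat_a @ feat_b.T / np.sqrt(feat_a.shape[1]), axis=1)
    return feat_a + attn @ feat_b

# One interleaved round (the submitted model stacks 10 such layers):
rng = np.random.default_rng(0)
f_src, f_tgt = rng.normal(size=(8, 16)), rng.normal(size=(8, 16))
f_src, f_tgt = intra_aggregate(f_src), intra_aggregate(f_tgt)
f_src, f_tgt = cross_aggregate(f_src, f_tgt), cross_aggregate(f_tgt, f_src)
```

The residual additions keep the per-point identity while mixing in structure from the same cloud (intra) and correspondence cues from the other cloud (cross).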

Then an Inlier points Sampling Sinkhorn (ISS) module samples and matches confident inlier points in the overlapping parts. Firstly, inlier scores are predicted for all points to prepare sampling candidates. During training, they sample half of the points with higher scores and half with lower scores; at test time, sampling simply selects the candidates with the highest scores. They then construct a similarity matrix with a dustbin from the sampled features. To further handle outliers that remain among the sampled points, the dustbin score is initialized with the inlier scores. Finally, they feed the similarity matrix into the Sinkhorn algorithm to generate a matching confidence map.
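The dustbin-augmented Sinkhorn step can be sketched as follows. A scalar dustbin initialization is used here for brevity, whereas the method initializes it from the per-point inlier scores:

```python
import numpy as np

def sinkhorn_with_dustbin(sim, dustbin_score, n_iters=20):
    """Pad a similarity matrix with a dustbin row/column for unmatched
    points, then alternate row/column normalization in log space."""
    n, m = sim.shape
    log_a = np.full((n + 1, m + 1), dustbin_score, dtype=float)
    log_a[:n, :m] = sim
    for _ in range(n_iters):
        log_a -= np.log(np.exp(log_a).sum(axis=1, keepdims=True))  # rows
        log_a -= np.log(np.exp(log_a).sum(axis=0, keepdims=True))  # columns
    return np.exp(log_a)  # matching confidence map

rng = np.random.default_rng(0)
conf = sinkhorn_with_dustbin(rng.normal(size=(5, 6)), dustbin_score=0.0)
```

The dustbin lets a sampled point declare itself unmatched instead of being forced into a spurious correspondence, which matters when sampling still admits a few outliers.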

Finally, a bidirectional correspondence set is constructed by concatenating the source-to-target correspondences and the symmetric target-to-source ones, where each correspondence is selected as the match with the highest confidence in its row or column of the matching confidence map. This bidirectional correspondence generates more matching pairs for low-overlap data and yields better results than correspondences based on a mutual check. Each correspondence is weighted by its confidence, and the transformation is computed with SVD.
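The final weighted SVD step is the classic weighted Kabsch/Procrustes solution; a self-contained sketch:

```python
import numpy as np

def weighted_kabsch(src, tgt, w):
    """Closed-form rigid transform from confidence-weighted correspondences
    (src[i] <-> tgt[i]) via SVD of the weighted cross-covariance."""
    w = w / w.sum()
    mu_s = (w[:, None] * src).sum(axis=0)
    mu_t = (w[:, None] * tgt).sum(axis=0)
    H = (src - mu_s).T @ np.diag(w) @ (tgt - mu_t)
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))  # guard against reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = mu_t - R @ mu_s
    return R, t
```

Down-weighting low-confidence pairs here reduces the influence of residual outlier correspondences on the estimated rotation and translation.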
Model Details: For the submitted model, they interleave the intra- and cross-graph aggregations for 10 layers. The dimension of the initial features extracted by DGCNN [18] is 64, and the feature dimension in the GNN and ISS modules remains 128 after concatenation. The ISS module samples 1/6 of the points for each point cloud. They use the same evaluation metrics as the official implementation of the MVP benchmark.
Dataset | Preprocess | Rot. Error | Trans. Error | MSE | RMSE | Recall
MVP test [11] | - | 2.5008 | 0.0305 | 0.0742 | 0.0382 | 0.9583
MVP test [11] | add noise | 3.2183 | 0.0311 | 0.0872 | 0.0447 | 0.9633
ModelNet40 [23] | sample rotation | 3.7722 | 0.0120 | 0.0779 | 0.0236 | 0.9617
Implementation Details: The loss function used for training consists of 4 parts. The first two are a cross-entropy loss on the inlier scores and a peaky loss, inspired by the loss function in R2D2 [14], that maximizes the peakiness of the overlap scores. The last two are a cross-entropy loss on the matching confidence map, following the implementation of the RPMNet loss in [34], and a cross-entropy loss on the dustbin confidence. The peaky loss is weighted by 0.25 and the dustbin loss by 0.5.
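Putting the stated weights together, the total training loss is presumably of the following form (the individual loss-term values below are illustrative stand-ins, not results from the paper):

```python
def total_loss(ce_inlier, peaky, ce_match, ce_dustbin,
               w_peaky=0.25, w_dustbin=0.5):
    """Weighted sum of the four training loss terms described above."""
    return ce_inlier + w_peaky * peaky + ce_match + w_dustbin * ce_dustbin

# e.g. with illustrative per-term values:
loss = total_loss(0.4, 0.2, 0.6, 0.1)  # 0.4 + 0.05 + 0.6 + 0.05 = 1.1
```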
Training Description
They train their network on the MVP training data and validate the model on the MVP test data. During training, they randomly sample each point cloud down to 1024 points to save memory and increase the overlap rate of the data. Moreover, the ISS module samples half positive and half negative samples, i.e., matching and non-matching points. The network is trained with the Adam optimizer [5] and a learning rate of 0.0005. After the initial 1000 epochs, the submitted model is fine-tuned for a further 500 epochs with a learning rate of 0.0001. The network typically converges after 500 epochs.
Testing Description
At test time, they retain all 2048 points for inference, and the ISS module samples the points with the highest inlier scores. They further select the half of the bidirectional correspondences with the highest confidence to compute the transformation.
4 Discussion
4.1 Completion
Summary. Generally, the top-3 methods significantly outperform the baseline methods. PoinTr++ employs a global transformer to generate the missing parts, followed by RENet. CRNet proposes IOI augmentation and a multi-scale SPD module to achieve semantic-aware completion. SPTNet makes full use of the spherical distribution and learns a non-end-to-end, point-movement-based RENet for further fine-tuning.
Future Directions. In this challenge, we use the CD loss to evaluate different completion methods. However, CD is not sensitive to the global distribution, so completion results can be biased toward the partial observations. New metrics, such as DCD [22] and EMD, can be leveraged for evaluation. Recently, diffusion models [9] have produced impressive point cloud completion results, which also circumvent the imbalance issue. In addition, unsupervised [31] or self-supervised [19] point cloud completion can be studied further.
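For reference, the CD metric discussed above is the symmetric Chamfer distance; a minimal sketch showing that it measures only nearest-neighbor proximity, not global density:

```python
import numpy as np

def chamfer_distance(a, b):
    """Symmetric Chamfer distance (squared-L2 variant) between point sets."""
    d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1)  # (N, M)
    return d2.min(axis=1).mean() + d2.min(axis=0).mean()

# CD is low whenever every point has *some* close neighbor, even if the
# predicted density is badly imbalanced -- the bias discussed above.
rng = np.random.default_rng(0)
cloud = rng.normal(size=(64, 3))
```

This insensitivity to density is exactly what density-aware alternatives such as DCD [22] and distribution-matching metrics such as EMD aim to fix.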
4.2 Registration
Summary. In a nutshell, the top-3 methods achieve surprisingly good results for partial-to-partial point cloud registration, especially under unrestricted rotations. The 1st-place method uses non-learning hybrid optimization. The 2nd place uses an ensemble strategy combining ROPNet [33] and PREDATOR [4]. The 3rd place uses cross-graph connections and the ISS module.
Future Directions. Although these methods achieve outstanding registration results, the PPR problem is not yet fully resolved: 1) the hybrid optimization requires multiple initializations; 2) the ensemble method is neither elegant nor efficient; 3) the 3rd-place method relies heavily on cross-graph connections and requires about 1500 epochs of training in total. Moreover, these registration methods did not take full advantage of pose-invariant features (e.g., PPF), which can facilitate full-range PPR [11].
Acknowledgement
We sincerely thank Yuanhan Zhang for helpful discussions.
References
 [1] Nicholas Carlini and David Wagner. Towards evaluating the robustness of neural networks. In 2017 IEEE Symposium on Security and Privacy (SP), pages 39–57. IEEE, 2017.
 [2] Bertram Drost, Markus Ulrich, Nassir Navab, and Slobodan Ilic. Model globally, match locally: Efficient and robust 3d object recognition. In 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pages 998–1005. IEEE, 2010.
 [3] Kexue Fu, Shaolei Liu, Xiaoyuan Luo, and Manning Wang. Robust point cloud registration framework based on deep graph matching. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 8893–8902, 2021.
 [4] Shengyu Huang, Zan Gojcic, Mikhail Usvyatsov, Andreas Wieser, and Konrad Schindler. Predator: Registration of 3d point clouds with low overlap. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 4267–4276, 2021.
 [5] Diederik P Kingma and Jimmy Ba. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980, 2014.
 [6] Jiahao Li, Changhao Zhang, Ziyao Xu, Hangning Zhou, and Chi Zhang. Iterative distance-aware similarity matrix convolution with mutual-supervised point elimination for efficient point cloud registration. arXiv preprint arXiv:1910.10328, 2019.
 [7] Ruihui Li, Xianzhi Li, Pheng-Ann Heng, and Chi-Wing Fu. Point cloud upsampling via disentangled refinement. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 344–353, 2021.

 [8] Minghua Liu, Lu Sheng, Sheng Yang, Jing Shao, and Shi-Min Hu. Morphing and sampling network for dense point cloud completion. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 34, pages 11596–11603, 2020.
 [9] Zhaoyang Lyu, Zhifeng Kong, Xudong Xu, Liang Pan, and Dahua Lin. A conditional point diffusion-refinement paradigm for 3d point cloud completion. arXiv preprint arXiv:2112.03530, 2021.
 [10] Liang Pan. Ecg: Edge-aware point cloud completion with graph convolution. IEEE Robotics and Automation Letters, 2020.
 [11] Liang Pan, Zhongang Cai, and Ziwei Liu. Robust partial-to-partial point cloud registration in a full range. arXiv preprint arXiv:2111.15606, 2021.
 [12] Liang Pan, Xinyi Chen, Zhongang Cai, Junzhe Zhang, Haiyu Zhao, Shuai Yi, and Ziwei Liu. Variational relational point completion network. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 8524–8533, 2021.
 [13] Charles R Qi, Hao Su, Kaichun Mo, and Leonidas J Guibas. Pointnet: Deep learning on point sets for 3d classification and segmentation. Proc. Computer Vision and Pattern Recognition (CVPR), IEEE, 1(2):4, 2017.
 [14] Jerome Revaud, Philippe Weinzaepfel, César De Souza, Noe Pion, Gabriela Csurka, Yohann Cabon, and Martin Humenberger. R2d2: repeatable and reliable detector and descriptor. arXiv preprint arXiv:1906.06195, 2019.
 [15] Lyne P Tchapmi, Vineet Kosaraju, Hamid Rezatofighi, Ian Reid, and Silvio Savarese. Topnet: Structural point cloud decoder. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 383–392, 2019.
 [16] Xiaogang Wang, Marcelo H Ang Jr, and Gim Hee Lee. Cascaded refinement network for point cloud completion. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 790–799, 2020.

 [17] Yue Wang and Justin M Solomon. Prnet: Self-supervised learning for partial-to-partial registration. In Advances in Neural Information Processing Systems, pages 8812–8824, 2019.
 [18] Yue Wang, Yongbin Sun, Ziwei Liu, Sanjay E Sarma, Michael M Bronstein, and Justin M Solomon. Dynamic graph cnn for learning on point clouds. ACM Transactions on Graphics (TOG), 38(5):1–12, 2019.
 [19] Xin Wen, Zhizhong Han, Yan-Pei Cao, Pengfei Wan, Wen Zheng, and Yu-Shen Liu. Cycle4completion: Unpaired point cloud completion using cycle transformation with missing region coding. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 13080–13089, 2021.
 [20] Xin Wen, Tianyang Li, Zhizhong Han, and Yu-Shen Liu. Point cloud completion by skip-attention network with hierarchical folding. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 1939–1948, 2020.
 [21] Xin Wen, Peng Xiang, Zhizhong Han, Yan-Pei Cao, Pengfei Wan, Wen Zheng, and Yu-Shen Liu. Pmpnet: Point cloud completion by learning multi-step point moving paths. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 7443–7452, 2021.
 [22] Tong Wu, Liang Pan, Junzhe Zhang, Tai Wang, Ziwei Liu, and Dahua Lin. Density-aware chamfer distance as a comprehensive metric for point cloud completion. arXiv preprint arXiv:2111.12702, 2021.
 [23] Zhirong Wu, Shuran Song, Aditya Khosla, Fisher Yu, Linguang Zhang, Xiaoou Tang, and Jianxiong Xiao. 3d shapenets: A deep representation for volumetric shapes. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 1912–1920, 2015.
 [24] Peng Xiang, Xin Wen, Yu-Shen Liu, Yan-Pei Cao, Pengfei Wan, Wen Zheng, and Zhizhong Han. Snowflakenet: Point cloud completion by snowflake point deconvolution with skip-transformer. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 5499–5509, 2021.
 [25] Haozhe Xie, Hongxun Yao, Shangchen Zhou, Jiageng Mao, Shengping Zhang, and Wenxiu Sun. Grnet: Gridding residual network for dense point cloud completion. arXiv preprint arXiv:2006.03761, 2020.
 [26] Yaoqing Yang, Chen Feng, Yiru Shen, and Dong Tian. Foldingnet: Point cloud autoencoder via deep grid deformation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 206–215, 2018.
 [27] Zi Jian Yew and Gim Hee Lee. Rpmnet: Robust point matching using learned features. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 11824–11833, 2020.
 [28] Xumin Yu, Yongming Rao, Ziyi Wang, Zuyan Liu, Jiwen Lu, and Jie Zhou. Pointr: Diverse point cloud completion with geometry-aware transformers. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 12498–12507, 2021.

 [29] Wentao Yuan, Benjamin Eckart, Kihwan Kim, Varun Jampani, Dieter Fox, and Jan Kautz. Deepgmr: Learning latent gaussian mixture models for registration. In European Conference on Computer Vision, pages 733–750. Springer, 2020.
 [30] Wentao Yuan, Tejas Khot, David Held, Christoph Mertz, and Martial Hebert. Pcn: Point completion network. In 2018 International Conference on 3D Vision (3DV), pages 728–737. IEEE, 2018.
 [31] Junzhe Zhang, Xinyi Chen, Zhongang Cai, Liang Pan, Haiyu Zhao, Shuai Yi, Chai Kiat Yeo, Bo Dai, and Chen Change Loy. Unsupervised 3d shape completion through gan inversion. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 1768–1777, 2021.
 [32] Wenxiao Zhang, Qingan Yan, and Chunxia Xiao. Detail preserved point cloud completion via separated feature aggregation. arXiv preprint arXiv:2007.02374, 2020.
 [33] Lifa Zhu, Dongrui Liu, Changwei Lin, Rui Yan, Francisco Gómez-Fernández, Ninghua Yang, and Ziyong Feng. Point cloud registration using representative overlapping points. arXiv preprint arXiv:2107.02583, 2021.
 [34] Tejas Zodage, Rahul Chakwate, Vinit Sarode, Rangaprasad Arun Srivatsan, and Howie Choset. Correspondence matrices are underrated. In 2020 International Conference on 3D Vision (3DV), pages 603–612. IEEE, 2020.