1 Introduction
Finding dense correspondence between 3D shapes is a key algorithmic component in problems such as statistical modeling Blanz and Vetter (2003); Zuffi et al. (2017); Bogo et al. (2014), cross-shape texture mapping Kraevoy et al. (2003), and space-time 3D reconstruction Niemeyer et al. (2019). Dense 3D shape correspondence can be defined as: given two 3D shapes belonging to the same object category, one can match an arbitrary point on one shape to its semantically equivalent point on the other shape if such a correspondence exists. For instance, given two chairs, the dense correspondence of the middle point on one chair's arm should be the corresponding middle point on the other chair's arm, despite the different shapes of the arms; or alternatively, the non-existence of correspondence should be declared if the other chair has no arm. Although prior dense correspondence methods Ovsjanikov et al. (2012); Litany et al. (2017); Groueix et al. (2018a); Halimi et al. (2019); Roufosse et al. (2019); Lee and Kazhdan (2019); Steinke et al. (2007); Liu et al. (2019a) have proven to be effective on organic shapes, e.g., human bodies and mammals, they are less suitable for generic topology-varying or man-made objects, e.g., chairs or vehicles Huang et al. (2014). It remains a challenge to build dense 3D correspondence for a category with large variations in geometry, structure, and even topology. First, the lack of annotations on dense correspondence often leaves unsupervised learning as the only option. Second, most prior works make the inadequate assumption Van Kaick et al. (2011) that there is similar topological variability between matched shapes. Man-made objects such as the chairs shown in Fig. 1 are particularly challenging to tackle, since they often differ not only by geometric deformations, but also by part constitution. In these cases, existing correspondence methods for man-made objects either perform fuzzy Kim et al. (2012); Solomon et al. (2012) or part-level Sidi et al. (2011); Alhashim et al. (2015) correspondences, or predict a constant number of semantic points Huang et al. (2017); Chen et al. (2020). As a result, they cannot determine whether an established correspondence is a "missing match" or not. As shown in Fig. 1, for instance, we may find non-convincing correspondences in legs between an office chair and a legged chair, or even no correspondence in arms for some pairs. Ideally, given a query point on the source shape, a dense correspondence method aims to determine whether a correspondence exists on the target shape, and to find the corresponding point if it does. This objective lies at the core of this work.
Shape representation is highly relevant to, and can impact, the approach to dense correspondence. Recently, compared to point clouds Achlioptas et al. (2018); Qi et al. (2017a, b) or meshes Groueix et al. (2018b); Georgia Gkioxari (2019); Wang et al. (2018), deep implicit functions have been shown to be highly effective as 3D shape representations Park et al. (2019); Mescheder et al. (2019); Liu et al. (2019b); Saito et al. (2019); Chen et al. (2019); Chen and Zhang (2019); Atzmon et al. (2019), since they can handle generic shapes of arbitrary topology, which is favorable for dense correspondence. Often learned as an MLP, a conventional implicit function takes as input a 3D shape represented by a latent code and a query location in 3D space, and estimates the occupancy of that location. In this work, we propose to plant the dense correspondence capability into the implicit function by learning a semantic part embedding. Specifically, we first adopt a branched implicit function Chen et al. (2019) to learn a part embedding vector (PEV), where max-pooling over the PEV gives the occupancy. In this way, each branch is tasked to learn a representation for one universal part of the input shape, and the PEV represents the occupancy of the point w.r.t. all the branches/semantic parts. Assuming that the PEVs of a pair of corresponding points are similar, we then establish dense correspondence via an inverse function mapping the PEV back to the 3D space. To further satisfy this assumption, we devise an unsupervised learning framework with a joint loss measuring both the occupancy error and the shape reconstruction error between the input points and their reconstructions. In addition, a cross-reconstruction loss is proposed to enforce part embedding consistency within pairs of shapes in the collection. During inference, based on the estimated PEVs, we can produce a confidence score to distinguish whether an established correspondence is valid or not. In summary, the contributions of this work include: We propose a novel paradigm leveraging implicit functions for category-specific unsupervised dense 3D shape correspondence, which is suitable for objects with diverse variations including varying topology. We devise several effective loss functions to learn a semantic part embedding, which enables both shape segmentation and dense correspondence. Further, based on the learnt part embedding, our method can estimate a confidence score measuring whether a predicted correspondence is valid. Through extensive experiments, we demonstrate the superiority of our method in shape segmentation and 3D semantic correspondence.
2 Related Work
Dense Shape Correspondence While there are many dense correspondence works for organic shapes Ovsjanikov et al. (2012); Litany et al. (2017); Groueix et al. (2018a); Halimi et al. (2019); Roufosse et al. (2019); Lee and Kazhdan (2019); Boscaini et al. (2016); Steinke et al. (2007), due to space, our review focuses on methods designed for man-made objects, including optimization- and learning-based methods. For the former, most prior works build correspondences only at a part level Kalogerakis et al. (2010); Huang et al. (2011); Sidi et al. (2011); Alhashim et al. (2015); Zhu et al. (2017). Kim et al. Kim et al. (2012) propose a diffusion map to compute point-based "fuzzy correspondence" for every shape pair, which is only effective for a small collection of shapes with limited shape variations. Kim et al. (2013) and Huang et al. (2015) present template-based deformation methods, which can find point-level correspondences after rigid alignment between the template and target shapes. However, these methods only predict coarse and discrete correspondence, leaving the structural or topological discrepancies between matched parts or part ensembles unresolved. A series of learning-based methods Yi et al. (2017); Huang et al. (2017); Sung et al. (2018); Muralikrishnan et al. (2019); You et al. (2020) is proposed to learn local descriptors and treat correspondence as 3D semantic landmark estimation. E.g., ShapeUnicode Muralikrishnan et al. (2019) learns a unified embedding for 3D shapes and demonstrates its ability in correspondence among 3D shapes. However, these methods require ground-truth pairwise correspondences for training. Recently, Chen et al. Chen et al. (2020) present an unsupervised method to estimate 3D structure points. Unfortunately, it estimates a constant number of sparse structured points. As shapes may have diverse part constitutions, it may not be meaningful to establish correspondence between all of their points. Groueix et al. Groueix et al. (2019) also learn a parametric transformation between two surfaces by leveraging cycle-consistency, and apply it to the segmentation problem. However, this deformation-based method always deforms all points on one shape to another, even the points from a non-matching part. In contrast, our unsupervisedly learnt model can perform pairwise dense correspondence for any two shapes of a man-made object category. Implicit Shape Representation Due to the advantages of continuous representation and handling complicated topologies, implicit functions have been adopted for learning representations for 3D shape generation Chen and Zhang (2019); Mescheder et al. (2019); Park et al. (2019); Liu et al. (2019b), encoding texture Oechsle et al. (2019); Sitzmann et al. (2019); Saito et al. (2019), and 3D reconstruction Niemeyer et al. (2019). Meanwhile, some works Huang et al. (2004, 2006) leverage the implicit representation together with a deformation model for shape registration. However, these methods rely on the deformation model, which might prevent their usage for topology-varying objects. Slavcheva et al. Slavcheva et al. (2017) present an approach that implicitly obtains correspondence for organic shapes by predicting the evolution of the signed distance field. However, as it requires a Laplacian operator to be invariant, it is limited to small shape variations. Recently, extensions have been proposed to learn deep structured Genova et al. (2019, 2020) or segmented implicit functions Chen et al. (2019), or separate implicit functions for shape parts Paschalidou et al. (2020). In contrast to operating at a part level, we extend implicit functions to unsupervised dense shape correspondence.
3 Proposed Method
Let us first formulate the dense 3D correspondence problem. Given a collection of 3D shapes of the same object category, one may encode each shape in a latent space. For any point p on the source shape S, dense 3D correspondence finds its semantically corresponding point q on the target shape T, if a semantic embedding function (SEF) E is able to satisfy
‖E(p, S) − E(q, T)‖₂ ≤ ε. (1)
Here the SEF E is responsible for mapping a point from its 3D Euclidean space to the semantic embedding space. When E(p, S) and E(q, T) have sufficiently similar locations in the semantic embedding space, p and q have similar semantic meaning, or functionality, in their respective shapes. Hence q is the corresponding point of p. On the other hand, if their distance in the embedding space is too large (> ε), there is no corresponding point in T for p. If the SEF can be learned for a small ε, the corresponding point of p can be solved via q = E⁻¹(E(p, S), T), where E⁻¹ is the inverse function of E that maps a point from the semantic embedding space back to the 3D space. Therefore, dense correspondence amounts to learning the SEF and its inverse function. Toward this goal, we propose to leverage the topology-free implicit function, a conventional shape representation, to jointly serve as the SEF. By assuming that corresponding points are similar in the embedding space, we explicitly implement an inverse function mapping from the embedding space to the 3D space, so that the learning objectives can be defined more conveniently in the 3D space rather than in the embedding space. Both functions are jointly learned with an occupancy loss for accurate shape representation, and a self-reconstruction loss for the inverse function to recover the input points. In addition, we propose a cross-reconstruction loss enforcing two objectives. One is that the two functions can deform source shape points to be sufficiently close to the target shape. The other is that the correspondence offset vectors, q − p, are locally smooth within the neighbourhood of p.
3.1 Implicit Function and Its Inverse
Implicit Function As in Chen and Zhang (2019); Mescheder et al. (2019), a shape is first encoded as a shape code z by a PointNet Qi et al. (2017a). Given the 3D coordinate of a query point p, the implicit function assigns an occupancy probability between 0 and 1, where 1 indicates p is inside the shape, and 0 outside. This conventional function cannot serve as the SEF, given its simple 1D output. Motivated by the unsupervised part segmentation of Chen et al. (2019), we adopt its branched layer as the final layer of our implicit function, whose output is denoted by o in Fig. 2. A max-pooling operator leads to the final occupancy by selecting one branch, whose index indicates the unsupervisedly estimated part to which p belongs. Conceptually, each element of o shall indicate the occupancy value of p w.r.t. the respective part. Since o appears to represent the occupancy of p w.r.t. all semantic parts of the object, the latent space of o can be the desirable semantic embedding, and thus we term o the part embedding vector (PEV) of p. In our implementation, the implicit function is composed of fully connected layers, each followed by a LeakyReLU, except the final output (Sigmoid). Inverse Implicit Function Given the objective function in Eqn. 1, one may consider that learning the SEF alone would be sufficient for dense correspondence. However, this has two issues. 1) To find the correspondence of p, we would need to invert the implicit function, i.e., assume its output equals the query PEV and solve for the point via iterative backpropagation, which is inefficient during inference. 2) It is easier to define shape-related constraints or losses in the 3D space than in the embedding space. To this end, we define the inverse implicit function to take the PEV o and the shape code z as inputs, and recover the corresponding 3D location. We use a multi-layer perceptron (MLP) network to implement it. With the inverse function, we can efficiently compute correspondences via a forward pass, without iterative backpropagation.
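As a concrete sketch of the two networks described above (not the authors' exact architecture — the layer widths, branch count, and the output activation of the inverse function are assumptions for illustration), the branched implicit function and its inverse can be written as small PyTorch MLPs:

```python
import torch
import torch.nn as nn

class BranchedImplicitFn(nn.Module):
    """Maps (query point, shape code) to a K-dim part embedding vector (PEV).
    Max-pooling over the K branches gives the scalar occupancy; the argmax
    branch index serves as an unsupervised part label."""
    def __init__(self, code_dim=256, k_branches=12):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(3 + code_dim, 512), nn.LeakyReLU(0.02),
            nn.Linear(512, 256), nn.LeakyReLU(0.02),
            nn.Linear(256, k_branches), nn.Sigmoid(),  # PEV in [0, 1]^K
        )

    def forward(self, xyz, code):
        pev = self.mlp(torch.cat([xyz, code], dim=-1))
        occupancy, part_idx = pev.max(dim=-1)  # max-pool over branches
        return pev, occupancy, part_idx

class InverseImplicitFn(nn.Module):
    """Maps (PEV, shape code) back to a 3D location."""
    def __init__(self, code_dim=256, k_branches=12):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(k_branches + code_dim, 512), nn.LeakyReLU(0.02),
            nn.Linear(512, 256), nn.LeakyReLU(0.02),
            nn.Linear(256, 3), nn.Tanh(),  # assumed normalized coordinates
        )

    def forward(self, pev, code):
        return self.mlp(torch.cat([pev, code], dim=-1))
```

Composing the two gives the round trip used throughout the method: a point and its shape code yield a PEV, and the PEV with a (possibly different) shape code yields a 3D point.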
3.2 Training with Loss Functions
We jointly train our implicit function and inverse function by minimizing three losses: the occupancy loss L_O, the self-reconstruction loss L_SR, and the cross-reconstruction loss L_CR, i.e.,
L = L_O + L_SR + L_CR, (2)
where L_O measures how accurately the implicit function predicts the occupancy of the shapes, L_SR enforces that the inverse function is indeed an inverse of the implicit function, and L_CR strives for part embedding consistency across all shapes in the collection. We first explain how we prepare the training data, then detail our losses. Training Samples Given a collection of raw 3D surfaces with consistent upright orientation, we first normalize the raw surfaces by uniformly scaling each object such that the diagonal of its tight bounding box has a constant length, and make the surfaces watertight by converting them to voxels. Following the sampling scheme of Chen and Zhang (2019), for each shape we obtain spatial points and their occupancy labels, which are 1 for inside points and 0 otherwise. In addition, we uniformly sample surface points to represent the 3D shapes. Occupancy Loss This is an error between the label and the estimated occupancy over all shapes:
L_O = Σ_n Σ_{p∈P_n} ( O(p, z_n) − o*(p) )², (3)
where O(p, z_n) is the max-pooled occupancy estimate for point p of shape n with code z_n, P_n the sampled point set, and o*(p) the ground-truth label.
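The sample preparation described above (bounding-box normalization and voxel-based occupancy labels) can be sketched as follows; the unit-diagonal convention and the grid resolution are assumptions for illustration:

```python
import numpy as np

def normalize_to_unit_diagonal(points):
    """Center a point set and uniformly scale it so the tight bounding
    box has unit diagonal (points end up inside [-0.5, 0.5]^3)."""
    lo, hi = points.min(axis=0), points.max(axis=0)
    center = (lo + hi) / 2.0
    diag = np.linalg.norm(hi - lo)
    return (points - center) / diag

def occupancy_labels(query_points, voxel_grid, grid_res):
    """Label each query point 1 if the voxel it falls into is occupied,
    else 0. Assumes points lie in [-0.5, 0.5]^3 after normalization."""
    idx = np.clip(((query_points + 0.5) * grid_res).astype(int),
                  0, grid_res - 1)
    return voxel_grid[idx[:, 0], idx[:, 1], idx[:, 2]].astype(np.float32)
```

In practice the (point, occupancy) pairs would be sampled offline from a watertight voxelization and fed to the occupancy loss of Eqn. 3.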
Self-Reconstruction Loss We supervise the inverse function by recovering the input surface points:
L_SR = Σ_n Σ_j ‖ O⁻¹( O(p_j^n, z_n), z_n ) − p_j^n ‖₂², (4)
where p_j^n is the j-th surface point of shape n, O denotes the implicit function, and O⁻¹ its inverse. Cross-Reconstruction Loss The cross-reconstruction loss is designed to encourage the resultant PEVs to be similar for densely corresponded points from any two shapes. As in Fig. 2, from a shape collection we first randomly select two shapes A and B. The implicit function generates PEVs o_A and o_B given the points X_A and X_B and their respective shape codes z_A and z_B as inputs. Then we swap their PEVs and send the concatenated vectors to the inverse function: X̂_B = O⁻¹(o_A, z_B), X̂_A = O⁻¹(o_B, z_A). If the part embedding is point-to-point consistent across all shapes, the inverse function should recover each shape from the other, i.e., X̂_A ≈ X_A, X̂_B ≈ X_B. Toward this goal, we exploit several loss functions to minimize the pairwise difference between those shapes:
L_CR = ω_CD L_CD + ω_EMD L_EMD + ω_N L_N + ω_SC L_SC, (5)
where L_CD is the Chamfer distance (CD) loss, L_EMD the Earth Mover distance (EMD) loss, L_N the surface normal loss, L_SC the smooth correspondence loss, and the ω's are the weights. The first three terms focus on shape similarity, while the last one encourages the correspondence offsets to be locally smooth. The Chamfer distance loss is defined as:
L_CD = CD(X̂_A, X_A) + CD(X̂_B, X_B), (6)
where X̂_A and X̂_B denote the cross-reconstructed point sets of shapes A and B.
Here CD is calculated as in Qi et al. (2017a): CD(X, Y) = Σ_{x∈X} min_{y∈Y} ‖x − y‖₂² + Σ_{y∈Y} min_{x∈X} ‖x − y‖₂². The Earth Mover distance loss is defined as:
L_EMD = EMD(X̂_A, X_A) + EMD(X̂_B, X_B), (7)
where EMD is the minimum of the sum of distances between a point in one set and a point in the other set over all possible permutations of correspondences Qi et al. (2017a): EMD(X, Y) = min_{φ: X→Y} Σ_{x∈X} ‖x − φ(x)‖₂, where φ is a bijective mapping. Surface normal loss An appealing property of the implicit representation is that the surface normal can be analytically computed as the spatial derivative via backpropagation through the network. Hence, we are able to define a surface normal distance on the point sets:
L_N = Σ_{(x̂, x)} d_n( n(x̂), n(x) ), (8)
with the sum running over matched point pairs between the cross-reconstructed sets and their targets for both shapes.
where n(x) is the surface normal at x. We measure d_n by the cosine similarity distance: d_n(n₁, n₂) = 1 − (n₁ · n₂) / (‖n₁‖₂ ‖n₂‖₂), where · denotes the dot product. Smooth correspondence loss encourages the correspondence offset vectors Δp = p̂ − p of neighboring points to be as similar as possible, to ensure a smooth deformation:
L_SC = Σ_{p∈X_A} Σ_{p′∈N(p)} ‖ Δp − Δp′ ‖₂ + Σ_{q∈X_B} Σ_{q′∈N(q)} ‖ Δq − Δq′ ‖₂, (9)
where N(p) and N(q) are neighborhoods of p and q, respectively.
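The four cross-reconstruction terms can be sketched in NumPy/SciPy as below. This is an illustrative re-implementation rather than the authors' training code: the EMD here uses an exact assignment (training typically uses a differentiable approximation), the normals are assumed to be given rather than computed through the network, and the neighbourhood size k is an assumption:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment
from scipy.spatial.distance import cdist

def chamfer_distance(P, Q):
    """Symmetric Chamfer distance between point sets P (N,3) and Q (M,3)."""
    d = cdist(P, Q)  # pairwise Euclidean distances
    return float(d.min(axis=1).mean() + d.min(axis=0).mean())

def earth_mover_distance(P, Q):
    """EMD as the minimum-cost bijection between equal-sized point sets."""
    d = cdist(P, Q)
    rows, cols = linear_sum_assignment(d)  # optimal bijection phi
    return float(d[rows, cols].sum())

def normal_distance(n1, n2):
    """Mean cosine-similarity distance between paired normals (N,3)."""
    cos = np.sum(n1 * n2, axis=1) / (
        np.linalg.norm(n1, axis=1) * np.linalg.norm(n2, axis=1))
    return float((1.0 - cos).mean())

def smooth_correspondence_loss(src, deformed, k=4):
    """Penalize differing offset vectors (deformed - src) among the
    k nearest neighbours of each source point."""
    offsets = deformed - src
    d = cdist(src, src)
    nbrs = np.argsort(d, axis=1)[:, 1:k + 1]    # exclude the point itself
    diff = offsets[:, None, :] - offsets[nbrs]  # (N, k, 3)
    return float(np.linalg.norm(diff, axis=2).mean())
```

A rigid translation of the whole point set gives constant offsets and hence zero smoothness penalty, which matches the intent of Eqn. 9.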
3.3 Inference
During inference our method can offer both shape segmentation and dense correspondence for 3D shapes. As each element of the PEV learns a compact representation for one common part of the shape collection, the segmentation label of a point is the index of the element being max-pooled from its PEV. As both the implicit function and its inverse are point-based, the number of input points can be arbitrary during inference. Given two point sets X_A and X_B with shape codes z_A and z_B, the implicit function generates PEVs for both, and the inverse function outputs the cross-reconstructed shape. For any query point p on the source shape, a preliminary correspondence may be found by a nearest-neighbour search in the cross-reconstructed set. Knowing the index of this nearest neighbour, the same index in X_B refers to the final correspondence q. Here, the nearest-neighbour search might not be optimal, as it limits the solution to the already-sampled points of the target. An alternative is that, once the preliminary correspondence is found, we can search within its neighbourhood for a surface point that is closer still. As our input shapes are densely sampled, this alternative does not provide notable benefits, and thus we use the first approach. Finally, we compute the correspondence confidence from the distance between the PEVs of p and q, normalized to the range of 0 to 1. Since the learned part embedding is discriminative among different parts of a shape, the distance between PEVs is suitable for defining the confidence. When the confidence is larger than a predefined threshold, the correspondence is valid; otherwise p has no correspondence.
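The inference procedure above can be sketched as follows; the confidence normalization by √K (valid when PEVs lie in [0, 1]^K) and the threshold value are assumptions for illustration:

```python
import numpy as np

def correspond(query_idx, points_a, pev_a, points_b, pev_b, b_in_a, tau=0.8):
    """Find the correspondence of one source point.

    b_in_a: B's points mapped into A's space by the inverse function, so it
    shares indices with points_b. Returns (corresponding point in B or None,
    confidence in [0, 1])."""
    p = points_a[query_idx]
    # Preliminary match: nearest neighbour among the cross-reconstructed points.
    j = np.linalg.norm(b_in_a - p, axis=1).argmin()
    # Confidence from the PEV distance, normalized by the maximal possible
    # distance sqrt(K) for K-dim embeddings in [0, 1]^K (an assumption here).
    dist = np.linalg.norm(pev_a[query_idx] - pev_b[j])
    conf = 1.0 - dist / np.sqrt(pev_a.shape[1])
    return (points_b[j], conf) if conf > tau else (None, conf)
```

When the two points share identical PEVs the confidence is 1 and the match is accepted; a large embedding distance drives the confidence below the threshold and the query is declared to have no correspondence.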
3.4 Implementation Detail
Our method is trained in three stages: 1) The PointNet encoder and the implicit function are trained on sampled point-value pairs via Eqn. 3. 2) The encoder, implicit function, and inverse function are jointly trained via Eqns. 3 and 4. 3) We jointly train the encoder, implicit function, and inverse function with the full objective of Eqn. 2. In experiments, the loss weights and hyperparameters are set empirically. We implement our model in PyTorch and use the Adam optimizer with the same learning rate in all stages.
4 Experiments
4.1 3D Semantic Correspondence
Data We evaluate on 3D semantic point correspondence, a special case of dense correspondence, for two reasons: 1) no database of man-made objects has ground-truth dense correspondence; 2) there is far less prior work on dense correspondence for man-made objects than on the semantic correspondence task, which has strong baselines for comparison. Thus, to evaluate semantic correspondence, we train on ShapeNet Chang et al. (2015) and test on BHCP Kim et al. (2013), following the setting of Huang et al. (2017); Chen et al. (2020). For training, we use a subset of ShapeNet including the plane, bike, and chair categories to train individual models. For testing, BHCP provides ground-truth semantic points for shapes of the plane, bike, chair, and helicopter categories. We generate all pairs of shapes for testing. The helicopter category is tested with the plane model, as Huang et al. (2017); Chen et al. (2020) did. As BHCP shapes come with rotations, prior works test on either one or both settings: aligned and unaligned (i.e., canonical vs. arbitrary relative pose of two shapes). Baseline We compare our work with multiple state-of-the-art (SOTA) baselines. Kim12 Kim et al. (2012) and Kim13 Kim et al. (2013) are traditional optimization methods that require part labels for templates and employ collection-wise co-analysis. LMVCNN Huang et al. (2017), ShapeUnicode Muralikrishnan et al. (2019), AtlasNet2 Deprelle et al. (2019) and Chen et al. Chen et al. (2020) are all learning-based, where Huang et al. (2017); Muralikrishnan et al. (2019) require ground-truth correspondence labels for training. Although Chen et al. (2020) only estimates a fixed number of sparse points, both Chen et al. (2020) and ours are trained without labels. As the optimization-based methods and Huang et al. (2017) are designed for the unaligned setting, we also train a rotation-invariant version of ours by supervising the encoder to predict an additional rotation matrix and applying it to rotate the input points before feeding them to the implicit function. Results The correspondence accuracy is measured by the fraction of correspondences whose error is below a given threshold of Euclidean distance. As in Fig. 3, the solid lines show the results on aligned data and the dotted lines on unaligned data. We clearly observe that our method outperforms the baselines on the plane, bike, and chair categories on aligned data. Note that Kim13 Kim et al. (2013) has a slightly higher accuracy than ours on the helicopter category, likely because Kim et al. (2013) tests with a helicopter-specific model, while we test on the unseen helicopter category with a plane-specific model. Our method improves the average accuracy over Chen et al. (2020) across these categories. For unaligned data, our method achieves performance competitive with the baselines. While it has the best AUC overall, it is worse at small distance thresholds. The main reason is that the implicit network itself is sensitive to rotation. Note that this comparison should be viewed in the context that most baselines use extra cues during training or inference, as well as the high inference speed of our learning-based approach. Some visual dense correspondence results are shown in Fig. 4. Note that the amount of non-existent correspondence is impacted by the threshold, as in Fig. 4. A larger threshold discovers more subtle non-existent correspondences. This is expected, as the division between semantically corresponded or not can be blurred for some shape parts. By only finding the closest points on aligned 3D shapes, we report the resulting semantic correspondence accuracy as the black curve in Fig. 6. Clearly, our accuracy is much higher than this "lower bound", indicating our method does not rely much on the canonical orientation.
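The accuracy curves in Fig. 3 plot, for a range of distance thresholds, the fraction of correspondences below each threshold; a minimal sketch of that metric:

```python
import numpy as np

def correspondence_accuracy(pred, gt, thresholds):
    """Fraction of predicted correspondences whose Euclidean error to the
    ground truth falls below each distance threshold. Returns one value
    per threshold, suitable for plotting an accuracy curve."""
    err = np.linalg.norm(pred - gt, axis=1)
    return [float((err < t).mean()) for t in thresholds]
```

The area under this curve (AUC) summarizes performance across all thresholds, which is how the aggregate comparison above is made.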
To further validate on noisy real data, we evaluate on the chair category with additive noise and compare with Chen et al. Chen et al. (2020). As shown in Fig. 6, the accuracy is slightly worse than when testing on clean data. However, our method still outperforms the baseline on noisy data. Detecting Non-Existence of Correspondences Our method can build dense correspondences for 3D shapes with different topologies, and automatically declare the non-existence of correspondence. The experiment in Fig. 3 cannot fully depict this capability of our algorithm, as no semantic point was annotated on a non-matching part. Also, no benchmark provides non-existence labels between shape pairs. We thus build a dataset with paired shapes from the chair category of the ShapeNet part dataset. Within a pair, one shape has the arm part while the other does not. For the former, we annotate arm points and non-arm points based on the provided part labels. As correspondences do not exist for the arm points, we can utilize this data to measure our detection of the non-existence of correspondence. Based on our confidence scores, we report the ROC in Fig. 6. The AUC shows our strong capability in detecting missing correspondence.
4.2 Unsupervised Shape Segmentation
In testing, unlike prior template-based Kim et al. (2013) or feature point estimation methods Chen et al. (2020), we do not need to transfer any segmentation labels. Thus, we only compare with the SOTA unsupervised segmentation method BAE-Net Chen et al. (2019). Following the same protocol as Chen et al. (2019), we train category-specific models and test on the same categories of the ShapeNet part dataset Yi et al. (2016): plane, bag, cap, chair, mug, skateboard, table, and chair* (a joint chair+table set). Intersection over Union (IoU) between the prediction and the ground truth is a common metric for segmentation. Since unsupervised segmentation is not guaranteed to produce exactly the same part counts as the ground truth, e.g., combining the seat and back of a chair into one part, we report a modified IoU Chen et al. (2019) measured against both parts and part combinations in the ground truth. As in Tab. 1, our model achieves consistently higher segmentation accuracy than BAE-Net for all categories. As BAE-Net is very similar to our model after the first training stage, these results show that our dense correspondence task helps the PEV to better segment the shapes into parts, thus producing a more semantically meaningful embedding. Some visual segmentation results are shown in Fig. 5.
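The modified IoU can be sketched as below: each predicted segment is scored against its best-matching ground-truth part or union of parts, so that merged predictions (e.g., seat+back as one segment) are not unfairly penalized. Scoring every subset is exponential in the part count and is an illustrative choice here, viable only because part counts are small:

```python
import numpy as np
from itertools import combinations

def modified_iou(pred, gt, gt_parts):
    """Average, over predicted segments, of the best IoU against every
    ground-truth part or union of parts.

    pred, gt: integer label arrays of equal length (one label per point).
    gt_parts: list of ground-truth part labels."""
    candidates = []
    for r in range(1, len(gt_parts) + 1):
        for combo in combinations(gt_parts, r):
            candidates.append(np.isin(gt, combo))  # mask of the part union
    ious = []
    for label in np.unique(pred):
        mask = pred == label
        best = max((mask & c).sum() / (mask | c).sum() for c in candidates)
        ious.append(best)
    return float(np.mean(ious))
```

With this metric, a prediction that merges two ground-truth parts into one segment can still reach a perfect score against their union.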
Table 1: Unsupervised segmentation IoU on the ShapeNet part categories: plane, bag, cap, chair, chair*, mug, skateboard, table, and the average. (The per-category values were lost in extraction; the proposed method attains an average IoU of 88.0 and outperforms BAE-Net Chen et al. (2019) in every category.)
4.3 Ablations and Visualizations
Shape Representation Power of Implicit Function We hope our novel implicit function still serves as a shape representation while achieving dense correspondence; hence its shape representation power needs to be evaluated. Following the setting of Tab. 1, we first pass a ground-truth point set from the test set to the PointNet encoder to extract the shape code. By feeding the code and a grid of points to the implicit function, we can reconstruct the 3D shape by Marching Cubes. We then evaluate how well the reconstruction matches the ground-truth point set. The average Chamfer distance (CD) of our reconstructions is lower than that of the branched implicit function (BAE-Net) across the categories. The lower CD shows that our novel design of the semantic embedding actually improves the shape representation.
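The evaluation loop above samples the implicit function on a regular grid and extracts the surface. As a minimal, dependency-free stand-in for Marching Cubes (the resolution, iso-value, band width, and the soft-occupancy sphere used as a dummy shape are all assumptions for illustration):

```python
import numpy as np

def reconstruct_surface_points(occupancy_fn, res=32, iso=0.5, band=0.45):
    """Evaluate an occupancy function on a res^3 grid over [-0.5, 0.5]^3
    and keep grid points whose occupancy lies near the iso-value, as a
    crude point-based surface extraction (a stand-in for Marching Cubes)."""
    axis = np.linspace(-0.5, 0.5, res)
    grid = np.stack(np.meshgrid(axis, axis, axis, indexing="ij"),
                    axis=-1).reshape(-1, 3)
    occ = occupancy_fn(grid)
    near_surface = np.abs(occ - iso) < band
    return grid[near_surface]

# Dummy shape: a soft-occupancy sphere of radius 0.4 (≈1 inside, ≈0 outside).
sphere = lambda p: 1.0 / (1.0 + np.exp((np.linalg.norm(p, axis=1) - 0.4) * 50.0))
surface = reconstruct_surface_points(sphere)
```

The extracted points can then be compared to the ground-truth point set with the Chamfer distance, as in the evaluation above.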
Loss Terms on Correspondence Since the point occupancy loss and the self-reconstruction loss are essential, we only ablate each term of the cross-reconstruction loss, on the chair category. The correspondence results in Fig. 7 demonstrate that, while all loss terms contribute to the final performance, two are the most crucial: the term forcing the cross-reconstruction to resemble the target, and the smoothness term, without which the cross-reconstruction may resemble the target well but with locally erroneous correspondences. Part Embedding over Training Stages The assumption that learned PEVs are similar for corresponding points motivates our algorithm design. To validate this assumption, we visualize the PEVs of the semantic points defined in Fig. 7, with their ground-truth corresponding points across chairs. The t-SNE visualizes the high-dimensional PEVs in a 2D plot with one color per semantic point, after each training stage. The model after Stage-1 training resembles BAE-Net. As in Fig. 7, the points corresponding to the same semantic point, i.e., 2D points of the same color, scatter and overlap with other semantic (colored) points. With the inverse function and self-reconstruction loss in Stage 2, the part embedding shows more promising grouping of the colored points. Finally, the part embedding after Stage 3 is well clustered with more discriminative grouping, which means points corresponding to the same semantic location do have similar PEVs. The improving trend of the part embedding across stages shows the effectiveness of our loss design and training scheme. One-hot vs. Continuous Embedding Ideally, BAE-Net Chen et al. (2019) should output a one-hot vector before max-pooling, which would benefit unsupervised segmentation the most. In contrast, our PEVs prefer a continuous embedding rather than a one-hot one. To better understand the PEV, we compute statistics of the cosine similarity (CS) between the PEVs and their corresponding one-hot vectors for BAE-Net and for ours. This shows our learnt PEVs are approximately one-hot vectors. Compared to BAE-Net, our smaller CS and larger variance are likely due to the limited network capacity, as well as our encouragement to learn a continuous embedding benefiting correspondence.
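The one-hot statistic above has a simple closed form: the cosine similarity between a non-negative PEV and the one-hot vector of its max-pooled branch is the peak entry divided by the vector norm. A minimal sketch:

```python
import numpy as np

def onehotness(pevs):
    """Mean cosine similarity between each PEV (row) and the one-hot
    vector of its max-pooled branch; 1.0 means perfectly one-hot
    embeddings. Assumes non-negative PEVs, e.g. Sigmoid outputs."""
    norms = np.linalg.norm(pevs, axis=1)
    peaks = pevs.max(axis=1)          # dot product with the one-hot vector
    return float((peaks / norms).mean())
```

A perfectly one-hot embedding scores 1.0, while a uniform K-dim vector scores 1/√K, so the statistic directly measures how far the learnt PEVs deviate from one-hot.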
Dimensionality of PEV Fig. 6 shows the shape segmentation and semantic correspondence results over the dimensionality of the PEV. Our algorithm performs the best on both tasks at the chosen dimensionality. Despite unsupervisedly segmenting chairs into a few parts, the extra dimensions of the PEV benefit the finer-grained task of correspondence (Fig. 7), which in turn helps segmentation. Computation Time Our training on one category takes hours to converge on a single GPU, with the time split across the three training stages. In inference, the average runtime to pair two shapes is on the order of a second, including the runtimes of the encoder, implicit, and inverse networks, the neighbour search, and the confidence calculation.
5 Conclusion
In this work, we propose a novel framework including an implicit function and its inverse for dense 3D shape correspondence of topology-varying objects. Based on the semantic part embedding learnt via our implicit function, dense correspondence is established via the inverse function mapping from the part embedding to the corresponding 3D point. In addition, our algorithm can automatically calculate a confidence score measuring the probability of correspondence, which is desirable for man-made objects with large topological variations. The comprehensive experimental results show the superiority of the proposed method in unsupervised shape correspondence and segmentation.
Broader Impact
Product design (e.g., furniture) is labor-intensive and requires expertise in computer graphics. With the increasing number and diversity of 3D CAD models in online repositories, there is a growing need to leverage them to facilitate future product development, given their similarities in function and shape. Toward this goal, our proposed method provides a novel unsupervised paradigm to establish dense correspondence for topology-varying objects, which is a prerequisite for shape analysis and synthesis. Furthermore, as our approach is designed for generic objects, its application space is broad.
Acknowledgement
The authors would like to thank the reviewers and area chairs for their valuable comments and suggestions. We acknowledge Vladimir G. Kim and Nenglun Chen for sharing data and results.
Supplementary
In this supplementary material, we provide: Implementation details, including network structures and training details. Additional experimental results, including expressiveness of the inverse implicit function and visualization of the correspondence confidence score.
A Implementation Details
A.1 Network Structures
PointNet Encoder.
Implicit Function.
The implicit function network follows the work of Chen et al. (2019) (unsupervised case). The implicit function takes the shape code and a spatial point as inputs and predicts the part embedding vector (PEV). As shown in Fig. 8(b), it is composed of fully connected (FC) layers, each followed by a LeakyReLU, except the final output, which uses a Sigmoid activation.
Inverse Implicit Function.
The inverse implicit function is also implemented as an MLP, composed of FC layers each followed by a nonlinear activation, with a separate activation applied to the final output. As shown in Fig. 8(c), the inverse implicit function network takes the PEVs and the shape latent code as inputs, and recovers the corresponding 3D points.
A.2 Training Details
Sampling PointValue Pairs.
The training of the implicit function network needs point-value pairs. Following the sampling strategy of Chen and Zhang (2019), we obtain the paired data offline; each pair consists of a spatial point and its corresponding occupancy label. We sample points from the voxel models at several resolutions in order to train the implicit function progressively.
Training Process
We summarize the training process in Tab. 2. In Stage 1, we adopt a progressive training technique Chen and Zhang (2019) to train our implicit function on data of gradually increasing resolution, which stabilizes and significantly speeds up the training process.
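The progressive schedule amounts to running the same training step over point-value pairs sampled at successively higher voxel resolutions. A minimal sketch, where the resolutions and epoch counts are hypothetical placeholders (the exact values were lost in extraction):

```python
# Hypothetical (resolution, epochs) schedule for progressive training.
DEFAULT_SCHEDULE = [(16, 50), (32, 100), (64, 200)]

def progressive_train(train_epoch, sample_pairs, schedule=DEFAULT_SCHEDULE):
    """Train the implicit function on point-value pairs of increasing
    voxel resolution. Early low-resolution phases give a coarse but
    stable fit that later high-resolution phases refine.

    train_epoch: callable running one epoch on a batch of pairs.
    sample_pairs: callable returning offline-sampled pairs for a resolution."""
    for res, epochs in schedule:
        pairs = sample_pairs(res)      # (point, occupancy) pairs at this res
        for _ in range(epochs):
            train_epoch(pairs)
```

The same loop structure applies at Stage 1; later stages simply swap in the larger network set and the fuller loss.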
Stage  Networks  Loss
Stage 1  PointNet encoder, implicit function  occupancy loss (Eqn. 3)
Stage 2  PointNet encoder, implicit function, inverse function  occupancy and self-reconstruction losses (Eqns. 3 and 4)
Stage 3  all networks  full objective (Eqn. 2)
B Additional Experimental Results
A supplementary video is provided to visualize additional results, explained as follows.
B.1 Expressiveness of Inverse Implicit Function
Given our inverse implicit function, we are able to cross-reconstruct two paired shapes by swapping their part embedding vectors. Further, we can interpolate shapes both in the shape latent space and in 3D space while maintaining point-level correspondence consistently.
Cross-Reconstruction Performance.
We first show the cross-reconstruction performance in the supplementary video. From a shape collection, we randomly select two shapes. Their shape codes are predicted by the PointNet encoder. With their respectively generated PEVs, we swap the PEVs and send the concatenated vectors to the inverse function to obtain the cross-reconstructions. As shown in the video, the cross-reconstructions closely resemble each other, even with different part constitutions. Here, we also provide the cross-reconstruction performance of two additional object categories: car and table.
Interpolation in Latent Space.
An alternative way to explore the correspondence ability of the inverse implicit function is to evaluate its interpolation capability. In this experiment, we first interpolate shapes in the latent space and send the concatenated vectors to the inverse function. As observed in the video, our inverse implicit function generalizes well to different shape deformations. Moreover, the correspondences are point-to-point consistent across all the deformations. This also demonstrates that the learned part embedding is discriminative among different parts of a shape and point-wise consistent across different shapes.
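The latent interpolation itself is a simple linear blend of two shape codes; a minimal sketch (dimensions illustrative):

```python
import numpy as np

def interpolate_latent(zA, zB, n_steps):
    """Linearly interpolate two shape codes. Each intermediate code can be
    paired with the same set of PEVs and decoded by the inverse function,
    so the output points stay in correspondence across the deformation."""
    ts = np.linspace(0.0, 1.0, n_steps)[:, None]
    return (1.0 - ts) * zA + ts * zB

zA, zB = np.zeros(8), np.ones(8)   # toy 8-D shape codes
zs = interpolate_latent(zA, zB, 5)
```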
Latent Interpolation Comparison.
We compare the latent interpolation capability with a conventional implicit function. For the conventional implicit function, we sample a grid of points and pass them to the implicit function to obtain their occupancy values; thresholding these values yields the surface points. As can be observed in the video, our inverse implicit function outperforms the conventional implicit function in shape generation and deformation. Furthermore, our interpolations maintain point-to-point correspondence across all the deformations.
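The baseline surface extraction can be sketched as follows: evaluate occupancy on a dense grid, threshold it, and keep inside points bordering the outside. The grid resolution, threshold value, and toy sphere occupancy are illustrative assumptions:

```python
import numpy as np

def extract_surface_points(occupancy_fn, res=32, tau=0.5):
    """Evaluate occupancy on a res^3 grid; keep inside points that have
    at least one outside axis neighbour (a crude surface proxy)."""
    axis = (np.arange(res) + 0.5) / res - 0.5
    X, Y, Z = np.meshgrid(axis, axis, axis, indexing="ij")
    pts = np.stack([X, Y, Z], axis=-1).reshape(-1, 3)
    inside = (occupancy_fn(pts) > tau).reshape(res, res, res)
    boundary = np.zeros_like(inside)
    for ax in range(3):
        for shift in (1, -1):
            boundary |= inside != np.roll(inside, shift, axis=ax)
    return pts[(inside & boundary).reshape(-1)]

# Toy occupancy: a solid sphere of radius 0.4.
sphere_occ = lambda p: (np.linalg.norm(p, axis=1) < 0.4).astype(float)
surface = extract_surface_points(sphere_occ)
```

Note that this thresholding recovers only positions: unlike the inverse implicit function, the extracted points carry no identity, so there is no point-to-point correspondence between interpolated frames.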
Interpolation in 3D Space.
We also show in the video the interpolation capability of the corresponding points in 3D space. Given the estimated dense correspondence, we can compute the correspondence offset vectors for all corresponding pairs of points. To interpolate the correspondence over the video frames, in each frame we move all points of the source shape by the corresponding fraction of their offset vectors and show the moved points. It can be observed that the deformed shape is meaningful, a semantic blending of the two shapes. In addition, the correspondence offsets are locally smooth in 3D space.
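This per-frame displacement along the offset vectors can be sketched as follows (the frame count and toy point sets are illustrative):

```python
import numpy as np

def morph_frames(src_pts, offsets, n_frames):
    """Frame t moves every source point by t/(n_frames-1) of its
    correspondence offset vector, sweeping from source to target."""
    return [src_pts + (t / (n_frames - 1)) * offsets
            for t in range(n_frames)]

src = np.zeros((4, 3))            # toy source points
dst = np.ones((4, 3))             # their toy correspondences
frames = morph_frames(src, dst - src, n_frames=5)
```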
B.2 Visualization of the Correspondence Confidence Score
To further visualize the correspondence confidence score, we provide confidence score maps for the examples shown in a figure of the main paper. As shown in the video, the confidence score indicates the probability around corresponding points between the target shape (red box) and its pairwise source shapes. For example, for source shapes with arms, we can clearly see that the confidence scores of the arm part are significantly lower than those of other parts.
References
 Learning representations and generative models for 3D point clouds. In ICML, Cited by: §1.
 Deformation-driven topology-varying 3D shape correspondence. TOG. Cited by: §1, §2.
 Controlling neural level sets. In NeurIPS, Cited by: §1.
 Face recognition based on fitting a 3D morphable model. TPAMI. Cited by: §1.
 FAUST: dataset and evaluation for 3D mesh registration. In CVPR, Cited by: §1.
 Learning shape correspondence with anisotropic convolutional neural networks. In NeurIPS, Cited by: §2.
 ShapeNet: an information-rich 3D model repository. arXiv preprint arXiv:1512.03012. Cited by: §4.1.
 Unsupervised learning of intrinsic structural representation points. In CVPR, Cited by: §1, §2, Figure 3, §4.1, §4.2.
 BAE-NET: branched autoencoder for shape co-segmentation. In ICCV, Cited by: §A.1, §1, §2, §3.1, §4.2, §4.3, Table 1.
 Learning implicit fields for generative shape modeling. In CVPR, Cited by: §A.2, §1, §2, §3.1, §3.2.
 Learning elementary structures for 3D shape generation and matching. In NeurIPS, Cited by: §4.1.
 Local deep implicit functions for 3D shape. In CVPR, Cited by: §2.
 Learning shape templates with structured implicit functions. In ICCV, Cited by: §2.
 Mesh RCNN. In ICCV, Cited by: §1.
 3D-CODED: 3D correspondences by deep deformation. In ECCV, Cited by: §1, §2.
 AtlasNet: a papier-mâché approach to learning 3D surface generation. In CVPR, Cited by: §1.
 Unsupervised cycle-consistent deformation for shape matching. In Computer Graphics Forum, Cited by: §2.
 Unsupervised learning of dense shape correspondence. In CVPR, Cited by: §1, §2.
 Learning local shape descriptors from part correspondences with multi-view convolutional networks. TOG. Cited by: §1, §2, §4.1.
 Analysis and synthesis of 3D shape families via deep-learned generative models of surfaces. In Computer Graphics Forum, Cited by: §2.
 Joint shape segmentation with linear programming. In SIGGRAPH Asia, Cited by: §2.
 Functional map networks for analyzing and exploring large shape collections. TOG. Cited by: §1.
 Shape registration in implicit spaces using information theory and free form deformations. TPAMI. Cited by: §2.
 A hierarchical framework for high resolution facial expression tracking. In CVPRW, Cited by: §2.
 Learning 3D mesh segmentation and labeling. In SIGGRAPH, Cited by: §2.
 Learning part-based templates from large collections of 3D shapes. TOG. Cited by: §2, Figure 3, §4.1, §4.2.
 Exploring collections of 3D models using fuzzy correspondences. TOG. Cited by: §1, §2, §4.1.
 Matchmaker: constructing constrained texture maps. TOG. Cited by: §1.
 Dense point-to-point correspondences between genus-zero shapes. In Computer Graphics Forum, Cited by: §1, §2.
 Deep functional maps: structured prediction for dense shape correspondence. In ICCV, Cited by: §1, §2.
 3D face modeling from diverse raw scan data. In ICCV, Cited by: §1.
 Learning to infer implicit surfaces without 3D supervision. In NeurIPS, Cited by: §1, §2.
 Occupancy networks: learning 3D reconstruction in function space. In CVPR, Cited by: §1, §2, §3.1.
 Shape unicode: A unified shape representation. In CVPR, Cited by: §2, §4.1.
 Occupancy flow: 4D reconstruction by learning particle dynamics. In ICCV, Cited by: §1, §2.
 Texture fields: learning texture representations in function space. In ICCV, Cited by: §2.
 Functional maps: a flexible representation of maps between shapes. TOG. Cited by: §1, §2.
 DeepSDF: learning continuous signed distance functions for shape representation. In CVPR, Cited by: §1, §2.
 Learning unsupervised hierarchical part decomposition of 3D objects from a single RGB image. In CVPR, Cited by: §2.
 PointNet: deep learning on point sets for 3D classification and segmentation. In CVPR, Cited by: §A.1, §1, §3.1, §3.2.
 PointNet++: deep hierarchical feature learning on point sets in a metric space. In NeurIPS, Cited by: §1.
 Unsupervised deep learning for structured shape matching. In ICCV, Cited by: §1, §2.
 PIFu: pixel-aligned implicit function for high-resolution clothed human digitization. In ICCV, Cited by: §1, §2.
 Unsupervised co-segmentation of a set of shapes via descriptor-space spectral clustering. In SIGGRAPH Asia, Cited by: §1, §2.
 Scene representation networks: continuous 3D-structure-aware neural scene representations. In NeurIPS, Cited by: §2.
 Towards implicit correspondence in signed distance field evolution. In ICCV, Cited by: §2.
 Soft maps between surfaces. In Computer Graphics Forum, Cited by: §1.
 Learning dense 3D correspondence. In NeurIPS, Cited by: §1, §2.
 Deep functional dictionaries: learning consistent semantic structures on 3D models from functions. In NeurIPS, Cited by: §2.
 A survey on shape correspondence. In Computer Graphics Forum, Cited by: §1.
 Pixel2mesh: generating 3D mesh models from single RGB images. In ECCV, Cited by: §1.
 A scalable active framework for region annotation in 3D shape collections. TOG. Cited by: §4.2.
 SyncspecCNN: synchronized spectral CNN for 3D shape segmentation. In CVPR, Cited by: §2.
 KeypointNet: a large-scale 3D keypoint dataset aggregated from numerous human annotations. In CVPR, Cited by: §2.
 Deformation-driven shape correspondence via shape recognition. TOG. Cited by: §2.
 3D menagerie: modeling the 3D shape and pose of animals. In CVPR, Cited by: §1.