1 Introduction
Large collections of 3D models enable data-driven techniques for interactive geometry modeling, shape synthesis, image-based reconstruction, and shape completion [MWZ14]. Many of these techniques require the collection to have additional surface annotations such as segmentation into functional [YKC16] or geometric [LSD18] parts. The notion of parts and their granularity can vary significantly across different tasks, so many novel applications require new types of annotations [MZC19, YLZ19, WZS19]. Deep learning algorithms have recently achieved state-of-the-art performance in automatically predicting such surface annotations [QSMG16, QYSG17, WSL18]. However, they typically require a significant number of training examples for every shape category, which limits their applicability and bears significant startup cost when introducing a new type of annotation. In this work, we propose a new deep learning approach that leverages large non-annotated object collections to perform few-shot segmentation. We rely on the idea of using shape matching to transfer labels from similar examples. This approach has been shown to be robust in extreme "few-shot" learning scenarios [YKC16] and can work robustly even in heterogeneous datasets as long as the labeled models roughly span all the shape variations. The few-shot segmentation problem then amounts to the fundamental problem of identifying correspondences between shapes. There is a vast amount of work on shape matching, which can be roughly separated into two trends: (i) classical optimization-based approaches; (ii) recent approaches where correspondences are directly predicted by a neural network.
Traditional, optimization-based methods such as the iterative closest point (ICP) algorithm are fast and effective with good initial guesses and few degrees of freedom (e.g., a rigid motion) [RL01]. More flexible correspondence algorithms for dissimilar models usually require significantly more compute time to optimize for a larger number of degrees of freedom [BR07, KLF11, CK15]. Since directly matching dissimilar shapes poses significant challenges, these methods often rely on joint analysis of the entire collection [KLM12], leveraging cycle-consistency priors during optimization [HG13, NBCW11]. These joint correspondence estimation methods tend to be very compute-heavy, and as new models are added to the collection, the entire optimization needs to be repeated. We thus turned to deep learning-based approaches.
Indeed, with the recent advances in neural networks for geometry analysis, learning-based methods have been proposed to address the matching problem. Of particular interest to us is the method of Groueix et al. [GFK18a], which demonstrates that one can learn how to deform a human body template to a target point cloud, even without correspondence supervision. In their approach, the target point cloud is encoded into a latent descriptor space (via a PointNet encoder [QSMG16]), and then a deformation network takes the target descriptor and a point on the template, and maps the point to a new position so that it aligns to the target. This approach is efficient, since it only requires a forward pass through a network. It also has the benefit of a holistic understanding of shape deformations, since the same neural network is trained for all models in the input collection. However, it has to be trained specifically for each template, limiting this method to the analysis of geometrically and topologically similar shape collections, such as human bodies. If such a template is not available, one can pick a very generic shape (e.g., a sphere) and still obtain some correspondences via the intermediate domain [GFK18b]. However, as we will show, the quality of the correspondences degrades significantly as shapes deviate from that domain.
In this work we propose a novel neural network architecture that learns to match shapes directly, without relying on a predefined template, by learning to predict deformations that align points on the source shape to points on the target. Note that the transformation can be much more complex than a rigid transformation, and that the space of meaningful transformations is defined implicitly by the (unlabelled) training data. We encode both source and target shapes and then predict the deformed position for every point on the source conditioned on these two codes, unlike prior work that uses a fixed template common to all the shapes. We show that the results can be greatly improved if the network is trained not only with a reconstruction loss, which encourages it to deform the source shape into the target shape, but also with a cycle-consistency loss. Indeed, a deformation that respects correspondences should be consistent between pairs of shapes, i.e., the deformation from A to B should be the inverse of the deformation from B to A. More generally, for a larger cycle of N shapes, global consistency is achieved if the composition of the N successive mappings along the cycle is the identity. This new consistency loss used during training can be seen as playing a role similar to the global consistency objective used in optimization-based approaches. Finally, our network is trained in a self-supervised manner using only shape reconstruction and cycle-consistency losses.
We demonstrate the effectiveness of our approach for shape matching by propagating segmentations in a few-shot learning setting on the ShapeNet part dataset [YKC16]. We first show that in this extreme case with very few training examples, PointNet [QSMG16], a strongly supervised method, fails to generalize. Then, we propose several strategies for picking source shapes and propagating the signal from them using our predicted correspondences. We demonstrate that even with a simple strategy, such as picking the source with the smallest Chamfer distance, our method is better at transferring segmentations than other fast correspondence techniques, such as ICP with a rigid transformation and a prior learning-based method that aligns sphere and plane templates [GFK18b].
[Figure 2: (a) the source and target shapes are encoded into a latent feature vector, from which an MLP predicts transformation parameters, used in (b) to deform the source into the target by stacking Transformation Layers (TL) and Fully-Connected Layers (FC).]

2 Related Work
Shape matching is a long-standing problem in shape analysis [vKZHCO11]. It is often done explicitly, by deforming a source shape to a target [RL01, BR07, LSP08, HAWG08, ZSCO08], or implicitly, by mapping points [KLF11, CK15, OMMG10, BBK06] or functions [OBCS12, RPWO18, EBC17] on one shape to another. The deformation-based methods typically aim to minimize the amount of distortion introduced by the deformation, and the mapping-based approaches often assume that shapes are near-isometric. Neither assumption holds for very dissimilar shapes.
To address this challenge, some prior methods leverage the additional context of the entire shape collection in a joint optimization [KLM12, NBCW11]. These techniques often use cycle-consistency as an additional cue [HZG12, HG13, ROA13]. This enables estimating correspondences even between dissimilar objects by mapping via intermediate shapes. While these traditional optimization techniques are very powerful, non-rigid matching involves optimizing for many degrees of freedom with complex non-convex objective functions, and takes minutes or hours. To make matters worse, joint analysis usually scales super-linearly with the number of models, and if a new shape is added to the collection, the entire optimization needs to be repeated.
Recently, learning-based correspondence techniques have been used to address these limitations. They are fast, typically requiring only a forward pass through a neural network, and they enable joint analysis of a collection of shapes, since multiple shapes are typically used during training. Descriptor-based methods embed each shape point into some high-dimensional space where corresponding points are embedded nearby [HKC18, BMRB16, WHC16]. In most cases, however, a more holistic mapping for the entire shape is preferred, since it is more capable of preserving the intrinsic shape structure. Litany et al. [LRR17] use a deep neural network to predict a soft inter-surface mapping, a common representation used in the functional map framework. Groueix et al. [GFK18a] propose to train a network that predicts a deformation for each point on a template. A similar method that uses planes or spheres can be used in case such a template is not available [GFK18b]. These techniques struggle with diverse shape collections where matched shapes have very different topology and geometry. Instead, we propose a method that takes both source and target shapes as input and infers the mapping. We also propose a novel regularization term favoring cycle-consistency when mapping across multiple shapes in the collection. A similar cycle-consistency loss for training deep networks to predict correspondences between images of different instances of objects from the same category has recently been used in [ZKA16]. In that work, views rendered from different viewpoints of a 3D model were used to avoid the trivial identity-flow solution, but no correspondence between 3D shapes was predicted.
We demonstrate the value of our method for few-shot segmentation transfer. While many techniques have been developed for strongly supervised mesh segmentation [QSMG16, QYSG17, WSL18, LSD18, KAMC17, KHS10], they typically rely on many training examples and fail in few-shot scenarios (see Table 1). In these cases, some frameworks propose to propagate annotations from the most similar annotated shapes via global or local shape matching [YKC16]. In fact, it is common for correspondence techniques to be evaluated and used for transferring various signals between shapes [OBCS12, KLF11, ACBCO17, CFG15].
3 Learning asymmetric cycle-consistent shape matching
We address the surface matching problem by training a model that takes as input a source shape, a target shape, and a point on the source shape, and generates the corresponding point on the target shape. As pointed out in Groueix et al. [GFK18a], a learnable model allows for efficient surface matching, in contrast to approaches requiring optimization over a collection of pairwise shape matches [NBCW11].
We assume that shapes are represented as point sets sampled from the shapes' surfaces. Given point sets $A$ and $B$, our goal is to learn a mapping function $f_{A \to B}$ that takes a 3D point $a \in A$ to its corresponding point $f_{A \to B}(a) \in \mathbb{R}^3$. If $f$ is a function on points and $A$ a set of points, we denote by $f(A)$ the set $\{f(a),\ a \in A\}$.
First, building on work on unsupervised template-based shape correspondence [GFK18a], we use a Chamfer loss to minimize the distance between the deformed source $f_{A \to B}(A)$ and the target $B$. Unlike prior work, however, we do not assume that all of our shapes are derived from the same template, and directly predict template-free correspondences between pairs of shapes.
Second, we seek to leverage the success of cycle consistency, which has been used in shape collection optimization [NBCW11] and more recently in self-supervised learning [ZPIE17], during training of our learnable mapping function. Formally, for shapes $A_1, \dots, A_N$ that are assumed to be put into correspondence, we enforce that the learnable mapping functions satisfy

$$f_{A_N \to A_1} \circ f_{A_{N-1} \to A_N} \circ \dots \circ f_{A_1 \to A_2} = \mathrm{Id}. \tag{1}$$

We use cycle-consistency training losses for cycles of lengths two and three, as this implies consistency for cycles of any length [NBCW11]. We visualize our cycle-consistency loss in Figure 1.
4 Approach
We describe our learnable mapping function $f_{A \to B}$, implemented as a two-stage neural network, in Section 4.1, our training losses in Section 4.2, and the application to segmentation in Section 4.3.
4.1 Architecture
The architecture of our shape transformation model from a source shape A to a target shape B is visualized in Figure 2 and can be separated into two parts: (a) a parameter prediction network which outputs transformation parameters given the two shapes (Figure 2a); (b) a deformation network that transforms the first shape into the second one using the predicted parameters (Figure 2b). We now describe these two components.
To predict transformation parameters, $A$ and $B$ are first passed through two independent PointNet networks [QSMG16], producing feature encodings $\phi_A$ and $\phi_B$ of size 512. The resulting concatenated descriptor contains information about the pair $(A, B)$. A multi-layer perceptron (MLP) then predicts the transformation parameter vectors from this concatenated feature.

The deformation network (Figure 2b) takes a surface point in $A$ and outputs the associated deformed point. The network is composed of a sequence of modules, each with the same architecture. Let $x_i$ be the input of module $i$ and $x_{i+1}$ its output. The operation computed by this module is:
$$x_{i+1} = a_i\big(s_i \odot (W_i\, x_i) + b_i\big), \tag{2}$$

where $W_i$ is the matrix of parameters of a fully-connected layer, "$\odot$" refers to the Hadamard (term-to-term) product, $a_i$ is the activation function of module $i$, and $s_i$ and $b_i$ are the transformation parameters, corresponding to a scale and a bias in each dimension. Note that this is similar to the architecture of the T-net modules in [QSMG16, JSZ15], but with fewer predicted parameters. Also note that Equation 2 is differentiable, which enables the two sub-networks to be trained jointly in an end-to-end fashion. In all of our experiments we used a stack of such modules with 64 dimensions for each intermediary feature, and ReLU activations for all but the last layer, for which we used a hyperbolic tangent.
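The module of Equation 2 can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: the function names, the use of square weight matrices, and the parameter packaging as `(W, s, b)` triples are our assumptions; in the paper the scales and biases are predicted by the parameter network.

```python
import numpy as np

def deformation_module(x, W, s, b, activation=np.tanh):
    """One module of the deformation network (Eq. 2, sketch):
    a fully-connected layer, modulated by a predicted per-dimension
    scale s and bias b (Hadamard product), then an activation."""
    return activation(s * (W @ x) + b)

def deform_point(x0, params):
    """Stack modules; ReLU for all but the last module, which uses
    a hyperbolic tangent as stated in the text. `params` is a list
    of (W, s, b) triples standing in for the predicted parameters."""
    relu = lambda v: np.maximum(v, 0.0)
    x = x0
    for i, (W, s, b) in enumerate(params):
        act = np.tanh if i == len(params) - 1 else relu
        x = deformation_module(x, W, s, b, act)
    return x
```

With identity weights, unit scales, and zero biases, the stack reduces to an activation of the input, which makes the modulation mechanism easy to verify in isolation.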
We train for 500 epochs with Adam [KB14], dividing the initial learning rate by 10 after 400 epochs.

4.2 Training Losses
We train our deformation network by minimizing a weighted sum of several components: a loss enforcing cycle consistency $\mathcal{L}_{cyc}$, a Chamfer distance loss $\mathcal{L}_{Cham}$, and a self-reconstruction loss $\mathcal{L}_{self}$:

$$\mathcal{L} = \lambda_{cyc}\,\mathcal{L}_{cyc} + \lambda_{Cham}\,\mathcal{L}_{Cham} + \lambda_{self}\,\mathcal{L}_{self}.$$
We only use the self-reconstruction loss to stabilize the beginning of the training and disable it after 30 epochs to focus on the cycle-consistency and reconstruction losses. We train all parameters in our network by sampling triplets of shapes, which are needed by our 3-cycle consistency, and enforcing all other losses on all the associated deformations. We first explain how we sample these triplets, then detail the different terms of our loss.
4.2.1 Training shape sampling
For our cycle-consistency loss, we require a valid mapping across a shape triplet (A, B, C). As different shape categories may have different topologies, we train category-specific networks. Furthermore, as there may be topological changes within a single category, for a shape A we randomly sample shapes B and C from the K nearest neighbors of A under Chamfer distance. We demonstrate in the ablation study the superiority of this approach over random sampling of shape triplets.
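The triplet sampling above can be sketched as follows. This is our illustration under stated assumptions: the value K=20 is a placeholder (the excerpt does not preserve the paper's value), and the brute-force symmetric Chamfer distance is only practical for small collections.

```python
import numpy as np

def chamfer(X, Y):
    """Symmetric Chamfer distance between (N, 3) point sets (sketch)."""
    d2 = np.sum((X[:, None, :] - Y[None, :, :]) ** 2, axis=-1)
    return d2.min(axis=1).mean() + d2.min(axis=0).mean()

def sample_triplet(shapes, a_idx, K=20, rng=None):
    """Pick B and C uniformly among the K Chamfer-nearest neighbors
    of A, excluding A itself (K is a hypothetical value)."""
    if rng is None:
        rng = np.random.default_rng(0)
    dists = np.array([chamfer(shapes[a_idx], s) for s in shapes])
    dists[a_idx] = np.inf                 # never pair A with itself
    knn = np.argsort(dists)[:K]
    b_idx, c_idx = rng.choice(knn, size=2, replace=False)
    return a_idx, b_idx, c_idx
```

In practice one would precompute the K-nearest-neighbor lists once per category rather than recomputing Chamfer distances at every training step.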
We apply data augmentation to each sampled shape in the following order: a random rotation around the up axis by a random angle, an anisotropic scaling by a random factor, a bounding box normalization, and a small random translation of magnitude below 0.03.
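A minimal sketch of this augmentation pipeline is below. The rotation-angle and scale ranges are placeholders: the excerpt does not preserve the paper's exact bounds, so the values here are our assumptions.

```python
import numpy as np

def augment(points, rng):
    """Augmentation sketch: rotation about the up (y) axis, anisotropic
    scaling, centered bounding-box normalization, small translation.
    Angle and scale ranges are hypothetical."""
    theta = rng.uniform(0, 2 * np.pi)
    c, s = np.cos(theta), np.sin(theta)
    R = np.array([[c, 0, s], [0, 1, 0], [-s, 0, c]])
    P = points @ R.T
    P = P * rng.uniform(0.8, 1.2, size=3)          # anisotropic scale
    mins, maxs = P.min(0), P.max(0)                # bounding box
    P = (P - (mins + maxs) / 2) / (maxs - mins).max()
    P = P + rng.uniform(-0.03, 0.03, size=3)       # small translation
    return P
```

After normalization the largest bounding-box extent is exactly 1, and the final translation shifts the whole cloud without changing that extent.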
4.2.2 Cycle-consistency loss
The cycle-consistency loss is based on the intuition that a point deformed through any cycle of deformations should be mapped back to itself. One way to enforce consistency would be to compose the mapping functions: for two shapes $A$ and $B$, minimizing $\|f_{B \to A}(f_{A \to B}(a)) - a\|$ for all $a$ in $A$. However, $f_{A \to B}(a)$ is typically not an element of $B$, and computing $f_{B \to A}(f_{A \to B}(a))$ would thus require computing the deformations of points other than the points of $B$. To avoid this, we instead consider projections of the deformed shapes onto the target shapes. More precisely, we define the shape projection operator

$$P_B(x) = \underset{b \in B}{\arg\min}\ \|x - b\|, \tag{3}$$

and enforce 2-cycle consistency between $A$ and $B$ by minimizing

$$\mathcal{L}_{2\text{-}cyc}(A, B) = \frac{1}{|A|} \sum_{a \in A} \big\| f_{B \to A}\big(P_B(f_{A \to B}(a))\big) - a \big\|, \tag{4}$$

and cycle consistency for the cycle $A \to B \to C \to A$ by minimizing

$$\mathcal{L}_{3\text{-}cyc}(A, B, C) = \frac{1}{|A|} \sum_{a \in A} \big\| f_{C \to A}\big(P_C\big(f_{B \to C}(P_B(f_{A \to B}(a)))\big)\big) - a \big\|. \tag{5}$$

Our full cycle-consistency loss is defined by summing over all possible two- and three-cycles for a sampled triplet (A, B, C):

$$\mathcal{L}_{cyc} = \sum_{(X, Y)} \mathcal{L}_{2\text{-}cyc}(X, Y) + \sum_{(X, Y, Z)} \mathcal{L}_{3\text{-}cyc}(X, Y, Z), \tag{6}$$

where the sums run over ordered pairs and triples from $\{A, B, C\}$. Enforcing 2- and 3-cycle consistency implies consistency for any cycle [NBCW11].
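The projection operator and the 2-cycle loss can be sketched directly from Equations 3 and 4. This is an illustration, not the paper's code: the mapping functions are passed in as plain callables standing in for the learned networks.

```python
import numpy as np

def project(X, target):
    """Shape projection operator (Eq. 3): snap each point of X
    to its nearest neighbor in the target point set."""
    d2 = np.sum((X[:, None, :] - target[None, :, :]) ** 2, axis=-1)
    return target[d2.argmin(axis=1)]

def two_cycle_loss(A, B, f_ab, f_ba):
    """2-cycle consistency (Eq. 4): deform A onto B, project onto B,
    map back onto A, and measure the mean distance to the start."""
    back = f_ba(project(f_ab(A), B))
    return np.linalg.norm(back - A, axis=1).mean()
```

The projection step is what keeps the composed map defined on actual shape points; it also means a small deformation error is absorbed as long as each deformed point still snaps to the right neighbor.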
4.2.3 Reconstruction loss
As discussed in Section 3, we want to enforce that every point in the target shape is well reconstructed, but not necessarily that every point in the source shape is mapped to the target shape, in case some part appears in the source but not the target. We thus use an asymmetric Chamfer distance to quantify how well the network has generated the target shape. More precisely, given a pair of shapes $(X, Y)$, the asymmetric Chamfer distance computes the average distance between a point $y \in Y$ and its nearest neighbor in $X$:

$$\mathcal{L}_{Cham}(X, Y) = \frac{1}{|Y|} \sum_{y \in Y} \min_{x \in X} \|x - y\|. \tag{7}$$

Given a training triplet $(A, B, C)$, we define the reconstruction loss by summing the asymmetric Chamfer loss over all 6 possible (source, target) couples:

$$\mathcal{L}_{rec} = \sum_{(X, Y)} \mathcal{L}_{Cham}\big(f_{X \to Y}(X), Y\big), \tag{8}$$

where the sum runs over ordered pairs $(X, Y)$ from $\{A, B, C\}$ with $X \neq Y$.
If segmentations are available for the training shapes, we can compute the distance in Equation 7 on each segment independently, which adds supervision on the correspondences. We of course do not use such labels for our few-shot learning experiments, but we show in Table 2 that they can be used, if available, to slightly boost our results.
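The reconstruction loss of this section can be sketched as below. The dictionary-of-callables interface `f[(i, j)]` is our stand-in for the six learned deformations of a triplet, not the paper's API.

```python
import numpy as np
from itertools import permutations

def asym_chamfer(X, Y):
    """Asymmetric Chamfer (Eq. 7): mean distance from each point
    of the target Y to its nearest neighbor in X."""
    d2 = np.sum((Y[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    return np.sqrt(d2.min(axis=1)).mean()

def reconstruction_loss(shapes, f):
    """Eq. 8 sketch: sum the asymmetric Chamfer over the six ordered
    (source, target) pairs of a triplet; f[(i, j)] stands for the
    hypothetical learned map deforming shape i onto shape j."""
    return sum(asym_chamfer(f[(i, j)](shapes[i]), shapes[j])
               for i, j in permutations(range(3), 2))
```

Note the asymmetry: a source part with no counterpart in the target is not penalized, because the sum only runs over target points.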
4.2.4 Self-reconstruction loss
We can fully supervise the deformation by manually deforming a shape with a known transformation. We found such supervision helpful to stabilize and speed up the beginning of training. Concretely, we sample deformations similar to our data augmentation (described above in Section 4.2.1) by composing (1) a rotation, (2) an anisotropic scaling, and (3) a rescaling to a centered bounding box. Given a transformation $T$, we compute the average distance between the two images of a point under $T$ and under the predicted mapping function $f_{A \to T(A)}$:

$$\mathcal{L}_{self}(A, T) = \frac{1}{|A|} \sum_{a \in A} \big\| f_{A \to T(A)}(a) - T(a) \big\|. \tag{9}$$

Our self-reconstruction loss is the sum of this loss over the three point clouds of the triplet (A, B, C), each with a different random transformation:

$$\mathcal{L}_{self\text{-}rec} = \mathcal{L}_{self}(A, T_A) + \mathcal{L}_{self}(B, T_B) + \mathcal{L}_{self}(C, T_C). \tag{10}$$
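Equation 9 has a one-line sketch; here `T` and `f` are generic callables on point arrays, with `f` standing in for the predicted map $f_{A \to T(A)}$.

```python
import numpy as np

def self_reconstruction_loss(A, T, f):
    """Eq. 9 sketch: mean distance between the image of each point
    of A under a known transformation T and under the predicted
    map f; both are callables acting on (N, 3) arrays."""
    return np.linalg.norm(f(A) - T(A), axis=1).mean()
```

Because $T$ gives ground-truth correspondences for free, this term supervises the deformation pointwise, unlike the Chamfer and cycle terms.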
4.3 Application to segmentation
Learning a deformation between two shapes provides an intuitive method to transfer label information, such as a part segmentation, from a labeled shape to an unlabeled one. In this formulation, we assume we are given a (small) number of labeled shapes, and seek to label each point on an unlabeled test shape. This requires us to decide which of the labeled shapes we should use as the source to propagate labels to the target shapes.
Selection Criteria. Given a target $T$, we manually define four possible source-selection criteria:

Nearest Neighbor: the source shape $S$ minimizing the Chamfer distance between $S$ and $T$ is selected.

Deformation Distance: the source shape $S$ minimizing the Chamfer distance between the deformed source $f_{S \to T}(S)$ and $T$ is selected.

Cosine Distance: the source shape $S$ minimizing the cosine distance between the PointNet encodings $\phi_S$ and $\phi_T$ is selected.

Cycle Consistency: the source shape $S$ minimizing the 2-cycle loss for the pair $(S, T)$ is selected.
Having selected a pair $(S, T)$, labels can be transferred directly with our approach.
Voting strategy. Instead of selecting a single source shape to transfer labels from, combining several voting shapes allows for better segmentation. We select the K best sources and make each source shape vote with equal weight for the label of each target point. We evaluate the benefits of this voting approach in Section 5.2.2.
[Figures 3 and 4: for each example we show the target, the retrieved source shape, the source labels transferred to the input, and the resulting segmentation.]
5 Results
In this section, we show qualitative and quantitative results on the tasks of fewshot and supervised semantic segmentation and compare against several baselines.
Data and evaluation criteria. We evaluated our approach on the standard ShapeNet part dataset [YKC16]. We restricted ourselves to the 5 most populated categories, namely Airplane, Car, Chair, Lamp, and Table. Point clouds sampled on mesh objects are densely labeled for segmentation with one to five parts. We follow Qi et al. [QSMG16] and report the mean intersection over union (mIoU) between the predicted and ground truth segmentation across instances in a category.
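The mIoU metric used here can be sketched per shape as below. The convention of counting a part absent from both prediction and ground truth as IoU 1 is a common one for the ShapeNet part benchmark, but the paper's exact protocol is an assumption on our part.

```python
import numpy as np

def part_miou(pred, gt, n_parts):
    """Mean intersection-over-union across part labels for one shape.
    A part absent from both pred and gt contributes IoU 1 (common
    ShapeNet-part convention; assumed here)."""
    ious = []
    for p in range(n_parts):
        inter = np.sum((pred == p) & (gt == p))
        union = np.sum((pred == p) | (gt == p))
        ious.append(1.0 if union == 0 else inter / union)
    return float(np.mean(ious))
```

The per-shape values are then averaged over all instances in a category to obtain the numbers reported in the tables.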
Baselines. We compare our unsupervised approach against supervised and unsupervised alternatives. We use PointNet as a supervised baseline. Our unsupervised baselines include a learned approach derived from AtlasNet [GFK18b] and variants of iterative closest point (ICP) [Zha94, BM92]. AtlasNet is a template-based reconstruction method that predicts a transformation of the template matching the target shape. The learned deformations have previously been observed to be semantically consistent [GFK18a]. To transfer segmentation labels from a source to a target, we project the source labels onto the source reconstruction through nearest neighbors, then onto the template through the dense correspondence between the template and the source reconstruction. Similarly, we transfer labels from the template to the target by dense correspondence and nearest neighbors. AtlasNet is trained on the same train/test splits as our approach. We consider two settings of AtlasNet: with 10 patches or with 1 sphere as the template. Additionally, we use two standard shape alignment baselines. First, labels can be transferred from source to target through nearest neighbor matching, which we call the Identity baseline. An immediate refinement is to apply ICP to align the source to the target, and then use nearest neighbors; we call this the ICP baseline.
5.1 Qualitative Results
Correspondences. In Figure 5 we visualize in more detail the correspondences obtained with our approach. We visualize how each point on the source shape is deformed and transferred to the target shape using a colored checkerboard. For each example, we show a successful deformation (top) and a failure case (bottom). Note how the checkerboard appears nicely deformed in the successful cases, and still appears consistent on some parts in the failure cases.
Cycle-consistency. In Figure 6 we compare the mappings learned by our approach with and without the cycle-consistency loss. The Chamfer distance is a point-based loss with no control over the amount of distortion. Notice in this case that the deformed source has large triangles, indicating that the mapping learned by a Chamfer loss alone is not smooth and cannot be used for label transfer. On the other hand, the cycle-consistency loss leads to a smooth, high-quality mapping.
Segmentation transfer. When looking at the results, a first surprising observation is the high quality of the identity baseline (this is quantitatively confirmed in Table 2). Indeed, the different criteria tend to select shapes that are very close to the target. To focus on interesting examples, we selected in Figure 3 the pairs that maximize the performance improvement provided by our method over the identity baseline, using the cycle-consistency selection criterion. The richness of the learned deformations allows our method to find meaningful correspondences in cases where the training example is far from the target shape and the identity baseline does not work. Note that the deformations are often far from isometric. Thus, methods that rely on regularization toward the identity, a popular approach to regularize learned deformations [GFK18a, KTEM18, WZL18], would likely fail.
Failure cases. Figure 4 shows failures of our method. For each category we show the pair that minimizes our segmentation transfer performance. The corresponding shapes are clearly rare and specific object instances. We observe two main sources of errors. First, in some cases where the source is correctly deformed onto the target, the ground truth labeling is inconsistent, leading to large errors; for example, notice how the source airplane has a single label. Second, source and target are sometimes too distant topologically for a high-fidelity reconstruction of the target by deforming the source; for example, notice how the pole of the lamp has been erroneously inflated to match the target shape.
5.2 Quantitative Results
5.2.1 Fewshot Segmentation
In this section, we evaluate our approach on the task of transferring semantic labels from a small set of segmented shapes to unlabeled data.
10 shots | Selection Criterion | Airplane | Car | Chair | Lamp | Table

(a) PointNet | –
(b) AtlasNet Patch | Nearest Neighbors
(c) AtlasNet Sphere | Nearest Neighbors
(d) ICP | Nearest Neighbors
(e) Ours | Nearest Neighbors
(f) Ours | Cycle Consistency
(g) Ours | Oracle
We report quantitative results for few-shot semantic segmentation on point clouds in Table 1. Note that the learning-based methods are all trained separately for each category. Since the results depend on the sampled shapes used in the training set, we report the average and standard deviation over ten randomly sampled training sets. We use the Nearest Neighbors criterion to pair sources and targets and compare our approach against all baselines (b, c, d, e). Notice that our approach outperforms all baselines on all categories. Interestingly, the AtlasNet baseline is not on par with ICP, hinting at the difficulty of predicting two consistent deformations of the template. We find that the Cycle Consistency criterion (f) is a stronger selection criterion than Nearest Neighbors and boosts the results simply by selecting a better (source, target) pair. We also report an oracle source-shape selection with our approach, where the source shape maximizing IoU with the target is selected, which corresponds to the scenario where an optimal source shape is selected. Notice the large improvement of the oracle, showing the quality of our deformations and the potential of our method.
5.2.2 Supervised segmentation
Our method is not designed to be competitive when many training samples are available. Indeed, it solves for the deformation against each of the provided segmented shapes, which for large numbers of examples can be computationally expensive compared to feed-forward segmentation predictions like PointNet [QSMG16]. One forward pass through our network deforms a source shape into a target shape in 7 milliseconds (ms), with a 7 ms standard deviation; ICP takes 28 ms with a 17 ms standard deviation (we use Open3D [ZPK18] to compute ICP, run on an Intel i7-6900K at 3.2 GHz; our method runs on an NVIDIA TITAN X). Here, however, we study the performance of our method in this setting, using the segmentations of the many training shapes as supervision during training and making the ten best shapes vote during testing. We report results of our unsupervised method. In addition, we consider adding supervision to our approach by computing Chamfer distances over points with the same segmentation label. The corresponding results are reported in Table 2.
Table 2 shows that, when using all the annotations, nearest neighbors is again a surprisingly good baseline, only slightly below the performance of PointNet. Despite the good performance of the identity baseline, our method outperforms it in all categories and performs on par with PointNet. Note that the encoders of our approach incorporate two PointNet architectures, which makes this result intuitive.
Table 2 also highlights the importance of the selection criterion. Notice the significant boost in each category gained by carefully choosing the selection criterion over the Nearest Neighbors criterion. The exciting performance of the oracle, well above the PointNet baseline, is another incentive to carefully design selection criteria.
Finally, notice that our unsupervised model is on par with our supervised one. The boost gained by supervised training is marginal, except in the car category. This confirms that our cycle-consistency loss is effective at enforcing meaningful part correspondences.
 | Selection | Airplane | Car | Chair | Lamp | Table
(a) PointNet | – | 83.4 | 74.9 | 89.6 | 80.8 | 80.6
(b) Identity | NN | 81.3 | 74.0 | 86.1 | 78.4 | 78.9
(c) Ours unsup. | NN | 81.5 | 73.9 | 86.6 | 78.8 | 79.2
(d) Ours unsup. | Best criterion | 83.4 | 74.6 | 88.4 | 79.8 | 79.7
(e) Ours unsup. | Oracle | 87.9 | 78.9 | 93.0 | 93.9 | 89.3
(f) Ours sup. | NN | 81.2 | 75.9 | 86.9 | 78.4 | 79.0
(g) Ours sup. | Best criterion | 83.5 | 76.4 | 88.8 | 79.3 | 79.9
(h) Ours sup. | Oracle | 88.0 | 80.2 | 93.1 | 93.4 | 89.4
5.2.3 Selection criteria and voting strategy
Figure 7 shows a quantitative comparison of all criteria, on all categories, for the identity baseline and for our approach using a voting strategy with different numbers of shapes. The oracle and PointNet performances are also reported. The Deformation Distance criterion outperforms all other criteria but remains far from the oracle. The oracle performs better than the PointNet baseline across all categories. As a sanity check, we observe that our method outperforms the identity baseline in all settings, showing that it helps to apply our deformation when transferring labels from source to target.

Figure 7 also confirms that using several source shapes is beneficial when many annotated examples are available. In the limit, when all source shapes vote and the selection criterion no longer matters, an average labelling is predicted with poor performance, which again outlines the importance of source selection. Using nine source shapes performs best across most criteria and categories when all the training annotations can be used.
5.3 Ablation Study
In this section we conduct an ablation study to empirically validate our approach. Table 3 shows performance without the cycle loss, without the Chamfer loss, and without any specific triplet sampling strategy during training (simply selecting random shapes).

Table 3 shows that the cycle-consistency loss is critical to the success of our method (a large relative drop in IoU without it). Training without the Chamfer distance as a reconstruction loss performs slightly better than the identity baseline and below our full approach. This highlights the fact that the cycle-consistency loss also acts as a reconstruction loss. Finally, our triplet sampling strategy during training provides a small boost.
Car / 100 shots | Nearest Neighbor | Oracle

(a) Identity | 67.60 | 73.59
(b) Ours | 68.19 | 75.87
(c) Ours w/o cycle loss | 52.78 | 59.63
(d) Ours w/o Chamfer | 66.21 | 74.31
(e) Ours w/o kNN restriction | 67.70 | 75.23
5.4 Hyperparameter Study
Figure 8 demonstrates once more that the cycle-consistency loss is the pivotal component of our method. It also outlines the stability of the results under different weightings of our losses. Note how performance is maintained even in the extreme case with only the cycle-consistency loss: the identity function is not a trivial minimum of this loss because of the projection step.
6 Conclusion
We have presented a method for learning a parametric transformation between two surfaces that leverages cycle-consistency as a supervisory signal to predict meaningful correspondences. Our method does not require an object template, can operate without any inter-shape correspondence supervision, and does not assume the deformation is nearly isometric. We demonstrate that our method transfers segmentation labels from a very small number of labeled examples significantly better than state-of-the-art methods, and matches their segmentation performance when a larger training dataset is provided.
We believe that the large gap between our performance and the oracle source selection, which provides maximal accuracy, shows that using learned deformations to transfer labels is a very promising research direction; investigating how to better select source models, and new ways to aggregate information across multiple sources, are natural next steps.
References
 [ACBCO17] Azencot O., Corman E., BenChen M., Ovsjanikov M.: Consistent functional cross field design for mesh quadrangulation. ACM Trans. Graph. 36, 4 (July 2017), 92:1–92:13.
 [BBK06] Bronstein A. M., Bronstein M. M., Kimmel R.: Generalized multidimensional scaling: A framework for isometry-invariant partial surface matching. Proceedings of the National Academy of Sciences 103, 5 (2006), 1168–1172.
 [BM92] Besl P. J., McKay N. D.: A method for registration of 3-D shapes. IEEE Trans. Pattern Anal. Mach. Intell. 14, 2 (Feb. 1992), 239–256.
 [BMRB16] Boscaini D., Masci J., Rodolà E., Bronstein M. M.: Learning shape correspondence with anisotropic convolutional neural networks. CoRR abs/1605.06437 (2016).
 [BR07] Brown B., Rusinkiewicz S.: Global non-rigid alignment of 3D scans. ACM Transactions on Graphics (Proc. SIGGRAPH) 26, 3 (Aug. 2007).
 [CFG15] Chang A. X., Funkhouser T. A., Guibas L. J., Hanrahan P., Huang Q., Li Z., Savarese S., Savva M., Song S., Su H., Xiao J., Yi L., Yu F.: Shapenet: An informationrich 3d model repository. CoRR abs/1512.03012 (2015).
 [CK15] Chen Q., Koltun V.: Robust nonrigid registration by convex optimization. ICCV (2015).
 [EBC17] Ezuz D., BenChen M.: Deblurring and denoising of maps between shapes. Comput. Graph. Forum 36, 5 (Aug. 2017), 165–174.
 [GFK18a] Groueix T., Fisher M., Kim V. G., Russell B., Aubry M.: 3dcoded : 3d correspondences by deep deformation. In ECCV (2018).

 [GFK18b] Groueix T., Fisher M., Kim V. G., Russell B., Aubry M.: AtlasNet: A Papier-Mâché approach to learning 3D surface generation. In Proceedings IEEE Conf. on Computer Vision and Pattern Recognition (CVPR) (2018).
 [HAWG08] Huang Q., Adams B., Wicke M., Guibas L. J.: Non-rigid registration under isometric deformations. In Computer Graphics Forum (2008), vol. 27, pp. 1449–1457.
 [HG13] Huang Q.-X., Guibas L.: Consistent shape maps via semidefinite programming. In Proceedings of the Eleventh Eurographics/ACM SIGGRAPH Symposium on Geometry Processing (Aire-la-Ville, Switzerland, 2013), SGP ’13, Eurographics Association, pp. 177–186.
 [HKC18] Huang H., Kalogerakis E., Chaudhuri S., Ceylan D., Kim V. G., Yumer E.: Learning local shape descriptors from part correspondences with multi-view convolutional networks. Transactions on Graphics (2018).
 [HZG12] Huang Q.-X., Zhang G.-X., Gao L., Hu S.-M., Butscher A., Guibas L.: An optimization approach for extracting and encoding consistent maps in a shape collection. ACM Trans. Graph. 31, 6 (Nov. 2012), 167:1–167:11.
 [JSZ15] Jaderberg M., Simonyan K., Zisserman A., et al.: Spatial transformer networks. In Advances in neural information processing systems (2015), pp. 2017–2025.
 [KAMC17] Kalogerakis E., Averkiou M., Maji S., Chaudhuri S.: 3D shape segmentation with projective convolutional networks. In Proc. IEEE Computer Vision and Pattern Recognition (CVPR) (2017).
 [KB14] Kingma D. P., Ba J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014).
 [KHS10] Kalogerakis E., Hertzmann A., Singh K.: Learning 3D Mesh Segmentation and Labeling. ACM Transactions on Graphics 29, 3 (2010).
 [KLF11] Kim V. G., Lipman Y., Funkhouser T.: Blended intrinsic maps. Transactions on Graphics (Proc. of SIGGRAPH), 4 (2011).
 [KLM12] Kim V. G., Li W., Mitra N. J., DiVerdi S., Funkhouser T.: Exploring Collections of 3D Models using Fuzzy Correspondences. Transactions on Graphics (Proc. of SIGGRAPH), 4 (2012).
 [KTEM18] Kanazawa A., Tulsiani S., Efros A. A., Malik J.: Learning category-specific mesh reconstruction from image collections. In ECCV (2018).
 [LRR17] Litany O., Remez T., Rodolà E., Bronstein A. M., Bronstein M. M.: Deep functional maps: Structured prediction for dense shape correspondence. CoRR abs/1704.08686 (2017).
 [LSD18] Li L., Sung M., Dubrovina A., Yi L., Guibas L. J.: Supervised fitting of geometric primitives to 3d point clouds. CVPR (2018).
 [LSP08] Li H., Sumner R. W., Pauly M.: Global correspondence optimization for non-rigid registration of depth scans. Computer Graphics Forum (Proc. SGP’08) 27, 5 (July 2008).
 [MWZ14] Mitra N. J., Wand M., Zhang H., Cohen-Or D., Kim V. G., Huang Q.-X.: Structure-Aware Shape Processing. SIGGRAPH Course Notes (2014).
 [MZC19] Mo K., Zhu S., Chang A., Yi L., Tripathi S., Guibas L., Su H.: PartNet: A large-scale benchmark for fine-grained and hierarchical part-level 3D object understanding. In CVPR (2019).
 [NBCW11] Nguyen A., Ben-Chen M., Welnicka K., Ye Y., Guibas L.: An optimization approach to improving collections of shape maps. In Computer Graphics Forum (2011), vol. 30, Wiley Online Library, pp. 1481–1491.
 [OBCS12] Ovsjanikov M., BenChen M., Solomon J., Butscher A., Guibas L.: Functional maps: A flexible representation of maps between shapes. ACM Trans. Graph. 31, 4 (July 2012), 30:1–30:11.
 [OMMG10] Ovsjanikov M., Mérigot Q., Mémoli F., Guibas L. J.: One point isometric matching with the heat kernel. Comput. Graph. Forum 29, 5 (2010), 1555–1564.
 [QSMG16] Qi C. R., Su H., Mo K., Guibas L. J.: PointNet: Deep learning on point sets for 3D classification and segmentation. arXiv preprint arXiv:1612.00593 (2016).
 [QYSG17] Qi C. R., Yi L., Su H., Guibas L. J.: PointNet++: Deep hierarchical feature learning on point sets in a metric space. arXiv preprint arXiv:1706.02413 (2017).
 [RL01] Rusinkiewicz S., Levoy M.: Efficient variants of the ICP algorithm. In Proceedings Third International Conference on 3-D Digital Imaging and Modeling (2001).
 [ROA13] Rustamov R. M., Ovsjanikov M., Azencot O., Ben-Chen M., Chazal F., Guibas L.: Map-based exploration of intrinsic shape differences and variability. ACM Trans. Graph. 32, 4 (July 2013), 72:1–72:12.
 [RPWO18] Ren J., Poulenard A., Wonka P., Ovsjanikov M.: Continuous and orientation-preserving correspondences via functional maps. ACM Trans. Graph. 37, 6 (Dec. 2018), 248:1–248:16.
 [vKZHCO11] van Kaick O., Zhang H., Hamarneh G., Cohen-Or D.: A survey on shape correspondence. Computer Graphics Forum 30, 6 (2011), 1681–1707.
 [WHC16] Wei L., Huang Q., Ceylan D., Vouga E., Li H.: Dense human body correspondences using convolutional networks. In Computer Vision and Pattern Recognition (CVPR) (2016).
 [WSL18] Wang Y., Sun Y., Liu Z., Sarma S. E., Bronstein M. M., Solomon J. M.: Dynamic graph CNN for learning on point clouds. CoRR abs/1801.07829 (2018).
 [WZL18] Wang N., Zhang Y., Li Z., Fu Y., Liu W., Jiang Y.-G.: Pixel2Mesh: Generating 3D mesh models from single RGB images. In ECCV (2018).
 [WZS19] Wang X., Zhou B., Shi Y., Chen X., Zhao Q., Xu K.: Shape2Motion: Joint analysis of motion parts and attributes from 3D shapes. In CVPR (2019).
 [YKC16] Yi L., Kim V. G., Ceylan D., Shen I.-C., Yan M., Su H., Lu C., Huang Q., Sheffer A., Guibas L.: A scalable active framework for region annotation in 3D shape collections. SIGGRAPH Asia (2016).
 [YLZ19] Yu F., Liu K., Zhang Y., Zhu C., Xu K.: PartNet: A recursive part decomposition network for fine-grained and hierarchical shape segmentation. In CVPR (2019).
 [Zha94] Zhang Z.: Iterative point matching for registration of free-form curves and surfaces, 1994.
 [ZKA16] Zhou T., Krähenbühl P., Aubry M., Huang Q., Efros A. A.: Learning dense correspondence via 3D-guided cycle consistency. In Computer Vision and Pattern Recognition (CVPR) (2016).

 [ZPIE17] Zhu J.-Y., Park T., Isola P., Efros A. A.: Unpaired image-to-image translation using cycle-consistent adversarial networks. In Computer Vision (ICCV), 2017 IEEE International Conference on (2017).
 [ZPK18] Zhou Q.-Y., Park J., Koltun V.: Open3D: A modern library for 3D data processing. arXiv:1801.09847 (2018).
 [ZSCO08] Zhang H., Sheffer A., Cohen-Or D., Zhou Q., van Kaick O., Tagliasacchi A.: Deformation-driven shape correspondence. Computer Graphics Forum (Special Issue of Symposium on Geometry Processing) 27, 5 (2008), 1431–1439.