1. Introduction
Detecting, quantifying and analyzing variability in shape collections is a fundamental task in computer graphics and geometry processing, with applications across multiple domains, including statistical shape analysis [Anguelov et al., 2005; Bogo et al., 2014; Hasler et al., 2009], shape exploration [Kim et al., 2012; Rustamov et al., 2013; Kleiman et al., 2015], shape correspondence [Huang et al., 2014] and co-segmentation [Wang et al., 2012]. A key question that arises in all techniques for extracting variability is the choice of the right shape representation: one that can reveal the structure of each shape in the context of the collection while also being compact and easy to manipulate, enabling efficient shape analysis and processing.
The majority of existing techniques dedicated to extracting variability in a collection are based on first selecting a template (or base) shape and considering the changes on all other shapes with respect to this template — this is the standard practice in medical domains, where the reference shape is often referred to as an “atlas” (e.g., in brain anatomy) [Grenander and Miller, 1998]. In computer graphics this approach is common both in shape reconstruction and in statistical shape analysis [Anguelov et al., 2005; Bogo et al., 2014; Hasler et al., 2009], but also in shape exploration (e.g., [Kim et al., 2012; Ovsjanikov et al., 2011; Kim et al., 2013; Rustamov et al., 2013] among many others), where the template is often constructed by either simplifying some fixed base shape or by using shape abstractions derived from collections of parts and their relations.
Although easy and intuitive, template-based shape exploration and analysis has obvious limitations when shape variability is large and no single prototype adequately models all given shapes. But even in settings of more modest variation, there are significant limitations: first, the choice of the template can significantly affect the results in terms of the types of variability that are detected and highlighted. Second, considering the variability with respect to a fixed base shape can make it difficult to reveal cross-class variability that becomes apparent only when comparing all pairs of shapes in the collection. Third, even when the base shape is given, the exact choice of encoding for the variability remains crucial. For example, while simple techniques based on the displacement of each template vertex might be relevant for reconstruction and statistical shape analysis, their use is very limited in the context of learning, since they are not invariant even to basic rigid motions.
This question of shape representation has also become particularly important with the advent of powerful techniques based on deep learning and convolutional neural networks. Although very successful in image analysis, their adoption for shape processing has so far been relatively limited due to representational differences. Common 3D representations such as meshes or point clouds are irregular, unlike the regular grids defining 2D images, making it challenging to define notions such as convolution or to encode basic 3D invariances. While some significant progress has been made in this direction in the past few years (see e.g., [Maron et al., 2017] and [Bronstein et al., 2017] for an overview), the question of defining a representation that is at once invariant, compact and well-suited for learning remains open.
In this paper we present a novel approach to encoding shapes in the context of a collection that helps overcome many of the above limitations. Specifically, starting from a collection of shapes with some soft (functional) maps between them, we show how consistent latent spaces, which have previously been used for improving map quality, can also be exploited to reveal the geometric variability in the collection, without relying on a base or template shape (e.g., in Figure 13, our approach highlights the regions that are distinctive between the cats and lions), or assuming a particular (e.g., star-shaped) topology of the functional map network. Our approach is based on a novel analysis of latent spaces, which demonstrates that after proper regularization they can be endowed with natural, unbiased geometric structure. We then show that, although our latent shape is a dual object that need not correspond to a real shape in 3D, it can be used together with the notion of shape differences introduced in [Rustamov et al., 2013] to construct a representation for each shape in the collection, but without relying on a fixed base shape as done in that work. Moreover, we show how the algebraic nature of this representation can be exploited to detect detailed information about differences between shape classes, perform (possibly partial) shape analogies, and analyze shapes across different modalities.
Contributions. To summarize, our main contributions are:

We describe how latent functional spaces can be endowed with natural geometric (metric and measure) structure, giving rise, for the first time, to a well-defined notion of a “latent shape” that characterizes a shape collection.

We define shape differences between real and latent shapes and show how such differences lead to a shape representation that can be used for detailed shape analysis without assuming a particular topology of the map network.

We provide tools for a nuanced understanding of shape variability, including the separation of the different types of variability present within and across shape sub-collections.

We demonstrate that our new representation supports deep learning techniques, including CNNs, for both analysis and synthesis, leading to improved results over baseline methods.
2. Related Work
Template-based shape analysis and exploration. Analyzing shape collections via variability around a template shape has a rich and vast history going back to D’Arcy Thompson’s classic “On Growth and Form” [Thompson et al., 1942], which has inspired Kendall’s shape space theory [Kendall, 1989] and the pattern theory formalized by Grenander, commonly used in computational anatomy [Grenander and Miller, 1998], where templates are often referred to as atlases.
In Computer Graphics, shape spaces based on template variation are ubiquitous in statistical shape analysis, e.g. for defining 3D morphable models [Blanz and Vetter, 1999; Allen et al., 2003], especially for capturing variability in human body and pose, e.g. [Anguelov et al., 2005; Hasler et al., 2009; Bogo et al., 2014] among many others.
Shape templates are also commonly used for exploring shape collections [Ovsjanikov et al., 2011; Kim et al., 2012]. Although in most cases the presence of a shape template is assumed to be given a priori, simultaneous template construction and fitting techniques have been used for both reconstruction [Wand et al., 2007, 2009; Tong et al., 2012] and exploration [Kim et al., 2013], among many others.
While pervasive, template-based methods also have a well-known limitation in that the choice of the template model can introduce bias in the kinds of variability that are revealed. Common selection techniques include using a particular (median) shape in the collection that is as close as possible to a centroid, or constructing a new template shape by pointwise averaging (e.g., [Joshi et al., 2004]).
Our approach avoids the construction of an explicit template shape, and replaces it with an implicit template obtained via the analysis of latent functional spaces, which both removes the bias in template shape selection and avoids the expensive construction of a geometric (embedded 3D shape) template.
Shape Analysis with functional maps. Our approach takes as input a collection of shapes with soft (functional) maps between them. In this, we follow the recent line of work on shape analysis with soft maps, similar to [Solomon et al., 2012; Kim et al., 2012; Rustamov et al., 2013]. Namely, we use the formalism of functional maps introduced originally in [Ovsjanikov et al., 2012] and extended significantly in follow-up works, including [Kovnatsky et al., 2013; Huang et al., 2014] among others (see [Ovsjanikov et al., 2017] for a recent overview).
Although originally proposed as a computational tool for shape matching, follow-up works have also shown the utility of this formalism in shape analysis and exploration, starting with map visualization [Ovsjanikov et al., 2013], detection and encoding of shape differences [Rustamov et al., 2013], and co-segmentation and co-analysis [Huang et al., 2014], among others. The advantage of these techniques is that they only require approximate functional maps, which are much easier to compute than precise (point-to-point) correspondences. Nevertheless, existing methods such as [Ovsjanikov et al., 2013; Rustamov et al., 2013] also follow the spirit of template-based techniques and assume the presence of a single base shape with respect to which variability is captured. A recent method introduced in [Huang and Ovsjanikov, 2017] has tried to lift this assumption but is still restricted to revealing global variability within a single collection. We extend these techniques first by proposing a template-free analysis and exploration framework using functional maps, second by proposing techniques for detecting and highlighting cross-collection variability, and finally by defining a compact shape representation that is suitable for learning.
Latent functional spaces. A key building block in our approach is the use of so-called latent functional spaces, which are closely related to map synchronization [Wang and Singer, 2013] and which have been used for computing consistent functional maps in shape and image collections [Huang et al., 2014; Wang et al., 2013, 2014]. One of our key contributions is to show that, in addition to providing a powerful computational method for map inference, latent functional spaces also make it possible to reveal variability in shape collections and to define a compact and informative shape representation.
Shape representations for learning. One of our key applications is to show how the shape representation obtained via the latent functional spaces can be naturally used in the context of supervised learning applications, and especially enable the use of convolutional neural networks for shape regression and classification.
In this, our work is related to the recent techniques aimed at applying deep learning methods to shape analysis. One of the main challenges is defining a meaningful notion of convolution, while ensuring invariance to basic transformations, such as rigid motions. Several techniques have recently been proposed, based on, e.g., geometry images [Sinha et al., 2016], volumetric [Maturana and Scherer, 2015; Wang et al., 2017], point-based [Qi et al., 2016] and multi-view approaches [Su et al., 2015], as well as, more recently, intrinsic techniques that adapt convolution to curved surfaces [Masci et al., 2015; Boscaini et al., 2016] (see also [Bronstein et al., 2017] for an overview), and even via toric covers [Maron et al., 2017], among many others.
Despite this tremendous progress in the last few years, defining a shape representation that can naturally support convolution operations, is compact, invariant to the desired class of transformations (e.g., rigid motions), and not limited to a particular topology remains a challenge. As we show below, our representation is well-suited for learning applications, and especially for revealing subtle geometric information regarding the shape structure.
Shape processing in latent representations. Finally, our work is also related to recent techniques that construct latent spaces for representing 3D shapes, especially those based on learning. For instance, [Wu et al., 2016] combine a 3D-CNN with a Generative Adversarial Network (GAN) to first learn the latent space of 3D shapes. Given the latent space, they regress an image feature learned via a 2D-CNN to the latent space to recover the underlying geometry. [Girdhar et al., 2016] follow a similar strategy but use a voxel-based autoencoder (AE) instead of a GAN for learning the latent representation. [Achlioptas et al., 2018] introduced an AE operating on 3D point clouds to produce a latent space which is further exploited by a GAN for point cloud synthesis. In a similar manner, [Li et al., 2017] developed a recursive neural net to map 3D part layouts to a latent space at which a GAN operates to create novel shapes with various part hierarchies. Differently from our representation via latent space analysis, these learned embeddings represent shapes as points in some high-dimensional space and rarely give access to regions or parts of the 3D shapes associated with, or responsible for, the shape variability. In contrast, we represent shapes in a collection as linear operators, stored as matrices, which not only enables a meaningful notion of convolution but also allows us to recover explanations for differences and variability in terms of highlighted shape parts.
3. Overview
The rest of the paper is organized as follows: in Section 4 we describe the problem setting, the main goals and notations used below. Section 5 provides the theoretical foundation for our method.
In particular, we characterize the geometric structure of latent shapes in Section 5.1 and define our shape representation based on shape differences with respect to latent shapes in Section 5.2. We then describe the two key applications: extracting variability in shape collections (Section 6) and using our representation for 3D deep learning (Section 7). Finally, we show qualitative and quantitative results obtained using our methods in Section 8.
4. Preliminaries, Notation and Problem Setup
Throughout our work, we assume that we are given a collection of related 3D shapes and a set of functional maps [Ovsjanikov et al., 2012] among some shape pairs. Our main goal is to develop a theoretical foundation for a novel representation for the shapes in the collection, and to show how this representation can be effectively used in practical applications.
Specifically, we assume as input a set of shapes 𝒮 = {S_1, …, S_n} and a collection of functional maps {C_ij}, which map real-valued functions between some pairs of shapes (S_i, S_j). The functional maps can either be induced by pointwise correspondences, or can be obtained via an optimization procedure, as described, e.g., in [Ovsjanikov et al., 2017]. Let W_i be the stiffness matrix and A_i the area matrix of shape S_i, which encode respectively the metric and the measure information. The Laplace-Beltrami operator (LBO) is classically discretized as A_i^{-1} W_i [Meyer et al., 2003]. We let Λ_i be the diagonal matrix storing the k smallest eigenvalues of the LBO of shape S_i, and Φ_i the matrix storing the corresponding eigenvectors. Following previous works, we assume that the functional maps are given in the reduced eigenbasis and can thus be thought of as matrices C_ij of size k × k.

The functional map network (FMN) on 𝒮 is a graph 𝒢 = (𝒱, ℰ), where the i-th vertex in 𝒱 corresponds to the functional space on S_i, and the edge (i, j) ∈ ℰ if we are given a functional map C_ij. We assume that this network is symmetric ((i, j) ∈ ℰ if and only if (j, i) ∈ ℰ) and connected, so that there exists at least one path consisting of edges in ℰ between any pair of vertices in 𝒱.
Shape Differences
Our shape representation is based on the shape differences introduced in [Rustamov et al., 2013], which characterize shape deformations by encoding the changes in inner products of functions. Namely, given shapes S_i, S_j and a functional map C_ij in the reduced basis, the authors introduce the area-based and the conformal shape differences D^A_ij, D^C_ij:

(1) D^A_ij = C_ij^T C_ij

(2) D^C_ij = (Λ_i)^+ C_ij^T Λ_j C_ij

where (·)^+ is the Moore-Penrose pseudoinverse. Intuitively, D_ij is a linear operator, which, once again, can be represented as a matrix of size k × k, and which encodes the difference or distortion induced by the map C_ij (see Figure 2 and Eq. (4) in [Rustamov et al., 2013]).
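To make the construction concrete, here is a minimal numerical sketch of Eqs. (1)-(2), assuming the functional map and the eigenvalues are already available in the reduced basis (function and variable names are ours, purely illustrative):

```python
import numpy as np

# Area-based and conformal shape differences of [Rustamov et al., 2013],
# computed from a reduced k x k functional map C_ij and the LBO
# eigenvalues lam_i, lam_j of the two shapes.
def shape_differences(C_ij, lam_i, lam_j):
    D_area = C_ij.T @ C_ij                                                    # Eq. (1)
    D_conf = np.linalg.pinv(np.diag(lam_i)) @ C_ij.T @ np.diag(lam_j) @ C_ij  # Eq. (2)
    return D_area, D_conf

# Sanity check: the identity map between two shapes with identical spectra
# induces no distortion, so both operators are (near-)identity; the first
# conformal entry vanishes because the constant eigenfunction has eigenvalue 0.
lam = np.array([0.0, 1.0, 2.0, 3.0])
D_a, D_c = shape_differences(np.eye(4), lam, lam)
```

Note that the pseudoinverse, rather than the inverse, handles the zero eigenvalue of the constant function.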
The key limitation of shape difference operators for shape collection analysis is that they require the choice of a base shape and consider only directional changes, from the base shape to the other shapes, making it impossible to use them given an arbitrary (non-star-shaped) FMN. Thus, one of our goals is to extend this construction to the case of shape collections without assuming a fixed base shape. We achieve this by exploiting the formalism of latent functional bases [Wang et al., 2013], which was proposed for improving the consistency of functional maps.
Latent Spaces
Given a FMN, the authors of [Wang et al., 2013] propose to extract a set of consistent latent bases Y_i on the shapes S_i, such that C_ij Y_i ≈ Y_j, and use them to refine the quality (consistency) of the functional maps. The latent bases Y_i can be thought of as collections of functions on S_i, or as functional maps from some latent shape to each S_i. Then, a map from S_i to S_j can be factored into a map from S_i to the latent shape and then to S_j via C_ij ≈ Y_j Y_i^+. While useful as a tool for improving functional maps, the exact structure of latent shapes is still not fully understood, and they have so far not been used for representing shapes in a collection.
In our work, we first show how latent shapes can be endowed with geometric structure and made more stable through an extra regularization, and then define a latent-space shape representation.
5. Latent Representation
5.1. Canonical Latent Basis and Latent Shape
Our first key observation is that the latent shape plays the role of an “average shape” in analyzing shape collections: a shape-like object that represents the entire collection, and which can be endowed with a natural geometric structure. Crucially, unlike existing approaches, for example in computational anatomy [Younes, 2010], that consider building templates or average shapes, we characterize the latent shape directly in the functional domain, without attempting to embed it in the ambient space.
The following theorem establishes the connection between the consistent latent basis and the geometry of the latent shape, while at the same time highlighting the limitations of the previously used approaches for constructing latent bases:
Theorem 5.1.
Given a collection of n discrete 3D shapes in full vertex-to-vertex correspondence and sharing the same mesh connectivity, and a consistent FMN 𝒢, in which the functional maps C_ij are represented in the full eigenbasis Φ_i on each S_i, let {Y_i} be the consistent latent basis satisfying the conditions: C_ij Y_i = Y_j for all (i, j) ∈ ℰ, Σ_i Y_i^T Y_i = n·I, and Σ_i Y_i^T Λ_i Y_i = Λ̄, where Λ̄ is a diagonal matrix. Then the eigenbasis Φ̄ of the latent shape, whose metric and measure are given by the averages W̄ = (1/n) Σ_i W_i and Ā = (1/n) Σ_i A_i, i.e. satisfying W̄ Φ̄ = Ā Φ̄ Λ̄, can be recovered as Φ̄ = Φ_i Y_i for any i.
This theorem suggests that the consistent latent basis carries information about the “average” geometry in the collection, given, in the full basis, by the average metric and measure matrices.
Role of Proper Regularization
Note that previous approaches for constructing the latent basis, such as [Wang et al., 2013], compute it by solving the optimization problem

min_{Y_1, …, Y_n} Σ_{(i,j) ∈ ℰ} ||C_ij Y_i − Y_j||_F^2, s.t. Σ_i Y_i^T Y_i = n·I.

Geometrically, and in light of Theorem 5.1, this corresponds to only averaging the measure of the shapes, which leads to metric ambiguity. This can result in significant instabilities in the extraction of the latent basis. We demonstrate this effect in Figure 2. Namely, given a shape collection and an additional shape, we compared the CLB computed with respect to the original shape collection to the one recomputed with all shapes, including the additional one. Figure 2(b) depicts the change of basis matrix between these two settings, which has noisy off-diagonal entries, suggesting that the latent shape is significantly perturbed.
To overcome this instability, we propose to construct a canonical latent basis by introducing an extra normalization, which forces Σ_i Y_i^T Λ_i Y_i to be a diagonal matrix, and which corresponds in Theorem 5.1 to averaging the metric on the latent shape. With this additional normalization, the change of basis matrix between the latent bases computed with and without the additional shape, shown in Figure 2(c), is much closer to a diagonal one than that in Figure 2(b). The details of this construction are given in Algorithm 1.
Now that the extra normalization incorporates the metric information, the latent shape can be thought of as a well-defined shape. In general, a shape with the average metric and measure does not admit an embedding in 3D, but as we will soon show, this construction carries rich geometric information useful for shape processing.
Let us stress that Theorem 5.1 is of purely theoretical interest. However, Algorithm 1 can be implemented in practice, without assuming access to the full basis or to exact consistent maps. In Figure 2(d), we also show the proximity between the eigenbasis/spectrum of the latent shape recovered from functional maps in the reduced basis and the theoretical ground truth. Namely, Figure 2(d) shows the transformation matrix between the first computed eigenfunctions, when the functional maps are represented in a reduced basis, and the theoretical ground truth, given by the exact averaging of the metric and measure, while Figure 2(e) shows the eigenvalues in the two cases. Hereafter, we always use the canonical latent basis in all formulations and applications, and denote it by Y_i to simplify notation.
Computing the canonical latent basis in practice. Computing the consistent latent basis within the framework of [Wang et al., 2013] involves an eigendecomposition of a possibly large, blockwise sparse matrix, whose size depends on the number of shapes and the dimensionality of the functional maps. In order to gain scalability, in practice we first sample a subset of shapes with which we compute the canonical latent basis, and then, for each shape S_i outside the subset, we search for its nearest neighbor S_j inside it, using the ShapeDNA descriptor. Finally, we push the latent basis from S_j to S_i via the functional map C_ji, namely Y_i = C_ji Y_j. This scheme not only improves the scalability of the computation of our latent shape representation, but also avoids recomputing the latent basis for each new shape.
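This propagation step can be sketched as follows (a rough illustration under our own naming, not the authors' code), assuming truncated per-shape spectra serve as ShapeDNA-style descriptors and the functional maps from the sampled shapes are precomputed:

```python
import numpy as np

# Sketch: propagate the canonical latent basis to a shape outside the
# sampled subset. 'sampled_spectra' are truncated LBO spectra used as
# ShapeDNA-style descriptors; 'maps_from_sampled[j]' is the functional
# map C_ji from sampled shape S_j to the new shape S_i.
def push_latent_basis(spectrum_i, sampled_spectra, sampled_bases, maps_from_sampled):
    dists = [np.linalg.norm(spectrum_i - s) for s in sampled_spectra]
    j = int(np.argmin(dists))                        # nearest sampled neighbor
    Y_i = maps_from_sampled[j] @ sampled_bases[j]    # Y_i = C_ji Y_j
    return Y_i, j

# Toy usage: the query spectrum is close to the first sampled shape.
spectra = [np.array([0.0, 1.0, 2.0, 3.0]), np.array([0.0, 4.0, 8.0, 12.0])]
bases = [2.0 * np.eye(4), 3.0 * np.eye(4)]
maps_ = [np.eye(4), np.eye(4)]
Y, j = push_latent_basis(np.array([0.0, 1.1, 2.0, 3.1]), spectra, bases, maps_)
```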
5.2. Shapes as Latent Shape Differences
Although the canonical latent basis reduces the instability present in the previous basis construction, the latent bases unfortunately still cannot be used directly to represent each shape in the collection. The main reason is that Y_i is expressed in the eigenbasis of shape S_i, and therefore one cannot compare, for example, Y_i with Y_j, which is fundamental in both shape analysis and learning applications.
Instead, we build our shape representation by defining the latent shape differences, which are linear operators acting on the function space of the latent shape, and which, as such, are independent of the basis on each shape.
Namely, letting Λ̄ denote the spectrum of the latent shape, arising from step 3 of the procedure described in Algorithm 1, and following the formulation of [Rustamov et al., 2013], we define the area-based and conformal latent shape differences D^A_i, D^C_i as:

(3) D^A_i = Y_i^T Y_i

(4) D^C_i = (Λ̄)^+ Y_i^T Λ_i Y_i
The final procedure for extracting these operators from a given collection is summarized in Algorithm 2.
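This step of the procedure admits a direct numerical sketch (our notation; assumes the canonical latent basis, the shape's eigenvalues and the latent spectrum have already been computed):

```python
import numpy as np

# Latent shape differences (Eqs. (3)-(4)): Y_i is the k x kbar canonical
# latent basis of shape S_i, lam_i its LBO eigenvalues, lam_bar the
# spectrum of the latent shape.
def latent_shape_differences(Y_i, lam_i, lam_bar):
    D_area = Y_i.T @ Y_i                                                      # Eq. (3)
    D_conf = np.linalg.pinv(np.diag(lam_bar)) @ Y_i.T @ np.diag(lam_i) @ Y_i  # Eq. (4)
    return D_area, D_conf

# A shape whose latent basis is the identity and whose spectrum matches
# the latent one yields identity-like operators (no distortion).
lam = np.array([0.0, 1.0, 2.0])
D_a, D_c = latent_shape_differences(np.eye(3), lam, lam)
```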
The main insight of our work is that the latent shape differences provide a compact and extremely versatile representation of each shape in a collection as a pair of small-sized matrices, which enjoy several nice theoretical properties and enable a number of novel applications in analysis and learning.
5.3. Properties of the Latent Shape Differences
Given a shape collection with the associated functional map network, the latent space shape differences (LSSDs) provide a representation of each shape as a pair of matrices whose size is controlled by the size of the latent basis. In this work, we argue that this representation enables a number of novel applications and lifts fundamental restrictions of previous approaches. In particular, LSSDs inherit some of the most attractive properties of shape differences, such as their compactness and informativeness, while avoiding their shortcomings. Below we summarize the main properties of this representation.
Invariance: LSSDs provide a representation that is invariant to rigid (and more generally isometric) shape transformations. In the context of learning, this is especially important as it will allow us to do inference in a poseinvariant way.
Flexibility: computing LSSDs only requires the knowledge of functional maps and places no restriction on the shape discretization. For example, they can accommodate collections of shapes with different numbers of vertices, or even with different modalities, such as point clouds and meshes.
Informativeness: LSSDs fully encode the intrinsic geometry of each shape in the collection in a compact way. Indeed, it follows from Theorem 5.1 that, in the presence of full information, given the FMN of a collection of shapes 𝒮, the spectrum of the latent shape, and the latent shape differences of each shape in 𝒮, one can recover the intrinsic geometry of each S_i, i.e., the area and stiffness matrices A_i, W_i, which, in turn, fully determine the edge lengths [Zeng et al., 2012].
Functoriality: if we interpret each Y_i as the functional map associating the latent shape to S_i, it follows from the functoriality property in [Rustamov et al., 2013] that

D_ij = (D_i)^+ D_j,

where D_ij is the shape difference between S_i and S_j, expressed as an operator on the latent shape. Thus, LSSDs not only encode the difference of each shape to the latent shape but also allow us to factor the difference between each pair of shapes via the canonical latent basis.
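This factorization can be verified numerically. Below, a small random sketch (our own construction) where the map from S_i to S_j is factored through the latent shape as C_ij = Y_j Y_i^+, and the pairwise area-based difference, transported to the latent basis, coincides with (D_i)^+ D_j:

```python
import numpy as np

rng = np.random.default_rng(0)
k, kbar = 6, 4
Y_i = rng.standard_normal((k, kbar))   # latent bases (full column rank a.s.)
Y_j = rng.standard_normal((k, kbar))

D_i, D_j = Y_i.T @ Y_i, Y_j.T @ Y_j    # area-based LSSDs, Eq. (3)
C_ij = Y_j @ np.linalg.pinv(Y_i)       # map S_i -> S_j factored through the latent shape

# Pairwise area-based shape difference, built on S_i and then
# transported into the latent basis via Y_i:
D_ij_latent = np.linalg.pinv(Y_i) @ (C_ij.T @ C_ij) @ Y_i
```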
Algebraic nature: LSSDs are linear functional operators on the latent shape. As such, they can in practice be represented as small matrices and manipulated using standard numerical linear algebra tools. Moreover, they provide detailed (localized) information about the shape geometry. As we show below, this allows us to extract partial information to compare and reconstruct shape parts, in contrast to purely global shape descriptors.
Base-shape independence: Crucially, unlike the original shape differences, which rely on the choice of a specific base shape (which can lead to biased results) and require a star-shaped map network, LSSDs are extracted from the entire input functional map network, regardless of its topology. This is enabled in particular by our novel regularization, which leads to a latent shape endowed with canonical geometric structure. Let us note that, theoretically, in the presence of full information and an a priori consistent map network, the choice of the base shape should not affect the results. In practice, however, functional maps are represented in a reduced basis and are not perfectly consistent, which can introduce strong bias in the subsequent analysis.
To illustrate this effect, we aligned a collection of cats and dogs shown in Figure 3, without any maps across the two classes, using the original and the latent space shape differences. For the former, we selected a pair of shapes, e.g., the boxed animals in Figure 3, to be used as base shapes in each cluster, and computed the eigenvalues of the respective shape differences as descriptors. For the latter, we used the eigenvalues of the latent shape differences as the descriptor of each shape in the collection, without any a priori information. The alignment results based on the above descriptors are shown in the bottom two rows of Figure 3. Note that, when using the approach of [Rustamov et al., 2013], none of the base shape choices led to the correct result; we demonstrate one such result obtained by fixing the base shapes to be the ones shown in the blue boxes. Meanwhile, as shown in the middle row, using the latent shape differences results in the ground-truth alignment. Note that the same experiment was conducted in [Rustamov et al., 2013] (see Figure 13 therein); however, to obtain the exact alignment, the authors used all pairwise shape differences.
Compatibility with sparse map networks: Another key advantage of our latent representation is its ability to extract information from sparse map networks. As observed in previous works [Huang et al., 2014], functional maps between similar shapes are typically much easier to compute. On the other hand, establishing functional maps from a fixed base shape to all other shapes in the collection can lead to significant errors. To illustrate this, we consider a sequence of frames of galloping horses shown in Figure 4(a), and assume that only functional maps between consecutive frames are given, resulting in a sparse FMN with chain topology. Figure 4(b) demonstrates that, even when extracted from the sparse FMN, the LSSDs recover the cyclical structure of the collection, while using the shape differences from a fixed base shape, computed by composing the given functional maps, leads to an erroneous embedding, as shown in Figure 4(c).
5.4. Projected Latent Shape Differences
Besides encoding each shape in the collection, the latent shape differences also give access to detailed information about the deformation, including the local changes in different shape regions in a purely algebraic way.
A link between actual deformations across shapes and functional distortions induced by the respective shape differences has been established in [Rustamov et al., 2013] – namely, a function, such as the indicator function of a region, will be modified by the shape difference if it is supported on a region undergoing a deformation. It has been further shown in [Rustamov et al., 2013] (see Section 6) that the area-based (resp. conformal) shape difference is an identity operator if and only if the underlying map is area-preserving (resp. conformal).
In this section, we propose a novel projection operation on the latent shape differences. The key observation is that we can suppress a functional deformation by modifying the shape difference so that it acts as an identity operator on a certain functional subspace expressing the deformation of interest, which, in the following, allows us to perform partial shape analogies.
Suppose that we are given a set of shapes 𝒮 and a FMN 𝒢, and let D_i be the LSSDs computed using Algorithm 2. Now we consider a set of functions F = {f_1, …, f_m}, where the f_p are orthonormal basis functions on the latent shape, i.e., f_p^T f_q = δ_pq, and write F also for the matrix whose columns are the f_p. We construct a projected latent shape difference D̄_i(F) using D_i and F as follows:

(5) D̄_i(F) = D_i (I − F F^T) + F F^T

It is easy to verify that D̄_i(F) f = D_i f if f is orthogonal to the subspace spanned by the functions in F, and that D̄_i(F) f = f if f is spanned by the functions in F. Intuitively, if F contains the full basis of the latent shape, then D̄_i(F) = I, which forces the latent shape difference to correspond to an area-preserving or conformal map, depending on the type of D_i.
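The projection of Eq. (5) is a one-liner; the following sketch (our notation) also checks the two defining properties just stated:

```python
import numpy as np

# Projected latent shape difference, Eq. (5): F has orthonormal columns
# spanning the functional subspace on the latent shape to be suppressed.
def project_lssd(D, F):
    P = F @ F.T
    return D @ (np.eye(D.shape[0]) - P) + P

rng = np.random.default_rng(1)
D = rng.standard_normal((4, 4))
F = np.eye(4)[:, :1]            # suppress the span of the first basis function
Db = project_lssd(D, F)

f_in = np.array([1.0, 0.0, 0.0, 0.0])   # inside span(F): mapped to itself
f_out = np.array([0.0, 1.0, 0.0, 0.0])  # orthogonal to span(F): D acts as before
```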
6. Shape Collection Comparison
Several approaches have been proposed for detecting geometric variability that exists within a given collection of shapes connected by functional maps, e.g., [Rustamov et al., 2013; Huang and Ovsjanikov, 2017]. In this section, we show how our latent-based shape representation can be used for detecting and analyzing differences across different shape collections, or two subsets of a larger collection. Namely, given a set of shapes 𝒮, a FMN 𝒢 and a partition 𝒮 = 𝒞_1 ∪ 𝒞_2, we aim to capture the difference between 𝒞_1 and 𝒞_2, while not being sensitive to the global variability that exists within 𝒮. This problem arises especially when trying to detect the detailed geometric properties that are responsible for the differences between shape classes (e.g., healthy vs. unhealthy organs), while factoring out the “normal” or “common” variability within the collection.
Global variability
Before approaching this problem, we first propose an algebraic approach for detecting the global variability within a collection. Our observation is that, in light of Section 5.4, suppressing the global variability should lead to projected LSSDs that are indistinguishable from each other. Namely, we would like to find a function f such that the latent shape differences projected onto it (i.e., with F = {f} in Eq. (5)) are as close to each other as possible. For this, we first introduce a term that measures, for a pair of shapes, the change in distance between the original and the projected latent shape differences:

(6) E_ij(f) = ||D_i − D_j||_F^2 − ||D̄_i(f) − D̄_j(f)||_F^2
According to the following lemma, this change is always nonnegative and can be written in a quadratic form.

Lemma 6.1.
If ||f|| = 1, then E_ij(f) = f^T (D_i − D_j)^T (D_i − D_j) f ≥ 0.
It is natural to optimize for a function f which maximizes the global change of distances within the collection, i.e.,

(7) f_glob = argmax_{||f|| = 1} Σ_{i<j} E_ij(f)
In other words, after suppressing the functional deformation related to , the shapes are maximally brought together. According to Lemma 6.1,
is given by the eigenfunction associated with the largest eigenvalue of the quadratic form in Lemma 6.1.
Cross-collection variability
Following the same idea as above, we define the cross-collection variability such that, after suppressing it, the clusters and become closer to each other, while maintaining their inner structure. In other words, we aim to simultaneously maximize the change of distances across shapes in different clusters, and minimize the change within the same cluster.
Putting these two goals together, we construct:
(8) 
As an illustration, in Figure 5, we demonstrate the respective optimizers. Since the horizontal bump is twice the size of the vertical one, to maximally reduce the intra-cluster variability, one should suppress the horizontal deformation. Meanwhile, it is intuitive that clusters and are distinguished by the magnitudes of their vertical bumps, which should be detected as cross-collection variability.
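The optimizations above can be sketched numerically. Since the exact matrices of Eqs. (7) and (8) are not reproduced here, the sketch below uses a plausible surrogate (an assumption on our part): the quadratic form is accumulated from squared pairwise operator differences, and the maximizer is recovered as the top eigenvector:

```python
import numpy as np
from itertools import combinations

def variability_direction(Ds, pairs):
    """Unit function whose suppression maximally changes pairwise
    distances: top eigenvector of Q = sum (D_i - D_j)^T (D_i - D_j)
    over the given pairs.  A surrogate for Lemma 6.1, not the paper's
    exact matrix."""
    k = Ds[0].shape[0]
    Q = np.zeros((k, k))
    for i, j in pairs:
        E = Ds[i] - Ds[j]
        Q += E.T @ E
    _, V = np.linalg.eigh(Q)   # eigh orders eigenvalues ascending
    return V[:, -1]

def cross_collection_direction(Ds, labels):
    """Eq. (8)-style objective: reward change of distances between the
    clusters, penalize change within them (sign convention assumed)."""
    idx = range(len(Ds))
    k = Ds[0].shape[0]
    Q = np.zeros((k, k))
    for i, j in combinations(idx, 2):
        E = Ds[i] - Ds[j]
        Q += (E.T @ E) if labels[i] != labels[j] else -(E.T @ E)
    _, V = np.linalg.eigh(Q)
    return V[:, -1]
```

In the toy setup of Figure 5, a direction along which the two clusters differ, while within-cluster variation lies elsewhere, is correctly picked out by `cross_collection_direction`.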
Finally, we point out that, though not equivalent, there is a connection between the formulations above and the one for detecting global variability proposed in [Huang and Ovsjanikov, 2017]. In fact, we can use the results of both approaches for cross-validation. We refer interested readers to the Appendix for the statement and proof of this connection.
7. Applications in Learning
On the (deep) learning side of our exposition, we study how our representation via latent shape difference operators can be used as the input on which neural networks rely to reason about 3D data. A key property of this input representation is that it is encoded as a small matrix, i.e., it provides a regular structure amenable to convolutions. CNNs rely on and take advantage of the spatial proximities found in regular-grid data such as images. Analogously, according to our formulation for LSSDs, we have, e.g., in Eq. 3, , where is the th latent basis function. Thus, instead of spatial proximity, neighboring entries in our matrix representation encode interactions of function pairs that are close in the “spectral” domain.
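For the area-based case, the classical construction of Rustamov et al. [2013] makes the spectral structure of these matrices concrete: with both bases orthonormal with respect to the area measures, the area-based shape difference is simply C^T C for a functional map C, so entry (i, j) couples the i-th and j-th basis functions:

```python
import numpy as np

def area_shape_difference(C):
    """Area-based shape difference D = C^T C of Rustamov et al. 2013,
    for a functional map C expressed in area-orthonormal bases.  Entry
    D[i, j] couples basis functions i and j, so neighboring entries
    pair functions that are close in the spectral (frequency) ordering."""
    return C.T @ C

# An area-preserving map has an orthonormal functional map matrix,
# so its area-based shape difference is the identity.
rng = np.random.default_rng(0)
C, _ = np.linalg.qr(rng.standard_normal((5, 5)))
assert np.allclose(area_shape_difference(C), np.eye(5))
```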
The first task at which we test the effectiveness of our approach is shape regression: here we compare neural networks that learn to estimate hidden parameters controlling the body variation of human-form meshes. Assuming a set of training shapes with known parameters, represented as real-valued vectors, our networks learn to regress the underlying parameters for new, unseen shapes.
The second task we explore is that of 3D point cloud reconstruction. Quite differently from the regression task, here we test how a novel network that takes a latent difference matrix as input can learn to reconstruct a 3D point cloud version of the underlying mesh. This problem is closely related to shape reconstruction from intrinsic operators, recently considered in [Boscaini et al., 2015; Corman et al., 2017], where several advanced, purely geometric optimization techniques were proposed that give satisfactory results in the presence of full information [Boscaini et al., 2015] or under strong (extrinsic) regularization [Corman et al., 2017], but which also demonstrate the many challenges posed by this type of reconstruction. In contrast, we show that by using the context of a collection and learning machinery, real shapes can be recovered rather well from their latent difference operators, and moreover that entirely new shapes can be synthesized using the algebraic structure of difference operators.
One possible concern with our approach is that it requires an initial functional map network, which can potentially restrict the amount of training data available. However, as we show in Section 8, even for collections of moderate size, consisting of one to two hundred shapes, our networks are sufficiently regularized and allow for very powerful and effective learning.
7.1. Localized Latent Shape Difference
Localized shape deformation is a useful tool for shape analysis and synthesis in geometry processing. The algebraic form of our latent representation makes it easy to manipulate, while the geometric information it encodes gives us access to local geometric features.
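One plausible way to assemble such a region-localized operator is sketched below, assuming (our assumption, since the paper's exact formula is not reproduced here) that the mixture acts as one operator on the span of the localized functions and as the other on its orthogonal complement:

```python
import numpy as np

def localized_mix(D1, D2, B):
    """Operator acting (approximately) as D1 on the span of the
    localized basis functions B, and as D2 on the orthogonal
    complement.  A sketch only; cross terms are simply dropped."""
    Q, _ = np.linalg.qr(B)
    P = Q @ Q.T                  # projector onto the localized subspace
    Pc = np.eye(P.shape[0]) - P  # projector onto the complement
    return P @ D1 @ P + Pc @ D2 @ Pc
```

For instance, mixing two multiples of the identity along the first latent coordinate produces an operator that scales that coordinate as the first shape does and the rest as the second.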
Given shapes and with their respective LSSDs and , and a set of basis functions , expressed in the basis of the latent shape and supported on a localized region of the shapes, we can construct an operator that acts as on and as on the complement of as follows: Note that this expression resembles Eq. (5) above, but where we consider , so that one of the interpolated shapes is the latent shape itself. Using allows us to construct shapes by mixing different parts or regions of existing shapes, leading to localized interpolation and shape analogies, as we will show in Section 8.4.
8. Main Experimental Results
8.1. Applications in Learning
In this set of experiments we explored how latent shape differences can be used within the context of a 3D deep learning pipeline. As mentioned in Section 7, our latent differences provide a new representation of geometry with unique characteristics, suggesting its use in 3D machine learning applications.
8.2. Data generation
For the experiments of Sections 8.3 and 8.4 we generated human body shapes in eight different poses using the open-source implementation [Chen et al., 2015] of the SCAPE method [Anguelov et al., 2005]. In [Chen et al., 2015], body variations are controlled with latent parameters , which informally encode shape attributes such as height, leg girth, belly protrusion, etc. To generate our shapes we sampled each of the aforementioned parameters uniformly i.i.d. and considered eight modifications of the standard T-pose. See Figure 6 for a sample of the resulting meshes. It is worth noting that the produced meshes share the same combinatorial tessellation on 6,449 vertices, which facilitated the construction of pairwise functional maps in this collection. For the following regression and reconstruction experiments, we used a train/test/validation split of this dataset.
8.3. Regression
In this experiment, we assess the efficacy of a neural network in regressing the body-generating parameters under different types of input representations. Concretely, we compare the responses to two types of input: point clouds with points sampled uniformly (area-wise) from each mesh, and area-based latent differences. We explore the effect of several design choices in the construction of our differences. First, we consider different topologies of the underlying Functional Map Network (FMN). These include the complete graph, but also much sparser versions based on the nearest neighbors of each shape. Second, we vary the dimension of the latent bases, which crucially affects the size of the difference matrices. We use the LBO eigenvectors with the smallest eigenvalues to express all functional maps, and the Euclidean distance between these spectra to define a distance for the construction of the nearest neighbors. Last, we train our neural networks to minimize the Mean Squared Error (MSE) between their predicted and the ground-truth shape-generating parameters. Note that since these parameters are independent of a shape’s pose, the pose variations of this dataset act as “nuisance” variables that the networks have to explain away.
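The sparse FMN topologies described above can be wired up directly from the truncated LBO spectra. A minimal sketch (function name and dense distance computation are our choices):

```python
import numpy as np

def knn_fmn_edges(spectra, k):
    """Edges of a k-nearest-neighbour FMN, using the Euclidean distance
    between truncated LBO spectra (a shape-DNA-style signature) as the
    inter-shape distance."""
    S = np.asarray(spectra, dtype=float)
    d = np.linalg.norm(S[:, None, :] - S[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)          # a shape is not its own neighbour
    nbrs = np.argsort(d, axis=1)[:, :k]
    return sorted({(min(i, int(j)), max(i, int(j)))
                   for i in range(len(S)) for j in nbrs[i]})

# Two nearby spectra and one far-away one: the graph links 0-1 and 1-2.
assert knn_fmn_edges([[0.0], [0.1], [10.0]], 1) == [(0, 1), (1, 2)]
```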
8.3.1. Comparing architectures: protocol
To select a good point-cloud (PC) based architecture we evaluate three PointNet-like networks [Qi et al., 2016] that use encoding/decoding schemes similar to those of [Achlioptas et al., 2018]. These architectures have shown excellent results in tasks involving 3D point clouds, including classification, part segmentation and generation, and provide a strong baseline. Concretely, our point-based architectures have three layers of convolutional encoders, followed by a feature-wise max-pool and either two or three layers of FC-ReLUs that act as decoders. To strengthen our comparisons, we calibrate each PC architecture to have a distinct number of training parameters and train it with several learning rates to obtain, from a pool of models, the one with the best performance (see Appendix Sec. B.1.1 for more details).
At the same time, we consider two types of architectures when the input is a latent difference: Multi-Layer Perceptrons (MLPs) and Convolutional Neural Networks (CNNs). Across all experiments, these MLPs are four layers deep and the CNNs have two layers of convolutions leading to a third FC layer (see Appendix Sec. B.2 for more details).
Figure 7 shows the MSE between the predicted vectors and the ground truth for the test shapes in a variety of conditions. The reported MSE is the average over five random data splits and weight initializations of the neural nets. The networks are trained for at most epochs, and the displayed MSE corresponds to the model (epoch) that optimized the validation split. The dashed line shows the performance of the best overall point-based architecture.
Discussion.
Figure 7 reveals several trends. First, shape difference CNNs perform better than MLPs, and both perform significantly better than point-based nets for a wide variety of configurations. Second, there seems to exist a sweet spot in the range of 35 to 45 latent basis functions, which consistently produces better results across different network topologies. Third, denser topologies give rise to better results, with the clique FMN achieving the best performance. In Table 1 we include information complementary to that of Figure 7. Its first row contains the MSE measurements for the point-based network (PC column) and some MLP/CNN configurations (clique or 20-nearest-neighbor topology with basis functions). The second row reports the generalization error (difference between test and training MSE) of each architecture. The architectures seem to overfit in a similar fashion percentage-wise, but crucially the difference-based ones do so at significantly lower values. The last two rows report the MSE and the average distance between the predictions and ground truth when we train these networks for 1,000 instead of 500 epochs.
Metric | PC | MLP-Clique | CNN-20 | CNN-Clique
MSE@500 | 0.057 | 0.033 | 0.027 | 0.009
GE@500 | 0.020 | 0.009 | 0.010 | 0.003
MSE@1K | 0.061 | 0.032 | 0.013 | 0.005
@1K | 0.192 | 0.134 | 0.086 | 0.050
8.4. Reconstruction
In the second set of deep learning experiments we demonstrate how we can reconstruct a point cloud derived from a 3D mesh based on the corresponding latent area-based difference operators. To achieve this we use a wider and deeper version of our previous CNN with the regression-optimal input: difference matrices of dimensions , based on a clique FMN. The new network is comprised of 5 layers, the first two convolutional and the remaining three FC (see Appendix Sec. B.2 for more details). The output of this network is real numbers, trained to have minimal Chamfer (pseudo-)distance from the corresponding ground-truth point clouds, which are also comprised of points (similar to [Fan et al., 2016; Achlioptas et al., 2018]).
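The Chamfer (pseudo-)distance used as the reconstruction loss can be written, for two small point clouds, as a dense numpy computation (the training code would of course use a batched GPU implementation; averaging vs. summing per direction is a convention that varies across papers):

```python
import numpy as np

def chamfer(A, B):
    """Symmetric Chamfer pseudo-distance between point clouds A (n x 3)
    and B (m x 3): mean squared distance from each point to its nearest
    neighbour in the other cloud, summed over both directions."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)  # n x m squared dists
    return d2.min(axis=1).mean() + d2.min(axis=0).mean()

# Identical clouds are at distance zero.
P = np.random.default_rng(1).standard_normal((64, 3))
assert chamfer(P, P) == 0.0
```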
Figures 8 and 10 demonstrate the quality of the learned reconstructions, along with the capacity of our representation for semantically rich shape synthesis operations, such as constructing new shapes (not present in the original shape collection) based on shape analogies. First, to visually inspect the reconstruction quality, compare the ground-truth point clouds with their corresponding reconstructions . These ground-truth point clouds belong to the test split, and their reconstructions have successfully captured both the underlying pose and the body structure. At the same time, it is also evident that some high-frequency geometric information (mostly around the hands) has not been recovered. Despite these artifacts, these results are remarkable given previous attempts at shape reconstruction from difference operators [Boscaini et al., 2015; Corman et al., 2017], which only work by combining both area and conformal differences, and only in very restricted settings under strong regularization.
We also test the generalization power of the network by synthesizing shapes and that aim to have the same pose-wise and body-wise relation to the point cloud as the relation that point cloud has to (forming an analogy). To construct , we decode (i.e., reconstruct) the neural network’s latent code corresponding to the additive formula , where is the output activation of the first FC layer when the input is shape . This is the traditional way of performing analogies with the latent codes of a deep net [Mikolov et al., 2013; Wu et al., 2016; Achlioptas et al., 2018], based on latent vector arithmetic. Alternatively, in a way that better reflects the nature of difference operators, we also reconstruct the result of the multiplicative formula , with being the difference operator of shape . Here we directly exploit the matrix nature of our representation, which enables this type of algebra. The result of this approach is . It is interesting to observe that this reconstruction results not only in less noisy point clouds compared to , but also in semantically more appropriate structures: e.g., in Fig. 8, reflects the expected sitting pose less prominently, and in Fig. 10, has more muscular arms than expected.
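The two analogy constructions above can be contrasted in a few lines. The additive form is standard latent-vector arithmetic; for the multiplicative form, since the paper's exact operator ordering is elided here, the sketch uses one plausible composition (an assumption on our part):

```python
import numpy as np

def additive_analogy(xA, xB, xC):
    """Classic latent-code arithmetic: xD = xB - xA + xC."""
    return xB - xA + xC

def multiplicative_analogy(DA, DB, DC):
    """Operator-level analogy exploiting the matrix nature of shape
    differences: apply to C the relative change taking A to B.  The
    ordering DC @ inv(DA) @ DB is one plausible convention, not
    necessarily the paper's."""
    return DC @ np.linalg.inv(DA) @ DB

# With DA the identity, the multiplicative analogy composes B and C.
D = multiplicative_analogy(np.eye(2), np.diag([2.0, 3.0]), np.diag([4.0, 5.0]))
assert np.allclose(D, np.diag([8.0, 15.0]))
```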
Partial shape analogies Moreover, we propose to construct partial shape analogies. We follow the formulation described in Section 7.1 – in parallel to , we construct for localized deformation transfer between and . We first show in Figure 9 a partial body transfer. Given the LSSDs of and , we restricted the region of interest to the upper body, and synthesized the LSSD. The reconstructed point clouds are shown in Figure 9. Note that is similar to in the lower body, while being similar to in the upper body.
In Figure 10, we show both the global shape analogies and the partial ones. are the reconstruction results of and , respectively.
Generalization with computed functional maps
Though the input LSSDs of our network are precomputed with respect to all the shapes in consideration, as mentioned at the end of Section 5.1, we can assign latent representations to new, unseen shapes without recomputing the latent basis. In particular, we generated a set of new human body shapes and, for each shape, searched for its nearest neighbor in the existing collection. We then used the kernel matching algorithm [Lähner et al., 2017] to compute an initial map between the new shape and its neighbor in the collection, allowing us to compute its latent representation, as described at the end of Section 5.1.
We show some of the reconstruction results in Figure 11. In particular, we show in Figure 11(b) a failure case. It is worth noting that although the result has a wrong pose compared to the ground truth, the body type is recovered. This is due to the fact that in this dataset, the variability across different body types is more prominent, while the number of poses is limited. The network is thus expected to put more weight on features of the former, resulting in some mismatched poses. We also emphasize that removing the need for a base shape is crucial in this case, since estimating functional maps across distant shapes is error-prone. In contrast, our formulation simplifies this matching procedure.
Latent shape interpolation We also considered another dataset for the reconstruction task – the Dynamic FAUST dataset [Bogo et al., 2017], from which we sampled shapes for training, validation and test. For computational efficiency, we computed the canonical latent basis on a subset of shapes, and propagated the basis to the remaining shapes in the same way as above, but using the ground-truth functional maps. It is worth noting that the shapes in this dataset manifest high extrinsic variability while being nearly isometric within the poses corresponding to the same character. On the other hand, our representation is purely intrinsic, making it challenging to learn features that differentiate the extrinsic changes.
Here, we demonstrate the advantage of the algebraic form of our representation. We selected a pair of shapes and from the test set, and constructed a sequence of linear interpolations between their latent representations , i.e., . The output of the network, given , presents a continuous change in the knees and upper body. In Figure 12, we show output point clouds with respect to increasing (from left to right), which depict a process of raising the right knee and leaning to the left from to .
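The interpolation itself is plain matrix arithmetic on the latent representations; each interpolant is then fed to the reconstruction network to produce the point clouds of Figure 12. A minimal sketch:

```python
import numpy as np

def interpolate_lssd(Dx, Dy, ts):
    """Linear interpolation (1 - t) * Dx + t * Dy between two latent
    shape differences, for a list of interpolation parameters ts."""
    return [(1.0 - t) * Dx + t * Dy for t in ts]

# Endpoints are recovered exactly; the midpoint is the average.
seq = interpolate_lssd(np.zeros((2, 2)), np.ones((2, 2)), [0.0, 0.5, 1.0])
assert np.allclose(seq[1], 0.5 * np.ones((2, 2)))
```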
8.5. Geometric Exploration of Shape Collections
In the following experiments we demonstrate the utility of our method for capturing cross-collection variability in shape collections, as suggested in Section 6. In particular, we demonstrate that our method can be applied to real-world data (Figure 16) beyond synthesized shapes, and can be used to compare point clouds as well as triangle meshes. We also demonstrate that our method can extract informative signals in a semi-supervised classification task (Figure 14). Finally, we demonstrate that our method is stable with respect to the input functional maps, as it produces comparable results when using computed and ground-truth functional maps (Figures 13, 16, 17 and 18).
Throughout the results below, unless stated otherwise, we used area-based LSSDs, which are represented as matrices of size in the reduced basis. To construct the FMN, we first compute distances among shapes (using the shape-DNA descriptors [Reuter et al., 2006]), and form a minimum spanning tree network using these distances. When considering two clusters, we first form a spanning tree on each, and connect shapes across clusters using nearest-neighbor search. The methods described in Section 6 optimize for functions on the latent shape, which we map to functions on the actual shapes in the collection, resulting in a consistent and informative visualization.
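The spanning tree step above is a standard MST over a dense pairwise distance matrix; a self-contained Prim's-algorithm sketch (the actual pipeline could equally use an off-the-shelf implementation):

```python
import numpy as np

def mst_edges(dist):
    """Minimum spanning tree (Prim's algorithm) over a dense symmetric
    distance matrix; returns the tree edges as (parent, child) pairs."""
    n = dist.shape[0]
    in_tree = np.zeros(n, dtype=bool)
    in_tree[0] = True
    best = dist[0].copy()            # cheapest connection of each node to the tree
    parent = np.zeros(n, dtype=int)
    edges = []
    for _ in range(n - 1):
        j = int(np.argmin(np.where(in_tree, np.inf, best)))
        edges.append((int(parent[j]), j))
        in_tree[j] = True
        closer = dist[j] < best      # did node j get some nodes closer to the tree?
        parent[closer] = j
        best = np.minimum(best, dist[j])
    return edges
```

On three collinear "shapes" at distances 0, 1, 2, the tree correctly chains 0-1-2 rather than linking the two extremes directly.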
We applied our method with computed functional maps as input as well. Unless stated otherwise, in the following we used the kernel matching algorithm [Lähner et al., 2017] for an initial pointwise map, and then converted and refined the maps using functional map techniques (see, e.g., [Ovsjanikov et al., 2012]).
Heterogeneous Shape Collection Comparison
We first demonstrate that our method can capture variability across heterogeneous shape collections, without relying on pointwise correspondences, through its use of the functional map framework. For this, in Figure 13, we show the computed distinctive functions highlighting the difference between a set of cats (each consisting of vertices) and a set of lions (each consisting of vertices), where the cross-collection maps were estimated using the original functional maps approach [Ovsjanikov et al., 2012] given a sparse set of landmarks. Note that our method correctly highlights the snouts, the four paws and the tips of the tails, distinctive to each class, despite the presence of global pose variability in the collection. Moreover, we consider a quantitative validation of the cross-collection variability detected by our method, by comparing the PCA embeddings of the LSSDs before and after projection with respect to the highlighted functions shown in Figure 13. As shown in Figure 13(b), after projection the relative distances within the same cluster remain similar while the two clusters become closer to each other.
Clustering with Visual Evidence
In Figure 14, we analyze two clusters of shapes, displayed in the top two rows, which represent different characters in two distinct poses. As shown in the first five columns, the highlighted functions capture the bending knees, which intuitively distinguish the two poses. In contrast, both the global variability across the whole collection (the second column from the right) and that within each cluster (the rightmost column) concentrate on the torso. We also used computed functional maps for detecting the cross-collection variability, and obtained comparable highlighted functions, which are shown on a subset of the shapes in Figure 15.
We also plot the PCA of the latent shape differences in the bottom of Figure 14, where the blue and red points are mixed, suggesting the dominance of the variability in body type captured in area-based shape differences.
However, with the highlighted function detected by our approach, expressed through coefficients in the latent basis, we can separate the shapes by computing and plotting the PCA of the resulting vectors. As can be seen in Figure 14 (bottom, middle) these vectors separate the two clusters much better.
Furthermore, the same procedure can be performed with partial cluster labels. Thus, given the cluster IDs of only the shapes in the red box, we first computed the optimal distinctive function with respect to this subset, and plotted the PCA of on the latent shape of the whole collection. The PCA plot suggests that this approach reveals the correct clusters in a semi-supervised way.
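The PCA embeddings used in the cluster plots above amount to flattening each (projected) latent shape difference and taking the leading principal components. A minimal sketch via SVD:

```python
import numpy as np

def pca_embed(mats, dim=2):
    """Low-dimensional PCA embedding of a list of latent shape
    differences (or projected coefficient matrices), flattened to
    vectors, as used for the cluster scatter plots."""
    X = np.stack([M.ravel() for M in mats])
    X = X - X.mean(axis=0)
    U, S, _ = np.linalg.svd(X, full_matrices=False)
    return U[:, :dim] * S[:dim]
```

For a collection whose operators vary along a single direction, all of the variance lands on the first principal component and the second coordinate vanishes.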
Practical Application in Anatomy
The problem of analyzing variability across different classes of 3D objects is well-studied in computational anatomy, where the classical approach is to first manually establish dense landmark correspondences and then compute the difference from each object to some precomputed deformable template shape. As mentioned above, our approach does not require an actual embedding of a template, which allows it to handle complex heterogeneous data.
To illustrate this, we compared two sets of bones from two subspecies of wild boars, acquired using 3D scanning techniques. In particular, as input we considered the bone scans with consistent handcrafted landmarks and sliding landmarks [Gunz and Mitteroecker, 2013] on each shape. We then estimated the FMN starting with a varying number of the handcrafted landmarks, using the functional map estimation approach proposed in [Huang and Ovsjanikov, 2017]. These functional maps were then used to compute the distinctive regions, as described in Section 6. The corresponding shapes and highlighted functions are shown in Figure 16. Remarkably, our results are stable even for a small number of landmarks, and furthermore correspond to anatomically meaningful shape parts, which in general coincide with the ones detected by the extrinsic template-based approach using all the landmarks, and agree with the functional explanation for this cross-species variability identified by domain experts.
Crosscollection Variability across Point Clouds
Our framework can also be applied across different modalities. In Figure 17, we compare the shapes in the top row with the ones in the bottom row, whose clusters correspond to two characters in different poses. The cross-collection variability should therefore capture the difference in body shapes. We used the discretization from [Huang et al., 2017] for computing eigenbases on point clouds, and, given a sparse set of ten landmarks across the 8 point clouds, we used the adjoint regularization from [Huang and Ovsjanikov, 2017] to estimate the functional maps between the shapes. For comparison, we also computed the cross-collection variability detected with the ground-truth FMN among the same shapes represented as meshes. Clearly, both highlighted regions are on the torso, reflecting the significant area change associated with the change in body type.
Facial Comparison
We end our demonstration with a comparison between two sets of human faces, which correspond to happy and sad expressions (due to lack of space, we plot only faces out of ). As shown in Figure 18, the highlighted functions computed with both ground-truth and computed functional maps detect the intuitive cross-collection difference – the chin and the cheeks.
9. Conclusions
We have presented a novel approach for representing and analyzing 3D shapes in the context of one or multiple collections. Our construction is based on a functional map network connecting the shapes and a novel analysis demonstrating that previously used latent functional spaces can both be endowed with a natural geometric structure and provide a basis for representing and comparing shapes in an unbiased way. This leads to Latent Space Shape Differences, which represent each shape in the collection as a pair of functional operators, stored in practice as small matrices. This representation has many appealing properties, including invariance to rigid motions as well as full intrinsic informativeness that permits reconstruction. We have demonstrated its use in extracting and highlighting variability of interest in a set of shapes, while suppressing other variability that we regard as nuisance (and which may in fact manifest in larger geometric deformations). We believe that this highly nuanced understanding of shape distortions and variability is important for many applications in engineering, biology, and medicine. Moreover, we showed that the matrix form of our representation makes it suitable for learning algorithms, and the use of CNNs in particular, for both regression and reconstruction.
We also note that the matrix nature of our representation makes it a different mathematical object from the usual latent codes used in machine learning, which are invariably points in high-dimensional Euclidean spaces. While point-based representations usually support a quite limited set of operations (typically, interpolations and vector-based analogies), our difference matrices reflect the internal structure of the shapes and enable, for example, localized shape analogies.
References
 Achlioptas et al. [2018] Panos Achlioptas, Olga Diamanti, Ioannis Mitliagkas, and Leonidas J Guibas. 2018. Learning Representations and Generative Models For 3D Point Clouds. Proceedings of the 35th International Conference on Machine Learning (2018).
 Allen et al. [2003] Brett Allen, Brian Curless, and Zoran Popović. 2003. The space of human body shapes: reconstruction and parameterization from range scans. In ACM transactions on graphics (TOG), Vol. 22. ACM, 587–594.
 Anguelov et al. [2005] Dragomir Anguelov, Praveen Srinivasan, Daphne Koller, Sebastian Thrun, Jim Rodgers, and James Davis. 2005. SCAPE: Shape Completion and Animation of People. In ACM Transactions on Graphics (TOG), Vol. 24. ACM, 408–416.
 Blanz and Vetter [1999] Volker Blanz and Thomas Vetter. 1999. A morphable model for the synthesis of 3D faces. In Proceedings of the 26th annual conference on Computer graphics and interactive techniques. ACM Press/Addison-Wesley Publishing Co., 187–194.
 Bogo et al. [2014] Federica Bogo, Javier Romero, Matthew Loper, and Michael J Black. 2014. FAUST: Dataset and Evaluation for 3D Mesh Registration. In Proc. CVPR. 3794–3801.

 Bogo et al. [2017] Federica Bogo, Javier Romero, Gerard Pons-Moll, and Michael J. Black. 2017. Dynamic FAUST: Registering Human Bodies in Motion. In IEEE Conf. on Computer Vision and Pattern Recognition (CVPR).
 Boscaini et al. [2015] Davide Boscaini, Davide Eynard, Drosos Kourounis, and Michael M Bronstein. 2015. Shape-from-Operator: Recovering Shapes from Intrinsic Operators. In Computer Graphics Forum, Vol. 34. Wiley Online Library, 265–274.
 Boscaini et al. [2016] Davide Boscaini, Jonathan Masci, Emanuele Rodolà, and Michael Bronstein. 2016. Learning shape correspondence with anisotropic convolutional neural networks. In Advances in Neural Information Processing Systems. 3189–3197.
 Bronstein et al. [2017] Michael M Bronstein, Joan Bruna, Yann LeCun, Arthur Szlam, and Pierre Vandergheynst. 2017. Geometric deep learning: going beyond euclidean data. IEEE Signal Processing Magazine 34, 4 (2017), 18–42.
 Chen et al. [2015] Wenzheng Chen, Huan Wang, Yangyan Li, Hao Su, Zhenhua Wang, Changhe Tu, Dani Lischinski, Daniel Cohen-Or, and Baoquan Chen. 2015. Synthesizing Training Images for Boosting Human 3D Pose Estimation. In 3D Vision (3DV).
 Corman et al. [2017] Etienne Corman, Justin Solomon, Mirela Ben-Chen, Leonidas Guibas, and Maks Ovsjanikov. 2017. Functional Characterization of Intrinsic and Extrinsic Geometry. ACM Trans. Graph. 36, 2, Article 14 (March 2017), 17 pages.
 Fan et al. [2016] Haoqiang Fan, Hao Su, and Leonidas J. Guibas. 2016. A Point Set Generation Network for 3D Object Reconstruction from a Single Image. CoRR abs/1612.00603 (2016).
 Girdhar et al. [2016] Rohit Girdhar, David F. Fouhey, Mikel Rodriguez, and Abhinav Gupta. 2016. Learning a Predictable and Generative Vector Representation for Objects. Springer International Publishing, Cham, 484–499.
 Grenander and Miller [1998] Ulf Grenander and Michael I Miller. 1998. Computational anatomy: An emerging discipline. Quarterly of applied mathematics 56, 4 (1998), 617–694.
 Gunz and Mitteroecker [2013] Philipp Gunz and Philipp Mitteroecker. 2013. Semilandmarks: a method for quantifying curves and surfaces. Hystrix, the Italian Journal of Mammalogy 24, 1 (2013), 103–109. https://doi.org/10.4404/hystrix24.16292
 Hasler et al. [2009] Nils Hasler, Carsten Stoll, Martin Sunkel, Bodo Rosenhahn, and HP Seidel. 2009. A Statistical Model of Human Pose and Body Shape. In Computer Graphics Forum, Vol. 28. 337–346.
 Huang et al. [2014] Qixing Huang, Fan Wang, and Leonidas Guibas. 2014. Functional map networks for analyzing and exploring large shape collections. ACM Transactions on Graphics (TOG) 33, 4 (2014), 36.
 Huang et al. [2017] Ruqi Huang, Frederic Chazal, and Maks Ovsjanikov. 2017. On the Stability of Functional Maps and Shape Difference Operators. 37, 1 (2017).
 Huang and Ovsjanikov [2017] Ruqi Huang and Maks Ovsjanikov. 2017. Adjoint Map Representation for Shape Analysis and Matching. In Proc. SGP, Vol. 36.
 Joshi et al. [2004] Sarang Joshi, Brad Davis, Matthieu Jomier, and Guido Gerig. 2004. Unbiased diffeomorphic atlas construction for computational anatomy. NeuroImage 23 (2004), S151–S160.
 Kendall [1989] David G Kendall. 1989. A survey of the statistical theory of shape. Statist. Sci. (1989), 87–99.
 Kim et al. [2013] Vladimir G Kim, Wilmot Li, Niloy J Mitra, Siddhartha Chaudhuri, Stephen DiVerdi, and Thomas Funkhouser. 2013. Learning partbased templates from large collections of 3D shapes. ACM Transactions on Graphics (TOG) 32, 4 (2013), 70.
 Kim et al. [2012] Vladimir G Kim, Wilmot Li, Niloy J Mitra, Stephen DiVerdi, and Thomas Funkhouser. 2012. Exploring collections of 3D models using fuzzy correspondences. ACM Transactions on Graphics (TOG) 31, 4 (2012), 54.
 Kingma and Ba [2014] Diederik P. Kingma and Jimmy Ba. 2014. Adam: A Method for Stochastic Optimization. CoRR abs/1412.6980 (2014). arXiv:1412.6980
 Kleiman et al. [2015] Yanir Kleiman, Oliver van Kaick, Olga Sorkine-Hornung, and Daniel Cohen-Or. 2015. SHED: shape edit distance for fine-grained shape similarity. ACM Transactions on Graphics (TOG) 34, 6 (2015), 235.
 Kovnatsky et al. [2013] Artiom Kovnatsky, Michael M Bronstein, Alexander M Bronstein, Klaus Glashoff, and Ron Kimmel. 2013. Coupled quasiharmonic bases. In Computer Graphics Forum, Vol. 32. Wiley Online Library, 439–448.
 Lähner et al. [2017] Z. Lähner, M. Vestner, A. Boyarski, O. Litany, R. Slossberg, T. Remez, E. Rodolà, A. M. Bronstein, M. M. Bronstein, R. Kimmel, and D. Cremers. 2017. Efficient Deformable Shape Correspondence via Kernel Matching. arXiv preprint 1707.08991 (2017).
 Li et al. [2017] Jun Li, Kai Xu, Siddhartha Chaudhuri, Ersin Yumer, Hao Zhang, and Leonidas J. Guibas. 2017. GRASS: Generative Recursive Autoencoders for Shape Structures. CoRR abs/1705.02090 (2017).
 Maron et al. [2017] Haggai Maron, Meirav Galun, Noam Aigerman, Miri Trope, Nadav Dym, Ersin Yumer, Vladimir G Kim, and Yaron Lipman. 2017. Convolutional Neural Networks on Surfaces via Seamless Toric Covers.
 Masci et al. [2015] Jonathan Masci, Davide Boscaini, Michael Bronstein, and Pierre Vandergheynst. 2015. Geodesic convolutional neural networks on riemannian manifolds. In Proc. ICCV workshops. 37–45.
 Maturana and Scherer [2015] Daniel Maturana and Sebastian Scherer. 2015. Voxnet: A 3d convolutional neural network for realtime object recognition. In Intelligent Robots and Systems (IROS), 2015 IEEE/RSJ International Conference on. IEEE, 922–928.
 Meyer et al. [2003] Mark Meyer, Mathieu Desbrun, Peter Schröder, and Alan H Barr. 2003. Discrete differentialgeometry operators for triangulated 2manifolds. In Visualization and mathematics III. Springer, 35–57.
 Mikolov et al. [2013] Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg Corrado, and Jeffrey Dean. 2013. Distributed Representations of Words and Phrases and their Compositionality. CoRR abs/1310.4546 (2013). arXiv:1310.4546
 Ovsjanikov et al. [2013] M. Ovsjanikov, M. BenChen, F. Chazal, and L. Guibas. 2013. Analysis and visualization of maps between shapes. Computer Graphics Forum 32, 6 (2013), 135–145.
 Ovsjanikov et al. [2012] Maks Ovsjanikov, Mirela BenChen, Justin Solomon, Adrian Butscher, and Leonidas Guibas. 2012. Functional Maps: A Flexible Representation of Maps Between Shapes. ACM Transactions on Graphics (TOG) 31, 4 (2012), 30.
 Ovsjanikov et al. [2017] Maks Ovsjanikov, Etienne Corman, Michael Bronstein, Emanuele Rodolà, Mirela BenChen, Leonidas Guibas, Frederic Chazal, and Alex Bronstein. 2017. Computing and processing correspondences with functional maps. In ACM SIGGRAPH 2017 Courses. ACM, 5.
 Ovsjanikov et al. [2011] Maks Ovsjanikov, Wilmot Li, Leonidas Guibas, and Niloy J Mitra. 2011. Exploration of continuous variability in collections of 3D shapes. ACM Transactions on Graphics (TOG) 30, 4 (2011), 33.
 Qi et al. [2016] Charles Ruizhongtai Qi, Hao Su, Kaichun Mo, and Leonidas J. Guibas. 2016. PointNet: deep learning on point sets for 3D classification and segmentation. CoRR abs/1612.00593 (2016).
 Reuter et al. [2006] Martin Reuter, FranzErich Wolter, and Niklas Peinecke. 2006. LaplaceBeltrami Spectra As ’ShapeDNA’ of Surfaces and Solids. Comput. Aided Des. 38, 4 (April 2006), 342–366.
 Rustamov et al. [2013] Raif M. Rustamov, Maks Ovsjanikov, Omri Azencot, Mirela BenChen, Frédéric Chazal, and Leonidas Guibas. 2013. Mapbased exploration of intrinsic shape differences and variability. ACM Transactions on Graphics 32, 4 (2013), 1.
 Sinha et al. [2016] Ayan Sinha, Jing Bai, and Karthik Ramani. 2016. Deep learning 3d shape surfaces using geometry images. In European Conference on Computer Vision. Springer, 223–240.
 Solomon et al. [2012] Justin Solomon, Andy Nguyen, Adrian Butscher, Mirela BenChen, and Leonidas Guibas. 2012. Soft maps between surfaces. In Computer Graphics Forum, Vol. 31. Wiley Online Library, 1617–1626.
 Su et al. [2015] Hang Su, Subhransu Maji, Evangelos Kalogerakis, and Erik LearnedMiller. 2015. Multiview Convolutional Neural Networks for 3D Shape Recognition. In Proceedings of the 2015 IEEE International Conference on Computer Vision (ICCV) (ICCV ’15).
 Thompson et al. [1942] Darcy Wentworth Thompson et al. 1942. On growth and form. On growth and form. (1942).
 Tong et al. [2012] Jing Tong, Jin Zhou, Ligang Liu, Zhigeng Pan, and Hao Yan. 2012. Scanning 3d full human bodies using kinects. IEEE transactions on visualization and computer graphics 18, 4 (2012), 643–650.
 Wand et al. [2009] Michael Wand, Bart Adams, Maksim Ovsjanikov, Alexander Berner, Martin Bokeloh, Philipp Jenke, Leonidas Guibas, HansPeter Seidel, and Andreas Schilling. 2009. Efficient reconstruction of nonrigid shape and motion from realtime 3D scanner data. ACM Transactions on Graphics (TOG) 28, 2 (2009), 15.
 Wand et al. [2007] Michael Wand, Philipp Jenke, QiXing Huang, Martin Bokeloh, Leonidas Guibas, and Andreas Schilling. 2007. Reconstruction of Deforming Geometry from TimeVarying Point Clouds. In Proc. SGP. 49–58.
 Wang et al. [2013] Fan Wang, Qixing Huang, and Leonidas J. Guibas. 2013. Image cosegmentation via consistent functional maps. In Proceedings of the IEEE International Conference on Computer Vision. 849–856.
 Wang et al. [2014] Fan Wang, Qixing Huang, Maks Ovsjanikov, and Leonidas J Guibas. 2014. Unsupervised multiclass joint image segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 3142–3149.
 Wang and Singer [2013] Lanhui Wang and Amit Singer. 2013. Exact and stable recovery of rotations for robust synchronization. Information and Inference: A Journal of the IMA 2, 2 (2013), 145–193.
 Wang et al. [2017] PengShuai Wang, Yang Liu, YuXiao Guo, ChunYu Sun, and Xin Tong. 2017. Ocnn: Octreebased convolutional neural networks for 3d shape analysis. ACM Transactions on Graphics (TOG) 36, 4 (2017), 72.
 Wang et al. [2012] Yunhai Wang, Shmulik Asafi, Oliver van Kaick, Hao Zhang, Daniel CohenOr, and Baoquan Chen. 2012. Active coanalysis of a set of shapes. ACM Transactions on Graphics (TOG) 31, 6 (2012), 165.
 Wu et al. [2016] Jiajun Wu, Chengkai Zhang, Tianfan Xue, Bill Freeman, and Josh Tenenbaum. 2016. Learning a Probabilistic Latent Space of Object Shapes via 3D GenerativeAdversarial Modeling. In Proc. NIPS. 82–90.
 Younes [2010] Laurent Younes. 2010. Shapes and diffeomorphisms. Vol. 171. Springer Science & Business Media.
 Zeng et al. [2012] Wei Zeng, Ren Guo, Feng Luo, and Xianfeng Gu. 2012. Discrete heat kernel determines discrete Riemannian metric. Graphical Models 74, 4 (2012), 121–129.
Appendix A Technical Details
Proof of Theorem 5.1
Proof.
First note that is well-defined since, by consistency, . The regularization constraint therefore implies .
Now let be a diagonal matrix (it implicitly corresponds to the eigenvalues of the latent shape). Note that is a non-negative diagonal matrix and thus admits an eigendecomposition, and we let . Direct computation yields that , and . Thus it follows from that . On the other hand, it is easy to verify that the eigenfunctions of satisfy the consistency constraint and the normalization; therefore they are equivalent. ∎
Proof of Lemma 6.1
Proof.
We first prove that:
It is easy to verify that , since . In other words, is a projection operator, and then so is . For simplicity, in the following we denote and by , respectively. Obviously, both are symmetric matrices, and . Then the above equivalence can be rewritten as
which amounts to .
Finally, the equivalence follows from
Then, the difference is equal to
∎
Connection between our Method and the framework of [Huang and Ovsjanikov, 2017]
Our formulation constructs a linear combination of terms , where , and then computes the eigenvectors associated with the largest eigenvalues of it. In essence, the distortion energy constructed in [Huang and Ovsjanikov, 2017] is similarly composed of a set of terms of the form , where is the adjoint functional map from to , and is the latent basis on .
Our main observation is that, in the case of area-based operators, under the same condition as in Theorem 5.1, . A consequence of the above argument is that, when the spectra of and have no repeated eigenvalues, their eigenvectors are identical.
We provide a proof sketch of this claim. Following the proof of Theorem 5.1, we have , where is the full eigenbasis on , and is the eigenbasis of the average/latent shape. Then we have , which implies , where is the measure of the average shape. In the adjoint case, we similarly have . Therefore it is easy to verify the commutativity between and .
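The final step of this argument — two commuting symmetric operators with simple spectra share their eigenvectors — can be checked numerically. Below is a minimal sketch; the matrices are synthetic stand-ins built from a shared random eigenbasis, not the actual difference operators:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 6

# Two symmetric matrices that commute by construction: they share a random
# orthonormal eigenbasis Q but have distinct, non-repeating spectra.
Q, _ = np.linalg.qr(rng.standard_normal((n, n)))
A = Q @ np.diag(np.arange(1.0, n + 1)) @ Q.T           # eigenvalues 1..6
B = Q @ np.diag(np.arange(2.0, 2 * n + 1, 2.0)) @ Q.T  # eigenvalues 2..12

assert np.allclose(A @ B, B @ A)  # commutativity

# With no repeated eigenvalues each eigenvector is unique up to sign,
# so after sign alignment the two eigenbases coincide.
_, VA = np.linalg.eigh(A)
_, VB = np.linalg.eigh(B)
signs = np.sign(np.sum(VA * VB, axis=0))
assert np.allclose(VA, VB * signs)
print("shared eigenbasis confirmed")
```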
Appendix B Neural Network Details
B.1. Regression
B.1.1. Point-Cloud Architectures
We used three configurations for the point-based architectures. In a spirit similar to [Qi et al., 2016], we implemented all (3-layer deep) encoders as 1D convolutions with filter size 1, i.e., treating each point independently. The output of the last encoding layer was passed through a feature-wise max-pool, which was further processed by an FC-ReLU decoder. Table 2 shows the exact number of parameters (columns) in each consecutive layer for the three configurations (rows).

Version  Encoder (# filters)  Decoder (# Neurons)
A  {32, 64, 64}  {64, 12}
B  {64, 128, 128}  {64, 12}
C  {64, 128, 128}  {64, 128, 12}
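The shared per-point layers above are equivalent to a small MLP applied to every point with the same weights, followed by a global max-pool. Here is a minimal numpy sketch using the encoder widths of configuration A; the function name and random (untrained) weights are illustrative only:

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def point_encoder(points, weights):
    """Shared per-point MLP (1D convolutions with filter size 1),
    followed by a feature-wise max-pool over all points."""
    h = points                       # (n_points, 3)
    for W, b in weights:
        h = relu(h @ W + b)          # same weights applied to every point
    return h.max(axis=0)             # global feature vector

# Configuration A: encoder filter counts {32, 64, 64}.
rng = np.random.default_rng(0)
weights, d_in = [], 3
for d_out in [32, 64, 64]:
    weights.append((0.1 * rng.standard_normal((d_in, d_out)),
                    np.zeros(d_out)))
    d_in = d_out

pts = rng.standard_normal((1024, 3))
feature = point_encoder(pts, weights)
print(feature.shape)  # (64,) -- fed to the FC-ReLU decoder
```

Because the max-pool is taken over the point axis, the encoding is invariant to the ordering of the input points, which is the key property inherited from [Qi et al., 2016].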
We trained each of these architectures with learning rates in {0.001, 0.002, 0.005, 0.007, 0.01}; the best-performing rate was used in the regression experiments.
B.1.2. MLPs
We used FC-ReLU MLPs whose last 3 layers had {50, 100, 12} neurons, respectively. The number of neurons in the first layer was calibrated according to the size of the input difference matrix; Table 3 shows this correspondence.

# Latent bases  5  10  20  30  40  50
# Neurons  369  185  62  29  17  11
B.1.3. CNNs
The encoding part of our CNNs comprised two convolutional layers, followed by a single FC-ReLU layer. See Table 4 and Table 5 for the parameters of the convolutional layers for the two sizes of input difference matrices, respectively.
Layer  # Filters  Kernel size  Stride
First  10  (2, 2)  1
Second  10  (4, 4)  2
Layer  # Filters  Kernel size  Stride
First  10  (3, 3)  2
Second  10  (4, 4)  2
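With these kernels and strides (and assuming no padding, which the tables do not specify), the spatial size after each layer follows the standard convolution formula; a small sketch with a hypothetical 30x30 input matrix:

```python
def conv_out(size, kernel, stride, padding=0):
    """Spatial output size of a convolution: floor((size + 2p - k) / s) + 1."""
    return (size + 2 * padding - kernel) // stride + 1

# Tracing a hypothetical 30x30 difference matrix through the Table 5 layers.
s = conv_out(30, kernel=3, stride=2)  # first layer  -> 14
s = conv_out(s, kernel=4, stride=2)   # second layer -> 6
print(s)  # 6
```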
B.2. Reconstruction Architecture
The architecture we used here is inspired by the CNN used for regression. Again, the convolutional part has two encoding layers (see Table 6 for parameters). The decoder is an MLP implemented with FC-ReLU layers.
Layer  # Filters  Kernel size  Stride
First  20  (3, 3)  2
Second  20  (6, 6)  2
B.3. Training details
For training we used stochastic gradient descent with Adam [Kingma and Ba, 2014] and a fixed batch size throughout all experiments. Moreover, we normalized the difference matrices by subtracting their average with respect to the training split. For the regression task, the networks operating on difference matrices were trained with a fixed learning rate. In the reconstruction experiments we trained the CNN architecture for 850 epochs.
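The two ingredients above — mean-centering the difference matrices on the training split and Adam updates — can be sketched as follows. This is a toy illustration with synthetic data and a hand-coded Adam step (hyperparameter defaults follow [Kingma and Ba, 2014]), not the actual training code:

```python
import numpy as np

def adam_step(w, g, m, v, t, lr=0.001, b1=0.9, b2=0.999, eps=1e-8):
    """One Adam update with bias-corrected first and second moments."""
    m = b1 * m + (1 - b1) * g
    v = b2 * v + (1 - b2) * g * g
    m_hat = m / (1 - b1 ** t)
    v_hat = v / (1 - b2 ** t)
    return w - lr * m_hat / (np.sqrt(v_hat) + eps), m, v

# Normalization: subtract the average difference matrix computed on the
# training split only (synthetic 20x20 matrices stand in for the real data).
rng = np.random.default_rng(0)
train = rng.standard_normal((100, 20, 20))
mean = train.mean(axis=0)            # reused as-is at test time
train_centered = train - mean

# A single Adam step on a scalar weight with gradient -4 and lr = 0.1:
# the bias-corrected first step moves w by almost exactly lr.
w, m, v = adam_step(np.array([1.0]), np.array([-4.0]),
                    np.zeros(1), np.zeros(1), t=1, lr=0.1)
print(w)  # [1.1]
```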