OperatorNet: Recovering 3D Shapes From Difference Operators

04/24/2019 ∙ by Ruqi Huang, et al. ∙ Stanford University

This paper proposes a learning-based framework for reconstructing 3D shapes from functional operators, compactly encoded as small-sized matrices. To this end we introduce a novel neural architecture, called OperatorNet, which takes as input a set of linear operators representing a shape and produces its 3D embedding. We demonstrate that this approach significantly outperforms previous purely geometric methods for the same problem. Furthermore, we introduce a novel functional operator, which encodes the extrinsic or pose-dependent shape information, and thus complements purely intrinsic pose-oblivious operators, such as the classical Laplacian. Coupled with this novel operator, our reconstruction network achieves very high reconstruction accuracy, even in the presence of incomplete information about a shape, given a soft or functional map expressed in a reduced basis. Finally, we demonstrate that the multiplicative functional algebra enjoyed by these operators can be used to synthesize entirely new unseen shapes, in the context of shape interpolation and shape analogy applications.


1 Introduction

Encoding and reconstructing 3D shapes is a fundamental problem in computer graphics, computer vision and related fields. Unlike images, which enjoy a canonical representation, 3D shapes are encoded through a large variety of representations, such as point clouds, triangle meshes and volumetric data, to name a few. Perhaps even more importantly, 3D shapes may undergo a diverse set of transformations, ranging from rigid motions to complex non-rigid and articulated deformations, that impact these representations.

The representation issues have become even more prominent in recent years with the advent of learning-based techniques, leading to a number of solutions for learning directly on geometric 3D data [9]. This is challenging, as point clouds and meshes lack the regular grid structure exploited by convolutional architectures. In particular, finding representations that are well adapted for both shape analysis and, especially, shape synthesis remains difficult. For example, several methods for shape interpolation have been proposed by designing deep neural networks, including auto-encoder architectures, and interpolating the latent vectors learned by such networks [34, 1]. Unfortunately, it is not clear whether the latent vectors lie in a linear vector space, and thus linear interpolation can lead to unrealistic intermediate shapes.

Figure 1:

Shape interpolation via OperatorNet (top) and PointNet autoencoder (bottom). Our interpolations are smoother and less distorted.

In this paper, we show that 3D shapes can not only be compactly encoded as linear functional operators, using the previously proposed shape difference operators [31], but that this representation lends itself very naturally to learning, and allows us to recover the 3D shape information, using a novel neural network architecture which we call OperatorNet. Our key observations are twofold: first, we show that since shape difference operators can be stored as canonical matrices, for a given choice of basis, they enable the use of a convolutional neural network architecture for shape recovery. Second, we demonstrate that the functional algebra that is naturally available on these operators can be used to synthesize new shapes, in the context of shape interpolation and shape analogy applications. We argue that because this algebra is well-justified theoretically, it also leads to more accurate results in practice, compared to the commonly used linear interpolation in the latent space (see Figure 1).

The shape difference operators introduced in [31] have proved to be a powerful tool in shape analysis, by allowing one to characterize each shape in a collection as the 'difference' to some fixed base shape. These difference operators, in particular, encode precise information about how and where each shape differs from the base, but also, due to their compact representation as small matrices, enable efficient exploration of global variability within the collection. Inspired by the former perspective, purely geometric approaches [7, 12] have been proposed for shape reconstruction from shape differences. Though theoretically well-justified, these approaches rely on solving difficult non-linear optimization problems and require strong regularization for accurate results, especially when truncated bases are used.

Our OperatorNet, on the other hand, aims to leverage the information encoded at both the pairwise level and the collection level, by using the shape collection to guide the reconstruction. It is well-known that related shapes in a collection often concentrate near a low-dimensional manifold in shape space [32, 19]. In light of this, the shape difference operators can help both to encode the geometry of individual shapes and to learn the constrained space of realistic shapes, which is typically ignored by purely geometric approaches.

In addition to demonstrating the representative power of the shape differences in a learning framework, we also extend the original formulation in [31], which only involves intrinsic (i.e., invariant to isometric transformations) shape differences, with a novel extrinsic difference operator that facilitates pose-dependent embedding recovery. Our formulation is both simpler and more robust compared to previous approaches, e.g. [12], and, as we show below, can more naturally be integrated in a unified learning framework.

To summarize, our contributions are as follows:

  • We propose a learning-based pipeline to reconstruct 3D shapes from a set of difference operators.

  • We propose a novel formulation of extrinsic shape difference, which complements the intrinsic operators formulated in [31].

  • We demonstrate that by applying algebraic operations on shape differences, we can synthesize new operators and thus new shapes via OperatorNet, enabling shape manipulations such as interpolation and analogy.

2 Related Work

Shape Reconstruction

Our work is closely related to shape reconstruction from intrinsic operators, which was recently considered in [7, 12]. In these works, several advanced and purely geometric optimization techniques were proposed that give satisfactory results in the presence of full information [7] or under strong (extrinsic) regularization [12]. These works have also laid the theoretical foundation for shape recovery by demonstrating that shape difference operators, in principle, contain complete information necessary for recovering the shape embedding (e.g. Propositions 2 and 4 in [12]). On the other hand, these methods also highlight the practical challenges of this approach, especially when reconstructing a shape without any knowledge of the collection or "shape space" it belongs to. In contrast, we show that when using shape difference representations in a learning-based framework, realistic 3D shapes can be recovered efficiently, and moreover that entirely new shapes can be synthesized using the algebraic structure of difference operators, e.g. for shape interpolation and analogy.

Shape Representations for Learning

Our work is also related to the recent techniques aimed at applying deep learning methods to shape analysis. One of the main challenges is defining a meaningful notion of convolution, while ensuring invariance to basic transformations, such as rotations and translations. Several techniques have been proposed based on, e.g., geometry images [33], volumetric [22, 36], point-based [27] and multi-view approaches [28], as well as, very recently, intrinsic techniques that adapt convolution to curved surfaces [21, 8] (see also [9] for an overview), and even via toric covers [20], among many others.

Despite this tremendous progress in the last few years, defining a shape representation that is compact, lends itself naturally to learning, is invariant to the desired class of transformations (e.g. rigid motions), and is not limited to a particular topology remains a challenge. As we show below, our representation is well-suited for learning applications, and especially for encoding and recovering geometric structure information.

Shape Space

Exploring the structure of shape spaces has been an attractive research topic for a long time. Classical PCA-based models, e.g. [2, 14], more recent shape space models adapted to specific shape classes such as humans or animals [19, 37], and parametric model collections [32] all typically leverage the fact that the space of "realistic" shapes is significantly smaller than the space of all possible embeddings. This has also recently been exploited in the context of learning-based shape synthesis for shape completion [17], interpolation [4] and point cloud reconstruction [1], among others. These techniques heavily leverage the recent proliferation of large shape collections such as DFAUST [6] and ShapeNet [10], to name a few. At the same time, it is not clear whether, for example, the commonly used linear interpolation of latent vectors is well-justified, which can lead to unrealistic synthesized shapes. Instead, the shape difference operators that we use satisfy a well-founded multiplicative algebra, which, as we show, naturally yields more realistic synthetic interpolations.

3 Preliminaries and Notations

Discretization of Shapes

Throughout this paper, we assume that a shape is given as a triangle mesh $\mathcal{M} = (\mathcal{V}, \mathcal{F})$, where $\mathcal{V}$ is the vertex set and $\mathcal{F}$ is the set of triangles encoding the connectivity of the vertices.

Laplace-Beltrami Operator

We associate with each shape a discretized Laplace-Beltrami operator $L = A^{-1}W$, using the cotangent weight scheme from [23, 26], where $W$ is the cotangent weight (stiffness) matrix, and $A$ is the diagonal lumped area (mass) matrix. Furthermore, we denote by $\Lambda$ the diagonal matrix containing the $k$ smallest eigenvalues of $L$, and by $\Phi$ the matrix whose columns are the corresponding eigenvectors, such that $W\Phi = A\Phi\Lambda$. In particular, the eigenvalues stored in $\Lambda$ are non-negative and can be ordered as $0 = \lambda_1 \leq \lambda_2 \leq \ldots \leq \lambda_k$. The columns of $\Phi$ are sorted accordingly, and are orthonormal with respect to the area matrix, i.e. $\Phi^\top A \Phi = I_k$, the identity matrix. It is well-known that the Laplace-Beltrami eigenbasis provides a multi-scale representation of a shape [16], and allows one to approximate the space of functions on the shape via the subspace spanned by the first few eigenvectors of $L$.
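To make this construction concrete, here is a minimal NumPy/SciPy sketch of the discretization (a barycentric lumped mass matrix and a shift-invert sparse eigensolver; the helper names are ours, and details such as boundary handling are omitted):

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import eigsh

def cotangent_laplacian(V, F):
    """Assemble the cotangent stiffness matrix W and lumped mass matrix A.

    V: (n, 3) vertex positions, F: (m, 3) triangle indices.
    Simple Python loop; fine for the ~1000-vertex meshes used here.
    """
    n = V.shape[0]
    I, J, S = [], [], []
    areas = np.zeros(n)
    for tri in F:
        i, j, k = tri
        # Edge vectors of the triangle.
        e_i, e_j, e_k = V[k] - V[j], V[i] - V[k], V[j] - V[i]
        area = 0.5 * np.linalg.norm(np.cross(e_j, e_k))
        # Cotangent of the angle at each vertex (opposite the corresponding edge).
        cot_i = np.dot(-e_j, e_k) / (2.0 * area)
        cot_j = np.dot(-e_k, e_i) / (2.0 * area)
        cot_k = np.dot(-e_i, e_j) / (2.0 * area)
        for (a, b, c) in [(j, k, cot_i), (k, i, cot_j), (i, j, cot_k)]:
            I += [a, b, a, b]
            J += [b, a, a, b]
            S += [-0.5 * c, -0.5 * c, 0.5 * c, 0.5 * c]
        # Barycentric lumped vertex areas.
        areas[[i, j, k]] += area / 3.0
    W = sp.csr_matrix((S, (I, J)), shape=(n, n))  # duplicates are summed
    A = sp.diags(areas)
    return W, A

def lb_eigenbasis(V, F, k=60):
    """First k eigenpairs of the generalized problem W phi = lambda A phi."""
    W, A = cotangent_laplacian(V, F)
    # A small negative shift-invert target yields the smallest eigenvalues.
    lam, Phi = eigsh(W, k=k, M=A, sigma=-1e-8)
    return lam, Phi  # Phi is (approximately) A-orthonormal: Phi.T @ A @ Phi ~ I
```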

Functional Maps

The functional map framework was introduced in [24] primarily as an alternative representation of maps across shapes. In our context, given two shapes $S_1$, $S_2$ and a point-wise map $T$ from $S_2$ to $S_1$, we can express the functional map $C_{12}$ from $S_1$ to $S_2$ as follows:

$C_{12} = \Phi_2^\top A_2\, \Pi\, \Phi_1.$    (1)

Here, $A_2$ is the area matrix of $S_2$, and $\Pi$ is a binary matrix satisfying $\Pi(p, q) = 1$ if $T(p) = q$ and $\Pi(p, q) = 0$ otherwise. Note that $C_{12}$ is a $k_2 \times k_1$ matrix, where $k_1$ and $k_2$ are the numbers of basis functions chosen on $S_1$ and $S_2$ respectively. This matrix allows us to transport functions as follows: if $f$ is a function on $S_1$ expressed as a vector of coefficients $\mathbf{a}$, s.t. $f = \Phi_1 \mathbf{a}$, then $C_{12}\mathbf{a}$ is the vector of coefficients of the corresponding function on $S_2$, expressed in the basis $\Phi_2$.

In general, not every functional map matrix arises from a point-wise map, and it might encode, for example, soft correspondences, which map a point to a probability density function. All of the tools that we develop below can accommodate such general maps. This is a key advantage of our approach, as it does not rely on all shapes having the same number of points, and only requires the knowledge of functional map matrices, which can be computed using existing techniques [25, 18].
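As a small illustration of Eq. (1), the following NumPy sketch builds the functional map induced by a vertex-wise map and transports a function between the shapes (function and variable names are ours):

```python
import numpy as np

def functional_map_from_p2p(Phi1, Phi2, A2, T21):
    """Functional map C12 from S1 to S2 induced by a point-wise map T: S2 -> S1.

    Phi1: (n1, k1), Phi2: (n2, k2) truncated eigenbases.
    A2:   (n2,) lumped vertex areas of S2.
    T21:  (n2,) integer array, T21[p] = index on S1 of the image of vertex p.
    """
    pulled_back = Phi1[T21, :]                    # Pi @ Phi1, shape (n2, k1)
    return Phi2.T @ (A2[:, None] * pulled_back)   # (k2, k1)

# Transporting a function f defined on S1 to S2:
#   a  = Phi1.T @ (A1[:, None] * f[:, None])   # coefficients of f in Phi1
#   f2 = Phi2 @ (C12 @ a)                      # corresponding function on S2
```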

Intrinsic Shape Difference Operators

Finally, to represent shapes themselves, we use the notion of shape difference operators proposed in [31]. Within our setting, they can be summarized as follows: given a base shape $S_0$, an arbitrary shape $S_1$ and a functional map $C_{01}$ between them, let $M_0$ (resp. $M_1$) be a positive semi-definite matrix, which defines some inner product for functions on $S_0$ (resp. $S_1$) expressed in the corresponding bases. Thus, for a pair of functions on $S_0$ expressed as vectors of coefficients $\mathbf{a}, \mathbf{b}$, we have $\langle \mathbf{a}, \mathbf{b} \rangle_{M_0} = \mathbf{a}^\top M_0 \mathbf{b}$, and similarly on $S_1$.

Note that these two inner products are not directly comparable, since they are expressed in different bases. Fortunately, the functional map plays the role of a basis synchronizer. Thus, a shape difference operator, which captures the difference between $S_0$ and $S_1$, is given simply as:

$D = M_0^{+}\, C_{01}^\top M_1\, C_{01},$    (2)

where $M_0^{+}$ is the Moore-Penrose pseudo-inverse of $M_0$.

The original work [31] considered two intrinsic inner products, which, using the notation above, correspond to $M = \Phi^\top A \Phi = I$ (the area-based inner product) and $M = \Phi^\top W \Phi = \Lambda$ (the conformal, or Dirichlet, inner product). These inner products, in turn, lead to the following shape difference operators:

Area-based: $D_A = C_{01}^\top C_{01}.$    (3)
Conformal: $D_C = \Lambda_0^{+}\, C_{01}^\top \Lambda_1\, C_{01}.$    (4)
These shape difference operators have several key properties. First, they allow us to represent an arbitrary shape $S_1$ as a pair of $k_0 \times k_0$ matrices, independently of the number of points, requiring only a functional map between the base shape $S_0$ and $S_1$. Thus, the size of this representation can be controlled by choosing an appropriate value of $k_0$, which allows us to gain multi-scale information about the geometry of $S_1$, from the point of view of $S_0$. Second, and perhaps more importantly, these matrices are invariant to rigid (and indeed to any intrinsic isometry) transformations of $S_0$ or $S_1$. Finally, previous works [12] have shown that shape differences in principle contain complete information about the intrinsic geometry of a shape. As we show below, these properties naturally enable the use of shape differences in learning applications for shape recovery.
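In the notation above, the two intrinsic operators of Eqs. (3) and (4) amount to a few lines of NumPy, given the functional map and the Laplace-Beltrami eigenvalues of the two shapes (a sketch, assuming the reconstructed forms of the equations above):

```python
import numpy as np

def area_shape_difference(C01):
    """Area-based shape difference D_A = C01^T C01 (Eq. 3)."""
    return C01.T @ C01

def conformal_shape_difference(C01, evals0, evals1):
    """Conformal shape difference D_C = Lambda0^+ C01^T Lambda1 C01 (Eq. 4).

    evals0, evals1: Laplace-Beltrami eigenvalues of the base and target shapes.
    """
    inv_evals0 = np.zeros_like(evals0)
    nonzero = evals0 > 1e-12
    inv_evals0[nonzero] = 1.0 / evals0[nonzero]   # pseudo-inverse of Lambda0
    return inv_evals0[:, None] * (C01.T @ (evals1[:, None] * C01))
```

Both operators are $k_0 \times k_0$ matrices expressed in the basis of the base shape, which is what makes them convenient input channels for a convolutional network.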

Functoriality of Shape Differences

Another useful property of the shape difference operators is functoriality, shown in [31], which we exploit in our shape synthesis applications in Section 7. Given the shape differences $D_1$ and $D_2$ of shapes $S_1$ and $S_2$ with respect to a base shape $S_0$, functoriality allows us to compute the difference $D_{12}$ between $S_1$ and $S_2$, without a functional map between $S_1$ and $S_2$. Namely (see Prop. 4.2.4 in [11]):

$D_{12} = C_{01}\, D_1^{-1} D_2\, C_{01}^{-1}.$    (5)

Intuitively, this means that shape differences naturally satisfy the multiplicative algebra $D_2 = D_1 D_{12}$, up to a change of basis ensured by $C_{01}$.

This property can be used for shape analogies: given shapes $S_A$, $S_B$ and $S_C$, to find $S_D$ such that $S_D$ relates to $S_C$ in the same way as $S_B$ relates to $S_A$ (see the illustration in Figure 2), which can be solved by looking for a shape $S_D$ whose shape differences satisfy $D_{S_C}^{-1} D_{S_D} = D_{S_A}^{-1} D_{S_B}$. In our application, we first create the appropriate difference operator and then use our network to synthesize the corresponding shape.

Figure 2: Illustration of shape analogy.

Finally, the multiplicative property also suggests a way of interpolating in the space of shape differences. Namely, rather than using basic linear interpolation between two operators $D_1$ and $D_2$, we interpolate on the Lie algebra of the Lie group of shape differences, using the matrix exponential map and its inverse, which leads to:

$D(t) = \exp\big((1 - t)\log(D_1) + t\log(D_2)\big), \quad t \in [0, 1].$    (6)

Here $\exp$ and $\log$ are the matrix exponential and logarithm respectively. Note that, around the identity, the linearization provided by the Lie algebra is exact, and we have observed it to produce very accurate results in general.
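A minimal SciPy sketch of this Lie-algebra interpolation scheme, following the exp/log form of Eq. (6) given above:

```python
import numpy as np
from scipy.linalg import expm, logm

def interpolate_shape_difference(D1, D2, t):
    """Interpolate two shape difference operators on the matrix Lie group.

    Returns D1 at t=0 and D2 at t=1; intermediate values follow Eq. (6)
    rather than the naive linear blend (1 - t) * D1 + t * D2.
    """
    L1 = logm(D1)
    L2 = logm(D2)
    D_t = expm((1.0 - t) * L1 + t * L2)
    return np.real(D_t)  # discard tiny imaginary parts from numerical logm
```

In practice we evaluate this for a sequence of values of $t$ and feed each interpolated operator to OperatorNet to synthesize the in-between shapes.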

4 Extrinsic Shape Difference

In theory, with purely intrinsic information one can at best determine a shape up to isometric transformations, which in our discrete setting means capturing the edge lengths of the mesh. Recovering a shape from its edge lengths, while possible in certain simple scenarios, nevertheless often leads to ambiguities, as highlighted in [12]. To alleviate such ambiguities, we propose to augment the existing purely intrinsic shape differences with a novel, compatible shape difference operator that gives rise to a more complete characterization of shapes, and in turn boosts our reconstruction accuracy.

One basic approach that has been used to combine extrinsic, or embedding-dependent, information with the multi-scale Laplace-Beltrami basis is simply to project the 3D coordinates, as functions, onto the basis, obtaining three vectors of coefficients (one per coordinate): $P = \Phi^\top A X$, where $X$ is the $n \times 3$ matrix of vertex coordinates [16, 15]. Unfortunately, representing a shape through $P$, while also multi-scale and compact, has several limitations. First, this representation is not rotationally invariant, and second, it does not provide information about the intrinsic geometry, so that interpolation of coordinate vectors can easily lead to loss of shape area, for example.

Another option, which is more compatible with the shape difference representation and is rotationally invariant, is to encode the inner products of coordinate functions on each shape via the Gram matrix $XX^\top$, where $X$ is again the matrix of coordinate functions. Expressing this inner product in the corresponding reduced basis and using Eq. (2) gives rise to a shape difference-like representation of the shape coordinates. Indeed, the following theorem (see proof in Appendix A) guarantees that the resulting representation contains the same information, up to rigid motions, as simply projecting the coordinates onto the basis.

Theorem 1.

Let $G = \Phi^\top A\, XX^\top A\, \Phi$ be the extrinsic inner product encoded in the reduced basis $\Phi$, where $A$ is the lumped mass matrix (acting as a normalization factor). Then one can recover the projection of the coordinate functions $X$ onto the subspace spanned by $\Phi$ from $G$, up to rigid transformations. In particular, when $\Phi$ is a complete, full basis, the recovery of $X$ is exact.

As an illustration of Theorem 1, we show in Figure 3 the embeddings recovered from $G$ when the number of basis functions in $\Phi$ ranges from 10 to 300.

Figure 3: From left to right: the original shape with 1000 vertices, and the embeddings recovered from the extrinsic inner product encoded in the leading k = 10, 60, 100 and 300 eigenbasis functions of the original shape.

Representing a shape via its Gram matrix in either the full or the reduced basis has one key limitation, however: the rank of $G$ is at most 3, meaning that the vast majority of its eigenvalues are zero. This turns out to be an issue in applications where gaining information about the local geometry of the shape is important, for example in our shape analogy experiments.

To compensate for this rank deficiency, we finalize our construction of the extrinsic inner product by making it Laplacian-like:

$E = \mathrm{diag}(W\mathbf{1}) - W,$    (7)

where $W_{pq} = a_p a_q \|x_p - x_q\|^2$, i.e., the squared Euclidean distance between points $p$ and $q$ on the shape, weighted by the respective vertex area measures $a_p$ and $a_q$. Since $E$ can be regarded as the Laplacian of a weighted complete graph, all but one of its eigenvalues are strictly positive.

It is worth noting that the Gram matrix and the squared Euclidean distance matrix are coherently related and can be recovered from each other as is commonly done in the Multi-Dimensional Scaling literature [13].

To summarize, given a base shape $S_0$, another shape $S_1$ and a functional map $C_{01}$ between them, we encode the extrinsic information of $S_1$ from the point of view of $S_0$, following the general construction of Eq. (2), as follows:

$D_E = (\Phi_0^\top E_0\, \Phi_0)^{+}\, C_{01}^\top\, (\Phi_1^\top E_1\, \Phi_1)\, C_{01},$    (8)

where $E_0$ and $E_1$ are the extrinsic inner products of Eq. (7) on $S_0$ and $S_1$, and $\Phi_0$, $\Phi_1$ the corresponding truncated eigenbases.
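A NumPy sketch of this construction is shown below; the precise placement of the vertex-area weights is an assumption consistent with the description above, not a verbatim transcription of Eq. (7):

```python
import numpy as np

def extrinsic_inner_product(V, vertex_areas):
    """Laplacian-like extrinsic inner product in the spirit of Eq. (7).

    V: (n, 3) vertex coordinates, vertex_areas: (n,) lumped areas.
    Builds the graph Laplacian of the complete graph whose edge weights are
    area-weighted squared Euclidean distances between vertices.
    """
    sq_norms = np.sum(V * V, axis=1)
    D2 = sq_norms[:, None] + sq_norms[None, :] - 2.0 * V @ V.T  # squared distances
    W = np.outer(vertex_areas, vertex_areas) * np.maximum(D2, 0.0)
    return np.diag(W.sum(axis=1)) - W

def extrinsic_shape_difference(Phi0, E0, Phi1, E1, C01):
    """Extrinsic shape difference of Eq. (8): plug the reduced extrinsic
    inner products into the generic formula D = M0^+ C^T M1 C of Eq. (2)."""
    M0 = Phi0.T @ E0 @ Phi0
    M1 = Phi1.T @ E1 @ Phi1
    return np.linalg.pinv(M0) @ C01.T @ M1 @ C01
```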

In Figure 4, we compute the area-based and the extrinsic shape differences of the target shape with respect to the base, and color-code, on the shapes to the right, their respective eigenfunctions associated with the largest eigenvalue. As argued in [31], these functions capture the areas of highest distortion between the shapes, with respect to the corresponding inner products. Note that the eigenfunction of the area-based difference highlights the armpit, where the local area changes significantly, while that of the extrinsic difference highlights the hand, where the pose change is evident.

Figure 4: A pair of shapes are compared. The most area (resp. extrinsic) distorted region is captured by the leading eigenfunction of the area-based (resp. extrinsic) shape difference.

It is worth noting that in [12], the authors also propose a shape difference formulation for encoding extrinsic information, which is defined on an offset surface of the shape in order to extract information about surface normals. However, the construction of the offset surface can lead to instabilities; moreover, this formulation only provides information about local distances, making it hard to recover large changes in pose.

5 Network Details

Problem Setup

Our general goal is to develop a neural network capable of recovering the coordinate functions of a shape, given its representation as a set of shape difference operators. In this way, we aim to solve the same problem considered in [7, 12]. However, unlike these previous, purely geometric methods, we further leverage a collection of training shapes to learn and constrain the reconstruction to the space of realistic shapes.

Thus, we assume that we are given a collection of shapes, each represented by a set of shape difference operators with respect to a fixed base shape. We also assume the presence of a point-wise map from the base shape to each of the shapes in the collection, which allows us to compute the “ground truth” embedding of each shape. We represent this embedding as three coordinate functions on the base shape. Our goal then is to design a network, capable of converting the input shape difference operators to the ground truth coordinate functions.

At test time, we use this network to reconstruct a target shape given only the shape difference operators with respect to the base shape. These shape difference operators can be obtained using the knowledge of a functional map from the base shape, or synthesized directly, for shape analogy or interpolation applications.

Architecture

To solve the problem above we developed the OperatorNet architecture, which takes as input shape difference matrices and outputs coordinate functions representing the original shapes. Our network has two modules: a shallow convolutional encoder and a 3-layer dense decoder as shown in Figure 5.

Figure 5: OperatorNet architecture. Given the shape difference operators as input, OperatorNet outputs the coordinate functions of a shape. In particular, one can efficiently extract information from the difference operators (here considered as channels) with a simple and standard network architecture, which consists of a convolutional encoder and a fully connected decoder built with dense layers, as shown above.

The grid structure of shape differences is exploited by the encoder through the use of convolutions. Note however that translation invariance does not apply to these matrices.

After comparing encoders of multiple depths, we selected a shallow version, as it performed the best in practice, suggesting that the shape difference representation already encodes meaningful information efficiently. Moreover, as shown in [12], the edge lengths of a mesh can be recovered from intrinsic shape differences through a series of least-squares problems, hinting that increasing the depth of the network, and thus its non-linearity, may not be necessary when working with shape differences.

On the other hand, the decoder is selected for its ability to transform the latent representation to coordinate functions for reconstruction and interpolation tasks.
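The following PyTorch sketch illustrates this encoder-decoder structure; the layer widths, latent size and kernel size are illustrative assumptions, and only the overall design (a single shallow convolution over the stacked difference matrices, followed by a fully connected decoder that regresses vertex coordinates) follows the description above:

```python
import torch
import torch.nn as nn

class OperatorNetSketch(nn.Module):
    """Sketch: shape difference matrices in, coordinate functions out.

    Input:  (batch, n_ops, k, k)   stacked difference operators as channels
    Output: (batch, n_vertices, 3) reconstructed vertex coordinates
    """
    def __init__(self, k=60, n_ops=3, n_vertices=1000, latent=256):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(n_ops, 8, kernel_size=3, padding=1),  # single shallow conv layer
            nn.ReLU(),
            nn.Flatten(),
            nn.Linear(8 * k * k, latent),
        )
        self.decoder = nn.Sequential(                        # 3-layer dense decoder
            nn.Linear(latent, 512), nn.ReLU(),
            nn.Linear(512, 1024), nn.ReLU(),
            nn.Linear(1024, n_vertices * 3),
        )
        self.n_vertices = n_vertices

    def forward(self, diffs):
        z = self.encoder(diffs)
        coords = self.decoder(z)
        return coords.view(-1, self.n_vertices, 3)
```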

Input Shape Differences

We construct the input shape differences using a truncated eigenbasis of dimension $k$ on the base shape and the full basis on the target shape, in all experiments, regardless of the number of vertices of the actual shapes. The functional maps from the base to the targets are induced by the identity maps, since our training shapes are in 1-1 correspondence. This implies that each shape is represented by three $k \times k$ matrices, encoding the area-based, conformal and extrinsic shape differences respectively. The independence among the shape differences allows flexibility in selecting the combination of input shape differences. In Section 6, we compare the performance of several combinations. A detailed ablation study with respect to the input channels and network depth is presented in Appendix D.

It is worth noting that recent learning-based shape matching techniques enable efficient (functional) map estimation. In particular, we adapt the framework of [30] and evaluate OperatorNet trained with computed shape differences in Section 6.

Datasets

We train OperatorNet on two types of datasets: humans and animals.

For human shapes, our training set consists of 9440 shapes sampled from the DFAUST dataset [6] and 8000 from the SURREAL dataset [35], which is generated with the model proposed in [19]. The DFAUST dataset contains 4D scans of 10 human subjects performing a variety of motions, while the SURREAL dataset adds more variability in body types.

For animals, we used the parametric model proposed in SMAL [37] to generate 1800 animals of 3 different species, including lions, dogs, and horses. The meshes of the humans (resp. animals) are simplified to 1000 vertices (resp. 1769 vertices).

Loss Function

OperatorNet reconstructs the coordinate functions of a given training shape. Our shape reconstruction loss operates in two steps. First, we estimate the optimal rigid transformation aligning the reconstructed coordinate functions to the ground-truth ones, using the Kabsch algorithm [3] with ground-truth correspondences. Second, we compute the mean squared error between the aligned reconstruction and the ground truth:

$L(X, X_{gt}) = \frac{1}{n}\, \big\| \rho(X, X_{gt})(X) - X_{gt} \big\|_F^2.$    (9)

Here $\rho(X, X_{gt})$ is the function that computes the optimal rigid transformation between $X$ and $X_{gt}$, applied to the points of $X$, and $n$ is the number of vertices. We align the computed reconstructions to the ground-truth embedding, so that the quality of the reconstructed point cloud is invariant to rigid transformations. This is important since the shape difference operators are invariant to rigid motions of the shape, and thus the network should not be penalized for not recovering the correct orientation. On the other hand, this loss function is differentiable, since we use a closed-form expression of $\rho$, given by the SVD, which enables back-propagation in neural network training.
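A NumPy sketch of this alignment-then-MSE loss is given below; the training code uses the same closed-form SVD solution in a differentiable form, and the helper names are ours:

```python
import numpy as np

def kabsch_align(X, Y):
    """Optimal rotation R and translation t aligning X to Y (both (n, 3))."""
    Xc, Yc = X - X.mean(0), Y - Y.mean(0)
    U, _, Vt = np.linalg.svd(Xc.T @ Yc)
    d = np.sign(np.linalg.det(Vt.T @ U.T))       # avoid reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = Y.mean(0) - X.mean(0) @ R.T
    return R, t

def reconstruction_loss(X_pred, X_gt):
    """Mean squared error after optimal rigid alignment (cf. Eq. (9))."""
    R, t = kabsch_align(X_pred, X_gt)
    X_aligned = X_pred @ R.T + t
    return np.mean(np.sum((X_aligned - X_gt) ** 2, axis=1))
```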

6 Evaluation

In this section, we provide both qualitative and quantitative evaluations of the results from OperatorNet, and compare them to the geometric baselines.

Evaluation Metrics

We denote by $\mathcal{X}$ and $\mathcal{X}'$ the ground-truth and the reconstructed meshes respectively. First, we measure the reconstruction error $e_X = L(X', X)$, where $L$ is the rotationally-invariant distance defined in Eq. (9) and $X$ (resp. $X'$) denotes the coordinate functions of $\mathcal{X}$ (resp. $\mathcal{X}'$). For a comprehensive, unbiased evaluation and comparison, we introduce the following two additional metrics: (1) the relative error of mesh volumes, $e_{vol} = |\mathrm{Vol}(\mathcal{X}') - \mathrm{Vol}(\mathcal{X})| / \mathrm{Vol}(\mathcal{X})$; (2) the mean relative error of edge lengths, $e_{edge}$, where the relative error $|l'_e - l_e| / l_e$ is averaged over all edges $e$, with $l_e$ the length of edge $e$ in $\mathcal{X}$ and $l'_e$ the corresponding length in $\mathcal{X}'$.
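For reference, a NumPy sketch of the two additional metrics (the exact normalization is our reading of the definitions above):

```python
import numpy as np

def mesh_volume(V, F):
    """Signed volume of a closed triangle mesh via the divergence theorem."""
    v0, v1, v2 = V[F[:, 0]], V[F[:, 1]], V[F[:, 2]]
    return np.sum(np.einsum('ij,ij->i', v0, np.cross(v1, v2))) / 6.0

def volume_error(V_rec, V_gt, F):
    """Relative error of mesh volumes."""
    vol_rec, vol_gt = mesh_volume(V_rec, F), mesh_volume(V_gt, F)
    return abs(vol_rec - vol_gt) / abs(vol_gt)

def edge_length_error(V_rec, V_gt, edges):
    """Mean relative error of edge lengths; edges is an (m, 2) index array."""
    l_rec = np.linalg.norm(V_rec[edges[:, 0]] - V_rec[edges[:, 1]], axis=1)
    l_gt = np.linalg.norm(V_gt[edges[:, 0]] - V_gt[edges[:, 1]], axis=1)
    return np.mean(np.abs(l_rec - l_gt) / l_gt)
```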

Baselines

Two baselines are considered: (1) the intrinsic reconstruction method from [7], which we evaluate with the 'Shape-from-Laplacian' option and with the full basis on both the base shape and the target shape; (2) the reconstruction method from [12], in which the authors construct offset surfaces of the shapes to account for extrinsic geometry; we evaluate it with the same basis truncation as our input. The latter also provides a purely intrinsic reconstruction variant. Beyond that, we also consider nearest-neighbour retrieval from the training set with respect to distances between shape difference representations.

Test Data

We retain 800 shapes from the DFAUST dataset as the test set, consisting of 10 sub-collections (character + action sequence, each of 80 shapes) that are isolated from the training/validation set. For efficiency of the baseline evaluation, we further sample 5 shapes from each sub-collection via farthest point sampling with respect to the pairwise Hausdorff distance, resulting in a set of 50 shapes that covers significant variability in both styles and poses of the original test set.

Quantitative Results

We list all the scores for the metrics defined above in Table 1. First of all, OperatorNet using both intrinsic and extrinsic shape differences achieves the smallest reconstruction error $e_X$, and the purely intrinsic version is second best. OperatorNet trained on shape differences from computed functional maps achieves competitive performance, showing that our method remains effective even in the absence of ground-truth one-to-one correspondences. Note also that all versions of OperatorNet significantly outperform the other baselines.

Regarding volume and edge-length recovery, either the complete or the intrinsic-only version of OperatorNet achieves the second-best result. It is worth noting that, since the nearest-neighbour search in general retrieves the right body type, the volume is well recovered by that baseline. On the other hand, the full Laplacian of the target shape is provided as input to the Shape-from-Laplacian baseline, so it is expected to preserve the intrinsic information well.

Method                     e_X     e_vol   e_edge
Op.Net (Int+Ext)                   0.014   0.045
Op.Net (Int)               2.41    0.013   0.046
Op.Net (Ext)               1.25    0.017   0.046
Op.Net (Comp)(Ext)         3.86    0.021   0.052
Op.Net (Comp)(Int+Ext)     6.22    0.022   0.053
SfL [7]                    48.8    0.081
FuncChar [12] (Int)        65.1    0.356   0.118
FuncChar [12] (Int+Ext)    28.4    0.028   0.110
NN                         25.5    0.043
Table 1: Quantitative evaluation of shape reconstruction. e_X is the rotationally-invariant reconstruction error of Eq. (9), reported at a reduced scale; e_vol and e_edge are the relative volume and edge-length errors defined above.

Qualitative Results

We show the reconstructed shapes from OperatorNet and the aforementioned baselines in Figure 6; the red shape in each row is the respective ground-truth target shape. The base shape in this experiment (also the base shape with respect to which we compute shape differences) is shown in Figure 4 and is in the rest pose. The geometric baselines in general perform worse when the pose changes significantly with respect to the base (see the top two rows in Figure 6), but give relatively better results when the difference is mainly in style (see the last row).

Our method, on the other hand, produces consistently good reconstructions in all cases. It is also worth noting that, as expected, OperatorNet using all three types of shape differences gives both the best quantitative and the best qualitative results.

Figure 6: Qualitative comparison of our reconstructions and the baselines.

Finally, we present a qualitative verification of the generalization power of OperatorNet in Appendix B.

7 Applications

In this section, we present all of our results using OperatorNet trained with both intrinsic and extrinsic shape differences, which are induced by ground-truth maps.

7.1 Shape Interpolation

Given two shapes, we first interpolate the corresponding shape differences using the formulation in Eq. (6) (an alternative is to interpolate the shape differences linearly; we provide a comparison in Appendix C), and then synthesize intermediate shapes by feeding the interpolated shape differences to OperatorNet.

Figure 7: Shape interpolation between two humans. Note that the interpolation by the Multi-chart GAN is developed less evenly than the bottom row (e.g., the positions of the arms in the three centre shapes change abruptly); the autoencoders based on PointNet and PointNet++ both produce shapes with local area distortion; and the interpolation from nearest-neighbour retrieval is not continuous. In contrast, the interpolation via OperatorNet is more natural and smooth, compared to the baselines (see text for more details).
Figure 8: Shape interpolation from a tiger (left) to a horse (right) using OperatorNet trained on animals dataset.

We compare our method against several baselines. First, a PointNet autoencoder is trained with the encoder architecture from [27] and with our decoder. Two versions of PointNet are trained: one autoencoder with spatial transformers and one without. The autoencoder without spatial transformers performs better at both reconstruction and interpolation, and is therefore selected for the comparisons. Another autoencoder, based on PointNet++ [29], is trained similarly.

Nearest-neighbour (NN) interpolation retrieves, for each interpolated shape difference, its nearest neighbour in the training set and uses the corresponding training shapes as the interpolation sequence.

Moreover, we also compare to the interpolation result from a recent work [5], where a GAN is trained to generate realistic human shapes. As stated in [5], the interpolation is done as follows: first, one picks two randomly generated latent vectors $z_1$ and $z_2$, which, via the GAN, give rise to two shapes $S_1$ and $S_2$. Then, the interpolation between the two shapes is achieved by decoding $(1 - t)z_1 + t z_2$, for $t \in [0, 1]$. In particular, we randomly generate 1000 shapes using their trained model and pick the two that are nearest to the red shapes in the last row of Figure 7 for the interpolation comparison.

In Figure 7, the first row shows the interpolation between the two end shapes by [5], which does not evolve evenly: for instance, the arms change abruptly across the three middle shapes, while there is little change in that region afterwards. As shown in the second and third rows, the results of both autoencoders suffer from obvious area distortion in the arms (see a detailed comparison in Figure 1). The fourth row of Figure 7 indicates that the NN approach fails to deliver a continuous deformation sequence. In contrast, interpolation using OperatorNet is continuous and respects the structure and constraints of the body, suggesting that shape differences efficiently encode this structure.

We also train OperatorNet on the animals dataset as described in Section 5 and show in Figure 8 an interpolation from a tiger to a horse.

7.2 Shape Analogy

Our second application is to construct semantically meaningful new shapes based on shape analogies. Given shapes $S_A$, $S_B$ and $S_C$, our goal is to construct a new shape $S_D$, such that $S_D$ relates to $S_C$ as $S_B$ relates to $S_A$.

Following the discussion in Section 3, the functoriality of shape differences allows an explicit and mathematically meaningful way of constructing the shape difference of $S_D$, given those of $S_A$, $S_B$ and $S_C$. Namely, $D_{S_D} = D_{S_C}\, D_{S_A}^{-1}\, D_{S_B}$, applied to each type of difference operator. Then, with OperatorNet, we reconstruct the embedding of the unknown $S_D$ by feeding $D_{S_D}$ to the network.
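A one-function sketch of this synthesis step (applied independently to each type of difference operator; names are ours):

```python
import numpy as np

def analogy_shape_difference(D_SA, D_SB, D_SC):
    """Synthesize the shape difference of the unknown shape S_D such that
    S_D : S_C  ~  S_B : S_A, i.e. D_SC^{-1} D_SD = D_SA^{-1} D_SB (functoriality)."""
    return D_SC @ np.linalg.inv(D_SA) @ D_SB

# One call per operator type (area-based, conformal, extrinsic); the resulting
# matrices are stacked as channels and passed to the trained network.
```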

Figure 9: Transferring gender via shape analogies: $S_A$ and $S_B$ are a fixed pair of human shapes with similar poses and styles, but of different genders. We generate $S_D$, which is supposed to be a 'female' version of the varying $S_C$. Our analogies are semantically meaningful, while PointNet can produce suboptimal results (see the red dotted boxes for the discrepancies).

We compare our results to those of the PointNet autoencoder. For the latter, we construct $S_D$ by decoding the latent code obtained from the additive formula $z_D = z_C + (z_B - z_A)$, where $z_A$ is the latent code of shape $S_A$ (and similarly for the other shapes).

In Figure 10, we show shape analogies of pose transfer (top row) and style transfer (bottom row) via OperatorNet and PointNet autoencoder on human shapes. It is evident that our results are both more natural and intuitive.

Figure 10: Human shape analogies via OperatorNet and PointNet autoencoder (see the red dotted boxes for the discrepancies).

We also show analogies among animals in Figure 11, where we present both pose transfer (top row) and style transfer (bottom row), together with a comparison to the results of PointNet.

Figure 11: Animal shape analogies via OperatorNet and PointNet autoencoder (see the red dotted boxes for the discrepancies).

Moreover, we present a set of more challenging shape analogies that transfer gender across human shapes in Figure 9. Namely, we fix $S_A$ and $S_B$, and test analogies with different $S_C$ (the 6 male shapes in red). Note that $S_A$ and $S_B$ are two characters in comparable poses and styles but of different genders; ideally, the resulting analogies $S_D$ should be 'female' versions of the corresponding $S_C$, with similar poses and styles.

We present the analogies from OperatorNet in the first results column, and those from PointNet in the second results column (both in blue). It is clear that PointNet, though it works in some cases, in general produces less semantically meaningful analogies than ours (see the discrepancies in the red dotted boxes).

8 Conclusion & Future Work

In this paper we have introduced a novel learning-based technique for recovering shapes from their difference operators. Our key observation is that shape differences, stored as compact matrices, lend themselves naturally to learning, and allow us both to recover the underlying shape space of a collection and to encode the geometry of individual shapes. We also introduced a novel extrinsic shape difference operator and showed its utility for shape reconstruction and other applications such as shape interpolation and analogies.

Currently our approach is only well-suited to shapes represented as triangle meshes. Future work includes extending our framework to learn (optimal) inner products from data, and adapting it to other representations, such as point clouds or triangle soups.

References

  • [1] P. Achlioptas, O. Diamanti, I. Mitliagkas, and L. Guibas. Learning representations and generative models for 3d point clouds. arXiv preprint arXiv:1707.02392, 2017.
  • [2] D. Anguelov, P. Srinivasan, D. Koller, S. Thrun, J. Rodgers, and J. Davis. SCAPE: Shape Completion and Animation of People. In ACM Transactions on Graphics (TOG), volume 24, pages 408–416. ACM, 2005.
  • [3] K. S. Arun, T. S. Huang, and S. D. Blostein. Least-squares fitting of two 3-d point sets. IEEE Transactions on pattern analysis and machine intelligence, (5):698–700, 1987.
  • [4] H. Ben-Hamu, H. Maron, I. Kezurer, G. Avineri, and Y. Lipman. Multi-chart generative surface modeling. In Proc. SIGGRAPH Asia, page 215. ACM, 2018.
  • [5] H. Ben-Hamu, H. Maron, I. Kezurer, G. Avineri, and Y. Lipman. Multi-chart generative surface modeling. ACM Trans. Graph., 37(6):215:1–215:15, Dec. 2018.
  • [6] F. Bogo, J. Romero, G. Pons-Moll, and M. J. Black. Dynamic FAUST: Registering human bodies in motion. In CVPR, July 2017.
  • [7] D. Boscaini, D. Eynard, D. Kourounis, and M. M. Bronstein. Shape-from-operator: Recovering shapes from intrinsic operators. In Computer Graphics Forum, volume 34, pages 265–274. Wiley Online Library, 2015.
  • [8] D. Boscaini, J. Masci, E. Rodolà, and M. Bronstein. Learning shape correspondence with anisotropic convolutional neural networks. In Advances in Neural Information Processing Systems, pages 3189–3197, 2016.
  • [9] M. M. Bronstein, J. Bruna, Y. LeCun, A. Szlam, and P. Vandergheynst. Geometric deep learning: going beyond euclidean data. IEEE Signal Processing Magazine, 34(4):18–42, 2017.
  • [10] A. X. Chang, T. Funkhouser, L. Guibas, P. Hanrahan, Q. Huang, Z. Li, S. Savarese, M. Savva, S. Song, H. Su, J. Xiao, L. Yi, and F. Yu. ShapeNet: An Information-Rich 3D Model Repository. Technical Report arXiv:1512.03012 [cs.GR], Stanford University — Princeton University — Toyota Technological Institute at Chicago, 2015.
  • [11] E. Corman. Functional representation of deformable surfaces for geometry processing. PhD thesis, 2016.
  • [12] E. Corman, J. Solomon, M. Ben-Chen, L. Guibas, and M. Ovsjanikov. Functional characterization of intrinsic and extrinsic geometry. ACM Trans. Graph., 36(2):14:1–14:17, Mar. 2017.
  • [13] T. F. Cox and M. Cox. Multidimensional Scaling, Second Edition. Chapman and Hall/CRC, 2 edition, 2000.
  • [14] N. Hasler, C. Stoll, M. Sunkel, B. Rosenhahn, and H.-P. Seidel. A statistical model of human pose and body shape. In Computer graphics forum, volume 28, pages 337–346. Wiley Online Library, 2009.
  • [15] A. Kovnatsky, M. M. Bronstein, A. M. Bronstein, K. Glashoff, and R. Kimmel. Coupled quasi-harmonic bases. In Computer Graphics Forum, volume 32, pages 439–448. Wiley Online Library, 2013.
  • [16] B. Levy. Laplace-beltrami eigenfunctions towards an algorithm that "understands" geometry. In IEEE International Conference on Shape Modeling and Applications 2006 (SMI’06), pages 13–13, June 2006.
  • [17] O. Litany, A. Bronstein, M. Bronstein, and A. Makadia. Deformable shape completion with graph convolutional autoencoders. In Proc. CVPR, pages 1886–1895, 2018.
  • [18] O. Litany, T. Remez, E. Rodolà, A. Bronstein, and M. Bronstein. Deep functional maps: Structured prediction for dense shape correspondence. In Proc. ICCV, pages 5659–5667, 2017.
  • [19] M. Loper, N. Mahmood, J. Romero, G. Pons-Moll, and M. J. Black. Smpl: A skinned multi-person linear model. ACM Trans. Graph., 34(6):248:1–248:16, Oct. 2015.
  • [20] H. Maron, M. Galun, N. Aigerman, M. Trope, N. Dym, E. Yumer, V. G. KIM, and Y. Lipman. Convolutional neural networks on surfaces via seamless toric covers. 2017.
  • [21] J. Masci, D. Boscaini, M. Bronstein, and P. Vandergheynst. Geodesic convolutional neural networks on riemannian manifolds. In Proc. ICCV workshops, pages 37–45, 2015.
  • [22] D. Maturana and S. Scherer. Voxnet: A 3d convolutional neural network for real-time object recognition. In Intelligent Robots and Systems (IROS), 2015 IEEE/RSJ International Conference on, pages 922–928. IEEE, 2015.
  • [23] M. Meyer, M. Desbrun, P. Schröder, and A. H. Barr. Discrete Differential-Geometry Operators for Triangulated 2-Manifolds. In Visualization and mathematics III, pages 35–57. Springer, 2003.
  • [24] M. Ovsjanikov, M. Ben-Chen, J. Solomon, A. Butscher, and L. Guibas. Functional Maps: A Flexible Representation of Maps Between Shapes. ACM Transactions on Graphics (TOG), 31(4):30, 2012.
  • [25] M. Ovsjanikov, E. Corman, M. Bronstein, E. Rodolà, M. Ben-Chen, L. Guibas, F. Chazal, and A. Bronstein. Computing and processing correspondences with functional maps. In ACM SIGGRAPH 2017 Courses, 2017.
  • [26] U. Pinkall and K. Polthier. Computing Discrete Minimal Surfaces and their Conjugates. Experimental mathematics, 2(1):15–36, 1993.
  • [27] C. R. Qi, H. Su, K. Mo, and L. J. Guibas. Pointnet: deep learning on point sets for 3d classification and segmentation. CoRR, abs/1612.00593, 2016.
  • [28] C. R. Qi, H. Su, M. Nießner, A. Dai, M. Yan, and L. J. Guibas. Volumetric and multi-view cnns for object classification on 3d data. In Proc. CVPR, pages 5648–5656, 2016.
  • [29] C. R. Qi, L. Yi, H. Su, and L. J. Guibas. Pointnet++: Deep hierarchical feature learning on point sets in a metric space. CoRR, abs/1706.02413, 2017.
  • [30] J. Roufosse and M. Ovsjanikov. Unsupervised deep learning for structured shape matching. CoRR, abs/1812.03794, 2018.
  • [31] R. M. Rustamov, M. Ovsjanikov, O. Azencot, M. Ben-Chen, F. Chazal, and L. Guibas. Map-based exploration of intrinsic shape differences and variability. ACM Transactions on Graphics (TOG), 32(4):1, 2013.
  • [32] A. Schulz, A. Shamir, I. Baran, D. I. W. Levin, P. Sitthi-Amorn, and W. Matusik. Retrieval on parametric shape collections. ACM Trans. Graph., 36(4), Jan. 2017.
  • [33] A. Sinha, J. Bai, and K. Ramani. Deep learning 3d shape surfaces using geometry images. In European Conference on Computer Vision, pages 223–240. Springer, 2016.
  • [34] A. Sinha, A. Unmesh, Q. Huang, and K. Ramani. Surfnet: Generating 3d shape surfaces using deep residual networks. In CVPR, pages 791–800, July 2017.
  • [35] G. Varol, J. Romero, X. Martin, N. Mahmood, M. J. Black, I. Laptev, and C. Schmid. Learning from synthetic humans. In CVPR, 2017.
  • [36] P.-S. Wang, Y. Liu, Y.-X. Guo, C.-Y. Sun, and X. Tong. O-cnn: Octree-based convolutional neural networks for 3d shape analysis. ACM Transactions on Graphics (TOG), 36(4):72, 2017.
  • [37] S. Zuffi, A. Kanazawa, D. Jacobs, and M. J. Black. 3D menagerie: Modeling the 3D shape and pose of animals. In CVPR, July 2017.

Appendix A Proof of Theorem 1

Proof.

Let $G = \Phi^\top A\, XX^\top A\, \Phi$ be the reduced extrinsic inner product of Theorem 1, and let $P = \Phi^\top A X$ denote the coefficients of the coordinate functions in the basis $\Phi$, so that $G = PP^\top$. Since $G$ is symmetric and of rank at most 3, we have, by SVD, $G = U\Sigma U^\top$, where $U$ and $\Sigma$ are respectively the top 3 singular vectors and singular values of $G$. Therefore, $P' = U\Sigma^{1/2}$ satisfies $P'P'^\top = G$, and $P' = PR$, where $R$ is a $3 \times 3$ transformation matrix satisfying $RR^\top = I$. In other words, we recover $P'$ from $G$, where $P'$ is equivalent to $P$ up to rigid transformations. To recover the projection of $X$ onto the subspace spanned by $\Phi$, we simply compute $\Phi P'$.

To conclude, our recovery is done in two steps: first, we use the SVD of $G$ to recover $P' = U\Sigma^{1/2}$; second, we compute $\Phi P'$ for the projection of $X$ onto the space spanned by $\Phi$, which is equivalent to $\Phi\Phi^\top A X$ up to rigid transformations. ∎
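A NumPy sketch of this two-step recovery, following the argument above (names are ours):

```python
import numpy as np

def recover_projected_coords(G, Phi):
    """Recover the coordinates projected onto span(Phi), up to a rigid motion.

    G = Phi^T (A X X^T A) Phi = P P^T with P = Phi^T A X (symmetric, rank <= 3).
    """
    evals, evecs = np.linalg.eigh(G)
    top3 = np.argsort(evals)[-3:]                       # three largest eigenvalues
    P_rec = evecs[:, top3] * np.sqrt(np.maximum(evals[top3], 0.0))
    # (n, 3) embedding, equal to Phi Phi^T A X up to rotation/reflection.
    return Phi @ P_rec
```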

Appendix B Verification of the Generalization power of OperatorNet

Figure 12: Top row: ground-truth embeddings; middle row: reconstructions from OperatorNet; bottom row: shapes from the training set whose shape differences are closest to those of the test shapes in the top row.

To demonstrate the generalization power of OperatorNet, we show in Figure 12 our reconstructions of test shapes from the SURREAL dataset. For comparison, we also retrieve the shapes in the training set whose shape differences are nearest to those of the test shapes. In the figure, the top row presents the ground-truth test shapes; the middle row shows reconstructions from OperatorNet; the bottom row shows the shapes retrieved from the training set via nearest-neighbour search on shape differences.

It is evident that OperatorNet accurately reconstructs the test shapes, which deviate significantly from the shapes in the training set, suggesting that our network generalizes well to unseen data.

Appendix C Comparison of Interpolation Schemes for Shape Differences

In the following experiment, we note that, since the shape differences are represented by matrices, it is also possible to interpolate them linearly, i.e., $D(t) = (1 - t)D_1 + t D_2$. However, as we argue in Section 3, the multiplicative property of shape differences suggests that it is more natural to interpolate the difference operators following Eq. (6). To illustrate this point, we show in Figure 14 interpolated sequences obtained with the two schemes: the multiplicative one in the top row and the linear one in the bottom row. It is visually evident that the former leads to a more continuous and evenly deforming sequence. Moreover, we compute the distance between consecutive shapes in both sequences and plot the distributions in Figure 13 as a quantitative verification.

Figure 13: The distances between consecutive reconstructed embeddings for both sequences. The multiplicative scheme clearly delivers a smoother deformation sequence.
Figure 14: Reconstructions regarding shape differences interpolated using multiplicative scheme (first row) and using linear scheme (second row).

Appendix D Ablation Study on Network Design

We investigate multiple architectures for OperatorNet. In Table 2 we compare the reconstruction performance over different combinations of input shape differences, and different depths of encoders.

We report the performance of 4 different convolutional encoders, from 1 to 4 layers deep, doubling the number of filters at every layer.

Two trends are observed in Table 2: first, we always achieve the best performance when all three types of shape differences are used, for every depth of the network; second, for a fixed combination of input shape differences, the network performs better as it gets shallower.

Putting these two observations together justifies our final model, which uses a single-layer convolutional encoder and takes all three types of shape differences as input.

Encoder architecture   Area    Ext    Conf   A+E    A+C    E+C    A+E+C
Conv. 8                8.61    4.29   3.78   3.82   3.41   2.56   2.46
Conv. 8-16             9.08    4.54   4.28   4.65   3.93   3.10   3.05
Conv. 8-16-32          9.90    5.54   4.91   5.59   4.88   3.71   3.55
Conv. 8-16-32-64       11.16   6.39   5.93   6.89   5.42   4.35   4.24
Table 2: Ablation study: auto-encoder performance on the DFAUST test set, measured by the loss function defined in Eq. (9); the errors in the table are reported at a reduced scale.