ShapeFlow: Learnable Deformations Among 3D Shapes

06/14/2020 ∙ by Chiyu "Max" Jiang, et al. ∙ Google, UC Berkeley, Stanford University

We present ShapeFlow, a flow-based model for learning a deformation space for entire classes of 3D shapes with large intra-class variations. ShapeFlow allows learning a multi-template deformation space that is agnostic to shape topology, yet preserves fine geometric details. Different from a generative space where a latent vector is directly decoded into a shape, a deformation space decodes a vector into a continuous flow that can advect a source shape towards a target. Such a space naturally allows the disentanglement of geometric style (coming from the source) and structural pose (conforming to the target). We parametrize the deformation between geometries as a learned continuous flow field via a neural network and show that such deformations can be guaranteed to have desirable properties, such as bijectivity, freedom from self-intersections, or volume preservation. We illustrate the effectiveness of this learned deformation space for various downstream applications, including shape generation via deformation, geometric style transfer, unsupervised learning of a consistent parameterization for entire classes of shapes, and shape interpolation.



1 Introduction

Learning a shared representation space for geometries is a central task in 3D computer vision and geometric modeling, as it enables a series of important downstream applications such as retrieval, reconstruction, and editing. For instance, morphable models blanz1999morphable are a commonly used representation for entire classes of shapes with small intra-class variations (e.g., faces), allowing high-quality geometry generation. However, morphable models generally assume a shared topology, and even the same mesh connectivity, for all represented shapes, and are thus less extensible to general shape categories with large intra-class variations. Therefore, such approaches have limited applications beyond collections with a shared structure, such as humans blanz1999morphable; SMPL:2015 or animals zuffi20173d.

In contrast, when trained on large shape collections (e.g., ShapeNet chang2015shapenet), 3D generative models are not only able to learn a shared latent space for entire classes of shapes (e.g., chairs, tables, airplanes), but also capture large geometric variations between classes. A main area of focus in this field has been the development of novel geometry decoders for these latent representations. Such generative spaces map a latent code to some geometric representation of a shape, examples being voxels choy20163d; tatarchenko2017octree, meshes groueix2018papier; nash2020polygen, convexes deng2020cvxnet; chen2020bspnet, or implicit functions chen2019learning; genova2019deepsif. These latent spaces are generally smooth and allow interpolation or deformation between arbitrary objects represented in the encoding. However, shape generation quality is highly dependent on decoder performance and is generally imperfect: while some decoder architectures are able to produce higher-quality geometries, auto-encoded shapes never exactly match their inputs, leading to a loss of fine geometric details.

Figure 1: Schematic for learning a deformation space using ShapeFlow. (a) Our input is either a sparse point cloud, or a depth map converted into a point cloud. (b) Visualization of the learned latent embedding (2D PCA) of sample shapes in the training set. ShapeFlow learns a geometrically meaningful embedding of geometries based on deformation distances in an unsupervised manner. (c) The unsupervised deformation space facilitates various downstream applications, including shape correspondence, reconstruction, and style transfer.

In this paper we introduce a different approach to shape generation, based on continuous flows between shapes, that we term ShapeFlow. The approach views the shape generation process from a new perspective: rather than learning a generative space where a learned decoder $g$ directly maps a latent code $c$ to a shape, $\hat{X} = g(c)$, ShapeFlow learns a deformation space facilitated by a learned deformer $f$, where a novel shape is acquired by deforming one of many possible template shapes via this learned deformer: $\hat{X}_j = f(X_i; c_i, c_j)$, where $c_i, c_j$ are the latent codes corresponding to $X_i$ and $X_j$.

This deformation-centric view of shape generation has various unique properties. First, a deformation space, compared to a generative space, naturally disentangles geometric style from structure. Style comes from the choice of source shape $X_i$, which also fixes the shape topology and mesh connectivity. Structure includes the general placement of different parts, such as limb positioning in a human figure (i.e., pose), or the height and width of chair parts. Second, unlike template-based mesh generation frameworks such as wang2018pixel2mesh; litany2018deformable; groueix2018papier, whose generated shapes are inherently limited by the template topology, a deformation space allows a multi-template scenario where each of the source shapes can be viewed as a template. Also, unlike volumetric decoders that require a potentially computationally intensive step for extracting surfaces (e.g., Marching Cubes), ShapeFlow directly outputs a mesh (or a point cloud) by deforming the source shape. Finally, by routing the deformations through a common waypoint in this space, we can learn a shared template for all geometries of the same class, despite differences in meshing or topology, allowing unsupervised learning of dense correspondences between all shapes within the same class.

The learned deformation function $f$ deforms the template shape $X_i$ into $\hat{X}_j$ so that it is geometrically close to the target shape $X_j$. Our deformation function is based on neurally parameterized 3D vector fields, or flows, that locally advect a template shape towards its destination. This novel way of modeling deformations has various innate advantages compared to existing methods. We show that deformation induced by a flow naturally prevents self-intersections. Furthermore, we demonstrate that we can effectively parameterize a divergence-free flow field using a neural network, which ensures volume conservation during the deformation process. Finally, ShapeFlow ensures path invertibility ($f(\hat{X}_j; c_j, c_i) = X_i$), and therefore also identity preservation ($f(X_i; c_i, c_i) = X_i$). Compared to traditional deformation parameterizations in computer graphics, such as control handles schaefer2006image; jacobson2011bounded and control cages joshi2007harmonic; lipman2008green; weber2009complex, ShapeFlow is a flow model realized by a neural network, allowing a more fine-grained deformation without requiring user intervention.

In summary, our main contributions are:

  1. We propose a flow-based deformation model via a neural network that allows exact preservation of identity, good preservation of local geometric features, and disentangles geometric style and structure.

  2. We show that our deformations by design prevent self-intersections and can preserve volume.

  3. We demonstrate that we can learn a common template for a class of shapes, through which we can derive dense correspondences.

  4. We apply our method to interpolate shapes in different poses, producing smooth interpolations between key frames that can be used for animation and content creation.

2 Related work

Traditionally, shape representations in 3D computer vision fall roughly into two categories: template-based and template-free. In contrast, ShapeFlow fills the gap in between: it can be viewed as a multi-template space, where the source topology can be based on any of the training shapes, and where a very general deformation model is adopted.

Template-based representations

These methods generally assume a fixed topology for all modelled geometries. Morphable models blanz1999morphable are a commonly used representation for entire classes of shapes with very small intra-class variations, such as faces blanz1999morphable; booth20163d; huber2016multiresolution; zhu2015discriminative; zhu2016face; zhu2015high, heads dai20173d; ploumpis2020towards, human bodies hasler2009statistical; allen2003space; SMPL:2015, and even animals zuffi20173d; zuffi2018lions. Morphable models generally assume a shared topology, and even the same mesh connectivity, for all represented shapes, which restricts their use to a few shape categories. Recently, neural networks have been employed to generate 3D shapes via morphable models genova2018unsupervised; sanyal2019learning; litany2018deformable; ranjan2018generating; kolotouros2019convolutional; zuffi20173d; zuffi2018lions. Some recent work has extended the template-based approach to shapes with larger variations wang2018pixel2mesh; groueix2018papier; deprelle2019learning; ganapathi2018parsing, but the generated results are polygonal meshes that often contain self-intersections and are not watertight.

Template-free representations

These methods generally produce a volumetric implicit representation of the geometries, rather than directly representing the surface under a particular surface parameterization, thus allowing the same model to represent geometries across different topologies with potentially large geometric variations. Earlier works in this line utilize voxel representations wu20153d; wu2016learning. Recently, the use of continuous implicit function decoders mescheder2019occupancy; park2019deepsdf; chen2019learning has been popularized due to their strong capacity for representing detailed geometry. Similar ideas have been extended to represent color, light fields, and other scene-related properties sitzmann2019scene; mildenhall2020nerf, and coupled with spatial jiang2020lig; chabra2020deep or spatio-temporal jiang2020meshfreeflownet latent grid structures to scale to larger scenes and domains. Still, these approaches lack the fine structures of real geometric models.

Shape deformation

Parametrizing the space of admissible deformations for a set of shapes with diverse topologies is a challenging problem. Directly predicting offsets for each mesh vertex with insufficient regularization leads to non-physical deformations such as self-intersections. In computer graphics, geometry deformation is usually parameterized using a set of deformation handles schaefer2006image or deformation cages joshi2007harmonic; lipman2008green; weber2009complex. Surface-based energies are usually optimized in the deformation process sorkine2007rigid; chao2010simple; jacobson2011bounded; uy2020deformation to maintain rigidity, isometry, or other desired geometric properties. More recently, learned deformation models have been proposed that directly predict vertex offsets wang20193dn or control-cage deformations yifan2019neural. Different from our end-to-end deformation setting, these graphics approaches are typically aimed at interactive, incremental shape-editing applications.

Flow models

Flow models have traditionally been used in machine learning for learning generative models of a given data distribution. Examples include RealNVP dinh2016density and Masked Auto-Regressive Flows papamakarios2017masked; these generally involve a discrete number of learned transformations. Continuous normalizing flow models have also been proposed recently chen2018neural; grathwohl2018ffjord, and our method is mainly inspired by these works. They create bijective mappings via a learned advection process, and are trained using a differentiable Ordinary Differential Equation (ODE) solver. PointFlow yang2019pointflow and OccFlow niemeyer2019occupancy are similar to our approach in using such learned flow dynamics for modeling geometry. However, PointFlow yang2019pointflow maps point clouds corresponding to geometries to a learned prior distribution, while ShapeFlow directly learns the deformation function between geometries, bypassing a prior distribution and better preserving geometric details. OccFlow niemeyer2019occupancy only models the temporal deformation sequence for one object, while ShapeFlow learns a deformation space for entire classes of geometries.

3 Method

Consider a set of shapes $\{X_1, \dots, X_n\}$. Each shape $X_i$ is represented by a polygonal mesh $\mathcal{M}_i = (\mathcal{V}_i, \mathcal{E}_i)$, where $\mathcal{V}_i$ is an ordered set of points that represent the vertices of the polygonal mesh. For each point $p \in \mathcal{V}_i$, we have $p \in \mathbb{R}^3$. $\mathcal{E}_i$ is a set of polygonal elements, where each element indexes into the set of vertices $\mathcal{V}_i$. For one-way deformations, we seek a mapping $f_{i \to j}: \mathbb{R}^3 \to \mathbb{R}^3$ that minimizes the geometric distance between the deformed source shape $f_{i \to j}(X_i)$ and the target shape $X_j$:

$f^*_{i \to j} = \arg\min_{f_{i \to j}} D\big(f_{i \to j}(X_i), X_j\big)$    (1)

where $D(\cdot, \cdot)$ is the symmetric Chamfer distance between two shapes. Note the mapping operates on the vertices $\mathcal{V}_i$, while retaining the mesh connectivity expressed by $\mathcal{E}_i$. As in previous work fan2017point; mescheder2019occupancy, since mesh-to-mesh Chamfer distance computation is expensive, we proxy it using the point-set-to-point-set Chamfer distance between uniform point samples on the meshes. Furthermore, in order to learn a symmetric deformation space, we optimize for maps that minimize the symmetric deformation distance:

$f^*_{i \to j} = \arg\min_{f_{i \to j}} \big[ D\big(f_{i \to j}(X_i), X_j\big) + D\big(f_{i \to j}^{-1}(X_j), X_i\big) \big]$    (2)
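As a concrete reference for this proxy, below is a minimal sketch of the symmetric point-set Chamfer distance in PyTorch. This is our own illustrative helper, not the authors' released code; the function name and the use of torch.cdist are implementation choices that assume moderately sized point sets (some Chamfer variants use squared distances instead).

```python
import torch

def chamfer(a, b):
    """Symmetric Chamfer distance between point sets a: (N, 3) and b: (M, 3)."""
    d = torch.cdist(a, b)  # (N, M) pairwise Euclidean distances
    return d.min(dim=1).values.mean() + d.min(dim=0).values.mean()
```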

We define such maps as an advection process via a flow function $v(x, t)$, where we associate intermediate deformations with an interpolation parameter $t \in [0, 1]$. For any pair of shapes $(X_i, X_j)$ and any point $x \in \mathbb{R}^3$:

$f_{i \to j}(x) = x + \int_0^1 v_{i \to j}\big(x(t), t\big)\, dt, \quad \text{where } \frac{dx(t)}{dt} = v_{i \to j}\big(x(t), t\big), \; x(0) = x$    (3)
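To make the advection concrete, here is a minimal sketch of Eq. 3 using the torchdiffeq package's odeint integrator (the dopri5 solver referenced in Sec. B.2). FlowField is a hypothetical stand-in for the learned flow $v(x, t)$; the real model is additionally conditioned on latent codes, introduced in Sec. 3.1.

```python
import torch
import torch.nn as nn
from torchdiffeq import odeint  # pip install torchdiffeq

class FlowField(nn.Module):
    """Hypothetical time-conditioned flow v(x, t): R^3 x [0, 1] -> R^3."""
    def __init__(self, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(4, hidden), nn.Tanh(),
            nn.Linear(hidden, hidden), nn.Tanh(),
            nn.Linear(hidden, 3))

    def forward(self, t, x):
        t_col = t.expand(x.shape[0], 1)          # broadcast scalar time
        return self.net(torch.cat([x, t_col], dim=-1))

def deform(flow, vertices):
    """Advect vertices from t=0 to t=1 (Eq. 3); mesh connectivity is untouched."""
    t = torch.linspace(0.0, 1.0, 2)
    traj = odeint(flow, vertices, t, method="dopri5", rtol=1e-5, atol=1e-5)
    return traj[-1]                              # vertex positions at t = 1

new_verts = deform(FlowField(), torch.randn(512, 3))  # toy source vertices
```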
Intersection-free deformations

Not introducing self-intersections is a key property in shape deformation, since self-intersecting deformations are not physically plausible. In Proposition 1 (supplementary material), we prove that this property is algebraically satisfied in our formulation. Note that this property holds under the assumption of perfect integration. Errors in numerical integration will lead to its violation. However, we will empirically show in Sec. C.2 (supplementary material) that this can be controlled by bounding the numerical integration error.

Invertible deformations

For any pair of shapes, it would be ideal if performing a deformation of $X_i$ into $X_j$, and then back to $X_i$, recovered $X_i$ exactly. We want the deformation to be lossless for identity transformations, or, more formally, $f_{j \to i}(f_{i \to j}(x)) = x$. In Proposition 3 (supplementary material), we derive a condition on the flow that is sufficient to ensure bijectivity:

$v_{i \to j}(x, t) = -v_{j \to i}(x, 1-t)$    (4)

3.1 Deformation flow field

At the core of the learned deformations (3) is a learnable flow field $v$. We start by assigning latent codes $c_1, \dots, c_n$ to the shapes $X_1, \dots, X_n$, and then define the flow between shapes $X_i$ and $X_j$ as:

$v_{i \to j}(x, t) = h_\theta\big(x, c(t)\big) \cdot s\!\left(\frac{c_j - c_i}{\|c_j - c_i\|}\right) \cdot \|c_j - c_i\|, \quad c(t) = (1-t)\, c_i + t\, c_j$    (5)

where $\theta$ are the trainable parameters of a neural network. Note that the same deformation function can be shared for all pairs of shapes $(X_i, X_j)$, and that this flow satisfies the invertibility condition (4).
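The following sketch shows one way to realize Eq. 5 under our notational assumptions (LatentFlow, the backbone h, and the bias-free sign head are our own names). Note how the bias-free linear layer followed by an odd activation makes the sign factor antisymmetric, so the invertibility condition of Eq. 4 holds by construction.

```python
import torch
import torch.nn as nn

class LatentFlow(nn.Module):
    def __init__(self, latent_dim=128, hidden=256):
        super().__init__()
        self.h = nn.Sequential(                  # backbone h_theta(x, c(t))
            nn.Linear(3 + latent_dim, hidden), nn.Tanh(),
            nn.Linear(hidden, hidden), nn.Tanh(),
            nn.Linear(hidden, 3))
        # Bias-free layer + odd activation (tanh) => s(-u) = -s(u).
        self.sign = nn.Linear(latent_dim, 1, bias=False)

    def forward(self, x, t, ci, cj):
        delta = cj - ci
        dist = delta.norm() + 1e-8               # symmetric in (ci, cj)
        c_t = (1.0 - t) * ci + t * cj            # c(t) interpolates the codes
        v = self.h(torch.cat([x, c_t.expand(x.shape[0], -1)], dim=-1))
        s = torch.tanh(self.sign(delta / dist))  # antisymmetric sign factor
        return v * s * dist                      # magnitude ∝ latent distance
```

Swapping $(c_i, c_j)$ and replacing $t$ with $1-t$ leaves $h$ and the magnitude unchanged while flipping the sign factor, which is exactly Eq. 4.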

Flow function

The function $h_\theta$ receives as input the spatial coordinates $x$ and a latent code $c(t)$. When deforming from shape $X_i$ to shape $X_j$, the latent code $c(t)$ linearly interpolates between the two endpoints $c_i$ and $c_j$. $h_\theta$ is a fully-connected neural network with weights $\theta$.

Sign function

The sign function $s$ receives the normalized direction of the vector from $c_i$ to $c_j$. The sign function has the additional requirement that it be antisymmetric, $s(-u) = -s(u)$, which can be satisfied either by a fully-connected neural network with learnable parameters, zero bias, and an odd activation function (e.g., tanh), or by construction via the hub-and-spokes model of Section 3.2.

Flow magnitude

We scale the flow magnitude by the latent distance $\|c_j - c_i\|$. With this regularization, we ensure that the distance within the latent space is directly proportional to the amount of required deformation between two shapes, and obtain several properties:

  • Consistency of the latent space, which ensures deforming halfway from $c_i$ to $c_j$ is equivalent to deforming all the way from $c_i$ to the latent code halfway between $c_i$ and $c_j$: $x(t{=}\tfrac{1}{2};\, c_i, c_j) = x(t{=}1;\, c_i, \tfrac{c_i + c_j}{2})$.

  • Identity preservation: $f(X_i; c_i, c_i) = X_i$, since the flow magnitude $\|c_i - c_i\|$ vanishes.

Implicit regularization: volume conservation

By learning a divergence-free flow field for the deformation process, we show that the volume of any enclosed mesh is conserved through the deformation sequence; see Proposition 4 (supplementary material). While we could penalize divergence via a loss, resulting in approximate volume conservation, we show how this hard constraint can be implicitly and exactly satisfied without resorting to auxiliary loss functions. Based on Gauss's theorem, the volume integral of the flow divergence is equal to the surface integral of the flux, which amounts to zero for solenoidal flows. Additionally, any divergence-free vector field can be represented as the curl of a vector potential. This allows us to parameterize a strictly divergence-free flow field by first parameterizing a vector potential. In particular, we parameterize the flow as $v = \nabla \times \psi$, with $\psi$ parameterized by a fully-connected network. Since the curl operator is a series of first-order spatial derivatives, it can be efficiently calculated via a sum of first-order derivatives with respect to the input layer of $\psi$, computed through a single backpropagation step; refer to the architecture in Sec. B.1 (supplementary material).
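A minimal sketch of this construction follows: a small network parameterizes the vector potential $\psi$, and the flow is obtained as its curl via autograd. For clarity, this sketch uses one gradient call per Jacobian row rather than the fused single backward pass described above; the class name and layer sizes are illustrative assumptions, and time/latent conditioning is omitted for brevity.

```python
import torch
import torch.nn as nn

class DivergenceFreeFlow(nn.Module):
    def __init__(self, hidden=128):
        super().__init__()
        self.psi = nn.Sequential(                 # vector potential psi: R^3 -> R^3
            nn.Linear(3, hidden), nn.ELU(),       # smooth activation so the
            nn.Linear(hidden, hidden), nn.ELU(),  # curl is well defined
            nn.Linear(hidden, 3))

    def forward(self, x):
        x = x.detach().requires_grad_(True)       # leaf tensor for autograd
        p = self.psi(x)
        # Rows of the Jacobian d(psi_k)/dx; psi acts pointwise, so summing
        # each component over the batch yields per-point gradients.
        g = [torch.autograd.grad(p[:, k].sum(), x, create_graph=True)[0]
             for k in range(3)]
        # curl psi = (dpsi_z/dy - dpsi_y/dz,
        #             dpsi_x/dz - dpsi_z/dx,
        #             dpsi_y/dx - dpsi_x/dy); its divergence is identically zero.
        return torch.stack([g[2][:, 1] - g[1][:, 2],
                            g[0][:, 2] - g[2][:, 0],
                            g[1][:, 0] - g[0][:, 1]], dim=-1)
```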

Implicit regularization: symmetries

Given that many geometric objects have a natural plane/axis/point of symmetry, being able to enforce implicit symmetry is a desired quality for the deformation network. We can parameterize a symmetric flow function $v$ by first parameterizing an unconstrained flow $\tilde{v}$. Without loss of generality, assume the plane of symmetry is $x^{(1)} = 0$, and let $\bar{x} = (-x^{(1)}, x^{(2)}, x^{(3)})$ be the reflection of $x$ across that plane:

$v^{(1)}(x) = \tfrac{1}{2}\big(\tilde{v}^{(1)}(x) - \tilde{v}^{(1)}(\bar{x})\big), \qquad v^{(k)}(x) = \tfrac{1}{2}\big(\tilde{v}^{(k)}(x) + \tilde{v}^{(k)}(\bar{x})\big), \; k \in \{2, 3\}$    (6)

where the superscript $(k)$ denotes the $k$-th component of the vector output.
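A minimal sketch of this symmetrization under the same plane-of-symmetry assumption (symmetric_flow and raw_flow are our own names):

```python
import torch

def symmetric_flow(raw_flow, x):
    """Wrap an arbitrary flow so mirrored points receive mirrored flow vectors.

    raw_flow: callable mapping (N, 3) points to (N, 3) flow vectors.
    """
    x_ref = x * torch.tensor([-1.0, 1.0, 1.0])  # reflect across x^(1) = 0
    v, v_ref = raw_flow(x), raw_flow(x_ref)
    vx = 0.5 * (v[:, :1] - v_ref[:, :1])        # odd normal component (Eq. 6)
    vyz = 0.5 * (v[:, 1:] + v_ref[:, 1:])       # even tangential components
    return torch.cat([vx, vyz], dim=-1)
```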

Explicit regularization: surface metrics

Additionally, surface metrics such as rigidity and isometry can be explicitly enforced via an auxiliary term added to the overall loss function. A simple isometry constraint can be enforced by penalizing the change in edge lengths of the original mesh through the transformations, similar to the stretch regularization in gadelha2020deep; bednarik2019shape.
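A simple version of this edge-length penalty might look as follows (an illustrative helper, not the exact regularizer from gadelha2020deep; bednarik2019shape):

```python
import torch

def edge_length_loss(verts0, verts1, edges):
    """Penalize change in edge lengths under the deformation.

    verts0, verts1: (V, 3) original / deformed vertices; edges: (E, 2) indices.
    """
    e0 = (verts0[edges[:, 0]] - verts0[edges[:, 1]]).norm(dim=-1)
    e1 = (verts1[edges[:, 0]] - verts1[edges[:, 1]]).norm(dim=-1)
    return ((e1 - e0) ** 2).mean()
```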

Implementation

We use a modified version of IM-NET chen2019learning as the backbone flow model, adjusting the number of hidden and output nodes. We defer discussion of the model architecture and training details to Sec. B.1 (supplementary material).

3.2 Hub-and-spoke deformation

Given a set of training shapes $\{X_1, \dots, X_n\}$, we train the deformer by picking random pairs of shapes from the set. There are two strategies for learning the deformation: either directly deforming between each pair of shapes, or deforming each pair of shapes via a canonical latent shape corresponding to a "hub" latent code. We use an encoder-less approach (i.e., an auto-decoder park2019deepsdf), where we initialize a random latent code for each training shape. The latent codes $c_1, \dots, c_n$ are jointly optimized along with the network parameters $\theta$. Additionally, we define the "hub" latent vector as the zero vector $c_0 = \mathbf{0}$. Under the hub-and-spokes deformation model, the training process amounts to finding:

$\theta^*, \{c_i^*\} = \arg\min_{\theta, \{c_i\}} \sum_{i, j} \big[ D\big(f(X_i; c_i, c_0, c_j), X_j\big) + D\big(f(X_j; c_j, c_0, c_i), X_i\big) \big]$    (7)

where $f(\,\cdot\,; c_a, c_0, c_b)$ denotes the deformation from code $c_a$ to $c_b$ routed through the hub $c_0$. A visualization of the latent space learned via the hub-and-spokes model is shown in Fig. 1(b). With hub-and-spokes training, we can define the sign function (Sec. 3.1) to simply produce $-1$ on the path towards the zero hub and $+1$ on the path away from it, without the need for learned parameters.
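A minimal sketch of hub-and-spokes training is shown below, reusing the hypothetical LatentFlow and chamfer helpers from the earlier sketches. point_clouds (a list of per-shape surface-sample tensors) and all hyperparameters are illustrative assumptions; the advection uses a coarse fixed-step RK4 integration for brevity.

```python
import torch
from torchdiffeq import odeint

def advect(flow, pts, ci, cj):
    """Integrate the latent-conditioned flow from t=0 to t=1."""
    func = lambda t, x: flow(x, t, ci, cj)
    return odeint(func, pts, torch.linspace(0.0, 1.0, 5), method="rk4")[-1]

def deform_via_hub(flow, pts, ci, cj, hub):
    return advect(flow, advect(flow, pts, ci, hub), hub, cj)  # source->hub->target

latent_dim = 128
codes = torch.nn.Parameter(0.1 * torch.randn(len(point_clouds), latent_dim))
flow = LatentFlow(latent_dim)
hub = torch.zeros(latent_dim)                         # the zero "hub" code
opt = torch.optim.Adam(list(flow.parameters()) + [codes], lr=1e-3)

for step in range(10000):
    i, j = torch.randint(len(point_clouds), (2,)).tolist()  # random pair
    pi, pj = point_clouds[i], point_clouds[j]
    loss = (chamfer(deform_via_hub(flow, pi, codes[i], codes[j], hub), pj)
            + chamfer(deform_via_hub(flow, pj, codes[j], codes[i], hub), pi))
    opt.zero_grad(); loss.backward(); opt.step()      # joint update (Eq. 7)
```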

3.3 Encoder-less embedding

We adopt an encoder-less scheme both for learning the deformation space and for embedding new observations into it. After acquiring a learned deformation space by training with the hub-and-spokes approach, we can embed a new point cloud observation $O$ into the learned latent space by optimizing for the latent code that minimizes the deformation error from random shapes in the original deformation space to the new observation. Again, this "embedding via optimization" approach is similar to the auto-decoder approach in park2019deepsdf. The embedding of a new point cloud amounts to seeking:

$c_O^* = \arg\min_{c_O} \sum_i D\big(f(X_i; c_i, c_0, c_O), O\big)$    (8)
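A minimal sketch of this embedding step, reusing the hypothetical deform_via_hub and chamfer helpers above (observation is a point cloud tensor; the 30-iteration budget follows Sec. B.2):

```python
import torch

def embed(flow, codes, hub, point_clouds, observation, iters=30):
    """Optimize only a new code c_new so deformed training shapes fit the observation."""
    c_new = torch.nn.Parameter(0.1 * torch.randn(codes.shape[1]))
    opt = torch.optim.Adam([c_new], lr=1e-3)
    for _ in range(iters):
        i = torch.randint(len(point_clouds), (1,)).item()
        pred = deform_via_hub(flow, point_clouds[i],
                              codes[i].detach(), c_new, hub)
        loss = chamfer(pred, observation)             # Eq. 8
        opt.zero_grad(); loss.backward(); opt.step()
    return c_new.detach()
```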

4 Experiments

(Figure 2 columns, left to right: Input Points, GT mesh, 3D-R2N2 choy20163d, PSGN fan2017point, DMC liao2018deep, ONet mescheder2019occupancy, ShapeFlow, Retrieved.)

Figure 2: Qualitative comparison of mesh reconstruction from sparse point clouds as inputs. The shapes generated by ShapeFlow preserve CAD-like geometric features (i.e., style) while faithfully aligning with the input observations (i.e., structure). The retrieved model is overlaid with the deformed model.

4.1 ShapeNet deformation space

As a first experiment, we learn the deformation space for entire classes of shapes from ShapeNet chang2015shapenet, and illustrate two downstream applications of such a deformation space: shape generation by deformation, and shape canonicalization. Specifically, we experiment on three representative shape categories in ShapeNet: chair, airplane, and car. For each category, we follow the official train/test/validation split. We preprocess the geometries into watertight manifolds using the preprocessing pipeline in mescheder2019occupancy, and further simplify the meshes to a fraction of the original number of vertices using fastquadric. The deformation space is learned by deforming random pairs of objects using the hub-and-spokes approach (Section 3.2). More training details can be found in Section B.2 (supplementary material).

4.1.1 Surface reconstruction by template deformation

The learned deformation space can be used for reconstructing objects from input observations. A schematic for this process is provided in Fig. 1: a new observation, in the form of a point cloud, is first embedded into the latent deformation space according to Eqn. 8. The top-$k$ nearest training shapes in the latent space are then retrieved and deformed to match the observation. During this step we further fine-tune the network parameters to better fit the observed point cloud.

Task definition

We seek to reconstruct a complete object given a (potentially incomplete) sparse input point cloud. Following mescheder2019occupancy, we subsample points from the mesh surfaces and add Gaussian noise to the point samples. As measures of reconstruction quality, we report the volumetric Intersection-over-Union (IoU), the Chamfer distance, and a normal consistency metric.

Results

We benchmark against various state-of-the-art shape generation models that output voxel grids (3D-R2N2 choy20163d), point sets (PSGN fan2017point), mesh surfaces (DMC liao2018deep), and implicit surfaces (ONet mescheder2019occupancy); see quantitative results in Table 1. Qualitative comparisons between the generated geometries are illustrated in Figure 2. Note that our shape deformations are more constrained (i.e., less expressive) than traditional auto-encoding/decoding, resulting in slightly lower metrics (Table 1). However, ShapeFlow produces visually appealing results (Figure 2), as the retrieved shapes are of CAD quality and fine geometric details are preserved by the deformation.

           Chamfer (↓)                              IoU (↑)                                 Normal Consistency (↑)
category   DMC    ONet   PSGN   R2N2   ShapeFlow   DMC    ONet   PSGN   R2N2   ShapeFlow   DMC    ONet   PSGN   R2N2   ShapeFlow
airplane   0.0969 0.0711 0.0976 0.1525 0.0858      0.5762 0.7158 -      0.4453 0.6156      0.8134 0.8857 -      0.6546 0.8387
car        0.1729 0.1218 0.1294 0.1949 0.1388      0.7182 0.8029 -      0.6728 0.6644      0.8222 0.8647 -      0.6979 0.7690
chair      0.1284 0.1302 0.1756 0.1851 0.1888      0.6250 0.6513 -      0.5166 0.4390      0.8348 0.8593 -      0.6599 0.7647
mean       0.1328 0.1077 0.1342 0.1775 0.1378      0.6398 0.7233 -      0.5449 0.5730      0.8235 0.8699 -      0.6708 0.7908
Table 1: Quantitative evaluation of shape reconstruction performance for ShapeNet chang2015shapenet models. (PSGN produces point sets, for which IoU and normal consistency are not reported.)

4.1.2 Canonicalization of shapes

An additional property of the deformation space learned through the hub-and-spokes formulation is that it naturally yields an aligned canonical deformation of all shapes. The canonical deformation corresponds to the zero latent code of the hub: for shape $X_i$ it is simply the deformation of $X_i$ from its latent code $c_i$ to the hub latent code $c_0$. Dense correspondences between shapes can then be acquired by searching for the nearest point on the opposing shape in the canonical space. For a point $p \in X_i$, the corresponding point $q \in X_j$ is found as:

$q = \arg\min_{q' \in X_j} \big\| f(p; c_i, c_0) - f(q'; c_j, c_0) \big\|_2$    (9)
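In code, the correspondence search of Eq. 9 reduces to a nearest-neighbor query in the canonical pose (a sketch reusing the hypothetical advect helper; vertex samples stand in for the full meshes):

```python
import torch

def correspondences(flow, pts_i, pts_j, ci, cj, hub):
    can_i = advect(flow, pts_i, ci, hub)      # canonicalize X_i at the hub
    can_j = advect(flow, pts_j, cj, hub)      # canonicalize X_j at the hub
    nn_idx = torch.cdist(can_i, can_j).argmin(dim=1)
    return nn_idx                             # pts_j[nn_idx[k]] matches pts_i[k]
```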


Figure 3: Unsupervised correspondences. Shapes' RGB colors correspond to the coordinates of each point in the original / canonical space.
Figure 4: Deformation via parametric controllers Schulz:2017 compared to the ShapeFlow interpolation.
Evaluation metric

To quantitatively evaluate the quality of such surface correspondences learned in an unsupervised manner, we propose the Semantic Matching Score (SMS). While ground-truth semantic correspondences between shapes do not exist, semantic part labels are provided in various shape datasets, including ShapeNet. Denote by $l(p)$ the semantic label of a point $p$, and let $\delta(\cdot, \cdot)$ be a label comparison operator that evaluates to one if the categorical labels are the same and zero otherwise. We define the SMS between $(X_i, X_j)$ as:

$\mathrm{SMS}(X_i, X_j) = \frac{1}{|\mathcal{V}_i|} \sum_{p \in \mathcal{V}_i} \delta\big(l(p),\, l(q_p)\big)$    (10)

where $q_p \in X_j$ is the correspondence of $p$ from Eqn. 9.

We choose 10,000 random pairs of shapes in the chair category to compute semantic matching scores.
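Given per-point part labels and the correspondence indices from the sketch above, the SMS of Eq. 10 is a one-line average (an illustrative helper under our notational assumptions):

```python
import torch

def semantic_matching_score(labels_i, labels_j, nn_ij):
    """labels_*: (N,) integer part labels; nn_ij: matches of X_i's points into X_j."""
    return (labels_i == labels_j[nn_ij]).float().mean().item()
```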

Domain SMS
ShapeNet 0.779
ShapeFlow 0.816
Results

We first visualize the canonicalization and surface correspondences of shapes in the deformation space in Fig. 3. We compare the semantic matching score of our learned dense correspondence function with the naive baseline of nearest-neighbor matching in the original (ShapeNet) shape space; the results are presented in the inset table. The shapes align better in the canonical pose, and the matches found by canonical-space matching are more semantically correct, especially between shapes that are poorly aligned in the original space due to different aspect ratios (e.g., a couch and a bar stool). This is reflected in the improved SMS.

4.2 Human deformation animation

ShapeFlow can be used to produce smooth animated deformations between pairs of 3D geometries. These animations are subject to the implicit and explicit constraints for volume and isometry conservation; see Section 3.1. To test the quality of such animated deformations, we choose two relatively distinct SMPL poses SMPL:2015 and produce continuous deformations for the in-between frames. Since dense correspondences between the shapes are given, we change the distance metric in Eqn. 2 to the pairwise $L_2$ norm between corresponding vertices. We supervise the deformation with 5 intermediate frames produced via linear interpolation. Denoting the geometries at the two endpoints as $X_A$ and $X_B$, with latent codes $c_A$ and $c_B$, the deformation at intermediate step $t \in [0, 1]$ is:

$X(t) = f\big(X_A;\, c_A,\, (1-t)\, c_A + t\, c_B\big)$    (11)
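Generating the in-between frames then amounts to advecting the source geometry toward interpolated latent codes (a sketch reusing the hypothetical advect helper; flow, verts_A, cA, cB come from the earlier sketches):

```python
import torch

def interpolate(flow, verts_A, cA, cB, t):
    c_t = (1.0 - t) * cA + t * cB            # interpolated target code (Eq. 11)
    return advect(flow, verts_A, cA, c_t)

frames = [interpolate(flow, verts_A, cA, cB, t)
          for t in torch.linspace(0.0, 1.0, 7)]   # key frames for animation
```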
Figure 5: Animation of human figures via ShapeFlow deformation. (left) Intermediate poses interpolated between frames. (right) Volume change of the intermediate meshes, showing that our divergence-free flows conserve volume throughout the deformation.
Results

We present the results of this deformation in Figure 5. We compare several cases: direct linear interpolation, deformation using an unconstrained flow, a volume-constrained flow, and a volume- and edge-length-constrained flow model. The volume change curve in Figure 5 empirically validates our theoretical results from Section 3.1: (1) a divergence-free flow conserves the volume of a mesh through the deformation process, and (2) the flow prevents self-intersections of the mesh, as in the example in Figure 5. Furthermore, we find that explicit constraints, such as the edge-length constraint, reduce surface distortions.
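For reference, volume curves such as those in Figure 5 can be monitored with the standard signed-volume formula for watertight triangle meshes (the divergence theorem applied per face tetrahedron); this helper is our own illustration, not the authors' evaluation code:

```python
import torch

def mesh_volume(verts, faces):
    """Signed volume of a watertight triangle mesh: (1/6) * sum (v0 x v1) . v2."""
    v0, v1, v2 = verts[faces[:, 0]], verts[faces[:, 1]], verts[faces[:, 2]]
    return (torch.cross(v0, v1, dim=-1) * v2).sum() / 6.0
```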

4.3 Comparison with parametric deformations

As a final experiment, we compare the unsupervised deformations acquired using ShapeFlow with interpolations of parametric CAD models. We use an exemplar parametric CAD model from Schulz:2017; see Figure 4. ShapeFlow produces novel intermediate shapes of CAD-level geometric quality that are consistent with those produced by interpolating a parametric model.

5 Conclusions and future work

ShapeFlow is a flow-based model capable of building high-quality shape spaces using deformation flows. We analytically show that ShapeFlow prevents self-intersections, and provide ways to regularize volume, isometry, and symmetry. ShapeFlow can be applied to reconstruct new shapes via the deformation of existing templates. A main limitation of the current framework is that it does not incorporate semantic supervision for matching shapes. Future directions include analyzing part structures of geometries by grouping similar vector fields xu2011detecting, and exploring semantics-aware deformations. Furthermore, ShapeFlow may be used for the inverse problem of inferring a solenoidal flow field from tracer observations willert1991digital, an important problem in engineering physics.

Broader impact

The work has broad potential impact within the computer vision and graphics community, as it describes a novel methodology that enables a range of new applications, from animation to novel content creation. We have discussed the potential future directions the work could take in Sec. 5.

On the broader societal level, this work remains largely academic in nature, and does not pose foreseeable risks regarding defense, security, and other sensitive fields.

References

Appendix A Mathematical proofs and derivations

Proposition 1 (Intersection-free).

Any deformation map in $d$ spatial dimensions, induced via a spatio-temporal continuous flow function $v(x, t)$, cannot induce self-intersection of a continuous manifold throughout the entire deformation process.

Proof.

Let $x_1, x_2$ be two distinct points on the manifold, such that:

$x_1(0) \neq x_2(0)$    (12)

Assume, for contradiction, that the two points intersect at time $T$: $x_1(T) = x_2(T)$. The location at time $0$ can be found by integrating the flow backwards from $T$:

$x_k(0) = x_k(T) - \int_0^T v\big(x_k(t), t\big)\, dt, \quad k \in \{1, 2\}$    (13)

Since the backward trajectories start from the same point $x_1(T) = x_2(T)$ and follow the same flow, they coincide, yielding $x_1(0) = x_2(0)$, which contradicts Eq. 12. ∎

Proposition 2.

$f_{j \to i}(f_{i \to j}(x)) = x$ and $f_{i \to j}(f_{j \to i}(y)) = y$ for all $x, y$ is a sufficient condition for bijectivity.

Proof.

By respectively setting $y = f_{i \to j}(x)$ and $x = f_{j \to i}(y)$, we obtain:

$y = f_{i \to j}(x)$    (14)
$x = f_{j \to i}(y)$    (15)

We now substitute (14) into (15) (and analogously for the reverse direction), showing that:

$f_{j \to i}\big(f_{i \to j}(x)\big) = x$    (16)
$f_{i \to j}\big(f_{j \to i}(y)\big) = y$    (17)

are nothing else than the conditions that $f_{i \to j}$ is invertible with inverse $f_{j \to i}$, i.e., the bijectivity conditions. ∎

Proposition 3 (Bijectivity condition on $v$).

A sufficient condition on the flow function for deformation bijectivity is $v_{i \to j}(x, t) = -v_{j \to i}(x, 1-t)$.

Proof.

We start by substituting (3) (and the equivalent advection integral for $f_{j \to i}$) into (16), employing Proposition 2:

$f_{j \to i}\big(f_{i \to j}(x)\big) = x + \int_0^1 v_{i \to j}\big(x(t), t\big)\, dt + \int_0^1 v_{j \to i}\big(y(t), t\big)\, dt$    (18)

where $y(t)$ is the return trajectory starting at $y(0) = f_{i \to j}(x)$. Applying the change of variables $t \to 1-t$ to the second integral, the return trajectory retraces the forward trajectory, $y(t) = x(1-t)$, and the two integrals cancel exactly whenever:

$v_{i \to j}(x, t) = -v_{j \to i}(x, 1-t)$    (22)

so that (16) is satisfied, and analogously (17). ∎

Proposition 4 (Volume conservation).

Suppose $\Omega(t)$ is a compact subset of $\mathbb{R}^3$ with three-dimensional volume $V(t)$ and surface boundary $\partial\Omega(t)$. Given a deformation map in three spatial dimensions, induced via a divergence-free (i.e., solenoidal) spatio-temporal continuous flow function $v(x, t)$ with $\nabla \cdot v = 0$, the volume within the deformed boundary remains constant.

Proof.

The rate of volume change equals the flux across the boundary which, per the divergence theorem, integrates to zero:

$\frac{dV}{dt} = \oint_{\partial\Omega(t)} v \cdot n\; dS = \int_{\Omega(t)} (\nabla \cdot v)\; dV = 0$    (23)
∎

Theorem 1 (Existence of vector potential).

If $v$ is a vector field on $\mathbb{R}^3$ with $\nabla \cdot v = 0$, then there exists a vector field $\psi$ with $v = \nabla \times \psi$.

Proof.

This follows from the fundamental theorem of vector calculus (Helmholtz decomposition), together with the vector identity $\nabla \cdot (\nabla \times \psi) = 0$. ∎

Appendix B Implementation details

B.1 Neural architecture

Figure 6: Backbone flow architecture

We employ a variant of the IM-NET chen2019learning architecture as our backbone. The complexity of the flow model is parameterized by the number of feature layers and the dimensionality of the latent space. In the case with no implicit regularization, we directly use the IM-NET backbone as the flow function. In the case with implicit volume or symmetry regularization, we use the backbone to parameterize the vector potential $\psi$ (or the unconstrained flow $\tilde{v}$, respectively); see Figure 6 for a schematic of the backbone. We do not learn an encoder, and instead use an encoder-less scheme (Sec. 3.3) for training as well as for embedding new observations into the deformation space.

B.2 Training details

ShapeNet deformation space (Sec. 4.1)

For training ShapeFlow to learn the ShapeNet deformation space, we use the ReLU activation function in the backbone flow model, and train with the Adam optimizer across 8 GPUs. We compute the time integration using the dopri5 solver with tight relative and absolute error tolerances. We sample 512 points on each mesh as a proxy for computing the point-to-point distance. We enforce the symmetry condition on the deformations. We do not enforce the isometry and volume conservation conditions, since they do not apply to the shape categories in ShapeNet. For the reconstruction experiment (Sec. 4.1.1) we use a compact latent space, as this allows better clustering of similar geometries, improving retrieval quality. For the canonicalization experiment (Sec. 4.1.2) we use a larger latent dimension, since it mitigates distortions at the canonical pose.

Furthermore, after training the deformation space, we embed a new observation by initializing a random latent code and optimizing it with the Adam optimizer for 30 iterations. We then fine-tune the neural network on the top-5 retrieved nearest neighbors for an additional 30 iterations.

Human deformation animation (Sec. 4.2)

For human model deformation, since we are only learning the flow function for two (or a handful of) shapes, we can afford a more lightweight backbone flow model. We use the ELU activation, since it is continuously differentiable, allowing us to parameterize a volume-conserving, divergence-free flow function. We use the Adam optimizer. For improved speed, we use the Runge-Kutta-4 (RK4) ODE solver with 5 intermediate time steps. For the best-performing result, we use the divergence-free parameterization as well as an edge-length loss term. We optimize for 1000 steps.

Appendix C Additional analysis and visualization

Figure 7: Random examples of shapes in the deformation space. The diagonal shapes (in green) are the identity transformations for the shapes. The identity transformations are able to preserve the original geometric details almost perfectly, highlighting the identity preservation capability of ShapeFlow.
Figure 8: Deformation of shape examples in the deformation space via an RK4 ODE solver.

C.1 Deformation examples

We show additional examples of deformations between random pairs of shapes in the deformation space in Fig. 7. We draw random subsets of 5 shapes at a time, and plot the pairwise deformations of the shapes in a grid. One takeaway is that when the source and target are identical, the transformation amounts to an identity transformation: by transforming the shape to and back from the "hub", the geometric details are almost exactly preserved.

C.2 Effects of integration scheme

We further study the impact of the ODE solver scheme on shape deformation. We note that the ShapeNet deformation space involves many more shapes than the human frame interpolation case, and therefore much more drastic deformations. A fixed-step solver, such as the RK4 solver, is not able to accurately compute the dynamics of the individual points. Numerical error accumulated during the integration step leads to violations of the non-self-intersection and identity-preservation properties, resulting in dramatically unsatisfactory deformations between shapes.