Computer graphics has reached impressively high standards in representation and rendering of 3D scenes, regularly achieving photo-realism. As a consequence, creating 3D models of matching quality has become a serious challenge, making content creation a major bottleneck in practice.
Data-driven methods are a promising avenue towards addressing this problem. The reuse of existing content, such as models available in large online databases, might become a viable option for reducing content creation costs in the future. To be useful as a creative tool, the goal is not just to copy existing models like clip art, but to be able to maneuver within the space spanned by the examples and synthesize new shapes of related structure.
An important low-level problem for building such navigable shape spaces is the correspondence problem: We need to determine which parts of objects are equivalent and which surface points have to be matched, establishing dense correspondences. Most statistical analysis techniques for building parametrized shape spaces require such a dense prior alignment as input [Blanz and Vetter, 1999, Allen et al., 2003, Hasler et al., 2009]. The results crucially depend on the quality of the correspondences: Inaccurate and drifting correspondences yield bad shape spaces. In such spaces, sampling and interpolation yield implausible results (see the accompanying video at http://cg.cs.uni-bonn.de/en/people/dipl-inform-oliver-burghard/compact-part-based-shape-spaces-for-dense-correspondences/ for a visualization). In other words, such models fail to generalize beyond the input data. The problem can be reduced by using large training sets to learn rather low-dimensional shape spaces. This averages out drift but also reduces the accuracy; only low frequency bands of the geometry are still predicted (see for example the results in [Hasler et al., 2009] obtained from almost 2000 shapes).
Hence, good correspondences are a key requirement for building useful and informative shape spaces. The correspondence problem comes in two flavors: Global and local matching. Local matching requires a rough initialization and refines it but is prone to getting stuck in local optima. Global methods are complementary: They aim at providing the initialization, but are usually unable to compute detailed and dense solutions. Our paper addresses the local problem: We want to find good dense correspondences given a coarse initialization. Specifically, we address this problem for the case of shapes that have strongly varying geometry, as typically required for learning shape spaces.
Computing correspondences among rather “similar” shapes is a problem that is by now already quite well understood. Variants of the ICP algorithm handle local alignment [Rusinkiewicz and Levoy, 2001] for both rigid and deformable models. Deformable ICP employs differential deformation priors such as elasticity [Hähnel et al., 2003], isometry [Bronstein et al., 2006, Ovsjanikov et al., 2010], conformal maps [Lévy et al., 2002, Kim et al., 2011], or thin-plate-splines [Allen et al., 2003, Brown and Rusinkiewicz, 2007]. These approaches model the behavior of infinitesimally small portions of the object: For example, elasticity penalizes local stretch and bending, and thin-plate splines optimize for smooth deformations.
The problem with differential deformation models is that their assumptions are often not justified when considering shape families with substantial geometric variability, such as a diverse collection of four-legged animals (Figure 1). Elastic models can capture pose changes of a single shape reasonably well [Hähnel et al., 2003, Li et al., 2008], but matching objects of different proportions creates strong artifacts (see Figure 9c). Thin-plate splines [Allen et al., 2003, Brown and Rusinkiewicz, 2007] are more flexible but their bias towards affine mappings still causes very noticeable artifacts (Figure 9d). In both cases, reducing the weight of the regularization reduces bias but also increases noise and drift in the correspondences.
Isometry and conformal maps are by design already quite rigid: Both are already fixed by three point-to-point correspondences (for spherical topology), which is very valuable for solving the global matching problem efficiently [Lipman and Funkhouser, 2009]. These models are again useful for modeling pose changes, but shape sets of largely varying geometry are very unlikely to fall into the prescribed, low-dimensional sub-manifold of matchable shapes. Blending between partial maps [Kim et al., 2011] can reduce the problem, but substantial bias persists (see Figure 9e).
Overall, computing dense correspondences among shapes of strongly varying geometry remains a problem that is mostly unsolved. The conceptual problem is that we need an effective notion of similarity that does not yet prescribe very specific geometric properties. Supervised machine learning from user-annotated examples [Kalogerakis et al., 2010, van Kaick et al., 2011, Sunkel et al., 2013] has shown promising results for establishing coarse correspondences. However, it cannot be easily extended to the dense case because it is very difficult, if not impossible, for a human to prescribe accurate dense correspondences for the training data.
This observation is a major motivation for our paper: We assume that coarse annotations are available. In addition to existing coarse matching methods [Huang et al., 2011, Sidi et al., 2011], we can always resort to manual human labeling. However, this is impossible for dense matches. We therefore develop a new method to get high-quality dense correspondences from a sparse and inaccurate initialization.
Towards this end we build upon another recent idea: Correspondence extraction from shape collections. By considering many shapes of a similar kind simultaneously, more information is available. Several recent papers employ the cycle consistency constraint to build correspondences in shape collections [Nguyen et al., 2011, Huang et al., 2012, Kim et al., 2012]: Correspondences are usually understood as a point-wise equivalence relation, being transitive over multiple shapes. Thus, unclosed loops indicate errors in pairwise matches that can be detected and removed. As pairwise regularizer, near-isometry [Nguyen et al., 2011, Huang et al., 2012] or (optionally) extrinsic shape similarity [Kim et al., 2012] are employed. However, this implicitly assumes that the shapes in the collection are dense samples of a continuous manifold of shapes, i.e., nearby samples are intrinsically very similar. This is not always the case in practical shape sets and thus introduces, as we will demonstrate experimentally, substantial artifacts.
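The cycle consistency constraint can be illustrated with a minimal sketch. It assumes that each pairwise map is stored as a plain list of point indices, which is a hypothetical representation chosen for illustration and not the data structure of the cited methods. Composing the maps around a loop of shapes should return every point to itself; points for which the loop does not close indicate an error in at least one pairwise match.

```python
def compose(*maps):
    """Compose index maps left to right: point i is sent through maps[0], then maps[1], ..."""
    def apply(i):
        for m in maps:
            i = m[i]
        return i
    return [apply(i) for i in range(len(maps[0]))]


def unclosed_loops(a_to_b, b_to_c, c_to_a):
    """Points of shape A that do not return to themselves after A -> B -> C -> A."""
    cycle = compose(a_to_b, b_to_c, c_to_a)
    return [i for i, j in enumerate(cycle) if i != j]


# Hypothetical toy maps between three shapes with four sample points each.
a_to_b = [1, 2, 0, 3]
b_to_c = [0, 1, 2, 3]
c_to_a = [2, 0, 1, 3]        # closes every loop
c_to_a_bad = [2, 0, 3, 1]    # two matches are swapped

consistent = unclosed_loops(a_to_b, b_to_c, c_to_a)
inconsistent = unclosed_loops(a_to_b, b_to_c, c_to_a_bad)
print(consistent, inconsistent)
```

Note that an unclosed loop only localizes an error to the cycle as a whole; deciding which of the pairwise maps is wrong requires additional evidence, such as the pairwise regularizers mentioned above.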
In this paper, we therefore improve this model by explicitly regarding correspondence estimation as optimization of shape spaces, aiming at capturing the class of observed models well. This can be understood as a statistical learning problem: A good explanation for a phenomenon is one that not only fits the observed data tightly but that is also simple [Duda et al., 2000]. It is trivial to fit a large number of observations with a highly flexible model with lots of parameters (overfitting). However, making accurate predictions with a small and concise model makes such a hypothesis statistically meaningful.
Matching shapes of widely varying geometry forces us to choose mappings from a very large and sufficiently flexible set. However, from this large set, we aim at picking the simplest, most compact representation: The model should minimize the degrees of freedom utilized for representing the various shapes, rendering accidental matching unlikely. Only natural correspondences will create simple shape spaces because they arise from a hidden, simple explanation for the observed geometric variability. Technically, this is formalized by minimizing the description length (MDL) of objects created by a Gaussian generative probabilistic model on a linear shape space. This approach was originally developed in computer vision and medical imaging [Kotcheff and Taylor, 1998, Davies et al., 2002a].
In order to extend the applicability to a spectrum of typical computer graphics problems, we extend the original idea: First, we adapt the representation to handle meshes of generic topology. Second, we introduce a part-based representation that permits modeling correspondences across shapes of varying topology, interpreting each shape as an assembly of dockable, deformable parts. This allows us to learn a larger class of such composite models with both continuous (part deformation) and discrete (part assembly) variations. In particular, we introduce a novel algorithm to synthesize seamless and continuous models for assemblies of parts. Finally, the part-based approach yields high quality results: It decouples correlations between distant parts, which permits learning of expressive shape spaces with fewer examples, and with higher-quality correspondences.
In summary, we make the following main contributions: First, we introduce compact shape spaces for correspondence optimization to graphics, and demonstrate that this approach has a substantial impact on correspondence quality. Second, in order to make the method applicable to general meshes, we develop a new algorithm that can handle manifold meshes of generic topology while still maintaining meshing quality (uniform sampling and avoiding fold-overs). Third, we introduce a part-based formulation that represents shapes of variable topology; in particular, we describe new analysis and synthesis algorithms for composite shapes. We show that the part-based approach also improves the quality of the results over global optimization. Finally, as an example application, we demonstrate an interactive system for designing deformable shapes with continuous and discrete variability.
2. Related Work
In this section, we discuss previous work on compactness of shape spaces, complementary to generic correspondence estimation methods already discussed above. The concept originates from studying point distribution models such as active shape/appearance models [Cootes et al., 1995] that build generative Gaussian models of variability in images.
For model optimization, Hill and Taylor [Hill and Taylor, 1994] have proposed compactness as a criterion, and modeled this as the total variance of the shape distribution. Kotcheff and Taylor [Kotcheff and Taylor, 1998] employ normal-distribution entropy, which creates sparse representations. Ericsson and Åström [Ericsson and Åström, 2003] derive a gradient for the MDL energy, replacing the rather slow genetic algorithms and simplex methods by more efficient gradient descent [Heimann et al., 2005].
The approach can be combined with surface parametrization [Davies et al., 2002b, Heimann et al., 2005, Davies et al., 2010] to handle manifolds and guarantee bijectivity; however, this restricts the topology to the spherical case.
Cates et al. [Cates et al., 2006] extend the approach to regularly sampled point-based representations of manifolds, handling the sampling uniformity by an elegant complementary entropy term. This approach also removes the topological restrictions but does not yield continuous, bijective mappings between meshes.
Our technique builds upon the entropy-based approach of Kotcheff and Taylor [Kotcheff and Taylor, 1998]. Unlike previous methods, we use a smooth implicit representation of input meshes and parametrize the correspondences over a single such shape of general topology. We enforce regular and uniform meshing by a bi-Laplacian regularizer and dynamic resampling [Botsch and Kobbelt, 2004]. Our representation automatically ensures cycle-consistent correspondences and permits handling of meshes of general topology while maintaining meshing quality (in practice, also effectively avoiding fold-overs). Further, the smoothness of the representation allows us to employ an efficient quasi-Newton method for optimization.
A problem of straightforward Gaussian MDL models is that they create bias towards a linear representation of global shape rather than aligning surface features. Thodberg and Olafsdottir [Thodberg and Olafsdottir, 2003] address this by adding a curvature-matching error. Our part-based approach can be seen as an alternative and complementary measure to limit such artifacts by providing localized adaptivity. In addition, it permits more flexibility in analyzing and representing composite shapes, which none of the previous methods provide.
A second, orthogonal problem is the global nature of the statistics. The model tends to overfit correlations between unrelated parts. For example, the poses of the arms in a human model are mostly independent, but excessive training data is required for a PCA model to recognize this. For this reason, many approaches have used part-based formulations [Blanz and Vetter, 1999, Zhang et al., 2004, Feng et al., 2008, Tena et al., 2011]. Our main contribution in this respect is that our analysis algorithm optimizes such models automatically. As a convenient by-product of the part-based correspondence optimization, our method optimizes the boundaries of the segmentation automatically given only a very coarse initialization. Further, our synthesis method works in the gradient-domain and thereby provides improved smoothness across boundaries in comparison to previous spatial domain methods [Tena et al., 2011].
3. Creating Compact Shape Spaces
In this section, we describe the basic method for optimizing shape correspondences with the objective of creating compact shape spaces. We here first discuss the case of each shape consisting of a single part only; composite, part-based shape spaces will be discussed later, in Section 4.
Input: In the following, let be a set of 3D shapes. We assume that these are smooth, compact 2-manifolds embedded in . The topology can be arbitrary but has to be fixed across all input shapes for now (by assembling multiple such parts, this requirement can be relaxed later). In practice, the shapes are discretized as triangle meshes. We denote the corresponding vertices by ; each is a matrix formed by the vector of the individual vertices. The set of triangles of each mesh is denoted by.
3.1. Linear Shape Spaces
First, we define the generative process: Let be a urshape, i.e., a base shape that has the same topology as each of the input shapes and that serves as parametrization domain for the shape space. This space is formed by the mappings:
For each vector and each , the function returns a point on the generated shape. We assume the generative process to be linear. These shapes can be described by coordinates in an orthogonal basis. For a , we have:
where the function encodes the mean shape and are orthogonal basis functions that describe the possible linear modes of variation. In our implementation, we use (as do most others) the mean shape as urshape, i.e., . In practice, will be approximated by a triangle mesh of vertices. We denote the matrix of the vertices of the mesh by , and denote the created meshes by , and the continuous version by , respectively.
We equip the shape space with a Gaussian probability measure with an axis-aligned neg-log-likelihood
specifies the standard deviations along the main axes of the model.
Further, we will usually consider the space of shapes generated by and then rigidly arranged in . Given and , we denote a rigidly transformed shape (in slight abuse of notation) by:
Building the model: Given a set of input shapes and correspondences between them, we can easily build Gaussian shape spaces using principal component analysis (PCA): Assume that we are given a set of consistently triangulated vertex meshes that match the input with vertex correspondence, i.e., corresponding vertices located at matching geometry (we use the star to denote known correspondences). We compute the mean by averaging the input shapes and determine the covariance matrix
The mean and the eigenvectors of yield the mean and basis meshes, and the eigenvalues correspond to the standard deviations. Further, it is easy to see (for example, by applying a singular value decomposition and rearranging terms) that the Gram matrix has the same eigenvalue spectrum (up to the factor ). In the continuous case, the sum is replaced by an integral. Assume that we have homeomorphisms that encode continuous correspondences to our input shapes . We then again form the mean function by averaging and the Gram matrix:
The matrix has at most rank ; in the (typical) case of redundancy in the shape collection, the number of significant eigenvalues will typically be substantially smaller than . Importantly, the spectrum does not just depend on the geometry of but crucially on the correspondences encoded in the functions . While the variability of the shapes prescribes a lower bound on the rank of , we can in general artificially inflate it up to full rank by just letting the correspondences drift randomly along the surface.
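The relation between the covariance matrix and the Gram matrix can be checked numerically with a small sketch. It assumes that both matrices are normalized by the number of shapes (so the scale factor between the spectra is 1 in this sketch) and uses made-up toy coordinates. Instead of an eigendecomposition, it compares the traces of matrix powers, which are the power sums of the eigenvalues. Neither matrix has more than n non-zero eigenvalues for n shapes, so agreement of the first n power sums implies that the non-zero spectra coincide.

```python
def center(shapes):
    """Subtract the mean shape from every shape (each shape is a flat coordinate list)."""
    n, d = len(shapes), len(shapes[0])
    mean = [sum(s[j] for s in shapes) / n for j in range(d)]
    return [[s[j] - mean[j] for j in range(d)] for s in shapes]


def matmul(a, b):
    """Dense matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def power_sums(m, kmax):
    """Traces of m, m^2, ..., m^kmax, i.e. the power sums of the eigenvalues of m."""
    sums, p = [], m
    for _ in range(kmax):
        sums.append(sum(p[i][i] for i in range(len(p))))
        p = matmul(p, m)
    return sums


# Hypothetical toy data: three shapes with three 2D vertices each, in correspondence.
shapes = [
    [0.0, 0.0, 1.0, 0.0, 1.0, 1.0],
    [0.0, 0.0, 2.0, 0.0, 2.0, 1.5],
    [0.0, 0.0, 1.5, 0.0, 1.5, 0.5],
]
n = len(shapes)
x = center(shapes)                        # n x d matrix of centered shapes
xt = [list(col) for col in zip(*x)]       # d x n transpose
covariance = [[v / n for v in row] for row in matmul(xt, x)]   # d x d
gram = [[v / n for v in row] for row in matmul(x, xt)]         # n x n

cov_sums = power_sums(covariance, n)
gram_sums = power_sums(gram, n)
print(cov_sums)
print(gram_sums)
```

The practical benefit is size: the Gram matrix has one row per shape, whereas the covariance matrix has one row per vertex coordinate.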
We now discuss how to measure the compactness of the shape space and how to minimize it. We also recap ideas from [Kotcheff and Taylor, 1998, Cates et al., 2006, Davies et al., 2010] to keep the paper self-contained.
Spectral view: Let denote the vector of eigenvalues of . If the correspondences include unnecessary movements along the surfaces of the objects, the spectrum will spread out, creating more non-zero eigenvalues. In reverse, a compact shape space should have a compact spectrum. A simple way of modeling this is to penalize the square norm of the eigenvalues, as proposed by Hill and Taylor [Hill and Taylor, 1994]. It is equivalent to trying to keep all surface points in deformed shapes close to the mean shape, independent of each other (therefore not transporting information globally). From a spectral perspective, it favors multiple small eigenvalues over a few large ones, which does not match the intuition of a low-dimensional generative process that we want to reconstruct. Rather than that, we aim at a sparse spectrum, as detailed next.
Figure 2: (a) untransformed, (b) linear differential prior, (c) entropy prior. Geometric interpretation: Differential models such as (linearized) elasticity or thin-plate splines impose a quadratic energy on a linearly transformed shape space that attracts all target shapes to the source shape. Our method minimizes the entropy of the ensemble, moving correspondences such that the shapes align in a lower-dimensional subspace, creating less bias.
We can also look at the probability distribution the shapes are drawn from. The less variability it permits, without reducing the likelihood of the training examples, the more concisely it captures the shape space. In this view, we should measure the entropy of the Gaussian model:
This approach suffers from singularities: If one of the eigenvalues becomes zero, the determinants of the covariance and Gram matrix become zero, leaving the entropy ill-defined. Further, driving even just the smallest eigenvalue close to zero would falsely indicate a near-perfect solution, which leads to instability and inconsistency.
Information theoretic view: From the point of view of information theory, we can measure the capacity of the generative probabilistic model (Equations (2, 3)) to encode different models by considering the description length of a specific shape, given the knowledge of the generative model in terms of the probabilistic shape space. To transmit one shape, we need to encode the shape parameters. Given an independent Gaussian distribution along each axis with variance , and assuming that a finite accuracy of is required in our application, encoding a single parameter requires roughly bits [Thodberg, 2003] (see Davies et al. [Davies et al., 2002a] for the accurate and more detailed derivation). For small variances , no information needs to be encoded. This suggests the following energy [Kotcheff and Taylor, 1998, Cates et al., 2006] that approximates the information content of the shape space depending on correspondences :
is a regularizer that determines the accuracy of the shape space: We assume that independent of the example data, there is always an isotropic Gaussian noise component of standard deviation in all dimensions of the space. This removes the singularity and makes the entropy usable as a measure that encourages sparse PCA spectra during correspondence optimization [Kotcheff and Taylor, 1998]. This is an approximation to the coding length [Davies et al., 2002a]; nonetheless, it already yields favorable results in practice.
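To illustrate how such an energy reacts to drifting correspondences, the following sketch evaluates a regularized log-spectrum energy on a toy family of 2D curves. It assumes the common form of the energy in the cited entropy-based approaches, the sum over k of log(lambda_k + eps), where lambda_k are the eigenvalues of the Gram matrix; this equals log det(G + eps I), so no eigendecomposition is needed. The function names, the parabola family, the noise level and the value of eps are illustrative assumptions, not the paper's settings.

```python
import math
import random


def gram_matrix(shapes):
    """Gram matrix of the mean-centered shapes (each shape is a flat coordinate list)."""
    n, d = len(shapes), len(shapes[0])
    mean = [sum(s[j] for s in shapes) / n for j in range(d)]
    centered = [[s[j] - mean[j] for j in range(d)] for s in shapes]
    return [[sum(a * b for a, b in zip(ci, cj)) / n for cj in centered] for ci in centered]


def log_det_spd(m):
    """Log-determinant of a symmetric positive definite matrix via Cholesky factorization."""
    n = len(m)
    chol = [[0.0] * n for _ in range(n)]
    logdet = 0.0
    for i in range(n):
        for j in range(i + 1):
            s = m[i][j] - sum(chol[i][k] * chol[j][k] for k in range(j))
            if i == j:
                chol[i][i] = math.sqrt(s)
                logdet += 2.0 * math.log(chol[i][i])
            else:
                chol[i][j] = s / chol[j][j]
    return logdet


def compactness_energy(shapes, eps=1e-3):
    """sum_k log(lambda_k + eps), computed as log det(G + eps * I)."""
    g = gram_matrix(shapes)
    for i in range(len(g)):
        g[i][i] += eps
    return log_det_spd(g)


def sample_parabolas(drift, seed=0):
    """Sample curves y = a * t^2; `drift` is the std. dev. of sliding along each curve."""
    rng = random.Random(seed)
    ts = [-1.0 + 2.0 * j / 19 for j in range(20)]
    shapes = []
    for a in (0.5, 0.75, 1.0, 1.25, 1.5):
        coords = []
        for t in ts:
            u = t + rng.gauss(0.0, drift)   # drifted sample still lies on the curve
            coords += [u, a * u * u]
        shapes.append(coords)
    return shapes


consistent = compactness_energy(sample_parabolas(drift=0.0))
drifted = compactness_energy(sample_parabolas(drift=0.05))
print(consistent, drifted)
```

With consistent sampling, the whole family varies along a single linear mode, so all but one term of the sum sit at the noise floor log(eps). Random sliding along the curves describes the same geometry but activates additional modes and raises the energy.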
Geometric view: We can also interpret these results as imposing a prior in a shape space. Figure 2 shows schematically a number of example shapes as points in a high-dimensional shape space. Traditional regularizers such as thin-plate-splines or linearized elasticity impose a Gaussian prior, i.e., the neg-log-likelihood is a quadratic energy of the form
where is a linear operator (a matrix) that acts on the vertex sets interpreted as -vectors. For example, in thin-plate-splines, measures the bending by taking second derivatives. In other words, traditional (linear) differential priors can be seen as an isotropic attraction to a single point (the urshape) in a linearly transformed shape space (Figure 2b). Contrarily, minimizing the entropy encourages a tight fit of an ellipsoid to the data, minimizing its volume, and thereby encouraging all models to be located on a low-dimensional linear subspace (Figure 2c). This creates bias towards a linearly correlated representation rather than towards a single shape. It is not surprising that this yields significantly better results when the final objective is to describe a shape collection with exactly this representation rather than reconstructing it from pairs of biased, point-wise matches in shape space.
3.3. Shape Optimization
Let be a set of example models given as triangle meshes. We approximate these by smooth surfaces , as detailed later. Let be an urshape of matching topology. We denote the vertices of by . We now want to compute correspondences
We denote the set of all correspondences by . All of these are hard-constrained to be located on the (smoothed) input surfaces. We optimize the correspondences by minimizing the following energy, subject to the constraint of moving only along the surface (as illustrated in Figure 3):
The term approximates the description length as discussed above and is a bi-Laplacian regularizer. We set its weight to the number of triangles divided by the squared surface area (to make the overall weight mesh-independent), multiplied by a relative weight of .
The overall energy is minimized using l-BFGS, a nonlinear quasi-Newton solver. Further, we factor out rigid motions according to Equation (4): We compute a least-squares optimal translation, rotation, and reflection from the initial correspondences. The rigid motion is updated during the optimization by including the rotation as a variable in the optimization (parametrizing the small rotational update as Euler angles with respect to the initial least-squares fit).
For creating compact shape spaces, we use the energy from Equation (9). We compute the Gram matrix by integrating over the deformed triangle meshes according to Equation (7). Because of additional regularization (described next), it is sufficient to approximate the integrals by an unweighted sum over vertex positions (Equation (6)). We compute the derivative of the energy using the explicit formula derived in [Kotcheff and Taylor, 1998].
The regularization term is a prior on the graph Laplacian of the deformed meshes . With denoting the set of indices of vertices sharing an edge with vertex in the mesh , we obtain:
This term encourages the graph Laplacian of the triangle mesh to be zero, which is the case if every vertex is located in the center of its 1-ring neighborhood, corresponding to a uniform triangulation [Botsch and Kobbelt, 2004]. Although adding this least-squares energy does not guarantee bijectivity of the mapping, it also effectively avoids fold-overs in practice.
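A minimal sketch of such a regularizer is given below. It assumes uniform (umbrella) weights and sums the squared distance of each vertex to the centroid of its 1-ring; the exact weighting and normalization of the energy in the paper are not reproduced. The neighbor dictionary and the hexagonal test fan are hypothetical inputs for illustration.

```python
import math


def laplacian_energy(vertices, neighbors):
    """Sum of squared distances between each listed vertex and the centroid of its 1-ring.

    vertices:  list of (x, y, z) positions
    neighbors: dict mapping a vertex index to the indices of its 1-ring
    """
    energy = 0.0
    for i, ring in neighbors.items():
        centroid = [sum(vertices[j][k] for j in ring) / len(ring) for k in range(3)]
        energy += sum((vertices[i][k] - centroid[k]) ** 2 for k in range(3))
    return energy


# A regular hexagonal fan: vertex 0 in the center, vertices 1..6 on the unit circle.
ring = [(math.cos(k * math.pi / 3), math.sin(k * math.pi / 3), 0.0) for k in range(6)]
neighbors = {0: [1, 2, 3, 4, 5, 6]}   # only the interior vertex has a complete 1-ring

uniform = laplacian_energy([(0.0, 0.0, 0.0)] + ring, neighbors)
shifted = laplacian_energy([(0.3, 0.0, 0.0)] + ring, neighbors)
print(uniform, shifted)
```

The energy vanishes when the center vertex sits at the centroid of its neighbors and grows quadratically with the displacement, which is what pulls the sampling towards a uniform triangulation.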
3.3.3. Data Modeling
We model the hard constraint that correspondences must remain on the input surfaces by a level-set approach. As our input is only a discrete mesh approximation of a shape, we first build a smooth surface that tightly approximates so that we can slide along the surface smoothly during optimization. We first sample the input mesh with a dense, uniform point cloud representing the input mesh with (given) oriented normals . We fit a signed distance function to by minimizing the following energy [Calakli and Taubin, 2011]:
The first term ensures that the zero crossing of is at the data points. The second term aligns the gradients with the normals, creating a smooth result and removing the trivial solution (). The last term integrates the squared Frobenius norm of the Hessian of over a bounding volume , acting as a regularizer that propagates function values linearly and encourages smoothness. We set the weight , and , which is sufficient to smooth very sharp corners a bit.
We optimize this quadratic energy by solving the linear system resulting from a finite difference discretization with spacing set to of the bounding box of the object. Continuous values for and are obtained by interpolation with radial basis functions at each grid point; we employ Wendland kernels. is approximated by finite differences over the grid. The domain is obtained by including all grid cells within a distance of to data points. Triangle meshes are sampled with spacing to obtain . We refer to the zero-level set of the result as .
Using the implicit function: The serve as constraint manifolds for correspondences during optimization: First, any initial solution is projected to by simple gradient descent. During numerical optimization, the quasi-Newton l-BFGS solver attempts to update the correspondence positions by first finding a new direction and then the distance by a line search. In each iteration, we project back onto the surface (using the exponential map in ). The rationale is that small step sizes turn the smooth constraint into a sequence of linear subspace constraints that can be handled by the quadratic (low-rank) optimizations performed in the inner loop of l-BFGS.
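The step-then-project pattern can be sketched as follows. The sketch uses an analytic signed distance function of a unit sphere as a stand-in for the fitted implicit function, a finite-difference gradient, and plain gradient descent on the squared function value for the projection. The step size, tolerance, and function names are illustrative assumptions, and the exponential map mentioned above is not reproduced here.

```python
import math


def gradient(f, p, h=1e-5):
    """Central finite-difference gradient of the scalar field f at point p."""
    grad = []
    for k in range(3):
        hi, lo = list(p), list(p)
        hi[k] += h
        lo[k] -= h
        grad.append((f(hi) - f(lo)) / (2.0 * h))
    return grad


def project(f, p, step=0.5, tol=1e-9, max_iter=200):
    """Move p onto the zero level set of f by gradient descent on 0.5 * f(p)^2."""
    p = list(p)
    for _ in range(max_iter):
        value = f(p)
        if abs(value) < tol:
            break
        g = gradient(f, p)
        p = [pk - step * value * gk for pk, gk in zip(p, g)]
    return p


def constrained_step(f, p, direction, length):
    """Take a step along `direction` (as a line search would) and project back."""
    moved = [pk + length * dk for pk, dk in zip(p, direction)]
    return project(f, moved)


def unit_sphere(p):
    """Signed distance to the unit sphere; a stand-in for the fitted implicit function."""
    return math.sqrt(sum(c * c for c in p)) - 1.0


start = project(unit_sphere, (2.0, 0.5, -0.3))
after = constrained_step(unit_sphere, start, (0.0, 1.0, 0.0), 0.1)
print(start, after)
```

For a true signed distance function the gradient has unit length, so each descent iteration shrinks the remaining distance by the factor (1 - step); for a fitted approximation the convergence rate depends on the local gradient magnitude.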
Motivation: In experiments with various formulations, the implicit function formulation with hard constraints to the zero level-set turned out to be most reliable and crucial for good results. Other options did not give satisfactory results. Least-squares soft constraints are unreliable: Weak constraints have trouble with thin structures and sharp creases, and strongly weighted soft constraints yield a numerically ill-conditioned energy, preventing convergence. The option of just using the input triangle meshes was not satisfactory either: Using such a surface led to spurious local optima in our experiments. Experiments with a projection to a dynamically computed MLS approximation of the surface have also turned out to be slow and unreliable for general surfaces with small feature size.
4. Extended Model
We now extend our approach by introducing composite, part-based models that capture correspondences among objects of varying topology.
Figure 4: (a) input shapes, (b) docking sites, (c) part graphs, (d) docking rules.
4.1. Part-Based Modeling
The method as discussed so far, as well as previous proposals in the literature, is restricted to shapes that have global correspondences and form a single, global shape space. In practice, this is often a strong restriction. Many man-made shapes consist of composite parts (for example, the irons in Figure 7b have been assembled from different parts), forming shape spaces of varying topology that cannot be captured by a single shape space.
We therefore propose a model that decomposes complex shapes into a set of parts that have individual shape spaces. First, we modify the analysis algorithm to optimize both the shape and the decomposition of the surface. Second, we develop a synthesis algorithm that can build seamless models consisting of deformed parts in different poses. The synthesis can handle general constraints (changing the discrete composition of the parts, handles for free-form deformation, subspace constraints).
Input: We again assume that we are given a set of example shapes as triangle meshes (Figure 4 shows an example, reflecting the actual result demonstrated in Figure 7b). We further assume that the shapes are segmented into parts, i.e., every triangle is tagged with a part type , where is the number of different part types (Figure 4a shows the types as different colors). All parts of the same type must have the same topology (the irons example uses adapter pieces (yellow/pink/brown) to attach handles of different topology to the body). Each discrete configuration corresponds to a different graph of parts (Figure 4c). In addition, each part has continuous parameters (not shown) that permit deformation according to the shape space learned from all parts of the same type (same color in our figures).
The initialization of the part boundaries does not need to be precise; only the topology and coarse geometry need to match. We will improve the segmentation geometry automatically.
Part docking: Parts will share common boundaries, possibly in different combinations. We learn the way parts can be discretely assembled by just reading off the observed adjacency relations from the input. Figure 4d,e show the rules that have been deduced from the input. Boundaries that connect two types of parts across a common docking site are always in fixed correspondence; i.e., the dense correspondences of the parts themselves are enforced at the boundary, too.
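Reading off adjacency relations from a segmented mesh can be sketched in a few lines. The sketch assumes that each shape is given as a list of vertex-index triples with one part-type label per triangle, and that a docking rule is recorded as an unordered pair of part types that meet along a shared edge; this data layout and the label names are hypothetical, and the sketch does not distinguish multiple docking sites between the same pair of types.

```python
from collections import defaultdict
from itertools import combinations


def docking_rules(triangles, part_types):
    """Collect unordered pairs of part types that meet along a shared mesh edge."""
    edge_to_types = defaultdict(set)
    for tri, part in zip(triangles, part_types):
        for a, b in ((tri[0], tri[1]), (tri[1], tri[2]), (tri[2], tri[0])):
            edge_to_types[frozenset((a, b))].add(part)
    rules = set()
    for types in edge_to_types.values():
        # An edge touched by two different part types lies on a part boundary.
        rules.update(combinations(sorted(types), 2))
    return rules


# Hypothetical toy mesh: two "body" triangles and one "handle" triangle.
triangles = [(0, 1, 2), (1, 3, 2), (3, 4, 2)]
part_types = ["body", "body", "handle"]
rules = docking_rules(triangles, part_types)
print(rules)
```

The rule set for a whole collection is then the union of the per-shape sets.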
In the following we discuss our analysis algorithm that creates part graphs and shape spaces for each part automatically given a coarse user segmentation and possibly a few additional landmark matches.
Part parametrization: The first step is to compute initial dense correspondences. We need bijective correspondences without fold-overs. For this, we use cross-parametrization [Kraevoy and Sheffer, 2004]: We first cut the parts further into topological discs, and then compute a cross-parametrization of the discs to obtain initial correspondences (see Figure 5). For cutting, we first detect the interior boundaries within all parts. We connect each resulting boundary curve to its closest neighbor (see Figure 5) and then cut along a geodesic path between the corresponding closest points. Cutting is iterated until only topological discs remain, and the process is done in all parts simultaneously. This initialization is presented to the user, who can move the initial landmark correspondences along the boundaries (red dots in Figure 5). The resulting sub-parts are set into correspondence by a least-squares conformal map to a unit circle; the boundaries of the circle are set into correspondence by comparing the relative arc length (normalized to ), using the cutting points as starting point. For the outer boundary of the initial segmentation, the user has to specify this starting point manually.
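Matching two boundary loops by relative arc length can be sketched as follows. The sketch assumes closed polylines given as vertex lists, measures arc length on the polylines themselves, normalizes it to the interval [0, 1), and takes the index of the starting vertex as input. All of these choices, as well as the function names, are illustrative assumptions rather than the paper's implementation.

```python
import math


def _segments(loop, start):
    """Vertex order and edge lengths of a closed polyline, beginning at loop[start]."""
    n = len(loop)
    order = [(start + k) % n for k in range(n)]
    lengths = [math.dist(loop[order[k]], loop[order[(k + 1) % n]]) for k in range(n)]
    return order, lengths


def relative_arc_lengths(loop, start=0):
    """Relative arc length in [0, 1) of every vertex, measured from loop[start]."""
    order, lengths = _segments(loop, start)
    total = sum(lengths)
    params, travelled = {}, 0.0
    for idx, length in zip(order, lengths):
        params[idx] = travelled / total
        travelled += length
    return params


def point_at(loop, u, start=0):
    """Point on the closed polyline at relative arc length u, measured from loop[start]."""
    order, lengths = _segments(loop, start)
    remaining = (u % 1.0) * sum(lengths)
    n = len(loop)
    for k in range(n):
        if remaining <= lengths[k] or k == n - 1:
            a, b = loop[order[k]], loop[order[(k + 1) % n]]
            t = remaining / lengths[k] if lengths[k] > 0 else 0.0
            return tuple(pa + t * (pb - pa) for pa, pb in zip(a, b))
        remaining -= lengths[k]


def match_boundaries(loop_a, loop_b, start_a=0, start_b=0):
    """Map every vertex of loop_a to the point of loop_b at the same relative arc length."""
    params = relative_arc_lengths(loop_a, start_a)
    return {i: point_at(loop_b, u, start_b) for i, u in params.items()}


small = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
large = [(0.0, 0.0), (2.0, 0.0), (2.0, 2.0), (0.0, 2.0)]
matches = match_boundaries(small, large)
print(matches)
```

Changing `start_b` rotates the correspondence along the loop, which corresponds to the role of the starting point described above.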
The result of this step is a set of dense correspondences between all parts, with topology consistent with the user-defined segmentation. The correspondences are guaranteed to be bijective, but their quality is usually poor, showing strong drift and distortions across the shape.
Optimization: We now perform the optimization from Section 3 to improve this initial guess. For each part type (same color in our figures), we set up a separate energy according to Equation (12). We use the cross-parametrization result as initial correspondences , and the resulting mean shape as urshape . We first run the optimization separately, constraining the boundaries of the domain to fit the boundary curve by a point-to-line energy that snaps the closest vertex to the boundary of the parts. We add the boundary energy
that measures the deviation of boundary vertices from the boundaries of the input parts.
Boundary optimization: After the energy has converged, we remove the constraint of Equation (15) and start optimizing the boundary location. We need to make sure that parts still meet at the boundaries, and this should happen in a consistent way. As consistency condition, we maintain fixed correspondences along all matching part boundaries of the same type. We impose this consistency as a soft constraint in an alternating two-stage optimization:
In stage one, we find all pairs of closest points between boundaries of matching type: For each boundary point, we compute all of its instances in the data and, for each instance, the closest point in the adjacent part instance. We average over all of these matches and set a soft constraint that penalizes the quadratic distances for all these pairs. In stage two, we run the optimization according to Equation (12) with the additional constraint energy added. We obtain improved correspondences, which are then used again in stage one to refine the boundary matches.
Conceptually, this could be interpreted as a variant of iterative closest points (ICP), performed simultaneously along multiple boundary curves while keeping their correspondences consistent. The alternating estimation of boundary correspondences is combined with the estimation of global rigid motions for each part, as already introduced in Section 3.3.
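Stage one of this alternation can be sketched as follows. The sketch is a simplification under stated assumptions: boundaries are given as lists of sample points, the closest point is searched among these samples only (not on the segments between them), and the averaging over instances and the stage-two solve are omitted. All function names are hypothetical.

```python
import math

def closest_point(p, candidates):
    """Closest sample of the adjacent boundary (brute-force search)."""
    return min(candidates, key=lambda q: math.dist(p, q))

def boundary_matches(boundary_a, boundary_b):
    """Pair every point of boundary_a with its closest point on boundary_b."""
    return [(p, closest_point(p, boundary_b)) for p in boundary_a]

def soft_constraint_energy(pairs, weight=1.0):
    """Quadratic penalty on the distances of all matched pairs."""
    return weight * sum(math.dist(p, q) ** 2 for p, q in pairs)

# Two nearly coincident boundary curves of adjacent parts.
boundary_a = [(0.0, 0.0), (1.0, 0.0)]
boundary_b = [(0.0, 0.5), (1.0, -0.5), (5.0, 5.0)]
pairs = boundary_matches(boundary_a, boundary_b)
energy = soft_constraint_energy(pairs)
```

In the full procedure, such an energy would be added to the correspondence energy in stage two, and the matches would be recomputed after each solve.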
Further details: In our implementation, a few extra steps are performed to improve the efficiency of the method. First, before computing the cross-parametrization, we use the dynamic remeshing algorithm of Botsch and Kobbelt [Botsch and Kobbelt, 2004] in order to improve the mesh quality of each part (which might already have been poor in the input meshes). The method iteratively minimizes the sum of the squared graph Laplacians and performs edge contractions and vertex splits in order to create a uniformly sampled mesh. We use the average edge length in all part instances as length criterion. Second, after parametrization, we might end up with very uneven sampling; the conformal map can have large scale factors that lead to an uneven distribution of triangles. Using vertex splits, again according to the same criterion, we refine regions that are undersampled and project the newly inserted points onto the implicit surface that models the data. These steps could in principle be omitted, but then a very dense initial mesh is required to obtain results of good quality. Another improvement is to perform a final optimization pass of the correspondence energy (Equation 12) in which the Laplacian regularizer is evaluated for each composite input shape instead of for its parts separately. This ensures that the boundaries between parts are determined only by the compactness criterion and not by the mesh regularization (we obtain a slight improvement here).
Overall, the result of the preceding steps is an optimized composite shape space in which (i) the correspondences within each part, (ii) the position of the boundaries on the example shapes, and (iii) the correspondences among matching boundaries have been optimized with the goal of compactness and mesh quality. In the following, we discuss how we can utilize the result to create new shapes.
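The edge-length criterion can be illustrated with a small sketch. It assumes the split and collapse thresholds of 4/3 and 4/5 of the target length proposed by Botsch and Kobbelt; whether exactly these factors are used here is an assumption, and the mesh representation (vertex list plus index pairs) and function names are hypothetical.

```python
import math

def mean_edge_length(meshes):
    """Average edge length over all part instances; each mesh is (vertices, edges)."""
    lengths = [math.dist(verts[i], verts[j]) for verts, edges in meshes for i, j in edges]
    return sum(lengths) / len(lengths)

def classify_edges(verts, edges, target):
    """Edges to split (too long) and edges to collapse (too short) for a target length."""
    split, collapse = [], []
    for i, j in edges:
        length = math.dist(verts[i], verts[j])
        if length > 4.0 / 3.0 * target:
            split.append((i, j))
        elif length < 4.0 / 5.0 * target:
            collapse.append((i, j))
    return split, collapse

# Toy example with one long edge and two short edges.
verts = [(0.0, 0.0), (1.0, 0.0), (1.0, 0.1), (4.0, 0.0)]
edges = [(0, 1), (1, 2), (1, 3)]
target = mean_edge_length([(verts, edges)])
to_split, to_collapse = classify_edges(verts, edges, target)
```

Using one target length for all part instances keeps the sampling density comparable across the ensemble, which matters because corresponding vertices are later compared one by one.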
The model that we have obtained in the previous step describes a shape by a set of parts that are connected along their boundary lines. A key feature of this extended model is that we can instantiate composite models consisting of multiple parts, potentially rearranged by attaching the parts differently across compatible boundaries. We therefore need to devise a generative process by which we can instantiate such composite shapes, governed by multiple local shape spaces. We aim at maximum flexibility: Given an arbitrary arrangement of parts and arbitrary user constraints on geometry and shape parameters for each part, we want to find a global geometry that fits all of these constraints best.
Variational Part Reconstruction
The first step is to formulate the problem of reconstructing the part shapes as a variational problem. In order to facilitate a smooth reconstruction later on, we formulate the whole process in the gradient domain [Sumner et al., 2005, Sorkine and Alexa, 2007]. We first consider a single part. Assume that we are given a part shape space by its urshape, its mean and variation modes, and the standard deviations. Our objective is to reconstruct an instance of this shape with given shape parameters. Because of the formulation as an optimization problem, multiple parts can be coupled along boundaries, thereby implicitly constraining the reconstruction and finding the best embedding of the part graph in a least-squares sense.
We model the similarity by comparing each vertex with the corresponding vertex reconstructed from the shape space. The residual is minimized in a least-squares sense. In order to get a smooth transition between multiple parts later, we follow [Sumner et al., 2005] and formulate the optimization in the gradient domain: We do not compare absolute coordinates but edge vectors in the mesh. Finally, we also include an orthogonal transformation to be invariant to rigid motion (translational invariance is automatically obtained by working on edge differences).
Formally, we get the following energy:
Again, the inner sum runs over the set of neighboring vertices of each vertex in the mesh, and each edge is weighted by its cotangent weight in the mean shape. The rotation is a variable that is optimized along with the vertex positions and the shape parameters.
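One way to write an energy that matches this description is sketched below. All symbols are assumed here for illustration: $\mathbf{x}_i$ are the unknown vertex positions, $N(i)$ is the set of neighbors of vertex $i$, $w_{ij}$ is the cotangent weight of edge $(i,j)$ in the mean shape, $\mathbf{R}$ is the global rotation of the part, and $\hat{\mathbf{x}}_i(\boldsymbol{\lambda})$ is the vertex predicted by the shape space from the mean $\boldsymbol{\mu}$ and the modes $\mathbf{v}_k$. Whether the modes are additionally scaled by the standard deviations is a convention that this sketch leaves open.

```latex
E(\mathbf{x}, \mathbf{R}, \boldsymbol{\lambda})
  = \sum_{i} \sum_{j \in N(i)} w_{ij}
    \left\| (\mathbf{x}_i - \mathbf{x}_j)
      - \mathbf{R} \left( \hat{\mathbf{x}}_i(\boldsymbol{\lambda})
                        - \hat{\mathbf{x}}_j(\boldsymbol{\lambda}) \right) \right\|^2,
\qquad
\hat{\mathbf{x}}_i(\boldsymbol{\lambda})
  = \boldsymbol{\mu}_i + \sum_{k} \lambda_k \, \mathbf{v}_{k,i},
\qquad
\mathbf{R}^{\top}\mathbf{R} = \mathbf{I}.
```

For fixed $\mathbf{R}$, this expression is quadratic in $\mathbf{x}$ and $\boldsymbol{\lambda}$, so that step reduces to a linear system.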
This formulation is an extension of the as-rigid-as-possible shape deformation of [Sorkine and Alexa, 2007], encouraging the result to be as-close-as-possible to a linear subspace of models (ignoring rigid differences as well). We use the same optimization method: The linear system is solved alternatingly with an update of the rotation matrix (which, in our approach, is global for the whole part); see Sorkine and Alexa’s paper [Sorkine and Alexa, 2007] for details.
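The rotation update of such an alternating scheme can be illustrated in a simplified setting. The sketch below works in 2D, where the optimal global rotation has a closed form via `atan2`; in 3D, the corresponding step is usually solved with a singular value decomposition, as in Sorkine and Alexa's method. The function names and the toy data are hypothetical.

```python
import math

def rotate(v, theta):
    """Rotate a 2D vector by angle theta."""
    ct, st = math.cos(theta), math.sin(theta)
    return (ct * v[0] - st * v[1], st * v[0] + ct * v[1])

def fit_global_rotation(template_edges, current_edges, weights):
    """Angle minimizing sum_k w_k * |e_k - R(theta) t_k|^2 over all edge vectors."""
    s = c = 0.0
    for (tx, ty), (ex, ey), w in zip(template_edges, current_edges, weights):
        c += w * (tx * ex + ty * ey)   # weighted dot products
        s += w * (tx * ey - ty * ex)   # weighted 2D cross products
    return math.atan2(s, c)

def edge_energy(template_edges, current_edges, weights, theta):
    """Weighted squared difference between current and rotated template edges."""
    total = 0.0
    for t, e, w in zip(template_edges, current_edges, weights):
        r = rotate(t, theta)
        total += w * ((e[0] - r[0]) ** 2 + (e[1] - r[1]) ** 2)
    return total

# Template edges (from the shape space) and current edges that differ by a rotation.
template = [(1.0, 0.0), (0.0, 2.0), (-1.0, 1.0)]
weights = [1.0, 2.0, 0.5]
current = [rotate(t, 0.7) for t in template]
theta = fit_global_rotation(template, current, weights)
residual = edge_energy(template, current, weights, theta)
```

With the rotation fixed, the remaining unknowns enter the energy quadratically, so the two steps can be alternated until the energy stops decreasing.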
Reconstructing Part Graphs
In order to reconstruct shapes consisting of multiple parts, we add up the energies for all parts. Along the boundaries, the analysis stage gives us fixed and consistent correspondences. Therefore, we can remesh the urshapes of the parts such that they share common points along the boundaries. We then use the same variables in order to enforce a continuous solution (see Figure 6a).
Improved smoothness: Although the shape of the boundary curve transports information between pairs of parts, it only captures limited information on the correlation between the part shapes. We therefore learn a more expressive model from the input data: We form an extended region by gathering the geometry within a fixed distance to the boundary between the pair of parts, called pair geometry (Figure 6b). As the parts are in dense correspondence, we have dense correspondences between all pair geometries that connect the same part types through the same pair of boundaries. We build the probabilistic shape space for the pair geometry by a simple PCA. We add the resulting energy to the overall energy for all docked part pairs.
To avoid discontinuities, we use smooth weights for all singleton part and pairwise constraints (Figure 6c): The attraction to the shape spaces of the parts fades continuously to zero when approaching the boundaries. Conversely, the attraction to the pair geometry model grows when moving towards the boundary of the parts. We weight each vertex by a smooth function of its distance to the boundary. For seamless results, we choose the falloff such that blending takes place within about one third of the part diameter.
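The complementary weighting can be sketched as follows. The concrete falloff function is an assumption: the sketch uses a cubic smoothstep and makes the two weights sum to one, which is one plausible choice and not necessarily the weight function used in our implementation. The names `smoothstep` and `blend_weights` are hypothetical.

```python
def smoothstep(t):
    """Cubic smoothstep on [0, 1], clamped outside (hypothetical falloff)."""
    t = min(1.0, max(0.0, t))
    return t * t * (3.0 - 2.0 * t)

def blend_weights(distance, part_diameter, fraction=1.0 / 3.0):
    """Return (part weight, pair-geometry weight) for a vertex.

    distance is the distance of the vertex to the part boundary; blending
    takes place within fraction * part_diameter of the boundary.
    """
    width = fraction * part_diameter
    w_part = smoothstep(distance / width)
    return w_part, 1.0 - w_part

# On the boundary, the pair-geometry model dominates; far inside, the part model does.
on_boundary = blend_weights(0.0, 3.0)
halfway = blend_weights(0.5, 3.0)
interior = blend_weights(2.0, 3.0)
```

Any monotone falloff with vanishing derivative at both ends of the blending band would serve the same purpose of avoiding visible seams.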
We have implemented the method in C++ and tested the implementation on a dual-socket PC (Intel Core i7 with 2.6 GHz and 6 cores per processor). The results are shown in Figures 1 and 7-9. We strongly encourage the reader to watch the accompanying video, which shows interpolation and sampling results from the constructed shape spaces; these make the improvements due to our method much clearer.
Dense correspondences from coarse co-segmentation: We use the painting interface discussed in Section 4.1.1 to annotate a number of models from the SHREC 2007 model collection. The user has to mark the colored regions shown in Figures 1b and 9b by painting on the surface. Additionally, point-to-point correspondences have to be set if the initialization is ambiguous. For example, for the birds (Figure 7d), the tips of the wings needed one more such point match per wing. Additional constraints are not always necessary; for example, the animals data set has been built from the user segmentation only. After such initialization, we run the optimization. The user has to choose the parameter as well as the level of resolution for the remeshing step (after initial cross-parametrization). The first parameter is critical for the results; the second trades off run-time and accuracy. Finding an appropriate annotation and parametrization that works for a whole shape ensemble requires multiple iterations of interaction and optimization. Here, computing correspondences (minimizing Eq. 12) took on average about 20 min for a shape set. Given the additional steps (parametrization, remeshing, etc.), the net computation time adds up to roughly 1 h per model. Overall, interaction and computation amounted to up to 6 h for our example models, depending on the complexity (e.g., animals were more difficult than teddies). It would probably be possible to automate the procedure by using recent fully automatic co-analysis [Huang et al., 2011, Sidi et al., 2011, Huang et al., 2012, Kim et al., 2012] for initialization, but this is still subject to future work.
Results: The resulting correspondences within several subsets of this collection are shown in Figure 7: We use a checkerboard texture projected to one instance of the input and transfer it to all other models to visualize correspondences. Additionally, differently colored regions depict parts. The resulting correspondences capture salient features of the models and there is no unwarranted drift. This is clearly visible in our video, where we obtain good interpolations within all of the shape spaces: Intermediate shapes due to morphing as well as due to random sampling from the underlying Gaussian at the learned standard deviations are plausible (Figure 7f). We should highlight that the model is able to handle structures with fine details, such as the legs of the animals in Figure 7a. As discussed in Section 4.1.1, we have tried various alternative approaches, all of which failed on this data set.
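Morphing and random sampling in a learned linear shape space can be sketched as follows. The sketch assumes a PCA-style model in which a shape is the mean plus a sum of modes scaled by their standard deviations and by unit-variance coefficients; shapes are flattened coordinate lists, and all names are hypothetical.

```python
import random

def synthesize(mean, modes, sigmas, coeffs):
    """Shape = mean + sum_k coeffs[k] * sigmas[k] * modes[k] (flattened coordinates)."""
    shape = list(mean)
    for mode, sigma, c in zip(modes, sigmas, coeffs):
        for i, m in enumerate(mode):
            shape[i] += c * sigma * m
    return shape

def sample_coefficients(num_modes, rng, scale=1.0):
    """Draw independent Gaussian coefficients (in units of the learned standard deviations)."""
    return [rng.gauss(0.0, scale) for _ in range(num_modes)]

def interpolate(coeffs_a, coeffs_b, t):
    """Linear morph between two coefficient vectors, t in [0, 1]."""
    return [(1.0 - t) * a + t * b for a, b in zip(coeffs_a, coeffs_b)]

# Toy shape space with three coordinates and two modes.
mean = [0.0, 0.0, 0.0]
modes = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
sigmas = [2.0, 0.5]

fixed = synthesize(mean, modes, sigmas, [1.0, -2.0])
rng = random.Random(0)                      # seeded for reproducibility
random_shape = synthesize(mean, modes, sigmas, sample_coefficients(2, rng))
morph = synthesize(mean, modes, sigmas, interpolate([1.0, 0.0], [0.0, 2.0], 0.5))
```

If the correspondences drift, the modes mix tangential sliding with actual shape change, and both the samples and the morphs become implausible; this is why sampling is a sensitive test of correspondence quality.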
Comparison to pairwise local registration: We first compare to a number of baseline methods. In all cases, matching is done by first sampling 43 points uniformly from our solution to be used as landmarks for initialization and then switching to deformable ICP [Allen et al., 2003]. We have also tried alignment without landmarks, as well as deactivating the landmarks during ICP; the shown result (keeping the landmarks during ICP) yielded the best results.
Figure 9 shows the results of pairwise matching between a pig and a young deer. We examine an as-rigid-as-possible (ARAP) deformation model ([Sorkine and Alexa, 2007], Figure 9c) and a closely related variant, using a smooth subspace deformation model ([Adams et al., 2008], Figure 9d): The subspace model uses a volumetric low-frequency basis (a grid in our case), which leads to smoother results than ARAP. Nonetheless, both cases suffer from artifacts such as wrinkles and drifting correspondences. The video illustrates the disastrous effect on the obtained shape spaces.
Thin-plate splines (TPS) are substantially better (Figure 9e), which was to be expected as this is the current standard solution for this type of matching problem [Allen et al., 2003, Hasler et al., 2009]. Nonetheless, the TPS model still creates wrinkles and unwarranted drift. Our video shows various artifacts in the resulting shape spaces.
Intrinsic matching methods: In order to compare to recent state-of-the-art methods, we have also performed pairwise matches with blended intrinsic maps (BIM) [Kim et al., 2011]. As shown in Figure 9f, the pairwise partial isometries cannot capture the variation in this challenging data set well (note, however, that BIM is a global correspondence method; it solves a more difficult problem than the one addressed in our paper). Subsequently, we use the hub-and-spoke ensemble matcher of Huang et al. [Huang et al., 2012] (also a global matcher), which takes these results as input and selects the best partial isometries in order to create consistent equivalence relations. The method improves the quality of the correspondences (Figure 9g), but substantial misalignment persists, which is seen best in the morphs shown in our video (remark: the output of their method is not as dense as the original vertices; we use interpolation to visualize the results; the same artifacts are also visible in the sparser output alone). In comparison, our method (Figure 9h) achieves very good feature alignment and virtually no drift (see the video). If we deactivate the compactness term for the shape space, keeping only the bi-Laplacian smoothing, the results degrade considerably (Figure 8i); the effect on the obtained shape spaces is catastrophic (see video).
Modeling with deformable parts: The reconstruction from part graphs is formulated as an optimization problem. A variational approach permits us to easily include additional constraints. For example, any of the points can be fixed. We can, for example, use a positional energy to implement handles that the user can attach to the shape for editing. Further, shape parameters can be prescribed: We use penalty energies on the shape parameters to control the shape of individual parts. We can also couple parameters of different parts with energies of a similar form, for example, to keep shapes of the same type symmetric. The accompanying video shows some morphs between shapes with random shape space parameters, as well as an interactive editing session. A result of interactive editing is also shown in Figure 8c.
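The three kinds of user constraints can be sketched as quadratic penalty terms. The quadratic form, the weights, and the function names are assumptions made for illustration; they are one common way to express soft constraints in a least-squares framework.

```python
def handle_energy(points, handles, weight=1.0):
    """Pull selected vertices towards user-specified handle positions.

    handles maps a vertex index to its target position.
    """
    return weight * sum(
        sum((pc - tc) ** 2 for pc, tc in zip(points[i], target))
        for i, target in handles.items()
    )

def parameter_energy(params, prescribed, weight=1.0):
    """Keep selected shape parameters close to prescribed values."""
    return weight * sum((params[k] - value) ** 2 for k, value in prescribed.items())

def coupling_energy(params_a, params_b, weight=1.0):
    """Keep the shape parameters of two parts of the same type equal (e.g., for symmetry)."""
    return weight * sum((a - b) ** 2 for a, b in zip(params_a, params_b))

# Toy values: one handle on vertex 1, one prescribed parameter, two coupled parts.
points = [(0.0, 0.0, 0.0), (1.0, 1.0, 1.0)]
e_handle = handle_energy(points, {1: (1.0, 1.0, 3.0)})
e_param = parameter_energy([0.5, 2.0], {1: 1.0})
e_couple = coupling_energy([1.0, 2.0], [1.0, 4.0])
```

Because all three terms are quadratic in the unknowns, adding them keeps the linear step of the alternating optimization a linear system.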
Impact of the part-based model: Using a part-based approach has a number of advantages: First, as shown in Figure 7a,b and in the video, we can capture discrete, topological variations in addition to continuous shape parameters: The irons consist of parts that can be assembled in different variations; the four-legged animals also distinguish open and closed mouths. Learning these shape families would be very challenging with global approaches. Despite the part-wise approach, our gradient-domain synthesis algorithm yields smooth boundaries in all cases (deactivating, for example, the smoothed connections illustrated in Figure 6c degrades the quality significantly). Further, our learning method benefits from symmetry within a shape; for example, all four legs in each animal share the same shape space, as do the wings of the birds. We also obtain additional benefits: As illustrated in Figure 8a,b, we can learn more compact shape spaces using well-chosen parts. A global model yields poor correspondences for strong entropy penalties. Weakening the penalty improves the results, but the model then learns global correlations that are often unwanted. For example, in the case of the teddy bear in Figure 8c, global pose correlations are captured, which tilt the object against the position constraints on the chest; this makes the constraints harder to satisfy and yields worse results (see also the video for an animated visualization). In summary, parts give us a more flexible model that allows us to integrate topologically diverse shapes and to learn shape spaces from fewer examples, avoiding overfitting.
Figure 7: (a) four-legged animals, (b) irons, (c) teddy bears, (d) birds, (e) fish, (f) samples from the shape space of (a).
Figure 8: (a) part-based vs. global optimization, high penalty; (b) part-based vs. global optimization, lower penalty; part-based (left) vs. global (right).
Figure 9: (a) source shape, (b) target shape, (c) elastic (ARAP), (d) elastic (subspace), (e) thin-plate splines, (f) blended intrinsic maps, (g) spoke and hub, (h) our result, (i) input shape collection.
Limitations: The most important limitation of our method is that it is a local optimization technique, thus requiring considerable user interaction as well as parameter choices. Although we require only a coarse initialization, an overly coarse annotation causes the algorithm to get stuck in a local optimum. A further, theoretical limitation is that we cannot formally guarantee bijectivity of correspondences, but we have not observed problems in practice. Finally, the hard surface constraints in the optimization limit the applicability to manifold input. Noisy and, in particular, incomplete data from 3D scans currently cannot be handled.
6. Conclusions and Future Work
We have presented a new method for refining correspondences in families of shapes. By taking the compactness of the shape space into account as an optimization criterion, we obtain high-quality dense correspondences among shapes of considerable variability. In direct comparison, previous methods show substantial artifacts in such situations, which our method avoids. Even difficult situations such as strong deformations and widely varying geometry yield good results. Our method can handle objects of general topology, it handles challenging meshes with small feature sizes reliably, and it is able to learn from objects of varying part composition. The learned model can be used to synthesize new shapes with variable part configuration and continuous variability that adapts automatically to the designed part layout. Further, the part-based approach yields higher-quality correspondences and is a useful tool to avoid overfitting.
In future work, we would like to extend the method towards fully automatic global matching, avoiding tedious manual initialization. Recent progress in co-segmentation would provide a starting point here, but a fully automatic method would require making our method robust to slight variations in part topology and outlier mismatches. In the long term, the question of how to build compact explanations from observed data is of fundamental importance. An ultimate modeling system with deformable parts would decompose shape collections automatically to obtain a shape grammar and various deformable, dockable shape spaces of parts, both optimized for compactness of encoding. While our model can in principle already handle such scenarios in terms of representation and synthesis, the automated analysis is the key challenge.
- [Adams et al., 2008] Adams, B., Ovsjanikov, M., Wand, M., Seidel, H.-P., and Guibas, L. J. (2008). Meshless modeling of deformable shapes and their motion. In Proc. of SCA.
- [Allen et al., 2003] Allen, B., Curless, B., and Popović, Z. (2003). The space of human body shapes: Reconstruction and parameterization from range scans. ACM Trans. Graph., 22(3):587–594.
- [Blanz and Vetter, 1999] Blanz, V. and Vetter, T. (1999). A morphable model for the synthesis of 3d faces. In Proc. SIGGRAPH, pages 187–194.
- [Botsch and Kobbelt, 2004] Botsch, M. and Kobbelt, L. (2004). A remeshing approach to multiresolution modeling. In Proceedings of the 2004 Eurographics/ACM SIGGRAPH symposium on Geometry processing, SGP ’04, pages 185–192, New York, NY, USA. ACM.
- [Bronstein et al., 2006] Bronstein, A. M., Bronstein, M. M., and Kimmel, R. (2006). Generalized multidimensional scaling: a framework for isometry-invariant partial surface matching. Proc. National Academy of Sciences (PNAS), 103(5):1168–1172.
- [Brown and Rusinkiewicz, 2007] Brown, B. and Rusinkiewicz, S. (2007). Global non-rigid alignment of 3-d scans. ACM Trans. Graph., 26(3).
- [Calakli and Taubin, 2011] Calakli, F. and Taubin, G. (2011). Ssd: Smooth signed distance surface reconstruction. Computer Graphics Forum, 30(7).
- [Cates et al., 2006] Cates, J., Fletcher, P., and Whitaker, R. (2006). Entropy-based particle systems for shape correspondence. In MICCAI Workshop Mathematical Foundations of Computational Anatomy, volume 9, pages 90–99. Med Image Comput Comput Assist Interv. MICCAI 2006.
- [Cootes et al., 1995] Cootes, T., Taylor, C., Cooper, D., and Graham, J. (1995). Active shape models-their training and application. Computer Vision and Image Understanding, 61(1):38 – 59.
- [Davies et al., 2010] Davies, R., Twining, C., Cootes, T., and Taylor, C. (2010). Building 3-d statistical shape models by direct optimization. Medical Imaging, IEEE Transactions on, 29(4):961 –981.
- [Davies et al., 2002a] Davies, R., Twining, C., Cootes, T., Waterton, J., and Taylor, C. (2002a). A minimum description length approach to statistical shape modeling. Medical Imaging, IEEE Transactions on, 21(5):525 –537.
- [Davies et al., 2002b] Davies, R. H., Twining, C. J., Cootes, T. F., Waterton, J. C., and Taylor, C. J. (2002b). 3d statistical shape models using direct optimisation of description length. In Proc. European Conf. Comp. Vision (ECCV), pages 3–20. Springer.
- [Duda et al., 2000] Duda, R. O., Hart, P. E., and Stork, D. G. (2000). Pattern Classification (2nd Edition). Wiley-Interscience.
- [Ericsson and Åström, 2003] Ericsson, A. and Åström, K. (2003). Minimizing the description length using steepest descent. In Proc. British Machine Vision Conference, Norwich, United Kingdom, volume 2, pages 93–102.
- [Feng et al., 2008] Feng, W.-W., Kim, B., and Yu, Y. (2008). Real-time data-driven deformation using kernel canonical correlation analysis. ACM Transactions on Graphics, 27(3).
- [Hähnel et al., 2003] Hähnel, D., Thrun, S., and Burgard, W. (2003). An extension of the icp algorithm for modeling nonrigid objects with mobile robots. In Proc. Int. Joint Conf. on Artificial Intelligence (IJCAI), pages 915–920.
- [Hasler et al., 2009] Hasler, N., Stoll, C., Sunkel, M., Rosenhahn, B., and Seidel, H.-P. (2009). A statistical model of human pose and body shape. In Dutré, P. and Stamminger, M., editors, Computer Graphics Forum (Proc. Eurographics 2008), volume 2, Munich, Germany.
- [Heimann et al., 2005] Heimann, T., Wolf, I., Williams, T., and Meinzer, H.-P. (2005). 3d active shape models using gradient descent optimization of description length. In Christensen, G. and Sonka, M., editors, Information Processing in Medical Imaging, volume 3565 of Lecture Notes in Computer Science, pages 1–3. Springer Berlin / Heidelberg. 10.1007/11505730_47.
- [Hill and Taylor, 1994] Hill, A. and Taylor, C. J. (1994). Automatic landmark generation for point distribution models. In Proceedings of the conference on British machine vision (vol. 2), BMVC 94, pages 429–438, Surrey, UK. BMVA Press.
- [Huang et al., 2011] Huang, Q., Koltun, V., and Guibas, L. (2011). Joint-shape segmentation with linear programming. ACM Trans. Graph., 30(5):125:1–11.
- [Huang et al., 2012] Huang, Q., Zhang, G., Gao, L., Hu, S., Butscher, A., and Guibas, L. (2012). An optimization approach for extracting and encoding consistent maps in a shape collection. ACM Transactions on Graphics, 31:125:1–125:11.
- [Kalogerakis et al., 2010] Kalogerakis, E., Hertzmann, A., and Singh, K. (2010). Learning 3d mesh segmentation and labeling. ACM Trans. Graph., 29(3).
- [Kim et al., 2012] Kim, V. G., Li, W., Mitra, N., DiVerdi, S., and Funkhouser, T. (2012). Exploring collections of 3d models using fuzzy correspondences. In ACM SIGGRAPH 2012 papers, SIGGRAPH ’12.
- [Kim et al., 2011] Kim, V. G., Lipman, Y., and Funkhouser, T. (2011). Blended intrinsic maps. ACM Trans. Graph., 30(4):79:1–79:12.
- [Kotcheff and Taylor, 1998] Kotcheff, A. C. and Taylor, C. J. (1998). Automatic construction of eigenshape models by direct optimization. Medical Image Analysis, 2(4):303 – 314.
- [Kraevoy and Sheffer, 2004] Kraevoy, V. and Sheffer, A. (2004). Cross-parameterization and compatible remeshing of 3d models. ACM Trans. Graph., 23(3):861–869.
- [Lévy et al., 2002] Lévy, B., Petitjean, S., Ray, N., and Maillot, J. (2002). Least squares conformal maps for automatic texture atlas generation. In ACM, editor, ACM SIGGRAPH conference proceedings.
- [Li et al., 2008] Li, H., Sumner, R. W., and Pauly, M. (2008). Global correspondence optimization for non-rigid registration of depth scans. Computer Graphics Forum (Proc. SGP), 27(5):1421–1430.
- [Lipman and Funkhouser, 2009] Lipman, Y. and Funkhouser, T. (2009). Möbius voting for surface correspondence. In Proc. of SIGGRAPH ’09, pages 1–12.
- [Nguyen et al., 2011] Nguyen, A., Ben-Chen, M., Welnicka, K., Ye, Y., and Guibas, L. (2011). An optimization approach to improving collections of shape maps. Computer Graphics Forum (Proc. SGP), pages 1481–1491.
- [Ovsjanikov et al., 2010] Ovsjanikov, M., Mérigot, Q., Mémoli, F., and Guibas, L. (2010). One point isometric matching with the heat kernel. In Proc. of SGP, pages 1555–1564.
- [Rissanen, 1978] Rissanen, J. (1978). Modeling by shortest data description. Automatica, 14(5):465 – 471.
- [Rusinkiewicz and Levoy, 2001] Rusinkiewicz, S. and Levoy, M. (2001). Efficient variants of the ICP algorithm. In Proceedings of the Third Intl. Conf. on 3D Digital Imaging and Modeling, pages 145–152.
- [Sidi et al., 2011] Sidi, O., van Kaick, O., Kleiman, Y., Zhang, H., and Cohen-Or, D. (2011). Unsupervised co-segmentation of a set of shapes via descriptor-space spectral clustering. ACM Trans. Graph., 30(6):126:1–126:10.
- [Sorkine and Alexa, 2007] Sorkine, O. and Alexa, M. (2007). As-rigid-as-possible surface modeling. In Proceedings of Eurographics/ACM SIGGRAPH Symposium on Geometry Processing, pages 109–116.
- [Sumner et al., 2005] Sumner, R. W., Zwicker, M., Gotsman, C., and Popović, J. (2005). Mesh-based inverse kinematics. ACM Trans. Graph., 24(3):488–495.
- [Sunkel et al., 2013] Sunkel, M., Jansen, S., Wand, M., and Seidel, H.-P. (2013). A correlated parts model for object detection in large 3d scans. Computer Graphics Forum (Special Issue of Eurographics). to appear.
- [Tena et al., 2011] Tena, J. R., De la Torre, F., and Matthews, I. (2011). Interactive region-based linear 3d face models. In Proc. SIGGRAPH.
- [Thodberg, 2003] Thodberg, H. H. (2003). Minimum description length shape and appearance models. Inf Process Med Imaging, 18:51–62.
- [Thodberg and Olafsdottir, 2003] Thodberg, H. H. and Olafsdottir, H. (2003). Adding curvature to minimum description length shape models. In Proc. British Machine Vision Conference, pages 251–260. BMVC.
- [van Kaick et al., 2011] van Kaick, O., Tagliasacchi, A., Sidi, O., Zhang, H., Cohen-Or, D., Wolf, L., and Hamarneh, G. (2011). Prior knowledge for part correspondence. Computer Graphics Forum (Proc. Eurographics), 30(2):553–562.
- [Zhang et al., 2004] Zhang, L., Snavely, N., Curless, B., and Seitz, S. M. (2004). Spacetime faces: High resolution capture for modeling and animation. ACM Trans. Graph., 23(3):548–558.