A Survey on Non-rigid 3D Shape Analysis

12/25/2018 ∙ by Hamid Laga, et al. ∙ 6

Shape is an important physical property of natural and manmade 3D objects that characterizes their external appearances. Understanding differences between shapes and modeling the variability within and across shape classes, hereinafter referred to as shape analysis, are fundamental problems to many applications, ranging from computer vision and computer graphics to biology and medicine. This chapter provides an overview of some of the recent techniques that studied the shape of 3D objects that undergo non-rigid deformations including bending and stretching. Recent surveys that covered some aspects such classification, retrieval, recognition, and rigid or nonrigid registration, focused on methods that use shape descriptors. Descriptors, however, provide abstract representations that do not enable the exploration of shape variability. In this chapter, we focus on recent techniques that treated the shape of 3D objects as points in some high dimensional space where paths describe deformations. Equipping the space with a suitable metric enables the quantification of the range of deformations of a given shape, which in turn enables (1) comparing and classifying 3D objects based on their shape, (2) computing smooth deformations, i.e. geodesics, between pairs of objects, and (3) modeling and exploring continuous shape variability in a collection of 3D models. This article surveys and classifies recent developments in this field, outlines fundamental issues, discusses their potential applications in computer vision and graphics, and highlights opportunities for future research. Our primary goal is to bridge the gap between various techniques that have been often independently proposed by different communities including mathematics and statistics, computer vision and graphics, and medical image analysis.

READ FULL TEXT VIEW PDF
POST COMMENT

Comments

There are no comments yet.

Authors

page 4

page 5

page 10

page 14

page 17

page 22

page 23

page 24

This week in AI

Get the week's most popular data science and artificial intelligence research sent straight to your inbox every Saturday.

1 Introduction

IEEEexample:BSTcontrol Shape is an important physical property of natural and manmade objects that characterizes their external appearances. Understanding differences between shapes and modeling the variability within and across shape classes are fundamental problems and building blocks to many applications, ranging from computer vision and computer graphics to biology and medicine. In medicine, studying shapes of 3D anatomical structures and modeling their growth patterns are of particular interest since many diseases can be linked to alterations of these shapes [1, 2, 3]. In computer graphics, 3D modeling using low level primitives (e.g. polygons) is a tedious and time-consuming process. With the growing availability of 3D models via online repositories, recent research efforts have shifted their focus towards data-driven techniques for generating 3D shape variations from existing ones. Data-driven techniques are also gaining momentum in the field of 3D reconstruction and modeling from images, range scans and noisy point clouds [4], which are ill-posed problems. By leveraging shared information across multiple objects, data-driven techniques aim at discovering patterns, which can be used as geometric, structural, and semantic priors for driving the 3D reconstruction and modeling processes.

When dealing with numbers, statistics is the science for studying and modeling variabilities. Geometric shapes are fundamentally different. Research in this area, driven by the computer vision, medical imaging, computer graphics and statistics communities, seeks to develop mathematical tools for manipulating shapes in the same way we manipulate numbers. This includes:

  • Comparing the shapes of two objects, i.e. saying whether two objects have similar shape or not, and more importantly, quantifying and localizing the similarities and differences.

  • Computing summary statistics and prototypes, such as the mean shape, modes of variations and higher order statistics, of a collection of 3D models.

  • Mathematical modeling of shape variations,

    using, for example, probability distributions as generative models. These models form priors for random sampling and for statistical inferences, and

  • Exploring the shape variations

    for synthesizing valid shapes and performing interpolations, extrapolations, statistical inferences, and regressions.

Implementing these ideas, for 3D models, requires solving several challenges, each one has been the subject of important research and contributions. The first one is about the representation of shape. When selecting any representation, it is important to ensure that two different shapes cannot have the same representation and that a given representation can always be associated to a shape.

Second, almost any shape analysis task requires some measure of dissimilarity, hereinafter referred to as a metric, for comparing shapes. The challenge when designing a suitable metric comes from the fact that 3D objects have arbitrary pose, scale, and parameterization. They also bend, stretch, and change their topology and structure. Some of these variations, e.g. translation, pose, scale, and parameterization, are shape-preserving, while others affect the shape. Thus, the metric should be invariant to transformations that are shape-preserving (translation, scale, rotation and re-parameterization) and should quantify those that change the shape (bending, stretching, and structural changes).

The third difficulty in the analysis of 3D shapes is registration, i.e. matching points across objects. Any shape metric requires a registration component to help decide which point on one object is compared to which point on the other. This is often a difficult problem to solve, especially across objects with large deformations and pose variability. Thus, it is not surprising that a large body of literature is specifically dedicated to this problem, see for example [5] for a detailed survey of this topic.

Finally, there are several computation and numerical challenges that one needs to address. For instance, several shape representations are non-linear. This results in shape spaces and metrics that are non-Euclidean. As a consequence, standard techniques, such as Principal Component Analysis (PCA), for statistical analysis cannot be directly applied. Moreover, analytical solutions, e.g. for correspondence and geodesics, often do not exist and one has to rely on optimizing non-convex potential functions. This makes the problem even challenging and one often has to compromise between accuracy and computational efficiency.

The purpose of this chapter is to provide a comprehensive survey of the state-of-the-art techniques in elastic 3D shape analysis. By elastic we refer to 3D shapes that undergo elastic deformations, i.e. those that bend and stretch. This already forms a large class of natural 3D objects with a significant large body of literature and techniques dedicated to the topic. We structure the survey into four main sections. Section 2 provides a general formulation of the problem and some background materials on which the subsequent topics are built upon. Section 3 focuses on the concepts of shape spaces and metrics and their relation to physical deformations. Section 4 treats the elastic correspondence / registration problem and its closely related problem of computing geodesics between pairs of shapes. Section 5 discusses how these building blocks are used for computing mean shapes, studying the modes of variations and characterizing shape collections with statistical models. Section 6 discusses a few concrete examples where these concepts are used. Finally, we conclude in Section 7 by summarizing the main lessons learned in this survey and discussing future directions for research.

2 General formulation

We provide in this section a general formulation of the elastic shape analysis problem. We first overview the basic representations of 3D objects, which are essential for the subsequent sections of this chapter. Then, we formulate the elastic shape analysis problem and give the taxonomy that we will use for classifying state-of-the-art techniques.

2.1 Representations

Commonly, shape is interpreted as the boundary of a 3D object. Mathematically, it can be defined as a function , which maps every point of the domain to a point in . Here is equal to or depending on the choice of the representation.

A common choice of the function , and thus the domain , is the signed distance function, , whose values at a point is the signed distance of that point to the nearest point on the boundary of the object being represented. Positive (respectively negative) values of correspond to regions in space that are outside (respectively inside) the object. Subsequently, points such that form the boundary of the object. This representation is commonly known as level-set representation since for any , the points such that form a surface called level-set. This representation is implicit since one has to solve the equation in order to identify the points that belong to the object of interest.

In contrast to implicit representations, one can represent the boundaries of an object using an explicit function , which assigns to every point a three-dimensional point . The set of points then forms the 3D object, which is often referred to as an embedding of the canonical domain . Although this seems to be the most natural representation, the choice of the domain remains an open challenge since there is no single domain that is suitable for every type of objects. For instance, open surfaces such as 3D human faces can be easily embedded on a planar domain . can be, for example, a unit disk. Similarly, closed surfaces of genus-0 can be embedded onto a unit sphere .

But how is it done in practice? A 3D model is, in general, produced either by scanning a real 3D object, using the various types of 3D sensors that are currently available, or by using a 3D modeling software. In any case, a 3D model is represented as an unorganized point cloud or a uniformly-sampled set of points. To facilitate rendering and visualization processes, the point clouds are converted into polygonal meshes since many rendering algorithms as well as graphics processing units are optimized for processing planar, particularly triangular, polygons. As such, the boundary of a 3D object is represented as a piecewise linear surface in the form of a triangulated mesh , i.e. a set of vertices and triangles such that the triangles intersect only at common vertices or edges [6].

Finally, the parameterization process takes a triangulated mesh , finds a suitable continuous domain , referred to as the parameterization domain

, and then estimates the underlying continuous surface

. If is a disk-like surface, then can be chosen as a simply-connected region of the plane, e.g. the unit disk. When dealing with closed genus-0 surfaces then the unit sphere is the natural parameterization domain. We refer the reader to the paper of Floater et al. [6] for a detailed survey of different parameterization techniques.

In what follows, we assume, unless explicitly specified, that and thus is composed of the longitude and latitude coordinates.

2.2 Invariance requirements

An important challenge, when designing a framework for shape analysis, is to discard effects or variables that do not affect the shape of an object and to account only for those that affect shape. Kendall [7] described shape as a property of an object once its rotation, translation, scale, are parameterization are removed from the representation. In the field of shape analysis, the transformations associated with translation, rotation, and uniform scaling are termed shape-preserving. They are nuisance variables that should be discarded. Translation and scaling are probably the easiest ones to deal with. For instance, let be the space of all surfaces . One can discard translation by first translating so that its center of mass will be located at the origin: Here is the local surface area at

. It is defined as the norm of the normal vector to the surface at

. In the case of spherically parameterized surfaces, i.e. then and . The scale component can be also discarded by normalizing the surface in such a way that its overall surface area is one: The space of all origin-centered and unit-surface-area surfaces is called pre-shape space.

In addition to scale and translation, 3D objects undergo two other shape-preserving transformations, which are rotation and re-parameterization. Rotations, denoted by , are elements of , which are matrices. They transform each point into . Here, we denote by the rotated surface.

Re-parameterization is a diffeomorphism , which transforms a surface into . Let denote the space of all such diffeomorphisms. Diffeomorphisms are shape preserving transformations; that is, the surfaces and have the same shape, albeit having different parameterizations. Note that re-parameterization is important in 3D shape analysis because it provides registration. Consider two surfaces and where is, for example, the surface of a cat while is the surface of a horse, eventually in a different pose, see Figure 1. Let such that corresponds to the nose tip of . If and are arbitrarily parameterized, which is often the case since they have been parameterized independently from each other, may refer to any other location on , the ear tip for example. Thus and are not in correct correspondence. Putting and in correspondence is equivalent to reparameterizing with a diffeomorphism such that for every , and point to the same feature, i.e. nose tip to nose tip, etc..

Fig. 1: An example of correct correspondences between two 3D objects that bend and stretch. In general, however, the analysis process should find the optimal reparamterization of to put it in full correspondence with .

The simplest way of removing the rotation variability is to make sure that all shapes being analyzed are aligned, in a consistent fashion, to a common coordinate system. One way to do this is to compute the principal axis of the objects being analyzed, using Principal Component Analysis (PCA), and align these axis to the axis of the coordinate frame. This approach, however, is very sensitive to outliers and is sub-optimal especially in the presence of large elastic deformations.

Alternatively, invariances to rotations and re-parameterizations can be dealt with algebraically. The idea, in the case of rotations for example, is that a surface and any other surface obtained by rotating have the same shape. That is, , and are equivalent. Similarly, a surface and any of its possible reparameterizations are equivalent. Thus the set forms an equivalence class under the action of the rotation and reparameterization groups. The set of all equivalence classes is called shape space.

2.3 Problem statement and taxonomy

We are given a collection of origin-centered and scale-normalized 3D objects . For the generality of formulation, we assume that each object is represented as an embedding: , which assigns to every point a 3D point . When talking about shape analysis, we are interested in (1) quantifying similarities and differences in their shape, and (2) characterizing shape variability in the collection, which in turn can be done by computing summary statistics such as mean shape and modes of variation, and fitting probability distributions to the population in the same way as it is normally done with numbers.

Intuitively, the mean shape can be defined as the object whose shape is as close (similar) as possible to all the shapes of the other objects in the collection. This is known as the Karcher mean. Mathematically, we seek to find a shape and a set of optimal rotations and diffeomorphisms that align every surface onto the computed mean shape :

(1)

where is a certain measure of dissimilarity (or distance) that quantifies shape differences. This equation can be simplified by writing it in terms of orbits or elements of the shape space as follows:

(2)

where

Similarly, one can define the variance as the expected value of the squared distance from the mean 

[8]:

(3)

Implementing this simple general formulation requires solving the following challenging fundamental problems;

  • Choice of the metric . Computing shape statistics depends on the choice of the metric, or measure dissimilarity . The challenge is in designing a metric that captures shape differences while being invariant to the set of transformations that preserve shape.

  • The registration or correspondence problem. Naturally, the solution to Eqn. (2) depends on the way the input surfaces are parameterized, or registered, i.e. the process of finding the optimal rotations and diffeomorphisms .

  • Computational issues that arise when using complex metrics that are physically motivated. One often deals with large collections of 3D models and thus the process of evaluating the metric, computing geodesics and shape statistics should be computationally efficient.

The following sections discuss these challenging problems and present different solutions that have been proposed in the literature.

3 Shape spaces and metrics

In what follows, we represent the boundary of a 3D object as an embedding . Here, we assume that the parameterization domain is the unit sphere. It maps every point to a point . For clarity purposes, we assume at this stage that the given surfaces have been normalized for translation and scale, thus they are elements of .

In shape analysis, a dissimilarity measure quantifies the amount of deformations, or energy, that one needs to apply to one shape in order to align it onto the other. Consider the example of Fig. 2 where is a straight cylinder and is a bended one. The deformation of onto results in a sequence of intermediate shapes such that , and . Here, is a (deformation) vector field such that . The sequence can be seen as a path or a curve in the preshape space . Its length is given by:

(4)

where is a certain measure of distance between the surfaces and . When is sufficiently large, can be seen as a continuous deformation of onto . It can be interpreted as a parameterized path such that . Its length is given by:

(5)

Here, is in fact an infinitisemal vector field that deforms the surface . It can be also interpreted as a small perturbation of the surface. The inner product, , also known as the metric, measures the strength of this vector field. Thus, integrating over provides the length of the path.

In general, a surface is treated as an element, or a point, in a pre-shape space . A vector field that deforms is then a vector that is tangent to at . Let denote the tangent space to at . A metric is an inner product on . It takes two tangent vectors and and returns their inner product . The norm of a vector , according to the metric , is given by . Since there are many paths that deform onto , we are particularly interested in the shortest one, under the metric, which is called the geodesic in the pre-shape space :

(6)

The length of the geodesic path is the dissimilarity, in the preshape space, between the two surfaces:

Since almost every 3D shape analysis task includes a step in which shapes are compared based on some measure of dissimilarity, the choice of the metric is very critical for the subsequent analysis tasks.

3.1 Kendall’s shape space

The fundamentals of statistical analysis of shapes have been laid by Kendall [7], as early as , and advanced by many others [9, 10, 11, 12]. The basic idea is to represent the shape of a 3D object using a finite set p of ordered points, called landmarks, . These points are sampled from the shape boundary and put in correspondence across shapes.

In the case of our setup where a surface is treated as a mapping from a parameterization domain to , Kendall’s representation is equivalent to having a fixed finite sampling of the parameterization domain . That is . As such, the space of all surfaces that can be represented with landmarks is . The pre-shape space, , i.e. the space of surfaces that are normalized for translation and scale, is a hypersphere , which is a subspace of . To make the representation invariant to rotations, Kendall defines the group action whose orbits are given by . The shape space can then be viewed as the quotient space . The rotation-invariant geodesic distance between two surfaces and is given as the geodesic distance between their orbits on , i.e. :

(7)

where is the length of the arc on that connects to . A geodesic path is then a great circle on between and where Kendall’s shape space, introduced in 1977, and its subsequent developments [9, 10, 11, 12], are considered as the foundation that lead to the major recent developments in statistical shape analysis.

3.1.1 Morphable models

In the case of planar objects, simply treating Kendall’s representation as a vector space, i.e. assuming that the preshape space is Euclidean, and is an metric, results in the 2D Active Shape Models (ASM) of Cootes et al. [13, 14, 15]. This formulation has been later used for the analysis of the shape of 3D objects that are in static poses, leading to what is now known as morphable models. Morphable models have been originally introduced for the analysis of the 3D shape of human faces [16]. They have been later used for the analysis of the 3D shape of human bodies [17] as well as the shape of objects originating from various domains including archaeology, astronomy, morphometrics, and medical diagnosis [12, 18, 11].

Mathematically, for a given pair of vectors and , the metric is defined as:

(8)

where is the standard inner product. The geometric interpretation of this metric is as follows; A surface can be optimally deformed to another surface by simply adding to each point a displacement vector such that The optimal path that connects to , induced by the metric of Eqn. 8, is simply a straight line. Subsequently, the difference, or dissimilarity, between and is given by

(9)

Assuming that is Euclidean, and under a fixed parameterization, finding the optimal rotation that aligns one surface onto another is equivalent to finding the rotation that minimizes . This results in the Procrustean metric on the shape space

, which can be efficiently solved using Singular Value Decomposition (SVD) 

[19]. Now, for a fixed rotation and reparameterization, i.e. the surfaces are already registered, both the mean shape of Eqn. (1) as well as the covariance matrix have analytical solutions:

Thus, efficient parameterization of the shape variability in a collection of 3D shapes can be performed using Principal Component Analysis (PCA). Let

, be the leading eigenvalues of

, and

their corresponding eigenvectors, called

modes of variation. A new statistically feasible shape is given by where are coefficients that control the contribution of each mode of variation. Arbitrary new shapes can be generated by varying the coefficient vector

whose probability follows a Gaussian distribution defined as

. Thus, if one wants to synthesize a plausibe shape, i.e. a shape that is similar, but not identical, to those in the collection, one only has to generate arbitrary s and retain those that have high probability.

This approach is known as morphable models and is an extension of the Active Shape Models (ASM) developed by Cootes et al. [13, 14, 15] for the analysis of planar objects. They have been first introduced to 3D shape analysis by Blanz and Vetter in their seminal work on the synthesis of 3D faces [16]. Morphable models have been later generalized to characterize the space of the entire human body shapes [17]. Both in [16] and [17], the objects under study are assumed to be in neutral poses and their landmarks are in a one-to-one correspondence.

3.1.2 The non-linear nature of Kendall’s shape space

(a) Linear path (b) Geodesic path
by SRNF inversion
Fig. 2: Linear interpolation in vs. geodesic path by SRNF inversion (see Section 4). Image courtesy of [20].

Although the metric on Kendall’s shape space has been extensively used in a wide range of applications, e.g. archaeology, astronomy, morphometrics, medical diagnosis [12, 18, 11], it is only suitable for computing geodesics, and thus statistics, between shapes that undergo small deformations, e.g. anatomical shapes, 3D faces, or human shapes in a neutral pose. This main limitation is illustrated in Figure 2 using two synthetic cylinders. The example shows optimal deformation paths between surfaces and , where is a straight cylinder and is a bent cylinder. Figure 2-(a) shows the linear path between and in (i.e. after full registration) obtained using the metric by connecting each pair of corresponding points with a straight line. The intermediate shapes along this path shrink unnaturally. Figure 2-(b) shows a natural geodesic path computed with a metric that captures the shape deformations (here using SRNF inversion, which will be described later).

Kilian et al. [21] build on Kendall’s representation a new metric that does not suffer from the shrinkage issue of the metric. The idea is to define the distance between a surface and as the strength of a vector field that penalizes elastic deformations, enforcing as rigid-as-possible and as-isometric-as-possible deformations along geodesic paths. That is, given a shape , define the inner product between two vectors and in as:

(10)

where and is the local Voronoi area at the vertex . The first term penalizes non-rigid deformations. To define it, consider first that the deformation vector can be written as where is only composed of rigid transformations and is only composed of non-rigid deformations. To have a metric that is invariant to rigid transformations while quantifying non-rigid ones then one can set . As such, if and only if is a rigid motion. This term ensures invariance to translation, scale and rotation without normalization in a pre-processing step.

The second term of Eqn. (10) favours bending over stretching. It is defined as the strength of the deformations that remain after discarding isometric motions, i.e. motions that preserve geodesic distances along the surface of . In other words, it can be seen as a measure of stretch in terms changes in the length of curves along the surface . Since Kilian et al. [21]’s approach works on triangulated meshes then

where and are two adjacent vertices on the triangulated mesh that represents the surface . Thus, if and only if just bends the shape . In other terms, when deforming the straight cylinder of Figure 2, the metric will favor bending over stretching and thus it will minimize the shrinkage that is observed when using the standard metric.

The last term is a regularizer that makes the metric Riemannian. It also ensures that the geodesic between two isometric shapes is unique. The main limitation of this metric is the choice of the parameter in order to ensures that the metric is Riemannian. Large values of will give high weight to the Euclidean term of Eqn. (10). Thus, the metric may suffer from the shrinkage issue observed in morphable models (see Figure 2). Note that Kilian et al. empirically set this parameter to .

3.2 Metrics that capture physical deformations

Instead of using the metric on , one would like to explicitly capture the type of deformations that one needs to apply to in order to align it onto . Such deformations can be of two types; bending and stretching. This requires redefining Eqn. (5) in terms of an energy function that penalizes these deformations:

where and are respectively the stretching and bending energies. The parameters and control the contribution of each of the terms to the energy function. Below we discuss two important pieces of work that implement this model.

The first one (section 3.2.1) is inspired by the elasticity theory in physics [22] where surfaces are treated as thin shells, i.e. a thin three-dimensional material of thickness . Stretching in this case is caused by in-layer (tangential) shear or compression while bending is caused by friction due to transversal shear [23]. The second one (Section 3.2.2) showed that with specific choices of the bending and stretching terms as well as their weights, the deformation model of Eqn. (3.2) reduces to an metric in the space of Square Root Normal Fields (SRNF), a special representation of surfaces [24].

3.2.1 The shape space of thin shells

Based on a physical deformation paradigm, a surface can be viewed as the middle layer of a thin viscous material of some finite (albeit small) thickness  [25]. Wirth et al. [26], Heeren et al. [23, 27], Berkels et al. [28], and Zhang et al. [25] used the deformation model of Eqn. (3.2), with and proportional to the squared thickness . The stretch energy, referred to as the membrane term, is given as , where

(12)

The term penalizes material compression. and

are the Lamé constants of the tangential Newtonian dissipation measure. The Cauchy-Green strain tensor,

, where is the first fundamental form or metric of the surface , accounts for changes in the metric between the source and target surfaces. In particular, the trace of the Cauchy-Green strain tensor, , accounts for local changes in curve lengths while the determinant accounts for local changes in surface area [25]. Thus, this can be seen as the generalization of the stretch metric of Kilian et al. [21], which only considered changes in the length of curves.

The bending energy should account for out-of-plane bending and changes in curvature. Since such information is encoded in the shape operator, then the most natural choice of measure for quantifying bending is variation in the shape operator [23]:

(13)

where denotes the Frobenius norm. The energy of Eqn. (13) takes into account full change in the second fundamental form. Heeren et al. [27] and Windheuser et al. [29] simplified this metric by only considering changes in the mean curvature, which is the trace of the shape operator ;

(14)

This energy is also known as the Willmore energy. In the discrete setup, Heeren et al. [30] approximated the mean curvature by the dihedral angle, i.e. the angle between the normals of two adjacent faces on the triangulated mesh.

Since this deformation model acts on the Cauchy-Green tensor and mean curvature then one can define the preshape space as the space of all pairs . An important observation that has been made in [23, 25, 31] is that the Hessian of the elastic deformation energy results in a proper Riemannian metric on the space of shells, modulo rigid body motions. That is, for an infinitesimal deformation , is a proper metric that measures the strength of this deformation. (Here, is the second derivative of in the direction of .) Moreover, Heeren et al. [23] showed that if and only if induces an infinitesimal rigid motion. This last property is highly desirable in shape analysis where, as described in Section 2.2, one seeks a framework that is invariant to rigid motions of shapes. This properties guarantees this invariance without normalization for translation and rotation of the shapes being analyzed. Shapes, however, need to be normalized for scale.

3.2.2 The shape space of square-root representations

Instead of using the Hessian of the deformation energy, Jermyn et al. [24] showed that by representing a surface using the pair , one can define a full elastic metric on the product space of metrics (i.e. first fundamental forms) and unit normal fields as follows:

(15)

The first term of Eqn. (15) accounts for changes in the shape of a local patch that preserve patch area. The second term quantifies changes in the area of the patches. Thus, the first two terms capture full stretch. The last term measures changes in the direction of the surface normal vectors and thus captures bending. The parameters , and are positive weights that one can use to penalize the different types of deformations, e.g. if one wants to favor stretch over bending then the third term should be given a higher weight. As such, instead of working in the original space, e.g.  and , one can define a mapping such that where is the the first fundamental form of at and is the unit normal vector to the surface at . The preshape space is then the product space of metrics and unit normal vectors. The shape space is the quotient space , via the action of the rotation and re-parameterization groups.

Jermyn et al. [24] defined a new representation of surfaces called the square-root normal fields (SRNF), which are essentially surface normals scaled by the square root of the local area:

where and . The space of SRNFs, hereinafter denoted by , has very nice properties that are relevant for shape analysis. In particular, Jermyn et al. [24] showed that the metric on is a special case of the full elastic metric given by Eqn. (15). It corresponds to the case where , and :

(16)

Since is just , the Eqn. (16) is equivalent to

(17)

Thus, the metric in the space of SRNFs is equivalent to a weighted sum of surface area change and surface bending. This property of SRNFs makes them very promising as a representation of surfaces for elastic shape analysis. If geodesics, mean shapes, PCA, etc. could be computed in under the metric, and then mapped back to or then there would be large gains in computational efficiency with respect to other metrics such as those defined on the space of thin-shells. Unfortunately, there is no analytical expression for for arbitrary points in . Moreover, the injectivity and surjectivity of remain to be determined, meaning that for a given , there may be no such that , and if such an does exist, it may not be unique.

If one cannot invert the representation, one can always pull the metric back to under and perform computations there, as in  [32], but this is computationally expensive, and rather defeats the purpose of having an metric in the first place. Although an analytical solution to the inversion problem remains an open problem, Laga et al. [20] showed that one can always estimate, for a given a surface whose SRNF representation is as close as possible .

SRNF representation is a generalization of the Square-Root Velocity Function (SRVF) [33] proposed for elastic analysis of planar curves and used in many applications including biology and botany [34, 35]. It is also a special case of the family of square-root maps (SRM). Another example of SRM is the Q-maps introduced by Kurtek et al. [36, 37, 38, 39, 40], which, similar to SRNF maps, is used for the analysis of the shapes of parameterized surfaces using the metric. It is defined a mapping , where such that .

Properties. SRNF and Q-maps share very nice mathematical properties that make them suitable for shape analysis. In particular, both SRNFs and Q-maps are a subset of . This means that one can use the standard metric in the space of SRNFs or Q-maps for the analysis of shapes instead of the complex and expensive pullback elastic metrics. Furthermore, if a surface is re-parameterized according to , its corresponding Q-map or SRNF is given by , where is the determinant of the Jacobian of .

Another main motivation behind the use of Q-maps and SRNFs in surface analysis is that the action of is by isometries under the metric, i.e. , . This property is very important for joint rotation and re-parameterization-invariant comparison, and thus elastic registration, of surfaces as will be discussed in Section 4. As such, Q-maps and SRNF maps have been used for optimal re-parameterization, and thus registration, of spherical and quadrilateral surfaces [38, 37, 24, 20] that undergo isometric as well as elastic deformations, and surfaces that contain missing parts. They have been also used for computing geodesic paths between such surfaces [39, 20]. The strength of these two approaches is that shape comparison, registration and geodesic computations are performed under the same elastic Riemannian metric.

The main difference between Q-maps and SRNFs is that (1) the former lacks a physical interpretation; it does not have a clear relationship to an underlying elastic metric. The representation was solely devised for convenience of being able to compare the shapes of parameterized surfaces using the metric, and (2) unlike Q-maps, SRNF representation is invariant to translation since it is computed using the surface normals.

3.3 Transformation-based representations

The approaches described in Sections 3.1 and 3.2 represent a shape as an embedding while the metric quantifies the deformations the shape undergoes. Another approach is to model transformations that 3D objects undergo rather than the transformed objects themselves. These representations derive from Grenander’s pattern theory [1] in which a shape is not represented as such but as a deformation of another (fixed) shape, , called template [41]. This paradigm has been introduced by Grenander [1], developed in [42, 43, 44, 45] and applied to various problems in computational anatomy and more recently to the analysis of human body shapes [46].

Let’s assume that we are given a set of registered surfaces represented as triangulated meshes with the same graph topology as the template . We assume that the template has triangular faces, . Each triangular face can be represented with a matrix of its three vertices, , or with its edge matrix . Any surface of triangles can be represented with a set of transformation matrices that deform the template , i.e.  where is a matrix that encodes various types of deformations that can undergo. Hereinafter, we refer to as the deformation model.

Existing methods differ in the choice of (1) the template , (2) the deformation model , and (3) the metric used to measure distances in the space of deformations. We will discuss these aspects in the following subsections.

3.3.1 Choice of the template

When dealing with 3D models of the same class, e.g. human body shapes, one can define the template as a typical 3D model in a neutral pose. This has been used in the SCAPE model [46] and in many other papers [47, 48] for the analysis of human body shapes. In general, the template can be any surface, e.g. a sphere, of same topology and triangulation as the surfaces being analyzed. In practice, however, domain specific templates are desirable to improve robustness and computation time. Alternatively, one can define as a single canonical triangle in the -plane [49]. An exception to these representations is the SCAPE model [46] where, in addition to the template mesh , an articulated skeleton is used to capture articulated motion. These approaches operate on meshes that are in full correspondence with exactly the same connectivity.

3.3.2 Deformation models

The deformation model encodes different deformations that a triangular face undergoes with respect to its corresponding triangle on . These can be rotations due to articulated motion (e.g. bending an arm of a human body), stretching due to shape differences between say two individuals in the same pose, or pose induced deformations (e.g. muscle stretch due to arm bending).

SCAPE [46] used two deformation models, one acting on a template mesh and another on an articulated skeleton (which captures articulated motion). This greatly simplifies the mathematical formulation; A given mesh is represented with three sets of transformation matrices: , , and for pose induced deformations, body shape deformations, and articulated motions, respectively. Here, acts on an articulated skeleton used to capture articulated motion where refers to the rotation of the -th part of the skeleton to which the -th face belongs to.

Using separable deformation models assumes that intrinsic shape deformations and pose variations are independent. This, however, prevents one from capturing phenomena where there is a strong correlation between body pose and local deformations (e.g. near the joints or at the muscles). For example, the surface deformation generated by the motion performed by an athletic person exhibits different properties than the same motion carried out by a person with less pronounced muscles [47]. Separable models cannot capture such correlation between pose and body shape.

To overcome this limitation, Hasler et al. [47, 49] introduced a bi-linear model of pose and shape, without using a skeleton, where each triangular face is defined as . The matrices and , which represent the pose and shape components, respectively, are affine transforms applied to the template . Note that, while SCAPE requires a set of 3D shapes of the same subject in different poses to learn the space of pose deformations, Hasler et al. [47, 49] learn both spaces using a random mix of 3D models from different subjects in various poses. Both methods have shown detailed representation of the human body shape. However, using strong priors on a human body model, i.e. manually defined skeleton and body parts, makes these methods difficult to extend to other, especially free-form, object categories without redesigning the representation [49].

Another property of the representations above is that they have redundant degrees of freedom (DOF). For instance, Hasler et al. model deformations with

DoF and a non-linear encoding of triangle deformations that captures dependencies between pose and shape. Freifeld and Black [50] showed that triangle deformations lie in a dimensional nonlinear manifold. More precisely, , where is a rotation that maps the template triangle to a canonical planar triangle. The scaling and transformation are in-plane transformations. Finally, and are rotations that map the planar triangle to the 3D space. Note that is fixed since it depends on the template. Freifeld and Black also showed that is invertible and has DoF, which eliminates deformations that do not have physical meaning. A triangular mesh of faces is then represented with -tuples , where , is a matrix, and .

Finally, medial representations (M-reps), which are an extension of curve skeletons [51] and medial surfaces [52], have been introduced by Blum [53] as a tool for representing the geometry and structure of anatomical shapes [54, 55, 56, 57]. Unlike the deformation models described above, which act on the shape’s surface, a medial surface is formed by the centers of all spheres that are interior to the object and tangent to the object boundary in two or more points. Each medial point, called medial atom, is a 4-tuple , where is the location of the atom, is the radius of the maximum sphere that is tangent to the surface in two or more points, and are two unit spoke directions. A medial atom m is an element of the manifold . Fletcher et al. [57] showed that by representing the atoms with the deformation, i.e. translation, scaling, and rotation, of a canonical atom, e.g. , they become elements of , which is a Lie group. A grid of medial atoms representing a 3D object is an element of the Lie group , the product of copies of .

Since their introduction, M-reps have been used in various medical image processing tasks, including segmentation [58, 59], registration, shape discrimination [60], and brain-structure analysis [61, 62].

3.3.3 Metrics on the space of deformations

Representing shapes with the deformations of a template leads to shape spaces that are non Euclidean (rotations, for example, are elements of ). Some methods typically use a Euclidean representation of deformations and measure distances using the metric, ignoring the geometry of the space of deformations. The SCAPE model [46], for instance, learned a shape model using the Euclidean distance and PCA on the deformation matrices using only the subjects that are in the same neutral pose. Similarly, Hasler et al. [46] learned, separately, two low-dimensional models of shape and pose, and then combined them at a later stage.

Unlike these works, both M-reps [54, 55, 56, 57] and Lie bodies [50] exploit the manifold structure of the space of deformations where geodesics and statistics are computed using proper Riemannian metrics. An important property that makes these methods of practical interest is that, despite the non-linearity of the metric as well as the manifold, geodesic distances and geodesic paths between a pair of shapes have a closed-form formula, unlike other methods which require expensive optimization schemes, see [50] and [56, 57] for the mathematical details. This significantly simplifies the process of comparing shapes and computing geodesics and summary statistics.

Note that these representations require that all the shapes being analyzed to have the same connectivity and to be in one-to-one correspondence with each other and with the template. Such correspondences need to be computed using a different approach, and more importantly, using an optimality criteria that is often different from the one used for computing geodesics and statistics.

Methods Input Normalization requirement Preshape space Metric Geodesic distance
Trans. Scale Rot. Reg.
Morphable models [16, 17] yes yes yes yes
Kilian et al. [21] no no no yes . and are the non-rigid component of and , respectively. , and
Thin shells [23, 27, 28, 25] no no no yes
Q-maps[36, 37, 38, 39, 40] , yes yes no no on
SRNFs [24, 32, 63, 20] , no yes no no on
Lie bodies [50] yes yes yes yes ,
,
M-reps [54, 55, 56, 57] yes yes yes yes , the medial atom of the th shape.
TABLE I: Taxonomy of the most representative papers where: : number of landmarks or vertices, : Cauchy-Green tensor, : the shape operator, Trans.: translation, Rot.: rotation, Reg.: registration, : prehsape space of zero-centered and unit-area normalized surfaces, : shape space, preshape space of Q-maps or SRNFs, and : shape space of Q-maps or SRNFs.

4 Registration and geodesics

Now that we have introduced the concepts of metrics, pre-shape spaces, and shape spaces, we turn our attention to the problem of computing geodesics, i.e. smooth optimal deformations, , between a source surface and a target surface :

(18)

where is given by Eqn. (5) and depends on the choice of representation and metric. This double optimization problem finds the shortest path, and thus the geodesic, in the shape space between and . The inner minimization problem assumes that the surfaces are in correspondence, i.e. fixes the rotation and reparameterization , and finds the shortest path in that connects to . The outer optimization is a registration step, which finds the optimal rotation and reparameterization that bring as close as possible to without changing its shape. Clearly, these two problems, i.e. registration and geodesics, are interrelated and should be solved jointly using the same metric (or optimality criteria).

4.1 Registration

In shape analysis, registration quality is a major factor in determining the accuracy of the subsequent analysis steps such as computing optimal deformations (geodesics) or summary statistics (e.g. mean shapes and modes of variations). It has been extensively studied in the past two decades, see [5] and [64] for comprehensive surveys and different taxonomies of the state-of-the-art. In general, the registration problem can be mathematically stated as the problem of finding the optimal transformation, , and a re-parameterization such that is as close as possible to , i.e.  is minimized. Here is a certain measure of closeness in the space of surfaces. We have seen in Section 2.2 how to remove the variabilities due to translation and scale. In this section, we will focus on the remaining variables, which are rotations and reparameterizations. Thus, we are given two surfaces and and we seek to solve the following optimization problem:

(19)

The optimization problem of Eqn. (19) is a complex problem that can be solved by iterating between the optimization over rotations , i.e. finding the optimal rotation, given known correspondences (i.e. fixed ), and then finding the optimal correspondences or reparameterization given fixed rotation .

If correspondences are given, finding the best that aligns onto is straightforward. Under the metric, the best rotation can be analytically found using Singular Value Decomposition (SVD) leading to the Procrustes analysis of shapes [65], which forms the basis of 2D Active Shape Models (ASM) [13] and 3D morphable models [16]. Similarly, if one is using the partial elastic metric of Jermyn et al. [24] (Section 3.2.2), then one can find the optimal rotation in the space of SRNFs and apply it to the original surface since the SRNF map of is exactly .

In practice, correspondences are unknown and better results are obtained if one solves simultaneously for the best correspondences and for the optimal rigid alignment.

4.1.1 Landmark-based elastic registration

In this class of methods, 3D shapes are treated as sets of landmarks sampled from their surfaces. The problem of finding correspondences between two surfaces and is then reduced to the problem of matching landmarks across the surfaces using some optimality criteria. The sampling of landmarks can be either uniform or adaptive to cope with the varying surface complexity [66, 67, 68]. For instance, in the case of triangulated meshes, each vertex can be treated as a landmark. In the case of parameterized surfaces, , then one can sample points from the domain and use as landmarks.

A popular approach for putting in correspondence the landmarks is the Iterative Closest Point (ICP) algorithm, proposed in [69, 70], which iterates between two steps: (1) a matching step where correspondences are estimated either by nearest-neighbor search using point-to-point [69] or point-to-plane [70] distances, or using some local descriptors, and (2) an optimal alignment step using the estimated correspondences. Since its introduction, many variants of the ICP algorithm have been proposed. They aimed at improving various aspects, such as speed and quality of convergence, of the original algorithm, by, for example, combining multiple distance measures with some local descriptors [71]. Nevertheless, ICP algorithm and its variants provide good results when the poses of the shapes being aligned are close to each other. When the difference in pose is large, the search space for correspondences is very large. In theory, however, three pairs of corresponding points are sufficient to define a rigid transformation. This fact has been used in [72, 73, 74] to derive RANSAC-based solutions. Their complexity is, however, high, of order , where is the number of points being aligned. Aiger et al. [75] introduced the four point congruent sets (4PCS) algorithm, which reduced the complexity of RANSAC algorithms to . It has been later optimized to achieve a complexity of order  [76].

Another approach for finding correspondences is to characterize the shape’s geometry around the landmarks using some descriptors [77, 78, 79, 80, 81, 82, 83] and then match each landmark to its closest one in terms of a distance measure in the descriptor space [84, 85]. The main difficulty, however, is that in the vast majority of applications, one is required to put in correspondence two or more 3D objects that undergo non-rigid deformations that affect the geometry of the local regions. When restricting the analysis to surfaces that only undergo isometric deformations, i.e. bending, which is a subclass of non-rigid deformations, one can design methods or descriptors that only capture the intrinsic properties of shapes, i.e. the properties that remain unchanged after changes due to bending are factored out. Examples of such approaches include Generalized Multidimensional Scaling (GMDS) [86]

, spectral analysis based on Laplace-Beltrami eigenfunctions 

[87, 88], and Heat Kernel Signatures (HKS) [89, 90]. Several papers have attempted to relax the isometry constraint so that 3D objects can be put in correspondence even when they undergo stretching. Zhang et al. [91], for example, put two objects in correspondence by finding the optimal bending and stretching that is needed to align one shape onto another. In this work, bending is measured in terms of changes in the direction of the normal vectors while stretching is measured in terms of changes in edge lengths. Thus, the approach uses a metric that is similar to those presented in Section 3.

(a) Fixed parameterization. (b) Parameterization-invariant.
Fig. 3: A toy example showing the importance of the invariance to re-parameterization. (a) Correspondence under fixed parameterization. (b) Correspondence using a re-parameterization-invariant metric of [33].

4.1.2 Elastic registration as a reparameterization problem

Instead of using landmarks, some previous works tried to find the optimal diffeomorphism that puts in full correspondence with . Since the search space for optimal re-parameterizations is large, some of the previous works such as SPHARM [92, 93] and SPHARM-PDM [94, 95], used a fixed parameterization that is analogous to the arc-length parameterization on curves. Others restricted the type of re-parameterizations to a specific subset of the space of diffeomorphisms. Lipman et al. [96], for example, showed that the optimal correspondence between nearly-isometric genus-0 surfaces, conformally embedded onto a sphere, can be found by searching for the best Mbius transform that aligns them. The approach is, however, restricted to nearly isometric surfaces. In addition, it only finds a sparse set of correspondences since complex surfaces cannot be conformally embedded onto a sphere with minimal distortion. Zeng et al. [97] extended Lipman et al.’s approach and finds a dense matching in the presence of large elastic deformations. Kim et al. [98] showed that by blending multiple maps, each one is a Mbius transform, one can match surfaces under relatively large elastic deformations. Nevertheless, the correspondences with these methods are not one-to-one. The inter-surface maps they produce are not guaranteed to be a diffeomorphism. Thus they are not suitable for statistical shape analysis tasks. In addition, these methods are limited to surfaces that can be conformally parameterized onto a sphere. Although, in theory, any closed genus-0 surface can be conformally embedded into a sphere, in practice, however, the distortion that results from conformal embedding of complex surfaces that have elongated extruding parts, e.g. human or animal body shapes, is often very large. As a consequence, several important shape parts are being under-sampled and thus cannot be correctly matched across surfaces.

Using a fixed set of landmarks or a fixed parameterization of the surfaces being analyzed imposes constraints on the correspondences that can be found and often may lead to undesirable results especially in the presence of large elastic deformations. Fig. 3-(a) shows an example where fixed parameterization fails. Better results can be achieved by treating the boundaries of objects as continuous surfaces, rather than discretizing them into point sets at the outset, and solving directly Eqn. (19) for finding the optimal sampling (i.e. reparameterization ) that better matches features across shapes as proposed in [36, 37, 38, 40, 39, 24, 99], see Fig. 3-(b) for a 2D example. Continuous representations make it possible to elastically model and optimize correspondences. The process of matching two 3D shapes is, in fact, reduced to the problem of optimally re-parameterizing one shape so that it is as close as possible to the other. Thus, correspondence becomes a re-parameterization problem, which can be mathematically formulated using diffeomorphisms.

In order to do that, however, the action of on the space of surfaces should be by isometries, under the metric i.e.

(20)

Kurtek et al. [36, 37, 38, 40, 39, 24] showed that when using the metric in the space of surfaces , the condition of Eqn. (20) is only valid when is an area-preserving diffeomorphism. This is, however, too restrictive especially when dealing with elastic registration because elastic deformations correspond to changes in surface area.

Kurtek et al. [36] and Jermyn et al. [24] showed that in the space of Square Root Maps (SRM) and in the space of Square-Root Normal Fields (SRNF), respectively, the action of is by isometries under the metric. Given that the