1 Introduction
Point matching is the problem of finding pointbypoint correspondence between point sets where each set characterizes the geometry of an object. Finding such geometrical correspondences between point sets is studied in the fields of image recognition, computer vision, and, computer graphics. A major class of point matching problems is point set registration—the problem of finding a transformation between two point sets in addition to pointbypoint correspondences. Point set registration problems can be roughly classified into two classes according to the transformation models: rigid and nonrigid transformations. A rigid transformation model is defined as a linear map that preserves the relative positions of points in a point set, i.e., rotation and translation. The rigid point set registration problem is a relatively simple problem that has been intensively studied
[1, 2, 3, 4, 5]. Nonrigid registration is a more complex problem that involves transforming an object’s geometry. Typical transformation models used for point set registration problems are the thinplate spline functions [6, 7, 8, 9, 10] and Gaussian functions [11, 12, 13, 14]. These methods are differently classified according to the definitions of the point set registration problem: energy minimization [6, 7, 10]and probabilistic density estimation using a Gaussian mixture model
[8, 9, 11, 12, 13, 14].Crucial to the success of these registration methods is a robustness to outliers, points that are irrelevant to the true geometry of the object. Several approaches have been proposed to deal with outliers, such as statistical analysis for distances between correspondent points [15, 16, 17], soft assignments [7, 18], trimming point sets through iterative random sampling [19], kernel correlation [20], explicit probabilistic modeling of outliers [11, 12, 13, 14], and the use of robust estimators such as the estimator [8] and a scaled Geman–McClure estimator [21]. Another factor crucial to the success of registration methods is the assumption of a smooth displacement field, which forces neighboring points to move coherently. The smoothness of a displacement field, also known as the motion coherence, is imposed by a regularization techniques [6, 7, 10, 8, 9, 11, 12, 13, 14]. Owing to the assumption of a smooth displacement field, such nonrigid registration algorithms seek to find transformations with sufficient global flexibility while preserving the local topology of a point set.
These methods are universally applicable to general point set registration problems as no prior knowledge, except for that concerning the smoothness of a displacement field, is assumed for the geometry of objects to be registered. This means that registration performance can be improved by using prior knowledge of object geometry. One approach to incorporating such prior information is the use of a kinematic motion for articulated objects, such as the human body [22, 23, 24, 25]. These methods show promising results but cannot be applied to objects with no kinematics. Another approach is the use of partial correspondence across two point sets [26, 27]
. These methods also show promising results, but better performance is not expected if partial correspondence is not available. Yet another approach is the use of supervised learning techniques. One candidate of such supervised learning techniques is a statistical shape model
[28, 29] that describes the mean shape and statistical variation of geometrical objects. Shape variations represented by statistical shape models are constructed from shape statistics of landmark displacements. Therefore, movements of neighboring landmarks are not assumed to be correlated, and those of distant landmarks are allowed to be dependent on one another. The statistical shape models also do not require any physical models such as object kinematics.Statistical shape models have been widely used as prior shape information for various tasks, including image registration, image segmentation, and surface reconstruction, in the fields of computer graphics, computer vision, and medical imaging. A broad range of these works were reviewed in several survey papers [30, 31, 32, 33], and lecture notes [34, 35]. In the field of computer graphics and vision, statistical shape models have mainly been used for reconstruction or completion of smooth 3D surfaces [36, 37, 38, 39, 40], which can be interpreted in a wider sense as point set registration problems with extra information such as colors, surface normals, and the articulation of the human body shape. These works include applications related to the reconstruction of 3D human surfaces such as faces [36], body shapes and poses [37, 39], skins and muscles [38], and body shapes [40].
In the field of medical imaging, the point set registration problems with statistical shape models have often been developed in noisier settings, i.e., in the presence of outliers [41, 42, 43], and these methods were later extended to image registration [44] and image segmentation [45]. All these methods are designed on the basis of nonrigid registration techniques such as probabilistic density estimation with a Gaussian mixture model [41, 42], the one with a mixture model [43], and the minimization of an energy function [44]. Typically, these methods solve either or both of the two different problems. The first problem, which is sometimes called groupwise point set registration, involves finding pointbypoint correspondence among multiple unstructured point sets, aiming at learning statistical shape models [41, 42, 43, 44]. The second problem involves finding pointbypoint correspondence between a point set that forms the average shape in a pretrained statistical shape model and a new unstructured point set that forms an object shape [42, 44]. More recently, a method of 3D surface reconstruction, originating in these works, has been proposed with an efficient use of surface normals [46].
In this article, we focus on the second problem, motivated by the fact that (a) the scalability of the algorithms for the second problem is relatively limited, because their computational costs are proportional to the product of the numbers of the two point sets to be registered, and (b) their formulation did not explicitly include the rotational term and they might not be robust against a rotated target shape. As in [46], we assume that training shapes or pretrained statistical shape models are available as prior shape information. This assumption is somewhat strong but is not unrealistic, because training shapes with pointbypoint correspondence or pretrained shape models are often publicly available; for example, 2D human hands and faces [47], 3D human poses and shapes [37], and 3D human shapes [40]. In test cases, we do not assume the use of any extra information such as color, surface normals, the articulation in the body parts, or the partial correspondence between point sets, in order not to reduce the applicability of the method.
Contributions of this work
We propose a novel nonrigid point set registration algorithm based on a supervised learning approach that we call dependent landmark drift (DLD). In the context of statistical shape modeling, the proposed algorithm is a novel optimization algorithm for fitting a statistical shape model, where one input point set forms the mean shape and another forms a target shape. The proposed algorithm is designed as a Gaussian mixture model in which the transformation model is defined as a combination of the similarity transformation and the statistical shape model. Among the related algorithms, our approach is novel in that it simultaneously provides (1) fast computation, with computational costs proportional to the sum of the numbers of two point sets to be registered, (2) simultaneous optimization of scale, translation, rotation, and shape deformation, and (3) adaptive control of the smoothness of a displacement field. In addition, our algorithm retains the merits of the registration algorithms based on the Gaussian mixture model, i.e., the automatic radius control and robustness against outliers, originating in the explicit probabilistic modeling of outliers. The effectiveness of the proposed algorithm was tested through comparisons with supervised and unsupervised point set registration algorithms.
Related works
In computer graphics, shape matching algorithms, originating in point set registration algorithms, have been actively developed. The most common method for matching two shapes in computer graphics is the iterative closest point algorithm (ICP) [35]. The ICP solves the registration problems by iterating the twostep procedure: (1) to find the point in the target surface that is closest to each point in the current deformed surface and (2) to update the deformed surface using the current pairs of closest points. In this field, the ICP has been customized to solve shape matching problems specific to the types of the object such as a human face [48, 49], hand shape and pose [50, 51], body shape and pose [52, 53, 37, 54, 55], and dynamic human shape in motion [56]. The shapes registered by these algorithms can be used for building blend shapes and shape models, and they have been successfully applied to videoretouching [39] and the automatic synthesis of 3D human animation [54, 57, 56].
The main difference between shape matching algorithms in computer graphics and the general point set registration algorithms is in the design of the objective function that measures a dissimilarity between two shapes. The objective functions of general point set registration problems are typically based on the distance between points and the smoothness of a displacement field. On the contrary, the objective functions of the shape matching algorithms is designed as a combination of the distance between points and various information such as the userdefined marker errors and the smoothness of the deformed surface [52, 54], constraints on pose and shape deformation [37, 50, 51], and consistency with additional information such as color and brightness [55, 49].
The registration algorithms that are not domainspecific have also been developed in computer graphics. Mitra et al. proposed a rigid alignment algorithm with the objective function that efficiently approximates the pointtosurface distance using the d2Tree [58]. Li et al. proposed a nonrigid registration method that incorporates the rigid transformation, local affine transformation, and the spatial coherence of the deformation [59]. Gao et al. proposed a representation of a shape, called rotationinvariant local mesh differences (RIMD) to remove the rotational effect on the dissimilarity measure between two shapes [60]. The global optimization method that solves nonrigid registration problems using a convex optimization technique was reported by [61].
2 Methods
The goal of the point set registration is to find the map that transforms the geometric shape represented as a point set and matches the target shape represented as the other point set . The set of for any is called a displacement field as it defines the displacement of each point in . Generally, the point set registration problem is defined as a minimization problem as follows:
(1) 
where
is a loss function that measures the dissimilarity between two geometric shapes,
is a functional that measures the complexity of a displacement field, and is a parameter that balances the similarity of the shapes and the smoothness of the displacement field. In this study, we use a statistical shape model (SSM) with the similarity transformation as the transformation model and use the negative loglikelihood of a Gaussian mixture model (GMM) as the loss function. To clarify the merits in the choice, we review the SSM based on the principal component analysis
[29] and then introduce a GMM as a technique of the point set registration [11, 12]. We then propose a novel registration method, and finally, discuss the reduction in its computational cost. The list of symbols used throughout this paper is available in Appendix.2.1 Statistical shape model
We begin with definitions of a landmark and a shape, required to define a statistical shape model. To obtain shape statistics from multiple geometric objects, it is common to define correspondent points across them. These points of correspondence are called landmarks. A shape is typically defined as a set of landmarks for one of the geometric objects with scale, rotation, and translation effects removed [47, 62]. Note that a point set and a shape are distinguished from each other in that (1) shapes are composed of the same number of points whereas the number of points in a point set is generally different from those in other point sets; and (2) points in shapes are correspondent across all shapes, whereas points in multiple point sets are not correspondent. Statistical shape models are constructed from training shapes, i.e., multiple point sets with pointbypoint correspondence.
Definition of a statistical shape model
The SSM is a representation of a geometrical shape and its statistical variations in an object. Because definitions of SSMs diverge according to the aim of the application or the method of construction [30], we introduce a definition based on the principal component analysis (PCA) [28, 29, 47] that can adequately describe the proposed algorithm. Suppose a shape is composed of landmarks , each of which lies in a
dimensional space. Then, the shape is represented as a vector
. The PCAbased statistical shape model is defined as follows:(2) 
where is the mean shape, is the th leading shape variation, is the th weight corresponding to the th shape variation, is the number of shape variations, and is a residual vector. To reduce information sharing in shape variations to the maximum extent, are usually assumed to satisfy the orthonormality condition where is Kronecker’s delta.
Estimation of shape variations from training data
Mean shape and shape variations are unknown, and should be estimated from multiple shapes , i.e., training data. The mean shape is simply estimated as the average of sample shapes. Suppose is a shape covariance matrix defined as
(3) 
where is the sample average of shapes . The shape covariance matrix represents statistical dependencies for landmark displacements. The th shape variation can be estimated as the
th eigenvector of
corresponding to theth largest eigenvalue. We note that the shape covariance matrix is dependent on rotations of shapes
. On the contrary, coherent moves of landmarks caused by the rotation of a whole shape cannot be represented as their covariance if the rotation angle is relatively large. Therefore, effects of rotations should be eliminated from the training shapes in computing the shape covariance matrix .2.2 Gaussian mixture model for point set registration
We summarize the Gaussian mixture modeling approach for solving point set registration problems proposed by Myronenko et al. [12] as this approach is the basis for the proposed algorithm. They defined a registration problem of two point sets as a problem of probabilistic density estimation, where one point set is composed of centroids for a GMM, and the other consists of samples generated from the GMM. Here, we refer to a point set to be deformed as a source point set and the other point set that remains fixed as a target point set. We also refer to a point that is irrelevant to the true object geometry as an outlier and one that represents the true object geometry as an inlier.
Definition of the model
Let and be the th point in a target point set and the th point in a source point set , respectively. The probabilistic model for registering the two point sets and is designed as a mixture model for generating target point
in the fourstep procedure: (1) a label is selected as an outlier or an inlier based on the Bernoulli distribution with outlier probability
, (2) if the label is an inlier, a point is selected from the source point set with equal probability , (3) the selected point is moved by the transformation model , and (4) target pointis generated by the Gaussian distribution whose center is the moved point. More formally, the mixture model is defined as follows:
(4) 
where is a set of parameters of the mixture model, and is a distribution of outliers. The prior distribution of inliers is defined as . The inlier distribution for is defined as a Gaussian distribution
(5) 
where
is the variance of the Gaussian distribution,
is a transformation model for source point with a set of parameters , and is a set of parameters of the GMM.Problem definition as a probabilistic density estimation
Under the model construction, the problem is to find the best transformation that matches the target point set . Because the parameter characterizes the transformation , the solution of the point set registration problem is obtained by finding the best parameter . More formally, the point set registration problem is defined as a probabilistic density estimation as follows:
(6) 
where , and is the regularizer in the motion coherence theory [63]. We note that this formulation is a special case of the general definition of the point set registration problems, defined as Eq. (1), as the negative loglikelihood function is a type of loss function that measures the dissimilarity between geometric shapes represented as point sets.
Optimization by the EM algorithm
Because the analytic solution for Eq. (6) is not available, the EM algorithm is used to search for a local minimum of the function. The EM algorithm iteratively improves a solution by updating the upper bound of the negative loglikelihood function, called the function. Given a current estimate , the function for the GMM is derived as
(7) 
where is the effective number of matching points, and
denotes the posterior probability that source point
is correspondent to target point under the current estimate . The posterior probability can be calculated as(8) 
Based on the theory of the EM algorithm, a solution of the point set registration problem is obtained by iterating the following procedure: (i) updating the posterior probability , (ii) finding the that minimizes the function for , given the parameter set , and (iii) replacing the given parameter set with the minimizer of the function. This procedure is iterated until a suitable convergence criterion is satisfied.
Relation to ICPs and the automatic radius control
There is a close relationship between the methods based on the GMM and iterative closest point algorithms (ICPs), which are more common in computer graphics. Let be an index set composed of the neighbors of in the target point set . The ICPs for point set registration problems can be defined as follows:
(9) 
where is the matching weight between point and . The loss functions of ICPs and the GMM are the sum of the squared distance weighted by and , respectively. Therefore, the GMM and ICPs are the same in that they are included in the general class of the point set registration problem, defined as Eq. (1), with the same type of the loss functions.
On the contrary, there exists a difference in the definitions of the registration problems that yields the difference in the optimization trajectories and resulting deformed shapes: the residual variance . It can be intuitively interpreted as the parameter that defines the set of neighbors since and the corresponding loss term in the function become nearly zero if the distance between and is sufficiently large e.g., . Therefore, the sequential update of based on the EM algorithm dynamically changes the set of neighbors with a mutable radius during the optimization. For the ICPs, the set of neighbors is typically set to the closest point to the target surface [35]. We, therefore, call the characteristic that yields the difference between the GMM and ICPs the automatic radius control.
2.3 Our approach
We propose a novel algorithm called dependent landmark drift (DLD) for registering the mean shape and a novel point set. The algorithm uses the same GMM framework as the CPD algorithm. The merits of using a GMM, such as the explicit modeling of outliers and the automatic radius control, are thereby inherited by the proposed algorithm. The main difference between the CPD and the DLD is in the definition of transformation models for a source point set: the transformation model of the CPD is based on motion coherence, i.e., moving points under a smooth displacement field, whereas that of the DLD is a combination of a statistical shape model, a similarity transformation, and motion coherence.
Statistical shape model as a transformation model
We first describe that statistical shape models can be utilized as a transformation model for GMMbased point set registration. Suppose is the matrix notation of shape variations. We also suppose is the submatrix of that corresponds to the th landmark in the mean shape . Then, the statistical shape model (2) is denoted by a pointbypoint transformation model
(10) 
where is the th point in shape , is a weight vector for shape variations, and is a subvector of the residual vector corresponding to landmark . Therefore, the statistical shape model can be used as a transformation model of a point set registration. A merit of using statistical shape models for point set registration problems is that moves of landmarks are estimated based on the statistical dependency of landmark displacements. We therefore call our algorithm dependent landmark drift.
Transformation model
We incorporate prior shape information into the transformation model. We suppose that the training phase has already completed, that is, the mean shape and shape variations , also denoted by , have been calculated from training shapes. We then define the transformation model as a combination of the similarity transformation and the statistical shape model, encoding prior shape information as follows:
(11) 
where is a scale parameter, is a rotation matrix, and is a translation vector. We incorporated the similarity transformation into the transformation model because the effects of scale, translation, and rotation are typically eliminated from the statistical shape model during training. We refer to the triplet as location parameters and as shape parameters. We note that the transformation model is different from [42, 44] in that our model combines the similarity transformation and the nonrigid transformation with a PCA model, whereas their transformation model is defined as the PCA model only.
Definition of function
We use the Gaussian mixture model, defined as Eqs. (4) and (5), as the basis of solving the point set registration problems. We suppose is the th largest eigenvalue of the covariance matrix computed from training data and is the diagonal matrix defined as , where denotes the operation of converting a vector into a diagonal matrix. Replacing the transformation model in Eq. (7) by , we define the function of the point set registration problem as
(12) 
where , , and . We used the Tikonov regularizer as a regularization term to avoid searching for extreme shapes, where is a parameter that controls the search space of . We note that this regularization is interpreted as a smooth displacement field because the resulting transformed shape becomes increasingly similar to the mean shape as increases, i.e., the neighboring points move coherently. The prior shape information in our approach is therefore a combination of the statistical shape model, the similarity transformation, and motion coherence.
Outlier distribution and matching probabilities
We note that the outlier distribution included in the mixture model defined as Eq. (4) is an arbitrary choice according to applications. For example, can be removed from the mixture model if no outliers exist in the target point set. To register point sets with outliers, we use a
dimensional uniform distribution defined as
(13) 
where is the volume of the region in which outliers can be generated. The volume
is of course unknown and we therefore use the minimumvariance unbiased estimator of the
dimensional uniform distribution as an estimate of the volume , which is estimated from the target point set .In this setting of the outlier distribution, the posterior matching probability , which is defined as Eq. (8), is reformulated as
(14) 
where and . We also note that the computation of the matching probabilities can be prohibitive if and are relatively large. The acceleration of this bottleneck computation will be discussed in Section 2.4.
Optimization
Based on the theory of the EM algorithm, a solution to the point set registration problem is obtained by iterating the following procedure: (i) updating the posterior probability , (ii) finding the that minimizes the function for given the current estimate , and (iii) replacing the given parameter set with the minimizer of the function. Because the analytic solution is unavailable, we divide the optimization procedure (ii) into three steps:
(a)  Optimization of shape parameter and translation vector given , 

(b)  Optimization of location parameter given , 
(c)  Optimization of residual variance given . 
For each step, it is possible to find the exact minimizer of the function given corresponding fixed parameters. For steps (a) and (c), the exact minimizers are obtained by taking partial derivatives of , and equating them to zero. For step (b), must be optimized under the orthonormality condition as is a rotation matrix. This constrained optimization problem can be analytically solved by using the result reported in [64, 65].
The mean shape and shape variations are updated in step (b) after updating the scale parameter and the rotation matrix because and are dependent on the coordinate system in which the statistical shape model is defined. Translation vector is updated in both (a) and (b) of the Mstep to find a better approximation of the exact minimizer
in a heuristic manner. The residual variance
can be estimated in the same manner as in the general case of the GMMbased registration, and thereby, the automatic radius control in the GMM is retained.Adaptive control of the smoothness of a displacement field
The choice of parameter , defined in Eq. (7), is crucial to performing accurate registration because controls the smoothness of a displacement field. If is excessively large, the shape deformation is less affected by the target point set and tends to be more conservative. On the contrary, if is excessively small, the registration often fails as the displacement field becomes less smooth and the optimization tends to be stuck in local minima. To relax the difficulty in choosing an appropriate , we adopt an optional search strategy based on the adaptive control of ; we initially set to a moderately large value, and if the optimization process approaches convergence, we set it to zero or a value close to zero. The rationale of this approach is to register two point sets in a hierarchical manner; the average shape is deformed globally and conservatively in the early stage of the optimization, and is deformed more locally and flexibly near the convergence.
2.4 Linear time algorithm
We show that the computational cost of one step of the EM optimization described in the previous subsection can be reduced to linear, i.e., . The bottleneck of the EM optimization is the computation of matching probabilities for all and ; the naïve computation requires computation. The key to reducing the computational cost is to use the Nyström method [66], which avoids the direct evaluation of all the matching probabilities. The resulting linear time algorithm that we propose is shown in Figure 1.
Key observations for reducing the computational cost
Here, we summarize the idea for reducing the computational cost of the EM optimization. Suppose
is a Gaussian affinity matrix, the
th element of which is defined as . Key observations for reducing the computational cost are listed as follows:
All computations related to matching probabilities in the Mstep of the EM algorithm can be expressed as one of the products , , and .

The computations of , , and can be replaced by the products related to .

The matrixvector products related to can be calculated in time by using the Nyström method.

All matrices of size and , e.g., , , , and , are diagonal, and their products can be computed in time.
The first observation is obtained by solving the optimization subproblems for the Mstep and factoring out , , and from each term of their solutions, as shown in the Algorithm 1. The fourth observation is easily obtained by avoiding the evaluation of offdiagonal elements in the diagonal matrices. The number of nonzero elements of each diagonal matrix is at most , and thereby, the computational cost of evaluating the products related to the diagonal matrices is . We next see how the second and third observations are obtained.
Computation via the Gaussian affinity matrix
The second observation is guaranteed by [12]. In the same manner as the CPD, the products , , and can be replaced by the matrixvector products related to the Gaussian affinity matrix as the definitions of the products , , and are exactly the same as those of the CPD except for the definition of the outlier distribution. Suppose is the vector defined as
where “” denotes elementwise division and is a constant. Then, the matrixvector products , , and can be reformulated by using and as follows:
(15) 
where denotes the operation for converting a vector into a diagonal matrix. That is, all the products in the bottleneck computations , , and can be computed through the products related to . We note that the matrixmatrix products can be expressed as the sum of the matrixvector products related to as follows:
where “” denotes elementwise product and is the th column of . Therefore, if the matrixvector products and for arbitrary vectors and are computed in time, all the bottleneck computations , , and can also be computed in time.
Nystöm approximation for reducing the computational cost
We then proceed to the third observation. The matrixvector products and can be approximated in time by using the Nysröm method [66], which is a technique of generating a lowrank approximation of a Gram matrix without its eigendecomposition in a computationally efficient manner. The Gaussian affinity matrix for all points in the combined point set is a Gram matrix, and thereby, the Nyström method is applicable to the approximation of its submatrix . Suppose is the set of points that is randomly chosen times from the combined point set without replacement. Based on the theory of the Nyström method, the Gaussian affinity matrix can be approximated as follows:
(16) 
where , , and are the Gaussian affinity matrices between and , for the subset itself, and between and , respectively. Therefore, if the number of sampled points is much less than and , the approximation of the products or can be efficiently calculated by evaluating matrixvector products on the righthand side of the following equations
(17) 
from right to left. The computational costs of evaluating the righthand side equations are ; i.e., if we assume that and are constants. One issue of using the Nyström method is the tradeoff between speed and accuracy; setting to a small value leads to fast but less accurate computations. Therefore, if the residual variance is sufficiently small, we directly evaluate Eqs. (15), reducing the computational cost by ignoring the points that are sufficiently distant from the mean point of a Gaussian function. The points that lie within a small distance from the mean point can be efficiently found by using the KD tree search.
Total computational cost of the EM algorithm
The computations of the remaining terms in Algorithm 1 can be evaluated in time. The product in Algorithm 1 is computed in time because . Furthermore, the initialization step requires computations because can be computed by evaluating matrixvector products of the righthand side equation from left to right. Therefore, the total computational cost of the initialization and one iteration of the EM algorithm is proportional to .
We also note that the Nysröm method is typically faster than the improved fast Gauss transform [67], a more conventional technique for reducing the computational cost of evaluating the matching probabilities. The IFGT is the method for accelerating the sum of Gaussian functions by using the truncated Taylor series expansion. An issue with the IFGT is that its computational cost depends on the radius of the Gaussian functions; the modified IFGT algorithms that are practically faster than the original version require direct computation, i.e., computation, in a worstcase scenario [68, 69]. If we use the IFGT as an alternative method for speeding up the Estep, direct computation cannot be avoided because the radius, which corresponds to residual variance in our method, gradually decreases as the optimization proceeds. On the contrary, the computational cost of the Nyström method is unaffected by the radius of the Gaussian function, and is always during the optimization.
3 Experiments
In this section, we report the registration performance and the computational time of the proposed algorithm. The registration performance will be tested through experimental comparisons with supervised learning approaches for 3D examples and unsupervised learning approaches for 2D examples. The supervised learning approaches we compared were GMMASM
[42] and an ICP method motivated by [52, 21, 50], whereas the unsupervised ones were CPD [12] and TPSRPM [6] for which the authors of the papers distributed highquality implementations. The time taken for computing pairwise matching probabilities for the two approximation methods, the Nyström method [66] and the IFGT [67, 69] will be compared with the direct computations.3.1 Datasets and training statistical shape models
We used and the FLAME dataset [49] and the SCAPE dataset [37] as 3D examples and the IMM hand and face datasets [17] as 2D examples. The SCAPE dataset includes 71 human body shapes with different body postures, each of which is composed of 12,500 points. The ground truth data for pointbypoint correspondence were provided for all 71 point sets. The FLAME dataset constitutes timelapse 3D face models with various facial expressions for 10 people. The number of face models included in the dataset is 46,905, and each face model is composed of 5,023 points with pointbypoint correspondence. Among them, we randomly extracted 950 faces without considering the differences in facial expressions and human identities. For each face model, we used 3,929 points obtained by removing 1,094 points corresponding to eyeballs to visualize the point sets around the eyes clearly. For IMM datasets, each human hand and face is composed of 56 and 58 points. The points in the shapes were correspondent across all shapes for each dataset. The number of shapes in the hand dataset was 40, and the face dataset consisted of 240 shapes, i.e., six faces with different angles per person for 40 different people.
As mentioned in the previous section, the effects of scale, rotation, and translation are typically removed from the training data to learn the statistical shape model. Therefore, we preprocessed the shapes in the original datasets as follows: For each dataset, we first calculated the mean shape as the sample average of all shapes. We then constructed training data by computing shapes with the best rigid alignment with the mean shape. Note that, from Result 7.1 described in [62], it is possible to find the best rigid transformation to match two shapes, i.e., two point sets with pointbypoint correspondence. For the SCAPE and FLAME datasets, we repeated the computation process of the mean shape three times to improve the quality of the mean shape. Finally, for each body shape, we multiplied the scale factor to normalize the size of the mean shape for more interpretable experimental results. The scale factor was obtained by setting the volume of the cuboid that envelopes the mean shape to one. For the SCAPE and IMM datasets, the statistical shape model required for the registration was trained in a leaveoneout manner, i.e., one for a validation point set and the others for training shapes. For the FLAME dataset, we used 900 shapes as a training dataset and the remaining 50 shapes as a test dataset.
3.2 Application to the SCAPE dataset
Here, we report the applications to the SCAPE data set, which is composed of 71 human bodies with various body postures. The primary purpose of the applications is to compare (1) the registration accuracy of the DLD with that of the GMMASM [42], and (2) the CPU times of the DLD with or without the acceleration by the approximation techniques. Throughout the applications to the SCAPE dataset, we use the common parameters .
We first report the application to several human body shapes for the demonstration purpose. To demonstrate the robustness against missing regions, we generated a point set by removing 2,000 points from the legs and 2,000 points above the shoulder. The number of points used in the Nyström method was set to 500. Figure 2 shows the progress of the optimization for human body No. 1 with missing regions. In this figure, each optimization proceeds from the left to the right. The DLD algorithm succeeded in the registration for the body shape with missing regions, suggesting that it has robustness against missing regions. Figure 3 shows typical examples of success trials and a failure trial for different body postures. These examples suggested that our algorithm worked effectively for different body postures without any extra information such as colors, surface normal, and human body articulation. However, the examples also suggested that the registration error could not always be removed.
Performance comparison
We compared the registration accuracy of the DLD algorithm with that of the GMMASM [42]. By this comparison, we aim at quantifying the improvement in the registration performance, originating in two differences between the methods: (1) the simultaneous optimization of the shape and location parameters and (2) the adaptive control of the smoothness of the displacement field. To this aim, the comparison with the GMMASM is reasonable because the DLD algorithm is the same as their method except for the above two differences and the acceleration by the Nyström method.
To evaluate the robustness against the similarity transformation, we generated target point sets by adding combinations of the rotation and translation to each body shape. We changed the rotation angle from to at intervals of and changed the translation distance from to at intervals of . The maximum translation distance corresponds to roughly onefourth of the height for the mean shape. For the GMMASM, the mean shape was initially aligned to each body shape by using the rigid CPD [12] to minimize the disadvantage in the definition of the transformation model. Here, we used the IFGT with error bound as an alternative accelerating method for both methods to avoid the effect of randomness, originating in the random sampling scheme in the Nyström method. Figure 4 shows the number of successes for each combination of a rotation and a translation. A trial was defined as a success if the maximum distance between a point in a deformed shape and the corresponding point in a target shape was less than 0.15. In comparison with the GMMASM, the DLD was robust against both the rotation and translation; further, the effect of the translation was quite small. On the contrary, the GMMASM was less robust especially against the translation despite of the fact that the prealignment by the rigid CPD was conducted before fitting the SSM. This result shows that the two differences between these methods contributed to the improvement in the registration accuracy.
Computational time
We measured the CPU time of the proposed algorithm using the human body No. in the SCAPE dataset and the average shape calculated for the dataset. To evaluate the scalability of the DLD algorithm, we generated 10 pairs of point sets with the same numbers of points by randomly sampling 1250, 2500, , 12,500 points without replacement from the target point set and the average shape. We used a MacBook Pro (Retina 15inch, Early 2013, OS X El Capitan 10.11.6) with a 2.4 GHz Intel Core i7 and 16 GB RAM as our computational environment. We implemented the DLD algorithm in the C language, and used GCC 6.0 as a C compiler. We compared the direct method for computing the pairwise probability with two approximation methods: the IFGT [67, 69] and the Nystöm method [66]. The IFGT was implemented according to the technical paper [69], and the IFGT error bound was set to . The number of points used for the Nyström method was set to 500. We stopped the computation if the improvement of the loglikelihood during one iteration of the EM algorithm was below . Figure 5 shows the resulting CPU times for the three methods. For the point sets with 1,250 points, the CPU times for the Nyström method, the IFGT, and the direct computation were 5.19, 8.56, and 5.41 s, respectively. For the point sets with 12,500 points, i.e., the most severe condition, the CPU time for the Nyström method was 105.4 s, which was 12.2 times faster than the direct computation and 5.6 times faster than the IFGT. From the figure, we observe that the direct computation was overlinear. We also observe that the IFGT was slightly overlinear, suggesting that the direct computation included in the IFGT framework was used during the optimization. The computational time of the Nyström method was linear, which is consistent with the analysis of the computational cost described in the previous section.
3.3 Application to the FLAME dataset
We report the applications to the FLAME dataset, which constitutes timelapse 3D face models with various facial expressions. To evaluate the registration accuracy of the proposed method, we implemented an ICPbased method motivated by [52, 21, 50] as a competitor. The method we implemented as a competitor is summarized as follows: (1) the energy function is defined on the basis of the theory of a scaled GemanMcClure estimator, (2) the transformation model is the same as that of the proposed method, i.e., , and (3) the shape and location parameters are optimized by updating them and the matching weight alternatively. The energy function is formally defined as follows:
where is the matching weight between and , and is a parameter that controls the range within which residuals have a significant effect on the energy function. Figure 6 shows the apparent difference in the optimization trajectories for the DLD and the GemanMcClure estimator. For the DLD, the mean shape was shrunk and largely translated after one step of the optimization because of the automatic radius control. On the contrary, for the GemanMcClure estimator, the mean shape was not shrunk, and the translation distance was considerably small after one step of the optimization. The complete optimization trajectories are available in Supplementary Video 2.
To evaluate the robustness against the similarity transformation of target point sets, we added the combinations of the rotation and translation in a similar manner as the applications to the SCAPE dataset. We changed the rotation angle from to at intervals of and the translation distance from to at intervals of . The maximum translation distance roughly corresponds to the length of the average face. We used the parameters for the DLD and for the GemanMcClure estimator.
Figure 7 shows the result of the comparison. Here, a trial was defined as a success if the maximum distance between a point in a deformed shape and the correspondent point in the target shape was less than 0.05. The DLD was more robust against both the rotation and translation than the GemanMcClure estimator, and the success rate was nearly 100 % if the rotation angle was less than or equal to . We note that the success rate sharply dropped to 0 %. This was because the DLD tried to register point sets upsidedown when the rotation angle was greater than or equal to . The GemanMcClure estimator was also robust against the rotation and translation. The registration accuracy was, however, relatively less than the DLD.
3.4 Illustrative results for 2D datasets
True shape  

Target data  
Initial Shape  
DLD  
CPD  
TPSRPM 
To test the robustness of our approach, we generated four types of target point data for hand No. in the IMM hand data. The top row in Figure 8 shows the correct target shapes, i.e., hand No. in the IMM dataset, and the second row shows target point sets, which represent hand No. with four types of modifications: (a) replication of target points with dispersion, (b) deletion of target points, (c) addition of outliers, and (d) rotation of the whole shape. Here, we used hand No. selected from the original dataset as a validation point set instead of the preprocessed one for training a statistical shape model, to prevent the test problem from becoming too easy. The red points shown in figures of the third row are the initial hand shapes to be deformed, i.e., the mean shape of the hands, before optimization.
Choice of parameters
For all four target point sets, we fixed and , which are the parameters of the DLD, to and . On the contrary, we used the remaining parameter for (a), (b), and (d) and for (c) as outliers were included in (c), but not included in (a), (b), and (d). For CPD and TPSRPM, we used the software distributed by the authors of CPD and TPSRPM with their default parameters. For the DLD, we used 39 hand shapes as training data with target hand shape No. removed.
Results
The fourth, fifth, and sixth rows in Figure 8 show the results of registrations using the DLD, CPD, and TPSRPM algorithms, respectively. The DLD yielded the best registration for all data. For all hand shapes deformed by CPD and TPSRPM, images of all fingers except the thumb were rendered thinner than those of the true target shape, suggesting the weakness in assuming only a smooth displacement field as prior shape information: a point was forced to be displaced coherently with its neighboring points, even for a different finger. For data (a), all methods roughly recovered the true target shape, whereas thinner fingers were observed for the CPD and TPSRPM. For data (b), the DLD recovered the missing points correctly. On the contrary, the CPD and TPSRPM did not recover the missing points, and the thumb and all fingers were rendered shorter in their results than in the true target shapes. This suggests a weakness in the computation of the missing regions. This weakness originates from assuming only a smooth displacement field: source points moved much more coherently as the target points did not exist near them. For data (c), the DLD found the true target shape, whereas the deformed shapes generated by CPD and TPSRPM were close to the mean shape rather than the true target shape. For data (d), all methods roughly recovered the true target shape, whereas thinner fingers were observed in results for the CPD and TPSRPM. The DLD succeeded in recovering the true hand shape.
3.5 Comparison of registration performance in 2D cases
Target data  
Replication  
Deletion  
Outliers  
Rotation 
To evaluate the registration performance of the DLD more precisely, we compared the DLD, CPD, and TPSRPM using the same IMM hand dataset and the IMM face dataset under various conditions. As the true target shapes to be estimated, we used hand No. and face f shown in the first row of Figure 9. For the IMM hand dataset, we used hand shapes with hand No. removed from all shapes as a set of training data for the DLD, whereas 234 face shapes, with all six faces of human No. removed, were used as the other set of training data for the DLD.
Generating validation datasets
As in the experiment in the previous subsection, we generated four types of target data for hand No.
: (a) 20time replication of target points with dispersion, (b) random deletion of target points, (c) addition of outliers that follow uniform distributions, and (d) rotation of the whole shape. In generating the four types of target data, we changed the following data generation parameters: (a) standard deviation of the Gaussian distribution for replicating target points, (b) missing rate, (c) signaltonoise ratio, and (d) rotation angle. To repeat the experiments and reduce the influence of randomness on registration performance, we generated target point sets 20 times for (a), (b), and (c) in the same setting. For (d), a single set of target points was generated for each rotation angle, since there is no randomness in the case of rotation.
Definition of registration accuracy
The DLD, CPD, and TPSRPM were applied to each target data item, and the registration accuracy was calculated for each. For target data (a), (b), and (c), the average and standard errors in the registration accuracy were calculated. Registration accuracy was defined as the rate of correct matching between a true target shape and a deformed mean shape following registration. More formally, it was defined as follows:
Pointbypoint correspondence to compute registration accuracy was estimated using the nearestneighbor matching between the deformed mean shape and the true target shape, i.e., the shape before modifications such as deletion of points or addition of replicated points and outliers.
Choice of parameters
We used the default parameters for the CPD and TPSRPM implemented in their software. For the DLD, we fixed the number of shape variations to for the hand dataset and for the face dataset. We set the parameter of the DLD to for (a), (b), and (d), whereas it was set to for (c) because the former datasets did not include outliers and the latter included outliers. We changed the parameter of the DLD to , , and without the adaptive control of to investigate the influence of on registration performance.
Results
The second row in Figure 9 shows the results of a comparison between target shapes and replicated target points. The axis represents the standard deviation of replicating target points whereas the axis shows registration accuracy. For the hand datasets, the registration performance of DLD was insensitive to the regularization parameter and the DLD outperformed the CPD and TPSRPM in almost all cases. For face datasets, the DLD with achieved the best registration performance, whereas the DLD with was less accurate than the CPD.
The results for target shapes with random missing points are shown in the third row of Figure 9. The axis represents the missing rate and the axis registration accuracy. The DLDs with and were the most stable for the target shapes, and they outperformed CPD and TPSRPM in nearly all cases. On the contrary, the DLD with was moderately less accurate than CPD for the hand dataset. This suggests that a that is too large may, in some cases, reduce registration accuracy because of the large bias originating in the prior shape information.
The fourth row in Figure 9 shows the results for target shapes with outliers following a 2D uniform distribution with and for the hand dataset and that with and for the face dataset. The axis of each figure represents the signaltonoise ratio and the axis represents registration accuracy. The DLD achieved the best registration performance for both datasets in nearly all cases, showing its robustness against outliers. When the signaltonoise ratio was , the DLD was less accurate than the CPD and TPSRPM. This suggests that an inappropriate choice of noise probability reduces registration accuracy.
The fifth row in Figure 9 shows the results for rotated target shapes. The axis and the axis represent the rotation angle and registration accuracy, respectively. For the hand dataset, the registration accuracy of TPSRPM was considerably stable and was unaffected by the rotation angles, at least in the range . For the face dataset, the DLD achieved the best performance in nearly all cases, suggesting its robustness to the rotation.
4 Conclusion
Many existing algorithms for solving point set registration problems employ the assumption of a smooth displacement field, that is, the local geometry of the point set is preserved. Because of this assumption, registration problems are elegantly solved by these algorithms in various cases. This means that the registration performance can be improved using prior shape information instead of sacrificing the applicability of the algorithms. In this paper, we proposed a novel point set registration algorithm based on a Gaussian mixture model. We used a PCAbased statistical shape model to encode prior shape information, which was combined with the assumption of a smooth displacement field. The proposed algorithm works effectively if the target point set is rotated as location parameters corresponding to a similarity transformation are simultaneously estimated in addition to shape deformation. Our algorithm is also scalable to large point sets because of the presence of a lineartime algorithm, which was verified through the application to the SCAPE dataset. To evaluate the registration performance of the proposed algorithm, we compared it with the supervised learning approaches, the GMMASM and an ICP with the GemanMcClure estimator using the SCAPE and FLAME datasets. We also compared it with the the unsupervised learning approaches, CPD and TPSRPM using the IMM hand dataset and the IMM face dataset with four types of modifications: replication of target points with dispersion, random deletion of target points, addition of outliers, and rotation of the entire shape. The proposed algorithm outperformed the GMMASM, an ICPbased method with the GemanMcClure estimator, CPD and TPSRPM in almost all cases, showing its effectiveness for various types of point set registration problems.
Conflict of Interest
None declared.
References
 [1] L. G. Brown, A survey of image registration techniques, ACM Computing Surveys 24 (4) (1992) 325–376.
 [2] P. Besl, N. McKay, A Method for Registration of 3D Shapes, IEEE Transactions on Pattern Analysis and Machine Intelligence 14 (2) (1992) 239–256.
 [3] S. Rusinkiewicz, M. Levoy, Efficient variants of the ICP algorithm, in: Proceedings Third International Conference on 3D Digital Imaging and Modeling, IEEE Comput. Soc, 2001, pp. 145–152.
 [4] A. W. Fitzgibbon, Robust registration of 2D and 3D point sets, Image and Vision Computing 21 (1314) (2003) 1145–1153.

[5]
A. Makadia, A. Patterson, K. Daniilidis,
Fully Automatic Registration
of 3D Point Clouds
, in: Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (CVPR’06), IEEE Comput. Soc, 2006, pp. 1297–1304.
 [6] H. Chui, A. Rangarajan, A feature registration framework using mixture models, in: Proceedings of IEEE Workshop on Mathematical Methods in Biomedical Image Analysis (MMBIA’00), IEEE Comput. Soc, 2000, pp. 190–197.
 [7] H. Chui, A. Rangarajan, A new point matching algorithm for nonrigid registration, Computer Vision and Image Understanding 89 (23) (2003) 114–141.
 [8] B. Jian, B. C. Vemuri, Robust point set registration using Gaussian mixture models, IEEE Transactions on Pattern Analysis and Machine Intelligence 33 (8) (2011) 1633–1645.
 [9] J. Chen, J. Ma, C. Yang, L. Ma, S. Zheng, Nonrigid point set registration via coherent spatial mapping, Signal Processing 106 (2015) 62–72.
 [10] Y. Yang, S. H. Ong, K. W. C. Foong, A robust global and local mixture distance based nonrigid point set registration, Pattern Recognition 48 (1) (2015) 156–173.
 [11] A. Myronenko, X. Song, M. Á. CarreiraPerpiñán, Nonrigid point set registration: Coherent Point Drift, in: Proceedings of Advances in Neural Information Processing Systems 19 (NIPS’06), MIT press, 2006, pp. 1009–1016.
 [12] A. Myronenko, X. Song, Point set registration: Coherent point drift, IEEE Transactions on Pattern Analysis and Machine Intelligence 32 (12) (2010) 2262–2275.
 [13] J. Ma, W. Qiu, J. Zhao, Y. Ma, A. L. Yuille, Z. Tu, Robust L2E estimation of transformation for nonrigid registration, IEEE Transactions on Signal Processing 63 (5) (2015) 1115–1129.
 [14] J. Ma, J. Zhao, A. L. Yuille, NonRigid Point Set Registration by Preserving Global and Local Structures, IEEE Transactions on Image Processing 25 (1) (2016) 53–64.
 [15] Z. Zhang, Iterative point matching for registration of freeform curves and surfaces, International Journal of Computer Vision 13 (2) (1994) 119–152.
 [16] S. Granger, X. Pennec, Multiscale EMICP: A Fast and Robust Approach for Surface Registration, in: Proceedings of the 4th European Conference on Computer Vision (ECCV’02), SpringerVerlag, 2002, pp. 418–432.
 [17] C. Stewart, ChiaLing Tsai, B. Roysam, The dualbootstrap iterative closest point algorithm with application to retinal image registration, IEEE Transactions on Medical Imaging 22 (11) (2003) 1379–1394.
 [18] A. Rangarajan, H. C. I, F. L. Bookstein, The Softassign Procrustes Matching Algorithm, Lecture Notes in Computer Science 1230 (1997) 29–42.
 [19] D. Chetverikov, D. Stepanov, P. Krsek, Robust Euclidean alignment of 3D point sets: The trimmed iterative closest point algorithm, Image and Vision Computing 23 (3) (2005) 299–309.
 [20] Y. Tsin, T. Kanade, A correlationbased approach to robust point set registration, in: Proceedings of the 6th European Conference on Computer Vision (ECCV’04), SpringerVerlag, 2004, pp. 558–569.
 [21] Q.Y. Zhou, J. Park, V. Koltun, Fast global registration, in: Proceedings of the 14th European Conference on Computer Vision (ECCV’16), Lecture Notes in Computer Science, Springer International Publishing, Cham, 2016.
 [22] D. Mateus, R. Horaud, D. Knossow, F. Cuzzolin, E. Boyer, Articulated shape matching using Laplacian eigenfunctions and unsupervised point registration, in: Proceedings of IEEE Conference on Computer Vision and Pattern Recognition 2008 (CVPR’08), IEEE Comput. Soc, 2008, pp. 1–8.
 [23] R. Horaud, F. Forbes, M. Yguel, G. Dewaele, J. Zhang, Rigid and articulated point registration with expectation conditional maximization, IEEE Transactions on Pattern Analysis and Machine Intelligence 33 (3) (2011) 587–602.
 [24] S. Ge, G. Fan, Nonrigid articulated point set registration with Local Structure Preservation, in: Proceedings of IEEE Conference on Computer Vision and Pattern Recognition Workshops 2015 (CVPR’15), IEEE Comput. Soc, 2015, pp. 126–133.
 [25] S. Ge, G. Fan, Nonrigid Articulated Point Set Registration for Human Pose Estimation, in: 2015 IEEE Winter Conference on Applications of Computer Vision, IEEE, 2015, pp. 94–101.
 [26] I. Kolesov, J. Lee, G. Sharp, P. Vela, A. Tannenbaum, A Stochastic Approach to Diffeomorphic Point Set Registration with Landmark Constraints, IEEE Transactions on Pattern Analysis and Machine Intelligence 38 (2) (2016) 238–251.
 [27] V. Golyanik, B. Taetz, G. Reis, D. Stricker, Extended coherent point drift algorithm with correspondence priors and optimal subsampling, in: Proceedings of IEEE Winter Conference on Applications of Computer Vision (WACV’16), IEEE, 2016, pp. 1–9.
 [28] T. Cootes, C. Taylor, D. Cooper, J. Graham, Active Shape ModelsTheir Training and Application, Computer Vision and Image Understanding 61 (1) (1995) 38–59.
 [29] T. F. Cootes, C. Taylor, Statitical models of appearance for computer vision, Technical report in University of Manchester (2004) 1–124.
 [30] T. Heimann, H. P. Meinzer, Statistical shape models for 3D medical image segmentation: A review, Medical Image Analysis 13 (4) (2009) 543–563.
 [31] F. P. M. Oliveira, J. M. R. S. Tavares, Medical image registration: a review, Computer Methods in Biomechanics and Biomedical Engineering 17 (2) (2012) 73–93.
 [32] A. Sotiras, C. Davatzikos, N. Paragios, Deformable Medical Image Registration: A Survey, IEEE Transactions on Medical Imaging 32 (7) (2013) 1153–1190.
 [33] M. Berger, P. Alliez, A. Tagliasacchi, L. M. Seversky, C. T. Silva, J. a. Levine, A. Sharf, State of the Art in Surface Reconstruction from Point Clouds, Proceedings of the Eurographics 2014, Eurographics STARs (2014) 161–185.
 [34] S. Bouaziz, M. Pauly, Dynamic 2D/3D registration for the Kinect, ACM SIGGRAPH 2013 Courses on  SIGGRAPH ’13 (2013) 1–14.
 [35] S. Bouaziz, A. Tagliasacchi, M. Pauly, H. Li, Modern techniques and applications for realtime nonrigid registration, SIGGRAPH ASIA 2016 Courses on  SA ’16 (2016) 1–25.
 [36] V. Blanz, T. Vetter, A morphable model for the synthesis of 3D faces, in: Proceedings of the 26th annual conference on Computer graphics and interactive techniques  SIGGRAPH ’99, ACM Press, New York, New York, USA, 1999, pp. 187–194.
 [37] D. Anguelov, P. Srinivasan, D. Koller, S. Thrun, J. Rodgers, J. Davis, SCAPE: Shape Completion and Animation of People, in: ACM SIGGRAPH 2005 Papers on  SIGGRAPH ’05, ACM Press, New York, New York, USA, 2005, pp. 408–416.
 [38] S. I. Park, J. K. Hodgins, Datadriven modeling of skin and muscle deformation, ACM Transactions on Graphics 27 (3) (2008).
 [39] A. Jain, T. Thormahlen, H.P. Seidel, C. Theobalt, MovieReshape: Tracking and Reshaping of Humans in Videos, ACM Transacons on Graphics 29 (5) (2010) No. 148.
 [40] Y. Yang, Y. Yu, Y. Zhou, S. Du, J. Davis, R. Yang, Semantic Parametric Reshaping of Human Body Models, in: 2014 2nd International Conference on 3D Vision, IEEE, 2014, pp. 41–48.

[41]
H. Hufnagel, X. Pennec, J. Ehrhardt, N. Ayache, H. Handels, Generation of a statistical shape model with probabilistic point correspondences and the expectation maximization iterative closest point algorithm, International Journal of Computer Assisted Radiology and Surgery 2 (5) (2008) 265–273.
 [42] A. Rasoulian, R. Rohling, P. Abolmaesumi, Groupwise registration of point sets for statistical shape models, IEEE Transactions on Medical Imaging 31 (11) (2012) 2025–2034.
 [43] N. Ravikumar, A. Gooya, A. F. Frangi, Z. A. Taylor, Robust groupwise rigid registration of point sets using tmixture model 9784 (2016) 1–14.
 [44] Y. Wang, J. Z. Cheng, D. Ni, M. Lin, J. Qin, X. Luo, M. Xu, X. Xie, P. A. Heng, Towards personalized statistical deformable model and hybrid point matching for robust MRTRUS registration, IEEE Transactions on Medical Imaging 35 (2) (2016) 589–604.
 [45] J. Kruger, J. Ehrhardt, H. Handels, Probabilistic appearance models for segmentation and classification, Proceedings of the IEEE International Conference on Computer Vision 1118Dece (2016) 1698–1706.
 [46] F. Bernard, L. Salamanca, J. Thunberg, A. Tack, D. Jentsch, H. Lamecker, S. Zachow, F. Hertel, J. Goncalves, P. Gemmar, Shapeaware surface reconstruction from sparse 3D pointclouds, Medical Image Analysis 38 (2017) 77–89.
 [47] M. B. Stegmann, D. D. Gomez, A brief introduction to statistical shape analysis (2002).
 [48] D. Cosker, E. Krumhuber, A. Hilton, A FACS Valid 3D Dynamic Action Unit Database with Applications to 3D Dynamic Morphable Facial Modellling, IEEE International Conference on Computer Vision (ICCV).
 [49] T. Li, T. Bolkart, M. J. Black, H. Li, J. Romero, Learning a model of facial shape and expression from 4D scans, ACM Transactions on Graphics 36 (6) (2017) 1–17.

[50]
D. A. Hirshberg, M. Loper, E. Rachlin, M. J. Black, Coregistration: Simultaneous alignment and modeling of articulated 3D shape, Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) 7577 LNCS (PART 6) (2012) 242–255.
 [51] J. Romero, D. Tzionas, M. J. Black, Embodied Hands: Modeling and Capturing Hands and Bodies Together, ACM Transactions on Graphics 36 (6) (2017) 1–17.
 [52] B. Allen, B. Curless, Z. Popović, The space of human body shapes: reconstruction and parameterization from range scans, ACM Transactions on Graphics 22 (3) (2003) 587.
 [53] V. Kraevoy, A. Sheffer, TemplateBased Mesh Completion, Eurographics Symposium on Geometry Processing (2005) 13–22doi:10.2312/SGP/SGP05/013022.
 [54] N. Hasler, C. Stoll, A statistical model of human pose and body shape, Computer Graphics … 28 (2) (2009) 2009.
 [55] F. Bogo, J. Romero, M. Loper, M. J. Black, FAUST: Dataset and Evaluation for 3D Mesh Registration, in: 2014 IEEE Conference on Computer Vision and Pattern Recognition, IEEE, 2014, pp. 3794–3801.
 [56] G. PonsMoll, J. Romero, N. Mahmood, M. J. Black, Dyna: A Model of Dynamic Human Shape in Motion, ACM Transactions on Graphics 34 (4) (2015) 120:1–120:14.
 [57] M. Loper, N. Mahmood, J. Romero, G. Ponsmoll, M. J. Black, SMPL : A Skinned MultiPerson Linear Model, ACM Trans. Graphics (Proc. SIGGRAPH Asia) 34 (6) (2015) 248:1—248:16.
 [58] N. J. Mitra, N. Gelfand, H. Pottmann, L. Guibas, Registration of point cloud data from a geometric optimization perspective, Proceedings of the 2004 Eurographics/ACM SIGGRAPH symposium on Geometry processing  SGP ’04 (2004) 22.
 [59] H. Li, R. W. Sumner, M. Pauly, Global correspondence optimization for nonrigid registration of depth scans, Eurographics Symposium on Geometry Processing 27 (5) (2008) 1421–1430.
 [60] L. Gao, Y.K. Lai, D. Liang, S.Y. Chen, S. Xia, Efficient and Flexible Deformation Representation for DataDriven Surface Modeling, ACM Transactions on Graphics 35 (5) (2016) 1–17.
 [61] H. Maron, N. Dym, I. Kezurer, S. Kovalsky, Y. Lipman, Point registration via efficient convex relaxation, ACM Transactions on Graphics 35 (4) (2016) 1–12.
 [62] I. L. Dryden, K. V. Mardia, Statistical Shape Analysis, with Applications in R, Wiley Series in Probability and Statistics, John Wiley & Sons, Ltd, Chichester, UK, 2016.
 [63] A. L. Yuille, N. M. Grzywacz, A mathematical analysis of the motion coherence theory, International Journal of Computer Vision 3 (2) (1989) 155–175.
 [64] S. Umeyama, LeastSquares Estimation of Transformation Parameters Between Two Point Patterns (1991).
 [65] A. Myronenko, X. Song, On the closedform solution of the rotation matrix arising in computer vision problems (2009).
 [66] C. Williams, M. W. Seeger, Using the Nystrom Method to Speed Up Kernel Machines, NIPS Proceedings 13 (2001) 682–688.
 [67] Yang, Duraiswami, Gumerov, Davis, Improved fast Gauss transform and efficient kernel density estimation, in: Proceedings Ninth IEEE International Conference on Computer Vision, IEEE, 2003, pp. 664–671.
 [68] V. I. Morariu, B. V. Srinivasan, V. C. Raykar, R. Duraiswami, L. S. Davis, Automatic online tuning for fast Gaussian summation, in: D. Koller, D. Schuurmans, Y. Bengio, L. Bottou (Eds.), Advances in Neural Information Processing Systems 21, Curran Associates, Inc., 2009, pp. 1113–1120.
 [69] V. C. Raykar, C. Yang, R. Duraiswami, Fast computation of sums of Gaussians in high dimensions, Technical report in University of Maryland.
Appendix
List of notations

: the number of points in the target point set.

: the number of points in the source point set (or the mean shape).

: the dimension of the space in which point sets are embedded.

: the number of shape variations in a statistical shape model.

: the number of points to be sampled for the Nyström approximation.

: the target point set, where the th point of which is denoted by .

: the source point set, where the th point of which is denoted by .

: the matrix notation of the mean shape, where the th point of which is denoted by .

: the vector notation of the mean shape.

: the shape variation matrix, where is the th leading shape variation.

: the weight vector corresponding to shape variations.

: the residual variance of a Gaussian distribution that constitutes the Gaussian mixture model.

: the outlier probability.

: the regularization parameter for controlling the smoothness of the displacement field.

: the effective number of matching points in the target point set.

: a transformation model for mapping a source point set to a target point set where is a set of parameters that characterizes the transformation.

: the parameter set that defines the similarity transformation, where is the scaling factor, is the rotation matrix, and is the translation vector.

: the whole parameter set of a Gaussian mixture model.

: the probability matrix, the th element of which is the posterior probability that matches (or ).

: the volume of the region in which outliers can be generated.

: the sampled point set from the combined point set for the Nyström approximation.

: the Gaussian affinity matrix between point sets and , where .
Comments
There are no comments yet.