1 Introduction
Dense registration of 3D face seeks canonical representation of different facial surfaces such that their detailed structures can be compared. It is also named as dense correspondence in many prior works since the dense vertextovertex mappings are established after the registration is completed. Dense registration of 3D face is fundamental to many downstream tasks for 3D and 2D facial analysis due to the invariance of 3D shape to pose and illumination variations. For example, it is an essential step for 3D statistical face modeling [6, 8, 35, 20, 26, 29, 45]. It has also activated many solutions to problems for 2D faces [64, 28, 37, 11, 53, 65, 15].
Dense registration of 3D face belongs to the common nonrigid registration problem and has its own characters:

It is an illposed optimization problem since the solution is not uniquely defined. In a departure from the rigid case for solving an optimal rigid motion, the nonrigid registration problem belongs to a much larger class that has no explicit formulation.

It is a domainspecific problem targeted at 3D facial surfaces. The intrinsic anatomical structure of face should be considered as vital clues for vertextovertex correspondences to guide the nonrigid registration.
In the common case, the 3D shape (commonly surface in reallife applications) nonrigid registration problem [1, 10, 54, 38, 39] can be revisited as an optimization problem for classical elastic shape similarity modeling [55, 50]:
Given a template surface and a target surface , the problem is to solve optimal parametrization that minimizes the following deformation energy measuring the difference between two surfaces as
(1) 
where is is the Frobenius norm, and are the first and second fundamental forms of the surface , respectively, and and are the corresponding forms of as the deformed version of . The first fundamental form measures the difference between and in a 2D embedding space on parameterized surface , while the second fundamental form measures the local difference of curvature. The two terms in Eq. 1 are weighted by and for tangential and normal distortions, respectively. This formulation favors the nonrigid deformation to be locally rigid. However, local rigidity is very abstract and Eq. 1 is difficult to be optimized directly.
In a particular case for 3D face, the stateoftheart works [2, 34, 41, 42, 25] commonly achieve vertextovertex correspondence by a nonrigid deformation from a template face to a target face
. In a pairwise manner, the template face is usually a wellcustomized mesh as the initialization for registration. Then, the optimization for registration is generally modeled as an expectation–maximization (EM) problem as follows.

Estep: searching for a plausible preliminary vertextovertex correspondence between and .

Mstep: solving an optimal nonrigid deformation from to under some certain constraints.
The above process is in fact a generalization of the wellknown iterative closest point (ICP) algorithm [5, 13, 63] for rigid registration to the nonrigid case by alternating the above two steps. In the Estep, correspondence of vertices is established with some certain rules, such as landmark correspondence as hard feature constraint and closest vertex correspondence as soft constraint. In the Mstep, the nonrigid deformation is usually modeled as some locally smooth deformations, such as locally rigid transformation [27, 23] and locally affine transformation [24, 2]. For example, the NICP [2] algorithm includes a landmark term together with a distance and a stiffness term in the full objective function for optimization. The stateoftheart (SOTA) works commonly include feature matchings for normal and curvature, e.g. the iterative closest normal point (ICNP) methods [40], therefore achieve reasonable results for minimizing the second term in Eq. 1. In this work, we show that the result can be further optimized for the first term of Eq. 1, which leads to better local rigidity, and furthermore a unique solution to the registration problem. This is neglected in the prior works yet a quite important issue for 3D face correspondence. In fact, better optimization for the first term in Eq. 1 results in superior representation of 3D facial surface. And it also leads to coherent local registration and elegant mesh grid routing referring to a template mesh as shown in Fig. 1.
Optimizing the first term of Eq. 1 for dense correspondence can be considered as dividing a surface according to a certain template, as in Fig. 2. If we shrink the dimension of the original problem from 3D surface to 2D plane, then it becomes a problem to solve for optimal locations of vertices to divide the plane into similar triangles. If we further evolve the problem to the 1D case, then the definition is clear and the result is unique (as in Fig. 2):
Given a template line with end points and dividing points , one can find optimal segmentations on a target line with end points by satisfying
(2) 
While the segmentation of a line in a 1D Euclidean space is simple, the generalization to a 3D surface on a Riemann manifold requires extensive studies. In this paper, we propose a novel dividing and diffusing method for 3D face dense registration. Starting from a target mesh in correspondence with a template mesh, the proposed method alternates between a dividing step and a diffusion step to achieve finegrained vertextovertex correspondence between them. We further propose a multiresolution (MR) version of the dividing and diffusing method to facilitate the correspondence process. We show that the proposed method is correlated to the minimization of the distance between the template and the target face embedded in a logarithmic metric space for local scaling. In summary, the main contributions of this work are as follows:

We propose a novel dividing and diffusing method for finegrained optimization of 3D face dense correspondence, which leads to superior representation of 3D facial shape.

We propose a multiresolution version of the dividing and diffusing method for fast convergence.

We elucidate the physical meaning of the proposed method as smooth rearrangement of a local scaling metric for 3D facial shape, which leads to coherent local registrations and elegant mesh grid routines.
2 Related Work
3D face dense registration is an extremely nonlinear optimization problem that requires a well initialization and plausible constraints for optimization. If we consider the problem as pairwise correspondence between two faces, then the optimization problem can be initialized by a template with a few labeled landmarks as feature correspondence to the target. Most of the current works in this field can be classified into deformation based and deformation free ones according to their optimization strategies.
2.1 Deformation Based Method
These methods commonly deform a template face to a target face under some constraints on reasonable initializations, feature correspondences, and locally smooth deformations, which are discussed respectively as follows.
Initialization. A wellcustomized template mesh is generally used as the initialization for a subsequent deformation process. Then, the template face is usually aligned to the target face with a globally rigid or affine transformation driven by a few labelled landmarks or some other significant features in correspondence. For example, Mohammadzade et al. [40] use automatically detected nose tip to guide the global alignment of the template face to the target face. Amberg et al. [2] use an affine transform to align the template face to the target face with the correspondence of a few annotated landmarks. Some other works [22, 34, 27] adopt rigid transformations to align the template to the target face. In particular, Fan et al. [23] construct a group of locally rigid transformations to match the landmark locations exactly. The above methods generally employ rigid or approximately rigid registrations as the preliminary step for nonrigid deformation.
Feature correspondence. Deformation guided by feature correspondence can maximally preserve the basic structure of a human face, while being able to converge to a reasonable solution for global optimization as well. Current works [2, 30, 22] commonly use some preannotated landmarks as feature correspondence across different faces. Generally, each pair of landmarks in correspondence dominates over other vertices in a local region around it. The landmark correspondences control the holistic shape and help to deal with expression variations in the guided deformation. There are also some other works that use landmarkfree approaches in a departure from the landmarkbased feature correspondences. For example, Gilani et al. [66] use level set geodesic curve to extract seed points automatically for global and local deformations. Fan et al. [22] use some highentropy points, which can be considered as dense feature points in significantly highcurvature areas of face to guide the global deformation. Pan et al. [42] use some denser points to model large deformations on the mouth region. The denser points in these methods can be regarded as robust representations of facial features, which are more tolerant to individual landmark matching errors. However, the capacity for guiding large deformation is discounted without accurate and definite landmarks, therefore being suboptimal for faces with large expressions to that with neutral expression.
Smooth deformation. Locally smooth deformations also preserve the basic structures of a 3D facial surface, while enable the deformed template to adhere to the target face gradually. Myronenko and Song [41] regularize the offset of each vertex by total variation constraint for local smoothness, as a general nonrigid registration method. Patel and Smith [43] use a thinplate spline warp towards the target face for smooth deformation. Zhang et al. [62]
incorporate functional maps into the deformation process, and ensure smooth local deformation by the lowfrequency basis of the eigenfunctions of the LaplaceBeltrami operators. The NICP method
[2] and its variant [14] incorporate locally affine transformations as smooth deformations. Fan et al. [23] construct a group of locally rigid transformations to guarantee smooth shape deformations. The deformations in these methods generally alternate with refined correspondence for each vertex on a face as restrictive optimization to acquire the results for nonrigid registration. In some other methods [56, 7, 25], a prior model originated from a number of face prototypes in correspondence are incorporated to fit the deformed target faces. However, this is a chickenandegg problem and the correspondence problem of face prototypes remains to be solved.The deformation based methods are able to reach a reasonable solution with an alternative EM approach for minimizing Eq. 1. However, since most of these methods employ different rules for both surface deformation and vertex correspondence, they generally lead to distinct solutions and neglect the optimality of the results. Therefore, the 3D face dense correspondence problem has no standard solution in the existing works. In this work, we propose a dividing and diffusing algorithm to optimize the results for 3D face dense registration, in particular for the tangential parameterization in the first term of Eq. 1, and achieve stable results which are largely independent of preliminary correspondences.
2.2 Deformation Free Method
Deformation free methods establish dense vertextovertex correspondence without an explicit deformation process. These methods commonly involve some point matching strategies with the guidance of a few significant features, such as facial landmarks. Other geometric features such as curvature and normal are usually employed as signatures for local shape matching. We list some representative methods in the literature as follows.
In the seminal work of 3D Morphable Model (3DMM) [6], Blanz and Vetter propose to encode and map both 3D shape and texture features to a 2D cylindrical coordinate, and use a regularized opticflow based algorithm for dense correspondence of 3D faces.
Sun and Abidi [52] project geodesic contours around some feature points onto their tangential planes, which are further used as signatures for shape matching. This method is further improved and applied to 3D face dense correspondence by Salazar et al. [47] on the BU3DFE dataset [60] for modeling shape variation from large expressions.
Gu et al. [33, 61] propose a Ricci flow based method for conformal mapping of a 3D facial surface to a 2D canonical coordinate. Then, some uniform dividing strategies are conducted, with a few preannotated landmarks as fixed feature points.
Gilani et al. [31] first detect sparse correspondences on the outer boundary of a 3D face, and then triangulate existing correspondences and expand them iteratively by matching points of distinctive surface curvature along the triangle edges. Finally, the triangle mesh is refined by evolving level set geodesic curves.
The deformation free methods can be seen as employing some indirect dividing strategies on the target facial surface. Since the dividing of a 3D surface directly is difficult, some of these methods reduce the dimension of the original problem from 3D to 2D [6, 33, 61, 32]. However, the limitation is that the intrinsic geometric features are only represented approximately, since it is unable to isometrically embed a nonflat 3D surface (e.g. 3D face) into 2D plane perfectly. This is particularly difficult for complex 3D surface, such as 3D face with large expressions and 3D hand with flexible joints. In this work, we propose a novel iterative dividing and diffusing method on the original target facial surface directly referring to the mesh architecture of a template facial surface, in a departure from the existing indirect dividing methods.
3 Segmentation of a Line
A scientific methodology to solve a problem in highdimensional space is to first consider the problem in a lowdimensional case. Following this idea we cast the registration problem for 3D facial surface as a line segmentation problem shown in Fig. 2. The problem asks for a tuple of optimal sublines on a target line referring to a certain template line proportionately. One can simply use a ruler to measure the length of the two lines and then determine the dividing points according to Eq. 2. Unfortunately, such a global ruler measuring the “length” of a 3D surface does not exist, while the difference between two surfaces can be compared locally. Keeping in mind that we do not have a global “ruler”, we employ an iterative approach to solve this segmentation problem instead. We formulate the problem as a dividing and diffusing process for solving Eq. 2 as follows.
Given a template line with points and a target line with initialized corresponding points , one can compute the optimal dividing points on by iteratively alternating the following two steps:

Dividing. Compute each optimal point on referring to each subline triplet on by satisfying
(3) where denotes the iteration number.

Diffusing. Denoting as the renewing offset for each point in the iteration, a local average strategy is applied as
(4) and each renewed point is computed by
(5)
The iterative process is able to reach a final solution when the renewed offset is smaller than a certain threshold. Table I shows an example of the renewing process for a target line referring to a corresponding template line with . The total error to the ground truth as
(6) 
the total renewed offset as
(7) 
and each renewing point are shown in Fig. 3.
Iterative Line Segmentation w/o MR Scheme  

Iteration Number  Point Location  
0  
1  
2  
5  
10  
18  
Iterative Line Segmentation w/ MR Scheme  
Iteration Number  Point Location  
0  
1  
2  
Ground Truth  
Reference Template 
This iterative process successfully achieves proportional division of a target line according to a certain template line. An additional advantage is its robustness to different initializations. The result still converges to the optimal solution, even if the initialized sublines intersect with each other, i.e., . An example is shown in Fig. 4, where the target line is initialized with . This indicates that the generalized method on 3D surface can repair the selfintersection caused by flawed deformation, which is an attractive property for 3D face dense correspondence (also see Fig. 1). However, the convergence speed is slow and can be improved further. If we use a multiresolution scheme, i.e., first locate for a coarse resolution and then locate for a fine resolution, two iterations are sufficient for convergence as in Table I. This motivates us to propose a multiresolution (MR) version of the dividing and diffusing algorithm for 3D face dense registration, which will be elaborated afterwards.
4 Segmentation of 3D Facial Surface
In this section, we propose an iterative dividing and diffusing method for the registration of 3D facial surfaces. Generalizing the algorithm in Section 3 from 1D line to 3D surface requires appropriate definition of the “dividing” and “diffusing” process, respectively. We employ local registration of a vertex’s ring neighbors as the local “dividing” process. We then construct a global optimization problem with constraints on local smoothing for the “diffusing” process. Contrary to the 1D case, the global optimization on a 3D surface involves additional fixed feature points (e.g. landmarks) on the surface and a leastsquare formulation is put forward to deal with this issue.
4.1 Dividing
The local registration process starts from a pair of initially registered template and target facial surfaces. We simply use the ring neighbors of a vertex as the basic local cell in the common triangle mesh representation of 3D surface. Other formats of data are also applicable by constructing a local cell with some nearest neighboring vertices.
Let be a template mesh including vertices in , edges in , and triangles in ; and let be the corresponding target mesh. For each vertex , we denote its ring neighbors as and its corresponding vertices on the target as and . Then, we suppose there exists a rigid transformation that aligns to , as
(8) 
where
can be estimated by a leastsquare alignment problem of the surrounding
ring neighbors, as(9) 
Here denotes the space of all rank orthonormal matrices with the determinants to be (i.e. Givens Matrices). The problem in Eq. 9
can be solved efficiently by a singular value decomposition (SVD) based method
[51].After the rigid transformation is obtained, we compute the preliminary offset for each vertex by
(10) 
The local registration process is illustrated in Fig. 5.
4.2 Diffusing
The preliminary offset for each vertex in Eq. 10 should not be managed isolatedly, since the adjacent vertices share some common vertices in their ring neighbors. Therefore, we construct a diffusing (local smoothing) strategy to spread out the local effect for global optimization. We denote and formulate the problem as
(11) 
where is the weights for diffusing of each offset and the superscript is omitted for brevity.
Solving Eq. 11 requires taking the partial derivative with respect to each offset and leads to a linear system
(12) 
where
(13) 
(14) 
and
(15) 
Here
is the identity matrix,
is the number of vertices in , and the superscript denotes the operation for matrix transpose. If we let be the coefficient matrix of Eq. 12, then is a sparse and strict diagonaldominant matrix. The solution for Eq. 12 is discussed in two different cases:Case : Without Fixed Vertices. In this case, all vertices are free to be renewed. Thus is a fullrank square matrix such that the linear system is wellposed and has a unique solution as
(16) 
This is similar to the mesh correction algorithm proposed in [22], which is applicable to neutral faces with small deformations.
Case : With Fixed Vertices. In this case, some vertices are fixed as constraints for solving Eq. 12, in which is a rankdeficient matrix. Let be the number of fixed vertices (), we exclude the columns in and the rows in which are related to the fixed vertices. Then Eq. 12 is equal to
(17) 
which is an overdetermined linear system. It can be further converted to a wellposed system as
(18) 
Then the leastsquare solution to Eq. 17 is
(19) 
By this way, we are able to process the mesh by fixing some already wellcorresponded vertices while renewing other vertices. This is particularly useful for faces with large expressions, where feature points (e.g. landmarks) are used to guide the overall correspondences.
4.3 The Overall Algorithm
The overall algorithm alternates between the above dividing and diffusing process until the average of the renewed offset
(21) 
is smaller than a certain threshold . In practice, the implementation of the proposed method involves the following details, which are not trivial.

Categorizing of Vertices. The raw scanning data are commonly not able to capture some inner structures accurately on a face, such as the nostril and the ear regions. These structures are created from a template face in many 3D face dense registration method. We define the vertices in these regions as noninterested vertices. The other vertices are defined as interested vertices which are applied to the proposed method. In addition, the interested vertices can be classified into fixed vertices and free vertices. The fixed vertices include the landmarks, the edges of facial mesh, and the edges between interested and noninterested vertices. Fig. 6 (a) shows the different classes of vertices.

Effects of Fixed Vertices. The fixed vertices should serve as constraints during the dividing and diffusing process. We apply the dividing process for each fixed vertex and its ring neighbors. We then substitute each preliminary offset by Eq. 10 into the diffusing process. However, we do not renew the fixed vertices with Eq. 20 in the iterations. Therefore, the fixed vertices are used to “drag” and “pull” the neighboring vertices to correct locations. Taking a row for a fixed vertex in Eq. 17 for example, we obtain
(22) where the fixed vertex transmits a “reacting force” to its neighboring vertices. To this end, we set the weights for global diffusing in Eq. 11 to decrease with the geodesic distance to the fixed vertices to enlarge the effect of fixed features, as in Fig. 6 (b).

Skills in Solving Eq. 18. We do not use Eq. 19 as matrix inversion to solve Eq. 18 in practice. Since is a symmetric positive definite matrix, we adopt three steps:
I. Reordering of variables and allocating memory;
II. Cholesky factorization of ;
III. Substitution and iterative refinement.
Step I and II only require to be computed once in the whole algorithm. Therefore, solving step III in the sparse linear system is very fast by incorporating an advanced computational tool^{1}^{1}1https://pardisoproject.org/.

Nearest Neighboring Vertex Searching on Surfaces. In recent works [42, 59, 31, 22] for nearest vertex searching on a 3D surface, a KD tree architecture [3] for point cloud is commonly used to accelerate the computational process. However, the closest vertex on a facial surface to a given vertex is not necessarily a vertex in a KD tree. It generally locates insides a triangle unless the sampled point cloud is extremely dense. In this work, we use an axisaligned bounding box (AABB) tree [4] considering both point cloud ( vertices) and data structures ( triangles) to improve the computational efficiency. This way also leads to accurate location of nearest neighboring vertex. Fig. 7 shows the difference between the results by the two different trees, which is commonly neglected in the existing works for 3D face dense registration. We use an advanced package^{2}^{2}2https://libigl.github.io/ for the implementation of AABB tree and it only requires to be built once for a certain target face in our method.
The overall method is concluded in Alg. 1. This method can generally filter out the nonuniform mesh grid routine and achieve coherent local registrations for a target mesh, as shown in Fig. 1. It also leads to better quantitative results for a group of data, which will be elaborated in the experiments.
5 Multiresolution Analysis
Section 3 shows that a multiresolution (MR) version of the dividing and diffusing algorithm on a line accelerates the convergence speed greatly. In this section, we generalize the multiresolution based method to the facial surface and expect this will also benefit the convergence speed. To this end, the template face for registration is resampled into different resolutions. We incorporate a hierarchical strategy for the representation of 3D faces.
Since the proposed method starts from a template face and a topological uniform target face by an existing method, multiresolution analysis on the template face is sufficient. The counterpart on the target face follows the correspondence of each vertex. We organize the vertices on a template face in a pyramid structure with farthest point sampling (FPS) method [21], which ensures uniform spacing of included vertex in an iterative manner. Suppose that a template face include free vertices and fixed vertices (original feature points), the steps for the reconstruction of a multiresolution model are as follows.

Compute vertextovertex geodesic distances on .

Define the number of free vertices as a pyramid architecture with different levels decimated by a factor .

Initialize fixed vertices as included vertices for FPS.

Include vertices sequentially using FPS, such that the MR model includes vertices for different resolutions , respectively.
We use a fast heatflow based method [16] for the computation of vertextovertex geodesic distance on the template mesh, which is the metric for FPS in our method. In the first coarse resolution , we use the original feature points as fixed vertices and included vertices as free vertices. In other resolutions, we use the vertices in the previous resolution as fixed vertices, and the lately included vertices in the current resolution as free vertices. The dividing and diffusing algorithm is applied to each resolution in a cascade manner. Finally, we denote the method in Section 4 as the fullresolution and apply it to reach the final results. The different number of fixed and free vertices in each resolution is summarized in Table II.
Resolution  Free Vertices  Fixed Vertices 

An example for the MR model in the FaceScape dataset [58] is shown in Fig. 8, where we set . Unlike the fullresolution model that the neighboring relationship of all vertices are defined by the ring neighbors of the triangle mesh, this relationship for each vertex in other resolutions requires to be customized. In this work, we define the neighboring vertices by the following rules.

If a vertex is on the edges of the mesh, we adopt two vicinity vertices and another nearest vertex inside the mesh as its neighbors.

If a vertex is inside the mesh, we adopt six nearest neighboring vertices as its neighbors.

The neighboring relationship is symmetrized: i.e. if vertex is incident to vertex , then we make incident to as well.
In these rules, we use geodesic distance as the basic metric for nearest neighbor searching. The neighboring relationship is further used to replace the ring setting in Eq. 11.
6 Local Scaling Metric
In this work, we consider the 3D face dense correspondence problem in a lowdimensional way, and propose an iterative dividing and diffusing method motivated by simple proportional segmentation of a line. As a result, the proposed method divides a target facial surface as rigidly as possible referring to a given template facial surface, while constrained by some fixed feature correspondences. In this way, the proposed method forces the difference between the target and template surface to vary in a locally smooth manner. We make some modifications upon Eq. 2 and define a
similarity score vector
between two faces according to their edge differences as(23) 
where and are the corresponding edge lengths of the target and template face, respectively. The ratio is equal to if the corresponding edge lengths are the same. We define the local scale embeding vector of a target face as
(24) 
by taking a common template face and using the logarithmic notation, where is the length of each edge on the template face. By this way, the template face locates at the origin in a highdimensional space. We define the induced local distance vector between two faces and as
(25)  
where are the lengths of the corresponding edges. We add up and average each element in Eq. 25 and obtain a global distance metric
(26) 
as
(27) 
where
(28) 
is each normalized weight that compensates for different edge lengths on the template face. The metric in Eq. 27 satisfies the following theorems.
Theorem I (Identity of Indiscernibles). Given two arbitrary faces and , we have
(29) 
and the equality holds if and only if
(30) 
Proof. Inequality 29 is correct since every element in Eq. 27 is positive. We now focus on proving
(31) 
The steps are as follows.
(32)  
Theorem II (Symmetry). Given the local scale embedding vectors and of two arbitrary faces, we have
(33) 
Theorem III (Triangle Inequality). Given the local scale embedding vectors , , and of three arbitary faces, we have
(35) 
Proof. Considering each of the elements in , we have the following inequality
(36)  
Then it follows
(37)  
Theorem I to III guarantee mathematical rigorism for a given metric and indicate that the local scale embedding vector for each face can be treated as separable points in a highdimensional Euclidean space. Physically, we represent the scale of a face locally referring to a template face in the tangential direction of the target surface. The geometric features of a face can be determined uniquely by the local scale in combination with the curvature (in the normal direction of the target surface). One merit of the local scale embedding vector in Eq. 24 as a pack of scalars is its invariance to rigid transformations, thus represents the intrinsic features for a face. The proposed dividing and diffusing algorithm tends to minimize the global distance metric for two faces denoted by Eq. 27. We show two examples in Fig. 9 for both the local and global metrics in Eq. 25 and Eq. 27. We can see that the global distance metric by Eq. 27 is significantly smaller after applying the proposed method between two faces, while the vector in Eq. 25 becomes more smooth and locally coherent on the target surface. In Fig. 9, we can also see that the local distance vector measures the degree of stretching () or shrinking (
) of a face with respect to another face. This is particularly meaningful as an indicator for shape variances caused by different identities and expressions.
7 Experiment
Assessing the results for 3D face dense registration is not an easy task. On the one hand, some feature vertices are definite for anatomical significances and are visibly salient for local shape and texture features. On the other hand, most vertices for a dense face model are not definite with the groundtruth. Some existing works commonly use indirect ways for evaluation under different perspectives, such as representation ability of resulted face model [23, 31, 25], landmark localization accuracy [49, 17, 30, 36, 25]
, and 3D face recognition performance
[9, 57, 19, 22]. In this work, the proposed dividing and diffusing method gives practicable definition to the problem by extending the fixed feature correspondence to the overall dense correspondence. Thus we treat 3D landmark detection (either manually or automatically annotated) as a preprocessing step and do not use it for evaluation. We do not use 3D face recognition performance for evaluation either, since it requires postprocessing, such as shape clipping and feature extraction fed into different classifiers. In addition to the commonly used metric as groupwise representation ability of resulted face model, we include extensive experiments including computational efficiency, robustness to initialization, and direct metric embedding tailored for the proposed method for a full evaluation in different perspectives.
7.1 Datasets
We carry out the experiments on two publicly available datasets including BU3DFE [60] and FaceScape [58]. They are two typical datasets since both of them are rich for expressions and densely constructed by scans from multiple directions. BU3DFE includes facial samples with different expressions in different levels by actors. The resolution of raw scanning data is around vertices per face. FaceScape is publicly available recently in and includes highresolution (around vertices) 3D faces with different expressions. It also provides the registered faces which share the same topology as triangle meshes with vertices and triangles. It is worth mentioning that the collection of large and highquality 3D dataset is not an easy task. The existing works on 3D dataset [48, 44, 12, 8] play indispensable roles for the advancement of 3D face applications.
7.2 Computational Efficiency & Convergence Analysis
In the proposed method, the sparse matrix decomposition, the neighboring relationship of each vertex, and the multiresolution organization of vertices require to be computed only once for a common template face. The building of AABB tree is computed once for each target face in the registration process. Thus we neglect the cost for these steps. The local registration process and the iterative refinement for solving Eq. 17 contribute to most of the computational time in practice. The former involves small matrix multiplications () and SVD decomposition () for each vertex (totally ), while the latter involves iterative substitution of a sparse rank coefficient matrix. Thus the computational complexity for the fullresolution version is approximately
(38)  
In a MR version with the number of vertices growing exponentially in each resolution, the computational complexity is theoretically reduced to
(39) 
Thus the computational time increases linearly and logarithmically with the number of vertices for the full resolution and the MR version of the proposed method, respectively. In our implementation^{3}^{3}3We implement the proposed method on the software and hardware environments of VC++2015 and i79700 (single thread), respectively., it requires for the fullresolution version on average for a model with vertices, while the MR model reduces the time to . Fig. 10 shows an example by setting the threshold in Eq. 21 to , which is normalized by the mean edge length of a template face. We include both the fullresolution version and the MR version for comparison. It shows that the proposed method is very efficient, while the MR version accelerates the convergence speed further. The faster convergence of the MR version is due to both less iterations and less vertices for operations in lowresolution steps.
7.3 Definite with Stable Correspondence
Section 3 shows an example for iterative line segmentation which results in a unique solution defined by Eq. 2. We further guess that the proposed dividing and diffusing method is also able to converge to a unique solution which is independent to different initializations. We now test the influence of different initialization methods and noisy initializations, respectively.
Initialization with different methods. First, we select samples in the BU3DFE dataset which cover different identities and expressions. The initial correspondences of these samples are established with two different and reproducible methods, the NICP method [2] and a local shape deformation (LSD) method [23]. We apply the proposed method (fullresolution version) to these corresponding samples and compare the average pervertex errors between the results from the initializations with the two different methods. Fig. 11 shows the normalized results. We can see that both the mean and the standard deviation of the errors decrease with the number of iterations. This means that the results by the proposed method are independent from different initializations to some extent. The final residual errors should be caused by different meshedge vertices (as fixed vertices) and shape fitting errors in the normal direction by different methods.
Initialization with noise. Then, we extract the corresponded faces in the BU3DFE dataset by the local shape deformation method [23]
and add different level of noise on free vertices for each face in the tangential directions. The distribution of noises on each vertex follows a uniform distribution with variance
. Fig. 12 shows a qualitative example of optimized meshes with different levels of added noise. The average pervertex errors between the results from a clean target and that with noise are also annotated. We can see that the proposed method is able to “repair” the noisy mesh grid routine and leads to locally coherent registrations. The vertex locations of corresponding faces and the mesh structures on the results across different level of noise are also stable. Therefore, the proposed method is very robust to noisy initializations.The above experiments on different initializations show some evidences that the proposed method leads to stable correspondences with the fixed vertices (fixed features and meshedges). Therefore, the proposed method gives the definition of vertex correspondences on smooth regions of faces to some extent, which is a critical issue in the field of 3D face dense correspondence.
7.4 Evaluation with the Proposed Metric
Section 6 introduces a local scaling metric that forces the registered target face to vary in a locally smooth manner referring to a template face. We now use both the global (Eq. 27) and local (Eq. 25) annotations for this metric to evaluate the correspondence results for the FaceScape and BU3DFE dataset. We choose the topological uniform samples provided by the FaceScape dataset as initializations for the proposed method. The dense correspondences of these samples in topological uniform formats are achieved by a classical NICP [2] method carefully with extra manual assistance. Therefore, we consider this as an enhanced NICP (ENICP) baseline. The implementation of nonrigid registration methods commonly involves a lot of parameter settings and additional expertise in this field, against which the public baseline for FaceScape dataset avoids the induced subjective biases. In addition, we implement both the NICP method [2] and the local shape deformation method [23] as baselines for the BU3DFE dataset. The parameters for both of the two methods are carefully tuned to minimize implementation errors.
First, we apply the proposed method (both the fullresolution and MR versions) to the samples^{4}^{4}4We exclude some problematic data in this dataset. in the FaceScape dataset and samples in the BU3DFE dataset. Table III shows the results for the global metric in Eq. 27 averaged over all samples in comparison with those by the baseline methods. We can see that the global metric is smaller after applying the proposed method, which denotes better coherent local deformations. In addition, the results by the fullresolution version and the MR version show no significant differences, since the fullresolution optimization is adhered to the final step of the MR version in this work.
Then, we apply linear discriminative analysis (LDA) to the local scaling embedding vectors of samples with respect to different expression labels. We show the clustering results for the first dimensions before and after applying the proposed method in Fig. 13 (a) and Fig. 13 (b), respectively. It shows that the discriminate ability with respect to different expressions is largely improved after applying the proposed method. Specifically, the expressions with jaw_left, neutral, brow_raiser, and jaw_forward (see Fig. 14 for an example) are mixed sequentially in pairs with each other for original representations with the ENICP method, whereas they are separated after optimization by the proposed method. This demonstrates that the proposed method leads to better discriminative representation for 3D facial shape, which is beneficial to many downstream applications.
Dataset  FaceScape [58]  BU3DFE [60]  

Baseline Method  ENICP [58]  NICP [2]  LSD [23] 
Before Opt.  
After Opt.(Full)  
After Opt.(MR) 
7.5 Compactness, Generalization, and Specificity for Lowdimensional Representations
Fig. 1 shows that the proposed method leads to locally coherent results between the template mesh and the target mesh qualitatively. We hope the pairwise locally coherent results also benefit the global groupwise representations of 3D facial shapes. We follow a common practice [18] for the evaluation of statistical shape models using compactness, generalization, and specificity. We use the topological uniform data in the FaceScape dataset as standard baseline for the ENICP method.
Training&Test Set Split. The 3D faces in the FaceScape dataset include identities with typical expressions. We include the first identities ( samples) as the training set and the rest identities (
samples) as the test set. We apply principal component analysis (
PCA) to the aligned training samples to achieve a lowdimensional representative model referring to the 3DMM [6] pipeline.Compactness is measured by the percentage of variance retained by the PCA model. Compactness is a vital property of data for dimension reduction. It is evaluated by a minimum description length principle [46]
in the machine learning community as “
the less, the better”. We compare the PCA models constructed from samples before and after applying the proposed methods, respectively. The results are shown in Fig. 15 (a). The first principal components explain 90.38% and 91.19% of the variances by the baseline model and that achieved by the proposed method, respectively. This demonstrates better compactness of lowdimensional models by the proposed method, which indicates accurate correspondence of 3D faces.Generalization measures the ability of a model to represent novel facial shapes that are not included in the training samples. We use the lowdimensional models, which are constructed with the training samples before and after applying the proposed method, respectively, to represent the test samples with different number of principal components. The mean vertex representation errors (as pervertex distances) averaged over all samples are shown in Fig. 15 (b). We observe that the generalization error is reduced notably after applying the proposed method to the corresponding training samples. Fig. 16 also shows a qualitative result for the representation of an exemplar facial sample with principal components, where there are visible differences between the models before and after applying the proposed method. These results demonstrate that the proposed method leads to not only better mesh grid routines (as in Fig. 1) but also enhanced ability to represent shape variances.
Specificity
measures the validity of generated faces by a representative model. We assume each principal component of the model follows Gaussian distribution with certain variances and randomly generate novel instances for a fixed number of principal components according to each Gaussian distribution model. The generated samples are then used to search their closest samples on the test set with minimum Euclidean distances. Fig.
15 (c) shows the average pervertex distances with generated samples for different settings of principal component number. It shows that the results by the proposed method achieve smaller specificity error compared with the baseline method. This demonstrates the effectiveness of the generative model for synthesizing novel facial samples.7.6 Failure Cases
The proposed method has been tested with a variety of facial data and can generally “repair” the nonuniform mesh grid routine on corresponded faces, even if the initialization is corrupted by tremendous noise (see Fig. 12) in the tangential direction. However, the proposed method cannot achieve a “clean” result when the facial surface is not reconstructed properly by a scanning device. A failure example in the FaceScape dataset is shown in Fig. 17, where the initial corresponded result by the ENICP method fails to represent the groundtruth facial surface in the normal direction. The proposed method is not able to solve this problem thoroughly, although most of the nonuniform mesh grid routines are rectified. Therefore, we suggest that the proposed method can expand its applications in combination with the stateoftheart surface denoising methods.
7.7 Application to Other Format of Data
The proposed dividing and diffusing method can be also applied to other format of data which are not limited to face. In fact, it is able to achieve finegrained correspondence and repair the nonuniform mesh grid routine given a pair of initially registered template and target surface. Fig. 18 shows an example for hand mesh data. The common ground between the face and the hand is that they have both fixed vertices (feature points) and free vertices in the setting for dense registration. The proposed method is general under this assumption, and is thus very useful in a variety of applications for surface registration.
8 Conclusion
We propose a dividing and diffusing method for the registration of 3D facial surface in this paper. The proposed method is motivated by proportional segmentation of a line, and alternates between local dividing and global diffusing, to finally achieve pairwise and finegrained dense correspondence of 3D facial surface. We further elucidate the physical meaning of the proposed method as smooth rearrangement of a local scaling metric for 3D facial shape. Extensive experiments have demonstrated its effectiveness, including computational efficiency, robustness to initializations, metric embedding property, and representation ability of resulted face model. The proposed method can be also used to establish correspondences for other format of surface data, such as hands. Generally, it gives a plausible definition for vertex correspondence on smooth regions for 3D face, and leads to locally coherent details and elegant mesh grid routines for dense registrations, which we hope is helpful for a variety of applications.
There is a remaining issue in the proposed method as well as in other existing works. We link the proposed method to the fundamental forms in Eq. 1 for shape analysis in this paper. We assume that the second fundamental form for curvature can be well established by a number of feature points, based on which we optimize the first fundamental form for surface parameterization. However, we observe that some highcurvature features on facial surface cannot be uniquely determined if the number of landmarks is not enough. On the contrary, the accuracy of preannotated landmarks is a burden if we include too many of them which are not very salient, especially for raw scanning data with tremendous noise. Therefore, balancing the number and accuracy of facial landmarks as well as including soft constraints versus hard feature constraints deserves further study. In the future, we will explore automatic and accurate landmark localization methods on point clouds for 3D facial data. We will also study “soft” method to take into consideration noisy feature matchings for robust 3D face dense correspondence with lifted performance.
References
 [1] (2003) The space of human body shapes: reconstruction and parameterization from range scans. ACM Transactions on Graphics 22 (3), pp. 587–594. Cited by: §1.

[2]
(2007)
Optimal step nonrigid icp algorithms for surface registration.
In
IEEE Conference on Computer Vision and Pattern Recognition
, pp. 1–8. Cited by: Fig. 1, §1, §1, §2.1, §2.1, §2.1, §7.3, §7.4, TABLE III.  [3] (1975) Multidimensional binary search trees used for associative searching. Communications of the ACM 18 (9), pp. 509–517. Cited by: 4th item.
 [4] (1997) Efficient collision detection of complex deformable models using aabb trees. Journal of Graphics Tools 2 (4), pp. 1–13. Cited by: 4th item.
 [5] (1992) A method for registration of 3d shapes. IEEE Transactions on Pattern Analysis and Machine Intelligence 14 (2), pp. 239–256. Cited by: §1.
 [6] (1999) A morphable model for the synthesis of 3d faces. In ACM Annual Conference on Computer Graphics and Interactive Techniques, pp. 187–194. Cited by: §1, §2.2, §2.2, §7.5.
 [7] (2015) A groupwise multilinear correspondence optimization for 3d faces. In IEEE International Conference on Computer Vision, pp. 3604–3612. Cited by: §2.1.
 [8] (2018) Large scale 3d morphable models. International Journal of Computer Vision 126 (2), pp. 233–254. Cited by: §1, §7.1.
 [9] (2005) Threedimensional face recognition. International Journal of Computer Vision 64 (1), pp. 5–30. Cited by: §7.
 [10] (2007) Global nonrigid alignment of 3d scans. ACM Transactions on Graphics 26 (3), pp. 1276404. Cited by: §1.
 [11] (2017) How far are we from solving the 2d & 3d face alignment problem?(and a dataset of 230,000 3d facial landmarks). In IEEE International Conference on Computer Vision, pp. 1021–1030. Cited by: §1.
 [12] (2013) Facewarehouse: a 3d facial expression database for visual computing. IEEE Transactions on Visualization and Computer Graphics 20 (3), pp. 413–425. Cited by: §7.1.
 [13] (1992) Object modelling by registration of multiple range images. Image and Vision Computing 10 (3), pp. 145–155. Cited by: §1.
 [14] (2015) Active nonrigid icp algorithm. In IEEE International Conference and Workshops on Automatic Face and Gesture Recognition, Vol. 1, pp. 1–8. Cited by: §2.1.
 [15] (2016) Survey on rgb, 3d, thermal, and multimodal approaches for facial expression recognition: history, trends, and affectrelated applications. IEEE Transactions on Pattern Analysis and Machine Intelligence 38 (8), pp. 1548–1568. Cited by: §1.
 [16] (2013) Geodesics in heat: a new approach to computing distance based on heat flow. ACM Transactions on Graphics 32 (5), pp. 1–11. Cited by: §5.
 [17] (2013) A machinelearning approach to keypoint detection and landmarking on 3d meshes. International Journal of Computer Vision 102 (13), pp. 146–179. Cited by: §7.
 [18] (2002) A minimum description length approach to statistical shape modeling. IEEE Transactions on Medical Imaging 21 (5), pp. 525–537. Cited by: §7.5.
 [19] (2013) 3D face recognition under expressions, occlusions, and pose variations. IEEE Transactions on Pattern Analysis and Machine Intelligence 35 (9), pp. 2270–2283. Cited by: §7.
 [20] (2020) 3D morphable face models—past, present, and future. ACM Transactions on Graphics 39 (5), pp. 1–38. Cited by: §1.
 [21] (1997) The farthest point strategy for progressive image sampling. IEEE Transactions on Image Processing 6 (9), pp. 1305–1315. Cited by: §5.
 [22] (2018) Dense semantic and topological correspondence of 3d faces without landmarks. In European Conference on Computer Vision, pp. 523–539. Cited by: §2.1, §2.1, 4th item, §4.2, §7.
 [23] (2019) Boosting local shape matching for dense 3d face correspondence. In IEEE Conference on Computer Vision and Pattern Recognition, pp. 10944–10954. Cited by: §1, §2.1, §2.1, §7.3, §7.3, §7.4, TABLE III, §7.
 [24] (1996) Rigid, affine and locally affine registration of freeform surfaces. International Journal of Computer Vision 18 (2), pp. 99–119. Cited by: §1.
 [25] (2021) A sparse and locally coherent morphable face model for dense semantic correspondence across heterogeneous 3d faces. IEEE Transactions on Pattern Analysis and Machine Intelligence. Cited by: §1, §2.1, §7.
 [26] (2017) A dictionary learningbased 3d morphable shape model. IEEE Transactions on Multimedia 19 (12), pp. 2666–2679. Cited by: §1.
 [27] (2011) Locally rigid globally nonrigid surface registration. In International Conference on Computer Vision, pp. 1527–1534. Cited by: §1, §2.1.
 [28] (2014) Automatic face reenactment. In IEEE Conference on Computer Vision and Pattern Recognition, pp. 4217–4224. Cited by: §1.
 [29] (2018) Morphable face modelsan open framework. In IEEE International Conference on Automatic Face and Gesture Recognition, pp. 75–82. Cited by: §1.
 [30] (2017) Deep, dense and accurate 3d face correspondence for generating population specific deformable models. Pattern Recognition 69, pp. 238–250. Cited by: §2.1, §7.
 [31] (2017) Dense 3d face correspondence. IEEE Transactions on Pattern Analysis and Machine Intelligence 40 (7), pp. 1584–1598. Cited by: §2.2, 4th item, §7.
 [32] (2016) Fully automated and highly accurate dense correspondence for facial surfaces. In European Conference on Computer Vision, pp. 552–568. Cited by: §2.2.
 [33] (2007) Ricci flow for 3d shape analysis. In IEEE International Conference on Computer Vision, pp. 1–8. Cited by: §2.2, §2.2.
 [34] (2008) Global correspondence optimization for nonrigid registration of depth scans. In Computer Graphics Forum, Vol. 27, pp. 1421–1430. Cited by: §1, §2.1.
 [35] (2017) Learning a model of facial shape and expression from 4d scans. ACM Transaction on Graphics 36 (6), pp. 194–1. Cited by: §1.
 [36] (2019) 3d face modeling from diverse raw scan data. In IEEE International Conference on Computer Vision, pp. 9408–9418. Cited by: §7.
 [37] (2016) Joint face alignment and 3d face reconstruction. In European Conference on Computer Vision, pp. 545–560. Cited by: §1.
 [38] (2014) Robust point matching via vector field consensus. IEEE Transactions on Image Processing 23 (4), pp. 1706–1721. Cited by: §1.
 [39] (2017) Recent developments and trends in point set registration methods. Journal of Visual Communication and Image Representation 46, pp. 95–106. Cited by: §1.
 [40] (2012) Iterative closest normal point for 3d face recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence 35 (2), pp. 381–397. Cited by: §1, §2.1.
 [41] (2010) Point set registration: coherent point drift. IEEE Transactions on Pattern Analysis and Machine Intelligence 32 (12), pp. 2262–2275. Cited by: §1, §2.1.
 [42] (2013) Establishing point correspondence of 3d faces via sparse facial deformable model. IEEE Transactions on Image Processing 22 (11), pp. 4170–4181. Cited by: §1, §2.1, 4th item.
 [43] (2009) 3d morphable face models revisited. In IEEE Conference on Computer Vision and Pattern Recognition, pp. 1327–1334. Cited by: §2.1.
 [44] (2005) Overview of the face recognition grand challenge. In IEEE Conference on Computer Vision and Pattern Recognition, Vol. 1, pp. 947–954. Cited by: §7.1.
 [45] (2020) Towards a complete 3d morphable model of the human head. IEEE Transactions on Pattern Analysis and Machine Intelligence. Cited by: §1.
 [46] (1978) Modeling by shortest data description. Automatica 14 (5), pp. 465–471. Cited by: §7.5.
 [47] (2014) Fully automatic expressioninvariant face correspondence. Machine Vision and Applications 25 (4), pp. 859–879. Cited by: §2.2.
 [48] (2008) Bosphorus database for 3d face analysis. In European Workshop on Biometrics and Identity Management, pp. 47–56. Cited by: §7.1.
 [49] (2010) Automatic face segmentation and facial landmark detection in range images. IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics) 40 (5), pp. 1319–1330. Cited by: §7.
 [50] (2007) Asrigidaspossible surface modeling. In Symposium on Geometry processing, Vol. 4, pp. 109–116. Cited by: §1.
 [51] (2017) Leastsquares rigid motion using svd. Computing 1 (1), pp. 1–5. Cited by: §4.1.
 [52] (2001) Surface matching by 3d point’s fingerprint. In IEEE International Conference on Computer Vision, Vol. 2, pp. 263–269. Cited by: §2.2.
 [53] (2017) Synthesizing obama: learning lip sync from audio. ACM Transactions on Graphics 36 (4), pp. 1–13. Cited by: §1.
 [54] (2012) Registration of 3d point clouds and meshes: a survey from rigid to nonrigid. IEEE Transactions on Visualization and Computer Graphics 19 (7), pp. 1199–1217. Cited by: §1.
 [55] (1987) Elastically deformable models. In ACM Annual Conference on Computer Graphics and Interactive Techniques, pp. 205–214. Cited by: §1.
 [56] (2005) Face transfer with multilinear models. ACM Transactions on Graphics 24 (3), pp. 426–433. Cited by: §2.1.
 [57] (2010) Robust 3d face recognition by local shape difference boosting. IEEE Transactions on Pattern Analysis and Machine Intelligence 32 (10), pp. 1858–1870. Cited by: §7.
 [58] (2020) Facescape: a largescale high quality 3d face dataset and detailed riggable 3d face prediction. In IEEE Conference on Computer Vision and Pattern Recognition, pp. 601–610. Cited by: Fig. 1, Fig. 8, §5, Fig. 9, §7.1, TABLE III.
 [59] (2015) Goicp: a globally optimal solution to 3d icp pointset registration. IEEE Transactions on Pattern Analysis and Machine Intelligence 38 (11), pp. 2241–2254. Cited by: 4th item.
 [60] (2006) A 3d facial expression database for facial behavior research. In International Conference on Automatic Face and Gesture Recognition, pp. 211–216. Cited by: §2.2, §7.1, TABLE III.
 [61] (2010) Ricci flow for 3d shape analysis. IEEE Transactions on Pattern Analysis and Machine Intelligence 32 (4), pp. 662–677. Cited by: §2.2, §2.2.
 [62] (2016) Functional faces: groupwise dense correspondence using functional maps. In IEEE Conference on Computer Vision and Pattern Recognition, pp. 5033–5041. Cited by: §2.1.
 [63] (1994) Iterative point matching for registration of freeform curves and surfaces. International Journal of Computer Vision 13 (2), pp. 119–152. Cited by: §1.
 [64] (2017) Face alignment in full pose range: a 3d total solution. IEEE Transactions on Pattern Analysis and Machine Intelligence 41 (1), pp. 78–92. Cited by: §1.
 [65] (2018) State of the art on monocular 3d face reconstruction, tracking, and applications. In Computer Graphics Forum, Vol. 37, pp. 523–550. Cited by: §1.
 [66] (2015) Shapebased automatic detection of a large number of 3d facial landmarks. In IEEE conference on Computer Vision and Pattern Recognition, pp. 4639–4648. Cited by: §2.1.
Comments
There are no comments yet.