Towards Fine-grained 3D Face Dense Registration: An Optimal Dividing and Diffusing Method

Dense vertex-to-vertex correspondence between 3D faces is a fundamental and challenging issue for 3D 2D face analysis. While the sparse landmarks have anatomically ground-truth correspondence, the dense vertex correspondences on most facial regions are unknown. In this view, the current literatures commonly result in reasonable but diverse solutions, which deviate from the optimum to the 3D face dense registration problem. In this paper, we revisit dense registration by a dimension-degraded problem, i.e. proportional segmentation of a line, and employ an iterative dividing and diffusing method to reach the final solution uniquely. This method is then extended to 3D surface by formulating a local registration problem for dividing and a linear least-square problem for diffusing, with constraints on fixed features. On this basis, we further propose a multi-resolution algorithm to accelerate the computational process. The proposed method is linked to a novel local scaling metric, where we illustrate the physical meaning as smooth rearrangement for local cells of 3D facial shapes. Extensive experiments on public datasets demonstrate the effectiveness of the proposed method in various aspects. Generally, the proposed method leads to coherent local registrations and elegant mesh grid routines for fine-grained 3D face dense registrations, which benefits many downstream applications significantly. It can also be applied to dense correspondence for other format of data which are not limited to face. The core code will be publicly available at https://github.com/NaughtyZZ/3D_face_dense_registration.

READ FULL TEXT VIEW PDF
POST COMMENT

Comments

There are no comments yet.

Authors

page 2

page 5

page 6

page 8

page 9

page 12

page 13

page 14

12/20/2012

Automatic landmark annotation and dense correspondence registration for 3D human facial images

Dense surface registration of three-dimensional (3D) human facial images...
06/06/2020

A Sparse and Locally Coherent Morphable Face Model for Dense Semantic Correspondence Across Heterogeneous 3D Faces

The 3D Morphable Model (3DMM) is a powerful statistical tool for represe...
08/28/2020

Pixel-Face: A Large-Scale, High-Resolution Benchmark for 3D Face Reconstruction

3D face reconstruction is a fundamental task that can facilitate numerou...
10/06/2021

Topologically Consistent Multi-View Face Inference Using Volumetric Sampling

High-fidelity face digitization solutions often combine multi-view stere...
06/10/2020

3D Human Mesh Regression with Dense Correspondence

Estimating 3D mesh of the human body from a single 2D image is an import...
02/07/2020

Image Fine-grained Inpainting

Image inpainting techniques have shown promising improvement with the as...
12/09/2021

ScaleNet: A Shallow Architecture for Scale Estimation

In this paper, we address the problem of estimating scale factors betwee...
This week in AI

Get the week's most popular data science and artificial intelligence research sent straight to your inbox every Saturday.

1 Introduction

Fig. 1: Qualitative results for two examplar faces in FaceScape [58] dataset (filename: (122)13_lip_funneler.obj and (340)4_anger.obj). The input target meshes are the initial results in correspondence with the template mesh by the NICP [2] method, and the output targets are the optimized results by the proposed method in this work. Some corresponded vertices with notable differences by the proposed method from the input meshes are marked with colored dots for better viewing.

Dense registration of 3D face seeks canonical representation of different facial surfaces such that their detailed structures can be compared. It is also named as dense correspondence in many prior works since the dense vertex-to-vertex mappings are established after the registration is completed. Dense registration of 3D face is fundamental to many downstream tasks for 3D and 2D facial analysis due to the invariance of 3D shape to pose and illumination variations. For example, it is an essential step for 3D statistical face modeling [6, 8, 35, 20, 26, 29, 45]. It has also activated many solutions to problems for 2D faces [64, 28, 37, 11, 53, 65, 15].

Dense registration of 3D face belongs to the common non-rigid registration problem and has its own characters:

  • It is an ill-posed optimization problem since the solution is not uniquely defined. In a departure from the rigid case for solving an optimal rigid motion, the non-rigid registration problem belongs to a much larger class that has no explicit formulation.

  • It is a domain-specific problem targeted at 3D facial surfaces. The intrinsic anatomical structure of face should be considered as vital clues for vertex-to-vertex correspondences to guide the non-rigid registration.

In the common case, the 3D shape (commonly surface in real-life applications) non-rigid registration problem [1, 10, 54, 38, 39] can be revisited as an optimization problem for classical elastic shape similarity modeling [55, 50]:

Given a template surface and a target surface , the problem is to solve optimal parametrization that minimizes the following deformation energy measuring the difference between two surfaces as

(1)

where is is the Frobenius norm, and are the first and second fundamental forms of the surface , respectively, and and are the corresponding forms of as the deformed version of . The first fundamental form measures the difference between and in a 2D embedding space on parameterized surface , while the second fundamental form measures the local difference of curvature. The two terms in Eq. 1 are weighted by and for tangential and normal distortions, respectively. This formulation favors the non-rigid deformation to be locally rigid. However, local rigidity is very abstract and Eq. 1 is difficult to be optimized directly.

In a particular case for 3D face, the state-of-the-art works [2, 34, 41, 42, 25] commonly achieve vertex-to-vertex correspondence by a non-rigid deformation from a template face to a target face

. In a pairwise manner, the template face is usually a well-customized mesh as the initialization for registration. Then, the optimization for registration is generally modeled as an expectation–maximization (EM) problem as follows.

  • E-step: searching for a plausible preliminary vertex-to-vertex correspondence between and .

  • M-step: solving an optimal non-rigid deformation from to under some certain constraints.

The above process is in fact a generalization of the well-known iterative closest point (ICP) algorithm [5, 13, 63] for rigid registration to the non-rigid case by alternating the above two steps. In the E-step, correspondence of vertices is established with some certain rules, such as landmark correspondence as hard feature constraint and closest vertex correspondence as soft constraint. In the M-step, the non-rigid deformation is usually modeled as some locally smooth deformations, such as locally rigid transformation [27, 23] and locally affine transformation [24, 2]. For example, the NICP [2] algorithm includes a landmark term together with a distance and a stiffness term in the full objective function for optimization. The state-of-the-art (SOTA) works commonly include feature matchings for normal and curvature, e.g. the iterative closest normal point (ICNP) methods [40], therefore achieve reasonable results for minimizing the second term in Eq. 1. In this work, we show that the result can be further optimized for the first term of Eq. 1, which leads to better local rigidity, and furthermore a unique solution to the registration problem. This is neglected in the prior works yet a quite important issue for 3D face correspondence. In fact, better optimization for the first term in Eq. 1 results in superior representation of 3D facial surface. And it also leads to coherent local registration and elegant mesh grid routing referring to a template mesh as shown in Fig. 1.

Optimizing the first term of Eq. 1 for dense correspondence can be considered as dividing a surface according to a certain template, as in Fig. 2. If we shrink the dimension of the original problem from 3D surface to 2D plane, then it becomes a problem to solve for optimal locations of vertices to divide the plane into similar triangles. If we further evolve the problem to the 1D case, then the definition is clear and the result is unique (as in Fig. 2):

Given a template line with end points and dividing points , one can find optimal segmentations on a target line with end points by satisfying

(2)
Fig. 2: Evolution from an original registration problem on a 3D surface to a 2D plane, and finally to a 1D line by the “reasoning in lower dimension” methodology.

While the segmentation of a line in a 1D Euclidean space is simple, the generalization to a 3D surface on a Riemann manifold requires extensive studies. In this paper, we propose a novel dividing and diffusing method for 3D face dense registration. Starting from a target mesh in correspondence with a template mesh, the proposed method alternates between a dividing step and a diffusion step to achieve fine-grained vertex-to-vertex correspondence between them. We further propose a multi-resolution (MR) version of the dividing and diffusing method to facilitate the correspondence process. We show that the proposed method is correlated to the minimization of the distance between the template and the target face embedded in a logarithmic metric space for local scaling. In summary, the main contributions of this work are as follows:

  • We propose a novel dividing and diffusing method for fine-grained optimization of 3D face dense correspondence, which leads to superior representation of 3D facial shape.

  • We propose a multi-resolution version of the dividing and diffusing method for fast convergence.

  • We elucidate the physical meaning of the proposed method as smooth rearrangement of a local scaling metric for 3D facial shape, which leads to coherent local registrations and elegant mesh grid routines.

2 Related Work

3D face dense registration is an extremely non-linear optimization problem that requires a well initialization and plausible constraints for optimization. If we consider the problem as pairwise correspondence between two faces, then the optimization problem can be initialized by a template with a few labeled landmarks as feature correspondence to the target. Most of the current works in this field can be classified into deformation based and deformation free ones according to their optimization strategies.

2.1 Deformation Based Method

These methods commonly deform a template face to a target face under some constraints on reasonable initializations, feature correspondences, and locally smooth deformations, which are discussed respectively as follows.

Initialization. A well-customized template mesh is generally used as the initialization for a subsequent deformation process. Then, the template face is usually aligned to the target face with a globally rigid or affine transformation driven by a few labelled landmarks or some other significant features in correspondence. For example, Mohammadzade et al. [40] use automatically detected nose tip to guide the global alignment of the template face to the target face. Amberg et al. [2] use an affine transform to align the template face to the target face with the correspondence of a few annotated landmarks. Some other works [22, 34, 27] adopt rigid transformations to align the template to the target face. In particular, Fan et al. [23] construct a group of locally rigid transformations to match the landmark locations exactly. The above methods generally employ rigid or approximately rigid registrations as the preliminary step for non-rigid deformation.

Feature correspondence. Deformation guided by feature correspondence can maximally preserve the basic structure of a human face, while being able to converge to a reasonable solution for global optimization as well. Current works [2, 30, 22] commonly use some pre-annotated landmarks as feature correspondence across different faces. Generally, each pair of landmarks in correspondence dominates over other vertices in a local region around it. The landmark correspondences control the holistic shape and help to deal with expression variations in the guided deformation. There are also some other works that use landmark-free approaches in a departure from the landmark-based feature correspondences. For example, Gilani et al. [66] use level set geodesic curve to extract seed points automatically for global and local deformations. Fan et al. [22] use some high-entropy points, which can be considered as dense feature points in significantly high-curvature areas of face to guide the global deformation. Pan et al. [42] use some denser points to model large deformations on the mouth region. The denser points in these methods can be regarded as robust representations of facial features, which are more tolerant to individual landmark matching errors. However, the capacity for guiding large deformation is discounted without accurate and definite landmarks, therefore being suboptimal for faces with large expressions to that with neutral expression.

Smooth deformation. Locally smooth deformations also preserve the basic structures of a 3D facial surface, while enable the deformed template to adhere to the target face gradually. Myronenko and Song [41] regularize the offset of each vertex by total variation constraint for local smoothness, as a general non-rigid registration method. Patel and Smith [43] use a thin-plate spline warp towards the target face for smooth deformation. Zhang et al. [62]

incorporate functional maps into the deformation process, and ensure smooth local deformation by the low-frequency basis of the eigenfunctions of the Laplace-Beltrami operators. The NICP method 

[2] and its variant [14] incorporate locally affine transformations as smooth deformations. Fan et al. [23] construct a group of locally rigid transformations to guarantee smooth shape deformations. The deformations in these methods generally alternate with refined correspondence for each vertex on a face as restrictive optimization to acquire the results for non-rigid registration. In some other methods [56, 7, 25], a prior model originated from a number of face prototypes in correspondence are incorporated to fit the deformed target faces. However, this is a chicken-and-egg problem and the correspondence problem of face prototypes remains to be solved.

The deformation based methods are able to reach a reasonable solution with an alternative EM approach for minimizing Eq. 1. However, since most of these methods employ different rules for both surface deformation and vertex correspondence, they generally lead to distinct solutions and neglect the optimality of the results. Therefore, the 3D face dense correspondence problem has no standard solution in the existing works. In this work, we propose a dividing and diffusing algorithm to optimize the results for 3D face dense registration, in particular for the tangential parameterization in the first term of Eq. 1, and achieve stable results which are largely independent of preliminary correspondences.

2.2 Deformation Free Method

Deformation free methods establish dense vertex-to-vertex correspondence without an explicit deformation process. These methods commonly involve some point matching strategies with the guidance of a few significant features, such as facial landmarks. Other geometric features such as curvature and normal are usually employed as signatures for local shape matching. We list some representative methods in the literature as follows.

In the seminal work of 3D Morphable Model (3DMM) [6], Blanz and Vetter propose to encode and map both 3D shape and texture features to a 2D cylindrical coordinate, and use a regularized optic-flow based algorithm for dense correspondence of 3D faces.

Sun and Abidi [52] project geodesic contours around some feature points onto their tangential planes, which are further used as signatures for shape matching. This method is further improved and applied to 3D face dense correspondence by Salazar et al. [47] on the BU-3DFE dataset [60] for modeling shape variation from large expressions.

Gu et al. [33, 61] propose a Ricci flow based method for conformal mapping of a 3D facial surface to a 2D canonical coordinate. Then, some uniform dividing strategies are conducted, with a few pre-annotated landmarks as fixed feature points.

Gilani et al. [31] first detect sparse correspondences on the outer boundary of a 3D face, and then triangulate existing correspondences and expand them iteratively by matching points of distinctive surface curvature along the triangle edges. Finally, the triangle mesh is refined by evolving level set geodesic curves.

The deformation free methods can be seen as employing some indirect dividing strategies on the target facial surface. Since the dividing of a 3D surface directly is difficult, some of these methods reduce the dimension of the original problem from 3D to 2D [6, 33, 61, 32]. However, the limitation is that the intrinsic geometric features are only represented approximately, since it is unable to isometrically embed a non-flat 3D surface (e.g. 3D face) into 2D plane perfectly. This is particularly difficult for complex 3D surface, such as 3D face with large expressions and 3D hand with flexible joints. In this work, we propose a novel iterative dividing and diffusing method on the original target facial surface directly referring to the mesh architecture of a template facial surface, in a departure from the existing indirect dividing methods.

3 Segmentation of a Line

A scientific methodology to solve a problem in high-dimensional space is to first consider the problem in a low-dimensional case. Following this idea we cast the registration problem for 3D facial surface as a line segmentation problem shown in Fig. 2. The problem asks for a tuple of optimal sub-lines on a target line referring to a certain template line proportionately. One can simply use a ruler to measure the length of the two lines and then determine the dividing points according to Eq. 2. Unfortunately, such a global ruler measuring the “length” of a 3D surface does not exist, while the difference between two surfaces can be compared locally. Keeping in mind that we do not have a global “ruler”, we employ an iterative approach to solve this segmentation problem instead. We formulate the problem as a dividing and diffusing process for solving Eq. 2 as follows.

Given a template line with points and a target line with initialized corresponding points , one can compute the optimal dividing points on by iteratively alternating the following two steps:

  1. Dividing. Compute each optimal point on referring to each sub-line triplet on by satisfying

    (3)

    where denotes the iteration number.

  2. Diffusing. Denoting as the renewing offset for each point in the iteration, a local average strategy is applied as

    (4)

    and each renewed point is computed by

    (5)

The iterative process is able to reach a final solution when the renewed offset is smaller than a certain threshold. Table I shows an example of the renewing process for a target line referring to a corresponding template line with . The total error to the ground truth as

(6)

the total renewed offset as

(7)

and each renewing point are shown in Fig. 3.

Fig. 3: Illustration of the iterative dividing and diffusing process in Table I. The change of some variables is shown.
Iterative Line Segmentation w/o MR Scheme
Iteration Number Point Location
0
1
2
5
10
18
Iterative Line Segmentation w/ MR Scheme
Iteration Number Point Location
0
1
2
Ground Truth

Reference Template
TABLE I: An example of the iterative dividing and diffusing process for point correspondences. The points on the template line are , and the points on the target line are initialized as . It shows that the renewing points gradually approach to the ground-truth ones (directly computed via Eq. 2) as in iterations, while a multi-resolution scheme accelerates this process.

This iterative process successfully achieves proportional division of a target line according to a certain template line. An additional advantage is its robustness to different initializations. The result still converges to the optimal solution, even if the initialized sublines intersect with each other, i.e., . An example is shown in Fig. 4, where the target line is initialized with . This indicates that the generalized method on 3D surface can repair the self-intersection caused by flawed deformation, which is an attractive property for 3D face dense correspondence (also see Fig. 1). However, the convergence speed is slow and can be improved further. If we use a multi-resolution scheme, i.e., first locate for a coarse resolution and then locate for a fine resolution, two iterations are sufficient for convergence as in Table I. This motivates us to propose a multi-resolution (MR) version of the dividing and diffusing algorithm for 3D face dense registration, which will be elaborated afterwards.

Fig. 4: An example for the iterative method dealing with self-intersected initialization. The points on the target line are initialized as , with for self-intersection.

4 Segmentation of 3D Facial Surface

In this section, we propose an iterative dividing and diffusing method for the registration of 3D facial surfaces. Generalizing the algorithm in Section 3 from 1D line to 3D surface requires appropriate definition of the “dividing” and “diffusing” process, respectively. We employ local registration of a vertex’s -ring neighbors as the local “dividing” process. We then construct a global optimization problem with constraints on local smoothing for the “diffusing” process. Contrary to the 1D case, the global optimization on a 3D surface involves additional fixed feature points (e.g. landmarks) on the surface and a least-square formulation is put forward to deal with this issue.

4.1 Dividing

The local registration process starts from a pair of initially registered template and target facial surfaces. We simply use the -ring neighbors of a vertex as the basic local cell in the common triangle mesh representation of 3D surface. Other formats of data are also applicable by constructing a local cell with some nearest neighboring vertices.

Let be a template mesh including vertices in , edges in , and triangles in ; and let be the corresponding target mesh. For each vertex , we denote its -ring neighbors as and its corresponding vertices on the target as and . Then, we suppose there exists a rigid transformation that aligns to , as

(8)

where

can be estimated by a least-square alignment problem of the surrounding

-ring neighbors, as

(9)

Here denotes the space of all rank- orthonormal matrices with the determinants to be (i.e. Givens Matrices). The problem in Eq. 9

can be solved efficiently by a singular value decomposition (SVD) based method 

[51].

After the rigid transformation is obtained, we compute the preliminary offset for each vertex by

(10)

The local registration process is illustrated in Fig. 5.

Fig. 5: Illustration of the local registration process for a vertex’s -ring neighbors.

4.2 Diffusing

The preliminary offset for each vertex in Eq. 10 should not be managed isolatedly, since the adjacent vertices share some common vertices in their -ring neighbors. Therefore, we construct a diffusing (local smoothing) strategy to spread out the local effect for global optimization. We denote and formulate the problem as

(11)

where is the weights for diffusing of each offset and the superscript is omitted for brevity.

Solving Eq. 11 requires taking the partial derivative with respect to each offset and leads to a linear system

(12)

where

(13)
(14)

and

(15)

Here

is the identity matrix,

is the number of vertices in , and the superscript denotes the operation for matrix transpose. If we let be the coefficient matrix of Eq. 12, then is a sparse and strict diagonal-dominant matrix. The solution for Eq. 12 is discussed in two different cases:

Case : Without Fixed Vertices. In this case, all vertices are free to be renewed. Thus is a full-rank square matrix such that the linear system is well-posed and has a unique solution as

(16)

This is similar to the mesh correction algorithm proposed in [22], which is applicable to neutral faces with small deformations.

Case : With Fixed Vertices. In this case, some vertices are fixed as constraints for solving Eq. 12, in which is a rank-deficient matrix. Let be the number of fixed vertices (), we exclude the columns in and the rows in which are related to the fixed vertices. Then Eq. 12 is equal to

(17)

which is an over-determined linear system. It can be further converted to a well-posed system as

(18)

Then the least-square solution to Eq. 17 is

(19)

By this way, we are able to process the mesh by fixing some already well-corresponded vertices while renewing other vertices. This is particularly useful for faces with large expressions, where feature points (e.g. landmarks) are used to guide the overall correspondences.

The vertex on the target mesh is added by each offset in after solving Eq. 11, as

(20)

Finally, the closest vertex on the target surface to the vertex computed by Eq. 20 is used to update the target mesh.

4.3 The Overall Algorithm

The overall algorithm alternates between the above dividing and diffusing process until the average of the renewed offset

(21)

is smaller than a certain threshold . In practice, the implementation of the proposed method involves the following details, which are not trivial.

Fig. 6: (a) Different classes of vertices;(b) Weights for global diffusing.
  • Categorizing of Vertices. The raw scanning data are commonly not able to capture some inner structures accurately on a face, such as the nostril and the ear regions. These structures are created from a template face in many 3D face dense registration method. We define the vertices in these regions as non-interested vertices. The other vertices are defined as interested vertices which are applied to the proposed method. In addition, the interested vertices can be classified into fixed vertices and free vertices. The fixed vertices include the landmarks, the edges of facial mesh, and the edges between interested and non-interested vertices. Fig. 6 (a) shows the different classes of vertices.

  • Effects of Fixed Vertices. The fixed vertices should serve as constraints during the dividing and diffusing process. We apply the dividing process for each fixed vertex and its -ring neighbors. We then substitute each preliminary offset by Eq. 10 into the diffusing process. However, we do not renew the fixed vertices with Eq. 20 in the iterations. Therefore, the fixed vertices are used to “drag” and “pull” the neighboring vertices to correct locations. Taking a row for a fixed vertex in Eq. 17 for example, we obtain

    (22)

    where the fixed vertex transmits a “reacting force” to its neighboring vertices. To this end, we set the weights for global diffusing in Eq. 11 to decrease with the geodesic distance to the fixed vertices to enlarge the effect of fixed features, as in Fig. 6 (b).

  • Skills in Solving Eq. 18. We do not use Eq. 19 as matrix inversion to solve Eq. 18 in practice. Since is a symmetric positive definite matrix, we adopt three steps:

    I. Re-ordering of variables and allocating memory;

    II. Cholesky factorization of ;

    III. Substitution and iterative refinement.

    Step I and II only require to be computed once in the whole algorithm. Therefore, solving step III in the sparse linear system is very fast by incorporating an advanced computational tool111https://pardiso-project.org/.

  • Nearest Neighboring Vertex Searching on Surfaces. In recent works [42, 59, 31, 22] for nearest vertex searching on a 3D surface, a K-D tree architecture [3] for point cloud is commonly used to accelerate the computational process. However, the closest vertex on a facial surface to a given vertex is not necessarily a vertex in a K-D tree. It generally locates insides a triangle unless the sampled point cloud is extremely dense. In this work, we use an axis-aligned bounding box (AABB) tree [4] considering both point cloud ( vertices) and data structures ( triangles) to improve the computational efficiency. This way also leads to accurate location of nearest neighboring vertex. Fig. 7 shows the difference between the results by the two different trees, which is commonly neglected in the existing works for 3D face dense registration. We use an advanced package222https://libigl.github.io/ for the implementation of AABB tree and it only requires to be built once for a certain target face in our method.

Fig. 7: The closest vertex searching on two different data structures. The closest vertex to a given vertex is and by the AABB tree and the K-D tree, respectively.

The overall method is concluded in Alg. 1. This method can generally filter out the non-uniform mesh grid routine and achieve coherent local registrations for a target mesh, as shown in Fig. 1. It also leads to better quantitative results for a group of data, which will be elaborated in the experiments.

Input:
 A template mesh
 A target mesh
Output:
 A target mesh

1:Initilize
2:Do re-ordering of variables, memory allocation, and matrix factorization for Eq. 12
3:Build an AABB tree for
4:while  do
5:     Compute each offset by the dividing step in Eq. 9 and Eq. 10;
6:     Compute each regularized offset by the diffusing step in Eq. 11;
7:     Add each regularized offset to by Eq. 20;
8:     Traverse the AABB tree on for closest vertex searching and renew accordingly.
9:return
Algorithm 1 Iterative Dividing and Diffusing on Surface

5 Multi-resolution Analysis

Section 3 shows that a multi-resolution (MR) version of the dividing and diffusing algorithm on a line accelerates the convergence speed greatly. In this section, we generalize the multi-resolution based method to the facial surface and expect this will also benefit the convergence speed. To this end, the template face for registration is resampled into different resolutions. We incorporate a hierarchical strategy for the representation of 3D faces.

Since the proposed method starts from a template face and a topological uniform target face by an existing method, multi-resolution analysis on the template face is sufficient. The counterpart on the target face follows the correspondence of each vertex. We organize the vertices on a template face in a pyramid structure with farthest point sampling (FPS) method [21], which ensures uniform spacing of included vertex in an iterative manner. Suppose that a template face include free vertices and fixed vertices (original feature points), the steps for the reconstruction of a multi-resolution model are as follows.

  1. Compute vertex-to-vertex geodesic distances on .

  2. Define the number of free vertices as a pyramid architecture with different levels decimated by a factor .

  3. Initialize fixed vertices as included vertices for FPS.

  4. Include vertices sequentially using FPS, such that the MR model includes vertices for different resolutions , respectively.

We use a fast heat-flow based method [16] for the computation of vertex-to-vertex geodesic distance on the template mesh, which is the metric for FPS in our method. In the first coarse resolution , we use the original feature points as fixed vertices and included vertices as free vertices. In other resolutions, we use the vertices in the previous resolution as fixed vertices, and the lately included vertices in the current resolution as free vertices. The dividing and diffusing algorithm is applied to each resolution in a cascade manner. Finally, we denote the method in Section 4 as the full-resolution and apply it to reach the final results. The different number of fixed and free vertices in each resolution is summarized in Table II.

Resolution Free Vertices Fixed Vertices
TABLE II: The number of fixed and free vertices in different resolutions of the MR model.
Fig. 8: An example of the customized multi-resolution model for a template face in the FaceScape dataset [58].

An example for the MR model in the FaceScape dataset [58] is shown in Fig. 8, where we set . Unlike the full-resolution model that the neighboring relationship of all vertices are defined by the -ring neighbors of the triangle mesh, this relationship for each vertex in other resolutions requires to be customized. In this work, we define the neighboring vertices by the following rules.

  • If a vertex is on the edges of the mesh, we adopt two vicinity vertices and another nearest vertex inside the mesh as its neighbors.

  • If a vertex is inside the mesh, we adopt six nearest neighboring vertices as its neighbors.

  • The neighboring relationship is symmetrized: i.e. if vertex is incident to vertex , then we make incident to as well.

In these rules, we use geodesic distance as the basic metric for nearest neighbor searching. The neighboring relationship is further used to replace the -ring setting in Eq. 11.

6 Local Scaling Metric

In this work, we consider the 3D face dense correspondence problem in a low-dimensional way, and propose an iterative dividing and diffusing method motivated by simple proportional segmentation of a line. As a result, the proposed method divides a target facial surface as rigidly as possible referring to a given template facial surface, while constrained by some fixed feature correspondences. In this way, the proposed method forces the difference between the target and template surface to vary in a locally smooth manner. We make some modifications upon Eq. 2 and define a

similarity score vector

between two faces according to their edge differences as

(23)

where and are the corresponding edge lengths of the target and template face, respectively. The ratio is equal to if the corresponding edge lengths are the same. We define the local scale embeding vector of a target face as

(24)

by taking a common template face and using the logarithmic notation, where is the length of each edge on the template face. By this way, the template face locates at the origin in a high-dimensional space. We define the induced local distance vector between two faces and as

(25)

where are the lengths of the corresponding edges. We add up and average each element in Eq. 25 and obtain a global distance metric

(26)

as

(27)

where

(28)

is each normalized weight that compensates for different edge lengths on the template face. The metric in Eq. 27 satisfies the following theorems.

Fig. 9: Two examples in the FaceScape dataset [58]. The global distance metrics for each face before and after optimization by the proposed method are annotated, and the local distance vectors are illustrated as color maps on the facial surfaces. Please zoom in to see the more smooth and locally coherent details on the color maps after optimization.

Theorem I (Identity of Indiscernibles). Given two arbitrary faces and , we have

(29)

and the equality holds if and only if

(30)

Proof. Inequality 29 is correct since every element in Eq. 27 is positive. We now focus on proving

(31)

The steps are as follows.

(32)

Theorem II (Symmetry). Given the local scale embedding vectors and of two arbitrary faces, we have

(33)

Proof. Considering each of the elements in and , we have

(34)

Thus Eq. 33 holds.

Theorem III (Triangle Inequality). Given the local scale embedding vectors , , and of three arbitary faces, we have

(35)

Proof. Considering each of the elements in , we have the following inequality

(36)

Then it follows

(37)

Theorem I to III guarantee mathematical rigorism for a given metric and indicate that the local scale embedding vector for each face can be treated as separable points in a high-dimensional Euclidean space. Physically, we represent the scale of a face locally referring to a template face in the tangential direction of the target surface. The geometric features of a face can be determined uniquely by the local scale in combination with the curvature (in the normal direction of the target surface). One merit of the local scale embedding vector in Eq. 24 as a pack of scalars is its invariance to rigid transformations, thus represents the intrinsic features for a face. The proposed dividing and diffusing algorithm tends to minimize the global distance metric for two faces denoted by Eq. 27. We show two examples in Fig. 9 for both the local and global metrics in Eq. 25 and Eq. 27. We can see that the global distance metric by Eq. 27 is significantly smaller after applying the proposed method between two faces, while the vector in Eq. 25 becomes more smooth and locally coherent on the target surface. In Fig. 9, we can also see that the local distance vector measures the degree of stretching () or shrinking (

) of a face with respect to another face. This is particularly meaningful as an indicator for shape variances caused by different identities and expressions.

7 Experiment

Assessing the results for 3D face dense registration is not an easy task. On the one hand, some feature vertices are definite for anatomical significances and are visibly salient for local shape and texture features. On the other hand, most vertices for a dense face model are not definite with the ground-truth. Some existing works commonly use indirect ways for evaluation under different perspectives, such as representation ability of resulted face model [23, 31, 25], landmark localization accuracy [49, 17, 30, 36, 25]

, and 3D face recognition performance 

[9, 57, 19, 22]

. In this work, the proposed dividing and diffusing method gives practicable definition to the problem by extending the fixed feature correspondence to the overall dense correspondence. Thus we treat 3D landmark detection (either manually or automatically annotated) as a pre-processing step and do not use it for evaluation. We do not use 3D face recognition performance for evaluation either, since it requires post-processing, such as shape clipping and feature extraction fed into different classifiers. In addition to the commonly used metric as groupwise representation ability of resulted face model, we include extensive experiments including computational efficiency, robustness to initialization, and direct metric embedding tailored for the proposed method for a full evaluation in different perspectives.

7.1 Datasets

We carry out the experiments on two publicly available datasets including BU-3DFE [60] and FaceScape [58]. They are two typical datasets since both of them are rich for expressions and densely constructed by scans from multiple directions. BU-3DFE includes facial samples with different expressions in different levels by actors. The resolution of raw scanning data is around vertices per face. FaceScape is publicly available recently in and includes high-resolution (around vertices) 3D faces with different expressions. It also provides the registered faces which share the same topology as triangle meshes with vertices and triangles. It is worth mentioning that the collection of large and high-quality 3D dataset is not an easy task. The existing works on 3D dataset [48, 44, 12, 8] play indispensable roles for the advancement of 3D face applications.

Fig. 10: An example for convergence speed of the proposed method. The full resolution version converges in iterations, while the MR version converges in ,,,, and iterations for different resolutions in a cascade manner.

7.2 Computational Efficiency & Convergence Analysis

In the proposed method, the sparse matrix decomposition, the neighboring relationship of each vertex, and the multi-resolution organization of vertices require to be computed only once for a common template face. The building of AABB tree is computed once for each target face in the registration process. Thus we neglect the cost for these steps. The local registration process and the iterative refinement for solving Eq. 17 contribute to most of the computational time in practice. The former involves small matrix multiplications () and SVD decomposition () for each vertex (totally ), while the latter involves iterative substitution of a sparse rank- coefficient matrix. Thus the computational complexity for the full-resolution version is approximately

(38)

In a MR version with the number of vertices growing exponentially in each resolution, the computational complexity is theoretically reduced to

(39)

Thus the computational time increases linearly and logarithmically with the number of vertices for the full resolution and the MR version of the proposed method, respectively. In our implementation333We implement the proposed method on the software and hardware environments of VC++2015 and i7-9700 (single thread), respectively., it requires for the full-resolution version on average for a model with vertices, while the MR model reduces the time to . Fig. 10 shows an example by setting the threshold in Eq. 21 to , which is normalized by the mean edge length of a template face. We include both the full-resolution version and the MR version for comparison. It shows that the proposed method is very efficient, while the MR version accelerates the convergence speed further. The faster convergence of the MR version is due to both less iterations and less vertices for operations in low-resolution steps.

Fig. 11:

The differences for per-vertex errors between the optimized results from two different initialization methods as functions to the iteration steps. The errors for both the mean and standard deviation are normalized to

with a common factor in the errorbar map.

7.3 Definite with Stable Correspondence

Section 3 shows an example for iterative line segmentation which results in a unique solution defined by Eq. 2. We further guess that the proposed dividing and diffusing method is also able to converge to a unique solution which is independent to different initializations. We now test the influence of different initialization methods and noisy initializations, respectively.

Fig. 12: Qualitative results for an exemplar face with different level of noise in the BU-3DFE dataset. Both the average per-vertex errors and the standard deviations of added noise are normalized by the mean edge length of a template face. Please zoom in to see the details for mesh grid routine.

Initialization with different methods. First, we select samples in the BU-3DFE dataset which cover different identities and expressions. The initial correspondences of these samples are established with two different and reproducible methods, the NICP method [2] and a local shape deformation (LSD) method [23]. We apply the proposed method (full-resolution version) to these corresponding samples and compare the average per-vertex errors between the results from the initializations with the two different methods. Fig. 11 shows the normalized results. We can see that both the mean and the standard deviation of the errors decrease with the number of iterations. This means that the results by the proposed method are independent from different initializations to some extent. The final residual errors should be caused by different mesh-edge vertices (as fixed vertices) and shape fitting errors in the normal direction by different methods.

Initialization with noise. Then, we extract the corresponded faces in the BU-3DFE dataset by the local shape deformation method [23]

and add different level of noise on free vertices for each face in the tangential directions. The distribution of noises on each vertex follows a uniform distribution with variance

. Fig. 12 shows a qualitative example of optimized meshes with different levels of added noise. The average per-vertex errors between the results from a clean target and that with noise are also annotated. We can see that the proposed method is able to “repair” the noisy mesh grid routine and leads to locally coherent registrations. The vertex locations of corresponding faces and the mesh structures on the results across different level of noise are also stable. Therefore, the proposed method is very robust to noisy initializations.

The above experiments on different initializations show some evidences that the proposed method leads to stable correspondences with the fixed vertices (fixed features and mesh-edges). Therefore, the proposed method gives the definition of vertex correspondences on smooth regions of faces to some extent, which is a critical issue in the field of 3D face dense correspondence.

7.4 Evaluation with the Proposed Metric

Section 6 introduces a local scaling metric that forces the registered target face to vary in a locally smooth manner referring to a template face. We now use both the global (Eq. 27) and local (Eq. 25) annotations for this metric to evaluate the correspondence results for the FaceScape and BU-3DFE dataset. We choose the topological uniform samples provided by the FaceScape dataset as initializations for the proposed method. The dense correspondences of these samples in topological uniform formats are achieved by a classical NICP [2] method carefully with extra manual assistance. Therefore, we consider this as an enhanced NICP (E-NICP) baseline. The implementation of non-rigid registration methods commonly involves a lot of parameter settings and additional expertise in this field, against which the public baseline for FaceScape dataset avoids the induced subjective biases. In addition, we implement both the NICP method [2] and the local shape deformation method [23] as baselines for the BU-3DFE dataset. The parameters for both of the two methods are carefully tuned to minimize implementation errors.

First, we apply the proposed method (both the full-resolution and MR versions) to the samples444We exclude some problematic data in this dataset. in the FaceScape dataset and samples in the BU-3DFE dataset. Table III shows the results for the global metric in Eq. 27 averaged over all samples in comparison with those by the baseline methods. We can see that the global metric is smaller after applying the proposed method, which denotes better coherent local deformations. In addition, the results by the full-resolution version and the MR version show no significant differences, since the full-resolution optimization is adhered to the final step of the MR version in this work.

Then, we apply linear discriminative analysis (LDA) to the local scaling embedding vectors of samples with respect to different expression labels. We show the clustering results for the first dimensions before and after applying the proposed method in Fig. 13 (a) and Fig. 13 (b), respectively. It shows that the discriminate ability with respect to different expressions is largely improved after applying the proposed method. Specifically, the expressions with jaw_left, neutral, brow_raiser, and jaw_forward (see Fig. 14 for an example) are mixed sequentially in pairs with each other for original representations with the E-NICP method, whereas they are separated after optimization by the proposed method. This demonstrates that the proposed method leads to better discriminative representation for 3D facial shape, which is beneficial to many downstream applications.

Dataset FaceScape [58] BU-3DFE [60]
Baseline Method E-NICP [58] NICP [2] LSD [23]
Before Opt.
After Opt.(Full)
After Opt.(MR)
TABLE III: Average quantitative results for the global metric on samples of the FaceScape dataset and samples of the BU-3DFE dataset before and after the proposed optimization method. The (enhanced) NICP and the local shape deformation (LSD) method are used as baselines for the initializations of the correspondence results.
Fig. 13: Clustering results by LDA with respect to different expressions for samples on the FaceScape dataset. The mixed expressions are marked with arrows on the maps and bold typeface on the legend.
Fig. 14: An example for faces (ID: 395) with different expressions.
Fig. 15: (a) Compactness, (b) generalization, and (c) specificity for low-dimension representations before and after applying the proposed method.

7.5 Compactness, Generalization, and Specificity for Low-dimensional Representations

Fig. 1 shows that the proposed method leads to locally coherent results between the template mesh and the target mesh qualitatively. We hope the pairwise locally coherent results also benefit the global groupwise representations of 3D facial shapes. We follow a common practice [18] for the evaluation of statistical shape models using compactness, generalization, and specificity. We use the topological uniform data in the FaceScape dataset as standard baseline for the E-NICP method.

Training&Test Set Split. The 3D faces in the FaceScape dataset include identities with typical expressions. We include the first identities ( samples) as the training set and the rest identities (

samples) as the test set. We apply principal component analysis (

PCA) to the aligned training samples to achieve a low-dimensional representative model referring to the 3DMM [6] pipeline.

Compactness is measured by the percentage of variance retained by the PCA model. Compactness is a vital property of data for dimension reduction. It is evaluated by a minimum description length principle [46]

in the machine learning community as “

the less, the better”. We compare the PCA models constructed from samples before and after applying the proposed methods, respectively. The results are shown in Fig. 15 (a). The first principal components explain 90.38% and 91.19% of the variances by the baseline model and that achieved by the proposed method, respectively. This demonstrates better compactness of low-dimensional models by the proposed method, which indicates accurate correspondence of 3D faces.

Generalization measures the ability of a model to represent novel facial shapes that are not included in the training samples. We use the low-dimensional models, which are constructed with the training samples before and after applying the proposed method, respectively, to represent the test samples with different number of principal components. The mean vertex representation errors (as per-vertex distances) averaged over all samples are shown in Fig. 15 (b). We observe that the generalization error is reduced notably after applying the proposed method to the corresponding training samples. Fig. 16 also shows a qualitative result for the representation of an exemplar facial sample with principal components, where there are visible differences between the models before and after applying the proposed method. These results demonstrate that the proposed method leads to not only better mesh grid routines (as in Fig. 1) but also enhanced ability to represent shape variances.

Fig. 16: An exemplar face (filename: (594)3_mouth_stretch.obj) showing the difference between the generalization errors before and after applying the proposed method.

Specificity

measures the validity of generated faces by a representative model. We assume each principal component of the model follows Gaussian distribution with certain variances and randomly generate novel instances for a fixed number of principal components according to each Gaussian distribution model. The generated samples are then used to search their closest samples on the test set with minimum Euclidean distances. Fig. 

15 (c) shows the average per-vertex distances with generated samples for different settings of principal component number. It shows that the results by the proposed method achieve smaller specificity error compared with the baseline method. This demonstrates the effectiveness of the generative model for synthesizing novel facial samples.

7.6 Failure Cases

The proposed method has been tested with a variety of facial data and can generally “repair” the non-uniform mesh grid routine on corresponded faces, even if the initialization is corrupted by tremendous noise (see Fig. 12) in the tangential direction. However, the proposed method cannot achieve a “clean” result when the facial surface is not reconstructed properly by a scanning device. A failure example in the FaceScape dataset is shown in Fig. 17, where the initial corresponded result by the E-NICP method fails to represent the ground-truth facial surface in the normal direction. The proposed method is not able to solve this problem thoroughly, although most of the non-uniform mesh grid routines are rectified. Therefore, we suggest that the proposed method can expand its applications in combination with the state-of-the-art surface denoising methods.

Fig. 17: A failure case for dense registration where the initialization fails to represent the ground-truth surface. Both the near-frontal view and the bottom-up view are shown. Please also zoom in to see the corrective mesh grid routine as a positive effect of the proposed method.

7.7 Application to Other Format of Data

The proposed dividing and diffusing method can be also applied to other format of data which are not limited to face. In fact, it is able to achieve fine-grained correspondence and repair the non-uniform mesh grid routine given a pair of initially registered template and target surface. Fig. 18 shows an example for hand mesh data. The common ground between the face and the hand is that they have both fixed vertices (feature points) and free vertices in the setting for dense registration. The proposed method is general under this assumption, and is thus very useful in a variety of applications for surface registration.

Fig. 18: An example for the optimization of registered hand mesh data. The fixed vertices are shown in red dot.

8 Conclusion

We propose a dividing and diffusing method for the registration of 3D facial surface in this paper. The proposed method is motivated by proportional segmentation of a line, and alternates between local dividing and global diffusing, to finally achieve pairwise and fine-grained dense correspondence of 3D facial surface. We further elucidate the physical meaning of the proposed method as smooth rearrangement of a local scaling metric for 3D facial shape. Extensive experiments have demonstrated its effectiveness, including computational efficiency, robustness to initializations, metric embedding property, and representation ability of resulted face model. The proposed method can be also used to establish correspondences for other format of surface data, such as hands. Generally, it gives a plausible definition for vertex correspondence on smooth regions for 3D face, and leads to locally coherent details and elegant mesh grid routines for dense registrations, which we hope is helpful for a variety of applications.

There is a remaining issue in the proposed method as well as in other existing works. We link the proposed method to the fundamental forms in Eq. 1 for shape analysis in this paper. We assume that the second fundamental form for curvature can be well established by a number of feature points, based on which we optimize the first fundamental form for surface parameterization. However, we observe that some high-curvature features on facial surface cannot be uniquely determined if the number of landmarks is not enough. On the contrary, the accuracy of pre-annotated landmarks is a burden if we include too many of them which are not very salient, especially for raw scanning data with tremendous noise. Therefore, balancing the number and accuracy of facial landmarks as well as including soft constraints versus hard feature constraints deserves further study. In the future, we will explore automatic and accurate landmark localization methods on point clouds for 3D facial data. We will also study “soft” method to take into consideration noisy feature matchings for robust 3D face dense correspondence with lifted performance.

References

  • [1] B. Allen, B. Curless, and Z. Popović (2003) The space of human body shapes: reconstruction and parameterization from range scans. ACM Transactions on Graphics 22 (3), pp. 587–594. Cited by: §1.
  • [2] B. Amberg, S. Romdhani, and T. Vetter (2007) Optimal step nonrigid icp algorithms for surface registration. In

    IEEE Conference on Computer Vision and Pattern Recognition

    ,
    pp. 1–8. Cited by: Fig. 1, §1, §1, §2.1, §2.1, §2.1, §7.3, §7.4, TABLE III.
  • [3] J. L. Bentley (1975) Multidimensional binary search trees used for associative searching. Communications of the ACM 18 (9), pp. 509–517. Cited by: 4th item.
  • [4] G. v. d. Bergen (1997) Efficient collision detection of complex deformable models using aabb trees. Journal of Graphics Tools 2 (4), pp. 1–13. Cited by: 4th item.
  • [5] P. Besl and H. McKay (1992) A method for registration of 3-d shapes. IEEE Transactions on Pattern Analysis and Machine Intelligence 14 (2), pp. 239–256. Cited by: §1.
  • [6] V. Blanz and T. Vetter (1999) A morphable model for the synthesis of 3d faces. In ACM Annual Conference on Computer Graphics and Interactive Techniques, pp. 187–194. Cited by: §1, §2.2, §2.2, §7.5.
  • [7] T. Bolkart and S. Wuhrer (2015) A groupwise multilinear correspondence optimization for 3d faces. In IEEE International Conference on Computer Vision, pp. 3604–3612. Cited by: §2.1.
  • [8] J. Booth, A. Roussos, A. Ponniah, D. Dunaway, and S. Zafeiriou (2018) Large scale 3d morphable models. International Journal of Computer Vision 126 (2), pp. 233–254. Cited by: §1, §7.1.
  • [9] A. M. Bronstein, M. M. Bronstein, and R. Kimmel (2005) Three-dimensional face recognition. International Journal of Computer Vision 64 (1), pp. 5–30. Cited by: §7.
  • [10] B. J. Brown and S. M. Rusinkiewicz (2007) Global non-rigid alignment of 3-d scans. ACM Transactions on Graphics 26 (3), pp. 1276404. Cited by: §1.
  • [11] A. Bulat and G. Tzimiropoulos (2017) How far are we from solving the 2d & 3d face alignment problem?(and a dataset of 230,000 3d facial landmarks). In IEEE International Conference on Computer Vision, pp. 1021–1030. Cited by: §1.
  • [12] C. Cao, Y. Weng, S. Zhou, Y. Tong, and K. Zhou (2013) Facewarehouse: a 3d facial expression database for visual computing. IEEE Transactions on Visualization and Computer Graphics 20 (3), pp. 413–425. Cited by: §7.1.
  • [13] Y. Chen and G. Medioni (1992) Object modelling by registration of multiple range images. Image and Vision Computing 10 (3), pp. 145–155. Cited by: §1.
  • [14] S. Cheng, I. Marras, S. Zafeiriou, and M. Pantic (2015) Active nonrigid icp algorithm. In IEEE International Conference and Workshops on Automatic Face and Gesture Recognition, Vol. 1, pp. 1–8. Cited by: §2.1.
  • [15] C. A. Corneanu, M. O. Simón, J. F. Cohn, and S. E. Guerrero (2016) Survey on rgb, 3d, thermal, and multimodal approaches for facial expression recognition: history, trends, and affect-related applications. IEEE Transactions on Pattern Analysis and Machine Intelligence 38 (8), pp. 1548–1568. Cited by: §1.
  • [16] K. Crane, C. Weischedel, and M. Wardetzky (2013) Geodesics in heat: a new approach to computing distance based on heat flow. ACM Transactions on Graphics 32 (5), pp. 1–11. Cited by: §5.
  • [17] C. Creusot, N. Pears, and J. Austin (2013) A machine-learning approach to keypoint detection and landmarking on 3d meshes. International Journal of Computer Vision 102 (1-3), pp. 146–179. Cited by: §7.
  • [18] R. H. Davies, C. J. Twining, T. F. Cootes, J. C. Waterton, and C. J. Taylor (2002) A minimum description length approach to statistical shape modeling. IEEE Transactions on Medical Imaging 21 (5), pp. 525–537. Cited by: §7.5.
  • [19] H. Drira, B. B. Amor, A. Srivastava, M. Daoudi, and R. Slama (2013) 3D face recognition under expressions, occlusions, and pose variations. IEEE Transactions on Pattern Analysis and Machine Intelligence 35 (9), pp. 2270–2283. Cited by: §7.
  • [20] B. Egger, W. A. Smith, A. Tewari, S. Wuhrer, M. Zollhoefer, T. Beeler, F. Bernard, T. Bolkart, A. Kortylewski, S. Romdhani, et al. (2020) 3D morphable face models—past, present, and future. ACM Transactions on Graphics 39 (5), pp. 1–38. Cited by: §1.
  • [21] Y. Eldar, M. Lindenbaum, M. Porat, and Y. Y. Zeevi (1997) The farthest point strategy for progressive image sampling. IEEE Transactions on Image Processing 6 (9), pp. 1305–1315. Cited by: §5.
  • [22] Z. Fan, X. Hu, C. Chen, and S. Peng (2018) Dense semantic and topological correspondence of 3d faces without landmarks. In European Conference on Computer Vision, pp. 523–539. Cited by: §2.1, §2.1, 4th item, §4.2, §7.
  • [23] Z. Fan, X. Hu, C. Chen, and S. Peng (2019) Boosting local shape matching for dense 3d face correspondence. In IEEE Conference on Computer Vision and Pattern Recognition, pp. 10944–10954. Cited by: §1, §2.1, §2.1, §7.3, §7.3, §7.4, TABLE III, §7.
  • [24] J. Feldmar and N. Ayache (1996) Rigid, affine and locally affine registration of free-form surfaces. International Journal of Computer Vision 18 (2), pp. 99–119. Cited by: §1.
  • [25] C. Ferrari, S. Berretti, P. Pala, and A. Del Bimbo (2021) A sparse and locally coherent morphable face model for dense semantic correspondence across heterogeneous 3d faces. IEEE Transactions on Pattern Analysis and Machine Intelligence. Cited by: §1, §2.1, §7.
  • [26] C. Ferrari, G. Lisanti, S. Berretti, and A. Del Bimbo (2017) A dictionary learning-based 3d morphable shape model. IEEE Transactions on Multimedia 19 (12), pp. 2666–2679. Cited by: §1.
  • [27] K. Fujiwara, K. Nishino, J. Takamatsu, B. Zheng, and K. Ikeuchi (2011) Locally rigid globally non-rigid surface registration. In International Conference on Computer Vision, pp. 1527–1534. Cited by: §1, §2.1.
  • [28] P. Garrido, L. Valgaerts, O. Rehmsen, T. Thormahlen, P. Perez, and C. Theobalt (2014) Automatic face reenactment. In IEEE Conference on Computer Vision and Pattern Recognition, pp. 4217–4224. Cited by: §1.
  • [29] T. Gerig, A. Morel-Forster, C. Blumer, B. Egger, M. Luthi, S. Schönborn, and T. Vetter (2018) Morphable face models-an open framework. In IEEE International Conference on Automatic Face and Gesture Recognition, pp. 75–82. Cited by: §1.
  • [30] S. Z. Gilani, A. Mian, and P. Eastwood (2017) Deep, dense and accurate 3d face correspondence for generating population specific deformable models. Pattern Recognition 69, pp. 238–250. Cited by: §2.1, §7.
  • [31] S. Z. Gilani, A. Mian, F. Shafait, and I. Reid (2017) Dense 3d face correspondence. IEEE Transactions on Pattern Analysis and Machine Intelligence 40 (7), pp. 1584–1598. Cited by: §2.2, 4th item, §7.
  • [32] C. M. Grewe and S. Zachow (2016) Fully automated and highly accurate dense correspondence for facial surfaces. In European Conference on Computer Vision, pp. 552–568. Cited by: §2.2.
  • [33] X. Gu, S. Wang, J. Kim, Y. Zeng, Y. Wang, H. Qin, and D. Samaras (2007) Ricci flow for 3d shape analysis. In IEEE International Conference on Computer Vision, pp. 1–8. Cited by: §2.2, §2.2.
  • [34] H. Li, R. W. Sumner, and M. Pauly (2008) Global correspondence optimization for non-rigid registration of depth scans. In Computer Graphics Forum, Vol. 27, pp. 1421–1430. Cited by: §1, §2.1.
  • [35] T. Li, T. Bolkart, M. J. Black, H. Li, and J. Romero (2017) Learning a model of facial shape and expression from 4d scans. ACM Transaction on Graphics 36 (6), pp. 194–1. Cited by: §1.
  • [36] F. Liu, L. Tran, and X. Liu (2019) 3d face modeling from diverse raw scan data. In IEEE International Conference on Computer Vision, pp. 9408–9418. Cited by: §7.
  • [37] F. Liu, D. Zeng, Q. Zhao, and X. Liu (2016) Joint face alignment and 3d face reconstruction. In European Conference on Computer Vision, pp. 545–560. Cited by: §1.
  • [38] J. Ma, J. Zhao, J. Tian, A. L. Yuille, and Z. Tu (2014) Robust point matching via vector field consensus. IEEE Transactions on Image Processing 23 (4), pp. 1706–1721. Cited by: §1.
  • [39] B. Maiseli, Y. Gu, and H. Gao (2017) Recent developments and trends in point set registration methods. Journal of Visual Communication and Image Representation 46, pp. 95–106. Cited by: §1.
  • [40] H. Mohammadzade and D. Hatzinakos (2012) Iterative closest normal point for 3d face recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence 35 (2), pp. 381–397. Cited by: §1, §2.1.
  • [41] A. Myronenko and X. Song (2010) Point set registration: coherent point drift. IEEE Transactions on Pattern Analysis and Machine Intelligence 32 (12), pp. 2262–2275. Cited by: §1, §2.1.
  • [42] G. Pan, X. Zhang, Y. Wang, Z. Hu, X. Zheng, and Z. Wu (2013) Establishing point correspondence of 3d faces via sparse facial deformable model. IEEE Transactions on Image Processing 22 (11), pp. 4170–4181. Cited by: §1, §2.1, 4th item.
  • [43] A. Patel and W. A. Smith (2009) 3d morphable face models revisited. In IEEE Conference on Computer Vision and Pattern Recognition, pp. 1327–1334. Cited by: §2.1.
  • [44] P. J. Phillips, P. J. Flynn, T. Scruggs, K. W. Bowyer, J. Chang, K. Hoffman, J. Marques, J. Min, and W. Worek (2005) Overview of the face recognition grand challenge. In IEEE Conference on Computer Vision and Pattern Recognition, Vol. 1, pp. 947–954. Cited by: §7.1.
  • [45] S. Ploumpis, E. Ververas, E. O’Sullivan, S. Moschoglou, H. Wang, N. Pears, W. Smith, B. Gecer, and S. P. Zafeiriou (2020) Towards a complete 3d morphable model of the human head. IEEE Transactions on Pattern Analysis and Machine Intelligence. Cited by: §1.
  • [46] J. Rissanen (1978) Modeling by shortest data description. Automatica 14 (5), pp. 465–471. Cited by: §7.5.
  • [47] A. Salazar, S. Wuhrer, C. Shu, and F. Prieto (2014) Fully automatic expression-invariant face correspondence. Machine Vision and Applications 25 (4), pp. 859–879. Cited by: §2.2.
  • [48] A. Savran, N. Alyüz, H. Dibeklioğlu, O. Çeliktutan, B. Gökberk, B. Sankur, and L. Akarun (2008) Bosphorus database for 3d face analysis. In European Workshop on Biometrics and Identity Management, pp. 47–56. Cited by: §7.1.
  • [49] M. P. P. Segundo, L. Silva, O. R. P. Bellon, and C. C. Queirolo (2010) Automatic face segmentation and facial landmark detection in range images. IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics) 40 (5), pp. 1319–1330. Cited by: §7.
  • [50] O. Sorkine and M. Alexa (2007) As-rigid-as-possible surface modeling. In Symposium on Geometry processing, Vol. 4, pp. 109–116. Cited by: §1.
  • [51] O. Sorkine-Hornung and M. Rabinovich (2017) Least-squares rigid motion using svd. Computing 1 (1), pp. 1–5. Cited by: §4.1.
  • [52] Y. Sun and M. A. Abidi (2001) Surface matching by 3d point’s fingerprint. In IEEE International Conference on Computer Vision, Vol. 2, pp. 263–269. Cited by: §2.2.
  • [53] S. Suwajanakorn, S. M. Seitz, and I. Kemelmacher-Shlizerman (2017) Synthesizing obama: learning lip sync from audio. ACM Transactions on Graphics 36 (4), pp. 1–13. Cited by: §1.
  • [54] G. K. Tam, Z. Cheng, Y. Lai, F. C. Langbein, Y. Liu, D. Marshall, R. R. Martin, X. Sun, and P. L. Rosin (2012) Registration of 3d point clouds and meshes: a survey from rigid to nonrigid. IEEE Transactions on Visualization and Computer Graphics 19 (7), pp. 1199–1217. Cited by: §1.
  • [55] D. Terzopoulos, J. Platt, A. Barr, and K. Fleischer (1987) Elastically deformable models. In ACM Annual Conference on Computer Graphics and Interactive Techniques, pp. 205–214. Cited by: §1.
  • [56] D. Vlasic, M. Brand, H. Pfister, and J. Popovic (2005) Face transfer with multilinear models. ACM Transactions on Graphics 24 (3), pp. 426–433. Cited by: §2.1.
  • [57] Y. Wang, J. Liu, and X. Tang (2010) Robust 3d face recognition by local shape difference boosting. IEEE Transactions on Pattern Analysis and Machine Intelligence 32 (10), pp. 1858–1870. Cited by: §7.
  • [58] H. Yang, H. Zhu, Y. Wang, M. Huang, Q. Shen, R. Yang, and X. Cao (2020) Facescape: a large-scale high quality 3d face dataset and detailed riggable 3d face prediction. In IEEE Conference on Computer Vision and Pattern Recognition, pp. 601–610. Cited by: Fig. 1, Fig. 8, §5, Fig. 9, §7.1, TABLE III.
  • [59] J. Yang, H. Li, D. Campbell, and Y. Jia (2015) Go-icp: a globally optimal solution to 3d icp point-set registration. IEEE Transactions on Pattern Analysis and Machine Intelligence 38 (11), pp. 2241–2254. Cited by: 4th item.
  • [60] L. Yin, X. Wei, Y. Sun, J. Wang, and M. J. Rosato (2006) A 3d facial expression database for facial behavior research. In International Conference on Automatic Face and Gesture Recognition, pp. 211–216. Cited by: §2.2, §7.1, TABLE III.
  • [61] W. Zeng, D. Samaras, and D. Gu (2010) Ricci flow for 3d shape analysis. IEEE Transactions on Pattern Analysis and Machine Intelligence 32 (4), pp. 662–677. Cited by: §2.2, §2.2.
  • [62] C. Zhang, W. A. Smith, A. Dessein, N. Pears, and H. Dai (2016) Functional faces: groupwise dense correspondence using functional maps. In IEEE Conference on Computer Vision and Pattern Recognition, pp. 5033–5041. Cited by: §2.1.
  • [63] Z. Zhang (1994) Iterative point matching for registration of free-form curves and surfaces. International Journal of Computer Vision 13 (2), pp. 119–152. Cited by: §1.
  • [64] X. Zhu, X. Liu, Z. Lei, and S. Z. Li (2017) Face alignment in full pose range: a 3d total solution. IEEE Transactions on Pattern Analysis and Machine Intelligence 41 (1), pp. 78–92. Cited by: §1.
  • [65] M. Zollhöfer, J. Thies, P. Garrido, D. Bradley, T. Beeler, P. Pérez, M. Stamminger, M. Nießner, and C. Theobalt (2018) State of the art on monocular 3d face reconstruction, tracking, and applications. In Computer Graphics Forum, Vol. 37, pp. 523–550. Cited by: §1.
  • [66] S. Zulqarnain Gilani, F. Shafait, and A. Mian (2015) Shape-based automatic detection of a large number of 3d facial landmarks. In IEEE conference on Computer Vision and Pattern Recognition, pp. 4639–4648. Cited by: §2.1.