Face Manifold: Manifold Learning for Synthetic Face Generation

10/03/2019 ∙ by Kimia Dinashi, et al. ∙ University of Tehran

The face is one of our most important means of communicating with the world around us; it also carries our identity and expressions. Estimating the face structure is a fundamental task in computer vision, with applications in areas such as face recognition and medical surgery. Recently, deep learning techniques have achieved significant results for 3D face reconstruction from flat images. The main challenge of such techniques is the need for large 3D face datasets, which is usually addressed by synthetic face generation. However, synthetic datasets suffer from the presence of non-possible faces. Here, we propose a face manifold learning method for generating diverse synthetic face datasets. First, the face structure is divided into shape and expression groups. Then, a fully convolutional autoencoder network is employed to correct non-possible faces while simultaneously preserving dataset diversity. Simulation results show that the proposed method is capable of denoising highly corrupted faces. The diversity of the generated dataset is evaluated qualitatively and quantitatively and compared to existing methods; experiments show that our manifold learning method significantly outperforms the state-of-the-art methods.


I Introduction

The face can convey different and complex thoughts and feelings through its gestures. Alongside our emotions, the geometric features of our faces form our identity. Recovering the structure of a face is an important task in computer vision, with a fundamental role in various applications. For example, in face recognition, a received face should be aligned to a neutral frontal face to remove pose variation within the same subject [12, 24]. As another example, in medical applications, knowing the face structure allows better planning of operations and surgeries [36, 9].

Recovering the structure of a face from a flat image is challenging, since there is a vast number of degrees of freedom and much flexibility. Camera projection, lighting conditions, texture, and head position and orientation are the main sources of ambiguity. Numerous methods exist for face reconstruction from a single image. Some methods characterize the face with a limited number of parameters [37, 25], exploiting the fact that human faces share similar characteristics. A popular example is the 3D morphable model (3DMM) [3], which has been used in many applications such as face reconstruction [2, 14], recognition [46], digital makeup [34], and synthetic face data generation [7]. The low-dimensional parameter space is derived from an example face dataset, which has a great influence on the resulting model: such methods produce unsatisfactory outputs when the input face deviates largely from the example faces. There also exist methods that use RGB-D cameras [4, 22, 13]. However, RGB-D cameras such as the Microsoft Kinect are still not as common as RGB cameras, nor do they offer comparably high resolution at low cost. Shape-from-shading (SFS) [43, 8] is another computer vision technique employed in face reconstruction. SFS can yield ambiguous results [27] and needs additional information for the reconstruction process, such as face symmetry [41, 44], a reference face [19], multi-view images [42, 16], or unconstrained face image collections [31, 32].

With the rapid growth and success of deep learning in various computer vision areas, researchers have also applied it to the 3D face reconstruction problem [18, 29, 38]. An image-to-image translation network that forms a pixel-based mapping from the input image to the depth image is introduced in [35]. CoarseNet and FineNet are introduced in [29] to derive the face structure in a coarse-to-fine fashion, and another coarse-to-fine framework is proposed in [10] for monocular videos. Reconstruction under two extreme conditions, i.e., out-of-plane rotation and occlusion, is considered in [39]. Zhu et al. also employed convolutional neural networks for face alignment [47]; their method can align faces in poses as large as 90°.

A common point among all the mentioned deep-learning-based methods is their need for a large labeled dataset: many input images paired with their corresponding true 3D representations. Most methods follow pipelines like the one suggested in [28]. Each synthetic face is constructed by drawing random vectors for identity, expression, and texture from a face morphable model (e.g., 3DMM). As mentioned, models like 3DMM are highly dependent on the example dataset. Moreover, the random vectors are drawn from a Gaussian distribution, which concentrates them around the average face. Another disadvantage of this pipeline is the generation of non-possible faces. This is because the parameters cover a wide range of positions for the different parts of a face in a weakly related manner, whereas the parts of a real face are highly related to each other. Thus, possible human faces exist only in a small subspace of this representation, which we call the face manifold in this paper.

Manifold learning is the process of learning the geometric and topological properties of samples, given that the input data are sampled from a smooth manifold [23]. To the best of our knowledge, there exists no comprehensive study of the face manifold in the literature, although manifold learning is a common technique in areas such as human motion [1], mechanics [15], signal processing [40], and time series analysis [11]. By learning a manifold for the human face, most of the non-possible faces produced by synthetic data generators can be corrected or eliminated. Thus, reconstruction methods based on deep learning, which need a large amount of labeled data, can be trained on large and diverse datasets.

In this paper, we propose a face manifold learning method based on convolutional autoencoders. First, we divide the face structure into two sets of parameters, related to the shape and expression features of the 3DMM. Thereafter, a fully convolutional autoencoder is designed to correct the corrupted data; this can also be seen as a denoising procedure, where the noise is the source of corruption. To train the proposed network, two datasets are used: the input data is a randomly corrupted version of the 3DMM parameters and the output is the original clean data. Results are evaluated with quantitative and qualitative criteria. The proposed algorithm yields promising, high-quality results, and we show that it can generate a highly diverse synthetic face dataset. The rest of the paper is organized as follows. In Section II, we detail the proposed method. The evaluation of the proposed method is presented in Section III. Finally, Section IV concludes the paper. Code is publicly available on GitHub: https://github.com/SCL-UT/face-manifold.

II The Proposed Method

II-A Face Model

3D morphable models are derived by applying principal component analysis (PCA) [17] to a set of 3D face scans. A 3D face can then be obtained by specifying the coefficients of each basis of the face space. As suggested in [6], these bases are divided into two sets: first, the bases describing the identity of a neutral face, and second, those describing facial expressions. In a 3DMM, a face is characterized as follows:

S = S̄ + A_id α_id + A_exp α_exp,    (1)

where A_id and A_exp are the identity and expression bases, respectively, and S̄ is the average face. α_id and α_exp are the identity and expression coefficients, respectively, which are the only parameters of this parametric model. The 3DMM bases employed in this work are from the Basel face model (BFM) [26] for face shapes and FaceWarehouse [5] for face expressions. In this model, α_id and α_exp have 199 and 29 dimensions, respectively. The final vector S, representing a 3D face, contains the 3D coordinates of all mesh vertices. Therefore, this representation yields a significant dimensionality reduction, from thousands of vertices to only 228 parameters.
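As a concrete illustration, Eq. (1) amounts to two matrix-vector products and an addition. The sketch below uses randomly generated stand-ins for the model data: mean_face, A_id, A_exp, and the vertex count V are placeholders, not the released BFM/FaceWarehouse bases, which must be loaded separately.

```python
import numpy as np

# Placeholder dimensions: 199 identity and 29 expression coefficients (228 total),
# and an arbitrary vertex count V standing in for the real mesh resolution.
V, N_ID, N_EXP = 53_000, 199, 29

rng = np.random.default_rng(0)
mean_face = rng.standard_normal(3 * V)       # stand-in for the average face S̄
A_id = rng.standard_normal((3 * V, N_ID))    # stand-in for the identity bases
A_exp = rng.standard_normal((3 * V, N_EXP))  # stand-in for the expression bases

def synthesize_face(alpha_id, alpha_exp):
    """Evaluate Eq. (1): S = S̄ + A_id @ alpha_id + A_exp @ alpha_exp."""
    s = mean_face + A_id @ alpha_id + A_exp @ alpha_exp
    return s.reshape(V, 3)                   # one (x, y, z) row per vertex

# Zero coefficients recover the mean face.
vertices = synthesize_face(np.zeros(N_ID), np.zeros(N_EXP))
```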

As discussed in [3], each parameter follows a Gaussian distribution with variance equal to its corresponding eigenvalue, independently of the other parameters. As stated before, many existing methods use synthetic datasets for training a 3D face reconstruction network [28, 7, 10, 29, 35]. To generate a synthetic dataset, the 3DMM parameters are drawn from the mentioned Gaussian distribution. However, this concentrates the generated faces around the mean face and yields very limited variation in shape and expression; in the case of the expression parameters, for instance, most of the generated expressions would be close to neutral. The resulting synthetic dataset is thus very restricted and far from real face images, and a neural network trained on it has limited ability to predict accurately for face images with varying shapes and expressions. As a solution to this problem, one may instead draw the 3DMM parameters from a uniform distribution or within a larger interval. This increases the diversity of the generated faces, but at the same time decreases the probability that they are real human faces; in other words, the generated faces become more likely to be corrupted, or non-possible. In Fig. 1, some instances of non-possible faces generated by this method are shown. The interval for each parameter is chosen as k times its corresponding eigenvalue, with k set to 10 for the shape and 15 for the expression parameters. In this section, we propose a method to correct these non-possible faces, i.e., to remove their corruption while preserving their deviation from the mean face as much as possible. To this end, two separate convolutional neural networks, one for expression and one for shape, are trained to learn the human face manifold. Afterwards, the trained networks are used to map non-possible faces to possible ones without losing their particular shape and expression characteristics.

Fig. 1:

Faces generated by drawing 3DMM parameters from a uniform distribution instead of a normal distribution, exhibiting very extreme shapes or expressions. Top row: faces with noisy shapes. Bottom row: faces with noisy expressions.

II-B Network Structure

The mapping from a set of noisy 3DMM parameters to its valid version is learned by a convolutional autoencoder network. An autoencoder consists of two parts: i) an encoder, which generates a (usually lower-dimensional) representation of the input by extracting its main features while ignoring noise, and ii) a decoder, which is responsible for generating a clean version of the input from this compact representation. Autoencoders are effective tools for data compression and denoising. The main challenge of this work is in fact denoising the 3DMM parameters, for which an autoencoder is an appropriate choice. In the following, the structure of the proposed convolutional autoencoder is described in detail.

Fig. 2: The encoder part of the proposed convolutional autoencoder. Each layer consists of several channels of convolution filters and poolings. The number of channels in layers 1, 2, 3, and 4 is 8, 16, 32, and 64, respectively. The numbers indicate the dimensions of the input feature maps in each layer; the top row is for the shape and the bottom row for the expression parameters.

Here, an effective yet lightweight convolutional autoencoder architecture is selected. Empirical experiments demonstrate that training an eight-layer symmetric autoencoder for both the shape and expression networks yields promising results in denoising 3DMM parameters. The encoder part of the proposed structure is detailed in Fig. 2. The architectural symmetry implies that the decoder is the reverse of the encoder, i.e., the order of the layers and the operation of each layer in the encoder are reversed in the decoder. In the encoder, each layer consists of multiple channels of one-dimensional convolution followed by max-pooling. Conversely, each decoder layer consists of max-unpooling followed by transposed convolution. Max-unpooling is the partial inverse of max-pooling: it is an upsampling process that transfers each input value to one location in its output feature map and zeroes all other locations. The non-zero location comes from the max-pooling stage; the indices of the max locations for each max-pooling operation in the encoder are stored and reused by the corresponding max-unpooling operation, which places its input at the saved index positions and zeroes the non-maximal locations. All convolution filters use the same kernel size with zero padding, and all pooling layers use equal kernel size and stride. The expression or shape parameters are fed to the corresponding network; after passing through the four encoder layers, they are reduced to a compact encoded vector. This representation is then fed to the decoder to reconstruct the clean expression or shape parameters.
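A sketch of one such network in PyTorch is given below. The channel counts (8, 16, 32, 64) follow Fig. 2; the convolution kernel size 3 with padding 1 and the pooling kernel size 2 with stride 2 are assumptions of this sketch, not values confirmed by the text. It illustrates the architecture rather than reproducing the released implementation.

```python
import torch
import torch.nn as nn

class DenoisingAutoencoder(nn.Module):
    """Eight-layer symmetric conv autoencoder sketch (assumed hyperparameters)."""
    def __init__(self):
        super().__init__()
        chans = [1, 8, 16, 32, 64]
        self.enc_convs = nn.ModuleList(
            nn.Conv1d(chans[i], chans[i + 1], kernel_size=3, padding=1) for i in range(4))
        self.dec_convs = nn.ModuleList(
            nn.ConvTranspose1d(chans[i + 1], chans[i], kernel_size=3, padding=1)
            for i in reversed(range(4)))
        self.pool = nn.MaxPool1d(2, stride=2, return_indices=True)
        self.unpool = nn.MaxUnpool1d(2, stride=2)
        self.act = nn.ReLU()

    def forward(self, x):                    # x: (batch, 1, n_params)
        indices, sizes = [], []
        for conv in self.enc_convs:          # encoder: conv -> relu -> pool
            x = self.act(conv(x))
            sizes.append(x.size())           # remember pre-pool lengths
            x, idx = self.pool(x)
            indices.append(idx)              # remember max locations
        for i, deconv in enumerate(self.dec_convs):  # decoder: unpool -> deconv
            x = self.unpool(x, indices[-1 - i], output_size=sizes[-1 - i])
            x = deconv(x)
            if i < 3:
                x = self.act(x)              # linear output at the final layer
        return x

# Length-agnostic: works for the 29-dim expression and 199-dim shape vectors.
net = DenoisingAutoencoder()
out = net(torch.randn(4, 1, 29))             # out has shape (4, 1, 29)
```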

II-C Dataset

Training the network requires a dataset containing pairs of correct and noisy 3DMM parameters. The best way to ensure that the 3DMM parameters considered clean properly represent possible 3D face structures is to use 3DMM parameters fitted to real face images, which are available in existing datasets, as the ground truth. Two different datasets are used to prepare the required training data: 300W-3D [45] and AFLW2000-3D [45].

300W-3D: 300W [33] contains images from multiple databases with their corresponding standardized 68-point landmarks. 300W-3D consists of the images of 300W with their fitted 3DMM parameters. Zhu et al. employed the Multi-Feature Framework (MFF) [30] to provide the fitted 3DMM parameters of the 300W samples [45].

AFLW2000-3D: AFLW [21] contains face images in large poses, annotated with up to 21 visible landmarks per image. In [45], a reconstruction algorithm is applied to the first 2,000 samples of AFLW, resulting in the AFLW2000-3D database, which contains the fitted 3DMM parameters and the corresponding 68-point landmarks.

Putting all samples of the above two datasets together forms a large dataset of 3DMM parameters fitted to real face images. To train the shape and expression networks, two different datasets are needed: one with noisy shapes and the other with noisy expressions. The fitted shape and expression parameters serve as the clean, noiseless ground truth to which the network outputs are compared; noisy versions of the parameters are needed as the inputs to the networks.

A simple yet effective procedure is employed to generate noisy versions of the parameters. To randomly corrupt the expression parameters, first a number n is selected at random, where n is the number of parameters to be corrupted with noise. Then, n parameters are randomly selected from the 29 expression parameters. Afterwards, independent Gaussian noise signals with standard deviation σ_e are added to each of the selected parameters. This process is repeated several times for each sample of the dataset in order to prepare multiple corrupted versions of each sample. Finally, a dataset containing pairs of clean and noisy 3DMM expression parameters is obtained and split into training and test sets.

The same procedure is applied to the shape parameters, with σ_s denoting the standard deviation of the added noise and multiple noisy versions prepared per sample. The result is a dataset of pairs of clean and noisy 3DMM shape parameters, likewise split into training and test data. Some examples of the prepared shape and expression samples are shown in Fig. 3 and Fig. 4, respectively.

Fig. 3: Some samples of the prepared expression dataset. Top row: faces with ground truth expression parameters. Bottom row: faces with noisy expression parameters.
Fig. 4: Some samples of the prepared shape dataset. Top row: faces with ground truth shape parameters. Bottom row: faces with noisy shape parameters.

This noisy-dataset generation scheme is chosen because it leads to satisfactory results in practice. The variance of the noise added to the training dataset depends entirely on the noise the network is expected to remove: as the interval for choosing 3DMM parameters in the synthetic-generation scenario becomes larger, the generated faces become more corrupted, and elimination of larger noise is required. If the noise variance added to the training data is too small, the network will not be able to recognize and eliminate the noise signal.
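The corruption procedure can be sketched as follows. corrupt is a hypothetical helper; the bound on the number of corrupted parameters and the example standard deviation (√2, following the expression noise variance of 2 reported in Section III) are assumptions of this sketch.

```python
import numpy as np

rng = np.random.default_rng(2)

def corrupt(params, sigma, rng=rng):
    """Return a randomly corrupted copy: pick n parameters, add Gaussian noise."""
    p = params.copy()
    n = rng.integers(1, len(p) + 1)                   # how many parameters to corrupt
    idx = rng.choice(len(p), size=n, replace=False)   # which parameters to corrupt
    p[idx] += rng.normal(0.0, sigma, size=n)          # i.i.d. noise on the chosen entries
    return p

# Several noisy versions of one (stand-in) fitted 29-dim expression vector.
clean_exp = rng.standard_normal(29)
pairs = [(clean_exp, corrupt(clean_exp, sigma=2 ** 0.5)) for _ in range(5)]
```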

II-D Training

In the training procedure, the noisy 3DMM parameters are fed to the network and the network outputs are compared to the ground-truth parameters. The mean squared error (MSE) between the network outputs and the clean parameters is used as the loss function. The networks are trained for 10 epochs over the training data, with a learning rate of 0.001 and a batch size of 128, using the Adam optimization algorithm [20]. As mentioned, the shape parameters have very large values; hence, to enable the network to learn from this data, the whole shape training set is divided by 1e5. The trained networks are evaluated in the next section. Training and evaluation were conducted on a desktop PC with an NVIDIA GeForce GTX 1070 GPU and an Intel Core i7-4770K @ 3.50 GHz CPU.
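A sketch of the training loop under the stated hyperparameters (MSE loss, Adam, learning rate 0.001, batch size 128, 10 epochs); the function name and tensor shapes are illustrative, and net is assumed to be a model like the DenoisingAutoencoder sketched in Section II-B.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# noisy / clean: float tensors of shape (N, 1, n_params); for the shape network
# both are assumed to be pre-divided by 1e5 as described above.
def train(net, noisy, clean, epochs=10, lr=1e-3, batch_size=128):
    loader = DataLoader(TensorDataset(noisy, clean), batch_size=batch_size, shuffle=True)
    opt = torch.optim.Adam(net.parameters(), lr=lr)
    loss_fn = torch.nn.MSELoss()
    for _ in range(epochs):
        for x, y in loader:
            opt.zero_grad()
            loss = loss_fn(net(x), y)   # compare denoised output to clean target
            loss.backward()
            opt.step()
    return net
```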

III Experiments

To verify the effectiveness of the proposed method, the experimental results are presented in three subsections: qualitative and quantitative results on the test data, qualitative results on a corrupted uniformly generated synthetic dataset, and a comparison between the diversity of the synthetic datasets generated by the proposed method and by the common traditional method. In the third subsection, experiments show that the proposed method surpasses common methods in terms of shape and expression diversity by a large margin.

Fig. 5: Logarithm of the training and testing loss over epochs. Learning curves of the shape (top) and expression (bottom) networks.

First of all, in order to generate the training data, σ_s and σ_e must be obtained empirically. As explained before, these values depend on the intervals of the uniform distribution used in synthetic data construction. Here, the network is expected to denoise synthetic data whose shape and expression parameters are drawn from uniform distributions within intervals 10 and 15 times their corresponding eigenvalues, respectively. The appropriate values are found experimentally for this setup, corresponding to noise variances of 500,000 for the shape and 2 for the expression parameters. The training and testing loss over the learning epochs is plotted in Fig. 5.

Fig. 6: Results of the shape network on test data. Faces with ground-truth shape parameters are in the left column, the second column contains faces with noisy shape parameters as input to the network, and the third column shows faces with the network output as their shape parameters.
Fig. 7: Results of the expression network on test data. Faces with ground-truth expression parameters are in the left column, the second column contains faces with noisy expression parameters as input to the network, and the third column shows faces with the network output as their expression parameters.
Fig. 8: MSE of the test data and of the network output versus the noise variance of the test data, for the shape (top) and expression (bottom) networks.

III-A Results on Test Data

To analyze denoising accuracy, the prepared shape and expression test data are fed to their corresponding networks and the outputs are compared to the ground-truth values. The mean squared error is used as the performance measure. This yields an MSE of 0.12 for the expression network and 0.17 for the shape network (recall that the noisy and clean shape parameters are divided by 1e5). The MSE of the noisy test data fed to the networks is 2.07 for the expression and 12.57 for the shape parameters. This sharp decline in MSE implies that the networks successfully recover facial shape and expression parameters from noisy ones. To examine the performance of the proposed networks on test data whose noise variance differs from that of the training data, an experiment is conducted and its results are shown in Fig. 8, where the x-axis is the noise variance of the test data. We note that the shape (top) and expression (bottom) networks are trained with noise variances of 500,000 and 2, respectively. Increasing the noise variance in the test-data generation procedure leads to a parabolic increase in the test-data MSE. However, the proposed networks reduce the output MSE to around zero, and it remains roughly constant over a wide range of noise variances. Consequently, the networks generate completely clean faces even when extremely noisy faces are fed to them. Qualitative results on the test data are shown in Fig. 6 and Fig. 7 for the shape and expression networks, respectively. Both networks reconstruct noiseless faces with completely clean shapes and expressions, while the original shape and expression are preserved almost as they were before adding noise.
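The noise-variance sweep behind Fig. 8 can be sketched as follows; for simplicity this version perturbs every parameter rather than a random subset, which is an assumption of the sketch.

```python
import torch

def mse(a, b):
    return torch.mean((a - b) ** 2).item()

@torch.no_grad()
def sweep(net, clean, variances):
    """Re-noise the clean test set at several variances; report input/output MSE."""
    rows = []
    for var in variances:
        noisy = clean + torch.randn_like(clean) * var ** 0.5
        rows.append((var, mse(noisy, clean), mse(net(noisy), clean)))
    return rows  # (test-noise variance, input MSE, output MSE) per row
```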

Fig. 9: The only failures of the network in denoising 3D faces among 500 faces. The bottom-right image is counted as a shape-network failure; in the other images the facial expression is corrupted.
Fig. 10: Comparison of the MSE of networks trained on datasets with three different noise levels, against the test noise level, for the shape (top) and expression (bottom) networks.

To assess the networks' performance when the variance of the noise added to the test data differs from that used in the training data, we further analyze the effect of the noise variance. We train networks with three different values of the training noise variance, and generate test data with nine different noise variances. The MSE of the network outputs on each set of test data is then measured. The results of this experiment are shown in Fig. 10, where the x-axis is the variance of the noise added to the test data. As observed, all curves are ascending, implying that increasing the test noise level reduces performance; however, the degradation is insignificant compared to the MSE of the input data illustrated in Fig. 8. Moreover, networks trained with larger noise levels perform best on data with high noise levels, and vice versa. Below a certain noise level, however, all networks yield almost the same output MSE, i.e., eliminating low levels of noise is a simple task for all networks.

III-B Results on Synthetic Dataset

Here, we demonstrate the performance of the proposed networks in generating synthetic faces, the main purpose of our study. In the following, qualitative results on a noisy synthetic dataset are provided; experiments show that the proposed method achieves high-quality synthetic face generation. The synthetic dataset is generated by drawing random shape and expression parameters from uniform distributions within intervals of 10 and 15 times their corresponding eigenvalues around zero (the same procedure as for the faces in Fig. 1). These noisy expression and shape parameters are fed as inputs to the respective networks and the outputs are examined qualitatively (see Fig. 11). The networks generate clean, noiseless outputs. This method therefore enables us to create a synthetic dataset with high diversity in expressions and shapes and, at the same time, without any corruption.

To better evaluate the reliability of the network outputs, which will serve as synthetic face data, a quality metric is introduced: the percentage of outputs acceptable as real faces in the generated synthetic dataset. The percentage is computed by feeding 500 noisy face shape and expression parameter vectors to their corresponding networks and examining the generated faces. Five human subjects evaluate the generated faces, and a sample is considered a non-possible face if at least one subject rejects it. This yields high reliability for both the expression and shape networks. Fig. 9 shows all of the outputs rejected as real faces in this experiment; the rejected expressions are not accepted only because of a minor asymmetry in the eyes. It is possible to further increase the reliability of the network outputs, at the expense of their diversity, by reducing the range of the uniform distribution of the input parameters. For instance, if the expression parameters are drawn from a uniform distribution within 5 times their eigenvalues around zero, the expression network generates realistic outputs for all 500 faces.

Fig. 11: Results of the proposed method on a synthetic dataset with uniformly distributed 3DMM parameters. The interval of the uniform distribution is 10 times the eigenvalues for the shape and 15 times the eigenvalues for the expression parameters. The first and third rows contain generated synthetic faces, and the second and fourth rows contain faces with the network outputs as their shape and expression parameters.
Fig. 12: Scatter diagrams comparing diversity of three datasets in terms of shape (top) and expression (bottom).

III-C Scatter Diagrams

This experiment shows that the proposed synthetic-dataset generation method leads to increased scattering and diversity of the generated faces compared to previous methods. To plot the scatter diagrams, PCA dimensionality reduction is applied to the shape and expression parameters separately. This is repeated for three different datasets: 1) the dataset generated by feeding parameters drawn from a uniform distribution to the networks (the proposed method), 2) the dataset generated by drawing parameters from a normal distribution, and 3) the 3DMM parameters fitted to real images. The diversity of these three datasets is compared in the scatter diagrams of Fig. 12, plotted for 70 samples each. As can be seen, the scattering of the dataset generated by the proposed method is higher than that of the other two datasets. Thus, the proposed method for synthetic face generation outperforms the existing methods in terms of shape and expression diversity. Note that the amount of scattering is fully adjustable by changing the uniform distribution interval used in the proposed procedure.
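A sketch of how such scatter diagrams can be produced with scikit-learn; fitting a single PCA on the pooled datasets, so that all three share the same 2-D axes, is an assumption of this sketch.

```python
import numpy as np
from sklearn.decomposition import PCA
import matplotlib.pyplot as plt

def scatter_2d(datasets, labels):
    """Project each (N, D) parameter set to 2-D with a shared PCA and plot it."""
    pca = PCA(n_components=2).fit(np.vstack(datasets))
    for data, label in zip(datasets, labels):
        xy = pca.transform(data)
        plt.scatter(xy[:, 0], xy[:, 1], s=10, label=label)
    plt.legend()
    plt.show()

# Usage (with stand-in data): one call per parameter type, e.g. expression.
rng = np.random.default_rng(3)
ours, normal, real = (rng.standard_normal((70, 29)) * s for s in (3.0, 1.0, 1.5))
scatter_2d([ours, normal, real], ["Our dataset", "Normal dataset", "Realistic dataset"])
```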

TABLE I: Quantitative comparison between the diversity of three datasets (trace of the covariance matrix).

Parameters    Our dataset    Normal dataset    Realistic dataset
shape         –              –                 –
expression    –              –                 –

A scattering criterion is also used for a quantitative comparison: the trace of the covariance matrix. The value of this criterion for 2,000 samples of each of the three datasets is reported in Table I and confirms the scatter-diagram results.
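The criterion itself is one line: for an (N, D) matrix of parameter vectors it is the trace of the D×D sample covariance.

```python
import numpy as np

def scattering(params):
    """Diversity criterion: trace of the sample covariance of an (N, D) set."""
    return np.trace(np.cov(params, rowvar=False))
```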

IV Conclusion

3D face reconstruction from 2D images is a challenging task for which deep learning techniques have achieved promising results. Synthetic dataset generation is one way to prepare training data for such techniques. In this paper, we proposed an autoencoder network that learns the human face manifold in order to generate synthetic faces. Using the manifold makes the network capable of dealing with non-possible faces. We showed that the common existing methods for synthetic face generation sacrifice diversity in order to obtain non-corrupted faces, whereas the proposed network produces entirely possible faces without sacrificing diversity; in other words, it generates a highly diverse dataset without any non-possible faces. Experiments show that the diversity of the generated faces is improved by up to 8 and 19 times in terms of shape and expression, respectively, compared to the existing methods and datasets, and it can be further improved by adding more noise to the training datasets. Experiments also confirm that the trained network is robust against high noise levels and can denoise highly corrupted faces. The high reliability of the network outputs ensures that the proposed network can be placed in front of any 3D reconstruction network to improve output quality.

References

  • [1] A. Aristidou, J. Lasenby, Y. Chrysanthou, and A. Shamir (2018) Inverse kinematics techniques in computer graphics: a survey. In Computer Graphics Forum, Vol. 37, pp. 35–58.
  • [2] A. Bas, W. A. Smith, T. Bolkart, and S. Wuhrer (2016) Fitting a 3D morphable model to edges: a comparison between hard and soft correspondences. In Asian Conference on Computer Vision, pp. 377–391.
  • [3] V. Blanz, T. Vetter, et al. (1999) A morphable model for the synthesis of 3D faces. In SIGGRAPH, Vol. 99, pp. 187–194.
  • [4] S. Bouaziz, Y. Wang, and M. Pauly (2013) Online modeling for realtime facial animation. ACM Transactions on Graphics (ToG) 32 (4), pp. 40.
  • [5] C. Cao, Y. Weng, S. Zhou, Y. Tong, and K. Zhou (2013) FaceWarehouse: a 3D facial expression database for visual computing. IEEE Transactions on Visualization and Computer Graphics 20 (3), pp. 413–425.
  • [6] B. Chu, S. Romdhani, and L. Chen (2014) 3D-aided face recognition robust to expression and pose variations. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1899–1906.
  • [7] P. Dou, S. K. Shah, and I. A. Kakadiaris (2017) End-to-end 3D face reconstruction with deep neural networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 5908–5917.
  • [8] J. D. Durou, M. Falcone, and M. Sagona (2008) Numerical methods for shape-from-shading: a new survey with benchmarks. Computer Vision and Image Understanding.
  • [9] L. Goto, T. Huysmans, W. Lee, J. F. Molenbroek, and R. H. Goossens (2018) A comparison between representative 3D faces based on bi- and multi-variate and shape based analysis. In Congress of the International Ergonomics Association, pp. 1355–1364.
  • [10] Y. Guo, J. Zhang, J. Cai, B. Jiang, and J. Zheng (2018) CNN-based real-time dense face reconstruction with inverse-rendered photo-realistic face images. arXiv preprint arXiv:1708.00980.
  • [11] M. Han, S. Feng, C. L. P. Chen, M. Xu, and T. Qiu (2018) Structured manifold broad learning system: a manifold perspective for large-scale chaotic time series analysis and prediction. IEEE Transactions on Knowledge and Data Engineering.
  • [12] T. Hassner, S. Harel, E. Paz, and R. Enbar (2015) Effective face frontalization in unconstrained images. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 4295–4304.
  • [13] P. L. Hsieh, C. Ma, J. Yu, and H. Li (2015) Unconstrained realtime facial performance capture. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.
  • [14] P. Huber, G. Hu, R. Tena, P. Mortazavian, P. Koppen, W. J. Christmas, M. Ratsch, and J. Kittler (2016) A multiresolution 3D morphable face model and fitting framework. In Proceedings of the 11th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications.
  • [15] R. Ibanez, E. Abisset-Chavanne, J. V. Aguado, D. Gonzalez, E. Cueto, and F. Chinesta (2018) A manifold learning approach to data-driven computational elasticity and inelasticity. Archives of Computational Methods in Engineering 25 (1), pp. 47–57.
  • [16] A. E. Ichim, S. Bouaziz, and M. Pauly (2015) Dynamic 3D avatar creation from hand-held video input. ACM Transactions on Graphics.
  • [17] I. Jolliffe (2003) Principal component analysis. Technometrics 45 (3), pp. 276.
  • [18] A. Jourabloo and X. Liu (2016) Large-pose face alignment via CNN-based dense 3D model fitting. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
  • [19] I. Kemelmacher-Shlizerman and R. Basri (2011) 3D face reconstruction from a single image using a single reference face shape. IEEE Transactions on Pattern Analysis and Machine Intelligence.
  • [20] D. P. Kingma and J. Ba (2014) Adam: a method for stochastic optimization. arXiv preprint arXiv:1412.6980.
  • [21] M. Koestinger, P. Wohlhart, P. M. Roth, and H. Bischof (2011) Annotated facial landmarks in the wild: a large-scale, real-world database for facial landmark localization. In 2011 IEEE International Conference on Computer Vision Workshops (ICCV Workshops), pp. 2144–2151.
  • [22] H. Li, J. Yu, Y. Ye, and C. Bregler (2013) Realtime facial animation with on-the-fly correctives. ACM Transactions on Graphics.
  • [23] T. Lin and H. Zha (2008) Riemannian manifold learning. IEEE Transactions on Pattern Analysis and Machine Intelligence 30 (5), pp. 796–809.
  • [24] I. Masi, F. Chang, J. Choi, S. Harel, J. Kim, K. Kim, J. Leksut, S. Rawls, Y. Wu, T. Hassner, et al. (2019) Learning pose-aware models for pose-invariant face recognition in the wild. IEEE Transactions on Pattern Analysis and Machine Intelligence 41 (2), pp. 379–393.
  • [25] M. Meytlis and L. Sirovich (2007) On the dimensionality of face space. IEEE Transactions on Pattern Analysis and Machine Intelligence 29 (7), pp. 1262–1267.
  • [26] P. Paysan, R. Knothe, B. Amberg, S. Romdhani, and T. Vetter (2009) A 3D face model for pose and illumination invariant face recognition. In 2009 Sixth IEEE International Conference on Advanced Video and Signal Based Surveillance, pp. 296–301.
  • [27] E. Prados and O. Faugeras (2006) Shape from shading. In Handbook of Mathematical Models in Computer Vision.
  • [28] E. Richardson, M. Sela, and R. Kimmel (2016) 3D face reconstruction by learning from synthetic data. In Proceedings of the 2016 Fourth International Conference on 3D Vision (3DV).
  • [29] E. Richardson, M. Sela, R. Or-El, and R. Kimmel (2017) Learning detailed face reconstruction from a single image. In Proceedings of the 30th IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
  • [30] S. Romdhani and T. Vetter (2005) Estimating 3D shape and texture using pixel intensity, edges, specular highlights, texture constraints and a prior. In 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05), Vol. 2, pp. 986–993.
  • [31] J. Roth, Y. Tong, and X. Liu (2015) Unconstrained 3D face reconstruction. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.
  • [32] J. Roth, Y. Tong, and X. Liu (2017) Adaptive 3D face reconstruction from unconstrained photo collections. IEEE Transactions on Pattern Analysis and Machine Intelligence.
  • [33] C. Sagonas, G. Tzimiropoulos, S. Zafeiriou, and M. Pantic (2013) 300 faces in-the-wild challenge: the first facial landmark localization challenge. In Proceedings of the IEEE International Conference on Computer Vision Workshops, pp. 397–403.
  • [34] K. Scherbaum, T. Ritschel, M. Hullin, T. Thormählen, V. Blanz, and H. Seidel (2011) Computer-suggested facial makeup. In Computer Graphics Forum, Vol. 30, pp. 485–492.
  • [35] M. Sela, E. Richardson, and R. Kimmel (2017) Unrestricted facial geometry reconstruction using image-to-image translation. In Proceedings of the IEEE International Conference on Computer Vision.
  • [36] M. Sela, N. Toledo, Y. Honen, and R. Kimmel (2016) Customized facial constant positive air pressure (CPAP) masks. arXiv preprint arXiv:1609.07049.
  • [37] L. Sirovich and M. Kirby (1987) Low-dimensional procedure for the characterization of human faces. JOSA A 4 (3), pp. 519–524.
  • [38] A. T. Tran, T. Hassner, I. Masi, and G. Medioni (2017) Regressing robust and discriminative 3D morphable models with a very deep neural network. In Proceedings of the 30th IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
  • [39] A. Tuấn Trần, T. Hassner, I. Masi, E. Paz, Y. Nirkin, and G. Medioni (2018) Extreme 3D face reconstruction: seeing through occlusions. In The IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
  • [40] W. Wang, Y. Yan, F. Nie, S. Yan, and N. Sebe (2018) Flexible manifold learning with optimal graph for image and video representation. IEEE Transactions on Image Processing 27 (6), pp. 2664–2675.
  • [41] W. Y. Zhao and R. Chellappa (2002) Illumination-insensitive face recognition using symmetric shape-from-shading.
  • [42] C. Wu, B. Wilburn, Y. Matsushita, and C. Theobalt (2011) High-quality shape from multi-view stereo and shading under general illumination. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.
  • [43] R. Zhang, P. Tsai, J. E. Cryer, and M. Shah (1999) Shape-from-shading: a survey. IEEE Transactions on Pattern Analysis and Machine Intelligence 21 (8), pp. 690–706.
  • [44] W. Y. Zhao and R. Chellappa (2001) Symmetric shape-from-shading using self-ratio image. International Journal of Computer Vision.
  • [45] X. Zhu, Z. Lei, X. Liu, H. Shi, and S. Z. Li (2016) Face alignment across large poses: a 3D solution. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 146–155.
  • [46] X. Zhu, Z. Lei, J. Yan, D. Yi, and S. Z. Li (2015) High-fidelity pose and expression normalization for face recognition in the wild. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 787–796.
  • [47] X. Zhu, X. Liu, Z. Lei, and S. Z. Li (2019) Face alignment in full pose range: a 3D total solution. IEEE Transactions on Pattern Analysis and Machine Intelligence.

Iv Conclusion

3D face reconstruction using 2D images is a challenging task where deep learning techniques achieved promising results. Synthetic dataset generation is a way to prepare the training data sets for such techniques. In this paper, we proposed an autoencoder network which learns the human face manifold to generate synthetic faces. Using manifold, makes the network capable to deal with the non-possible faces. We showed that the common existing methods for synthetic face generation reduce diversity in order to have non-corrupted faces. The proposed network produces completely possible faces without sacrificing the diversity. In other words, it generates highly diverse dataset without any non-possible faces. Experiments show that the diversity of the generated faces could be improved up to 8 and 19 times in terms of the shape and expression, respectively, in comparison to the the existing methods and datasets, which also could be further improved by adding more noises to the training datasets. Experiments also confirm that the trained network is robust against high MSE values of noise and can denoise highly corrupted faces. The high reliability in the network results ( in the expression and in the shape of the generated faces) insures that the proposed network could be placed at the top of any 3D reconstruction network to improve the output quality.

References

  • [1] A. Aristidou, J. Lasenby, Y. Chrysanthou, and A. Shamir (2018) Inverse kinematics techniques in computer graphics: a survey. In Computer Graphics Forum, Vol. 37, pp. 35–58. Cited by: §I.
  • [2] A. Bas, W. A. Smith, T. Bolkart, and S. Wuhrer (2016) Fitting a 3d morphable model to edges: a comparison between hard and soft correspondences. In Asian Conference on Computer Vision, pp. 377–391. Cited by: §I.
  • [3] V. Blanz, T. Vetter, et al. (1999) A morphable model for the synthesis of 3d faces.. In Siggraph, Vol. 99, pp. 187–194. Cited by: §I, §II-A.
  • [4] S. Bouaziz, Y. Wang, and M. Pauly (2013) Online modeling for realtime facial animation. ACM Transactions on Graphics (ToG) 32 (4), pp. 40. Cited by: §I.
  • [5] C. Cao, Y. Weng, S. Zhou, Y. Tong, and K. Zhou (2013) Facewarehouse: a 3d facial expression database for visual computing. IEEE Transactions on Visualization and Computer Graphics 20 (3), pp. 413–425. Cited by: §II-A.
  • [6] B. Chu, S. Romdhani, and L. Chen (2014) 3D-aided face recognition robust to expression and pose variations. In

    Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition

    ,
    pp. 1899–1906. Cited by: §II-A.
  • [7] P. Dou, S. K. Shah, and I. A. Kakadiaris (2017) End-to-end 3d face reconstruction with deep neural networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 5908–5917. Cited by: §I, §II-A.
  • [8] J. D. Durou, M. Falcone, and M. Sagona (2008) Numerical methods for shape-from-shading: A new survey with benchmarks. Computer Vision and Image Understanding. External Links: Document, ISSN 10773142 Cited by: §I.
  • [9] L. Goto, T. Huysmans, W. Lee, J. F. Molenbroek, and R. H. Goossens (2018) A comparison between representative 3d faces based on bi-and multi-variate and shape based analysis. In Congress of the International Ergonomics Association, pp. 1355–1364. Cited by: §I.
  • [10] Y. Guo, J. Zhang, J. Cai, B. Jiang, and J. Zheng (2018) CNN-based Real-time Dense Face Reconstruction with Inverse-rendered Photo-realistic Face Images. External Links: Document, 1708.00980v3, ISSN 19393539 Cited by: §I, §II-A.
  • [11] M. Han, S. Feng, C. L. P. Chen, M. Xu, and T. Qiu (2018) Structured manifold broad learning system: a manifold perspective for large-scale chaotic time series analysis and prediction. IEEE Transactions on Knowledge and Data Engineering (), pp. 1–1. External Links: Document, ISSN 1041-4347 Cited by: §I.
  • [12] T. Hassner, S. Harel, E. Paz, and R. Enbar (2015) Effective face frontalization in unconstrained images. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 4295–4304. Cited by: §I.
  • [13] P. L. Hsieh, C. Ma, J. Yu, and H. Li (2015) Unconstrained realtime facial performance capture. In Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, External Links: Document, ISBN 9781467369640, ISSN 10636919 Cited by: §I.
  • [14] P. Huber, G. Hu, R. Tena, P. Mortazavian, P. Koppen, W. J. Christmas, M. Ratsch, and J. Kittler (2016) A multiresolution 3d morphable face model and fitting framework. In Proceedings of the 11th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications, Cited by: §I.
  • [15] R. Ibanez, E. Abisset-Chavanne, J. V. Aguado, D. Gonzalez, E. Cueto, and F. Chinesta (2018) A manifold learning approach to data-driven computational elasticity and inelasticity. Archives of Computational Methods in Engineering 25 (1), pp. 47–57. Cited by: §I.
  • [16] A. E. Ichim, S. Bouaziz, and M. Pauly (2015) Dynamic 3D avatar creation from hand-held video input. ACM Transactions on Graphics. External Links: Document, ISSN 07300301 Cited by: §I.
  • [17] I. Jolliffe (2003) Principal component analysis. Technometrics 45 (3), pp. 276. Cited by: §II-A.
  • [18] A. Jourabloo and X. Liu (2016) Large-pose Face Alignment via CNN-based Dense 3D Model Fitting. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), External Links: Document, ISBN 9781467388511, ISSN 10636919 Cited by: §I.
  • [19] I. Kemelmacher-Shlizerman and R. Basri (2011) 3D face reconstruction from a single image using a single reference face shape. IEEE Transactions on Pattern Analysis and Machine Intelligence. External Links: Document, ISSN 01628828 Cited by: §I.
  • [20] D. P. Kingma and J. Ba (2014) Adam: A Method for Stochastic Optimization. pp. 1–15. External Links: 1412.6980, Link Cited by: §II-D.
  • [21] M. Koestinger, P. Wohlhart, P. M. Roth, and H. Bischof (2011) Annotated facial landmarks in the wild: a large-scale, real-world database for facial landmark localization. In 2011 IEEE international conference on computer vision workshops (ICCV workshops), pp. 2144–2151. Cited by: §II-C.
  • [22] H. Li, J. Yu, Y. Ye, and C. Bregler (2013) Realtime facial animation with on-the-fly correctives. ACM Transactions on Graphics. External Links: Document, 1111.6189v1, ISBN 0730-0301, ISSN 07300301 Cited by: §I.
  • [23] T. Lin and H. Zha (2008-05) Riemannian manifold learning. IEEE Transactions on Pattern Analysis and Machine Intelligence 30 (5), pp. 796–809. External Links: Document, ISSN 0162-8828 Cited by: §I.
  • [24] I. Masi, F. Chang, J. Choi, S. Harel, J. Kim, K. Kim, J. Leksut, S. Rawls, Y. Wu, T. Hassner, et al. (2019) Learning pose-aware models for pose-invariant face recognition in the wild. IEEE transactions on pattern analysis and machine intelligence 41 (2), pp. 379–393. Cited by: §I.
  • [25] M. Meytlis and L. Sirovich (2007) On the dimensionality of face space. IEEE Transactions on Pattern Analysis and Machine Intelligence 29 (7), pp. 1262–1267. Cited by: §I.
  • [26] P. Paysan, R. Knothe, B. Amberg, S. Romdhani, and T. Vetter (2009) A 3d face model for pose and illumination invariant face recognition. In 2009 Sixth IEEE International Conference on Advanced Video and Signal Based Surveillance, pp. 296–301. Cited by: §II-A.
  • [27] E. Prados and O. Faugeras (2006) Shape from shading. In Handbook of Mathematical Models in Computer Vision, External Links: Document, ISBN 0387263713 Cited by: §I.
  • [28] E. Richardson, M. Sela, and R. Kimmel (2016) 3D face reconstruction by learning from synthetic data. In Proceedings - 2016 4th International Conference on 3D Vision, 3DV 2016, External Links: Document, 1609.04387v2, ISBN 9781509054077 Cited by: §I, §II-A.
  • [29] E. Richardson, M. Sela, R. Or-El, and R. Kimmel (2017) Learning detailed face reconstruction from a single image. In Proceedings - 30th IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017, External Links: Document, ISBN 9781538604571 Cited by: §I, §II-A.
  • [30] S. Romdhani and T. Vetter (2005) Estimating 3d shape and texture using pixel intensity, edges, specular highlights, texture constraints and a prior. In 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR’05), Vol. 2, pp. 986–993. Cited by: §II-C.
  • [31] J. Roth, Y. Tong, and X. Liu (2015) Unconstrained 3D face reconstruction. In Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, External Links: Document, ISBN 9781467369640, ISSN 10636919 Cited by: §I.
  • [32] J. Roth, Y. Tong, and X. Liu (2017) Adaptive 3D Face Reconstruction from Unconstrained Photo Collections. IEEE Transactions on Pattern Analysis and Machine Intelligence. External Links: Document, ISBN 978-1-4673-8851-1, ISSN 01628828 Cited by: §I.
  • [33] C. Sagonas, G. Tzimiropoulos, S. Zafeiriou, and M. Pantic (2013) 300 faces in-the-wild challenge: the first facial landmark localization challenge. In Proceedings of the IEEE International Conference on Computer Vision Workshops, pp. 397–403. Cited by: §II-C.
  • [34] K. Scherbaum, T. Ritschel, M. Hullin, T. Thormählen, V. Blanz, and H. Seidel (2011) Computer-suggested facial makeup. In Computer Graphics Forum, Vol. 30, pp. 485–492. Cited by: §I.
  • [35] M. Sela, E. Richardson, and R. Kimmel (2017) Unrestricted Facial Geometry Reconstruction Using Image-to-Image Translation. In Proceedings of the IEEE International Conference on Computer Vision, External Links: Document, arXiv:1703.10131v2, ISBN 9781538610329, ISSN 15505499 Cited by: §I, §II-A.
  • [36] M. Sela, N. Toledo, Y. Honen, and R. Kimmel (2016) Customized facial constant positive air pressure (cpap) masks. arXiv preprint arXiv:1609.07049. Cited by: §I.
  • [37] L. Sirovich and M. Kirby (1987) Low-dimensional procedure for the characterization of human faces. Josa a 4 (3), pp. 519–524. Cited by: §I.
  • [38] A. T. Tran, T. Hassner, I. Masi, and G. Medioni (2017) Regressing robust and discriminative 3D morphable models with a very deep neural network. In Proceedings - 30th IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017, External Links: Document, 1612.04904, ISBN 9781538604571, ISSN 1063-6919 Cited by: §I.
  • [39] A. Tu?n Tr?n, T. Hassner, I. Masi, E. Paz, Y. Nirkin, and G. r. Medioni (2018-06) Extreme 3d face reconstruction: seeing through occlusions. In The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Cited by: §I.
  • [40] W. Wang, Y. Yan, F. Nie, S. Yan, and N. Sebe (2018-06) Flexible manifold learning with optimal graph for image and video representation. IEEE Transactions on Image Processing 27 (6), pp. 2664–2675. External Links: Document, ISSN 1057-7149 Cited by: §I.
  • [41] Wen Yi Zhao and R. Chellappa (2002) Illumination-insensitive face recognition using symmetric shape-from-shading. External Links: Document Cited by: §I.
  • [42] C. Wu, B. Wilburn, Y. Matsushita, and C. Theobalt (2011) High-quality shape from multi-view stereo and shading under general illumination. In Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, External Links: Document, ISBN 9781457703942, ISSN 10636919 Cited by: §I.
  • [43] R. Zhang, P. Tsai, J. E. Cryer, and M. Shah (1999) Shape-from-shading: a survey. IEEE transactions on pattern analysis and machine intelligence 21 (8), pp. 690–706. Cited by: §I.
  • [44] W. Y. Zhao and R. Chellappa (2001) Symmetric shape-from-shading using self-ratio image. International Journal of Computer Vision. External Links: Document, ISSN 09205691 Cited by: §I.
  • [45] X. Zhu, Z. Lei, X. Liu, H. Shi, and S. Z. Li (2016) Face alignment across large poses: a 3d solution. In Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 146–155. Cited by: §II-C, §II-C, §II-C.
  • [46] X. Zhu, Z. Lei, J. Yan, D. Yi, and S. Z. Li (2015) High-fidelity pose and expression normalization for face recognition in the wild. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 787–796. Cited by: §I.
  • [47] X. Zhu, X. Liu, Z. Lei, and S. Z. Li (2019) Face Alignment in Full Pose Range: A 3D Total Solution. IEEE Transactions on Pattern Analysis and Machine Intelligence. External Links: Document, arXiv:1804.01005, ISSN 19393539 Cited by: §I.

References

  • [1] A. Aristidou, J. Lasenby, Y. Chrysanthou, and A. Shamir (2018) Inverse kinematics techniques in computer graphics: a survey. In Computer Graphics Forum, Vol. 37, pp. 35–58. Cited by: §I.
  • [2] A. Bas, W. A. Smith, T. Bolkart, and S. Wuhrer (2016) Fitting a 3d morphable model to edges: a comparison between hard and soft correspondences. In Asian Conference on Computer Vision, pp. 377–391. Cited by: §I.
  • [3] V. Blanz, T. Vetter, et al. (1999) A morphable model for the synthesis of 3d faces.. In Siggraph, Vol. 99, pp. 187–194. Cited by: §I, §II-A.
  • [4] S. Bouaziz, Y. Wang, and M. Pauly (2013) Online modeling for realtime facial animation. ACM Transactions on Graphics (ToG) 32 (4), pp. 40. Cited by: §I.
  • [5] C. Cao, Y. Weng, S. Zhou, Y. Tong, and K. Zhou (2013) Facewarehouse: a 3d facial expression database for visual computing. IEEE Transactions on Visualization and Computer Graphics 20 (3), pp. 413–425. Cited by: §II-A.
  • [6] B. Chu, S. Romdhani, and L. Chen (2014) 3D-aided face recognition robust to expression and pose variations. In

    Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition

    ,
    pp. 1899–1906. Cited by: §II-A.
  • [7] P. Dou, S. K. Shah, and I. A. Kakadiaris (2017) End-to-end 3d face reconstruction with deep neural networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 5908–5917. Cited by: §I, §II-A.
  • [8] J. D. Durou, M. Falcone, and M. Sagona (2008) Numerical methods for shape-from-shading: A new survey with benchmarks. Computer Vision and Image Understanding. External Links: Document, ISSN 10773142 Cited by: §I.
  • [9] L. Goto, T. Huysmans, W. Lee, J. F. Molenbroek, and R. H. Goossens (2018) A comparison between representative 3d faces based on bi-and multi-variate and shape based analysis. In Congress of the International Ergonomics Association, pp. 1355–1364. Cited by: §I.
  • [10] Y. Guo, J. Zhang, J. Cai, B. Jiang, and J. Zheng (2018) CNN-based Real-time Dense Face Reconstruction with Inverse-rendered Photo-realistic Face Images. External Links: Document, 1708.00980v3, ISSN 19393539 Cited by: §I, §II-A.
  • [11] M. Han, S. Feng, C. L. P. Chen, M. Xu, and T. Qiu (2018) Structured manifold broad learning system: a manifold perspective for large-scale chaotic time series analysis and prediction. IEEE Transactions on Knowledge and Data Engineering (), pp. 1–1. External Links: Document, ISSN 1041-4347 Cited by: §I.
  • [12] T. Hassner, S. Harel, E. Paz, and R. Enbar (2015) Effective face frontalization in unconstrained images. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 4295–4304. Cited by: §I.
  • [13] P. L. Hsieh, C. Ma, J. Yu, and H. Li (2015) Unconstrained realtime facial performance capture. In Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, External Links: Document, ISBN 9781467369640, ISSN 10636919 Cited by: §I.
  • [14] P. Huber, G. Hu, R. Tena, P. Mortazavian, P. Koppen, W. J. Christmas, M. Ratsch, and J. Kittler (2016) A multiresolution 3d morphable face model and fitting framework. In Proceedings of the 11th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications, Cited by: §I.
  • [15] R. Ibanez, E. Abisset-Chavanne, J. V. Aguado, D. Gonzalez, E. Cueto, and F. Chinesta (2018) A manifold learning approach to data-driven computational elasticity and inelasticity. Archives of Computational Methods in Engineering 25 (1), pp. 47–57. Cited by: §I.
  • [16] A. E. Ichim, S. Bouaziz, and M. Pauly (2015) Dynamic 3D avatar creation from hand-held video input. ACM Transactions on Graphics. External Links: Document, ISSN 07300301 Cited by: §I.
  • [17] I. Jolliffe (2003) Principal component analysis. Technometrics 45 (3), pp. 276. Cited by: §II-A.
  • [18] A. Jourabloo and X. Liu (2016) Large-pose face alignment via CNN-based dense 3D model fitting. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), External Links: Document, ISBN 9781467388511, ISSN 10636919 Cited by: §I.
  • [19] I. Kemelmacher-Shlizerman and R. Basri (2011) 3D face reconstruction from a single image using a single reference face shape. IEEE Transactions on Pattern Analysis and Machine Intelligence. External Links: Document, ISSN 01628828 Cited by: §I.
  • [20] D. P. Kingma and J. Ba (2014) Adam: a method for stochastic optimization. pp. 1–15. External Links: 1412.6980 Cited by: §II-D.
  • [21] M. Koestinger, P. Wohlhart, P. M. Roth, and H. Bischof (2011) Annotated facial landmarks in the wild: a large-scale, real-world database for facial landmark localization. In 2011 IEEE international conference on computer vision workshops (ICCV workshops), pp. 2144–2151. Cited by: §II-C.
  • [22] H. Li, J. Yu, Y. Ye, and C. Bregler (2013) Realtime facial animation with on-the-fly correctives. ACM Transactions on Graphics. External Links: Document, 1111.6189v1, ISSN 07300301 Cited by: §I.
  • [23] T. Lin and H. Zha (2008-05) Riemannian manifold learning. IEEE Transactions on Pattern Analysis and Machine Intelligence 30 (5), pp. 796–809. External Links: Document, ISSN 0162-8828 Cited by: §I.
  • [24] I. Masi, F. Chang, J. Choi, S. Harel, J. Kim, K. Kim, J. Leksut, S. Rawls, Y. Wu, T. Hassner, et al. (2019) Learning pose-aware models for pose-invariant face recognition in the wild. IEEE Transactions on Pattern Analysis and Machine Intelligence 41 (2), pp. 379–393. Cited by: §I.
  • [25] M. Meytlis and L. Sirovich (2007) On the dimensionality of face space. IEEE Transactions on Pattern Analysis and Machine Intelligence 29 (7), pp. 1262–1267. Cited by: §I.
  • [26] P. Paysan, R. Knothe, B. Amberg, S. Romdhani, and T. Vetter (2009) A 3d face model for pose and illumination invariant face recognition. In 2009 Sixth IEEE International Conference on Advanced Video and Signal Based Surveillance, pp. 296–301. Cited by: §II-A.
  • [27] E. Prados and O. Faugeras (2006) Shape from shading. In Handbook of Mathematical Models in Computer Vision, External Links: Document, ISBN 0387263713 Cited by: §I.
  • [28] E. Richardson, M. Sela, and R. Kimmel (2016) 3D face reconstruction by learning from synthetic data. In Proceedings - 2016 4th International Conference on 3D Vision, 3DV 2016, External Links: Document, 1609.04387v2, ISBN 9781509054077 Cited by: §I, §II-A.
  • [29] E. Richardson, M. Sela, R. Or-El, and R. Kimmel (2017) Learning detailed face reconstruction from a single image. In Proceedings - 30th IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017, External Links: Document, ISBN 9781538604571 Cited by: §I, §II-A.
  • [30] S. Romdhani and T. Vetter (2005) Estimating 3d shape and texture using pixel intensity, edges, specular highlights, texture constraints and a prior. In 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR’05), Vol. 2, pp. 986–993. Cited by: §II-C.
  • [31] J. Roth, Y. Tong, and X. Liu (2015) Unconstrained 3D face reconstruction. In Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, External Links: Document, ISBN 9781467369640, ISSN 10636919 Cited by: §I.
  • [32] J. Roth, Y. Tong, and X. Liu (2017) Adaptive 3D face reconstruction from unconstrained photo collections. IEEE Transactions on Pattern Analysis and Machine Intelligence. External Links: Document, ISBN 978-1-4673-8851-1, ISSN 01628828 Cited by: §I.
  • [33] C. Sagonas, G. Tzimiropoulos, S. Zafeiriou, and M. Pantic (2013) 300 faces in-the-wild challenge: the first facial landmark localization challenge. In Proceedings of the IEEE International Conference on Computer Vision Workshops, pp. 397–403. Cited by: §II-C.
  • [34] K. Scherbaum, T. Ritschel, M. Hullin, T. Thormählen, V. Blanz, and H. Seidel (2011) Computer-suggested facial makeup. In Computer Graphics Forum, Vol. 30, pp. 485–492. Cited by: §I.
  • [35] M. Sela, E. Richardson, and R. Kimmel (2017) Unrestricted facial geometry reconstruction using image-to-image translation. In Proceedings of the IEEE International Conference on Computer Vision, External Links: Document, 1703.10131v2, ISBN 9781538610329, ISSN 15505499 Cited by: §I, §II-A.
  • [36] M. Sela, N. Toledo, Y. Honen, and R. Kimmel (2016) Customized facial constant positive air pressure (CPAP) masks. arXiv preprint arXiv:1609.07049. Cited by: §I.
  • [37] L. Sirovich and M. Kirby (1987) Low-dimensional procedure for the characterization of human faces. JOSA A 4 (3), pp. 519–524. Cited by: §I.
  • [38] A. T. Tran, T. Hassner, I. Masi, and G. Medioni (2017) Regressing robust and discriminative 3D morphable models with a very deep neural network. In Proceedings - 30th IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017, External Links: Document, 1612.04904, ISBN 9781538604571, ISSN 1063-6919 Cited by: §I.
  • [39] A. T. Tran, T. Hassner, I. Masi, E. Paz, Y. Nirkin, and G. Medioni (2018-06) Extreme 3d face reconstruction: seeing through occlusions. In The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Cited by: §I.
  • [40] W. Wang, Y. Yan, F. Nie, S. Yan, and N. Sebe (2018-06) Flexible manifold learning with optimal graph for image and video representation. IEEE Transactions on Image Processing 27 (6), pp. 2664–2675. External Links: Document, ISSN 1057-7149 Cited by: §I.
  • [41] W. Y. Zhao and R. Chellappa (2002) Illumination-insensitive face recognition using symmetric shape-from-shading. External Links: Document Cited by: §I.
  • [42] C. Wu, B. Wilburn, Y. Matsushita, and C. Theobalt (2011) High-quality shape from multi-view stereo and shading under general illumination. In Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, External Links: Document, ISBN 9781457703942, ISSN 10636919 Cited by: §I.
  • [43] R. Zhang, P. Tsai, J. E. Cryer, and M. Shah (1999) Shape-from-shading: a survey. IEEE Transactions on Pattern Analysis and Machine Intelligence 21 (8), pp. 690–706. Cited by: §I.
  • [44] W. Y. Zhao and R. Chellappa (2001) Symmetric shape-from-shading using self-ratio image. International Journal of Computer Vision. External Links: Document, ISSN 09205691 Cited by: §I.
  • [45] X. Zhu, Z. Lei, X. Liu, H. Shi, and S. Z. Li (2016) Face alignment across large poses: a 3d solution. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 146–155. Cited by: §II-C.
  • [46] X. Zhu, Z. Lei, J. Yan, D. Yi, and S. Z. Li (2015) High-fidelity pose and expression normalization for face recognition in the wild. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 787–796. Cited by: §I.
  • [47] X. Zhu, X. Liu, Z. Lei, and S. Z. Li (2019) Face alignment in full pose range: a 3D total solution. IEEE Transactions on Pattern Analysis and Machine Intelligence. External Links: Document, 1804.01005, ISSN 19393539 Cited by: §I.