1 Introduction
Significant progress has been observed during the past years in the field of sparse and dense 3D face alignment [9, 11, 21, 22, 2]. Recent developments include the utilization of Deep Neural Networks (DNNs) for the estimation of 3D facial structure, as well as a methodology for fitting a 3D Morphable Model (3DMM) to “in-the-wild” images [2]. Additionally, several benchmarks for training sparse 3D face alignment models have recently been developed [21, 2]. The utilization of these methods introduces new challenges and opportunities as far as facial texture is concerned. In particular, by sampling over the fitted image, a 2D UV map of the facial texture can be constructed. In order to further motivate the proposed method, in Fig. 1 we depict an example of such a facial UV map. As evinced by the figure, facial UV maps contain a considerable amount of missing data (pixels) due to factors such as self-occlusion. Nevertheless, they do not suffer from warping effects, in contrast to the facial images produced by a 2D face alignment algorithm (Fig. 1). Utilizing facial UV maps for the discovery of latent components suitable for specific tasks (such as age or illumination transfer) requires the design of statistical component analysis methods that (a) can appropriately handle missing values, (b) can alleviate problems arising from gross errors, and (c) can exploit any labels/attributes that are available. To tackle the aforementioned issues, it is natural to adopt techniques from the family of robust component analysis.
In the past years, significant research has been conducted on formulating robust component analysis techniques. Arguably, the most prominent example is the Robust PCA (RPCA) algorithm [3], which has also been extended to handle missing values in [20]. RPCA with missing values has recently proven extremely useful for extracting a low-rank subspace of facial UV textures that is free of gross errors, rendering it extremely useful for fitting 3DMMs “in-the-wild” [2]. Nevertheless, RPCA is an unsupervised component analysis technique, and hence does not take into account the various attributes/annotations that may be present in the data at hand.
Other recent robust component analysis methods include Robust Correlated and Individual Component Analysis (RCICA) [14], as well as Robust Joint and Individual Variance Explained (RJIVE) [16]. RCICA robustly recovers both the correlated and the individual low-rank components of two views of noisy data, and can therefore be interpreted as a robust extension of Canonical Correlation Analysis (CCA) [19]. Nevertheless, RCICA is not designed to utilize labels or any available annotations. RJIVE extends RCICA by extracting low-rank subspaces from multiple views, but only in the presence of a single attribute (e.g., if the data at hand are annotated for the attribute age, then they may be split into different age-groups and each age-group can be considered a different view). As a result, data that are annotated in terms of multiple attributes (such as identity and age) cannot be fully exploited by RJIVE. Finally, neither of the aforementioned methods has been extended to deal with missing values, and thus they cannot be directly applied to the analysis of facial UV maps.
To alleviate the shortcomings of the previously mentioned methods with respect to the problem of facial UV analysis, in this paper we introduce Multi-Attribute Robust Component Analysis, dubbed MARCA. In summary, the contributions of the paper are as follows.

We introduce MARCA, a novel component analysis technique which recovers components that robustly capture the shared and the individual variation of data under a multi-attribute scenario. Furthermore, MARCA is inherently able to handle observations with missing values, as well as sparse, gross corruptions.

We demonstrate that MARCA can be applied to a number of challenging problems, such as completion of missing data in the texture of a reconstructed 3D facial image and transfer of multiple attributes in images captured “in-the-wild” (e.g., illumination, identity and age).

We show that the components obtained by MARCA can be used to train deep learning systems for various tasks.
2 Multi-Attribute Robust Component Analysis
2.1 Preliminaries
Prior to delving into the model, a few explanations regarding the notation used throughout the paper are provided. Lowercase letters, e.g., $x$, denote scalars, while lowercase (uppercase) bold letters denote vectors (matrices), e.g., $\mathbf{x}$ ($\mathbf{X}$). Moreover, the $\ell_1$ ($\ell_2$) vector norm is defined as $\|\mathbf{x}\|_1 = \sum_i |x_i|$ ($\|\mathbf{x}\|_2 = \sqrt{\sum_i x_i^2}$). Similarly, the $\ell_1$ (Frobenius) matrix norm is defined as $\|\mathbf{X}\|_1 = \sum_{i,j} |x_{ij}|$ ($\|\mathbf{X}\|_F = \sqrt{\sum_{i,j} x_{ij}^2}$). The nuclear norm of a matrix $\mathbf{X}$, i.e., the sum of its singular values, is denoted as $\|\mathbf{X}\|_*$. The Hadamard, i.e., element-wise, product of two matrices $\mathbf{X}$ and $\mathbf{Y}$ is denoted as $\mathbf{X} \odot \mathbf{Y}$. Finally, we provide the following operator definitions, which will be utilized in the mathematical derivations required for MARCA.
Procrustes operator: $\mathcal{P}[\mathbf{Z}] = \mathbf{U}\mathbf{V}^\top$, where $\mathbf{U}$ and $\mathbf{V}$ are given by the rank-$r$ Singular Value Decomposition (SVD) of $\mathbf{Z}$, i.e., $\mathbf{Z} = \mathbf{U}\boldsymbol{\Sigma}\mathbf{V}^\top$.

Shrinkage operator: $\mathcal{S}_\tau[x] = \mathrm{sign}(x)\max(|x| - \tau, 0)$, applied element-wise when the argument is a matrix.

Singular Value Thresholding (SVT) operator: $\mathcal{D}_\tau[\mathbf{Z}] = \mathbf{U}\mathcal{S}_\tau[\boldsymbol{\Sigma}]\mathbf{V}^\top$, where $\mathbf{U}$ and $\mathbf{V}$ are given by the rank-$r$ SVD of $\mathbf{Z}$, i.e., $\mathbf{Z} = \mathbf{U}\boldsymbol{\Sigma}\mathbf{V}^\top$.
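The three operators can be sketched in a few lines of NumPy (the function names are ours, not the paper's):

```python
import numpy as np

def shrink(X, tau):
    """Element-wise shrinkage (soft-thresholding) operator S_tau."""
    return np.sign(X) * np.maximum(np.abs(X) - tau, 0.0)

def svt(Z, tau):
    """Singular Value Thresholding operator D_tau: soft-threshold the
    singular values of Z, keeping the singular vectors fixed."""
    U, s, Vt = np.linalg.svd(Z, full_matrices=False)
    return U @ np.diag(shrink(s, tau)) @ Vt

def procrustes(Z):
    """Procrustes operator P: the closest matrix with orthonormal columns,
    obtained by dropping the singular values of Z."""
    U, _, Vt = np.linalg.svd(Z, full_matrices=False)
    return U @ Vt
```

Note that `procrustes` always returns a matrix with orthonormal columns, which is exactly what the orthonormality constraints on the bases require.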
2.2 Problem formulation
Without any loss of generality, suppose that the incomplete UV maps at hand, contaminated with gross errors, are annotated for $A$ attributes (e.g., identity, age, etc.), where each attribute $i$ may have $K_i$ different instantiations (e.g., attribute identity may have the instantiations Frank Sinatra, Albert Einstein, etc.). Moreover, assume that there is a total of $N$ samples in the training set. The aim of MARCA is to robustly extract, during training, joint components $\mathbf{C}_i$ corresponding to the available attributes, an individual component $\mathbf{M}$ which captures the rest of the data information that cannot be explained by the components $\mathbf{C}_i$, and a component $\mathbf{E}$ which captures the gross but sparse errors. Let the training data be concatenated in a column-wise manner, i.e., $\mathbf{X} = [\mathbf{x}_1, \ldots, \mathbf{x}_N]$, where each $\mathbf{x}_j$ is a vectorized form of a facial UV map. Then MARCA admits the following decomposition.
$\mathbf{X} = \sum_{i=1}^{A} \mathbf{C}_i + \mathbf{M} + \mathbf{E}$ (1)
where $\mathbf{C}_i$, $i = 1, \ldots, A$, are the shared components for every attribute, $\mathbf{M}$ is the individual component and $\mathbf{E}$ is the error component.
Nevertheless, the $\mathbf{C}_i$, $i = 1, \ldots, A$, must have a specific low-rank structure which accounts for the different instantiations of each attribute. That is, every attribute should be rendered by a base and, subsequently, every corresponding instantiation should be rendered by a selector on that base. Therefore, (1) is reformulated as follows.
$\mathbf{X} = \sum_{i=1}^{A} \mathbf{B}_i \mathbf{S}_i + \mathbf{M} + \mathbf{E}$ (2)
where $\mathbf{B}_i$, $i = 1, \ldots, A$, are the bases that render each attribute and $\mathbf{S}_i$, $i = 1, \ldots, A$, are comprised of the shared selectors $\mathbf{s}_i^{(j)}$, which render a specific instantiation for an attribute (e.g., assuming that base $\mathbf{B}_1$ renders attribute identity, then a selector on $\mathbf{B}_1$ would render a particular instantiation of this attribute, e.g., Albert Einstein). It should be noted that data which bear the same instantiation for a particular attribute (e.g., multiple data with instantiation Albert Einstein) share the same selector. Furthermore, the low-rank component $\mathbf{M}$ renders the individual variation for all of the images in the training set that cannot be explained by the existing attributes. Finally, $\mathbf{E}$ encapsulates gross errors (such as occlusions, pixel corruptions, etc.) for all of the data samples in the training set.
In order to recover components $\{\mathbf{B}_i, \mathbf{S}_i\}$ and $\mathbf{M}$ which are as informative as possible, the error term $\mathbf{E}$, which accounts for the existence of gross but sparse errors in the visible parts of the UVs, has to be minimized. This is equivalent to minimizing the $\ell_1$ norm of the error term [5] for the visible parts of the UV maps. The problem is then formulated as follows.
$\min_{\{\mathbf{B}_i, \mathbf{S}_i\}, \mathbf{M}, \mathbf{E}} \|\mathbf{W} \odot \mathbf{E}\|_1$, s.t. $\mathbf{W} \odot \mathbf{X} = \mathbf{W} \odot (\sum_{i=1}^{A} \mathbf{B}_i \mathbf{S}_i + \mathbf{M} + \mathbf{E})$, $\mathbf{B}_i^\top \mathbf{B}_i = \mathbf{I}$, $\mathrm{rank}(\mathbf{M}) \le r$ (3)
where $\mathbf{W} = [\mathbf{w}_1, \ldots, \mathbf{w}_N]$ and $r$ is a hyperparameter. Moreover, $\mathbf{w}_j$, $j = 1, \ldots, N$, is the corresponding vectorized occlusion mask for each UV map $\mathbf{x}_j$. The visible (missing) pixels for each UV map correspond to ones (zeros) in each matching occlusion mask. Orthonormality constraints on the bases facilitate the recovery of unique and identifiable selectors. Because $r$ is a hyperparameter, a large number of experiments would be required to estimate the optimal rank for $\mathbf{M}$. Since $\mathrm{rank}(\mathbf{M})$ is upper bounded, the following relaxed decomposition can be used to automatically recover the optimal rank for $\mathbf{M}$.
$\min_{\{\mathbf{B}_i, \mathbf{S}_i\}, \mathbf{M}, \mathbf{E}} \|\mathbf{M}\|_* + \lambda \|\mathbf{W} \odot \mathbf{E}\|_1$, s.t. $\mathbf{W} \odot \mathbf{X} = \mathbf{W} \odot (\sum_{i=1}^{A} \mathbf{B}_i \mathbf{S}_i + \mathbf{M} + \mathbf{E})$, $\mathbf{B}_i^\top \mathbf{B}_i = \mathbf{I}$ (4)
where the nuclear norm of $\mathbf{M}$ is introduced as a convex surrogate of the rank function [3] and $\lambda$ is a regularizer.
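As a concrete illustration of the relaxed formulation, the following NumPy sketch evaluates the objective and the masked constraint residual (the variable names mirror the formulation above; this is illustrative code, not the paper's implementation):

```python
import numpy as np

def relaxed_objective(M, E, W, lam):
    """Objective of the relaxed problem: nuclear norm of the individual
    component M plus the weighted l1 norm of the masked error term E."""
    return np.linalg.svd(M, compute_uv=False).sum() + lam * np.abs(W * E).sum()

def constraint_residual(X, W, Bs, Ss, M, E):
    """Masked constraint residual: W hadamard (X - sum_i B_i S_i - M - E).
    A feasible decomposition makes this residual identically zero on the
    visible (W == 1) pixels."""
    recon = sum(B @ S for B, S in zip(Bs, Ss)) + M + E
    return W * (X - recon)
```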
2.3 Mathematical derivations
Because problem (4) is separable, we adopt an alternating optimization scheme to find the updates for every parameter. The corresponding partially Augmented Lagrangian for (4) may then be written as
$\mathcal{L} = \|\mathbf{M}\|_* + \lambda \|\mathbf{W} \odot \mathbf{E}\|_1 + \langle \mathbf{Y}, \mathbf{W} \odot (\mathbf{X} - \sum_{i=1}^{A} \mathbf{B}_i \mathbf{S}_i - \mathbf{M} - \mathbf{E}) \rangle + \frac{\mu}{2} \|\mathbf{W} \odot (\mathbf{X} - \sum_{i=1}^{A} \mathbf{B}_i \mathbf{S}_i - \mathbf{M} - \mathbf{E})\|_F^2$ (5)
s.t. $\mathbf{B}_i^\top \mathbf{B}_i = \mathbf{I}$
where $\mathbf{Y}$ is the Lagrange multiplier and $\mu > 0$ is a penalty parameter. Problem (5) is minimized by employing the Alternating Direction Method of Multipliers (ADMM) [6, 1]. Comprehensive derivations of the optimization problems and the complete algorithm for solving (4) are provided in the supplementary material. The algorithm terminates when the iterations reach a predefined maximum or a convergence criterion is met. The convergence criterion is met when the normalized reconstruction error, i.e., $\|\mathbf{W} \odot (\mathbf{X} - \sum_{i=1}^{A} \mathbf{B}_i \mathbf{S}_i - \mathbf{M} - \mathbf{E})\|_F / \|\mathbf{W} \odot \mathbf{X}\|_F$, is less than a predefined threshold $\epsilon$. The ADMM iteration reads as follows.
Update the primal variables:
For obtaining the selectors, where, as previously mentioned, each $\mathbf{S}_i$ is comprised of the shared selectors $\mathbf{s}_i^{(j)}$, we need to solve individually for every $\mathbf{s}_i^{(j)}$. Based on (5), the solution is given by minimizing
(6) 
Problem (6) admits a closed-form solution, which is
(7) 
where superscript $(i, j)$ means that only the columns corresponding each time to the $j$-th instantiation of the $i$-th attribute are considered (e.g., columns corresponding to data annotated for attribute identity and instantiation Albert Einstein) and $\mathbf{1}$ is a column vector of ones.
For deriving subspace $\mathbf{B}_i$, the following needs to be solved.
(8)  
In order to solve (8), we rely on the Procrustes operator and the lemma introduced next.
Lemma: The constrained minimization problem
$\min_{\mathbf{B}} \|\mathbf{Z} - \mathbf{B}\mathbf{S}\|_F^2$, s.t. $\mathbf{B}^\top \mathbf{B} = \mathbf{I}$ (9)
has a closed-form solution [17] of the form $\mathbf{B} = \mathcal{P}[\mathbf{Z}\mathbf{S}^\top]$. As a result, the solution for (8), taking into account (9), is
(10) 
For obtaining the individual component $\mathbf{M}$, the following needs to be solved.
(11) 
Problem (11) is solved utilizing the SVT operator $\mathcal{D}_\tau$. The solution is
(12) 
For obtaining $\mathbf{E}$, the following problem needs to be solved.
(13) 
Problem (13) is solved utilizing the shrinkage operator $\mathcal{S}_\tau$. The solution is
(14) 
where
(15) 
and $\bar{\mathbf{W}}$ is the complement of $\mathbf{W}$, i.e., the mask of the missing pixels.
Update the Lagrange multiplier and the penalty parameter:
$\mathbf{Y} \leftarrow \mathbf{Y} + \mu\, \mathbf{W} \odot (\mathbf{X} - \sum_{i=1}^{A} \mathbf{B}_i \mathbf{S}_i - \mathbf{M} - \mathbf{E})$ (16)
$\mu \leftarrow \min(\rho\mu, \mu_{\max})$ (17)
Regarding the theoretical convergence of the ADMM algorithm presented above, no general proof exists when ADMM is utilized in settings with more than two blocks of variables. Nevertheless, ADMM provides good results in non-linear optimization problems [15]. Furthermore, the experimental evaluation of MARCA on a number of different tasks on “in-the-wild” data indicates that the derived solutions constitute a good approximation.
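The ADMM machinery above, including the stopping criterion, can be illustrated with a simplified, self-contained sketch that keeps only the two-block core with no attribute bases, i.e., masked RPCA: minimize the nuclear norm of M plus a weighted l1 norm of the masked error E, subject to the observed pixels of X matching M + E. This is not the full MARCA solver; the function name and parameter defaults are our assumptions.

```python
import numpy as np

def shrink(X, tau):
    return np.sign(X) * np.maximum(np.abs(X) - tau, 0.0)

def svt(Z, tau):
    U, s, Vt = np.linalg.svd(Z, full_matrices=False)
    return U @ np.diag(shrink(s, tau)) @ Vt

def masked_rpca(X, W, lam=None, mu=None, rho=1.2, mu_max=1e7,
                eps=1e-7, max_iter=500):
    """Two-block ADMM for low-rank + sparse decomposition of the observed
    (W == 1) entries of X. Missing entries are imputed with the current M + E."""
    if lam is None:
        lam = 1.0 / np.sqrt(max(X.shape))
    if mu is None:
        mu = 1.25 / np.linalg.norm(X, 2)      # spectral-norm-based start
    M = np.zeros_like(X)
    E = np.zeros_like(X)
    Y = np.zeros_like(X)
    norm_wx = np.linalg.norm(W * X) + 1e-12
    for _ in range(max_iter):
        Q = W * X + (1 - W) * (M + E)          # impute the missing pixels
        M = svt(Q - E + Y / mu, 1.0 / mu)      # low-rank update via SVT
        E = W * shrink(Q - M + Y / mu, lam / mu)  # sparse errors, visible pixels
        R = W * (X - M - E)                    # masked constraint residual
        Y = Y + mu * R                         # dual ascent step
        mu = min(rho * mu, mu_max)
        if np.linalg.norm(R) / norm_wx < eps:  # normalized reconstruction error
            break
    return M, E
```

The stopping test is exactly the normalized reconstruction error described above, restricted here to the two remaining components.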
2.4 Reconstruction of a test image
After the bases and the selectors have been recovered as described in Section 2.2, they may be utilized in order to recover the shared and individual components of a test image. These components can then be utilized in experiments such as completion of missing UV parts and joint transfer of a facial test image to another age, identity or illumination, as demonstrated in Section 3.
Without any loss of generality, assume a test UV map $\mathbf{x}$, which may be decomposed into its shared and individual components as follows.
$\mathbf{x} = \sum_{i=1}^{A} \mathbf{B}_i \mathbf{s}_i + \mathbf{U}\mathbf{c} + \mathbf{e}$ (18)
where $\mathbf{U}$ is the linear span of $\mathbf{M}$, given by applying the rank-$r$ SVD on $\mathbf{M}$. In the most general case, the optimal selectors $\mathbf{s}_i$ and coefficients $\mathbf{c}$ must be extracted by minimizing the sparse error term corresponding to the visible part of $\mathbf{x}$, for the already recovered $\mathbf{B}_i$ and $\mathbf{U}$. That is, the following needs to be solved.
$\min_{\{\mathbf{s}_i\}, \mathbf{c}, \mathbf{e}} \|\mathbf{w} \odot \mathbf{e}\|_1$, s.t. $\mathbf{w} \odot \mathbf{x} = \mathbf{w} \odot (\sum_{i=1}^{A} \mathbf{B}_i \mathbf{s}_i + \mathbf{U}\mathbf{c} + \mathbf{e})$ (19)
where $\mathbf{w}$ is the occlusion mask corresponding to the test UV map and $\hat{\mathbf{x}} = \sum_{i=1}^{A} \mathbf{B}_i \mathbf{s}_i + \mathbf{U}\mathbf{c}$ is the reconstructed facial UV map. In the case where, e.g., transfer of a test image to a specific age is required, the selector corresponding to that age is fixed (i.e., the corresponding selector found during the training process in Section 2.2 is utilized). Problem (19) is solved by employing the ADMM. The algorithm and complete derivations for solving problem (19) are provided in the supplementary material.
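As a simplified stand-in for the test-time fit, the following sketch estimates the selectors and individual coefficients of a masked test vector by ordinary least squares on the visible pixels only; the paper instead minimizes an l1 error term via ADMM, and the function and variable names here are ours:

```python
import numpy as np

def reconstruct_test_uv(x, w, B_list, U):
    """Fit selectors/coefficients for a masked test vector x by least squares
    on the visible (w == 1) pixels, then synthesize the complete UV vector.
    B_list holds the attribute bases; U spans the individual variation."""
    A = np.hstack(B_list + [U])          # stacked bases
    Aw = A * w[:, None]                  # zero out the rows of missing pixels
    coef, *_ = np.linalg.lstsq(Aw, w * x, rcond=None)
    return A @ coef                      # reconstruction with missing pixels filled
```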
3 Experiments
The experimental evaluation of MARCA against other state-of-the-art algorithms is carried out via a series of experiments, namely: a) completion of UV maps with missing values on data captured in “in-the-wild” conditions, b) age-progression on data captured in controlled as well as “in-the-wild” conditions, and c) joint illumination and identity transfer on data captured in controlled as well as “in-the-wild” conditions. Moreover, we demonstrate how MARCA can be employed to produce data with specific characteristics (e.g., “in-the-wild” images illuminated from various angles) that can then be utilized to train deep networks for tailored applications such as illumination transfer.
In order to extract the incomplete facial UV maps used in our algorithm, we fitted a 3DMM to the various databases, utilizing the publicly available fitting process of [2]. The shape and camera parameters were used in order to sample the texture and compute the occlusion masks in the fitted image.
For the experimental evaluations, the MultiPIE [7] and AgeDB [12] databases were utilized to train MARCA. MultiPIE was captured under controlled lab conditions and thus its images do not contain gross errors attributed to, e.g., occlusions. Nevertheless, MultiPIE is a multi-attribute database, since it contains labels for attributes such as identity and illumination. This renders it suitable for extracting, with MARCA, bases with respect to, e.g., illumination that can then be used to reconstruct “in-the-wild” images under various illumination settings (Section 3.3). In particular, in the training phase of MARCA on MultiPIE, all of the identities and illuminations available in the database were used.
AgeDB contains images captured under “in-the-wild” conditions (i.e., occlusions, various poses and pixel corruptions are present in the images). Moreover, it is annotated for multiple attributes (i.e., identity, age) and is thus suitable for evaluating MARCA. AgeDB was split into six distinct age-groups, namely 21-30, 31-40, 41-50, 51-60, 61-70 and 71-100. The UVs belonging to each age-group were then further split according to attribute identity. In the training phase of MARCA, part of the total UVs was kept to extract the bases with respect to attributes identity and age-group, and the rest were used for testing.
3.1 Completion of UV maps with missing values
Completion of UV maps with missing values is a very challenging task which, to the best of our knowledge, has not been addressed in the literature. MARCA is the first technique that can be utilized to handle UV maps with missing values. In this experiment, the “in-the-wild” AgeDB [12] was utilized.
For the testing phase, a random incomplete UV map, contaminated with gross but sparse errors, which did not belong to the training set, was chosen and reconstructed following the process described in Section 2.4. As is evident in Fig. 3 as well as Fig. 4, MARCA successfully fills in the missing, occluded parts of the original UV map as well as the missing parts of the corresponding 3D facial textures.
3.2 Age-progression “in-the-wild”
Age-progression “in-the-wild” entails the task of rendering a facial image of a subject at various ages. It is arguably a very challenging task in Computer Vision, since “in-the-wild” images have been captured in uncontrolled conditions (e.g., different illuminations and poses, self-occlusions, etc.). AgeDB [12] was utilized in this experiment, since it is a manually collected “in-the-wild” age database with accurate age and identity labels; hence the extracted age-group and identity bases contain no errors due to incorrect annotations.
In the testing phase, a random UV map which did not belong to the training set was chosen and reconstructed for various ages following the process described in Section 2.4.
Comparisons against other broadly used age-progression methods are provided in Fig. 6 and Fig. 7. More specifically, we compare MARCA against Illumination-Aware Age Progression (IAAP) [10], Aging with Deep Restricted Boltzmann Machines (ADRBM) [13], Exemplar-based Age Progression (EAP) [18] and RJIVE [16]. Finally, MARCA is compared against the state-of-the-art RJIVE [16] in Fig. 5.
3.3 Multi-attribute transfer “in-the-wild”
In this section, we present a series of experiments under the multi-attribute scenario, i.e., when a test image is reconstructed with more than one attribute transferred at the same time. MARCA is, to the best of our knowledge, the first method that can successfully carry out such a task. For this series of experiments, both MultiPIE [7] and AgeDB [12] are utilized.
In particular, the illumination base is extracted from MultiPIE, while the identity and individual bases are extracted from AgeDB. During the reconstruction of a test image (Section 2.4), the bases from both MultiPIE and AgeDB are utilized. In Fig. 8, we present how MARCA can be utilized to transfer the identity of one subject to another and, at the same time, transfer the illumination setting.
3.4 Utilizing MARCA in deep learning applications
MARCA may also prove beneficial in the training phase of deep networks. For example, MARCA can be utilized to reconstruct “in-the-wild” images with a specific illumination setting. The reconstructed illuminated images can then be used in pairs with the corresponding original “in-the-wild” data to train a deep generative network (e.g., the pix2pix GAN [8]) for illumination transfer “in-the-wild”. After the training phase is complete, the trained network can generate illuminated versions of test images, similar to the ones MARCA would have reconstructed (Fig. 9). Furthermore, MARCA may also be utilized to augment facial datasets (by transferring illumination, age, etc., in images) and thus provide state-of-the-art neural networks with more data during training.
4 Conclusions
With the use of 3D face fitting methods, we can generate incomplete UV maps of facial textures. The use of incomplete facial UV maps contaminated with gross errors introduces many opportunities and challenges. In particular, since a facial UV map lies in a pose-free space, linear component analysis techniques can be applied to learn statistical components for various tasks. In this paper, we propose a novel statistical component analysis technique that tackles the above challenges and, at the same time, exploits multiple labels of the data at hand during training. We demonstrate the usefulness of the proposed robust component analysis technique in various tasks, including UV map completion on “in-the-wild” data, illumination and identity transfer, as well as aging.
5 Acknowledgements
S. Moschoglou was supported by an EPSRC DTA studentship from Imperial College London. E. Ververas was supported by a teaching scholarship from Imperial College London. S. Zafeiriou was partially funded by the EPSRC Project EP/N007743/1 (FACER2VM).
References
 [1] D. P. Bertsekas. Constrained optimization and Lagrange multiplier methods. Academic press, 2014.
 [2] J. Booth, E. Antonakos, S. Ploumpis, G. Trigeorgis, Y. Panagakis, and S. Zafeiriou. 3D face morphable models “in-the-wild”. arXiv preprint arXiv:1701.05360, 2017.

 [3] E. J. Candès, X. Li, Y. Ma, and J. Wright. Robust principal component analysis? Journal of the ACM (JACM), 58(3):11, 2011.
 [4] T. Cootes and A. Lanitis. The FG-NET aging database, 2008.
 [5] D. L. Donoho. For most large underdetermined systems of linear equations the minimal ℓ1-norm solution is also the sparsest solution. Communications on Pure and Applied Mathematics, 59(6):797–829, 2006.
 [6] D. Gabay and B. Mercier. A dual algorithm for the solution of nonlinear variational problems via finite element approximation. Computers & Mathematics with Applications, 2(1):17–40, 1976.
 [7] R. Gross, I. Matthews, J. Cohn, T. Kanade, and S. Baker. Multipie. Image and Vision Computing, 28(5):807–813, 2010.
 [8] P. Isola, J.-Y. Zhu, T. Zhou, and A. A. Efros. Image-to-image translation with conditional adversarial networks. arXiv preprint arXiv:1611.07004, 2016.
 [9] L. A. Jeni, S. Tulyakov, L. Yin, N. Sebe, and J. F. Cohn. The first 3D face alignment in the wild (3DFAW) challenge. In Proceedings of the European Conference on Computer Vision (ECCV), 2016.

 [10] I. Kemelmacher-Shlizerman, S. Suwajanakorn, and S. M. Seitz. Illumination-aware age progression. In Proceedings of the IEEE International Conference on Computer Vision & Pattern Recognition (CVPR), pages 3334–3341, 2014.
 [11] H. Kim, M. Zollhöfer, A. Tewari, J. Thies, C. Richardt, and C. Theobalt. InverseFaceNet: Deep single-shot inverse face rendering from a single image. arXiv:1703.10956, 2017.
 [12] S. Moschoglou, A. Papaioannou, C. Sagonas, J. Deng, I. Kotsia, and S. Zafeiriou. AgeDB: The first manually collected, in-the-wild age database. 2017.
 [13] C. Nhan Duong, K. Luu, K. Gia Quach, and T. D. Bui. Longitudinal face modeling via temporal deep restricted Boltzmann machines. In Proceedings of the IEEE International Conference on Computer Vision & Pattern Recognition (CVPR), pages 5772–5780, 2016.
 [14] Y. Panagakis, M. A. Nicolaou, S. Zafeiriou, and M. Pantic. Robust correlated and individual component analysis. IEEE transactions on Pattern Analysis and Machine Intelligence (PAMI), 38(8):1665–1678, 2016.
 [15] Y. Peng, A. Ganesh, J. Wright, W. Xu, and Y. Ma. RASL: Robust alignment by sparse and low-rank decomposition for linearly correlated images. IEEE Transactions on Pattern Analysis and Machine Intelligence, 34(11):2233–2246, 2012.
 [16] C. Sagonas, Y. Panagakis, A. Leidinger, S. Zafeiriou, et al. Robust joint and individual variance explained. In Proceedings of IEEE International Conference on Computer Vision & Pattern Recognition (CVPR), 2017.
 [17] P. H. Schönemann. A generalized solution of the orthogonal procrustes problem. Psychometrika, 31(1):1–10, 1966.
 [18] C.-T. Shen, W.-H. Lu, S.-W. Shih, and H.-Y. M. Liao. Exemplar-based age progression prediction in children faces. In Proceedings of the IEEE International Symposium on Multimedia, pages 123–128. IEEE, 2011.
 [19] B. Thompson. Canonical correlation analysis. Encyclopedia of statistics in behavioral science, 2005.
 [20] J. Wright, A. Ganesh, S. Rao, Y. Peng, and Y. Ma. Robust principal component analysis: Exact recovery of corrupted lowrank matrices via convex optimization. In Advances in Neural Information Processing Systems (NIPS), pages 2080–2088, 2009.
 [21] S. Zafeiriou, G. G. Chrysos, A. Roussos, E. Ververas, J. Deng, and G. Trigeorgis. The 3D Menpo facial landmark tracking challenge. In Proceedings of the IEEE International Conference on Computer Vision Workshops (ICCVW), 2017.
 [22] X. Zhu, Z. Lei, X. Liu, H. Shi, and S. Z. Li. Face alignment across large poses: A 3d solution. In Proceedings of the IEEE International Conference on Computer Vision & Pattern Recognition (CVPR), 2016.