Look globally, age locally: Face aging with an attention mechanism

10/24/2019 ∙ by Haiping Zhu, et al. ∙ 0

Face aging is of great importance for cross-age recognition and entertainment-related applications. Recently, conditional generative adversarial networks (cGANs) have achieved impressive results for face aging. Existing cGAN-based methods usually require a pixel-wise loss to keep the identity and background consistent. However, minimizing the pixel-wise loss between the input and synthesized images is likely to result in a ghosted or blurry face. To address this deficiency, this paper introduces an Attention Conditional GANs (AcGANs) approach for face aging, which utilizes an attention mechanism to alter only the regions relevant to face aging. In doing so, the synthesized face preserves the background information and personal identity well without using the pixel-wise loss, and the ghost artifacts and blurriness are significantly reduced. On the benchmark dataset Morph, both qualitative and quantitative experimental results demonstrate superior performance over existing algorithms in terms of image quality, personal identity, and age accuracy.




1 Introduction

Face aging, also known as age progression, aims to render a given face image with natural aging effects under a certain age or age group. In recent years, face aging has attracted major attention due to its extensive use in numerous applications, such as entertainment [1], finding missing children [2], and cross-age face recognition [3]. Although impressive results have been achieved recently [4, 5, 6, 7, 8], many challenges remain due to the intrinsic complexity of aging in nature and the insufficiency of labeled aging data. Intuitively, the generated face images should be photo-realistic, e.g., without serious ghosting artifacts. In addition, the aging accuracy and the identity permanence of the generated face images should be guaranteed simultaneously.

Recently, generative adversarial networks (GANs) [9] have shown an impressive ability in generating synthetic images [10] and in face aging [11, 4, 5, 6, 7, 8]. These approaches render faces with more natural aging effects in terms of image quality, identity consistency, and aging accuracy than previous conventional solutions, such as prototype-based [2] and physical model-based methods [12, 13]. However, the problems have not been completely solved. For example, Zhang et al. [4] first proposed a conditional adversarial autoencoder (CAAE) for face aging by traversing a low-dimensional face manifold, but it cannot keep the identity information of generated faces well. To solve this problem, Yang et al. [8] and Wang et al. [5] proposed conditional GANs with a pre-trained neural network to preserve the identity of generated images. Most existing GAN-based methods train the model with a pixel-wise loss [4, 8] to preserve identity consistency and keep background information. However, minimizing the Euclidean distance between the synthesized images and the input images easily causes the synthesized images to become ghosted or blurred [14]. This problem becomes more severe as the gap between the input age and the target age grows.

Inspired by the success of the attention mechanism in image-to-image translation [15], in this paper we propose Attention Conditional GANs (AcGANs) to tackle the issues mentioned above. Specifically, the proposed AcGANs consist of a generator $G$ and a discriminator $D$. The generator receives an input image and a target age code, and its output contains an attention mask and a color mask. The attention mask learns which regions are relevant to face aging and should be modified, and the color mask learns how to modify them. The final output of the generator is a combination of the attention and the color masks. Since the attention mechanism only modifies the regions relevant to face aging, it can preserve the background information and the personal identity well without using the pixel-wise loss in training. The discriminator $D$ consists of an image discriminator and an age classifier, aiming to make the generated face more photo-realistic and to guarantee that the synthesized face lies in the target age group.

The main contributions of this paper are: i) We propose a novel approach for face aging, which utilizes an attention mechanism to modify only the regions relevant to face aging. Thus, the synthesized face preserves the background information and personal identity well without using a pixel-wise loss, and the ghost artifacts and blurriness are significantly reduced. ii) Both qualitative and quantitative experimental results on Morph demonstrate the effectiveness of our model in terms of image quality, personal identity, and age accuracy.

2 The Proposed Methods

We divide faces with different ages into 5 nonoverlapping groups, i.e., 11-20, 21-30, 31-40, 41-50, and 50+. Let $x_s \in \mathbb{R}^{H \times W \times 3}$ be a face image, where $H$ and $W$ are the height and width of the image, respectively. We use a one-hot label $y_s$ to indicate the age group that $x_s$ belongs to. The aim is to learn a generator $G$ that produces a synthesized face image $x_t$ that lies in the target age group $y_t$, looks realistic, and has the same identity as the input face image $x_s$.
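The grouping and labeling step can be sketched as follows. This is a minimal illustration that assumes the group boundaries listed above; the function names `age_to_group` and `one_hot` are hypothetical and do not come from the released code.

```python
import numpy as np

# Upper bounds of the first four groups (11-20, 21-30, 31-40, 41-50).
# Ages above 50 fall into the last group (50+).
GROUP_UPPER_BOUNDS = [20, 30, 40, 50]
NUM_GROUPS = 5


def age_to_group(age: int) -> int:
    """Return the index (0..4) of the age group that `age` belongs to.

    Assumption: ages below the first listed bound are also mapped to group 0.
    """
    for index, upper in enumerate(GROUP_UPPER_BOUNDS):
        if age <= upper:
            return index
    return NUM_GROUPS - 1


def one_hot(group: int) -> np.ndarray:
    """Encode a group index as a one-hot vector of length NUM_GROUPS."""
    label = np.zeros(NUM_GROUPS, dtype=np.float32)
    label[group] = 1.0
    return label
```

For instance, a 35-year-old subject falls into the third group (index 2), and `one_hot(2)` has a single one in that position.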

2.1 Network Architecture

The proposed approach, shown in Fig. 1, consists of two main modules: i) a generator $G$, trained to generate a synthesized face $x_t$ with target age $y_t$; ii) a discriminator $D$, which aims to make $x_t$ look realistic and to guarantee that $x_t$ lies in the target age group $y_t$.

Figure 1: The architecture of the proposed method. In our network, the color mask module and the attention mask module share parameters, except for the last layer. Similarly, the image discriminator and the age classifier share parameters.

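Generator. Given an input face image $x_s \in \mathbb{R}^{H \times W \times 3}$ and a target age $y_t$, we first pad the one-hot label $y_t$ into feature maps of size $H \times W \times 5$. Then, we form the input of the generator as the concatenation $(x_s, y_t) \in \mathbb{R}^{H \times W \times (3+5)}$.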
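The padding and concatenation can be sketched as below. This is an illustrative NumPy version under two assumptions: a channel-last layout (a PyTorch implementation would more likely use channel-first tensors), and a hypothetical helper name `build_generator_input`.

```python
import numpy as np


def build_generator_input(image: np.ndarray, target_group: int,
                          num_groups: int = 5) -> np.ndarray:
    """Concatenate an H x W x 3 image with H x W x num_groups condition maps.

    Only the map that corresponds to the target age group is filled with
    ones; the remaining maps are filled with zeros.
    """
    height, width, _ = image.shape
    condition = np.zeros((height, width, num_groups), dtype=image.dtype)
    condition[:, :, target_group] = 1.0
    return np.concatenate([image, condition], axis=-1)
```

The result has 3 + 5 = 8 channels, so the first convolution of the generator sees the target age at every spatial location.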

One key ingredient of our approach is to make $G$ focus on those regions of the image that are relevant to face aging, while keeping the background information unchanged and preserving identity consistency. For this purpose, we embed an attention mechanism in the generator. Concretely, instead of regressing a full image, our generator outputs two masks, an attention mask $A$ and a color mask $C$. The final generated image is obtained as:

$$x_t = (1 - A) \odot C + A \odot x_s,$$

where $\odot$ denotes the element-wise product, $A = G_A(x_s, y_t) \in [0, 1]^{H \times W}$, and $C = G_C(x_s, y_t) \in \mathbb{R}^{H \times W \times 3}$. The mask $A$ indicates to what extent each pixel of $C$ contributes to the generated image $x_t$.
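The combination of the two masks can be illustrated with a short NumPy sketch. It assumes the convention implied by the text on attention saturation, namely that an attention value of 1 keeps the input pixel and a value of 0 takes the pixel from the color mask; `compose_output` is a hypothetical name.

```python
import numpy as np


def compose_output(attention: np.ndarray, color: np.ndarray,
                   source: np.ndarray) -> np.ndarray:
    """Blend the color mask with the input image using the attention mask.

    attention: H x W array with values in [0, 1]
    color, source: H x W x 3 arrays
    """
    a = attention[..., None]  # broadcast the mask over the color channels
    return (1.0 - a) * color + a * source
```

With an all-ones attention mask the output equals the input image, which is why a saturated mask means the generator has no effect.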

Discriminator. This module consists of an image discriminator $D_I$ and an age classifier $D_y$, aiming to make the generated face realistic and to guarantee that the synthesized face lies in the target age group. Note that $D_I$ and $D_y$ share parameters, as shown in Fig. 1, which significantly improves the performance of the discriminator.

2.2 Loss Function

The loss function includes three terms: 1) the adversarial loss proposed by Gulrajani et al. [16], which pushes the distribution of the generated images toward the distribution of the training images; 2) the attention loss, which drives the attention masks to be smooth and prevents them from saturating; 3) the age classification loss, which makes the generated facial image more accurate in age classification.

Adversarial Loss. To learn the parameters of the generator $G$, we utilize the modification of the standard GAN algorithm [9] proposed by Wasserstein GAN with gradient penalty (WGAN-GP) [16]. Specifically, the original GAN formulation is based on the Jensen-Shannon (JS) divergence: the discriminator aims to maximize the probability of correctly classifying real and fake images, while the generator tries to fool the discriminator. This loss is potentially not continuous with respect to the parameters of the generator and can locally saturate, leading to vanishing gradients in the discriminator. WGAN [17] addresses this by replacing JS with the continuous Earth Mover Distance. To maintain a Lipschitz constraint, WGAN-GP [16] adds a gradient penalty for the critic network, computed from the norm of the gradients with respect to the critic input.

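Formally, let $\mathbb{P}_{x_s}$ be the distribution of the input image $x_s$, and $\mathbb{P}_{\tilde{x}}$ be the random interpolation distribution between $x_s$ and $G(x_s, y_t)$. Then, the adversarial loss can be written as:

$$\mathcal{L}_{adv} = \mathbb{E}_{x_s \sim \mathbb{P}_{x_s}}\big[D_I(G(x_s, y_t))\big] - \mathbb{E}_{x_s \sim \mathbb{P}_{x_s}}\big[D_I(x_s)\big] + \lambda_{gp}\, \mathbb{E}_{\tilde{x} \sim \mathbb{P}_{\tilde{x}}}\big[(\|\nabla_{\tilde{x}} D_I(\tilde{x})\|_2 - 1)^2\big],$$

where $\lambda_{gp}$ is a penalty coefficient.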
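To make the structure of the gradient-penalty objective of [16] concrete, the sketch below evaluates it for a toy critic whose input gradient is known in closed form. It is only an illustration: the critic $\tanh(w^\top x)$, the function names, and the default `lambda_gp=10.0` (a commonly used value) are assumptions, and a real implementation would obtain the gradient by automatic differentiation. The loss is written as the quantity the critic minimizes.

```python
import numpy as np


def critic(x: np.ndarray, w: np.ndarray) -> np.ndarray:
    """Toy critic D(x) = tanh(w . x), applied to each row of x."""
    return np.tanh(x @ w)


def critic_grad(x: np.ndarray, w: np.ndarray) -> np.ndarray:
    """Analytic gradient of the toy critic with respect to its input."""
    return (1.0 - np.tanh(x @ w) ** 2)[:, None] * w[None, :]


def wgan_gp_critic_loss(real, fake, w, lambda_gp=10.0, rng=None):
    """Critic loss: E[D(fake)] - E[D(real)] + lambda_gp * gradient penalty."""
    rng = np.random.default_rng() if rng is None else rng
    # Random points on the segments between real and generated samples.
    eps = rng.uniform(size=(real.shape[0], 1))
    interp = eps * real + (1.0 - eps) * fake
    grad_norm = np.linalg.norm(critic_grad(interp, w), axis=1)
    penalty = np.mean((grad_norm - 1.0) ** 2)
    wasserstein = np.mean(critic(fake, w)) - np.mean(critic(real, w))
    return wasserstein + lambda_gp * penalty
```

The penalty vanishes exactly when the gradient norm at the interpolated points equals 1, which is how the 1-Lipschitz constraint is encouraged in practice.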

Attention Loss. Note that when training the model, we do not have ground-truth annotation for the attention masks $A$. As with the color masks $C$, they are learned from the gradients of the discriminative module and of the age classification loss. However, the attention masks can easily saturate to 1, in which case the attention module has no effect. To prevent this, we regularize the mask with an $l_2$-weight penalty. Besides, to enforce a smooth spatial color transformation when combining the pixels of the input image with the color transformation $C$, we apply Total Variation regularization over $A$. The attention loss is defined as:

$$\mathcal{L}_{att} = \lambda_{TV}\, \mathbb{E}_{x_s}\Big[\sum_{i,j}^{H,W} \big[(A_{i+1,j} - A_{i,j})^2 + (A_{i,j+1} - A_{i,j})^2\big]\Big] + \mathbb{E}_{x_s}\big[\|A\|_2\big],$$

where $A = G_A(x_s, y_t)$ and $A_{i,j}$ is the $(i,j)$-th entry of $A$. Besides, $\lambda_{TV}$ is a penalty coefficient.
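A minimal NumPy sketch of this regularizer for a single attention mask is given below. It assumes that the total-variation term sums squared differences between vertically and horizontally adjacent entries and that the saturation penalty is the $l_2$ norm of the mask; `attention_loss` and `lambda_tv` are hypothetical names.

```python
import numpy as np


def attention_loss(attention: np.ndarray, lambda_tv: float) -> float:
    """Total-variation smoothness term plus l2 penalty for one H x W mask."""
    diff_rows = attention[1:, :] - attention[:-1, :]   # A[i+1, j] - A[i, j]
    diff_cols = attention[:, 1:] - attention[:, :-1]   # A[i, j+1] - A[i, j]
    total_variation = np.sum(diff_rows ** 2) + np.sum(diff_cols ** 2)
    l2_penalty = np.linalg.norm(attention)
    return float(lambda_tv * total_variation + l2_penalty)
```

A constant mask has zero total variation, so only the $l_2$ term remains, and that term grows as the mask approaches the saturated all-ones state.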

Age Classification Loss. While reducing the image adversarial loss, the generator must also reduce the age error measured by the age classifier $D_y$. The age classification loss has two components: an age estimation loss on fake images, used to optimize $G$, and an age estimation loss on real images, used to learn the age classifier $D_y$. This loss is computed as:

$$\mathcal{L}_{cls} = \mathbb{E}_{x_s}\big[\ell(D_y(G(x_s, y_t)), y_t) + \ell(D_y(x_s), y_s)\big],$$

where $y_s$ is the label of the input image $x_s$ and $\ell(\cdot)$ corresponds to a softmax loss.
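The two components can be sketched with a numerically stable softmax cross-entropy. The sketch assumes that the age classifier outputs one logit per age group and that labels are given as group indices; the function names are hypothetical.

```python
import numpy as np


def softmax_cross_entropy(logits: np.ndarray, labels: np.ndarray) -> float:
    """Mean softmax cross-entropy for N x K logits and N integer labels."""
    shifted = logits - logits.max(axis=1, keepdims=True)  # avoid overflow
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return float(-np.mean(log_probs[np.arange(len(labels)), labels]))


def age_classification_losses(fake_logits, target_groups,
                              real_logits, source_groups):
    """Return (loss on generated faces, loss on real faces).

    The first term would be used to update the generator, the second to
    update the age classifier.
    """
    loss_fake = softmax_cross_entropy(fake_logits, target_groups)
    loss_real = softmax_cross_entropy(real_logits, source_groups)
    return loss_fake, loss_real
```

With uninformative (all-equal) logits over 5 groups, each term equals $\log 5$, the loss of a uniform guess.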

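Final Loss. To generate the target age image $x_t$, we build a loss function $\mathcal{L}$ by linearly combining all previous losses:

$$\mathcal{L} = \lambda_{adv}\mathcal{L}_{adv} + \lambda_{att}\mathcal{L}_{att} + \lambda_{cls}\mathcal{L}_{cls},$$

where $\lambda_{adv}$, $\lambda_{att}$ and $\lambda_{cls}$ are the hyper-parameters that control the relative importance of every loss term. Finally, we can define the following minimax problem:

$$G^* = \arg\min_{G} \max_{D \in \mathcal{D}} \mathcal{L},$$

where $G^*$ draws samples from the data distribution. Additionally, we constrain our discriminator $D$ to lie in $\mathcal{D}$, which represents the set of 1-Lipschitz functions.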

3 Experiments

In this section, we introduce our implementation details and then evaluate our proposed model both qualitatively and quantitatively on a large public dataset, Morph [18], which contains 55,000 face images of 13,617 subjects from 16 to 77 years old. To better demonstrate the advantage of our method in preserving identity features, we also compare it with two state-of-the-art methods: Conditional Adversarial Autoencoder Network (CAAE) [4] and Identity-Preserved Conditional Generative Adversarial Networks (IPCGANs) [5].

Figure 2: Illustration of generation results by the proposed AcGANs method. For each subject, the first row shows the generated aged faces, the second row shows the intermediate attention mask A, and the third row shows the color mask C.

3.1 Implementation Details

Following prior works [4, 5, 6], before being fed into the networks, the faces are (1) aligned by the five facial landmarks detected by MTCNN [19], (2) cropped to a fixed pixel size with 10% more area, so that both hair and beard are covered, and (3) divided into five age groups, i.e., 10-20, 21-30, 31-40, 41-50, and 51+. Consequently, a total of 54,539 faces are collected, and we split the Morph dataset into two non-overlapping parts, 90% for training and the rest for testing. AcGANs is implemented in the open-source PyTorch framework. (Footnote 1: The code has been released at https://github.com/JensonZhu14/AcGAN.)
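A non-overlapping 90/10 split of this kind can be sketched as follows. The text does not state whether the split is made per image or per subject, nor how it is randomized; the sketch assumes a seeded random split over images, and `split_dataset` is a hypothetical helper.

```python
import random


def split_dataset(items, train_fraction=0.9, seed=0):
    """Shuffle `items` reproducibly and split them into train and test lists."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    cut = int(round(len(shuffled) * train_fraction))
    return shuffled[:cut], shuffled[cut:]
```

Applied to 54,539 faces, this yields 49,085 training and 5,454 test images with no image in both parts.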

During training, we adopt an architecture similar to [6], which is shown in Fig. 1. Different from [6], our generator receives images and condition feature maps concatenated along the channel dimension as input, and this input is larger than that of CAAE and IPCGANs, so a clearer result is generated. Furthermore, the conditional feature maps resemble a one-hot code: only one of them is filled with ones while the rest are all filled with zeros. For IPCGANs, we first train the age classifier, which is fine-tuned from AlexNet on CACD [20], and the other parameters are set according to [4]. For CAAE, we remove the gender information and use 5 age groups instead of its original grouping for a fair comparison. For AcGANs, we set to 10, while is 2, is 100, is 10, and is , respectively. For all of them, including AcGANs, we choose Adam to optimize both $G$ and $D$ with learning rate and batch-size set to and 64, respectively. We train $G$ and $D$ in turn at every iteration, for a total of 100 epochs on four 2080 Ti GPUs.

3.2 Results on Morph

In this subsection, we first visualize the aging process from the perspective of what AcGANs has learned from the input image, i.e., the attention mask and the color mask. As shown in Fig. 2, we randomly select four face images from the test set, regardless of their original age group, and show the aging results in the first row, the attention masks in the second row, and the corresponding color masks in the third row. The attention masks indicate that AcGANs indeed learns which parts of the face should be aged.

We further qualitatively compare the faces generated by different methods in Fig. 3. All three examples show that AcGANs is better at removing ghosting artifacts. Meanwhile, the adornments marked with red rectangles in the last two faces are preserved intact by AcGANs, which again shows that AcGANs has learned which parts of the face should be aged.

Figure 3: Some synthesized faces generated by different methods. For each sample, from top to bottom, the images are generated by AcGANs, IPCGANs, and CAAE. The input age lies in the [11-20] age group, and the numbers above the images are the corresponding target ages.

3.3 Quantitative Comparison

To rule out the concern that the limited number of images shown in the paper is not representative, we also evaluate all results quantitatively. In the literature [5, 6], there are two critical evaluation metrics for age progression, i.e., identity permanence and aging accuracy. We first generate the elder faces from young faces, i.e., faces of the 10-20 age group, and then evaluate the two metrics separately.

To estimate the aging accuracy, we use the Face++ API [21] to estimate the age distributions, i.e., the mean age of both generic and generated faces in each age group, where a smaller discrepancy between real and fake images indicates a more accurate simulation of aging effects. For simplicity, we report the mean value of the age distributions, with the discrepancy from the generic age distribution shown in brackets (see Table 1). For identity permanence, face verification experiments are also conducted with the Face++ API, where a high verification confidence and a high verification rate indicate a strong ability to preserve identity information. In Table 2, the top part reports the verification confidence between ground-truth young faces and their aged counterparts generated by AcGANs, and the bottom part reports the verification rate, i.e., the rate at which the two faces are judged to be the same person. The best values for each column of both Table 1 and Table 2 are indicated in bold.

On Morph, AcGANs achieves the smallest age discrepancy in all four aging processes and, together with IPCGANs, the highest verification rate. Although IPCGANs preserves identity information better than CAAE, it generates less accurately aged faces than CAAE, while CAAE fails to keep the original identity. In contrast, AcGANs not only achieves a better aging result but also preserves identity consistently.

Estimated Age Distributions
Age group      21-30          31-40          41-50          50+
Generic        25.12          35.43          44.72          54.88
CAAE [4]       24.31 (0.81)   31.02 (4.41)   39.03 (5.69)   47.84 (7.04)
IPCGANs [5]    22.38 (2.74)   27.53 (7.90)   36.41 (8.31)   46.42 (8.46)
AcGANs         25.92 (0.80)   36.49 (1.06)   40.59 (4.13)   47.88 (7.00)

Table 1: Estimated age distributions (in years) on Morph. Generic means that the mean value of each group is computed on the ground truth, while the number in brackets indicates the difference from the generic mean age.
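The bracketed discrepancies in Table 1 are absolute differences between each method's mean estimated age and the generic mean of the same group. The short check below recomputes them from the table values; the variable and function names are illustrative.

```python
# Mean estimated ages copied from Table 1 (groups 21-30, 31-40, 41-50, 50+).
GENERIC = [25.12, 35.43, 44.72, 54.88]
ESTIMATED = {
    "CAAE": [24.31, 31.02, 39.03, 47.84],
    "IPCGANs": [22.38, 27.53, 36.41, 46.42],
    "AcGANs": [25.92, 36.49, 40.59, 47.88],
}


def discrepancies(estimated, generic=GENERIC):
    """Absolute difference from the generic mean age, rounded to 2 decimals."""
    return [round(abs(e - g), 2) for e, g in zip(estimated, generic)]


for method, ages in ESTIMATED.items():
    print(method, discrepancies(ages))
```

For AcGANs this reproduces the bracketed values 0.80, 1.06, 4.13, and 7.00, each of which is the smallest in its column.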
Verification Confidence (AcGANs)
Input group    21-30    31-40    41-50    50+
10-20          95.36    94.78    94.74    93.44
21-30          -        95.37    95.28    94.11
31-40          -        -        95.65    94.72
41-50          -        -        -        95.26

Verification Rate (Threshold = 73.975, FAR = 1e-5)
Method         21-30    31-40    41-50    50+
CAAE [4]       99.38    97.82    92.72    80.56
IPCGANs [5]    100      100      100      100
AcGANs         100      100      100      100

Table 2: Face verification results on Morph. The top part is the verification confidence of AcGANs and the bottom part is the verification rate for all methods. Note that the generated and the input faces are considered to have the same identity if the verification confidence is above the pre-defined threshold.
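The verification rate in the bottom part of Table 2 can be read as the percentage of input/generated pairs whose verification confidence exceeds the threshold. A minimal sketch follows, assuming confidences on the same 0-100 scale as Table 2; the function name and the sample confidences in the usage note are hypothetical.

```python
def verification_rate(confidences, threshold=73.975):
    """Percentage of face pairs whose confidence is above the threshold."""
    matches = sum(1 for confidence in confidences if confidence > threshold)
    return 100.0 * matches / len(confidences)
```

For example, for four made-up pair confidences `[95.4, 80.0, 70.0, 93.4]`, three exceed 73.975, so the rate is 75.0.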

4 Conclusions

In this paper, we propose a novel approach based on an attention mechanism for face aging. Since the attention mechanism only modifies the regions relevant to face aging, the proposed approach preserves the background information and the personal identity well without using the pixel-wise loss, significantly reducing ghost artifacts and blurring. Besides, the proposed approach is simple: it consists of only a generator and a discriminator sub-network and can be learned without additional pre-trained models. Moreover, both qualitative and quantitative experiments validate the effectiveness of our approach.


  • [1] Yun Fu, Guodong Guo, and Thomas S. Huang, “Age synthesis and estimation via faces: A survey,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 32, no. 11, pp. 1955–1976, 2010.
  • [2] Ira Kemelmacher-Shlizerman, Supasorn Suwajanakorn, and Steven M Seitz, “Illumination-aware age progression,” in IEEE Conference on Computer Vision and Pattern Recognition, 2014, pp. 3334–3341.
  • [3] Unsang Park, Yiying Tong, and Anil K. Jain, “Age-invariant face recognition,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 32, no. 5, pp. 947–954, 2010.
  • [4] Zhifei Zhang, Yang Song, and Hairong Qi, “Age progression/regression by conditional adversarial autoencoder,” in IEEE Conference on Computer Vision and Pattern Recognition, 2017, pp. 5810–5818.
  • [5] Zongwei Wang, Xu Tang, Weixin Luo, and Shenghua Gao, “Face aging with identity-preserved conditional generative adversarial networks,” in IEEE Conference on Computer Vision and Pattern Recognition, 2018, pp. 7939–7947.
  • [6] Yunfan Liu, Qi Li, and Zhenan Sun, “Attribute-aware face aging with wavelet-based generative adversarial networks,” in IEEE Conference on Computer Vision and Pattern Recognition, 2019, pp. 11877–11886.
  • [7] Sveinn Palsson, Eirikur Agustsson, Radu Timofte, and Luc Van Gool, “Generative adversarial style transfer networks for face aging,” in IEEE Conference on Computer Vision and Pattern Recognition Workshops, 2018, pp. 2084–2092.
  • [8] Hongyu Yang, Di Huang, Yunhong Wang, and Anil K Jain, “Learning face age progression: A pyramid architecture of GANs,” in IEEE Conference on Computer Vision and Pattern Recognition, 2018, pp. 31–39.
  • [9] Ian Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio, “Generative adversarial nets,” in Advances in Neural Information Processing Systems, 2014, pp. 2672–2680.
  • [10] Jon Gauthier, “Conditional generative adversarial nets for convolutional face generation,” Class Project for Stanford CS231N: Convolutional Neural Networks for Visual Recognition, Winter semester, vol. 2014, no. 5, pp. 2, 2014.
  • [11] Wei Wang, Zhen Cui, Yan Yan, Jiashi Feng, Shuicheng Yan, Xiangbo Shu, and Nicu Sebe, “Recurrent face aging,” in IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 2378–2386.
  • [12] Jinli Suo, Xilin Chen, Shiguang Shan, Wen Gao, and Qionghai Dai, “A concatenational graph evolution aging model,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 34, no. 11, pp. 2083–2096, 2012.
  • [13] Jinli Suo, Song-Chun Zhu, Shiguang Shan, and Xilin Chen, “A compositional and dynamic model for face aging,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 32, no. 3, pp. 385–401, 2009.
  • [14] Phillip Isola, Jun-Yan Zhu, Tinghui Zhou, and Alexei A Efros, “Image-to-image translation with conditional adversarial networks,” in IEEE Conference on Computer Vision and Pattern Recognition, 2017, pp. 1125–1134.
  • [15] Albert Pumarola, Antonio Agudo, Aleix M Martinez, Alberto Sanfeliu, and Francesc Moreno-Noguer, “Ganimation: Anatomically-aware facial animation from a single image,” in European Conference on Computer Vision, 2018, pp. 818–833.
  • [16] Ishaan Gulrajani, Faruk Ahmed, Martin Arjovsky, Vincent Dumoulin, and Aaron C Courville, “Improved training of Wasserstein GANs,” in Advances in Neural Information Processing Systems, 2017, pp. 5767–5777.
  • [17] Martin Arjovsky, Soumith Chintala, and Léon Bottou, “Wasserstein generative adversarial networks,” in International Conference on Machine Learning, 2017, pp. 214–223.
  • [18] Karl Ricanek and Tamirat Tesafaye, “Morph: A longitudinal image database of normal adult age-progression,” in International Conference on Automatic Face and Gesture Recognition, 2006, pp. 341–345.
  • [19] Kaipeng Zhang, Zhanpeng Zhang, Zhifeng Li, and Yu Qiao, “Joint face detection and alignment using multitask cascaded convolutional networks,” IEEE Signal Processing Letters, vol. 23, no. 10, pp. 1499–1503, 2016.
  • [20] Bor-Chun Chen, Chu-Song Chen, and Winston H Hsu, “Cross-age reference coding for age-invariant face recognition and retrieval,” in European Conference on Computer Vision, 2014, pp. 768–783.
  • [21] Megvii Inc, “Face++ research toolkit,” https://www.faceplusplus.com/.