Face age progression [27, 5], also called face aging, aims to predict the future looks of a person. It is a key technique for a variety of applications, including finding missing persons, cross-age face analysis, and so on. Recently, many research efforts have been devoted to generating realistic aged faces; these methods can be roughly divided into two categories: physical model-based age progression [29, 30] and prototype-based age progression [1, 13]. The physical model-based methods model the facial patterns and physical mechanisms of aging, while the prototype-based methods transfer the differences between prototypes (e.g., the average face of each age group) onto individual faces. Deep learning methods have also been applied to face age progression due to their powerful feature representations: Wang et al. presented a recurrent face aging framework, and Zhang et al. proposed a conditional adversarial autoencoder framework (CAAE) for age progression.
However, existing works always generate only one future look for a person, totally ignoring that the future look of a person may change with occupation. For example, suppose a 20-year-old man chooses actor or farmer as his career. When he is 50 years old, the look of the farmer and the look of the actor may have some differences, even though they are the same person. Different occupations may lead to different appearances, as shown in Figure 1. In this paper, we explore the impact of occupations on age progression. Note that, across occupations, the most perceptible difference is skin texture. Hence, we only focus on skin aging in this work. (The face aging process can be divided into two stages [30, 5]: child growth and adult aging. Shape change is the most prominent factor during child growth, while the greatest change during adult aging is skin aging, i.e., texture change. We only focus on adult aging.)
We first introduce the occupational face aging dataset (OFAD). Figure 1 shows some example faces. OFAD is a comprehensively annotated dataset that contains five occupations: actor, singer, doctor, teacher, and farmer. Each occupation includes two age ranges, i.e., 30-50 and 50-80. To the best of our knowledge, it is the first face aging dataset with occupational information. We believe this dataset will benefit research on face progression under different occupations.
We further present a realistic image generation method for age progression under different occupations via the proposed occupational-aware adversarial face aging network, referred to as OAFA. Different from previous approaches, which produce only one output per age group, OAFA can generate several outputs for different occupations.
OAFA has three major components: 1) the generator/encoder aims to generate different future looks of different occupations for a young face, 2) the decoder brings the future looks back to the young face, and 3) the discriminator encourages the generator/encoder to generate high-quality images of different occupations. These components make up two networks: a personalized network and an occupational-aware adversarial network. The personalized network is an autoencoder formed by the generator/encoder and the decoder. We propose a personalized loss so that the original face can be regenerated from the future looks, which keeps personalized facial characteristics. The occupational-aware adversarial network consists of the generator/encoder and the discriminator. The proposed occupational loss includes two terms, a conditional adversarial loss [6, 20] and a triplet rank loss [15, 21], which aim to obtain visually plausible texture changes, i.e., skin aging, for different occupations.
Our contributions can be summarized as follows:
We introduce an occupational face aging dataset that includes several occupations. This helps to explore the effects of occupations in the face age progression problem.
We propose an occupational-aware adversarial face aging network that generates multiple outputs for the occupational-aware age progression problem and models both the personalities and the occupational characteristics of persons in the face aging process.
The empirical results demonstrate the superiority of the proposed method over state-of-the-art baseline methods: it generates more aging details (textures) and more realistic face images.
2 Related Work
Many face age progression approaches have been proposed to model the dynamic aging process. They can be mainly divided into two categories: physical model-based and prototype-based methods. The physical model-based methods [16, 25, 26, 29, 30] simulate face aging by modelling the aging mechanisms, e.g., skin, muscles, and wrinkles, via parametric models. However, these methods are computationally expensive and require a long age span for each individual. Unfortunately, collecting images over a wide range of ages of the same person is very difficult or even impossible, and few face aging datasets satisfy this requirement.
The prototype-based methods [1, 13] use non-parametric models. They first divide faces into groups by age, and the average face is computed for each age group. The average face is referred to as a prototype, and the differences between prototypes are viewed as the aging pattern. The main problem of prototype-based methods is that they may ignore personalized information, e.g., wrinkles. To preserve the personality, Shu et al. presented an age progression method based on dictionaries. Each group has one dictionary, and two neighbouring groups are linked together for learning the aging pattern. Moreover, a personalized layer is proposed to keep the personalized information.
Deep learning methods have also been proposed for solving the age progression problem. Wang et al. proposed a recurrent face aging framework based on a recurrent neural network, which ages the face gradually and keeps the personalized information by memorizing the previous faces. Zhang et al. presented a conditional adversarial autoencoder network (CAAE) for learning a face manifold. Their method is based on conditional generative adversarial networks (CGAN) [6, 20], which show impressive results in image generation.
However, almost all existing works do not consider that a person's appearance may differ under different occupations. To facilitate such research, we introduce an occupational face aging dataset for exploring the effects of occupations. We also propose a new occupational-aware adversarial face aging network for age progression. The most similar work to ours is CAAE. Both CAAE and our method utilize an autoencoder and CGAN to generate high-quality images. The main difference is that we propose a personalized network and an occupational-aware adversarial network to explicitly pursue the common age pattern across ages and occupations. As a result, we obtain more aging details, e.g., more obvious wrinkles and blemishes. In contrast, CAAE assumes the face images lie on a manifold and proposes an autoencoder network to learn that manifold.
3 The Occupational Face Aging Dataset
3.1 Data Statistics
We collect a dataset of people in 5 different occupations for analysing occupational effects, referred to as the occupational face aging dataset (OFAD). OFAD consists of over 2,000 diverse face images divided into five occupations (actor, singer, doctor, teacher, and farmer). Each occupation contains over 200 male images and 200 female images of different races, and all images have obvious texture information. The ages of these occupational images range from 30 to 80, which we divide into two groups: a middle age group (30-50) and an old age group (50-80). Some example images are shown in Figure 1. We also collect about 200 images of persons without occupational information as training inputs; the ages of these persons range from 15 to 45.
3.2 Image Collection
Image search engines such as Google and Bing are common sources for constructing face aging datasets. In addition to these sources, we also collect face images from two available databases, CACD  and FGNET .
Collecting Images from Image Search Engines. We download face images from two representative image search engines, Google and Bing, each of which contains a great number of high-quality face images. In order to collect images with accurate occupational information, we use combinations of descriptive words that contain age information and an occupation name as keywords. For example, we use "retired doctor" as the keywords to download doctors' faces for the old age group (50-80).
Collecting Images from CACD. The CACD dataset contains more than 160,000 images of 2,000 celebrities from 16 to 62 years old. According to CACD, most of the celebrities' names are crawled from IMDb.com, one of the largest online movie databases, which contains profiles of millions of movies and celebrities. We therefore download the 30-50 and 50-62 year-old face images as the training set for the actor occupation.
Collecting Images from FGNET. To evaluate the performance, faces in FGNET are used as a test set. FGNET contains 1,002 images of 82 people with ages ranging from 0 to 69, and it includes ground-truth images for evaluation.
4 Our Approach
In this section, we introduce the Occupational-aware Adversarial Face Aging network (OAFA), which learns the human aging process under different occupations.
We first introduce some notation. We define $X$ as the set of young persons' images, $Y^m$ as a set of middle-age face images, and $Y^o$ as a set of elder face images; $Y^m$ and $Y^o$ carry occupational information. In the following, we only discuss how to generate the looks of elder people; the generation for middle age is similar. Let $Y^o = \{Y^o_1, \dots, Y^o_K\}$, where $Y^o_i$ denotes the set of images of the persons who have the $i$-th occupation and $n_i$ is the number of such images. The data distributions are denoted as $p_{data}(x)$ and $p_{data}(y_i)$, where $K = 5$ in our paper.
As illustrated in Figure 2, our architecture contains three components: the generator/encoder network $G$, the decoder network $De$, and the discriminator network $D$. Given a young face image $x$, it goes through multiple convolutional layers and is encoded into high-level feature maps. These feature maps are then conditioned on a certain occupation via the one-hot occupational label $c_i$ for the $i$-th occupation. Finally, the conditioned feature maps are decoded into a future look for that occupation, denoted $\hat{y}_i = G(x, c_i)$. Note that by only changing $c_i$, we can generate multiple outputs for different occupations. In addition, we have an adversarial discriminator $D$ which aims to distinguish the generated images $\hat{y}_i$ from the real elder images $y_i$, and a decode function $De$ which reconstructs its own input, i.e., $De(\hat{y}_i) \approx x$.
4.1 Loss Function
Our objective contains two types of terms: 1) a personalized loss for keeping human identity information and 2) an occupational-aware adversarial loss for obtaining the skin changes of different occupations.
4.1.1 Personalized Loss
The primary principle of face age progression is to preserve the personality of the input faces. For example, given a young face image $x$ and the generated old face image $\hat{y}_i$ of the $i$-th occupation, the generated face image and the young face image should be recognized as the same person.
To achieve this goal, we utilize the autoencoder approach [33, 10]. It includes an encoder and a decoder, in which the encoder learns a representation of the input data and the decoder reconstructs the representation back to its own input. We require that the generated face image can be reconstructed back to the original image, as in CycleGAN, i.e., $De(G(x, c_i)) \approx x$. In this way, the generated face image is one representation of the input young face image, which helps preserve the features of the young image and keep the personality of the young face. Thus, the personalized loss can be formulated as the reconstruction error between $De(G(x, c_i))$ and $x$.
The personalized loss limits the space of possible mapping functions because the generated images should be reconstructed back to the original images. Hence, the generated images cannot be far away from the source domain.
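As a concrete sketch, the personalized loss can be computed as a pixel-wise reconstruction error between the young face and its reconstruction from the aged output. The L1 form and the function name below are our assumptions for illustration; the paper does not fix the exact norm in this excerpt.

```python
import numpy as np

def personalized_loss(x_young, x_reconstructed):
    """Mean absolute (L1) error between the young face x and its
    reconstruction De(G(x, c)); a small value means the aged face
    still carries enough information to recover the identity."""
    return float(np.mean(np.abs(x_reconstructed - x_young)))

# Toy example: a flat "face" and a reconstruction off by 0.1 everywhere.
x = np.zeros((64, 64, 3))
x_rec = x + 0.1
print(personalized_loss(x, x_rec))
```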
4.1.2 Occupational-aware Adversarial Loss
The second principle is to preserve the common age pattern under different occupations; e.g., the generated face image for a farmer should be recognized as a farmer's face. We propose an occupational-aware adversarial loss to address this problem.
Inspired by the impressive results of the conditional generative adversarial network (CGAN) [6, 20] in image generation, we adopt it for our human aging process under different occupations. The objective is the standard CGAN minimax game conditioned on the occupation label: $G$ tries to generate images $\hat{y}_i = G(x, c_i)$ that look similar to images from the $i$-th occupation, and $D$ tries to distinguish the real occupational images $y_i$ from the generated images $\hat{y}_i$. $G$ minimizes this objective while $D$ aims to maximize it. Note that we also feed $D$ the generated faces $\hat{y}_j$ of other occupations, $j \neq i$, so that $\hat{y}_i$ is distinguished from the generated faces of different occupations.
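Numerically, the standard CGAN objective [6, 20] can be sketched as follows; the function names and the non-saturating generator variant are our assumptions, not the authors' exact formulation.

```python
import numpy as np

def discriminator_loss(d_real, d_fake):
    """D maximizes log D(y_i, c_i) + log(1 - D(G(x, c_i), c_i));
    equivalently, it minimizes the negated sum below."""
    return float(-np.mean(np.log(d_real)) - np.mean(np.log(1.0 - d_fake)))

def generator_loss(d_fake):
    """Non-saturating form: G maximizes log D(G(x, c_i), c_i)."""
    return float(-np.mean(np.log(d_fake)))

d_real = np.array([0.9, 0.8])  # D's scores on real occupational faces
d_fake = np.array([0.1, 0.2])  # D's scores on generated faces
print(discriminator_loss(d_real, d_fake), generator_loss(d_fake))
```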
In addition, a triplet rank loss with a margin explicitly requires that the generated image of the $i$-th occupation, $\hat{y}_i$, should be closer to the images of the $i$-th occupation than to those of other occupations. This makes the generated image of the $i$-th occupation look similar to the target domain and helps to distinguish the multiple output images.
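A minimal sketch of the triplet rank term [15, 21], written as a margin hinge over distances; the distance measure and the margin value are placeholders, since the exact formulation is not given in this excerpt.

```python
def triplet_rank_loss(d_target, d_other, margin=1.0):
    """Hinge ranking loss: zero once the generated face is closer to
    its target occupation (d_target) than to another occupation
    (d_other) by at least the margin; positive otherwise."""
    return max(0.0, d_target - d_other + margin)

# Satisfied triplet: the target occupation is much closer, no penalty.
print(triplet_rank_loss(0.2, 2.0))
# Violated triplet: the two occupations are too close, a penalty remains.
print(triplet_rank_loss(1.0, 1.3))
```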
4.1.3 Full Objective
Our full objective is the weighted sum of the three losses, $\mathcal{L} = \lambda_{per}\mathcal{L}_{per} + \lambda_{cgan}\mathcal{L}_{cgan} + \lambda_{tri}\mathcal{L}_{tri}$, where $\lambda_{per}$, $\lambda_{cgan}$, and $\lambda_{tri}$ control the importance of the three objectives. The final optimization problem can be formulated as a minimax game in which $G$ and $De$ minimize the full objective while $D$ maximizes the adversarial term.
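The combination can be sketched as a plain weighted sum; the weight names and default values below are placeholders, since the actual hyper-parameter settings are not stated in this excerpt.

```python
def full_objective(l_per, l_cgan, l_tri, w_per=1.0, w_cgan=1.0, w_tri=1.0):
    """Weighted sum of the personalized, conditional-adversarial, and
    triplet rank losses; G and De minimize it while D maximizes the
    adversarial term."""
    return w_per * l_per + w_cgan * l_cgan + w_tri * l_tri

print(full_objective(0.5, 1.2, 0.3))
```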
4.2 Network Architecture
Generator/Encoder. Our generator network follows the architectures proposed by Johnson et al. and SRGAN. It consists of two parts: an encoder sub-network and a residual generation sub-network. The encoder is a small network with three convolutional layers, which learns the feature maps that facilitate the following image generation. Each convolutional layer is followed by one instance-normalization layer and one ReLU layer. We use a stride of two in the last two convolutional layers, which makes the size of the output half of the input. Given an input image, the final output of the encoder is a set of high-level feature maps. The occupational information is a one-hot vector over the five occupations. We resize the one-hot vector into a cube in which the values in the channel of the chosen occupation are all one and the other channels are zero. This cube is then concatenated to the output of the encoder and used as the condition.
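The conditioning step can be sketched as tiling the one-hot occupation label into a spatial cube and concatenating it with the encoder's feature maps along the channel axis. The H×W×C array layout and the function name are our assumptions.

```python
import numpy as np

def condition_on_occupation(features, occ_idx, num_occupations=5):
    """Tile a one-hot occupation label into an (H, W, K) cube whose
    occ_idx-th channel is all ones, then concatenate it with the
    encoder feature maps along the channel axis."""
    h, w, _ = features.shape
    cube = np.zeros((h, w, num_occupations), dtype=features.dtype)
    cube[:, :, occ_idx] = 1.0
    return np.concatenate([features, cube], axis=-1)

feats = np.random.randn(32, 32, 64)          # encoder output
conditioned = condition_on_occupation(feats, occ_idx=2)
print(conditioned.shape)
```

Swapping `occ_idx` while keeping `feats` fixed is what yields the multiple occupational outputs for a single input face.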
Given the high-level feature maps and the occupational label as input, the second part is a residual network that generates the realistic face image. Residual connections are a powerful technique that makes very deep networks easy to train. Following the design of [8, 12], each residual block consists of two convolutional layers with 128 feature maps; each convolutional layer is followed by one instance-normalization layer and one ReLU activation function. There are 12 residual blocks in total.
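One residual block can be sketched as below, with identity maps standing in for the real 128-channel convolutions so the skip-connection structure stays visible; the ordering conv → instance-norm → ReLU follows the text, and everything else is illustrative.

```python
import numpy as np

def instance_norm(x, eps=1e-5):
    """Normalize each channel over its spatial dimensions, per image."""
    mean = x.mean(axis=(0, 1), keepdims=True)
    var = x.var(axis=(0, 1), keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

def residual_block(x, conv1, conv2):
    """y = x + f(x), where f applies conv -> instance-norm -> ReLU
    twice; the identity shortcut keeps deep stacks easy to train."""
    h = np.maximum(instance_norm(conv1(x)), 0.0)
    h = np.maximum(instance_norm(conv2(h)), 0.0)
    return x + h

# Identity maps stand in for the two 128-channel convolutions.
x = np.random.randn(8, 8, 128)
y = residual_block(x, conv1=lambda t: t, conv2=lambda t: t)
print(y.shape)
```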
After the residual blocks, we use bilinear interpolation to upsample the feature maps instead of deconvolutions, since deconvolutions tend to introduce characteristic artifacts [22, 3]. Bilinear interpolation is one of the basic resampling techniques and produces reasonably realistic images. Two bilinear interpolations are used to increase the size of the feature maps; each is followed by a convolutional layer, one instance-normalization layer, and one ReLU layer. The last convolutional layer is followed by one instance-normalization layer and one Tanh layer.
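The 2× bilinear upsampling step can be sketched directly; the align-corners sampling grid below is one reasonable choice, since the exact grid the authors use is not specified.

```python
import numpy as np

def bilinear_upsample2x(x):
    """Double the spatial size of an (H, W, C) array by bilinear
    interpolation; following it with a plain convolution avoids the
    checkerboard artifacts of strided deconvolution [22, 3]."""
    h, w, _ = x.shape
    ys = np.linspace(0.0, h - 1.0, 2 * h)  # sample rows in source coords
    xs = np.linspace(0.0, w - 1.0, 2 * w)  # sample cols in source coords
    y0 = np.floor(ys).astype(int)
    y1 = np.minimum(y0 + 1, h - 1)
    x0 = np.floor(xs).astype(int)
    x1 = np.minimum(x0 + 1, w - 1)
    wy = (ys - y0)[:, None, None]          # vertical blend weights
    wx = (xs - x0)[None, :, None]          # horizontal blend weights
    top = x[y0][:, x0] * (1 - wx) + x[y0][:, x1] * wx
    bot = x[y1][:, x0] * (1 - wx) + x[y1][:, x1] * wx
    return top * (1 - wy) + bot * wy

img = np.arange(4.0).reshape(2, 2, 1)  # [[0, 1], [2, 3]]
up = bilinear_upsample2x(img)
print(up.shape)
```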
Decoder. The decoder is an inverse of the generator; the only difference is that we remove the occupational conditions.
Discriminator. Our discriminator adopts PatchGAN as its basic framework. The input of the discriminator consists of an old image and a condition vector. All LReLU layers are leaky with slope 0.2.
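The discriminator's building blocks can be sketched as follows: a leaky ReLU with the slope of 0.2 given in the text, and an averaging step reflecting the PatchGAN idea of scoring local patches rather than the whole image (the pooling choice is our assumption).

```python
import numpy as np

def leaky_relu(x, slope=0.2):
    """LReLU with negative slope 0.2, used throughout D."""
    return np.where(x > 0.0, x, slope * x)

def patchgan_decision(score_map):
    """A PatchGAN discriminator emits a grid of per-patch real/fake
    scores; averaging them gives one scalar decision per image."""
    return float(np.mean(score_map))

out = leaky_relu(np.array([-1.0, 0.0, 2.0]))
print(out, patchgan_decision(np.ones((4, 4))))
```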
5 Experiments
In this section, we compare the proposed OAFA against several baselines both qualitatively and quantitatively.
5.1 Implementation of OAFA
As in previous works, we normalize the value of each pixel of the input images into the range [-1, 1], because normalized pixel values make training easier and lead to faster convergence. Similarly, when we concatenate the one-hot vector, we also map its values into the range from -1 to 1: the value 0 in the one-hot vector corresponds to -1 and the value 1 corresponds to 1. The output of the proposed architecture is also in the range [-1, 1] thanks to the Tanh layer.
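The pixel normalization can be sketched as a linear map between 8-bit pixels and the Tanh range [-1, 1]; the scaling constants assume 8-bit inputs, and the function names are ours.

```python
import numpy as np

def normalize(img_uint8):
    """Map 8-bit pixels in [0, 255] to [-1, 1]."""
    return img_uint8.astype(np.float32) / 127.5 - 1.0

def denormalize(x):
    """Map Tanh outputs in [-1, 1] back to 8-bit pixels."""
    return np.clip(np.rint((x + 1.0) * 127.5), 0, 255).astype(np.uint8)

img = np.array([[0, 128, 255]], dtype=np.uint8)
print(normalize(img))
```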
In training, the hyper-parameters weighting the three losses are fixed. The three networks are updated alternately with a mini-batch size of 1 using the stochastic gradient solver ADAM. After nearly 200 epochs, high-quality results can be obtained. During testing, only the generator is active: given a young face and a certain occupational condition, the generator produces the corresponding aged face.
5.2 Qualitative and Quantitative Comparison
In this subsection, we evaluate the performance of the proposed method. Following CAAE, we qualitatively and quantitatively compare with the ground truth and the best results from prior works [35, 34].
5.2.1 Comparison with Ground Truth
To qualitatively evaluate the performance, we compare our results with real face images with occupational information in Figure 3. We also compare with the face images in FGNET [16, 4] in Figure 4. We can see that OAFA captures the different textures of different occupations well, e.g., wrinkles, hair, blemishes, etc.
For the quantitative comparison, we recruited 100 volunteers. Each participant was shown a sequence of image triples: the original image, our generated image (one of the five results, selected at random), and the ground truth. Participants were asked whether the last two images show the same person, with a "not sure" option. There are 240 paired images of 48 subjects from FGNET in total. First, we randomly selected 40 pairs of images to let participants understand the test process. Then we randomly selected 100 paired images from the rest of the images for testing. 95 valid test results were received, with 60.32% indicating that the generated face image is the same person as the ground truth, 30.57% indicating they are not, and 9.11% not sure. Some test results are shown in Figure 5. The qualitative and quantitative comparisons show that our method can produce realistic images.
5.2.2 Comparison with State-of-the-arts
We select CAAE and RFA as our baselines. For a fair comparison, we use the same inputs without preprocessing to generate images, and all results of the baselines are directly cited from their original papers. Figure 6 and Figure 7 show the comparison results. We can see that our method generates more realistic, older-looking images with more skin textures. For example, even at 80 years old, the faces generated by CAAE look very young, whereas the details of skin aging can be clearly seen in the faces generated by our method.
For the quantitative comparison, we also recruited 100 volunteers. Each participant was shown a sequence of image triples: the original image, our generated image (again, one of the five results selected at random), and the image generated by prior work (we use the released code of CAAE, https://github.com/ZZUTK/Face-Aging-CAAE, to generate these images), and was asked which method performs better, with a "not sure" option. The test set consists of 740 paired images of 148 subjects from FGNET, and some examples are shown in Figure 8. We randomly selected 40 paired images to let participants understand the test process. Then, from the rest of the images, we randomly selected 100 paired images for testing. 92 valid test results were received, with 61.35% indicating that our method is better, 13.06% indicating our method is worse, and 25.59% not sure.
Note that no pre-processing is applied to our input images.
5.3 Effect of the Occupational-aware Adversarial Loss
To explore the effect of the triplet rank loss, we compare example results generated with and without it, keeping the other two losses fixed. Figure 9 shows the results. We can observe that using the triplet rank loss gives better results.
5.4 Effect of the Personalized Loss
To explore the effect of the personalized loss, we show example results using different values of its weight. Figure 10 shows the results over a range of weights. We can see that the generated face images look more like the input young faces when a larger weight is placed on the personalized loss.
6 Conclusion
In this paper, we proposed an occupation-aware face age progression method via the conditional generative adversarial network. In the proposed deep adversarial architecture, an input young face image conditioned on an occupation goes through the generator network, and the future look is generated. We presented a personalized network and an occupational-aware adversarial network for preserving personality and generating more realistic skin changes, respectively. Empirical evaluations, both qualitative and quantitative, demonstrate the appealing performance of our method.
In future work, we plan to study shape change, e.g., child growth, to obtain more realistic face images.
-  D. M. Burt and D. I. Perrett. Perception of age in adult caucasian male faces: Computer graphic manipulation of shape and colour information. Proceedings of the Royal Society of London B: Biological Sciences, 259(1355):137–143, 1995.
-  B.-C. Chen, C.-S. Chen, and W. H. Hsu. Cross-age reference coding for age-invariant face recognition and retrieval. In ECCV, pages 768–783. Springer, 2014.
-  Q. Chen and V. Koltun. Photographic image synthesis with cascaded refinement networks. arXiv preprint arXiv:1707.09405, 2017.
-  T. Cootes and A. Lanitis. The fg-net aging database, 2008.
-  Y. Fu, G. Guo, and T. S. Huang. Age synthesis and estimation via faces: A survey. TPAMI, 32(11):1955–1976, 2010.
-  I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio. Generative adversarial nets. In Advances in neural information processing systems, pages 2672–2680, 2014.
-  K. T. Gribbon and D. G. Bailey. A novel approach to real-time bilinear interpolation. In Electronic Design, Test and Applications, Proceedings. DELTA 2004. Second IEEE International Workshop on, pages 126–131. IEEE, 2004.
-  S. Gross and M. Wilber. Training and investigating residual nets. Facebook AI Research, CA. [Online]. Available: http://torch.ch/blog/2016/02/04/resnets.html, 2016.
-  K. He, X. Zhang, S. Ren, and J. Sun. Deep residual learning for image recognition. In CVPR, pages 770–778, 2016.
-  G. E. Hinton and R. R. Salakhutdinov. Reducing the dimensionality of data with neural networks. science, 313(5786):504–507, 2006.
-  P. Isola, J.-Y. Zhu, T. Zhou, and A. A. Efros. Image-to-image translation with conditional adversarial networks. arXiv preprint arXiv:1611.07004, 2016.
-  J. Johnson, A. Alahi, and L. Fei-Fei. Perceptual losses for real-time style transfer and super-resolution. In ECCV, pages 694–711. Springer, 2016.
-  I. Kemelmacher-Shlizerman, S. Suwajanakorn, and S. M. Seitz. Illumination-aware age progression. In CVPR, pages 3334–3341, 2014.
-  D. Kingma and J. Ba. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980, 2014.
-  H. Lai, Y. Pan, Y. Liu, and S. Yan. Simultaneous feature learning and hash coding with deep neural networks. In CVPR, pages 3270–3278, 2015.
-  A. Lanitis, C. J. Taylor, and T. F. Cootes. Toward automatic simulation of aging effects on face images. TPAMI, 24(4):442–455, 2002.
-  C. Ledig, L. Theis, F. Huszár, J. Caballero, A. Cunningham, A. Acosta, A. Aitken, A. Tejani, J. Totz, Z. Wang, et al. Photo-realistic single image super-resolution using a generative adversarial network. arXiv preprint arXiv:1609.04802, 2016.
-  A. C. Little and S. C. Roberts. Evolution, appearance, and occupational success. Evolutionary Psychology, 10(5):147470491201000503, 2012.
-  X. Mao, Q. Li, H. Xie, R. Y. K. Lau, Z. Wang, and S. P. Smolley. Least squares generative adversarial networks. 2016.
-  M. Mirza and S. Osindero. Conditional generative adversarial nets. arXiv preprint arXiv:1411.1784, 2014.
-  M. Norouzi, D. J. Fleet, and R. Salakhutdinov. Hamming distance metric learning. In NIPS, pages 1061–1069, 2012.
-  A. Odena, V. Dumoulin, and C. Olah. Deconvolution and checkerboard artifacts. Distill, 1(10):e3, 2016.
-  U. Park, Y. Tong, and A. K. Jain. Age-invariant face recognition. TPAMI, 32(5):947–954, 2010.
-  A. Radford, L. Metz, and S. Chintala. Unsupervised representation learning with deep convolutional generative adversarial networks. Computer Science, 2015.
-  N. Ramanathan and R. Chellappa. Modeling age progression in young faces. In CVPR, 2006 IEEE Computer Society Conference on, volume 1, pages 387–394. IEEE, 2006.
-  N. Ramanathan and R. Chellappa. Modeling shape and textural variations in aging faces. In Automatic Face & Gesture Recognition, 2008. FG’08. 8th IEEE International Conference on, pages 1–8. IEEE, 2008.
-  N. Ramanathan, R. Chellappa, S. Biswas, et al. Age progression in human faces: A survey. Journal of Visual Languages and Computing, 15:3349–3361, 2009.
-  X. Shu, J. Tang, H. Lai, L. Liu, and S. Yan. Personalized age progression with aging dictionary. In CVPR, pages 3970–3978, 2015.
-  J. Suo, X. Chen, S. Shan, W. Gao, and Q. Dai. A concatenational graph evolution aging model. TPAMI, 34(11):2083–2096, 2012.
-  J. Suo, S.-C. Zhu, S. Shan, and X. Chen. A compositional and dynamic model for face aging. TPAMI, 32(3):385–401, 2010.
-  B. Tiddeman, M. Burt, and D. Perrett. Prototyping and transforming facial textures for perception research. IEEE computer graphics and applications, 21(5):42–50, 2001.
-  D. Ulyanov, A. Vedaldi, and V. Lempitsky. Instance normalization: The missing ingredient for fast stylization. 2016.
-  P. Vincent, H. Larochelle, I. Lajoie, Y. Bengio, and P.-A. Manzagol. Stacked denoising autoencoders: Learning useful representations in a deep network with a local denoising criterion. Journal of Machine Learning Research, 11(Dec):3371–3408, 2010.
-  W. Wang, Z. Cui, Y. Yan, J. Feng, S. Yan, X. Shu, and N. Sebe. Recurrent face aging. In CVPR, pages 2378–2386, 2016.
-  Z. Zhang, Y. Song, and H. Qi. Age progression/regression by conditional adversarial autoencoder. arXiv preprint arXiv:1702.08423, 2017.
-  J.-Y. Zhu, T. Park, P. Isola, and A. A. Efros. Unpaired image-to-image translation using cycle-consistent adversarial networks. arXiv preprint arXiv:1703.10593, 2017.