I Introduction
In recent years, age progression has received considerable interest from the computer vision community. Early approaches required a great deal of time and professional skill, relying on the support of forensic artists. Since then, several breakthroughs have been achieved, and numerous automatic age progression approaches, ranging from anthropology theories to deep learning models, have been proposed. In general, age progression methods can be technically classified into four categories, i.e. modeling, reconstruction, prototyping, and deep learning based methods. Methods in the first three categories usually simulate the aging process of facial features by (1) adopting prior knowledge from anthropometric studies; or (2) representing the face geometry and appearance by a set of parameters via conventional models such as Active Appearance Models (AAMs) or 3D Morphable Models (3DMM), and manipulating these parameters via learned aging functions. Although they have achieved some inspiring synthesis results, these face representations are still linear and face many limitations in modeling the nonlinear aging process.
Meanwhile, the fourth category introduces modern approaches that use state-of-the-art Deep Generative Models (DGMs) for both face modeling and the aging embedding process. Since deep learning structures are better able to interpret and transfer the highly nonlinear features of the input signals, they are more suitable for modeling the human aging process. As a result, they can generate superior synthesized facial images [10, 11, 12, 39, 41]. Inspired by these state-of-the-art results, in this paper we review recent developments in face age progression. We present the structures and formulations of several Deep Generative Models, i.e. Restricted Boltzmann Machines (RBM), Deep Boltzmann Machines (DBM), and Generative Adversarial Networks (GAN), as well as the way they are adopted for the age progression problem. Moreover, several common face aging databases are also reviewed.
II Face Aging Databases
Collecting databases for face aging is also a challenging problem, because the collection process has several requirements: not only should each subject have images at different ages, but the covered age range should also be large. Therefore, face aging databases are still limited, both in terms of age labels and in the number of available databases. The characteristics and age distributions of several existing face aging databases are summarized in Table I and Fig. 2.
Table II: Properties of conventional age progression approaches.
Method | Approach | Architecture | Summary
Model-based | AAMs | ✗ |
Model-based | AAMs | ✗ |
Model-based | AAMs | ✗ | Incorporated familial facial cues
Model-based | Aging Patterns | Aging Pattern Subspace | Grammatical face model
Model-based | Aging Patterns | Aging Pattern Subspace |
Model-based | Part-based | And-Or Graph | Markov chain on parse graphs
Model-based | Part-based | And-Or Graph | Composition of short-term graph evolution
Prototype | Image pixel | ✗ |
Reconstructing | Sparse Representation | Coupled dictionaries |
Reconstructing | Sparse Representation | Hidden Factor Analysis |
Beyond these databases, a large-scale in-the-wild dataset named AGing Face in-the-Wild (AGFW) was also introduced in our work [10], with 18,685 facial images of individuals aged from 10 to 64. In this database, images are divided into 11 age groups, each spanning 5 years and containing 1,700 images on average. This database was then extended to AGFW-v2 at double the scale, i.e. 36,325 images with an average of 3,300 images per age group.
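As a small illustration of the grouping above, the following Python sketch maps an age to its group index. The helper name `age_group` and the convention that group 0 starts at age 10 are assumptions made for illustration, not details taken from [10]; the last line checks that the quoted totals are consistent with the per-group averages.

```python
def age_group(age: int, min_age: int = 10, span: int = 5, num_groups: int = 11) -> int:
    """Return the 0-based index of the 5-year age group containing `age` (hypothetical helper)."""
    max_age = min_age + span * num_groups - 1  # 64 for the default settings
    if not min_age <= age <= max_age:
        raise ValueError(f"age {age} is outside [{min_age}, {max_age}]")
    return (age - min_age) // span


print([age_group(a) for a in (10, 14, 15, 64)])  # [0, 0, 1, 10]
# Average group sizes implied by the totals quoted above.
print(round(18685 / 11), round(36325 / 11))  # 1699 3302
```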
III Conventional Approaches
In this section, we provide a brief review of conventional age progression approaches, including modeling-based, prototyping-based, and reconstructing-based approaches. Their properties are also summarized in Table II.
III-A Modeling-based approach
The modeling-based approach is among the earliest categories proposed for face age progression. These methods usually exploit appearance models, e.g. Active Appearance Models (AAMs) or 3D Morphable Models (3DMM), to represent the shape and texture of the input face by a set of parameters. The aging process is then simulated by learning aging functions from the relationship between the parameter sets of different age groups. In particular, Patterson et al. [26] and Lanitis et al. [20] employed a set of AAM parameters with four aging functions to model both the general and the specific aging processes. Four variations of aging functions were introduced: the Global Aging Function, the Appearance Specific Aging Function (ASA), the Weighted Appearance Aging Function (WAA), and the Weighted Person Specific Aging Function (WSA). Also employing AAMs in the modeling step, Luu et al. [24] later incorporated familial facial cues into the process of face age progression.
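To make the idea of an aging function concrete, here is a minimal NumPy sketch of a global aging function in the spirit of [20, 26], under simplifying assumptions: faces are already encoded as appearance-model parameter vectors, the average trajectory of each parameter is a quadratic in age, and aging is simulated by adding the change of that average trajectory to a subject's parameters. The quadratic form and all function names are illustrative choices; this is not the exact formulation of [20] or [26], and the person-specific and weighted variants are not covered.

```python
import numpy as np


def quadratic_design(ages) -> np.ndarray:
    """Rows [1, age, age^2] for each age."""
    ages = np.asarray(ages, dtype=float)
    return np.stack([np.ones_like(ages), ages, ages ** 2], axis=1)


def fit_global_aging_function(params: np.ndarray, ages: np.ndarray) -> np.ndarray:
    """Least-squares fit of the average trajectory b(age) = [1, age, age^2] @ C.

    params: (N, K) appearance-model parameter vectors of N training faces.
    ages:   (N,) ages of those faces.
    Returns C with shape (3, K).
    """
    coeffs, *_ = np.linalg.lstsq(quadratic_design(ages), params, rcond=None)
    return coeffs


def progress(b: np.ndarray, coeffs: np.ndarray, age_now: float, age_target: float) -> np.ndarray:
    """Shift one subject's parameters by the change of the average trajectory."""
    trajectory = quadratic_design([age_now, age_target]) @ coeffs  # (2, K)
    return b + trajectory[1] - trajectory[0]


# Usage on synthetic, noise-free data with K = 2 parameters.
rng = np.random.default_rng(0)
ages = rng.uniform(10, 60, size=50)
true_coeffs = np.array([[0.5, -1.0], [0.02, 0.01], [0.001, -0.0005]])
params = quadratic_design(ages) @ true_coeffs
coeffs = fit_global_aging_function(params, ages)
aged = progress(params[0], coeffs, ages[0], 60.0)
```

Because the toy data lie exactly on the quadratic trajectory, the fitted coefficients recover the generating ones, and progressing a training face lands on the average parameters at age 60.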
Another modeling direction was proposed in [13] with the definition of the AGing pattErn Subspace (AGES). In this approach, the authors construct a representative subspace for aging patterns, where each aging pattern is a chronological sequence of face images. Given an image, the proper aging pattern is determined by the projection onto this subspace that produces the smallest reconstruction error. Finally, the synthesized result at a target age is the reconstructed face at the corresponding age position in the subspace. Tsai et al. [38] then enhanced AGES using guidance faces corresponding to the subject's characteristics to produce more stable results. Suo et al. [36, 35] introduced a three-layer And-Or Graph (AOG) of smaller parts, i.e. eyes, nose, mouth, etc., to model a face. The face aging process was then learned for each part using a Markov chain.
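The projection step of AGES can be illustrated with a short NumPy sketch. It assumes that a mean aging pattern and a subspace basis have already been learned (in [13] the subspace is learned iteratively so that training patterns with missing ages can be used, which is omitted here); the unseen ages of a new subject are then filled in by fitting subspace coefficients to the observed entries only. Names and dimensions are illustrative.

```python
import numpy as np


def complete_aging_pattern(x, observed, mean, basis):
    """Fill the missing age slots of an aging pattern via subspace projection.

    x:        (A*D,) aging pattern (A age slots, D features per face); entries at
              unobserved positions are ignored and may be NaN.
    observed: (A*D,) boolean mask of the available entries.
    mean:     (A*D,) mean aging pattern.
    basis:    (A*D, K) subspace basis, assumed to be learned beforehand.
    The coefficients y minimize the reconstruction error on observed entries only.
    """
    y, *_ = np.linalg.lstsq(basis[observed], x[observed] - mean[observed], rcond=None)
    return mean + basis @ y


def face_at_slot(pattern, slot, dim):
    """Extract the D-dimensional face stored in age slot `slot`."""
    return pattern[slot * dim:(slot + 1) * dim]


# Usage: 4 age slots, 3 features per face, a 2-D subspace; only slot 0 is observed.
rng = np.random.default_rng(1)
A, D, K = 4, 3, 2
basis, _ = np.linalg.qr(rng.standard_normal((A * D, K)))
mean = rng.standard_normal(A * D)
full = mean + basis @ np.array([1.5, -0.7])
observed = np.zeros(A * D, dtype=bool)
observed[:D] = True
x = np.where(observed, full, np.nan)
completed = complete_aging_pattern(x, observed, mean, basis)
older_face = face_at_slot(completed, 3, D)
```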
III-B Prototyping approach
The main idea of the methods in this category is to predefine aging prototypes and transfer the differences between these prototypes to produce synthesized face images. Usually, the aging prototypes are defined as the average faces of the age groups [32]. An input face image can then be progressed to the target age by incorporating the differences between the prototypes of the two age groups [5]. Notice that this approach requires a good alignment between faces in order to produce plausible results. Kemelmacher-Shlizerman et al. [18] then proposed to construct high-quality average prototypes from a large-scale set of images. Sharper average faces are obtained via the collection flow method introduced in [23], which aligns and normalizes all the images in one age group. Illumination normalization and subspace alignment techniques are then employed to handle images with various lighting conditions. Figure 3 illustrates the results obtained in [18].
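The basic prototype transfer can be written in a few lines. This NumPy sketch assumes grayscale images with intensities in [0, 1] that are already aligned pixel-to-pixel, and transfers only the pixel-wise difference between the two prototypes; shape transfer and the illumination handling of [18] are not modeled. The function names are hypothetical.

```python
import numpy as np


def prototype(aligned_faces: np.ndarray) -> np.ndarray:
    """Average face of one age group; faces are assumed aligned, shape (N, H, W)."""
    return aligned_faces.mean(axis=0)


def age_by_prototypes(face: np.ndarray, proto_source: np.ndarray, proto_target: np.ndarray) -> np.ndarray:
    """Add the target-minus-source prototype difference, keeping intensities in [0, 1]."""
    return np.clip(face + (proto_target - proto_source), 0.0, 1.0)


# Usage with toy 4x4 "faces" whose intensities lie in [0, 1].
young_group = np.full((2, 4, 4), 0.4)
old_group = np.full((3, 4, 4), 0.6)
face = np.full((4, 4), 0.5)
aged = age_by_prototypes(face, prototype(young_group), prototype(old_group))
```

The clipping step is a design choice for the toy intensity range; without good alignment, the same pixel-wise difference would smear features, which is why the text stresses alignment.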
III-C Reconstructing-based approach
Rather than constructing aging prototypes for each age group, reconstructing-based methods focus on constructing an “aging basis” for each age group and model aging faces as a combination of these bases. Dictionary learning techniques are usually employed for this type of approach. Shu et al. [34] proposed to use aging coupled dictionaries (CDL) to model personalized aging patterns while preserving personalized facial features. The dictionaries are learned using face pairs from neighboring age groups via a personality-aware coupled reconstruction loss. Yang et al. [40] represented person-specific and age-specific factors independently using sparse representation hidden factor analysis (HFA). Since only the age-specific factor changes gradually over time, the age factor is transformed to the target age group via sparse reconstruction and then combined with the identity factor to obtain the aged face.
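The coupled-dictionary idea can be sketched as follows: a younger face is sparsely coded over the dictionary of its age group, and the same code is decoded with the coupled dictionary of the older group. This NumPy sketch assumes both dictionaries are already learned with shared codes, uses ISTA as a generic ℓ1 solver (not necessarily the solver used in [34]), and omits the personality-aware term of [34]; all names are hypothetical.

```python
import numpy as np


def soft_threshold(v: np.ndarray, thr: float) -> np.ndarray:
    return np.sign(v) * np.maximum(np.abs(v) - thr, 0.0)


def sparse_code(x: np.ndarray, dictionary: np.ndarray, lam: float = 1e-3, n_iter: int = 200) -> np.ndarray:
    """ISTA for min_a 0.5 * ||x - dictionary @ a||^2 + lam * ||a||_1."""
    step = 1.0 / np.linalg.norm(dictionary, 2) ** 2  # 1 / Lipschitz constant of the gradient
    a = np.zeros(dictionary.shape[1])
    for _ in range(n_iter):
        grad = dictionary.T @ (dictionary @ a - x)
        a = soft_threshold(a - step * grad, step * lam)
    return a


def age_with_coupled_dictionaries(x_young, dict_young, dict_old, lam=1e-3):
    """Encode with the younger-age dictionary, then decode with the coupled older one."""
    return dict_old @ sparse_code(x_young, dict_young, lam)


# Usage with toy coupled dictionaries that share the sparse code a_true.
rng = np.random.default_rng(2)
dict_young, _ = np.linalg.qr(rng.standard_normal((8, 8)))
dict_old = rng.standard_normal((8, 8))
a_true = np.zeros(8)
a_true[[1, 5]] = [1.0, -2.0]
x_old_hat = age_with_coupled_dictionaries(dict_young @ a_true, dict_young, dict_old)
```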
IV Deep Generative Models for Face Aging
In this section, we first provide an overview of the structures and formulations of common Deep Generative Models before going through the age progression techniques developed from these structures.
IV-A From Linear Models to Deep Structures
Compared to linear models such as AAMs and 3DMM, deep structures have gained significant attention as an emerging research topic, both for representing higher-level data features and for learning the distribution of observations. For example, being designed following the concepts of Probabilistic Graphical Models (PGM), RBM-based models organize their nonlinear latent variables in multiple connected layers with an energy function, such that each layer can learn a different factor to represent the data variations. This section introduces the structures and formulations of several Deep Generative Models, including RBM, Deep Boltzmann Machines (DBM), and Generative Adversarial Networks (GANs).
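IV-A1 Restricted Boltzmann Machines (RBM)
Restricted Boltzmann Machines (RBM) [15] are undirected graphical models consisting of two layers of stochastic units, i.e. visible units $\mathbf{v}$ and hidden units $\mathbf{h}$. An RBM is a simplified version of the Boltzmann Machine in which there are no connections between units within the same layer. The RBM structure is therefore a bipartite graph: the hidden units are conditionally independent given the visible units, and vice versa. Given a binary state of $\{\mathbf{v}, \mathbf{h}\}$, the energy of the RBM and the joint distribution of visible and hidden units can be computed as

$$E(\mathbf{v},\mathbf{h};\theta) = -\mathbf{v}^{\top}\mathbf{W}\mathbf{h} - \mathbf{b}^{\top}\mathbf{v} - \mathbf{a}^{\top}\mathbf{h}, \qquad P(\mathbf{v},\mathbf{h};\theta) = \frac{1}{Z(\theta)}\exp\big(-E(\mathbf{v},\mathbf{h};\theta)\big) \tag{1}$$

where $\theta = \{\mathbf{W}, \mathbf{b}, \mathbf{a}\}$ denotes the parameter set of the RBM, including the connection weights and the biases of the visible and hidden units, respectively, and $Z(\theta)$ is the partition function. The conditional probabilities of the RBM structure can be computed as

$$P(h_j = 1 \mid \mathbf{v}) = \sigma\Big(\sum_i W_{ij} v_i + a_j\Big), \qquad P(v_i = 1 \mid \mathbf{h}) = \sigma\Big(\sum_j W_{ij} h_j + b_i\Big)$$

where $\sigma(x) = 1/(1+e^{-x})$ is the logistic function.

In the original RBM, both visible and hidden units are binary. To make the model more powerful and able to deal with real-valued data, an extension of RBM named the Gaussian Restricted Boltzmann Machine was introduced in [19]. In a Gaussian RBM, the visible units are assumed to take values in $\mathbb{R}$ and to be normally distributed with mean $b_i + \sigma_i \sum_j W_{ij} h_j$ and variance $\sigma_i^2$, where $\sigma_i$ is the standard deviation of the $i$-th visible unit. Another extension of RBM is the Temporal Restricted Boltzmann Machine (TRBM) [37], which was designed to model complex time-series structure. The structure of TRBM is shown in Fig. 4 (b). The major difference between the original RBM and TRBM is the directed connections from both the visible and hidden units of previous states to the current states. With these new connections, the short history of their activations can act as a “memory” and contribute to the inference of the current states of the visible units.
IV-A2 Deep Boltzmann Machines (DBM)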
A DBM extends the RBM to more than one hidden layer; its structure can be viewed as several RBMs organized in layers. Thanks to this structure, the hidden units in higher layers can learn more complicated correlations of the features captured in lower layers. Another interesting point of DBM is that these higher-level representations can be built from the training data in an unsupervised fashion. Unlike other models such as Deep Belief Networks [16] or Deep Autoencoders [4], all connections between units in two consecutive layers of a DBM are undirected. As a result, each unit receives both bottom-up and top-down information and can therefore better propagate uncertainty during the inference process. Let $\mathbf{h} = \{\mathbf{h}^{(1)}, \mathbf{h}^{(2)}\}$ be the set of units in two hidden layers; the energy of the state $\{\mathbf{v}, \mathbf{h}\}$ is given as follows.

$$E(\mathbf{v},\mathbf{h};\theta) = -\mathbf{v}^{\top}\mathbf{W}^{(1)}\mathbf{h}^{(1)} - {\mathbf{h}^{(1)}}^{\top}\mathbf{W}^{(2)}\mathbf{h}^{(2)} \tag{2}$$

where $\theta = \{\mathbf{W}^{(1)}, \mathbf{W}^{(2)}\}$ are the weights of the visible-to-hidden and hidden-to-hidden connections. Notice that the bias terms for visible and hidden units are omitted in Eqn. (2) to simplify the representation. Exploiting the advantages of DBM, Deep Appearance Models (DAM) [9] and Robust Deep Appearance Models (RDAM) [27] have been introduced and shown to outperform classical models such as AAMs in inferring representations of new face images under various challenging conditions.
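The quantities in Eqns. (1) and (2) are easy to compute for small models. The NumPy sketch below evaluates the RBM energy, the two conditionals, the Gaussian-RBM visible mean, one block Gibbs step, and the DBM energy with biases omitted. Vectors are assumed to be 1-D float arrays, `std` stands for the per-unit standard deviations $\sigma_i$ of the Gaussian RBM (renamed to avoid a clash with the logistic function), and all function names are hypothetical.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def rbm_energy(v, h, W, b, a):
    """Eqn. (1): E(v, h) = -v^T W h - b^T v - a^T h."""
    return -(v @ W @ h) - b @ v - a @ h


def p_h_given_v(v, W, a):
    """P(h_j = 1 | v) for every hidden unit j."""
    return sigmoid(v @ W + a)


def p_v_given_h(h, W, b):
    """P(v_i = 1 | h) for binary visible units."""
    return sigmoid(W @ h + b)


def grbm_visible_mean(h, W, b, std):
    """Gaussian RBM: v_i | h ~ N(b_i + std_i * sum_j W_ij h_j, std_i^2)."""
    return b + std * (W @ h)


def gibbs_step(v, W, b, a, rng):
    """One block Gibbs step v -> h -> v'; the bipartite graph lets each layer be sampled at once."""
    h = (rng.random(W.shape[1]) < p_h_given_v(v, W, a)).astype(float)
    v_new = (rng.random(W.shape[0]) < p_v_given_h(h, W, b)).astype(float)
    return v_new, h


def dbm_energy(v, h1, h2, W1, W2):
    """Eqn. (2), biases omitted: E = -v^T W1 h1 - h1^T W2 h2."""
    return -(v @ W1 @ h1) - (h1 @ W2 @ h2)


# Usage: a 3-visible / 2-hidden RBM and one Gibbs step from a binary visible vector.
rng = np.random.default_rng(0)
W = rng.standard_normal((3, 2))
b, a = rng.standard_normal(3), rng.standard_normal(2)
v0 = np.array([1.0, 0.0, 1.0])
v1, h0 = gibbs_step(v0, W, b, a, rng)
```

Block sampling of a whole layer is exactly what the conditional independence of the bipartite graph buys; in a DBM the same trick applies to alternating layers.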
IV-A3 Generative Adversarial Networks (GAN)
In order to avoid the intractable Markov chain sampling during the training stage of RBM, Goodfellow et al. [14] borrowed the idea of an adversarial system to design Generative Adversarial Networks (GAN). The intuition behind this approach is to set up a game between a generator and a discriminator. On one hand, the discriminator learns to determine whether given data come from the generator or are real samples. On the other hand, the generator learns to fool the discriminator with its generated samples. This game continues as the learning process takes place, and it ideally stops at the point where the discriminator can no longer distinguish between real data and the samples produced by the generator. This indicates that the generator has learned the distribution of the input data. Formally, let $\mathbf{x}$ be the input data, $p_g$ be the distribution learned by the generator, and $p_z(\mathbf{z})$ be the prior distribution of the variable $\mathbf{z}$. A GAN is then defined by two neural networks representing two differentiable functions, the generator $G(\mathbf{z};\theta_g)$ and the discriminator $D(\mathbf{x};\theta_d)$, where $D(\mathbf{x})$ denotes the probability that $\mathbf{x}$ comes from the data distribution rather than from $p_g$; $\theta_g$ and $\theta_d$ are the parameters of the CNNs representing $G$ and $D$, respectively. The training process is then formulated as a minimax game in which $D$ is trained to maximize the probability of assigning the correct label to real and generated samples, while $G$ is trained to minimize $\log(1 - D(G(\mathbf{z})))$:

$$\min_{G}\max_{D} V(D,G) = \mathbb{E}_{\mathbf{x}\sim p_{data}(\mathbf{x})}\big[\log D(\mathbf{x})\big] + \mathbb{E}_{\mathbf{z}\sim p_{z}(\mathbf{z})}\big[\log\big(1 - D(G(\mathbf{z}))\big)\big] \tag{3}$$
In the original GAN, the use of fully connected neural networks for the generator makes it very hard to generate high-resolution face images. Numerous extensions of GAN focusing on different aspects of this structure have since been proposed in the literature, such as Laplacian Pyramid Generative Adversarial Networks (LAPGAN) [8], Deep Convolutional Generative Adversarial Networks (DCGAN) [28], InfoGAN [7], and Wasserstein GAN [3].
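The minimax objective in Eqn. (3) can be checked numerically on a toy discrete example. For a fixed generator, the optimal discriminator is $D^*(\mathbf{x}) = p_{data}(\mathbf{x})/(p_{data}(\mathbf{x}) + p_g(\mathbf{x}))$, and plugging it into Eqn. (3) gives $-\log 4 + 2\,\mathrm{JSD}(p_{data}\,\|\,p_g)$, a result from [14]. The NumPy sketch assumes strictly positive probabilities on a shared finite support so that every logarithm is defined; the function names are hypothetical.

```python
import numpy as np


def gan_value(p_data, p_g, d):
    """V(D, G) of Eqn. (3) for distributions on a shared finite support.

    p_data, p_g: probability vectors (p_g is the distribution of G(z), z ~ p_z).
    d:           discriminator outputs D(x) in (0, 1), one per support point.
    """
    return np.sum(p_data * np.log(d)) + np.sum(p_g * np.log(1.0 - d))


def optimal_discriminator(p_data, p_g):
    """D*(x) = p_data(x) / (p_data(x) + p_g(x)) for a fixed generator."""
    return p_data / (p_data + p_g)


def kl(p, q):
    return np.sum(p * np.log(p / q))


def jensen_shannon(p, q):
    m = 0.5 * (p + q)
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


p_data = np.array([0.5, 0.3, 0.2])
p_g = np.array([0.2, 0.3, 0.5])
v_star = gan_value(p_data, p_g, optimal_discriminator(p_data, p_g))
# With D = D*, V equals -log(4) + 2 * JSD(p_data || p_g); it reaches -log(4) when p_g == p_data.
print(v_star, -np.log(4) + 2 * jensen_shannon(p_data, p_g))
```

This is why the game's equilibrium corresponds to $p_g = p_{data}$: the generator's best achievable value, $-\log 4$, is reached only when the Jensen-Shannon divergence vanishes.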
Table III: Properties of Deep Generative Model Approaches for Age Progression. Deep Learning (DL), Log-Likelihood (LL), Inverse Reinforcement Learning (IRL), Probabilistic Graphical Models (PGM), Adversarial (ADV).
Method | Approach | Architecture
DL | RNN | ✔ | ✔ | ✗ | ✗
DL | GAN | ADV + | ✔ | ✔ | ✗ | ✗
DL | GAN | ADV + | ✔ | ✔ | ✗ | ✗
DL | GAN | ✔ | ✔ | ✗ | ✗
DL | TRBM | LL | ✔ | ✗ | ✗ | ✗
DL | ✔ | ✔ | ✗ | ✗
DL + IRL | ✔ | ✔ | ✔ | ✔
IV-B Deep Aging Models for Age Progression
Thanks to the power of deep learning models in modeling nonlinear variations, many deep learning based age progression approaches have recently been developed and have achieved considerable results in face age progression. Table III summarizes the key features of these approaches.
TRBM-based model
In addition to single face modeling, a TRBM-based age progression model was introduced in [10] to embed the temporal relationship between images in a face sequence. By taking advantage of the log-likelihood objective function and avoiding the reconstruction error during training, the model is able to efficiently capture the nonlinear aging process and automatically synthesize a series of age-progressed faces in various age ranges with more aging details. This approach presents a carefully designed architecture that combines RBM and TRBM for age variation modeling and age transformation embedding. Fig. 5(A) illustrates the aging architecture with TRBM proposed in [10]. In this approach, long-term aging development is considered as a composition of short-term changes and can be represented as a sequence of the subject's faces in different age groups. After the decomposition, a set of RBMs is employed to model the age variation of each age group as well as the wrinkles present in the faces of older ages. The TRBM-based model is then constructed to embed the aging transformation between faces of consecutive age groups. In particular, keeping a similar form of the energy function as the original TRBM and RBM, the bias terms are defined as
(4) 
where the bias terms depend on the model parameters and on the reference faces produced by the set of learned RBMs. With this structure, both linear and nonlinear interactions between faces are efficiently exploited. Finally, wrinkle enhancement together with geometry constraints is incorporated in post-processing steps for more consistent results. As a result, this technique achieves plausible synthesized results. A comparison of synthesis quality between this model and other conventional approaches is shown in Fig. 7.
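Eqn. (4) makes the RBM biases at one age group depend on the faces of the previous group. The NumPy sketch below illustrates this mechanism under a simplifying assumption: the previous face shifts the visible and hidden biases linearly through hypothetical matrices `A` and `B`, visible units are Gaussian with unit variance, and a single mean-field step produces the next face. It does not reproduce the exact parameterization of Eqn. (4) or the reference-face terms of [10]; all names are illustrative.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def dynamic_biases(x_prev, b_v, b_h, A, B):
    """Biases at the current age group: static biases shifted linearly by the previous face."""
    return b_v + A @ x_prev, b_h + B @ x_prev


def hidden_probs(x_t, x_prev, W, b_v, b_h, A, B):
    """P(h_j = 1 | x_t, x_prev) with the dynamic hidden biases."""
    _, bh_t = dynamic_biases(x_prev, b_v, b_h, A, B)
    return sigmoid(x_t @ W + bh_t)


def predict_next_face(x_prev, W, b_v, b_h, A, B):
    """One mean-field step: start from x_prev, infer h, and return the mean of
    unit-variance Gaussian visible units under the dynamic visible biases."""
    bv_t, _ = dynamic_biases(x_prev, b_v, b_h, A, B)
    h = hidden_probs(x_prev, x_prev, W, b_v, b_h, A, B)
    return bv_t + W @ h


# Usage with random toy parameters: 6 visible (face) units, 4 hidden units.
rng = np.random.default_rng(3)
n_v, n_h = 6, 4
W = 0.1 * rng.standard_normal((n_v, n_h))
A = 0.1 * rng.standard_normal((n_v, n_v))
B = 0.1 * rng.standard_normal((n_h, n_v))
b_v, b_h = np.zeros(n_v), np.zeros(n_h)
x_young = rng.standard_normal(n_v)
x_older = predict_next_face(x_young, W, b_v, b_h, A, B)
```

Chaining `predict_next_face` across age groups mirrors the idea of composing long-term aging from short-term steps, although the real model is trained by maximizing log-likelihood rather than by this forward pass alone.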
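Recurrent Neural Network-based model
Approaching age progression with a similar decomposition but without TRBM, Wang et al. [39] proposed to use a Recurrent Neural Network with a two-layer gated recurrent unit (GRU) to model the aging sequence. With the recurrent connections between hidden units, the model can efficiently exploit the information from previous faces as “memory” to produce smoother transitions between faces during the synthesis process. Fig. 6(A) illustrates the architecture of the proposed RNN for age progression. In particular, let $\mathbf{x}^t$ be the input face at a young age; this network first encodes it into a latent representation $\mathbf{h}^t$ (hidden/memory units) by the bottom GRU and then decodes this representation into an older face of the subject using the top GRU. The relationship between $\mathbf{x}^t$ and $\mathbf{h}^t$ can be interpreted as follows.

$$\begin{aligned}
\mathbf{r}^t &= \sigma\big(\mathbf{W}_r\mathbf{x}^t + \mathbf{U}_r\mathbf{h}^{t-1} + \mathbf{b}_r\big)\\
\mathbf{u}^t &= \sigma\big(\mathbf{W}_u\mathbf{x}^t + \mathbf{U}_u\mathbf{h}^{t-1} + \mathbf{b}_u\big)\\
\tilde{\mathbf{h}}^t &= \tanh\big(\mathbf{W}_h\mathbf{x}^t + \mathbf{U}_h(\mathbf{r}^t\odot\mathbf{h}^{t-1}) + \mathbf{b}_h\big)\\
\mathbf{h}^t &= (1-\mathbf{u}^t)\odot\mathbf{h}^{t-1} + \mathbf{u}^t\odot\tilde{\mathbf{h}}^t
\end{aligned} \tag{5}$$

where $\sigma$ is the logistic function, $\odot$ denotes element-wise multiplication, $\mathbf{r}^t$ and $\mathbf{u}^t$ are the reset and update gates, and $\mathbf{W}_{*}$, $\mathbf{U}_{*}$, $\mathbf{b}_{*}$ are the GRU parameters. Similar formulations are also employed for the relationship between $\mathbf{h}^t$ and the older face $\mathbf{x}^{t+1}$. The difference between the synthesized older face and the ground-truth aged face is then computed as a loss function, and the system is trained to minimize it. Finally, in order to generate wrinkles for the aged faces, a prototyping-style approach is adopted for wrinkle transfer. Although this approach has produced some improvements compared to classical approaches, the use of a fixed reconstruction loss function limits its synthesis ability and usually results in blurry faces.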
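The encoder-decoder recurrence can be sketched in NumPy. `gru_step` follows Eqn. (5) as written above; the linear read-out `W_out`, the feature dimensions, and the mean-squared ℓ2 loss are illustrative assumptions rather than details of [39], and the wrinkle-transfer step is omitted.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def gru_step(x, h_prev, p):
    """One GRU update as in Eqn. (5); p holds W_*, U_*, b_* for * in {r, u, h}."""
    r = sigmoid(p["W_r"] @ x + p["U_r"] @ h_prev + p["b_r"])  # reset gate
    u = sigmoid(p["W_u"] @ x + p["U_u"] @ h_prev + p["b_u"])  # update gate
    h_tilde = np.tanh(p["W_h"] @ x + p["U_h"] @ (r * h_prev) + p["b_h"])
    return (1.0 - u) * h_prev + u * h_tilde


def init_gru(in_dim, hid_dim, rng, scale=0.1):
    p = {}
    for g in ("r", "u", "h"):
        p[f"W_{g}"] = scale * rng.standard_normal((hid_dim, in_dim))
        p[f"U_{g}"] = scale * rng.standard_normal((hid_dim, hid_dim))
        p[f"b_{g}"] = np.zeros(hid_dim)
    return p


def age_one_step(x_young, memory, enc, dec, W_out):
    """Bottom GRU encodes the face, top GRU decodes; a linear read-out gives the older face."""
    h_enc = gru_step(x_young, memory["enc"], enc)
    h_dec = gru_step(h_enc, memory["dec"], dec)
    return W_out @ h_dec, {"enc": h_enc, "dec": h_dec}


def l2_loss(x_pred, x_true):
    return float(np.mean((x_pred - x_true) ** 2))


# Usage: 16-dimensional face features, 8 hidden units per GRU layer.
rng = np.random.default_rng(4)
D, H = 16, 8
enc, dec = init_gru(D, H, rng), init_gru(H, H, rng)
W_out = 0.1 * rng.standard_normal((D, H))
memory = {"enc": np.zeros(H), "dec": np.zeros(H)}
x_pred, memory = age_one_step(rng.standard_normal(D), memory, enc, dec, W_out)
loss = l2_loss(x_pred, rng.standard_normal(D))
```

Passing the returned `memory` into the next call is what lets earlier faces influence later transitions; the fixed ℓ2 loss is also the term the text identifies as a source of blur.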
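GAN-based model
Rather than the step-by-step synthesis of previous approaches, Antipov et al. [2], Zhang et al. [41], and Li et al. [22] took another direction for age progression, i.e. a direct approach, and adopted the GAN structure in their architectures. Fig. 5(B) illustrates the structure of the Conditional Adversarial Autoencoder (CAAE) [41]. From this figure, one can see that the authors adopted the GAN structure presented in Section IV-A3 with an additional age label feature in the representation of the latent variables. In this way, they can further encode the relationship between the identity-related high-level features of the input face and its age label. After training, by simply changing the age label according to the target age, the deep neural network generator is able to synthesize the aged face at that age. Compared to Eqn. (3), the objective function is adapted as follows:

$$\begin{aligned}
\min_{E,G}\max_{D_z,D_{img}}\;& \lambda\,\mathcal{L}\big(\mathbf{x}, G(E(\mathbf{x}),\mathbf{l})\big) + \gamma\,TV\big(G(E(\mathbf{x}),\mathbf{l})\big)\\
&+ \mathbb{E}_{\mathbf{z}^{*}\sim p(\mathbf{z})}\big[\log D_z(\mathbf{z}^{*})\big] + \mathbb{E}_{\mathbf{x}\sim p_{data}(\mathbf{x})}\big[\log\big(1 - D_z(E(\mathbf{x}))\big)\big]\\
&+ \mathbb{E}_{\mathbf{x},\mathbf{l}\sim p_{data}(\mathbf{x},\mathbf{l})}\big[\log D_{img}(\mathbf{x},\mathbf{l})\big] + \mathbb{E}_{\mathbf{x},\mathbf{l}\sim p_{data}(\mathbf{x},\mathbf{l})}\big[\log\big(1 - D_{img}(G(E(\mathbf{x}),\mathbf{l}))\big)\big]
\end{aligned}$$

where $\mathbf{l}$ denotes the vector representing the age label; $\mathbf{z} = E(\mathbf{x})$ is the latent feature vector produced by the encoder $E$; $G$ is the decoder function, i.e. the generator $G(\mathbf{z},\mathbf{l})$; $\mathcal{L}$ and $TV$ are the $\ell_2$ norm and total variation functions, weighted by $\lambda$ and $\gamma$, respectively; $D_z$ and $D_{img}$ are the discriminators on the latent vector and on the image; and $p_{data}$ denotes the distribution of the training data. As one can see, the conditional constraint on the age label is represented in the last two terms of the loss function. Although this type of model avoids the requirement of a longitudinal age database during training, it is not easy to train to convergence, because a good balance between the generator and the discriminator is hard to maintain. Moreover, similar to the RNN-based approach, GAN-based models also incorporate the $\ell_2$ norm in their objective functions; therefore, their synthesized results are limited in terms of image sharpness.
Temporal Non-Volume Preserving transformation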
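Recently, to address the intractable learning process of the TRBM-based model as well as the limited image quality of RNN-based and GAN-based approaches, the Temporal Non-Volume Preserving (TNVP) approach was introduced in [11] to embed the feature transformations between faces in consecutive stages while keeping a tractable density function with exact inference and evaluation. Unlike previous approaches, which incorporate only PGM or CNN structures, this model enjoys the advantages of both architectures to improve its image synthesis quality and highly nonlinear feature generation. The model starts from a PGM with the following relationships between variables in the image and latent domains (see Fig. 6(B)):

$$\mathbf{z}^{t-1} = \mathcal{F}_1(\mathbf{x}^{t-1};\theta_1), \qquad \mathbf{z}^{t} = \mathcal{H}(\mathbf{z}^{t-1},\mathbf{x}^{t};\theta_2,\theta_3) = \mathcal{G}(\mathbf{z}^{t-1};\theta_3) + \mathcal{F}_2(\mathbf{x}^{t};\theta_2) \tag{6}$$

where $\mathcal{F}_1$ and $\mathcal{F}_2$ denote the bijection functions mapping $\mathbf{x}^{t-1}$ and $\mathbf{x}^{t}$ to their latent variables $\mathbf{z}^{t-1}$ and $\mathbf{z}^{t}$, respectively, and $\mathcal{G}$ is the function embedding the aging transformation between latent variables. Since $\mathcal{G}(\mathbf{z}^{t-1};\theta_3)$ does not depend on $\mathbf{x}^{t}$, the Jacobian of $\mathcal{H}$ with respect to $\mathbf{x}^{t}$ reduces to that of $\mathcal{F}_2$, and the probability density function is derived as

$$p_X(\mathbf{x}^{t} \mid \mathbf{x}^{t-1};\theta) = p_Z(\mathbf{z}^{t} \mid \mathbf{z}^{t-1};\theta)\,\left|\det\frac{\partial \mathcal{F}_2(\mathbf{x}^{t};\theta_2)}{\partial \mathbf{x}^{t}}\right| \tag{7}$$

where $p_X(\mathbf{x}^{t} \mid \mathbf{x}^{t-1};\theta)$ and $p_Z(\mathbf{z}^{t} \mid \mathbf{z}^{t-1};\theta)$ denote the conditional distributions of $\mathbf{x}^{t}$ given $\mathbf{x}^{t-1}$ and of $\mathbf{z}^{t}$ given $\mathbf{z}^{t-1}$, respectively. With a specific design of the mapping functions $\mathcal{F}_1$ and $\mathcal{F}_2$, the two terms on the right-hand side of Eqn. (7) can be computed exactly and efficiently. As a result, the authors can form a deep CNN optimized under the concepts of PGM. By keeping a tractable log-likelihood density estimation in its objective function, the model takes age progression architectures in a new direction, where the CNN can avoid a fixed reconstruction loss function and obtain high-quality synthesized faces. Fig. 8 illustrates the synthesized results achieved by TNVP in comparison with other approaches.
Subject-dependent Deep Aging Path (SDAP) model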
Inspired by the advantages of TNVP, the Subject-dependent Deep Aging Path (SDAP) model [12] also takes Inverse Reinforcement Learning (IRL) into account in its structure. Under the hypothesis that each subject should have his/her own facial development, Duong et al. [12] proposed to add an aging controller to the structure of TNVP. Rather than only embedding the pairwise aging transformation between consecutive age groups, the SDAP structure learns from the aging transformation of the whole face sequence for better long-term aging synthesis. This goal is achieved via a Subject-Dependent Aging Policy Network, which provides an appropriate aging path for the age controller according to the subject's features. SDAP is also notable as one of the first approaches to incorporate an IRL framework into the age progression task. In this approach, let $\zeta_i = \{\mathbf{x}_i^1, \mathbf{a}_i^1, \mathbf{x}_i^2, \mathbf{a}_i^2, \ldots, \mathbf{x}_i^T\}$ be the age sequence of the $i$-th subject, where $\{\mathbf{x}_i^1, \ldots, \mathbf{x}_i^T\}$ is the face sequence representing the facial development of the $i$-th subject and $\mathbf{a}_i^t$ denotes the variables controlling the aging amount added to $\mathbf{x}_i^t$ to become $\mathbf{x}_i^{t+1}$. The probability of $\zeta_i$ can be formulated via an energy function $E(\zeta_i)$ as

$$P(\zeta_i) = \frac{1}{Z}\exp\big(-E(\zeta_i)\big) \tag{8}$$

where $Z$ is the partition function. Notice that the formulation of Eqn. (8) is very similar to the joint distribution of RBM variables in Eqn. (1). The goal is then to learn a Subject-Dependent Aging Policy Network that can predict $\mathbf{a}_i^t$ for each $\mathbf{x}_i^t$ during the synthesis process. The objective function is defined as
(9) 
Finally, a specific IRL framework is designed to learn the Policy Network. In the reported experiments, SDAP has shown its potential to outperform TNVP and other approaches in both synthesis quality and cross-age verification accuracy. As shown in Fig. 9, SDAP can significantly improve the accuracy of face recognition systems.
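Both TNVP and SDAP rely on densities that can be evaluated exactly. The NumPy sketch below illustrates the ingredients: an affine coupling layer, a standard example of a bijection with a triangular Jacobian (used here as a hypothetical stand-in for the mapping functions $\mathcal{F}_1$ and $\mathcal{F}_2$ of Eqn. (6)); the resulting conditional log-likelihood of Eqn. (7), under the assumption that $p_Z(\mathbf{z}^t \mid \mathbf{z}^{t-1})$ is a standard normal; and the normalization of Eqn. (8) over a finite set of candidate sequences, where $Z$ can be computed by summation. Layer sizes, the linear aging map, and all names are illustrative; they are not the architectures of [11] or [12].

```python
import numpy as np


class AffineCoupling:
    """Affine coupling layer: an invertible map whose log|det Jacobian| is cheap.

    The first half of the input passes through unchanged; the second half is
    scaled and shifted by functions of the first half (tiny linear maps here).
    """

    def __init__(self, dim, rng):
        self.d = dim // 2
        self.W_s = 0.1 * rng.standard_normal((dim - self.d, self.d))
        self.W_t = 0.1 * rng.standard_normal((dim - self.d, self.d))

    def _scale_shift(self, first_half):
        return np.tanh(self.W_s @ first_half), self.W_t @ first_half

    def forward(self, x):
        x1, x2 = x[:self.d], x[self.d:]
        s, t = self._scale_shift(x1)
        z = np.concatenate([x1, x2 * np.exp(s) + t])
        return z, float(np.sum(s))  # triangular Jacobian: log|det| = sum(s)

    def inverse(self, z):
        z1, z2 = z[:self.d], z[self.d:]
        s, t = self._scale_shift(z1)
        return np.concatenate([z1, (z2 - t) * np.exp(-s)])


def log_std_normal(z):
    return float(-0.5 * z @ z - 0.5 * z.size * np.log(2.0 * np.pi))


def tnvp_log_likelihood(x_t, x_prev, F1, F2, aging_map):
    """log p(x^t | x^{t-1}) following Eqns. (6)-(7), assuming p_Z(z^t | z^{t-1}) = N(0, I)."""
    z_prev, _ = F1.forward(x_prev)
    f2, log_det = F2.forward(x_t)
    z_t = aging_map(z_prev) + f2
    return log_std_normal(z_t) + log_det


def trajectory_probabilities(energies):
    """Eqn. (8) over a finite set of candidate sequences: P = exp(-E) / Z."""
    e = np.asarray(energies, dtype=float)
    w = np.exp(-(e - e.min()))  # shifting by the minimum avoids overflow
    return w / w.sum()          # the sum plays the role of the partition function Z


rng = np.random.default_rng(5)
F1, F2 = AffineCoupling(4, rng), AffineCoupling(4, rng)
G_mat = 0.1 * rng.standard_normal((4, 4))
x_prev, x_t = rng.standard_normal(4), rng.standard_normal(4)
ll = tnvp_log_likelihood(x_t, x_prev, F1, F2, lambda z: G_mat @ z)
probs = trajectory_probabilities([2.0, 1.0, 3.0])
```

For real sequences the set of possible trajectories is not finite, so $Z$ in Eqn. (8) cannot be summed directly; the explicit sum here only shows what the partition function normalizes.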
V Conclusion
In this paper, we have reviewed the main structures of Deep Generative Models for the age progression task. Compared to classical approaches, deep learning has shown its potential both in learning highly nonlinear age variations and in embedding aging transformations. As a result, not only do the synthesized faces improve in image quality, but they also help to significantly boost the recognition accuracy of cross-age face verification systems. Several common aging databases that support the facial modeling and aging embedding process are also discussed.
References
 [1] FG-NET Aging Database. http://www.fgnet.rsunit.com.
 [2] G. Antipov, M. Baccouche, and J.-L. Dugelay. Face aging with conditional generative adversarial networks. arXiv preprint arXiv:1702.01983, 2017.
 [3] M. Arjovsky, S. Chintala, and L. Bottou. Wasserstein GAN. arXiv preprint arXiv:1701.07875, 2017.
 [4] Y. Bengio. Learning deep architectures for AI. Foundations and Trends® in Machine Learning, 2(1):1–127, 2009.
 [5] D. M. Burt and D. I. Perrett. Perception of age in adult Caucasian male faces: Computer graphic manipulation of shape and colour information. Proc R Soc Lond B Biol Sci, 259(1355):137–143, 1995.
 [6] B.-C. Chen, C.-S. Chen, and W. H. Hsu. Cross-age reference coding for age-invariant face recognition and retrieval. In ECCV, 2014.
 [7] X. Chen, Y. Duan, R. Houthooft, J. Schulman, I. Sutskever, and P. Abbeel. InfoGAN: Interpretable representation learning by information maximizing generative adversarial nets. In NIPS, pages 2172–2180, 2016.
 [8] E. L. Denton, S. Chintala, R. Fergus, et al. Deep generative image models using a Laplacian pyramid of adversarial networks. In NIPS, pages 1486–1494, 2015.
 [9] C. N. Duong, K. Luu, K. G. Quach, and T. D. Bui. Beyond principal components: Deep Boltzmann machines for face modeling. In CVPR, pages 4786–4794. IEEE, 2015.
 [10] C. N. Duong, K. Luu, K. G. Quach, and T. D. Bui. Longitudinal face modeling via temporal deep restricted Boltzmann machines. In CVPR, 2016.
 [11] C. N. Duong, K. G. Quach, K. Luu, N. Le, and M. Savvides. Temporal non-volume preserving approach to facial age-progression and age-invariant face recognition. In ICCV, 2017.
 [12] C. N. Duong, K. G. Quach, K. Luu, T. Le, and M. Savvides. Learning from longitudinal face demonstration: Where tractable deep modeling meets inverse reinforcement learning. arXiv preprint arXiv:1711.10520, 2017.
 [13] X. Geng, Z.-H. Zhou, and K. Smith-Miles. Automatic age estimation based on facial aging patterns. PAMI, 29(12):2234–2240, 2007.
 [14] I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio. Generative adversarial nets. In NIPS, pages 2672–2680, 2014.
 [15] G. E. Hinton. Training products of experts by minimizing contrastive divergence. Neural Computation, 2002.
 [16] G. E. Hinton and R. R. Salakhutdinov. Reducing the dimensionality of data with neural networks. Science, 313(5786):504–507, 2006.
 [17] I. Kemelmacher-Shlizerman, S. M. Seitz, D. Miller, and E. Brossard. The MegaFace benchmark: 1 million faces for recognition at scale. In CVPR, 2016.
 [18] I. Kemelmacher-Shlizerman, S. Suwajanakorn, and S. M. Seitz. Illumination-aware age progression. In CVPR, pages 3334–3341. IEEE, 2014.
 [19] A. Krizhevsky and G. Hinton. Learning multiple layers of features from tiny images. 2009.
 [20] A. Lanitis, C. J. Taylor, and T. F. Cootes. Toward automatic simulation of aging effects on face images. PAMI, 24(4):442–455, 2002.
 [21] G. Levi and T. Hassner. Age and gender classification using convolutional neural networks. In CVPRW, 2015.
 [22] P. Li, Y. Hu, Q. Li, R. He, and Z. Sun. Global and local consistent age generative adversarial networks. arXiv preprint arXiv:1801.08390, 2018.
 [23] C. Liu, J. Yuen, and A. Torralba. SIFT flow: Dense correspondence across scenes and its applications. TPAMI, 33(5):978–994, 2011.
 [24] K. Luu, C. Suen, T. Bui, and J. K. Ricanek. Automatic child-face age-progression based on heritability factors of familial faces. In BIdS, pages 1–6. IEEE, 2009.
 [25] S. Moschoglou, A. Papaioannou, C. Sagonas, J. Deng, I. Kotsia, and S. Zafeiriou. AgeDB: the first manually collected, in-the-wild age database. In CVPRW, Hawaii, 2017.
 [26] E. Patterson, K. Ricanek, M. Albert, and E. Boone. Automatic representation of adult aging in facial images. In Proc. IASTED Int’l Conf. Visualization, Imaging, and Image Processing, pages 171–176, 2006.
 [27] K. G. Quach, C. N. Duong, K. Luu, and T. D. Bui. Robust deep appearance models. In ICPR, pages 390–395, 2016.
 [28] A. Radford, L. Metz, and S. Chintala. Unsupervised representation learning with deep convolutional generative adversarial networks. arXiv preprint arXiv:1511.06434, 2015.
 [29] N. Ramanathan and R. Chellappa. Modeling age progression in young faces. In CVPR, 2006.
 [30] K. Ricanek Jr and T. Tesafaye. MORPH: A longitudinal image database of normal adult age-progression. In FGR 2006, pages 341–345. IEEE, 2006.
 [31] R. Rothe, R. Timofte, and L. V. Gool. Deep expectation of real and apparent age from a single image without facial landmarks. IJCV, 2016.
 [32] D. Rowland, D. Perrett, et al. Manipulating facial appearance through shape and color. CG&A, IEEE, 15(5):70–76, 1995.
 [33] C.-T. Shen, W.-H. Lu, S.-W. Shih, and H.-Y. M. Liao. Exemplar-based age progression prediction in children faces. In ISM, pages 123–128. IEEE, 2011.
 [34] X. Shu, J. Tang, H. Lai, L. Liu, and S. Yan. Personalized age progression with aging dictionary. In ICCV, December 2015.
 [35] J. Suo, X. Chen, S. Shan, W. Gao, and Q. Dai. A concatenational graph evolution aging model. PAMI, 34(11):2083–2096, 2012.
 [36] J. Suo, S.-C. Zhu, S. Shan, and X. Chen. A compositional and dynamic model for face aging. PAMI, 32(3), 2010.
 [37] I. Sutskever and G. E. Hinton. Learning multilevel distributed representations for high-dimensional sequences. In AISTATS, pages 548–555, 2007.
 [38] M.-H. Tsai, Y.-K. Liao, and I.-C. Lin. Human face aging with guided prediction and detail synthesis. Multimedia Tools and Applications, 72(1):801–824, 2014.
 [39] W. Wang, Z. Cui, Y. Yan, J. Feng, S. Yan, X. Shu, and N. Sebe. Recurrent face aging. In CVPR, 2016.
 [40] H. Yang, D. Huang, Y. Wang, H. Wang, and Y. Tang. Face aging effect simulation using hidden factor analysis joint sparse representation. TIP, 25(6):2493–2507, 2016.
 [41] Z. Zhang, Y. Song, and H. Qi. Age progression/regression by conditional adversarial autoencoder. In CVPR, July 2017.