1 Introduction
The problem of face aging targets the capability to aesthetically synthesize the faces of a subject at older ages, i.e. age progression, or younger ages, i.e. age regression or de-aging. This problem is applicable in various real-world applications, from age-invariant face verification and finding missing children to cosmetic studies. Indeed, face aging has recently attracted considerable attention in the computer vision and machine learning communities. Several breakthroughs, with numerous face aging approaches varying from anthropology theories to deep learning structures, have been presented in the literature. However, the synthesized face aging results of these previous approaches are still far from perfect due to various challenging factors, such as heredity, living styles, etc. In addition, the face aging databases used in most methods to learn the aging process are usually limited in both the number of images per subject and the age ranges covered for each subject.
Both conventional and deep learning methods usually follow one of two directions, i.e. direct or step-by-step aging synthesis, in exploring the temporal face aging features from training databases. In the former direction, methods directly synthesize a face at the target age using the relationships between training images and their corresponding age labels. For example, the prototyping approaches (Burt and Perrett, 1995; Kemelmacher-Shlizerman et al, 2014; Rowland et al, 1995) use age labels to organize images into age groups and compute average faces as their prototypes. Then, the difference between the source-age and target-age prototypes is applied directly to the input image to obtain the age-progressed face at the target age. Similarly, the Generative Adversarial Network (GAN) approach (Zhang et al, 2017) models the relationship between a high-level representation of input faces and age labels by constructing a deep-neural-network generator. This generator is then incorporated with the target age labels to synthesize the outputs. Although these kinds of models are easy to train, they are limited in their capability to synthesize faces much older than the input faces of the same subject, e.g. directly from ten to 60 years old. Indeed, progressing a face at ten years old to one at 60 years old in these methods usually ends up with a synthesized face that uses 10-year-old features plus wrinkles.
Meanwhile, the latter approaches (Duong et al, 2017, 2016; Shu et al, 2015; Wang et al, 2016; Yang et al, 2016) decompose the long-term aging process into short-term developments and focus on embedding the aging transform between faces of two consecutive development stages. Using the learned transformation, these methods step-by-step generate progressed faces from one age group to the next until reaching the target. These modeling structures can efficiently learn the temporal information and provide more age variation even when the target age is very far from the input age of a subject. However, the main limitation of these methods is the lack of longitudinal face aging databases. The longest training sequence usually contains only three or four images per subject.
Limitations of previous approaches.
Whichever direction (i.e. direct or step-by-step aging synthesis) an aging approach falls into, these previous approaches still suffer from many challenging factors and retain numerous limitations. Table 1 compares the properties of different aging approaches.

Nonlinearity. Since human aging is a complicated and highly nonlinear process, the linear models mostly used in conventional methods (i.e. prototype, AAMs-based, and 3DMM-based approaches) are unable to efficiently interpret the aging variations, and the quality of their synthesized results is very limited.

Tractability. Exploiting the advantages of probabilistic graphical models has opened a promising direction for deep model design and has produced prominent synthesized results for the age progression task (Duong et al, 2016).

Data usability. Even when a subject in the training/testing set has multiple images at the same age, only one image is used to learn/synthesize in these methods. The other images are usually wastefully ignored. In addition, the aging transformation embedding in these approaches can only operate on images from two age groups.

Fixed aging development path. The learned aging development path is applied identically to all subjects, which is not true in reality. Instead, each subject should have his/her own aging development.
Contributions of this work.
This paper presents a novel Subject-dependent Deep Aging Path (SDAP) model for face age progression, which is an extension of our previous work (Duong et al, 2017). In that work, the TNVP structure was proposed to embed the pairwise transformations between two consecutive age groups. In this work, the SDAP structure is introduced to further enhance the capability to discover the optimal aging development path for each individual. This is achieved by embedding the transformation over the whole aging sequence of a subject under an IRL framework. Our contributions can be summarized as follows.

The aging transformation embedding is designed using (1) a tractable log-likelihood density estimation with (2) Convolutional Neural Network (CNN) structures and (3) an age controller to indicate the amount of aging change for synthesis. Thus, the proposed SDAP is able to provide a smoother synthesis across faces and maximize the usability of aging data, i.e. all images of a subject at different or the same ages are utilized.
Unlike most previous methods, our proposed SDAP model further enhances the capability to find the optimal aging development path for each individual. This is achieved by embedding the transformation over the whole aging sequence of a subject under an IRL framework.

Instead of using predefined or ad-hoc aging rewards and objective functions as in most previous work, our proposed approach allows the algorithm to automatically derive the optimal objective formulation and parameters via a data-driven strategy during training.
We believe that this is the first work that designs an IRL framework to model the longitudinal face aging.
2 Related work
This section reviews recent methods in face age progression. These methods can be technically classified into four categories, i.e. modeling, reconstruction, prototyping, and deep learning-based methods.
Modeling-based aging is one of the earliest categories presented for face age progression. These methods usually model both facial shapes and textures using a set of parameters and learn the face aging process via aging functions. (Patterson et al, 2006) and (Lanitis et al, 2002) employed a set of Active Appearance Models (AAMs) parameters with four aging functions to model both the general and the specific aging processes. Luu et al. (Luu et al, 2009) incorporated familial facial cues into the process of face age progression. (Geng et al, 2007) presented the AGing pattErn Subspace (AGES) method to construct a subspace for aging patterns as a chronological sequence of face images. (Tsai et al, 2014) then enhanced AGES using guidance faces corresponding to the subject's characteristics to produce more stable results. Texture synthesis was also combined in the later stage to produce better facial details. (Suo et al, 2010, 2012) introduced a three-layer And-Or Graph (AOG) of smaller parts, i.e. eyes, nose, mouth, etc., to model a face. Then, the face aging process was learned for each part using a Markov chain.
Reconstruction-based aging methods model aging faces by unifying the aging basis in each group. (Yang et al, 2016) represented person-specific and age-specific factors independently using sparse representation hidden factor analysis (HFA). (Shu et al, 2015) presented aging coupled dictionaries (CDL) to model personalized aging patterns by preserving personalized facial features.
Prototyping-based aging methods employ age prototypes to produce new face images. The average faces of all age groups are used as the prototypes (Rowland et al, 1995). Then, an input face image can be progressed to the target age by incorporating the differences between the prototypes of two age groups (Burt and Perrett, 1995). (Kemelmacher-Shlizerman et al, 2014) presented a method to construct high-quality average prototypes from a large-scale set of images. Subspace alignment and illumination normalization were also included in this system. Aging patterns across genders and ethnicities were also investigated in (Guo and Zhang, 2014).
Deep learning-based aging approaches have recently achieved considerable results in face age progression using the power of deep learning. (Duong et al, 2016) introduced Temporal Restricted Boltzmann Machines (TRBM) to represent the nonlinear aging process with geometry constraints, and spatial RBMs to model a sequence of reference faces and the wrinkles of adult faces. (Wang et al, 2016) approximated aging sequences using a Recurrent Neural Network (RNN) with two-layer Gated Recurrent Units (GRU). Recently, the Conditional Adversarial Autoencoder (CAAE) structure was also applied to synthesize aged images in (Antipov et al, 2017). (Duong et al, 2017) proposed a novel generative probabilistic model, called the Temporal Non-Volume Preserving (TNVP) transformation, to model a long-term facial aging process as a sequence of short-term stages.

3 Our Proposed SDAP
The TNVP structure has provided an efficient model to capture the pairwise transformation between faces of consecutive age groups (Duong et al, 2017). However, it still has some limitations. Firstly, TNVP mainly focuses on the pairwise relationship rather than the long-term relationship presented in an aging sequence. Secondly, the capability of applying different development paths to different subjects is still absent. In reality, each subject should have his/her own aging development progress because each person ages differently. In this section, we introduce a more flexible structure, named Subject-dependent Deep Aging Path (SDAP), with an additional component, i.e. an age controller. This age controller provides the capability of defining how much age variation should be added during synthesis. The architecture therefore benefits both the training stage, i.e. by maximizing the usability of training aging data, and the testing stage, i.e. by becoming flexible enough to adopt a different aging path for each subject according to his/her features. Moreover, instead of only learning from image pairs of a subject in two consecutive age groups, SDAP has the capability of embedding the aging transformation from longer aging sequences of that subject, which efficiently reflects the subject's long-term aging development. We also show that this goal can be achieved under an Inverse Reinforcement Learning (IRL) framework. The structure of this section is as follows: we first present our novel approach to model the facial structures in Subsection 3.1. Then, our IRL approach to longitudinal face aging modeling is detailed in Subsection 3.2.
3.1 Aging Embedding with Age Controller
The proposed architecture consists of three main components, i.e. (1) latent space mapping, (2) aging transformation, and (3) age controller. Our age controller provides the capability of defining how much age variation should be added during synthesis. Using this structure, our model is flexible enough to age input faces in different ways. Moreover, it also helps to maximize the usability of training aging data.
Structures and Variable Relationship Modeling:
Our graphical model (Fig. 2) consists of three sets of variables: observed variables x^t and x^s encoding the textures of face images in the image domain at two stages t and s; their corresponding latent variables z^t and z^s in the latent space; and an aging controller variable a^s. The aging controller a^s is represented as a one-hot vector indicating how many years of progression the process should perform on x^t. The bijection functions F_1, F_2, mapping from the observation space to the latent space, and the aging transformation G are defined as in Eqn. (1):

z^t = F_1(x^t; θ_1), z^s = F_2(x^s; θ_2), G: (z^t, a^s) → z^s   (1)

where θ = {θ_1, θ_2, θ_3} denotes the sets of parameters of F_1, F_2, and G, respectively. Notice that in SDAP, the structure of the bijection functions is adopted from the TNVP architecture. Then, the relationship between the latent variables z^t and z^s is computed as z^s = G(z^t, a^s; θ_3).
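As a concrete illustration of the kind of invertible mapping adopted from TNVP, the sketch below implements one real-NVP-style coupling unit in NumPy. The toy `scale_fn`/`translate_fn` callables stand in for the residual CNN blocks; names and shapes are illustrative assumptions, not the paper's exact implementation.

```python
import numpy as np

def coupling_forward(x, scale_fn, translate_fn):
    """One invertible coupling unit: the first half of x parameterizes an
    affine transform of the second half, so the log-determinant of the
    Jacobian is simply the sum of the scale outputs."""
    d = x.shape[-1] // 2
    x1, x2 = x[..., :d], x[..., d:]
    s, t = scale_fn(x1), translate_fn(x1)
    z2 = x2 * np.exp(s) + t
    z = np.concatenate([x1, z2], axis=-1)
    log_det = s.sum(axis=-1)  # log |det dz/dx|
    return z, log_det

def coupling_inverse(z, scale_fn, translate_fn):
    """Exact inverse of coupling_forward, reusing the same sub-networks."""
    d = z.shape[-1] // 2
    z1, z2 = z[..., :d], z[..., d:]
    s, t = scale_fn(z1), translate_fn(z1)
    x2 = (z2 - t) * np.exp(-s)
    return np.concatenate([z1, x2], axis=-1)
```

The exact round trip between image and latent space is what keeps both inference and generation tractable and invertible.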
The interactions between the latent variables and the aging controller variable are 3-way multiplicative. They can be mathematically encoded as in Eqn. (2):

z^s = G(z^t, a^s) = (Σ_k a^s_k W_k) z^t + b   (2)

where W is a 3-way tensor weight matrix and b is the bias of these connections. Eqn. (2) enables two important properties in the architecture. First, since a^s is a one-hot vector, different controllers will enable different sets of weights to be used. Thus, it allows controlling the amount of aging information to be embedded into the aging process. Second, given the age controller, the model is able to use all images of a subject to enhance its performance.

The Log-likelihood:
Given a face x^s in the age group s, the probability density function can be formulated as

p_{X^s}(x^s | x^t, a^s; θ) = p_{Z^s}(z^s | z^t, a^s) |det(∂F_2/∂x^s)|   (4)

where p_{Z^s}(z^s | z^t, a^s) is the distribution of z^s conditional on z^t and a^s, and ∂F_2/∂x^s is the Jacobian of the bijection F_2 at x^s. Then, the log-likelihood can be computed as follows:

log p_{X^s}(x^s | x^t, a^s; θ) = log p_{Z^s}(z^s | z^t, a^s) + log |det(∂F_2/∂x^s)|
The Joint Distributions:
In order to model the aging transformation flow, the Gaussian distribution is presented as the prior distribution for the latent space. After mapping to the latent space, the age controller variables are also constrained to a Gaussian distribution. In particular, let z_a represent the latent variables of the age controller a^s. The latent variables z^s and z_a distribute as Gaussians with means μ_s, μ_a and covariances Σ_s, Σ_a, respectively. Then, the latent z^s is distributed as

z^s ~ N(μ_s, Σ_s)   (5)

Since the connection between z^s and z_a embeds the relationship between variables of different Gaussian distributions, we further assume that their joint distribution is also a Gaussian. Then, the joint distribution can be computed as

[z^s; z_a] ~ N([μ_s; μ_a], Σ), with Σ = [[Σ_s, Σ_{sa}], [Σ_{sa}^T, Σ_a]]

where Σ_{sa} denotes the cross-covariance between z^s and z_a.
The Objective Function:
The parameters of the model are optimized to maximize the log-likelihood as in Eqn. (6):

θ* = arg max_θ log p_{X^s}(x^s | x^t, a^s; θ)   (6)

The Gaussian constraint on z_a is then incorporated into the objective function:

θ* = arg max_θ [ log p_{X^s}(x^s | x^t, a^s; θ) + ℓ(z_a; μ_a, Σ_a) ]

where ℓ(z_a; μ_a, Σ_a) is the log-likelihood function of z_a given mean μ_a and covariance Σ_a.
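The Gaussian log-likelihood term above has a simple closed form; a minimal sketch follows, assuming a diagonal covariance for readability (the full-covariance case only adds a determinant and a matrix solve).

```python
import numpy as np

def gaussian_loglik(z, mu, var):
    """Log-density of z under N(mu, diag(var)):
    -0.5 * sum( log(2*pi*var) + (z - mu)^2 / var )."""
    z, mu, var = map(np.asarray, (z, mu, var))
    return -0.5 * np.sum(np.log(2.0 * np.pi * var) + (z - mu) ** 2 / var)
```

Maximizing this term pulls the controller latents toward their assumed Gaussian prior while the flow term handles the face reconstruction likelihood.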
3.2 IRL Learning from Aging Sequence
In this section, we further extend the capability of our model by defining a Subject-dependent Deep Aging Policy Network to provide a planned aging path for the aging controller. Consequently, the synthesized sequence is guaranteed to be the best choice in the face aging development for a given subject.
Let τ_i be the observed aging sequence of the i-th subject and D = {τ_i} be the set of all aging sequences in the dataset. The probability of a sequence τ_i can be defined as

p(τ_i) = (1/Z) exp(−E(τ_i; θ))   (7)

where E(τ; θ) is an energy function parameterized by θ, and Z = Σ_τ exp(−E(τ; θ)) is the partition function computed over all possible aging sequences τ. Then, the goal is to learn a model such that the log-likelihood of the observed aging sequences is maximized as follows:

θ* = arg max_θ Σ_{τ_i ∈ D} log p(τ_i)   (8)

In Eqn. (8), if −E(τ; θ) is considered as a form of a reward function, then the problem is equivalent to learning a policy network from a Reinforcement Learning (RL) system given a set of demonstrations D.
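For intuition, the sequence probability of Eqn. (7) can be evaluated exactly over a small enumerated candidate set, as in the sketch below; in the actual model the partition function is intractable and is approximated by sampling.

```python
import numpy as np

def sequence_probs(energies):
    """p(tau) = exp(-E(tau)) / Z over a finite set of candidate sequences,
    the maximum-entropy form of Eqn. (7). The minimum energy is subtracted
    before exponentiating for numerical stability."""
    e = np.asarray(energies, dtype=float)
    w = np.exp(-(e - e.min()))
    return w / w.sum()
```

Lower-energy (higher-reward) sequences receive higher probability, which is the property the log-likelihood objective in Eqn. (8) exploits.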
The reward function is the key element for policy learning in RL. However, predefining a reasonable reward function for face aging synthesis is impossible in practice. Indeed, it is very hard to measure the goodness of age-progressed images even when the ground-truth faces of the subject at these ages are available. Therefore, rather than predefining an ad-hoc aging reward, the energy is represented as a neural network with parameters θ and adopted as a nonlinear cost function of an Inverse Reinforcement Learning problem.
In this IRL system, the cost can be directly learned from the set of observed aging sequences D. Fig. 4 illustrates the structure of the proposed IRL framework. Based on this structure, given a set of aging sequences as demonstrations, not only can the cost function be learned to maximize the log-likelihood of the observed age sequences, but the policy, i.e. predicting the aging path for each individual, is also obtained with respect to the optimized cost.
Mathematically, the IRL-based age progression procedure can be formulated as follows. Let M = (S, A, P, D, C) be a Markov Decision Process (MDP), where S, A, and P denote the state space, the action space, and the transition model, respectively; D is the set of observed aging sequences; and C represents the cost function. Given an MDP M, our goal is to discover the unknown cost function C from the observations D as well as to simultaneously extract the policy that minimizes the learned cost.

State: The state s_t is defined as a composition of two pieces of information, i.e. the face image x^t at the t-th stage and the age label of x^t.

Action: Similar to the age controller, an action a_t is defined as the amount of aging variation that the progression process should perform on state s_t. Given s_t, an action a_t is selected by stochastically sampling from the action probability distribution. During testing, given the current state, the action with the highest probability is chosen for the synthesizing process. Due to data availability, where the largest aging distance between the starting and ending images of a sequence is 15, we choose the length of the action space as 16 (i.e. 15 aging steps plus one action for the case where x^t and x^{t+1} have the same age).

Cost Function: The cost function plays a crucial role in guiding the whole system to learn the sequential policies that yield a specific aging path for each subject. Taking a state s_t and an action a_t as inputs, the cost function maps them to a value c(s_t, a_t). Thus, the cost for the i-th aging sequence can be obtained as C(τ_i) = Σ_t c(s_t, a_t). In order to learn a complex and nonlinear cost formulation, each c(s_t, a_t) is approximated by a neural network with two hidden layers of 32 hidden units followed by Rectified Linear Units (ReLU).
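The per-step cost network just described can be sketched in a few lines of NumPy; the initialization scale is an assumption made for illustration.

```python
import numpy as np

def init_cost_net(in_dim, hidden=32, seed=0):
    """Random parameters for a two-hidden-layer cost network
    (32 ReLU units per layer, scalar output)."""
    rng = np.random.default_rng(seed)
    sizes = [(in_dim, hidden), (hidden, hidden), (hidden, 1)]
    return [(rng.normal(scale=0.1, size=s), np.zeros(s[1])) for s in sizes]

def cost_forward(params, state_action):
    """Map a concatenated (state, action) vector to a scalar cost c(s, a)."""
    h = np.asarray(state_action, dtype=float)
    for W, b in params[:-1]:
        h = np.maximum(h @ W + b, 0.0)  # ReLU hidden layers
    W, b = params[-1]
    return float((h @ W + b)[0])
```

Summing `cost_forward` over the (state, action) pairs of a sequence gives the sequence cost C(τ) used by the IRL objective.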
Policy: Given the cost function C, the policy is presented as a Gaussian trajectory distribution as follows:

π(τ) = N(τ; μ_τ, Σ_τ)   (9)

It is then optimized with respect to the expected cost E_{τ~π}[C(τ)].
Given the defined states and actions, an observed aging sequence is redefined as τ_i = {(s_1, a_1), (s_2, a_2), ...}. The log-likelihood in Eqn. (8) can be rewritten as

L(θ) = −(1/N) Σ_{τ_i ∈ D} C(τ_i; θ) − log Z   (10)

Since the computation of the partition function Z is intractable, the sampling-based approach (Finn et al, 2016) is adopted to approximate the second term of L(θ) in Eqn. (10) by a set of aging sequences sampled from a distribution q:

log Z ≈ log (1/M) Σ_j exp(−C(τ_j; θ)) / q(τ_j)

where M is the number of aging sequences τ_j sampled from the sampling distribution q. Then the gradient is given by

dL/dθ = −(1/N) Σ_{τ_i ∈ D} dC(τ_i; θ)/dθ + Σ_j w_j dC(τ_j; θ)/dθ   (11)

where w_j ∝ exp(−C(τ_j; θ))/q(τ_j) and the weights are normalized so that Σ_j w_j = 1.
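The self-normalized importance weights in the sampled gradient can be computed stably in log-space, as in this sketch:

```python
import numpy as np

def importance_weights(costs, log_q):
    """Weights w_j proportional to exp(-C(tau_j)) / q(tau_j), normalized to
    sum to one. Working in log-space and subtracting the max avoids
    overflow when costs vary widely."""
    logw = -np.asarray(costs, dtype=float) - np.asarray(log_q, dtype=float)
    logw -= logw.max()
    w = np.exp(logw)
    return w / w.sum()
```

Sequences with lower cost (relative to how likely the sampler was to propose them) receive larger weight, so the gradient pushes the cost network to assign demonstrations lower cost than the current samples.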
The choice of the distribution q is critical to the success of this approximation. It can be adaptively optimized by first initializing q with a uniform distribution and then following an iterative three-step optimization process: (1) generate a set of aging sequences from q; (2) update the cost function using Eqn. (11); and (3) refine the distribution q as in Eqn. (12):

q* = arg min_q E_{τ~q}[C(τ; θ)] − H(q)   (12)

where H(q) denotes the entropy of q. To solve Eqn. (12), we adopt the optimization approach of (Levine and Abbeel, 2014), which also results in a policy. Algorithm 1 presents the learning procedure for our policy network and cost function parameters.
Face aging with single and multiple inputs: During the testing stage, given a face, its inputs, i.e. image and age, are used to form the first state s_1. The action for s_1 is predicted by the policy network. Then, the synthesis component produces the age-progressed face for the next state. This step is repeated until the age of the synthesized face reaches the target age.
Using this structure, the framework can be easily extended to take multiple inputs. Given several inputs to the framework, they are first ordered by age and an input sequence is created, where each state contains the i-th input face and its age, and each action is the age difference between consecutive inputs. The synthesis component can then be employed to obtain the values of the latent variable. This variable can act as a "memory" that encodes all information from the inputs. We then initialize the synthesis process from this latent variable and proceed as in the single-input case.
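The single-input testing loop described above can be sketched as a greedy rollout; `policy` and `synthesize` are stand-in callables for the policy network and synthesis component, and the clamping of the final step is an assumption added so the rollout lands exactly on the target age.

```python
def progress_to_target(face, age, target_age, policy, synthesize):
    """Greedy test-time rollout: the policy picks an age step for the
    current state, the synthesis component produces the next face, and the
    loop repeats until the target age is reached."""
    while age < target_age:
        step = policy(face, age)                    # highest-probability action
        step = max(1, min(step, target_age - age))  # clamp; avoid overshoot
        face = synthesize(face, age, step)
        age += step
    return face, age
```

The multi-input variant simply runs the same loop after encoding the ordered inputs into the initial latent "memory".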
4 Model Properties
Tractability and Invertibility. Similar to its predecessor, i.e. TNVP, with the specific structure of the invertible mapping functions, both the inference and generation processes in SDAP are exact, tractable, and invertible.
Generalization. During the training stage, the action is selected by stochastically sampling from the action probability distribution. This helps our model implicitly handle uncertainty during the learning process and generalize to all aging steps of the age controller.
Capability of learning from a limited number of face images. As we can see in Eqn. (10), the first term is the data-dependent expectation, which can be easily computed using training data. The second term is considered the model expectation and is computed via a sampling-based approach. Thanks to the sampling process, our model can still approximate the distribution with only a small number of training sequences used for the first term.
5 Discussion
By setting up the invertible mapping functions as deep convolutional networks, the SDAP structure is able to share the advantages of its predecessor, i.e. TNVP, in its capability to efficiently capture highly nonlinear facial features while maintaining a tractable log-likelihood density estimation. Besides aging variation, SDAP is also able to effectively handle other variations such as pose, expression, illumination, etc., as can be seen in Figs. 5 and 6.
Unlike TNVP, SDAP provides a more advanced architecture that optimizes the amount of aging information to be embedded into the input face. This ability benefits not only the training process, i.e. maximizing the usability of the training data, but also the testing phase, i.e. more flexibility and control over the progressed face to be synthesized via the age controller. Fig. 3 illustrates different age-progressed results obtained by varying the values of the age controller. Larger gap values produce older faces.
While previous aging approaches only embed the aging information via the relationships between the input image and an age label (i.e. the direct approach), or between images of two consecutive age groups (i.e. the step-by-step approach), the SDAP structure aims at learning from the entire age sequence, with a learning objective designed specifically for sequences (see Eqn. (7)). Under the IRL framework, the whole sequence can be fitted into the learning process for network optimization. As a result, more stable aging sequences can be synthesized.
Moreover, SDAP's policy learning is more advanced than Imitation Learning through supervised learning. In particular, in SDAP, the aging relationship between variables in the whole sequence is explicitly considered and optimized during the learning process (see Eqn. (8)). Therefore, besides its generalization ability, SDAP is able to recover from "out-of-track" results in intermediate steps during synthesis. On the other hand, Imitation Learning lacks generalization capability and cannot recover from failures (Attia and Dayan, 2018). Moreover, since input face images usually contain other variations (i.e. poses, expressions, etc.) besides age variation, the synthesized results in intermediate steps easily deviate from the optimal trajectory of the demonstration. As a result, Imitation Learning will produce a cascade of errors and reduce the performance of the whole system.

6 Experimental Results
6.1 Databases
The proposed SDAP approach is trained and evaluated using two training and two testing databases that do not overlap. The training sets consist of images from AginG Faces in the Wild (Duong et al, 2016) and aging sequences from the Cross-Age Celebrity Dataset (Chen et al, 2014). In testing, two common databases, i.e. FGNET (fgN, 2004) and MORPH (Ricanek Jr and Tesafaye, 2006), are employed.
AginG Faces in the Wild (AGFW) is a large-scale dataset with 18,685 images collected from search engines and mugshot images from public domains.
Cross-Age Celebrity Dataset (CACD) includes 163,446 images of 2,000 celebrities with ages ranging from 14 to 62.
FGNET is a common testing database for both age estimation and synthesis. It includes 1,002 images of 82 subjects, with ages ranging from 0 to 69 years old.
MORPH provides two albums with different scales. The small-scale album consists of 1,690 images while the large-scale one includes 55,134 images. We use the small-scale album in our evaluation.
6.2 Implementation Details
Data setting.
In order to train our SDAP model, we first extract the face regions of all images in AGFW and CACD and align them according to fixed positions of the two eyes and mouth corners. Then, we select all possible image pairs (at ages t and s, with t ≤ s) of a subset of 575 subjects from CACD and obtain 13,667 training pairs. From the images of these subjects, we further construct the observed aging sequence set by ordering all images of each subject by age. This process produces 575 aging sequences.
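The pair-selection step can be sketched as below; the input tuple format is an assumption and the paper's exact age constraint on a pair is not reproduced here.

```python
def make_training_pairs(images):
    """Enumerate ordered same-subject image pairs with the younger-or-equal
    age first. `images` is a list of (subject_id, age, image) tuples."""
    pairs = []
    for i, (sid_i, age_i, img_i) in enumerate(images):
        for j, (sid_j, age_j, img_j) in enumerate(images):
            if i != j and sid_i == sid_j and age_i <= age_j:
                pairs.append(((img_i, age_i), (img_j, age_j)))
    return pairs
```

Sorting each subject's images by age in the same pass yields the observed aging sequences used for the IRL stage.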
Training Stages.
Using these training data, we adopt the structure of the mapping functions in (Duong et al, 2017) for our bijections and pretrain them using all images from AGFW for the capability of face interpretation. Then a two-step training process is applied. In the first step, the structure of the synthesis unit, with two mapping functions and an age controller, is employed to learn the aging transformation presented in all 13,667 training pairs. The synthesis units are then composed to form the synthesis component. In the second step, Subject-Dependent Aging Policy Learning is applied to embed the aging relationships of the observed face sequences and learn the policy network.
Model Structure.
The structure of the bijections includes 10 mapping units, where each unit is set up with 2 residual CNN blocks with 32 hidden feature maps for its scale and translation components. The training batch size is set to 64. In the IRL model, a fully connected network with two hidden layers is employed to build the policy model. Each layer contains 32 hidden units followed by a ReLU activation. The input of this policy network is the state defined in Sec. 3.2, with a dimension of 12,289. The output of this policy network is the probability for each of the 16 actions, and a tanh activation function is applied to obtain the predicted action. To model the reward/cost function, we adopt a regression network with two hidden layers to predict the reward given the state and action. Each layer in the network has 32 hidden units followed by a ReLU operator.
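The policy network's forward pass can be sketched as follows. A softmax output is used here to obtain a proper distribution over the 16 actions; the paper's exact output activation differs, so treat this as an illustrative sketch with assumed shapes.

```python
import numpy as np

def policy_forward(state, params):
    """Two-hidden-layer policy network (32 ReLU units each) producing a
    probability distribution over the aging actions via softmax."""
    h = np.asarray(state, dtype=float)
    for W, b in params[:-1]:
        h = np.maximum(h @ W + b, 0.0)  # ReLU hidden layers
    W, b = params[-1]
    logits = h @ W + b
    e = np.exp(logits - logits.max())   # stable softmax
    return e / e.sum()
```

At test time, the action is taken as the arg-max of this distribution; during training, it is sampled stochastically.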
6.3 Age Progression
Since our SDAP is trained using face sequences with age ranging from 10 to 64 years old, it is evaluated using all faces above ten years old in FGNET and MORPH. Given faces of different subjects, our aging policy can find the optimal aging path to reach the target ages via intermediate ageprogressed steps (Fig. 5). Indeed, SDAP not only produces aging path for each individual, but also well handles inthewild facial variations, e.g. poses, expressions, etc.
In addition, the facial textures and shapes also evolve naturally and implicitly according to individual differences. In particular, more changes are concentrated around the 20s, the 40s, and beyond 50, where beards and wrinkles naturally appear in the age-progressed faces. In Fig. 6, we further compare our synthesized results against other recent work, including IAAP (Kemelmacher-Shlizerman et al, 2014), CAAE (Zhang et al, 2017), and TNVP (Duong et al, 2017). The predicted aging path of each subject is also displayed for reference. When the age distance between the input and target ages becomes large, direct age progression approaches usually produce synthesized faces that are similar to the input faces plus wrinkles. Step-by-step age progression tends to produce better synthesis results but is still limited in the amount of variation in the synthesized faces. SDAP shows its advantage in the capability of capturing and producing more aging variation in faces of the same age group. Fig. 8 presents further results at different ages with the real faces as reference.
6.4 Age Invariant Face Recognition
Our SDAP is also validated using the two testing protocols as in (Duong et al, 2017) with two benchmarking sets of crossage face verification, i.e. smallscale and largescale sets.
Smallscale crossage face verification.
In this protocol, we first construct a set A of 1,052 randomly picked image pairs from FGNET with age spans larger than 10 years. There are 526 positive pairs (same subjects) and 526 negative pairs (different subjects). For each pair in A, SDAP synthesizes the face at the younger age toward the age of the older one. This process results in a set of SDAP-synthesized pairs. The same process is then applied using other age progression methods, i.e. IAAP (Tsai et al, 2014), TRBM (Duong et al, 2016), and TNVP (Duong et al, 2017), to construct the corresponding synthesized sets. Then, the False Rejection Rate-False Acceptance Rate (FRR-FAR) is reported under the Receiver Operating Characteristic (ROC) curves as presented in Fig. 8(a). These results show that with an adaptive aging path for each subject, our SDAP outperforms other age progression approaches with a significant improvement in matching performance over the original pairs.
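The FRR and FAR points underlying such ROC curves are straightforward to compute from similarity scores; the sketch below uses toy scores and an assumed accept-if-above-threshold rule.

```python
import numpy as np

def far_frr(genuine, impostor, threshold):
    """FAR/FRR at a decision threshold on similarity scores (accept if
    score >= threshold). Sweeping the threshold traces the ROC curve."""
    genuine = np.asarray(genuine, dtype=float)
    impostor = np.asarray(impostor, dtype=float)
    far = float(np.mean(impostor >= threshold))  # impostors wrongly accepted
    frr = float(np.mean(genuine < threshold))    # genuine pairs rejected
    return far, frr
```

Applying this to the original pairs and to each method's synthesized pairs gives the curves compared in Fig. 8(a).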
Comparison with other approaches in age-invariant face recognition: (a) ROC curves of face verification on the small-scale testing protocol; (b) Cumulative Match Curve (CMC) and (c) ROC curves of the SF model (Liu et al, 2017) and its improvements using age-progressed faces from TNVP (Duong et al, 2017) and our SDAP on the large-scale testing protocol of MegaFace challenge 1.

Large-scale cross-age face verification.
In the large-scale testing protocol, we conduct the MegaFace face verification benchmark (Kemelmacher-Shlizerman et al, 2016) targeted on FGNET plus one million distractors to validate the capabilities of SDAP. This is a very challenging benchmarking protocol which aims at validating face recognition algorithms not only against age-changing factors but also at the million scale of distractors (i.e. people who are not in the testing set). With this experiment, our goal is to show that using SDAP, the performance of a face recognition algorithm can be boosted significantly without retraining on cross-age databases. In this benchmark, MegaFace provides two datasets: a probe set and a gallery set. The probe set is FGNET, while the gallery set consists of more than 1 million photos of 690K subjects. Practical face recognition models should achieve high performance even against a gallery set of millions of distractors.
Fig. 8(b) illustrates how the Rank-1 identification rates change as the number of distractors increases. The corresponding rates of all compared methods at one million distractors are shown in Table 2. Fig. 8(c) presents the ROC curves with respect to True and False Acceptance Rates (TAR-FAR). (The results of other methods are provided on the MegaFace website.) The SphereFace (SF) model (Liu et al, 2017), trained solely on the small-scale CASIA dataset without cross-age information, achieves the best performance among all compared face matching approaches. Using our SDAP aging model, this face matching model achieves even higher results in face verification. Moreover, these significant improvements are gained without retraining the SF model, and it outperforms other models as shown in Table 2.
Methods  Training set  Accuracy

Barebones_FR  with cross-age faces  7.136%
3DiVi  with cross-age faces  15.78%
NTechLAB  with cross-age faces  29.168%
DeepSense  with cross-age faces  43.54%
SF (Liu et al, 2017)  without cross-age faces  52.22%
SF + TNVP  without cross-age faces  61.53%
SF + SDAP  without cross-age faces  64.4%
6.5 Perceived Age of Synthesized Faces
In this section, the performance of SDAP is further evaluated by assessing the perceived age of the synthesized faces. The goal of this experiment is to validate whether the age-progressed faces are perceived to be at the target ages. In particular, we adopt the age estimator of (Rothe et al, 2016) (i.e. the winner of the Looking At People (LAP) challenge) in the protocol of (Duong et al, 2016) and compare the Mean Absolute Error (MAE) on real and synthesized faces. In this evaluation, 802 real-face images from FGNET are randomly selected and used to fine-tune the age estimator. The remaining images of FGNET are used to form the testing Set A of real faces. Then, for each facial image of an individual in Set A, SDAP is adopted to progress that face to the ages where the subject's real faces are available. This process results in Set B of 361 images. Similar processes are also applied using TNVP (Duong et al, 2017) and TRBM (Duong et al, 2016) to obtain Sets C and D, respectively. The age accuracy in terms of MAE for these sets is shown in Table 3. These results show that the MAE achieved by SDAP's synthesized faces is comparable to that of the real faces. Moreover, compared to TRBM and TNVP, the difference in MAE between SDAP and the real faces is smaller. This further shows that SDAP outperforms these approaches in generating age-progressed faces at the target ages.
Table 3: MAEs (in years) achieved by the age estimator on real and synthesized faces.

| Inputs | MAE |
| --- | --- |
| Real faces (Set A) | 4.70 |
| SDAP's synthesized faces (Set B) | 4.90 |
| TNVP's synthesized faces (Set C) | 5.19 |
| TRBM's synthesized faces (Set D) | 5.33 |
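The MAE reported in Table 3 is simply the average absolute gap between the estimator's predicted ages and the target ages. As a minimal sketch (`age_mae` is our own hypothetical helper; in the experiment the predictions come from the estimator of Rothe et al, 2016):

```python
import numpy as np

def age_mae(predicted_ages, target_ages):
    """Mean Absolute Error (in years) between estimated ages and the
    target ages the faces were progressed to."""
    predicted = np.asarray(predicted_ages, dtype=float)
    target = np.asarray(target_ages, dtype=float)
    return float(np.abs(predicted - target).mean())
```

A lower MAE on a set of synthesized faces means the estimator perceives them to be closer to their intended target ages, which is why the 4.90 of Set B being near the 4.70 of the real Set A supports the claim above.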
7 Conclusions
This work has presented a novel generative probabilistic model under an IRL approach to age progression. The model inherits the strengths of both probabilistic graphical models and recent advances in deep networks. Using the proposed tractable log-likelihood objective function together with deep features, our SDAP produces age-progressed faces with sharpened and enhanced skin texture. In addition, the proposed SDAP aims at providing a subject-dependent aging path with the optimal reward. Furthermore, it takes full advantage of the input source by allowing multiple images to be used to optimize the aging path. The experimental results conducted on various databases, including the large-scale MegaFace, have proven the robustness and effectiveness of the proposed SDAP model on both face aging synthesis and cross-age verification.
References
 fgN (2004) FG-NET aging database. http://www.fgnet.rsunit.com
 Antipov et al (2017) Antipov G, Baccouche M, Dugelay JL (2017) Face aging with conditional generative adversarial networks. arXiv preprint arXiv:1702.01983
 Attia and Dayan (2018) Attia A, Dayan S (2018) Global overview of imitation learning. arXiv preprint arXiv:1801.06503
 Burt and Perrett (1995) Burt DM, Perrett DI (1995) Perception of age in adult caucasian male faces: Computer graphic manipulation of shape and colour information. Proceedings of the Royal Society of London B: Biological Sciences 259(1355):137–143
 Chen et al (2014) Chen BC, Chen CS, Hsu WH (2014) Cross-age reference coding for age-invariant face recognition and retrieval. In: ECCV, pp 768–783
 Duan et al (2016) Duan Y, Chen X, Houthooft R, Schulman J, Abbeel P (2016) Benchmarking deep reinforcement learning for continuous control. In: International Conference on Machine Learning, pp 1329–1338
 Duong et al (2016) Duong CN, Luu K, Quach KG, Bui TD (2016) Longitudinal face modeling via temporal deep restricted Boltzmann machines. In: CVPR, pp 5772–5780
 Duong et al (2017) Duong CN, Quach KG, Luu K, Le N, Savvides M (2017) Temporal non-volume preserving approach to facial age-progression and age-invariant face recognition. In: The IEEE International Conference on Computer Vision (ICCV), pp 3755–3763
 Finn et al (2016) Finn C, Levine S, Abbeel P (2016) Guided cost learning: Deep inverse optimal control via policy optimization. In: International Conference on Machine Learning, pp 49–58
 Geng et al (2007) Geng X, Zhou ZH, Smith-Miles K (2007) Automatic age estimation based on facial aging patterns. PAMI 29(12):2234–2240
 Guo and Zhang (2014) Guo G, Zhang C (2014) A study on cross-population age estimation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp 4257–4263
 Kemelmacher-Shlizerman et al (2014) Kemelmacher-Shlizerman I, Suwajanakorn S, Seitz SM (2014) Illumination-aware age progression. In: CVPR, IEEE, pp 3334–3341
 Kemelmacher-Shlizerman et al (2016) Kemelmacher-Shlizerman I, Seitz SM, Miller D, Brossard E (2016) The MegaFace benchmark: 1 million faces for recognition at scale. In: CVPR, pp 4873–4882
 Lanitis et al (2002) Lanitis A, Taylor CJ, Cootes TF (2002) Toward automatic simulation of aging effects on face images. PAMI 24(4):442–455
 Levine and Abbeel (2014) Levine S, Abbeel P (2014) Learning neural network policies with guided policy search under unknown dynamics. In: Advances in Neural Information Processing Systems, pp 1071–1079
 Liu et al (2017) Liu W, Wen Y, Yu Z, Li M, Raj B, Song L (2017) SphereFace: Deep hypersphere embedding for face recognition. arXiv preprint arXiv:1704.08063
 Luu et al (2009) Luu K, Suen C, Bui T, Ricanek JK (2009) Automatic child-face age-progression based on heritability factors of familial faces. In: BIdS, IEEE, pp 1–6
 Patterson et al (2006) Patterson E, Ricanek K, Albert M, Boone E (2006) Automatic representation of adult aging in facial images. In: Proc. IASTED Int'l Conf. Visualization, Imaging, and Image Processing, pp 171–176
 Ricanek Jr and Tesafaye (2006) Ricanek Jr K, Tesafaye T (2006) Morph: A longitudinal image database of normal adult age-progression. In: FGR 2006, IEEE, pp 341–345
 Rothe et al (2016) Rothe R, Timofte R, Gool LV (2016) Deep expectation of real and apparent age from a single image without facial landmarks. International Journal of Computer Vision (IJCV)
 Rowland et al (1995) Rowland D, Perrett D, et al (1995) Manipulating facial appearance through shape and color. Computer Graphics and Applications, IEEE 15(5):70–76
 Shu et al (2015) Shu X, Tang J, Lai H, Liu L, Yan S (2015) Personalized age progression with aging dictionary. In: ICCV, pp 3970–3978
 Suo et al (2010) Suo J, Zhu SC, Shan S, Chen X (2010) A compositional and dynamic model for face aging. PAMI 32(3):385–401
 Suo et al (2012) Suo J, Chen X, Shan S, Gao W, Dai Q (2012) A concatenational graph evolution aging model. PAMI 34(11):2083–2096
 Taylor and Hinton (2009) Taylor GW, Hinton GE (2009) Factored conditional restricted Boltzmann machines for modeling motion style. In: Proceedings of the 26th Annual International Conference on Machine Learning, ACM, pp 1025–1032
 Tsai et al (2014) Tsai MH, Liao YK, Lin IC (2014) Human face aging with guided prediction and detail synthesis. Multimedia tools and applications 72(1):801–824
 Wang et al (2016) Wang W, Cui Z, Yan Y, Feng J, Yan S, Shu X, Sebe N (2016) Recurrent face aging. In: CVPR, pp 2378–2386
 Yang et al (2016) Yang H, Huang D, Wang Y, Wang H, Tang Y (2016) Face aging effect simulation using hidden factor analysis joint sparse representation. TIP 25(6):2493–2507
 Zhang et al (2017) Zhang Z, Song Y, Qi H (2017) Age progression/regression by conditional adversarial autoencoder. In: The IEEE Conference on Computer Vision and Pattern Recognition (CVPR)