Learning from Longitudinal Face Demonstration - Where Tractable Deep Modeling Meets Inverse Reinforcement Learning

This paper presents a novel Generative Probabilistic Modeling under an Inverse Reinforcement Learning approach, named Subject-dependent Deep Aging Path (SDAP), to model the facial structures and the longitudinal face aging process of given subjects. The proposed SDAP is optimized using tractable log-likelihood objective functions with Convolutional Neural Networks based deep feature extraction. In addition, instead of using a fixed aging development path for all input faces and subjects, SDAP is able to provide the most appropriate aging development path for each subject that optimizes the reward aging formulation. Unlike previous methods that can take only one image as the input, SDAP allows multiple images as inputs, i.e. all information of a subject at either the same or different ages, to produce the optimal aging path for the subject. Finally, SDAP allows efficient synthesis of in-the-wild aging faces without complicated pre-processing steps. The proposed method is evaluated on both face aging synthesis and cross-age face verification tasks. The experimental results consistently show state-of-the-art performance of SDAP on numerous face aging databases, i.e. FG-NET, MORPH, AginG Faces in the Wild (AGFW), and the Cross-Age Celebrity Dataset (CACD). The method is also evaluated on the large-scale MegaFace challenge 1 to demonstrate the advantages of the proposed solution.


1 Introduction

The problem of face aging aims to aesthetically synthesize the faces of a subject at older ages, i.e. age progression, or younger ages, i.e. age regression or de-aging. This problem is applicable in various real-world applications, from age-invariant face verification and finding missing children to cosmetic studies. Indeed, face aging has recently attracted considerable attention in the computer vision and machine learning communities. Several breakthroughs with numerous face aging approaches, varying from anthropology theories to deep learning structures, have been presented in the literature. However, the synthesized face aging results of these previous approaches are still far from perfect due to various challenging factors, such as heredity, living styles, etc. In addition, the face aging databases used in most methods to learn the aging process are usually limited in both the number of images per subject and the covered age range of each subject.

Both conventional and deep learning methods usually follow one of two directions, i.e. direct and step-by-step aging synthesis, in exploring the temporal face aging features from training databases. In the former direction, these methods directly synthesize a face to the target age using the relationships between training images and their corresponding age labels. For example, the prototyping approaches (Burt and Perrett, 1995; Kemelmacher-Shlizerman et al, 2014; Rowland et al, 1995) use age labels to organize images into age groups and compute average faces as their prototypes. Then, the difference between source-age and target-age prototypes is applied directly to the input image to obtain the age-progressed face at the target age. Similarly, the Generative Adversarial Networks (GAN) approach (Zhang et al, 2017) models the relationship between high-level representations of input faces and age labels by constructing a deep-neural-network generator. This generator is then incorporated with the target age labels to synthesize the outputs. Although these kinds of models are easy to train, they are limited in their capability to synthesize faces much older than the input face of the same subject, e.g. directly from ten to 60 years old. Indeed, progressing a face from ten years old to 60 years old in these methods usually ends up with a synthesized face that retains the 10-year-old features plus wrinkles.
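The prototyping idea above can be sketched in a few lines. The code below is a minimal illustration, not the authors' implementation; the array shapes, the bin boundaries, and the `age_prototypes`/`progress` helpers are assumptions made for this example.

```python
import numpy as np

def age_prototypes(images, ages, bins):
    """Average face per age group (the 'prototype' of each group)."""
    protos = {}
    for lo, hi in bins:
        group = images[(ages >= lo) & (ages < hi)]
        protos[(lo, hi)] = group.mean(axis=0)
    return protos

def progress(face, protos, src_bin, dst_bin):
    """Apply the prototype difference directly to the input face."""
    return face + (protos[dst_bin] - protos[src_bin])
```

Note that when source and target bins coincide, the difference vanishes and the input face is returned unchanged, which is exactly why such direct approaches struggle with large age gaps.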

Meanwhile, the latter approaches (Duong et al, 2017, 2016; Shu et al, 2015; Wang et al, 2016; Yang et al, 2016) decompose the long-term aging process into short-term developments and focus on embedding the aging transform between faces of two consecutive development stages. Using the learned transformations, these methods generate progressed faces step by step from one age group to the next until reaching the target. These modeling structures can efficiently learn the temporal information and provide more age variation even when the target age is very far from the input age of a subject. However, the main limitation of these methods is the lack of longitudinal face aging databases. The longest training sequence usually contains only three or four images per subject.
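The step-by-step direction amounts to composing short-term transforms until the target age is reached. The sketch below is purely illustrative: the linear `step_transform` stands in for a learned short-term aging model, and the 5-year step size is an assumption.

```python
import numpy as np

def step_transform(face, delta):
    # Placeholder short-term transform: one learned mapping per age gap.
    return face * (1.0 + 0.01 * delta)

def progress_stepwise(face, src_age, dst_age, step=5):
    """Chain short-term transforms until the target age is reached."""
    age = src_age
    while age < dst_age:
        delta = min(step, dst_age - age)
        face = step_transform(face, delta)
        age += delta
    return face
```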

Limitations of previous approaches.

Whichever direction an aging approach falls into (i.e. direct or step-by-step synthesis), previous approaches still suffer from many challenging factors and retain numerous limitations. Table 1 compares the properties of different aging approaches.

  • Non-linearity. Since human aging is a complicated and highly nonlinear process, the linear models mostly used in conventional methods (i.e. prototype, AAMs-based and 3DMM-based approaches) are not able to efficiently interpret the aging variations and the quality of their synthesized results is very limited.

  • Loss function of deep structure. The use of a fixed reconstruction loss function, i.e. the $\ell_2$-norm, in the proposed deep structures, i.e. (Wang et al, 2016; Yang et al, 2016), usually produces blurry synthesis results.

  • Tractability. Exploiting the advantages of probabilistic graphical models has introduced a potential direction for deep model design and produced prominent synthesized results for the age progression task (Duong et al, 2016). However, the density functions of these models are intractable and must be approximated during training.

  • Data usability. Even though a subject in training/testing set has multiple images at the same age, there is only one image used to learn/synthesize in these methods. The other images are usually wastefully ignored. In addition, the aging transformation embedding in these approaches is only able to proceed on images from two age groups.

  • Fixed aging development path. The learned aging development path is identically applied for all subjects which is not true in reality. Instead, each subject should have his/her own aging development.

Contributions of this work.

The paper presents a novel Subject-dependent Deep Aging Path (SDAP) model for face age progression, which is an extension of our previous work (Duong et al, 2017). In that work, the TNVP structure was proposed to embed the pairwise transformations between two consecutive age groups. In this work, the SDAP structure is introduced to further enhance the capability to discover the optimal aging development path for each individual. This is achieved by embedding the transformation over the whole aging sequence of a subject under an IRL framework. Our contributions can be summarized as follows.

  1. The aging transformation embedding is designed using (1) a tractable log-likelihood density estimation with (2) Convolutional Neural Network (CNN) structures and (3) an age controller to indicate the amount of aging changes for synthesis. Thus, the proposed SDAP is able to provide a smoother synthesis across faces and maximize the usability of aging data, i.e. all images of a subject at different or the same ages are utilized.

  2. Unlike most previous methods, our proposed SDAP model further enhances the capability to find the optimal aging development path for each individual. This is achieved by embedding the transformation over the whole aging sequence of a subject under an IRL framework.

  3. Instead of using pre-defined or ad-hoc aging rewards and objective functions as in most previous work, our proposed approach allows the algorithm to automatically derive the optimal objective formulation and parameters via a data-driven strategy during training.

We believe that this is the first work that designs an IRL framework to model longitudinal face aging.

2 Related work

This section reviews recent methods in face age progression. These methods can be technically classified into four categories, i.e. modeling, reconstruction, prototyping, and deep learning-based methods.

Modeling-based aging is one of the earliest categories presented for face age progression. These methods usually model both facial shapes and textures using a set of parameters, and learn the face aging process via aging functions. (Patterson et al, 2006) and (Lanitis et al, 2002) employed a set of Active Appearance Models (AAMs) parameters with four aging functions to model both the general and the specific aging processes. (Luu et al, 2009) combined familial facial cues with the process of face age progression. (Geng et al, 2007) presented the AGing pattErn Subspace (AGES) method to construct a subspace for aging patterns as a chronological sequence of face images. (Tsai et al, 2014) then enhanced AGES using guidance faces corresponding to the subject's characteristics to produce more stable results. Texture synthesis was also combined in a later stage to produce better facial details. (Suo et al, 2010, 2012) introduced the three-layer And-Or Graph (AOG) of smaller facial parts, i.e. eyes, nose, mouth, etc., to model a face. Then, the face aging process was learned for each part using a Markov chain.

Reconstruction-based aging methods model aging faces by unifying the aging basis in each group. (Yang et al, 2016) represented person-specific and age-specific factors independently using sparse representation hidden factor analysis (HFA). (Shu et al, 2015) presented the aging coupled dictionaries (CDL) to model personalized aging patterns by preserving personalized facial features.

Prototyping-based aging methods employ age prototypes to produce new face images. The average faces of all age groups are used as the prototypes (Rowland et al, 1995). Then, an input face image can be progressed to the target age by incorporating the differences between the prototypes of the two age groups (Burt and Perrett, 1995). (Kemelmacher-Shlizerman et al, 2014) presented a method to construct high-quality average prototypes from a large-scale set of images. Subspace alignment and illumination normalization were also included in this system. Aging patterns across genders and ethnicities were also investigated in (Guo and Zhang, 2014).

Deep learning-based aging approaches have recently achieved considerable results in face age progression using the power of deep learning. (Duong et al, 2016) introduced Temporal Restricted Boltzmann Machines (TRBM) to represent the non-linear aging process with geometry constraints, and spatial RBMs to model a sequence of reference faces and the wrinkles of adult faces. (Wang et al, 2016) approximated aging sequences using a Recurrent Neural Network (RNN) with a two-layer Gated Recurrent Unit (GRU). Recently, the structure of the Conditional Adversarial Autoencoder (CAAE) was also applied to synthesize aged images in (Antipov et al, 2017). (Duong et al, 2017) proposed a novel generative probabilistic model, called Temporal Non-Volume Preserving (TNVP) transformation, to model a long-term facial aging process as a sequence of short-term stages.

Figure 2: The structures of (a) a Synthesis Component as a composition of Synthesis Units; (b) a mapping function with two mapping units; and (c) the aging transformation component. During synthesizing process, the value for aging controller at each step is predicted by a Policy learned through an Inverse Reinforcement Learning framework.

3 Our Proposed SDAP

The TNVP structure has provided an efficient model to capture the pairwise transformation between faces of consecutive age groups (Duong et al, 2017). However, it still has some limitations. Firstly, TNVP mainly focuses on the pairwise relationship rather than the long-term relationship presented in an aging sequence. Secondly, the capability of applying different development paths to different subjects is still absent. In reality, each subject should have his/her own aging development progress because each person ages differently. In this section, we introduce a more flexible structure, named Subject-dependent Deep Aging Path (SDAP), with an additional component, i.e. an age controller. This age controller provides the capability of defining how much age variation should be added during synthesis. This architecture, therefore, benefits both the training stage, i.e. by maximizing the usability of training aging data, and the testing stage, i.e. by flexibly adopting a different aging path for each subject according to their features. Moreover, instead of only learning from image pairs of a subject in two consecutive age groups, SDAP is capable of embedding the aging transformation from longer aging sequences of that subject, which efficiently reflects the long-term aging development of the subject. We also show that this goal can be achieved under an Inverse Reinforcement Learning (IRL) framework. The structure of this section is as follows: we first present our novel approach to model the facial structures in Subsection 3.1. Then, our IRL approach to longitudinal face aging modeling is detailed in Subsection 3.2.

3.1 Aging Embedding with Age Controller

The proposed architecture consists of three main components, i.e. (1) latent space mapping, (2) aging transformation, and (3) age controller. Our age controller provides the capability of defining how much age variation should be added during synthesis. Using this structure, our model is flexible to aging in different ways corresponding to the input faces. Moreover, it also helps to maximize the usability of training aging data.

Structures and Variable Relationship Modeling:

Our graphical model (Fig. 2) consists of three sets of variables: observed variables $x_1, x_2$ encoding the textures of face images in the image domain at two stages; their corresponding latent variables $z_1, z_2$ in the latent space; and an aging controller variable $a$. The aging controller $a$ is represented as a one-hot vector indicating how many years of progression the process should perform on $x_1$. The bijection functions $\mathcal{F}_1, \mathcal{F}_2$, mapping from the observation space to the latent space, and the aging transformation $\mathcal{G}$ are defined as in Eqn. (1).

$$z_1 = \mathcal{F}_1(x_1; \theta_1); \quad z_2 = \mathcal{F}_2(x_2; \theta_2); \quad \bar{z}_1 = \mathcal{G}(z_1, a; \theta_G) \quad (1)$$

where $\{\theta_1, \theta_2, \theta_G\}$ denotes the sets of parameters of $\mathcal{F}_1$, $\mathcal{F}_2$, and $\mathcal{G}$, respectively. Notice that in SDAP, the structure of the bijection functions is adopted from the TNVP architecture. Then, the relationship between the latent variables is computed as $z_2 = \bar{z}_1 = \mathcal{G}(z_1, a; \theta_G)$.

The interactions between the latent variables and the aging controller variable $a$ are 3-way multiplicative. They can be mathematically encoded as in Eqn. (2).

$$\bar{z}_{1,k} = \sum_{i,j} W_{ijk}\, z_{1,i}\, a_j + b_k \quad (2)$$

where $W$ is a 3-way tensor weight matrix and $b$ is the bias of these connections. Eqn. (2) enables two important properties in the architecture. First, since $a$ is a one-hot vector, different controllers will enable different sets of weights to be used. Thus, it allows controlling the amount of aging information to be embedded into the aging process. Second, given the age controller, the model is able to use all images of a subject to enhance its performance.

Figure 3: Synthesized results with different values of the aging controller. Given an input image, by varying the values of the aging controller, different age-progressed images can be obtained. Notice that the age controller can help to efficiently control the amount of aging features to be embedded while maintaining other variations between synthesized faces. Best viewed in color.

In practice, the large number of parameters of the 3-way tensor $W$ may have negative effects on the scalability of the model. Thus, $W$ can be further factorized into three matrices $W^z$, $W^a$, and $W^{\bar{z}}$ with $f$ factors by adopting (Taylor and Hinton, 2009) as in Eqn. (3).

$$\bar{z}_1 = W^{\bar{z}} \left[ (W^{z\top} z_1) \odot (W^{a\top} a) \right] + b \quad (3)$$

where $\odot$ stands for the Hadamard product.
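To make the factorization concrete, the snippet below builds a full 3-way tensor from three factor matrices and checks that the factored Hadamard-product form of Eqn. (3) reproduces the full 3-way multiplicative interaction of Eqn. (2). The dimensions and the factor-matrix names are illustrative assumptions, not the paper's actual sizes.

```python
import numpy as np

rng = np.random.default_rng(1)
dz, da, f = 8, 16, 4            # latent size, controller size, number of factors

# Factor matrices replacing the full 3-way tensor W.
Wz = rng.standard_normal((dz, f))
Wa = rng.standard_normal((da, f))
Wo = rng.standard_normal((dz, f))   # output-side factors
b = rng.standard_normal(dz)

# Full tensor reconstructed from the factors: W[i,j,k] = sum_f Wz[i,f] Wa[j,f] Wo[k,f]
W = np.einsum('if,jf,kf->ijk', Wz, Wa, Wo)

z = rng.standard_normal(dz)
a = np.zeros(da)
a[3] = 1.0                          # one-hot age controller

full = np.einsum('ijk,i,j->k', W, z, a) + b           # 3-way multiplicative form
factored = Wo @ ((Wz.T @ z) * (Wa.T @ a)) + b         # Hadamard-product form
assert np.allclose(full, factored)
```

The factored form needs only $(2 d_z + d_a) f$ weights instead of $d_z^2 d_a$, which is the scalability benefit the factorization is after.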

The Log-likelihood:

Given a face $x_1$ in age group $t$, the probability density function of $x_2$ can be formulated as,

$$p_{X_2}(x_2 \mid x_1, a; \theta) = p_{Z_2}(z_2 \mid z_1, a)\, \left| \det \frac{\partial \mathcal{F}_2(x_2)}{\partial x_2} \right| \quad (4)$$

where $p_{X_2}(x_2 \mid x_1, a)$ and $p_{Z_2}(z_2 \mid z_1, a)$ are the distribution of $x_2$ conditional on $(x_1, a)$ and the distribution of $z_2$ conditional on $(z_1, a)$, respectively. Then, the log-likelihood can be computed as follows:

$$\log p_{X_2}(x_2 \mid x_1, a; \theta) = \log p_{Z_2}(z_2 \mid z_1, a) + \log \left| \det \frac{\partial \mathcal{F}_2(x_2)}{\partial x_2} \right|$$
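The tractability of this log-likelihood comes from the change-of-variables formula. The toy below uses an element-wise affine map in place of the actual CNN-based bijection (an assumption made purely for illustration) to show how the Gaussian prior term and the log-determinant term combine.

```python
import numpy as np

def log_gaussian(z, mu, var):
    return -0.5 * np.sum(np.log(2 * np.pi * var) + (z - mu) ** 2 / var)

# Toy invertible mapping F(x) = s * x + t (element-wise), so that
# log |det dF/dx| = sum(log |s|), and the change of variables gives
# log p_X(x) = log p_Z(F(x)) + log |det dF/dx|.
s = np.array([0.5, 2.0, 1.5])
t = np.array([0.1, -0.3, 0.0])

def log_px(x):
    z = s * x + t
    return log_gaussian(z, 0.0, 1.0) + np.sum(np.log(np.abs(s)))
```

Because the map is invertible with a triangular (here diagonal) Jacobian, both evaluating the density and sampling are exact; the same property is what makes flow-style models like TNVP/SDAP tractable.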

The Joint Distributions:

In order to model the aging transformation flow, a Gaussian distribution is used as the prior distribution for the latent space. After mapping to the latent space, the age controller variables are also constrained to be Gaussian. In particular, let $z_1, z_2$ represent the latent variables of $x_1, x_2$. The latent variables distribute as Gaussians with means $\mu_1, \mu_2$ and covariances $\Sigma_1, \Sigma_2$, respectively. Then, the latent $z_2$ is distributed as,

$$z_2 \sim \mathcal{N}(z_2; \mu_2, \Sigma_2) \quad (5)$$

Since the connection between $\bar{z}_1$ and $z_2$ embeds the relationship between variables of different Gaussian distributions, we further assume that their joint distribution is also a Gaussian. Then, the joint distribution $p(\bar{z}_1, z_2)$ can be computed as follows.

$$p(\bar{z}_1, z_2) = \mathcal{N}\!\left( \begin{bmatrix} \bar{z}_1 \\ z_2 \end{bmatrix}; \begin{bmatrix} \mu_1 \mathbf{1} \\ \mu_2 \mathbf{1} \end{bmatrix}, \begin{bmatrix} \Sigma_1 & \Sigma_{12} \\ \Sigma_{12}^\top & \Sigma_2 \end{bmatrix} \right)$$

where $\mathbf{1}$ is an all-ones vector.

The Objective Function:

The parameters $\theta$ of the model are optimized to maximize the log-likelihood as in Eqn. (6).

$$\theta^* = \arg\max_\theta \log p_{X_2}(x_2 \mid x_1, a; \theta) \quad (6)$$

This constraint is then incorporated into the objective function as $\log p_{X_2}(x_2 \mid x_1, a; \theta) + \log \mathcal{N}(\bar{z}_1; \mu, \Sigma)$, where $\log \mathcal{N}(\bar{z}_1; \mu, \Sigma)$ is the log-likelihood function of $\bar{z}_1$ given mean $\mu$ and covariance $\Sigma$.

3.2 IRL Learning from Aging Sequence

Figure 4: The Subject-Dependent Aging Policy Learning Framework. Given the Age Sequence Demonstrations, the cost function is learned by maximizing the log-likelihood of these sequences. Its output can then be used to optimize the Subject-Dependent Aging Policy Network, which is later able to predict the most appropriate aging path for each subject.

In this section, we further extend the capability of our model by defining a Subject-dependent Deep Aging Policy Network to provide a planned aging path for the aging controller. Consequently, the synthesized sequence is guaranteed to be the best choice in the face aging development for a given subject.

Let $\tau_i$ be the observed age sequence of the $i$-th subject and $\mathcal{D} = \{\tau_i\}_{i=1}^N$ be the set of all aging sequences in the dataset. The probability of a sequence $\tau_i$ can be defined as

$$p(\tau_i) = \frac{1}{Z} \exp\left(-c_\theta(\tau_i)\right) \quad (7)$$

where $c_\theta$ is an energy function parameterized by $\theta$, and $Z = \sum_{\tau} \exp(-c_\theta(\tau))$ is the partition function computed using all possible aging sequences $\tau$. Then, the goal is to learn a model such that the log-likelihood of the observed aging sequences is maximized as follows:

$$\theta^* = \arg\max_\theta \sum_{\tau_i \in \mathcal{D}} \log p(\tau_i) \quad (8)$$

In Eqn. (8), if $-c_\theta$ is considered as a form of a reward function, then the problem is equivalent to learning a policy network in a Reinforcement Learning (RL) system given a set of demonstrations $\mathcal{D}$.
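For a small enumerable set of candidate sequences, the energy model of Eqns. (7) and (8) reduces to a softmax over negated costs. A minimal sketch (the function names are our own, and enumerating all sequences is feasible only in this toy setting):

```python
import numpy as np

def sequence_probs(costs):
    """p(tau) proportional to exp(-c(tau)) over an enumerable set of sequences."""
    e = np.exp(-np.asarray(costs))
    return e / e.sum()              # divide by the partition function Z

def log_likelihood(observed_idx, costs):
    """Log-likelihood of the observed sequences under the energy model."""
    p = sequence_probs(costs)
    return np.sum(np.log(p[observed_idx]))
```

Lower-cost sequences receive higher probability, so maximizing the log-likelihood of demonstrations pushes their cost down relative to all alternatives, which is exactly the IRL objective.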

The reward function is the key element for policy learning in RL. However, pre-defining a reasonable reward function for face aging synthesis is impossible in practice. Indeed, it is very hard to measure the goodness of the age-progressed images even when the ground-truth faces of the subject at these ages are available. Therefore, rather than pre-defining an ad-hoc aging reward, the energy $c_\theta$ is represented as a neural network with parameters $\theta$ and adopted as a non-linear cost function of an Inverse Reinforcement Learning problem.

In this IRL system, $c_\theta$ can be directly learned from the set of observed aging sequences $\mathcal{D}$. Fig. 4 illustrates the structure of the proposed IRL framework. Based on this structure, given a set of aging sequences as demonstrations, not only can the cost function be learned to maximize the log-likelihood of the observed age sequences, but the policy, i.e. predicting the aging path for each individual, is also obtained with respect to the optimized cost.

Mathematically, the IRL-based age progression procedure can be formulated as follows. Let $\mathcal{M} = \{\mathcal{S}, \mathcal{A}, \mathcal{P}, \mathcal{D}, c_\theta\}$ be a Markov Decision Process (MDP) where $\mathcal{S}$, $\mathcal{A}$, and $\mathcal{P}$ denote the state space, the action space, and the transition model, respectively; $\mathcal{D}$ is the set of observed aging sequences and $c_\theta$ represents the cost function. Given an MDP $\mathcal{M}$, our goal is to discover the unknown cost function from the observations $\mathcal{D}$ as well as simultaneously extract the policy that minimizes the learned cost function.

State: The state $s_t$ is defined as a composition of two pieces of information, i.e. the face image at the $t$-th stage and its age label.

Action: Similar to the age controller, an action $a_t$ is defined as the amount of aging variation that the progression process should perform on state $s_t$. Given $s_t$, an action $a_t$ is selected by stochastically sampling from the action probability distribution. During testing, given the current state, the action with the highest probability is chosen for the synthesizing process. Due to data availability, where the largest aging distance between the starting and ending images of a sequence is 15, we choose 16 possible actions (i.e. 15 aging steps plus one step where the current and next states have the same age).

Cost Function: The cost function plays a crucial role in guiding the whole system to learn the sequential policies that yield a specific aging path for each subject. Taking a state $s_t$ and an action $a_t$ as inputs, the cost function maps them to a value $c_\theta(s_t, a_t)$. Thus, the cost for the $i$-th aging sequence can be obtained as $c_\theta(\tau_i) = \sum_t c_\theta(s_t, a_t)$. In order to learn a complex and nonlinear cost formulation, $c_\theta$ is approximated by a neural network with two hidden layers of 32 hidden units followed by Rectified Linear Units (ReLU).
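The cost network described above can be sketched as a small two-hidden-layer MLP. This is a forward-pass-only sketch, assuming a concatenated state-action input; the initialization scale and helper names are our own.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def init_cost_net(state_dim, action_dim, hidden=32, seed=0):
    """Two hidden layers of 32 units with ReLU, scalar cost output."""
    rng = np.random.default_rng(seed)
    d = state_dim + action_dim
    return {
        'W1': rng.standard_normal((d, hidden)) * 0.1, 'b1': np.zeros(hidden),
        'W2': rng.standard_normal((hidden, hidden)) * 0.1, 'b2': np.zeros(hidden),
        'W3': rng.standard_normal((hidden, 1)) * 0.1, 'b3': np.zeros(1),
    }

def cost(net, state, action):
    """Per-step cost c_theta(s_t, a_t)."""
    h = relu(np.concatenate([state, action]) @ net['W1'] + net['b1'])
    h = relu(h @ net['W2'] + net['b2'])
    return (h @ net['W3'] + net['b3']).item()

def sequence_cost(net, states, actions):
    """Cost of an aging sequence: sum of per-step costs."""
    return sum(cost(net, s, a) for s, a in zip(states, actions))
```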

Policy: Given the cost function $c_\theta$, the policy is represented as a Gaussian trajectory distribution as follows.

$$q(\tau) = \mathcal{N}(\tau; \mu_\tau, \Sigma_\tau) \quad (9)$$

Then it is optimized with respect to the expected cost $\mathbb{E}_{\tau \sim q}[c_\theta(\tau)]$.

Given the defined states $s_t$ and actions $a_t$, the observed aging sequence is redefined as $\tau_i = \{(s_t, a_t)\}_t$. The log-likelihood in Eqn. (8) can be rewritten as,

$$\mathcal{L}(\theta) = \frac{1}{N} \sum_{\tau_i \in \mathcal{D}} -c_\theta(\tau_i) - \log Z \quad (10)$$
1: Input: Observed age sequences $\mathcal{D} = \{\tau_i\}_{i=1}^N$.
2: Output: Optimized cost parameters $\theta$ and policy distribution $q$.
3: Initialization: Randomly initialize the policy distribution $q$ with a uniform distribution.
4: for iteration $= 1$ to $K$ do
5:    Sample $M$ aging paths from $q$.
6:    for $j = 1$ to $M$ do
7:       Apply the synthesis component in Section 3.1 given the $j$-th sampled aging path and the starting state to obtain the sampled sequence $\tau_j^s$.
8:       Add $\tau_j^s$ to $\mathcal{D}^s$
9:    end for
10:   for $l = 1$ to $L$ do
11:      Sample a batch of observed sequences from $\mathcal{D}$
12:      Sample a batch of sampled sequences from $\mathcal{D}^s$
13:      Compute the gradient $\partial \mathcal{L} / \partial \theta$ using Eqn. (11)
14:      Update $\theta$ using $\partial \mathcal{L} / \partial \theta$
15:   end for
16:   Update $q$ using $\mathcal{D}$ and $\mathcal{D}^s$ via the approach in (Levine and Abbeel, 2014).
17: end for
Algorithm 1 Subject-Dependent Aging Policy Learning
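A toy version of the cost-update inner loop of Algorithm 1 can be written with a cost that is linear in hand-crafted trajectory features. The linear cost, learning rate, and uniform sampler below are stand-ins (assumptions of this sketch) for the neural cost network and the (Levine and Abbeel, 2014) policy update.

```python
import numpy as np

def cost_update(theta, demo_feats, samp_feats, samp_q, lr=0.1):
    """One gradient step on the cost parameters in the spirit of Eqn. (11),
    with a linear cost c_theta(tau) = theta . phi(tau)."""
    w = np.exp(-(samp_feats @ theta)) / samp_q      # importance weights
    w = w / w.sum()
    # dL/dtheta = -E_demo[phi] + E_w[phi]; ascend the log-likelihood.
    grad = -demo_feats.mean(axis=0) + w @ samp_feats
    return theta + lr * grad

rng = np.random.default_rng(0)
demo_feats = np.array([[1.0, 0.0], [0.9, 0.1]])     # demonstration features
samp_feats = rng.random((64, 2))                    # sampled aging paths
samp_q = np.full(64, 1.0 / 64)                      # uniform sampler q

theta = np.zeros(2)
for _ in range(50):
    theta = cost_update(theta, demo_feats, samp_feats, samp_q)

# Demonstration-like sequences should end up cheaper than dissimilar ones.
assert demo_feats[0] @ theta < np.array([0.0, 1.0]) @ theta
```

As the cost parameters move, the importance weights concentrate on the currently cheapest samples, which is what keeps the partition-function estimate honest as training proceeds.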

Since the computation of the partition function $Z$ is intractable, the sampling-based approach (Finn et al, 2016) is adopted to approximate the second term of $\mathcal{L}(\theta)$ in Eqn. (10) by a set of aging sequences sampled from the distribution $q$.

$$\mathcal{L}(\theta) \approx \frac{1}{N} \sum_{\tau_i \in \mathcal{D}} -c_\theta(\tau_i) - \log \frac{1}{M} \sum_{\tau_j \sim q} \frac{\exp(-c_\theta(\tau_j))}{q(\tau_j)}$$

where $M$ is the number of age sequences sampled from the sampling distribution $q$. Then the gradient is given by

$$\frac{\partial \mathcal{L}}{\partial \theta} = -\frac{1}{N} \sum_{\tau_i \in \mathcal{D}} \frac{\partial c_\theta(\tau_i)}{\partial \theta} + \frac{1}{W} \sum_{\tau_j} w_j \frac{\partial c_\theta(\tau_j)}{\partial \theta} \quad (11)$$

where $w_j = \exp(-c_\theta(\tau_j)) / q(\tau_j)$ and $W = \sum_j w_j$.

The choice of the distribution $q$ is now critical to the success of the approximation. It can be adaptively optimized by initializing it with a uniform distribution and then following an iterative three-step optimization process: (1) generate a set of aging sequences from $q$; (2) update the cost function $c_\theta$ using Eqn. (11); and (3) refine the distribution $q$ as in Eqn. (12).

$$q^* = \arg\min_q \mathbb{E}_{\tau \sim q}\left[c_\theta(\tau)\right] - \mathcal{H}(q) \quad (12)$$

where $\mathcal{H}(q)$ denotes the entropy of $q$. To solve Eqn. (12), we adopt the optimization approach of (Levine and Abbeel, 2014), which also results in a policy. Algorithm 1 presents the learning procedure for our policy network and cost function parameters.

Figure 5: Age progression results on FG-NET. Given images at different ages and the target age of 60s, SDAP automatically predicts the optimal aging path and produces plausible age-progressed faces for each subject. Best viewed in color.

Face aging with single and multiple inputs: During the testing stage, given a face, its image and age are used to form the first state $s_1$. The action for $s_1$ is predicted by the policy network. Then, the synthesis component produces the age-progressed face for the next state. This step is repeated until the age of the synthesized face reaches the target age.

Using this structure, the framework can be easily extended to take multiple inputs. Given multiple inputs, they are first ordered by age to create an input sequence, where each state contains an input face and its age, and each step covers the age difference between two consecutive inputs. The synthesis component can be employed to obtain the values of the latent variable, which acts as a "memory" encoding all information from the inputs. We then initialize the first state accordingly and start the synthesis process as in the single-input case.
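The testing loop above can be sketched as follows; `policy` and `synthesize` are placeholders for the learned policy network and synthesis component, and the multi-input helper simply orders inputs by age as described, without modeling the latent "memory".

```python
def synthesize_to_target(face, age, target_age, policy, synthesize):
    """Repeat policy-chosen aging steps until the target age is reached."""
    while age < target_age:
        step = policy(face, age)               # action: years to progress
        step = min(step, target_age - age)     # do not overshoot the target
        face = synthesize(face, age, step)
        age += step
    return face, age

def order_inputs(inputs):
    """Order multiple (face, age) inputs by age before building the input sequence."""
    return sorted(inputs, key=lambda pair: pair[1])
```

With a stub policy that always progresses 5 years and a stub synthesizer, `synthesize_to_target(0.0, 10, 27, ...)` takes steps of 5, 5, 5, and 2 years to land exactly on the target age.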

4 Model Properties

Tractability and Invertibility. Similar to its predecessor, i.e. TNVP, with the specific structure of the invertible mapping functions, both the inference and generation processes in SDAP are exact, tractable, and invertible.

Generalization. During the training stage, the action $a$ is selected by stochastically sampling from the action probability distribution. This helps our model implicitly handle uncertainty during the learning process and generalize over all the aging steps of the age controller.

Capability of learning from a limited number of face images. As shown in Eqn. (10), the first term is the data-dependent expectation, which can be easily computed using the training data. The second term is the model expectation and is computed via a sampling-based approach. Thanks to the sampling process, our model can still approximate the distribution even when only a small number of training sequences is available for the first term.

5 Discussion

By setting up the invertible mapping functions as deep convolutional networks, the SDAP structure is able to share the advantages of its predecessor, i.e. TNVP, in its capability of efficiently capturing highly nonlinear facial features while maintaining a tractable log-likelihood density estimation. Besides aging variation, SDAP is also able to effectively handle other variations such as pose, expression, illumination, etc., as can be seen in Figs. 5 and 6.

Unlike TNVP, SDAP provides a more advanced architecture that optimizes the amount of aging information to be embedded into the input face. This ability benefits not only the training process, i.e. maximizing training data usability, but also the testing phase, i.e. being more flexible and controlled in the progressed faces synthesized via the age controller. Fig. 3 illustrates different age-progressed results obtained by varying the values of the age controller. A bigger controller value produces an older face.

While previous aging approaches only embed the aging information via the relationships between the input image and the age label (i.e. the direct approach), or between images of two consecutive age groups (i.e. the step-by-step approach), the SDAP structure aims at learning from the entire age sequence, with a learning objective designed specifically for sequences (see Eqn. (7)). Under the IRL framework, the whole sequence can be fitted into the learning process for network optimization. As a result, more stable aging sequences can be synthesized.

Moreover, SDAP's policy learning is more advanced than Imitation Learning via supervised learning. In particular, in SDAP, the aging relationship between variables in the whole sequence is explicitly considered and optimized during the learning process (see Eqn. (8)). Therefore, besides its generalization ability, SDAP is able to recover from "out-of-track" results in the middle steps during synthesizing. On the other hand, Imitation Learning lacks generalization capability and cannot recover from failures (Attia and Dayan, 2018). Moreover, since the input face images usually contain variations other than age (i.e. poses, expressions, etc.), the synthesized results in the middle steps easily deviate from the optimal trajectory of the demonstration. As a result, Imitation Learning produces a cascade of errors that reduces the performance of the whole system.

6 Experimental Results

Figure 6: Comparisons of our SDAP against the direct approaches, i.e. IAAP (Kemelmacher-Shlizerman et al, 2014) and CAAE (Zhang et al, 2017), and the step-by-step approach, i.e. TNVP (Duong et al, 2017), on FG-NET. Best viewed in color.

6.1 Databases

The proposed SDAP approach is trained and evaluated using two training and two testing databases that do not overlap. The training sets consist of images from AginG Faces in the Wild (Duong et al, 2016) and aging sequences from the Cross-Age Celebrity Dataset (Chen et al, 2014). For testing, two common databases, i.e. FG-NET (fgN, 2004) and MORPH (Ricanek Jr and Tesafaye, 2006), are employed.

AginG Faces in the Wild (AGFW) is a large-scale dataset with 18,685 images collected from search engines and mugshot images from public domains.

Cross-Age Celebrity Dataset (CACD) includes 163,446 images of 2,000 celebrities with an age range of 14 to 62.

FG-NET is a common testing database for both age estimation and synthesis. It includes 1,002 images of 82 subjects with ages ranging from 0 to 69 years old.

MORPH provides two albums of different scales. The small-scale album consists of 1,690 images while the large-scale one includes 55,134 images. We use the small-scale album in our evaluation.

6.2 Implementation Details

Data setting.

In order to train our SDAP model, we first extract the face regions of all images in AGFW and CACD and align them according to fixed positions of the two eyes and mouth corners. Then, we select all possible image pairs (at ages $t$ and $t + \delta$) of a subset of 575 subjects from CACD such that the age gap $\delta$ falls within the range of the age controller, and obtain 13,667 training pairs. From the images of these subjects, we further construct the observed aging sequence set by ordering all images of each subject by age. This process produces 575 aging sequences.
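The pairing and sequence construction described above can be sketched as below; the record format and the 15-year cap (taken from the action-range discussion in Sec. 3.2) are assumptions of this sketch, not the paper's exact pairing rule.

```python
from collections import defaultdict

def build_training_data(records, max_gap=15):
    """records: list of (subject_id, age, image_id).
    Returns age-ordered sequences per subject, plus all intra-subject
    image pairs whose age gap lies within [0, max_gap]."""
    by_subject = defaultdict(list)
    for sid, age, img in records:
        by_subject[sid].append((age, img))
    sequences, pairs = {}, []
    for sid, items in by_subject.items():
        items.sort()                      # order each sequence by age
        sequences[sid] = items
        for i in range(len(items)):
            for j in range(i + 1, len(items)):
                gap = items[j][0] - items[i][0]
                if gap <= max_gap:
                    pairs.append((items[i][1], items[j][1], gap))
    return sequences, pairs
```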

Training Stages.

Using these training data, we adopt the structure of the mapping functions in (Duong et al, 2017) for our bijections and pretrain them using all images from AGFW for the capability of face interpretation. Then a two-step training process is applied. In the first step, the structure of the synthesis unit, with two mapping functions and an age controller, is employed to learn the aging transformation presented in all 13,667 training pairs. The synthesis units are then composed to form the synthesis component. In the second step, the Subject-Dependent Aging Policy Learning is applied to embed the aging relationships of the observed face sequences and learn the Policy Network.

Model Structure.

The structure of the mapping function includes 10 mapping units where each unit is set up with 2 residual CNN blocks with 32 hidden feature maps for its scale and translation components. The convolution size is . The training batch size is set to 64. In the IRL model, a fully connected network with two hidden layers is employed to build the policy model. Each layer contains 32 neural units followed by a ReLU activation. The input of this policy network is the state defined in Sec. 3.2 with a dimension of 12,289. The output of this policy network is the probability of each of the 16 actions, and a tanh activation function is applied to obtain the predicted action. To model the reward/cost function, we adopt a regression network with two hidden layers to predict the reward given the state and action. Each layer in the network has 32 neural units followed by a ReLU operator.

The total training time for our models is 24.75 hours on a machine with a Core i7-6700@3.4GHz CPU, 64.00 GB RAM, and an NVIDIA GTX Titan X GPU. We develop and evaluate our inverse reinforcement learning algorithm (Algorithm 1) and models using the rllab framework (Duan et al, 2016).

6.3 Age Progression

Since our SDAP is trained using face sequences with ages ranging from 10 to 64 years old, it is evaluated using all faces above ten years old in FG-NET and MORPH. Given faces of different subjects, our aging policy can find the optimal aging path to reach the target ages via intermediate age-progressed steps (Fig. 5). Indeed, SDAP not only produces an aging path for each individual, but also handles in-the-wild facial variations well, e.g. poses, expressions, etc.

In addition, the facial textures and shapes also evolve naturally and implicitly according to individual differences. In particular, more changes are concentrated around the 20s, 40s, and over 50s, where beards and wrinkles naturally appear in the age-progressed faces of those age ranges. In Fig. 6, we further compare our synthesized results against other recent work, including IAAP (Kemelmacher Shlizerman et al, 2014), CAAE (Zhang et al, 2017), and TNVP (Duong et al, 2017). The predicted aging path of each subject is also displayed for reference. When the distance between the input and target ages becomes large, direct age progression approaches usually produce synthesized faces that look like the input faces plus wrinkles. Step-by-step age progression tends to produce better synthesis results, but is still limited in the amount of variation in the synthesized faces. SDAP shows its advantage in capturing and producing more aging variations among faces of the same age group. Figs. 7 and 8 present further results at different ages, with the real faces as reference.

Figure 7: Age progression using SDAP on FGNET. Given images (1st column), SDAP synthesizes the subject’s faces at different ages (row above) against the ground-truth (row below). Best viewed in color.

6.4 Age Invariant Face Recognition

Our SDAP is also validated using the two testing protocols as in (Duong et al, 2017) with two benchmarking sets of cross-age face verification, i.e. small-scale and large-scale sets.

Small-scale cross-age face verification.

In this protocol, we first construct a set A of 1052 randomly picked image pairs from FG-NET with age gaps larger than 10 years. There are 526 positive pairs (the same subjects) and 526 negative pairs (different subjects). For each pair in A, SDAP synthesizes the face at the younger age toward the age of the older one, producing a set of age-progressed pairs. The same process is then applied using other age progression methods, i.e. IAAP (Kemelmacher Shlizerman et al, 2014), TRBM (Duong et al, 2016), and TNVP (Duong et al, 2017), to construct the corresponding sets. Then, the False Rejection Rate-False Acceptance Rate (FRR-FAR) is reported under the Receiver Operating Characteristic (ROC) curves, as presented in Fig. 9(a). These results show that with an adaptive aging path for each subject, our SDAP outperforms the other age progression approaches, with a significant improvement in matching performance over the original pairs.
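The FRR-FAR points behind such ROC curves can be computed from verification scores as sketched below; this is a generic illustration of the metric, not the authors' evaluation code.

```python
import numpy as np

def frr_far(scores, same_subject, thresholds):
    """False Rejection Rate and False Acceptance Rate per threshold.

    scores: match score for each face pair; same_subject: boolean labels
    (True for positive pairs). A pair is accepted when its score reaches
    the threshold.
    """
    scores = np.asarray(scores, dtype=float)
    genuine = np.asarray(same_subject, dtype=bool)
    frr, far = [], []
    for t in thresholds:
        accept = scores >= t
        frr.append(np.mean(~accept[genuine]))   # genuine pairs rejected
        far.append(np.mean(accept[~genuine]))   # impostor pairs accepted
    return np.array(frr), np.array(far)
```

Sweeping the threshold over the score range traces out the full FRR-FAR curve reported in Fig. 9(a).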

Figure 8: Age progression using SDAP on MORPH. Given images (1st column), SDAP synthesizes the subject’s faces at different ages (row above) against the ground-truth (row below). Best viewed in color.
(a) ROC curves of FG-NET pairs
(b) CMC curves on MegaFace
(c) ROC curves on MegaFace
Figure 9: Comparison with other approaches in age-invariant face recognition: (a) ROC curves of face verification on the small-scale testing protocol; (b) Cumulative Match Curves (CMC) and (c) ROC curves of the SF model (Liu et al, 2017) and its improvements using age-progressed faces from TNVP (Duong et al, 2017) and our SDAP on the large-scale testing protocol of MegaFace challenge 1.

Large-scale cross-age face verification.

In the large-scale testing protocol, we conduct the MegaFace face verification benchmark (Kemelmacher-Shlizerman et al, 2016) targeted on FG-NET plus one million distractors to validate the capabilities of SDAP. This is a very challenging benchmarking protocol which validates face recognition algorithms not only against age-changing factors but also at the million scale of distractors (i.e. people who are not in the testing set). With this experiment, our goal is to show that using SDAP, the performance of a face recognition algorithm can be boosted significantly without retraining on cross-age databases. This benchmark consists of two datasets: a probe set and a gallery set. The probe set is FG-NET, while the gallery set consists of more than 1 million photos of 690K subjects. Practical face recognition models should maintain high performance even when the gallery contains millions of distractors.

Fig. 9(b) illustrates how the Rank-1 identification rates change as the number of distractors increases. The corresponding rates of all compared methods at one million distractors are shown in Table 2. Fig. 9(c) presents the ROC curves with respect to True and False Acceptance Rates (TAR-FAR); the results of the other methods are provided on the MegaFace website. The Sphere Face (SF) model (Liu et al, 2017), trained solely on the small-scale CASIA dataset without cross-age information, achieves the best performance among all compared face matching approaches. Using our SDAP aging model, this face matching model achieves even higher face verification results. Moreover, these significant improvements are gained without re-training the SF model, and the combined model outperforms the others, as shown in Table 2.

Methods Training set Accuracy
Barebones_FR with cross-age faces 7.136%
3DiVi with cross-age faces 15.78%
NTechLAB with cross-age faces 29.168%
DeepSense with cross-age faces 43.54%
SF (Liu et al, 2017) without cross-age faces 52.22%
SF + TNVP without cross-age faces 61.53%
SF + SDAP without cross-age faces 64.4%
Table 2: Comparison results in Rank-1 Identification Accuracy with one million distractors on MegaFace challenge 1 with FG-NET.
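Rank-1 identification over a gallery with distractors reduces to a nearest-neighbor match on face features, e.g. by cosine similarity; the sketch below is a generic illustration under that assumption, not the MegaFace evaluation code.

```python
import numpy as np

def rank1_rate(probe_feats, probe_ids, gallery_feats, gallery_ids):
    """Fraction of probes whose closest gallery entry (by cosine
    similarity) shares the probe's identity; distractors are gallery
    entries whose identities match no probe."""
    p = probe_feats / np.linalg.norm(probe_feats, axis=1, keepdims=True)
    g = gallery_feats / np.linalg.norm(gallery_feats, axis=1, keepdims=True)
    best = np.argmax(p @ g.T, axis=1)   # top gallery match per probe
    return float(np.mean(np.asarray(gallery_ids)[best] == np.asarray(probe_ids)))
```

Growing the gallery with more distractor entries makes the top-1 match harder to win, which is why the Rank-1 rates in Fig. 9(b) decrease as the number of distractors increases.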

6.5 Perceived Age of Synthesized Faces

In this section, the performance of SDAP is further evaluated by assessing the perceived ages of the synthesized faces. The goal of this experiment is to validate whether the age-progressed faces are perceived to be at the target ages. In particular, we apply the age estimator of (Rothe et al, 2016), i.e. the winner of the Looking At People (LAP) challenge, under the protocol of (Duong et al, 2016) and compare the Mean Absolute Error (MAE) on real and synthesized faces. In this evaluation, 802 real face images from FG-NET are randomly selected and used to fine-tune the age estimator. The remaining images of FG-NET form the testing Set A of real faces. Then, for each facial image of an individual in Set A, SDAP progresses that face to the ages at which the subject's real faces are available. This process results in Set B of 361 images. The same process is applied using TNVP (Duong et al, 2017) and TRBM (Duong et al, 2016) to obtain Sets C and D, respectively. The age accuracy in terms of MAE of these sets is shown in Table 3. These results show that the MAE achieved on SDAP's synthesized faces is comparable to that on the real faces. Moreover, compared to TRBM and TNVP, the difference in MAE between SDAP and the real faces is smaller. This further shows that SDAP outperforms these approaches in generating age-progressed faces at the target ages.

Inputs MAEs
Real Faces (Set A) 4.70
SDAP’s synthesized faces (Set B) 4.90
TNVP’s synthesized faces (Set C) 5.19
TRBM’s synthesized faces (Set D) 5.33
Table 3: MAEs (years) of Age Estimation System on Real and Age-progressed faces.
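The MAE metric reported in Table 3 is simply the mean absolute difference, in years, between the estimator's outputs and the reference ages of the same faces:

```python
def mean_absolute_error(estimated_ages, reference_ages):
    """MAE (in years) between an age estimator's outputs and the
    reference ages of the same faces."""
    assert len(estimated_ages) == len(reference_ages)
    return sum(abs(e - r) for e, r in zip(estimated_ages, reference_ages)) / len(estimated_ages)
```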

7 Conclusions

This work has presented a novel Generative Probabilistic Modeling under an IRL approach to age progression. The model inherits the strengths of both probabilistic graphical models and recent advances in deep networks. Using the proposed tractable log-likelihood objective functions together with deep features, our SDAP produces age-progressed faces with sharper and enhanced skin textures. In addition, the proposed SDAP aims at providing a subject-dependent aging path with the optimal reward. Furthermore, it takes full advantage of the input source by allowing multiple input images to optimize the aging path. The experimental results conducted on various databases, including the large-scale MegaFace, have proven the robustness and effectiveness of the proposed SDAP model on both face aging synthesis and cross-age face verification.

References

  • fgN (2004) FG-NET aging database. http://www.fgnet.rsunit.com
  • Antipov et al (2017) Antipov G, Baccouche M, Dugelay JL (2017) Face aging with conditional generative adversarial networks. arXiv preprint arXiv:1702.01983
  • Attia and Dayan (2018) Attia A, Dayan S (2018) Global overview of imitation learning. arXiv preprint arXiv:1801.06503
  • Burt and Perrett (1995) Burt DM, Perrett DI (1995) Perception of age in adult caucasian male faces: Computer graphic manipulation of shape and colour information. Proceedings of the Royal Society of London B: Biological Sciences 259(1355):137–143
  • Chen et al (2014) Chen BC, Chen CS, Hsu WH (2014) Cross-age reference coding for age-invariant face recognition and retrieval. In: ECCV, pp 768–783
  • Duan et al (2016) Duan Y, Chen X, Houthooft R, Schulman J, Abbeel P (2016) Benchmarking deep reinforcement learning for continuous control. In: International Conference on Machine Learning, pp 1329–1338
  • Duong et al (2016) Duong CN, Luu K, Quach KG, Bui TD (2016) Longitudinal face modeling via temporal deep restricted boltzmann machines. In: CVPR, pp 5772–5780
  • Duong et al (2017) Duong CN, Quach KG, Luu K, Le N, Savvides M (2017) Temporal non-volume preserving approach to facial age-progression and age-invariant face recognition. In: The IEEE International Conference on Computer Vision (ICCV), pp 3755–3763
  • Finn et al (2016) Finn C, Levine S, Abbeel P (2016) Guided cost learning: Deep inverse optimal control via policy optimization. In: International Conference on Machine Learning, pp 49–58
  • Geng et al (2007) Geng X, Zhou ZH, Smith-Miles K (2007) Automatic age estimation based on facial aging patterns. PAMI 29(12):2234–2240
  • Guo and Zhang (2014) Guo G, Zhang C (2014) A study on cross-population age estimation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp 4257–4263

  • Kemelmacher Shlizerman et al (2014) Kemelmacher Shlizerman I, Suwajanakorn S, Seitz SM (2014) Illumination-aware age progression. In: CVPR, IEEE, pp 3334–3341
  • Kemelmacher-Shlizerman et al (2016) Kemelmacher-Shlizerman I, Seitz SM, Miller D, Brossard E (2016) The megaface benchmark: 1 million faces for recognition at scale. In: CVPR, pp 4873–4882
  • Lanitis et al (2002) Lanitis A, Taylor CJ, Cootes TF (2002) Toward automatic simulation of aging effects on face images. PAMI 24(4):442–455
  • Levine and Abbeel (2014) Levine S, Abbeel P (2014) Learning neural network policies with guided policy search under unknown dynamics. In: Advances in Neural Information Processing Systems, pp 1071–1079
  • Liu et al (2017) Liu W, Wen Y, Yu Z, Li M, Raj B, Song L (2017) Sphereface: Deep hypersphere embedding for face recognition. arXiv preprint arXiv:1704.08063
  • Luu et al (2009) Luu K, Suen C, Bui T, Ricanek JK (2009) Automatic child-face age-progression based on heritability factors of familial faces. In: BIdS, IEEE, pp 1–6
  • Patterson et al (2006) Patterson E, Ricanek K, Albert M, Boone E (2006) Automatic representation of adult aging in facial images. In: Proc. IASTED Int’l Conf. Visualization, Imaging, and Image Processing, pp 171–176
  • Ricanek Jr and Tesafaye (2006) Ricanek Jr K, Tesafaye T (2006) Morph: A longitudinal image database of normal adult age-progression. In: FGR 2006., IEEE, pp 341–345
  • Rothe et al (2016) Rothe R, Timofte R, Gool LV (2016) Deep expectation of real and apparent age from a single image without facial landmarks. International Journal of Computer Vision (IJCV)
  • Rowland et al (1995) Rowland D, Perrett D, et al (1995) Manipulating facial appearance through shape and color. Computer Graphics and Applications, IEEE 15(5):70–76
  • Shu et al (2015) Shu X, Tang J, Lai H, Liu L, Yan S (2015) Personalized age progression with aging dictionary. In: ICCV, pp 3970–3978
  • Suo et al (2010) Suo J, Zhu SC, Shan S, Chen X (2010) A compositional and dynamic model for face aging. PAMI 32(3):385–401
  • Suo et al (2012) Suo J, Chen X, Shan S, Gao W, Dai Q (2012) A concatenational graph evolution aging model. PAMI 34(11):2083–2096
  • Taylor and Hinton (2009) Taylor GW, Hinton GE (2009) Factored conditional restricted boltzmann machines for modeling motion style. In: Proceedings of the 26th annual international conference on machine learning, ACM, pp 1025–1032
  • Tsai et al (2014) Tsai MH, Liao YK, Lin IC (2014) Human face aging with guided prediction and detail synthesis. Multimedia tools and applications 72(1):801–824
  • Wang et al (2016) Wang W, Cui Z, Yan Y, Feng J, Yan S, Shu X, Sebe N (2016) Recurrent face aging. In: CVPR, pp 2378–2386
  • Yang et al (2016) Yang H, Huang D, Wang Y, Wang H, Tang Y (2016) Face aging effect simulation using hidden factor analysis joint sparse representation. TIP 25(6):2493–2507
  • Zhang et al (2017) Zhang Z, Song Y, Qi H (2017) Age progression/regression by conditional adversarial autoencoder. In: The IEEE Conference on Computer Vision and Pattern Recognition (CVPR)