Introduction
Age estimation has attracted much attention in many real-world applications such as video surveillance, product recommendation, internet safety for minors, etc. It aims to label a given face image with an exact age or age group. Impressive progress has been made on age estimation in the last several decades and many methods [Niu et al.2016, Gao et al.2017, Chen et al.2017, Shen et al.2017, Agustsson, Timofte, and Van Gool2017, Yang et al.2018] have been proposed. However, large-scale age estimation is still a very challenging problem for several reasons. 1) Large variations within the datasets, including illumination, pose and expression, affect the accuracy of age estimation. 2) Different people age in different ways, so the mapping from age-related features to age labels is not unique. 3) Age estimation is a fine-grained recognition task, and it is almost impossible for humans to discriminate age accurately.
Existing models for age estimation can be roughly divided into four categories: regression models [Shen et al.2017, Agustsson, Timofte, and Van Gool2017], multi-class classification models [Rothe, Timofte, and Van Gool2015, Yang et al.2018], Ranking CNN models [Niu et al.2016, Chen et al.2017] as well as label distribution learning models [Gao et al.2017, Gao et al.2018]. By predicting the age distribution, label distribution learning (LDL) has the potential benefit of dealing with the relevance and uncertainty among different ages. Besides, label distribution learning improves data utilization, because a given face image provides age-related information about not only the chronological age but also its neighboring ages.
We believe that label distribution learning faces two major challenges. First, we argue that the age label distributions vary across individuals, so it is better not to assume their distribution forms as in [Yang et al.2015, Gao et al.2018]. Figure 1 depicts a detailed interpretation of this. We can see from Figure 1(a) that the aging tendencies differ across individuals. Thus it is unreasonable to assume that the age label distributions for all ages obey Gaussian distributions with the same standard deviation, as Figure 1(b) shows, or with different deviations, as Figure 1(c) shows. The second challenge is that label distribution learning is essentially a discrete learning process that ignores the ordinal information of age labels, while the change of age is an ordered and continuous process.
To address the first challenge, we propose evolutionary label distribution learning, a solution that uses a neural network to adaptively learn label distributions from the given individuals and constantly refine the learning results during evolution. Figure 1(d) shows the learnt distributions. It is clear that the age label distributions vary across individuals and do not strictly obey the Gaussian form. For the second challenge, we propose a coupled training mechanism to jointly perform label distribution learning and regression. The regression model captures the ordered and continuous information of age labels and regresses an age value, which relieves the second challenge. Besides, a slack term is designed to further convert the discrete age label regression to continuous age interval regression.
The main contributions of this work are as follows:
1) By simulating evolutionary mechanisms, we propose a Coupled Evolutionary Network (CEN) with two concurrent processes: evolutionary label distribution learning and evolutionary slack regression.
2) The proposed evolutionary label distribution learning adaptively estimates the age distributions without strong assumptions about the form of the label distribution. Benefiting from the constant evolution of the learning results, evolutionary label distribution learning generates more precise label distributions.
3) The experiments show that the combination of label distribution learning and regression achieves superior performance. Hence, we propose evolutionary slack regression to assist evolutionary label distribution learning. Besides, we introduce a slack term to further convert the discrete age label regression to the continuous age interval regression.
4) We evaluate the effectiveness of the proposed CEN on three age estimation benchmarks and consistently obtain state-of-the-art results.
Related Work
Age Estimation
Benefiting from deep CNNs (e.g., VGG16 [sim2014], LightCNN [Wu et al.2018], ResNet [He et al.2016] and DenseNet [Huang et al.2017]) trained on large-scale face datasets, deep learning based age estimation methods achieve state-of-the-art performance. They can be roughly divided into four categories: regression [Shen et al.2017, Agustsson, Timofte, and Van Gool2017], multi-class classification [Rothe, Timofte, and Van Gool2015, Can Malli, Aygun, and Kemal Ekenel2016, Yang et al.2018], Ranking CNN [Niu et al.2016, Chen et al.2017] as well as label distribution learning (LDL) [Gao et al.2017, Gao et al.2018]. With the huge improvement in the performance of object recognition tasks, some researchers propose to transform age estimation into a multi-class classification problem, in which different ages or age groups are regarded as independent classes. However, multi-class classification methods usually neglect the relevance and uncertainty among neighboring labels. Since age is a continuous value, a natural idea to better fit the aging mechanism is to treat age estimation as a regression task. However, due to the presence of outliers, regression methods cannot achieve satisfactory results either. Moreover, the speed of appearance change differs across ages. To alleviate this, ranking CNN and LDL methods have been proposed, which adopt an individual classifier or a label distribution for each age class. In this paper, we employ an LDL based method assisted with regression.
Label Distribution Learning
Label ambiguity and redundancy hinder the improvement of object recognition and classification performance. Label distribution learning (LDL) [Geng and Ji2013, Geng, Yin, and Zhou2013] addresses this problem by learning the distribution over labels from the description of the instance. LDL has been widely used in many applications, such as expression recognition [Zhou, Xue, and Geng2015], public video surveillance [Zhang, Wang, and Geng2015] as well as age estimation [Geng, Yin, and Zhou2013, Yang, Geng, and Zhou2016, Gao et al.2017, Gao et al.2018]. [Geng, Yin, and Zhou2013] deals with age estimation by learning the age label distribution. [Gao et al.2018] shows that the ranking method implicitly learns a label distribution, and assumes that the age label distribution is a Gaussian with a fixed standard deviation. However, since the aging characteristics differ across ages, the age label distributions cannot be identical for all ages. To deal with this, we propose a neural network model that learns the mapping from a given image to its age label distribution.
Our Approach
In this section, we first give the problem definition. Then, we describe the two components of the proposed coupled evolutionary network (CEN). Finally, we detail the training and testing procedures, followed by the network architecture.
Problem Formulation
In the setting of CEN, we define $\mathcal{Y} = \{y_{min}, y_{min}+1, \ldots, y_{max}\}$ as the ages of the training set, where $y_{min}$ and $y_{max}$ are the minimal and maximum ages, respectively. Suppose $\mathcal{D} = \{(x, y)\}$ is the training set, where we omit the instance indices for simplification. Among them, $x$ denotes the input instance and $y \in \mathcal{Y}$ is the age of $x$. $\mathbf{l}$ represents the corresponding one-hot vector of $y$, and $\tilde{y}$ denotes the normalized age label, which is formulated as:

$\tilde{y} = \dfrac{y - y_{min}}{y_{max} - y_{min}}$ (1)

We are interested in learning a mapping from the instance $x$ to its accurate age $\hat{y}$.
Inspired by the biological evolutionary mechanism, we propose a coupled evolutionary network (CEN) with two concurrent processes: evolutionary label distribution learning and evolutionary slack regression. The overall framework of CEN is depicted in Figure 2. We first obtain an initial ancestor CEN. Then, with the experience and knowledge transferred by the ancestor CEN, the offspring CEN incrementally evolves itself to achieve better performance. After each evolution, the offspring CEN is treated as the new ancestor CEN for the next evolution. The predicted age is obtained only with the last CEN.
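The generation-to-generation handover described above can be sketched as a plain training loop; `train_generation` is a hypothetical callable standing in for one full training run supervised by the ancestor's outputs, not the authors' released code:

```python
def evolve(train_generation, num_evolutions):
    """Minimal sketch of the CEN evolution loop.

    train_generation(ancestor) -> offspring; the ancestor is None for the
    initial generation. The offspring inherits the ancestor's predictions
    (label distributions and regression errors) as supervision, then
    becomes the new ancestor for the next evolution.
    """
    ancestor = None
    for _ in range(num_evolutions):
        offspring = train_generation(ancestor)
        ancestor = offspring  # the offspring becomes the new ancestor
    return ancestor  # only the last CEN is used for prediction
```

The structure makes explicit that only the final generation is kept for inference, while every intermediate generation exists solely to supervise its successor.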
Evolutionary Label Distribution Learning
Previous researches usually make strong assumptions on the form of the label distributions, which may not be able to truly and flexibly reflect the reality. We address this problem by introducing evolutionary label distribution learning, a solution that uses a neural network to adaptively learn and constantly refine the age label distributions during evolution.
The initial ancestor CEN takes the given instance as input and learns to predict its age label distribution. Then, the offspring CEN inherits all the age label distributions from its ancestor CEN and updates itself over the entire training set $\mathcal{D}$. After each evolution, the offspring CEN is treated as the new ancestor for the next evolution.
The Initial Ancestor
We first utilize the initial ancestor coupled evolutionary network $CEN_0$ to adaptively learn the initial age label distributions. Specifically, given an input instance $x$, $CEN_0$ learns the mapping from $x$ to the logits $\mathbf{z}$ by:

$\mathbf{z} = W^{\top} g(x) + b$ (2)

where $g(x)$ is the output of the last pooling layer of $CEN_0$, and $W$ and $b$ are the weights and biases of a fully connected layer, respectively.
The predicted age label distribution $\mathbf{p}$ can be formulated as:

$p_i = \dfrac{\exp(z_i / T)}{\sum_j \exp(z_j / T)}$ (3)

where $T$ is the temperature parameter, dominating the softness of the predicted distribution. The larger $T$, the softer the obtained distribution. We set a fixed $T$ and employ cross entropy as the supervised signal to learn the initial ancestor for evolutionary label distribution learning:

$\mathcal{L}_{ce} = -\sum_i l_i \log p_i$ (4)

where $l_i$ denotes the $i$-th element of the one-hot vector $\mathbf{l}$.
The goal of the initial ancestor for label distribution learning is to minimize the cross entropy loss. The predicted label distribution is then transferred to the offspring network.
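As a concrete illustration of Eqs.(3) and (4), the temperature-softened distribution and its cross-entropy supervision can be written in a few lines of plain Python (function names are ours, not from the paper):

```python
import math

def softened_distribution(logits, T=2.0):
    """Eq.(3): softmax with temperature T; a larger T gives a softer
    distribution over the age labels."""
    exps = [math.exp(z / T) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(p, onehot):
    """Eq.(4): cross entropy between the predicted distribution p and the
    one-hot age label; eps guards against log(0)."""
    eps = 1e-12
    return -sum(l * math.log(q + eps) for l, q in zip(onehot, p))
```

For example, raising `T` flattens the distribution, which is exactly the "softness" the temperature parameter controls.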
The Evolutionary Procedure
After the first evolution, we obtain the preliminary age label distribution without making strong assumptions for the form of the distribution. Then the preliminary age label distribution acts as new experience and knowledge to be transferred to the next evolution.
In the $t$-th evolution, where $t \geq 1$, the predicted age label distribution $\mathbf{p}^{t}$ of $CEN_t$ is calculated by Eq.(3). We keep the temperature $T$ and employ Kullback-Leibler (KL) divergence to transfer the age label distribution from the $(t-1)$-th evolution to the current evolution:

$\mathcal{L}_{kl} = \sum_i p_i^{t-1} \log \dfrac{p_i^{t-1}}{p_i^{t}}$ (5)

Since $\sum_i p_i^{t-1} \log p_i^{t-1}$ is a constant, Eq.(5) can be further simplified as follows:

$\mathcal{L}_{kl} = -\sum_i p_i^{t-1} \log p_i^{t}$ (6)
It is worth noting that there is a discrepancy between the real label distribution and the predicted label distribution of $CEN_{t-1}$. Using only Eq.(6) in the evolutionary procedure may yield inferior performance. Consequently, we employ an additional cross entropy term to rectify this discrepancy.
The final supervision for the evolutionary procedure contains both the predicted age label distributions and the target age labels, which can be formulated as:

$\mathcal{L}_{ldl} = \lambda \mathcal{L}_{kl} + (1 - \lambda) \mathcal{L}_{ce}$ (7)

where $\lambda$ is the tradeoff parameter to balance the importance of the KL loss and the cross entropy loss.
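A minimal sketch of the evolutionary LDL supervision, assuming the simplified KL term of Eq.(6) and a convex combination of the two losses for Eq.(7) (the exact weighting scheme is our assumption):

```python
import math

def kl_transfer(p_prev, p_curr):
    """Simplified KL term of Eq.(6): -sum_i p_i^{t-1} * log p_i^t.
    The constant entropy of the ancestor distribution is dropped."""
    eps = 1e-12
    return -sum(a * math.log(b + eps) for a, b in zip(p_prev, p_curr))

def evolutionary_ldl_loss(p_prev, p_curr, onehot, lam=0.5):
    """Eq.(7)-style combination: lam weighs the distribution transferred
    from the ancestor, (1 - lam) the rectifying cross-entropy term."""
    eps = 1e-12
    ce = -sum(l * math.log(q + eps) for l, q in zip(onehot, p_curr))
    return lam * kl_transfer(p_prev, p_curr) + (1.0 - lam) * ce
```

With `lam = 0.5` the two supervision signals are weighted equally, matching the setting the ablation study later identifies as best.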
Evolutionary Slack Regression
Evolutionary label distribution learning is essentially a discrete learning process without considering the ordinal information of age labels. However, the change of age is an ordered and continuous process. Accordingly, we propose a new regression method, named evolutionary slack regression, to transfer the ordered and continuous age information of the previous evolution to the current evolution. Specifically, a slack term is introduced into evolutionary slack regression, which converts the discrete age label regression to continuous age interval regression.
The initial ancestor CEN takes the given instance as input and produces a roughly regressed age. Then, the absolute difference between the regressed age and the ground-truth age is treated as knowledge to be inherited by the offspring CEN. Similarly, after each evolution, the offspring CEN is treated as the new ancestor for the next evolution.
The Initial Ancestor
For regression, $CEN_0$ learns the mapping from the given instance $x$ to a real value $\hat{y}_r$:

$\hat{y}_r = \mathbf{w}^{\top} g(x) + b_r$ (8)

where $\mathbf{w}$ and $b_r$ are the weights and biases of a fully connected layer, respectively.

We train the initial ancestor with an $\ell_1$ loss to minimize the distance between the regressed age $\hat{y}_r$ and the normalized ground-truth age $\tilde{y}$:

$\mathcal{L}_{reg} = \lVert \hat{y}_r - \tilde{y} \rVert_1$ (9)
The Evolutionary Procedure
We observe that Eq.(9) is essentially a discrete regression process, because the target age is a discrete value. In order to deliver the ordered and continuous age information of the ancestor CEN $CEN_{t-1}$ to the offspring CEN $CEN_t$, we introduce a slack term $\epsilon^{t-1}$ into the regression of $CEN_t$, which is defined as follows:

$\epsilon^{t-1} = \lvert \hat{y}_r^{t-1} - \tilde{y} \rvert$ (10)

We assume that $CEN_t$ is superior to $CEN_{t-1}$, which means the regression error of $CEN_t$ should not exceed $\epsilon^{t-1}$:

$\lvert \hat{y}_r^{t} - \tilde{y} \rvert \leq \epsilon^{t-1}$ (11)

Eq.(11) can be rewritten as:

$-\epsilon^{t-1} \leq \hat{y}_r^{t} - \tilde{y} \leq \epsilon^{t-1}$ (12)

Above all, we define a slack loss as follows:

$\mathcal{L}_{sr} = \max\left(0, \lvert \hat{y}_r^{t} - \tilde{y} \rvert - \epsilon^{t-1}\right)$ (13)
Eq.(13) pushes the regressed age of $CEN_t$ to lie in a continuous age interval $[\tilde{y} - \epsilon^{t-1}, \tilde{y} + \epsilon^{t-1}]$, rather than to strictly equal a discrete age label. From this perspective, by introducing the slack term into the regression, we convert the discrete age label regression to continuous age interval regression in age estimation.
At each evolution, we minimize the slack loss and find that $\epsilon$ gradually decreases.
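The slack mechanism of Eqs.(10)-(13) reduces to a hinged L1 penalty; the following sketch assumes that specific hinge form and operates on normalized age labels:

```python
def slack_term(y_pred_prev, y_norm):
    """Eq.(10): the ancestor's absolute regression error, inherited by
    the offspring as the width of the allowed age interval."""
    return abs(y_pred_prev - y_norm)

def slack_loss(y_pred, y_norm, eps_prev):
    """Eq.(13)-style slack regression loss (an assumed hinge form):
    zero whenever the current prediction stays inside the interval
    [y_norm - eps_prev, y_norm + eps_prev] from Eq.(12)."""
    return max(0.0, abs(y_pred - y_norm) - eps_prev)
```

A prediction inside the inherited interval incurs no penalty, which is exactly how the discrete label target relaxes into a continuous interval target.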
Training Framework
The training procedure of CEN contains both evolutionary label distribution learning and evolutionary slack regression. It can be divided into two parts: the initial ancestor and the evolutionary procedure.
The total supervised loss for the initial ancestor is

$\mathcal{L}_0 = \mathcal{L}_{ce} + \beta \mathcal{L}_{reg}$ (14)

where $\beta$ is the tradeoff parameter to balance the importance of the initial label distribution learning and the regression.

The total supervised loss for the evolutionary procedure is

$\mathcal{L}_t = \mathcal{L}_{ldl} + \beta \mathcal{L}_{sr}$ (15)

where $t \geq 1$ and $\beta$ is the tradeoff parameter to balance the importance of evolutionary label distribution learning and the slack regression.
Age Estimation in Testing
In the testing phase, for a given instance, we use $\hat{y}_{ld}$ to denote the estimated age of evolutionary label distribution learning, which can be written as:

$\hat{y}_{ld} = \sum_i y_i \, p_i$ (16)

The estimated age $\hat{y}_{sr}$ of evolutionary slack regression can be formulated as

$\hat{y}_{sr} = y_{min} + \hat{y}_r \, (y_{max} - y_{min})$ (17)

where $y_{min}$ and $y_{max}$ are the minimal and maximum ages of the training set, respectively.

Then, the final estimated age $\hat{y}$ is the average of the above two results:

$\hat{y} = \dfrac{\hat{y}_{ld} + \hat{y}_{sr}}{2}$ (18)
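The testing-phase fusion of Eqs.(16)-(18) can be sketched as follows, assuming the LDL age is the expectation of the predicted distribution over the age labels:

```python
def predict_age(p, y_reg_norm, y_min=0, y_max=100):
    """Fuse the two CEN outputs at test time (assumed notation).

    p          -- predicted age label distribution over [y_min, y_max]
    y_reg_norm -- normalized regression output in [0, 1]
    """
    ages = range(y_min, y_max + 1)
    y_ldl = sum(a * pi for a, pi in zip(ages, p))   # Eq.(16): expectation
    y_sr = y_min + y_reg_norm * (y_max - y_min)     # Eq.(17): rescale
    return 0.5 * (y_ldl + y_sr)                     # Eq.(18): average
```

When both branches agree, the average leaves the estimate unchanged; when they disagree, it splits the difference.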
Network Architecture
ResNet10 and ResNet18 [He et al.2016] are adopted as the backbone networks of the proposed method. In particular, two fully connected layers are inserted immediately after the last pooling layer for evolutionary label distribution learning and evolutionary slack regression respectively. Considering the size and efficiency of ResNet10 and ResNet18, we further halve the number of feature channels and obtain two tiny variations, named ResNet10Tiny and ResNet18Tiny respectively. The details are listed in Table 7.
Experiments
Dataset and Protocol
We evaluate the proposed CEN on both apparent age and real age datasets.
IMDBWIKI [Rothe, Timofte, and Van Gool2015] is the largest publicly available dataset of facial images with age and gender labels. It consists of 523,051 facial images in total, 460,723 images from IMDB and 62,328 from Wikipedia. The ages of the IMDBWIKI dataset range from 0 to 100 years old. Although it is the largest dataset for age estimation, IMDBWIKI is still not suitable for evaluation because it contains much noise. Thus, like most previous works [Yang et al.2018], we utilize IMDBWIKI only for pretraining.
ChaLearn15 [Escalera et al.2015] is the first dataset for apparent age estimation, which contains 4,691 color images, 2,476 for training, 1,136 for validation and the remaining 1,087 for testing. ChaLearn15 comes from the first competition track ChaLearn LAP 2015. Each image is labeled using an online voting platform. We follow the protocol in [Rothe and etal.2016] to train on the training set and evaluate on the validation set.
Morph [Ricanek and Tesafaye2006] is the most popular benchmark for real age estimation, which contains 55,134 color images of 13,617 subjects with age and gender information. The ages of Morph range from 16 to 77 years old, with four images per subject on average. The classical 80-20 split protocol is used for Morph.
MegaAgeAsian [Zhang et al.2017] is a newly released large-scale facial age dataset. Different from most facial age datasets, which contain only faces of Westerners, the MegaAgeAsian dataset contains only faces of Asians. It consists of 40,000 images encompassing ages from 0 to 70. Following [Zhang et al.2017], we reserve 3,945 images for testing.
Evaluation Metric
We evaluate the performance of the proposed CEN with MAE, $\epsilon$-error and CA(n).
Mean Absolute Error (MAE) is widely used to evaluate the performance of age estimation. It is defined as the average distance between the ground-truth and the predicted age, which can be written as:

$MAE = \dfrac{1}{N} \sum_{k=1}^{N} \lvert \hat{y}_k - y_k \rvert$ (19)

where $\hat{y}_k$ and $y_k$ denote the predicted age and the ground-truth age of the $k$-th testing instance, respectively.
$\epsilon$-error is the evaluation metric for apparent age estimation, which can be formulated as:

$\epsilon = \dfrac{1}{N} \sum_{k=1}^{N} \left(1 - \exp\left(-\dfrac{(\hat{y}_k - \mu_k)^2}{2\sigma_k^2}\right)\right)$ (20)

where $\hat{y}_k$, $\mu_k$ and $\sigma_k$ denote the predicted age, mean age and standard deviation of the $k$-th testing instance, respectively.
Cumulative Accuracy (CA) is employed as the evaluation metric for MegaAgeAsian, which can be calculated as:

$CA(n) = \dfrac{K_n}{K} \times 100\%$ (21)

where $K_n$ is the number of test images whose absolute estimation error is smaller than $n$ and $K$ is the total number of test images. We report CA(3), CA(5) and CA(7) as in [Zhang et al.2017, Yang et al.2018] in our experiments.
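For reference, the three metrics can be implemented directly from Eqs.(19)-(21); the code below is a straightforward transcription under the stated definitions:

```python
import math

def mae(preds, targets):
    """Eq.(19): mean absolute error."""
    return sum(abs(p - t) for p, t in zip(preds, targets)) / len(preds)

def eps_error(preds, means, stds):
    """Eq.(20): ChaLearn epsilon-error, averaged over the test set."""
    errs = [1.0 - math.exp(-((p - m) ** 2) / (2.0 * s ** 2))
            for p, m, s in zip(preds, means, stds)]
    return sum(errs) / len(errs)

def ca(preds, targets, n):
    """Eq.(21): cumulative accuracy CA(n), in percent; counts predictions
    whose absolute error is smaller than n."""
    k_n = sum(1 for p, t in zip(preds, targets) if abs(p - t) < n)
    return 100.0 * k_n / len(preds)
```

Note that $\epsilon$-error rewards predictions close to the mean of the human votes relative to their spread, so a prediction exactly at the mean scores 0.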
Implementation Details
Preprocessing.
We utilize the multi-task cascaded CNN [Zhang et al.2016] to detect and align face images. Then all the images are resized to 224×224 as the inputs. Besides, data augmentation is important for deep neural networks in age estimation. We augment the training data by: (a) random resized cropping with the aspect ratio from 0.8 to 1.25 and the scale from 0.8 to 1.0; (b) random horizontal flipping with a probability of 0.5.
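The cropping policy above can be sketched as follows; the rejection-sampling structure mirrors common random-resized-crop implementations and is an assumption here, only the scale and ratio ranges come from the paper:

```python
import random

def sample_resized_crop(width, height, scale=(0.8, 1.0), ratio=(0.8, 1.25)):
    """Sample a crop box with area scale 0.8-1.0 of the image and aspect
    ratio 0.8-1.25; the sampled box would then be resized to 224x224."""
    area = width * height
    for _ in range(10):  # retry until the box fits inside the image
        target_area = random.uniform(*scale) * area
        aspect = random.uniform(*ratio)
        w = int(round((target_area * aspect) ** 0.5))
        h = int(round((target_area / aspect) ** 0.5))
        if w <= width and h <= height:
            x = random.randint(0, width - w)
            y = random.randint(0, height - h)
            return x, y, w, h
    return 0, 0, width, height  # fallback: keep the full image

def maybe_flip(flip_prob=0.5):
    """Horizontal flip decision with the given probability."""
    return random.random() < flip_prob
```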
Training Details.
All the network architectures used in CEN are pretrained on the IMDBWIKI dataset by Eq.(14). We employ the SGD optimizer and set the initial learning rate, the momentum and the weight decay to 0.01, 0.9 and 1e-4, respectively. The learning rate is decreased by a factor of 10 every 40 epochs. Each model is trained for 160 epochs in total with a mini-batch size of 128. Then the models pretrained on IMDBWIKI are used as initializations on the target age datasets, including ChaLearn15, Morph and MegaAgeAsian. All the networks are optimized by SGD, and the initial learning rate, the momentum and the weight decay are set to 0.001, 0.9 and 1e-4, respectively. If not specified, we employ $T = 2$, $\lambda = 0.5$ and $\beta = 4$ in our experiments. The learning rate is decreased by a factor of 10 every 40 epochs. Each model is trained for 160 epochs in total with a mini-batch size of 128.

Analysis of Coupled Training Mechanism
In this subsection, we explore the coupled training mechanism of label distribution learning and regression. Table 1 shows the comparison results. The first and second rows are the baseline results of using only label distribution learning (LDL) and regression (Reg), respectively. The last three rows present the coupled training performance (LDL+Reg). Specifically, with the coupled training mechanism, $\hat{y}_{ld}$, $\hat{y}_{sr}$ and $\hat{y}$ are calculated by Eq.(16), Eq.(17) and Eq.(18), respectively, denoting the outputs of label distribution learning, regression and the average of the two.
Methods  Morph  MegaAgeAsian
  MAE  CA(3)  CA(5)  CA(7)
Reg  2.578  58.22  79.01  89.03
LDL  2.323  59.14  78.70  89.26
LDL+Reg ($\hat{y}_{ld}$)  2.243  60.57  79.77  90.21
LDL+Reg ($\hat{y}_{sr}$)  2.231  59.14  79.24  89.62
LDL+Reg ($\hat{y}$)  2.220  60.83  80.11  90.52
Obviously, the proposed coupled training mechanism (LDL+Reg) achieves superior performance compared with training only with LDL or Reg. For example, compared with Reg, LDL+Reg gains a 0.335 improvement of MAE on Morph. The average $\hat{y}$ of the label distribution learning and regression outputs further gains 0.023 and 0.011 improvements of MAE compared with $\hat{y}_{ld}$ and $\hat{y}_{sr}$, respectively. This indicates that the coupled training mechanism can significantly improve the performance of the age estimation task; therefore we choose $\hat{y}$ as the age estimation result in the following experiments.
Comparisons with StateoftheArts
We compare the proposed CEN with previous state-of-the-art methods on the Morph, ChaLearn15 and MegaAgeAsian datasets. The proposed CEN mostly performs the best among all the state-of-the-art methods.
Table 2 shows the MAEs of the individual methods on Morph. Benefiting from the adaptive learning of label distributions and the coupled evolutionary mechanism, our CEN, based on ResNet18, obtains an MAE of 1.905 on Morph and outperforms the previous state-of-the-art method ThinAgeNet [Gao et al.2018].
Methods  Pretrained  Morph 
MAE  
ORCNN[Niu et al.2016]    3.34 
DEX[Rothe and etal.2016]  IMDBWIKI  2.68 
Ranking [Chen et al.2017]  Audience  2.96 
Posterior[Zhang et al.2017]  IMDBWIKI  2.52 
DRFs[Shen et al.2017]    2.17 
SSRNet[Yang et al.2018]  IMDBWIKI  2.52 
MV Loss[Pan et al.2018]  IMDBWIKI  2.16 
TinyAgeNet [Gao et al.2018]  MSCeleb1M  2.291 
ThinAgeNet [Gao et al.2018]  MSCeleb1M  1.969 
CEN(ResNet10Tiny)  IMDBWIKI  2.229 
CEN(ResNet10)  IMDBWIKI  2.134 
CEN(ResNet18Tiny)  IMDBWIKI  2.069 
CEN(ResNet18)  IMDBWIKI  1.905 

Used partial data of the dataset;
In addition to real age estimation, apparent age estimation is also important. We conduct experiments on ChaLearn15 to validate the performance of our method on apparent age estimation. Since there are only 2,476 training images in ChaLearn15, a huge network may lead to overfitting. Therefore, we choose ResNet10Tiny with 1.2M parameters as the backbone for evaluation. Table 3 shows the comparison results of MAE and $\epsilon$-error. The proposed method sets a new state-of-the-art MAE of 3.052. The $\epsilon$-error of 0.274 is also close to the best competition result of 0.272 (ThinAgeNet). Note that CEN(ResNet10Tiny) has 1.2M parameters, fewer than the 3.7M of ThinAgeNet.
Methods  Pretrained  ChaLearn15  Param  

MAE  $\epsilon$-error  
DEX[Rothe and etal.2016]    5.369  0.456  134.6M 
DEX[Rothe and etal.2016]  IMDBWIKI  3.252  0.282  134.6M 
ARN (Agustsson et al. 2017)  IMDBWIKI  3.153    134.6M 
TinyAgeNet [Gao et al.2018]  MSCeleb1M  3.427  0.301  0.9M 
ThinAgeNet [Gao et al.2018]  MSCeleb1M  3.135  0.272  3.7M 
CEN(ResNet10Tiny)  IMDBWIKI  3.052  0.274  1.2M 

Used partial data of the dataset;
Besides, we evaluate the performance of CEN on the MegaAgeAsian dataset, which only contains Asians. Table 4 reports the comparison results of CA(3), CA(5) and CA(7). Our CEN(ResNet18Tiny) achieves 64.45%, 82.95% and 91.98%, which are the new state-of-the-art results, obtaining 0.22%, 0.80% and 1.18% improvements over the previous best method Posterior [Zhang et al.2017].
Methods  Pretrained  MegaAgeAsian  

CA(3)  CA(5)  CA(7)  
Posterior[Zhang et al.2017]  IMDBWIKI  62.08  80.43  90.42 
Posterior[Zhang et al.2017]  MSCeleb1M  64.23  82.15  90.80 
MobileNet[Yang et al.2018]  IMDBWIKI  44.0  60.6   
DenseNet[Yang et al.2018]  IMDBWIKI  51.7  69.4   
SSRNet[Yang et al.2018]  IMDBWIKI  54.9  74.1   
CEN(ResNet10Tiny)  IMDBWIKI  63.60  82.36  91.80 
CEN(ResNet10)  IMDBWIKI  62.86  81.47  91.34 
CEN(ResNet18Tiny)  IMDBWIKI  64.45  82.95  91.98 
CEN(ResNet18)  IMDBWIKI  63.73  82.88  91.64 
The Superiority of Evolutionary Mechanism
In this subsection, we qualitatively and quantitatively demonstrate the superiority of the proposed evolutionary mechanism. Figure 3 depicts the evolution of the age label distributions. As shown in the second column of Figure 3(b), for the given instance aged 45, the first predicted distribution can be approximately regarded as a bimodal distribution with two peaks at 41 and 51, which is ambiguous for age estimation. After one evolution, the predicted distribution is refined from a bimodal to a unimodal distribution with a single peak at 48. After two evolutions, the peak of the unimodal distribution moves from 48 to 45, which is the true age of the input instance. This movement indicates the effectiveness of the additional cross entropy term in Eq.(7), which aims to rectify the discrepancy between the real label distribution and the predicted label distribution. More results are shown in Figure 4 and Figure 5.
Backbones  Morph  MegaAgeAsian  

MAE  CA(3)  CA(5)  CA(7)  
CEN(ResNet10Tiny)  t=1  2.446  60.52  80.13  90.64 
t=2  2.300  62.01  81.90  91.64  
t=3  2.241  63.14  82.31  91.84  
t=4  2.229  63.60  82.36  91.80  
CEN(ResNet10)  t=1  2.321  59.57  79.44  89.39 
t=2  2.207  61.91  81.18  91.16  
t=3  2.150  62.86  81.47  91.34  
t=4  2.134  62.78  81.77  91.00  
CEN(ResNet18Tiny)  t=1  2.304  61.88  81.31  91.34 
t=2  2.136  63.57  82.00  91.46  
t=3  2.069  64.52  82.03  91.70  
t=4  2.074  64.45  82.95  91.98  
CEN(ResNet18)  t=1  2.220  60.83  80.11  90.52 
t=2  1.996  62.42  82.75  91.59  
t=3  1.905  63.31  83.11  92.28  
t=4  1.919  63.73  82.88  91.64 
In addition, we show quantitative experimental results of the evolutionary mechanism on Morph and MegaAgeAsian in Table 5. We observe that the performance of all the network architectures increases through evolution. For example, after two evolutions (from $t=1$ to $t=3$), the CA(7) for CEN(ResNet10Tiny), CEN(ResNet10), CEN(ResNet18Tiny) and CEN(ResNet18) on MegaAgeAsian improves from 90.64%, 89.39%, 91.34% and 90.52% to 91.84%, 91.34%, 91.70% and 92.28%, respectively. This demonstrates the superiority of the proposed evolutionary mechanism. Specifically, there is a significant improvement from the first evolution ($t=1$) to the second evolution ($t=2$), which is mainly due to the additional employment of the Kullback-Leibler (KL) divergence and the slack term. We also observe that the best results are achieved in the third ($t=3$) or fourth ($t=4$) evolution, indicating that the boosting saturates in the evolutionary procedure.
Additional visualization results of the evolutionary age label distributions on Morph and MegaAgeAsian are presented in Figure 4 and Figure 5.
Ablation Study
In this section, we explore the influences of three hyperparameters , and for CEN. All the ablation studies are trained on Morph with ResNet18 model.
Influence of Temperature Parameter $T$.
The temperature parameter $T$ plays an important role in the age distribution estimation. Figure 3 provides a schematic illustration of the influence of $T$. In Figure 3(a), from left to right, each column presents the age label distributions under a different value of $T$. We observe that $T=2$ works better in our CEN than lower or higher temperatures. To be specific, when $T=1$, the negative logits are mostly ignored, even though they may convey useful information about the knowledge from the ancestor CEN, while a larger $T$ would suppress the probability of the peak in the age label distribution, which is misleading during optimization.
Besides, we quantitatively compare the MAE on Morph with different $T$. Specifically, we fix $\lambda$ to 0.5, $\beta$ to 4 and report results with $T$ ranging from 1 to 4 in Table 6. When $T=2$, we obtain the best MAE of 1.905. Thus, we use $T=2$ in our experiments.
Hyperparam ($T$  $\lambda$  $\beta$)  Morph  Hyperparam ($T$  $\lambda$  $\beta$)  Morph  Hyperparam ($T$  $\lambda$  $\beta$)  Morph
  MAE    MAE    MAE
1  0.5  4  2.096  2  0.25  4  1.946  2  0.5  1  1.965
2  0.5  4  1.905  2  0.50  4  1.905  2  0.5  2  1.962
3  0.5  4  1.941  2  0.75  4  1.921  2  0.5  3  1.922
4  0.5  4  1.970  2  1.00  4  1.952  2  0.5  4  1.905
                2  0.5  5  1.933
Influence of Hyperparameter $\lambda$.
We use the hyperparameter $\lambda$ to balance the importance of the cross entropy and Kullback-Leibler (KL) divergence losses in evolutionary label distribution learning. We fix $T$ to 2, $\beta$ to 4 and report results with $\lambda$ from 0.25 to 1.00 in Table 6. When $\lambda = 0.5$, we obtain the best result, which indicates that the cross entropy loss and the KL divergence loss are equally important in our method.
Influence of Hyperparameter $\beta$.
We use the hyperparameter $\beta$ to balance the importance of evolutionary label distribution learning and evolutionary slack regression in our CEN. We fix $T$ to 2, $\lambda$ to 0.5 and report results with $\beta$ from 1 to 5 in Table 6. We can see that when $\beta = 4$, CEN performs the best.
Conclusion
In this paper, we propose a Coupled Evolutionary Network (CEN) for age estimation, which contains two concurrent processes: evolutionary label distribution learning and evolutionary slack regression. The former adaptively learns and refines the age label distributions in an evolutionary manner, without making strong assumptions about the distribution form. The latter concentrates on the ordered and continuous information of age labels, converting the discrete age label regression to continuous age interval regression. Experimental results on the Morph, ChaLearn15 and MegaAgeAsian datasets show the superiority of CEN.
References

[Agustsson, Timofte, and Van Gool2017] Agustsson, E.; Timofte, R.; and Van Gool, L. 2017. Anchored regression networks applied to age estimation and super resolution. In ICCV, 1652–1661.
 [Can Malli, Aygun, and Kemal Ekenel2016] Can Malli, R.; Aygun, M.; and Kemal Ekenel, H. 2016. Apparent age estimation using ensemble of deep learning models. In CVPRW, 9–16.
 [Chen et al.2017] Chen, S.; Zhang, C.; Dong, M.; Le, J.; and Rao, M. 2017. Using rankingcnn for age estimation. In CVPR, 742–751.
 [Escalera et al.2015] Escalera, S.; Fabian, J.; Pardo, P.; Baró, X.; Gonzalez, J.; Escalante, H. J.; Misevic, D.; Steiner, U.; and Guyon, I. 2015. Chalearn looking at people 2015: Apparent age and cultural event recognition datasets and results. In ICCVW, 1–9.
 [Gao et al.2017] Gao, B.B.; Xing, C.; Xie, C.W.; Wu, J.; and Geng, X. 2017. Deep label distribution learning with label ambiguity. IEEE TIP 26(6):2825–2838.
 [Gao et al.2018] Gao, B.B.; Zhou, H.Y.; Wu, J.; and Geng, X. 2018. Age estimation using expectation of label distribution learning. In IJCAI, 712–718.
 [Geng and Ji2013] Geng, X., and Ji, R. 2013. Label distribution learning. In ICDMW, 377–383.
 [Geng, Yin, and Zhou2013] Geng, X.; Yin, C.; and Zhou, Z.H. 2013. Facial age estimation by learning from label distributions. IEEE TPAMI 35(10):2401–2412.
 [He et al.2016] He, K.; Zhang, X.; Ren, S.; and Sun, J. 2016. Deep residual learning for image recognition. In CVPR, 770–778.
 [Huang et al.2017] Huang, G.; Liu, Z.; Van Der Maaten, L.; and Weinberger, K. Q. 2017. Densely connected convolutional networks. In CVPR, 2261–2269.
 [Niu et al.2016] Niu, Z.; Zhou, M.; Wang, L.; Gao, X.; and Hua, G. 2016. Ordinal regression with multiple output cnn for age estimation. In CVPR, 4920–4928.

[Pan et al.2018] Pan, H.; Han, H.; Shan, S.; and Chen, X. 2018. Mean-variance loss for deep age estimation from a face. In CVPR, 5285–5294.
 [Ricanek and Tesafaye2006] Ricanek, K., and Tesafaye, T. 2006. Morph: A longitudinal image database of normal adult age-progression. In FG, 341–345.
 [Rothe and etal.2016] Rothe, R., and etal. 2016. Dex: Deep expectation of apparent age from a single image. In ICCVW, 252–257.
 [Rothe, Timofte, and Van Gool2015] Rothe, R.; Timofte, R.; and Van Gool, L. 2015. Dex: Deep expectation of apparent age from a single image. In ICCVW, 10–15.
 [Shen et al.2017] Shen, W.; Guo, Y.; Wang, Y.; Zhao, K.; Wang, B.; and Yuille, A. 2017. Deep regression forests for age estimation. arXiv.
 [sim2014] Simonyan, K., and Zisserman, A. 2014. Very deep convolutional networks for large-scale image recognition. arXiv.
 [Wu et al.2018] Wu, X.; He, R.; Sun, Z.; and Tan, T. 2018. A light cnn for deep face representation with noisy labels. IEEE TIFS 13(11):2884–2896.
 [Yang et al.2015] Yang, X.; Gao, B.B.; Xing, C.; Huo, Z.W.; Wei, X.S.; Zhou, Y.; Wu, J.; and Geng, X. 2015. Deep label distribution learning for apparent age estimation. In ICCVW, 102–108.
 [Yang et al.2018] Yang, T.Y.; Huang, Y.H.; Lin, Y.Y.; Hsiu, P.C.; and Chuang, Y.Y. 2018. Ssrnet: A compact soft stagewise regression network for age estimation. In IJCAI, 1078–1084.
 [Yang, Geng, and Zhou2016] Yang, X.; Geng, X.; and Zhou, D. 2016. Sparsity conditional energy label distribution learning for age estimation. In IJCAI, 2259–2265.

[Zhang et al.2016] Zhang, K.; Zhang, Z.; Li, Z.; and Qiao, Y. 2016. Joint face detection and alignment using multitask cascaded convolutional networks. IEEE SPL 23(10):1499–1503.
 [Zhang et al.2017] Zhang, Y.; Liu, L.; Li, C.; et al. 2017. Quantifying facial age by posterior of age comparisons. arXiv.
 [Zhang, Wang, and Geng2015] Zhang, Z.; Wang, M.; and Geng, X. 2015. Crowd counting in public video surveillance by label distribution learning. NLM 166:151–163.
 [Zhou, Xue, and Geng2015] Zhou, Y.; Xue, H.; and Geng, X. 2015. Emotion distribution recognition from facial expressions. In ACM MM, 1247–1250.