A Coupled Evolutionary Network for Age Estimation

09/20/2018
by   Peipei Li, et al.

Age estimation of unknown persons is a challenging pattern analysis task due to the lack of sufficient training data and the varied aging mechanisms of different people. Label distribution learning-based methods usually make distribution assumptions to simplify age estimation. However, age label distributions are often complex and difficult to model parametrically. Inspired by the biological evolutionary mechanism, we propose a Coupled Evolutionary Network (CEN) with two concurrent evolutionary processes: evolutionary label distribution learning and evolutionary slack regression. Evolutionary label distribution learning adaptively learns and constantly refines the age label distributions without making strong assumptions on the distribution patterns. To further exploit the ordered and continuous information of age labels, we propose an evolutionary slack regression that converts the discrete age label regression into continuous age interval regression. Experimental results on the Morph, ChaLearn15 and MegaAge-Asian datasets show the superiority of our method.

Introduction

Figure 1: Different label distribution assumptions for age estimation. (a) The aging speed of the young and the old is faster than that of the middle-aged. (b) Assume that the age label distributions are Gaussian with a standard deviation that is the same for all ages. (c) Assume that the age label distributions are Gaussian with a standard deviation that differs across ages. (d) Distributions learnt by the proposed CEN.

Age estimation has attracted much attention in many real-world applications such as video surveillance, product recommendation and internet safety for minors. It aims to label a given face image with an exact age or age group. Impressive progress has been made on age estimation in the last several decades and many methods [Niu et al.2016, Gao et al.2017, Chen et al.2017, Shen et al.2017, Agustsson, Timofte, and Van Gool2017, Yang et al.2018] have been proposed. However, large-scale age estimation is still a very challenging problem for several reasons. 1) Large variations within the datasets, including illumination, pose and expression, affect the accuracy of age estimation. 2) Different people age in different ways, so the mapping from age-related features to age labels is not unique. 3) Age estimation is a fine-grained recognition task, and it is almost impossible for humans to discriminate age accurately.

Existing models for age estimation can be roughly divided into four categories: regression models [Shen et al.2017, Agustsson, Timofte, and Van Gool2017], multi-class classification models [Rothe, Timofte, and Van Gool2015, Yang et al.2018], Ranking CNN models [Niu et al.2016, Chen et al.2017] and label distribution learning models [Gao et al.2017, Gao et al.2018]. By predicting the age distribution, label distribution learning (LDL) has the potential benefit of capturing the relevance and uncertainty among different ages. Besides, label distribution learning improves data utilization, because a given face image provides age-related information not only about the chronological age but also about its neighboring ages.

We believe that label distribution learning faces two major challenges. First, we argue that the age label distributions vary across individuals, and it is better not to assume their distribution forms as in [Yang et al.2015, Gao et al.2018]. Figure 1 depicts this in detail: Figure 1(a) shows that aging tendencies differ across individuals. Thus it is unreasonable to assume that the age label distributions for all ages obey Gaussian distributions with the same standard deviation, as Figure 1(b) shows, or with different deviations, as Figure 1(c) shows. The second challenge is that label distribution learning is essentially a discrete learning process that ignores the ordered information of age labels, while the change of age is an ordered and continuous process.

To address the first challenge, we propose evolutionary label distribution learning, a solution that uses a neural network to adaptively learn label distributions from the given individuals and constantly refine the learning results during evolution. Figure 1(d) shows the learnt distributions: the age label distributions vary across individuals and do not strictly obey a Gaussian distribution. For the second challenge, we propose a coupled training mechanism that jointly performs label distribution learning and regression. The regression model captures the ordered and continuous information of age labels and regresses an age value, which relieves the second challenge. Besides, a slack term is designed to further convert the discrete age label regression into continuous age interval regression.

The main contributions of this work are as follows:

1) By simulating evolutionary mechanisms, we propose a Coupled Evolutionary Network (CEN) with two concurrent processes: evolutionary label distribution learning and evolutionary slack regression.

2) The proposed evolutionary label distribution learning adaptively estimates the age distributions without strong assumptions about the form of the label distribution. Benefiting from the constant evolution of the learning results, it generates more precise label distributions.

3) The experiments show that combining label distribution learning and regression achieves superior performance. Hence, we propose evolutionary slack regression to assist evolutionary label distribution learning. Besides, we introduce a slack term to further convert the discrete age label regression into continuous age interval regression.

4) We evaluate the effectiveness of the proposed CEN on three age estimation benchmarks and consistently obtain the state-of-the-art results.

Related Work

Age Estimation

Benefiting from deep CNNs (e.g., VGG-16 [sim2014], LightCNN [Wu et al.2018], ResNet [He et al.2016] and DenseNet [Huang et al.2017]) trained on large-scale face age datasets, deep learning based age estimation methods achieve state-of-the-art performance. They can be roughly divided into four categories: regression [Shen et al.2017, Agustsson, Timofte, and Van Gool2017], multi-class classification [Rothe, Timofte, and Van Gool2015, Can Malli, Aygun, and Kemal Ekenel2016, Yang et al.2018], Ranking CNN [Niu et al.2016, Chen et al.2017] and label distribution learning (LDL) [Gao et al.2017, Gao et al.2018].

With the huge improvement in the performance of object recognition, some researchers propose to transform age estimation into a multi-class classification problem, in which different ages or age groups are regarded as independent classes. However, multi-class classification methods usually neglect the relevance and uncertainty among neighboring labels. Since age is a continuous value, a natural idea for better fitting the aging mechanism is to treat age estimation as a regression task. However, due to the presence of outliers, regression methods cannot achieve satisfactory results either. Moreover, the speed at which appearance changes differs across ages. To alleviate these issues, ranking CNN and LDL methods have been proposed, which adopt an individual classifier or a label distribution for each age class. In this paper, we employ an LDL-based method assisted by regression.

Figure 2: Overview of the proposed Coupled Evolutionary Network for age estimation. The initial ancestor network takes the given instance as the input and produces the initial age label distribution as well as the initial regressed age. The offspring network inherits the experience and knowledge of its ancestor to boost itself.

Label Distribution Learning

Label ambiguity and redundancy hinder improvements in object recognition and classification performance. Label distribution learning (LDL) [Geng and Ji2013, Geng, Yin, and Zhou2013] addresses this problem by learning the distribution over labels from the description of the instance. LDL has been widely used in many applications, such as expression recognition [Zhou, Xue, and Geng2015], public video surveillance [Zhang, Wang, and Geng2015] and age estimation [Geng, Yin, and Zhou2013, Yang, Geng, and Zhou2016, Gao et al.2017, Gao et al.2018]. [Geng, Yin, and Zhou2013] deals with age estimation by learning the age label distribution. [Gao et al.2018] shows that ranking methods implicitly learn label distributions and assumes that the age label distribution follows a Gaussian with a fixed standard deviation. However, since the age characteristics differ across ages, the label distributions cannot be identical for all ages. To deal with this, we propose a neural network model that learns the mapping from a given image to its age label distribution.

Our Approach

In this section, we first state the problem definition. Then we describe the two components of the proposed coupled evolutionary network (CEN). Finally, we detail the training and testing procedures, followed by the network architecture.

Problem Formulation

In the setting of CEN, we define $\mathcal{Y} = \{y_{min}, y_{min}+1, \ldots, y_{max}\}$ as the ages of the training set, where $y_{min}$ and $y_{max}$ are the minimal and maximum ages, respectively. Suppose $\mathcal{D} = \{(x, y)\}$ is the training set, where we omit the instance indices for simplification. Here $x$ denotes the input instance and $y \in \mathcal{Y}$ is the age of $x$. $l$ represents the corresponding one-hot vector of $y$, and $\tilde{y}$ denotes the normalized age label, which is formulated as:

$\tilde{y} = \frac{y - y_{min}}{y_{max} - y_{min}}$    (1)

We are interested in learning a mapping from the instance $x$ to its accurate age $y$.

Inspired by the biological evolutionary mechanism, we propose a coupled evolutionary network (CEN) with two concurrent processes: evolutionary label distribution learning and evolutionary slack regression. The overall framework of CEN is depicted in Figure 2. We first obtain an initial ancestor CEN, denoted $E^{(1)}$. Then, with the experience and knowledge transferred by the ancestor CEN, the offspring CEN incrementally evolves itself to achieve better performance. After each evolution, the offspring CEN is treated as the new ancestor CEN for the next evolution. The predicted age is obtained only with the last CEN.

Evolutionary Label Distribution Learning

Previous research usually makes strong assumptions on the form of the label distributions, which may not truly and flexibly reflect reality. We address this problem by introducing evolutionary label distribution learning, a solution that uses a neural network to adaptively learn and constantly refine the age label distributions during evolution.

The initial ancestor CEN $E^{(1)}$ takes the given instance $x$ as input and learns to predict the age label distribution of $x$. Then the offspring CEN $E^{(2)}$ inherits all the age label distributions from its ancestor and updates itself over the entire training set $\mathcal{D}$. After each evolution, the offspring CEN $E^{(t)}$ is treated as the new ancestor for the next CEN $E^{(t+1)}$.

The Initial Ancestor

We first utilize the initial ancestor coupled evolutionary network $E^{(1)}$ to adaptively learn the initial age label distributions. Specifically, given an input instance $x$, $E^{(1)}$ learns the mapping from $x$ to the logits $z$ by:

$z = W^{T} h + b$    (2)

where $h$ is the output of the last pooling layer of $E^{(1)}$, and $W$ and $b$ are the weights and biases of a fully connected layer, respectively.

The predicted age label distribution $p$ can be formulated as:

$p_i = \frac{\exp(z_i / T)}{\sum_j \exp(z_j / T)}$    (3)

where $T$ is the temperature parameter, dominating the softness of the predicted distribution: the larger $T$, the softer the obtained distribution. We set $T = 1$ and employ cross entropy as the supervised signal to learn the initial ancestor for evolutionary label distribution learning:

$\mathcal{L}_{ce} = -\sum_i l_i \log p_i$    (4)

where $l_i$ denotes the $i$-th element of the one-hot vector $l$.

The goal of the initial ancestor for label distribution learning is to minimize the cross entropy loss. The predicted label distribution will be transferred to the offspring network $E^{(2)}$.
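To make this concrete, here is a minimal PyTorch sketch of Eqs. (3) and (4); the function names and the index form of the one-hot label are our own illustration:

import torch.nn.functional as F

def label_distribution(logits, T=1.0):
    # Eq. (3): softmax over the age logits with temperature T.
    return F.softmax(logits / T, dim=1)

def ce_loss(logits, target_idx, T=1.0):
    # Eq. (4): cross entropy against the one-hot age label, passed here
    # as a class index; T = 1 for the initial ancestor.
    return F.nll_loss(F.log_softmax(logits / T, dim=1), target_idx)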

The Evolutionary Procedure

After the first evolution, we obtain the preliminary age label distributions without making strong assumptions about the form of the distribution. The preliminary age label distributions then act as new experience and knowledge to be transferred to the next evolution.

In the $t$-th evolution, where $t \ge 2$, the predicted age label distribution $p^{(t)}$ of $x$ is calculated by Eq. (3). We set $T = 2$ and employ the Kullback-Leibler (KL) divergence to transfer the age label distribution from the $(t-1)$-th evolution to the current one:

$\mathcal{L}_{kl} = \sum_i p_i^{(t-1)} \log \frac{p_i^{(t-1)}}{p_i^{(t)}}$    (5)

Since $\sum_i p_i^{(t-1)} \log p_i^{(t-1)}$ is a constant, Eq. (5) can be further simplified as follows:

$\mathcal{L}_{kl} = -\sum_i p_i^{(t-1)} \log p_i^{(t)}$    (6)

It is worth noting that there is a discrepancy between the real label distribution and the predicted label distribution of $E^{(t-1)}$. Using only Eq. (6) in the evolutionary procedure may therefore yield inferior performance. Consequently, we employ an additional cross entropy term to rectify this discrepancy.

The final supervision for the evolutionary procedure contains both the predicted age label distributions and the target age labels, which can be formulated as:

$\mathcal{L}_{ld} = \lambda \mathcal{L}_{kl} + (1 - \lambda) \mathcal{L}_{ce}$    (7)

where $\lambda$ is the trade-off parameter to balance the importance of the KL loss and the cross entropy loss.
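A possible PyTorch sketch of the evolutionary supervision in Eq. (7), assuming the weighted form $\lambda \mathcal{L}_{kl} + (1-\lambda) \mathcal{L}_{ce}$ and that the cross entropy term shares the temperature $T$ (the text does not pin this down):

def evolution_ld_loss(logits, anc_logits, target_idx, T=2.0, lam=0.5):
    # Eqs. (5)-(7): KL transfer from the frozen ancestor plus a
    # rectifying cross entropy term.
    log_p = F.log_softmax(logits / T, dim=1)
    p_anc = F.softmax(anc_logits.detach() / T, dim=1)   # ancestor is not updated
    kl = F.kl_div(log_p, p_anc, reduction="batchmean")  # Eqs. (5)/(6)
    ce = F.nll_loss(log_p, target_idx)                  # rectification term
    return lam * kl + (1.0 - lam) * ce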

Evolutionary Slack Regression

Evolutionary label distribution learning is essentially a discrete learning process that does not consider the ordered information of age labels. However, the change of age is an ordered and continuous process. Accordingly, we propose a new regression method, named evolutionary slack regression, to transfer the ordered and continuous age information of the previous evolution to the current evolution. Specifically, a slack term is introduced into evolutionary slack regression, which converts the discrete age label regression into continuous age interval regression.

The initial ancestor CEN $E^{(1)}$ takes the given instance as input and produces a roughly regressed age. Then, the absolute difference between the regressed age and the ground-truth age is treated as knowledge to be inherited by the offspring CEN $E^{(2)}$. Similarly, after each evolution, the offspring CEN is treated as the new ancestor for the next evolution.

The Initial Ancestor

For regression, $E^{(1)}$ learns the mapping from the given instance $x$ to a real value $\hat{y}$:

$\hat{y} = w^{T} h + b'$    (8)

where $w$ and $b'$ are the weights and biases of a fully connected layer, respectively.

We train the initial ancestor $E^{(1)}$ with an $L_1$ loss to minimize the distance between the regressed age $\hat{y}$ and the normalized ground-truth age $\tilde{y}$:

$\mathcal{L}_{reg} = \left\| \hat{y} - \tilde{y} \right\|_1$    (9)

The Evolutionary Procedure

We observe that Eq. (9) is essentially a discrete regression process, because the target age is a discrete value. In order to deliver the ordered and continuous age information of the ancestor CEN $E^{(t-1)}$ to the offspring CEN $E^{(t)}$, we introduce a slack term $\epsilon^{(t-1)}$ into the regression of $E^{(t)}$, which is defined as follows:

$\epsilon^{(t-1)} = \left\| \hat{y}^{(t-1)} - \tilde{y} \right\|_1$    (10)

We assume that $E^{(t)}$ is superior to $E^{(t-1)}$, which means the regression error of $E^{(t)}$ should not exceed $\epsilon^{(t-1)}$:

$\left\| \hat{y}^{(t)} - \tilde{y} \right\|_1 \le \epsilon^{(t-1)}$    (11)

Eq. (11) can be rewritten as:

$-\epsilon^{(t-1)} \le \hat{y}^{(t)} - \tilde{y} \le \epsilon^{(t-1)}$    (12)

Above all, we define a slack loss as follows:

$\mathcal{L}_{sr} = \max\left(0, \left\| \hat{y}^{(t)} - \tilde{y} \right\|_1 - \epsilon^{(t-1)}\right)$    (13)

Eq. (13) pushes the regressed age of $E^{(t)}$ to lie in a continuous age interval $[\tilde{y} - \epsilon^{(t-1)}, \tilde{y} + \epsilon^{(t-1)}]$ rather than to strictly equal a discrete age label. From this perspective, by introducing the slack term into the regression, we convert the discrete age label regression into continuous age interval regression.

At each evolution, we minimize the slack loss and find that $\epsilon^{(t)}$ gradually decreases, i.e., the age interval tightens as the evolution proceeds.
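The slack loss of Eq. (13) amounts to a hinge on the ancestor's absolute error; a minimal sketch, assuming normalized age targets as in Eq. (9):

def slack_loss(y_reg, anc_y_reg, y_norm):
    # Eq. (10): the ancestor's absolute error defines the slack (ancestor frozen).
    eps = (anc_y_reg.detach() - y_norm).abs()
    # Eq. (13): penalize only the part of the offspring's error exceeding eps.
    return F.relu((y_reg - y_norm).abs() - eps).mean()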

Training Framework

The training procedure of CEN contains both evolutionary label distribution learning and evolutionary slack regression. It can be divided into two parts: the initial ancestor and the evolutionary procedure.

The total supervised loss for the initial ancestor is:

$\mathcal{L}^{(1)} = \mathcal{L}_{ce} + \beta \mathcal{L}_{reg}$    (14)

where $\beta$ is the trade-off parameter to balance the importance of the initial label distribution learning and the regression.

The total supervised loss for the evolutionary procedure is:

$\mathcal{L}^{(t)} = \mathcal{L}_{ld} + \beta \mathcal{L}_{sr}$    (15)

where $t \ge 2$ and $\beta$ is the trade-off parameter to balance the importance of evolutionary label distribution learning and the slack regression.
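Combining the loss sketches above (and relying on the helpers defined there), the per-batch training objective might look as follows; the `anc_*` tensors come from a frozen forward pass of the ancestor and are unused when t = 1:

def cen_loss(t, logits, y_reg, anc_logits, anc_y_reg, target_idx, y_norm,
             T=2.0, lam=0.5, beta=4.0):
    if t == 1:
        ce = F.nll_loss(F.log_softmax(logits, dim=1), target_idx)  # Eq. (4)
        reg = (y_reg - y_norm).abs().mean()                        # Eq. (9)
        return ce + beta * reg                                     # Eq. (14)
    return (evolution_ld_loss(logits, anc_logits, target_idx, T, lam)
            + beta * slack_loss(y_reg, anc_y_reg, y_norm))         # Eq. (15)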

Age Estimation in Testing

In the testing phase, for a given instance, we use $\hat{y}_{ld}$ to denote the estimated age from evolutionary label distribution learning, which can be written as:

$\hat{y}_{ld} = \sum_i y_i \, p_i$    (16)

The estimated age from evolutionary slack regression can be formulated as:

$\hat{y}_{sr} = \hat{y} \cdot (y_{max} - y_{min}) + y_{min}$    (17)

where $y_{min}$ and $y_{max}$ are the minimal and maximum ages of the training set, respectively.

Then, the final estimated age is the average of the above two results:

$\hat{y}_{final} = (\hat{y}_{ld} + \hat{y}_{sr}) / 2$    (18)
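In code, the test-time fusion of Eqs. (16)-(18) could read as follows; applying no temperature at test time is our assumption:

import torch
import torch.nn.functional as F

@torch.no_grad()
def predict_age(model, image, y_min, y_max):
    logits, y_reg = model(image)                          # the two CEN heads
    ages = torch.arange(y_min, y_max + 1,
                        dtype=logits.dtype, device=logits.device)
    y_ld = (F.softmax(logits, dim=1) * ages).sum(dim=1)   # Eq. (16)
    y_sr = y_reg.squeeze(1) * (y_max - y_min) + y_min     # Eq. (17)
    return 0.5 * (y_ld + y_sr)                            # Eq. (18)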

Network Architecture

Figure 3: The evolution of the age label distributions with different temperature parameters $T$, where $t$ denotes the $t$-th evolution. For the given instance, the first, second and third rows are the predicted age label distributions of $E^{(1)}$, $E^{(2)}$ and $E^{(3)}$, respectively.

ResNet10 and ResNet18 [He et al.2016] are adopted as the backbone networks of the proposed method. In particular, two fully connected layers are inserted immediately after the last pooling layer, for evolutionary label distribution learning and evolutionary slack regression respectively. Considering the size and efficiency of ResNet10 and ResNet18, we further halve the number of feature channels to obtain two tiny variants, named ResNet10-Tiny and ResNet18-Tiny. The details are listed in Table 7.
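A minimal sketch of this two-headed design on a torchvision ResNet-18 (the exact layer sizes of the Tiny variants follow Table 7 and are not reproduced here; 62 age classes, i.e., ages 16 to 77 on Morph, is an illustrative assumption):

import torch.nn as nn
from torchvision.models import resnet18

class CEN(nn.Module):
    def __init__(self, num_ages=62):
        super().__init__()
        net = resnet18()
        # everything up to and including the last pooling layer
        self.features = nn.Sequential(*list(net.children())[:-1])
        self.fc_ld = nn.Linear(512, num_ages)  # logits z for Eq. (2)
        self.fc_reg = nn.Linear(512, 1)        # regressed age for Eq. (8)

    def forward(self, x):
        h = self.features(x).flatten(1)        # h: last-pooling-layer output
        return self.fc_ld(h), self.fc_reg(h)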

Experiments

Dataset and Protocol

We evaluate the proposed CEN on both apparent age and real age datasets.

IMDB-WIKI [Rothe, Timofte, and Van Gool2015] is the largest publicly available dataset of facial images with age and gender labels. It consists of 523,051 facial images in total: 460,723 images from IMDB and 62,328 from Wikipedia. The ages of the IMDB-WIKI dataset range from 0 to 100 years. Although it is the largest dataset for age estimation, IMDB-WIKI is still not suitable for evaluation because it contains considerable noise. Thus, like most previous works [Yang et al.2018], we use IMDB-WIKI only for pre-training.

ChaLearn15 [Escalera et al.2015] is the first dataset for apparent age estimation. It contains 4,691 color images: 2,476 for training, 1,136 for validation and the remaining 1,087 for testing. ChaLearn15 comes from the first competition track of ChaLearn LAP 2015, and each image is labeled via an online voting platform. We follow the protocol in [Rothe and etal.2016] to train on the training set and evaluate on the validation set.

Morph [Ricanek and Tesafaye2006] is the most popular benchmark for real age estimation. It contains 55,134 color images of 13,617 subjects with age and gender information. The ages in Morph range from 16 to 77 years, with four images per subject on average. The classical 80-20 split protocol is used for Morph.

MegaAge-Asian [Zhang et al.2017] is a newly released large-scale facial age dataset. Unlike most facial age datasets, which mainly contain faces of Westerners, MegaAge-Asian contains only faces of Asians. It consists of 40,000 images encompassing ages from 0 to 70. Following [Zhang et al.2017], we reserve 3,945 images for testing.

Evaluation Metric

We evaluate the performance of the proposed CEN with MAE, $\epsilon$-error and CA(n).

Mean Absolute Error (MAE) is widely used to evaluate the performance of age estimation. It is defined as the average distance between the ground-truth and predicted ages:

$MAE = \frac{1}{N} \sum_{n=1}^{N} \left| \hat{y}_n - y_n \right|$    (19)

where $\hat{y}_n$ and $y_n$ denote the predicted age and the ground-truth age of the $n$-th testing instance, respectively.

$\epsilon$-error is the evaluation metric for apparent age estimation, which can be formulated as:

$\epsilon = 1 - \exp\left(-\frac{(\hat{y} - \mu)^2}{2\sigma^2}\right)$    (20)

where $\hat{y}$, $\mu$ and $\sigma$ denote the predicted age, the mean age and the standard deviation of the annotations of a testing instance, respectively; the reported $\epsilon$-error is averaged over the testing set.

Cumulative Accuracy (CA) is employed as the evaluation metric for MegaAge-Asian, which can be calculated as:

$CA(n) = \frac{K_n}{K} \times 100\%$    (21)

where $K$ is the total number of testing images and $K_n$ is the number of testing images whose absolute estimation error is smaller than $n$. We report CA(3), CA(5) and CA(7) as in [Zhang et al.2017, Yang et al.2018].
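The three metrics are straightforward to compute; a NumPy sketch:

import numpy as np

def mae(pred, gt):
    # Eq. (19): mean absolute error.
    return np.mean(np.abs(np.asarray(pred) - np.asarray(gt)))

def eps_error(pred, mu, sigma):
    # Eq. (20), averaged over the testing set.
    pred, mu, sigma = map(np.asarray, (pred, mu, sigma))
    return np.mean(1.0 - np.exp(-(pred - mu) ** 2 / (2.0 * sigma ** 2)))

def ca(pred, gt, n):
    # Eq. (21): percentage of images with absolute error below n.
    return 100.0 * np.mean(np.abs(np.asarray(pred) - np.asarray(gt)) < n)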

Implementation Details

Pre-processing.

We utilize the multi-task cascaded CNN [Zhang et al.2016] to detect and align face images. All images are then resized to 224×224 as inputs. Besides, data augmentation is important for deep neural networks in age estimation. We augment the training data by: (a) random resized cropping with an aspect ratio from 0.8 to 1.25 and a scale from 0.8 to 1.0; (b) random horizontal flipping with a probability of 0.5.
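With torchvision, the described augmentation could be written as the following sketch (normalization statistics are omitted because they are not specified here):

from torchvision import transforms

train_transform = transforms.Compose([
    transforms.RandomResizedCrop(224, scale=(0.8, 1.0), ratio=(0.8, 1.25)),
    transforms.RandomHorizontalFlip(p=0.5),
    transforms.ToTensor(),
])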

Training Details.

All the network architectures used in CEN are pretrained on the IMDB-WIKI dataset with Eq. (14). We employ the SGD optimizer and set the initial learning rate, momentum and weight decay to 0.01, 0.9 and 1e-4, respectively. The learning rate is decreased by a factor of 10 every 40 epochs, and each model is trained for 160 epochs in total with a mini-batch size of 128. The pre-trained models are then used as initializations on the target age datasets, including ChaLearn15, Morph and MegaAge-Asian. All networks are optimized by SGD with the initial learning rate, momentum and weight decay set to 0.001, 0.9 and 1e-4, respectively. Unless otherwise specified, we employ $T = 2$, $\lambda = 0.5$ and $\beta = 4$ in our experiments. The learning rate is decreased by a factor of 10 every 40 epochs, and each model is trained for 160 epochs with a mini-batch size of 128.
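The fine-tuning schedule above maps to standard PyTorch components; a sketch, with the model assumed to be the two-headed network sketched in the Network Architecture section:

import torch

model = CEN()  # the CEN module sketched earlier
optimizer = torch.optim.SGD(model.parameters(), lr=0.001,
                            momentum=0.9, weight_decay=1e-4)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=40, gamma=0.1)

for epoch in range(160):
    # ... iterate over mini-batches of size 128 and optimize cen_loss ...
    scheduler.step()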

Analysis of Coupled Training Mechanism

In this subsection, we explore the coupled training mechanism of label distribution learning and regression. Table 1 shows the comparison results. The first and second rows are the baseline results of using only label distribution learning (LDL) and only regression (Reg), respectively. The last three rows present the coupled training performance (LDL+Reg). Specifically, with the coupled training mechanism, $\hat{y}_{ld}$, $\hat{y}_{sr}$ and $\hat{y}_{final}$ are calculated by Eq. (16), Eq. (17) and Eq. (18), respectively, denoting the outputs of label distribution learning, regression and the average of the two.

Methods                      Morph MAE   MegaAge-Asian
                                         CA(3)   CA(5)   CA(7)
Reg                          2.578       58.22   79.01   89.03
LDL                          2.323       59.14   78.70   89.26
LDL+Reg ($\hat{y}_{ld}$)     2.243       60.57   79.77   90.21
LDL+Reg ($\hat{y}_{sr}$)     2.231       59.14   79.24   89.62
LDL+Reg ($\hat{y}_{final}$)  2.220       60.83   80.11   90.52

Table 1: Comparison of using only label distribution learning (LDL) or regression (Reg) against coupled training (LDL+Reg) on Morph and MegaAge-Asian. Lower MAE is better; higher CA(n) is better. We employ ResNet-18 as the backbone. The unit of CA(n) is %.

Obviously, the proposed coupled training mechanism (LDL+Reg) achieves superior performance over training with LDL or Reg alone. For example, compared with Reg, LDL+Reg gains a 0.335 improvement in MAE on Morph. Averaging the label distribution learning and regression outputs further gains 0.023 and 0.011 improvements in MAE compared with $\hat{y}_{ld}$ and $\hat{y}_{sr}$, respectively. This indicates that the coupled training mechanism can significantly improve age estimation performance; we therefore use $\hat{y}_{final}$ as the age estimation result in the following experiments.

Comparisons with State-of-the-Arts

We compare the proposed CEN with previous state-of-the-art methods on the Morph, ChaLearn15 and MegaAge-Asian datasets. The proposed CEN performs best among almost all the compared methods.

Table 2 shows the MAEs of the individual methods on Morph. Benefiting from the adaptive learning of label distributions and the coupled evolutionary mechanism, our CEN based on ResNet-18 obtains an MAE of 1.905 on Morph and outperforms the previous state-of-the-art ThinAgeNet [Gao et al.2018].

Methods                       Pretrained     Morph MAE
OR-CNN [Niu et al.2016]       -              3.34
DEX [Rothe and etal.2016]     IMDB-WIKI      2.68
Ranking [Chen et al.2017]     Audience       2.96
Posterior [Zhang et al.2017]  IMDB-WIKI      2.52
DRFs [Shen et al.2017]        -              2.17
SSR-Net [Yang et al.2018]     IMDB-WIKI      2.52
M-V Loss [Pan et al.2018]     IMDB-WIKI      2.16
TinyAgeNet [Gao et al.2018]   MS-Celeb-1M*   2.291
ThinAgeNet [Gao et al.2018]   MS-Celeb-1M*   1.969
CEN(ResNet10-Tiny)            IMDB-WIKI      2.229
CEN(ResNet10)                 IMDB-WIKI      2.134
CEN(ResNet18-Tiny)            IMDB-WIKI      2.069
CEN(ResNet18)                 IMDB-WIKI      1.905

* Used partial data of the dataset.

Table 2: Comparisons with state-of-the-art methods on the Morph dataset. Lower MAE is better.

In addition to real age estimation, apparent age estimation is also important. We conduct experiments on ChaLearn15 to validate the performance of our method on apparent age estimation. Since there are only 2,476 training images in ChaLearn15, a huge network may lead to overfitting. Therefore, we choose ResNet10-Tiny, with 1.2M parameters, as the backbone for these evaluations. Table 3 shows the comparison results in terms of MAE and $\epsilon$-error. The proposed method sets a new state-of-the-art MAE of 3.052. Our $\epsilon$-error of 0.274 is also close to the best competition result of 0.272 (ThinAgeNet). Note that CEN(ResNet10-Tiny) has 1.2M parameters, fewer than the 3.7M of ThinAgeNet.

Methods                                         Pretrained     MAE     $\epsilon$-error   Param
DEX [Rothe and etal.2016]                       -              5.369   0.456              134.6M
DEX [Rothe and etal.2016]                       IMDB-WIKI      3.252   0.282              134.6M
ARN [Agustsson, Timofte, and Van Gool2017]      IMDB-WIKI      3.153   -                  134.6M
TinyAgeNet [Gao et al.2018]                     MS-Celeb-1M*   3.427   0.301              0.9M
ThinAgeNet [Gao et al.2018]                     MS-Celeb-1M*   3.135   0.272              3.7M
CEN(ResNet10-Tiny)                              IMDB-WIKI      3.052   0.274              1.2M

* Used partial data of the dataset.

Table 3: Comparisons with state-of-the-art methods on the ChaLearn15 dataset. Lower MAE and $\epsilon$-error are better.

Besides, we evaluate the performance of CEN on the MegaAge-Asian dataset, which contains only Asians. Table 4 reports the comparison results for CA(3), CA(5) and CA(7). Our CEN(ResNet18-Tiny) achieves 64.45%, 82.95% and 91.98%, which are new state-of-the-art results, obtaining improvements of 0.22%, 0.80% and 1.18% over the previous best method, Posterior [Zhang et al.2017] pretrained on MS-Celeb-1M.

Methods                        Pretrained    CA(3)   CA(5)   CA(7)
Posterior [Zhang et al.2017]   IMDB-WIKI     62.08   80.43   90.42
Posterior [Zhang et al.2017]   MS-Celeb-1M   64.23   82.15   90.80
MobileNet [Yang et al.2018]    IMDB-WIKI     44.0    60.6    -
DenseNet [Yang et al.2018]     IMDB-WIKI     51.7    69.4    -
SSR-Net [Yang et al.2018]      IMDB-WIKI     54.9    74.1    -
CEN(ResNet10-Tiny)             IMDB-WIKI     63.60   82.36   91.80
CEN(ResNet10)                  IMDB-WIKI     62.86   81.47   91.34
CEN(ResNet18-Tiny)             IMDB-WIKI     64.45   82.95   91.98
CEN(ResNet18)                  IMDB-WIKI     63.73   82.88   91.64

Table 4: Comparisons with state-of-the-art methods on the MegaAge-Asian dataset. The unit of CA(n) is %. Higher CA(n) is better.

The Superiority of Evolutionary Mechanism

In this subsection, we qualitatively and quantitatively demonstrate the superiority of the proposed evolutionary mechanism. Figure 3 depicts the evolution of the age label distributions. As shown in the second column of Figure 3(b), for the given instance, who is 45 years old, the first predicted distribution can be approximately regarded as a bimodal distribution with two peaks at 41 and 51, which is ambiguous for age estimation. After one evolution, the predicted distribution is refined from a bimodal to a unimodal distribution with a single peak at 48. After two evolutions, the peak of the unimodal distribution moves from 48 to 45, the true age of the input instance. This movement indicates the effectiveness of the additional cross entropy term in Eq. (7), which rectifies the discrepancy between the real and predicted label distributions. More results are shown in Figure 4 and Figure 5.

Backbones            Evolution   Morph MAE   MegaAge-Asian
                                             CA(3)   CA(5)   CA(7)
CEN(ResNet10-Tiny)   t=1         2.446       60.52   80.13   90.64
                     t=2         2.300       62.01   81.90   91.64
                     t=3         2.241       63.14   82.31   91.84
                     t=4         2.229       63.60   82.36   91.80
CEN(ResNet10)        t=1         2.321       59.57   79.44   89.39
                     t=2         2.207       61.91   81.18   91.16
                     t=3         2.150       62.86   81.47   91.34
                     t=4         2.134       62.78   81.77   91.00
CEN(ResNet18-Tiny)   t=1         2.304       61.88   81.31   91.34
                     t=2         2.136       63.57   82.00   91.46
                     t=3         2.069       64.52   82.03   91.70
                     t=4         2.074       64.45   82.95   91.98
CEN(ResNet18)        t=1         2.220       60.83   80.11   90.52
                     t=2         1.996       62.42   82.75   91.59
                     t=3         1.905       63.31   83.11   92.28
                     t=4         1.919       63.73   82.88   91.64

Table 5: The influence of the evolutionary mechanism. The first evolution (t=1) corresponds to the initial ancestor in CEN. The unit of CA(n) is %. Lower MAE is better; higher CA(n) is better.

In addition, we show quantitative experimental results of the evolutionary mechanism on Morph and MegaAge-Asian in Table 5. We observe that the performance of all the network architectures increases through evolution. For example, after two evolutions (from t=1 to t=3), the CA(7) for CEN(ResNet10-Tiny), CEN(ResNet10), CEN(ResNet18-Tiny) and CEN(ResNet18) on MegaAge-Asian improves from 90.64%, 89.39%, 91.34% and 90.52% to 91.84%, 91.34%, 91.70% and 92.28%, respectively. This demonstrates the superiority of the proposed evolutionary mechanism. Specifically, there is a significant improvement from the first evolution (t=1) to the second (t=2), mainly because of the additional Kullback-Leibler divergence and slack terms. We also observe that the best results are achieved at the third or fourth evolution, indicating that the boosting saturates during the evolutionary procedure.

Additional visualization results of the evolutionary age label distributions on Morph and MegaAge-Asian are presented in Figure 4 and Figure 5.

Ablation Study

In this section, we explore the influence of the three hyper-parameters $T$, $\lambda$ and $\beta$ of CEN. All ablation studies are trained on Morph with the ResNet18 model.

Influence of the Temperature Parameter $T$.

The temperature parameter $T$ plays an important role in the age distribution estimation. Figure 3 provides a schematic illustration of the influence of $T$. In Figure 3(a), from left to right, each column presents the age label distributions for $T = 1, 2, 3, 4$. We observe that $T = 2$ works better in our CEN than lower or higher temperatures. To be specific, when $T = 1$, the negative logits are mostly ignored, even though they may convey useful information about the knowledge of the ancestor CEN, while $T = 3$ or $T = 4$ suppresses the peak probability of the age label distribution, which is misleading during optimization.

Besides, we quantitatively compare the MAE on Morph with different $T$. Specifically, we fix $\lambda$ to 0.5 and $\beta$ to 4 and report results with $T$ ranging from 1 to 4 in Table 6. When $T = 2$, we obtain the best MAE of 1.905. Thus, we use $T = 2$ in our experiments.

$T$   $\lambda$   $\beta$   Morph MAE
1     0.5         4         2.096
2     0.5         4         1.905
3     0.5         4         1.941
4     0.5         4         1.970

$T$   $\lambda$   $\beta$   Morph MAE
2     0.25        4         1.946
2     0.50        4         1.905
2     0.75        4         1.921
2     1.00        4         1.952

$T$   $\lambda$   $\beta$   Morph MAE
2     0.5         1         1.965
2     0.5         2         1.962
2     0.5         3         1.922
2     0.5         4         1.905
2     0.5         5         1.933

Table 6: The influences of the hyper-parameters $T$ (top), $\lambda$ (middle) and $\beta$ (bottom) on Morph (MAE).

Influence of the Hyper-parameter $\lambda$.

We use the hyper-parameter $\lambda$ to balance the importance of the cross entropy and Kullback-Leibler (KL) divergence losses in evolutionary label distribution learning. We fix $T$ to 2 and $\beta$ to 4 and report results with $\lambda$ from 0.25 to 1.00 in Table 6. When $\lambda = 0.5$, we obtain the best result, which indicates that the cross entropy loss and the KL divergence loss are equally important in our method.

Influence of the Hyper-parameter $\beta$.

We use the hyper-parameter $\beta$ to balance the importance of evolutionary label distribution learning and evolutionary slack regression in our CEN. We fix $T$ to 2 and $\lambda$ to 0.5 and report results with $\beta$ from 1 to 5 in Table 6. CEN performs best when $\beta = 4$.

Conclusion

In this paper, we propose a Coupled Evolutionary Network (CEN) for age estimation, which contains two concurrent processes: evolutionary label distribution learning and evolutionary slack regression. The former adaptively learns and refines the age label distributions in an evolutionary manner, without making strong assumptions about the distribution patterns. The latter concentrates on the ordered and continuous information of age labels, converting the discrete age label regression into continuous age interval regression. Experimental results on the Morph, ChaLearn15 and MegaAge-Asian datasets show the superiority of CEN.

References

  • [Agustsson, Timofte, and Van Gool2017] Agustsson, E.; Timofte, R.; and Van Gool, L. 2017. Anchored regression networks applied to age estimation and super resolution. In ICCV, 1652–1661.
  • [Can Malli, Aygun, and Kemal Ekenel2016] Can Malli, R.; Aygun, M.; and Kemal Ekenel, H. 2016. Apparent age estimation using ensemble of deep learning models. In CVPRW, 9–16.
  • [Chen et al.2017] Chen, S.; Zhang, C.; Dong, M.; Le, J.; and Rao, M. 2017. Using ranking-cnn for age estimation. In CVPR, 742–751.
  • [Escalera et al.2015] Escalera, S.; Fabian, J.; Pardo, P.; Baró, X.; Gonzalez, J.; Escalante, H. J.; Misevic, D.; Steiner, U.; and Guyon, I. 2015. Chalearn looking at people 2015: Apparent age and cultural event recognition datasets and results. In ICCVW, 1–9.
  • [Gao et al.2017] Gao, B.-B.; Xing, C.; Xie, C.-W.; Wu, J.; and Geng, X. 2017. Deep label distribution learning with label ambiguity. IEEE TIP 26(6):2825–2838.
  • [Gao et al.2018] Gao, B.-B.; Zhou, H.-Y.; Wu, J.; and Geng, X. 2018. Age estimation using expectation of label distribution learning. In IJCAI, 712–718.
  • [Geng and Ji2013] Geng, X., and Ji, R. 2013. Label distribution learning. In ICDMW, 377–383.
  • [Geng, Yin, and Zhou2013] Geng, X.; Yin, C.; and Zhou, Z.-H. 2013. Facial age estimation by learning from label distributions. IEEE TPAMI 35(10):2401–2412.
  • [He et al.2016] He, K.; Zhang, X.; Ren, S.; and Sun, J. 2016. Deep residual learning for image recognition. In CVPR, 770–778.
  • [Huang et al.2017] Huang, G.; Liu, Z.; Van Der Maaten, L.; and Weinberger, K. Q. 2017. Densely connected convolutional networks. In CVPR, 2261–2269.
  • [Niu et al.2016] Niu, Z.; Zhou, M.; Wang, L.; Gao, X.; and Hua, G. 2016. Ordinal regression with multiple output cnn for age estimation. In CVPR, 4920–4928.
  • [Pan et al.2018] Pan, H.; Han, H.; Shan, S.; and Chen, X. 2018. Mean-variance loss for deep age estimation from a face. In CVPR, 5285–5294.
  • [Ricanek and Tesafaye2006] Ricanek, K., and Tesafaye, T. 2006. Morph: A longitudinal image database of normal adult age-progression. In FG, 341–345.
  • [Rothe and etal.2016] Rothe, R., et al. 2016. Dex: Deep expectation of apparent age from a single image. In ICCVW, 252–257.
  • [Rothe, Timofte, and Van Gool2015] Rothe, R.; Timofte, R.; and Van Gool, L. 2015. Dex: Deep expectation of apparent age from a single image. In ICCVW, 10–15.
  • [Shen et al.2017] Shen, W.; Guo, Y.; Wang, Y.; Zhao, K.; Wang, B.; and Yuille, A. 2017. Deep regression forests for age estimation. arXiv.
  • [sim2014] Simonyan, K., and Zisserman, A. 2014. Very deep convolutional networks for large-scale image recognition. arXiv.
  • [Wu et al.2018] Wu, X.; He, R.; Sun, Z.; and Tan, T. 2018. A light cnn for deep face representation with noisy labels. IEEE TIFS 13(11):2884–2896.
  • [Yang et al.2015] Yang, X.; Gao, B.-B.; Xing, C.; Huo, Z.-W.; Wei, X.-S.; Zhou, Y.; Wu, J.; and Geng, X. 2015. Deep label distribution learning for apparent age estimation. In ICCVW, 102–108.
  • [Yang et al.2018] Yang, T.-Y.; Huang, Y.-H.; Lin, Y.-Y.; Hsiu, P.-C.; and Chuang, Y.-Y. 2018. Ssr-net: A compact soft stagewise regression network for age estimation. In IJCAI, 1078–1084.
  • [Yang, Geng, and Zhou2016] Yang, X.; Geng, X.; and Zhou, D. 2016. Sparsity conditional energy label distribution learning for age estimation. In IJCAI, 2259–2265.
  • [Zhang et al.2016] Zhang, K.; Zhang, Z.; Li, Z.; and Qiao, Y. 2016. Joint face detection and alignment using multitask cascaded convolutional networks. IEEE SPL 23(10):1499–1503.
  • [Zhang et al.2017] Zhang, Y.; Liu, L.; Li, C.; et al. 2017. Quantifying facial age by posterior of age comparisons. arXiv.
  • [Zhang, Wang, and Geng2015] Zhang, Z.; Wang, M.; and Geng, X. 2015. Crowd counting in public video surveillance by label distribution learning. NLM 166:151–163.
  • [Zhou, Xue, and Geng2015] Zhou, Y.; Xue, H.; and Geng, X. 2015. Emotion distribution recognition from facial expressions. In ACM MM, 1247–1250.