Ordinal Distribution Regression for Gait-based Age Estimation

05/27/2019 ∙ by Haiping Zhu, et al. ∙ FUDAN University

Computer vision researchers typically estimate age from face images because facial features are informative. However, face-based age estimation becomes challenging when people are far from the camera or occluded. As a unique biometric feature that can be perceived efficiently at a distance, gait offers an alternative way to predict age when face images are not available. Existing gait-based classification or regression methods, however, ignore the ordinal relationship between ages, which is an important clue for age estimation. In this paper, we propose an ordinal distribution regression with a global and local convolutional neural network for gait-based age estimation. Specifically, we decompose gait-based age regression into a series of binary classifications to incorporate the ordinal information of age. An ordinal distribution loss is then proposed to take the inner relationship among these classifications into account by penalizing the distribution discrepancy between the estimated and ground-truth ages. In addition, our neural network consists of one global and three local sub-networks, which learn the global structure and finer local details from the head, body, and feet of the gait, respectively. Experimental comparisons with state-of-the-art gait-based age estimation methods show that the proposed approach achieves better predictive performance on the OULP-Age dataset.




1 Introduction

Human age estimation remains an active research topic and plays an important role in many potential applications, such as video surveillance, social networking, and human-computer interaction. Existing human age estimation methods are mainly based on facial images [2, 3, 22, 23, 31], which are very informative and from which age is comparatively easy to estimate. The performance of face-based age estimation approaches, however, degrades when the face is occluded, for example by sunglasses or makeup. Moreover, face-based age estimation becomes challenging if a person is far away from the camera, which often happens in video surveillance systems located at crossroads, airports, and railway stations. As a unique biometric feature that can be perceived efficiently at a distance, gait offers an alternative way to predict age when human faces are less informative or not available. Remarkably, gait-based estimation has a psychological foundation and cannot be easily faked [7]. For example, an old person might hobble along, whereas a young person might walk briskly.

Figure 1: GEIs of different age and gender subjects in the OULP-Age dataset. The number above each GEI is the corresponding age of the subject.
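For readers unfamiliar with the template shown in Figure 1, a GEI is commonly formed by averaging aligned binary silhouette frames pixel by pixel, following the construction in [7]. The sketch below illustrates that averaging on toy data; the function name and the tiny 2x2 frames are illustrative assumptions, and the silhouette extraction and alignment steps are not shown.

```python
def gait_energy_image(silhouettes):
    """Average aligned binary silhouettes pixel-wise.

    silhouettes: list of frames, each a list of rows of 0/1 values.
    All frames are assumed to be already aligned and equally sized.
    """
    n = len(silhouettes)
    height = len(silhouettes[0])
    width = len(silhouettes[0][0])
    return [
        [sum(frame[r][c] for frame in silhouettes) / n for c in range(width)]
        for r in range(height)
    ]


# Two toy 2x2 "silhouette" frames (hypothetical data).
frames = [
    [[1, 0], [1, 1]],
    [[1, 0], [0, 1]],
]
gei = gait_energy_image(frames)
```

Pixels that are foreground in every frame keep the value 1, while pixels that move in and out of the silhouette take intermediate gray values.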

In the field of gait-based age estimation, the gait energy image (GEI) [15, 21], which compresses one or more gait sequences into a single image, is one of the most widely used gait templates because of its simplicity and effectiveness. Some studies applied age manifold learning techniques to GEIs to learn a low-dimensional representation capturing the intrinsic data distribution and geometric structure [16, 17]. Existing gait-based age estimation approaches can be roughly grouped into two categories: classification-based [15] and regression-based methods [17, 21]. However, neither category considers the ordinal relationship between age labels, which is an important clue for age estimation. Ranking-based methods for face-based age estimation [3, 5, 13, 22] were proposed to solve this problem by utilizing the ordinal information between age labels. These methods usually decompose the ordinal regression into a series of binary classifications and use the cross-entropy loss to optimize them. However, the cross-entropy loss treats these classifications independently, ignoring the inner relationship among them. For these ordered binary classifications, the expected inner relationship is that the predictive probability of the $k$-th classifier should not be greater than the probability of the $(k-1)$-th classifier, as explained in Fig. 2.

Figure 2: The predictive probability of the $k$-th classifier is expected to be no greater than that of the $(k-1)$-th classifier on an ordinal distribution. Both A and B have the same cross-entropy loss, but B is preferable to A on an ordinal distribution.

In this paper, we propose an ordinal distribution regression with a global and local convolutional neural network, named ODR-GLCNN, for gait-based age estimation. Similar to the ranking-based methods for face-based age estimation, we regard gait-based age estimation as an ordinal regression and decompose the ordinal regression problem into a series of binary classification sub-problems. The major issue with the existing ranking-based methods is that they solve these binary sub-problems independently, largely neglecting their inner relationship and making poor use of the correlation between the sub-tasks. To address this shortcoming, we propose an ordinal distribution loss that penalizes the distribution difference between the estimated and ground-truth ages. Besides, we propose a novel network, consisting of one global and three local sub-networks, to capture the global structure and the local structures of the head, body, and feet of a gait. Experimental results on the OULP-Age dataset [40] and the MORPH Album II dataset [26] demonstrate that the proposed approach outperforms the state-of-the-art methods on both gait-based and face-based age estimation.

The contributions of this paper are: 1) A deep ordinal distribution regression for gait-based age estimation is proposed, achieving the state-of-the-art predictive performance on the OULP-Age dataset; 2) An ordinal distribution loss is proposed to take the inner relationship among a series of binary sub-problems into account; 3) A novel network consisting of a global network and three local sub-networks is proposed, learning more representative features from the gait globally and locally.

2 Related work

In this section, we give a brief survey of face-based and gait-based age estimation as well as ordinal regression.

Face-based age estimation: Existing approaches for face-based age estimation fall into three categories: classification-, regression-, and ranking-based methods. Classification-based methods were often used to roughly estimate the age group of the subject in a face image [12, 41]. Different ages or age groups were treated as independent classes. These methods, however, hardly consider the cost difference between misclassifications into different age groups. Regression-based methods provided a more accurate age assessment for a facial image of a subject [6, 39]. Typically, regression-based methods employed a Euclidean loss ($\ell_2$ loss) to penalize the difference between the estimated and ground-truth ages. Recently, ranking-based or ordinal methods were proposed for facial age estimation [1, 2, 22, 3]. Regarding the age as an ordered label, these approaches used multiple binary classifiers to determine the rank of a specific age. Unlike the $\ell_2$ loss, which ignores the ordinal information, ranking-based methods are able to explicitly model the ordinal relationship among face images sampled from different ages.

Gait-based age estimation: The earliest work on gait-based age estimation dates back to [21], where Gaussian process regression (GPR) [25] was introduced to predict age from human gait. The GPR was then refined with an active set method [37] to reduce the computational time for online age estimation [20]. Lu and Tan proposed a multi-label guided subspace (MLG) to better characterize the feature space by correlating the age and gender information of subjects [15]. They further proposed an ordinary preserving manifold learning approach to seek a low-dimensional discriminative subspace for age estimation [17]. Considering the age variations within different age groups such as children, adults, and the elderly, Li et al. proposed an age group-dependent manifold method [14]. After an age group classifier has been trained, a kernel SVM regression is added for accurate assessment within each age group. This method achieves the state-of-the-art performance in gait-based age estimation so far.

Figure 3: The structure of the proposed ODR-GLCNN network. The output layer contains $K-1$ binary classifications, incorporating the ordinal information into the end-to-end learning process.

Ordinal regression: Most ordinal algorithms can be regarded as refined versions of classification algorithms with ordinal constraints [4, 9, 30]. For example, Herbrich et al. utilized a support vector machine for ordinal regression [9], and Shashua and Levin then refined the SVM to handle multiple thresholds [30]. Crammer and Singer proposed the perceptron ranking algorithm to generalize the online perceptron algorithm with multiple thresholds for ordinal regression [4]. Another way to directly utilize classification algorithms is to transform the ordinal regression into a series of simpler binary classifications [5, 13]. Specifically, Frank and Hall utilized decision trees as the binary classifiers for ordinal regression [5]. Li and Lin learned the ordinal regression with a set of classifiers and then employed an SVM for the final classification [13]. Recently, Niu et al. introduced a CNN with multiple binary outputs to solve the ordinal regression for age estimation [22]. Ordinal regression was also used in [3] by learning multiple binary CNNs and aggregating the final outputs. However, these ordinal regression methods solve each binary sub-problem independently and make little use of the underlying relationship among these binary sub-problems. In this paper, we therefore propose a distribution loss that utilizes this relationship to improve age estimation.

3 The proposed method

In this section, we present in more detail the ordinal regression for gait-based age estimation, our novel network consisting of one global and three local sub-networks, and a novel distribution loss.

3.1 Ordinal regression

We treat gait-based age estimation as an ordinal regression so that the ordinal relationship of age labels can be utilized. Let $x_i$ denote the $i$-th input GEI sample, and let the corresponding age be $y_i \in \{r_1, r_2, \dots, r_K\}$ with ordered ranks $r_K \succ r_{K-1} \succ \dots \succ r_1$. The symbol $\succ$ denotes the order among different ranks. Given a training set $D = \{(x_i, y_i)\}_{i=1}^{N}$, the ordinal regression is to learn a mapping from images to ranks, i.e., $h(\cdot): \mathcal{X} \to \mathcal{Y}$.

Inspired by two ranking-based methods [3, 22], we decompose the ordinal regression into a series of binary classifications. Specifically, the ordinal regression with $K$ ranks is decomposed into $K-1$ binary classifiers $f_1, \dots, f_{K-1}$. For each rank $r_k$, $k \in \{1, \dots, K-1\}$, a binary classifier $f_k$ is constructed to predict whether the rank of a sample is greater than $r_k$. The final rank of an unknown test sample is determined by aggregating the results of all $K-1$ binary classifiers.

More concretely, to train the $k$-th binary classifier $f_k$, the given dataset $D$ is divided into two subsets, one positive class and one negative class, determined by whether the age $y_i$ is greater than $r_k$, i.e.,

$$D_k^{+} = \{(x_i, 1) \mid y_i \succ r_k\}, \qquad D_k^{-} = \{(x_i, 0) \mid y_i \preceq r_k\}. \tag{1}$$

Once all $K-1$ binary classifiers have been trained on their respective training sets, the age of a test sample $x_i$ is predicted as follows:

$$\hat{y}_i = r_1 + \eta \sum_{k=1}^{K-1} [\![ f_k(x_i) > 0.5 ]\!], \tag{2}$$

where $f_k(x_i)$ is the output probability of the $k$-th classifier for the sample $x_i$ (i.e., the $k$-th output of GL-CNN), $\eta$ is the partitioning interval, and $[\![\cdot]\!]$ denotes the truth-test operator, which is 1 if the inner condition holds and 0 otherwise.
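The decomposition and the prediction rule described above can be sketched in a few lines. This is a minimal illustration rather than the authors' implementation: the function names, the 0.5 decision threshold, the uniform partitioning interval, and the rank set (ages 2 to 90 in steps of one year) are assumptions made for the example.

```python
def encode_age(age, ranks):
    """Return the K-1 binary labels; label k is 1 if age is greater than ranks[k]."""
    return [1 if age > r else 0 for r in ranks[:-1]]


def decode_age(probs, ranks, threshold=0.5):
    """Predict the age by counting classifiers whose probability exceeds the threshold."""
    eta = ranks[1] - ranks[0]  # partitioning interval, assumed uniform
    return ranks[0] + eta * sum(1 for p in probs if p > threshold)


ranks = list(range(2, 91))        # K = 89 illustrative ranks
labels = encode_age(30, ranks)    # 28 ones followed by 60 zeros

# Idealised classifier outputs that agree with the labels (hypothetical values).
probs = [0.9 if y == 1 else 0.1 for y in labels]
predicted = decode_age(probs, ranks)
```

With outputs that agree with the labels, decoding recovers the original age, which is why the label vector can be treated as a lossless encoding of the rank.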

3.2 The global and local convolutional neural networks (GL-CNN)

Fig. 3 presents an overview of the proposed deep neural network for gait-based age estimation, consisting of one global and three local convolutional neural networks, followed by three fully connected layers with $K-1$ outputs. Next, we describe the network in detail.

The grayscale GEI images are fed into the global network as the input. Considering that different parts of the gait exhibit different local behaviors, we crop the GEI template into three parts: head, body, and feet. In the OULP-Age dataset [40], the gait images of various people are detected, cropped, aligned, and resized into a uniform silhouette template with the same height. In our work, the three parts are cropped using three non-overlapping boxes, each of a fixed size. Three local networks are then designed to learn finer details from these three parts separately. More specifically, there are three convolutional layers in both the global and the local sub-networks. At the first convolutional layer, 32 filters with a stride of 1 pixel are applied to the input images, followed by a Leaky Rectified Linear Unit (LeakyReLU) [18]. A max pooling operation with a stride of 2 is then used to emphasize the most strongly responding points in the feature maps. Similar operations are conducted at the second and third convolutional layers with different filter sizes (refer to Fig. 3 for details). Note that we concatenate the three local feature maps from the second convolutional layers along the height dimension to form new local feature maps in the local network, which are further concatenated with the feature maps from the global network along the channel dimension.
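The cropping into three non-overlapping parts and the later concatenation along the height dimension can be pictured with plain nested lists. This is only a sketch of the data layout: the toy 8x4 image, the row counts for the head and body boxes, and the assumption that each box spans the full image width are hypothetical, and in the network the concatenation is applied to feature maps rather than to raw pixels.

```python
def crop_rows(image, top, bottom):
    """Copy rows top..bottom-1 of a 2D image stored as a list of rows."""
    return [row[:] for row in image[top:bottom]]


def split_gei(image, head_rows, body_rows):
    """Split an image into head, body and feet strips that do not overlap."""
    head = crop_rows(image, 0, head_rows)
    body = crop_rows(image, head_rows, head_rows + body_rows)
    feet = crop_rows(image, head_rows + body_rows, len(image))
    return head, body, feet


def concat_height(parts):
    """Stack 2D parts on top of each other (concatenation along height)."""
    return [row for part in parts for row in part]


# Toy 8x4 "GEI" whose pixel values encode the row index (hypothetical data).
image = [[r] * 4 for r in range(8)]
head, body, feet = split_gei(image, head_rows=2, body_rows=4)
restacked = concat_height([head, body, feet])
```

Because the boxes do not overlap and together cover the template, stacking the parts along the height restores the original row order.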

After that, there are three fully connected layers, as shown in Fig. 3. Among them, F4 is the first fully connected layer, in which the feature maps are flattened into a feature vector. F4 has 1024 neurons and is followed by LeakyReLU and a dropout layer [35]. F5 is the second fully connected layer with 1024 neurons; it receives the output of F4 and is followed by LeakyReLU and another dropout layer. F6 is the third fully connected layer with $K-1$ neurons; it receives the output of F5 and is followed by LeakyReLU and a dropout layer. Through a sigmoid layer, the outputs correspond to the predictive probabilities of the $K-1$ binary classifiers. The parameters of the network are optimized by minimizing a loss function.

3.3 Ordinal distribution loss

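Here we cast the age label $y_i$ as the label vector $\mathbf{y}_i = (y_i^1, \dots, y_i^{K-1})$ for the $K-1$ binary classifiers, where $y_i^k \in \{0, 1\}$ indicates whether $y_i$ is greater than $r_k$. We employ the cross-entropy loss as the loss function for these binary classifiers. The loss can be calculated as:

$$L_{CE} = -\frac{1}{N} \sum_{i=1}^{N} \sum_{k=1}^{K-1} \left[ y_i^k \log o_i^k + (1 - y_i^k) \log (1 - o_i^k) \right], \tag{3}$$

where $o_i^k$ is the output value of the $k$-th binary classifier for the $i$-th sample. The cross-entropy loss, however, optimizes these binary classifiers separately, resulting in discrepancies between different binary classifications, as described in Fig. 2.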
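The situation described in Fig. 2 can be reproduced with a small worked example: two output vectors that receive exactly the same cross-entropy loss although only one of them is ordinally consistent. The numbers below are hypothetical and chosen only for illustration.

```python
import math


def binary_cross_entropy(outputs, labels):
    """Sum of per-classifier binary cross-entropy terms for one sample."""
    return -sum(
        y * math.log(o) + (1 - y) * math.log(1 - o)
        for o, y in zip(outputs, labels)
    )


def is_non_increasing(outputs):
    """True if each classifier's probability is no greater than the previous one."""
    return all(a >= b for a, b in zip(outputs, outputs[1:]))


labels = [1, 1, 0, 0]               # the age exceeds the first two ranks only
output_a = [0.6, 0.9, 0.3, 0.1]     # second probability exceeds the first
output_b = [0.9, 0.6, 0.3, 0.1]     # probabilities decrease with the rank

ce_a = binary_cross_entropy(output_a, labels)
ce_b = binary_cross_entropy(output_b, labels)
```

Swapping the first two probabilities only reorders the terms of the sum, so the cross-entropy cannot distinguish the two outputs even though only `output_b` respects the expected ordering.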

In order to fully utilize the inner relationship among these $K-1$ outputs, we regard the outputs as a probability distribution and propose a distribution loss, i.e., the squared Earth Mover's Distance ($\mathrm{EMD}^2$) [10], to penalize the discrepancy between the output distribution and the ground-truth distribution. First, the output values are softly transformed into probability values:

$$p_i^k = \frac{\exp(o_i^k)}{\sum_{j=1}^{K-1} \exp(o_i^j)}. \tag{4}$$

Then the $\mathrm{EMD}^2$ loss is defined as:

$$L_{EMD^2} = \frac{1}{N} \sum_{i=1}^{N} \sum_{k=1}^{K-1} \left( \mathrm{CDF}_k(p_i) - \mathrm{CDF}_k(q_i) \right)^2, \tag{5}$$

where $p_i$ and $q_i$ are the probability distributions corresponding to the $i$-th output $o_i$ and the $i$-th ground-truth $\mathbf{y}_i$, respectively. $\mathrm{CDF}(\cdot)$ is the cumulative density function of its input, and $\mathrm{CDF}_k(\cdot)$ is the $k$-th element of the CDF of its input.

Finally, we propose an ordinal distribution loss by combining the cross-entropy loss with the $\mathrm{EMD}^2$ loss. This loss function is easily embedded into the architecture of GL-CNN for end-to-end learning. The ordinal distribution loss (ODL) is

$$L_{ODL} = L_{CE} + \lambda L_{EMD^2}, \tag{6}$$

where $\lambda$ is a hyper-parameter that controls the influence of $L_{EMD^2}$ in the joint loss.
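A minimal sketch of the joint loss for a single sample is given below. It rests on several assumptions that the text does not fix: the outputs are normalized with a softmax, the ground-truth label vector is turned into a distribution with the same softmax, and the weight `lam` is an arbitrary illustrative value. The example outputs are hypothetical.

```python
import math


def softmax(values):
    """Normalize a list of numbers into a probability distribution."""
    exps = [math.exp(v) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def cumulative(dist):
    """Running sums of a distribution, i.e. its discrete CDF."""
    out, running = [], 0.0
    for p in dist:
        running += p
        out.append(running)
    return out


def cross_entropy(outputs, labels):
    return -sum(y * math.log(o) + (1 - y) * math.log(1 - o)
                for o, y in zip(outputs, labels))


def squared_emd(outputs, labels):
    """Squared differences between the two CDFs, summed over classifiers."""
    p, q = cumulative(softmax(outputs)), cumulative(softmax(labels))
    return sum((a - b) ** 2 for a, b in zip(p, q))


def ordinal_distribution_loss(outputs, labels, lam):
    """Cross-entropy plus lam times the squared EMD (lam is a free weight)."""
    return cross_entropy(outputs, labels) + lam * squared_emd(outputs, labels)


labels = [1, 1, 0, 0]
output_a = [0.6, 0.9, 0.3, 0.1]   # ordinally inconsistent
output_b = [0.9, 0.6, 0.3, 0.1]   # ordinally consistent

emd_a, emd_b = squared_emd(output_a, labels), squared_emd(output_b, labels)
odl_a = ordinal_distribution_loss(output_a, labels, lam=1.0)
odl_b = ordinal_distribution_loss(output_b, labels, lam=1.0)
```

Under these assumptions the two outputs have equal cross-entropy, but the distribution term is smaller for the consistent output, so the joint loss prefers it.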

3.4 Learning ODR-GLCNN

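One advantage of using Eq. (6) is that the ordinal distribution loss can simultaneously learn each binary classification and the inner relationship between these binary classifications. For the $i$-th sample $x_i$, the gradient of our loss can be derived as:

$$\frac{\partial L_{ODL}}{\partial \theta} = \left( \frac{\partial L_{CE}}{\partial o_i} + \lambda \frac{\partial L_{EMD^2}}{\partial o_i} \right) \frac{\partial o_i}{\partial \theta}, \tag{7}$$

where $\theta$ represents the parameters of the network, and $\partial o_i / \partial \theta$ can be derived through the standard backpropagation method. For the $k$-th element of $o_i$, the gradient of the cross-entropy loss can be derived as:

$$\frac{\partial L_{CE}}{\partial o_i^k} = \frac{o_i^k - y_i^k}{o_i^k (1 - o_i^k)}. \tag{8}$$

For the $k$-th element of $o_i$, the gradient of the $\mathrm{EMD}^2$ loss can be derived as:

$$\frac{\partial L_{EMD^2}}{\partial o_i^k} = 2\, p_i^k \sum_{m=1}^{K-1} e_i^m \left( [\![ k \le m ]\!] - \mathrm{CDF}_m(p_i) \right), \tag{9}$$

where $e_i^m = \mathrm{CDF}_m(p_i) - \mathrm{CDF}_m(q_i)$, $p_i$ is the softmax-normalized distribution of the outputs $o_i$, and $q_i$ is the ground-truth distribution. Eq. (8) indicates that the gradient of the cross-entropy loss depends only on the output value of each binary classification and its corresponding ground-truth, ignoring the intrinsic correlation among the binary classifiers. In contrast, the output values of all classifications are taken into account when computing the gradient of a specific binary classification in the $\mathrm{EMD}^2$ loss, as shown in Eq. (9). Therefore, the ordinal distribution loss not only considers each binary classification but also utilizes the inner relationship among them.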
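The difference between the two gradients can be checked numerically with central finite differences. The sketch below is an illustration under stated assumptions: a softmax is used to normalize both the outputs and the label vector, the per-sample losses omit the averaging over the dataset, and all numbers are hypothetical. It changes only the last output and observes how the gradient with respect to the first output reacts.

```python
import math


def softmax(values):
    exps = [math.exp(v) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def cumulative(dist):
    out, running = [], 0.0
    for p in dist:
        running += p
        out.append(running)
    return out


def cross_entropy(outputs, labels):
    return -sum(y * math.log(o) + (1 - y) * math.log(1 - o)
                for o, y in zip(outputs, labels))


def squared_emd(outputs, labels):
    p, q = cumulative(softmax(outputs)), cumulative(softmax(labels))
    return sum((a - b) ** 2 for a, b in zip(p, q))


def numeric_grad(loss, outputs, labels, k, eps=1e-6):
    """Central finite-difference estimate of d loss / d outputs[k]."""
    up, down = list(outputs), list(outputs)
    up[k] += eps
    down[k] -= eps
    return (loss(up, labels) - loss(down, labels)) / (2 * eps)


labels = [1, 1, 0, 0]
base = [0.9, 0.6, 0.3, 0.1]
moved = [0.9, 0.6, 0.3, 0.4]   # only the last output differs from base

ce_base = numeric_grad(cross_entropy, base, labels, k=0)
ce_moved = numeric_grad(cross_entropy, moved, labels, k=0)
emd_base = numeric_grad(squared_emd, base, labels, k=0)
emd_moved = numeric_grad(squared_emd, moved, labels, k=0)
```

The cross-entropy gradient for the first output is unchanged when another output moves, whereas the squared-EMD gradient changes, which is the coupling between classifiers that the ordinal distribution loss relies on.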

4 Experiments

In this section, we describe the experimental settings in detail and demonstrate the effectiveness of the proposed method by comparing it with the state-of-the-art methods and performing a set of ablation studies on the OULP-Age gait dataset [40]. In addition, we evaluate the generalization ability of the proposed approach on another task, namely facial age estimation on MORPH Album II [26].

4.1 Experimental settings

4.1.1 Data preparation

OULP-Age is the largest gait dataset in the world so far. It contains 63,846 GEI samples (31,093 males and 32,753 females) with ages ranging from 2 to 90 years. The age histogram of this dataset by gender, in five-year intervals, is shown in Fig. 4. As suggested by the dataset authors [40], the OULP-Age dataset was evenly divided into two disjoint subsets (a training set and a test set). The training set contains 15,596 males and 16,327 females, and the test set contains 15,497 males and 16,426 females. In addition, the two sets have similar age distributions. Another popular gait dataset for age estimation is the USF dataset [29], which includes only 122 subjects and is too small to train a deep network. Thus, we evaluate the proposed method on the OULP-Age dataset but not on the USF dataset.

MORPH Album II is one of the largest longitudinal face databases in the public domain and contains 55,134 face images of 13,617 subjects in the 16-to-77 age range [26]. Following the protocol of [22, 3, 23, 32], we use the five-fold random split (RS) protocol to evaluate the performance of facial age estimation. All face images are aligned based on five facial landmarks detected using the open-source SeetaFaceEngine (https://github.com/seetaface/SeetaFaceEngine) and are then resized to a fixed size.

Figure 4: The distribution of age and gender for the OULP-Age dataset.

4.1.2 Evaluation metrics

The performance of age estimation is evaluated by the Mean Absolute Error (MAE) and the Cumulative Score (CS). MAE is the average of the absolute errors between the predicted age and the ground-truth over all test samples. It is defined as $\mathrm{MAE} = \frac{1}{N} \sum_{i=1}^{N} |\hat{y}_i - y_i|$, where $N$ is the total number of test samples. CS is calculated as $\mathrm{CS}(\theta) = \frac{N_\theta}{N} \times 100\%$, where $N_\theta$ is the number of test samples whose absolute error between the estimated age and the ground-truth is not greater than $\theta$ years. CS reveals how consistently the model performs by measuring its accuracy at different error levels.
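Both metrics are straightforward to compute from paired predictions and ground-truth ages. The following sketch uses hypothetical ages and a hypothetical threshold of 5 years purely for illustration.

```python
def mean_absolute_error(predicted, truth):
    """Average absolute difference between predicted and true ages."""
    return sum(abs(p - t) for p, t in zip(predicted, truth)) / len(truth)


def cumulative_score(predicted, truth, theta):
    """Percentage of samples whose absolute error is at most theta years."""
    hits = sum(1 for p, t in zip(predicted, truth) if abs(p - t) <= theta)
    return 100.0 * hits / len(truth)


truth = [10, 25, 40, 60]          # hypothetical ground-truth ages
predicted = [12, 25, 47, 58]      # hypothetical predictions

mae = mean_absolute_error(predicted, truth)
cs_5 = cumulative_score(predicted, truth, theta=5)
```

Here the absolute errors are 2, 0, 7 and 2 years, so the MAE is 2.75 years and three of the four samples fall within the 5-year tolerance.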

4.2 Gait-based age estimation results

4.2.1 Implementation details

In our experiments, we utilize GL-CNN, CNN (consisting of only the global part shown in Fig. 3), and VGG16 [24, 33] as three backbone networks. For CNN and GL-CNN, we use Adam [11] with beta1 0.5, beta2 0.999, weight decay, a batch size of 300, and a maximum of 300 epochs. For VGG16, following a previously reported setting, we use stochastic gradient descent (SGD) with weight decay, a batch size of 300, and a maximum of 100 epochs, and multiply the learning rate by 0.1 every 15 epochs. To make the grayscale GEI suitable for VGG16, which is pre-trained on ImageNet [28], we replicate the GEI three times to form the RGB channels. Besides, the weight coefficient of the distribution loss term in Eq. (6) is tuned according to the model performance. All our experiments are implemented in PyTorch with four GeForce GTX 1080 Ti GPUs.
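Two of the VGG16 settings above, the step-wise learning-rate decay and the replication of the grayscale GEI into three channels, can be sketched without any framework. The base learning rate of 0.01 and the toy image are placeholders chosen for the example, not values taken from the experiments.

```python
def step_decay_lr(base_lr, epoch, step_size=15, gamma=0.1):
    """Learning rate after multiplying by gamma once every step_size epochs."""
    return base_lr * gamma ** (epoch // step_size)


def gray_to_three_channels(gei):
    """Copy a 2D grayscale image three times to mimic an RGB input."""
    return [[row[:] for row in gei] for _ in range(3)]


base_lr = 0.01                                   # placeholder value
schedule = [step_decay_lr(base_lr, e) for e in (0, 14, 15, 30)]

gei = [[0.0, 0.5], [1.0, 0.25]]                  # toy 2x2 grayscale image
rgb = gray_to_three_channels(gei)
```

The rate stays constant within each block of 15 epochs and drops by a factor of ten at epochs 15 and 30; the three channels are independent copies of the same image.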

4.2.2 Comparisons with the state-of-the-arts

We compare the proposed method with the state-of-the-art methods, including classification-based methods (e.g., MLG [15]), regression-based methods (e.g., GPR [21], SVR [34], and ASSOLPP [14]), and age manifold learning-based methods (e.g., OPLDA and OPMFA [16]). Besides, we implemented a deep learning method as a baseline, named VGG16 + Mean-Variance, to validate the effectiveness of the proposed method. This method, proposed in [23], achieves outstanding performance in face-based age estimation.

Methods                        MAE      CS
SVR [34]                       7.66     41.40%
MLG [15]                       10.98    43.40%
OPLDA [16]                     8.45     36.50%
OPMFA [16]                     9.08     34.70%
GPR [21]                       7.30     43.60%
ASSOLPP [14]                   6.78     53.00%
VGG16 + Mean-Variance [23]     5.59     60.46%
ODR-GLCNN (Ours)               5.12     66.95%

Table 1: Comparisons of the age estimation MAEs by the proposed approach and the state-of-the-art methods on the OULP-Age dataset.

Figure 5: Comparisons of the age estimation CS by the proposed approach and the state-of-the-art methods on the OULP-Age dataset.

Table 1 shows the results of eight methods on OULP-Age. The results suggest that CNN-based methods, such as [23], perform better in MAE than traditional methods [14, 15, 16, 21, 34]. This is because CNN-based methods have many more parameters and learn more representative features with end-to-end training. Our method performs best among all the approaches because it benefits from both a more representative feature extraction network (GL-CNN) and a novel loss function (the ordinal distribution loss), which learns not only each binary classification of the ordinal regression but also the inner relationship among them. Besides, as shown in Fig. 5, the CS results on the OULP-Age dataset further demonstrate that the proposed approach performs consistently better than the other state-of-the-art methods.

Some age estimation examples are shown in Fig. 6. We can see that the proposed approach performs quite robustly for young, middle-aged, and old subjects. The last row of Fig. 6 shows that the age estimation accuracy may degrade when a person wears heavy clothes, or when a person is very thin or very fat.

Figure 6: Examples of gait-based age estimation results by the proposed approach on the OULP-Age dataset. The top three rows show nine successful age estimation examples (absolute error smaller than 5 years) for young, middle-aged, and old subjects, respectively. The last row shows nine failure cases (absolute error larger than 20 years). The numbers above each image show the ground-truth age and the estimated age of the subject, i.e., ground-truth age (estimated age).

4.2.3 Analyzing the performance of GL-CNN

We evaluate the performance of the proposed network GL-CNN by comparing it with a simple CNN consisting of only the global part and with the widely used VGG16 network for gait-based age estimation. All three networks use the cross-entropy loss as their loss function. The MAE and CS results of the three networks are shown in Table 2.

Network           MAE     CS        Time (ms)
CNN               5.45    64.64%    7.27
VGG16             5.63    63.92%    21.9
GL-CNN (Ours)     5.24    65.96%    8.99

Table 2: Comparisons among different CNN-based methods on the OULP-Age dataset. The performance is measured by MAE and CS. The testing time for one sample with each method is reported in the last column.

Compared with a simple CNN, GL-CNN achieves better performance in age estimation. It can be seen from Table 2 that 1) compared with CNN and VGG16, GL-CNN achieves the best performance on both criteria; 2) although VGG16 has more parameters, GL-CNN effectively learns more detailed information by combining one global and three local structures, resulting in improved performance; and 3) the computational cost of GL-CNN is only slightly higher than that of CNN, but much lower than that of VGG16.

Figure 7: Feature visualization of CNN and GL-CNN. The network features are reduced from 1024 dimensions to 2 dimensions by the t-SNE technique. We divide the ages into 9 age groups, and different colors represent different age groups.

To better demonstrate the effectiveness of the proposed network, we visualize the features of CNN and GL-CNN using t-distributed stochastic neighbor embedding (t-SNE) [19] with perplexity 30, as shown in Fig. 7. For better visualization, the age labels are divided into 9 age groups. Both GL-CNN and CNN features appear to keep a manifold-like structure, since age varies smoothly from left to right. However, zooming into Fig. 7 shows that samples within the same age group are denser for GL-CNN than for CNN, especially in two of the age groups, which suggests that GL-CNN can achieve a lower age estimation error because it learns a better feature representation from both global and local structures.

4.2.4 Comparison of different losses

To validate the effectiveness of the proposed ordinal distribution loss, we compare it with three losses widely used in age estimation, i.e., the Euclidean, MAE, and cross-entropy losses, by performing age estimation with the proposed GL-CNN. The MAE and CS of these losses are reported in Table 3.

Loss                            MAE     CS
Euclidean                       6.73    52.95%
MAE                             6.65    55.16%
Cross-Entropy                   5.24    65.96%
Ordinal Distribution (Ours)     5.12    66.95%

Table 3: Comparisons among different losses with the proposed GL-CNN on the OULP-Age dataset. The performance is measured by MAE and CS.

It can be seen that the cross-entropy loss outperforms the Euclidean and MAE losses for the age estimation task. The reason is that the Euclidean and MAE losses easily lead to over-fitting and do not consider the ordinal information between age labels. In contrast, the proposed ordinal distribution loss additionally incorporates the inner relationship between the binary classifications through a distribution loss based on the squared Earth Mover's Distance, resulting in better predictive performance.

4.2.5 Discussion of the influence of gender

Figure 8: Multi-task structure for age and gender estimation tasks.

We observe from Fig. 1 that gender is correlated with age in gait, since gait appearance varies between males and females even within the same age group. To better utilize the relationship between the age and gender of gait, we embed a multi-task technique into the CNN-based framework. Specifically, as shown in Fig. 8, we add a gender classification task to the proposed method and to three other CNN-based methods. As a binary classification, the gender loss is defined as:

$$L_{gender} = -\frac{1}{N} \sum_{i=1}^{N} \left[ g_i \log \hat{g}_i + (1 - g_i) \log (1 - \hat{g}_i) \right], \tag{10}$$

where $g_i$ is the ground-truth gender of the $i$-th sample and $\hat{g}_i$ is the corresponding predicted value.

Methods             Age MAE (w/o gender)    Age MAE (w/ gender)    Gender acc.
CNN + Euclidean     6.96                    6.82                   96.70%
CNN + CE            5.40                    5.34                   97.20%
VGG16 + MV          5.59                    5.52                   96.70%
ODR-GLCNN           5.12                    5.06                   97.80%

Table 4: The influence of human gender on gait-based age estimation in terms of age MAE and gender accuracy. MV and CE represent the mean-variance loss and the cross-entropy loss, respectively.

Table 4 indicates that the gender information indeed improves the performance of gait-based age estimation. Moreover, the accuracy of gender classification in our method is 97.8%, implying that as a by-product, our network can accurately predict the gender of a person from a gait.

4.3 The facial age estimation

We apply the proposed ordinal distribution loss (ODL) to facial age estimation on MORPH Album II and compare the results with the state-of-the-art methods [22, 27, 3, 23, 32]. Following [23, 32], we also use VGG16, pre-trained on ImageNet [28], as the backbone network with the proposed ODL. The results of the individual approaches in terms of MAE and CS are reported in Table 5. Our approach achieves better prediction performance than the state-of-the-art method DRFs [32], which suggests that it generalizes well to the facial age estimation task. Besides, the results obtained with the full ordinal distribution loss are better than those obtained with a single cross-entropy loss (ODL with $\lambda = 0$). This indicates that ODL is more effective in learning the ordinal relationship among different ages than a single cross-entropy loss.

Methods                          MAE     CS
OR-CNN [22]                      3.27    73.0%*
DEX [27]                         3.25    N/A
Ranking-CNN [3]                  2.96    85.0%*
VGG16 + Mean-Variance [23]       2.41    90.0%*
DRFs [32]                        2.17    91.3%
VGG16 + ODL ($\lambda = 0$)      2.30    91.1%
VGG16 + ODL (Ours)               2.16    92.9%

Table 5: Comparisons between our approach and the state-of-the-art methods on the MORPH Album II dataset in terms of MAE and CS (*: the value is read from the reported CS curve).

5 Conclusion

In this paper, we proposed an ordinal distribution regression with GL-CNN, consisting of one global and three local sub-networks, for gait-based age estimation. By combining the cross-entropy loss with the squared Earth Mover's Distance loss, the proposed ordinal distribution loss is more effective in learning the ordinal relationship among different ages than a single cross-entropy loss. Moreover, one global and three local sub-networks are constructed to extract more representative gait features. We also observe that if gender information is available for training, embedding a multi-task strategy into the proposed framework can slightly improve the performance of age estimation. Experiments on the OULP-Age and MORPH Album II datasets show that our approach not only performs better than the state-of-the-art methods on gait-based age estimation, but also generalizes well to the facial age estimation task.

In the future, it is worth studying how to utilize temporal information or cross-view information [8, 36, 38] of gait sequence to improve the accuracy and the effectiveness of gait-based age estimation.


  • [1] K.-Y. Chang and C.-S. Chen. A learning framework for age rank estimation based on face images with scattering transform. IEEE TIP, 24(3):785–798, 2015.
  • [2] K.-Y. Chang, C.-S. Chen, and Y.-P. Hung. Ordinal hyperplanes ranker with cost sensitivities for age estimation. In CVPR, pages 585–592, 2011.
  • [3] S. Chen, C. Zhang, M. Dong, J. Le, and M. Rao. Using ranking-CNN for age estimation. In CVPR, 2017.
  • [4] K. Crammer and Y. Singer. Pranking with ranking. In NIPS, pages 641–647, 2002.
  • [5] E. Frank and M. Hall. A simple approach to ordinal classification. In ECML, pages 145–156, 2001.
  • [6] Y. Fu and T. S. Huang. Human age estimation with regression on discriminative aging manifold. IEEE TMM, 10(4):578–584, 2008.
  • [7] J. Han and B. Bhanu. Individual recognition using gait energy image. IEEE TPAMI, 28(2):316–322, 2006.
  • [8] Y. He, J. Zhang, H. Shan, and L. Wang. Multi-task GANs for view-specific feature learning in gait recognition. IEEE TIFS, 14(1):102–113, 2019.
  • [9] R. Herbrich, T. Graepel, and K. Obermayer. Support vector learning for ordinal regression. ICANN, pages 97–102, 1999.
  • [10] L. Hou, C.-P. Yu, and D. Samaras. Squared earth mover’s distance-based loss for training deep neural networks. arXiv:1611.05916, 2016.
  • [11] D. P. Kingma and J. Ba. Adam: A method for stochastic optimization. In ICLR, 2015.
  • [12] A. Lanitis, C. Draganova, and C. Christodoulou. Comparing different classifiers for automatic age estimation. IEEE TSMCB, 34(1):621–628, 2004.
  • [13] L. Li and H.-T. Lin. Ordinal regression by extended binary classification. In NIPS, pages 865–872, 2007.
  • [14] X. Li, Y. Makihara, C. Xu, Y. Yagi, and M. Ren. Gait-based human age estimation using age group-dependent manifold learning and regression. MTA, 77(21):1–22, 2018.
  • [15] J. Lu and Y.-P. Tan. Gait-based human age estimation. IEEE TIFS, 5(4):761–770, 2010.
  • [16] J. Lu and Y.-P. Tan. Ordinary preserving manifold analysis for human age estimation. In CVPR Workshops, pages 90–95, 2010.
  • [17] J. Lu and Y.-P. Tan. Ordinary preserving manifold analysis for human age and head pose estimation. IEEE THMS, 43(2):249–258, 2013.
  • [18] A. L. Maas, A. Y. Hannun, and A. Y. Ng. Rectifier nonlinearities improve neural network acoustic models. In ICML, volume 30, page 3, 2013.
  • [19] L. v. d. Maaten and G. Hinton. Visualizing data using t-SNE. JMLR, 9(Nov):2579–2605, 2008.
  • [20] Y. Makihara, T. Kimura, F. Okura, I. Mitsugami, M. Niwa, C. Aoki, A. Suzuki, D. Muramatsu, and Y. Yagi. Gait collector: an automatic gait data collection system in conjunction with an experience-based long-run exhibition. In ICB, pages 1–8, 2016.
  • [21] Y. Makihara, M. Okumura, H. Iwama, and Y. Yagi. Gait-based age estimation using a whole-generation gait database. In ICB, pages 1–6, 2011.
  • [22] Z. Niu, M. Zhou, L. Wang, X. Gao, and G. Hua. Ordinal regression with multiple output CNN for age estimation. In CVPR, pages 4920–4928, 2016.
  • [23] H. Pan, H. Han, S. Shan, and X. Chen. Mean-variance loss for deep age estimation from a face. In CVPR, pages 5285–5294, 2018.
  • [24] O. M. Parkhi, A. Vedaldi, A. Zisserman, et al. Deep face recognition. In BMVC, volume 1, 2015.
  • [25] C. E. Rasmussen. Gaussian processes in machine learning. In Advanced Lectures on Machine Learning, pages 63–71. 2004.
  • [26] K. Ricanek and T. Tesafaye. Morph: A longitudinal image database of normal adult age-progression. In FGR, pages 341–345, 2006.
  • [27] R. Rothe, R. Timofte, and L. Van Gool. Deep expectation of real and apparent age from a single image without facial landmarks. IJCV, 126(2-4):144–157, 2018.
  • [28] O. Russakovsky, J. Deng, H. Su, J. Krause, S. Satheesh, S. Ma, Z. Huang, A. Karpathy, A. Khosla, M. Bernstein, et al. Imagenet large scale visual recognition challenge. IJCV, 115(3):211–252, 2015.
  • [29] S. Sarkar, P. J. Phillips, Z. Liu, I. R. Vega, P. Grother, and K. W. Bowyer. The humanID gait challenge problem: Data sets, performance, and analysis. IEEE TPAMI, 27(2):162–177, 2005.
  • [30] A. Shashua and A. Levin. Ranking with large margin principle: two approaches. In NIPS, pages 961–968, 2003.
  • [31] W. Shen, Y. Guo, Y. Wang, K. Zhao, B. Wang, and A. Yuille. Deep regression forests for age estimation. arXiv:1712.07195, 2017.
  • [32] W. Shen, Y. Guo, Y. Wang, K. Zhao, B. Wang, and A. L. Yuille. Deep regression forests for age estimation. In CVPR, pages 2304–2313, 2018.
  • [33] K. Simonyan and A. Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv:1409.1556, 2014.
  • [34] A. J. Smola and B. Schölkopf. A tutorial on support vector regression. Statistics and Computing, 14(3):199–222, 2004.
  • [35] N. Srivastava, G. Hinton, A. Krizhevsky, I. Sutskever, and R. Salakhutdinov. Dropout: a simple way to prevent neural networks from overfitting. JMLR, 15(1):1929–1958, 2014.
  • [36] N. Takemura, Y. Makihara, D. Muramatsu, T. Echigo, and Y. Yagi. Multi-view large population gait dataset and its performance evaluation for cross-view gait recognition. IPSJ TCVA, 10(4):1–14, 2018.
  • [37] T. Wada, Y. Matsumura, S. Maeda, and H. Shibuya. Gaussian process regression with dynamic active set and its application to anomaly detection. In ICDM, 2013.
  • [38] Z. Wu, Y. Huang, L. Wang, X. Wang, and T. Tan. A comprehensive study on cross-view gait based human identification with deep CNNs. IEEE TPAMI, 39(2):209–226, 2017.
  • [39] B. Xiao, X. Yang, H. Zha, Y. Xu, and T. S. Huang. Metric learning for regression problems and human age estimation. In PCM, pages 88–99, 2009.
  • [40] C. Xu, Y. Makihara, G. Ogi, X. Li, Y. Yagi, and J. Lu. The OU-ISIR gait database comprising the large population dataset with age and performance evaluation of age estimation. IPSJ TCVA, 9(1):24, 2017.
  • [41] Z. Yang and H. Ai. Demographic classification with local binary patterns. In ICB, pages 464–473, 2007.