Transferring Rich Deep Features for Facial Beauty Prediction

03/20/2018 ∙ by Lu Xu, et al.

Feature extraction plays a significant part in computer vision tasks. In this paper, we propose a method that transfers rich deep features from a model pretrained on a face verification task and feeds the features into a Bayesian ridge regression algorithm for facial beauty prediction. We leverage deep neural networks, which extract increasingly abstract features through stacked layers. Through a simple but effective feature fusion strategy, our method achieves improved or comparable performance on the SCUT-FBP dataset and the ECCV HotOrNot dataset. Our experiments demonstrate the effectiveness of the proposed method and help clarify the interpretability of facial beauty perception.


I Introduction

Facial beauty analysis [1] has been widely used in many fields such as facial image beautification apps (e.g., MeiTu and Facetune), plastic surgery, and face-based pose analysis [2]. In the mobile computing era, billions of images per day are acquired and uploaded to social networks and online platforms, leading to the demand for better image processing and analysis technology. Recently, thanks to big data and high-performance computing hardware, computational and data-driven approaches have been proposed for problems such as face recognition, facial expression recognition, and facial beauty analysis.

Existing methods resort to machine learning and computer vision techniques to analyze facial beauty and achieve promising results [3]. These methods typically combine image feature descriptors (such as HOG, SIFT, and LBP) with supervised machine learning predictors (such as SVM, KNN, DNN, and LR).

To explore a facial beauty prediction approach that precisely maps high-level features to face beauty ratings, we propose a method that combines transfer learning and Bayesian regression. The method achieves improved or comparable performance on the SCUT-FBP dataset [4] and the ECCV HotOrNot dataset [5].

The main contributions of this paper are as follows:

  • We apply transfer learning to the facial beauty prediction problem for feature extraction. Experimental results show that the transferred deep features attain better performance than traditional image feature descriptors such as HOG, LBP, and raw grayscale values.

  • We provide a detailed analysis of the deep features obtained through knowledge adaptation. Additionally, we apply an effective feature fusion strategy to build more informative facial features for the facial beauty prediction task.

  • Neural networks are often criticized for their lack of interpretability. We conduct ablative studies by visualizing the face features and reveal the elements that influence facial beauty perception.

The rest of this paper is organized as follows. Section II reviews related work on facial descriptors and learning methods. Section III describes our proposed method in detail, including deep feature extraction and Bayesian ridge regression. Experimental results and comparisons are presented in Section IV, and Section V concludes the paper with a summary and future work.

II Related Work

II-A Facial Descriptors and Machine Learning Predictors

Many researchers focus on developing new machine learning algorithms to achieve better classification or regression performance, while others focus on designing better facial feature descriptors. Zhang et al. [6] combine several low-level face representations and high-level features to form a feature vector and perform feature selection to optimize the feature set. Eisenthal et al. [7] use a vector of gray values created by concatenating the rows or columns of an image. Huang et al. [8] propose a method to learn hierarchical representations with convolutional deep belief networks. Xie et al. [4] resort to deep learning to train a predictor and achieve state-of-the-art performance. Amit et al. [9] use numerous facial features that describe facial geometry, color, and texture to predict facial attractiveness. Lu et al. [10] detect face landmarks with ASM and then extract facial features based on blocked LBP, achieving a Pearson correlation of 0.874 on 400 high-quality female face images. Zhang et al. [11] compute geometric distances between feature points and ratio vectors composed of geometric distances, and treat them as features for machine learning algorithms. Due to the lack of abundant labeled images, it often takes considerable time to fine-tune deep neural network architectures and parameters to achieve competitive results while avoiding overfitting.

In addition, some research works focus on developing or improving machine learning algorithms. Eisenthal et al. [7] employ KNN and SVM as classifiers to rate faces belonging to different attractiveness levels. Gan et al. [12] use deep self-taught learning to obtain hierarchical representations and learn the concept of facial beauty. Xu et al. [13] construct a convolutional neural network (CNN) for facial beauty prediction using a new deep cascaded fine-tuning scheme with various face input channels. Wang et al. [14] use deep autoencoders to extract features and apply a low-rank fusion method to integrate scores, achieving promising results. Xu et al. [15] propose a psychologically inspired CNN (PI-CNN) for automatic facial beauty prediction.

II-B Deep CNN and Transfer Learning

Deep learning allows computational models composed of multiple processing layers to learn representations of data with several levels of abstraction [16]. A CNN is a type of neural network designed to process data that come in the form of multiple arrays. Deep learning has proved to be a powerful tool in computer vision tasks such as image recognition [17, 18, 19, 20]. Features are automatically extracted by the stacked layers, and the network is trained with the back-propagation algorithm to minimize a cost function.

Deep convolutional neural networks show far greater capacity for feature extraction than traditional hand-crafted descriptors. However, we may need to design different network architectures and train the networks almost from scratch for each new task, which imposes a heavy computational burden. Transfer learning allows us to fine-tune the higher layers of a pretrained model, or even simply treat the pretrained model as a feature extractor.

Yosinski et al. [21] show that initializing a network with features transferred from almost any number of layers can improve generalization, even after fine-tuning on the target dataset. Bengio [22] explores why unsupervised pre-training of representations can be useful and how it can be exploited in the transfer learning scenario. Donahue et al. [23] show that features extracted by deep convolutional neural networks pretrained on ImageNet outperform many algorithms on a wide range of classification tasks, which illustrates the generality and transferability of deep convolutional features.

III Method

III-A VGG Network

We include a brief review of VGG, which is employed by our proposed method. VGG [18] consists of 16–19 weight layers and uses very small (3×3) convolution filters. Fig. 1 shows the overall architecture of the VGG16 network. Though the VGG architecture is simple, it is widely used in many computer vision tasks. In our experiments, we take a VGG face model pretrained on a face verification task [24]. Although the original task is quite different from our facial beauty prediction task, the model shows remarkably good performance, which we attribute to the extraordinary feature representation power of deep CNNs.

Fig. 1: Network architecture. We adopt VGG16 in our feature extraction procedure; it is composed of stacks of small convolutional filters, which extract more informative features than the larger filters used in AlexNet [17] for the ImageNet recognition task.

III-B Deep Feature Extraction

Several research works [22, 16] show that deep convolutional neural networks learn increasingly powerful representations as the feature hierarchy becomes deeper. However, due to the limited number of labeled face images, training a deep convolutional neural network directly may cause severe overfitting. Recently, transfer learning has attracted much attention [25]; it enables us to fine-tune from a pretrained model or simply treat the learned network as a feature extractor for our own task [21].

We extract facial features with the VGG face model [24] pretrained on a face verification task. Although the target task differs from our facial beauty prediction task, the features achieve remarkable performance, which indicates the strong feature representation power of CNNs. Research [21] shows that features in lower layers contain more detailed information, while features in higher layers carry more semantic meaning. Our method therefore concatenates features from a relatively low layer and a relatively high layer as the facial representation. We also use HOG, grayscale, and LBP features in our experiments for comparison, to evaluate the feature extraction capacity of deep CNNs.
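The sketch below illustrates this extraction step under two assumptions: it uses the ImageNet-pretrained VGG16 bundled with Keras as a stand-in for the VGG-Face weights [24] (which would have to be loaded separately), and it picks the block4_conv1/block5_conv1 layers to mirror the conv4_1/conv5_1 combination used later in our pipeline.

```python
# Minimal sketch: concatenate activations from a low and a high VGG16 layer.
# Assumption: ImageNet-pretrained VGG16 stands in for the VGG-Face model.
import numpy as np
import tensorflow as tf

base = tf.keras.applications.VGG16(weights="imagenet", include_top=False,
                                   input_shape=(224, 224, 3))
feature_model = tf.keras.Model(
    inputs=base.input,
    outputs=[base.get_layer("block4_conv1").output,   # "conv4_1"
             base.get_layer("block5_conv1").output])  # "conv5_1"

def extract_features(image):
    """image: preprocessed float32 array of shape (224, 224, 3)."""
    low, high = feature_model.predict(image[np.newaxis], verbose=0)
    # Flatten both feature maps and join them into a single feature vector.
    return np.concatenate([low.ravel(), high.ravel()])
```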

III-C Bayesian Ridge Regression

We feed the concatenated feature vectors into a Bayesian ridge regressor. Bayesian ridge regression includes regularization in the estimation procedure: the regularization term is not embedded in the cost function with a fixed weight, but is adapted to the data at hand. The $\ell_2$ regularization used in Bayesian ridge regression is equivalent to maximum a posteriori estimation of the parameters $w$ with precision $\lambda$ under a Gaussian prior.

The output $y$ is assumed to be Gaussian distributed around $Xw$ in order to form a fully probabilistic model:

$$p(y \mid X, w, \alpha) = \mathcal{N}(y \mid Xw, \alpha) \qquad (1)$$

The Bayesian ridge regressor estimates a probabilistic model of the regression problem. The prior for the parameter $w$ is given by a spherical Gaussian:

$$p(w \mid \lambda) = \mathcal{N}(w \mid 0, \lambda^{-1} I_p) \qquad (2)$$

The priors over $\alpha$ and $\lambda$ are chosen to be Gamma distributions, the conjugate prior for the precision of a Gaussian.

The parameters $w$, $\alpha$, and $\lambda$ are estimated jointly during the fitting procedure. The remaining hyperparameters are the parameters of the Gamma priors over $\alpha$ and $\lambda$. All parameters are tuned by maximizing the marginal log likelihood.
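In practice this regressor is available off the shelf; a minimal sketch with scikit-learn [27] follows, where the feature matrices and score vectors are hypothetical placeholders standing in for the flattened deep features and average beauty ratings.

```python
# Sketch: Bayesian ridge regression on (placeholder) deep feature vectors.
import numpy as np
from sklearn.linear_model import BayesianRidge

# Hypothetical stand-ins for the real data: X rows would be the flattened
# conv4_1+conv5_1 features, y the average beauty scores of the training split.
rng = np.random.default_rng(0)
X_train, y_train = rng.normal(size=(400, 512)), rng.uniform(1, 5, size=400)
X_test = rng.normal(size=(100, 512))

regressor = BayesianRidge()        # alpha and lambda are estimated from data
regressor.fit(X_train, y_train)    # fitting maximizes the marginal log likelihood
y_pred = regressor.predict(X_test)
```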

IV Experiments

We implement our method with TensorFlow [26] and Scikit-Learn [27] on an Ubuntu server with an NVIDIA Tesla K80 GPU and an Intel Xeon CPU.

IV-A SCUT-FBP Dataset

The SCUT-FBP dataset [4] contains images of 500 Asian females. Each image is scored by 10 raters, and the task is to build a computational model that predicts the average score of each portrait image.

Since the images in SCUT-FBP [4] are not all of the same size, while deep CNNs require a fixed-size square input, we use three settings named "Crop", "Warp", and "Padding" to obtain square images. In the "Crop" setting, we detect the face with the detector provided by [28], crop the face region, and resize it to 224×224. In the "Warp" setting, we simply warp the image to 224×224 regardless of its aspect ratio. In the "Padding" setting, we resize the longer side to 224 and zero-pad the shorter side to form a 224×224 image (see Fig. 2). We also normalize the input image by subtracting the mean and dividing by the standard deviation of the pixels. Furthermore, if face detection fails, we manually crop the central region of the image and use it as the input to our neural network. On the SCUT-FBP dataset, we concatenate the conv5_1 and conv4_1 layers' features. The pipeline is shown in Fig. 3:

(a) Original
(b) Crop
(c) Warp
(d) Padding
Fig. 2: Different settings to form a square image: (a) Original image from SCUT-FBP. (b) Cropped image. (c) Warped image. (d) Padded image. We conduct these experiments to see whether facial beauty perception is correlated with non-facial elements such as hairstyle, clothing, and posture.
Fig. 3: Pipeline of our proposed method. The face is detected and fed into the CNN; we concatenate the conv4_1 and conv5_1 layers' feature maps and flatten them into feature vectors that serve as the input to Bayesian ridge regression.
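A minimal sketch of the "Warp" and "Padding" settings and the pixel normalization is given below, assuming OpenCV for image handling; the "Crop" setting additionally requires a face detector (we use dlib [28]) to locate the facial region first.

```python
# Sketch of the "Warp" / "Padding" preprocessing and pixel normalization.
import cv2
import numpy as np

TARGET = 224  # VGG input resolution

def warp(img):
    # Force the image into a 224x224 square regardless of aspect ratio.
    return cv2.resize(img, (TARGET, TARGET))

def pad(img):
    # Resize the longer side to 224 and zero-pad the shorter side.
    h, w = img.shape[:2]
    scale = TARGET / max(h, w)
    resized = cv2.resize(img, (int(round(w * scale)), int(round(h * scale))))
    rh, rw = resized.shape[:2]
    top, left = (TARGET - rh) // 2, (TARGET - rw) // 2
    return cv2.copyMakeBorder(resized, top, TARGET - rh - top,
                              left, TARGET - rw - left,
                              cv2.BORDER_CONSTANT, value=0)

def normalize(img):
    # Subtract the pixel mean and divide by the pixel standard deviation.
    img = img.astype(np.float32)
    return (img - img.mean()) / (img.std() + 1e-8)
```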

IV-B Performance Evaluation

In our experiments, we use Pearson Correlation (PC), Mean Absolute Error (MAE), and Root Mean Squared Error (RMSE) as the criteria for evaluating our method:

$$\mathrm{PC} = \frac{\sum_{i=1}^{m}\big(f(x_i)-\bar{f}\big)\big(y_i-\bar{y}\big)}{\sqrt{\sum_{i=1}^{m}\big(f(x_i)-\bar{f}\big)^2}\,\sqrt{\sum_{i=1}^{m}\big(y_i-\bar{y}\big)^2}} \qquad (3)$$

$$\mathrm{MAE} = \frac{1}{m}\sum_{i=1}^{m}\big|f(x_i)-y_i\big| \qquad (4)$$

$$\mathrm{RMSE} = \sqrt{\frac{1}{m}\sum_{i=1}^{m}\big(f(x_i)-y_i\big)^2} \qquad (5)$$

where $m$ denotes the number of images, $x_i$ denotes the input feature vector of image $i$, $f$ denotes the learning algorithm, $y_i$ denotes the ground-truth attractiveness score of image $i$, and $\bar{f}$ and $\bar{y}$ denote the means of the predictions and the ground-truth scores, respectively.

MAE and RMSE measure the fit quality of the learning algorithm; performance is better when their values are closer to zero. PC measures the linear correlation between $f(x_i)$ and $y_i$. Its value lies between −1 and 1, where 1 means perfect positive linear correlation, 0 means no linear correlation, and −1 means perfect negative linear correlation.
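These criteria are straightforward to compute; a short sketch with NumPy follows, assuming y_true and y_pred are one-dimensional arrays of ground-truth and predicted scores.

```python
# Sketch: the three evaluation criteria (MAE, RMSE, Pearson correlation).
import numpy as np

def mae(y_true, y_pred):
    return np.mean(np.abs(y_pred - y_true))

def rmse(y_true, y_pred):
    return np.sqrt(np.mean((y_pred - y_true) ** 2))

def pearson_correlation(y_true, y_pred):
    return np.corrcoef(y_true, y_pred)[0, 1]
```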

To make the prediction reliable and reproducible, we follow the protocol described in [4] for fair comparison. We randomly select 400 images as the training set and the remaining 100 images as the test set. Finally, we average the results of 5 such runs as the final performance to reduce sampling variance. The results are shown in TABLE I.

Round MAE RMSE PC
1 0.2569 0.3418 0.8735
2 0.2594 0.3470 0.8299
3 0.2479 0.3117 0.8929
4 0.2651 0.3473 0.8562
5 0.2680 0.3508 0.8323
AVG 0.2595 0.3397 0.8570
  • Performance on SCUT-FBP [4] over 5 rounds. We average the 5 results to reduce sample variance, as described in [4].

TABLE I: AVERAGE PERFORMANCE

TABLE II shows the performance comparison with other methods. The best performance is marked in bold and the second best is underlined. Our method ranks second on the SCUT-FBP dataset [4].

Method RMSE MAE PC
KeyPointGabor+PCA+SVR 0.5606 0.5541 0.5490
KeyPointGabor+PCA+Gaussian Reg 0.6152 0.4724 0.4591
UniSampleGabor+PCA+SVR 0.5452 0.4230 0.5847
UniSampleGabor+PCA+Gaussian Reg 0.5164 0.3969 0.6347
Combined Features+SVR [4] 0.5120 0.3961 0.6433
Combined Features+Gaussian Reg [4] 0.5149 0.3931 0.6482
CNN-based [4] - - 0.8187
PI-CNN [15] - - 0.87
Ours 0.2595 0.3397 0.8570
  • Performance comparison with other methods. Our method ranks second on PC and first on RMSE and MAE. The best and second-best results are emphasized in bold and underline, respectively. The RMSE and MAE of the CNN-based methods proposed in [4] and [15] are not reported and are therefore denoted with "-".

TABLE II: PREDICTOR PERFORMANCE COMPARISON

IV-C Ablation Analysis

It is common sense in machine learning practice that "features matter". To illustrate the feature extraction capability of deep learning, we conduct experiments with different features, including HOG, LBP, raw grayscale values, and transferred deep features, for performance comparison and visualization (a sketch of how these baselines can be computed is given after the list below):

Fig. 4: Visualization of the original portrait images and their corresponding features described by different feature extractors: (top) original portrait images; (middle) visualization of HOG descriptors; (bottom) visualization of LBP descriptors.
  • Raw Grayscale: we convert the RGB facial images into their corresponding gray scale ones, and the flattened pixel gray scale value is used as the feature.

  • HOG: HOG is an image feature descriptor which is widely used in computer vision and image processing for object detection tasks. Details can be found in [29].

  • LBP: LBP is a type of feature descriptor which especially cares more about texture details, and is widely used in many machine vision tasks.
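The following sketch shows one way these baseline descriptors can be computed with scikit-image; the parameter choices are illustrative and not necessarily those used in our experiments.

```python
# Sketch of the baseline feature descriptors used for comparison.
import numpy as np
from skimage.color import rgb2gray
from skimage.feature import hog, local_binary_pattern

def gray_feature(rgb_img):
    # Flattened grayscale pixel values.
    return rgb2gray(rgb_img).ravel()

def hog_feature(rgb_img):
    # Histogram of oriented gradients [29] with illustrative parameters.
    return hog(rgb2gray(rgb_img), orientations=9,
               pixels_per_cell=(8, 8), cells_per_block=(2, 2))

def lbp_feature(rgb_img, points=8, radius=1):
    # Uniform local binary patterns summarized as a normalized histogram.
    lbp = local_binary_pattern(rgb2gray(rgb_img), points, radius,
                               method="uniform")
    hist, _ = np.histogram(lbp, bins=points + 2, range=(0, points + 2),
                           density=True)
    return hist
```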

Feature RMSE MAE PC
TransCNN 0.2595 0.3397 0.8570
HOG 0.3308 0.4394 0.6216
LBP 0.3987 0.4800 0.5631
Gray Scale 0.4008 0.4889 0.5149
  • Performance comparison with other feature descriptors under Bayesian ridge regression. Our transferred deep features outperform the other descriptors by a large margin. The best results are given in bold.

TABLE III: PERFORMANCE COMPARISON BETWEEN DIFFERENT FEATURES

In addition, we compare the performance of features taken from different layers to find which layer produces the most discriminative features (see Fig. 5).

Moreover, among the three preprocessing methods (Crop, Warp, and Padding), Crop achieves the best performance on SCUT-FBP, which indicates that the facial region plays the most significant part in beauty perception, while the background may act as noise in the facial beauty prediction task on SCUT-FBP (see TABLE IV).

Fig. 5: Performance comparison between different layers: the performance improves as the layers go deeper, which means the deep CNN extracts increasingly discriminative features. It decreases sharply after the max pooling operation, which may be attributed to heavy spatial information loss.

Fig. 5 shows that the performance improves as the layers go deeper and reaches its best at conv5_1. However, once the feature maps are flattened into vectors, we observe a sharp drop in performance, which may be attributed to the heavy loss of spatial information.

Crop Warp Padding
PC 0.8570 0.7376 0.8255
  • Performance of different preprocessing methods ("Crop", "Warp", and "Padding") on SCUT-FBP. "Crop" achieves the best result.

TABLE IV: PEARSON CORRELATION OF DIFFERENT PREPROCESSING METHODS

IV-D ECCV HotOrNot Dataset

The ECCV HotOrNot dataset [5] contains 2056 faces collected from the Internet. Each face is labeled with a score, and the dataset has already been split into 5 training/test partitions. Unlike the SCUT-FBP dataset [4], the faces in the ECCV HotOrNot dataset [5] are more challenging because of varied postures, cluttered backgrounds, illumination changes, low resolution, and unaligned faces, which make facial beauty prediction more difficult (see Fig. 6).

The ECCV HotOrNot dataset uses Pearson Correlation (PC) as the performance metric. We also report MAE and RMSE for a more detailed comparison.

(a)
(b)
(c)
(d)
(e)
(f)
(g)
(h)
Fig. 6: ECCV HotOrNot face samples: this dataset is more challenging due to low resolution (b), illumination problems (c), grayscale images (d), occlusion (e), different races (f), cluttered backgrounds (g), and unaligned postures (h). Aligned samples look like (a).

IV-E Ablation Study

We concatenate the conv4_1 and conv5_1 layers' feature maps and flatten them to form a more informative feature vector. The concatenated features are then fed into the Bayesian ridge regression algorithm [30].

We evaluate two preprocessing pipelines. In solution A, we run the face detector [28] to locate 68 facial landmarks and the facial region. For grayscale images, we replicate the gray channel to form a three-channel RGB image. We then calculate the inclination angle θ between the line connecting the two eye centers and the horizontal. If θ exceeds a threshold, we rotate the face around the central point to make the eye line horizontal and crop the facial region. The mean pixel value is subtracted from the cropped image, which is then divided by its standard deviation. Solution B only applies mean subtraction and standard-deviation division to the original images; no additional preprocessing is performed.
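A minimal sketch of the rotation step in solution A is given below, assuming the two eye centers have already been located by the landmark detector; the threshold value here is illustrative only.

```python
# Sketch: level the eye line before cropping (solution A), given eye centers.
import cv2
import numpy as np

def align_by_eyes(img, left_eye, right_eye, threshold_deg=5.0):
    """left_eye / right_eye: (x, y) eye-center coordinates from a landmark
    detector such as dlib [28]; threshold_deg is an illustrative value."""
    dx = right_eye[0] - left_eye[0]
    dy = right_eye[1] - left_eye[1]
    theta = np.degrees(np.arctan2(dy, dx))  # inclination to the horizontal
    if abs(theta) <= threshold_deg:
        return img
    h, w = img.shape[:2]
    rot = cv2.getRotationMatrix2D((w / 2.0, h / 2.0), theta, 1.0)
    return cv2.warpAffine(img, rot, (w, h))  # eye line becomes horizontal
```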

Fig. 7: Samples with large prediction error ε, i.e., samples that are not well predicted by our algorithm.
Fig. 8: Samples with small prediction error ε, i.e., samples that are well fitted by our algorithm.
solution A solution B
Dataset RMSE MAE PC RMSE MAE PC
1 0.9417 1.1948 0.3970 0.9140 1.1493 0.4656
2 0.9755 1.2406 0.4022 0.9210 1.1562 0.4728
3 0.9293 1.1810 0.3840 0.8989 1.1258 0.4694
4 0.9469 1.1788 0.3898 0.8736 1.1087 0.4775
5 0.9394 1.1856 0.3862 0.9104 1.1316 0.4541
Average 0.9466 1.1962 0.3918 0.9036 1.1343 0.4679
  • The ECCV HotOrNot dataset [5] is divided into 5 partitions, each containing a training set and a test set. We compare the performance of solution A and solution B. Somewhat surprisingly, solution B achieves better results by a large margin, which may be explained by the extra non-facial information such as hairstyle, clothing, and posture.

TABLE V: PERFORMANCE ON ECCV HOTORNOT DATASET

We find that solution B achieves much better performance than solution A; the results are shown in TABLE V. We believe the main reason is that the annotators may also have taken extra information such as hairstyle, posture, and clothing into consideration when labeling the beauty scores, instead of judging the face region alone.

Additionally, we define ε = |ŷ − y|, which describes the error between the predicted facial beauty score (ŷ) and the ground-truth beauty score (y). If ε is large, there is a relatively severe bias between the predicted value and the ground-truth score; if ε is small, our algorithm fits the sample well.

In this part, we select the samples with the largest and the smallest ε for detailed analysis (see Fig. 7 and Fig. 8). We believe the performance could be greatly improved through face alignment techniques. Besides, posture and facial expression may also contribute to beauty perception, because our algorithm fails on samples with strongly varied postures.

Table VI compares the Pearson Correlation of our proposed method with five state-of-the-art methods. Our method outperforms the other methods and achieves the best performance on the ECCV HotOrNot dataset without face alignment.

Method PC
Eigenface 0.180
Single Layer Model 0.417
Two Layer Model 0.438
Multiscale Model [5] 0.458
Auto Encoder [14] 0.437
Ours 0.468
  • Performance comparison on the ECCV HotOrNot dataset [5]. Pearson Correlation (PC) is used to evaluate performance. Our method achieves the best result among the methods reported on this dataset.

TABLE VI: PEARSON CORRELATION ON THE ECCV HOTORNOT DATASET

V Conclusion

In this paper, we propose a method that extracts rich deep facial features through knowledge adaptation and then trains a Bayesian ridge regression algorithm for facial beauty prediction. Although the VGG model is pretrained for a completely different task, it captures more descriptive information than conventional hand-crafted features and even outperforms many deep learning-based methods on our facial beauty prediction task, which demonstrates the generality of deep features in transfer learning. With our feature fusion strategy, the method achieves state-of-the-art performance on the ECCV HotOrNot dataset [5] without face alignment and comparable performance on the SCUT-FBP dataset [4]. In future work, we plan to explore 3D face alignment and novel network architectures to extract more descriptive features.

Acknowledgment

This work was primarily supported by the Fundamental Research Funds for the Central Universities (Program No. 2662017JC049) and the State Scholarship Fund (No. 261606765054).

References

  • [1] D. I. Perrett, K. A. May, and S. Yoshikawa, “Facial shape and judgements of female attractiveness,” Nature, vol. 368, no. 6468, pp. 239–42, 1994.
  • [2] Y. Liu, Z. Xie, X. Yuan, J. Chen, and W. Song, “Multi-level structured hybrid forest for joint head detection and pose estimation,” Neurocomputing, vol. 266, no. Aug, pp. 206–215, 2017.
  • [3] D. Zhang, F. Chen, and Y. Xu, Computer models for facial beauty analysis.   Springer, 2016.
  • [4] D. Xie, L. Liang, L. Jin, J. Xu, and M. Li, “Scut-fbp: A benchmark dataset for facial beauty perception,” in Systems, Man, and Cybernetics (SMC), 2015 IEEE International Conference on.   IEEE, 2015, pp. 1821–1826.
  • [5] D. Gray, K. Yu, W. Xu, and Y. Gong, “Predicting facial beauty without landmarks,” ECCV 2010, pp. 434–447, 2010.
  • [6] F. Chen, X. Xiao, and D. Zhang, “Data-driven facial beauty analysis: Prediction, retrieval and manipulation,” IEEE Transactions on Affective Computing, 2016.
  • [7] Y. Eisenthal, G. Dror, and E. Ruppin, “Facial attractiveness: Beauty and the machine,” Neural Computation, vol. 18, no. 1, pp. 119–142, 2006.
  • [8] G. B. Huang, H. Lee, and E. Learned-Miller, “Learning hierarchical representations for face verification with convolutional deep belief networks,” in Computer Vision and Pattern Recognition (CVPR), 2012 IEEE Conference on.   IEEE, 2012, pp. 2518–2525.
  • [9] A. Kagian, G. Dror, T. Leyvand, D. Cohen-Or, and E. Ruppin, “A humanlike predictor of facial attractiveness,” in Advances in Neural Information Processing Systems, 2007, pp. 649–656.
  • [10] G. Lu, X. Xiao, and F. Chen, “A new face beauty prediction model based on blocked lbp,” in International Conference on Computer Vision Theory and Applications, 2016, pp. 87–92.
  • [11] D. Zhang, Q. Zhao, and F. Chen, “Quantitative analysis of human facial beauty using geometric features,” Pattern Recognition, vol. 44, no. 4, pp. 940–950, 2011.
  • [12] J. Gan, L. Li, Y. Zhai, and Y. Liu, “Deep self-taught learning for facial beauty prediction,” Neurocomputing, vol. 144, pp. 295–303, 2014.
  • [13] J. Xu, L. Jin, L. Liang, Z. Feng, and D. Xie, “A new humanlike facial attractiveness predictor with cascaded fine-tuning deep learning model,” arXiv preprint arXiv:1511.02465, 2015.
  • [14] S. Wang, M. Shao, and Y. Fu, “Attractive or not?: Beauty prediction with attractiveness-aware encoders and robust late fusion,” in Proceedings of the 22nd ACM international conference on Multimedia.   ACM, 2014, pp. 805–808.
  • [15] J. Xu, L. Jin, L. Liang, Z. Feng, D. Xie, and H. Mao, “Facial attractiveness prediction using psychologically inspired convolutional neural network (pi-cnn),” in IEEE International Conference on Acoustics, Speech and Signal Processing, 2017, pp. 1657–1661.
  • [16] Y. LeCun, Y. Bengio, and G. Hinton, “Deep learning,” Nature, vol. 521, no. 7553, pp. 436–444, 2015.
  • [17] A. Krizhevsky, I. Sutskever, and G. E. Hinton, “Imagenet classification with deep convolutional neural networks,” in Advances in neural information processing systems, 2012, pp. 1097–1105.
  • [18] K. Simonyan and A. Zisserman, “Very deep convolutional networks for large-scale image recognition,” arXiv preprint arXiv:1409.1556, 2014.
  • [19] C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, and A. Rabinovich, “Going deeper with convolutions,” in Proceedings of the IEEE conference on CVPR, 2015, pp. 1–9.
  • [20] K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” in Proceedings of the IEEE conference on CVPR, 2016, pp. 770–778.
  • [21] J. Yosinski, J. Clune, Y. Bengio, and H. Lipson, “How transferable are features in deep neural networks?” in Advances in neural information processing systems, 2014, pp. 3320–3328.
  • [22] Y. Bengio, “Deep learning of representations for unsupervised and transfer learning,” in Proceedings of ICML Workshop on Unsupervised and Transfer Learning, 2012, pp. 17–36.
  • [23] J. Donahue, Y. Jia, O. Vinyals, J. Hoffman, N. Zhang, E. Tzeng, and T. Darrell, “Decaf: A deep convolutional activation feature for generic visual recognition,” in International conference on machine learning, 2014, pp. 647–655.
  • [24] O. M. Parkhi, A. Vedaldi, A. Zisserman et al., “Deep face recognition.” in BMVC, vol. 1, no. 3, 2015, p. 6.
  • [25] X. Yuan, D. Li, D. Mohapatra, and M. Elhoseny, “Automatic removal of complex shadows from indoor videos using transfer learning and dynamic thresholding,” Computers and Electrical Engineering, in press, doi:10.1016/j.compeleceng.2017.12.026, 2018.
  • [26] M. Abadi, A. Agarwal, P. Barham, E. Brevdo, Z. Chen, C. Citro, G. S. Corrado, A. Davis, J. Dean, M. Devin et al., “Tensorflow: Large-scale machine learning on heterogeneous distributed systems,” arXiv preprint arXiv:1603.04467, 2016.
  • [27] F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg et al., “Scikit-learn: Machine learning in python,” J. of Machine Learning Research, vol. 12, pp. 2825–2830, 2011.
  • [28] D. E. King, “Dlib-ml: A machine learning toolkit,” Journal of Machine Learning Research, vol. 10, pp. 1755–1758, 2009.
  • [29] N. Dalal and B. Triggs, “Histograms of oriented gradients for human detection,” in Computer Vision and Pattern Recognition, 2005. CVPR 2005. IEEE Computer Society Conference on, vol. 1.   IEEE, 2005, pp. 886–893.
  • [30] D. J. MacKay, “Bayesian interpolation,” Neural Computation, vol. 4, no. 3, pp. 415–447, 1992.