Introduction
Face alignment, which aims at accurately and robustly localizing facial landmarks, plays a key role in many automatic facial analysis tasks, including face recognition, expression recognition, attribute analysis, and animation. Recently, cascaded regression has become one of the most popular approaches to face alignment due to its accuracy and robustness
[Ren et al.2014, Xiong and De la Torre2013, Kowalski, Naruniec, and Trzcinski2017]. This approach learns a series of regressors between shape-indexed features and shape updates or gradients from a set of manually labeled face images. Inevitably, the performance of cascaded regression depends heavily on the quantity and quality of training examples. The quantity of unlabeled facial images is not a problem in this 'big data' era, but example labeling and the quality of labels are still critical. In this study, we focus on these critical issues for cascaded regression.
Despite its great success, the discrepancy or mismatch between limited training examples and the huge solution space typically degrades the stability and accuracy of cascaded regression. One typical treatment is to divide the original shape space into smaller subspaces [Zhu et al.2016, Tuzel, Marks, and Tambe2016]. Researchers also attempt to group relevant input features to mitigate mismatches [Cao et al.2014, Ren et al.2014]. The cascade Gaussian process (GP) regression trees find input features showing consistent appearance through GP kernel functions [Lee, Park, and Yoo2015]. The common strategy of these methods is that they 'tighten' the correlation between input features and target shape from the perspective of local appearance.
Alternatively, researchers resort to the global geometry (shape) among facial landmarks in order to address the discrepancy issue. Martinez et al. embed nonparametric Markov networks [Martinez et al.2013], while Liu et al. incorporate sparse shape constraints into regression [Liu, Deng, and Tao2016]. In addition to these explicit shape models, Li et al. discover the common geometry shared by human faces using a projective invariant, called characteristic number (CN), and append this geometric regression to appearance [Li et al.2015]. These various forms of facial geometric representation are able to regularize the regression, and thus improve the robustness of alignment.
It is commonly accepted in the machine learning (ML) community that training examples are central to any ML algorithm, including regression. Unfortunately, the aforementioned alignment algorithms pay more attention to the regression mechanism than to the data itself when tackling the issues arising from data discrepancy. Targeting data preparation for training and validating regressors, Sagonas et al. develop a semi-automatic tool to annotate facial landmarks [Sagonas et al.2013], but how these annotations may affect regression is not examined in their study. Antonakos et al. generate bounding boxes as face labels and validate these labels in the context of linear parametric models, but not of more complex cascaded regression [Antonakos and Zafeiriou2014]. Recently, Zhang et al. develop a complicated deep network to leverage face annotations across data sets [Zhang et al.2015]. Nevertheless, a general framework that fuses the discovery and upgrading of low-discrepancy training examples into cascaded regression for face alignment is still in high demand.
Self-reinforcement refers to “a process whereby individuals control their own behavior by rewarding themselves when a certain standard of performance has been attained or surpassed” [Artino2011]. In this paper, we propose self-reinforced cascaded regression that upgrades itself by minimizing an objective function analogous to meeting the performance standard. The optimization process iteratively updates example labeling, sample survival, and regression in one framework, as shown in Fig. 1. The process starts by predicting labels for unlabeled faces with the regression trained from a small number of labeled examples, and then evaluates the consistency of the predicted labels with both the local appearance and the global geometry of human faces. The surviving examples are fed to train an upgraded regression. This process runs iteratively until convergence, yielding the cascaded regression for accurate and robust alignment.
The objective in our framework is not directly defined on the consistency between predicted labels and the ground truth, as in typical semi-supervised learning [Zhu and Goldberg2009], which carries the risk of overfitting, but is derived from indirect consistency with local appearance and global geometry. This independence from the regressor makes the strategy general enough to produce self-reinforced versions of various cascaded regression algorithms. We demonstrate that our strategy is able to automatically predict and find good examples starting from a subset as small as one hundred for typical regressors [Ren et al.2014] and [Zhu et al.2015], and even deep networks [Kowalski, Naruniec, and Trzcinski2017]. These self-reinforced regressions achieve accuracy comparable with the state of the art on the 300W set consisting of the test sets of LFPW and Helen [Le et al.2012] when only a small fraction of labeled examples is available, validating the effectiveness of the strategy.
Related Work
In this section, we review recent advances in labeling or generating examples in the machine learning community.
Semi-supervised learning attempts to use unlabeled data to improve the performance of classifiers trained on a small number of labeled examples [Zhu and Goldberg2009]. It has made great progress on discrete classification problems in this decade [Li and Fu2013, Li and Zhou2015]. However, it is nontrivial to directly carry semi-supervised algorithms for discrete problems over to cascaded regression, where target shape updates are continuous and the solution space is huge. Self-paced learning (SPL), which falls in the category of semi-supervised learning, includes training samples in an easy-to-complex fashion [Jiang et al.2014, Singh et al.2015]. Our approach resembles SPL in embedding example selection in the training process, but differs in that our selection objective is general and decoupled from the training objective.
A generative adversarial network (GAN) [Goodfellow et al.2014] is able to generate visually realistic images by pitting two deep networks, a generator and a discriminator, against each other. Recently, GANs have found wide application in many low-level image processing tasks such as super-resolution [Ledig et al.2016] and image attribute transfer [Huang et al.2017]. Semi-supervised learning can also be combined with a GAN in order to improve the realism of a simulator’s output while preserving the annotation information [Shrivastava et al.2016]. Our example prediction and survival are similar in spirit to the generative and discriminative processes in a GAN, respectively. However, a GAN has to be initialized from a relatively large number of examples to train the two deep networks, and provides no explicit regressor as self-reinforced regression does.
Self-Reinforced Cascaded Regression
We describe our self-reinforced cascaded regression, which defines an objective function with a local appearance discrepancy and a global geometry discrepancy to iteratively expand the training set and simultaneously upgrade the regressor, as shown in Fig. 1.
General formulation
We attempt to devise a general formulation in which self-reinforcement is embedded in cascaded regression. Typical cascaded regression minimizes a loss function
, where is the annotated shape of the th sample in the training set . The symbol indicates the shape-indexed feature of the th sample image, and denotes the parameters of the learnt regressor. We denote as the regularization term and as a hyperparameter, and thus have a general representation for cascaded regression as follows:
(1)
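As a reading aid, the structure of representation (1) described above, a loss summed over the training set plus a weighted regularization term, can be sketched in LaTeX. The symbol names are assumptions chosen here for illustration and may differ from the original notation: $\mathbf{s}_i$ for the annotated shape, $\mathbf{x}_i$ for the shape-indexed feature, $\boldsymbol{\theta}$ for the regressor parameters, $\mathcal{L}$ for the loss, $\Phi$ for the regularization term, $\lambda$ for the hyperparameter, and $\mathcal{D}$ for the training set.

```latex
% Sketch of the general cascaded-regression objective (assumed notation)
\min_{\boldsymbol{\theta}} \;
  \sum_{i \in \mathcal{D}}
    \mathcal{L}\bigl(\mathbf{s}_i, \mathbf{x}_i; \boldsymbol{\theta}\bigr)
  \;+\; \lambda \, \Phi(\boldsymbol{\theta})
```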
Given the cascaded regression representation (1), we impose a regularization term to formulate the iterative reinforcement of predicted examples as:
(2)  
where the subscript indicates the th iteration. The training set for the regression
includes either manually labeled or originally unlabeled examples with predicted annotations. The vector
consists of the binary that indicates whether the th sample is accurately labeled or not, and the parameter is a weight that determines the number of surviving samples. The increase of during the iterations leads to including more samples for regression.
The objective function (2) brings the regression , shape labels and example selection into one general framework whose optimization jointly upgrades all these factors. Consequently, the optimization of this objective is a complicated problem mixing continuous and discrete variables. We resort to an iterative approximation to find the solution of (2). First, we fix and to find the optimal regression parameters . The problem (2) then reduces to conventional cascaded regression (1), e.g., [Ren et al.2014] and [Zhu et al.2015], as detailed in the next section. For initialization, is set to one if the sample is manually labeled and to zero otherwise.
Once the trained regression is available, we are able to predict labels for the unlabeled subset or to update the labeled subset. Given fixed and in (2), the update of example labels becomes:
(3) 
This minimization is equivalent to performing a prediction by applying the learned cascaded regression. This update is important in our self-reinforced regression because it not only expands the quantity of examples but also improves the labeling accuracy through the regression trained on the surviving examples of the previous iteration.
Finally, we update with and fixed by reducing (2) to:
(4) 
We compute the indicator upon local appearance and global geometry of human faces:
(5) 
where the parameter weighs appearance and geometry. The value is derived from local appearance, indicating how accurately individual landmarks are labeled, and indicates how well a group of predicted labels satisfies the common geometry of human faces. The calculation of this new regularization term is independent of the regression , which makes it applicable to various regression algorithms.
Remark: The calculation of and acts as the goodness evaluation of individuals (examples), and hence initiates adjusting the behavior (accuracy) of individuals and that of cascaded regression for the next iteration, constructing the self-reinforcement process. The binary indicator specifies whether one label survives or not, echoing the well-known law of nature, “survival of the fittest”. As nature evolves repeatedly, our self-reinforced cascaded regression iteratively upgrades from a small subset of labels until and are stable, as shown in Fig. 1.
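The three alternating steps above (train the regressor, re-predict labels, update the survival indicators) can be illustrated with a minimal Python sketch. Everything named here is an assumption for illustration: `fit_regressor` is a plain least-squares map standing in for a real cascaded regressor such as LBF or CFSS, `discrepancy` is a user-supplied callable standing in for the appearance and geometry terms, and the multiplicative `growth` schedule for the survival threshold `gamma` is only one possible way to admit more samples over the iterations.

```python
import numpy as np


def fit_regressor(features, shapes):
    """Stand-in for training a cascaded regressor: one least-squares linear map."""
    weights, *_ = np.linalg.lstsq(features, shapes, rcond=None)
    return weights


def self_reinforce(features, shapes, labeled, discrepancy,
                   gamma=0.1, growth=1.5, n_iter=10):
    """Alternate between training, label prediction, and example survival.

    features    : (n, d) array, one feature vector per sample
    shapes      : (n, k) array; rows of unlabeled samples are overwritten
    labeled     : (n,) boolean mask of manually labeled samples
    discrepancy : callable(features, shapes) -> (n,) scores, lower is better;
                  it must not depend on the regressor
    gamma       : survival threshold, multiplied by `growth` every iteration
    """
    shapes = shapes.copy()
    survive = labeled.copy()            # initial indicators: manual labels only
    weights = None
    for _ in range(n_iter):
        # Step 1: fix labels and indicators, train on the surviving examples.
        weights = fit_regressor(features[survive], shapes[survive])
        # Step 2: fix the regressor, re-predict labels of unlabeled samples.
        shapes[~labeled] = features[~labeled] @ weights
        # Step 3: fix regressor and labels, update the survival indicators.
        scores = discrepancy(features, shapes)
        survive = labeled | (scores < gamma)
        if survive.all():
            break
        gamma *= growth                 # admit more samples next round
    return weights, shapes, survive
```

Because the discrepancy is passed in as a callable that never sees the regressor, the same loop could wrap a different training routine without changing the selection step, which mirrors the regressor independence argued in the text.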
Local appearance discrepancy
We define as the discrepancy (similarity) among the shape-indexed features (concatenating HOG [Dalal and Triggs2005] and FREAK [Alahi, Ortiz, and Vandergheynst2012]) associated with an individual landmark. Figure 2 shows the patches around three landmarks, i.e., the right corner, the upper boundary of the right eye, and the nose tip, from manually labeled images. The patches around the same landmark exhibit similar appearance, while differing greatly from those around the other landmarks. Hence, the consistency of local patches around a landmark is able to indicate the accuracy of the labeled position.
We take a straightforward approach and train an offline naive Bayes classifier that identifies labels with inconsistent neighboring appearance. We generate the positive and negative samples for training the classifier from the originally labeled subset by assuming that labeled and predicted landmarks are normally distributed. Hence, we randomly perturb the ground truth labels with a normal distribution, and compute the distance
between the ground truth and the perturbed landmark . The feature around a landmark whose is less than a threshold (related to the standard deviation of the Gaussian distribution) is taken as one positive sample for the classifier, and the others as negative samples. This generation scheme is illustrated in Fig. 3, where the white dot denotes the ground truth, the red ones stand for positive samples and the blue ones for negative samples.
Given a predicted landmark, we apply the trained classifier to determine whether the landmark is a valid prediction, and evaluate the local appearance discrepancy for a predicted (or labeled) example as the proportion of valid landmarks in the example:
(6) 
The symbol denotes the set of local features for all landmarks in the th sample, is the number of landmarks, and the local feature vector has components. The classifier output is binary, where one indicates a valid landmark and zero stands for an invalid one.
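The sample generation scheme and the per-example score described above can be sketched as follows. This is only an illustration under stated assumptions: the function names, the isotropic Gaussian with standard deviation `sigma`, and the Euclidean distance are choices made here, and the feature extraction (HOG and FREAK) and the naive Bayes training are omitted; `classifier_outputs` stands for the binary decisions of an already trained classifier.

```python
import numpy as np


def perturb_and_label(ground_truth, sigma, threshold, n_draws, rng):
    """Draw Gaussian perturbations of one landmark and label them by distance.

    Returns the perturbed positions and a boolean array that is True for
    positive samples (distance to the ground truth below `threshold`).
    """
    offsets = rng.normal(scale=sigma, size=(n_draws, 2))
    perturbed = ground_truth + offsets
    distances = np.linalg.norm(offsets, axis=1)
    is_positive = distances < threshold
    return perturbed, is_positive


def valid_landmark_proportion(classifier_outputs):
    """Proportion of landmarks in one example that the classifier marks valid (1)."""
    outputs = np.asarray(classifier_outputs, dtype=float)
    return outputs.mean()
```

For an example with four landmarks of which three are judged valid, `valid_landmark_proportion([1, 1, 0, 1])` evaluates to 0.75.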
Global geometry discrepancy
The above discrepancy can only reflect the local feature consistency around a landmark. We use the intrinsic facial geometry given by a projective invariant, named the characteristic number (CN) [Fan et al.2015], to evaluate the discrepancy of predicted or labeled examples.
Li et al. discover the common geometry on 8 landmarks [Li et al.2015]. Herein, we consider labeling and selecting examples with 68 landmarks. Unfortunately, it is prohibitive to investigate all combinations of these 68 landmarks. We pick 14 landmarks that are stably present in all face examples, shown as the blue points in Fig. 4(a). We enumerate all possible three-point, five-point and six-point combinations of these 14 landmarks (four points cannot construct a projective invariant), and then calculate the CN values of these combinations on all available samples. If a combination presents one common CN value with low standard deviation across all sample images, we take the value as the intrinsic value reflecting the common geometry underlying this landmark combination. Figure 4(b) and (c) show one sample with correctly labeled landmarks and another with an inaccurately labeled landmark, respectively. Their CN values are quite different. We emphasize that this process of seeking combinations with stable intrinsic values runs only once for a large face data set. We verify the CN values of predicted landmark annotations on these fixed combinations in the iterative selection process.
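To give a sense of the size of this one-off search, the sketch below enumerates the three-, five- and six-point combinations of 14 landmarks and filters them by the spread of their CN values. The CN computation itself is not reproduced; `cn_values` is assumed to be a precomputed array with one row per sample and one column per combination, and using the mean as the intrinsic value and `max_std` as the stability cut-off are assumptions made for illustration.

```python
from itertools import combinations

import numpy as np


def enumerate_combinations(n_landmarks=14, sizes=(3, 5, 6)):
    """All landmark index tuples of the given sizes."""
    return [c for k in sizes for c in combinations(range(n_landmarks), k)]


def stable_combinations(cn_values, max_std):
    """Keep combinations whose CN value varies little across samples.

    cn_values : (n_samples, n_combinations) array of precomputed CN values
    Returns the kept column indices and their mean CN values.
    """
    cn_values = np.asarray(cn_values, dtype=float)
    spread = cn_values.std(axis=0)
    keep = np.flatnonzero(spread < max_std)
    return keep, cn_values[:, keep].mean(axis=0)
```

With 14 landmarks the enumeration yields 364 + 2002 + 3003 = 5369 candidate combinations, which is small enough to evaluate exhaustively once.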
It is reasonable to regard a set of landmark annotations (labels) as valid when its CN value falls within a range around its corresponding intrinsic value, recorded as [ ]. Accordingly, the discrepancy for the global geometry is given below:
(7)  
is the th combination of CN values in the th sample, and is the total number of combinations, each of which can give one intrinsic value.
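The validity check on the fixed combinations can be sketched as below. The sketch assumes a symmetric range of half-width `tolerance` around each intrinsic value and returns the proportion of combinations that fall inside it; both the symmetric range and the function name are illustrative assumptions, and whether the discrepancy uses this proportion directly or its complement should be read from (7).

```python
import numpy as np


def geometry_valid_proportion(cn_values, intrinsic_values, tolerance):
    """Proportion of landmark combinations whose CN value lies within
    `tolerance` of the corresponding intrinsic value, for one example."""
    cn_values = np.asarray(cn_values, dtype=float)
    intrinsic_values = np.asarray(intrinsic_values, dtype=float)
    within = np.abs(cn_values - intrinsic_values) <= tolerance
    return within.mean()
```

For instance, with CN values `[1.0, 2.5, -1.0, 0.2]`, intrinsic values `[1.1, 2.0, -1.0, 1.0]` and a tolerance of 0.25, two of the four combinations are valid, so the function returns 0.5.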
Alignment Algorithms
The last regularization term in (2) is independent of the choice of regression, and thus the proposed strategy can readily be embedded into any cascaded regression algorithm. In this section, we exemplify the embedding with two algorithms, LBF [Ren et al.2014] and CFSS [Zhu et al.2015], that balance accuracy and efficiency.
In every iteration, LBF has two updating stages: one for learning local binary features
, and the other for global linear regression
. We pose the learning for the first stage as the minimization of the objective function (8), where is the ground truth 2-dimensional offset of the th landmark in the th training sample, and is the facial image corresponding to the sample:
(8)
Subsequently, we transform (2) into (9) in order to obtain the linear regression in LBF and combine it into our formulation.
(9)  
Comparing (2) with (9), we have and . Consequently, we have the LBF algorithm embedded with our self-reinforcement.
The training of CFSS is to iteratively estimate a finer shape subregion,
, where is the center of the estimated subregion and is the probability distribution depicting the subregion around the center. We simply replace the regression stage in (2) with the iterative training of CFSS. In this case, the regression parameter
indicates , and then we can apply the self-reinforced process to CFSS.
Experimental Results and Analysis
The experiments were performed on six widely used datasets: FRGC v2.0, LFPW, HELEN, AFW, iBUG and 300W. All faces are labeled with 68 landmarks. We compute the alignment error for testing images using the standard mean error normalized by the inter-pupil distance (NME). The error value is a percentage of the inter-pupil distance, and we simply omit the symbol ‘%’.
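The error measure can be written down in a few lines. The sketch below is an illustration under assumptions: the function name is hypothetical, the pupil positions are passed in explicitly rather than derived from the 68 landmarks, and the per-landmark error is the Euclidean distance.

```python
import numpy as np


def nme_percent(predicted, ground_truth, left_pupil, right_pupil):
    """Mean landmark error normalized by the inter-pupil distance, in percent.

    predicted, ground_truth : (n_landmarks, 2) arrays of image coordinates
    left_pupil, right_pupil : length-2 coordinates of the two pupils
    """
    predicted = np.asarray(predicted, dtype=float)
    ground_truth = np.asarray(ground_truth, dtype=float)
    per_landmark = np.linalg.norm(predicted - ground_truth, axis=1)
    inter_pupil = np.linalg.norm(
        np.asarray(right_pupil, dtype=float) - np.asarray(left_pupil, dtype=float)
    )
    return 100.0 * per_landmark.mean() / inter_pupil
```

A mean landmark error of 1 pixel on a face whose pupils are 10 pixels apart gives an NME of 10, reported without the percent sign as in the text.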
First, we verify the correlation between our discrepancy (which requires no ground-truth label for its computation) and the labeling error against the ground truth. Then, we apply our self-reinforcement to two typical regressors and one recent deep model, expanding the set of high-quality examples by seven to twenty times, and finally compare our regression, whose training starts from a small number of labeled faces, with recent alignment algorithms.
Correlation between discrepancy and error
We analyze the effectiveness of the discrepancy that evaluates example goodness in our self-reinforcement. The discrepancy attempts to reflect the labeling error, i.e., how inaccurately a sample is labeled. Generally, samples exhibiting larger discrepancy have higher labeling error.
To verify the correlation between the discrepancy and the labeling error, we randomly chose 100 samples in LFPW and trained an alignment regressor with these samples. The other 711 samples in LFPW were then labeled with the trained regressor. The labeling error and discrepancy of these predicted samples are plotted in Fig. 5. The horizontal axis shows sample IDs sorted by labeling error in ascending order. The red line indicates the labeling error, and each blue circle denotes the discrepancy value of one sample. Figure 5 demonstrates that there is a strong correlation between the discrepancy and the labeling error: the discrepancy values climb with the increase of labeling error, and the red line fits the changes of the discrepancy very well. This fit verifies that the defined discrepancy reflects how accurate a label is. Therefore, by keeping the samples with lower discrepancies in every iteration, we let the most accurately labeled samples survive. These low-discrepancy labels introduce minimal error into training.
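The relationship above is shown graphically in Fig. 5. One way to summarize such a plot with a single number, offered here only as an assumed illustration and not as part of the reported experiments, is the Pearson correlation coefficient between the per-sample labeling errors and discrepancies.

```python
import numpy as np


def pearson_correlation(labeling_error, discrepancy):
    """Pearson correlation between per-sample labeling error and discrepancy."""
    matrix = np.corrcoef(labeling_error, discrepancy)
    return float(matrix[0, 1])
```

A value close to 1 would indicate that samples with larger discrepancy tend to have larger labeling error, which is the behavior the survival step relies on.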
Unlabeled example predicting and survival
We first validate the self-reinforcement for typical regression algorithms, e.g., LBF and CFSS, on LFPW, and then validate our strategy for highly data-demanding deep models on a larger mixed data set.
Selfreinforcement on conventional regression
LFPW contains more than one thousand images showing great variations, especially in pose. Previous studies show that LBF and CFSS perform well on this set as long as hundreds of accurately labeled faces are available. We validate how closely the self-reinforced versions of LBF and CFSS, trained with unlabeled examples, approach the original algorithms trained with labeled ones.
First, we validate how the minimization of our objective (2) continuously predicts and preserves examples of low discrepancy. Manually including the examples with the lowest prediction error against the ground truth (available in LFPW) gives the upper bound of example survival. We started from 100 labeled examples, and implemented the self-reinforced version of LBF (SRLBF) to automatically include 711 extra samples (regarded as unlabeled). The comparisons between manual inclusion by the lowest labeling error (LE) and our SRLBF are plotted in Fig. 6, which shows the mean alignment error in every iteration. The testing error of SRLBF on 224 images, shown as the red solid line, decreases from 10.5 to 8.98, 14% lower than training without any extra unlabeled data (WED). The orange dots indicate the alignment errors of the regression with manually chosen samples having the lowest labeling error against the ground truth. There is almost no difference between ours and LE at the beginning of the iteration process. The gap increases as more self-reinforced samples, automatically labeled and survived, are included, but is as low as 0.5 when the process converges. Our self-reinforcement is not necessarily able to generate and include the ‘ground-truth’ labels (which do not exist in practice), but it definitely improves the behavior of the regression toward the optimum.
Second, we demonstrate the effectiveness of self-reinforcement by comparing SRLBF with LBF when different ratios of ground-truth labels are included for training. Besides those ground-truth labels, SRLBF can include the rest of the LFPW training images without their labels. Figure 7 illustrates the mean errors of SRLBF and LBF on 224 testing LFPW images. As the percentage of ground-truth labels increases, both LBF and SRLBF give lower errors because the quantity of training examples with high-quality labels is expanding. The errors of SRLBF are always lower than those of LBF, and the gaps are especially evident when only small fractions (less than 50%) of ground-truth labels are available. When all ground-truth labels are given, our regression reduces to LBF. This plot validates that the self-reinforcement is able to expand the quantity of training examples while maintaining their quality.
Third, we compare the self-reinforced versions of CFSS [Zhu et al.2015] and LBF [Ren et al.2014] with the original algorithms as well as GPRT [Lee, Park, and Yoo2015]. Figure 8 illustrates the cumulative error distribution plots on 224 testing images of LFPW. All methods were trained with only 100 annotated images, but our self-reinforcement included 711 extra unlabeled samples. SRCFSS performs better than CFSS, and SRLBF better than LBF. Both outperform GPRT, and SRCFSS is the best of these five algorithms. The proposed self-reinforcement is capable of automatically labeling examples and preserving good ones. Faces annotated with alignment results are shown in Fig. 11 (more images are available in the supplementary materials). The SR versions perform much better on noses and mouths, which present large variations that cannot be covered by a small number of training examples in the original regression algorithms.
Selfreinforcement on deep networks
To test the capability of our self-reinforced strategy on a large number of unlabeled facial images, we construct a large dataset that contains 8,151 images from six facial datasets: FRGC v2.0, LFPW, HELEN, AFW, iBUG and 300W. We compare the performance of DAN [Kowalski, Naruniec, and Trzcinski2017] trained with labeled examples only, with labeled examples plus extra examples obtained by our self-reinforced strategy, and with labeled examples plus extra examples obtained by LBF. The number of labeled examples is 100. Our self-reinforced framework uses LBF as the alignment algorithm and obtains over 3,000 labeled facial images (some bad samples are not chosen), from which we choose 400 and 900 as extra examples for DAN. We also directly run LBF [Ren et al.2014], trained with 100 samples, on the large dataset, and then randomly select 900 of its results as extra examples for DAN. 1,000 images from the large dataset are used for testing. Figure 9 illustrates the cumulative error distribution of these methods. As a deep learning method, DAN needs a large amount of training data. The results show that, when only 100 labeled training examples are provided, our method can enhance the performance of DAN by providing another 400 training examples. The performance improves further when the number of extra examples increases from 400 to 900. The comparison between the regressor trained with extra examples obtained by our self-reinforced strategy and the one trained with extra examples obtained by LBF shows that selecting extra samples indiscriminately not only fails to improve the performance but also results in poor accuracy.
Quantitative comparisons with the stateoftheart
We conducted comparisons with six face alignment algorithms on 300W. These six face alignment regressors are pretrained with a huge number of labeled images. CFAN and CFSS were trained on a combination of Helen (2000), LFPW (811) and AFW (337). The total number of these training samples is 3148. POCR and GNDBM [Tzimiropoulos and Pantic2014] were trained on the training set consisting of LFPW and Helen. ESR [Cao et al.2014] was trained on Helen. The total number of training samples is 2811. GPRT and LBF were trained on the training set of LFPW, which has 811 labeled images. In contrast, our self-reinforced LBF (SRLBF) starts from only half of LFPW, i.e., 400 training labels, and the other half is included by our self-reinforced strategy. The cumulative error distributions of the compared methods and ours are shown in Figure 10.
The comparisons show that our regression does not necessarily give better performance than the others. Instead, we are able to achieve comparable performance on the common subsets of 300W with an extremely small set of training labels. The number of our training labels is one half of that of GPRT, 25% of ESR, 14% of POCR and GNDBM, and only 12% of CFAN and CFSS. In particular, our regression performs close to LBF with half of the labels. Again, our self-reinforcement is open to any cascaded regression, and has the potential to improve the respective ability by automatically predicting and preserving high-quality labels.
Conclusion
We propose a self-reinforced cascaded regression that fuses the discovery and upgrading of low-discrepancy training examples into cascaded regression for face alignment. The framework is derived from indirect consistency with local appearance and global geometry. Finally, we validate the effectiveness of our regression. We do not intend to devise a competitive alignment algorithm trained with a huge collection of labels, but instead a self-reinforced strategy that automatically expands good training examples from a small subset, thus being complementary to, and more general than, existing cascaded regression.
Acknowledgments
This work is partially supported by the National Natural Science Foundation of China (Nos. 61572096, 61432003, 61733002, 61672125, and 61632019), and the Hong Kong Scholar Program (No. XJ2015008). Dr. Liu is also a visiting researcher with Shenzhen Key Laboratory of Media Security, Shenzhen University, Shenzhen 518060.
References
 [Alahi, Ortiz, and Vandergheynst2012] Alahi, A.; Ortiz, R.; and Vandergheynst, P. 2012. Freak: Fast retina keypoint. In CVPR, 510–517.
 [Antonakos and Zafeiriou2014] Antonakos, E., and Zafeiriou, S. 2014. Automatic construction of deformable models in-the-wild. In CVPR, 1813–1820.
 [Artino2011] Artino, A. R. 2011. Self-Reinforcement. Boston, MA: Springer US. 1322–1324.
 [Cao et al.2014] Cao, X.; Wei, Y.; Wen, F.; and Sun, J. 2014. Face alignment by explicit shape regression. IJCV 107(2):177–190.
 [Dalal and Triggs2005] Dalal, N., and Triggs, B. 2005. Histograms of oriented gradients for human detection. In CVPR, volume 1, 886–893.
 [Fan et al.2015] Fan, X.; Wang, H.; Luo, Z.; Li, Y.; Hu, W.; and Luo, D. 2015. Fiducial facial point extraction using a novel projective invariant. IEEE TIP 24(3):1164–1177.
 [Goodfellow et al.2014] Goodfellow, I.; Pouget-Abadie, J.; Mirza, M.; Xu, B.; Warde-Farley, D.; Ozair, S.; Courville, A.; and Bengio, Y. 2014. Generative adversarial nets. In NIPS, 2672–2680.
 [Huang et al.2017] Huang, R.; Zhang, S.; Li, T.; and He, R. 2017. Beyond face rotation: Global and local perception GAN for photorealistic and identity preserving frontal view synthesis. arXiv:1704.04086.
 [Jiang et al.2014] Jiang, L.; Meng, D.; Mitamura, T.; and Hauptmann, A. G. 2014. Easy samples first: Self-paced reranking for zero-example multimedia search. In Proceedings of the 22nd ACM international conference on Multimedia, 547–556. ACM.
 [Kowalski, Naruniec, and Trzcinski2017] Kowalski, M.; Naruniec, J.; and Trzcinski, T. 2017. Deep alignment network: A convolutional neural network for robust face alignment. arXiv:1706.01789.
 [Le et al.2012] Le, V.; Brandt, J.; Lin, Z.; Bourdev, L.; and Huang, T. S. 2012. Interactive facial feature localization. In ECCV. 679–692.
 [Ledig et al.2016] Ledig, C.; Theis, L.; Huszár, F.; Caballero, J.; Cunningham, A.; Acosta, A.; Aitken, A.; Tejani, A.; Totz, J.; Wang, Z.; et al. 2016. Photo-realistic single image super-resolution using a generative adversarial network. arXiv:1609.04802.
 [Lee, Park, and Yoo2015] Lee, D.; Park, H.; and Yoo, C. D. 2015. Face alignment using cascade Gaussian process regression trees. In CVPR, 4204–4212.
 [Li and Fu2013] Li, S., and Fu, Y. 2013. Low-rank coding with b-matching constraint for semi-supervised classification. In IJCAI.
 [Li and Zhou2015] Li, Y.-F., and Zhou, Z.-H. 2015. Towards making unlabeled data never hurt. IEEE TPAMI 37(1):175–188.
 [Li et al.2015] Li, Y.; Fan, X.; Liu, R.; Feng, Y.; Luo, Z.; and Li, Z. 2015. Characteristic number regression for facial feature extraction. In ICME, 1–6.
 [Liu, Deng, and Tao2016] Liu, Q.; Deng, J.; and Tao, D. 2016. Dual sparse constrained cascade regression for robust face alignment. IEEE TIP 25(2):700–712.
 [Martinez et al.2013] Martinez, B.; Valstar, M. F.; Binefa, X.; and Pantic, M. 2013. Local evidence aggregation for regression-based facial point detection. IEEE TPAMI 35(5):1149–1163.
 [Ren et al.2014] Ren, S.; Cao, X.; Wei, Y.; and Sun, J. 2014. Face alignment at 3000 fps via regressing local binary features. In CVPR, 1685–1692.
 [Sagonas et al.2013] Sagonas, C.; Tzimiropoulos, G.; Zafeiriou, S.; and Pantic, M. 2013. A semi-automatic methodology for facial landmark annotation. In CVPR.
 [Shrivastava et al.2016] Shrivastava, A.; Pfister, T.; Tuzel, O.; Susskind, J.; Wang, W.; and Webb, R. 2016. Learning from simulated and unsupervised images through adversarial training. arXiv:1612.07828.
 [Singh et al.2015] Singh, B.; Han, X.; Wu, Z.; Morariu, V. I.; and Davis, L. S. 2015. Selecting relevant web trained concepts for automated event retrieval. In ICCV, 4561–4569.
 [Tuzel, Marks, and Tambe2016] Tuzel, O.; Marks, T. K.; and Tambe, S. 2016. Robust face alignment using a mixture of invariant experts. In ECCV.
 [Tzimiropoulos and Pantic2014] Tzimiropoulos, G., and Pantic, M. 2014. Gauss-Newton deformable part models for face alignment in-the-wild. In CVPR, 1851–1858.
 [Xiong and De la Torre2013] Xiong, X., and De la Torre, F. 2013. Supervised descent method and its applications to face alignment. In CVPR, 532–539.
 [Zhang et al.2015] Zhang, J.; Kan, M.; Shan, S.; and Chen, X. 2015. Leveraging datasets with varying annotations for face alignment via deep regression network. In ICCV, 3801–3809.
 [Zhu and Goldberg2009] Zhu, X., and Goldberg, A. B. 2009. Introduction to semi-supervised learning. Synthesis lectures on artificial intelligence and machine learning 3(1):1–130.
 [Zhu et al.2015] Zhu, S.; Li, C.; Change Loy, C.; and Tang, X. 2015. Face alignment by coarse-to-fine shape searching. In CVPR, 4998–5006.
 [Zhu et al.2016] Zhu, S.; Li, C.; Loy, C. C.; and Tang, X. 2016. Unconstrained face alignment via cascaded compositional learning. In CVPR.