Representation-based classification (RC) methods have attracted intense interest and shown great potential in face recognition in recent years [1, 2, 3, 4]. An appealing merit of RC methods is that they can exploit the subspace structure of the data in each class for classification. Concretely, RC methods are based on the observation that real-world data in a class often approximately lie in a low-dimensional subspace, such as face images of a subject under varying illumination and hand-written digit images with distinct rotations and translations.
In the past decades, various RC methods have been proposed for face recognition (FR). Inspired by the success of lasso regression in compressed sensing, Wright et al. first developed the sparse RC (SRC) method for face recognition. To improve the efficiency of SRC, Zhang et al. put forward the collaborative RC (CRC) approach, which uses ridge regression to compute the representation vector. To exploit the block structure of the dictionary, Elhamifar and Vidal proposed a block sparse RC (BSRC) method that applies group lasso regression to FR. Zhang et al. proposed a nonlinear extension of SRC by incorporating the kernel trick. Shekhar et al. developed a joint sparse RC (JSRC) method for multimodal face recognition that utilizes the correlation information among distinct modalities. More recent advances on RC methods can be found in the references [10, 11, 12, 13, 14]. Previous RC methods were devised separately based on different motivations. In our previous work, we developed a unified framework termed atomic representation-based classification (ARC) and showed that many important RC methods can be reformulated as special cases of ARC.
These regression models in fact impose a predefined assumption on the distribution of the noise variable, such as a Gaussian or Laplacian distribution [10, 18, 19]. Such a limitation may impede their performance when the assumptions are violated in the presence of complicated noise in real-world face recognition.
In this paper, we propose to learn the representation vector based on modal regression and atomic norm regularization. In modal regression, the mode of a random variable is defined as the value at which its density function attains its peak. For a set of observations, the mode is the value that appears most frequently. Fig. 1 shows the mode of the noise variable in a facial image with sunglasses. Previous research results [20, 21, 19]
have shown that one of the most appealing merits of modal regression is its robustness to various complex noises, including heavy-tailed noise, impulsive noise and outliers. The novelties and contributions of this work are summarized as follows:
We develop a general unified framework termed modal regression based atomic representation and classification (MRARC) for robust face recognition and reconstruction. Unlike previous RC methods, MRARC does not require the noise variable to follow any specific predefined distribution, which endows it with the ability to handle the various complicated noises encountered in reality.
Using MRARC as a general platform, we propose four novel modal regression based RC methods by specifying distinct atomic sets for unimodal and multimodal face recognition, respectively.
II Related Work
This section briefly introduces some representative RC methods. Consider a classification problem with $K$ classes. Let $X_k \in \mathbb{R}^{d \times n_k}$ be the matrix of labeled training samples from the $k$-th class for $k = 1, \dots, K$. Define $X = [X_1, \dots, X_K] \in \mathbb{R}^{d \times n}$ with $n = \sum_{k=1}^{K} n_k$. Table I summarizes the key notations used in this paper. Given the training data matrix $X$, the goal is to correctly determine the label of any new test sample $y \in \mathbb{R}^d$.
1) SRC (Sparse Representation based Classification): The SRC method seeks the sparsest solution to the linear system of equations $X\alpha = y$ for classification. To this end, it first computes the representation vector $\alpha$ by solving the minimization problem
$$\min_{\alpha} \|\alpha\|_1 \quad \text{s.t.} \quad X\alpha = y,$$
where the $\ell_1$ norm is defined as $\|\alpha\|_1 = \sum_{i=1}^{n} |\alpha_i|$. To deal with noise, SRC solves the following minimization problem, also known as lasso regression:
$$\min_{\alpha} \|y - X\alpha\|_2^2 + \lambda \|\alpha\|_1,$$
where $\lambda$ is a positive regularization parameter.
2) CRC (Collaborative Representation based Classification): To improve the efficiency of SRC, the CRC method computes the representation vector by solving the $\ell_2$-norm based ridge regression problem
$$\min_{\alpha} \|y - X\alpha\|_2^2 + \lambda \|\alpha\|_2^2.$$
The problem (3) has a closed-form solution, which can be explicitly expressed as $\hat{\alpha} = (X^\top X + \lambda I)^{-1} X^\top y$, where $I$ denotes the identity matrix.
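As an illustration, the closed-form solution above can be computed directly. The following NumPy sketch uses a synthetic dictionary and test sample (the data and the tiny regularization value are assumptions for illustration):

```python
import numpy as np

def crc_coefficients(X, y, lam):
    # Closed-form ridge solution: alpha = (X^T X + lam * I)^{-1} X^T y
    n = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(n), X.T @ y)

# Synthetic illustration: an overdetermined dictionary recovers the true vector
rng = np.random.default_rng(0)
X = rng.standard_normal((50, 10))
alpha_true = rng.standard_normal(10)
y = X @ alpha_true
alpha = crc_coefficients(X, y, lam=1e-8)
print(np.max(np.abs(alpha - alpha_true)) < 1e-4)  # True: tiny lam recovers alpha
```

Note that `np.linalg.solve` is preferred over forming the explicit inverse, since it is cheaper and numerically more stable.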
| Notation | Description |
| --- | --- |
| $K$ | number of classes |
| $n$ | number of all training samples |
| $X$ | matrix of all training samples |
| $y$ | a new test sample |
3) BSRC (Block Sparse Representation based Classification): This method considers the block structure of the training data. It assumes that the training samples in each class form a few blocks. Let $\{\mathcal{G}_1, \dots, \mathcal{G}_m\}$ be a partition of $\{1, \dots, n\}$, i.e., $\bigcup_{j=1}^{m} \mathcal{G}_j = \{1, \dots, n\}$ and $\mathcal{G}_i \cap \mathcal{G}_j = \emptyset$ for $i \neq j$. It computes the representation vector based on the group lasso regression
$$\min_{\alpha} \|y - X\alpha\|_2^2 + \lambda \sum_{j=1}^{m} \|\alpha_{\mathcal{G}_j}\|_2,$$
where $\alpha_{\mathcal{G}_j}$ denotes the subvector of $\alpha$ with entries indexed by $\mathcal{G}_j$.
After the representation vector $\hat{\alpha}$ is obtained, RC methods compute the class-specific residuals
$$r_k(y) = \|y - X\delta_k(\hat{\alpha})\|_2, \quad k = 1, \dots, K,$$
where $\delta_k(\hat{\alpha})$ is the vector that keeps only the entries of $\hat{\alpha}$ associated with the $k$-th class. Finally, the test sample is assigned to the class yielding the minimal residual.
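To make the decision rule concrete, here is a minimal sketch of residual-based classification given a representation vector; the class-index layout and toy data are assumptions for illustration:

```python
import numpy as np

def classify_by_residual(X, y, alpha, class_index, num_classes):
    """Assign y to the class whose class-restricted representation
    yields the smallest reconstruction residual."""
    residuals = []
    for k in range(num_classes):
        # delta_k(alpha): keep only the coefficients belonging to class k
        delta_k = np.where(class_index == k, alpha, 0.0)
        residuals.append(np.linalg.norm(y - X @ delta_k))
    return int(np.argmin(residuals))

# Toy example: columns 0-1 belong to class 0, columns 2-3 to class 1
X = np.eye(4)
class_index = np.array([0, 0, 1, 1])
y = np.array([1.0, 1.0, 0.0, 0.0])     # perfectly explained by class 0
alpha = np.array([1.0, 1.0, 0.0, 0.0])
print(classify_by_residual(X, y, alpha, class_index, 2))  # 0
```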
In our previous work, we proposed a general unified framework called atomic representation based classification (ARC). Most RC methods can be reformulated as special cases of ARC by specifying the atomic set. To review ARC, we first introduce the definition of the atomic norm.
The atomic norm of $x$ with respect to an atomic set $\mathcal{A}$ is defined by
$$\|x\|_{\mathcal{A}} = \inf\{t > 0 : x \in t \cdot \mathrm{conv}(\mathcal{A})\},$$
where $\mathrm{conv}(\mathcal{A})$ denotes the convex hull of the set $\mathcal{A}$.
Two typical examples of the atomic norm are the $\ell_1$ norm and the nuclear norm: the former induces sparsity for vectors while the latter induces low-rankness for matrices. The atomic representation (AR) model is given by
$$\min_{\alpha} \|\alpha\|_{\mathcal{A}} \quad \text{s.t.} \quad X\alpha = y.$$
For noisy data, we consider the regularized AR model
$$\min_{\alpha} \|y - X\alpha\|_2^2 + \lambda \|\alpha\|_{\mathcal{A}}.$$
Then we compute the class-specific residual for each class and assign $y$ to the class with the minimal residual.
Most previous RC methods are special cases of ARC with a specific atomic set, as shown in Fig. 2. For example, if we define $\mathcal{A}_S = \{\pm e_i\}_{i=1}^{n}$, where $e_i$ is the unit vector with its only nonzero entry 1 in the $i$-th coordinate, we have $\|\alpha\|_{\mathcal{A}_S} = \|\alpha\|_1$ and ARC reduces to SRC. Similarly, BSRC also belongs to ARC. Concretely, let $\{\mathcal{G}_1, \dots, \mathcal{G}_m\}$ be a partition of $\{1, \dots, n\}$ as mentioned before, and define $\mathcal{A}_B = \{a \in \mathbb{R}^n : \|a\|_2 = 1, \; \mathrm{supp}(a) \subseteq \mathcal{G}_j \text{ for some } j\}$. It can be proved that $\|\alpha\|_{\mathcal{A}_B} = \sum_{j=1}^{m} \|\alpha_{\mathcal{G}_j}\|_2$ and ARC reduces to BSRC by setting $\mathcal{A} = \mathcal{A}_B$. It can also be shown that CRC belongs to ARC by using the atomic set $\mathcal{A}_C = \{a \in \mathbb{R}^n : \|a\|_2 = 1\}$, for which $\|\alpha\|_{\mathcal{A}_C} = \|\alpha\|_2$. Another example of ARC is low-rank RC (LRRC) when multiple test samples are considered simultaneously. Given test samples $y_1, \dots, y_p$, we arrange them as columns of a matrix $Y = [y_1, \dots, y_p]$. The LRRC model looks for the representation matrix with the lowest rank by
$$\min_{A} \|Y - XA\|_F^2 + \lambda \|A\|_*,$$
where $\|A\|_*$ denotes the nuclear norm of $A$, i.e., the sum of the singular values of $A$. If we define the atomic set $\mathcal{A}_L = \{uv^\top : \|u\|_2 = \|v\|_2 = 1\}$, we have $\|A\|_{\mathcal{A}_L} = \|A\|_*$ and ARC reduces to LRRC.
III Proposed Method
In this section, we describe the proposed modal regression based atomic representation and classification (MRARC) framework for robust face recognition and reconstruction.
III-A Modal Regression
Consider the regression model
$$y = f^*(x) + \epsilon,$$
where $f^*$ denotes the unknown target function and $\epsilon$ represents the noise term. The goal of the regression problem is to approximate the unknown target function $f^*$. Modal regression aims to recover the target function by regressing towards the following modal regression function $f_M$.

The modal regression function is defined as
$$f_M(x) = \arg\max_{t} \, p_{Y|X}(t \mid x),$$
where $p_{Y|X}(\cdot \mid x)$ denotes the conditional density of $Y$ conditioned on $X = x$.
If we assume the mode of the conditional distribution of the noise at any $x$ to be zero, i.e.,
$$\arg\max_{t} \, p_{\epsilon|X}(t \mid x) = 0,$$
there holds $f_M(x) = f^*(x)$ according to Eq. (8). Here $p_{\epsilon|X}(\cdot \mid x)$ denotes the conditional density of the noise $\epsilon$ conditioned on $X = x$. Thus, under the zero-mode noise assumption, we have $f_M = f^*$, and the task is converted to estimating the modal regression function. To this end, we introduce the modal regression risk as below.
For a measurable function $f$, its modal regression risk is defined as
$$\mathcal{R}(f) = \int_{\mathcal{X}} p_{E_f}(0 \mid x) \, d\rho_X(x),$$
where $E_f = Y - f(X)$ is the error random variable and $\rho_X$ denotes the marginal distribution of $X$.
It can be proved that $f_M$ is the maximizer of the risk $\mathcal{R}$, i.e., $f_M = \arg\max_{f \in \mathcal{F}} \mathcal{R}(f)$, where $\mathcal{F}$ denotes the set of all measurable functions on $\mathcal{X}$. For any measurable function $f$, denote $E_f = Y - f(X)$ as the error random variable. According to Eq. (8), there holds $E_f = \epsilon + f^*(X) - f(X)$. Then the conditional density of $E_f$ can be formulated as
$$p_{E_f}(t \mid x) = p_{\epsilon|X}\big(t + f(x) - f^*(x) \mid x\big).$$
Setting $t = 0$, we have
$$p_{E_f}(0 \mid x) = p_{\epsilon|X}\big(f(x) - f^*(x) \mid x\big).$$
Thus, the modal regression problem is converted to maximizing the value of the density $p_{E_f}$ at $0$. In reality, we only have a finite number of samples $\{(x_i, y_i)\}_{i=1}^{N}$ and the density is unknown. For this reason, the Parzen window method is utilized to estimate $\mathcal{R}(f)$ as below:
$$\hat{\mathcal{R}}(f) = \frac{1}{N} \sum_{i=1}^{N} K_\sigma\big(y_i - f(x_i)\big),$$
where $K_\sigma(t) = \frac{1}{\sigma} K\!\left(\frac{t}{\sigma}\right)$ for $\sigma > 0$ and $K$ denotes a general kernel function satisfying $\int K(t)\, dt = 1$. Some common kernel functions include the Gaussian kernel $K(t) = \frac{1}{\sqrt{2\pi}} e^{-t^2/2}$ and the Epanechnikov kernel $K(t) = \frac{3}{4}(1 - t^2)\,\mathbb{1}_{\{|t| \le 1\}}$, where $\mathbb{1}_{\{|t| \le 1\}} = 1$ if $|t| \le 1$ and $0$ otherwise. Then we have the estimator of the modal regression function as follows:
$$\hat{f}_M = \arg\max_{f \in \mathcal{H}} \hat{\mathcal{R}}(f),$$
where $\mathcal{H}$ is a given hypothesis space. Under the zero-mode noise assumption, $\hat{f}_M$ serves as an estimator of the target function $f^*$.
Based on the analysis above, we can see that modal regression does not require the noise to follow any specific preset distribution, such as the Gaussian distribution required by some conventional regression models. This makes modal regression attractive for handling various complex noises in practice.
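A small numerical sketch illustrates this robustness: with 10% gross outliers added to zero-mode Gaussian noise, a Parzen-window mode estimate of the noise stays near zero while the sample mean is pulled away. The bandwidth and data below are illustrative assumptions, not the paper's settings:

```python
import numpy as np

def parzen_mode(samples, sigma=0.5):
    """Estimate the mode by maximizing a Gaussian Parzen-window
    density over a fine grid covering the data range."""
    grid = np.linspace(samples.min(), samples.max(), 2001)
    density = np.exp(-(grid[:, None] - samples[None, :]) ** 2
                     / (2.0 * sigma ** 2)).sum(axis=1)
    return grid[np.argmax(density)]

rng = np.random.default_rng(1)
noise = rng.normal(0.0, 0.3, size=500)        # zero-mode Gaussian noise
noise[:50] = 20.0 + rng.normal(0.0, 0.3, 50)  # 10% gross outliers
print(abs(parzen_mode(noise)) < 0.2)   # True: mode estimate stays near zero
print(abs(noise.mean()) > 1.5)         # True: mean is dragged toward the outliers
```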
III-B Modal Regression based Atomic Representation (MRAR)
Define the error vector $e = [e_1, \dots, e_d]^\top$, where $e_i = y_i - f(x_i)$. The problem (10) can be rewritten in the equivalent form
$$\min_{f} \; L_\sigma(e),$$
where $L_\sigma$ denotes the modal regression based loss function (MRLF)
$$L_\sigma(e) = -\frac{1}{d\sigma} \sum_{i=1}^{d} K\!\left(\frac{e_i}{\sigma}\right).$$
For simplicity, consider the linear function $f(x) = x^\top \alpha$, where $\alpha$ is the unknown coefficient vector. Then $e = y - X\alpha$ with $e_i = y_i - x_i^\top \alpha$, where $y = [y_1, \dots, y_d]^\top$ and $X = [x_1, \dots, x_d]^\top$. Incorporating the MRLF into the regularized AR model (6), we have the following modal regression based atomic representation (MRAR) model:
$$\min_{\alpha} \; L_\sigma(y - X\alpha) + \lambda \|\alpha\|_{\mathcal{A}}.$$
The problem above is difficult to tackle due to the combination of the nonlinearity of the MRLF and the abstract atomic norm regularization. In addition, most previous optimization techniques were originally proposed for RC methods with special atomic norms, such as sparse representation, and are difficult to apply to the general MRAR framework. In this paper, we devise an effective optimization algorithm for the general MRAR model based on ADMM and half-quadratic (HQ) theory. We first introduce an auxiliary vector $z$ and reformulate the objective function in Eq. (13) as
$$\min_{\alpha, z} \; L_\sigma(y - Xz) + \lambda \|\alpha\|_{\mathcal{A}} \quad \text{s.t.} \quad \alpha = z.$$
The augmented Lagrangian function of Eq. (14) is
$$\mathcal{L}_\rho(\alpha, z, \mu) = L_\sigma(y - Xz) + \lambda \|\alpha\|_{\mathcal{A}} + \mu^\top (\alpha - z) + \frac{\rho}{2} \|\alpha - z\|_2^2 + C,$$
where $C$ is independent of $\alpha$ and $z$. Here $\mu$ is the Lagrange multiplier and $\rho > 0$ denotes a penalty parameter. Given the initialization of $\alpha$, $z$ and $\mu$, we alternately update each variable while fixing the others in each iteration.

In the first step, we update $\alpha$ while fixing $z$ and $\mu$ by
$$\alpha^{t+1} = \arg\min_{\alpha} \; \lambda \|\alpha\|_{\mathcal{A}} + \frac{\rho}{2} \left\|\alpha - \left(z^t - \frac{\mu^t}{\rho}\right)\right\|_2^2.$$
Algorithm 1 Implementation of MRAR (13)
Input: $X$, $y$, $\lambda$, $\sigma$, and $\rho$.
Initialization: $\alpha^0$, $z^0$, $\mu^0$, $t = 0$.
while not converged and $t < t_{\max}$ do
Update $\alpha^{t+1}$ by the proximity operator.
Update $z^{t+1}$ by Eq. (18).
Update the Lagrange multiplier vector by $\mu^{t+1} = \mu^t + \rho(\alpha^{t+1} - z^{t+1})$.
Check the convergence conditions; $t \leftarrow t + 1$.
end while
The optimal solution of Eq. (16) can be written as
$$\alpha^{t+1} = \mathrm{prox}_{\frac{\lambda}{\rho} \|\cdot\|_{\mathcal{A}}}\!\left(z^t - \frac{\mu^t}{\rho}\right),$$
where $\mathrm{prox}_{\tau \|\cdot\|_{\mathcal{A}}}$ denotes the proximity operator with respect to the atomic norm. Here we introduce the proximity operators for some common atomic sets:
$$\mathrm{prox}_{\tau \|\cdot\|_{\mathcal{A}_S}}(v) = \mathrm{sign}(v) \odot \max(|v| - \tau, 0),$$
$$\mathrm{prox}_{\tau \|\cdot\|_{\mathcal{A}_C}}(v) = \max\!\left(1 - \frac{\tau}{\|v\|_2}, 0\right) v,$$
$$\left(\mathrm{prox}_{\tau \|\cdot\|_{\mathcal{A}_B}}(v)\right)_{\mathcal{G}_j} = \max\!\left(1 - \frac{\tau}{\|v_{\mathcal{G}_j}\|_2}, 0\right) v_{\mathcal{G}_j}, \quad j = 1, \dots, m.$$
Here $\mathrm{sign}(v)$ denotes the vector of signs of the entries of $v$ and $\odot$ represents the Hadamard product; $\max(\cdot, 0)$ is applied entrywise. $\mathcal{G}_j$ denotes the index sets in BSRC mentioned before, and $v_{\mathcal{G}_j}$ denotes the subvector of $v$ containing the entries indexed by the set $\mathcal{G}_j$.
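For concreteness, the three proximity operators for these common atomic sets can be sketched as follows (NumPy; the test vectors and group indices are illustrative):

```python
import numpy as np

def prox_l1(v, tau):
    # Soft-thresholding: sign(v) * max(|v| - tau, 0), for the atomic set A_S
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

def prox_l2(v, tau):
    # Shrink the whole vector toward zero, or zero it out, for A_C
    nrm = np.linalg.norm(v)
    return np.maximum(1.0 - tau / nrm, 0.0) * v if nrm > 0 else v

def prox_group(v, tau, groups):
    # Apply the l2 shrinkage block-wise over the index groups, for A_B
    out = np.zeros_like(v)
    for g in groups:
        out[g] = prox_l2(v[g], tau)
    return out

print(prox_l1(np.array([3.0, -1.0, 0.2]), 1.0))   # shrinks 3, zeros the small entries
print(prox_l2(np.array([3.0, 4.0]), 2.5))         # scales the vector by 1 - 2.5/5
print(prox_group(np.array([3.0, 4.0, 0.5]), 2.5, [[0, 1], [2]]))  # zeros the weak group
```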
In the second step, we update the auxiliary variable $z$ while fixing $\alpha$ and $\mu$ by
$$z^{t+1} = \arg\min_{z} \; L_\sigma(y - Xz) + \frac{\rho}{2} \left\|\alpha^{t+1} - z + \frac{\mu^t}{\rho}\right\|_2^2.$$
According to the HQ theory, for the Gaussian kernel there exists a dual function $\varphi$ such that
$$-K\!\left(\frac{e_i}{\sigma}\right) = \frac{1}{\sqrt{2\pi}} \min_{w_i > 0} \left\{ \frac{w_i e_i^2}{2\sigma^2} + \varphi(w_i) \right\},$$
where the infimum is reached at $w_i = \exp\!\left(-\frac{e_i^2}{2\sigma^2}\right)$. Then the MRLF in Eq. (12) is rewritten as a weighted quadratic function of the error $e$, with the weight vector $w = [w_1, \dots, w_d]^\top$ held fixed in each iteration.
Algorithm 2 MRAR based Classification
Input: An atomic set $\mathcal{A}$, training samples $X$, a test sample $y$, and the parameter $\lambda$.
Output: The identity of $y$.
Normalize the columns of $X$ to have unit Euclidean norm.
Learn the representation vector $\hat{\alpha}$ via MRAR (13).
Calculate the residuals $r_k(y)$ for $k = 1, \dots, K$ and output the class with the minimal residual.
Thus, the problem in (18) can be reformulated as the weighted least squares problem
$$z^{t+1} = \arg\min_{z} \; \frac{c}{2} (y - Xz)^\top \mathrm{diag}(w) (y - Xz) + \frac{\rho}{2} \left\|\alpha^{t+1} - z + \frac{\mu^t}{\rho}\right\|_2^2,$$
where $c > 0$ is a constant determined by the kernel and $\mathrm{diag}(w)$ denotes a square diagonal matrix with the elements of $w$ on the main diagonal. The iterations above are guaranteed to converge according to the HQ theory. Finally, the Lagrange multiplier vector is updated by $\mu^{t+1} = \mu^t + \rho(\alpha^{t+1} - z^{t+1})$. Algorithm 1 summarizes the MRAR algorithm.
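Putting the three updates together, an MRSRC-style instance of Algorithm 1 ($\ell_1$ atomic norm, Gaussian kernel) might be sketched as follows. This is a simplified illustration, not the paper's exact implementation: the kernel's normalizing constant is absorbed into the weights, and the parameter values, synthetic data and stopping rule are assumptions:

```python
import numpy as np

def soft_threshold(v, tau):
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

def mrar_l1(X, y, lam=0.01, sigma=3.0, rho=1.0, iters=200):
    """ADMM + half-quadratic sketch for
    min_a  L_sigma(y - X a) + lam * ||a||_1  (constants absorbed)."""
    d, n = X.shape
    alpha = np.zeros(n)
    z = np.zeros(n)
    mu = np.zeros(n)
    for _ in range(iters):
        # alpha-step: proximity operator of the l1 atomic norm
        alpha = soft_threshold(z - mu / rho, lam / rho)
        # z-step: half-quadratic weights, then a weighted ridge problem
        e = y - X @ z
        w = np.exp(-e ** 2 / (2.0 * sigma ** 2))  # small weight on large errors
        A = (X * w[:, None]).T @ X + rho * np.eye(n)
        b = X.T @ (w * y) + rho * alpha + mu
        z = np.linalg.solve(A, b)
        # dual update
        mu = mu + rho * (alpha - z)
    return z

# Sparse signal with a few grossly corrupted measurements
rng = np.random.default_rng(0)
X = rng.standard_normal((60, 20))
alpha_true = np.zeros(20)
alpha_true[[3, 7, 12]] = [1.5, -2.0, 1.0]
y = X @ alpha_true
y[:5] += 8.0  # gross corruption on 5 of 60 entries
alpha_hat = mrar_l1(X, y)
```

At the fixed point, $\alpha$ and $z$ coincide; the HQ weights suppress the corrupted entries so the estimate stays close to the clean least-squares solution.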
III-C MRAR based Classification
In this section, we develop the general MRAR based classification (MRARC) framework and some novel methods as special cases of MRARC.
Given the training samples $X$ and a new test sample $y$, the first step is to compute the optimal coefficient vector $\hat{\alpha}$ using the MRAR model (13). Secondly, we calculate the class-specific residuals for each class. Unlike most previous methods, which use the $\ell_2$ norm, we utilize the MRLF to calculate the residuals
$$r_k(y) = L_\sigma\big(y - X\delta_k(\hat{\alpha})\big), \quad k = 1, \dots, K,$$
where $\delta_k(\hat{\alpha})$ denotes the vector keeping only the entries of $\hat{\alpha}$ associated with the $k$-th class. Finally, the test sample is assigned to the class yielding the minimal residual. Algorithm 2 summarizes the classification procedure of MRARC.
It is worth pointing out that MRARC is a general framework for pattern classification. We can use it to devise new classification methods by specifying the atomic set $\mathcal{A}$. Concretely, we refer to the MRARC methods with the atomic sets $\mathcal{A}_S$, $\mathcal{A}_B$ and $\mathcal{A}_C$ as MRSRC, MRBSRC and MRCRC for short, respectively.
III-D MRARC for Multimodal Data
Assume that we have $M$ modalities with corresponding dimensions $d_1, \dots, d_M$. Let $y^{(m)}$ and $X^{(m)}$ be the test sample and dictionary in the $m$-th modality, where $m = 1, \dots, M$. For multimodal data, the MRAR model can be formulated as
$$\min_{A} \; \sum_{m=1}^{M} L_\sigma\big(y^{(m)} - X^{(m)} a_m\big) + \lambda \|A\|_{\mathcal{A}},$$
where $a_m$ denotes the $m$-th column of the matrix $A = [a_1, \dots, a_M] \in \mathbb{R}^{n \times M}$ and $\mathcal{A}$ is an atomic set of matrices. To take advantage of the correlation information among multiple modalities, we can use the joint sparsity inducing atomic set
$$\mathcal{A}_J = \big\{ e_i v^\top : 1 \le i \le n, \; \|v\|_2 = 1 \big\},$$
for which $\|A\|_{\mathcal{A}_J} = \sum_{i=1}^{n} \|A^i\|_2$, where $A^i$ denotes the $i$-th row of the matrix $A$. We refer to the MRARC method using $\mathcal{A}_J$ in Eq. (23) as modal regression based joint sparse representation classification (MRJSRC). The MRJSRC method encourages the representation vectors of a test datum (i.e., the columns of $A$) in distinct modalities to share the same sparsity pattern, i.e., the same locations of nonzero entries. The optimization problem in Eq. (23) can be tackled in a similar way to Algorithm 1. The main difference is that the proximity operator for the atomic set $\mathcal{A}_J$ acts row-wise:
$$\left(\mathrm{prox}_{\tau \|\cdot\|_{\mathcal{A}_J}}(V)\right)^i = \max\!\left(1 - \frac{\tau}{\|V^i\|_2}, 0\right) V^i,$$
for $i = 1, \dots, n$. Once the solution $\hat{A}$ of Eq. (23) is obtained, the class-dependent residuals are computed by
$$r_k = \sum_{m=1}^{M} L_\sigma\big(y^{(m)} - X^{(m)} \delta_k(\hat{a}_m)\big), \quad k = 1, \dots, K.$$
Finally, the multimodal test sample is assigned to the class yielding the minimal residual.
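The row-wise proximity operator used for joint sparsity shrinks each row of the coefficient matrix jointly, so all modalities share one set of nonzero rows; a minimal sketch (the matrix values are illustrative):

```python
import numpy as np

def prox_row_l2(V, tau):
    # Shrink each row toward zero or zero it out entirely, so the
    # columns (modalities) share the same set of nonzero rows.
    norms = np.linalg.norm(V, axis=1, keepdims=True)
    scale = np.maximum(1.0 - tau / np.maximum(norms, 1e-12), 0.0)
    return scale * V

V = np.array([[3.0, 4.0],    # strong row: kept, shrunk by 1 - 2.5/5
              [0.3, 0.4]])   # weak row: zeroed in both modalities
print(prox_row_l2(V, 2.5))
```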
IV Experiments

In this section, we evaluate the efficacy of the proposed MRARC framework for unimodal and multimodal face recognition against various noises.
Experiments are conducted on four publicly available databases: the Extended Yale B database (EYaleB), the AR database, the CMU MoBo database and the CMU PIE database. Fig. 3 shows some sample images from these databases and Table II lists their details, including the number of classes, the data dimension and the number of instances. For unimodal face recognition, we compare MRSRC, MRBSRC and MRCRC in the MRARC framework with the three typical methods SRC, BSRC and CRC in the ARC framework. The linear regression-based classification (LRC) approach is used as the baseline. For multimodal face recognition, we compare the proposed MRJSRC with the JSRC method. For the ARC and MRARC methods, the regularization parameter is tuned by searching a discrete set to achieve their best performance. For the MRARC methods, we fix the penalty parameter in all experiments.
| Database | # Classes | # Instances |
| --- | --- | --- |
| Extended Yale B | 38 | 2,432 (images) |
| CMU MoBo | 24 | 96 (videos) |
| CMU PIE | 68 | 41,368 (images) |
IV-A Face Recognition With Occlusion
In this subsection, we conduct five different experiments to analyze the performance of the proposed methods for face recognition and reconstruction against occlusions.
Experiment 1–Effect of percent of occlusion: In the first experiment, we evaluate the performance of the competing methods against different levels of random occlusion. For each test image, a random square region is occluded by a baboon image, as shown in Fig. 4. The Extended Yale B database is used for this experiment, and the images are resized for efficiency. In the RC literature, many researchers manually chose a subset of images with normal or moderate lighting conditions for training, so that only the test images have extreme lighting conditions. However, in real-world scenarios both training and testing images may have varied lighting conditions, including moderate and extreme ones. For this reason, we randomly select half of the images (32 images) per subject for training and use the rest for testing. Fig. 4 shows the recognition rates of the various methods as a function of the percent of occlusion. From Fig. 4, it can be seen that most competing methods achieve high recognition rates when the occlusion level is low. However, as the occlusion level increases, the recognition rates of LRC and the three ARC methods (SRC, BSRC and CRC) drop rapidly. In contrast, the three MRARC methods (MRSRC, MRBSRC and MRCRC) outperform the other methods at all occlusion levels.
Experiment 2–Effect of feature dimension: In the second experiment, we study the effect of the feature dimension (or image size) on the recognition performance using the Extended Yale B database. To this end, we resize the images to several resolutions with different downsampling ratios. Fig. 5 shows the recognition rates versus the feature dimension with 20 percent occlusion using facial images of different numbers of subjects. The three methods in MRARC improve upon the corresponding ones in ARC across varying feature dimensions and numbers of classes. In particular, the MRSRC method significantly outperforms the other competing methods, especially when the feature dimension is low.
Experiment 3–Effect of training set size: In the third experiment, we analyze the impact of the training set size on the face recognition performance with random occlusion. For each subject, we randomly select a number of images for training and perform recognition on the test images with 20% occlusion. We repeat the experiment ten times and compute the mean, minimum and maximum recognition rates of each algorithm. The results are reported in Table III. It can be seen from Table III that even with a small training set, the MRARC methods improve upon the ARC ones by a large margin.
Experiment 4–Recognition with real-world occlusions: In the fourth experiment, we evaluate the performance of the proposed methods against real-world occlusions such as sunglasses and scarves. Fig. 3(b) shows four facial images occluded by sunglasses or a scarf in the AR database, which is used for this experiment. For each subject, the eight images with varying expressions are utilized for training, and the four images with sunglasses or scarves are used for testing. Table IV reports the recognition results; some results of LRC, SRC, BSRC and CRC are quoted from the corresponding papers. The results suggest that the proposed MRARC methods can handle facial images with real-world occlusions with high recognition accuracy.
IV-B Face Recognition With Corruption
In this part, we evaluate the performance of the proposed methods for face recognition and reconstruction against random corruption.
Experiment 1–Effect of percent of corruption: In the first experiment, we study the performance of the proposed methods against different levels of random pixel corruption. To this end, a fraction of the pixels of each test image are randomly chosen and their values are replaced by random values following a uniform distribution. As in Section IV-A, we randomly select half of the images per subject in the Extended Yale B database for training and use the rest for testing. For the AR database, we use the seven images with expression and illumination variations per subject in the first session for training and the seven images in the second session for testing. Fig. 6 shows the recognition rates of the competing algorithms as the percent of corruption varies from 10 to 60. The results demonstrate the superiority of the MRARC methods over the ARC ones for recognition with random corruption.
Experiment 2–Effect of training set size: In the second experiment, we analyze the impact of the training set size on the recognition performance with random pixel corruption. As in Section IV-A, we vary the number of training samples per class from 16 to 28. Table V reports the recognition results using test images with 30% random corruption over ten runs. The results further confirm that the MRARC methods improve the recognition performance of the ARC methods against random corruption in most cases.
IV-C Results on the CMU MoBo Database
In this part, we evaluate the performance of MRARC for image set based face recognition (ISFR) using the CMU MoBo database. The database consists of 96 video sequences of 24 subjects walking on a treadmill. For each subject, four video sequences are captured, one for each of four distinct walking patterns. The facial images detected with the Viola-Jones face detector are resized; for each facial image, we extract Local Binary Pattern (LBP) features and normalize each feature vector to have unit Euclidean norm. To evaluate the robustness of the proposed methods against corruption, we randomly choose 10 percent of the entries of each feature vector and replace them with random values following a uniform distribution. In the experiment, we randomly select a video sequence for training and use the rest for testing. To extend the ARC and MRARC methods to ISFR, we use the average class-dependent reconstruction residual of the images in the query set for recognition; the query set is assigned to the class minimizing the average residual. Instead of using all frames of each video, we randomly choose a subset of frames from each video to construct the training and query (testing) sets. To obtain reliable results, we repeat each test ten times and compute the average, minimum and maximum recognition rates over the ten runs. Table VI reports the recognition results, which show that the three MRARC methods perform comparably to each other and outperform the other competing RC methods.
IV-D Multimodal Face Recognition
In this section, we compare the MRJSRC method in the MRARC framework with JSRC for multimodal face recognition. For a comparison between JSRC and other multimodal recognition methods, see the corresponding reference.
In the first experiment, we extract four weak modalities from each facial image for evaluation: the left and right periocular regions, the nose, and the mouth, as shown in Fig. 7. For the AR database, following the settings of Experiment 1 in Section IV-B, we use the seven images with expression and illumination variations per subject in the first session for training and the seven images in the second session for testing. For the EYaleB database, we randomly select half of all images per subject for training and use the rest for testing. In both cases, we use the original intensity values of the images for the experiments.
Table VIII reports the recognition results of MRJSRC and JSRC using varying numbers of modalities. Based on the results, we can draw the following conclusions.
First, the recognition rate of each competing method grows rapidly as the number of modalities increases from 1 to 4. This suggests that both JSRC and MRJSRC can exploit the complementarity among distinct modalities to improve the recognition performance. Moreover, the proposed MRJSRC method consistently improves upon JSRC for all numbers of modalities.
Second, the competing methods using more modalities show better robustness against random block occlusion on the EYaleB database. For example, the recognition rate of MRJSRC using only modality 1 (i.e., the left periocular modality) declines substantially from the occlusion-free case to the occluded case, whereas the recognition rate of MRJSRC using all four modalities declines far less.
In the second experiment, we evaluate the performance of the proposed method for multiview face recognition using the CMU PIE database. This database consists of 41,368 images of 68 people, with facial images captured under 13 different poses, 43 different illumination conditions, and 4 different expressions. Two near-frontal poses (i.e., C09 and C29) are selected to construct the multiview setting, as shown in Fig. 8. Thus, a pair of images of one subject under the two poses is regarded as a two-view (two-modality) sample. The images are resized, and we randomly select half of all images per subject for training and use the rest for testing. As shown in Fig. 8, a fraction of the pixels of each test image are randomly selected and their values are replaced by random values following a uniform distribution.
Fig. 9 shows the recognition rates of the competing methods as a function of the percent of corrupted pixels in each test image. We also compare JSRC and MRJSRC using single-view and two-view samples under varying percentages of random corruption; the results are reported in Table VIII. Concretely, View 1 and View 2 correspond to Pose C09 and Pose C29, respectively. From Fig. 9 and Table VIII, we can see that the proposed MRJSRC method consistently outperforms JSRC across varying numbers of views and corruption levels. This stems from the fact that MRJSRC is robust to corruption and can take good advantage of the complementary information among multiple modalities.
This paper presented a novel general classification framework termed MRARC for robust face recognition and reconstruction. The proposed MRARC framework is based on modal regression and does not require the noise to follow any specific distribution, which endows MRARC with the ability to handle various complicated noises in reality. Using MRARC as a platform, we have also developed several novel RC methods for robust unimodal and multimodal face recognition. Experiments on real-world databases show the efficacy of the proposed methods for robust face recognition and reconstruction.
-  J. Wright, A. Yang, A. Ganesh, S. Sastry, and Y. Ma, “Robust face recognition via sparse representation,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 31, no. 2, pp. 210–227, Feb. 2009.
-  E. Elhamifar and R. Vidal, “Block-sparse recovery via convex optimization,” IEEE Trans. Signal Process., vol. 60, no. 8, pp. 4094–4107, Aug. 2012.
-  L. Zhang, M. Yang, and X. Feng, “Sparse representation or collaborative representation: which helps face recognition?” in Proc. IEEE Int. Conf. Comput. Vis., Nov. 2011, pp. 471–478.
-  R. He, W. Zheng, and B. Hu, “Maximum correntropy criterion for robust face recognition,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 33, no. 8, pp. 1561–1576, Aug. 2011.
-  R. Basri and D. Jacobs, “Lambertian reflectance and linear subspaces,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 25, no. 2, pp. 218–233, Feb. 2003.
-  T. Hastie and P. Simard, “Metrics and models for handwritten character recognition,” Statistical Science, vol. 13, no. 1, pp. 54–65, 1998.
-  E. Elhamifar and R. Vidal, “Robust classification using structured sparse representation,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., 2011, pp. 1873–1879.
-  L. Zhang, W. Zhou, P. Chang, J. Liu, Z. Yan, T. Wang, and F. Li, “Kernel sparse representation-based classifier,” IEEE Trans. Signal Process., vol. 60, no. 4, pp. 1684–1695, Jan. 2012.
-  S. Shekhar, V. Patel, N. Nasrabadi, and R. Chellappa, “Joint sparse representation for robust multimodal biometrics recognition,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 36, no. 1, pp. 113–126, Jan. 2014.
-  M. Yang, L. Zhang, J. Yang, and D. Zhang, “Regularized robust coding for face recognition,” IEEE Trans. Image Process., vol. 22, no. 5, pp. 1753–1766, May 2013.
-  R. He, W. Zheng, T. Tan, and Z. Sun, “Half-quadratic-based iterative minimization for robust sparse representation,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 36, no. 2, pp. 261–275, Feb. 2014.
-  J. Yang, L. Luo, J. Qian, Y. Tai, F. Zhang, and Y. Xu, “Nuclear norm based matrix regression with applications to face recognition with occlusion and illumination changes,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 39, no. 1, pp. 156–171, Jan. 2017.
-  Y. Tang, Y. Wang, L. Li, and C. Chen, “Structural atomic representation for classification,” IEEE Trans. Cybernetics, vol. 45, no. 12, pp. 2905–2913, Dec. 2015.
-  Y. Wang, Y. Tang, L. Li, and P. Wang, “Information-theoretic atomic representation for robust pattern classification,” in Proc. Int. Conf. Pattern Recognit., Dec. 2016, pp. 3685–3690.
-  R. Tibshirani, “Regression shrinkage and selection via the lasso,” J. Roy. Stat. Soc., Ser. B, vol. 58, no. 1, pp. 267–288, 1996.
-  A. Hoerl and R. Kennard, “Ridge regression: Biased estimation for nonorthogonal problems,” Technometrics, vol. 12, no. 1, pp. 55–67, 1970.
-  M. Yuan and Y. Lin, “Model selection and estimation in regression with grouped variables,” J. Roy. Stat. Soc., Ser. B, vol. 68, no. 1, pp. 49–67, 2006.
-  D. Erdogmus and J. C. Principe, “An error-entropy minimization algorithm for supervised training of nonlinear adaptive systems,” IEEE Trans. Signal Process., vol. 50, no. 7, pp. 1780–1786, Jul. 2002.
-  Y. Feng, J. Fan, and J. Suykens, “A statistical learning approach to modal regression,” arXiv preprint arXiv:1702.05960v2, 2017.
-  T. Sager and R. Thisted, “Maximum likelihood estimation of isotonic modal regression,” Ann. Stat., vol. 10, no. 3, pp. 690–707, 1982.
-  H. Zhou and X. Huang, “Nonparametric modal regression in the presence of measurement error,” Electronic Journal of Statistics, vol. 10, no. 2, pp. 3579–3620, 2016.
-  S. Boyd, N. Parikh, E. Chu, B. Peleato, and J. Eckstein, “Distributed optimization and statistical learning via the alternating direction method of multipliers,” Foundations and Trends in Machine Learning, vol. 3, no. 1, pp. 1–122, Jan. 2011.
-  M. Nikolova and M. K. Ng, “Analysis of half-quadratic minimization methods for signal and image recovery,” SIAM J. Sci. Comput., vol. 27, no. 3, pp. 937–966, 2005.
-  V. Chandrasekaran, B. Recht, P. Parrilo, and A. Willsky, “The convex geometry of linear inverse problems,” Found. Comput. Math., vol. 12, pp. 805–849, Dec. 2012.
-  G. Liu, Z. Lin, and Y. Yu, “Robust subspace segmentation by low-rank representation,” in Proc. Int. Conf. Mach. Learn., 2010, pp. 663–670.
-  E. Parzen, “On the estimation of a probability density function and the mode,” Ann. Math. Stat., vol. 33, no. 3, pp. 1065–1076, Sep. 1962.
-  J. C. Principe, Information Theoretic Learning: Renyi’s Entropy and Kernel Perspectives. New York: Springer, 2010.
-  K. Lee, J. Ho, and D. Kriegman, “Acquiring linear subspaces for face recognition under variable lighting,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 27, no. 5, pp. 684–698, May 2005.
-  A. Martinez and R. Benavente, “The AR face database,” Computer Vision Center, Tech. Rep. 24, Jun. 1998.
-  R. Gross and J. Shi, “The CMU Motion of Body (MoBo) Database,” Robotics Institute, Tech. Rep. CMU-RI-TR-01-18, June 2001.
-  T. Sim, S. Baker, and M. Bsat, “The CMU pose, illumination, and expression database,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 25, no. 12, pp. 1615–1618, Dec. 2003.
-  I. Naseem, R. Togneri, and M. Bennamoun, “Linear regression for face recognition,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 32, no. 11, pp. 2106–2112, Nov. 2010.
-  P. Viola and M. Jones, “Robust real-time face detection,” Int. J. Comput. Vis., vol. 57, no. 2, pp. 137–154, 2004.
-  T. Ahonen, A. Hadid, and M. Pietikainen, “Face description with local binary patterns: Application to face recognition,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 28, no. 12, pp. 2037–2041, Dec. 2006.
-  P. Zhu, W. Zuo, L. Zhang, S. Shiu, and D. Zhang, “Image set-based collaborative representation for face recognition,” IEEE Trans. Inf. Forensics Security, vol. 9, no. 7, pp. 1120–1132, July 2014.