I Introduction
Face recognition (FR) has been an active field of research in biometrics for over two decades [1]. Current methods work well when the test images are captured under controlled conditions. However, quite often the performance of most algorithms degrades significantly when they are applied to the images taken under uncontrolled conditions where there is no control over pose, illumination, expressions and resolution of the face image. Image resolution is an important parameter in many practical scenarios such as surveillance where high resolution cameras are not deployed due to cost and data storage constraints and further, there is no control over the distance of human from the camera.
Many methods have been proposed in the vision literature to deal with this resolution problem in FR. Most of them apply a super-resolution (SR) technique to increase the resolution of the images so that the recovered higher-resolution (HR) images can be used for recognition. A major drawback of SR techniques is that the recovered HR images may contain serious artifacts, which is often the case when the resolution of the input image is very low. As a result, the recovered images may not look like images of the same person, and the recognition performance may degrade significantly.
In practical scenarios, the resolution change is also coupled with other parameters such as pose change, illumination variations and expression. Algorithms specifically designed to deal with low-resolution (LR) images quite often fail in dealing with these variations. Hence, it is essential to include these parameters while designing a robust method for low-resolution FR. To this end, in this paper, we present a generative approach to low-resolution FR, based on learning class-specific dictionaries, that is also robust to illumination variations. One of the major advantages of generative approaches is that they are known to be less sensitive to noise than discriminative approaches [1]. Furthermore, we kernelize the learning algorithm to handle nonlinearity in the data samples and introduce a bi-level sparse coding framework for robust recognition.
The training stage of our method consists of three main steps. In the first step, given HR training samples from each class, we use an image relighting method to generate multiple images of the same subject under different lighting so that robustness to illumination changes can be realized. In the second step, the resolution of the enlarged gallery images from each class is matched with that of the probe image. Finally, in the third step, class- and resolution-specific dictionaries are trained for each class. In the testing phase, a novel LR image is projected onto the span of the atoms in each learned dictionary. The residual vectors are then used to classify the subject. A flowchart of the proposed algorithm is shown in Figure 1.
A preliminary version of this work appeared in [2]. Extensions to [2] include kernelization of the dictionary learning algorithm as well as additional experiments using this kernelized algorithm.
I-A Paper organization
II Previous Work
In this section, we review some of the recent FR methods that can deal with poor resolution. These techniques can be broadly divided into the following categories.
II-A SR-based approaches
SR is the method of estimating an HR image $\mathbf{x}$ given a downgraded image $\mathbf{y}$. The LR image model is often given as
$$\mathbf{y} = \mathbf{B}\mathbf{H}\mathbf{x} + \boldsymbol{\eta},$$
where $\mathbf{B}$, $\mathbf{H}$ and $\boldsymbol{\eta}$ are the downsampling matrix, the blurring matrix and the noise, respectively. Earlier works for solving the above problem were based on taking multiple LR inputs and combining them to produce the HR image. A classical work by Baker and Kanade [3] showed that methods that combine multiple LR images using smoothness priors fail to produce good results as the resolution factor increases. They also proposed a face hallucination method for super-resolving face images. Subsequently, there have been works using a single image for SR, such as example-based SR [4], SR using neighbor embedding [5] and sparse representation-based SR [6]. While these methods can be used for super-resolving face images and subsequent recognition, methods have also been proposed specifically for faces.
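The degradation model above (blur, then downsample, then add noise) can be illustrated with a small NumPy sketch. This is only a toy version under stated assumptions: it works on a 1-D signal rather than a vectorized face image, uses a moving-average blur, and the helper names `blur_matrix`, `downsample_matrix` and `degrade` are hypothetical and do not come from the paper.

```python
import numpy as np

def blur_matrix(n, width=3):
    """n x n moving-average blur (assumed blur model); rows sum to one."""
    H = np.zeros((n, n))
    half = width // 2
    for i in range(n):
        lo, hi = max(0, i - half), min(n, i + half + 1)
        H[i, lo:hi] = 1.0 / (hi - lo)
    return H

def downsample_matrix(n, factor):
    """(n // factor) x n matrix that keeps every `factor`-th sample."""
    m = n // factor
    B = np.zeros((m, n))
    B[np.arange(m), np.arange(m) * factor] = 1.0
    return B

def degrade(x, factor=4, noise_std=0.0, rng=None):
    """Apply y = B H x + noise to a 1-D signal x."""
    rng = np.random.default_rng(0) if rng is None else rng
    n = x.shape[0]
    H = blur_matrix(n)
    B = downsample_matrix(n, factor)
    return B @ H @ x + noise_std * rng.standard_normal(n // factor)

x = np.linspace(0.0, 1.0, 16)   # stand-in for a vectorized HR image
y = degrade(x, factor=4)        # LR observation with 4 samples
```

For real images the same structure applies to the column-stacked pixels, with a 2-D blur kernel and 2-D decimation in place of the 1-D operators.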
In particular, an eigenface-domain SR method for FR was proposed by Gunturk et al. in [7]. This method addresses FR at low resolution by super-resolving multiple LR images using their PCA-domain representation. Given an LR face image, Jia and Gong [8] propose to directly compute a maximum likelihood identity parameter vector in the HR tensor space that can be used for SR and recognition. Hennings-Yeomans et al. [9] presented a Tikhonov regularization method that combines the different steps of SR and recognition in one step. Zou and Yuen [10] proposed a relational learning approach for super-resolution and recognition of low-resolution faces.
II-B Metric learning-based approaches
Though LR images are not directly suitable for face recognition, it is also not necessary to super-resolve an image before recognition, as the problem of recognition is not the same as SR. Based on this motivation, some different approaches to this problem have been suggested. Coupled metric learning [11] attempts to solve this problem by mapping the LR image to a new subspace, where higher recognition rates can be achieved. A similar approach for improving the matching performance of LR images using multidimensional scaling was recently proposed by Biswas et al. in [12, 13, 14]. Further, Ren et al. [15] used coupled kernel methods for low-resolution recognition. A coupled Fisher analysis method was proposed by Siena et al. [16]. Lei et al. [17] also proposed a coupled discriminant analysis framework for heterogeneous face recognition.
II-C Other methods
There have been works to solve the problem of unconstrained FR using videos. In particular, Arandjelovic and Cipolla [18] use a video database of LR face images with variability in pose and illumination. Their method combines a photometric model of image formation with a statistical model of generic face appearance variation to deal with illumination. To handle pose variation, it learns local appearance manifold structure and a robust same-identity likelihood.
A change in the resolution of an image changes its scale. A scale change has a multiplicative effect on the distances in the image. Hence, if the image is represented in the log-polar domain, a scale change leads to a translation in that domain. Based on this, an FR approach was suggested by Hotta et al. in [19] to make the algorithm scale invariant. This method extracts shift-invariant features in the log-polar domain.
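The scale-to-translation property can be written out in one line. In the worked equation below, the symbols $r$, $\theta$, $\alpha$ and $u$ are introduced purely for illustration and are not notation from [19]: a point at radius $r$ and angle $\theta$ is mapped to log-polar coordinates, and rescaling the image by a factor $\alpha > 0$ shifts the log-radius by a constant.

```latex
\begin{aligned}
(r,\theta) &\;\mapsto\; (u,\theta), \qquad u = \log r, \\
(\alpha r,\theta) &\;\mapsto\; (\log(\alpha r),\theta) = (u + \log\alpha,\ \theta).
\end{aligned}
```

The angular coordinate is unchanged, so features that are invariant to shifts along $u$ are invariant to the scale factor $\alpha$.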
Additional methods for LR FR include a correlation filter-based approach [KerCor] and a support vector data description method [20]. 3D face modelling has also been used to address the LR face recognition problem [21, 22]. Choi et al. [23] present an interesting study on the use of color for degraded face recognition.
III Proposed Approach
In this section, we present the details of our proposed low-resolution FR algorithm based on learning class-specific dictionaries.
III-A Image Relighting
As discussed earlier, the resolution change is usually coupled with other parameters such as illumination variation. In this section, we introduce an image relighting method that can deal with this illumination problem in LR face recognition. The idea is to capture various illumination conditions using the HR training samples, and subsequently use the expanded gallery for recognition at low resolutions.
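Assuming the Lambertian reflectance model for the facial surface, the HR intensity image $I$ is given by Lambert's cosine law as follows:
$$I_{i,j} = \rho_{i,j}\,\max(\mathbf{n}_{i,j}^T\mathbf{s},\, 0), \qquad (1)$$
where $I_{i,j}$ is the pixel intensity at location $(i,j)$, $\mathbf{s}$ is the light source direction, $\rho_{i,j}$ is the surface albedo at location $(i,j)$, and $\mathbf{n}_{i,j}$ is the surface normal of the corresponding surface point. Given the face image, $I$, image relighting involves estimating $\rho$, $\mathbf{n}$ and $\mathbf{s}$, which is an extremely ill-posed problem. To overcome this, we use 3D facial normal data [27] to first estimate an average surface normal, $\bar{\mathbf{n}}_{i,j}$. Further, the model is nonlinear due to the $\max$ term in (1). However, the shadow points do not reveal any information about the albedo. Hence, we neglect the $\max$ term in the further discussion. The albedo and the source direction can now be estimated as follows.

The source direction can be estimated using a linear least squares approach [28]:
$$\mathbf{s}^{(0)} = \Big(\sum_{i,j}\bar{\mathbf{n}}_{i,j}\bar{\mathbf{n}}_{i,j}^T\Big)^{-1}\sum_{i,j} I_{i,j}\,\bar{\mathbf{n}}_{i,j}.$$
An initial estimate of the albedo, $\rho^{(0)}$, can be obtained as:
$$\rho^{(0)}_{i,j} = \frac{I_{i,j}}{\bar{\mathbf{n}}_{i,j}^T\mathbf{s}^{(0)}}.$$
The final albedo estimate is obtained using a minimum mean square error approach based on the Wiener filtering framework [29]:
$$\hat{\rho} = E\big(\rho \mid \rho^{(0)}\big),$$
where $\hat{\rho}$ denotes the minimum mean square error (MMSE) estimate of the albedo.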
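The first two estimation steps (least-squares source direction, then an initial albedo) can be sketched as follows. This is a minimal sketch under stated assumptions: the shadow term is ignored, the normals are synthetic rather than taken from 3D face data, the Wiener-filter refinement of [29] is not implemented, and the function names `estimate_source` and `initial_albedo` are hypothetical.

```python
import numpy as np

def estimate_source(intensity, normals):
    """Least-squares light direction from intensities (P,) and normals (P, 3).

    Solves normals @ s ~= intensity, i.e. treats the albedo as constant,
    which is the simplifying assumption of this sketch.
    """
    s, *_ = np.linalg.lstsq(normals, intensity, rcond=None)
    return s

def initial_albedo(intensity, normals, s, eps=1e-6):
    """Per-pixel albedo estimate: intensity divided by the shading n^T s."""
    shading = normals @ s
    return intensity / np.maximum(shading, eps)  # guard against tiny shading

# Synthetic check: camera-facing normals, unit albedo, known light source.
rng = np.random.default_rng(0)
normals = rng.standard_normal((500, 3))
normals[:, :2] *= 0.3
normals[:, 2] = np.abs(normals[:, 2]) + 1.0
normals /= np.linalg.norm(normals, axis=1, keepdims=True)
s_true = np.array([0.2, 0.1, 0.97])
intensity = normals @ s_true             # unit albedo, no shadowed pixels

s_hat = estimate_source(intensity, normals)
rho0 = initial_albedo(intensity, normals, s_hat)
```

With spatially varying albedo the least-squares source estimate is only approximate, which is one reason a refinement step such as the MMSE estimate is applied afterwards.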
Using the estimated albedo map and the average normal, we can generate new images under any illumination condition using the image formation model (1). It was shown in [30] that an image of an arbitrarily illuminated object can be approximated by a linear combination of images of the same object in the same pose, illuminated by nine different light sources placed at preselected positions.
Hence, the image formation equation can be rewritten as
$$I = \sum_{k=1}^{9} a_k I_k, \qquad (2)$$
where
$$I_k = \rho\,\max(\mathbf{n}^T\mathbf{s}_k,\, 0)$$
and $\mathbf{s}_1, \dots, \mathbf{s}_9$ are prespecified illumination directions. Since the objective is to generate HR gallery images that are sufficient to account for any illumination in the probe image, we generate images under the prespecified illumination conditions and use them in the gallery. Figure 2 shows some relighted HR images along with the corresponding input and LR images. Furthermore, as this condition holds irrespective of the resolution of the LR image, the same set of gallery images can be used for all resolutions.
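Gallery extension by relighting amounts to rendering the Lambertian model once per light direction. The sketch below assumes an albedo map and per-pixel normals are already available as flat arrays; the function name `relight` and the three light directions are illustrative assumptions and are not the nine directions of [30].

```python
import numpy as np

def relight(albedo, normals, light_dirs):
    """Render one Lambertian image per light direction.

    albedo: (P,), normals: (P, 3), light_dirs: (K, 3); returns (K, P).
    """
    shading = np.maximum(light_dirs @ normals.T, 0.0)  # clamp attached shadows
    return shading * albedo[None, :]

# Toy scene: four pixels on a flat patch facing the camera.
normals = np.tile([0.0, 0.0, 1.0], (4, 1))
albedo = np.array([0.2, 0.4, 0.6, 0.8])
lights = np.array([[0.0, 0.0, 1.0],    # frontal light
                   [0.0, 0.6, 0.8],    # light tilted upwards
                   [0.0, 0.0, -1.0]])  # light from behind: fully shadowed
images = relight(albedo, normals, lights)
```

Each row of `images` is one synthetic gallery image; in the setting described above these would be reshaped to the HR image size and added to the gallery.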
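III-B Low Resolution Dictionary Learning
In LR face recognition, given labeled HR training images, the objective is to identify the class of a novel probe LR face image. Suppose that we are given $C$ distinct face classes and a set of $m_i$ HR training images per class, $i = 1, \dots, C$. Here, $m_i$ corresponds to the total number of images in class $i$ including the relighted images. We identify an $l \times q$ grayscale image as an $N$-dimensional vector, $\mathbf{x}$, which can be obtained by stacking its columns, where $N = l \times q$. Let
$$\mathbf{X}_i = [\mathbf{x}_{i1}, \dots, \mathbf{x}_{im_i}] \in \mathbb{R}^{N \times m_i}$$
be an $N \times m_i$ matrix of training images corresponding to the $i$-th class. For resolution and illumination robust recognition, the matrix $\mathbf{X}_i$ is premultiplied by the downsampling matrix $\mathbf{B}$ and the blurring matrix $\mathbf{H}$. Here, $\mathbf{H}$ has a fixed dimension of $N \times N$ and $\mathbf{B}$ will be of size $M \times N$, where $M = l' \times q'$, the LR probe being a grayscale image of size $l' \times q'$. The resolution-specific training matrix, $\tilde{\mathbf{X}}_i$, is thus created as
$$\tilde{\mathbf{X}}_i = \mathbf{B}\mathbf{H}\mathbf{X}_i. \qquad (3)$$
Given this matrix, we seek the dictionary that provides the best representation for each element of this matrix. One can obtain this by finding a dictionary $\mathbf{D}_i$ and a sparse matrix $\boldsymbol{\Gamma}_i$ that minimize the following representation error
$$(\hat{\mathbf{D}}_i, \hat{\boldsymbol{\Gamma}}_i) = \arg\min_{\mathbf{D}_i, \boldsymbol{\Gamma}_i} \|\tilde{\mathbf{X}}_i - \mathbf{D}_i\boldsymbol{\Gamma}_i\|_F^2 \quad \text{subject to} \quad \|\boldsymbol{\gamma}_k\|_0 \le T_0 \;\; \forall k, \qquad (4)$$
where $\boldsymbol{\gamma}_k$ represent the columns of $\boldsymbol{\Gamma}_i$ and the $\ell_0$ sparsity measure $\|\cdot\|_0$ counts the number of nonzero elements in the representation. Here, $\|\mathbf{A}\|_F$ denotes the Frobenius norm defined as $\|\mathbf{A}\|_F = \sqrt{\sum_{ij} A_{ij}^2}$. Many approaches have been proposed in the literature for solving such optimization problems. In this paper, we adapt the K-SVD algorithm [31] for solving (4) due to its simplicity and fast convergence. The K-SVD algorithm alternates between sparse-coding and dictionary update steps. In the sparse-coding step, $\mathbf{D}_i$ is fixed and the representation vectors $\boldsymbol{\gamma}_k$ are found for each example $\tilde{\mathbf{x}}_k$. Then, with $\boldsymbol{\Gamma}_i$ fixed, the dictionary is updated atom-by-atom in an efficient way. See [31] for more details on the K-SVD dictionary learning algorithm.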
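The alternation between sparse coding and atom-by-atom dictionary updates can be sketched compactly in NumPy. This is a simplified illustration, not the reference implementation of [31]: sparse coding uses a plain orthogonal matching pursuit, the dictionary is initialized from randomly chosen training samples (an assumption), and the names `omp` and `ksvd` are hypothetical.

```python
import numpy as np

def omp(D, x, T0):
    """Greedy sparse code of x over the atoms of D with at most T0 nonzeros."""
    residual, support = x.copy(), []
    gamma, coef = np.zeros(D.shape[1]), np.zeros(0)
    for _ in range(T0):
        k = int(np.argmax(np.abs(D.T @ residual)))
        if k in support:          # no new atom improves the fit
            break
        support.append(k)
        coef, *_ = np.linalg.lstsq(D[:, support], x, rcond=None)
        residual = x - D[:, support] @ coef
    gamma[support] = coef
    return gamma

def ksvd(X, n_atoms, T0, n_iter=10, seed=0):
    """Learn a dictionary D and sparse codes G such that X ~= D @ G."""
    rng = np.random.default_rng(seed)
    D = X[:, rng.choice(X.shape[1], n_atoms, replace=False)].astype(float)
    D /= np.linalg.norm(D, axis=0, keepdims=True) + 1e-12
    for _ in range(n_iter):
        # Sparse-coding step: dictionary fixed, code every training column.
        G = np.column_stack([omp(D, x, T0) for x in X.T])
        # Dictionary update: refit each atom on the samples that use it.
        for k in range(n_atoms):
            users = np.flatnonzero(G[k])
            if users.size == 0:
                continue
            G[k, users] = 0.0
            E = X[:, users] - D @ G[:, users]   # error without atom k
            U, S, Vt = np.linalg.svd(E, full_matrices=False)
            D[:, k] = U[:, 0]                   # best rank-one fit
            G[k, users] = S[0] * Vt[0]
    return D, G

rng = np.random.default_rng(1)
X = rng.standard_normal((8, 20))   # stand-in for one class's LR training matrix
D, G = ksvd(X, n_atoms=5, T0=2, n_iter=5)
```

In the setting described above, one such dictionary would be learned per class from that class's resolution-specific training matrix.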
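Classification: Given an LR probe, it is column-stacked to give the column vector $\mathbf{y}$. It is projected onto the span of the atoms in each class dictionary, $\mathbf{D}_i$, using the orthogonal projector
$$\mathbf{P}_i = \mathbf{D}_i(\mathbf{D}_i^T\mathbf{D}_i)^{-1}\mathbf{D}_i^T.$$
The approximation and residual vectors can then be calculated as
$$\hat{\mathbf{y}}_i = \mathbf{P}_i\mathbf{y} = \mathbf{D}_i\boldsymbol{\alpha}_i \qquad (5)$$
and
$$\mathbf{r}_i(\mathbf{y}) = \mathbf{y} - \hat{\mathbf{y}}_i = (\mathbf{I} - \mathbf{P}_i)\mathbf{y}, \qquad (6)$$
respectively, where
$\mathbf{I}$ is the identity matrix and
$$\boldsymbol{\alpha}_i = (\mathbf{D}_i^T\mathbf{D}_i)^{-1}\mathbf{D}_i^T\mathbf{y} \qquad (7)$$
are the coefficients. Since the K-SVD algorithm finds the dictionary, $\mathbf{D}_i$, that leads to the best representation for each example in the training matrix of class $i$, $\|\mathbf{r}_i(\mathbf{y})\|_2$ will be small if $\mathbf{y}$ belongs to the $i$-th class and large for the other classes. Based on this, we can classify $\mathbf{y}$ by assigning it to the class, $d$, that gives the lowest reconstruction error, $\|\mathbf{r}_i(\mathbf{y})\|_2$:
$$d = \text{identity}(\mathbf{y}) = \arg\min_i \|\mathbf{r}_i(\mathbf{y})\|_2. \qquad (8)$$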
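The residual-based decision rule can be sketched in a few lines. The sketch assumes one dictionary per class is already available as a NumPy array whose columns are atoms; the least-squares solve is used in place of forming the projector explicitly, and the names `residual_norm` and `classify` are hypothetical.

```python
import numpy as np

def residual_norm(D, y):
    """Norm of y minus its orthogonal projection onto the span of D's columns."""
    alpha, *_ = np.linalg.lstsq(D, y, rcond=None)   # projection coefficients
    return float(np.linalg.norm(y - D @ alpha))

def classify(dictionaries, y):
    """Return the index of the class with the smallest residual, and all residuals."""
    errors = [residual_norm(D, y) for D in dictionaries]
    return int(np.argmin(errors)), errors

# Toy example: two class dictionaries spanning different coordinate planes.
D0 = np.eye(4)[:, :2]
D1 = np.eye(4)[:, 2:]
y = np.array([0.0, 0.0, 2.0, -1.0])   # lies in the span of D1
label, errors = classify([D0, D1], y)
```

Solving the least-squares problem avoids inverting the Gram matrix of the atoms explicitly, which is a numerically safer way to obtain the same projection.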
Generic Dictionary Learning: The class-specific dictionary learnt above can be extended to use features other than intensity images. Specifically, the dictionary can be learnt using features such as an eigenbasis, extracted from the training matrix. However, as equation (3) does not hold for such features, the resolution-specific feature matrix is extracted directly from the resolution-specific training matrix. Our Synthesis-based LR FR (SLRFR) algorithm is summarized in Figure 3.
III-C Nonlinear Dictionary Learning
The class identities in the face dataset may not be linearly separable. Hence, we also extend the SLRFR framework to the kernel space. This essentially requires the dictionary learning model to be nonlinear [32].
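Let $\Phi: \mathbb{R}^M \rightarrow \mathcal{H}$ be a nonlinear mapping from the $M$-dimensional space of vectorized LR images into a dot product space $\mathcal{H}$. A nonlinear dictionary can be trained in the feature space by solving the following optimization problem
$$\arg\min_{\mathbf{A}, \boldsymbol{\Gamma}} \|\Phi(\tilde{\mathbf{X}}) - \Phi(\tilde{\mathbf{X}})\mathbf{A}\boldsymbol{\Gamma}\|_F^2 \quad \text{subject to} \quad \|\boldsymbol{\gamma}_k\|_0 \le T_0 \;\; \forall k, \qquad (9)$$
where $\tilde{\mathbf{X}}$ is the resolution-specific training matrix with $m$ columns $\tilde{\mathbf{x}}_k$, $\boldsymbol{\gamma}_k$ are the columns of $\boldsymbol{\Gamma}$, $T_0$ is the sparsity level and
$$\Phi(\tilde{\mathbf{X}}) = [\Phi(\tilde{\mathbf{x}}_1), \dots, \Phi(\tilde{\mathbf{x}}_m)].$$
In (9) we have used the following model for the dictionary in the feature space,
$$\mathbf{D} = \Phi(\tilde{\mathbf{X}})\mathbf{A},$$
since it can be shown that the dictionary lies in the linear span of the samples $\Phi(\tilde{\mathbf{X}})$, where $\mathbf{A} \in \mathbb{R}^{m \times K}$ is a matrix with $K$ atoms [32]. This model provides adaptivity via modification of the matrix $\mathbf{A}$. Through some algebraic manipulations, the cost function in (9) can be rewritten as
$$\|\Phi(\tilde{\mathbf{X}}) - \Phi(\tilde{\mathbf{X}})\mathbf{A}\boldsymbol{\Gamma}\|_F^2 = \mathrm{tr}\big((\mathbf{I} - \mathbf{A}\boldsymbol{\Gamma})^T\, \mathcal{K}(\tilde{\mathbf{X}}, \tilde{\mathbf{X}})\, (\mathbf{I} - \mathbf{A}\boldsymbol{\Gamma})\big), \qquad (10)$$
where $\mathcal{K}(\tilde{\mathbf{X}}, \tilde{\mathbf{X}})$ is a kernel matrix whose elements are computed from
$$\kappa(\tilde{\mathbf{x}}_i, \tilde{\mathbf{x}}_j) = \Phi(\tilde{\mathbf{x}}_i)^T\Phi(\tilde{\mathbf{x}}_j).$$
It is apparent that the objective function is feasible since it only involves a matrix of finite dimension, $\mathcal{K} \in \mathbb{R}^{m \times m}$, instead of dealing with a possibly infinite dimensional dictionary.
An important property of this formulation is that the computation of $\mathcal{K}$ only requires dot products. Therefore, we are able to employ Mercer kernel functions to compute these dot products without carrying out the mapping $\Phi$. Some commonly used kernels include polynomial kernels
$$\kappa(\mathbf{x}, \mathbf{y}) = (\langle \mathbf{x}, \mathbf{y} \rangle + a)^b$$
and Gaussian kernels
$$\kappa(\mathbf{x}, \mathbf{y}) = \exp\Big(-\frac{\|\mathbf{x} - \mathbf{y}\|^2}{c}\Big),$$
where $a$, $b$ and $c$ are the parameters.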
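The kernel trick described above can be checked numerically: the feature-space cost only needs the kernel matrix, never the mapping itself. The sketch below assumes samples are stored as columns, uses the parameter names `a`, `b` and `c` as assumptions for the kernel parameters, and introduces the hypothetical helpers `polynomial_kernel`, `gaussian_kernel` and `kernel_objective`.

```python
import numpy as np

def polynomial_kernel(X, Y, a=1.0, b=2):
    """Polynomial kernel matrix between the columns of X and the columns of Y."""
    return (X.T @ Y + a) ** b

def gaussian_kernel(X, Y, c=1.0):
    """Gaussian (RBF) kernel matrix between the columns of X and Y."""
    sq = (X**2).sum(0)[:, None] + (Y**2).sum(0)[None, :] - 2.0 * X.T @ Y
    return np.exp(-np.maximum(sq, 0.0) / c)   # clip tiny negative round-off

def kernel_objective(K, A, G):
    """Feature-space reconstruction cost written with the kernel matrix only."""
    R = np.eye(K.shape[0]) - A @ G
    return float(np.trace(R.T @ K @ R))

rng = np.random.default_rng(0)
X = rng.standard_normal((5, 7))    # 7 samples of dimension 5
A = rng.standard_normal((7, 3))    # coefficient matrix defining 3 atoms
G = rng.standard_normal((3, 7))    # codes (dense here, for the check only)

K_lin = X.T @ X                    # linear kernel: feature map is the identity
cost_kernel = kernel_objective(K_lin, A, G)
cost_direct = np.linalg.norm(X - X @ A @ G, "fro") ** 2
```

With the linear kernel the two costs agree, which is a quick sanity check of the trace form; swapping in `gaussian_kernel(X, X)` evaluates the same cost in a feature space that is never formed explicitly.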
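Similar to the optimization of (4) using the linear K-SVD algorithm [31], the optimization of (9) involves sparse coding and dictionary update steps in the feature space, which results in the kernel dictionary learning algorithm [32]. Details of the optimization can be found in [32] and Appendix A.
Classification: Let $\{\mathbf{A}_i\}_{i=1}^{C}$ denote the learned dictionaries for the $C$ classes, with $\tilde{\mathbf{X}}_i$ the resolution-specific training matrix of class $i$. Let $\mathbf{y}$ be a vectorized LR probe image of size $M$. We first find coefficient vectors $\boldsymbol{\gamma}_i$ with at most $T_0$ nonzero coefficients such that $\Phi(\tilde{\mathbf{X}}_i)\mathbf{A}_i\boldsymbol{\gamma}_i$ approximates $\Phi(\mathbf{y})$ by minimizing the following problem
$$\boldsymbol{\gamma}_i = \arg\min_{\boldsymbol{\gamma}} \|\Phi(\mathbf{y}) - \Phi(\tilde{\mathbf{X}}_i)\mathbf{A}_i\boldsymbol{\gamma}\|_2^2 \quad \text{subject to} \quad \|\boldsymbol{\gamma}\|_0 \le T_0 \qquad (11)$$
for all $i = 1, \dots, C$. The above problem can be solved by the Kernel Orthogonal Matching Pursuit (KOMP) algorithm [32]. The reconstruction error is then computed as
$$r_i = \|\Phi(\mathbf{y}) - \Phi(\tilde{\mathbf{X}}_i)\mathbf{A}_i\boldsymbol{\gamma}_i\|_2^2 = \kappa(\mathbf{y}, \mathbf{y}) - 2\,\mathcal{K}(\mathbf{y}, \tilde{\mathbf{X}}_i)\mathbf{A}_i\boldsymbol{\gamma}_i + \boldsymbol{\gamma}_i^T\mathbf{A}_i^T\mathcal{K}(\tilde{\mathbf{X}}_i, \tilde{\mathbf{X}}_i)\mathbf{A}_i\boldsymbol{\gamma}_i, \qquad (12)$$
where $\mathcal{K}(\mathbf{y}, \tilde{\mathbf{X}}_i)$ is the row vector of kernel values between $\mathbf{y}$ and the columns of $\tilde{\mathbf{X}}_i$. Similar to the linear case, once the residuals are found, we can classify $\mathbf{y}$ by assigning it to the class, $d$, that gives the lowest reconstruction error, $r_i$:
$$d = \arg\min_i r_i. \qquad (13)$$
Our kernel Synthesis-based LR FR (kerSLRFR) algorithm is summarized in Figure 4.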
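The feature-space reconstruction error used for classification can be evaluated from kernel values alone. The sketch below is an illustration under stated assumptions: it uses a linear kernel so the result can be verified directly, obtains the coefficients by ordinary least squares instead of KOMP, and the function name `kernel_residual` is hypothetical.

```python
import numpy as np

def kernel_residual(k_yy, k_yX, K_XX, A, gamma):
    """Squared feature-space error between the probe and its reconstruction.

    k_yy: kernel value of the probe with itself, k_yX: (m,) kernel values
    between the probe and the training samples, K_XX: (m, m) kernel matrix,
    A: (m, K) dictionary coefficients, gamma: (K,) code of the probe.
    """
    v = A @ gamma
    return float(k_yy - 2.0 * k_yX @ v + v @ K_XX @ v)

rng = np.random.default_rng(0)
X = [rng.standard_normal((6, 4)) for _ in range(2)]   # two toy classes
A = [np.eye(4)[:, :2] for _ in range(2)]              # atoms = first two samples
y = X[1][:, 0].copy()                                 # probe drawn from class 1

residuals = []
for Xi, Ai in zip(X, A):
    # Least-squares code; only valid here because the kernel is linear.
    gamma, *_ = np.linalg.lstsq(Xi @ Ai, y, rcond=None)
    residuals.append(kernel_residual(y @ y, y @ Xi, Xi.T @ Xi, Ai, gamma))
label = int(np.argmin(residuals))
```

For a nonlinear kernel the same function applies unchanged once `k_yy`, `k_yX` and `K_XX` are computed with that kernel and the code comes from a kernel pursuit algorithm.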
III-D Joint Nonlinear Dictionary Learning
In the previous sections, we described methods to learn resolution-specific dictionaries for the linear and nonlinear cases. However, even though the dictionaries can capture class-specific variations, the recognition performance degrades at low resolutions. Hence, information available in the HR training images must be exploited to make the method robust. To this end, we propose a framework for learning joint dictionaries for HR and the corresponding LR images. We achieve this by sharing sparse codes between the HR and LR dictionaries. This regularizes the learned LR dictionary to output sparse codes similar to those of the HR dictionary, thus making it robust. The proposed formulation is described as follows.
Let be a nonlinear mapping from dimensional space into a dot product space . We seek to learn dictionaries and by solving the optimization problem:
(14) 
where,
is a hyperparameter. This can be reformulated as:
(15) 
where,
The optimization problem (14) can be solved in a similar way as (9) using a modified version of kernel KSVD algorithm [32]. Details of the method are presented in Appendix A.
Classification: Let denote the learned dictionaries for classes. Then a low resolution probe can be classified using the KOMP algorithm [32], as described in (11), (12) and (13), by substituting for dictionary term. The proposed algorithm joint kernel SLRFR (jointKerSLRFR) is summarized in Figure 5.
IV Experiments
To demonstrate the effectiveness of our method, in this section, we present experimental results on various face recognition datasets. We evaluate the proposed recognition framework and compare it with metric learning [12, 11] and super-resolution [10, 9] based methods. For all the experiments, we learnt the dictionary elements using PCA features.
IV-A FRGC Dataset
We also evaluated on Experiment 1 of the FRGC dataset [33]. It consists of gallery images, each subject having one gallery and probe images under controlled setting. A separate training set of images is also available which was used to learn the PCA basis.
Implementation The resolution of the HR image was fixed at and probe images at resolutions of , and were created by smoothing and downsampling the HR probe images. From each gallery image, 5 different illumination images were produced, which were flipped to give 10 images per subject. The experiments were done at resolutions of and , thus validating the method across resolutions. We also tested the CLPM algorithm [11] and PCA on the expanded gallery to get a fair comparison. We also report the recognition rate for PCA using the original gallery image to demonstrate the utility of gallery extension at low resolutions. Results from other algorithms are also tabulated. We chose the RBF kernel for testing kerSLRFR and jointKerSLRFR and set for jointKerSLRFR. The kernel parameter was obtained through cross-validation for both HR and LR data.
Observations Figure 6 and Table I show that the proposed methods clearly outperform previous algorithms. The proposed algorithm, SLRFR, improves on the CLPM algorithm at all resolutions, while kerSLRFR further boosts the performance. The jointKerSLRFR algorithm shows the best performance among all the methods. The joint sparse coding framework clearly helps in improving performance at low resolutions. Further, PCA using the extended gallery set also improves the performance over using a single gallery image. This shows that our method of gallery extension can be coupled with existing face recognition algorithms to improve performance at low resolutions.
Table I: Resolution | MDS [12] | S2R2 [9] | VLR [10] | SLRFR | kerSLRFR | jointKerSLRFR
Sensitivity to noise: Low resolution images are often corrupted with noise. Thus, sensitivity to noise is important in assessing the performance of different algorithms. Figure 7 shows the recognition rate for different algorithms with increasing noise level. It can be seen that CLPM shows a sharp decline with increasing noise, but the proposed approaches SLRFR, kerSLRFR and jointKerSLRFR are stable with noise. This is because the CLPM algorithm learns a model tailored to noise-free low resolution images, whereas the generative approach in the proposed methods leads to stable performance with increasing noise.
IV-B CMU-PIE dataset
The PIE dataset [34] consists of subjects in frontal pose and under different illumination conditions. Each subject has face images under different illumination conditions.
Implementation We chose first subjects with randomly chosen illuminations as the training set to learn the PCA basis. For the remaining subjects and the illumination conditions, the experiment was done by choosing one gallery image per subject and taking the remaining as the probe images. The procedure was repeated for all the images and the final recognition rate was obtained by averaging over all the images. The size of the HR images was fixed to . The LR images were obtained by smoothing followed by downsampling the HR images. For each gallery image, images under different illuminations produced using the gallery extension method and the corresponding flipped images were added to the gallery set. The RBF kernel was chosen for kerSLRFR and jointKerSLRFR and the kernel parameter was set through cross-validation.
Table II: Resolution | MDS [12] | VLR* [10] | SLRFR | kerSLRFR | jointKerSLRFR
Observations Figures 8 and 9 and Table II show that the proposed method clearly outperforms previous algorithms. The proposed algorithms show over improvement over PCA performance with the original gallery set at rank one recognition rate and are better than the CLPM method at the lowest probe resolution. PCA using the extended gallery set also improves the performance over using a single gallery image. This shows that our method of gallery extension can be coupled with existing face recognition algorithms to improve performance at low resolutions.
IV-C AR Face dataset
We also tested the proposed algorithms on the AR Face dataset [35]. The AR face dataset consists of faces with varying illumination and expression conditions, captured in two sessions. We evaluated our algorithms on a set of 100 users. Images from the first session, seven for each subject, were used as training and gallery, and the images from the second session, again seven per subject, were used for testing.
Implementation To test our method and compare with the existing metric learning based methods [11, 12], we chose first subjects from the first session as the training set. For the remaining subjects, the experiment was done by choosing one gallery image per subject from the first session and taking the corresponding images from session 2 as probes. The procedure was repeated for all the images in session 1 and the final recognition rate was obtained by averaging over all the runs. The size of the HR images was fixed to . The LR images were obtained by smoothing followed by downsampling the HR images to . We also tested the CLPM algorithm [11] and PCA on the expanded gallery to get a fair comparison. Results from other algorithms are also tabulated.
Observations Figure 10 shows the CMC curve for the first ranks. Clearly, the proposed approaches outperform other methods. SLRFR gives better rank one performance than the CLPM algorithm, while kerSLRFR and jointKerSLRFR further increase the recognition rate over all the ranks.
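A cumulative match characteristic (CMC) curve of the kind reported here can be computed directly from per-class reconstruction errors. The sketch below assumes a matrix of residuals with one row per probe and one column per gallery class, where a lower value means a better match; the function name `cmc_curve` and the toy numbers are illustrative and are not results from the experiments.

```python
import numpy as np

def cmc_curve(residuals, true_labels):
    """Fraction of probes whose true class is within the top-k matches.

    residuals: (n_probes, n_classes) errors, lower is better.
    Returns an array whose entry k-1 is the rank-k recognition rate.
    """
    order = np.argsort(residuals, axis=1)              # best match first
    truth = np.asarray(true_labels)[:, None]
    ranks = np.argmax(order == truth, axis=1)          # 0-based rank of truth
    hits = np.bincount(ranks, minlength=residuals.shape[1])
    return np.cumsum(hits) / residuals.shape[0]

# Toy example with three probes and three classes (made-up residuals).
residuals = np.array([[0.1, 0.5, 0.9],
                      [0.4, 0.2, 0.7],
                      [0.3, 0.2, 0.8]])
cmc = cmc_curve(residuals, true_labels=[0, 1, 0])
```

The first entry of `cmc` is the rank-one recognition rate; later entries can only stay equal or increase, reaching one at the last rank.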
IV-D Outdoor Face Dataset
We also tested our method on a challenging outdoor face dataset. The database consists of face images of individuals at different distances from the camera. We chose a subset of low resolution images, which were also corrupted with blur, illumination and pose variations. high resolution, frontal and well-illuminated images were taken as the gallery set for each subject. The images were aligned using manually selected facial points. The gallery resolution was fixed at and the probe resolution at . Figure 11 shows some of the gallery images and the low quality probe images. The recognition rates for the dataset are shown in Table III. We compare our method with Regularized Discriminant Analysis (RDA) [36] and CLPM [11]. For the reg LDA comparison, we first used PCA as a dimensionality reduction method to project the raw data onto an intermediate space, and then used RDA to project the PCA coefficients onto a final feature space.
Table III: Method | Recognition Rate
reg LDA |
CLPM [11] |
SLRFR |
kerSLRFR |
jointKerSLRFR |
Observations It can be seen from the table that SLRFR outperforms other algorithms on this difficult outdoor face dataset. The kerSLRFR algorithm further improves the performance, however, the jointKerSLRFR doesn’t improve it further. This may be because this is a challenging dataset containing variations other than LR, like pose, blur, etc. The CLPM algorithm performs rather poorly on this dataset, as it is unable to learn the challenging variations in the dataset.
V Computational Efficiency
All the experiments were conducted in Matlab on a 2.13GHz Intel Xeon processor. The gallery extension step using relighting took an average of per gallery image of size . The K-SVD dictionary took on an average to train each class, while classification of a probe image was done in an average of at the resolution of . Thus, the proposed algorithm is computationally efficient. Further, as the extended gallery can be used for all resolutions, it can be computed once and stored for a database.
VI Discussion and Conclusion
We have proposed an algorithm that provides good accuracy for low resolution images, even when a single HR gallery image is provided per person. While the method avoids the complexity of previously proposed algorithms, it is also shown to provide state-of-the-art results when the LR probe differs in illumination from the given gallery image. The idea of exploiting information in the HR gallery image is novel and can be used to extend the limits of remote face recognition. Future work will extend the proposed method to account for other variations such as pose and expression. The present classification using reconstruction error can also be studied further to explore a mix of discriminative and reconstructive techniques to improve the recognition.
Acknowledgment
This work was supported by a Multidisciplinary University Research Initiative grant from the Office of Naval Research under the grant N000140810638.
References
 [1] W. Zhao, R. Chellappa, P. Phillips, and A. Rosenfeld, “Face recognition: A literature survey,” ACM Computing Surveys, vol. 35, no. 4, pp. 399–458, Dec 2003.
 [2] S. Shekhar, V. M. Patel, and R. Chellappa, “Synthesisbased recognition of low resolution faces,” in International Joint Conference on Biometrics, Oct 2011, pp. 1–6.
 [3] S. Baker and T. Kanade, “Limits on superresolution and how to break them,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 24, no. 9, pp. 1167–1183, Sep 2002.
 [4] W. T. Freeman, T. R. Jones, and E. C. Pasztor, “Examplebased superresolution,” IEEE Computer Graphics and Applications, vol. 22, pp. 56–65, 2002.

 [5] H. Chang, D.-Y. Yeung, and Y. Xiong, “Super-resolution through neighbor embedding,” in IEEE Conference on Computer Vision and Pattern Recognition, vol. 1, Jun 2004, pp. 275–282.
 [6] J. Yang, J. Wright, T. Huang, and Y. Ma, “Image super-resolution as sparse representation of raw image patches,” in IEEE Conference on Computer Vision and Pattern Recognition, Jun 2008, pp. 1–8.
 [7] B. Gunturk, A. Batur, Y. Altunbasak, M. H. Hayes III, and R. Mersereau, “Eigenface-domain super-resolution for face recognition,” IEEE Transactions on Image Processing, vol. 12, no. 5, pp. 597–606, May 2003.
 [8] K. Jia and S. Gong, “Multimodal tensor face for simultaneous superresolution and recognition,” in IEEE International Conference on Computer Vision, vol. 2, Oct 2005, pp. 1683–1690.

 [9] P. Hennings-Yeomans, S. Baker, and B. Kumar, “Simultaneous super-resolution and feature extraction for recognition of low-resolution faces,” in IEEE Conference on Computer Vision and Pattern Recognition, Jun 2008, pp. 1–8.
 [10] W. Zou and P. Yuen, “Very low resolution face recognition problem,” IEEE Transactions on Image Processing, vol. 21, no. 1, pp. 327–340, Jan 2012.
 [11] B. Li, H. Chang, S. Shan, and X. Chen, “Low resolution face recognition via coupled locality preserving mappings,” in IEEE Signal Processing Letters, vol. 17, no. 1, Jan 2010, pp. 20–23.
 [12] S. Biswas, K. Bowyer, and P. Flynn, “Multidimensional scaling for matching lowresolution facial images,” in IEEE International Conference on Biometrics: Theory Applications and Systems, Sep 2010, pp. 1–6.
 [13] S. Biswas, G. Aggarwal, P. J. Flynn, and K. W. Bowyer, “Poserobust recognition of lowresolution face images,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 35, no. 12, pp. 3037–3049, 2013.
 [14] S. Biswas, K. W. Bowyer, and P. J. Flynn, “Multidimensional scaling for matching lowresolution face images,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 34, no. 10, pp. 2019–2030, Oct 2012.
 [15] C.X. Ren, D.Q. Dai, and H. Yan, “Coupled kernel embedding for lowresolution face image recognition,” IEEE Transactions on Image Processing, vol. 21, no. 8, pp. 3770–3783, Aug 2012.
 [16] S. Siena, V. N. Boddeti, and B. V. Kumar, “Coupled marginal Fisher analysis for lowresolution face recognition,” in ECCV 2012: Workshops and Demonstrations. Springer, 2012, pp. 240–249.
 [17] Z. Lei, S. Liao, A. Jain, and S. Li, “Coupled discriminant analysis for heterogeneous face recognition,” IEEE Transactions on Information Forensics and Security, vol. 7, no. 6, pp. 1707–1716, Dec 2012.
 [18] O. Arandjelovic and R. Cipolla, “Face recognition from video using the generic shapeillumination manifold,” in European Conference on Computer Vision, 2006, pp. IV: 27–40.
 [19] K. Hotta, T. Kurita, and T. Mishima, “Scale invariant face detection method using higherorder local autocorrelation features extracted from logpolar image,” in IEEE International Conference on Automatic Face and Gesture Recognition, Apr 1998, pp. 70–75.
 [20] S.W. Lee, J. Park, and S.W. Lee, “Low resolution face recognition based on support vector data description,” Pattern Recognition, vol. 39, no. 9, pp. 1809–1812, 2006.
 [21] G. Medioni, J. Choi, C.H. Kuo, A. Choudhury, L. Zhang, and D. Fidaleo, “Noncooperative persons identification at a distance with 3D face modeling,” in IEEE International Conference on Biometrics: Theory, Applications, and Systems, Sep 2007, pp. 1–6.
 [22] H. Rara, S. Elhabian, A. Ali, M. Miller, T. Starr, and A. Farag, “Distant face recognition based on sparse-stereo reconstruction,” in IEEE International Conference on Image Processing, Nov 2009, pp. 4141–4144.
 [23] J.Y. Choi, Y.M. Ro, and K. Plataniotis, “Color face recognition for degraded face images,” IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics, vol. 39, no. 5, pp. 1217–1230, Oct 2009.
 [24] J. Wright, A. Yang, A. Ganesh, S. Sastry, and Y. Ma, “Robust face recognition via sparse representation,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 31, no. 2, pp. 210–227, Feb 2009.
 [25] V. M. Patel, Y.C. Chen, R. Chellappa, and P. J. Phillips, “Dictionaries for image and videobased face recognition,” J. Opt. Soc. Am. A, vol. 31, no. 5, pp. 1090–1103, May 2014.
 [26] Y. Jia, M. Salzmann, and T. Darrell, “Factorized latent spaces with structured sparsity.” in Neural Information Processing Systems, 2010, pp. 982–990.
 [27] V. Blanz and T. Vetter, “Face recognition based on fitting a 3D morphable model,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 25, pp. 1063–1074, 2003.
 [28] M. J. Brooks and B. K. P. Horn, “Shape from shading.” Cambridge, MA: MIT Press, 1989.
 [29] S. Biswas, G. Aggarwal, and R. Chellappa, “Robust estimation of albedo for illuminationinvariant matching and shape recovery,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 29, no. 2, pp. 884–899, Mar 2009.
 [30] K.C. Lee, J. Ho, and D. J. Kriegman, “Acquiring linear subspaces for face recognition under variable lighting,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 27, pp. 684–698, 2005.
 [31] M. Aharon, M. Elad, and A. Bruckstein, “KSVD: An algorithm for designing overcomplete dictionaries for sparse representation,” IEEE Transactions on Signal Processing, vol. 54, no. 11, pp. 4311–4322, Nov 2006.
 [32] H. Van Nguyen, V. Patel, N. Nasrabadi, and R. Chellappa, “Design of nonlinear kernel dictionaries for object recognition,” IEEE Transactions on Image Processing, vol. 22, no. 12, pp. 5123–5135, Dec 2013.
 [33] P. Phillips, P. Flynn, W. Scruggs, K. Bowyer, J. Chang, K. Hoffman, J. Marques, J. Min, and W. Worek, “Overview of the face recognition grand challenge,” in IEEE Conference on Computer Vision and Pattern Recognition, 2005, pp. 947–954.
 [34] T. Sim, S. Baker, and M. Bsat, “The CMU pose, illumination, and expression database,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 25, no. 1, pp. 1615–1618, Dec 2003.
 [35] A. M. Martinez and R. Benavente, “The AR face database,” Tech. Rep., 1998.
 [36] J. Friedman, “Regularized discriminant analysis,” Journal of the American Statistical Association, vol. 84, pp. 165–175, 1989.
Appendix A
Here, we will describe the kernel dictionary learning algorithm [32] and the framework for the proposed joint kernel dictionary learning algorithm (jointKerKSVD).
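VI-A Kernel Dictionary Learning
The optimization problem (9) can be solved in two stages.
Sparse Coding: Here, the coefficient matrix $\mathbf{A}$, which defines the dictionary $\Phi(\tilde{\mathbf{X}})\mathbf{A}$ in the feature space, is kept fixed while searching for the optimal sparse code, $\boldsymbol{\Gamma}$. The cost term in (9) can be written as:
$$\|\Phi(\tilde{\mathbf{X}}) - \Phi(\tilde{\mathbf{X}})\mathbf{A}\boldsymbol{\Gamma}\|_F^2 = \sum_{i=1}^{m} \|\Phi(\tilde{\mathbf{x}}_i) - \Phi(\tilde{\mathbf{X}})\mathbf{A}\boldsymbol{\gamma}_i\|_2^2.$$
Hence, the optimization problem can be broken up into $m$ different subproblems:
$$\min_{\boldsymbol{\gamma}_i} \|\Phi(\tilde{\mathbf{x}}_i) - \Phi(\tilde{\mathbf{X}})\mathbf{A}\boldsymbol{\gamma}_i\|_2^2 \quad \text{subject to} \quad \|\boldsymbol{\gamma}_i\|_0 \le T_0, \qquad i = 1, \dots, m.$$
We can solve this using kernel orthogonal matching pursuit (KOMP). Let $I_s$ denote the set of selected atoms at iteration $s$, $\hat{\mathbf{x}}_s$ denote the reconstruction of the signal, $\tilde{\mathbf{x}}_i$, using the selected atoms, $\mathbf{r}_s$ being the corresponding residue and $\boldsymbol{\gamma}_s$ the sparse code at iteration $s$. Let $\mathbf{A}_{I_s}$ denote the columns of $\mathbf{A}$ indexed by $I_s$ and $\mathbf{a}_t$ the $t$-th column of $\mathbf{A}$.

1. Start with $s = 0$, $I_0 = \emptyset$, $\hat{\mathbf{x}}_0 = \mathbf{0}$.

2. Calculate the residue as:
$\mathbf{r}_s = \Phi(\tilde{\mathbf{x}}_i) - \hat{\mathbf{x}}_s$.

3. Project the residue on atoms not selected and add the atom with maximum projection value to $I_s$:
$$\tau_t = \mathbf{r}_s^T\Phi(\tilde{\mathbf{X}})\mathbf{a}_t = \big(\mathcal{K}(\tilde{\mathbf{x}}_i, \tilde{\mathbf{X}}) - \boldsymbol{\gamma}_s^T\mathbf{A}_{I_s}^T\mathcal{K}(\tilde{\mathbf{X}}, \tilde{\mathbf{X}})\big)\mathbf{a}_t, \quad t \notin I_s, \qquad (16)$$
where the second term vanishes for $s = 0$. Update the set as: $I_{s+1} = I_s \cup \{\arg\max_{t \notin I_s} |\tau_t|\}$.

4. Update the sparse code, $\boldsymbol{\gamma}_{s+1}$, and reconstruction, $\hat{\mathbf{x}}_{s+1}$, as:
$$\boldsymbol{\gamma}_{s+1} = \big(\mathbf{A}_{I_{s+1}}^T\mathcal{K}(\tilde{\mathbf{X}}, \tilde{\mathbf{X}})\mathbf{A}_{I_{s+1}}\big)^{-1}\big(\mathcal{K}(\tilde{\mathbf{x}}_i, \tilde{\mathbf{X}})\mathbf{A}_{I_{s+1}}\big)^T, \qquad (17)$$
$$\hat{\mathbf{x}}_{s+1} = \Phi(\tilde{\mathbf{X}})\mathbf{A}_{I_{s+1}}\boldsymbol{\gamma}_{s+1}. \qquad (18)$$

5. $s \leftarrow s + 1$; Repeat steps 2–4 $T_0$ times.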
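The pursuit steps listed above can be sketched with kernel quantities only. This is a minimal illustration following the general KOMP idea of [32] rather than a reproduction of the authors' code: the function name `komp` and its argument names are hypothetical, a pseudo-inverse is used for the coefficient update as a numerical safeguard (an assumption), and the check uses a linear kernel with orthonormal samples so the expected code is known.

```python
import numpy as np

def komp(k_zX, K_XX, A, T0):
    """Kernel orthogonal matching pursuit for a single signal z.

    k_zX: (m,) kernel values between z and the m training samples,
    K_XX: (m, m) kernel matrix of the training samples,
    A: (m, K) coefficients; the feature-space dictionary is Phi(X) @ A,
    T0: maximum number of selected atoms. Returns a length-K sparse code.
    """
    n_atoms = A.shape[1]
    support = []
    gamma = np.zeros(n_atoms)
    for _ in range(T0):
        # Correlation of the feature-space residue with every atom.
        tau = (k_zX - (A @ gamma) @ K_XX) @ A
        tau[support] = 0.0                      # ignore atoms already selected
        k = int(np.argmax(np.abs(tau)))
        if k in support:                        # nothing left to add
            break
        support.append(k)
        A_I = A[:, support]
        # Least-squares coefficients on the selected atoms, in kernel form.
        coef = np.linalg.pinv(A_I.T @ K_XX @ A_I) @ (k_zX @ A_I)
        gamma = np.zeros(n_atoms)
        gamma[support] = coef
    return gamma

# Check with a linear kernel and orthonormal samples (so atoms are orthonormal).
rng = np.random.default_rng(0)
X, _ = np.linalg.qr(rng.standard_normal((6, 6)))
A = np.eye(6)[:, :4]                 # four atoms, each equal to one sample
z = 2.0 * X[:, 1] - 1.0 * X[:, 3]    # signal built from atoms 1 and 3
gamma = komp(z @ X, X.T @ X, A, T0=2)
```

Replacing `z @ X` and `X.T @ X` with values from a nonlinear kernel runs the same pursuit in the corresponding feature space.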
Dictionary update Once the sparse codes are calculated, the dictionary can be updated as:
The dictionary atoms are now normalized to unit norm in feature space:
VI-B Joint kernel dictionary learning
The optimization problem (14) can be solved in a similar way as the kernel dictionary learning problem, in two alternating steps:
Sparse Coding
Here, we keep and fixed and learn the joint sparse code . The optimization problem (15) can be written as:
Thus, the optimization can be broken up into subproblems:
This is similar to the original kernel dictionary learning formulation, with the signal replaced by . Thus, the above problem can be solved using similar procedure as KOMP. Let denote the set of selected atoms at iteration , denote the reconstruction of the signal, using the selected atoms, being the corresponding residue and the sparse code at iteration.

Start with , , .

Calculate the residue as:
.

Project the residue on atoms not selected and add the atom with maximum projection value to :
(19) where,
Update the set as:

Update the sparse code, and reconstruction, as:
(20) (21) 
; Repeat steps 2–4 times.
Dictionary update The dictionaries and can now be obtained as:
Further the dictionary atoms are normalized to unit norm in feature space: