I Introduction
Most countries around the world use Closed-Circuit Television (CCTV) systems to combat crime in their major cities. These cameras are normally installed to cover a large field of view, so the query face image may not be sampled densely enough by the camera sensors [1]. The low resolution and quality of face images captured on camera reduce the effectiveness of CCTV in identifying perpetrators and potential eyewitnesses [2, 3].
Super-resolution techniques can be used to enhance the quality of low-resolution facial images, improving both the recognition performance of existing face recognition software and the identification of individuals from CCTV footage. In a recent survey, Wang et al. distinguish between two main categories of super-resolution methods: reconstruction-based and learning-based approaches [4]. Reconstruction-based methods register a sequence of low-resolution images onto a high-resolution grid and fuse them to suppress the aliasing caused by undersampling [5, 6]. Learning-based methods, on the other hand, use coupled dictionaries to learn the mapping relations between low- and high-resolution image pairs and synthesize high-resolution images from low-resolution ones [4, 7]. The research community has lately focused on the latter category, since it can provide higher-quality images and larger magnification factors.
In their seminal work, Baker and Kanade [8] exploited the fact that human face images form a relatively small subset of natural scenes and introduced the concept of face super-resolution (also known as face hallucination), where only facial images are used to construct the dictionaries. The high-resolution face image is then hallucinated using Bayesian inference with gradient priors. The authors in [9] assume that two similar face images share similar local pixel structures, so that each pixel can be generated by a linear combination of spatially neighbouring pixels. This method was later extended in [10], where a sparse local pixel structure is used. Although these methods were found to perform well at moderately low resolutions, they fail on very low-resolution face images, where the local pixel structure is severely distorted. Classical face representation models such as Principal Component Analysis (PCA) [11, 12, 13, 14, 15], Kernel PCA (KPCA) [16], Locality Preserving Projections (LPP) [17] and Non-Negative Matrix Factorization (NMF) [18] were used to model a novel low-resolution face image as a linear combination of prototype low-resolution face images present in a dictionary. The combination weights are then used to combine the corresponding high-resolution prototype face images to hallucinate the high-resolution face image. Nevertheless, such global methods do not manage to recover the local texture details which are essential for face recognition. To alleviate this problem, some methods employ patch-based local approaches such as Markov Random Fields [13, 14, 15], Locally Linear Embedding (LLE) [17] and Sparse Coding (SC) [18] as a post-process. Different data representation methods, including PCA [19], Principal Component Sparse Representation (PCSR) [20], LLE in both the spatial [21, 22] and Discrete Cosine Transform (DCT) domains [23, 24, 25], LPP [26], Orthogonal LPP [27]
(OLPP), Tensors [28], Constrained Least Squares [29], Sparse Representation [30], Locality-Constrained Representation [31, 32], and Local Appearance Similarity (LAS) [33] have used the same concept to hallucinate high-resolution overlapping patches, which are then stitched together to reconstruct the high-resolution face image. All these methods [11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33] assume that the low- and high-resolution manifolds have similar local geometrical structures. However, the authors in [21, 34, 35, 36] have shown that this assumption does not hold well, because the one-to-many mappings from low- to high-resolution patches distort the structure of the low-resolution manifold. Motivated by this observation, Coupled Manifold Alignment [35], Easy-Partial Least Squares (EZ-PLS) [36], Dual Associative Learning [37], and Canonical Correlation Analysis (CCA) [38] were used to derive a pair of projection matrices that project both low- and high-resolution patches onto a common coherent subspace. However, the dimension of the coherent subspace is equal to the lowest rank of the low- and high-resolution dictionary matrices; the projection from the coherent subspace to the high-resolution manifold is therefore ill-conditioned. On the other hand, the Locality-constrained Iterative Neighbour Embedding (LINE) method presented in [39, 40] reduces the dependence on the low-resolution manifold by iteratively updating the neighbours on the high-resolution manifold. This was later extended by the same authors in [41], where an iterative dictionary learning scheme was integrated to bridge the low- and high-resolution manifolds. Although this method yields state-of-the-art performance, it is not guaranteed to converge to an optimal solution. A recent method based on Transport Analysis was proposed in [42], where the high-resolution face image is reconstructed by morphing high-resolution training images that best fit the given low-resolution face image. However, this method heavily relies on the assumption that the degradation function is known, which is generally not the case in typical CCTV scenarios. Different automated cross-resolution face recognition methods have been proposed to cater for the resolution discrepancy between the gallery and probe images. (Gallery images are high-quality frontal facial images stored in a database, usually taken in a controlled environment, e.g. ID and passport photos. Probe images are query face images which are compared against every face image included in the gallery; they are usually taken in a non-controlled environment and can have different resolutions.)
These methods either incorporate the resolution discrepancy within the classifier's optimization function [43, 1, 44, 45], or project both probe and gallery images onto a coherent subspace and perform classification there [46, 47, 48, 49]. However, although these methods are reported to provide good results, they suffer from the following shortcomings: i) most of them ([44, 46, 47, 48, 49]) do not synthesize a high-resolution face image, and ii) they generally assume that multiple images per subject are available in the gallery, which is often not the case in practice.

This work presents a two-layer approach named Linear Models of Coupled Sparse Support (LMCSS), which employs a coupled dictionary containing low-resolution and corresponding high-resolution training patches to learn the mapping relation between low- and high-resolution patch pairs. LMCSS first employs all atoms in the dictionary to project the low-resolution patch onto the high-resolution manifold and obtain an initial approximation of the high-resolution patch. This solution gives a good estimate of the high-resolution ground truth. However, the experimental results provided in section III show that texture detail is better preserved if only a subset of coupled low- and high-resolution atoms is used for reconstruction. Basis Pursuit Denoising (BPDN) is therefore used to derive the atoms needed to optimally reconstruct the initial approximation on the high-resolution manifold. Given that high-resolution patches reside on a high-resolution manifold and that the initial solution is sufficiently close to the ground truth, we exploit the locality of the high-resolution manifold to refine the initial solution. This set of coupled sparse supports is then used by Multivariate Ridge Regression (MRR) to model the upscaling function for each patch, which better preserves the texture detail crucial for recognition.
The proposed approach was extensively evaluated against six face hallucination methods using 930 probe images from the FRGC dataset against a gallery of 889 individuals, in a closed-set identification scenario with one face image per subject in the gallery. (Collecting gallery images is laborious and expensive in practice, which limits the number of gallery images available for recognition; frequently only one image per subject is available. This is referred to as the one-sample-per-person problem in the face recognition literature [50].) The proposed method was found to provide state-of-the-art face recognition performance, achieving rank-1 recognition gains between 0.4% and 1.5% over LINE [41] and between 1% and 9% over Sparse Position-Patches [30], which ranked second and third respectively. The quality analysis further shows that the proposed method outperforms EigenPatches [19] and PositionPatches [29] by around 0.1 dB and 0.2 dB in Peak Signal-to-Noise Ratio (PSNR) respectively, followed by the other methods.
The remainder of the paper is organized as follows. Section II introduces the notation and concepts needed later in the paper. The proposed method is described in section III, while the experimental protocol is outlined in section IV. The experimental results and discussion are presented in section V, and the final comments and conclusions are provided in section VI.
II Background
II-A Problem Formulation
We consider a low-resolution face image whose resolution is defined by the distance between the eye centres. The goal of face super-resolution is to upscale this image by a scale factor determined by the distance between the eye centres of the desired super-resolved face image. The image is divided into a set of overlapping patches with a fixed overlap, and the resulting patches are reshaped into column vectors in lexicographic order, indexed by their patch position. In order to learn the mapping relation between low- and high-resolution patches, we use a set of high-resolution face images which are registered based on the eye- and mouth-centre coordinates, with a fixed distance between the eye centres. These images are divided into overlapping patches in the same manner. The patch at each position of every high-resolution image is reshaped into a column vector in lexicographic order and placed within the high-resolution dictionary of that position. The low-resolution dictionary of each position is constructed from the same images present in the high-resolution dictionary, downscaled by the scale factor and divided into correspondingly smaller overlapping patches. This formulation is in line with the position-patch method published in [29], where only collocated patches are used to super-resolve each low-resolution patch.
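As an illustration, the position-patch dictionary construction described above can be sketched in NumPy. The paper's implementation is in MATLAB; the function names and the square-patch assumption here are ours:

```python
import numpy as np

def extract_patches(img, patch, overlap):
    """Extract overlapping square patches in raster (lexicographic) order,
    each reshaped to a column vector; returns a (patch*patch, n_patches) matrix."""
    step = patch - overlap
    rows, cols = img.shape
    vecs = []
    for r in range(0, rows - patch + 1, step):
        for c in range(0, cols - patch + 1, step):
            vecs.append(img[r:r + patch, c:c + patch].reshape(-1))
    return np.stack(vecs, axis=1)

def build_position_dictionaries(images, patch, overlap):
    """Position-patch dictionaries: the dictionary for patch position i
    collects the i-th (collocated) patch of every training image as one column."""
    per_image = [extract_patches(im, patch, overlap) for im in images]
    n_positions = per_image[0].shape[1]
    return [np.stack([p[:, i] for p in per_image], axis=1)
            for i in range(n_positions)]
```

A coupled pair of dictionaries is obtained by running the same routine once on the high-resolution training images and once on their downscaled counterparts.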
Without loss of generality, we assume that the column vectors of both dictionaries are standardized to have zero mean and unit variance to compensate for illumination and contrast variations. The standardized low-resolution patch is denoted by $\mathbf{x}$, and the aim of this work is to find an upscaling projection matrix $\mathbf{W}$ that minimizes the following objective function

$$\mathbf{W}^{*} = \arg\min_{\mathbf{W}} \left\| \mathbf{D}_h - \mathbf{W}\,\mathbf{D}_l \right\|_F^2 \qquad (1)$$

where $\mathbf{D}_l$ and $\mathbf{D}_h$ denote the low- and high-resolution dictionaries of the patch position under consideration, and $\mathbf{W}$ has dimensions $m_h \times m_l$, with $m_l$ and $m_h$ the low- and high-resolution patch sizes. The standardized high-resolution patch $\mathbf{y}$ is then hallucinated using
$$\mathbf{y} = \mathbf{W}\,\mathbf{x} \qquad (2)$$
The pixel intensities of the patch are then recovered using

$$\hat{\mathbf{p}} = \sigma\,\mathbf{y} + \mu \qquad (3)$$

where $\mu$ and $\sigma$ represent the mean and standard deviation of the low-resolution patch. The resulting hallucinated patches are then stitched together to form the hallucinated high-resolution face image.

II-B Multivariate Ridge Regression
The direct least-squares solution to (1) can involve the inversion of a singular matrix. Instead, this work adopts an $\ell_2$-norm regularization term to derive a more stable solution for the upscaling projection matrix. The analytic solution of multivariate ridge regression is given by

$$\mathbf{W} = \mathbf{D}_h\,\mathbf{D}_l^{\top} \left( \mathbf{D}_l\,\mathbf{D}_l^{\top} + \lambda\,\mathbf{I} \right)^{-1} \qquad (4)$$

where $\lambda$ is a regularization parameter used to avoid singularities and $\mathbf{I}$ is the identity matrix. This solution will be referred to as the direct upscaling matrix.
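A minimal NumPy sketch of the closed-form ridge-regression solution for the direct upscaling matrix (the function name is ours; the paper's implementation is in MATLAB):

```python
import numpy as np

def direct_upscaling_matrix(Dh, Dl, lam=1e-6):
    """Multivariate ridge regression, closed form:
    W = Dh Dl^T (Dl Dl^T + lam * I)^{-1}.
    Dl is the (m_l x N) low-res dictionary, Dh the coupled (m_h x N)
    high-res one; lam keeps the inverted matrix non-singular."""
    m_l = Dl.shape[0]
    return Dh @ Dl.T @ np.linalg.inv(Dl @ Dl.T + lam * np.eye(m_l))
```

A standardized low-resolution patch `x` is then upscaled simply as `y = W @ x`.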
Very recently, the authors in [51] adopted ridge regression to solve a different optimization problem for generic super-resolution, which using our notation can be formulated as

$$\mathbf{w}^{*} = \arg\min_{\mathbf{w}} \left\| \mathbf{x} - \mathbf{D}_l\,\mathbf{w} \right\|_2^2 + \lambda \left\| \mathbf{w} \right\|_2^2 \qquad (5)$$

where the intent is to find the reconstruction weight vector $\mathbf{w}^{*}$ that optimally represents the low-resolution patch $\mathbf{x}$ as a weighted combination of the low-resolution dictionary $\mathbf{D}_l$. The same reconstruction weights are then used to combine the high-resolution patches in the coupled dictionary and hallucinate the high-resolution patch using

$$\hat{\mathbf{y}} = \mathbf{D}_h\,\mathbf{w}^{*} = \mathbf{D}_h \left( \mathbf{D}_l^{\top}\mathbf{D}_l + \lambda\,\mathbf{I} \right)^{-1} \mathbf{D}_l^{\top}\,\mathbf{x} \qquad (6)$$

where $\hat{\mathbf{y}}$ represents the solution hallucinated using the method in [51]. The solution in (6) can thus be interpreted as upscaling the low-resolution patch using the matrix $\tilde{\mathbf{W}} = \mathbf{D}_h ( \mathbf{D}_l^{\top}\mathbf{D}_l + \lambda\,\mathbf{I} )^{-1} \mathbf{D}_l^{\top}$, which will be referred to as the indirect upscaling matrix.
It is shown in Appendix A that $\mathbf{W} = \tilde{\mathbf{W}}$. This leads to the following equivalent solutions for the standardized high-resolution patch

$$\mathbf{y} = \mathbf{W}\,\mathbf{x} = \tilde{\mathbf{W}}\,\mathbf{x} \qquad (7)$$

This shows that finding the optimal reconstruction weights to model the low-resolution patch and then using them to reconstruct the high-resolution patch (the indirect method) is equivalent to modelling the upscaling projection matrix directly (as in the proposed method). Nevertheless, the direct formulation requires inverting an $m_l \times m_l$ matrix, whereas the indirect formulation requires inverting an $N \times N$ matrix, where $N \gg m_l$ is the number of atoms in the dictionary, making the direct formulation considerably cheaper.
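The claimed equivalence of the two formulations — a push-through matrix identity of the kind proved in Appendix A — can be checked numerically. The dimensions below are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
m_l, m_h, N, lam = 16, 64, 200, 0.01
Dl = rng.standard_normal((m_l, N))   # low-res dictionary
Dh = rng.standard_normal((m_h, N))   # coupled high-res dictionary

# Direct formulation: invert an (m_l x m_l) matrix.
W_direct = Dh @ Dl.T @ np.linalg.inv(Dl @ Dl.T + lam * np.eye(m_l))

# Indirect (reconstruction-weight) formulation: invert an (N x N) matrix.
W_indirect = Dh @ np.linalg.inv(Dl.T @ Dl + lam * np.eye(N)) @ Dl.T

# Push-through identity: both yield the same upscaling matrix.
assert np.allclose(W_direct, W_indirect)
```

Both inversions give identical matrices, but the direct one works on a 16x16 system instead of a 200x200 one.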
III Linear Models of Coupled Sparse Support
III-A Quality and Texture Analysis
The multivariate ridge regression solution given in (4) is heavily overdetermined, since it employs all column vectors in the dictionaries. This generally results in biased and overly-smooth solutions [30]. Motivated by the success of Neighbour Embedding based schemes, we investigate here the effect that the neighbourhood size has on performance. We define the support $S$ as the set of column-vector indices used to model the upscaling projection matrix, and let $\mathbf{D}_l^{S}$ and $\mathbf{D}_h^{S}$ represent the coupled sub-dictionaries restricted to the column vectors listed in $S$. The upscaling projection matrix can then be solved using

$$\mathbf{W}_S = \mathbf{D}_h^{S} \left(\mathbf{D}_l^{S}\right)^{\top} \left( \mathbf{D}_l^{S} \left(\mathbf{D}_l^{S}\right)^{\top} + \lambda\,\mathbf{I} \right)^{-1} \qquad (8)$$

where $|S|$ denotes the cardinality of the set $S$, which corresponds to the size of the support. Figure 1 shows the quality and texture analysis as a function of the number of support points. These results were computed using the AR face dataset [52], while the coupled dictionary was constructed using one image per subject from the Color Feret [53] and MultiPie [54] datasets. More information about these datasets is provided in section IV.
In this experiment the support was chosen using nearest neighbours. The quality was measured using the PSNR (similar results were obtained using other full-reference quality metrics such as the Structural Similarity (SSIM) and Feature Similarity (FSIM) metrics), while the Texture Consistency (TC) was measured by comparing the Local Binary Pattern (LBP) features of the reference and hallucinated images. The LBP features were extracted using the method in [55], and the similarity was measured using histogram intersection. The remaining parameter settings are provided in section IV.
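A self-contained sketch of the texture-consistency measure under stated assumptions: a basic single-scale 8-neighbour LBP with histogram intersection. The paper uses the full LBP face descriptor of [55], so this is only an approximation, and the function names are ours:

```python
import numpy as np

def lbp_histogram(img):
    """Basic 8-neighbour LBP: threshold each neighbour against the centre
    pixel, pack the 8 bits into a code, return a normalized 256-bin histogram."""
    c = img[1:-1, 1:-1]
    shifts = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
              (1, 1), (1, 0), (1, -1), (0, -1)]
    code = np.zeros(c.shape, dtype=int)
    for bit, (dr, dc) in enumerate(shifts):
        nb = img[1 + dr:img.shape[0] - 1 + dr, 1 + dc:img.shape[1] - 1 + dc]
        code |= (nb >= c).astype(int) << bit
    hist = np.bincount(code.ravel(), minlength=256).astype(float)
    return hist / hist.sum()

def texture_consistency(ref, hal):
    """Histogram intersection of the two LBP histograms (1.0 = identical)."""
    return float(np.minimum(lbp_histogram(ref), lbp_histogram(hal)).sum())
```

Identical images score 1.0; the score drops as the hallucinated texture statistics drift from the reference.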
The results in Figure 1(a) demonstrate that the PSNR increases rapidly for small support sizes and keeps improving slowly for larger supports. The upscaling function which maximizes the PSNR metric, and is therefore closest to the ground truth in the Euclidean sense, was obtained when all column vectors are included in the support. However, the results in Figure 1(b) show that the texture consistency increases steadily up to a relatively small support size and then starts degrading (or remains steady) as the support grows. The subjective results in Figure 2 corroborate this observation: the images derived using a small support (middle row) generally contain more texture detail, while the images derived using a large support (bottom row) are more blurred.
(Figure 2 also reports the PSNR and texture-similarity (TS) values of each example: the more textured small-support reconstructions attain higher TS at slightly lower PSNR than the smoother large-support ones.)
All the face hallucination methods found in the literature follow the same philosophy as generic super-resolution and are designed to maximize an objective measure such as the PSNR, under the assumption that increasing the PSNR will inherently improve face recognition performance. The above results and observations reveal that improving the PSNR does not correspond to improving the texture detail of the hallucinated face image. However, recent face recognition methods exploit the texture similarity between probe and gallery images to achieve state-of-the-art performance [55, 56]. This indicates that optimizing face hallucination to minimize the mean square error leads to sub-optimal solutions, at least in terms of recognition.
The results in Fig. 1(b) further show that there is a relation between texture consistency and sparsity, i.e. facial images hallucinated using a small number of nearest atoms have better texture consistency. The aim of this work is to learn upscaling models between the low- and high-resolution manifolds that exploit the local geometrical structure of the high-resolution manifold. As opposed to the method proposed by Jung et al. [30], Sparse Coding is employed on the high-resolution manifold and the upscaling matrix is learned using Multivariate Ridge Regression. The results in section V reveal the superiority of the proposed approach over this method.
III-B Proposed Method
The proposed method builds on the observations drawn in the previous subsection, where the main objective is to find the atoms that best preserve the similarity between the hallucinated and ground-truth images in terms of both texture and quality. Fig. 3 shows a block diagram of the proposed method, where in this example the first patch, covering the right eye, is being processed. The low-resolution patch is first standardized and then passed to the first layer, which derives the first approximation of the desired standardized ground truth, which is not known in practice.
Fig. 4 depicts the geometrical representation of this method and shows that if the first approximation $\mathbf{y}_0$ is sufficiently close to the ground truth, the two share a similar local structure on the high-resolution manifold [57]. The first approximation is computed using

$$\mathbf{y}_0 = \mathbf{W}\,\mathbf{x} \qquad (9)$$

where $\mathbf{W}$ is the direct upscaling matrix (4), computed using all atoms in the coupled dictionaries. This solution ignores the local geometric structure of the low-resolution manifold, which is known to be distorted, and instead approximates the ground-truth vector using the global structure of the low-resolution manifold. This provides a unique and global solution for the approximation of the ground-truth point. Backed up by the results in Fig. 1(a), this solution provides the largest PSNR and is thus close to the ground truth in the Euclidean sense. However, as shown in Fig. 2, the solution is generally blurred and lacks important texture details.
The second layer assumes that $\mathbf{y}_0$ is sufficiently close to the ground truth and exploits the locality of the high-resolution manifold to refine the first approximation and recover the required texture details. The aim here is to find the atoms from the high-resolution dictionary which can optimally reconstruct the first approximation $\mathbf{y}_0$. This can be formulated as finding the sparsest representation of $\mathbf{y}_0$ using

$$\mathbf{v}^{*} = \arg\min_{\mathbf{v}} \left\| \mathbf{v} \right\|_0 \quad \text{subject to} \quad \left\| \mathbf{y}_0 - \mathbf{D}_h\,\mathbf{v} \right\|_2 \leq \epsilon \qquad (10)$$

where $\mathbf{v}$ is the sparse vector, $\left\|\mathbf{v}\right\|_0$ represents the number of non-zero entries in $\mathbf{v}$, and $\epsilon$ is the noise parameter. This problem, however, is intractable, since it cannot be solved in polynomial time. The authors in [58, 59] have shown that (10) can be relaxed and solved using Basis Pursuit Denoising (BPDN)

$$\mathbf{v}^{*} = \arg\min_{\mathbf{v}} \frac{1}{2} \left\| \mathbf{y}_0 - \mathbf{D}_h\,\mathbf{v} \right\|_2^2 + \lambda_1 \left\| \mathbf{v} \right\|_1 \qquad (11)$$

where $\lambda_1$ is a regularization parameter. This optimization can be solved in polynomial time using linear programming. In this work we use the solver provided by SparseLab (https://sparselab.stanford.edu/) to solve the above BPDN problem. The support $S$ is then set to the indices of the entries of $\mathbf{v}^{*}$ with the largest magnitudes. The high-resolution patch is then solved using

$$\mathbf{y} = \mathbf{W}_S\,\mathbf{x} \qquad (12)$$

where $\mathbf{W}_S$ is the support-restricted upscaling matrix defined in (8).
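The second layer can be sketched as follows. Since SparseLab is a MATLAB package, the BPDN step below is replaced by a plain ISTA (proximal-gradient) solver, which minimizes the same objective; all function names are ours:

```python
import numpy as np

def bpdn_ista(D, y, lam=0.01, n_iter=1000):
    """Solve min_v 0.5*||y - D v||_2^2 + lam*||v||_1 with ISTA:
    a gradient step followed by soft-thresholding."""
    L = np.linalg.norm(D, 2) ** 2          # Lipschitz constant of the gradient
    v = np.zeros(D.shape[1])
    for _ in range(n_iter):
        g = v + D.T @ (y - D @ v) / L      # gradient step on the smooth term
        v = np.sign(g) * np.maximum(np.abs(g) - lam / L, 0.0)
    return v

def coupled_sparse_support(Dh, y0, K, lam=0.01):
    """Indices of the K largest-magnitude BPDN coefficients of the
    first approximation y0 against the high-res dictionary."""
    v = bpdn_ista(Dh, y0, lam)
    return np.sort(np.argsort(np.abs(v))[-K:])

def support_upscaling_matrix(Dh, Dl, support, lam=1e-6):
    """Multivariate ridge regression restricted to the coupled support atoms."""
    Ds_l, Ds_h = Dl[:, support], Dh[:, support]
    m_l = Ds_l.shape[0]
    return Ds_h @ Ds_l.T @ np.linalg.inv(Ds_l @ Ds_l.T + lam * np.eye(m_l))
```

Note that the sparse coefficients are discarded: only their support is kept, and the coupled low/high-resolution atoms it indexes define the final upscaling matrix.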
It is important to note that using the BPDN coefficients $\mathbf{v}^{*}$ directly would merely reproduce the first approximation, so these coefficients cannot be used as reconstruction weights. Instead, we use the coupled anchor points indexed by $S$ to obtain an upscaling projection matrix with richer texture. This can be further explained by extending (7) to the support $S$, which yields

$$\mathbf{y} = \mathbf{W}_S\,\mathbf{x} = \mathbf{D}_h^{S}\,\mathbf{w}_S \qquad (13)$$

where

$$\mathbf{w}_S = \left( \left(\mathbf{D}_l^{S}\right)^{\top}\mathbf{D}_l^{S} + \lambda\,\mathbf{I} \right)^{-1} \left(\mathbf{D}_l^{S}\right)^{\top}\,\mathbf{x} \qquad (14)$$

These equations indicate that a larger support leads to averaging a larger number of high-resolution atoms, which results in blurred solutions. Therefore, increasing the support reduces the ability of the projection matrix to recover texture details.
Fig. 5 compares the proposed sparse support selection method with nearest-neighbour selection when modelling the ground-truth samples as a weighted combination of selected atoms from the dictionary $\mathbf{D}_h^{S}$. The weights were derived using

$$\mathbf{w} = \left(\mathbf{D}_h^{S}\right)^{+}\,\mathbf{y} \qquad (15)$$

where $(\cdot)^{+}$ is the pseudo-inverse operator and $\mathbf{y}$ the ground-truth patch. These results were computed on the AR dataset using the same configuration described above. It can be seen that reconstructing the ground truth using the sparse support provides images of better quality and texture consistency than using the nearest neighbours. The proposed method therefore finds an optimal initial estimate in the least-squares sense and then finds an optimal set of support atoms using Basis Pursuit Denoising.
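The oracle comparison above can be reproduced in outline: reconstruction weights come from the pseudo-inverse of the selected high-resolution atoms (a sketch; the function name is ours):

```python
import numpy as np

def reconstruct_on_support(Dh, y, support):
    """Best least-squares reconstruction of a ground-truth patch y from the
    selected atoms; the Moore-Penrose pseudo-inverse gives the weights."""
    Ds = Dh[:, support]
    w = np.linalg.pinv(Ds) @ y
    return Ds @ w, w
```

Running this once with a nearest-neighbour support and once with the BPDN support, and comparing the reconstruction errors, reproduces the comparison of Fig. 5 in outline.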
The last step involves stitching the overlapping patches together. This process is generally given little importance in the literature, where most methods simply average the overlapping regions. The quilting method [60] is an alternative approach which finds the minimum-error boundary between patches and better preserves the texture recovered by each patch when reconstructing the global face. The results in section V show that the performance of face hallucination depends significantly on the stitching method used.
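The default averaging-based stitching can be sketched as follows (a simplification; the quilting alternative of [60] is not shown, and the function name is ours):

```python
import numpy as np

def stitch_average(patch_vecs, positions, patch, out_shape):
    """Re-assemble hallucinated patch vectors into a full image,
    averaging the pixel estimates wherever patches overlap."""
    acc = np.zeros(out_shape)
    cnt = np.zeros(out_shape)
    for vec, (r, c) in zip(patch_vecs, positions):
        acc[r:r + patch, c:c + patch] += vec.reshape(patch, patch)
        cnt[r:r + patch, c:c + patch] += 1.0
    return acc / np.maximum(cnt, 1.0)
```

When the overlapping patch estimates agree, the round trip is lossless; when they disagree, the averaging suppresses per-patch noise at the cost of some texture detail, as discussed in section V.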
IV Experimental Protocol
The experiments conducted in this paper use several publicly available datasets: i) AR [52], ii) Color Feret [53], iii) MultiPie [54], iv) MEDS-II [61] and v) FRGC-V2 [62]. The AR dataset was primarily used for the analysis provided in section III and consists of 112 subjects with 8 images per subject, resulting in a total of 886 images with different expressions.
The dictionary used to learn the upscaling projection matrix of each patch consists of a composite dataset which includes images from both the Color Feret and MultiPie datasets, where only frontal facial images were considered. One image per subject was randomly selected, resulting in a dictionary of frontal facial images. The gallery consists of another composite dataset which combines frontal facial images from the FRGC-V2 (controlled environment) and MEDS datasets. One unique image per subject was randomly selected, providing a gallery of 889 facial images. The probe images were taken from the FRGC-V2 dataset (uncontrolled environment), where two images per subject were included, resulting in 930 probe images. All probe images are frontal, although various poses and illuminations were considered. The performance of the proposed method is reported on a closed-set identification scenario.
All the images were registered using an affine transformation computed on landmark points at the eye and mouth centres, such that the distance between the eyes is fixed. The probe and low-resolution dictionary images were downsampled to the desired scale using MATLAB's image-resizing function. Unless stated otherwise, all patch-based methods use the same low-resolution patch size and overlap, and the patches are stitched by averaging the overlapping regions. The dictionaries for each patch are constructed using the position-patch method introduced in [29] and described in section II-A.
Two face recognition methods were adopted in this experiment, namely the LBP face recognition method [55] (which was found to provide state-of-the-art performance on the single-image-per-subject problem in [56]) and the Gabor face recognizer [63] (code provided by the authors at http://www.mathworks.com/matlabcentral/fileexchange/35106thephdfacerecognitiontoolbox). The latter method performs classification on the Principal Component Analysis (PCA) subspace, where the PCA bases were trained offline on the AR dataset. The proposed method was compared with bicubic interpolation, a global face method [11], and five patch-based methods which are reputed to provide state-of-the-art performance [19, 29, 30, 64, 41]. These methods were configured using the same patch size and overlap as indicated above, together with the optimal parameters provided in their respective papers. All methods were implemented in MATLAB, where the code for [41] was provided by the authors. All simulations were run on a machine with an Intel Core i7-3687U CPU at 2.10 GHz running a 64-bit Windows operating system.

The proposed method has only three parameters that need to be tuned. The regularization parameter adopted by ridge regression can easily be set to a very small value, since its purpose is merely to perturb the linearly-dependent vectors within a matrix to avoid singularities; it was fixed to the same small value in all experiments. Similarly, the BPDN regularization parameter, which controls the sparsity of the solution, was set to 0.01, since this provided satisfactory performance on the AR dataset. The performance of the proposed method is mainly affected by the number of anchor vectors, whose effect is extensively analysed in the following section.

V Results
V-A Parameter Analysis
In this section we investigate the effect of the number of anchor vectors on the proposed method. Fig. 6 shows the average PSNR and rank-1 recognition using the LBP face recognizer on all 930 probe images. From these results it can be observed that the PSNR increases as the number of neighbours is increased, up to a point where it starts decreasing (or remains in a steady state). On the other hand, the best rank-1 recognition is attained at a smaller support size, and recognition starts decreasing for larger supports. This can be explained by the fact that increasing the number of support points increases the number of high-resolution atoms to be combined (see (13)), thus reducing the texture consistency of the hallucinated patch. More theoretical details are provided in section III-B.
V-B Recognition Analysis
The recognition performance of the proposed and other state-of-the-art face hallucination methods is summarized in Table I. In this table we adopt the Area Under the ROC Curve (AUC) as a scalar-valued measure of accuracy for unsupervised learning [65], together with the rank-1 recognition rate. It can be seen that the proposed method is almost always superior to all the methods considered in this experiment when using the same averaging stitching method. This performance can be further improved using the quilting stitching method, which has better texture-preservation properties.



Hall Method  Rec Method  Resolution  
8  10  15  20  
rank-1  AUC  rank-1  AUC  rank-1  AUC  rank-1  AUC  
BiCubic  Gabor  0.0000  0.6985  0.0000  0.7823  0.0344  0.8829  0.5215  0.9181 
LBP  0.3065  0.9380  0.5032  0.9598  0.6065  0.9708  0.7054  0.9792  
Eigentransformation [11]  Gabor  0.0591  0.7852  0.1097  0.8359  0.3312  0.8841  0.5183  0.9098 
LBP  0.2559  0.9390  0.4516  0.9554  0.5624  0.9633  0.6495  0.9688  
Neighbour Embedding [64]  Gabor  0.2323  0.8624  0.4710  0.8968  0.6172  0.9182  0.6409  0.9272 
LBP  0.5548  0.9635  0.6398  0.9712  0.7215  0.9795  0.7559  0.9830  
Sparse PositionPatches [30]  Gabor  0.2333  0.8632  0.4645  0.8969  0.6118  0.9152  0.6398  0.9254 
LBP  0.5677  0.9649  0.6441  0.9721  0.7247  0.9803  0.7570  0.9830  
PositionPatches [29]  Gabor  0.1108  0.8354  0.2849  0.8814  0.5774  0.9154  0.6419  0.9281 
LBP  0.4699  0.9588  0.5849  0.9675  0.6849  0.9782  0.7312  0.9812  
EigenPatches [19]  Gabor  0.1613  0.8517  0.3849  0.8934  0.6065  0.9172  0.6387  0.9283 
LBP  0.5226  0.9625  0.6215  0.9704  0.7237  0.9800  0.7602  0.9830  
LINE [41]  Gabor  0.3118  0.8696  0.5011  0.8986  0.6118  0.9168  0.6409  0.9252 
LBP  0.5925  0.9647  0.6559  0.9714  0.7323  0.9804  0.7677  0.9833  
Proposed ()  Gabor  0.2753  0.8803  0.5000  0.9036  0.6183  0.9202  0.6452  0.9281 
LBP  0.6032  0.9658  0.6581  0.9722  0.7398  0.9798  0.7742  0.9833  
Proposed Quilt. ()  Gabor  0.3258  0.8785  0.5086  0.9051  0.6226  0.9204  0.6409  0.9275 
LBP  0.6065  0.9663  0.6656  0.9732  0.7355  0.9799  0.7753  0.9825  

Figure 7 shows the Cumulative Matching Score Curve (CMC), which measures recognition at different ranks. Owing to lack of space, only the CMCs of the LBP face recognizer are included. For clarity, only LINE, which provided the most competitive rank-1 recognition, and EigenPatches, which (as shown in the next subsection) provides the most competitive results in terms of quality, are compared against the oracle, in which the probe images were not downsampled. Bicubic interpolation was included to show that state-of-the-art face recognition methods benefit from the texture details recovered by patch-based hallucination methods, achieving significantly higher recognition rates at all ranks. It can also be noted that the proposed method outperforms the other hallucination methods, especially at lower resolutions and lower ranks.
V-C Quality Analysis
Table II shows the quality analysis measured in terms of PSNR and SSIM. These results reveal that the proposed method outperforms all the other methods, achieving PSNR gains of around 0.3 dB over LINE using the same neighbourhood size and 0.1 dB over EigenPatches, which uses the whole dictionary, when using the same stitching method. However, using quilting to stitch the patches degrades the quality. This can be explained by the fact that averaging acts as a denoising step over the overlapping regions: while it reduces the texture detail (and hence the recognition performance), it suppresses the noise hallucinated in each patch.



Hall Method  Resolution  
8  10  15  20  
PSNR  SSIM  PSNR  SSIM  PSNR  SSIM  PSNR  SSIM  
BiCubic  24.0292  0.6224  26.2024  0.7338  25.2804  0.7094  28.6663  0.8531 
Eigentransformation [11]  24.3958  0.6496  26.8645  0.7504  24.9374  0.6724  27.7883  0.7892 
Neighbour Embedding [64]  26.9987  0.7533  27.9560  0.7973  29.9892  0.8714  31.6301  0.9122 
PositionPatches [29]  27.3044  0.7731  28.2906  0.8145  30.1887  0.8785  31.7192  0.9143 
Sparse PositionPatches [30]  27.2500  0.7666  28.2219  0.8100  30.1290  0.8767  31.7162  0.9146 
EigenPatches [19]  27.3918  0.7778  28.3847  0.8196  30.3118  0.8842  31.8986  0.9203 
LINE [41]  27.0927  0.7591  28.0253  0.8009  30.0471  0.8727  31.6970  0.9131 
Proposed ()  27.4866  0.7802  28.4200  0.8009  30.3431  0.8845  31.9610  0.9209 
Proposed Quilt. ()  27.3916  0.7762  28.3345  0.8178  30.2512  0.8815  31.8507  0.9185 

Fig. 8 subjectively compares our method with LINE and EigenPatches, which were found to provide the most competitive results in terms of recognition and quality respectively. It can be seen that EigenPatches generally produces blurred images. On the other hand, the images hallucinated by LINE contain more texture detail but exhibit visible artefacts. The images recovered by the proposed method with a small support preserve more texture, at the expense of more noise. The results in Table I reveal that the face recognizers employed in this experiment are more robust to these distortions than to the smoothing introduced by EigenPatches. These results also show that the distortions can be alleviated by simply increasing the size of the support.
V-D Complexity Analysis
The complexity of the first layer, which computes ridge regression using all entries in the dictionary, is dominated by the matrix inversion in (4). The second layer first computes BPDN, followed by Multivariate Ridge Regression on the selected anchor points. In this work we use the SparseLab solver for BPDN, which employs a Primal-Dual Interior-Point algorithm whose complexity is polynomial in the dictionary size [66]. The complexity of Multivariate Ridge Regression on the support vectors is dominated by the corresponding, much smaller, inversion. This analysis reveals that the complexity of the proposed method is mostly dependent on the complexity of the sparse optimization method used.



Hall. Method                  Resolution
                              8       10      15      20
Eigentransformation [11]      2.84    2.69    2.74    2.75
Neighbour Embedding [64]      0.25    0.33    0.73    1.24
PositionPatches [29]          1.59    2.11    4.69    8.37
Sparse PositionPatches [30]   0.59    0.79    1.72    3.02
EigenPatches [19]             10.89   14.80   34.74   63.98
LINE [41]                     2.23    2.16    2.61    2.83
Proposed                      8.04    5.78    4.71    5.43

The complexity, measured as the average time in seconds taken to synthesize a high-resolution image from a low-resolution input, is summarized in Table III. These results show that the proposed method is significantly less computationally intensive than EigenPatches but more complex than the other methods, including LINE. While low complexity is not the prime aim of this work, the performance of the proposed scheme can be significantly improved using more efficient minimization algorithms, as mentioned in [66].
VI Conclusion
In this paper, we propose a new approach to super-resolve a high-resolution face image from a low-resolution test image. The proposed method first derives a smooth approximation of the ground-truth on the high-resolution manifold using all entries of the coupled dictionaries. It then uses Basis Pursuit Denoising to find the optimal atomic decomposition representing the approximated sample. Based on the assumption that the patches reside on a high-resolution manifold, we assume that the optimal support for the approximation is also suitable to reconstruct the ground-truth, and use the coupled support to hallucinate the high-resolution patch using Multivariate Ridge Regression. Extensive simulation results demonstrate that the proposed method outperforms the six face super-resolution methods considered in both recognition and quality analyses.
Future work will focus on face hallucination techniques that can hallucinate and enhance face images afflicted by different distortions, such as compression, landmark-point misalignment, poor exposure, and other degradations commonly found in CCTV images. The performance of current schemes (including the proposed method) depends on the dictionaries used, and these schemes can therefore be made more robust by building more robust dictionaries.
Appendix A Indirect and Direct Projection Matrices
Using the results from Section II-B, the direct upscaling projection matrix is defined by
(16) 
while the indirect projection matrix according to the authors in [51] is given by
(17) 
The covariance matrix while the covariance matrix , where . This shows that has at least zero eigenvalues. Given the above observation, we hypothesize that if the solutions in (16) and (17) are equivalent, then
(18)  
Therefore, we can conclude that the direct and indirect projection matrices are equivalent.
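This equivalence is an instance of the standard push-through matrix identity Hl^T (Hl Hl^T + λI)^(-1) = (Hl^T Hl + λI)^(-1) Hl^T. Assuming (16) and (17) take the usual multivariate ridge-regression forms for coupled dictionaries Hl and Hh (an assumption on our part, following [51]), the equivalence can be checked numerically:

```python
import numpy as np

rng = np.random.default_rng(2)
d_lo, d_hi, m, lam = 25, 100, 200, 1e-2
L = rng.normal(size=(d_lo, m))   # low-resolution dictionary Hl
H = rng.normal(size=(d_hi, m))   # coupled high-resolution dictionary Hh

# Direct projection: regress the high-res dictionary on the low-res one,
# inverting a small d_lo x d_lo matrix.
P_direct = H @ L.T @ np.linalg.inv(L @ L.T + lam * np.eye(d_lo))

# Indirect projection: ridge coefficients first (m x m system),
# then map them through the high-res dictionary.
P_indirect = H @ np.linalg.inv(L.T @ L + lam * np.eye(m)) @ L.T

print(np.allclose(P_direct, P_indirect))  # True
```

For λ > 0 the identity holds exactly, so the two projection matrices agree up to numerical precision; the direct form is cheaper whenever the low-resolution patch dimension is smaller than the dictionary size.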
Acknowledgment
References
 [1] W. Zou and P. Yuen, “Very low resolution face recognition problem,” Image Processing, IEEE Transactions on, vol. 21, no. 1, pp. 327–340, Jan 2012.
 [2] M. A. Sasse, “Not seeing the crime for the cameras?” Commun. ACM, vol. 53, no. 2, pp. 22–25, Feb. 2010. [Online]. Available: http://doi.acm.org/10.1145/1646353.1646363
 [3] N. La Vigne, S. Lowry, J. Markman, and A. Dwyer, “Evaluating the use of public surveillance cameras for crime control and prevention,” Urban Institute Justice Policy Center, Tech. Rep., 2011.

 [4] N. Wang, D. Tao, X. Gao, X. Li, and J. Li, “A comprehensive survey to face hallucination,” International Journal of Computer Vision, vol. 106, no. 1, pp. 9–30, 2014. [Online]. Available: http://dx.doi.org/10.1007/s1126301306459
 [5] M. Elad and A. Feuer, “Superresolution reconstruction of image sequences,” Pattern Analysis and Machine Intelligence, IEEE Transactions on, vol. 21, no. 9, pp. 817–834, Sep 1999.
 [6] S. Farsiu, M. Robinson, M. Elad, and P. Milanfar, “Fast and robust multiframe super resolution,” Image Processing, IEEE Transactions on, vol. 13, no. 10, pp. 1327–1344, Oct 2004.
 [7] K. Nasrollahi and T. Moeslund, “Superresolution: a comprehensive survey,” Machine Vision and Applications, vol. 25, no. 6, pp. 1423–1468, 2014. [Online]. Available: http://dx.doi.org/10.1007/s0013801406234
 [8] S. Baker and T. Kanade, “Hallucinating faces,” in Automatic Face and Gesture Recognition, 2000. Proceedings. Fourth IEEE International Conference on, 2000, pp. 83–88.
 [9] Y. Hu, K.M. Lam, G. Qiu, and T. Shen, “From local pixel structure to global image superresolution: A new face hallucination framework,” Image Processing, IEEE Transactions on, vol. 20, no. 2, pp. 433–445, Feb 2011.
 [10] Y. Li, C. Cai, G. Qiu, and K.M. Lam, “Face hallucination based on sparse localpixel structure,” Pattern Recognition, vol. 47, no. 3, pp. 1261 – 1270, 2014, handwriting Recognition and other {PR} Applications. [Online]. Available: http://www.sciencedirect.com/science/article/pii/S0031320313003841
 [11] X. Wang and X. Tang, “Hallucinating face by eigentransformation,” Systems, Man, and Cybernetics, Part C: Applications and Reviews, IEEE Transactions on, vol. 35, no. 3, pp. 425–434, Aug 2005.
 [12] J.S. Park and S.W. Lee, “An examplebased face hallucination method for singleframe, lowresolution facial images,” Image Processing, IEEE Transactions on, vol. 17, no. 10, pp. 1806–1816, Oct 2008.

 [13] C. Liu, H.Y. Shum, and C.S. Zhang, “A twostep approach to hallucinating faces: global parametric model and local nonparametric model,” in Computer Vision and Pattern Recognition, 2001. CVPR 2001. Proceedings of the 2001 IEEE Computer Society Conference on, vol. 1, 2001, pp. I–192–I–198.
 [14] Y. Li and X. Lin, “An improved twostep approach to hallucinating faces,” in Image and Graphics (ICIG’04), Third International Conference on, Dec 2004, pp. 298–301.
 [15] C. Liu, H.Y. Shum, and W. Freeman, “Face hallucination: Theory and practice,” International Journal of Computer Vision, vol. 75, no. 1, pp. 115–134, 2007. [Online]. Available: http://dx.doi.org/10.1007/s1126300600295
 [16] A. Chakrabarti, A. Rajagopalan, and R. Chellappa, “Superresolution of face images using kernel pcabased prior,” Multimedia, IEEE Transactions on, vol. 9, no. 4, pp. 888–892, June 2007.
 [17] Y. Zhuang, J. Zhang, and F. Wu, “Hallucinating faces: {LPH} superresolution and neighbor reconstruction for residue compensation,” Pattern Recognition, vol. 40, no. 11, pp. 3178 – 3194, 2007. [Online]. Available: http://www.sciencedirect.com/science/article/pii/S0031320307001355
 [18] J. Yang, H. Tang, Y. Ma, and T. Huang, “Face hallucination via sparse coding,” in Image Processing, 2008. ICIP 2008. 15th IEEE International Conference on, Oct 2008, pp. 1264–1267.
 [19] H.Y. Chen and S.Y. Chien, “Eigenpatch: Positionpatch based face hallucination using eigen transformation,” in Multimedia and Expo (ICME), 2014 IEEE International Conference on, July 2014, pp. 1–6.
 [20] T. Lu, R. Hu, Z. Han, J. Jiang, and Y. Xia, “Robust superresolution for face images via principle component sparse representation and least squares regression,” in Circuits and Systems (ISCAS), 2013 IEEE International Symposium on, May 2013, pp. 1199–1202.
 [21] K. Su, Q. Tian, Q. Xue, N. Sebe, and J. Ma, “Neighborhood issue in singleframe image superresolution,” in Multimedia and Expo, 2005. ICME 2005. IEEE International Conference on, July 2005, pp. 4 – 7.
 [22] W. Liu, D. Lin, and X. Tang, “Neighbor combination and transformation for hallucinating faces,” in Multimedia and Expo, 2005. ICME 2005. IEEE International Conference on, July 2005, pp. 4 pp.–.
 [23] X. Zhang, S. Peng, and J. Jiang, “An adaptive learning method for face hallucination using locality preserving projections,” in Automatic Face Gesture Recognition, 2008. FG ’08. 8th IEEE International Conference on, Sept 2008, pp. 1–8.
 [24] W. Zhang and W.K. Cham, “Hallucinating face in the dct domain,” Image Processing, IEEE Transactions on, vol. 20, no. 10, pp. 2769–2779, Oct 2011.
 [25] X. Du, F. Jiang, and D. Zhao, “Multiscale face hallucination based on frequency bands analysis,” in Visual Communications and Image Processing (VCIP), 2013, Nov 2013, pp. 1–6.
 [26] S. W. Park and M. Savvides, “Breaking the limitation of manifold analysis for superresolution of facial images,” in Acoustics, Speech and Signal Processing, 2007. ICASSP 2007. IEEE International Conference on, vol. 1, April 2007, pp. I–573–I–576.
 [27] B. Kumar and R. Aravind, “Face hallucination using olpp and kernel ridge regression,” in Image Processing, 2008. ICIP 2008. 15th IEEE International Conference on, Oct 2008, pp. 353–356.
 [28] W. Liu, D. Lin, and X. Tang, “Hallucinating faces: Tensorpatch superresolution and coupled residue compensation,” in Computer Vision and Pattern Recognition, 2005. CVPR 2005. IEEE Computer Society Conference on, vol. 2, June 2005, pp. 478–484 vol. 2.
 [29] X. Ma, J. Zhang, and C. Qi, “Positionbased face hallucination method,” in Multimedia and Expo, 2009. ICME 2009. IEEE International Conference on, June 2009, pp. 290–293.
 [30] C. Jung, L. Jiao, B. Liu, and M. Gong, “Positionpatch based face hallucination using convex optimization,” Signal Processing Letters, IEEE, vol. 18, no. 6, pp. 367–370, June 2011.
 [31] J. Jiang, R. Hu, Z. Han, T. Lu, and K. Huang, “Positionpatch based face hallucination via localityconstrained representation,” in Multimedia and Expo (ICME), 2012 IEEE International Conference on, July 2012, pp. 212–217.
 [32] J. Jiang, R. Hu, Z. Wang, and Z. Han, “Noise robust face hallucination via localityconstrained representation,” Multimedia, IEEE Transactions on, vol. 16, no. 5, pp. 1268–1281, Aug 2014.
 [33] H. Li, L. Xu, and G. Liu, “Face hallucination via similarity constraints,” Signal Processing Letters, IEEE, vol. 20, no. 1, pp. 19–22, Jan 2013.
 [34] B. Li, H. Chang, S. Shan, and X. Chen, “Locality preserving constraints for superresolution with neighbor embedding,” in Image Processing (ICIP), 2009 16th IEEE International Conference on, Nov 2009, pp. 1189–1192.
 [35] ——, “Aligning coupled manifolds for face hallucination,” Signal Processing Letters, IEEE, vol. 16, no. 11, pp. 957–960, Nov 2009.
 [36] Y. Hao and C. Qi, “Face hallucination based on modified neighbor embedding and global smoothness constraint,” Signal Processing Letters, IEEE, vol. 21, no. 10, pp. 1187–1191, Oct 2014.
 [37] W. Liu, D. Lin, and X. Tang, “Face hallucination through dual associative learning,” in Image Processing, 2005. ICIP 2005. IEEE International Conference on, vol. 1, Sept 2005, pp. I–873–6.
 [38] H. Huang, H. He, X. Fan, and J. Zhang, “Superresolution of human face image using canonical correlation analysis,” Pattern Recognition, vol. 43, no. 7, pp. 2532 – 2543, 2010. [Online]. Available: http://www.sciencedirect.com/science/article/pii/S0031320310000853
 [39] J. Jiang, R. Hu, Z. Han, Z. Wang, T. Lu, and J. Chen, “Localityconstraint iterative neighbor embedding for face hallucination,” in Multimedia and Expo (ICME), 2013 IEEE International Conference on, July 2013, pp. 1–6.
 [40] S. Qu, R. Hu, S. Chen, J. Jiang, Z. Wang, and J. Chen, “Face hallucination via reidentified knearest neighbors embedding,” in Multimedia and Expo (ICME), 2014 IEEE International Conference on, July 2014, pp. 1–6.
 [41] J. Jiang, R. Hu, Z. Wang, and Z. Han, “Face superresolution via multilayer localityconstrained iterative neighbor embedding and intermediate dictionary learning,” Image Processing, IEEE Transactions on, vol. 23, no. 10, pp. 4220–4231, Oct 2014.
 [42] S. Kolouri and G. Rohde, “Transportbased single frame super resolution of very low resolution face images,” in Computer Vision and Pattern Recognition (CVPR), 2015 IEEE Conference on, June 2015, pp. 4876–4884.

 [43] P. HenningsYeomans, S. Baker, and B. Kumar, “Simultaneous superresolution and feature extraction for recognition of lowresolution faces,” in Computer Vision and Pattern Recognition, 2008. CVPR 2008. IEEE Conference on, June 2008, pp. 1–8.
 [44] H. Bhatt, R. Singh, M. Vatsa, and N. Ratha, “Improving crossresolution face matching using ensemblebased cotransfer learning,” Image Processing, IEEE Transactions on, vol. 23, no. 12, pp. 5654–5669, Dec 2014.
 [45] M. Jian and K.M. Lam, “Simultaneous hallucination and recognition of lowresolution faces based on singular value decomposition,” Circuits and Systems for Video Technology, IEEE Transactions on, vol. 25, no. 11, pp. 1761–1772, Nov 2015.
 [46] B. Li, H. Chang, S. Shan, and X. Chen, “Lowresolution face recognition via coupled locality preserving mappings,” Signal Processing Letters, IEEE, vol. 17, no. 1, pp. 20–23, Jan 2010.
 [47] C. Zhou, Z. Zhang, D. Yi, Z. Lei, and S. Li, “Lowresolution face recognition via simultaneous discriminant analysis,” in Biometrics (IJCB), 2011 International Joint Conference on, Oct 2011, pp. 1–6.
 [48] S. Siena, V. Boddeti, and B. Vijaya Kumar, “Coupled marginal fisher analysis for lowresolution face recognition,” in Computer Vision ECCV 2012. Workshops and Demonstrations, ser. Lecture Notes in Computer Science, A. Fusiello, V. Murino, and R. Cucchiara, Eds. Springer Berlin Heidelberg, 2012, vol. 7584, pp. 240–249.
 [49] S. Siena, V. Boddeti, and B. Kumar, “Maximummargin coupled mappings for crossdomain matching,” in Biometrics: Theory, Applications and Systems (BTAS), 2013 IEEE Sixth International Conference on, Sept 2013, pp. 1–8.
 [50] X. Tan, S. Chen, Z.H. Zhou, and F. Zhang, “Face recognition from a single image per person: A survey,” Pattern Recognition, vol. 39, no. 9, pp. 1725 – 1745, 2006. [Online]. Available: http://www.sciencedirect.com/science/article/pii/S0031320306001270
 [51] K. Zhang, D. Tao, X. Gao, X. Li, and Z. Xiong, “Learning multiple linear mappings for efficient single image superresolution,” Image Processing, IEEE Transactions on, vol. 24, no. 3, pp. 846–861, March 2015.
 [52] A. Martinez and R. Benavente, “The ar face database,” Robot Vision Lab, Purdue University, Tech. Rep. 5, Apr. 1998.
 [53] P. J. Phillips, H. Wechsler, J. Huang, and P. J. Rauss, “The FERET database and evaluation procedure for facerecognition algorithms,” Image and Vision Computing, vol. 16, no. 5, pp. 295–306, Apr. 1998. [Online]. Available: http://dx.doi.org/10.1016/s02628856(97)00070x
 [54] R. Gross, I. Matthews, J. Cohn, T. Kanade, and S. Baker, “Multipie,” Image Vision Comput., vol. 28, no. 5, pp. 807–813, May 2010. [Online]. Available: http://dx.doi.org/10.1016/j.imavis.2009.08.002
 [55] T. Ahonen, A. Hadid, and M. Pietikainen, “Face description with local binary patterns: Application to face recognition,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 28, no. 12, pp. 2037–2041, Dec. 2006. [Online]. Available: http://dx.doi.org/10.1109/TPAMI.2006.244
 [56] A. Wagner, J. Wright, A. Ganesh, Z. Zhou, H. Mobahi, and Y. Ma, “Toward a practical face recognition system: Robust alignment and illumination by sparse representation,” Pattern Analysis and Machine Intelligence, IEEE Transactions on, vol. 34, no. 2, pp. 372–386, Feb 2012.
 [57] J. Wang, J. Yang, K. Yu, F. Lv, T. Huang, and Y. Gong, “Localityconstrained linear coding for image classification,” in Computer Vision and Pattern Recognition (CVPR), 2010 IEEE Conference on, June 2010, pp. 3360–3367.
 [58] D. L. Donoho and X. Huo, “Uncertainty principles and ideal atomic decomposition,” IEEE Trans. Inf. Theor., vol. 47, no. 7, pp. 2845–2862, Nov. 2001. [Online]. Available: http://dx.doi.org/10.1109/18.959265
 [59] D. Donoho, M. Elad, and V. Temlyakov, “Stable recovery of sparse overcomplete representations in the presence of noise,” Information Theory, IEEE Transactions on, vol. 52, no. 1, pp. 6–18, Jan 2006.
 [60] A. A. Efros and W. T. Freeman, “Image quilting for texture synthesis and transfer,” in Proceedings of the 28th Annual Conference on Computer Graphics and Interactive Techniques, ser. SIGGRAPH ’01. New York, NY, USA: ACM, 2001, pp. 341–346. [Online]. Available: http://doi.acm.org/10.1145/383259.383296
 [61] A. P. Founds, N. Orlans, G. Whiddon, and C. Watson, “Nist special database 32 multiple encounter dataset ii (medaii),” National Institute of Standards and Technology, Tech. Rep., 2011.
 [62] P. J. Phillips, P. J. Flynn, T. Scruggs, K. W. Bowyer, J. Chang, K. Hoffman, J. Marques, J. Min, and W. Worek, “Overview of the face recognition grand challenge,” in Proceedings of the 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR’05)  Volume 1  Volume 01, ser. CVPR ’05. Washington, DC, USA: IEEE Computer Society, 2005, pp. 947–954. [Online]. Available: http://dx.doi.org/10.1109/CVPR.2005.268
 [63] V. Štruc and N. Pavešić, “The complete gaborfisher classifier for robust face recognition,” EURASIP J. Adv. Signal Process, vol. 2010, pp. 31:1–31:13, Feb. 2010. [Online]. Available: http://dx.doi.org/10.1155/2010/847680
 [64] H. Chang, D.Y. Yeung, and Y. Xiong, “Superresolution through neighbor embedding,” in Computer Vision and Pattern Recognition, 2004. CVPR 2004. Proceedings of the 2004 IEEE Computer Society Conference on, vol. 1, June 2004, pp. I–I.
 [65] G. B. Huang and E. LearnedMiller, “Labeled faces in the wild: Updates and new reporting procedures,” University of Massachusetts, Tech. Rep., 2014.
 [66] A. Yang, A. Ganesh, S. Sastry, and Y. Ma, “Fast l1minimization algorithms and an application in robust face recognition: A review,” EECS Department, University of California, Berkeley, Tech. Rep. UCB/EECS201013, Feb 2010. [Online]. Available: http://www.eecs.berkeley.edu/Pubs/TechRpts/2010/EECS201013.html