Face Hallucination using Linear Models of Coupled Sparse Support

12/18/2015 ∙ by Reuben Farrugia, et al. ∙ University of Malta, Inria

Most face super-resolution methods assume that low-resolution and high-resolution manifolds have similar local geometrical structure, and hence learn local models on the low-resolution manifold (e.g. sparse or locally linear embedding models), which are then applied on the high-resolution manifold. However, the low-resolution manifold is distorted by the one-to-many relationship between low- and high-resolution patches. This paper presents a method which learns linear models based on the local geometrical structure of the high-resolution manifold rather than that of the low-resolution manifold. In a first step, the low-resolution patch is used to derive a globally optimal estimate of the high-resolution patch. This approximate solution is shown to be close in Euclidean space to the ground-truth, but is generally smooth and lacks the texture details needed by state-of-the-art face recognizers. The first estimate allows us to find the support on the high-resolution manifold using sparse coding (SC); the selected atoms are then used as support for learning a local projection (or up-scaling) model between the low-resolution and the high-resolution manifolds using Multivariate Ridge Regression (MRR). Experimental results show that the proposed method outperforms six face super-resolution methods in terms of both recognition and quality. These results also reveal that recognition and quality are significantly affected by the method used for stitching all super-resolved patches together: quilting was found to better preserve the texture details, which helps to achieve higher recognition rates.


I Introduction

Most countries around the world use Closed Circuit Television (CCTV) systems to combat crime in their major cities. These cameras are normally installed to cover a large field of view where the query face image may not be sampled densely enough by the camera sensors [1]. The low resolution and poor quality of face images captured on camera reduce the effectiveness of CCTV in identifying perpetrators and potential eyewitnesses [2, 3].

Super-resolution techniques can be used to enhance the quality of low-resolution facial images to improve the recognition performance of existing face recognition software and the identification of individuals from CCTV images. In a recent survey, Wang et al. distinguish between two main categories of super-resolution methods: reconstruction-based and learning-based approaches [4]. Reconstruction-based methods register a sequence of low-resolution images onto a high-resolution grid and fuse them to suppress the aliasing caused by under-sampling [5, 6]. On the other hand, learning-based methods use coupled dictionaries to learn the mapping relations between low- and high-resolution image pairs to synthesize high-resolution images from low-resolution images [4, 7]. The research community has lately focused on the latter category of super-resolution methods, since they can provide higher quality images and larger magnification factors.

In their seminal work, Baker and Kanade [8] exploited the fact that human face images are a relatively small subset of natural scenes and introduced the concept of face super-resolution (also known as face hallucination), where only facial images are used to construct the dictionaries. The high-resolution face image is then hallucinated using Bayesian inference with gradient priors. The authors in [9] assume that two similar face images share similar local pixel structures, so that each pixel can be generated by a linear combination of spatially neighbouring pixels. This method was later extended in [10], which uses sparse local pixel structure. Although these methods were found to perform well at moderately low resolutions, they fail on very low-resolution face images, where the local pixel structure is severely distorted. Classical face representation models such as Principal Component Analysis (PCA) [11, 12, 13, 14, 15], Kernel PCA (KPCA) [16], Locality Preserving Projections (LPP) [17] and Non-Negative Matrix Factorization (NMF) [18] were used to model a novel low-resolution face image using a linear combination of prototype low-resolution face images present in a dictionary. The combination weights are then used to combine the corresponding high-resolution prototype face images to hallucinate the high-resolution face image. Nevertheless, global methods do not manage to recover the local texture details which are essential for face recognition. To alleviate this problem, some methods employ patch-based local approaches such as Markov Random Fields [13, 14, 15], Locally Linear Embedding (LLE) [17] or Sparse Coding (SC) [18] as a post-process.

Different data representation methods, including PCA [19], Principal Component Sparse Representation (PCSR) [20], LLE in both the spatial [21, 22] and Discrete Cosine Transform (DCT) domains [23, 24, 25], LPP [26], Orthogonal LPP (OLPP) [27], Tensors [28], Constrained Least Squares [29], Sparse Representation [30], Locality-Constrained Representation [31, 32], and Local Appearance Similarity (LAS) [33] have used the same concept to hallucinate high-resolution overlapping patches which are then stitched together to reconstruct the high-resolution face image. All these methods [11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33] assume that the low- and high-resolution manifolds have similar local geometrical structures. However, the authors in [21, 34, 35, 36] have shown that this assumption does not hold well because the one-to-many mappings from low- to high-resolution distort the structure of the low-resolution manifold.

Motivated by this observation, Coupled Manifold Alignment [35], Easy-Partial Least Squares (EZ-PLS) [36], Dual Associative Learning [37], and Canonical Correlation Analysis (CCA) [38] were used to derive a pair of projection matrices that can be used to project both low- and high-resolution patches onto a common coherent subspace. However, the dimension of the coherent subspace is equal to the lowest rank of the low- and high-resolution dictionary matrices. Therefore, the projection from the coherent subspace to the high-resolution manifold is ill-conditioned. On the other hand, the Locality-constrained Iterative Neighbour Embedding (LINE) method presented in [39, 40] reduces the dependence on the low-resolution manifold by iteratively updating the neighbours on the high-resolution manifold. This was later extended by the same authors in [41], where an iterative dictionary learning scheme was integrated to bridge the low-resolution and high-resolution manifolds. Although this method yields state-of-the-art performance, it is not guaranteed to converge to an optimal solution. A recent method based on Transport Analysis was proposed in [42], where the high-resolution face image is reconstructed by morphing the high-resolution training images which best fit the given low-resolution face image. However, this method relies heavily on the assumption that the degradation function is known, which is generally not the case in typical CCTV scenarios.

Different automated cross-resolution face recognition methods have been proposed to cater for the resolution discrepancy between the gallery and probe images¹. These methods either include the resolution discrepancy within the classifier's optimization function [43, 1, 44, 45] or project both probe and gallery images onto a coherent subspace and perform the classification there [46, 47, 48, 49]. However, although these methods are reported to provide good results, they suffer from the following shortcomings: i) most of the methods ([44, 46, 47, 48, 49]) do not synthesize a high-resolution face image and ii) they generally assume that multiple images per subject are available in the gallery, which are often scarce in practice.

¹ Gallery images are high-quality frontal facial images stored in a database which are usually taken in a controlled environment (e.g. ID and passport photos). Probe images are query face images which are compared to each and every face image included in the gallery. Probe images are usually taken in a non-controlled environment and can have different resolutions.

This work presents a two-layer approach named Linear Models of Coupled Sparse Support (LM-CSS), which employs a coupled dictionary containing low-resolution and corresponding high-resolution training patches to learn the mapping relation between low- and high-resolution patch pairs. LM-CSS first employs all atoms in the dictionary to project the low-resolution patch onto the high-resolution manifold to get an initial approximation of the high-resolution patch. This solution gives a good estimate of the high-resolution ground-truth. However, experimental results provided in section III show that texture detail is better preserved if only a subset of coupled low- and high-resolution atoms is used for reconstruction. Given that high-resolution patches reside on a high-resolution manifold and that the initial solution is sufficiently close to the ground-truth, we exploit the locality of the high-resolution manifold to refine the initial solution: Basis Pursuit Denoising (BPDN) is used to derive the atoms needed to optimally reconstruct the initial approximation on the high-resolution manifold. The resulting coupled sparse support is then used by Multivariate Ridge Regression (MRR) to model the up-scaling function for each patch, which better preserves the texture detail crucial for recognition.

The proposed approach was extensively evaluated against six face hallucination methods using 930 probe images from the FRGC dataset against a gallery of 889 individuals, using a closed set identification scenario with one face image per subject in the gallery². This method was found to provide state-of-the-art performance in terms of face recognition, achieving rank-1 recognition gains of between 0.4% and 1.5% over LINE [41] and between 1% and 9% over Sparse Position-Patches [30], which ranked second and third respectively. The quality analysis further shows that the proposed method outperforms Eigen-Patches [19] and Position-Patches [29] by around 0.1 dB and 0.2 dB in Peak Signal-to-Noise Ratio (PSNR) respectively, followed by the others.

² Collection of gallery images is laborious and expensive in practice. This limits the number of gallery images that can be used for recognition, where frequently only one image per subject is available in the gallery. This is referred to as the one sample per person problem in face recognition literature [50].

The remainder of the paper is organized as follows. In section II we give some notation and concepts needed later in the paper. The proposed method is described in section III, while the experimental protocol is outlined in section IV. The experimental results and discussion are presented in section V, while the final comments and conclusion are provided in section VI.

II Background

II-A Problem Formulation

We consider a low-resolution face image where the distance between the eye centres is defined as . The goal of face super-resolution is to up-scale by a scale factor , where represents the distance between the eye-centres of the desired super-resolved face image. The image is divided into a set of overlapping patches of size with an overlap of , and the resulting patches are reshaped to column-vectors in lexicographic order and stored as vectors , where represents the patch index.

In order to learn the mapping relation between low- and high-resolution patches, we have high-resolution face images which are registered based on eye and mouth centre coordinates, where the distance between eye centres is set to . These images are divided into overlapping patches of size with an overlap of , where stands for the rounding operator. The -th patch of every high-resolution image is reshaped to a column-vector in lexicographic order and placed within the high-resolution dictionary . The low-resolution dictionary of the -th patch is constructed using the same images present in the high-resolution dictionary, which are down-scaled by a factor and divided into overlapping patches of size with an overlap of . This formulation is in line with the position-patch method published in [29], where only collocated patches with index are used to super-resolve the low-resolution patch .
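The patch decomposition and the position-wise dictionary described above can be sketched as follows. This is only an illustration under stated assumptions: the function names `extract_patches` and `build_position_dictionary` are hypothetical, patches are flattened in row-major order, the step between patches is the patch size minus the overlap, and border pixels that do not fit a full patch are simply ignored.

```python
import numpy as np

def extract_patches(image, patch_size, overlap):
    """Split a 2-D image into overlapping square patches, one column per patch."""
    step = patch_size - overlap  # assumed stride between neighbouring patches
    if step <= 0:
        raise ValueError("overlap must be smaller than patch_size")
    rows, cols = image.shape
    columns, positions = [], []
    for r in range(0, rows - patch_size + 1, step):
        for c in range(0, cols - patch_size + 1, step):
            patch = image[r:r + patch_size, c:c + patch_size]
            columns.append(patch.reshape(-1))  # row-major flattening
            positions.append((r, c))           # top-left corner of the patch
    return np.stack(columns, axis=1), positions

def build_position_dictionary(images, patch_size, overlap, patch_index):
    """Collect the patch at one position from every training image (one atom each)."""
    atoms = [extract_patches(img, patch_size, overlap)[0][:, patch_index]
             for img in images]
    return np.stack(atoms, axis=1)
```

Calling `build_position_dictionary` once on the high-resolution training images and once on their down-scaled versions (with the correspondingly scaled patch size and overlap) would give one coupled dictionary pair per patch position.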

Without loss of generality, we will assume that the column vectors of both dictionaries are standardized to have zero mean and unit variance to compensate for illumination and contrast variations. The standardized low-resolution patch is denoted by and the aim of this work is to find an up-scaling projection matrix that minimizes the following objective function

(1)

where is the up-scaling projection matrix of dimensions . The standardized -th high-resolution patch is then hallucinated using

(2)

The pixel intensities of the patch are then recovered using

(3)

where and represent the mean and standard deviation of the low-resolution patch . The resulting hallucinated patches are then stitched together to form the hallucinated high-resolution face image .
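The three steps above (standardize the low-resolution patch, apply the up-scaling projection matrix, then restore the pixel intensities with the low-resolution mean and standard deviation) can be sketched as below. The function name `hallucinate_patch` and the small constant `eps`, which guards against flat patches with zero variance, are assumptions made for illustration.

```python
import numpy as np

def hallucinate_patch(x, upscale_matrix, eps=1e-8):
    """Up-scale one vectorized low-resolution patch with a given projection matrix."""
    mu = x.mean()                      # mean of the low-resolution patch
    sigma = x.std()                    # standard deviation of the low-resolution patch
    x_std = (x - mu) / (sigma + eps)   # standardized low-resolution patch
    y_std = upscale_matrix @ x_std     # standardized high-resolution estimate
    return sigma * y_std + mu          # restore pixel intensities
```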

II-B Multivariate Ridge Regression

The direct least squares solution to (1) can be numerically unstable, since the matrix to be inverted may be singular. Instead, this work adopts an ℓ2-norm regularization term to derive a more stable solution for the up-scaling projection matrix . The analytic solution of multivariate ridge regression is given by

(4)

where is a regularization parameter used to avoid singular values and is the identity matrix. This solution will be referred to as the direct up-scaling matrix.
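A minimal sketch of the direct up-scaling matrix follows, assuming the standard closed form of multivariate ridge regression, H Lᵀ (L Lᵀ + λI)⁻¹, with the low-resolution dictionary `L` and high-resolution dictionary `H` stored one atom per column. The names `direct_upscaling_matrix`, `H`, `L` and the default `lam` are illustrative assumptions and not values taken from the paper.

```python
import numpy as np

def direct_upscaling_matrix(H, L, lam=1e-6):
    """Closed-form ridge solution Phi = H L^T (L L^T + lam I)^-1."""
    n = L.shape[0]                         # dimension of a low-resolution patch
    gram = L @ L.T + lam * np.eye(n)       # regularized n x n Gram matrix (symmetric)
    # Solve gram @ Phi^T = L @ H^T rather than forming an explicit inverse.
    return np.linalg.solve(gram, L @ H.T).T
```

Using `np.linalg.solve` instead of an explicit inverse is a numerical choice of this sketch; the result is the same matrix up to rounding.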

Very recently, the authors in [51] have adopted ridge regression to solve a different optimization problem for generic super-resolution, which using our notation can be formulated as

(5)

where their intent is to find the reconstruction weight vector that is optimal to represent the low-resolution patch using a weighted combination of the low-resolution dictionary . The same reconstruction weights are then used to combine the high-resolution patches in the coupled dictionary to hallucinate the high-resolution patch using

(6)

where represents the hallucinated solution using the method in [51]. It can be seen that the solution in (6) can be interpreted as up-scaling the low-resolution patch using the up-scaling projection matrix . This is referred to as the indirect up-scaling matrix.
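The indirect formulation can be sketched in the same way, assuming the usual ridge solution for the weights, w = (LᵀL + λI)⁻¹ Lᵀ x, followed by the combination H w of the coupled high-resolution atoms. The function name `indirect_hallucination` and the default `lam` are hypothetical.

```python
import numpy as np

def indirect_hallucination(H, L, x_std, lam=1e-6):
    """Ridge-regress the patch onto the LR atoms, then reuse the weights on the HR atoms."""
    m = L.shape[1]                                   # number of dictionary atoms
    gram = L.T @ L + lam * np.eye(m)                 # regularized m x m Gram matrix
    w = np.linalg.solve(gram, L.T @ x_std)           # reconstruction weights
    return H @ w, w                                  # hallucinated patch and weights
```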

It can be shown in Appendix A that . This leads to the following equivalent solutions to the standardized high-resolution patch

(7)

This shows that finding the optimal reconstruction weights to model and then using them to reconstruct the high-resolution patch (indirect method) is equivalent to modelling the up-scaling projection matrix directly (as in the proposed method). Nevertheless, the direct formulation has a complexity of the order while the indirect formulation has a complexity of the order , where .
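Appendix A is not reproduced in this section. The equivalence it refers to presumably follows from the standard push-through identity, which can be derived in a few lines. The notation here is an assumption: L is the n × m low-resolution dictionary, H the coupled high-resolution dictionary, λ > 0 the regularization parameter, and I_n, I_m identity matrices of the indicated size.

```latex
\begin{aligned}
L^{\top}\left(L L^{\top} + \lambda I_n\right)
  &= L^{\top} L L^{\top} + \lambda L^{\top}
   = \left(L^{\top} L + \lambda I_m\right) L^{\top} \\
\Rightarrow\quad
\left(L^{\top} L + \lambda I_m\right)^{-1} L^{\top}
  &= L^{\top}\left(L L^{\top} + \lambda I_n\right)^{-1} \\
\Rightarrow\quad
H \left(L^{\top} L + \lambda I_m\right)^{-1} L^{\top}
  &= H L^{\top}\left(L L^{\top} + \lambda I_n\right)^{-1}
\end{aligned}
```

The left-hand side inverts an m × m matrix and the right-hand side an n × n matrix, which is consistent with the complexity remark above whenever the number of pixels per low-resolution patch is much smaller than the number of atoms.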

III Linear Models of Coupled Sparse Support

III-A Quality and Texture Analysis

The multivariate ridge regression solution given in (4) is heavily over-determined since it employs all column-vectors. This generally results in biased and overly-smooth solutions [30]. Motivated by the success of Neighbour Embedding based schemes, we will investigate here the effect that neighbourhood size has on performance. We define to represent the support which specifies the column-vector indices that are used to model the up-scaling projection matrix. Let and represent the coupled sub-dictionaries using the column-vectors listed in from and respectively. Therefore, the up-scaling projection matrix can be solved using

(8)

where represents the cardinality of set , which corresponds to the size of the support. Figure 1 shows the quality and texture analysis as a function of the number of support points . These results were computed using the AR face dataset [52], while the coupled dictionary is constructed using one image per subject from the Color Feret [53] and Multi-Pie [54] datasets. More information about these datasets is provided in section IV.
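The support-restricted model can be sketched as below, assuming the support is chosen as the nearest low-resolution atoms in Euclidean distance and that the same ridge closed form is applied to the selected columns only. The function names and the default `lam` are illustrative assumptions.

```python
import numpy as np

def knn_support(L, x_std, k):
    """Indices of the k low-resolution atoms closest to the standardized patch."""
    dists = np.linalg.norm(L - x_std[:, None], axis=0)  # distance to every atom
    return np.argsort(dists)[:k]

def support_upscaling_matrix(H, L, support, lam=1e-6):
    """Ridge up-scaling matrix learned only from the coupled atoms in the support."""
    Hs, Ls = H[:, support], L[:, support]               # coupled sub-dictionaries
    gram = Ls @ Ls.T + lam * np.eye(Ls.shape[0])
    return np.linalg.solve(gram, Ls @ Hs.T).T
```

When the support contains every atom, this reduces to the direct up-scaling matrix of the previous section.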

Fig. 1: Performance analysis using different neighbourhood sizes: (a) quality analysis, (b) texture analysis.

In this experiment the support was chosen using nearest neighbours. The quality was measured using the PSNR (similar results were obtained using other full-reference quality metrics such as the Structural Similarity (SSIM) and Feature Similarity (FSIM) metrics), while the Texture Consistency (TC) was measured by comparing the Local Binary Pattern (LBP) features of the reference and hallucinated image. The LBP features were extracted using the method in [55], where the similarity was measured using histogram intersection. In this experiment , and . More information is provided in section IV.
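The two measures can be illustrated with the simplified sketch below. It is not the feature extractor of [55]: it assumes a basic 8-neighbour, 3 × 3 LBP code and a single global histogram, whereas face recognizers typically use block-wise histograms. The function names and the peak value of 255 are assumptions.

```python
import numpy as np

def psnr(reference, estimate, peak=255.0):
    """Peak Signal-to-Noise Ratio in dB between two images of equal size."""
    mse = np.mean((reference.astype(float) - estimate.astype(float)) ** 2)
    return float("inf") if mse == 0 else 10.0 * np.log10(peak ** 2 / mse)

def lbp_histogram(image):
    """Normalized histogram of basic 3x3 LBP codes (image border excluded)."""
    img = image.astype(float)
    rows, cols = img.shape
    centre = img[1:-1, 1:-1]
    codes = np.zeros(centre.shape, dtype=np.int64)
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]
    for bit, (dr, dc) in enumerate(offsets):
        neighbour = img[1 + dr:rows - 1 + dr, 1 + dc:cols - 1 + dc]
        codes |= (neighbour >= centre).astype(np.int64) << bit  # set one bit per neighbour
    hist = np.bincount(codes.ravel(), minlength=256).astype(float)
    return hist / hist.sum()

def texture_consistency(reference, estimate):
    """Histogram intersection of the LBP histograms (1.0 means identical histograms)."""
    return float(np.minimum(lbp_histogram(reference), lbp_histogram(estimate)).sum())
```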

The results in Figure 1(a) demonstrate that the PSNR increases rapidly until , and keeps on improving slowly for larger values of . The up-scaling function which maximizes the PSNR metric, and is therefore closest to the ground-truth in Euclidean space, was obtained when i.e. all column-vectors are included as support. However, the results in Figure 1(b) show that the texture consistency increases steadily up to and starts degrading (or remains steady) as increases. The subjective results in Figure 2 support this observation, where it can be seen that the images derived using (middle row) generally contain more texture detail while the images for (bottom row) are more blurred.

Fig. 2: Comparison of super-resolution results at resolution where the top row represents the high-resolution ground truth, the middle row the hallucination with and the bottom row the hallucination with . Per-image scores (PSNR / TS), in the order listed: middle row 29.48 / 0.52, 30.23 / 0.50, 28.95 / 0.55, 29.25 / 0.54; bottom row 30.21 / 0.49, 31.03 / 0.48, 29.08 / 0.53, 29.58 / 0.51.

All the face hallucination methods found in literature follow the same philosophy of generic super-resolution and are designed to maximize an objective measure such as PSNR. These methods assume that increasing the PSNR metric will inherently improve the face recognition performance. The above results and observations reveal that improving the PSNR does not correspond to improving the texture detail of the hallucinated face image. However, recent face recognition methods exploit the texture similarity between probe and gallery images to achieve state-of-the-art performance [55, 56]. This indicates that optimizing the face hallucination to minimize the mean square error leads to sub-optimal solutions, at least in terms of recognition.

The results in Fig. 1(b) further show that there is a relation between texture consistency and sparsity, i.e. facial images hallucinated using the -nearest atoms, where , have better texture consistency. The aim of this work is to learn up-scaling models between the low- and high-resolution manifolds exploiting the local geometrical structure of the high-resolution manifold. In contrast to the method proposed by Jung [30], Sparse Coding is employed on the high-resolution manifold and the up-scaling matrix is learned using Multivariate Ridge Regression. The results in section V reveal the superiority of the proposed approach over this method.

III-B Proposed Method

The proposed method builds on the observations drawn in the previous sub-section where the main objective is to find the atoms that are able to better preserve the similarity between the hallucinated and ground-truth images in terms of both texture and quality. Fig. 3 shows a block-diagram of the proposed method, where in this example the first patch covering the right eye is being processed. The low-resolution patch is first standardized and then passed to the first layer which derives the first approximation of the desired standardized ground-truth , which is not known in practice.

Fig. 3: Block diagram of the proposed method (blocks: LR dictionary, HR dictionary, standardization, Layer 0, Layer 1, inverse standardization, stitching).

Fig. 4 depicts the geometrical representation of this method and shows that if is sufficiently close to the ground-truth, they will share similar local-structure on the high-resolution manifold [57]. The first approximation is computed using

(9)

where is approximated using all atoms in the coupled dictionaries. This solution ignores the local geometric structure of the low-resolution manifold which is known to be distorted, and tries to approximate the ground-truth vector using the global structure of the low-resolution manifold. This provides a unique and global solution for the approximation of the ground-truth point. Backed up by the results in Fig. 1(a), this solution provides the largest PSNR and is thus close to the ground-truth in Euclidean space. However, as shown in Fig. 2, the solution is generally blurred and lacks important texture details.

Fig. 4: Illustration of the concept behind this work (labels: low-res patch, first approximation, ground-truth, coupled support).

The second layer assumes that is sufficiently close to the ground-truth and exploits the locality of the high-resolution manifold to refine the first approximated solution and recover the required texture details. The aim here is to find the atoms from the high-resolution dictionary which can optimally reconstruct the first approximation . This can be formulated as finding the sparsest representation of using

(10)

where is the sparse vector, represents the number of non-zero entries in and is the noise parameter. This solution however is intractable since it cannot be solved in polynomial time. The authors in [58, 59] have shown that (10) can be relaxed and solved using Basis Pursuit Denoising (BPDN)

(11)

where is a regularization parameter. This optimization can be solved in polynomial time using linear programming. In this work we use the solver provided by SparseLab (the code can be found at https://sparselab.stanford.edu/) to solve the above BPDN problem. The support is then set as the indices of with the largest magnitude. The high-resolution patch is then solved using

(12)
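The sparse support selection of the second layer can be illustrated with the sketch below. It does not use the SparseLab solver: as a stand-in it minimizes the common Lagrangian form 0.5‖y − Dη‖² + λ‖η‖₁ with the iterative shrinkage-thresholding algorithm (ISTA), which may be scaled differently from the objective used in the paper. The function names, the default `lam` and the iteration count are assumptions.

```python
import numpy as np

def soft_threshold(v, t):
    """Element-wise soft-thresholding, the proximal operator of the l1 norm."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def bpdn_ista(D, y, lam=0.01, n_iter=500):
    """Minimise 0.5*||y - D @ eta||_2^2 + lam*||eta||_1 with ISTA."""
    step = 1.0 / (np.linalg.norm(D, 2) ** 2)  # 1 / Lipschitz constant of the gradient
    eta = np.zeros(D.shape[1])
    for _ in range(n_iter):
        grad = D.T @ (D @ eta - y)            # gradient of the quadratic term
        eta = soft_threshold(eta - step * grad, step * lam)
    return eta

def sparse_support(eta, k):
    """Indices of the k coefficients with the largest magnitude."""
    return np.argsort(-np.abs(eta))[:k]
```

Here `D` would be the high-resolution dictionary of the patch and `y` the first approximation; the returned indices select the coupled low- and high-resolution atoms on which the ridge regression is then learned.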

It is important to notice that using directly will provide a solution close to the first approximation, and therefore the coefficients cannot be used directly. Instead we use the coupled anchor points to get an up-scaling projection matrix with richer texture. This can be further explained by extending (7) to the support , which gives the following relation

(13)

where

(14)

These equations indicate that a larger support will lead to averaging a larger number of high-resolution atoms, which will result in blurred solutions. Therefore, increasing the support will reduce the ability of the projection matrix to recover texture details.

Fig. 5 shows the performance of the proposed sparse support selection method compared to -nearest neighbour, to model the ground-truth samples using a weighted combination of selected atoms from dictionary . The weights were derived using

(15)

where is the pseudo-inverse operator. These results were obtained on the AR dataset using a configuration similar to the one described above. It can be seen that the reconstruction of the ground-truth using sparse support provides images of better quality and texture consistency than when using the nearest neighbours. Therefore, the proposed method finds an optimal initial guess in sense and then finds an optimal set of support using Basis Pursuit Denoising.
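The modelling experiment can be sketched as a least-squares fit of the ground-truth patch on the selected high-resolution atoms, using the Moore-Penrose pseudo-inverse. The function name `reconstruct_from_support` and its arguments are hypothetical.

```python
import numpy as np

def reconstruct_from_support(H, y_true, support):
    """Model a ground-truth patch as a weighted combination of the selected HR atoms."""
    Hs = H[:, support]                      # selected high-resolution atoms
    weights = np.linalg.pinv(Hs) @ y_true   # least-squares weights via pseudo-inverse
    return Hs @ weights, weights
```

Comparing the reconstruction error of a sparse support against a nearest-neighbour support of the same size would reproduce the kind of comparison described above.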

Fig. 5: Comparison of the modelling ability of two support selection methods for modelling the ground-truth samples when the high-resolution patch size is set to 20: (a) quality analysis, (b) texture analysis.

The last step involves stitching the overlapping patches together. This process is generally given little importance in literature, where most methods simply average the overlapping regions. The quilting method [60] is an alternative approach which tries to find the minimum error boundary patch and better preserves the texture recovered by each patch when reconstructing the global face. The results in section V show that the performance of face hallucination is significantly dependent on the stitching method used.
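The simpler of the two stitching strategies, averaging the overlapping regions, can be sketched as follows. The function name and argument layout are assumptions: patches are stored one per column in row-major order, and each position is the top-left corner of the patch. Quilting, which searches for a minimum error boundary between neighbouring patches, is not covered by this sketch.

```python
import numpy as np

def stitch_by_averaging(patches, positions, patch_size, image_shape):
    """Rebuild an image by averaging overlapping patches at their positions."""
    accum = np.zeros(image_shape, dtype=float)   # sum of patch values per pixel
    count = np.zeros(image_shape, dtype=float)   # number of patches covering each pixel
    for column, (r, c) in zip(patches.T, positions):
        accum[r:r + patch_size, c:c + patch_size] += column.reshape(patch_size, patch_size)
        count[r:r + patch_size, c:c + patch_size] += 1.0
    count[count == 0] = 1.0                      # leave uncovered pixels at zero
    return accum / count
```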

IV Experimental Protocol

The experiments conducted in this paper use different publicly available datasets: i) AR [52], ii) Color Feret [53], iii) Multi-Pie [54], iv) MEDS-II [61] and v) FRGC-V2 [62]. The AR dataset was primarily used for the analysis provided in section III, and consists of 112 subjects with 8 images per subject, resulting in a total of 886 images with different expressions.

The dictionary used to learn the up-scaling projection matrix for each patch consisted of a composite dataset which includes images from both the Color Feret and Multi-Pie datasets, where only frontal facial images were considered. One image per subject was randomly selected, resulting in a dictionary of of facial images. The gallery consisted of another composite dataset which combined frontal facial images from the FRGC-V2 (controlled environment) and MEDS datasets. One unique image per subject was randomly selected, providing a gallery of 889 facial images. The probe images were taken from the FRGC-V2 dataset (uncontrolled environment), where two images per subject were included, resulting in 930 probe images. All probe images are frontal images; however, various poses and illuminations were considered. The performance of the proposed method is reported on a closed set identification scenario.

All the images were registered using an affine transformation computed on landmark points of the eye and mouth centres, such that the distance between the eyes . The probe and low-resolution dictionary images were down-sampled to the desired scale using MATLAB's function. Unless stated otherwise, all patch-based methods are configured such that the number of pixels in a low-resolution patch , the low-resolution overlap , and the patches are stitched by averaging overlapping regions. The dictionaries for each patch are constructed using the position-patch method introduced in [29] and described in section II-A.
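The registration step can be illustrated by estimating the affine transformation from the three landmarks (two eye centres and the mouth centre). This is a sketch under assumptions: the function names are hypothetical, landmarks are given as (x, y) coordinates, and the subsequent image warping and down-sampling are left to an image library.

```python
import numpy as np

def affine_from_landmarks(src, dst):
    """Estimate A (2x3) so that A @ [x, y, 1]^T maps each source landmark to its target."""
    src = np.asarray(src, dtype=float)
    dst = np.asarray(dst, dtype=float)
    X = np.hstack([src, np.ones((src.shape[0], 1))])   # homogeneous coordinates, (k, 3)
    A_T, *_ = np.linalg.lstsq(X, dst, rcond=None)      # least-squares fit, (3, 2)
    return A_T.T

def apply_affine(A, points):
    """Apply a 2x3 affine transformation to an array of (x, y) points."""
    points = np.asarray(points, dtype=float)
    return points @ A[:, :2].T + A[:, 2]
```

With exactly three non-collinear landmarks the fit is exact; the target landmark positions would be chosen so that the eye centres end up at the desired distance.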

Two face recognition methods were adopted in this experiment, namely the LBP face recognition method [55] (which was found to provide state-of-the-art performance on the single image per subject problem in [56]) and the Gabor face recognizer [63] (the code was provided by the authors at http://www.mathworks.com/matlabcentral/fileexchange/35106-the-phd-face-recognition-toolbox). The latter method performs classification on the Principal Component Analysis (PCA) subspace. The PCA bases were trained off-line on the AR dataset. The proposed method was compared with Bi-Cubic Interpolation, a global face method [11] and five patch-based methods which are reputed to provide state-of-the-art performance [19, 29, 30, 64, 41]. These methods were configured using the same patch size and overlap as indicated above, and using the optimal parameters provided in their respective papers. The methods were implemented in MATLAB, where the code for [41] was provided by the authors. All simulations were run on a machine with an Intel (R) Core (TM) i7-3687U CPU at 2.10GHz running a Windows 64-bit operating system.

The proposed method has only three parameters that need to be tuned. The regularization parameter adopted by Ridge Regression can be easily set to a very small value since its purpose is to perturb the linearly dependent vectors within a matrix to avoid singular values. In all experiments, this was set to . Similarly, the BPDN regularization parameter which controls the sparsity of the solution was set to 0.01, since it provided satisfactory performance on the AR dataset. The performance of the proposed method is mainly affected by the number of anchor vectors , whose effect will be extensively analysed in the following section.

V Results

V-A Parameter Analysis

In this section we investigate the effect of the number of anchor vectors of the proposed method. Fig. 6 shows the average PSNR and Rank-1 recognition using LBP face recognizer on all 930 probe images. From these results it can be observed that PSNR increases as the number of neighbours is increased, until where it starts decreasing (or stays in steady state). On the other hand, the best rank-1 recognition is attained at , and recognition starts decreasing at larger values of . This can be explained by the fact that increasing the number of support points will increase the number of high-resolution atoms to be combined (see (13)), thus reducing the texture consistency of the hallucinated patch. More theoretical details are provided in section III-B.

Fig. 6: Effect of the number of anchor vectors on the performance of the proposed hallucination method: (a) quality analysis, (b) recognition analysis.

V-B Recognition Analysis

The recognition performance of the proposed and other state-of-the-art face hallucination methods is summarized in Table I. In this table we adopt the Area Under the ROC Curve (AUC) as a scalar-valued measure of accuracy for unsupervised learning [65] together with the rank-1 recognition. It can be seen that the proposed method is in most cases superior to all the methods considered in this experiment when using the same averaging stitching method. This performance can be further improved using the Quilting stitching method, which provides better texture preservation properties.

 

Hall. Method                  Rec.    Resolution 8    Resolution 10   Resolution 15   Resolution 20
                              Method  rank-1  AUC     rank-1  AUC     rank-1  AUC     rank-1  AUC
Bi-Cubic                      Gabor   0.0000  0.6985  0.0000  0.7823  0.0344  0.8829  0.5215  0.9181
                              LBP     0.3065  0.9380  0.5032  0.9598  0.6065  0.9708  0.7054  0.9792
Eigentransformation [11]      Gabor   0.0591  0.7852  0.1097  0.8359  0.3312  0.8841  0.5183  0.9098
                              LBP     0.2559  0.9390  0.4516  0.9554  0.5624  0.9633  0.6495  0.9688
Neighbour Embedding [64]      Gabor   0.2323  0.8624  0.4710  0.8968  0.6172  0.9182  0.6409  0.9272
                              LBP     0.5548  0.9635  0.6398  0.9712  0.7215  0.9795  0.7559  0.9830
Sparse Position-Patches [30]  Gabor   0.2333  0.8632  0.4645  0.8969  0.6118  0.9152  0.6398  0.9254
                              LBP     0.5677  0.9649  0.6441  0.9721  0.7247  0.9803  0.7570  0.9830
Position-Patches [29]         Gabor   0.1108  0.8354  0.2849  0.8814  0.5774  0.9154  0.6419  0.9281
                              LBP     0.4699  0.9588  0.5849  0.9675  0.6849  0.9782  0.7312  0.9812
Eigen-Patches [19]            Gabor   0.1613  0.8517  0.3849  0.8934  0.6065  0.9172  0.6387  0.9283
                              LBP     0.5226  0.9625  0.6215  0.9704  0.7237  0.9800  0.7602  0.9830
LINE [41]                     Gabor   0.3118  0.8696  0.5011  0.8986  0.6118  0.9168  0.6409  0.9252
                              LBP     0.5925  0.9647  0.6559  0.9714  0.7323  0.9804  0.7677  0.9833
Proposed ()                   Gabor   0.2753  0.8803  0.5000  0.9036  0.6183  0.9202  0.6452  0.9281
                              LBP     0.6032  0.9658  0.6581  0.9722  0.7398  0.9798  0.7742  0.9833
Proposed Quilt. ()            Gabor   0.3258  0.8785  0.5086  0.9051  0.6226  0.9204  0.6409  0.9275
                              LBP     0.6065  0.9663  0.6656  0.9732  0.7355  0.9799  0.7753  0.9825

TABLE I: Summary of the Rank-1 recognition results and Area Under Curve (AUC) metric using two different face recognition algorithms.

Figure 7 shows the Cumulative Matching Score Curve (CMC), which measures the recognition at different ranks. Because of the lack of space, only the CMCs of the LBP face recognizer are included. For clarity, only LINE, which provided the most competitive rank-1 recognition, and Eigen-Patches, which, as will be shown in the next subsection, provides the most competitive results in terms of quality, are included, and these are compared to the oracle, where the probe images were not down-sampled. Bi-cubic interpolation was included to show that state-of-the-art face recognition methods benefit from the texture details recovered by patch-based hallucination methods, achieving significantly higher recognition rates at all ranks. It can also be noted that the proposed method outperforms the other hallucination methods, especially at lower resolutions and lower ranks.
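For readers unfamiliar with the metric, a CMC for a closed set identification scenario can be computed from a probe-by-gallery similarity matrix as sketched below. The function name and the convention that higher scores mean more similar faces are assumptions; the sketch also assumes that every probe identity appears exactly once in the gallery, as in the one image per subject setting.

```python
import numpy as np

def cmc_curve(similarity, probe_ids, gallery_ids):
    """Cumulative match scores; entry r-1 is the fraction of probes matched within rank r."""
    probe_ids = np.asarray(probe_ids)
    gallery_ids = np.asarray(gallery_ids)
    order = np.argsort(-similarity, axis=1)        # best gallery match first
    ranked_ids = gallery_ids[order]                # identities sorted by score, per probe
    hits = ranked_ids == probe_ids[:, None]
    first_hit = hits.argmax(axis=1)                # 0-based rank of the true identity
    counts = np.bincount(first_hit, minlength=similarity.shape[1])
    return np.cumsum(counts) / len(probe_ids)
```

The first entry of the returned array is the rank-1 recognition rate.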

Fig. 7: Cumulative Matching Score Curves (CMC) of face images hallucinated from different resolutions, shown in panels (a) to (d).
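To make the CMC metric concrete, the following is a minimal sketch of how such a curve can be computed from a probe-versus-gallery similarity matrix. It assumes a closed-set protocol (every probe identity appears in the gallery) and higher scores meaning better matches; the function name `cmc_curve` and the toy scores are illustrative assumptions, not taken from the evaluation code used in the paper.

```python
def cmc_curve(similarity, probe_labels, gallery_labels, max_rank):
    """Return the recognition rate at ranks 1..max_rank.

    similarity[i][j] is the match score between probe i and gallery
    item j (assumption: higher means more similar).
    """
    hits = [0] * max_rank
    for scores, true_label in zip(similarity, probe_labels):
        # Gallery indices sorted from best to worst match.
        order = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
        ranked_labels = [gallery_labels[j] for j in order]
        if true_label not in ranked_labels:
            continue  # open-set probe: never counted as a hit
        # 1-based rank at which the correct identity first appears.
        first_hit = ranked_labels.index(true_label) + 1
        # A probe matched at rank r counts as recognised at every rank >= r.
        for r in range(first_hit, max_rank + 1):
            hits[r - 1] += 1
    return [h / len(probe_labels) for h in hits]


# Toy example: three probes, three gallery identities.
similarity = [
    [0.9, 0.2, 0.1],  # probe "a": correct match ranked first
    [0.3, 0.4, 0.8],  # probe "b": correct match ranked second
    [0.5, 0.6, 0.1],  # probe "c": correct match ranked third
]
curve = cmc_curve(similarity, ["a", "b", "c"], ["a", "b", "c"], max_rank=3)
# Recognition rate is 1/3 at rank 1, 2/3 at rank 2 and 1.0 at rank 3.
```

The first entry of the returned list is the rank-1 recognition rate of the kind reported in Table I; the curve is non-decreasing by construction.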

V-C Quality Analysis

Table II shows the quality analysis measured in terms of PSNR and SSIM. These results reveal that, when using the same stitching method, the proposed method outperforms all other methods, achieving PSNR gains of around 0.3 dB over LINE using the same neighbourhood size and 0.1 dB over Eigen-Patches, which uses the whole dictionary. However, using quilting to stitch the patches degrades quality. This can be explained by the fact that averaging stitching acts as a denoising algorithm over the overlapping regions: while it reduces the texture detail (which contributes to lower recognition performance), it suppresses the noise hallucinated in each patch.
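The denoising effect of averaging stitching can be illustrated with a one-dimensional toy example. The sketch below is not the stitching code used in the experiments; the function name `stitch_by_averaging`, the patch size and the noise values are assumptions chosen so that the arithmetic is easy to follow.

```python
def stitch_by_averaging(patches, starts, length):
    """Stitch 1-D patches, averaging samples covered by several patches."""
    total = [0.0] * length
    count = [0] * length
    for patch, start in zip(patches, starts):
        for offset, value in enumerate(patch):
            total[start + offset] += value
            count[start + offset] += 1
    # Assumes every output position is covered by at least one patch.
    return [t / c for t, c in zip(total, count)]


# True signal is constant (10). The two hallucinated patches carry
# opposite errors of +1 and -1, and overlap on positions 2 and 3.
patch_a = [11.0, 11.0, 11.0, 11.0]  # covers positions 0..3
patch_b = [9.0, 9.0, 9.0, 9.0]      # covers positions 2..5
stitched = stitch_by_averaging([patch_a, patch_b], starts=[0, 2], length=6)
# stitched == [11.0, 11.0, 10.0, 10.0, 9.0, 9.0]
```

In the overlap the two errors cancel, which is the smoothing behaviour described above. Quilting [60] instead selects a boundary cut through the overlap, so each output sample comes from a single patch and keeps that patch's texture together with its noise.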

 

| Hallucination Method | Res. 8 PSNR (dB) | Res. 8 SSIM | Res. 10 PSNR (dB) | Res. 10 SSIM | Res. 15 PSNR (dB) | Res. 15 SSIM | Res. 20 PSNR (dB) | Res. 20 SSIM |
|---|---|---|---|---|---|---|---|---|
| Bi-Cubic | 24.0292 | 0.6224 | 26.2024 | 0.7338 | 25.2804 | 0.7094 | 28.6663 | 0.8531 |
| Eigentransformation [11] | 24.3958 | 0.6496 | 26.8645 | 0.7504 | 24.9374 | 0.6724 | 27.7883 | 0.7892 |
| Neighbour Embedding [64] | 26.9987 | 0.7533 | 27.9560 | 0.7973 | 29.9892 | 0.8714 | 31.6301 | 0.9122 |
| Position-Patches [29] | 27.3044 | 0.7731 | 28.2906 | 0.8145 | 30.1887 | 0.8785 | 31.7192 | 0.9143 |
| Sparse Position-Patches [30] | 27.2500 | 0.7666 | 28.2219 | 0.8100 | 30.1290 | 0.8767 | 31.7162 | 0.9146 |
| Eigen-Patches [19] | 27.3918 | 0.7778 | 28.3847 | 0.8196 | 30.3118 | 0.8842 | 31.8986 | 0.9203 |
| LINE [41] | 27.0927 | 0.7591 | 28.0253 | 0.8009 | 30.0471 | 0.8727 | 31.6970 | 0.9131 |
| Proposed () | 27.4866 | 0.7802 | 28.4200 | 0.8009 | 30.3431 | 0.8845 | 31.9610 | 0.9209 |
| Proposed Quilt. () | 27.3916 | 0.7762 | 28.3345 | 0.8178 | 30.2512 | 0.8815 | 31.8507 | 0.9185 |

 

TABLE II: Summary of the Quality Analysis results using the PSNR and SSIM quality metrics.
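For readers unfamiliar with the PSNR figures reported above, the following sketch shows the standard definition, PSNR = 10 log10(peak² / MSE). It assumes 8-bit grey-level images (peak value 255) stored as nested lists; the function name `psnr` and the toy pixel values are illustrative and are not taken from the paper's evaluation code.

```python
import math


def psnr(reference, estimate, peak=255.0):
    """Peak signal-to-noise ratio in dB between two equally sized images."""
    flat_ref = [p for row in reference for p in row]
    flat_est = [p for row in estimate for p in row]
    mse = sum((r - e) ** 2 for r, e in zip(flat_ref, flat_est)) / len(flat_ref)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(peak ** 2 / mse)


reference = [[100, 100], [100, 100]]
estimate = [[110, 90], [100, 100]]   # two pixels are off by 10 grey levels
value = psnr(reference, estimate)    # MSE = 50, so PSNR is about 31.14 dB
```

Under this definition, a gain of 0.3 dB corresponds to a mean squared error that is lower by a factor of 10^(-0.03), i.e. roughly 7%, which helps put the differences between the best-performing methods into perspective.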

Fig. 8 subjectively compares our method with LINE and Eigen-Patches, which were found to provide the most competitive results in terms of recognition and quality respectively. It can be seen that Eigen-Patches generally produces blurred images. On the other hand, the images hallucinated by LINE contain more texture detail but have visible artefacts. The images recovered by the proposed method preserve more texture, at the expense of more noise. The results in Table I reveal that the face recognizers employed in this experiment are more robust to these distortions than to the smoothing effect introduced by Eigen-Patches. These results also reveal that these distortions can be alleviated by simply increasing the size of the support.

Fig. 8: Comparison of super-resolution results at resolution . From left to right are the high-resolution ground-truth, the results of LINE [41], Eigen-Patches [19], the proposed method using and the proposed method with .

V-D Complexity Analysis

The complexity of the first layer, which computes ridge regression using all entries in the dictionary, is of order . The second layer first computes BPDN, followed by Multivariate Ridge Regression on the selected anchor points. In this work we use the SparseLab solver for BPDN, which employs a Primal-Dual Interior-Point algorithm whose complexity is of order [66]. The complexity of Multivariate Ridge Regression using support vectors is of the order . This analysis reveals that the complexity of the proposed method is mostly dependent on the complexity of the sparse optimization method used.
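The two layers whose costs are discussed above can be sketched as follows. This is a simplified illustration under stated assumptions, not the authors' implementation: the symbols `L` and `H` (coupled low- and high-resolution dictionaries with one atom per column), `y` (low-resolution patch), `lam`, `tau` and `k` are hypothetical names, patch pre-processing is omitted, and the sparse coding step uses plain ISTA as a stand-in for the SparseLab Primal-Dual Interior-Point BPDN solver.

```python
import numpy as np


def ridge_projection(L, H, lam):
    """Multivariate ridge regression: Phi = H L^T (L L^T + lam I)^-1."""
    gram = L @ L.T + lam * np.eye(L.shape[0])
    # gram is symmetric, so solving gram X = L H^T and transposing gives Phi.
    return np.linalg.solve(gram, L @ H.T).T


def ista_lasso(D, x, tau, n_iter=500):
    """Minimise 0.5*||x - D a||^2 + tau*||a||_1 with ISTA (stand-in solver)."""
    step = 1.0 / np.linalg.norm(D, 2) ** 2  # 1 / Lipschitz constant
    a = np.zeros(D.shape[1])
    for _ in range(n_iter):
        z = a - step * (D.T @ (D @ a - x))  # gradient step
        a = np.sign(z) * np.maximum(np.abs(z) - step * tau, 0.0)  # shrinkage
    return a


def hallucinate_patch(y, L, H, lam=1e-3, tau=0.1, k=5):
    # Layer 1: smooth first estimate using every atom in the dictionary.
    x0 = ridge_projection(L, H, lam) @ y
    # Layer 2a: sparse-code the estimate on the high-resolution dictionary
    # and keep the k atoms with the largest coefficients as support.
    # (If fewer than k coefficients are non-zero, the rest are arbitrary.)
    alpha = ista_lasso(H, x0, tau)
    support = np.argsort(-np.abs(alpha))[:k]
    # Layer 2b: local up-scaling model learned on the coupled support only.
    phi_s = ridge_projection(L[:, support], H[:, support], lam)
    return phi_s @ y, support


# Synthetic demo: 40 random atoms, 16-D HR patches, 4-D LR patches.
rng = np.random.default_rng(0)
H = rng.standard_normal((16, 40))
A = np.kron(np.eye(4), np.full((1, 4), 0.25))  # toy down-sampling operator
L = A @ H
y = A @ H[:, 7]
x_hat, support = hallucinate_patch(y, L, H)
```

In this sketch the iterative sparse solver is the only step whose cost grows with the number of iterations as well as the dictionary size, which is consistent with the observation that the sparse optimization dominates the overall complexity.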

 

| Hallucination Method | Res. 8 | Res. 10 | Res. 15 | Res. 20 |
|---|---|---|---|---|
| Eigentransformation [11] | 2.84 | 2.69 | 2.74 | 2.75 |
| Neighbour Embedding [64] | 0.25 | 0.33 | 0.73 | 1.24 |
| Position-Patches [29] | 1.59 | 2.11 | 4.69 | 8.37 |
| Sparse Position-Patches [30] | 0.59 | 0.79 | 1.72 | 3.02 |
| Eigen-Patches [19] | 10.89 | 14.80 | 34.74 | 63.98 |
| LINE [41] | 2.23 | 2.16 | 2.61 | 2.83 |
| Proposed | 8.04 | 5.78 | 4.71 | 5.43 |

 

TABLE III: Summary of the time taken (in seconds) to synthesize one image at different resolutions.

The complexity, in terms of the average time in seconds taken to synthesize a high-resolution image from a low-resolution image, is summarized in Table III. These results show that the proposed method is significantly less computationally intensive than Eigen-Patches but more complex than the other methods, including LINE. While complexity is not the prime aim of this work, the speed of the proposed scheme can be significantly improved using more efficient ℓ1-minimization algorithms, as mentioned in [66].

VI Conclusion

In this paper, we propose a new approach to super-resolve a high-resolution image from a low-resolution test image. The proposed method first derives a smooth approximation of the ground-truth on the high-resolution manifold using all entries in the coupled dictionaries. We then use Basis Pursuit Denoising to find the optimal atomic decomposition of this approximated sample. Based on the assumption that the patches reside on a high-resolution manifold, we assume that the optimal support used to represent the approximation is also suitable to reconstruct the ground-truth, and use the coupled support to hallucinate the high-resolution patch using Multivariate Ridge Regression. Extensive simulation results demonstrate that the proposed method outperforms six face super-resolution methods in both recognition and quality analysis.

Future work will focus on face hallucination techniques which are able to hallucinate and enhance face images afflicted by distortions such as compression, landmark-point misalignment, bad exposure and other distortions commonly found in CCTV images. The performance of current schemes (including the proposed method) depends on the dictionaries used, and therefore these schemes can be made more robust by building more robust dictionaries.

Appendix A Indirect and Direct Projection Matrices

Using the results from section II-B, the direct up-scaling projection matrix is defined by

(16)

while the indirect projection matrix according to the authors in [51] is given by

(17)

The covariance matrix while the covariance matrix , where . This shows that has at least zero eigen-values. Given the above observation, we hypothesize that if the solutions in (16) and (17) are equivalent, then

(18)

Therefore, we can conclude that the direct and indirect projection matrices are equivalent.
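The equivalence claimed above can be checked numerically. The sketch below assumes, since the equations are not reproduced here, that the two projection matrices take the usual ridge-regression forms H Lᵀ (L Lᵀ + λI)⁻¹ (which inverts a small matrix) and H (Lᵀ L + λI)⁻¹ Lᵀ (which inverts a large one); the symbols `L`, `H`, `lam` and the dimensions `n`, `N`, `m` are hypothetical, and which form corresponds to (16) and which to (17) is likewise an assumption.

```python
import numpy as np

rng = np.random.default_rng(0)
n, N, m = 4, 16, 50              # LR dimension, HR dimension, number of atoms
L = rng.standard_normal((n, m))  # low-resolution dictionary (assumed layout)
H = rng.standard_normal((N, m))  # high-resolution dictionary (assumed layout)
lam = 0.1                        # regularisation parameter

# Form 1: inverts the small n x n matrix L L^T + lam I.
phi_small = H @ L.T @ np.linalg.inv(L @ L.T + lam * np.eye(n))
# Form 2: inverts the large m x m matrix L^T L + lam I.
phi_large = H @ np.linalg.inv(L.T @ L + lam * np.eye(m)) @ L.T

# Largest entry-wise difference between the two projection matrices.
max_gap = np.max(np.abs(phi_small - phi_large))

# L^T L is m x m but has rank at most n, so at least m - n of its
# eigenvalues are (numerically) zero.
eigvals = np.linalg.eigvalsh(L.T @ L)
num_zero = int(np.sum(eigvals < 1e-8))
```

With these assumed forms, the two matrices agree up to floating-point error for any λ > 0, which follows from the identity Lᵀ (L Lᵀ + λI)⁻¹ = (Lᵀ L + λI)⁻¹ Lᵀ. The first form is cheaper to evaluate whenever the dictionary has many more atoms than the low-resolution patch has pixels.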

Acknowledgment

The authors would like to thank Junjun Jiang, author of [41], and Vitomir Štruc, author of [63], for providing the source code of their methods. The authors would also like to thank Prof. David Donoho and his team at SparseLab for providing the BPDN ℓ1-minimization solver.

References

  • [1] W. Zou and P. Yuen, “Very low resolution face recognition problem,” Image Processing, IEEE Transactions on, vol. 21, no. 1, pp. 327–340, Jan 2012.
  • [2] M. A. Sasse, “Not seeing the crime for the cameras?” Commun. ACM, vol. 53, no. 2, pp. 22–25, Feb. 2010. [Online]. Available: http://doi.acm.org/10.1145/1646353.1646363
  • [3] N. La Vigne, S. Lowry, J. Markman, and A. Dwyer, “Evaluating the use of public surveillance cameras for crime control and prevention,” Urban Institute Justice Policy Center, Tech. Rep., 2011.
  • [4] N. Wang, D. Tao, X. Gao, X. Li, and J. Li, “A comprehensive survey to face hallucination,” International Journal of Computer Vision, vol. 106, no. 1, pp. 9–30, 2014. [Online]. Available: http://dx.doi.org/10.1007/s11263-013-0645-9
  • [5] M. Elad and A. Feuer, “Super-resolution reconstruction of image sequences,” Pattern Analysis and Machine Intelligence, IEEE Transactions on, vol. 21, no. 9, pp. 817–834, Sep 1999.
  • [6] S. Farsiu, M. Robinson, M. Elad, and P. Milanfar, “Fast and robust multiframe super resolution,” Image Processing, IEEE Transactions on, vol. 13, no. 10, pp. 1327–1344, Oct 2004.
  • [7] K. Nasrollahi and T. Moeslund, “Super-resolution: a comprehensive survey,” Machine Vision and Applications, vol. 25, no. 6, pp. 1423–1468, 2014. [Online]. Available: http://dx.doi.org/10.1007/s00138-014-0623-4
  • [8] S. Baker and T. Kanade, “Hallucinating faces,” in Automatic Face and Gesture Recognition, 2000. Proceedings. Fourth IEEE International Conference on, 2000, pp. 83–88.
  • [9] Y. Hu, K.-M. Lam, G. Qiu, and T. Shen, “From local pixel structure to global image super-resolution: A new face hallucination framework,” Image Processing, IEEE Transactions on, vol. 20, no. 2, pp. 433–445, Feb 2011.
  • [10] Y. Li, C. Cai, G. Qiu, and K.-M. Lam, “Face hallucination based on sparse local-pixel structure,” Pattern Recognition, vol. 47, no. 3, pp. 1261–1270, 2014, Handwriting Recognition and other PR Applications. [Online]. Available: http://www.sciencedirect.com/science/article/pii/S0031320313003841
  • [11] X. Wang and X. Tang, “Hallucinating face by eigentransformation,” Systems, Man, and Cybernetics, Part C: Applications and Reviews, IEEE Transactions on, vol. 35, no. 3, pp. 425–434, Aug 2005.
  • [12] J.-S. Park and S.-W. Lee, “An example-based face hallucination method for single-frame, low-resolution facial images,” Image Processing, IEEE Transactions on, vol. 17, no. 10, pp. 1806–1816, Oct 2008.
  • [13] C. Liu, H.-Y. Shum, and C.-S. Zhang, “A two-step approach to hallucinating faces: global parametric model and local nonparametric model,” in Computer Vision and Pattern Recognition, 2001. CVPR 2001. Proceedings of the 2001 IEEE Computer Society Conference on, vol. 1, 2001, pp. I–192–I–198 vol.1.
  • [14] Y. Li and X. Lin, “An improved two-step approach to hallucinating faces,” in Image and Graphics (ICIG’04), Third International Conference on, Dec 2004, pp. 298–301.
  • [15] C. Liu, H.-Y. Shum, and W. Freeman, “Face hallucination: Theory and practice,” International Journal of Computer Vision, vol. 75, no. 1, pp. 115–134, 2007. [Online]. Available: http://dx.doi.org/10.1007/s11263-006-0029-5
  • [16] A. Chakrabarti, A. Rajagopalan, and R. Chellappa, “Super-resolution of face images using kernel pca-based prior,” Multimedia, IEEE Transactions on, vol. 9, no. 4, pp. 888–892, June 2007.
  • [17] Y. Zhuang, J. Zhang, and F. Wu, “Hallucinating faces: LPH super-resolution and neighbor reconstruction for residue compensation,” Pattern Recognition, vol. 40, no. 11, pp. 3178–3194, 2007. [Online]. Available: http://www.sciencedirect.com/science/article/pii/S0031320307001355
  • [18] J. Yang, H. Tang, Y. Ma, and T. Huang, “Face hallucination via sparse coding,” in Image Processing, 2008. ICIP 2008. 15th IEEE International Conference on, Oct 2008, pp. 1264–1267.
  • [19] H.-Y. Chen and S.-Y. Chien, “Eigen-patch: Position-patch based face hallucination using eigen transformation,” in Multimedia and Expo (ICME), 2014 IEEE International Conference on, July 2014, pp. 1–6.
  • [20] T. Lu, R. Hu, Z. Han, J. Jiang, and Y. Xia, “Robust super-resolution for face images via principle component sparse representation and least squares regression,” in Circuits and Systems (ISCAS), 2013 IEEE International Symposium on, May 2013, pp. 1199–1202.
  • [21] K. Su, Q. Tian, Q. Xue, N. Sebe, and J. Ma, “Neighborhood issue in single-frame image super-resolution,” in Multimedia and Expo, 2005. ICME 2005. IEEE International Conference on, July 2005, pp. 4 – 7.
  • [22] W. Liu, D. Lin, and X. Tang, “Neighbor combination and transformation for hallucinating faces,” in Multimedia and Expo, 2005. ICME 2005. IEEE International Conference on, July 2005, pp. 4 pp.–.
  • [23] X. Zhang, S. Peng, and J. Jiang, “An adaptive learning method for face hallucination using locality preserving projections,” in Automatic Face Gesture Recognition, 2008. FG ’08. 8th IEEE International Conference on, Sept 2008, pp. 1–8.
  • [24] W. Zhang and W.-K. Cham, “Hallucinating face in the dct domain,” Image Processing, IEEE Transactions on, vol. 20, no. 10, pp. 2769–2779, Oct 2011.
  • [25] X. Du, F. Jiang, and D. Zhao, “Multi-scale face hallucination based on frequency bands analysis,” in Visual Communications and Image Processing (VCIP), 2013, Nov 2013, pp. 1–6.
  • [26] S. W. Park and M. Savvides, “Breaking the limitation of manifold analysis for super-resolution of facial images,” in Acoustics, Speech and Signal Processing, 2007. ICASSP 2007. IEEE International Conference on, vol. 1, April 2007, pp. I–573–I–576.
  • [27] B. Kumar and R. Aravind, “Face hallucination using olpp and kernel ridge regression,” in Image Processing, 2008. ICIP 2008. 15th IEEE International Conference on, Oct 2008, pp. 353–356.
  • [28] W. Liu, D. Lin, and X. Tang, “Hallucinating faces: Tensorpatch super-resolution and coupled residue compensation,” in Computer Vision and Pattern Recognition, 2005. CVPR 2005. IEEE Computer Society Conference on, vol. 2, June 2005, pp. 478–484 vol. 2.
  • [29] X. Ma, J. Zhang, and C. Qi, “Position-based face hallucination method,” in Multimedia and Expo, 2009. ICME 2009. IEEE International Conference on, June 2009, pp. 290–293.
  • [30] C. Jung, L. Jiao, B. Liu, and M. Gong, “Position-patch based face hallucination using convex optimization,” Signal Processing Letters, IEEE, vol. 18, no. 6, pp. 367–370, June 2011.
  • [31] J. Jiang, R. Hu, Z. Han, T. Lu, and K. Huang, “Position-patch based face hallucination via locality-constrained representation,” in Multimedia and Expo (ICME), 2012 IEEE International Conference on, July 2012, pp. 212–217.
  • [32] J. Jiang, R. Hu, Z. Wang, and Z. Han, “Noise robust face hallucination via locality-constrained representation,” Multimedia, IEEE Transactions on, vol. 16, no. 5, pp. 1268–1281, Aug 2014.
  • [33] H. Li, L. Xu, and G. Liu, “Face hallucination via similarity constraints,” Signal Processing Letters, IEEE, vol. 20, no. 1, pp. 19–22, Jan 2013.
  • [34] B. Li, H. Chang, S. Shan, and X. Chen, “Locality preserving constraints for super-resolution with neighbor embedding,” in Image Processing (ICIP), 2009 16th IEEE International Conference on, Nov 2009, pp. 1189–1192.
  • [35] ——, “Aligning coupled manifolds for face hallucination,” Signal Processing Letters, IEEE, vol. 16, no. 11, pp. 957–960, Nov 2009.
  • [36] Y. Hao and C. Qi, “Face hallucination based on modified neighbor embedding and global smoothness constraint,” Signal Processing Letters, IEEE, vol. 21, no. 10, pp. 1187–1191, Oct 2014.
  • [37] W. Liu, D. Lin, and X. Tang, “Face hallucination through dual associative learning,” in Image Processing, 2005. ICIP 2005. IEEE International Conference on, vol. 1, Sept 2005, pp. I–873–6.
  • [38] H. Huang, H. He, X. Fan, and J. Zhang, “Super-resolution of human face image using canonical correlation analysis,” Pattern Recognition, vol. 43, no. 7, pp. 2532 – 2543, 2010. [Online]. Available: http://www.sciencedirect.com/science/article/pii/S0031320310000853
  • [39] J. Jiang, R. Hu, Z. Han, Z. Wang, T. Lu, and J. Chen, “Locality-constraint iterative neighbor embedding for face hallucination,” in Multimedia and Expo (ICME), 2013 IEEE International Conference on, July 2013, pp. 1–6.
  • [40] S. Qu, R. Hu, S. Chen, J. Jiang, Z. Wang, and J. Chen, “Face hallucination via re-identified k-nearest neighbors embedding,” in Multimedia and Expo (ICME), 2014 IEEE International Conference on, July 2014, pp. 1–6.
  • [41] J. Jiang, R. Hu, Z. Wang, and Z. Han, “Face super-resolution via multilayer locality-constrained iterative neighbor embedding and intermediate dictionary learning,” Image Processing, IEEE Transactions on, vol. 23, no. 10, pp. 4220–4231, Oct 2014.
  • [42] S. Kolouri and G. Rohde, “Transport-based single frame super resolution of very low resolution face images,” in Computer Vision and Pattern Recognition (CVPR), 2015 IEEE Conference on, June 2015, pp. 4876–4884.
  • [43] P. Hennings-Yeomans, S. Baker, and B. Kumar, “Simultaneous super-resolution and feature extraction for recognition of low-resolution faces,” in Computer Vision and Pattern Recognition, 2008. CVPR 2008. IEEE Conference on, June 2008, pp. 1–8.
  • [44] H. Bhatt, R. Singh, M. Vatsa, and N. Ratha, “Improving cross-resolution face matching using ensemble-based co-transfer learning,” Image Processing, IEEE Transactions on, vol. 23, no. 12, pp. 5654–5669, Dec 2014.
  • [45] M. Jian and K.-M. Lam, “Simultaneous hallucination and recognition of low-resolution faces based on singular value decomposition,” Circuits and Systems for Video Technology, IEEE Transactions on, vol. 25, no. 11, pp. 1761–1772, Nov 2015.
  • [46] B. Li, H. Chang, S. Shan, and X. Chen, “Low-resolution face recognition via coupled locality preserving mappings,” Signal Processing Letters, IEEE, vol. 17, no. 1, pp. 20–23, Jan 2010.
  • [47] C. Zhou, Z. Zhang, D. Yi, Z. Lei, and S. Li, “Low-resolution face recognition via simultaneous discriminant analysis,” in Biometrics (IJCB), 2011 International Joint Conference on, Oct 2011, pp. 1–6.
  • [48] S. Siena, V. Boddeti, and B. Vijaya Kumar, “Coupled marginal fisher analysis for low-resolution face recognition,” in Computer Vision ECCV 2012. Workshops and Demonstrations, ser. Lecture Notes in Computer Science, A. Fusiello, V. Murino, and R. Cucchiara, Eds.   Springer Berlin Heidelberg, 2012, vol. 7584, pp. 240–249.
  • [49] S. Siena, V. Boddeti, and B. Kumar, “Maximum-margin coupled mappings for cross-domain matching,” in Biometrics: Theory, Applications and Systems (BTAS), 2013 IEEE Sixth International Conference on, Sept 2013, pp. 1–8.
  • [50] X. Tan, S. Chen, Z.-H. Zhou, and F. Zhang, “Face recognition from a single image per person: A survey,” Pattern Recognition, vol. 39, no. 9, pp. 1725 – 1745, 2006. [Online]. Available: http://www.sciencedirect.com/science/article/pii/S0031320306001270
  • [51] K. Zhang, D. Tao, X. Gao, X. Li, and Z. Xiong, “Learning multiple linear mappings for efficient single image super-resolution,” Image Processing, IEEE Transactions on, vol. 24, no. 3, pp. 846–861, March 2015.
  • [52] A. Martinez and R. Benavente, “The ar face database,” Robot Vision Lab, Purdue University, Tech. Rep. 5, Apr. 1998. [Online]. Available: http://dx.doi.org/10.1016/s0262-8856(97)00070-x
  • [53] P. J. Phillips, H. Wechsler, J. Huang, and P. J. Rauss, “The FERET database and evaluation procedure for face-recognition algorithms,” Image and Vision Computing, vol. 16, no. 5, pp. 295–306, Apr. 1998. [Online]. Available: http://dx.doi.org/10.1016/s0262-8856(97)00070-x
  • [54] R. Gross, I. Matthews, J. Cohn, T. Kanade, and S. Baker, “Multi-pie,” Image Vision Comput., vol. 28, no. 5, pp. 807–813, May 2010. [Online]. Available: http://dx.doi.org/10.1016/j.imavis.2009.08.002
  • [55] T. Ahonen, A. Hadid, and M. Pietikainen, “Face description with local binary patterns: Application to face recognition,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 28, no. 12, pp. 2037–2041, Dec. 2006. [Online]. Available: http://dx.doi.org/10.1109/TPAMI.2006.244
  • [56] A. Wagner, J. Wright, A. Ganesh, Z. Zhou, H. Mobahi, and Y. Ma, “Toward a practical face recognition system: Robust alignment and illumination by sparse representation,” Pattern Analysis and Machine Intelligence, IEEE Transactions on, vol. 34, no. 2, pp. 372–386, Feb 2012.
  • [57] J. Wang, J. Yang, K. Yu, F. Lv, T. Huang, and Y. Gong, “Locality-constrained linear coding for image classification,” in Computer Vision and Pattern Recognition (CVPR), 2010 IEEE Conference on, June 2010, pp. 3360–3367.
  • [58] D. L. Donoho and X. Huo, “Uncertainty principles and ideal atomic decomposition,” IEEE Trans. Inf. Theor., vol. 47, no. 7, pp. 2845–2862, Nov. 2001. [Online]. Available: http://dx.doi.org/10.1109/18.959265
  • [59] D. Donoho, M. Elad, and V. Temlyakov, “Stable recovery of sparse overcomplete representations in the presence of noise,” Information Theory, IEEE Transactions on, vol. 52, no. 1, pp. 6–18, Jan 2006.
  • [60] A. A. Efros and W. T. Freeman, “Image quilting for texture synthesis and transfer,” in Proceedings of the 28th Annual Conference on Computer Graphics and Interactive Techniques, ser. SIGGRAPH ’01.   New York, NY, USA: ACM, 2001, pp. 341–346. [Online]. Available: http://doi.acm.org/10.1145/383259.383296
  • [61] A. P. Founds, N. Orlans, G. Whiddon, and C. Watson, “Nist special database 32 multiple encounter dataset ii (meda-ii),” National Institute of Standards and Technology, Tech. Rep., 2011.
  • [62] P. J. Phillips, P. J. Flynn, T. Scruggs, K. W. Bowyer, J. Chang, K. Hoffman, J. Marques, J. Min, and W. Worek, “Overview of the face recognition grand challenge,” in Proceedings of the 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR’05) - Volume 1, ser. CVPR ’05.   Washington, DC, USA: IEEE Computer Society, 2005, pp. 947–954. [Online]. Available: http://dx.doi.org/10.1109/CVPR.2005.268
  • [63] V. Štruc and N. Pavešić, “The complete gabor-fisher classifier for robust face recognition,” EURASIP J. Adv. Signal Process, vol. 2010, pp. 31:1–31:13, Feb. 2010. [Online]. Available: http://dx.doi.org/10.1155/2010/847680
  • [64] H. Chang, D.-Y. Yeung, and Y. Xiong, “Super-resolution through neighbor embedding,” in Computer Vision and Pattern Recognition, 2004. CVPR 2004. Proceedings of the 2004 IEEE Computer Society Conference on, vol. 1, June 2004, pp. I–I.
  • [65] G. B. Huang and E. Learned-Miller, “Labeled faces in the wild: Updates and new reporting procedures,” University of Massachusetts, Tech. Rep., 2014.
  • [66] A. Yang, A. Ganesh, S. Sastry, and Y. Ma, “Fast l1-minimization algorithms and an application in robust face recognition: A review,” EECS Department, University of California, Berkeley, Tech. Rep. UCB/EECS-2010-13, Feb 2010. [Online]. Available: http://www.eecs.berkeley.edu/Pubs/TechRpts/2010/EECS-2010-13.html