 # Learning Locality-Constrained Collaborative Representation for Face Recognition

The low-dimensional manifold model and sparse representation are two well-known concise models, both of which suggest that each data point can be described by a few characteristics. Manifold learning is usually investigated for dimension reduction by preserving some expected local geometric structures from the original space in a low-dimensional one. The structures are generally determined by pairwise distance, e.g., Euclidean distance. Alternatively, sparse representation denotes a data point as a linear combination of points from the same subspace. In practical applications, however, nearby points in terms of pairwise distance may not belong to the same subspace, and vice versa. Consequently, it is interesting and important to explore how to get a better representation by integrating these two models. To this end, this paper proposes a novel coding algorithm, called Locality-Constrained Collaborative Representation (LCCR), which improves the robustness and discrimination of data representation by introducing a kind of local consistency. The locality term derives from a biological observation that similar inputs have similar codes. The objective function of LCCR has an analytical solution, and it does not involve local minima. Empirical studies on four public facial databases, ORL, AR, Extended Yale B, and Multi-PIE, show that LCCR is promising in recognizing human faces from frontal views with varying expression and illumination, as well as various corruptions and occlusions.


## 1 Introduction

Sparse representation has become a powerful method for addressing problems in pattern recognition and computer vision. It assumes that each data point $\mathbf{x}$ can be encoded as a linear combination of other points. Mathematically, $\mathbf{x} = \mathbf{D}\mathbf{a}$, where $\mathbf{D}$ is a dictionary whose columns consist of some data points, and $\mathbf{a}$ is the representation of $\mathbf{x}$ over $\mathbf{D}$. If most entries of $\mathbf{a}$ are zeros, then $\mathbf{a}$ is called a sparse representation. Generally, it can be achieved by solving

$$(P_0):\quad \min \|\mathbf{a}\|_0 \quad \text{s.t.} \quad \mathbf{x} = \mathbf{D}\mathbf{a},$$

where $\|\cdot\|_0$ denotes the $\ell_0$-norm, which counts the number of nonzero entries in a vector. $(P_0)$ is difficult to solve since it is an NP-hard problem. Recently, compressive sensing theory Candes2005-Decoding ; Donoho2006-large has found that the solution of $(P_0)$ is equivalent to that of the $\ell_1$-minimization problem $(P_{1,1})$ when $\mathbf{a}$ is highly sparse:

$$(P_{1,1}):\quad \min \|\mathbf{a}\|_1 \quad \text{s.t.} \quad \mathbf{x} = \mathbf{D}\mathbf{a},$$

where the $\ell_1$-norm $\|\cdot\|_1$ sums the absolute values of all entries in a vector. $(P_{1,1})$ is convex and can be solved by a large number of convex optimization methods, such as basis pursuit (BP) Chen2001-Atomic and least angle regression (LARS) Efron2004-Least . In Yang2010-l1-minimization , Yang et al. present a comprehensive survey of some popular optimizers.

Benefiting from the emergence of compressed sensing theory, sparse coding has been widely used for various tasks, e.g., subspace learning Wong2012Discover ; Peng2012 ; Cheng2010-Learning ; Elhamifar2012-Sparse and matrix factorization Wang2011-Image . Among these works, Wright et al. Wright2009-Robust reported a remarkable method that passes sparse representation through a nearest feature subspace classifier, named sparse representation based classification (SRC). SRC has achieved attractive performance in robust face recognition and has motivated a large number of works such as Zhang2012Joint ; Zhang2013-Simultaneous ; He2011-Maximum . The work implies that sparse representation plays an important role in face recognition under the framework of nearest subspace classification Li1999-recognition .

However, is $\ell_1$-norm based sparsity really necessary to improve the performance of face recognition? Several recent works directly or indirectly examined this problem. Yang et al. Yang2012-Beyond discussed the connections and differences between the $\ell_1$-optimizer and the $\ell_0$-optimizer for SRC. They show that the success of SRC should be attributed to the mechanism of the $\ell_1$-optimizer, which selects the set of support training samples for the given testing sample by minimizing reconstruction error. Consequently, Yang et al. pointed out that the global similarity derived from the $\ell_1$-optimizer, rather than the sparsity derived from the $\ell_0$-optimizer, is more critical for pattern recognition. Rigamonti et al. Rigamonti2011-sparse compared the discrimination of two different data models. One is the $\ell_1$-norm based sparse representation, and the other is produced by passing the input through a simple convolution filter. Their results showed that the two models achieve a similar recognition rate. Therefore, $\ell_1$-norm based sparsity is actually not as essential as previously claimed. Shi et al. Shi2011-recognition provided a more intuitive approach to investigate this problem by removing the $\ell_1$-regularization term from the objective function of SRC. Their experimental results showed that their method achieves a higher recognition rate than SRC if the original data is available. Zhang et al. Zhang2011-Sparse replaced the $\ell_1$-norm by the $\ell_2$-norm, and their experimental results again support the view that $\ell_1$-norm based sparsity is not necessary to improve the discrimination of data representation. Moreover, we note that Naseem et al. Naseem2010-Linear proposed the Linear Regression Classifier (LRC), which has the same objective function as Shi's work. The difference is that Shi et al. aimed to explore the role of sparsity while Naseem et al. focused on developing an effective classifier for face recognition.

As another extensively-studied concise model, manifold learning is usually investigated for dimension reduction by learning and embedding local consistency of original data into a low-dimensional representation He2005-Neighborhood ; Belkin2006-Manifold ; Yan2007-Graph . Local consistency means that nearby data points share the same properties, which is hardly reflected in linear representation.

Recently, some researchers have explored the possibility of integrating locality (local consistency) with sparsity to produce a better data model. Baraniuk et al. Richard2006-Random successfully bridged the connections between sparse coding and manifold learning, and established the theory for random projections of smooth manifolds; Majumdar et al. Majumdar2010-Robust investigated the effectiveness and robustness of the random projection method in classification tasks. Moreover, Wang et al. Wang2010-Locality proposed a hierarchical image classification method named locality-constrained linear coding (LLC) by introducing dictionary learning into Locally Linear Embedding (LLE) Roweis2000 . Chao et al. Chao2011-Locality presented an approach to unify group sparsity and data locality by introducing the term of ridge regression into LLC; Yang et al. Yang2012-Relaxed incorporated prior knowledge into the coding process by iteratively learning a weight matrix whose entries measure the similarity between two data points.

Figure 1: A key observation. (a) Three face images from two different sub-manifolds are linked to their corresponding neighbors, respectively. (b) The first column includes three images which correspond to the points A, B and C in Figure 1(a); the second column shows the Eigenface feature matrices for the testing images; the third column includes two parts: the left part shows the coefficients of SRC Wright2009-Robust , and the right part those of CRC-RLS Zhang2011-Sparse . From the results, we can see that the representations of nearby points are more similar than those of non-neighboring points, i.e., local consistency can be defined as similar inputs having similar codes.

In this paper, we propose and formulate a new kind of local consistency in the linear coding paradigm by enforcing that similar inputs (neighbors) produce similar codes. The idea is motivated by a biological finding Ohki2005 which shows that L2/3 of rat visual cortex activates the same collection of neurons in response to leftward and rightward drifting gratings. Figure 1 shows an example to illustrate the motivation. There are three face images, A, B and C, selected from two different individuals, where A and B come from the same person. This means that A and B lie on the same subspace and can represent each other. Figure 1(b) is a real example corresponding to Figure 1(a). From either the Eigenface matrices or the coefficients of the two coding schemes, we can see that the similarity between A and B is much higher than the similarity between C and either of them.

Based on this observation, we propose a representation learning method for robust face recognition, named Locality-Constrained Collaborative Representation (LCCR). It aims not only to obtain a representation that reconstructs the input with minimal residual, but also to reconstruct the input and its neighborhood simultaneously such that their codes are as similar as possible. Furthermore, the objective function of LCCR has an analytic solution and does not involve local minima. Extensive experiments show that LCCR outperforms SRC Wright2009-Robust , LRC Shi2011-recognition ; Naseem2010-Linear , CRC-RLS Zhang2011-Sparse , CESR He2011-Maximum , LPP He2003-Locality , and linear SVM in the context of robust face recognition.

Except in some specified cases, lower-case bold letters represent column vectors and upper-case bold ones represent matrices, $\mathbf{A}^{T}$ denotes the transpose of the matrix $\mathbf{A}$, $\mathbf{A}^{-1}$ represents the pseudo-inverse of $\mathbf{A}$, and $\mathbf{I}$ is reserved for the identity matrix.

The remainder of the paper is organized as follows: Section 2 introduces three related approaches for face recognition based on data representation, i.e., SRC Wright2009-Robust , LRC Shi2011-recognition ; Naseem2010-Linear and CRC-RLS Zhang2011-Sparse . Section 3 presents our LCCR algorithm. Section 4 reports the experiments on several facial databases. Finally, Section 5 contains the conclusion.

## 2 Preliminaries

We consider a set of $n$ facial images collected from $L$ subjects. Each training image, which is denoted as a vector $\mathbf{d}_i \in \mathbb{R}^{m}$, corresponds to the $i$th column of a dictionary $\mathbf{D} \in \mathbb{R}^{m \times n}$. Without loss of generality, we assume that the columns of $\mathbf{D}$ are sorted according to their labels.

### 2.1 Sparse representation based classification

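Sparse coding aims at finding the sparsest solution of $(P_{1,1})$. However, in many practical problems, the constraint $\mathbf{x} = \mathbf{D}\mathbf{a}$ cannot hold exactly since the input $\mathbf{x}$ may include noise. Wright et al. Wright2009-Robust relaxed the constraint to $\|\mathbf{x} - \mathbf{D}\mathbf{a}\|_2 \le \varepsilon$, where $\varepsilon > 0$ is the error tolerance; then, $(P_{1,1})$ is rewritten as:

$$(P_{1,2}):\quad \min \|\mathbf{a}\|_1 \quad \text{s.t.} \quad \|\mathbf{x} - \mathbf{D}\mathbf{a}\|_2 \le \varepsilon.$$

Using the Lagrangian method, $(P_{1,2})$ can be transformed into the following unconstrained optimization problem:

$$(P_{1,3}):\quad \arg\min_{\mathbf{a}} \|\mathbf{x} - \mathbf{D}\mathbf{a}\|_2^2 + \lambda \|\mathbf{a}\|_1,$$

where the scalar $\lambda \ge 0$ balances the importance between the reconstruction error of $\mathbf{x}$ and the sparsity of the code $\mathbf{a}$. Given a testing sample $\mathbf{x}$, its sparse representation $\mathbf{a}^*$ can be computed by solving $(P_{1,2})$ or $(P_{1,3})$.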

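After getting the sparse representation of $\mathbf{x}$, one infers its label by assigning $\mathbf{x}$ to the class that has the minimum residual:

$$r_i(\mathbf{x}) = \|\mathbf{x} - \mathbf{D} \cdot \delta_i(\mathbf{a}^*)\|_2, \tag{1}$$

$$\mathrm{identity}(\mathbf{x}) = \arg\min_i \{ r_i(\mathbf{x}) \}, \tag{2}$$

where the nonzero entries of $\delta_i(\mathbf{a}^*)$ are the entries in $\mathbf{a}^*$ that are associated with the $i$th class, and $\mathrm{identity}(\mathbf{x})$ denotes the label for $\mathbf{x}$.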
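The decision rule in (1) and (2) can be illustrated with a short NumPy sketch. The function name `classify_by_residual` and the `labels` array (one class label per dictionary column) are illustrative assumptions, not part of the original work; the code takes an already-computed code `a` and does not solve the $\ell_1$ problem itself.

```python
import numpy as np

def classify_by_residual(x, D, a, labels):
    """Return the class whose atoms best reconstruct x, following (1) and (2)."""
    classes = np.unique(labels)
    residuals = []
    for c in classes:
        # delta_c(a): keep only the coefficients that belong to class c.
        a_c = np.where(labels == c, a, 0.0)
        residuals.append(np.linalg.norm(x - D @ a_c))
    residuals = np.asarray(residuals)
    return classes[int(np.argmin(residuals))], residuals
```

The same rule is reused by every coding model discussed in this paper; only the way the code `a` is obtained differs.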

### 2.2 ℓ2-minimization based methods

In Naseem2010-Linear , Naseem et al. proposed a Linear Regression Classifier (LRC) which achieved comparable accuracy to SRC in the context of robust face recognition. In another independent work Shi2011-recognition , Shi et al. used the same objective function as LRC to discuss the role of $\ell_1$-regularization based sparsity. The objective function used in Naseem2010-Linear ; Shi2011-recognition is

$$\arg\min_{\mathbf{a}} \|\mathbf{x} - \mathbf{D}\mathbf{a}\|_2^2.$$

In Shi2011-recognition , Shi et al. empirically showed that their method (denoted as LRC in this paper for convenience) requires $\mathbf{D}$ to be an over-determined matrix to achieve competitive results, while the dictionary $\mathbf{D}$ of SRC must be under-determined according to compressive sensing theory. Once the optimal code $\mathbf{a}^*$ is calculated for a given input, the classifier (1) and (2) is used to determine the label for the input $\mathbf{x}$.

As another recent $\ell_2$-norm model, CRC-RLS Zhang2011-Sparse estimates the representation $\mathbf{a}^*$ for the input $\mathbf{x}$ by relaxing the $\ell_1$-norm to the $\ell_2$-norm in $(P_{1,3})$. They aimed to solve the following objective function:

$$\arg\min_{\mathbf{a}} \|\mathbf{x} - \mathbf{D}\mathbf{a}\|_2^2 + \lambda \|\mathbf{a}\|_2^2,$$

where $\lambda > 0$ is a balance factor.

LRC and CRC-RLS show that $\ell_2$-norm based data models can achieve competitive classification accuracy with a speed-up of hundreds of times compared with SRC. Against this background, we aim to incorporate local geometric structures into the coding process to achieve better discrimination and robustness.
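Both $\ell_2$ objectives above have closed-form minimizers, which is where the speed advantage comes from. The following is a minimal sketch, assuming the codes are computed over the whole dictionary as in the objectives stated in this section; the function names `lrc_code` and `crc_rls_code` are hypothetical.

```python
import numpy as np

def lrc_code(x, D):
    # Least-squares code: argmin_a ||x - D a||_2^2.
    a, *_ = np.linalg.lstsq(D, x, rcond=None)
    return a

def crc_rls_code(x, D, lam):
    # Ridge code: argmin_a ||x - D a||_2^2 + lam * ||a||_2^2,
    # obtained from the normal equations (D^T D + lam I) a = D^T x.
    n = D.shape[1]
    return np.linalg.solve(D.T @ D + lam * np.eye(n), D.T @ x)
```

With `lam > 0` the linear system is always solvable, even when `D` is under-determined; the plain least-squares code is only unique when `D` has full column rank.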

## 3 Locality-Constrained Collaborative Representation

Figure 2: Overview of the coding process of LCCR, which consists of three steps separated by dotted lines. First, for a given input x, find its neighborhood Y(x) from the training data. Then, code x over D by finding the optimal representation a (see bar graph) which produces the minimal reconstruction errors for x and Y(x) simultaneously. Finally, conduct classification by finding which class produces the minimum residual. In the middle part of the figure, red rectangles indicate the basis vectors which produce the minimum residual.

It is a big challenge to improve the discrimination and the robustness of facial representation because a practical face recognition system requires not only a high recognition rate but also the robustness against various noise and occlusions.

### 3.1 Algorithm Description

As two of the most promising methods, locality preservation based algorithms and sparse representation have been extensively studied and successfully applied to appearance-based face recognition. Locality preservation based algorithms aim to find a low-dimensional model by learning and preserving some properties shared by nearby points from the original space to another one. Alternatively, sparse representation, which encodes each testing sample as a linear combination of the training data, depicts a global relationship between the testing sample and the training ones. In this paper, we aim to formulate a kind of local consistency into the coding scheme for modeling facial data. Our objective function is of the form

$$E(\mathbf{x}, \mathbf{a}) = \|\mathbf{x} - \mathbf{D}\mathbf{a}\|_2^2 + \lambda \|\mathbf{a}\|_p + \gamma E_L, \tag{3}$$

where $p \in \{1, 2\}$, $E_L$ is the locality constraint, and $\lambda$ and $\gamma$ dictate the importance of $\|\mathbf{a}\|_p$ and $E_L$, respectively. The key is then to formulate the shared property of the neighborhood with $E_L$.

$E_L$ could be defined as the reconstruction error of the neighborhood of the testing image, i.e.,

$$E_L = \frac{1}{K} \sum_{\mathbf{y}_i(\mathbf{x}) \in Y(\mathbf{x})} \|\mathbf{y}_i(\mathbf{x}) - \mathbf{D}\mathbf{c}_i\|_2^2, \tag{4}$$

where, for an input $\mathbf{x}$, its neighborhood $Y(\mathbf{x})$ is searched from the training samples according to prior knowledge or manual labeling. For simplicity, we assume that each data point has $K$ neighbors, and $\mathbf{c}_i$ denotes the optimal code for $\mathbf{y}_i(\mathbf{x})$.

To bridge the connection between the objective variables $\mathbf{a}$ and $\mathbf{c}_i$, it is possible to assume that $\mathbf{a}$ can be written as a linear combination of $\{\mathbf{c}_i\}$. Mathematically,

$$\mathbf{a} = \sum_{\mathbf{c}_i \in C} w_i \mathbf{c}_i, \tag{5}$$

where $w_i$ is the representation coefficient between $\mathbf{a}$ and $\mathbf{c}_i$. The calculation of $w_i$ is a challenging and key step which has been studied in many works. For example, Roweis and Saul Roweis2000 defined $w_i$ as the reconstruction coefficients over the nearby points in the original space. However, this approach is not suitable for our case since we aim to denote $\mathbf{c}_i$ with $\mathbf{a}$, not vice versa.

Motivated by a biological experiment of Ohki Ohki2005 as discussed in Section 1, we present a simple but effective method to solve the problem by directly replacing $\mathbf{c}_i$ with $\mathbf{a}$. It is based on an observation (Figure 1) that the representation of $\mathbf{x}$ can also approximate the representation of $\mathbf{y}_i$, i.e.,

$$\|\mathbf{y}_i - \mathbf{D}\mathbf{c}_i\|_2^2 \le \|\mathbf{y}_i - \mathbf{D}\mathbf{a}\|_2^2 \le \|\mathbf{y}_i - \mathbf{D}\bar{\mathbf{a}}\|_2^2,$$

where $\bar{\mathbf{a}}$ denotes the representation of a point which is not close to $\mathbf{y}_i$.

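Thus, the proposed objective function is as follows:

$$E(\mathbf{x}, \mathbf{a}) = (1-\gamma)\|\mathbf{x} - \mathbf{D}\mathbf{a}\|_2^2 + \gamma \frac{1}{K} \sum_{\mathbf{y}_i(\mathbf{x}) \in Y(\mathbf{x})} \|\mathbf{y}_i(\mathbf{x}) - \mathbf{D}\mathbf{a}\|_2^2 + \lambda \|\mathbf{a}\|_1, \tag{6}$$

where $\gamma \in [0, 1]$ balances the importance between the testing image $\mathbf{x}$ and its neighborhood $Y(\mathbf{x})$. The second term, which measures the contribution of locality, can largely improve the robustness of $\mathbf{a}$. If $\mathbf{x}$ is corrupted by noise or occluded by disguise, a larger $\gamma$ will yield better recognition results.

On the other hand, the locality constraint in (6) is a simplified model of the property that similar inputs have similar codes. We think this might be an interesting new way to learn local consistency.

Considering the recent findings that $\ell_1$-norm based sparsity cannot bring higher recognition accuracy or better robustness for facial data than $\ell_2$-norm based methods Shi2011-recognition ; Zhang2011-Sparse , we simplify our objective function (6) as follows:

$$E(\mathbf{x}, \mathbf{a}) = (1-\gamma)\|\mathbf{x} - \mathbf{D}\mathbf{a}\|_2^2 + \gamma \frac{1}{K} \sum_{\mathbf{y}_i(\mathbf{x}) \in Y(\mathbf{x})} \|\mathbf{y}_i(\mathbf{x}) - \mathbf{D}\mathbf{a}\|_2^2 + \lambda \|\mathbf{a}\|_2^2. \tag{7}$$

Clearly, (7) achieves its minimum when its derivative with respect to $\mathbf{a}$ is zero. Hence, the optimal solution is

$$\mathbf{a}^* = (\mathbf{D}^T\mathbf{D} + \lambda \cdot \mathbf{I})^{-1}\mathbf{D}^T \left[ (1-\gamma)\mathbf{x} + \gamma \frac{1}{K} \sum_{\mathbf{y}_i(\mathbf{x}) \in Y(\mathbf{x})} \mathbf{y}_i(\mathbf{x}) \right]. \tag{8}$$

Let $\mathbf{P} = (\mathbf{D}^T\mathbf{D} + \lambda \cdot \mathbf{I})^{-1}\mathbf{D}^T$. Computing $\mathbf{P}$ requires a regularized pseudo-inverse, but it can be calculated in advance and only once, since it depends only on the training data $\mathbf{D}$.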
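For readers who want the intermediate step, the closed form (8) can be sketched as follows. Writing $J(\mathbf{a})$ for the objective in (7), and assuming $\lambda > 0$ and $|Y(\mathbf{x})| = K$, setting the gradient to zero gives

$$
\begin{aligned}
\nabla J(\mathbf{a}) &= -2(1-\gamma)\,\mathbf{D}^T(\mathbf{x} - \mathbf{D}\mathbf{a}) - \frac{2\gamma}{K} \sum_{\mathbf{y}_i(\mathbf{x}) \in Y(\mathbf{x})} \mathbf{D}^T(\mathbf{y}_i(\mathbf{x}) - \mathbf{D}\mathbf{a}) + 2\lambda \mathbf{a} = \mathbf{0} \\
\Longrightarrow\quad & \big[(1-\gamma) + \gamma\big]\,\mathbf{D}^T\mathbf{D}\,\mathbf{a} + \lambda \mathbf{a} = \mathbf{D}^T\Big[(1-\gamma)\mathbf{x} + \frac{\gamma}{K} \sum_{\mathbf{y}_i(\mathbf{x}) \in Y(\mathbf{x})} \mathbf{y}_i(\mathbf{x})\Big] \\
\Longrightarrow\quad & (\mathbf{D}^T\mathbf{D} + \lambda \mathbf{I})\,\mathbf{a} = \mathbf{D}^T\Big[(1-\gamma)\mathbf{x} + \frac{\gamma}{K} \sum_{\mathbf{y}_i(\mathbf{x}) \in Y(\mathbf{x})} \mathbf{y}_i(\mathbf{x})\Big].
\end{aligned}
$$

The Hessian of $J$ is $2(\mathbf{D}^T\mathbf{D} + \lambda \mathbf{I})$, which is positive definite for $\lambda > 0$, so the stationary point is the unique global minimizer. Note that the weights $(1-\gamma)$ and $\gamma$ sum to one, which is why $\gamma$ does not appear in the matrix being inverted.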

Given a testing image $\mathbf{x}$, the first step is to determine its neighborhood $Y(\mathbf{x})$ from the training set according to prior knowledge, manual labeling, etc. In practical applications, there are two widely-used variations for finding the neighborhood:

1. $\varepsilon$-ball method: The training sample $\mathbf{d}_i$ is a neighbor of the testing image $\mathbf{x}$ if $\|\mathbf{d}_i - \mathbf{x}\|_2 < \varepsilon$, where $\varepsilon > 0$ is a constant.

2. $K$-nearest neighbors (K-NN) searching: The training sample $\mathbf{d}_i$ is a neighbor of $\mathbf{x}$ if $\mathbf{d}_i$ is among the $K$ nearest neighbors of $\mathbf{x}$, where $K$ can be specified as a constant or determined adaptively.
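The two neighborhood rules above can be sketched in a few lines of NumPy. This is a brute-force illustration using Euclidean distance; the function names `epsilon_ball` and `knn` are hypothetical, and training samples are assumed to be stored as the columns of `D`.

```python
import numpy as np

def epsilon_ball(x, D, eps):
    # Indices of training columns within Euclidean distance eps of x.
    dist = np.linalg.norm(D - x[:, None], axis=0)
    return np.flatnonzero(dist < eps)

def knn(x, D, K):
    # Indices of the K training columns closest to x (Euclidean distance).
    dist = np.linalg.norm(D - x[:, None], axis=0)
    return np.argsort(dist, kind="stable")[:K]
```

The $\varepsilon$-ball rule may return an empty or very large neighborhood depending on `eps`, whereas K-NN always returns exactly `K` indices, which matches the fixed-size neighborhood assumed in (4).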

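Once the neighborhood of the testing image $\mathbf{x}$ is obtained, LCCR simply projects $\mathbf{x}$ and its neighborhood $Y(\mathbf{x})$ onto the space spanned by $\mathbf{D}$ via (8). In addition, the matrix form of LCCR is easily derived, which can be used in batch prediction:

$$\mathbf{A}^* = (\mathbf{D}^T\mathbf{D} + \lambda \cdot \mathbf{I})^{-1}\mathbf{D}^T \left[ (1-\gamma)\mathbf{X} + \gamma \frac{1}{K} \sum_{i=1}^{K} \mathbf{Y}_i(\mathbf{X}) \right],$$

where the columns of $\mathbf{X}$ are the testing images whose codes are stored in $\mathbf{A}^*$, and $\mathbf{Y}_i(\mathbf{X})$ denotes the collection of the $i$th-nearest neighbors of the columns of $\mathbf{X}$.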

The proposed LCCR algorithm is summarized in Algorithm 1, and an overview is illustrated in Figure 2.
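As a rough illustration of the whole pipeline (neighbor search, coding via the batch form of (8), and classification via (1) and (2)), a NumPy sketch might look as follows. It is not the authors' implementation: the function names, the default parameter values, and the choice of Cityblock distance with brute-force search are assumptions made for this example.

```python
import numpy as np

def lccr_fit(D, lam):
    # Projection matrix P = (D^T D + lam I)^{-1} D^T, computed once offline.
    n = D.shape[1]
    return np.linalg.solve(D.T @ D + lam * np.eye(n), D.T)

def lccr_encode(X, D, P, gamma, K):
    # Pairwise Cityblock distances between test and training columns.
    # This builds an (m, q, n) array, so it only suits small problems.
    dist = np.abs(X[:, :, None] - D[:, None, :]).sum(axis=0)   # (q, n)
    nn = np.argsort(dist, axis=1, kind="stable")[:, :K]        # (q, K)
    Y_mean = D[:, nn].mean(axis=2)                             # (m, q)
    # Batch form of (8): code the mix of each input and its neighbor mean.
    return P @ ((1.0 - gamma) * X + gamma * Y_mean)

def lccr_predict(X, D, labels, lam=0.005, gamma=0.2, K=3):
    P = lccr_fit(D, lam)
    A = lccr_encode(X, D, P, gamma, K)
    classes = np.unique(labels)
    # Residual of every test column with respect to every class, as in (1).
    res = np.stack([
        np.linalg.norm(X - D[:, labels == c] @ A[labels == c, :], axis=0)
        for c in classes
    ])
    return classes[np.argmin(res, axis=0)]
```

Setting `gamma=0` removes the locality term, in which case the code reduces to the ridge solution of CRC-RLS for the same `lam`.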

### 3.2 Discussions

From the algorithm, it is easy to see that the performance of LCCR is positively correlated with that of the K-NN searching method. Thus, one might expect LCCR to fail if K-NN cannot find the correct neighbors for the testing sample. Here, we give a real example (Figure 3) to illustrate that LCCR largely avoids such situations. In the example, the classification accuracy of LCCR is about 94% when 600 AR images with sunglasses are used as testing images and 1400 clean ones as training samples.

Figure 3(a) shows the coefficients and residuals of LCCR and CRC-RLS. We can see that the two methods correctly predict the identity of the input, while K-NN searching cannot find the correct neighbors (see Figure 3(b)). This illustrates that LCCR can work well even when K-NN fails to return the correct neighbors. Figure 3(c) and Figure 3(d) illustrate another possible case: CRC-RLS fails to get the correct identity of the input while the nearest neighbor comes from the 7th individual, and LCCR successfully obtains the correct identity.

Figure 3: The effectiveness of the proposed model. (a) A testing face disguised by sunglasses comes from the 7th subject of the AR database. The figures (in the red rectangle) in the second row are the coefficients and residual of the input learned by LCCR (λ=0.005, γ=0.9, and K=2); the figures in the first row are the results of CRC-RLS Zhang2011-Sparse (λ=0.001 for the best accuracy). (b) The 10 nearest neighbors of the input in terms of Cityblock distance (Y-axis). (c) and (d) are the results of another testing sample from the same individual.

### 3.3 Computational Complexity Analysis

The computational complexity of LCCR consists of two parts, for offline and online computation, respectively. Suppose the dictionary $\mathbf{D}$ contains $n$ samples of dimensionality $m$; LCCR takes $O(mn^2 + n^3)$ to compute the projection matrix $(\mathbf{D}^T\mathbf{D} + \lambda \cdot \mathbf{I})^{-1}\mathbf{D}^T$ and $O(mn)$ to store it.

For each querying sample $\mathbf{x}$, LCCR needs to search the K-nearest neighbors of $\mathbf{x}$ from $\mathbf{D}$, which takes $O(mn)$ with a brute-force search. After that, the algorithm projects $\mathbf{x}$ into another space via (8) in $O(mn)$. Thus, the computational complexity of encoding with LCCR is $O(mn)$ for each unknown sample. Note that the computational complexity of LCCR is the same as that of LRC Shi2011-recognition ; Naseem2010-Linear and CRC-RLS Zhang2011-Sparse , and it is more competitive than SRC Wright2009-Robust even when the fastest $\ell_1$-solver is used. For example, when the Homotopy optimizer Osborne2000 is adopted to get the sparsest solution, SRC must run $t$ iterations of the Homotopy algorithm to code each sample over $\mathbf{D}$, even though Homotopy is one of the fastest $\ell_1$-minimization algorithms according to Yang2010-l1-minimization . From the above analysis, it is easy to see that even a medium-sized data set raises scalability issues for these models. To address the problem, a potential choice is to apply dimension reduction or sampling techniques to reduce the size of the problem in practical applications, as done in Peng2013 .

## 4 Experimental Verification and Analysis

In this section, we report the performance of LCCR over four publicly accessible facial databases, i.e., AR Martinez1998 , ORL Samaria1994 , the Extended Yale database B Georghiades2001 , and Multi-PIE Gross2010 . We examine the recognition results of the proposed algorithm with respect to 1) discrimination, 2) robustness to corruptions, and 3) robustness to occlusions.

### 4.1 Experimental Configuration

We compared the classification results of LCCR with four linear coding models (SRC Wright2009-Robust , CESR He2011-Maximum , LRC Shi2011-recognition ; Naseem2010-Linear and CRC-RLS Zhang2011-Sparse ) and a subspace learning algorithm (LPP He2003-Locality ) with the nearest neighbor classifier (1NN). Moreover, we also report the results of linear SVM Fan2008 over the original inputs. Note that SRC, CESR, LRC, CRC-RLS and LCCR directly code each testing sample over the training data without using a dictionary learning method, and get the classification result by finding which subject produces the minimum reconstruction error. Among these models, only LCCR incorporates locality based on pairwise distance into the coding scheme. For a comprehensive comparison, we report the performance of LCCR with five basic distance metrics, i.e., Euclidean distance ($\ell_2$-distance), Seuclidean distance (standardized Euclidean distance), Cosine distance (the cosine of the angle between two points), Cityblock distance ($\ell_1$-distance), and Spearman distance.
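To make the five metrics concrete, the following NumPy sketch computes the distance from one sample to every dictionary column. The definitions are the common textbook ones and are assumptions here (the Cosine and Spearman variants are taken as one minus the cosine similarity and one minus the rank correlation, and the rank helper ignores ties); they are not taken from the authors' code.

```python
import numpy as np

def _ranks(v):
    # Ordinal ranks; ties are broken by position, which is a simplification.
    r = np.empty(len(v))
    r[np.argsort(v, kind="stable")] = np.arange(len(v))
    return r

def distances(x, D, metric):
    """Distance from x to every column of D under one of the five metrics."""
    diff = D - x[:, None]
    if metric == "euclidean":
        return np.sqrt((diff ** 2).sum(axis=0))
    if metric == "cityblock":
        return np.abs(diff).sum(axis=0)
    if metric == "seuclidean":
        # Scale each feature by its variance over the training columns.
        var = D.var(axis=1, ddof=1)
        var[var == 0] = 1.0
        return np.sqrt((diff ** 2 / var[:, None]).sum(axis=0))
    if metric == "cosine":
        num = D.T @ x
        den = np.linalg.norm(D, axis=0) * np.linalg.norm(x)
        return 1.0 - num / den
    if metric == "spearman":
        # One minus the correlation between the rank vectors.
        rx = _ranks(x)
        return np.array([1.0 - np.corrcoef(rx, _ranks(D[:, j]))[0, 1]
                         for j in range(D.shape[1])])
    raise ValueError(f"unknown metric: {metric}")
```

Any of these can replace the Euclidean distance in the K-NN step, since LCCR only needs the indices of the nearest training samples.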

For computational efficiency, as done in Wright2009-Robust ; Zhang2011-Sparse , we applied Eigenface Turk1991 to reduce the dimensionality of the data sets throughout the experiments. Moreover, SRC requires the dictionary $\mathbf{D}$ to be an under-determined matrix, and Shi et al. Shi2011-recognition claimed that their model (named LRC in Naseem2010-Linear ) achieves competitive results when $\mathbf{D}$ is over-determined. For an extensive comparison, we investigate the performance of the tested methods, except SRC, over both cases.
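Eigenface is principal component analysis applied to vectorized face images. A minimal sketch via the SVD is shown below, assuming one image per column; the function names `eigenface_fit` and `eigenface_transform` are hypothetical, and details such as whitening or the exact preprocessing used in the experiments are not specified here.

```python
import numpy as np

def eigenface_fit(D, dim):
    # Center the training columns and keep the top `dim` principal directions.
    mean = D.mean(axis=1, keepdims=True)
    U, _, _ = np.linalg.svd(D - mean, full_matrices=False)
    return mean, U[:, :dim]

def eigenface_transform(X, mean, U):
    # Project (training or testing) columns onto the learned subspace.
    return U.T @ (X - mean)
```

The same `mean` and `U`, learned from the training data only, should be applied to the testing images so that both live in the same low-dimensional space.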

We solved the $\ell_1$-minimization problem in SRC by using CVX Grant2008 , a package for solving convex optimization problems, and obtained the results of LRC, CRC-RLS and CESR by using the source code from the authors' homepages. All experiments were carried out using 32-bit Matlab on a 2.5GHz machine with 2.00 GB RAM.

Parameter determination is a big challenge in pattern recognition and computer vision. As done in Cheng2010-Learning ; Peng2012 , we report the best classification results of all tested methods under different parameter configurations. The value range used to find the best values for LCCR can be inferred from Figure 4, and these possible values of $\lambda$ are also tested for SRC and CRC-RLS. In all tests, we randomly split each data set into two parts for training and testing, and compare the performance of the algorithms using the same partition to avoid differences caused by the data sets.

Figure 4: Recognition accuracy of LCCR using Cityblock distance on a subset of the AR database with dimensionality 2580. (a) The recognition rates versus the variation of the neighborhood parameter K, where λ=0.005 and γ=0.2. (b) The recognition rates versus the variation of the sparsity parameter λ, where K=5 and γ=0.2. (c) The recognition rates versus the variation of the locality constrained coefficient γ, where K=3 and λ=0.005.

### 4.2 Recognition on Clean Images

In this sub-section, we examine the performance of 7 competing methods over 4 clean facial data sets. Here, clean image means an image without occlusion or corruption, just with variations in illumination, pose, expression, etc.

(1) ORL database Samaria1994 consists of 400 different images of 40 individuals. For each person, there are 10 images with variations in lighting, facial expression and facial details (with or without glasses). For computational efficiency, we cropped all ORL images to a smaller size (2688D), and randomly selected 5 images from each subject for training and used the remaining 5 images for testing.

Table 1 reports the classification accuracy of the tested algorithms over various dimensionalities. Note that the 200D Eigenface representation retains enough energy of the cropped data that the investigated methods achieve the same rates as with 2688D. From the results, LCCRs outperform the other algorithms, and the best results are achieved when Cityblock distance is used to search the nearest neighbors. Moreover, we find that all the algorithms except LRC achieve a higher recognition rate in the original space. One possible reason is that the cropping operation degrades the performance of LRC; another may be the classifier used. Moreover, we have found that if another nearest subspace classifier Wright2009-Robust is adopted with the linear regression based representation, the accuracy of LRC slightly decreases from 89% to 88.00% over the original data and from 91% to 90% with 120D.

(2) AR database Martinez1998 includes over 4000 face images of 126 people (70 male and 56 female) which vary in expression, illumination and disguise (wearing sunglasses or scarves). Each subject has 26 images consisting of 14 clean images, 6 images with sunglasses and 6 images with scarves. As done in Wright2009-Robust ; Zhang2011-Sparse , a subset that contains 1400 normal faces randomly selected from 50 male subjects and 50 female subjects is used in our experiment. For each subject, we randomly permute the 14 images and take the first half for training and the rest for testing. Limited by computational capabilities, as in Zhang2011-Sparse , we crop all images from their original size to 2580 pixels (2580D) and convert them to gray scale.

(3) Extended Yale B database Georghiades2001 contains 2414 frontal-face images of 38 subjects. As done in Wright2009-Robust ; Zhang2011-Sparse , we carried out the experiments on the cropped and normalized images (2592D). For each subject (about 64 images per subject), we randomly split the images into two parts of equal size, one for training and the other for testing. Similar to the above experimental configuration, we calculated the recognition rates over dimensionality 54, 120 and 300 using Eigenface, and 2592D in the original data space. Table 3 shows that LCCRs again outperform their counterparts across various spaces, especially when the Spearman distance is used to determine the neighborhood of testing samples.

(4) Multi-PIE database (MPIE) Gross2010 contains the images of 337 subjects captured in 4 sessions with simultaneous variations in pose, expression and illumination. As done in Zhang2011-Sparse , we used all the images in the first session as training data and the images belonging to the first 250 subjects in the other sessions as testing data. All images are cropped to a reduced size.

From Tables 1-4, we draw the following conclusions:

1. LCCRs generally outperform SVM (original input), SRC (sparse representation), CESR (robust sparse representation), LRC (linear regression based model) and CRC-RLS (collaborative representation) over the tested cases.

2. LCCRs perform better in a low-dimensional space than in a high-dimensional one. For example, on the Extended Yale B, the difference in accuracy between LCCR and CRC-RLS (the second best method) narrows as the dimensionality increases from 54D to 120D and then to 300D (see Table 3). This again corroborates our claim that local consistency helps to improve the discrimination of data representation, since low-dimensional data contain less information than higher-dimensional data.

3. CESR is more competitive in the original space at the expense of computational cost. For example, it outperforms the other models over MPIE-S4 in classification accuracy, where it takes about 11003.51 seconds, compared with 3104.82s for SRC, 54.59s for LRC, 54.79s for CRC-RLS and 59.82s for LCCR.

4. SRC, LRC and CRC-RLS achieve similar performance, and SRC is more competitive in the low-dimensional feature spaces. The results are consistent with the reports in Zhang2011-Sparse . For example, in the experiments of Zhang over MPIE-S2 with 300D, the accuracy scores of SRC and CRC-RLS are about 93.9% and 94.1%, respectively, compared with 93.13% and 94.88% in our experiments. Moreover, CRC-RLS and LRC achieve similar recognition rates, with only a small difference across various feature spaces.

### 4.3 Recognition on Partial Facial Features

Figure 5: Recognition accuracy with partial face features. (a) An example of the three features, right eye, mouth and chin, and nose from left to right. (b) The recognition rates of competing methods on the partial face features of the AR database.

The ability to work on partial face features is very interesting since not all facial features play an equal role in recognition. Therefore, this ability has become an important metric in face recognition research Savvides2006 . We examine the performance of the investigated methods using three partial facial features, i.e., right eye, nose, as well as mouth and chin, cut from the clean AR faces with 2580D (as shown in Figure 5(a)). For each partial face feature, we generate a data set by randomly selecting 7 images per subject for training and using the remaining 700 images for testing. It should be noted that Wright et al. Wright2009-Robust conducted a similar experiment on Extended Yale B, which includes fewer subjects, a smaller irrelevant white background, and more training samples per subject than our case.

Figure 5(b) shows that LCCRs achieve better recognition rates than SVM, SRC, LRC and CRC-RLS for the right eye as well as the mouth and chin, and the second best rates for the nose. Some works found that the most important feature is the eye, followed by the mouth, and then the nose Sinha2006-Recognition . We can see that the results for SVM, CRC-RLS and LCCR are consistent with these conclusions, even though the dominance of the mouth and chin over the nose is not very distinct.

### 4.4 Face Recognition with Block Occlusions

Figure 6: Experiments on the AR database with varying percentages of block occlusion. (a) From top to bottom, the occlusion percentages for the test images are 10%, 30%, and 50%, respectively. (b) and (c) are the recognition rates under different levels of block occlusion on the AR database with 300D (Eigenface) and 2580D, respectively. (d) The recognition rates of LCCRs with 300D and 2580D.

To examine the robustness to block occlusion, similar to Wright2009-Robust ; Shi2011-recognition ; Zhang2011-Sparse , we obtain 700 testing images by replacing a random block of each clean AR image with an irrelevant image (baboon) and use 700 clean images for training. The occlusion ratio increases from 10% to 50%, as shown in Figure 6(a). We investigate the classification accuracy of the methods in the Eigenface space with 300D (Figure 6(b)) and the cropped data space with 2580D (Figure 6(c)).
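One way such occluded test images could be synthesized is sketched below. The block shape (same aspect ratio as the image), the way the occluding content is taken from the top-left corner of `patch`, and the function name `occlude` are all assumptions for illustration; the exact procedure used in the experiments is not described in the text.

```python
import numpy as np

def occlude(img, patch, ratio, rng):
    """Overwrite a random block covering about `ratio` of img with patch pixels."""
    h, w = img.shape
    # Scale both sides by sqrt(ratio) so the block area is about ratio * h * w.
    bh = max(1, int(round(h * np.sqrt(ratio))))
    bw = max(1, int(round(w * np.sqrt(ratio))))
    top = rng.integers(0, h - bh + 1)
    left = rng.integers(0, w - bw + 1)
    out = img.copy()
    # Assumes `patch` is at least as large as the block.
    out[top:top + bh, left:left + bw] = patch[:bh, :bw]
    return out
```

Passing an explicit `rng` (for example `np.random.default_rng(0)`) keeps the occlusion positions reproducible across methods, so every method is evaluated on the same corrupted images.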

Figures 6(b)-6(d) show that LCCRs generally outperform the other models with considerable performance margins. In particular, as the occlusion ratio increases, the difference in recognition rates between LCCRs and the other methods becomes larger. For example, in the 300-dimensional space (Figure 6(b)), the accuracy of LCCR with Cityblock distance is higher than that of SVM, LPP, SRC (CVX), CESR, LRC, and CRC-RLS. Note that different $\ell_1$-solvers will lead to different results for SRC. For example, if SRC adopts the Homotopy algorithm Osborne2000 to get the sparsest solution, its recognition rate increases from 25.43% (with CVX) to 36.14%, so that the performance margin of LCCR over SRC decreases accordingly. Moreover, CESR achieves the best results, at the expense of computational cost, when the original data is available and the occlusion ratio ranges from 20% to 40%.

Figure 7: Recognition on AR faces with real possible occlusions. (a) The top row is a facial image occluded by sunglasses, whose partitioned blocks are shown below. (b) The accuracy of K-NN searching using Cityblock distance, Cosine distance, Euclidean distance, Seuclidean distance and Spearman distance on the AR images with sunglasses (2580D). (c) Similar to (a), the top row is a face occluded by a scarf, with its partitions below. (d) The precision of K-NN searching using Cityblock, Cosine, Euclidean, Seuclidean, and Spearman as distance metrics on the AR images with scarves (2580D). (e) The recognition rates of competing methods across different experimental configurations.

On the other hand, LRC, CRC-RLS and LCCRs are more robust than SRC and SVM, which implies that the $\ell_1$-regularization term cannot yield better robustness than the $\ell_2$-regularization term, at least in the Eigenface space. Moreover, the models achieve better results in the higher dimensional space, even though the difference in classification accuracy between the higher and lower dimensional spaces is less than , except for CESR, which shows an obvious improvement.

### 4.5 Face Recognition with Real Occlusions

In this sub-section, we examine the robustness of the investigated approaches to real possible occlusions over the AR data set. We use 1400 clean images for training, and test separately on 600 faces wearing sunglasses (occlusion ratio of about 20%) and 600 faces wearing scarves (occlusion ratio of about 40%). In Wright2009-Robust , Wright et al. used only a third of the disguised images for this test, i.e., 200 images for each kind of disguise. In addition, we investigate the role of K-NN searching in LCCR.

We examine two widely-used feature schemes, namely, the holistic feature with 300D and 2580D, and the partitioned feature based on the cropped data. The partitioned feature scheme first partitions an image into multiple blocks (8 blocks, as done in Wright2009-Robust ; Zhang2011-Sparse ; Yang2012-Relaxed ; see Figures 7(a) and 7(c)), then conducts classification on each block independently, and finally aggregates the results by voting.
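The partition-then-vote pipeline can be illustrated with a short sketch. The 4x2 block grid, the helper names, and the per-block classifier (a plain 1-NN with Euclidean distance, standing in for whichever coding method is being evaluated) are assumptions made for illustration only.

```python
import numpy as np
from collections import Counter

def split_into_blocks(image, n_rows=4, n_cols=2):
    """Partition a 2-D image into n_rows * n_cols blocks (8 by default)."""
    blocks = []
    for band in np.array_split(image, n_rows, axis=0):
        blocks.extend(np.array_split(band, n_cols, axis=1))
    return blocks

def nearest_neighbor_label(block, train_blocks, train_labels):
    """Stand-in per-block classifier: 1-NN with Euclidean distance."""
    dists = [np.linalg.norm(block - t) for t in train_blocks]
    return train_labels[int(np.argmin(dists))]

def classify_by_voting(test_image, train_images, train_labels):
    """Classify every block independently, then take the majority vote."""
    test_blocks = split_into_blocks(test_image)
    train_split = [split_into_blocks(img) for img in train_images]
    votes = []
    for b, block in enumerate(test_blocks):
        candidates = [blocks[b] for blocks in train_split]
        votes.append(nearest_neighbor_label(block, candidates, train_labels))
    return Counter(votes).most_common(1)[0][0], votes

# Toy data: class "A" is an all-zero image, class "B" an all-one image.
train_images = [np.zeros((60, 43)), np.ones((60, 43))]
train_labels = ["A", "B"]
test_image = np.ones((60, 43))
test_image[:15, :] = 0.0          # the top band is "occluded"
label, votes = classify_by_voting(test_image, train_images, train_labels)
```

In this toy case the two occluded blocks vote for the wrong class, but the six clean blocks outvote them, which is the intuition behind the partitioned scheme.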

Figure 7(e) reports the recognition rates of all the tested methods. For the images occluded by sunglasses, LCCR with Cityblock distance and CESR achieve remarkable results with the holistic feature scheme: their recognition accuracies are nearly double those of the other methods. This considerable performance margin is attributable to the accuracy of K-NN searching based on Cityblock distance (see Figure 7(b)).

For the images occluded by scarves, LCCR achieves the highest recognition rate over the full dimensional space, and the second highest rate using Eigenface. However, the difference in rates between LCCR and the other non-iterative algorithms (LRC, CRC-RLS) is very small due to the poor accuracy of K-NN searching, as shown in Figure 7(d). Furthermore, the partitioned feature scheme produces higher recognition rates than the holistic one for all competing methods, which is consistent with the previous report in Wright2009-Robust .

From the above experiments, we conclude that the preservation of locality is helpful to the coding scheme, especially when the real structure of the data cannot be found by a traditional coding scheme. Moreover, the performance ranking of LCCR with the five distance metrics is the same as that of K-NN searching with those metrics.

### 4.6 Face Recognition with Corruption

Figure 8: Testing images from the AR database with additive noise and non-additive noise. Top row: 10%, 30%, 50%, 70%, 90% white noise is added to the test image. Bottom row: the case of random pixel corruption with percentages of 10%-90%, respectively.

We test the robustness of LCCR against two kinds of corruption using the AR data set containing 2600 images of 100 individuals. For each subject, we use 13 images for training (7 clean images, 3 images with sunglasses, and 3 images with scarves), and the remaining 13 images for testing. Different from Wright2009-Robust , which tested the robustness to corruption using the Extended Yale B database, our case is more challenging for the following reasons. Firstly, AR images contain real possible occlusions, i.e., sunglasses and scarves, while Extended Yale B is a set of clean images without disguises. Secondly, AR includes more facial variations (13 versus 9), more subjects (100 versus 38), and fewer samples per subject (26 images versus 64 images). Thirdly, we investigate two kinds of corruption, white noise (additive noise) and random pixel corruption (non-additive noise), both of which are commonly assumed in face recognition problems Wright2009-Robust ; Shi2011-recognition ; Naseem2010-Linear .

For the white noise case (the top row of Figure 8), we add random noise from a normal distribution to each testing image , that is, , and restrict , where is the corruption ratio from 10% to 90% with an interval of 20%, and is the noise following a standard normal distribution. For the random pixel corruption case (the bottom row of Figure 8), we replace the values of a percentage of pixels randomly chosen from each test image with values following a uniform distribution over , where is the largest pixel value of the current image.
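The two corruption procedures described above can be sketched as follows. Since the exact noise formula is not legible in this copy of the text, the additive model `image + ratio * scale * noise`, the `scale` parameter, and the clipping to `[0, max_value]` are assumptions of this sketch; only the uniform replacement over the range from 0 to the largest pixel value follows the description directly.

```python
import numpy as np

def add_white_noise(image, ratio, rng, scale=255.0, max_value=255.0):
    """Additive corruption: image + ratio * scale * n, with n ~ N(0, 1).

    `scale` and the clipping range are assumptions, not taken from the paper.
    """
    noise = rng.standard_normal(image.shape)
    return np.clip(image + ratio * scale * noise, 0.0, max_value)

def corrupt_random_pixels(image, ratio, rng):
    """Non-additive corruption: overwrite a fraction `ratio` of the pixels
    with values drawn uniformly from [0, largest pixel value of the image]."""
    out = image.astype(float)                      # astype returns a copy
    n_corrupt = int(round(ratio * out.size))
    idx = rng.choice(out.size, size=n_corrupt, replace=False)
    out.flat[idx] = rng.uniform(0.0, image.max(), size=n_corrupt)
    return out

rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(60, 43)).astype(float)   # stand-in test image
noisy = add_white_noise(image, 0.3, rng)
corrupted = corrupt_random_pixels(image, 0.3, rng)
changed = float(np.mean(corrupted != image))   # fraction of pixels that differ
```

The key difference is that white noise perturbs every pixel slightly, whereas random pixel corruption destroys a subset of pixels completely and leaves the rest untouched.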

To improve the anti-noise ability of SRC Wright2009-Robust , Wright et al. generate a new dictionary by concatenating an identity matrix with the original dictionary, where the dimensionality of the identity matrix equals that of the data. The use of the expanded dictionary has been verified to be effective in improving the robustness of $\ell_1$-norm based models Qiao2010Sparsity ; Wright2009-Robust , at the cost of increased computation time. It therefore represents a tradeoff between robustness and efficiency. Does the strategy still work for $\ell_2$-minimization based models? In this sub-section, we fill this gap by comparing the results of coding over these two dictionaries.
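To make the dictionary-expansion strategy concrete, the sketch below codes a corrupted sample over the original dictionary and over the dictionary concatenated with an identity matrix. It uses a plain ridge-regularized least-squares code in the style of CRC-RLS rather than the full LCCR objective, and the names `ridge_code`, `expand_dictionary`, `D`, `B`, `x` and `lam` are illustrative assumptions.

```python
import numpy as np

def ridge_code(dictionary, x, lam):
    """Closed-form l2-regularized code: argmin_a ||x - D a||^2 + lam * ||a||^2."""
    n_atoms = dictionary.shape[1]
    gram = dictionary.T @ dictionary + lam * np.eye(n_atoms)
    return np.linalg.solve(gram, dictionary.T @ x)

def expand_dictionary(dictionary):
    """Concatenate an identity matrix whose size equals the data dimension."""
    dim = dictionary.shape[0]
    return np.hstack([dictionary, np.eye(dim)])

rng = np.random.default_rng(0)
D = rng.standard_normal((20, 8))
D /= np.linalg.norm(D, axis=0)     # unit-norm atoms
x = D[:, 2].copy()
x[5] += 3.0                        # one grossly corrupted entry
lam = 0.1

a_plain = ridge_code(D, x, lam)    # code over the original dictionary
B = expand_dictionary(D)           # 20 x (8 + 20)
a_full = ridge_code(B, x, lam)     # code over the expanded dictionary
a_data, e_hat = a_full[:8], a_full[8:]   # data coefficients and error estimate

def objective(dictionary, code):
    """Value of the ridge objective for a given dictionary and code."""
    return np.sum((x - dictionary @ code) ** 2) + lam * np.sum(code ** 2)
```

The expanded system has as many extra unknowns as the data has dimensions, which is where the additional computational cost comes from: the linear system grows from 8 to 28 unknowns in this toy example.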

Tables 5 through 8 report the recognition rates of the tested methods in the feature space (Eigenface with 300D) and the full dimensional space (2580D). We do not report the results of SVM and LPP with the expanded dictionary, since these methods do not belong to the family of linear coding schemes. Moreover, SRC requires an over-complete dictionary, so it cannot run in the full dimensional cases. Based on the results, we draw the following conclusions:

Firstly, the proposed LCCRs are much superior to SVM, LPP, SRC, CESR, LRC and CRC-RLS. For example, in the worst case (the white Gaussian noise corruption ratio is ), the best result of LCCR is about (Table 6), compared to of SVM (Table 6), of LPP (Table 5), of SRC (Table 7), of CESR (Table 5), of LRC (Table 8), and of CRC-RLS (Table 6). In the case of random pixel corruption, when the corruption ratio reaches , all methods fail to perform recognition except LCCR in the two data spaces and CESR in the full dimensional space.

Secondly, all investigated algorithms perform worse as the corruption ratio increases, and achieve better results under white noise corruption (additive noise) than under random pixel corruption (non-additive noise). Moreover, the improvement of CESR is obvious when the original data is used for testing. As discussed above, this improvement comes at the cost of computational efficiency. The other methods perform slightly better (by less than ) in the full dimensional space, except LRC.

Thirdly, the results show that coding over the expanded dictionary is helpful in improving the robustness of SRC and LRC, but it has a negative impact on the recognition accuracy of CESR, CRC-RLS and LCCR. For example, when the white noise ratio rises to for the Eigenface space (Table 5), expanding the dictionary changes the recognition rate from to for SRC, from to for CESR, from to for LRC, from to for CRC-RLS, and from to for LCCR with Spearman distance. This conclusion has not been reported in previous works.

## 5 Conclusions and Discussions

It is interesting and important to improve the discrimination and robustness of data representation. The traditional coding algorithm obtains the representation by encoding each datum as a linear combination of a set of training samples, which mainly depicts the global structure of the data. However, it fails when the data are grossly corrupted. Locality (local consistency) preservation, which keeps the geometric structure of the manifold for dimension reduction, has proven effective in revealing the real structure of data. In this paper, we proposed a novel objective function that yields an effective and robust representation by enforcing that similar inputs produce similar codes, and the function possesses an analytic solution.

The experimental studies showed that the introduction of locality makes LCCR more accurate and more robust to various occlusions and corruptions. We investigated the performance of LCCR with five basic distance metrics (for locality). The results imply that if better K-NN searching methods or more sophisticated distance metrics are adopted, LCCR might achieve a higher recognition rate. Moreover, the performance comparisons over two different dictionaries show that it is unnecessary to expand the dictionary with an identity matrix for $\ell_2$-norm based coding algorithms.

Each approach has its own advantages and disadvantages. Parameter determination is perhaps the biggest problem of LCCR, which requires three user-specified parameters. In future work, it is possible to explore the relationship between the locality parameter and the intrinsic dimensionality of the sub-manifold. Moreover, this work has focused on representation learning; however, dictionary learning is also important and interesting in this area. Therefore, a possible way to extend this work is to explore how to reflect local consistency in the formation process of the dictionary.

## References

• (1) E. J. Candes, T. Tao, Decoding by linear programming, IEEE Transactions on Information Theory 51 (12) (2005) 4203–4215.
• (2) D. L. Donoho, For most large underdetermined systems of linear equations the minimal $\ell_1$-norm solution is also the sparsest solution, Communications on Pure and Applied Mathematics 59 (6) (2006) 797–829.
• (3) S. S. B. Chen, D. L. Donoho, M. A. Saunders, Atomic decomposition by basis pursuit, SIAM Review 43 (1) (2001) 129–159.
• (4) B. Efron, T. Hastie, I. Johnstone, R. Tibshirani, Least angle regression, Annals of Statistics 32 (2) (2004) 407–451.
• (5) A. Yang, A. Ganesh, S. Sastry, Y. Ma, Fast $\ell_1$-minimization algorithms and an application in robust face recognition: a review, in: Proc. of International Conference on Image Processing, 2010, pp. 1849–1852.
• (6) P. Xi, L. Zhang, Z. Yi, Constructing l2-graph for subspace learning and segmentation, ArXiv e-prints, arXiv:1209.0841.
• (7) B. Cheng, J. Yang, S. Yan, Y. Fu, T. Huang, Learning with $\ell_1$-graph for image analysis, IEEE Transactions on Image Processing 19 (4) (2010) 858–866.
• (8) E. Elhamifar, R. Vidal, Sparse subspace clustering: Algorithm, theory, and applications, To appear in IEEE Transactions on Pattern Analysis and Machine Intelligence.
• (9) C. Wang, X. He, J. Bu, Z. Chen, C. Chen, Z. Guan, Image representation using laplacian regularized nonnegative tensor factorization, Pattern Recognition 44 (10) (2011) 2516–2526.
• (10) J. Wright, A. Y. Yang, A. Ganesh, S. S. Sastry, Y. Ma, Robust face recognition via sparse representation, IEEE Transactions on Pattern Analysis and Machine Intelligence 31 (2) (2009) 210–227.
• (11) H. Zhang, N. M. Nasrabadi, Y. Zhang, T. S. Huang, Joint dynamic sparse representation for multi-view face recognition, Pattern Recognition 45 (4) (2012) 1290–1298.
• (12) H. Zhang, Y. Zhang, T. S. Huang, Simultaneous discriminative projection and dictionary learning for sparse representation based classification, Pattern Recognition 46 (1) (2013) 346–354.
• (13) R. He, W.-S. Zheng, B.-G. Hu, Maximum correntropy criterion for robust face recognition, IEEE Transactions on Pattern Analysis and Machine Intelligence 33 (8) (2011) 1561–1576.
• (14) S. Z. Li, J. Lu, Face recognition using the nearest feature line method, IEEE Transactions on Neural Networks 10 (2) (1999) 439–443.
• (15) J. Yang, L. Zhang, Y. Xu, J. Yang, Beyond sparsity: The role of l1-optimizer in pattern classification, Pattern Recognition 45 (3) (2012) 1104–1118.
• (16) R. Rigamonti, M. A. Brown, V. Lepetit, Are sparse representations really relevant for image classification?, in: Proc. of IEEE International Conference on Computer Vision and Pattern Recognition, 2011, pp. 1545–1552.
• (17) Q. Shi, A. Eriksson, A. van den Hengel, C. Shen, Is face recognition really a compressive sensing problem?, in: Proc. of IEEE Conference on Computer Vision and Pattern Recognition.
• (18) L. Zhang, M. Yang, X. Feng, Sparse representation or collaborative representation: Which helps face recognition?, in: Proc. of IEEE International Conference on Computer Vision.
• (19) I. Naseem, R. Togneri, M. Bennamoun, Linear regression for face recognition, IEEE Transactions on Pattern Analysis and Machine Intelligence 32 (11) (2010) 2106–2112.
• (20) X. He, D. Cai, S. Yan, H. Zhang, Neighborhood preserving embedding, in: Proc. of IEEE International Conference on Computer Vision.
• (21) M. Belkin, P. Niyogi, V. Sindhwani, Manifold regularization: A geometric framework for learning from labeled and unlabeled examples, The Journal of Machine Learning Research 7 (2006) 2399–2434.
• (22) S. C. Yan, D. Xu, B. Y. Zhang, H. J. Zhang, Q. Yang, S. Lin, Graph embedding and extensions: A general framework for dimensionality reduction, IEEE Transactions on Pattern Analysis and Machine Intelligence 29 (1) (2007) 40–51.
• (23) R. G. Baraniuk, M. B. Wakin, Random projections of smooth manifolds, Foundations of Computational mathematics 9 (1) (2009) 51–77.
• (24) A. Majumdar, R. K. Ward, Robust classifiers for data reduced via random projections, IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics 40 (5) (2010) 1359–1371.
• (25) J. Wang, J. Yang, K. Yu, F. Lv, T. Huang, Y. Gong, Locality-constrained linear coding for image classification, in: Proc. of IEEE International Conference on Computer Vision and Pattern Recognition, 2010, pp. 3360–3367.
• (26) S. T. Roweis, L. K. Saul, Nonlinear dimensionality reduction by locally linear embedding, Science 290 (5500) (2000) 2323–2326.
• (27) Y. Chao, Y. Yeh, Y. Chen, Y. Lee, Y. Wang, Locality-constrained group sparse representation for robust face recognition, in: Proc. of IEEE International Conference on Image Processing, 2011, pp. 761–764.
• (28) M. Yang, L. Zhang, D. Zhang, S. Wang, Relaxed collaborative representation for pattern classification, in: Proc. of IEEE Conference on Computer Vision and Pattern Recognition, 2012, pp. 2224–2231.
• (29) K. Ohki, S. Chung, Y. H. Ch’ng, P. Kara, R. C. Reid, Functional imaging with cellular resolution reveals precise micro-architecture in visual cortex, Nature 433 (7026) (2005) 597–603.
• (30) X. He, S. Yan, Y. Hu, P. Niyogi, H. Zhang, Face recognition using laplacianfaces, IEEE Transactions on Pattern Analysis and Machine Intelligence 27 (3) (2005) 328–340.
• (31) M. R. Osborne, B. Presnell, B. A. Turlach, A new approach to variable selection in least squares problems, IMA Journal of Numerical Analysis 20 (3) (2000) 389–403.
• (32) X. Peng, L. Zhang, Z. Yi, Scalable sparse subspace clustering, in: Proc. of IEEE Conference on Computer Vision and Pattern Recognition, 2013.
• (33) A. Martinez, R. Benavente, The ar face database (1998).
• (34) F. Samaria, A. Harter, Parameterisation of a stochastic model for human face identification, in: Proc. of the IEEE Workshop on Applications of Computer Vision (WACV), 1994, pp. 138–142.
• (35) A. Georghiades, P. Belhumeur, D. Kriegman, From few to many: illumination cone models for face recognition under variable lighting and pose, IEEE Transactions on Pattern Analysis and Machine Intelligence 23 (6) (2001) 643–660.
• (36) R. Gross, I. Matthews, J. Cohn, T. Kanade, S. Baker, Multi-pie, Image and Vision Computing 28 (5) (2010) 807–813.
• (37) R.-E. Fan, K.-W. Chang, C.-J. Hsieh, X.-R. Wang, C.-J. Lin, Liblinear: A library for large linear classification, Journal of Machine Learning Research 9 (2008) 1871–1874.
• (38) M. Turk, A. Pentland, Eigenfaces for recognition, Journal of Cognitive Neuroscience 3 (1) (1991) 71–86.
• (39) M. Grant, S. Boyd, Graph implementations for nonsmooth convex programs, in: V. Blondel, S. Boyd, H. Kimura (Eds.), Recent Advances in Learning and Control, Lecture Notes in Control and Information Sciences, Springer-Verlag Limited, 2008, pp. 95–110.
• (40) M. Savvides, R. Abiantun, J. Heo, S. Park, C. Xie, B. Vijayakumar, Partial holistic face recognition on frgc-ii data using support vector machine, in: Proc. of Computer Vision and Pattern Recognition Workshop, 2006, pp. 48–53.
• (41) P. Sinha, B. Balas, Y. Ostrovsky, R. Russell, Face recognition by humans: Nineteen results all computer vision researchers should know about, Proceedings of the IEEE 94 (11) (2006) 1948–1962.
• (42) L. S. Qiao, S. C. Chen, X. Y. Tan, Sparsity preserving projections with applications to face recognition, Pattern Recognition 43 (1) (2010) 331–341.