Sparse Graph-based Transduction for Image Classification

08/26/2014 ∙ by Sheng Huang, et al. ∙ Chongqing University

Motivated by the remarkable successes of Graph-based Transduction (GT) and Sparse Representation (SR), we present a novel classifier named Sparse Graph-based Classifier (SGC) for image classification. In SGC, SR is leveraged to measure the correlation (similarity) between every pair of samples, and a graph is constructed to encode these correlations. Laplacian eigenmapping is then adopted to derive the graph Laplacian of this graph. Finally, SGC is obtained by plugging the graph Laplacian into the conventional GT framework. In the image classification procedure, SGC utilizes the correlations encoded in the learned graph Laplacian to infer the labels of unlabeled images. SGC inherits the merits of both GT and SR. Compared to GT, SGC improves the robustness and the discriminating power. Compared to SR, SGC sufficiently exploits the whole data and therefore alleviates the undercomplete dictionary issue suffered by SR. Four popular image databases are employed for evaluation. The results demonstrate that SGC achieves promising performance in comparison with the state-of-the-art classifiers, particularly in the small training sample size case and the noisy sample case.




1 Introduction

As two popular techniques for classification, Sparse Representation (SR) and Graph-based Transduction (GT) have attracted a lot of attention in the machine learning, computer vision and image processing communities src ; gsrc ; ssc ; srdp ; ksrc ; hyper ; zhou . The idea of SR stems from compressed sensing: most signals have a sparse representation as a linear combination of a reduced subset of signals from the same space src ; compress . The basic idea of GT is to utilize the similarities between every pair of samples to infer the labels of unlabeled samples, where these similarities are encoded in a graph or hypergraph zhou ; st ; dhlp ; gtam ; ssl . In SR, the signals tend to have a representation biased towards their own class and only the most relevant signals are highlighted srdp . These facts endow SR with strong discriminating power and robustness to noise. However, an important precondition of SR is that the dictionary must be overcomplete. When training samples are scarce (the undercomplete dictionary case), which is very common in real-world applications, the dictionary constructed from the training samples is too small to sparsely represent the query sample, and this restricts the classification performance of SR. Another shortcoming of SR is that it cannot utilize the self-similarities of the training data or those of the testing data. In contrast, GT alleviates these shortcomings, since the graph, which is the core of GT and encodes the similarities, is constructed from both training and testing samples. In other words, all data can be sufficiently exploited. The main problem of current GT approaches is that they are easily corrupted by noise, because most of them generate the graphs (or hypergraphs) by k-nearest-neighbour or ε-ball sparsegraph . Robustness to noise, however, is exactly what SR is good at. The advantages of SR and GT are therefore complementary. This raises a question: does there exist a classification approach that combines SR and GT and inherits the advantages of both? Fortunately, this paper gives a positive answer.

(a) Raw Samples and their rank scores
(b) Samples with noise and their rank scores
Figure 1: The top 10 most relevant face images selected by SR and K-Nearest Neighbour (KNN) for a given query face image. This experiment is conducted on a subset of the FERET database feret (72 subjects with 6 images per subject). The first two rows of the figure are the selection results of SR while the last two rows are the selection results of KNN.

The left subfigure reports the results on the original FERET database while the right one reports the results on the modified FERET database in which 30% of the pixels of each image have been corrupted by noise. In the figure, the first face image of each image array is the query image and the remaining ten images are the relevant face images selected by SR or KNN. The histograms above each image array show the confidence scores of these top ten relevant face images. If the subject of a returned face image is identical to that of the query face image, its corresponding histogram bar is positive; otherwise it is negative. SR gets five hits on both the original and the noisy FERET database, while KNN gets only three and two hits on these two datasets respectively. This verifies that the sparse graph, which is generated by SR, is more discriminative and robust.

Recently, many works have leveraged SR to construct a sparse graph (or ℓ1-graph) for tackling subspace learning, clustering and semi-supervised learning tasks ssc ; srdp ; spp ; sparsegraph ; nlrsg . These approaches achieve remarkable successes because the sparse graph incorporates the merits of SR: it is more discriminative and robust than the conventional graph. Although many impressive related works have been proposed, as far as we know, there is no prior work that directly employs the sparse graph for transduction. In this paper, we utilize the sparse graph to present a novel Graph-based Transduction (GT) algorithm for classification. Following the same graph construction manner as in srdp ; sparsegraph , each sample in turn is taken out as the query sample and the remaining samples are considered as the dictionary of a Sparse Representation (SR) system, in which the correlations (or similarities) between the query sample and the other samples are measured. In this way, a sparse graph that encodes the correlations between every pair of samples can be constructed, and it is not hard to derive its graph Laplacian. Note that the graph Laplacian is constructed from both training samples and testing samples. Finally, we obtain our proposed classification approach by plugging this graph Laplacian into the conventional Graph-based Transduction framework. We name this novel graph-based classification approach Sparse Graph-based Classifier (SGC). SGC inherits the advantages of both SR and GT, which is exactly the positive answer to the aforementioned question. Compared with SR, since the graph Laplacian is constructed from both training and testing samples, SGC can not only use the correlations between the given testing sample and the training samples, as the traditional SR-based classifier does, but also use the correlations within the testing data and within the training data to further improve the discriminating power of SR. Moreover, since the testing samples are added to construct a larger dictionary, SGC alleviates the undercomplete dictionary issue suffered by SR src . Compared with GT, the graph Laplacian of SGC is generated from SR instead of k-nearest-neighbour or ε-ball. So there are two merits inherited from SR: the relevant samples can be better and adaptively selected for each sample to constitute the local clique (or neighbourhood), and the obtained graph is more robust to noise src ; sparsegraph (see the examples in Figure 1). We apply our work to image classification. The Yale yale , AR ar , FERET feret and Caltech256 caltech256 databases are employed for evaluation. The experimental results show that our method achieves promising results in comparison with the state-of-the-art classifiers, particularly in the small training sample size case.

The rest of the paper is organized as follows: Section 2 presents the related works; Section 3 describes the proposed approach; Section 4 presents the experimental evaluation of our work; the conclusion is given in Section 5.

2 Related Works

2.1 Sparse Representation

Sparse Representation (SR) has been a hot topic in the recent decade and has been widely applied in many areas src ; ksrc ; sparsegraph ; srod ; mjsr ; srdp . Since SR enjoys good discriminating power and robustness to noise, it is often considered a popular classification technique. For example, Wright et al. considered the testing face image as a query and the training face images as the visual dictionary to construct an SR system that addresses the face recognition task src . Gao et al. kernelized this approach and applied the kernel version to face recognition and image classification ksrc . To overcome the undercomplete dictionary situation and further improve the performance of SR-based face recognition, Ma et al. gsrc complemented the visual dictionary by adding the gradient images of the faces. However, the original faces and the gradient faces lie in entirely different feature domains. Agarwal et al. introduced a work for learning a sparse, part-based representation for object detection srod . Yuan et al. presented a multitask joint sparse representation model to combine the strength of multiple features and/or instances for visual classification mjsr . Although these SR-based approaches achieve remarkable successes, there are two main shortcomings which have still not been essentially overcome. The first is that SR cannot perform well in the undercomplete dictionary case (the small training sample size case). The second is that conventional SR can only utilize the correlations (or similarities) between the training samples and the testing samples to infer the class label, and cannot sufficiently exploit the correlations among the training samples or among the testing samples. The proposed Sparse Graph-based Classifier (SGC) addresses both of these shortcomings.

2.2 Sparse Graph

Motivated by the recent successes of SR src ; ksrc ; wsrc , some researchers leverage SR instead of the conventional k-nearest-neighbour or ε-ball to construct a sparse graph for addressing different issues ssc ; srdp ; sparsegraph ; spp ; nlrsg . More specifically, Qiao et al. and Timofte et al. successively used SR to construct a sparse graph for dimensionality reduction srdp ; spp . The Sparse Subspace Clustering (SSC) algorithms ssc ; sssc ; rss learn a sparse graph for clustering by treating the data self-representation problem as an SR problem. Similar to srdp ; spp , Cheng et al. utilized SR to construct the ℓ1-graph (sparse graph) for spectral clustering, subspace learning and semi-supervised learning sparsegraph . Although the applications and the learning (or construction) procedures of these works are very different, the obtained sparse graphs are very similar, and all demonstrate good discriminative ability and robustness. In this paper, we intend to use the sparse graph to present a GT algorithm which incorporates these desirable properties. Like these works srdp ; sparsegraph ; spp ; nlrsg , our approach is also an application of the sparse graph.

2.3 Graph-based Transduction

As a transductive learning algorithm, Graph-based Transduction (GT) labels the samples based on the similarities between every pair of samples (whether training or testing samples), where these similarities are encoded in a graph (or hypergraph). In other words, GT can sufficiently exploit the information of the whole data and therefore often performs well in the small training sample size case. This fact makes GT a very popular approach for classification and labeling hyper ; zhou ; st ; gtam ; gtc ; srgt . For example, Duchenne et al. presented a state-of-the-art segmentation method by leveraging the conventional GT to infer the label of each pixel st . Graph Transduction via Alternating Minimization (GTAM) enhanced GT by introducing a propagation algorithm, which can more reliably minimize a cost function over both a function on the graph and a binary label matrix, and applied it to classification gtam . Similarly, in order to address the classification issue, Orbach et al. presented a new GT algorithm by introducing an additional quantity of confidence in label assignments and learning them jointly with the weights gtc . Zhou et al. provided a new way to construct the hypergraph and used it to replace the graph in the GT framework for tackling a labeling task zhou . Following the same framework as in zhou , Yu et al. presented a GT-based image classification approach by adaptively generating the hyperedges and learning their weights hyper . From this short review, it is not hard to conclude that one of the important factors affecting the success of a GT algorithm is the quality of the graph (or hypergraph). The graphs (or hypergraphs) of the aforementioned approaches are generated by k-nearest-neighbour or ε-ball. However, some works have indicated that such graphs can be easily corrupted by noise sparsegraph (see the examples in Figure 1). Inspired by the approaches mentioned in Section 2.2 srdp ; spp ; sparsegraph , we adopt a more robust and discriminative graph, the sparse graph, to alleviate this problem.

3 Methodology

3.1 Sparse Graph Laplacian

The graph plays a very important role in Graph-based Transduction (GT), since it depicts the relationships (similarities or correlations) of the samples, which are regarded as the basis for classification (or labeling). However, the conventional GT approaches generate the graphs (or hypergraphs) by k-nearest-neighbour or ε-ball. It has been shown that these graphs often cannot reveal the real relationships of samples well, due to noise and some other factors sparsegraph . Some recent works srdp ; sparsegraph ; spp indicate that using Sparse Representation (SR) can generate a more discriminative and robust graph. So, in this section, we introduce how to use SR to construct a high quality graph. Following the same graph construction manner as in srdp ; sparsegraph , we take out one sample from the whole dataset and consider the remaining samples as the dictionary of an SR system. Let the $d \times n$ matrix $X = [x_1, \dots, x_n]$ be the sample matrix, where $d$ is the dimension of the samples and $n$ is the number of samples. We denote the sample that we want to represent by $x_i$, where $i$ is its corresponding index. The matrix $X_i$ is the sample matrix which excludes the sample $x_i$. The correlations (or similarities) between the query sample $x_i$ and the other samples are measured by solving the following SR problem

$$\hat{w}_i = \arg\min_{w_i} \|w_i\|_0 \quad \text{s.t.} \quad x_i = X_i w_i + e$$

where the vector $w_i$ holds the representation coefficients (regression weights) of sample $x_i$ and $w_{ij}$ is the element of $w_i$ corresponding to the sample $x_j$. $e$ is the measurement noise. However, this $\ell_0$-norm constrained representation problem is NP-hard and difficult even to approximate src ; nphard . Only a few very recent works attempt to solve the problem as a non-convex minimization problem tpm ; l0debur , and some of these works cannot even guarantee convergence. Researchers instead tend to relax this $\ell_0$-norm constrained regression problem into an $\ell_1$-norm regularized problem

$$\hat{w}_i = \arg\min_{w_i} \|x_i - X_i w_i\|_2^2 + \lambda \|w_i\|_1$$

where $\lambda > 0$ is a parameter which is used to control the trade-off between the reconstruction error and the sparsity. This is a typical convex problem, so it can be solved by many mature convex optimization techniques. Moreover, another reason that the $\ell_1$-norm may be more suitable for constructing a high quality sparse graph is that, unlike the $\ell_0$-norm, which only counts the nonzero coefficients, the $\ell_1$-norm also takes into account the values of the coefficients, which indicate the degrees of similarity. Of course, the idea of the sparse graph is general, so other norms can also be applied to construct other graphs which incorporate different specific properties.
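The sparse coding step described above can be sketched in a few lines. This is only a minimal illustration, not the solver used in this work: it swaps in plain ISTA (iterative soft-thresholding) to minimize the $\ell_1$-regularized least-squares objective for each sample in turn. The function names, the default value of `lam` and the iteration count are assumptions made for the example.

```python
import numpy as np

def soft_threshold(v, t):
    """Element-wise soft-thresholding, the proximal operator of t * ||.||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def sparse_code(x, D, lam=0.1, n_iter=500):
    """Minimise ||x - D w||_2^2 + lam * ||w||_1 with ISTA (assumed solver)."""
    # Step size 1/Lipschitz, where the gradient 2 D^T (D w - x) has constant 2 * sigma_max(D)^2.
    step = 1.0 / (2.0 * np.linalg.norm(D, 2) ** 2 + 1e-12)
    w = np.zeros(D.shape[1])
    for _ in range(n_iter):
        grad = 2.0 * D.T @ (D @ w - x)
        w = soft_threshold(w - step * grad, step * lam)
    return w

def sparse_coefficients(X, lam=0.1, n_iter=500):
    """X is d x n with samples as columns.

    Returns an n x n matrix C where C[i, j] is the coefficient of sample j
    in the sparse representation of sample i; the diagonal stays zero
    because each sample is excluded from its own dictionary.
    """
    n = X.shape[1]
    C = np.zeros((n, n))
    for i in range(n):
        others = np.delete(np.arange(n), i)
        C[i, others] = sparse_code(X[:, i], X[:, others], lam, n_iter)
    return C
```

A dedicated solver would typically converge faster than ISTA; the sketch only aims to make the leave-one-out dictionary construction concrete.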

In our model, we adopt the SLEP method slep to efficiently solve the $\ell_1$-norm regularized problem above. The correlation between samples $x_i$ and $x_j$, which is also regarded as the weight of the edge between $x_i$ and $x_j$, can be calculated as follows


where $W_{ij}$ is also the $(i,j)$-th element of the affinity matrix of the sparse graph, $W$. Moreover, we define the self-similarity of the sample $x_i$ as follows


We use Laplacian Eigenmapping laplacian to derive the graph Laplacian. The normalized graph Laplacian can be computed as follows

$$L = I - D^{-1/2} W D^{-1/2}$$

where $D$ is a diagonal matrix with $D_{ii} = \sum_j W_{ij}$ and $I$ is an identity matrix. This normalized graph Laplacian incorporates the properties of SR: it is more discriminative and enjoys robustness to noise.
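As a hedged illustration of the Laplacian construction, the sketch below turns a matrix of sparse coefficients into a symmetric normalized graph Laplacian. The symmetrization $W = (|C| + |C|^T)/2$ is an assumption made here for the example (the edge-weight formula is not reproduced in this text), and `normalized_laplacian` is a hypothetical helper name.

```python
import numpy as np

def normalized_laplacian(C):
    """Build L = I - D^{-1/2} W D^{-1/2} from a coefficient matrix C.

    ASSUMPTION: the affinity is taken as W = (|C| + |C|^T) / 2 so that
    the graph is undirected with non-negative weights.
    """
    A = np.abs(C)
    W = 0.5 * (A + A.T)
    deg = W.sum(axis=1)
    # Guard against isolated nodes (zero degree) to avoid division by zero.
    inv_sqrt = np.zeros_like(deg)
    nonzero = deg > 0
    inv_sqrt[nonzero] = 1.0 / np.sqrt(deg[nonzero])
    return np.eye(W.shape[0]) - inv_sqrt[:, None] * W * inv_sqrt[None, :]
```

With a symmetric non-negative affinity, the resulting matrix is symmetric and positive semi-definite, which is the property the transduction step relies on.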

3.2 Sparse Graph-based Transduction

Graph-based transduction (GT) methods label input data by learning a classification function that is regularized to exhibit smoothness along a graph over labeled and unlabeled samples zhou ; gtam . In other words, the GT model can be deemed a regularized graph cut problem in which the graph cut is considered as a classification function. Based on the obtained sparse graph Laplacian $L$, we first formulate our GT method in the binary class case and then generalize it to the multi-class case. Since our method is based on the sparse graph, we name our proposed GT algorithm Sparse Graph-based Transduction (SGT) and its corresponding classifier Sparse Graph-based Classifier (SGC). In SGC, a graph cut $f$ is defined as the classification function, and this cut should not only minimize the similarity losses (sparse representation relationship losses) but also reduce the classification errors of the training samples. Mathematically speaking, such a model can be formulated as follows

$$f^* = \arg\min_f \{ \Omega(f) + \mu R(f) \}$$

where the similarity loss function $\Omega(f) = f^T L f$ is defined as a normalized cut function ncut and the classification error function $R(f) = \|f - y\|_2^2$ measures the classification errors by computing the Euclidean distances between the predicted labels and the ground-truth labels. The vector $y$ is the label vector. Let $y_i$ be the $i$-th element of $y$, which depicts the status of the sample $x_i$. Then $y_i = 1$ or $-1$ if the sample $x_i$ has been labeled as positive or negative respectively, and $y_i = 0$ if it is unlabeled. $\mu$ is a positive parameter that reconciles the similarity losses, $\Omega(f)$, and the classification errors, $R(f)$. Note that the graph Laplacian $L$ is constructed from both training samples and testing samples. Moreover, it is worthwhile to point out that the GT framework is very flexible: researchers can also design these two loss functions themselves for addressing different issues.

We employ the one-versus-all strategy to generalize the algorithm from the binary classification case to the multi-class classification case. The multi-class version is denoted as follows

$$F^* = \arg\min_F \sum_{j=1}^{c} \{ \Omega(f_j) + \mu R(f_j) \} = \arg\min_F \{ \mathrm{tr}(F^T L F) + \mu \|F - Y\|_F^2 \}$$

where $F = [f_1, \dots, f_c]$ and $Y = [y_1, \dots, y_c]$ are the collection of classification functions and the collection of the defined labels with respect to the different classes. $c$ is the number of classes. In label vector $y_j$, only the samples from class $j$ are considered as positive while the samples from other classes are considered as negative.

Since $L$ is a positive semi-definite matrix, the multi-class problem above can be efficiently solved by Regularized Least Squares (RLS). We take the partial derivative of the objective with respect to $F$ and set it equal to zero:

$$L F + \mu (F - Y) = 0 \quad \Rightarrow \quad F = \mu (L + \mu I)^{-1} Y$$

Finally, the classification of the $i$-th sample can be accomplished by assigning it to the $j$-th class that satisfies

$$j = \arg\max_{j} F_{ij}$$

where $F_{ij}$ is the $(i,j)$-th element of matrix $F$.

SGT inherits the desirable properties of both GT and SR. More specifically, SGC can exploit the correlations of both the testing samples and the training samples, since the graph Laplacian is constructed from the whole data. SGC performs much better in the small training sample size case, since SGC utilizes the testing samples to complement the dictionary of SR in the sparse graph construction procedure. Moreover, SGC is more discriminative and robust to noise.
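The transduction step can be sketched as a single regularized least-squares solve followed by a row-wise arg-max. The code below is a minimal sketch under stated assumptions: the objective is taken to be $\mathrm{tr}(F^T L F) + \mu \|F - Y\|_F^2$ (the placement of the trade-off parameter is an assumption), unlabeled samples are marked with `-1` in the input, and `sgc_predict` is a hypothetical name.

```python
import numpy as np

def sgc_predict(L, labels, n_classes, mu=1.0):
    """One-versus-all graph transduction (illustrative sketch).

    L        : n x n positive semi-definite graph Laplacian.
    labels   : length-n integers, class index for labeled samples, -1 if unlabeled.
    Returns (predicted class per sample, score matrix F).
    """
    labels = np.asarray(labels)
    n = L.shape[0]
    # One-versus-all label matrix: +1 own class, -1 other classes, 0 if unlabeled.
    Y = np.zeros((n, n_classes))
    idx = np.flatnonzero(labels >= 0)
    Y[idx] = -1.0
    Y[idx, labels[idx]] = 1.0
    # Stationary point of the assumed objective: (L + mu I) F = mu Y.
    F = np.linalg.solve(L + mu * np.eye(n), mu * Y)
    return F.argmax(axis=1), F
```

Solving the linear system directly avoids forming the inverse explicitly; because the Laplacian is positive semi-definite and `mu` is positive, the system matrix is invertible.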

4 Experimental Results

Yale yale , FERET feret , AR ar and Caltech256 caltech256 databases are used to evaluate our work. The Yale face database has 15 subjects and 11 samples per subject yale . The size of each image is 32×32 pixels. The FERET database contains 13539 images corresponding to 1565 subjects feret . Following dhlp , a subset which contains 436 images of 72 individuals is selected in our experiments; this subset involves variations of facial expression, illumination and pose. The AR database consists of more than 4,000 images of 126 subjects ar . The database characterizes divergence from ideal conditions by incorporating various facial expressions, luminance alterations, and occlusion modes. Following lrc , a subset containing 1680 images of 120 subjects is constructed in our experiments. All these images are 50×40 pixels. Similarly, we follow hyper and select a subset from the Caltech256 database caltech256 . In this subset, there are 20 classes and 100 images per class. Since Caltech256 is more challenging than the other three databases, we adopt the Picodes feature picodes to represent its images. The AR, Yale and FERET databases are face databases and Caltech256 is a general image database. Figure 2 shows some example images of these databases.

Sparse Representation-based Classifier (SRC) src , Collaborative Representation-based Classifier (CRC) crc , LIBSVM libsvm , Graph-based Classifier (GC, the classifier corresponding to the Graph-based Transduction (GT) algorithm st ; gtam ), Normalized Hypergraph-based Classifier (NHC) zhou and Adaptive Hypergraph-based Classifier (AHC) hyper are employed for comparison. The last three algorithms are all transductive learning-based methods and their graph matrices (or hypergraph matrices) are generated based on Euclidean distance (Heat Kernel Weighting).
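For contrast with the sparse graph, the conventional graph used by the transductive baselines can be sketched as a k-nearest-neighbour graph with heat kernel weights on Euclidean distances. This is a generic illustration rather than the exact baseline configuration: the neighbourhood size `k`, the bandwidth `sigma` and the function name are assumptions.

```python
import numpy as np

def knn_heat_kernel_graph(X, k=5, sigma=1.0):
    """Symmetric k-NN affinity with heat kernel weights.

    X is d x n with samples as columns. Edge weight between neighbours
    i and j is exp(-||x_i - x_j||^2 / (2 sigma^2)); k and sigma are
    illustrative choices that must be tuned by hand.
    """
    sq = np.sum(X ** 2, axis=0)
    dist2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X.T @ X, 0.0)
    n = X.shape[1]
    W = np.zeros((n, n))
    for i in range(n):
        order = np.argsort(dist2[i])
        neighbours = [j for j in order if j != i][:k]
        W[i, neighbours] = np.exp(-dist2[i, neighbours] / (2.0 * sigma ** 2))
    # Keep an edge if either endpoint selected the other as a neighbour.
    return np.maximum(W, W.T)
```

Unlike the sparse graph, this construction fixes the same number of neighbours for every sample, which is the hand-tuned choice the sparse graph avoids.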

(a) Yale
(b) AR
(d) Caltech256
Figure 2: The sample images of the datasets used in our experiments.
Classification Error (Mean±STD, %)
Databases    CRC crc      LIBSVM libsvm  SRC src      NHC zhou     AHC hyper    GC           Ours
Yale         9.33±5.66    10.67±3.77     3.33±2.83    20.00±7.86   20.56±7.07   8.89±6.29    2.78±2.36
AR           31.25±0.42   31.90±0.67     27.98±2.36   31.61±0.76   31.73±0.08   42.08±2.44   26.13±0.42
Caltech256   38.20±1.55   39.15±2.47     40.85±2.19   43.00±2.40   43.10±2.26   52.50±4.10   39.40±3.45
FERET        15.28±0.00   12.96±0.56     11.34±0.03   35.42±1.64   33.10±2.29   11.11±1.31   10.19±0.65
Average      23.52        23.67          20.88        32.51        32.12        28.65        19.63
Table 1: Two-fold cross validation results on Yale, AR, Caltech256 and FERET databases.

4.1 Image Classification

We apply the different classifiers to these four databases, and two-fold cross validation is adopted in our experiments. Table 1 reports the classification results. We observe that the proposed Sparse Graph-based Classifier (SGC) outperforms all the compared classifiers on the AR, Yale and FERET databases and achieves very promising performance on the Caltech256 database. Moreover, SGC improves on the performance of both SRC and the GT-based algorithms (NHC, AHC and GC). For example, the classification accuracy gains of SGC over SRC, NHC, AHC and GC are 1.25%, 12.88%, 12.49% and 9.02% respectively on average. In the experiments, the GT-based algorithms do not perform well in comparison with SRC and SGC. We think there are two reasons behind this phenomenon. The first reason is that k-nearest-neighbour is not discriminative enough to select the relevant samples well. The second reason is that it is hard to select a suitable k to define a suitable neighbourhood which reveals the local relationships of samples well, while SGC avoids such a selection of k, since the relevant samples are adaptively selected (without specifying any k). We observe from the classification performances of SRC and SGC that the performance of SGC relies on the performance of SRC. This phenomenon verifies that the core of SGC is the sparse graph, which is generated by SR and incorporates the properties of SR.
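The average gains quoted above follow directly from Table 1. The short check below recomputes them from the reported mean errors; the dictionary layout and variable names are illustrative, and the values are copied from the table.

```python
# Mean classification errors (%) from Table 1, in the order Yale, AR, Caltech256, FERET.
per_database = {
    "CRC":    [9.33, 31.25, 38.20, 15.28],
    "LIBSVM": [10.67, 31.90, 39.15, 12.96],
    "SRC":    [3.33, 27.98, 40.85, 11.34],
    "NHC":    [20.00, 31.61, 43.00, 35.42],
    "AHC":    [20.56, 31.73, 43.10, 33.10],
    "GC":     [8.89, 42.08, 52.50, 11.11],
    "SGC":    [2.78, 26.13, 39.40, 10.19],
}
# Averages as printed in the last row of Table 1.
reported_average = {
    "CRC": 23.52, "LIBSVM": 23.67, "SRC": 20.88, "NHC": 32.51,
    "AHC": 32.12, "GC": 28.65, "SGC": 19.63,
}

# Largest gap between a recomputed average and the printed one (rounding only).
max_deviation = max(
    abs(sum(errors) / len(errors) - reported_average[name])
    for name, errors in per_database.items()
)

# Average gain of SGC over each method, computed from the printed averages.
gains = {
    name: round(reported_average[name] - reported_average["SGC"], 2)
    for name in ("SRC", "NHC", "AHC", "GC")
}
print(gains)
```

The gains are differences in average error, so a positive value means SGC has the lower error.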

4.2 Robustness to Noise

In SGC, the graph Laplacian is generated by SR, so SGC should inherit some merits from SRC. Theoretically speaking, compared to the original GT approaches, SGT should be more robust to noise. In this section, we conduct some experiments on the AR and FERET databases to validate this. In the experiments, several noisy face databases are constructed by randomly adding salt-and-pepper noise to each face image. We define four noise levels based on the proportion of noise in an image and study the effect of the noise proportion on the classification performance. As in the previous section, two-fold cross-validation is adopted to measure the classification performance. Figure 3 shows the experimental results. In this figure, the x-axis indicates the proportion of noise and the y-axis indicates the misclassification rate. From the figure, SGC outperforms SRC and GC in all experiments and behaves similarly to SRC. GC fails quickly, even when only 10% noise is introduced. In contrast, the classification performances of SRC and SGC drop slowly as the noise percentage increases. This verifies that SGC is more robust to noise than the conventional GT algorithms.
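The noise protocol can be illustrated with a small helper that corrupts a fixed proportion of pixels with salt-and-pepper noise. This is a sketch under assumptions: images are taken to be 8-bit grayscale arrays, corrupted pixels are set to 0 or 255 with equal probability, and the function name is hypothetical.

```python
import numpy as np

def salt_and_pepper(image, proportion, rng=None):
    """Return a copy of `image` with `proportion` of its pixels corrupted.

    ASSUMPTIONS: 8-bit grayscale input; each corrupted pixel becomes
    0 (pepper) or 255 (salt) with equal probability.
    """
    rng = np.random.default_rng() if rng is None else rng
    noisy = image.copy()
    n_corrupt = int(round(proportion * image.size))
    # Choose distinct pixel positions so exactly n_corrupt pixels are touched.
    positions = rng.choice(image.size, size=n_corrupt, replace=False)
    noisy.flat[positions] = rng.choice([0, 255], size=n_corrupt)
    return noisy
```

For example, `salt_and_pepper(face, 0.3)` would corrupt 30% of the pixels of a hypothetical `face` array, leaving the input untouched.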

(b) AR
Figure 3: Classification performances of different approaches under different noise levels.

4.3 Insensitivity to the Training Sample Size

The main advantage of GT approaches is that the information of both the training data and the testing data can be fully exploited. So these approaches usually perform much better than other approaches in the small training sample size case. As an instance of the GT framework, SGT should also have this desirable property. In this section, we conduct several experiments on the AR and Yale databases to investigate the effect of the training sample size on the classification performance of SGC. In these experiments, the cross-validation strategy is employed for measuring the classification performance and five sizes of training samples are defined. For example, if the proportion of training samples is 0.1, we adopt ten-fold cross-validation, with one fold used for training in each round. We plot the classification errors of different approaches under different training sample sizes in Figure 4. The x-axis indicates the percentage of the data used for training and the y-axis indicates the mean classification error. From the observations in Figure 4, SGC consistently outperforms SRC, and the improvement of SGC over SRC increases as the training proportion is reduced. These phenomena verify that SGC can perform much better than SRC in the small training sample size case.
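One way to realize this protocol is an inverted k-fold split in which each fold serves once as the training set and the remaining folds form the test set. The sketch below is an assumption about how such a split could be generated; it is not stratified by class, and the function name and seed are illustrative.

```python
import numpy as np

def small_train_folds(n_samples, train_proportion, seed=0):
    """Yield (train_idx, test_idx) pairs where each fold is used once for training.

    With train_proportion = 0.1 this gives ten rounds, each training on
    roughly 10% of the samples and testing on the rest.
    """
    k = int(round(1.0 / train_proportion))
    order = np.random.default_rng(seed).permutation(n_samples)
    folds = np.array_split(order, k)
    for i in range(k):
        train_idx = folds[i]
        test_idx = np.concatenate([folds[j] for j in range(k) if j != i])
        yield train_idx, test_idx
```

With a training proportion of 0.5 the same routine reduces to ordinary two-fold cross-validation.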

(a) Yale
(b) AR
Figure 4: Classification errors of different methods under different training sample sizes.

4.4 The Settings of Parameters

There are two parameters in SGC. One is $\lambda$, which is introduced by SR and used to control the degree of sparsity. The other is $\mu$, which is used to reconcile the correlation loss and the classification errors of the training samples. In this section, we conduct some experiments to discuss the effects of these parameters on the classification performance. As in the previous section, two-fold cross-validation is adopted. Figure 5 plots the relationships between the classification error and the values of the parameters. From this figure, we find that SGC is quite insensitive to $\lambda$ on the three face databases, namely Yale, AR and FERET, over part of its range, so a single value of $\lambda$ is suggested as optimal for all three face databases. However, for the Caltech256 database, the optimal $\lambda$ is much greater. The setting of $\lambda$ on the Caltech256 database differs from the ones on the three face databases because their features are different: the feature of the face databases is simply the gray scale, while the feature of the Caltech256 database is Picodes. Similarly, SGC is quite insensitive to $\mu$ when its value is greater than 1. From the observations, we conclude that the same value of $\mu$ is optimal for all four databases.

(a) The effect of $\lambda$
(b) The effect of $\mu$
Figure 5: The effects of parameters to the classification performance.

5 Conclusion

We introduced Sparse Representation (SR) into Graph-based Transduction (GT) and presented a novel GT-based classifier called Sparse Graph-based Classifier (SGC) for image classification. In SGC, SR is utilized to measure the correlation between every pair of samples. Then a sparse graph is constructed to depict these correlations. Finally, the graph Laplacian of this graph is plugged into the GT framework to infer the labels of the unlabeled samples. According to the theoretical analysis and the experimental verification on four popular image databases, we conclude that SGC incorporates the advantages of both SR and GT. SGC is a very flexible framework, since its parts are all replaceable, so there is a lot of interesting work that can be done based on SGC. For example, to enhance SGC, one can design a different classification error function or utilize other, more advanced regression techniques instead of SR to construct a high quality graph.


The work described in this paper was partially supported by the National Natural Science Foundation of China (No. 60975015 and 61173131) and the Fundamental Research Funds for the Central Universities (No. CDJXS11181162). The authors would like to thank the anonymous reviewers and editors for their useful comments.


  • (1) J. Wright, A. Y. Yang, A. Ganesh, S. S. Sastry, Y. Ma, Robust face recognition via sparse representation, IEEE Transactions on Pattern Analysis and Machine Intelligence 31 (2) (2009) 210–227.
  • (2) P. Ma, D. Yang, Y. Ge, X. Zhang, Y. Qu, S. Huang, J. Lu, Robust face recognition via gradient-based sparse representation, Journal of Electronic Imaging 22 (1) (2013) 013018–013018.
  • (3) E. Elhamifar, R. Vidal, Sparse subspace clustering, in: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2009, pp. 2790–2797.

  • (4) R. Timofte, L. Van Gool, Sparse representation based projections, in: British machine vision conference (BMVC), 2011, pp. 61–1.
  • (5) S. Gao, I. W.-H. Tsang, L.-T. Chia, Kernel sparse representation for image classification and face recognition, in: European Conference on Computer Vision (ECCV), 2010, pp. 1–14.
  • (6) J. Yu, D. Tao, M. Wang, Adaptive hypergraph learning and its application in image classification, IEEE Transactions on Image Processing 21 (7) (2012) 3262–3272.
  • (7) D. Zhou, J. Huang, B. Schölkopf, Learning with hypergraphs: Clustering, classification, and embedding, in: Advances in neural information processing systems (NIPS), 2006, pp. 1601–1608.
  • (8) D. L. Donoho, For most large underdetermined systems of linear equations the minimal L1-norm solution is also the sparsest solution, Communications on pure and applied mathematics 59 (6) (2006) 797–829.
  • (9) O. Duchenne, J.-Y. Audibert, R. Keriven, J. Ponce, F. Ségonne, Segmentation by transduction, in: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2008, pp. 1–8.
  • (10) S. Huang, D. Yang, Y. Ge, D. Zhao, X. Feng, Discriminant hyper-laplacian projections with its application to face recognition, in: IEEE International Conference on Multimedia and Expo Workshops (ICMEW), 2014, pp. 1–6.
  • (11) J. Wang, T. Jebara, S.-F. Chang, Graph transduction via alternating minimization, in: International conference on Machine learning, 2008, pp. 1144–1151.
  • (12) X. Zhu, Z. Ghahramani, J. Lafferty, et al., Semi-supervised learning using gaussian fields and harmonic functions, in: International Conference on Machine Learning (ICML), Vol. 3, 2003, pp. 912–919.
  • (13) B. Cheng, J. Yang, S. Yan, Y. Fu, T. S. Huang, Learning with L1-graph for image analysis, IEEE Transactions on Image Processing 19 (4) (2010) 858–866.
  • (14) P. J. Phillips, H. Wechsler, J. Huang, P. J. Rauss, The feret database and evaluation procedure for face-recognition algorithms, Image and vision computing 16 (5) (1998) 295–306.
  • (15) L. Qiao, S. Chen, X. Tan, Sparsity preserving projections with applications to face recognition, Pattern Recognition 43 (1) (2010) 331–341.
  • (16) L. Zhuang, H. Gao, Z. Lin, Y. Ma, X. Zhang, N. Yu, Non-negative low rank and sparse graph for semi-supervised learning, in: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2012, pp. 2328–2335.
  • (17) A. M. Martínez, A. C. Kak, PCA versus LDA, IEEE Transactions on Pattern Analysis and Machine Intelligence 23 (2) (2001) 228–233.
  • (18) A. Martínez, R. Benavente, The AR Face Database (Jun 1998).
  • (19) G. Griffin, A. Holub, P. Perona, Caltech-256 object category dataset.
  • (20) S. Agarwal, D. Roth, Learning a sparse representation for object detection, in: European Conference on Computer Vision (ECCV), 2002, pp. 113–127.
  • (21) X.-T. Yuan, X. Liu, S. Yan, Visual classification with multitask joint sparse representation, IEEE Transactions on Image Processing 21 (10) (2012) 4349–4360.
  • (22) C.-Y. Lu, H. Min, J. Gui, L. Zhu, Y.-K. Lei, Face recognition via weighted sparse representation, Journal of Visual Communication and Image Representation 24 (2) (2013) 111–116.
  • (23) X. Peng, L. Zhang, Z. Yi, Scalable sparse subspace clustering, in: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2013, pp. 430–437.
  • (24) G. Liu, Z. Lin, Y. Yu, Robust subspace segmentation by low-rank representation, in: International Conference on Machine Learning (ICML), 2010, pp. 663–670.
  • (25) M. Orbach, K. Crammer, Graph-based transduction with confidence, in: Machine Learning and Knowledge Discovery in Databases, 2012, pp. 323–338.
  • (26) X. Yang, X. Bai, L. J. Latecki, Z. Tu, Improving shape retrieval by learning graph transduction, in: European Conference on Computer Vision (ECCV), 2008, pp. 788–801.
  • (27) E. Amaldi, V. Kann, On the approximability of minimizing nonzero variables or unsatisfied relations in linear systems, Theoretical Computer Science 209 (1) (1998) 237–260.
  • (28) X.-T. Yuan, T. Zhang, Truncated power method for sparse eigenvalue problems, The Journal of Machine Learning Research 14 (1) (2013) 899–925.

  • (29) L. Xu, S. Zheng, J. Jia, Unnatural l0 sparse representation for natural image deblurring, in: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2013, pp. 1107–1114.
  • (30) J. Liu, S. Ji, J. Ye, SLEP: Sparse learning with efficient projections, Arizona State University 6.
  • (31) M. Belkin, P. Niyogi, Laplacian eigenmaps for dimensionality reduction and data representation, Neural computation 15 (6) (2003) 1373–1396.
  • (32) J. Shi, J. Malik, Normalized cuts and image segmentation, IEEE Transactions on Pattern Analysis and Machine Intelligence 22 (8) (2000) 888–905.
  • (33) I. Naseem, R. Togneri, M. Bennamoun, Linear regression for face recognition, IEEE Transactions on Pattern Analysis and Machine Intelligence 32 (11) (2010) 2106–2112.

  • (34) A. Bergamo, L. Torresani, A. W. Fitzgibbon, Picodes: Learning a compact code for novel-category recognition, in: Advances in Neural Information Processing Systems (NIPS), 2011, pp. 2088–2096.
  • (35) D. Zhang, M. Yang, X. Feng, Sparse representation or collaborative representation: Which helps face recognition?, in: IEEE International Conference on Computer Vision (ICCV), 2011, pp. 471–478.
  • (36) C.-C. Chang, C.-J. Lin, LIBSVM: A library for support vector machines, ACM Transactions on Intelligent Systems and Technology 2 (2011) 27:1–27:27.