1 Introduction
Image classification, which automatically assigns an unknown image to a category according to its visual content, has been a major research direction in computer vision. Image classification faces two major challenges. First, each image may contain multiple objects with similar low-level features, so it is hard to categorize the image accurately using global statistical information such as color or texture histograms. Second, even a medium-sized grayscale image corresponds to a very high-dimensional vector (one dimension per pixel), which raises scalability issues for image classification techniques. To address these problems, numerous approaches Mendoza2012 ; Lu2013 have been proposed in the past decade, among which one of the most popular is Bag-of-Features (BOF), also called Bag-of-Words (BOW). BOW originates from document analysis Joachims1996 ; Blei2003 . It models each document as the joint probability distribution of a collection of words.
Sivic2003 ; Csurka2004 ; Fei2005 incorporated the insights of BOW into image analysis by treating each image as a collection of unordered appearance descriptors extracted from local patches. Each descriptor is quantized into a discrete ‘visual word’ with respect to a given codebook (i.e., dictionary), and a compact histogram representation is then computed for semantic image classification. The huge success of BOF has inspired many works Grauman2005 ; Bolovinou2013 . In particular, Lazebnik et al. Lazebnik2006 proposed Spatial Pyramid Matching (SPM), which divides each image into blocks at different scales, computes the histogram of local features inside each block, and finally concatenates all the histograms to represent the image. SPM has been a major component of most state-of-the-art systems, such as Yu2009 ; Wang2010 ; Yu2011 ; Gao2013TIP ; Zhou2013 , and has achieved considerably improved performance on a range of image classification benchmarks such as the Columbia Object Image Library (COIL100) COIL100 and Caltech101 Fei2006 . However, to obtain good performance, SPM has to pass the obtained representation to a Support Vector Machine (SVM) classifier with a nonlinear Mercer kernel, e.g., the intersection kernel. This raises scalability issues for SPM in practical applications, since training a nonlinear kernel SVM typically scales between quadratically and cubically with the number of training samples.
To make SPM efficient, Yang et al. Yang2009 proposed using sparse coding Gao2013TIP instead of k-means based vector quantization to encode each Scale-Invariant Feature Transform (SIFT) descriptor Lowe2004 over a codebook. Benefiting from the nonlinear structure of sparse representation, Yang's method (namely ScSPM) with a linear SVM obtains higher classification accuracy than the traditional nonlinear SPM method, while taking less time for training and testing. However, as Liu2013 pointed out, sparse representation encodes each data point independently and thus cannot capture the class structure. Moreover, due to the high computational complexity of sparse coding, running ScSPM becomes a daunting task on large data sets (i.e., when the number of blocks is large). To solve these two problems, this paper proposes using Low Rank Representation (LRR) rather than sparse coding to hierarchically encode each SIFT descriptor. To the best of our knowledge, this is the first work to formulate image classification as an LRR problem under the framework of SPM.
Our method is motivated by the fact that each subject consists of multiple images and each image consists of multiple local descriptors. The number of subjects is much smaller than the number of descriptors, and therefore the representation of the descriptors is naturally low rank. In incorporating LRR into SPM, this paper focuses on the following problems: (1) The computational complexity of the traditional LRR is comparable to that of sparse coding. In other words, simply replacing sparse representation with LRR cannot solve the scalability issue of ScSPM. To address this problem, we propose a fast version of LRR based on the equivalence theory between the nuclear norm and the Frobenius norm Favaro2011 ; Peng2012 ; Zhang2014 . Our method has a closed-form solution and thus can be computed very fast. Moreover, the method can be run online, which makes it possible to handle incremental data. (2) Most recent LRR works use the inputs themselves as the codebook (so-called self-expression), which is not suitable for the classification scenario. To address this problem, we propose a new objective function and derive the corresponding optimal solution. (3) The codebook of the original LRR technique Liu2013 may contain various errors such as Gaussian noise. In this paper, we calculate the representation of each descriptor using a clean codebook. Extensive experimental results show that the proposed method, namely LrrSPM, achieves competitive results on several image databases and is much faster than ScSPM. Figure 1 shows a schematic comparison of the original SPM, ScSPM, and LrrSPM.
The rest of the paper is organized as follows: Section 2 briefly reviews two classic image classification methods, i.e., Spatial Pyramid Matching (SPM) Lazebnik2006 and Sparse coding based Spatial Pyramid Matching (ScSPM) Yang2009 . Section 3 presents our method (LrrSPM), which uses multiple-scale low rank representation rather than vector quantization or sparse coding to represent each image. Moreover, a fast and online low rank representation method is introduced. Section 4 reports experiments on seven image data sets in comparison with several popular approaches. Finally, Section 5 concludes this paper.
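Table 1: Notations used in this paper.
Notation  Definition  
$n$  the number of descriptors (features)  
$l$  the scale or resolution of a given image  
$m$  the dimensionality of the descriptors  
$s$  the number of subjects  
$k$  the size of the codebook  
$r$  the rank of a given matrix  
$\mathcal{I}$  an image  
$\mathbf{Y}\in\mathbb{R}^{m\times n}$  a set of features  
$\mathbf{D}\in\mathbb{R}^{m\times k}$  the codebook  
$\mathbf{C}\in\mathbb{R}^{k\times n}$  the representation of $\mathbf{Y}$ over $\mathbf{D}$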
Notations: Lowercase bold letters represent column vectors and uppercase bold letters denote matrices. $\mathbf{A}^T$ and $\mathbf{A}^{\dagger}$ denote the transpose and the pseudoinverse of a matrix $\mathbf{A}$, respectively, and $\mathbf{I}$ denotes the identity matrix. Table 1 summarizes some notations used throughout the paper.
2 Related works
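Let $\mathbf{Y} = [\mathbf{y}_1, \mathbf{y}_2, \dots, \mathbf{y}_n] \in \mathbb{R}^{m\times n}$ be a collection of descriptors, where each column vector $\mathbf{y}_i \in \mathbb{R}^{m}$ represents a feature vector. Spatial Pyramid Matching (SPM) Lazebnik2006 applies Vector Quantization (VQ) to encode $\mathbf{Y}$ via
$$\min_{\mathbf{D},\mathbf{C}} \sum_{i=1}^{n} \|\mathbf{y}_i - \mathbf{D}\mathbf{c}_i\|_2^2 \quad \text{s.t.}\ \mathrm{Card}(\mathbf{c}_i) = 1,\ \|\mathbf{c}_i\|_1 = 1,\ \mathbf{c}_i \succeq \mathbf{0},\ \forall i, \tag{1}$$
where $\|\cdot\|_2$ denotes the $\ell_2$ norm, $\mathbf{c}_i \in \mathbb{R}^{k}$ is the representation (also called the cluster assignment) of $\mathbf{y}_i$, the constraints guarantee that exactly one entry of $\mathbf{c}_i$ equals one and the rest are zero, and $\mathbf{D} \in \mathbb{R}^{m\times k}$ denotes the codebook.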
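In the training phase, $\mathbf{D}$ and $\mathbf{C}$ are solved iteratively, and VQ is equivalent to the classic k-means clustering algorithm, which aims to solve
$$\min_{\mathbf{D}} \sum_{i=1}^{n} \min_{j=1,\dots,k} \|\mathbf{y}_i - \mathbf{d}_j\|_2^2, \tag{2}$$
where $\mathbf{D} = [\mathbf{d}_1, \dots, \mathbf{d}_k]$ consists of $k$ cluster centers identified from $\mathbf{Y}$.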
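The training and testing phases of VQ can be sketched in NumPy as a minimal illustration: Lloyd's k-means alternates a hard assignment step (codes restricted as in (1)) and a center update (the centers of (2)), and the same nearest-center rule codes new descriptors at test time. The function names `vq_encode` and `kmeans_codebook`, the random initialization, and the fixed iteration count are our own assumptions, not details of SPM Lazebnik2006.

```python
import numpy as np

def vq_encode(Y, D):
    """Hard VQ: column i of C has a single 1 at the center nearest to y_i."""
    # Squared Euclidean distances ||y_i - d_j||^2 for all pairs, shape (k, n).
    d2 = (D ** 2).sum(axis=0)[:, None] - 2.0 * D.T @ Y + (Y ** 2).sum(axis=0)[None, :]
    C = np.zeros_like(d2)
    C[d2.argmin(axis=0), np.arange(Y.shape[1])] = 1.0
    return C

def kmeans_codebook(Y, k, n_iter=20, seed=0):
    """Lloyd's k-means on the columns of Y (m x n); returns D (m x k)."""
    rng = np.random.default_rng(seed)
    D = Y[:, rng.choice(Y.shape[1], size=k, replace=False)].copy()  # init: k random descriptors
    for _ in range(n_iter):
        labels = vq_encode(Y, D).argmax(axis=0)   # assignment step (hard codes)
        for j in range(k):                        # update step (cluster means)
            members = Y[:, labels == j]
            if members.shape[1] > 0:              # keep the old center if a cluster empties
                D[:, j] = members.mean(axis=1)
    return D

# Usage with random stand-in "descriptors" (128-D, like SIFT).
rng = np.random.default_rng(1)
Y = rng.normal(size=(128, 500))
D = kmeans_codebook(Y, k=16)
C = vq_encode(Y, D)      # each column is a one-hot cluster assignment
```

Each column of `C` carries a single 1, which is exactly the information loss discussed next.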
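In the testing phase, each $\mathbf{y}_i$ is simply assigned to its nearest center $\mathbf{d}_j$. Since each $\mathbf{c}_i$ has only one nonzero element, VQ discards much of the information in $\mathbf{y}_i$ (the so-called hard coding problem). Yang et al. Yang2009 proposed ScSPM, which uses sparse representation to encode each $\mathbf{y}_i$ via
$$\min_{\mathbf{D},\mathbf{C}} \sum_{i=1}^{n} \left(\|\mathbf{y}_i - \mathbf{D}\mathbf{c}_i\|_2^2 + \lambda\|\mathbf{c}_i\|_1\right) \quad \text{s.t.}\ \|\mathbf{d}_j\|_2 \le 1,\ \forall j, \tag{3}$$
where $\|\cdot\|_1$ denotes the $\ell_1$ norm, which sums the absolute values of a vector, $\lambda > 0$ is the sparsity parameter, and the norm constraint on each codeword $\mathbf{d}_j$ prevents the trivial solution in which $\mathbf{D}$ grows while $\mathbf{C}$ shrinks.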
The advantage of ScSPM is that the sparse representation has a small number of nonzero entries and can represent $\mathbf{y}_i$ more accurately, with smaller reconstruction error. Extensive studies Wang2010 ; Yang2009 have shown that ScSPM with a linear SVM is superior to the original SPM with a nonlinear SVM on a range of databases. The disadvantage of ScSPM is that each data point is encoded independently, so the sparse representation of $\mathbf{Y}$ cannot reflect the class structure. Moreover, the computational cost of sparse coding is very high, so even a medium-sized data set raises scalability issues for ScSPM.
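To illustrate the coding step of (3) with the codebook held fixed, the sketch below uses ISTA (iterative shrinkage-thresholding), a simple proximal-gradient solver swapped in for illustration; it is not claimed to be the solver used in Yang2009. Note that the update of each column of $\mathbf{C}$ never looks at the other columns, which is the independence criticized above. The step size rule, iteration count, and random data are our assumptions.

```python
import numpy as np

def soft_threshold(X, t):
    """Element-wise proximal operator of t*||.||_1."""
    return np.sign(X) * np.maximum(np.abs(X) - t, 0.0)

def sparse_code_ista(Y, D, lam=0.1, n_iter=300):
    """Approximately solve min_C sum_i ||y_i - D c_i||_2^2 + lam*||c_i||_1 for a fixed D."""
    L = 2.0 * np.linalg.norm(D, 2) ** 2          # Lipschitz constant of the data-term gradient
    C = np.zeros((D.shape[1], Y.shape[1]))
    for _ in range(n_iter):
        grad = 2.0 * D.T @ (D @ C - Y)           # gradient of sum_i ||y_i - D c_i||_2^2
        C = soft_threshold(C - grad / L, lam / L)
    return C

def objective(Y, D, C, lam):
    return np.sum((Y - D @ C) ** 2) + lam * np.abs(C).sum()

rng = np.random.default_rng(0)
D = rng.normal(size=(128, 256))
D /= np.linalg.norm(D, axis=0)                   # unit-norm codewords, so ||d_j||_2 <= 1 holds
Y = rng.normal(size=(128, 50))
C = sparse_code_ista(Y, D, lam=0.5)
print("average nonzeros per code:", (C != 0).sum(axis=0).mean())
```

Every iteration multiplies by both $\mathbf{D}$ and $\mathbf{D}^T$, and hundreds of iterations may be needed per batch, which is one way to see why sparse coding becomes costly at scale.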
3 Fast Low Rank Representation Learning for Spatial Pyramid Matching
LRR seeks the low rank representation of a given data set. It can capture the relations among different subjects and thus provides a better representation. LRR has been widely studied in image clustering Xiao2014 , semi-supervised classification Yang2014 , subspace learning Liu2011 , and so on.
In this work, we propose an approach called Low Rank Representation based Spatial Pyramid Matching (LrrSPM), which uses the multiple-scale LRR of the SIFT descriptors as feature vectors to train and test a linear SVM classifier. Our method (see Figure 2 for the flow chart of the algorithm) is based on the fact that each subject consists of multiple images and each image consists of multiple local descriptors. The number of subjects is much smaller than the number of descriptors, and thus the representation of the descriptors is low rank. We aim to solve
$$\min_{\mathbf{C}} \ \mathrm{rank}(\mathbf{C}) \quad \text{s.t.}\ \mathbf{Y} = \mathbf{D}\mathbf{C}, \tag{4}$$
where $\mathbf{Y} \in \mathbb{R}^{m\times n}$ denotes the collection of the SIFT descriptors, $\mathbf{C} \in \mathbb{R}^{k\times n}$ denotes the representation of $\mathbf{Y}$ over the codebook $\mathbf{D} \in \mathbb{R}^{m\times k}$, and $\mathbf{D}$ generally consists of $k$ cluster centers.
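A small numerical check of what (4) asks for in the error-free case: because $\mathrm{rank}(\mathbf{Y}) = \mathrm{rank}(\mathbf{D}\mathbf{C}) \le \mathrm{rank}(\mathbf{C})$, no feasible $\mathbf{C}$ can have rank below $\mathrm{rank}(\mathbf{Y})$, and when $\mathbf{D}$ has full row rank the minimum-Frobenius-norm solution $\mathbf{D}^{\dagger}\mathbf{Y}$ is feasible and has rank at most $\mathrm{rank}(\mathbf{Y})$, so it attains the bound. The sizes below (128-D descriptors, an overcomplete random 256-atom codebook, rank-5 codes) are illustrative assumptions, not the paper's data.

```python
import numpy as np

rng = np.random.default_rng(0)
m, k, n, r = 128, 256, 1000, 5   # descriptor dim, codebook size, #descriptors, code rank (illustrative)

D = rng.normal(size=(m, k))                              # overcomplete stand-in codebook (k > m)
C0 = rng.normal(size=(k, r)) @ rng.normal(size=(r, n))   # some rank-r codes
Y = D @ C0                                               # error-free descriptors: Y = D C0

D_pinv = np.linalg.pinv(D)
C_hat = D_pinv @ Y                                       # minimum-Frobenius-norm solution of Y = D C

# Any null-space component of D can be added without breaking Y = D C.
N = (np.eye(k) - D_pinv @ D) @ rng.normal(size=(k, n))
C_other = C_hat + N

rank = np.linalg.matrix_rank
print("rank(Y):", rank(Y), " rank(C_hat):", rank(C_hat), " rank(C_other):", rank(C_other))
print("Frobenius norms:", np.linalg.norm(C_hat), "<", np.linalg.norm(C_other))
```

Both `C_hat` and `C_other` reproduce $\mathbf{Y}$ exactly, but only the pseudoinverse solution keeps the rank at $\mathrm{rank}(\mathbf{Y})$; this is the kind of link between Frobenius-norm and rank minimization that the fast method below relies on.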
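Since the rank operator is nonconvex and discontinuous, we use the nuclear norm as its convex relaxation, following the theoretical results of Recht2010 . Moreover, since $\mathbf{Y}$ may contain errors (e.g., noise), we aim to solve
$$\min_{\mathbf{C},\mathbf{E}} \ \|\mathbf{C}\|_* + \lambda\|\mathbf{E}\|_p \quad \text{s.t.}\ \mathbf{Y} = \mathbf{D}\mathbf{C} + \mathbf{E}, \tag{5}$$
where $\|\cdot\|_*$ denotes the nuclear norm (the sum of the singular values of a matrix), $\mathbf{E}$ denotes the errors, $\|\cdot\|_p$ denotes a norm chosen to model them, and $\lambda > 0$ balances the two terms.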
The major difference between our coding method (5) and the existing LRR methods lies in the objective function. Liu2013 ; Liu2011 use the input as the codebook and thus perform encoding with a corrupted codebook, i.e., their constraint is $\mathbf{Y} = \mathbf{Y}\mathbf{C} + \mathbf{E}$ instead of $\mathbf{Y} = \mathbf{D}\mathbf{C} + \mathbf{E}$. Different objective functions result in different optimization algorithms and results. We argue that a clean codebook provides better representative ability.
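To solve (5), the Augmented Lagrange Multiplier (ALM) method is adopted, which minimizes
$$\mathcal{L}(\mathbf{C},\mathbf{E},\mathbf{\Lambda}) = \|\mathbf{C}\|_* + \lambda\|\mathbf{E}\|_p + \langle \mathbf{\Lambda}, \mathbf{Y} - \mathbf{D}\mathbf{C} - \mathbf{E} \rangle + \frac{\mu}{2}\|\mathbf{Y} - \mathbf{D}\mathbf{C} - \mathbf{E}\|_F^2, \tag{6}$$
where $\|\cdot\|_F$ denotes the Frobenius norm, $\lambda$ is a balance parameter, $\mathbf{\Lambda}$ is the Lagrange multiplier, $\langle\cdot,\cdot\rangle$ denotes the matrix inner product, and $\mu > 0$ is a penalty parameter.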
ALM solves (6) with respect to $\mathbf{C}$ and $\mathbf{E}$ iteratively. The optimization involves several variables that must be updated alternately, so it is inefficient in large-scale settings. Furthermore, (6) is an offline process: for any datum not included in $\mathbf{Y}$, the above formulation cannot obtain its representation without being solved again.
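For context on the cost argument above: ALM-type solvers for nuclear-norm problems such as (6) typically evaluate the proximal operator of the nuclear norm, singular value thresholding (SVT), at every iteration, and each evaluation needs an SVD, which for a $k \times n$ matrix with $k \le n$ costs on the order of $k^2 n$ operations. The sketch shows that operator only; it is not a full ALM solver for (6), and the matrix sizes and threshold are illustrative assumptions.

```python
import numpy as np

def singular_value_threshold(A, t):
    """Proximal operator of t*||.||_*: argmin_X t*||X||_* + 0.5*||X - A||_F^2."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)   # one SVD per call: the costly step
    return (U * np.maximum(s - t, 0.0)) @ Vt            # shrink singular values, then rebuild

rng = np.random.default_rng(0)
A = rng.normal(size=(256, 2000))   # e.g., a k x n code matrix (illustrative sizes)
X = singular_value_threshold(A, t=40.0)
print("rank before:", np.linalg.matrix_rank(A), " after:", np.linalg.matrix_rank(X))
```

Repeating such an SVD at every ALM iteration, and re-solving whenever new descriptors arrive, is what the closed-form approach below avoids.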
To solve these two problems, we propose an approximate LRR method based on the equivalence theory between the nuclear norm and the Frobenius norm given by Favaro2011 ; Peng2012 ; Zhang2014 . Zhang2014 proves that Frobenius-norm minimization and nuclear-norm minimization have the same unique solution in the error-free case (i.e., $\mathbf{E} = \mathbf{0}$). Peng2012 shows theoretically and experimentally that the Frobenius norm is equivalent to the truncated nuclear norm Favaro2011 when $\mathbf{E} \neq \mathbf{0}$. In other words, one can obtain the lowest rank representation by solving a Frobenius-norm based objective function. Hence, LrrSPM solves
$$\min_{\mathbf{C}} \ \frac{1}{2}\|\mathbf{Y} - \mathbf{D}\mathbf{C}\|_F^2 + \frac{\lambda}{2}\|\mathbf{C}\|_F^2, \tag{7}$$
where $\lambda > 0$ is a regularization parameter.
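Setting the gradient of (7), $\mathbf{D}^T(\mathbf{D}\mathbf{C} - \mathbf{Y}) + \lambda\mathbf{C}$, to zero gives the closed-form solution $\mathbf{C}^* = (\mathbf{D}^T\mathbf{D} + \lambda\mathbf{I})^{-1}\mathbf{D}^T\mathbf{Y}$. For an incremental datum $\mathbf{y}$, the corresponding code is $\mathbf{c} = (\mathbf{D}^T\mathbf{D} + \lambda\mathbf{I})^{-1}\mathbf{D}^T\mathbf{y}$. When $\mathbf{E} = \mathbf{0}$, this solution is the desired LRR, which is also known as Collaborative Representation (CR) Zhang2011 . CR has been extensively investigated in Zhang2011 ; Peng2014 ; Wei2014 and has achieved considerable success in face recognition, palm recognition, and so on. In practice, however, $\mathbf{Y}$ often contains various errors (i.e., $\mathbf{E} \neq \mathbf{0}$), so the solution of (7) is no longer the lowest rank one. To obtain the lowest rank representation in this case, we threshold the trivial entries of each $\mathbf{c}_i$ based on the theoretical results of Favaro2011 ; Peng2012 :
$$\mathbf{c}_i \leftarrow \mathcal{H}_{\tau}(\mathbf{c}_i), \tag{8}$$
where $\mathcal{H}_{\tau}(\cdot)$ denotes the thresholding operator that sets the trivial entries of $\mathbf{c}_i$ to zero and $\tau$ is the thresholding parameter.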
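The closed form of (7) turns coding into a single matrix product, as in the NumPy sketch below: the projection matrix is computed once, after which a batch of descriptors and an incremental descriptor are coded the same way, with no re-optimization. The exact thresholding rule in (8) is not recoverable from this text, so `threshold_codes` is a hypothetical stand-in that keeps the $\lceil \tau k \rceil$ largest-magnitude entries of each code; the values of $\lambda$ and $\tau$ and the random data are illustrative only.

```python
import numpy as np

def lrr_projection(D, lam):
    """P = (D^T D + lam*I)^{-1} D^T, so C = P @ Y is the closed-form minimizer of (7)."""
    k = D.shape[1]
    return np.linalg.solve(D.T @ D + lam * np.eye(k), D.T)

def threshold_codes(C, tau):
    """HYPOTHETICAL stand-in for H_tau: per column, keep the ceil(tau*k) largest-magnitude entries."""
    k, n = C.shape
    keep = int(np.ceil(tau * k))
    rows = np.argsort(-np.abs(C), axis=0)[:keep]   # (keep, n) row indices of the largest |entries|
    cols = np.arange(n)                             # broadcasts against rows
    out = np.zeros_like(C)
    out[rows, cols] = C[rows, cols]
    return out

rng = np.random.default_rng(0)
lam, tau = 1.0, 0.75                      # illustrative values only
D = rng.normal(size=(128, 256))           # stand-in codebook (m = 128, k = 256)
Y = rng.normal(size=(128, 1000))          # stand-in SIFT descriptors

P = lrr_projection(D, lam)                # computed once
C = P @ Y                                 # batch coding via the closed form of (7)
y_new = rng.normal(size=128)              # an incremental descriptor not in Y
c_new = P @ y_new                         # online coding: no re-optimization needed
C_thr = threshold_codes(C, tau)           # (8) under the assumed H_tau
```

Using `np.linalg.solve` instead of forming the inverse explicitly is a numerical-stability choice; either way, $\mathbf{D}^T\mathbf{D} + \lambda\mathbf{I}$ is $k \times k$ and independent of the number of descriptors.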
LrrSPM then divides each image into blocks at multiple pyramid levels, where $l$ denotes the scale, i.e., the level of the pyramid. For each block at each level, LrrSPM performs max pooling via $\mathbf{z}_b = \max\{|\mathbf{c}^b_1|, |\mathbf{c}^b_2|, \dots, |\mathbf{c}^b_{n_b}|\}$, where $\mathbf{c}^b_j$ denotes the $j$th LRR vector belonging to the $b$th block, $n_b$ is the number of such vectors, and the absolute value and the maximum are taken element-wise. Algorithm 1 summarizes our algorithm. Similar to Lazebnik2006 ; Yang2009 , the codebook can be generated by the k-means clustering method or by dictionary learning methods such as Gao2014TIP . For both training and testing, LrrSPM can obtain the low rank representation online, which further explores the potential of LRR in online and incremental learning. Moreover, our method is very efficient, since its coding process involves only a simple projection operation.
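A NumPy sketch of the pooling step and of how the pooled vectors form the image-level feature for the linear SVM. It assumes the standard SPM layout of Lazebnik2006 (a $2^l \times 2^l$ grid at level $l$, with levels 0 to 2), element-wise max pooling of absolute code values as in ScSPM Yang2009, and concatenation of all block vectors; the function name, the zero vector for empty blocks, and the random inputs are our own assumptions.

```python
import numpy as np

def spatial_pyramid_max_pool(C, xy, width, height, levels=(0, 1, 2)):
    """Pool codes C (k x n) over a spatial pyramid.

    xy holds the (x, y) pixel position of each descriptor, shape (n, 2).
    Level l splits the image into a 2^l x 2^l grid (SPM convention).
    """
    k = C.shape[0]
    pooled = []
    for l in levels:
        cells = 2 ** l
        bx = np.minimum((xy[:, 0] * cells / width).astype(int), cells - 1)   # block column index
        by = np.minimum((xy[:, 1] * cells / height).astype(int), cells - 1)  # block row index
        for row in range(cells):
            for col in range(cells):
                in_block = (bx == col) & (by == row)
                if in_block.any():
                    # element-wise max of |c_j| over the descriptors in this block
                    pooled.append(np.abs(C[:, in_block]).max(axis=1))
                else:
                    pooled.append(np.zeros(k))    # empty block -> zero vector (our choice)
    return np.concatenate(pooled)                 # image-level feature for the linear SVM

rng = np.random.default_rng(0)
C = rng.normal(size=(256, 300))                   # codes of 300 descriptors (k = 256)
xy = rng.uniform(0, [400, 300], size=(300, 2))    # positions inside a 400 x 300 image
z = spatial_pyramid_max_pool(C, xy, width=400, height=300)
print(z.shape)   # (5376,) = (1 + 4 + 16) blocks x 256
```

With three levels the feature has $(1 + 4 + 16)k$ entries regardless of how many descriptors the image contains, which is what allows images of different sizes to share one linear classifier.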
4 Experiments
4.1 Baseline Algorithms and Databases
We implemented and evaluated four classes of SPM methods on seven image databases.¹ Besides our own implementations, we also quote some results directly from the literature.
¹ The MATLAB code and the data sets used can be downloaded at http://goo.gl/sTSa6k.
The implemented methods include BOF Fei2005 with a linear SVM (LinearBOF) and a kernel SVM (KernelBOF), SPM Lazebnik2006 with a linear SVM (LinearSPM) and a kernel SVM (KernelSPM), Sparse Coding based SPM with a linear SVM (ScSPM) Yang2009 , and Locality-constrained Linear Coding with a linear SVM (LLC) Wang2010 .
The databases used include four scene image data sets, two object image data sets (i.e., COIL20 COIL20 and COIL100 COIL100 ), and one facial image database (i.e., Extended Yale B Georghiades2001 ). The scene image data sets are from Oliva and Torralba Oliva2001 , Fei-Fei and Perona Fei2005 , Lazebnik et al. Lazebnik2006 , and Fei-Fei et al. Fei2006 , and are referred to as OT, FP, LS, and Caltech101, respectively. Table 2 summarizes these data sets.
Table 2: A brief summary of the data sets.
Databases  Type  Data Size  Image Size  Subjects  

OT Oliva2001  scene  2688  260–410  8  
FP Fei2005  scene  3860  210–410  13  
LS Lazebnik2006  scene  4486  210–410  15  
Caltech101 Fei2006  scene  9144  31–800  102  
COIL20 COIL20  object  1440  72  20  
COIL100 COIL100  object  7200  72  100  
Extended Yale B Georghiades2001  face  2414  59–64  38 
4.2 Experimental setup
To be consistent with existing works Lazebnik2006 ; Yang2009 , we use dense sampling to divide each image into blocks (patches) with a step size of 6 pixels, where the block size is determined by the scale $l$, and we extract a SIFT descriptor from each block as a feature. To obtain the codebook, we use the k-means clustering algorithm to find 256 cluster centers for each data set and use the same codebook for all algorithms. In each test, we split the images of each subject into two parts, one for training and the other for testing. Following common benchmarking procedures, we repeat each test 5 times with different training/testing partitions and record the average per-subject recognition rate and the time cost of each run. We report the final results as the mean and standard deviation of the recognition rates and the time costs. For the LrrSPM approach, we fixed one of its two parameters and set the other separately for each database. For the competing approaches, we followed the parameter configurations in Lazebnik2006 ; Wang2010 ; Yang2009 .
4.3 Influence of the parameters
LrrSPM has two user-specified parameters: the regularization parameter $\lambda$, which is used to avoid overfitting, and the thresholding parameter $\tau$, which is used to eliminate the effect of the errors. In this section, we investigate the influence of these two parameters on the OT data set by fixing one parameter and reporting the mean classification accuracy of LrrSPM as the other varies. Figure 3 shows the results, from which one can see that LrrSPM is robust to the choice of the parameters. When $\lambda$ increases from 0.2 to 2.0 with an interval of 0.1, the accuracy ranges from 83.68% to 85.63%; when $\tau$ increases from 0.5 to 1.0 with an interval of 0.02, the accuracy ranges from 84.07% to 86.03%.
4.4 Robustness with Respect to the Size of Codebook
In this section, we report the performance of the evaluated methods as the codebook size increases from 256 to 1024. We carried out the experiments on the Caltech101 data set by randomly selecting 30 samples per subject for training and using the rest for testing. A fixed parameter setting is used for LrrSPM. Moreover, we directly quote some state-of-the-art results reported in Lee2009; Kavukcuoglu2010; Zhang2014Learning. Table 3 shows the results, from which we can find that:
Table 3: Classification accuracy and time costs on Caltech101 with different codebook sizes.
Algorithms  Accuracy (%)  Time Costs (seconds)  
256  512  1024  256  512  1024  
LrrSPM (Ours)  
LinearBOF  
KernelBOF  
LinearSPM  
KernelSPM  
ScSPM  
LLC  
State-of-the-art results  
KernelSPM Lazebnik2006          
Convolutional DBN1 Lee2009      60.50       
Convolutional DBN2 Lee2009      65.40       
CNN Kavukcuoglu2010      66.30       
ObjecttoClass Zhang2014Learning      64.26       
Note: DBN is short for deep belief network, and CNN is short for convolutional neural network.

LrrSPM, ScSPM, and LLC are superior to LinearBOF, KernelBOF, LinearSPM, and KernelSPM. LrrSPM achieves results comparable to those of ScSPM and LLC while consuming less time for coding and classification. For example, for one of the tested codebook sizes, the recognition rate of LrrSPM is 28.78% higher than that of LinearBOF, 20.73% higher than that of KernelBOF, 22.36% higher than that of LinearSPM, 12.38% higher than that of KernelSPM, 0.5% higher than that of ScSPM, and lower than that of LLC, whereas LrrSPM takes only about 3% (30%) of the CPU time of ScSPM (LLC).

As the codebook size increases, all evaluated methods achieve better recognition results but take more time for coding and classification. ScSPM, LLC, and LrrSPM use an SVM with a linear kernel, so they take less time to train and test the classifier than KernelBOF and KernelSPM. However, these three methods take more time to encode each SIFT descriptor than KernelBOF and KernelSPM.

We could not reproduce the results reported in the literature for some evaluated methods. A possible reason lies in subtle engineering details; e.g., Lazebnik2006 tested 50 images per subject rather than all of them, Wang2010 ; Yang2009 used a much larger codebook, and codebooks may differ even when they have the same size.
4.5 Scene Classification
This section reports the performance of LrrSPM on three scene image databases. The codebook is generated by the k-means method. For each data set, we randomly chose 100 samples from each subject for training and used the rest for testing.
Table 4: Classification accuracy and time costs on the OT, FP, and LS scene databases.
Algorithms  the OT database  the FP database  the LS database  

Accuracy (%)  Time (s)  Accuracy (%)  Time (s)  Accuracy (%)  Time (s)  
LrrSPM (Ours)  85.63±0.56  80.90±0.75  76.34±0.58  
LinearBOF  
KernelBOF  
LinearSPM  
KernelSPM  
ScSPM  
LLC  76.99±1.21  
Rasiwasia Rasiwasia2009      76.20     
Table 4 shows that LrrSPM is slightly better than the other evaluated algorithms in most tests. Although LrrSPM is not the fastest method, it strikes a good balance between efficiency and classification accuracy. On the OT database, LrrSPM is about 5.49 and 46.07 times faster than ScSPM and LLC, respectively; on the LS database, the speedups are 5.59 and 50.26 times.
4.6 Object and Face Recognition
This section investigates the performance of LrrSPM on two object image data sets (i.e., COIL20 and COIL100) and one facial image database (i.e., Extended Yale Database B). To analyze the time costs of the examined methods, we also report the time costs of the methods for encoding the SIFT descriptors and for classifying the representation using a linear or nonlinear SVM.
Table 5: Recognition rates (%) on COIL20.
Algorithms  Training Images for Each Subject  

10  20  30  40  50  
LrrSPM (Ours)  97.90±0.42  99.52±0.87  100.00±0.03  100.00±0.00  100.00±0.00 
LinearBOF  
KernelBOF  
LinearSPM  
KernelSPM  
ScSPM  100.00±0.01  100.00±0.02  
LLC  100.00±0.07 
Table 6: Recognition rates (%) on COIL100.
Algorithms  Training Images for Each Subject  

10  20  30  40  50  
LrrSPM (Ours)  91.19±0.65  97.39±0.78  99.29±0.21  99.87±0.36  100.00±0.16 
LinearBOF  
KernelBOF  
LinearSPM  
KernelSPM  
ScSPM  
LLC  91.26±0.42 
Table 7: Recognition rates (%) on Extended Yale B.
Algorithms  Training Images for Each Subject  

10  20  30  40  50  
LrrSPM (Ours)  87.08±0.41  96.03±0.89  98.28±0.55  99.23±0.81  99.81±0.83 
LinearBOF  
KernelBOF  
LinearSPM  
KernelSPM  
ScSPM  
LLC 
Table 8: Time costs of coding and classification on COIL20, COIL100, and Extended Yale B.
Algorithms  COIL20  COIL100  Extended Yale B  

Coding  Classification  Coding  Classification  Coding  Classification  
LrrSPM  
LinearBOF  
KernelBOF  
LinearSPM  
KernelSPM  
ScSPM  
LLC 
Tables 5–7 report the recognition rates of the tested approaches on COIL20, COIL100, and Extended Yale B, respectively. In most cases, our method achieves the best results, followed by ScSPM and LLC. When 50 samples per subject of COIL20 and COIL100 are used to train the classifier, LrrSPM correctly classifies all of the remaining images. On Extended Yale B, LrrSPM also classifies almost all samples correctly (a recognition rate of about 99.81%).
Table 8 shows the efficiency of the evaluated methods. One can see that LrrSPM, BOF, and SPM are clearly more efficient than ScSPM and LLC in both encoding and classification. Specifically, the CPU time of LrrSPM is only about 2.35%–3.90% of that of ScSPM and about 5.99%–10.44% of that of LLC.
5 Conclusion
In this paper, we propose a spatial pyramid matching method based on the low rank representation (LRR) of the SIFT descriptors. The proposed method, named LrrSPM, is computationally very efficient while maintaining competitive accuracy on many data sets. LrrSPM formulates the quantization of the SIFT descriptors as a nuclear norm optimization problem and uses the multiple-scale representation to characterize the statistical information of the image. The paper also introduces an approximation method that speeds up the computation of LRR and makes it possible for LRR to handle incremental and large-scale data. Experimental results on several well-known data sets show the good performance of LrrSPM.
References
 (1) N. Acosta-Mendoza, A. Gago-Alonso, J. E. Medina-Pagola, Frequent approximate subgraphs as features for graph-based image classification, Knowledge-Based Systems 27 (2012) 381–392.
 (2) J. Lu, G. Wang, P. Moulin, Image set classification using holistic multiple order statistics features and localized multi-kernel metric learning, in: IEEE International Conference on Computer Vision, 2013, pp. 329–336.
 (3) T. Joachims, A probabilistic analysis of the Rocchio algorithm with TFIDF for text categorization, Tech. rep., DTIC Document (1996).
 (4) D. M. Blei, A. Y. Ng, M. I. Jordan, Latent Dirichlet allocation, Journal of Machine Learning Research 3 (2003) 993–1022.
 (5) J. Sivic, A. Zisserman, Video google: A text retrieval approach to object matching in videos, in: IEEE International Conference on Computer Vision, 2003, pp. 1470–1477.
 (6) G. Csurka, C. Dance, L. Fan, J. Willamowski, C. Bray, Visual categorization with bags of keypoints, in: Workshop on statistical learning in computer vision, ECCV, Vol. 1, 2004, pp. 1–2.
 (7) L. Fei-Fei, P. Perona, A Bayesian hierarchical model for learning natural scene categories, in: IEEE Conference on Computer Vision and Pattern Recognition, Vol. 2, 2005, pp. 524–531.
 (8) K. Grauman, T. Darrell, The pyramid match kernel: Discriminative classification with sets of image features, in: IEEE International Conference on Computer Vision, Vol. 2, 2005, pp. 1458–1465.
 (9) A. Bolovinou, I. Pratikakis, S. Perantonis, Bag of spatio-visual words for context inference in scene classification, Pattern Recognition 46 (3) (2013) 1039–1053.
 (10) S. Lazebnik, C. Schmid, J. Ponce, Beyond bags of features: Spatial pyramid matching for recognizing natural scene categories, in: IEEE Conference on Computer Vision and Pattern Recognition, Vol. 2, 2006, pp. 2169–2178.
 (11) K. Yu, T. Zhang, Y. Gong, Nonlinear learning using local coordinate coding, in: Advances in Neural Information Processing Systems, 2009, pp. 2223–2231.
 (12) J. Wang, J. Yang, K. Yu, F. Lv, T. Huang, Y. Gong, Locality-constrained linear coding for image classification, in: IEEE Conference on Computer Vision and Pattern Recognition, 2010, pp. 3360–3367.
 (13) K. Yu, Y. Lin, J. Lafferty, Learning image representations from the pixel level via hierarchical sparse coding, in: IEEE Conference on Computer Vision and Pattern Recognition, 2011, pp. 1713–1720.
 (14) S. Gao, I. W.-H. Tsang, L.-T. Chia, Sparse representation with kernels, IEEE Transactions on Image Processing 22 (2) (2013) 423–434.
 (15) L. Zhou, Z. Zhou, D. Hu, Scene classification using a multi-resolution bag-of-features model, Pattern Recognition 46 (1) (2013) 424–433.
 (16) L. Fei-Fei, R. Fergus, P. Perona, One-shot learning of object categories, IEEE Transactions on Pattern Analysis and Machine Intelligence 28 (4) (2006) 594–611.
 (17) J. Yang, K. Yu, Y. Gong, T. Huang, Linear spatial pyramid matching using sparse coding for image classification, in: IEEE Conference on Computer Vision and Pattern Recognition, 2009, pp. 1794–1801.
 (18) D. G. Lowe, Distinctive image features from scale-invariant keypoints, International Journal of Computer Vision 60 (2) (2004) 91–110.
 (19) G. Liu, Z. Lin, S. Yan, J. Sun, Y. Yu, Y. Ma, Robust recovery of subspace structures by low-rank representation, IEEE Transactions on Pattern Analysis and Machine Intelligence 35 (1) (2013) 171–184.
 (20) P. Favaro, R. Vidal, A. Ravichandran, A closed form solution to robust subspace estimation and clustering, in: IEEE Conference on Computer Vision and Pattern Recognition, 2011, pp. 1801–1807.
 (21) X. Peng, L. Zhang, Z. Yi, Constructing L2-graph for subspace learning and segmentation, arXiv preprint arXiv:1209.0841.
 (22) H. Zhang, Z. Yi, X. Peng, fLRR: fast low-rank representation using Frobenius-norm, Electronics Letters 50 (13) (2014) 936–938.
 (23) S. Xiao, M. Tan, D. Xu, Weighted block-sparse low rank representation for face clustering in videos, in: European Conference on Computer Vision, 2014, pp. 123–138.
 (24) S. Yang, Z. Feng, Y. Ren, H. Liu, L. Jiao, Semi-supervised classification via kernel low-rank representation graph, Knowledge-Based Systems, doi:http://dx.doi.org/10.1016/j.knosys.2014.06.007.
 (25) G. Liu, S. Yan, Latent low-rank representation for subspace segmentation and feature extraction, in: IEEE International Conference on Computer Vision, 2011, pp. 1615–1622.
 (26) B. Recht, M. Fazel, P. Parrilo, Guaranteed minimum-rank solutions of linear matrix equations via nuclear norm minimization, SIAM Review 52 (3) (2010) 471–501.
 (27) D. Zhang, M. Yang, X. Feng, Sparse representation or collaborative representation: Which helps face recognition?, in: IEEE International Conference on Computer Vision, 2011, pp. 471–478.
 (28) X. Peng, L. Zhang, Z. Yi, K. K. Tan, Learning locality-constrained collaborative representation for robust face recognition, Pattern Recognition 47 (9) (2014) 2794–2806.
 (29) L. Wei, F. Xu, J. Yin, A. Wu, Kernel locality-constrained collaborative representation based discriminant analysis, Knowledge-Based Systems, doi:http://dx.doi.org/10.1016/j.knosys.2014.06.027.
 (30) S. Gao, I.-H. Tsang, Y. Ma, Learning category-specific dictionary and shared dictionary for fine-grained image categorization, IEEE Transactions on Image Processing 23 (2) (2014) 623–634.
 (31) S. A. Nene, S. K. Nayar, H. Murase, et al., Columbia object image library (COIL-20), Tech. rep., Technical Report CUCS-005-96 (1996).
 (32) S. K. Nayar, S. A. Nene, H. Murase, Columbia object image library (COIL-100), Department of Comp. Science, Columbia University, Tech. Rep. CUCS-006-96.
 (33) A. S. Georghiades, P. N. Belhumeur, D. J. Kriegman, From few to many: Illumination cone models for face recognition under variable lighting and pose, IEEE Transactions on Pattern Analysis and Machine Intelligence 23 (6) (2001) 643–660.
 (34) A. Oliva, A. Torralba, Modeling the shape of the scene: A holistic representation of the spatial envelope, International Journal of Computer Vision 42 (3) (2001) 145–175.