Face recognition is one of the hottest research topics in computer vision due to its wide range of applications, from public security to personal consumer electronics. Although significant improvement has been achieved in the past decades, a reliable face recognition system for real-life environments is still very challenging to build due to the large intra-class facial variations, such as expression, illumination, pose and aging, and the small inter-class facial differences.
For a face recognition system, face representation and classifier construction are the two key factors. Face representations can be divided into two categories: holistic feature based and local feature based. Principal Component Analysis (PCA) based Eigenface  and Linear Discriminant Analysis (LDA) based Fisherface  are the two most famous holistic face representations. PCA projects the face image into a subspace such that the most variation is kept, which is optimal in terms of face reconstruction. LDA considers the label information of the training data and linearly projects the face image into a subspace such that the ratio of the between-class scatter over the within-class scatter is maximized. Both PCA and LDA project the face image into a low-dimensional subspace in which classification is easier. They are based on the assumption that high-dimensional face images lie on a low-dimensional subspace or sub-manifold. Therefore, it is beneficial to first project the high-dimensional face image into that low-dimensional subspace to extract the main structure of the face data and reduce the impact of unimportant factors, such as illumination changes. Many other holistic face representations have since been proposed, including Locality Preserving Projection (LPP) , Independent Component Analysis (ICA) , Local Discriminant Embedding (LDE) , Neighborhood Preserving Embedding (NPE) , Maximum Margin Criterion (MMC)  and so on.
Holistic face representations are known to be sensitive to expression, illumination, occlusion, noise and other local distortions. Local face representations, which extract features from local information, are shown to be more robust against those factors. The most commonly used local features in face recognition include Local Binary Pattern (LBP) , Gabor Wavelets , Scale-Invariant Feature Transform (SIFT) , Histogram of Oriented Gradients (HOG)  and so on.
To classify the extracted face representations into the correct classes, a classifier needs to be constructed. Many classifiers have been proposed; the most widely used is the Nearest Neighbor (NN) classifier, which has been improved by Nearest Feature Line (NFL) , Nearest Feature Plane (NFP)  and Nearest Feature Space (NFS)  in different ways. Recently, Sparse Representation Classification (SRC)  was proposed; it shows good recognition performance and is robust to random pixel noise and occlusion. SRC codes the test sample as a sparse linear combination of all training samples by imposing an $\ell_1$-norm constraint on the resulting coding coefficients. The $\ell_1$-norm constraint is computationally expensive, which is the main obstacle to applying SRC in large-scale face recognition systems. Lately, Collaborative Representation Classification (CRC)  was proposed, which achieves comparable performance to SRC with a much faster recognition speed. The authors in  find that it is the collaborative representation, not the $\ell_1$-norm constraint, that is important in the classification process. By replacing the slow $\ell_1$-norm with a much faster $\ell_2$-norm constraint, CRC codes each test sample as a linear combination of all the training faces with a closed-form solution. As a result, CRC can recognize a test sample 10-1000 times faster than SRC, as shown in .
In this paper, we propose to ensemble several CRCs to boost the performance of CRC. Each CRC serves as a weak classifier, and the weak classifiers are combined to construct a strong classifier named ensemble-CRC. For each test sample, several different face representations are extracted. Then, several CRCs make classifications using those face representations. A weight is calculated and assigned to each CRC by considering the characteristics of the reconstruction residues. By analyzing the magnitude relationship between the reconstruction residues of different classes, the CRCs that are highly likely to be correct can be identified. Large weights are assigned to those CRCs and small weights are assigned to the remaining CRCs. Finally, the classification is obtained by a weighted combination of the reconstruction residues of all CRCs.
One key factor in the success of ensemble learning is significant diversity among the weak classifiers. For example, if different CRCs make different errors on test samples, then the combination of many CRCs tends to yield much better results than any single CRC. To this end, randomly generated biologically-inspired face representations are used. Biologically-inspired features have produced very competitive results in a variety of object and face recognition contexts , , . Most of them try to build artificial visual systems that mimic the computational architecture of the brain. We use a model similar to that in , in which the authors showed that randomly generated biologically-inspired features perform surprisingly well, provided that the proper non-linearities and pooling layers are used. The randomly generated biologically-inspired model is shown to be inherently frequency selective and translation invariant under certain convolutional pooling architectures . It is expected that different randomly generated biologically-inspired features generate different face representations (e.g., corresponding to different frequencies). Therefore, the proposed ensemble-CRC can obtain the significant diversity that is highly desired.
II Proposed Method
First, we briefly introduce CRC. CRC codes a test sample linearly using all the training samples and imposes an $\ell_2$-norm constraint on the coding coefficients. Then, the reconstruction of the test sample is formed by linearly combining the training samples from a specific class using the corresponding coding coefficients. The test sample is classified into the class that has the smallest reconstruction error.
More specifically, suppose there are $n$ training samples from $C$ different classes. For each class $c$, there are $n_c$ training samples. The $j$th training sample of class $c$ is denoted as $\mathbf{x}_{c,j} \in \mathbb{R}^d$, where $d$ is the feature's dimensionality. Let $\mathbf{X} = [\mathbf{X}_1, \mathbf{X}_2, \ldots, \mathbf{X}_C] \in \mathbb{R}^{d \times n}$ be the set of entire training samples, where $\mathbf{X}_c$ is composed of the training samples from class $c$. For a given test sample $\mathbf{y} \in \mathbb{R}^d$, CRC solves the following problem

$$\hat{\boldsymbol{\alpha}} = \arg\min_{\boldsymbol{\alpha}} \|\mathbf{y} - \mathbf{X}\boldsymbol{\alpha}\|_2^2 + \lambda \|\boldsymbol{\alpha}\|_2^2, \tag{1}$$

where $\lambda$ is the regularization parameter. The solution of the above problem can be obtained analytically as

$$\hat{\boldsymbol{\alpha}} = (\mathbf{X}^\top \mathbf{X} + \lambda \mathbf{I})^{-1} \mathbf{X}^\top \mathbf{y}. \tag{2}$$
Let $\mathbf{P} = (\mathbf{X}^\top \mathbf{X} + \lambda \mathbf{I})^{-1} \mathbf{X}^\top$. It can be seen that $\mathbf{P}$ is independent of the test sample and can be pre-calculated. For each test sample, we only need to project $\mathbf{y}$ onto $\mathbf{P}$ to obtain the coding coefficients $\hat{\boldsymbol{\alpha}} = \mathbf{P}\mathbf{y}$. To classify $\mathbf{y}$, the reconstruction of $\mathbf{y}$ by each class should be calculated. For each class $c$, let $\delta_c(\cdot)$ be the characteristic function that keeps the coefficients of class $c$ and sets the coefficients associated with the other classes to $0$. The reconstruction of $\mathbf{y}$ by class $c$ is obtained as $\mathbf{X}\delta_c(\hat{\boldsymbol{\alpha}})$. The reconstruction error of class $c$ is obtained by

$$e_c = \|\mathbf{y} - \mathbf{X}\delta_c(\hat{\boldsymbol{\alpha}})\|_2. \tag{3}$$

CRC classifies $\mathbf{y}$ into the class that has the minimum reconstruction error.
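The CRC procedure above can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation; the function names and the tiny regularization default are our own choices.

```python
import numpy as np

def crc_fit(X, lam=1e-3):
    """Pre-compute the CRC projection matrix P = (X^T X + lam I)^{-1} X^T.
    X is a d x n matrix whose columns are training samples."""
    n = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(n), X.T)

def crc_classify(y, X, P, labels):
    """Classify test sample y by minimum class-wise reconstruction error."""
    alpha = P @ y                                    # coding coefficients
    errs = {}
    for c in np.unique(labels):
        delta = np.where(labels == c, alpha, 0.0)    # keep only class-c coefficients
        errs[c] = np.linalg.norm(y - X @ delta)      # class-c reconstruction error
    return min(errs, key=errs.get), errs
```

Since `P` depends only on the training data, it is computed once; each test sample then costs a single matrix-vector product plus per-class residual norms, which is the source of CRC's speed advantage over SRC.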
The proposed ensemble-CRC utilizes multiple CRCs and combines them to obtain a final classification. Assume $K$ different face representations are extracted from each face, so that $K$ training sets $\mathbf{X}^{(1)}, \ldots, \mathbf{X}^{(K)}$ can be formed. Then, $K$ projection matrices $\mathbf{P}^{(1)}, \ldots, \mathbf{P}^{(K)}$ can be pre-calculated. For a test sample, the $K$ different representations are extracted and denoted as $\mathbf{y}^{(1)}, \ldots, \mathbf{y}^{(K)}$. For each $\mathbf{y}^{(i)}$, the coding coefficients can be obtained using Equation (2) and the corresponding reconstruction errors can be obtained using Equation (3).
Different face representations perform differently for a particular test sample; therefore, proper weights should be assigned to the different CRCs given the test sample. Notice that CRC determines the class of the test sample by selecting the minimum reconstruction error. If the correct class produces a small reconstruction error and all the incorrect classes produce large reconstruction errors, CRC easily makes a correct classification. However, when some incorrect classes produce reconstruction errors similar to or smaller than that of the correct class, CRC may misclassify. In the latter situation, the reconstruction error of the correct class is usually among the several small reconstruction errors. In summary, CRC has high fidelity of correct classification when there is only one small reconstruction error, and low fidelity when there are several small reconstruction errors. We utilize this observation to guide the calculation of the weights. For each representation $i$, the smallest (denoted as $e^{(i)}_{1st}$) and the second smallest (denoted as $e^{(i)}_{2nd}$) reconstruction errors are picked; the difference between the two is calculated as $d_i = e^{(i)}_{2nd} - e^{(i)}_{1st}$. Each representation has its difference value, and $K$ difference values $d_1, \ldots, d_K$ can be obtained. Then, the weight for the $i$th CRC can be calculated as

$$w_i = \frac{d_i}{\sum_{j=1}^{K} d_j}. \tag{4}$$

It is obvious that the larger the difference, the larger the weight. After obtaining all the weights, the combined reconstruction error of class $c$ is calculated as

$$\tilde{e}_c = \sum_{i=1}^{K} w_i \, e^{(i)}_c. \tag{5}$$
The ensemble-CRC assigns the test sample to the class whose combined reconstruction error is minimal.
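The weighting and combination step can be sketched as follows, assuming (as described above) that each CRC's weight is its gap between the smallest and second-smallest reconstruction errors, normalized over all representations. The function name and array layout are our own.

```python
import numpy as np

def ensemble_crc_combine(err_matrix):
    """Weighted combination of per-representation reconstruction errors.
    err_matrix: K x C array where err_matrix[i, c] is the reconstruction
    error of class c under the i-th representation's CRC."""
    errs = np.asarray(err_matrix, dtype=float)
    sorted_errs = np.sort(errs, axis=1)
    # gap between smallest and second-smallest error per representation
    diffs = sorted_errs[:, 1] - sorted_errs[:, 0]
    weights = diffs / diffs.sum()      # larger gap -> more confident CRC -> larger weight
    combined = weights @ errs          # weighted sum of errors over representations
    return int(np.argmin(combined)), weights
```

A CRC whose best class clearly beats the runner-up receives a large weight, so a single confident representation can dominate several ambiguous ones.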
II-B Randomly Generated Biologically-Inspired Features
The biologically-inspired features used in the proposed ensemble-CRC are similar in form to the biologically-inspired features in . The feature extraction process includes four layers: a filter bank layer, a rectification layer, a local contrast normalization layer and a pooling layer. Different biologically-inspired features can be obtained by modifying the structure of the extraction process or by using different model parameters. The details of each layer are introduced in the following.
Filter bank layer. The input image is convolved with a certain number of filters. Assume the input image has size $H \times W$ and each filter has size $h \times w$; the convolved output (or feature map) will have size $(H-h+1) \times (W-w+1)$. The $j$th output feature map can be computed as

$$\mathbf{m}_j = \tanh\!\left(g \,(\mathbf{w}_j * \mathbf{x})\right),$$

where $*$ is the convolution operation, $\tanh$ is the hyperbolic tangent non-linearity function and $g$ is a gain factor.
Rectification layer. This layer simply applies the absolute value function to the output of the filter bank layer.
Local contrast normalization layer. Local subtractive and divisive normalizations are performed, which enforce local competition between adjacent features in a feature map. More details can be found in .
Pooling layer. The pooling layer transforms the joint feature representation into a more robust feature that achieves invariance to transformations, clutter and small distortions. Max pooling or average pooling can be used. For max pooling, the maximum value of a small non-overlapping region in the feature map is selected, and all other features in this local region are discarded. Average pooling returns the average value of the small local region in the feature map. After pooling, the number of features in the feature maps is reduced; the reduction ratio is determined by the size of the local region.
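The four layers can be sketched end to end as below. This is a simplified illustration, not the authors' pipeline: the contrast normalization here is global per feature map rather than truly local, and all parameter values (filter count, filter size, gain, pooling size, uniform range) are placeholders.

```python
import numpy as np

rng = np.random.default_rng(0)

def conv2d_valid(img, filt):
    """Naive 'valid'-mode 2-D convolution (orientation is irrelevant for random filters)."""
    H, W = img.shape
    h, w = filt.shape
    out = np.empty((H - h + 1, W - w + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(img[i:i + h, j:j + w] * filt)
    return out

def random_bio_features(img, n_filters=4, fsize=3, gain=1.0, pool=2):
    """Filter bank -> tanh -> rectification -> (simplified) normalization -> max pooling."""
    maps = []
    for _ in range(n_filters):
        filt = rng.uniform(-0.1, 0.1, size=(fsize, fsize))   # small random filter
        m = np.tanh(gain * conv2d_valid(img, filt))          # filter bank layer
        m = np.abs(m)                                        # rectification layer
        m = (m - m.mean()) / (m.std() + 1e-8)                # crude contrast normalization
        H, W = m.shape
        m = m[:H - H % pool, :W - W % pool]                  # crop to a pooling-divisible size
        m = m.reshape(m.shape[0] // pool, pool,
                      m.shape[1] // pool, pool).max(axis=(1, 3))  # non-overlapping max pooling
        maps.append(m.ravel())
    return np.concatenate(maps)
```

Calling `random_bio_features` repeatedly with fresh random filters yields the multiple diverse representations the ensemble needs.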
It is shown in  that the filters in the filter bank layer can be assigned small random values, and the resulting randomly generated features still achieve very good recognition performance on several image classification benchmark data sets.
The reason we select the randomly generated biologically-inspired features for the proposed ensemble-CRC is twofold. First, they perform well in many different visual recognition problems; second, their randomness provides diversity. It has been shown that a necessary and sufficient condition for an ensemble of classifiers to be more accurate than any of its individual members is that the classifiers are accurate and diverse .
II-C The Complete Recognition Process
The complete recognition process for a test face image is shown in Fig. 1. The input face image is first convolved with the filters and then transformed non-linearly. The resulting feature maps are rectified and normalized. Then, pooling is used to extract the salient features and reduce each feature map's size. Because the extracted feature maps are still large, we flatten the 2-D feature maps into 1-D vectors and use PCA to reduce the dimensionality. After PCA, the feature maps are transformed into face representations of reduced dimensionality. This completes the extraction of the different features. Next, the extracted features are fed to the CRCs, and the classification results are combined in a weighted manner to form the final classification result.
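The flatten-and-PCA step can be sketched as follows; this is a generic SVD-based PCA, with the function name and interface chosen for illustration.

```python
import numpy as np

def pca_reduce(features, dim):
    """Project flattened feature vectors onto the top `dim` principal components.
    features: n_samples x n_features matrix (one flattened feature map per row)."""
    mean = features.mean(axis=0)
    centered = features - mean
    # SVD of the centered data; rows of Vt are the principal directions
    _, _, Vt = np.linalg.svd(centered, full_matrices=False)
    components = Vt[:dim]
    # return the low-dimensional representation plus what is needed to project test samples
    return centered @ components.T, mean, components
```

At test time, a new flattened feature map `f` is reduced with `(f - mean) @ components.T` using the training-set statistics.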
We compare the proposed ensemble-CRC with CRC , AW-CRC (Adaptive and Weighted Collaborative Representation Classification) , SRC , WSRC (Weighted Sparse Representation Classification)  and RPPFE (Random Projection based Partial Feature Extraction)  using the AR  and LFW  face databases.
The AR database consists of over 4,000 frontal face images from 126 individuals. The images have different facial expressions, illumination conditions and occlusions, and were taken in two separate sessions, two weeks apart. In our experiment, we choose a subset of the AR database consisting of male and female subjects and crop each image. For each subject, the seven images with only illumination change and expressions from Session one are used for training, and the seven images with only illumination change and expressions from Session two are used for testing.
The Labeled Faces in the Wild (LFW) database is a very challenging database consisting of faces with great variations in lighting, pose, expression and age. It contains 13,233 face images of 5,749 persons. LFW-a is a subset of LFW in which the face images are aligned using a commercial face alignment software. We adopt the same experimental setting as in . In detail, subjects in LFW-a that have no fewer than ten images are chosen, and ten images are selected per subject. Each image is first cropped and then resized. Five images are used for training and the other five images for testing.
In all the following experiments, the filter size is fixed, and all filters are randomly generated from a uniform distribution. The non-linearity function used is the same as in . The pooling used is max pooling.
III-A Number of CRCs in Ensemble-CRC
The number of weak classifiers in an ensemble is very important to its performance. Increasing the number of weak classifiers improves the performance of the ensemble at first, but performance may degrade when too many weak classifiers are used. Also, the more weak classifiers, the more computation is needed. Next, we conduct several experiments on the AR database to show the impact of the number of weak classifiers and to find the best number experimentally.
We vary the number of weak classifiers, with the dimension after PCA fixed. We repeat the experiment multiple times, and the average result is reported in Fig. 3. The recognition rate is lowest when only one CRC is used. With eight CRCs included in ensemble-CRC, the performance increases rapidly; beyond a certain number of CRCs, the performance plateaus, and more CRCs do not improve it further. We therefore take that number as the best number of weak classifiers and use it in all the remaining experiments.
III-B Weighted vs. Non-Weighted Ensemble-CRC
In the proposed ensemble-CRC, a weight is calculated for each CRC. If the weights are instead all set equal, the resulting classifier can be regarded as a non-weighted ensemble-CRC. In the following, we compare the performance of the proposed weighted ensemble-CRC and the non-weighted ensemble-CRC on the AR database, using a fixed feature dimension. Fig. 4 shows that the weighted ensemble-CRC consistently outperforms the non-weighted ensemble-CRC.
III-C Performance Comparison With Other Methods
In the following, the proposed ensemble-CRC is compared with CRC, AW-CRC, SRC, WSRC and RPPFE. Different feature dimensions are compared for each database, as shown in Fig. 5. For the AR database, ensemble-CRC achieves a higher recognition rate than CRC, AW-CRC, SRC, WSRC and RPPFE at the same feature dimension. With the increase of the dimension, the performance of ensemble-CRC, CRC, AW-CRC, SRC, WSRC and RPPFE all increase gradually, and ensemble-CRC attains the highest peak recognition rate. It is clear that the proposed ensemble-CRC outperforms all other methods.
The LFW database is considerably more difficult. The highest recognition rates obtained by CRC, AW-CRC, SRC, WSRC and RPPFE are much lower than those on the AR database. The proposed ensemble-CRC achieves the highest recognition rate, which is much higher than that of CRC, AW-CRC, SRC, WSRC and RPPFE. Due to the pooling operation, the dimension of each randomly generated biologically-inspired feature is constrained. However, the recognition rate may be higher if a higher-dimensional randomly generated biologically-inspired feature can be used (e.g., a larger input image size), as can be inferred from the recognition rate curve of ensemble-CRC.
In this paper, a novel face recognition algorithm named ensemble-CRC is proposed. Ensemble-CRC utilizes randomly generated biologically-inspired features to create many high-performance and diverse CRCs, which are combined in a weighted manner. Experimental results show that the proposed ensemble-CRC outperforms CRC, AW-CRC, SRC, WSRC and RPPFE.
-  Z. Chai, Z. Sun, H. Mendez-Vazquez, R. He, and T. Tan, “Gabor ordinal measures for face recognition,” Information Forensics and Security, IEEE Transactions on, vol. 9, no. 1, pp. 14–26, Jan 2014.
-  M. A. Turk and A. P. Pentland, “Face recognition using eigenfaces,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., 1991, pp. 586–591.
-  P. N. Belhumeur, J. P. Hespanha, and D. Kriegman, “Eigenfaces vs. fisherfaces: Recognition using class specific linear projection,” Pattern Analysis and Machine Intelligence, IEEE Transactions on, vol. 19, no. 7, pp. 711–720, 1997.
-  X. Niyogi, “Locality preserving projections,” in Neural information processing systems, vol. 16, 2004, p. 153.
-  M. S. Bartlett, J. R. Movellan, and T. J. Sejnowski, “Face recognition by independent component analysis,” Neural Networks, IEEE Transactions on, vol. 13, no. 6, pp. 1450–1464, 2002.
-  H.-T. Chen, H.-W. Chang, and T.-L. Liu, “Local discriminant embedding and its variants,” in Computer Vision and Pattern Recognition, 2005. CVPR 2005. IEEE Computer Society Conference on, vol. 2. IEEE, 2005, pp. 846–853.
-  X. He, D. Cai, S. Yan, and H.-J. Zhang, “Neighborhood preserving embedding,” in Computer Vision, 2005. ICCV 2005. Tenth IEEE International Conference on, vol. 2. IEEE, 2005, pp. 1208–1213.
-  X. Li, T. Jiang, and K. Zhang, “Efficient and robust feature extraction by maximum margin criterion,” Neural Networks, IEEE Transactions on, vol. 17, no. 1, pp. 157–165, 2006.
-  T. Ahonen, A. Hadid, and M. Pietikainen, “Face description with local binary patterns: Application to face recognition,” Pattern Analysis and Machine Intelligence, IEEE Transactions on, vol. 28, no. 12, pp. 2037–2041, 2006.
-  C. Liu and H. Wechsler, “Gabor feature based classification using the enhanced fisher linear discriminant model for face recognition,” Image processing, IEEE Transactions on, vol. 11, no. 4, pp. 467–476, 2002.
-  D. G. Lowe, “Distinctive image features from scale-invariant keypoints,” International journal of computer vision, vol. 60, no. 2, pp. 91–110, 2004.
-  N. Dalal and B. Triggs, “Histograms of oriented gradients for human detection,” in Computer Vision and Pattern Recognition, 2005. CVPR 2005. IEEE Computer Society Conference on, vol. 1. IEEE, 2005, pp. 886–893.
-  S. Z. Li and J. Lu, “Face recognition using the nearest feature line method,” Neural Networks, IEEE Transactions on, vol. 10, no. 2, pp. 439–443, 1999.
-  J.-T. Chien and C.-C. Wu, “Discriminant waveletfaces and nearest feature classifiers for face recognition,” Pattern Analysis and Machine Intelligence, IEEE Transactions on, vol. 24, no. 12, pp. 1644–1649, 2002.
-  J. Wright, A. Y. Yang, A. Ganesh, S. S. Sastry, and Y. Ma, “Robust face recognition via sparse representation,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 31, no. 2, pp. 210–227, 2009.
-  L. Zhang, M. Yang, and X. Feng, “Sparse representation or collaborative representation: Which helps face recognition?” in Proc. IEEE Int’l Conf. Computer vision, 2011, pp. 471–478.
-  Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner, “Gradient-based learning applied to document recognition,” Proceedings of the IEEE, vol. 86, no. 11, pp. 2278–2324, 1998.
-  T. Serre, L. Wolf, and T. Poggio, “Object recognition with features inspired by visual cortex,” in Computer Vision and Pattern Recognition, 2005. CVPR 2005. IEEE Computer Society Conference on, vol. 2. IEEE, 2005, pp. 994–1000.
-  D. Cox and N. Pinto, “Beyond simple features: A large-scale feature search approach to unconstrained face recognition,” in Automatic Face & Gesture Recognition and Workshops (FG 2011), 2011 IEEE International Conference on. IEEE, 2011, pp. 8–15.
-  K. Jarrett, K. Kavukcuoglu, M. Ranzato, and Y. LeCun, “What is the best multi-stage architecture for object recognition?” in Computer Vision, 2009 IEEE 12th International Conference on. IEEE, 2009, pp. 2146–2153.
-  A. Saxe, P. W. Koh, Z. Chen, M. Bhand, B. Suresh, and A. Y. Ng, “On random weights and unsupervised feature learning,” in Proceedings of the 28th International Conference on Machine Learning (ICML-11), 2011, pp. 1089–1096.
-  N. Pinto, D. D. Cox, and J. J. DiCarlo, “Why is real-world visual object recognition hard?” PLoS computational biology, vol. 4, no. 1, p. e27, 2008.
-  T. G. Dietterich, “Ensemble methods in machine learning,” in Multiple classifier systems. Springer, 2000, pp. 1–15.
-  R. Timofte and L. Van Gool, “Adaptive and weighted collaborative representations for image classification,” Pattern Recognition Letters, vol. 43, pp. 127–135, 2014.
-  C.-Y. Lu, H. Min, J. Gui, L. Zhu, and Y.-K. Lei, “Face recognition via weighted sparse representation,” Journal of Visual Communication and Image Representation, vol. 24, no. 2, pp. 111–116, 2013.
-  C. Ma, J.-Y. Jung, S.-W. Kim, and S.-J. Ko, “Random projection-based partial feature extraction for robust face recognition,” Neurocomputing, vol. 149, pp. 1232–1244, 2015.
-  A. M. Martinez, “The AR face database,” CVC Technical Report, vol. 24, 1998.
-  G. B. Huang, M. Ramesh, T. Berg, and E. Learned-Miller, “Labeled faces in the wild: A database for studying face recognition in unconstrained environments,” University of Massachusetts, Amherst, Tech. Rep. 07-49, October 2007.
-  P. Zhu, L. Zhang, Q. Hu, and S. C. Shiu, “Multi-scale patch based collaborative representation for face recognition with margin distribution optimization,” in Computer Vision–ECCV 2012. Springer, 2012, pp. 822–835.