1 Introduction
Linear least-squares (LS) estimation, or linear $\ell_2$ norm minimization (denoted as $\ell_2$ for brevity throughout this paper), is widely used in computer vision and image analysis due to its simplicity and efficiency. Recently the $\ell_2$ norm technique has been applied to recognition problems such as face recognition LRC10, ShiEHS11. All of these methods are linear regression-based, and the regression residual is used to make the final classification. However, a small number of outliers can drastically bias $\ell_2$, leading to low-quality estimates. Clearly, robust regression techniques are critical when outliers are present.
The literature contains a range of different approaches to robust regression. One commonly used method is the M-estimator framework M_Estimators1973, Huber1981, where the Huber function is minimized rather than the conventional $\ell_2$ norm. Related methods include L-estimators LEstimator1987 and R-estimators REstimator1971. One drawback of these methods is that they are still vulnerable to bad leverage outliers Rousseeuw2008. By bad leverage points, we mean those observations that are outlying in the space of the explanatory variables and do not follow the linear pattern of the majority. Least median of squares (LMS) LMS1984, least trimmed squares (LTS) Rousseeuw:1987:RRO:40031 and the technique using data partition and M-estimation park2012robust have high breakdown points. Although each of these regression methods is, in general, more robust than $\ell_2$, they have rarely been applied to object recognition problems in computer vision due to their computational expense RLRC2012, Fidler2006.
Another class of robust methods has been developed to remove these abnormal observations from the measurement data. One of the most popular methods is RANSAC RANSAC, which attempts to maximise the size of the consensus set. RANSAC relies on iterative random sampling and consensus testing, where the size of each sample is determined by the minimum number of data points required to compute a single solution. RANSAC's efficiency is therefore directly tied to the time and number of data points required to compute a solution. For example, RANSAC has been successfully applied to multi-view structure-from-motion and homography estimation problems. However, it is unclear how to apply RANSAC to visual recognition problems, e.g., face recognition, where face images usually lie in a high-dimensional space.
Sim and Hartley Sim2006 proposed an outlier-removing method using the $\ell_\infty$ norm, which iteratively fits a model to the data and removes the measurement with the largest residual at each iteration. Generally, this iterative strategy can fail for $\ell_2$ optimization problems; however, it is valid for a wide class of $\ell_\infty$ problems. Sim and Hartley proved that the set of measurements with the largest residual must contain at least one outlier. Hence continuing to iterate eventually removes all the outliers. This method is shown to be effective in outlier detection for multi-view geometry problems Sim2006, Olsson2010.
$\ell_\infty$ norm minimization can be time-consuming, since at each step one needs to solve an optimization problem via Second-Order Cone Programming (SOCP) or Linear Programming (LP) in the application of multi-view geometry Kahl2005, Ke2005. The software package SeDuMi sedumi99 provides solvers for both SOCP and LP problems.
In this paper, we propose a fast algorithm to minimize the $\ell_\infty$ norm for approximating the least median estimation (denoted as $\ell_\infty$ for brevity throughout the paper). Observing that the $\ell_\infty$ norm is determined by only a few measurements, the optimization strategy column generation (CG) Lubbecke05 can be applied to reduce the main problem into a set of much smaller subproblems. Each subproblem can be formulated as a Quadratic Programming (QP) problem. Due to its relatively small size, the QP problem can be solved extremely efficiently using customized solvers. In particular, we can generate solvers using the technique introduced by Mattingley and Boyd CVXGEN. This reduction results in a speedup of several orders of magnitude for high-dimensional data.
This degree of speedup allows $\ell_\infty$ to be applied to problems that were previously inaccessible. We show how the $\ell_\infty$ outlier removal technique can be applied to several classification problems in computer vision. Representations of objects in this type of problem are often derived by using $\ell_2$ to solve equations containing samples in the same class (or other collaborative classes) LRC10, ShiEHS11. Representation errors are then used for classification, where the query object is assigned to the class corresponding to the minimal residual. This method is shown to be effective on data without occlusion or outliers. However, in real-world applications, measurement data are almost always contaminated by noise or outliers. Before a robust representation can be obtained via linear estimation, outlier removal is necessary. The proposed method is shown to significantly improve the classification accuracies in our experiments for face recognition and iris recognition on several public datasets.
2 Related work
Hartley and Schaffalitzky Hartley04b seek a globally optimal solution for multi-view geometry problems via $\ell_\infty$ norm optimization, based on the fact that many geometry problems in computer vision have a single local, and hence global, minimum under the $\ell_\infty$ norm. In contrast, the commonly used $\ell_2$ cost function typically has multiple local minima Hartley04b, Sim2006. This work has been extended by several authors, yielding a large set of geometry problems whose globally optimal solution can be found using the $\ell_\infty$ norm (Olsson provides a summary Olsson07).
It was observed that these geometry problems are examples of quasiconvex optimization problems, which are typically solved by a sequence of SOCPs using a bisection (binary search) algorithm Kahl2005, Ke2005. Olsson et al. Olsson07 show that the functions involved in the $\ell_\infty$ norm problems are in fact pseudoconvex, which is a stronger condition than quasiconvexity. As a consequence, several fast algorithms have been proposed Olsson07, Hongdong09.
Sim and Hartley Sim2006 propose a simple method based on the $\ell_\infty$ norm for outlier removal, where measurements with maximal residuals are thrown away. The authors prove that at least one outlier is removed at each iteration, meaning that all outliers will be rejected in a finite number of iterations. However, the method is not efficient, since one needs to solve a sequence of SOCPs. Observing that many fixed-dimensional geometry problems are actually instances of LP-type problems, an LP-type framework was proposed for the multi-view triangulation problem with outliers Hongdong07.
Recently, the Lagrange dual problem of the $\ell_\infty$ minimization problem posed in Hartley04b was derived in Olsson2010. To further boost the efficiency of the method, the authors of Olsson2010 proposed an $\ell_1$ minimization algorithm for outlier removal. While the aforementioned methods add a single slack variable and repeatedly solve a feasibility problem, the $\ell_1$ algorithm adds one slack variable for each residual and then solves a single convex program. While efficient, this method is only successful on data drawn from particular statistical distributions.
Robust statistical techniques, including the aforementioned robust regression and outlier removal methods, can significantly improve the performance of their classic counterparts. However, they have rarely been applied in the image analysis field, to problems such as visual recognition, due to their computational expense. The M-estimator method is utilized in RLRC2012 for face recognition and achieved high accuracy even when illumination change and pixel corruption were present. In Fidler06, the authors propose a theoretical framework combining reconstructive and discriminative subspace methods for robust classification and regression. This framework acts on subsets of pixels in images to detect outliers.
The remainder of this paper is organized as follows. In Section 3 we briefly review the $\ell_2$ and $\ell_\infty$ problems. In Section 4, the main outlier removal algorithm is presented. In Section 5, we formulate the $\ell_\infty$ norm minimization problem into a set of small subproblems which can be solved with high efficiency. We then apply the outlier removal technique in Section 6 to several visual recognition applications. Finally the conclusion is given in Section 8.
3 The $\ell_2$ and $\ell_\infty$ norm minimization problems
In this section, we briefly present the $\ell_\infty$ norm minimization problem in the form we use in several recognition problems. Let us first examine the $\ell_2$ norm minimization problem,
$$\min_{\beta} \; \sum_{i=1}^{n} (a_i \beta - y_i)^2, \tag{1}$$
for which we have a closed-form solution (see the footnote below)
$$\beta = (A^{\top} A)^{-1} A^{\top} y, \tag{2}$$
where $A = [a_1; \dots; a_n] \in \mathbb{R}^{n \times d}$ is the measurement data matrix, composed of rows $a_i \in \mathbb{R}^{1 \times d}$, and usually $n \gg d$. The model's response is represented by the vector $y \in \mathbb{R}^{n}$ and $\beta \in \mathbb{R}^{d}$ stores the parameters to be estimated. Note that in our visual recognition applications, both $y$ and the columns of $A$ are images flattened to vectors. According to the linear subspace assumption Basri00lambertianreflectance, a probe image can be approximately represented by a linear combination of the training samples of the same class: $y \approx A\beta$. Due to its simplicity and efficacy, the linear representation method is widely used in various image analysis applications, e.g., LRC10, ShiEHS11, CRCzhanglei2011, Wright09, Yang20121104, Iris2011.
Footnote 1: The closed-form solution can only be obtained when $A$ is overdetermined, i.e., when it has more rows than columns. Otherwise, one can solve the multicollinearity problem by ridge regression, or another variable selection method, to obtain a unique solution.
The $\ell_2$ norm minimization aims to minimize the sum of squared residuals (1), where the terms $(a_i \beta - y_i)^2$, $i = 1, \dots, n$, are the squared residuals. $\ell_2$ norm minimization is simple and efficient; however, it utilizes the entire data set and can therefore be easily influenced by outliers.
Instead of minimizing the sum of squared residuals, the $\ell_\infty$ norm minimization method seeks to minimize only the maximal residual, leading to the following formulation:
$$\min_{\beta} \; \max_{i} \; (a_i \beta - y_i)^2. \tag{3}$$
This problem has no closed-form solution; however, it can be easily reformulated into a constrained formulation with an auxiliary variable $s$:
$$\min_{\beta, s} \; s \quad \text{s.t.} \quad (a_i \beta - y_i)^2 \le s, \;\; \forall i, \tag{4}$$
which is clearly a SOCP problem. If we take the absolute value of the residual in (3), we obtain
$$\min_{\beta} \; \max_{i} \; |a_i \beta - y_i|, \tag{5}$$
leading to an LP problem
$$\min_{\beta, s} \; s \quad \text{s.t.} \quad -s \le a_i \beta - y_i \le s, \;\; \forall i. \tag{6}$$
A critical advantage of the $\ell_\infty$ norm cost function is that it has a single global minimum in many multi-view geometry problems Hartley04b, Hongdong07. Unfortunately, like $\ell_2$ norm minimization, the $\ell_\infty$ norm minimization method is also vulnerable to outliers. Moreover, minimizing the $\ell_\infty$ norm fits to the outliers, rather than to the data truly generated by the model Sim2006. Therefore, it is necessary to first reject outliers before the estimation.
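To make the two estimators concrete, the following is a minimal Python sketch (not the implementation used in this paper) that computes the least-squares estimate and the minimax estimate in its LP form, i.e. minimizing a bound on the absolute residuals. It assumes a measurement matrix `A` with one row per measurement and a response vector `y`; the function names `l2_fit` and `linf_fit`, the toy data, and the use of SciPy's `linprog` are illustrative choices.

```python
import numpy as np
from scipy.optimize import linprog


def l2_fit(A, y):
    """Least-squares estimate: minimizes the sum of squared residuals."""
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    return beta


def linf_fit(A, y):
    """Minimax estimate: minimize s subject to -s <= A @ beta - y <= s."""
    n, d = A.shape
    cost = np.zeros(d + 1)
    cost[-1] = 1.0                                # objective is the bound s
    ones = np.ones((n, 1))
    A_ub = np.vstack([np.hstack([A, -ones]),      #  A beta - s <=  y
                      np.hstack([-A, -ones])])    # -A beta - s <= -y
    b_ub = np.concatenate([y, -y])
    bounds = [(None, None)] * d + [(0, None)]     # beta is free, s >= 0
    res = linprog(cost, A_ub=A_ub, b_ub=b_ub, bounds=bounds, method="highs")
    if not res.success:
        raise RuntimeError(res.message)
    return res.x[:d], res.x[-1]


# Hypothetical toy data: points on y = 2x + 1, with the last one shifted by +10.
x = np.arange(6, dtype=float)
A = np.column_stack([x, np.ones_like(x)])
y = 2.0 * x + 1.0
y[5] += 10.0

beta_l2 = l2_fit(A, y)
beta_inf, s = linf_fit(A, y)
# Measurements that attain the maximum absolute residual.
support = np.flatnonzero(np.abs(A @ beta_inf - y) >= s - 1e-6)
print("l2 estimate:   ", beta_l2)
print("l_inf estimate:", beta_inf, "max residual:", s, "support:", support)
```

On this toy data the minimax line is pulled towards the shifted point, and that point is among the measurements attaining the maximum residual, which is the behaviour that motivates removing such measurements before the final estimation.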
4 Outlier removal via maximum residual
In Sim2006, outlier removal is conducted in an iterative fashion by first minimizing the $\ell_\infty$ norm, then removing the measurements with maximum residual, and then repeating. The measurements with maximum residual are referred to as the support set of the minimax problem, i.e.,
(7) 
where is the optimum residual of the minimax problem. The outlier removal strategy does not work well for the general minimization problems (i.e., ), because the outliers are not guaranteed to be included in the support set. In contrast, this strategy is valid for the $\ell_\infty$ minimization problems. For problem (3) or (5), it is proved by the following theorems that the measurements with the largest residual must contain at least one outlier.
Suppose the index vector is composed of and , the inlier and outlier sets respectively, and there exists such that . Then we have the following theorem.
Theorem 1.
Theorem 2.
Proof.
At each iteration, we first obtain the optimal parameters by solving (3) or (5) and then remove the measurements (pixels in images) corresponding to the largest residual. If we continue the iteration, all outliers are eventually removed.
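The iteration just described can be sketched in Python as follows. This is an illustration under stated assumptions rather than Algorithm 1 itself: it uses the absolute-residual (LP) form solved with SciPy's `linprog`, it stops once a preset number of measurements `n_outliers` has been removed, and it omits the step that restores incorrectly removed pixels. The names `linf_fit`, `remove_outliers` and the toy data are hypothetical.

```python
import numpy as np
from scipy.optimize import linprog


def linf_fit(A, y):
    """Minimize s subject to -s <= A @ beta - y <= s (LP form of the minimax fit)."""
    n, d = A.shape
    cost = np.r_[np.zeros(d), 1.0]
    ones = np.ones((n, 1))
    A_ub = np.vstack([np.hstack([A, -ones]), np.hstack([-A, -ones])])
    b_ub = np.concatenate([y, -y])
    bounds = [(None, None)] * d + [(0, None)]
    res = linprog(cost, A_ub=A_ub, b_ub=b_ub, bounds=bounds, method="highs")
    if not res.success:
        raise RuntimeError(res.message)
    return res.x[:d], res.x[-1]


def remove_outliers(A, y, n_outliers, tol=1e-6):
    """Repeatedly fit in the minimax sense and discard the support set."""
    keep = np.arange(len(y))
    n_removed = 0
    while n_removed < n_outliers and len(keep) > A.shape[1] + 1:
        beta, s = linf_fit(A[keep], y[keep])
        if s <= tol:                      # remaining data are fitted exactly
            break
        res = np.abs(A[keep] @ beta - y[keep])
        support = res >= s - tol          # measurements with maximum residual
        n_removed += int(support.sum())
        keep = keep[~support]
    return keep


# Hypothetical toy data: exact line y = 2x + 1 with two corrupted measurements.
x = np.arange(10, dtype=float)
A = np.column_stack([x, np.ones_like(x)])
y = 2.0 * x + 1.0
y[2] += 12.0
y[7] -= 9.0

keep = remove_outliers(A, y, n_outliers=2)
beta, *_ = np.linalg.lstsq(A[keep], y[keep], rcond=None)
print("kept indices:", keep, "estimate after removal:", beta)
```

Each pass may discard a few inliers together with the outliers, since the whole support set is removed; the final least-squares fit uses only the retained measurements.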
As with all outlier removal processes, there is a risk that discarding a set of outliers will remove some inliers at the same time. In this framework, the outliers are individual pixels, which are in good supply in visual recognition applications. For example, a face image will typically contain hundreds or thousands of pixels. Removing a small fraction of the good pixels is therefore unlikely to affect recognition performance. However, if too many pixels are removed, the remaining pool may be too small for successful recognition. Therefore, we propose a process to restore incorrectly removed pixels where possible, as part of the overall outlier removal algorithm listed in Algorithm 1. In practice, the heuristic remedy step does improve the performance of our method on the visual recognition problems in our experiments. Also note that it is impossible for all points in the support set to be moved back in step 6 because, based on Theorem 2, we can prove that
5 A fast algorithm for the $\ell_\infty$ norm minimization problem
Recalling Theorem 2, we may remove any data not in the support set without changing the value of the $\ell_\infty$ norm. This property allows us to subdivide the large problem into a set of smaller subproblems. We proceed by first presenting a useful definition of pseudoconvexity:
Definition 1. A function $f$ is called pseudoconvex if $f$ is differentiable and $\nabla f(x)^{\top} (y - x) \ge 0$ implies $f(y) \ge f(x)$.
In this definition, $f$ has to be differentiable. However, the notion of pseudoconvexity can be generalized to non-differentiable functions pseudoconvex2001:
Definition 2. A function is called pseudoconvex if for all :
(9) 
where is subdifferential of .
Both of these definitions share the property that any local minimum of a pseudoconvex function is also a global minimum. Based on the first definition, it has been proved that if the residual error functions are pseudoconvex and differentiable, the cardinality of the support set is not larger than $d + 1$, where $d$ is the number of parameters to be estimated Olsson08, Hongdong09. Following the proof in Hongdong09, one can easily validate the following corollary:
Corollary 1. For the minimax problem with pseudoconvex residual functions (differentiable or not), there must exist a subset such that
(10) 
It is clear that the squared residual functions in (3) are convex and differentiable, and hence also pseudoconvex. The absolute residual function in (5) is subdifferentiable. It is easy to verify that its only non-differentiable point, the origin, satisfies the second definition. Therefore function (5) also satisfies Corollary 1.
The above corollary says that we can solve a subproblem with at most $d + 1$ measurements, where $d$ is the number of parameters, without changing the estimated solution to the original minimax problem. However, before solving the subproblems, we should first determine the support set. We choose to solve a set of small subproblems using an optimization method called column generation (CG) Lubbecke05. The CG method adds one constraint at a time to the current subproblem until an optimal solution is obtained.
The process is as follows. We first choose $d + 1$ measurements, not necessarily contained in the support set. These data are then used to compute a solution, and the residuals for all the data in the main problem are determined. Then the most violated constraint, i.e., the measurement corresponding to the largest residual, is added to the subproblem. The subproblem is then too large, so we solve the subproblem again (now with size $d + 2$) and remove an inactive measurement. Through this strategy, the problem is prevented from growing too large and violating Corollary 1. When there are no violated constraints, we have obtained the optimal solution.
The proposed fast method is presented in Algorithm 2. We divide the data into an active set, corresponding to a subproblem, and the remaining set with the $\ell_\infty$ norm minimization. This algorithm allows us to solve the original problem, with a measurement matrix of size $n \times d$, by solving a series of small problems of size $(d + 1) \times d$ or $(d + 2) \times d$. In most visual recognition problems, $n \gg d$. In all of our experiments, the algorithm typically converged in fewer than 30 iterations. We will show that this strategy radically improves computational efficiency.
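The constraint-adding procedure described above can be sketched in Python as follows. This is a simplified illustration, not Algorithm 2 itself: the small subproblems are solved with SciPy's general-purpose `linprog` rather than with generated solvers, the initial working set is simply the first rows of the data, and a measurement is dropped only when it is clearly inactive, so in degenerate cases the working set may grow beyond the sizes stated in the text. The names `linf_fit` and `linf_fit_fast` are hypothetical.

```python
import numpy as np
from scipy.optimize import linprog


def linf_fit(A, y):
    """Minimize s subject to -s <= A @ beta - y <= s (LP form of the minimax fit)."""
    n, d = A.shape
    cost = np.r_[np.zeros(d), 1.0]
    ones = np.ones((n, 1))
    A_ub = np.vstack([np.hstack([A, -ones]), np.hstack([-A, -ones])])
    b_ub = np.concatenate([y, -y])
    bounds = [(None, None)] * d + [(0, None)]
    res = linprog(cost, A_ub=A_ub, b_ub=b_ub, bounds=bounds, method="highs")
    if not res.success:
        raise RuntimeError(res.message)
    return res.x[:d], res.x[-1]


def linf_fit_fast(A, y, tol=1e-6, max_iter=1000):
    """Solve the minimax problem through a sequence of small subproblems."""
    n, d = A.shape
    active = list(range(d + 1))                 # arbitrary initial working set
    for _ in range(max_iter):
        beta, s = linf_fit(A[active], y[active])
        res = np.abs(A @ beta - y)              # residuals over all the data
        worst = int(np.argmax(res))
        if res[worst] <= s + tol:               # no violated constraint remains
            return beta, s, active
        active.append(worst)                    # add the most violated constraint
        beta, s = linf_fit(A[active], y[active])
        res_active = np.abs(A[active] @ beta - y[active])
        slack = int(np.argmin(res_active))
        if res_active[slack] < s - tol:         # drop a clearly inactive measurement
            active.pop(slack)
    raise RuntimeError("working-set iteration did not converge")


# Hypothetical random test problem with n = 200 measurements and d = 3 parameters.
rng = np.random.default_rng(0)
A = rng.normal(size=(200, 3))
y = A @ np.array([1.0, -2.0, 0.5]) + rng.normal(size=200)

beta_fast, s_fast, active = linf_fit_fast(A, y)
beta_full, s_full = linf_fit(A, y)
print("max residual (fast, full):", s_fast, s_full, "working set:", sorted(active))
```

Because every subproblem is a relaxation of the full problem, its optimum never exceeds the full optimum; once no measurement violates the current bound, the subproblem solution is feasible for the full problem and therefore optimal for it.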
For maximal efficiency, we choose to solve the LP problem (6), and utilize the code generator CVXGEN CVXGEN to generate custom, high-speed solvers for the subproblems in Algorithm 2. CVXGEN is a software tool that automatically generates customized C code for LP or QP problems of modest size. CVXGEN is less effective on large problems (e.g., variables). However, in Algorithm 2 we convert the original problem into a set of small subproblems, which can be efficiently solved with the generator. CVXGEN embeds the problem size into the generated code, restricting it to fixed-size problems CVXGEN. The proposed method only ever solves problems of size $(d + 1) \times d$ or $(d + 2) \times d$, where $d$ is the number of parameters, enabling the use of CVXGEN.
(11) 
6 Experimental Results
In this section, we first illustrate the effectiveness and efficiency of our algorithm on several classic geometric model fitting problems. Then the proposed method is evaluated on face recognition problems with both artificial and natural contiguous occlusions. Finally, we test our method on the iris recognition problem, where both segmentation error and occlusions are present. For comparison, we also evaluate several other representative robust regression methods on face recognition problems.
Once the outliers have been removed from the data set, any solver can be used to obtain the final model estimate. We implemented the original minimax algorithm using Matlab package CVX cvx with SeDuMi solver sedumi99 while the proposed fast algorithm was implemented using solvers generated by CVXGEN CVXGEN . All experiments are conducted in Matlab running on a PC with a QuadCore 3.07GHz CPU and 12GB of RAM, using mex to call the solvers from CVXGEN. Note that the algorithm makes no special effort to use multiple cores, though Matlab itself may do so if possible.
6.1 Geometric model fitting
6.1.1 Line fitting
Figure 1 shows estimation performance when our algorithm is used for outlier removal and the line is subsequently estimated via least squares, on data generated under two different error models.
We generate randomly and randomly. We then set the first error terms as independent standard normal random variables. We set the last error terms as independent chi-squared random variables with degrees of freedom. We also test using the two-sided contamination model, which sets the sign of the last variables randomly such that the outliers lie on both sides of the true regression line. In both cases we set .
As can be seen in Figure 1, our method detects all of the outliers and consequently generates a line estimate which fits the inlier set well for both noise models, whilst the estimate obtained with the outliers included is reasonable only for the two-sided contamination case, where the outliers are evenly distributed on both sides of the line.
6.1.2 Ellipse fitting
An example of the performance of our method applied to ellipse fitting is shown in Figure 2. points were sampled uniformly around the perimeter of an ellipse centred at , and were then perturbed via offsets drawn from . outliers were randomly drawn from an approximately uniform distribution within the bounding box shown in Figure 2. The result of the method, again shown in Figure 2, shows that our method has correctly identified the inlier and outlier sets, and demonstrates that the centre and radius estimated by our method are accurate.
Table 1: Running time (in seconds) of the original and fast algorithms for an increasing number of observations.
Method | 20 | 50 | 100 | 200 | 500 | 1000 | 2000 | 10000
original | 0.185 | 0.454 | 0.89 | 1.926 | 5.093 | 11.597 | 29.968 | 313.328
fast | 0.002 | 0.005 | 0.011 | 0.031 | 0.083 | 0.164 | 0.395 | 4.127
6.1.3 Efficiency
Next we compare the computational efficiency of the standard $\ell_\infty$ norm outlier removal process and of the proposed fast algorithm.
For the line fitting problem we generate the data using the scheme described previously. We initially fix the data dimension at 2 and increase the problem size from 20 to 10000. The outlier fraction, , is set to 90% of . A comparison of the running times of these two algorithms is shown in Table 1. The fast algorithm finishes roughly 60 to 90 times faster than the original algorithm. Specifically, with 10000 observations the fast algorithm finishes in approximately 4 seconds, whilst the original algorithm requires more than 5 minutes, a speedup of about 76 times.
Second, we fix the number of observations at 200 and vary the data dimension from 2 to 10. Execution times are shown in Table 2. Consistent with the last experiment, the fast algorithm completes far more rapidly than the original algorithm in all situations. When the dimension is 2, the proposed fast algorithm is faster than the original algorithm by more than 60 times. With a larger dimension of 10, the proposed fast algorithm takes only 0.128 seconds to complete while the original algorithm requires more than 2 seconds.
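Table 2: Running time (in seconds) of the original and fast algorithms for an increasing feature dimension.
Method | 2 | 4 | 6 | 8 | 10
original | 1.926 | 1.984 | 1.994 | 2.016 | 2.039
fast | 0.031 | 0.045 | 0.068 | 0.096 | 0.128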
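The speedup factors implied by the two timing tables can be recomputed directly from the reported running times. The short Python sketch below does only that arithmetic; the dictionary names are illustrative.

```python
# Reported running times in seconds: (original, fast).
table1 = {   # keyed by number of observations
    20: (0.185, 0.002), 50: (0.454, 0.005), 100: (0.89, 0.011),
    200: (1.926, 0.031), 500: (5.093, 0.083), 1000: (11.597, 0.164),
    2000: (29.968, 0.395), 10000: (313.328, 4.127),
}
table2 = {   # keyed by feature dimension
    2: (1.926, 0.031), 4: (1.984, 0.045), 6: (1.994, 0.068),
    8: (2.016, 0.096), 10: (2.039, 0.128),
}

speedup1 = {n: original / fast for n, (original, fast) in table1.items()}
speedup2 = {d: original / fast for d, (original, fast) in table2.items()}

for n, ratio in speedup1.items():
    print(f"observations = {n:5d}: {ratio:5.1f}x")
for d, ratio in speedup2.items():
    print(f"dimension    = {d:5d}: {ratio:5.1f}x")
```

The ratios lie between roughly 61 and 93 when the number of observations varies, and fall from about 62 at dimension 2 to about 16 at dimension 10, because the fast algorithm's running time grows with the dimension while the original's stays nearly constant.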
6.2 Robust face recognition
In this section, we test our method on face recognition problems from 3 datasets: AR AMM98, Extended Yale B GeBeKr01, and CMU-PIE Sim03thecmu. A range of state-of-the-art algorithms are compared to the proposed method. Recently, sparse representation based classification (SRC) Wright09 obtained excellent performance on robust face recognition problems, especially with contiguous occlusions. The SRC problem solves $\min_{\alpha} \|\alpha\|_1$ s.t. $\|y - A\alpha\|_2 \le \epsilon$, where $A$ is the training data from all classes, $\alpha$ is the corresponding coefficient vector and $\epsilon$ is the error tolerance. To handle occlusions, SRC is extended to $\min_{w} \|w\|_1$ s.t. $y = Bw$, where $B = [A, I]$, and $w = [\alpha; e]$. $I$ and $e$ are the identity matrix and error vector respectively. SRC assigns the test image to the class with smallest residual: $\min_{i} \|y - A\delta_i(\alpha)\|_2$. Here $\delta_i(\alpha)$ is a vector whose only nonzero entries are the entries in $\alpha$ corresponding to the $i$th class. We also evaluate the following two methods, which are related to our method. Most recently, a method called Collaborative Representation-based Classification (CRC) was proposed in CRCzhanglei2011, which relaxes the $\ell_1$ norm to the $\ell_2$ norm. Linear regression classification (LRC) LRC10 casts face recognition as a simple linear regression problem: $y = A_i \beta_i$, where $A_i$ and $\beta_i$ are the training data and representative coefficients with respect to class $i$. LRC selects the class with smallest residual: $\min_{i} \|y - A_i \beta_i\|_2$. Both CRC and LRC achieved competitive or even better results than SRC in some cases CRCzhanglei2011, LRC10.
For the purposes of the face recognition experiments, outlier pixels are first removed using our method, leaving the remaining inlier set to be processed by any regression-based classifier. In the experiments listed below, LRC has been used for this purpose due to its computational efficiency.
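As an illustration of how a residual-based classifier can be applied to the retained pixels only, the following minimal Python sketch runs a per-class least-squares fit restricted to an inlier mask and picks the class with the smallest residual. It is a sketch under assumptions, not the code used in the experiments: the function name `lrc_classify`, the boolean mask `keep`, and the random toy data are hypothetical, and in practice the mask would come from the outlier removal step.

```python
import numpy as np


def lrc_classify(class_mats, y, keep=None):
    """Assign y to the class whose training images best reconstruct it.

    class_mats: list of arrays, one per class, each of shape
                (n_pixels, n_training_images), columns being flattened images.
    keep:       boolean mask of pixels retained after outlier removal.
    """
    if keep is None:
        keep = np.ones(len(y), dtype=bool)
    residuals = []
    for X in class_mats:
        beta, *_ = np.linalg.lstsq(X[keep], y[keep], rcond=None)
        residuals.append(float(np.linalg.norm(X[keep] @ beta - y[keep])))
    return int(np.argmin(residuals)), residuals


# Hypothetical toy data: 3 classes, 5 training images each, 60 pixels.
rng = np.random.default_rng(0)
n_pixels = 60
classes = [rng.normal(size=(n_pixels, 5)) for _ in range(3)]
y = classes[1] @ rng.normal(size=5)     # probe image drawn from class 1
y[:15] = 10.0                           # simulated contiguous occlusion

keep = np.ones(n_pixels, dtype=bool)
keep[:15] = False                       # pretend these pixels were flagged as outliers

label, residuals = lrc_classify(classes, y, keep)
print("predicted class:", label, "residuals:", residuals)
```

With the occluded pixels masked out, the probe is reconstructed essentially exactly by its own class in this toy setting, so the residual comparison is no longer dominated by the occlusion.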
Many robust regression estimators have been developed in the statistics literature. In this section, we also compare two other popular estimators, namely least median of squares (LMS) LMS1984 and the MM-estimator yohai1987high, both of which have high breakdown points and do not require the number of outliers to be specified. The comparison is conducted on face recognition problems with both artificial and natural occlusions. These methods are first used to estimate the coefficients, and face images are then recognized by the minimal residuals.
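Table 3: Recognition rates on the AR dataset (sunglasses occlusion) for various feature dimensions.
Method | 54 | 130 | 300 | 540
LRC | 21.0% | 38.5% | 54.5% | 60.0%
SRC | 48.0% | 67.5% | 69.5% | 64.5%
CRC | 22.0% | 35.5% | 44.5% | 56.0%
MM-estimator | 0.5% | 8.5% | 21% | 24%
LMS | 9% | 25% | 37.5% | 48%
our method | 43.0% | 85.0% | 99.5% | 100%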
6.2.1 Face recognition despite disguise
The AR dataset AMM98 consists of over facial images from subjects ( men and women). For each subject facial images were taken in two separate sessions, per session. The images exhibit a number of variations including various facial expressions (neutral, smile, anger, and scream), illuminations (left light on, right light on and all side lights on), and occlusion by sunglasses and scarves. Of the subjects available, 100 have been randomly selected for testing (50 males and 50 females) and the images cropped to pixels. images of each subject with various facial expressions, but no occlusions, were selected for training. Testing was carried out on images of each of the selected subjects wearing sunglasses. Figure 3 shows two typical images from the AR dataset with the outliers (30% of all the pixels in the face images) detected by our method set to white. The reconstructed images are shown as the third and sixth images.
The images were downsampled to produce features of 54, 130, 300, and 540 dimensions respectively. Table 3 shows a comparison of the recognition rates of various methods. Our method exhibits superior performance to LRC, CRC and SRC in all except the lowest feature dimension case. Specifically, with feature dimension 540, the proposed method achieves a perfect accuracy of 100%, which outperforms LRC, CRC and SRC by 40%, 44% and 35.5% respectively. The MM-estimator and LMS failed to achieve good results on this dataset, mainly because the residuals of outliers severely affect the final classification, even though relatively accurate coefficients could be estimated. In this face recognition application, the final classification is based on the fitting residual. These results highlight the outlier removal ability of our method, which can significantly improve face recognition performance.
Method  Occlusion rate  

10%  20%  30%  35%  40%  50%  
LRC  
SRC  
CRC  
our method 
Table 4: Means and standard deviations of recognition accuracies (%) in the presence of randomly placed block occlusions of images from the Extended Yale B dataset, based on the results of 5 runs.
6.2.2 Contiguous block occlusions
In order to evaluate the performance of the algorithm in the presence of artificial noise and larger occlusions, we now describe testing where large regions of the original image are replaced by pixels from another source. The Extended Yale B dataset GeBeKr01 was used as the source of the original images and consists of frontal face images from subjects under various lighting conditions. The images are cropped and normalized to pixels KCLee05 . Following Wright09 , we choose subsets 1 and 2 (715 images ) for training and Subset 3 (451 images) for testing. In our experiment, all the images are downsampled to pixels. We replace a randomly selected region, covering between 10% and 50% of each image, with a square monkey face. Figure 4 shows the monkey face, an example of an occluded image, the outlying pixels detected by our method and a reconstructed copy of the input image.
Table 4 compares the average recognition rates of the different methods, averaged over five separate runs. Our proposed method outperforms all other methods in all conditions. With small occlusions, all methods achieve high accuracy; however, the performance of LRC, SRC and CRC deteriorates dramatically as the size of the occlusion increases. In contrast, our method is robust in the presence of outliers. In particular, with 30% occlusion our method obtains 98.2% accuracy while the recognition rates of all the other methods are below 80%. With 50% occlusion, all other methods perform poorly, while the accuracy of our method is still above 85%. According to Table 4, we can also see that the proposed method is more stable in terms of accuracy variation, mainly because the outliers are effectively detected.
6.2.3 Partial face features on the CMU PIE dataset
As shown in the previous example, occlusion can significantly reduce face recognition performance, particularly in methods without outlier removal. Wright et al. Wright09 attempt to identify faces on the basis of particular subimages, such as the area around the eye or ear, etc. Here, we use the complete face and remove increasing portions of the bottom half of the image, so that initially the neck is obscured, followed by the chin and mouth, etc. The removal occurs by setting the pixels to be black. Thus, the complete image is used as a feature vector, and a subset of the elements is set to zero. A second experiment is performed following the same procedure, but to the central section of the face, thus initially obscuring the nose, then the ears, eyes and mouth, etc.
In this experiment, we use the CMU-PIE dataset Sim03thecmu, which contains 68 subjects and a total of 41368 face images. Each person has their picture taken with 13 different poses, 43 different illumination conditions, and 4 different expressions. In our experiment all of the face images are aligned and cropped, with 256 gray levels per pixel He05laplacianscore, and finally resized to pixels. Here we use the subset containing images of pose C27 (a nearly frontal pose) and we use the data from the first 20 subjects, each subject with 21 images. The first 15 images of each subject are used for training and the last 6 images for testing. The test images are preprocessed so that one part (bottom or middle) of each face (from 10% to 30% of pixels) is set to black. See Figure 5 for examples.
The recognition rates of different methods when the bottom area of the image is occluded are reported in Table 5. When the occlusion area is small, all methods except the MM-estimator obtain perfect 100% recognition rates. When the occlusion area increases to 20% of the image size, the accuracy of LRC drops to 80%, because the black pixels bias the linear regression estimate. The technique used in SRC mentioned above performs better than LRC when occlusions are present, achieving 98.3% accuracy. However, our method is able to achieve 100% accuracy with that level of occlusion. For 30% occlusion, CRC achieves a relatively good result (83.3%), while the accuracies of the other methods (including the robust methods MM-estimator and LMS) drop dramatically. In contrast, our method still achieves 100% accuracy, which demonstrates its robustness against heavy occlusion. The comparison of these methods for occlusion in the middle part of faces is shown in Table 6. These results again show the robustness of our method against heavy occlusions. Almost all the methods show lower accuracy than in the former situation. Such results lead to the conclusion that information from the middle part of a face (the area around the nose) is more discriminative than that from the bottom part (the area around the chin) for face recognition.
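Table 5: Recognition rates on the CMU-PIE dataset when the bottom part of the face is removed.
Method | 10% | 20% | 30%
LRC | 100% | 80.0% | 61.7%
SRC | 100% | 98.3% | 71.7%
CRC | 100% | 96.7% | 83.3%
MM-estimator | 99.2% | 41.7% | 15.0%
LMS | 100% | 77.5% | 58.3%
our method | 100% | 100% | 100%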
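Table 6: Recognition rates on the CMU-PIE dataset when the middle part of the face is removed.
Method | 10% | 20% | 30%
LRC | 100% | 91.7% | 35.0%
SRC | 100% | 93.3% | 65.0%
CRC | 98.3% | 87.5% | 53.3%
MM-estimator | 92.5% | 7.5% | 0.8%
LMS | 100% | 80.8% | 22.5%
our method | 100% | 99.2% | 90.0%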
6.2.4 Efficiency
For the problem of identifying outliers in face images, we compare the computational efficiency using the AR face dataset, as described in Section 6.2.1. We vary the feature dimension from 54 to the original 19800. Table 7 shows the execution time for both the proposed fast algorithm and the original method. The fast algorithm outperforms the original in all situations. With feature dimensions up to 4800, the fast algorithm is approximately 20 times faster than the original. When the feature dimension increases to 19800, the original algorithm needs about 1.43 hours while the fast algorithm takes only about 6 minutes.
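Table 7: Execution time (in seconds) of the original and fast algorithms on the AR dataset for various feature dimensions.
Method | 54 | 300 | 1200 | 4800 | 19800
original | 2.051 | 9.894 | 48.371 | 396.323 | 5150.137
fast | 0.113 | 0.566 | 2.564 | 18.811 | 361.689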
6.3 Robust iris recognition
Iris recognition is a commonly used non-contact biometric measure for automatically identifying a person. Occlusions can also occur in iris data acquisition, especially in unconstrained conditions, caused by eyelids, eyelashes, segmentation errors, etc. In this section we test our method against segmentation errors, which can result in outliers from eyelids or eyelashes. Specifically, we use the ND-IRIS-0405 dataset ND06, which contains 64,980 iris images obtained from 356 subjects with a wide variety of distortions. In our experiment, each iris image is segmented by detecting the pupil and iris boundaries using the open-source package of Masek and Kovesi MasekIris03. 80 subjects were selected, and 10 images from each subject were chosen for training and 2 images for testing. To test outlier detection, segmentation errors and artificial occlusions were placed on the iris area, in a similar fashion to Iris2011. A few example images and their detected iris and pupil boundaries are shown in Figure 6. The feature vector is obtained by warping the circular iris region into a rectangular block by sampling with a radial resolution of 20 and an angular resolution of 240. These blocks are then resized to . For our method, 10% of pixels are detected and removed when the test images contain only segmentation errors, and the corresponding additional number of pixels is removed for artificial occlusions.
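The warping of the annular iris region into a rectangular block can be sketched as below. This is a simplified illustration and not the Masek and Kovesi implementation: it assumes concentric circular pupil and iris boundaries with a known centre and radii, and uses nearest-neighbour sampling. The function name `unwrap_iris` and the synthetic image are hypothetical.

```python
import numpy as np


def unwrap_iris(img, center, r_pupil, r_iris, n_radial=20, n_angular=240):
    """Sample the ring between the pupil and iris boundaries on a polar grid."""
    cy, cx = center
    radii = np.linspace(r_pupil, r_iris, n_radial)
    thetas = np.linspace(0.0, 2.0 * np.pi, n_angular, endpoint=False)
    rr, tt = np.meshgrid(radii, thetas, indexing="ij")
    # Nearest-neighbour lookup, clipped to the image borders.
    rows = np.clip(np.rint(cy + rr * np.sin(tt)).astype(int), 0, img.shape[0] - 1)
    cols = np.clip(np.rint(cx + rr * np.cos(tt)).astype(int), 0, img.shape[1] - 1)
    return img[rows, cols]


# Synthetic test image whose intensity equals the distance from the centre.
h, w = 200, 200
yy, xx = np.mgrid[0:h, 0:w]
img = np.hypot(yy - 100.0, xx - 100.0)

block = unwrap_iris(img, center=(100, 100), r_pupil=30.0, r_iris=80.0)
feature = block.ravel()                 # flattened feature vector
print(block.shape, feature.shape)
```

Each row of the resulting block corresponds to one radius and each column to one angle, so the block can be resized and flattened into a feature vector for the regression-based classifiers.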
The recognition results are summarized in Table 8. SRC, which was used in Iris2011 for iris recognition, and LRC are compared with our method. The proposed method achieves the best results at all occlusion levels. Specifically, our method achieves 96.3% accuracy when the iris images contain only segmentation errors, whereas LRC achieves 89.5%. SRC also performs well (95.6%) on this task. However, when 10% additional occlusion is added to the test images, the accuracies of LRC and SRC drop dramatically to 43.8% and 61.3% respectively, while our method still achieves 96.3%. When the occlusion increases to 20%, our method still obtains a high accuracy of 95%, which is higher than those of LRC and SRC by 74.4 and 43.7 percentage points respectively.
6.3.1 Efficiency
Table 9 compares the computation time of the original and fast algorithms on the iris recognition problem. Consistent with the earlier results, the proposed fast algorithm is much more efficient than the original algorithm.
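Table 8. Recognition accuracy on the iris dataset with different percentages of artificial occlusion.

Method      Percentage of artificial occlusion
            0%     10%    20%
LRC         89.5%  43.8%  20.6%
SRC         95.6%  61.3%  51.3%
our method  96.3%  96.3%  95%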
Table 9. Computation time on the iris recognition problem for different image resolutions.

Method    Image resolution
original  2.400  6.692  31.942  255.182
fast      0.131  0.385  1.941   14.906
7 Discussion
The main drawback of our method is that one has to first estimate the outlier percentage empirically, as is done in many other robust regression methods. Indeed, to our knowledge, almost all outlier removal methods require the outlier percentage, or some other parameter such as a residual threshold, to be preset. This is in contrast with robust regression methods that use a robust loss such as the Huber function, or even a nonconvex loss, which do not need the outlier percentage to be specified.
One may ask how the proposed algorithm performs with an under- or over-estimated outlier percentage. Taking the AR dataset as an example, we evaluate our method by varying the percentage of removed pixels from 25% to 45%. From Table 10, we can see that the proposed method is not very sensitive to the pre-estimated outlier percentage when it is over 30%. We also observe that our method becomes more stable when the image resolution is higher. This is mainly because, as mentioned before, visual recognition problems generally supply a large number of pixels through high-dimensional images, and consequently it is more crucial to reject as many outliers as possible than to keep all inliers.
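To make the role of the preset outlier percentage concrete, the following is a minimal sketch of iterative outlier removal by minimax ($\ell_\infty$) fitting. It solves each fit as a plain linear program with SciPy and is not the fast algorithm proposed in this paper; the function names, the tolerance, and the synthetic line-fitting data are assumptions made for illustration.

```python
import numpy as np
from scipy.optimize import linprog


def linf_fit(A, y):
    """Solve min_beta max_i |a_i^T beta - y_i| as a linear program."""
    n, d = A.shape
    c = np.zeros(d + 1)
    c[-1] = 1.0  # minimise the residual bound t (last variable)
    ones = np.ones((n, 1))
    A_ub = np.vstack([np.hstack([A, -ones]),     #  (A beta - y) <= t
                      np.hstack([-A, -ones])])   # -(A beta - y) <= t
    b_ub = np.concatenate([y, -y])
    bounds = [(None, None)] * d + [(0, None)]    # beta free, t >= 0
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds, method="highs")
    return res.x[:d]


def remove_outliers(A, y, outlier_fraction, tol=1e-6):
    """Repeatedly drop the observations attaining the largest residual.

    Stops once at least outlier_fraction * n observations are removed, so
    slightly more than the requested number may be discarded.
    """
    n = A.shape[0]
    keep = np.arange(n)
    n_remove = int(round(outlier_fraction * n))
    while n - keep.size < n_remove:
        beta = linf_fit(A[keep], y[keep])
        resid = np.abs(A[keep] @ beta - y[keep])
        if resid.max() <= tol:  # perfect fit: nothing left to remove
            break
        keep = keep[resid < resid.max() - tol]
    # Final least-squares estimate on the retained observations.
    beta = np.linalg.lstsq(A[keep], y[keep], rcond=None)[0]
    return beta, keep


# Synthetic data (assumed): a line y = 2x + 1 with 10% gross outliers.
rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 100)
A = np.column_stack([x, np.ones_like(x)])
y = 2.0 * x + 1.0 + 0.01 * rng.standard_normal(100)
outlier_idx = np.arange(5, 100, 10)
y[outlier_idx] += 5.0

beta, keep = remove_outliers(A, y, outlier_fraction=0.4)
print("estimate:", beta, "observations kept:", keep.size)
```

Each round discards inliers as well as outliers, because the observations with the largest residual lie on both sides of the fit. This is why the sketch removes 40% of the data although only 10% are contaminated, in line with the observation above that over-estimating the percentage is less harmful than under-estimating it.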
In contrast to our approach, there exist many robust estimators that do not require the number of outliers to be specified, such as the MM-estimator yohai1987high , LMS LMS1984 and DPM park2012robust . These methods can also be applied to visual recognition problems, as we have shown in Section 6. The difference is that our method directly identifies the outliers, which helps compute more reliable residuals for classification, as shown in Section 6. Of course, with these methods observations can also be flagged as outliers when the corresponding standardized residuals exceed a cutoff point, but this cutoff also has to be determined a priori.
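The cutoff rule mentioned above can be sketched as follows. This is an illustrative example only: the scale estimate (the median absolute deviation multiplied by 1.4826) and the cutoff of 2.5 are common conventions that we assume here, not values taken from the methods cited above, and the residuals are made up.

```python
import numpy as np


def flag_outliers(residuals, cutoff=2.5):
    """Flag observations whose standardized residual exceeds the cutoff.

    The scale is a robust estimate based on the median absolute deviation;
    both the estimator and the default cutoff are assumptions.
    """
    residuals = np.asarray(residuals, dtype=float)
    mad = np.median(np.abs(residuals - np.median(residuals)))
    scale = 1.4826 * mad
    return np.abs(residuals / scale) > cutoff


# Made-up residuals: two gross errors among small ones.
residuals = [0.1, -0.2, 0.05, 0.15, -0.1, 3.0, -0.05, 0.2, -4.0, 0.0]
flags = flag_outliers(residuals)
print(np.flatnonzero(flags))  # indices of the flagged observations
```

As with the outlier percentage in our method, the result depends on a value chosen in advance, here the cutoff.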
Table 10. Results on the AR dataset with different percentages of removed pixels.

Dimension  Percentage of removed pixels
           25%  30%    35%    40%   45%
           82%  85%    92.5%  95%   94.5%
           98%  99.5%  100%   100%  99.5%
8 Conclusion
In this work, we have proposed an efficient method for $\ell_\infty$-norm-based robust least squares fitting, and hence for iteratively removing outliers. The efficiency of the method allows it to be applied to visual recognition problems which would normally be too large for such an approach. The method takes advantage of the nature of the $\ell_\infty$ norm to break the main problem into more manageable subproblems, which can then be solved via standard, efficient techniques.
The efficiency of the technique and the benefits that outlier removal can bring to visual recognition problems were highlighted in the experiments, with the computational efficiency and accuracy of the resultant recognition process easily beating all other tested methods.
Like many other robust fitting methods, the proposed method needs one parameter: the number of outliers to be removed. This value may be determined heuristically. Although the method is not very sensitive to it in visual recognition problems, in the future we plan to investigate how to automatically estimate the outlier rate in noisy data.
Acknowledgements
This work was in part supported by ARC Future Fellowship FT120100969. F. Shen’s contribution was made when he was visiting The University of Adelaide.
All correspondence should be addressed to C. Shen (chunhua.shen@adelaide.edu.au).
References
 [1] I. Naseem, R. Togneri, M. Bennamoun, Linear regression for face recognition, IEEE Trans. Patt. Anal. Mach. Intell 32 (11) (2010) 2106–2112.

 [2] Q. Shi, A. Eriksson, A. van den Hengel, C. Shen, Is face recognition really a compressive sensing problem?, in: Proc. IEEE Conf. Computer Vision and Pattern Recognition, 2011, pp. 553–560.
 [3] P. J. Huber, Robust regression: Asymptotics, conjectures and Monte Carlo, Ann. Statist. 1 (5) (1973) 799–821.
 [4] P. J. Huber, Robust Statistics, Wiley, New York, 1981, 2005.
 [5] R. Koenker, S. Portnoy, L-estimation for linear models, J. Amer. Statistical Assoc. 82 (1987) 851–857.
 [6] J. Jureckova, Nonparametric estimate of regression coefficients, Ann. Math. Statist. 42 (4) (1971) 1328–1338.
 [7] M. Hubert, P. J. Rousseeuw, S. Van Aelst, High-breakdown robust multivariate methods, Statistical Science 23 (1) (2008) 92–119.
 [8] P. J. Rousseeuw, Least median of squares regression, J. Amer. Statistical Assoc. 79.
 [9] P. J. Rousseeuw, A. M. Leroy, Robust regression and outlier detection, John Wiley & Sons, Inc., New York, NY, USA, 1987.
 [10] Y. Park, D. Kim, S. Kim, Robust regression using data partitioning and M-estimation, Commun. Stat.-Simul. C. 41 (8) (2012) 1282–1300.
 [11] I. Naseem, R. Togneri, M. Bennamoun, Robust regression for face recognition, Patt. Recogn. 45 (1) (2012) 104–118.
 [12] S. Fidler, D. Skocaj, A. Leonardis, Combining reconstructive and discriminative subspace methods for robust classification and regression by subsampling, IEEE Trans. Patt. Anal. Mach. Intell 28 (3) (2006) 337–350.
 [13] M. A. Fischler, R. C. Bolles, Random sample consensus: A paradigm for model fitting with applications to image analysis and automated cartography, Communications of the ACM 24 (1981) 381–395.
 [14] K. Sim, R. Hartley, Removing outliers using the $L_\infty$ norm, in: Proc. IEEE Conf. Computer Vision and Pattern Recognition, 2006, pp. 485–494.
 [15] C. Olsson, A. Eriksson, R. Hartley, Outlier removal using duality, in: Proc. IEEE Conf. Computer Vision and Pattern Recognition, 2010, pp. 1450–1457.
 [16] F. Kahl, Multiple view geometry and the $L_\infty$ norm, in: Proc. Int. Conf. Computer Vision, Vol. 2, 2005, pp. 1002–1009.
 [17] Q. Ke, T. Kanade, Quasiconvex optimization for robust geometric reconstruction, in: Proc. Int. Conf. Computer Vision, Vol. 2, 2005, pp. 986–993.
 [18] J. F. Sturm, Using SeDuMi 1.02, a MATLAB toolbox for optimization over symmetric cones, Optim. Method Softw. 11–12 (1999) 625–653.
 [19] M. E. Lübbecke, J. Desrosiers, Selected topics in column generation, Oper. Res. 53 (6) (2005) 1007–1023.
 [20] J. Mattingley, S. Boyd, CVXGEN: a code generator for embedded convex optimization, Optim. Eng. 13 (2012) 1–27.
 [21] R. I. Hartley, F. Schaffalitzky, $L_\infty$ minimization in geometric reconstruction problems, in: Proc. IEEE Conf. Computer Vision and Pattern Recognition, 2004.
 [22] C. Olsson, A. Eriksson, F. Kahl, Efficient optimization for $L_\infty$-problems using pseudoconvexity, in: Proc. Int. Conf. Computer Vision, 2007.
 [23] H. Li, Efficient reduction of $L_\infty$ geometry problems, in: Proc. IEEE Conf. Computer Vision and Pattern Recognition, 2009, pp. 2695–2702.
 [24] H. Li, A practical algorithm for triangulation with outliers, in: Proc. IEEE Conf. Computer Vision and Pattern Recognition, 2007.
 [25] S. Fidler, D. Skocaj, A. Leonardis, Combining reconstructive and discriminative subspace methods for robust classification and regression by subsampling, IEEE Trans. Patt. Anal. Mach. Intell 28 (3) (2006) 337–350.
 [26] R. Basri, D. Jacobs, Lambertian reflectance and linear subspaces, IEEE Trans. Patt. Anal. Mach. Intell 25 (2) (2003) 218–233.
 [27] L. Zhang, M. Yang, X. Feng, Sparse representation or collaborative representation: Which helps face recognition?, in: Proc. Int. Conf. Computer Vision, 2011, pp. 471–478.
 [28] J. Wright, A. Y. Yang, A. Ganesh, S. S. Sastry, Y. Ma, Robust face recognition via sparse representation, IEEE Trans. Patt. Anal. Mach. Intell 31 (2009) 210–227.
 [29] J. Yang, L. Zhang, Y. Xu, J. Yang, Beyond sparsity: The role of optimizer in pattern classification, Patt. Recogn. 45 (3) (2012) 1104–1118.
 [30] J. Pillai, V. Patel, R. Chellappa, N. Ratha, Secure and robust iris recognition using random projections and sparse representations, IEEE Trans. Patt. Anal. Mach. Intell 33 (9) (2011) 1877–1893.
 [31] S. Boyd, L. Vandenberghe, Convex Optimization, Cambridge University Press, 2004.
 [32] Generalized monotone multivalued maps, in: C. Floudas, P. Pardalos (Eds.), Encyclopedia of Optimization, 2001, pp. 764–769.
 [33] C. Olsson, O. Enqvist, F. Kahl, A polynomial-time bound for matching and registration with outliers, in: Proc. IEEE Conf. Computer Vision and Pattern Recognition, 2008.
 [34] M. Grant, S. Boyd, CVX: Matlab software for disciplined convex programming, version 1.21, http://cvxr.com/cvx (2011).
 [35] A. M. Martinez, R. Benavente, The AR Face Database, CVC, Tech. Rep. 1998.
 [36] A. S. Georghiades, P. N. Belhumeur, D. J. Kriegman, From few to many: Illumination cone models for face recognition under variable lighting and pose, IEEE Trans. Patt. Anal. Mach. Intell 23 (6) (2001) 643–660.
 [37] T. Sim, S. Baker, M. Bsat, The CMU pose, illumination, and expression database, IEEE Trans. Patt. Anal. Mach. Intell 25 (2003) 1615–1618.
 [38] V. J. Yohai, High breakdown-point and high efficiency robust estimates for regression, Ann. Stat. (1987) 642–656.
 [39] K. Lee, J. Ho, D. Kriegman, Acquiring linear subspaces for face recognition under variable lighting, IEEE Trans. Patt. Anal. Mach. Intell 27 (5) (2005) 684–698.

 [40] X. He, D. Cai, P. Niyogi, Laplacian score for feature selection, in: Proc. Advances Neural Info. Process. Syst., 2005.
 [41] P. J. Phillips, W. T. Scruggs, A. J. O’Toole, P. J. Flynn, K. W. Bowyer, C. L. Schott, M. Sharpe, FRVT 2006 and ICE 2006 large-scale experimental results, IEEE Trans. Patt. Anal. Mach. Intell 32 (2010) 831–846.
 [42] L. Masek, P. Kovesi, MATLAB source code for a biometric identification system based on iris patterns (2003).