Fast Approximate L_∞ Minimization: Speeding Up Robust Regression

by   Fumin Shen, et al.
The University of Adelaide

Minimization of the L_∞ norm, which can be viewed as approximately solving the non-convex least median estimation problem, is a powerful method for outlier removal and hence robust regression. However, current techniques for solving the problem at the heart of L_∞ norm minimization are slow, and therefore cannot scale to large problems. A new method for the minimization of the L_∞ norm is presented here, which provides a speedup of multiple orders of magnitude for data with high dimension. This method, termed Fast L_∞ Minimization, allows robust regression to be applied to a class of problems which were previously inaccessible. It is shown how the L_∞ norm minimization problem can be broken up into smaller sub-problems, which can then be solved extremely efficiently. Experimental results demonstrate the radical reduction in computation time, along with robustness against large numbers of outliers in a few model-fitting problems.




1 Introduction

Linear least-squares (LS) estimation, or linear L_2 norm minimization (denoted as L_2 for brevity throughout this paper), is widely used in computer vision and image analysis due to its simplicity and efficiency. Recently the L_2 norm technique has been applied to recognition problems such as face recognition LRC10, ShiEHS11. All of these methods are linear regression-based and the regression residual is utilized to make the final classification. However, a small number of outliers can drastically bias L_2, leading to low quality estimates. Clearly, robust regression techniques are critical when outliers are present.

The literature contains a range of different approaches to robust regression. One commonly used method is the M-estimator framework M_Estimators1973, Huber1981, where the Huber function is minimized, rather than the conventional L_2 norm. Related methods include L-estimators LEstimator1987 and R-estimators REstimator1971. One drawback of these methods is that they are still vulnerable to bad leverage outliers Rousseeuw2008. By bad leverage points, we mean those observations that are outlying in x-space and do not follow the linear pattern of the majority. Least median of squares (LMS) LMS1984, least trimmed squares (LTS) Rousseeuw:1987:RRO:40031 and the technique using data partition and M-estimation park2012robust have high breakdown points. Although each of these regression methods is, in general, more robust than L_2, they have rarely been applied to object recognition problems in computer vision due to their computational expense RLRC2012, Fidler2006.

Another class of robust methods has been developed to remove these abnormal observations from the measurement data. One of the most popular methods is RANSAC RANSAC, which attempts to maximise the size of the consensus set. RANSAC relies on iterative random sampling and consensus testing, where the size of each sample is determined by the minimum number of data points required to compute a single solution. RANSAC's efficiency is therefore directly tied to the time and number of data points required to compute a solution. For example, RANSAC has been successfully applied to multiview structure-from-motion and homography estimation problems. However, it is unclear how to apply RANSAC to visual recognition problems, e.g., face recognition, where face images are usually in a high-dimensional space.

Sim and Hartley Sim2006 proposed an outlier-removing method using the L_∞ norm, which iteratively fits a model to the data and removes the measurement with the largest residual at each iteration. Generally, the iterative method can fail for L_2 optimization problems; however, it is valid for a wide class of L_∞ problems. Sim and Hartley proved that the set of measurements with largest residual must contain at least one outlier. Hence continuing to iterate eventually removes all the outliers. This method is shown to be effective in outlier detection for multiview geometry problems Sim2006, Olsson2010.

L_∞ norm minimization can be time-consuming, since at each step one needs to solve an optimization problem via Second-Order Cone Programming (SOCP) or Linear Programming (LP) in the application of multi-view geometry Kahl2005, Ke2005. The software package SeDuMi sedumi99 provides solvers for both SOCP and LP problems.

In this paper, we propose a fast algorithm to minimize the L_∞ norm for approximating the least median estimation (denoted as L_∞ for brevity throughout the paper). Observing that the L_∞ norm is determined by only a few measurements, the optimization strategy column generation (CG) Lubbecke05 can be applied to reduce the main problem into a set of much smaller sub-problems. Each sub-problem can be formulated as a Quadratic Programming (QP) problem. Due to its relatively small size, the QP problem can be solved extremely efficiently using customized solvers. In particular, we can generate solvers using the technique introduced by Mattingley and Boyd CVXGEN. This reduction results in a speedup of several orders of magnitude for high dimensional data.

This degree of speedup allows L_∞ to be applied to problems which were previously inaccessible. We show how the L_∞ outlier removal technique can be applied to several classification problems in computer vision. Representations of objects in this type of problem are often derived by using L_2 to solve equations containing samples in the same class (or other collaborative classes) LRC10, ShiEHS11. Representation errors are then used for classification, where the query object is assigned to the class corresponding to the minimal residual. This method is shown to be effective on data without occlusion or outliers. However, in real-world applications, measurement data are almost always contaminated by noise or outliers. Before a robust representation can be obtained via linear estimation, outlier removal is necessary. The proposed method is shown to significantly improve the classification accuracies in our experiments for face recognition and iris recognition on several public datasets.

2 Related work

Hartley and Schaffalitzky Hartley04b seek a globally optimal solution for multi-view geometry problems via L_∞ norm optimization, based on the fact that many geometry problems in computer vision have a single local, and hence global, minimum under the L_∞ norm. In contrast, the commonly used L_2 cost function typically has multiple local minima Hartley04b, Sim2006. This work has been extended by several authors, yielding a large set of geometry problems whose globally optimal solution can be found using the L_∞ norm (Olsson provides a summary Olsson07).

It was observed that these geometry problems are examples of quasiconvex optimization problems, which are typically solved by a sequence of SOCPs using a bisection (binary search) algorithm Kahl2005, Ke2005. Olsson et al. Olsson07 show that the functions involved in the L_∞ norm problems are in fact pseudoconvex, which is a stronger condition than quasiconvex. As a consequence, several fast algorithms have been proposed Olsson07, Hongdong09.

Sim and Hartley Sim2006 propose a simple method based on the L_∞ norm for outlier removal, where measurements with maximal residuals are thrown away. The authors prove that at least one outlier is removed at each iteration, meaning that all outliers will be rejected in a finite number of iterations. However, the method is not efficient, since one needs to solve a sequence of SOCPs. Observing that many fixed-dimensional geometry problems are actually instances of LP-type problems, an LP-type framework was proposed for the multi-view triangulation problem with outliers Hongdong07.

Recently, the Lagrange dual problem of the L_∞ minimization problem posed in Hartley04b was derived in Olsson2010. To further boost the efficiency of the method, the authors of Olsson2010 proposed an L_1-minimization algorithm for outlier removal. While the aforementioned methods add a single slack variable and repeatedly solve a feasibility problem, the L_1 algorithm adds one slack variable for each residual and then solves a single convex program. While efficient, this method is only successful on data drawn from particular statistical distributions.

Robust statistical techniques, including the aforementioned robust regression and outlier removal methods, can significantly improve the performance of their classic counterparts. However, they have rarely been applied in the image analysis field, to problems such as visual recognition, due to their computational expense. The M-estimator method is utilized in RLRC2012 for face recognition and achieved high accuracy even when illumination change and pixel corruption were present. In Fidler06, the authors propose a theoretical framework combining reconstructive and discriminative subspace methods for robust classification and regression. This framework acts on subsets of pixels in images to detect outliers.

The remainder of this paper is organized as follows. In Section 3 we briefly review the L_2 and L_∞ problems. In Section 4, the main outlier removal algorithm is presented. In Section 5, we formulate the L_∞ norm minimization problem into a set of small sub-problems which can be solved with high efficiency. We then apply the outlier removal technique in Section 6 to several visual recognition applications. Finally the conclusion is given in Section 8.

3 The L_2 and L_∞ norm minimization problems

In this section, we briefly present the L_∞ norm minimization problem in the form we use in several recognition problems. Let us first examine the L_2 norm minimization problem,

$$\min_{\beta} \sum_{i=1}^{n} (a_i \beta - y_i)^2, \tag{1}$$

for which we have a closed-form solution (see Footnote 1)

$$\beta_{ls} = (A^\top A)^{-1} A^\top y, \tag{2}$$

where $A = [a_1; \dots; a_n] \in \mathbb{R}^{n \times d}$ is the measurement data matrix, composed of rows $a_i \in \mathbb{R}^{d}$, and usually $n \gg d$. The model's response is represented by the vector $y \in \mathbb{R}^{n}$ and $\beta \in \mathbb{R}^{d}$ stores the parameters to be estimated. Note that in our visual recognition applications, both $y$ and the columns of $A$ are images flattened to vectors. According to the linear subspace assumption Basri00lambertianreflectance, a probe image can be approximately represented by a linear combination of the training samples of the same class: $y \approx A\beta$. Due to its simplicity and efficacy, the linear representation method is widely used in various image analysis applications, e.g., LRC10, ShiEHS11, CRCzhanglei2011, Wright09, Yang20121104, Iris2011.

Footnote 1: The closed-form solution can only be obtained when $A$ is over-determined, i.e., $n \geq d$. When $n < d$, one can solve the multicollinearity problem by ridge regression, or another variable selection method, to obtain a unique solution.

The L_2 norm minimization aims to minimize the sum of squared residuals (1), where the terms $(a_i \beta - y_i)^2$, $i = 1, \dots, n$, are the squared residuals. L_2 norm minimization is simple and efficient; however, it utilizes the entire data set and therefore can be easily influenced by outliers.
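The sensitivity of the closed-form L_2 solution to outliers can be illustrated with a minimal sketch. The data below (a line with slope 2 and intercept 1, 100 points, five responses shifted by 100) are assumptions chosen for illustration and do not come from the paper's experiments.

```python
import numpy as np

# Synthetic line-fitting data (illustrative assumption, not the paper's data).
rng = np.random.default_rng(0)
n = 100
A = np.column_stack([rng.uniform(0, 10, n), np.ones(n)])  # rows a_i = [x_i, 1]
beta_true = np.array([2.0, 1.0])
y = A @ beta_true + 0.1 * rng.standard_normal(n)

def least_squares(A, y):
    """Closed-form L_2 solution via the normal equations (A^T A) beta = A^T y."""
    return np.linalg.solve(A.T @ A, A.T @ y)

beta_clean = least_squares(A, y)

# Corrupt only five of the 100 responses with a large offset.
y_bad = y.copy()
y_bad[:5] += 100.0
beta_bad = least_squares(A, y_bad)

err_clean = np.linalg.norm(beta_clean - beta_true)
err_bad = np.linalg.norm(beta_bad - beta_true)
print("parameter error without outliers:", err_clean)
print("parameter error with 5 outliers: ", err_bad)
```

Only 5% of the measurements are corrupted here, yet the estimate moves far from the true parameters, which is the behaviour that motivates removing outliers before the final L_2 fit.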

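Instead of minimizing the sum of squared residuals, the L_∞ norm minimization method seeks to minimize only the maximal residual, leading to the following formulation:

$$\min_{\beta} \max_{i} \, (a_i \beta - y_i)^2. \tag{3}$$

This equation has no closed-form solution; however, it can be easily reformulated into a constrained formulation, with an auxiliary variable $s$:

$$\min_{\beta, s} \; s \quad \text{s.t.} \quad (a_i \beta - y_i)^2 \leq s, \;\; \forall i, \tag{4}$$

which is clearly a SOCP problem. If we take the absolute value of the residual in (3), we obtain

$$\min_{\beta} \max_{i} \, |a_i \beta - y_i|, \tag{5}$$

leading to an LP problem

$$\min_{\beta, s} \; s \quad \text{s.t.} \quad a_i \beta - y_i \leq s, \;\; a_i \beta - y_i \geq -s, \;\; \forall i. \tag{6}$$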
A critical advantage of the L_∞ norm cost function is that it has a single global minimum in many multi-view geometry problems Hartley04b, Hongdong07. Unfortunately, like L_2 norm minimization, the L_∞ norm minimization method is also vulnerable to outliers. Moreover, minimizing the L_∞ norm fits to the outliers, rather than the data truly generated by the model Sim2006. Therefore, it is necessary to first reject outliers before the estimation.
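The LP form of the minimax problem (minimize an auxiliary variable s subject to -s ≤ a_i β - y_i ≤ s for every measurement) can be sketched with a general-purpose solver. The snippet below uses SciPy's `linprog` rather than the SeDuMi or CVXGEN solvers used in the paper; the function name `linf_fit` and the toy data are illustrative assumptions.

```python
import numpy as np
from scipy.optimize import linprog

def linf_fit(A, y):
    """Minimize max_i |a_i beta - y_i| as an LP over the variables (beta, s)."""
    n, d = A.shape
    c = np.zeros(d + 1)
    c[-1] = 1.0                                  # objective: minimize s
    ones = np.ones((n, 1))
    A_ub = np.vstack([np.hstack([A, -ones]),     #  a_i beta - s <=  y_i
                      np.hstack([-A, -ones])])   # -a_i beta - s <= -y_i
    b_ub = np.concatenate([y, -y])
    bounds = [(None, None)] * d + [(0, None)]    # beta free, s >= 0
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds, method="highs")
    return res.x[:d], res.x[-1]

# Toy example: fit a single constant to three values.
A = np.ones((3, 1))
y = np.array([0.0, 1.0, 4.0])
beta_inf, delta = linf_fit(A, y)
beta_ls = np.linalg.lstsq(A, y, rcond=None)[0]
print("L_inf estimate:", beta_inf[0], "max residual:", delta)
print("L_2 estimate:  ", beta_ls[0], "max residual:", np.max(np.abs(A @ beta_ls - y)))
```

For this toy problem the minimax estimate is the midrange of the data (2, with maximal residual 2), while the least-squares estimate is the mean (5/3), whose maximal residual is larger.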

4 Outlier removal via maximum residual

In Sim2006, outlier removal is conducted in an iterative fashion by first minimizing the L_∞ norm, then removing the measurements with maximum residual and then repeating. The measurements with maximum residual are referred to as the support set of the minimax problem, i.e.,

$$I_s = \{\, i \in I \;|\; (a_i \beta^* - y_i)^2 = \delta^* \,\}, \tag{7}$$

where $I$ is the index set of all measurements, $\beta^*$ is the minimizer and $\delta^*$ is the optimum residual of the minimax problem (with the absolute residual in place of the squared residual for problem (5)). The outlier removal strategy does not work well for the general L_2 minimization problems (i.e., problem (1)), because the outliers are not guaranteed to be included in the support set. In contrast, this strategy is valid for the L_∞ minimization problems. For problem (3) or (5), it is proved by the following theorems that the measurements with largest residual must contain at least one outlier.

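Suppose the index set $I$ is composed of $I_{in}$ and $I_{out}$, the inlier and outlier sets respectively, and there exists $\delta_{in}$ such that $\min_{\beta} \max_{i \in I_{in}} (a_i \beta - y_i)^2 = \delta_{in}$ (with the absolute residual in place of the squared residual for problem (5)). Then we have the following theorem.

Theorem 1.

Sim2006 Consider the L_∞ norm minimization problem (3) or (5) with the optimal residual $\delta^*$. If there exists an inlier subset $I_{in} \subset I$ for which $\delta_{in} < \delta^*$, then the support set $I_s$ must contain at least one index $i \in I_{out}$, that is, an outlier.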

Following Theorem 2 in Sim2006 , Theorem 1 can be easily proved based on the following theorem.

Theorem 2.

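If $i$ is not in the support set $I_s$ for the minimax problem (3) or (5), then removing the measurement with respect to $i$ will not decrease the optimal residual $\delta^*$. Formally, if $i \notin I_s$, then

$$\min_{\beta} \max_{j \in I \setminus \{i\}} (a_j \beta - y_j)^2 = \min_{\beta} \max_{j \in I} (a_j \beta - y_j)^2 = \delta^*,$$

and likewise with the absolute residual $|a_j \beta - y_j|$ for problem (5).

Proof. It is not difficult to verify that both the residual error functions $(a_i \beta - y_i)^2$ and $|a_i \beta - y_i|$ are convex, and therefore also quasiconvex Boyd. Furthermore, these two error functions are also strictly convex and hence strictly quasiconvex. Then due to Corollary 1 in Sim2006 (omitted here), the theorem holds. ∎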

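Algorithm 1 Outlier removal using the L_∞ norm
1:  Input: the measurement data matrix $A \in \mathbb{R}^{n \times d}$; the response vector $y \in \mathbb{R}^{n}$; outlier percent $p$.
2:  Initialization: $l \leftarrow 0$; number of measurements to be removed $L \leftarrow \lfloor p\,n \rfloor$; index $I \leftarrow \{1, \dots, n\}$.
3:  while $l < L$ do
4:     Solve the L_∞ norm minimization problem: $\beta^* = \arg\min_{\beta} \max_{i \in I} |a_i \beta - y_i|$; get the support set $I_s$ via equation (7).
5:     Remove the measurements with indices in $I_s$, i.e., $I \leftarrow I \setminus I_s$.
6:     Remedy. Solve the minimax problem again with the new index $I$ and get the optimal residual $\delta^*$ and parameter $\beta^*$. Move the indices $I_r = \{\, i \in I_s \;|\; |a_i \beta^* - y_i| < \delta^* \,\}$ back to $I$.
7:     $l \leftarrow l + |I_s \setminus I_r|$.
8:  end while
9:  Output: $\beta^*$ and $\delta^*$ with measurement index $I$.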

At each iteration, we first obtain the optimal parameters by solving (3) or (5) and then remove the measurements (pixels in images) corresponding to the largest residual. If we continue the iteration, all outliers are eventually removed.

As with all outlier removal processes, there is a risk that discarding a set of outliers will remove some inliers at the same time. In this framework, the outliers are individual pixels, which are in good supply in visual recognition applications. For example, a face image will typically contain hundreds or thousands of pixels. Removing a small fraction of the good pixels is therefore unlikely to affect recognition performance. However, if too many pixels are removed, the remaining pool may be too small for successful recognition. Therefore, we propose a process to restore incorrectly removed pixels where possible, as part of the overall outlier removal algorithm listed in Algorithm 1. In practice, the heuristic remedy step does improve the performance of our method on the visual recognition problems in our experiments. Also note that it is impossible for all points in the support set to be moved back in step 6, which follows from Theorem 2.
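The iterative scheme of this section can be sketched as follows. This is a simplified illustration under stated assumptions: it uses SciPy's `linprog` for the minimax fit, it omits the remedy step, and the helper names, the synthetic line data and the removal budget of three measurements per outlier are all illustrative choices rather than the authors' settings.

```python
import numpy as np
from scipy.optimize import linprog

def linf_fit(A, y):
    """Solve min_beta max_i |a_i beta - y_i| as an LP in (beta, s)."""
    n, d = A.shape
    c = np.zeros(d + 1)
    c[-1] = 1.0
    ones = np.ones((n, 1))
    A_ub = np.vstack([np.hstack([A, -ones]), np.hstack([-A, -ones])])
    b_ub = np.concatenate([y, -y])
    bounds = [(None, None)] * d + [(0, None)]
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds, method="highs")
    return res.x[:d], res.x[-1]

def remove_outliers(A, y, n_remove, tol=1e-7):
    """Repeatedly drop the support set (largest residuals); no remedy step."""
    idx = np.arange(len(y))
    removed = 0
    while removed < n_remove and len(idx) > A.shape[1] + 1:
        beta, delta = linf_fit(A[idx], y[idx])
        r = np.abs(A[idx] @ beta - y[idx])
        support = r >= delta - tol       # measurements attaining the maximum
        removed += int(support.sum())
        idx = idx[~support]
    beta, delta = linf_fit(A[idx], y[idx])
    return idx, beta, delta

# Synthetic line with bounded inlier noise and five gross one-sided outliers.
rng = np.random.default_rng(1)
n_in, n_out = 50, 5
x = rng.uniform(0, 10, n_in + n_out)
A = np.column_stack([x, np.ones_like(x)])
y = 2.0 * x + 1.0 + rng.uniform(-0.1, 0.1, x.size)
outliers = np.arange(n_in, n_in + n_out)
y[outliers] += rng.uniform(10, 20, n_out)

kept, beta, delta = remove_outliers(A, y, n_remove=3 * n_out)
print("outliers still present:", np.intersect1d(kept, outliers))
print("estimated parameters:", beta, "max residual:", delta)
```

Because each support set may also contain inliers, the sketch removes more measurements than there are outliers; the remedy step of Algorithm 1 is intended to recover some of those inliers.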

5 A fast algorithm for the L_∞ norm minimization problem

Recalling Theorem 2, we may remove any data not in the support set without changing the value of the L_∞ norm. This property allows us to subdivide the large problem into a set of smaller sub-problems. We will proceed by first presenting a useful definition of pseudoconvexity:

Definition 1. A function $f$ is called pseudoconvex if $f$ is differentiable and $\nabla f(x)^\top (z - x) \geq 0$ implies $f(z) \geq f(x)$.

In this definition, $f$ has to be differentiable. However the notion of pseudoconvexity can be generalized to non-differentiable functions pseudoconvex2001:

Definition 2. A function $f$ is called pseudoconvex if for all $x, z$:

$$\exists\, \xi \in \partial f(x): \; \xi^\top (z - x) \geq 0 \;\; \Rightarrow \;\; f(z) \geq f(x),$$

where $\partial f(x)$ is the subdifferential of $f$ at $x$.

Both of these definitions share the property that any local minimum of a pseudoconvex function is also a global minimum. Based on the first definition, it has been proved that if the residual error functions are pseudoconvex and differentiable, the cardinality of the support set is not larger than $d + 1$ Olsson08, Hongdong09. Following the proof in Hongdong09, one can easily validate the following corollary:

Corollary 1. For the minimax problem with pseudoconvex residual functions (differentiable or not), there must exist a subset $I_s \subseteq I$ with $|I_s| \leq d + 1$ such that

$$\min_{\beta} \max_{i \in I_s} r_i(\beta) = \min_{\beta} \max_{i \in I} r_i(\beta),$$

where $r_i(\beta)$ denotes the residual function of measurement $i$.
It is clear that the squared residual functions in (3) are convex and differentiable, and hence also pseudoconvex. The absolute residual function in (5), $|a_i \beta - y_i|$, is sub-differentiable. It is easy to verify that the only non-differentiable point, the origin, satisfies the second definition. Therefore function (5) also satisfies Corollary 1.
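The verification at the non-differentiable point can be written out in one line. The following is a sketch in notation assumed for illustration: $f(\beta) = |a\beta - y|$ for a single measurement with row $a$ and response $y$, and $\beta_0$ any point where the residual vanishes.

```latex
% Assumed notation: f(\beta) = |a\beta - y|, and a\beta_0 = y (zero residual).
\begin{aligned}
\partial f(\beta_0) &= \{\, t\, a^\top : t \in [-1, 1] \,\}, \qquad f(\beta_0) = 0, \\
f(\beta) &= |a\beta - y| \;\geq\; 0 \;=\; f(\beta_0) \quad \text{for every } \beta .
\end{aligned}
```

Since $f(\beta) \geq f(\beta_0)$ holds for every $\beta$, the implication required by the generalized definition of pseudoconvexity holds at $\beta_0$ whichever subgradient $\xi \in \partial f(\beta_0)$ is chosen.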

The above corollary says that we can solve a sub-problem with at most $d + 1$ measurements without changing the estimated solution to the original minimax problem. However, before solving the sub-problems, we should first determine the support set. We choose to solve a set of small sub-problems using an optimization method called column generation (CG) Lubbecke05. The CG method adds one constraint at a time to the current sub-problem until an optimal solution is obtained.
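The claim that a small support set determines the minimax solution can be checked numerically. The sketch below assumes the absolute-residual (LP) form of the problem, random Gaussian data with $d = 3$, and SciPy's `linprog` as the solver; none of these choices come from the paper.

```python
import numpy as np
from scipy.optimize import linprog

def linf_fit(A, y):
    """Solve min_beta max_i |a_i beta - y_i| as an LP in (beta, s)."""
    n, d = A.shape
    c = np.zeros(d + 1)
    c[-1] = 1.0
    ones = np.ones((n, 1))
    A_ub = np.vstack([np.hstack([A, -ones]), np.hstack([-A, -ones])])
    b_ub = np.concatenate([y, -y])
    bounds = [(None, None)] * d + [(0, None)]
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds, method="highs")
    return res.x[:d], res.x[-1]

rng = np.random.default_rng(0)
n, d = 200, 3
A = rng.standard_normal((n, d))
y = A @ rng.standard_normal(d) + rng.standard_normal(n)

# Full problem and its support set (measurements attaining the max residual).
beta, delta = linf_fit(A, y)
r = np.abs(A @ beta - y)
support = np.flatnonzero(r >= delta - 1e-7)

# Sub-problem restricted to the support set only.
beta_s, delta_s = linf_fit(A[support], y[support])
print("support size:", len(support), "(d + 1 =", d + 1, ")")
print("optimal residual, full problem:", delta)
print("optimal residual, sub-problem: ", delta_s)
```

The sub-problem uses only a handful of the 200 measurements yet reproduces the optimal residual of the full problem, which is the property exploited by the fast algorithm.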

The process is as follows. We first choose $d + 1$ measurements, not necessarily contained in the support set. These data are then used to compute a solution, and residuals for all the data in the main problem are determined. Then the most violated constraint, or the measurement corresponding to the largest residual, is added to the sub-problem. The sub-problem is then too large, therefore we solve the sub-problem again (now with size $d + 2$) and remove an inactive measurement. Through this strategy, the problem is prevented from growing too large and violating Corollary 1. When there are no violated constraints, we have obtained the optimal solution.

The proposed fast method is presented in Algorithm 2. For the L_∞ norm minimization, we divide the data into an active set, corresponding to a sub-problem, and the remaining set. This algorithm allows us to solve the original problem with the measurement matrix of size $n \times d$, by solving a series of small problems with size $(d + 1) \times d$ or $(d + 2) \times d$. In most visual recognition problems, $n \gg d$. In our experiments, the algorithm typically converges in fewer than 30 iterations. We will show that this strategy radically improves computational efficiency.

For maximal efficiency, we choose to solve the LP problem, (6), and utilize the code generator CVXGEN CVXGEN to generate custom, high speed solvers for the sub-problems in Algorithm 2. CVXGEN is a software tool that automatically generates customized C code for LP or QP problems of modest size. CVXGEN is less effective on large problems. However, in Algorithm 2 we convert the original problem into a set of small sub-problems, which can be efficiently solved with the generator. CVXGEN embeds the problem size into the generated code, restricting it to fixed-size problems CVXGEN. The proposed method only ever solves problems of size $d + 1$ or $d + 2$, enabling the use of CVXGEN.

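Algorithm 2 A fast algorithm for the L_∞ problem
1:  Input: The measurement data matrix $A \in \mathbb{R}^{n \times d}$; the response vector $y \in \mathbb{R}^{n}$; maximum iteration number $l_{max}$.
2:  Initialization: Initialize the active set $I_a$ with $d + 1$ indices corresponding to the largest absolute residuals from vector $\{ |a_i \beta_{ls} - y_i| \}$, using the LS solution $\beta_{ls}$ as in (2); set the remaining set $I_r = I \setminus I_a$; set iteration counter $l = 0$.
3:  Solve the L_∞-minimization sub-problem
       $$\beta^* = \arg\min_{\beta} \max_{i \in I_a} |a_i \beta - y_i| \tag{11}$$
    and set $\delta^* = \max_{i \in I_a} |a_i \beta^* - y_i|$.
4:  while $l < l_{max}$ do
5:     Get the most violated measurement from the remaining set $I_r$: $i_v = \arg\max_{i \in I_r} |a_i \beta^* - y_i|$.
6:     Check for optimal solution: if $|a_{i_v} \beta^* - y_{i_v}| \leq \delta^*$, then break (problem solved).
7:     Update the active and remaining set: $I_a \leftarrow I_a \cup \{i_v\}$, $I_r \leftarrow I_r \setminus \{i_v\}$.
8:     Solve the L_∞-minimization sub-problem in (11).
9:     Move the inactive measurement index $i_0 = \arg\min_{i \in I_a} |a_i \beta^* - y_i|$ to the remaining set: $I_a \leftarrow I_a \setminus \{i_0\}$, $I_r \leftarrow I_r \cup \{i_0\}$.
10:     $l \leftarrow l + 1$.
11:  end while
12:  Output: $\beta^*$.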
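The active-set procedure described in this section can be sketched in a few lines. This is an illustrative sketch under assumptions, not the authors' implementation: the sub-problems are solved with SciPy's `linprog` instead of CVXGEN-generated C solvers, and the function names, tolerance and synthetic data are hypothetical.

```python
import numpy as np
from scipy.optimize import linprog

def linf_fit(A, y):
    """Solve min_beta max_i |a_i beta - y_i| as an LP in (beta, s)."""
    n, d = A.shape
    c = np.zeros(d + 1)
    c[-1] = 1.0
    ones = np.ones((n, 1))
    A_ub = np.vstack([np.hstack([A, -ones]), np.hstack([-A, -ones])])
    b_ub = np.concatenate([y, -y])
    bounds = [(None, None)] * d + [(0, None)]
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds, method="highs")
    return res.x[:d], res.x[-1]

def fast_linf(A, y, max_iter=1000, tol=1e-8):
    """Active-set scheme: only sub-problems with d+1 or d+2 rows are solved."""
    n, d = A.shape
    beta_ls = np.linalg.lstsq(A, y, rcond=None)[0]
    r = np.abs(A @ beta_ls - y)
    active = [int(i) for i in np.argsort(r)[-(d + 1):]]   # largest LS residuals
    in_active = np.zeros(n, dtype=bool)
    in_active[active] = True
    beta, delta = linf_fit(A[active], y[active])
    for _ in range(max_iter):
        r = np.abs(A @ beta - y)
        r_rem = np.where(in_active, -np.inf, r)
        v = int(np.argmax(r_rem))                # most violated measurement
        if r_rem[v] <= delta + tol:              # no violated constraint left
            break
        active.append(v)
        in_active[v] = True
        beta, delta = linf_fit(A[active], y[active])      # size d + 2
        r_act = np.abs(A[active] @ beta - y[active])
        k = int(np.argmin(r_act))                # inactive measurement
        in_active[active[k]] = False
        del active[k]                            # back to size d + 1
    return beta, delta

rng = np.random.default_rng(0)
n, d = 300, 3
A = rng.standard_normal((n, d))
y = A @ rng.standard_normal(d) + rng.standard_normal(n)

beta_fast, delta_fast = fast_linf(A, y)
beta_full, delta_full = linf_fit(A, y)
print("fast:", delta_fast, "full LP:", delta_full)
```

The sketch returns the same optimal residual as the LP over all 300 measurements while never solving a sub-problem with more than $d + 2$ rows, which is what makes fixed-size generated solvers applicable.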

6 Experimental Results

In this section, we first illustrate the effectiveness and efficiency of our algorithm on several classic geometric model fitting problems. Then the proposed method is evaluated on face recognition problems with both artificial and natural contiguous occlusions. Finally, we test our method on the iris recognition problem, where both segmentation error and occlusions are present. For comparison, we also evaluate several other representative robust regression methods on face recognition problems.

Once the outliers have been removed from the data set, any solver can be used to obtain the final model estimate. We implemented the original minimax algorithm using Matlab package CVX cvx with SeDuMi solver sedumi99 while the proposed fast algorithm was implemented using solvers generated by CVXGEN CVXGEN . All experiments are conducted in Matlab running on a PC with a Quad-Core 3.07GHz CPU and 12GB of RAM, using mex to call the solvers from CVXGEN. Note that the algorithm makes no special effort to use multiple cores, though Matlab itself may do so if possible.

6.1 Geometric model fitting

6.1.1 Line fitting

Figure 1: Two examples demonstrating the performance of our algorithm on data contaminated on one-side (left) and two-sides (right). Outliers detected by our algorithm are marked with circles. The solid black line is the result after outlier removal and the red dashed line shows the result with outliers included. In these two cases, we set and , where and are described in Section 6.1.1.

Figure 1 shows estimation performance when our algorithm is used for outlier removal and the line is subsequently estimated via least squares, on data generated under two different error models.

We generate $A$ and $\beta$ randomly. We then set the first error terms as independent standard normal random variables. We set the last error terms as independent chi squared random variables with degrees of freedom. We also test using the two-sided contamination model which sets the sign of the last variables randomly such that the outliers lie on both sides of the true regression line. In both cases we set .
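The two contamination models can be sketched as a small data generator. The specific values used here (100 observations, 70 inliers, 5 degrees of freedom, a Gaussian design with an intercept column) and the response model $y = A\beta + \epsilon$ are assumptions made for illustration; the exact settings are not recoverable from the text above.

```python
import numpy as np

def make_line_data(n, k, df, two_sided=False, seed=0):
    """First k errors are standard normal (inliers); the last n - k errors are
    chi-squared with `df` degrees of freedom (outliers), optionally sign-flipped."""
    rng = np.random.default_rng(seed)
    A = np.column_stack([rng.standard_normal(n), np.ones(n)])  # assumed design
    beta = rng.standard_normal(2)
    eps = np.empty(n)
    eps[:k] = rng.standard_normal(k)
    eps[k:] = rng.chisquare(df, n - k)
    if two_sided:
        eps[k:] *= rng.choice([-1.0, 1.0], n - k)   # outliers on both sides
    y = A @ beta + eps                              # assumed response model
    return A, y, beta

# Hypothetical settings for illustration only.
A1, y1, beta1 = make_line_data(n=100, k=70, df=5)
A2, y2, beta2 = make_line_data(n=100, k=70, df=5, two_sided=True)
print("one-sided outlier errors:", (y1 - A1 @ beta1)[70:75])
print("two-sided outlier errors:", (y2 - A2 @ beta2)[70:75])
```

In the one-sided model all outlier errors are non-negative, so they pull a least-squares line to one side, whereas in the two-sided model their effects partly cancel.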

As can be seen in Figure 1, our method detects all of the outliers and consequently generates a line estimate which fits the inlier set well for both noise models, whilst the estimate obtained with the outliers included achieves a reasonable estimate only for the two-sided contamination case, where the outliers are evenly distributed on both sides of the line.

6.1.2 Ellipse fitting

An example of the performance of our method applied to ellipse fitting is shown in Figure 2. points were sampled uniformly around the perimeter of an ellipse centred at , and were then perturbed via offset drawn from . outliers were randomly drawn from an approximately uniform distribution within the bounding box shown in Figure 2. The result of the method, again shown in Figure 2, shows that our method has correctly identified the inlier and outlier sets, and demonstrates that the centre and radius estimated by our method are accurate.

Table 1: Computation time (in seconds) comparison of the original and fast algorithms implemented with different solvers. For the line fitting problem, we fix the data dimension $d = 2$ and the number of observations $n$ is increased from 20 to 10000.

| Method   | n = 20 | n = 50 | n = 100 | n = 200 | n = 500 | n = 1000 | n = 2000 | n = 10000 |
|----------|--------|--------|---------|---------|---------|----------|----------|-----------|
| original | 0.185  | 0.454  | 0.890   | 1.926   | 5.093   | 11.597   | 29.968   | 313.328   |
| fast     | 0.002  | 0.005  | 0.011   | 0.031   | 0.083   | 0.164    | 0.395    | 4.127     |

Figure 2: An example of the performance of our method in ellipse fitting. Points identified as outliers are marked by circles.

6.1.3 Efficiency

Next we compare the computational efficiency of the standard L_∞ norm outlier removal process and of the proposed fast algorithm.

For the line fitting problem we generate the data using the scheme described previously. We initially fix the data dimension $d = 2$ and increase the problem size $n$ from 20 to 10000. The outlier fraction, , is set to 90% of . A comparison of the running time for these two algorithms is shown in Table 1. The fast algorithm finishes roughly 60 to 90 times faster than the original algorithm. Specifically, with 10000 observations the fast algorithm finishes in approximately 4 seconds, whilst the original algorithm requires more than 5 minutes. In this case, the proposed fast approach is about 76 times faster than the conventional approach.

Second, we fix the number of observations $n = 200$ and vary the data dimension from 2 to 10. Execution times are shown in Table 2. Consistent with the last experiment, the fast algorithm completes far more rapidly than the original algorithm in all situations. When $d = 2$, the proposed fast algorithm is faster than the original algorithm by more than 60 times. With a larger dimension $d = 10$, the proposed fast algorithm takes only 0.128 seconds to complete while the original algorithm requires more than 2 seconds.

Table 2: Computation time (in seconds) comparison of the original and fast algorithms implemented with different solvers. Data is generated as 200 observations with dimension varying from 2 to 10.

| Method   | d = 2 | d = 4 | d = 6 | d = 8 | d = 10 |
|----------|-------|-------|-------|-------|--------|
| original | 1.926 | 1.984 | 1.994 | 2.016 | 2.039  |
| fast     | 0.031 | 0.045 | 0.068 | 0.096 | 0.128  |
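The speedup factors implied by the timings in Tables 1 and 2 can be recomputed directly. The snippet below simply divides the reported times; the variable names are illustrative.

```python
# Timings (seconds) copied from Table 1: fixed dimension, growing sample size.
n_obs = [20, 50, 100, 200, 500, 1000, 2000, 10000]
t_orig_n = [0.185, 0.454, 0.89, 1.926, 5.093, 11.597, 29.968, 313.328]
t_fast_n = [0.002, 0.005, 0.011, 0.031, 0.083, 0.164, 0.395, 4.127]

# Timings (seconds) copied from Table 2: 200 observations, growing dimension.
dims = [2, 4, 6, 8, 10]
t_orig_d = [1.926, 1.984, 1.994, 2.016, 2.039]
t_fast_d = [0.031, 0.045, 0.068, 0.096, 0.128]

speedup_n = [o / f for o, f in zip(t_orig_n, t_fast_n)]
speedup_d = [o / f for o, f in zip(t_orig_d, t_fast_d)]

for n, s in zip(n_obs, speedup_n):
    print(f"n = {n:5d}: speedup {s:5.1f}x")
for d, s in zip(dims, speedup_d):
    print(f"d = {d:2d}: speedup {s:5.1f}x")
```

On these numbers the speedup stays between roughly 61x and 93x as the number of observations grows, and falls from about 62x to about 16x as the dimension grows from 2 to 10, since the size of each sub-problem increases with the dimension.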

6.2 Robust face recognition

In this section, we test our method on face recognition problems from 3 datasets: AR AMM98, Extended Yale B GeBeKr01, and CMU-PIE Sim03thecmu. A range of state-of-the-art algorithms are compared to the proposed method. Recently, sparse representation based classification (SRC) Wright09 obtained an excellent performance for robust face recognition problems, especially with contiguous occlusions. The SRC problem solves $\min_{\alpha} \|\alpha\|_1$ s.t. $\|y - A\alpha\|_2 \leq \epsilon$, where $A$ is the training data from all classes, $\alpha$ is the corresponding coefficient vector and $\epsilon$ is the error tolerance. To handle occlusions, SRC is extended to $\min_{w} \|w\|_1$ s.t. $y = Bw$, where $B = [A, I]$, and $w = [\alpha; e]$. $I$ and $e$ are the identity matrix and error vector respectively. SRC assigns the test image to the class with smallest residual: $\min_j \|y - A\,\delta_j(\alpha)\|_2$. Here $\delta_j(\alpha)$ is a vector whose only nonzero entries are the entries in $\alpha$ corresponding to the $j$-th class. We also evaluate the following two methods which are related to our method. Most recently, a method called Collaborative Representation-based Classification (CRC) was proposed in CRCzhanglei2011, which relaxes the L_1 norm to the L_2 norm. Linear regression classification (LRC) LRC10 casts face recognition as a simple linear regression problem: $y = A_j \beta_j$, where $A_j$ and $\beta_j$ are the training data and representative coefficients with respect to class $j$. LRC selects the class with smallest residual: $\min_j \|y - A_j \beta_j\|_2$. Both CRC and LRC achieved competitive or even better results than SRC CRCzhanglei2011, LRC10 in some cases.

For the purposes of the face recognition experiments, outlier pixels are first removed using our method, leaving the remaining inlier set to be processed by any regression based classifier. In the experiments listed below, LRC has been used for this purpose due to its computational efficiency.
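The residual-based classification step can be sketched as follows, assuming each class is represented by a matrix whose columns are flattened training images and that the outlier pixels (rows) have already been removed. The function name `lrc_predict` and the random stand-in data are illustrative assumptions, not the datasets used in the experiments.

```python
import numpy as np

def lrc_predict(class_mats, y):
    """Fit y by least squares on each class's training matrix and return the
    index of the class with the smallest reconstruction residual."""
    residuals = []
    for A_j in class_mats:
        beta_j = np.linalg.lstsq(A_j, y, rcond=None)[0]
        residuals.append(float(np.linalg.norm(y - A_j @ beta_j)))
    return int(np.argmin(residuals)), residuals

# Random stand-in data: 3 classes, 5 training "images" each, 50 pixels.
rng = np.random.default_rng(0)
class_mats = [rng.standard_normal((50, 5)) for _ in range(3)]

# A query lying (up to small noise) in the span of class 1's training samples.
y = class_mats[1] @ rng.standard_normal(5) + 0.01 * rng.standard_normal(50)

label, residuals = lrc_predict(class_mats, y)
print("predicted class:", label)
print("residuals per class:", residuals)
```

In the pipeline described above, the rows of each class matrix and of the query would first be restricted to the inlier pixels retained by the outlier removal step.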

Many robust regression estimators have been developed in the statistics literature. In this section, we also compare two other popular estimators, namely, least median of squares (LMS) LMS1984 and the MM-estimator yohai1987high, both of which have high breakdown points and do not require the number of outliers to be specified. Comparison is conducted on face recognition problems with both artificial and natural occlusions. These methods are first used to estimate the coefficients, and face images are then recognized by the minimal residuals.

Figure 3: Detecting the sunglasses occlusion of two example images with dimension from the AR dataset. Each row shows an original image of a particular subject, followed by an image where outlying pixels have been automatically marked in white and finally a reconstructed image of the subject.
Table 3: Accuracy rates (%) of different methods on the AR dataset with sunglasses occlusion. The various feature dimensions correspond to downsampling the original pixel images to , , and , respectively.

| Method       | 54    | 130   | 300   | 540    |
|--------------|-------|-------|-------|--------|
| LRC          | 21.0% | 38.5% | 54.5% | 60.0%  |
| SRC          | 48.0% | 67.5% | 69.5% | 64.5%  |
| CRC          | 22.0% | 35.5% | 44.5% | 56.0%  |
| MM-estimator | 0.5%  | 8.5%  | 21.0% | 24.0%  |
| LMS          | 9.0%  | 25.0% | 37.5% | 48.0%  |
| our method   | 43.0% | 85.0% | 99.5% | 100.0% |

6.2.1 Face recognition despite disguise

The AR dataset AMM98 consists of over 4000 facial images from 126 subjects (70 men and 56 women). For each subject 26 facial images were taken in two separate sessions, 13 per session. The images exhibit a number of variations including various facial expressions (neutral, smile, anger, and scream), illuminations (left light on, right light on and all side lights on), and occlusion by sunglasses and scarves. Of the 126 subjects available, 100 have been randomly selected for testing (50 males and 50 females) and the images cropped to pixels. images of each subject with various facial expressions, but no occlusions, were selected for training. Testing was carried out on images of each of the selected subjects wearing sunglasses. Figure 3 shows two typical images from the AR dataset with the outliers (30% of all the pixels in the face images) detected by our method set to white. The reconstructed images are shown as the third and sixth images.

The images were downsampled to produce features of 54, 130, 300, and 540 dimensions respectively. Table 3 shows a comparison of the recognition rates of various methods. Our method exhibits superior performance to LRC, CRC and SRC in all except the lowest feature dimension case. Specifically, with feature dimension 540, the proposed method achieves a perfect accuracy of 100%, which outperforms LRC, CRC and SRC by 40%, 44% and 35.5% respectively. The MM-estimator and LMS failed to achieve good results on this dataset, mainly because the residuals of outliers severely affect the final classification even though relatively accurate coefficients could be estimated. In this face recognition application, the final classification is based on the fitting residual. These results highlight the ability of our method to remove outliers, which can significantly improve the face recognition performance.

Figure 4: Detecting the randomly placed square monkey face (the first image) in an example face image (the second image) from the Extended Yale B dataset. Outliers (30% of the whole image) detected by our method are marked as white pixels (the third image) and the reconstructed image is shown on the far right.
Method Occlusion rate
10% 20% 30% 35% 40% 50%
our method
Table 4: Means and standard deviations of recognition accuracies (%) in the presence of randomly placed block occlusions of images from the Extended Yale B dataset, based on the results of 5 runs.

6.2.2 Contiguous block occlusions

In order to evaluate the performance of the algorithm in the presence of artificial noise and larger occlusions, we now describe testing where large regions of the original image are replaced by pixels from another source. The Extended Yale B dataset GeBeKr01 was used as the source of the original images and consists of 2414 frontal face images from 38 subjects under various lighting conditions. The images are cropped and normalized to 192 × 168 pixels KCLee05. Following Wright09, we choose Subsets 1 and 2 (715 images) for training and Subset 3 (451 images) for testing. In our experiment, all the images are downsampled to pixels. We replace a randomly selected region, covering between 10% and 50% of each image, with a square monkey face. Figure 4 shows the monkey face, an example of an occluded image, the outlying pixels detected by our method and a reconstructed copy of the input image.

Table 4 compares the recognition rates of the different methods, averaged over five separate runs. Our proposed method outperforms all other methods in all conditions. With small occlusions, all methods achieve high accuracy; however, the performance of LRC, SRC and CRC deteriorates dramatically as the size of the occlusion increases. In contrast, our method is robust in the presence of outliers. In particular, with 30% occlusion our method obtains 98.2% accuracy while the recognition rates of all the other methods are below 80%. With 50% occlusion, all other methods perform poorly, while the accuracy of our method is still above 85%. Table 4 also shows that the proposed method is more stable in terms of accuracy variation, mainly because the outliers are effectively detected.

6.2.3 Partial face features on the CMU PIE dataset

Figure 5: This figure demonstrates the detection of the dead pixels, covering about 13% of the input image. Each row shows a distinct example drawn from the CMU-PIE dataset. Outliers detected by our method are marked as white pixels in the centre column, followed by the reconstructed images in the last column.

As shown in the previous example, occlusion can significantly reduce face recognition performance, particularly in methods without outlier removal. Wright et al. Wright09 attempt to identify faces on the basis of particular sub-images, such as the area around the eye or ear, etc. Here, we use the complete face and remove increasing portions of the bottom half of the image, so that initially the neck is obscured, followed by the chin and mouth, etc. The removal occurs by setting the pixels to be black. Thus, the complete image is used as a feature vector, and a subset of the elements is set to zero. A second experiment is performed following the same procedure, but to the central section of the face, thus initially obscuring the nose, then the ears, eyes and mouth, etc.

In this experiment, we use the CMU-PIE dataset Sim03thecmu which contains 68 subjects and a total of 41368 face images. Each person has their picture taken with 13 different poses, 43 different illumination conditions, and 4 different expressions. In our experiment all of the face images are aligned and cropped, with 256 gray levels per pixel He05laplacianscore , and finally resized to pixels. Here we use the subset containing images of pose C27 (a nearly frontal pose) and we use the data from the first 20 subjects, each with 21 images. The first 15 images of each subject are used for training and the last 6 images for testing. The test images are preprocessed so that one part (bottom or middle) of each face (from 10% to 30% of the pixels, as reported in Tables 5 and 6) is set to black. See Figure 5 for examples.
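The preprocessing of the test images can be sketched as below. It assumes that the removed part is a horizontal band of whole pixel rows, either at the bottom or centred vertically; the exact region shape used in the experiments is not specified here, and the function name `black_out_band` is hypothetical.

```python
def black_out_band(image, fraction, region="bottom"):
    """Return a copy of `image` (a list of pixel rows) in which a horizontal
    band covering about `fraction` of the rows is set to black (0)."""
    height = len(image)
    n_rows = round(fraction * height)
    if region == "bottom":
        start = height - n_rows
    elif region == "middle":
        start = (height - n_rows) // 2
    else:
        raise ValueError("region must be 'bottom' or 'middle'")
    return [[0] * len(row) if start <= r < start + n_rows else row[:]
            for r, row in enumerate(image)]

# Hypothetical example: a 10 x 4 image of ones with 30% of its rows removed.
image = [[1] * 4 for _ in range(10)]
bottom = black_out_band(image, 0.3, "bottom")
middle = black_out_band(image, 0.3, "middle")
black_rows_bottom = [r for r, row in enumerate(bottom) if not any(row)]
black_rows_middle = [r for r, row in enumerate(middle) if not any(row)]
print(black_rows_bottom, black_rows_middle)
```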

The recognition rates of the different methods when the bottom area of the image is occluded are reported in Table 5. When the occluded area is small, all methods except the MM-estimator obtain perfect 100% recognition rates. When the occluded area increases to 20% of the image size, the accuracy of LRC drops to 80%, because the black pixels bias the linear regression estimate. The technique used in SRC mentioned above performs better than LRC when occlusions are present, achieving 98.3% accuracy. However, our method is able to achieve 100% accuracy at that level of occlusion. At 30% occlusion, CRC achieves a relatively good result (83.3%), while the accuracies of the other methods (including the robust MM-estimator and LMS) drop dramatically. In contrast, our method still achieves 100% accuracy, which demonstrates its robustness against heavy occlusion. The comparison of these methods for occlusion in the middle part of the face is shown in Table 6. These results again show the robustness of our method against heavy occlusions. Almost all the methods show lower accuracy than in the former situation. This leads to the conclusion that information from the middle part of a face (the area around the nose) is more discriminative for face recognition than that from the bottom part (the area around the chin).

Method          Percentage of image removed
                10%      20%      30%
LRC             100%     80.0%    61.7%
SRC             100%     98.3%    71.7%
CRC             100%     96.7%    83.3%
MM-estimator    99.2%    41.7%    15.0%
LMS             100%     77.5%    58.3%
our method      100%     100%     100%
Table 5: Recognition accuracies of various methods on the CMU-PIE dataset with dimension . 10% to 30% of pixels in the bottom area are replaced with black.
Method          Percentage of image removed
                10%      20%      30%
LRC             100%     91.7%    35.0%
SRC             100%     93.3%    65.0%
CRC             98.3%    87.5%    53.3%
MM-estimator    92.5%    7.5%     0.8%
LMS             100%     80.8%    22.5%
our method      100%     99.2%    90.0%
Table 6: Recognition accuracies of various methods on the CMU-PIE dataset with dimension . 10% to 30% of pixels in the middle area are replaced with black.

6.2.4 Efficiency

For the problem of identifying outliers in face images, we compare the computational efficiency using the AR face dataset, as described in Section 6.2.1. We vary the feature dimension from 54 to the original 19800. Table 7 shows the execution time for both the proposed fast algorithm and the original method. We can see that the fast algorithm outperforms the original in all situations. With feature dimensions up to 4800, the fast algorithm is approximately 20 times faster than the original. When the feature dimension increases to 19800, the original algorithm needs about 1.43 hours while the fast algorithm takes only about 6 minutes.

Method      Feature dimension
            54       300      1200      4800       19800
original    2.051    9.894    48.371    396.323    5150.137
fast        0.113    0.566    2.564     18.811     361.689
Table 7: Computation time (in seconds) of the original and fast algorithms when applied to the AR face dataset.
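The speedups implied by Table 7 can be checked with a few lines of arithmetic. The timings below are copied from the table; the variable names are only illustrative.

```python
# Timings in seconds, copied from Table 7.
feature_dims = [54, 300, 1200, 4800, 19800]
original_s = [2.051, 9.894, 48.371, 396.323, 5150.137]
fast_s = [0.113, 0.566, 2.564, 18.811, 361.689]

# Speedup of the fast algorithm over the original at each feature dimension.
speedups = [orig / fast for orig, fast in zip(original_s, fast_s)]
for dim, s in zip(feature_dims, speedups):
    print(f"dimension {dim:>5}: {s:.1f}x faster")

# Unit conversions for the largest dimension.
original_hours = original_s[-1] / 3600
fast_minutes = fast_s[-1] / 60
print(f"{original_hours:.2f} hours vs {fast_minutes:.1f} minutes")
```

The ratios lie between roughly 17 and 21 for dimensions up to 4800 and fall to about 14 at dimension 19800.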

6.3 Robust iris recognition

Iris recognition is a commonly used non-contact biometric measure for automatically identifying a person. Occlusions can also occur in iris data acquisition, especially in unconstrained conditions, caused by eyelids, eyelashes, segmentation errors, etc. In this section we test our method against segmentation errors, which can result in outliers from eyelids or eyelashes. Specifically, we use the ND-IRIS-0405 dataset ND06 , which contains 64,980 iris images obtained from 356 subjects with a wide variety of distortions. In our experiment, each iris image is segmented by detecting the pupil and iris boundaries using the open-source package of Masek and Kovesi MasekIris03 . 80 subjects were selected, and 10 images from each subject were chosen for training and 2 images for testing. To test outlier detection, segmentation errors and artificial occlusions were placed on the iris area, in a similar fashion to Iris2011 . A few example images and their detected iris and pupil boundaries are shown in Figure 6. The feature vector is obtained by warping the circular iris region into a rectangular block, sampling with a radial resolution of 20 and an angular resolution of 240. These blocks are then resized to . For our method, 10% of the pixels are detected and removed when the test images contain only segmentation errors, and a corresponding additional number of pixels is removed when artificial occlusions are present.
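The warping of the circular iris region into a rectangular block can be sketched as a polar resampling. This is a simplified illustration and not the Masek and Kovesi implementation: it assumes that the pupil and iris boundaries are concentric circles and uses nearest-neighbour sampling. The function name `unwrap_iris` and its parameters are hypothetical.

```python
import math

def unwrap_iris(image, centre, pupil_radius, iris_radius,
                radial_res=20, angular_res=240):
    """Sample the annulus between the pupil and iris circles on a polar grid
    and return a radial_res x angular_res block (list of rows)."""
    cx, cy = centre
    height, width = len(image), len(image[0])
    block = []
    for i in range(radial_res):
        # Radius at the middle of the i-th radial bin.
        r = pupil_radius + (i + 0.5) / radial_res * (iris_radius - pupil_radius)
        row = []
        for j in range(angular_res):
            theta = 2 * math.pi * j / angular_res
            # Nearest pixel, clamped to the image borders.
            x = min(max(round(cx + r * math.cos(theta)), 0), width - 1)
            y = min(max(round(cy + r * math.sin(theta)), 0), height - 1)
            row.append(image[y][x])
        block.append(row)
    return block

# Hypothetical synthetic image: each pixel stores its distance from (50, 50).
image = [[int(math.hypot(x - 50, y - 50)) for x in range(101)] for y in range(101)]
block = unwrap_iris(image, centre=(50, 50), pupil_radius=10, iris_radius=40)
print(len(block), len(block[0]))
```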

The recognition results are summarized in Table 8. SRC, which is used in Iris2011 for iris recognition, and LRC are compared with our method. We can clearly see that the proposed method achieves the best results with all feature dimensions. Specifically, our method achieves 96.3% accuracy when the iris images contain only segmentation errors, while the accuracy of LRC is 89.5%. SRC performs well (95.6%) on this task. However, when 10% additional occlusion is present in the test images, the accuracies of LRC and SRC drop dramatically to 43.8% and 61.3% respectively, while our method still achieves the same 96.3% as before. When the occlusion increases to 20%, our method still obtains a high accuracy of 95%, which is higher than those of LRC and SRC by 74.4 and 43.7 percentage points respectively.

6.3.1 Efficiency

Table 9 shows the computation time comparison of different methods on the iris recognition problem. Consistent with the former results, the proposed algorithm is much more efficient than the original algorithm.

Figure 6: Three example images from the ND-IRIS-0405 dataset. Features are extracted from the iris area which is between the detected iris and pupil boundaries as shown in red and blue circles respectively. The first two iris images suffer from increasing segmentation errors while the third one suffers from both segmentation error and artificial occlusion.
Method        Percentage of artificial occlusion
              0%       10%      20%
LRC           89.5%    43.8%    20.6%
SRC           95.6%    61.3%    51.3%
our method    96.3%    96.3%    95%
Table 8: Classification accuracies on the ND iris dataset. Note that 0% artificial occlusion means that only segmentation errors are present.
Method      Image resolution
original    2.400    6.692    31.942    255.182
fast        0.131    0.385    1.941     14.906
Table 9: Computation time (in seconds) of the original and fast algorithms on the iris dataset with different feature resolutions (shown in the first row).

7 Discussion

The main drawback of our method is that one has to first estimate the outlier percentage empirically, as is done in many other robust regression methods. Indeed, to our knowledge, almost all outlier removal methods require the outlier percentage, or some other parameter such as a residual threshold, to be set in advance. This is in contrast with robust regression methods that use a robust loss such as the Huber function, or even a nonconvex loss, which do not need the outlier percentage to be specified.

One may ask how the proposed algorithm performs with an under- or over-estimated outlier percentage. Taking the AR dataset as an example, we evaluate our method by varying the outlier percentage from 25% to 45%. From Table 10, we can see that the proposed method is not very sensitive to the pre-estimated outlier percentage when it is over 30%. We also observe that our method becomes more stable when the image resolution is higher. This is mainly because, as mentioned before, visual recognition problems generally supply a large number of pixels through high-dimensional images, and consequently it is more crucial to reject as many outliers as possible than to keep all inliers.
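To illustrate where the pre-set number of outliers enters, the following is a minimal sketch of iterative L_∞-based outlier removal on a deliberately simple model: a single scalar location parameter, for which the L_∞ minimizer is the midrange of the data. It is an illustration under that assumption and not the fast algorithm proposed in this paper; the function names and the toy data are hypothetical.

```python
def linf_location(values):
    """Return (x, r): x minimises max_i |v_i - x| and r is that maximum.
    For a scalar location model this is the midrange and half the range."""
    lo, hi = min(values), max(values)
    return (lo + hi) / 2.0, (hi - lo) / 2.0

def remove_outliers(values, n_remove, tol=1e-9):
    """Repeatedly fit in the L_inf sense and discard the points that attain
    the maximum residual, until at least `n_remove` points are removed."""
    kept = list(enumerate(values))
    removed = []
    while len(removed) < n_remove and len(kept) > 1:
        centre, radius = linf_location([v for _, v in kept])
        if radius <= tol:  # all remaining points agree; nothing to remove
            break
        removed += [i for i, v in kept if abs(v - centre) >= radius - tol]
        kept = [(i, v) for i, v in kept if abs(v - centre) < radius - tol]
    return [i for i, _ in kept], removed

# Hypothetical data: five inliers near 10 and two gross outliers.
data = [9.8, 10.1, 10.0, 9.9, 10.2, 55.0, 60.0]
kept_idx, removed_idx = remove_outliers(data, n_remove=4)
estimate, max_residual = linf_location([data[i] for i in kept_idx])
print(kept_idx, sorted(removed_idx), estimate)
```

In this toy run each iteration discards one outlier together with one inlier, so removing both outliers costs two inliers. With many measurements available, losing a few inliers is a small price, which is consistent with the observation above that rejecting outliers matters more than keeping every inlier.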

In contrast to our approach, there exist many robust estimators which do not require the number of outliers to be specified, such as the MM-estimator yohai1987high , LMS LMS1984 and DPM park2012robust . These methods can also be applied to visual recognition problems, as we have shown in Section 6. The difference is that our method directly identifies the outliers, which helps compute more reliable residuals for classification, as shown in Section 6. Of course, with these methods, observations can still be flagged as outliers when the corresponding standardized residuals exceed a cutoff point, but that cutoff also has to be determined a priori.
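Flagging by standardized residuals can be sketched as follows. The sketch assumes one common convention, which is not necessarily the one used by the estimators cited above: the scale is estimated by the median absolute deviation (MAD) multiplied by 1.4826, and the cutoff is 2.5. The function name and the residual values are hypothetical.

```python
import statistics

def flag_by_standardized_residual(residuals, cutoff=2.5):
    """Flag residuals whose robustly standardized magnitude exceeds `cutoff`.
    The scale is 1.4826 * MAD, a consistent estimate of the standard
    deviation under Gaussian noise."""
    med = statistics.median(residuals)
    mad = statistics.median([abs(r - med) for r in residuals])
    scale = 1.4826 * mad
    if scale == 0:
        # Degenerate case: no spread to standardize against, flag nothing.
        return [False] * len(residuals)
    return [abs(r - med) / scale > cutoff for r in residuals]

# Hypothetical residuals: six small values and one gross error.
residuals = [0.1, -0.2, 0.05, 0.0, -0.1, 0.15, 5.0]
flags = flag_by_standardized_residual(residuals)
print(flags)
```

The cutoff plays the same role as the outlier percentage in our method: it is a parameter that has to be chosen before the outliers can be identified.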

Dimension    Percentage of removed pixels
             25%    30%      35%      40%     45%
             82%    85%      92.5%    95%     94.5%
             98%    99.5%    100%     100%    99.5%
Table 10: Recognition accuracies on the AR dataset with different percentages of outliers removed by our method. The feature dimension is set to and .

8 Conclusion

In this work, we have proposed an efficient method for L_∞ norm minimization based robust least squares fitting, and hence for iteratively removing outliers. The efficiency of the method allows it to be applied to visual recognition problems which would normally be too large for such an approach. The method takes advantage of the nature of the L_∞ norm to break the main problem into more manageable sub-problems, which can then be solved via standard, efficient techniques.

The efficiency of the technique and the benefits that outlier removal can bring to visual recognition problems were highlighted in the experiments, with the computational efficiency and accuracy of the resultant recognition process easily beating all other tested methods.

Like many other robust fitting methods, the proposed method requires one parameter: the number of outliers to be removed. This value may be determined heuristically. Although the results are not very sensitive to it in visual recognition problems, we plan to investigate in future work how to automatically estimate the outlier rate in noisy data.


This work was in part supported by ARC Future Fellowship FT120100969. F. Shen’s contribution was made when he was visiting The University of Adelaide.

All correspondence should be addressed to C. Shen (


  • [1] I. Naseem, R. Togneri, M. Bennamoun, Linear regression for face recognition, IEEE Trans. Patt. Anal. Mach. Intell 32 (11) (2010) 2106–2112.
  • [2] Q. Shi, A. Eriksson, A. van den Hengel, C. Shen, Is face recognition really a compressive sensing problem?, in: Proc. IEEE Conf. Computer Vision and Pattern Recognition, 2011, pp. 553–560.

  • [3] P. J. Huber, Robust regression: Asymptotics, conjectures and Monte Carlo, Ann. Statist. 1 (5) (1973) 799–821.
  • [4] P. J. Huber, Robust Statistics, Wiley, New York, 1981, 2005.
  • [5] R. Koenker, S. Portnoy, L-estimation for linear models, J. Amer. Statistical Assoc. 82 (1987) 851–857.
  • [6] J. Jureckova, Nonparametric estimate of regression coefficients, Ann. Math. Statist. 42 (4) (1971) 1328–1338.
  • [7] M. Hubert, P. J. Rousseeuw, S. v. Aelst, High-breakdown robust multivariate methods, Statistical Science 23 (1) (2008) 92–119.
  • [8] P. J. Rousseeuw, Least median of squares regression, J. Amer. Statistical Assoc. 79.
  • [9] P. J. Rousseeuw, A. M. Leroy, Robust regression and outlier detection, John Wiley & Sons, Inc., New York, NY, USA, 1987.
  • [10] Y. Park, D. Kim, S. Kim, Robust regression using data partitioning and m-estimation, Commun. Stat-simul. C. 41 (8) (2012) 1282–1300.
  • [11] I. Naseem, R. Togneri, M. Bennamoun, Robust regression for face recognition, Patt. Recogn. 45 (1) (2012) 104–118.
  • [12] S. Fidler, D. Skocaj, A. Leonardis, Combining reconstructive and discriminative subspace methods for robust classification and regression by subsampling, IEEE Trans. Patt. Anal. Mach. Intell 28 (3) (2006) 337–350.
  • [13] M. A. Fischler, R. C. Bolles, Random sample consensus: A paradigm for model fitting with applications to image analysis and automated cartography, Communications of the ACM 24 (1981) 381–395.
  • [14] K. Sim, R. Hartley, Removing outliers using the L_∞ norm, in: Proc. IEEE Conf. Computer Vision and Pattern Recognition, 2006, pp. 485–494.
  • [15] C. Olsson, A. Eriksson, R. Hartley, Outlier removal using duality, in: Proc. IEEE Conf. Computer Vision and Pattern Recognition, 2010, pp. 1450–1457.
  • [16] F. Kahl, Multiple view geometry and the L_∞-norm, in: Proc. Int. Conf. Computer Vision, Vol. 2, 2005, pp. 1002–1009.
  • [17] Q. Ke, T. Kanade, Quasiconvex optimization for robust geometric reconstruction, in: Proc. Int. Conf. Computer Vision, Vol. 2, 2005, pp. 986–993.
  • [18] J. F. Sturm, Using sedumi 1.02, a MATLAB toolbox for optimization over symmetric cones, Optim. Method Softw. 11-12 (1999) 625–653.
  • [19] M. E. Lübbecke, J. Desrosiers, Selected topics in column generation, Oper. Res. 53 (6) (2005) 1007–1023.
  • [20] J. Mattingley, S. Boyd, CVXGEN: a code generator for embedded convex optimization, Optim. Eng. 13 (2012) 1–27.
  • [21] R. I. Hartley, F. Schaffalitzky, L_∞ minimization in geometric reconstruction problems, in: Proc. IEEE Conf. Computer Vision and Pattern Recognition, 2004.
  • [22] C. Olsson, A. Eriksson, F. Kahl, Efficient optimization for L_∞-problems using pseudoconvexity, in: Proc. Int. Conf. Computer Vision, 2007.
  • [23] H. Li, Efficient reduction of L_∞ geometry problems, in: Proc. IEEE Conf. Computer Vision and Pattern Recognition, 2009, pp. 2695–2702.
  • [25] S. Fidler, D. Skocaj, A. Leonardis, Combining reconstructive and discriminative subspace methods for robust classification and regression by subsampling, IEEE Trans. Patt. Anal. Mach. Intell 28 (3) (2006) 337–350.
  • [26] R. Basri, D. Jacobs, Lambertian reflectance and linear subspaces, IEEE Trans. Patt. Anal. Mach. Intell 25 (2) (2003) 218–233.
  • [27] L. Zhang, M. Yang, X. Feng, Sparse representation or collaborative representation: Which helps face recognition?, in: Proc. Int. Conf. Computer Vision, 2011, pp. 471–478.
  • [28] J. Wright, A. Y. Yang, A. Ganesh, S. S. Sastry, Y. Ma, Robust face recognition via sparse representation, IEEE Trans. Patt. Anal. Mach. Intell 31 (2009) 210–227.
  • [29] J. Yang, L. Zhang, Y. Xu, J. Yang, Beyond sparsity: The role of L_1-optimizer in pattern classification, Patt. Recogn. 45 (3) (2012) 1104–1118.
  • [30] J. Pillai, V. Patel, R. Chellappa, N. Ratha, Secure and robust iris recognition using random projections and sparse representations, IEEE Trans. Patt. Anal. Mach. Intell 33 (9) (2011) 1877–1893.
  • [31] S. Boyd, L. Vandenberghe, Convex Optimization, Cambridge University Press, 2004.
  • [32] Generalized monotone multivalued maps, in: C. Floudas, P. Pardalos (Eds.), Encyclopedia of Optimization, 2001, pp. 764–769.
  • [33] C. Olsson, O. Enqvist, F. Kahl, A polynomial-time bound for matching and registration with outliers, in: Proc. IEEE Conf. Computer Vision and Pattern Recognition, 2008.
  • [34] M. Grant, S. Boyd, CVX: Matlab software for disciplined convex programming, version 1.21, (2011).
  • [35] A. M. Martinez, R. Benavente, The AR Face Database, CVC, Tech. Rep. 1998.
  • [36] A. S. Georghiades, P. N. Belhumeur, D. J. Kriegman, From few to many: Illumination cone models for face recognition under variable lighting and pose, IEEE Trans. Patt. Anal. Mach. Intell 23 (6) (2001) 643–660.
  • [37] T. Sim, S. Baker, M. Bsat, The CMU pose, illumination, and expression database, IEEE Trans. Patt. Anal. Mach. Intell 25 (2003) 1615–1618.
  • [38] V. J. Yohai, High breakdown-point and high efficiency robust estimates for regression, Ann. Stat. (1987) 642–656.
  • [39] K. Lee, J. Ho, D. Kriegman, Acquiring linear subspaces for face recognition under variable lighting, IEEE Trans. Patt. Anal. Mach. Intell 27 (5) (2005) 684–698.
  • [40] X. He, D. Cai, P. Niyogi, Laplacian score for feature selection, in: Proc. Advances Neural Info. Process. Syst., 2005.

  • [41] P. J. Phillips, W. T. Scruggs, A. J. O’Toole, P. J. Flynn, K. W. Bowyer, C. L. Schott, M. Sharpe, FRVT 2006 and ICE 2006 large-scale experimental results, IEEE Trans. Patt. Anal. Mach. Intell 32 (2010) 831–846.
  • [42] L. Masek, P. Kovesi, MATLAB source code for a biometric identification system based on iris patterns (2003).