Solving Principal Component Pursuit in Linear Time via l_1 Filtering

08/26/2011 · Risheng Liu, et al. · Dalian University of Technology, Peking University

In the past decades, exactly recovering the intrinsic data structure from corrupted observations, which is known as robust principal component analysis (RPCA), has attracted tremendous interest and found many applications in computer vision. Recently, this problem has been formulated as recovering a low-rank component and a sparse component from the observed data matrix. It has been proved that, under suitable conditions, this problem can be exactly solved by principal component pursuit (PCP), i.e., minimizing a combination of the nuclear norm and the l_1 norm. Most of the existing methods for solving PCP require singular value decompositions (SVD) of the data matrix, resulting in a high computational complexity and hence preventing the application of RPCA to very large scale computer vision problems. In this paper, we propose a novel algorithm, called l_1 filtering, for exactly solving PCP with an O(r^2(m+n)) complexity, where m × n is the size of the data matrix and r is the rank of the matrix to recover, which is supposed to be much smaller than m and n. Moreover, l_1 filtering is highly parallelizable. It is the first algorithm that can exactly solve a nuclear norm minimization problem in linear time (with respect to the data size). Experiments on both synthetic data and real applications testify to the great advantage of l_1 filtering in speed over state-of-the-art algorithms.


1 Introduction

Robustly recovering the intrinsic low-dimensional structure of high-dimensional visual data, which is known as robust principal component analysis (RPCA), plays a fundamental role in various computer vision tasks, such as face image alignment and processing, video denoising, structure from motion, background modeling, photometric stereo and texture representation (see e.g., Wright et al (2009), Ji et al (2010), De la Torre and Black (2003), Peng et al (2010), Wu et al (2010), and Zhang et al (2012), to name just a few). Through the years, a large number of approaches have been proposed for solving this problem. The representative works include De la Torre and Black (2003), Nie et al (2011), Aanes et al (2002), Baccini et al (1996), Ke and Kanade (2005), Skocaj et al (2007), and Storer et al (2009). The main limitation of the above-mentioned methods is that there is no theoretical guarantee for their performance. Recently, advances in compressive sensing have led to increasing interest in considering RPCA as the problem of exactly recovering a low-rank matrix A from corrupted observations D = A + E, where E is known to be sparse (Wright et al (2009), Candès et al (2011)). Its mathematical model is as follows:

min_{A,E} rank(A) + λ ||E||_0,   s.t.   D = A + E,        (1)

where ||·||_0 is the l_0 norm of a matrix, i.e., the number of nonzero entries in the matrix, and λ > 0 is a weighting parameter.

Unfortunately, problem (1) is known to be NP-hard. So Candès et al (2011) proposed using principal component pursuit (PCP) to solve (1), which replaces the rank function and the l_0 norm with the nuclear norm (the sum of the singular values of a matrix, denoted as ||·||_*) and the l_1 norm (the sum of the absolute values of the entries, denoted as ||·||_1), respectively. More specifically, PCP solves the following convex problem instead:

min_{A,E} ||A||_* + λ ||E||_1,   s.t.   D = A + E.        (2)

They also rigorously proved that under fairly general conditions and with λ = 1/sqrt(max(m, n)) for an m × n data matrix, PCP can exactly recover the low-rank matrix A (namely the underlying low-dimensional structure) with an overwhelming probability, i.e., the difference of the probability from 1 decays exponentially when the matrix size increases. This theoretical analysis makes PCP distinct from previous methods for RPCA.

All the existing algorithms for RPCA need to compute either SVDs or matrix-matrix multiplications on the whole data matrix. So their computational complexities are all at least quadratic w.r.t. the data size, preventing the application of RPCA to large-scale problems when time is critical. In this paper, we address the large-scale RPCA problem and propose a truly linear cost method to solve the PCP model (2) when the data size is very large while the target rank is relatively small. Such kind of data is ubiquitous in computer vision.

1.1 Main Idea

Our algorithm fully utilizes the properties of low-rankness. The main idea is to apply PCP to a randomly selected submatrix of the original noisy matrix and compute a low-rank submatrix of the underlying matrix. From this low-rank submatrix, which is part of the true low-rank matrix, the remaining entries of the true low-rank matrix can be estimated efficiently.

Specifically, our method consists of two steps (illustrated in Figure 1). The first step is to recover a submatrix A_s (Figure 1 (e)) of A (note that the "submatrix" here does not necessarily mean that we have to choose consecutive rows and columns from D). We call this submatrix the seed matrix because all other entries of A can be further calculated from it. The second step is to use the seed matrix to recover two submatrices A_c and A_r (Figures 1 (f)-(g)), which lie on the same rows and the same columns as A_s in A, respectively. They are recovered by minimizing the l_1 distance to the subspaces spanned by the columns and the rows of A_s, respectively. Hence we call this step l_1 filtering. The remaining part A_cc (Figure 1 (h)) of A can be represented by A_s, A_c and A_r, using the generalized Nyström method (Wang et al (2009)). As analyzed in Section 3.4, our method is of linear cost with respect to the data size. Besides the advantage of linear time cost, the proposed algorithm is also highly parallel: the columns of A_c and the rows of A_r can be recovered fully independently. We also prove that under suitable conditions, our method can exactly recover the underlying low-rank matrix with an overwhelming probability. To the best of our knowledge, this is the first algorithm that can exactly solve a nuclear norm minimization problem in linear time.


Figure 1: Illustration of the proposed l_1 filtering method. A large observed data matrix D (a) is the sum of a low-rank matrix A (b) and a sparse matrix E (c). The method first recovers a seed matrix A_s (a submatrix of A) (e). Then the submatrices A_c (f) and A_r (g) can be recovered by column and row filtering, respectively, where U_s and V_s (bases of the column space and the row space of A_s, respectively) are used as the filters. Then the complement matrix A_cc (h) can be represented by A_s, A_c and A_r. Finally, we obtain the computed low-rank matrix (d), which is identical to A with an overwhelming probability.

2 Previous Works

In this section, we review some previous algorithms for solving PCP. The existing solvers can be roughly divided into three categories: classic convex optimization, factorization and compressed optimization.

For small sized problems, PCP can be reformulated as a semidefinite program and then solved by standard interior point methods. However, this type of method cannot handle even moderately sized matrices due to the high complexity of each iteration. So people turned to first-order algorithms, such as the dual method (Ganesh et al (2009)), the accelerated proximal gradient (APG) method (Ganesh et al (2009)) and the alternating direction method (ADM) (Lin et al (2009)), among which ADM is the most efficient. All these methods require solving the following kind of subproblem in each iteration:

min_A ε ||A||_* + (1/2) ||A − W||_F^2,        (3)

where ||·||_F is the Frobenius norm. Cai et al (2010) proved that the above problem has a closed form solution

A = U S_ε(Σ) V^T,        (4)

where U Σ V^T is the singular value decomposition of W and S_ε(x) = sgn(x) max(|x| − ε, 0) is the soft shrinkage operator. Therefore, these methods all require computing SVDs of some matrices, resulting in O(min(m, n) mn) complexity, where m × n is the matrix size.
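To make the role of (3)-(4) concrete, here is a minimal NumPy sketch of the singular value thresholding step; the function name `singular_value_threshold` is ours for illustration, not taken from any of the cited packages.

```python
import numpy as np

def singular_value_threshold(W, eps):
    """Closed-form minimizer of  eps*||A||_* + 0.5*||A - W||_F^2  (cf. (3)-(4)):
    take the SVD of W and soft-shrink its singular values by eps (Cai et al (2010))."""
    U, sigma, Vt = np.linalg.svd(W, full_matrices=False)
    shrunk = np.maximum(sigma - eps, 0.0)   # soft shrinkage S_eps applied to the spectrum
    k = int((shrunk > 0).sum())             # effective rank after shrinkage
    return (U[:, :k] * shrunk[:k]) @ Vt[:k, :]
```

The full SVD in this sketch is exactly the per-iteration bottleneck discussed above; a partial SVD reduces the cost to O(rmn) but not below.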

As the most expensive computational task required by solving (2) is to perform SVDs, Lin et al (2009) adopted partial SVD (Larsen (1998)) to reduce the complexity of each iteration to O(rmn), where r is the target rank. However, such a complexity is still too high for very large data sets. Drineas et al (2006) developed a fast Monte Carlo algorithm, named linear time SVD (LTSVD), which can be used for computing SVDs approximately (also see Halko et al (2011)). The main drawback of LTSVD is that it is less accurate than the standard SVD, as it uses random sampling. So the whole algorithm needs more iterations to achieve the same accuracy. As a consequence, the speed performance of LTSVD quickly deteriorates when the target rank increases (see Figure 2). Actually, even adopting LTSVD, the whole algorithm is still quadratic w.r.t. the data size because it still requires matrix-matrix multiplications in each iteration.

To address the scalability issue of solving large-scale PCP problems, Shen et al (2011) proposed a factorization based method, named low-rank matrix fitting (LMaFit). This approach represents the low-rank matrix as a product of two smaller matrices and then minimizes over the two matrices alternately. Although it does not require nuclear norm minimization (hence no SVDs), the convergence of the algorithm is not guaranteed, as the corresponding problem is non-convex. Moreover, both the matrix-matrix multiplication and the QR decomposition based rank estimation technique require O(rmn) complexity. So this method does not essentially reduce the complexity.

Inspired by compressed optimization, Mu et al (2011) proposed reducing the problem scale by random projection (RP). However, this method is highly unstable – different random projections may lead to radically different results. Moreover, the need to introduce an additional constraint into the problem slows down the convergence. Actually, the complexity of this method is also O(pmn), where p is the number of rows of the random projection matrix and r ≤ p ≪ m. So this method is still not of linear complexity with respect to the matrix size.

3 The l_1 Filtering Algorithm

Given an observed data matrix D, which is the sum of a low-rank matrix A and a sparse matrix E, PCP is to recover A from D. Our approach differs from traditional ones in that the underlying low-rank matrix A is reconstructed from a seed matrix. As explained in Section 1.1, our l_1 filtering algorithm consists of two steps: first, recovering a seed matrix; second, performing l_1 filtering on the corresponding rows and columns of the data matrix. Below we provide the details of these two steps.

3.1 Seed Matrix Recovery

Suppose that the target rank r is very small compared with the data size: r ≪ min(m, n). We first randomly sample an (s_r r) × (s_c r) submatrix D_s from D, where s_r and s_c are the row and column oversampling rates, respectively. Then the corresponding submatrix A_s of the underlying matrix A can be recovered by solving a small sized PCP problem:

min_{A_s, E_s} ||A_s||_* + λ_s ||E_s||_1,   s.t.   D_s = A_s + E_s,        (5)

e.g., using ADM (Lin et al (2009)), where λ_s is chosen according to the submatrix size, i.e., λ_s = 1/sqrt(max(s_r, s_c) r), following Candès et al (2011).

By Theorem 1.1 in (Candès et al (2011)), the seed matrix A_s can be exactly recovered from D_s with an overwhelming probability when s_r and s_c increase. In fact, that theorem dictates the scale at which s_r and s_c should be chosen. For the experiments conducted in this paper, whose r's are very small, we simply choose s_r and s_c to be moderate constants.
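As an illustration of this seed-recovery step, the following is a hedged NumPy sketch that samples the (s_r r) × (s_c r) submatrix and solves the small PCP problem (5) by a generic inexact-ALM/ADM loop. The helper names (`small_pcp_admm`, `recover_seed`) and the step-size constants are our own assumptions, not the authors' implementation.

```python
import numpy as np

def soft(X, tau):
    """Entrywise soft shrinkage S_tau."""
    return np.sign(X) * np.maximum(np.abs(X) - tau, 0.0)

def small_pcp_admm(D_s, lam=None, tol=1e-7, max_iter=500):
    """Solve  min ||A_s||_* + lam*||E_s||_1  s.t.  D_s = A_s + E_s  (problem (5))
    by a standard inexact ALM (ADM) loop; parameter choices are illustrative."""
    m, n = D_s.shape
    if lam is None:
        lam = 1.0 / np.sqrt(max(m, n))            # standard PCP weight for the submatrix
    A = np.zeros((m, n)); E = np.zeros((m, n)); Y = np.zeros((m, n))
    mu = 1.25 / max(np.linalg.norm(D_s, 2), 1e-12)
    mu_max, rho = mu * 1e7, 1.6
    d_norm = np.linalg.norm(D_s, 'fro')
    for _ in range(max_iter):
        U, s, Vt = np.linalg.svd(D_s - E + Y / mu, full_matrices=False)
        A = (U * np.maximum(s - 1.0 / mu, 0.0)) @ Vt   # singular value thresholding (4)
        E = soft(D_s - A + Y / mu, lam / mu)           # sparse update
        R = D_s - A - E
        Y = Y + mu * R
        mu = min(rho * mu, mu_max)
        if np.linalg.norm(R, 'fro') <= tol * d_norm:
            break
    return A, E

def recover_seed(D, r, s_r, s_c, rng=None):
    """Randomly sample an (s_r r) x (s_c r) block of D and recover the seed A_s."""
    rng = np.random.default_rng() if rng is None else rng
    rows = rng.choice(D.shape[0], size=min(s_r * r, D.shape[0]), replace=False)
    cols = rng.choice(D.shape[1], size=min(s_c * r, D.shape[1]), replace=False)
    A_s, E_s = small_pcp_admm(D[np.ix_(rows, cols)])
    return A_s, rows, cols
```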

3.2 l_1 Filtering

For ease of illustration, we assume that A_s is the top left submatrix of A. Then accordingly D, A and E can be partitioned into:

D = [ D_s  D_c ; D_r  D_cc ],   A = [ A_s  A_c ; A_r  A_cc ],   E = [ E_s  E_c ; E_r  E_cc ].        (6)

Since rank(A) = rank(A_s) = r, there must exist matrices Q and P such that

A_c = A_s Q   and   A_r = P A_s.        (7)

As E is sparse, so are E_c and E_r. Therefore, Q and P can be found by solving the following problems:

min_{Q, E_c} ||E_c||_1,   s.t.   D_c = A_s Q + E_c,        (8)

and

min_{P, E_r} ||E_r||_1,   s.t.   D_r = P A_s + E_r,        (9)

respectively. The above two problems can be easily solved by ADM.

With Q and P computed, A_c and A_r are obtained as in (7). Again by rank(A) = rank(A_s) = r, the generalized Nyström method (Wang et al (2009)) gives:

A_cc = A_r A_s^+ A_c,        (10)

where A_s^+ is the Moore-Penrose pseudo inverse of A_s.

In real computation, as the skinny SVD of A_s is readily available when solving (5), due to the singular value thresholding operation (4), it is more convenient to reformulate (8) and (9) as

min_{Q', E_c} ||E_c||_1,   s.t.   D_c = U_s Q' + E_c,        (11)

and

min_{P', E_r} ||E_r||_1,   s.t.   D_r = P' V_s^T + E_r,        (12)

respectively, where U_s Σ_s V_s^T is the skinny SVD of A_s obtained from (4) in the iterations. Such a reformulation has multiple advantages. First, as U_s and V_s have orthonormal columns, it is unnecessary to compute any matrix inverse when updating Q' and P' in the iterations of ADM. Second, computing (10) also becomes easy if one wants to form A_cc explicitly, because now

A_cc = A_r A_s^+ A_c = P' Σ_s^{-1} Q'.        (13)
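For completeness, the last equality in (13) can be verified by direct substitution of A_c = U_s Q', A_r = P' V_s^T and A_s^+ = V_s Σ_s^{-1} U_s^T into (10):

A_cc = A_r A_s^+ A_c = (P' V_s^T)(V_s Σ_s^{-1} U_s^T)(U_s Q') = P' Σ_s^{-1} Q',

where the simplification uses V_s^T V_s = I and U_s^T U_s = I.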

To make the algorithm description complete, we sketch in Algorithm 1 the ADM for solving (11) and (12), which are both of the following form:

min_{Z, E} ||E||_1,   s.t.   X = B Z + E,        (14)

where X and B are known matrices and B has orthonormal columns, i.e., B^T B = I. The ADM for (14) is to minimize the following augmented Lagrangian function

L(Z, E, Y, β) = ||E||_1 + ⟨Y, X − BZ − E⟩ + (β/2) ||X − BZ − E||_F^2,        (15)

with respect to Z and E alternately, each time fixing the other variables, and then update the Lagrange multiplier Y and the penalty parameter β. (The ADM for solving PCP follows the same methodology. As the reader can refer to (Lin et al (2009), Yuan and Yang (2009)) for details, we omit the pseudo code for using ADM to solve PCP.)

  Input: X and B.
  Initialize: Set E_0, Y_0 and Z_0 to zero matrices. Set β_0 > 0, ρ > 1 and k = 0.
  while not converged (e.g., ||X − B Z_k − E_k||_F > ε ||X||_F) do
     Step 1: Update E_{k+1} = S_{1/β_k}(X − B Z_k + Y_k/β_k), where S is the soft-thresholding operator (Cai et al (2010)).
     Step 2: Update Z_{k+1} = B^T (X − E_{k+1} + Y_k/β_k).
     Step 3: Update Y_{k+1} = Y_k + β_k (X − B Z_{k+1} − E_{k+1}), β_{k+1} = ρ β_k, and k ← k + 1.
  end while
Algorithm 1 Solving (14) by ADM
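The following NumPy sketch mirrors Algorithm 1 for problem (14); the default values of β_0, ρ and the tolerance are illustrative assumptions rather than the paper's settings.

```python
import numpy as np

def l1_filtering_admm(X, B, rho=1.5, beta=None, tol=1e-7, max_iter=500):
    """ADM for  min_{Z,E} ||E||_1  s.t.  X = B Z + E,  where B^T B = I (problem (14)).
    Returns (Z, E). Used for the column filtering (11) and, after transposing,
    the row filtering (12)."""
    if beta is None:
        beta = 1.25 / max(np.abs(X).max(), 1e-12)
    Z = np.zeros((B.shape[1], X.shape[1]))
    E = np.zeros_like(X, dtype=float)
    Y = np.zeros_like(X, dtype=float)
    x_norm = np.linalg.norm(X, 'fro')
    for _ in range(max_iter):
        # Step 1: soft-threshold to update E
        T = X - B @ Z + Y / beta
        E = np.sign(T) * np.maximum(np.abs(T) - 1.0 / beta, 0.0)
        # Step 2: update Z; no matrix inversion is needed because B^T B = I
        Z = B.T @ (X - E + Y / beta)
        # Step 3: update the multiplier and the penalty
        R = X - B @ Z - E
        Y = Y + beta * R
        beta *= rho
        if np.linalg.norm(R, 'fro') <= tol * x_norm:
            break
    return Z, E
```

Because each column of X defines an independent subproblem, the loop can also be run column-block-wise on a parallel platform, as noted below.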

Note that (11) and (12) can also be solved in full parallelism, as the columns of Q' and the rows of P' can be computed independently, thanks to the decomposability of the problems. So the recovery of A_c and A_r is very efficient if one has a parallel computing platform, such as a general purpose graphics processing unit (GPU).

3.3 The Complete Algorithm

Now we are able to summarize in Algorithm 2 our l_1 filtering method for solving PCP, where Steps 3 and 4 can be done in parallel.

  Input: Observed data matrix D.
  Step 1: Randomly sample a submatrix D_s from D.
  Step 2: Solve the small sized PCP problem (5), e.g., by ADM, to recover the seed matrix A_s.
  Step 3: Reconstruct A_c by solving (11).
  Step 4: Reconstruct A_r by solving (12).
  Step 5: Represent A_cc by (13).
  Output: Low-rank matrix A and sparse matrix E = D − A.
Algorithm 2 The l_1 Filtering Method for Solving PCP (2)
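Putting the pieces together, here is a compact sketch of Algorithm 2 that reuses the `small_pcp_admm` and `l1_filtering_admm` sketches above; the oversampling defaults and the numerical-rank threshold are assumptions for illustration only.

```python
import numpy as np

def l1_filtering_pcp(D, r, s_r=10, s_c=10, rng=None):
    """l_1 filtering sketch for PCP (2): seed PCP + column/row filtering + Nystrom.
    Returns the recovered low-rank A and the sparse E = D - A."""
    rng = np.random.default_rng() if rng is None else rng
    m, n = D.shape
    rows = rng.choice(m, size=min(s_r * r, m), replace=False)       # Step 1: sample the seed block
    cols = rng.choice(n, size=min(s_c * r, n), replace=False)
    A_s, _ = small_pcp_admm(D[np.ix_(rows, cols)])                  # Step 2: solve (5)
    U_s, sig, Vt_s = np.linalg.svd(A_s, full_matrices=False)
    k = max(1, int((sig > 1e-8 * sig[0]).sum()))                    # numerical rank of the seed
    U_s, sig, Vt_s = U_s[:, :k], sig[:k], Vt_s[:k, :]
    rest_rows = np.setdiff1d(np.arange(m), rows)
    rest_cols = np.setdiff1d(np.arange(n), cols)
    Q, _ = l1_filtering_admm(D[np.ix_(rows, rest_cols)], U_s)       # Step 3: column filtering (11)
    P, _ = l1_filtering_admm(D[np.ix_(rest_rows, cols)].T, Vt_s.T)  # Step 4: row filtering (12), transposed
    A = np.empty_like(D, dtype=float)
    A[np.ix_(rows, cols)] = A_s
    A[np.ix_(rows, rest_cols)] = U_s @ Q                            # A_c = U_s Q'
    A[np.ix_(rest_rows, cols)] = P.T @ Vt_s                         # A_r = P' V_s^T
    A[np.ix_(rest_rows, rest_cols)] = P.T @ np.diag(1.0 / sig) @ Q  # Step 5: A_cc by (13)
    return A, D - A
```

Steps 3 and 4 are independent and can run in parallel, and only the small seed block ever goes through an SVD.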

3.4 Complexity Analysis

Now we analyze the computational complexity of the proposed Algorithm 2. For the step of seed matrix recovery, the complexity of solving (5) is only O(r^3) when the oversampling rates s_r and s_c are regarded as constants. For the l_1 filtering step, it can be seen that the complexity of solving (11) and (12) is O(r^2 n) and O(r^2 m), respectively, so the total complexity of this step is O(r^2(m+n)). The remaining part A_cc of A can simply be represented by A_s, A_c and A_r using the generalized Nyström method (Wang et al (2009)); if one wants to form A_cc explicitly, this step costs no more than O(rmn) complexity. Compared with other methods, the rest of our computations are all of at most O(r^2(m+n)) complexity, while those methods all require at least O(rmn) complexity in each iteration, which results from matrix-matrix multiplications. Recalling that r ≪ min(m, n), we conclude that the overall complexity of Algorithm 2 is O(r^2(m+n)), which is only of linear cost with respect to the data size.

3.5 Exact Recoverability of l_1 Filtering

The exact recoverability of A using our l_1 filtering method rests on two factors: first, exactly recovering A_s from D_s; second, exactly recovering A_c and A_r. If A_s, A_c and A_r can all be exactly recovered, then A is exactly recovered by (10).

The exact recoverability of A_s from D_s is guaranteed by Theorem 1.1 of (Candès et al (2011)). When s_r and s_c are sufficiently large, the chance of success is overwhelming.

To analyze the exact recoverability of A_c and A_r, we first observe that it is equivalent to the exact recoverability of E_c and E_r. By multiplying annihilation matrices U_s^⊥ (with U_s^⊥ U_s = 0) and V_s^⊥ (with V_s^T V_s^⊥ = 0) to both sides of (11) and (12), respectively, we may recover E_c and E_r by solving

min_{E_c} ||E_c||_1,   s.t.   U_s^⊥ D_c = U_s^⊥ E_c,        (16)

and

min_{E_r} ||E_r||_1,   s.t.   D_r V_s^⊥ = E_r V_s^⊥,        (17)

respectively. If the oversampling rates s_r and s_c are large enough, we are able to choose U_s^⊥ and V_s^⊥ that are close to Gaussian random matrices. Then we may apply the standard theory in compressed sensing (Candès and Wakin (2007)) to conclude that if the oversampling rates s_r and s_c are large enough and E_c and E_r are sparse enough (as the analysis in the compressed sensing theories is qualitative and the bounds are actually pessimistic, copying those inequalities here is not very useful, so we omit the mathematical descriptions for brevity), E_c and E_r can be exactly recovered with an overwhelming probability.

We also present an example in Figure 1 to illustrate the exact recoverability of l_1 filtering. We first truncate the SVD of an image "Water" (available at http://www.petitcolas.net/fabien/watermarking/image_database/) to get a matrix of rank 30 (Figure 1 (b)). The observed image (Figure 1 (a)) is obtained from Figure 1 (b) by adding large noise to 30% of the pixels, whose locations are uniformly sampled at random (Figure 1 (c)). Given the top-left submatrix as the seed (Figure 1 (e)), the low-rank image (Figure 1 (d)) can be exactly recovered by l_1 filtering. Actually, the relative reconstruction error in A is negligibly small.

3.6 Target Rank Estimation

The above analysis and computation are all based on a known value of the target rank r. For some applications, we could have an estimate of r. For example, for the background modeling problem (Wright et al (2009)), the rank of the background video should be very close to one, as the background hardly changes; and for the photometric stereo problem (Wu et al (2010)), the rank of the surface normal map should be very close to three, as the normals are three dimensional vectors. However, the rank r of the underlying matrix might not always be known. So we have to provide a strategy to estimate r.

As we assume that the size of the submatrix is (s_r r) × (s_c r), where s_r and s_c should be sufficiently large in order to ensure the exact recovery of A_s from D_s, after we have computed A_s by solving (5) we may check whether the conditions

(18)

are satisfied, where r' is the rank of A_s. If yes, A_s is accepted as a seed matrix. Otherwise, it implies that r may be too small with respect to the true target rank. Then we may increase the size of the submatrix and repeat the above procedure until (18) is satisfied or

(19)

We require (19) because the speed advantage of our l_1 filtering algorithm will be quickly lost beyond this size limit (see Figure 2). If we have to use a submatrix whose size exceeds this limit, then the target rank must be comparable to the size of the data, hence breaking our low-rank assumption. In this case, we may resort to the usual methods to solve PCP.

Of course, we may sample one more submatrix to cross-validate the estimated target rank r'. When r' is indeed very small, such a cross validation is not a big overhead.
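Since the exact conditions (18)-(19) are not reproduced above, the following sketch only illustrates the adaptive strategy with generic stand-ins: accept the seed when its numerical rank still leaves the intended oversampling, and give up once the submatrix exceeds a fixed fraction of the data. The doubling rule and the 0.5 fraction are assumptions.

```python
import numpy as np

def estimate_seed(D, r_guess, s_r=10, s_c=10, max_frac=0.5, rng=None):
    """Adaptive seed selection in the spirit of Section 3.6 (generic stand-ins
    for (18)-(19)). Returns the accepted seed block, its index sets and its rank."""
    rng = np.random.default_rng() if rng is None else rng
    m, n = D.shape
    r = max(1, r_guess)
    while True:
        p, q = min(s_r * r, m), min(s_c * r, n)
        rows = rng.choice(m, size=p, replace=False)
        cols = rng.choice(n, size=q, replace=False)
        A_s, _ = small_pcp_admm(D[np.ix_(rows, cols)])           # seed PCP, as sketched in Sec. 3.1
        r_tilde = np.linalg.matrix_rank(A_s, tol=1e-6 * np.linalg.norm(A_s, 2))
        if s_r * r_tilde <= p and s_c * r_tilde <= q:            # stand-in for condition (18)
            return A_s, rows, cols, r_tilde
        if p >= max_frac * m or q >= max_frac * n:               # stand-in for the size limit (19)
            raise RuntimeError("seed too large; fall back to solving PCP on the whole matrix")
        r *= 2                                                   # enlarge the submatrix and retry
```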

4 Experimental Results

In this section, we present experiments on both synthetic data and real vision problems (structure from motion and background modeling) to test the performance of l_1 filtering. All the experiments are conducted and timed on the same PC with an AMD Athlon® II X4 2.80GHz CPU that has 4 cores and 6GB memory, running Windows 7 and Matlab (Version 7.10).

4.1 Comparison Results for Solving PCP

We first test the performance of l_1 filtering on solving the PCP problem (2). The experiments are categorized into the following three classes:

  1. Compare with classic convex optimization solvers (ADM (Lin et al (2009)) and its variation, denoted as LTSVD ADM, which uses the linear-time SVD (Drineas et al (2006)) to compute the partial SVD in each iteration) on randomly generated low-rank and sparse matrices.

  2. Compare with a factorization based solver (LMaFit (Shen et al (2011))) on recovering either a randomly generated or a deterministic low-rank matrix from its sum with a random sparse matrix.

  3. Compare with a random projection based solver (RP (Mu et al (2011))) on recovering randomly generated low-rank and sparse matrices.

In the experiments on synthetic data, we generate random test data in the following way: an observed data matrix D is synthesized as the sum of a low-rank matrix A_0 and a sparse matrix E_0. The rank r matrix A_0 is generated as a product of an m × r and an r × n matrix whose entries are i.i.d. Gaussian random variables with zero mean and unit variance. The matrix E_0 is generated as a sparse matrix whose support is chosen uniformly at random and whose non-zero entries are i.i.d. uniform random variables. The rank ratio and sparsity ratio are denoted as ρ_r = r/min(m, n) and ρ_s = ||E_0||_0/(mn), respectively.
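For reproducibility of the synthetic setting described above, here is a small NumPy generator; the amplitude of the nonzero entries of E_0 is an assumption, since the original interval is not recoverable from the text.

```python
import numpy as np

def make_synthetic(m, n, rho_r, rho_s, amp=100.0, rng=None):
    """Generate D = A_0 + E_0: A_0 is a rank-(rho_r*min(m,n)) product of i.i.d.
    N(0,1) factors; E_0 has a uniformly random support of density rho_s with
    i.i.d. uniform nonzero entries in [-amp, amp] (the amplitude is an assumption)."""
    rng = np.random.default_rng() if rng is None else rng
    r = max(1, int(round(rho_r * min(m, n))))
    A0 = rng.standard_normal((m, r)) @ rng.standard_normal((r, n))
    E0 = np.zeros((m, n))
    support = rng.random((m, n)) < rho_s
    E0[support] = rng.uniform(-amp, amp, size=int(support.sum()))
    return A0 + E0, A0, E0
```

Usage: D, A0, E0 = make_synthetic(m, n, rho_r, rho_s) for whichever ratios are being tested.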

4.1.1 l_1 Filtering vs. Classic Convex Optimization

Firstly, we compare our approach with ADM on the whole matrix (the Matlab code of ADM is provided by the authors of (Lin et al (2009)) and all the parameters in this code are set to their default values), which we call the standard ADM, and its variation, which uses the linear-time SVD (LTSVD, available in the FPCA package at http://www.columbia.edu/sm2756/FPCA.htm) for computing the partial SVD, hence called LTSVD ADM. We choose these two approaches because the standard ADM is known to be the most efficient classic convex optimization algorithm to solve PCP exactly, and LTSVD ADM has a linear time cost in computing the SVD (however, LTSVD ADM is still of quadratic complexity overall, as it involves matrix-matrix multiplications in each iteration; see also Section 2). For LTSVD ADM, each time we compute the partial SVD we uniformly oversample columns of the data matrix without replacement; such an oversampling rate is important for ensuring the numerical accuracy of LTSVD ADM with high probability. For all methods in comparison, the stopping criterion is that the relative residual ||D − A_k − E_k||_F/||D||_F falls below the same small threshold.

Table 1 shows the detailed comparison among the three methods, where RelErr = ||A − A_0||_F/||A_0||_F is the relative error of the recovered low-rank matrix A to the true low-rank matrix A_0. It is easy to see that our l_1 filtering approach has the highest numerical accuracy and is also much faster than the standard ADM and LTSVD ADM. Although LTSVD ADM is faster than the standard ADM, its numerical accuracy is the lowest among the three methods because it is probabilistic.

Size (m = n) | Method | RelErr | r(A) | … | Time (s)
m = n = 2000:
  S-ADM | 1.46 | 20 | 39546 | 39998 | 998105 | 84.73
  L-ADM | 4.72 | 20 | 39546 | 40229 | 998105 | 27.41
  l_1   | 1.66 | 20 | 39546 | 40000 | 998105 | 5.56 = 2.24 + 3.32
m = n = 5000:
  S-ADM | 7.13 | 50 | 249432 | 249995 | 6246093 | 1093.96
  L-ADM | 4.28 | 50 | 249432 | 250636 | 6246158 | 195.79
  l_1   | 5.07 | 50 | 249432 | 250000 | 6246093 | 42.34 = 19.66 + 22.68
m = n = 10000:
  S-ADM | 1.23 | 100 | 997153 | 1000146 | 25004071 | 11258.51
  L-ADM | 4.26 | 100 | 997153 | 1000744 | 25005109 | 1301.83
  l_1   | 2.90 | 100 | 997153 | 1000023 | 25004071 | 276.54 = 144.38 + 132.16
Table 1: Comparison among the standard ADM (S-ADM for short), LTSVD ADM (L-ADM for short) and the l_1 filtering method (l_1 for short) on the synthetic data. We present the CPU time (in seconds) and the numerical accuracy of the tested algorithms. (A_0, E_0) denote the ground truth and (A, E) the solutions computed by the different methods. For the l_1 filtering method, we report its computation time as t_total = t_seed + t_filter, where t_total, t_seed and t_filter are the times for the total computation, the seed matrix recovery and the l_1 filtering, respectively.

We also present in Figure 2 the CPU times of the three methods when the rank ratio ρ_r and the sparsity ratio ρ_s increase, respectively. The observed matrices are generated using the following parameter settings: ρ_r varies from 0.005 to 0.05 with ρ_s fixed, and ρ_s varies from 0.02 to 0.2 with ρ_r fixed. It can be seen from Figure 2 (a) that LTSVD ADM is faster than the standard ADM when ρ_r is small. However, the computing time of LTSVD ADM grows quickly as ρ_r increases; it even becomes slower than the standard ADM for large ρ_r. This is because LTSVD cannot guarantee the accuracy of the partial SVD in each iteration, so its number of iterations is larger than that of the standard ADM. In comparison, the time cost of our l_1 filtering method is much less than that of the other two methods for all the rank ratios. However, when ρ_r grows further the advantage of l_1 filtering will be lost quickly, because l_1 filtering has to compute the PCP on the (s_r r) × (s_c r) submatrix. In contrast, Figure 2 (b) indicates that the CPU time of these methods grows very slowly with respect to the sparsity ratio.

Figure 2: Performance of the standard ADM (S-ADM for short), LTSVD ADM (L-ADM for short) and l_1 filtering under different rank ratios ρ_r and sparsity ratios ρ_s, with the matrix size fixed. The x-axis represents the rank ratio (a) or the sparsity ratio (b). The y-axis represents the CPU time (in seconds).

4.1.2 l_1 Filtering vs. Factorization Method

We then compare the proposed l_1 filtering with a factorization method, LMaFit (the Matlab code of LMaFit is provided by the authors of (Shen et al (2011)) and all the parameters in this code are set to their default values), on solving (2). To test the ability of these algorithms to cope with corruptions of large magnitude, we multiply the sparse matrix by a scale factor s, i.e., D = A_0 + s E_0. We fix the other parameters of the data (the size, ρ_r and ρ_s) and vary the scale parameter s from 1 to 10 to increase the magnitude of the sparse errors.

The computational comparisons are presented in Figure 3. Besides the CPU time and the relative error, we also measure the quality of the recovered A by its maximum difference (MaxDif) and average difference (AveDif) to the true low-rank matrix A_0, which are respectively defined as MaxDif = max_{i,j} |A_{ij} − (A_0)_{ij}| and AveDif = (Σ_{i,j} |A_{ij} − (A_0)_{ij}|)/(mn). One can see that the performance of LMaFit dramatically decreases once the scale s becomes large. This experiment suggests that the factorization method fails when the sparse matrix dominates the low-rank one in magnitude. This is because a sparse matrix with large magnitudes makes rank estimation difficult or impossible for LMaFit; without a correct rank, the low-rank matrix cannot be recovered exactly. In comparison, our l_1 filtering always performs well on the test data.

Figure 3: Performance of LMaFit and l_1 filtering under different sparsity magnitudes (scales s). The x-axes represent the sparsity magnitudes and the y-axes represent the CPU time (in seconds) (a), "RelErr" (b), "MaxDif" (c) and "AveDif" (d) in log scale, respectively.

In the following, we consider the problem of recovering a deterministic low-rank matrix from corruptions. We generate a "checkerboard" image (see Figure 4), whose rank is 2, and corrupt it by adding 10% impulsive noise to it. The corruptions (nonzero entries of the sparse matrix) are sampled uniformly at random. The image size ranges from 1000 to 5000 with an increment of 500.

The results for this test are shown in Figure 4, where the first image is the corrupted checkerboard image, the second image is recovered by LMaFit and the third by l_1 filtering. A more complete illustration for this test is given in Figure 4 (d), where the CPU times corresponding to all tested data matrix sizes are plotted. It can be seen that the images recovered by LMaFit and l_1 filtering are visually comparable in quality. The speeds of these two methods are very similar when the data size is small, while l_1 filtering runs much faster than LMaFit as the matrix size increases. This shows that our approach has a significant speed advantage over the factorization method on large scale data sets.

Figure 4: Recovery results for the "checkerboard" image. (a) is the image corrupted by 10% impulsive noise. (b) is the image recovered by LMaFit. (c) is the image recovered by l_1 filtering. (d) CPU time (in seconds) vs. data size.

4.1.3 l_1 Filtering vs. Compressed Optimization

Now we compare l_1 filtering with a compressed optimization method, namely random projection (the Matlab code of random projection (RP) is provided by the author of (Mu et al (2011)) and all the parameters in this code are set to their default values). This experiment studies the performance of the two methods with respect to the rank of the matrix and the data size. The parameters of the test matrices are set as follows: ρ_r varies from 0.05 to 0.15 with the data size fixed, and the data size varies from 1000 to 5000 with ρ_r fixed. The dimension of the projection matrix is set to the same value for all the experiments.

As shown in Figure 5, in all cases the speed and the numerical accuracy of l_1 filtering are always much higher than those of random projection.

Figure 5: Performance of random projection (RP for short) and l_1 filtering. (a)-(b) are the comparison under different rank ratios. (c)-(d) are the comparison under different data sizes. In (a) and (c), the y-axes are the CPU times (in seconds). In (b) and (d), the y-axes are the relative errors in log scale.

4.2 Structure from Motion

In this subsection, we apply l_1 filtering to a real world vision application, namely structure from motion (SfM). The problem of SfM is to automatically recover the 3D structure of an object from a sequence of images of the object. Suppose that the object is rigid, there are F frames and P tracked feature points, and the camera intrinsic parameters do not change. As shown in (Rao et al (2010)), the trajectories of the feature points from a single rigid motion of the camera all lie in a linear subspace of R^{2F}, whose dimension is at most four; i.e., the 2F × P measurement matrix W of stacked trajectories has rank at most 4. It has been shown that W can be factorized as W = MS, where M recovers the rotations and translations, while the first three rows of S encode the relative 3D positions for each feature point in the reconstructed object. However, when there exist errors (e.g., occlusion, missing data or outliers), the feature matrix W is no longer of rank 4. Then recovering the full 3D structure of the object can be posed as a low-rank matrix recovery problem.
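After the trajectory matrix has been cleaned by RPCA, the motion/structure split can be obtained from a rank-4 factorization. The snippet below is a generic SVD-based sketch (up to the usual affine ambiguity), not necessarily the exact post-processing used in the paper.

```python
import numpy as np

def factorize_trajectories(A):
    """Rank-4 factorization of the recovered 2F x P trajectory matrix A:
    A ~ M @ S, with M (2F x 4) the motion matrix and S (4 x P) the homogeneous
    structure whose first three rows give the relative 3D point positions."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    root = np.sqrt(s[:4])
    M = U[:, :4] * root             # motion: rotations and translations
    S = root[:, None] * Vt[:4, :]   # structure (affine frame)
    return M, S
```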

For this experiment, we first generate the 2D feature points by applying an affine camera model (with rotation angles varying over a fixed range with a constant step size, and with uniformly randomly generated translations) to the 3D "wolf" object (available at http://tosca.cs.technion.ac.il/). Then we add impulsive noises (the locations of the nonzero entries are uniformly sampled at random) to part of the feature points, at two different noise levels (see Figure 6 and Table 2). In this way, we obtain corrupted observations W of size 2F × P.

Figure 6: Illustration of some trajectories (2D image frames) generated by the 3D "wolf" object (the 300th, 500th, ..., 1300th and 1500th frames). Top row: the ground truth trajectories. Bottom row: corrupted trajectories.

We apply our l_1 filtering to remove the outliers (i.e., the sparse errors E) and compute the affine motion matrix and the 3D coordinates from the recovered features (i.e., the low-rank matrix A). For comparison, we also include the results from robust subspace learning (RSL) (De la Torre and Black (2003); the Matlab code of RSL is available at http://www.salleurl.edu/ftorre/papers/rpca/rpca.zip and the parameters in this code are set to their default values) and standard PCP (i.e., S-ADM based PCP). In Figure 7, we show the original 3D object and the SfM results based on the noisy trajectories and on the trajectories recovered by RSL, standard PCP and l_1 filtering, respectively. It is easy to see that the 3D reconstruction of RSL fails near the front legs and the tail. In contrast, standard PCP and l_1 filtering provide results with almost the same quality. Table 2 further compares the numerical behaviors of these methods. We measure the quantitative performance for SfM by the well-known mean 2D reprojection error, which is denoted as "ReprojErr" and defined as the mean distance between the ground truth 2D feature points and their reprojections. We can see that standard PCP provides the highest numerical accuracy, while its time cost is extremely high (9 times slower than RSL and more than 100 times slower than l_1 filtering). Although the speed of RSL is faster than that of standard PCP, its numerical accuracy is the worst among these methods. In comparison, our l_1 filtering achieves almost the same numerical accuracy as standard PCP and is the fastest.

Figure 7: The SfM reconstruction. (a) is the original 3D object. (b)-(e) are the SfM results using the corrupted trajectories and the trajectories recovered by RSL, standard PCP (S-PCP for short) and l_1 filtering, respectively.
Noise level | Method | RelErr | r(A) | ||E||_0 | Time (s) | MaxDif | AveDif | ReprojErr
Noise level 1:
  RSL   | 0.0323 | 4 (fixed) | 15384229 | 93.05  | 32.1731 | 0.4777 | 0.9851
  S-PCP |        | 4         | 869200   | 848.09 |         |        |
  l_1   |        | 4         | 869644   | 6.46   |         |        |
Noise level 2:
  RSL   | 0.0550 | 4 (fixed) | 16383294 | 106.65 | 38.1621 | 0.9285 | 1.8979
  S-PCP |        | 4         | 1738410  | 991.40 |         |        |
  l_1   |        | 4         | 1739912  | 6.48   |         |        |
Table 2: Comparison among RSL, S-PCP and l_1 filtering on the structure from motion problem. We present the CPU time (in seconds) and the numerical accuracy of the tested algorithms. (A_0, E_0) denote the ground truth and (A, E) the solutions computed by the different methods.

4.3 Background Modeling

In this subsection, we consider the problem of background modeling from video surveillance. The backgrounds of a group of video surveillance frames are supposed to be exactly the same, and the foreground on each frame is recognized as sparse errors. Thus this vision problem can be naturally formulated as recovering a low-rank matrix from its sum with sparse errors (Candès et al (2011)). We compare our l_1 filtering with the baseline median filter (see http://en.wikipedia.org/wiki/Median_filter) and other state-of-the-art robust approaches, such as RSL and S-PCP. When the median filter uses all the frames, its complexity is actually also quadratic and the results are not good, so we only buffer 20 frames when using the median filter to compute the background. For l_1 filtering, we fix the size of the seed matrix in advance.
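Concretely, the background modeling pipeline can be sketched as stacking vectorized frames as the columns of D and separating D = A + E; this sketch reuses the l1_filtering_pcp function sketched in Section 3.3, and the foreground threshold is an assumption.

```python
import numpy as np

def model_background(frames, rank_guess=1, fg_thresh=0.05):
    """Stack grayscale frames (each an h x w array in [0,1]) as columns of D,
    separate the low-rank background A from the sparse foreground E, and return
    the background image plus per-frame foreground masks."""
    h, w = frames[0].shape
    D = np.stack([np.asarray(f, dtype=float).reshape(-1) for f in frames], axis=1)
    A, E = l1_filtering_pcp(D, r=rank_guess)     # sketched in Section 3.3
    background = A.mean(axis=1).reshape(h, w)    # columns of A are nearly identical
    masks = (np.abs(E) > fg_thresh).T.reshape(len(frames), h, w)
    return background, masks
```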

For quantitative evaluation, we run all the compared methods on the "laboratory" sequence from a public surveillance database (Benedek and Szirányi (2008)), which has ground truth foreground. Both the false negative rate (FNR) and the false positive rate (FPR) are calculated in the sense of foreground detection. FNR indicates the ability of a method to correctly recover the foreground, while FPR represents the power of a method in distinguishing the background. These two scores correspond to the Type I and Type II errors in statistical test theory (see http://en.wikipedia.org/wiki/Type_I_and_type_II_errors) and are judged by the criterion that the smaller, the better. One can see from Table 3 that RSL has the lowest FNR but the highest FPR among the compared methods. This reveals that RSL cannot exactly distinguish the background. Although the speed of our l_1 filtering is slightly slower than median filtering on 20 frames, its performance is as good as that of S-PCP, which achieves the best results but with the highest time cost.

Video | Metric | Median | RSL | S-PCP | l_1 filtering
"laboratory" (No. Frames: 887):
  FNR (%)  | 9.85  | 7.31    | 8.61     | 8.62
  FPR (%)  | 9.18  | 10.83   | 8.72     | 8.76
  Time (s) | 42.90 | 3159.92 | 10897.96 | 48.99
"meeting" (No. Frames: 700):
  Time (s) | 179.19 | N.A. | N.A. | 178.74
Table 3: Comparison among the median filter (Median for short), RSL, S-PCP and l_1 filtering on the background modeling problem. "No. Frames" denotes the number of frames in a video sequence. We present the FNR, FPR and CPU time (in seconds) for the "laboratory" data set. For our collected "meeting" data set, we only report the CPU time because there is no ground truth foreground for this video sequence.

To further test the performance of l_1 filtering on a large scale data set, we also collect a video sequence (named "meeting") of 700 frames, each of which has a relatively high resolution. The resulting data matrix is so large that it cannot even be fully fit into the memory of our PC. As a result, we cannot use the standard ADM to solve the corresponding PCP problem. As for RSL, we found that it did not converge on this data. Thus we only compare the performance of the median filter and l_1 filtering. The time cost is reported in Table 3 and a qualitative comparison is shown in Figure 8. We can see that l_1 filtering is as fast as median filtering with 20 frames, and that the median filter fails on this data set. This is because the mechanism of the median filter is based on (local) frame differences; when the scene contains slowly moving objects (such as those in the "meeting" video sequence), the median filter will not give good results. In contrast, the background and the foreground can be separated satisfactorily by l_1 filtering. This makes sense because our l_1 filtering can exactly recover the (global) low-rank structure of the background and remove the foreground as sparse errors.

Figure 8: Partial background modeling results of the median filter and l_1 filtering on the "meeting" video sequence. (a) Original frames. (b)-(c) and (d)-(e) are the background and the foreground recovered by the median filter and by l_1 filtering, respectively.

5 Conclusion and Further Work

In this paper, we propose the first linear time algorithm, named the l_1 filtering method, for exactly solving very large PCP problems whose ranks are supposed to be very small compared to the data size. It first recovers a seed matrix and then uses the seed matrix to filter the corresponding rows and columns of the data matrix. It avoids SVD on the original data matrix, and the filtering step can be done in full parallelism. As a result, the time cost of our l_1 filtering method is only linear with respect to the data size, making applications of RPCA to extremely large scale problems possible. The experiments on both synthetic and real world data demonstrate the high accuracy and efficiency of our method. It is possible that the proposed technique can be applied to other large scale nuclear norm minimization problems, e.g., matrix completion (Cai et al (2010)) and low-rank representation (Liu et al (2010)). This will be our future work.

Acknowledgements.
The authors would like to thank Prof. Zaiwen Wen and Dr. Yadong Mu for sharing with us their codes for LMaFit (Shen et al (2011)) and random projection (Mu et al (2011)), respectively. This work is partially supported by the grants of the National Natural Science Foundation of China-Guangdong Joint Fund (No. U0935004), the National Natural Science Foundation of China (No. 60873181, 61173103) and the Fundamental Research Funds for the Central Universities. The first author would also like to thank the China Scholarship Council for its support.

References

  • Aanes et al (2002) Aanes H, Fisker R, Astrom K, Carstensen J (2002) Robust factorization. IEEE Trans on PAMI 24(9):359–368
  • Baccini et al (1996) Baccini A, Besse P, de Falguerolles A (1996) An l_1-norm PCA and a heuristic approach. In: Proceedings of the International Conference on Ordinal and Symbolic Data Analysis, pp 359–368

  • Benedek and Szirányi (2008) Benedek C, Szirányi T (2008) Bayesian foreground and shadow detection in uncertain frame rate surveillance videos. IEEE Trans on Image Processing 17(4):608–621
  • Cai et al (2010) Cai J, Candès E, Shen Z (2010) A singular value thresholding algorithm for matrix completion. SIAM Journal on Optimization 20(4):1956–1982
  • Candès and Wakin (2007) Candès E, Wakin M (2007) An introduction to compressive sampling. IEEE Signal Processing Magazine 25(2):21–30
  • Candès et al (2011) Candès E, Li X, Ma Y, Wright J (2011) Robust principal component analysis? Journal of the ACM 58(3):11
  • De la Torre and Black (2003) De la Torre F, Black M (2003) A framework for robust subspace learning. IJCV 54(1–3):117–142
  • Drineas et al (2006) Drineas P, Kannan R, Mahoney M (2006) Fast Monte Carlo algorithms for matrices II: Computing a low rank approximation to a matrix. SIAM Journal on Computing 36(1):158–183
  • Ganesh et al (2009) Ganesh A, Lin Z, Wright J, Wu L, Chen M, Ma Y (2009) Fast algorithms for recovering a corrupted low-rank matrix. In: Proceedings of International Workshop on Computational Advances in Multi-Sensor Adaptive Processing
  • Halko et al (2011) Halko N, Martinsson P, Tropp J (2011) Finding structure with randomness: Probabilistic algorithms for constructing approximate matrix decompositions. SIAM Review 53(2):217–288
  • Ji et al (2010) Ji H, Liu C, Shen Z, Xu Y (2010) Robust video denoising using low-rank matrix completion. In: CVPR
  • Ke and Kanade (2005) Ke Q, Kanade T (2005) Robust l_1-norm factorization in the presence of outliers and missing data by alternative convex programming. In: CVPR
  • Larsen (1998) Larsen R (1998) Lanczos bidiagonalization with partial reorthogonalization. Department of Computer Science, Aarhus University, Technical report, DAIMI PB-357
  • Lin et al (2009) Lin Z, Chen M, Wu L, Ma Y (2009) The augmented Lagrange multiplier method for exact recovery of corrupted low-rank matrices. UIUC Technical Report UILU-ENG-09-2215
  • Liu et al (2010) Liu G, Lin Z, Yu Y (2010) Robust subspace segmentation by low-rank representation. In: ICML
  • Mu et al (2011) Mu Y, Dong J, Yuan X, Yan S (2011) Accelerated low-rank visual recovery by random projection. In: CVPR
  • Nie et al (2011) Nie F, Huang H, Ding C, Luo D, Wang H (2011) Robust principal component analysis with non-greedy l_1-norm maximization. In: IJCAI
  • Peng et al (2010) Peng Y, Ganesh A, Wright J, Xu W, Ma Y (2010) RASL: Robust alignment by sparse and low-rank decomposition for linearly correlated images. In: CVPR
  • Rao et al (2010) Rao S, Tron R, Vidal R, Ma Y (2010) Motion segmentation in the presence of outlying, incomplete, and corrupted trajectories. IEEE Trans on PAMI 32(10):1832–1845
  • Shen et al (2011) Shen Y, Wen Z, Zhang Y (2011) Augmented Lagrangian alternating direction method for matrix separation based on low-rank factorization. preprint
  • Skocaj et al (2007) Skocaj D, Leonardis A, Bischof H (2007) Weighted and robust learning of subspace representations. Pattern Recognition 40(5):1556–1569

  • Storer et al (2009) Storer M, Roth P, Urschler M, Bischof H (2009) Fast-robust PCA. In: Proc. 16th Scandinavian Conference on Image Analysis (SCIA)
  • Wang et al (2009) Wang J, Dong Y, Tong X, Lin Z, Guo B (2009) Kernel Nyström method for light transport. ACM Transactions on Graphics 28(3)
  • Wright et al (2009) Wright J, Ganesh A, Rao S, Peng Y, Ma Y (2009) Robust principal component analysis: Exact recovery of corrupted low-rank matrices via convex optimization. In: NIPS
  • Wu et al (2010) Wu L, Ganesh A, Shi B, Matsushita Y, Wang Y, Ma Y (2010) Robust photometric stereo via low-rank matrix completion and recovery. preprint
  • Yuan and Yang (2009) Yuan X, Yang J (2009) Sparse and low-rank matrix decomposition via alternating direction methods. preprint
  • Zhang et al (2012) Zhang Z, Ganesh A, Liang X, Ma Y (2012) TILT: Transform-invariant low-rank textures. accepted by IJCV