1 Introduction
In recent years, there has been a surge of interest in applying rank minimization as a regularizer to various machine learning and signal processing problems Wright09 ; Candes08 ; Wu10 ; Zhang10TILT ; Zhu10 ; Min10 ; Liu10 ; Peng10 ; Candes10 ; Candes09 ; Meng10 ; Liu09 ; Ji10 ; Batmanghelich10 ; Karbasi10 . In the mathematical models of these problems, the rank of some matrix is required to be minimized. Typical models are Robust PCA (RPCA) Wright09 :
(1) $\min_{A,E}\ \operatorname{rank}(A) + \lambda\|E\|_0, \quad \text{s.t. } D = A + E,$
and Matrix Completion (MC) Candes08 :
(2) $\min_{A}\ \operatorname{rank}(A), \quad \text{s.t. } \pi_\Omega(A) = \pi_\Omega(D),$
where $\|E\|_0$ is the number of nonzeros in $E$, $\Omega$ is the set of indices of the known entries in $D$, and $\pi_\Omega$ is the restriction onto $\Omega$. There are variations of RPCA Zhou10 ; Ganesh10 and of MC Candes09 , and there is also a combination of RPCA and MC Candes10 .
Due to the effectiveness of rank minimization, many researchers have proposed various algorithms to solve rank minimization problems Toh09 ; Cai08 ; Ma09 ; Lin09 ; Ganesh09 ; Yuan09 ; Tao09 ; Candes08 . As rank minimization problems are usually NP-hard, most of these algorithms aim at solving companion convex programs instead, obtained by replacing the rank function with the nuclear norm $\|\cdot\|_*$, i.e., the sum of the singular values, and the $\ell_0$ norm with the $\ell_1$ norm, i.e., the sum of the absolute values of the entries. This relaxation is suggested by the fact that the nuclear norm and the $\ell_1$ norm are the convex envelopes of the rank function Recht10 and the $\ell_0$ norm, respectively. Some researchers have proven that for RPCA and MC problems, solving the companion convex program produces the same solution as the original problem with overwhelming probability Recht10 ; Candes09 ; Candes10 . As a result, solving a rank minimization problem is often converted into solving a nuclear norm minimization (NNM) problem, in order to exploit the efficiency of convex programs. Whichever of the existing methods for NNM problems is used, one always has to solve the following subproblem:
(3) $\min_{X}\ \mu\|X\|_* + \frac{1}{2}\|X - W\|_F^2,$
where $\mu$ and $W$ change across iterations and $\|\cdot\|_F$ is the Frobenius norm. Cai et al. Cai08 proved that the solution to (3) can be obtained by singular value thresholding:
(4) $X = U\,\mathcal{S}_\mu(\Sigma)\,V^T,$
where $\mathcal{S}_\mu(x) = \operatorname{sgn}(x)\max(|x| - \mu, 0)$ is a shrinkage operator, applied entrywise to the diagonal of $\Sigma$, and $W = U \Sigma V^T$ is the singular value decomposition (SVD) of $W$. Therefore, it is easy to see that NNM problems are usually computationally costly, as they require computing SVDs multiple times, and a full SVD of an $m \times n$ matrix ($m \ge n$) typically requires $O(mn^2)$ operations. Fortunately, not all the singular values/vectors need be computed, because the singular values smaller than the threshold $\mu$ will be shrunk to zero, hence their associated singular vectors will not contribute to $X$. This leads to a common practice in solving NNM problems, namely using PROPACK Propack to compute the partial SVD of $W$, where only those leading singular values that are greater than $\mu$, and their associated singular vectors, are computed. This significantly brings down the computational complexity from $O(mn^2)$ to $O(rmn)$, where $r$ is the number of leading singular values/vectors computed.

Although computing the partial SVD already saves computation significantly, the complexity is still too high for large scale problems. Therefore, any further savings are valuable when the problem scale becomes large. In this paper, we aim at exploiting the relationship between successive iterations to further bring down the computation cost. Our technique is called block Lanczos with warm start (BLWS): it uses the block Lanczos method to compute the partial SVD, and the block Lanczos procedure is initialized with the principal singular subspaces of the previous iteration. The number of steps in the block Lanczos procedure is also kept small. Our BLWS technique can work with different algorithms for NNM problems. Our numerical tests show that BLWS can speed up its host algorithm by at least two to three times.
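To make the singular value thresholding step concrete, here is a minimal NumPy sketch of solving subproblem (3); the function name `singular_value_shrinkage` is ours, and a full SVD is used for clarity where PROPACK would compute only the leading part:

```python
import numpy as np

def singular_value_shrinkage(W, mu):
    """Solve min_X mu*||X||_* + 0.5*||X - W||_F^2 by soft-thresholding
    the singular values of W, i.e., the operator in (4)."""
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    s_shrunk = np.maximum(s - mu, 0.0)   # singular values below mu vanish,
    keep = s_shrunk > 0                  # so only the leading part matters
    return U[:, keep] @ np.diag(s_shrunk[keep]) @ Vt[keep, :]

# shrink a random 50 x 40 matrix with threshold mu = 1.0
rng = np.random.default_rng(0)
W = rng.standard_normal((50, 40))
X = singular_value_shrinkage(W, 1.0)
```

Only the singular values exceeding $\mu$ survive in $X$, which is exactly why a partial SVD of $W$ suffices.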
To proceed, we first introduce how the partial SVD is computed in PROPACK.
2 The Lanczos Method for the Partial SVD
PROPACK uses the Lanczos method to compute the partial SVD. As the method is based on the Lanczos method for partial eigenvalue decomposition (EVD), we have to start with the partial EVD computation.
The Lanczos method for partial EVD finds the optimal leading eigensubspace (it can also find the tailing eigensubspace) of a symmetric matrix $A$ in a Krylov subspace Golub96 :
(5) $\mathcal{K}(A, q_1, k) = \operatorname{span}\{q_1, A q_1, A^2 q_1, \ldots, A^{k-1} q_1\}.$
An orthonormal basis $Q_k = [q_1, q_2, \ldots, q_k]$ of $\mathcal{K}(A, q_1, k)$ can be efficiently computed via a so-called Lanczos procedure, shown in Algorithm 1. Accordingly, $A$ can be approximated as $A \approx Q_k T_k Q_k^T$, where $T_k$ is a tridiagonal matrix:
(6) $T_k = \begin{pmatrix} \alpha_1 & \beta_1 & & \\ \beta_1 & \alpha_2 & \ddots & \\ & \ddots & \ddots & \beta_{k-1} \\ & & \beta_{k-1} & \alpha_k \end{pmatrix}.$
The Lanczos procedure is actually derived by comparing the left and right hand sides of $A Q_k \approx Q_k T_k$ (cf. Section 4).
After the Lanczos procedure is iterated $k$ times, the EVD of $T_k$ is computed: $T_k = S_k \Theta_k S_k^T$. Then $A \approx (Q_k S_k) \Theta_k (Q_k S_k)^T$. Suppose the eigenvalues in $\Theta_k$ are ordered from large to small. Then the largest $j$ eigenvalues of $A$ can be approximated by the first $j$ eigenvalues in $\Theta_k$ (called the Ritz values of $A$), and the $j$ leading eigenvectors of $A$ can be approximated by the first $j$ columns of $Q_k S_k$ (called the Ritz vectors of $A$).
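The procedure just described can be sketched in a few lines of NumPy; `lanczos_partial_eig` is our own illustrative name, and full reorthogonalization is used for numerical safety (PROPACK is more selective about when to reorthogonalize):

```python
import numpy as np

def lanczos_partial_eig(A, k, j, rng):
    """Build an orthonormal basis of the Krylov subspace (5) via the
    Lanczos three-term recurrence, then approximate the j leading
    eigenpairs of symmetric A by the Ritz pairs of tridiagonal T_k."""
    n = A.shape[0]
    q = rng.standard_normal(n)
    q /= np.linalg.norm(q)
    Q, alphas, betas = [q], [], []
    for i in range(k):
        w = A @ Q[i]
        alpha = Q[i] @ w
        w = w - alpha * Q[i]
        if i > 0:
            w = w - betas[i - 1] * Q[i - 1]
        Qm = np.column_stack(Q)
        w = w - Qm @ (Qm.T @ w)          # full reorthogonalization (for safety)
        alphas.append(alpha)
        beta = np.linalg.norm(w)
        if beta < 1e-12:                 # invariant subspace found: stop early
            break
        betas.append(beta)
        Q.append(w / beta)
    m = len(alphas)
    T = np.diag(alphas) + np.diag(betas[:m - 1], 1) + np.diag(betas[:m - 1], -1)
    Qm = np.column_stack(Q[:m])
    theta, S = np.linalg.eigh(T)         # EVD of the small tridiagonal T_k
    idx = np.argsort(theta)[::-1][:j]
    return theta[idx], Qm @ S[:, idx]    # Ritz values and Ritz vectors

# usage: leading 3 eigenpairs of a symmetric matrix with decaying spectrum
rng = np.random.default_rng(0)
n = 100
Vo, _ = np.linalg.qr(rng.standard_normal((n, n)))
A = (Vo * 0.8 ** np.arange(n)) @ Vo.T    # eigenvalues 1, 0.8, 0.64, ...
theta, X = lanczos_partial_eig(A, k=60, j=3, rng=rng)
```

Note that only the small matrix $T_k$ is ever diagonalized; $A$ enters only through matrix-vector products.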
When computing the partial SVD of a given matrix $A$, a critical relationship between the SVD of $A$ and the EVD of the following augmented matrix
(7) $C = \begin{pmatrix} 0 & A \\ A^T & 0 \end{pmatrix}$
is used. It is depicted by the following theorem Golub96 .
Theorem 2.1
If the SVD of an $m \times n$ ($m \ge n$) matrix $A$ is $A = U \Sigma V^T$, then the EVD of $C$ is
(8) $C = Y \begin{pmatrix} \Sigma & & \\ & -\Sigma & \\ & & 0_{m-n} \end{pmatrix} Y^T,$
where
(9) $Y = \frac{1}{\sqrt{2}} \begin{pmatrix} U_1 & U_1 & \sqrt{2}\, U_2 \\ V & -V & 0 \end{pmatrix},$
in which $U = [U_1, U_2]$, with $U_1$ consisting of the first $n$ columns of $U$.
So by computing the EVD of $C$, the SVD of $A$ can be obtained.
When computing the SVD of $A$, the Lanczos method is actually implicitly applied to $C$, with the initial vector being chosen as
(10) $\tilde{q}_1 = \begin{pmatrix} q_1 \\ 0 \end{pmatrix} \quad \text{or} \quad \tilde{q}_1 = \begin{pmatrix} 0 \\ q_1 \end{pmatrix},$
in order to exploit the special structure of $C$. Accordingly, $A$ can be approximated as $A \approx U_k B_k V_k^T$, where the columns of $U_k$ and $V_k$ are orthonormal and $B_k$ is bidiagonal. Then the approximate singular values/vectors of $A$ can be obtained after computing the SVD of $B_k$. For more details, please refer to Propack .
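The relationship between the SVD of $A$ and the EVD of $C$ is easy to verify numerically. The following check (our own, with arbitrary toy sizes $m = 6$, $n = 4$) confirms that the spectrum of $C$ consists of $\pm\sigma_i$ plus $m - n$ zeros, and that $(u_i; v_i)/\sqrt{2}$ is an eigenvector of $C$:

```python
import numpy as np

# Numerical check of Theorem 2.1: the eigenvalues of C = [[0, A], [A^T, 0]]
# are +/- the singular values of A together with m - n zeros, and
# (u_i; v_i)/sqrt(2) is an eigenvector of C for the eigenvalue sigma_i.
rng = np.random.default_rng(0)
m, n = 6, 4
A = rng.standard_normal((m, n))
C = np.block([[np.zeros((m, m)), A], [A.T, np.zeros((n, n))]])
sigma = np.linalg.svd(A, compute_uv=False)
eig = np.sort(np.linalg.eigvalsh(C))[::-1]
expected = np.sort(np.concatenate([sigma, -sigma, np.zeros(m - n)]))[::-1]
```

This also explains why an eigensubspace routine on $C$ delivers left and right singular subspaces of $A$ simultaneously.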
The Lanczos method has some important properties Golub96 . First, the Ritz values of $A$ converge to the largest eigen/singular values of $A$ quickly as $k$ grows, and so do the Ritz vectors. Second, as it only requires solving the EVD/SVD of a relatively small banded matrix $T_k$/$B_k$, the partial EVD/SVD is usually faster than the full EVD/SVD when the required number of eigen/singular vectors is small relative to the size of $A$. Third, the Lanczos procedure terminates when an invariant subspace is found. Fourth, the orthogonality among the columns of $Q_k$ is easily lost as the Lanczos procedure goes on. Hence, reorthogonalization is usually necessary when $k$ is relatively large. Unfortunately, reorthogonalization is expensive. So PROPACK monitors the orthogonality among the columns of $Q_k$ and only reorthogonalizes those columns whose orthogonality with other columns deteriorates.
3 Ideas to Improve
We notice that if we solve the partial SVD in each iteration independently, the Lanczos procedure has to start from a random initial vector, as no a priori information is available. Random initialization makes the number $k$ of Lanczos steps unpredictable. If the initial vector is not good, $k$ can be relatively large in order for the Ritz values/vectors to achieve a prescribed precision, making the partial SVD inefficient. Actually, during the iterations of optimization, as the matrices in successive iterations are close to each other, any of the leading Ritz vectors from the previous iteration should be good for initializing the Lanczos procedure of the current iteration. However, a risk of this strategy is that the Lanczos procedure may terminate quickly by outputting a small invariant subspace containing the previous Ritz vector, because the previous Ritz vector is close to being a singular vector of the current matrix. So the Lanczos procedure may fail and have to restart with another initial vector. (Although in reality the Lanczos procedure seldom terminates due to numerical error, our numerical tests show that such a choice of initial vector does not help speed up the computation.) Moreover, initializing with a single vector neglects the fact that we are actually seeking optimal singular subspaces, not a number of individual singular vectors. A single vector cannot record all the information from the previous principal singular subspaces (left and right). So, ideally we should use the whole previous principal singular subspaces. This motivates us to adopt the block Lanczos method for partial SVD, where the block Lanczos procedure starts with the previous principal singular subspaces.
4 Block Lanczos with Warm Start
Again, we start with the block Lanczos with warm start (BLWS) for partial EVD. The block Lanczos method is a natural generalization of the standard Lanczos method, obtained by replacing the Krylov subspace with
(11) $\mathcal{K}(A, Q_1, k) = \operatorname{span}\{Q_1, A Q_1, A^2 Q_1, \ldots, A^{k-1} Q_1\},$
where $Q_1$ is an orthonormal basis of an initial subspace. Accordingly, the Lanczos procedure is generalized to the block Lanczos procedure, which finds an approximation of $A$: $A \approx Q T Q^T$, where $T$ is a block tridiagonal matrix Golub96 :
(12) $T = \begin{pmatrix} M_1 & B_1^T & & \\ B_1 & M_2 & \ddots & \\ & \ddots & \ddots & B_{k-1}^T \\ & & B_{k-1} & M_k \end{pmatrix},$
$Q = [Q_1, Q_2, \ldots, Q_k]$, and the columns of $Q$ are orthonormal. By comparing the left and right hand sides of $A Q \approx Q T$, we have
(13) $A Q_i = Q_{i-1} B_{i-1}^T + Q_i M_i + Q_{i+1} B_i,$
where $Q_0 B_0^T$ is defined to be 0. From the orthogonality among the columns of $Q$, we have that
(14) $M_i = Q_i^T A Q_i.$
Moreover, if we define $R_i = A Q_i - Q_i M_i - Q_{i-1} B_{i-1}^T$, then $Q_{i+1} B_i$ is the QR decomposition of $R_i$. This leads to the block Lanczos procedure in Algorithm 2.

After the approximation $A \approx Q T Q^T$ is obtained, one may further compute the EVD of $T$: $T = S \Theta S^T$, where the eigenvalues in $\Theta$ are ordered from large to small. Then the $j$ leading eigenvalues and eigenvectors of $A$ are approximated by the first $j$ eigenvalues in $\Theta$ and the first $j$ columns of $Q S$, respectively. The whole process is summarized in Algorithm 3.
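The recurrence (13)-(14) and the Rayleigh-Ritz extraction can be sketched compactly in NumPy (our own illustrative code; full reorthogonalization is added for safety, and $T$ is assembled as $Q^T A Q$, which coincides with the block tridiagonal matrix up to rounding):

```python
import numpy as np

def block_lanczos_partial_eig(A, Q1, k):
    """Block Lanczos sketch (cf. Algorithms 2 and 3): starting from an
    orthonormal n x b block Q1, build Q = [Q_1, ..., Q_k] via
    R_i = A Q_i - Q_i M_i - Q_{i-1} B_{i-1}^T,  Q_{i+1} B_i = qr(R_i),
    then take Ritz pairs of the projected matrix T = Q^T A Q."""
    Qs, Bs = [Q1], []
    for i in range(k - 1):
        W = A @ Qs[-1]
        M = Qs[-1].T @ W                 # M_i = Q_i^T A Q_i, cf. (14)
        R = W - Qs[-1] @ M               # residual R_i, cf. (13)
        if i > 0:
            R -= Qs[-2] @ Bs[-1].T
        Qall = np.hstack(Qs)
        R -= Qall @ (Qall.T @ R)         # reorthogonalize (for safety)
        Qn, B = np.linalg.qr(R)          # Q_{i+1} B_i is the QR of R_i
        Qs.append(Qn)
        Bs.append(B)
    Q = np.hstack(Qs)
    T = Q.T @ A @ Q                      # block tridiagonal up to rounding
    theta, S = np.linalg.eigh(T)
    order = np.argsort(theta)[::-1][:Q1.shape[1]]
    return theta[order], Q @ S[:, order] # leading Ritz values / vectors

# usage on a symmetric matrix with strongly decaying spectrum
rng = np.random.default_rng(0)
n, b = 80, 4
Vo, _ = np.linalg.qr(rng.standard_normal((n, n)))
A = (Vo * 0.3 ** np.arange(n)) @ Vo.T    # eigenvalues 1, 0.3, 0.09, ...
Q1, _ = np.linalg.qr(rng.standard_normal((n, b)))
theta, X = block_lanczos_partial_eig(A, Q1, k=8)
```

Each step processes a whole block of vectors, which is what lets the method accept an entire subspace as its starting point.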
If we denote the block Lanczos method for partial EVD (Algorithm 3) as $[\Theta, X] = \mathrm{BL}(A, Q_1)$, then our BLWS can be written as
$[\Theta^{(t+1)}, X^{(t+1)}] = \mathrm{BL}(A^{(t+1)}, X^{(t)}),$
namely the principal eigensubspace of the previous iteration is used to initialize the block Lanczos procedure.
When using the block Lanczos method to compute the partial SVD of a matrix $A$, the block Lanczos procedure is similarly applied to the augmented matrix $C$ shown in (7). Note that $C$ has a special structure, so the block Lanczos procedure can be done efficiently by skipping the zero submatrices of $C$. The details are straightforward, so we omit them.
With BLWS, compared with the standard Lanczos method, the risk of terminating with a small invariant subspace is gone, and the principal eigensubspace can be updated more efficiently. As a result, the whole optimization process can be sped up considerably.
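To illustrate the warm start, here is a toy sketch (all names, sizes, and the drift model are our own assumptions): a symmetric matrix drifts slightly across "iterations", and each iteration runs a single warm-started block Lanczos step from the previous Ritz basis instead of restarting from scratch:

```python
import numpy as np

def blws_step(A, Q_prev, r):
    """One warm-started block Lanczos step: expand the previous Ritz
    basis with A @ Q_prev, orthonormalize, and extract the r leading
    Ritz pairs of the symmetric matrix A."""
    K, _ = np.linalg.qr(np.hstack([Q_prev, A @ Q_prev]))
    theta, S = np.linalg.eigh(K.T @ A @ K)
    order = np.argsort(theta)[::-1][:r]
    return theta[order], K @ S[:, order]

rng = np.random.default_rng(0)
n, r = 60, 5
V, _ = np.linalg.qr(rng.standard_normal((n, n)))
A = (V * (15.0 * 0.8 ** np.arange(n))) @ V.T      # known decaying spectrum
Q, _ = np.linalg.qr(rng.standard_normal((n, r)))  # cold start only once
for _ in range(30):
    E = 1e-4 * rng.standard_normal((n, n))
    A += (E + E.T) / 2                            # the iterate drifts slowly
    theta, Q = blws_step(A, Q, r)                 # warm start from previous Q
```

Because consecutive matrices are close, the previous Ritz basis is already a good approximation, and one cheap block step per iteration suffices to track the principal eigensubspace.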
4.1 More Tricks for Acceleration
Recall that in the standard Lanczos procedure, the orthogonality among the columns of $Q_k$ is easily lost as $k$ grows. The same is true for the block Lanczos procedure. As reorthogonalization is expensive, we further require that the number $k$ of block Lanczos steps be small, so that reorthogonalization can be waived. In our experiments, we typically set $k = 1$, namely the block Lanczos procedure is performed only once. Although such a fixed and small $k$ cannot produce high-precision principal singular subspaces when the block Lanczos procedure is randomly initialized, it does produce high-precision principal singular subspaces when the procedure is initialized with the previous principal singular subspaces. This is because the current matrix is close to the previous one, so the previous principal singular subspaces are already close to those of the current matrix. The block Lanczos procedure then improves them and produces better estimates of the principal singular subspaces. Note that keeping $k$ small has multiple advantages. First, it waives the need for expensive reorthogonalization. Second, it saves computation in the block Lanczos procedure itself. Third, the SVD of $T$ also becomes cheap because the size of $T$ is small.

In the standard block Lanczos method for partial SVD, the initial subspace is chosen as $\begin{pmatrix} U \\ 0 \end{pmatrix}$ or $\begin{pmatrix} 0 \\ V \end{pmatrix}$ (cf. (10)), where $U$ and $V$ are the estimated left and right principal singular subspaces obtained in the previous iteration, respectively. However, such an initialization only utilizes half of the information provided by the previous iteration. So our BLWS technique uses $\begin{pmatrix} U \\ V \end{pmatrix}$ as the initial subspace. In this way, the precision of the obtained principal singular subspaces is higher when the block Lanczos procedure is performed for the same number of steps.
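A small numerical sketch (our own) of the combined initialization: the previous left and right subspaces are stacked into one initial block on the augmented matrix $C$ of (7). For illustration the "previous" subspaces are taken from an exact SVD, so a single block step already reproduces the leading singular values:

```python
import numpy as np

# On the augmented matrix C = [[0, A], [A^T, 0]], a singular triplet
# (u, v, sigma) of A corresponds to the eigenpair ((u; v)/sqrt(2), sigma)
# of C, so stacking the previous left/right principal subspaces keeps
# both halves of the information from the last iteration.
rng = np.random.default_rng(0)
m, n, r = 40, 30, 5
A = rng.standard_normal((m, n))
U, s, Vt = np.linalg.svd(A, full_matrices=False)
U_prev, V_prev = U[:, :r], Vt[:r].T        # stand-ins for the previous iterate
Q1, _ = np.linalg.qr(np.vstack([U_prev, V_prev]))   # initial (m+n) x r block
C = np.block([[np.zeros((m, m)), A], [A.T, np.zeros((n, n))]])
K, _ = np.linalg.qr(np.hstack([Q1, C @ Q1]))        # one block step on C
theta, S = np.linalg.eigh(K.T @ C @ K)
sigma_est = np.sort(theta)[::-1][:r]       # leading Ritz values approximate sigma_i
```

Initializing with only $U$ or only $V$ would discard half of the vector $(u_i; v_i)$ that $C$ actually operates on.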
4.2 Handling Variant Dimensions of Principal Singular Subspaces
The above exposition assumes that the dimension $r$ of the principal singular subspaces is known and fixed across iterations. In reality, $r$ is unknown and has to be dynamically predicted before each partial SVD is computed Lin09 ; Toh09 ; Cai08 ; Ma09 . Hence $r$ actually varies across iterations. In this case, BLWS simply outputs Ritz values/vectors according to the $r$ predicted in the current iteration, and the block Lanczos procedure is still initialized with the principal singular subspaces output by the last iteration. We have observed that for many NNM problems the predicted $r$ quickly stabilizes. So the varying dimension of the principal singular subspaces in the early iterations does not affect the effectiveness of BLWS.
5 Experimental Results
Our BLWS technique is a general acceleration method. Given an algorithm to solve an NNM problem, a user only has to replace the SVD computation in the algorithm with BLWS and may obtain a noticeable speedup.
As examples, in this section we apply our BLWS technique to two popular problems: Robust PCA (RPCA) Wright09 and Matrix Completion (MC) Candes08 . For each problem, we compare the originally chosen algorithm and its BLWS-accelerated counterpart in terms of computation time. The accuracies of the obtained solutions are also shown, in order to ensure that the correct solutions are approached. All experiments are run on the same workstation with two quad-core 2.53GHz Intel Xeon E5540 CPUs, running Windows Server 2008 and Matlab (Version 7.7).
For the RPCA problem, we generate the synthetic data in the same way as in Lin09 . Namely, $A$ is generated according to the independent random orthogonal model Wright09 , $E$ is a sparse matrix whose support is chosen independently and whose nonzero entries are uniformly distributed, and $D = A + E$. For simplicity, we only focus on square matrices, and the parameter $\lambda$ is fixed at $1/\sqrt{n}$, as suggested by Wright et al. Wright09 . The value of $n$ is chosen in $\{500, 1000, 2000, 3000\}$. The rank of $A$ is chosen as $0.1 n$, and the number of corrupted entries (i.e., $\|E\|_0$) is $0.1 n^2$. We choose the ADM method Yuan09 ; Lin09 to solve the RPCA problem.

The data for the MC problem is generated in the same way as in Cai08 . Namely, an $n \times n$ matrix $A$ with rank $r$ is generated by first sampling two $n \times r$ factor matrices $M_L$ and $M_R$ independently, each having i.i.d. Gaussian entries, and then multiplying them: $A = M_L M_R^T$. Finally, the set $\Omega$ of observed entries is sampled uniformly at random. We choose the SVT algorithm Cai08 to solve the MC problem.
Table 1 shows a detailed comparison between ADM and BLWS-accelerated ADM for solving the RPCA problem. We can see that BLWS-ADM costs roughly less than 1/3 of the time of ADM while achieving comparable accuracy, and the total number of iterations does not change or only increases slightly. A similar phenomenon can be observed in Table 2, which lists the comparison results for solving the MC problem.
Table 1. Comparison between ADM and BLWS-ADM for solving the RPCA problem.

| $n$ | algorithm | rel. error | $\mathrm{rank}(\hat{A})$ | $\|\hat{E}\|_0$ | # iter. | time (s) |
|------|----------|------------|------|--------|----|--------|
| 500  | ADM      | 5.27e-6 | 50  | 25009  | 30 | 4.07   |
| 500  | BLWS-ADM | 9.64e-6 | 50  | 25008  | 30 | 2.07   |
| 1000 | ADM      | 3.99e-6 | 100 | 100021 | 29 | 23.09  |
| 1000 | BLWS-ADM | 6.05e-6 | 100 | 100015 | 30 | 9.25   |
| 2000 | ADM      | 2.80e-6 | 200 | 400064 | 28 | 154.80 |
| 2000 | BLWS-ADM | 4.30e-6 | 200 | 400008 | 30 | 53.37  |
| 3000 | ADM      | 2.52e-6 | 300 | 900075 | 28 | 477.13 |
| 3000 | BLWS-ADM | 3.90e-6 | 300 | 900006 | 30 | 149.19 |
Table 2. Comparison between SVT and BLWS-SVT for solving the MC problem. Here $d_r = r(2n - r)$ is the number of degrees of freedom in an $n \times n$ matrix of rank $r$, and $m = |\Omega|$ is the number of observed entries.

| $n$ | $r$ | $m/d_r$ | $m/n^2$ | algorithm | time (s) | # iter. | rel. error |
|-------|-----|---|-------|----------|---------|-----|---------|
| 5000  | 10  | 6 | 0.024 | SVT      | 72.57   | 123 | 1.73e-4 |
| 5000  | 10  | 6 | 0.024 | BLWS-SVT | 20.02   | 123 | 1.74e-4 |
| 5000  | 50  | 5 | 0.1   | SVT      | 438.81  | 107 | 1.63e-4 |
| 5000  | 50  | 5 | 0.1   | BLWS-SVT | 279.08  | 108 | 1.55e-4 |
| 5000  | 100 | 4 | 0.158 | SVT      | 1783.26 | 122 | 1.73e-4 |
| 5000  | 100 | 4 | 0.158 | BLWS-SVT | 1175.91 | 122 | 1.74e-4 |
| 10000 | 10  | 6 | 0.012 | SVT      | 135.90  | 123 | 1.68e-4 |
| 10000 | 10  | 6 | 0.012 | BLWS-SVT | 42.80   | 123 | 1.70e-4 |
| 10000 | 50  | 5 | 0.050 | SVT      | 1156.25 | 110 | 1.58e-4 |
| 10000 | 50  | 5 | 0.050 | BLWS-SVT | 736.01  | 110 | 1.60e-4 |
| 20000 | 10  | 6 | 0.006 | SVT      | 251.13  | 123 | 1.74e-4 |
| 20000 | 10  | 6 | 0.006 | BLWS-SVT | 101.47  | 124 | 1.68e-4 |
| 30000 | 10  | 6 | 0.004 | SVT      | 449.34  | 124 | 1.75e-4 |
| 30000 | 10  | 6 | 0.004 | BLWS-SVT | 171.40  | 125 | 1.69e-4 |

6 Discussions
Although we have presented numerical results that testify to the effectiveness of BLWS, we have not yet rigorously proved its correctness. We conjecture that BLWS works well for most NNM problems. This is due to Theorem 9.2.2 of Golub96 , which implies that when there is a sufficient gap between the $r$th and the $(r+1)$th eigenvalues, the errors in the Ritz values can be well controlled. As NNM problems typically involve singular value thresholding (4), such a gap should exist. However, a rigorous proof is still under exploration.
7 Conclusions
In this paper, we have introduced the block Lanczos with warm start (BLWS) technique to accelerate the partial SVD computation in NNM problems. Both the block Lanczos procedure and the initialization with the previous principal singular subspaces take full advantage of the information from the last iteration. Our BLWS technique can work with different algorithms that solve rank minimization problems. The experimental results indicate that BLWS usually makes its host algorithm at least two to three times faster.
References
(1) Batmanghelich, N., et al.: Application of trace-norm and low-rank matrix decomposition for computational anatomy. In: Proc. of IEEE Computer Society Workshop on Mathematical Methods in Biomedical Image Analysis (MMBIA) (2010)
(2) Cai, J., Candès, E., Shen, Z.: A singular value thresholding algorithm for matrix completion. Preprint (2008)
(3) Candès, E., Li, X., Ma, Y., Wright, J.: Robust principal component analysis? Journal of the ACM (2010)
(4) Candès, E., Plan, Y.: Matrix completion with noise. Proc. of the IEEE (2009)
(5) Candès, E., Recht, B.: Exact low-rank matrix completion via convex optimization. In: Proc. of 46th Annual Allerton Conference on Communication, Control, and Computing, pp. 806–812 (2008)
(6) Ganesh, A., et al.: Dense error correction for low-rank matrices via principal component pursuit. In: Proc. of IEEE International Symposium on Information Theory (ISIT) (2010)
(7) Ganesh, A., Lin, Z., Wright, J., Wu, L., Chen, M., Ma, Y.: Fast algorithms for recovering a corrupted low-rank matrix. In: Proc. of International Workshop on Computational Advances in Multi-Sensor Adaptive Processing (CAMSAP) (2009)
(8) Golub, G., Van Loan, C.: Matrix Computations. The Johns Hopkins University Press (1996)
(9) Ji, H., Liu, C., Shen, Z., Xu, Y.: Robust video denoising using low rank matrix completion. In: Proc. of IEEE International Conference on Computer Vision and Pattern Recognition (CVPR) (2010)
(10) Karbasi, A., Oh, S., Parhizkar, R., Vetterli, M.: Ultrasound tomography calibration using structured matrix completion. In: Proc. of International Congress of Acoustics (2010)
(11) Larsen, R.: Lanczos bidiagonalization with partial reorthogonalization. Department of Computer Science, Aarhus University, Technical report, DAIMI PB-357, code available at http://soi.stanford.edu/rmunk/PROPACK/ (1998)
(12) Lin, Z., Chen, M., Wu, L., Ma, Y.: The augmented Lagrange multiplier method for exact recovery of corrupted low-rank matrices. UIUC Technical Report UILU-ENG-09-2215 (2009)
(13) Liu, G., Lin, Z., Yu, Y.: Robust subspace segmentation by low-rank representation. In: Proc. of International Conference on Machine Learning (ICML) (2010)
(14) Liu, Z., Vandenberghe, L.: Semidefinite programming methods for system realization and identification. In: Proc. of IEEE Conference on Decision and Control (CDC), pp. 4676–4681 (2009)
(15) Ma, S., Goldfarb, D., Chen, L.: Fixed point and Bregman iterative methods for matrix rank minimization. Preprint (2009)
(16) Meng, J., Yin, W., Houssain, E., Han, Z.: Collaborative spectrum sensing from sparse observations using matrix completion for cognitive radio networks. In: Proc. of International Conference on Acoustics, Speech and Signal Processing (ICASSP) (2010)
(17) Min, K., Zhang, Z., Wright, J., Ma, Y.: Decomposing background topics from keywords by principal component pursuit. In: Proc. of ACM International Conference on Information and Knowledge Management (CIKM) (2010)
(18) Peng, Y., Ganesh, A., Wright, J., Ma, Y.: RASL: Robust alignment via sparse and low-rank decomposition. In: Proc. of IEEE International Conference on Computer Vision and Pattern Recognition (CVPR) (2010)
(19) Recht, B., Fazel, M., Parrilo, P.: Guaranteed minimum-rank solutions of linear matrix equations via nuclear norm minimization. SIAM Review 52(3), 471–501 (2010)
(20) Tao, M., Yuan, X.: Recovering low-rank and sparse components of matrices from incomplete and noisy observations. Preprint (2009)
(21) Toh, K., Yun, S.: An accelerated proximal gradient algorithm for nuclear norm regularized least squares problems. Preprint (2009)
(22) Wright, J., et al.: Robust principal component analysis: Exact recovery of corrupted low-rank matrices via convex optimization. In: Proc. of Neural Information Processing Systems (NIPS), pp. 2080–2088 (2009)
(23) Wu, L., et al.: Robust photometric stereo via low-rank matrix completion and recovery. In: Proc. of Asian Conference on Computer Vision (2010)
(24) Yuan, X., Yang, J.: Sparse and low-rank matrix decomposition via alternating direction methods. Preprint (2009)
(25) Zhang, Z., Liang, X., Ganesh, A., Ma, Y.: TILT: Transform invariant low-rank textures. In: Proc. of Asian Conference on Computer Vision (2010)
(26) Zhou, Z., et al.: Stable principal component pursuit. In: Proc. of IEEE International Symposium on Information Theory (ISIT) (2010)
(27) Zhu, G., Yan, S., Ma, Y.: Image tag refinement towards low-rank, content-tag prior and error sparsity. In: Proc. of ACM Multimedia (2010)