CUR matrix decomposition is a randomized algorithm that efficiently computes a low-rank approximation of a given rectangular matrix (Drineas et al., 2006, Mahoney and Drineas, 2008, 2009). Let $M$ be the given matrix and $r$ be the target rank for approximation. CUR randomly samples columns and rows from $M$, according to their leverage scores, to form matrices $C$ and $R$, respectively. The approximated matrix is then computed as $CUR$, where $U = C^{\dagger} M R^{\dagger}$. It can be shown that, with high probability,
$$\|M - CUR\|_F \le (2 + \epsilon)\,\|M - M_r\|_F, \qquad (1)$$
where $M_r$ is the best rank-$r$ approximation of $M$ and $\epsilon > 0$ is the accuracy parameter that determines the sample sizes. When the maximum of the statistical leverage scores, which is also referred to as the incoherence measure in matrix completion (Candès and Tao, 2010, Recht, 2011, Candès and Recht, 2012), is small, CUR matrix decomposition can be simplified by uniformly sampling rows and columns from $M$. The simplified algorithm has a relative error bound similar to that in (1), except that the numbers of sampled columns and rows should be increased by the incoherence measure. In this draft, we focus on the situation with bounded incoherence measure, where uniform sampling of columns and rows is in general sufficient.
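As an illustration of the simplified, uniform-sampling variant described above, the following NumPy sketch forms C and R from uniformly sampled columns and rows and sets U to the product of the pseudo-inverse of C, the full matrix, and the pseudo-inverse of R. The function name `cur_uniform`, the sample sizes, the `rcond` cutoff, and the synthetic rank-5 test matrix are assumptions made for illustration only; they are not taken from the text.

```python
import numpy as np

def cur_uniform(M, n_cols, n_rows, rng, rcond=1e-10):
    """Uniform-sampling CUR sketch: returns C, U, R with U = pinv(C) @ M @ pinv(R)."""
    n, m = M.shape
    col_idx = rng.choice(m, size=n_cols, replace=False)  # uniform, without replacement
    row_idx = rng.choice(n, size=n_rows, replace=False)
    C = M[:, col_idx]          # sampled columns
    R = M[row_idx, :]          # sampled rows
    # rcond discards numerically zero singular values of the rank-deficient C and R.
    # Note that this step reads the full matrix M.
    U = np.linalg.pinv(C, rcond=rcond) @ M @ np.linalg.pinv(R, rcond=rcond)
    return C, U, R

# Hypothetical test case: an exactly rank-5 matrix with Gaussian factors.
rng = np.random.default_rng(0)
M = rng.standard_normal((200, 5)) @ rng.standard_normal((5, 150))
C, U, R = cur_uniform(M, n_cols=20, n_rows=20, rng=rng)
rel_err = np.linalg.norm(M - C @ U @ R) / np.linalg.norm(M)
print(f"relative Frobenius error: {rel_err:.2e}")
```

For an exactly low-rank, incoherent matrix such as this one, the sampled columns and rows span the column and row spaces, so the reconstruction error is at the level of floating-point round-off.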
One limitation of the existing CUR algorithms is that they require access to the full matrix in order to compute $U$. In this work, we aim to alleviate this limitation. In particular, we assume that, besides having access to randomly sampled rows and columns of the target matrix, we only observe a subset of its entries, sampled at random. Our goal is to develop a low rank approximation algorithm, similar to CUR, based on (i) randomly sampled rows and columns of the target matrix, and (ii) randomly sampled entries of the target matrix.
Compared to standard matrix completion theory (Candès and Tao, 2010, Recht, 2011, Candès and Recht, 2012), the key advantages of the proposed algorithm are its low sample complexity and high computational efficiency. In particular, unlike matrix completion that requires number of observed entries, the proposed algorithm is able to perfectly recover the target matrix with only number of observed entries (including the randomly sampled entries and the entries in the randomly sampled rows and columns). In addition, instead of having to solve an optimization problem involving trace norm regularization, the proposed algorithm only needs to solve a standard regression problem. Finally, unlike most matrix completion theories, which hold only when the target matrix is of low rank, we show a strong guarantee for the proposed algorithm even when the target matrix is not low rank.
We finally note that a closely related work, titled “Low-rank Matrix and Tensor Completion via Adaptive Sampling”, was published recently (Krishnamurthy and Singh, 2013). It is designed to recover a low rank matrix from randomly sampled rows and entries, which is different from the goal of this work (i.e., computing a low rank approximation for a target matrix).
2 Algorithm and Notation
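Let $M \in \mathbb{R}^{n \times m}$ be the target matrix, where $n \ge m$. To approximate $M$, we first sample uniformly at random $d$ columns and $d$ rows from $M$, denoted by $A = (a_1, \ldots, a_d) \in \mathbb{R}^{n \times d}$ and $B = (b_1, \ldots, b_d) \in \mathbb{R}^{m \times d}$, respectively, where each $a_i \in \mathbb{R}^n$ and $b_j \in \mathbb{R}^m$ is a column and a row of $M$, respectively. Let $r$ be the target rank for approximation, with $r \le d$. Let $\hat{U} \in \mathbb{R}^{n \times r}$ and $\hat{V} \in \mathbb{R}^{m \times r}$ be the first $r$ eigenvectors of $AA^{\top}$ and $BB^{\top}$, respectively. Besides $A$ and $B$, we furthermore sample, uniformly at random, entries from matrix $M$. Let $\Omega$ include the indices of the randomly sampled entries. Our goal is to approximately recover the matrix $M$ using $A$, $B$, and the randomly sampled entries in $\Omega$. To this end, we will solve the following optimization problem
$$\min_{Z \in \mathbb{R}^{r \times r}} \; \bigl\|\mathcal{R}_{\Omega}(M) - \mathcal{R}_{\Omega}(\hat{U} Z \hat{V}^{\top})\bigr\|_F^2, \qquad (2)$$
where $\mathcal{R}_{\Omega}$ is defined as
$$[\mathcal{R}_{\Omega}(M)]_{i,j} = \begin{cases} M_{i,j}, & (i,j) \in \Omega, \\ 0, & \text{otherwise.} \end{cases}$$
Let $Z_*$ be an optimal solution to (2). The recovered matrix is given by $\hat{M} = \hat{U} Z_* \hat{V}^{\top}$.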
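The procedure of this section can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the names `cur_partial`, `A`, `B`, `U_hat`, `V_hat`, `rows`, `cols`, and `vals`, the problem sizes, and the synthetic rank-4 matrix are all hypothetical. The sketch takes the top left singular vectors of the sampled columns and sampled rows as the two bases, and fits the small middle matrix by ordinary least squares on the observed entries.

```python
import numpy as np

def cur_partial(A, B, rows, cols, vals, r):
    """Recover M from sampled columns A (n x d), sampled rows B (m x d, stored
    as columns), and observed entries vals[k] = M[rows[k], cols[k]]."""
    # Top-r eigenvectors of A A^T (resp. B B^T) are the top-r left singular
    # vectors of A (resp. B).
    U_hat = np.linalg.svd(A, full_matrices=False)[0][:, :r]
    V_hat = np.linalg.svd(B, full_matrices=False)[0][:, :r]
    # Each observed entry gives one linear equation u_i^T Z v_j = M_ij
    # in the r*r unknown entries of Z.
    design = np.einsum("ka,kb->kab", U_hat[rows], V_hat[cols])
    design = design.reshape(len(vals), r * r)
    z, *_ = np.linalg.lstsq(design, vals, rcond=None)
    return U_hat @ z.reshape(r, r) @ V_hat.T

# Hypothetical test case: exactly rank-4 target, 12 sampled columns and rows,
# and 400 uniformly sampled entries.
rng = np.random.default_rng(1)
n, m, r, d, n_obs = 120, 100, 4, 12, 400
M = rng.standard_normal((n, r)) @ rng.standard_normal((r, m))
A = M[:, rng.choice(m, size=d, replace=False)]
B = M[rng.choice(n, size=d, replace=False), :].T
flat = rng.choice(n * m, size=n_obs, replace=False)
rows, cols = np.unravel_index(flat, (n, m))
M_hat = cur_partial(A, B, rows, cols, M[rows, cols], r)
rel_err = np.linalg.norm(M - M_hat) / np.linalg.norm(M)
print(f"relative Frobenius error: {rel_err:.2e}")
```

The regression has only r² unknowns, regardless of the size of the target matrix, which is why a standard least-squares solver suffices in this sketch.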
The following notation will be used throughout the draft. We denote by the singular values of ranked in descending order, and by and the corresponding left and right singular vectors. Define and . Given , partition the SVD decomposition of as
Let be the th column of and be the th column of . Define the incoherence measure for and as
Similarly, we define the incoherence measure for matrices and . Let be the th column of and be the th column of . Define the incoherence measure for and as
Define projection operators , , , and . We will use $\|\cdot\|_2$ for the spectral norm of a matrix, and $\|\cdot\|_F$ for the Frobenius norm of a matrix.
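To make the incoherence measure concrete, the sketch below computes it for a matrix and a target rank, assuming the usual definition from the matrix completion literature: the largest squared row norm of the top-r singular vector matrix, scaled by the dimension over r, maximized over the left and right sides. The function name `incoherence` and the two toy matrices are illustrative assumptions.

```python
import numpy as np

def incoherence(M, r):
    """Incoherence of the top-r singular subspaces of M (assumed definition:
    max over rows of (dimension / r) * squared row norm, left and right)."""
    n, m = M.shape
    U, _, Vt = np.linalg.svd(M, full_matrices=False)
    U1, V1 = U[:, :r], Vt[:r, :].T
    mu_left = (n / r) * np.max(np.sum(U1 ** 2, axis=1))
    mu_right = (m / r) * np.max(np.sum(V1 ** 2, axis=1))
    return max(mu_left, mu_right)

# A constant matrix spreads its singular vectors evenly: the smallest possible value.
flat_matrix = np.ones((8, 6))
mu_flat = incoherence(flat_matrix, r=1)

# A single nonzero entry concentrates everything on one row and one column:
# the largest possible value, max(n, m) / r.
spike_matrix = np.zeros((8, 6))
spike_matrix[2, 3] = 5.0
mu_spike = incoherence(spike_matrix, r=1)

print(mu_flat, mu_spike)
```

Under this definition the measure lies between 1 and max(n, m)/r. Uniform sampling of rows and columns is informative only when it is close to the lower end, since a matrix like `spike_matrix` is missed by almost every uniform sample.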
3 Supporting Theorems
In this section, we present several theorems that are important to our analysis.
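Theorem 1 (Halko et al., 2011) Let $A$ be an $m \times n$ matrix with singular value decomposition $A = U \Sigma V^{\top}$, and fix $k \ge 0$. Choose a test matrix $\Omega$ and construct the sample matrix $Y = A\Omega$. Partition $\Sigma$ as in (3) and define $\Omega_1 = V_1^{\top}\Omega$ and $\Omega_2 = V_2^{\top}\Omega$. Assuming $\Omega_1$ has full row rank, the approximation error satisfies
$$\|(I - P_Y) A\|_2^2 \le \|\Sigma_2\|_2^2 + \|\Sigma_2 \Omega_2 \Omega_1^{\dagger}\|_2^2,$$
where $P_Y$ projects column vectors in $\mathbb{R}^m$ onto the subspace spanned by the column vectors of $Y$.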
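Theorem 2 (Tropp, 2011) Let $\mathcal{X}$ be a finite set of PSD matrices with dimension $k$, and suppose that
$$\max_{X \in \mathcal{X}} \lambda_{\max}(X) \le B.$$
Sample $\{X_1, \ldots, X_\ell\}$ uniformly at random from $\mathcal{X}$ without replacement. Compute
$$\mu_{\max} = \ell\,\lambda_{\max}(\mathbb{E}[X_1]), \qquad \mu_{\min} = \ell\,\lambda_{\min}(\mathbb{E}[X_1]).$$
Then
$$\Pr\Bigl\{\lambda_{\max}\Bigl(\sum_{i=1}^{\ell} X_i\Bigr) \ge (1 + \delta)\,\mu_{\max}\Bigr\} \le k \left[\frac{e^{\delta}}{(1 + \delta)^{1 + \delta}}\right]^{\mu_{\max}/B} \quad \text{for } \delta \ge 0,$$
$$\Pr\Bigl\{\lambda_{\min}\Bigl(\sum_{i=1}^{\ell} X_i\Bigr) \le (1 - \delta)\,\mu_{\min}\Bigr\} \le k \left[\frac{e^{-\delta}}{(1 - \delta)^{1 - \delta}}\right]^{\mu_{\min}/B} \quad \text{for } \delta \in [0, 1).$$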
Theorem 3 Let and be two symmetric matrices of size . Let and be the eigenvalues of and , respectively, ranked in descending order. Let include the first eigenvectors of and , respectively. Let be any unitarily invariant norm. Define
If , we have
Since the above theorem follows directly from Theorem 4.4 and the discussion in Section 5 of (Li, 1999), we skip its proof.
4 Recovering a Low Rank Matrix
In this section, we discuss the recovery result when the rank of is no more than . We will first provide the key results for our analysis, and then present detailed proofs of the key theorems.
4.1 Main Result
Our analysis is divided into two steps. We will first show that is small, and then bound the strong convexity of the objective function in (2). The following theorem shows that the difference between and , measured in spectral norm, is well bounded if is small and the objective function in (2) is strongly convex.
Assume (i) , and (ii) the strong convexity of the objective function is no less than . Then
To utilize Theorem 4, we need to bound and , respectively, which are given in the following two theorems.
With a probability , we have,
Our analysis is based on the following theorem.
With a probability , we have,
provided that .
Using Theorem 6, we have, if , with a probability
With a probability , we have that the strong convexity of the objective function in (2) is bounded from below by , provided that
The following lemma allows us to replace in Theorem 7 with .
Assume . Then, with a probability , we have , provided .
When , according to Theorem 5, with a probability , we have , provided that . Hence and , which directly implies that .
Assume , , and . Then, with a probability , we have .
The result from Theorem 9 shows that, with a probability , a low rank matrix can be perfectly recovered from number of observations from matrix . This result significantly improves the result from (Krishnamurthy and Singh, 2013), where number of observations are needed for perfect recovery. We should note that unlike (Krishnamurthy and Singh, 2013) where a small incoherence measure is assumed only for column vectors in matrix , we assume a small incoherence measure for both row and column vectors in . It is this assumption that allows us to sample both rows and columns of , leading to the improvement in the sample complexity.
4.2 Detailed Proofs
4.2.1 Proof of Theorem 4
4.2.2 Proof of Theorem 6
Let be the selected columns. Define , where is the th canonical basis vector. To utilize Theorem 1, we need to bound the minimum eigenvalue of . We have
Let be the th row vector of . We have
It is straightforward to show that
To bound the minimum eigenvalue of , we will use Theorem 2. To this end, we have
Thus, we have
By setting , we have, with a probability
Under the assumption that
using Theorem 1, we have
We complete the proof using the fact that .
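The concentration step used in this proof can be checked numerically. The sketch below assumes the quantity being bounded is the minimum eigenvalue of a sum of outer products of uniformly sampled rows of an orthonormal basis, rescaled by n/d so that its expectation is the identity. That rescaling, the function name `sampled_min_eig`, and the problem sizes are assumptions for illustration, and the basis is drawn at random so that its incoherence is small.

```python
import numpy as np

def sampled_min_eig(U1, d, rng):
    """Minimum eigenvalue of (n/d) * sum of outer products of d rows of U1,
    with the rows sampled uniformly at random without replacement."""
    n = U1.shape[0]
    idx = rng.choice(n, size=d, replace=False)
    S = U1[idx]                                  # d sampled rows
    gram = (n / d) * (S.T @ S)                   # r x r, expectation = identity
    return np.linalg.eigvalsh(gram)[0]           # eigenvalues are ascending

# Hypothetical setup: a random 5-dimensional subspace of R^2000.
rng = np.random.default_rng(2)
n, r = 2000, 5
U1, _ = np.linalg.qr(rng.standard_normal((n, r)))  # orthonormal columns

min_eigs = {d: sampled_min_eig(U1, d, rng) for d in (50, 200, n)}
for d, value in min_eigs.items():
    print(f"d = {d:4d}: minimum eigenvalue = {value:.3f}")
```

When all n rows are kept, the rescaled matrix equals the identity and the minimum eigenvalue is 1. For smaller d it stays bounded away from zero as long as d is large relative to r and the incoherence, which is what the matrix Chernoff bound of Theorem 2 is used to establish.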
4.2.3 Proof of Theorem 7
5 Recovering the Low Rank Approximation of a Full Rank Matrix
In this section, we consider the general case when is of full rank but with a skewed eigenvalue distribution. To capture the skewed eigenvalue distribution, we use the concept of numerical rank with respect to a non-negative constant , which is defined as follows (Hansen, 1987)
Next, we generalize the definition of incoherence measure to numerical low rank. Define , where , and incoherence measure with respect to a non-negative constant as
It is easy to verify that . Note that when the rank of matrix is , we have and .
In order to utilize the theorems presented in Section 4 to bound , the key is to bound and by . The following theorem allows us to bound by . Assume
for some positive . We have . More specifically, if we choose , we have
Using the above theorem, we have a modified version of Theorem 5.
Set for a fixed . With a probability , we have,
Next we will bound by . To this end, we need the following theorem.
With a probability , for any , we have
Assume that , and . Set . With a probability , we have
Assume and . Set . With a probability , we have that the strong convexity of the objective function in (2) is bounded from below by , provided that
Combining the above results, we have the final theorem for the recovery of when its numerical rank is small.
Assume and . Set . We have, with a probability
The total number of observed entries is . It is minimized when , leading to for the number of observed entries and for the recovery error.
5.1 Detailed Proof
5.1.1 Proof of Theorem 11
It is sufficient to show the result for . Define
Using the definition of , we have . Since
Using the fact that
The upper bound is obtained by setting . Similarly, for the lower bound, we have
Using the fact that
We have the lower bound by setting .
5.1.2 Proof of Theorem 12
where . According to Theorem 11, with a probability , we have , provided that
Using the assumption that , we have and therefore . As a result, according to Theorem 3, we have
Similarly, we have,
Thus, with a probability , we have
- Candès and Recht (2012) E. Candès and B. Recht. Exact matrix completion via convex optimization. Commun. ACM, 55(6):111–119, 2012.
- Candès and Tao (2010) E. Candès and T. Tao. The power of convex relaxation: near-optimal matrix completion. IEEE Transactions on Information Theory, 56(5):2053–2080, 2010.
- Drineas et al. (2006) P. Drineas, R. Kannan, and M.W. Mahoney. Fast Monte Carlo algorithms for matrices III: Computing a compressed approximate matrix decomposition. SIAM J Comput, 36:184–206, 2006.
- Halko et al. (2011) N. Halko, P. G. Martinsson, and J. A. Tropp. Finding structure with randomness: Probabilistic algorithms for constructing approximate matrix decompositions. SIAM Rev., 53(2):217–288, 2011.
- Hansen (1987) P. C. Hansen. Rank-deficient and discrete ill-posed problems: Numerical aspects of linear inversion. Society for Industrial and Applied Mathematics, 1987.
- Krishnamurthy and Singh (2013) Akshay Krishnamurthy and Aarti Singh. Low-rank matrix and tensor completion via adaptive sampling. In Advances in Neural Information Processing Systems (NIPS), 2013.
- Li (1999) R.-C. Li. Relative perturbation theory: (II) eigenspace and singular subspace variations. SIAM J. Matrix Anal. Appl., 20:471–492, 1999.
- Mahoney and Drineas (2008) M. W. Mahoney and P. Drineas. Relative-error CUR matrix decompositions. SIAM J Matrix Anal Appl, 30:844–881, 2008.
- Mahoney and Drineas (2009) M. W. Mahoney and P. Drineas. CUR matrix decompositions for improved data analysis. Proc. Natl. Acad. Sci. USA, 106:697–702, 2009.
- Recht (2011) B. Recht. A simpler approach to matrix completion. JMLR, 12:3413–3430, 2011.
- Tropp (2011) J. Tropp. Improved analysis of the subsampled randomized Hadamard transform. Adv. Adapt. Data Anal., 3:115–126, 2011.