CUR Algorithm with Incomplete Matrix Observation

Rong Jin, et al. (Michigan State University). March 22, 2014.

CUR matrix decomposition is a randomized algorithm that can efficiently compute a low rank approximation of a given rectangular matrix. One limitation of the existing CUR algorithms is that they require access to the full matrix A for computing U. In this work, we aim to alleviate this limitation. In particular, we assume that, besides having access to d randomly sampled rows and d randomly sampled columns of A, we only observe a subset of randomly sampled entries of A. Our goal is to develop a low rank approximation algorithm, similar to CUR, based on (i) the randomly sampled rows and columns of A, and (ii) the randomly sampled entries of A. The proposed algorithm is able to perfectly recover the target matrix A with only O(rn log n) observed entries. In addition, instead of having to solve an optimization problem involving trace norm regularization, the proposed algorithm only needs to solve a standard regression problem. Finally, unlike most matrix completion theories that hold only when the target matrix is of low rank, we show a strong guarantee for the proposed algorithm even when the target matrix is not low rank.

1 Introduction

CUR matrix decomposition is a randomized algorithm that can efficiently compute a low rank approximation of a given rectangular matrix (Drineas et al., 2006; Mahoney and Drineas, 2008, 2009). Let $A \in \mathbb{R}^{n \times m}$ be the given matrix and $r$ be the target rank for approximation. CUR randomly samples $d$ columns and $d$ rows from $A$, according to their leverage scores, to form matrices $C$ and $R$, respectively. The approximated matrix is then computed as $CUR$, where the core matrix $U$ is computed from $A$, $C$, and $R$ (typically $U = C^{\dagger} A R^{\dagger}$). It can be shown that, with high probability,

$$\|A - CUR\|_F \;\le\; (2+\epsilon)\,\|A - A_r\|_F, \qquad (1)$$

where $A_r$ is the best rank-$r$ approximation of $A$. When the maximum of the statistical leverage scores, also referred to as the incoherence measure in matrix completion (Candès and Tao, 2010; Recht, 2011; Candès and Recht, 2012), is small, CUR matrix decomposition can be simplified by uniformly sampling rows and columns from $A$. The simplified algorithm has a relative error bound similar to that in (1), except that the sample sizes should be increased by a factor depending on the incoherence measure. In this draft, we focus on the situation with a bounded incoherence measure, where uniform sampling of columns and rows is in general sufficient.
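
For reference, the following is a minimal sketch of standard CUR with uniform sampling. The choice $U = C^{\dagger} A R^{\dagger}$ for the core matrix is a common construction assumed here (not necessarily the exact variant analyzed above), and it illustrates why access to the full matrix $A$ is needed to form $U$.

```python
import numpy as np

def cur_uniform(A, d, seed=None):
    """Low rank approximation A ~= C @ U @ R with uniformly sampled columns C
    and rows R. Uses the common choice U = pinv(C) @ A @ pinv(R), which
    requires access to the full matrix A (an assumption, see the text)."""
    rng = np.random.default_rng(seed)
    n, m = A.shape
    col_idx = rng.choice(m, size=d, replace=False)   # d column indices
    row_idx = rng.choice(n, size=d, replace=False)   # d row indices
    C = A[:, col_idx]                                # n x d
    R = A[row_idx, :]                                # d x m
    U = np.linalg.pinv(C) @ A @ np.linalg.pinv(R)    # d x d core matrix
    return C, U, R

# Example: a 500 x 300 matrix of exact rank 40.
rng = np.random.default_rng(0)
A = rng.standard_normal((500, 40)) @ rng.standard_normal((40, 300))
C, U, R = cur_uniform(A, d=80, seed=1)
print(np.linalg.norm(A - C @ U @ R, 'fro') / np.linalg.norm(A, 'fro'))
```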

One limitation of the existing CUR algorithms is that they require access to the full matrix $A$ for computing $U$. In this work, we aim to alleviate this limitation. In particular, we assume that, besides having access to $d$ randomly sampled rows and $d$ randomly sampled columns of $A$, we only observe a subset of randomly sampled entries of $A$. Our goal is to develop a low rank approximation algorithm, similar to CUR, based on (i) the randomly sampled rows and columns of $A$, and (ii) the randomly sampled entries of $A$.

Compared to the standard matrix completion theory (Candès and Tao, 2010; Recht, 2011; Candès and Recht, 2012), the key advantages of the proposed algorithm are its low sample complexity and high computational efficiency. In particular, unlike matrix completion, which requires a significantly larger number of observed entries, the proposed algorithm is able to perfectly recover the target matrix $A$ with only $O(rn\log n)$ observed entries (including the randomly sampled entries and the entries in the randomly sampled rows and columns). In addition, instead of having to solve an optimization problem involving trace norm regularization, the proposed algorithm only needs to solve a standard regression problem. Finally, unlike most matrix completion theories that hold only when the target matrix is of low rank, we show a strong guarantee for the proposed algorithm even when the target matrix is not low rank.

We finally note that a closely related algorithm, titled "Low-rank Matrix and Tensor Completion via Adaptive Sampling", was published recently (Krishnamurthy and Singh, 2013). It is designed to recover a low rank matrix from randomly sampled rows and entries, which is different from the goal of this work (i.e., computing a low rank approximation of a target matrix $A$).

2 Algorithm and Notation

Let $A \in \mathbb{R}^{n \times m}$ be the target matrix, where $n \ge m$. To approximate $A$, we first sample uniformly at random $d$ columns and $d$ rows from $A$, denoted by $C = (\mathbf{c}_1, \ldots, \mathbf{c}_d)$ and $R = (\mathbf{r}_1, \ldots, \mathbf{r}_d)^{\top}$, respectively, where each $\mathbf{r}_i$ and $\mathbf{c}_j$ is a row and a column of $A$, respectively. Let $r$ be the target rank for approximation, with $r \le d$. Let $\widehat{U}$ and $\widehat{V}$ be the first $r$ eigenvectors of $CC^{\top}$ and $R^{\top}R$, respectively. Besides $C$ and $R$, we furthermore sample, uniformly at random, entries from matrix $A$. Let $\Omega$ include the indices of the randomly sampled entries. Our goal is to approximately recover the matrix $A$ using $C$, $R$, and the randomly sampled entries in $\Omega$. To this end, we solve the following optimization problem

$$\min_{Z \in \mathbb{R}^{r \times r}} \;\; \frac{1}{2}\left\| P_{\Omega}\!\left(\widehat{U} Z \widehat{V}^{\top} - A\right)\right\|_F^2, \qquad (2)$$

where $P_{\Omega}: \mathbb{R}^{n \times m} \mapsto \mathbb{R}^{n \times m}$ is defined as

$$\left[P_{\Omega}(B)\right]_{ij} = \begin{cases} B_{ij} & (i, j) \in \Omega, \\ 0 & \text{otherwise.} \end{cases}$$

Let $Z^{*}$ be an optimal solution to (2). The recovered matrix is given by $\widehat{A} = \widehat{U} Z^{*} \widehat{V}^{\top}$.
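
To fix ideas, here is a minimal sketch of the overall procedure, under the assumption that problem (2) is the regression $\min_{Z} \|P_{\Omega}(\widehat{U} Z \widehat{V}^{\top} - A)\|_F^2$ described above; the function and variable names are illustrative and not the authors' implementation.

```python
import numpy as np

def cur_from_partial_observations(A_lookup, n, m, d, r, omega, seed=None):
    """Sketch: approximate A from d sampled rows/columns plus entries in omega.

    A_lookup(i, j) returns one observed entry of A; omega is a list of (i, j)
    index pairs. The regression below follows the form of problem (2) assumed
    in the text, with the sampled subspaces held fixed.
    """
    rng = np.random.default_rng(seed)
    col_idx = rng.choice(m, size=d, replace=False)
    row_idx = rng.choice(n, size=d, replace=False)
    C = np.array([[A_lookup(i, j) for j in col_idx] for i in range(n)])  # n x d
    R = np.array([[A_lookup(i, j) for j in range(m)] for i in row_idx])  # d x m

    U_hat = np.linalg.svd(C, full_matrices=False)[0][:, :r]    # top-r column space of C
    V_hat = np.linalg.svd(R, full_matrices=False)[2][:r, :].T  # top-r row space of R

    # Regression (2): min_Z sum_{(i,j) in omega} ((U_hat Z V_hat^T)_{ij} - A_{ij})^2.
    design = np.array([np.kron(V_hat[j], U_hat[i]) for (i, j) in omega])  # |omega| x r^2
    target = np.array([A_lookup(i, j) for (i, j) in omega])
    z, *_ = np.linalg.lstsq(design, target, rcond=None)
    Z = z.reshape(r, r, order='F')   # column-major vec, matching the kron ordering above
    return U_hat @ Z @ V_hat.T
```

Because the column and row spaces are fixed before the regression, (2) reduces to an ordinary least squares problem in the $r \times r$ core matrix, in contrast with trace norm regularized matrix completion.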

The following notation will be used throughout the draft. We denote by $\sigma_1 \ge \sigma_2 \ge \cdots \ge \sigma_m$ the singular values of $A$, ranked in descending order, and by $\mathbf{u}_1, \ldots, \mathbf{u}_m$ and $\mathbf{v}_1, \ldots, \mathbf{v}_m$ the corresponding left and right singular vectors. Define $U = (\mathbf{u}_1, \ldots, \mathbf{u}_r)$ and $V = (\mathbf{v}_1, \ldots, \mathbf{v}_r)$. Given $r$, partition the SVD decomposition of $A$ as

$$A = (U, U_{\perp}) \begin{pmatrix} \Sigma_1 & 0 \\ 0 & \Sigma_2 \end{pmatrix} (V, V_{\perp})^{\top}, \qquad (3)$$

where $\Sigma_1 = \mathrm{diag}(\sigma_1, \ldots, \sigma_r)$ and $\Sigma_2 = \mathrm{diag}(\sigma_{r+1}, \ldots, \sigma_m)$. Let $\mathbf{u}^i$ be the $i$th column of $U^{\top}$ and $\mathbf{v}^j$ be the $j$th column of $V^{\top}$. Define the incoherence measure for $U$ and $V$ as

$$\mu(r) = \max\left( \frac{n}{r} \max_{1 \le i \le n} \|\mathbf{u}^i\|_2^2, \;\; \frac{m}{r} \max_{1 \le j \le m} \|\mathbf{v}^j\|_2^2 \right).$$

Similarly, we define the incoherence measure for matrices $\widehat{U}$ and $\widehat{V}$. Let $\widehat{\mathbf{u}}^i$ be the $i$th column of $\widehat{U}^{\top}$ and $\widehat{\mathbf{v}}^j$ be the $j$th column of $\widehat{V}^{\top}$. Define the incoherence measure for $\widehat{U}$ and $\widehat{V}$ as

$$\widehat{\mu}(r) = \max\left( \frac{n}{r} \max_{1 \le i \le n} \|\widehat{\mathbf{u}}^i\|_2^2, \;\; \frac{m}{r} \max_{1 \le j \le m} \|\widehat{\mathbf{v}}^j\|_2^2 \right).$$

Define projection operators $P_U = UU^{\top}$, $P_V = VV^{\top}$, $P_{\widehat{U}} = \widehat{U}\widehat{U}^{\top}$, and $P_{\widehat{V}} = \widehat{V}\widehat{V}^{\top}$. We will use $\|\cdot\|_2$ for the spectral norm of a matrix and $\|\cdot\|_F$ for the Frobenius norm of a matrix.
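
For intuition, the snippet below computes the standard incoherence measure of a column-orthonormal basis as used in the matrix completion literature cited above. It assumes the usual normalization $\mu(U) = \frac{n}{r}\max_i \|\mathbf{u}^i\|_2^2$; whether the draft uses exactly this scaling is an assumption.

```python
import numpy as np

def incoherence(B):
    """Incoherence of a column-orthonormal basis B (n x r):
    mu(B) = (n / r) * max_i ||e_i^T B||_2^2. Small values mean the subspace
    is spread out over all coordinates (the usual normalization from the
    matrix completion literature; assumed here)."""
    n, r = B.shape
    row_norms = np.sum(B * B, axis=1)   # ||e_i^T B||_2^2 for each row i
    return (n / r) * np.max(row_norms)

# Example: top-r left singular vectors of a random rank-30 matrix.
rng = np.random.default_rng(0)
A = rng.standard_normal((1000, 30)) @ rng.standard_normal((30, 200))
U = np.linalg.svd(A, full_matrices=False)[0][:, :30]
print(incoherence(U))   # typically a small number for random matrices
```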

3 Supporting Theorems

In this section, we present several theorems that are important to our analysis.

Theorem 1

((Halko et al., 2011)) Let $A$ be an $n \times m$ matrix whose SVD is partitioned as in (3), and let $r \ge 0$ be fixed. Choose a test matrix $\Psi$ and construct the sample matrix $Y = A\Psi$. Define $\Psi_1 = V^{\top}\Psi$ and $\Psi_2 = V_{\perp}^{\top}\Psi$. Assuming $\Psi_1$ has full row rank, the approximation error satisfies

$$\left\| (I - P_Y) A \right\|^2 \;\le\; \left\| \Sigma_2 \right\|^2 + \left\| \Sigma_2 \Psi_2 \Psi_1^{\dagger} \right\|^2,$$

where $P_Y$ projects column vectors of $A$ onto the subspace spanned by the column vectors of $Y$, and $\|\cdot\|$ denotes either the spectral or the Frobenius norm.

Theorem 2

((Tropp, 2011)) Let $\mathcal{X}$ be a finite set of PSD matrices with dimension $k$, and suppose that

$$\max_{X \in \mathcal{X}} \lambda_{\max}(X) \le B.$$

Sample $\{X_1, \ldots, X_{\ell}\}$ uniformly at random from $\mathcal{X}$ without replacement. Compute

$$\mu_{\min} = \ell\, \lambda_{\min}\!\left(\mathbb{E}[X_1]\right) \quad \text{and} \quad \mu_{\max} = \ell\, \lambda_{\max}\!\left(\mathbb{E}[X_1]\right).$$

Then

$$\Pr\!\left\{ \lambda_{\min}\!\Big(\sum_{j=1}^{\ell} X_j\Big) \le (1-\delta)\mu_{\min} \right\} \le k \left[ \frac{e^{-\delta}}{(1-\delta)^{1-\delta}} \right]^{\mu_{\min}/B} \quad \text{for } \delta \in [0, 1),$$

$$\Pr\!\left\{ \lambda_{\max}\!\Big(\sum_{j=1}^{\ell} X_j\Big) \ge (1+\delta)\mu_{\max} \right\} \le k \left[ \frac{e^{\delta}}{(1+\delta)^{1+\delta}} \right]^{\mu_{\max}/B} \quad \text{for } \delta \ge 0.$$

Theorem 3

Let and be two symmetric matrices of size . Let and

be the eigenvalues of

and , respectively, ranked in descending order. Let include the first eigenvectors of and , respectively. Let be any invariant norm. Define

If , we have

Since the above theorem follows directly from Theorem 4.4 and the discussion in Section 5 of (Li, 1999), we skip its proof.

4 Recovering a Low Rank Matrix

In this section, we discuss the recovery result when the rank of $A$ is no more than $r$. We will first provide the key results of our analysis, and then present detailed proofs of the key theorems.

4.1 Main Result

Our analysis is divided into two steps. We first show that the quantity in condition (i) of Theorem 4 below is small, and then bound the strong convexity of the objective function in (2). The following theorem shows that the difference between $\widehat{A}$ and $A$, measured in spectral norm, is well bounded if that quantity is small and the objective function in (2) is strongly convex.

Theorem 4

Assume (i) , and (ii) that the strong convexity of the objective function is no less than . Then

To utilize Theorem 4, we need to bound and , respectively, which are given in the following two theorems.

Theorem 5

With a probability , we have,

if .

Our analysis is based on the following theorem.

Theorem 6

With a probability , we have,

provided that .

Using Theorem 6, we have, if , with a probability

Theorem 7

With a probability , we have that the strong convexity of the objective function in (2) is bounded from below by , provided that

The following lemma allows us to replace in Theorem 7 with .

Theorem 8

Assume . Then, with a probability , we have , provided .

When , according to Theorem 5, with a probability , we have , provided that . Hence and , which directly implies that .

The following theorem follows directly from Theorems 5, 7, 8, and 4.

Theorem 9

Assume , , and . Then, with a probability , we have $\widehat{A} = A$.

Remark

The result from Theorem 9 shows that, with a high probability, a low rank matrix can be perfectly recovered from $O(rn\log n)$ observed entries of the matrix $A$. This result significantly improves the result of (Krishnamurthy and Singh, 2013), where more observations are needed for perfect recovery. We should note that, unlike (Krishnamurthy and Singh, 2013), where a small incoherence measure is assumed only for the column vectors of matrix $A$, we assume a small incoherence measure for both the row and column vectors of $A$. It is this assumption that allows us to sample both rows and columns of $A$, leading to the improvement in sample complexity.

4.2 Detailed Proofs

4.2.1 Proof of Theorem 4

Set . Since , we have

implying that

Hence, we have

Let be the optimal solution to (2). Using the strong convexity of (2), we have

i.e. . We thus have

4.2.2 Proof of Theorem 6

Let $\mathbf{c}_{i_1}, \ldots, \mathbf{c}_{i_d}$ be the selected columns. Define the test matrix $\Psi = (\mathbf{e}_{i_1}, \ldots, \mathbf{e}_{i_d})$, where $\mathbf{e}_i$ is the $i$th canonical basis vector. To utilize Theorem 1, we need to bound the minimum eigenvalue of $\Psi_1\Psi_1^{\top}$. We have

Let be the th row vector of . We have

It is straightforward to show that

To bound the minimum eigenvalue of , we will use Theorem 2. To this end, we have

Thus, we have

By setting , we have, with a probability

Under the assumption that

using Theorem 1, we have

We complete the proof using the fact that .

4.2.3 Proof of Theorem 7

We rewrite the objective function as

Define matrix , where . Our goal is to bound the minimum eigenvalue of . To apply Theorem 2, we bound

and

where $\otimes$ denotes the Kronecker product. Thus, according to Theorem 2, with a probability , we have

provided that
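
To make the structure of this argument concrete, the sketch below (a minimal illustration under the regression form of problem (2) assumed in Section 2, with illustrative names) forms the Hessian of the objective with respect to $\mathrm{vec}(Z)$ as a sum of rank-one Kronecker terms, the same PSD-sum structure to which Theorem 2 is applied, and returns its smallest eigenvalue, i.e., the strong convexity modulus bounded in Theorem 7.

```python
import numpy as np

def strong_convexity_modulus(U_hat, V_hat, omega):
    """Smallest eigenvalue of the Hessian of
    f(Z) = 1/2 * sum_{(i,j) in omega} ((U_hat Z V_hat^T)_{ij} - A_{ij})^2
    with respect to vec(Z). The Hessian is a sum of PSD rank-one matrices
    kron(v_j, u_i) kron(v_j, u_i)^T, the setting in which the matrix
    Chernoff bound (Theorem 2) is applied. (The regression form of (2) is
    an assumption, see Section 2.)"""
    r = U_hat.shape[1]
    H = np.zeros((r * r, r * r))
    for (i, j) in omega:
        g = np.kron(V_hat[j], U_hat[i])   # gradient of (U_hat Z V_hat^T)_{ij} w.r.t. vec(Z)
        H += np.outer(g, g)
    return np.linalg.eigvalsh(H)[0]       # eigvalsh returns ascending eigenvalues
```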

5 Recovering the Low Rank Approximation of a Full Rank Matrix

In this section, we consider the general case when $A$ is of full rank but has a skewed eigenvalue distribution. To capture the skewed eigenvalue distribution, we use the concept of numerical rank with respect to a non-negative constant, which is defined as follows (Hansen, 1987)

Define

and

Next, we generalize the definition of the incoherence measure to the numerically low rank setting. Define , where , and the incoherence measure with respect to a non-negative constant as

It is easy to verify that . Note that when the rank of matrix $A$ is $r$, we have and .
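
For concreteness, a standard way to formalize the numerical rank in the sense of (Hansen, 1987), consistent with the properties used below (this particular formulation is an assumption on our part), is the following:

```latex
% Numerical rank with threshold delta: the number of singular values above delta,
% equivalently the smallest rank achievable within a spectral-norm error of delta.
r(\delta) \;=\; \#\{\, i : \sigma_i > \delta \,\}
          \;=\; \min\bigl\{ \operatorname{rank}(B) : \|A - B\|_2 \le \delta \bigr\}.
```

Under this reading, if $\operatorname{rank}(A) = r$ and $\delta < \sigma_r$, then the numerical rank equals $r$, in agreement with the remark above.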

In order to utilize the theorems presented in Section 4 to bound , the key is to bound and by . The following lemma allows us to bound by . Assume

for some positive . We have . More specifically, if we choose , we have

Using the above lemma, we obtain the following modified version of Theorem 5.

Theorem 10

Set for a fixed . With a probability , we have,

if .

We note that Theorem 10 is almost identical to Theorem 5 except that is replaced with .

Next we will bound by . To this end, we need the following theorem.

Theorem 11

With a probability , for any , we have

if

Theorem 12

Assume that , and . Set . With a probability , we have

where

Using Theorem 12, we have the following version of Theorem 7.

Theorem 13

Assume and . Set . With a probability , we have that the strong convexity of the objective function in (2) is bounded from below by , provided that

Combining the above results, we have the final theorem for the recovery of $A$ when its numerical rank is small.

Theorem 14

Assume and . Set . We have, with a probability

if

Remark

The total number of observed entries is . It is minimized when , leading to for the number of observed entries and for the recovery error.

5.1 Detailed Proof

5.1.1 Proof of Theorem 11

It is sufficient to show the result for . Define

We have

Using the definition of , we have . Since

we have

Using the fact that

we have

The upper bound is obtained by setting . Similarly, for the lower bound, we have

Using the fact that

We have the lower bound by setting .

5.1.2 Proof of Theorem 12

To utilize Theorem 3, we rewrite and , defined in Theorem 11, as

where . According to Theorem 11, with a probability , we have , provided that

We then compute and defined in Theorem 3. Using the fact and Theorem 11, we have, with a probability , . Hence

Using the assumption that , we have and therefore . As a result, according to Theorem 3, we have

Similarly, we have,

Thus, with a probability , we have

References

  • Candès and Recht (2012) E. Candès and B. Recht. Exact matrix completion via convex optimization. Commun. ACM, 55(6):111–119, 2012.
  • Candès and Tao (2010) E. Candès and T. Tao. The power of convex relaxation: near-optimal matrix completion. IEEE Transactions on Information Theory, 56(5):2053–2080, 2010.
  • Drineas et al. (2006) P. Drineas, R. Kannan, and M. W. Mahoney. Fast Monte Carlo algorithms for matrices III: Computing a compressed approximate matrix decomposition. SIAM J. Comput., 36:184–206, 2006.
  • Halko et al. (2011) N. Halko, P. G. Martinsson, and J. A. Tropp. Finding structure with randomness: Probabilistic algorithms for constructing approximate matrix decompositions. SIAM Rev., 53(2):217–288, 2011.
  • Hansen (1987) P. C. Hansen. Rank-deficient and discrete ill-posed problems: Numerical aspects of linear inversion. Society for Industrial and Applied Mathematics, 1987.
  • Krishnamurthy and Singh (2013) Akshay Krishnamurthy and Aarti Singh. Low-rank matrix and tensor completion via adaptive sampling. In Advances in Neural Information Processing Systems (NIPS), 2013.
  • Li (1999) R.-C. Li. Relative perturbation theory: (II) eigenspace and singular subspace variations. SIAM J. Matrix Anal. Appl., 20:471–492, 1999.
  • Mahoney and Drineas (2008) M. W. Mahoney and P. Drineas. Relative-error CUR matrix decompositions. SIAM J. Matrix Anal. Appl., 30:844–881, 2008.
  • Mahoney and Drineas (2009) M. W. Mahoney and P. Drineas. CUR matrix decompositions for improved data analysis. Proc. Natl. Acad. Sci. USA, 106:697–702, 2009.
  • Recht (2011) B. Recht. A simpler approach to matrix completion. JMLR, 12:3413–3430, 2011.
  • Tropp (2011) J. Tropp. Improved analysis of the subsampled randomized Hadamard transform. Adv. Adapt. Data Anal., 3:115–126, 2011.