# CUR Algorithm with Incomplete Matrix Observation

CUR matrix decomposition is a randomized algorithm that can efficiently compute the low rank approximation of a given rectangular matrix. One limitation of existing CUR algorithms is that they require access to the full matrix A for computing U. In this work, we aim to alleviate this limitation. In particular, we assume that besides having access to d randomly sampled rows and d randomly sampled columns from A, we only observe a subset of randomly sampled entries from A. Our goal is to develop a low rank approximation algorithm, similar to CUR, based on (i) randomly sampled rows and columns from A, and (ii) randomly sampled entries from A. The proposed algorithm is able to perfectly recover the target matrix A with only O(rn log n) observed entries. In addition, instead of having to solve an optimization problem involving trace norm regularization, the proposed algorithm only needs to solve a standard regression problem. Finally, unlike most matrix completion theories, which hold only when the target matrix is of low rank, we show a strong guarantee for the proposed algorithm even when the target matrix is not low rank.


## 1 Introduction

CUR matrix decomposition is a randomized algorithm that can efficiently compute the low rank approximation of a given rectangular matrix (Drineas et al., 2006, Mahoney and Drineas, 2008, 2009). Let $A \in \mathbb{R}^{n\times m}$ be the given matrix and $k$ be the target rank for approximation. CUR randomly samples $d$ columns and $d$ rows from $A$, according to their leverage scores, to form matrices $C$ and $R$, respectively. The approximated matrix is then computed as $\hat{A} = CUR$, where $U = C^{\dagger}AR^{\dagger}$. It can be shown that, with a high probability,

$$\|A - \hat{A}\|_F \le (2+\epsilon)\,\|A - A_k\|_F \tag{1}$$

where $A_k$ is the best rank-$k$ approximation of $A$. When the maximum of the statistical leverage scores, which is also referred to as the incoherence measure in matrix completion (Candès and Tao, 2010, Recht, 2011, Candès and Recht, 2012), is small, CUR matrix decomposition can be simplified by uniformly sampling rows and columns from $A$. The simplified algorithm has a relative error bound similar to that in (1), except that the sample sizes of sampled rows and columns should be increased by a factor depending on the incoherence measure. In this draft, we focus on the situation with a bounded incoherence measure, where uniform sampling of columns and rows is in general sufficient.
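
For concreteness, the uniform-sampling variant can be sketched in a few lines of NumPy (our own illustration, not the authors' code). Note that forming $U = C^{\dagger}AR^{\dagger}$ still reads every entry of $A$, which is exactly the limitation addressed below; for an exactly rank-$k$ matrix, $d \ge k$ uniformly sampled rows and columns already give exact recovery almost surely.

```python
import numpy as np

def uniform_cur(A, d, rng):
    """Uniform-sampling CUR: sample d columns and d rows, set U = C^+ A R^+.

    Forming U touches the *full* matrix A, which is the limitation
    this draft aims to remove.
    """
    n, m = A.shape
    cols = rng.choice(m, size=d, replace=False)
    rows = rng.choice(n, size=d, replace=False)
    C, R = A[:, cols], A[rows, :]
    U = np.linalg.pinv(C) @ A @ np.linalg.pinv(R)  # requires all of A
    return C @ U @ R

rng = np.random.default_rng(0)
n, m, k, d = 60, 50, 3, 15
A = rng.standard_normal((n, k)) @ rng.standard_normal((k, m))  # exactly rank k
A_hat = uniform_cur(A, d, rng)
err = np.linalg.norm(A - A_hat, "fro")
```

Here `err` is at round-off level because the sampled columns and rows span the column and row spaces of the rank-$k$ matrix.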

One limitation of the existing CUR algorithms is that they require access to the full matrix $A$ for computing $U$. In this work, we aim to alleviate this limitation. In particular, we assume that besides having access to $d$ randomly sampled rows and $d$ randomly sampled columns from $A$, we only observe a subset of randomly sampled entries from $A$. Our goal is to develop a low rank approximation algorithm, similar to CUR, based on (i) randomly sampled rows and columns from $A$, and (ii) randomly sampled entries from $A$.

Compared to the standard matrix completion theory (Candès and Tao, 2010, Recht, 2011, Candès and Recht, 2012), the key advantage of the proposed algorithm is its low sample complexity and high computational efficiency. In particular, unlike matrix completion, which requires $O(rn\log^2 n)$ observed entries, the proposed algorithm is able to perfectly recover the target matrix with only $O(rn\log n)$ observed entries (including the randomly sampled entries and the entries in the randomly sampled rows and columns). In addition, instead of having to solve an optimization problem involving trace norm regularization, the proposed algorithm only needs to solve a standard regression problem. Finally, unlike most matrix completion theories, which hold only when the target matrix is of low rank, we show a strong guarantee for the proposed algorithm even when the target matrix is not low rank.

We finally note that a closely related algorithm, titled "Low-rank Matrix and Tensor Completion via Adaptive Sampling", was published recently (Krishnamurthy and Singh, 2013). It is designed to recover a low rank matrix from randomly sampled rows and entries, which is different from the goal of this work (i.e., computing a low rank approximation of a general target matrix $M$).

## 2 Algorithm and Notation

Let $M \in \mathbb{R}^{n\times m}$ be the target matrix, where $n \ge m$. To approximate $M$, we first sample uniformly at random $d$ columns and $d$ rows from $M$, denoted by $A = (a_1, \ldots, a_d) \in \mathbb{R}^{n\times d}$ and $B = (b_1, \ldots, b_d) \in \mathbb{R}^{m\times d}$, respectively, where each $a_i$ is a sampled column of $M$ and each $b_i^{\top}$ is a sampled row of $M$. Let $r$ be the target rank for approximation, with $r \le d$. Let $\hat{U} \in \mathbb{R}^{n\times r}$ and $\hat{V} \in \mathbb{R}^{m\times r}$ be the first $r$ eigenvectors of $AA^{\top}$ and $BB^{\top}$, respectively. Besides $A$ and $B$, we furthermore sample, uniformly at random, entries from matrix $M$. Let $\Omega$ include the indices of the randomly sampled entries. Our goal is to approximately recover the matrix $M$ using $A$, $B$, and the randomly sampled entries in $\Omega$. To this end, we solve the following optimization problem

$$\min_{Z\in\mathbb{R}^{r\times r}} \left\|R_{\Omega}(M) - R_{\Omega}(\hat{U} Z \hat{V}^{\top})\right\|_F^2 \tag{2}$$

where $R_{\Omega}(\cdot)$ is defined as

$$[R_{\Omega}(M)]_{i,j} = \begin{cases} M_{i,j} & (i,j) \in \Omega \\ 0 & \text{o.w.} \end{cases}$$

Let $Z_*$ be an optimal solution to (2). The recovered matrix is given by $\hat{M} = \hat{U} Z_* \hat{V}^{\top}$.
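
A minimal NumPy sketch of this procedure (our own; the helper name `cur_incomplete` is not from the paper). It forms $\hat{U}$ and $\hat{V}$ from the sampled columns and rows, then solves (2) as an ordinary least squares problem over the observed entries, using $[\hat{U}Z\hat{V}^{\top}]_{i,j} = (\hat{U}_{i,:} \otimes \hat{V}_{j,:}) \cdot \mathrm{vec}(Z)$:

```python
import numpy as np

def cur_incomplete(A, B, omega, omega_vals, r):
    """Sketch of the estimator of Section 2.

    A: n x d matrix of sampled columns of M; B: d x m matrix of sampled rows.
    omega: list of (i, j) indices of observed entries; omega_vals: their values.
    """
    # U_hat, V_hat: top-r eigenvectors of A A^T and of B^T B
    U_hat = np.linalg.svd(A, full_matrices=False)[0][:, :r]
    V_hat = np.linalg.svd(B, full_matrices=False)[2][:r, :].T
    # Problem (2): least squares over observed entries for Z in R^{r x r}
    K = np.stack([np.kron(U_hat[i], V_hat[j]) for (i, j) in omega])
    z, *_ = np.linalg.lstsq(K, omega_vals, rcond=None)
    return U_hat @ (z.reshape(r, r) @ V_hat.T)

# Usage on an exactly rank-r matrix: recovery is exact up to round-off.
rng = np.random.default_rng(1)
n, m, r, d = 40, 30, 2, 8
M = rng.standard_normal((n, r)) @ rng.standard_normal((r, m))
cols = rng.choice(m, size=d, replace=False)
rows = rng.choice(n, size=d, replace=False)
flat = rng.choice(n * m, size=300, replace=False)
omega = [(f // m, f % m) for f in flat]
vals = np.array([M[i, j] for (i, j) in omega])
M_hat = cur_incomplete(M[:, cols], M[rows, :], omega, vals, r)
err = np.linalg.norm(M - M_hat, "fro") / np.linalg.norm(M, "fro")
```

For an exactly rank-$r$ matrix, the relative error `err` is at round-off level, matching the perfect-recovery guarantee of Section 4.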

The following notation will be used throughout the draft. We denote by $\sigma_1 \ge \sigma_2 \ge \cdots \ge \sigma_m$ the singular values of $M$ ranked in descending order, and by $u_1, \ldots, u_n$ and $v_1, \ldots, v_m$ the corresponding left and right singular vectors. Define $U = (u_1, \ldots, u_n)$ and $V = (v_1, \ldots, v_m)$. Given $r$, partition the SVD decomposition of $M$ as

$$M = (U_1, U_2)\begin{pmatrix} \Sigma_1 & 0 \\ 0 & \Sigma_2 \end{pmatrix}\begin{pmatrix} V_1^{\top} \\ V_2^{\top} \end{pmatrix} \tag{3}$$

where $U_1 \in \mathbb{R}^{n\times r}$ and $V_1 \in \mathbb{R}^{m\times r}$ contain the first $r$ singular vectors. Let $\tilde{u}_i$ be the $i$th column of $U_1^{\top}$ and $\tilde{v}_i$ be the $i$th column of $V_1^{\top}$. Define the incoherence measure for $U_1$ and $V_1$ as

$$\mu(r) = \max\left(\max_{i\in[n]} \frac{n}{r}|\tilde{u}_i|^2,\ \max_{i\in[m]} \frac{m}{r}|\tilde{v}_i|^2\right)$$

Similarly, we define the incoherence measure for the matrices $\hat{U}$ and $\hat{V}$. Let $\hat{u}'_i$ be the $i$th column of $\hat{U}^{\top}$ and $\hat{v}'_i$ be the $i$th column of $\hat{V}^{\top}$. Define the incoherence measure for $\hat{U}$ and $\hat{V}$ as

$$\hat{\mu} = \max\left(\max_{i\in[n]} \frac{n}{r}|\hat{u}'_i|^2,\ \max_{i\in[m]} \frac{m}{r}|\hat{v}'_i|^2\right)$$

Define the projection operators $P_U = U_1U_1^{\top}$, $P_V = V_1V_1^{\top}$, $P_{\hat{U}} = \hat{U}\hat{U}^{\top}$, and $P_{\hat{V}} = \hat{V}\hat{V}^{\top}$. We use $\|\cdot\|_2$ for the spectral norm of a matrix and $\|\cdot\|_F$ for the Frobenius norm.
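
The incoherence measure is directly computable from an SVD. A small sketch (ours, not the authors' code) that also illustrates the extreme case of a matrix with a single nonzero entry, for which $\mu(1) = n$:

```python
import numpy as np

def incoherence(M, r):
    """mu(r) of Section 2: the largest squared row norm of U_1 (resp. V_1),
    scaled by n/r (resp. m/r)."""
    n, m = M.shape
    U, _, Vt = np.linalg.svd(M, full_matrices=False)
    U1, V1 = U[:, :r], Vt[:r, :].T
    return max((n / r) * (U1 ** 2).sum(axis=1).max(),
               (m / r) * (V1 ** 2).sum(axis=1).max())

# Maximally coherent rank-1 example: a single nonzero entry gives mu(1) = n,
# since the top singular vectors are canonical basis vectors.
n = 20
M = np.zeros((n, n))
M[0, 0] = 1.0
```

Here `incoherence(M, 1)` evaluates to $n = 20$, the worst case; a random low rank matrix typically has $\mu(r) = O(\log n)$.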

## 3 Supporting Theorems

In this section, we present several theorems that are important to our analysis.

###### Theorem 1

(Halko et al., 2011) Let $M$ be an $n\times m$ matrix with singular value decomposition $M = U\Sigma V^{\top}$, and fix $r \ge 0$. Choose a test matrix $\Omega$ and construct the sample matrix $Y = M\Omega$. Partition $\Sigma$ as in (3), and define $\Omega_1 = V_1^{\top}\Omega$ and $\Omega_2 = V_2^{\top}\Omega$. Assuming $\Omega_1$ has full row rank, the approximation error satisfies

$$\|M - P_Y(M)\|_2^2 \le \|\Sigma_2\|_2^2 + \|\Sigma_2\Omega_2\Omega_1^{\dagger}\|_2^2$$

where $P_Y$ projects column vectors of $M$ onto the subspace spanned by the column vectors of $Y$.

###### Theorem 2

(Tropp, 2011) Let $\mathcal{X}$ be a finite set of PSD matrices with dimension $k$, and suppose that

$$\max_{X\in\mathcal{X}} \lambda_{\max}(X) \le B$$

Sample $\{X_1, \ldots, X_{\ell}\}$ uniformly at random from $\mathcal{X}$ without replacement. Compute

$$\mu_{\max} = \ell\,\lambda_{\max}(\mathrm{E}[X_1]), \qquad \mu_{\min} = \ell\,\lambda_{\min}(\mathrm{E}[X_1])$$

Then

$$\Pr\left\{\lambda_{\max}\left(\sum_{i=1}^{\ell} X_i\right) \ge (1+\delta)\mu_{\max}\right\} \le k\left[\frac{e^{\delta}}{(1+\delta)^{1+\delta}}\right]^{\mu_{\max}/B}$$

$$\Pr\left\{\lambda_{\min}\left(\sum_{i=1}^{\ell} X_i\right) \le (1-\delta)\mu_{\min}\right\} \le k\left[\frac{e^{-\delta}}{(1-\delta)^{1-\delta}}\right]^{\mu_{\min}/B}$$
###### Theorem 3

Let $H$ and $\tilde{H}$ be two symmetric matrices of size $n\times n$. Let $\lambda_1 \ge \cdots \ge \lambda_n$ and $\tilde{\lambda}_1 \ge \cdots \ge \tilde{\lambda}_n$ be the eigenvalues of $H$ and $\tilde{H}$, respectively, ranked in descending order. Let $U_1$ and $\tilde{U}_1$ include the first $r$ eigenvectors of $H$ and $\tilde{H}$, respectively. Let $\|\cdot\|$ be any unitarily invariant norm. Define $\Delta_{\lambda}$ as the relative eigenvalue gap between $\lambda_r$ and $\tilde{\lambda}_{r+1}$, and

$$\Delta_H = \frac{\|H^{-1}\|_2\|H - \tilde{H}\|}{\sqrt{1 - \|H^{-1}\|_2\|H - \tilde{H}\|_2}}$$

If $\Delta_{\lambda} > \Delta_H/2$, we have

$$\|\sin\Theta(U_1, \tilde{U}_1)\| \le \frac{\Delta_H}{\Delta_{\lambda} - \Delta_H/2}\left(1 + \frac{\Delta_H\Delta_{\lambda}}{16}\right)$$

Since the above theorem follows directly from Theorem 4.4 and discussion in Section 5 from (Li, 1999), we skip its proof.
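
As a quick numerical illustration of the $\sin\Theta$ distance appearing in Theorem 3 (our own sketch, using the identity $\|\sin\Theta(U_1,\tilde{U}_1)\|_2 = \|(I - \tilde{U}_1\tilde{U}_1^{\top})U_1\|_2$): a small symmetric perturbation of a well-conditioned symmetric matrix moves its top-$r$ eigenspace only slightly.

```python
import numpy as np

def sin_theta(U1, U1_tilde):
    """Spectral norm of sin(Theta) between the subspaces spanned by the
    orthonormal columns of U1 and U1_tilde."""
    return np.linalg.norm(U1 - U1_tilde @ (U1_tilde.T @ U1), 2)

rng = np.random.default_rng(4)
n, r = 30, 3
G = rng.standard_normal((n, n))
H = G @ G.T + n * np.eye(n)                  # symmetric positive definite
E = rng.standard_normal((n, n))
E = 1e-3 * (E + E.T)                         # small symmetric perturbation
U1 = np.linalg.eigh(H)[1][:, -r:]            # top-r eigenvectors of H
U1_tilde = np.linalg.eigh(H + E)[1][:, -r:]  # top-r eigenvectors of H + E
dist = sin_theta(U1, U1_tilde)
```

The distance `dist` is on the order of $\|E\|_2$ divided by the eigenvalue gap, in line with the bound of Theorem 3.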

## 4 Recovering a Low Rank Matrix

In this section, we discuss the recovery result when the rank of is no more than . We will first provide the key results for our analysis, and then present detailed proof for the key theorems.

### 4.1 Main Result

Our analysis is divided into two steps. We first show that $\Delta := \|M - P_{\hat{U}}MP_{\hat{V}}\|_2^2$ is small, and then bound the strong convexity of the objective function in (2). The following theorem shows that the difference between $M$ and $\hat{M}$, measured in spectral norm, is well bounded if $\Delta$ is small and the objective function in (2) is strongly convex.

###### Theorem 4

Assume (i) $\|M - P_{\hat{U}}MP_{\hat{V}}\|_2^2 \le \Delta$, and (ii) the strong convexity of the objective function in (2) is no less than $\gamma$. Then

$$\|M - \hat{M}\|_2^2 \le 2\left(\Delta + \frac{\Delta}{\gamma}\right)$$

To utilize Theorem 4, we need to bound $\Delta$ and $\gamma$, respectively, which is done in the following two theorems.

###### Theorem 5

With a probability at least $1 - 2e^{-t}$, we have

$$\Delta := \|M - P_{\hat{U}}MP_{\hat{V}}\|_2^2 \le 4\sigma_{r+1}^2\left(1 + \frac{m+n}{d}\right)$$

if $d \ge 7\mu(r)r(t + \log r)$.

Our analysis is based on the following theorem.

###### Theorem 6

With a probability at least $1 - 2e^{-t}$, we have

$$\|M - MP_{\hat{V}}\|_2^2 \le \sigma_{r+1}^2\left(1 + \frac{2m}{d}\right), \qquad \|M - P_{\hat{U}}M\|_2^2 \le \sigma_{r+1}^2\left(1 + \frac{2n}{d}\right)$$

provided that $d \ge 7\mu(r)r(t + \log r)$.

Using Theorem 6, we have, if $d \ge 7\mu(r)r(t+\log r)$, with a probability at least $1 - 2e^{-t}$,

$$\|M - P_{\hat{U}}MP_{\hat{V}}\|_2^2 \le 2\|M - MP_{\hat{V}}\|_2^2 + 2\|(M - P_{\hat{U}}M)P_{\hat{V}}\|_2^2 \le 4\sigma_{r+1}^2\left(1 + \frac{n+m}{d}\right)$$
###### Theorem 7

With a probability at least $1 - e^{-t}$, the strong convexity $\gamma$ of the objective function in (2) is bounded from below by $1/2$, provided that

$$|\Omega| \ge 7\hat{\mu}^2 r^2(t + 2\log r)$$

The following lemma allows us to replace $\hat{\mu}$ in Theorem 7 with $\mu(r)$.

###### Theorem 8

Assume $\mathrm{rank}(M) \le r$. Then, with a probability at least $1 - 2e^{-t}$, we have $\hat{\mu} = \mu(r)$, provided $d \ge 7\mu(r)r(t + \log r)$.

When $\mathrm{rank}(M) \le r$, we have $\sigma_{r+1} = 0$ and therefore, according to Theorem 5, with a probability at least $1 - 2e^{-t}$, $\|M - P_{\hat{U}}MP_{\hat{V}}\|_2^2 = 0$, provided that $d \ge 7\mu(r)r(t + \log r)$. Hence $M = P_{\hat{U}}MP_{\hat{V}}$, and the subspaces spanned by $\hat{U}$ and $\hat{V}$ coincide with those spanned by $U_1$ and $V_1$, which directly implies that $\hat{\mu} = \mu(r)$.

The following theorem follows directly from Theorems 5, 7, 8, and 4.

###### Theorem 9

Assume $\mathrm{rank}(M) \le r$, $d \ge 7\mu(r)r(t + \log r)$, and $|\Omega| \ge 7\mu^2(r)r^2(t + 2\log r)$. Then, with a probability at least $1 - 3e^{-t}$, we have $\hat{M} = M$.

##### Remark

The result from Theorem 9 shows that, with a high probability, a low rank matrix $M$ can be perfectly recovered from $O(rn\log n)$ observations of the matrix $M$. This result significantly improves the result from (Krishnamurthy and Singh, 2013), where $O(r^{3/2}n\log n)$ observations are needed for perfect recovery. We should note that unlike (Krishnamurthy and Singh, 2013), where a small incoherence measure is assumed only for the column vectors of matrix $M$, we assume a small incoherence measure for both the row and column vectors of $M$. It is this assumption that allows us to sample both rows and columns of $M$, leading to the improvement in the sample complexity.

### 4.2 Detailed Proofs

#### 4.2.1 Proof of Theorem 4

Set $Z = \hat{U}^{\top}M\hat{V}$. Since $\|M - P_{\hat{U}}MP_{\hat{V}}\|_2^2 \le \Delta$, we have

$$\|M - \hat{U}Z\hat{V}^{\top}\|_2^2 \le \Delta,$$

implying that

$$|M_{i,j} - [\hat{U}Z\hat{V}^{\top}]_{i,j}|^2 \le \Delta, \quad \forall i\in[n], j\in[m]$$

Hence, we have

$$\|R_{\Omega}(M) - R_{\Omega}(\hat{U}Z\hat{V}^{\top})\|_F^2 \le |\Omega|\,\Delta$$

Let $Z_*$ be the optimal solution to (2). Using the strong convexity of (2), we have

$$\frac{\gamma}{2}|\Omega|\,\|Z - Z_*\|_F^2 \le \frac{1}{2}|\Omega|\,\Delta,$$

i.e. $\|Z - Z_*\|_F^2 \le \Delta/\gamma$. We thus have

$$\|M - \hat{M}\|_2^2 \le 2\|M - P_{\hat{U}}MP_{\hat{V}}\|_2^2 + 2\|P_{\hat{U}}MP_{\hat{V}} - \hat{U}Z_*\hat{V}^{\top}\|_2^2 \le 2\|M - P_{\hat{U}}MP_{\hat{V}}\|_2^2 + 2\|Z - Z_*\|_2^2 \le 2\left(\Delta + \frac{\Delta}{\gamma}\right)$$

#### 4.2.2 Proof of Theorem 6

Let $A = (a_{i_1}, \ldots, a_{i_d})$ be the selected columns of $M$. Define $\Omega = (e_{i_1}, \ldots, e_{i_d})$, where $e_i$ is the $i$th canonical basis vector. To utilize Theorem 1, we need to bound the minimum eigenvalue of $\Omega_1\Omega_1^{\top}$. We have

$$\Omega_1\Omega_1^{\top} = V_1^{\top}\Omega\Omega^{\top}V_1$$

Let $\tilde{v}_i^{\top}$ be the $i$th row vector of $V_1$. We have

$$\Omega_1\Omega_1^{\top} = \sum_{j=1}^d \tilde{v}_{i_j}\tilde{v}_{i_j}^{\top}$$

It is straightforward to show that

$$\mathrm{E}\left[\Omega_1\Omega_1^{\top}\right] = \frac{d}{m}I_r$$

To bound the minimum eigenvalue of $\Omega_1\Omega_1^{\top}$, we use Theorem 2. To this end, we have

$$B = \max_{1\le i\le m}|\tilde{v}_i|^2 \le \frac{\mu(r)r}{m}$$

Thus, we have

$$\Pr\left\{\lambda_{\min}(\Omega_1\Omega_1^{\top}) \le (1-\delta)\frac{d}{m}\right\} \le r\cdot\exp\left(-\frac{d}{\mu(r)r}\left[\delta + (1-\delta)\ln(1-\delta)\right]\right)$$

By setting $\delta = 1/2$, we have, with a probability at least $1 - e^{-t}$ (provided $d \ge 7\mu(r)r(t + \log r)$),

$$\lambda_{\min}(\Omega_1\Omega_1^{\top}) \ge \frac{d}{2m}$$

Under the assumption that

$$\lambda_{\min}(\Omega_1\Omega_1^{\top}) \ge \frac{d}{2m},$$

using Theorem 1, we have

$$\|M - MP_{\hat{V}}\|_2^2 \le \sigma_{r+1}^2 + \left\|\Sigma_2\Omega_2\Omega_1^{\dagger}\right\|_2^2 \le \sigma_{r+1}^2 + \frac{2m}{d}\|\Sigma_2\Omega_2\|_2^2 \le \sigma_{r+1}^2\left(1 + \frac{2m}{d}\|\Omega_2\|_2^2\right)$$

We complete the proof using the fact that $\|\Omega_2\|_2 \le 1$.

#### 4.2.3 Proof of Theorem 7

We rewrite the objective function in (2) as

$$\sum_{(i,j)\in\Omega}\left(M_{i,j} - \hat{u}^{\prime\top}_i Z\, \hat{v}^{\prime}_j\right)^2$$

Define the matrix $K \in \mathbb{R}^{|\Omega|\times r^2}$, whose rows are $k(i,j) = \hat{u}^{\prime}_i \otimes \hat{v}^{\prime}_j$ for $(i,j) \in \Omega$. Our goal is to bound the minimum eigenvalue of $K^{\top}K$. To apply Theorem 2, we bound

$$B = \max_{i,j}|k(i,j)|^2 \le \frac{\hat{\mu}^2r^2}{mn}$$

and

$$\lambda_{\min}\left(\mathrm{E}[K^{\top}K]\right) = \frac{|\Omega|}{mn}\lambda_{\min}\left((\hat{U}^{\top}\hat{U}) \otimes (\hat{V}^{\top}\hat{V})\right) = \frac{|\Omega|}{mn}$$

where $\otimes$ is the Kronecker product. Thus, according to Theorem 2, with a probability at least $1 - e^{-t}$, we have

$$\lambda_{\min}(K^{\top}K) \ge \frac{|\Omega|}{2mn}$$

provided that

$$|\Omega| \ge 7\hat{\mu}^2r^2(t + 2\log r)$$
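
The concentration of $\lambda_{\min}(K^{\top}K)$ around $|\Omega|/(mn)$ can be checked empirically. A small sketch (ours, not the authors' code), with random orthonormal matrices standing in for $\hat{U}$ and $\hat{V}$:

```python
import numpy as np

rng = np.random.default_rng(2)
n, m, r = 50, 40, 3
# Random orthonormal U_hat, V_hat stand in for the top-r eigenvector matrices
U_hat = np.linalg.qr(rng.standard_normal((n, r)))[0]
V_hat = np.linalg.qr(rng.standard_normal((m, r)))[0]
num = 600  # |Omega|, sampled uniformly without replacement
flat = rng.choice(n * m, size=num, replace=False)
# Rows of K are Kronecker products u'_i (x) v'_j over the sampled index pairs
K = np.stack([np.kron(U_hat[f // m], V_hat[f % m]) for f in flat])
lam_min = np.linalg.eigvalsh(K.T @ K).min()
ratio = lam_min / (num / (m * n))  # E[K^T K] = (|Omega|/(mn)) I
```

With this many samples, `lam_min` stays well above the $|\Omega|/(2mn)$ threshold predicted by the matrix Chernoff bound, so the least squares problem (2) is comfortably strongly convex.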

## 5 Recovering the Low Rank Approximation of a Full Rank Matrix

In this section, we consider the general case when $M$ is of full rank but has a skewed eigenvalue distribution. To capture the skewed eigenvalue distribution, we use the concept of numerical rank with respect to a non-negative constant $\lambda$, which is defined as follows (Hansen, 1987)

$$r(M, \lambda) = \sum_{i=1}^m \frac{\sigma_i^2}{\sigma_i^2 + mn\lambda}$$

Define

$$H_A = \lambda I + \frac{1}{mn}MM^{\top}, \qquad \hat{H}_A = \lambda I + \frac{1}{dn}AA^{\top}$$

and

$$H_B = \lambda I + \frac{1}{mn}M^{\top}M, \qquad \hat{H}_B = \lambda I + \frac{1}{dm}BB^{\top}$$

Next, we generalize the definition of the incoherence measure to numerical low rank. Define $S = \Sigma^2 + mn\lambda I$, where $\Sigma = \mathrm{diag}(\sigma_1, \ldots, \sigma_m)$, and define the incoherence measure with respect to a non-negative constant $\lambda$ as

$$\mu(\lambda) = \max\left(\max_{1\le i\le n} \frac{n}{r(M,\lambda)}\left|U_{i,*}\Sigma S^{-1/2}\right|^2,\ \max_{1\le i\le m} \frac{m}{r(M,\lambda)}\left|V_{i,*}\Sigma S^{-1/2}\right|^2\right)$$

It is easy to verify that $\mu(\lambda) \ge 1$. Note that when the rank of matrix $M$ is $r$, we have $\lim_{\lambda\to 0} r(M,\lambda) = r$ and $\lim_{\lambda\to 0} \mu(\lambda) = \mu(r)$.
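
The numerical rank is a one-liner given the singular values. A small sketch (ours) checking the limiting behaviour stated above: for an exactly rank-$r$ matrix, $r(M,\lambda) \to r$ as $\lambda \to 0$, and $r(M,\lambda)$ decreases as $\lambda$ grows:

```python
import numpy as np

def numerical_rank(M, lam):
    """r(M, lambda) = sum_i sigma_i^2 / (sigma_i^2 + m*n*lambda)  (Hansen, 1987)."""
    n, m = M.shape
    s = np.linalg.svd(M, compute_uv=False)
    return float((s ** 2 / (s ** 2 + m * n * lam)).sum())

rng = np.random.default_rng(3)
n, m, r = 30, 20, 4
M = rng.standard_normal((n, r)) @ rng.standard_normal((r, m))  # exactly rank r
```

Here `numerical_rank(M, 1e-12)` is essentially $4$, while larger $\lambda$ discounts the small singular values and yields a strictly smaller value.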

In order to utilize the theorems presented in Section 4 to bound $\|M - \hat{M}\|_2^2$, the key is to bound $\mu(r)$ and $\hat{\mu}$ by $\mu(\lambda)$. The following lemma allows us to bound $\mu(r)$ by $\mu(\lambda)$. Assume

$$\frac{\sigma_r^2}{\sigma_r^2 + mn\lambda} \ge a$$

for some positive constant $a$. We have $\mu(r) \le \frac{1}{a}\frac{r(M,\lambda)}{r}\mu(\lambda)$. More specifically, if we choose $\lambda \le \sigma_r^2/(mn)$, we have $a = 1/2$ and hence

$$\mu(r) \le \frac{2r(M,\lambda)}{r}\mu(\lambda)$$

Using the above lemma, we have the following modified version of Theorem 5.

###### Theorem 10

Set $\lambda \le \sigma_r^2/(mn)$ for a fixed $r$. With a probability at least $1 - 2e^{-t}$, we have

$$\Delta := \|M - P_{\hat{U}}MP_{\hat{V}}\|_2^2 \le 4\sigma_{r+1}^2\left(1 + \frac{m+n}{d}\right)$$

if $d \ge 14\mu(\lambda)r(M,\lambda)(t + \log r)$.

We note that Theorem 10 is almost identical to Theorem 5, except that $\mu(r)r$ is replaced with $\mu(\lambda)r(M,\lambda)$.

Next, we bound $\hat{\mu}$ by $\mu(\lambda)$. To this end, we need the following theorem.

###### Theorem 11

With a probability at least $1 - 2e^{-t}$, for any $1 \le k \le n$, we have

$$1 - \delta \le \lambda_k\left(H_A^{-1/2}\hat{H}_AH_A^{-1/2}\right) \le 1 + \delta, \qquad 1 - \delta \le \lambda_k\left(H_B^{-1/2}\hat{H}_BH_B^{-1/2}\right) \le 1 + \delta$$

if

$$d \ge \frac{4}{\delta^2}\left(\mu(\lambda)r(M,\lambda) + 1\right)(t + \log n)$$
###### Theorem 12

Assume that $mn\lambda \le \sigma_r^2$ and that $d$ is large enough to ensure $\delta \le 1/2$ below. With a probability at least $1 - 2e^{-t}$, we have

$$\hat{\mu} \le \frac{2r(M,\lambda)}{r}\mu(\lambda) + \frac{18n\delta^2}{r}$$

where

$$\delta^2 = \frac{4}{d}\left(\mu(\lambda)r(M,\lambda) + 1\right)(t + \log n)$$

Using Theorem 12, we have the following version of Theorem 7

###### Theorem 13

Assume $mn\lambda \le \sigma_r^2$ and $d \ge 16(\mu(\lambda)r(M,\lambda) + 1)(t + \log n)$. With a probability at least $1 - 3e^{-t}$, the strong convexity $\gamma$ of the objective function in (2) is bounded from below by $1/2$, provided that

$$|\Omega| \ge 7\left(2\mu(\lambda)r(M,\lambda) + \frac{72n}{d}\left(\mu(\lambda)r(M,\lambda) + 1\right)(t + \log n)\right)^2(t + 2\log r)$$

Combining the above results, we have the final theorem for recovering $M$ when its numerical rank is small.

###### Theorem 14

Assume $mn\lambda \le \sigma_r^2$ and $d \ge 16(\mu(\lambda)r(M,\lambda) + 1)(t + \log n)$. Then, with a high probability, we have

$$\|M - \hat{M}\|_2^2 \le 24\sigma_{r+1}^2\left(1 + \frac{m+n}{d}\right)$$

if

$$|\Omega| \ge 7\left(2\mu(\lambda)r(M,\lambda) + \frac{72n}{d}\left(\mu(\lambda)r(M,\lambda) + 1\right)(t + \log n)\right)^2(t + 2\log r)$$
##### Remark

The total number of observed entries is $d(n+m) + |\Omega|$, and the resulting recovery error is $O(\sigma_{r+1}^2(1 + (m+n)/d))$; the choice of $d$ trades the number of observations against the recovery error.

### 5.1 Detailed Proof

#### 5.1.1 Proof of Theorem 11

It is sufficient to show the result for $H_A$. Define

$$\mathcal{X} = \left\{M_i = H_A^{-1/2}\left(\frac{1}{n}a_ia_i^{\top} + \lambda I\right)H_A^{-1/2},\ i = 1, \ldots, m\right\}$$

where $a_i$ is the $i$th column of $M$. We have

$$M_i = US^{-1/2}U^{\top}\left(m\,U\Sigma V_{i,*}^{\top}V_{i,*}\Sigma U^{\top} + mn\lambda I\right)US^{-1/2}U^{\top} = U\left(m\,S^{-1/2}\Sigma V_{i,*}^{\top}V_{i,*}\Sigma S^{-1/2} + mn\lambda S^{-1}\right)U^{\top}$$

Using the definition of $\mu(\lambda)$, we have $\lambda_{\max}(M_i) \le \mu(\lambda)r(M,\lambda) + 1 =: B$. Since

$$\mu_{\max} = d\,\lambda_{\max}\left(\mathrm{E}[M_i]\right) = d$$

we have

$$\Pr\left\{\lambda_{\max}\left(H_A^{-1/2}\hat{H}_AH_A^{-1/2}\right) \ge 1 + \delta\right\} \le n\exp\left(-\frac{d}{\mu(\lambda)r(M,\lambda) + 1}\left[(1+\delta)\log(1+\delta) - \delta\right]\right)$$

Using the fact that

$$(1+\delta)\log(1+\delta) \ge \delta + \frac{1}{4}\delta^2, \quad \forall \delta\in[0,1],$$

we have

$$\Pr\left\{\lambda_{\max}\left(H_A^{-1/2}\hat{H}_AH_A^{-1/2}\right) \ge 1 + \delta\right\} \le n\exp\left(-\frac{d\delta^2}{4\left(\mu(\lambda)r(M,\lambda) + 1\right)}\right)$$

The upper bound is obtained by setting the right hand side to $e^{-t}$, i.e. $d \ge \frac{4}{\delta^2}(\mu(\lambda)r(M,\lambda)+1)(t+\log n)$. Similarly, for the lower bound, we have

$$\Pr\left\{\lambda_{\min}\left(H_A^{-1/2}\hat{H}_AH_A^{-1/2}\right) \le 1 - \delta\right\} \le n\exp\left(-\frac{d}{\mu(\lambda)r(M,\lambda) + 1}\left[(1-\delta)\log(1-\delta) + \delta\right]\right)$$

Using the fact that

$$(1-\delta)\log(1-\delta) \ge -\delta + \frac{\delta^2}{2},$$

we obtain the lower bound by setting the right hand side to $e^{-t}$.

#### 5.1.2 Proof of Theorem 12

To utilize Theorem 3, we rewrite $\hat{H}_A$, defined in Theorem 11, as

$$\hat{H}_A = H_A^{1/2}DH_A^{1/2}$$

where $D = H_A^{-1/2}\hat{H}_AH_A^{-1/2}$. According to Theorem 11, with a probability at least $1 - 2e^{-t}$, we have $\|D - I\|_2 \le \delta$, provided that

$$d \ge \frac{4}{\delta^2}\left(\mu(\lambda)r(M,\lambda) + 1\right)(t + \log n)$$

We then compute $\Delta_{\lambda}$ and $\Delta_H$ defined in Theorem 3. Using the fact that $\delta \le 1/2$ and Theorem 11, we have, with the same probability,

$$\Delta_H \le \frac{\|D - I\|_2}{\sqrt{1 - \|D - I\|_2}} = \frac{\delta}{\sqrt{1 - \delta}} \le \sqrt{2}\,\delta$$

Using the assumption that $mn\lambda \le \sigma_r^2$, we have $\Delta_{\lambda} \ge 1/2$ and therefore $\Delta_{\lambda} - \Delta_H/2 \ge (1 - \sqrt{2}\delta)/2$. As a result, according to Theorem 3, we have

$$\|\sin\Theta(U_1, \hat{U})\|_2 \le 3\sqrt{2}\,\delta$$

Similarly, we have

$$\|\sin\Theta(V_1, \hat{V})\|_2 \le 3\sqrt{2}\,\delta$$

Thus, with a probability at least $1 - 2e^{-t}$, we have

$$\hat{\mu} \le \mu(\lambda) + \frac{n}{r}\|\sin\Theta(V_1, \hat{V})\|_2^2 \le \mu(\lambda) + \frac{18n\delta^2}{r}$$

## References

• Candès and Recht (2012) E. Candès and B. Recht. Exact matrix completion via convex optimization. Commun. ACM, 55(6):111–119, 2012.
• Candès and Tao (2010) E. Candès and T. Tao. The power of convex relaxation: near-optimal matrix completion. IEEE Transactions on Information Theory, 56(5):2053–2080, 2010.
• Drineas et al. (2006) P. Drineas, R. Kannan, and M.W. Mahoney. Fast Monte Carlo algorithms for matrices III: Computing a compressed approximate matrix decomposition. SIAM J Comput, 36:184–206, 2006.
• Halko et al. (2011) N. Halko, P. G. Martinsson, and J. A. Tropp. Finding structure with randomness: Probabilistic algorithms for constructing approximate matrix decompositions. SIAM Rev., 53(2):217–288, 2011.
• Hansen (1987) P. C. Hansen. Rank-deficient and discrete ill-posed problems: Numerical aspects of linear inversion. Society for Industrial and Applied Mathematics, 1987.
• Krishnamurthy and Singh (2013) Akshay Krishnamurthy and Aarti Singh. Low-rank matrix and tensor completion via adaptive sampling. In Advances in Neural Information Processing Systems (NIPS), 2013.
• Li (1999) R.-C. Li. Relative perturbation theory: (II) eigenspace and singular subspace variations. SIAM J. Matrix Anal. Appl., 20:471–492, 1999.
• Mahoney and Drineas (2008) M. W. Mahoney and P. Drineas. Relative-error CUR matrix decompositions. SIAM J Matrix Anal Appl, 30:844–881, 2008.
• Mahoney and Drineas (2009) M. W. Mahoney and P. Drineas. CUR matrix decompositions for improved data analysis. Proc. Natl. Acad. Sci. USA, 106:697–702, 2009.
• Recht (2011) B. Recht. A simpler approach to matrix completion. JMLR, 12:3413–3430, 2011.
• Tropp (2011) J. Tropp. Improved analysis of the subsampled randomized Hadamard transform. Adv. Adapt. Data Anal., 3:115–126, 2011.