# An Iterative Method for Structured Matrix Completion

The task of filling-in or predicting missing entries of a matrix, from a subset of known entries, is known as matrix completion. In today's data-driven world, data completion is essential whether it is the main goal or a pre-processing step. In recent work, a modification to the standard nuclear norm minimization for matrix completion has been made to take into account structural differences between observed and unobserved entries. One example of such structural difference is when the probability that an entry is observed or not depends mainly on the value of the entry. We propose adjusting an Iteratively Reweighted Least Squares (IRLS) algorithm for low-rank matrix completion to take into account sparsity-based structure in the missing entries. We also present an iterative gradient-projection-based implementation of the algorithm, and present numerical experiments showing that the proposed method often outperforms the IRLS algorithm in structured settings.

## 1. Introduction

Matrix completion is the task of filling-in, or predicting, the missing entries of a partially observed matrix from a subset of known entries. In today’s data-driven world, data completion is essential, whether it is the main goal as in recommender systems, or a pre-processing step for other tasks like regression or classification. The Netflix Problem is a popular example of a data completion task [2, 3, 26]. The Netflix Problem was an open competition for the best collaborative filtering algorithm to predict unseen user ratings for movies. That is, given a subset of user-movie ratings, the goal is to predict the remaining ratings. This can be used to decide whether a certain movie should be recommended to a user. The Netflix Problem can be viewed as a matrix completion problem where the rows represent users, the columns represent movies, and the entries of the matrix are the corresponding user-movie ratings, most of which are missing.

Matrix completion problems are generally ill-posed without some additional information, since the missing entries could be assigned arbitrary values. In many instances, the matrix we wish to recover is known to be low-dimensional in the sense that it is low-rank, or approximately low-rank. For instance, a data matrix of all user-ratings of movies may be approximately low-rank because it is commonly believed that only a few factors contribute to an individual’s tastes or preferences [9]. Low-rank matrix completion is a special case of the affine rank minimization problem. Indeed, the problem of minimizing the rank of a matrix subject to affine constraints arises often in machine learning, and is known to be NP-hard [17, 40].

In recent work [36], Molitor and Needell propose a modification to the standard nuclear norm minimization for matrix completion to take into account structural differences between observed and unobserved entries. Previous strategies typically assume that there are no structural differences between observed and missing entries, which is an unrealistic assumption in many settings. General notions of structural difference include any setting in which entries are not observed uniformly at random. For example, whether an entry is observed or missing could be biased not only by the value of that entry, but also by the location of that entry in the matrix. For instance, certain rows (or columns) may have substantially more observed entries than a typical row (or column); this happens in the Netflix Problem for very popular movies or so-called “super-users”. Another important example of such structural differences is when the unobserved entries are sparse or have lower magnitudes than the observed entries [36]. In other words, whether an entry is missing need not be independent of the value of that entry. In our work, we focus on this notion of structure, in which the probability that an entry is observed depends mainly on the value of the entry. In particular, we are interested in sparsity-based structure in the missing entries, whereby most of the missing values are close in norm to 0 (or more generally to a fixed value).

This is motivated by many situations in which the missing values tend to be near a certain value. For instance, missing data in chemical measurements might indicate that the measurement value is lower than the limit of detection of the device, and thus a typical missing measurement is smaller in value than a typical observed measurement. Similarly, in medical survey data, patients are more likely to respond to questions addressing noticeable symptoms, whereas a missing response may indicate a lack of symptoms [36]. Likewise, a missing rating of a movie might indicate the user’s lack of interest in that movie, thus suggesting a lower rating than otherwise expected. More generally, in survey data, incomplete data may be irrelevant or unimportant to the individual, thus suggesting structure in the missing observations [36]. As another example, in the setting of sensor networks, suppose we are given partial information about the distances between sensors, where distance estimates are based on signal strength readings [9], and we would like to impute the missing signal strength readings. In this case, signals may be missing because of very low signal strength, indicating that the sensors are perhaps at a great distance from each other (or that there are geographic obstacles between them). Thus, we obtain a partially observed distance matrix with structured observations: missing entries tend to have lower signal strength. Sensor networks give a low-rank matrix completion problem, perhaps of rank equal to two if the sensors are located in a plane, or three if they are located in three-dimensional space [30, 43].

Matrix completion algorithms are generally hard to evaluate as there is usually no ground truth data available on which to test the performance of the algorithm. One method is to consider a subset of the matrix entries to be unobserved, and test how well they are predicted by the algorithm. Another method is testing the algorithm on classification tasks. That is, if we are working with labeled matrices with missing entries, we can test the performance of a matrix completion algorithm based on the classification rates of how well we predict a matrix’s label after imputing its missing entries.

### 1.1. Affine Rank Minimization Problem

The Affine Rank Minimization Problem (ARMP), or the problem of finding the minimum rank matrix in an affine set, is expressed as

$$\underset{X}{\text{minimize}} \quad \mathrm{rank}(X) \quad \text{subject to} \quad \mathcal{A}(X) = b, \tag{1}$$

where $X \in \mathbb{R}^{m \times n}$ is the optimization variable, $\mathcal{A}$ is a linear map, and $b$ denotes the measurements. The affine rank minimization problem arises frequently in applications like system identification and control [31], collaborative filtering, low-dimensional Euclidean embedding [18], sensor networks [4, 41, 42], quantum state tomography [21, 22], signal processing, and image processing.

Many algorithms have been proposed for ARMP, e.g. reweighted nuclear norm minimization [34], Singular Value Thresholding (SVT) [5], Fixed Point Continuation Algorithm (FPCA) [20], Iterative Hard Thresholding (IHT) [20], Optspace [25], Singular Value Projection (SVP) [24], Atomic Decomposition for Minimum Rank Approximation (AdMiRA) [28], and the accelerated proximal gradient algorithm for nuclear norm regularized linear least squares problems (NNLS) [44].

The low-rank matrix completion problem can be formulated as follows [10, 39]. Suppose we are given some matrix $M \in \mathbb{R}^{m \times n}$ with a set $\Omega$ of partially observed entries, with $\Omega \subseteq \{1, \dots, m\} \times \{1, \dots, n\}$. The goal is to recover the missing elements in $M$. The low-rank matrix completion problem is a special case of the affine rank minimization problem where the set of affine constraints restricts certain entries of the matrix to equal observed values. In this case, the linear operator $\mathcal{A}$ is a sampling operator, and the problem can be written as

$$\underset{X}{\text{minimize}} \quad \mathrm{rank}(X) \quad \text{subject to} \quad X_{ij} = M_{ij}, \quad (i,j) \in \Omega,$$

where $M$ is the matrix we would like to recover, and where $\Omega$ denotes the set of entries which are revealed. We define the sampling operator $P_\Omega$ via

$$(P_\Omega(X))_{ij} = \begin{cases} X_{ij} & (i,j) \in \Omega, \\ 0 & (i,j) \notin \Omega, \end{cases}$$

as in [10]. Further, $\Omega^c$ denotes the complement of $\Omega$, i.e., all index pairs $(i,j)$ that are not in $\Omega$. Thus, $\Omega^c$ corresponds to the collection of missing entries. Generally, a low-rank matrix completion problem gets harder when fewer entries are observed, or when the matrix is not approximately low-rank.
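The sampling operator can be illustrated with a boolean mask. The following is a minimal NumPy sketch; the function name `sample_operator`, the toy matrix, and the observation pattern are illustrative assumptions, not part of the paper's code.

```python
import numpy as np

def sample_operator(X, mask):
    """P_Omega(X): keep entries where mask is True and zero out the rest."""
    return np.where(mask, X, 0.0)

# Toy data; the mask is a hypothetical observation pattern (True = index pair in Omega).
M = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])
mask = np.array([[True, False, True],
                 [False, True, False]])

observed = sample_operator(M, mask)    # P_Omega(M)
missing = sample_operator(M, ~mask)    # P_{Omega^c}(M)
```

Since the two index sets are complementary, `observed + missing` reproduces `M`.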

The rank minimization problem (1) is NP-hard in general, and therefore we consider its convex relaxation [8, 9, 10, 17, 40],

$$\underset{X}{\text{minimize}} \quad \|X\|_* \quad \text{subject to} \quad \mathcal{A}(X) = b, \tag{2}$$

where $\|X\|_*$ denotes the nuclear norm, given by the sum of the singular values of $X$. The rank of a matrix $X$ equals the number of nonzero singular values of $X$, also known as the $\ell_0$ norm of the vector $\sigma(X)$ of singular values of $X$, and the nuclear norm of $X$ is the $\ell_1$ norm of $\sigma(X)$. Indeed, if $X$ is a diagonal matrix, then (2) becomes the compressed sensing problem.
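As a small numerical illustration of this correspondence, the sketch below computes the rank and the nuclear norm of a matrix from its vector of singular values. The random test matrix and the numerical tolerance `tol` are assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
# A 6 x 5 matrix that has rank 2 by construction.
X = rng.standard_normal((6, 2)) @ rng.standard_normal((2, 5))

sigma = np.linalg.svd(X, compute_uv=False)   # vector of singular values
tol = 1e-10 * sigma[0]                       # numerical threshold for "nonzero"

rank_from_sigma = int(np.sum(sigma > tol))   # "l0 norm" of sigma
nuclear_norm = float(np.sum(sigma))          # l1 norm of sigma
```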

In [36], the authors propose adjusting the standard nuclear norm minimization problem to take into account the structural differences between observed and unobserved entries by regularizing the values of the unobserved entries (see Section 3.1). The method results in a semidefinite program; however, standard semidefinite programming tools do not work efficiently for solving large nuclear norm minimization problems.

Inspired by the iteratively reweighted least squares (IRLS) algorithm for sparse vector recovery analyzed in [15], iteratively reweighted least squares algorithms [19, 27, 35] have been proposed as a computationally efficient method for low-rank matrix recovery (see Section 2.3), although, to our knowledge, not yet in the structured setting. Instead of minimizing the nuclear norm, the algorithms essentially minimize the Frobenius norm of the matrix, subject to affine constraints. Properly reweighting this norm produces low-rank solutions under suitable assumptions. In [35], Mohan and Fazel propose a family of Iterative Reweighted Least Squares algorithms for matrix rank minimization, called IRLS-$p$ (for $0 \le p \le 1$), as a computationally efficient way to improve over the performance of nuclear norm minimization. In addition, a gradient projection algorithm is presented as an efficient implementation of the algorithm. The algorithm exhibits improved recovery when compared to existing algorithms.

### 1.2. Contribution

We propose an iterative algorithm for low-rank matrix completion that takes into account structural differences between observed and unobserved entries by adjusting the Iteratively Reweighted Least Squares (IRLS) algorithm studied in [35]. We refer to our algorithm as Structured IRLS. We consider sparsity-based structure in the missing entries, whereby most of the missing values are close in norm to 0. Our objective function can be adjusted by applying a linear shift on the missing values to handle other fixed nonzero values. Further, we present a gradient-projection-based implementation, called Structured sIRLS (motivated by sIRLS in [35]). Lastly, we show that Structured sIRLS often outperforms the sIRLS algorithm in structured settings, as showcased in our numerical experiments. Our code for Structured sIRLS is available at [1].

### 1.3. Organization

We review related iteratively reweighted least squares algorithms for recovering sparse vectors and low-rank matrices in Section 2. In Section 3, we describe the structured matrix completion problem and propose for this problem an iterative algorithm, Structured IRLS. Furthermore, we present a computationally efficient implementation, Structured sIRLS. In Section 4, we run numerical experiments to showcase the performance of this method, and compare it to the performance of sIRLS (studied in [35]) on various structured settings.

## 2. Iteratively Reweighted Least Squares Algorithms

In this section, we set notation for the rest of the paper, and review related algorithms for recovering sparse vectors and low-rank matrices.

### 2.1. Notation

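The entries of a matrix $X \in \mathbb{R}^{m \times n}$ are denoted by lowercase letters, i.e. $x_{ij}$ is the entry in row $i$ and column $j$ of $X$. Let $I$ denote the identity matrix and $\mathbf{1}$ the vector of all ones. The trace of a square matrix $X$ is the sum of its diagonal entries, and is denoted by $\mathrm{Tr}(X)$. We denote the adjoint matrix of $X$ by $X^*$. Without loss of generality, we assume $m \le n$ and we write the singular value decomposition of $X$ as

$$X = U \Sigma V^*.$$

Here $U \in \mathbb{R}^{m \times m}$ and $V \in \mathbb{R}^{n \times n}$ are unitary matrices, and $\Sigma \in \mathbb{R}^{m \times n}$ is a diagonal matrix with diagonal entries $\sigma_1 \ge \dots \ge \sigma_m \ge 0$, where $\sigma_i$ are the singular values. The rank of $X$, denoted by $\mathrm{rank}(X)$, equals the number of nonzero singular values of $X$. Further, the Frobenius norm of the matrix $X$ is defined by

$$\|X\|_F = \sqrt{\mathrm{Tr}(XX^*)} = \Big(\sum_{i,j} x_{ij}^2\Big)^{1/2}.$$

The nuclear norm of a matrix $X$ is defined by $\|X\|_* = \sum_{i} \sigma_i$. Given a vector $w \in \mathbb{R}^n$ of positive weights, we define the weighted $\ell_2$ norm of a vector $z \in \mathbb{R}^n$ as

$$\|z\|_{\ell_2(w)} = \Big(\sum_{i=1}^{n} w_i z_i^2\Big)^{1/2}.$$

Let $z(X)$ denote the vector of missing entries of a matrix $X$, and let $z^2(X)$ denote the corresponding vector with entries squared, i.e. $z^2(X) = z(X) \odot z(X)$, where $\odot$ denotes elementwise multiplication.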

### 2.2. Sparse Vector Recovery

Given a vector $x \in \mathbb{R}^n$, the value $\|x\|_0$ denotes the number of nonzero entries of $x$, and is known as the $\ell_0$ norm of $x$. The sparse vector recovery problem is described as

$$\text{minimize} \quad \|x\|_0 \quad \text{subject to} \quad Ax = b, \tag{3}$$

where $A \in \mathbb{R}^{m \times n}$ and $b \in \mathbb{R}^m$. This problem is known to be NP-hard. A commonly used convex heuristic for this problem is $\ell_1$ minimization [6, 16],

$$\text{minimize} \quad \|x\|_1 \quad \text{subject to} \quad Ax = b. \tag{4}$$

Indeed, many algorithms for solving (3) and (4) have been proposed. In [15], Daubechies et al. propose and analyze a simple and computationally efficient reweighted algorithm for sparse recovery, called the Iterative Reweighted Least Squares algorithm, IRLS-$p$, for any $0 < p \le 1$. Its $k$-th iteration is given by

$$x^{k+1} = \arg\min_x \Big\{ \sum_{i=1}^{n} w_i^k x_i^2 \; : \; Ax = b \Big\},$$

where $w^k \in \mathbb{R}^n$ is a weight vector with $w_i^k = (|x_i^k|^2 + \epsilon_k)^{\frac{p}{2} - 1}$, and where $\epsilon_k > 0$ is a regularization parameter added to ensure that $w^k$ is well-defined. Typically, the weights are initialized to one, so the first iteration gives the least norm solution to $Ax = b$. For $p = 1$, [15] gives a theoretical guarantee for sparse recovery similar to that of $\ell_1$ norm minimization. It was empirically observed in [13, 14] that IRLS-$p$ shows a better recovery performance than $\ell_1$ minimization for $p < 1$; see also [12]. In [15], the authors provide proofs of several findings listed in these works. Further, the reader is referred to [32, 37, 38, 45] for related work on iterative norm minimization for sparse recovery.
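Each IRLS iteration above is an equality-constrained weighted least squares problem with the closed-form solution $x = D A^\top (A D A^\top)^{-1} b$, where $D = \mathrm{diag}(w)^{-1}$. The following NumPy sketch uses this formula. The halving schedule for $\epsilon_k$, the floor `eps_min`, and the test problem are assumptions made for illustration; they are not the update rule analyzed in [15].

```python
import numpy as np

def irls_sparse(A, b, p=1.0, n_iter=60, eps0=1.0, eps_min=1e-10):
    """Sketch of IRLS-p for: minimize sum_i w_i x_i^2 subject to A x = b."""
    w = np.ones(A.shape[1])      # unit weights: the first iterate is the least-norm solution
    eps = eps0
    for _ in range(n_iter):
        d = 1.0 / w              # diagonal of D = diag(w)^{-1}
        AD = A * d               # equals A @ diag(d) by broadcasting over columns
        # Closed form of the weighted least squares step: x = D A^T (A D A^T)^{-1} b
        x = d * (A.T @ np.linalg.solve(AD @ A.T, b))
        eps = max(0.5 * eps, eps_min)        # hypothetical regularizer schedule
        w = (x**2 + eps) ** (p / 2 - 1)      # reweighting
    return x

rng = np.random.default_rng(0)
A = rng.standard_normal((20, 40))
x_true = np.zeros(40)
x_true[[3, 17, 29]] = [1.5, -2.0, 0.7]       # a 3-sparse vector
b = A @ x_true
x_hat = irls_sparse(A, b)
```

Every iterate satisfies the constraint $Ax = b$ by construction; how close `x_hat` gets to `x_true` depends on the problem instance and on the regularizer schedule.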

### 2.3. Low-rank Matrix Recovery

We review two related algorithms [19, 35] for low-rank matrix recovery that generalize the iteratively reweighted least squares algorithm analyzed in [15] for sparse recovery. In general, minimizing the Frobenius norm subject to affine constraints does not lead to low-rank solutions; however, properly reweighting this norm produces low-rank solutions under suitable assumptions [19, 35].

In [19], Fornasier et al. propose a variant of the reweighted least squares algorithm for sparse recovery, adapted to nuclear norm minimization (or low-rank matrix recovery) and called IRLS-M. The $k$-th iteration of IRLS-M is given by

$$X^{k+1} = \arg\min_X \big\{ \|(W^k)^{1/2} X\|_F^2 \; : \; P_\Omega(X) = P_\Omega(M) \big\}. \tag{5}$$

Here $W^k$ is a weight matrix with $W^0 = I$; for $k \ge 1$, $W^k$ is computed from the singular value decomposition of $X^k$ together with a regularization parameter $\epsilon_k$ (see [19] for the explicit update). Indeed, each iteration of (5) minimizes a weighted Frobenius norm of the matrix $X$. Under the assumption that the linear measurements fulfill a suitable generalization of the Null Space Property, the algorithm is guaranteed to iteratively recover any matrix with an error on the order of the best rank-$r$ approximation [19]. The algorithm essentially has the same recovery guarantees as nuclear norm minimization. Though the Null Space Property fails in the matrix completion setup, the authors present numerical experiments which show that the IRLS-M algorithm still works very well in this setting for recovering low-rank matrices. Further, for the matrix completion problem, the algorithm takes advantage of the Woodbury matrix identity, allowing an expedited solution to the least squares problem required at each iteration [19].

In [35], Mohan and Fazel propose a related family of Iterative Reweighted Least Squares algorithms for matrix rank minimization, called IRLS-$p$ (for $0 \le p \le 1$), as a computationally efficient way to improve over the performance of nuclear norm minimization. The $k$-th iteration of IRLS-$p$ is given by

$$X^{k+1} = \arg\min_X \big\{ \mathrm{Tr}(W_p^k X^\top X) \; : \; P_\Omega(X) = P_\Omega(M) \big\}, \tag{6}$$

where $W_p^k$ is a weight matrix defined as $W_p^0 = I$, and for $k \ge 1$, $W_p^k = ((X^k)^\top X^k + \gamma_k I)^{\frac{p}{2} - 1}$. Here $\gamma_k > 0$ is a regularization parameter added to ensure that $W_p^k$ is well-defined. Each iteration of (6) minimizes a weighted Frobenius norm of the matrix $X$, since

$$\mathrm{Tr}(W_p^{k-1} X^\top X) = \|(W_p^{k-1})^{1/2} X\|_F^2.$$

The algorithms can be viewed as (locally) minimizing certain smooth approximations to the rank function. When $p = 1$, theoretical guarantees are given similar to those for nuclear norm minimization, i.e., recovery of low-rank matrices under the assumption that the operator defining the constraints satisfies a specific Null Space Property (NSP). Further, for $p < 1$, IRLS-$p$ shows better empirical performance in terms of recovering low-rank matrices than nuclear norm minimization. In addition, a gradient projection algorithm, IRLS-GP, is presented as an efficient implementation of IRLS-$p$. Further, this same paper presents a related family of algorithms sIRLS-$p$ (or short IRLS), which can be seen as a first-order method for locally minimizing a smooth approximation to the rank function. The results exploit the fact that these algorithms can be derived from the KKT conditions for minimization problems whose objectives are suitable smooth approximations to the rank function [35]. We will refer to IRLS-$p$ (resp. sIRLS-$p$) studied in [35] as IRLS (resp. sIRLS).
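For matrix completion, the weighted least squares problem in (6) decouples across the rows of $X$, because $\mathrm{Tr}(W X^\top X) = \sum_i x_i W x_i^\top$ for the rows $x_i$ of $X$. The sketch below solves each row exactly; it is an illustration only, and not the gradient projection implementation (IRLS-GP) of [35]. The function names, the sampling rate, and the values of $\gamma$ and $p$ are assumptions.

```python
import numpy as np

def weight_matrix(X, gamma, p):
    """W_p = (X^T X + gamma I)^(p/2 - 1), computed from an eigendecomposition."""
    evals, V = np.linalg.eigh(X.T @ X)
    evals = np.maximum(evals, 0.0) + gamma        # clip round-off negatives
    return (V * evals ** (p / 2 - 1)) @ V.T

def weighted_ls_step(M, mask, W):
    """Minimize Tr(W X^T X) subject to X = M on the observed entries (mask True)."""
    X = np.where(mask, M, 0.0)
    for i in range(M.shape[0]):
        obs, mis = mask[i], ~mask[i]
        if obs.any() and mis.any():
            # Row-wise optimality condition: W[mis, mis] u = -W[mis, obs] m_obs
            rhs = W[np.ix_(mis, obs)] @ M[i, obs]
            X[i, mis] = -np.linalg.solve(W[np.ix_(mis, mis)], rhs)
    return X

rng = np.random.default_rng(1)
M = rng.standard_normal((8, 2)) @ rng.standard_normal((2, 6))   # rank-2 test matrix
mask = rng.uniform(size=M.shape) < 0.6        # hypothetical 60% sampling rate
X0 = np.where(mask, M, 0.0)                   # zero-filled starting point
W = weight_matrix(X0, gamma=1e-2, p=1.0)
X1 = weighted_ls_step(M, mask, W)
```

With $W = I$ the step returns the zero-filled matrix, which matches the unit-weight initialization described above.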

The algorithms proposed in [19, 35] differ mainly in the implementation, and in the update rules of the weights and their corresponding regularizers. In IRLS-M [19], the weights are updated from the regularized singular value decomposition of $X^k$, and in IRLS-$p$ [35] they are updated as $W_p^k = ((X^k)^\top X^k + \gamma_k I)^{\frac{p}{2} - 1}$. Further, the regularization parameters $\epsilon_k$ and $\gamma_k$ are updated differently. The IRLS-M algorithm makes use of the rank of the matrix (either given or estimated), and thus the choice of the parameter $\epsilon_k$ depends on this given or estimated rank. On the other hand, the IRLS-$p$ algorithm chooses and updates its regularizer $\gamma_k$ based on prior sensitivity experiments.

## 3. Structured Iteratively Reweighted Least Squares Algorithms

In this section, we first introduce the structured matrix completion problem. Second, we introduce and analyze our proposed algorithm and implementation.

### 3.1. Problem Statement

In [36], the authors propose adjusting the standard nuclear norm minimization strategy for matrix completion to account for structural differences between observed and unobserved entries. This could be achieved by adding to problem (2) a regularization term on the unobserved entries that still results in a semidefinite program,

$$\underset{X}{\text{minimize}} \quad \|X\|_* + \alpha \|P_{\Omega^c}(X)\| \quad \text{subject to} \quad P_\Omega(X) = P_\Omega(M), \tag{7}$$

where $\alpha > 0$, and where $\|\cdot\|$ is an appropriate matrix norm. If most of the missing entries are zero except for a few, then the $\ell_1$ norm is a natural choice (the method can be rescaled if there is instead a preference for the missing entries to be near a nonzero constant). If the missing entries are mostly close to zero, then the $\ell_2$ norm is a natural choice. The authors show that the proposed method outperforms nuclear norm minimization in certain structured settings.

Indeed, program (7) very closely resembles the problem of decomposing a matrix into a low-rank component and a sparse component (see e.g. [11]). A popular method is Robust Principal Component Analysis (RPCA) [7], where one assumes that a low-rank matrix has some set of its entries corrupted.

### 3.2. Proposed Algorithm: Structured IRLS

We propose an iterative reweighted least squares algorithm related to [19, 35] for matrix completion with structured observations. In particular, we adjust the Iteratively Reweighted Least Squares (IRLS) algorithm proposed in [35] to take into account the structural differences between observed and unobserved entries. We call our algorithm Structured IRLS.

Structured IRLS is designed to promote low-rank structure in the recovered matrix together with sparsity in the missing entries. In many applications, missing values tend to be near a certain value, e.g. the maximum possible value in the range, or the lowest possible value (such as the value 1, i.e. “1 star”, in movie ratings). In cases where this value is nonzero, our objective function can be adjusted accordingly. Note that for $\alpha = 0$, the algorithm reduces to IRLS studied in [35].

Here $z(X^k)$ denotes the vector of missing entries of the $k$-th approximation $X^k$, and recall that $z^2(X^k)$ denotes the vector with entries squared. Each iteration of Structured IRLS solves a quadratic program. The algorithm can be adjusted to have the $\ell_2$ norm for the regularization term on the unobserved entries by fixing the weights $w_q^k = \mathbf{1}$. Further, we can impose nonnegativity constraints on the missing entries by thresholding all missing entries to be nonnegative.

### 3.3. Proposed Implementation: Structured sIRLS

In this section, we propose a gradient-projection-based implementation of Structured IRLS, which we refer to as Structured sIRLS. Here sIRLS stands for short IRLS (in analogy to [35]): we do not perform gradient descent until convergence, but instead take a prescribed number of steps. Further, calculating the projection $P_{\Omega^c}$ is computationally cheap, so the gradient projection algorithm can be used to efficiently solve the quadratic program in each iteration of Structured IRLS.

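We derive the first part of our objective function as described in [35]. We define the smooth Schatten-$p$ function, for $p > 0$, as

$$f_p(X) = \mathrm{Tr}(X^\top X + \gamma I)^{\frac{p}{2}} = \sum_{i=1}^{n} \big(\sigma_i^2(X) + \gamma\big)^{\frac{p}{2}}.$$

Note that $f_p$ is differentiable for $p > 0$, and convex for $p \ge 1$ [35]. For $p = 1$ and $\gamma = 0$ we have $f_1(X) = \|X\|_*$, which is also known as the Schatten-1 norm. Again for $\gamma = 0$, we have $f_p(X) \to \mathrm{rank}(X)$ as $p \to 0$ [35]. Further, for $p = 0$, we define

$$f_0(X) = \log\det(X^\top X + \gamma I),$$

a smooth surrogate for $\mathrm{rank}(X)$ (see e.g. [17, 18, 35, 40]). Thus, it is of interest to minimize $f_p(X)$ subject to the set of constraints on the observed entries.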
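The smooth Schatten-$p$ function can be evaluated directly from the singular values. The sketch below is illustrative; the function name `smooth_schatten` is an assumption, and the zero padding accounts for the fact that $X^\top X$ has $n$ eigenvalues while $X$ has only $\min(m, n)$ singular values.

```python
import numpy as np

def smooth_schatten(X, p, gamma):
    """f_p(X) = Tr (X^T X + gamma I)^(p/2) for p > 0, and log det(X^T X + gamma I) for p = 0."""
    n = X.shape[1]
    sigma = np.linalg.svd(X, compute_uv=False)
    lam = np.zeros(n)                 # eigenvalues of X^T X
    lam[:sigma.size] = sigma**2       # squared singular values, zero-padded up to n
    if p == 0:
        return float(np.sum(np.log(lam + gamma)))   # requires gamma > 0 if X is rank deficient
    return float(np.sum((lam + gamma) ** (p / 2)))

rng = np.random.default_rng(0)
X = rng.standard_normal((4, 3))
value_p1 = smooth_schatten(X, p=1.0, gamma=0.0)     # coincides with the nuclear norm
value_p0 = smooth_schatten(X, p=0, gamma=0.1)       # log-det surrogate
```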

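The gradient of $f_p$ is given by $\nabla f_p(X) = p\, X (X^\top X + \gamma I)^{\frac{p}{2} - 1}$ for $p > 0$ (see e.g. [29, 35]), and the gradient projection iterates are given by

$$X^{k+1} = P_{\Omega^c}\big(X^k - s_k \nabla f_p(X^k)\big) + P_\Omega(M),$$

where $s_k$ denotes the gradient step size at the $k$-th iteration. This iterate describes our gradient step promoting low-rankness, where we preserve the observed entries and update only the missing entries. Up to a positive constant factor, the gradient of $f_p$ at the $k$-th iteration is given by $X^k W_p^k$, where we iteratively define $W_p^k$ as

$$W_p^k = \big((X^k)^\top X^k + \gamma_k I\big)^{\frac{p}{2} - 1},$$

with regularization parameter $\gamma_k > 0$.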
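A single low-rank-promoting step can be sketched as follows. The step size $\gamma^{1 - p/2}$ is an assumption chosen so that the eigenvalues of the scaled weight matrix lie in $(0, 1]$; it is not necessarily the step size used in the paper. The helper `f_p` is included only to monitor the objective.

```python
import numpy as np

def low_rank_step(X, M, mask, p, gamma, step):
    """One projected gradient step: X <- P_{Omega^c}(X - step * X W_p) + P_Omega(M)."""
    evals, V = np.linalg.eigh(X.T @ X)
    W = (V * (np.maximum(evals, 0.0) + gamma) ** (p / 2 - 1)) @ V.T
    return np.where(mask, M, X - step * (X @ W))   # observed entries are reset to M

def f_p(X, p, gamma):
    """Smooth Schatten-p objective for p > 0, used here only to monitor progress."""
    lam = np.maximum(np.linalg.eigvalsh(X.T @ X), 0.0)
    return float(np.sum((lam + gamma) ** (p / 2)))

rng = np.random.default_rng(2)
M = rng.standard_normal((10, 2)) @ rng.standard_normal((2, 8))   # rank-2 test matrix
mask = rng.uniform(size=M.shape) < 0.7                 # hypothetical sampling pattern
X = np.where(mask, M, rng.standard_normal(M.shape))    # feasible start with noisy missing entries
gamma, p = 1.0, 1.0
step = gamma ** (1 - p / 2)                            # assumed step size
X_next = low_rank_step(X, M, mask, p, gamma, step)
```

The step leaves the observed entries untouched and moves only the entries indexed by $\Omega^c$.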

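Further, we promote sparsity in the missing entries as follows. Instead of minimizing the $\ell_q$ norm of the vector of missing entries, we iteratively minimize a re-weighted $\ell_2$ norm of the missing entries as described in [15]. Let $z(X^k)$ denote the vector of missing entries of the $k$-th approximation $X^k$. Define the weighted $\ell_2$ norm of $z(X)$ as

$$g_q(X) = \|z(X)\|_{\ell_2(w_q)}^2 = \sum_{i=1}^{mn - |\Omega|} (w_q)_i\, z_i^2(X),$$

where $(w_q)_i = (z_i^2(X) + \epsilon)^{\frac{q}{2} - 1}$ (as done in [15]). With the weights held fixed, the $i$-th entry of the gradient of $g_q$ is given by $2 (w_q)_i z_i(X)$. Therefore, the gradient projection iterates are given by

$$z(X^{k+1}) = z(X^k) - c_k \nabla g_q(X^k),$$

where $c_k$ denotes the gradient step size at the $k$-th iteration. We iteratively define the weights $w_q^k$ as

$$w_q^k = \big(z^2(X^k) + \epsilon_k \mathbf{1}\big)^{\frac{q}{2} - 1},$$

with regularization parameter $\epsilon_k > 0$, where the power is applied elementwise.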
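The sparsity-promoting step acts only on the missing entries. In the sketch below, the step size is an assumption: for $q = 1$ it keeps the factor $2 c_k (w_q)_i$ at most one, so that no missing entry grows in magnitude or changes sign.

```python
import numpy as np

def sparsity_step(X, mask, q, eps, step):
    """One gradient step on g_q, applied to the missing entries only (mask False)."""
    z = X[~mask]                                 # z(X): vector of missing entries
    w = (z**2 + eps) ** (q / 2 - 1)              # weights w_q, held fixed during the step
    X_next = X.copy()
    X_next[~mask] = z - step * 2.0 * w * z       # gradient of sum_i w_i z_i^2 is 2 w_i z_i
    return X_next

rng = np.random.default_rng(3)
X = rng.standard_normal((5, 4))
mask = rng.uniform(size=X.shape) < 0.5           # hypothetical observation pattern
eps = 1e-2
step = 0.5 * eps ** 0.5                          # assumed step size for q = 1
X_next = sparsity_step(X, mask, q=1.0, eps=eps, step=step)
```

Each missing entry is multiplied by a factor in $[0, 1)$, with entries closer to zero shrunk more strongly.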

We outline in Algorithm 2 Structured sIRLS, a gradient-projection-based implementation of Structured IRLS.

In this implementation, we do not perform projected gradient descent on

$$\|(W_p^{k-1})^{1/2} X\|_F^2 + \alpha \|z(X)\|_{\ell_2(w_q)}^2$$

for a regularization weight $\alpha > 0$. Instead, we perform projected gradient descent on $\|(W_p^{k-1})^{1/2} X\|_F^2$ and $\|z(X)\|_{\ell_2(w_q)}^2$ consecutively. This allows us to update the weights before each alternating step, and to control how many gradient steps we would like to perform on each function.
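The alternating scheme described above can be sketched as a short loop. This is a simplified illustration under stated assumptions and not the authors' Algorithm 2: the regularizer schedules (`gamma0`, `eps0`, `eta`), both step sizes, the tolerance, and the choice to hold the weights fixed within each group of steps are all hypothetical, and no rank truncation is used.

```python
import numpy as np

def structured_sirls(M_obs, mask, p=1.0, q=1.0, n_low=10, n_sparse=1,
                     gamma0=1e-2, eps0=1e-2, eta=1.1, tol=1e-5, max_iter=500):
    """Alternate low-rank and sparsity gradient steps; observed entries stay fixed."""
    X = np.where(mask, M_obs, 0.0)
    for k in range(max_iter):
        X_prev = X.copy()
        gamma = max(gamma0 / eta**k, 1e-10)       # hypothetical regularizer schedules
        eps = max(eps0 / eta**k, 1e-10)
        # Low-rank steps, with the weight matrix W_p^k held fixed.
        evals, V = np.linalg.eigh(X.T @ X)
        W = (V * (np.maximum(evals, 0.0) + gamma) ** (p / 2 - 1)) @ V.T
        s = gamma ** (1 - p / 2)                  # assumed step size
        for _ in range(n_low):
            X = np.where(mask, M_obs, X - s * (X @ W))
        # Sparsity steps on the missing entries, with the weights w_q^k held fixed.
        z = X[~mask]
        w = (z**2 + eps) ** (q / 2 - 1)
        c = 0.5 * eps ** (1 - q / 2)              # assumed step size
        for _ in range(n_sparse):
            z = z - c * 2.0 * w * z
        X[~mask] = z
        # Stop when the relative distance between consecutive approximations is small.
        if np.linalg.norm(X - X_prev) <= tol * max(np.linalg.norm(X), 1e-15):
            break
    return X

rng = np.random.default_rng(4)
M = rng.uniform(size=(30, 3)) @ rng.uniform(size=(3, 30))   # rank-3 test matrix
M /= np.linalg.norm(M, 2)                                   # scale to spectral norm 1
mask = rng.uniform(size=M.shape) < 0.6                      # hypothetical sampling pattern
X_hat = structured_sirls(M, mask)
```

Separating the two groups of steps makes the balance between low-rankness and sparsity a matter of step counts (`n_low`, `n_sparse`) rather than of a single weight $\alpha$.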

Further, a rank estimate of the matrix $M$ is used as an input to truncate the SVD when computing the weights $W_p^k$. In our implementation, we use a randomized algorithm for SVD computation [23]. When the rank of the matrix is not estimated or provided, we instead choose the rank estimate at each iteration to be $\min\{r_{\max}, \hat{r}\}$, where $\hat{r}$ is the largest integer such that $\sigma_{\hat{r}}(X^k)$ exceeds a fixed fraction of $\sigma_1(X^k)$, and where $r_{\max}$ is a prescribed upper bound (as implemented in [35]).

## 4. Numerical Experiments

In this section, we run numerical experiments to showcase the performance of Structured sIRLS, and compare it to the performance of sIRLS (studied in [35]) on structured settings. We first describe the choice of parameters we use, and then describe our experiments for exact matrix completion and matrix completion with noise. Our code for Structured sIRLS is available at [1]. Further, we use the publicly available code of sIRLS [33].

### 4.1. Choice of parameters

In all the numerical experiments, we adopt the same parameters. However, one can use different choices for parameters, or optimize some of the parameters. We normalize the input data to have a spectral norm of 1 (as done in [35]).

In our experiments, we fix the values of $p$ and $q$; generally, these parameters can be varied. Each value of $p$ and $q$ defines a different objective function (see Sections 2.2 and 2.3).

For the implementation parameters, we take one gradient step to promote sparsity and ten gradient steps to promote low-rankness per iteration. These parameters can be varied based on the low-rankness of the matrix and on the expected sparsity of its missing entries. Further, we fix update rules for the regularizers $\gamma_k$ and $\epsilon_k$ at the $k$-th iteration. However, there are other choices for these regularizers; for example, $\epsilon_k$ could depend on the $(s+1)$-th largest value of $z(X^k)$, where $s$ is the sparsity of $z(X)$ (as done in [15]). Similarly, $\gamma_k$ could depend on the $(r+1)$-th singular value of $X^k$, where $r$ is the rank of $M$ (as done in [19]). Lastly, for all $k$ we fix the step size $s_k$ to promote low-rankness and $c_k$ to promote sparsity; however, these parameters could be scaled or varied. We define the relative distance between two consecutive approximations as

$$d(X^k, X^{k-1}) = \|X^k - X^{k-1}\|_F / \|X^k\|_F.$$

We say the algorithm converges if $d(X^k, X^{k-1})$ falls below a prescribed tolerance. We use the same tolerance for both sIRLS and Structured sIRLS in our comparison experiments. (This tolerance is lower than the one set in the implementation of sIRLS provided by the authors [33, 35]; we lowered it to report fair comparisons, since our algorithm attains the original tolerance with fewer iterations.) We set the maximum number of iterations for Structured sIRLS to be 1000 and for sIRLS to be 5000.

### 4.2. Exact Matrix Completion

We first investigate the performance of the Structured sIRLS algorithm when the observed entries are exact, i.e. there is no noise in the observed values. We construct $m \times n$ matrices of rank $r$ as done in [36]. We consider $M = M_L M_R$, where $M_L \in \mathbb{R}^{m \times r}$ and $M_R \in \mathbb{R}^{r \times n}$ are sparse matrices. Indeed, the entries of $M_L$ (resp. $M_R$) are chosen to be zero uniformly at random so that on average a prescribed fraction of its entries is zero. The remaining nonzero entries are uniformly distributed at random between zero and one. (The sparsity level of the resulting matrix $M$ cannot be calculated exactly from the given sparsity levels of $M_L$ and $M_R$.) We subsample from the zero and nonzero entries of the data matrix at various rates to generate a matrix with missing entries. We define the relative error of Structured sIRLS as

$$\|M - \hat{X}\|_F / \|M\|_F,$$

where $\hat{X}$ is the output of the Structured sIRLS algorithm. Similarly, we define the relative error of sIRLS as

$$\|M - \tilde{X}\|_F / \|M\|_F,$$

where $\tilde{X}$ is the output of the sIRLS algorithm. The average ratio is then defined as

$$\|M - \hat{X}\|_F / \|M - \tilde{X}\|_F.$$

We say Structured sIRLS outperforms sIRLS when the average ratio is less than one, and vice versa when the average ratio is greater than one. These two cases, when the average ratio is strictly less than or greater than one, are visually represented by the white and black squares, respectively, in the bottom right plots of Figures 1-3. We refer to this binary notion of average ratio as binned average ratio. We report each of these error values in our numerical experiments.
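The construction of the test matrices and the error measures can be sketched as follows. The zero fractions of the factors (0.7 and 0.5), the sampling rates, and the two stand-in "outputs" are placeholders chosen for illustration; they are not the settings or results of the experiments.

```python
import numpy as np

def sparse_factor(shape, zero_frac, rng):
    """Uniform(0, 1) entries, each independently set to zero with probability zero_frac."""
    F = rng.uniform(size=shape)
    F[rng.uniform(size=shape) < zero_frac] = 0.0
    return F

def structured_mask(M, rate_nonzero, rate_zero, rng):
    """Observe the nonzero and zero entries of M at two different sampling rates."""
    u = rng.uniform(size=M.shape)
    return np.where(M != 0, u < rate_nonzero, u < rate_zero)

def relative_error(M, X):
    return np.linalg.norm(M - X) / np.linalg.norm(M)

rng = np.random.default_rng(5)
m, n, r = 100, 100, 10
# The zero fractions below are placeholders, not the paper's settings.
M = sparse_factor((m, r), 0.7, rng) @ sparse_factor((r, n), 0.5, rng)
mask = structured_mask(M, rate_nonzero=0.9, rate_zero=0.3, rng=rng)

X_a = np.where(mask, M, 0.0)         # stand-in for the output of one algorithm
X_b = np.where(mask, M, M.mean())    # stand-in for the output of the other
ratio = relative_error(M, X_a) / relative_error(M, X_b)
```

Sampling nonzero entries at a higher rate than zero entries corresponds to the structured regime, in which the missing entries are mostly zero.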

It is important to note that the setting we are interested in is the structured setting where the missing entries are sparse. This setting can be observed in the upper left triangle of the matrix displaying the ratio of errors between sIRLS and Structured sIRLS (see the bottom left plots of Figures 1-5). In this region, the percentage of nonzero entries that are sampled is greater than the percentage of zero entries that are sampled. Further, we obtain better accuracy in general as we move right along a row or up along a column, since we are sampling more and more entries. In addition, it is important to note that in all experiments we are using the same algorithm (with fixed parameters) for all the cases considered in our computations, without any parameter optimization. The algorithm promotes sparsity in all the cases, even in the unstructured settings. Omitting the sparsity-promoting step would result in an algorithm promoting only low-rankness.

#### 4.2.1. 1000×1000 rank 10 matrices

In Figure 1, we construct random matrices of size $1000 \times 1000$ and of rank 10, as described in Section 4.2.

Values below one in the bottom left plot of Figure 1 indicate that Structured sIRLS outperforms sIRLS. In this particular experiment, we observe that Structured sIRLS outperforms sIRLS for most of the structured cases (the upper left triangle), and beyond: this happens roughly when the fraction of sampled nonzero entries is greater than 0.2.

Note that in the case where all entries are observed (no longer a matrix completion problem), both relative errors are 0 and thus the average ratio is 1. We only say that Structured sIRLS outperforms sIRLS when the average ratio is strictly less than 1, and this is why the upper right pixel in the bottom right plot of Figure 1 is black. The same is true in Figures 2 and 3.

#### 4.2.2. 500×500 rank 10 matrices

In Figure 2, we construct random matrices of size $500 \times 500$ and of rank 10, as described in Section 4.2.

We observe that Structured sIRLS outperforms sIRLS not only in the majority of the structured cases, but also in many of the other cases where the unobserved entries are not necessarily sparse.

#### 4.2.3. 100×100 rank 10 matrices

We now consider a harder problem. In Figure 3, we construct random matrices of size $100 \times 100$ and of rank 10, as described in Section 4.2.

We observe in Figure 3 that Structured sIRLS outperforms sIRLS when the sampling rate of the nonzero entries is high (roughly speaking, when the fraction of sampled nonzero entries is greater than 0.5), which covers the majority of the cases where there is sparsity-based structure in the missing entries.

#### 4.2.4. 100×100 matrices with no knowledge of the rank a priori

In Figure 4, we construct random matrices of size $100 \times 100$ and of rank 8, as described in Section 4.2. For this experiment, we do not provide the algorithm with any rank estimate, for either sIRLS or Structured sIRLS. Instead, we allow the algorithm to estimate the rank at each iteration based on the heuristic described in Section 3.3.

We observe in the bottom right plot of Figure 4, where we look closely at the cases where the sampling rate of nonzero entries is at least 0.7, that Structured sIRLS outperforms sIRLS to different extents. Indeed, Structured sIRLS does particularly well when more entries are observed.

### 4.3. Matrix Completion with Noise

In this section, we investigate the performance of Structured sIRLS when the observed entries are corrupted with noise. We consider the following noisy matrix completion problem [36],

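$$\underset{X}{\text{minimize}} \quad \|X\|_* + \alpha \|P_{\Omega^c}(X)\| \quad \text{subject to} \quad P_\Omega(X) = P_\Omega(B), \tag{8}$$

where $M$ is an unknown low-rank matrix that we wish to recover, where $Z$ is the measurement noise, and where the noisy matrix $B$ satisfies $P_\Omega(B) = P_\Omega(M) + P_\Omega(Z)$. Let the entries $Z_{ij}$ of the noise be i.i.d. Gaussian random variables. We define our noise model such that $\|P_\Omega(Z)\|_F = \epsilon \|P_\Omega(M)\|_F$ for a noise parameter $\epsilon$. We do so by adding noise of the form

$$Z_{ij} = \epsilon \cdot \frac{\|P_\Omega(M)\|_F}{\|P_\Omega(N)\|_F} \cdot N_{ij},$$

where $N_{ij}$ are i.i.d. Gaussian random variables with the standard distribution $\mathcal{N}(0,1)$.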

We adapt sIRLS and Structured sIRLS for noisy matrix completion by replacing the observed entries $P_\Omega(M)$ with the noisily observed entries $P_\Omega(B)$. We do not vary the noisily observed entries, only the missing entries. We define the relative error as

$$\|B - \hat{X}\|_F / \|B\|_F,$$

where $\hat{X}$ is the output of the Structured sIRLS algorithm.
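The noise model and the relative error above can be sketched in NumPy as follows. This is a minimal sketch under stated assumptions rather than the authors' code: the function names, the test matrix, the sampling rate 0.6, and the value of `eps` are all hypothetical, and the noise is added to every entry of the matrix even though only the entries in $\Omega$ are passed to the algorithm.

```python
import numpy as np

def add_scaled_noise(M, mask, eps, rng):
    """Return (B, Z) with B = M + Z and ||P_Omega(Z)||_F = eps * ||P_Omega(M)||_F.

    mask is a boolean array that is True on the observed set Omega.
    """
    N = rng.standard_normal(M.shape)  # N_ij ~ N(0, 1), i.i.d.
    # Rescale N so that the noise on Omega has the prescribed Frobenius norm.
    scale = eps * np.linalg.norm(M[mask]) / np.linalg.norm(N[mask])
    Z = scale * N
    return M + Z, Z

def relative_error(B, X_hat):
    """Relative Frobenius error ||B - X_hat||_F / ||B||_F."""
    return np.linalg.norm(B - X_hat) / np.linalg.norm(B)

rng = np.random.default_rng(0)
# Hypothetical rank-3 test matrix and observation mask.
M = rng.standard_normal((30, 3)) @ rng.standard_normal((3, 30))
mask = rng.random(M.shape) < 0.6
eps = 1e-2  # illustrative noise parameter, not the value used in the paper

B, Z = add_scaled_noise(M, mask, eps, rng)
noise_level = np.linalg.norm(Z[mask]) / np.linalg.norm(M[mask])
```

By construction, `noise_level` equals `eps` up to floating-point rounding, which is the defining property of the noise model.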

In Figure 5, we consider rank 3 matrices with noise parameter $\epsilon$, where we construct our matrices in the same fashion as in Section 4.2 and with the noise model described above. We consider structured settings analogous to those in the prior experiments, and observe that sIRLS and Structured sIRLS perform roughly the same. This shows that both algorithms are robust to noise, but that the improvements from exploiting the structure are less pronounced in this setting.

## 5. Analytic Remarks

In this section, we provide an analytic remark, similar to [36, Proposition 1], applied to the objective functions for each iteration of IRLS [35] and Structured IRLS. We consider the simplified setting, in which all of the unobserved entries are exactly zero. We show that the approximation given by an iteration of Structured IRLS will always perform at least as well as that of IRLS with the same weights assigned.

###### Proposition 5.1.

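Let

$$\tilde{X} = \underset{X}{\arg\min} \left\{ \|W^{1/2}X\|_F^2 \;:\; P_\Omega(X) = P_\Omega(M) \right\}$$

be the minimizer of the objective function of each iterate in IRLS [35]. Let

$$\hat{X} = \underset{X}{\arg\min} \left\{ \|W^{1/2}X\|_F^2 + \alpha \|P_{\Omega^c}(X)\| \;:\; P_\Omega(X) = P_\Omega(M) \right\}$$

be the minimizer of the objective function generalizing each iterate in Structured IRLS (with $\alpha > 0$). Here $\|\cdot\|$ is an arbitrary matrix norm; one recovers Structured IRLS for a particular choice of this norm. If $P_{\Omega^c}(M)$ is the zero matrix and the same weights $W$ are assigned, then $\|M - \hat{X}\| \le \|M - \tilde{X}\|$ for any matrix norm $\|\cdot\|$.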

###### Proof.

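By definition of $\tilde{X}$, we have $\|W^{1/2}\tilde{X}\|_F^2 \le \|W^{1/2}\hat{X}\|_F^2$. Similarly, by definition of $\hat{X}$, we have $\|W^{1/2}\hat{X}\|_F^2 + \alpha\|P_{\Omega^c}(\hat{X})\| \le \|W^{1/2}\tilde{X}\|_F^2 + \alpha\|P_{\Omega^c}(\tilde{X})\|$. Therefore,

$$\|W^{1/2}\hat{X}\|_F^2 + \alpha\|P_{\Omega^c}(\hat{X})\| \le \|W^{1/2}\tilde{X}\|_F^2 + \alpha\|P_{\Omega^c}(\tilde{X})\| \le \|W^{1/2}\hat{X}\|_F^2 + \alpha\|P_{\Omega^c}(\tilde{X})\|.$$

Since $\alpha > 0$, this implies $\|P_{\Omega^c}(\hat{X})\| \le \|P_{\Omega^c}(\tilde{X})\|$. We therefore have

$$\begin{aligned}
\|M - \hat{X}\| &= \|P_{\Omega^c}(\hat{X})\| && \text{since } P_\Omega(M) = P_\Omega(\hat{X}) \text{ and } P_{\Omega^c}(M) = 0 \\
&\le \|P_{\Omega^c}(\tilde{X})\| && \text{by assumption on the restriction of the matrix norm to } \Omega^c \\
&= \|M - \tilde{X}\| && \text{since } P_\Omega(M) = P_\Omega(\tilde{X}) \text{ and } P_{\Omega^c}(M) = 0.
\end{aligned}$$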
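As an illustration of the proposition, the two minimizers can be written in closed form in one special case. This is a sketch under assumptions that are not part of the statement above: the penalty is taken to be the squared Frobenius norm $\alpha\|P_{\Omega^c}(X)\|_F^2$ (not itself a norm, although the comparison argument in the proof applies to it unchanged), and $W$ is assumed symmetric positive definite. Since $W$ acts on the left, the problem decouples over columns. Fix a column $x$ of $X$, split it into its observed part $x_o$ and missing part $x_m$, and partition $W$ into blocks $W_{oo}, W_{om}, W_{mo}, W_{mm}$ accordingly. Setting the gradient with respect to $x_m$ to zero gives

$$\tilde{x}_m = -W_{mm}^{-1} W_{mo}\, x_o, \qquad \hat{x}_m = -(W_{mm} + \alpha I)^{-1} W_{mo}\, x_o = (W_{mm} + \alpha I)^{-1} W_{mm}\, \tilde{x}_m.$$

The matrix $(W_{mm} + \alpha I)^{-1} W_{mm}$ is symmetric with eigenvalues $\lambda/(\lambda + \alpha) \in (0,1)$, where $\lambda$ ranges over the eigenvalues of $W_{mm}$. Hence $\|\hat{x}_m\|_2 \le \|\tilde{x}_m\|_2$ for every column, and summing the squares over columns gives

$$\|P_{\Omega^c}(\hat{X})\|_F \le \|P_{\Omega^c}(\tilde{X})\|_F,$$

in agreement with the inequality used in the proof.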

## 6. Conclusion

In this paper, we consider the structured matrix completion problem studied in [36]. We focus on the notion of structure in which the probability that an entry is observed depends mainly on the value of the entry. In particular, we are interested in sparsity-based structure in the missing entries, whereby most of the missing values are close in the $\ell_0$ or $\ell_1$ norm sense to 0 (or more generally to a fixed value). For example, a missing rating of a movie might indicate the user's lack of interest in that movie, thus suggesting a lower rating than otherwise expected. In recent work [36], Molitor and Needell propose adjusting the standard nuclear norm minimization problem to take into account the structural differences between observed and unobserved entries.

We propose an iterative algorithm for structured low-rank matrix completion, called Structured IRLS, by adjusting the IRLS algorithm proposed in [35]. We also present a gradient-projection-based implementation, called Structured sIRLS (based on sIRLS of [35]). The projection step is computationally cheap, so the gradient projection algorithm can be used to efficiently solve the quadratic program in each iteration of Structured IRLS. The algorithms are designed to promote low-rank structure in the recovered matrix with sparsity in the missing entries. Our objective function can be adjusted by applying a linear shift on the missing values to handle other fixed nonzero values.

We perform numerical experiments on various structured settings to test the performance of Structured sIRLS compared to sIRLS. To generate a sparse matrix with missing entries, we subsample from the zero and nonzero entries of a sparse data matrix at various rates. In particular, we are interested in the structured cases, in which the sampling rate of the zero entries is lower than the sampling rate of the nonzero entries. For matrices of different sizes and sparsity levels, we report the relative error of sIRLS, the relative error of Structured sIRLS, and the average ratio between the two. The numerical experiments show that Structured sIRLS often gives better recovery results than sIRLS in structured settings.

In future work, we hope to extend the theoretical results for Structured IRLS to more general settings. In the simplified setting, in which all of the unobserved entries are exactly zero, we show that the approximation given by an iteration of Structured IRLS will always perform at least as well as that of IRLS with the same weights assigned. However, we empirically observe the stronger result that Structured sIRLS often outperforms sIRLS in structured settings (in which the algorithms are run until convergence, and in which not all missing entries are zero). Another extension is to explore Structured IRLS for different values of $p$ and $q$, both empirically and theoretically. Furthermore, a possible direction for future work is to extend sparsity-based structure in the missing entries to a more general notion of structure, whereby the probability that an entry is observed may depend on more than just the value of that entry. For example, one could imagine that columns in a matrix corresponding to popular movies would have many entries (user ratings) filled in. In this context, an entry might be more likely to be observed if many entries in the same column are also observed.

## References

• [1] Henry Adams, Lara Kassab, and Deanna Needell. Structured IRLS code.
• [2] Robert M Bell and Yehuda Koren. Lessons from the Netflix prize challenge. SIGKDD Explorations, 9(2):75–79, 2007.
• [3] James Bennett and Stan Lanning. The Netflix prize. In Proceedings of KDD Cup and Workshop, volume 2007, page 35. New York, NY, USA, 2007.
• [4] Pratik Biswas, Tzu-Chen Lian, Ta-Chung Wang, and Yinyu Ye. Semidefinite programming based algorithms for sensor network localization. ACM Transactions on Sensor Networks (TOSN), 2(2):188–220, 2006.
• [5] Jian-Feng Cai, Emmanuel J Candès, and Zuowei Shen. A singular value thresholding algorithm for matrix completion. SIAM Journal on Optimization, 20(4):1956–1982, 2010.
• [6] Emmanuel Candès and Terence Tao. Decoding by linear programming. arXiv preprint math/0502327, 2005.
• [7] Emmanuel J Candès, Xiaodong Li, Yi Ma, and John Wright. Robust principal component analysis? Journal of the ACM (JACM), 58(3):1–37, 2011.
• [8] Emmanuel J Candès and Yaniv Plan. Matrix completion with noise. Proceedings of the IEEE, 98(6):925–936, 2010.
• [9] Emmanuel J Candès and Benjamin Recht. Exact matrix completion via convex optimization. Foundations of Computational Mathematics, 9(6):717, 2009.
• [10] Emmanuel J Candès and Terence Tao. The power of convex relaxation: Near-optimal matrix completion. IEEE Transactions on Information Theory, 56(5):2053–2080, 2010.
• [11] Venkat Chandrasekaran, Sujay Sanghavi, Pablo A Parrilo, and Alan S Willsky. Rank-sparsity incoherence for matrix decomposition. SIAM Journal on Optimization, 21(2):572–596, 2011.
• [12] Rick Chartrand. Exact reconstruction of sparse signals via nonconvex minimization. IEEE Signal Processing Letters, 14(10):707–710, 2007.
• [13] Rick Chartrand and Valentina Staneva. Restricted isometry properties and nonconvex compressive sensing. Inverse Problems, 24(3):035020, 2008.
• [14] Rick Chartrand and Wotao Yin. Iteratively reweighted algorithms for compressive sensing. In 2008 IEEE International Conference on Acoustics, Speech and Signal Processing, pages 3869–3872. IEEE, 2008.
• [15] Ingrid Daubechies, Ronald DeVore, Massimo Fornasier, and C Sinan Güntürk. Iteratively reweighted least squares minimization for sparse recovery. Communications on Pure and Applied Mathematics: A Journal Issued by the Courant Institute of Mathematical Sciences, 63(1):1–38, 2010.
• [16] David L Donoho and Philip B Stark. Uncertainty principles and signal recovery. SIAM Journal on Applied Mathematics, 49(3):906–931, 1989.
• [17] Maryam Fazel. Matrix rank minimization with applications. 2002.
• [18] Maryam Fazel, Haitham Hindi, and Stephen P Boyd. Log-det heuristic for matrix rank minimization with applications to Hankel and Euclidean distance matrices. In Proceedings of the 2003 American Control Conference, 2003, volume 3, pages 2156–2162. IEEE, 2003.
• [19] Massimo Fornasier, Holger Rauhut, and Rachel Ward. Low-rank matrix recovery via iteratively reweighted least squares minimization. SIAM Journal on Optimization, 21(4):1614–1640, 2011.
• [20] Donald Goldfarb and Shiqian Ma. Convergence of fixed-point continuation algorithms for matrix rank minimization. Foundations of Computational Mathematics, 11(2):183–210, 2011.
• [21] David Gross. Recovering low-rank matrices from few coefficients in any basis. IEEE Transactions on Information Theory, 57(3):1548–1566, 2011.
• [22] David Gross, Yi-Kai Liu, Steven T Flammia, Stephen Becker, and Jens Eisert. Quantum state tomography via compressed sensing. Physical Review Letters, 105(15):150401, 2010.
• [23] Nathan Halko, Per-Gunnar Martinsson, and Joel A Tropp. Finding structure with randomness: Probabilistic algorithms for constructing approximate matrix decompositions. SIAM Review, 53(2):217–288, 2011.
• [24] Prateek Jain, Raghu Meka, and Inderjit S Dhillon. Guaranteed rank minimization via singular value projection. In Advances in Neural Information Processing Systems, pages 937–945, 2010.
• [25] Raghunandan H Keshavan and Sewoong Oh. A gradient descent algorithm on the Grassman manifold for matrix completion. arXiv preprint arXiv:0910.5260, 2009.
• [26] Yehuda Koren, Robert Bell, and Chris Volinsky. Matrix factorization techniques for recommender systems. Computer, (8):30–37, 2009.
• [27] Christian Kümmerle and Juliane Sigl. Harmonic mean iteratively reweighted least squares for low-rank matrix recovery. The Journal of Machine Learning Research, 19(1):1815–1863, 2018.
• [28] Kiryung Lee and Yoram Bresler. ADMiRA: Atomic decomposition for minimum rank approximation. IEEE Transactions on Information Theory, 56(9):4402–4416, 2010.
• [29] Adrian Stephen Lewis. Derivatives of spectral functions. Mathematics of Operations Research, 21(3):576–588, 1996.
• [30] Nathan Linial, Eran London, and Yuri Rabinovich. The geometry of graphs and some of its algorithmic applications. Combinatorica, 15(2):215–245, 1995.
• [31] Zhang Liu and Lieven Vandenberghe. Interior-point method for nuclear norm approximation with application to system identification. SIAM Journal on Matrix Analysis and Applications, 31(3):1235–1256, 2009.
• [32] Hassan Mansour and Rayan Saab. Recovery analysis for weighted $\ell_1$-minimization using the null space property. Applied and Computational Harmonic Analysis, 43(1):23–38, 2017.
• [33] Karthik Mohan and Maryam Fazel. IRLS code. [Online; accessed 01-Aug-2019].
• [34] Karthik Mohan and Maryam Fazel. Reweighted nuclear norm minimization with application to system identification. In Proceedings of the 2010 American Control Conference, pages 2953–2959. IEEE, 2010.
• [35] Karthik Mohan and Maryam Fazel. Iterative reweighted algorithms for matrix rank minimization. Journal of Machine Learning Research, 13(Nov):3441–3473, 2012.
• [36] Denali Molitor and Deanna Needell. Matrix completion for structured observations. In 2018 Information Theory and Applications Workshop (ITA), pages 1–5. IEEE, 2018.
• [37] Deanna Needell. Noisy signal recovery via iterative reweighted L1-minimization. In Proceedings of the 43rd Asilomar Conference on Signals, Systems and Computers, pages 113–117. IEEE Press, 2009.
• [38] Deanna Needell, Rayan Saab, and Tina Woolf. Weighted $\ell_1$-minimization for sparse recovery under arbitrary prior information. Information and Inference: A Journal of the IMA, 6(3):284–309, 2017.
• [39] Benjamin Recht. A simpler approach to matrix completion. Journal of Machine Learning Research, 12(Dec):3413–3430, 2011.
• [40] Benjamin Recht, Maryam Fazel, and Pablo A Parrilo. Guaranteed minimum-rank solutions of linear matrix equations via nuclear norm minimization. SIAM Review, 52(3):471–501, 2010.
• [41] Ralph Schmidt. Multiple emitter location and signal parameter estimation. IEEE Transactions on Antennas and Propagation, 34(3):276–280, 1986.
• [42] Amit Singer. A remark on global positioning from local distances. Proceedings of the National Academy of Sciences, 105(28):9507–9511, 2008.
• [43] Anthony Man-Cho So and Yinyu Ye. Theory of semidefinite programming for sensor network localization. Mathematical Programming, 109(2-3):367–384, 2007.
• [44] Kim-Chuan Toh and Sangwoon Yun. An accelerated proximal gradient algorithm for nuclear norm regularized linear least squares problems. Pacific Journal of Optimization, 6:615–640, 2010.
• [45] Tong Zhang. Analysis of multi-stage convex relaxation for sparse regularization. Journal of Machine Learning Research, 11(Mar):1081–1107, 2010.