In the era of big data, the low-rank matrix has become a useful and popular tool to express two-dimensional information. One well-known example is the rating matrix in recommendation systems, which represents users' tastes on products. Since users expressing similar ratings on multiple products tend to have the same interest in a new product, columns associated with users sharing the same interest are highly likely to be the same, resulting in the low-rank structure of the rating matrix (see Fig. 1). Another example is the Euclidean distance matrix formed by the pairwise distances of a large number of sensor nodes. Since the rank of a Euclidean distance matrix in the $k$-dimensional Euclidean space is at most $k+2$ (if $k=2$, then the rank is 4), this matrix can be readily modeled as a low-rank matrix [2, 3, 4].
One major benefit of the low-rank matrix is that the essential information in the matrix, expressed in terms of degrees of freedom, is much smaller than the total number of entries. Therefore, even when the number of observed entries is small, we still have a good chance of recovering the whole matrix. There are a variety of scenarios where the number of observed entries of a matrix is tiny. In recommendation systems, for example, users are encouraged to submit feedback in the form of a rating, e.g., 1 to 5, for a purchased product. However, users often do not want to leave feedback, and thus the rating matrix will have many missing entries. Also, in an internet of things (IoT) network, sensor nodes have a limited radio communication range or may suffer a power outage, so that only a small portion of the entries in the Euclidean distance matrix is available.
When there is no restriction on the rank of a matrix, the problem of recovering the unknown entries of the matrix from partially observed entries is ill-posed. This is because any value can be assigned to an unknown entry, which in turn means that there is an infinite number of matrices agreeing with the observed entries. As a simple example, consider the following matrix with one unknown entry marked ?
If the matrix is full rank, i.e., the rank is two, then any value that keeps the two columns linearly independent can be assigned to the unknown entry. Whereas, if it is a low-rank matrix (the rank is one in this trivial example), the two columns differ by only a constant factor, and hence the unknown element ? can be easily determined using the linear relationship between the two columns (? = 10). This example is obviously simple, but the fundamental principle of recovering a large-dimensional matrix is not much different from this, and the low-rank constraint plays a central role in recovering the unknown entries of the matrix.
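This completion-by-proportionality can be sketched in a few lines. The 2x2 matrix below is hypothetical, chosen so that the rank-one completion gives ? = 10:

```python
import numpy as np

# Hypothetical 2x2 rank-one matrix with one unknown entry (marked as nan).
# Under the rank-one model, the second column is a constant multiple of
# the first, so the missing entry is fixed by that linear relationship.
M = np.array([[1.0, 2.0],
              [5.0, np.nan]])

# Ratio between the columns, read off from the fully observed first row.
ratio = M[0, 1] / M[0, 0]          # = 2.0

# Complete the missing entry using the proportional-columns model.
M[1, 1] = M[1, 0] * ratio          # = 10.0

print(M[1, 1])                     # -> 10.0
assert np.linalg.matrix_rank(M) == 1
```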
Before we proceed, we discuss a few notable applications where the underlying matrix is modeled as a low-rank matrix.
Recommendation system: In 2006, the online DVD rental company Netflix announced a contest to improve the quality of the company's movie recommendation system. The company released a training set of about half a million customers, containing ratings on more than ten thousand movies, each movie being rated on a scale from 1 to 5. The training data can be represented in a large-dimensional matrix in which each column represents the ratings of a customer for the movies. (Customers do not necessarily rate all of the movies; the Netflix dataset consists of ratings of more than 17,000 movies by more than 2.5 million users, and the number of known entries is only about 1%.) The primary goal of the recommendation system is to estimate the users' interests in products using this sparsely sampled rating matrix. Often, users sharing the same interests in key factors such as the type, the price, and the appearance of a product tend to provide the same rating on the movies. The ratings of those users might form a low-rank column space, resulting in the low-rank model of the rating matrix (see Fig. 1).
Phase retrieval: The problem of recovering a signal, not necessarily sparse, from the magnitude of its observation is referred to as phase retrieval. Phase retrieval is an important problem in X-ray crystallography and quantum mechanics since only the magnitude of the Fourier transform is measured in these applications. Suppose the unknown time-domain signal $\mathbf{x}\in\mathbb{C}^{n}$ is acquired in the form of the measured magnitudes of its Fourier transform. That is,
$$b_{\omega}=\left|\langle \mathbf{x},\mathbf{f}_{\omega}\rangle\right|^{2},\qquad \omega\in\Omega,$$
where $\Omega$ is the set of sampled frequencies and $\mathbf{f}_{\omega}$ is the Fourier sampling vector at frequency $\omega$. Further, let
$$\mathbf{X}=\mathbf{x}\mathbf{x}^{*},$$
where $\mathbf{x}^{*}$ is the conjugate transpose of $\mathbf{x}$. Then, (2) can be rewritten as
$$b_{\omega}=|\langle \mathbf{x},\mathbf{f}_{\omega}\rangle|^{2}=\mathbf{f}_{\omega}^{*}\mathbf{x}\mathbf{x}^{*}\mathbf{f}_{\omega}=\operatorname{tr}(\mathbf{f}_{\omega}\mathbf{f}_{\omega}^{*}\mathbf{X})=\langle\mathbf{f}_{\omega}\mathbf{f}_{\omega}^{*},\mathbf{X}\rangle,$$
where $\mathbf{X}=\mathbf{x}\mathbf{x}^{*}$ is the rank-one matrix of the waveform $\mathbf{x}$. Using this simple transform, we can express the quadratic magnitude measurements as linear measurements of $\mathbf{X}$. In essence, the phase retrieval problem can be converted to the problem of reconstructing the rank-one matrix $\mathbf{X}$ in the positive semi-definite (PSD) cone. (If $\mathbf{X}$ is recovered, then the time-domain vector $\mathbf{x}$ can be computed, up to a global phase, by the eigenvalue decomposition of $\mathbf{X}$.)
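The lifting trick can be verified numerically: the quadratic magnitude measurement of a signal equals a linear (trace inner product) measurement of the lifted rank-one matrix. Random hypothetical vectors stand in here for the Fourier sampling vectors:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical complex signal and one sampling vector.
n = 4
x = rng.standard_normal(n) + 1j * rng.standard_normal(n)
a = rng.standard_normal(n) + 1j * rng.standard_normal(n)

# Quadratic measurement: squared magnitude of the inner product a^H x.
b = np.abs(np.vdot(a, x)) ** 2

# Lifted linear measurement: <a a^*, X> with X = x x^* (rank-one, PSD).
X = np.outer(x, x.conj())
A = np.outer(a, a.conj())
b_lifted = np.trace(A.conj().T @ X).real

assert np.isclose(b, b_lifted)
assert np.linalg.matrix_rank(X) == 1
```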
Localization in IoT networks: In recent years, the internet of things (IoT) has received much attention for its plethora of applications, such as healthcare, automatic metering, environmental monitoring (temperature, pressure, moisture), and surveillance [6, 7, 2]. Since actions in IoT networks, such as a fire alarm, energy transfer, or emergency request, are made primarily at the data center, the data center should figure out the location information of all devices in the network. In this scheme, called network localization (a.k.a. cooperative localization), each sensor node measures the distance information of adjacent nodes and then sends it to the data center. Then the data center constructs a map of sensor nodes using the collected distance information. Due to various reasons, such as the power outage of a sensor node or the limitation of the radio communication range (see Fig. 2), only a small number of distance measurements is available at the data center. Also, in vehicular networks, it is not easy to measure the distance to all adjacent vehicles when a vehicle is located in a dead zone. An example of the observed Euclidean distance matrix is
where $d_{ij}$ is the pairwise distance between sensor nodes $i$ and $j$. Since the rank of a Euclidean distance matrix is at most $k+2$ in the $k$-dimensional Euclidean space ($k=2$ or $k=3$) [3, 4], the problem of reconstructing the distance matrix can be well modeled as an LRMC problem.
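The rank bound can be checked numerically, using the common convention that the Euclidean distance matrix holds squared pairwise distances; the sensor positions below are hypothetical random points:

```python
import numpy as np

rng = np.random.default_rng(1)

# n sensor nodes placed in k-dimensional Euclidean space (k = 2 here).
n, k = 20, 2
P = rng.standard_normal((n, k))

# Euclidean distance matrix of squared pairwise distances:
# D_ij = ||p_i||^2 - 2 <p_i, p_j> + ||p_j||^2, i.e., a sum of three
# matrices of rank 1, k, and 1, so rank(D) <= k + 2.
G = P @ P.T
sq_norms = np.diag(G)
D = sq_norms[:, None] - 2 * G + sq_norms[None, :]

assert np.linalg.matrix_rank(D) <= k + 2   # here: at most 4
```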
Image compression and restoration: When there is dirt or a scribble in a two-dimensional image (see Fig. 3), one simple solution is to replace the contaminated pixels with an interpolated version of the adjacent pixels. A better way is to exploit the intrinsic domination of a few singular values in an image. In fact, one can readily approximate an image by a low-rank matrix without perceptible loss of quality. By using clean (uncontaminated) pixels as observed entries, the original image can be recovered via low-rank matrix completion.
Massive multiple-input multiple-output (MIMO): By exploiting hundreds of antennas at the basestation (BS), massive MIMO can offer a large gain in capacity. In order to optimize the performance of massive MIMO systems, the channel state information at the transmitter (CSIT) is required. One way to acquire the CSIT is to let each user directly feed back its own pilot observation to the BS for the joint CSIT estimation of all users. In this setup, the MIMO channel matrix can be reconstructed in two steps: 1) estimating the pilot observation matrix using least squares (LS) or linear minimum mean square error (LMMSE) estimation and 2) reconstructing the channel matrix from the observation model in which each column of the pilot matrix is the pilot signal from one antenna at the BS [10, 11]. Since the number of resolvable paths is limited in most cases, the rank of the channel matrix is often much smaller than its dimension due to the limited number of clusters around the BS. Thus, the problem of recovering the channel matrix at the BS can be solved via the rank minimization problem subject to the linear observation constraint.
Other than these, there are a bewildering variety of applications of LRMC in wireless communication, such as millimeter wave (mmWave) channel estimation [13, 14], topological interference management (TIM) [15, 16, 17, 18] and mobile edge caching in fog radio access networks (Fog-RAN) [19, 20].
The paradigm of LRMC has received much attention ever since the works of Fazel, Candès and Recht, and Candès and Tao. Over the years, there have been many works on this topic [5, 57, 48, 49], but it might not be easy to grasp the essentials of LRMC from these studies. One reason is that many of these works are highly theoretical, building on random matrix theory, graph theory, manifold analysis, and convex optimization. Another reason is that most of these works propose a new LRMC technique, so it is difficult to catch the general idea and big picture of LRMC from them.
The primary goal of this paper is to provide a contemporary survey on LRMC, a paradigm to recover the unknown entries of a low-rank matrix from partial observations. To offer a better view, insight, and understanding of the potentials and limitations of LRMC to researchers and practitioners, we present the scattered early results in a structured and accessible way. Firstly, we classify the state-of-the-art LRMC techniques into two main categories and then explain each category in detail. Secondly, we present issues to be considered when using LRMC techniques. Specifically, we discuss the intrinsic properties required for low-rank matrix recovery and explain how to exploit a special structure, such as a positive semidefinite structure, Euclidean distance structure, or graph structure, in LRMC design. Thirdly, we compare the recovery performance and the computational complexity of LRMC techniques via numerical simulations. We conclude the paper by commenting on the choice of LRMC techniques and providing future research directions.
Recently, there have been a few overview papers on LRMC. An overview of LRMC algorithms and their performance guarantees can be found in . A survey with an emphasis on first-order LRMC techniques together with their computational efficiency is presented in . Our work is distinct from the previous studies in several aspects. Firstly, we categorize the state-of-the-art LRMC techniques into two classes and then explain the details of each class, which can help researchers to easily determine which technique can be used for the given problem setup. Secondly, we provide a comprehensive survey of LRMC techniques and also provide extensive simulation results on the recovery quality and the running time complexity from which one can easily see the pros and cons of each LRMC technique and also gain a better insight into the choice of LRMC algorithms. Finally, we discuss how to exploit a special structure of a low-rank matrix in the LRMC algorithm design. In particular, we introduce the CNN-based LRMC algorithm that exploits the graph structure of a low-rank matrix.
We briefly summarize notations used in this paper.
For a vector $\mathbf{a}$, $\operatorname{diag}(\mathbf{a})$ is the diagonal matrix formed by $\mathbf{a}$.
For a matrix $\mathbf{A}$, $\mathbf{a}_{i}$ is the $i$-th column of $\mathbf{A}$.
$\operatorname{rank}(\mathbf{A})$ is the rank of $\mathbf{A}$.
$\mathbf{A}^{T}$ is the transpose of $\mathbf{A}$.
For $\mathbf{A},\mathbf{B}\in\mathbb{R}^{n_{1}\times n_{2}}$, $\langle\mathbf{A},\mathbf{B}\rangle=\operatorname{tr}(\mathbf{A}^{T}\mathbf{B})$ and $\mathbf{A}\odot\mathbf{B}$ are the inner product and the Hadamard product (or element-wise multiplication) of the two matrices $\mathbf{A}$ and $\mathbf{B}$, respectively, where $\operatorname{tr}(\cdot)$ denotes the trace operator.
$\|\mathbf{A}\|_{2}$, $\|\mathbf{A}\|_{*}$, and $\|\mathbf{A}\|_{F}$ stand for the spectral norm (i.e., the largest singular value), the nuclear norm (i.e., the sum of singular values), and the Frobenius norm of $\mathbf{A}$, respectively.
$\sigma_{i}(\mathbf{A})$ is the $i$-th largest singular value of $\mathbf{A}$.
$\mathbf{0}_{n_{1}\times n_{2}}$ and $\mathbf{1}_{n_{1}\times n_{2}}$ are the $n_{1}\times n_{2}$-dimensional matrices with all entries being zero and one, respectively.
$\mathbf{I}_{n}$ is the $n$-dimensional identity matrix.
If $\mathbf{A}$ is a square matrix (i.e., $n_{1}=n_{2}$), $\operatorname{diag}(\mathbf{A})$ is the vector formed by the diagonal entries of $\mathbf{A}$.
$\operatorname{vec}(\mathbf{A})$ is the vectorization of $\mathbf{A}$.
II Basics of Low-Rank Matrix Completion
In this section, we discuss the principle of recovering a low-rank matrix from partial observations. Basically, the desired low-rank matrix $\mathbf{M}$ can be recovered by solving the rank minimization problem
$$\min_{\mathbf{X}}\ \operatorname{rank}(\mathbf{X})\quad\text{subject to}\quad x_{ij}=m_{ij},\ (i,j)\in\Omega,$$
where $\Omega$ is the index set of observed entries (e.g., $\Omega=\{(1,1),(1,2),(2,1)\}$ in the example in (1)). One can alternatively express the problem using the sampling operator $P_{\Omega}$. The sampling operation $P_{\Omega}(\mathbf{A})$ of a matrix $\mathbf{A}$ is defined as
$$[P_{\Omega}(\mathbf{A})]_{ij}=\begin{cases}a_{ij}&(i,j)\in\Omega,\\ 0&\text{otherwise}.\end{cases}$$
Using this operator, the problem (9) can be equivalently formulated as
$$\min_{\mathbf{X}}\ \operatorname{rank}(\mathbf{X})\quad\text{subject to}\quad P_{\Omega}(\mathbf{X})=P_{\Omega}(\mathbf{M}).$$
A naive way to solve the rank minimization problem (10) is a combinatorial search. Specifically, we first assume that $\operatorname{rank}(\mathbf{M})=1$. Then, any two columns of $\mathbf{M}$ are linearly dependent, and thus we have a system of equations of the form $\mathbf{m}_{i}=\alpha_{ij}\mathbf{m}_{j}$ for some coefficients $\alpha_{ij}$. If the system has no solution under the rank-one assumption, then we move to the next assumption $\operatorname{rank}(\mathbf{M})=2$ and solve the new system of equations. This procedure is repeated until a solution is found. Clearly, the combinatorial search strategy is infeasible for most practical scenarios since its complexity grows exponentially in the problem size: when $\mathbf{M}$ is an $n\times n$ matrix, the number of systems of equations to be solved grows exponentially in $n$.
As a cost-effective alternative, various low-rank matrix completion (LRMC) algorithms have been proposed over the years. Roughly speaking, depending on how the rank information is used, LRMC algorithms can be classified into two main categories: 1) those without the rank information and 2) those exploiting the rank information. In this section, we provide an in-depth discussion of the two categories (see the outline of LRMC algorithms in Fig. 4).
II-A LRMC Algorithms Without the Rank Information
In this subsection, we explain the LRMC algorithms that do not require the rank information of the original low-rank matrix.
II-A1 Nuclear Norm Minimization (NNM)
Since the rank minimization problem (10) is NP-hard, it is computationally intractable when the dimension of the matrix is large. One common trick to avoid the computational issue is to replace the non-convex objective function with its convex surrogate, converting the combinatorial search problem into a convex optimization problem. There are two clear advantages in solving a convex optimization problem: 1) a local optimum is globally optimal and 2) there are many efficient polynomial-time convex optimization solvers (e.g., interior-point methods and semi-definite programming (SDP) solvers).
In the LRMC problem, the nuclear norm $\|\mathbf{X}\|_{*}$, the sum of the singular values of $\mathbf{X}$, has been widely used as a convex surrogate of $\operatorname{rank}(\mathbf{X})$:
$$\min_{\mathbf{X}}\ \|\mathbf{X}\|_{*}\quad\text{subject to}\quad P_{\Omega}(\mathbf{X})=P_{\Omega}(\mathbf{M}).$$
Indeed, it has been shown that the nuclear norm is the convex envelope (the "best" convex approximation) of the rank function on the set $\{\mathbf{X}:\|\mathbf{X}\|_{2}\leq 1\}$. (For any function $f:\mathcal{C}\to\mathbb{R}$, where $\mathcal{C}$ is a convex set, the convex envelope of $f$ is the largest convex function $g$ such that $g(\mathbf{x})\leq f(\mathbf{x})$ for all $\mathbf{x}\in\mathcal{C}$.) Note that the relaxation from the rank function to the nuclear norm is conceptually analogous to the relaxation from the $\ell_{0}$-norm to the $\ell_{1}$-norm in compressed sensing (CS) [39, 40, 41].
Now, a natural question one might ask is whether the NNM problem in (11) offers a solution comparable to that of the rank minimization problem in (10). It has been shown that if the observed entries of an $n\times n$ matrix of rank $r$ are suitably random and the number of observed entries $m$ satisfies
$$m\geq Cn^{1.2}r\log n$$
for some constant $C$, then the solution of the NNM problem coincides with that of the rank minimization problem with overwhelming probability (see Appendix B).
where $\{\mathbf{A}_{i}\}$ is the sequence of linear sampling matrices and $\{b_{i}\}$ are the observed entries. The problem (13) can be solved by off-the-shelf SDP solvers such as SDPT3 and SeDuMi using interior-point methods [26, 27, 28, 31, 30, 29]. It has been shown that, under suitable conditions, the output of the SDP solver reaches the desired accuracy within a number of iterations bounded by a constant. Alternatively, one can reconstruct the matrix by solving an equivalent nonconvex quadratic optimization form of the NNM problem. Note that this approach has a computational benefit since the number of primal variables of NNM is reduced from $n_{1}n_{2}$ to $r(n_{1}+n_{2})$. Interested readers may refer to the literature for more details.
II-A2 Singular Value Thresholding (SVT)
As an effort to mitigate the computational burden, the singular value thresholding (SVT) algorithm has been proposed. The key idea of this approach is to add a Frobenius-norm regularization term to the objective function of the NNM problem:
$$\min_{\mathbf{X}}\ \tau\|\mathbf{X}\|_{*}+\tfrac{1}{2}\|\mathbf{X}\|_{F}^{2}\quad\text{subject to}\quad P_{\Omega}(\mathbf{X})=P_{\Omega}(\mathbf{M}),$$
where $\tau>0$ is the regularization parameter. In [33, Theorem 3.1], it has been shown that the solution to the problem (14) converges to the solution of the NNM problem as $\tau\to\infty$. (In practice, a large value of $\tau$ has been suggested, e.g., $\tau=5n$ for an $n\times n$ low-rank matrix, for the fast convergence of SVT. For example, when $\tau=5n$, it requires 177 iterations to reconstruct a $1000\times 1000$ matrix of rank 10.)
Let $\mathcal{L}(\mathbf{X},\mathbf{Y})$ be the Lagrangian function associated with (14), i.e.,
$$\mathcal{L}(\mathbf{X},\mathbf{Y})=\tau\|\mathbf{X}\|_{*}+\tfrac{1}{2}\|\mathbf{X}\|_{F}^{2}+\langle \mathbf{Y},P_{\Omega}(\mathbf{M})-P_{\Omega}(\mathbf{X})\rangle,$$
where $\mathbf{Y}$ is the dual variable. Let $\widehat{\mathbf{X}}$ and $\widehat{\mathbf{Y}}$ be the primal and dual optimal solutions. Then, by the strong duality, we have
$$\mathcal{L}(\widehat{\mathbf{X}},\mathbf{Y})\leq\mathcal{L}(\widehat{\mathbf{X}},\widehat{\mathbf{Y}})\leq\mathcal{L}(\mathbf{X},\widehat{\mathbf{Y}}).$$
The SVT algorithm finds $\widehat{\mathbf{X}}$ and $\widehat{\mathbf{Y}}$ in an iterative fashion. Specifically, starting with $\mathbf{Y}^{0}=\mathbf{0}$, SVT updates $\mathbf{X}^{k}$ and $\mathbf{Y}^{k}$ as
$$\mathbf{X}^{k}=\arg\min_{\mathbf{X}}\ \mathcal{L}(\mathbf{X},\mathbf{Y}^{k-1}),\qquad \mathbf{Y}^{k}=\mathbf{Y}^{k-1}+\delta_{k}P_{\Omega}(\mathbf{M}-\mathbf{X}^{k}),$$
where $\{\delta_{k}\}$ is a sequence of positive step sizes. Note that $\mathbf{X}^{k}$ can be expressed as
$$\mathbf{X}^{k}=\arg\min_{\mathbf{X}}\ \tau\|\mathbf{X}\|_{*}+\tfrac{1}{2}\|\mathbf{X}-\mathbf{Y}^{k-1}\|_{F}^{2},$$
where we use the fact that $\mathbf{Y}^{k-1}$ vanishes outside of $\Omega$ (i.e., $P_{\Omega}(\mathbf{Y}^{k-1})=\mathbf{Y}^{k-1}$) by (17b). Due to the inclusion of the nuclear norm, finding the solution of (18) seems difficult. However, thanks to an intriguing result of Cai et al., we can easily obtain the solution.
Theorem 1 ([33, Theorem 2.1]).
Let $\mathbf{Y}$ be a matrix whose singular value decomposition (SVD) is $\mathbf{Y}=\mathbf{U}\boldsymbol{\Sigma}\mathbf{V}^{T}$ with $\boldsymbol{\Sigma}=\operatorname{diag}(\{\sigma_{i}\})$. Then,
$$\mathcal{D}_{\tau}(\mathbf{Y})=\arg\min_{\mathbf{X}}\ \tau\|\mathbf{X}\|_{*}+\tfrac{1}{2}\|\mathbf{X}-\mathbf{Y}\|_{F}^{2},$$
where $\mathcal{D}_{\tau}$ is the singular value thresholding operator defined as
$$\mathcal{D}_{\tau}(\mathbf{Y})=\mathbf{U}\operatorname{diag}(\{\max(\sigma_{i}-\tau,0)\})\mathbf{V}^{T}.$$
One can notice from (21a) and (21b) that the SVT algorithm is computationally efficient since we only need a truncated SVD and elementary matrix operations in each iteration. Indeed, let $r_{k}$ be the number of singular values of $\mathbf{Y}^{k-1}$ greater than the threshold $\tau$, and suppose $r_{k}$ converges to the rank of the original matrix, i.e., $r_{k}\to r$. Then the computational complexity of SVT is $O(rn_{1}n_{2})$ per iteration. Note also that the number of iterations to achieve an $\epsilon$-approximation (by $\epsilon$-approximation, we mean $\|\widehat{\mathbf{X}}-\mathbf{X}^{*}\|_{F}^{2}\leq\epsilon$, where $\widehat{\mathbf{X}}$ is the reconstructed matrix and $\mathbf{X}^{*}$ is the optimal solution of SVT) is $O(1/\sqrt{\epsilon})$. In Table I, we summarize the SVT algorithm. For the details of the stopping criterion of SVT, see [33, Section 5].
Over the years, various SVT-based techniques have been proposed [35, 78, 79]. For example, an iterative matrix completion algorithm using an SVT-based operator called the proximal operator has been proposed. Similar algorithms inspired by the iterative hard thresholding (IHT) algorithm in CS have also been proposed [35, 79].
Table I: The SVT algorithm
Input: observed entries $P_{\Omega}(\mathbf{M})$, a sequence of positive step sizes $\{\delta_{k}\}$, a regularization parameter $\tau$, and a stopping criterion
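To make the procedure in Table I concrete, here is a minimal sketch of SVT. The problem sizes and parameters are hypothetical; the regularization $\tau=5n$ and the fixed step size follow common suggestions but are not tuned:

```python
import numpy as np

def svt(M_obs, mask, tau, delta, n_iter=1000, tol=1e-4):
    """Singular value thresholding (SVT) sketch for matrix completion.

    M_obs : matrix holding the observed entries (zeros elsewhere)
    mask  : boolean matrix, True on the observed index set Omega
    """
    Y = np.zeros_like(M_obs)
    X = np.zeros_like(M_obs)
    for _ in range(n_iter):
        # Shrinkage step: soft-threshold the singular values of Y.
        U, s, Vt = np.linalg.svd(Y, full_matrices=False)
        X = (U * np.maximum(s - tau, 0.0)) @ Vt
        # Dual update on the observed entries only.
        residual = mask * (M_obs - X)
        Y = Y + delta * residual
        if np.linalg.norm(residual) <= tol * np.linalg.norm(M_obs):
            break
    return X

# Hypothetical test: a 50x50 rank-2 matrix with 60% of entries observed.
rng = np.random.default_rng(2)
M = rng.standard_normal((50, 2)) @ rng.standard_normal((2, 50))
mask = rng.random((50, 50)) < 0.6
X_hat = svt(mask * M, mask, tau=5 * 50, delta=1.2)
obs_err = np.linalg.norm(mask * (X_hat - M)) / np.linalg.norm(mask * M)
print(obs_err)   # relative error on the observed entries
```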
II-A3 Iteratively Reweighted Least Squares (IRLS) Minimization
Yet another simple and computationally efficient way to solve the NNM problem is the IRLS minimization technique [36, 37]. In essence, the NNM problem can be recast as a weighted least squares problem
$$\min_{\mathbf{X}}\ \|\mathbf{W}^{1/2}\mathbf{X}\|_{F}^{2}\quad\text{subject to}\quad P_{\Omega}(\mathbf{X})=P_{\Omega}(\mathbf{M}),$$
where $\mathbf{W}$ is a weight matrix. The key idea of the IRLS technique is to find $\mathbf{X}$ and $\mathbf{W}$ in an iterative fashion. The update expressions are
$$\mathbf{X}^{k}=\arg\min_{P_{\Omega}(\mathbf{X})=P_{\Omega}(\mathbf{M})}\ \|(\mathbf{W}^{k-1})^{1/2}\mathbf{X}\|_{F}^{2},\qquad\mathbf{W}^{k}=(\mathbf{X}^{k}(\mathbf{X}^{k})^{T})^{-1/2}.$$
Note that the weighted least squares subproblem (24a) can be solved easily by updating each and every column of $\mathbf{X}$. In order to compute $\mathbf{W}^{k}$, we need a matrix inversion (24b). To avoid ill-behavior (i.e., some of the singular values of $\mathbf{X}^{k}$ approaching zero), an approach using a perturbation of the singular values has been proposed [36, 37]. Similar to SVT, the computational complexity per iteration of the IRLS-based technique is $O(rn_{1}n_{2})$. Also, IRLS requires $O(\log(1/\epsilon))$ iterations to achieve an $\epsilon$-approximation solution. We summarize the IRLS minimization technique in Table II.
Table II: The IRLS algorithm
Input: a constant, a scaling parameter, and a stopping criterion
Initialize: an iteration counter and a regularizing sequence
Compute an SVD-perturbed version of the current iterate
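As a rough illustration, the IRLS idea can be sketched in a few lines. This is a minimal sketch under simplifying assumptions: the weight update uses $\mathbf{W}=(\mathbf{X}\mathbf{X}^{T}+\gamma\mathbf{I})^{-1/2}$ with a hand-picked decaying perturbation $\gamma$, which only loosely follows the schedules of [36, 37]:

```python
import numpy as np

def irls_nnm(M_obs, mask, gamma=1e-2, n_iter=100):
    """IRLS sketch for nuclear norm minimization (assumed update rules).

    Alternates a weight update W = (X X^T + gamma*I)^(-1/2) with a
    column-wise weighted least squares solve, keeping observed entries fixed.
    """
    n1, n2 = M_obs.shape
    X = M_obs.copy()
    for _ in range(n_iter):
        # Perturbed weight update (gamma keeps the inverse well-behaved).
        G = X @ X.T + gamma * np.eye(n1)
        vals, vecs = np.linalg.eigh(G)
        W = (vecs / np.sqrt(vals)) @ vecs.T        # G^(-1/2)
        # Column-wise weighted LS: fill the unknown entries of each column.
        for j in range(n2):
            k = mask[:, j]                          # known row indices
            u = ~k                                  # unknown row indices
            if u.any():
                X[u, j] = -np.linalg.solve(W[np.ix_(u, u)],
                                           W[np.ix_(u, k)] @ M_obs[k, j])
        gamma *= 0.9                                # shrink the perturbation
    return X

# Hypothetical test: a 30x30 rank-2 matrix with 60% of entries observed.
rng = np.random.default_rng(4)
M = rng.standard_normal((30, 2)) @ rng.standard_normal((2, 30))
mask = rng.random((30, 30)) < 0.6
X_hat = irls_nnm(mask * M, mask)
rel_err = np.linalg.norm(X_hat - M) / np.linalg.norm(M)
```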
II-B LRMC Algorithms Using Rank Information
In many applications, such as localization in IoT networks, recommendation systems, and image restoration, we encounter situations where the rank of the desired matrix is known in advance. As mentioned, the rank of a Euclidean distance matrix in the localization problem is at most $k+2$ ($k$ is the dimension of the Euclidean space). In this situation, the LRMC problem can be formulated as a Frobenius norm minimization (FNM) problem:
$$\min_{\mathbf{X}}\ \|P_{\Omega}(\mathbf{M})-P_{\Omega}(\mathbf{X})\|_{F}^{2}\quad\text{subject to}\quad\operatorname{rank}(\mathbf{X})\leq r.$$
Due to the inequality form of the rank constraint, an approach using approximate rank information (e.g., an upper bound on the rank) has been proposed. The FNM problem has two main advantages: 1) the problem is well-posed in the noisy scenario and 2) the cost function is differentiable, so that various gradient-based optimization techniques (e.g., gradient descent, conjugate gradient, Newton methods, and manifold optimization) can be used to solve the problem.
Over the years, various techniques to solve the FNM problem in (25) have been proposed [43, 44, 45, 46, 47, 48, 49, 50, 51, 57]. The performance guarantee of the FNM-based techniques has also been provided [59, 60, 61]. It has been shown that, under suitable conditions on the sampling ratio and the largest coherence of the desired matrix (see the definition in Subsection III-A2), gradient-based algorithms globally converge to the original matrix with high probability. Well-known FNM-based LRMC techniques include greedy techniques, alternating projection techniques, and optimization over Riemannian manifolds. In this subsection, we explain these techniques in detail.
II-B1 Greedy Techniques
In recent years, greedy algorithms have been popularly used for LRMC due to their computational simplicity. In a nutshell, they solve the LRMC problem by making a heuristic decision at each iteration with the hope of finding the right solution in the end.
Let $r$ be the rank of the desired low-rank matrix $\mathbf{M}$ and $\mathbf{M}=\mathbf{U}\boldsymbol{\Sigma}\mathbf{V}^{T}$ be its singular value decomposition. By noting that
$$\mathbf{M}=\sum_{i=1}^{r}\sigma_{i}\mathbf{u}_{i}\mathbf{v}_{i}^{T},$$
$\mathbf{M}$ can be expressed as a linear combination of $r$ rank-one matrices. The main task of greedy techniques is to investigate the atom set $\mathcal{A}=\{\mathbf{u}_{i}\mathbf{v}_{i}^{T}\}_{i=1}^{r}$ of rank-one matrices representing $\mathbf{M}$. Once the atom set is found, the singular values can be computed easily by solving the least squares problem
$$\min_{\{\alpha_{i}\}}\ \Big\|P_{\Omega}(\mathbf{M})-P_{\Omega}\Big(\sum_{i=1}^{r}\alpha_{i}\mathbf{u}_{i}\mathbf{v}_{i}^{T}\Big)\Big\|_{F}^{2}.$$
To be specific, with $\alpha_{i}=\sigma_{i}$ we have $\mathbf{M}=\sum_{i=1}^{r}\alpha_{i}\mathbf{u}_{i}\mathbf{v}_{i}^{T}$.
One popular greedy technique is atomic decomposition for minimum rank approximation (ADMiRA), which can be viewed as an extension of the compressive sampling matching pursuit (CoSaMP) algorithm in CS [38, 39, 40, 41]. ADMiRA employs a strategy of adding as well as pruning to identify the atom set. In the adding stage, ADMiRA identifies $2r$ rank-one matrices best representing the residual and then adds these matrices to the pre-chosen atom set. Specifically, if $\mathbf{X}^{k}$ is the output matrix generated in the $k$-th iteration and $\mathcal{A}^{k}$ is its atom set, then ADMiRA computes the residual $\mathbf{R}^{k}=P_{\Omega}(\mathbf{M})-P_{\Omega}(\mathbf{X}^{k})$ and then adds the $2r$ leading principal components of $\mathbf{R}^{k}$ to $\mathcal{A}^{k}$. In other words, the enlarged atom set is given by
$$\Psi^{k}=\mathcal{A}^{k}\cup\{\mathbf{u}_{i}\mathbf{v}_{i}^{T}:1\leq i\leq 2r\},$$
where $\mathbf{u}_{i}$ and $\mathbf{v}_{i}$ are the $i$-th principal left and right singular vectors of $\mathbf{R}^{k}$, respectively. Note that $\Psi^{k}$ contains at most $3r$ elements. In the pruning stage, ADMiRA refines $\Psi^{k}$ into a set of $r$ atoms. To be specific, if $\widetilde{\mathbf{X}}^{k}$ is the best rank-$r$ approximation in (29) (note that the solution to (29) can be computed in a similar way as in (27)), then the refined atom set is expressed as
$$\mathcal{A}^{k+1}=\{\widetilde{\mathbf{u}}_{i}\widetilde{\mathbf{v}}_{i}^{T}:1\leq i\leq r\},$$
where $\widetilde{\mathbf{u}}_{i}$ and $\widetilde{\mathbf{v}}_{i}$ are the $i$-th principal left and right singular vectors of $\widetilde{\mathbf{X}}^{k}$, respectively. The computational complexity of ADMiRA is mainly due to two operations: the least squares operation in (27) and the SVD-based operation to find the leading atoms of the required matrices. First, the cost of the pseudo-inverse in (27) scales with the number of observed entries $|\Omega|$. Second, the cost of performing a truncated SVD to find the leading atoms scales with $rn_{1}n_{2}$. Since $|\Omega|\leq n_{1}n_{2}$, the computational complexity of ADMiRA per iteration is $O(rn_{1}n_{2})$. Also, the number of iterations of ADMiRA to achieve an $\epsilon$-approximation is $O(\log(1/\epsilon))$. In Table III, we summarize the ADMiRA algorithm.
Table III: The ADMiRA algorithm
Input: observed entries $P_{\Omega}(\mathbf{M})$, the rank $r$ of the desired low-rank matrix, and a stopping criterion
Initialize: an iteration counter
Yet another well-known greedy method is the rank-one matrix pursuit algorithm, an extension of the orthogonal matching pursuit algorithm in CS. In this approach, instead of choosing multiple atoms at a time, a single atom corresponding to the largest singular value of the residual matrix is chosen per iteration.
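A rough sketch of this greedy idea follows. It is a simplified variant under stated assumptions: one atom per step from the top singular pair of the residual on $\Omega$, with a full least squares refit of all atom weights (the published algorithm also has a cheaper "economic" weight update, which is not reproduced here):

```python
import numpy as np

def rank_one_pursuit(M_obs, mask, n_atoms=8):
    """Greedy rank-one pursuit sketch (simplified, assumed form).

    At each step, pick the top singular pair of the residual on Omega as a
    new rank-one atom, then refit all atom weights by least squares on the
    observed entries.
    """
    atoms = []
    X = np.zeros_like(M_obs)
    for _ in range(n_atoms):
        R = mask * (M_obs - X)
        U, s, Vt = np.linalg.svd(R, full_matrices=False)
        atoms.append(np.outer(U[:, 0], Vt[0]))     # leading rank-one atom
        # Least squares over the atom weights on the observed entries.
        A = np.column_stack([a[mask] for a in atoms])
        theta, *_ = np.linalg.lstsq(A, M_obs[mask], rcond=None)
        X = sum(t * a for t, a in zip(theta, atoms))
    return X

# Hypothetical test: a 30x30 rank-2 matrix with 70% of entries observed.
rng = np.random.default_rng(5)
M = rng.standard_normal((30, 2)) @ rng.standard_normal((2, 30))
mask = rng.random((30, 30)) < 0.7
X_hat = rank_one_pursuit(mask * M, mask)
obs_err = np.linalg.norm(mask * (M - X_hat)) / np.linalg.norm(mask * M)
```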
II-B2 Alternating Minimization Techniques
Many LRMC algorithms [33, 43] require the computation of a (partial) SVD to obtain the singular values and vectors. As an effort to further reduce the computational burden of the SVD, alternating minimization techniques have been proposed [45, 46, 47]. The basic premise behind alternating minimization is that a low-rank matrix $\mathbf{M}$ of rank $r$ can be factorized into a tall matrix and a fat matrix, i.e., $\mathbf{M}=\mathbf{X}\mathbf{Y}$ where $\mathbf{X}\in\mathbb{R}^{n_{1}\times r}$ and $\mathbf{Y}\in\mathbb{R}^{r\times n_{2}}$. The key idea of this approach is to find $\mathbf{X}$ and $\mathbf{Y}$ minimizing the residual, defined as the difference between the original matrix and its estimate on the sampling space. In other words, they recover $\mathbf{X}$ and $\mathbf{Y}$ by solving
$$\min_{\mathbf{X},\mathbf{Y}}\ \tfrac{1}{2}\|P_{\Omega}(\mathbf{M})-P_{\Omega}(\mathbf{X}\mathbf{Y})\|_{F}^{2}.$$
Alternating steepest descent (ASD) is another alternating method to find the solution of (31). The key idea of ASD is to update $\mathbf{X}$ and $\mathbf{Y}$ by applying the steepest gradient descent method to the objective function $f(\mathbf{X},\mathbf{Y})=\tfrac{1}{2}\|P_{\Omega}(\mathbf{M})-P_{\Omega}(\mathbf{X}\mathbf{Y})\|_{F}^{2}$ in (31). Specifically, ASD first computes the gradient of $f$ with respect to $\mathbf{X}$ and then updates $\mathbf{X}$ along the steepest descent direction:
$$\mathbf{X}_{k+1}=\mathbf{X}_{k}-t_{x_{k}}\nabla_{\mathbf{X}}f(\mathbf{X}_{k},\mathbf{Y}_{k}),$$
where the gradient and step size are given by
$$\nabla_{\mathbf{X}}f(\mathbf{X}_{k},\mathbf{Y}_{k})=-P_{\Omega}(\mathbf{M}-\mathbf{X}_{k}\mathbf{Y}_{k})\mathbf{Y}_{k}^{T},\qquad t_{x_{k}}=\frac{\|\nabla_{\mathbf{X}}f(\mathbf{X}_{k},\mathbf{Y}_{k})\|_{F}^{2}}{\|P_{\Omega}(\nabla_{\mathbf{X}}f(\mathbf{X}_{k},\mathbf{Y}_{k})\,\mathbf{Y}_{k})\|_{F}^{2}}.$$
After updating $\mathbf{X}$, ASD updates $\mathbf{Y}$ in a similar way:
$$\nabla_{\mathbf{Y}}f(\mathbf{X}_{k+1},\mathbf{Y}_{k})=-\mathbf{X}_{k+1}^{T}P_{\Omega}(\mathbf{M}-\mathbf{X}_{k+1}\mathbf{Y}_{k}),\qquad \mathbf{Y}_{k+1}=\mathbf{Y}_{k}-t_{y_{k}}\nabla_{\mathbf{Y}}f(\mathbf{X}_{k+1},\mathbf{Y}_{k}).$$
The low-rank matrix fitting (LMaFit) algorithm finds the solution in a different way, by solving
$$\min_{\mathbf{X},\mathbf{Y},\mathbf{Z}}\ \tfrac{1}{2}\|\mathbf{X}\mathbf{Y}-\mathbf{Z}\|_{F}^{2}\quad\text{subject to}\quad P_{\Omega}(\mathbf{Z})=P_{\Omega}(\mathbf{M}).$$
With arbitrary initial values $\mathbf{X}_{0}$, $\mathbf{Y}_{0}$, and $\mathbf{Z}_{0}$, the variables are updated in the $k$-th iteration as
$$\mathbf{X}_{k+1}=\mathbf{Z}_{k}\mathbf{Y}_{k}^{\dagger},\qquad \mathbf{Y}_{k+1}=\mathbf{X}_{k+1}^{\dagger}\mathbf{Z}_{k},\qquad \mathbf{Z}_{k+1}=\mathbf{X}_{k+1}\mathbf{Y}_{k+1}+P_{\Omega}(\mathbf{M}-\mathbf{X}_{k+1}\mathbf{Y}_{k+1}),$$
where $\mathbf{A}^{\dagger}$ is the Moore-Penrose pseudoinverse of the matrix $\mathbf{A}$.
The running time of the alternating minimization algorithms is very short for the following reasons: 1) they do not require an SVD computation and 2) the size of the matrices to be inverted is smaller than in the greedy algorithms. While the inversion of large matrices is required in greedy algorithms (see (27)), alternating minimization only requires pseudo-inversions of $\mathbf{X}$ and $\mathbf{Y}$ (of size $n_{1}\times r$ and $r\times n_{2}$, respectively). Indeed, the computational complexity of this approach scales with $r|\Omega|$ per iteration, which is much smaller than that of SVT and ADMiRA when $|\Omega|\ll n_{1}n_{2}$. Also, the number of iterations of ASD and LMaFit to achieve an $\epsilon$-approximation is $O(\log(1/\epsilon))$ [46, 47]. It has been shown that alternating minimization techniques are simple to implement and require only a small amount of memory. A major drawback of these approaches is that they might converge to a local optimum.
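The ASD updates can be sketched as follows, with hypothetical problem sizes; the exact line-search step sizes follow the expressions given for ASD, and the random initialization is an assumption:

```python
import numpy as np

def asd(M_obs, mask, r, n_iter=500):
    """Alternating steepest descent (ASD) sketch for rank-r completion."""
    n1, n2 = M_obs.shape
    rng = np.random.default_rng(0)
    X = rng.standard_normal((n1, r))
    Y = rng.standard_normal((r, n2))
    for _ in range(n_iter):
        # Gradient step in X with an exact line search.
        R = mask * (M_obs - X @ Y)
        Gx = -R @ Y.T
        denom = np.linalg.norm(mask * (Gx @ Y)) ** 2
        if denom > 0:
            X = X - (np.linalg.norm(Gx) ** 2 / denom) * Gx
        # Gradient step in Y with an exact line search.
        R = mask * (M_obs - X @ Y)
        Gy = -X.T @ R
        denom = np.linalg.norm(mask * (X @ Gy)) ** 2
        if denom > 0:
            Y = Y - (np.linalg.norm(Gy) ** 2 / denom) * Gy
    return X @ Y

# Hypothetical test: a 40x40 rank-2 matrix with 50% of entries observed.
rng = np.random.default_rng(6)
M = rng.standard_normal((40, 2)) @ rng.standard_normal((2, 40))
mask = rng.random((40, 40)) < 0.5
X_hat = asd(mask * M, mask, r=2)
obs_err = np.linalg.norm(mask * (M - X_hat)) / np.linalg.norm(mask * M)
```

Each half-step performs an exact line search along the negative gradient, so the objective is non-increasing; convergence to the global solution is not guaranteed, matching the local-optimum caveat above.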
II-B3 Optimization over Smooth Riemannian Manifold
In many applications where the rank of the matrix is known a priori (i.e., $\operatorname{rank}(\mathbf{M})=r$), one can strengthen the constraint of (25) by defining the feasible set, denoted by $\mathcal{F}$, as
$$\mathcal{F}=\{\mathbf{X}\in\mathbb{R}^{n_{1}\times n_{2}}:\operatorname{rank}(\mathbf{X})=r\}.$$
Note that $\mathcal{F}$ is not a vector space (if $\mathbf{X}_{1},\mathbf{X}_{2}\in\mathcal{F}$, then $\operatorname{rank}(\mathbf{X}_{1}+\mathbf{X}_{2})=r$ is not necessarily true, and thus $\mathbf{X}_{1}+\mathbf{X}_{2}$ need not belong to $\mathcal{F}$), and thus conventional optimization techniques cannot be used to solve the problem defined over $\mathcal{F}$. While this is bad news, a remedy is that $\mathcal{F}$ is a smooth Riemannian manifold [53, 48]. Roughly speaking, a smooth manifold is a generalization of the Euclidean space $\mathbb{R}^{n}$ on which a notion of differentiability exists. For a more rigorous definition, see, e.g., [55, 56]. A smooth manifold equipped with an inner product, often called a Riemannian metric, forms a smooth Riemannian manifold. Since the smooth Riemannian manifold is a differentiable structure equipped with an inner product, one has all the necessary ingredients to solve the optimization problem with a quadratic cost function, such as the Riemannian gradient, Hessian matrix, exponential map, and parallel translation. Therefore, optimization techniques in $\mathbb{R}^{n}$ (e.g., steepest descent, Newton's method, and the conjugate gradient method) can be used to solve (25) over the smooth Riemannian manifold $\mathcal{F}$.
In recent years, many efforts have been made to solve matrix completion over smooth Riemannian manifolds. These works are classified by their specific choice of Riemannian manifold structure. One well-known approach is to solve (25) over the Grassmann manifold of orthogonal matrices. (The Grassmann manifold $\mathcal{G}(k,\mathbb{R}^{n})$ is defined as the set of $k$-dimensional linear subspaces of the vector space $\mathbb{R}^{n}$.) In this approach, the feasible set is parameterized by orthonormal matrices, and solving (25) amounts to finding an orthonormal matrix that minimizes the residual on the observed entries.
Recently, it has been shown that the original matrix can be reconstructed by unconstrained optimization over the smooth Riemannian manifold $\mathcal{F}$. Often, $\mathbf{X}\in\mathcal{F}$ is expressed using the singular value decomposition as
$$\mathbf{X}=\mathbf{U}\boldsymbol{\Sigma}\mathbf{V}^{T},\qquad \mathbf{U}\in\mathbb{R}^{n_{1}\times r},\ \boldsymbol{\Sigma}\in\mathbb{R}^{r\times r},\ \mathbf{V}\in\mathbb{R}^{n_{2}\times r}.$$
The FNM problem (25) can then be reformulated as an unconstrained optimization over $\mathcal{F}$:
$$\min_{\mathbf{X}\in\mathcal{F}}\ \tfrac{1}{2}\|P_{\Omega}(\mathbf{M})-P_{\Omega}(\mathbf{X})\|_{F}^{2}.$$
One can easily obtain closed-form expressions for the required ingredients, such as tangent spaces, the Riemannian metric, the Riemannian gradient, and the Hessian matrix, in this unconstrained optimization [53, 55, 56]. In fact, the major benefits of the Riemannian optimization-based LRMC techniques are their simplicity of implementation and fast convergence. Similar to ASD, the computational complexity per iteration of these techniques scales with $r|\Omega|$, and they require $O(\log(1/\epsilon))$ iterations to achieve an $\epsilon$-approximation solution.
II-B4 Truncated NNM
Truncated NNM is a variation of the NNM-based technique requiring the rank information. (Although truncated NNM is a variant of NNM, we put it into the second category since it exploits the rank information of the low-rank matrix.) While the NNM technique takes into account all the singular values of the desired matrix, truncated NNM considers only the $\min(n_{1},n_{2})-r$ smallest singular values. Specifically, truncated NNM finds a solution to
$$\min_{\mathbf{X}}\ \|\mathbf{X}\|_{r}\quad\text{subject to}\quad P_{\Omega}(\mathbf{X})=P_{\Omega}(\mathbf{M}),$$
where $\|\mathbf{X}\|_{r}=\sum_{i=r+1}^{\min(n_{1},n_{2})}\sigma_{i}(\mathbf{X})$. We recall that $\sigma_{i}(\mathbf{X})$ is the $i$-th largest singular value of $\mathbf{X}$. Using
$$\sum_{i=1}^{r}\sigma_{i}(\mathbf{X})=\max_{\mathbf{A}\mathbf{A}^{T}=\mathbf{I},\ \mathbf{B}\mathbf{B}^{T}=\mathbf{I}}\operatorname{tr}(\mathbf{A}\mathbf{X}\mathbf{B}^{T}),$$
we have $\|\mathbf{X}\|_{r}=\|\mathbf{X}\|_{*}-\max\operatorname{tr}(\mathbf{A}\mathbf{X}\mathbf{B}^{T})$, and thus the problem (42) can be reformulated as
$$\min_{\mathbf{X}}\ \|\mathbf{X}\|_{*}-\max_{\mathbf{A}\mathbf{A}^{T}=\mathbf{I},\ \mathbf{B}\mathbf{B}^{T}=\mathbf{I}}\operatorname{tr}(\mathbf{A}\mathbf{X}\mathbf{B}^{T})\quad\text{subject to}\quad P_{\Omega}(\mathbf{X})=P_{\Omega}(\mathbf{M}).$$
This problem can be solved in an iterative way. Specifically, starting from $\mathbf{X}_{0}=P_{\Omega}(\mathbf{M})$, truncated NNM updates $\mathbf{X}_{k+1}$ by solving
$$\min_{\mathbf{X}}\ \|\mathbf{X}\|_{*}-\operatorname{tr}(\mathbf{A}_{k}\mathbf{X}\mathbf{B}_{k}^{T})\quad\text{subject to}\quad P_{\Omega}(\mathbf{X})=P_{\Omega}(\mathbf{M}),$$
where $\mathbf{A}_{k}$ and $\mathbf{B}_{k}$ are the matrices of the $r$ leading left and right singular vectors of $\mathbf{X}_{k}$, respectively. We note that the approach in (46) has two main advantages: 1) the rank information of the desired matrix can be incorporated and 2) various gradient-based techniques, including the alternating direction method of multipliers (ADMM) [80, 81], ADMM with an adaptive penalty (ADMMAP), and the accelerated proximal gradient line search method (APGL), can be employed. Note also that the dominant operation is the truncated SVD, whose complexity is much smaller than that of the NNM technique (see Table V) as long as $r\ll\min(n_{1},n_{2})$.
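The truncated nuclear norm and the trace-maximization identity it relies on can be checked numerically on a small example (the matrix below is a hypothetical random one; the maximum over orthonormal $\mathbf{A}$, $\mathbf{B}$ is attained by the $r$ leading singular vectors):

```python
import numpy as np

rng = np.random.default_rng(3)
X = rng.standard_normal((6, 5))
s = np.linalg.svd(X, compute_uv=False)
r = 2

# Truncated nuclear norm: sum of all but the r largest singular values.
trunc_nn = s[r:].sum()

# Equivalent form: nuclear norm minus max tr(A X B^T) over orthonormal
# A and B, where the maximum is attained by the leading singular vectors.
U, _, Vt = np.linalg.svd(X, full_matrices=False)
A = U[:, :r].T                      # r x n1, orthonormal rows
B = Vt[:r]                          # r x n2, orthonormal rows
assert np.isclose(np.trace(A @ X @ B.T), s[:r].sum())
assert np.isclose(s.sum() - np.trace(A @ X @ B.T), trunc_nn)
```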