1 Introduction
The rank minimization problem has a wide range of applications in matrix completion (MC) [1], robust principal component analysis (RPCA) [2], low-rank representation [3], multivariate regression [4] and multi-task learning [5]. To solve these problems efficiently, a principled way is to relax the rank function by its convex envelope [6, 7], i.e., the trace norm (also known as the nuclear norm), which leads to a convex optimization problem. In fact, the trace norm penalty is an $\ell_1$-norm regularization of the singular values, and thus it motivates a low-rank solution. However,
[8] pointed out that the $\ell_1$-norm over-penalizes large entries of vectors and results in a biased solution. Similar to the $\ell_1$-norm case, the trace norm penalty shrinks all singular values equally, which also over-penalizes large singular values. In other words, the trace norm may make the solution deviate from the original solution just as the $\ell_1$-norm does. Compared with the trace norm, the Schatten-$p$ quasi-norm for $0<p<1$, although non-convex, gives a closer approximation to the rank function. Therefore, Schatten quasi-norm minimization has attracted a great deal of attention in image recovery [9, 10], collaborative filtering [11] and MRI analysis [12]. [13] and [14] proposed iterative reweighted least squares (IRLS) algorithms to approximate the associated Schatten quasi-norm minimization problems. In addition, [10] proposed an iteratively reweighted nuclear norm (IRNN) algorithm to solve non-convex surrogate minimization problems. In some recent work [15, 16, 11, 9, 10], the Schatten quasi-norm has been shown to be empirically superior to the trace norm. Moreover, [17] theoretically proved that Schatten quasi-norm minimization with small $p$ requires significantly fewer measurements than convex trace norm minimization. However, all existing algorithms have to be solved iteratively and involve singular value decomposition (SVD) or eigenvalue decomposition (EVD) in each iteration; thus they suffer from high computational cost and are not applicable to large-scale problems [18].

In contrast, the trace norm has a scalable equivalent formulation, the bilinear spectral regularization [19, 7], which has been successfully applied in many large-scale applications, such as collaborative filtering [20, 21]. Since the Schatten-$p$ quasi-norm is equivalent to the $\ell_p$ quasi-norm on the singular values, it is natural to ask the following question: can we design an equivalent matrix factorization form for some cases of the Schatten quasi-norm, e.g., $p=1/2$ or $p=1/3$?
In this paper, we first define two tractable Schatten quasi-norms, the bi-trace (Bitr) and tri-trace (Tritr) norms. We then prove that they are in essence the Schatten-$1/2$ and Schatten-$1/3$ quasi-norms, respectively, for whose minimization we only need to perform SVDs on much smaller factor matrices, in place of the SVDs on large matrices required by the algorithms mentioned above. We then design two efficient linearized alternating minimization algorithms with guaranteed convergence to solve our problems. Finally, we provide a sufficient condition for exact recovery, as well as restricted strong convexity (RSC) based and MC error bounds.
2 Notations and Background
The Schatten-$p$ norm ($0<p<\infty$) of a matrix $X\in\mathbb{R}^{m\times n}$ ($m\geq n$) is defined as
$$\|X\|_{S_p} = \Big(\sum_{i=1}^{n}\sigma_i^{p}(X)\Big)^{1/p},$$
where $\sigma_i(X)$ denotes the $i$-th singular value of $X$. For $p\geq 1$ it defines a natural norm; for instance, the Schatten-$1$ norm is the so-called trace norm, $\|X\|_{S_1}=\|X\|_*$, whereas for $0<p<1$ it defines a quasi-norm. As a non-convex surrogate for the rank function, the Schatten-$p$ quasi-norm with $0<p<1$ is a better approximation of the matrix rank than the trace norm [17] (analogous to the superiority of the $\ell_p$ quasi-norm over the $\ell_1$-norm [14, 22]).
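As a concrete illustration, the Schatten-$p$ norm is simply the $\ell_p$ (quasi-)norm of the singular values; a minimal numpy sketch (the helper name `schatten_p` is ours, not from the paper):

```python
import numpy as np

def schatten_p(X, p):
    """Schatten-p (quasi-)norm: the l_p (quasi-)norm of the singular values."""
    s = np.linalg.svd(X, compute_uv=False)
    return np.sum(s ** p) ** (1.0 / p)

# For p = 1 this is the trace (nuclear) norm; for p = 2, the Frobenius norm.
X = np.diag([3.0, 1.0])
trace_norm = schatten_p(X, 1.0)     # 3 + 1 = 4
half_quasi = schatten_p(X, 0.5)     # (3^0.5 + 1)^2
```

For $p<1$ the value grows quickly with the number of non-zero singular values, which is why it promotes low rank more aggressively than the trace norm.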
We mainly consider the following Schatten-$p$ quasi-norm minimization problem to recover a low-rank matrix $X$ from a small set of linear observations $y\in\mathbb{R}^{s}$:
(1) $\min_{X}\ \|X\|_{S_p}^{p},\quad \text{s.t.}\ \mathcal{A}(X)=y,$
where $\mathcal{A}:\mathbb{R}^{m\times n}\to\mathbb{R}^{s}$ is a linear measurement operator. Alternatively, the Lagrangian version of (1) is
(2) $\min_{X}\ \lambda\|X\|_{S_p}^{p} + l(\mathcal{A}(X)-y),$
where $\lambda>0$ is a regularization parameter, and the loss function $l(\cdot)$ generally denotes a certain measure of the loss term (for instance, $\mathcal{A}$ is the linear projection operator $\mathcal{P}_{\Omega}$ and $l(\cdot)=\frac{1}{2}\|\cdot\|_2^2$ in MC problems [15, 13, 23, 10]). The Schatten-$p$ quasi-norm minimization problems (1) and (2) are non-convex, non-smooth and even non-Lipschitz [24]. Therefore, it is crucial to develop efficient algorithms that are specialized to solve some alternative formulations of Schatten quasi-norm minimization (1) or (2). So far, only a few algorithms, such as IRLS [14, 13] and IRNN [10], have been developed to solve such problems. In addition, since all existing Schatten quasi-norm minimization algorithms involve SVD or EVD in each iteration, they suffer from a high per-iteration computational cost (a full SVD of an $m\times n$ matrix with $m\geq n$ costs $O(mn^2)$), which severely limits their applicability to large-scale problems.
3 Tractable Schatten Quasi-Norm Minimization
Lemma 1.
Given a matrix $X\in\mathbb{R}^{m\times n}$ with $\operatorname{rank}(X)=r\leq d$, the following holds:
$$\|X\|_* = \min_{U\in\mathbb{R}^{m\times d},\,V\in\mathbb{R}^{n\times d}:\,X=UV^{T}}\ \frac{\|U\|_F^2+\|V\|_F^2}{2}.$$
3.1 Bi-Trace Quasi-Norm
Motivated by the equivalence between the trace norm and its bilinear spectral regularization form stated in Lemma 1, our bi-trace (Bitr) norm is naturally defined as follows [18].
Definition 1.
For any matrix $X\in\mathbb{R}^{m\times n}$ with $\operatorname{rank}(X)=r\leq d$, we can factorize it into two much smaller matrices $U\in\mathbb{R}^{m\times d}$ and $V\in\mathbb{R}^{n\times d}$ such that $X=UV^{T}$. Then the bi-trace norm of $X$ is defined as
$$\|X\|_{\text{Bitr}} := \min_{U,V:\,X=UV^{T}}\ \frac{\|U\|_*+\|V\|_*}{2}.$$
In fact, the bi-trace norm defined above is not a true norm, because it is non-convex and does not satisfy the triangle inequality. Like the well-known Schatten-$p$ quasi-norm ($0<p<1$), the bi-trace norm is also a quasi-norm, and their relationship is stated in the following theorem [18].
Theorem 1.
The bi-trace norm is a quasi-norm. Surprisingly, it is also the Schatten-$1/2$ quasi-norm, i.e.,
$$\|X\|_{\text{Bitr}} = \big(\|X\|_{S_{1/2}}\big)^{1/2} = \sum_{i}\sigma_i^{1/2}(X),$$
where $\|X\|_{S_{1/2}}$ is the Schatten-$1/2$ quasi-norm of $X$.
The proof of Theorem 1 can be found in the Supplementary Materials. Due to this relationship, it is easy to verify that the bi-trace quasi-norm possesses the following properties.
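The relationship in Theorem 1 can be sanity-checked numerically: at the balanced factorization obtained from the SVD, the bi-trace objective evaluates to $\sum_i\sigma_i^{1/2}(X)$, while unbalanced factorizations only increase it. A sketch assuming the balanced split attains the minimum (which is the content of the theorem):

```python
import numpy as np

def nuc(M):
    """Trace (nuclear) norm."""
    return np.linalg.svd(M, compute_uv=False).sum()

rng = np.random.default_rng(1)
r = 3
X = rng.standard_normal((8, r)) @ rng.standard_normal((r, 6))   # rank 3

U0, s, V0t = np.linalg.svd(X, full_matrices=False)
U = U0[:, :r] * np.sqrt(s[:r])       # balanced factors with X = U V^T
V = V0t[:r].T * np.sqrt(s[:r])

bitr = 0.5 * (nuc(U) + nuc(V))       # bi-trace objective at the balanced split
target = np.sum(np.sqrt(s[:r]))      # sum_i sigma_i^{1/2}(X)

# An unbalanced split keeps X fixed but increases the objective.
unbalanced = 0.5 * (nuc(2.0 * U) + nuc(0.5 * V))
```

Note that the two factor SVDs here are on $8\times 3$ and $6\times 3$ matrices, much smaller than the original $8\times 6$ matrix would be at realistic scales.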
Property 1.
For any matrix $X\in\mathbb{R}^{m\times n}$ with $\operatorname{rank}(X)=r\leq d$, the following holds:
Property 2.
The bi-trace quasi-norm satisfies the following properties:

$\|X\|_{\text{Bitr}}\geq 0$, with equality iff $X=0$.

$\|\cdot\|_{\text{Bitr}}$ is unitarily invariant, i.e., $\|PXQ^{T}\|_{\text{Bitr}} = \|X\|_{\text{Bitr}}$, where $P$ and $Q$ have orthonormal columns.
3.2 Tri-Trace Quasi-Norm
Similar to the definition of the bi-trace quasi-norm, our tri-trace (Tritr) norm is naturally defined as follows.
Definition 2.
For any matrix $X\in\mathbb{R}^{m\times n}$ with $\operatorname{rank}(X)=r\leq d$, we can factorize it into three much smaller matrices $U\in\mathbb{R}^{m\times d}$, $V\in\mathbb{R}^{d\times d}$ and $W\in\mathbb{R}^{n\times d}$ such that $X=UVW^{T}$. Then the tri-trace norm of $X$ is defined as
$$\|X\|_{\text{Tritr}} := \min_{U,V,W:\,X=UVW^{T}}\ \frac{\|U\|_*+\|V\|_*+\|W\|_*}{3}.$$
Like the bi-trace quasi-norm, the tri-trace norm is also a quasi-norm, as stated in the following theorem.
Theorem 2.
The tri-trace norm is a quasi-norm. In addition, it is also the Schatten-$1/3$ quasi-norm, i.e.,
$$\|X\|_{\text{Tritr}} = \big(\|X\|_{S_{1/3}}\big)^{1/3} = \sum_{i}\sigma_i^{1/3}(X).$$
The proof of Theorem 2 is very similar to that of Theorem 1 and is thus omitted. According to Theorem 2, it is easy to verify that the tri-trace quasi-norm possesses the following properties.
Property 3.
For any matrix $X\in\mathbb{R}^{m\times n}$ with $\operatorname{rank}(X)=r\leq d$, the following holds:
Property 4.
The tri-trace quasi-norm satisfies the following properties:

$\|X\|_{\text{Tritr}}\geq 0$, with equality iff $X=0$.

$\|\cdot\|_{\text{Tritr}}$ is unitarily invariant, i.e., $\|PXQ^{T}\|_{\text{Tritr}} = \|X\|_{\text{Tritr}}$, where $P$ and $Q$ have orthonormal columns.
The following relationship between the trace norm and the Frobenius norm is well known: $\|X\|_F \leq \|X\|_* \leq \sqrt{r}\,\|X\|_F$, where $r=\operatorname{rank}(X)$. Analogous bounds hold for the bi-trace and tri-trace quasi-norms, as stated in the following property.
Property 5.
For any matrix $X\in\mathbb{R}^{m\times n}$ with $\operatorname{rank}(X)=r\leq d$, the following inequalities hold:
Proof.
The proof of this property relies on standard properties of the $\ell_p$ quasi-norm of vectors for $0<p<1$. Suppose $X$ is of rank $r$, and denote its skinny SVD by $X=U\Sigma V^{T}$. By Theorems 1 and 2 and the properties of the $\ell_p$ quasi-norm applied to the singular values, the stated inequalities follow. ∎
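The well-known trace norm inequality cited above is easy to verify numerically; a small sketch:

```python
import numpy as np

rng = np.random.default_rng(2)
r = 4
X = rng.standard_normal((10, r)) @ rng.standard_normal((r, 7))  # rank 4

s = np.linalg.svd(X, compute_uv=False)
fro = np.linalg.norm(X, "fro")   # sqrt of sum of squared singular values
tr = s.sum()                     # sum of singular values
# ||X||_F <= ||X||_* <= sqrt(r) * ||X||_F
```

The left inequality is just $\|\sigma\|_2\leq\|\sigma\|_1$, and the right one is Cauchy-Schwarz over the $r$ non-zero singular values.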
It is easy to see that Property 5 in turn implies that any low bi-trace or tri-trace quasi-norm approximation is also a low trace norm approximation.
3.3 Problem Formulations
Bounding the Schatten quasi-norm of $X$ in (1) by the bi-trace or tri-trace quasi-norm defined above, the noiseless low-rank structured matrix factorization problem is given by
(3) $\min_{U,V}\ \frac{\|U\|_*+\|V\|_*}{2},\quad \text{s.t.}\ \mathcal{A}(UV^{T})=y,$
where the objective can also be the tri-trace form $\frac{\|U\|_*+\|V\|_*+\|W\|_*}{3}$, and $X$ is replaced by $UV^{T}$ (or $UVW^{T}$). In addition, (3) has the following Lagrangian forms:
(4) $\min_{U,V}\ \frac{\lambda(\|U\|_*+\|V\|_*)}{2} + l(\mathcal{A}(UV^{T})-y),$
(5) $\min_{U,V,W}\ \frac{\lambda(\|U\|_*+\|V\|_*+\|W\|_*)}{3} + l(\mathcal{A}(UVW^{T})-y).$
The formulations (3), (4) and (5) can address a wide range of problems, such as MC [13, 10], RPCA [2, 25, 26] (where $\mathcal{A}$ is the identity operator and the loss is, e.g., the $\ell_1$-norm), and low-rank representation [3] or multivariate regression [4] (where $\mathcal{A}(X)=AX$ with $A$ a given matrix). In addition, $l(\cdot)$ may also be chosen as the Hinge loss as in [19] or the structured atomic norms as in [27].
4 Optimization Algorithms
In this section, we propose two efficient algorithms to solve the challenging bi-trace quasi-norm regularized problem (4) with a smooth or non-smooth loss function, respectively. If $l(\cdot)$ is a smooth loss function, e.g., the squared loss, we employ the proximal alternating linearized minimization (PALM) method as in [28] to solve (4). In contrast, to efficiently solve (4) with a non-smooth loss function, e.g., the $\ell_1$-norm loss, we introduce an auxiliary variable $e$ and obtain the following equivalent form:
(6) $\min_{U,V,e}\ \frac{\lambda(\|U\|_*+\|V\|_*)}{2} + l(e),\quad \text{s.t.}\ \mathcal{A}(UV^{T})-y=e.$
4.1 LADM Algorithm
To avoid introducing more auxiliary variables, inspired by [29], we propose a linearized alternating direction method (LADM) to solve (6), whose augmented Lagrangian function is given by
$$L_{\mu}(U,V,e,\Lambda) = \frac{\lambda(\|U\|_*+\|V\|_*)}{2} + l(e) + \langle \Lambda,\ \mathcal{A}(UV^{T})-y-e\rangle + \frac{\mu}{2}\,\|\mathcal{A}(UV^{T})-y-e\|_2^2,$$
where $\Lambda$ is the Lagrange multiplier, $\langle\cdot,\cdot\rangle$ denotes the inner product, and $\mu>0$ is a penalty parameter. By applying the classical augmented Lagrangian method to (6), we obtain the following iterative scheme:
(7a) $U_{k+1} = \arg\min_{U}\ L_{\mu_k}(U, V_k, e_k, \Lambda_k),$
(7b) $V_{k+1} = \arg\min_{V}\ L_{\mu_k}(U_{k+1}, V, e_k, \Lambda_k),$
(7c) $e_{k+1} = \arg\min_{e}\ L_{\mu_k}(U_{k+1}, V_{k+1}, e, \Lambda_k),$
(7d) $\Lambda_{k+1} = \Lambda_k + \mu_k\big(\mathcal{A}(U_{k+1}V_{k+1}^{T})-y-e_{k+1}\big).$
In many machine learning problems [15, 3, 4], $\mathcal{A}$ is not the identity operator, e.g., it is the projection operator $\mathcal{P}_{\Omega}$. Due to the presence of $\mathcal{A}$ and the bilinear term $UV^{T}$, we would usually need to introduce additional auxiliary variables to obtain closed-form solutions to (7a) and (7b). To avoid introducing such variables, we propose the following linearization technique for (7a) and (7b).

4.1.1 Updating $U$ and $V$
Let $h(U)$ denote the smooth part of (7a) as a function of $U$; then the gradient of $h$ is Lipschitz continuous with some constant $\tau_k>0$, i.e., $\|\nabla h(U_1)-\nabla h(U_2)\|_F\leq \tau_k\|U_1-U_2\|_F$ for any $U_1,U_2$. By linearizing $h$ at $U_k$ and adding a proximal term, we have
(8) $U_{k+1} = \arg\min_{U}\ \frac{\lambda}{2}\|U\|_* + \langle \nabla h(U_k),\, U-U_k\rangle + \frac{\tau_k}{2}\|U-U_k\|_F^2.$
Therefore, we have the closed-form solution
(9) $U_{k+1} = \mathrm{SVT}_{\lambda/(2\tau_k)}\big(U_k - \nabla h(U_k)/\tau_k\big),$
where $\mathrm{SVT}_{\rho}(\cdot)$ denotes the singular value thresholding operator [30].
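The singular value thresholding operator used in the update above is the proximal operator of the trace norm [30]; a minimal sketch:

```python
import numpy as np

def svt(M, tau):
    """Singular value thresholding: prox of tau * ||.||_*,
    i.e., argmin_X tau * ||X||_* + 0.5 * ||X - M||_F^2."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return (U * np.maximum(s - tau, 0.0)) @ Vt
```

Crucially, here the SVD is applied only to the small factor matrices ($m\times d$ or $n\times d$), not to the full $m\times n$ matrix.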
4.1.2 Computing the Step Sizes
Two step sizes, i.e., the Lipschitz constant $\tau_k$ in (9) and its counterpart in the analogous $V$-update (10), need to be set during the iterations. Let $\mathcal{A}^{*}$ denote the adjoint operator of $\mathcal{A}$. Both step sizes are then set to the corresponding Lipschitz constants, e.g.,
(11) $\tau_k = \mu_k\,\|\mathcal{A}^{*}\mathcal{A}\|\,\|V_k\|_2^2$
for the $U$-update, and analogously for the $V$-update.
Based on the description above, we develop an efficient LADM algorithm to solve the Bitr quasi-norm regularized problem (4) with a non-smooth loss function (e.g., RPCA problems), as outlined in Algorithm 1. To further accelerate convergence, the penalty parameter $\mu_k$ is adaptively updated by the strategy in [32]. Moreover, Algorithm 1 can be used to solve the noiseless problem (3), and can also be extended to solve the Tritr quasi-norm regularized problem (5) with a non-smooth loss function.
4.2 PALM Algorithm
By using the same linearization technique as in (9) and (10), we design an efficient PALM algorithm to solve (4) with a smooth loss function, e.g., MC problems. Specifically, by linearizing the smooth loss function $f(U):=l(\mathcal{A}(UV_k^{T})-y)$ at $U_k$ and adding a proximal term, we have the following approximation:
(12) $U_{k+1} = \arg\min_{U}\ \frac{\lambda}{2}\|U\|_* + \langle \nabla f(U_k),\, U-U_k\rangle + \frac{\tau_k^{u}}{2}\|U-U_k\|_F^2,$
where $\tau_k^{u}$ is the Lipschitz constant of $\nabla f$ with respect to $U$. Similarly,
(13) $V_{k+1} = \arg\min_{V}\ \frac{\lambda}{2}\|V\|_* + \langle \nabla g(V_k),\, V-V_k\rangle + \frac{\tau_k^{v}}{2}\|V-V_k\|_F^2,$
where $g(V):=l(\mathcal{A}(U_{k+1}V^{T})-y)$ and $\tau_k^{v}$ is the Lipschitz constant of $\nabla g$ with respect to $V$.
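To make the scheme concrete, here is a hedged sketch of a PALM-style loop for the MC problem with the squared loss: each block is updated by a gradient step with the Lipschitz step size, followed by the nuclear-norm proximal operator (SVT). All names and parameter choices are illustrative, not the paper's reference implementation; by the descent lemma, each update cannot increase the objective.

```python
import numpy as np

def svt(M, tau):
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return (U * np.maximum(s - tau, 0.0)) @ Vt

def palm_mc(D, mask, r, lam=1e-3, iters=100, seed=0):
    """PALM-style sketch for
    min_{U,V} lam*(||U||_* + ||V||_*)/2 + 0.5*||P_Omega(U V^T - D)||_F^2."""
    rng = np.random.default_rng(seed)
    m, n = D.shape
    U = 0.1 * rng.standard_normal((m, r))
    V = 0.1 * rng.standard_normal((n, r))
    for _ in range(iters):
        R = np.where(mask, U @ V.T - D, 0.0)          # residual on observed entries
        LU = max(np.linalg.norm(V, 2) ** 2, 1e-8)     # Lipschitz constant w.r.t. U
        U = svt(U - (R @ V) / LU, lam / (2.0 * LU))
        R = np.where(mask, U @ V.T - D, 0.0)
        LV = max(np.linalg.norm(U, 2) ** 2, 1e-8)     # Lipschitz constant w.r.t. V
        V = svt(V - (R.T @ U) / LV, lam / (2.0 * LV))
    return U, V

def mc_objective(U, V, D, mask, lam):
    nuc = lambda M: np.linalg.svd(M, compute_uv=False).sum()
    resid = np.where(mask, U @ V.T - D, 0.0)
    return 0.5 * lam * (nuc(U) + nuc(V)) + 0.5 * np.sum(resid ** 2)
```

Note that every SVD here is on an $m\times r$ or $n\times r$ factor, which is the source of the scalability advantage discussed earlier.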
4.3 Convergence Analysis
In the following, we provide the convergence analysis of our algorithms. First, we analyze the convergence of our LADM algorithm for solving (4) with a non-smooth loss function.
Theorem 3.
The proof of Theorem 3 is provided in the Supplementary Materials. Theorem 3 shows that, under mild conditions, each sequence generated by our LADM algorithm converges to a critical point, similar to the LADM algorithms for solving convex problems as in [32].
Moreover, we provide the global convergence of our PALM algorithm for solving (4) with a smooth loss function, e.g., the squared loss.
Theorem 4.
Let $\{(U_k, V_k)\}$ be a sequence generated by our PALM algorithm; then it is a Cauchy sequence and converges to a critical point of (4) with the squared loss.
The proof of Theorem 4 can be found in the Supplementary Materials. Theorem 4 shows the global convergence of our PALM algorithm. We emphasize that, different from the general subsequence convergence property, the global convergence property states that the whole sequence converges to a single critical point of (4) as the number of iterations $k\to\infty$. In contrast, existing algorithms for solving non-convex and non-smooth problems, such as [14] and [10], have only the subsequence convergence property.
By the Kurdyka-Łojasiewicz (KL) property (for more details, see [28]) and Theorem 2 in [33], our PALM algorithm has the following convergence rate:
Theorem 5.
The sequence $\{Z_k := (U_k, V_k)\}$ generated by our PALM algorithm converges to a critical point $\hat{Z}$ of the objective $\Phi$ of (4) with the squared loss, where $\Phi$ satisfies the KL property at each point with exponent $\theta\in[0,1)$. We have:

If $\theta=0$, $\{Z_k\}$ converges to $\hat{Z}$ in a finite number of steps;

If $\theta\in(0,1/2]$, then there exist $c>0$ and $\rho\in[0,1)$ such that $\|Z_k-\hat{Z}\|_F \leq c\,\rho^{k}$;

If $\theta\in(1/2,1)$, then there exists $c>0$ such that $\|Z_k-\hat{Z}\|_F \leq c\,k^{-(1-\theta)/(2\theta-1)}$.
5 Recovery Guarantees
We provide theoretical guarantees for our Bitr quasi-norm minimization for recovering low-rank matrices from small sets of linear observations. Using the null space property (NSP), we first provide a sufficient and necessary condition for exact recovery of low-rank matrices. We then establish the restricted strong convexity (RSC) based and MC error bounds.
5.1 Null Space Property
The wide use of the NSP for recovering sparse vectors and low-rank matrices can be found in [22, 34]. We give a sufficient and necessary condition for exact recovery via our bi-trace quasi-norm model (3), which improves the NSP condition for the Schatten quasi-norm in [34]. Let $X_0$ be the true matrix of rank at most $r$ (which satisfies $\mathcal{A}(X_0)=y$), and let $U_r$ and $V_r$ denote the matrices consisting of the top-$r$ left and right singular vectors of $X_0$, respectively. $\mathcal{N}(\mathcal{A})$ denotes the null space of the linear operator $\mathcal{A}$. Then we have the following theorem, the proof of which is provided in the Supplementary Materials.
Theorem 6.
The true matrix $X_0$ can be uniquely recovered by (3) if and only if, for any non-zero $Z\in\mathcal{N}(\mathcal{A})$, we have
(14) 
5.2 RSC based Error Bound
Unlike most existing recovery guarantees, such as those in [17, 34], we do not impose the restricted isometry property (RIP) on the general operator $\mathcal{A}$; rather, we require the operator to satisfy a weaker and more general condition known as restricted strong convexity (RSC) [35], as follows.
Assumption 1 (RSC).
We suppose that there is a positive constant $\kappa$ such that the general operator $\mathcal{A}$ satisfies the inequality
$$\|\mathcal{A}(\Delta)\|_2^2 \geq \kappa\,\|\Delta\|_F^2$$
for all matrices $\Delta$ in the restricted set.
We mainly provide the RSC based error bound for robust recovery via our bi-trace quasi-norm algorithm with noisy measurements. To our knowledge, our recovery guarantee analysis is the first one for the solutions actually generated by Schatten quasi-norm algorithms, rather than for the global optima of (4), as in [36, 17, 34]. (It is well known that the Schatten-$p$ quasi-norm ($0<p<1$) problems in [15, 11, 14, 10, 9] are non-convex, non-smooth and non-Lipschitz [24]; the recovery guarantees in [36, 17, 34] are naturally based on the global optimal solutions of the associated models.)
Theorem 7.
Assume $X_0$ is the true matrix and the corrupted measurements are $y=\mathcal{A}(X_0)+e$, where $e$ is noise with $\|e\|_2\leq\epsilon$. Let $(\widehat{U},\widehat{V})$ be a critical point of (4) with the squared loss, and suppose the operator $\mathcal{A}$ satisfies the RSC condition with a constant $\kappa$. Then the recovery error $\|\widehat{U}\widehat{V}^{T}-X_0\|_F$ is bounded by a quantity proportional to the noise level $\epsilon$, where the constant of proportionality depends on $\lambda$ and $\kappa$.
The proof of Theorem 7 and the analysis of the lower-boundedness of the involved constant are provided in the Supplementary Materials.
5.3 Error Bound on Matrix Completion
Although the MC problem is a practically important application of (4), the projection operator $\mathcal{P}_{\Omega}$ in (15) does not satisfy the standard RIP and RSC conditions in general [1, 37, 38]. Therefore, we also provide a recovery guarantee for the performance of our Bitr quasi-norm minimization for solving the following MC problem:
(15) $\min_{U,V}\ \frac{\lambda(\|U\|_*+\|V\|_*)}{2} + \frac{1}{2}\|\mathcal{P}_{\Omega}(UV^{T}-D)\|_F^2.$
Without loss of generality, assume that the observed matrix $D$ can be decomposed as a true matrix $X_0$ of rank $r$ plus a random Gaussian noise matrix $E$, i.e., $D=X_0+E$. We give the following recovery guarantee for our Bitr quasi-norm minimization (15).
Theorem 8.
Let $(\widehat{U},\widehat{V})$ be a critical point of problem (15) with given rank $d\geq r$. Then there exists an absolute constant $C>0$ such that, with high probability, the normalized recovery error $\|\widehat{U}\widehat{V}^{T}-X_0\|_F/\sqrt{mn}$ is bounded by the average magnitude of the noise entries plus terms that vanish as the number of observed entries grows.
The proof of Theorem 8 and the analysis of the lower-boundedness of the involved quantity can be found in the Supplementary Materials. When the sample size is large enough, the second and third terms diminish, and the recovery error is essentially bounded by the "average" magnitude of the entries of the noise $E$. In other words, significantly fewer observed entries are needed than in standard matrix completion theories [37, 39, 7], which is confirmed by the following experimental results.
6 Experimental Results
We evaluate both the effectiveness and efficiency of our methods (i.e., the Bitr and Tritr methods) for solving MC and RPCA problems, such as collaborative filtering and text separation. All experiments were conducted on a machine with an Intel Xeon E7-4830 v2 2.20GHz CPU and 64GB RAM.
6.1 Synthetic Matrix Completion
Synthetic matrices with rank $r$ are generated randomly by the following procedure: the entries of two random matrices $L_1\in\mathbb{R}^{m\times r}$ and $L_2\in\mathbb{R}^{r\times n}$ are first generated as independent and identically distributed (i.i.d.) numbers, and then $X=L_1L_2$ is assembled. The experiments are conducted on random matrices with different noise factors, where the observed subset is corrupted by i.i.d. standard Gaussian random variables as in [18]. In both cases, the sampling ratio (SR) is set to 20% or 30%. We use the relative standard error, $\mathrm{RSE} := \|\widehat{X}-X\|_F/\|X\|_F$, as the evaluation measure, where $\widehat{X}$ denotes the recovered matrix.

We compare our methods with two trace norm solvers, NNLS [40] and ALT [4]; one bilinear spectral regularization method, LRMF [20]; and two Schatten norm methods, IRLS [14] and IRNN [10]. The recovery results of IRLS and IRNN on noisy random matrices are shown in Figure 4, from which we observe that, as a scalable alternative to trace norm regularization, LRMF with relatively small ranks often obtains more accurate solutions than its trace norm counterparts, NNLS and ALT. If $p$ is chosen appropriately, IRLS and IRNN have similar performance and usually outperform NNLS, ALT and LRMF in terms of RSE; otherwise they sometimes perform much worse than the latter three methods. This suggests that both our methods (which are in essence Schatten-$1/2$ and Schatten-$1/3$ quasi-norm algorithms) should perform better than them. As expected, the RSE results of both our methods under all of these settings are consistently much better than those of the other approaches, which clearly justifies the usefulness of our Bitr and Tritr quasi-norm penalties. Moreover, the running time of all these methods on random matrices of different sizes is provided in the Supplementary Materials, showing that our methods are much faster than the other methods. This confirms that both our methods have very good scalability and can address large-scale problems.
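The synthetic setup described above can be sketched as follows (the sizes and noise factor here are illustrative):

```python
import numpy as np

rng = np.random.default_rng(3)
m, n, r = 200, 150, 10
L1 = rng.standard_normal((m, r))
L2 = rng.standard_normal((r, n))
X_true = L1 @ L2                                   # ground-truth rank-r matrix

sr = 0.2                                           # sampling ratio (20%)
mask = rng.random((m, n)) < sr                     # observed index set Omega
noise_factor = 0.1
D = X_true + noise_factor * rng.standard_normal((m, n))
D_obs = np.where(mask, D, 0.0)                     # observed noisy entries

def rse(X_hat, X):
    """Relative standard error of a recovered matrix."""
    return np.linalg.norm(X_hat - X) / np.linalg.norm(X)
```

A recovered matrix would be compared against `X_true` with `rse`; the trivial all-zero estimate has RSE exactly 1, which gives a useful baseline.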
6.2 Collaborative Filtering
We test our methods on real-world recommendation system datasets: MovieLens-1M, MovieLens-10M and MovieLens-20M (http://www.grouplens.org/node/73), and Netflix [41]. We randomly choose 90% of the ratings as the training set and the remainder as the testing set, and the experimental results are reported over 10 independent runs. Besides the methods used above, we also compare our methods to one of the fastest existing methods, LMaFit [42], and use the root mean squared error (RMSE) as the evaluation measure.
The testing RMSE of all these methods on the four datasets is reported in Figure 5, where the rank varies from 5 to 20 (the running time of all methods is provided in the Supplementary Materials). From all these results, we observe that for these fixed ranks, the matrix factorization methods, including LMaFit, LRMF and our methods, perform significantly better than the trace norm solvers NNLS and ALT in terms of RMSE, especially on the three larger datasets, as shown in Figures 5(b)-(d). In most cases, the regularized matrix factorization approaches outperform LMaFit, a baseline method without any regularization term, which suggests that the regularized models can alleviate the over-fitting problem of matrix factorization. The testing RMSE of both our methods varies only slightly as the given rank increases, while that of the other matrix factorization methods changes dramatically; our methods are thus much more robust to the choice of rank. More importantly, both our methods consistently outperform the other methods in terms of prediction accuracy under all rank settings. This confirms that our Bitr and Tritr quasi-norm regularized models can provide a good estimation of a low-rank matrix. Note that IRLS and IRNN could not run on the three larger datasets due to runtime exceptions. Moreover, our methods are much faster than LRMF, NNLS, ALT, IRLS and IRNN on all these datasets, and are comparable in speed with LMaFit. This shows that our methods have very good scalability and can solve large-scale problems.
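The evaluation protocol above (a 90/10 split over observed ratings, RMSE on the held-out portion) can be sketched as follows; the variable names are illustrative:

```python
import numpy as np

def rmse(pred, truth):
    """Root mean squared error over held-out ratings."""
    pred = np.asarray(pred, dtype=float)
    truth = np.asarray(truth, dtype=float)
    return np.sqrt(np.mean((pred - truth) ** 2))

# 90% train / 10% test split over the observed ratings.
rng = np.random.default_rng(4)
n_ratings = 1000
idx = rng.permutation(n_ratings)
n_train = int(0.9 * n_ratings)
train_idx, test_idx = idx[:n_train], idx[n_train:]
```

In the experiments this split is redrawn for each of the 10 independent runs, and the reported RMSE is averaged over runs.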
6.3 Text Separation
We conducted an experiment on artificially generated data to separate text from an image. The ground-truth image has rank 10. Figure 6(a) shows the input image together with the original image. The input data are generated by setting 10% of randomly selected pixels as missing entries. We compare our three methods, two Bitr variants and one Tritr variant (see the Supplementary Materials for details), with three state-of-the-art methods: PCP [2], LRMF [43] and the Schatten quasi-norm method of [11]. For fairness, we set the rank used by all methods to 15.
The results of the different methods are shown in Figure 6, where the text detection accuracy (the area under the receiver operating characteristic curve, AUC) and the RSE of the recovered low-rank component are reported. Note that we present the best results of the method of [11] over all tested choices of $p$. For both low-rank component recovery and text separation, our first Bitr method is significantly better than the other methods, not only visually but also quantitatively. In addition, our other two methods have very similar performance to the method of [11], and all three outperform PCP and LRMF in terms of AUC and RSE. Moreover, the running times of PCP, LRMF, the method of [11], and our Tritr and two Bitr methods are 31.57sec, 6.91sec, 163.65sec, 0.96sec, 0.57sec and 1.62sec, respectively. In other words, our three methods are at least 7, 12 and 4 times faster than the other methods, respectively. This is an impressive result, as our three methods are nearly 170, 290 and 100 times faster than the most closely related method of [11], which further confirms that our methods have good scalability.

7 Conclusions
In this paper, we defined two tractable Schatten quasi-norm formulations and proved that they are in essence the Schatten-$1/2$ and Schatten-$1/3$ quasi-norms, respectively. By applying the two defined quasi-norms to various rank minimization problems, such as MC and RPCA, we obtained some challenging non-smooth and non-convex problems. We then designed two classes of efficient PALM and LADM algorithms to solve our problems with smooth and non-smooth loss functions, respectively. Finally, we established that each bounded sequence generated by our algorithms converges to a critical point, and also provided recovery performance guarantees for our algorithms. Experiments on real-world datasets showed that our methods outperform the state-of-the-art methods in terms of both efficiency and effectiveness. For future work, we are interested in analyzing the recovery bound for our algorithms for solving the Bitr or Tritr quasi-norm regularized problems with non-smooth loss functions.
Acknowledgements
We thank the reviewers for their valuable comments. The authors are supported by the Hong Kong GRF 2150851. The project is funded by Research Committee of CUHK.
References
 [1] E. Candès and B. Recht. Exact matrix completion via convex optimization. Found. Comput. Math., 9(6):717–772, 2009.
 [2] E. Candès, X. Li, Y. Ma, and J. Wright. Robust principal component analysis? J. ACM, 58(3):1–37, 2011.
 [3] G. Liu, Z. Lin, and Y. Yu. Robust subspace segmentation by low-rank representation. In ICML, pages 663–670, 2010.
 [4] C. Hsieh and P. A. Olsen. Nuclear norm minimization via active subspace selection. In ICML, pages 575–583, 2014.
 [5] A. Argyriou, C. A. Micchelli, M. Pontil, and Y. Ying. A spectral regularization framework for multitask structure learning. In NIPS, pages 25–32, 2007.
 [6] M. Fazel, H. Hindi, and S. P. Boyd. A rank minimization heuristic with application to minimum order system approximation. In ACC, pages 4734–4739, 2001.
 [7] B. Recht, M. Fazel, and P. A. Parrilo. Guaranteed minimum-rank solutions of linear matrix equations via nuclear norm minimization. SIAM Rev., 52:471–501, 2010.
 [8] J. Fan and R. Li. Variable selection via nonconcave penalized likelihood and its Oracle properties. J. Am. Statist. Assoc., 96:1348–1361, 2001.
 [9] Z. Lu and Y. Zhang. Schatten-p quasi-norm regularized matrix optimization via iterative reweighted singular value minimization. arXiv:1401.0869v2, 2015.
 [10] C. Lu, J. Tang, S. Yan, and Z. Lin. Generalized nonconvex nonsmooth low-rank minimization. In CVPR, pages 4130–4137, 2014.
 [11] F. Nie, H. Wang, X. Cai, H. Huang, and C. Ding. Robust matrix completion via joint Schatten p-norm and lp-norm minimization. In ICDM, pages 566–574, 2012.
 [12] A. Majumdar and R. K. Ward. An algorithm for sparse MRI reconstruction by Schatten p-norm minimization. Magn. Reson. Imaging, 29:408–417, 2011.
 [13] K. Mohan and M. Fazel. Iterative reweighted algorithms for matrix rank minimization. J. Mach. Learn. Res., 13:3441–3473, 2012.
 [14] M. Lai, Y. Xu, and W. Yin. Improved iteratively reweighted least squares for unconstrained smoothed lq minimization. SIAM J. Numer. Anal., 51(2):927–957, 2013.
 [15] G. Marjanovic and V. Solo. On lq optimization and matrix completion. IEEE Trans. Signal Process., 60(11):5714–5724, 2012.
 [16] F. Nie, H. Huang, and C. Ding. Low-rank matrix recovery via efficient Schatten p-norm minimization. In AAAI, pages 655–661, 2012.
 [17] M. Zhang, Z. Huang, and Y. Zhang. Restricted isometry properties of nonconvex matrix recovery. IEEE Trans. Inform. Theory, 59(7):4316–4323, 2013.
 [18] F. Shang, Y. Liu, and J. Cheng. Scalable algorithms for tractable Schatten quasi-norm minimization. In AAAI, pages 2016–2022, 2016.
 [19] N. Srebro, J. Rennie, and T. Jaakkola. Maximummargin matrix factorization. In NIPS, pages 1329–1336, 2004.
 [20] K. Mitra, S. Sheorey, and R. Chellappa. Largescale matrix factorization with missing data under additional constraints. In NIPS, pages 1642–1650, 2010.
 [21] A. Aravkin, R. Kumar, H. Mansour, B. Recht, and F. J. Herrmann. Fast methods for denoising matrix completion formulations, with applications to robust seismic data interpolation. SIAM J. Sci. Comput., 36(5):S237–S266, 2014.
 [22] S. Foucart and M. Lai. Sparsest solutions of underdetermined linear systems via lq-minimization for 0 < q ≤ 1. Appl. Comput. Harmon. Anal., 26:397–407, 2009.
 [23] Y. Liu, F. Shang, H. Cheng, and J. Cheng. A Grassmannian manifold algorithm for nuclear norm regularized least squares problems. In UAI, pages 515–524, 2014.
 [24] W. Bian, X. Chen, and Y. Ye. Complexity analysis of interior point algorithms for non-Lipschitz and nonconvex minimization. Math. Program., 149:301–327, 2015.
 [25] F. Shang, Y. Liu, J. Cheng, and H. Cheng. Robust principal component analysis with missing data. In CIKM, pages 1149–1158, 2014.
 [26] F. Shang, Y. Liu, J. Cheng, and H. Cheng. Recovering low-rank and sparse matrices via robust bilateral factorization. In ICDM, pages 965–970, 2014.
 [27] M. Jaggi. Revisiting FrankWolfe: Projectionfree sparse convex optimization. In ICML, pages 427–435, 2013.
 [28] J. Bolte, S. Sabach, and M. Teboulle. Proximal alternating linearized minimization for nonconvex and nonsmooth problems. Math. Program., 146:459–494, 2014.
 [29] J. Yang and X. Yuan. Linearized augmented Lagrangian and alternating direction methods for nuclear norm minimization. Math. Comp., 82:301–329, 2013.
 [30] J. Cai, E. Candès, and Z. Shen. A singular value thresholding algorithm for matrix completion. SIAM J. Optim., 20(4):1956–1982, 2010.
 [31] I. Daubechies, M. Defrise, and C. DeMol. An iterative thresholding algorithm for linear inverse problems with a sparsity constraint. Commun. Pur. Appl. Math., 57(11):1413–1457, 2004.
 [32] Z. Lin, R. Liu, and Z. Su. Linearized alternating direction method with adaptive penalty for low-rank representation. In NIPS, pages 612–620, 2011.
 [33] H. Attouch and J. Bolte. On the convergence of the proximal algorithm for nonsmooth functions involving analytic features. Math. Program., 116:5–16, 2009.
 [34] S. Oymak, K. Mohan, M. Fazel, and B. Hassibi. A simplified approach to recovery conditions for low rank matrices. In ISIT, pages 2318–2322, 2011.
 [35] S. Negahban, P. Ravikumar, M. J. Wainwright, and B. Yu. A unified framework for high-dimensional analysis of M-estimators with decomposable regularizers. In NIPS, pages 1348–1356, 2009.
 [36] A. Rohde and A. B. Tsybakov. Estimation of high-dimensional low-rank matrices. Ann. Statist., 39(2):887–930, 2011.
 [37] E. Candès and Y. Plan. Matrix completion with noise. Proc. IEEE, 98(6):925–936, 2010.
 [38] P. Jain, R. Meka, and I. Dhillon. Guaranteed rank minimization via singular value projection. In NIPS, pages 937–945, 2010.
 [39] R. Keshavan, A. Montanari, and S. Oh. Matrix completion from a few entries. IEEE Trans. Inform. Theory, 56(6):2980–2998, 2010.
 [40] K.C. Toh and S. Yun. An accelerated proximal gradient algorithm for nuclear norm regularized least squares problems. Pac. J. Optim., 6:615–640, 2010.
 [41] KDDCup. ACM SIGKDD and Netflix. In Proc. KDD Cup and Workshop, 2007.
 [42] Z. Wen, W. Yin, and Y. Zhang. Solving a low-rank factorization model for matrix completion by a nonlinear successive over-relaxation algorithm. Math. Prog. Comp., 4(4):333–361, 2012.
 [43] R. Cabral, F. De la Torre, J. Costeira, and A. Bernardino. Unifying nuclear norm and bilinear factorization approaches for low-rank matrix decomposition. In ICCV, pages 2488–2495, 2013.
 [44] R. Mazumder, T. Hastie, and R. Tibshirani. Spectral regularization algorithms for learning large incomplete matrices. J. Mach. Learn. Res., 11:2287–2322, 2010.
 [45] D. P. Bertsekas. Nonlinear Programming. 2nd edition, Athena Scientific, Belmont, 2004.
 [46] M. C. Yue and A. M. C. So. A perturbation inequality for concave functions of singular values and its applications in low-rank matrix recovery. Appl. Comput. Harmon. Anal., 40(2):396–416, 2016.
 [47] Y. Wang and H. Xu. Stability of matrix factorization for collaborative filtering. In ICML, pages 417–424, 2012.
 [48] D. Krishnan and R. Fergus. Fast image deconvolution using hyper-Laplacian priors. In NIPS, pages 1033–1041, 2009.
 [49] J. Zeng, S. Lin, Y. Wang, and Z. Xu. L1/2 regularization: Convergence of iterative half thresholding algorithm. IEEE Trans. Signal Process., 62(9):2317–2329, 2014.
 [50] R. Larsen. PROPACK: software for large and sparse SVD calculations. Available from http://sun.stanford.edu/srmunk/PROPACK/, 2005.
8 More Notations
$\mathbb{R}^{n}$ denotes the $n$-dimensional Euclidean space, and the set of all $m\times n$ matrices with real entries is denoted by $\mathbb{R}^{m\times n}$. Given matrices $X$ and $Y$, the inner product is defined by $\langle X, Y\rangle := \operatorname{Tr}(X^{T}Y)$, where $\operatorname{Tr}(\cdot)$ denotes the trace of a matrix. $\|X\|_2$ is the spectral norm and is equal to the maximum singular value of $X$. $I$ denotes an identity matrix.
For any vector $x$, its $\ell_p$ quasi-norm for $0<p<1$ is defined as $\|x\|_p = \big(\sum_{i}|x_i|^p\big)^{1/p}$.
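As a small illustration of why this is only a quasi-norm for $0<p<1$: the triangle inequality fails, e.g., for $p=1/2$:

```python
import numpy as np

def lp_quasinorm(x, p):
    """l_p quasi-norm for 0 < p < 1 (fails the triangle inequality)."""
    return np.sum(np.abs(x) ** p) ** (1.0 / p)

x = np.array([1.0, 0.0])
y = np.array([0.0, 1.0])
p = 0.5
lhs = lp_quasinorm(x + y, p)                    # (1 + 1)^2 = 4
rhs = lp_quasinorm(x, p) + lp_quasinorm(y, p)   # 1 + 1 = 2
```

Since `lhs > rhs`, $\|x+y\|_p \leq \|x\|_p + \|y\|_p$ is violated; only the weaker quasi-norm inequality $\|x+y\|_p^p \leq \|x\|_p^p + \|y\|_p^p$ holds.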