1 Introduction
Due to the unconstrained nature of today's data acquisition procedures, observed data are often contaminated by gross errors, such as large corruptions and outliers. Gross errors can, in general, significantly reduce the representativeness of data samples and therefore seriously distort the analysis of data. Given this pressing situation, it is of considerable practical significance to study the problem of Matrix Recovery, which aims to correct the errors possibly existing in a data matrix of observations.

Problem 1 (Matrix Recovery). Let $X \in \mathbb{R}^{d \times n}$ be an observed data matrix which can be decomposed as
$$X = X_0 + E,$$
where $X_0$ is the target matrix of interest, each column of which is a $d$-dimensional authentic sample, and $E$ corresponds to the possible errors. Given $X$, the goal is to recover $X_0$.
In general, the above problem is ill-posed, and thus some restrictions have to be imposed on both $X_0$ and $E$. Several methods have been proposed to solve the problem under proper constraints. For example, provided that $X_0$ is low-rank and $E$ is sparse, Problem 1 can be well solved by a convex procedure termed Principal Component Pursuit (PCP), which is also known as Robust Principal Component Analysis (RPCA) candes2011robust ; zhang2015exact . Outlier Pursuit (OP) xu2010robust solves Problem 1 under the conditions that $X_0$ is low-rank and $E$ is column-wise sparse. Under similar conditions, Low-Rank Representation (LRR) tpami_2013_lrr ; liu:tpami:2016 is guaranteed to recover the row space of $X_0$. In addition, LRR equipped with proper dictionaries can handle the cases where $X_0$ is of high coherence liu2017blessing ; liu:tsp:2016 . Even though these approaches are powerful, they all rely on the assumption that $X_0$ is low-rank, which could be violated in practice.
To cope with data of complex structure, it is more suitable to consider the cases where $X_0$ is low-rank after feature mapping: namely, $X_0$ is implicitly low-rank in some (unknown) feature space but could be high-rank or even full-rank by itself. There are only a few investigations in this direction, such as Kernel Principal Component Analysis (KPCA) nguyen2009robust . In general, KPCA can handle a data matrix that is implicitly low-rank but originally high-rank. However, it assumes that the data is contaminated by small Gaussian noise, and it is therefore brittle in the presence of gross errors. We also notice that many kernel methods have been established in the community of low-rankness modeling, e.g., xiao2016robust ; ji2017low ; xie2018implicit ; nguyen2015kernel . Nevertheless, these methods are designed for the specific purposes of classification or clustering, and thus they cannot be directly applied to Problem 1, which is essentially a data recovery problem.
In this work, we study Problem 1 in the context where $X_0$ is implicitly low-rank and $E$ contains gross errors. Following xu2010robust ; tpami_2013_lrr , we focus on the case where $E$ is column-wise sparse, i.e., the observed data matrix is contaminated by outliers. The basic idea of our method, pursuing the low-rank structure of $X_0$ in an implicit feature space of higher but unknown (maybe infinite) dimension, is simple and traditional. Nevertheless, it is rather challenging to realize this idea:

Firstly, the rank of a matrix of unknown dimension cannot be calculated directly. To overcome this difficulty, we show that the nuclear norm of $X_0$ after feature mapping, $\|\phi(X_0)\|_*$, is equal to the nuclear norm of the square root of the Gram matrix (equivalently, the kernel matrix). This makes it possible to obtain a computable formulation for imposing the low-rank constraint in an implicit space of unknown dimension.
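This identity can be checked numerically for the special case of a linear (identity) feature map, where the Gram matrix is simply $X^\top X$; the following sketch (our own illustration, not the paper's code) verifies that the nuclear norm equals the trace of the square root of the Gram matrix:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((5, 8))

# Nuclear norm via SVD: the sum of all singular values.
nuc = np.linalg.svd(X, compute_uv=False).sum()

# The same value from the Gram matrix K = X^T X:
# ||X||_* = tr(K^{1/2}) = sum of square roots of K's eigenvalues.
K = X.T @ X
eig = np.clip(np.linalg.eigvalsh(K), 0, None)  # clip tiny negatives
nuc_from_gram = np.sqrt(eig).sum()

assert np.isclose(nuc, nuc_from_gram)
```

The same identity holds for any feature map $\phi$, since the singular values of $\phi(X)$ are the square roots of the eigenvalues of the Gram matrix.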

Secondly, in the presence of outliers it is inaccurate to estimate the Gram matrix directly from the observed data: the outliers can seriously degrade the quality of the estimated Gram matrix, in which the entire column and row corresponding to each outlier are corrupted. Hence, we build our algorithm upon a kernel function that defines only the inner products of points in the feature space. Since the kernel function is independent of the observed data $X$, this strategy helps to reduce the influence of outliers and to preserve the geometric structure of the clean data.
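The row-and-column corruption pattern mentioned above is easy to see numerically. In the following sketch (our own illustration, using a plain linear Gram matrix), a single outlier column contaminates $2n-1$ of the $n^2$ Gram entries:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 6
X_clean = rng.standard_normal((4, n))

# A single outlier column in the observations.
X = X_clean.copy()
X[:, 2] += 100.0

G_clean = X_clean.T @ X_clean   # Gram matrix of the clean data
G = X.T @ X                     # Gram matrix estimated from observations

# One outlier corrupts its entire row AND column of the Gram matrix,
# i.e., 2n - 1 of the n*n entries.
corrupted = ~np.isclose(G, G_clean)
assert corrupted.sum() == 2 * n - 1
```

This is why estimating the Gram matrix naively from contaminated observations is harmful: a small fraction of outlier columns damages a much larger fraction of the kernel matrix.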
Finally, the combination of implicit feature mapping with kernel low-rankness pursuit leads to a challenging optimization problem, which is nonconvex and nonsmooth. To overcome this difficulty, we adopt the Accelerated Proximal Gradient (APG) method established in li2015accelerated , together with some linearization operators, to solve the resulting optimization problem. In particular, we provide theoretical analyses of the convergence of our optimization algorithm; namely, the solution produced by the proposed algorithm is analytically proven to be a stationary point.
We conduct experiments on both synthetic and real datasets, and we also compare with some stateoftheart methods. The results show that, in terms of recovery accuracy, our method is distinctly better than all competing methods.
2 Related Work
2.1 Linear low-rank recovery
Recently, linear low-rank recovery has attracted great attention due to its pleasing efficacy in exploring the low-dimensional structures of given measurements. Formally, the linear low-rank recovery problem can be directly or indirectly written in the following form:
$$\min_{X_0, E}\ \mathcal{R}(X_0) + \lambda\,\ell(E),\quad \text{s.t.}\ X = X_0 + E, \qquad (1)$$
where $X$ and $X_0$ represent the given data and the desired structure, respectively, and $E$ is the error residue. $\ell(\cdot)$ is a certain robust norm that measures the residual between the observed and recovered signals, $\mathcal{R}(\cdot)$ denotes a low-rank-promoting regularizer, and $\lambda$ is a nonnegative parameter that trades off the recovery fidelity against the low-rank regularization. The major difference among existing recovery methods lies in the choice of penalty on the residual. Candès et al. candes2011robust choose the $\ell_1$ norm to model sparse noise; they theoretically prove that their model exactly recovers the ground-truth data under the assumption of sparse outliers/noise. The works in xu2010robust ; zhang2015exact select the $\ell_{2,1}$ norm to penalize column-sparse residuals, and their models also recover the correct column space of the data. Linear low-rank recovery has been applied to many computer vision tasks, such as face recognition zheng2014fisher and image classification zhang2015image , where it performs very well. Besides, for low-rank matrix recovery, Liu et al. liu2013fast propose a fast tri-factorization method, and Cui et al. cui2018exact come up with a transformed affine matrix rank minimization method.
2.2 Kernel low-rank method
KPCA, a widespread extension of traditional PCA, seeks a low-rank approximation of the affinity among the data points in the kernel space scholkopf1998nonlinear . Like PCA, it is sensitive to outliers even after mapping. Hence, several robust kernel low-rank methods have been proposed and investigated. In particular, the works in baghshah2011learning ; ji2017low ; xie2018implicit provide kernel low-rank methods for subspace clustering, which demonstrate that kernel low-rank approximation does benefit the clustering of nonlinear data. Nguyen et al. nguyen2015kernel apply kernel low-rank representation to face recognition. The works in pan2011learning ; rakotomamonjy2014 investigate the influence of different kernels. Garg et al. garg2016non present a new way to pursue low-rankness in the kernel space, but the measurement of the other regularization is still in the original space, so their formulation cannot be directly utilized to solve Problem 1.
Though the existing methods have achieved great success in clustering and linear low-rank recovery tasks, none of them can robustly recover, in the original space, data whose low-dimensional structure is nonlinear. In contrast, our model solves Problem 1 robustly when $X_0$ is implicitly low-rank but high-rank or even full-rank by itself.
3 Kernel Low-Rank Recovery
3.1 Problem Formulation
The model for solving the linear low-rank recovery problem with column-wise noise can be written as:
$$\min_{X_0, E}\ \|X_0\|_* + \lambda\|E\|_{2,1},\quad \text{s.t.}\ X = X_0 + E, \qquad (2)$$
where $\|\cdot\|_*$ is the nuclear norm (the sum of all singular values) and the $\ell_{2,1}$ norm is calculated as $\|E\|_{2,1} = \sum_{j=1}^{n}\|[E]_{:,j}\|_2$, with $[E]_{:,j}$ the $j$th column of $E$. To tackle implicitly low-rank data, it is worthwhile to kernelize the model in (2) so as to handle data sampled from complex nonlinear manifolds. Moreover, in the scenario where the ambient dimension $d$ is far greater than the data size $n$, the kernel method is more efficient.

Let $\phi: \mathbb{R}^d \to \mathcal{H}$ be a mapping from the input space to a reproducing kernel Hilbert space $\mathcal{H}$. Here we assume that $\phi(X_0)$ resides in a certain linear subspace of $\mathcal{H}$; namely, the nonlinear observations are considered to be linearly dependent in $\mathcal{H}$. Let $K \in \mathbb{R}^{n \times n}$ be the positive semidefinite kernel Gram matrix whose elements are computed as:
$$[K]_{ij} = \langle \phi(x_i), \phi(x_j) \rangle = k(x_i, x_j),$$
where $k(\cdot,\cdot)$ is the kernel function and $x_i$ denotes the $i$th column of $X$.
With the above assumption, kernelizing model (2) yields our model:
$$\min_{E}\ \|\phi(X - E)\|_* + \lambda\|E\|_{2,1}. \qquad (3)$$
Note that, after mapping, the data matrix still contains column-wise noise or outliers; hence we also adopt the $\ell_{2,1}$ norm in (3) to measure the error residue caused by the outliers.
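For concreteness, the two kernels used later in the paper can be instantiated as follows. This is our own illustrative sketch (parameter names $c$, $q$, $\gamma$ follow the definitions given in Section 3.3), verifying that the resulting Gram matrices are symmetric and positive semidefinite:

```python
import numpy as np

def gram(X, kernel):
    """Gram matrix K with K[i, j] = kernel(x_i, x_j) for columns of X."""
    n = X.shape[1]
    K = np.empty((n, n))
    for i in range(n):
        for j in range(n):
            K[i, j] = kernel(X[:, i], X[:, j])
    return K

# Polynomial kernel (u^T v + c)^q and Gaussian kernel exp(-g ||u - v||^2).
poly = lambda u, v, c=1.0, q=2: (u @ v + c) ** q
gauss = lambda u, v, g=0.5: np.exp(-g * np.sum((u - v) ** 2))

X = np.random.default_rng(0).standard_normal((3, 5))
for k in (poly, gauss):
    K = gram(X, k)
    assert np.allclose(K, K.T)                    # symmetric
    assert np.linalg.eigvalsh(K).min() > -1e-8    # positive semidefinite
```

Note that the Gram matrix is computed purely from the kernel function, which is the data-independence property the method relies on.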
3.2 Reformulation and Relaxation
It is hard to optimize (3) due to the explicit dependency on $\phi$. Fortunately, as shown in garg2016non , a symmetric positive semidefinite matrix can always be factorized, and we can derive the following proposition.

Proposition 1. Assume $K$ is a kernel Gram matrix computed as $[K]_{ij} = \langle \phi(x_i), \phi(x_j) \rangle$, i.e., $K = \phi(X)^\top \phi(X)$. Then we have
$$\|\phi(X)\|_* = \|A\|_*, \qquad (4)$$
where $A \in \mathbb{R}^{n \times n}$ is any matrix satisfying $A^\top A = K$.
Substituting (4) into (3), we convert (3) into:
$$\min_{A, E}\ \|A\|_* + \lambda\|E\|_{2,1},\quad \text{s.t.}\ A^\top A = K(X - E), \qquad (5)$$
where $[K(X-E)]_{ij} = k(x_i - e_i,\, x_j - e_j)$ and $e_j$ denotes the $j$th column of $E$. We then relax the constrained problem to the following unconstrained one:
$$\min_{A, E}\ \|A\|_* + \lambda\|E\|_{2,1} + \frac{\beta}{2}\big\|A^\top A - K(X - E)\big\|_F^2, \qquad (6)$$
where $\beta$ is a parameter balancing the constraint violation against the original objective. When $\beta$ is sufficiently large, (6) and (5) are the same model. It is worth mentioning that (5) could be solved by the alternating direction method of multipliers (ADMM). However, the subproblem related to $A$ is nonconvex and an auxiliary variable would have to be introduced, and ADMM fails to ensure convergence when the optimization involves more than three blocks of variables. Therefore, we choose an APG-based method for our nonconvex and nonsmooth problem, whose convergence can be guaranteed li2015accelerated . Another advantage of the relaxation is that the rank of the ground-truth matrix is sometimes higher than that of the solution of (5), which is caused by an unsuitable mapping $\phi$; the solution of (6) is closer to the ground truth in this case, and thus (6) is more robust to the selection of mapping functions.
3.3 Optimization Algorithm
We show how to solve (6) in this subsection. We minimize the objective function alternately over $A$ and $E$: the update of $E$ is performed by the monotone APG together with a linear approximation, while the subproblem involving $A$ has a closed-form solution.
(1) Update $A$

$A$ can be updated by solving the following subproblem:
$$\min_{A}\ \|A\|_* + \frac{\beta}{2}\|A^\top A - M\|_F^2, \qquad (7)$$
where $M = K(X - E)$ with $E$ fixed at its latest value. Denote the singular value decomposition (SVD) of the symmetric matrix $M$ as $M = U \Lambda U^\top$; then this subproblem has a closed-form solution given by garg2016non :
$$A^\star = S\,U^\top, \qquad (8)$$
where $S$ is a diagonal matrix whose $i$th entry $s_i \ge 0$ minimizes $s + \frac{\beta}{2}(s^2 - \lambda_i)^2$, with $\lambda_i$ the $i$th singular value of $M$. Hence, each $s_i$ can be obtained by solving a cubic equation. Note that $A^\star$ is not unique, since one can multiply an arbitrary unitary matrix to the left of (8) without changing the objective value in (7). Fortunately, this non-uniqueness does not affect the optimization of $E$, since the $E$-subproblem involves $A$ only through $A^\top A$.

(2) Update $E$
To update $E$, the following subproblem should be solved:
$$\min_{E}\ \lambda\|E\|_{2,1} + \frac{\beta}{2}\big\|M_A - K(X - E)\big\|_F^2, \qquad (9)$$
where $M_A = A^\top A$ with $A$ fixed at its latest value. By dividing the matrix $E$ into columns, (9) can be rewritten as:
$$\min_{e_1, \dots, e_n}\ \sum_{j=1}^{n} \lambda\|e_j\|_2 + \frac{\beta}{2}\sum_{i,j}\big([M_A]_{ij} - k(x_i - e_i,\, x_j - e_j)\big)^2,$$
where $e_j$ is the $j$th column of $E$. The solution can be obtained by the block coordinate descent (BCD) method xu2013block , which minimizes the objective cyclically over each $e_j$ while fixing the remaining columns at their last updated values. Hence, we are required to address the following problem:
$$\min_{e_j}\ \lambda\|e_j\|_2 + \frac{\beta}{2}\sum_{i}\big([M_A]_{ij} - k(x_i - e_i,\, x_j - e_j)\big)^2. \qquad (10)$$
To optimize this problem, the kernel function $k(\cdot,\cdot)$ needs to be specified. Here we choose two types of kernels (convex and nonconvex) as examples; the optimization for other kernel functions can be derived in a similar way.
(i) Convex kernel: We select the most commonly used convex kernel, the Polynomial Kernel Function. The inner product in the kernel space can be represented as
$$k(u, v) = (u^\top v + c)^q,$$
where $c \ge 0$ is a free parameter trading off the influence of higher-order versus lower-order terms in the polynomial, and $q$ is the order of the polynomial kernel. Then (10) can be rewritten as
$$\min_{e_j}\ \lambda\|e_j\|_2 + \frac{\beta}{2}\sum_{i}\Big([M_A]_{ij} - \big((x_i - e_i)^\top (x_j - e_j) + c\big)^q\Big)^2, \qquad (11)$$
where $[M_A]_{ij}$ denotes the $(i,j)$ entry of $A^\top A$. Note that $\|e_j\|_2$ is a real-valued function that is differentiable at nonzero points. Thus, we utilize its linear approximation at the current point $e_j^{(t)}$ to simplify and accelerate the optimization:
$$\min_{e_j}\ \lambda\,w^{(t)}\big(\|e_j\|_2^2 + \varepsilon\big) + \frac{\beta}{2}\sum_{i}\Big([M_A]_{ij} - \big((x_i - e_i)^\top (x_j - e_j) + c\big)^q\Big)^2, \qquad (12)$$
where $w^{(t)} = \frac{1}{2\sqrt{\|e_j^{(t)}\|_2^2 + \varepsilon}}$ and $\varepsilon$ is the smoothing parameter. Obviously, one local minimizer can be calculated in an alternating minimization way:
$$z^{(t)} = \|e_j^{(t)}\|_2^2, \qquad (13)$$
$$w^{(t)} = \frac{1}{2\sqrt{z^{(t)} + \varepsilon}}, \qquad (14)$$
$$e_j^{(t+1)} = \arg\min_{e_j}\ \lambda\,w^{(t)}\|e_j\|_2^2 + \frac{\beta}{2}\sum_{i}\Big([M_A]_{ij} - \big((x_i - e_i)^\top (x_j - e_j) + c\big)^q\Big)^2, \qquad (15)$$
where $t$ indexes the inner iterations.
(ii) Nonconvex kernel: For the nonconvex kernel, we choose the Gaussian Kernel Function, which maps the observations into an infinite-dimensional space. The inner product in the kernel space can be represented as $k(u, v) = \exp(-\gamma\|u - v\|_2^2)$, where $\gamma > 0$ is the precision parameter of the Gaussian Kernel Function. Then (10) can be rewritten as:
$$\min_{e_j}\ \lambda\|e_j\|_2 + \frac{\beta}{2}\sum_{i}\big([M_A]_{ij} - \exp(-\gamma\|u_i + e_j\|_2^2)\big)^2, \qquad (16)$$
where $u_i = (x_i - e_i) - x_j$ collects the terms that are fixed while updating $e_j$, and $[M_A]_{ij}$ is the $(i,j)$ entry of $A^\top A$. Note that $\|e_j\|_2$ is a real-valued function that is differentiable at nonzero points. Thus, we utilize its linear approximation at the current point $e_j^{(t)}$ to simplify and accelerate the optimization. The problem in (16) is converted into:
$$\min_{e_j}\ \lambda\,w^{(t)}\big(\|e_j\|_2^2 + \varepsilon\big) + \frac{\beta}{2}\sum_{i}\big([M_A]_{ij} - \exp(-\gamma\|u_i + e_j\|_2^2)\big)^2, \qquad (17)$$
where $w^{(t)} = \frac{1}{2\sqrt{\|e_j^{(t)}\|_2^2 + \varepsilon}}$ and $\varepsilon$ is the smoothing parameter. Obviously, one local minimizer can be calculated in an alternating minimization way:
$$z^{(t)} = \|e_j^{(t)}\|_2^2, \qquad (18)$$
$$w^{(t)} = \frac{1}{2\sqrt{z^{(t)} + \varepsilon}}, \qquad (19)$$
$$e_j^{(t+1)} = \arg\min_{e_j}\ \lambda\,w^{(t)}\|e_j\|_2^2 + \frac{\beta}{2}\sum_{i}\big([M_A]_{ij} - \exp(-\gamma\|u_i + e_j\|_2^2)\big)^2, \qquad (20)$$
where $t$ indexes the inner iterations.
Note that, in most cases, the solution to the linearly approximated problem is not exactly equivalent to that of the original problem. Here, by contrast, the updating steps (18)–(20) do solve the optimization in (16), as we will show in the next section.
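As a side illustration of the column-sparse regularization used throughout, the proximal operator of the (unsmoothed) $\ell_2$ norm is the classical block soft-thresholding; the sketch below is our own example (the paper's smoothed variant replaces this with a simple scaling, but the unsmoothed prox shows how whole columns get suppressed):

```python
import numpy as np

def prox_l2(v, tau):
    """Block soft-thresholding: the prox of tau * ||.||_2 at v.

    Shrinks the whole vector toward zero, and returns the zero
    vector when ||v||_2 <= tau (the column is declared an outlier-free zero).
    """
    norm = np.linalg.norm(v)
    if norm <= tau:
        return np.zeros_like(v)
    return (1.0 - tau / norm) * v

v = np.array([3.0, 4.0])      # ||v||_2 = 5
small = prox_l2(v, 1.0)       # scales v by (1 - 1/5) -> [2.4, 3.2]
zeroed = prox_l2(v, 6.0)      # threshold exceeds the norm -> [0, 0]
```

This all-or-nothing shrinkage of entire columns is exactly what makes the $\ell_{2,1}$ norm suitable for outlier (column-wise) corruption.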
(3) Solve the Nonconvex Programming

The optimization problem in (15) or (20) is a nonconvex program whose solution can be attained by the APG method. Following the monotone APG of li2015accelerated , and writing $x_k$ for the $k$th APG iterate of $e_j$, the updating steps include:
$$y_k = x_k + \frac{t_{k-1}}{t_k}(z_k - x_k) + \frac{t_{k-1} - 1}{t_k}(x_k - x_{k-1}), \qquad (21)$$
$$z_{k+1} = \mathrm{prox}_{\alpha g}\big(y_k - \alpha \nabla f(y_k)\big), \qquad (22)$$
$$v_{k+1} = \mathrm{prox}_{\alpha g}\big(x_k - \alpha \nabla f(x_k)\big), \qquad (23)$$
$$t_{k+1} = \frac{\sqrt{4 t_k^2 + 1} + 1}{2}, \qquad (24)$$
$$x_{k+1} = \begin{cases} z_{k+1}, & F(z_{k+1}) \le F(v_{k+1}),\\ v_{k+1}, & \text{otherwise}, \end{cases} \qquad (25)$$
where the smooth term $f$ is the data-fitting term of (15) for the Polynomial Kernel or of (20) for the Gaussian Kernel, $\nabla f$ is its gradient, and $g$ represents the corresponding regularization term. The proximal mapping is defined as $\mathrm{prox}_{\alpha g}(u) = \arg\min_{e}\ g(e) + \frac{1}{2\alpha}\|e - u\|_2^2$. $\alpha$ is a fixed constant satisfying $\alpha \le 1/L$, where $L$ is the Lipschitz constant of $\nabla f$, and $F$ denotes the objective $f + g$.
The algorithm to solve (6) with the APG and alternating minimization is outlined in Algorithm 1.
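As a concrete illustration of the $A$-update described above, the following sketch (our own, under the assumption that each diagonal entry minimizes $s + \frac{\beta}{2}(s^2 - \lambda)^2$ over $s \ge 0$) solves the per-eigenvalue cubic by comparing all stationary points of the scalar objective:

```python
import numpy as np

def s_opt(lam, beta):
    """Minimize phi(s) = s + beta/2 * (s**2 - lam)**2 over s >= 0.

    Stationary points satisfy phi'(s) = 1 + 2*beta*s*(s**2 - lam) = 0,
    i.e., the cubic 2*beta*s**3 - 2*beta*lam*s + 1 = 0.
    """
    roots = np.roots([2.0 * beta, 0.0, -2.0 * beta * lam, 1.0])
    cands = [r.real for r in roots if abs(r.imag) < 1e-10 and r.real >= 0]
    cands.append(0.0)  # boundary of the feasible set s >= 0
    phi = lambda s: s + 0.5 * beta * (s * s - lam) ** 2
    return min(cands, key=phi)

# Large eigenvalues are barely shrunk (s close to sqrt(lam)), while
# small eigenvalues are pushed to zero by the nuclear-norm term.
big = s_opt(4.0, beta=10.0)     # close to sqrt(4) = 2
tiny = s_opt(0.01, beta=10.0)   # collapses to 0
```

Applied to every eigenvalue of $M$, this yields the diagonal matrix $S$ in the closed-form solution (8).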
3.4 Computational Complexity
The update of $A$ consists of two parts: finding the roots of the cubic equations and performing the SVD of $M$. The computational complexity of obtaining the roots is $O(n)$, since the roots of a cubic equation admit a closed-form expression. The complexity of the SVD is $O(rn^2)$, where $n$ is the size of the data and $r$ is the rank of $M$. During the update of each $e_j$ according to (21)–(23), matrix-vector multiplications need to be carried out; hence, the computational complexity of calculating $E$ is $O(dn^2)$. In summary, the total computational complexity of the whole algorithm in each iteration is $O(rn^2 + dn^2)$.

4 Theoretical Analysis
In this section, we first provide some useful theoretical results, including Proposition 2 and Lemma 3, which illustrate the connection between (16) and (17), as well as Theorem 4 and Theorem 5, which ensure the convergence of the optimization.

Before stating Lemma 3, we first introduce a proposition that rewrites the nonlinear term via its conjugate function. Based on the theory of convex conjugate functions rockafellar2015convex , we can derive the following proposition.
Proposition 2. There exists a convex conjugate function $\varphi$ such that
$$\sqrt{z + \varepsilon} = \min_{w > 0}\ \big(w(z + \varepsilon) + \varphi(w)\big), \qquad (26)$$
where $w$ is a scalar variable. For a fixed $z$, the minimum is reached at $w = \frac{1}{2\sqrt{z + \varepsilon}}$ he2011robust .
Lemma 3. The cyclic iteration of steps (18)–(20) solves the optimization problem in (16).

Proof. We represent $\|e_j\|_2$ as $\sqrt{\|e_j\|_2^2 + \varepsilon}$, with $\varepsilon$ a small smoothing parameter. In the same spirit as the iteratively reweighted least squares (IRLS) method fornasier2011low , we can solve (16) by iteratively optimizing the following problem with the weight determined from the last iteration:
$$\min_{e_j}\ \lambda\,w^{(t)}\big(\|e_j\|_2^2 + \varepsilon\big) + \frac{\beta}{2}\sum_{i}\big([M_A]_{ij} - \exp(-\gamma\|u_i + e_j\|_2^2)\big)^2, \qquad (27)$$
where $u_i = (x_i - e_i) - x_j$, $M_A = A^\top A$, and $w^{(t)}$ is the weight. Substituting (26) into (27), Proposition 2 gives that the optimal weight is $w^{(t)} = \frac{1}{2\sqrt{\|e_j^{(t)}\|_2^2 + \varepsilon}}$. Hence, we get:
$$\min_{e_j}\ \min_{w > 0}\ \lambda\big(w(\|e_j\|_2^2 + \varepsilon) + \varphi(w)\big) + \frac{\beta}{2}\sum_{i}\big([M_A]_{ij} - \exp(-\gamma\|u_i + e_j\|_2^2)\big)^2, \qquad (28)$$
whose objective coincides with the smoothed version of (16). Due to (27)–(28), the steps (18)–(20) actually solve the problem in (16) by the iteratively reweighted strategy, and hence cyclic iteration between these steps solves the optimization in (16). ∎
We denote the objective of (6) as $F(A, E)$. Then the following theorem regarding the convergence of Algorithm 1 can be established.
Theorem 4. The sequence $\{(A^t, E^t)\}$ generated by Algorithm 1 satisfies the following properties:

(1) The objective is monotonically decreasing; more precisely, there exist constants $c_1, c_2 > 0$ such that
$$F(A^{t+1}, E^{t+1}) \le F(A^t, E^t) - \frac{c_1}{2}\|A^{t+1} - A^t\|_F^2 - \frac{c_2}{2}\|E^{t+1} - E^t\|_F^2; \qquad (29)$$

(2) The sequences $\{A^t\}$, $\{E^t\}$ and $\{F(A^t, E^t)\}$ are bounded.
Proof. First, from the updating rule of $A$ in (8), we have
$$F(A^{t+1}, E^t) \le F(A^t, E^t).$$
Note that the $A$-subproblem objective is strongly convex in a neighborhood of its minimizer; by Lemma B.5 in mairal2013optimization , we have
$$F(A^{t+1}, E^t) \le F(A^t, E^t) - \frac{c_1}{2}\|A^{t+1} - A^t\|_F^2. \qquad (30)$$
Second, denote the objective in (17) as $H$. From Theorem 1 in li2015accelerated , for all $t$ we have
$$H(e_j^{t+1}) \le H(e_j^{t}) - \frac{c_2}{2}\|e_j^{t+1} - e_j^{t}\|_2^2, \qquad (31)$$
where $c_2 > 0$. As aforementioned, (17) is the linear approximation of (16) at $e_j^{t}$, so the two objectives coincide at the expansion point; from the concavity of the square-root function, the surrogate lies above the original objective, and thus any decrease of the surrogate implies a decrease of the original objective. Summing the inequality in (31) over all columns $j$, we get
$$F(A^{t+1}, E^{t+1}) \le F(A^{t+1}, E^{t}) - \frac{c_2}{2}\|E^{t+1} - E^t\|_F^2.$$
Thus, together with (30), we reach the conclusion in (29). Hence, $F(A^t, E^t)$ is monotonically decreasing and, being lower bounded, it converges; this implies that $\{F(A^t, E^t)\}$ is bounded, and the coercivity of the objective then implies that $\{A^t\}$ and $\{E^t\}$ are bounded.

Now, summing (29) over $t$, we have
$$\sum_{t}\Big(\frac{c_1}{2}\|A^{t+1} - A^t\|_F^2 + \frac{c_2}{2}\|E^{t+1} - E^t\|_F^2\Big) \le F(A^0, E^0) - \lim_{t \to \infty} F(A^t, E^t) < \infty.$$
This implies $\|A^{t+1} - A^t\|_F \to 0$ and $\|E^{t+1} - E^t\|_F \to 0$.

The proof is completed. ∎
Theorem 5. The sequence $\{(A^t, E^t)\}$ generated by Algorithm 1 has at least one accumulation point. Let $(A^\star, E^\star)$ be any accumulation point of $\{(A^t, E^t)\}$; then $0 \in \partial F(A^\star, E^\star)$, i.e., $(A^\star, E^\star)$ is a stationary point.

Proof. From the boundedness of $\{(A^t, E^t)\}$, there exist a point $(A^\star, E^\star)$ and a subsequence $\{(A^{t_k}, E^{t_k})\}$ such that $A^{t_k} \to A^\star$ and $E^{t_k} \to E^\star$. Then, since $\|A^{t+1} - A^t\|_F \to 0$ and $\|E^{t+1} - E^t\|_F \to 0$ by Theorem 4, we also have $A^{t_k+1} \to A^\star$ and $E^{t_k+1} \to E^\star$. On the other hand, from the optimality of $A^{t_k+1}$ for (7), of the $E$-iterates for (23), and Theorem 1 in li2015accelerated , the subgradients of $F$ at the iterates are bounded by a constant multiple of the successive differences $\|A^{t_k+1} - A^{t_k}\|_F + \|E^{t_k+1} - E^{t_k}\|_F$. Letting $k \to \infty$, we obtain
$$0 \in \partial F(A^\star, E^\star).$$
Hence, $(A^\star, E^\star)$ is a stationary point of (6). ∎
5 Experimental Verification
5.1 Experimental Settings
In this section, we conduct experiments on both synthetic and real datasets to show the advantages of our proposed method.
Data: The real datasets cover two computer vision tasks: 1) nonlinear data recovery from similarity; 2) nonlinear data denoising, over the MNIST salakhutdinov2008quantitative and COIL20 nene1996columbia databases. The MNIST database consists of 8-bit grayscale images of the digits "0" to "9", with each image centered on a $28 \times 28$ grid. The COIL20 database contains 1440 samples distributed over 20 objects, where each image is of size $128 \times 128$.

Baselines: We assess the performance of the proposed model in comparison with several state-of-the-art methods, including Outlier Pursuit (OP) xu2010robust , KPCA nguyen2009robust and GRPCA shahid2015robust . The codes are downloaded from the authors' websites, except for KPCA, which we implement according to the paper. All methods' settings follow the suggestions of the authors or the given parameters.
Evaluation metrics: Two metrics are used to evaluate the performance of data recovery methods.
– Peak Signal-to-Noise Ratio (PSNR): with the Mean Squared Error (MSE) defined as $\mathrm{MSE} = \frac{1}{N}\sum_{i=1}^{N}(I_i - \hat{I}_i)^2$, where $I$ and $\hat{I}$ are the original image and the recovered image, respectively, the PSNR value is calculated as $\mathrm{PSNR} = 10\log_{10}\big(255^2 / \mathrm{MSE}\big)$.
– Signal-to-Noise Ratio (SNR): the SNR is calculated as $\mathrm{SNR} = 10\log_{10}\big(\|I\|_F^2 / \|I - \hat{I}\|_F^2\big)$.
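Both metrics are straightforward to compute; the following sketch (our own helper functions, assuming an 8-bit peak value of 255 for PSNR) matches the definitions above:

```python
import numpy as np

def psnr(orig, rec, peak=255.0):
    """Peak Signal-to-Noise Ratio in dB between original and recovered images."""
    mse = np.mean((orig.astype(float) - rec.astype(float)) ** 2)
    return 10.0 * np.log10(peak ** 2 / mse)

def snr(orig, rec):
    """Signal-to-Noise Ratio in dB: signal energy over residual energy."""
    num = np.sum(orig.astype(float) ** 2)
    den = np.sum((orig.astype(float) - rec.astype(float)) ** 2)
    return 10.0 * np.log10(num / den)

orig = np.full((8, 8), 100.0)
rec = orig + 10.0               # constant error of 10, so MSE = 100
p = psnr(orig, rec)             # 10*log10(255^2 / 100) ~ 28.13 dB
s = snr(orig, rec)              # 10*log10(640000 / 6400) = 20 dB
```

Higher values of either metric indicate a recovery closer to the ground truth.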
5.2 Data Recovery with Graph Constraint
In this experiment, we aim at recovering the data under a graph constraint. For our proposed model, we solve the subproblem (9) with $A$ fixed. Note that, except for GRPCA, none of the competing methods can cope with this similarity recovery task. We examine the effectiveness of our model on the MNIST database. Firstly, we randomly select images from two digit classes and rotate them by a random degree. Secondly, a portion of the images is randomly chosen to be corrupted: for each chosen image, its observation is computed either by adding zero-mean Gaussian noise or by adding three blocks of structured occlusion. Finally, we convert these images to vectors. To construct the graph constraint for our proposed model and GRPCA, we adopt the same approach as in shahid2015robust , and the input graph is calculated from the nearest neighbors. Note that we utilize the Gaussian Kernel Function on one digit class and the Polynomial Kernel Function on the other.

Fig. 1 shows the results of our method and GRPCA on the rotated MNIST data. As we can see, the proposed method produces encouraging recovery results and outperforms the competing method. This confirms the superiority of our model in highly nonlinear scenarios. It is worth mentioning that the method used to solve the problem in (9) can be directly applied to other scenarios, such as multimodal inference and multi-view learning for recovery from similarity.
5.3 Data Denoising
We now evaluate the effectiveness of our method on the data denoising problem.
1) Two-Dimensional Case: Fig. 2 shows the results on synthetic data. We randomly select 100 data points from a circle embedded in a two-dimensional plane, resulting in a clean data matrix, and then select 10% of the points to act as outliers. In this example, the clean data matrix is already full-rank in the ambient space, so traditional low-rankness based methods (e.g., OP) cannot recover the data points correctly. In sharp contrast, as shown in Fig. 2, our method still identifies the outliers and replaces them with points close to the ground-truth manifold.
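The synthetic setup above can be reproduced with a short sketch (our own; the exact outlier magnitude used in the paper is not specified, so the noise scale here is an assumption):

```python
import numpy as np

rng = np.random.default_rng(42)
n, outlier_ratio = 100, 0.10

# Clean data: 100 points on the unit circle, stored as a 2 x 100 matrix.
theta = rng.uniform(0, 2 * np.pi, n)
X0 = np.vstack([np.cos(theta), np.sin(theta)])

# Corrupt 10% of the columns with large displacements (outliers).
idx = rng.choice(n, int(n * outlier_ratio), replace=False)
X = X0.copy()
X[:, idx] += rng.normal(scale=3.0, size=(2, len(idx)))

# The clean matrix is already full-rank in the ambient (2-D) space,
# so plain low-rank recovery has no rank gap to exploit here.
assert np.linalg.matrix_rank(X0) == 2
assert len(idx) == 10
```

Because the circle is a nonlinear one-dimensional manifold, only methods that pursue low-rankness after a (nonlinear) feature mapping can exploit its structure.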
2) High-Dimensional Case: We apply the proposed method to denoise data from the MNIST and COIL20 databases. We compare all the recovery methods in two cases: (1) rotation with Gaussian noise; (2) rotation with occlusion. For the MNIST database, we randomly select images from two digit classes and rotate them by a random degree. For the COIL20 database, we randomly choose several subjects and their corresponding images, and rotate each image several times over a range of degrees. In both cases, a portion of the data is randomly chosen to be corrupted in the same way as in the previous experiment. Finally, we convert these images to vectors and normalize them to unit length.
Methods | MNIST, Gaussian (PSNR/SNR, dB) | MNIST, Occlusion (PSNR/SNR, dB) | COIL20, Gaussian (PSNR/SNR, dB) | COIL20, Occlusion (PSNR/SNR, dB)
OP | 21.32 / 13.24 | 19.29 / 10.17 | 27.87 / 19.13 | 26.25 / 19.14
KPCA | 24.30 / 16.27 | 21.28 / 14.23 | 29.19 / 22.16 | 27.22 / 21.15
GRPCA | 22.34 / 14.22 | 18.31 / 10.20 | 26.20 / 20.21 | 24.55 / 18.43
Ours (Gaussian) | 39.03 / 29.25 | 37.21 / 27.26 | 39.97 / 33.15 | 38.22 / 31.12
Ours (Polynomial) | 36.53 / 26.77 | 34.08 / 23.76 | 32.96 / 25.03 | 33.03 / 23.55
Table 1 compares our model against all competing methods; the last two rows report the results of our proposed method with the Gaussian Kernel Function and the Polynomial Kernel Function. Since the rotated data has a highly nonlinear structure, our method, which combines the data-independent kernel trick with kernel low-rank pursuit, consistently outperforms the other methods and obtains the highest PSNR and SNR. Fig. 3 visualizes the denoised results of all the methods. It can be seen that our model is more robust to gross corruption and achieves better recovery of details, owing to the absence of a rank restriction in the original space. We notice that the proposed method with the Gaussian Kernel obtains better recovery results than with the Polynomial Kernel. This is because, for a large order $q$, the polynomial data-fitting term dominates the optimization procedure, whereas with a small $q$ the method cannot capture the underlying nonlinear structure of the data. In contrast to the Polynomial Kernel, the Gaussian Kernel corresponds to an infinite-dimensional mapping with bounded values; thus, when the data has a highly nonlinear structure, the Gaussian Kernel performs better than the Polynomial Kernel. The comparison with the other methods again confirms the superiority of combining implicit low-rank pursuit with the data-independent kernel trick.
Methods | OP | KPCA | GRPCA | Ours (Gaussian kernel)
MNIST (Gaussian noise) | 43.9s | 120.4s | 131.2s | 106.1s
The CPU time of all competing methods on the MNIST dataset with Gaussian noise is presented in Table 2. All graph- or kernel-based methods have relatively high computational cost; by utilizing the APG strategy for optimization, our method is faster than the other graph- and kernel-based methods.
6 Conclusion and Future Work
This paper presents a method for solving the nonlinear matrix recovery problem that is more robust than existing kernel methods. To solve the associated nonconvex optimization problem, we propose an algorithm that leverages linearization and proximal gradient techniques. We also analyze the convergence and complexity of our algorithm, and theoretically prove that the obtained solution is a stationary point. Compared with state-of-the-art methods, our proposed method achieves much better results in both data recovery and denoising tasks.
For future work, we hope to reduce the computational complexity of the proposed algorithm. The dominant cost of our method comes from the sequential update of the columns of the matrix $E$. In practice, when optimizing $e_j$, the other columns are fixed, and the full $E$ is updated once all columns have been computed; hence, a parallel strategy can be introduced, i.e., the columns of $E$ can be updated in parallel within one iteration, reducing the computational complexity of the $E$-update accordingly.
Acknowledgment
The authors would like to thank the anonymous reviewers for their helpful comments. The work of Guangcan Liu is supported in part by the National Natural Science Foundation of China (NSFC) under Grants 61622305 and 61502238, and in part by the Natural Science Foundation of Jiangsu Province of China (NSFJPC) under Grant BK20160040. The work of Jun Wang is supported in part by NSFC under Grants 61402224 and 6177226, and by the Fundamental Research Funds for the Central Universities (NE2014402, NE2016004).
References
[1] Emmanuel J. Candès, Xiaodong Li, Yi Ma, and John Wright. Robust principal component analysis? Journal of the ACM, 58(3):11, 2011.
[2] Hongyang Zhang, Zhouchen Lin, Chao Zhang, and Edward Y. Chang. Exact recoverability of robust PCA via outlier pursuit with tight recovery bounds. In AAAI, pages 3143–3149, 2015.
[3] Huan Xu, Constantine Caramanis, and Sujay Sanghavi. Robust PCA via outlier pursuit. In Advances in Neural Information Processing Systems, pages 2496–2504, 2010.
[4] Guangcan Liu, Zhouchen Lin, Shuicheng Yan, Ju Sun, Yong Yu, and Yi Ma. Robust recovery of subspace structures by low-rank representation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 35(1):171–184, 2013.
[5] Guangcan Liu, Huan Xu, Jinhui Tang, Qingshan Liu, and Shuicheng Yan. A deterministic analysis for LRR. IEEE Transactions on Pattern Analysis and Machine Intelligence, 38(3):417–430, 2016.
[6] Guangcan Liu, Qingshan Liu, and Ping Li. Blessing of dimensionality: Recovering mixture data via dictionary pursuit. IEEE Transactions on Pattern Analysis and Machine Intelligence, 39(1):47–60, 2017.
[7] Guangcan Liu and Ping Li. Low-rank matrix completion in the presence of high coherence. IEEE Transactions on Signal Processing, 64(21):5623–5633, 2016.
[8] Minh H. Nguyen and Fernando Torre. Robust kernel principal component analysis. In Advances in Neural Information Processing Systems, pages 1185–1192, 2009.

[9] Shijie Xiao, Mingkui Tan, Dong Xu, and Zhao Yang Dong. Robust kernel low-rank representation. IEEE Transactions on Neural Networks and Learning Systems, 27(11):2268–2281, 2016.
[10] Pan Ji, Ian Reid, Ravi Garg, Hongdong Li, and Mathieu Salzmann. Low-rank kernel subspace clustering. arXiv preprint arXiv:1707.04974, 2017.
 [11] Xingyu Xie, Xianglin Guo, Guangcan Liu, and Jun Wang. Implicit block diagonal lowrank representation. IEEE Transactions on Image Processing, 27(1):477–489, 2018.
 [12] Hoangvu Nguyen, Wankou Yang, Fumin Shen, and Changyin Sun. Kernel lowrank representation for face recognition. Neurocomputing, 155:32–42, 2015.
 [13] Huan Li and Zhouchen Lin. Accelerated proximal gradient methods for nonconvex programming. In Advances in Neural Information Processing Systems, pages 379–387, 2015.
[14] Zhonglong Zheng, Mudan Yu, Jiong Jia, Huawen Liu, Daohong Xiang, Xiaoqiao Huang, and Jie Yang. Fisher discrimination based low rank matrix recovery for face recognition. Pattern Recognition, 47(11):3502–3511, 2014.

[15] Xu Zhang, Shijie Hao, Chenyang Xu, Xueming Qian, Meng Wang, and Jianguo Jiang. Image classification based on low-rank matrix recovery and naive bayes collaborative representation. Neurocomputing, 169:110–118, 2015.
[16] Yuanyuan Liu, L. C. Jiao, and Fanhua Shang. A fast tri-factorization method for low-rank matrix recovery and completion. Pattern Recognition, 46(1):163–173, 2013.
[17] Angang Cui, Jigen Peng, and Haiyang Li. Exact recovery low-rank matrix via transformed affine matrix rank minimization. Neurocomputing, 2018.

[18] Bernhard Schölkopf, Alexander Smola, and Klaus-Robert Müller. Nonlinear component analysis as a kernel eigenvalue problem. Neural Computation, 10(5):1299–1319, 1998.
[19] Mahdieh Soleymani Baghshah and Saeed Bagheri Shouraki. Learning low-rank kernel matrices for constrained clustering. Neurocomputing, 74(12-13):2201–2211, 2011.
[20] Binbin Pan, Jianhuang Lai, and Pong C. Yuen. Learning low-rank Mercer kernels with fast-decaying spectrum. Neurocomputing, 74(17):3028–3035, 2011.
[21] Alain Rakotomamonjy and Sukalpa Chanda. lp-norm multiple kernel learning with low-rank kernels. Neurocomputing, 143:68–79, 2014.
[22] Ravi Garg, Anders Eriksson, and Ian Reid. Non-linear dimensionality regularizer for solving inverse problems. arXiv preprint arXiv:1603.05015, 2016.

[23] Yangyang Xu and Wotao Yin. A block coordinate descent method for regularized multiconvex optimization with applications to nonnegative tensor factorization and completion. SIAM Journal on Imaging Sciences, 6(3):1758–1789, 2013.
[24] Ralph Tyrell Rockafellar. Convex Analysis. Princeton University Press, 2015.
 [25] Ran He, BaoGang Hu, WeiShi Zheng, and XiangWei Kong. Robust principal component analysis based on maximum correntropy criterion. IEEE Transactions on Image Processing, 20(6):1485–1494, 2011.
[26] Massimo Fornasier, Holger Rauhut, and Rachel Ward. Low-rank matrix recovery via iteratively reweighted least squares minimization. SIAM Journal on Optimization, 21(4):1614–1640, 2011.

[27] Julien Mairal. Optimization with first-order surrogate functions. In Proceedings of the International Conference on Machine Learning, pages 783–791, 2013.
[28] Ruslan Salakhutdinov and Iain Murray. On the quantitative analysis of deep belief networks. In Proceedings of the International Conference on Machine Learning, pages 872–879, 2008.
[29] Sameer A. Nene, Shree K. Nayar, Hiroshi Murase, et al. Columbia Object Image Library (COIL-20). 1996.
 [30] Nauman Shahid, Vassilis Kalofolias, Xavier Bresson, Michael Bronstein, and Pierre Vandergheynst. Robust principal component analysis on graphs. In Proceedings of the IEEE International Conference on Computer Vision, pages 2812–2820, 2015.