# Robust Matrix Completion via Maximum Correntropy Criterion and Half Quadratic Optimization

Robust matrix completion aims to recover a low-rank matrix from a subset of noisy entries perturbed by complex noises, where traditional methods for matrix completion may perform poorly due to utilizing l_2 error norm in optimization. In this paper, we propose a novel and fast robust matrix completion method based on maximum correntropy criterion (MCC). The correntropy based error measure is utilized instead of using l_2-based error norm to improve the robustness to noises. Using the half-quadratic optimization technique, the correntropy based optimization can be transformed to a weighted matrix factorization problem. Then, two efficient algorithms are derived, including alternating minimization based algorithm and alternating gradient descend based algorithm. The proposed algorithms do not need to calculate singular value decomposition (SVD) at each iteration. Further, the adaptive kernel selection strategy is proposed to accelerate the convergence speed as well as improve the performance. Comparison with existing robust matrix completion algorithms is provided by simulations, showing that the new methods can achieve better performance than existing state-of-the-art algorithms.

Comments

There are no comments yet.

## Authors

• 3 publications
• 84 publications
• 3 publications
• 35 publications
• 30 publications
• ### A Combinatorial Algebraic Approach for the Identifiability of Low-Rank Matrix Completion

In this paper, we review the problem of matrix completion and expose its...
06/27/2012 ∙ by Franz Király, et al. ∙ 0

read it

• ### Matrix Completion with Noisy Entries and Outliers

This paper considers the problem of matrix completion when the observed ...
03/01/2015 ∙ by Raymond K. W. Wong, et al. ∙ 0

read it

• ### Matrix Completion and Low-Rank SVD via Fast Alternating Least Squares

The matrix-completion problem has attracted a lot of attention, largely ...
10/09/2014 ∙ by Trevor Hastie, et al. ∙ 0

read it

• ### Faster Matrix Completion Using Randomized SVD

Matrix completion is a widely used technique for image inpainting and pe...
10/16/2018 ∙ by Xu Feng, et al. ∙ 0

read it

• ### Fast methods for denoising matrix completion formulations, with applications to robust seismic data interpolation

Recent SVD-free matrix factorization formulations have enabled rank mini...
02/20/2013 ∙ by Aleksandr Y. Aravkin, et al. ∙ 0

read it

• ### A Novel Approach to Quantized Matrix Completion Using Huber Loss Measure

In this paper, we introduce a novel and robust approach to Quantized Mat...
10/29/2018 ∙ by Ashkan Esmaeili, et al. ∙ 0

read it

• ### Matrix Completion and Performance Guarantees for Single Individual Haplotyping

Single individual haplotyping is an NP-hard problem that emerges when at...
06/13/2018 ∙ by Somsubhra Barik, et al. ∙ 0

read it

##### This week in AI

Get the week's most popular data science and artificial intelligence research sent straight to your inbox every Saturday.

## I Introduction

Matrix completion is a novel technique that can fulfill the missing entries of a partially observed low-rank matrix [1, 2, 3]

. Matrix completion takes advantages of low-rank properties of the matrix, which is always available in the fields such as computer vision

[4, 5], recommender systems [6][7], where the data can always be well approximated by low-rank representation or structure. A typical application of matrix completion is the recommender system [8]. In the recommender system, how to guide users’ behavior based on existing data is a crucial problem that should be considered. For example, Netflix–the world’s largest online movie renter, which wants to recommend movies that might be of interest to users based on their behavior (ratings of various types of movies). User behavior consistency can be expressed by low rank property, thus matrix completion can be applied to predict the missing ratings of the users [9, 10].

Although matrix completion is, in general, an NP-hard problem, in the last decades various algorithms have been proposed to tackle this problem and show accurate reconstruction performance. In particular, by formulating the problem as a constrained rank (or nuclear norm) minimization problem, several algorithms have been developed including normalized iterative hard thresholding (NIHT) [11], iterative soft thresholding (IST) [12], singular value thresholding (SVT) [13] and fixed point continuation (FPC) [14]. These algorithms need to calculate full or truncated singular value decomposition (SVD) at each iteration, which may cause high computational complexity especially when the data scale is large. To reduce the performance degradation caused by SVD computation, recently the matrix factorization model is applied to solve the matrix completion problem [3, 15, 16, 17]. In matrix factorization, the target matrix is represented as a multiple of two matrices so that the low-rank property can be guaranteed. Thus algorithms based on matrix factorization can naturally overcome the drawback of low efficiency in SVD computation. The representative algorithms include Power factorization (PF) [18], Low-rank Matrix Fitting (LMaFit) [19] and alternating steepest descent (ASD) [20].

Traditional matrix completion often utilizes

error norm in optimization, which can perform well under Gaussian noise assumption. However, in real applications, the observations may contain outliers. For example, in the recommender systems, the rate may exist human errors which is unreliable. In this case, the

error norm may not properly represent the error statistics and the performance of traditional matrix completion algorithms may degrade. To improve the robustness against outliers, several robust matrix completion algorithms have been proposed. In [21], the authors utilize error norm-based the cost function and solve the optimization problem by regression and alternating direction method of multipliers (ADMM). In [22]

, the authors propose two new robust loss functions and utilize a distributed optimization framework

[23] to solve the problem in parallel.

In recent years, an information theoretic learning (ITL) [24] measure called correntropy has been proposed to deal with the robust learning problem [25, 26]. Correntropy is a smooth local similarity measure, which has its root in Renyi’s entropy [27]

. With a Gaussian kernel, correntropy involves all the even moments of the error and is insensitive to large outliers

[28]. Compared with norm, correntropy based methods can achieve better performance especially when the outliers are large [29, 30]. Applying correntropy to matrix completion, the correntropy based iterative soft and hard thresholding strategies have been proposed [31]. However, as mentioned before, the iterative thresholding based algorithms need to compute the SVD and will suffer from high computational cost when data scale is large.

In the present paper, we combine correntropy with matrix factorization method and propose a new cost function for robust matrix completion. The correntropy measure is utilized instead of using norm, thus the negative effects of outliers can be alleviated. Using matrix factorization, there is no need to compute the SVD at each iteration. Further, to efficiently solve the correntropy based optimization, the half-quadratic (HQ) technique is adopted [32]. Using HQ, the complex optimization problem can be transformed into a quadratic optimization, and the traditional quadratic optimization method can be applied.

Based on HQ, we propose two algorithms for robust matrix completion. The first algorithm utilizes the traditional alternating minimization method [17]. At each minimization step, the correntropy based optimization is transformed to a weighted least squares problem so that the solution can be iteratively obtained. The second algorithm directly transforms the correntropy based cost to a weighted matrix completion problem and then utilize the alternating steepest descend (ASD) method [20]. Both algorithms utilize HQ technique but in different ways for optimization. Moreover, taking advantage of the properties of correntropy, we propose an adaptive kernel width selection strategy for the proposed algorithms to improve the convergence speed as well as reconstruction accuracy. In summary, the main contributions of this paper are:

1) A new cost function for robust matrix completion is proposed.

2) Two efficient algorithms are developed via HQ techniques.

3) Extensive simulations demonstrate the superior performance of the proposed algorithms compared with other state-of-the-art algorithms.

The paper is organized as follows. In section II we briefly review the concept of matrix completion and maximum correntropy criterion. In Section III we propose the new correntropy based matrix completion cost and propose two HQ based algorithms. In Section IV, simulation results are presented to demonstrate the reconstruction performance. Finally, conclusion is given in Section VI.

## Ii Preliminaries

### Ii-a Matrix completion

Consider a low-rank matrix where only a subset of entries can be observed. In particular, by defining the observed subset matrix where

 Ωi,j={1,(i,j)∈Ω0,(i,j)∉Ω (1)

the observed matrix can be represented as , where denotes the Hadamard product. The goal of matrix completion is to recover the whole entries of based on observed entries and low rank property. In detail, the matrix completion can be formulated as the following constrained minimization problem

 minM rank(M)  s.t.  Ω∘M=Ω∘X (2)

where is the recovered matrix and denotes the Hadamard product.

The above optimization is an NP-hard and non-convex problem. In the last decade, various methods have been proposed to deal with the above matrix completion problem. Typically, there are three approaches, which are shown as follows:

1) Direct approach: Although Eq.(2) is NP-hard, methods based on iterative hard thresholding (IHT) technique [33] can still be directly applied to the optimization problem. Similar to compressive sensing, at each iteration, the IHT approach utilizes gradient descent to decrease a measurement fidelity objective and then perform the best rank-R approximation. Note that the to obtain the largest R singular values at each iteration, the truncated SVD should be performed. Moreover, the normalized IHT (NIHT) [11] has also been introduced to matrix completion, which shows better performance than IHT.

2) Convex relaxation: A popular method for solving a non-convex optimization problem is to relax the nonconvex optimization to a convex problem. In particular, the convex nuclear norm is always used to replace the nonconvex rank minimization, i.e.

 minM ∥M∥∗  s.t.  Ω∘M=Ω∘X (3)

where denotes the sum of all singular values of . The semidefinite programming (SDP) [34] and iterative soft thresholding (IST) [12] algorithms can be applied to solve Eq.(3) Note that to obtain the singular values, the SVD still should be performed at each iteration.

3) Matrix factorization: The above methods are both SVD based methods, and may suffer from high computational complexity when dealing with large scale data. Matrix factorization is a simple way to tackle this problem. Specifically, the recovered matrix is factorized to multiple of two matrices and where is the rank of . The matrix factorization then solves the matrix completion by utilizing following objective function

 minU,V∥Ω∘(X−UV)∥2F (4)

where denotes the Frobenius norm of matrix . The solution of (1.3) can be solved via alternating optimization methods. The representative algorithms include PowerFactorization (PF) [18], low-rank Matrix Fitting (LMaFit) [19] and alternating steepest descent (ASD) [20].

### Ii-B Maximum Correntropy Criterion

Correntropy is a local and nonlinear similarity measure between two random variables within a ”window” in the joint space determined by the kernel width. Given two random variables

and , the correntropy is defined by [28]

 V(X,Y)=E[κ(X,Y)]=∫κ(x,y)dFXY(x,y) (5)

where is a shift-invariant Mercer kernel, and

denotes the joint distribution function of

. Given a finite number of samples , the correntropy can be approximated by

 ^V(X,Y)=1NN∑i=1κ(xi,yi) (6)

In general, the kernel function of correntropy is the Gaussian kernel

 κ(x,y)=Gσ(e)=exp(−e22σ2) (7)

where and is the kernel width.

Compared with the norm based second-order statistics of the error, correntropy involves all the even moments of the difference between and and is insensitive to outliers. Replacing the second-order measure with correntropy measure leads to the maximum correntropy criterion (MCC) [35]. The MCC solution is obtained by maximizing the following utility function

 Jmcc=E[Gσ(e(i))] (8)

Moreover, in practice, the MCC can also be formulated as minimizing the following correntropy-induced loss (C-loss) function [36, 37]

 JC−loss=1MM∑i=1σ2(1−Gσ(e(i))) (9)

The above cost function is closely related to the Welsch’s cost function originally introduced in [38]. The C-loss function with different kernel width are shown in Fig.1. One can observe that the C-loss function can effectively alleviate the impact of large errors. Moreover, selecting different kernel widths can adjust the range of sensitivity to outliers, while the error measure near zero will not be highly affected.

## Iii Proposed Algorithms

In this section, we combine matrix factorization with correntropy measure and propose a new objective function. Then we propose two efficient algorithms to solve the optimization problem.

Eq.(4) can be further rewritten as the sum of weighted squared residuals

 minU,VJ2(U,V)=n∑i=1m∑j=1Ωi,j(Xi,j−(UV)i,j)2 (10)

where denotes the -th entry of matrix . When the observed entry contain large outliers, the error measure may not work well because the outliers may highly bias the optimization. To improve the robustness, in this paper we introduce the correntropy as the error measure. In particular, by replacing the error measure with correntropy, one can obtain the following new optimization problem:

 minU,VJGσ(U,V)=n∑i=1m∑j=1Ωi,jσ2(1−Gσ(Xi,j−(UV)i,j)) (11)

The above equation can also be simplified as the following representation

 minU,VJGσ(U,V)=∥Ω∘(X−UV)∥Gσ (12)

The formulation of the above correntropy based optimization is closely related to [39] and [40]. In particular, when matrix is fully observed (i.e. for all ), Eq.. equals to the optimization of robust PCA based on MCC [39]. Further, when continues to impose constrains that both and are non-negative matrices, the optimization in Eq.(12) can be equivalent to the correntropy based nonnegative matrix factorization problem [40]. Certainly, due to existence of observation matrix in Eq.(12), the solution in [39] and [40] will no longer suitable for matrix completion. Thus one should seek new approaches to solve Eq.(12).

### Iii-a Optimization via half-quadratic

In general, the correntropy based objective function in Eq.(12) is difficult to be minimized directly. To tackle this problem, the half-quadratic (HQ) technique has been applied to optimize these correntropy based cost functions [39, 40, 41, 42]. By introducing additional auxiliary variable, HQ reformulates a nonquadratic cost function as an augmented objective function in enlarged parameter space.

According to half quadratic theory [43], for there exists a convex conjugated function so that

 Gσ(e)=maxt(e2tσ2−φ(t)) (13)

where and the maximum is reached at . Eq.(13) can be further rewritten as

 σ2(1−Gσ(e))=mint(−e2t+σ2φ(t)) (14)

By defining and , the above equation can be further derived as

 mineσ2(1−Gσ(e))=mine,s(e2s+ϕ(s)) (15)

Thus, as shown above, minimizing the nonconvex C-loss function in terms of is equivalent to minimizing an augmented cost function in an enlarged parameter space . Therefore, by substituting Eq.(15) to Eq.(11), the correntropy based objective function can be further formulated as

 JGσ(U,V) (16) =minWn∑i=1m∑j=1(Wi,jΩi,j(Xi,j−(UV)i,j)2+Ωi,jϕ(Wi,j))

Further, by defining the augmented cost function

 JHQ(U,V,W)=∥√W∘Ω∘(X−UV)∥2F+Ω∘ϕ(W) (17)

where , we have the following relation

 minU,VJGσ(U,V)=minU,V,WJHQ(U,V,W) (18)

and the correntropy based optimization problem can be formulated as a half-quadratic based optimization. Similar to [21], treating and as a whole, the optimization can be solved with following alternating minimization procedure:

1) optimizing : from Eq.(13) and Eq.(15) one can observe that given a certain , the minimum is reached as . Therefore, given the fixed and , the optimal solutions of can be obtained as

 Wi,j=Gσ(Xi,j−(UV)i,j),(i,j)∈Ω (19)

We should note that when , optimal is unavailable. However, computing for do not affect solution of Eq.(12) and Eq.(17) because of multiplication with . To simplify the expression, in the following we directly use for full matrix . In practical application, one just need to calculate for .

2) Given a fixed , Eq.(17) becomes a weighted matrix completion optimization problem

 minU,V∥√W∘Ω∘(X−UV)∥2F (20)

The weighting matrix assigns different weights based on error residuals. According to the property of Gaussian function, a large error will assign small weight, such that the negative impact of outliers may be greatly reduced. So far, there are no existing algorithms to directly solve Eq.(17). In the following, we follow two methods of traditional matrix completion and propose two efficient algorithms to solve the above weighted matrix completion optimization.

### Iii-B Correntropy based power factorization algorithm

In this part, we utilize the alternating minimization method to solve the correntropy based matrix completion problem. Alternating minimization is a widely used method for solving matrix factorization based optimization problem, and the representative algorithm is the Power factorization (PF). In particular, PF alternately minimizes and at each iteration for Eq.(4), i.e. fix one factored matrix and optimize the other. Similar to the traditional PF, for correntropy based optimization in Eq.(12), we can also alternatively optimize and as follows

 Ut+1=argminU∥Ω∘(X−UVt)∥Gσ (21)
 Vt+1=argminV∥Ω∘(X−Ut+1V)∥Gσ (22)

where denotes the iteration number. Then, the HQ techniques can be utilized for each minimization step in Eq.(21) or Eq.(22). In particular, similar to Eq.(18), given a fixed , Eq.(21) can be further rewritten as

 Ut+1=argminU,W∥√W∘Ω∘(X−UVt)∥2F+Ω∘φ(W) (23)

The above minimization problem can be solved by alternating minimization described in the previous subsection. To distinguish the iteration procedure in Eq.(21) and Eq.(22), we denote the alternating minimization for in Eq.(23) as the inner iteration, while the iteration in Eq.(21) and Eq.(22) are outer iteration. Therefore, at each inner iteration , we first obtain optimal from Eq.(19) and then solve

 Uk=argminU∥√Wk∘Ω∘(X−UV)∥2F (24)

Due to existence of , Eq.(24) do not has explicit solution. To solve this problem, similar to [21], by defining

 U=[uT1 uT2 ... uTm]T,X=[xT1 xT2 ... xTm]T (25) S=√W∘Ω,S=[sT1 sT2 ... sTm]T

Eq.(24) can be optimized by solving the following subproblems:

 uki=argminu∥∥diag(ski)(xi−VTu)∥∥22, i=1,...,m (26)

denotes the diagonal matrix whose entries are composed by elements of . For each subproblem, by dropping the zero diagonal entries of , one can finally obtain

 uki=argminu∥∥√Φk(xθi−VTθiu)∥∥22 (27)

where denotes the index set of non-zero entries of , such that and where is the cardinality of . is a diagonal matrix with entries

 Φkj,j=Gσ((xθi−VTθiu)j),j=1,2,...,|ski| (28)

Eq.(27) is essentially a weighted least squares problem and has a explicit solution

 uki=(VTθiΦkVθi)−1VTθiΦkxθi (29)

Thus each can be obtained by alternating calculate Eq.(28) and Eq.(29) until convergence. The same iteration procedure can be applied for Eq.(22) with fixed .

The above algorithm is called the half-quadratic based powerfactorization (HQ-PF) algorithm. In the following we give two propositions for HQ-PF.

Proposition 1: For a non-increasing , the sequence generated by HQ-PF will converge.

Proof: According to properties of alternating minimization, for a fixed we can obtain

 JGσ(Ut+1,Vt+1)≤JGσ(Ut+1,Vt)≤JGσ(Ut,Vt) (30)

Moreover, considering the following function in terms of

 f(σ)=σ2(1−Gσ(x)) (31)

Taking derivative of Eq.. we can obtain

 ∂f(σ)∂σ=2σ(1−(1+x22σ2)exp(−x22σ2))(a)>0 (32)

The (a) holds since for . Therefore monotonically increases as increases, and then for

 JGσ1(Ut+1,Vt+1)≤JGσ2(Ut+1,Vt+1)≤JGσ2(Ut,Vt)

will always satisfied. Thus, for a non-increasing , the sequence monotonically non-increases . It can be also verified that is always below-bounded for arbitrary . Thus, will converge.

Proposition 2: When , the HQ-PF is equal to PF.

Proof: one can observe that when the kernel width tends to infinity, the equation

 limσ→+∞−2σ2exp(−x22σ2)+2σ2=x2 (33)

will hold. Therefore, for sufficient large , the correntropy based optimization in Eq.(21) and Eq.(22) becomes equal to error norm based optimization

 Ut+1=argminU∥Ω∘(X−UVt)∥2F (34)
 Vt+1=argminV∥Ω∘(X−Ut+1V)∥2F (35)

which is the typical iteration procedure of PF. Moreover, as , will be equal to 1, and

becomes the identity matrix. The

and can be directly obtained by

 ui=(VTθiVθi)−1VTθixθi (36)
 vj=(UTθjUθj)−1UTθjxθj (37)

and the inner iteration number becomes 1. The solution coincides with the solution of PF [18].

As shown in Fig.1, the kernel width of Gaussian kernel function affects the range of sensitivity to outliers. Lots of works of literature have shown that relatively small can offer more accurate performance but also suffers from low convergence speed [44]. A practical way is to use adaptive kernel width [44, 27]. On the other hand, in the field of online adaptive filtering, many algorithms utilize the LS method at the first several iterations to speed up the convergence. In this work, to improve both the efficiency and accuracy, we combine the above two methods and propose a new kernel width selection strategy for HQPF. In particular, by defining the error residual matrix at iteration with

 Et=X−UtVt (38)

we measure the convergence speed using the relative change of , i.e. . At the initial iterations, we directly utilize norm based PF solution in Eq.(36) to update and to speed up the initial convergence speed. When the convergence speed is slow, i.e. where is the free threshold parameter, the optimization is switched to correntropy based optimization in Eq.(21) and Eq.(22). The adaptive kenrel width at -th iteration is selected using the following strategy

 σt=max(η(etΩ(0.75)−etΩ(0.25)),ξ) (39)

where

denotes the vector composed by elements of all non-zero entries of

, and denotes the

-th quantile of

. is the parameter controls the kernel width, and is the lower bound of . Finally, if is less than a sufficient small value , we consider the algorithms converge to a local minimum, and the iteration procedure terminates.

For optimization of Eq.(24), the selection of kernel width for inner iteration also affects the performance. Too small at initial iteration may lead to near-zero , and may cause the singularity problem. Therefore, we also utilize the adaptive kernel selection method to update at each inner iteration. In particular, is initialized to a sufficient large value (i.e.,10000) at and then update as follows for :

 σkin=⎧⎨⎩∥∥12|si|(xsi−VTsiuki)∥∥2,∥uki−uk−1i∥2>ϵσt,∥uki−uk−1i∥2<ϵ (40)

where is the threshold parameter. The norm of relative error vector is also utilized as the stop criterion for inner iteration.

The pseudocode of HQPF algorithm with adaptive kernel selection is summarized in Algorithm 1. Note that in each alternating minimization step, (or ) subproblems are actually independent with each other. Thus one can further utilize a distributed system to solve the subproblems in parallel and speed up the computation.

### Iii-C Correntropy based alternating steepest descent algorithm

HQ-PF is an extension of the traditional PF algorithm. Although HQ-PF is a distributable algorithm which can improve computation efficiency, the whole computational cost is still much higher than based algorithm since at each iteration the weighted LS optimization should be applied. Recently, an alternating steepest descent (ASD) method is proposed for matrix completion task. ASD directly applies gradient descend method and shows faster performance than alternating minimization based algorithms. Inspired by ASD, in this section we introduce the gradient descent method to solve Eq.(12) and derive a more efficient algorithm.

As described in subsection A, we first optimize according to Eq.(19) Then, unlike alternating minimization, we directly apply the gradient descent method to alternative update and only one step at each iteration. For further derivation, we add a coefficient to Eq.(20) so that the minimization becomes

 minU,V12∥√W∘Ω∘(X−UV)∥2F (41)

then, based on Eq.(25), for a fixed , Eq.(41) can be rewritten as the function in terms of :

 fV(U)=12n∑i=1(xi−VTui)Σi(xi−VTui)T (42)

where . Thus the gradient of Eq.(42) at each element can be derived as

 ∂fV(U)∂uij =−(xiΣiV)j+(uiVΣiV)j (43) =−(W∘Ω∘XV)ij+(W∘Ω∘(UV)V)ij =−(W∘Ω∘(X−UV)V)ij

hence the gradient descent of along can be obtained as

 gU=∂fV(U)∂U=−W∘Ω∘(X−UV)VT (44)

Further, the gradient descent step size is selected by solving the following optimization problem

 μU =argminμ∥√W∘Ω∘(X−(U−μgU)V)∥2F (45) =∥gU∥2F∥√W∘Ω∘(gUV)∥2F

Similar to Eq.(44) and Eq.(45), for a fixed , we can obtain the gradient descent along and the corresponding step size as

 gV=−UT(W∘Ω∘(X−UV)) (46)
 μV=∥gV∥2F∥√W∘Ω∘(UgV)∥2F (47)

Therefore, the matrices and can be alternated update using gradient descend method, i.e. for each iteration .

 Ut+1=Ut−μtUgtU (48) Vt+1=Vt−μtVgtV

The algorithm with update above is called the half-quadratic alternating steepest descend (HQASD) algorithm. The following proposition guarantees the convergence of the using the above gradient descend method.

Proposition 3: For a non-increasing , the sequence generated by HQASD will converge.

Proof: according to properties of alternating descend, for a fixed we can obtain

 JHQ(Ut+1,Vt+1,Wt+1) ≤J(Ut+1,Vt+1,Wt) (49) ≤JHQ(Ut+1,Vt,Wt) ≤JHQ(Ut,Vt,Wt)

Since is bounded below, one can obtain [32]

 JGσ(Ut+1,Vt+1)≤JGσ(Ut,Vt)

Moreover, according to proof of proposition 1, for

 JGσ1(Ut+1,Vt+1)≤JGσ2(Ut,Vt)

will always satisfied. Thus, the sequence generated by HQ-ASD will converge for non-increasing .

Proposition 4: When , the HQ-PF is equal to PF.

Proof: As , will be equal to 1, thus all the entries of becomes 1. The in Eq.(19) does not need to be optimized, and Eq.(44)-(47) will be the same as the algorithm for ASD in [20].

In [20], the author also proposed the scaled ASD (ScaledASD) algorithm to improve the convergence speed and recover performance. Similar to ScaledASD, we can further scale the gradient descent directions in Eq.(44) and Eq.(46) by and , respectively, i.e.

 ^gU=gU(VVT)−1 (50) ^gV=gV(UTU)−1

the corresponding step sizes are obtained as

 ^μU=tr{gTU^gU}∥√W∘Ω∘(^gUV)∥2F (51) ^μV=tr{gTV^gV}∥√W∘Ω∘(U^gV)∥2F

and the gradient update of and can be then derived as

 Ut+1=Ut−^μtU^gtU (52) Vt+1=Vt−^μtV^gtV

Since ScaledASD has been proved to show better performance than ASD [20]. Thus, we can conduct that Scaled HQ-ASD will also perform better than Scaled HQ-ASD. Therefore, for simplicity, in the following part we directly utilize Eq.(52) as the update of HQ-ASD.

Similar to HQPF, we also apply the adaptive selection of kernel width to improve the convergence speed and performance of HQASD. In particular, at first several iterations, the kernel width is fixed to sufficient large value (or equivalently set to an all one matrix and use ASD update procedure). When , the optimization is switched to correntropy based optimization and the HQASD with adaptive kernel width in Eq.(39) is applied.

The pseudocode of HQASD is summarized in Algorithm 2.

### Iii-D Complexity analysis

In this part, we discuss the complexity of the two proposed algorithms. For HQPF, at each minimization step of Eq.(HQPFU) and Eq.(22), the complexity is where is the number of outer iteration. For the inner iteration of HQPF, we consider two cases. When utilizing PF at first several iterations (denoted as ), the least squares solution can be directly obtained, such that . When applied weighted least squares for HQPF, an iteration procedure should be performed, and the inner iteration number is denoted as . Therefore, the final complexity of HQPF is . One can observe that the complexity is closely related to the percentage of observations and rank of . Larger rank or larger amount of observed entries may both increase the computational cost of HQPF. Moreover, as mentioned in Section B, HQPF is friendly to multicore and distributed systems. In particular, subproblems for solving and are independent and can be applied in a parallel way. Therefore, for a distributed system with workers, the complexity of each worker will be reduced to .

The complexity of HQASD is similar to ASD [20]. In particular, the complexity per iteration without can be directly obtained from the complexity of ASD, i.e. the complexity is . When taking computation of into consideration, the complexity of MCC-ASD at each iteration becomes . Therefore, the overall complexity of MCC-ASD is where is the iteration number. As can be seen, the complexity of HQASD is also positively correlated to the percentage of observations and rank of . Moreover, compared with HQPF, the complexity of HQASD per iteration is much smaller especially when rank or matrix size is large (large matrix size will lead to large ). Certainly, the gradient descend based HQASD may need a larger number of iterations than HQPF. The final computation cost comparison will be verified by simulations.

## Iv Simulations

In this section, we carry out simulations to verify the performance of the proposed two algorithms.

We compare the performance with existing state-of-the-art robust matrix completion methods including based alternating minimization via PowerFactorization (-PF), Quadratic Programming (QP) with the loss function (BMFC) and correntropy based iterative hard thresholding (CIHT). In particular, to ensure fairness in comparison, the kernel width adaptation method proposed in this paper is also applied to CIHT in the simulations. All the algorithms are implemented in MATLAB r2017b on a 2.6-GHz and 16-GB memory computer without any acceleration. The completion performance is evaluated by normalized mean square error (NMSE) defined by

 NMSE=E[∥∥^M−X∥∥2F]∥X∥2F (53)

In the simulation, the expectation in Eq.(53) is approximated by

 E[∥∥^M−X∥∥2F]≈1TT∑mc=1∥^Mmc−X∥2F

where is the number of Monte Carlo runs.

In the simulation, the typical two-component Gaussian mixture model (GMM) is utilized as the non-Gaussian noise model. The probability density function (PDF) of GMM is defined as

 pv(i)=(1−c)N(0,σ2A)+cN(0,σ2B) (54)

where

represents general noise disturbance with variance

, and stands for outliers that occur occasionally with a large variance . The variable controls the occurrence probability of outliers.

For all algorithms, similar to HQPF and HQASD, we utilize the relative change of the current and previous iterations as the stop criterion, and the threshold parameter is set specific to each algorithm. During all the simulations, without explicitly mentioned, the threshold parameter for stop criterion is set to for BMFC, -PF, HQ-PF, and for HQASD and CIHT. The threshold parameter for adaptive kernel width strategy is set to , and for HQ-PF, HQASD and CIHT, respectively. The inner iteration threshold for weighted LS is set to for both -PF and HQPF. Other parameters are tuned to achieve the best during each simulation.

### Iv-a Random matrix completion

We first compare the performance on the synthetic random data. The matrix with rank is generated by multiplying two matrices and . The observation matrix with the percentage of observation is generated by randomly assign of the entries of with value 1. In this part, without explicitly mentioned, we set the matrix as the squared matrix and the dimension is set to . The parameters of GMM noise are set as . The rank is set to . The percentage of observation is fixed at . The parameter for BMFC is set to 6. For each simulation, the NMSE is obtained via 200 Monte Carlo runs with different realizations of , , , and noises.

First, we compare the performance under different noise environments. We gradually increase the variance , , and calculate the NMSE values. The curves of NMSE with different noise distributions are shown in Fig.2-4. One can observe that the performance of all algorithms degrade as and increases, while the performance is slightly changed with different levels of outliers (i.e. ). In particular, all algorithms achieve comparable performance except -PF, which is mainly caused by nonsmooth of norm near zero error.

Second, the performance with different sizes of are investigated. Fig.5 shows the curves of NMSE in terms of different matrix sizes , and Fig.6 shows the corresponding average running times for a single matrix completion procedure. As can be seen, as the size of the matrix increases, the NMSE values of all algorithms decrease, while the time costs increase significantly for all algorithms. The proposed two algorithms HQPF and HQASD achieve comparable lower NMSE than other algorithms, and HQASD runs much faster than other algorithms. In particular, HQASD can run as fast as about 3 orders of magnitude than -PF and BMFC when the matrix size is larger than 700.

Third, we explore the largest recoverable rank of under different percentage of observed entries

, which is also called the phase transition. The rank

and the percentage are set within [2,30] and [0,100], respectively. For each selection of and , 200 Monte Carlo runs are performed, and the recovery is judged to be successful if the NMSE is lower than . The phase transition image for different algorithms is shown in Fig.7. The shade of the color block represents the probability of success, i.e. the percentage of successful recovery in 200 Monte Carlo runs. One can observe that the white region of HQASD and HQPF is larger than other algorithms, which shows that the proposed algorithms can afford larger rank or lower percentage of observations. Meanwhile, the corresponding average running times and NMSE values curves for each algorithm are shown in Fig.8 and Fig.9. Note that only corresponding data with white blocks in Fig.7 are shown in Fig.8 and Fig.9. As seen, the proposed HQASD achieve lowest NMSE as well as lowest computational cost among all algorithms. HQPF and CIHT achieve similar low NMSE with HQASD but need more time for completion. Moreover, among all distributable algorithms, the HQPF performs the best.

It is known that for correntropy based algorithms, the kernel width highly affects the performance. Thus, here we also analyze the sensitivity of kernel parameters. As shown in Eq.(39), in kernel adaptation strategy, the kernel width is determined by the choice of and . Thus we select different values of and and then obtain the corresponding NMSE over 200 Monte Carlo runs. The NMSE curves versus different are shown in Fig.10. As can be seen, the algorithms can work well in a wide range of values of and . In particular, when , the NMSE increases as grows, while the NMSE will not highly be affected when selecting different values of . When the and are both selected as the too small values, the kernel width is too small and the algorithms may not converge to local minima.

### Iv-B Image Inpainting

In this part, we compare the performance of algorithms on image inpainting tasks with non-Gaussian noise. Image inpainting aims to fill in the unknown pixels of an image from an incomplete image. Because the most image can be well approximated by low-rank representation, image inpainting can be seen as a matrix completion task. Moreover, to evaluate the performance under non-Gaussian noise, we select a mixture of Gaussian and Salt-and-pepper noise as the noise model. Gaussian noise is a typical normal image noise caused by electronic components. Salt-and-pepper noise is another noise produced in errors in the analog-to-digital converter or bit transmission caused by sudden intense interference, such that the noise value with

or sparsely occurs on the image. We utilize the popular peak signal to noise ratio (PSNR) to evaluate the performance, which is defined as

 PSNR=nm∥^M−X∥2F

A higher PSNR represents better recovery performance.

We select the Lena and Palace image as the test images. Lena image (Fig.11(a)) is a popular image for performance evaluation, while Palace image (fig.12(a)) contains duplicate patterns which is always utilized for image inpainting test. Each image is compressed via best rank-50 approximation (see. Fig.11(b) and Fig.12(b)) so that the low rank property is guaranteed. Then the two images are masked in a ”cross pattern” and a ”stamp mark”, respectively. Finally, the observed pixels of images are added with Gaussian noise with variance , and then of the observed pixels are also disturbed by salt-and-pepper noise. During the simulations, 100 Monte Carlo runs are performed for each recovery task. The algorithms parameters for HQPF an HQASD are tuned as . For BMFC, to obtain better performance, the parameter is set to 6 and 20 for Gaussian and non-Gaussian noise, respectively.

Table.1 lists the average recovery PSNR and the corresponding average running times under noiseless and noisy environments. One can see that the proposed HQASD algorithm achieves the best performance in all simulations. In particular, HQASD obtains the highest PSNR as well as lowest running times. Moreover, HQPF obtains the comparable PSNR for Palace image, while the performance is worse than HQASD for Lena image. Nevertheless, HQPF still achieves the best performance among distributable algorithms.

To further demonstrate the recovery performance, samples of images recovered by different algorithms under non-Gaussian noise are shown in Fig.11 and Fig.12. The enlarge view of part of recovered images are also depicted to evidently show the recovery differences. One can see that when filling missing entries of the face region in Lena image, fringes are produced in all of the recovered images. In particular, BMFC and MCCIHT have the most obvious fringes, while HQASD has the least visible fringes. Moreover, -PF and HQPF fail to accurately recover the left eye, which is probably caused by converging to a wrong local minimum perturbed by non-Gaussian noise via alternative minimization. Furthermore, for the palace image, the recovered image of BMFC and MCCIHT still have visible reconstruction error. -PF, HQPF and HQASD can successfully recover the image. From the enlarged view, one can see that the recovered image of HQPF and HQASD are slightly clear than -PF, especially for object edges.

### Iv-C Experiments on MovieLens dataset

In this part we evaluate the proposed algorithm on the real data set. MovieLens is a widely used dataset for recommender system. Firstly, similar to experiments in [22, 21], we carry out the experiment on MovieLens-100K data set. MovieLens-100K consists of 100,000 ratings (1-5) from 943 users on 1682 movies, and the percentage is observed data about . It also provide 5 splits of training data and testing data , where and account for and of the observed data, respectively. We perform the test on both noiseless case and noisy case. In noisy case, of the rating value are replaced by , and of the rating value are replaced by , too. The performance is evaluated using the root mean square error (RMSE) defined as [22]

 RMSE=   ⎷E[∥∥Ωtest∘(^M−Xtest)∥∥2F]card(Ωtest)

where is a logical matrix for testing data where the each entry denotes whether the -th entry of testing data is observed. The expectation is approximated by 10 Monte Carlo realizations.

In this experiment, we set for HQPF and HQASD. For CIHT and HQASD, the threshold is set to . Fig.13 and Fig.