# Simultaneous Tensor Completion and Denoising by Noise Inequality Constrained Convex Optimization

Tensor completion is a technique of filling missing elements of the incomplete data tensors. It being actively studied based on the convex optimization scheme such as nuclear-norm minimization. When given data tensors include some noises, the nuclear-norm minimization problem is usually converted to the nuclear-norm regularization' problem which simultaneously minimize penalty and error terms with some trade-off parameter. However, the good value of trade-off is not easily determined because of the difference of two units and the data dependence. In the sense of trade-off tuning, the noisy tensor completion problem with the noise inequality constraint' is better choice than the `regularization' because the good noise threshold can be easily bounded with noise standard deviation. In this study, we tackle to solve the convex tensor completion problems with two types of noise inequality constraints: Gaussian and Laplace distributions. The contributions of this study are follows: (1) New tensor completion and denoising models using tensor total variation and nuclear-norm are proposed which can be characterized as a generalization/extension of many past matrix and tensor completion models, (2) proximal mappings for noise inequalities are derived which are analytically computable with low computational complexity, (3) convex optimization algorithm is proposed based on primal-dual splitting framework, (4) new step-size adaptation method is proposed to accelerate the optimization, and (5) extensive experiments demonstrated the advantages of the proposed method for visual data retrieval such as for color images, movies, and 3D-volumetric data.

## Authors

• 6 publications
• 7 publications
02/20/2021

### On the tensor nuclear norm and the total variation regularization for image and video completion

In the present paper we propose two new algorithms of tensor completion ...
05/07/2014

### On Tensor Completion via Nuclear Norm Minimization

Many problems can be formulated as recovering a low-rank tensor. Althoug...
03/21/2019

### Tensor-Ring Nuclear Norm Minimization and Application for Visual Data Completion

Tensor ring (TR) decomposition has been successfully used to obtain the ...
07/19/2013

### Tensor-based formulation and nuclear norm regularization for multi-energy computed tomography

The development of energy selective, photon counting X-ray detectors all...
11/18/2020

### Exact nuclear norm, completion and decomposition for random overcomplete tensors via degree-4 SOS

In this paper we show that simple semidefinite programs inspired by degr...
09/24/2019

### Tensor Norm, Cubic Power and Gelfand Limit

We establish two inequalities for the nuclear norm and the spectral norm...
07/22/2014

### Approximate Regularization Path for Nuclear Norm Based H2 Model Reduction

This paper concerns model reduction of dynamical systems using the nucle...
##### This week in AI

Get the week's most popular data science and artificial intelligence research sent straight to your inbox every Saturday.

## I Introduction

Completion is a technique of filling missing elements of incomplete data using the values of reference (available) elements and the structural assumptions (priors) of data. We consider a general exact matrix/tensor completion problem as follows:

 minimizeXf(X), s.t. PΩ(X)=PΩ(T), (1)

where and are the input and output -th order tensors, respectively, a cost function, , is used to evaluate (prior) structural assumptions, with is an index tensor that represents the missing and available elements of as and , respectively. A support set, , is defined as . When the missing and available elements are independent, completion is impossible. However, most real-world data have a few redundant properties that can be used for completion, such as symmetry, repetition, and sparsity. When the cost function is convex and proximable, Problem (1) can be solved by convex optimization methods such as alternating direction method of multipliers (ADMM) [3] and primal-dual splitting (hybrid gradient) (PDS/PDHG) method [10]. In this paper, we refer to PDS/PDHG as PDS for simple. For example, Problem (1

) with matrix/tensor nuclear-norm

[6, 7, 4, 24, 25], total variation (TV) [16, 11], and both cost functions [18] have been studied.

Next, we consider an ‘inexact’ matrix/tensor completion problem as follows:

 minimizeXf(X)+μDΩ(X,T), (2)

where is a distance measure between and , and

is a trade-off parameter between a prior and a distance term. We refer to this as ‘Lagrange form’. When we assume Gaussian distribution for a noise model, Problem (

2) with can be considered for completion and denoising. In similar way, Problem (2) with assumes Laplace distribution for a noise model. When both functions and are convex and proximable, it can be solved by convex optimization. For example, Problem (2) with matrix/tensor nuclear-norm [26, 13, 20], TV [29, 30, 17], and both cost functions [31] have been studied.

Here, we consider ‘inequality form’ of Problem (2) as follow:

 minimizeXf(X),s.t. DΩ(X,T)≤δ, (3)

where is a noise threshold parameter. Figure 1 illustrates the Problems (2) and (3) and these solutions for different values of and . In the Lagrange form, the minimal point of convex penalty is the solution for , is the solution for , and the solutions for draw a curve connecting two solution points for and . Generally, the good trade-off exists on the curve, and it may be a projected point onto the curve from the unknown original tensor . However, optimal value of is still unknown and has to be tuned from wide range . On the other hand, the range of can be more narrow in the inequality form when we assume the noise standard deviation is known. Let us put for Gaussian and for Laplace, then optimal value of may exist in . If directions between the additional noise tensor and the gradient of at are similar, then optimal would near to and would also near to . By contrast, if directions between the additional noise tensor and the gradient of at are very different, optimal would be small and would be far from .

Problems (2) and (3) are convertible with corresponding values of and , however, these corresponding values of and are difficult to know. Figure 2 shows the relationship between and in a tensor completion problem with nuclear-norm and Frobenius-norm minimization. Both Problems are linked by and which have one-to-one correspondence.

The problem here is that the inequality form is not easy to solve directly unlike Lagrange form. In [5], inequality form with is solved by iterative optimization of Lagrange form with to tune optimal trade-off which corresponds to . This is a critical issue to solve its optimization problem with inequality using convex optimization only once.

In this study, we propose a new optimization algorithm based on PDS [10] for convex optimization problems consisting of proximable functions and noise inequality constraints. For this purpose, we derive that the proximal mappings of noise inequality constraints based on Gaussian and Laplace distributions, which are not trivial, can be obtained using analytical calculation. Furthermore, to accelerate the optimization, we propose a new step-size adaptation method for PDS algorithm. For application, we define the cost function as a composition of tensor nuclear-norm and generalized TV, and conduct extensive experiments to show the advantages of the proposed methods.

Note that this work is an extension of prior study presented in conferences [36, 35]. The new contributions in this study are as follows: A generalized formulation and detailed explanation of the proposed models, derivation of a new proximal mapping for Laplace noise inequality, applying a new step-size adaptation method for acceleration, and additional experiments.

The remainder of this paper is organized as follows: In Section II, prior studies on matrix and tensor completion methods are reviewed. In Sections III and IV, we propose a new model for tensor completion based on low rank and TV, and its optimization algorithm using a PDS approach. In Section V, we demonstrate the advantages of the proposed method over selected state-of-the-art methods using color images, movies, and 3D-volumetric images. Lastly, we state the conclusions in Section VI.

### I-a Notations

The notations used in this paper follow several rules. A vector, a matrix, and a tensor are denoted by a bold lowercase letter,

, a bold uppercase letter, , and a bold calligraphic letter, , respectively. An th-order tensor, , can be transformed into a vector and matrix forms, which are denoted using the same character, and for , respectively. An -element of is denoted by or . Operator represents the Hadamard product, defined as .

## Ii Review of prior works in matrix and tensor completion

### Ii-a Matrix completion models

First, we consider a matrix completion problem as follow:

 minimizeXf(X),% s.t. PΩ(X)=PΩ(T), (4)

which is a case of in (1). For low-rank matrix completion, ideally, we want to set , however, it is NP-hard [14]. Thus, its convex relaxation, i.e. nuclear-norm, is used [28]:

 f(Z)=||Z||∗:=min(I,J)∑i=1σi(Z), (5)

where is the

-th largest singular value of

.

In [5], a noisy case has been discussed as

 minimizeX||X||∗, s.t. ||PΩ(X)−PΩ(T)||2F≤δ, (6)

To solve Problem (6), an algorithm has been proposed, which requires solving

 X∗μ=argminX||X||∗+μ2||PΩ(X−T)||2F, (7)

multiple times to determine an appropriate value of such that

. As iterative calculations of singular value decomposition are required to solve Problem (

7) [26], the algorithm is computationally expensive. We refer to this algorithm as low-rank matrix completion with noise (LRMCn).

There are several studies about applications of image deblurring, denoising, and interpolation

[29, 34, 16], where a cost function is given by TV. The standard TV for matrix is defined by

 ||Z||TV:=∑i,j||∇zi,j||2, (8) ∇zi,j:=(∇1zi,j∇2zi,j)=(zi+1,j−zi,jzi,j+1−zi,j). (9)

Problem (4) with the nuclear norm and TV is discussed in [31], in which it was proposed to minimize the nuclear norm using singular value thresholding and TV using gradient descent, alternately. However, using standard gradient-based optimization is not appropriate because the nuclear norm and TV are not differentiable functions. An alternative efficient optimization approach referred to as ‘proximal splitting’ is gaining attention [10, 2].

### Ii-B Tensor completion models

When in (1), it is not a simple extension of matrix completion because of special properties of tensors. For example, there are two types of ranks in tensors, i.e., the canonical polyadic (CP) rank and Tucker rank [22]. As the CP rank has several difficult properties, low Tucker-rank based completion is relatively well studied.

In [24, 25], a case of exact tensor completion, which is Problem (1), where the cost function is given by a tensor nuclear norm has been discussed, in which the tensor nuclear norm, , is defined by

 fLR(X):=N∑n=1λn||X(n)||∗, (10)

where represents the weight parameters for individual tensor modes, and is the -th mode unfolded matrix of tensor . ADMM [3] has been employed for its minimization problem. Furthermore, its noisy scenario has been discussed in [13], which is formulated as Problem (2) with the tensor nuclear norm. We refer to this method as low n-rank tensor completion (LNRTC).

In [17], a case of Problem (2), in which the cost function is given by generalized TV (GTV) has been discussed, where GTV is defined by the sum of the generalized matrix TV of individual mode-unfolded matrices of a tensor as

 fGTV(X):=N∑n=1wn||X(n)||GTV, (11)

where represents the weight parameters for individual tensor modes, and for matrix is a GTV-norm, which is defined by

 ||Z||GTV:=∑i,j√∑θ∈Θτθ(~∇θzi,j)2, (12)

where represents weight parameters, is the differential operator for direction , for example, , , , and . This convex optimization problem is solved using the ADMM in [17]. We typically consider for standard matrix TV (8). In contrast, is considered for GTV. When we consider and for all in GTV, it is given by . In this case, GTV is anisotropic with respect to modes in the tensors, which leads to corruption of diagonal edges.

Note that we can consider a more simple, straightforward, and isotropic tensorial extension of matrix TV, which is defined by

 fTV(Z):=∑i1,i2,...,iN||∇zi1,i2,...,iN||2,w, (13) ∇zi1,i2,...,iN:=⎛⎜ ⎜ ⎜ ⎜ ⎜⎝∇1zi1,i2,...,iN∇2zi1,i2,...,iN⋮∇Nzi1,i2,...,iN⎞⎟ ⎟ ⎟ ⎟ ⎟⎠, (14)

where is an weighted l2-norm, and an -th mode partial differential operator is defined by . Instead of , we consider in this paper.

## Iii Proposed model

In this section, we propose a new model for tensor completion and denoising using a tensor nuclear norm and TV simultaneously. The proposed optimization problem is given by

 minimizeX αfTV(X)+βfLR(X), s.t. vmin≤X≤vmax, (15) DΩ(X,T)≤δ,

where and are the weight parameters between the TV and nuclear norm terms, and the first constraint in (15) imposes all values of the output tensor to be included in a range, . The first and second constraints are convex and these indicator functions are given by

 iv(X):={0vmin≤X≤vmax∞otherwise, (16) iδ(X):={0DΩ(X,T)≤δ∞otherwise. (17)

Using and , tensor completion problem (15) can be rewritten as

 minimizeXαfTV(X)+βfLR(X)+iv(X)+iδ(X). (18)

As these four functions are not differentiable, traditional gradient-based optimization algorithms, e.g., the Newton method, cannot be applied. In Section IV, we introduce and apply an efficient approach, referred to as PDS, to solve proposed optimization problem (15).

### Iii-a Characterization of the proposed model

In this section, we explain the relationship between the proposed model and prior works introduced in Section II. There are three characterizations of the proposed model.

First, when , , , , , and , the proposed model can be characterized as LRMCn [5]. In contrast with LRMCn, which solves several convex optimization problems to tune , the proposed method can obtain its solution by solving only one convex optimization problem, and provides its tensorial extension. Moreover, proposed model includes Laplace distribution as noise model unlike LRMCn.

Second, when , , , and , the proposed method can be characterized as LNRTC [13]. In contrast with LNRTC, which employs the ADMM for solving a type of Problem (2), the proposed method employs the PDS algorithm for solving a type of Problem (3).

Third, when , , , and , the proposed model can be characterized as an isotropic version of GTV [17]

. In contrast with GTV, in which a problem is solved using the ADMM, which requires matrix inversion through the fast Fourier transform (FFT) and inverse FFT, the proposed method does not need to consider matrix inversion. Furthermore, the proposed method tunes the value of

Additionally, our model differs from a recent work proposed in [21] because it applies some constrained fixed-rank matrix factorization models into individual mode-matricization of a same tensor in noiseless scenario. The problem is non-convex and it is not designed for a noise reduction model.

## Iv Optimization

Currently, two optimization methods named ADMM and PDS have attracted attentions in signal/image processing [3, 12, 10]. Two optimization methods can be creatively used, for example, ADMM can be efficiently used for low-rank matrix/tensor completion [23, 9], in contrast, PDS can be efficiently used for TV regularization [38, 8]. Main difference between those optimizations is that TV regularization may include a large matrix inversion in ADMM, by contrast, it can be avoided in PDS. Thus, PDS has been used for many TV regularization methods such as TV denoising/deconvolution [38], vectorial TV regularization [27], and total generalized variation in diffusion tensor imaging [33].

### Iv-a Primal-dual splitting algorithm

In this section, we introduce basics of PDS algorithm. The PDS [10] algorithm is a framework used to split an optimization problem including non-differentiable functions into several sub-optimization processes using proximal operators. First, we consider the following convex optimization problem

 minimizexf(x)+h(Lx), (19)

where and are general convex functions, and is a linear operator (matrix). From the definition of convex conjugate:

 h(Lx)=maxy⟨y,Lx⟩−h∗(y), (20)

the following saddle-point problem can be derived

 minxmaxyf(x)+⟨y,Lx⟩−h∗(y), (21)

where is a convex conjugate of . In PDS method, we focus to solve (21) instead of (19).

For optimality of , at least the following conditions are satisfied:

 0=p(ˆx,ˆy)∈∂f(ˆx)+LTˆy, (22) 0=d(ˆx,ˆy)∈∂h∗(ˆy)−Lˆx, (23)

where stands for sub-gradient of functions, and we consider primal and dual residual vectors as some and , respectively.

Based on sub-gradient descent/ascent method, natural update rules are given by

 xk+1=xk−γ1Δxk, (24) yk+1=yk+γ2Δyk, (25)

where and are update directions, and and are step size parameters.

When and are proximable functions, proximal gradient method can be introduced as

 xk+1=proxγ1f[xk−γ1LTyk], (26) yk+1=proxγ2h∗[yk+γ2Lxk+1], (27)

where proximal mapping is defined by

 proxλg[z]:=argminuλg(u)+12||u−z||22. (28)

Note that let us put , then we have

 z−z∗∈λ∂g(z∗). (29)

Optimization algorithm using (26) and (27) is called as “Arrow-Hurwicz method” which is the original version of PDS method. A generalization of PDS method is given by

 xk+1=proxγ1f[xk−γ1LTyk], (30) yk+1=proxγ2h∗[yk+γ2L(xk+1+θ(xk+1−xk))], (31)

where is a some scalar. Usually, is chosen because that the fast convergence of is theoretically proved in contrast to of [8]. Therefore, Formulations (30)-(31) are recognized as a standard version of PDS method, currently. Nonetheless, Arrow-Hurwicz method is practically competitive with standard PDS method that a experimental report exists in [8].

Using Moreau decomposition rule:

 z=proxλg∗[z]+λprox1λg[1λz], (32)

Update rule (31) can be rewritten by

 ˜yk+1=yk+γ2L(xk+1+θ(xk+1−xk)), (33) yk+1=˜yk+1−γ2prox1γ2h[1γ2˜yk+1]. (34)

If proximal mapping of is more difficult to calculate or derive than that of , then update rule (33)-(34) is a convenient choice to implement.

Thus, PDS can be applied into many convex optimization problems that the proximal mappings of and are given as analytically computable operations. Furthermore, above formulation can be easily extended into the composite optimization of multiple convex functions [10].

### Iv-B Proposed algorithm

In this section, we apply the PDS algorithm to the proposed optimization problem. Introducing dual variables , , and , Problem (18) can be rewritten as

 minimizex iδ(X)+iv(U) +α||YT||2,1+βN∑n=1λn||Z(n)(n)||∗, (35) s.t. U=X, yn=√wnDnx (∀n), Z(n)=X (∀n),

where is the vectorized form of and is the linear differential operator of the -th mode of the tensor. is the -norm of the matrix, defined as for matrix . Algorithm 1 can be derived using the PDS framework in Problem (35). We refer to this algorithm as the “low-rank and TV (LRTV)–PDS” algorithm.

Note that the -norm, the nuclear norm, and are clearly proximable functions whose calculations are given by

 proxγ||⋅||2,1(Z)=[% proxγ||⋅||2(z1),...,proxγ||⋅||2(zJ)], (36) proxγ||⋅||2(z)=z||z||2max(||z||2−γ,0), (37) proxγ||⋅||∗(Z)=Umax(Σ−γ,0)VT, (38) proxiv(Z)=max(min(Z,vmax),vmin), (39)

where with for , and are the left, center-diagonal, and right matrices, respectively, of the singular value decomposition of . Proximal mappings of with Gaussian and Laplace noise models are provided by Sections IV-C and IV-D.

### Iv-C Proximal mapping of Gaussian noise inequality

In this section, we consider the following problem:

 (40)

Focusing on the elements , those elements are independent with respect to inequality. Thus, for minimizing each element cost , we obtain

 x∗i1,i2,...,iN=zi1,i2,...,iN   (qi1,i2,...,iN=0). (41)

Focusing on the elements , the optimization problem is given as

 minimizexq12||zq−xq||22 s.t. ||tq−xq||22≤δ, (42)

where , , and are vectors consisting of all elements of , , and , respectively, that satisfy . The solution of (42) is given by a projection of on the sphere with center and radius , or by the that is in that sphere (see Fig. 3). We can consider two cases: (a) , and (b) . Thus, we obtain

 x∗q =tq+min(1,η)(zq−tq) =[1−min(1,η)]tq+min(1,η)zq =max(0,1−η)tq+[1−max(0,1−η)]zq, (43)

where . Combining (41) and (43), we obtain

 x∗i1,i2,...,iN= qi1,i2,...,iNmax(0,1−η)ti1,i2,...,iN +[1−qi1,i2,...,iNmax(0,1−η)]zi1,i2,...,iN, (44)

and the proximal mapping of with Gaussian noise model is given by

 (45)

where . Clearly, this computational complexity is linear with respect to the size of tensor .

### Iv-D Proximal mapping of Laplace noise inequality

In this section, we consider the following problem:

 (46)

In the same way to (41), focusing on the elements , we obtain

 x∗i1,i2,...,iN=zi1,i2,...,iN   (qi1,i2,...,iN=0). (47)

Focusing on the elements , the optimization problem is given as

 minimizexq12||zq−xq||22 s.t. ||tq−xq||1≤δ, (48)

The solution of (48) is given by a projection of on the polyhedron with center , or the that is in that polyhedron (see Fig. 4). However, it can not be calculated analytically unlike Gaussian noise model. Generally, it is resolved by linear search problem. Let us consider Lagrange form of (48) as

 minimizexq12||zq−xq||22+τ||tq−xq||1, (49)

then the solution of (49) can be given by soft-thresholding:

 (50)

where , and . The linear search problem can be given by

 τ∗=argminτ≥0τ, s.t. ||ˆxτ−tq||1≤δ. (51)

Finally, the solution of (48) is given as .

Next, we show an efficient algorithm to solve (51). The computational complexity of the proposed algorithm is only for sorting. First, the constraint in (51) can be rewritten by

 (52)

Next, we consider the sorting of as

 hj1≤hj2≤⋯≤hj|Ω|. (53)

The left part of (52) can be illustrated by Fig. 6. Obviously, the left part is with . Next, when , the left part can be obtained by

 H1:=H0−hj1|Ω|. (54)

In general, for , the left part can be calculated by

 Hk+1=Hk−(hjk+1−hjk)(|Ω|−k). (55)

Fig. 6 helps us to understand above formulations. We have . Thus, we can find that satisfies by linear computational complexity of . Finally, optimal value of can be given by

 τ∗=hjk′+1−δ−Hk′+1|Ω|−k′. (56)

From the theory of fixed point algorithm, global convergence of PDS method with sufficiently small step-size has been proven [12, 8, 19]. However, small step-size leads usually slow convergence, and optimal step-size may not be constant, i.e., it may adaptively change in optimization process. Furthermore, appropriate balance between primal and dual step-size parameters is not trivial. To tackle this issue, Goldstein et al. has been proposed a nice adaptation rule of primal-dual step-size parameters in 2015 [15]. By inheriting and improving Goldstein’s rule work, we proposed a new step-size adaptation rule.

In this section, we consider to apply above step-size adaptation rules into our new tensor completion model. It is very important for step-size adaptation to consider primal and dual residual vectors:

 pk+1 :=1γ1(xk−xk+1)−(uk−uk+1) −∑n{√wnDTn(ykn−yk+1n)+(z(n)k−z(n)k+1)}, (57) dk+1 :=⎛⎜ ⎜ ⎜ ⎜ ⎜ ⎜ ⎜ ⎜ ⎜ ⎜ ⎜ ⎜ ⎜ ⎜ ⎜ ⎜ ⎜⎝1γ2(uk−uk+1)−(xk−xk+1)1γ2(yk1−yk+11)−√w1D1(xk−xk+1)⋮1γ2(ykN−yk+1N)−√wNDN(xk−xk+1)1γ2(z(1)k−z(1)k+1)−(xk−xk+1)⋮1γ2(z(N)k−z(N)k+1)−(xk−xk+1)⎞⎟ ⎟ ⎟ ⎟ ⎟ ⎟ ⎟ ⎟ ⎟ ⎟ ⎟ ⎟ ⎟ ⎟ ⎟ ⎟ ⎟⎠. (58)

Primal and dual vectors can be derived based on Eqs. (22), (23), and (29).

#### Iv-E1 Goldstein’s rule [15]

Here, we introduce an adaptation rule for primal-dual step-size of PDS proposed by Goldstein et al. in 2015. In order to balance and , primal and dual step-size are adjusted as follow:

• If , then , , and ,

• If , then , , and ,

where , and . Moreover, to prevent too large step-size, a backtracking condition is defined as

 Bk+1:= c2γ1||xk+1−xk||2+c2γ2||vk+1−vk||2 −2(vk+1−vk)TA(xk+1−xk), (59)

where is a constant (typically ), and

 (60) AT=[I,√w1DT1,...,√wNDTN,I,...,I], (61)

and if , then and .

#### Iv-E2 Proposed rule

In contrast that Goldstein’s rule balances the primal and dual step-sizes based on inequality condition, the proposed rule is based on primal-dual ratio