
# Bayesian Low Rank Tensor Ring Model for Image Completion

The low rank tensor ring model is powerful for image completion, which recovers entries missing during data acquisition and transformation. Recently proposed tensor ring (TR) based completion algorithms generally solve the low rank optimization problem by the alternating least squares method with predefined ranks, which may easily lead to overfitting when the unknown ranks are set too large and only a few measurements are available. In this paper, we present a Bayesian low rank tensor ring model for image completion that automatically learns the low rank structure of the data. A multiplicative interaction model is developed for the low-rank tensor ring decomposition, where core factors are enforced to be sparse by assuming their entries obey a Student-T distribution. Compared with most existing methods, the proposed one is free of parameter tuning, and the TR ranks can be obtained by Bayesian inference. Numerical experiments, including synthetic data, color images of different sizes, and the YaleFace dataset B with respect to one pose, show that the proposed approach outperforms state-of-the-art ones, especially in terms of recovery accuracy.


## I Introduction

Tensors, which are multi-dimensional generalizations of matrices, provide a natural representation for multidimensional data. Exploring the internal structure of tensors can reveal more latent information for high-dimensional data processing. For example, a color video is a fourth-order tensor, which allows its temporal and spatial correlations to be investigated simultaneously. Recently, tensor-based methods have attracted interest in image processing problems [1, 2, 3, 4, 5, 6, 7, 8, 9]. Image completion is one such problem, which recovers entries missing during acquisition and transformation.

In recent works, tensor-based image completion methods mostly assume the data is low rank, and they mainly divide into two groups. One group is based on the rank minimization model and can be formulated as:

$$\min_{\mathcal{X}}\ \operatorname{rank}(\mathcal{X})\quad \text{s.t.}\quad \mathcal{X}_{\mathbb{O}}=\mathcal{T}_{\mathbb{O}}, \tag{1}$$

where $\mathcal{X}$ is the estimated tensor, $\mathcal{T}$ is an $N$-th order observed tensor with the same size as $\mathcal{X}$, and $\mathbb{O}$ is the index set of the observed entries. Traditional decompositions based on the rank minimization model, such as Tucker decomposition [10, 11] and the tensor Singular Value Decomposition (t-SVD) [12, 13], have been well studied. Rank minimization models are developed by minimizing the summation of nuclear norm regularization terms corresponding to the unfolding matrices of $\mathcal{X}$. For example, Tucker rank minimization was first proposed in [14], followed by tubal rank minimization [15, 16]. Besides, improvements to these rank minimization methods are proposed in [17, 18, 19, 20, 21, 22]. Recently, an advanced tensor network named the tensor train (TT) was proposed in [23] and showed its advantage in capturing latent information for image completion [24]. Moreover, tensor tree rank minimization for image completion, one generalization of the TT format, is proposed in [25].

The other important class is factorization based methods, which alternately update the factors with predefined ranks. Mathematically, the low-rank tensor completion problem can be formulated as:

$$\min_{\mathcal{X}}\ \frac{1}{2}\left\|P_{\mathbb{O}}(\mathcal{X}-\mathcal{T})\right\|_F^2\quad \text{s.t.}\quad \operatorname{rank}(\mathcal{X})=R, \tag{2}$$

where $P_{\mathbb{O}}$ is the random sampling operator. In [26], a model with the CANDECOMP/PARAFAC (CP) rank known in advance is proposed and solved by the alternating least squares (ALS) algorithm. Besides, within the factorization based model and the ALS approach, works such as HOOI [11] with Tucker ranks, Tubal-ALS [27] with tubal rank, TT-ALS [28] with TT ranks, and TR-ALS [29] with tensor ring (TR) ranks have been investigated. Apart from the ALS framework, the Riemannian optimization scheme with the nonlinear conjugate gradient approach has been explored to tackle the factorization based tensor completion model, resulting in CP-WOPT [30], RMTC [31], RTTC [32], TR-WOPT [33] and HTTC [34].

Among these methods, the advanced tensor networks such as TT and TR perform better than the CP/Tucker based methods in image completion because they capture more correlations than the traditional tensor decompositions. Compared with TT, TR has more balanced and smaller ranks due to its ring structure, which may be beneficial for exploring more latent structure. However, it is intractable to directly minimize the tensor ring rank, since the corresponding unfolding matrix is hard to find due to the circular dimensional permutation invariance of TR. The existing TR based methods rely on predefined ranks, which may easily lead to overfitting when the ranks are set too large and only a few observations are available.

Motivated by these observations, we present a Bayesian inference (BI) model for inferring TR ranks to solve the image completion problem. BI, which models the low-rank problem, has shown success in low-rank matrix factorization [35, 36, 37, 38] by automatically adjusting the tradeoff between rank and fitting error. In addition, some tensor based works [39, 40, 41] reveal the superiority of the BI framework for low-rank tensor completion. To the best of our knowledge, this is the first work to investigate the low rank TR model for image completion in the BI framework. Our objective is to infer the missing entries from observations by low rank tensor ring completion, while the TR ranks are determined automatically.

Natural images have heavy-tailed statistics; this phenomenon is illustrated intuitively in Fig. 1. This property inspires us to assume that the core factors, the latent components of the TR format, follow a Student-T distribution. To model this, we propose a TR decomposition framework with a sparsity-inducing hierarchical prior over the core factors. Specifically, each slice of a core factor is assumed to independently follow a Gaussian distribution with zero mean and a variance matrix, and the variance matrix is treated as a random hyperparameter with a Gamma distribution. A variational BI algorithm is then utilized to estimate the parameters of this model. Experiments on synthetic data and real-world images demonstrate that the recovery performance of the proposed approach outperforms existing state-of-the-art works, which may imply our algorithm can explore more correlations in images. In addition, the experiments also indicate that the TR ranks inferred by our algorithm can be used in TR-ALS, which avoids tuning parameters.

The rest of this paper is organized as follows. Sec. II introduces notations and preliminaries for TR decomposition. Sec. III presents the details of the Bayesian TR decomposition, including the model description, solution and discussion. Sec. IV provides experiments on multi-way data completion. Sec. V concludes the paper.

## II Notations and Preliminaries

### II-A Notations

First we give the notations to be used. A scalar, a vector, a matrix, and a tensor are written as $x$, $\mathbf{x}$, $\mathbf{X}$, and $\mathcal{X}$, respectively. $\mathcal{X}\in\mathbb{R}^{I_1\times I_2\times\cdots\times I_N}$ denotes an $N$-order tensor, where $I_n$ is the size of the $n$-th dimension. The trace operation is denoted as $\operatorname{Trace}(\mathbf{A})$, where $\mathbf{A}$ is a square matrix. $\operatorname{Vec}(\mathcal{X})$ denotes the vectorization of tensor $\mathcal{X}$, and the fold operation denotes a tensor reshaped from a vector or a matrix. The Kronecker product of two tensors is denoted as $\mathcal{X}\otimes\mathcal{Y}$. The Hadamard product of two tensors is defined entrywise as $(\mathcal{X}\odot\mathcal{Y})_{i_1\cdots i_N}=\mathcal{X}_{i_1\cdots i_N}\,\mathcal{Y}_{i_1\cdots i_N}$.

### II-B Tensor Ring Model

###### Definition 1.

(TR decomposition [42]) For an $N$-order tensor $\mathcal{X}\in\mathbb{R}^{I_1\times\cdots\times I_N}$, the TR decomposition is defined as

$$\mathcal{X}(i_1,i_2,\cdots,i_N)=\operatorname{Trace}\big(\mathcal{G}_1(:,i_1,:)\,\mathcal{G}_2(:,i_2,:)\cdots\mathcal{G}_N(:,i_N,:)\big), \tag{3}$$

where $\mathcal{G}_n\in\mathbb{R}^{R_{n-1}\times I_n\times R_n}$, $n=1,\ldots,N$, are the core factors, and the TR ranks are defined as $[R_0,R_1,\cdots,R_{N-1}]$ with $R_0=R_N$. For simplicity, we denote the TR decomposition by $\mathcal{X}=f(\mathcal{G}_1,\mathcal{G}_2,\cdots,\mathcal{G}_N)$. The graphical illustration of TR decomposition is shown in Fig. 2.
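As a concrete illustration, the trace formula in Eq. (3) can be checked numerically with a small NumPy sketch; the sizes, the `tr_entry` helper, and the einsum contraction are our own illustrative choices, not the paper's code:

```python
import numpy as np

def tr_entry(cores, idx):
    """One entry of a TR-format tensor via Eq. (3): the trace of the
    product of the lateral core slices G_n(:, i_n, :)."""
    prod = np.eye(cores[0].shape[0])          # R0 x R0 identity
    for G, i in zip(cores, idx):
        prod = prod @ G[:, i, :]              # chain the lateral slices
    return np.trace(prod)

# Hypothetical 3rd-order example with TR ranks [2, 3, 4] (R0 = R3 = 2).
rng = np.random.default_rng(0)
ranks, dims = [2, 3, 4, 2], [5, 6, 7]
cores = [rng.standard_normal((ranks[n], dims[n], ranks[n + 1])) for n in range(3)]
full = np.einsum('aib,bjc,cka->ijk', *cores)  # contract the closed ring
assert np.isclose(full[1, 2, 3], tr_entry(cores, (1, 2, 3)))
```

The einsum subscripts close the ring by reusing index `a` on the first and last cores, which realizes the trace in Eq. (3) for all entries at once.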

###### Definition 2.

(Tensor permutation) For an $N$-order tensor $\mathcal{X}\in\mathbb{R}^{I_1\times\cdots\times I_N}$, the tensor permutation $\mathcal{X}^{P_n}$ is defined entrywise by:

$$\mathcal{X}^{P_n}(i_n,\cdots,i_N,i_1,\cdots,i_{n-1})=\mathcal{X}(i_1,\cdots,i_N).$$
###### Theorem 1.

(Cyclic permutation property [42]) Based on the definitions of tensor permutation and TR decomposition, the tensor permutation of $\mathcal{X}=f(\mathcal{G}_1,\cdots,\mathcal{G}_N)$ is equivalent to a circular shift of its factors, as follows:

$$\mathcal{X}^{P_n}=f(\mathcal{G}_n,\cdots,\mathcal{G}_N,\mathcal{G}_1,\cdots,\mathcal{G}_{n-1}),$$

with entries

$$\mathcal{X}^{P_n}(i_n,\cdots,i_N,i_1,\cdots,i_{n-1})=\operatorname{Trace}\big(\mathcal{G}_n(:,i_n,:)\cdots\mathcal{G}_N(:,i_N,:)\,\mathcal{G}_1(:,i_1,:)\cdots\mathcal{G}_{n-1}(:,i_{n-1},:)\big).$$
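The cyclic permutation property can be verified numerically; below is a minimal sketch with assumed small ranks and dimensions (the `tr_full` helper is our own name):

```python
import numpy as np

def tr_full(cores):
    """Contract a 3-core tensor ring into the full tensor."""
    return np.einsum('aib,bjc,cka->ijk', *cores)

rng = np.random.default_rng(1)
ranks, dims = [2, 3, 4, 2], [4, 5, 6]
cores = [rng.standard_normal((ranks[n], dims[n], ranks[n + 1])) for n in range(3)]

X = tr_full(cores)
# Mode permutation with n = 2: X^{P2}(i2, i3, i1) = X(i1, i2, i3) ...
X_perm = np.transpose(X, (1, 2, 0))
# ... equals the ring rebuilt from the circularly shifted cores (G2, G3, G1).
X_shift = tr_full([cores[1], cores[2], cores[0]])
assert np.allclose(X_perm, X_shift)
```

The property follows directly from the cyclic invariance of the trace in Eq. (3), which is exactly what the assertion checks.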
###### Definition 3.

(Tensor connection product (TCP) [43]) The tensor connection product merges a sequence of 3-order core tensors into a single 3-order tensor by contracting their shared rank modes and combining their middle modes. In particular, $\mathcal{G}^{\neq n}$, the TCP of all core factors except $\mathcal{G}_n$, is defined as:

$$\mathcal{G}^{\neq n}=\operatorname{TCP}(\mathcal{G}_{n+1},\cdots,\mathcal{G}_N,\mathcal{G}_1,\cdots,\mathcal{G}_{n-1})\in\mathbb{R}^{R_n\times(I_{n+1}\cdots I_N I_1\cdots I_{n-1})\times R_{n-1}}.$$
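A possible sketch of the TCP merge of a subchain, assuming the natural ordering of the combined middle modes (the `tcp` helper and the index layout are our own illustration, not the paper's code):

```python
import numpy as np

def tcp(cores):
    """Merge a chain of 3-order cores into one core of shape
    (R_left, prod of middle dims, R_right) by contracting shared ranks."""
    out = cores[0]
    for G in cores[1:]:
        out = np.einsum('aib,bjc->aijc', out, G)
        out = out.reshape(out.shape[0], -1, out.shape[-1])
    return out

# Hypothetical 4th-order example with ranks [2, 3, 2, 3] (R0 = R4 = 2).
rng = np.random.default_rng(2)
ranks, dims = [2, 3, 2, 3, 2], [4, 4, 4, 4]
cores = [rng.standard_normal((ranks[n], dims[n], ranks[n + 1])) for n in range(4)]

n = 1  # exclude G2 (zero-based index 1): G_{!=2} = TCP(G3, G4, G1)
sub = tcp(cores[n + 1:] + cores[:n])
print(sub.shape)  # (R2, I3*I4*I1, R1) = (2, 64, 3)

# Consistency: Trace(G2(i2) @ G_{!=2}(i3, i4, i1)) equals the tensor entry.
full = np.einsum('aib,bjc,ckd,dla->ijkl', *cores)
i1, i2, i3, i4 = 1, 2, 3, 0
flat = (i3 * dims[3] + i4) * dims[0] + i1
assert np.isclose(np.trace(cores[1][:, i2, :] @ sub[:, flat, :]),
                  full[i1, i2, i3, i4])
```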
###### Theorem 2.

(Expectation of inner product) Let $\mathcal{X}=f(\mathcal{G}_1,\cdots,\mathcal{G}_N)$ be a random tensor in TR format. The expectation of the inner product can be calculated by:

$$\mathbb{E}[\langle\operatorname{Vec}(\mathcal{X}),\operatorname{Vec}(\mathcal{X})\rangle]=\sum_{i_1,\cdots,i_N}\prod_{n=1}^{N}\big(\mathbf{E}_{R_n}\otimes\mathbf{K}_{R_nR_{n-1}}\otimes\mathbf{E}_{R_{n-1}}\big)\,\mathbb{E}\big[\mathbf{g}_n(i_n)\,\mathbf{g}_n(i_n)^T\big] \tag{4}$$

with

$$\mathbb{E}\big[\operatorname{Vec}(\mathcal{G}_n(i_n))\operatorname{Vec}(\mathcal{G}_n(i_n))^T\big]=\mathbb{E}[\operatorname{Vec}(\mathcal{G}_n(i_n))]\,\mathbb{E}[\operatorname{Vec}(\mathcal{G}_n(i_n))]^T+\operatorname{Var}\big(\operatorname{Vec}(\mathcal{G}_n(i_n))\big). \tag{5}$$
###### Proof.
$$\begin{aligned}
\mathbb{E}[\langle\operatorname{Vec}(\mathcal{X}),\operatorname{Vec}(\mathcal{X})\rangle]
&=\mathbb{E}\Big[\sum_{i_1,\cdots,i_N}\mathcal{X}(i_1,\cdots,i_N)\,\mathcal{X}(i_1,\cdots,i_N)\Big]\\
&=\mathbb{E}\Big[\sum_{i_1,\cdots,i_N}\operatorname{Trace}\Big(\prod_{n=1}^N\mathcal{G}_n(i_n)\Big)\operatorname{Trace}\Big(\prod_{n=1}^N\mathcal{G}_n(i_n)\Big)\Big]\\
&=\mathbb{E}\Big[\sum_{i_1,\cdots,i_N}\operatorname{Trace}\Big(\Big(\prod_{n=1}^N\mathcal{G}_n(i_n)\Big)\otimes\Big(\prod_{n=1}^N\mathcal{G}_n(i_n)\Big)\Big)\Big]\\
&=\mathbb{E}\Big[\sum_{i_1,\cdots,i_N}\operatorname{Trace}\Big(\prod_{n=1}^N\big(\mathcal{G}_n(i_n)\otimes\mathcal{G}_n(i_n)\big)\Big)\Big]\\
&=\sum_{i_1,\cdots,i_N}\prod_{n=1}^N\big(\mathbf{E}_{R_n}\otimes\mathbf{K}_{R_nR_{n-1}}\otimes\mathbf{E}_{R_{n-1}}\big)\,\mathbb{E}\big[\mathbf{g}_n(i_n)\,\mathbf{g}_n(i_n)^T\big]
\end{aligned} \tag{6}$$

where $\mathbf{g}_n(i_n)=\operatorname{Vec}(\mathcal{G}_n(i_n))$, $\mathbf{E}_{R_n}$ is the $R_n\times R_n$ identity matrix, and $\mathbf{K}_{R_nR_{n-1}}$ is the permutation (commutation) matrix.

## III Bayesian Tensor Ring Decomposition

### III-A Model Description

In this section, we present the Bayesian low-TR-rank decomposition based on the Student-T process. Given an incomplete tensor $\mathcal{T}_{\mathbb{O}}$, its entries are denoted by $\mathcal{T}_{i_1,\cdots,i_N}$, $(i_1,\cdots,i_N)\in\mathbb{O}$, where $\mathbb{O}$ is the set of indices of available data in $\mathcal{T}$. Our goal is to find a Bayesian low-TR-rank approximation of the observed tensor under a probabilistic framework, which is formulated as

$$p(\mathcal{T}_{\mathbb{O}}\mid\{\mathcal{G}_n\}_{n=1}^N,\tau)=\prod_{i_1=1}^{I_1}\cdots\prod_{i_N=1}^{I_N}\mathcal{N}\big(\mathcal{T}_{i_1,i_2,\cdots,i_N}\mid f(\mathcal{G}_1(i_1),\cdots,\mathcal{G}_N(i_N)),\,\tau^{-1}\big)^{\mathcal{O}_{i_1\cdots i_N}} \tag{7}$$

where $\mathcal{N}(x\mid\mu,\sigma^2)$ denotes a Gaussian density with mean $\mu$ and variance $\sigma^2$, $\tau$ denotes the noise precision, $\mathcal{G}_n(i_n)$ is the $i_n$-th lateral slice of $\mathcal{G}_n$, and $\mathcal{O}$ is the indicator tensor in which 1 represents an observed entry and 0 a missing entry. The likelihood model in (7) means the observed tensor is generated from two parts: the core factors in TR format and the noise.

Firstly, we assume the entries of the noise are independent and identically distributed random variables obeying a Gaussian distribution with zero mean and noise precision $\tau$. To learn $\tau$, we place a hyperprior over the noise precision, as follows:

$$p(\tau)=\operatorname{Ga}(\tau\mid a,b) \tag{8}$$

where $\operatorname{Ga}(\cdot\mid a,b)$ denotes a Gamma distribution. The expectation of $\tau$ is $\mathbb{E}[\tau]=a/b$. The parameters $a$ and $b$ are set to small values, e.g., $10^{-6}$, which makes the Gamma distribution a non-informative prior.

Secondly, we assume the recovered tensor has a low rank structure. To learn this structure, we assume the core factors follow a Student-T distribution and propose a two-layer multiplicative interaction model over the core factors. In the first layer, the entries of each core factor obey a Gaussian distribution with zero mean and a precision determined by two hyperparameter vectors:

$$p(\mathcal{G}_n\mid\boldsymbol{\lambda}^{(n-1)},\boldsymbol{\lambda}^{(n)})=\prod_{i_n=1}^{I_n}\prod_{r_{n-1}=1}^{R_{n-1}}\prod_{r_n=1}^{R_n}\mathcal{N}\big(\mathcal{G}_n(r_{n-1},i_n,r_n)\mid 0,\,(\lambda^{(n-1)}_{r_{n-1}}\lambda^{(n)}_{r_n})^{-1}\big) \tag{9}$$

where the hyperparameters $\boldsymbol{\lambda}^{(n-1)}\in\mathbb{R}^{R_{n-1}}$ and $\boldsymbol{\lambda}^{(n)}\in\mathbb{R}^{R_n}$ simultaneously control the components in $\mathcal{G}_n$. Specifically, we directly employ a sparsity-inducing prior over each slice of the core factor, leading to the following probabilistic framework:

$$p(\mathcal{G}_n\mid\boldsymbol{\lambda}^{(n-1)},\boldsymbol{\lambda}^{(n)})=\prod_{i_n=1}^{I_n}\mathcal{N}\big(\operatorname{Vec}(\mathcal{G}_n(i_n))\mid\mathbf{0},\,(\boldsymbol{\Lambda}^{(n-1)}\otimes\boldsymbol{\Lambda}^{(n)})^{-1}\big) \tag{10}$$

where $\boldsymbol{\Lambda}^{(n)}=\operatorname{diag}(\boldsymbol{\lambda}^{(n)})$ denotes the precision matrix. The second layer specifies a Gamma distribution as a hyperprior over $\boldsymbol{\lambda}^{(n)}$, as follows:

$$p(\boldsymbol{\lambda}^{(n)}\mid\mathbf{c}^n,\mathbf{d}^n)=\prod_{r_n=1}^{R_n}\operatorname{Ga}(\lambda^{(n)}_{r_n}\mid c^n_{r_n},d^n_{r_n}) \tag{11}$$

where the parameters $\mathbf{c}^n$ and $\mathbf{d}^n$ are set to small values to make the Gamma distribution a non-informative prior. Besides, the expectation of $\boldsymbol{\lambda}^{(n)}$ is defined as:

$$\mathbb{E}[\boldsymbol{\lambda}^{(n)}]=\mathbf{c}^n/\mathbf{d}^n.$$
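To make the two-layer prior of Eqs. (9)-(11) concrete, here is a minimal sampling sketch. The Gamma shape/scale values are illustrative stand-ins: the paper's non-informative choice $c=d=10^{-6}$ produces numerically extreme draws, so Gamma(2, 1) is used purely to show the mechanics (marginalizing the Gamma layer yields Student-T-distributed core entries):

```python
import numpy as np

rng = np.random.default_rng(3)
R_prev, R_n, I_n = 3, 4, 10

# Layer 2 (Eq. 11): Gamma hyperpriors over the two precision vectors.
lam_prev = rng.gamma(shape=2.0, scale=1.0, size=R_prev)
lam_n = rng.gamma(shape=2.0, scale=1.0, size=R_n)

# Layer 1 (Eqs. 9-10): entry (r_{n-1}, i_n, r_n) of the core factor is
# Gaussian with precision lam_prev[r_{n-1}] * lam_n[r_n].
std = 1.0 / np.sqrt(np.outer(lam_prev, lam_n))       # (R_prev, R_n) std devs
G_n = rng.standard_normal((R_prev, I_n, R_n)) * std[:, None, :]
print(G_n.shape)  # (3, 10, 4): a core factor drawn from the sparsity prior
```

A large sampled precision $\lambda^{(n)}_{r_n}$ shrinks the whole $r_n$-th rank component toward zero, which is how the prior prunes unnecessary TR ranks.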

For simplification, we set all hyperparameters $\mathbf{c}^n$ and $\mathbf{d}^n$ to the same small values, and collect all unknown parameters of the Bayesian TR model into $\mathcal{Z}=\{\{\mathcal{G}_n\}_{n=1}^N,\{\boldsymbol{\lambda}^{(n)}\}_{n=1}^N,\tau\}$. By combining the stages of the hierarchical Bayesian model, the joint distribution $p(\mathcal{T}_{\mathbb{O}},\mathcal{Z})$ can be written as:

$$p(\mathcal{T}_{\mathbb{O}},\mathcal{Z})=p(\mathcal{T}_{\mathbb{O}}\mid\{\mathcal{G}_n\}_{n=1}^N,\tau)\,p(\tau\mid a,b)\times\prod_{n=1}^N p(\mathcal{G}_n\mid\boldsymbol{\lambda}^{(n-1)},\boldsymbol{\lambda}^{(n)})\,p(\boldsymbol{\lambda}^{(n-1)}\mid\mathbf{c}^{n-1},\mathbf{d}^{n-1})\,p(\boldsymbol{\lambda}^{(n)}\mid\mathbf{c}^n,\mathbf{d}^n) \tag{12}$$

Our objective is to compute the conditional density of the latent variables given the observation, as follows:

$$p(\mathcal{Z}\mid\mathcal{T}_{\mathbb{O}})=\frac{p(\mathcal{Z},\mathcal{T}_{\mathbb{O}})}{\int p(\mathcal{Z},\mathcal{T}_{\mathbb{O}})\,d\mathcal{Z}} \tag{13}$$

Therefore, the missing entries can be inferred by the following equation:

$$p(\mathcal{T}_{\bar{\mathbb{O}}})=\int p(\mathcal{T}_{\bar{\mathbb{O}}}\mid\mathcal{Z})\,p(\mathcal{Z})\,d\mathcal{Z} \tag{14}$$

However, the integral over the variables is unavailable in closed form, which makes the posterior intractable.

### III-B Variational Bayesian Inference

In this section, we apply variational Bayesian inference (VBI) to tackle this problem. In variational inference, we specify a family of densities $q(\mathcal{Z})$ over the latent variables. Each $q(\mathcal{Z})$ is a candidate approximation to the exact posterior. Our goal is to find the best candidate, the one closest to $p(\mathcal{Z}\mid\mathcal{T}_{\mathbb{O}})$ in Kullback-Leibler (KL) divergence, that is:

$$q^*(\mathcal{Z})=\operatorname*{arg\,min}_{q(\mathcal{Z})}\ \operatorname{KL}\big(q(\mathcal{Z})\,\|\,p(\mathcal{Z}\mid\mathcal{T}_{\mathbb{O}})\big) \tag{15}$$

According to the definition of the KL divergence, problem (15) can be rewritten as:

$$\operatorname{KL}\big(q(\mathcal{Z})\,\|\,p(\mathcal{Z}\mid\mathcal{T}_{\mathbb{O}})\big)=\mathbb{E}[\ln q(\mathcal{Z})]-\mathbb{E}[\ln p(\mathcal{Z},\mathcal{T}_{\mathbb{O}})]+\ln p(\mathcal{T}_{\mathbb{O}}) \tag{16}$$

Since $\ln p(\mathcal{T}_{\mathbb{O}})$ is a constant with respect to $q(\mathcal{Z})$, problem (16) can be formulated as an optimization model:

$$\max_{q(\mathcal{G}_1),\cdots,q(\mathcal{G}_N),\,q(\boldsymbol{\lambda}^{(1)}),\cdots,q(\boldsymbol{\lambda}^{(N)}),\,q(\tau)}\ \mathbb{E}[\ln p(\mathcal{Z},\mathcal{T}_{\mathbb{O}})]-\mathbb{E}[\ln q(\mathcal{Z})] \tag{17}$$

where $q(\mathcal{Z})=\prod_{n=1}^N q(\mathcal{G}_n)\prod_{n=1}^N q(\boldsymbol{\lambda}^{(n)})\,q(\tau)$ based on the mean-field approximation. This means each parameter group in $\mathcal{Z}$ is assumed independent, and the factors can be optimized by iteratively updating each one while keeping the others fixed.
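The mean-field coordinate ascent scheme can be illustrated on a toy problem where the exact coordinate updates are known in closed form, a correlated 2D Gaussian; this is a generic VBI illustration, not the TR model itself:

```python
import numpy as np

# Toy "posterior": a correlated 2D Gaussian with mean mu and precision Lam.
mu = np.array([1.0, -2.0])
Lam = np.array([[2.0, 0.9],
                [0.9, 2.0]])

# Mean-field q(z1)q(z2): the optimal coordinate update for each Gaussian
# factor is m_i <- mu_i - (Lam_ij / Lam_ii) * (m_j - mu_j); update each
# factor in turn while keeping the other fixed.
m = np.zeros(2)
for _ in range(100):
    m[0] = mu[0] - Lam[0, 1] / Lam[0, 0] * (m[1] - mu[1])
    m[1] = mu[1] - Lam[1, 0] / Lam[1, 1] * (m[0] - mu[0])
print(np.round(m, 6))  # converges to the exact posterior mean [1, -2]
```

Each update is a maximization of (17) over one factor with the others fixed, which is exactly the scheme the following subsections instantiate for $q(\mathcal{G}_n)$, $q(\boldsymbol{\lambda}^{(n)})$ and $q(\tau)$.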

#### III-B1 Update q(Gn)

By substituting equation (12) into the optimization problem (17), we obtain the following subproblem with respect to $q(\mathcal{G}_n)$:

$$\max_{q(\mathcal{G}_n)}\ \mathbb{E}[\ln p(\mathcal{T}_{\mathbb{O}}\mid\mathcal{G}_n,\mathcal{G}^{\neq n},\tau)]+\mathbb{E}[\ln p(\mathcal{G}_n\mid\boldsymbol{\lambda}^{(n-1)},\boldsymbol{\lambda}^{(n)})]-\mathbb{E}[\ln q(\mathcal{G}_n)] \tag{18}$$

where

$$q(\mathcal{G}_n)=\prod_{i_n=1}^{I_n}\mathcal{N}\big(\operatorname{Vec}(\mathcal{G}_n(i_n))\mid\tilde{\mathbf{g}}_n(i_n),\,\mathbf{V}_n^{i_n}\big) \tag{19}$$

with mean $\tilde{\mathbf{g}}_n(i_n)$ and variance $\mathbf{V}_n^{i_n}$.

From (18), we can infer the core factor $\mathcal{G}_n$ by receiving the messages from the observed data $\mathcal{T}_{\mathbb{O}}$, the rest of the core factors $\mathcal{G}^{\neq n}$ and the noise precision $\tau$, and incorporating the message from its hyperparameters $\boldsymbol{\lambda}^{(n-1)}$ and $\boldsymbol{\lambda}^{(n)}$. Using these messages, the optimization model with respect to each $\operatorname{Vec}(\mathcal{G}_n(i_n))$ can be rewritten as (details of the derivation can be found in sec. 2 of the supplementary materials):

$$\begin{aligned}
\max_{\tilde{\mathbf{g}}_n(i_n),\,\mathbf{V}_n^{i_n}}\
&-\tfrac{1}{2}\Big\{\operatorname{Vec}(\mathcal{G}_n(i_n))^T\big(\mathbb{E}[\tau]\,\mathbb{E}[(\mathcal{G}^{\neq n}_{\mathbb{O}_{i_n}})^T\mathcal{G}^{\neq n}_{\mathbb{O}_{i_n}}]+\mathbb{E}[\boldsymbol{\Lambda}^{(n-1)}\otimes\boldsymbol{\Lambda}^{(n)}]\big)\operatorname{Vec}(\mathcal{G}_n(i_n))\\
&\qquad-\mathbb{E}[\tau]\,\mathcal{T}_{\mathbb{O}_{i_n}}^T\,\mathbb{E}[\mathcal{G}^{\neq n}_{\mathbb{O}_{i_n}}]\operatorname{Vec}(\mathcal{G}_n(i_n))-\mathbb{E}[\tau]\operatorname{Vec}(\mathcal{G}_n(i_n))^T\,\mathbb{E}[(\mathcal{G}^{\neq n}_{\mathbb{O}_{i_n}})^T]\,\mathcal{T}_{\mathbb{O}_{i_n}}\Big\}\\
&+\tfrac{1}{2}\Big\{\operatorname{Vec}(\mathcal{G}_n(i_n))^T(\mathbf{V}_n^{i_n})^{-1}\operatorname{Vec}(\mathcal{G}_n(i_n))\\
&\qquad-\tilde{\mathbf{g}}_n(i_n)^T(\mathbf{V}_n^{i_n})^{-1}\operatorname{Vec}(\mathcal{G}_n(i_n))-\operatorname{Vec}(\mathcal{G}_n(i_n))^T(\mathbf{V}_n^{i_n})^{-1}\tilde{\mathbf{g}}_n(i_n)\Big\}
\end{aligned} \tag{20}$$

The maximum is reached at:

$$\begin{aligned}
\mathbf{V}_n^{i_n}&=\big(\mathbb{E}[\tau]\,\mathbb{E}[(\mathcal{G}^{\neq n}_{\mathbb{O}_{i_n}})^T\mathcal{G}^{\neq n}_{\mathbb{O}_{i_n}}]+\mathbb{E}[\boldsymbol{\Lambda}^{(n-1)}\otimes\boldsymbol{\Lambda}^{(n)}]\big)^{-1}\\
\tilde{\mathbf{g}}_n(i_n)&=\mathbb{E}[\tau]\,\mathbf{V}_n^{i_n}\,\mathbb{E}[(\mathcal{G}^{\neq n}_{\mathbb{O}_{i_n}})^T]\,\mathcal{T}_{\mathbb{O}_{i_n}}
\end{aligned} \tag{21}$$

where $\mathcal{T}_{\mathbb{O}_{i_n}}$ collects the observed entries of $\mathcal{T}$ whose $n$-th index equals $i_n$, $\mathcal{G}^{\neq n}_{\mathbb{O}_{i_n}}$ is the corresponding submatrix of the merged subchain $\mathcal{G}^{\neq n}$, and $|\mathbb{O}_{i_n}|$ is the number of such observations.
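The update in (21) is a standard Bayesian linear-regression step per slice; here is a shape-level sketch with hypothetical toy inputs (all names are our own, and `A` stands in for the merged subchain rows restricted to the relevant observed entries):

```python
import numpy as np

def update_core_slice(tau_mean, A_mean, AtA_mean, prior_prec, t_obs):
    """Posterior mean/covariance of Vec(G_n(i_n)) per Eq. (21):
        V = (E[tau] * E[A^T A] + E[Lambda^(n-1) kron Lambda^(n)])^{-1}
        m = E[tau] * V @ E[A]^T @ t_obs."""
    V = np.linalg.inv(tau_mean * AtA_mean + prior_prec)
    m = tau_mean * V @ A_mean.T @ t_obs
    return m, V

# Toy shapes: latent dimension R_{n-1} * R_n = 6, 20 observed entries.
rng = np.random.default_rng(4)
A = rng.standard_normal((20, 6))
t = rng.standard_normal(20)
prior = np.eye(6)                  # stand-in for E[Lambda kron Lambda]
m, V = update_core_slice(2.0, A, A.T @ A, prior, t)
print(m.shape, V.shape)
```

Note how a strong prior precision (large entries of the Kronecker term) shrinks the posterior mean toward zero, which is the rank-pruning mechanism of the model.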

The main computational complexity of updating a core factor comes from the operation $\mathbb{E}[(\mathcal{G}^{\neq n}_{\mathbb{O}_{i_n}})^T\mathcal{G}^{\neq n}_{\mathbb{O}_{i_n}}]$. Based on Theorem 2, it can be calculated by the following equation:

$$\mathbb{E}[(\mathcal{G}^{\neq n}_{\mathbb{O}_{i_n}})^T\mathcal{G}^{\neq n}_{\mathbb{O}_{i_n}}]=\sum_{\mathbb{O}_{i_n}}\prod_{l\neq n}^{N}\big(\mathbf{E}_{R_l}\otimes\mathbf{K}_{R_lR_{l-1}}\otimes\mathbf{E}_{R_{l-1}}\big)\,\mathbb{E}\big[\mathbf{g}_l(i_l)\,\mathbf{g}_l(i_l)^T\big] \tag{22}$$

Assuming $I_n=I$ and $R_n=R$ for all $n$, the computational cost of (22) and of the resulting core factor update grows polynomially in $R$ and linearly in the number of observed entries.

#### III-B2 Update q(λ(n))

Combining equation (12) with problem (17), we obtain the subproblem with respect to $q(\boldsymbol{\lambda}^{(n)})$, as follows (details of the derivation can be found in sec. 3 of the supplementary materials):

$$\max_{q(\boldsymbol{\lambda}^{(n)})}\ \tfrac{1}{2}\Big\{\mathbb{E}\big[\ln p(\mathcal{G}_n\mid\boldsymbol{\lambda}^{(n-1)},\boldsymbol{\lambda}^{(n)})+\ln p(\mathcal{G}_{n+1}\mid\boldsymbol{\lambda}^{(n)},\boldsymbol{\lambda}^{(n+1)})+2\ln p(\boldsymbol{\lambda}^{(n)}\mid\mathbf{c}^n,\mathbf{d}^n)\big]\Big\}-\mathbb{E}[\ln q(\boldsymbol{\lambda}^{(n)})] \tag{23}$$

where

$$q(\boldsymbol{\lambda}^{(n)})=\prod_{r_n=1}^{R_n}\operatorname{Ga}(\lambda^{(n)}_{r_n}\mid\tilde{c}^n_{r_n},\tilde{d}^n_{r_n}) \tag{24}$$

with parameters $\tilde{c}^n_{r_n}$ and $\tilde{d}^n_{r_n}$.

As shown in (23), the inference of $\boldsymbol{\lambda}^{(n)}$ can be obtained by receiving the messages from its corresponding core factors $\mathcal{G}_n$ and $\mathcal{G}_{n+1}$ and from a pair of its partners $\boldsymbol{\lambda}^{(n-1)}$ and $\boldsymbol{\lambda}^{(n+1)}$, while combining this information with its hyperparameters $\mathbf{c}^n$ and $\mathbf{d}^n$. Therefore, for each posterior of $\lambda^{(n)}_{r_n}$, $r_n=1,\ldots,R_n$, the optimization model is

$$\begin{aligned}
\max_{\tilde{c}^n_{r_n},\,\tilde{d}^n_{r_n}}\ &\Big(c^n_{r_n}+\tfrac{1}{2}(I_nR_{n-1}+I_{n+1}R_{n+1})-1\Big)\ln\lambda^{(n)}_{r_n}\\
&-\Big\{d^n_{r_n}+\tfrac{1}{4}\big(\mathbb{E}[\boldsymbol{\lambda}^{(n-1)}]\,\mathbb{E}[\mathcal{G}_n(r_n)\mathcal{G}_n(r_n)^T]+\mathbb{E}[\boldsymbol{\lambda}^{(n+1)}]\,\mathbb{E}[\mathcal{G}_{n+1}(r_n)^T\mathcal{G}_{n+1}(r_n)]\big)\Big\}\lambda^{(n)}_{r_n}\\
&-(\tilde{c}^n_{r_n}-1)\ln\lambda^{(n)}_{r_n}+\tilde{d}^n_{r_n}\lambda^{(n)}_{r_n}
\end{aligned} \tag{25}$$

The optimal solutions are obtained by

$$\begin{aligned}
\tilde{c}^n_{r_n}&=c^n_{r_n}+\tfrac{1}{2}(I_nR_{n-1}+I_{n+1}R_{n+1})\\
\tilde{d}^n_{r_n}&=d^n_{r_n}+\tfrac{1}{4}\big(\mathbb{E}[\boldsymbol{\lambda}^{(n-1)}]\,\mathbb{E}[\mathcal{G}_n(r_n)\mathcal{G}_n(r_n)^T]+\mathbb{E}[\boldsymbol{\lambda}^{(n+1)}]\,\mathbb{E}[\mathcal{G}_{n+1}(r_n)^T\mathcal{G}_{n+1}(r_n)]\big)
\end{aligned} \tag{26}$$

where $\mathcal{G}_n(r_n)$ denotes the $r_n$-th slice of $\mathcal{G}_n$ along its last mode, and $\mathcal{G}_{n+1}(r_n)$ the $r_n$-th slice of $\mathcal{G}_{n+1}$ along its first mode.
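The closed-form Gamma updates in (26) are simple accumulations; a sketch with scalar stand-ins for the expected squared-slice terms (a deliberate simplification of the paper's vector quantities; all names are ours):

```python
def update_lambda_rn(c0, d0, I_n, R_prev, I_n1, R_next,
                     lam_prev_mean, lam_next_mean, sq_n, sq_n1):
    """Gamma posterior parameters for lambda^(n)_{r_n}, Eq. (26). Here sq_n
    and sq_n1 are scalar stand-ins for the expected squared-slice terms
    E[G_n(r_n) G_n(r_n)^T] and E[G_{n+1}(r_n)^T G_{n+1}(r_n)]."""
    c_new = c0 + 0.5 * (I_n * R_prev + I_n1 * R_next)
    d_new = d0 + 0.25 * (lam_prev_mean * sq_n + lam_next_mean * sq_n1)
    return c_new, d_new

c_new, d_new = update_lambda_rn(1e-6, 1e-6, 8, 3, 8, 3, 1.0, 1.0, 4.2, 3.8)
print(c_new, d_new)  # posterior mean E[lambda^(n)_{r_n}] = c_new / d_new
```

When the neighbouring slices carry little energy (small `sq_n`, `sq_n1`), `d_new` stays small and the posterior mean of $\lambda^{(n)}_{r_n}$ grows, pruning that rank component.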

The main computational complexity of updating $\boldsymbol{\lambda}^{(n)}$ comes from the calculation of $\tilde{d}^n_{r_n}$, which can be divided into two similar parts. For one of the parts:

$$\mathbb{E}[\boldsymbol{\lambda}^{(n-1)}]\,\mathbb{E}[\mathcal{G}_n(r_n)\mathcal{G}_n(r_n)^T]=\sum_{i_n=1}^{I_n}\mathbb{E}[\boldsymbol{\lambda}^{(n-1)}]\big(\mathbb{E}[\hat{\mathbf{g}}_n(i_n)]\,\mathbb{E}[\hat{\mathbf{g}}_n(i_n)]^T+\mathbf{V}_n(r_n)\big) \tag{27}$$

where $\hat{\mathbf{g}}_n(i_n)$ denotes the corresponding sub-vector of $\tilde{\mathbf{g}}_n(i_n)$. Therefore, for each $\lambda^{(n)}_{r_n}$, the complexity is polynomial in $R$ under the assumption that all $I_n=I$ and $R_n=R$.

#### III-B3 Update q(τ)

Similarly, the subproblem corresponding to $q(\tau)$ can be converted into:

$$\max_{q(\tau)}\ \mathbb{E}\big[\ln p(\mathcal{T}_{\mathbb{O}}\mid\{\mathcal{G}_n\}_{n=1}^N,\tau)+\ln p(\tau\mid a,b)\big]-\mathbb{E}[\ln q(\tau)]. \tag{28}$$

where

$$q(\tau)=\operatorname{Ga}(\tau\mid\tilde{a},\tilde{b}) \tag{29}$$

with parameters $\tilde{a}$ and $\tilde{b}$.

From (28), the inference of $\tau$ can be obtained by receiving messages from the observed tensor and the core factors, while incorporating the message from the hyperparameters $a$ and $b$. Applying these messages, (28) can be reformulated as:

$$\max_{\tilde{a},\,\tilde{b}}\ \Big(a+\tfrac{|\mathbb{O}|}{2}-1\Big)\ln\tau-\Big(b+\tfrac{1}{2}\mathbb{E}\big[\|\mathcal{O}\odot(\mathcal{T}-f(\mathcal{G}_1,\mathcal{G}_2,\cdots,\mathcal{G}_N))\|_F^2\big]\Big)\tau-\tilde{a}\ln\tau+\tilde{b}\tau \tag{30}$$

The maximum is obtained when

$$\tilde{a}=a+\frac{|\mathbb{O}|}{2},\qquad \tilde{b}=b+\frac{1}{2}\mathbb{E}\big[\|\mathcal{O}\odot(\mathcal{T}-\hat{\mathcal{X}})\|_F^2\big] \tag{31}$$

where $|\mathbb{O}|$ is the total number of observations and $\hat{\mathcal{X}}=f(\mathcal{G}_1,\cdots,\mathcal{G}_N)$.
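The noise-precision update in (31), with the residual expanded as in (32), can be sketched as follows; for illustration we plug in a point estimate of the reconstruction (i.e., zero posterior variance), and all names are our own:

```python
import numpy as np

def update_tau(a0, b0, t_obs, x_hat_obs, x_sq_mean_obs):
    """Posterior Gamma parameters per Eq. (31), with the expected residual
    over observed entries expanded as in Eq. (32):
      sum(T^2) - 2*sum(T * E[X^]) + sum(second-moment terms of X^)."""
    resid_sq = (np.sum(t_obs**2) - 2.0 * np.sum(t_obs * x_hat_obs)
                + np.sum(x_sq_mean_obs))
    return a0 + t_obs.size / 2.0, b0 + resid_sq / 2.0

rng = np.random.default_rng(5)
t = rng.standard_normal(30)
x_hat = t + 0.1 * rng.standard_normal(30)  # pretend posterior-mean reconstruction
a_new, b_new = update_tau(1e-6, 1e-6, t, x_hat, x_hat**2)
print(a_new / b_new)  # posterior mean of tau, roughly 1 / noise variance
```

A good fit (small residual) yields a small $\tilde{b}$ and hence a large expected precision, which in turn makes the core-factor updates trust the data more than the prior.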

It can be seen from the update in Sec. III-B3 that calculating $\mathbb{E}[\|\mathcal{O}\odot(\mathcal{T}-\hat{\mathcal{X}})\|_F^2]$ costs the most time; it can be expanded as follows:

$$\mathbb{E}\big[\|\mathcal{O}\odot(\mathcal{T}-\hat{\mathcal{X}})\|_F^2\big]=\|\mathcal{T}_{\mathbb{O}}\|_F^2-2\,\langle\mathcal{T}_{\mathbb{O}},\hat{\mathcal{X}}_{\mathbb{O}}\rangle+\mathbb{E}\big[\hat{\mathcal{X}}_{\mathbb{O}}\hat{\mathcal{X}}_{\mathbb{O}}^T\big] \tag{32}$$

with

$$\mathbb{E}\big[\hat{\mathcal{X}}_{\mathbb{O}}\hat{\mathcal{X}}_{\mathbb{O}}^T\big]=\sum_{\mathbb{O}}\prod_{n=1}^N\big(\mathbf{E}_{R_n}\otimes\mathbf{K}_{R_nR_{n-1}}\otimes\mathbf{E}_{R_{n-1}}\big)\,\mathbb{E}\big[\mathbf{g}_n(i_n)\,\mathbf{g}_n(i_n)^T\big] \tag{33}$$

We can see that the main computational cost of updating $q(\tau)$ comes from (33), under the assumption that all $I_n=I$ and $R_n=R$.

For clarity, we call this algorithm low-TR-rank completion based on the VBI framework (TR-VBI) and summarize it in Algorithm 1.

### III-C Discussion

We have developed an efficient algorithm to automatically determine the TR ranks for tensor completion. From the solution in Sec. III-B1, the update of $\mathcal{G}_n$ is related to the noise precision $\tau$, the information from the other core factors, and the prior $\boldsymbol{\Lambda}^{(n-1)}\otimes\boldsymbol{\Lambda}^{(n)}$. It can easily be inferred that the lower the noise precision, the more the update relies on the prior. Meanwhile, from Sec. III-B3, we observe that the update of the noise precision is driven by the fitting error. Therefore, if the model fits well, more information comes from the other factors than from the prior. In addition, from Sec. III-B2, the update of $\boldsymbol{\lambda}^{(n)}$ is associated with its interrelated core factors $\mathcal{G}_n$ and $\mathcal{G}_{n+1}$ and its partners $\boldsymbol{\lambda}^{(n-1)}$ and $\boldsymbol{\lambda}^{(n+1)}$. The values of $\boldsymbol{\lambda}^{(n-1)}$ and $\boldsymbol{\lambda}^{(n)}$ affect their corresponding core factors: smaller core factors $\mathcal{G}_n$ and $\mathcal{G}_{n+1}$ lead to a larger $\boldsymbol{\lambda}^{(n)}$, and larger values of $\boldsymbol{\lambda}^{(n-1)}$ and $\boldsymbol{\lambda}^{(n)}$ force the entries of the core factors to be smaller, which in turn influences the subsequent updates. Therefore, this model has a robust capability of automatically adjusting the tradeoff between the fitting error and the TR ranks.

### III-D Complexity Analysis

Storage Complexity. For an $N$-order tensor $\mathcal{T}\in\mathbb{R}^{I_1\times\cdots\times I_N}$, the storage cost is $O(\prod_{n=1}^N I_n)$, which increases exponentially with the order. Assuming all $I_n=I$ and $R_n=R$ in the TR model, we only need to store the core factors and the hyperparameters, which cost $O(NIR^2)$ and $O(NR)$ respectively, leading to an $O(NIR^2)$ storage complexity.

Computational Complexity. The computational cost of the proposed algorithm divides into three parts: the update of the core factors $\{\mathcal{G}_n\}$, the update of the noise precision $\tau$, and the update of the hyperparameters $\{\boldsymbol{\lambda}^{(n)}\}$. Combining these together, with $I_n=I$ and $R_n=R$, the per-iteration complexity is polynomial in $R$ and linear in the number of observed entries.

## IV Experiments

To evaluate our algorithm TR-VBI, we conduct experiments on synthetic data and real data, and compare it with TR-ALS [29], FBCP [44], HaLRTC [14] and SiLRTCTT [24]. TR-ALS and SiLRTCTT are advanced tensor network based methods: the former utilizes the factorization based model with the TR ranks known in advance, while SiLRTCTT is based on the TT rank minimization model, solved by the block coordinate descent method. FBCP and HaLRTC are based on traditional tensor decompositions: FBCP uses the CP decomposition within the BI framework, while HaLRTC explores the low Tucker rank structure using ADMM.

All experiments are conducted with respect to different missing ratios (MR), defined as:

$$\text{MR}=\frac{M}{\prod_{n=1}^N I_n}$$

where $M$ is the total number of missing entries, which are chosen uniformly at random.
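For instance, the missing ratio can be computed from an observation mask as follows (a trivial sketch with our own helper name):

```python
import numpy as np

def missing_ratio(obs_mask):
    """MR = M / prod(I_n), with M the number of missing entries;
    obs_mask is True at observed positions."""
    return np.count_nonzero(~obs_mask) / obs_mask.size

mask = np.ones((10, 10, 10), dtype=bool)
mask.flat[:800] = False          # mark 800 of the 1000 entries as missing
print(missing_ratio(mask))       # 0.8
```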

For the experiment on synthetic data, we consider the relative standard error (RSE) as a performance metric. The RSE is defined as

$$\text{RSE}=\frac{\|\hat{\mathcal{X}}-\mathcal{X}\|_F}{\|\mathcal{X}\|_F}$$

where $\hat{\mathcal{X}}$ is the recovered tensor and $\mathcal{X}$ is the original one. In addition, the peak signal-to-noise ratio (PSNR) is also used to evaluate performance in the image recovery experiments, which is