The problem of recovering a low dimensional linear subspace from high dimensional visual data naturally arises in the fields of computer vision, machine learning and statistics, and has drawn increasing attention in the recent years. Typical examples include representation and recognition of faces[1, 2, 3, 4], structure from motion , recognition of 3D objects under varying pose , motion segmentation 
. In such contexts, the data to be analyzed usually can be formulated as high-order tensors, which are natural generalization of vectors and matrices. Existing approaches, including LRMF and RPCA, proceed by matricizing tensors into matrices and then applying common matrix techniques to deal with tensor problems. However, as shown in, such matricization procedure fails to exploit the essential tensor structure and often leads to suboptimal procedure. Figure 1 illustrates the difference between the matrix based method and tensor based method in dealing with the high-order tensor data. The upper row is the matrix based factorization method, which needs to preliminarily matricize the tensor at the cost of losing data structure information; the lower row is our tensor based method which directly factorizes the tensor without destroying the spatial structures.
Given a high-order tensor data, an efficient way to extract the underlying useful information is low-rank tensor factorization (LRTF), which aims to extract low-rank subspaces underlying those vector spaces such that the original tensor can be suitably expressed through reasonably affiliating these subspaces.
Traditionally, there are two definitions of tensor factorizations, i.e., CP factorization and Tucker factorization.
The CP factorization can be viewed as a higher-order generalization of the matrix singular value decomposition[9, 10, 11, 12]
and has been widely applied to many fields, including image inpainting[13, 14], collaborative filtering , and data mining , etc. The idea of CP is to express a tensor as the sum of a finite number of rank-1 tensors. Mathematically, an -order tensor , with the integer indicating the dimension of along the -th order, is represented in the CP factorization form as
Each element of the tensor has the following form:
where The mode (or factor) matrices refer to the combination of the vectors from the rank-1 components, i.e., and likewise for others.
The Tucker factorization is a form of higher-order principal component analysis[9, 10, 11, 19] and has been widely used in many applications [20, 21, 22, 23, 24, 25, 26]. It decomposes a tensor into a core tensor multiplied (or transformed) by a matrix along each mode, written as
where denotes the -mode matrix product and is the core tensor controlling the interaction between the mode matrices .
Elementwise, the Tucker decomposition in Eq. (3) is
Here the defined -rank of , denoted as rank, considers the mode- rank of tensors. Accordingly, we call a rank- tensor.
As mentioned in , there exist a number of other tensor factorizations but they are all related to the CP and the Tucker factorization. On account of this, our work for robust low rank tensor factorization is designed based on this two typical factorizations.
It is known that the canonical fit function for the LRTF is based on the Frobenius norm function which assumes the noise to follow a Gaussian distribution. However, for many real data, such as the fMRI neuroimaging data  and the video surveillance data , a relative large perturbation in magnitude only affects a relatively small fraction of data points, which often violates the Gaussian assumption and follows a Laplacian distribution instead.
Therefore, it is necessary to consider other loss function that is robust to Laplacian noise. To alleviate this problem, one commonly used strategy is to replace the Frobenius norm function (say, norm) by the -type norm [30, 31], which is known to be robust to gross Laplacian perturbations. Unfortunately, in many real applications, the noise often exhibits very complex statistical distributions rather than a single purely Gaussian or Laplacian noise . This motivates us to consider more flexible modeling strategies to tackle such complex noise cases.
Under the framework of low-rank matrix factorization (LRMF), Meng and De la Torre  firstly proposed to model the noise as Mixture of Gaussians (MoG). They showed that the MoG model is a universal approximator to any continuous distribution, and hence could be capable of modeling a wider range of noise distributions. Along this line, Zhao et al.  further extended the MoG model to deal with robust PCA (RPCA) problem. Extensive experiments on synthetic data, face modeling and background subtraction demonstrated the merits of MoG model.
As such, to share the same light of matrix MoG model, we aim to introduce a novel MoG model to the tensor case for the LRTF task to overcome the drawbacks of existing models, which are only optimal for simple Gaussian or Laplacian noise.
The contributions of this paper can be summarized as follows: (1) As an extension of our last work , we propose a generalized low-rank subspace learning approach called generalized weighted low-rank tensor factorization (GWLRTF), i.e., the GWLRTF-CP and the GWLRTF-Tucker, which both can preserve the essential tensor structure; (2) For modelling complex noise, MoG is applied to the proposed GWLRTF model called generalized weighted low-rank tensor factorization integrated with MoG (MoG GWLRTF), i.e., the MoG GWLRTF-CP and the MoG GWLRTF-Tucker; (3) For solving the proposed model, we propose efficient algorithms to estimate the parameters under the EM framework and through the proposed algorithm of GWLRTF. Our strategy is different from not only the traditional EM algorithm for solving matrix/tensor decomposition models, but also conventional alternative least squares (ALS) techniques for solving other tensor factorization problems; (4) To further compare the performance between CP factorization and Tucker factorization in different real application problems, a series of synthetic and real data experiments are conducted. The source codes of our algorithm are published online: http://vision.sia.cn/our%20team/Hanzhi-homepage/vision-ZhiHan(English).html.
The rest of this paper is organized as follows. Section 2 describes the notation and common operations used in this paper. In Section 3, a generalized model of tensor factorization integrated with MoG is proposed for reconstructing low-rank tensor from high-order tensor with unknown noise. In Section 4, the established problem model is solved under the EM framework with new proposed tensor factorization algorithms. Extensive experiments are conducted on both synthetic data and real image data in Section 5.
Ii Notation and preliminaries
The notations and common operations used throughout the paper are defined as follows. Scalars are denoted by lowercase letters and vectors are denoted by bold lowercase letters with elements . Matrices are represented by uppercase letters with column vectors and elements . The calligraphic letters stand for the the high-order tensors. We denote an -order tensor as , where is a positive integer. Each element in it is represented as , where .
If tensor is a rank-1 tensor, then it can be written as the outer product of vectors, i.e.,
Each element of the tensor is the product of the corresponding vector elements which can be represented as:
The slice of an -order tensor is a matrix defined by fixing every index but two. For instance, the slice of a 3-order tensor has the form: frontal slices , lateral slices , horizontal slices . Meanwhile, each order of a tensor is associated with a ‘mode’ and the unfolding matrix of a tensor in each mode is obtained by unfolding the tensor along its corresponding mode. For example, the mode- unfolding matrix of , denoted as unfold. The inverse operation of the mode- unfolding is the mode- folding, represented as fold. The mode- rank of is defined as the rank of the mode- unfolding matrix rank.
The operation of mode- product of a tensor and a matrix forms a new tensor. Given tensor and matrix , their mode- product is calculated by with element
Given two same-sized tensors , their inner product is defined as:
The Frobenius norm is . The norm is to calculate the number of non-zero entries in , and the norm . To be noted, ,, and for any .
Iii MoG GWLRTF Model
In this section, a generalized weighted low-rank tensor factorization model integrated with MoG (MoG GWLRTF) for modelling complex noise is proposed. By applying MoG to model the noise elements of the input tensor, we obtain the log-likelihood optimization objective.
Firstly, taking the noise part (denoted as ) into consideration, the input tensor can be represented as:
where denotes the low-rank tensor and the corresponding elementwise form is
As MoG has the ability to universally approximate any hybrids of continuous distributions, it is adopted for modeling the unknown noise in the original data. Hence every follows an MoG and the distribution is defined as:
where is the mixing proportion with and . denotes the Gaussian distribution with mean
Therefore, the element in Eq. (10) follows a MoG distribution with mean and variance
. The probability of each elementin the input tensor can be represented as:
where . Correspondingly, the likelihood of can be written as
where is the index set of the non-missing entries of .
Our goal is to maximize the logarithmic form of Eq. (13) with respect to the parameters , i.e.
Iv Algorithms under EM framework
In this section, through assuming a latent variable with higher dimension, we solve the above problem iteratively under the EM framework. We also design algorithms to solve the generalized weighted low-rank tensor factorization model for updating the low-rank tensor.
EM algorithm  is proven to be effective for solving the maximization problem of the log-likelihood function. Therefore, for solving Eq. (14), we assume a higher dimensional latent variable under the EM framework.
In the model, the factorized low-rank tensor components are shared by all the clusters of MoG and the mean for each cluster of the standard EM algorithm is represented by them. Thus our proposed algorithm will iterate between computing responsibilities of all Gaussian components (E Step) and maximizing the parameters and the low-rank tensor in the model (M Step).
E Step: A latent variable is assumed in the model, with and , representing the assigned value of the noise to each component of the mixture. Here we denote . The posterior responsibility of the -th mixture for generating the noise of can be calculated by
The M step maximizes the upper bound given by the E step with regard to :
This maximization problem can be solved by alternatively updating the MoG parameters and the factorized components of low-rank tensor as follows:
M Step to update : The closed-form updates for the MoG parameters are:
M Step to update : Re-write Eq. (17) only with regard to the unknown as follows:
Here denotes the Hadamard product (component-wise multiplication) and the element of is
The whole MoG GWLRTF optimization process is summarized in Algorithm 1.
In the M Step, is evaluated by solving the GWLRTF model . Here, we introduce the two typical factorizations to the GWLRTF and the corresponding algorithms are given in detail in the following parts.
Iv-a Weighted low-rank tensor CP factorization (GWLRTF-CP)
The GWLRTF model of a 3-order tensor in the form of CP factorization can be written as
and are mode matrices with rank . is the weighted tensor which is composed by the standard variance of the input tensor elements.
Because of the effectiveness and implementation convenience of ALS, we adopt its idea to update of the tensor one at a time.
Suppose are data matrices. In order to stack each of the above matrix as a vector, we define the operator .
For each slice of the higher-order tensor, it can be viewed as a linear combination of the corresponding slices of all the rank-1 tensors. Different from other methods for solving the problem of LRTF, we stack each frontal slice of the higher-order tensor as a vector of a new matrix denoted as . Correspondingly, the vectorized horizontal slices and lateral slices are represented as and , respectively.
Firstly we have
Then taking term as an example, the vectorized frontal slice of the higher-order tensor can be written as follows:
For the -th frontal slice of the higher-order tensor, the vectorized corresponding slices of all the rank-1 tensors can be viewed as the -th element of the cell which can be represented as:
Then the -th vector of term can be updated as follows:
where represents the pseudo-inverse matrix of matrix , and denotes the transposed matrix of matrix .
Similarly, we have the term and updated as following:
The whole optimization process is summarized in Algorithm 2.
Iv-B Weighted low-rank tensor Tucker factorization (GWLRTF-Tucker)
By applying the Tucker factorization to the low-rank tensor in the GWLRTF model, we obtain the following GWLRTF-Tucker model
Through coordinate-wisely separating the original optimization problem into solving a sequence of scalar minimization subproblems, coordinate descent has exhibited its effectiveness in dealing with the convex optimization problems [39, 40, 41]. Based on this, we aim to coordinate-wisely optimize each entry of and in Eq. (32).
Update the mode matrices : Firstly, we reformulate Eq. (32) in the form of minimizing the function against only one of the unknown mode matrices (take for example) at a time with others fixed as
Unfolding the tensors along mode-, it can be reformulated as the following sub-problem
Then, the original problem Eq. (32) are separated into the following single-scalar parameter optimization sub-problems
Update the core tensor : Likewise, for the equivalent formulation of Eq. (32)
it can be rewritten as
Here we denote , , , then the optimization of Eq. (38) can be obtained by minimizing the following sub-problems
|MoG LRMF||HaLRTC||LRTA||PARAFAC||MSI DL||CWM LRTF||MoG GWLRTF-CP||MoG GWLRTF-Tucker|
|MoG LRMF||HaLRTC||LRTA||PARAFAC||MSI DL||CWM LRTF||MoG GWLRTF-CP||MoG GWLRTF-Tucker|
In this section, we conduct extensive experiments on both synthetic data and real applications to validate the effectiveness of the proposed MoG GWLRTF, compared with MoG LRMF , HaLRTC , LRTA , PARAFAC , MSI DL , CWM LRTF . Specifically, MoG GWLRTF-Tucker and MoG GWLRTF-CP are also demonstrated to further compare the performance of CP factorization and Tucker factorization in different applications. The synthetic experiments are designed to quantitatively assess our methods from: i) predictive performance over missing entries given an incomplete tensor data; ii) reconstruction performance given a both incomplete and noisy tensor data. Real data applications, i.e., single RGB image reconstruction, face modeling, multispectral image recovery and real hyperspectral image restoration, are further conducted to validate the effectiveness of the proposed algorithms.
V-a Synthetic Experiments
|Facade||MoG LRMF||HaLRTC||LRTA||PARAFAC||MSI DL||CWM LRTF||MoG GWLRTF-CP||MoG GWLRTF-Tucker|
|MoG LRMF||HaLRTC||LRTA||PARAFAC||MSI DL||CWM LRTF||MoG GWLRTF-CP||MoG GWLRTF-Tucker|
The synthetic tensor is generated as follows: firstly, matrices
are drawn from a standard normal distribution, i.e.,, the vectors of the matrices comply with a standard normal distribution ; Secondly, construct the true tensor by , and set the size to and CP rank . Then we conduct two synthetic experiments: i) for validating the predictive performance, we vary the true tensor missing entries rate (, , ) ; ii) for verifying the reconstruction performance, we randomly choose missing entries of the true tensor and further add certain type of noise to it as the following procedure: (1) Gaussian noise ; (2) Sparse noise:
of the non-missing entries with the uniformly distribution over [-5,5]; (3) Mixture noise:of the non-missing elements with the uniformly distribution over [-5,5], and of the rest non-missing with Gaussian noise and the rest with . The performance of each method is quantitatively assessed by the following measurements as used in :
where and are used to denote the noisy tensor and the recovered tensor, respectively. As mentioned in , and are the optimization objectives of existing methods, which assess how the reconstruction complies with the noisy input, but and are more meaningful for evaluating the correctness of the clean subspace recoveries. Therefore, we pay more attention to the quantitative indices of and . In the tables, the first and second best performances are marked out with bold and underline, respectively.
The performance of each method in the synthetic experiments are summarized in Table I and Table II, respectively. From Table I we can see that, in the case of varying data missing rate, our methods always have a relative better predictive performance in all evaluation terms. When the data is only disturbed by a single distribution noise, i.e., Gaussian noise or sparse noise, other methods can also obtain a fairly well results. However, when the noise becomes complex, our methods still have a good reconstruction performance in this case, as shown in Table II.
V-B Single RGB Image Reconstruction
The benchmark colorful building facade image is used in this section to evaluate the performance of different methods in image reconstruction. Note that the colorful image can be viewed as a 3-order tensor of size . Two groups of experiments are considered here.
Firstly, the facade image rescaled to [0,255] is randomly sampled with missing entries and then added with a relative small scale mixture noise: of the non-missing pixels with the uniformly distribution over [-35,35], of the rest non-missing pixels with Gaussian noise and the rest with another uniformly distribution .
The visual effect of each methods are demonstrated in Figure 2. For better visual comparison, we have also provided a zoom-in version of a local region in Figure 2. From the results we can see that the MoG GWLRTF-Tucker method has a better reconstruction performance than the matrix based methods and the traditional tensor based methods in reconstructing the image details.
Besides the visual effect, quantitative assessments are also reported. Three quantitative image quality indices are adopted to evaluate the performance of each method: peak signal-to-noise ratio (PSNR), relative standard error (RSE) and feature similarity (FSIM). Larger values of PSNR and FSIM and smaller values of RSE mean a better restoration results.
The quantitative results obtained by each method are given in Table III (the upper row), and it shows that the MoG GWLRTF-Tucker method is superior to all the other methods except the MoG GWLRTF-CP method which performs nearly as well in this small scale mixture noise case.
Secondly, in order to further compare the reconstruction ability of each method, we add a larger mixture noise to the facade image. The image is first rescaled to [0,1] and then a larger mixture noise added as in the synthetic experiments: missing entries, of the non-missing pixels with the uniformly distribution over [-5,5], of the rest non-missing pixels with Gaussian noise and the rest with another uniformly distribution .
Both the visual and the quantitative results are provided, as in Figure 3 and Table III (the lower row). In Figure 3, the zoom-in version of a local region for comparison is given. From the results, we can see that with the increasing of the mixture noise scale, the MoG GWLRTF-Tucker method performs much better than the MoG GWLRTF-CP method in reconstructing the image details. And they are both superior to the other methods which lose a lot of image structure information.
V-C Face Modeling
In this section, we assess the effectiveness of the proposed methods in face modeling with different objects under different illuminations. It is different from the traditional methods in dealing with face modeling problem which always only focus on one kind of object under different illuminations. The dataset is the ensemble subset of the Extended Yale B database , containing 45 faces of 5 objects and illuminations with size . The original tensor is thus generated with size . All the competing methods used above are also compared here. Considering that some tensor methods are only designed to dealing with 3-order tensors and the matrix methods are originally designed to solve matrix data, in this case the original 4-order tensor is vectorized into the 3-order tensor with size and matrix data with size before processing, respectively. The sampling images from this data subset are plotted in Figure 4.