1 Introduction
High-dimensional data, also referred to as tensors, arise naturally in a number of scenarios, including image and video processing and data mining [1]. However, most current processing techniques are developed for two-dimensional data [2]. Principal component analysis (PCA) is one of the most widely used methods in two-dimensional data analysis [3]. Robust PCA (RPCA), an extension of PCA, is an effective method for matrix decomposition problems [4]. Suppose we have a matrix $\mathbf{X}$ that can be decomposed as $\mathbf{X} = \mathbf{L}_0 + \mathbf{S}_0$, where $\mathbf{L}_0$ is the low rank component of the matrix and $\mathbf{S}_0$ is the sparse component. The RPCA method has been applied to image alignment [5], surveillance video processing [6], and illumination normalization for face images [7]. In most applications, the RPCA method must flatten or vectorize the tensor data so that the problem can be solved in matrix form. This does not use the structural features of the data effectively, since matricization incurs information loss.
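Tensor robust principal component analysis (TRPCA) has been studied in [8, 9] based on the tensor singular value decomposition (t-SVD). The advantage of t-SVD over existing methods such as canonical polyadic decomposition (CPD) [10] and Tucker decomposition [11] is that the resulting analysis is very close to that of matrix analysis [12]. Similarly, suppose we are given a tensor $\mathcal{X} \in \mathbb{R}^{N_1 \times N_2 \times N_3}$ that can be decomposed into a low rank component and a sparse component. We can write it as
$$\mathcal{X} = \mathcal{L}_0 + \mathcal{S}_0, \tag{1}$$
where $\mathcal{L}_0$ denotes the low rank component and $\mathcal{S}_0$ is the sparse component of the tensor. Fig. 1 illustrates TRPCA. In [8], problem (1) is transformed into the convex optimization model
$$\min_{\mathcal{L},\, \mathcal{S}} \ \|\mathcal{L}\|_{\mathrm{TNN}} + \lambda \|\mathcal{S}\|_{1} \quad \text{s.t.} \quad \mathcal{X} = \mathcal{L} + \mathcal{S}, \tag{2}$$
where $\|\cdot\|_{\mathrm{TNN}}$ is the tensor nuclear norm (see Section 2 for the definition) and $\|\cdot\|_{1}$ denotes the $\ell_1$ norm. In [9], problem (1) is transformed into another convex optimization model:
$$\min_{\mathcal{L},\, \mathcal{S}} \ \|\mathcal{L}\|_{\mathrm{TNN}} + \lambda \|\mathcal{S}\|_{1,1,2} \quad \text{s.t.} \quad \mathcal{X} = \mathcal{L} + \mathcal{S}, \tag{3}$$
where $\|\mathcal{S}\|_{1,1,2}$ is defined as $\sum_{i,j} \|\mathcal{S}(i,j,:)\|_F$. Both methods solve the tensor decomposition problem based on the t-SVD.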
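To make the two sparsity penalties in (2) and (3) concrete, the following NumPy sketch evaluates both on a small tensor. The function names are illustrative and do not come from [8] or [9]; the tube-wise norm is assumed to be the sum, over all positions $(i, j)$, of the Euclidean norm of the corresponding tube.

```python
import numpy as np

def l1_norm(S):
    """Entrywise l1 norm: sum of the absolute values of all entries."""
    return np.abs(S).sum()

def l112_norm(S):
    """Sum over (i, j) of the Euclidean norm of the tube S[i, j, :]."""
    return np.linalg.norm(S, axis=2).sum()

S = np.zeros((2, 2, 2))
S[0, 0, :] = [3.0, 4.0]   # one dense tube
S[1, 1, 0] = 2.0          # one isolated entry

print(l1_norm(S))    # 9.0
print(l112_norm(S))  # 7.0
```

The tube-wise norm charges the dense tube 5 rather than 7, so it favors sparse components whose nonzeros are grouped along the third mode.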
Low rank and sparse matrix decomposition was improved in [13]. The main idea is to incorporate multiscale structures into low rank methods; the additional multiscale structures yield a more accurate representation than conventional low rank methods. Inspired by this work, we notice that the sparse component of a matrix is block-distributed in some applications, e.g. shadow and motion in videos. For such images we find it more effective to extract the low rank components at a smaller scale of the image data. Here we extract low rank components from block tensor data formed by stacking small-scale image data. When the tensor data are decomposed into many small blocks, it is easy to extract the principal component in blocks that contain few sparse components. We therefore model the tensor data as the concatenation of block tensors instead of solving the RPCA problem on one large tensor. Fig. 2 illustrates the concatenation of block tensors.
Based on the above motivation, we decompose the whole tensor into a concatenation of blocks of the same size, and then extract the low rank component of each block by minimizing the tubal rank of each block tensor. Fig. 3 illustrates our method. The low rank component of the whole tensor is obtained by concatenating the low rank components of all block tensors. The proposed method can be applied to some conventional image processing problems, including motion separation for surveillance videos (Section 4.1) and illumination normalization for face images (Section 4.2). The results of numerical experiments demonstrate that our method outperforms the existing methods in terms of accuracy.
2 Notations and Preliminaries
In this section, we briefly describe the notations and definitions used throughout the paper [14, 15, 16, 17].
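A third-order tensor is represented as $\mathcal{A} \in \mathbb{R}^{N_1 \times N_2 \times N_3}$, and its $(i, j, k)$th entry is represented as $\mathcal{A}(i, j, k)$. $\mathcal{A}(i, j, :)$ denotes the $(i, j)$th tubal scalar. $\mathcal{A}(i, :, :)$, $\mathcal{A}(:, j, :)$ and $\mathcal{A}(:, :, k)$ are the $i$th horizontal, $j$th lateral and $k$th frontal slices, respectively. $\|\mathcal{A}\|_F = \big(\sum_{i,j,k} |\mathcal{A}(i,j,k)|^2\big)^{1/2}$ and $\|\mathcal{A}\|_1 = \sum_{i,j,k} |\mathcal{A}(i,j,k)|$ are two kinds of tensor norms.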
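We can view a three-dimensional tensor of size $N_1 \times N_2 \times N_3$ as an $N_1 \times N_2$ matrix of tubes. $\hat{\mathcal{A}}$ is the tensor obtained by taking the fast Fourier transform (FFT) along the third mode of $\mathcal{A}$. For compact notation we write $\hat{\mathcal{A}} = \mathrm{fft}(\mathcal{A}, [\,], 3)$ to denote the FFT along the third dimension. In the same way, we can compute $\mathcal{A}$ from $\hat{\mathcal{A}}$ using the inverse FFT (IFFT).

Definition 2.1 (t-product) [12] The t-product $\mathcal{C} = \mathcal{A} * \mathcal{B}$ of $\mathcal{A} \in \mathbb{R}^{N_1 \times N_2 \times N_3}$ and $\mathcal{B} \in \mathbb{R}^{N_2 \times N_4 \times N_3}$ is an $N_1 \times N_4 \times N_3$ tensor. The $(i, j)$th tube of $\mathcal{C}$ is given by
$$\mathcal{C}(i, j, :) = \sum_{k=1}^{N_2} \mathcal{A}(i, k, :) \circledast \mathcal{B}(k, j, :), \tag{4}$$
where $\circledast$ denotes the circular convolution between two tubes of the same size.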
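Because circular convolution becomes elementwise multiplication under the FFT, the t-product can be computed as ordinary matrix products of the frontal slices in the Fourier domain. The sketch below, written in NumPy for real tensors, checks this against a direct evaluation of the tube-wise definition in (4). The function names are illustrative only.

```python
import numpy as np

def t_product(A, B):
    """t-product of A (n1 x n2 x n3) and B (n2 x n4 x n3) via the FFT."""
    A_hat = np.fft.fft(A, axis=2)
    B_hat = np.fft.fft(B, axis=2)
    # Slice-wise matrix products in the Fourier domain.
    C_hat = np.einsum('ikt,kjt->ijt', A_hat, B_hat)
    return np.fft.ifft(C_hat, axis=2).real

def circ_conv(a, b):
    """Circular convolution of two tubes of equal length (direct sum)."""
    n = len(a)
    return np.array([sum(a[m] * b[(t - m) % n] for m in range(n))
                     for t in range(n)])

def t_product_direct(A, B):
    """t-product evaluated tube by tube from the definition."""
    n1, n2, n3 = A.shape
    n4 = B.shape[1]
    C = np.zeros((n1, n4, n3))
    for i in range(n1):
        for j in range(n4):
            for k in range(n2):
                C[i, j, :] += circ_conv(A[i, k, :], B[k, j, :])
    return C

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 4, 5))
B = rng.standard_normal((4, 2, 5))
print(np.allclose(t_product(A, B), t_product_direct(A, B)))  # True
```

The FFT route needs one matrix product per frontal slice, which is why the algorithms in this paper work in the Fourier domain.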
Definition 2.2 (conjugate transpose) [14] The conjugate transpose of a tensor $\mathcal{A}$ of size $N_1 \times N_2 \times N_3$ is the $N_2 \times N_1 \times N_3$ tensor $\mathcal{A}^{\mathrm{T}}$ obtained by conjugate transposing each of the frontal slices and then reversing the order of the transposed frontal slices 2 through $N_3$.
Definition 2.3 (identity tensor) [14] The identity tensor $\mathcal{I} \in \mathbb{R}^{N \times N \times N_3}$ is a tensor whose first frontal slice is the $N \times N$ identity matrix and whose other frontal slices are all zero.
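Definition 2.4 (orthogonal tensor) [14] A tensor $\mathcal{Q}$ is orthogonal if it satisfies
$$\mathcal{Q}^{\mathrm{T}} * \mathcal{Q} = \mathcal{Q} * \mathcal{Q}^{\mathrm{T}} = \mathcal{I}. \tag{5}$$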
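Definitions 2.2 to 2.4 can be checked numerically. The following NumPy sketch, restricted to real tensors, builds the tensor transpose and the identity tensor and verifies the orthogonality relation on a simple example: a tensor whose first frontal slice is a rotation matrix and whose other slices are zero. The helper names and the example tensor are illustrative assumptions.

```python
import numpy as np

def t_product(A, B):
    """t-product computed slice by slice in the Fourier domain."""
    A_hat = np.fft.fft(A, axis=2)
    B_hat = np.fft.fft(B, axis=2)
    C_hat = np.einsum('ikt,kjt->ijt', A_hat, B_hat)
    return np.fft.ifft(C_hat, axis=2).real

def t_transpose(A):
    """Transpose each frontal slice, then reverse the order of slices 2..n3."""
    At = np.transpose(A, (1, 0, 2))
    return np.concatenate([At[:, :, :1], At[:, :, :0:-1]], axis=2)

def identity_tensor(n, n3):
    """n x n x n3 tensor whose first frontal slice is the identity matrix."""
    I = np.zeros((n, n, n3))
    I[:, :, 0] = np.eye(n)
    return I

rng = np.random.default_rng(1)
A = rng.standard_normal((3, 3, 4))
I = identity_tensor(3, 4)

# Orthogonal first slice, zero remaining slices: an orthogonal tensor.
theta = 0.3
Q = np.zeros((2, 2, 4))
Q[:, :, 0] = [[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]]

print(np.allclose(t_product(I, A), A))                                   # True
print(np.allclose(t_product(t_transpose(Q), Q), identity_tensor(2, 4)))  # True
print(np.allclose(t_product(Q, t_transpose(Q)), identity_tensor(2, 4)))  # True
```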
Definition 2.5 (f-diagonal tensor) [14] A tensor is called f-diagonal if each of its frontal slices is a diagonal matrix.
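Definition 2.6 (t-SVD) [14] For $\mathcal{A} \in \mathbb{R}^{N_1 \times N_2 \times N_3}$, the t-SVD of $\mathcal{A}$ is given by
$$\mathcal{A} = \mathcal{U} * \mathcal{S} * \mathcal{V}^{\mathrm{T}}, \tag{6}$$
where $\mathcal{U}$ and $\mathcal{V}$ are orthogonal tensors of size $N_1 \times N_1 \times N_3$ and $N_2 \times N_2 \times N_3$, respectively, and $\mathcal{S}$ is an f-diagonal tensor of size $N_1 \times N_2 \times N_3$.
We can obtain this decomposition by computing the matrix singular value decomposition (SVD) in the Fourier domain, as shown in Algorithm 1. Fig. 4 illustrates the decomposition for the three-dimensional case.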
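Algorithm 1: t-SVD for 3-way data
Input: $\mathcal{A} \in \mathbb{R}^{N_1 \times N_2 \times N_3}$
$\hat{\mathcal{A}} = \mathrm{fft}(\mathcal{A}, [\,], 3)$,
for $k = 1$ to $N_3$ do
[$\mathbf{U}$, $\mathbf{S}$, $\mathbf{V}$] = svd($\hat{\mathcal{A}}(:, :, k)$),
$\hat{\mathcal{U}}(:, :, k) = \mathbf{U}$, $\hat{\mathcal{S}}(:, :, k) = \mathbf{S}$, $\hat{\mathcal{V}}(:, :, k) = \mathbf{V}$,
end for
$\mathcal{U} = \mathrm{ifft}(\hat{\mathcal{U}}, [\,], 3)$, $\mathcal{S} = \mathrm{ifft}(\hat{\mathcal{S}}, [\,], 3)$, $\mathcal{V} = \mathrm{ifft}(\hat{\mathcal{V}}, [\,], 3)$,
Output: $\mathcal{U}$, $\mathcal{S}$, $\mathcal{V}$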
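Algorithm 1 can be sketched in NumPy as follows. This version is an illustration rather than the authors' implementation, and it adds one detail not spelled out in the algorithm: the SVD is computed only for the first half of the Fourier slices, and the remaining slices are filled in by conjugate symmetry, so that the inverse FFT returns real factors for a real input tensor.

```python
import numpy as np

def t_svd(A):
    """t-SVD of a real n1 x n2 x n3 tensor, A = U * S * V^T."""
    n1, n2, n3 = A.shape
    r = min(n1, n2)
    A_hat = np.fft.fft(A, axis=2)
    U_hat = np.zeros((n1, n1, n3), dtype=complex)
    S_hat = np.zeros((n1, n2, n3), dtype=complex)
    V_hat = np.zeros((n2, n2, n3), dtype=complex)
    for k in range(n3 // 2 + 1):
        slice_k = A_hat[:, :, k]
        # Slices 0 and n3/2 (n3 even) are real for real A: use a real SVD.
        if k == 0 or 2 * k == n3:
            slice_k = slice_k.real
        U, s, Vh = np.linalg.svd(slice_k)
        U_hat[:, :, k] = U
        S_hat[:r, :r, k] = np.diag(s)
        V_hat[:, :, k] = Vh.conj().T
    # Fill the remaining slices by conjugate symmetry.
    for k in range(n3 // 2 + 1, n3):
        U_hat[:, :, k] = U_hat[:, :, n3 - k].conj()
        S_hat[:, :, k] = S_hat[:, :, n3 - k]
        V_hat[:, :, k] = V_hat[:, :, n3 - k].conj()
    U = np.fft.ifft(U_hat, axis=2).real
    S = np.fft.ifft(S_hat, axis=2).real
    V = np.fft.ifft(V_hat, axis=2).real
    return U, S, V

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 3, 5))
U, S, V = t_svd(A)

# Rebuild A slice by slice in the Fourier domain: U_hat @ S_hat @ V_hat^H.
fft3 = lambda T: np.fft.fft(T, axis=2)
rec_hat = np.einsum('ikt,kjt,ljt->ilt', fft3(U), fft3(S), fft3(V).conj())
A_rec = np.fft.ifft(rec_hat, axis=2).real
print(np.allclose(A_rec, A))  # True
```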
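Definition 2.7 (tensor multi-rank and tubal rank) [9] The tensor multi-rank of $\mathcal{A} \in \mathbb{R}^{N_1 \times N_2 \times N_3}$ is a vector $\mathbf{r} \in \mathbb{R}^{N_3}$ with its $i$th entry as the rank of the $i$th frontal slice of $\hat{\mathcal{A}}$, i.e. $r_i = \mathrm{rank}(\hat{\mathcal{A}}(:, :, i))$. The tensor tubal rank, denoted by $\mathrm{rank}_t(\mathcal{A})$, is defined as the number of nonzero singular tubes of $\mathcal{S}$, where $\mathcal{S}$ is from $\mathcal{A} = \mathcal{U} * \mathcal{S} * \mathcal{V}^{\mathrm{T}}$, i.e.
$$\mathrm{rank}_t(\mathcal{A}) = \#\{ i : \mathcal{S}(i, i, :) \neq 0 \} = \max_i r_i. \tag{7}$$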
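Both ranks in Definition 2.7 can be read off the Fourier-domain frontal slices. The sketch below is a minimal NumPy illustration; the function names and the tolerance `tol` are assumptions. It builds a tensor as the t-product of a $6 \times 2 \times 4$ and a $2 \times 5 \times 4$ random tensor, which has tubal rank 2.

```python
import numpy as np

def multi_rank(A, tol=1e-8):
    """Rank of each frontal slice of the FFT of A along the third mode."""
    A_hat = np.fft.fft(A, axis=2)
    return np.array([np.linalg.matrix_rank(A_hat[:, :, k], tol=tol)
                     for k in range(A.shape[2])])

def tubal_rank(A, tol=1e-8):
    """Tubal rank: the largest entry of the multi-rank."""
    return int(multi_rank(A, tol).max())

rng = np.random.default_rng(2)
G = rng.standard_normal((6, 2, 4))
H = rng.standard_normal((2, 5, 4))
# t-product of G and H, formed slice by slice in the Fourier domain.
L_hat = np.einsum('ikt,kjt->ijt', np.fft.fft(G, axis=2), np.fft.fft(H, axis=2))
L = np.fft.ifft(L_hat, axis=2).real

print(multi_rank(L))  # [2 2 2 2]
print(tubal_rank(L))  # 2
```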
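Definition 2.8 (tensor nuclear norm: TNN) [9] The tensor nuclear norm of $\mathcal{A}$, denoted by $\|\mathcal{A}\|_{\mathrm{TNN}}$, is defined as the sum of the singular values of all the frontal slices of $\hat{\mathcal{A}}$. The TNN of $\mathcal{A}$ is equal to the matrix nuclear norm of blkdiag($\hat{\mathcal{A}}$). Here blkdiag($\hat{\mathcal{A}}$) is a block diagonal matrix defined as follows:
$$\mathrm{blkdiag}(\hat{\mathcal{A}}) = \begin{bmatrix} \hat{\mathcal{A}}^{(1)} & & & \\ & \hat{\mathcal{A}}^{(2)} & & \\ & & \ddots & \\ & & & \hat{\mathcal{A}}^{(N_3)} \end{bmatrix}, \tag{8}$$
where $\hat{\mathcal{A}}^{(i)}$ is the $i$th frontal slice of $\hat{\mathcal{A}}$.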
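The equivalence stated in Definition 2.8 is easy to verify numerically. The following NumPy sketch computes the TNN as the sum of the singular values of the Fourier-domain frontal slices, without any extra normalization factor, and compares it with the nuclear norm of the block diagonal matrix. The function names are illustrative.

```python
import numpy as np

def tnn(A):
    """Sum of the singular values of all Fourier-domain frontal slices."""
    A_hat = np.fft.fft(A, axis=2)
    return sum(np.linalg.svd(A_hat[:, :, k], compute_uv=False).sum()
               for k in range(A.shape[2]))

def blkdiag(A_hat):
    """Block diagonal matrix with the frontal slices on the diagonal."""
    n1, n2, n3 = A_hat.shape
    M = np.zeros((n1 * n3, n2 * n3), dtype=A_hat.dtype)
    for k in range(n3):
        M[k * n1:(k + 1) * n1, k * n2:(k + 1) * n2] = A_hat[:, :, k]
    return M

rng = np.random.default_rng(3)
A = rng.standard_normal((4, 3, 5))
lhs = tnn(A)
rhs = np.linalg.norm(blkdiag(np.fft.fft(A, axis=2)), 'nuc')
print(np.isclose(lhs, rhs))  # True
```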
Definition 2.9 (standard tensor basis) [12] The column basis, denoted as $\mathring{e}_i$, is a tensor of size $N_1 \times 1 \times N_3$ with its $(i, 1, 1)$th entry equal to 1 and the rest equal to 0. Naturally, its transpose $\mathring{e}_i^{\mathrm{T}}$ is called the row basis.
3 Iterative Block Tensor Singular Value Thresholding
We decompose the whole tensor, which satisfies the incoherence conditions, into many small blocks of the same size. The third dimension of each block is the same as the third dimension of the tensor. That is, given an input tensor $\mathcal{X} \in \mathbb{R}^{N_1 \times N_2 \times N_3}$ and a corresponding block size $n_1 \times n_2 \times N_3$, we propose a multi-block tensor model that represents the tensor data $\mathcal{X}$ as the concatenation of block tensors $\mathcal{X}_p$, $p = 1, \dots, P$. Each block tensor can be decomposed into two components, i.e. $\mathcal{X}_p = \mathcal{L}_p + \mathcal{S}_p$, where $\mathcal{L}_p$ and $\mathcal{S}_p$ denote the low rank component and the sparse component of the block tensor $\mathcal{X}_p$, respectively.
As observed in RPCA, low rank and sparse decomposition is impossible in some cases [4]. Similarly, we cannot separate the low rank component from the sparse component if the tensor is both low rank and sparse. Similar to the tensor incoherence conditions in [8], we assume that the data in each block satisfy some block tensor incoherence conditions, which guarantee successful extraction of the low rank component.
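Definition 3.1 (block tensor incoherence conditions) For $\mathcal{L}_p \in \mathbb{R}^{n_1 \times n_2 \times N_3}$, assume that $\mathrm{rank}_t(\mathcal{L}_p) = r$ and it has the t-SVD $\mathcal{L}_p = \mathcal{U}_p * \mathcal{S}_p * \mathcal{V}_p^{\mathrm{T}}$, where $\mathcal{U}_p \in \mathbb{R}^{n_1 \times r \times N_3}$ and $\mathcal{V}_p \in \mathbb{R}^{n_2 \times r \times N_3}$ satisfy $\mathcal{U}_p^{\mathrm{T}} * \mathcal{U}_p = \mathcal{I}$ and $\mathcal{V}_p^{\mathrm{T}} * \mathcal{V}_p = \mathcal{I}$, and $\mathcal{S}_p \in \mathbb{R}^{r \times r \times N_3}$ is an f-diagonal tensor. Then $\mathcal{L}_p$ satisfies the tensor incoherence conditions with parameter $\mu > 0$ if
$$\max_{i = 1, \dots, n_1} \|\mathcal{U}_p^{\mathrm{T}} * \mathring{e}_i\|_F \le \sqrt{\frac{\mu r}{n_1 N_3}}, \tag{9}$$
$$\max_{j = 1, \dots, n_2} \|\mathcal{V}_p^{\mathrm{T}} * \mathring{e}_j\|_F \le \sqrt{\frac{\mu r}{n_2 N_3}}, \tag{10}$$
and
$$\|\mathcal{U}_p * \mathcal{V}_p^{\mathrm{T}}\|_\infty \le \sqrt{\frac{\mu r}{n_1 n_2 N_3^2}}. \tag{11}$$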
The incoherence conditions guarantee that, for small values of $\mu$, the singular vectors are not sparse, so the tensor can be decomposed into a low rank component and a sparse component.
To extract the low rank component from every block, we minimize the tensor nuclear norm of $\mathcal{L}_p$, i.e. $\|\mathcal{L}_p\|_{\mathrm{TNN}}$. Here we can use the singular value thresholding operator in the Fourier domain to extract the low rank component of the block tensor [18, 19]. The proposed method is called iterative block tensor singular value thresholding (IBTSVT). The thresholding operator used here is the soft one:
$$\mathcal{D}_\tau(\hat{\mathcal{S}}) = \mathrm{sign}(\hat{\mathcal{S}}) \, (|\hat{\mathcal{S}}| - \tau)_+, \tag{12}$$
where “$(\cdot)_+$” keeps the positive part.
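As an illustration of singular value thresholding in the Fourier domain, the NumPy sketch below soft-thresholds the singular values of every Fourier-domain frontal slice of a tensor and transforms the result back. It is a sketch under stated assumptions, not the authors' code: the names `soft_threshold` and `t_svt` and the threshold `tau` are chosen here for illustration.

```python
import numpy as np

def soft_threshold(x, tau):
    """Soft thresholding: shrink magnitudes by tau, keep the positive part."""
    return np.sign(x) * np.maximum(np.abs(x) - tau, 0.0)

def t_svt(A, tau):
    """Soft-threshold the singular values of each Fourier-domain slice."""
    A_hat = np.fft.fft(A, axis=2)
    L_hat = np.zeros_like(A_hat)
    for k in range(A.shape[2]):
        U, s, Vh = np.linalg.svd(A_hat[:, :, k], full_matrices=False)
        # Scale the columns of U by the shrunken singular values.
        L_hat[:, :, k] = (U * soft_threshold(s, tau)) @ Vh
    return np.fft.ifft(L_hat, axis=2).real

rng = np.random.default_rng(4)
A = rng.standard_normal((5, 4, 3))
print(np.allclose(t_svt(A, 0.0), A))    # True: no shrinkage
print(np.allclose(t_svt(A, 1e6), 0.0))  # True: every singular value is removed
```

Between these two extremes, a larger `tau` removes more of the small singular values and therefore returns a tensor of lower tubal rank.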
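After we extract the low rank component $\mathcal{L} = \mathcal{L}_1 \boxplus \mathcal{L}_2 \boxplus \cdots \boxplus \mathcal{L}_P$, where $\boxplus$ denotes the concatenation operation, we can get the sparse component of the tensor by computing $\mathcal{S} = \mathcal{X} - \mathcal{L}$. See Algorithm 2 for details.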
Algorithm 2: IBTSVT
Input: tensor data $\mathcal{X}$
Initialize: given , , , , and 
block tensors of size , 
while not converged do 
1. Update , 
2. Update , 
3. Compute , . 
end while: 
at ()th step. 
Output: 
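The block-wise structure of the method can be sketched as follows. This NumPy example performs a single pass only: it splits the tensor into blocks that span the full third dimension, applies singular value thresholding in the Fourier domain to each block, concatenates the results, and takes the residual as the sparse component. The iterative updates and stopping rule of Algorithm 2 are not reproduced, and the names `block_t_svt`, `block` and `tau`, as well as the synthetic data, are assumptions made for illustration.

```python
import numpy as np

def t_svt(A, tau):
    """Soft-threshold the singular values of each Fourier-domain slice."""
    A_hat = np.fft.fft(A, axis=2)
    L_hat = np.zeros_like(A_hat)
    for k in range(A.shape[2]):
        U, s, Vh = np.linalg.svd(A_hat[:, :, k], full_matrices=False)
        L_hat[:, :, k] = (U * np.maximum(s - tau, 0.0)) @ Vh
    return np.fft.ifft(L_hat, axis=2).real

def block_t_svt(X, block, tau):
    """Apply t_svt to each (b1 x b2 x N3) block and concatenate the results."""
    N1, N2, _ = X.shape
    b1, b2 = block
    if N1 % b1 or N2 % b2:
        raise ValueError("block size must divide the first two dimensions")
    L = np.empty_like(X, dtype=float)
    for i in range(0, N1, b1):
        for j in range(0, N2, b2):
            L[i:i + b1, j:j + b2, :] = t_svt(X[i:i + b1, j:j + b2, :], tau)
    return L, X - L  # low rank part and residual (sparse) part

rng = np.random.default_rng(5)
background = rng.standard_normal((8, 8, 1)) * np.ones((1, 1, 6))  # static scene
X = background.copy()
X[2:4, 2:4, 3] += 5.0                                             # small object
L, S = block_t_svt(X, block=(4, 4), tau=1.0)
print(L.shape, np.allclose(L + S, X))  # (8, 8, 6) True
```

Because the blocks are processed independently, the inner loop is a natural candidate for parallel execution.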
In our method, the block size cannot be too large: a large block makes the sparse part contain some of the low rank component. If the block is too small, the computational time becomes long because the number of t-SVDs is large. Generally, we can choose our block size .
In our algorithm, we choose , , . The thresholding parameter, however, is difficult to determine, and here we obtain a value empirically. As discussed in [8], the thresholding parameter could be
for every block. This value is intended for denoising problems in images or videos, where the noise is uniformly distributed. For other applications it should be different from
, because in these applications the sparse component of the data is not uniformly distributed, such as shadow in face images and motion in surveillance videos.

4 Experimental Results
In this section, we conduct numerical experiments to show the performance of the method. We apply the IBTSVT method to two real datasets that are conventionally used with low rank models: motion separation for surveillance videos (Section 4.1) and illumination normalization for face images (Section 4.2).
4.1 Motion Separation for Surveillance Videos
In surveillance video, the background changes only its brightness over time, so it can be represented as the low rank component, while the foreground objects form the sparse component. It is often desired to extract the foreground objects from the video. We use the proposed IBTSVT method to separate the foreground component from the background.
We use the surveillance video data used in [6]. Each frame is of size and we use 20 frames. The constructed tensor is and the selected block size is . The thresholding parameter is .
Fig. 5 shows one of the results. The IBTSVT method correctly recovers the background, while the sparse component correctly identifies the moving pedestrians. This shows that the proposed method can realize motion separation for surveillance videos.
4.2 Illumination Normalization for Face Images
Face recognition algorithms are sensitive to shadow or occlusions on faces [7], so it is important to remove illumination variations and shadow from face images. The low rank model is often used for face images [20].

In our experiments, we use the Yale B face database [7]. Each face image is of size with 64 different lighting conditions. We construct the tensor data and choose the block size . We set the thresholding parameter .
We compare the proposed method with the multiscale low rank matrix decomposition method [13] and the low rank + sparse method [4]. Fig. 6 shows one of the comparison results. The IBTSVT method produces almost shadow-free faces. In contrast, the other two methods recover faces that still contain some shadow.
To further illustrate the effect of shadow elimination in the recovered face images, we carry out face detection on the data recovered by the different methods. In our experiments, we employ the Viola-Jones algorithm [21] to detect the faces and the eyes. The Viola-Jones algorithm is a classical algorithm that can be used to detect people’s faces, noses, eyes, mouths, and upper bodies. In the first experiment, we put all face images into one image in JPG format and use the algorithm to detect faces in the newly formed image. In the second experiment, we use the algorithm to detect the eyes in every face image. The second and third columns of Table 1 show the detection accuracy ratios of the Viola-Jones algorithm on the face images recovered by each method. The fourth column reports the time each of the three methods takes to process the 64 face images. The efficiency of IBTSVT can be improved by processing the block tensors in parallel. From Table 1, we find that our method gives the best face detection accuracy and ties for the best eye detection accuracy, because removing shadow from face images is helpful for face detection.
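Table 1: Detection accuracy ratios and processing time for the different methods.

| Method | Face detection | Eye detection | Time (s) |
|---|---|---|---|
| Original image | 0.297 | 0.58 | N/A |
| Low rank + sparsity | 0.375 | 0.70 | 10 |
| Multiscale low rank | 0.359 | 1.00 | 4472 |
| IBTSVT | 0.844 | 1.00 | 715 |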
5 Conclusions
In this paper, we proposed a novel IBTSVT method to extract the low rank component of a tensor using the t-SVD. IBTSVT utilizes the structural features of the tensor by solving the TRPCA problem in block tensor form. We have given the tensor incoherence conditions for block tensor data. As applications, we considered motion separation for surveillance videos and illumination normalization for face images, and numerical experiments showed performance gains compared with the existing methods.
6 Acknowledgment
This work is supported by the National High Technology Research and Development Program of China (863, No. 2015AA015903), the National Natural Science Foundation of China (NSFC, No. 61602091), the Fundamental Research Funds for the Central Universities (No. ZYGX2015KYQD004), and a grant from the Ph.D. Programs Foundation of Ministry of Education of China (No. 20130185110010).
References
 [1] T. G. Kolda and J. Sun, “Scalable tensor decompositions for multi-aspect data mining,” in IEEE International Conference on Data Mining. IEEE, 2008, pp. 363–372.
 [2] J. Friedman, T. Hastie, and R. Tibshirani, The Elements of Statistical Learning, vol. 1, Springer Series in Statistics. Springer, Berlin, 2001.
 [3] I. Jolliffe, Principal Component Analysis, New York: Springer, 2002.
 [4] E. J. Candès, X. Li, Y. Ma, and J. Wright, “Robust principal component analysis,” Journal of the ACM, vol. 58, no. 3, pp. 1–73, 2011.
 [5] Y. Peng, A. Ganesh, J. Wright, W. Xu, and Y. Ma, “RASL: robust alignment by sparse and low-rank decomposition for linearly correlated images,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 34, no. 11, pp. 2233–2246, 2010.
 [6] L. Li, W. Huang, I. Y. H. Gu, and Q. Tian, “Statistical modeling of complex backgrounds for foreground object detection,” IEEE Transactions on Image Processing, vol. 13, no. 11, pp. 1459–72, 2004.
 [7] A. S. Georghiades, P. N. Belhumeur, and D. J. Kriegman, “From few to many: Illumination cone models for face recognition under variable lighting and pose,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 23, no. 6, pp. 643–660, 2001.

 [8] C. Lu, Y. Chen, J. Feng, W. Liu, Z. Lin, and S. Yan, “Tensor robust principal component analysis: Exact recovery of corrupted low-rank tensors via convex optimization,” in The IEEE Conference on Computer Vision and Pattern Recognition, June 2016.
 [9] Z. Zhang, G. Ely, S. Aeron, N. Hao, and M. Kilmer, “Novel methods for multilinear data completion and denoising based on tensor-SVD,” Computer Science, vol. 44, no. 9, pp. 3842–3849, 2014.
 [10] A. Cichocki, D. Mandic, L. De Lathauwer, G. Zhou, Q. Zhao, C. Caiafa, and H. A. Phan, “Tensor decompositions for signal processing applications: From two-way to multiway component analysis,” IEEE Signal Processing Magazine, vol. 32, no. 2, pp. 145–163, 2014.
 [11] L. De Lathauwer, B. De Moor, and J. Vandewalle, “A multilinear singular value decomposition,” SIAM Journal on Matrix Analysis and Applications, vol. 21, no. 4, pp. 1253–1278, 2000.
 [12] Z. Zhang and S. Aeron, “Exact tensor completion using t-SVD,” arXiv preprint arXiv:1502.04689, 2015.
 [13] F. Ong and M. Lustig, “Beyond low rank + sparse: Multiscale low rank matrix decomposition,” IEEE Journal of Selected Topics in Signal Processing, vol. 10, no. 4, pp. 672–687, 2015.
 [14] M. E. Kilmer and C. D. Martin, “Factorization strategies for third-order tensors,” Linear Algebra and its Applications, vol. 435, no. 3, pp. 641–658, 2011.
 [15] K. Braman, “Third-order tensors as linear operators on a space of matrices,” Linear Algebra and its Applications, vol. 433, no. 7, pp. 1241–1253, 2010.
 [16] M. E. Kilmer, E. Misha, K. Braman, N. Hao, and R. C. Hoover, “Third-order tensors as operators on matrices: a theoretical and computational framework with applications in imaging,” SIAM Journal on Matrix Analysis and Applications, vol. 34, no. 1, pp. 148–172, 2013.
 [17] T. G. Kolda and B. W. Bader, “Tensor decompositions and applications,” SIAM Review, vol. 66, no. 4, pp. 294–310, 2005.
 [18] J. F. Cai, E. J. Candès, and Z. Shen, “A singular value thresholding algorithm for matrix completion,” SIAM Journal on Optimization, vol. 20, no. 4, pp. 1956–1982, 2008.
 [19] G. A. Watson, “Characterization of the subdifferential of some matrix norms,” Linear Algebra and Its Applications, vol. 170, no. 6, pp. 33–45, 1992.
 [20] R. Basri and D. W. Jacobs, “Lambertian reflectance and linear subspaces,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 25, no. 2, pp. 218–233, 2003.
 [21] P. Viola and M. Jones, “Rapid object detection using a boosted cascade of simple features,” in Computer Vision and Pattern Recognition. IEEE, 2001, vol. 1, pp. I–511.