Multilinear Low-Rank Tensors on Graphs & Applications

November 15, 2016, by Nauman Shahid et al., Politecnico di Torino

We propose a new framework for the analysis of low-rank tensors which lies at the intersection of spectral graph theory and signal processing. As a first step, we present a new graph-based low-rank decomposition which approximates the classical low-rank SVD for matrices and the multilinear SVD for tensors. Then, building on this novel decomposition, we construct a general class of convex optimization problems for approximately solving low-rank tensor inverse problems, such as tensor Robust PCA. We name the overall framework 'Multilinear Low-Rank Tensors on Graphs (MLRTG)'. Our theoretical analysis shows that 1) MLRTG stands on the notion of approximate stationarity of multi-dimensional signals on graphs and 2) the approximation error depends on the eigen gaps of the graphs. We demonstrate applications on a wide variety of 4 artificial and 12 real tensor datasets, such as EEG, FMRI, BCI, surveillance videos and hyperspectral images. The key aspects of our framework are the generalization of tensor concepts to non-Euclidean domains, orders-of-magnitude speed-up, low memory requirements and significantly enhanced performance at low SNR.


1 Introduction

Low-rank tensors span a wide variety of applications, like tensor compression [13, 8], robust PCA [30, 7], completion [15, 6, 14, 9] and parameter approximation in CNNs [29]. However, the current literature lacks development in two main facets: 1) large-scale processing and 2) generalization to non-Euclidean domains (graphs [27]). For example, for a tensor , tensor robust PCA via nuclear norm minimization [31] costs per iteration, which is unacceptable even for dimensions as small as 100. Many application-specific alternatives, such as randomization, sketching and fast optimization methods, have been proposed to speed up computations [32, 12, 22, 11, 3, 10, 1, 35], but they cannot be generalized to a broad range of applications.

In this work, we answer the following question: is it possible to 1) generalize the notion of tensors to graphs and 2) target the above applications in a scalable manner? To the best of our knowledge, little effort has been made to target the former [34, 36, 37], and only at the price of a higher computational cost; no work has been done to tackle the two problems simultaneously. Therefore, we revisit tensors from a new perspective and develop an entirely novel, scalable and approximate framework which benefits from graphs.

It has recently been shown for the case of 1D [21] and time-varying signals [16] that the first few eigenvectors of the graph provide a smooth basis for the data; this is the notion of graph stationarity. We generalize this concept to higher order tensors and develop a framework that encodes a tensor as a multilinear combination of a few graph eigenvectors constructed from the rows of its different modes (figure at the top of this page). This multilinear combination, which we call the graph core tensor (GCT), is highly structured like the core tensor obtained by Multilinear SVD (MLSVD) [13] and can be used to solve a plethora of tensor-related inverse problems in a highly scalable manner.

Contributions: In this paper we propose Multilinear Low-Rank Tensors on Graphs (MLRTG) as a novel signal model for low-rank tensors. Using this signal model, we develop an entirely novel, scalable and approximate framework for a variety of inverse problems involving tensors, such as Multilinear SVD and robust tensor PCA. Most importantly, we theoretically link the concept to joint approximate graph stationarity and characterize the approximation error in terms of the eigen gaps of the graphs. Experiments with our algorithms on 4 artificial and 12 real benchmark datasets, such as videos, face images and hyperspectral images, demonstrate the power of our approach. The MLRTG framework is highly scalable: for example, for a tensor , Robust Tensor PCA on Graphs scales with , where , as opposed to .

Notation: We represent tensors with bold calligraphic letters and matrices with capital letters . For a tensor , with multilinear rank [4], its matricization / flattening is a re-arrangement such that . For simplicity, we work with 3D tensors of the same size and rank in each dimension.

Graphs: We specifically refer to a -nearest neighbors graph between the rows of as , with vertex set , edge set and weight matrix . As defined in [27], is constructed via a Gaussian kernel, and the combinatorial Laplacian is given as , where is the degree matrix. The eigenvalue decomposition of is denoted , and we refer to the first eigenvectors and eigenvalues as . Throughout, we use FLANN [17] for the construction of , which costs and is parallelizable. We also assume that a fast and parallelizable framework, such as the one proposed in [18] or [28], is available for the computation of , which costs , where is the number of processors.
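
As an illustration of this preprocessing step, the sketch below builds a k-nearest neighbors graph with Gaussian-kernel weights, forms the combinatorial Laplacian and extracts its first few eigenpairs. It is a minimal Python sketch under stated assumptions: scikit-learn's NearestNeighbors stands in for FLANN, SciPy's ARPACK-based eigsh stands in for the fast approximate eigensolvers of [18] or [28], and the bandwidth sigma and neighborhood size are hypothetical parameters.

```python
# Minimal sketch: k-NN graph with Gaussian-kernel weights, combinatorial
# Laplacian L = D - W, and the first k (low-frequency) eigenpairs.
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import eigsh
from sklearn.neighbors import NearestNeighbors

def knn_graph_laplacian(X, n_neighbors=10, sigma=1.0):
    """Combinatorial Laplacian of a k-NN graph built between the rows of X."""
    nn = NearestNeighbors(n_neighbors=n_neighbors + 1).fit(X)
    dist, idx = nn.kneighbors(X)                     # first neighbor is the point itself
    n = X.shape[0]
    rows = np.repeat(np.arange(n), n_neighbors)
    cols = idx[:, 1:].ravel()                        # drop the self-neighbor
    vals = np.exp(-dist[:, 1:].ravel() ** 2 / sigma ** 2)   # Gaussian kernel weights
    W = sp.csr_matrix((vals, (rows, cols)), shape=(n, n))
    W = W.maximum(W.T)                               # symmetrize the weight matrix
    D = sp.diags(np.asarray(W.sum(axis=1)).ravel())  # degree matrix
    return D - W

def first_eigenpairs(L, k):
    """First k eigenvalues / eigenvectors (smallest eigenvalues) of the Laplacian."""
    lam, P = eigsh(L, k=k, which='SM')
    return lam, P

# Usage: one graph per flattened mode of the tensor, e.g.
#   L1 = knn_graph_laplacian(Y1); lam1, P1 = first_eigenpairs(L1, k)
```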

2 Multilinear Low-Rank Tensors on Graphs

A tensor is said to be Multilinear Low-Rank on Graphs (MLRTG) if it can be encoded in terms of the lowest Laplacian eigenvectors as:

where denotes the vectorization, denotes the Kronecker product, and is the Graph Core Tensor (GCT). We refer to a tensor from the set of all possible MLRTG as . The main idea is illustrated in the figure on the first page of this paper. We call the tuple , where , the Graph Multilinear Rank of . In the sequel, for simplicity of notation: 1) we work with the matricized version (along mode 1) of and 2) denote . Then, for , .

In simple words: one can encode a low-rank tensor in terms of the low-frequency Laplacian eigenvectors. This multilinear combination is called the GCT. It is highly structured like the core tensor obtained by standard Multilinear SVD (MLSVD) and can be used for a broad range of tensor-based applications. Furthermore, the fact that the GCT encodes the interaction between the graph eigenvectors allows it to be interpreted as a multi-dimensional graph Fourier transform.
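
As a concrete illustration of this encoding, the sketch below computes a GCT by projecting a 3D tensor onto the first k Laplacian eigenvectors of each mode and then reconstructs the corresponding low-rank approximation. P1, P2, P3 are assumed to be the per-mode eigenvector matrices from the previous sketch; mode_product is a generic n-mode product helper introduced here for illustration, not a routine from the paper.

```python
# Minimal sketch of the MLRTG encoding: GCT = projection of the tensor onto the
# low-frequency Laplacian eigenvectors of every mode; the low-rank tensor is
# recovered by lifting the GCT back with the same eigenvectors.
import numpy as np

def mode_product(T, M, mode):
    """n-mode product T x_mode M, where M has shape (new_dim, T.shape[mode])."""
    T = np.moveaxis(T, mode, 0)
    out = M @ T.reshape(T.shape[0], -1)
    return np.moveaxis(out.reshape((M.shape[0],) + T.shape[1:]), 0, mode)

def graph_core_tensor(X, P1, P2, P3):
    """GCT: project each mode onto its first k Laplacian eigenvectors."""
    S = mode_product(X, P1.T, 0)
    S = mode_product(S, P2.T, 1)
    return mode_product(S, P3.T, 2)      # shape (k, k, k)

def reconstruct(S, P1, P2, P3):
    """Low-rank reconstruction from the GCT."""
    X = mode_product(S, P1, 0)
    X = mode_product(X, P2, 1)
    return mode_product(X, P3, 2)
```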

In real applications, due to noise, the tensor is only approximately low-rank (approximate MLRTG), so the following lemma holds:

Lemma 1.

For any , where and models the noise and errors, the matricization of satisfies

(1)

where and denote the complement Laplacian eigenvectors (above ) and . Furthermore, .

Proof.

Please refer to the proof of Theorem 1 in Appendix A.1. ∎

Figure 1: Illustration of the properties of MLRTG in terms of an arbitrary graph spectral covariance (GSC) matrix.
Figure 2: A hyperspectral image tensor from the Stanford database. The singular values of the modes, the graph spectral covariance (GSC) and the energy concentration plot clearly show that the tensor is approximately MLRTG.

Let be the covariance of ; then the Graph Spectral Covariance (GSC) is given as . For a signal that is approximately stationary on a graph, has most of its energy concentrated on the diagonal [21]. For MLRTG, we additionally require the energy to be concentrated in the top-left corner of , which we call low-frequency energy concentration.

Key properties of MLRTG: Thus, for any , the GSC of each of its matricizations satisfies: 1) joint approximate graph stationarity, i.e., and 2) low-frequency energy concentration, i.e., , .
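
One simple way to check these two properties numerically is sketched below. The energy ratios used here are a reasonable quantification chosen for illustration and are not necessarily the exact quantities used in the paper; P_full denotes the full eigenvector matrix of the mode-i Laplacian, with columns assumed sorted by increasing eigenvalue.

```python
# Minimal sketch of the two MLRTG properties for one matricization Yi.
import numpy as np

def graph_spectral_covariance(Yi, P_full):
    """GSC: covariance of the matricization expressed in the graph Fourier basis."""
    C = Yi @ Yi.T
    return P_full.T @ C @ P_full

def stationarity_ratio(C_bar):
    """Property 1: fraction of GSC energy on the diagonal (close to 1 when the
    matricization is approximately stationary on its graph)."""
    return np.sum(np.diag(C_bar) ** 2) / np.sum(C_bar ** 2)

def low_frequency_energy(C_bar, k):
    """Property 2: fraction of GSC energy in the top-left k x k block."""
    return np.sum(C_bar[:k, :k] ** 2) / np.sum(C_bar ** 2)
```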

Theorem 1.

For any , if and only if Lemma 1 and Property 2 hold.

Proof.

Please refer to Appendix A.1. ∎

Fig. 1 illustrates these properties in terms of an arbitrary GSC matrix. The leftmost plot corresponds to the case of approximate stationarity (strong diagonal), the middle plot to the case of noise-free MLRTG and the rightmost plot to the case of approximate MLRTG. Note that the energy spilling out of the top-left submatrix due to noise results in an approximate low-rank representation.

Examples: Many real-world datasets satisfy the approximate MLRTG assumption. Fig. 2 presents the example of a hyperspectral face tensor. The singular values of each of the modes show the low-rankness property, whereas the graph spectral covariance and the energy concentration plot (Property 2) show that 99% of the energy of mode 1 can be expressed in terms of the top 2% of the graph eigenvectors. Examples of FMRI and COIL20 tensors are also presented in Fig. 11 of Appendix A.2.

3 Applications of MLRTG

Any is the product of two important components: 1) the Laplacian eigenvectors and eigenvalues for each of the modes of the tensor and 2) the GCT . While the former can be pre-computed, the GCT needs to be determined via an appropriate procedure. Once determined, it can be used directly as a low-dimensional feature of the tensor or employed for other useful tasks. It is therefore possible to propose a general framework for solving a broad range of tensor / matrix inverse problems by optimizing the GCT . For a general linear operator and its matricization ,

(2)

where is a norm depending on the application under consideration and , denote the kernelized Laplacian eigenvalues, used as weights for the nuclear norm minimization. Assuming the eigenvalues are sorted in ascending order, this corresponds to a stronger penalization of the higher-indexed singular values of , which correspond to noise. Thus, the goal is to determine a graph core tensor whose rank is minimized in all the modes. Such a nuclear norm minimization on the full tensor (without weights) has appeared in earlier works [7]. However, note that in our case we lift the computational burden by minimizing only over the core tensor .
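
The computational primitive behind such weighted nuclear norm terms is a weighted singular value soft-thresholding step. The sketch below shows one plausible form of it; the kernel gamma * sqrt(lambda_j) used for the weights is an assumption for illustration and not necessarily the kernelization used in the paper.

```python
# Minimal sketch of weighted singular value soft-thresholding for one
# matricization of the core: singular values paired with larger (higher-
# frequency) graph eigenvalues are shrunk more aggressively.
import numpy as np

def prox_weighted_nuclear(S_mat, eigvals, gamma):
    """Soft-threshold the singular values of S_mat with weights
    w_j = gamma * sqrt(lambda_j), where eigvals are the Laplacian eigenvalues
    sorted in ascending order."""
    U, sig, Vt = np.linalg.svd(S_mat, full_matrices=False)
    w = gamma * np.sqrt(eigvals[:len(sig)])
    return (U * np.maximum(sig - w, 0.0)) @ Vt
```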

3.1 Graph Core Tensor Pursuit (GCTP)

The first application corresponds to the case where one is only interested in the GCT . For a clean matricized tensor , it is straightforward to determine the matricized as .

For the case of a noisy corrupted with Gaussian noise, one seeks a robust , which is not possible without an appropriate regularization on . Hence, we propose to solve problem (2) with the Frobenius norm:

(3)

Using in eq. (3), we get:

(4)

which we call Graph Core Tensor Pursuit (GCTP). To solve GCTP, one just needs to apply the singular value soft-thresholding operation (Appendix A.3) on each of the modes of the tensor . For , it scales with , where .
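
A minimal sketch of this procedure is given below: the noisy tensor is projected onto the per-mode eigenvectors and every matricization of the projected core is singular-value soft-thresholded. The single-pass, unweighted thresholding and the parameter tau are simplifications of eq. (4); graph_core_tensor comes from the earlier sketch.

```python
# Minimal GCTP sketch for a noisy 3D tensor Y and per-mode eigenvector
# matrices P1, P2, P3 (each n x k).
import numpy as np

def unfold(T, mode):
    """Matricize a tensor along the given mode."""
    return np.moveaxis(T, mode, 0).reshape(T.shape[mode], -1)

def fold(M, mode, shape):
    """Inverse of unfold for a tensor of the given shape."""
    full = [shape[mode]] + [s for i, s in enumerate(shape) if i != mode]
    return np.moveaxis(M.reshape(full), 0, mode)

def svt(M, tau):
    """Singular value soft-thresholding of a matrix."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return (U * np.maximum(s - tau, 0.0)) @ Vt

def gctp(Y, P1, P2, P3, tau=0.1):
    S = graph_core_tensor(Y, P1, P2, P3)           # project onto the eigenvectors
    for mode in range(3):                          # threshold every matricization
        S = fold(svt(unfold(S, mode), tau), mode, S.shape)
    return S
```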

3.2 Graph Multilinear SVD (GMLSVD)

Notice that the decomposition defined by eq. (1) is quite similar to the standard Multilinear SVD (MLSVD) [13]. Is it possible to define a graph-based MLSVD using ?

MLSVD: In standard MLSVD, one aims to decompose a tensor into factors which are linked by a core . This can be attained by solving the ALS problem [13], which iteratively computes the SVD of every mode of until the fit stops improving. This costs per iteration for rank .

From MLSVD to GMLSVD: In our case the fit is given in terms of the pre-computed Laplacian eigenvectors , i.e., , and is determined by GCTP (eq. (4)). This raises the question of how the factors relate to and the core tensor to . We argue as follows: let be the MLSVD of . Then, if we set , then . While we give an example below, a more theoretical study is presented in Theorem 2.

Algorithm for GMLSVD: Thus, for a tensor , one can compute GMLSVD in the following steps: 1) compute the graph core tensor via GCTP (eq. (4)), 2) perform the MLSVD of , and 3) set the factors to and the core tensor to . Given the Laplacian eigenvectors , GMLSVD scales with per iteration, which is the same as the complexity of solving GCTP.
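
The sketch below follows these three steps, with a plain-numpy HOSVD of the small graph core tensor standing in for the Tensorlab MLSVD routine; gctp, unfold and mode_product are reused from the earlier sketches.

```python
# Minimal GMLSVD sketch: 1) GCT via GCTP, 2) MLSVD (HOSVD) of the GCT,
# 3) lift the factors back with the Laplacian eigenvectors.
import numpy as np

def hosvd(S):
    """HOSVD of a small 3D core: per-mode left singular vectors and the core."""
    factors = [np.linalg.svd(unfold(S, mode), full_matrices=False)[0]
               for mode in range(3)]
    core = S
    for mode, V in enumerate(factors):
        core = mode_product(core, V.T, mode)
    return core, factors

def gmlsvd(Y, P1, P2, P3, tau=0.1):
    S = gctp(Y, P1, P2, P3, tau)                   # step 1: graph core tensor
    core, (V1, V2, V3) = hosvd(S)                  # step 2: MLSVD of the GCT
    U1, U2, U3 = P1 @ V1, P2 @ V2, P3 @ V3         # step 3: factors of the full tensor
    return core, (U1, U2, U3)
```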

Example: To understand this, consider the case of a 2D tensor (a matrix) with vectorized wedge images from the COIL20 dataset in its columns. For this matrix, corresponds to the left singular vectors obtained by SVD and to the first eigenvectors of the Laplacian between the rows (pixels). Fig. 3 shows an example wedge image, the 1st singular vector in obtained via SVD and the 2nd Laplacian eigenvector . Clearly, the 1st singular vector in is not equal to the 2nd eigenvector (or the others). However, if we recover using GCTP (eq. (4)), then perform the SVD of and let , then (bottom right plot in Fig. 3). For more examples, please see Fig. 12 in the Appendices.

Figure 3: An example wedge image from the COIL20 dataset, the 1st singular vector in obtained via SVD and the 2nd Laplacian eigenvector . Clearly, (and other eigenvectors). However, , where , and are the left singular vectors of via GCTP (eq. (4)).

3.3 Tensor Robust PCA on Graphs (TRPCAG)

Another important application of low-rank tensors is Tensor Robust PCA (TRPCA) [30]. Unfortunately, this method scales as . We propose an alternative framework, Tensor Robust PCA on Graphs (TRPCAG):

(5)

The above problem requires a nuclear norm only on and scales with . This is a significant complexity reduction over TRPCA. We use the parallel proximal splitting algorithm to solve eq. (5), as shown in Appendix A.3.
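
Since the objective in eq. (5) and the solver are detailed in Appendix A.3, the sketch below only illustrates the underlying idea with a simplified alternating proximal scheme: the observation is split into an MLRTG low-rank part and a sparse part by alternating singular value thresholding of the projected core with elementwise soft-thresholding of the residual. The parameters tau and lam and the update order are assumptions; this is not the parallel proximal splitting algorithm used in the paper.

```python
# Simplified TRPCAG-like sketch: low-rank (graph core) + sparse decomposition.
# Reuses graph_core_tensor / reconstruct / unfold / fold / svt from above.
import numpy as np

def soft(x, t):
    """Elementwise soft-thresholding."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def trpcag_sketch(Y, P1, P2, P3, tau=0.1, lam=0.05, n_iter=50):
    E = np.zeros_like(Y)                           # sparse component
    for _ in range(n_iter):
        # low-rank update: project the sparse-free residual and threshold the core
        S = graph_core_tensor(Y - E, P1, P2, P3)
        for mode in range(3):
            S = fold(svt(unfold(S, mode), tau), mode, S.shape)
        X = reconstruct(S, P1, P2, P3)             # current low-rank estimate
        # sparse update: keep only what the low-rank part cannot explain
        E = soft(Y - X, lam)
    return X, E
```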

4 Theoretical Analysis

Although inverse problems of the form of eq. (2) are orders of magnitude faster than standard tensor-based inverse problems, they introduce some approximation. First, note that we do not present any procedure to determine the optimal . Furthermore, as noted from the proof of Theorem 1, for , the choice of depends on the eigen gap assumption (), which might not exist for the -Laplacians. Finally, noise in the data adds to the approximation as well.

We perform our analysis for 2D tensors, i.e., matrices of the form . The results can be extended to higher order tensors in a straightforward manner. We further assume that 1) the eigen gaps exist, i.e., there exists a such that , and 2) we select a for our method. For the case of a 2D tensor, the general inverse problem (2), using , can be written as:

(6)
Theorem 2.

For any ,

  1. Let be the SVD of and be the SVD of the GCT obtained via GCTP (eq. (4)). Now, let , where are the Laplacian eigenvectors of ; then, up to a sign permutation, and .

  2. Solving eq. (6), with a is equivalent to solving the following factorized graph regularized problem:

    (7)

    where , and .

  3. Any solution of (2), where and with and , where and satisfies

    (8)

    where denote the eigenvalues of and , where denote the projection of the factors on the complement eigenvectors.

Proof.

Please see Appendix A.4. ∎

In simple words: Theorem 2 states that 1) the singular vectors and values of a matrix / tensor obtained by GMLSVD are equivalent to those obtained by MLSVD, and 2) in general, the inverse problem (6) is equivalent to solving a graph-regularized matrix / tensor factorization problem (eq. (7)) where the factors belong to the span of the graph eigenvectors constructed from the modes of the tensor. The bound in eq. (8) shows that to recover an MLRTG one should have large eigen gaps . This occurs when the rows of the matricized can be clustered into clusters. The smaller is, the closer is to . If one selects a , the error is characterized by the projection of the singular vectors on the complement graph eigenvectors . Our experiments show that selecting a always leads to a better recovery when the exact value of is not known.

5 Experimental Results

Figure 4: Performance comparison of GSVD with SVD and GMLSVD with MLSVD for 2D artificial (1st row), 3D artificial (2nd row) and 4D real EEG tensors (3rd row) under different SNR scenarios. The artificial tensors have a size of 100 and rank 10 along each mode, and the 4D EEG tensor has size . A core tensor of size 30 along each mode and one of size are used for the artificial and real tensors, respectively. The leftmost plots show the reconstruction errors, the middle plots show the reconstruction error for the top 30 singular values and the right plots show the subspace angles for the first five subspace vectors. Clearly, GSVD / GMLSVD outperform SVD / MLSVD in terms of computation time and error for big and noisy datasets.

Datasets: To study the performance of GMLSVD and TRPCAG, we perform extensive experiments on 4 artificial and 12 real 2D-4D tensors. All low-rank artificial datasets are generated by filtering a randomly generated tensor with the combinatorial Laplacians constructed from its flattened modes (details in Appendix A.5). Then, different levels of Gaussian and sparse noise are added to the tensor. The real datasets include hyperspectral images, EEG, BCI, FMRI, 2D and 3D image and video tensors, and 3 point cloud datasets.
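
The sketch below shows one plausible way such a low-rank artificial tensor could be generated, by low-pass graph filtering of a random tensor along every mode and then adding Gaussian noise at a target SNR; the exact filter, neighborhood size and noise model of Appendix A.5 may differ. It reuses knn_graph_laplacian, first_eigenpairs, unfold and mode_product from the earlier sketches.

```python
# Minimal sketch: generate an (approximately) MLRTG artificial tensor by
# projecting a random tensor onto the first k Laplacian eigenvectors of each
# flattened mode, then add Gaussian noise at a chosen SNR (in dB).
import numpy as np

def make_low_rank_tensor(n=100, k=10, snr_db=10, seed=0):
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((n, n, n))
    for mode in range(3):
        L = knn_graph_laplacian(unfold(X, mode), n_neighbors=10)
        _, P = first_eigenpairs(L, k)
        X = mode_product(X, P @ P.T, mode)         # low-pass filter along this mode
    noise = rng.standard_normal(X.shape)
    noise *= np.linalg.norm(X) / (np.linalg.norm(noise) * 10 ** (snr_db / 20))
    return X + noise, X                            # noisy observation, clean tensor
```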

Methods: GMLSVD and TRPCAG are low-rank tensor factorization methods, implemented using the GSPBox [19], UNLocBox [20] and Tensorlab [33] toolboxes. Note that GMLSVD is robust to Gaussian noise and TRPCAG to sparse noise; therefore, these methods are tested under varying levels of these two types of noise. To avoid any confusion, we call the 2D tensor (matrix) version of GMLSVD Graph SVD (GSVD).

For the 3D tensors with Gaussian noise we compare GMLSVD performance with MLSVD. For the 3D tensor with sparse noise we compare TRPCAG with Tensor Robust PCA (TRPCA) [30] and GMLSVD. For the 2D tensors (matrices) with Gaussian noise we compare GSVD with simple SVD. Finally for the 2D matrix with sparse noise we compare TRPCAG with Robust PCA (RPCA) [2], Robust PCA on Graphs (RPCAG) [24], Fast Robust PCA on Graphs (FRPCAG) [25] and Compressive PCA (CPCA) [26]. Not all the methods are tested on all the datasets due to computational reasons.

Parameters: For all the experiments involving TRPCAG and GMLSVD, the graphs are constructed from the rows of each of the flattened modes of the tensor, using and a Gaussian kernel for weighting the edges. For all the other methods the graphs are constructed as required, using the same parameters as above. Each method has several hyper-parameters which require tuning. For a fair comparison, all the methods are properly tuned for their hyper-parameters and best results are reported. For details on all the datasets, methods and parameter tuning please refer to Appendix A.5.

Evaluation Metrics: The metrics used for evaluation can be divided into two types: 1) quantitative and 2) qualitative. Four quantitative measures are used: 1) the normalized reconstruction error of the tensor , 2) the normalized reconstruction error of the first (normally ) singular values along mode 1, 3) the subspace angle (in radians) of mode 1 between the first five subspace vectors determined by the proposed method and those of the clean tensor, , and 4) the alignment of the singular vectors , where and denote the mode-1 singular vectors determined by the proposed method and the clean tensor. The qualitative measure is the visual quality of the low-rank components of the tensors.
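
Plausible implementations of these quantitative measures are sketched below in terms of a clean tensor X and a recovered tensor X_hat; the exact normalizations used in the paper may differ slightly. The unfold helper is reused from the earlier sketches.

```python
# Minimal sketches of the evaluation metrics along mode 1.
import numpy as np
from scipy.linalg import subspace_angles

def reconstruction_error(X_hat, X):
    return np.linalg.norm(X_hat - X) / np.linalg.norm(X)

def singular_value_error(X_hat, X, top=30):
    s_hat = np.linalg.svd(unfold(X_hat, 0), compute_uv=False)[:top]
    s = np.linalg.svd(unfold(X, 0), compute_uv=False)[:top]
    return np.linalg.norm(s_hat - s) / np.linalg.norm(s)

def mode1_subspace_angle(X_hat, X, dim=5):
    U_hat = np.linalg.svd(unfold(X_hat, 0), full_matrices=False)[0][:, :dim]
    U = np.linalg.svd(unfold(X, 0), full_matrices=False)[0][:, :dim]
    return subspace_angles(U_hat, U).max()         # largest principal angle, in radians

def singular_vector_alignment(X_hat, X, top=20):
    U_hat = np.linalg.svd(unfold(X_hat, 0), full_matrices=False)[0][:, :top]
    U = np.linalg.svd(unfold(X, 0), full_matrices=False)[0][:, :top]
    return np.abs(U_hat.T @ U)                     # close to identity when aligned
```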

5.1 Experiments on GMLSVD

Performance study on artificial datasets: The first two rows of Fig. 4 show the performance of GSVD (for 2D) and GMLSVD (for 3D) on artificial tensors of size 100 and rank 10 in each mode, for varying levels of Gaussian noise ranging from 15dB to 1dB. The three plots show the reconstruction error of the recovered tensor, the first singular values and the subspace angle of the 1st-mode subspace (top 5 vectors), w.r.t. those of the clean tensor. These results are compared with standard SVD for the 2D tensor and standard MLSVD for the 3D tensor. It is interesting to note from the leftmost plot that the reconstruction error of GSVD tends to drop below that of SVD at higher noise levels (SNR less than 5dB). The middle plot explains this phenomenon: the error of the singular values is significantly lower for GSVD than for SVD at higher noise levels. This observation is logical, as for higher levels of noise the lower singular values are also affected. SVD is a simple singular value thresholding method which does not eliminate the effect of noise on the lower singular values, whereas GSVD is a weighted nuclear norm method which thresholds the lower singular values via a function of the graph eigenvalues. This effect is shown in detail in Fig. 5. On the contrary, the subspace angle (for the first 5 vectors) of GSVD is higher than that of SVD for all noise levels. This means that the subspaces of GSVD are not well aligned with those of the clean tensor. However, as shown in the right plot of Fig. 5, the strong first 7 elements of the diagonal show that the individual subspace vectors of GSVD are very well aligned with those of the clean tensor. This is the primary reason why the reconstruction error of GSVD is lower than that of SVD at low SNR. At higher SNR, the approximation error of GSVD dominates the error due to noise. Thus, GSVD reveals its true power in low SNR scenarios. Similar observations can be made about GMLSVD from the 2nd row of Fig. 4.

Figure 5: Singular values (left plot) of GSVD, SVD and clean data and singular vector alignment of GSVD and clean data (right plot) for a 2D matrix corrupted with Gaussian noise (SNR 5dB). Clearly GSVD eliminates the effect of noise from lower singular values and aligns the first few singular vectors appropriately.

Time, memory & performance on real datasets: The 3rd row of Fig. 4 shows the results of GMLSVD compared to MLSVD for a 4D real EEG dataset of size 3GB and dimensions . A core tensor of size is used for this experiment. It is interesting to note that for low SNR scenarios the reconstruction error of both methods is approximately the same. GMLSVD and MLSVD both show a significantly lower error for the singular values (note that the scale is small), whereas GMLSVD's subspaces are less aligned than those of MLSVD. The rightmost plot of this figure compares the computation time of both methods. Clearly, GMLSVD (120 secs) is significantly faster than MLSVD (400 secs). For this 3GB dataset, GMLSVD requires 6GB of memory whereas MLSVD requires 15GB, as shown in the detailed results in Fig. 13 of the Appendices. Fig. 13 also presents results for the BCI, hyperspectral and FMRI tensors and shows that GMLSVD performs better than MLSVD while requiring less computation time and memory for big datasets in the low SNR regime. For a visualization of the clean, noisy and GMLSVD-recovered tensors, their singular values and the alignments of the subspace vectors for the FMRI and hyperspectral tensors, please refer to Fig. 14 in the Appendices.

Figure 6: Qualitative and quantitative results for the face () 3D hyperspectral tensor. Using a core of size , we attained 150 times compression while maintaining an SNR of 25dB.
Figure 7: Performance analysis of TRPCAG for 2D artificial (1st row) and 3D artificial (2nd row) tensors with varying levels of sparse noise and a 2D airport lobby video tensor (3rd row). The artificial tensors have a size of 100 and rank 10 along each mode. A core tensor of size 30 along each mode is used for the artificial tensors and for the real tensor. The leftmost plots show the reconstruction errors, the middle plots show the reconstruction error for 30 singular values and the right plots show the subspace angles for the first five subspace vectors. For the 2D real video dataset, the rightmost frame shows the result obtained via TRPCAG. The computation time for the different methods is written on top of the frames. Clearly, TRPCAG performs quite well while significantly reducing the computation time.
Figure 8: TRPCAG performance for recovering the low-rank point cloud of a dancing person from the sparse noise. Actual point cloud (left), noisy point cloud (middle), recovered via TRPCAG (right).
Figure 9: Low-rank recovery for a 3D video of dimension and size 1.5GB via TRPCAG. Using a core size of , TRPCAG converged in less than 3 minutes.
Figure 10: Robust recovery of subspace structures via TRPCAG. The leftmost plots show a clean and a sparsely corrupted sample wedge image from the COIL20 dataset. The other plots in the 1st row show the 1st singular vector recovered by various low-rank recovery methods (SVD, FRPCAG, RPCA, RPCAG and TRPCAG) and the 2nd row shows the alignment of the first 20 singular vectors recovered by these methods with those of the clean tensor.

Compression: An obvious goal of MLSVD is the compression of low-rank tensors; therefore, GMLSVD can also be used for this purpose. Fig. 6 shows results for the face () 3D hyperspectral tensor. Using a core of size , we attained 150 times compression while maintaining an SNR of 25dB. Fig. 15 in the Appendices shows such results for three other datasets. The rightmost plots of Fig. 14 in the Appendices also show compression results for the FMRI, EEG and BCI datasets.

5.2 Experiments on TRPCAG

Performance study on artificial datasets: The first two rows of Fig. 7 show experiments on the 2D and 3D artificial datasets with varying levels of sparse noise. The 2D version of TRPCAG is compared with state-of-the-art Robust PCA based methods, FRPCAG and RPCA, and also with SVD based methods like MLSVD and GSVD. Conclusions similar to those for the Gaussian noise experiments can be drawn for the 2D matrices (1st row): TRPCAG is better than state-of-the-art methods in the presence of a large fraction of sparse noise. For the 3D tensor (2nd row), TRPCAG is compared with GMLSVD and TRPCA [30]. Interestingly, the performance of TRPCA is always better in this case, even in the presence of high levels of noise. While TRPCA produces the best results, its computation time is orders of magnitude higher than that of TRPCAG (discussed next). A detailed analysis of the singular values recovered by TRPCAG is given in Fig. 16 in the Appendices.

Time & performance on 2D real datasets: The 3rd row of Fig. 7 presents experiments on the 2D real video dataset obtained from an airport lobby (every frame vectorized and stacked as a column of a matrix). The goal is to separate the static low-rank component from the sparse part (moving people) in the video. The results of TRPCAG are compared with RPCA, RPCAG, FRPCAG and CPCA with a downsampling factor of 5 along the frames. Clearly, TRPCAG recovers a low-rank component which is qualitatively equivalent to that of the other methods, in a time which is 100 times less than RPCA and RPCAG and an order of magnitude less than FRPCAG. Furthermore, TRPCAG requires the same time as the sampling-based CPCA method but recovers a better-quality low-rank structure, as seen in the 3rd row. The performance of TRPCAG is also evident from the point cloud experiment in Fig. 8, where we recover the low-rank point cloud of a dancing person after adding sparse noise to it. Experiments on two more videos (shopping mall and escalator) and two point cloud datasets (walking dog and dancing girl) are presented in Figs. 17, 18 & 19 in the Appendices.

Scalability of TRPCAG on 3D video: To show the scalability of TRPCAG compared to TRPCA, we made a video of snowfall on campus and tried to separate the snowfall from the low-rank background with both methods. For this 1.5GB video of dimension , TRPCAG (with a core tensor of size ) took less than 3 minutes, whereas TRPCA did not converge even in 4 hours. The result obtained via TRPCAG is visualized in Fig. 9. The complete videos of the actual frames, low-rank and sparse components for all the above experiments are provided with the supplementary material of the paper.

5.3 Robust Subspace Recovery

It is imperative to point out that inverse problems of the form of eq. (2) implicitly determine the subspace structures (singular vectors) from grossly corrupted tensors. The examples of GMLSVD in Section 3.2 correspond to the case of a clean tensor. In this section we show that the singular vectors recovered by TRPCAG (Section 3.3) from sparsely and grossly corrupted tensors also align closely with those of the clean tensor. Fig. 10 shows the example of the same wedge image from the 2D COIL20 dataset that was used in Section 3.2. The leftmost plots show a clean and a sparsely corrupted sample wedge image. The other plots in the 1st row show the 1st singular vector recovered by various low-rank recovery methods (SVD, FRPCAG, RPCA, RPCAG and TRPCAG) and the 2nd row shows the alignment of the first 20 singular vectors recovered by these methods with those of the clean tensor. The rightmost plots correspond to TRPCAG and clearly show that the recovered subspace is robust to sparse noise. Examples for the YALE, COIL20 and ORL datasets are also shown in Fig. 20 in the Appendices.

5.4 Effect of parameters

Due to space constraints, we study the effect of the parameters , the multilinear rank and the power of in eq. (2) in Fig. 21 of the Appendices. To summarize, once the parameters and are tuned, eq. (2) becomes insensitive to the parameter .

6 Conclusion

Inspired by the fact that the first few eigenvectors of the graph provide a smooth basis for data, we present a graph-based low-rank tensor decomposition model. Any low-rank tensor can be decomposed as a multilinear combination of the lowest eigenvectors of the graphs constructed from the rows of the flattened modes of the tensor (MLRTG). We propose a general tensor-based convex optimization framework which overcomes the computational and memory burden of standard tensor problems and enhances performance in the low SNR regime. More specifically, we demonstrate two applications of MLRTG, 1) graph-based MLSVD and 2) Tensor Robust PCA on Graphs, on 4 artificial and 12 real datasets under different noise levels. Theoretically, we prove the link of MLRTG to the joint stationarity of signals on graphs. We also study the performance guarantees of the proposed general optimization framework by connecting it to a factorized graph-regularized problem.

References

  • [1] S. Bhojanapalli and S. Sanghavi. A new sampling technique for tensors. arXiv preprint arXiv:1502.05023, 2015.
  • [2] E. J. Candès, X. Li, Y. Ma, and J. Wright. Robust principal component analysis? Journal of the ACM (JACM), 58(3):11, 2011.
  • [3] J. H. Choi and S. Vishwanathan. Dfacto: Distributed factorization of tensors. In Advances in Neural Information Processing Systems, pages 1296–1304, 2014.
  • [4] A. Cichocki, D. Mandic, L. De Lathauwer, G. Zhou, Q. Zhao, C. Caiafa, and H. A. Phan. Tensor decompositions for signal processing applications: From two-way to multiway component analysis. Signal Processing Magazine, IEEE, 32(2):145–163, 2015.
  • [5] P. L. Combettes and J.-C. Pesquet. Proximal splitting methods in signal processing. In Fixed-point algorithms for inverse problems in science and engineering, pages 185–212. Springer, 2011.
  • [6] C. Da Silva and F. J. Herrmann. Hierarchical tucker tensor optimization-applications to tensor completion. In Proc. 10th International Conference on Sampling Theory and Applications, 2013.
  • [7] D. Goldfarb and Z. Qin. Robust low-rank tensor recovery: Models and algorithms. SIAM Journal on Matrix Analysis and Applications, 35(1):225–253, 2014.
  • [8] L. Grasedyck, D. Kressner, and C. Tobler. A literature survey of low-rank tensor approximation techniques. GAMM-Mitteilungen, 36(1):53–78, 2013.
  • [9] B. Huang, C. Mu, D. Goldfarb, and J. Wright. Provable models for robust low-rank tensor completion. Pacific Journal of Optimization, 11(2):339–364, 2015.
  • [10] F. Huang, S. Matusevych, A. Anandkumar, N. Karampatziakis, and P. Mineiro. Distributed latent dirichlet allocation via tensor factorization. In NIPS Optimization Workshop, 2014.
  • [11] F. Huang, U. Niranjan, M. U. Hakeem, and A. Anandkumar. Fast detection of overlapping communities via online tensor methods. arXiv preprint arXiv:1309.0787, 2013.
  • [12] U. Kang, E. Papalexakis, A. Harpale, and C. Faloutsos. Gigatensor: scaling tensor analysis up by 100 times-algorithms and discoveries. In Proceedings of the 18th ACM SIGKDD international conference on Knowledge discovery and data mining, pages 316–324. ACM, 2012.
  • [13] T. G. Kolda and B. W. Bader. Tensor decompositions and applications. SIAM review, 51(3):455–500, 2009.
  • [14] D. Kressner, M. Steinlechner, and B. Vandereycken. Low-rank tensor completion by riemannian optimization. BIT Numerical Mathematics, 54(2):447–468, 2014.
  • [15] J. Liu, P. Musialski, P. Wonka, and J. Ye. Tensor completion for estimating missing values in visual data. IEEE Transactions on Pattern Analysis and Machine Intelligence, 35(1):208–220, 2013.
  • [16] A. Loukas and N. Perraudin. Stationary time-vertex signal processing. arXiv preprint arXiv:1611.00255, 2016.
  • [17] M. Muja and D. Lowe. Scalable nearest neighbour algorithms for high dimensional data. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2014.
  • [18] J. Paratte and L. Martin. Fast eigenspace approximation using random signals. arXiv preprint arXiv:1611.00938, 2016.
  • [19] N. Perraudin, J. Paratte, D. Shuman, V. Kalofolias, P. Vandergheynst, and D. K. Hammond. GSPBOX: A toolbox for signal processing on graphs. ArXiv e-prints, August 2014.
  • [20] N. Perraudin, D. Shuman, G. Puy, and P. Vandergheynst. UNLocBoX: A MATLAB convex optimization toolbox using proximal splitting methods. ArXiv e-prints, February 2014.
  • [21] N. Perraudin and P. Vandergheynst. Stationary signal processing on graphs. ArXiv e-prints, January 2016.
  • [22] A.-H. Phan, P. Tichavský, and A. Cichocki. Fast alternating LS algorithms for high order CANDECOMP/PARAFAC tensor factorizations. IEEE Transactions on Signal Processing, 61(19):4834–4846, 2013.
  • [23] N. Rao, H.-F. Yu, P. K. Ravikumar, and I. S. Dhillon. Collaborative filtering with graph information: Consistency and scalable methods. In Advances in Neural Information Processing Systems, pages 2107–2115, 2015.
  • [24] N. Shahid, V. Kalofolias, X. Bresson, M. Bronstein, and P. Vandergheynst. Robust principal component analysis on graphs. arXiv preprint arXiv:1504.06151, 2015.
  • [25] N. Shahid, N. Perraudin, V. Kalofolias, G. Puy, and P. Vandergheynst. Fast robust PCA on graphs. IEEE Journal of Selected Topics in Signal Processing, 10(4):740–756, 2016.
  • [26] N. Shahid, N. Perraudin, G. Puy, and P. Vandergheynst. Compressive pca for low-rank matrices on graphs. Technical report, 2016.
  • [27] D. I. Shuman, S. K. Narang, P. Frossard, A. Ortega, and P. Vandergheynst. The emerging field of signal processing on graphs: Extending high-dimensional data analysis to networks and other irregular domains. Signal Processing Magazine, IEEE, 30(3):83–98, 2013.
  • [28] S. Si, D. Shin, I. S. Dhillon, and B. N. Parlett. Multi-scale spectral decomposition of massive graphs. In Advances in Neural Information Processing Systems, pages 2798–2806, 2014.
  • [29] C. Tai, T. Xiao, X. Wang, et al. Convolutional neural networks with low-rank regularization. arXiv preprint arXiv:1511.06067, 2015.
  • [30] R. Tomioka, K. Hayashi, and H. Kashima. Estimation of low-rank tensors via convex optimization. arXiv preprint arXiv:1010.0789, 2010.
  • [31] R. Tomioka, K. Hayashi, and H. Kashima. On the extension of trace norm to tensors. In NIPS Workshop on Tensors, Kernels, and Machine Learning, page 7, 2010.
  • [32] C. E. Tsourakakis. Mach: Fast randomized tensor decompositions. In SDM, pages 689–700. SIAM, 2010.
  • [33] N. Vervliet, O. Debals, L. Sorber, M. Van Barel, and L. De Lathauwer. Tensorlab 3.0, March 2016.
  • [34] C. Wang, X. He, J. Bu, Z. Chen, C. Chen, and Z. Guan. Image representation using laplacian regularized nonnegative tensor factorization. Pattern Recognition, 44(10):2516–2526, 2011.
  • [35] Y. Wang, H.-Y. Tung, A. J. Smola, and A. Anandkumar. Fast and guaranteed tensor decomposition via sketching. In Advances in Neural Information Processing Systems, pages 991–999, 2015.
  • [36] Y.-X. Wang, L.-Y. Gui, and Y.-J. Zhang. Neighborhood preserving non-negative tensor factorization for image representation. In 2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 3389–3392. IEEE, 2012.
  • [37] G. Zhong and M. Cheriet. Large margin low rank tensor analysis. Neural computation, 26(4):761–780, 2014.

Appendix A Appendices

A.1 Proof of Theorem 1

In the main body of the paper, we assume (for simplicity of notation) that the graph multilinear rank of the tensor is equal in all the modes . However, for the proof of this Theorem we adopt a more general notation and assume a different rank for every mode of the tensor.

We first clearly state the properties of MLRTG:

  1. Property 1: Joint Approximate Graph Stationarity:

    Definition 1.

    A tensor satisfies Joint Approximate Graph Stationarity, i.e., its matricization / flattening satisfies approximate graph stationarity:

    (9)
  2. Property 2: Low Frequency Energy Concentration:

    Definition 2.

    A tensor satisfies the low-frequency energy concentration property, i.e., the energy is concentrated in the top entries of the graph spectral covariance matrices .

    (10)

A.1.1 Assumption: The existence of an eigen gap

For the proof of this Theorem, we assume that the ‘eigen gap condition’ holds. In order to understand this condition, we guide the reader step by step by defining the following terms:

  1. Cartesian product of graphs

  2. Eigen gap of a cartesian product graph

Definition 3 (Eigen Gap of a graph).

A graph Laplacian with an eigenvalue decomposition is said to have an eigen gap if there exists a , such that and .

Separable Eigenvector Decomposition of a graph Laplacian: The eigenvector decomposition of a combinatorial Laplacian of a graph possessing an eigen gap (satisfying Definition 3) can be written as:

where denote the first low frequency eigenvectors and eigenvalues in .

For a -nearest neighbors graph constructed from a -clusterable data (along rows) one can expect as and .

Definition 4 (Cartesian product).

Suppose we have two graphs and , where the tuple represents (vertices, edges, adjacency matrix, degree matrix). The Cartesian product is a graph such that the vertex set is the Cartesian product and the edges are set according to the following rules: any two vertices and are adjacent in if and only if either

  • and is adjacent with in

  • and is adjacent with in .

The adjacency matrix of the Cartesian product graph is given by the matrix Cartesian product:

Lemma 2.

The degree matrix of the graph which satisfies definition 4 is given as

The combinatorial Laplacian of is:

Proof.

The adjacency matrix of the Cartesian product graph is given by the matrix Cartesian product:

With this definition of the adjacency matrix, it is possible to write the degree matrix of the Cartesian product graph as cartesian product of the factor degree matrices:

where we have used the following property

This implies the following matrix equality:

The combinatorial Laplacian of the cartesian product is:

For our purpose we define the Eigen gap of the Cartesian Product Graph as follows:

Definition 5 (Eigen Gap of a Cartesian Product Graph).

A Cartesian product graph, as defined in Definition 4, is said to have an eigen gap if there exist , such that , where denotes the eigenvalue of the graph Laplacian .
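
The small numerical sketch below checks the structure that this definition and Lemma 3 below rely on: the combinatorial Laplacian of a Cartesian product graph is the Kronecker sum of the factor Laplacians, so its spectrum consists of all pairwise sums of the factor eigenvalues. The tiny random weighted graphs are purely illustrative.

```python
# Minimal check: eigenvalues of L1 (x) I + I (x) L2 are the pairwise sums of
# the eigenvalues of L1 and L2 (eigenvectors are Kronecker products).
import numpy as np

def random_laplacian(n, rng):
    """Combinatorial Laplacian of a small random weighted graph."""
    W = rng.random((n, n))
    W = (W + W.T) / 2
    np.fill_diagonal(W, 0)
    return np.diag(W.sum(axis=1)) - W

def cartesian_laplacian(L1, L2):
    n1, n2 = L1.shape[0], L2.shape[0]
    return np.kron(L1, np.eye(n2)) + np.kron(np.eye(n1), L2)

rng = np.random.default_rng(0)
L1, L2 = random_laplacian(4, rng), random_laplacian(5, rng)
lam1 = np.linalg.eigvalsh(L1)
lam2 = np.linalg.eigvalsh(L2)
lam_prod = np.linalg.eigvalsh(cartesian_laplacian(L1, L2))
pairwise_sums = np.sort((lam1[:, None] + lam2[None, :]).ravel())
assert np.allclose(lam_prod, pairwise_sums)        # spectra match
```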

A.1.2 Consequence of the ‘Eigen gap assumption’: Separable eigenvector decomposition of a Cartesian product graph

The eigen gap assumption (Definition 5) is important to define the notion of the ‘separable eigenvector decomposition of a Cartesian product graph’, which will be used in the final steps of this theorem. We define the eigenvector decomposition of the Laplacian of the Cartesian product of two graphs; the definition extends in a straightforward manner to more than two graphs.

Lemma 3 (Separable Eigenvector Decomposition of a Cartesian Product Graph).

For a Cartesian product graph as defined in Definition 4, the eigenvector decomposition can be written in a separable form as:

where and denote the first low-frequency eigenvectors and eigenvalues in .

Proof.

The eigenvector matrix of the cartesian product graph can be derived as:

So the eigenvector matrix is given by the Kronecker product of the eigenvector matrices of the factor graphs, and the eigenvalues are the element-wise sums of all possible pairs of factor eigenvalues, i.e., the Cartesian product of the eigenvalue matrices.

(11)

where we assume that with the first columns in and 0 appended for others and with the first columns equal to 0 and others copied from . The same holds for as well. Now

where we use . Now removing the zero appended columns we get . Now, let

removing the zero-appended entries we get . For a -nearest neighbors graph constructed from -clusterable data (along the rows) one can expect as and . Furthermore, (Definitions 3 & 5). Thus eq. (11) can be written as:

A.1.3 Final steps

Now we are ready to prove the theorem. We expand the denominator of the expression from Property 2 above, starting with the GSC for any and using . For a 3D tensor, we index its modes with . While considering the eigenvectors of a mode, i.e., , we represent the Kronecker product of the eigenvectors of the other modes as for simplicity:

(12)

The last step above follows from the eigenvalue decomposition of a Cartesian product graph (Lemma 3). Note that this only holds if the eigen gap condition (Definition 5) holds.