With the development of data acquisition and storage technology, large-scale data (i.e., big data) has become ubiquitous in many fields such as computational neuroscience, signal processing, machine learning and pattern recognition. In these fields, large amounts of multi-dimensional data (i.e., tensors) of high dimensionality are generated. Big data is large in volume and complex in structure, which makes it hard to process with traditional methods like singular value decomposition (SVD) and principal component analysis (PCA) due to their high computational complexity. Moreover, to fit these algorithms, traditional methods require unfolding (matricization) operations that transform tensor data into matrices and vectors, which leads to the loss of adjacent structure information and redundant space cost.
Tensors retain the high-dimensional structure of the data and prevent information loss. Tensor decomposition aims to approximate a tensor by latent factors, thus transforming large-scale tensor data into a low-dimensional latent space and reducing the data dimensionality. CANDECOMP/PARAFAC (CP) decomposition  and Tucker decomposition  are the most classical and well-studied tensor decomposition models, after which tensor train (TT) decomposition  and tensor ring (TR) decomposition  became popular because of their high compression performance on high-dimensional and large-scale tensors. TT and TR provide a natural solution to the 'curse of dimensionality'. For instance, for an order-N tensor, the space complexity of Tucker grows exponentially with N, while those of TT, TR and CP are linear in N. Although CP is a highly compact decomposition model whose space complexity is also linear in N, it has difficulty finding the optimal latent tensor factors .
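The growth rates above can be made concrete with a rough parameter-count sketch (assuming, for simplicity, equal mode sizes I and equal ranks R for every model, and ignoring the boundary ranks of TT; these symbols are illustrative, not tied to a specific dataset):

```python
def tucker_params(N, I, R):
    # N factor matrices of size I x R plus an R^N core (exponential in N)
    return N * I * R + R ** N

def tt_tr_params(N, I, R):
    # N order-three cores of size R x I x R (linear in N)
    return N * I * R ** 2

def cp_params(N, I, R):
    # N factor matrices of size I x R (linear in N)
    return N * I * R

# Tucker's core term R**N quickly dominates as the order N grows
for N in (4, 8, 12):
    print(N, tucker_params(N, 100, 10), tt_tr_params(N, 100, 10), cp_params(N, 100, 10))
```

For N = 12 and R = 10, the Tucker core alone already holds 10^12 entries, while the TT/TR cores stay at 1.2 million parameters.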
Though tensor decomposition has the merits of preserving data structure and high representation ability, when dealing with large-scale data, traditional deterministic algorithms like alternating least squares (ALS) and gradient descent (GD) are inefficient due to their high computational cost and low convergence rate. Therefore, fast and efficient algorithms are in high demand for large-scale tensor decomposition. Randomization is a powerful computation acceleration technique that has been proposed and studied for decades [8, 9]. Recently, randomness-based tensor decomposition has drawn increasing attention. The work in  proposes a randomized algorithm for large-scale tensors based on Tucker decomposition; it can process arbitrarily large tensors with low multilinear rank and shows robustness to various datasets. A randomized least squares algorithm for CP decomposition is proposed in ; it is much faster than the traditional CP least squares algorithm while maintaining high performance. The work in  provides a different randomized CP decomposition algorithm: the CP decomposition of a small tensor, generated by tensor random projection of the large-scale tensor, is found first, and the CP decomposition of the large-scale tensor is then obtained by back projection.
Many of these randomized tensor decomposition algorithms are efficient and perform well in simulation experiments. However, to the best of our knowledge, randomized techniques have not been applied to TR decomposition, and few studies have explored the performance of randomized tensor decomposition algorithms on real-world data. Given that TR decomposition lacks fast and efficient algorithms for large-scale tensors, in this paper we explore the effectiveness of the tensor random projection method for TR decomposition. The main contributions of this paper are listed below:
- Based on the tensor random projection method and traditional TR decomposition algorithms, we propose two randomized TR decomposition (rTRD) algorithms, which are suitable for fast and reliable tensor decomposition of large-scale data.
- The proposed algorithms are compared with the traditional TR decomposition algorithms in simulation experiments. Our algorithms achieve a significant speed advantage over the traditional algorithms without loss of accuracy.
- Experiments on deep learning datasets and hyperspectral image (HSI) data are conducted. The proposed algorithms outperform the compared randomized tensor decomposition algorithms in data compression and reconstruction.
2 Notations and Preliminaries
The notations in  are adopted in this paper. Tensors of order $N$ are denoted by calligraphic letters, e.g., $\mathcal{X}$. Scalars are denoted by normal lowercase or uppercase letters, e.g., $x$, $X$. Vectors are denoted by boldface lowercase letters, e.g., $\mathbf{x}$. Matrices are denoted by boldface capital letters, e.g., $\mathbf{X}$. For simplicity, we define a tensor sequence as $\{\mathcal{Z}_n\}_{n=1}^{N}$; scalar, vector and matrix sequences are denoted in the same way. Moreover, we employ two types of tensor unfolding (matricization) operations in this paper. The first mode-$n$ unfolding  of tensor $\mathcal{X}$ is denoted by $\mathbf{X}_{(n)}$, and the second mode-$n$ unfolding, which is often used in TR operations , is denoted by $\mathbf{X}_{\langle n\rangle}$. In addition, the Frobenius norm of $\mathcal{X}$ is defined by $\|\mathcal{X}\|_F=\sqrt{\langle\mathcal{X},\mathcal{X}\rangle}$, where $\langle\cdot,\cdot\rangle$ is the inner product operation.
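The two unfoldings can be sketched in numpy as follows (a minimal illustration; `mode` is zero-based here, and the cyclic mode ordering of the second unfolding follows the TR convention described above):

```python
import numpy as np

def unfold_classical(X, mode):
    # First mode-n unfolding X_(n): mode `mode` becomes the rows, the
    # remaining modes (in their original order) are flattened into columns.
    return np.moveaxis(X, mode, 0).reshape(X.shape[mode], -1)

def unfold_tr(X, mode):
    # Second mode-n unfolding X_<n>: modes are cyclically shifted so that
    # mode n comes first, followed by n+1, ..., N, 1, ..., n-1.
    N = X.ndim
    order = [(mode + k) % N for k in range(N)]
    return np.transpose(X, order).reshape(X.shape[mode], -1)

X = np.arange(24).reshape(2, 3, 4)
print(unfold_classical(X, 1).shape)  # (3, 8)
print(unfold_tr(X, 1).shape)         # (3, 8)
```

Both unfoldings produce a matrix of the same shape; they differ only in the order in which the remaining modes are flattened into columns.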
2.2 Tensor Ring Decomposition
Tensor ring (TR) decomposition is a more general decomposition than tensor train (TT) decomposition, and it represents a tensor of large dimension by circular multilinear products over a sequence of low-dimensional cores (TR factors). All of the TR factors are order-three tensors, denoted by $\mathcal{Z}_n\in\mathbb{R}^{R_n\times I_n\times R_{n+1}}$, $n=1,\dots,N$. In the same way as TT, the TR decomposition scales linearly with the dimension of the tensor, so it can overcome the 'curse of dimensionality'. $\{R_1,R_2,\dots,R_{N+1}\}$ denotes the TR-rank, which controls the model complexity of the TR decomposition. The TR decomposition relaxes the rank constraint on the first and last cores of TT to $R_1=R_{N+1}$, while the original constraint on TT is rather stringent, i.e., $R_1=R_{N+1}=1$. TR applies the trace operation, and all the TR factors are equivalently constrained to be third-order. In this case, TR can be considered as a linear combination of TT, and thus it offers a more powerful and generalized representation ability than TT. The element-wise relation and global relation between the TR decomposition and the original tensor are given by equations (1) and (2):

$$x_{i_1 i_2\cdots i_N}=\mathrm{Tr}\big(\mathbf{Z}_1(i_1)\mathbf{Z}_2(i_2)\cdots\mathbf{Z}_N(i_N)\big),\qquad(1)$$

$$\mathbf{X}_{\langle n\rangle}=\mathbf{Z}_{n(2)}\big(\mathbf{Z}_{(2)}^{\neq n}\big)^{T},\qquad(2)$$

where $\mathrm{Tr}(\cdot)$ is the matrix trace operator, $\mathbf{Z}_n(i_n)\in\mathbb{R}^{R_n\times R_{n+1}}$ is the $i_n$-th mode-2 slice of $\mathcal{Z}_n$, which can also be denoted by $\mathcal{Z}_n(:,i_n,:)$ according to Matlab syntax. $\mathcal{Z}^{\neq n}$ is a subchain tensor obtained by merging all TR factors except the $n$-th core tensor, and $\mathbf{Z}_{(2)}^{\neq n}$ is its mode-2 unfolding; see more details in .
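Equation (1) can be evaluated directly from the cores. The following sketch (with hypothetical small ranks and mode sizes) reconstructs the tensor element by element via the trace of the slice products:

```python
import numpy as np

def tr_element(cores, idx):
    # Eq. (1): Tr( Z1(i1) Z2(i2) ... ZN(iN) ), where Zn(in) = cores[n][:, in, :]
    M = cores[0][:, idx[0], :]
    for Z, i in zip(cores[1:], idx[1:]):
        M = M @ Z[:, i, :]
    return np.trace(M)

def tr_full(cores):
    # Dense reconstruction by evaluating Eq. (1) at every index
    # (only sensible for small examples).
    shape = tuple(Z.shape[1] for Z in cores)
    X = np.empty(shape)
    for idx in np.ndindex(shape):
        X[idx] = tr_element(cores, idx)
    return X

rng = np.random.default_rng(0)
ranks = [2, 3, 4]  # R1, R2, R3, with R4 = R1 closing the ring
cores = [rng.standard_normal((ranks[n], 5, ranks[(n + 1) % 3])) for n in range(3)]
X = tr_full(cores)
print(X.shape)  # (5, 5, 5)
```

For an order-three ring this agrees with a single contraction `np.einsum('aib,bjc,cka->ijk', *cores)`, which makes the circular (trace) structure explicit.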
3.1 Tensor Random Projection
Tensor random projection (TRP) has drawn attention in recent years, and several studies have been conducted based on CP and Tucker [12, 10]. Similar to matrix random projection, the TRP method applies a random projection at every mode of the tensor, yielding a much smaller subspace tensor that preserves most of the action of the original tensor. The TRP is formulated as follows:

$$\mathcal{P}=\mathcal{X}\times_1\mathbf{Q}_1^{T}\times_2\mathbf{Q}_2^{T}\cdots\times_N\mathbf{Q}_N^{T},$$

where $\times_n$ is the mode-$n$ tensor product (see details in ), $\mathbf{Q}_n\in\mathbb{R}^{I_n\times K_n}$, $n=1,\dots,N$, are orthogonal matrices, and $\mathcal{P}$ is the projected tensor. After projection, the projected tensor is employed to calculate the desired low-rank approximation of the original large-scale tensor. The implementation details of the TRP method are illustrated in the next subsection.
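A minimal numpy sketch of the TRP step, in the style of the projection loop described in the next subsection (Gaussian test matrix, economy QR of the sketched unfolding, then a mode-n product; all sizes here are hypothetical):

```python
import numpy as np

def mode_n_product(X, A, mode):
    # X x_n A: contract mode `mode` of X with the second axis of A.
    return np.moveaxis(np.tensordot(A, X, axes=(1, mode)), 0, mode)

def tensor_random_projection(X, proj_sizes, seed=None):
    rng = np.random.default_rng(seed)
    P = X
    Qs = []
    for n, K in enumerate(proj_sizes):
        Pn = np.moveaxis(P, n, 0).reshape(P.shape[n], -1)  # mode-n unfolding
        Omega = rng.standard_normal((Pn.shape[1], K))      # Gaussian test matrix
        Q, _ = np.linalg.qr(Pn @ Omega)                    # economy QR of the sketch
        P = mode_n_product(P, Q.T, n)                      # shrink mode n to size K
        Qs.append(Q)
    return P, Qs

X = np.random.default_rng(1).standard_normal((20, 30, 40))
P, Qs = tensor_random_projection(X, (5, 6, 7))
print(P.shape)  # (5, 6, 7)
```

The orthonormal factors `Qs` are kept because the decomposition of the small tensor is later lifted back to the original space through them.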
3.2 Randomized Tensor Ring Decomposition
The problem of finding the TR decomposition is formulated as the following model:

$$\min_{\mathcal{Z}_1,\dots,\mathcal{Z}_N}\ \big\|\mathcal{X}-\Psi(\mathcal{Z}_1,\dots,\mathcal{Z}_N)\big\|_F,$$

where $\mathcal{X}$ is the target tensor, $\{\mathcal{Z}_n\}_{n=1}^{N}$ are the TR factors to be solved, and $\Psi(\cdot)$ is the function which transforms the TR factors into the approximated tensor. In , the model is solved by various methods such as TRSVD, TRALS and TRSGD. However, the SVD-based and ALS-based algorithms have high computational cost, and when facing large-scale data, tremendous computing resources are needed. In addition, though TRSGD has low per-iteration complexity and is suitable for large-scale computation, its convergence is rather slow and the performance cannot be guaranteed. In this situation, we combine the TRP technique with the traditional TR decomposition algorithms (e.g., TRALS and TRSVD) to enable fast and reliable TR decomposition of large-scale tensors. The randomized tensor ring decomposition (rTRD) algorithms based on ALS (i.e., rTRALS) and SVD (i.e., rTRSVD) are illustrated in Algorithm 1.
|Algorithm 1 Randomized tensor ring decomposition (rTRD)|
|1: Input: Large-scale tensor $\mathcal{X}$,|
|projection size of every mode $\{K_1,\dots,K_N\}$,|
|and TR-rank $\{R_1,\dots,R_{N+1}\}$.|
|2: Output: TR factors $\{\mathcal{Z}_n\}_{n=1}^{N}$ of the large-scale tensor $\mathcal{X}$.|
|3: For $n = 1:N$ do|
|4: Create matrix $\mathbf{\Omega}_n$ following Gaussian distribution|
|5: $\mathbf{M}_n = \mathbf{X}_{(n)}\mathbf{\Omega}_n$ % random projection|
|6: $[\mathbf{Q}_n,\sim] = \mathrm{qr}(\mathbf{M}_n)$ % economy QR decomposition|
|7: $\mathcal{X} \leftarrow \mathcal{X}\times_n \mathbf{Q}_n^{T}$|
|8: End for|
|9: Obtain TR factors $\{\mathcal{G}_n\}_{n=1}^{N}$ of the projected tensor by TRALS or TRSVD .|
|10: For $n = 1:N$ do|
|11: $\mathcal{Z}_n = \mathcal{G}_n\times_2 \mathbf{Q}_n$ % back projection|
|12: End for|
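The back-projection at the end of Algorithm 1 can be sketched as follows; the shapes are hypothetical, and the sketch assumes the cores of the projected tensor are stored as arrays of shape (R_n, K_n, R_{n+1}), so that expanding the middle (spatial) mode with the saved orthonormal factor recovers a core of the original tensor:

```python
import numpy as np

def back_project(cores, Qs):
    # Z_n = G_n x_2 Q_n: core G_n has shape (R_n, K_n, R_{n+1}),
    # Q_n has shape (I_n, K_n); the result has shape (R_n, I_n, R_{n+1}).
    return [np.einsum('rks,ik->ris', G, Q) for G, Q in zip(cores, Qs)]

# hypothetical ranks, projected sizes K_n and original sizes I_n
rng = np.random.default_rng(0)
ranks, K, I = [2, 3, 4], [5, 6, 7], [50, 60, 70]
cores = [rng.standard_normal((ranks[n], K[n], ranks[(n + 1) % 3])) for n in range(3)]
Qs = [np.linalg.qr(rng.standard_normal((I[n], K[n])))[0] for n in range(3)]
Zs = back_project(cores, Qs)
print([Z.shape for Z in Zs])  # [(2, 50, 3), (3, 60, 4), (4, 70, 2)]
```

Only the middle mode of each core changes size; the TR-ranks, and hence the ring structure, are untouched by the back-projection.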
It should be noted that for randomized algorithms, several techniques can be applied in the projection step to improve its numerical stability and thus the decomposition performance: for example, adopting structured projection matrices instead of Gaussian ones , or applying the power iteration method to update the projected tensor in order to achieve fast decay of the spectrum of its mode-$n$ unfolding . In this paper, we adopt only the most basic TRP in order to show the direct improvement over the traditional decomposition algorithms.
4 Experiment Results
In the experiment section, we first investigate the influence of the size of the projected tensor, and compare our randomized algorithms with their traditional counterparts (i.e., rTRALS vs TRALS, and rTRSVD vs TRSVD). Then we conduct experiments on two large-scale deep learning datasets for fast data compression. Finally, a hyperspectral image (HSI) is employed to test the performance of our algorithms on data reconstruction and denoising. As the evaluation index, we mainly adopt the relative square error (RSE), calculated by $\mathrm{RSE}=\|\mathcal{X}-\hat{\mathcal{X}}\|_F/\|\mathcal{X}\|_F$, where $\mathcal{X}$ is the target large-scale tensor and $\hat{\mathcal{X}}$ is the tensor approximated by the corresponding decomposition factors. All the computations are conducted on a Mac PC with an Intel Core i7 and 16GB DDR3 memory.
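The RSE metric is a one-liner; a small helper, assuming dense numpy arrays:

```python
import numpy as np

def rse(X, Xhat):
    # RSE = ||X - Xhat||_F / ||X||_F (Frobenius norms over all entries)
    return np.linalg.norm(X - Xhat) / np.linalg.norm(X)

X = np.random.default_rng(0).standard_normal((10, 10, 10))
print(rse(X, 0.9 * X))  # relative error of a uniformly shrunk copy
```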
The most important hyper-parameter of the tensor projection step is the projection size, which determines how much residual information is retained and controls the balance between computational speed and accuracy. In this experiment, we aim to explore how the size of the projected tensor influences the performance of our algorithms, and compare the performance with related tensor decomposition algorithms. In addition to our proposed algorithms, rCPALS , the most closely related method, is also adopted in this experiment. The counterparts of the three randomized algorithms are TRALS, TRSVD  and CPALS , respectively. We choose an RGB image of size  as the simulation data. The projection sizes of the first and second modes of the tensor data are chosen from , and the third mode of the tensor is kept at its original size. As for parameter settings, we set the TR-rank as , the CP-rank as , and the maximum number of iterations as  for the ALS-based algorithms. For TRSVD and rTRSVD, only one iteration is needed and the TR-rank is chosen automatically, so we only set the tolerance as 0.15. Figure 1 shows the approximation error (RSE) and computation time of the compared algorithms. Once the projection size reaches a certain value, the performance of the randomized algorithms remains steady and is similar to that of their counterparts. At the steady points where the performance of the algorithm pairs is similar, the time graph shows that rTRALS is about 24 times faster than TRALS (2.0s vs 48.1s), and rTRSVD is about 4 times faster than TRSVD (0.11s vs 0.43s).
4.2 Deep Learning Dataset Compression
In this section, we compare the compression performance and running time of our proposed algorithms and other randomized tensor decomposition methods on two deep learning datasets: CIFAR10  of size  (training data) with  entries, and COIL100  of size  with  entries. The traditional algorithms would be inefficient because the datasets are too large, so we only compare with algorithms suitable for large-scale data, i.e., TRSGD , rTucker  and rCP . The compression ratio CR is calculated by CR = Num/Np, where Num is the total number of entries of the data and Np is the number of model parameters. CR is controlled by the rank selection, and for rTRSVD we set the tolerance as  for automatic rank selection. Table 1 shows the compression error and time cost of all the compared algorithms. rTRSVD and rTRALS show high accuracy and speed in all situations, while TRSGD is much slower and obtains relatively low accuracy. Though rCPALS and rTucker are fast, their accuracy falls behind that of our algorithms.
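For a TR model, Np is simply the total size of the cores, so CR can be computed as follows (a sketch with hypothetical shapes and equal TR-ranks):

```python
import numpy as np

def tr_compression_ratio(shape, ranks):
    # CR = Num / Np: Num is the number of tensor entries, Np the total size
    # of the TR cores Z_n of shape (R_n, I_n, R_{n+1}), with R_{N+1} = R_1.
    num = int(np.prod(shape))
    n = len(shape)
    n_params = sum(ranks[i] * shape[i] * ranks[(i + 1) % n] for i in range(n))
    return num / n_params

# hypothetical CIFAR10-like layout with equal TR-ranks of 10
print(tr_compression_ratio((50000, 32, 32, 3), (10, 10, 10, 10)))
```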
4.3 Hyperspectral Image Denoising
A hyperspectral image (HSI) is a typical natural order-three tensor of large scale. For an HSI, the spectrum mode (mode-3) is usually considered to have strong low-rankness, so projection along mode-3 can largely reduce the computational cost. In this experiment, we also employ rSVD , which is often used in HSI processing; the rSVD is applied to the mode-3 unfolding. The projection sizes of all the algorithms are set as  for the tested HSI, and the other parameters are set to achieve the best performance. Figure 2 and Table 2 show the visual and numerical results, respectively. rTRALS outperforms the compared algorithms in this experiment.
|Noise|RSE|Time (s)|RSE|Time (s)|RSE|Time (s)|RSE|Time (s)|RSE|Time (s)|RSE|Time (s)|
|-|0.0150|60.01|0.149|0.45|0.249|9.45|0.100|5.38|0.0110|0.50|0.0303|1.84|
|20dB|0.0294|60.21|0.143|1.20|0.253|206.82|0.101|3.97|0.0388|0.54|0.0594|2.33|
|10dB|0.0811|59.61|0.113|1.27|0.293|210.89|0.107|3.91|0.114|0.46|0.156|2.08|
|0dB|0.285|59.05|0.328|0.78|0.437|206.62|0.166|3.95|0.367|0.44|0.431|1.87|
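The rSVD baseline on the mode-3 unfolding can be sketched as follows (a basic range-finder variant of randomized SVD; the cube size, target rank and oversampling here are illustrative, not the experiment's settings):

```python
import numpy as np

def rsvd(A, k, oversample=10, seed=None):
    # Randomized SVD: sketch the range of A with a Gaussian test matrix,
    # orthonormalize it, then take the SVD of the small projected matrix.
    rng = np.random.default_rng(seed)
    G = rng.standard_normal((A.shape[1], k + oversample))
    Q, _ = np.linalg.qr(A @ G)                          # approximate range basis
    U, s, Vt = np.linalg.svd(Q.T @ A, full_matrices=False)
    return (Q @ U)[:, :k], s[:k], Vt[:k]

hsi = np.random.default_rng(2).standard_normal((64, 64, 32))  # H x W x bands
A3 = np.moveaxis(hsi, 2, 0).reshape(32, -1)                   # mode-3 unfolding
U, s, Vt = rsvd(A3, k=5)
approx = U @ np.diag(s) @ Vt
print(approx.shape)  # (32, 4096)
```

Folding `approx` back to the cube shape gives the low-rank (and, for noisy input, denoised) reconstruction along the spectral mode.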
In this paper, based on the tensor random projection method, we proposed the rTRALS and rTRSVD algorithms for fast and reliable tensor ring decomposition. Without losing accuracy, the two algorithms perform much faster than their counterparts and outperform the other compared randomized algorithms in the deep learning dataset compression and HSI reconstruction experiments. Randomized methods are a promising direction for large-scale data processing. For future work, we will focus on further improving the performance and on applying randomized algorithms to large-scale sparse and incomplete tensors.
-  Andrzej Cichocki, “Era of big data processing: A new approach via tensor networks and tensor decompositions,” arXiv preprint arXiv:1403.2048, 2014.
-  Amnon Shashua and Tamir Hazan, “Non-negative tensor factorization with applications to statistics and computer vision,” in Proceedings of the 22nd International Conference on Machine Learning. ACM, 2005, pp. 792–799.
-  Nicolaas Klaas M Faber, Rasmus Bro, and Philip K Hopke, “Recent developments in CANDECOMP/PARAFAC algorithms: a critical review,” Chemometrics and Intelligent Laboratory Systems, vol. 65, no. 1, pp. 119–137, 2003.
-  Ledyard R Tucker, “Some mathematical notes on three-mode factor analysis,” Psychometrika, vol. 31, no. 3, pp. 279–311, 1966.
-  Ivan V Oseledets, “Tensor-train decomposition,” SIAM Journal on Scientific Computing, vol. 33, no. 5, pp. 2295–2317, 2011.
-  Qibin Zhao, Guoxu Zhou, Shengli Xie, Liqing Zhang, and Andrzej Cichocki, “Tensor ring decomposition,” arXiv preprint arXiv:1606.05535, 2016.
-  Guoxu Zhou and Andrzej Cichocki, “Canonical polyadic decomposition based on a single mode blind source separation,” IEEE Signal Processing Letters, vol. 19, no. 8, pp. 523–526, 2012.
-  Nathan Halko, Per-Gunnar Martinsson, and Joel A Tropp, “Finding structure with randomness: Probabilistic algorithms for constructing approximate matrix decompositions,” SIAM review, vol. 53, no. 2, pp. 217–288, 2011.
-  Per-Gunnar Martinsson, Vladimir Rokhlin, and Mark Tygert, “A randomized algorithm for the decomposition of matrices,” Applied and Computational Harmonic Analysis, vol. 30, no. 1, pp. 47–68, 2011.
-  Guoxu Zhou, Andrzej Cichocki, and Shengli Xie, “Decomposition of big tensors with low multilinear rank,” arXiv preprint arXiv:1412.1885, 2014.
-  Casey Battaglino, Grey Ballard, and Tamara G Kolda, “A practical randomized CP tensor decomposition,” SIAM Journal on Matrix Analysis and Applications, vol. 39, no. 2, pp. 876–901, 2018.
-  N Benjamin Erichson, Krithika Manohar, Steven L Brunton, and J Nathan Kutz, “Randomized CP tensor decomposition,” arXiv preprint arXiv:1703.09074, 2017.
-  Tamara G Kolda and Brett W Bader, “Tensor decompositions and applications,” SIAM review, vol. 51, no. 3, pp. 455–500, 2009.
-  Qibin Zhao, Masashi Sugiyama, Longhao Yuan, and Andrzej Cichocki, “Learning efficient tensor representations with ring structure networks,” 2018.
-  Franco Woolfe, Edo Liberty, Vladimir Rokhlin, and Mark Tygert, “A fast randomized algorithm for the approximation of matrices,” Applied and Computational Harmonic Analysis, vol. 25, no. 3, pp. 335–366, 2008.
-  Alex Krizhevsky, Vinod Nair, and Geoffrey Hinton, “The CIFAR-10 dataset,” online: http://www.cs.toronto.edu/kriz/cifar.html, 2014.
-  S Nayar, “Columbia Object Image Library (COIL100),” http://www1.cs.columbia.edu/CAVE/software/softlib/coil-100.php, 1996.