1 Introduction
With the development of data acquisition and storage technology, large-scale data (i.e., big data) has become ubiquitous in many fields such as computational neuroscience, signal processing, machine learning and pattern recognition [1]. In these fields, large amounts of high-dimensional multidimensional data (i.e., tensors) are generated. Big data is large in volume and complex in structure, which makes it hard to process with traditional methods such as singular value decomposition (SVD) and principal component analysis (PCA) owing to their high computational complexity. Moreover, to fit such algorithms, tensor data must first be unfolded (matricized) into matrices and vectors, which destroys the adjacency structure of the data and incurs redundant storage cost [2].
Tensors retain the high-dimensional structure of the data and prevent such information loss. Tensor decomposition aims to approximate a tensor by latent factors, transforming large-scale tensor data into a low-dimensional latent space and thereby reducing the data dimensionality. CANDECOMP/PARAFAC (CP) decomposition [3] and Tucker decomposition [4] are the most classical and well-studied tensor decomposition models, after which tensor train (TT) decomposition [5] and tensor ring (TR) decomposition [6] became popular because of their high compression performance on high-dimensional and large-scale tensors. TT and TR provide a natural solution to the 'curse of dimensionality': for an order-$N$ tensor, the space complexity of Tucker grows exponentially with $N$, while those of TT, TR and CP are linear in $N$. Although CP is a highly compact decomposition model whose space complexity is also linear in $N$, it has difficulties in finding the optimal latent tensor factors [7].
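As a rough illustration of these growth rates (the uniform mode size $I$ and rank $R$ below are toy values of our own choosing, not settings from any experiment in this paper), the parameter counts of the four formats can be compared directly:

```python
# Toy comparison of parameter counts; uniform mode size I and uniform rank R
# are simplifying assumptions made only for this illustration.
def tucker_params(N, I, R):
    return R ** N + N * I * R                # R^N core: exponential in N

def tt_params(N, I, R):
    return 2 * I * R + (N - 2) * I * R ** 2  # boundary cores are matrices

def tr_params(N, I, R):
    return N * I * R ** 2                    # N cores of size R x I x R

def cp_params(N, I, R):
    return N * I * R                         # N factor matrices of size I x R

for N in (4, 8, 16):  # Tucker explodes with N; TT, TR and CP stay linear
    print(N, tucker_params(N, 10, 5), tt_params(N, 10, 5),
          tr_params(N, 10, 5), cp_params(N, 10, 5))
```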
Though tensor decomposition has the merits of preserving the data structure and of high representation ability, traditional deterministic algorithms such as alternating least squares (ALS) and gradient descent (GD) are inefficient on large-scale data due to their high computational cost and low convergence rate. Fast and efficient algorithms are therefore in high demand for large-scale tensor decomposition. Randomization is a powerful acceleration technique that has been proposed and studied for decades [8, 9]. Recently, randomness-based tensor decomposition has attracted growing attention. The work in [10] proposes a randomized algorithm for large-scale tensors based on Tucker decomposition; it can process arbitrarily large tensors with low multilinear rank and shows robustness across various datasets. A randomized least squares algorithm for CP decomposition is proposed in [11]; it is much faster than the traditional CP least squares algorithm while retaining its high performance. The work in [12] provides a different randomized CP decomposition algorithm: the CP decomposition of a small tensor, generated by tensor random projection of the large-scale tensor, is computed first, and the CP decomposition of the large-scale tensor is then recovered by back projection.
Many of these randomized tensor decomposition algorithms are efficient and perform well in simulation experiments. However, to the best of our knowledge, randomized techniques have not been applied to TR decomposition, and few studies have explored the performance of randomized tensor decomposition algorithms on real-world data. Since TR decomposition lacks fast and efficient algorithms for large-scale tensors, in this paper we explore the effectiveness of the tensor random projection method for TR decomposition. The main contributions of this paper are listed below:

• Based on the tensor random projection method and traditional TR decomposition algorithms, we propose two randomized TR decomposition (rTRD) algorithms, which are suitable for fast and reliable tensor decomposition of large-scale data.

• The proposed algorithms are compared with the traditional TR decomposition algorithms in simulation experiments, where they achieve a significant speed advantage over the traditional algorithms without loss of accuracy.

• Experiments are conducted on deep learning datasets and hyperspectral image (HSI) data, where the proposed algorithms outperform the compared randomized tensor decomposition algorithms in data compression and reconstruction.
2 Notations and Preliminaries
2.1 Notations
The notations in [13] are adopted in this paper. Tensors of order $N$ are denoted by calligraphic letters, e.g., $\mathcal{X}$. Scalars are denoted by normal lowercase or uppercase letters, e.g., $x$, $X$. Vectors are denoted by boldface lowercase letters, e.g., $\mathbf{x}$. Matrices are denoted by boldface capital letters, e.g., $\mathbf{X}$. For simplicity, we denote a tensor sequence by $\{\mathcal{X}^{(n)}\}_{n=1}^{N}$; scalar, vector and matrix sequences are denoted in the same way. Moreover, we employ two types of tensor unfolding (matricization) operations in this paper. The first, the mode-$n$ unfolding [13] of a tensor $\mathcal{X}$, is denoted by $\mathbf{X}_{(n)}$; the second, which is often used in TR operations [6], is denoted by $\mathbf{X}_{\langle n \rangle}$. In addition, the Frobenius norm of $\mathcal{X}$ is defined by $\|\mathcal{X}\|_F = \sqrt{\langle \mathcal{X}, \mathcal{X} \rangle}$, where $\langle \cdot, \cdot \rangle$ is the inner product operation.
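The two unfoldings differ only in how the remaining modes are ordered. A minimal NumPy sketch of our own (using C-order reshaping, which may differ from the column ordering of the original definitions; the shapes and the cyclic mode shift are the point) is:

```python
import numpy as np

def unfold(X, n):
    """First unfolding X_(n) [13]: mode n indexes the rows."""
    return np.moveaxis(X, n, 0).reshape(X.shape[n], -1)

def unfold_tr(X, n):
    """Second unfolding X_<n> used in TR operations [6]: the modes are
    shifted cyclically so that mode n comes first, then n+1, ..., n-1."""
    order = list(range(n, X.ndim)) + list(range(n))
    return np.transpose(X, order).reshape(X.shape[n], -1)

X = np.random.randn(3, 4, 5)
print(unfold(X, 1).shape, unfold_tr(X, 1).shape)  # (4, 15) (4, 15)
print(np.linalg.norm(X))  # Frobenius norm ||X||_F = sqrt(<X, X>)
```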
2.2 Tensor Ring Decomposition
Tensor ring (TR) decomposition is more general than tensor-train (TT) decomposition; it represents a high-dimensional tensor by circular multilinear products over a sequence of low-dimensional cores (TR factors). All of the TR factors are order-three tensors, denoted by $\mathcal{Z}_n \in \mathbb{R}^{R_n \times I_n \times R_{n+1}}$, $n = 1, \ldots, N$. In the same way as TT, the TR decomposition scales linearly with the order of the tensor, so it can overcome the 'curse of dimensionality'. The sequence $\{R_1, R_2, \ldots, R_N\}$, with the boundary condition $R_{N+1} = R_1$, denotes the TR-rank, which controls the model complexity of the TR decomposition. TR relaxes the rank constraint on the first and last cores of TT to $R_1 = R_{N+1} \geq 1$, while the original constraint of TT is rather stringent, i.e., $R_1 = R_{N+1} = 1$. TR applies the trace operation, and all the TR factors are equivalently constrained to be third-order. In this sense, TR can be considered a linear combination of TTs, thus offering a more powerful and more generalized representation ability than TT. The element-wise and global relations between the TR decomposition and the original tensor are given by equations (1) and (2):
$\mathcal{X}(i_1, i_2, \ldots, i_N) = \operatorname{Tr}\big( \mathbf{Z}_1(i_1)\, \mathbf{Z}_2(i_2) \cdots \mathbf{Z}_N(i_N) \big)$,  (1)

$\mathbf{X}_{\langle n \rangle} = \mathbf{Z}_{n(2)} \big( \mathbf{Z}^{\neq n}_{\langle 2 \rangle} \big)^{\top}$,  (2)

where $\operatorname{Tr}(\cdot)$ is the matrix trace operator, $\mathbf{Z}_n(i_n) \in \mathbb{R}^{R_n \times R_{n+1}}$ is the $i_n$th mode-2 slice of $\mathcal{Z}_n$, which can also be denoted by $\mathcal{Z}_n(:, i_n, :)$ according to Matlab syntax, and $\mathcal{Z}^{\neq n}$ is a subchain tensor obtained by merging all TR factors except the $n$th core tensor; see more details in [14].
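Equation (1) can be evaluated for all indices at once by chaining the cores and closing the ring with a trace. The following sketch is our own minimal implementation, not the authors' code:

```python
import numpy as np

def tr_to_tensor(cores):
    """Contract TR cores Z_n of shape (R_n, I_n, R_{n+1}) into the full tensor,
    i.e., X(i_1, ..., i_N) = Tr(Z_1(i_1) Z_2(i_2) ... Z_N(i_N))."""
    full = cores[0]                                   # shape (R_1, I_1, R_2)
    for Z in cores[1:]:
        # chain: contract the trailing rank index with the next leading one
        full = np.tensordot(full, Z, axes=([-1], [0]))
    # close the ring: trace over the first and last rank indices (R_{N+1} = R_1)
    return np.trace(full, axis1=0, axis2=-1)

cores = [np.random.randn(2, 4, 3), np.random.randn(3, 5, 4),
         np.random.randn(4, 6, 2)]
print(tr_to_tensor(cores).shape)  # (4, 5, 6)
```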
3 Approach
3.1 Tensor Random Projection
Tensor random projection (TRP) has attracted attention in recent years, and several studies have been conducted based on CP and Tucker [12, 10]. Similar to matrix random projection, the TRP method performs a random projection at every mode of the tensor, yielding a much smaller subspace tensor that preserves most of the actions of the original tensor. TRP is simply formulated as follows:
$\mathcal{P} = \mathcal{T} \times_1 \mathbf{Q}_1^{\top} \times_2 \mathbf{Q}_2^{\top} \cdots \times_N \mathbf{Q}_N^{\top}$,  (3)

where $\times_n$ is the mode-$n$ tensor product (see details in [13]), $\mathbf{Q}_n \in \mathbb{R}^{I_n \times K_n}$, $n = 1, \ldots, N$, are orthogonal matrices, and $\mathcal{P}$ is the projected tensor. After projection, the projected tensor is employed to calculate the desired low-rank approximation of the original large-scale tensor. The implementation details of the TRP method are given in the next subsection.
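A minimal sketch of equation (3), assuming Gaussian test matrices and QR-based orthogonalization as in Algorithm 1 below (the function and variable names are ours):

```python
import numpy as np

def trp(T, proj_sizes, seed=0):
    """Tensor random projection: project every mode of T onto an orthonormal
    basis Q_n of size I_n x K_n; return the small tensor P and the bases."""
    rng = np.random.default_rng(seed)
    P, Qs = T, []
    for n, K in enumerate(proj_sizes):
        Pn = np.moveaxis(P, n, 0).reshape(P.shape[n], -1)  # mode-n unfolding
        W = rng.standard_normal((Pn.shape[1], K))          # Gaussian test matrix
        Q, _ = np.linalg.qr(Pn @ W)                        # orthonormal range basis
        # P <- P x_n Q^T : shrink mode n from I_n to K_n
        P = np.moveaxis(np.tensordot(P, Q, axes=([n], [0])), -1, n)
        Qs.append(Q)
    return P, Qs
```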
3.2 Randomized Tensor Ring Decomposition
The problem of finding the TR decomposition is formulated as the following model:

$\min_{\mathcal{Z}_1, \ldots, \mathcal{Z}_N} \, \big\| \mathcal{T} - \Psi(\mathcal{Z}_1, \ldots, \mathcal{Z}_N) \big\|_F$,  (4)

where $\mathcal{T}$ is the target tensor, $\{\mathcal{Z}_n\}_{n=1}^{N}$ are the TR factors to be solved, and $\Psi(\cdot)$ is the function that transforms the TR factors into the approximated tensor. In [14], this model is solved by various methods such as TR-SVD, TR-ALS and TR-SGD. However, the SVD-based and ALS-based algorithms have high computational cost, and tremendous computing resources are needed when facing large-scale data. In addition, though TR-SGD has low per-iteration complexity and is suitable for large-scale computation, its convergence is rather slow and its performance cannot be guaranteed. Under this situation, we combine the TRP technique with the traditional TR decomposition algorithms (e.g., TR-ALS and TR-SVD) to enable fast and reliable TR decomposition of large-scale tensors. The randomized tensor ring decomposition (rTRD) algorithms based on ALS (i.e., rTR-ALS) and SVD (i.e., rTR-SVD) are illustrated in Algorithm 1.
Algorithm 1 Randomized tensor ring decomposition (rTRD)
1: Input: large-scale tensor $\mathcal{T} \in \mathbb{R}^{I_1 \times I_2 \times \cdots \times I_N}$, projection size of every mode $\{K_1, K_2, \ldots, K_N\}$, and TR-rank $\{R_1, R_2, \ldots, R_N\}$.
2: Output: TR factors $\{\mathcal{Z}_n\}_{n=1}^{N}$ of the large-scale tensor $\mathcal{T}$.
3: For $n = 1$ to $N$
4:   Create matrix $\mathbf{W}_n$ following the Gaussian distribution $\mathcal{N}(0, 1)$
5:   $\mathbf{Y}_n = \mathbf{T}_{(n)} \mathbf{W}_n$   % random projection
6:   $[\mathbf{Q}_n, \sim] = \mathrm{QR}(\mathbf{Y}_n)$   % economy QR decomposition
7:   $\mathcal{T} \leftarrow \mathcal{T} \times_n \mathbf{Q}_n^{\top}$   % shrink mode $n$ to size $K_n$
8: End for
9: Obtain TR factors $\{\mathcal{Z}_n\}_{n=1}^{N}$ of the projected tensor by TR-ALS or TR-SVD [6].
10: For $n = 1$ to $N$
11:   $\mathcal{Z}_n \leftarrow \mathcal{Z}_n \times_2 \mathbf{Q}_n$   % back projection
12: End for
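Putting the pieces together, a hypothetical end-to-end sketch of Algorithm 1 in NumPy (it reuses the `trp` sketch from Section 3.1; `tr_decompose` is a placeholder for TR-ALS or TR-SVD [6], which we do not reimplement here):

```python
import numpy as np

def rtrd(T, proj_sizes, tr_decompose):
    """Steps 3-8: project; step 9: decompose the small tensor; steps 10-12:
    back-project mode 2 of every core, Z_n <- Z_n x_2 Q_n."""
    P, Qs = trp(T, proj_sizes)   # trp as sketched in Section 3.1
    cores = tr_decompose(P)      # list of cores with shapes (R_n, K_n, R_{n+1})
    return [np.moveaxis(np.tensordot(Z, Q, axes=([1], [1])), -1, 1)
            for Z, Q in zip(cores, Qs)]
```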
It should be noted that, for randomized algorithms, several techniques can be applied to the projection step to improve its numerical stability and thereby achieve higher decomposition performance, for example, adopting structured projection matrices instead of Gaussian ones [15], or applying the power iteration method to update the projected tensor so that the spectrum of its mode-$n$ unfoldings decays faster [8]. In this paper, we only adopt the most basic TRP in order to show the direct improvement over the traditional decomposition algorithms.
4 Experiment Results
In the experiment section, we first investigate the influence of the size of the projected tensor and compare our randomized algorithms with their traditional counterparts (i.e., rTR-ALS vs. TR-ALS, and rTR-SVD vs. TR-SVD). Then we conduct experiments on two large-scale deep learning datasets for fast data compression. Finally, a hyperspectral image (HSI) is employed to test the performance of our algorithms on data reconstruction and denoising. As the evaluation index, we mainly adopt the relative square error, $\mathrm{RSE} = \|\mathcal{T} - \hat{\mathcal{T}}\|_F / \|\mathcal{T}\|_F$, where $\mathcal{T}$ is the target large-scale tensor and $\hat{\mathcal{T}}$ is the tensor approximated by the corresponding decomposition factors. All the computations are conducted on a Mac PC with an Intel Core i7 and 16 GB DDR3 memory.
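In NumPy this index is a one-liner (a trivial sketch of our own, included only to pin down the definition above):

```python
import numpy as np

def rse(T, T_hat):
    """Relative square error ||T - T_hat||_F / ||T||_F."""
    return np.linalg.norm(T - T_hat) / np.linalg.norm(T)
```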
Table 1: Compression ratio (CR), RSE and running time (in seconds) on CIFAR-10 and COIL-100, each under two different rank settings.

           CIFAR-10                                          COIL-100
Method     CR      RSE     time    CR      RSE     time      CR       RSE     time     CR       RSE     time
rTR-ALS    102.3   0.2185  18.29   767.0   0.3294  17.39     2948.7   0.3331  40.99    1047.3   0.2911  42.61
rTR-SVD    42.64   0.1791  10.63   42.6    0.1791  10.85     175.4    0.2669  1.49     175.4    0.2663  1.96
TR-SGD     102.3   0.4382  1.21e3  767.0   1.00    6.27e2    2948.7   0.4158  482.64   1047.3   0.3536  411.12
rCP-ALS    99.0    0.2254  11.32   613.6   0.3284  10.86     3084.9   0.3434  2.12     1028.3   0.3001  5.80
rTucker    100.8   0.2146  10.65   509.2   0.3058  4.61      3093.5   0.4241  0.38     1077.4   0.4680  1.98
4.1 Simulation
The most important hyperparameter of the tensor projection step is the projection size, which determines how much of the information of the original tensor is retained and controls the balance between computational speed and accuracy. In this experiment, we explore how the size of the projected tensor influences the performance of our algorithms, and we compare them with related tensor decomposition algorithms. Besides our proposed algorithms, rCP-ALS [12], the most closely related method, is also adopted in this experiment. The counterparts of the three randomized algorithms are TR-ALS, TR-SVD [6] and CP-ALS [13], respectively. We choose an RGB image as the simulation data; the projection sizes of two of its modes are chosen from a range of candidate values, while the size of the remaining mode is kept unchanged. As for the parameter settings, the TR-rank, the CP-rank and the maximum number of iterations of the ALS-based algorithms are fixed. For TR-SVD and rTR-SVD, only one iteration is needed and the TR-rank is chosen automatically, so we only set the tolerance (0.15). Figure 1 shows the approximation error (RSE) and the computation time of the compared algorithms. Once the projection size reaches a certain value, the performance of the randomized algorithms remains steady and is similar to that of their counterparts. At the steady points, where the performance of each algorithm pair is similar, the time plot shows that rTR-ALS is about 24 times faster than TR-ALS (2.0 s vs. 48.1 s) and rTR-SVD is about 4 times faster than TR-SVD (0.11 s vs. 0.43 s).
4.2 Deep Learning Dataset Compression
In this section, we compare the compression performance and running time of our proposed algorithms and other randomized tensor decomposition methods on two deep learning datasets: CIFAR-10 [16] (the training set, 50,000 images of size 32×32×3, about 1.5×10^8 entries) and COIL-100 [17] (7,200 images of size 128×128×3, about 3.5×10^8 entries). The traditional algorithms would be inefficient because the datasets are so large, so we only compare with algorithms suitable for large-scale data, i.e., TR-SGD [14], rTucker [10] and rCP [12]. The compression ratio is calculated by CR = Num/Np, where Num is the total number of entries of the data and Np is the number of model parameters. CR is controlled by the rank selection, and for rTR-SVD we set the error tolerance for automatic rank selection. Table 1 shows the compression error and time cost of all the compared algorithms. rTR-SVD and rTR-ALS show high accuracy and speed in all situations, while TR-SGD is much slower and obtains relatively low accuracy. Though rCP-ALS and rTucker are fast, their accuracy falls behind that of our algorithms.
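For the TR format, Np is the total size of the cores, $\sum_{n} R_n I_n R_{n+1}$. A small sketch of the CR computation (the shape and the uniform TR-rank below are hypothetical, not the settings used in Table 1):

```python
import numpy as np

def tr_compression_ratio(shape, ranks):
    """CR = Num / Np, where Np = sum_n R_n * I_n * R_{n+1} with R_{N+1} = R_1."""
    num = np.prod([float(s) for s in shape])   # total entries of the data
    n_params = sum(ranks[n] * shape[n] * ranks[(n + 1) % len(shape)]
                   for n in range(len(shape)))
    return num / n_params

# e.g. a hypothetical order-4 tensorization of CIFAR-10 with uniform TR-rank 10
print(tr_compression_ratio((32, 32, 3, 50000), (10, 10, 10, 10)))  # ~30.7
```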
4.3 Hyperspectral Image Denoising
A hyperspectral image (HSI) is a typical example of a natural order-three tensor (height × width × spectrum) of large scale. For HSI data, the spectrum mode (mode-3) is usually considered to have strong low-rankness, so projecting mode-3 can largely reduce the computational cost. In this experiment, we also employ rSVD [8], which is often used in HSI processing; here rSVD is applied to the mode-3 unfolding of the data. The projection size is set to the same value for all the algorithms on the tested HSI image, and the other parameters are set to obtain the best performance. Figure 2 and Table 2 show the visual and numerical results, respectively. rTR-ALS outperforms the compared algorithms in this experiment.
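For completeness, a minimal sketch of how rSVD [8] can be applied through the mode-3 (spectrum) unfolding of an HSI cube (the layout and the rank K are our assumptions):

```python
import numpy as np

def rsvd_mode3(X, K, seed=0):
    """Randomized SVD of the mode-3 unfolding of a height x width x spectrum cube."""
    A = np.moveaxis(X, 2, 0).reshape(X.shape[2], -1)  # spectrum-mode unfolding
    W = np.random.default_rng(seed).standard_normal((A.shape[1], K))
    Q, _ = np.linalg.qr(A @ W)                        # basis for the range of A
    U, s, Vt = np.linalg.svd(Q.T @ A, full_matrices=False)
    return Q @ U, s, Vt                               # rank-K factors of A
```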
Table 2: RSE and running time (in seconds) of the compared algorithms on the HSI image under different noise levels.

Noise   Metric   rTR-ALS   rTR-SVD   TR-SGD    rCP-ALS   rTucker   rSVD
None    RSE      0.0150    0.149     0.249     0.100     0.0110    0.0303
        Time     60.01     0.45      9.45      5.38      0.50      1.84
20 dB   RSE      0.0294    0.143     0.253     0.101     0.0388    0.0594
        Time     60.21     1.20      206.82    3.97      0.54      2.33
10 dB   RSE      0.0811    0.113     0.293     0.107     0.114     0.156
        Time     59.61     1.27      210.89    3.91      0.46      2.08
0 dB    RSE      0.285     0.328     0.437     0.166     0.367     0.431
        Time     59.05     0.78      206.62    3.95      0.44      1.87
5 Conclusion
In this paper, using the tensor random projection method, we proposed the rTR-ALS and rTR-SVD algorithms for fast and reliable tensor ring decomposition. Without losing accuracy, the two algorithms perform much faster than their traditional counterparts, and they outperform the other compared randomized algorithms in the deep learning dataset compression and HSI reconstruction experiments. Randomized methods are a promising direction for large-scale data processing. In future work, we will focus on further improving the performance and on applying randomized algorithms to large-scale sparse and incomplete tensors.
References
 [1] Andrzej Cichocki, “Era of big data processing: A new approach via tensor networks and tensor decompositions,” arXiv preprint arXiv:1403.2048, 2014.

[2] Amnon Shashua and Tamir Hazan, “Non-negative tensor factorization with applications to statistics and computer vision,” in Proceedings of the 22nd International Conference on Machine Learning. ACM, 2005, pp. 792–799.
[3] Nicolaas Klaas M Faber, Rasmus Bro, and Philip K Hopke, “Recent developments in CANDECOMP/PARAFAC algorithms: A critical review,” Chemometrics and Intelligent Laboratory Systems, vol. 65, no. 1, pp. 119–137, 2003.
[4] Ledyard R Tucker, “Some mathematical notes on three-mode factor analysis,” Psychometrika, vol. 31, no. 3, pp. 279–311, 1966.
[5] Ivan V Oseledets, “Tensor-train decomposition,” SIAM Journal on Scientific Computing, vol. 33, no. 5, pp. 2295–2317, 2011.
 [6] Qibin Zhao, Guoxu Zhou, Shengli Xie, Liqing Zhang, and Andrzej Cichocki, “Tensor ring decomposition,” arXiv preprint arXiv:1606.05535, 2016.
 [7] Guoxu Zhou and Andrzej Cichocki, “Canonical polyadic decomposition based on a single mode blind source separation,” IEEE Signal Processing Letters, vol. 19, no. 8, pp. 523–526, 2012.
 [8] Nathan Halko, PerGunnar Martinsson, and Joel A Tropp, “Finding structure with randomness: Probabilistic algorithms for constructing approximate matrix decompositions,” SIAM review, vol. 53, no. 2, pp. 217–288, 2011.
 [9] PerGunnar Martinsson, Vladimir Rokhlin, and Mark Tygert, “A randomized algorithm for the decomposition of matrices,” Applied and Computational Harmonic Analysis, vol. 30, no. 1, pp. 47–68, 2011.
 [10] Guoxu Zhou, Andrzej Cichocki, and Shengli Xie, “Decomposition of big tensors with low multilinear rank,” arXiv preprint arXiv:1412.1885, 2014.
[11] Casey Battaglino, Grey Ballard, and Tamara G Kolda, “A practical randomized CP tensor decomposition,” SIAM Journal on Matrix Analysis and Applications, vol. 39, no. 2, pp. 876–901, 2018.
[12] N Benjamin Erichson, Krithika Manohar, Steven L Brunton, and J Nathan Kutz, “Randomized CP tensor decomposition,” arXiv preprint arXiv:1703.09074, 2017.
 [13] Tamara G Kolda and Brett W Bader, “Tensor decompositions and applications,” SIAM review, vol. 51, no. 3, pp. 455–500, 2009.
 [14] Qibin Zhao, Masashi Sugiyama, Longhao Yuan, and Andrzej Cichocki, “Learning efficient tensor representations with ring structure networks,” 2018.
 [15] Franco Woolfe, Edo Liberty, Vladimir Rokhlin, and Mark Tygert, “A fast randomized algorithm for the approximation of matrices,” Applied and Computational Harmonic Analysis, vol. 25, no. 3, pp. 335–366, 2008.
[16] Alex Krizhevsky, Vinod Nair, and Geoffrey Hinton, “The CIFAR-10 dataset,” online: http://www.cs.toronto.edu/~kriz/cifar.html, 2014.
[17] S Nayar, “Columbia Object Image Library (COIL-100),” http://www1.cs.columbia.edu/CAVE/software/softlib/coil-100.php, 1996.