Randomized Tensor Ring Decomposition and Its Application to Large-scale Data Reconstruction

01/07/2019, by Longhao Yuan et al.

Dimensionality reduction is an essential technique for multi-way large-scale data, i.e., tensors. Tensor ring (TR) decomposition has become popular due to its high representation ability and flexibility. However, traditional TR decomposition algorithms suffer from high computational cost when facing large-scale data. In this paper, taking advantage of the recently proposed tensor random projection method, we propose two TR decomposition algorithms. By employing random projection on every mode of the large-scale tensor, the TR decomposition can be processed at a much smaller scale. The simulation experiments show that the proposed algorithms are 4-25 times faster than traditional algorithms without loss of accuracy, and our algorithms show superior performance in deep learning dataset compression and hyperspectral image reconstruction experiments compared to other randomized algorithms.

1 Introduction

With the development of data acquisition and storage technology, large-scale data (i.e., big data) has become ubiquitous in many fields such as computational neuroscience, signal processing, machine learning and pattern recognition [1]. In these fields, large amounts of multi-dimensional data (i.e., tensors) of high dimensionality are generated. Big data is of large volume and high complexity, which makes it hard to process with traditional methods such as singular value decomposition (SVD) and principal component analysis (PCA) due to their high computational complexity. Moreover, in order to fit these algorithms, traditional methods need to perform unfolding (matricization) operations that transform tensor data into matrices and vectors, which leads to the loss of adjacent structure information and redundant space cost [2].

Tensors can retain the high-dimensional structure of the data and prevent information loss. Tensor decomposition aims to approximate a tensor by latent factors, thus transforming large-scale tensor data into a low-dimensional latent space and reducing the data dimensionality. CANDECOMP/PARAFAC (CP) decomposition [3] and Tucker decomposition [4] are the most classical and well-studied tensor decomposition models, after which tensor train (TT) decomposition [5] and tensor ring (TR) decomposition [6] became popular because of their high compression performance on high-dimensional and large-scale tensors. TT and TR provide a natural solution to the 'curse of dimensionality'. For instance, for an order-N tensor, the space complexity of Tucker grows exponentially with N, while the complexities of TT, TR and CP are linear in N. Although CP is a highly compact decomposition model whose space complexity is also linear in N, it has difficulties in finding the optimal latent tensor factors [7]. A small worked example of this scaling is sketched below.
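To make the scaling comparison concrete, the following short sketch counts the stored parameters of each model; the order N, mode size I and uniform rank R below are arbitrary choices for illustration only, not values used anywhere in this paper.

```python
# Hypothetical sizes, chosen only to illustrate the scaling discussed above.
N, I, R = 10, 10, 5        # tensor order, mode size, (uniform) rank

tucker_params = R**N + N * I * R   # R^N core plus N factor matrices of size I x R
tr_params     = N * R * I * R      # N order-three TR cores of size R x I x R
cp_params     = N * I * R          # N factor matrices of size I x R

print(tucker_params)   # 9766125 (dominated by R**N) -> exponential in N
print(tr_params)       # 2500                         -> linear in N
print(cp_params)       # 500                          -> linear in N
```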

Though tensor decomposition has the merits of preserving the data structure and offering high data representation ability, when dealing with large-scale data, traditional deterministic algorithms such as alternating least squares (ALS) and gradient descent (GD) are inefficient due to their high computational cost and low convergence rate. Therefore, fast and efficient algorithms are in high demand for large-scale tensor decomposition. Randomization is a powerful computation acceleration technique that has been proposed and studied for decades [8, 9]. Recently, randomness-based tensor decomposition has drawn considerable attention. The work in [10] proposes a randomized algorithm for large-scale tensors based on Tucker decomposition; it can process arbitrarily large tensors with low multi-linear rank and shows robustness on various data sets. A randomized least squares algorithm for CP decomposition is proposed in [11]; it is much faster than the traditional CP least squares algorithm while retaining high performance. The work in [12] provides a different randomized CP decomposition algorithm: the CP decomposition of a small tensor, generated by tensor random projection of the large-scale tensor, is computed first, and the CP decomposition of the large-scale tensor is then obtained by back projection.

Many of these randomized tensor decomposition algorithms are efficient and perform well in simulation experiments. However, to the best of our knowledge, randomized techniques have not been applied to TR decomposition, and few studies have explored the performance of randomized tensor decomposition algorithms on real-world data. Given that TR decomposition lacks fast and efficient algorithms for large-scale tensors, in this paper we explore the effectiveness of the tensor random projection method for TR decomposition. The main contributions of this paper are listed below:

  • Based on the tensor random projection method and traditional TR decomposition algorithms, we propose two randomized TR decomposition (rTRD) algorithms, which are suitable for fast and reliable tensor decomposition of large-scale data.

  • The proposed algorithms are compared with the traditional TR decomposition algorithms in simulation experiments. Our algorithms achieve a significant advantage in computational speed over traditional algorithms without loss of accuracy.

  • Experiments on deep learning datasets and hyperspectral image (HSI) data are conducted. The proposed algorithms outperform the compared randomized tensor decomposition algorithms in data compression and reconstruction.

2 Notations and Preliminaries

2.1 Notations

The notations in [13] are adopted in this paper. Tensors of order-N are denoted by calligraphic letters, e.g., $\mathcal{X}$. Scalars are denoted by normal lowercase or uppercase letters, e.g., $x$, $X$. Vectors are denoted by boldface lowercase letters, e.g., $\mathbf{x}$. Matrices are denoted by boldface capital letters, e.g., $\mathbf{X}$. For simplicity, a tensor sequence is written as $\{\mathcal{X}^{(n)}\}_{n=1}^{N}$ or $\mathcal{X}^{(1)}, \ldots, \mathcal{X}^{(N)}$; scalar, vector and matrix sequences are denoted in the same way. Moreover, we employ two types of tensor unfolding (matricization) operations in this paper. The first mode-n unfolding [13] of tensor $\mathcal{X}$ is denoted by $\mathbf{X}_{(n)}$, and the second mode-n unfolding, which is often used in TR operations [6], is denoted by $\mathbf{X}_{\langle n \rangle}$. In addition, the Frobenius norm of $\mathcal{X}$ is defined by $\|\mathcal{X}\|_F = \sqrt{\langle \mathcal{X}, \mathcal{X} \rangle}$, where $\langle \cdot, \cdot \rangle$ is the inner product operation.

2.2 Tensor Ring Decomposition

Tensor ring (TR) decomposition is a more general model than tensor train (TT) decomposition: it represents a high-dimensional tensor by circular multilinear products over a sequence of low-dimensional cores (TR factors). All of the TR factors are order-three tensors, denoted by $\mathcal{G}^{(n)} \in \mathbb{R}^{R_n \times I_n \times R_{n+1}}$, $n = 1, \ldots, N$. In the same way as TT, the TR decomposition scales linearly with the order of the tensor, so it can overcome the 'curse of dimensionality'. $\{R_1, R_2, \ldots, R_{N+1}\}$ denotes the TR-rank, which controls the model complexity of the TR decomposition. TR decomposition relaxes the rank constraint on the first and last cores to $R_1 = R_{N+1}$, while the original constraint of TT is rather stringent, i.e., $R_1 = R_{N+1} = 1$. TR applies a trace operation and all the TR factors are constrained to be order-three equivalently. In this sense, TR can be considered a linear combination of TTs and thus offers a more powerful and generalized representation ability than TT. The element-wise relation and the global relation between the TR decomposition and the original tensor are given by equations (1) and (2):

(1)   $T(i_1, i_2, \ldots, i_N) = \mathrm{Trace}\big(\mathbf{G}^{(1)}(i_1)\,\mathbf{G}^{(2)}(i_2)\cdots\mathbf{G}^{(N)}(i_N)\big)$
(2)   $\mathbf{T}_{\langle n \rangle} = \mathbf{G}^{(n)}_{(2)}\big(\mathbf{G}^{(\neq n)}_{\langle 2 \rangle}\big)^{\top}$

where $\mathrm{Trace}(\cdot)$ is the matrix trace operator, $\mathbf{G}^{(n)}(i_n) \in \mathbb{R}^{R_n \times R_{n+1}}$ is the $i_n$th mode-2 slice of $\mathcal{G}^{(n)}$, which can also be denoted by $\mathcal{G}^{(n)}(:, i_n, :)$ according to Matlab syntax, and $\mathcal{G}^{(\neq n)}$ is the subchain tensor obtained by merging all TR factors except the $n$th core tensor; see more details in [14].
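As a minimal numerical sketch of the element-wise relation in equation (1), each entry of the tensor is the trace of a product of mode-2 slices of the cores; the function name, the toy sizes and the uniform TR-rank below are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def tr_element(cores, idx):
    """Evaluate T(i_1, ..., i_N) = Trace( G^(1)(i_1) G^(2)(i_2) ... G^(N)(i_N) )."""
    prod = np.eye(cores[0].shape[0])
    for core, i in zip(cores, idx):
        prod = prod @ core[:, i, :]   # mode-2 slice of shape (R_n, R_{n+1})
    return np.trace(prod)             # trace closes the ring (requires R_1 == R_{N+1})

# Toy usage: an order-3 tensor of size 4 x 5 x 6 with uniform TR-rank 3.
cores = [np.random.randn(3, I_n, 3) for I_n in (4, 5, 6)]
value = tr_element(cores, (0, 2, 1))  # entry T(1, 3, 2) in 1-based indexing
```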

3 Approach

3.1 Tensor Random Projection

Tensor random projection (TRP) has drawn attention in very recent years, and several studies have been conducted based on CP and Tucker decompositions [12, 10]. Similar to matrix random projection, the TRP method performs a random projection on every mode of the tensor, yielding a much smaller subspace tensor that preserves most of the action of the original tensor. TRP is simply formulated as follows:

(3)   $\mathcal{P} = \mathcal{T} \times_1 \mathbf{Q}^{(1)\top} \times_2 \mathbf{Q}^{(2)\top} \cdots \times_N \mathbf{Q}^{(N)\top}$

where $\times_n$ is the mode-n tensor product (see details in [13]), $\mathbf{Q}^{(n)} \in \mathbb{R}^{I_n \times K_n}$, $n = 1, \ldots, N$, are orthogonal matrices, and $\mathcal{P} \in \mathbb{R}^{K_1 \times K_2 \times \cdots \times K_N}$ is the projected tensor. After projection, the projected tensor is employed to calculate the desired low-rank approximation of the original large-scale tensor. The implementation details of the TRP method are illustrated in the next subsection.
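A minimal NumPy sketch of equation (3) is given below. The helper name mode_n_product and the choice of obtaining each orthogonal matrix from the QR factorization of a Gaussian matrix are our own illustrative assumptions; Algorithm 1 in the next subsection instead constructs the projection matrices from the data itself.

```python
import numpy as np

def mode_n_product(tensor, matrix, n):
    """Mode-n product: contracts mode n of `tensor` with `matrix` of shape (K_n, I_n)."""
    return np.moveaxis(np.tensordot(matrix, tensor, axes=(1, n)), 0, n)

def tensor_random_projection(T, proj_sizes, seed=0):
    """Project every mode of T down to the sizes in proj_sizes, as in Eq. (3)."""
    rng = np.random.default_rng(seed)
    P, Qs = T, []
    for n, K in enumerate(proj_sizes):
        W = rng.standard_normal((T.shape[n], K))   # Gaussian matrix, I_n x K_n
        Q, _ = np.linalg.qr(W)                     # orthonormal columns
        P = mode_n_product(P, Q.T, n)              # mode n shrinks from I_n to K_n
        Qs.append(Q)
    return P, Qs                                   # projected tensor and projection bases

# Toy usage: a 60 x 50 x 40 tensor projected to 10 x 10 x 10.
T = np.random.randn(60, 50, 40)
P, Qs = tensor_random_projection(T, (10, 10, 10))
```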

3.2 Randomized Tensor Ring Decomposition

The problem of finding the TR decomposition is formulated as the following model:

(4)   $\min_{\mathcal{G}^{(1)}, \ldots, \mathcal{G}^{(N)}} \ \big\| \mathcal{T} - \Psi\big(\mathcal{G}^{(1)}, \ldots, \mathcal{G}^{(N)}\big) \big\|_F$

where $\mathcal{T}$ is the target tensor, $\mathcal{G}^{(n)}$, $n = 1, \ldots, N$, are the TR factors to be solved, and $\Psi(\cdot)$ is the function that transforms the TR factors into the approximated tensor. In [14], this model is solved by various methods such as TRSVD, TRALS and TRSGD. However, the SVD-based and ALS-based algorithms have high computational cost, and tremendous computing resources are needed when facing large-scale data. In addition, though TRSGD has low per-iteration complexity and is suitable for large-scale computation, its convergence is rather slow and its performance cannot be guaranteed. Under this situation, we combine the TRP technique with the traditional TR decomposition algorithms (e.g., TRALS and TRSVD) to enable fast and reliable TR decomposition of large-scale tensors. The randomized tensor ring decomposition (rTRD) algorithms based on ALS (i.e., rTRALS) and SVD (i.e., rTRSVD) are illustrated in Algorithm 1.

Algorithm 1 Randomized tensor ring decomposition (rTRD)
  1: Input: Large-scale tensor $\mathcal{T} \in \mathbb{R}^{I_1 \times I_2 \times \cdots \times I_N}$,
                 projection size of every mode $\{K_1, K_2, \ldots, K_N\}$,
                 and TR-rank $\{R_1, R_2, \ldots, R_{N+1}\}$.
  2: Output: TR factors $\{\mathcal{G}^{(n)}\}_{n=1}^{N}$ of the large-scale tensor $\mathcal{T}$.
  3: For $n = 1, \ldots, N$
  4:   Create matrix $\mathbf{W}^{(n)}$ with entries following the Gaussian distribution.
  5:   $\mathbf{Z}^{(n)} = \mathbf{T}_{(n)} \mathbf{W}^{(n)}$   % random projection
  6:   $[\mathbf{Q}^{(n)}, \sim] = \mathrm{QR}(\mathbf{Z}^{(n)})$   % economy QR decomposition
  7:   $\mathcal{T} \leftarrow \mathcal{T} \times_n \mathbf{Q}^{(n)\top}$
  8: End for
  9:   Obtain TR factors $\{\mathcal{G}^{(n)}\}_{n=1}^{N}$ of the projected tensor by TRALS or TRSVD [6].
 10: For $n = 1, \ldots, N$
 11:   $\mathcal{G}^{(n)} \leftarrow \mathcal{G}^{(n)} \times_2 \mathbf{Q}^{(n)}$   % back projection
 12: End for
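For illustration, a minimal NumPy sketch of the rTRD pipeline is given below. The plain Gaussian test matrix, the helper names, and the user-supplied decompose_tr callback (standing in for TRALS or TRSVD, which are not reimplemented here) are our own simplifying assumptions; this is a sketch of the idea, not the authors' implementation.

```python
import numpy as np

def mode_n_product(tensor, matrix, n):
    """Mode-n product with `matrix` of shape (new_size, old_size)."""
    return np.moveaxis(np.tensordot(matrix, tensor, axes=(1, n)), 0, n)

def rtrd(T, proj_sizes, decompose_tr, seed=0):
    """Sketch of Algorithm 1.  `decompose_tr(P)` must return the TR cores of the
    small projected tensor P, e.g., via a TRALS or TRSVD routine."""
    rng = np.random.default_rng(seed)
    P, Qs = T.copy(), []
    for n, K in enumerate(proj_sizes):
        unfold = np.moveaxis(P, n, 0).reshape(P.shape[n], -1)  # mode-n unfolding of the current tensor
        W = rng.standard_normal((unfold.shape[1], K))          # Gaussian test matrix
        Z = unfold @ W                                         # random projection (sketch of the column space)
        Q, _ = np.linalg.qr(Z)                                 # economy QR: orthonormal basis, I_n x K_n
        P = mode_n_product(P, Q.T, n)                          # shrink mode n to size K_n
        Qs.append(Q)
    cores = decompose_tr(P)                                    # TR factors of the small projected tensor
    # Back projection: map each core's middle mode back to the original size I_n.
    return [mode_n_product(G, Q, 1) for G, Q in zip(cores, Qs)]
```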

It should be noted that, for randomized algorithms, several techniques can be applied to the projection step to improve its numerical stability and thus provide higher decomposition performance, for example, adopting structured projection matrices instead of Gaussian ones [15], or applying the power iteration method to update the projected tensor in order to achieve fast decay of the spectrum of the mode-n unfolding of the projected tensor [8]. In this paper, we only adopt the most basic TRP in order to show the direct improvement over the traditional decomposition algorithms.

4 Experiment Results

In the experiment section, we first investigate the influence of the size of the projected tensor and compare our randomized algorithms with their traditional counterparts (i.e., rTRALS vs. TRALS, and rTRSVD vs. TRSVD). Then we conduct experiments on two large-scale deep learning datasets for fast data compression. Finally, a hyperspectral image (HSI) is employed to test the performance of our algorithms on data reconstruction and denoising. As the evaluation index, we mainly adopt the relative square error (RSE), calculated by $\mathrm{RSE} = \|\mathcal{T} - \hat{\mathcal{T}}\|_F / \|\mathcal{T}\|_F$, where $\mathcal{T}$ is the target large-scale tensor and $\hat{\mathcal{T}}$ is the tensor approximated by the corresponding decomposition factors. All the computations are conducted on a Mac PC with an Intel Core i7 and 16GB DDR3 memory.
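For concreteness, the RSE used throughout the experiments can be computed as in the short sketch below; the tensors in the usage line are synthetic and purely illustrative.

```python
import numpy as np

def rse(T, T_hat):
    """Relative square error: ||T - T_hat||_F / ||T||_F."""
    return np.linalg.norm(T - T_hat) / np.linalg.norm(T)

# Toy check: a small perturbation yields a small RSE.
T = np.random.randn(8, 8, 8)
print(rse(T, T + 0.01 * np.random.randn(*T.shape)))
```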

Figure 1: Reconstruction results of six tensor decomposition algorithms under different tensor projection sizes. Panels (a) and (b) show the RSE values and the time cost, respectively.
             ----------------- Cifar10 -----------------     ----------------- Coil100 -----------------
             CR      RSE     time(s)  CR      RSE     time(s)  CR       RSE     time(s)  CR       RSE     time(s)
rTRALS       102.3   0.2185  18.29    767.0   0.3294  17.39    2948.7   0.3331  40.99    1047.3   0.2911  42.61
rTRSVD       42.64   0.1791  10.63    42.6    0.1791  10.85    175.4    0.2669  1.49     175.4    0.2663  1.96
TRSGD        102.3   0.4382  1.21e3   767.0   1.00    6.27e2   2948.7   0.4158  482.64   1047.3   0.3536  411.12
rCPALS       99.0    0.2254  11.32    613.6   0.3284  10.86    3084.9   0.3434  2.12     1028.3   0.3001  5.80
rTucker      100.8   0.2146  10.65    509.2   0.3058  4.61     3093.5   0.4241  0.38     1077.4   0.4680  1.98
Table 1: Comparison of the compression performance of randomized algorithms under two deep learning datasets.

4.1 Simulation

The most important hyper-parameter of the tensor projection step is the projection size, which determines how much information is retained and controls the balance between computational speed and accuracy. In this experiment, we aim to explore how the size of the projected tensor influences the performance of our algorithms, and compare their performance with related tensor decomposition algorithms. In addition to our proposed algorithms, rCPALS [12], which is the most closely related method, is also adopted in this experiment. The counterparts of the three randomized algorithms are TRALS, TRSVD [6] and CPALS [13], respectively. We choose an RGB image as the simulation data. The projection sizes of the first and second modes of the tensor are chosen from a range of values, while the third mode is kept at its original size. As for parameter settings, we fix the TR-rank, the CP-rank, and the maximum number of iterations for the ALS-based algorithms. For TRSVD and rTRSVD, only one iteration is needed and the TR-rank is chosen automatically, so we only set the tolerance to 0.15. Figure 1 shows the approximation error (RSE) and computation time of the compared algorithms. When the projection size reaches a certain value, the performance of the randomized algorithms remains steady and becomes similar to that of their counterparts. At the steady points, where the performance of each algorithm pair is similar, the time plot shows that rTRALS is about 24 times faster than TRALS (2.0s vs. 48.1s), and rTRSVD is about 4 times faster than TRSVD (0.11s vs. 0.43s).

4.2 Deep Learning Dataset Compression

In this section, we compare the compression performance and running time of our proposed algorithms and other randomized tensor decomposition methods on two deep learning datasets: CIFAR10 [16] (the training data, 50,000 color images of size 32 x 32 x 3) and COIL100 [17] (7,200 color images of size 128 x 128 x 3). The traditional algorithms are inefficient because the datasets are too large, so we only compare with algorithms suitable for large-scale data, i.e., TRSGD [14], rTucker [10] and rCP [12]. The compression ratio is calculated by CR = Num/Np, where Num is the total number of entries of the data and Np is the number of model parameters (see the sketch below). CR is controlled by different rank selections, and for rTRSVD, we set a fixed tolerance for automatic rank selection. Table 1 shows the compression error and time cost of all the compared algorithms. rTRSVD and rTRALS show high accuracy and speed in all situations, while TRSGD is much slower and obtains relatively low accuracy. Though rCPALS and rTucker are fast, their accuracy falls behind our algorithms.
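As an illustration of the compression ratio for a TR model, the sketch below counts the TR parameters explicitly; the dimensions and ranks used in the usage line are hypothetical and do not correspond to the settings of Table 1.

```python
import numpy as np

def tr_compression_ratio(dims, ranks):
    """CR = Num / Np for a TR model; `ranks` has N+1 entries with ranks[0] == ranks[-1]."""
    num = np.prod(dims)                                  # total entries of the data tensor
    n_params = sum(ranks[n] * dims[n] * ranks[n + 1]     # one (R_n x I_n x R_{n+1}) core per mode
                   for n in range(len(dims)))
    return num / n_params

# Hypothetical order-4 tensor with uniform TR-rank 10.
print(tr_compression_ratio((50, 32, 32, 3), (10, 10, 10, 10, 10)))
```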

4.3 Hyperspectral Image Denoising

A hyperspectral image (HSI) is a typical large-scale natural order-three tensor (i.e., height x width x band). For HSI data, the spectral mode (mode-3) is usually considered to have strong low-rankness, so projection along mode-3 can largely reduce the computational cost. In this experiment, we also employ rSVD [8], which is often used in HSI processing; rSVD is applied to the mode-3 unfolding of the data. The projection sizes of all the algorithms are set to the same value for the tested HSI image, and the other parameters are tuned to obtain the best performance. Figure 2 and Table 2 show the visual and numerical results, respectively. rTRALS outperforms the compared algorithms in this experiment.
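For reference, a minimal sketch of the rSVD baseline as we understand it is given below: a randomized low-rank approximation applied to the mode-3 (spectral) unfolding, following the basic range finder of [8]. The function name, the single-pass scheme and the synthetic cube are our assumptions, not details taken from the compared implementation.

```python
import numpy as np

def rsvd_mode3(hsi, k, seed=0):
    """Rank-k reconstruction of an H x W x B hyperspectral cube along the spectral mode."""
    rng = np.random.default_rng(seed)
    H, W, B = hsi.shape
    X3 = hsi.reshape(H * W, B).T                 # mode-3 unfolding: bands x pixels
    Omega = rng.standard_normal((H * W, k))      # Gaussian test matrix
    Q, _ = np.linalg.qr(X3 @ Omega)              # orthonormal basis of the spectral subspace, B x k
    X3_hat = Q @ (Q.T @ X3)                      # rank-k approximation of the unfolding
    return X3_hat.T.reshape(H, W, B)             # fold back to H x W x B

# Toy usage on a synthetic cube.
cube = np.random.randn(64, 64, 32)
approx = rsvd_mode3(cube, k=5)
```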

Figure 2: Visual results of HSI data reconstruction under different noise levels
Noise    Metric      rTR-ALS   rTR-SVD   TR-SGD    rCP-ALS   rTucker   rSVD
-        RSE         0.0150    0.149     0.249     0.100     0.0110    0.0303
         Time (s)    60.01     0.45      9.45      5.38      0.50      1.84
20dB     RSE         0.0294    0.143     0.253     0.101     0.0388    0.0594
         Time (s)    60.21     1.20      206.82    3.97      0.54      2.33
10dB     RSE         0.0811    0.113     0.293     0.107     0.114     0.156
         Time (s)    59.61     1.27      210.89    3.91      0.46      2.08
0dB      RSE         0.285     0.328     0.437     0.166     0.367     0.431
         Time (s)    59.05     0.78      206.62    3.95      0.44      1.87
Table 2: Numerical results of HSI data reconstruction under different noise levels

5 Conclusion

In this paper, using the tensor random projection method, we proposed the rTRALS and rTRSVD algorithms for fast and reliable tensor ring decomposition. Without losing accuracy, the two algorithms run much faster than their counterparts and outperform the other compared randomized algorithms in deep learning dataset compression and HSI reconstruction experiments. Randomized methods are a promising direction for large-scale data processing. For future work, we will focus on further improving the performance and applying randomized algorithms to large-scale sparse and incomplete tensors.

References