SamBaTen: Sampling-based Batch Incremental Tensor Decomposition

09/03/2017 ∙ by Ekta Gujral, et al. ∙ University of California, Riverside 0

Tensor decompositions are invaluable tools in analyzing multimodal datasets. In many real-world scenarios, such datasets are far from being static, to the contrary they tend to grow over time. For instance, in an online social network setting, as we observe new interactions over time, our dataset gets updated in its "time" mode. How can we maintain a valid and accurate tensor decomposition of such a dynamically evolving multimodal dataset, without having to re-compute the entire decomposition after every single update? In this paper we introduce SaMbaTen, a Sampling-based Batch Incremental Tensor Decomposition algorithm, which incrementally maintains the decomposition given new updates to the tensor dataset. SaMbaTen is able to scale to datasets that the state-of-the-art in incremental tensor decomposition is unable to operate on, due to its ability to effectively summarize the existing tensor and the incoming updates, and perform all computations in the reduced summary space. We extensively evaluate SaMbaTen using synthetic and real datasets. Indicatively, SaMbaTen achieves comparable accuracy to state-of-the-art incremental and non-incremental techniques, while being 25-30 times faster. Furthermore, SaMbaTen scales to very large sparse and dense dynamically evolving tensors of dimensions up to 100K x 100K x 100K where state-of-the-art incremental approaches were not able to operate.



There are no comments yet.


page 1

This week in AI

Get the week's most popular data science and artificial intelligence research sent straight to your inbox every Saturday.

I Introduction

Tensor decomposition is a very powerful tool for many problems in data mining [13]

, machine learning

[1], chemometrics [3], signal processing [22]

to name a few areas. The success of tensor decomposition lies in its capability of finding complex patterns in multi-way settings, by leveraging higher-order structure and correlations within the data. The dominant tensor decompositions are CP/PARAFAC (henceforth referred to as CP), which extracts interpretable latent factors from the data, and Tucker, which estimates the joint subspaces of the tensor. In this work we focus on the CP decomposition, which has been shown to be extremely effective in exploratory data mining time and time again.

Fig. 1: SamBaTen vs state-of-art techniques: Our proposed method SamBaTen outperforms state-of-the-art baselines while maintaining competitive accuracy.

In a wide variety of modern real-world applications, especially in the age of Big Data, data are far from being static. To the contrary, the data get updated dynamically. In these circumstances, a data tensor needs to be shrunk, expanded or modified on any of its mode. For instance, in an online social network, new interactions occur every second and new friendships are formed at a similar pace. In the tensor realm, we may view a large proportion of these dynamic updates as an introduction of new “slices” in the tensor: in the social network example, new interactions that happen as time evolves imply the introduction of new snapshots of the network, which grow the tensor in the “time” mode. How can we handle such updates in the data without having to re-compute the decomposition whenever an update arrives, but incrementally update the pre-existing results given the new data?

Computing the decomposition for a dynamically updated tensor is challenging, with the challenges lying, primarily, on two of the three V’s in the traditional definition of Big Data: Volume and Velocity. As a tensor dataset is updated dynamically, its volume increases to the point that techniques which are not equipped to handle those updates incrementally, inevitably fail to execute due to the sheer size of the data. Furthermore, even though the applications that tensors have been successful so far do not require real-time execution per se, the decomposition algorithm must, nevertheless, be able to ingest the updates to the data at a rate that will not result in the computation being “drowned” by the incoming updates.

The majority of prior work has focused on the Tucker Decomposition of incrementing tensors [18, 25, 10], however very limited amount of work has been done on the CP. Nion and Sidiropoulos [17] proposed two methods namely Simultaneous Diagonalization Tracking (SDT) and Recursive Least Squares Tracking (RLST) and most recently, [28] introduced the OnlineCP decomposition for higher order online tensors. Even though prior work in incremental CP decomposition, by virtue of allowing for incremental updates to the already computed model, is able to deal with Velocity, when compared to the naive approach of re-executing the entire decomposition on the updated data, every time a new update arrives, it falls short when the Volume of the data grows.

[linecolor=red!60!black,backgroundcolor=gray!20,linewidth=2pt,topline=false, rightline=false, leftline=false] In this paper we propose a novel large scale incremental CP tensor decomposition that effectively leverages (potential) sparsity of the data, and achieves faster and more scalable performance than state-of-the-art baselines, while maintaining comparable accuracy. We show a snapshot of our results in Figure 1: SamBaTen is faster than all state-of-the-art methods on data that the baselines were able to operate on. Furthermore, SamBaTen was able to scale to, both dense and sparse, dynamically updated tensors, where none of the baselines was able to run. Finally, SamBaTen achieves comparable accuracy to existing incremental and non-incremental methods. Our contributions are summarized as follows:

  • Novel scalable algorithm: We introduce SamBaTen, a novel scalable and parallel incremental tensor decomposition algorithm for efficiently computing the CP decomposition of incremental tensors. The advantage of the proposed algorithm stems from the fact that it only operates on small summaries of the data at all times, thereby being able to maintain its efficiency regardless of the size of the full data. Furthermore, if the tensor and the updates are sparse, SamBaTen leverages the sparsity by summarizing in a way that retains only the useful variation in the data. To the best of our knowledge, this is the first incremental tensor decomposition which effectively leverages sparsity in the data.

  • Quality control: As a tensor is dynamically updated, some of the incoming updates may contain rank-deficient structure, which, if not handled appropriately, can pollute the results. We equip SamBaTen with a quality control option, which effectively determines whether an update is rank-deficient, and subsequently handles the update in a way that it does not affect latent factors that are not present in that update.

  • Extensive experimental evaluation: Through experimental evaluation on six real-world datasets with sizes that range up to GB, and synthetic tensors that range up to , we show that our method can incrementally maintain very accurate decompositions, faster and in a more scalable fashion than state-of-the-art methods.

Reproducibility: We make our Matlab implementation publicly vailable at Furthermore, all the datasets we use for evaluation are publicly available.

Ii Problem Formulation

Ii-a Preliminary Definitions

Tensor : A tensor is a higher order generalization of a matrix. In order to avoid overloading the term “dimension”, we call an tensor a three “mode” tensor, where “modes” are the numbers of indices used to index the tensor. The number of modes is also called “order”. Table I contains the symbols used throughout the paper. We refer the interested reader to several surveys that provide more details and a wide variety of tensor applications [14, 20]. In the interest of space, we also refer the reader to [20] for the definitions of Kronecker and Khatri-Rao products which are not essential for following the basic derivation of our approach.

Slice : A slice is a (m-1)-dimension partition of tensor where an index is varied in one mode and the indices fixed in the other modes. There are three categories of slices : horizontal ((i,:,:)) , lateral ((:,j,:)), and frontal ((:,:,k)) for third-order tensor X as shown in Figure 2.

Fig. 2: Slices of 3-order tensor (a) horizontal (b) lateral , (c) frontal .
Symbols Definition

Tensor, Matrix, Column vector, Scalar

Set of Real Numbers
Outer product
Frobenius norm, norm
Spanning the elements of in indices
Spanning all elements of
column of
row of
Kronecker product
Khatri-Rao product (column-wise Kronecker product [20])
TABLE I: Table of symbols and their description

Canonical Polyadic Decomposition : One of the most popular and widely used tensor decompositions is the Canonical Polyadic (CP) or CANDECOMP/PARAFAC decomposition [6, 12]. We henceforth refer to this decomposition as CP. In CP., the tensor is decomposed into a sum of rank-one tensors, i.e., a sum of outer products of three vectors (for three-mode tensors): where , and the outer product is given by for all . In order to compute the decomposition we typically need to minimize the squared differences (i.e., Frobenius norm) between the original tensor and the model There exist other modeling approaches in the literature [7]

which minimize the KL-Divergence, however, Frobenius norm-based approaches are still to this day the most well studied. We reserve investigation of other loss functions as future work.

Ii-B Problem Definition

In many real-world applications, data grow dynamically and may do so in many modes. For example, given a dynamic tensor in a location-based recommendation system, as shown in Figure 3(a), structured as location hot-spots people, the number of registered location, hot-spots , and people visited may all increase over time. Another example is time-evolving social network interactions figure 3 (b) as also described in Introduction section. This incremental property of data gives rise to the need for an on-the-fly update of the existing decomposition, which we name incremental tensor decomposition. Notice that the literature (and thereby this paper) uses the terms “incremental”, “dynamic”, and “online” interchangeably. In such scenarios, data updates happen very fast which make traditional (non-incremental) methods to collapse because they need to recompute the entire large scaled data.

Fig. 3: Real life dynamic data examples (a) collecting new location update from satellite to GPS recommendations systems (b) growing Facebook interaction between people over the time.

This paper address the problem of large scale incremental tensor decomposition. Without loss of generality , we focus on a 3-mode tensor one of whose modes is growing with time. However, the problem definition (and our proposed method) extends to any number of modes. Let us consider = at time . The CP decomposition of is given as :

where of dimension and is of dimension . When new incoming slice = is added in mode 3, required decomposition at time is :

where of dimension and is of dimension .

The problem that we solve is the following:

[linecolor=red!60!black,backgroundcolor=gray!20,linewidth=2pt, topline=false,rightline=false, leftline=false] Problem Definition. Given (a) pre-existing set of decomposition results and of components, which approximate tensor of size at time t , (b) new incoming slice in form of tensor of size at any time , find updates of and incrementally to approximate tensor of dimension , where K = after appending new slice or tensor to dimension while maintaining a comparable accuracy with running the full CP decompositon on the entire updated tensor .

To simplify notation, we will interchangeably refer to as (when we need to refer to specific indices of that matrix), and similarly for we shall refer to it as .

Iii Proposed Method: SamBaTen

As we mention in the introduction, there exists a body of work in the literature that is able to efficiently and incrementally update the CP decomposition in the presence of incoming tensor slices [17, 28]. However, those methods fall short when the size of of the dynamically growing tensor increases, and eventually are not able to scale to very large dynamic tensors. The reason why this happens is because these methods operate on the full data, and thus, even though they incrementally update the decomposition (avoiding to re-compute it from scratch), inevitably, as the size of the full data grows, it takes a toll on the run-time and scalability.

In this paper we propose SamBaTen, which takes a different view of the solution, where instead of operating on the full data, it operates on a summary of the data. Suppose that the “complete” tensor (i.e., the one that we will eventually get when we finish receiving updates) is denoted by . Any given incoming slice (or even a batch of slice updates) can be, thus, seen as a sample of that tensor, where the sampled indices in the third mode (which we assume is the one receiving the updates) are the indices of the incoming slice(s). Suppose, further, that given a set of sample tensors (which are drawn by randomly selecting indices from all the modes of the tensor) we can approximate the original tensor with high-accuracy (which, in fact, the literature has shown that it is possible [9, 8]). Therefore, when we receive a new set of slices as an update, if we update those samples with the new indices, then we should be able to compute a decomposition very efficiently which incorporates the slice updates, and approximates the updated tensor well. This line of reasoning inspires SamBaTen, a visual summary of which is shown in Figure 4.

Fig. 4: SamBaTen: Sampling-based Batch Incremental Tensor Decomposition: 1) Sample incoming tensor into sub-tensors, 2) run parallel decompositions on the samples, 3) project back the results into the original space, and, finally, 4) update the incrementally growing factor matrix .

Iii-a The heart of SamBaTen

The algorithmic framework we propose is shown in Figure 4 and is described below: We assume that we have a pre-existing set of decomposition results before the update, as well as a set of summaries of the tensor before the update. Summaries are in the form of sampled sub-tensors, as described in the text below. For simplicity of description, we assume that we are receiving updated slices on the third mode, which in turn have to add new rows to the matrix (that correspond to the third mode). We, further, assume that the updates come in batches of new slices, which, in turn, ensures that we see a mature-enough update to the tensor, which contains useful structure. Trivially, however, SamBaTen can operate on singleton batches.

In the following lines, is the tensor prior to the update and is the batch of incoming slices. Given an incoming batch, SamBaTen performs the following steps:

Sample: The rationale behind SamBaTen is that each batch can be seen as a sample of third-mode indices of (what is going to be) the full tensor. In this step, we are going to merge these incoming indices with an already existing set of sampled tensors. In order to obtain those pre-existing samples, we follow a similar approach to [9]. Namely, we sample indices from the tensor based on a measure of importance. To determine the importance for each mode

and then sample the indices using this measure as a sampling weight divided by its probability. An appropriate measure of importance (MoI) is the

sum-of-squares of the tensor for each mode. For the first mode , MoI is defined as:


for (1,I). Similarly, we can define the MoI for modes 2 and 3.

We sample each mode of without replacement, using Eq. 1 to bias the sampling probabilities. With as sampling factor, i.e. if has size , then will be of size . Sampling rate for each mode is independent from each other, and in fact, different rates can be used for imbalanced modes.

In the case of sparse tensors, the sample will focus on the dense regions of the tensor which contains most of the useful structure. In the case of dense tensors, the sample will give priority to the regions of the tensor with the highest variation.

After forming the sample summary for , we merge it with the samples obtained from the intersection of the third-mode indices of and the already sampled indices in the remaining modes, so that the final sample is equal to =, where are the third-mode indices of .

Due to the randomized nature of this summarization, we need to draw multiple samples, in order to obtain a reliable set of summaries. Each such independent sample is denoted as . In the case of dense tensors, obtaining multiple, independent random samples helps summarize as much of the useful variation as possible. In fact, we will see in the experimental evaluation that increasing the number of samples, especially for dense tensors, improves accuracy.

In [9] the authors note that in order for their method to work, a set of anchor indices must be common between all samples, so that, later on, we can establish the correct correspondence of latent factors. However, in SamBaTen we do not have to actively fix a set of indices across sampling repetitions. When we sample each time, those indices correspond to a portion of the decomposition that is already computed. Therefore, the entire set of indices can serve as the set of anchors. This is a major advantage compared to [9], since SamBaTen 1) does not need to commit to a set of fixed indices for all samples a-priori, which, due to randomization may happen to represent a badly structured portion of the tensor, 2) does not need to be restricted in a “small” set of fixed common indices (which is required in [9] in order to ensure that sufficiently enough new indices are sampled across repetitions), but to the contrary, is able to use a larger number of anchor indices to establish correspondence more reliably, and 3) does not require any synchronization between different sampling repetitions, which results in higher parallelism potential.

0:  Tensor of size , Factor matrices of size , and respectively, sampling factor and number of repetitions . 0:  Factor matrices of size , and , . 1:  for  to  do 2:     Compute and . 3:     Sample a set of indices from without replacement using as probability (accordingly for the rest). 4:     = 5:      CP. 6:     Normalize (as shown in the text) and absorb scaling in . 7:     Compute optimal matching between the columns of and as discussed in the text 8:     Update only zero entries of that correspond to sampled entries of 9:     Obtain of size by taking the last rows of . 10:     Use a shared copy of and average out its entries in a column-wise fashion across different sampling repetitions. 11:  end for 12:  Update of size as :
13:  Update scaling as the average of the previous and new value. 14:  return  A,B,C,
Algorithm 1 SamBaTen

Decompose: Having obtained , from the previous step, SamBaTen decomposes the summary using any state-of-the-art algorithm, obtaining factor matrices . For the purposes of this paper, we use the Alternating Least Squares (ALS) algorithm for the CP decomposition, which is probably the most well studied algorithm for CP. The rank used here is equal to the universal rank , however, in Section III-B we will discuss whether this choice is always the most appropriate and what alternatives there are.

Project back: The CP decomposition is unique (under mild conditions) up to permutation and scaling of the components [26]. This means that, even though the existing decomposition may have established an order of the components, the decomposition we obtained in the previous step is very likely to introduce a different ordering and scaling, as a result of the aforementioned permutation and scaling ambiguity. Formally, the sampled portion of the existing factors and the currently computed factors are connected through the following relation:

where is a diagonal scaling matrix (which without loss of generality we absorb on the first factor), and is a permutation matrix that permutes the order of the components (columns of the factors).

In order to tackle the scaling ambiguity, we need to normalize the results in a consistent manner. In particular, we normalize such that each column of the newly computed factors which spans the indices that are shared with has unit norm: and accordingly for the remaining factors. Note that for , trivially and similarly for . After normalizing, the relation between the existing factors and the currently computed is (and similarly for the remaining factors). Each iteration retains a copy of which will serve the anchor for disambiguating the permutation of components. We normalize to unit norm as well, and the reason behind that lies in the following Lemma:

Lemma 1.

Consider and . If and correspond to the same latent CP factor, in the noiseless case, then otherwise .


From Cauchy-Schwartz inequality . The above inequality is maximized when and for unit norm , . Therefore, if , which happens when and correspond to the same latent CP factor,

Given the above Lemma, we have a guide for identifying the permutation matrix : For every column of we compute the inner product with every column of and compute a matching when the inner product is equal (or close) to 1. Given a large-enough number of rows for (which is usually the case, since we require a large-enough sample of the tensor in order to augment it with the update and compute the factor updates accurately), this matching can be computed more reliably than past approaches that use a related means of permutation disambiguation [9] but rely on a very small number of shared rows which results in sub-optimal results in the presence of noise.

Update results:

After appropriately permuting the columns of , we have all the information needed to update our model. Returning to the problem definition of Section II, contains the updates to the rows within for (and similarly for and ). Even though do not increase their number of rows over time, the incoming slices may contribute valuable new estimates to the already estimated factors. Thus, for the already existing portions of we only update the zero entries that fall within the range of , and respectively. Finally, contains the factors for the newly arrived slices, which need to be merged to the already existing columns of . Since we have properly permuted the columns of , we accumulate the lower portion of the (which corresponds to the newly added slices) into and we take the column-wise average of the rows to-be-appended to , across repetitions. Finally, we update

Iii-B Dealing with rank deficient updates

So far, the above algorithm description is based on the assumption that each one of the sampled tensors we obtain are full-rank, and the CP decomposition of that tensor is identifiable (assumption that is also central to previous works [9]). However, this assumption glosses over the fact that in reality, updates to our tensor may be rank-deficient. In other words, even though have components, the update may contain components, where . If that happens, then the matching as described above is going to fail, and this inevitably leads to very low quality results. Here, we tackle this issue by adding an extra layerincludegraphics of quality control when we compute the CP decomposition, right before line 5 of Algorithm 1: we estimate the number of components in and instead of the “universal” rank , which may result in low quality factors, we use and we accordingly match only those to their most likely matches within the existing

components. Estimating the number of components is a very hard problem, however, there exist efficient heuristics in the literature, such as the Core Consistency Diagnostic (

CorConDia) [4] which gives a quality rating for a computed CP decomposition. By successively trying different candidate ranks for , we estimate its actual rank, as shown in Algorithm 2 (GetRank), and use that instead. For efficiency we use a recent implementation of CorConDia that is especially tailored for exploiting sparsity [19]. In the experimental evaluation we demonstrate that using GetRank indeed results in higher-quality latent factors

0:  Tensor , maximum rank , maximum no. of iterations . 0:   1:  for 1 to  do 2:     for  to  do 3:         Run CP Decomposition on with rank and obtain 4:         Run CorConDia (,) and obtain 5:     end for 6:  end for 7:  sort and get top 1 index . 8:  return   =
Algorithm 2 GetRank

Iv Experimental Evaluation

In this section we extensively evaluate the performance of SamBaTen on multiple synthetic and real datasets, and compare its performance with state-of-the-art approaches. We experiment on the different parameters of SamBaTen and the baselines, and how that affects performance. We implemented SamBaTen in Matlab using the functionality of the Tensor Toolbox for Matlab [2] which supports very efficient computations for sparse tensors. Our implementation is available at We used Intel(R) Xeon(R), CPU E5-2680 v3 @ 2.50GHz machine with 48 CPU cores and 378GB RAM.

Iv-a Data-set description

Iv-A1 Synthetic data generation

In order to fully explore the performance of SamBaTen, in our experiments we generate synthetic tensor with varying density. We generate random tensors of dimension where I [100, 500, 1000, 3000, 5000, 10000, 50000, 100000]. Those tensors are created from a known set of randomly generated factors, so that we have full control over the ground truth of the full decomposition. We dynamically calculate the size of batch or slice for our all experiments to fit the data into memory. For example for a tensor, we selected batch size of 150 and for 5000 tensor, we selected batch size of 100. The specifications of each synthetic dataset are given in Table II.

Dimension Density- Density- Batch Sampling
() dense sparse size factor
100 100% 65% 50 2
500 100% 65% 150 2
1000 100% 55% 150 2
3000 100% 55% 100 5
5000 100% 55% 100 5
10000 100% 55% 10 2
50000 100% 35% 5 2
100000 100% 35% 5 2
TABLE II: Table of Datasets analyzed

Iv-A2 Real Data Description

In order to truly evaluate the effectiveness of SamBaTen, we test its performance against six real datasets that have been used in the literature. Those datasets are summarized in Table III and are publicly available at [23].

Name Description Dimensions NNZ Batch Sampling Dataset
size factor File Size
NIPS [11] (Paper,Author,Word) 2,482 x 2862 x 14036 3,101,609 500 10 57MB

NELL [5]
(Entity,Relation,Entity) 12092 x 9184 x 28818 76,879,419 500 10 1.4GB

Facebook-wall [27]
(Wall owner, Poster, day) 62,891 x 62,891 x 1,070 78,067,090 100 5 2.1GB

Facebook-links [27]
(User, Links, Day) 62,891 x 62,891 x 650 263,544,295 50 2 3.8GB

(Term ,Term, Year) 239,172 x 239,172 x 46 3,596,640,708 10 2 73GB

(User, Product ,Word) 4,821,207 x 1,774,269 x 1,805,187 1,741,809,018 50000 20 43GB
TABLE III: Real datasets analyzed

Iv-B Evaluation Measures

In order to obtain an accurate picture of the performance, we evaluate SamBaTen and the baselines using three criteria: Relative Error, Wall-Clock time and Fitness. These measures provide a quantitative way to compare the performance of our method. More Specifically, Relative Error is effectiveness measurement and defined as :

where, the lower the value, the better the approximation.

CPU time (sec) indicates how much faster does the decomposition runs as compared to re-running the entire decomposition algorithm whenever we receive a new update on the existing tensor. The average running time denoted by for processing all slices for given tensor, measured in seconds, and is used to validate the time efficiency of an algorithm.

Relative Fitness: tracking the decomposition instead of recomputing it is inevitably an approximate task, so we would like to minimize the discrepancy of the incremental algorithm’s result from the one that has to re-compute the decomposition for every update. Relative Fitness is defined as:

where, the lower the value, the better the approximation for SamBaTen.

Iv-C Baselines for Comparison

Here we briefly present the state-of-the-art baselines we used for comparison. Note that for each baseline we use the reported parameters that yielded the best performance in the respective publications. For fairness, we compare against the parameter configuration for SamBaTen that yielded the best performance in terms of low wall-clock timing, low relative error and fitness. However, moving one step further, we evaluate GetRank and demonstrate that it qualitatively improve the performance for SamBaTen. We also show that using GetRank in SamBaTen

we can still achieve better average running time when datasets are very large. Note that all comparisons were carried out over 10 iterations each, and each number reported is an average with a standard deviation attached to it.

CP_ALS [2]: is considered the most standard and well optimized implementation of the Alternating Least Squares algorithm for CP. We use the implementation of the Tensor Toolbox for Matlab [2]. Here, we simply re-compute CP using CP_ALS every time a new batch update arrives.

SDT [17]

: Simultaneous Diagonalization Tracking (SDT is based on incrementally tracking the Singular Value Decomposition (SVD) of the unfolded tensor

and obtain and . Matrix (resultant left singular vector) and (resultant right singular vector) are then estimated by applying SVD on each column of .
RLST [17]:Recursive Least Squares Tracking (RLST) is another online approach in which recursive updates are computed to minimize the Mean Squared Error (MSE) on incoming slice. In RLST, is computed as and is updated as . Then is estimated using matrix inversion on and .
OnilneCP [28]: The most recent andrelated work to ours was proposed by Zhou, el at. [28] is an OnlineCP decomposition method, where the the latent factors are updated when there are new data. OnilneCP fixes and to solve for and minimizes the cost as:

We conduct our experiments on multiple synthetic datasets and six real-world tensors datasets. We set the tolerance rate for convergence between consecutive iterations to and the maximum number of iteration to 1000 for all the algorithms.The batch size and sampling factor is selected based on dimensions of first mode i.e. I, provided in table II and III for synthetic and real dataset respectively. We use the publicly available implementations for the baselines, as provided by the authors. We only modified the interface of the baselines, so that it is consistent across all methods with respect to the way that they receive the incoming slices. No other functionality has been changed.

Iv-D Experimental Results

In this section, we experimentally evaluate the performance of the proposed method SamBaTen. The following major four aspects are analyzed.
Q1: How effective is SamBaTen as compared to the baselines on different synthetic and real world datasets.
Q2: How fast is SamBaTen when compared to the state-of-the-art methods on very large sized datasets?
Q3: What is the cost-benefit trade-off of computing the actual rank of the incoming batch?
Q4: What is the influence of sampling factor and number of sampling repetitions on SamBaTen?

Iv-D1 Baselines for Comparison

For all datasets we compute Relative Error,CPU time (sec) and Fitness. For SamBaTen, CP_ALS , OnlineCP, RSLT and SDT we use 10% of the data in each dataset as existing dataset. We experimented for both dense as well as sparse tensor to check the performance of our method. The results for the dense and sparse synthetic data are shown in Table IV - V. For each of datasets , the best result is shown in bold. OnlineCP, SDT and RLST address the issue very well. Compared with CP_ALS, SDT and RLST reduce the mean running time by up to 2x times and OnlineCP reduce mean time by up to 3x times for small dataset (I up to 3000). Performance of RLST was better than SDT algorithm on 8 out of 8 third-order synthetic tensor datasets. In fact, the efficiency (in terms of CPU time (sec)) of SDT is quite close to RLST. However, the main issue of SDT and RLST is their estimation of relative error and fitness. For some datasets, such as and , they perform well, while for some others, they exhibit poor fitness and relative error, achieving only nearly half of the fitness of other methods. For small size datasets , OnlineCP’s efficiency and accuracy is better than all methods. As the dimension grows, however, the performance of OnlineCP method reduces,and particularly for datasets of dimension larger than . Same behavior is observed for sparse tensors. SamBaTen is comparable to baselines for small dataset and outperformed the baselines for large dataset. CP_ALS is the only baseline able to execute dataset up to size . These results answer Q1 as the SamBaTen have comparable accuracy to other baseline methods.

I=J=K OnlineCP SDT RLST SamBaTen
100 0.109 0.01 0.107 0.02 0.173 0.02 0.151 0.02 0.115 0.02
500 0.102 0.09 0.102 0.09 0.217 0.06 0.217 0.06 0.102 0.09
1000 0.103 0.01 0.103 0.01 0.287 0.01 0.296 0.01 0.102 0.01
3000 0.119 0.01 0.108 0.01 0.189 0.01 0.206 0.01 0.109 0.01
5000 N/A 0.122 0.002 0.201 0.002 0.196 0.04 0.115 0.009
10000 N/A 0.173 0.04 0.225 0.04 0.252 0.06 0.162 0.01
50000 N/A 0.215 0.03 0.229 0.03 0.26 0.01 0.169 0.01
100000 N/A N/A N/A N/A 0.275 0.00
TABLE IV: Experimental results for relative error for synthetic dense tensor. We see that SamBaTen gives comparable accuracy to baseline.
Fig. 5: Experimental results for CPU time (sec) for (a) dense tensor (b) sparse tensor
Fig. 6: Experimental results for Relative Fitness Improvement for (a) dense tensor (b) sparse tensor

OnlineCP SDT RSLT SamBaTen
100 0.169 0.01 0.154 0.02 0.306 0.01 0.313 0.01 0.178 0.01
500 0.175 0.01 0.188 0.01 0.43 0.01 0.421 0.01 0.184 0.01
1000 0.179 0.01 0.185 0.01 0.613 0.01 0.813 0.01 0.178 0.01
3000 0.177 0.02 0.171 0.04 0.446 0.03 0.513 0.02 0.176 0.03
5000 N/A 0.192 0.02 0.494 0.09 0.535 0.13 0.187 0.04
10000 N/A 0.173 0.01 0.212 0.01 0.224 0.11 0.198 0.12
50000 N/A 0.222 0.00 0.262 0.00 0.259 0.00 0.200 0.00
100000 N/A N/A N/A N/A 0.283 0.00
TABLE V: Experimental results for relative error for synthetic sparse tensor. We see that SamBaTen works better in very large scale dataset such as 50000 50000 50000.

SamBaTen is efficiently able to compute sized tensor with batch size of 5 and sampling factor 2. It took 58095.72s and 47232.2s to compute online decomposition for dense and sparse tensor, respectively. On other hand, state-of-art methods are unable to handle such large scaled incoming data.

The most interesting comparison, however, is on the real datasets, since they present more challenging cases than the synthetic ones. Table VI shows the comparison between methods. SamBaTen outperforms other state-of-the-art approaches in most of the real dataset. In the case of NIPS datset, SamBaTen gives better results compared to the baselines, specifically in terms of CPU Time (faster up to 20 times) and Fitness (better up to 15-20%). SamBaTen outperforms for NELL, Facebook-Wall and Facebook-Links dataset in terms of efficiency comparable to CP_ALS. For the NIPS dataset, SamBaTen is 25-30 times faster than OnlineCP method. Due to high dimensions of dataset, RSLT and SDT are unable to execute further. Note that all the real datasets we use are highly sparse, however, no baselines except CP_ALS actually take advantage of that sparsity, therefore, repeated CP_ALS tends to be faster because the baselines have to deal with dense computations which tend to be slower, when the data contain a lot of zeros. Most importantly, however, SamBaTen performed very well on Amazon and Patent dataset, arguably the hardest of the six real datasets we examined and have been analyzed in the literature, where none of the baselines was able to run. These result answer Q1 and Q2 and show that SamBaTen is able to handle large dimensions and sparsity.

Dataset CPU Time (sec) Fitness SamBaTen w.r.t
NIPS 177.46 372.03 1608.23 1596.07 43.98 0.96 0.98 0.78 0.82
NELL 8783.27 42325.22 65325.22 63485.98 983.83 0.95 0.81 0.76 0.81
Facebook-wall 3041.98 N/A N/A N/A 736.07 0.97 N/A N/A N/A
Facebook-links 2689.69 N/A N/A N/A 343.32 0.96 N/A N/A N/A
Amazon N/A N/A N/A N/A 4892.07 N/A N/A N/A N/A
Patent N/A N/A N/A N/A 8068.27 N/A N/A N/A N/A
TABLE VI: SamBaTen performance for real dataset. We see that SamBaTen outperformed the baselines for all the large scaled tensors.

Iv-D2 Evaluation of Quality Control

I=J=K 200 400 600 800 1000
w/ GetRank 0.48 0.57 0.58 0.59 0.55
w/o GetRank 0.46 0.53 0.55 0.54 0.52
TABLE VII: FMS score for synthetic dataset of batch size 50 with sampling factor 2 for each dimension.
Dataset Sampling Factor 2 5 10 15 20
NIPS w/ GetRank 0.26 0.53 0.45 0.48 0.36
w/o GetRank 0.24 0.46 0.36 0.24 0.22
NELL w/ GetRank 0.48 0.37 0.48 0.43 0.26
w/o GetRank 0.25 0.16 0.38 0.37 0.24
TABLE VIII: FMS score for NIPS and NELL dataset with batch size 500, , and same sampling factor for each dimension.

Here we evaluate the performance improvement brought by GetRank, as well as investigate the additional computational overhead associated with it. We perform experiments as shown in Figure 7 on different dataset to examine the cost in terms of CPU time (sec) which we pay to compute new rank of incoming slice. To measure the accuracy of GetRank, we compute the Fitness Improvement and the Factor Matching Score (FMS) score. We define FMS score as follows. If and are single component tensors that have been normalized so that their weights are and , then the score is defined as score = penalty * () * ) * … * () where the penalty is defined by the values such that penalty = . FMS score is measured between 0 to 1, with 1 being “perfect” match. More precisely , FMS score is defined as :

Fig. 7: Experimental results for CPU Time (sec) and Relative Fitness Improvement using GetRank for synthetic dataset. Sampling factor s is 2 and batch size is 50 for each synthetic dataset.
Fig. 8: Experimental results for CPU Time (sec) and Relative Fitness Improvement using GetRank for NIPS and NELL dataset. Sampling factor s [2, 5, 10, 15, 20] with fixed batch size of 500.

We compare the performance of SamBaTen without and with GetRank on synthetic data (where we know the actual components) and on real data, where we compute CP_ALS on the full tensor and we set those as ground truth components. We observe that results consistently better in terms of FMS score VII-VIII and Fitness Improvement Figure 7 for synthetics and Figure 8 for real NIPS dataset when we incorporate GetRank in our SamBaTen, without sacrificing run-time significantly. This answers Q3.

Iv-D3 Tuning of Sampling Factor s

The sampling factor plays an important role in SamBaTen. We performed experiments to evaluate the impact of changing sampling factor on SamBaTen. For these experiments , we fixed batch size to 50 for all datasets. We see in figure 9 that increasing sampling factor results in reduction of CPU time (as sparsity of sub sampled tensor increased) and it reduces the fitness of output up-to 2-3%. In sum, these observations demonstrate that: 1) a suitable sampling factor on sub-sampled tensor could improve the fitness and result in better tensor decomposition, and 2) the higher sampling factor is, the lower the CPU time. This result partially answers Q4.

Fig. 9: SamBaTen outputs sampling factor : CPU Time (sec) and Relative Fitness on different datasets.
Fig. 10: SamBaTen outputs repetition factor r: FMS score and Relative Fitness on synthetic and NIPS real world datasets.

Iv-D4 Influence of Repetition Factor r

We evaluate the performance for parameter setting i.e number of paralleldecompositions. For these experiments, we choose batch size and sampling rate for synthetic dataset and NIPS real world dataset as provided in table II and III, respectively. We can see that with higher values of the repetition factor , FMS score and Relative Fitness (SamBaTen vs CP_ALS) is improved as shown in figure 10. We experiment on varying repetition factor r with Sampling factor s on NIPS real world dataset to check the performance of our method as shown in figure 11. Note that higher FMS score indicates a better decomposition and, similarly, the lower the fitness score, the better decomposition. This result completes the answer to Q4.

Fig. 11: SamBaTen outputs for repetition factor r and Sampling factor s: FMS score and Relative Fitness for NIPS real world dataset.

V Related Work

In this section, we provide review of the work related to our algorithm. At large, incremental tensor methods in the literature can be categorized into three main categories: 1) Tucker decomposition, 2) CP decomposition, 3) Tensor completion

Tucker Decomposition: Online tensor decomposition was first proposed by Sun el at.[25]

as ITA (Incremental Tensor Analysis). In there research, they described the three variants of Incremental Tensor Analysis. First, DTA i.e. Dynamic tensor analysis which is based on calculation of co-variance of matrices in traditional higher-order singular value decomposition in an incremental fashion. Second, with help of SPIRIT algorithm, they found approximation of DTA named as Stream Tensor Analysis (STA). Third, they proposed window-based tensor analysis (WTA). To improve the efficiency of DTA, it uses a sliding window strategy.

Liu el at.[18] proposed an efficient method to diagonalize the core tensor to overcome this problem. Other approaches replace SVD with incremental SVD to improve the efficiency. Hadi el at. [10] proposed multi-aspect-streaming tensor analysis (MASTA) method that relaxes constraint and allows the tensor to concurrently evolve through all modes.

CP Decomposition: There is very limited study on online CP decomposition methods. Phan el at. [21] had developed a theoretic approach GridTF to large-scale tensors processing based on an extension to CP’s fundamental mathematics theory. They used divide and concur technique to get sub-tensors and fuses the output of all factorization to achieve final factor matrices which is proved to be same as decomposing the whole tensor using CP decomposition. Its potential of concurrent computing methods to adapt the engineering applications remains unclear. Sidiropoulos el at.[17], proposed two algorithms that focus on CP decomposition namely SDT (Simultaneous Diagonalization Tracking) that incrementally perform the SVD of the unfolded tensor; and RLST (Recursive Least Squares Tracking), which recursively updates the decomposition factors by minimizing the mean squared error. The most related work to ours was proposed by Zhou, el at. [28] is an online CP decomposition method, where the the latent factors are updated when there are new data.

Tensor Completion

: Related to Tucker and CP decomposition are formulations which are focused on Tensor Completion, i.e., the estimation of missing values in a tensor. The main difference between completion and decomposition techniques is that in completion “zero” values are considered “missing” and are not part of the model, and furthermore, the goal is to impute those missing values accurately, rather than extracting latent factors or subspaces that can describe the existing (observed) data. To the best of our knowledge, the earliest work on incremental tensor completion traces back to

[15], and very recently, Qingquan el at.[24], proposed a streaming tensor completion algorithm based on block partitioning of the tensor.

Vi Conclusions

In this paper we introduce SamBaTen, a novel sample-based incremental CP tensor decomposition. We show its effectiveness with respect to approximation quality, with its performance being on par with state-of-the-art incremental and non-incremental algorithms, and we demonstrate its efficiency and scalability by outperforming state-of-the-art approaches (25-30 times faster) and being able to run very large incremental tensors where none of the baselines was able to produce results. In the future we intend to explore different tensor decompositions that can also benefit from our proposed algorithmic framework.


  • [1] Anima Anandkumar, Prateek Jain, Yang Shi, and U. N. Niranjan. Tensor vs. matrix methods: Robust tensor decomposition under block sparse perturbations. In Arthur Gretton and Christian C. Robert, editors,

    Proceedings of the 19th International Conference on Artificial Intelligence and Statistics

    , volume 51 of Proceedings of Machine Learning Research, pages 268–276, Cadiz, Spain, 09–11 May 2016. PMLR.
  • [2] Brett W Bader, Tamara G Kolda, et al. Matlab tensor toolbox version 2.6, available online, february 2015, 2015.
  • [3] R. Bro. Parafac. tutorial and applications. Chemometrics and intelligent laboratory systems, 38(2):149–171, 1997.
  • [4] Rasmus Bro and Henk AL Kiers. A new efficient method for determining the number of components in parafac models. Journal of chemometrics, 17(5):274–286, 2003.
  • [5] Andrew Carlson, Justin Betteridge, Bryan Kisiel, Burr Settles, Estevam R. Hruschka Jr., and Tom M. Mitchell. Toward an architecture for never-ending language learning. In AAAI, volume 5, page 3, 2010.
  • [6] J Douglas Carroll and Jih-Jie Chang. Analysis of individual differences in multidimensional scaling via an n-way generalization of “eckart-young” decomposition. Psychometrika, 35(3):283–319, 1970.
  • [7] Eric C Chi and Tamara G Kolda. On tensors, sparsity, and nonnegative factorizations. SIAM Journal on Matrix Analysis and Applications, 33(4):1272–1299, 2012.
  • [8] Dóra Erdos and Pauli Miettinen. Walk’n’merge: A scalable algorithm for boolean tensor factorization. In Data Mining (ICDM), 2013 IEEE 13th International Conference on, pages 1037–1042. IEEE, 2013.
  • [9] Evangelos E. Papalexakis, Christos Faloutsos, and Nicholas D Sidiropoulos. Parcube: Sparse parallelizable tensor decompositions. In ECML-PKDD’12.
  • [10] Hadi Fanaee-T and João Gama. Multi-aspect-streaming tensor analysis. Knowledge-Based Systems, 89:332–345, 2015.
  • [11] A. Globerson, G. Chechik, F. Pereira, and N. Tishby. Euclidean Embedding of Co-occurrence Data. The Journal of Machine Learning Research, 8:2265–2295, 2007.
  • [12] R.A. Harshman. Foundations of the parafac procedure: Models and conditions for an” explanatory” multimodal factor analysis. 1970.
  • [13] Tamara G Kolda, Brett W Bader, and Joseph P Kenny. Higher-order web link analysis using multilinear algebra. In Data Mining, Fifth IEEE International Conference on, pages 8–pp. IEEE, 2005.
  • [14] T.G. Kolda and B.W. Bader. Tensor decompositions and applications. SIAM review, 51(3), 2009.
  • [15] Morteza Mardani, Gonzalo Mateos, and Georgios B Giannakis. Subspace learning and imputation for streaming big data matrices and tensors. IEEE Transactions on Signal Processing, 63(10):2663–2677, 2015.
  • [16] Julian McAuley and Jure Leskovec. Hidden factors and hidden topics: understanding rating dimensions with review text. In Proceedings of the 7th ACM conference on Recommender systems, pages 165–172. ACM, 2013.
  • [17] D. Nion and N.D. Sidiropoulos. Adaptive algorithms to track the parafac decomposition of a third-order tensor. Signal Processing, IEEE Transactions on, 57(6):2299–2310, 2009.
  • [18] Spiros Papadimitriou, Jimeng Sun, and Christos Faloutsos. Streaming pattern discovery in multiple time-series. In Proceedings of the 31st international conference on Very large data bases, pages 697–708. VLDB Endowment, 2005.
  • [19] Evangelos E Papalexakis and Christos Faloutsos. Fast efficient and scalable core consistency diagnostic for the parafac decomposition for big sparse tensors. In Acoustics, Speech and Signal Processing (ICASSP), 2015 IEEE International Conference on, pages 5441–5445. IEEE, 2015.
  • [20] Evangelos E Papalexakis, Christos Faloutsos, and Nicholas D Sidiropoulos. Tensors for data mining and data fusion: Models, applications, and scalable algorithms. ACM Transactions on Intelligent Systems and Technology (TIST), 8(2):16, 2016.
  • [21] Anh Huy Phan and Andrzej Cichocki. Parafac algorithms for large-scale problems. Neurocomputing, 74(11):1970–1984, 2011.
  • [22] Nikos D Sidiropoulos. Low-rank decomposition of multi-way arrays: A signal processing perspective. In Sensor Array and Multichannel Signal Processing Workshop Proceedings, 2004, pages 52–58. IEEE, 2004.
  • [23] Shaden Smith, Jee W. Choi, Jiajia Li, Richard Vuduc, Jongsoo Park, Xing Liu, and George Karypis. FROSTT: The formidable repository of open sparse tensors and tools, 2017.
  • [24] Qingquan Song, Hancheng Ge Xiao Huang, James Caverlee, and Xia Hu. Multi-aspect streaming tensor completion. In Proceedings of the 23th ACM SIGKDD international conference on Knowledge discovery and data mining. ACM, 2017.
  • [25] Jimeng Sun, Dacheng Tao, Spiros Papadimitriou, Philip S. Yu, and Christos Faloutsos. Incremental tensor analysis: Theory and applications. ACM Trans. Knowl. Discov. Data, 2(3):11:1–11:37, October 2008.
  • [26] Jos MF ten Berge and Nikolaos D Sidiropoulos. On uniqueness in candecomp/parafac. Psychometrika, 67(3):399–409, 2002.
  • [27] Bimal Viswanath, Alan Mislove, Meeyoung Cha, and Krishna P Gummadi. On the evolution of user interaction in facebook. In Proceedings of the 2nd ACM workshop on Online social networks, pages 37–42. ACM, 2009.
  • [28] Shuo Zhou, Nguyen Xuan Vinh, James Bailey, Yunzhe Jia, and Ian Davidson. Accelerating online cp decompositions for higher order tensors. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 1375–1384. ACM, 2016.