I. Introduction
Tensor decomposition is a very powerful tool for many problems in data mining [13], [1], chemometrics [3], and signal processing [22], to name a few areas. The success of tensor decomposition lies in its capability of finding complex patterns in multiway settings, by leveraging higher-order structure and correlations within the data. The dominant tensor decompositions are CP/PARAFAC (henceforth referred to as CP), which extracts interpretable latent factors from the data, and Tucker, which estimates the joint subspaces of the tensor. In this work we focus on the CP decomposition, which has been shown to be extremely effective in exploratory data mining time and time again.
In a wide variety of modern real-world applications, especially in the age of Big Data, data are far from static. To the contrary, the data get updated dynamically. In these circumstances, a data tensor needs to be shrunk, expanded, or modified in any of its modes. For instance, in an online social network, new interactions occur every second and new friendships are formed at a similar pace. In the tensor realm, we may view a large proportion of these dynamic updates as the introduction of new “slices” in the tensor: in the social network example, new interactions that happen as time evolves imply the introduction of new snapshots of the network, which grow the tensor in the “time” mode. How can we handle such updates in the data without having to recompute the decomposition whenever an update arrives, and instead incrementally update the preexisting results given the new data?
Computing the decomposition for a dynamically updated tensor is challenging, with the challenges lying primarily in two of the three V’s in the traditional definition of Big Data: Volume and Velocity. As a tensor dataset is updated dynamically, its volume increases to the point that techniques which are not equipped to handle those updates incrementally inevitably fail to execute, due to the sheer size of the data. Furthermore, even though the applications in which tensors have been successful so far do not require real-time execution per se, the decomposition algorithm must, nevertheless, be able to ingest the updates to the data at a rate that will not result in the computation being “drowned” by the incoming updates.
The majority of prior work has focused on the Tucker decomposition of incrementally updated tensors [18, 25, 10]; however, very limited work has been done on CP. Nion and Sidiropoulos [17] proposed two methods, namely Simultaneous Diagonalization Tracking (SDT) and Recursive Least Squares Tracking (RLST), and most recently, [28] introduced the OnlineCP decomposition for higher-order online tensors. Even though prior work in incremental CP decomposition, by virtue of allowing incremental updates to the already computed model, is able to deal with Velocity when compared to the naive approach of re-executing the entire decomposition on the updated data every time a new update arrives, it falls short when the Volume of the data grows.
In this paper we propose a novel large-scale incremental CP tensor decomposition that effectively leverages (potential) sparsity of the data, and achieves faster and more scalable performance than state-of-the-art baselines, while maintaining comparable accuracy. We show a snapshot of our results in Figure 1: SamBaTen is faster than all state-of-the-art methods on data that the baselines were able to operate on. Furthermore, SamBaTen was able to scale to both dense and sparse dynamically updated tensors where none of the baselines was able to run. Finally, SamBaTen achieves comparable accuracy to existing incremental and non-incremental methods. Our contributions are summarized as follows:

Novel scalable algorithm: We introduce SamBaTen, a novel scalable and parallel incremental tensor decomposition algorithm for efficiently computing the CP decomposition of incremental tensors. The advantage of the proposed algorithm stems from the fact that it only operates on small summaries of the data at all times, thereby being able to maintain its efficiency regardless of the size of the full data. Furthermore, if the tensor and the updates are sparse, SamBaTen leverages the sparsity by summarizing in a way that retains only the useful variation in the data. To the best of our knowledge, this is the first incremental tensor decomposition which effectively leverages sparsity in the data.

Quality control: As a tensor is dynamically updated, some of the incoming updates may contain rank-deficient structure, which, if not handled appropriately, can pollute the results. We equip SamBaTen with a quality control option, which effectively determines whether an update is rank-deficient, and subsequently handles the update in a way that does not affect latent factors that are not present in that update.

Extensive experimental evaluation: Through experimental evaluation on six realworld datasets with sizes that range up to GB, and synthetic tensors that range up to , we show that our method can incrementally maintain very accurate decompositions, faster and in a more scalable fashion than stateoftheart methods.
Reproducibility: We make our Matlab implementation publicly available at www.cs.ucr.edu/~epapalex/src/SamBaTen.zip. Furthermore, all the datasets we use for evaluation are publicly available.
II. Problem Formulation
II-A. Preliminary Definitions
Tensor: A tensor is a higher-order generalization of a matrix. In order to avoid overloading the term “dimension”, we call a three-way tensor a three-“mode” tensor, where “modes” is the number of indices used to index the tensor. The number of modes is also called the “order”. Table I contains the symbols used throughout the paper. We refer the interested reader to several surveys that provide more details and a wide variety of tensor applications [14, 20]. In the interest of space, we also refer the reader to [20] for the definitions of the Kronecker and Khatri-Rao products, which are not essential for following the basic derivation of our approach.
Slice: A slice is an (m−1)-dimensional partition of a tensor, where one index is varied in one mode and the indices in the other modes are fixed. There are three categories of slices: horizontal (X(i,:,:)), lateral (X(:,j,:)), and frontal (X(:,:,k)) for a third-order tensor X, as shown in Figure 2.
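As a quick illustration of the three slice categories, here is a minimal NumPy sketch (the paper's implementation is in Matlab; the tensor and its size below are made-up illustrations):

```python
import numpy as np

# Toy third-order tensor of size 4 x 5 x 6 (illustrative values only).
X = np.arange(4 * 5 * 6, dtype=float).reshape(4, 5, 6)

horizontal = X[1, :, :]  # X(i,:,:) -- fix the mode-1 index; shape (5, 6)
lateral    = X[:, 2, :]  # X(:,j,:) -- fix the mode-2 index; shape (4, 6)
frontal    = X[:, :, 3]  # X(:,:,k) -- fix the mode-3 index; shape (4, 5)
```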
Symbols  Definition
Tensor, Matrix, Column vector, Scalar
Set of Real Numbers
Outer product
Frobenius norm, norm
Spanning the elements of in indices
Spanning all elements of
column of
row of
Kronecker product
Khatri-Rao product (column-wise Kronecker product [20])
Canonical Polyadic Decomposition: One of the most popular and widely used tensor decompositions is the Canonical Polyadic (CP) or CANDECOMP/PARAFAC decomposition [6, 12]; we henceforth refer to it as CP. In CP, the tensor is decomposed into a sum of rank-one tensors, i.e., a sum of outer products of three vectors (for three-mode tensors): X ≈ Σ_{r=1}^{R} a_r ∘ b_r ∘ c_r, where the outer product is given by (a_r ∘ b_r ∘ c_r)(i, j, k) = a_r(i) b_r(j) c_r(k) for all i, j, k. In order to compute the decomposition, we typically minimize the squared differences (i.e., the Frobenius norm) between the original tensor and the model. There exist other modeling approaches in the literature [7] which minimize the KL-Divergence; however, Frobenius-norm-based approaches are still to this day the most well studied. We reserve investigation of other loss functions for future work.
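To make the model concrete, the following NumPy sketch builds a rank-R tensor as a sum of outer products and evaluates the Frobenius-norm residual; the sizes and factor values are arbitrary assumptions (the paper's implementation uses the Matlab Tensor Toolbox instead):

```python
import numpy as np

def cp_reconstruct(A, B, C):
    """Rank-R CP model: X_hat(i,j,k) = sum_r A[i,r] * B[j,r] * C[k,r]."""
    return np.einsum('ir,jr,kr->ijk', A, B, C)

rng = np.random.default_rng(0)
I, J, K, R = 10, 8, 6, 3
A = rng.standard_normal((I, R))
B = rng.standard_normal((J, R))
C = rng.standard_normal((K, R))

X = cp_reconstruct(A, B, C)      # a noiseless rank-R tensor
X_hat = cp_reconstruct(A, B, C)  # model built from the same factors
err = np.linalg.norm(X - X_hat)  # Frobenius norm of the residual
# err is 0 here, since the model exactly matches the tensor
```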
II-B. Problem Definition
In many real-world applications, data grow dynamically, and may do so in many modes. For example, given a dynamic tensor in a location-based recommendation system, as shown in Figure 3(a), structured as locations × hotspots × people, the number of registered locations, hotspots, and people may all increase over time. Another example is time-evolving social network interactions, shown in Figure 3(b), as also described in the Introduction. This incremental property of the data gives rise to the need for on-the-fly updates of the existing decomposition, which we name incremental tensor decomposition. Notice that the literature (and thereby this paper) uses the terms “incremental”, “dynamic”, and “online” interchangeably. In such scenarios, data updates happen very fast, which makes traditional (non-incremental) methods collapse, because they need to recompute the decomposition on the entire large-scale data.
This paper addresses the problem of large-scale incremental tensor decomposition. Without loss of generality, we focus on a 3-mode tensor, one of whose modes grows with time. However, the problem definition (and our proposed method) extends to any number of modes. Let us consider a tensor X of size I × J × K at time t. The rank-R CP decomposition of X is given as:
where the factor matrices A and B are of dimension I × R and J × R, respectively, and C is of dimension K × R. When a new incoming slice is added in mode 3, the required decomposition at time t + 1 is:
where A and B keep their dimensions, and the updated C is of dimension K′ × R, with K′ equal to K plus the number of newly added slices.
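The bookkeeping of factor sizes under third-mode growth can be sketched as follows (the shapes and symbols are illustrative assumptions, not the paper's exact notation):

```python
import numpy as np

rng = np.random.default_rng(0)
I, J, K, R = 100, 80, 50, 5     # tensor size at time t, with CP rank R
t_new = 10                      # number of newly arriving frontal slices

A = rng.standard_normal((I, R))  # mode-1 factor: size unchanged by the update
B = rng.standard_normal((J, R))  # mode-2 factor: size unchanged by the update
C = rng.standard_normal((K, R))  # mode-3 factor: grows with time

# Appending t_new slices in mode 3 adds one row to C per incoming slice.
C = np.vstack([C, rng.standard_normal((t_new, R))])
```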
The problem that we solve is the following:
Problem Definition. Given (a) a preexisting set of decomposition results A, B, and C with R components, which approximate a tensor X of size I × J × K at time t, and (b) a new incoming slice (or batch of slices) arriving at any time, find updates of A, B, and C incrementally, so as to approximate the updated tensor of dimension I × J × K′, where K′ is the third-mode dimension after appending the new slice(s), while maintaining accuracy comparable to running the full CP decomposition on the entire updated tensor.
To simplify notation, we will interchangeably refer to as (when we need to refer to specific indices of that matrix), and similarly for we shall refer to it as .
III. Proposed Method: SamBaTen
As we mention in the introduction, there exists a body of work in the literature that is able to efficiently and incrementally update the CP decomposition in the presence of incoming tensor slices [17, 28]. However, those methods fall short when the size of the dynamically growing tensor increases, and eventually they are not able to scale to very large dynamic tensors. The reason is that these methods operate on the full data; thus, even though they incrementally update the decomposition (avoiding recomputing it from scratch), inevitably, as the size of the full data grows, it takes a toll on their runtime and scalability.
In this paper we propose SamBaTen, which takes a different view of the solution: instead of operating on the full data, it operates on a summary of the data. Suppose that the “complete” tensor (i.e., the one that we will eventually get when we finish receiving updates) is denoted by X. Any given incoming slice (or even a batch of slice updates) can thus be seen as a sample of that tensor, where the sampled indices in the third mode (which we assume is the one receiving the updates) are the indices of the incoming slice(s). Suppose, further, that given a set of sample tensors (which are drawn by randomly selecting indices from all the modes of the tensor) we can approximate the original tensor with high accuracy (which, in fact, the literature has shown to be possible [9, 8]). Therefore, when we receive a new set of slices as an update, if we update those samples with the new indices, then we should be able to very efficiently compute a decomposition which incorporates the slice updates and approximates the updated tensor well. This line of reasoning inspires SamBaTen, a visual summary of which is shown in Figure 4.
III-A. The heart of SamBaTen
The algorithmic framework we propose is shown in Figure 4 and is described below. We assume that we have a preexisting set of decomposition results before the update, as well as a set of summaries of the tensor before the update. Summaries are in the form of sampled sub-tensors, as described below. For simplicity of description, we assume that we are receiving updated slices on the third mode, which in turn add new rows to the factor matrix that corresponds to the third mode. We further assume that the updates come in batches of new slices, which ensures that we see a mature-enough update to the tensor, containing useful structure. Trivially, however, SamBaTen can operate on singleton batches.
In the following lines, is the tensor prior to the update and is the batch of incoming slices. Given an incoming batch, SamBaTen performs the following steps:
Sample: The rationale behind SamBaTen is that each batch can be seen as a sample of third-mode indices of (what is going to be) the full tensor. In this step, we merge these incoming indices with an already existing set of sampled tensors. In order to obtain those preexisting samples, we follow an approach similar to [9]: we sample indices from the tensor based on a measure of importance, normalized into a probability, and use this measure as a sampling weight. An appropriate measure of importance (MoI) is the sum-of-squares of the tensor for each mode. For the first mode, the MoI for index i is defined as

x(i) = Σ_j Σ_k X(i, j, k)^2,   (1)

for i in (1, I). Similarly, we can define the MoI for modes 2 and 3. We sample each mode of X without replacement, using Eq. 1 to bias the sampling probabilities. With s as the sampling factor, if X has size I × J × K, then the sample will be of size I/s × J/s × K/s. The sampling rate for each mode is independent of the others, and in fact, different rates can be used for imbalanced modes.
In the case of sparse tensors, the sample will focus on the dense regions of the tensor, which contain most of the useful structure. In the case of dense tensors, the sample will give priority to the regions of the tensor with the highest variation.
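A minimal NumPy sketch of this biased sampling (the function name and the sampling-without-replacement call are our own illustration, not the paper's code):

```python
import numpy as np

def sample_mode(X, mode, s, rng):
    """Sample ~1/s of one mode's indices without replacement, biased by
    the sum-of-squares measure of importance of Eq. 1."""
    other_axes = tuple(ax for ax in range(X.ndim) if ax != mode)
    moi = (X ** 2).sum(axis=other_axes)  # sum of squares per index of `mode`
    probs = moi / moi.sum()              # normalize into sampling probabilities
    n = int(np.ceil(X.shape[mode] / s))
    return rng.choice(X.shape[mode], size=n, replace=False, p=probs)

rng = np.random.default_rng(0)
X = np.abs(rng.standard_normal((60, 40, 20)))      # toy dense tensor
idx = [sample_mode(X, m, s=2, rng=rng) for m in range(3)]
Xs = X[np.ix_(*idx)]                               # sampled summary tensor
```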
After forming the sample summary for the update, we merge it with the samples obtained from the intersection of the third-mode indices of the incoming batch and the already sampled indices in the remaining modes, so that the final sample covers the third-mode indices of the update as well.
Due to the randomized nature of this summarization, we need to draw multiple samples, in order to obtain a reliable set of summaries. Each such independent sample is denoted as . In the case of dense tensors, obtaining multiple, independent random samples helps summarize as much of the useful variation as possible. In fact, we will see in the experimental evaluation that increasing the number of samples, especially for dense tensors, improves accuracy.
In [9] the authors note that, in order for their method to work, a set of anchor indices must be common between all samples, so that, later on, the correct correspondence of latent factors can be established. However, in SamBaTen we do not have to actively fix a set of indices across sampling repetitions. Each time we sample, those indices correspond to a portion of the decomposition that is already computed; therefore, the entire set of indices can serve as the set of anchors. This is a major advantage compared to [9], since SamBaTen 1) does not need to commit to a set of fixed indices for all samples a priori, which, due to randomization, may happen to represent a badly structured portion of the tensor, 2) does not need to be restricted to a “small” set of fixed common indices (which is required in [9] in order to ensure that sufficiently many new indices are sampled across repetitions), but, to the contrary, is able to use a larger number of anchor indices to establish correspondence more reliably, and 3) does not require any synchronization between different sampling repetitions, which results in higher parallelism potential.
Decompose: Having obtained the sample from the previous step, SamBaTen decomposes the summary using any state-of-the-art algorithm, obtaining the sample's factor matrices. For the purposes of this paper, we use the Alternating Least Squares (ALS) algorithm for the CP decomposition, which is probably the most well studied algorithm for CP. The rank used here is equal to the universal rank; however, in Section III-B we discuss whether this choice is always the most appropriate and what alternatives there are.
Project back: The CP decomposition is unique (under mild conditions) up to permutation and scaling of the components [26]. This means that, even though the existing decomposition may have established an order of the components, the decomposition we obtained in the previous step is very likely to introduce a different ordering and scaling, as a result of the aforementioned permutation and scaling ambiguity. Formally, the sampled portion of the existing factors and the currently computed factors are connected through the following relation:
where is a diagonal scaling matrix (which without loss of generality we absorb on the first factor), and is a permutation matrix that permutes the order of the components (columns of the factors).
In order to tackle the scaling ambiguity, we need to normalize the results in a consistent manner. In particular, we normalize such that each column of the newly computed factors which spans the indices shared with the existing factors has unit norm, and accordingly for the remaining factors. Each iteration retains a copy of the sampled existing factors, which will serve as the anchor for disambiguating the permutation of components. We normalize those to unit norm as well, and the reason behind that lies in the following Lemma:
Lemma 1.
Consider unit-norm vectors a and b. If a and b correspond to the same latent CP factor, in the noiseless case, then a^T b = 1; otherwise a^T b < 1.
Proof.
From the Cauchy–Schwarz inequality, |a^T b| ≤ ||a||_2 ||b||_2. The inequality is maximized when a = b, and for unit-norm a and b, |a^T b| ≤ 1. Therefore, a^T b = 1 exactly when a and b correspond to the same latent CP factor. ∎
Given the above Lemma, we have a guide for identifying the permutation: for every column of the newly computed factor we compute the inner product with every column of the existing (anchor) factor, and declare a match when the inner product is equal (or close) to 1. Given a large-enough number of shared rows (which is usually the case, since we require a large-enough sample of the tensor in order to augment it with the update and compute the factor updates accurately), this matching can be computed more reliably than past approaches that use a related means of permutation disambiguation [9] but rely on a very small number of shared rows, which results in suboptimal results in the presence of noise.
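A sketch of this inner-product matching on unit-normalized columns (a greedy argmax stand-in for the matching described above; the function name and toy data are our own assumptions):

```python
import numpy as np

def match_columns(A_anchor, A_new):
    """Match each column of A_anchor to the column of A_new maximizing
    the inner product of the unit-normalized columns (cf. Lemma 1)."""
    U = A_anchor / np.linalg.norm(A_anchor, axis=0, keepdims=True)
    V = A_new / np.linalg.norm(A_new, axis=0, keepdims=True)
    sim = np.abs(U.T @ V)          # R x R matrix of |inner products|
    return np.argmax(sim, axis=1)  # greedy match per anchor column

rng = np.random.default_rng(1)
A = rng.standard_normal((50, 4))
# Simulate the permutation and scaling ambiguity of a fresh decomposition.
shuffle = np.array([2, 0, 3, 1])
A_new = A[:, shuffle] * rng.uniform(0.5, 2.0, size=4)
perm = match_columns(A, A_new)     # perm[r]: column of A_new matching A[:, r]
```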
Update results:
After appropriately permuting the columns of the computed factors, we have all the information needed to update our model. Returning to the problem definition of Section II, the computed factors contain the updates to the sampled rows of the existing factors (and similarly for the other modes). Even though the first two factor matrices do not increase their number of rows over time, the incoming slices may contribute valuable new estimates to the already estimated factors. Thus, for the already existing portions of the factors we only update the zero entries that fall within the range of the sample. Finally, the third-mode factor of the sample contains the rows for the newly arrived slices, which need to be merged into the existing third-mode factor. Since we have properly permuted the columns, we accumulate the lower portion of the sampled factor (which corresponds to the newly added slices) and take the element-wise average of the rows to be appended, across repetitions. Finally, we update the result.
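The appending step can be sketched as follows, assuming the per-repetition factor rows have already been permutation-aligned (the helper name and toy values are ours):

```python
import numpy as np

def append_new_rows(C_old, rows_per_repetition):
    """Append factor rows for newly arrived slices to C, averaging the
    estimates element-wise across sampling repetitions."""
    new_rows = np.mean(np.stack(rows_per_repetition, axis=0), axis=0)
    return np.vstack([C_old, new_rows])

C_old = np.ones((5, 2))                              # existing third-mode factor
reps = [np.full((3, 2), 0.9), np.full((3, 2), 1.1)]  # two repetitions' estimates
C_new = append_new_rows(C_old, reps)                 # 5 + 3 rows; new rows average to 1.0
```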
III-B. Dealing with rank-deficient updates
So far, the algorithm description has been based on the assumption that each of the sampled tensors we obtain is full-rank, and that the CP decomposition of that tensor is identifiable (an assumption that is also central to previous work [9]). However, this assumption glosses over the fact that, in reality, updates to our tensor may be rank-deficient: even though the full decomposition has a certain number of components, the update may contain fewer. If that happens, then the matching described above is going to fail, and this inevitably leads to very low quality results. Here, we tackle this issue by adding an extra layer of quality control when we compute the CP decomposition, right before line 5 of Algorithm 1: we estimate the number of components in the incoming batch, and instead of the “universal” rank, which may result in low quality factors, we use the estimated rank, and accordingly match only those components to their most likely matches within the existing components. Estimating the number of components is a very hard problem; however, there exist efficient heuristics in the literature, such as the Core Consistency Diagnostic (CorConDia) [4], which gives a quality rating for a computed CP decomposition. By successively trying different candidate ranks for the batch, we estimate its actual rank, as shown in Algorithm 2 (GetRank), and use that instead. For efficiency, we use a recent implementation of CorConDia that is especially tailored for exploiting sparsity [19]. In the experimental evaluation, we demonstrate that using GetRank indeed results in higher-quality latent factors.

IV. Experimental Evaluation
In this section we extensively evaluate the performance of SamBaTen on multiple synthetic and real datasets, and compare its performance with state-of-the-art approaches. We experiment with the different parameters of SamBaTen and the baselines, and examine how they affect performance. We implemented SamBaTen in Matlab using the functionality of the Tensor Toolbox for Matlab [2], which supports very efficient computations for sparse tensors. Our implementation is available at www.cs.ucr.edu/~epapalex/src/SamBaTen.zip. We used an Intel(R) Xeon(R) CPU E5-2680 v3 @ 2.50GHz machine with 48 CPU cores and 378GB RAM.
IV-A. Dataset description
IV-A1. Synthetic data generation
In order to fully explore the performance of SamBaTen, in our experiments we generate synthetic tensors with varying density. We generate random tensors of dimension I × I × I, where I ∈ {100, 500, 1000, 3000, 5000, 10000, 50000, 100000}. Those tensors are created from a known set of randomly generated factors, so that we have full control over the ground truth of the full decomposition. We dynamically choose the batch (slice) size in all experiments so that the data fit into memory; for example, as shown in Table II, we selected a batch size of 150 for some of the smaller tensors and of 100 for the 5000 tensor. The specifications of each synthetic dataset are given in Table II.
Dimension (I=J=K)  Density (dense)  Density (sparse)  Batch size  Sampling factor
100  100%  65%  50  2
500  100%  65%  150  2
1000  100%  55%  150  2
3000  100%  55%  100  5
5000  100%  55%  100  5
10000  100%  55%  10  2
50000  100%  35%  5  2
100000  100%  35%  5  2
IV-A2. Real Data Description
In order to truly evaluate the effectiveness of SamBaTen, we test its performance against six real datasets that have been used in the literature. Those datasets are summarized in Table III and are publicly available at http://frostt.io/tensors [23].
Name  Description  Dimensions  NNZ  Batch size  Sampling factor  Dataset File Size
NIPS [11]  (Paper, Author, Word)  2,482 x 2,862 x 14,036  3,101,609  500  10  57MB
NELL [5]  (Entity, Relation, Entity)  12,092 x 9,184 x 28,818  76,879,419  500  10  1.4GB
Facebook-wall [27]  (Wall owner, Poster, Day)  62,891 x 62,891 x 1,070  78,067,090  100  5  2.1GB
Facebook-links [27]  (User, Links, Day)  62,891 x 62,891 x 650  263,544,295  50  2  3.8GB
Patents  (Term, Term, Year)  239,172 x 239,172 x 46  3,596,640,708  10  2  73GB
Amazon [16]  (User, Product, Word)  4,821,207 x 1,774,269 x 1,805,187  1,741,809,018  50000  20  43GB
IV-B. Evaluation Measures
In order to obtain an accurate picture of the performance, we evaluate SamBaTen and the baselines using three criteria: Relative Error, wall-clock time, and Fitness. These measures provide a quantitative way to compare the performance of our method. More specifically, Relative Error measures effectiveness and is defined as

Relative Error = ||X − X̃||_F / ||X||_F,

where X̃ is the approximation produced by the decomposition; the lower the value, the better the approximation.
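In code, this measure is straightforward (a NumPy sketch with a made-up tensor):

```python
import numpy as np

def relative_error(X, X_hat):
    """Relative Error = ||X - X_hat||_F / ||X||_F; lower is better."""
    return np.linalg.norm(X - X_hat) / np.linalg.norm(X)

X = np.array([[[3.0, 0.0], [0.0, 4.0]]])
assert relative_error(X, X) == 0.0
# Shrinking every entry by 10% leaves a residual of exactly 0.1 * ||X||_F.
assert np.isclose(relative_error(X, 0.9 * X), 0.1)
```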
CPU time (sec) indicates how much faster the decomposition runs compared to re-running the entire decomposition algorithm whenever we receive a new update to the existing tensor. The average running time for processing all slices of a given tensor, measured in seconds, is used to validate the time efficiency of an algorithm.
Relative Fitness: tracking the decomposition instead of recomputing it is inevitably an approximate task, so we would like to minimize the discrepancy of the incremental algorithm’s result from the one obtained by recomputing the decomposition for every update. Relative Fitness measures this discrepancy between the incremental approximation and the one recomputed from scratch; the lower the value, the better the approximation for SamBaTen.
IV-C. Baselines for Comparison
Here we briefly present the state-of-the-art baselines we used for comparison. Note that for each baseline we use the reported parameters that yielded the best performance in the respective publications. For fairness, we compare against the parameter configuration for SamBaTen that yielded the best performance in terms of low wall-clock time, low relative error, and fitness. Moving one step further, we evaluate GetRank and demonstrate that it qualitatively improves the performance of SamBaTen; we also show that, with GetRank, SamBaTen can still achieve a better average running time when the datasets are very large. Note that all comparisons were carried out over 10 iterations each, and each number reported is an average with a standard deviation attached to it.
CP_ALS [2] is considered the most standard and well-optimized implementation of the Alternating Least Squares algorithm for CP. We use the implementation of the Tensor Toolbox for Matlab [2]. Here, we simply recompute CP using CP_ALS every time a new batch update arrives.
SDT [17]: Simultaneous Diagonalization Tracking (SDT) is based on incrementally tracking the Singular Value Decomposition (SVD) of the unfolded tensor; the resulting left and right singular vectors are then used to estimate the factor matrices.
RLST [17]: Recursive Least Squares Tracking (RLST) is another online approach, in which recursive updates are computed to minimize the Mean Squared Error (MSE) on the incoming slice, and the factors are then estimated via matrix inversion.
OnlineCP [28]: The most recent and most related work to ours was proposed by Zhou et al. [28]: an online CP decomposition method, where the latent factors are updated when new data arrive. OnlineCP fixes the previously estimated factors in order to solve for the new ones, minimizing the corresponding least-squares cost.
We conduct our experiments on multiple synthetic datasets and six real-world tensor datasets. We set the tolerance rate for convergence between consecutive iterations and the maximum number of iterations (1000) identically for all the algorithms. The batch size and sampling factor are selected based on the dimension of the first mode, i.e., I, as provided in Tables II and III for the synthetic and real datasets, respectively. We use the publicly available implementations of the baselines, as provided by the authors. We only modified the interface of the baselines, so that it is consistent across all methods with respect to the way they receive the incoming slices. No other functionality was changed.
IV-D. Experimental Results
In this section, we experimentally evaluate the performance of the proposed method, SamBaTen. The following four major aspects are analyzed.
Q1: How effective is SamBaTen compared to the baselines on different synthetic and real-world datasets?
Q2: How fast is SamBaTen when compared to the stateoftheart methods on very large sized datasets?
Q3: What is the costbenefit tradeoff of computing the actual rank of the incoming batch?
Q4: What is the influence of sampling factor and number of sampling repetitions on SamBaTen?
IV-D1. Baselines for Comparison
For all datasets we compute Relative Error, CPU time (sec), and Fitness. For SamBaTen, CP_ALS, OnlineCP, RLST, and SDT we use 10% of the data in each dataset as the preexisting dataset. We experimented with both dense and sparse tensors to check the performance of our method. The results for the dense and sparse synthetic data are shown in Tables IV and V. For each dataset, the best result is shown in bold. OnlineCP, SDT, and RLST address the issue very well: compared with CP_ALS, SDT and RLST reduce the mean running time by up to 2x, and OnlineCP reduces the mean running time by up to 3x, for small datasets (I up to 3000). The performance of RLST was better than that of SDT on 8 out of 8 third-order synthetic tensor datasets; in fact, the efficiency (in terms of CPU time) of SDT is quite close to that of RLST. However, the main issue with SDT and RLST is their relative error and fitness: for some datasets they perform well, while for others they exhibit poor fitness and relative error, achieving only about half the fitness of the other methods. For small datasets, OnlineCP's efficiency and accuracy are better than those of all other methods. As the dimension grows, however, the performance of the OnlineCP method degrades, particularly for datasets of larger dimension. The same behavior is observed for sparse tensors. SamBaTen is comparable to the baselines for small datasets and outperforms the baselines for large datasets. CP_ALS was only able to execute on datasets up to a limited size. These results answer Q1: SamBaTen has accuracy comparable to the baseline methods.
I=J=K  CP_ALS  OnlineCP  SDT  RLST  SamBaTen
100  0.109 ± 0.01  0.107 ± 0.02  0.173 ± 0.02  0.151 ± 0.02  0.115 ± 0.02
500  0.102 ± 0.09  0.102 ± 0.09  0.217 ± 0.06  0.217 ± 0.06  0.102 ± 0.09
1000  0.103 ± 0.01  0.103 ± 0.01  0.287 ± 0.01  0.296 ± 0.01  0.102 ± 0.01
3000  0.119 ± 0.01  0.108 ± 0.01  0.189 ± 0.01  0.206 ± 0.01  0.109 ± 0.01
5000  N/A  0.122 ± 0.002  0.201 ± 0.002  0.196 ± 0.04  0.115 ± 0.009
10000  N/A  0.173 ± 0.04  0.225 ± 0.04  0.252 ± 0.06  0.162 ± 0.01
50000  N/A  0.215 ± 0.03  0.229 ± 0.03  0.26 ± 0.01  0.169 ± 0.01
100000  N/A  N/A  N/A  N/A  0.275 ± 0.00
I=J=K  CP_ALS  OnlineCP  SDT  RLST  SamBaTen
100  0.169 ± 0.01  0.154 ± 0.02  0.306 ± 0.01  0.313 ± 0.01  0.178 ± 0.01
500  0.175 ± 0.01  0.188 ± 0.01  0.43 ± 0.01  0.421 ± 0.01  0.184 ± 0.01
1000  0.179 ± 0.01  0.185 ± 0.01  0.613 ± 0.01  0.813 ± 0.01  0.178 ± 0.01
3000  0.177 ± 0.02  0.171 ± 0.04  0.446 ± 0.03  0.513 ± 0.02  0.176 ± 0.03
5000  N/A  0.192 ± 0.02  0.494 ± 0.09  0.535 ± 0.13  0.187 ± 0.04
10000  N/A  0.173 ± 0.01  0.212 ± 0.01  0.224 ± 0.11  0.198 ± 0.12
50000  N/A  0.222 ± 0.00  0.262 ± 0.00  0.259 ± 0.00  0.200 ± 0.00
100000  N/A  N/A  N/A  N/A  0.283 ± 0.00
SamBaTen is able to efficiently compute the decomposition of the largest tensors with a batch size of 5 and sampling factor 2: it took 58095.72s and 47232.2s to compute the online decomposition for the dense and sparse tensors, respectively. On the other hand, the state-of-the-art methods are unable to handle such large-scale incoming data.
The most interesting comparison, however, is on the real datasets, since they present more challenging cases than the synthetic ones. Table VI shows the comparison between the methods. SamBaTen outperforms the other state-of-the-art approaches on most of the real datasets. In the case of the NIPS dataset, SamBaTen gives better results compared to the baselines, specifically in terms of CPU time (up to 20 times faster) and fitness (better by up to 15–20%). On NELL, Facebook-wall, and Facebook-links, SamBaTen's efficiency is comparable to that of CP_ALS. For the NIPS dataset, SamBaTen is 25–30 times faster than the OnlineCP method. Due to the high dimensionality of the datasets, RLST and SDT are unable to execute on the larger ones. Note that all the real datasets we use are highly sparse; however, no baseline except CP_ALS actually takes advantage of that sparsity, and therefore repeated CP_ALS tends to be faster, since the other baselines have to deal with dense computations, which tend to be slower when the data contain many zeros. Most importantly, however, SamBaTen performed very well on the Amazon and Patent datasets, arguably the hardest of the six real datasets we examined and analyzed in the literature, where none of the baselines was able to run. These results answer Q1 and Q2, and show that SamBaTen is able to handle large dimensions and sparsity.
Dataset        | CPU Time (sec)                                       | Fitness of SamBaTen w.r.t.
               | CP_ALS  | OnlineCP | SDT      | RLST     | SamBaTen  | CP_ALS | OnlineCP | SDT  | RLST
NIPS           | 177.46  | 372.03   | 1608.23  | 1596.07  | 43.98     | 0.96   | 0.98     | 0.78 | 0.82
NELL           | 8783.27 | 42325.22 | 65325.22 | 63485.98 | 983.83    | 0.95   | 0.81     | 0.76 | 0.81
Facebook-wall  | 3041.98 | N/A      | N/A      | N/A      | 736.07    | 0.97   | N/A      | N/A  | N/A
Facebook-links | 2689.69 | N/A      | N/A      | N/A      | 343.32    | 0.96   | N/A      | N/A  | N/A
Amazon         | N/A     | N/A      | N/A      | N/A      | 4892.07   | N/A    | N/A      | N/A  | N/A
Patent         | N/A     | N/A      | N/A      | N/A      | 8068.27   | N/A    | N/A      | N/A  | N/A
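For reference, the (relative) fitness reported throughout this section is the standard CP fitness metric, i.e., one minus the relative reconstruction error. A minimal NumPy sketch (the function name is ours, for illustration):

```python
import numpy as np

def fitness(X, X_hat):
    """Standard CP fitness: 1 - ||X - X_hat||_F / ||X||_F."""
    return 1.0 - np.linalg.norm(X - X_hat) / np.linalg.norm(X)

# Toy example: an approximation with 10% relative error has fitness 0.9.
X = np.ones((4, 4, 4))
X_hat = 0.9 * X
print(round(fitness(X, X_hat), 2))  # 0.9
```

A fitness of 1 means a perfect reconstruction; the "Fitness w.r.t." columns in Table VI are ratios of SamBaTen's fitness to each baseline's.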
IV-D.2 Evaluation of Quality Control
I=J=K       | 200  | 400  | 600  | 800  | 1000
w/ GetRank  | 0.48 | 0.57 | 0.58 | 0.59 | 0.55
w/o GetRank | 0.46 | 0.53 | 0.55 | 0.54 | 0.52
Dataset | Sampling Factor | 2    | 5    | 10   | 15   | 20
NIPS    | w/ GetRank      | 0.26 | 0.53 | 0.45 | 0.48 | 0.36
NIPS    | w/o GetRank     | 0.24 | 0.46 | 0.36 | 0.24 | 0.22
NELL    | w/ GetRank      | 0.48 | 0.37 | 0.48 | 0.43 | 0.26
NELL    | w/o GetRank     | 0.25 | 0.16 | 0.38 | 0.37 | 0.24
Here we evaluate the performance improvement brought by GetRank, as well as the additional computational overhead associated with it. We perform experiments, shown in Figure 7, on different datasets to examine the cost in CPU time (sec) that we pay to compute the rank of each incoming slice. To measure the accuracy of GetRank, we compute the Fitness Improvement and the Factor Match Score (FMS). We define the FMS as follows. If $\mathcal{X} = \lambda\,\mathbf{a} \circ \mathbf{b} \circ \mathbf{c}$ and $\hat{\mathcal{X}} = \hat{\lambda}\,\hat{\mathbf{a}} \circ \hat{\mathbf{b}} \circ \hat{\mathbf{c}}$ are single-component tensors whose factor vectors have been normalized to unit norm, so that their weights are $\lambda$ and $\hat{\lambda}$, then the score is the product of a weight penalty and the per-mode factor similarities. The FMS ranges from 0 to 1, with 1 being a "perfect" match. More precisely, the FMS is defined as:

$$\text{FMS} = \left(1 - \frac{|\lambda - \hat{\lambda}|}{\max(\lambda, \hat{\lambda})}\right) \left|\mathbf{a}^\top \hat{\mathbf{a}}\right| \left|\mathbf{b}^\top \hat{\mathbf{b}}\right| \left|\mathbf{c}^\top \hat{\mathbf{c}}\right| \quad (2)$$
We compare the performance of SamBaTen with and without GetRank on synthetic data (where we know the actual components) and on real data, where we run CP_ALS on the full tensor and treat its output as the ground-truth components. We observe that the results are consistently better in terms of FMS (Tables VII and VIII) and Fitness Improvement (Figure 7 for the synthetic data and Figure 8 for the real NIPS dataset) when we incorporate GetRank into SamBaTen, without sacrificing runtime significantly. This answers Q3.
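The FMS between a pair of matched rank-1 components can be computed directly from its definition. A minimal sketch (function and argument names are ours, for illustration):

```python
import numpy as np

def fms_single(lam, factors, lam_hat, factors_hat):
    """Factor Match Score between two rank-1 components whose factor
    vectors are unit-norm and whose weights are lam and lam_hat:
    penalty times the product of per-mode inner products."""
    penalty = 1.0 - abs(lam - lam_hat) / max(lam, lam_hat)
    score = penalty
    for a, a_hat in zip(factors, factors_hat):
        score *= abs(a @ a_hat)  # cosine similarity of unit-norm vectors
    return score

a = np.array([1.0, 0.0])
print(fms_single(2.0, [a, a, a], 2.0, [a, a, a]))  # identical components -> 1.0
```

For a full rank-R decomposition, the per-component scores are averaged after matching components (e.g., greedily by similarity), as done by standard tensor toolboxes.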
IV-D.3 Tuning of the Sampling Factor s
The sampling factor s plays an important role in SamBaTen. We performed experiments to evaluate the impact of changing the sampling factor on SamBaTen. For these experiments, we fixed the batch size to 50 for all datasets. We see in Figure 9 that increasing the sampling factor reduces CPU time (since the sparsity of the sub-sampled tensor increases), while reducing the fitness of the output by up to 2-3%. In sum, these observations demonstrate that: 1) a suitable sampling factor for the sub-sampled tensor can improve the fitness and yield a better tensor decomposition, and 2) the higher the sampling factor, the lower the CPU time. This result partially answers Q4.
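To make the role of s concrete: sampling with factor s keeps roughly 1/s of the indices in each mode, so the sub-tensor shrinks by a factor of s per mode. A simplified uniform sampler is sketched below (SamBaTen's actual sampling strategy may weight indices differently; the helper name is ours):

```python
import numpy as np

def subsample_tensor(X, s, rng):
    """Keep roughly 1/s of the indices in each mode and return the
    resulting sub-tensor plus the sampled indices (uniform sampling
    shown here for simplicity)."""
    idx = [rng.choice(d, size=max(1, d // s), replace=False) for d in X.shape]
    return X[np.ix_(*idx)], idx

rng = np.random.default_rng(0)
X = rng.random((100, 100, 100))
X_sub, idx = subsample_tensor(X, 5, rng)
print(X_sub.shape)  # (20, 20, 20)
```

This is why larger s cuts CPU time sharply (the sub-tensor to decompose is s^3 times smaller for a third-order tensor) at a modest cost in fitness.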
IV-D.4 Influence of the Repetition Factor r
We evaluate the performance as a function of the repetition factor r, i.e., the number of parallel decompositions. For these experiments, we choose the batch size and sampling rate for the synthetic datasets and the NIPS real-world dataset as provided in Tables II and III, respectively. As shown in Figure 10, with higher values of the repetition factor r, the FMS and the Relative Fitness (SamBaTen vs. CP_ALS) improve. We also vary the repetition factor r together with the sampling factor s on the NIPS real-world dataset to check the performance of our method, as shown in Figure 11. Note that a higher FMS indicates a better decomposition and, similarly, a lower relative fitness score indicates a better decomposition. This result completes the answer to Q4.
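One way to picture the role of r: each repetition decomposes a differently sampled sub-tensor, and the partial factor estimates are then combined on the rows they cover, so larger r means each row of a factor matrix is estimated from more samples. The ParCube-style merge below is a sketch under our own naming, not SamBaTen's exact update rule; it averages rows sampled by more than one repetition:

```python
import numpy as np

def merge_repetitions(n_rows, rank, estimates):
    """Combine r sampled factor estimates into one n_rows-by-rank factor
    matrix by averaging rows that appear in multiple repetitions.
    `estimates` is a list of (row_indices, factor_block) pairs, where
    factor_block has one row per sampled index."""
    acc = np.zeros((n_rows, rank))
    counts = np.zeros(n_rows)
    for idx, block in estimates:
        acc[idx] += block
        counts[idx] += 1
    seen = counts > 0
    acc[seen] /= counts[seen, None]  # rows never sampled stay zero
    return acc

estimates = [(np.array([0, 1]), np.ones((2, 2))),
             (np.array([1, 2]), 3 * np.ones((2, 2)))]
merged = merge_repetitions(4, 2, estimates)
print(merged[1])  # row 1 was sampled twice -> averaged to [2. 2.]
```

Increasing r raises coverage (fewer all-zero rows) and reduces the variance of each averaged row, which is consistent with the FMS gains observed in Figure 10.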
V Related Work
In this section, we review the work related to our algorithm. At large, incremental tensor methods in the literature fall into three main categories: 1) Tucker decomposition, 2) CP decomposition, and 3) tensor completion.
Tucker Decomposition: Online tensor decomposition was first proposed by Sun et al. [25] as ITA (Incremental Tensor Analysis). In their work, they describe three variants of Incremental Tensor Analysis. The first, DTA (Dynamic Tensor Analysis), incrementally updates the covariance matrices used in the traditional higher-order singular value decomposition. Second, with the help of the SPIRIT algorithm, they derive an approximation of DTA called Streaming Tensor Analysis (STA). Third, they propose Window-based Tensor Analysis (WTA), which improves on the efficiency of DTA by using a sliding-window strategy.
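The DTA idea of incrementally maintaining per-mode covariance matrices can be sketched as follows (our own simplified rendering, with illustrative names, not the authors' code):

```python
import numpy as np

def dta_mode_update(C_old, X_new_unf, forget=1.0):
    """Sketch of a DTA-style mode update: fold the new slice's mode-n
    unfolding into the running covariance, then take the eigenvectors
    (sorted by decreasing eigenvalue) as the mode's projection matrix.
    A forgetting factor < 1 down-weights older data."""
    C = forget * C_old + X_new_unf.T @ X_new_unf
    eigvals, eigvecs = np.linalg.eigh(C)   # eigh returns ascending eigenvalues
    U = eigvecs[:, ::-1]                   # reorder to descending
    return C, U

rng = np.random.default_rng(1)
C = np.zeros((8, 8))
for _ in range(3):                         # three incoming slices
    C, U = dta_mode_update(C, rng.random((5, 8)))
print(U.shape)  # (8, 8); truncate columns to the desired Tucker rank
```

The appeal of this update is that each new slice touches only a small covariance matrix per mode, rather than requiring a full recomputation over the grown tensor.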
Liu et al. [18] proposed an efficient method that diagonalizes the core tensor to overcome this problem. Other approaches replace the SVD with an incremental SVD to improve efficiency. Fanaee-T et al. [10] proposed the Multi-Aspect-Streaming Tensor Analysis (MASTA) method, which relaxes the single-evolving-mode constraint and allows the tensor to evolve concurrently through all of its modes.
CP Decomposition: There are very few studies on online CP decomposition methods. Phan et al. [21] developed a theoretical approach, GridTF, for large-scale tensor processing, based on an extension of CP's fundamental mathematical theory. They use a divide-and-conquer technique to obtain sub-tensors and fuse the outputs of all the factorizations into the final factor matrices, which are provably the same as those obtained by decomposing the whole tensor with CP. Its potential for concurrent computation in engineering applications remains unclear. Nion and Sidiropoulos [17] proposed two algorithms that track the CP decomposition: SDT (Simultaneous Diagonalization Tracking), which incrementally performs the SVD of the unfolded tensor, and RLST (Recursive Least Squares Tracking), which recursively updates the decomposition factors by minimizing the mean squared error. The work most closely related to ours, by Zhou et al. [28], is an online CP decomposition method in which the latent factors are updated as new data arrive.
Tensor Completion: Related to the Tucker and CP decompositions are formulations focused on tensor completion, i.e., the estimation of missing values in a tensor. The main difference between completion and decomposition techniques is that in completion, "zero" values are considered "missing" and are not part of the model; furthermore, the goal is to impute those missing values accurately, rather than to extract latent factors or subspaces that describe the existing (observed) data. To the best of our knowledge, the earliest work on incremental tensor completion traces back to [15], and very recently, Song et al. [24] proposed a streaming tensor completion algorithm based on block partitioning of the tensor.
VI Conclusions
In this paper we introduce SamBaTen, a novel sample-based incremental CP tensor decomposition. We show its effectiveness with respect to approximation quality, with its performance being on par with state-of-the-art incremental and non-incremental algorithms, and we demonstrate its efficiency and scalability by outperforming state-of-the-art approaches (25-30 times faster) and by running on very large incremental tensors for which none of the baselines was able to produce results. In the future, we intend to explore other tensor decompositions that can also benefit from our proposed algorithmic framework.
References

 [1] Anima Anandkumar, Prateek Jain, Yang Shi, and U. N. Niranjan. Tensor vs. matrix methods: Robust tensor decomposition under block sparse perturbations. In Arthur Gretton and Christian C. Robert, editors, Proceedings of the 19th International Conference on Artificial Intelligence and Statistics, volume 51 of Proceedings of Machine Learning Research, pages 268–276, Cadiz, Spain, May 2016. PMLR.
 [2] Brett W Bader, Tamara G Kolda, et al. Matlab Tensor Toolbox version 2.6, available online, February 2015.
 [3] R. Bro. PARAFAC: tutorial and applications. Chemometrics and Intelligent Laboratory Systems, 38(2):149–171, 1997.
 [4] Rasmus Bro and Henk AL Kiers. A new efficient method for determining the number of components in PARAFAC models. Journal of Chemometrics, 17(5):274–286, 2003.
 [5] Andrew Carlson, Justin Betteridge, Bryan Kisiel, Burr Settles, Estevam R. Hruschka Jr., and Tom M. Mitchell. Toward an architecture for never-ending language learning. In AAAI, volume 5, page 3, 2010.
 [6] J Douglas Carroll and Jih-Jie Chang. Analysis of individual differences in multidimensional scaling via an n-way generalization of "Eckart-Young" decomposition. Psychometrika, 35(3):283–319, 1970.
 [7] Eric C Chi and Tamara G Kolda. On tensors, sparsity, and nonnegative factorizations. SIAM Journal on Matrix Analysis and Applications, 33(4):1272–1299, 2012.
 [8] Dóra Erdos and Pauli Miettinen. Walk’n’merge: A scalable algorithm for boolean tensor factorization. In Data Mining (ICDM), 2013 IEEE 13th International Conference on, pages 1037–1042. IEEE, 2013.
 [9] Evangelos E. Papalexakis, Christos Faloutsos, and Nicholas D Sidiropoulos. ParCube: Sparse parallelizable tensor decompositions. In ECML-PKDD '12.
 [10] Hadi Fanaee-T and João Gama. Multi-aspect-streaming tensor analysis. Knowledge-Based Systems, 89:332–345, 2015.
 [11] A. Globerson, G. Chechik, F. Pereira, and N. Tishby. Euclidean embedding of co-occurrence data. The Journal of Machine Learning Research, 8:2265–2295, 2007.
 [12] R.A. Harshman. Foundations of the PARAFAC procedure: Models and conditions for an "explanatory" multi-modal factor analysis. 1970.
 [13] Tamara G Kolda, Brett W Bader, and Joseph P Kenny. Higher-order web link analysis using multilinear algebra. In Data Mining, Fifth IEEE International Conference on, pages 8–pp. IEEE, 2005.
 [14] T.G. Kolda and B.W. Bader. Tensor decompositions and applications. SIAM review, 51(3), 2009.
 [15] Morteza Mardani, Gonzalo Mateos, and Georgios B Giannakis. Subspace learning and imputation for streaming big data matrices and tensors. IEEE Transactions on Signal Processing, 63(10):2663–2677, 2015.
 [16] Julian McAuley and Jure Leskovec. Hidden factors and hidden topics: understanding rating dimensions with review text. In Proceedings of the 7th ACM conference on Recommender systems, pages 165–172. ACM, 2013.
 [17] D. Nion and N.D. Sidiropoulos. Adaptive algorithms to track the PARAFAC decomposition of a third-order tensor. Signal Processing, IEEE Transactions on, 57(6):2299–2310, 2009.
 [18] Spiros Papadimitriou, Jimeng Sun, and Christos Faloutsos. Streaming pattern discovery in multiple time-series. In Proceedings of the 31st International Conference on Very Large Data Bases, pages 697–708. VLDB Endowment, 2005.
 [19] Evangelos E Papalexakis and Christos Faloutsos. Fast efficient and scalable core consistency diagnostic for the PARAFAC decomposition for big sparse tensors. In Acoustics, Speech and Signal Processing (ICASSP), 2015 IEEE International Conference on, pages 5441–5445. IEEE, 2015.
 [20] Evangelos E Papalexakis, Christos Faloutsos, and Nicholas D Sidiropoulos. Tensors for data mining and data fusion: Models, applications, and scalable algorithms. ACM Transactions on Intelligent Systems and Technology (TIST), 8(2):16, 2016.
 [21] Anh Huy Phan and Andrzej Cichocki. PARAFAC algorithms for large-scale problems. Neurocomputing, 74(11):1970–1984, 2011.
 [22] Nikos D Sidiropoulos. Low-rank decomposition of multi-way arrays: A signal processing perspective. In Sensor Array and Multichannel Signal Processing Workshop Proceedings, 2004, pages 52–58. IEEE, 2004.
 [23] Shaden Smith, Jee W. Choi, Jiajia Li, Richard Vuduc, Jongsoo Park, Xing Liu, and George Karypis. FROSTT: The formidable repository of open sparse tensors and tools, 2017.
 [24] Qingquan Song, Xiao Huang, Hancheng Ge, James Caverlee, and Xia Hu. Multi-aspect streaming tensor completion. In Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 2017.
 [25] Jimeng Sun, Dacheng Tao, Spiros Papadimitriou, Philip S. Yu, and Christos Faloutsos. Incremental tensor analysis: Theory and applications. ACM Trans. Knowl. Discov. Data, 2(3):11:1–11:37, October 2008.
 [26] Jos MF ten Berge and Nikolaos D Sidiropoulos. On uniqueness in candecomp/parafac. Psychometrika, 67(3):399–409, 2002.
 [27] Bimal Viswanath, Alan Mislove, Meeyoung Cha, and Krishna P Gummadi. On the evolution of user interaction in facebook. In Proceedings of the 2nd ACM workshop on Online social networks, pages 37–42. ACM, 2009.
 [28] Shuo Zhou, Nguyen Xuan Vinh, James Bailey, Yunzhe Jia, and Ian Davidson. Accelerating online CP decompositions for higher order tensors. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 1375–1384. ACM, 2016.