Concept Drift in Streaming Tensor Decomposition
Tensor decompositions are used in various data mining applications, from social networks to medical applications, and are extremely useful in discovering latent structures or concepts in the data. Many real-world applications are dynamic in nature and so are their data. To deal with this dynamic nature of data, there exist a variety of online tensor decomposition algorithms. A central assumption in all those algorithms is that the number of latent concepts remains fixed throughout the entire stream. However, this need not be the case. Every incoming batch in the stream may have a different number of latent concepts, and the difference in latent concepts from one tensor batch to another can provide insights into how our findings in a particular application behave and deviate over time. In this paper, we define "concept" and "concept drift" in the context of streaming tensor decomposition, as the manifestation of the variability of latent concepts throughout the stream. Furthermore, we introduce SeekAndDestroy, an algorithm that detects concept drift in streaming tensor decomposition and is able to produce results robust to that drift. To the best of our knowledge, this is the first work that investigates concept drift in streaming tensor decomposition. We extensively evaluate SeekAndDestroy on synthetic datasets, which exhibit a wide variety of realistic drift. Our experiments demonstrate the effectiveness of SeekAndDestroy, both in the detection of concept drift and in the alleviation of its effects, producing results with similar quality to decomposing the entire tensor in one shot. Additionally, in real datasets, SeekAndDestroy outperforms other streaming baselines, while discovering novel useful components.
Data comes in many shapes and sizes. Many real-world applications deal with data that is multi-aspect (or multi-dimensional) in nature. An example of multi-aspect data is the set of interactions between different users in a social network over a period of time, such as who messages whom, who liked whose posts, or who shared (re-tweeted) whose post. This can be modeled as a three-mode tensor, with user-user being two modes of the tensor and time being the third mode, where each data point can be considered as an interaction between two users.
Tensor decomposition has been used in many data mining applications and is an extremely useful tool for finding latent structures in tensors in an unsupervised fashion. There exists a wide variety of tensor decomposition models and algorithms; interested readers can refer to [9, 13] for details. In this paper, our main focus is on the CP/PARAFAC decomposition (henceforth referred to as CP for brevity), which decomposes a tensor into a sum of rank-one tensors, each one being a latent factor (or concept) in the data. CP has been widely used in many applications, due to its ability to uniquely uncover latent components in a variety of unsupervised multi-aspect data mining applications.
In today’s world data is not static; it keeps evolving over time. In real-world applications like the stock market and e-commerce websites, hundreds (if not thousands) of transactions take place every second, and in applications like social media, thousands of new interactions take place every second, forming new communities of users who interact with each other. In this example, we consider each community of people within the graph as a concept.
There has been a considerable amount of work in dealing with online or streaming CP decomposition [16, 6, 11], where the goal is to absorb the updates to the tensor in the already computed decomposition, as they arrive, and avoid recomputing the decomposition every time new data arrives. However, despite the already existing work in the literature, a central issue has been left, to the best of our knowledge, entirely unexplored. All of the existing online/streaming tensor decomposition literature assumes that the concepts in the data (whose number is equal to the rank of the decomposition) remain fixed throughout the lifetime of the application. What happens if the number of components changes? What if a new component is introduced, or an existing component splits into two or more new components? This is an instance of concept drift in unsupervised tensor analysis, and this paper is a look at this problem from first principles.
Our contributions in this paper are the following:
Characterizing concept drift in streaming tensors: We define concept and concept drift in time evolving tensors and provide a quantitative method to measure the concept drift.
Algorithm for detecting and alleviating concept drift in streaming tensor decomposition: We provide an algorithm which detects drift in the streaming data and also updates the previous decomposition without any assumption on the rank of the tensor.
Experimental evaluation on real & synthetic data: We extensively evaluate our method on both synthetic and real datasets; it outperforms state-of-the-art methods in cases where the rank is not known a priori and performs on par in other cases.
Reproducibility: Our implementation is made publicly available at https://github.com/ravdeep003/conceptDrift for reproducibility of experiments.
A tensor $\mathcal{X}$ is a collection of stacked matrices ($\mathbf{X}_1, \mathbf{X}_2, \dots, \mathbf{X}_K$) with dimension $\mathbb{R}^{I \times J \times K}$, where $I$ and $J$ represent the rows and columns of each matrix and $K$ represents the number of views. In other words, a tensor is a higher-order abstraction of a matrix. For simplicity, we refer to a “dimension” of the tensor as a “mode”, where the “modes” are the number of views used to index the tensor. The rank($\mathcal{X}$) is the minimum number of rank-1 tensors, computed from its latent components, that are required to reproduce $\mathcal{X}$ as their sum. Table 1 lists the notation used throughout the paper.
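Table 1: Notation used throughout the paper.
|Symbol||Definition|
|$\mathcal{X}$, $\mathbf{X}$, $\mathbf{x}$, $x$||Tensor, Matrix, Column vector, Scalar|
|$\mathbb{R}$||Set of Real Numbers|
|$\lVert \mathbf{A} \rVert_F$, $\lVert \mathbf{a} \rVert_2$||Frobenius norm, $\ell_2$ norm|
|$\odot$||Khatri-Rao product (column-wise Kronecker product)|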
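Tensor Batch: A batch is an (N-1)-mode partition of a tensor $\mathcal{X} \in \mathbb{R}^{I \times J \times K}$ where the size is varied only in one mode and the other modes remain unchanged. Here, the tensor $\mathcal{X}_{new}$ is of dimension $I \times J \times K_2$ and the existing tensor $\mathcal{X}_{old}$ is of dimension $I \times J \times K_1$. The full tensor is $\mathcal{X} = [\mathcal{X}_{old}; \mathcal{X}_{new}]$, where its temporal mode is $K = K_1 + K_2$. The tensor can be partitioned into horizontal (I,:,:), lateral (:,J,:), and frontal (:,:,K) modes.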
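CP decomposition: The most popular and extensively used tensor decomposition is the Canonical Polyadic or CANDECOMP/PARAFAC decomposition, referred to as CP decomposition henceforth. A 3-mode tensor $\mathcal{X}$ of dimension $I \times J \times K$ and rank at most $R$ can be written as

$$\mathcal{X} = \sum_{r=1}^{R} \mathbf{a}_r \circ \mathbf{b}_r \circ \mathbf{c}_r \iff \mathcal{X}(i,j,k) = \sum_{r=1}^{R} \mathbf{A}(i,r)\,\mathbf{B}(j,r)\,\mathbf{C}(k,r)$$

where $\circ$ denotes the outer product, $\mathbf{A} = [\mathbf{a}_1, \dots, \mathbf{a}_R] \in \mathbb{R}^{I \times R}$, $\mathbf{B} = [\mathbf{b}_1, \dots, \mathbf{b}_R] \in \mathbb{R}^{J \times R}$, and $\mathbf{C} = [\mathbf{c}_1, \dots, \mathbf{c}_R] \in \mathbb{R}^{K \times R}$. For tensor approximation, we adopt the least-squares criterion

$$\min_{\mathbf{A}, \mathbf{B}, \mathbf{C}} \Big\lVert \mathcal{X} - \sum_{r=1}^{R} \mathbf{a}_r \circ \mathbf{b}_r \circ \mathbf{c}_r \Big\rVert_F^2$$

where $\lVert \mathcal{X} \rVert_F^2$ is the sum of squares of all elements of $\mathcal{X}$ and $\lVert \cdot \rVert_F$ is the Frobenius norm. The CP model is nonconvex in $\mathbf{A}$, $\mathbf{B}$, and $\mathbf{C}$. We refer interested readers to popular surveys [9, 13] on tensor decompositions and their applications for more details.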
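To make the CP model concrete, the following minimal sketch builds a tensor from three factor matrices and evaluates the least-squares criterion. It is an illustration only: the paper's implementation is in Matlab, and the function names `cp_reconstruct` and `cp_loss` are hypothetical helpers introduced here, assuming NumPy is available.

```python
import numpy as np

def cp_reconstruct(A, B, C):
    """Sum of R rank-one tensors a_r o b_r o c_r, returned as an I x J x K array."""
    return np.einsum("ir,jr,kr->ijk", A, B, C)

def cp_loss(X, A, B, C):
    """Squared Frobenius norm of the residual between X and its CP model."""
    return float(np.sum((X - cp_reconstruct(A, B, C)) ** 2))

# Hypothetical example sizes; any I, J, K, R would do.
rng = np.random.default_rng(0)
I, J, K, R = 4, 5, 6, 3
A = rng.random((I, R))
B = rng.random((J, R))
C = rng.random((K, R))

X = cp_reconstruct(A, B, C)   # a tensor of rank at most R by construction
loss = cp_loss(X, A, B, C)    # residual of the exact factors
```

Each entry of `X` equals the sum over `r` of `A[i, r] * B[j, r] * C[k, r]`, which is the element-wise form of the sum of rank-one tensors.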
Let us consider a social media network like Facebook, where a large number of users update information every single minute, and Twitter, where a large number of users tweet every minute (https://mashable.com/2012/06/22/data-created-every-minute/). Here, we have interactions arriving continuously at high velocity, where each interaction consists of User Id, Tag Ids, Device, Location information, etc. How can we capture such dynamic user interactions? How can we identify concepts that signify a potential newly emerging community, a complete disappearance of interactions, or a merging of one or more communities into a single one? When using tensors to represent such dynamically evolving data, our problem falls under “streaming” or “online” tensor analysis. Decomposing streaming or online tensors is a challenging task, and concept drift in incoming data makes the problem significantly more difficult, especially in applications where we care about characterizing the concepts in the data, in addition to merely approximating the streaming tensor adequately.
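Before we conceptualize the problem that our paper deals with, we define certain terms which are necessary to set up the problem. Let $\mathcal{X}_{old}$ and $\mathcal{X}_{new}$ be two incremental batches of a streaming tensor, of rank $R$ and $F$ respectively. Let $\mathcal{X}_{old}$ be the initial tensor at time $t_0$ and $\mathcal{X}_{new}$ be the batch of the streaming tensor which arrives at time $t_1$, such that $t_1 > t_0$. The CP decomposition for these two tensors is given as follows:

$$\mathcal{X}_{old} \approx \sum_{r=1}^{R} \mathbf{A}(:,r) \circ \mathbf{B}(:,r) \circ \mathbf{C}(:,r)$$

$$\mathcal{X}_{new} \approx \sum_{r=1}^{F} \mathbf{A}(:,r) \circ \mathbf{B}(:,r) \circ \mathbf{C}(:,r)$$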
Concept: In the case of tensors, we define a concept as one latent component; a sum of $R$ such components makes up the tensor. In the above equations, tensors $\mathcal{X}_{old}$ and $\mathcal{X}_{new}$ have $R$ and $F$ concepts respectively.
Concept Overlap: We define concept overlap as the set of latent concepts that are common or shared between two streaming CP decompositions. Consider Figure 1, where $R$ and $F$ are both equal to three, which means both tensors $\mathcal{X}_{old}$ and $\mathcal{X}_{new}$ have three concepts. Each concept of $\mathcal{X}_{old}$ corresponds to a concept of $\mathcal{X}_{new}$. This means that there are three concepts that overlap between $\mathcal{X}_{old}$ and $\mathcal{X}_{new}$. The minimum and maximum number of concept overlaps between two tensors are zero and $\min(R, F)$ respectively. Thus, the value of concept overlap lies between 0 and $\min(R, F)$. In Section 3 we propose an algorithm for detecting such overlap.
New Concept: If there exists a set of concepts which are not similar to any of the concepts already present in the most recent tensor batch, we call all concepts in that set new concepts. Consider Figure 2, where $\mathcal{X}_{old}$ has two concepts and $\mathcal{X}_{new}$ has three concepts. We see that at time $t_1$ tensor batch $\mathcal{X}_{new}$ has three concepts, out of which two match concepts of tensor $\mathcal{X}_{old}$ and one (namely concept 3) does not match any concept of $\mathcal{X}_{old}$. In this scenario we say that concepts 1 and 2 are overlapping concepts and concept 3 is a new concept.
Missing Concept: If there exists a set of concepts which was present at time $t_0$, but is missing at a future time $t_1$, we call the concepts in the set missing concepts. For example, consider Figure 2: at time $t_0$, the CP decomposition of $\mathcal{X}_{old}$ has three concepts, and at time $t_1$ the CP decomposition of $\mathcal{X}_{new}$ has two concepts. Two concepts of $\mathcal{X}_{old}$ and $\mathcal{X}_{new}$ match with each other and one concept, present at $t_0$, is missing at $t_1$; we label that concept as a missing concept.
Running Rank: Running Rank (runningRank) at time $t$ is defined as the total number of unique concepts (or latent components) seen until time $t$. Running Rank is different from the tensor rank of a tensor batch; it may or may not be equal to the rank of the current tensor batch. Consider Figure 1: runningRank at time $t_1$ is three, since the total number of unique concepts seen until $t_1$ is three. Similarly, runningRank of Figure 2 at time $t_1$ is three, even though the rank of $\mathcal{X}_{new}$ is two, since the number of unique concepts seen until $t_1$ is three.
Let us assume the rank of the initial tensor batch at time $t_0$ is $R$ and the rank of the subsequent tensor batch at time $t_1$ is $F$. Then runningRank at time $t_1$ is the sum of the running rank at $t_0$ and the number of new concepts discovered from $t_0$ to $t_1$. At time $t_0$ the running rank is equal to the initial rank of the tensor batch, in this case $R$.
Concept Drift: Concept drift is usually defined in terms of supervised learning [3, 14, 15]. Prior work defines concept drift in unsupervised learning as the change in the probability distribution of a random variable over time. We define concept drift in the context of latent concepts, which is based on the rank of the tensor batch. We first give an intuitive description of concept drift in terms of running rank, and then define concept drift.
Intuition: Let the running rank at time $t_0$ be $\text{runningRank}_{t_0}$ and the running rank at time $t_1$ be $\text{runningRank}_{t_1}$. If $\text{runningRank}_{t_0}$ is not equal to $\text{runningRank}_{t_1}$, then there is concept drift, i.e., either a new concept has appeared, or a concept has disappeared. However, this definition does not capture every single case. Assume that $\text{runningRank}_{t_0}$ is equal to $\text{runningRank}_{t_1}$. In this case, there is no drift only when there is a complete overlap. There may still be concept drift even if $\text{runningRank}_{t_0}$ is equal to $\text{runningRank}_{t_1}$, since a concept might disappear while runningRank remains the same.
Definition: Whenever a new concept appears, a concept disappears, or both, from time $t_0$ to $t_1$, this phenomenon is defined as concept drift.
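The definitions above can be illustrated with a small bookkeeping sketch. It assumes, purely for illustration, that every concept already carries a known integer label; in the actual method, concepts are matched through their factor vectors rather than labels. The function name `drift_step` is hypothetical.

```python
def drift_step(seen, previous, current):
    """Classify the concepts of the current batch against earlier ones.

    seen:     labels of all unique concepts observed so far
    previous: labels of the concepts in the previous batch
    current:  labels of the concepts in the current batch
    """
    seen, previous, current = set(seen), set(previous), set(current)
    overlap = previous & current   # shared between the two batches
    new = current - seen           # never observed before
    missing = previous - current   # present before, absent now
    return {
        "overlap": overlap,
        "new": new,
        "missing": missing,
        "running_rank": len(seen | current),
        "drift": bool(new or missing),
    }

# Both batches have rank three, yet one concept is new and one is missing.
step = drift_step(seen={1, 2, 3}, previous={1, 2, 3}, current={2, 3, 4})

# Equal ranks with complete overlap: no drift.
same = drift_step(seen={1, 2, 3}, previous={1, 2, 3}, current={1, 2, 3})
```

The first call shows why comparing batch ranks alone is not enough: the ranks are equal, but the concepts differ, so drift is reported and the running rank grows to four.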
In a streaming tensor application, a tensor batch arrives at regular intervals of time. Before we decompose a tensor batch to get latent concepts, we need to know the rank of the tensor. Finding the tensor rank is a hard problem and it is beyond the scope of this paper. There has been a considerable amount of work that approximates the rank of a tensor [12, 10]. In this paper we employ AutoTen to compute a low rank of a tensor. As new advances in tensor rank estimation happen, our proposed method will also benefit.
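Given (a) tensor $\mathcal{X}_{old}$ of dimensions $I \times J \times K_1$ and rank $R$, and (b) tensor $\mathcal{X}_{new}$ of dimensions $I \times J \times K_2$ and rank $F$, at times $t_0$ and $t_1$ respectively, as shown in Figure 3. Compute the updated tensor $\mathcal{X}$ of dimension $I \times J \times (K_1 + K_2)$ and of rank equal to runningRank at time $t_1$, using the factor matrices of $\mathcal{X}_{old}$ and $\mathcal{X}_{new}$.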
Consider a social media application where thousands of connections are formed every second, for example, who follows whom or who interacts with whom. These connections can be viewed as forming communities. Over a period of time, communities disappear, new communities appear, or some communities re-appear after some time. The number of communities at any given point in time is dynamic. There is no way of knowing which communities will appear or disappear in the future. When this data stream is captured as a tensor, communities correspond to latent concepts, and the appearing and disappearing of communities over a period of time is referred to as concept drift. Here we need a dynamic way of figuring out the number of communities in a tensor batch rather than assuming a constant number of communities in all tensor batches.
To the best of our knowledge, there is no algorithmic approach that detects concept drift in streaming tensor decomposition. As we mentioned in Section 1, there has been a considerable amount of work [6, 16, 11] that deals with streaming tensor data by applying batch decomposition on incoming slices and combining the results. But these methods do not take a change of rank into consideration, which could reveal new latent concepts in the data sets. Even if we know the rank (number of latent concepts) of the complete tensor, the tensor batches of that tensor might not have the same rank as the complete tensor.
In this paper we propose SeekAndDestroy, a streaming CP decomposition algorithm that does not assume rank is fixed. SeekAndDestroy detects the rank of every incoming batch in order to decompose it, and finally, updates the existing decomposition after detecting and alleviating concept drift, as defined in Section 2.
An integral part of SeekAndDestroy is detecting different concepts and identifying concept drift in a streaming tensor. In order to do this successfully, we need to solve the following problems:
P1: Finding the rank of a tensor batch.
P2: Finding New Concept, Concept Overlap, and Missing Concept between two consecutive tensor batch decompositions.
P3: Updating the factor matrices to incorporate the new and missing concepts along with concept overlaps.
Finding Number of Latent Concepts: Finding the rank of the tensor is beyond the scope of this paper, thus we employ AutoTen. Furthermore, in Section 4, we perform our experiments on synthetic data where we know the rank (and use that information as given to us by an “oracle”) and repeat those experiments using AutoTen, comparing the error between them; the gap in quality signifies room for improvement that SeekAndDestroy will reap, if rank estimation is solved more accurately in the future.
Finding Concept Overlap: Given the rank of a tensor batch, we compute its latent components using CP decomposition. Consider Figure 3 as an example. At time $t_1$, the number of latent concepts we computed is represented by $F$, and we already had $R$ components before the new batch arrived. In this scenario, there could be three possible cases: (1) $R = F$, (2) $R > F$, (3) $R < F$.
For each one of the cases mentioned above, new concepts may appear at $t_1$, concepts may disappear from $t_0$ to $t_1$, or there could be shared concepts between the two decompositions. In Figure 3, we see that, even though $R$ is equal to $F$, we have one new concept, one missing concept, and two shared/overlapping concepts. Now, at time $t_1$, we have four unique concepts, which means our runningRank at $t_1$ is four.
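In order to discover which concepts are shared, new, or missing, we use the Cauchy-Schwarz inequality, which states that for two vectors $\mathbf{a}$ and $\mathbf{b}$ we have $\mathbf{a}^T\mathbf{b} \leq \lVert \mathbf{a} \rVert_2 \lVert \mathbf{b} \rVert_2$. Algorithm 2 provides the general outline of the technique used in finding concepts. It takes column-normalized matrices $\mathbf{A}_{old}$ and $\mathbf{A}_{batch}$ of size $I \times R$ and $I \times F$ respectively as input. We compute the dot product for all pairs of columns between the two matrices, as shown below:

$$\mathbf{A}_{old}(:, i)^T \cdot \mathbf{A}_{batch}(:, j)$$

where $\mathbf{A}_{old}(:, i)$ and $\mathbf{A}_{batch}(:, j)$ are the respective columns. Since the columns have unit norm, each dot product is at most 1. If the computed dot product is higher than the threshold value, the two concepts match, and we consider them as shared/overlapping between $\mathbf{A}_{old}$ and $\mathbf{A}_{batch}$. If the dot product between a column in $\mathbf{A}_{batch}$ and each of the columns in $\mathbf{A}_{old}$ has a value less than the threshold, we consider it a new concept. This solves problem P2. In the experimental evaluation, we demonstrate the behavior of SeekAndDestroy with respect to that threshold.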
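The matching step can be sketched as follows. This is a simplified illustration, not the paper's Algorithm 2: the function names, the best-match rule (each batch column is compared against its most similar old column), and the treatment of a value exactly equal to the threshold as a match are all assumptions made here.

```python
import numpy as np

def normalize_columns(M):
    """Scale every column of M to unit Euclidean norm (zero columns stay zero)."""
    norms = np.linalg.norm(M, axis=0)
    norms[norms == 0] = 1.0
    return M / norms

def find_concepts(A_old, A_batch, threshold=0.6):
    """Return (overlap, new, missing) index lists for two factor matrices.

    overlap: (old_index, batch_index) pairs whose dot product reaches the threshold
    new:     batch columns that match no old column
    missing: old columns that no batch column matched
    """
    # Entry (i, j) is the dot product of old column i and batch column j.
    S = normalize_columns(A_old).T @ normalize_columns(A_batch)
    best_old = S.argmax(axis=0)
    best_val = S.max(axis=0)
    overlap, new = [], []
    for j in range(S.shape[1]):
        if best_val[j] >= threshold:
            overlap.append((int(best_old[j]), j))
        else:
            new.append(j)
    matched = {i for i, _ in overlap}
    missing = [i for i in range(S.shape[0]) if i not in matched]
    return overlap, new, missing

# Toy factors: old concepts e1, e2, e3; the batch contains e2, a noisy e3, and e4.
e = np.eye(4)
A_old = e[:, :3]
A_batch = np.column_stack([e[:, 1], e[:, 2] + 0.1 * e[:, 0], e[:, 3]])
overlap, new, missing = find_concepts(A_old, A_batch, threshold=0.6)
```

In the toy example the batch shares two concepts with the old factors, introduces one new concept, and lacks one old concept.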
SeekAndDestroy: This is our overall proposed algorithm, which detects concept drift between two consecutive tensor batch decompositions, as illustrated in Algorithm 1, and updates the decomposition in a fashion robust to the drift. SeekAndDestroy takes the factor matrices ($\mathbf{A}_{old}$, $\mathbf{B}_{old}$, $\mathbf{C}_{old}$) of the previous tensor batch (say at time $t_0$), the running rank at $t_0$ ($\text{runningRank}_{t_0}$), and the new tensor batch ($\mathcal{X}_{new}$) (say at time $t_1$) as inputs. Subsequently, SeekAndDestroy computes the tensor rank of the batch (batchRank) for $\mathcal{X}_{new}$ using AutoTen.
Using the estimated rank batchRank, SeekAndDestroy computes the CP decomposition of $\mathcal{X}_{new}$, which returns factor matrices $\mathbf{A}$, $\mathbf{B}$, $\mathbf{C}$. We normalize the columns of $\mathbf{A}$, $\mathbf{B}$, $\mathbf{C}$ to unit norm and store the normalized matrices in normMatA, normMatB, and normMatC, as shown by lines 3-4 of Algorithm 1. Both $\mathbf{A}_{old}$ and the normalized matrix normMatA are passed to the concept-matching function described above. This returns the indexes of new concepts and the indexes of overlapping concepts from both matrices. Those indexes inform SeekAndDestroy, while updating the factor matrices, where to append the overlapped concepts. If there are new concepts, we update the $\mathbf{A}$ and $\mathbf{B}$ factor matrices simply by adding new columns from the normalized factor matrices of $\mathcal{X}_{new}$, as shown in lines 9-10 of Algorithm 1. Furthermore, we update the running rank by adding the number of new concepts discovered to the previous running rank. If there are only overlapping concepts and no new concepts, then the $\mathbf{A}$ and $\mathbf{B}$ factor matrices do not change.
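The update of the two non-evolving factor matrices amounts to appending columns. A minimal sketch follows, assuming the indexes of the new concepts are already available from the matching step; the helper name `append_new_concepts` and the example values are hypothetical.

```python
import numpy as np

def append_new_concepts(M_old, M_batch_norm, new):
    """Append the columns of the normalized batch factor that hold new concepts."""
    return np.hstack([M_old, M_batch_norm[:, new]])

# Hypothetical sizes: three known concepts, the batch's third column is new.
A_old = np.eye(4)[:, :3]
norm_mat_A = np.eye(4)[:, 1:4]
new = [2]

running_rank = A_old.shape[1]
A_upd = append_new_concepts(A_old, norm_mat_A, new)
running_rank += len(new)   # running rank grows by the number of new concepts

# With no new concepts the factor matrix is unchanged.
A_same = append_new_concepts(A_old, norm_mat_A, [])
```

The same helper would be applied to the second non-evolving factor matrix with the same index list, so that column `r` of both matrices keeps referring to the same concept.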
Updating Factor Matrix C: In this paper, for simplicity of exposition, we focus on streaming data that grow in only one mode. However, our proposed method readily generalizes to cases where more than one mode grows over time.
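In order to update the “evolving” factor matrix ($\mathbf{C}$ in our case), we use a different technique from the one used to update $\mathbf{A}$ and $\mathbf{B}$. If there is a new concept discovered in normMatC, then

$$\mathbf{C}_{new} = \begin{bmatrix} \mathbf{C}_{old} & \mathbf{0} \\ \mathbf{0} & \text{normMatC}_{new} \end{bmatrix}$$

where $\text{normMatC}_{new}$ denotes the columns of normMatC that hold the new concepts, the upper-right zero block is of size $K_1 \times n$ (with $n$ the number of new concepts), the lower-left zero block is of size $K_2 \times R$, and $\mathbf{C}_{new}$ is of size $(K_1 + K_2) \times \text{runningRank}$.
If there are overlapping concepts, then we update $\mathbf{C}$ accordingly as shown below, stacking each overlapping column of normMatC under the matching column of $\mathbf{C}_{old}$; in this case $\mathbf{C}_{new}$ is again of size $(K_1 + K_2) \times \text{runningRank}$.

$$\mathbf{C}_{new}(:, i) = \begin{bmatrix} \mathbf{C}_{old}(:, i) \\ \text{normMatC}(:, j) \end{bmatrix} \quad \text{for each overlapping pair } (i, j)$$

If there are missing concepts, we append an all-zeros matrix (column vector) to those indexes.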
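The three cases for the evolving factor matrix can be combined into a single update. The sketch below is one possible reading of the text, not the authors' code: it assumes `overlap` holds (old column, batch column) pairs and `new` holds batch column indexes, as produced by a matching step, and the name `update_evolving_factor` is hypothetical.

```python
import numpy as np

def update_evolving_factor(C_old, norm_mat_C, overlap, new):
    """Extend C_old (K1 x R) with the rows of a new batch (K2 x F).

    Overlapping concepts are written under their matching old column, new
    concepts get a fresh column whose first K1 rows are zero, and missing
    concepts keep zeros in the K2 new rows.
    """
    K1, R = C_old.shape
    K2 = norm_mat_C.shape[0]
    C_new = np.zeros((K1 + K2, R + len(new)))
    C_new[:K1, :R] = C_old
    for i_old, j in overlap:
        C_new[K1:, i_old] = norm_mat_C[:, j]
    for offset, j in enumerate(new):
        C_new[K1:, R + offset] = norm_mat_C[:, j]
    return C_new

# Hypothetical example: old concept 0 is missing, 1 and 2 overlap, batch column 2 is new.
C_old = np.ones((2, 3))
norm_mat_C = np.array([[0.2, 0.3, 0.4],
                       [0.5, 0.6, 0.7]])
C_new = update_evolving_factor(C_old, norm_mat_C, overlap=[(1, 0), (2, 1)], new=[2])
```

The zero blocks encode when a concept is inactive: a new concept has no activity in the old time steps, and a missing concept has none in the new ones.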
The Scaling Factor $\lambda$: When we reconstruct the tensor from the updated (normalized) factor matrices, we need a way to re-scale the columns of those factor matrices. In our approach, we compute the element-wise product of the norms of the columns of the factor matrices ($\mathbf{A}$, $\mathbf{B}$, $\mathbf{C}$) of $\mathcal{X}_{new}$, as shown in line 5 of Algorithm 1. We use the same technique as the one used in updating the $\mathbf{C}$ matrix in order to match the values between two consecutive intervals, and we add this value to the previously computed values. If it is a missing concept, we simply add zero to it. While reconstructing the tensor, we take the average of the $\lambda$ vector over the number of batches received and re-scale the components as follows:

$$\hat{\mathcal{X}} = \sum_{r=1}^{\text{runningRank}} \lambda_r \, \mathbf{A}_{upd}(:,r) \circ \mathbf{B}_{upd}(:,r) \circ \mathbf{C}_{upd}(:,r)$$
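A possible reading of the scaling step is sketched below. It assumes that the per-batch scale of a concept is the product of its three column norms, that scales are accumulated per concept using the same index bookkeeping as for the evolving factor, and that the accumulated vector is averaged over the number of batches at reconstruction time. All function names are hypothetical.

```python
import numpy as np

def batch_lambda(A, B, C):
    """Per-concept scale of one batch: product of the three column norms."""
    return (np.linalg.norm(A, axis=0)
            * np.linalg.norm(B, axis=0)
            * np.linalg.norm(C, axis=0))

def accumulate_lambda(lam_sum, lam_batch, overlap, new):
    """Add the batch scales to the running sums; missing concepts add zero."""
    out = np.concatenate([lam_sum, np.zeros(len(new))])
    for i_old, j in overlap:
        out[i_old] += lam_batch[j]
    for offset, j in enumerate(new):
        out[len(lam_sum) + offset] += lam_batch[j]
    return out

def rescaled_reconstruction(A, B, C, lam_sum, n_batches):
    """Weight each rank-one component by its average scale and sum them up."""
    lam = lam_sum / n_batches
    return np.einsum("r,ir,jr,kr->ijk", lam, A, B, C)

# Hypothetical running sums for three concepts and scales of a three-concept batch.
lam_sum = accumulate_lambda(np.array([2.0, 4.0, 6.0]), np.array([1.0, 3.0, 5.0]),
                            overlap=[(1, 0), (2, 1)], new=[2])

# A single-concept toy reconstruction over two batches.
X_hat = rescaled_reconstruction(np.array([[1.0], [0.0]]), np.array([[1.0]]),
                                np.array([[1.0], [1.0]]), np.array([4.0]), n_batches=2)
```

Averaging over the number of batches keeps the accumulated scales comparable across streams of different lengths; whether this matches the exact weighting in Algorithm 1 cannot be confirmed from the text alone.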
We evaluate our algorithm on the following criteria:
Q1: Approximation Quality: We compare SeekAndDestroy’s reconstruction accuracy against state-of-the-art streaming baselines, in data that we generate synthetically so that we observe different instances of concept drift. In cases where SeekAndDestroy outperforms the baselines, we argue that this is due to the detection and alleviation of concept drift.
Q2: Concept Drift Detection Accuracy: We evaluate how effectively SeekAndDestroy is able to detect concept drift in synthetic cases, where we control the drift patterns.
Q3: Sensitivity Analysis: As shown in Section 3, SeekAndDestroy expects the matching threshold as a user input. Furthermore, its performance may depend on the selection of the batch size. Here, we experimentally evaluate SeekAndDestroy’s sensitivity along those axes.
Q4: Effectiveness on Real Data: In addition to measuring SeekAndDestroy’s performance in real data, we also evaluate its ability to identify useful and interpretable latent concepts in real data, which elude other streaming baselines.
We implemented our algorithm in Matlab using the Tensor Toolbox library, and we evaluate it on both synthetic and real data. We use the AutoTen method to find the rank of each incoming batch.
In order to have full control of the drift phenomena, we generate synthetic tensors with a different rank for every tensor batch; we control the batch rank of the tensor through the factor matrix $\mathbf{C}$. Table 2 shows the specification of the datasets created. For instance, dataset SDS2 has an initial tensor batch whose tensor rank is 2 and a last tensor batch whose tensor rank is 10 (full rank). The batches in between the initial and final tensor batch can have any rank between the initial and final rank (in this case 2-10). The reason we assign the full rank to the final batch is to make sure the tensor created is not rank deficient. We make the synthetic tensor generator available as part of our code release.
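One way to control the batch rank through the evolving factor matrix is to zero out the inactive concepts in the rows of each batch. The sketch below illustrates this idea only; it is not the released generator, and the function name, its parameters, and the uniform random factors are assumptions.

```python
import numpy as np

def synthetic_drift_tensor(I, J, batch_size, batch_ranks, seed=0):
    """Build an I x J x K tensor whose batches activate a varying number of concepts.

    batch_ranks[b] concepts are active in batch b; every other concept has
    all-zero rows in the corresponding block of the evolving factor C.
    """
    rng = np.random.default_rng(seed)
    full_rank = max(batch_ranks)
    A = rng.random((I, full_rank))
    B = rng.random((J, full_rank))
    C = np.zeros((batch_size * len(batch_ranks), full_rank))
    for b, rank in enumerate(batch_ranks):
        active = rng.choice(full_rank, size=rank, replace=False)
        start = b * batch_size
        # The offset keeps active entries strictly positive.
        C[start:start + batch_size, active] = rng.random((batch_size, rank)) + 0.1
    X = np.einsum("ir,jr,kr->ijk", A, B, C)
    return X, (A, B, C)

# Initial rank 2, final batch at full rank 5, arbitrary ranks in between.
X, (A, B, C) = synthetic_drift_tensor(I=20, J=20, batch_size=5, batch_ranks=[2, 4, 3, 5])
active_per_batch = [int((C[s:s + 5] != 0).any(axis=0).sum()) for s in range(0, 20, 5)]
```

Because the active concepts are drawn at random per batch, consecutive batches can share, gain, or lose concepts, which produces the kinds of drift defined in Section 2.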
|DataSet||Dimension||Initial Rank||Full Rank||Batch Size||Matching Threshold|
|SDS1||100 x 100 x 100||2||5||10||0.6|
|SDS3||300 x 300 x 300||2||5||50||0.6|
|SDS5||500 x 500 x 500||2||5||100||0.6|
In order to obtain robust estimates of performance, we repeat every experiment until either 1) it has run for 1000 iterations, or 2) the standard deviation has converged to the second significant digit (whichever occurs first). For all reported results, we use the median and the standard deviation.
We evaluate SeekAndDestroy and the baseline methods using relative error. Relative error measures how well the computed tensor approximates the original tensor and is defined as follows (lower is better):

$$\text{RelativeError} = \frac{\lVert \mathcal{X}_{original} - \mathcal{X}_{computed} \rVert_F}{\lVert \mathcal{X}_{original} \rVert_F}$$
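Relative error is the Frobenius norm of the residual divided by the Frobenius norm of the original tensor. A minimal NumPy sketch, with a hypothetical function name:

```python
import numpy as np

def relative_error(X_original, X_computed):
    """Frobenius norm of the residual divided by the Frobenius norm of the original."""
    residual = np.linalg.norm(X_original - X_computed)
    return float(residual / np.linalg.norm(X_original))

X = np.ones((2, 2, 2))
err_exact = relative_error(X, X)                # perfect reconstruction
err_half = relative_error(X, 0.5 * X)           # every entry off by half
err_zero = relative_error(X, np.zeros_like(X))  # all-zero approximation
```

With default arguments, `np.linalg.norm` flattens the array and returns the 2-norm of its entries, which coincides with the Frobenius norm for tensors of any order.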
To evaluate our method, we compare SeekAndDestroy with two state-of-the-art streaming baselines: OnlineCP and SamBaTen. Both baselines assume that the rank remains fixed throughout the entire stream. When we evaluate the approximation accuracy of the baselines, we run two different versions of each method, with different input ranks: 1) Initial Rank, which is the rank of the initial batch, same as the one that SeekAndDestroy uses, and 2) Full Rank, which is the “oracle” rank of the full tensor, if we assume we could compute that in the beginning of the stream. Clearly, Full Rank offers a great advantage to the baselines since it provides information from the future.
The first dimension that we evaluate is the approximation quality. More specifically, we evaluate whether SeekAndDestroy is able to achieve a good approximation of the original tensor (in the form of low error) in cases where concept drift is occurring in the stream. Table 3 contains the general results of SeekAndDestroy’s accuracy, as compared to the baselines. We observe that SeekAndDestroy outperforms the two baselines in the pragmatic scenario where they are given the same starting rank as SeekAndDestroy (Initial Rank). In the non-realistic, “oracle” case, OnlineCP performs better than SamBaTen and SeekAndDestroy; however, this case is a very advantageous lower bound on the error for OnlineCP.
|DataSet||OnlineCP (Initial Rank)||OnlineCP (Full Rank)||SamBaTen (Initial Rank)||SamBaTen (Full Rank)||SeekAndDestroy|
Through extensive experimentation we made the following interesting observation: in the cases where most of the concepts in the stream appear in the beginning of the stream (e.g., in batches 2 and 3), SeekAndDestroy was able to further outperform the baselines. This is due to the fact that, if SeekAndDestroy has already “seen” most of the possible concepts early on in the stream, it is more likely to correctly match concepts in later batches of the stream, since there already exists an almost-complete set of concepts to compare against. Indicatively, in this case SeekAndDestroy achieved a lower error than OnlineCP.
The second dimension along which we evaluate SeekAndDestroy is its ability to successfully detect concept drift. Figure 4 shows the rank discovered by SeekAndDestroy at every point of the stream, plotted against the actual rank. We observe that SeekAndDestroy is able to successfully identify changes in rank, which, as we have already argued, signify concept drift. Furthermore, Table 4(b) shows three example runs that demonstrate the concept drift detection accuracy.
The results we have presented so far for SeekAndDestroy have used a matching threshold of 0.6. The threshold was chosen because it is intuitively larger than a 50% match, which is a reasonable matching threshold. In this experiment, we investigate the sensitivity of SeekAndDestroy to the matching threshold parameter. Table 4(a) shows exemplary approximation errors for thresholds of 0.4, 0.6, and 0.8. We observe that 1) the choice of threshold is fairly robust for values around 50%, and 2) the higher the threshold, the better the approximation, with threshold of 0.8 achieving the best performance.
|0.6||0.253 ± 0.041||0.221 ± 0.042|
|0.8||0.101 ± 0.040||0.033 ± 0.011|
|Running Rank||Actual Rank||Predicted Rank||Approx. Error (Actual Rank)||Approx. Error (Predicted Rank)|
|6||[2,4,3,4,3,3,5,3,3,5]||[2,4,3,4,3,3,5,3,3,6]||0.185||0.194|
|6||[2,4,3,4,3,3,5,3,3,5]||[2,4,3,4,3,3,5,3,3,6]||0.185||0.197|
|7||[2,4,3,4,3,3,5,3,3,5]||[2,4,3,5,3,3,6,3,3,6]||0.185||0.278|
|70.88||40.57||22||0.68 ± 0.002||0.759 ± 0.059||0.941 ± 0.001|
To evaluate the effectiveness of our method on real data, we use the Enron time-evolving communication graph dataset. Our hypothesis is that in such complex real data, there should exist concept drift in streaming tensor decomposition. In order to validate that hypothesis, we compare the approximation error incurred by SeekAndDestroy against the one incurred by the baselines, shown in Table 5. We observe that the approximation error of SeekAndDestroy is lower than that of the two baselines. Since the main difference between SeekAndDestroy and the baselines is that SeekAndDestroy takes concept drift into consideration, and strives to alleviate its effects, this result 1) provides further evidence that there exists concept drift in the Enron data, and 2) demonstrates SeekAndDestroy’s effectiveness on real data.
The final rank for Enron as computed by SeekAndDestroy was 7, indicating the existence of 7 time-evolving communities in the dataset. This number of communities is higher than what previous tensor-based analysis has uncovered [1, 5]. However, analyzing the (static) graph using a highly-cited non-tensor based method, we were able to detect 7 communities; therefore SeekAndDestroy may be discovering subtle communities that have eluded previous tensor analysis. In order to verify that, we delved deeper into the communities and plotted their temporal evolution (taken from matrix $\mathbf{C}$) along with their annotations (obtained by inspecting the top-5 senders and receivers within each community). Indeed, a subset of the communities discovered matches the ones already known in the literature [1, 5]. Additionally, SeekAndDestroy was able to discover community #3, which refers to a group of executives, including the CEO. This community appears to be active up until the point that the CEO transition begins, after which point it dies out. This behavior is indicative of concept drift, and SeekAndDestroy was able to successfully discover and extract it.
Tensor decomposition: Tensor decomposition techniques are widely used for static data. With the explosion of big data, data grows at a rapid speed, and an extensive study of the online tensor decomposition problem is required. Sidiropoulos introduced two well-known PARAFAC-based methods, namely RLST (recursive least squares) and SDT (simultaneous diagonalization tracking), to address online 3-mode tensor decomposition. Zhou et al. proposed OnlineCP for accelerating online factorization, which can track the decomposition of N-mode tensors as new updates arrive. Gujral et al. proposed SamBaTen, a Sampling-based Batch Incremental Tensor Decomposition algorithm, which updates the online computation of CP/PARAFAC and performs all computations in a reduced summary space. However, no prior work addresses concept drift.
Concept Drift: The survey paper provides qualitative definitions for characterizing drift in data stream models. To the best of our knowledge, however, this is the first work to discuss concept drift in tensor decomposition.
In this paper we introduce the notion of “concept drift” in streaming tensors and provide SeekAndDestroy, an algorithm which detects and alleviates concept drift without making any assumption on the rank of the tensor. SeekAndDestroy outperforms other state-of-the-art methods when the rank is unknown and is effective in detecting concept drift. Finally, we apply SeekAndDestroy on a real time-evolving dataset, discovering novel drifting concepts.
Research was supported by the Department of the Navy, Naval Engineering Education Consortium under award no. N00174-17-1-0005, the National Science Foundation EAGER Grant no. 1746031, and by an Adobe Data Science Research Faculty Award. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the funding parties.