The ubiquity of multi-viewed data has increased the importance of advanced mining algorithms that aim at forecasting future events Bai_undated-kl,wu2019trend,deng2020dynamic, such as forecasting taxi-ride counts based on their departure and destination locations, and time information, where the events can be represented as a stream of tuples . Infection spread rates during epidemics can also be represented in a similar form as , and the task is to forecast the infection count. Given such a large data stream produced by user events, how can we effectively forecast future events? How can we extract meaningful patterns to improve forecasting accuracy? Intuitively, the problem we wish to solve is as follows:
Given an event data stream that contains seasonal patterns, forecast the number of future events between each pair of entities in a streaming setting.
As natural periodicity occurs in several types of data, one of the promising directions to understand the vital dynamics of a data stream is to consider seasonal phenomena. However, high-dimensional data structures in sparse time-series pose many challenging tasks to uncover such valuable seasonal patterns chi2012tensors, while modeling seasonality or periodic patterns has been a well-studied topic in literature Takahashi2017-qb. Moreover, in a streaming setting, data dynamics vary over time, and thus the models learned with historical data can be ineffective as we observe new data. Although online learning schemes, such as schaul2013no, jothimurugesan2018variance, Yang2018-bo,nimishakavi2018inductive, Lu2019-mz, provide solutions to overcome this concern, disregarding all past dynamics fails to capture re-occurring patterns in the subsequent processes. On non-stationary data streams, such approaches no longer converge to the optimal parameters but capture average patterns for several phenomena, which causes less accurate forecasting. Adequate methods should retain multiple patterns where each pattern is updated incrementally for prevailing tendencies and leveraged for forecasting.
In this paper, we focus on online matrix factorization for real-time forecasting with seasonality. Specifically, we propose a streaming method, Shifting Seasonal Matrix Factorization (SSMF), which allows the factorization to be aware of multiple smooth time-evolving patterns as well as abrupt innovations in seasonal patterns, which are retained as regimes. In summary, our contributions are:
Accuracy: Outperforms state-of-the-art baseline methods by accurately forecasting -steps ahead of future events.
Scalability: Scales linearly with the number of regimes, and uses constant time and memory for adaptively factorizing data streams.
Effectiveness: Finds meaningful seasonal patterns (regimes) throughout the stream and divides the stream into segments based on incrementally selected patterns. SSMF uses a lossless data compression scheme to determine the number of regimes.
Reproducibility: Our datasets and source code are publicly available at: https://www.github.com/kokikwbt/ssmf
2 Related Work
summarizes the relative advantages of SSMF, compared with existing methods for matrix factorization and time series forecasting. Our proposed method satisfies all the requirements. The following two subsections provide the details of related work on matrix/tensor factorization and time series modeling.
2.1 Matrix Factorization
The major goal of matrix/tensor factorization, well known as SVD, is to obtain lower-dimensional linear representation that summarizes important patterns chi2012tensors, and it has a wide range of applications hoang2017gpop,yelundur2019detection,tivsljaric2020spatiotemporal. Non-negative constraint provides an ability to handle sparse data and find interpretable factors cichocki2009nonnegative. Takahashi2017-qb adapted tensor factorization to extract seasonal patterns in time series as well as the optimal seasonal period in an automatic manner. yu2016temporal formulated the temporal regularized matrix factorization (TRMF), which is designed for time series prediction. Song2017-hs,Najafi2019-uc addressed stream tensor completion but didn’t extract seasonal patterns. Recently, SMF hooi2019smf proposed an online method that incorporates seasonal factors into the non-negative matrix factorization framework, and outperforms CPHW dunlavy2011temporal, which is an offline matrix factorization method. However, SMF misses regime shifts which should be modeled individually.
2.2 Time Series Analysis
Statistical models, such as autoregression (AR) and Kalman filter (KF), learn temporal patterns for forecasting, thus a lot of their extensions have been proposed cai2015facets,dabrowski2018state. RegimeCast Matsubara2016-fx focuses on regime detection to obtain compact factors from data streams effectively but it does not consider updating detected regimes. Also, note that they do not consider mining sparse data in the non-linear dynamical systems, while handling missing values has a significant role in time series analysis yi2016st,abedin2019sparsesense. Hidden Markov models are also powerful tools for change detection in time series mongillo2008online,montanez2015inertial,Pierson2018-nq,kawabata2019automatic, but these models represent only short temporal dependency and thus are ineffective for forecasting tasks. Detecting regime shifts in non-stationary data can be categorized in Concept Drift, and its theory and recent contributions are well-summarized in Lu2019-mz. In more detail, we consider incremental drift and reoccurring concepts simultaneously. wen2019robuststl tried to decompose seasonality and trends in time series on the sparse regularization framework to handle fluctuation and abrupt changes in seasonal patterns, but this approach does not focus on stream mining krempl2014open.
3 Proposed Model
In this section, we propose a model for shifting seasonal matrix factorization. A tensor we consider consists of a timestamped series of matrices , which can be sparse, until the current time point . We incrementally observe a new matrix and evolves (). Our goal is to forecast where by uncovering important factors in the flow of data, whose characteristics can change over time. As we discussed the effectiveness of handling seasonal patterns, we incorporate seasonal factors into a switching model that can adaptively recognize recent patterns in forecasting.
3.1 Modeling Shifting Seasonal Patterns
Figure 1 shows an overview of our model. In matrix factorization, we assume that each attribute, e.g., a start station where we have a taxi ride, has latent communities/activities, which is smaller than and , e.g., the number of stations. Considering such latent seasonal activities, an input data stream is decomposed into the three factors: with shape and with shape for the row and columns of data , respectively, and seasonal patterns with period , i.e., , where is the number of components to decompose. The
-th row vectorof corresponds to the -th season. Letting be a diagonal matrix (zeros on non-diagonal entries) with diagonal elements as , is approximated as follows.
More importantly, we extend by stacking additional to capture multiple discrete seasonal patterns as regimes, which is shown as the red tensor in Figure 1. Letting be the number of regimes, our model employs a seasonality tensor , where each slice captures the -th seasonal pattern individually. By allowing all these components to vary over time, can be represented as follows.
where shows an optimal regime index to represent in the -th season. That is, our approach adaptively selects
, as shown in the red-shaded vector in the figure, with regard to the current season and regime. Finally, the full parameter set we want to estimate is as follows.
Problem 2 (Shifting seasonal matrix factorization: SSMF)
Given: a tensor stream that evolves at each time point, Maintain: community factors: , seasonal factors: , and the number of regime in a streaming fashion.
3.2 Streaming Lossless Data Compression
In the SSMF model, deciding the number of regimes, , is a crucial problem for accurate modeling because it should change as we observe a new pattern to shift. It is also unreliable to obtain any error thresholds to decide the time when employing a new regime before online processing. Instead, we want the model to employ a new regime incrementally so that it can keep concise yet effective to summarize patterns. The minimum description length (MDL) principle grunwald2007minimum realizes this approach, which is defined as follows.
where is the total MDL cost of when given a set of factors, and .
Model description cost, , measures the complexity of a given model, based on the following equations.
where shows the number of nonzero elements in a given matrix/tensor, is the float point cost111We used bits..
Data encoding cost, , measures how well a given model compress original data, for which the Huffman coding scheme adler01towards assigns a number of bits to each element in . Letting
be a reconstructed data, the data encoding cost is represented by the negative log-likelihood under a Gaussian distribution with mean
and varianceover errors, as follows.
While we cannot compute the cost with all historical data in a streaming setting, a straightforward approach obtains the best number of regimes so that the total description cost is minimized. To compute the total cost incrementally and individually when provide with new data, we decompose approximately into:
When a new matrix arrives, we decide whether the current should increase the number . by comparing the total costs for existing regimes with the total costs for estimated regime.
If gives the minimum total cost of , then it belongs to the full for subsequent modeling. In other words, the size of is encouraged to be small unless we find significant changes in seasonal patterns.
4 Proposed Algorithm
We propose an online algorithm for SSMF on large data streams containing seasonal patterns. Algorithm 1 summarizes the overall procedure of our proposed method: it first detects the best regime for the most recent season through a two-step process, namely, RegimeSelection and RegimeExtraction. Then it smoothly updates the entire factors based on the selected regime with non-negative constraints. When deciding the number and assignment of regimes, the algorithm evaluates the cost, Equation (missing), for the most recent season, to ensure the smoothness of consecutive regime-switching during a season. Thus, the algorithm keeps a sub-tensor , which consists of the matrices from time point to , as a queue while receiving a new . Given a , the best regime at time point is obtained as follows.
RegimeSelection, which aims to choose the best regime that minimizes Equation (missing) in , using the two previous factors and . We compute the total cost, by using and for .
RegimeExtraction, which obtains a new regime , focuses on current seasonality, by applying the gradient descent method to and over the whole . The initialization with prevents from overfitting to because temporal data, , can be sparse. It then computes with and for .
If is less than , the best regime index becomes one of . Otherwise, , is extended to , and then is incremented.
4.1 Regime-Aware Gradient Descent Method
Given a new observation , we consider obtaining the current factors, via the previous ones by minimizing error with respect to . This allows the model to fit current dynamics smoothly while considering the past patterns. Thus, our algorithm performs the gradient descent method with a small learning rate . The gradient step keeps the error with respect to low. Letting be the current regime index, and be the predicted matrix of with the previous parameters:
the gradient update to and is given by:
The updated and are projected such that
to keep the parameters non-negative. After all, the algorithm normalizes and by dividing by their norms and , respectively:
and then multiplied by the two norms keeps the normalization constraint:
which captures the seasonal adjustments for the -th component.
To forecast -steps ahead values for a given , we use the latest factors, , , and , where is the most recent season observed before time , which corresponds to the season at time . Namely, . By choosing the best regime , which gives the minimum description cost in the last one season, the predicted matrix for time is thus computed as follows.
4.3 Theoretical Analysis
We show the time and memory complexity of SSMF. The key advantage is that both of the complexities are constant with regard to a given tensor length, even if it evolves without bound.
The time complexity of SSMF is per time point.
The algorithm first computes the model description cost between and using regimes in , which is repeated over . To obtain a new regime, it performs the gradient descent update to , which takes . Then, for a selected regimes, it performs the gradient descent update, which requires . The projection to non-negative values and normalization require for loops, thus in total. Overall, the time complexity of SSMF is per time point.
The memory complexity of SSMF is per time point.
SSMF retains the two consecutive factors, and with shape and , respectively, which requires . For seasonal factors in regimes, it maintains a tensor with shape . It also needs a space to retain a sparse tensor, , which is the number of observed elements. Therefore, the total memory space required for SSMF is per time point.
In this section, we show how our proposed method works on real tensor streams. The experiments were designed to answer the following questions.
Accuracy: How accurately does it predict future events?
Scalability: How does it scale with input tensor length?
Effectiveness: How well does it extract meaningful latent dynamics/components?
We implemented our algorithm in Python (ver. 3.7.4) and all the experiments were conducted on an Intel Xeon W- GHz quad-core CPU with GB of memory and running Linux. Table 2 shows the datasets used to evaluate our method:
NYC-YT.222https://www1.nyc.gov/site/tlc/about/tlc-trip-record-data.page NYC Yellow Taxi trip records during the term from 2020-01-01 to 2020-06-30 in the pairs of pick-up and drop-off locations.
NYC-CB.333https://s3.amazonaws.com/tripdata/index.html NYC CitiBike ride trips from 2017-03-01 to 2021-03-01. We only considered stations that had at-least k rentals during the term, and extracted stations. We then categorized users by age from to and obtained the stream of ride counts at the stations.
DISEASE.444https://www.tycho.pitt.edu/data/ A stream of the weekly number of infections by kinds of diseases at the states/regions in the US, which were collected from 1950-01-01 to 2014-07-27.
We first describe a comparison in terms of forecasting using Root Mean Square Error (RMSE). Since hooi2019smf outperforms several baselines including CPHW and TSVDCWT, we omit the baselines from our experiments. Our baseline methods include (1) matrix/tensor factorization approaches: NCP xu2013block: a non-negative CP decomposition, optimized by a block coordinate descent method for sparse data. (2) SMF hooi2019smf: an online algorithm for seasonal matrix factorization, which manages a single seasonal factor. (3) TRMF yu2016temporal: a method to take into account temporal and seasonal patterns in matrix factorization.
For each method, the initialization process was conducted on the first three seasons of an input tensor. For TRMF, we searched for the best three regularization coefficients, , , , in . Auto-coefficients of TRMF are set for at time points as well as at three time points to consider seasonality. For SSMF and SMF, we determined a learning rate in by cross-validation in each training data. The number of components was set to among all these methods for a fair comparison. After the initialization, each algorithm takes the first time steps, and forecasts the next steps, as defined in Table 2. This procedure was repeated for every steps, for example, in NYC-YT dataset.
Figure 3 shows the results of RMSE on all three datasets, where SSMF consistently outperforms its baselines. We observe that our method shows a better improvement on (b) NYC-CB than (a) NYC-YT, whereas the two datasets have similar characteristics because detecting regime shifts is more effective in forecasting in a longer data duration. DISEASE dataset contains repetitive regime shifts caused by pandemics and developments of vaccines, therefore, SSMF performs forecasting effectively.
|(a) NYC-YT||(b) NYC-CB||(c) DISEASE|
We discuss the performance of SSMF in terms of computational time on the three datasets. Figure 3 shows the wall clock time vs. data stream length, where the experimental setting is same as of subsection 5.1. The figures show that our proposed method is scalable for mining and forecasting large sparse streams. Although the scalability of SSMF is linear along the number of regimes, the growth rate of its running time is greatly less than that of the other offline baselines. This is because the proposed method can summarize distinct seasonal patterns as regimes effectively. SMF is the fastest algorithm but it does not consider regime shifts in online processing. As described in the accuracy comparison section, our method is more effective when data streams have significant pattern shifts, which cannot be captured only with smooth updates on a single, fixed component.
5.3 Real-world Effectiveness
In this section, we show an example of the real-world effectiveness of our approach using the NYC-YT dataset. Figure 4 shows the overview of the two representative regimes detected by SSMF, where we selected two of the fifteen components to describe. The bottom bar represents the timeline of regime detection. SSMF kept using the initial regime during the blue term, while it generated new regimes during the red term. The three rows of plots are the visualization of , , , where the first two columns correspond to the blue regimes, and the next two to the red.
In mid March 2020, New York City’s governor issued stay-at-home orders, and SSMF detected the event as the regime shifts to adapt to the changed demands of taxi rides. Separating regimes and learning individual patterns help us to interpret the results as follows: Component 1 of the blue regime shows a seasonal pattern that follows a weekday/weekend pattern. According to the above two maps, more taxi rides are concentrated around Manhattan before the pandemic. In the red regime, however, the strength of the seasonal component became about % weaker than the one before. Accordingly, the frequent pickup and dropoff became smaller. This area has a large number of bars and restaurants, suggesting that this component corresponds roughly to entertainment-related trips, whose demands are strongly affected by the pandemic. Component 2, in contrast, is concentrated between Manhattan and Kennedy Airport, suggesting that this component captures trips to and fro the airport. Under the pandemic situation, this component shifted toward trips from the airport to destinations scattered around New York City. This observation could suggest that customers arrived at the airport, and then used a taxi to avoid public transit. After this phase, our method switched to the first regime again, returning to the previous seasonal pattern. The demand for taxi rides began to recover, which roughly aligns with reopening in New York City, which began in early June. Consequently, regime shifts allow us to distinguish significant changes in seasonal behavior as well as contextual information between attributes. In online learning, the mechanism also supports for quick adaptation and updates, which contributes to improving forecasting accuracy.
In this paper, we proposed Shifting Seasonal Matrix Factorization (SSMF) method, for effectively and efficiently forecasting future activities in data streams. The model employs multiple distinct seasonal factors to summarize temporal characteristics as regimes, in which the number of regimes is adaptively increased based on a lossless data encoding scheme. We empirically demonstrate that on three real-world data streams, our forecasting method SSMF has the following advantages over the baseline methods: (a) By using the proposed switching mechanism in an online manner, SSMF accurately forecasts future activities. (b) Scales linearly by learning matrix factorization in constant time and memory. (c) Helps understand data streams by effectively extracting complex time-varying patterns.