Long-Short Term Spatiotemporal Tensor Prediction for Passenger Flow Profile

04/23/2020 ∙ by Ziyue Li, et al. ∙ Arizona State University Tsinghua University The Hong Kong University of Science and Technology 0

Spatiotemporal data is very common in many applications, such as manufacturing systems and transportation systems. It is typically difficult to be accurately predicted given intrinsic complex spatial and temporal correlations. Most of the existing methods based on various statistical models and regularization terms, fail to preserve innate features in data alongside their complex correlations. In this paper, we focus on a tensor-based prediction and propose several practical techniques to improve prediction. For long-term prediction specifically, we propose the "Tensor Decomposition + 2-Dimensional Auto-Regressive Moving Average (2D-ARMA)" model, and an effective way to update prediction real-time; For short-term prediction, we propose to conduct tensor completion based on tensor clustering to avoid oversimplifying and ensure accuracy. A case study based on the metro passenger flow data is conducted to demonstrate the improved performance.



There are no comments yet.


page 1

page 2

page 3

page 4

This week in AI

Get the week's most popular data science and artificial intelligence research sent straight to your inbox every Saturday.

I Introduction

Passenger flow data of an Urban Rapid Transit (URT) system is characterized as typical spatiotemporal data. Predicting passenger flow of a URT system has significant commercial value, such as automatical warning in advance when the system failure occurs for economic loss reduction. Based on the length of the prediction horizon, the task can be classified as short-term prediction (for several hours) and long-term prediction (for several days). Currently, the complicated spatial and temporal correlation structure hinders accurate prediction and further analysis


The goal of the spatiotemporal analysis is to capture various implicit spatial and temporal dependencies. In our URT case, for spatial dependency, the first aspect is the Law of Geography, where the passenger flow of a station is usually affected by its spatially adjacent neighbors. Fig. 1(a) plots the correlation structure of passenger flow data in stations along a specific URT line on a specific day. We can observe that neighboring stations are strongly correlated. The second aspect is the contextual similarity, where two stations sharing similar functions (business center, residential area or school, etc.) are more likely to have similar passenger flows. For temporal dependence, the future prediction is often correlated with historical observations in two different temporal scales such as weekly correlation and daily correlation as shown in Fig. 1(b, c). Weekly correlation refers to the passenger profile of today is correlated with the same day of previous weeks. Daily correlation refers to the the passenger profile of today is related to the pattern in the yesterday.

Existing spatiotemporal forecasting methods are mainly based on geo-statistical models with regularization techniques. Different locations sharing similar features display common spatial correlations: Zhao et al., used -norm to achieve the relatedness of locations sharing the same keyword pool in social media analysis [zhao2015multi]; Zhang et al., also utilized -norm to encourage all locations to select common features in citywide passenger flow prediction [zhong2017spatiotemporal].

Figure 1: (a) Spatial Correlation. (b) Inflow Profile of 2 weeks for STN1 (Station names are desensitized as station code ‘STN’). (c) Daily and Weekly Dependencies observed in Data.

To capture the temporal pattern, many traditional time-series models have been developed for traffic prediction, such as Holt-Winters forecasting [tikunov2007traffic] and Auto-Regressive Integrated Moving Average (ARIMA) [williams2003modeling], which incorporate the regularization terms. For example, temporal smoothness is often enforced by penalizing the difference between two consecutive timestamps [zhang2017spatiotemporal]. However, the linear parametric form with few lags imposes strong assumptions on the temporal correlation, which may lead to under-fitting. Most importantly, a traditional ARIMA is powerless to present the cross-dependency of the spatial and temporal dimension, which is severely overlooked in most of the above research.

Another way to deal with the spatiotemporal prediction is the application of Neural Networks. Convolutional Neural Network (CNN) is a common way to capture spatial correlation by using a station’s neighbors to predict its future behavior. Recurrent Neural Network (RNN) and its variants (e.g., Long Short Term Memory networks) are sufficiently capable of modeling complex temporal correlation. Built upon CNN and LSTM, H. Yao

et al. [yao2018deep] developed a spatio-temporal network to predict taxi demand of different regions. It adds a further layer to consider: semantic similarity of locations, meaning that locations sharing a similar functionality may have similar demand patterns. Recently, X. Geng et al. [geng2019spatiotemporal]

incorporated Multi-Graph Convolution Network to interpret not only the Euclidean correlation among spatially adjacent regions but also involved Non-Euclidean adjacency, including function similarity and Connectivity, following by Gated Recurrent Neural Network (GRNN) to capture temporal dependency. However, it not only requires costly computational resources to train these deep learning models but also demands a very large sample size for training.

In this research, we aim to develop a computationally efficient and robust way to deal with the spatiotemporal prediction problem. To achieve this, we propose to use the tensor representation of the spatio-temporal data, which is proven as an efficient way to represent spatiotemporal data due to its sufficient capacity to capture inter-dependencies along multiple dimensions. In particular, there are two conventional ways to conduct spatiotemporal prediction by tensor, and we categorize them as 2-step tensor prediction and 1-step tensor prediction in the following.

2-step Tensor Decomposition and Time Series Modeling for Spatio-temporal prediction

These methods combine tensor decomposition combined with traditional time-series prediction models [dunlavy2011temporal, guo2017novel, ren2017efficient], and it is suitable for long-term prediction.

Tensor Decomposition can be considered as a high-dimensional version of matrix singular value decomposition

[kolda2009tensor]. Two specific forms of tensor decomposition are usually adopted in tensor analysis: CANDECOMP/PARAFAC (CP) that decomposes a tensor as a sum of rank-one tensors, and Tucker Decomposition that decomposes a tensor into a core tensor multiplied by each mode matrix.

Moreover, the 2-step tensor prediction initially conducts Tensor Decomposition to obtain the temporal mode matrix along time dimension, and then exploits time-series model on the temporal mode. Holt-Winters forecasting [dunlavy2011temporal]

, Auto-Regressive Integrated Moving Average (ARIMA) and Support Vector Regression (SVR)

[ren2017efficient] have been exploited.

However, existing 2-step tensor prediction methods are not capable of capturing all the weekly and daily patterns mentioned above. Take ARIMA or AR model for example. If the weekly pattern is desired, then time-lag should be set large to include past few weeks at least, which highly complicates the model. To address this, we proposed to first reshape the daily profile into a matrix (in the form of ) and then apply a 2-Dimensional Auto-Regressive Moving Average (2D-ARMA) model to capture all those daily and weekly patterns. We name it as ”2-step 2D-ARMA” tensor prediction. Finally, real-time prediction update when new data arrive is another challenge. To this end, we also proposed a Lean Dynamic Updating method.

1-step Tensor Prediction based on tensor completion

This is based on tensor completion [tan2016short, ran2016tensor, luan2019prediction]

, and it is suitable for short-term prediction (prediction horizon as several hours ahead). Tensor Completion is originally designated for tensor random missing data imputation. H. Tan

et al., used tensor completion first time for traffic volume prediction [tan2016short, chen2019bayesian]

, which treated future data as missing data to be estimated.

For tensor completion, Low-Rank Tensor-Completion (LRTC) method has prevailed. One of the most common technique is adding a nuclear-norm on tensor’s rank [kressner2014low, yokota2016smooth, liu2012tensor]; Q. Shi et al., tried -norm on CP weight vector [shi2017tensor]; Q. Zhao et al., also proposed a CP-based Bayesian Hierarchical Probabilistic model, assuming that all mode matrices were generated from higher-level latent distribution, with sparsity-inducing prior to low-rank [zhao2015bayesian].

However, 1-step tensor prediction based on LRTC is prone to oversimplify the model, which results in loss of prediction accuracy. To solve this problem, we propose to first cluster spatiotemporal data and then conduct LRTC within data from the same cluster. To this end, Tensor clustering method will also be studied.

In this paper, we will focus on the improving both 1-step and 2-step state-of-the-art tensor prediction methods, and provide practical and effective techniques to improve the prediction performance correspondingly. In summary, this paper makes the following contributions:

  • We improve both 2-step and 1-step tensor prediction for URT passenger flow prediction. For example, for 2-step tensor prediction, we propose a 2-step 2D-ARMA tensor prediction model. For 1-step tensor prediction, we improve the LRTC by conducting it together with a tensor cluster algorithm.

  • Furthermore, for both methods, we also propose how to dynamically apply the proposed method online. For example, we propose a Lean Dynamic Updating method for tensor decomposition to update the previous prediction real-time;

Ii Preliminaries

We first review the preliminaries and backgrounds about tensor decomposition methods and tensor completion.

Ii-a Notations and Operations

Throughout this exposition, scalars are denoted in italics, e.g. ; vectors by lowercase italic letters in bold face, e.g. ; and matrices by uppercase boldface letters, e.g.

; High dimensional data, tensor by boldface script capital


Ii-B Tensor Decomposition and Completion

We will introduce the basic knowledge of CP, Tucker Decomposition and Tensor Completion here.

CP Decomposition

A tensor is represented as the weighted summation of a set of rank-one tensors:


where each is a unit vector, and is the outer product. is the mode matrix of Dimension- and is the rank of CP decomposition. is the score vector.

Bayesian Low-rank Tensor Decomposition

Consider is a noisy observation of tensor , i.e.,

, and the noise is assumed to be an i.i.d Gaussian distribution

. is generated by CP model, with weight absorbed inside of mode matrices.

Mode- factor matrix can be denoted by row wise or column wise vectors .

The generative model based on Bayesian probabilistic structure [zhao2015bayesian] is shown in Fig. 2, and is specified as:

  • and are generated by:

  • (given ) is generated by:

    Figure 2: Bayesian Probabilistic Structure
  • (given ) is generated by:


Tucker Decomposition

It is commonly regarded as higher-order PCA, decomposing a tensor into a core tensor multiplied by a matrix along each mode, i.e.,


where is core tensor, , and is the rank for Tucker decomposition.

Tensor Completion

It is usually designed for random missing data imputation, like random pixel missing in image data [zhao2015bayesian]. The basic tensor completion is formulated as following:


where is the incomplete input tensor, is the completed output matrix, is the nuclear norm to achieve low rank, and is sampling set which denotes the indices of the observed elements of a tensor.

Iii Proposed Tensor Prediction Framework

In this section, we aim to propose spatio-temporal prediction framework based on the tensor decomposition and tensor completion methods in Section II. For the short-term prediction, the effective prediction horizon is 2 hours ahead, which is the response time needed for URT company to take corresponding actions when abnormal passenger flow pattern happens. In the case of URT passenger flow, data can be represented as: . is location standing for 120 stations in our dataset, is the time scope we are looking at (here is 1 Jan 2017 to 28 Feb 2017), is the 5-minute interval observations per day with 247 sensing points. Thus our data is .

Our prediction problem can be formulated as:


where is the prediction horizon.

Iii-a 2-step Tensor Long-Term Prediction: Combine Tensor Decomposition and 2D-ARMA

2-step tensor prediction is popularly used in the past few years and the mechanism(shown as Fig. 3(a)) behind is designed as:

  • Formulate data as a tensor form. In our case, it is . Decompose it, and among the decomposed components we can find temporal mode matrix .

  • Use traditional time-series model to predict the incoming time’s temporal model matrix .

  • Then substitute it back to decomposition structure to reconstruct the tensor .

(a) Tensor Decomposition + 2D ARMA Model
(b) 2D ARMA Model on Rank-r Vector of
Figure 3: 2-step 2D-ARMA Tensor Prediction

For time-series model, options include Holt Winters, ARIMA etc. However, as mentioned before, they are unable to capture the temporal pattern as shown in Fig. 1(c), especially the weekly pattern. This is because, to take the same day of past 2 weeks, at least 14 days time-lags should be involved into model, which leads to too much model complexity.

Therefore, we proposed the adoption of 2D-ARMA model for the prediction step (shown in Fig. 3(b)), with following additional steps needed:

  • For the Rank-r vector of , , reshape it into 2D matrix, with each row representing one certain day of the week, each column representing the seven days in one week.

  • Train the 2D-ARMA model using .

  • After prediction, vectorize the matrix back to vector, and combine all the rank vectors to obtain .

The matrix from Rank-r vector of is represented as a 2D random field . The 2D ARMA model is defined for the for the matrix by the following equation:


is a stationary white noise with variance

, and the coefficients of are the parameter of the model. and are the time lags of for and respectively. The parameter estimation is explained in [zielinski2010two]. The 2-step 2D-ARMA Tensor Prediction is summarized in Algorithm 1.

1:, , , .
3:CP Decomposition: obtain by Eq.(1)
4:2D-ARMA Prediction:
5:for  to  do
6:   Reshape : to , with
7:   for  to  do
8:       by Eq.(8)
10:   end for
11:   Reshape : to
12:end for
14:CP Inverse Reconstruction: obtain by Eq.(1)
Algorithm 1 2-step 2D-ARMA Tensor Prediction

Another challenges is to instantly and efficiently update the prediction result when new data come, which little research has yielded solutions. The problem is specified in Fig. 4(a). After we obtain the whole predicted passenger flow for tomorrow, when we reach tomorrow 10:00AM for example, the new data will have arrived instead. Then how to update our original prediction dynamically and yield more accurate prediction for the rest of day , especially for next two hours, is a problem. To address this, we propose a lean dynamic tensor decomposition updating method, to alleviate unnecessary workload. The method is demonstrated in Fig. 4(b).

For the long-term prediction result of day , , we propose the following procedure:

  • Decompose the old long-term prediction result into and , with Factor 0 as , which needs to be updated when new data arrive. An illustration is shown in Fig. 4(b).

  • Assume 30% new data of it have come, firstly splice the 30% new data and the rest 70% long-term prediction together, and update according to Eq.(9) [kolda2009tensor].

  • Finally, use the updated , and the original , to reconstruct the tensor.


where is the mode-0 unfolding along L-dimension, is Khatri-Rao product and is Khatri-Rao product pseudoinverse.

(a) 2-step Tensor Prediction Unable to Update Prediction
(b) Lean Dynamic Tensor Decomposition Updating: CP.fit is CP composition function, CP.transform will be shown in Eq.(9), CP.inverse is CP reconstruction function;
Figure 4: Proposed method to Update Prediction

In practice, we can combine the weight and factor mactrix 0 together and update them when new data arrives.

Iii-B 1-step Tensor Prediction for Short-Term: Tensor Completion

To adopt tensor completion framework to prediction, we treat the historical data as observation set (with observation indicator as 1), and the future horizon to be predicted as missing data (with observation indicator as 0).

One thing to be noted is that Tensor completion is not designed for long-term prediction. In particular, here we define two conceptions (as shown in Fig. 4(a)): open dimension and closed dimension. Open dimension is the dimension that keeps increasing as data arrives, e.g. dimension -day here; closed dimension is the one which has fixed maximum length, e.g. dimension -station, and -time point. Tensor completion can be used for missing data imputation along closed dimension (e.g. and ), but not along open dimension (). So it can only predict short-term.

For tensor completion methods, in particular, we follow the Bayesian Low-Rank Tensor Completion (LRTC) framework proposed by Zhao et al. [zhao2015bayesian].

Denote from Eq.(2, 3, 4). After calculating the

-joint distribution, and the posterior distribution, the missing data can be estimated after getting

, by:


However, this method still demonstrates some drawbacks as noted in [yuan2018high]. In particular, when the tensor data violate the innate low-rank structure, such as in our case that the spatial information of URT tensor data is diverse from stations to stations, the low-rank assumption along the spatial dimension cannot hold tenably. Consequently, it is far from enough to assume the spatial prior as low-rank. This most likely oversimplifies the original data structure.

Figure 5: Tensor Clustering based on Tensor Decomposition

To solve the problem, before the Bayesian LRTC, we proposed to use tensor clustering first [yu2019coupled, sun2019dynamic] to classify the tensor samples into several classes, with highly similar samples within a same cluster. Tensor clustering method is shown in Fig. 5 with the following steps:

  • Conduct Tensor decomposition to obtain location model matrix.

  • Implement Principal Component Analysis to further reduce the dimension.

  • Cluster based on a particular clustering method, such as K-mean and Hierarchical method.

Thus different URT stations can be divided into several clusters, and Bayesian LRTC can be conducted within each cluster since the homogeneity within cluster can guarantee its performance.

Iv Experiments

According to what we have discussed in Session III, our UTR passenger flow tensor data is , representing 120 stations, over the past 59 days, with each day 247 sampling points. We set the first 50 days as known historical data and the last 9 days as data to be predicted.

Figure 6: (a) Inflow Profile Long-Term Prediction for STN3 by 2-step 2D-ARMA prediction; (b) Prediction Improvement for STN3 by involving 30% new data; (c) Inflow Profile Short-Term Prediction for STN17 by 1-step prediction
Randomly Selected Station Code 1D ARIMA Tensor Prediction 2D ARMA Tensor Prediction Relative improvement (%)
51 0.1500 0.1066 28.90
56 0.1043 0.0759 27.21
38 0.0849 0.0638 24.91
87 0.1400 0.1087 22.39
54 0.0939 0.0739 21.33
84 0.1605 0.1271 20.81
11 0.0962 0.0767 20.29
16 0.0871 0.0703 19.30
37 0.0982 0.0811 17.38
65 0.1247 0.1060 15.00

Iv-a Proposed 2-step Tensor Prediction Result

For the tensor , we conduct CP decomposition. The rank is chosen as 50 by cross validation, achieving both satisfactory reconstruction and simplicity. For each rank, a 2D ARMA model is constructed and used to predict the coming 9-day-ahead passenger flow. The results of the first 3 days long-term prediction are shown in Fig. 6(a). In our proposed method, to capture the weekly pattern of past 2 weeks and the daily pattern of past 2 days, we set , which introduces 8 time-lag components into model. For fair comparison, the baseline is chosen as the traditional ”Tensor Decomposition + 1D ARIMA” model, with the same model complexity time-lag equal to 8. As shown in Table I, our proposed model can achieve almost 20% improvements, evaluated by relative residual (RES). The overall improvement benefits from the advantage that more strongly correlated temporal patterns have been considered in our model. Note that the scale of improvement varies. This is because some stations may also have strong temporal correlation with time-lag(3), (4) etc., which yet our model ignores.

[t, t+5] Long-Term Prediction Updated after 30% new data Relative improvement (%)
t =75 0.9221 0.8436 8.520
t =80 0.5833 0.3920 32.79
t =85 0.7136 0.6258 12.29
t =90 1.2381 1.2042 2.735
t =95 1.2543 1.1387 9.215
t =100 1.0423 0.9729 6.659
t =105 0.1153 0.0743 35.57
t =110 0.3452 0.3379 2.113
t =115 0.4022 0.3163 21.36
t =120 0.2292 0.3084 -34.54
t =125 0.6245 0.6865 -9.936

When the first 30% new data (in Fig. 6(a) until time stamp ) have arrived, the prediction for the rest 70% of that day (In Fig. 6(a) highlighted in blue block, from time stamp to ) needs to be updated instantly. By using proposed lean dynamic updating, the rest 70% has been recalculated as shown in Fig. 6(b). It is clear to observe that the updated prediction is significantly improved with smaller distance to the real value, especially around the local peak time. To further check the improvement of the rest 70%, RES is calculated from for every 25 minutes (i.e., 5 time stamps). According to Table II, after involving the first 30% of new data, there is an obvious improvement by around 20% (highlighted in boldface) over the following 3 hours (from to ). After then, the relative improvement based on short-term updating becomes less efficient, with the long-term prediction still being preferred.

Iv-B Proposed Bayesian LRTC Result

For the Bayesian LRTC, some stations from mixed clusters have been randomly picked, and the data since (around 10AM) of last day are to be predicted. The prediction result is shown in Fig. 6(c). The Bayesian LRTC can reduce RES by 29% for station 17 in Fig. 6(c), with some other stations achieving around 10% to 30% less residual as shown in Table III. The good performance for tensor completion in short-term prediction is quite satisfactory compared with the proposed Lean Dynamic Updating. However, according to Table III, it is to be noted that this improvement (highlighted in boldface) is not universal, with station 84, 55, 51 and 87 having worse prediction. This is because these four stations have quite unique and distinct passenger flow pattern and the low rank assumption no longer holds.

Station Code (Mixed Cluster) Updated after 30% new data Bayesian LRTC Relative improvement (%)
17 0.1165 0.0821 29.59
56 0.0885 0.0649 26.66
54 0.0833 0.0704 15.38
38 0.1775 0.1521 14.31
35 0.1020 0.0909 10.89
11 0.1416 0.1326 6.38
84 0.0632 0.0691 -9.38
55 0.1240 0.1366 -10.17
51 0.1090 0.1279 -17.38
87 0.1205 0.1609 -33.60

This can be solved by conducting Bayesian LRTC within a same station cluster. We use the Hierarchical Clustering with Agglomerative Method, where the distance is defined as Group Average Euclidean. The clustering result reflects two types of spatial dependencies observed in our data: Law of Geography and contextual similarity. In other words, if two stations are geographically close or functional similar, they are in the same cluster. For example, Fig. 7 shows the land-use information of three selected stations (Station 73, 80 and 84) in one cluster. Though Station 73 and 84 are far to each other, they share the same land-use pattern (with red dominant, mix use in grey and brown). Though Station 84 and 80 are quite different in land-use (Station 84 is dominated with red, Station 80 is dominated with green), they are geographically close (since the stations are indexed consecutively, two stations with close codes indicates they are physically close.)

Figure 7: Land-use Information of Selected Stations (Different Color Pixel Denotes Different Land-use)

By selecting stations within this cluster, we compared the Bayesian LRTC within one cluster with 2-step Prediction updated with 30% new data, and the result is shown in Table IV .

All the stations’ predictions have been improved (with improvement highlighted in boldface), with the majority improved by 20%. Most importantly, even for the same Station 84 and 87 in two cases(highlighted with underline in Table III and IV), conducting Bayesian LRTC within one cluster can improve prediction by 30% to 40% .

Station Code (Same Cluster) Updated after 30% new data Bayesian LRTC within a cluster Relative improvement (%)
82 0.1149 0.0673 41.43
77 0.1775 0.1133 36.18
79 0.2112 0.1349 36.10
71 0.1109 0.0766 30.89
84 0.0632 0.0489 22.59
75 0.1593 0.1288 19.15
74 0.1642 0.1409 14.19
73 0.1773 0.1567 11.61
80 0.1187 0.1087 8.41
87 0.1205 0.1146 4.92


In this paper, we focused on both Long-term and Short-Term URT passenger flow prediction.

Our proposed 2-step Tensor Prediction based on Tensor decomposition and time-series model can predict both long-term and short-term. In particular, the ”CP Decomposition + 2D ARMA model” can achieve satisfactory long-term prediction, and the lean tensor decomposition updating method can update short-term prediction after receiving new data.

Our proposed 1-step Tensor Prediction based on Bayesian Low Rank Tensor Completion can only predict short-term, but with better performance than lean dynamic tensor decomposition updating. To solve its the innate drawback of low-rank assumption that results in oversimplification, a tensor cluster technique is first implemented and then Tensor Completion is conducted for each cluster respectively.


This research is supported by Hong Kong MTR Co. with grant numbers RGC GRF 16203917, 16201718 and NSFC 71931006. We also give special thanks to Dr. Qibin Zhao for sharing the scripts of their methods.