CIKM21 accepted paper - Multivariate and Propagation Graph Attention Network for Spatial-Temporal Prediction with Outdoor Cellular Traffic
Spatial-temporal prediction is a critical problem for intelligent transportation, which is helpful for tasks such as traffic control and accident prevention. Previous studies rely on large-scale traffic data collected from sensors. However, deploying sensors in all regions is unlikely due to the device and maintenance costs. This paper addresses the problem via outdoor cellular traffic distilled from over two billion records per day in a telecom company, because outdoor cellular traffic induced by user mobility is highly related to transportation traffic. We study road intersections in urban areas and aim to predict the future outdoor cellular traffic of all intersections given historic outdoor cellular traffic. Furthermore, we propose a new model for multivariate spatial-temporal prediction, mainly consisting of two extended graph attention networks (GAT). The first GAT is used to explore correlations among multivariate cellular traffic. The second GAT integrates the attention mechanism into graph propagation to increase the efficiency of capturing spatial dependency. Experiments show that the proposed model significantly outperforms the state-of-the-art methods on our dataset.
Recently, spatial-temporal prediction has become one of the fundamental techniques in building intelligent transportation. Based on large-scale data (e.g., traffic speed, volume) collected from sensors, previous studies (Li et al., 2018; Zheng et al., 2020) achieve successful performance. However, deploying sensors in all regions (e.g., road intersections, rural areas) is unlikely due to the device and maintenance costs, which makes data collection itself a challenge for spatial-temporal prediction.
We seek an alternative approach to evaluate the traffic state without sensors. With billions of mobile devices entering the internet, massive records of cellular traffic (Fang et al., 2018) are collected at cell towers. Many studies have explored its applications, such as cellular vehicle probes for traffic state estimation (Habtie et al., 2017; Valadkhani et al., 2017) and cellular traffic prediction modeling (Wang et al., 2017; Fang et al., 2018; Wang et al., 2018). Nevertheless, existing studies rarely consider transportation traffic induced by user mobility. According to our analyses of billions of records, outdoor cellular traffic is found to be induced by user mobility. Therefore, we leverage outdoor cellular traffic to represent the traffic state.
In this paper, we propose a new spatial-temporal dataset via outdoor cellular traffic distilled from over a billion records per day in a telecom company, which is extremely valuable for surveilling traffic states in regions without sensors. For instance, if the accumulated outdoor cellular traffic in a unit time step exceeds a threshold, certain events (e.g., traffic congestion or a parade) might be occurring in the region. We study the road intersections of a major city, which form a road network, and collect the corresponding outdoor cellular traffic in time steps. This paper aims to predict the future outdoor cellular traffic of all intersections given historic outdoor cellular traffic.
Recent graph-based studies have achieved great success in prediction by propagating information over graph data within graph neural networks (GNN). However, the datasets of previous studies contain traffic speed detected from sensors, which usually varies little between time steps. In our dataset, the quantity of outdoor cellular traffic can vary drastically over time. In particular, the peak (e.g., 18:00) is 200 times greater than the least active times (e.g., 03:00) at the same road intersection, and such a significant difference makes prediction more challenging than on prior datasets. Therefore, we argue that expanding the single historical series into a multivariate input consisting of various temporal-periodic series is critical to capturing more hidden correlations in the complex temporal pattern, which has not been explored in previous methods. As more complicated temporal features are fed into the predictive model, we notice that the propagation process of GNN does not consider the dynamic attention between nodes while updating node information, which might reduce the modeling capacity.
To address the challenges, we propose a new framework for spatial-temporal prediction, namely MPGAT (Multivariate and Propagation Graph Attention Network), mainly consisting of two extended graph attention network (GAT) (Veličković et al., 2018) modules. (1) Multivariate GAT (M-GAT) explores the correlations among the multivariate input and can be effectively adapted to multivariate time series. (2) Propagation GAT (P-GAT) incorporates the propagation strategy into GAT to capture the spatial dependency between regions, benefiting from the attention mechanism and the spatial closeness of the graph. We evaluate the proposed framework on our dataset. Experimental results with statistical analysis show that MPGAT significantly outperforms several state-of-the-art models.
Data Collection: The data was collected from a large-scale cellular-geographic system in a telecom company. Each record contains the International Mobile Equipment Identity (IMEI, a unique number of a mobile phone), the creation time, the GPS location, and the location type (categorized by the telecom company, e.g., outdoor and indoor). For privacy considerations, we use the hashed IMEI to represent each mobile subscriber.
Spatial-temporal Dataset: To explicitly discover the spatial dependency, we study six road intersections (e.g., around the train station and college) as a road network, as shown in Figure 1. Then, for each intersection, we aggregate the quantity of outdoor IMEI located at the intersection in each unit time step (i.e., 5 minutes) to form the spatial-temporal dataset. Specifically, the aggregated IMEI quantity over time steps demonstrates strong temporal correlations, and geographically connected intersections exhibit spatial dependence.
Intersection Network: The road intersections are geographically connected. We define the network as a directed graph $G = (V, E)$, where $V$ is the set of intersections of the network and $E$ is the set of edges representing the connectivity between the intersections.
Multivariate Input: According to our dataset, the IMEI quantity changes drastically across time steps. Following (Guo et al., 2019; Lin et al., 2021), we expand the IMEI quantity of each unit time step into different temporal-periodic series to capture more temporal patterns, forming a multivariate input:
(1) The IMEI quantity adjacent to predictions: $X^{Q} = (x_{t-T+1}, \dots, x_{t})$ denotes the historical IMEI quantity over $T$ time steps adjacent to the predictions, revealing the short-term factor, where $x_{t} \in \mathbb{R}^{N}$ is the IMEI quantity of the $N$ intersections at time step $t$.
(2) The moving average of IMEI quantity adjacent to predictions: $X^{MA_k} = (m^{k}_{t-T+1}, \dots, m^{k}_{t})$ is a set of moving averages (MA) of the IMEI quantity over $T$ time steps, where each value $m^{k}_{i}$ is calculated by accumulating the IMEI quantity over $k$ time steps and then dividing the sum by $k$. Moving average is commonly used with time-series data to smooth out short-term fluctuations and highlight longer-term trends (Yu et al., 2019). Two moving-average features with different window lengths $k$ are adopted in this paper.
(3) The daily intervals adjacent to prediction: Suppose the sampling frequency is $p$ time steps per day. $X^{D}$ denotes a set of daily-interval data, consisting of the IMEI quantity at the same time step of the day on different days close to the predictions. For example, there are $p$ (= 288) time steps in one day with a 5-minute time step. We consider the daily-interval quantity as one of the features to capture the regular sequence pattern.
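The three series described above can be illustrated for a single intersection with a minimal Python sketch. The function names are hypothetical, and two details are assumptions rather than statements from the paper: the moving average uses a trailing window that ends at each time step, and the daily-interval series is aligned to the first predicted step $t+1$. The toy series uses $p = 4$ steps per day only to keep the example small.

```python
from typing import List


def adjacent_window(series: List[float], t: int, T: int) -> List[float]:
    """Return the last T values ending at index t (inclusive)."""
    return series[t - T + 1 : t + 1]


def moving_average(series: List[float], t: int, T: int, k: int) -> List[float]:
    """Trailing k-step moving average for each of the last T steps ending at t.

    Assumption: the window for step i covers steps i-k+1 .. i.
    """
    out = []
    for i in range(t - T + 1, t + 1):
        window = series[i - k + 1 : i + 1]
        out.append(sum(window) / k)
    return out


def daily_interval(series: List[float], t: int, p: int, days: int) -> List[float]:
    """Values at the same step of the day as t+1, on each of the previous `days` days.

    Assumption: the series is aligned to the first predicted step t+1.
    """
    target = t + 1
    return [series[target - d * p] for d in range(days, 0, -1)]


# Toy example: 20 observations, 4 steps per "day".
series = [float(i) for i in range(20)]
x_q = adjacent_window(series, t=15, T=3)
x_ma = moving_average(series, t=15, T=3, k=2)
x_d = daily_interval(series, t=15, p=4, days=2)
```

With the real data, `p` would be 288 and the same three functions would be applied to every intersection before stacking the results into the multivariate input.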
Problem: Given $\mathcal{X} \in \mathbb{R}^{F \times N \times T}$ to denote the multivariate input of all the intersections over $T$ observed time steps, predict the future IMEI quantity set $Y = \{y_1, \dots, y_N\}$ over the coming $T'$ time steps, where $F$ is the number of features in the multivariate input, $N$ is the number of intersections in $G$, $T$ is the length of time steps, and $y_i$ is the future IMEI quantity of intersection $i$.
Figure 2 illustrates the framework of our proposed Multivariate and Propagation Graph Attention Network (MPGAT). The multivariate input is fed to M-GAT to capture the correlations among its component series. Then, the distilled outputs from M-GAT are forwarded to the connected temporal convolution (TCN) layer (Yu and Koltun, 2016). To capture the spatial-temporal dependency, TCN is interleaved with P-GAT as a spatial-temporal block, where P-GAT models the spatial dependency between intersections. By stacking multiple spatial-temporal blocks, the capacity of modeling the spatial-temporal dependencies increases (Wu et al., 2019). To avoid gradient vanishing, Residual Layers (He et al., 2016) are appended from the top of each interleaved layer to its end, and the Skipped Layers are concatenated after each TCN to an output layer.
As mentioned above, the IMEI quantity series has been expanded into a multivariate input. Owing to the strong feature-extraction capability of GAT, M-GAT contains multiple GAT layers to capture the associations among the multivariate input. Inspired by (Huang et al., 2019; Wang et al., 2019), M-GAT treats each component of the multivariate input as one node in a complete graph and contains two stacked GAT layers.
We consider a single layer of M-GAT as an example. For intersection $n$, the set of multivariate input is first projected as a set of latent representations $H_n = \{h_1, \dots, h_F\}$, $h_i \in \mathbb{R}^{T \times D}$, by a convolution layer before M-GAT, where $F$ is the number of features, $T$ is the length of historical time steps, and $D$ is the dimension of the latent representation. Afterward, $H_n$ is fed to M-GAT, forming a complete graph with $F$ nodes. The attention score $\alpha_{ij}$ between nodes represents the importance of node $j$ to node $i$, where $i, j \in \{1, \dots, F\}$, and can be computed as follows:

$$\alpha_{ij} = \frac{\exp\big(\mathrm{LeakyReLU}(\mathbf{a}^{\top}[h_i \,\Vert\, h_j])\big)}{\sum_{k=1}^{F} \exp\big(\mathrm{LeakyReLU}(\mathbf{a}^{\top}[h_i \,\Vert\, h_k])\big)}$$

where $\Vert$ is the concatenation operation, $\mathbf{a}^{\top}$ is the weight vector with transposition, and the attention score is normalized by a SoftMax function with LeakyReLU.

With the normalized attention scores, the output of one M-GAT layer for node $i$ is given by:

$$h'_i = \sigma\Big(\sum_{j=1}^{F} \alpha_{ij}\, h_j\Big)$$

where $\sigma$ denotes a non-linear activation function and $h'_i$ is the aggregated latent representation for feature $i$ of intersection $n$, which contains the implicit influence from the other features. For $N$ intersections, the output set is presented as $H' = \{H'_1, \dots, H'_N\}$. To reduce the model complexity, we only distill the latent representations of the IMEI quantity output by the last layer of M-GAT and feed them into the next layer of MPGAT.
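As a rough illustration of one M-GAT layer for a single intersection, the sketch below computes attention over a complete graph of feature nodes and then aggregates the node representations. It is a simplified stand-in, not the authors' implementation: each latent representation is assumed to be flattened into a plain vector, the LeakyReLU slope of 0.2 and the use of `tanh` as the non-linear activation are illustrative choices, and all function names are hypothetical.

```python
import math
from typing import List


def leaky_relu(x: float, slope: float = 0.2) -> float:
    # The slope value is an illustrative assumption.
    return x if x > 0 else slope * x


def attention_scores(h: List[List[float]], a: List[float]) -> List[List[float]]:
    """Row-normalised attention over a complete graph (every node attends to all nodes)."""
    n = len(h)
    alpha = []
    for i in range(n):
        logits = []
        for j in range(n):
            concat = h[i] + h[j]  # concatenation [h_i || h_j]
            logits.append(leaky_relu(sum(w * v for w, v in zip(a, concat))))
        m = max(logits)  # subtract the max for numerical stability
        exps = [math.exp(e - m) for e in logits]
        total = sum(exps)
        alpha.append([e / total for e in exps])
    return alpha


def aggregate(h: List[List[float]], alpha: List[List[float]]) -> List[List[float]]:
    """Weighted sum of node representations followed by tanh (assumed activation)."""
    n, d = len(h), len(h[0])
    return [
        [math.tanh(sum(alpha[i][j] * h[j][c] for j in range(n))) for c in range(d)]
        for i in range(n)
    ]


# Three feature nodes with 2-dimensional representations; a zero weight vector
# makes every logit equal, so the attention is uniform.
h = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
a = [0.0, 0.0, 0.0, 0.0]
alpha = attention_scores(h, a)
h_out = aggregate(h, alpha)
```

In a trained model the weight vector `a` would be learned, so the rows of `alpha` would differ across nodes instead of being uniform.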
P-GAT consists of two directional Attention Propagation modules and one Global Learning module to capture the spatial dependency between intersections. To the best of our knowledge, this is the first work to apply the attention mechanism to graph propagation.
Attention Propagation: The advantage of the propagation process (Wu et al., 2020) is that it aggregates node information through the graph structure recursively and preserves a proportion of the nodes’ states during the process, eliminating the smoothing problem of the diffusion method in (Li et al., 2018). With the benefit of propagating the states through the graph structure, we generalize the attention mechanism into the propagation process, as shown in Figure 4. Considering $N$ nodes with the set of initial representations $H_{in}$ from the previous layer, the propagation and output are defined as:

$$H^{(k)} = \beta H_{in} + (1 - \beta)\, \tilde{A}\, H^{(k-1)}, \qquad H_{out} = \mathrm{MLP}\big(H^{(0)} \,\Vert\, H^{(1)} \,\Vert\, \cdots \,\Vert\, H^{(K)}\big)$$

where $H^{(0)} = H_{in}$, $\tilde{A}$ is the masked attention matrix for uni-direction, $k = 1, \dots, K$ represents the propagation steps, $\beta$ is a hyperparameter between 0 and 1 that controls the ratio between a node’s preserved state and the propagated information, $\mathrm{MLP}$ is a multilayer perceptron layer that transforms the channel dimension of the concatenation results, and $H_{out}$ is the final output of the nodes. In practice, we employ a CNN as the MLP and set $K$ to 2.
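The propagation step can be sketched in plain Python as follows. This is an assumption-laden illustration rather than the paper's code: it assumes that the hyperparameter (called `beta` here) weights the preserved input state, as in the propagation of (Wu et al., 2020), it takes the attention matrix `A` as given, and it omits the final MLP, returning only the channel-wise concatenation of the states from every step. The function names are hypothetical.

```python
from typing import List

Matrix = List[List[float]]


def matmul(A: Matrix, H: Matrix) -> Matrix:
    """Plain matrix product A @ H."""
    return [
        [sum(A[i][j] * H[j][c] for j in range(len(H))) for c in range(len(H[0]))]
        for i in range(len(A))
    ]


def propagate(A: Matrix, H_in: Matrix, beta: float, K: int) -> Matrix:
    """Run K propagation steps and concatenate the states of all steps per node.

    Assumption: beta weights the preserved input state, (1 - beta) the
    information propagated through the attention matrix A.
    """
    states = [H_in]
    H = H_in
    for _ in range(K):
        AH = matmul(A, H)
        H = [
            [beta * H_in[i][c] + (1.0 - beta) * AH[i][c] for c in range(len(H_in[0]))]
            for i in range(len(H_in))
        ]
        states.append(H)
    # Concatenate along the channel dimension: [H^(0) || H^(1) || ... || H^(K)]
    out = []
    for i in range(len(H_in)):
        row: List[float] = []
        for s in states:
            row.extend(s[i])
        out.append(row)
    return out


# Two nodes that attend only to each other, one channel, K = 2 steps.
A = [[0.0, 1.0], [1.0, 0.0]]
H_in = [[1.0], [3.0]]
H_cat = propagate(A, H_in, beta=0.5, K=2)
```

In the full model, the concatenated result would be passed through the MLP (a CNN in the paper) to restore the channel dimension, and the procedure would be run once per direction with the corresponding masked attention matrix.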
In the case of the directed graph $G$, P-GAT models two directions, the forward and the backward propagations, each with its own masked attention matrix. Considering each intersection of $G$ as one node, the attention score $\alpha_{ij}$ in uni-direction between nodes $i$ and $j$, where $i \in V$ and $j \in \mathcal{N}_i$, is computed as:

$$e_{ij} = \mathrm{LeakyReLU}\big(\mathbf{a}^{\top}[h_i \,\Vert\, h_j]\big) \tag{5}$$

$$\alpha_{ij} = \frac{\exp(e_{ij})}{\sum_{k \in \mathcal{N}_i} \exp(e_{ik})} \tag{6}$$

where $h_i$ is the latent representation of node $i$ from the previous layer, $\mathbf{a}^{\top}$ is a weight vector with transposition, $d$ is the dimension of $h_i$ (so that $\mathbf{a} \in \mathbb{R}^{2d}$), and $\mathcal{N}_i$ denotes the adjacent neighbors of node $i$. The attention coefficient $e_{ij}$ in Equation 5 is masked according to the adjacency matrices in the corresponding direction: $e_{ij}$ is replaced with a large negative value (e.g., -9e15) if nodes $i$ and $j$ are not adjacent; otherwise, the coefficient is preserved. In this way, after SoftMax normalization the attention is distributed only among adjacent nodes, while the score of non-adjacent nodes is set to zero. The resulting matrix can be represented as $\tilde{A} \in \mathbb{R}^{N \times N}$, which captures the relationship between adjacent nodes but excludes the non-adjacent ones.

Table 1. MAPE (mean ± standard deviation) of the baseline models; the four column groups correspond to increasing prediction horizons.

| Models | Mean ± Std. Dev. | h1 | Mean ± Std. Dev. | h1 | Mean ± Std. Dev. | h1 | Mean ± Std. Dev. | h1 |
|---|---|---|---|---|---|---|---|---|
| LSTM (Hochreiter and Schmidhuber, 1997) | 0.1592 ± 0.0034 | 0/1 | 0.1801 ± 0.0034 | 1/1 | 0.1984 ± 0.0052 | 1/1 | 0.2342 ± 0.0078 | 1/1 |
| GWNET (’19) (Wu et al., 2019) | 0.1576 ± 0.0027 | -1/1 | 0.1781 ± 0.0034 | 0/1 | 0.1934 ± 0.0028 | 0/1 | 0.2194 ± 0.0037 | 0/1 |
| MTGNN (’20) (Wu et al., 2020) | 0.1629 ± 0.0032 | 1/1 | 0.1811 ± 0.0027 | 1/1 | 0.1951 ± 0.0031 | 0/1 | 0.2198 ± 0.0033 | 0/1 |
| STAWNET (’21) (Tian and Chan, 2021) | 0.1611 ± 0.0026 | 1/1 | 0.1823 ± 0.0037 | 1/1 | 0.1947 ± 0.0029 | 0/1 | 0.2180 ± 0.0026 | -1/1 |

h1 indicates whether the result of MPGAT-1/MPGAT is significant compared to the baseline method according to the Wilcoxon rank-sum test.
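The masking of the attention coefficients described for P-GAT can be illustrated with a short sketch. It assumes the raw coefficients have already been computed and are given as a dense matrix, and that adjacency is a 0/1 matrix for one direction; the function name and the toy values are hypothetical. The constant -9e15 is the example value mentioned in the text.

```python
import math
from typing import List

NEG_INF = -9e15  # large negative value used to mask non-adjacent pairs


def masked_attention(scores: List[List[float]], adj: List[List[int]]) -> List[List[float]]:
    """Row-wise softmax over raw scores, keeping only entries where adj[i][j] is 1."""
    out = []
    for i, row in enumerate(scores):
        masked = [s if adj[i][j] else NEG_INF for j, s in enumerate(row)]
        m = max(masked)  # subtract the row maximum for numerical stability
        exps = [math.exp(v - m) for v in masked]  # masked entries underflow to 0.0
        total = sum(exps)
        out.append([e / total for e in exps])
    return out


# Toy directed graph with three nodes (self-loops included as an assumption).
scores = [[1.0, 2.0, 3.0], [0.0, 0.0, 0.0], [0.5, 0.5, 0.5]]
adj = [[1, 1, 0], [0, 1, 1], [1, 0, 1]]
A_tilde = masked_attention(scores, adj)
```

Note that a node with no neighbors at all would receive a uniform row under this sketch, because every masked entry then shares the same value; a real implementation would need to decide how to treat that case.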
Global Learning: To discover hidden correlations among nodes, Global Learning treats each intersection as a node and builds a complete graph to capture the hidden relationships of spatial dependency. We use Equation 6 to construct an attention matrix, where $\mathcal{N}_i$ is not masked and denotes all the nodes of $G$. Each node globally learns its attention scores toward all the others and updates its node information directly.
Output: P-GAT explores the bi-directional correlation with directional modeling and adaptively captures the implicit associations with global modeling for spatial dependency learning. Finally, P-GAT fuses the outputs of the two propagation modules and the global module into the input fed to the next layer.
Dataset: This paper verifies the proposed MPGAT and the compared models with the outdoor cellular traffic dataset, which contains the IMEI quantity in 5-minute time steps of six road intersections ranging from Jan. 1, 2020, to Jun. 30, 2020. The dataset is split with 70% for training, 10% for validation, and 20% for testing.
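For a sense of scale, the following sketch counts the time steps implied by the stated date range and step length and derives the split boundaries. It assumes that no time steps are missing and that the split is chronological; neither is stated in the text. The function name is hypothetical.

```python
from datetime import date
from typing import Tuple

Range = Tuple[int, int]


def chronological_split(n_steps: int) -> Tuple[Range, Range, Range]:
    """Return (start, end) index ranges for a 70/10/20 split in time order."""
    train_end = n_steps * 7 // 10
    val_end = n_steps * 8 // 10
    return (0, train_end), (train_end, val_end), (val_end, n_steps)


STEPS_PER_DAY = 24 * 60 // 5  # 5-minute steps
N_DAYS = (date(2020, 6, 30) - date(2020, 1, 1)).days + 1  # both end dates included
n_steps = STEPS_PER_DAY * N_DAYS
train_range, val_range, test_range = chronological_split(n_steps)
```

Under these assumptions each intersection contributes 52,416 time steps, of which the first 36,691 would be used for training.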
Comparison Methods: Our dataset is a spatial-temporal dataset with a road network, and similar tasks such as traffic speed prediction have generally been addressed better by graph-based models (Zheng et al., 2020). Due to the page limitation, we focus on LSTM and the latest graph-based models as baselines in our evaluation: LSTM (Hochreiter and Schmidhuber, 1997), GWNET (Wu et al., 2019), MTGNN (Wu et al., 2020), and STAWNET (Tian and Chan, 2021). To verify the effectiveness of the correlation among the multivariate input, we build two models, MPGAT-1 and MPGAT, where MPGAT-1 adopts only the univariate IMEI quantity as input.
Experimental Settings: The length of historical input $T$ is 12 for all models, including ours and the baselines. MPGAT uses eight spatial-temporal blocks to cover the input sequence, where each block has a TCN interleaved with a P-GAT. We adopt the Adam optimizer with a learning rate of 0.001 to train our model. We choose mean absolute percentage error (MAPE) as the evaluation metric.
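For reference, MAPE can be computed as in the short sketch below. The result is expressed as a fraction rather than a percentage, which matches the magnitude of the values reported in Table 1. The `eps` guard against zero targets and the function name are illustrative assumptions, not details from the paper.

```python
from typing import Sequence


def mape(y_true: Sequence[float], y_pred: Sequence[float], eps: float = 1e-8) -> float:
    """Mean absolute percentage error, returned as a fraction (0.1 means 10%)."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    total = 0.0
    for t, p in zip(y_true, y_pred):
        # eps avoids division by zero when the true quantity is 0 (assumption)
        total += abs(t - p) / max(abs(t), eps)
    return total / len(y_true)


example_error = mape([100.0, 200.0], [110.0, 180.0])
```

Because each error is divided by the true value, time steps with very small IMEI quantities (such as late at night) can contribute disproportionately to the metric.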
First, according to Table 1, the MAPE of every method rises as the prediction time increases, showing that the longer the prediction time, the more challenging the prediction task. Second, according to the statistical analysis, MPGAT achieves significantly the best performance on the dataset. Third, MPGAT-1 slightly underperforms GWNET while outperforming the other baselines in short-term prediction. Short-term accuracy is particularly beneficial to practical applications, e.g., transportation agencies can respond to traffic congestion immediately.
Figure 5 shows the changes in the prediction performance of various methods as the prediction steps increase. We observe that MPGAT consistently outperforms MPGAT-1 and the baseline models, indicating that it is critical to explore the correlation among the multivariate input, especially because our dataset changes drastically between neighboring time steps. Moreover, we notice that the MAPE value of the same method on our dataset is two times larger than on the traffic speed dataset (Li et al., 2018), which indicates that spatial-temporal prediction with cellular traffic data is more challenging. We are optimistic that our proposed task and dataset will pave a new path for spatial-temporal prediction and urban computing applications.
In this paper, we propose a new spatial-temporal dataset via outdoor cellular traffic and a model, MPGAT, for multivariate spatial-temporal prediction. Experimental results show that the proposed MPGAT significantly outperforms other models on the dataset.
Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 33, pp. 922–929.
Adaptive fuzzy particle swarm optimization for global optimization of multimodal functions. Information Sciences 181 (20), pp. 4539–4549.
Diffusion convolutional recurrent neural network: data-driven traffic forecasting. In International Conference on Learning Representations.
Spatial-temporal attention wavenet: a deep learning framework for traffic prediction considering spatial-temporal dependencies. IET Intelligent Transport Systems 15 (4), pp. 549–561.
KGAT: knowledge graph attention network for recommendation. In Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, pp. 950–958.
Taxi-based mobility demand formulation and prediction using conditional generative adversarial network-driven learning approaches. IEEE Transactions on Intelligent Transportation Systems 20 (10), pp. 3888–3899.