Smart cities have gradually taken shape through information and communication technologies, including the Internet of Things (IoT), cloud computing, and edge computing. One important application scenario of the smart city is the Intelligent Transportation System (ITS), which improves public services and provides solutions to urban transportation problems such as traffic jams, traffic accidents, parking chaos, route planning, and resource allocation. These problems are closely related to traffic flow and its prediction. Furthermore, the smart city industry plays an important role in Big Data and generates various types of Spatio-temporal data [1, 14, 39], including GPS, sensor, social media, and traffic card data. Driven by these urban Spatio-temporal Big Data, the main challenges a smart city faces can be summarized in two aspects: (1) how to process and analyze large but redundant Spatio-temporal data, and (2) how to improve human mobility and optimize travel.
Public transportation accounts for a large proportion of urban transportation. Taking Beijing as an example, buses produced 1.7 billion vehicle kilometers traveled and transported 4.9 billion passengers in 2011 alone. Encouraging people to take buses benefits the city’s sustainable development owing to the low-carbon, green nature of bus travel. Therefore, the operation management of public transportation directly affects the traffic conditions of the city, which the government has always valued. Many policies have also been adopted to improve public transportation, such as preferential bus fares, bus lanes, additional stops, routes, and bus running time.
However, emerging traffic problems such as severe traffic congestion and the unreasonable allocation of resources have prompted further research. Consequently, new services are urgently required to improve bus travel and the ride experience, for which passenger flow prediction becomes critical. As shown in Fig. 1, the existing potential problems are solved through processing and analyzing Spatio-temporal data. One of the most effective solutions is route optimization, which is a complex and challenging task but valuable to the related industry in sustainable transportation systems. Traffic flow prediction is essential in the whole process of route optimization: if we predict the traffic flow accurately, we can respond in time to avoid traffic jams and keep roads clear.
Due to this demand for flow prediction, a large body of work has long been devoted to traffic prediction. In general, some traffic flow prediction approaches are built on traditional mathematical statistics, such as AR, ARMA, and ARIMA. On the other hand, as a result of the limitations of traditional models and the excellent performance of deep learning in prediction tasks, deep learning-based methods have evolved, such as DNN, DBN, LSTM, and GAN. However, the methods mentioned above only consider the numerical traffic flow based on the statistics of Spatio-temporal data and neglect human mobility behavior, which refers to travel habits and plays a decisive role in the change of traffic flow. This omission leads to a lack of clear identification and differentiation of traffic flow. In particular, existing works do not identify mobility behaviors through the relationships between people, which results in a deviation in the final prediction performance for different groups. Empirically, people in a similar group have similar mobility behaviors. For example, the fact that most commuters work from 9 am to 5 pm may mean that they must travel at least once in the morning and once in the evening and pass regular bus stops. Therefore, our study defines a passenger mobility pattern as a group of people with similar travel routes. Hypothetically, the total flow consists of two flows: (i) a steady flow generated by most people with permanent jobs and residences, and (ii) an uncertain flow generated by travel, entertainment, and so on. However, as far as we know, few studies have considered passenger mobility pattern analysis to predict traffic flow.
In recent years, the Graph Neural Network (GNN), especially the Graph Convolutional Network (GCN), has proven effective in extracting the features and relationships of a topological graph. A GNN not only explicitly represents the nodes of the graph in a low-dimensional vector space (also called an embedding), but also retains their essential attributes. Ordinarily, the embeddings of nodes can be used in various downstream tasks, including clustering, classification, and prediction. Different from previous methods that treat the Spatio-temporal information of trajectory data as the main feature, we use the passenger bus record data to define and construct an interpretable graph structure-based network that can be applied in a GCN-based model to explore passenger mobility patterns. Besides, adjacent bus stops with a strong Spatio-temporal relationship can improve the accuracy of traffic flow prediction in the mass transit network. It is important to note that our study focuses on prediction, so we do not consider fare evasion and certain anomalies [26, 3], which do not affect the evaluation of the predictive model.
In this paper, we introduce MPGCN, a three-stage framework that, to the best of our knowledge, is the first to integrate passenger mobility patterns into passenger flow prediction. First, we obtain the related information of bus stops from the bus record data, which is used in the analysis of passenger mobility patterns. Second, we construct a sharing-stop network of passengers, including stop matching and weight assignment of graph edges. The sharing-stop network is utilized in graph deep clustering with GCN to explore mobility patterns. Furthermore, to verify their diversity, we carry out a statistical analysis of each mobility pattern by describing the heavy-tailed distributions of the number of bus stops the passengers passed. Then, considering the spatial correlation of the bus route network and the temporal correlation of traffic flow, we propose GCN2Flow, which combines Spatio-temporal information to separately predict the passenger flow of different patterns. The predicted flows for all passenger patterns are fused in the final stage to obtain the prediction result. Finally, we design a case study on route optimization, in which we select optimal routes from the passengers' candidate routes and set passenger diversion and experience as the main optimization objective based on the previous prediction results.
The main contributions of this work are summarized as follows:
We develop a novel prediction framework, namely MPGCN, which integrates the passenger mobility patterns into passenger flow prediction task to enhance accuracy. We define a passenger mobility pattern as a group of people with similar travel times or similar travel routes. Our MPGCN includes three stages to achieve the prediction: (i) pre-processing the bus record data, (ii) recognizing passenger mobility patterns, and (iii) predicting passenger flow by proposed GCN2Flow combined with Spatio-temporal information. Besides, we design constrained planning as a case study for optimizing routes and thus improving passenger diversion and experience based on the prediction results.
We present a sharing-stop network, in which the relationships between passengers are established, and explore the passenger mobility patterns in the sharing-stop network based on deep clustering with GCN. Through statistical analysis and data fitting, we show the reasonableness and interpretability of the network, as well as significant regularities among different mobility patterns.
We conduct a series of experiments, including the analysis of passenger mobility patterns and a comparison of different prediction algorithms with and without passenger mobility patterns. The predictive evaluation demonstrates that our framework has better performance and substantially improves passenger flow prediction. Besides, the prediction is accurate enough to be used for downstream tasks such as route optimization.
The remainder of this paper is structured as follows. In Section II, we briefly review related works. Section III illustrates our proposed model and framework (MPGCN) in detail. The data description and the analysis of the experimental results are given in Section IV. Finally, we present the discussion, conclusions, and future work in Sections V and VI.
II Related Work
In this section, we review existing works closely related to the research presented in this paper, covering three fields: traffic flow prediction, human mobility patterns, and graph convolutional networks.
II-A Traffic Flow Prediction
Traffic flow prediction has many cutting-edge applications, such as road network planning, congestion prevention, and accident detection. Considering the technical approach applied, traffic flow prediction models can be roughly divided into three categories: (i) traditional mathematical-statistical parametric models, (ii) non-parametric regression models, and (iii) artificial neural network (ANN) models.
The mathematical-statistical parametric models mainly examine time series that follow a periodic rule in urban road traffic, such as traffic peaks in the daytime and at night. These models include autoregressive moving average (ARMA) and autoregressive integrated moving average (ARIMA).
Kumar et al. utilized the seasonal ARIMA model to design a prediction scheme using only limited input data. In this model, the issues associated with huge time-series data, such as availability, computation, storage, and maintenance, are considered, and the last three days’ flow observations were used as input for predicting the next day’s flow. Support Vector Regression (SVR) and Nearest Neighbor Regression are the most popular non-parametric regression models.
With the increasing complexity and diversity of the road network, there is a demand for more accurate traffic prediction. Because of the efficacy of ANNs for various prediction tasks in complex and diverse scenarios, they have attracted attention in traffic flow prediction. Liu et al. proposed a novel passenger flow prediction model using a hybrid deep network of unsupervised stacked autoencoders (SAE) and a supervised deep neural network (DNN). Besides, Yu et al. introduced the Spatio-temporal GCN (STGCN) model to forecast traffic, using graph convolution and gated CNNs to extract spatial and temporal features, respectively. More recent works, such as T-GCN and TGC-LSTM, further utilized GCN to extract spatial and temporal correlations to improve prediction. However, all the models presented in this section focus only on mining the information in the raw data while ignoring the potential mobility patterns of passengers. To address this research issue, this paper aims, for the first time, to discover passenger mobility behaviors and rules from the bus record data and then utilize them for traffic flow prediction. Since passenger mobility behaviors and rules are exploited in this paper to predict passenger flow, human mobility patterns are presented in the next section.
II-B Human Mobility Pattern
The fluctuation of traffic flow naturally depends on human mobility and travel. Consequently, the analysis of human mobility patterns is of paramount importance for traffic prediction. The current methods are distinguished primarily by the type of dataset, of which there are two: (i) unconscious mobility data (e.g., sensor data or card records) and (ii) active sharing mobility data (e.g., traditional diary activity surveys or social location sharing). Using the former dataset type, Yan et al. presented a model to capture the underlying driving force accounting for human mobility patterns based on GPS and mobile phone data. Besides, Qi et al. designed a multi-step methodology to extract mobility patterns from smart card data and points of interest data. Nitti et al. presented a Wi-Fi-based Automatic Bus pAssenger CoUnting System (iABACUS), which does not depend on passengers’ card records and can track passengers to analyze urban mobility. Another study integrated taxi and subway data to compute the human mobility network and discover human mobility patterns in terms of trip displacement, duration, and interval. On the other hand, people are willing to share their activities containing location information because of the convenience and popularity of social networks such as Weibo and Twitter. Utilizing the latter dataset type, Comito et al. developed a methodology to discover people, community behavior, and travel routes from geo-tagged posts and tweets. Nevertheless, none of these research works has leveraged human mobility patterns to predict traffic flow. Therefore, exploring human mobility patterns from passenger bus data is one of the main aims of the research presented in this paper. Since we leverage the graph convolutional network for traffic flow prediction, the next section presents an overview of it.
II-C Graph Convolutional Network
With the development of graph learning, CNNs have been extended to graphs (networks) as a general data structure, for example by embedding nodes or subgraphs into vector spaces. Since the first convolutional operation on graphs was introduced, it has evolved over time owing to its effectiveness in representing graphs and its numerous application domains, such as node clustering, classification, and prediction. Besides, to further utilize neighbours’ information, GAT combined a multi-head self-attention mechanism to calculate the attention scores of different neighbours. In this paper, unlike other works, a major challenge is to establish an explainable network from bus record data that enables us to discover the relationships between passengers. Besides, we also need to extract the spatial features of geographical information in the stop network to improve passenger flow prediction.
III Design of Framework
This section details the theoretical underpinning of our proposed network and the techniques used in our Multi-Pattern GCN-based passenger flow prediction framework, MPGCN, as shown in Fig. 2.
III-A Network Construction: Sharing-Stop Network
To explore human mobility patterns, a sharing-stop network is defined as a weighted undirected graph $G = (V, E, W)$, where $V$ is the set of passenger nodes and $E$ is the set of edges. $W$ is the weighted matrix with each element $w_{ij}$. Specifically, the edge between passengers $v_i$ and $v_j$ denotes their existing relationship of sharing stops, and the weight $w_{ij}$ indicates the number of occurrences of their shared stops. The pseudocode for constructing the sharing-stop network is shown in Algorithm 1. A simple example of this sharing-stop network is shown in the middle part of Fig. 2. It is based on the empirical assumption that passengers in the same pattern have similar records, meaning that, for the same mobility pattern, passengers have a similar number of edges and similar edge weights in the sharing-stop network.
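The construction described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not a reproduction of Algorithm 1: the function name `build_sharing_stop_network`, the `(passenger_id, stop_id)` record layout, and the choice to count each distinct shared stop once are all hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def build_sharing_stop_network(records):
    """Build edge weights of a sharing-stop network.

    records: iterable of (passenger_id, stop_id) boarding records.
    Returns {(p, q): weight} with p < q, where weight is the number of
    distinct stops that both passengers used (an assumed definition).
    """
    passengers_at_stop = defaultdict(set)
    for passenger, stop in records:
        passengers_at_stop[stop].add(passenger)

    weights = defaultdict(int)
    for group in passengers_at_stop.values():
        # Every pair of passengers seen at the same stop shares one more stop.
        for p, q in combinations(sorted(group), 2):
            weights[(p, q)] += 1
    return dict(weights)


# Toy records: passengers "a" and "b" share stops 1 and 2; "c" only uses stop 2.
records = [("a", 1), ("b", 1), ("a", 2), ("b", 2), ("c", 2), ("a", 2)]
network = build_sharing_stop_network(records)
```

Counting occurrences per record rather than per distinct stop would only change the increment rule; the pairwise enumeration per stop stays the same.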
III-B Deep Clustering with GCN
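After constructing the graph, inspired by prior work, an unsupervised deep clustering with GCN is used to mine potential passenger mobility patterns, as shown in Fig. 3. First, an unsupervised representation method, the basic autoencoder (AE), is employed to learn the representation of passenger nodes, that is, a mapping from the input features to a low-dimensional representation. We assume that there are $L$ layers in the encoder and decoder parts, which are symmetrical. Therefore, the representation of the $l$-th layer, $H_e^{(l)}$ in the encoder and $H_d^{(l)}$ in the decoder, can be obtained as follows:
$$H_e^{(l)} = \phi\left(W_e^{(l)} H_e^{(l-1)} + b_e^{(l)}\right), \qquad H_d^{(l)} = \phi\left(W_d^{(l)} H_d^{(l-1)} + b_d^{(l)}\right) \qquad (1)$$
where $\phi$ is the activation function, $W_e^{(l)}$, $W_d^{(l)}$ and $b_e^{(l)}$, $b_d^{(l)}$ are the weight matrices and biases, respectively, $H_e^{(0)}$ is $X$, the initial feature matrix obtained from the previous sharing-stop network, and the output of the decoder, $\hat{X} = H_d^{(L)}$, is the reconstruction of $X$. Hence, the loss function of the entire AE is as follows:
$$\mathcal{L}_{res} = \left\| X - \hat{X} \right\|^2 \qquad (2)$$
where $\|\cdot\|$ denotes the Euclidean distance between two representation matrices.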
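On the other hand, we integrate these representations into a GCN that can learn them by combining the relationships between passenger nodes. In this part, the convolutional operation of the $l$-th layer can be defined by:
$$Z^{(l)} = \phi\left(\tilde{D}^{-\frac{1}{2}} \tilde{A} \tilde{D}^{-\frac{1}{2}} Z^{(l-1)} U^{(l-1)}\right) \qquad (3)$$
where $\tilde{A} = A + I$, $A$ is the adjacency matrix of the sharing-stop network, and $I$ is an identity matrix. $\tilde{D}_{ii} = \sum_j \tilde{A}_{ij}$ is the degree of node $i$ in the adjacency matrix $\tilde{A}$, and $U^{(l-1)}$ is the weight matrix of parameters. Specifically, the input of the $l$-th layer in the GCN, $\tilde{Z}^{(l-1)}$, combines the representations from the initial GCN and the AE:
$$\tilde{Z}^{(l-1)} = (1 - \epsilon)\, Z^{(l-1)} + \epsilon\, H_e^{(l-1)} \qquad (4)$$
Eq. (4) joins the GCN with the AE. In this case, we uniformly set the balance coefficient $\epsilon$ for all layers. The final representation of the last layer can be mapped to a multi-class probability with the softmax function:
$$Z = \mathrm{softmax}\left(\tilde{D}^{-\frac{1}{2}} \tilde{A} \tilde{D}^{-\frac{1}{2}} Z^{(L)} U^{(L)}\right) \qquad (5)$$
$Z$ can be regarded as the probability distribution, and $z_{ij}$ denotes the probability of node $i$ in cluster $j$.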
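To make the representation more suitable for deep clustering, a dual self-supervised method is employed to combine clustering information with the representation learned previously. With the t-distribution used to measure similarity, the probability of node $i$ in cluster $j$ is as follows:
$$q_{ij} = \frac{\left(1 + \|h_i - \mu_j\|^2 / v\right)^{-\frac{v+1}{2}}}{\sum_{j'} \left(1 + \|h_i - \mu_{j'}\|^2 / v\right)^{-\frac{v+1}{2}}} \qquad (6)$$
where $h_i$ is taken from the AE representation $H_e^{(L)}$, $\mu_j$ is the cluster center vector initialized by K-means clustering, and $v$ is the degree of freedom of the t-distribution. Therefore, we obtain the clustering probability distribution $Q = [q_{ij}]$. Besides, the target distribution, $P$, can be computed and normalized as follows:
$$p_{ij} = \frac{q_{ij}^2 / f_j}{\sum_{j'} q_{ij'}^2 / f_{j'}}, \qquad f_j = \sum_i q_{ij} \qquad (7)$$
Then, we use the KL divergence as part of the loss function, that is, $\mathcal{L}_{clu} = KL(P \| Q)$ is the KL divergence between the $P$ and $Q$ distributions, and $\mathcal{L}_{gcn} = KL(P \| Z)$ is that between $P$ and $Z$. In the end, the overall loss function is defined by:
$$\mathcal{L} = \mathcal{L}_{res} + \alpha \mathcal{L}_{clu} + \beta \mathcal{L}_{gcn} \qquad (8)$$
where $\alpha$ and $\beta$ are the hyper-parameters, $\alpha > 0$, and $\beta > 0$. The final cluster label of node $i$ is determined by the maximum value of $z_{ij}$ in the probability distribution $Z$.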
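As a small numerical illustration of the dual self-supervised step, the sketch below computes a Student's t soft assignment, the sharpened target distribution, and their KL divergence in plain Python. It assumes the common deep-clustering formulation (soft assignment proportional to $(1 + d^2/v)^{-(v+1)/2}$, target proportional to the squared assignment divided by the soft cluster frequency); the function names and the toy embeddings are hypothetical and do not come from the paper.

```python
import math


def soft_assignment(embeddings, centers, v=1.0):
    """Student's t kernel between each embedding and each cluster center."""
    Q = []
    for h in embeddings:
        row = []
        for mu in centers:
            d2 = sum((a - b) ** 2 for a, b in zip(h, mu))
            row.append((1.0 + d2 / v) ** (-(v + 1.0) / 2.0))
        total = sum(row)
        Q.append([x / total for x in row])  # normalise over clusters
    return Q


def target_distribution(Q):
    """Sharpen Q: square each entry, divide by soft cluster frequency, renormalise."""
    freq = [sum(col) for col in zip(*Q)]
    P = []
    for row in Q:
        w = [q * q / f for q, f in zip(row, freq)]
        total = sum(w)
        P.append([x / total for x in w])
    return P


def kl_divergence(P, Q):
    """KL(P || Q) summed over all nodes and clusters."""
    return sum(p * math.log(p / q)
               for p_row, q_row in zip(P, Q)
               for p, q in zip(p_row, q_row) if p > 0)


# Toy 2-D embeddings: two nodes near the first center, one on the second.
embeddings = [[0.0, 0.0], [0.1, 0.0], [1.0, 1.0]]
centers = [[0.0, 0.0], [1.0, 1.0]]
Q = soft_assignment(embeddings, centers)
P = target_distribution(Q)
loss_clu = kl_divergence(P, Q)
```

The target distribution puts more mass on confident assignments than the soft assignment does, which is what drives the clustering loss.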
III-C Multi-Pattern GCN Based Passenger Flow Prediction
The flow prediction components include identifying passenger patterns, training a neural network model with GCN, and predicting passenger flow. From the previous clustering results, passenger nodes in the same cluster have similar mobility rules. Therefore, we divide the passengers into several mobility pattern groups according to the clustering results, which also constitutes pattern exploration. Besides, we design a statistical task to discover the potential law by fitting several possible heavy-tailed distributions, as shown in Section IV.
As shown in Fig. 4, we develop a prediction architecture, GCN2Flow, that comprises several Temporal Convolutional blocks (TC blocks) and Spatial GCN blocks (SGC blocks). In this section, the stop network based on routes is defined as $G_s = (V_s, E_s, A_s)$, where $V_s$ is the set of stop nodes and $E_s$ is the set of edges. An edge $e_{ij} \in E_s$ indicates the existence of a route from stop $v_i$ to the next stop $v_j$. $A_s$ is a weighted adjacency matrix whose elements denote the geographical distance.
On the one hand, passenger flow prediction leverages historical time series data, i.e., the past $T$ time steps are used to predict the next time step. Recurrent neural network-based methods are popular in time-series prediction. However, they suffer from issues such as time-consuming training and insensitivity to the dynamics of long sequences. Therefore, in our TC block, we define a temporal gated convolutional operation, which utilizes gated linear units (GLU) as a non-linearity with a residual connection. We assume that $C_{in}$ and $C_{out}$ are respectively the number of input and output channels, and $X$ is the input of the TC block. The operation is as follows:
$$\mathrm{TC}(X) = \left(W_1 * X + b_1 + X_{res}\right) \odot \sigma\left(W_2 * X + b_2\right)$$
where $W_1, W_2 \in \mathbb{R}^{k \times C_{in} \times C_{out}}$ ($k$ is the size of the convolutional kernel) and $b_1, b_2$ are learned parameters, $X_{res}$ is the residual input, $\sigma$ is the sigmoid function, and $\odot$ is the element-wise product between matrices.
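To make the gating mechanism concrete, here is a minimal single-channel sketch of a temporal gated convolution with a residual connection. It is a simplification under stated assumptions: one stop, one input and one output channel, no padding, and the residual taken from the last value of each window. The names `temporal_gated_conv`, `w_p`, and `w_q` are hypothetical.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def temporal_gated_conv(x, w_p, w_q, b_p=0.0, b_q=0.0):
    """Gated 1-D convolution over time for a single stop and channel.

    x: list of T flow values; w_p, w_q: kernels of equal length k.
    Returns T - k + 1 values, each (P + residual) * sigmoid(Q).
    """
    k = len(w_p)
    out = []
    for t in range(len(x) - k + 1):
        window = x[t:t + k]
        p = sum(w * v for w, v in zip(w_p, window)) + b_p  # linear branch
        q = sum(w * v for w, v in zip(w_q, window)) + b_q  # gate branch
        residual = window[-1]  # input aligned with the output time step
        out.append((p + residual) * sigmoid(q))
    return out


# With a zero gate kernel the gate is sigmoid(0) = 0.5 everywhere.
y = temporal_gated_conv([1.0, 2.0, 3.0, 4.0],
                        w_p=[0.0, 0.0, 1.0],
                        w_q=[0.0, 0.0, 0.0])
```

Because the convolution runs over a fixed window rather than a recurrent state, all output time steps can be computed independently, which is the practical advantage over recurrent models that the text points to.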
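Note that there are spatial connections between the stops of bus routes. For example, the passenger flow of a stop is related to the flow of its neighboring stops. Therefore, the spatial graph convolutional operation, through Chebyshev polynomials and first-order approximation, can be written as:
$$\Theta *_{\mathcal{G}} X = \tilde{D}_s^{-\frac{1}{2}} \tilde{A}_s \tilde{D}_s^{-\frac{1}{2}} X \Theta$$
where $\tilde{A}_s = A_s + I$ adds self-loops to the adjacency matrix $A_s$ of the stop network, $\tilde{D}_s$ is the degree matrix of the adjacency matrix $\tilde{A}_s$, and $\Theta$ is the learnable parameter matrix.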
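The first-order graph convolution can be illustrated with a dependency-free sketch. It assumes the widely used symmetric normalisation of the adjacency matrix with self-loops; the helper names (`normalized_adjacency`, `matmul`, `graph_conv`) and the two-stop example are hypothetical, and no activation is applied.

```python
def normalized_adjacency(A):
    """Return D^{-1/2} (A + I) D^{-1/2} for a square adjacency matrix A."""
    n = len(A)
    A_tilde = [[A[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
               for i in range(n)]
    deg = [sum(row) for row in A_tilde]
    return [[A_tilde[i][j] / (deg[i] ** 0.5 * deg[j] ** 0.5) for j in range(n)]
            for i in range(n)]


def matmul(X, Y):
    """Plain matrix product of two nested-list matrices."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*Y)]
            for row in X]


def graph_conv(A, X, Theta):
    """One spatial graph convolution: normalised adjacency times X times Theta."""
    return matmul(matmul(normalized_adjacency(A), X), Theta)


# Two connected stops with flows 2 and 4: each output mixes both stops equally.
A = [[0.0, 1.0], [1.0, 0.0]]
X = [[2.0], [4.0]]
Theta = [[1.0]]
out = graph_conv(A, X, Theta)
```

In this toy case every entry of the normalised matrix is 0.5, so each stop's output is the average of its own flow and its neighbor's flow.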
Therefore, one TC block and one SGC block are jointly utilized to extract Spatio-temporal features. The whole computational process of two blocks in l-th layer is designed as:
where the input is the flow matrix of stops over the past time steps for a given pattern, and the activation function is the rectified linear unit (ReLU). Furthermore, we execute an extra temporal gated convolutional operation and attach a Fully-Connected (FC) layer as the output of the whole network. The output of this layer is therefore the predicted flow matrix of the next time step. The loss function for passenger flow prediction in a given pattern can then be defined as:
Finally, we train multiple GCN2Flow models, one for each passenger mobility pattern. Each mobility pattern thus has its own GCN2Flow model to predict its passenger flow, and we then merge the prediction results of all GCN2Flow models to obtain the total passenger flow prediction. Algorithm 2 presents the pseudocode for the training and prediction process of MPGCN.
III-D Route Optimization
Finally, we use the prediction result of MPGCN in a simple application case study. Once the accuracy of passenger flow prediction is ensured, we can assume that the prediction result is the real flow distribution of bus stops at the next time interval. In our framework, we show one example, route optimization, which is closely related to flow prediction. Our optimization task aims to provide a new travel route that avoids crowded bus stops, thus relieving the pressure on overcrowded bus stops in the public transportation system.
Therefore, we mainly focus on passenger diversion and ride experience as the optimization objective, measured by the standard deviation (SD) of the traffic flow of all bus stops. More specifically, given the Origin-Destination (OD) matrix of passengers (shown in Section IV) and the bus route network, the primary purpose is to select optimal routes from the candidate route set. Mathematically, the objective function is similar to the standard deviation formula, and the domain of its variable is a finite set. Then, the optimal route selection problem can be abstracted into a convex optimization problem. In other words, given a stop network and the traffic condition at the next time interval, the objective is to minimize the SD of the traffic flow of all bus stops by changing the travel route of some passengers. Therefore, the objective function and constraints can be defined as follows:
where a route-generation function produces the finite set of candidate routes from the stop network, the OD matrix at the current time, and the predicted passenger flow at the next time step, and a counting function computes the passenger flow matrix of all stops for a candidate route in that set. Besides, when generating the candidate route set, we set a threshold to keep the additional passenger travel time small relative to the shortest travel route in the stop network. As a result, we can obtain the optimal routes of passengers, which make the passenger flow of bus stops more balanced and relieve crowding on buses to a certain extent.
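The selection step can be sketched as an exhaustive search over a small candidate set. This is only an illustration under stated assumptions: route length in stops stands in for travel time, the threshold is a caller-supplied ratio to the shortest route, a single passenger group is rerouted, and the function name `select_route` and the toy flows are hypothetical.

```python
import statistics


def select_route(base_flow, candidate_routes, shortest_len, threshold, riders=1):
    """Pick the candidate route that minimises the SD of stop flows.

    base_flow: {stop: predicted flow at the next time step}.
    candidate_routes: list of routes, each a list of stops.
    Routes longer than threshold * shortest_len are discarded (assumed
    proxy for the travel-time constraint).
    """
    best_route, best_sd = None, float("inf")
    for route in candidate_routes:
        if len(route) > threshold * shortest_len:
            continue
        flow = dict(base_flow)
        for stop in route:
            flow[stop] = flow.get(stop, 0) + riders  # add the rerouted riders
        sd = statistics.pstdev(list(flow.values()))
        if sd < best_sd:
            best_route, best_sd = route, sd
    return best_route, best_sd


# Stop "B" is predicted to be crowded, so the detour through "C" is preferred.
base_flow = {"A": 10, "B": 50, "C": 10, "D": 10}
candidates = [["A", "B", "D"], ["A", "C", "D"]]
best_route, best_sd = select_route(base_flow, candidates,
                                   shortest_len=3, threshold=1.2)
```

A brute-force scan is adequate here because the candidate set is finite and small after thresholding; a larger instance would call for a proper solver.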
III-E Complexity Analysis
In the part of deep clustering, we denote as the dimension of the input of layer in the autoencoder. The time complexity of the autoencoder is , and the time complexity of GCN module is . Besides, we assume that there are K clusters, and the time complexity of (6) is . Therefore, the total time complexity is the sum of the above three, and is linearly related to the and . Similarly, the time complexity of flow prediction method is .
IV Experiments
To demonstrate the effectiveness of our proposed MPGCN, we conducted a series of experiments. In this section, we first describe the experimental dataset in detail, including the preprocessing of data, the analysis of data, and the sharing-stop network. Second, we give the parameter settings used in the experiments. Next, we present the analysis of mobility patterns and evaluate the prediction performance in comparison with other methods. Finally, we illustrate the application value of our prediction results through a case study.
IV-A Data Description and Analysis
IV-A1 Data Description
A real-world bus dataset, a typical kind of Spatio-temporal data, is employed in our experiments. As shown in TABLE I and TABLE II, it contains a bus record dataset collected from bus cards, credit cards, and QR codes, and a bus stop arriving-leaving dataset. Together they comprise 12 used fields, which basically cover the majority of bus passengers and reflect the overall trend of passenger mobility. Besides, the information about bus stops includes the longitude, latitude, and sequence of bus stops in the route. The bus dataset was generated by buses of the Panda Bus Company in Jiangsu over 30 days (nine weekend days and 21 weekdays), from November 1st to 30th, 2019, for 18 hours a day from 05:00 to 23:00. Note that we do not use passengers’ private information, so there is no privacy issue with our data.
TABLE I: Bus record dataset
|Field|Description|Example|
|---|---|---|
|bus_no|ID of each bus|11180|
|card_no|ID of each passenger|2230000010282075|
|cardType|Payment card type|1|
|riding_time|Record time and date|2019-11-01 05:29:20|
|routeId|ID of each route|106|

TABLE II: Bus stop arriving-leaving dataset
|Field|Description|Example|
|---|---|---|
|bus_no|ID of each bus|61189|
|enterTime|Enter stop time and date|2019-11-01 06:37:59|
|leaveTime|Leave stop time and date|2019-11-01 06:38:12|
|stopId|ID of each stop|46976|
|routeId|ID of each route|157|
|directId|0 for upline, 1 for downline|0|
|stayTime|Bus waiting time (seconds)|13|
IV-A2 Preprocessing and Analysis
From the raw data, we cannot directly obtain the bus stops where passengers board the bus and scan their card or QR code. Hence, we first need to match bus stops. In the real-world experience of bus riding in China, the bus may already be moving when passengers scan their card or QR code. This implies that the scanning time may not fall between the entering and leaving times and thus has a certain deviation. Consequently, we empirically widened the matching time range by 20 seconds. Algorithm 3 presents the technique for matching stops. Another difficulty with data preprocessing is that, unlike subway travel, the drop-off stops of passengers are not given, which means the destination of passengers is not clear. Usually, travel by bus is symmetrical: on the return trip, a passenger’s origin and destination stops are switched. Based on this assumption, we extract all boarding stops and the corresponding bus lines, and two stops are regarded as the origin and destination of a passenger if they are on the same line. Therefore, the symmetrical OD matrix can be inferred.
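The stop-matching idea can be sketched as a tolerance window around each stop visit. This is not Algorithm 3 itself; it assumes that the 20-second tolerance is applied on both sides of the enter-leave interval and that records of the same bus have already been grouped. The function name `match_stop` and the tuple layout are hypothetical; the field names and example values follow TABLE I and TABLE II.

```python
from datetime import datetime, timedelta

FMT = "%Y-%m-%d %H:%M:%S"


def match_stop(riding_time, stop_visits, tolerance_s=20):
    """Match a card record to a stop visit of the same bus.

    riding_time: scan time string (riding_time field).
    stop_visits: list of (stopId, enterTime, leaveTime) strings.
    Returns the first stopId whose widened interval contains the scan
    time, or None if no visit matches.
    """
    t = datetime.strptime(riding_time, FMT)
    tol = timedelta(seconds=tolerance_s)
    for stop_id, enter, leave in stop_visits:
        start = datetime.strptime(enter, FMT) - tol
        end = datetime.strptime(leave, FMT) + tol
        if start <= t <= end:
            return stop_id
    return None


visits = [(46976, "2019-11-01 06:37:59", "2019-11-01 06:38:12")]
late_scan = match_stop("2019-11-01 06:38:25", visits)  # 13 s after leaving
too_late = match_stop("2019-11-01 06:39:00", visits)   # 48 s after leaving
```

A scan made 13 seconds after the bus leaves still falls inside the widened window, whereas a scan 48 seconds later does not.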
After matching stops and inferring the OD matrix, we expand the OD matrix into stop trajectories, which enable us to count the passenger flow at each stop. Besides, because we aim to explore passenger mobility and its laws, the number of records per passenger is required to exceed a particular value, so some passengers and their records need to be filtered out. For example, passengers who have only a few records in the month or who start to generate card records at the end of the month were removed. We also found data anomalies, such as missing values, in the raw data. For example, the bus stop arriving-leaving data of some buses is missing for several days, which influences model training. Hence, inspired by prior work, we use the linear interpolation method to reduce errors when computing traffic flow. The matching process also screens out some records that have less impact on passenger mobility laws. Finally, the extracted data contains 857900 bus records of 31353 passengers, 214 routes, and 1114 stops in Huai’an city.
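Linear interpolation of missing flow values can be sketched as follows. The function name `interpolate_missing`, the use of `None` to mark missing observations, and the choice to copy the nearest known value at the edges are assumptions made for illustration only.

```python
def interpolate_missing(series):
    """Fill None entries by linear interpolation between known neighbours.

    Leading or trailing gaps copy the nearest known value (an assumed
    convention; the text does not specify edge handling).
    """
    known = [i for i, v in enumerate(series) if v is not None]
    if not known:
        raise ValueError("series has no observed values")
    filled = list(series)
    for i, v in enumerate(series):
        if v is not None:
            continue
        left = max((k for k in known if k < i), default=None)
        right = min((k for k in known if k > i), default=None)
        if left is None:
            filled[i] = series[right]
        elif right is None:
            filled[i] = series[left]
        else:
            w = (i - left) / (right - left)  # position between the two neighbours
            filled[i] = series[left] + w * (series[right] - series[left])
    return filled


# Two missing intervals between 10 and 40, and one trailing gap.
flow = interpolate_missing([10, None, None, 40, None])
```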
Apart from data preprocessing, cleaning, and filtering, we analyze the relationship between passenger flow and time. In Fig. 5, we show the passenger flow for one day (November 1st, 2019). Fig. 5 shows that the daily flow has a similar shape, i.e., there are 2 or 3 peaks and 1 or 2 troughs. This similarity means not only that traffic flow is regular, but also that passenger mobility follows regular laws. Besides, as shown in Fig. 6, we calculate the number of stops that each passenger passes and the average daily number of records at each stop in a month, and describe their distributions on a certain order of magnitude after counting them. Fig. 6 shows that both distributions conform to a heavy-tailed distribution, which inspired our network embedding part. The existence of heavy-tailed distributions is the main reason for the latent effectiveness of natural language models and network representation.
After preprocessing, the sharing-stop network is constructed with Algorithm 1. As shown in Fig. 7, we count the degrees of all passenger nodes in the network and plot their distribution, which is again, as expected, heavy-tailed and reflects the potential law of passenger mobility.
IV-B Experimental Settings
In the deep clustering with GCN, we extracted passenger mobility patterns based on the sharing-stop network. Since the sharing-stop network is large, we set the layer dimensions of the AE after the input layer to 100-100-500-16, the same as in the GCN module. In addition, the results of our method are insensitive to the hyper-parameters, so a single hyper-parameter setting was used throughout. The learning rate and the number of epochs were 0.001 and 100, respectively. To consider the impact of the cluster number on passenger flow prediction comprehensively, we set 3, 4, and 5 as the number of clusters (passenger mobility patterns) and selected the most suitable number of passenger mobility patterns, based on the prediction results, to analyze their latent laws.
For flow prediction, the historical passenger flow data of one hour was used as the input of our proposed method to predict the flow for the next time step, and the time step was taken as 5, 15, and 30 minutes, respectively. There were 5 TC blocks and 2 SGC blocks in our GCN2Flow. The convolution kernel size of both blocks was 3. The batch size and the learning rate were 64 and 0.001, respectively, with 100 epochs. Before training, the flow of all bus stops was normalized with the Z-score method, and the Laplacian matrix of the stop network was also normalized.
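The input preparation described above can be sketched as Z-score normalization followed by sliding windows. With one hour of history, a 5-minute step gives 12 past values per sample. The helper names `zscore` and `sliding_windows` and the toy series are hypothetical, and the sketch handles a single stop only.

```python
import statistics


def zscore(values):
    """Normalise a flow series to zero mean and unit (population) variance."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    return [(v - mean) / std for v in values], mean, std


def sliding_windows(series, history_len):
    """Pair each window of history_len past values with the next value."""
    return [(series[i:i + history_len], series[i + history_len])
            for i in range(len(series) - history_len)]


step_minutes = 5
history_len = 60 // step_minutes  # one hour of history -> 12 steps

raw_flow = [float(v) for v in range(15)]  # toy flow of a single stop
normalised, mean, std = zscore(raw_flow)
samples = sliding_windows(normalised, history_len)
```

The stored `mean` and `std` are what one would use to map predictions back to passenger counts; a constant series would need special handling because its standard deviation is zero.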
Further, similar to previous works [44, 21], we utilized the three most common metrics for comparing passenger flow prediction methods: Mean Absolute Error (MAE), Root Mean Square Error (RMSE), and Correlation Coefficient (CC).
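Mean Absolute Error (MAE). MAE is the mean of all absolute errors between the predicted values and their corresponding real values, whose equation is given as follows:
$$\mathrm{MAE} = \frac{1}{n} \sum_{i=1}^{n} \left| \hat{y}_i - y_i \right|$$
where $\hat{y}_i$ and $y_i$ denote the predicted values and real values, respectively, and $n$ is the number of values.
Root Mean Square Error (RMSE). RMSE measures the deviation between predicted values and their respective real values. RMSE is defined as:
$$\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \left( \hat{y}_i - y_i \right)^2}$$
Correlation Coefficient (CC). CC is used to verify the correlation between variables and has different forms. In this paper, we used the Pearson CC to measure correlation, which has the following formula:
$$\mathrm{CC} = \frac{\mathrm{Cov}(\hat{y}, y)}{\sqrt{\mathrm{Var}(\hat{y})\, \mathrm{Var}(y)}}$$
where $\mathrm{Cov}(\cdot)$ and $\mathrm{Var}(\cdot)$ represent the covariance and variance, respectively.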
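For reference, the three metrics can be computed with the standard definitions as in the sketch below. The function names and the toy values are illustrative only, and the population forms of covariance and variance are assumed (the choice cancels out in the Pearson coefficient).

```python
import math


def mae(pred, real):
    """Mean absolute error between predicted and real values."""
    return sum(abs(p - r) for p, r in zip(pred, real)) / len(real)


def rmse(pred, real):
    """Root mean square error between predicted and real values."""
    return math.sqrt(sum((p - r) ** 2 for p, r in zip(pred, real)) / len(real))


def pearson_cc(pred, real):
    """Pearson correlation: covariance over the product of standard deviations."""
    n = len(real)
    mean_p, mean_r = sum(pred) / n, sum(real) / n
    cov = sum((p - mean_p) * (r - mean_r) for p, r in zip(pred, real)) / n
    var_p = sum((p - mean_p) ** 2 for p in pred) / n
    var_r = sum((r - mean_r) ** 2 for r in real) / n
    return cov / math.sqrt(var_p * var_r)


pred = [2.0, 4.0, 6.0]
real = [1.0, 3.0, 8.0]
scores = (mae(pred, real), rmse(pred, real), pearson_cc(pred, real))
```

Lower MAE and RMSE and a CC closer to 1 indicate a better prediction.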
In addition, our model is implemented with the PyTorch framework, and the experiments were executed on an NVIDIA RTX 2080 Ti GPU.
IV-C Analysis of Passenger Mobility Patterns
To investigate the impact of extracting passenger mobility patterns on prediction, we varied the number of clusters $K$ used in the prediction models. In Fig. 8, the setting without mobility pattern extraction indicates that the GCN2Flow and LSTM models were directly used to predict passenger flow, while $K$ = 3, 4, and 5 mean that the MPGCN and Multi-pattern LSTM (MP-LSTM) models were used to predict the flow of passengers for 3, 4, and 5 different mobility patterns, respectively. Fig. 8 shows that combining passenger mobility patterns effectively enhances prediction performance, i.e., MPGCN and MP-LSTM are better than GCN2Flow and LSTM, respectively.
As a result, the MPGCN and MP-LSTM models produced the best performance for $K = 4$. Therefore, we selected $K = 4$ as the number of passenger mobility patterns for further analysis. In our study, a clustering result is viewed as a mobility pattern. Given the construction of the sharing-stop network, we suspect that passenger nodes in the same pattern tend to have similar travel habits, such as using fixed and frequent bus stops or routes. The numbers of passengers in the four patterns are 11857, 10537, 3475, and 5484, which add up to 31353.
To further mine the regularities hidden in the mobility patterns, we fit several heavy-tailed distributions, namely the power-law, exponential, log-normal, and Weibull distributions, to the number of stops that passengers pass. The probability density functions (pdfs) of these distributions are shown in TABLE IV. As Fig. 9 shows, for Patterns 1, 2, and 3, the distributions of the number of stops follow a similar law, i.e., the log-normal and Weibull distributions fit better than the remaining two (power-law and exponential). Before the distribution reaches its highest value, the Weibull distribution fits better; after that, the log-normal distribution fits better. Besides, comparing the key parameters of the fitted log-normal and Weibull pdfs further confirms the similarity between the two fits. For Pattern 4, based on the fitted parameters of the log-normal and Weibull pdfs, the log-normal distribution achieves the best fit. More specifically, through quantitative analysis, we note that 80 percent of passengers in Patterns 1, 2, and 3 pass at most about 127 stops, while 80 percent of passengers in Pattern 4 pass at most 101 stops.
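The comparison of candidate distributions can be illustrated with a minimal sketch based on maximum-likelihood fitting. This is not our fitting procedure: it covers only the log-normal and exponential candidates (which have closed-form estimates), the data are synthetic samples drawn from an assumed log-normal distribution, and all function names are hypothetical.

```python
import math
import random


def fit_lognormal(xs):
    """Closed-form MLE: mean and standard deviation of log(x)."""
    logs = [math.log(x) for x in xs]
    mu = sum(logs) / len(logs)
    sigma = math.sqrt(sum((v - mu) ** 2 for v in logs) / len(logs))
    return mu, sigma


def loglik_lognormal(xs, mu, sigma):
    """Log-likelihood of xs under a log-normal pdf with parameters mu, sigma."""
    return sum(
        -math.log(x * sigma * math.sqrt(2 * math.pi))
        - (math.log(x) - mu) ** 2 / (2 * sigma ** 2)
        for x in xs
    )


def fit_exponential(xs):
    """Closed-form MLE of the exponential rate parameter."""
    return len(xs) / sum(xs)


def loglik_exponential(xs, lam):
    """Log-likelihood of xs under an exponential pdf with rate lam."""
    return sum(math.log(lam) - lam * x for x in xs)


def quantile(xs, q):
    """Smallest observed value v such that at least a fraction q of xs is <= v."""
    ordered = sorted(xs)
    index = max(0, math.ceil(q * len(ordered)) - 1)
    return ordered[index]


# Synthetic "number of stops passed" per passenger (assumed, not the real data).
random.seed(0)
stops = [random.lognormvariate(4.0, 0.6) for _ in range(2000)]

mu, sigma = fit_lognormal(stops)
lam = fit_exponential(stops)
ll_lognormal = loglik_lognormal(stops, mu, sigma)
ll_exponential = loglik_exponential(stops, lam)

print("log-normal  log-likelihood:", ll_lognormal)
print("exponential log-likelihood:", ll_exponential)
print("80th percentile of stops  :", quantile(stops, 0.8))
```

A higher log-likelihood indicates a better fit, and the 80th percentile corresponds to the kind of "80 percent of passengers pass at most N stops" statement made above. Weibull and power-law fits would need a numerical optimizer or a chosen lower cutoff and are omitted from this sketch.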
Furthermore, to verify our conjecture that passengers in the same pattern tend to have similar travel habits, we count the flow contribution of passengers of the four mobility patterns on each bus route and analyze the proportions on the top 40 bus routes in terms of total passenger flow. We then define the route preference of a pattern: a route is preferred by a pattern when passengers of that pattern account for more than 50% of the flow on that route. As shown in TABLE III (see Appendix for details), noticeable differences in route preference can be found among the mobility patterns. For example, passengers of Pattern 4 contribute more than 90% of the flow on routes 62, 63, 65, and 66. For the routes not shown in TABLE III, such as routes 2, 4, 16, 26, and 31, we also found that the proportions of Pattern 1 and Pattern 2 are relatively high and close to each other, both exceeding 30% or even 40%. This indicates that these two mobility patterns share a certain similarity in terms of travel. These results verify the effectiveness of our implicit mobility pattern extraction through deep clustering. In other words, passengers of the same pattern tend to travel by bus on certain fixed routes and contribute most of the traffic flow on those routes, while passengers of different patterns often choose different routes. Besides, given these latent regularities, it is effective to combine mobility patterns with the traffic flow prediction task.
| Pattern No. | List of Route Preference |
|---|---|
| 1 | [5, 7, 12, 15, 18, 19, 32, 33, 36, 39, 53, 88, 89, 100] |
| 2 | [1, 6, 11, 14, 22, 23, 28, 38, 116, K1] |
| 3 | [50, 91, 713] |
| 4 | [10, 20, 62, 63, 65, 66, 69] |
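The route preference rule described above (a pattern prefers a route when it contributes more than 50% of that route's flow) can be sketched as follows. The function name `route_preferences`, the route labels, and the flow counts are hypothetical and only illustrate the rule; they are not taken from our dataset.

```python
def route_preferences(flow_by_route, threshold=0.5):
    """Map each pattern to the routes where its share of the flow exceeds `threshold`."""
    preferences = {}
    for route, pattern_flow in flow_by_route.items():
        total = sum(pattern_flow.values())
        if total == 0:
            continue  # skip routes without any recorded flow
        for pattern, flow in pattern_flow.items():
            if flow / total > threshold:
                preferences.setdefault(pattern, []).append(route)
    return preferences


# Hypothetical passenger counts per route, keyed by pattern number.
example_flow = {
    "A": {1: 700, 2: 200, 3: 50, 4: 50},    # pattern 1 holds 70%
    "B": {1: 350, 2: 340, 3: 160, 4: 150},  # no pattern exceeds 50%
    "C": {1: 20, 2: 30, 3: 10, 4: 940},     # pattern 4 holds 94%
}

print(route_preferences(example_flow))  # {1: ['A'], 4: ['C']}
```

Route "B" in this example mirrors the case discussed above in which two patterns have high and similar shares but neither dominates, so the route is not assigned to any pattern.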
|Distribution||Probability Density Function (pdf)|
IV-D Passenger Flow Prediction
IV-D1 Prediction Result
Based on the passenger mobility patterns obtained in the previous experiments, we predicted the passenger flow for each pattern using GCN2Flow (within the MPGCN model described in Algorithm 2, applied to individual patterns). Fig. 10 shows the short-term passenger flow prediction results of our proposed GCN2Flow and of MPGCN with the four selected mobility patterns for a weekday and a weekend day. The comparison between the predicted flow and the real flow shows that the predictions track the real flow closely. Although weekdays and weekend days exhibit different trends, our models capture the features of the passenger flow trend in both cases, i.e., flow peaks and troughs. In terms of spatial features, the SGC block can quickly predict the dynamic flow changes in the stop network based on the bus route network. Moreover, combining the per-pattern passenger flow predictions in MPGCN further improves the prediction accuracy.
IV-D2 Comparison of Prediction Approaches
To verify the capability of our approach, we predicted passenger flow on the same dataset using several contemporary and popular models, covering machine learning models, a mathematical-statistical model, and neural network models: Logistic Regression (LR), Support Vector Regression (SVR), Gradient Boosting Decision Tree (GBDT), ARIMA, LSTM, DCRNN, STGCN, and MP-LSTM (LSTM with passenger mobility patterns).
The results of the prediction evaluation are presented in TABLE V. Our proposed MPGCN achieves the best performance in all three evaluation metrics. In the general comparison, because of the complexity of the data, the mathematical-statistical model, ARIMA, performs worst. The machine learning models have similar prediction performance: they predict short-term flow well, but their accuracy drops sharply for long-term prediction because they do not consider spatial geographic relationships. The neural network model LSTM performs respectably, with better results than the mathematical-statistical model and the other machine learning models, but it is still inferior to our models. It is worth mentioning that the results of STGCN and DCRNN are close to those of our GCN2Flow. Besides, applying passenger mobility patterns in a prediction model (MP-LSTM) further enhances its performance. In particular, MPGCN achieves a 5.3% MSE reduction compared to the other baselines. To sum up, the capability of GCN2Flow and the effectiveness of combining passenger mobility patterns in MPGCN justify their application to passenger flow prediction.
IV-E A Case Study of Route Optimization
On the one hand, passenger flow prediction can be supplied to many downstream tasks. On the other hand, the feedback results of downstream tasks can also verify the accuracy of prediction. In our case, we define a route optimization task for allocating passenger flow and improving travel experience on the bus, which demonstrates the value of our proposed MPGCN in another way.
In the experiment, based on the OD matrix, we use the prediction results to recommend to each passenger an optimal route that satisfies the objective function and the constraint conditions (Eq. 13, 14). We select the data of the last day in the dataset to carry out the optimization experiment. Then, we recount the last day's flow after optimizing the passengers' travel routes and calculate the standard deviation of the flow over all bus stops as a simple evaluation metric. As Fig. 11 shows, our route optimization has little impact on the overall passenger flow because of our constraint conditions. At the same time, it reduces the standard deviation of the flow over all bus stops, which means that it balances the flow across bus stops, reduces the probability of congestion on buses, and achieves traffic diversion in bus travel. Hence, the prediction results of our MPGCN are accurate enough to be effectively applied in traffic flow-based downstream tasks.
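The evaluation metric used in this case study can be illustrated with a minimal sketch that compares the standard deviation of stop flows before and after some passengers are reassigned. The stop names, flow values, and the `reassign` helper are hypothetical, the population standard deviation is an assumption, and the sketch does not implement the objective function or constraints of Eq. 13 and 14.

```python
from statistics import pstdev


def stop_flow_std(flow_by_stop):
    """Population standard deviation of the flow over all stops."""
    return pstdev(flow_by_stop.values())


def reassign(flow_by_stop, moves):
    """Return a new flow table after moving passengers between stops.

    `moves` is a list of (from_stop, to_stop, passengers) tuples.
    """
    updated = dict(flow_by_stop)
    for src, dst, count in moves:
        count = min(count, updated[src])  # never move more than is available
        updated[src] -= count
        updated[dst] += count
    return updated


# Hypothetical daily flow per stop, for illustration only.
before = {"s1": 120, "s2": 40, "s3": 80, "s4": 20}
after = reassign(before, [("s1", "s2", 30), ("s3", "s4", 20)])

print("total flow before/after:", sum(before.values()), sum(after.values()))
print("std before:", stop_flow_std(before))
print("std after :", stop_flow_std(after))
```

In this toy example the total flow is unchanged while the standard deviation drops, which is the kind of balancing effect the metric is meant to capture.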
Overall, our study focuses on solving the problem of passenger traffic prediction using a novel concept called the passenger mobility pattern. According to the result analysis, deep clustering with GCN can implicitly extract passenger mobility patterns in the sharing-stop network. In terms of spatial characteristics, passengers of the same mobility pattern are highly similar. That is to say, they share similar bus stops and routes, and their contributions to the flow of specific routes are dominant. However, our sharing-stop network does not include the relationship of passenger travel time, so we were unable to uncover any potential temporal regularities in the extracted mobility patterns, which merits more investigation in the future. Besides, our approach can also be used for other modes of transport such as the subway. However, it should be highlighted that we are primarily interested in mobility patterns based on frequent and consistent travel, which requires sufficient co-occurrence relationships among passengers. Therefore, our method may not be suitable for the traffic flow prediction task of infrequently used transport modes. For example, if a passenger only travels by air once or twice a year, it is hard to discover their mobility pattern.
In addition, the route optimization problem in public transport systems involves a variety of specific tasks in different scenarios, including route network design, frequency setting, timetable optimization, schedule optimization, and passenger assignment adjustment. The objective functions, such as travel time, waiting time, path length, and the amount of crowding on buses, vary with the sub-problem. In essence, our case study of route optimization concerns passenger assignment adjustment, so we do not operate on route design, bus timetables, or schedules. Unlike the tasks mentioned above, our optimization aims at passenger diversion and a lower level of bus crowding, as stated in Section III-D. In practice, passengers can be informed of route congestion and recommended an alternative travel route to alleviate overcrowding at bus stops. There are also shortcomings in our route optimization. For example, the new route selected in the optimization may be longer than the original route, resulting in a longer travel time or more transfers. Therefore, our case study is merely a naive application example: it demonstrates the advantages of the passenger flow prediction of our MPGCN and also offers researchers another idea for extending it to practical industrial applications.
VI Conclusion and Future Work
In this paper, we put forward a passenger flow prediction framework, MPGCN, comprising sharing-stop network construction, passenger mobility pattern recognition, and passenger flow prediction. We conducted experiments to analyze the extracted mobility patterns and found that different mobility patterns fit heavy-tailed distributions well and have their own travel regularities and route preferences. Our framework gives full consideration to the impact of passenger mobility patterns on prediction. Extensive experiments on real bus record data demonstrate that MPGCN can accurately predict bus passenger flow. Finally, we designed a simple case study, which shows the value and application of our accurate prediction in downstream tasks such as route optimization. Besides, the prediction results of MPGCN can be applied extensively in other services and strategies of ITS for sustainable public transportation, such as subsequent bus scheduling, route planning, and congestion management.
When sufficient multi-source sensor data are available, we will attempt to provide fine-grained analysis and services based on human mobility patterns. In further research, we will construct different personal mobility prediction architectures that consider specific Spatio-temporal information.
[Analysis of Passenger Mobility Patterns]
TABLE VI shows the flow contribution proportions of passengers of the four mobility patterns on the top 40 bus routes by total passenger flow, which are used to analyze the route preferences and travel rules of the mobility patterns.
|Route No.||Pattern No. (%)||Total flow||Route No.||Pattern No. (%)||Total flow|
- (2018) Spatio-temporal data mining: a survey of problems and methods. ACM Comput. Surv. 51 (4).
- (2017) A multi-pattern deep fusion model for short-term bus passenger flow forecasting. Applied Soft Computing 58, pp. 669–680.
- (2017) Time reliability measures in bus transport services from the accurate use of automatic vehicle location raw data. Quality and Reliability Engineering International 33 (5), pp. 969–978.
- (2020) Fare evasion in public transport systems: a review of the literature. Public Transport 12, pp. 27–88. doi: 10.1007/s12469-019-00225-w.
- (2020) Structural deep clustering network. In Proceedings of The Web Conference 2020, WWW ’20, pp. 1400–1410.
- (2017) Passenger routing for periodic timetable optimization. Public Transport 9 (1), pp. 115–135.
- (2014) Spectral networks and locally connected networks on graphs. In International Conference on Learning Representations (ICLR).
- (2018) Bike flow prediction with multi-graph convolutional networks. In Proceedings of the 26th ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems, SIGSPATIAL ’18, pp. 397–400.
- (2016) Mining human mobility patterns from social geo-tagged data. Pervasive and Mobile Computing 33, pp. 91–107.
- (2020) Traffic graph convolutional recurrent neural network: a deep learning framework for network-scale traffic learning and forecasting. IEEE Transactions on Intelligent Transportation Systems 21 (11), pp. 4883–4894.
- (2017) Language modeling with gated convolutional networks. Proceedings of Machine Learning Research, Vol. 70, pp. 933–941.
- (2015) Time-aware multivariate nearest neighbor regression methods for traffic flow prediction. IEEE Transactions on Intelligent Transportation Systems 16 (6), pp. 3393–3402.
- (2020) Deep irregular convolutional residual LSTM for urban traffic passenger flows prediction. IEEE Transactions on Intelligent Transportation Systems 21 (3), pp. 972–985.
- (2019) The sensable city: a survey on the deployment and management for smart city monitoring. IEEE Communications Surveys & Tutorials 21 (2), pp. 1533–1560.
- (2021) A survey on the transit network design and frequency setting problem. Public Transport, pp. 1–36.
- Seasonal and trend time series forecasting based on a quasi-linear autoregressive model. Applied Soft Computing 24, pp. 13–18.
- Forecasting model of traffic flow prediction model based on multi-resolution SVR. Proceedings of the 2019 3rd International Conference on Innovation in Artificial Intelligence, ICIAI 2019, pp. 1–5.
- (2020) Congestion recognition for hybrid urban road systems via digraph convolutional network. Transportation Research Part C: Emerging Technologies 121, pp. 102877.
- (2016) Variational graph auto-encoders. NIPS Workshop on Bayesian Deep Learning.
- (2017) Semi-supervised classification with graph convolutional networks. In International Conference on Learning Representations (ICLR).
- (2018) Shared subway shuttle bus route planning based on transport data analytics. IEEE Transactions on Automation Science and Engineering 15 (4), pp. 1507–1520.
- (2020) Mobile edge cooperation optimization for wearable internet of things: a network representation-based framework. IEEE Transactions on Industrial Informatics.
- (2015) Short-term traffic flow prediction using seasonal ARIMA model with limited input data. European Transport Research Review 7 (3), pp. 21.
- (2018) Diffusion convolutional recurrent neural network: data-driven traffic forecasting. In International Conference on Learning Representations (ICLR).
- (2017) A novel passenger flow prediction model using deep learning methods. Transportation Research Part C: Emerging Technologies 84, pp. 74–91.
- (2007) Estimating bus passenger waiting times from incomplete bus arrivals data. Journal of the Operational Research Society 58 (11), pp. 1518–1525. doi: 10.1057/palgrave.jors.2602298.
- (2019) A survey of models and algorithms for optimizing shared mobility. Transportation Research Part B: Methodological 123, pp. 323–346.
- (2020) IABACUS: a Wi-Fi-based automatic bus passenger counting system. Energies 13 (6), pp. 1446.
- (2019) Matrix factorization for spatio-temporal neural networks with applications to urban flow prediction. In Proceedings of the 28th ACM International Conference on Information and Knowledge Management, CIKM ’19, pp. 2683–2691.
- (2019) Evaluating alternative methods to estimate bus running times by archived automatic vehicle location data. IET Intelligent Transport Systems 13 (3), pp. 523–530.
- (2019) Analysis and prediction of regional mobility patterns of bus travellers using smart card data and points of interest data. IEEE Transactions on Intelligent Transportation Systems 20 (4), pp. 1197–1214.
- (2018) On optimal and fair service allocation in mobile cloud computing. IEEE Transactions on Cloud Computing 6 (3), pp. 815–828.
- (2020) Dynamic public resource allocation based on human mobility prediction. Proc. ACM Interact. Mob. Wearable Ubiquitous Technol. 4 (1).
- (2004) Multi-scale high-speed network traffic prediction using k-factor Gegenbauer ARMA model. In 2004 IEEE International Conference on Communications (IEEE Cat. No.04CH37577), Vol. 4, pp. 2148–2152.
- (2021) GCN2CDD: a commercial district discovery framework via embedding space clustering on graph convolution networks. IEEE Transactions on Industrial Informatics. doi: 10.1109/TII.2021.3051934.
- (2018) Industrial internet of things: challenges, opportunities, and directions. IEEE Transactions on Industrial Informatics 14 (11), pp. 4724–4734.
- (2014) Sequence to sequence learning with neural networks. In Advances in Neural Information Processing Systems 27, pp. 3104–3112.
- (2017) Graph attention networks. arXiv preprint arXiv:1710.10903.
- (2020) Deep learning for spatio-temporal data mining: a survey. IEEE Transactions on Knowledge and Data Engineering.
- (2020) Predicting peak load of bus routes with supply optimization and scaled shepard interpolation: a newsvendor model. Transportation Research Part E: Logistics and Transportation Review 142, pp. 102041.
- (2018) Exploring human mobility patterns in urban scenarios: a trajectory data perspective. IEEE Communications Magazine 56 (3), pp. 142–149.
- (2019) How powerful are graph neural networks?. In International Conference on Learning Representations (ICLR).
- (2014) Universal predictability of mobility patterns in cities. Journal of the Royal Society Interface 11 (100), pp. 20140834.
- (2018) Spatio-temporal graph convolutional networks: a deep learning framework for traffic forecasting. In Proceedings of the 27th International Joint Conference on Artificial Intelligence (IJCAI).
- (2019) TrafficGAN: network-scale deep traffic prediction with generative adversarial nets. IEEE Transactions on Intelligent Transportation Systems, pp. 1–12.
- (2020) T-GCN: a temporal graph convolutional network for traffic prediction. IEEE Transactions on Intelligent Transportation Systems 21 (9), pp. 3848–3858.
- (2014) Commuting efficiency in the Beijing metropolitan area: an exploration combining smartcard and travel survey data. Journal of Transport Geography 41, pp. 175–183.