1 Introduction
With the increasing population of cities, metro lines have become more and more crowded. To improve the service efficiency of the metro system and facilitate passenger travel, metro ridership prediction has become a research hotspot in the community of Intelligent Transportation Systems (ITS). Recent works focus on designing complicated graph convolutional recurrent network architectures to capture spatial and temporal patterns. These works depend on dividing the ridership data into equal-sized intervals on the timeline and counting the ridership per time period. Metro ridership data exhibit distinct peak and valley periods: the dense ridership during peak periods needs to be divided more finely, while ridership during valley periods is sparse, so there is not enough data when the time interval is small. This preprocessing destroys information about how the ridership evolves over time, which may encode latent variables. Moreover, it restricts the task to period-to-period prediction, which is not conducive to real-time data collection and feedback to passengers.
A more practical approach is to construct a continuous-time model that defines a latent state at all times. Chen et al. (2018) proposed a new family of continuous time-series models, Neural ODEs (NODEs), which directly model the dynamics of network hidden states with an ODE solver. Rubanova et al. (2019) generalized state transitions in RNNs to continuous-time dynamics specified by Neural ODEs and proposed the Latent ODE network defined in continuous time, which makes it more effective on irregularly sampled data. Zhou et al. (2021) applied NODEs to the high-resolution reconstruction and prediction of urban flow grid maps. These works provide a new direction for metro ridership prediction.
Previous work on NODEs mainly focused on Euclidean data such as sequences and matrices. We modify them to handle non-Euclidean data such as graph networks, yielding Graph ODEs (GODEs). To facilitate performance comparison between models, we use the three complementary graphs proposed by PVCGN (Liu et al., 2020) as spatial information to learn the relationships between metro stations. Inspired by Latent ODEs, we use a GODE-RNN structure as the encoder to learn the initial state and hidden states containing the characteristics of ridership. Subsequently, we modify the GODE network used for prediction so that the ridership information and its hidden vectors can be incorporated into the prediction process, which mitigates the cumulative error caused by long time series. To verify the effectiveness of our STR-GODEs, we conduct experiments on two large-scale benchmarks (i.e., Shanghai Metro and Hangzhou Metro) against widely used baselines and state-of-the-art models; the evaluation results show that our approach outperforms existing state-of-the-art methods under various settings.
In summary, our contribution is threefold:

We extend Neural ODE networks to time-series prediction on graph networks and, to the best of our knowledge, are the first to apply them to metro ridership prediction.

By combining the ridership data and its hidden state, we improve the prediction model of the neural graph ODE network so that it reduces the cumulative error over long time series.

Our STR-GODEs achieves better accuracy and robustness, and is no longer limited to the equal-sized time slices and fixed-horizon prediction of previous work, making it better suited to real-life deployment. Since this is early-stage work, we believe this direction is promising.
2 Background and Related Works
Traffic States Prediction Traffic forecasting depends on the combination of spatial-temporal features and has been studied for decades. Traditional methods (Liu et al., 2019; Milenković et al., 2018; Williams and Hoel, 2003; Tan et al., 2009) can only consider the relation of a single station in the temporal dimension, lack spatial information, and can only learn simple time-series models. In recent years, deep neural networks have become the mainstream method in this field. Early works (Yao et al., 2018, 2019; Liu et al., 2018) mainly divided the studied cities into regular grid maps and transformed the raw traffic data into tensors, using CNNs to capture the spatial correlation among nearby regions. This preprocessing introduces structural noise into traffic systems with irregular topology, which degrades model performance. Graph convolution networks (Defferrard et al., 2016; Kipf and Welling, 2016) greatly alleviate the disadvantages of previous work on spatial information. Recent works (Zhao et al., 2019; Geng et al., 2019; Bai et al., 2020) focus on designing complicated graph convolutional recurrent network architectures to capture spatial and temporal patterns. DCRNN (Li et al., 2017) reformulates the spatial dependency of traffic as a diffusion process and extends previous GCNs to directed graphs. Graph WaveNet (Wu et al., 2019) captures hidden spatial dependencies by constructing an adaptive dependency matrix learned through node embeddings. Liu et al. (2020) construct three complementary graphs to exploit the physical topology of the metro; a Graph Convolution Gated Recurrent Unit (GC-GRU) learns the spatial-temporal representation and a Fully-Connected Gated Recurrent Unit (FC-GRU) captures the global evolution tendency. However, due to the limitations of the RNN structure, deficiencies remain in the temporal dimension.
Graph Neural Network Unlike traditional Euclidean data, graph data are difficult to embed into Euclidean space losslessly. Graph Convolution Networks (GCNs) (Defferrard et al., 2016; Kipf and Welling, 2016) have been proposed to automatically learn feature representations on graphs and are widely used in node classification (Kipf and Welling, 2016), link prediction (Zhang and Chen, 2018), and graph classification (Ying et al., 2018; Wu et al., 2020). There are two mainstreams of graph convolution networks: spectral-based approaches (Bruna et al., 2013; Defferrard et al., 2016; Kipf and Welling, 2016) and spatial-based approaches (Atwood and Towsley, 2016; Gilmer et al., 2017; Hamilton et al., 2017). In these approaches, the adjacency matrix is considered prior knowledge and is fixed throughout training. Recently, researchers (Li et al., 2018; Wu et al., 2019; Bai et al., 2020) have paid more attention to operating on both spatial and temporal dimensions without a predefined graph structure.
Neural Ordinary Differential Equations
Built upon previous works (Weinan, 2017; Lu et al., 2018), Chen et al. (2018) proposed a new family of continuous-time models named Neural ODEs, which can be interpreted as a continuous equivalent of residual networks (He et al., 2016). A basic formulation of Neural ODEs is:

\frac{dh(t)}{dt} = f(h(t), t, \theta) \quad (1)

where f is parametrized as a neural network. The hidden state h(t) is defined at all times and can be evaluated at any desired time using a numerical black-box ODE solver:

h(t_1) = h(t_0) + \int_{t_0}^{t_1} f(h(t), t, \theta)\, dt = \mathrm{ODESolve}(h(t_0), f, t_0, t_1, \theta) \quad (2)
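Equation (2) only requires a numerical integrator. A minimal fixed-step Euler sketch (the black-box solvers used by Chen et al. are adaptive and more sophisticated) illustrates how the hidden state can be queried at an arbitrary time; the dynamics and step count below are illustrative assumptions:

```python
import numpy as np

def odesolve_euler(f, h0, t0, t1, steps=1000):
    """Fixed-step Euler approximation of h(t1) = h(t0) + integral of f."""
    h = np.asarray(h0, dtype=float)
    t, dt = t0, (t1 - t0) / steps
    for _ in range(steps):
        h = h + dt * f(h, t)   # Euler update: h <- h + dt * dh/dt
        t += dt
    return h

# Toy dynamics dh/dt = -h with h(0) = 1 has the exact solution exp(-t),
# so the hidden state can be evaluated at any desired time.
h1 = odesolve_euler(lambda h, t: -h, h0=[1.0], t0=0.0, t1=1.0)
```

Any time t1 can be requested without retraining, which is the property the continuous-time formulation relies on.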
By using the adjoint method (Chen et al., 2018), the gradients w.r.t. \theta can be computed memory-efficiently during backpropagation. Recent works (Dupont et al., 2019; Massaroli et al., 2020; Davis et al., 2020; Yan et al., 2019; Ghosh et al., 2020) have analyzed this framework theoretically, overcome its instability issues, and improved its memory efficiency. Researchers have also applied NODEs to other fields such as medical imaging (Pinckaers and Litjens, 2019; Du et al., 2020), video generation (Kanaa et al., 2019; Yildiz et al., 2019), and graph data (Xhonneux et al., 2020).
3 Method
3.1 Problem Definition
Previous work does not support prediction over continuous or unequal-density time intervals. To facilitate performance comparison between models, we follow the traditional prediction framework. Assuming N is the number of metro stations, the data of time period T can be expressed as X = (X_1, X_2, \dots, X_T), where X_t \in \mathbb{R}^{N \times 2} represents the passenger counts of inflow and outflow of each station at time interval t. Our target is to predict the future ridership sequence based on the observed historical values:

\hat{X}_{t+1}, \dots, \hat{X}_{t+k} = \mathcal{F}(X_{t-m+1}, \dots, X_t; \mathcal{G}) \quad (3)

where \mathcal{G} denotes the predefined graphs and \mathcal{F} is the learned prediction model.
3.2 STR-GODEs
Suppose z_t is the latent state encoding the ridership features at time interval t, and f is the function describing how the ridership changes with spatial, temporal, and ridership information. By the Euler method, z_{t+\Delta t} = z_t + \Delta t \, f(z_t, t). The primal problem can thus be abstracted as learning the ridership trend function f from the training data and the predefined graphs (the physical-virtual graphs of Liu et al. (2020)), obtaining the initial state from the observation sequence, and predicting the subsequent sequence.
Chen et al. (2018) present a continuous-time, generative approach to modeling time series. Each trajectory is determined by a local initial state z_{t_0} and a global set of latent dynamics shared across all time series:

z_{t_0} \sim p(z_{t_0}) \quad (4)

z_{t_1}, \dots, z_{t_N} = \mathrm{ODESolve}(z_{t_0}, f, \theta_f, t_0, \dots, t_N) \quad (5)

x_{t_i} \sim p(x \mid z_{t_i}, \theta_x), \quad i = 0, 1, \dots, N \quad (6)
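The generative process in Eqs. (4)-(6) can be sketched end to end. The dimensions, the linear dynamics, and the linear decoder below are illustrative assumptions, not the learned networks:

```python
import numpy as np

rng = np.random.default_rng(0)
A = 0.1 * rng.normal(size=(4, 4))   # hypothetical latent dynamics parameters
W = rng.normal(size=(2, 4))         # decoder mapping z to the mean of p(x | z)

def f(z, t):
    return A @ z                    # shared latent dynamics dz/dt = f(z, t)

def odesolve(z0, ts, steps=100):
    """Euler-integrate z0 through every requested (possibly irregular) time."""
    z = np.asarray(z0, dtype=float)
    zs, t = [z], ts[0]
    for t_next in ts[1:]:
        dt = (t_next - t) / steps
        for _ in range(steps):
            z = z + dt * f(z, t)
            t += dt
        zs.append(z)
    return np.stack(zs)

z0 = rng.normal(size=4)                    # Eq. (4): sample the local initial state
zs = odesolve(z0, [0.0, 0.3, 0.7, 1.5])    # Eq. (5): one solver call, many times
xs = zs @ W.T                              # Eq. (6): mean of the emission p(x | z)
```

Note that the four requested time stamps are irregularly spaced; the solver handles them in a single call, which is exactly what period-based models cannot do.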
We follow the framework of Chen et al. (2018) for both training and prediction. We modify the original ODE block to make it suitable for graph network data, calling the result the GODE block. As shown in Figure 1, inspired by Rubanova et al. (2019), we use the GODE-RNN structure to obtain the local initial state z_{t_0} by learning the spatial, temporal, and ridership features of the observations. In particular, because the GODE block needs to gather information from adjacent metro stations, we use the GODE-RNN to learn the approximate posterior of the local initial state z_{t_0} directly instead of learning its mean and standard deviation:

z_{t_0} = \mathrm{GODE\text{-}RNN}\left(\{x_{t_i}, t_i\}_{i=0}^{N}\right) \quad (7)
Furthermore, by combining the ridership information and its hidden state, we further improve the performance of the prediction model so that it can reduce the cumulative error over a long time series.
3.3 Physical-Virtual Graphs
In this section, we follow the predefined graphs proposed by PVCGN (Liu et al., 2020) to learn spatial information. In our work, the physical graph, similarity graph, and correlation graph are denoted as \mathcal{G}_p = (\mathcal{V}, \mathcal{E}_p, W_p), \mathcal{G}_s = (\mathcal{V}, \mathcal{E}_s, W_s), and \mathcal{G}_c = (\mathcal{V}, \mathcal{E}_c, W_c), where \mathcal{V} is the set of nodes representing real-world metro stations (|\mathcal{V}| = N), \mathcal{E} is a set of edges, and W denotes the weights of all edges.
Physical Graph: \mathcal{G}_p is built directly from the physical topology of the studied metro system. We first construct a physical connection matrix P \in \mathbb{R}^{N \times N}, where P(i, j) = 1 if there exists an edge between nodes i and j, and P(i, j) = 0 otherwise. Note that each diagonal value P(i, i) is set to 0. Finally, the edge weights are obtained by performing a linear normalization on each row:

W_p(i, j) = \frac{P(i, j)}{\sum_{k=1}^{N} P(i, k)} \quad (8)
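The row normalization of Eq. (8) is a one-liner; the 4-station line below is a made-up example:

```python
import numpy as np

# Hypothetical 4-station line 1-2-3-4: binary physical connections, zero diagonal.
P = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)

# Eq. (8): linear normalization of each row turns connections into edge weights.
W_p = P / P.sum(axis=1, keepdims=True)
```

Each row of W_p now sums to one, so an interior station splits its weight evenly between its two physical neighbors.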
Similarity Graph: The similarities of metro stations are used to guide the construction of \mathcal{G}_s. We construct a similarity score matrix S \in \mathbb{R}^{N \times N}, where the score S(i, j) between stations i and j is computed with Dynamic Time Warping (Berndt and Clifford, 1994) on their historical ridership series. Based on the matrix S, we select station pairs to build edges, using a predefined similarity threshold of 0.1 for HZMetro and choosing the top ten stations with the highest similarity scores for SHMetro. Finally, we calculate the edge weights by row normalization on S:

W_s(i, j) = \frac{S(i, j)}{\sum_{k} S(i, k)} \quad (9)

where the sum runs over all k such that \mathcal{E}_s contains an edge connecting nodes i and k; W_s(i, j) = 0 otherwise.
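A sketch of the similarity-graph construction follows. The DTW recursion is the classical one; the transform from DTW distance to similarity score (1 / (1 + DTW)) and the three toy ridership curves are assumptions for illustration, since the source does not reproduce PVCGN's exact scoring formula:

```python
import numpy as np

def dtw(a, b):
    """Classic O(nm) dynamic time warping distance between two series."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

# Hypothetical daily ridership curves for three stations.
series = [np.array([1.0, 2.0, 3.0, 2.0]),
          np.array([1.0, 2.0, 3.0, 2.0]),   # identical pattern to station 0
          np.array([9.0, 9.0, 9.0, 9.0])]   # very different pattern

n = len(series)
S = np.zeros((n, n))
for i in range(n):
    for j in range(n):
        if i != j:
            # Assumed transform: smaller DTW distance -> larger similarity.
            S[i, j] = 1.0 / (1.0 + dtw(series[i], series[j]))

edges = S >= 0.1                       # threshold rule (0.1 for HZMetro)
W_s = np.where(edges, S, 0.0)
W_s = W_s / np.maximum(W_s.sum(axis=1, keepdims=True), 1e-12)  # Eq. (9)
```

Stations 0 and 1 end up connected with full weight, while the dissimilar station 2 falls below the threshold and gets no similarity edges.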
Correlation Graph: We utilize the origin-destination distribution of ridership to build the virtual graph \mathcal{G}_c. First, we construct a correlation ratio matrix C \in \mathbb{R}^{N \times N} with C(i, j) = D(i, j) / \sum_{k} D(i, k), where D(i, j) is the total number of passengers that traveled from station j to station i in the whole training set. Based on the matrix C, we select station pairs to build edges, using a predefined threshold of 0.02 for HZMetro and choosing the top ten stations with the highest correlation scores for SHMetro. Then, the edge weights are calculated by:

W_c(i, j) = \frac{C(i, j)}{\sum_{k} C(i, k)} \quad (10)

where the sum runs over all k such that \mathcal{E}_c contains an edge connecting nodes i and k.
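Both edge-selection rules (threshold for HZMetro, top-k for SHMetro) reduce to simple array operations. The OD counts below are fabricated for illustration, and k is shrunk from 10 to 2 to fit the toy size:

```python
import numpy as np

# Hypothetical OD counts: D[i, j] = passengers who traveled from station j to i.
D = np.array([[  0., 120.,  10.],
              [300.,   0.,  40.],
              [ 20.,  60.,   0.]])

C = D / D.sum(axis=1, keepdims=True)       # correlation ratios C(i, j)

# Edge selection: either a threshold (0.02 for HZMetro) ...
edges = C >= 0.02
# ... or the top-k origins per station (k = 10 for SHMetro; k = 2 in this toy).
k = 2
rows = np.arange(C.shape[0])[:, None]
topk_idx = np.argsort(C, axis=1)[:, -k:]
topk = np.zeros_like(C)
topk[rows, topk_idx] = C[rows, topk_idx]

W_c = topk / topk.sum(axis=1, keepdims=True)   # Eq. (10): row normalization
```

The zero diagonal survives both rules, so no station is connected to itself in the correlation graph.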
3.4 GODE Block
In general, a graph convolutional network aggregates the information of the neighbor nodes around a target node and can be expressed as (Defferrard et al., 2016; Kipf and Welling, 2016; Hamilton et al., 2017; Veličković et al., 2017):

h_i^{(k)} = \gamma\left(h_i^{(k-1)}, \mathrm{AGG}_{j \in \mathcal{N}(i)}\, \phi\left(h_i^{(k-1)}, h_j^{(k-1)}, e_{j,i}\right)\right) \quad (11)

where h_i^{(k)} refers to the hidden state of node i in the k-th layer, e_{j,i} refers to the feature vector of the edge from node j to node i, \mathrm{AGG} is the aggregation of neighbor information, \phi is the transformation of neighbor information, and \gamma is a transformation that fuses a node's own features with its aggregated neighbors' information. These and related methods can be understood as special cases of a simple differentiable message-passing framework (Gilmer et al., 2017):

h_i^{(l+1)} = \sigma\left(\sum_{m \in \mathcal{M}_i} g_m\left(h_i^{(l)}, h_j^{(l)}\right)\right) \quad (12)

where h_i^{(l)} is the hidden state of node v_i in the l-th layer of the neural network. Incoming messages of the form g_m(\cdot, \cdot) are accumulated and passed through an element-wise activation function \sigma(\cdot). \mathcal{M}_i denotes the set of incoming messages for node v_i and is often chosen to be identical to the set of incoming edges. It can also be expressed in the following form:

h_i^{(l+1)} = \sigma\left(W_0^{(l)} h_i^{(l)} + \sum_{j \in \mathcal{N}_i} \frac{1}{c_i} W^{(l)} h_j^{(l)}\right) \quad (13)
According to Neural ODEs (Chen et al., 2018), the hidden state h(t) can be defined as the solution to an ODE initial-value problem:

\frac{dh(t)}{dt} = f(h(t), t, \theta), \quad h(t_0) = h_0 \quad (14)

Inspired by RGCN (Schlichtkrull et al., 2018), we define our GODE function by using learnable weights \Theta_0 to model the changes of each station's latent state over time and learnable weights \Theta_r to model the association between stations. For every station i, we have:

f\left(h_i(t)\right) = \sigma\left(\Theta_0 h_i(t) + \sum_{r \in \{p, s, c\}} \sum_{j \in \mathcal{N}_i^r} W_r(i, j)\, \Theta_r h_j(t)\right) \quad (15)

where W_r(i, j) is the edge weight of graph \mathcal{G}_r. For any time t_1, we can calculate the latent representation by:

h(t_1) = \mathrm{GODESolve}\left(h(t_0), f, t_0, t_1, \theta\right) \quad (16)
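A sketch of Eqs. (15) and (16) over the three predefined graphs follows. Matrix sizes, the Euler integrator standing in for GODESolve, and the random stand-ins for W_p, W_s, W_c are all illustrative assumptions:

```python
import numpy as np

def gode_func(H, graphs, T0, Ts):
    """dH/dt of Eq. (15): a self term plus one weighted-neighbor term per
    predefined graph (physical, similarity, correlation)."""
    out = H @ T0
    for Wg, Tg in zip(graphs, Ts):
        out = out + (Wg @ H) @ Tg     # edge weights Wg play the role of W_r(i, j)
    return np.tanh(out)

def gode_solve(H0, graphs, T0, Ts, t0, t1, steps=50):
    """Eq. (16): evaluate the latent representation at any time t1."""
    H, dt = H0, (t1 - t0) / steps
    for _ in range(steps):
        H = H + dt * gode_func(H, graphs, T0, Ts)   # Euler step (sketch)
    return H

rng = np.random.default_rng(0)
n, d = 4, 6
graphs = [rng.random((n, n)) for _ in range(3)]     # stand-ins for W_p, W_s, W_c
T0 = 0.1 * rng.normal(size=(d, d))
Ts = [0.1 * rng.normal(size=(d, d)) for _ in range(3)]
H1 = gode_solve(rng.normal(size=(n, d)), graphs, T0, Ts, t0=0.0, t1=1.0)
```

Because the dynamics are defined for all t, the same parameters answer queries at 15, 30, or any intermediate number of minutes.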
3.5 STR-GODEs Encoder
Rubanova et al. (2019) proposed using an ODE-RNN as the encoder of a Latent ODE model to learn the mean and standard deviation of the approximate posterior over the local initial state z_{t_0}. As shown in Algorithm 1, we make some modifications to the ODE-RNN. We replace ODESolve with GODESolve, which can be applied to graph data, and we use the GODE-RNN to learn the approximate posterior of the local initial state z_{t_0} directly instead of learning its mean and standard deviation. The hidden state of the ridership information is learned and stored in hidden, which is combined with the observation in the RNNCell. In our work, we use the classical GRU (Cho et al., 2014) structure as the RNNCell. Specifically, the reset gate r_i, update gate u_i, and new information c_i are computed by:
\tilde{x}_i = \left[x_{t_i}, hidden_i\right] \quad (17)

r_i = \sigma\left(W_r \tilde{x}_i + U_r z'_i + b_r\right) \quad (18)

u_i = \sigma\left(W_u \tilde{x}_i + U_u z'_i + b_u\right) \quad (19)

c_i = \tanh\left(W_c \tilde{x}_i + U_c (r_i \odot z'_i) + b_c\right) \quad (20)

z_i = u_i \odot z'_i + (1 - u_i) \odot c_i \quad (21)

where z'_i is the latent state evolved by GODESolve and \odot denotes element-wise multiplication.
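The encoder loop can be sketched as an ODE-RNN: evolve the latent state between observation times, then correct it with the GRU cell at each observation. The single Euler step standing in for GODESolve, the toy dynamics f, and all dimensions are assumptions:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_cell(x, h, params):
    """Classical GRU update (Cho et al., 2014) used as the RNNCell."""
    Wr, Ur, Wu, Uu, Wc, Uc = params
    r = sigmoid(Wr @ x + Ur @ h)             # reset gate
    u = sigmoid(Wu @ x + Uu @ h)             # update gate
    c = np.tanh(Wc @ x + Uc @ (r * h))       # candidate state
    return u * h + (1.0 - u) * c

def gode_rnn_encode(xs, ts, params, f, d_h):
    """ODE-RNN style encoder: evolve z with the dynamics f between
    observation times, then fold each observation in with the RNNCell."""
    z, t = np.zeros(d_h), ts[0]
    for x, t_i in zip(xs, ts):
        z = z + (t_i - t) * f(z)     # one Euler step standing in for GODESolve
        z = gru_cell(x, z, params)   # correct the state with the observation
        t = t_i
    return z                         # approximate posterior of z_{t0} (Eq. 7)

rng = np.random.default_rng(0)
d_in, d_h = 2, 8
params = [0.1 * rng.normal(size=(d_h, d_in)) if i % 2 == 0
          else 0.1 * rng.normal(size=(d_h, d_h)) for i in range(6)]
xs = rng.random((4, d_in))           # four (inflow, outflow) observations
z0 = gode_rnn_encode(xs, [0.0, 1.0, 2.0, 3.0], params, f=lambda z: -z, d_h=d_h)
```

Because the continuous dynamics carry the state between observations, the gaps in ts need not be equal, which is what makes the encoder robust to irregular sampling.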
3.6 STR-GODEs Decoder
Before prediction, we convert the latent state obtained from the GODE-RNN into the decoder's initial state with a transform block composed of a fully-connected layer, a tanh activation function, and another fully-connected layer. Previous work uses Neural ODEs as the decoder for prediction; when the sequence is long, the error accumulates gradually. Similar to the GODE-RNN, we introduce the ridership information and its hidden state into the GODESolver through an RNNCell structure, as shown in Algorithm 2. The experimental results show that this reduces the cumulative error over long time series. Specifically, we use the classical GRU (Cho et al., 2014) structure described in Section 3.5 as the RNNCell.
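The decoder loop can be sketched as follows. The linear readout, the tanh correction standing in for the RNNCell, and all weights are hypothetical; the point is the structure: evolve, predict, feed the prediction back:

```python
import numpy as np

def decode(z0, ts, f, readout, correct):
    """Decoder sketch: between prediction times the latent state evolves
    under f; each prediction is then fed back through an RNN-style
    correction so errors do not accumulate unchecked over long horizons."""
    preds, z, t = [], z0, ts[0]
    for t_next in ts[1:]:
        z = z + (t_next - t) * f(z)      # GODESolve step (Euler sketch)
        x_hat = readout(z)               # predicted ridership at t_next
        z = correct(z, x_hat)            # fold the prediction back into z
        preds.append(x_hat)
        t = t_next
    return np.stack(preds)

rng = np.random.default_rng(0)
d_h, d_x = 8, 2
R = 0.1 * rng.normal(size=(d_x, d_h))    # linear readout (assumed)
U = 0.1 * rng.normal(size=(d_h, d_x))    # correction weights (assumed)
preds = decode(rng.normal(size=d_h), [0.0, 0.25, 0.5, 0.75, 1.0],
               f=lambda z: -z,
               readout=lambda z: R @ z,
               correct=lambda z, x: np.tanh(z + U @ x))
```

Without the correct step this reduces to the plain Latent ODE decoder, whose error grows with the horizon; the feedback step is the modification the section describes.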
4 Experiment
4.1 Datasets
To evaluate the performance of our work, we conduct experiments on two real-world traffic datasets: SHMetro and HZMetro (https://tianchi.aliyun.com/competition/entrance/231708/information).
SHMetro: The SHMetro dataset refers to the ridership data on the metro system of Shanghai, China. There are 288 metro stations connected by 958 physical edges. A total of 811.8 million transaction records were collected from Jul. 1st 2016 to Sept. 30th 2016, with 8.82 million ridership per day. Following the setting of PVCGNLiu et al. (2020), for each station, we measured its inflow and outflow every 15 minutes by counting the number of passengers entering or exiting the station. The ridership data of the first two months and that of the last three weeks are used for training and testing, while the ridership data of the remaining days are used for validation.
HZMetro: The HZMetro dataset refers to the ridership data on the metro system of Hangzhou, China. There are 80 metro stations connected by 248 physical edges, with 2.35 million ridership per day. The time interval of this dataset is also set to 15 minutes. Similar to SHMetro, this dataset is divided into three parts: a training set (Jan. 1st to Jan. 18th), a validation set (Jan. 19th to Jan. 20th), and a testing set (Jan. 21st to Jan. 25th).
4.2 Experimental Settings
We have trained the official code of PVCGN (Liu et al., 2020), whose results show only a small gap from those reported in the corresponding paper. To reduce performance differences caused by different parameters and to facilitate model comparison, we keep the parameters of STR-GODEs the same as those in the official PVCGN code.
We implement our STR-GODEs in Python with PyTorch 1.6.0 and run experiments on a server with one NVIDIA Titan X GPU. The lengths of the input and output sequences are both set to 4. The input data and the ground-truth outputs are normalized with Z-score normalization (https://en.wikipedia.org/wiki/Standard_score) before being fed into the network. The batch size is set to 8 and the feature dimensionality d is set to 256. The initial learning rate is 0.001 with a decay ratio of 0.1. We optimize the models with the Adam optimizer (Kingma and Ba, 2014) for a maximum of 200 epochs by minimizing the mean absolute error between the predicted results and the corresponding ground truths. The best parameters for all deep learning models are chosen through a careful parameter-tuning process on the validation set.
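Z-score normalization and its inverse are simple to state; the toy matrix below is illustrative, and whether the statistics are computed globally or per channel is an implementation choice not specified in the text:

```python
import numpy as np

# Toy ridership matrix (time steps x channels); real data come from the
# benchmark files.
train = np.array([[10., 200.],
                  [14., 260.],
                  [12., 230.]])

mean, std = train.mean(), train.std()    # statistics from the training split

def normalize(x):
    return (x - mean) / std              # Z-score normalization

def denormalize(x):
    return x * std + mean                # invert before computing metrics

z = normalize(train)
```

Predictions must be passed through denormalize before MAE/RMSE/MAPE are computed, otherwise the metrics are reported in normalized units.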
4.3 Compared Methods
To evaluate the overall performance of our work, we compare STR-GODEs with widely used baselines and state-of-the-art models:
DCRNN: Huang et al. (2019) proposed a deep learning framework specially designed for traffic prediction, which uses bidirectional random walks on graphs to capture spatial dependencies and an encoder-decoder architecture to learn temporal correlations. We implement this method with its official code.
Graph WaveNet: Wu et al. (2019) developed an adaptive dependency matrix to capture hidden spatial dependencies and used stacked dilated 1D convolutions to handle very long sequences. We implement this method with its official code.
ST-ODEs: Zhou et al. (2021) applied NODEs to the prediction of urban flow grid maps. Their model concatenates the data over different time intervals, feeds it into a CNN layer to obtain the initial state, and then uses a Neural ODE network for prediction. In this experiment, we reproduced the code and extended the ODE network to a GODE network applicable to graph data.
Latent ODEs: Rubanova et al. (2019) refine the Latent ODE model of Chen et al. (2018) by using an ODE-RNN as the recognition network, where the ODE-RNN generalizes RNNs with continuous-time hidden dynamics defined by ordinary differential equations. In the experiment, we modified its official code and extended the ODE network to a GODE network applicable to graph data.
PVCGN: Liu et al. (2020) developed a Seq2Seq model with GC-GRU and FC-GRU to forecast future metro ridership sequentially, where the Graph Convolution Gated Recurrent Unit (GC-GRU) learns the spatial-temporal representation and the Fully-Connected Gated Recurrent Unit (FC-GRU) captures the global evolution tendency. We implement this method with its official code.
Table 1: Performance comparison on HZMetro (conventional prediction).

Time | Metric | DCRNN | Graph WaveNet | ST-ODEs | Latent ODEs | PVCGN | STR-GODEs
15 min | MAE | 23.81 | 24.01 | 24.54 | 22.79 | 22.70 | 22.87
15 min | RMSE | 40.52 | 40.68 | 42.43 | 38.61 | 37.78 | 37.71
15 min | MAPE | 14.13% | 14.31% | 15.01% | 13.54% | 13.54% | 13.78%
30 min | MAE | 25.20 | 25.45 | 25.34 | 23.53 | 23.71 | 23.24
30 min | RMSE | 42.51 | 42.02 | 43.49 | 39.63 | 39.96 | 38.69
30 min | MAPE | 14.89% | 15.18% | 15.43% | 13.85% | 14.06% | 14.03%
45 min | MAE | 26.86 | 27.21 | 26.56 | 24.44 | 24.53 | 23.79
45 min | RMSE | 45.93 | 46.03 | 46.00 | 41.22 | 41.31 | 39.68
45 min | MAPE | 16.02% | 17.39% | 16.50% | 14.67% | 14.66% | 14.66%
60 min | MAE | 28.41 | 29.21 | 27.45 | 24.99 | 24.88 | 24.35
60 min | RMSE | 49.23 | 49.77 | 47.61 | 42.20 | 41.61 | 40.83
60 min | MAPE | 18.06% | 19.41% | 19.01% | 16.05% | 15.68% | 15.56%
Table 2: Performance comparison on SHMetro (conventional prediction).

Time | Metric | DCRNN | Graph WaveNet | ST-ODEs | Latent ODEs | PVCGN | STR-GODEs
15 min | MAE | 24.09 | 24.86 | 24.79 | 23.51 | 23.41 | 23.21
15 min | RMSE | 46.11 | 46.82 | 46.96 | 44.64 | 45.31 | 44.58
15 min | MAPE | 17.87% | 19.98% | 18.56% | 17.64% | 17.04% | 16.99%
30 min | MAE | 25.23 | 26.73 | 25.77 | 25.06 | 24.36 | 23.63
30 min | RMSE | 49.92 | 52.01 | 50.08 | 48.91 | 48.12 | 46.28
30 min | MAPE | 18.32% | 20.42% | 19.29% | 18.37% | 17.35% | 17.12%
45 min | MAE | 26.84 | 28.71 | 26.80 | 26.18 | 25.29 | 24.65
45 min | RMSE | 55.07 | 58.32 | 53.59 | 52.60 | 51.42 | 49.93
45 min | MAPE | 19.37% | 21.93% | 20.59% | 19.22% | 17.84% | 17.58%
60 min | MAE | 28.22 | 30.61 | 27.63 | 26.83 | 26.21 | 25.56
60 min | RMSE | 60.18 | 64.02 | 56.04 | 55.78 | 55.17 | 53.39
60 min | MAPE | 20.57% | 24.22% | 22.38% | 20.28% | 18.83% | 18.25%
4.4 Performance Comparison and Analysis
We measure the performance of the predictive models with three widely used metrics: Mean Absolute Error (MAE), Root Mean Square Error (RMSE), and Mean Absolute Percentage Error (MAPE).
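The three metrics are standard; a small epsilon in MAPE is a common guard (an implementation detail assumed here, since stations can record zero ridership):

```python
import numpy as np

def mae(y, y_hat):
    return np.mean(np.abs(y - y_hat))

def rmse(y, y_hat):
    return np.sqrt(np.mean((y - y_hat) ** 2))

def mape(y, y_hat, eps=1e-8):
    # eps guards against division by zero at stations with no ridership.
    return np.mean(np.abs(y - y_hat) / (np.abs(y) + eps)) * 100.0

y = np.array([100., 200., 400.])
y_hat = np.array([110., 190., 380.])
# mae -> 13.33..., rmse -> 14.14..., mape -> 6.67% (approximately)
```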
Conventional prediction experiment: Table 1 and Table 2 present the performance of our STR-GODEs and representative comparison methods in conventional prediction experiments on the HZMetro and SHMetro datasets. Compared with Latent ODEs, we observe that the GODE network combined with ridership data and hidden states can indeed reduce the cumulative error. Compared with all other methods, STR-GODEs achieves improvements on most metrics.
Furthermore, we extract the predictions for peak periods (7:30-9:30 and 17:30-19:30) from the conventional experiments, as shown in Table 3 and Table 4. Our STR-GODEs outperforms all comparison methods consistently on most metrics on both datasets. On HZMetro, we achieve an average relative improvement over the best competing method of 4.13% in MAE, 3.59% in RMSE, and 4.28% in MAPE. On SHMetro, the improvements are 4.32% in MAE, 4.57% in RMSE, and 1.84% in MAPE.
Table 3: Performance comparison on HZMetro during peak periods.

Time | Metric | DCRNN | Graph WaveNet | ST-ODEs | Latent ODEs | PVCGN | STR-GODEs
15 min | MAE | 34.22 | 36.42 | 34.89 | 33.17 | 32.68 | 31.68
15 min | RMSE | 53.09 | 56.13 | 54.71 | 52.20 | 50.23 | 48.78
15 min | MAPE | 10.02% | 10.41% | 10.02% | 9.65% | 9.67% | 9.31%
30 min | MAE | 36.79 | 38.47 | 36.37 | 34.64 | 33.83 | 32.37
30 min | RMSE | 57.13 | 59.05 | 56.84 | 53.82 | 52.59 | 50.33
30 min | MAPE | 10.27% | 10.74% | 10.13% | 9.89% | 9.58% | 9.18%
45 min | MAE | 37.53 | 38.71 | 36.15 | 34.22 | 33.85 | 32.06
45 min | RMSE | 59.89 | 59.92 | 57.11 | 53.89 | 53.28 | 50.69
45 min | MAPE | 10.71% | 11.63% | 10.78% | 10.44% | 10.04% | 9.62%
60 min | MAE | 37.16 | 37.89 | 34.85 | 32.71 | 32.05 | 30.82
60 min | RMSE | 58.93 | 60.13 | 54.98 | 52.36 | 51.64 | 50.45
60 min | MAPE | 11.13% | 12.04% | 11.86% | 11.20% | 10.74% | 10.20%
Table 4: Performance comparison on SHMetro during peak periods.

Time | Metric | DCRNN | Graph WaveNet | ST-ODEs | Latent ODEs | PVCGN | STR-GODEs
15 min | MAE | 38.12 | 38.84 | 38.32 | 35.31 | 36.57 | 35.17
15 min | RMSE | 67.74 | 68.01 | 66.89 | 61.87 | 64.87 | 61.93
15 min | MAPE | 14.07% | 14.23% | 13.99% | 13.27% | 13.32% | 13.03%
30 min | MAE | 39.72 | 42.73 | 40.46 | 38.12 | 37.93 | 35.81
30 min | RMSE | 73.14 | 77.48 | 72.20 | 69.02 | 68.91 | 64.34
30 min | MAPE | 14.19% | 14.92% | 14.57% | 13.94% | 13.49% | 13.20%
45 min | MAE | 41.52 | 45.33 | 41.71 | 38.44 | 38.51 | 36.21
45 min | RMSE | 79.03 | 85.14 | 77.12 | 71.12 | 73.55 | 67.16
45 min | MAPE | 15.13% | 15.79% | 15.53% | 14.67% | 14.11% | 13.78%
60 min | MAE | 41.13 | 45.74 | 41.10 | 37.51 | 37.87 | 35.44
60 min | RMSE | 78.81 | 89.21 | 78.73 | 71.43 | 73.13 | 66.99
60 min | MAPE | 16.12% | 17.21% | 16.92% | 15.89% | 15.03% | 14.87%
Irregular prediction experiment: To compare the performance of different methods over continuous time or unequal time intervals, we use a compromise experiment in which the traditional methods can still work. In this experiment, assuming X_{t_1}, \dots, X_{t_m} is the observed ridership sequence, where t_1, \dots, t_m is a random subsequence of the timeline, our goal is to predict the subsequent ridership sequence:

\hat{X}_{t_{m+1}}, \dots, \hat{X}_{t_{m+k}} = \mathcal{F}\left(X_{t_1}, \dots, X_{t_m}; \mathcal{G}\right) \quad (22)
Table 5: Performance comparison on HZMetro (irregular prediction).

Time | Metric | ST-ODEs | Latent ODEs | PVCGN | STR-GODEs
 | MAE | 31.53 | 19.95 | 22.19 | 17.70
 | RMSE | 63.46 | 40.12 | 38.91 | 37.31
 | MAPE | 16.34% | 11.42% | 13.12% | 10.14%
 | MAE | 29.70 | 20.83 | 23.57 | 18.25
 | RMSE | 56.69 | 40.21 | 41.22 | 36.04
 | MAPE | 15.90% | 12.34% | 13.98% | 10.75%
 | MAE | 30.04 | 21.10 | 22.22 | 18.19
 | RMSE | 57.51 | 42.67 | 40.32 | 37.52
 | MAPE | 15.82% | 11.92% | 13.21% | 10.08%
 | MAE | 33.24 | 22.02 | 20.63 | 18.45
 | RMSE | 68.08 | 44.71 | 38.18 | 37.41
 | MAPE | 18.54% | 13.13% | 13.08% | 10.74%
Table 6: Performance comparison on SHMetro (irregular prediction).

Time | Metric | ST-ODEs | Latent ODEs | PVCGN | STR-GODEs
 | MAE | 27.32 | 16.61 | 25.18 | 16.11
 | RMSE | 59.35 | 38.46 | 51.60 | 38.66
 | MAPE | 19.32% | 12.55% | 17.61% | 11.94%
 | MAE | 27.55 | 16.36 | 25.36 | 15.69
 | RMSE | 59.45 | 38.39 | 52.73 | 37.36
 | MAPE | 19.99% | 12.66% | 18.00% | 11.96%
 | MAE | 27.94 | 15.63 | 25.01 | 14.96
 | RMSE | 61.24 | 37.24 | 53.52 | 36.33
 | MAPE | 21.13% | 12.21% | 18.21% | 11.19%
 | MAE | 28.93 | 16.12 | 22.29 | 15.66
 | RMSE | 61.88 | 39.88 | 50.83 | 39.72
 | MAPE | 24.08% | 12.24% | 16.81% | 11.02%
Table 5 and Table 6 present the performance of our STR-GODEs and representative comparison methods in the irregular prediction experiments on the HZMetro and SHMetro datasets. The results show that the neural GODE network is superior to traditional methods in irregular prediction, and the comparison with other work further verifies the effectiveness of the proposed STR-GODEs.
5 Conclusion
In this paper, we extended Neural ODE algorithms to graph networks and proposed the STR-GODEs network, which can effectively learn spatial, temporal, and ridership correlations without dividing data into equal-sized intervals on the timeline. Specifically, we use a GODE-RNN structure as the encoder to learn the initial state and hidden states containing the characteristics of ridership. Subsequently, a GODE network that incorporates the ridership information and its hidden vectors is used to predict the future sequence. Experimental results on two public large-scale datasets demonstrate the superior performance of our algorithm.
In future work, we will make further use of the continuous-time characteristics of the model to meet the requirements of real-time prediction and bring the model to passengers. At the same time, we will further optimize the time and space efficiency of the model so that it can run on mobile devices such as phones.
References

- Diffusion-convolutional neural networks. In Advances in Neural Information Processing Systems, pp. 1993-2001.
- Adaptive graph convolutional recurrent network for traffic forecasting. arXiv preprint arXiv:2007.02842.
- Using dynamic time warping to find patterns in time series. In KDD Workshop, Vol. 10, pp. 359-370.
- Spectral networks and locally connected networks on graphs. arXiv preprint arXiv:1312.6203.
- Neural ordinary differential equations. arXiv preprint arXiv:1806.07366.
- Learning phrase representations using RNN encoder-decoder for statistical machine translation. arXiv preprint arXiv:1406.1078.
- Time dependence in non-autonomous neural ODEs. arXiv preprint arXiv:2005.01906.
- Convolutional neural networks on graphs with fast localized spectral filtering. arXiv preprint arXiv:1606.09375.
- Model-based reinforcement learning for semi-Markov decision processes with neural ODEs. arXiv preprint arXiv:2006.16210.
- Augmented neural ODEs. arXiv preprint arXiv:1904.01681.
- Spatiotemporal multi-graph convolution network for ride-hailing demand forecasting. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 33, pp. 3656-3663.
- STEER: simple temporal regularization for neural ODEs. arXiv preprint arXiv:2006.10711.
- Neural message passing for quantum chemistry. In International Conference on Machine Learning, pp. 1263-1272.
- Inductive representation learning on large graphs. arXiv preprint arXiv:1706.02216.
- Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 770-778.
- Diffusion convolutional recurrent neural network with rank influence learning for traffic forecasting. In 2019 18th IEEE International Conference on Trust, Security and Privacy in Computing and Communications / 13th IEEE International Conference on Big Data Science and Engineering (TrustCom/BigDataSE), pp. 678-685.
- Simple video generation using neural ODEs. In Annual Conference on Neural Information Processing Systems.
- Adam: a method for stochastic optimization. arXiv preprint arXiv:1412.6980.
- Semi-supervised classification with graph convolutional networks. arXiv preprint arXiv:1609.02907.
- Adaptive graph convolutional neural networks. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 32.
- Diffusion convolutional recurrent neural network: data-driven traffic forecasting. arXiv preprint arXiv:1707.01926.
- Physical-virtual collaboration modeling for intra- and inter-station metro ridership prediction. IEEE Transactions on Intelligent Transportation Systems.
- Attentive crowd flow machines. In Proceedings of the 26th ACM International Conference on Multimedia, pp. 1553-1561.
- DeepPF: a deep learning based architecture for metro passenger flow prediction. Transportation Research Part C: Emerging Technologies 101, pp. 18-34.
- Beyond finite layer neural networks: bridging deep architectures and numerical differential equations. In International Conference on Machine Learning, pp. 3276-3285.
- Stable neural flows. arXiv preprint arXiv:2003.08063.
- SARIMA modelling approach for railway passenger flow forecasting. Transport 33 (5), pp. 1113-1120.
- Neural ordinary differential equations for semantic segmentation of individual colon glands. arXiv preprint arXiv:1910.10470.
- Latent ODEs for irregularly-sampled time series. arXiv preprint arXiv:1907.03907.
- Modeling relational data with graph convolutional networks. In European Semantic Web Conference, pp. 593-607.
- An aggregation approach to short-term traffic flow prediction. IEEE Transactions on Intelligent Transportation Systems 10 (1), pp. 60-69.
- Graph attention networks. arXiv preprint arXiv:1710.10903.
- A proposal on machine learning via dynamical systems. Communications in Mathematics and Statistics 5 (1), pp. 1-11.
- Modeling and forecasting vehicular traffic flow as a seasonal ARIMA process: theoretical basis and empirical results. Journal of Transportation Engineering 129 (6), pp. 664-672.
- A comprehensive survey on graph neural networks. IEEE Transactions on Neural Networks and Learning Systems.
- Graph WaveNet for deep spatial-temporal graph modeling. arXiv preprint arXiv:1906.00121.
- Continuous graph neural networks. In International Conference on Machine Learning, pp. 10432-10441.
- On robustness of neural ordinary differential equations. arXiv preprint arXiv:1910.05513.
- Revisiting spatial-temporal similarity: a deep learning framework for traffic prediction. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 33, pp. 5668-5675.
- Deep multi-view spatial-temporal network for taxi demand prediction. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 32.
- ODE2VAE: deep generative second order ODEs with Bayesian neural networks.
- Hierarchical graph representation learning with differentiable pooling. arXiv preprint arXiv:1806.08804.
- Link prediction based on graph neural networks. arXiv preprint arXiv:1802.09691.
- T-GCN: a temporal graph convolutional network for traffic prediction. IEEE Transactions on Intelligent Transportation Systems 21 (9), pp. 3848-3858.
- Urban flow prediction with spatial-temporal neural ODEs. Transportation Research Part C: Emerging Technologies 124, pp. 102912.
- Enhancing urban flow maps via neural ODEs.