Introduction
In recent years, we have witnessed significant development of Intelligent Transportation Systems (ITS) [27].
Parking guidance and information (PGI) systems, and especially parking availability prediction, are an indispensable component of ITS.
According to a survey by the International Parking Institute (IPI) (https://www.parking.org/wp-content/uploads/2015/12/EmergingTrends2012.pdf), a substantial share of cars on the road are cruising for parking, and these cruising cars contribute to a large portion of traffic jams in urban areas [19].
Thus, citywide parking availability prediction is of great importance to help drivers efficiently find parking, to support governments in urban planning, and to alleviate urban traffic congestion.
Due to its importance, citywide parking availability prediction has attracted much attention from both academia and industry. On one hand, Google Maps predicts parking difficulty on a citywide scale based on user surveys and trajectory data [1], and Baidu Maps estimates realtime citywide parking availability based on environmental contextual features (e.g., Point of Interest (POI), map queries, etc.) [18]. These approaches base citywide parking availability prediction on biased and indirect input signals (e.g., user feedback is noisy and lagged), which may induce inaccurate prediction results. On the other hand, in recent years, we have witnessed realtime sensor devices such as cameras, ultrasonic sensors, and GPS become ubiquitous, which can significantly improve the prediction accuracy of parking availability [16, 4, 28]. However, due to economic and privacy concerns, such solutions are difficult to scale up to cover all parking lots of a city.
In this paper, we propose to simultaneously predict the availability of each parking lot of a city, based on both environmental contextual data (e.g., POI distribution, population) and partially observed realtime parking availability data. By integrating both datasets, we can make a better parking availability prediction at a city scale. However, it is a nontrivial task that faces the following three major challenges. (1) Spatial autocorrelation. The availability of a parking lot is not only affected by the occupancy of nearby parking lots but may also synchronize with distant parking lots [22, 14]. The first challenge is how to model the irregular and nonEuclidean autocorrelation between parking lots. (2) Temporal autocorrelation. The future availability of a parking lot is correlated with its availability in previous time periods [17]. Besides, the spatial autocorrelation between parking lots may also vary over time [11, 24]. How to model the dynamic temporal autocorrelation of each parking lot is another challenge. (3) Parking availability scarcity. Only a small portion of parking lots are equipped with realtime sensors. According to one of the largest map service applications, only a small fraction of the parking lots in Beijing have realtime parking availability data. The third challenge is how to utilize the scarce and incomplete realtime parking availability information.
To tackle the above challenges, in this paper we present the Semisupervised Hierarchical Recurrent Graph Neural Network (SHARE) for citywide parking availability prediction. Our major contributions are summarized as follows:

We propose a semisupervised spatiotemporal learning framework to incorporate both environmental contextual factors and sparse realtime parking availability data for citywide parking availability prediction.

We propose a hierarchical graph convolution module to capture nonEuclidean spatial correlations among parking lots. It consists of a contextual graph convolution block and a soft clustering graph convolution block for local and global spatial dependencies modeling, respectively.

We propose a parking availability approximation module to estimate the missing realtime availabilities of parking lots without sensor monitoring. Specifically, we introduce a propagating convolution block and reuse the temporal module to approximate missing parking availabilities from both the spatial and temporal domains, and then fuse the two estimates through an entropybased mechanism.

We evaluate SHARE on two realworld datasets collected from Beijing and Shenzhen, two metropolises in China. The results demonstrate that our model achieves the best prediction performance against seven baselines.
Preliminaries
Consider a set of parking lots $\mathcal{P} = \{p_1, p_2, \dots, p_N\}$, where $N$ is the total number of parking lots, and let $\mathcal{P}^{l}$ and $\mathcal{P}^{u}$ denote the sets of parking lots with and without realtime sensors (e.g., camera, ultrasonic sensor, GPS, etc.), respectively. Let $X_t \in \mathbb{R}^{N \times D}$ denote the observed $D$-dimensional contextual feature vectors (e.g., POI distribution, population, etc.) for all parking lots in $\mathcal{P}$ at time $t$. We begin the formal definition of parking availability prediction with the definition of parking availability.
Definition 1
Parking availability (PA). Given a parking lot $p_i \in \mathcal{P}$, at time step $t$, the parking availability of $p_i$, denoted $y_t^{i}$, is defined as the number of vacant parking spots in $p_i$.
Specifically, we use $Y_t^{l}$ to denote the observed PAs of parking lots in $\mathcal{P}^{l}$ at time step $t$. In this paper, we are interested in predicting the PAs for all parking lots by leveraging the contextual data of $\mathcal{P}$ and the partially observed realtime parking availability data of $\mathcal{P}^{l}$.
Problem 1
Parking availability prediction problem. Given a historical time window $T$, contextual features $\mathcal{X} = \{X_{t-T+1}, \dots, X_t\}$ for all parking lots $\mathcal{P}$, and partially observed realtime PAs $\mathcal{Y} = \{Y_{t-T+1}^{l}, \dots, Y_t^{l}\}$, our problem is to predict the PAs for all parking lots over the next $\Delta$ time steps,
$[\hat{Y}_{t+1}, \dots, \hat{Y}_{t+\Delta}] = \mathcal{F}(\mathcal{X}, \mathcal{Y}),$ (1)
where $\hat{Y}_{t+k} \in \mathbb{R}^{N}$ and $\mathcal{F}(\cdot)$ is the mapping function we aim to learn.
Framework overview
The architecture of SHARE is shown in Figure 1, where the inputs are contextual features as well as partially observed realtime PAs, and the outputs are the predicted PAs of all parking lots over the next time steps. There are three major components in SHARE. First, the hierarchical graph convolution module models spatial autocorrelations among parking lots, where the Contextual Graph Convolution (CxtConv) block captures local spatial dependencies between parking lots through rich contextual features (e.g., POI distribution, regional population, etc.), while the Soft Clustering Graph Convolution (SCConv) block captures global correlations among distant parking lots by softly assigning each parking lot to a set of latent cluster nodes. Second, the temporal autocorrelation modeling module employs the Gated Recurrent Unit (GRU) to model the dynamic temporal dependencies of each parking lot. Third, the PA approximation module estimates the distributions of missing PAs for parking lots in $\mathcal{P}^{u}$ from both the spatial and temporal domains. In the spatial domain, the Propagating Graph Convolution (PropConv) block propagates observed realtime PAs to approximate missing PAs based on the contextual similarity of each parking lot. In the temporal domain, we reuse the GRU module to approximate the current PA distributions based on its output in the previous time period. The two estimated PA distributions are then fused through an entropybased mechanism and fed to the SCConv block and the GRU module for the final prediction.
Hierarchical spatial dependency modeling
We first introduce the hierarchical graph convolution module, including the contextual graph convolution block and the soft clustering graph convolution block.
Contextual graph convolution
In the spatial domain, the PAs of nearby parking lots are usually correlated and mutually influence each other. For example, when there is a big concert, the PAs of parking lots near the concert hall are usually low, and the parking demand usually diffuses gradually from nearby to distant lots. Inspired by the recent success of graph convolution networks [8, 21] on processing nonEuclidean graph structures, we first introduce the CxtConv block to capture local spatial dependencies solely based on contextual features.
We model the local correlations among parking lots as a graph $\mathcal{G} = (\mathcal{P}, \mathcal{E}, A)$, where $\mathcal{P}$ is the set of parking lots, $\mathcal{E}$ is a set of edges indicating connectivity among parking lots, and $A$ denotes the proximity matrix of $\mathcal{G}$ [15]. Specifically, we define the connectivity constraint as
$(p_i, p_j) \in \mathcal{E} \iff d(p_i, p_j) \le \epsilon,$ (2)
where $d(p_i, p_j)$ is the road network distance between parking lots $p_i$ and $p_j$, and $\epsilon$ is a distance threshold.
Since the influence of different nearby parking lots may vary nonlinearly, we employ an attention mechanism to compute the coefficient between parking lots, defined as
$e_{ij} = \mathrm{att}(W h_i, W h_j),$ (3)
where $h_i$ and $h_j$ are the current contextual representations of parking lots $p_i$ and $p_j$, $W$ is a learnable weight matrix shared over all edges, and $\mathrm{att}(\cdot, \cdot)$ is a shared attention mechanism (e.g., dotproduct, concatenation, etc.) [20]. The proximity score between $p_i$ and $p_j$ is further defined as
$a_{ij} = \mathrm{softmax}_j(e_{ij}) = \frac{\exp(e_{ij})}{\sum_{k \in \mathcal{N}_i} \exp(e_{ik})},$ (4)
where $\mathcal{N}_i$ is the set of neighboring parking lots of $p_i$.
In general, the above attention mechanism is capable of computing pairwise proximity scores for all pairs of parking lots. However, this formulation leads to quadratic complexity. To place more weight on neighboring parking lots and help faster convergence, we inject the adjacency constraint so that the attention operation only operates on adjacent nodes $j \in \mathcal{N}_i$, where $\mathcal{N}_i$ is the set of neighboring parking lots of $p_i$ in $\mathcal{G}$. Since the influence of nearby parking lots may also vary across time steps, we learn a different proximity score for each time step.
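To make the neighborhood-restricted attention concrete, the following is a minimal numpy sketch (not the authors' implementation): proximity scores are computed for all pairs, masked by the distance-threshold connectivity, and normalized with a softmax over each node's neighborhood. All names (`ctx`, `dist`, `epsilon`) and values are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 4, 8                      # number of parking lots, contextual feature dim
ctx = rng.normal(size=(n, d))    # contextual representations (hypothetical values)
W = rng.normal(size=(d, d))      # shared learnable projection matrix

# Connectivity from a road-network distance threshold (toy distances).
dist = np.array([[0., 1., 5., 9.],
                 [1., 0., 2., 8.],
                 [5., 2., 0., 3.],
                 [9., 8., 3., 0.]])
epsilon = 4.0                    # distance threshold
adj = (dist <= epsilon).astype(float)

# Dot-product attention logits, masked to adjacent nodes only,
# then normalized with a row-wise softmax over each neighborhood.
h = ctx @ W
logits = h @ h.T
logits[adj == 0] = -np.inf       # non-neighbors receive zero attention
alpha = np.exp(logits - logits.max(axis=1, keepdims=True))
alpha /= alpha.sum(axis=1, keepdims=True)
```

Each row of `alpha` is a valid attention distribution over that lot's road-network neighbors, and masked pairs get exactly zero weight.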
Once $a_{ij}$ is obtained, the contextual graph convolution operation updates the representation of the current parking lot by aggregating and transforming those of its neighbors, defined as
$h_i^{(l+1)} = \sigma\big( \sum_{j \in \mathcal{N}_i} a_{ij} W^{(l)} h_j^{(l)} \big),$ (5)
where $\sigma(\cdot)$ is a nonlinear activation function, and $W^{(l)}$ is a learnable weight matrix shared over all parking lots. Note that we can stack $L$ identical contextual graph convolution layers to capture $L$-hop local dependencies, where $h_i^{(0)}$ is the raw contextual feature in the first CxtConv layer.
Soft clustering graph convolution
Besides local correlations, distant parking lots may also be correlated. For example, distant parking lots in similar functional areas may show similar PA, e.g., business areas may have lower PA during office hours, while residential areas may have higher PA at the same time. However, CxtConv only captures local spatial correlations. [9] shows that when the number of stacked layers grows large, the representations of all parking lots tend to become similar and therefore lose discriminative power. To this end, we propose the SCConv block to capture global correlations between parking lots. Specifically, SCConv defines a set of latent nodes and learns the representation of each latent node based on the learned representations of the parking lots. Rather than clustering each parking lot into a specific cluster, we learn a soft assignment matrix so that each parking lot has a chance to belong to multiple clusters with different probabilities (but with total probability equal to one), as shown in Figure 2.
The intuition behind SCConv is twofold. First, distant parking lots may have similar contextual features and PAs, and therefore should have similar representations. The shared latent node representation can be viewed as a regularization for the prediction task. Second, one parking lot may be mapped to multiple latent nodes. If we view each latent node as a different functionality class, a parking lot may serve several functionalities. For example, a parking lot in a recreational center may be occupied by external visitors from a nearby office building.
The key component in SCConv is the soft assignment matrix. Given $K$ latent nodes, let $S \in \mathbb{R}^{N \times K}$ denote the soft assignment matrix, where $S_{ik}$ denotes the probability that the $i$th parking lot maps to the $k$th latent node. Specifically, we use $S_{i:}$ to denote the $i$th row and $S_{:k}$ to denote the $k$th column of $S$. Given the learned representation $h_i$ of each parking lot, each row of $S$ is computed as
$S_{i:} = \mathrm{softmax}(W_s h_i),$ (6)
where $W_s$ is a learnable weight matrix; the softmax guarantees that the probabilities that a given parking lot belongs to each latent node sum to one.
Once $S$ is obtained, the representation of each latent node can be derived by
$Z = S^{\top} H,$ (7)
where $H \in \mathbb{R}^{N \times d}$ stacks the learned representations of all parking lots, and the $k$th row of $Z$ is the representation $z_k$ of the $k$th latent node.
Given the representation of each latent node, similar to CxtConv, we apply the soft clustering convolution operation to capture the dependencies between latent nodes,
$z_k' = \sigma\big( \sum_{k'=1}^{K} \tilde{A}_{kk'} W_c z_{k'} \big),$ (8)
where $\sigma(\cdot)$ is a nonlinear activation function, $\tilde{A}_{kk'}$ is the proximity score between two latent nodes, and $W_c$ is a learnable weight matrix. Rather than introducing extra attention parameters as in CxtConv, we derive the proximity scores between latent nodes from the adjacency constraint between parking lots,
$\tilde{A} = S^{\top} A S,$ (9)
where $A_{ij}$ equals one if parking lots $p_i$ and $p_j$ are connected and zero otherwise. With the learned latent node representations, we generate the soft clustering representation of each parking lot as the reverse process of latent node representation generation,
$H^{sc} = S Z',$ (10)
where the $k$th row of $Z'$ is $z_k'$.
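The pooling and back-projection steps of SCConv can be sketched with plain matrix products. The following toy numpy example (shapes and the assignment projection `W_a` are assumptions, not the trained model) shows a row-wise softmax assignment, latent-node pooling, and the reverse mapping back to per-lot representations.

```python
import numpy as np

rng = np.random.default_rng(1)
n, d, k = 6, 4, 2                 # parking lots, feature dim, latent nodes
H = rng.normal(size=(n, d))       # learned per-lot representations
W_a = rng.normal(size=(d, k))     # assignment projection (hypothetical name)

# Row-wise softmax gives the soft assignment matrix S; each row sums to one,
# so every lot distributes a total probability of 1 over the latent nodes.
scores = H @ W_a
S = np.exp(scores - scores.max(axis=1, keepdims=True))
S /= S.sum(axis=1, keepdims=True)

Z = S.T @ H                       # pool per-lot features into latent nodes
H_sc = S @ Z                      # project latent features back to each lot
```

The same assignment matrix `S` is used in both directions, so lots softly assigned to the same latent node end up with similar soft-clustering representations even when they are far apart.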
Temporal dependency modeling
We leverage the Gated Recurrent Unit (GRU) [3], a simple yet effective variant of the recurrent neural network (RNN), to model the temporal dependency. Consider the previous $T$-step inputs of a parking lot $p_i$; we denote the input and hidden state of $p_i$ at time step $t$ as $x_t$ and $h_t$, respectively. The temporal dependency between $h_t$ and $h_{t-1}$ can be modeled by
$h_t = \mathrm{GRU}(x_t, h_{t-1}),$ (11)
where the update gate $z_t$, reset gate $r_t$, and candidate state $\tilde{h}_t$ are defined as
$z_t = \mathrm{sigmoid}(W_z [x_t \oplus h_{t-1}] + b_z),$
$r_t = \mathrm{sigmoid}(W_r [x_t \oplus h_{t-1}] + b_r),$
$\tilde{h}_t = \tanh(W_h [x_t \oplus (r_t \odot h_{t-1})] + b_h),$
$h_t = (1 - z_t) \odot h_{t-1} + z_t \odot \tilde{h}_t,$ (12)
where $W_z$, $W_r$, $W_h$, $b_z$, $b_r$, $b_h$ are learnable parameters, $\oplus$ is the concatenation operation, and $\odot$ denotes the Hadamard product. The hidden state $h_t$ is then directly used to predict the PAs of the next $\Delta$ time steps,
$[\hat{y}_{t+1}, \dots, \hat{y}_{t+\Delta}] = W_o h_t + b_o,$ (13)
where $W_o$ and $b_o$ are learnable output parameters.
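The recurrence above follows the standard GRU update. Below is a minimal numpy sketch of one GRU cell with random weights (an illustration of the gate equations, not the trained SHARE module; all dimensions and names are assumptions).

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(2)
d_in, d_h = 3, 5
# One weight matrix per gate, acting on the concatenated [x, h] vector.
Wz, Wr, Wh = (rng.normal(size=(d_in + d_h, d_h)) for _ in range(3))

def gru_step(x, h_prev):
    xh = np.concatenate([x, h_prev])
    z = sigmoid(xh @ Wz)                       # update gate
    r = sigmoid(xh @ Wr)                       # reset gate
    h_tilde = np.tanh(np.concatenate([x, r * h_prev]) @ Wh)  # candidate state
    return (1 - z) * h_prev + z * h_tilde      # convex mix of old and candidate

h = np.zeros(d_h)
for x in rng.normal(size=(4, d_in)):           # roll the cell over 4 time steps
    h = gru_step(x, h)
```

Because each step is a convex combination of the previous state and a tanh-bounded candidate, the hidden state stays inside [-1, 1]; a linear readout on the final `h` would then produce the multi-step PA predictions.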
Parking availability approximation
The realtime PA is a strong signal for future PA prediction. However, only a small portion of realtime PAs can be obtained through realtime sensors, which prevents us from directly applying realtime PA as part of the input features. To leverage the information hidden in the partially observed realtime PAs, we approximate the missing PAs from both the spatial and temporal domains. The proposed method consists of three blocks, i.e., the spatial PropConv block, the temporal GRU block, and the fusion block. Note that rather than approximating a scalar PA $y_t^{i}$, we learn the distribution of PA, $\hat{y}_t^{i}$, for better information preservation. Given a PA $y_t^{i}$, we discretize its distribution into a $C$-dimensional onehot vector $\bar{y}_t^{i}$. The objective of the PA approximation is to minimize the difference between $\hat{y}_t^{i}$ and $\bar{y}_t^{i}$.
Spatial based PA approximation
Similar to CxtConv, for each $p_i \in \mathcal{P}^{u}$, the PropConv operation is defined as
$\hat{y}_t^{i,s} = \sum_{j \in \mathcal{N}_i^{l}} a_{ij} \bar{y}_t^{j},$ (14)
where $\hat{y}_t^{i,s}$ is the obtained PA distribution, $\mathcal{N}_i^{l}$ is the set of neighboring parking lots of $p_i$ with realtime PA, and $a_{ij}$ is the proximity score between $p_i$ and $p_j$. Different from CxtConv, the estimated PA is only aggregated from nearby parking lots with realtime PA, and we preserve the aggregated vector representation without an extra activation function. The proximity score is computed through the same attention mechanism as in Equation (4), but with a relaxed connectivity constraint
$(p_i, p_j) \in \mathcal{E}' \iff d(p_i, p_j) \le d(p_i, p_{(k)}),$ (15)
where $d(p_i, p_{(k)})$ denotes the road network distance between parking lot $p_i$ and its $k$th nearest parking lot $p_{(k)}$. The relaxed adjacency constraint improves node connectivity for more sufficient propagation of the observed PAs, and therefore alleviates the data scarcity problem.
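One way to picture the relaxed connectivity is that each unmonitored lot is wired to its nearest sensor-equipped lots so that observed PAs can always propagate to it. The sketch below (toy distances, hypothetical `sensed` mask and `k`; a simplification of the constraint rather than the paper's exact rule) builds such an adjacency matrix.

```python
import numpy as np

# Toy road-network distances between 4 parking lots.
dist = np.array([[0., 2., 7., 4.],
                 [2., 0., 3., 6.],
                 [7., 3., 0., 5.],
                 [4., 6., 5., 0.]])
sensed = np.array([True, True, False, False])  # lots with real-time sensors
k = 2                                          # neighbors allowed per lot

adj = np.zeros_like(dist)
for i in range(len(dist)):
    # Candidate neighbors: sensor-equipped lots other than i itself.
    cand = [j for j in range(len(dist)) if sensed[j] and j != i]
    nearest = sorted(cand, key=lambda j: dist[i, j])[:k]
    adj[i, nearest] = 1.0                      # connect i to its k nearest sensed lots
```

Every row of `adj` points only at sensed lots, so even an isolated unmonitored lot receives observed PA signal from somewhere, which is exactly what the relaxation is for.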
Temporal based PA approximation
We reuse the output of the GRU block to approximate the realtime PA from the temporal domain. The difference between current PA approximation and future PA prediction is that here we employ a different output function. Recall that in the previous step we obtained the hidden state $h_{t-1}$ from the GRU; we directly approximate the distribution of PA at time $t$ by
$\hat{y}_t^{i,r} = \mathrm{softmax}(W_a h_{t-1} + b_a),$ (16)
where $W_a$ and $b_a$ are learnable parameters. This step does not introduce extra computation for the GRU, and the softmax layer normalizes the distribution to sum to one.
Approximated PA fusion
Rather than directly averaging $\hat{y}_t^{i,s}$ and $\hat{y}_t^{i,r}$, we propose an entropybased mechanism to fuse the two PA distributions. Specifically, we put more weight on the approximation with less uncertainty [7], i.e., the one with smaller entropy. Given an estimated PA distribution $\hat{y}$, its entropy is
$H(\hat{y}) = -\sum_{c=1}^{C} \hat{y}_c \log \hat{y}_c,$ (17)
where $\hat{y}_c$ represents the $c$th dimension of $\hat{y}$. We fuse the two PA distributions $\hat{y}_t^{i,s}$ and $\hat{y}_t^{i,r}$ as follows:
$\hat{y}_t^{i} = \gamma \hat{y}_t^{i,s} + (1 - \gamma) \hat{y}_t^{i,r},$ (18)
where $\gamma = H(\hat{y}_t^{i,r}) / \big(H(\hat{y}_t^{i,s}) + H(\hat{y}_t^{i,r})\big)$.
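The fusion can be sketched numerically as follows. This is an illustration of the entropy-weighting idea under the assumption that the weight on each estimate is proportional to the other estimate's entropy (the paper's exact weighting may differ); the two distributions are made-up values.

```python
import numpy as np

def entropy(p):
    """Shannon entropy of a discrete distribution (clipped for log stability)."""
    p = np.clip(p, 1e-12, 1.0)
    return -np.sum(p * np.log(p))

y_s = np.array([0.7, 0.2, 0.1])     # spatial (PropConv) estimate: peaked
y_t = np.array([0.34, 0.33, 0.33])  # temporal (GRU) estimate: near-uniform

h_s, h_t = entropy(y_s), entropy(y_t)
gamma = h_t / (h_s + h_t)           # more weight on the lower-entropy estimate
y_fused = gamma * y_s + (1 - gamma) * y_t
```

Here the spatial estimate is more confident (lower entropy), so `gamma` exceeds 0.5 and the fused distribution leans toward it, while remaining a valid probability vector.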
The approximated PA distribution is applied to two tasks. First, it is concatenated with the learned representation of the CxtConv block and fed to the SCConv block for latent node representation learning. Second, it is combined with the outputs of the CxtConv and SCConv blocks to form the overall representation of each parking lot at time step $t$, which is fed into the GRU module to generate the final PA prediction results.
Model training
Since only the parking lots in $\mathcal{P}^{l}$ have observed labels, following the semisupervised learning paradigm, SHARE aims to minimize the mean square error (MSE) between the predicted PA and the observed PA,
$\mathcal{L}_{pred} = \frac{1}{|\mathcal{P}^{l}|} \sum_{p_i \in \mathcal{P}^{l}} \sum_{k=1}^{\Delta} (\hat{y}_{t+k}^{i} - y_{t+k}^{i})^2.$ (19)
Additionally, in PA approximation, we introduce extra cross entropy (CE) losses to minimize the error between the observed PA and the approximated PA distributions (i.e., the spatial and temporal based PA distribution approximations $\hat{y}_t^{i,s}$ and $\hat{y}_t^{i,r}$) at the current time step $t$,
$\mathcal{L}_{s} = -\frac{1}{|\mathcal{P}^{l}|} \sum_{p_i \in \mathcal{P}^{l}} \sum_{c=1}^{C} \bar{y}_{t,c}^{i} \log \hat{y}_{t,c}^{i,s},$ (20)
$\mathcal{L}_{r} = -\frac{1}{|\mathcal{P}^{l}|} \sum_{p_i \in \mathcal{P}^{l}} \sum_{c=1}^{C} \bar{y}_{t,c}^{i} \log \hat{y}_{t,c}^{i,r}.$ (21)
By considering both the MSE loss and the CE losses, SHARE aims to jointly minimize the following objective
$\mathcal{L} = \mathcal{L}_{pred} + \lambda (\mathcal{L}_{s} + \mathcal{L}_{r}),$ (22)
where $\lambda$ is a hyperparameter controlling the importance of the two CE losses.
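The joint objective, MSE on the predicted PAs plus weighted cross entropy on both approximated PA distributions, can be sketched on toy values as follows (all tensors and the weight `lam` are illustrative, not values from the paper).

```python
import numpy as np

y_true = np.array([3.0, 5.0])                       # observed PA (labeled lots)
y_pred = np.array([2.5, 5.5])                       # predicted PA
p_true = np.array([[0., 1., 0.], [1., 0., 0.]])     # one-hot discretized PA
p_s = np.array([[0.1, 0.8, 0.1], [0.7, 0.2, 0.1]])  # spatial approximation
p_t = np.array([[0.2, 0.6, 0.2], [0.5, 0.3, 0.2]])  # temporal approximation
lam = 0.5                                           # CE loss weight (toy value)

def ce(p):
    """Mean cross entropy between one-hot targets and a predicted distribution."""
    return -np.mean(np.sum(p_true * np.log(np.clip(p, 1e-12, 1.0)), axis=1))

mse = np.mean((y_pred - y_true) ** 2)               # prediction loss
loss = mse + lam * (ce(p_s) + ce(p_t))              # joint objective
```

Both CE terms are strictly positive here, so the joint loss is always larger than the MSE alone; raising `lam` pushes the training signal toward the PA approximation task.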
Experiments
Experimental setup
Data description.
We use two realworld datasets collected from Beijing and Shenzhen, two metropolises in China. Both datasets range from April 20, 2019 to May 20, 2019. All PA records are crawled every 15 minutes from a publicly accessible app, in which all parking occupancy information is collected by realtime sensors. We associate a POI distribution [13, 29] with each parking lot and aggregate nearby checkin records for each parking lot at fixed intervals as the population data. The POI and checkin data are collected through the Baidu Maps Place API and location SDK [12]. We chronologically order the above data, take the earliest portion as the training set, the following portion for validation, and the rest as the test set. In each dataset, a fraction of the parking lots is masked as unlabeled. The spatial distribution of parking lots in Beijing is shown in Figure 3. The statistics of the datasets are summarized in Table 1.
Implementation details.
Our model and all seven baselines are implemented with PaddlePaddle. Following previous work [10, 26], the PA is normalized before input and scaled back to absolute PA in the output. We use dotproduct attention in this paper. A distance threshold and a number of nearest neighbors are used to connect parking lots, and the historical window length, prediction horizon, hidden dimensions, and the numbers of stacked CxtConv, SCConv, and PropConv layers are fixed across all experiments. In SCConv, the number of latent nodes is set proportionally to the total number of parking lots. The activation function in CxtConv and SCConv is LeakyReLU, and Sigmoid in other layers. We employ the Adam optimizer for training with a fixed learning rate. For a fair comparison, all parameters of each baseline are carefully tuned based on the recommended settings.
Description  BEIJING  SHENZHEN 
# of parking lots  1,965  1,360 
# of PA records  5,847,840  4,047,360 
Average # of parking spots  210.24  185.36 
# of checkins  9,436,362,579  3,680,063,509 
# of POIs  669,058  250,275 
# of POI categories  197  188 
Evaluation metrics.
We adopt Mean Absolute Error (MAE) and Root Mean Square Error (RMSE), two widely used metrics [11], for evaluation.
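For reference, the two metrics are computed as follows; the toy predictions below are illustrative only.

```python
import numpy as np

def mae(y_true, y_pred):
    """Mean absolute error: average magnitude of the prediction errors."""
    return np.mean(np.abs(y_true - y_pred))

def rmse(y_true, y_pred):
    """Root mean square error: penalizes large errors more heavily than MAE."""
    return np.sqrt(np.mean((y_true - y_pred) ** 2))

y_true = np.array([10.0, 20.0, 30.0])
y_pred = np.array([12.0, 18.0, 33.0])
```

Because RMSE squares the residuals before averaging, a single large miss inflates RMSE more than MAE, which is why the two metrics are reported together.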
Algorithm  Beijing (15/ 30/ 45 min)  Shenzhen (15/ 30/ 45 min)  

MAE  RMSE  MAE  RMSE  
LR  29.90 / 30.27 / 30.58  69.74 / 70.95 / 72.00  24.59 / 24.80 / 25.09  51.31 / 52.36 / 52.80 
GBRT  17.29 / 17.81 / 18.40  44.60 / 48.50 / 51.59  13.90 / 14.67 / 14.71  35.05 / 37.98 / 38.09 
GRU  18.51 / 18.78 / 19.73  55.43 / 55.92 / 58.64  16.73 / 16.88 / 17.14  46.92 / 47.26 / 47.56 
GoogleParking  21.49 / 21.68 / 22.85  57.26 / 59.25 / 60.48  17.10 / 18.33 / 18.69  47.30 / 48.45 / 49.34 
DuParking  17.67 / 17.70 / 18.03  50.17 / 50.63 / 51.75  13.91 / 14.17 / 14.39  42.66 / 43.24 / 43.56 
STGCN  16.57 / 16.44 / 17.10  50.79 / 51.04 / 52.61  13.46 / 13.59 / 13.88  39.26 / 39.96 / 40.29 
DCRNN  15.66 / 15.97 / 16.30  46.28 / 47.80 / 48.87  13.11 / 13.19 / 13.89  42.74 / 43.37 / 44.27 
CxtGNN (ours)  15.29 / 15.69 / 16.15  45.55 / 46.69 / 47.78  12.39 / 12.73 / 13.09  36.31 / 36.92 / 37.46 
CAGNN (ours)  12.45 / 12.77 / 13.20  39.99 / 40.81 / 41.31  10.50 / 10.62 / 10.98  31.86 / 32.12 / 32.83 
SHARE (ours)  10.68 / 10.97 / 11.43  32.00 / 32.78 / 33.78  9.23 / 9.41 / 9.66  30.44 / 30.90 / 31.70 
Baselines.
We compare our full approach with the following seven baselines and two variants of SHARE:

LR uses linear regression for parking availability prediction. We concatenate the historical features of previous time steps as the input and predict each parking lot separately.

GBRT [2] adopts gradient boosted regression trees (XGBoost) for parking availability prediction.

GRU [3] predicts the PA of each parking lot without considering spatial dependency. We train two GRUs for labeled and unlabeled parking lots separately.

GoogleParking [1] is the parking difficulty prediction model deployed on Google Maps. It uses a feedforward deep neural network for prediction.

DuParking [18] is the parking availability estimation model used on Baidu Maps. It fuses several LSTMs to capture various temporal dependencies.

STGCN [26] is a stateoftheart graph neural network model for traffic forecasting. It models both spatial and temporal dependencies with a convolutional structure. The input graph is constructed as described in the original paper but keeps the same graph connectivity as our CxtConv.

DCRNN [10] is another graph convolution network based model, which models spatial and temporal dependency by integrating graph convolution and GRU. The input graph is the same as STGCN.

CxtGNN is a basic version of SHARE, without including PA approximation and soft clustering graph convolution.

CAGNN is another variant of SHARE but without including the soft clustering graph convolution block.
Overall performance
Table 2 reports the overall results of our methods and all compared baselines on the two datasets with respect to MAE and RMSE. As can be seen, our model and its variants outperform all other baselines on both metrics, which demonstrates the advantage of SHARE. Specifically, SHARE achieves clear improvements over the stateoftheart approach (DCRNN) in both MAE and RMSE on Beijing, and the improvements on Shenzhen are similarly consistent. Moreover, we observe significant improvements by comparing SHARE with its variants (i.e., CxtGNN and CAGNN). For example, by adding the PA approximation module, CAGNN achieves lower MAE and lower RMSE than CxtGNN on Beijing. By further adding the SCConv block, SHARE achieves lower MAE and RMSE than CAGNN on Beijing. The improvements on Shenzhen are consistent. All the above results demonstrate the effectiveness of the PA approximation and the hierarchical graph convolution architecture.
Looking further into the results, we observe that all graph convolution based models (i.e., STGCN, DCRNN, and SHARE) outperform the other deep learning based approaches (i.e., GoogleParking and DuParking), which consistently reveals the advantage of incorporating spatial dependency for parking availability prediction. Remarkably, GBRT outperforms GoogleParking, GRU, and LR, and achieves results similar to DuParking, which validates our expectation that GBRT is a simple but effective approach for regression tasks. One additional interesting finding is that both the MAE and RMSE of all methods are relatively smaller on Shenzhen than on Beijing. This is possibly because the spatial distribution of parking lots is denser and more even in Shenzhen, which makes them easier to predict.
Parameter sensitivity
Due to space limitations, here we report the impact of the ratio of labeled parking lots, the proportion of latent nodes in the soft clustering graph convolution with respect to the total number of parking lots, the input time window, and the prediction step, using MAE on Beijing. Each time we vary one parameter and set the others to their default values. The results on Beijing using RMSE and on Shenzhen using both metrics are similar.
First, we vary the ratio of labeled parking lots. The results are reported in Figure 4(a). The results are unsurprising: equipping more parking lots with realtime sensors enables us to predict PA more accurately. However, equipping more sensors leads to extra economic cost and may be constrained by the policies of each parking lot. Finding the most costeffective ratio and exploring the optimal sensor distribution are important problems for future study.
Then, we vary the ratio of latent nodes. The results are reported in Figure 4(b). As can be seen, performance improves as the ratio of latent nodes increases up to a point, but degrades when the ratio is increased further. The reason is that heavily reducing the number of latent nodes reduces the discriminative power of the learned latent representations, whereas too many latent nodes reduce their regularization power.
To test the impact of the input length, we vary the size of the historical time window. The results are reported in Figure 4(c). SHARE achieves the lowest errors with an intermediate window length. One possible reason is that an excessively short input cannot provide sufficient temporally correlated information, whereas an overly long input introduces more noise into temporal dependency modeling.
Finally, to test the impact of the prediction step, we vary the prediction horizon. The results are reported in Figure 4(d). We report the results for labeled and unlabeled parking lots separately. Overall, labeled parking lots are much easier to predict. Besides, as the prediction step increases, the errors of all parking lots increase consistently. However, the error of labeled parking lots increases faster; this makes sense because the temporal dependency between the observed PA and the future PA becomes weaker as the prediction horizon grows.
Effectiveness on different regions
To evaluate the performance of SHARE on different regions, we partition Beijing into a set of disjoint grids based on longitude and latitude, and test the performance of SHARE on each region. Figure 5(a) and Figure 5(b) plot the averaged MAE of SHARE and the averaged number of parking spots in each region of Beijing, respectively. Overall, the MAE in each region is even except for several outliers. We find that the performance of SHARE is highly correlated with the averaged number of parking spots in each region. For example, the regions whose MAE is noticeably greater than the overall MAE also have averaged numbers of parking spots significantly greater than the overall average. This is possibly because, for the same relative fluctuation of parking availability, a parking lot with a larger number of parking spots will have a larger MAE. This result indicates that, in the future, further optimization can be applied to these large parking lots to improve the overall performance.
Related Work
Parking availability prediction.
Previous studies on parking availability prediction mainly fall in two categories, contextual data based prediction and realtime sensor based prediction.
For contextual data based prediction, GoogleParking [1] and DuParking [18] predict parking availability based on indirect signals (e.g., user feedback and contextual factors), which may induce inaccurate prediction results.
For realtime sensor based prediction, study in [17] proposes an autoregressive model
and study in [4] proposes a boosting method
for parking availability inference.
The above approaches are limited by economic and privacy concerns and are hard to scale to all parking lots in a city. Moreover, none of them fully exploits the nonEuclidean spatial autocorrelations between parking lots, which limits their prediction performance.
Graph neural network.
Graph neural network (GNN) extends the wellknown convolution neural network to nonEuclidean graph structures, where the representation of each node is derived by first aggregating and then transforming representations of its neighbors
[21]. It is worth pointing out that the idea of our soft clustering graph convolution is partially inspired by [25], but our objective is to capture global spatial correlations for nodelevel prediction. Due to its effectiveness, GNN has been successfully applied to several spatiotemporal forecasting tasks, such as traffic flow forecasting [10, 6] and taxi demand forecasting [5, 23]. However, we argue that these approaches either overlook contextual factors or global spatial dependencies, and are not tailored for parking availability prediction.
Conclusion
In this paper, we present SHARE, a citywide parking availability prediction framework based on both environmental contextual data and partially observed realtime parking availability data. We first propose a hierarchical graph convolution module to capture both local and global spatial correlations. Then, we adopt a simple yet effective GRU module to capture dynamic temporal autocorrelations of each parking lot. Besides, a parking availability approximation module is proposed for parking lots without realtime parking availability information. Extensive experimental results on two realworld datasets show that the performance of SHARE for parking availability prediction significantly outperforms seven stateoftheart baselines.
Acknowledgement
This research is supported in part by grants from the National Natural Science Foundation of China (Grant No.71531001).
References
 [1] (2019) Hard to park?: estimating parking difficulty at scale. In Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 2296–2304. Cited by: Introduction, 4th item, Related Work.
 [2] (2016) Xgboost: a scalable tree boosting system. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pp. 785–794. Cited by: 2nd item.
 [3] (2014) Empirical evaluation of gated recurrent neural networks on sequence modeling. arXiv preprint arXiv:1412.3555. Cited by: Temporal dependency modeling, 3rd item.
 [4] (2013) AdaBoost for parking lot occupation detection. In Proceedings of the 8th International Conference on Computer Recognition Systems CORES 2013, pp. 681–690. Cited by: Introduction, Related Work.

 [5] (2019) Spatiotemporal multigraph convolution network for ridehailing demand forecasting. In Proceedings of the ThirtyThird AAAI Conference on Artificial Intelligence, pp. 3656–3663. Cited by: Related Work.
 [6] (2019) Attention based spatialtemporal graph convolutional networks for traffic flow forecasting. In Proceedings of the ThirtyThird AAAI Conference on Artificial Intelligence, pp. 922–929. Cited by: Related Work.
 [7] (2015) Inferring air quality for station location recommendation based on urban big data. In Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 437–446. Cited by: Approximated PA fusion.
 [8] (2017) Semisupervised classification with graph convolutional networks. In 5th International Conference on Learning Representations, ICLR 2017, Cited by: Contextual graph convolution.
 [9] (2018) Deeper insights into graph convolutional networks for semisupervised learning. In Proceedings of the ThirtySecond AAAI Conference on Artificial Intelligence, pp. 3538–3545. Cited by: Soft clustering graph convolution.
 [10] (2018) Diffusion convolutional recurrent neural network: datadriven traffic forecasting. In 6th International Conference on Learning Representations, ICLR 2018, Cited by: 7th item, Implementation details., Related Work.
 [11] (2018) GeoMAN: multilevel attention networks for geosensory time series prediction.. In Proceedings of the 27th International Joint Conference on Artificial Intelligence, IJCAI 2018, pp. 3428–3434. Cited by: Introduction, Evaluation metrics..
 [12] (2019) Joint representation learning for multimodal transportation recommendation. In Proceedings of the ThirtyThird AAAI Conference on Artificial Intelligence, pp. 1036–1043. Cited by: Data description..
 [13] (2019) Hydra: a personalized and contextaware multimodal transportation recommendation system. In Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 2314–2324. Cited by: Data description..
 [14] (2017) Pointofinterest demand modeling with human mobility patterns. In Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 947–955. Cited by: Introduction.
 [15] (2019) An efficient approach to finding dense temporal subgraphs. IEEE Transactions on Knowledge and Data Engineering. Cited by: Contextual graph convolution.
 [16] (2010) Parknet: driveby sensing of roadside parking statistics. In Proceedings of the 8th international conference on Mobile systems, applications, and services, pp. 123–136. Cited by: Introduction.
 [17] (2015) Onstreet and offstreet parking availability prediction using multivariate spatiotemporal models. IEEE Transactions on Intelligent Transportation Systems 16 (5), pp. 2913–2924. Cited by: Introduction, Related Work.
 [18] (2018) Duparking: spatiotemporal big data tells you realtime parking availability. In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 646–654. Cited by: Introduction, 5th item, Related Work.
 [19] (2006) Cruising for parking. Transport Policy 13 (6), pp. 479–486. Cited by: Introduction.
 [20] (2017) Attention is all you need. In Advances in Neural Information Processing Systems, pp. 6000–6010. Cited by: Contextual graph convolution.
 [21] (2018) Graph attention networks. In 6th International Conference on Learning Representations, ICLR 2018, Cited by: Contextual graph convolution, Related Work.
 [22] (2017) Human mobility synchronization and trip purpose detection with mixture of hawkes processes. In Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 495–503. Cited by: Introduction.
 [23] (2019) Origindestination matrix prediction via graph convolution: a new perspective of passenger demand modeling. In Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 1227–1235. Cited by: Related Work.
 [24] (2019) Revisiting spatialtemporal similarity: a deep learning framework for traffic prediction. In Proceedings of the ThirtyThird AAAI Conference on Artificial Intelligence, Cited by: Introduction.
 [25] (2018) Hierarchical graph representation learning with differentiable pooling. In Advances in Neural Information Processing Systems, pp. 4800–4810. Cited by: Related Work.
 [26] (2018) Spatiotemporal graph convolutional networks: a deep learning framework for traffic forecasting. In Proceedings of the 27th International Joint Conference on Artificial Intelligence, IJCAI 2018, Cited by: 6th item, Implementation details..
 [27] (2011) Datadriven intelligent transportation systems: a survey. IEEE Transactions on Intelligent Transportation Systems 12 (4), pp. 1624–1639. Cited by: Introduction.
 [28] (2015) Smiler: a semilazy time series prediction system for sensors. In Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data, pp. 1871–1886. Cited by: Introduction.
 [29] (2016) Days on market: measuring liquidity in real estate markets. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 393–402. Cited by: Data description..