Semi-Supervised Hierarchical Recurrent Graph Neural Network for City-Wide Parking Availability Prediction

by Weijia Zhang, et al.
Rutgers University
Baidu, Inc.

The ability to predict city-wide parking availability is crucial for the successful development of Parking Guidance and Information (PGI) systems. Indeed, effective prediction of city-wide parking availability can improve parking efficiency, help urban planning, and ultimately alleviate city congestion. However, predicting city-wide parking availability is a non-trivial task because of three major challenges: 1) the non-Euclidean spatial autocorrelation among parking lots, 2) the dynamic temporal autocorrelation within and between parking lots, and 3) the scarcity of real-time parking availability information obtained from real-time sensors (e.g., cameras, ultrasonic sensors, and GPS). To this end, we propose the Semi-supervised Hierarchical Recurrent Graph Neural Network (SHARE) for predicting city-wide parking availability. Specifically, we first propose a hierarchical graph convolution structure to model the non-Euclidean spatial autocorrelation among parking lots. Along this line, a contextual graph convolution block and a soft clustering graph convolution block are proposed to capture local and global spatial dependencies between parking lots, respectively. Additionally, we adopt a recurrent neural network to incorporate the dynamic temporal dependencies of parking lots. Moreover, we propose a parking availability approximation module to estimate missing real-time parking availabilities from both the spatial and temporal domains. Finally, experiments on two real-world datasets demonstrate that SHARE outperforms seven state-of-the-art baselines in prediction performance.



In recent years, we have witnessed significant development of Intelligent Transportation Systems (ITS) [27]. Parking guidance and information (PGI) systems, especially parking availability prediction, are an indispensable component of ITS. According to a survey by the International Parking Institute (IPI), a considerable share of the cars on the road are searching for parking, and these cruising cars contribute to a substantial portion of traffic jams in urban areas [19]. Thus, city-wide parking availability prediction is of great importance to help drivers find parking efficiently, support urban planning by governments, and alleviate the city’s traffic congestion.

Due to its importance, city-wide parking availability prediction has attracted much attention from both academia and industry. On one hand, Google Maps predicts parking difficulty on a city-wide scale based on users’ survey and trajectory data [1], and Baidu Maps estimates real-time city-wide parking availability based on environmental contextual features (e.g., Points of Interest (POI), map queries, etc.) [18]. However, these approaches base city-wide parking availability prediction on biased and indirect input signals (e.g., user feedback is noisy and lagged), which may induce inaccurate prediction results. On the other hand, real-time sensor devices such as cameras, ultrasonic sensors, and GPS have become ubiquitous in recent years, and they can significantly improve the prediction accuracy of parking availability [16, 4, 28]. However, due to economic and privacy concerns, such sensors are difficult to scale up to cover all parking lots of a city.

In this paper, we propose to simultaneously predict the availability of each parking lot of a city, based on both environmental contextual data (e.g., POI distribution, population) and partially observed real-time parking availability data. By integrating both data sources, we can make better parking availability predictions at a city scale. However, this is a non-trivial task that faces three major challenges. (1) Spatial autocorrelation. The availability of a parking lot is not only affected by the occupancy of nearby parking lots but may also synchronize with distant parking lots [22, 14]. The first challenge is how to model the irregular and non-Euclidean autocorrelation between parking lots. (2) Temporal autocorrelation. The future availability of a parking lot is correlated with its availability in previous time periods [17]. Besides, the spatial autocorrelation between parking lots may also vary over time [11, 24]. How to model the dynamic temporal autocorrelation of each parking lot is another challenge. (3) Parking availability scarcity. Only a small portion of parking lots are equipped with real-time sensors. According to one of the largest map service applications, only a small fraction of the parking lots in Beijing have real-time parking availability data. The third challenge is how to utilize the scarce and incomplete real-time parking availability information.

To tackle the above challenges, we present the Semi-supervised Hierarchical Recurrent Graph Neural Network (SHARE) for city-wide parking availability prediction. Our major contributions are summarized as follows:

  • We propose a semi-supervised spatio-temporal learning framework to incorporate both environmental contextual factors and sparse real-time parking availability data for city-wide parking availability prediction.

  • We propose a hierarchical graph convolution module to capture non-Euclidean spatial correlations among parking lots. It consists of a contextual graph convolution block and a soft clustering graph convolution block for local and global spatial dependencies modeling, respectively.

  • We propose a parking availability approximation module to estimate missing real-time parking availabilities of parking lots without sensor monitoring. Specifically, we introduce a propagating graph convolution block and reuse the temporal module to approximate missing parking availabilities from both the spatial and temporal domains, then fuse them through an entropy-based mechanism.

  • We evaluate SHARE on two real-world datasets collected from Beijing and Shenzhen, two metropolises in China. The results demonstrate that our model achieves the best prediction performance against seven baselines.


Consider a set of parking lots $P = P_l \cup P_u$, where $N = |P|$ is the total number of parking lots, and $P_l$ and $P_u$ denote the sets of parking lots with and without real-time sensors (e.g., camera, ultrasonic sensor, GPS, etc.), respectively. Let $X^t = (\mathbf{x}_1^t, \mathbf{x}_2^t, \dots, \mathbf{x}_N^t) \in \mathbb{R}^{N \times M}$ denote the observed $M$-dimensional contextual feature vectors (e.g., POI distribution, population, etc.) for all parking lots in $P$ at time $t$. We begin the formal definition of parking availability prediction with the definition of parking availability.

Definition 1

Parking availability (PA). Given a parking lot $p_i \in P$ at time step $t$, the parking availability of $p_i$, denoted $y_i^t$, is defined as the number of vacant parking spots in $p_i$.

Specifically, we use $Y_l^t$ to denote the observed PAs of parking lots in $P_l$ at time step $t$. In this paper, we are interested in predicting PAs for all parking lots $p_i \in P$ by leveraging the contextual data of $P$ and the partially observed real-time parking availability data of $P_l$.

Problem 1

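Parking availability prediction problem. Given a historical time window $T$, contextual features for all parking lots $\mathcal{X} = (X^{t-T+1}, X^{t-T+2}, \dots, X^{t})$, and partially observed real-time PAs $\mathcal{Y}_l = (Y_l^{t-T+1}, Y_l^{t-T+2}, \dots, Y_l^{t})$, our problem is to predict the PAs of all $p_i \in P$ over the next $\tau$ time steps,

$$(\hat{Y}^{t+1}, \hat{Y}^{t+2}, \dots, \hat{Y}^{t+\tau}) \leftarrow \mathcal{F}(\mathcal{X}; \mathcal{Y}_l),$$

where $\hat{Y}^{t+1} = (\hat{y}_1^{t+1}, \hat{y}_2^{t+1}, \dots, \hat{y}_N^{t+1})$, and $\mathcal{F}(\cdot)$ is the mapping function we aim to learn.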

Framework overview

Figure 1: The framework overview of SHARE.

The architecture of SHARE is shown in Figure 1, where the inputs are contextual features as well as partially observed real-time PAs, and the outputs are the predicted PAs of all parking lots in the next $\tau$ time steps. There are three major components in SHARE. First, the hierarchical graph convolution module models spatial autocorrelations among parking lots: the Contextual Graph Convolution (CxtConv) block captures local spatial dependencies between parking lots through rich contextual features (e.g., POI distribution, regional population, etc.), while the Soft Clustering Graph Convolution (SCConv) block captures global correlations among distant parking lots by softly assigning each parking lot to a set of latent cluster nodes. Second, the temporal autocorrelation modeling module employs the Gated Recurrent Unit (GRU) to model the dynamic temporal dependencies of each parking lot. Third, the PA approximation module estimates the distributions of missing PAs for parking lots in $P_u$ from both the spatial and temporal domains. In the spatial domain, the Propagating Graph Convolution (PropConv) block propagates observed real-time PAs to approximate missing PAs based on the contextual similarity of each parking lot. In the temporal domain, we reuse the GRU module to approximate the current PA distributions based on its output in the previous time step. The two estimated PA distributions are then fused through an entropy-based mechanism and fed to the SCConv block and the GRU module for final prediction.

Hierarchical spatial dependency modeling

We first introduce the hierarchical graph convolution module, including the contextual graph convolution block and the soft clustering graph convolution block.

Contextual graph convolution

In the spatial domain, the PAs of nearby parking lots are usually correlated and mutually influence each other. For example, when there is a big concert, the PAs of parking lots near the concert hall are usually low, and the parking demand usually diffuses gradually from nearby to distant parking lots. Inspired by the recent success of graph convolutional networks [8, 21] in processing non-Euclidean graph structures, we first introduce the CxtConv block to capture local spatial dependencies solely based on contextual features.

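We model the local correlations among parking lots as a graph $G = (P, E, A)$, where $P$ is the set of parking lots, $E$ is a set of edges indicating connectivity among parking lots, and $A$ denotes the proximity matrix of $G$ [15]. Specifically, we define the connectivity constraint $a_{ij} \in A$ as

$$a_{ij} = \begin{cases} 1, & dist(p_i, p_j) \le \epsilon, \\ 0, & \text{otherwise}, \end{cases}$$

where $dist(p_i, p_j)$ is the road network distance between parking lots $p_i$ and $p_j$, and $\epsilon$ is a distance threshold.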
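The connectivity constraint can be made concrete with a minimal plain-Python sketch. It is an illustration under stated assumptions, not the authors' PaddlePaddle implementation: the function names `connectivity` and `neighbor_sets`, the toy distance matrix, and the 1.0 km threshold are all hypothetical. Applied literally, the rule also connects each lot to itself, because its distance to itself is zero.

```python
from typing import List

def connectivity(dist: List[List[float]], eps: float) -> List[List[int]]:
    """a_ij = 1 if the road-network distance between lots i and j is at most eps, else 0."""
    n = len(dist)
    # dist[i][i] == 0, so each lot is also connected to itself under this rule
    return [[1 if dist[i][j] <= eps else 0 for j in range(n)] for i in range(n)]

def neighbor_sets(adj: List[List[int]]) -> List[List[int]]:
    """N_i: the indices j with a_ij == 1, used to restrict attention to adjacent lots."""
    return [[j for j, a in enumerate(row) if a == 1] for row in adj]

# Hypothetical road-network distances (km) between three parking lots
dist = [[0.0, 0.5, 2.0],
        [0.5, 0.0, 1.2],
        [2.0, 1.2, 0.0]]
adj = connectivity(dist, eps=1.0)
print(adj)                 # [[1, 1, 0], [1, 1, 0], [0, 0, 1]]
print(neighbor_sets(adj))  # [[0, 1], [0, 1], [2]]
```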

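Since the influence of different nearby parking lots may vary non-linearly, we employ an attention mechanism to compute the coefficient between parking lots, defined as

$$c_{ij}^t = \mathrm{Attn}(W_a \mathbf{x}_i^t, W_a \mathbf{x}_j^t),$$

where $\mathbf{x}_i^t$ and $\mathbf{x}_j^t$ are the current contextual representations of parking lots $p_i$ and $p_j$, $W_a$ is a learnable weight matrix shared over all edges, and $\mathrm{Attn}(\cdot)$ is a shared attention mechanism (e.g., dot-product, concatenation, etc.) [20]. The proximity score between $p_i$ and $p_j$ is further defined as

$$\alpha_{ij}^t = \frac{\exp(c_{ij}^t)}{\sum_{p_m \in \mathcal{N}_i} \exp(c_{im}^t)}.$$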


In general, the above attention mechanism is capable of computing pair-wise proximity scores for all $p_j \in P$. However, this formulation leads to quadratic complexity. To put more attention on neighboring parking lots and help faster convergence, we inject the adjacency constraint so that the attention operation only operates on adjacent nodes $p_j \in \mathcal{N}_i$, where $\mathcal{N}_i = \{p_j \mid a_{ij} = 1\}$ is the set of neighboring parking lots of $p_i$ in $G$. Since the influence of a nearby parking lot may also vary across time steps, we learn a different proximity score for each time step.
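The neighbor-restricted proximity scores can be sketched as follows. The sketch assumes dot-product attention (the variant the implementation details report using) at a single time step. `proximity_scores` and its argument layout are hypothetical names. The neighbor lists are assumed non-empty, which holds when every lot counts as its own neighbor.

```python
import math
from typing import Dict, List

Vector = List[float]

def matvec(W: List[Vector], x: Vector) -> Vector:
    """Dense matrix-vector product W x."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]

def proximity_scores(X: List[Vector], W_a: List[Vector],
                     nbrs: List[List[int]]) -> List[Dict[int, float]]:
    """alpha[i][j]: softmax over j in N_i of the dot product <W_a x_i, W_a x_j>."""
    H = [matvec(W_a, x) for x in X]          # shared projection for all lots
    alpha = []
    for i, N_i in enumerate(nbrs):           # N_i must be non-empty
        c = {j: sum(a * b for a, b in zip(H[i], H[j])) for j in N_i}
        m = max(c.values())                  # subtract the max logit for stability
        e = {j: math.exp(v - m) for j, v in c.items()}
        z = sum(e.values())
        alpha.append({j: v / z for j, v in e.items()})
    return alpha

# Toy example: identity projection, every lot adjacent to every lot
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
W_a = [[1.0, 0.0], [0.0, 1.0]]
alpha = proximity_scores(X, W_a, [[0, 1, 2], [0, 1, 2], [0, 1, 2]])
```

Restricting the softmax to `N_i` is what keeps the cost proportional to the number of edges rather than quadratic in the number of lots.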

Once $\alpha_{ij}^t$ is obtained, the contextual graph convolution operation updates the representation of the current parking lot by aggregating and transforming the representations of its neighbors, defined as

$$\mathbf{x}_i^{t,(k)} = \sigma\Big(\sum_{p_j \in \mathcal{N}_i} \alpha_{ij}^t W_c \mathbf{x}_j^{t,(k-1)}\Big),$$

where $\sigma$ is a non-linear activation function, and $W_c$ is a learnable weight matrix shared over all parking lots. Note that we can stack $K$ identical contextual graph convolution layers to capture $K$-hop local dependencies, and $\mathbf{x}_i^{t,(0)}$ is the raw contextual feature in the first CxtConv layer.
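A single CxtConv layer, and a stack of them, might look like the sketch below. For simplicity it reuses one set of proximity scores for every layer, whereas in the text the scores are computed from the current representations. It uses LeakyReLU as the activation with a hypothetical slope of 0.2, and all function names are illustrative.

```python
from typing import Dict, List

Vector = List[float]

def matvec(W: List[Vector], x: Vector) -> Vector:
    return [sum(w * v for w, v in zip(row, x)) for row in W]

def leaky_relu(v: float, slope: float = 0.2) -> float:
    # slope 0.2 is a hypothetical choice, not a value taken from the paper
    return v if v > 0 else slope * v

def cxt_conv_layer(X: List[Vector], alpha: List[Dict[int, float]],
                   W_c: List[Vector]) -> List[Vector]:
    """x_i^(k) = LeakyReLU(sum_{j in N_i} alpha_ij W_c x_j^(k-1))."""
    out = []
    for a_i in alpha:
        agg = [0.0] * len(W_c)
        for j, a_ij in a_i.items():
            agg = [g + a_ij * v for g, v in zip(agg, matvec(W_c, X[j]))]
        out.append([leaky_relu(g) for g in agg])
    return out

def cxt_conv(X: List[Vector], alpha: List[Dict[int, float]],
             weights: List[List[Vector]]) -> List[Vector]:
    """Stack one layer per weight matrix; K layers reach K-hop neighbors."""
    for W_c in weights:
        X = cxt_conv_layer(X, alpha, W_c)
    return X

X = [[1.0], [-1.0]]
alpha = [{0: 0.5, 1: 0.5}, {1: 1.0}]
print(cxt_conv_layer(X, alpha, [[2.0]]))  # [[0.0], [-0.4]]
```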

Soft clustering graph convolution

Besides local correlations, distant parking lots may also be correlated. For example, distant parking lots in similar functional areas may show similar PAs, e.g., business areas may have lower PA during office hours, while residential areas may have higher PA at the same time. However, CxtConv only captures local spatial correlations, and simply stacking more CxtConv layers does not help: [9] shows that as $K$ grows large, the representations of all parking lots tend to become similar and therefore lose discriminative power. To this end, we propose the SCConv block to capture global correlations between parking lots. Specifically, SCConv defines a set of latent nodes and learns the representation of each latent node based on the learned representations of the parking lots. Rather than assigning each parking lot to a single cluster, we learn a soft assignment matrix so that each parking lot can belong to multiple clusters with different probabilities (with the probabilities summing to one), as shown in Figure 2.


Figure 2: Hierarchical soft clustering.

The intuition behind SCConv is two-fold. First, distant parking lots may have similar contextual features and PAs, and therefore should have similar representations. The shared latent node representation can be viewed as a regularization for the prediction task. Second, one parking lot may be mapped to multiple latent nodes. If we view each latent node as a different functionality class, a parking lot may serve several functionalities. For example, a parking lot in a recreational center may also be occupied by visitors from a nearby office building.

The key component in SCConv is the soft assignment matrix. Given that there are $C$ latent nodes, let $S^t \in \mathbb{R}^{N \times C}$ denote the soft assignment matrix, where $S_{ij}^t$ denotes the probability that the $i$-th parking lot maps to the $j$-th latent node. Specifically, we use $S_{i,:}^t$ to denote the $i$-th row and $S_{:,j}^t$ to denote the $j$-th column of $S^t$. Given the learned representation $\mathbf{x}_i^t$ of each parking lot, each row of $S^t$ is computed as

$$S_{i,:}^t = \mathrm{Softmax}(W_s \mathbf{x}_i^t),$$

where $W_s$ is a learnable weight matrix. The Softmax guarantees that the probabilities of a given parking lot belonging to each latent node sum to one.

Once $S^t$ is obtained, the representation of each latent node can be derived by

$$\mathbf{e}_j^t = \sum_{i=1}^{N} S_{ij}^t\, \mathbf{x}_i^t.$$
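Given the representation of each latent node, similar to CxtConv, we apply the soft clustering convolution operation to capture the dependency between latent nodes,

$$\mathbf{e}_i^{t,(k)} = \sigma\Big(\sum_{j=1}^{C} \beta_{ij}^t W_e \mathbf{e}_j^{t,(k-1)}\Big),$$

where $\sigma$ is a non-linear activation function, $W_e$ is a learnable weight matrix, and $\beta_{ij}^t$ is the proximity score between latent nodes $i$ and $j$. Rather than introducing extra attention parameters as in CxtConv, we derive the proximity score between latent nodes from the adjacency constraint between parking lots,

$$\beta_{ij}^t = \sum_{m=1}^{N} \sum_{n=1}^{N} S_{mi}^t\, a_{mn}\, S_{nj}^t,$$

where $a_{mn}$ equals one if parking lots $p_m$ and $p_n$ are connected. With the learned latent node representations, we generate the soft clustering representation of each parking lot as the reverse of the latent node representation generation process,

$$\tilde{\mathbf{x}}_i^t = \sum_{j=1}^{C} S_{ij}^t\, \mathbf{e}_j^t.$$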
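The SCConv data flow (soft assignment, latent pooling, latent proximity, one convolution over latent nodes, and the reverse mapping) can be sketched with plain lists as below. This is a hypothetical illustration: `sc_conv`, the weight shapes (`W_s` is C x d, `W_e` is d' x d), and the LeakyReLU slope are assumptions. Because every row of the assignment matrix sums to one, the latent proximity matrix keeps the total edge weight of the parking-lot adjacency, which gives a convenient sanity check.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]

def softmax(v: List[float]) -> List[float]:
    m = max(v)
    e = [math.exp(x - m) for x in v]
    s = sum(e)
    return [x / s for x in e]

def transpose(A: Matrix) -> Matrix:
    return [list(col) for col in zip(*A)]

def matmul(A: Matrix, B: Matrix) -> Matrix:
    cols = transpose(B)
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]

def sc_conv(X: Matrix, A: Matrix, W_s: Matrix, W_e: Matrix,
            slope: float = 0.2) -> Tuple[Matrix, Matrix, Matrix]:
    # 1) soft assignment: row i of S is softmax(W_s x_i); S has shape N x C
    S = [softmax([sum(w * v for w, v in zip(row, x)) for row in W_s]) for x in X]
    St = transpose(S)
    # 2) latent node representations: E = S^T X (C x d)
    E = matmul(St, X)
    # 3) latent proximity derived from the lot adjacency: B = S^T A S (C x C)
    B = matmul(matmul(St, A), S)
    # 4) one soft clustering convolution: LeakyReLU(B E W_e^T), shape C x d'
    EW = matmul(E, transpose(W_e))
    Z = [[max(v, slope * v) for v in row] for row in matmul(B, EW)]
    # 5) reverse process: map latent nodes back to lots, X_tilde = S Z (N x d')
    X_tilde = matmul(S, Z)
    return S, B, X_tilde

X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
A = [[1, 1, 0], [1, 1, 0], [0, 0, 1]]
I2 = [[1.0, 0.0], [0.0, 1.0]]
S, B, X_tilde = sc_conv(X, A, W_s=I2, W_e=I2)
```

This pooling/unpooling pattern mirrors the soft-assignment idea the related-work section credits to [25]; the single convolution step stands in for a stack of SCConv layers.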


Temporal dependency modeling

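We leverage the Gated Recurrent Unit (GRU) [3], a simple yet effective variant of the recurrent neural network (RNN), to model the temporal dependency. Consider the previous $T$ step inputs of parking lot $p_i$, $(\mathbf{x}_i^{t-T+1}, \dots, \mathbf{x}_i^{t})$. We denote the status of $p_i$ at time steps $t-1$ and $t$ as $\mathbf{h}_i^{t-1}$ and $\mathbf{h}_i^t$, respectively. The temporal dependency between $\mathbf{h}_i^{t-1}$ and $\mathbf{h}_i^t$ can be modeled by

$$\mathbf{h}_i^t = (1 - \mathbf{z}_i^t) \circ \mathbf{h}_i^{t-1} + \mathbf{z}_i^t \circ \tilde{\mathbf{h}}_i^t,$$

where the update gate $\mathbf{z}_i^t$ and the candidate state $\tilde{\mathbf{h}}_i^t$ (together with the reset gate $\mathbf{r}_i^t$) are defined as

$$\mathbf{r}_i^t = \sigma\big(W_r [\mathbf{h}_i^{t-1} \oplus \mathbf{x}_i^t] + \mathbf{b}_r\big),$$

$$\mathbf{z}_i^t = \sigma\big(W_z [\mathbf{h}_i^{t-1} \oplus \mathbf{x}_i^t] + \mathbf{b}_z\big),$$

$$\tilde{\mathbf{h}}_i^t = \tanh\big(W_h [(\mathbf{r}_i^t \circ \mathbf{h}_i^{t-1}) \oplus \mathbf{x}_i^t] + \mathbf{b}_h\big),$$

where $W_r$, $W_z$, $W_h$, $\mathbf{b}_r$, $\mathbf{b}_z$, $\mathbf{b}_h$ are learnable parameters, $\sigma$ is the sigmoid function, $\oplus$ is the concatenation operation, and $\circ$ denotes the Hadamard product. Then the hidden state $\mathbf{h}_i^t$ is directly used to predict the PAs of the next $\tau$ time steps,

$$(\hat{y}_i^{t+1}, \hat{y}_i^{t+2}, \dots, \hat{y}_i^{t+\tau}) = W_o \mathbf{h}_i^t + \mathbf{b}_o,$$

where $W_o \in \mathbb{R}^{\tau \times d}$ and $\mathbf{b}_o \in \mathbb{R}^{\tau}$ are learnable parameters, and $d$ is the dimension of the hidden state.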
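The GRU update and the read-out of the next PAs can be sketched in plain Python as follows, with the hidden state and input as lists. The parameter dictionary keys and the function names are hypothetical. The output layer is shown as a plain linear map, which is an assumption about its exact form.

```python
import math
from typing import Dict, List

Vector = List[float]

def sigmoid(v: float) -> float:
    return 1.0 / (1.0 + math.exp(-v))

def affine(W: List[Vector], b: Vector, x: Vector) -> Vector:
    """W x + b for a dense W."""
    return [sum(w * v for w, v in zip(row, x)) + bi for row, bi in zip(W, b)]

def gru_step(h_prev: Vector, x: Vector, P: Dict[str, list]) -> Vector:
    hx = h_prev + x                                               # concat [h_{t-1}, x_t]
    r = [sigmoid(v) for v in affine(P["W_r"], P["b_r"], hx)]      # reset gate
    z = [sigmoid(v) for v in affine(P["W_z"], P["b_z"], hx)]      # update gate
    rh = [ri * hi for ri, hi in zip(r, h_prev)] + x               # concat [r * h_{t-1}, x_t]
    h_cand = [math.tanh(v) for v in affine(P["W_h"], P["b_h"], rh)]
    return [(1 - zi) * hi + zi * ci for zi, hi, ci in zip(z, h_prev, h_cand)]

def run_gru(xs: List[Vector], P: Dict[str, list], d: int) -> Vector:
    h = [0.0] * d                        # zero initial hidden state
    for x in xs:                         # inputs x_{t-T+1}, ..., x_t
        h = gru_step(h, x, P)
    return h

def predict(h: Vector, W_o: List[Vector], b_o: Vector) -> Vector:
    """Linear read-out of the next tau PAs (assumed form)."""
    return affine(W_o, b_o, h)

# Toy check with all-zero parameters: hidden size 2, input size 1
Z23 = [[0.0] * 3 for _ in range(2)]
P = {"W_r": Z23, "W_z": Z23, "W_h": Z23,
     "b_r": [0.0, 0.0], "b_z": [0.0, 0.0], "b_h": [0.0, 0.0]}
print(gru_step([1.0, -2.0], [3.0], P))   # [0.5, -1.0]
```

With all-zero parameters both gates equal 0.5 and the candidate state is 0, so each step simply halves the previous hidden state.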

Parking availability approximation

The real-time PA is a strong signal for future PA prediction. However, as in Beijing, only a small portion of real-time PAs can be obtained through real-time sensors, which prevents us from directly applying real-time PA as part of the input features. To leverage the information hidden in the partially observed real-time PAs, we approximate the missing PAs from both the spatial and temporal domains. The proposed method consists of three blocks, i.e., the spatial PropConv block, the temporal GRU block, and the fusion block. Note that rather than approximating a scalar PA $y_i^t$, we learn the distribution of the PA, $\hat{\mathbf{y}}_i^t$, for better information preservation. Given a PA $y_i^t$, we discretize its distribution into a $D$-dimensional one-hot vector $\mathbf{y}_i^t \in \mathbb{R}^{D}$. The objective of the PA approximation is to minimize the difference between $\hat{\mathbf{y}}_i^t$ and $\mathbf{y}_i^t$.
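The text does not specify how a PA is discretized into a one-hot vector. One possible scheme is assumed here purely for illustration: bin the vacancy ratio (vacant spots divided by capacity) into equal-width bins. `pa_one_hot` and its `capacity` argument are hypothetical.

```python
from typing import List

def pa_one_hot(pa: float, capacity: float, D: int) -> List[float]:
    """Hypothetical discretization: bin pa / capacity into D equal-width bins."""
    ratio = min(max(pa / capacity, 0.0), 1.0)   # clip the vacancy ratio to [0, 1]
    k = min(int(ratio * D), D - 1)              # a fully vacant lot (ratio 1.0) goes to the last bin
    return [1.0 if j == k else 0.0 for j in range(D)]

print(pa_one_hot(30, 100, 4))   # [0.0, 1.0, 0.0, 0.0]
```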

Spatial based PA approximation

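Similar to CxtConv, for each $p_i \in P_u$, the PropConv operation is defined as

$$\hat{\mathbf{y}}_{i,\mathrm{sp}}^t = \sum_{p_j \in \mathcal{N}'_i \cap P_l} \gamma_{ij}^t\, \mathbf{y}_j^t,$$

where $\hat{\mathbf{y}}_{i,\mathrm{sp}}^t$ is the obtained PA distribution, and $\gamma_{ij}^t$ is the proximity score between $p_i$ and $p_j$. Different from CxtConv, the estimated PA is only aggregated from nearby parking lots with real-time PAs, and we preserve the aggregated vector representation without an extra activation function. The proximity score $\gamma_{ij}^t$ is computed through the same attention mechanism as in CxtConv, but with a relaxed connectivity constraint that defines the neighborhood $\mathcal{N}'_i$,

$$a'_{ij} = \begin{cases} 1, & dist(p_i, p_j) \le \max\big(\epsilon, dist(p_i, p_i^{\kappa})\big), \\ 0, & \text{otherwise}, \end{cases}$$

where $dist(p_i, p_i^{\kappa})$ denotes the road network distance between parking lot $p_i$ and its $\kappa$-th nearest parking lot $p_i^{\kappa}$. The relaxed connectivity constraint improves node connectivity for more sufficient propagation of observed PAs, and therefore alleviates the data scarcity problem.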
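PropConv and its relaxed neighborhood can be sketched as below. The attention logits for each pair are taken as given; in the text they come from the same attention mechanism as CxtConv. The relaxed radius is assumed to be the larger of the distance threshold and the distance to the kappa-th nearest other lot, and all names are hypothetical. In the toy example, the isolated fourth lot would have no neighbors under a strict 1.5 km threshold, but it gains two under the relaxed rule.

```python
import math
from typing import Dict, List

def relaxed_neighbors(dist: List[List[float]], eps: float, kappa: int) -> List[List[int]]:
    """Neighbors within max(eps, distance to the kappa-th nearest other lot)."""
    n = len(dist)
    result = []
    for i in range(n):
        others = sorted(dist[i][j] for j in range(n) if j != i)
        radius = max(eps, others[min(kappa, len(others)) - 1]) if others else eps
        result.append([j for j in range(n) if j != i and dist[i][j] <= radius])
    return result

def prop_conv(logits_i: Dict[int, float], nbrs_i: List[int], labeled: set,
              one_hot: Dict[int, List[float]]) -> List[float]:
    """Spatial PA distribution for one lot: attention-weighted sum of labeled neighbors' one-hots."""
    cand = [j for j in nbrs_i if j in labeled]   # only lots with real-time PA contribute
    if not cand:
        raise ValueError("no labeled neighbor to propagate from")
    m = max(logits_i[j] for j in cand)
    w = {j: math.exp(logits_i[j] - m) for j in cand}
    z = sum(w.values())
    out = [0.0] * len(one_hot[cand[0]])
    for j in cand:
        out = [o + (w[j] / z) * v for o, v in zip(out, one_hot[j])]
    return out  # no activation; sums to 1 because weights and one-hots each sum to 1

pos = [0, 1, 2, 5]                                   # hypothetical positions along a road (km)
dist = [[abs(a - b) for b in pos] for a in pos]
nbrs = relaxed_neighbors(dist, eps=1.5, kappa=2)     # [[1, 2], [0, 2], [0, 1], [1, 2]]
y_sp = prop_conv({1: 0.0, 2: 0.0}, nbrs[3], {1, 2}, {1: [1.0, 0.0], 2: [0.0, 1.0]})
```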

Temporal based PA approximation

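We reuse the output of the GRU block to approximate the real-time PA from the temporal domain. The difference between current PA approximation and future PA prediction is that here we employ a different output function. Recall that in the previous step we have obtained the hidden state $\mathbf{h}_i^{t-1}$ from the GRU; we directly approximate the distribution of the PA at time step $t$ by

$$\hat{\mathbf{y}}_{i,\mathrm{tp}}^t = \mathrm{Softmax}(W_p \mathbf{h}_i^{t-1} + \mathbf{b}_p),$$

where $W_p$ and $\mathbf{b}_p$ are learnable parameters. This step does not introduce extra computation for the GRU, and the Softmax layer normalizes the distribution so that it sums to one.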
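The temporal approximation is a softmax over a linear map of the previous hidden state. Here is a minimal sketch. The names `temporal_pa_distribution`, `W_p` and `b_p` are hypothetical, and the toy parameters are chosen only to make the output easy to check.

```python
import math
from typing import List

def temporal_pa_distribution(h_prev: List[float], W_p: List[List[float]],
                             b_p: List[float]) -> List[float]:
    """Softmax(W_p h_{t-1} + b_p): a D-dimensional PA distribution for time step t."""
    logits = [sum(w * v for w, v in zip(row, h_prev)) + b for row, b in zip(W_p, b_p)]
    m = max(logits)                      # shift for numerical stability
    e = [math.exp(l - m) for l in logits]
    s = sum(e)
    return [x / s for x in e]

p = temporal_pa_distribution([0.3, -0.7], [[0.0, 0.0]] * 4, [0.0] * 4)
print(p)  # [0.25, 0.25, 0.25, 0.25]
```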

Approximated PA fusion

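Rather than directly averaging $\hat{\mathbf{y}}_{i,\mathrm{sp}}^t$ and $\hat{\mathbf{y}}_{i,\mathrm{tp}}^t$, we propose an entropy-based mechanism to fuse the two PA distributions. Specifically, we put more weight on the approximation with less uncertainty [7], i.e., the one with smaller entropy. Given an estimated PA distribution $\hat{\mathbf{y}}$, its entropy is

$$H(\hat{\mathbf{y}}) = -\sum_{j=1}^{D} \hat{y}_j \log \hat{y}_j,$$

where $\hat{y}_j$ represents the $j$-th dimension of $\hat{\mathbf{y}}$. We fuse the two PA distributions $\hat{\mathbf{y}}_{i,\mathrm{sp}}^t$ and $\hat{\mathbf{y}}_{i,\mathrm{tp}}^t$ into a single distribution $\hat{\mathbf{y}}_i^t$ by weighting them according to their entropies, so that the lower-entropy (more certain) approximation contributes more.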
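The exact fusion formula is not recoverable from this text, so the sketch below adopts one plausible entropy-based weighting as an explicit assumption. Each distribution's weight is proportional to the other distribution's entropy, so the more certain (lower-entropy) approximation dominates. The entropy function follows the standard definition; `fuse` and its weighting rule are hypothetical.

```python
import math
from typing import List

def entropy(p: List[float]) -> float:
    """H(p) = -sum_j p_j log p_j, with 0 log 0 taken as 0."""
    return -sum(x * math.log(x) for x in p if x > 0)

def fuse(p_sp: List[float], p_tp: List[float]) -> List[float]:
    """Hypothetical inverse-entropy weighting of the spatial and temporal distributions."""
    h_sp, h_tp = entropy(p_sp), entropy(p_tp)
    total = h_sp + h_tp
    w_sp = 0.5 if total == 0 else h_tp / total   # lower entropy -> larger weight
    return [w_sp * a + (1.0 - w_sp) * b for a, b in zip(p_sp, p_tp)]

confident = [1.0, 0.0, 0.0, 0.0]   # entropy 0
uncertain = [0.25] * 4             # entropy log 4
print(fuse(confident, uncertain))  # [1.0, 0.0, 0.0, 0.0]
```

Because both weights are non-negative and sum to one, the fused vector is still a valid probability distribution.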

The approximated PA distribution $\hat{\mathbf{y}}_i^t$ is used for two tasks. First, it is concatenated with the learned representation of CxtConv and fed to the SCConv block for latent node representation learning. Second, it is combined with the outputs of CxtConv and SCConv as $\mathbf{x}_i^t \oplus \tilde{\mathbf{x}}_i^t \oplus \hat{\mathbf{y}}_i^t$. We use this combined vector as the overall representation of each parking lot at time step $t$, and feed it into the GRU module to generate the final PA prediction results.

Model training

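Since only parking lots in $P_l$ have observed labels, following the semi-supervised learning paradigm, SHARE aims to minimize the mean square error (MSE) between the predicted PAs and the observed PAs,

$$\mathcal{L}_{\mathrm{MSE}} = \frac{1}{\tau |P_l|} \sum_{p_i \in P_l} \sum_{j=1}^{\tau} \big(y_i^{t+j} - \hat{y}_i^{t+j}\big)^2.$$

Additionally, in PA approximation, we introduce an extra cross entropy (CE) loss to minimize the error between the observed PAs and the approximated PA distributions (i.e., the spatial and temporal based PA distribution approximations $\hat{\mathbf{y}}_{i,\mathrm{sp}}^t$ and $\hat{\mathbf{y}}_{i,\mathrm{tp}}^t$) at the current time step $t$,

$$\mathcal{L}_{\mathrm{CE}} = -\frac{1}{|P_l|} \sum_{p_i \in P_l} \Big( (\mathbf{y}_i^t)^\top \log \hat{\mathbf{y}}_{i,\mathrm{sp}}^t + (\mathbf{y}_i^t)^\top \log \hat{\mathbf{y}}_{i,\mathrm{tp}}^t \Big).$$

By considering both the MSE loss and the CE loss, SHARE jointly minimizes the following objective,

$$\mathcal{L} = \mathcal{L}_{\mathrm{MSE}} + \lambda\, \mathcal{L}_{\mathrm{CE}},$$

where $\lambda$ is a hyper-parameter that controls the importance of the two CE losses.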
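A plain-Python sketch of the joint objective follows. It makes three assumptions: the MSE is averaged over all labeled lots and horizons, the CE terms are averaged over labeled lots, and probabilities are clipped at a tiny value before the logarithm. The function name and the dictionary-per-lot argument layout are hypothetical.

```python
import math
from typing import Dict, List

PerLot = Dict[int, List[float]]

def share_loss(y_true: PerLot, y_pred: PerLot, one_hot: PerLot, p_sp: PerLot,
               p_tp: PerLot, lam: float, tiny: float = 1e-12) -> float:
    """MSE over labeled lots and horizons + lam * (CE_spatial + CE_temporal)."""
    labeled = list(y_true)                      # only lots in P_l carry observed PAs
    sq = [(a - b) ** 2 for i in labeled for a, b in zip(y_true[i], y_pred[i])]
    mse = sum(sq) / len(sq)

    def ce(target: List[float], prob: List[float]) -> float:
        # clip probabilities so log() never receives 0
        return -sum(t * math.log(max(p, tiny)) for t, p in zip(target, prob))

    ce_total = sum(ce(one_hot[i], p_sp[i]) + ce(one_hot[i], p_tp[i])
                   for i in labeled) / len(labeled)
    return mse + lam * ce_total

loss = share_loss({0: [1.0, 2.0]}, {0: [1.0, 4.0]}, {0: [0.0, 1.0]},
                  {0: [0.5, 0.5]}, {0: [0.5, 0.5]}, lam=0.5)
```

Unlabeled lots never enter the sum, which is what makes the training signal semi-supervised.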


Experimental setup

Data description.

We use two real-world datasets collected from Beijing and Shenzhen, two metropolises in China. Both datasets range from April 20, 2019, to May 20, 2019. All PA records are crawled every 15 minutes from a publicly accessible app, in which all parking occupancy information is collected by real-time sensors. We associate the POI distribution [13, 29] with each parking lot and aggregate the check-in records near each parking lot over fixed time intervals as the population data. POI and check-in data are collected through the Baidu Maps Place API and location SDK [12]. We chronologically order the above data, take the first part as the training set, the following part for validation, and the rest as the test set. In each dataset, a portion of the parking lots are masked as unlabeled. The spatial distribution of parking lots in Beijing is shown in Figure 3. The statistics of the datasets are summarized in Table 1.
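The chronological split and the masking of parking lots can be sketched as follows. This text does not give the paper's split fractions or its unlabeled fraction, so the values in the usage lines are hypothetical. Masking lots uniformly at random is also an assumption.

```python
import random
from typing import List, Tuple

def chronological_split(n_steps: int, train_frac: float,
                        val_frac: float) -> Tuple[List[int], List[int], List[int]]:
    """Split time-step indices in order: earliest for training, then validation, rest for test."""
    n_train = int(n_steps * train_frac)
    n_val = int(n_steps * val_frac)
    idx = list(range(n_steps))
    return idx[:n_train], idx[n_train:n_train + n_val], idx[n_train + n_val:]

def mask_lots(n_lots: int, unlabeled_frac: float,
              seed: int = 0) -> Tuple[List[int], List[int]]:
    """Hide the PA of a random fraction of lots; returns (labeled, unlabeled) indices."""
    rng = random.Random(seed)
    lots = list(range(n_lots))
    rng.shuffle(lots)
    k = int(n_lots * unlabeled_frac)
    return sorted(lots[k:]), sorted(lots[:k])

# Hypothetical fractions; 2,976 steps = 31 days x 96 fifteen-minute slots
train, val, test = chronological_split(2976, train_frac=0.6, val_frac=0.2)
labeled, unlabeled = mask_lots(1965, unlabeled_frac=0.5)
```

Splitting by time rather than at random keeps future records out of training, which matches the chronological ordering described above.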

Figure 3: Spatial distribution of parking lots in Beijing.

Implementation details.

Our model and all seven baselines are implemented with PaddlePaddle. Following previous work [10, 26], the PA is normalized before input and scaled back to absolute PA in the output. We choose and select for prediction. We set Km and to connect parking lots. The dimension of and are fixed to , is fixed to . The layer of CxtConv, SCConv, and PropConv are , respectively. We use dot-product attention in this paper. In SCConv, the number of latent nodes is set as a fixed proportion of $N$, the total number of parking lots. The activation function in CxtConv and SCConv is LeakyReLU, and Sigmoid is used in the other layers. We employ the Adam optimizer for training, fix the learning rate to and set to . For a fair comparison, all parameters of each baseline are carefully tuned based on the recommended settings.

Statistic  Beijing  Shenzhen
# of parking lots 1,965 1,360
# of PA records 5,847,840 4,047,360
Average # of parking spots 210.24 185.36
# of check-ins 9,436,362,579 3,680,063,509
# of POIs 669,058 250,275
# of POI categories 197 188
Table 1: Statistics of datasets.
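The record counts in Table 1 are consistent with one record per parking lot every 15 minutes over the 31 days from April 20 to May 20, 2019, assuming both end dates are included. This arithmetic check is illustrative and not part of the original pipeline:

```python
from datetime import date

days = (date(2019, 5, 20) - date(2019, 4, 20)).days + 1   # both end dates included -> 31
steps_per_lot = days * (24 * 60 // 15)                     # 96 records per day -> 2,976

table1 = {"Beijing": (1965, 5_847_840), "Shenzhen": (1360, 4_047_360)}
for city, (lots, records) in table1.items():
    print(city, lots * steps_per_lot == records)   # True for both cities
```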

Evaluation metrics.

We adopt Mean Absolute Error (MAE) and Root Mean Square Error (RMSE), two widely used metrics [11], for evaluation.
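For reference, here is how both metrics can be computed over paired lists of observed and predicted PAs. The function names are illustrative.

```python
import math
from typing import Sequence

def mae(y: Sequence[float], y_hat: Sequence[float]) -> float:
    """Mean absolute error."""
    return sum(abs(a - b) for a, b in zip(y, y_hat)) / len(y)

def rmse(y: Sequence[float], y_hat: Sequence[float]) -> float:
    """Root mean square error."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y, y_hat)) / len(y))

observed = [10.0, 20.0, 30.0]
predicted = [12.0, 18.0, 30.0]
print(mae(observed, predicted), rmse(observed, predicted))
```

RMSE is never smaller than MAE on the same data, because it weights large errors more heavily. This is consistent with every row of Table 2.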

Algorithm  Beijing MAE (15 / 30 / 45 min)  Beijing RMSE (15 / 30 / 45 min)  Shenzhen MAE (15 / 30 / 45 min)  Shenzhen RMSE (15 / 30 / 45 min)
LR 29.90 / 30.27 / 30.58 69.74 / 70.95 / 72.00 24.59 / 24.80 / 25.09 51.31 / 52.36 / 52.80
GBRT 17.29 / 17.81 / 18.40 44.60 / 48.50 / 51.59 13.90 / 14.67 / 14.71 35.05 / 37.98 / 38.09
GRU 18.51 / 18.78 / 19.73 55.43 / 55.92 / 58.64 16.73 / 16.88 / 17.14 46.92 / 47.26 / 47.56
Google-Parking 21.49 / 21.68 / 22.85 57.26 / 59.25 / 60.48 17.10 / 18.33 / 18.69 47.30 / 48.45 / 49.34
Du-Parking 17.67 / 17.70 / 18.03 50.17 / 50.63 / 51.75 13.91 / 14.17 / 14.39 42.66 / 43.24 / 43.56
STGCN 16.57 / 16.44 / 17.10 50.79 / 51.04 / 52.61 13.46 / 13.59 / 13.88 39.26 / 39.96 / 40.29
DCRNN 15.66 / 15.97 / 16.30 46.28 / 47.80 / 48.87 13.11 / 13.19 / 13.89 42.74 / 43.37 / 44.27
CxtGNN (ours) 15.29 / 15.69 / 16.15 45.55 / 46.69 / 47.78 12.39 / 12.73 / 13.09 36.31 / 36.92 / 37.46
CAGNN (ours) 12.45 / 12.77 / 13.20 39.99 / 40.81 / 41.31 10.50 / 10.62 / 10.98 31.86 / 32.12 / 32.83
SHARE (ours) 10.68 / 10.97 / 11.43 32.00 / 32.78 / 33.78 9.23 / 9.41 / 9.66 30.44 / 30.90 / 31.70
Table 2: Parking availability prediction error given by MAE and RMSE on Beijing and Shenzhen.
Figure 4: Parameter sensitivity on Beijing. (a) Ratio of labeled parking lots; (b) ratio of latent nodes; (c) effect of $T$; (d) effect of $\tau$.


We compare our full approach with the following seven baselines and two variants of SHARE:

  • LR uses logistic regression for parking availability prediction. We concatenate the historical features of the previous $T$ steps as the input and predict each parking lot separately.

  • GBRT is a variant of boosting tree for regression tasks. It is widely used in practice and performs well in many data mining challenges. We use the version in XGBoost [2], and the input is the same as LR.

  • GRU [3] predicts the PA of each parking lot without considering spatial dependency. We train two GRUs, one for parking lots in $P_l$ and one for parking lots in $P_u$.

  • Google-Parking [1] is the parking difficulty prediction model deployed on Google Maps. It uses a feed-forward deep neural network for prediction.

  • Du-Parking [18] is the parking availability estimation model used on Baidu Maps. It fuses several LSTMs to capture various temporal dependencies.

  • STGCN [26] is a state-of-the-art graph neural network model for traffic forecasting. It models both spatial and temporal dependency with convolution structures. The input graph is constructed as described in the original paper, but it keeps the same graph connectivity as our CxtConv.

  • DCRNN [10] is another graph convolution network based model, which models spatial and temporal dependency by integrating graph convolution and GRU. The input graph is the same as for STGCN.

  • CxtGNN is a basic version of SHARE without the PA approximation module and the soft clustering graph convolution block.

  • CAGNN is another variant of SHARE but without including the soft clustering graph convolution block.
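For LR and GBRT, the input is the concatenation of the previous T steps of historical features. A sketch of that windowing for a single parking lot follows. The function name, the target layout (PAs of the next tau steps), and the toy data are assumptions.

```python
from typing import List, Tuple

def make_windows(features: List[List[float]], pa: List[float], T: int,
                 tau: int) -> Tuple[List[List[float]], List[List[float]]]:
    """Input: concatenated features of steps t-T..t-1; target: PAs of steps t..t+tau-1."""
    X, Y = [], []
    for t in range(T, len(features) - tau + 1):
        X.append([v for step in features[t - T:t] for v in step])
        Y.append(pa[t:t + tau])
    return X, Y

feats = [[0.0], [1.0], [2.0], [3.0], [4.0]]   # one hypothetical feature per step
pas = [10.0, 11.0, 12.0, 13.0, 14.0]
X, Y = make_windows(feats, pas, T=2, tau=1)
print(X)  # [[0.0, 1.0], [1.0, 2.0], [2.0, 3.0]]
print(Y)  # [[12.0], [13.0], [14.0]]
```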

Overall performance

Table 2 reports the overall results of our methods and all the compared baselines on the two datasets with respect to MAE and RMSE. SHARE outperforms all baselines on both metrics and both datasets, which demonstrates its advantage. Its variant CAGNN also outperforms all baselines, and CxtGNN outperforms all baselines on MAE. Specifically, SHARE achieves lower MAE and RMSE than the state-of-the-art approach (DCRNN) on Beijing at every prediction horizon, and the same holds on Shenzhen. Moreover, we observe significant improvements by comparing SHARE with its variants (i.e., CxtGNN and CAGNN). For example, by adding the PA approximation module, CAGNN achieves lower MAE and RMSE than CxtGNN on Beijing. By further adding the SCConv block, SHARE achieves lower MAE and RMSE than CAGNN on Beijing. The improvements on Shenzhen are consistent. All the above results demonstrate the effectiveness of the PA approximation and the hierarchical graph convolution architecture.
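Relative error reductions can be recomputed directly from Table 2. The sketch below uses the 15-minute columns. That choice of horizon is an assumption, since the text does not say which horizon the original improvement figures refer to.

```python
def rel_reduction(baseline: float, ours: float) -> float:
    """Relative error reduction in percent: (baseline - ours) / baseline * 100."""
    return (baseline - ours) / baseline * 100.0

# 15-minute (MAE, RMSE) values copied from Table 2
table2_15min = {
    "Beijing":  {"DCRNN": (15.66, 46.28), "CxtGNN": (15.29, 45.55),
                 "CAGNN": (12.45, 39.99), "SHARE": (10.68, 32.00)},
    "Shenzhen": {"DCRNN": (13.11, 42.74), "CxtGNN": (12.39, 36.31),
                 "CAGNN": (10.50, 31.86), "SHARE": (9.23, 30.44)},
}

for city, rows in table2_15min.items():
    for base, ours in [("DCRNN", "SHARE"), ("CxtGNN", "CAGNN"), ("CAGNN", "SHARE")]:
        mae_gain = rel_reduction(rows[base][0], rows[ours][0])
        rmse_gain = rel_reduction(rows[base][1], rows[ours][1])
        print(f"{city}: {ours} vs {base}: MAE -{mae_gain:.1f}%, RMSE -{rmse_gain:.1f}%")
```

At the 15-minute horizon, this gives reductions of roughly 31.8% (MAE) and 30.9% (RMSE) relative to DCRNN on Beijing. The corresponding figures on Shenzhen are roughly 29.6% and 28.8%.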

Looking further into the results, we observe that all graph convolution based models (i.e., STGCN, DCRNN, and SHARE) outperform the other deep learning based approaches (i.e., Google-Parking and Du-Parking) in terms of MAE, which consistently reveals the advantage of incorporating spatial dependency for parking availability prediction. Remarkably, GBRT outperforms Google-Parking, GRU, and LR, and achieves an MAE comparable to Du-Parking, which validates our expectation that GBRT is a simple but effective approach for regression tasks. Another interesting finding is that both the MAE and RMSE of all methods are smaller on Shenzhen than on Beijing. This is possibly because the parking lots in Shenzhen are more densely and evenly distributed, and are therefore easier to predict.

Parameter sensitivity

Due to space limitations, we report here the impact of four parameters using MAE on Beijing: the ratio of labeled parking lots (i.e., $|P_l|/N$), the proportion of latent nodes in the soft clustering graph convolution relative to the total number of parking lots (i.e., $C/N$), the input time step $T$, and the prediction time step $\tau$. Each time we vary one parameter, we set the others to their default values. The results on Beijing using RMSE and on Shenzhen using both metrics are similar.

First, we vary the ratio of labeled parking lots. The results are reported in Figure 4(a), and they are unsurprising: equipping more parking lots with real-time sensors enables more accurate PA prediction. However, equipping more sensors leads to extra economic cost and may be constrained by the policies of each parking lot. Finding the most cost-effective ratio and exploring the optimal sensor distribution are important problems for future study.

Then, we vary the ratio of latent nodes $C/N$, where $N = 1{,}965$ on Beijing (Table 1). The results are reported in Figure 4(b). As can be seen, performance improves as the ratio of latent nodes increases up to a point, but degrades as the ratio increases further. The reason is that too few latent nodes reduce the discriminative power of the learned latent representations, whereas too many latent nodes reduce their regularization power.

To test the impact of the input length, we vary $T$. The results are reported in Figure 4(c). SHARE achieves the lowest error at a moderate input length. One possible reason is that an excessively short input cannot provide sufficient temporally correlated information, whereas an overly long input introduces more noise into temporal dependency modeling.

Finally, to test the impact of the prediction step, we vary $\tau$. The results are reported in Figure 4(d), where labeled and unlabeled parking lots are shown separately. Overall, labeled parking lots are much easier to predict. Besides, as $\tau$ increases, the error on all parking lots increases consistently. However, the error on labeled parking lots increases faster, which makes sense because the temporal dependency between the observed PA and the future PA weakens as $\tau$ grows.

Figure 5: Robustness study on Beijing. (a) MAE; (b) # of parking spots.

Effectiveness on different regions

To evaluate the performance of SHARE on different regions, we partition Beijing into a set of disjoint grids based on longitude and latitude, and test the performance of SHARE in each region. Figure 5(a) and Figure 5(b) plot the averaged MAE of SHARE and the averaged number of parking spots in each region of Beijing, respectively. Overall, the MAE is fairly even across regions except for several outliers. We find that the performance of SHARE is highly correlated with the averaged number of parking spots in each region. For example, the two regions whose MAE is notably greater than the overall MAE also have averaged numbers of parking spots significantly greater than the overall average (210.24 on Beijing, Table 1). This is possibly because, for the same relative fluctuation in parking availability, a parking lot with more parking spots has a larger absolute error. This result suggests that future optimization could target these large parking lots to improve the overall performance.
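The region-level analysis can be sketched by bucketing each parking lot into a latitude/longitude grid cell and averaging a per-lot quantity (such as MAE or number of spots) within each cell. The cell size, the grid origin, and the dictionary layout of a lot are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

def grid_average(lots: List[dict], lat0: float, lon0: float, cell_deg: float,
                 key: str) -> Dict[Tuple[int, int], float]:
    """Average lot[key] over the lots falling in each (row, col) latitude/longitude cell."""
    sums: Dict[Tuple[int, int], float] = defaultdict(float)
    counts: Dict[Tuple[int, int], int] = defaultdict(int)
    for lot in lots:
        cell = (int((lot["lat"] - lat0) // cell_deg), int((lot["lon"] - lon0) // cell_deg))
        sums[cell] += lot[key]
        counts[cell] += 1
    return {cell: sums[cell] / counts[cell] for cell in sums}

# Hypothetical lots with per-lot MAE and capacity
lots = [
    {"lat": 39.25, "lon": 116.25, "mae": 10.0, "spots": 100},
    {"lat": 39.25, "lon": 116.25, "mae": 20.0, "spots": 300},
    {"lat": 39.75, "lon": 116.25, "mae": 5.0, "spots": 50},
]
print(grid_average(lots, 39.0, 116.0, 0.5, "mae"))    # {(0, 0): 15.0, (1, 0): 5.0}
print(grid_average(lots, 39.0, 116.0, 0.5, "spots"))  # {(0, 0): 200.0, (1, 0): 50.0}
```

Comparing the two per-cell maps is the kind of side-by-side view that Figure 5 presents.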

Related Work

Parking availability prediction. Previous studies on parking availability prediction mainly fall into two categories: contextual data based prediction and real-time sensor based prediction. For contextual data based prediction, Google-Parking [1] and Du-Parking [18] predict parking availability based on indirect signals (e.g., user feedback and contextual factors), which may induce inaccurate prediction results. For real-time sensor based prediction, [17] proposes an auto-regressive model and [4] proposes a boosting method for parking availability inference. These approaches are limited by economic and privacy concerns and are hard to scale to all parking lots in a city. Moreover, none of the above approaches fully exploits the non-Euclidean spatial autocorrelations between parking lots, which limits their prediction performance.
Graph neural network.

Graph neural network (GNN) extends the well-known convolutional neural network to non-Euclidean graph structures, where the representation of each node is derived by first aggregating and then transforming the representations of its neighbors [21]. It is worth pointing out that the idea of our soft clustering graph convolution is partially inspired by [25], but our objective is to capture global spatial correlations for node-level prediction. Due to its effectiveness, GNN has been successfully applied to several spatiotemporal forecasting tasks, such as traffic flow forecasting [10, 6] and taxi demand forecasting [5, 23]. However, we argue that these approaches either overlook contextual factors or global spatial dependency, and are not tailored for parking availability prediction.


In this paper, we present SHARE, a city-wide parking availability prediction framework based on both environmental contextual data and partially observed real-time parking availability data. We first propose a hierarchical graph convolution module to capture both local and global spatial correlations. Then, we adopt a simple yet effective GRU module to capture the dynamic temporal autocorrelations of each parking lot. Besides, a parking availability approximation module is proposed for parking lots without real-time parking availability information. Extensive experimental results on two real-world datasets show that SHARE significantly outperforms seven state-of-the-art baselines for parking availability prediction.


This research is supported in part by grants from the National Natural Science Foundation of China (Grant No.71531001).


  • [1] N. Arora, J. Cook, R. Kumar, I. Kuznetsov, Y. Li, H. Liang, A. Miller, A. Tomkins, I. Tsogsuren, and Y. Wang (2019) Hard to park?: estimating parking difficulty at scale. In Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 2296–2304.
  • [2] T. Chen and C. Guestrin (2016) Xgboost: a scalable tree boosting system. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 785–794.
  • [3] J. Chung, C. Gulcehre, K. Cho, and Y. Bengio (2014) Empirical evaluation of gated recurrent neural networks on sequence modeling. arXiv preprint arXiv:1412.3555.
  • [4] R. Fusek, K. Mozdřeň, M. Šurkala, and E. Sojka (2013) AdaBoost for parking lot occupation detection. In Proceedings of the 8th International Conference on Computer Recognition Systems CORES 2013, pp. 681–690.
  • [5] X. Geng, Y. Li, L. Wang, L. Zhang, Q. Yang, J. Ye, and Y. Liu (2019) Spatiotemporal multi-graph convolution network for ride-hailing demand forecasting. In Proceedings of the Thirty-Third AAAI Conference on Artificial Intelligence, pp. 3656–3663.
  • [6] S. Guo, Y. Lin, N. Feng, C. Song, and H. Wan (2019) Attention based spatial-temporal graph convolutional networks for traffic flow forecasting. In Proceedings of the Thirty-Third AAAI Conference on Artificial Intelligence, pp. 922–929.
  • [7] H. Hsieh, S. Lin, and Y. Zheng (2015) Inferring air quality for station location recommendation based on urban big data. In Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 437–446.
  • [8] T. N. Kipf and M. Welling (2017) Semi-supervised classification with graph convolutional networks. In 5th International Conference on Learning Representations, ICLR 2017.
  • [9] Q. Li, Z. Han, and X. Wu (2018) Deeper insights into graph convolutional networks for semi-supervised learning. In Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence, pp. 3538–3545.
  • [10] Y. Li, R. Yu, C. Shahabi, and Y. Liu (2018) Diffusion convolutional recurrent neural network: data-driven traffic forecasting. In 6th International Conference on Learning Representations, ICLR 2018.
  • [11] Y. Liang, S. Ke, J. Zhang, X. Yi, and Y. Zheng (2018) GeoMAN: multi-level attention networks for geo-sensory time series prediction. In Proceedings of the 27th International Joint Conference on Artificial Intelligence, IJCAI 2018, pp. 3428–3434.
  • [12] H. Liu, T. Li, R. Hu, Y. Fu, J. Gu, and H. Xiong (2019) Joint representation learning for multi-modal transportation recommendation. In Proceedings of the Thirty-Third AAAI Conference on Artificial Intelligence, pp. 1036–1043.
  • [13] H. Liu, Y. Tong, P. Zhang, X. Lu, J. Duan, and H. Xiong (2019) Hydra: a personalized and context-aware multi-modal transportation recommendation system. In Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 2314–2324.
  • [14] Y. Liu, C. Liu, X. Lu, M. Teng, H. Zhu, and H. Xiong (2017) Point-of-interest demand modeling with human mobility patterns. In Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 947–955.
  • [15] S. Ma, R. Hu, L. Wang, X. Lin, and J. Huai (2019) An efficient approach to finding dense temporal subgraphs. IEEE Transactions on Knowledge and Data Engineering.
  • [16] S. Mathur, T. Jin, N. Kasturirangan, J. Chandrasekaran, W. Xue, M. Gruteser, and W. Trappe (2010) Parknet: drive-by sensing of road-side parking statistics. In Proceedings of the 8th International Conference on Mobile Systems, Applications, and Services, pp. 123–136.
  • [17] T. Rajabioun and P. A. Ioannou (2015) On-street and off-street parking availability prediction using multivariate spatiotemporal models. IEEE Transactions on Intelligent Transportation Systems 16 (5), pp. 2913–2924.
  • [18] Y. Rong, Z. Xu, R. Yan, and X. Ma (2018) Du-parking: spatio-temporal big data tells you realtime parking availability. In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 646–654.
  • [19] D. C. Shoup (2006) Cruising for parking. Transport Policy 13 (6), pp. 479–486.
  • [20] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, Ł. Kaiser, and I. Polosukhin (2017) Attention is all you need. In Advances in Neural Information Processing Systems, pp. 6000–6010.
  • [21] P. Velickovic, G. Cucurull, A. Casanova, A. Romero, P. Liò, and Y. Bengio (2018) Graph attention networks. In 6th International Conference on Learning Representations, ICLR 2018.
  • [22] P. Wang, Y. Fu, G. Liu, W. Hu, and C. Aggarwal (2017) Human mobility synchronization and trip purpose detection with mixture of hawkes processes. In Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 495–503.
  • [23] Y. Wang, H. Yin, H. Chen, T. Wo, J. Xu, and K. Zheng (2019) Origin-destination matrix prediction via graph convolution: a new perspective of passenger demand modeling. In Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 1227–1235.
  • [24] H. Yao, X. Tang, H. Wei, G. Zheng, and Z. Li (2019) Revisiting spatial-temporal similarity: a deep learning framework for traffic prediction. In Proceedings of the Thirty-Third AAAI Conference on Artificial Intelligence.
  • [25] Z. Ying, J. You, C. Morris, X. Ren, W. Hamilton, and J. Leskovec (2018) Hierarchical graph representation learning with differentiable pooling. In Advances in Neural Information Processing Systems, pp. 4800–4810.
  • [26] B. Yu, H. Yin, and Z. Zhu (2018) Spatio-temporal graph convolutional networks: a deep learning framework for traffic forecasting. In Proceedings of the 27th International Joint Conference on Artificial Intelligence, IJCAI 2018.
  • [27] J. Zhang, F. Wang, K. Wang, W. Lin, X. Xu, and C. Chen (2011) Data-driven intelligent transportation systems: a survey. IEEE Transactions on Intelligent Transportation Systems 12 (4), pp. 1624–1639.
  • [28] J. Zhou and A. K. Tung (2015) Smiler: a semi-lazy time series prediction system for sensors. In Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data, pp. 1871–1886.
  • [29] H. Zhu, H. Xiong, F. Tang, Q. Liu, Y. Ge, E. Chen, and Y. Fu (2016) Days on market: measuring liquidity in real estate markets. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 393–402.