Potential Passenger Flow Prediction: A Novel Study for Urban Transportation Development

Practical applications of passenger flow prediction have recently brought many benefits to urban transportation development. As urbanization proceeds, transportation managers face a real-world demand: constructing a new metro station in a city area where none was previously planned. Before building a new station, authorities want a picture of the future volume of commuters and an estimate of how the station would affect other areas. In this paper, we term this problem potential passenger flow (PPF) prediction, a novel and important study connected with urban computing and intelligent transportation systems. For example, an accurate PPF predictor can provide invaluable knowledge to designers, such as advice on station scale and the influence on other areas. To address this problem, we propose a multi-view localized correlation learning method. The core idea of our strategy is to learn the passenger flow correlations between the target areas and their localized areas with adaptive weights. To improve the prediction accuracy, other domain knowledge is incorporated via a multi-view learning process. We conduct extensive experiments to evaluate the effectiveness of our method on real-world official transportation datasets. The results demonstrate that our method outperforms the other available baselines. In addition, our method provides an effective solution to the cold-start problem in recommender systems, as shown by its superior experimental results.




With the growth of intelligent transportation systems, passenger flow prediction models concentrate on discovering the crowd volumes and mobility patterns that best serve people’s daily life [Pan et al.2019, Zhang et al.2019]. Recent advances in passenger flow prediction focus mainly on flow conditions in the next time interval as time evolves [Gong et al.2018, Sun et al.2015]. If a brand-new metro station is inserted into the original metro network, existing predictors have to collect a large amount of recent transactional data before they can operate normally. However, transportation authorities have a real-world requirement to obtain the potential passenger flows (PPF) of a planned city area in advance (i.e., before constructing a station in this area). This is significant for urban traffic development and transportation management, as it can provide insights for station site selection and the analysis of passenger movement patterns, and can give early warning of potential crowding.

Figure 1: An example of the PPF prediction problem. We aim to forecast the passenger flows of the target areas across the entire city network.

In the PPF prediction task, concentrating solely on the potential entrance and exit flows does not provide adequate information. Authorities also want to know the distribution of the predicted PPF, i.e., the number of potential passengers moving to each destination, because it is of the utmost importance to find out how the new station will affect other areas. Figure 1 illustrates an example of the PPF prediction problem. A city region is partitioned into nine areas (we use grids for clear and simple illustration; the real partition standard is explained in the Data Description section); six of them have metro stations (termed known areas), and three have no station yet (termed target areas). The right part of Figure 1 presents an origin-destination (OD) matrix in which each row corresponds to an origin area and each column to a destination area; e.g., an entry of 130 indicates that 130 passengers depart from the row area and travel to the column area. The PPF task aims to make an accurate prediction for the target areas in one period (e.g., rush hours) that completes the crowd flows between them and the known areas.

To date, only a few studies have considered the OD passenger flow prediction problem [Gong et al.2018, Wang et al.2019], and to the best of our knowledge, none of the existing techniques can forecast PPF across the entire city. It is a novel problem and a real urban development demand that faces several major challenges: (1) The number of passenger flows and their final destinations must be considered simultaneously. (2) As in the cold-start problem in recommender systems [Lam et al.2008], where it is hard to infer the preferences of a new user from the known data, a new station in a target area can be regarded as a new user. (3) Since PPF prediction is a spatial-temporal mining problem, spatial and temporal information should be taken into account appropriately.

To resolve this novel and significant problem, we devise a multi-view localized correlation learning model for PPF prediction (MLC-PPF for short). First, to leverage the spatial information, we construct a localized similarity matrix that is associated with the real geographical neighbors and regional properties (e.g., business or residential regions). The intuition behind this strategy comes from the First Law of Geography [Tobler1970], i.e., “Everything is related to everything else, but near things are more related than distant things”. Second, we propose a novel weighted correlation learning strategy. Finally, to improve the prediction accuracy and handle the cold-start challenge, we draw side information from urban statistical data, in which each area has multi-view features that guide the learning process. In summary, our main contributions are as follows:

  • We formulate the PPF prediction problem and make the first attempt at forecasting potential passenger flows for urban transportation development.

  • We propose a multi-view localized correlation learning method for PPF prediction that learns localized correlations via a multi-view learning process.

  • We show that our method can be transferred to the classic cold-start problem in recommender systems, where it achieves superior results and offers a new perspective for relevant tasks.

  • We conduct extensive prediction experiments on a large real-world transactional dataset and show that our model outperforms the other available algorithms.

Related Work

Passenger Flow Prediction

Most existing passenger flow prediction models focus on forecasting entrance/exit flows at certain stations or areas, neglecting the crowd flows across different stations [Chen and Wei2011, Wei and Chen2012, Ni, He, and Gao2017]. Chen and Wei [Chen and Wei2011] developed an effective short-term passenger flow prediction model to explore the time variants and capture dynamic patterns on a single subway line. Subsequently, a modified approach based on a neural network was proposed to solve the same entrance/exit crowd flow prediction task on a few metro lines [Wei and Chen2012]. Ni et al. [Ni, He, and Gao2017] used auxiliary information, such as social media events, to improve the forecast performance.

One research hotspot is city network-wide crowd flow prediction, which is a significant task for modern transportation management [Deng et al.2016]. Several recent methods focus on forecasting citywide crowd flows. [Ma et al.2016] devised a series of visualization approaches to show the dynamic changes of flows in the networks. Zhang et al. proposed deep learning models based on ResNet to predict crowd inflows and outflows across the regions of an entire city [Zhang et al.2018, Zhang et al.2019]. The probabilistic model is an effective approach to estimating the traffic speed. For example, [Zhan et al.2016] and [Lin et al.2017] used trajectory data to estimate citywide traffic volume via probabilistic graphical models. [Gong et al.2018] proposed an effective method based on online latent space learning to predict the crowd flow distribution, i.e., to forecast the OD pairs and the quantity of passenger flows simultaneously. To the best of our knowledge, none of the existing crowd flow prediction methods has considered the PPF problem studied in this paper.

Other relevant studies, such as [Hsieh, Lin, and Zheng2015], are point-based prediction models rather than matrix formulations. [Hsieh, Lin, and Zheng2015] selects k points to predict k values. In our task, however, k target areas require nk prediction values, where n is the number of known areas, because we also need to consider the crowd flows between each pair of areas.

Figure 2: The flowchart of our proposed model. In the learning process, given a set of previous PPF matrices, MLC-PPF learns the localized correlation matrix and the adaptive weights via a k-nearest indicator matrix. The cross-domain knowledge is used to guide the updates. Then, the target prediction can be inferred by Algorithm 1.

Multi-view Learning

Traditional passenger flow mining usually deals with data from a single view. Recently, a diversity of datasets with multiple views has become available from different sources in various domains [Zheng2015, Li et al.2019]. Multi-view learning is widely recognized as an effective way of solving the cross-domain problem, in that features from different views can serve the learning process of the target domain [Singh and Gordon2008, Xu, Tao, and Xu2013, Elkahky, Song, and He2015]. [Xu, Tao, and Xu2015] proposed a matrix co-factorization based method (MVL-IV) to embed different views into a shared subspace, such that the incomplete views can be estimated from the information in the observed views. To connect multiple views, MVL-IV assumes that different views have distinct ‘feature’ matrices but correspond to the same coefficient matrix. Tensor-based methods, such as [Hu et al.2013], [Hu et al.2016], and [Taneja and Arora2018], were proposed to address the cross-domain recommendation problem. They devised a cross-domain triadic factorization model to learn the triadic factors for user, item and domain, where the item dimensionality varies with domains. The above approaches and other similar methods [Rendle et al.2009, Xiong et al.2010] cannot address our PPF prediction problem directly because they are not formulated for the passenger flow prediction task. However, since they can handle the cold-start problem by utilizing cross-domain knowledge, they offer a useful clue for our approach.

In conclusion, none of the relevant studies can solve the PPF prediction problem directly. Accordingly, this paper aims to design a reliable approach for PPF prediction that incorporates cross-domain knowledge.

Problem Statement

For the PPF prediction problem, every origin-destination pair among the areas needs to be recorded. We formulate the OD passenger flow network as a fully connected graph G whose vertices are the origin and destination areas and whose edges are the origin-destination flows between them. The value of each edge is the observed flow, i.e., the total number of passengers who depart from the origin area and travel to the destination area. G can then be represented by a PPF matrix whose (i, j) entry is the flow from the i-th area to the j-th area. An example of the graph and its matrix is shown in Figure 1; for instance, an entry of 55 means that 55 passengers leave the origin area and their destination is the corresponding destination area.

In the real-world scenario, one area may have several stations. In this case, we sum the passenger flows of these stations to obtain the total flows of the area. We consider three specific and useful time periods for PPF prediction, which help the authorities carry out a better temporal analysis of transportation development: the morning rush hour (7:00 AM to 9:00 AM), the afternoon rush hour (5:00 PM to 7:00 PM), and the non-rush hour (2:00 PM to 4:00 PM).
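As a minimal sketch of the aggregation described above, the following Python snippet builds an area-level OD matrix from station-level trips for one time period. The function name `build_od_matrix`, the `station_to_area` mapping, and the toy trips are hypothetical and are not taken from the dataset used in this paper.

```python
import numpy as np

def build_od_matrix(trips, station_to_area, n_areas):
    """Aggregate station-level trips into an area-level OD matrix.

    trips: iterable of (origin_station, destination_station) pairs
           recorded within one time period of one day.
    station_to_area: dict mapping a station id to an area index.
    """
    od = np.zeros((n_areas, n_areas), dtype=np.int64)
    for origin, dest in trips:
        i = station_to_area[origin]
        j = station_to_area[dest]
        od[i, j] += 1  # row = origin area, column = destination area
    return od

# Hypothetical toy data: stations "A" and "B" both belong to area 0,
# so their departures are summed into the same row.
station_to_area = {"A": 0, "B": 0, "C": 1, "D": 2}
trips = [("A", "C"), ("B", "C"), ("C", "D"), ("D", "A")]
od = build_od_matrix(trips, station_to_area, n_areas=3)
print(od)
```

Filtering the trips to a time window (for example the morning rush hour) before calling the function would give one matrix per period and day.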

Furthermore, traffic periodicity is a very important factor in relevant studies. Crowd flows also exhibit stable daily periodicity, especially on weekdays. To extract the temporal information and make a more general prediction, we use a series of previous daily PPF matrices from the same time period to predict the PPF matrix of the target areas for the target day. Note that the prediction is not limited to one particular day; the target day can be changed easily based on the real requirement.

To best simulate the change in crowd flows when picking the target areas, we tracked the trajectories of all passengers from their origins to their destinations. For example, if an area is selected as a target area, all the crowd flows departing from it are added to its closest area to best simulate people’s choices. In this way, the PPF is learned from the crowd flows under the assumption that the passengers who originally departed from the target area would instead depart from its closest neighbor.
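The reassignment described above can be sketched as follows. This is only an illustration under stated assumptions: the helper name `mask_target_area` is hypothetical, the nearest known area is assumed to be given, and only departing flows (rows) are moved, as in the text.

```python
import numpy as np

def mask_target_area(od, target, nearest):
    """Return a copy of `od` in which area `target` has no station.

    Flows departing from `target` are added to the row of `nearest`,
    mimicking passengers who depart from the closest existing area instead.
    """
    sim = od.astype(float).copy()
    sim[nearest, :] += sim[target, :]  # move departures to the neighbor
    sim[target, :] = 0.0               # the target area has no departures
    return sim

# Hypothetical 3-area OD matrix; area 2 is treated as the target area
# and area 1 as its closest known area.
od = np.array([[0, 5, 2],
               [3, 0, 4],
               [1, 6, 0]])
sim = mask_target_area(od, target=2, nearest=1)
print(sim)
```

The total number of passengers is preserved; only the origin of the target area's departures changes.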

The Proposed Method

In this section, we present our PPF prediction model, MLC-PPF. We describe the strategy of localized correlation learning with adaptive weights and how to leverage the cross-domain multi-view information to improve the model. Figure 2 shows the flowchart of our model.

Localized Correlation Learning

PPF prediction is a spatially related task: the more similar two areas are, the more strongly their passenger flow conditions are correlated. Assume that a city is partitioned into a set of areas, including known and target areas. For each day, we consider the PPF matrix together with its localized flow matrix, in which the i-th row is a combination of the passenger flows of the k nearest areas of the i-th area. We then formulate the following function to learn the localized correlation:


Here, the localized correlation matrix learns the transformation from the localized flow matrix to the PPF matrix in each period of each day; the sum runs over the total number of days; a projection operator restricts the loss to the observed passenger flows of the known area set; and the norm is the Frobenius norm of a matrix.

We now discuss how to build the localized flow matrix for one period of a day. The physical distances among areas need to be considered first. Moreover, the development of a city gradually fosters different functional regions, such as business and educational areas, and areas belonging to the same functional region are strongly connected through their properties [Zheng et al.2014].

Thus, the similarities among areas should take both of the above criteria into account. To this end, we build two distance metrics, one from the real geographic location and one from the regional similarity. The distance metric between the i-th and j-th areas is as follows:


where one component is the geographic distance between the i-th and j-th areas, and the other is the Euclidean distance calculated from the intrinsic features of the areas (e.g., point-of-interest attributes).
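The following sketch illustrates one way to combine the two distances. The exact combination rule of Eq. (2) is not reproduced here; as an assumption, the sketch scales each distance matrix to [0, 1] and adds them. The names `scale_to_unit` and `combined_distance` and the toy coordinates and features are hypothetical.

```python
import numpy as np

def scale_to_unit(d):
    """Divide by the largest entry so that distances fall in [0, 1]."""
    peak = d.max()
    return d / peak if peak > 0 else d

def combined_distance(geo_dist, features):
    """Combine geographic distance with feature-space Euclidean distance.

    geo_dist: (n, n) matrix of geographic distances between areas.
    features: (n, d) matrix of intrinsic area features (e.g. POI counts).
    Assumption: the two scaled distances are simply added.
    """
    diff = features[:, None, :] - features[None, :, :]
    feat_dist = np.sqrt((diff ** 2).sum(axis=-1))
    return scale_to_unit(geo_dist) + scale_to_unit(feat_dist)

# Hypothetical toy example: four areas on a line with two POI features.
coords = np.array([0.0, 1.0, 2.0, 10.0])
geo = np.abs(coords[:, None] - coords[None, :])
poi = np.array([[5.0, 0.0], [5.0, 0.0], [0.0, 5.0], [0.0, 5.0]])
dist = combined_distance(geo, poi)
print(dist)
```

In this toy example, areas 0 and 1 are close both geographically and functionally, whereas areas 0 and 2 are geographically close but functionally different, so their combined distance is larger.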

Once the distance metric has been computed, the neighbors of the i-th area can be picked. We then construct an indicator matrix for the k-nearest neighbors of all areas, in which each row indicates the positions of the k nearest known areas of the corresponding area. For example, in stage 2 of Figure 2, the first row of the indicator matrix marks the three nearest areas of the corresponding area when k is set to 3. Accordingly, for each day, the localized flow matrix can be expressed in terms of the indicator matrix and an adaptive-weight matrix that learns the different weights of the k nearest areas. At this stage, the localized correlation learning process is given by Eq. (3).



Here, the operator applied to the indicator and weight matrices is the entrywise product. The loss function aims to learn the localized correlation matrix and the weight matrix simultaneously.
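To make the roles of the indicator matrix, the weight matrix, and the entrywise product concrete, the sketch below evaluates a localized loss on toy data. It rests on assumptions that the text does not confirm: the localized flow matrix is taken to be (A ⊙ S) X, the correlation matrix W is applied from the left, and the loss is restricted to the rows of known areas. All names (`knn_indicator`, `localized_flows`, `localized_loss`, `A`, `S`) are hypothetical.

```python
import numpy as np

def knn_indicator(dist, known, k):
    """Row i has ones at the k nearest known areas of area i (itself excluded)."""
    n = dist.shape[0]
    known = np.asarray(known)
    indicator = np.zeros((n, n))
    for i in range(n):
        candidates = known[known != i]
        order = np.argsort(dist[i, candidates], kind="stable")
        indicator[i, candidates[order[:k]]] = 1.0
    return indicator

def localized_flows(od, indicator, weights):
    """Weighted combination of the flows of each area's k nearest areas."""
    # The entrywise product keeps weights only at the k-nearest positions.
    return (indicator * weights) @ od

def localized_loss(od_days, indicator, weights, corr, known):
    """Squared Frobenius error over days, restricted to known-area rows."""
    loss = 0.0
    for od in od_days:
        residual = od - corr @ localized_flows(od, indicator, weights)
        loss += np.sum(residual[known, :] ** 2)
    return loss

# Hypothetical toy example: area 3 is the target, areas 0-2 are known.
coords = np.array([0.0, 1.0, 2.0, 10.0])
dist = np.abs(coords[:, None] - coords[None, :])
known = [0, 1, 2]
A = knn_indicator(dist, known, k=2)
S = np.full((4, 4), 0.5)                           # equal weights
od = np.tile(np.array([3.0, 1.0, 4.0, 2.0]), (4, 1))
loss = localized_loss([od], A, S, np.eye(4), known)
print(A[3], loss)
```

Because every row of the toy OD matrix is identical, averaging any two neighbors reproduces each row exactly and the loss is zero; with real data, W and S would be learned to reduce this loss.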

Improvement by Cross-domain Learning Process

As mentioned above, a city contains various functional regions. Thanks to urban statistical data, the passenger flow similarities among different areas can be examined from this cross-domain perspective. Similar functional regions tend to have similar passenger flows (e.g., business regions have a large number of entrance flows during the morning rush hour, while people leave residential areas in the same time span), so we leverage such information to guide the localized correlation learning process.

The statistical data have multiple views that record the differences between areas. For example, the economy view reveals economic features, such as the number of industries and employee statistics, and the population view consists of detailed population information. We denote the multiple views of the statistical data by a set of matrices, in each of which a row corresponds to an area and a column to a feature. To improve the prediction performance, cross-domain knowledge is involved as guidance, which can be formulated as:


where the coefficient of the cross-domain term is the regularization parameter, and each view matrix is paired with its localized matrix.

After solving Eq. (4), the learned localized correlation matrix and weight matrix can be used to make the prediction. The predicted PPF of the target areas on the target day is:


where the indicator matrix has an entry equal to one if the corresponding flow is observed and zero otherwise.

At this stage, the OD passenger flows departing from the target areas have been learned by the above processes, i.e., each row of the target areas is predicted. However, the columns of the target areas, which reveal how many passengers arrive at these areas, need to be predicted with a slight modification: the flow matrix in Eq. (4) is replaced so that the localized correlation is learned from the other side. This problem can be solved in a similar manner, so we present only the optimization strategy for Eq. (4) due to the page limitation.

Learning and Prediction

Eq. (4) is a complex non-convex problem. However, the loss function associated with Eq. (4) is convex with respect to either of the two unknown matrices when the other is fixed, so we can optimize them alternately until convergence (as in, e.g., alternating least squares (ALS)). A straightforward way to minimize the loss function is the gradient method.

Considering one of the two matrices while the other is fixed, Eq. (4) can be rewritten as follows:


Taking the derivative of this loss with respect to the first matrix, we obtain its gradient:


Analogously, the derivative with respect to the second matrix is:


Let the step size and the number of iterations T be given. In each iteration, we adopt the following update rules:


where .

Based on the above update equations, the iterative learning and prediction process for MLC-PPF are summarized in Algorithm 1.

Input: PPF matrices; multiple views of statistical data.
Output: Prediction.
1 Initialize the localized correlation matrix by solving Eq. (3) with the matrix pseudo-inverse;
2 Initialize the weight matrix from the distance metric built by Eq. (2);
3 Construct the indicator matrix from the real geographic location and regional similarity;
4 for t = 1 to T do
5       if the stopping criterion is not met then
6             update by Eq. (9)
7             update by Eq. (10)
8             update by Eq. (11)
9       else
10            break
Return the prediction by Eq. (5).
Algorithm 1 MLC-PPF
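The alternating procedure can be illustrated with the following sketch. It is not the exact algorithm of this paper: the objective is an assumed form (masked squared error on the flow matrices plus a regularized squared error on the view matrices, with localized matrices taken as (A ⊙ S) X), the correlation matrix is initialized with the identity instead of a pseudo-inverse solution, and a step-halving rule replaces the update rules of Eqs. (9)-(11). All function and variable names are hypothetical.

```python
import numpy as np

def objective(W, S, A, ods, views, known, lam):
    """Assumed loss: flow error on known rows plus lam * view error."""
    M = A * S
    loss = sum(np.sum((X - W @ (M @ X))[known, :] ** 2) for X in ods)
    loss += lam * sum(np.sum((Z - W @ (M @ Z)) ** 2) for Z in views)
    return float(loss)

def gradients(W, S, A, ods, views, known, lam):
    """Gradients of the assumed loss with respect to W and S (float inputs)."""
    M = A * S
    grad_W = np.zeros_like(W)
    grad_M = np.zeros_like(W)
    for X in ods:
        R = np.zeros_like(X)
        R[known, :] = (X - W @ (M @ X))[known, :]  # residual on known rows
        grad_W -= 2.0 * R @ (M @ X).T
        grad_M -= 2.0 * W.T @ R @ X.T
    for Z in views:
        Q = Z - W @ (M @ Z)
        grad_W -= 2.0 * lam * Q @ (M @ Z).T
        grad_M -= 2.0 * lam * W.T @ Q @ Z.T
    return grad_W, A * grad_M  # S only matters at k-nearest positions

def descend(x, grad, loss_fn, step):
    """One gradient step; the step is halved until the loss does not increase."""
    base = loss_fn(x)
    while step > 1e-12:
        candidate = x - step * grad
        if loss_fn(candidate) <= base:
            return candidate
        step *= 0.5
    return x

def fit(A, ods, views, known, lam=0.1, step=1e-2, iters=100, tol=1e-8):
    n = A.shape[0]
    W = np.eye(n)                                   # simplified initialization
    S = A / np.maximum(A.sum(axis=1, keepdims=True), 1.0)
    args = (A, ods, views, known, lam)
    prev = objective(W, S, *args)
    for _ in range(iters):
        grad_W, _ = gradients(W, S, *args)          # update W with S fixed
        W = descend(W, grad_W, lambda w: objective(w, S, *args), step)
        _, grad_S = gradients(W, S, *args)          # update S with W fixed
        S = descend(S, grad_S, lambda s: objective(W, s, *args), step)
        cur = objective(W, S, *args)
        if prev - cur <= tol * max(prev, 1.0):      # relative improvement test
            break
        prev = cur
    return W, S

def predict(W, S, A, X, targets):
    """Assumed prediction: rows of W (A * S) X that belong to target areas."""
    return (W @ ((A * S) @ X))[targets, :]

# Hypothetical toy data: 4 areas, area 3 is the target, 3 days, 1 view.
rng = np.random.default_rng(0)
A = np.array([[0, 1, 1, 0], [1, 0, 1, 0], [1, 1, 0, 0], [0, 1, 1, 0]], dtype=float)
ods = [rng.uniform(0.0, 10.0, (4, 4)) for _ in range(3)]
views = [rng.uniform(0.0, 1.0, (4, 2))]
known = [0, 1, 2]
S0 = A / A.sum(axis=1, keepdims=True)
before = objective(np.eye(4), S0, A, ods, views, known, 0.1)
W, S = fit(A, ods, views, known, lam=0.1)
after = objective(W, S, A, ods, views, known, 0.1)
pred = predict(W, S, A, ods[-1], targets=[3])
print(before, after, pred.shape)
```

The step-halving rule is a design choice made for the sketch: it keeps the loss from increasing at any update, which a fixed step size does not guarantee.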


Data Description

  • The transactional dataset used in this paper is a large-scale, real-world dataset provided by NSW Sydney Transport. After data cleaning (we removed recording errors, UNKNOWN trips, etc.), the dataset contains more than 35 million transactional records covering 194 stations, including city train and ferry stations, between 7 Nov. 2016 and 11 Dec. 2016. We use the data between 7 Nov. 2016 and 20 Nov. 2016 as the training and validation sets (used to tune parameters); the remaining data are used as the test set.

  • The urban statistical data are collected from the Australian Bureau of Statistics-2016 (ABS) with four views: Economy, Family, Income, and Population. The four views have 43, 44, 50, and 97 dimensions, respectively.

  • All transactional records across the transport network are mapped into 117 areas to build the flow matrices. The areas are designated according to the Australian Statistical Geography Standard for the best practical value.

Methods and Metrics

We use the following five baselines, which can learn the flow data under the guidance of cross-domain knowledge. Among them, CDTF and WITF are two tensor-factorization-based (TF) methods that can solve the cold-start problem. For NMF, we concatenate the flow matrix with the statistical data. All parameters used in the baselines and our method are selected by grid search.

  • NMF: Predicts the PPF by non-negative matrix factorization applied to the concatenation of the flow matrix and the statistical data [Lee and Seung2001].

  • MVL-IV: A state-of-the-art multi-view learning method based on matrix co-factorization, which learns a shared coefficient matrix to connect multiple views [Xu, Tao, and Xu2015]. In this method, we set the flow matrix as one of the views; the other views are from the ABS dataset.

  • LS-KNN: Latent similarity k-nearest neighbors. After calculating the latent similarities among areas by Eq. (2), we pick the k nearest neighbors of each target area and use the average crowd flows of these neighbors as the estimate (k=4).

  • CDTF: A state-of-the-art TF method for learning cross-domain knowledge [Hu et al.2013].

  • WITF: A weighted irregular TF method that is similar to CDTF [Hu et al.2016]. For CDTF and WITF, we leverage the passenger flow and ABS data to construct the tensor.
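As an illustration of the LS-KNN baseline listed above, the sketch below averages the outgoing flows of the k nearest known areas of a target area. The distance matrix is assumed to be precomputed; the function name `ls_knn_predict` and the toy values are hypothetical.

```python
import numpy as np

def ls_knn_predict(od, dist, known, target, k=4):
    """Estimate the outgoing flows of `target` as the mean row of its
    k nearest known areas under the precomputed distance `dist`."""
    known = np.asarray(known)
    order = np.argsort(dist[target, known], kind="stable")
    nearest = known[order[:k]]
    return od[nearest, :].mean(axis=0)

# Hypothetical toy data: area 3 is the target; areas 1 and 2 are its
# two nearest known areas.
od = np.array([[0.0, 8.0, 6.0, 0.0],
               [2.0, 0.0, 4.0, 6.0],
               [4.0, 2.0, 0.0, 2.0],
               [0.0, 0.0, 0.0, 0.0]])
dist = np.array([[0.0, 1.0, 2.0, 3.0],
                 [1.0, 0.0, 1.0, 1.0],
                 [2.0, 1.0, 0.0, 2.0],
                 [3.0, 1.0, 2.0, 0.0]])
estimate = ls_knn_predict(od, dist, known=[0, 1, 2], target=3, k=2)
print(estimate)
```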

Methods   Morning Rush Hour    Afternoon Rush Hour   Non-rush Hour      Average
          MAE      NRMSE       MAE      NRMSE        MAE     NRMSE      MAE      NRMSE
NMF       124.50   30.78%      117.92   37.13%       89.11   28.44%     110.51   32.12%
MVL-IV    108.31   29.50%      101.55   29.78%       92.05   27.54%     100.64   28.94%
CDTF      75.15    22.43%      84.02    25.93%       67.78   19.37%     75.65    22.58%
WITF      69.30    18.73%      72.06    19.45%       62.57   17.26%     67.98    18.48%
LS-KNN    19.89    5.42%       20.20    7.67%        23.51   7.94%      21.20    7.01%
MLC-PPF   9.84     2.30%       11.47    3.12%        8.22    1.21%      9.84     2.21%
Table 1: Comparisons across different time periods. We report the average Mean Absolute Error (MAE) and Normalized Root Mean Square Error (NRMSE) of the various methods. The target areas make up 20% of the total set. MLC-PPF achieves the best result in every column.
Methods   5%                 10%                15%                 25%
          MAE     NRMSE      MAE     NRMSE      MAE      NRMSE      MAE      NRMSE
NMF       88.11   20.70%     90.23   21.66%     107.44   26.52%     128.70   30.11%
MVL-IV    92.75   20.36%     90.40   20.75%     99.01    23.79%     101.93   32.40%
CDTF      59.25   18.90%     61.25   19.26%     68.07    22.00%     79.06    26.35%
WITF      60.41   18.23%     60.72   19.77%     61.18    19.23%     71.07    20.73%
LS-KNN    13.69   4.72%      16.45   5.01%      18.51    5.44%      23.99    9.25%
MLC-PPF   8.64    1.37%      8.80    1.20%      9.73     1.90%      11.07    2.34%
Table 2: Comparisons across different removal ratios (proportion of areas treated as target areas). We report MAE and NRMSE over all test data.


We use two widely used evaluation metrics to measure the PPF prediction quality: Mean Absolute Error (MAE) and Normalized Root Mean Square Error (NRMSE). Both metrics compare the forecast passenger flow from the i-th area to the j-th area with the corresponding ground truth and average the error over the number of predictions.
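For reference, the two metrics can be computed as in the sketch below. MAE follows its standard definition. The normalization used for NRMSE is an assumption here: the sketch divides the RMSE by the range of the ground truth, and the paper's exact normalization may differ. The toy values are hypothetical.

```python
import numpy as np

def mae(pred, truth):
    """Mean absolute error over all predicted OD entries."""
    return float(np.mean(np.abs(pred - truth)))

def nrmse(pred, truth):
    """RMSE divided by the range of the ground truth (assumed normalization)."""
    rmse = np.sqrt(np.mean((pred - truth) ** 2))
    return float(rmse / (truth.max() - truth.min()))

# Hypothetical toy flows: every prediction is off by exactly 2 passengers.
truth = np.array([0.0, 10.0, 20.0, 30.0])
pred = np.array([2.0, 8.0, 22.0, 28.0])
print(mae(pred, truth), nrmse(pred, truth))
```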

Comparisons on Different Time Periods

The first set of experiments is designed to assess the performance in different time periods. We randomly remove 20% of the areas as the target set and keep the remaining 80% as the known set. The learning step size is fixed, and our method uses only two hyper-parameters: the number of nearest neighbors k, chosen from {1,2,3,…,10}, and the regularization factor, chosen from a candidate set. We repeat the experiment 20 times with random initialization and report the average results.

Experimental results are presented in Table 1. Compared with the other approaches, our method achieves the best prediction accuracy in all three time periods. None of the multi-view and cross-domain methods works well because it is hard to capture the relationships between the statistical data and the passenger flows. LS-KNN performs better than the other baselines, which illustrates that the PPF prediction problem has a strong spatial correlation property. In summary, the proposed method outperforms the other available baselines because it considers the localized correlations and the cross-domain knowledge simultaneously.

Comparisons on Various Missing Ratios

In this experiment, we evaluate how the performance changes as the number of target areas varies. We randomly pick 5%, 10%, 15%, and 25% of the areas as the target areas and report the average errors over 20 runs. The test period is the morning rush hour. The performance of the different methods is summarized in Table 2.

The experimental results lead to conclusions similar to those of the first comparison. Our model, MLC-PPF, significantly outperforms all other methods on all test sets. The performance of MLC-PPF with 5% target areas is very close to that with 10%, which illustrates that the remaining 90% of the areas are sufficient to learn a satisfactory localized correlation and make accurate PPF predictions. In real-world applications, the proportion of target areas is usually small since only a few areas are suitable for constructing a new station.

Figure 3: Effect of parameters. (a) Number of nearest neighbors k. (b) Regularization factor.
Figure 4: The case study. This figure shows the predicted passenger flows departing from “Homebush” to other areas. To keep the figure clear, we draw only our method and the ground truth because the other baselines perform far worse than MLC-PPF.

Parameter Analysis

In this section, we analyze the effects of the two hyper-parameters used in this paper: k, the number of nearest neighbors, and the regularization factor, which controls the strength of the guidance from the statistical data.

Figure 3(a) shows the performance for varying settings of k. For each area, the correlation matrix only learns the transformation from these neighbors. As the results show, k = 2 is the best choice for our method, and the model is not sensitive to k between 1 and 5.

Figure 3(b) shows the results for varying values of the regularization factor. The performance is stable over a range of values around the best setting.

In summary, both parameters contribute to the performance of our model, and MLC-PPF is stable because it is insensitive to their exact values.

Case Study

We display the PPF prediction result for one target area in this section. In this case, the area “Homebush” is treated as the target area. For better visualization, we keep only the areas where the number of arriving passengers is greater than 5.

As shown in Figure 4, the predictions of our model are close to the ground truth, especially for some main areas of Sydney, such as the central area “Sydney-Haymarket”, “Burwood-Croydon”, “North Sydney-Lavender Bay” and “Parramatta-Rosehill”. The case study demonstrates the effectiveness of our method for PPF prediction.

Transfer to the Cold-start Problem

As we have emphasized, our strategy can provide a new perspective to address the classic cold-start problem in the recommender system. This set of experiments is designed to assess the transferability of our model.

We use a well-known Amazon dataset for the evaluation, which contains 1,555,170 users and ratings on a 1-5 scale over 548,552 different products covering four domains: books, music CDs, DVDs and videos [Hu et al.2013]. We randomly remove 20% of the users from the target domains to simulate the cold-start situation. Three baselines are used in this comparison. CMF is an effective method based on collective matrix factorization, which couples the rating matrices of all domains on the user dimension [Singh and Gordon2008]. CDTF and WITF are two tensor-factorization-based cross-domain recommendation methods; they transform the original data into a cubical tensor that can better capture the interactions between user factors and item factors [Hu et al.2013, Hu et al.2016]. In this experiment, we leverage the information from all domains except the target domain to build the k-nearest indicator matrix.

Table 3 shows the results of our method together with some state-of-the-art approaches. MLC-PPF achieves the lowest errors for the target domains, which illustrates that our method can handle the cold-start situation and can give inspiration for relevant tasks. Despite the effectiveness of our method, MLC-PPF has a limitation: it can make a prediction only when the k nearest neighbors have ratings. Nevertheless, based on the test results, the predicted ratings are reliable enough to make recommendations.

Domain   CMF     CDTF    WITF    MLC-PPF
Book     0.834   0.755   0.740   0.396
Music    0.847   0.779   0.716   0.582
Table 3: Transfer to the cold-start problem. We report the MAE of all tested methods.


In this paper, we proposed an effective method for potential passenger flow prediction, a novel study that benefits urban transportation development. To address this spatio-temporal problem, we designed a multi-view localized correlation learning model (MLC-PPF) for PPF prediction. The k-nearest indicator matrix is constructed from the real geographical neighbors and regional properties. MLC-PPF learns the correlations between each known area and its k-nearest neighbors under the guidance of cross-domain knowledge. We evaluated the performance of our method with a set of well-designed experiments. The empirical results demonstrate not only that the proposed model outperforms all the other methods in the PPF prediction task, but also that it is capable of tackling the cold-start problem in recommender systems.


  • [Chen and Wei2011] Chen, M.-C., and Wei, Y. 2011. Exploring time variants for short-term passenger flow. Journal of Transport Geography 19(4):488–498.
  • [Deng et al.2016] Deng, D.; Shahabi, C.; Demiryurek, U.; Zhu, L.; Yu, R.; and Liu, Y. 2016. Latent space model for road networks to predict time-varying traffic. In Proceedings of the 22nd ACM SIGKDD, 1525–1534.
  • [Elkahky, Song, and He2015] Elkahky, A. M.; Song, Y.; and He, X. 2015. A multi-view deep learning approach for cross domain user modeling in recommendation systems. In Proceedings of the 24th International Conference on World Wide Web, 278–288. International World Wide Web Conferences Steering Committee.
  • [Gong et al.2018] Gong, Y.; Li, Z.; Zhang, J.; Liu, W.; Zheng, Y.; and Kirsch, C. 2018. Network-wide crowd flow prediction of sydney trains via customized online non-negative matrix factorization. In Proceedings of the 27th ACM International Conference on Information and Knowledge Management, 1243–1252. ACM.
  • [Hsieh, Lin, and Zheng2015] Hsieh, H.-P.; Lin, S.-D.; and Zheng, Y. 2015. Inferring air quality for station location recommendation based on urban big data. In the 21th ACM SIGKDD International Conference.
  • [Hu et al.2013] Hu, L.; Cao, J.; Xu, G.; Cao, L.; Gu, Z.; and Zhu, C. 2013. Personalized recommendation via cross-domain triadic factorization. In Proceedings of the 22nd international conference on World Wide Web, 595–606. ACM.
  • [Hu et al.2016] Hu, L.; Cao, L.; Cao, J.; Gu, Z.; Xu, G.; and Yang, D. 2016. Learning informative priors from heterogeneous domains to improve recommendation in cold-start user domains. ACM Transactions on Information Systems (TOIS) 35(2):13.
  • [Lam et al.2008] Lam, X. N.; Vu, T.; Le, T. D.; and Duong, A. D. 2008. Addressing cold-start problem in recommendation systems. In Proceedings of the 2nd international conference on Ubiquitous information management and communication, 208–211. ACM.
  • [Lee and Seung2001] Lee, D. D., and Seung, H. S. 2001. Algorithms for non-negative matrix factorization. In Advances in neural information processing systems, 556–562.
  • [Li et al.2019] Li, Z.; Zhang, J.; Wu, Q.; Gong, Y.; Yi, J.; and Kirsch, C. 2019. Sample adaptive multiple kernel learning for failure prediction of railway points. In Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, 2848–2856. ACM.
  • [Lin et al.2017] Lin, L.; Li, J.; Chen, F.; Ye, J.; and Huai, J. 2017. Road traffic speed prediction: a probabilistic model fusing multi-source data. IEEE Transactions on Knowledge and Data Engineering 30(7):1310–1323.
  • [Ma et al.2016] Ma, Y.; Lin, T.; Cao, Z.; Li, C.; Wang, F.; and Chen, W. 2016. Mobility viewer: An Eulerian approach for studying urban crowd flow. IEEE Transactions on Intelligent Transportation Systems 17(9):2627–2636.
  • [Ni, He, and Gao2017] Ni, M.; He, Q.; and Gao, J. 2017. Forecasting the subway passenger flow under event occurrences with social media. IEEE Transactions on Intelligent Transportation Systems 18(6):1623–1632.
  • [Pan et al.2019] Pan, Z.; Liang, Y.; Wang, W.; Yu, Y.; Zheng, Y.; and Zhang, J. 2019. Urban traffic prediction from spatio-temporal data using deep meta learning. In Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, 1720–1730. ACM.
  • [Rendle et al.2009] Rendle, S.; Balby Marinho, L.; Nanopoulos, A.; and Schmidt-Thieme, L. 2009. Learning optimal ranking with tensor factorization for tag recommendation. In Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 727–736. ACM.
  • [Singh and Gordon2008] Singh, A. P., and Gordon, G. J. 2008. Relational learning via collective matrix factorization. In Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 650–658. ACM.
  • [Sun et al.2015] Sun, L.; Lu, Y.; Jin, J. G.; Lee, D.-H.; and Axhausen, K. W. 2015. An integrated Bayesian approach for passenger flow assignment in metro networks. Transportation Research Part C: Emerging Technologies 52:116–131.
  • [Taneja and Arora2018] Taneja, A., and Arora, A. 2018. Cross domain recommendation using multidimensional tensor factorization. Expert Systems with Applications 92:304–316.
  • [Tobler1970] Tobler, W. R. 1970. A computer movie simulating urban growth in the Detroit region. Economic Geography 46(sup1):234–240.
  • [Wang et al.2019] Wang, Y.; Yin, H.; Chen, H.; Wo, T.; Xu, J.; and Zheng, K. 2019. Origin-destination matrix prediction via graph convolution: a new perspective of passenger demand modeling. In Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, 1227–1235. ACM.
  • [Wei and Chen2012] Wei, Y., and Chen, M.-C. 2012. Forecasting the short-term metro passenger flow with empirical mode decomposition and neural networks. Transportation Research Part C: Emerging Technologies 21(1):148–162.
  • [Xiong et al.2010] Xiong, L.; Chen, X.; Huang, T.-K.; Schneider, J.; and Carbonell, J. G. 2010. Temporal collaborative filtering with Bayesian probabilistic tensor factorization. In Proceedings of the 2010 SIAM International Conference on Data Mining, 211–222. SIAM.
  • [Xu, Tao, and Xu2013] Xu, C.; Tao, D.; and Xu, C. 2013. A survey on multi-view learning. arXiv preprint arXiv:1304.5634.
  • [Xu, Tao, and Xu2015] Xu, C.; Tao, D.; and Xu, C. 2015. Multi-view learning with incomplete views. IEEE Transactions on Image Processing 24(12):5812–5825.
  • [Zhan et al.2016] Zhan, X.; Zheng, Y.; Yi, X.; and Ukkusuri, S. V. 2016. Citywide traffic volume estimation using trajectory data. IEEE Transactions on Knowledge and Data Engineering 29(2):272–285.
  • [Zhang et al.2018] Zhang, J.; Zheng, Y.; Qi, D.; Li, R.; Yi, X.; and Li, T. 2018. Predicting citywide crowd flows using deep spatio-temporal residual networks. Artificial Intelligence 259:147–166.
  • [Zhang et al.2019] Zhang, J.; Zheng, Y.; Sun, J.; and Qi, D. 2019. Flow prediction in spatio-temporal networks based on multitask deep learning. IEEE Transactions on Knowledge and Data Engineering.
  • [Zheng et al.2014] Zheng, Y.; Capra, L.; Wolfson, O.; and Yang, H. 2014. Urban computing: concepts, methodologies, and applications. ACM Transactions on Intelligent Systems and Technology (TIST) 5(3):38.
  • [Zheng2015] Zheng, Y. 2015. Methodologies for cross-domain data fusion: An overview. IEEE Transactions on Big Data 1(1):16–34.