Introduction
With the growth of intelligent transportation systems, passenger flow prediction models concentrate on discovering the crowd volumes and mobility patterns that best serve people’s daily life [Pan et al.2019, Zhang et al.2019]. Recent advances in passenger flow prediction focus mainly on flow conditions in the next time interval as time evolves [Gong et al.2018, Sun et al.2015]. If a brand-new metro station is inserted into the original metro network, existing predictors have to collect a large amount of recent transactional data before they can operate normally. However, a real-world requirement from transportation authorities is to obtain the potential passenger flows (PPF) of a planned city area in advance (i.e., before constructing a station in this area). This is significant for urban traffic development and transportation management, as it can provide insights for the site selection of stations and the analysis of passenger movement patterns, and can give early warning of potential crowding.
In the PPF prediction task, concentrating solely on the potential entrance and exit flows does not provide adequate information. Authorities also want to know the distribution of the predicted PPF, i.e., a forecast of the number of potential passengers moving to different destinations, because it is of the utmost importance to find out how the new station will affect other areas. Figure 1 illustrates an example of the PPF prediction problem. A city region is partitioned into nine areas (we use grids for clear and simple illustration; the real partition standard is explained in the Data Description section). Six of them have metro stations (termed known areas), and three do not yet have a station (termed target areas). The right part of Figure 1 presents an origin-destination (OD) matrix (each row is an origin area and each column is a destination area), e.g., (, ) = 130 indicates that 130 passengers depart from and are going to . The PPF task aims to make an accurate prediction for the target areas in one period (e.g., rush hours) that completes the crowd flows between them and the known areas.
To date, few studies have considered the OD passenger flow prediction problem [Gong et al.2018, Wang et al.2019], and to the best of our knowledge, no existing technique can forecast PPF across the entire city. It is a novel problem and a real urban development demand that faces several major challenges: (1) The number of passenger flows and their final destinations must be considered simultaneously. (2) The problem is analogous to the cold-start problem in recommender systems [Lam et al.2008], where it is hard to infer the preference of a new user from the known data; in our problem, a new station in the target area can be regarded as a new user. (3) Since PPF prediction is a spatio-temporal mining problem, spatial and temporal information should be taken into account appropriately.
To resolve this novel and significant problem, we devise a multi-view localized correlation learning model for PPF prediction (MLCPPF for short). To leverage the spatial information, we first construct a localized similarity matrix that is associated with the real geographical neighbors and regional properties (e.g., business or residential regions). The intuition behind this strategy comes from the First Law of Geography [Tobler1970], i.e., “Everything is related to everything else, but near things are more related than distant things”. Second, we propose a novel weighted correlation learning strategy. Finally, to improve the prediction accuracy and handle the cold-start challenge, we draw side information from urban statistical data, where each area has multi-view features to guide the learning process. In summary, our main contributions are as follows:

We formulate the PPF prediction problem and make the first attempt at forecasting passenger flows for urban transportation development.

We propose a multi-view localized correlation learning method for PPF prediction that learns localized correlations via a multi-view learning process.

We show that our method can be transferred to the classic cold-start problem in recommender systems, where it achieves superior results and gives a new perspective for relevant tasks.

We conduct extensive prediction experiments on a large real-world transactional dataset and show that our model outperforms other available algorithms.
Related Work
Passenger Flow Prediction
Most existing passenger flow prediction models focus on forecasting entrance/exit flows at certain stations or areas, neglecting the crowd flows across different stations [Chen and Wei2011, Wei and Chen2012, Ni, He, and Gao2017]. Chen and Wei [Chen and Wei2011] developed an effective short-term passenger flow prediction model to explore the time variants and capture dynamic patterns on a single subway line. Subsequently, a modified approach based on a neural network was proposed to solve the same entrance/exit crowd flow prediction task on a few metro lines [Wei and Chen2012]. Ni et al. [Ni, He, and Gao2017] used auxiliary information, such as social media events, to improve the forecast performance.
One research hotspot is city network-wide crowd flow prediction, which is a significant task for modern transportation management [Deng et al.2016]. Several methods focus on forecasting citywide crowd flows. [Ma et al.2016] devised a series of visualization approaches to show the dynamic changes of flows in the networks. Zhang et al. proposed deep learning models based on ResNet to predict the crowd inflows and outflows of all regions of a city [Zhang et al.2018, Zhang et al.2019]. Probabilistic models are an effective approach to estimating traffic speed. For example, [Zhan et al.2016] and [Lin et al.2017] used trajectory data to estimate citywide traffic volume via probabilistic graphical models. [Gong et al.2018] proposed an effective method based on online latent space learning to predict the crowd flow distribution, i.e., to forecast the OD pairs and the quantity of passenger flows simultaneously. To the best of our knowledge, no existing crowd flow prediction method has considered the PPF problem studied in this paper.
Other relevant studies, such as [Hsieh, Lin, and Zheng2015], are point-based prediction models rather than matrix formulations. [Hsieh, Lin, and Zheng2015] selects k points to predict k values, whereas in our task k target areas require nk prediction values, where n is the number of known areas, because we also need to consider the crowd flows between each pair of areas.
Multi-view Learning
Traditional passenger flow mining usually deals with data from a single view. Recently, diverse datasets with multiple views have become available from different sources in various domains [Zheng2015, Li et al.2019]. Multi-view learning is widely recognized as an effective way of solving the cross-domain problem, in which features from different views serve the learning process of the target domain [Singh and Gordon2008, Xu, Tao, and Xu2013, Elkahky, Song, and He2015]. [Xu, Tao, and Xu2015] proposed a matrix co-factorization based method (MVLIV) to embed different views into a shared subspace, such that the incomplete views can be estimated from the information in the observed views. To connect multiple views, MVLIV assumes that different views have distinct ‘feature’ matrices (i.e., ), but correspond to the same coefficient matrix (i.e., ). Tensor-based methods, such as [Hu et al.2013], [Hu et al.2016], and [Taneja and Arora2018], were proposed to address the cross-domain recommendation problem. They devised a cross-domain triadic factorization model to learn the triadic factors for user, item, and domain, where the item dimensionality varies with domains. The above approaches and other similar methods [Rendle et al.2009, Xiong et al.2010] cannot address our PPF prediction problem directly because they are not formulated for the passenger flow prediction task. However, since they can handle the cold-start problem by utilizing cross-domain knowledge, they offer an illuminating clue.
In conclusion, none of the relevant studies can solve the PPF prediction problem directly. Accordingly, this paper aims to design a reliable approach for PPF prediction that involves cross-domain knowledge.
Problem Statement
Focusing on the PPF prediction problem, every origin-destination pair among areas needs to be recorded. We formulate the OD passenger flow network as a fully connected graph = (, ), where is the set of vertices and is the set of edges. records the th origin or destination area, and an edge denotes an origin-destination flow from area to . The value of each edge is associated with the observed flow , i.e., is the total number of passengers that depart from area and are going to area . Then, G can be represented by the PPF matrix , where = . An example of and is shown in Figure 1. = 55 means that 55 passengers leave from and their destination is .
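As a minimal sketch of this formulation, the following Python snippet aggregates trip records into an OD matrix whose entry (i, j) counts the passengers departing from area i and arriving at area j. The function name `build_od_matrix`, the `(origin, destination)` tuple layout, and the toy trips are illustrative assumptions and are not taken from the dataset used in this paper.

```python
import numpy as np

def build_od_matrix(trips, n_areas):
    """Return an n_areas x n_areas matrix of trip counts per (origin, destination)."""
    od = np.zeros((n_areas, n_areas), dtype=np.int64)
    for origin, destination in trips:
        od[origin, destination] += 1  # one passenger travelling origin -> destination
    return od

# Hypothetical trips, each given as (origin area index, destination area index).
trips = [(0, 1), (0, 1), (2, 0), (1, 2), (0, 1)]
od = build_od_matrix(trips, n_areas=3)
print(od)
```

Each row of `od` then describes where the passengers of one origin area go, which is the quantity the PPF task has to complete for the target areas.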
In the real-world scenario, one area may have several stations. In this case, we sum the passenger flows of these stations to obtain the total flows of the area. We consider three specific and useful time periods to predict PPF, which will help the authorities to do a better temporal analysis of transportation development. The three periods are the morning rush hour (7:00 AM to 9:00 AM), the afternoon rush hour (5:00 PM to 7:00 PM), and the non-rush hour (2:00 PM to 4:00 PM).
Furthermore, traffic periodicity is a very important factor in relevant studies. Crowd flows also exhibit stable daily periodicity, especially on weekdays. To extract the temporal information and make a more general prediction, we consider a series of previous daily PPF matrices () in the same time period to predict the PPF matrix of target areas () for the day . Note that the prediction is not limited to the th day; the target can be changed easily based on the real requirement.
To best simulate the crowd flow changes when picking the target areas, we track all passenger trajectories from origins to destinations. For example, if area is selected as a target area, all the crowd flows departing from are added to its closest area (e.g., ) to best simulate people’s choices. In this way, the PPF is learned from the crowd flows under the assumption that the original passengers from will depart from its closest neighbor .
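The reassignment described above can be sketched as a row operation on an OD matrix. This is only an illustration under stated assumptions: the function name `reassign_departures` and the toy matrix are hypothetical, and the sketch moves departures only, since the text does not specify how arrivals at the target area are treated.

```python
import numpy as np

def reassign_departures(od, target, closest):
    """Move all departures of `target` to `closest`; the input matrix is not modified."""
    simulated = od.copy()
    # Passengers who would have departed from the target now depart from its closest area.
    simulated[closest, :] += simulated[target, :]
    # The target area has no station yet, so it records no departures.
    simulated[target, :] = 0
    return simulated

# Hypothetical 3-area OD matrix; area 2 is the target and area 1 its closest known area.
od = np.array([[0, 5, 2],
               [3, 0, 4],
               [1, 6, 0]])
simulated = reassign_departures(od, target=2, closest=1)
print(simulated)
```

The total number of trips is preserved by construction, so the simulated matrix describes the same passengers with a different boarding area.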
The Proposed Method
In this section, we propose our PPF prediction model MLCPPF. We describe the strategy of localized correlation learning with adaptive weights, and how to leverage the cross-domain multi-view information to improve our work. Figure 2 shows the flowchart of our model.
Localized Correlation Learning
PPF prediction is a spatially related task: the more similar two areas are, the more strongly their passenger flow conditions are correlated. Assume that a city is partitioned into areas, including known and target areas. presents the PPF matrix, and presents the localized flow matrix of , where the th row of is a combination of the passenger flows of its nearest areas. We then formulate the following function to learn the localized correlation:
(1) 
where is the localized correlation matrix that learns the transformation from to in each period of day , is the total number of days; stands for the projection of all observed passenger flows from the known area set ; is the Frobenius norm of a matrix.
We now discuss how to build for one period in a day. The physical distances among areas need to be considered first. Moreover, the development of a city gradually fosters different functional regions, such as business and educational areas, and areas belonging to the same functional region have strong connections through their properties [Zheng et al.2014].
Thus, the similarities among areas should take into account the above two standards. To this end, we build two distance metrics from the real geographic location and regional similarity. The distance metric between th and th areas is shown as follows:
(2) 
where is the geographic distance between the th and th areas, and represents the Euclidean distance calculated from intrinsic features of the areas (e.g., point-of-interest attributes).
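As a rough illustration of how two such distances could be combined and then used to pick the nearest known areas of every area, consider the sketch below. The combination rule (a sum of max-normalized geographic and feature distances), the function name `knn_indicator`, and the toy coordinates and features are assumptions made for illustration; they are not the form of Eq. (2).

```python
import numpy as np

def knn_indicator(coords, features, known, k):
    """Binary matrix whose row i marks the k nearest known areas of area i."""
    def pairwise(x):
        # Euclidean distance between every pair of rows of x.
        diff = x[:, None, :] - x[None, :, :]
        return np.sqrt((diff ** 2).sum(axis=-1))

    geo = pairwise(coords)
    feat = pairwise(features)
    # Assumed combination: sum of max-normalized distances (not the paper's Eq. (2)).
    dist = geo / geo.max() + feat / feat.max()

    n = len(coords)
    indicator = np.zeros((n, n), dtype=int)
    for i in range(n):
        # Only known areas other than i itself can serve as neighbors.
        candidates = [j for j in known if j != i]
        nearest = sorted(candidates, key=lambda j: dist[i, j])[:k]
        indicator[i, nearest] = 1
    return indicator

# Hypothetical data: areas 0-2 are known, area 3 is a target area.
coords = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [5.0, 5.0]])
features = np.array([[1.0], [1.0], [0.0], [0.0]])
indicator = knn_indicator(coords, features, known=[0, 1, 2], k=2)
print(indicator)
```

In this toy case the target area 3 is geographically far from everything, so the feature distance decides that area 2 is its closest neighbor, followed by area 1.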
After obtaining , the neighbors of the th area can be picked. Then, we construct an indicator matrix for the nearest neighbors of all areas, where each row indicates the positions of its nearest known areas. For example, in stage 2 of Figure 2, the first row of illustrates that , , and are the nearest areas of if is set to 3. Accordingly, for each day , the localized flow matrix can be represented as , where is an adaptive weight matrix that learns the different weights of the nearest areas. At this stage, the localized correlation learning process is shown as Eq. (3).
(3) 
where is the entrywise product. The loss function aims to learn the localized correlation matrix and the weight matrix simultaneously.
Improvement by Cross-domain Learning Process
As mentioned above, a city contains various functional regions. Thanks to the urban statistical data, the passenger flow similarities among different areas can be examined from this cross-domain perspective. Since similar functional regions tend to have similar passenger flows (e.g., business regions have a large number of entrance flows during the morning rush hour, and people leave residential areas in the same time span), we leverage such information to guide the localized correlation learning process.
The statistical data have multiple views that record the differences between areas. For example, the economy view reveals economic features, such as the number of industries and employee statistics, and the population view consists of detailed population information. Let {} denote the multiple views of the statistical data, where , each row of denotes an area and each column denotes a feature. To improve the prediction performance, cross-domain knowledge is involved as guidance, which can be formulated as:
(4) 
where is the regularization parameter; denotes the localized matrix of .
After solving Eq. (4), the learned matrices and can be used to make the prediction. The predicted PPF of target areas on the th day is:
(5) 
where is an indicator matrix whose entry () is one if is observed and zero otherwise.
At this stage, the OD passenger flows departing from the target areas are learned by the above processes, i.e., each row of the target areas is predicted. However, the columns of the target areas, which reveal how many passengers arrive at these areas, need to be predicted with a slight modification: replace with in Eq. (4) to learn the localized correlation from the other side. This problem can be solved in the same manner, so we present only the optimization strategy of Eq. (4) due to the page limitation.
Learning and Prediction
Eq. (4) is a complex nonconvex problem. However, the loss function associated with Eq. (4) is convex with respect to when is fixed, and vice versa. We can therefore optimize them alternately until convergence (as in, e.g., alternating least squares (ALS)). A straightforward way to minimize the loss function is the gradient method.
Considering while is fixed, Eq. (4) can be rewritten as follows:
(6) 
Taking the derivative of with respect to , we can get gradient :
(7) 
Analogously, the derivative of with respect to is:
(8) 
Let and be the step size and the number of iterations. In each stage, we adopt the following update rules:
(9) 
(10) 
(11) 
where .
Based on the above update equations, the iterative learning and prediction process for MLCPPF is summarized in Algorithm 1.
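The alternating scheme used in this section can be illustrated on a toy problem. The sketch below is not Eq. (4): it minimizes a generic bilinear least-squares objective ||X - AB||_F^2, which is likewise nonconvex jointly but convex in each factor when the other is fixed. The function name `alternating_gd`, the step size, the iteration count, and the toy matrix are all assumptions chosen for illustration.

```python
import numpy as np

def alternating_gd(x, rank, step=0.01, iters=1000, seed=0):
    """Minimize ||x - a @ b||_F^2 by alternating gradient steps on a and b."""
    rng = np.random.default_rng(seed)
    a = rng.standard_normal((x.shape[0], rank)) * 0.1
    b = rng.standard_normal((rank, x.shape[1])) * 0.1
    losses = []
    for _ in range(iters):
        # Gradient step on a with b fixed (the subproblem is convex in a).
        a -= step * 2 * (a @ b - x) @ b.T
        # Gradient step on b with the updated a fixed (convex in b).
        b -= step * 2 * a.T @ (a @ b - x)
        losses.append(float(np.sum((x - a @ b) ** 2)))
    return a, b, losses

# Hypothetical rank-2 matrix to factorize.
x = np.array([[1.0, 2.0, 0.0],
              [0.0, 1.0, 1.0],
              [1.0, 3.0, 1.0]])
a, b, losses = alternating_gd(x, rank=2)
print(losses[0], losses[-1])
```

The same pattern applies whenever one block of variables is updated while the other is held fixed; only the gradients change with the objective.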
Experiments
Data Description

We describe the transactional dataset used in this paper, which is a large-scale, real-world dataset provided by NSW Sydney Transport. After data cleaning (we removed recording errors, UNKNOWN trips, etc.), the dataset contains more than 35 million transactional records covering 194 stations, including city train and ferry stations, between 7 Nov 2016 and 11 Dec 2016. We pick the data between 7 Nov 2016 and 20 Nov 2016 as the training and validation sets (used to tune parameters); the remaining data are used as the test set.

The urban statistical data are collected from the Australian Bureau of Statistics 2016 (ABS) with four views: Economy, Family, Income, and Population. The dimensions of the four views are 43, 44, 50, and 97, respectively.

All transactional records across the transport network are mapped into 117 areas to build the flow matrices , . The designation of areas is based on the Australian Statistical Geography Standard for the best practical value.
Methods and Metrics
We use the following five baselines, which can learn the flow data under the guidance of cross-domain knowledge. Among them, CDTF and WITF are two tensor-factorization-based (TF) methods that can solve the cold-start problem. For NMF, we concatenate the flow matrix with the statistical data. All parameters used in the baselines and our method are picked by grid search.

NMF: Predicts the PPF by nonnegative matrix factorization on the concatenation of the flow matrix and the statistical data [Lee and Seung2001].

MVLIV: A state-of-the-art multi-view learning method based on matrix co-factorization, which learns the same coefficient matrix to connect multiple views [Xu, Tao, and Xu2015]. In this method, we set the flow matrix as one of the views; the other views are from the ABS dataset.

LSKNN: Latent similarity k-nearest neighbors. After calculating the latent similarities among areas by Eq. (2), we pick the k nearest neighbors of the target areas and use the average crowd flows of these neighbors as the estimate (k=4).

CDTF: A state-of-the-art TF method to learn the cross-domain knowledge [Hu et al.2013].

WITF: A weighted irregular TF method that is similar to CDTF [Hu et al.2016]. For CDTF and WITF, we leverage the passenger flow and ABS data to construct the tensor.
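Of the baselines listed above, LSKNN is simple enough to sketch directly: the flows of a target area are estimated as the average of the flows of its k nearest known areas. The function name `lsknn_predict`, the array layout (one row of outgoing flows per known area), and the toy numbers are illustrative assumptions; the distances are taken as given rather than computed from Eq. (2).

```python
import numpy as np

def lsknn_predict(od_known, dist_to_known, k=4):
    """Estimate a target area's outgoing flows as the mean over its k nearest known areas."""
    nearest = np.argsort(dist_to_known)[:k]  # indices of the k smallest distances
    return od_known[nearest].mean(axis=0)

# Hypothetical outgoing flows of four known areas towards two destinations.
od_known = np.array([[10.0, 0.0],
                     [20.0, 10.0],
                     [30.0, 20.0],
                     [100.0, 100.0]])
# Hypothetical distances from the target area to each known area.
dist_to_known = np.array([0.1, 0.2, 0.3, 0.9])
estimate = lsknn_predict(od_known, dist_to_known, k=3)
print(estimate)
```

With k=3 the distant fourth area is ignored, so the estimate is the mean of the first three rows.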
Table 1
Methods | Morning Rush Hour MAE | Morning Rush Hour NRMSE | Afternoon Rush Hour MAE | Afternoon Rush Hour NRMSE | Nonrush Hour MAE | Nonrush Hour NRMSE | Average MAE | Average NRMSE
NMF | 124.50 | 30.78% | 117.92 | 37.13% | 89.11 | 28.44% | 110.51 | 32.12%
MVLIV | 108.31 | 29.50% | 101.55 | 29.78% | 92.05 | 27.54% | 100.64 | 28.94%
CDTF | 75.15 | 22.43% | 84.02 | 25.93% | 67.78 | 19.37% | 75.65 | 22.58%
WITF | 69.30 | 18.73% | 72.06 | 19.45% | 62.57 | 17.26% | 67.98 | 18.48%
LSKNN | 19.89 | 5.42% | 20.20 | 7.67% | 23.51 | 7.94% | 21.20 | 7.01%
MLCPPF | 9.84 | 2.30% | 11.47 | 3.12% | 8.22 | 1.21% | 9.84 | 2.21%
Table 2
Methods | 5% MAE | 5% NRMSE | 10% MAE | 10% NRMSE | 15% MAE | 15% NRMSE | 25% MAE | 25% NRMSE
NMF | 88.11 | 20.70% | 90.23 | 21.66% | 107.44 | 26.52% | 128.70 | 30.11%
MVLIV | 92.75 | 20.36% | 90.40 | 20.75% | 99.01 | 23.79% | 101.93 | 32.40%
CDTF | 59.25 | 18.90% | 61.25 | 19.26% | 68.07 | 22.00% | 79.06 | 26.35%
WITF | 60.41 | 18.23% | 60.72 | 19.77% | 61.18 | 19.23% | 71.07 | 20.73%
LSKNN | 13.69 | 4.72% | 16.45 | 5.01% | 18.51 | 5.44% | 23.99 | 9.25%
MLCPPF | 8.64 | 1.37% | 8.80 | 1.20% | 9.73 | 1.90% | 11.07 | 2.34%
Metrics.
We use two widely used evaluation metrics to measure the PPF prediction quality: Mean Absolute Error (MAE) and Normalized Root Mean Square Error (NRMSE), where is a forecast passenger flow from the th area to the th; is the ground truth; and is the number of predictions.
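For reference, the two metrics can be computed as in the sketch below. MAE follows its standard definition. NRMSE has several common normalizations, and the one used in this paper is not recoverable from the text here, so the sketch assumes normalization by the range of the ground truth; the function names and toy values are likewise illustrative.

```python
import numpy as np

def mae(pred, truth):
    """Mean absolute error over all predicted flows."""
    return float(np.mean(np.abs(pred - truth)))

def nrmse(pred, truth):
    """Root mean square error divided by an assumed normalizer."""
    rmse = np.sqrt(np.mean((pred - truth) ** 2))
    # Assumption: normalize by the range of the ground truth (other choices exist).
    return float(rmse / (truth.max() - truth.min()))

# Hypothetical ground-truth and predicted flows.
truth = np.array([10.0, 20.0, 30.0, 40.0])
pred = np.array([12.0, 18.0, 33.0, 37.0])
print(mae(pred, truth), nrmse(pred, truth))
```

Swapping the normalizer (for example, to the mean of the ground truth) changes the NRMSE values but not the ranking logic of lower being better.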
Comparisons on Different Time Periods
The first set of experiments is designed to assess the performance on different time periods. We randomly remove 20% of the areas as the target set and keep the remaining 80% as the known set. The learning step is set to , and only two hyperparameters are used in our method, where and are chosen from {1,2,3,…,10} and {} respectively. We repeat the experiment 20 times with random initialization and report the average results.
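The random selection of target areas in this protocol can be sketched as follows. The function name `split_areas`, the rounding rule for the number of target areas, and the fixed seed are assumptions for illustration; the area count of 117 comes from the data description.

```python
import numpy as np

def split_areas(n_areas, target_ratio, rng):
    """Randomly split area indices into (known, target) sets."""
    n_target = int(round(n_areas * target_ratio))  # assumed rounding rule
    perm = rng.permutation(n_areas)
    target = np.sort(perm[:n_target])
    known = np.sort(perm[n_target:])
    return known, target

rng = np.random.default_rng(0)
known, target = split_areas(117, 0.20, rng)
print(len(known), len(target))
```

Repeating the split with fresh draws from the same generator gives the repeated runs whose errors are then averaged.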
Experimental results are presented in Table 1. Compared with the other approaches, our method achieves the best prediction accuracy in all three time periods, with an average MAE of 9.84 against 21.20 for LSKNN, the strongest baseline. None of the multi-view and cross-domain baselines works well, because it is hard to capture the relationships between the statistical data and the passenger flows. LSKNN performs better than the other baselines, which illustrates that the PPF prediction problem has a strong spatial correlation property. In summary, the proposed method outperforms the available baselines because it considers the localized correlations and the cross-domain knowledge simultaneously.
Comparisons on Various Missing Ratios
In this experiment, we evaluate how the performance changes as the number of target areas varies. We randomly pick 5%, 10%, 15%, and 25% of the areas as the target areas, and run each setting 20 times to report the average errors. The test period is the morning rush hour. The performances of the different methods are summarized in Table 2.
The experimental results lead to conclusions similar to those of the first comparison. Our model, MLCPPF, outperforms all other compared methods on all testing sets, with MAE between 8.64 and 11.07 against 13.69 to 23.99 for LSKNN. The performance of MLCPPF with 5% target areas is very close to that with 10%, which illustrates that the remaining 90% of the areas suffice to learn a satisfactory localized correlation and make accurate PPF predictions. In real-world applications, the proportion of target areas is usually small, since only a few areas are suitable for constructing a new station.
Parameter Analysis
In this section, we analyze the effects of the two hyperparameters used in this paper, where is the number of nearest neighbors, and the regularization factor controls the strength of guidance from data.
Figure 3(a) shows the performance under varying settings of . For each area, the correlation matrix only learns the transformation from these neighbors. As shown in the results, = 2 is the best choice for our method, and the model is not sensitive to between 1 and 5.
Figure 3(b) presents the results obtained by varying . = achieves the best results, and the performance is stable when choosing between [, ].
In summary, both parameters contribute to the performance of our model, and MLCPPF is stable because it is insensitive to them.
Case Study
We display the PPF prediction result of one target area in this section. In this case, the area “Homebush” is treated as the target area. For better visualization, we keep only the areas where the number of arriving passengers is greater than 5.
As shown in Figure 4, the prediction of our model is close to the ground truth, especially in some main areas of Sydney, such as the central area “SydneyHaymarket”, “BurwoodCroydon”, “North SydneyLavender Bay” and “ParramattaRosehill”. The case study demonstrates the effectiveness of our method for PPF prediction.
Transfer to the Cold-start Problem
As we have emphasized, our strategy can provide a new perspective on the classic cold-start problem in recommender systems. This set of experiments is designed to assess the transferability of our model.
We choose a well-known dataset from Amazon for the evaluation; the dataset contains 1,555,170 users and 1-5 scaled ratings over 548,552 different products covering four domains: books, music CDs, DVDs and videos [Hu et al.2013]. We randomly remove 20% of the users from the target domains to simulate the cold-start situation. Three baselines are used in this comparison. CMF is an effective method based on collective matrix factorization, which couples the rating matrices for all domains on the User dimension [Singh and Gordon2008]. CDTF and WITF are two tensor-factorization-based cross-domain recommendation methods that devise a strategy to transform the original data into a cubical tensor, which can better capture the interactions between user factors and item factors [Hu et al.2013, Hu et al.2016]. In this experiment, we leverage the information excluding the target domain to build the nearest indicator matrix .
Table 3 shows the results of our method together with some state-of-the-art approaches. MLCPPF achieves the best results for the target domains, which illustrates that our method is able to handle the unacquainted world phenomenon and gives inspiration for relevant tasks. Despite the effectiveness of our method, we should admit a limitation of MLCPPF: it can only make a prediction when the neighbors have ratings. However, based on the test results, the predicted ratings are reliable enough to make recommendations.
Table 3
Target Domain | CMF | CDTF | WITF | MLCPPF
Book | 0.834 | 0.755 | 0.740 | 0.396
Music | 0.847 | 0.779 | 0.716 | 0.582
Conclusion
In this paper, we proposed an effective method for potential passenger flow prediction, a novel study that benefits urban transportation development. To address this spatio-temporal problem, we designed a multi-view localized correlation learning model (MLCPPF) for PPF prediction. The nearest indicator matrix is constructed from the real geographical neighbors and regional properties. MLCPPF learns the correlations between each known area and its k nearest neighbors under the guidance of cross-domain knowledge. We evaluated the performance of our method with a set of well-designed experiments. The empirical results demonstrate that the proposed model outperforms all the other methods in the PPF prediction task, and also show its capability of tackling the cold-start problem in recommender systems.
References
 [Chen and Wei2011] Chen, M.-C., and Wei, Y. 2011. Exploring time variants for short-term passenger flow. Journal of Transport Geography 19(4):488–498.
 [Deng et al.2016] Deng, D.; Shahabi, C.; Demiryurek, U.; Zhu, L.; Yu, R.; and Liu, Y. 2016. Latent space model for road networks to predict time-varying traffic. In Proceedings of the 22nd ACM SIGKDD, 1525–1534.
 [Elkahky, Song, and He2015] Elkahky, A. M.; Song, Y.; and He, X. 2015. A multi-view deep learning approach for cross domain user modeling in recommendation systems. In Proceedings of the 24th International Conference on World Wide Web, 278–288. International World Wide Web Conferences Steering Committee.
 [Gong et al.2018] Gong, Y.; Li, Z.; Zhang, J.; Liu, W.; Zheng, Y.; and Kirsch, C. 2018. Network-wide crowd flow prediction of Sydney trains via customized online non-negative matrix factorization. In Proceedings of the 27th ACM International Conference on Information and Knowledge Management, 1243–1252. ACM.
 [Hsieh, Lin, and Zheng2015] Hsieh, H.-P.; Lin, S.-D.; and Zheng, Y. 2015. Inferring air quality for station location recommendation based on urban big data. In the 21st ACM SIGKDD International Conference.
 [Hu et al.2013] Hu, L.; Cao, J.; Xu, G.; Cao, L.; Gu, Z.; and Zhu, C. 2013. Personalized recommendation via cross-domain triadic factorization. In Proceedings of the 22nd international conference on World Wide Web, 595–606. ACM.
 [Hu et al.2016] Hu, L.; Cao, L.; Cao, J.; Gu, Z.; Xu, G.; and Yang, D. 2016. Learning informative priors from heterogeneous domains to improve recommendation in cold-start user domains. ACM Transactions on Information Systems (TOIS) 35(2):13.
 [Lam et al.2008] Lam, X. N.; Vu, T.; Le, T. D.; and Duong, A. D. 2008. Addressing cold-start problem in recommendation systems. In Proceedings of the 2nd international conference on Ubiquitous information management and communication, 208–211. ACM.
 [Lee and Seung2001] Lee, D. D., and Seung, H. S. 2001. Algorithms for non-negative matrix factorization. In Advances in neural information processing systems, 556–562.
 [Li et al.2019] Li, Z.; Zhang, J.; Wu, Q.; Gong, Y.; Yi, J.; and Kirsch, C. 2019. Sample adaptive multiple kernel learning for failure prediction of railway points. In Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, 2848–2856. ACM.
 [Lin et al.2017] Lin, L.; Li, J.; Chen, F.; Ye, J.; and Huai, J. 2017. Road traffic speed prediction: a probabilistic model fusing multi-source data. IEEE Transactions on Knowledge and Data Engineering 30(7):1310–1323.
 [Ma et al.2016] Ma, Y.; Lin, T.; Cao, Z.; Li, C.; Wang, F.; and Chen, W. 2016. Mobility viewer: An Eulerian approach for studying urban crowd flow. IEEE Transactions on Intelligent Transportation Systems 17(9):2627–2636.
 [Ni, He, and Gao2017] Ni, M.; He, Q.; and Gao, J. 2017. Forecasting the subway passenger flow under event occurrences with social media. IEEE Transactions on Intelligent Transportation Systems 18(6):1623–1632.
 [Pan et al.2019] Pan, Z.; Liang, Y.; Wang, W.; Yu, Y.; Zheng, Y.; and Zhang, J. 2019. Urban traffic prediction from spatio-temporal data using deep meta learning. In Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, 1720–1730. ACM.
 [Rendle et al.2009] Rendle, S.; Balby Marinho, L.; Nanopoulos, A.; and Schmidt-Thieme, L. 2009. Learning optimal ranking with tensor factorization for tag recommendation. In Proceedings of the 15th ACM SIGKDD international conference on Knowledge discovery and data mining, 727–736. ACM.
 [Singh and Gordon2008] Singh, A. P., and Gordon, G. J. 2008. Relational learning via collective matrix factorization. In Proceedings of the 14th ACM SIGKDD international conference on Knowledge discovery and data mining, 650–658. ACM.
 [Sun et al.2015] Sun, L.; Lu, Y.; Jin, J. G.; Lee, D.-H.; and Axhausen, K. W. 2015. An integrated Bayesian approach for passenger flow assignment in metro networks. Transportation Research Part C: Emerging Technologies 52:116–131.
 [Taneja and Arora2018] Taneja, A., and Arora, A. 2018. Cross domain recommendation using multidimensional tensor factorization. Expert Systems with Applications 92:304–316.
 [Tobler1970] Tobler, W. R. 1970. A computer movie simulating urban growth in the Detroit region. Economic Geography 46(sup1):234–240.
 [Wang et al.2019] Wang, Y.; Yin, H.; Chen, H.; Wo, T.; Xu, J.; and Zheng, K. 2019. Origin-destination matrix prediction via graph convolution: a new perspective of passenger demand modeling. In Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, 1227–1235. ACM.
 [Wei and Chen2012] Wei, Y., and Chen, M.-C. 2012. Forecasting the short-term metro passenger flow with empirical mode decomposition and neural networks. Transportation Research Part C: Emerging Technologies 21(1):148–162.
 [Xiong et al.2010] Xiong, L.; Chen, X.; Huang, T.-K.; Schneider, J.; and Carbonell, J. G. 2010. Temporal collaborative filtering with Bayesian probabilistic tensor factorization. In Proceedings of the 2010 SIAM International Conference on Data Mining, 211–222. SIAM.
 [Xu, Tao, and Xu2013] Xu, C.; Tao, D.; and Xu, C. 2013. A survey on multi-view learning. arXiv preprint arXiv:1304.5634.
 [Xu, Tao, and Xu2015] Xu, C.; Tao, D.; and Xu, C. 2015. Multi-view learning with incomplete views. IEEE Transactions on Image Processing 24(12):5812–5825.
 [Zhan et al.2016] Zhan, X.; Zheng, Y.; Yi, X.; and Ukkusuri, S. V. 2016. Citywide traffic volume estimation using trajectory data. IEEE Transactions on Knowledge and Data Engineering 29(2):272–285.
 [Zhang et al.2018] Zhang, J.; Zheng, Y.; Qi, D.; Li, R.; Yi, X.; and Li, T. 2018. Predicting citywide crowd flows using deep spatio-temporal residual networks. Artificial Intelligence 259:147–166.
 [Zhang et al.2019] Zhang, J.; Zheng, Y.; Sun, J.; and Qi, D. 2019. Flow prediction in spatio-temporal networks based on multitask deep learning. IEEE Transactions on Knowledge and Data Engineering.
 [Zheng et al.2014] Zheng, Y.; Capra, L.; Wolfson, O.; and Yang, H. 2014. Urban computing: concepts, methodologies, and applications. ACM Transactions on Intelligent Systems and Technology (TIST) 5(3):38.
 [Zheng2015] Zheng, Y. 2015. Methodologies for cross-domain data fusion: An overview. IEEE Transactions on Big Data 1(1):16–34.