1 Introduction
With the rapid urbanization, metro has become the mainstream of public transportation for its advantages such as fast speed, large capacity, and convenience. At the stage of urban construction and transportation planning, it is crucial to explore multiple urban indicators from a systematic perspective to capture the urban trafficrelated phenomenon like metro ridership at station level, which is a critical element for determining the scale of stations and access facilities. Metro ridership at station level is known to be influenced by the interactions of the different components of an urban system (e.g., landuse, socioeconomics, etc.). Understanding the influence of these components is vital to accurately estimate travel demand and to effectively make design schemes of urban systems including the identification of which public infrastructures, services, and resources need to be built and deployed. Modelling metro ridership at station level can help to not only estimate and forecast ridership but also analyse the influencing factors on it.
Various methods have been proposed for transit ridership estimation. As one of the bestknown models, the fourstep (generation, distribution, mode choice, and assignment) model has dominated the history of transport modeling since the 1950s [1]. However, the fourstep model has many drawbacks in practice [2], such as limitation in model accuracy, low data precision, insensitivity to land use, institutional barriers, and high expense [3]. The fourstep model is generally effective for estimating transit ridership on a regional scale rather than more detailed scales (such as station level) [4]
. As an alternative to the fourstep model, direct demand models have drawn growing attention for ridership estimation in recent decades. Direct demand models estimate ridership as a function of influencing factors within the Pedestrian Catchment Areas (PCA) via regression analysis, which enable identifying factors that contribute to higher transit ridership
[3, 5, 6, 7, 8]. In the models, a PCA is a geographic area for which a station attracts passengers. The size and shape of a catchment area depend on how accessible a station is and how far it is from alternative stations. One can use buffers to create circular catchment areas by a specific distance or use Thiessen polygons to illustrate the area most accessible to each station. The major advantages of direct demand models in travel analysis are simplicity of use, easy interpretation of results, immediate response, and low cost. The direct demand models generally use OLS multiple regression , which assumes parametric stability. With the development of spatial modeling, direct demand models could increase their spatial explanatory power by using GWR, which is designed to model spatial parametric nonstationarity, and variance heterogeneity. However, the issues of GWR models, such as collinearity in the estimated coefficients, and network connection intermedia may influence metro ridership modelling and estimation. Thus, there are still a number of limitations to be promoted in the existing research on transit ridership modeling.
In light of deficiencies of current direct demand models, considering the pros and cons of Geographically Weighted Regression (GWR) for modelling potentially spatially varying relationships, this paper proposes an Adapted Geographically Weighted Lasso (AdaGWL) framework for modelling the ridership of metro systems, which considers using networkbased distance metric rather than Euclideanbased distance metric, and simultaneously shrink regression coefficients and select feature locally in the process of GWR with Least Absolute Shrinkage and Selection Operator (LASSO).The first objective of this study is to identify the association between multiple factors and metro station ridership from a local perspective. The second objective is to estimate ridership at station level accurately. An AdaGWL model is used to analyze the significance of various local factors that impact metro station ridership. The ridership data of Shenzhen Metro at stationlevel in the year of 2013 is used to validate the superiority of our proposed model over global model (Ordinary Least Square), GWR, GWR calibrated with networkbased distance metric, and GWL model.
Our contributions are threefold:
(1) This study introduces networkbased distance metric during estimating ridership. By implementing the AdaGWL model on the real world case study of Shenzhen metro systems, we calibrate distance with networkbased distance metric instead of Euclideanbased distance metric. The reason is that locations may be related through a certain intermedia system but not in Euclidean space. This paper investigates ridership at metro stations, which are related through railways. In practice, taking network connection intermedia into consideration by using networkbased distance metric is important as routine transportation activities normally occur on a determined transportation network but not a continuous space [9]. Nonetheless, few existing direct demand models have included spatial aspects considering network connection intermedia.
(2) We adopt the basic framework of GWR considering spatial autocorrelation of variables to model metro ridership, which outperforms the traditional global regression model OLS in terms of model fitting and spatial explanatory power.
(3) This study enables simultaneous coefficient penalization and model selection during GWR. Even though GWR models have been utilized to model and analyze the transit ridership and its influencing factors, the collinearity of GWR may make interpreting individual coefficients problematic. Also, different influencing factors may exist at different stations, so the feature selection of the model can be conducted from the local instead of global perspective, that is, local penalization and feature selection, which have rarely been done in GWR research.
In general, to the extent of our knowledge, this is the first work that investigates the utility of GWL model calibrated with networkbased distance metric for estimating metro ridership and analyzing the influencing factors from a local perspective. It is worth noting that the fundamental framework proposed is not limited to the specific dataset used in this study, but can be extended and generalized to other applications in transportation systems, such as light rail and rapid bus transit systems.
The remainder of this paper is organized as follows. In Section 2, we review the current literature on transit ridership estimation modeling and factors investigated in those models; In Section 3, we present the data description and various influencing factors considered; In Section 4, we describe the methodology for estimating metro transit ridership and identifying significant influencing factors. Section 5 provides the details for model implementation and results analysis and discussion. Finally, we conclude this paper in Section 6.
2 Literature Review
2.1 Transit Ridership Estimation Modelling
This paper summarizes the related studies on direct demand models for ridership estimation (see Figure A1 in A.1). As summarized in Figure A1, the most widely used method is Ordinary Least Squares (OLS) or linear least squares in the study of the influencing factors on transit ridership by using direct demand models, Kuby et al.(2004), Sohn and Shim(2010), Loo et al.(2010), Sung and Oh(2011), Gutierrez et al.( 2011), Thompson et al.(2012), Guerra et al.(2012), Zhao et al.(2013), Chan and MirandaMoreno (2013), Singhai et al.(2014), Liu et al.(2016) and Pan et al.(2017) all applied OLS regression to model transit ridership and its influencing factors[7, 10, 11, 12, 3, 13, 14, 15, 16, 17, 18, 19]. However, the limitation of OLS models is that they assume the transit ridership is affected by various factors but has nothing to do with the spatial location, without considering the spatial autocorrelation, in other words, OLS models estimate ridership from a global perspective believing that calculated coefficients do not have significant differences in space. Fotheringham (1996) proposed Geographically Weighted Regression (GWR) model which can reveal the spatial relations on the condition of spatial heterogeneity. GWR model has a strong capability of spatial data analysis [20], and it has been widely used to analyse spatial data in economics, geography, ecology, and many other fields of research. At present, Cardozo et al.(2012) compared the performance of OLS and GWR in modelling transit ridership and its influencing factors, and GWR showed the better goodnessoffit than OLS for forecasting stationlevel ridership[4]. In addition, Jun et al.(2015) used mixed geographically weighted regression (MGWR) models consisting of local and global independent variables to examine the relationship between the land use characteristics and subway ridership[21]. However, there was still room for improvement in their GWR models, such as considering the impact of network connection intermedia on distance metric and correlation in the estimated coefficients during GWR [22]. Actually, Wheeler and Tiefelsdorf (2005) showed that GWR coefficients can be correlated even though there is no independent variable collinearity, the coefficient correlation increases systematically with increasing collinearity[23]. Therefore, the application of GWR in the estimation of transit ridership needs to be promoted.
The research of GWR with penalized forms (e.g., norm/ norm) can be found in Wheeler’s studies[24, 22]
. Both ridge regression (
norm) and the Lasso ( norm) are penalization methods which impose constraints on the regression coefficients. Ridge regression aims to alleviate collinearity effects by making use of the penalty to regularize the size of regression coefficients and reducing variance. The Lasso also performs regularization on variables in consideration. It considers the absolute values of the sum of the regression coefficients. It also sets the coefficients to zero thus simultaneously performing regression coefficient penalization and model selection. Wheeler (2007) implemented a ridgeregression version of GWR, called GWRR, and found it could constrain the regression coefficients to reduce local correlation effects in an existing dataset and showed a lower estimation error for the dependent variable than of GWR [24]. The lasso has also been introduced into the GWR framework in Wheeler (2009)’s study. The lasso is considered in the GWR framework since it has capability to simultaneously coefficient regularization and local model selection, also for its potential to reduce prediction and estimation errors for estimating the dependent variable in GWR. Wheeler (2009) implemented a Lasso version of GWR, called Geographically Weighted Lasso (GWL), and demonstrated the benefit of GWL through a comparative analysis with GWR and GWRR.It can stabilize regression coefficients in the presence of collinearity and produce lower estimation error of the dependent variable than does GWR, and also offered a key advantage to GWRR for the identification of insignificant local effects by doing model selection[22]. However, the existing version GWL model didn’t consider the effect of network connection intermedia, which opened up new perspectives that fully justified the pursuit of this research. Thus, we try to design a version of GWL model adapted to metro network that considering network connection intermedia during calculating distance.2.2 Dependent Variables and Independent Variables
With regard to the dependent variables of direct demand models considered, daily ridership was selected in most of the relevant studies (also see Figure A1 in A.1), such as the research of Kuby et al.(2004), Chu (2004), Sohn and Shim (2010) , Loo et al.(2010), Zhao et al.(2013) and Zhang and Wang(2014), which all took the average weekday ridership as dependent variable[7, 8, 10, 11, 15, 9]. In addition, monthly station ridership was considered in the research of Gutierrez et al.(2011)[3]. For shorter time span, Zhao et al.(2013), Singhal et al.(2014), Hu et al.(2016), Li et.al (2016) and Chan and MirandaMoreno (2013)chose the hourly (eg. Peak hour) ridership as the dependent variable of their models[15, 17, 30, 29, 16].
Concerning independent variables of models, they can be roughly divided into the following categories, Land use variables, social economic variables, accessibility and the network structure. Figure A1 summarized the independent variables considered in existing studies. For the first kind of variables, such as the research of Estupinan and Rodriguez (2008), Loo et al. (2010), Sohn and Shim (2010), Gutierrez et al. (2011), Sung and Oh (2011), Choi et al. (2012), Cardozo et al. (2012) and Zhao et al (2013)[27, 11, 10, 3, 12, 5, 4, 15], which considered commercial, residence, education, entertainment and other mixed land use as independent variables. For the second kind of variables, factors considered mainly are population, employment, and automobile ownership ratio. For example, Chu (2004), Kuby et al. (2004), Loo et al. (2010), Sohn and Shim (2010), Gutierrez et al. (2011), Choi et al. (2012), Cardozo et al. (2012), Guerra et al. (2012), Thompson et al. (2012) and Zhao et al. (2013) analyzed the influence of population and employment on transit ridership [8, 7, 11, 10, 3, 5, 4, 14, 13, 15]. Chu (2004), Loo et al. (2010), Thompson et al (2012), Cardozo et al. (2012) and Zhao et al. (2013) considered the relationship between automobile ownership and the ridership[8, 11, 13, 4, 15]. For the third category, accessibility, Chu (2004), Kuby et al. (2004), Estupinan and Rodriguez (2008), Loo et al. (2010), Sung and Oh (2011), Gutierrez et al. (2011), Choi et al. (2012), Cardozo et al. (2012), Guerra et al. (2012) and Zhao et al. (2013) analyzed the influence of bus feeder system on transit ridership[8, 7, 27, 11, 10, 13, 5, 14, 15]. Moreover, Kuby et al. (2004), Estupinan and Rodriguez (2008), Loo et al. (2010), Sohn and Shim (2010), Thompson et al. (2012), Choi et al. (2012), Guerra et al. (2012) and Zhao et al. (2013) studied on the station accessibility[7, 27, 11, 10, 13, 5, 14, 15]
. Finally, with regard to network structure, Kuby et al. (2004) took the influence of transfer station and terminal station on ridership into account, but both of them were regarded as dummy variables in their models
[7]. So far, few relevant studies have been carried out from the quantitative perspective by using measurements from the field of complex network, such as quantifying the practical significance (e.g., transfer, terminal station and the importance of stations’ positions that lie on a large number of paths taken by flows) by the calculation of nodes’ degree centrality and betweenness centrality in networks.3 Methodology
3.1 Geographically Weighted Regression
GWR model is an extension of OLS, which can be formulated as:
(1) 
When geographical location factors are introduced into regression parameters to allow local parameter estimation, OLS model evolves into GWR model, which is as following:
(2) 
Where and are observed values of dependent variable and independent variables at the location of , which is geospatial coordinates of observation point , called a model calibration location. In the context of metro networks, and are observed metro ridership and potential influencing factors at station with the geospatial coordinates .
is the normally distributed error term (with the expected value 0 and a constant variance).
refers to unknown coefficients associated with spatial position pending for estimation. is the number of independent variables. The GWR model at location in matrix term is:(3) 
The vector of estimation regression coefficients at location
is:(4) 
Where is the matrix of independent variables, which typically includes a column of 1s for the intercept. is the diagonal weights matrix, which contains the geographical weights in its leading diagonal and 0 in its offdiagonal elements. The weights matrix is calculated for each location and applies weights to observations . Here, more weight generally will be placed more emphasis on closer observations to the model calibration location . is the vector of dependent variable values; and is the vector of local regression coefficients at location for independent variables and an intercept term.
The weights matrix is calculated from a kernel function . There are many kinds of kernel functions. According to the nature of the bandwidth , kernel functions can be divided into two groups: the fixed kernel function and the adaptive kernel function. Gaussian kernel function is a typical example of the fixed kernel, which was expressed as Equation (5).
(5) 
where is the distance between the calibration location and location , and is the kernel bandwidth parameter.The fixed kernel function assumes that the bandwidth at each location is a constant across the study area. If the locations and satisfy , , whereas decreases according to a Gaussian curve as increases. However, the weights are nonzero for all data points, no matter how far they are from the location [20]. If the sample points are distributed as clusters in the study area but not regularly spaced, it is generally appropriate to allow the kernel to adapt this irregularity by increasing its size when the sample points are distributed sparser and decreasing its size when they are denser. A convenient way of implementing this adaptive bandwidth specification is to choose a kernel which allows the same number of sample points for each estimation. A commonly used adaptive kernel function is the bisquare kernel function, as follows:
(6) 
where at the location () and at the location when [31].
Both the Gaussian kernel function and the bisquare kernel function are highly dependent on bandwidth . Thus, the kernel bandwidth should be estimated firstly to fit the GWR model. The most common method in practice is leaveoneout crossvalidation (CV) across all the calibration locations proposed by Cleveland(1979) [32]. CV is an iterative process that finds the kernel bandwidth with the lowest associated prediction error of all the dependent variables . For each calibration location , it removes the data for observation in the model calibration at location and predicts from the other data points leaving out and the kernel weights associated with the current bandwidth. The LeaveOneOut CV (LOOCV) function is , where is the fitted value with location omitted during the fitting process. The optimal bandwidth minimizes the CV function.
An alternative to CV in kernelbandwidth estimation is the Akaike information criterion (AIC), proposed by Fotheringham et al (2002)[33]. CV and the AIC are also methods applied in model selection, and more details about AIC and model selection can be referred to Burnham and Anderson, 2002 [34]. There is little published research on discussing if CV and AIC will calculate an uniform solution or whether one method is preferred under a certain condition.In this paper, we adopt CV method. Actually, the selection of kernel function and bandwidth depends on the specific distribution of sample points in the study area, which has a strong impact on the descriptive and predictive power of GWR models [31]. Finally, with the kernel weights calculated at each calibration location and the determined bandwidth from the kernel function , the regression coefficients are estimated at each model calibration location, consequently, the values of dependent variable can be estimated as following:
(7) 
3.2 Distance Matrix considering Network Structure
In practice, since the GWR model is applied to metro station ridership estimation, it is necessary to take the network connection intermedia into consideration when calculating the distance, however, existing studies usually calculate the distance matrix only considering coordinates, i.e.,just using Euclideanbased distance metric to calibrate the distance no matter how the locations connected. In practice, in metro networks, one station is connected to another station with railway, so the Euclideanbased distance cannot reflect the actual distance between two stations. The study uses networkbased distance to measure the spatial connectivity instead of Euclideanbased distance.The two distance measurement methods can be illustarted as Figure 1.
The Euclideanbased distance between stations and can be calculated by:
(8) 
Here, we calculate the distance between two close stations and considering the radius of the earth according to Equation (9):
(9) 
Thus, the networkbased distance between stations and can be calculated by:
(10) 
Where denotes the th path between stations and ,and . is the set of paths between stations and . is the number of stations the th path passes through. So the networkbased distance is exactly the distance of the shortest path along the railway lines, which considers the network connection intermedia. In this way, the interpoint networkbased distance matrix can be obtained. Therefore, the shortest path length of every two stations considering railway connection will be adopted to calculate the networkbased distance matrix. Here, we made use of distances fuction in “igraph” package (see: https://cran.rproject.org/web/packages/igraph/index.html)to calculate the length of all the shortest paths between vertices in the network. As we adjusted the distance measurement of metro networks, the weights calculated from kernel functions will be changed accordingly.
3.3 Least Absolute Shrinkage and Selection Operator (LASSO)
To address the issue of collinearity in the GWR framework, the Least Absolute Shrinkage and Selection Operator (LASSO) is deemed a proper method because it more directly reduce the variance in the regression coefficient while retaining interpretability of covariate effects[23].In this study, we will also consider the lasso version of GWR framework. As a kind of shrinkage method, the lasso not only tends to reduce the variability of the estimates thus can improve the model’s stability, but also can set some of the coefficients to zero, thus allows for variable selection. The lasso makes use of the norm. penalties are convex and the assumed sparsity can lead to significant computational advantages. The lasso is defined as following:
(11) 
subject to
(12) 
where is a parameter that controls the degree of coefficient shrinkage. Tibshirani (1996) proved that the lasso constraint is equivalent to adding the penalty term to the Residual Sum of Squares (RSS). Thus a direct relationship between and which is a complexity parameter that controls the degree of shrinkage of coefficients. Hence, the lasso coefficients can be also expressed as:
(13) 
The generally used methods for solving the lasso are standard convex optimizer [35] and least angle regression (LARS) (Efron et al 2004)[36]. In this study, we adopted LARS to solve the lasso. The procedure of the LARS algorithm can be referred to Wheeler, 2009[22].
3.4 Adapted Geographically Weighted Lasso (AdaGWL) model
To implement the lasso in GWR, Wheeler (2009) developed a framework called Geographically Weighted Lasso (GWL) and wrote an R package “gwrr” including gwl.est function to run GWL models [43]. Based on to the framework of Wheeler (2009), we developed our AdaGWL model in the context of metro ridership estimation by taking network connection intermedia into consideration[22]. Here, we called lars function in “lars” package [45] to implement lasso methods and GWmodel package to implement GWR framework [42].
For solving the AdaGWL model we proposed, the algoritms including the subroutines and the main body are designed. First of all, the subroutine to find the best local LARS solution at each location by minimizing LOOCV Root Mean Square Error (RMSE) using LARS is shown in Algorithm 3.4.
1.1 Algorithm 1 LocalLARS() algorithm for obtaining the best LARS solution at each location by minizing CV RMSE.
Besides, the subroutine of bandwidth selection algorithm via binary search is shown in Algorithm 3.4.
1.1 Algorithm 2 BandwidthSelector() algorithm for selecting the bandwidth by the binary search for the minimum error.
With the introduction of subroutines earlier, the main body for implementing AdaGWL is shown as Algorithm 3.4:
1.1 Algorithm 3 Pseudo code to estimate the AdaGWL solutions.
//Phase 1: Estimate the local scaling GWL parameters when the LOOCV error is minimum. //
//Phase 2: Estimate the final local scaling GWL solutions using the shrinkage parameter and bandwidth estimated in Phase 1.//
4 Empirical Study Area and Data
In this paper, we concern investigating factors influencing the transit ridership at metro station level in Shenzhen, which is supported by a relatively large Metro transportation network, consisting of 5 lines (Line 1(Luobao Line), Line 2(Shekou Line), Line 3(Longgang Line), Line 4(Longhua Line) and Line 5 (Huanzhong Line)) and 118 stations in the year of 2013. Figure 2(a) is the schematic map of Shenzhen metro in 2013^{1}^{1}1Source: http://toursmaps.com/wpcontent/uploads/2017/02/shenzhen_metro_map1.gif. Figure 2(b) shows the spatial distribution of those stations. The population of Shenzhen at the end of 2013 is about 10,628,900, and the population density is 5,323 persons/km2 ^{2}^{2}2Source: http://www.sztj.gov.cn/nj2014/indexeh.htm.
4.1 Data Description
The Shenzhen metro ridership data at station level were aggregated by using the data collected through Automatic Fare Collection (AFC) system of Shenzhen Metro Corporation in China. The dataset includes the total information about entryexit smart card records. The data used in the research cover a time span of 7 days from October 14th (Mon) to 20th (Sun) in the year of 2013, which includes summed boarding and alighting ridership amounts at two time resolutions with five kinds of time periods: time of the day ((1) rushhour and (2) nonrush hour) and day of the week ((3) average weekday, (4) average weekend and (5) average daily ridership of the whole week). The ridership data at two time resolutions are adopted as dependent variables in the fitting of models. The independent variables represent factors hypothesized to influence station ridership (A detailed description is provided in Table1
). These variables can be classified into four categories: (1) Social economic variables; (2) Land use variables; (3) Intermodal traffic access variables; and (4) Network structure variables. Before collecting the data such as land userelated variables and population, a critical first step is to evaluate the walking distance to metro stations, i.e., the size of a PCA, aiming to determine the range of data collection. As the average friendly walking distance is generally assumed to be 500m in large and middlesized cities according to Kim et al. (2017)
[37], we also define the distance of PCA of each Shenzhen metro station as 500m. In our work, we use a buffer to create circular PCA by 500m. Based on the buffer with 500m radius determined, population, all of the land userelated data and the number of bus stations are collected subsequently.Categories  Independent variables  Acronym of variables  Source 
Land Use  No. of residential units  Residence  Baidu map 
No. of restaurants  Restaurant  Baidu map  
No. retailers/shopping  Shopping  Baidu map  
No. of schools  School  Baidu map  
No. of offices  Offices  Baidu map  
No. of banks  Bank  Baidu map  
No. of hospitals  Hospital  Baidu map  
No. of hotels  Hotel  Baidu map  
Network Structure  Distance to the city center  Dis_to_center  Calculated 
Degree centrality  Degree  Calculated  
Betweenness centrality  Betweenness  Calculated  
Social Economics  Population  Pop  Worldpop 
Days since opened  Days_open  Wikipedia  
Intermodal Traffic Access  No. of bus stations  Bus  Baidu map 
4.2 Dependent variable
This paper aims to identify and analyze different factors influencing the ridership at station level. In all available days(Oct. 14th Oct. 20th), the count of average daily records of weekdays is about 2,513,330, and that of weekends is about 2,480,449, which is less than that of weekdays, indicating the daily trip frequency of weekdays is higher than that of weekends.
We conduct preliminary statistical analysis using metro AFC data on October 14. Figure3(a) shows the spatial distribution of AFC data records in one day. It presents that the records are most densely distributed at Grand Theatre Station and Laojie Station, closely followed by Huaqiang Road station and Luohu station, and the records of other stations have relatively sparse distribution.
Figure3(b) is about temporal distribution of AFC data records, which shows that the spatial distribution of records has a peak value at both 8:00 and 18:00 on no matter weekdays or weekends, and the records in rush hours on weekends are significantly less than those on weekdays. In addition, it is found that the peak value in the morning rush hour is generally less than that in the evening rush, which may result from people’s travel time (such as commuting time) inconsistency in the morning while consistency in the evening. Besides, the records of other periods except for rush hours of weekends are more than those of weekdays. A possible reason could be that people take more noncommuting trips on weekends than on weekdays.
Additionally, the characteristics of temporal distribution of records on weekdays and weekends are quite similar, which suggests that there are similar metro travel patterns with morning and evening peaks on weekdays and weekends in Shenzhen. Moreover, from the perspective of distribution of records in a single day, it is also found that the maximum evening peak value falls on Friday. It indicates that more people tend to take trips in addition to daily commutes on Friday evening because of the last workday.
According to the analysis above, the travel demands and travel patterns not only differentiate in spatial but also differentiate at different time resolutions, such as the level of time of the day and day of the week.
4.3 Independent variables
The independent variables represent factors hypothesized to influence station ridership (Table1). The variables can be classified into four categories: (1) Land use; (2) Social economic variables; (3) Intermodal traffic access variables; and (4) Network structure variables.
4.3.1 Land use variables
All of the land userelated data within a PCA were collected from Baidu Map with the assistance of API, and land use variables consist of the stations’ nearby residence, entertainment, services, business, education, offices. Specifically, the information covers the numbers of residence, restaurants, schools, offices, hospitals, banks, shopping places, and hotels within 500m PCA.
4.3.2 Social economics
With regard to social economic variables, they consist of the population distribution of Shenzhen in the year of 2013 and operation days since metro stations opened. The information of days since metro lines and stations opened was collected from “Wikipedia” ^{3}^{3}3Source: https://en.wikipedia.org/wiki/Shenzhen_Metro . The population data were collected from the website of Worldpop ^{4}^{4}4Source: http://www.worldpop.org.uk/data/get_data/. The format of the population file is Geotiff. The file provides estimated numbers of people per grid square at degrees spatial resolution (approximately 100m at the equator), which can be projected to “GCSWGS1984” geographic coordinate system. During the data preprocessing, the population within each buffer can be obtained by summing up the value of the grid falls into the metro station buffer. Figure 4 shows the population distribution of the whole city of Shenzhen in the year of 2013. Meanwhile, the buffers of metro stations with a radius of 500 m were created by using ArcGIS 10.2[39], which are also illustrated in Figure 4. Through the preliminary visualization in Figure 4, it was noted that population is densely distributed near the metro region. The influence of population density within each station buffer on ridership is pending for analysis in the model.
4.3.3 Intermodal traffic access variables
As for intermodal traffic access, here we only considered the feeder bus system, and the number of bus stations within 500m PCA of a metro station was hypothesized to be positively related to station ridership, which was also collected from Baidu Map ^{5}^{5}5Source: https://map.baidu.com/.
4.3.4 Network structure variables
In this paper, network structure variables comprise the degree centrality and betweenness centrality of the metro network nodes, and spatial characteristics of each station: distance to city center. In the field of complex networks, as degree is a simple centrality measure that counts how many neighbors a node has, and the betweenness centrality for each node refers to the number of shortest paths that pass through the node (Erciyes, 2015) [40], thus they are correlated to the information for transfer stations or terminal stations, and the importance of stations in the aspect of their controlling over flows passing between others of metro networks. As for the distance of each station i to the city center, i.e., Shenzhen Municipal People’s Government, located in Futian District, we calculate it by the following (14) considering the effect of the radius of the earth:
(14) 
Where, is the radius of the earth, and are the latitude and longitude of the city center and station , respectively. The related geographical data were collected from Google map ^{6}^{6}6Source: https://maps.google.com.
5 Model Implementation and Results Discussion
In this section, we present the implementation details for metro station ridership modeling using the proposed AdaGWL. We then use this implementation in our case study for results comparisons (with OLS, GWR, and GWL models), aiming to demonstarte the superiority of AdaGWL model, and further analyze the results of AdaGWL combined with the practical meanings.
5.1 Implementation
5.1.1 Data processing
First of all, for the sake of convenience in representing and understanding, we use alphanumeric code instead of Chinese to denote each station name. Here, we define identifiers for station names according to the following rules: (1) nontransfer stations consist of 3 digits, where the first digit denotes the line number, and the rest 2 digits denote the sequential number of station; (2) transfer stations start with character t followed by 3 digits, where the first 2 digits denote the intersection of two lines, and the last digit means the sequential number of intersections between those two lines. For example, “402” represents the 2nd station of Line 4, and “t131” represents the transfer station that is the first intersection of lines 1 and 3. In this way, all of 118 stations can be encoded by such identifiers containing the line and station information literally.
Shenzhen metro implements AFC system in the whole rail network, which brings chance to record accurately the information of passengers’ boarding and alighting points. The information makes it possible to possess the temporal and spatial distribution law of ridership, getting the realtime ridership data of each station. This paper is based on the boarding and alighting ridership (smart card) data of 118 stations of Shenzhen metro covering October 14th to October 20th (a whole week), 2013. In the study, we sum up the boarding (entry) and alighting (exit) records at different time resolutions, such as the level of time of the day and day of the week. Therefore, different regression models with different dependent variables, those are rushhour(evening:17:0019:00) ridership, nonrush hour (9:0017:00, 19:0023:00) ridership, average weekday ridership, average weekend ridership and average daily ridership of the whole week (the operation times of Shenzhen metro is 6:3023:00), will be built intending to validate the model’s effectiveness and applicability to different time resolutions, and further to find if the factors influencing the stationlevel ridership differentiate at different time resolutions ^{7}^{7}7Average daily ridership of a whole week is the average of total ridership of seven days of week in operation times (6:3023:00). Rush hour ridership is calculated by the total ridership of evening rushhours from 17:00 to 19:00 of a whole week divided by 14 hours (multiply 2 hours by 7 days). Nonrush hour ridership is calculated by the total ridership of remaining time (9:0017:00, 19:0023:00) except for morning (7:009:00) and evening rushhours of a whole week divided by 84 hours (multiply 12 hours by 7 days)..
For the acquisition of data as independent variables mostly relies on the collection with the assistance of Baidu map API, all of land use variables are the aggregated data collected within 500m PCA, which can be used in the model directly.For the raw population data with Tiff format, they are processed and aggregated within 500m buffers with ArcGIS 10.2 . Figure 5 shows the overall procedures of population data processing with ArcGIS 10.2. The processing procedure contains 3 main steps described as follows.

Transform the coordinate information of metro station into points with shapefile format. Create a new point feature layer based on x and y coordinates defined in a source table and geographic coordinate system by using “Make XY Event Layer” tool.

Generate 500m buffers of metro stations. Copy features from the input layer to a feature class (shapefile format) by using “Copy features” tool as the layer above is temporary. Then Create buffer polygons around input features to 500 meters by using “buffer” tool.

Zonal statistical analysis. Aggregate the population within each buffer. Since the value of each grid represents the number of people per grid square at degrees spatial resolution (approximately 100m at the equator), so the population within each buffer can be obtained by summing up the value of the grid falls into the buffer by using “Zonal Statistics as Table” tool.
5.1.2 Spatial autocorrelation test to variables
Before building geographically weighted models, analysis was performed to test whether the candidate variables were spatially autocorrelated. First of all, the spatial distribution of variable candidates is analyzed. The results (Figure 6) indicate that these independent variables do have certain spatial correlation and aggregation to some extent. For example, population is densely distributed in the southeast regions, covering Luohu district and Futian district. Areas with large betweenness centrality generally locate on the closed railway loop of the metro network. Besides, the openning time of stations of Shenzhen metro is associated with the whole line openning time, and the distance to city center is exactly calculated according to the geographical location information, so these two variables have significant spatial correlation.
With the overview of spatial distribution of independent variables, the test of spatial autocorrelation can further detect how strong spatial correlation of variables are, this will provide a theoretical basis for the feasibility of applying a GWR model. Moran’s I is a measure of spatial autocorrelation developed by Patrick Alfred Pierce Moran (1950) [47] . All of the dependent variables and candidate predictors had estimated Moran’s I values higher than the expected E(I) (Table 2), thus, a positive spatial autocorrelation existed . Moran scatter plot can reflect the spatial autocorrelation intuitively. The scatter plot has four quadrants. If the observed value falls to the first and third quadrants, it indicates that there is a strong positive spatial correlation. If it falls to the second and fourth quadrants, it indicates there is a strong negative spatial correlation. Figure 7 shows several variables’ Moran scatter plot.
Variable  Moran’s I  Expected I  pvalue  

Dependent variables  Weekly_ridership  0.24116622  0.00854701  1.21E06 
Weekday_ridership  0.26389365  0.00854701  1.44E07  
Weekend_ridership  0.16612276  0.00854701  0.0004006  
Evenrush_ridership  0.27729573  0.00854701  3.90E08  
Nonrush_ridership  0.24779725  0.00854701  4.64E07  
Candidate predictors  Residence  0.39365231  0.00854701  1.35E14 
Restaurant  0.34171521  0.00854701  1.69E11  
Shopping  0.23155195  0.00854701  3.92E06  
School  0.28917679  0.00854701  4.16E11  
Offices  0.17494683  0.00854701  1.37E05  
Bank  0.38831017  0.00854701  1.98E14  
Bus  0.31229433  0.00854701  7.90E10  
Hospital  0.21127159  0.00854701  1.26E05  
Hotel  0.46648095  0.00854701  2.2e16  
Dis_to_center  0.95915286  0.00854701  2.2e16  
Degree  0.12583388  0.00854701  0.005471  
Betweenness  0.24105756  0.00854701  1.67E06  
Pop  0.71033294  0.00854701  2.2e16  
Days_open  0.38318677  0.00854701  2.11E13 
According to Moran scatter plots (see Figure 7), it can be seen that Moran’s I values of all variables above are not close to 0, which indicates these variables are not randomly distributed in space, and mostly falls to the first and the third quadrants. It is shown that each variable is positively spatial correlated more or less, especially eight of independent variables: population, distance to city center, and days since opened, and the number of residence, restaurants, banks, bus stations, and hotels have strong spatial correlation as Moran’s I is greater than 0.3 (Cressie, 1993) [48]. The result also lays the foundation for the feasibility of followup study.
5.1.3 Local collinearity diagnostics
Since strong spatial correlation was found for the variables in the research, it is reasonable to build GWR models to analyze influencing factors on station ridership of Shenzhen Metro. Here, condition number that evaluated collinearity were 32.50 (), indicating the low degree of multicollinearity among the candidate independent variables. However, through a series of local collinearity diagnostics for the local regression coefficients of GWR model for ridership estimation at different time resolutions, we found there was severe collinearity in local regression coefficients according to local condition numbers of all stations greater than 1000 (see Figure 8) . Multicollinearity is of great concern when condition numbers are greater than 1000, and results become unstable in the presence of strong local collinearity. It just showed that the local regression coefficients were potentially collinear even if the underlying exogenous variables in the data generating process were uncorrelated [23]. Due to the collinearity issue in the GWR model for our case, it is proper to implement AdaGWL model to the data.
5.1.4 Model implementation
A series of models icluding OLS, GWR, GWL, and the proposed AdaGWL were built in R software[41] by using the “GWmodel”, “gwrr”, “fields”, “lars” and “igraph packages [42, 43, 44, 45, 46].The algorithm for solving the AdaGWL model is coded in R programming language and tested on Intel(R) Core(TM) i56500 CPU, 3.20 GHz processor. Windows 7 operating system is used for testing. For GWR, GWL, and AdaGWL models, the selection of kernel function and bandwidth is necessary, which depends on the specific distribution of sample points in the study area, which has a strong impact on the descriptive and predictive power of GWR models [31], it is necessary to consider the distribution of sample points of Shenzhen metro network while selecting the kernel function. Since the stations of Shenzhen metro are relatively regularly spaced, so we adopted Gaussian kernel function to calculate the weight matrix and determined the bandwidth based on CV.
5.2 Results analysis
5.2.1 Comparative analysis
In the following part, OLS model, GWR model (an spatial extension of OLS), Adapted GWR (AdaGWR) model for metro network (i.e., GWR model calibrated with networkbased distance metric), GWL model (GWR with LASSO enabling simultaneous coefficient penalization and model selection), AdaGWL model for metro network (GWL model calibrated with networkbased distance metric) are implemented to the Shenzhen metro dataset described earlier, the results of models for estimating ridership at different time periods are shown in Figure 9:
The accuracy of the estimated responses is measured by calculating RMSE. The RMSE is the square root of the mean of the squared deviations of the estimates from the true values, and should be small for accurate estimators. Rsquared is a statistical measure that represents the proportion of the variance for a dependent variable that’s explained by independent variables. The results of fitting five models to the data provide the error values shown above. In general, the performance ranks of five methods are the same for the five models (different time periods), which are “AdaGWLGWLAdaGWRGWROLS”. First of all, we can see the superiority of four local models (GWR, AdaGWR, GWL, and AdaGWL) over global model (OLS) in terms of estimation error of dependent variable and goodness of fit. Secondly, it is noted that the adapted models including both GWR and GWL performed better than original versions of GWR and GWL models, which proved the fitness of the adapted models to the metro network. Thirdly, AdaGWL for metro network do substantially better than other three models at estimating the dependent variable. Therefore, we can conclude that AdaGWL model for the metro network we proposed can estimate Shenzhen metro ridership at any time period more accurately.
5.2.2 Analysis of the local regression of AdaGWL model
1) To see which station a certain variable has the most impact on
Focusing on the result of AdaGWL model, the local regression coefficients’ distribution for each variable of all stations can be plotted as the bubble plots. Take the modelfor estimating average daily ridership of a whole weel as an example. The spatial distribution of local coefficients is shown in Figure 10.
Through understanding the spatial distribution of local coefficients (elasticities), it is possible to know how relations between the variables vary across space (estimated coefficients) and variables selection of all stations. In above figures, the bubble with the bold outline demonstrates the coefficient for the variable in the station equals to 0, and other bubbles’ colours identify the range of coefficients, the bubble with lighter colours mean the coefficient is larger, and vice versa. Take the population factor as an example, stations with large positive coefficients are mainly distributed in the centre, meaning that more trips per capita were expected in the central south area of the metro network, where commerce, administration and education are concentrated. Besides, we can note that for the factors of degree and hospital, there are a large number of stations with zero coefficient, so these factors are not so important factors for influencing most of Shenzhen metro stations’ ridership.
Besides, in order to verify the applicability of AdaGWL model to different time resolutions. We apply the models to estimate the ridership of other time resolutions. With regards to the models for nonrush hour ridership, the spatial distribution of local coefficients is illustrated in Figure 11.
According to Figure 11, we can see that more stations with zero coefficient than those of average daily ridership, and the coefficient matrix is very sparse, indicating that for a single station, main factors influencing the hourly ridership at nonrush hour are not manifold.
2) To see which variable has the biggest impact on a certain station
The heat map is applied to demonstrate the coefficient distribution of all variables and all stations (scaled by column (station) at first). For example, for station “313” (i.e., Tianbei Station), population is the factor that has the biggest positive impact on its ridership compared with other factors. For station “t251” (i.e., Huangbeiling Station), the number of shopping malls in its PCA has the biggest positive impact on its ridership compared with other factors.
Furthermore, in order to present the impacts of different factors on different stations more intuitively, based on the coefficients of (scaled by column (station)) of the model for average daily ridership of a whole week, we conduct clustering (Figure13
) to further illustrate the association among stations, so that we can see which groups of stations that all variables have similar effects on, that is, for a group of stations, some variables have the same great influence on ridership (coefficients are large) and some variables have the same slight influence on ridership (coefficients are small). Kmeans clustering is one of the simplest and popular unsupervised machine learning algorithms. The goal of Kmeans clustering is to find groups in the data, with the number of groups represented by the variable
. Kmeans algorithm works iteratively to assign each data point to one of groups based on the features that are provided. Data points are clustered based on feature similarity. Here, we choose the kmeans clustering method in this study. We determine the number of groups as 4 through minimizing the difference of WithinCluster Sum of Squares (WCSS).According to Figure 13, the four clusters are coloured by four colours. Referring to the distribution of functional zones in Shenzhen ^{8}^{8}8Source: http://www.szgeoinfo.com:5001/msmap/flex/landuse.html , we can see that the stations of cluster 1(purple) are mainly located in employment and educational based land. Stations in cluster 2 (green) are mainly located in commercial land. Stations in cluster 3 (pink) mainly distribute in traffic hubs land, and cluster 4(blue) are mainly located in residential based areas. To analyze the rationality of the categories as shown in the cluster analysis and the functional characteristics of the catergories, the results are summarized in Table 3.
Cluster  Functional zone  Cluster size (Number of stations)  Functional characteristics  Representative stations 

1  Employment and educational based  34  The coefficients of offices and schools landuse are relatively larger than those of other variables, so offices and schools can attract more commuting ridership in this zone. The change of offices and schools landuse will have great impact on the ridership of stations in this cluster.  Shenzhen University; Xili; Hightech Park; Daxin; Qianhaiwan; 
2  Commercial land  33  The coefficients of shopping land use are relatively larger than other variables, indicating shopping land use has great positive impact on these stations. Surrounding the stations in this cluster, commercial, recreational and other land uses are signally larger than other categories land uses, which attracts a large amount of non commuting ridership.  Guomao; Huaqiang Rd; Huaqiang North; Shopping Park; Houhai; Convention&Exhibition Center; 
3  Traffic hubs land  29  The coefficients of degree and bus stations are relatively larger than those of other variables, meaning that the transfer stations and bus stations have great impact on ridership. The stations of this kind are mainly located in urban external  
transportation hub (including railway station, passenger transport station, crossborder checkpoint, and Special Economic Zone Port), the station ridership is mainly derived from the passenger flow of traffic hub.  Luohu; Shenzhen North Station; Shekou Port; Chiwan; Buji; Xixiang  
4  Residential based  22  The coefficients of residential land use are relatively larger than those other variables, indicating the station ridership is mainly affected by the commuting passenger flow from residential areas. stations of this kind are also mainly located in residential based areas (even though the number of residential units surrounding these stations might be not the largest among other kinds of land use).  Baishizhou; Wuhe; Bu Xin; Jixiang; 
By this token, AdaGWL model can offer clear categorizing of functional zones of Shenzhen metro stations belong to, which makes findings more explainable and instructive to metro transportation and urban planning.
Here, we also use heatmap to demonstrate the coefficient distribution of all variables and all stations (scaled by column (station) at first) for nonrush hour ridership (see Figure 14), also, based on the coefficients of stations of the model for nonrush hour ridership, we did clustering to illustrate the association among stations (see Figure 15).
According to Figure 15, stations belong to cluster 1 (green) are mainly located in residential based areas. Stations in cluster 2 (green) are mainly located in commercial land. Stations in cluster 3 (pink) mainly distribute in traffic hubs land, and cluster 4(blue) are mainly located in employment and educational based land. The result is slightly different from that for average daily ridership, indicating different time resolutions will make stations present different characteristics.
6 Conclusion and Impications for Metro Planning and Periphery Development
In summary, this paper proposed an AdaGWL framework for modelling the ridership of metro systems, which considered the network connection intermedia when calculating the distance matrix instead of Euclideanbased distance, and also simultaneously performed regressioncoefficient shrinkage and local model selection. We demonstrated the superiority of our framework through a real world case study of Shenzhen Metro Systems, and meanwhile, we analyzed the influencing factors of Shenzhen metro stationlevel ridership from a local perspective. Different from previous works, the proposed direct demand model based on AdaGWL framework engaged local model selection and networkbased distance to estimate the stationlevel ridership accurately.
The results of the case study showed that AdaGWL model performed the best in terms of goodnessoffit compared with OLS, the ordinary GWR model, AdaGWR calibrated with networkbased distance, and ordinary GWL model. The implementation of the proposed AdaGWL model can provide a theoretical basis for interpreting the influencing factors of stationlevel ridership from a local perspective. First, through understanding the spatial distribution of local coefficients (elasticities), it was possible to know how relations between the variables vary across space (estimated coefficients) and variables selection of all stations, which was helpful to see which station an influencing factor has the most impact on. Second, the coefficient distribution of all variables and all stations made it possible to see which variable has the biggest impact on a given station. Third, through clustering analysis of the stations according to the regression coefficients, clusters’ functional characteristics were found to be in compliance with the facts of the functional land use policy of Shenzhen.
In general, the adapted version of GWL introduced in this paper not only extended the traditional GWR to deal with collinearity issues among regression coefficients and estimation of stationlevel ridership but also inspired metro planning and periphery development from a local perspective. First, Transit Oriented Development (TOD) planning is suggested to adjust measures to local conditions. Here, local conditions mean more than the geographical factors, and they also incorporate different local effects of influencing factors on metro ridership. Second, different functional zones need to adopt different strategies of diverting passenger flows. For example, in commercial developed areas, enhancing security when diverting passenger flows is vital, while improving efficiency is more important when diverting passenger flows in employment based areas. Third, for stations with unsaturated passenger flows, the attraction to passenger flows can be boosted in combination with the functionality of the region. These findings can also inspire the metro planning and periphery development of other cities.
Appendix A Appendix
a.1 Summary of literature review on direct demand models for ridership estimation
The related studies on direct demand models for ridership estimation are summarized as following.
References
 [1] The four step model. [Online]. Available: https://escholarship.org/uc/item/0r75311t
 [2] N. Marshall and B. Grady, “Sketch transit modeling based on 2000 census data,” Transportation Research Record: Journal of the Transportation Research Board, no. 1986, pp. 182–189, 2006.
 [3] J. Gutiérrez, O. D. Cardozo, and J. C. GarcíaPalomares, “Transit ridership forecasting at station level: an approach based on distancedecay weighted regression,” Journal of Transport Geography, vol. 19, no. 6, pp. 1081–1092, 2011.
 [4] O. D. Cardozo, J. C. GarcíaPalomares, and J. Gutiérrez, “Application of geographically weighted regression to the direct forecasting of transit ridership at stationlevel,” Applied Geography, vol. 34, pp. 548–558, 2012.
 [5] J. Choi, Y. J. Lee, T. Kim, and K. Sohn, “An analysis of metro ridership at the stationtostation level in seoul,” Transportation, vol. 39, no. 3, pp. 705–722, 2012.
 [6] R. Cervero, “Alternative approaches to modeling the traveldemand impacts of smart growth,” Journal of the American Planning Association, vol. 72, no. 3, pp. 285–295, 2006.
 [7] M. Kuby, A. Barranda, and C. Upchurch, “Factors influencing lightrail station boardings in the united states,” Transportation Research Part A: Policy and Practice, vol. 38, no. 3, pp. 223–247, 2004.
 [8] X. Chu, “Ridership models at the stop level,” National Center for Transit Research, University of South Florida, Tech. Rep., 2004.
 [9] D. Zhang and X. C. Wang, “Transit ridership estimation with network kriging: a case study of second avenue subway, nyc,” Journal of Transport Geography, vol. 41, pp. 107–115, 2014.
 [10] K. Sohn and H. Shim, “Factors generating boardings at metro stations in the seoul metropolitan area,” Cities, vol. 27, no. 5, pp. 358–368, 2010.
 [11] B. P. Loo, C. Chen, and E. T. Chan, “Railbased transitoriented development: lessons from new york city and hong kong,” Landscape and Urban Planning, vol. 97, no. 3, pp. 202–212, 2010.
 [12] H. Sung and J.T. Oh, “Transitoriented development in a highdensity city: Identifying its association with transit ridership in seoul, korea,” Cities, vol. 28, no. 1, pp. 70–82, 2011.
 [13] G. Thompson, J. Brown, and T. Bhattacharya, “What really matters for increasing transit ridership: Understanding the determinants of transit ridership demand in broward county, florida,” Urban Studies, vol. 49, no. 15, pp. 3327–3345, 2012.
 [14] E. Guerra, R. Cervero, and D. Tischler, “Halfmile circle: does it best represent transit station catchments?” Transportation Research Record: Journal of the Transportation Research Board, no. 2276, pp. 101–109, 2012.
 [15] J. Zhao, W. Deng, Y. Song, and Y. Zhu, “What influences metro station ridership in china? insights from nanjing,” Cities, vol. 35, pp. 114–124, 2013.
 [16] S. Chan and L. MirandaMoreno, “A stationlevel ridership model for the metro network in montreal, quebec,” Canadian Journal of Civil Engineering, vol. 40, no. 3, pp. 254–262, 2013.
 [17] A. Singhal, C. Kamga, and A. Yazici, “Impact of weather on urban transit ridership,” Transportation research part A: policy and practice, vol. 69, pp. 379–391, 2014.
 [18] C. Liu, S. Erdogan, T. Ma, and F. W. Ducca, “How to increase rail ridership in maryland: Direct ridership models for policy guidance,” Journal of Urban Planning and Development, vol. 142, no. 4, p. 04016017, 2016.
 [19] H. Pan, J. Li, Q. Shen, and C. Shi, “What determines rail transit passenger volume? implications for transit oriented development planning,” Transportation Research Part D: Transport and Environment, vol. 57, pp. 52–63, 2017.
 [20] C. Brunsdon, A. S. Fotheringham, and M. E. Charlton, “Geographically weighted regression: a method for exploring spatial nonstationarity,” Geographical analysis, vol. 28, no. 4, pp. 281–298, 1996.
 [21] M.J. Jun, K. Choi, J.E. Jeong, K.H. Kwon, and H.J. Kim, “Land use characteristics of subway catchment areas and their influence on subway ridership in seoul,” Journal of Transport Geography, vol. 48, pp. 30–40, 2015.
 [22] D. C. Wheeler, “Simultaneous coefficient penalization and model selection in geographically weighted regression: the geographically weighted lasso,” Environment and planning A, vol. 41, no. 3, pp. 722–742, 2009.
 [23] D. Wheeler and M. Tiefelsdorf, “Multicollinearity and correlation among local regression coefficients in geographically weighted regression,” Journal of Geographical Systems, vol. 7, no. 2, pp. 161–187, 2005.
 [24] D. C. Wheeler, “Diagnostic tools and a remedial method for collinearity in geographically weighted regression,” Environment and Planning A, vol. 39, no. 10, pp. 2464–2481, 2007.
 [25] G. Walters and R. Cervero, “Forecasting transit demand in a fast growing corridor: The directridership model approach,” Fehrs and Peers Associates, 2003.
 [26] B. D. Taylor, D. Miller, H. Iseki, and C. Fink, “Analyzing the determinants of transit ridership using a twostage least squares regression on a national sample of urbanized areas,” 2003.
 [27] N. Estupiñán and D. A. Rodríguez, “The relationship between urban form and station boardings for bogota’s brt,” Transportation Research Part A: Policy and Practice, vol. 42, no. 2, pp. 296–306, 2008.
 [28] J. Deng and M. Xu, “Characteristics of subway station ridership with surrounding land use: A case study in beijing,” in Transportation Information and Safety (ICTIS), 2015 International Conference on. IEEE, 2015, pp. 330–336.

[29]
J. Li, M. Yao, and Q. Fu, “Forecasting method for urban rail transit ridership at station level using back propagation neural network,”
Discrete Dynamics in Nature and Society, vol. 2016, 2016.  [30] N. Hu, E. F. Legara, K. K. Lee, G. G. Hung, and C. Monterola, “Impacts of land use and amenities on public transport use, urban planning and design,” Land Use Policy, vol. 57, pp. 356–367, 2016.
 [31] L. Guo, Z. Ma, and L. Zhang, “Comparison of bandwidth selection in application of geographically weighted regression: a case study,” Canadian Journal of Forest Research, vol. 38, no. 9, pp. 2526–2534, 2008.
 [32] W. S. Cleveland, “Robust locally weighted regression and smoothing scatterplots,” Publications of the American Statistical Association, vol. 74, no. 368, pp. 829–836, 1979.
 [33] A. S. Fotheringham, C. Brunsdon, and M. Charlton, Geographically Weighted Regression: The Analysis of Spatially Varying Relationships. International Union of Crystallography, 2002.
 [34] K. P. Burnham and D. R. Anderson, Model selection and multimodel inference: A Practical Informationtheoretic Approach. Springer, 2002.
 [35] N. Gauraha, “Introduction to the lasso,” Resonance, vol. 23, no. 4, pp. 439–464, 2018.
 [36] J. I. Efron B, Hastie T and T. R, “Least angle regression,” Annals of Statistics, vol. 32, no. 2, pp. 407–499, 2004.
 [37] D. Kim, P. Elek, and R. Mirjana, Mapping Urbanities: Morphologies, Flows, Possibilities.
 [38] M. J. Panik, Growth curve modeling: Theory and applications. John Wiley and Sons, 2014.
 [39] Environmental Systems Research Institute, Inc. (Esri), “Arcgis.” [Online]. Available: http://desktop.arcgis.com/en/arcmap/
 [40] K. Erciyes, Complex Networks: An Algorithmic Perspective. CRC Press, Inc., 2014.
 [41] R Development Core Team, R: A Language and Environment for Statistical Computing, R Foundation for Statistical Computing, Vienna, Austria, 2009, ISBN 3900051070. [Online]. Available: http://www.Rproject.org
 [42] M. C. C. B. T. N. Binbin Lu, Paul Harris and I. Gollini, GWmodel: GeographicallyWeighted Models, 2018, r package version 2.06. [Online]. Available: https://CRAN.Rproject.org/package=GWmodel
 [43] D. Wheeler, gwrr: Fits geographically weighted regression models with diagnostic tools, 2013, r package version 0.21. [Online]. Available: https://CRAN.Rproject.org/package=gwrr
 [44] J. P. Douglas Nychka, Reinhard Furrer and S. Sain, fields: Tools for Spatial Data, 2018, r package version 9.6. [Online]. Available: https://CRAN.Rproject.org/package=fields
 [45] B. E. Trevor Hastie, lars: Least Angle Regression, Lasso and Forward Stagewise, 2013, r package version 1.2. [Online]. Available: https://CRAN.Rproject.org/package=lars
 [46] G. C. et al., igraph: Network Analysis and Visualization, 2018, r package version 1.2.2. [Online]. Available: https://CRAN.Rproject.org/package=igraph
 [47] P. A. Moran, “Notes on continuous stochastic phenomena,” Biometrika, vol. 37, no. 1/2, pp. 17–23, 1950.
 [48] N. A. C. Cressie, Statistics for Spatial Data. J. Wiley,, 1993.