1 Introduction
Various institutions around the world are increasingly incorporating autonomous vehicle fleets into their oncampus mobility solutions ntu2018autonomous ; linc2018largest ; driverless2018 . Whereas traditional bus services often follow fixed itineraries, autonomous vehicle fleets can have their itineraries dynamically adapted in realtime, e.g., per changing demand for mobility during a work day. In other words, autonomous mobility services are amenable to demandresponsive routing, if demand can be predicted ahead of time.
Fortunately, institutional campuses are often finely meshed with WiFi access points, which can in turn be used as sensors of crowd presence sevtsuk2009mapping ; meneses2012large . By using WiFi information to detect movements of wireless devices across campus, an estimation of current and future demand for mobility can be made. An opportunity thus emerges for realtime, demandresponsive autonomous mobility.
The transport literature is rich with methods for demand prediction and route optimization. Multiple studies on these methods concentrate only on one of the two aspects, leaving the other to external or future works. Moreover, multiple approaches do not account for uncertainty in the demand, and use point estimates instead of more informative full distributions. In this study, however, we offer a framework for demandresponsive routing, which explicitly incorporates both demand prediction and route optimization, while leveraging demand uncertainty for robustness in optimization.
1.1 Overview of Our Solution Framework
The setting in which we apply our framework is that of an autonomous mobility service, for instance, a fleet of autonomous shuttles in a university or hospital. People move in the serviceable area between connected locations, e.g., buildings connected by roads. These movements are captured by a network of sensors, e.g., a network of WiFi hotspots probing people’s wireless devices.
We partition the temporal dimension into consecutive lags (e.g., ), so that for each OD pair, in every lag , the corresponding number of movements follows some latent distribution. This is the marginal distribution for one OD pair at lag , which may be correlated with the marginals of other OD pairs at that lag. Overall, there also exists a latent joint distribution for all OD pairs at lag , which yields the number of simultaneous movements in the serviced area while accounting for all correlations.
The number of movements between each OD pair represents the demand for the mobility service, while the fleet of vehicles represents the supply. Based on the data collected at lag , our online framework both predicts the demand and accordingly optimizes the supply for , as illustrated in Figure 1. For demand prediction, the framework estimates the marginal distributions, which we offer to do through Quantile Regression. For supply optimization, the framework computes the best service routes using a scenariobased method, which samples from an estimated joint distribution, and then solves a linear optimization program each sample. As a key link between demand prediction and supply optimization, our framework constructs a copula, which combines the marginal distributions into a joint distribution.
This predictive optimization framework thus not only caters explicitly for both demand prediction and supply optimization, but also keeps them decoupled thanks to the use of a copula. As the copula preserves all correlations between OD pairs in the joint distribution, we are free to work more easily with marginal distributions during demand prediction. We advocate here for estimating each marginal by estimating several quantiles of its Cumulative Distribution Function. Importantly, our framework retains uncertainty in travel demand (rather than reduce it to point estimates), and takes advantage of this uncertainty during route optimization. Our optimization scheme is robust: instead of predefining a set of scenarios for optimization, we independently sample travel demand vectors from the copula, solve an optimization problem for each vector, and aggregate the solutions into a single optimal result.
1.2 Summary of Contributions
The main contributions of this paper are thus as follows:

We develop a predictive optimization framework for demandresponsive transit network design, which decouples demand prediction from adaptive supply optimization, and joins them through a copula.

On the demand side, we develop a Quantile Regression model for predicting the marginal distribution of movement counts for each OriginDestination pair.

On the supply side, we devise a robust optimization method for demandresponsive transit network design under stochastic demand, based on a route generation procedure and a novel maxutility maxflow formulation.

We demonstrate the capabilities of our predictive optimization framework through a case study based on realworld WiFi data.
1.3 Paper Structure
The rest of this paper is structured as follows. In Section 2, we review literature on demand prediction and supply optimization under uncertainty, and compare our study to previous studies in this research field. In Section 3, we present models for demand prediction with uncertainty, which we offer to do through Quantile Regression. There, we also apply the prediction models to a case study of autonomous mobility in a Danish university campus, based on actual WiFi records. In Section 4, we present a novel method for online, demandresponsive transit network design under stochastic demand. In Section 5, we apply this stochastic optimization method to the aforementioned case study. Finally, in Section 6, we summarize our findings, draw conclusions and identify future research directions.
2 Literature Review
2.1 Demand Prediction with Uncertainty
There are several benefits for preserving uncertainty in predictions, rather than providing only point estimates, such as the predictive mean or standard deviation. Uncertainty in predictions conveys a degree of confidence, so that decisions can then be made more intelligently. For example, when given a full predictive distribution, a supply optimization method can prepare for a full range of possible scenarios, from best case to worst case. In contrast, giving only the predictive mean and standard deviation may be misleading, as e.g., the mean of a multimodal distribution may well lie in a neighborhood of low probability.
Quantile Regression (QR) can be used to approximate a full predictive distribution by estimating several of its quantiles, without assuming any particular parametric form for the distribution. The QR model itself can follow either of several functional forms, such as linear or splines koenker2005quantile , nonlinear or nonparametric with Gaussian Processes antunes2017review ; yang2018power , or vectorvalued sangnier2016joint . As more quantiles are used, the approximation which QR yields becomes more precise and more robust to artifacts in the true predictive distribution, such as multimodality and nonsymmetry.
In this paper, we evaluate the proposed framework through a case study, for which linear QR outperforms other QR models. Nevertheless, the framework supports multiple models other than QR for predicting arbitrary marginal distributions. For a moderately sized dataset as in our case study, Bayesian Inference
peled2019modelcan also be used with proper modeling of random variables and their dependency structure. Alternatively, Deep Neural Network models can be constructed along with prediction intervals, as described in
mazloumi2011prediction and khosravi2011comprehensive .2.2 Transit Route Network Design Problem (TRNDP)
The transit route network design problem (TRNDP) deals with the planning of optimal routes for transit services. Extensive research into TRNDP variants and solution methods has been conducted since the late 1960s. Most studies had different assumptions and simplifications in their models due to the complexity of the problem, which is combinatorial in nature. We refer the readers to Kepaptsoglou2009sw
for an extensive review of previous studies on TRNDP up to year 2007. The author classified the studies according to their objective functions, decision variables, transit network structure, demand patterns and characteristics, as well as the solution methods. Here we shall focus on two aspects relevant to our study: the mathematical programming formulation for TRNDP, and demands elasticity and stochasticity.
2.2.1 Mathematical programming formulation for TRNDP
The transit route planning process typically consists of five steps, as outlined in Ceder1986lk . The process is sequential, in that the decisions at each step become the input for the next step. The process starts with a network design problem, where the bus stops and routes are decided. Then the frequencies are determined based on the fleet size, followed by timetabling and scheduling, and finally the scheduling of drivers. However, it is clear that while the solutions obtained at each step of the process are optimal, they may not necessarily be optimal overall. Despite the temptation to formulate a single model to globally optimize the bus planning procedure, one must note the complexity involved, as the problem at each step of the process is highly combinatorial. Here we shall review the steps relevant to our study: network design, and frequency determination.
For network design, TRNDP instances are usually formulated based on a network graph, where the nodes represent transit stops, and edges represent connective paths between nodes. The objective is then to select which transit stops to serve, and the order of visiting them in each of the routes, based on travel demands and generalized costs. A solution method for network design gives the optimal routes and their corresponding temporal route length.
The frequency determination problem, on the other hand, is to optimally allocate vehicles to the different routes. This allocation largely depends on the temporal route lengths. The integration of the two problems can increase the complexity substantially, and often results either in a single highly nonlinear model Cipriani2012zm ; Fan2008lg ; Fan_Wei2006mj , or a bilevel mixed integer model Szeto2011qj ; Szeto2014pn .
2.2.2 Demands elasticity and stochasticity
While some previous studies have considered demand elasticity, which is an inherent property of a real transit network, most studies have only considered fixed demand. In Fan_Wei2006mj , Fan and Machemehl have attributed this to the NPhard complexity of TRNDP. The consideration of elastic demands often results in an iterative procedure, which repeatedly chooses routes structure and demand splits until some convergence criterion is achieved Cipriani2012yr ; Lee_YoungJae2005on ; Fan_Wei2006lc
. However, this would mean that no optimal solution can be guaranteed due to the heuristic nature of the problem. Moreover, in this method, demands are not considered as directly dependent on the service quality. Instead, the elasticity is internalized in demands stochasticity alongside other factors that affect demand stochasticity, e.g., weather and seasonal variations.
One common method of dealing with demands stochasticity is by doing robust optimization. Robust optimization comes in several forms, including minimax robust optimization An2015tv ; Laporte2010ud ; Lou2009et ; An2016bx , and scenariobased robust optimization. While minimax robust optimization guarantees feasibility for all possible demand scenarios, it can sometimes be overly conservative as acknowledged in An2015tv
. Also, the method does not make good use of the information embedded in the distribution of demands data, because taking the worstcase scenario essentially reduces the probability distribution to a single point value. In contrast, we optimize routes based on quantile estimates of the true demand distribution, and thus obtain routes which are optimal in expectation.
Another commonly used method to improve robustness is scenariobased robust optimization. Stochasticity is then generated either by adding random perturbations to the average demands, by explicitly constructing demand scenarios for different seasons and/or time of the day, or by random sampling of parameters from their probability densities. In Amiripour_S_M_Mahdi2014om Amiripour et al. consider TRNDP with variable demands. To simulate demand stochasticity, they generate perturbations of a demand matrix, each by adding stochastic noise. While the method does add robustness to the solution by introducing random noises to the demands, the random perturbations may not necessarily reflect the stochasticity that can be observed in collected demand data. In contrast, our framework generates demand stochasticity not through random permutation, but by estimating the true predictive distribution itself. As more data is collected over time, the framework updates this estimation of the true joint distribution, so that its samples from the distribution conform to updated stochastic information.
Another paper by Amiripour et al. Amiripour2014rp studied the design of a largescale bus network with seasonal demand variation. They propose a scenariobased route optimization method based on metaheuristic methods. Their mathematical programming model yields an optimal route for a whole year by enumerating scenarios for each season, and minimizing the weighted sum of the objective for each season. This study bears some similarities to our paper, but one major difference is that they assume fixed demands, whereas we consider demands to be dynamic and stochastic.
2.3 Combining Demand Prediction with Supply Optimization
A seminal example of combining demand prediction with supply optimization is found in Stein1978DAR (1978). There, the author studies itinerary optimization for DialaRide services, and assumes that users’ requests are independently drawn from a uniform spatial distribution over the serviced area. This assumption is overly simplifying for a shuttle service on a university campus, where student movements are often spatiotemporally correlated and unevenly distributed.
More recent combined methods can be traced back to cortes2009hybrid (2009), who use a hybrid statespace model for both demand estimation and route optimization. The authors deal with deliveryandpickup with predetermined stations, and combine both solution components within the formulation of the objective function. As noted in ichoua2006exploiting , such coupling is also present in a line of precedent paper by Powell and his team (powell1988comparative ; powell1996stochastic ; godfrey2002adaptive ; powell2003stochastic ; spivey2004dynamic ). Contrary to cortes2009hybrid , our approach decouples demand estimation from supply optimization, so that each can be treated separately, and joins them through a copula (as defined later in Sec. 4.2). The use of copula further allows us to first derive marginal demand distributions for each OD pair, rather than estimate an entire joint distribution at once. Similarly to cortes2009hybrid , our approach is nonmyopic, in that it considers multiple future scenarios for improving user gains.
In 2013, Ferrucci et al. ferrucci2013pro
presented another approach to demandresponsive supply optimization, whereby the service area is segmented, and each segment is assumed to follow a Poisson distribution with timedependent rate. In contrast, our framework does not presume a particular form of demand distribution, and can rather approximate it through e.g., Quantile Regression (QR, as defined in Sec.
3.2).In 2018, Iglesias et al. iglesias2018data
presented a datadriven framework to rebalance an autonomous ondemand fleet. For demand prediction, they propose only a Deep Neural Network model with LongShort Term Memory (LSTM), which is not interpretable and does not yield uncertainty estimates. In contrast, we compare multiple baseline models, and offer an interpretable linear QR model for demand prediction with uncertainty estimates.
Let us conclude the literature review with two recent (2017) investigations that bear particular resemblance to ours. The first is by AlonsoMora et al. alonso2017predictive , where the authors study urbanscale route optimization for a hypothetical fleet of selfdriving taxis. Similarly to our approach, they too leverage historical data for realtime demand prediction, and construct a marginal probability distribution for each OD pair. Nevertheless, the method of alonso2017predictive relies on the frequentist approach, which can be beneficial for large datasets, but fails to retain enough uncertainty for smaller datasets as in our case.
It should be noted that as in alonso2017predictive , our predictive optimization framework is also amenable to parallelization. On the demand prediction side, the marginal distributions for each OD pair can be obtained in parallel if independent models are used. On the supply optimization side, the linear optimization program can be solved in parallel for each independent sample from the joint distribution of demand.
The second study which particularly resembles ours is by Miller et al. miller2017predictive , where the authors too predictively optimize a small fleet of oncampus autonomous shuttles, with the objective of minimizing expected customer waiting time. However, whereas they focus on predicting key locations for proactive positioning of the vehicles, our objective is to predictively optimize complete vehicle routes. Also, their sources of demand information are users’ requests via a dedicated smartphone app, and sensors that the few vehicle carry around. In contrast, our source of demand information is a network of hundreds of WiFi hotspots, fixed all over campus, which thus provide high spatiotemporal observability. Finally, whereas our framework offers online predictive optimization, miller2017predictive identify key positions offline and leave online demand prediction to future work.
3 Demand Prediction through Quantile Regression
In this Section, we describe how Quantile Regression (QR) can be applied to demand prediction with uncertainty. To facilitate the explanation, we conduct a case study of predictive route optimization for an autonomous shuttle service, which is planned to start operating in the Lyngby campus of the Technological University of Denmark (DTU) in 2019. For this autonomous mobility project, we obtained a dataset of WiFi records, collected from various buildings on the campus for several weeks in 2017 and 2018. We also use this case study later on to describe and test our stochastic route optimization method.
3.1 Data for Demand Estimation
The data in this case study consists of crowd movements between different locations in DTU Lyngby campus, illustrated in Figure 1(a). In each location, WiFi access points probe for wireless devices, and we are given hourly aggregated counts of wireless idevices that change location. The aggregated counts contain no information about individual devices. We use the counts of wireless devices as a proxy for counts of people on campus, and assume that approximation errors (e.g., due to uncovered locations or varying number of devices per person) amount to systematic noise in the observations.
And so, for each of OD pairs, we have a time series of hourly aggregated movements. Each time series ranges from 17Nov2017 00:00 to 14Jan2018 23:59, and we remove all data in 23:0006:59, during which there are nearly no movements on campus (this may also be a maintenance window for the autonomous shuttles). Before fitting prediction models, we partition the lags as 17Nov20177Jan2018 ( days), and 8Jan201814Jan2018 ( days). Figure 1(b) illustrates the various time periods.
3.2 Quantile Regression
For each OD pair, there exists a latent marginal distribution of number of movements at lag . We estimate the density of this distribution through quantiles of its Cumulative Distribution Function (CDF): for any , the ’th quantile is the smallest real that the CDF maps to .
And so, for each and each OD pair, we use Quantile Regression (QR) to estimate , the ’th quantile of the number of observed movements at lag , . The approximated marginal distribution is then defined as following:
(1) 
where is the estimated ’th quantile at lag . To illustrate, Figure 3 shows an example of for one of our QR models, which we define later.
Before fitting models, we transform the time series for each OD pair in two steps. First, because is nonstationary, we difference it as , which is stationary in for all OD pairs, as we verify through augmented DickeyFuller test with . Second, we remove from the Christmas holiday period: [23Dec2017, , 1Jan2018], during which there were nearly no movements on campus. Using the transformed time series, we train each QR model on (without Christmas), and then evaluate its predictive performance on , by calculating the following measures for each OD pair:

Mean Titled Loss:
where .

Percent of measured values that fall within the quantile range:

Mean length of quantile range:

Number of pairwise quantile crossings:
A QR prediction model is better if it achieves lower total over all OD pairs. The other measures are averaged for each model over all OD pairs, and we use them as additional indicators of performance quality: it is preferred to bring closer to while simultaneously obtaining lower mean , and it is also preferred to obtain fewer quantile crossings.
For the remainder of this Section, we define the QR prediction models in our experiments. Table 1 summarizes the overall predictive performance of all QR models, and highlights the best performing model.
Model  Total  Mean  Mean  Mean Crossings 

w. Seasonality  
w/o Seasonality  459.428  
w. Sorting  
w. Exams  
w/o first week  
FC Linear, Common LR  
FC Linear, LR per OD  
FC Linear, LR per OD and  
, Common Params  
, Params per OD  
, Params per OD and  
All ODs Together  
similar to  
All Quantiles Together  
Parameter Sharing  
Common Params  
Params per 
3.2.1 Modeling by Historical Percentiles
For any lag , let denote the timeofday of , and let denote the dayofweek of , so that . For any OD pair, let denote the observed number of movements, and let denote the modeled ’th quantile.
Our first model is a naive Historical Percentiles model , in which for each OD pair and each , is the ’th percentile of
Hereafter, the superscript denotes that we fit the prediction model independently for each and each OD pair.
As Table 1 shows, achieves far from , which indicates that the underlying time series is not a simple repetition of historical patterns. Moreover, if instead of we fit on the ’th and ’th quantiles – namely the minima and maxima in the train set – we obtain mean only , which is far from . This means that for some OD pairs, there are movement counts in the test set that lie outside of the range of values in the train set, which confirms that historical percentiles are not good predictors for this data.
3.2.2 Linear Quantile Regression
We proceed to more flexible Linear Quantile Regression (LQR) models, so that for each OD pair and independently:
(2) 
where is iff , is iff , and
’s are parameters to be estimated. Here too, as for all other QR prediction models, the loss function to be minimized is the total
. We train all LQR models via Iterative Weighted Least Squares with an Epanechnikov kernel and HallSheather bandwidth selection hall1988distribution , and clip any negative predictions to zero.Our first linear model achieves better total and significantly better mean than . Next, we attempt to further improve performance by removing seasonality from the data. That is, we transform into , where and are, respectively, the mean and standard deviation of . However, performs worse on than does on .
We thus proceed to run LQR again on , and this time eliminate quantile crossings by sorting in ascending order for every . This results in a better performing model , and as an example, Figure 3 illustrates its predictions for one OD pair.
During , exams took place on campus. Hence we next add to each data vector a binary feature , which indicates whether the data is in the exam period, namely:
(3) 
The corresponding model is , which we train with quantile sorting. yields nearly the same total as does , and mean noticeably closer to , with a small increase in mean . We thus designate as the best performing model so far.
3.2.3 Deep Quantile Regression
Whereas LQR models are parametric and linear, Deep Neural Network (DNN) models provide a nonparametric approximation of the true predictive distribution. As a baseline, we wish to first obtain similar performance for as for , hence our first deep model is as in Figure 4
: a fullyconnected, feedforward neural network, where a single linear unit combines all input features. While training
, we use the first week of as a validation set for early stop. Consequently, we aim for to first perform as well as , for which we have omitted the first week from the train set too.Hyperparameter selection can strongly affect the performance of DNN, hence before training, we use Bayesian Optimization to tune a learning rate in . To this end, we partition into three subsets: the first in , the following in , and the remaining days in . In every iteration, the optimizer tries a different learning rate to train the DNN on , while using for early stop, and then obtains the total of the trained DNN on . The optimal learning rate is the one which minimizes this total after at most iterations. Note that while the same learning rate may be common to several DNN models, we still fit each of them independently for each and each OD pair.
First, we optimize a single learning rate, common to all OD pairs and all . The optimization process is described in Figure 5, and the optimal common learning rate turns out to be . Nevertheless, this learning rate yields worse performance than , as shown in Table 1. We thus next optimize the learning rate independently for each OD pair, and obtain a distribution of learning rates as in Figure 4(b). This results in a little improvement over the common learning rate, but performance is still noticeably worse than . Finally, we optimize the learning rate independently for every combination of OD and , as summarized in Figure 6(a), but gain no performance improvement.
In summation, deep QR did not perform as well as linear QR, despite learning rate optimization. The dataset is thus not large enough for effective deep learning with backpropagation, hence we do not attempt to further improve performance, e.g., using kernel regularization, dropout, or recurrent units. As an alternative attempt at nonparametric QR, we next experiment with Gradient Boosting.
3.2.4 Gradient Boosting
Gradient Boosting (GBoost) friedman2002stochastic builds an ensemble of regression trees incrementally using gradient descent. As in previous prediction models, the loss function being boosted is total . Figure 6 shows an example of for one OD pair and a single .
As for , we partition as , and use Bayesian Optimization to find the best hyperparameters for : learning rate in , maximum tree depth in , and maximum number of trees in . Figure 6(b) shows that for all , the optimal learning for follows a similar pattern of distributions as for (Figure 6(a)
), albeit with greater variance for
. Nevertheless, Table 1 shows that contrary to , the performance of improves when optimization is carried out separately for each OD and . Compared to , the best performing has similar total and mean , but worse mean .3.3 Modeling SpatioTemporal Dependencies
For every type of prediction model above (LQR, DNN, GBoost), we have built an independent model for each of the combinations of OD pair and . Let us now examine models which process together multiple OD pairs or multiple quantiles, and so may be able to exploit spatiotemporal dependencies in the data. Indeed, Figure 8 shows that for some couples of OD pairs, the time series of movements are strongly correlated.
3.3.1 Linear Quantile Regression on Multiple Time Series
We begin with a linear QR model which processes all OD pairs together, independently for each , as:
(4) 
where for an arbitrary, prefixed ordering of the OD pairs, is binary and indicates whether the data corresponds to the ’th OD pair. We train and test using the same methods as for . Table 1 summarizes the performance of , which is seen to be significantly worse than .
Next, we try linear model , where for each OD pair independently:
(5) 
where is a hyperparameter, and is the ’th past lag of the ’th OD pair. is therefore similar to , namely Vector Autoregression of order
, where each of several response variables is modeled on the last
lags of all response variables. We have experimented with , and noticed that performance deteriorated as increased. Table 1 summarizes the performance of for , which is seen to perform better than , but still worse than .3.3.2 Multivariate Deep Quantile Regression
Next, we try multivariate variants of the DNN models which we built earlier. These variants process multiple quantiles together, and so take less time to train and optimize than the earlier univariate, independent models.
Our first deep multivariate model is as illustrated in Figure 8(a), such that for each OD pair independently, all quantiles are modeled together. We again partition as , and use Bayesian Optimization to find the best learning rate in , common to all OD pairs, which turns out to be . The performance of in Table 1 is seen to be worse than , but better than the univariate models.
Modeling multiple quantiles together can thus result in improved performance. We next try to further improve performance by introducing a hidden layer between the input and the output layers, as in Figure 8(b), so that different quantiles share some trainable parameters. Table 1 shows that this model, which we name , performs badly.
3.4 Gradient Boosting on Multiple Time Series
Finally, we fit Gradient Boosting (GBoost) models on the time series of all OD pairs together, independently for each . As in Section 3.2.4, we use Bayesian Optimization to find the best hyperparameters, and summarize results in Table 1.
First, we optimize a single set of hyperparameters, common to all . This results in model , which outperforms , our previously best performing prediction model. That is, compared to , achieves better total and significantly better mean , with a small decrease in mean .
Next, we optimize hyperparameters independently for each . The resulting model achieves similar total and better mean vs. , but does not improve over . In summation, yields the best prediction quality among all QR models.
3.5 Conclusions on Demand Prediction with Quantile Regression
We have experimented with four types of QR models for predicting distributions of future demand: naive Historical Percentiles, Linear QR, Deep QR, and Gradient Boosting QR. Our experiments include independent models for each OD pair and each , as well as multivariate models which simultaneously process multiple OD pairs or multiple quantiles. Table 1 summarizes the experiment results.
Historical Percentiles performs rather poorly, which indicates that the data at hand does not follow a simple repeating pattern. For Linear and Deep QR, independent models mostly yield better prediction quality than multivariate models. For Gradient Boosting QR, however, multivariate models outperform independent models, such that is the overall best performing prediction model.
In conclusion, model can exploit spatiotemporal dependencies in the data to some extent. Indeed, Figure 8 shows that for some OD pairs, the time series of movement counts are strongly correlated.
4 Supply Optimization
In this Section, we describe in detail the supply optimization part of our predictive stochastic optimization framework. First, we discuss usual methods for dealing with stochastic linear programs, then present our approach to it. Then, we look at the methods used to achieve predicitive optimization, and finally we apply this framework on the test case of DTU Lyngby, for which we have already obtained marginal demand distributions in the previous section.
4.1 Stochastic Transit Network Design Problem
The theoretical setting we consider is that of a transit network, represented by a directed graph , where is the set of nodes, and is the set of edges (links). The set of nodes consists of a set of demand nodes where movements originate and arrive, as well as a set of bus stop nodes . The Transit Route Network Design Problem (TRNDP) has the following objective: find an optimal transit route in along with an optimal frequency setting for the transit service.
In its classic form, TRNDP is usually formulated as a static linear program with parameters that take on deterministic values. In practice, however, parameters are often not static but stochastic, so that they follow some probability distributions. Approaches for dealing with such stochasticity commonly reduce the distributions to point estimates. One such approach is to use the expected values of parameter distributions. While the expected value formulation may be simple to obtain, its solution nonetheless lacks robustness. To overcome this weakness, maximin robust optimization can be used, whereby the maximum of the support () of the parameters is taken instead of their expected values (). Doing so ensures that the solution is feasible for all possible combinations of parameters. This approach is often called the worstcase scenario approach, because the solution space encapsulates all possible combinations, including the worstcase scenario.
Nevertheless, there are drawbacks to both approaches for dealing with stochasticity in parameters. First, by reducing the parameter distributions to point values, both approaches discard of useful information in the full distributions. Second, while the worstcase scenario approach yields robust optimization, it can sometimes be overly conservative, and so miss potentially optimal solutions. Therefore, we next devise a robust solution method for stochastic TRNDP, which takes advantage of the full predictive distributions of parameters. As (stochastic) TRNDP is an NPhard combinatorial problem, solution methods for TRNDP are often approximate, and so is our method.
Any instance of a static linear program is uniquely defined by its parameters. For a stochastic linear program, we can generate a static instance by setting each parameter as a sample from its distribution. Consequently, our solution method for stochastic TRNDP generates multiple instances by independent sampling, and solves each instance separately. Each solution is an optimal route for the corresponding instance, and different instances may yield the same solution. We rank these routes by their frequency of occurrence in the set of independent solutions, and choose the route with highest frequency as the overall optimal solution.
4.2 Sampling from the Joint Demand Distribution
As discussed in the previous subsection, to obtain multiple instances of the stochastic TRNDP problem, we need to be able to jointly sample from the marginal distributions of the parameters. In order to achieve joint sampling of the distributions, we require a function that pieces together the individual distributions to form a joint distribution. The function, in essence, retains the correlation structure between the different parameters, and provides us with a mean of obtaining joint samples. In this Section, we will introduce the readers to Gaussian copula, and how it is used in our predictive stochastic optimization framework.
To sample from the joint distribution of demand over all OD pairs, we first construct a Gaussian copula, defined as follows. Let be the marginal cumulative distribution functions (CDFs) of demand parameters for each OD pair , respectively. The random vector is distributed per the joint cumulative distribution function (CDF) of its components:
(6) 
If each of is continuous, then by Sklar’s theorem, is uniquely defined as a function of these marginal CDFs, namely
(7) 
is then called a copula, and by using this formulation, we decouple the correlation structure in from the individual marginals. A commonly used type of copula is the Gaussian copula,
(8) 
where
is the joint CDF of the zeromean multivariate normal distribution with covariance matrix
.We can use the Gaussian copula to obtain samples from the joint distribution of . To obtain such a sample, we first draw from (Eq. 8). Then, using the multivariate normal CDF , we transform into . Finally, we map to the desired sample through the inverses of the marginal CDFs, as
(9) 
4.3 Covariance Matrix for the Copula
In this study, we assume that only travel demands are stochastic, while other parameters, such as travel time between nodes, remain constant over time. To construct the copula, we assume for simplicity that the correlation structure of the data is timeinvariant. As such, it suffices to compute the covariance matrix of the copula offline once, based on historical data. We note, however, that this computation can be efficiently maintained online as new data becomes known, so that the copula retains the updated state of correlation.
To construct the covariance matrix for the Gaussian copula in our framework, we use the observed movement counts in , as following. First, for the demand nodes in the TRNDP graph , we calculate for each OD pair its empirical marginal CDF , based on all its historically observed movement counts in . For each , we then collect the corresponding observations of all OD pairs into a travel demand vector of size ,
(10) 
where the travel demand from any demand node to itself, i.e. for , is . Next, we transform elementwise into the Gaussian layer as follows:
(11) 
where is the standard univariate normal CDF.
Finally, we collect the transformed vectors for all into a matrix , and calculate the covariance matrix in the transformed space as:
(12) 
where is given by the columnwise expectation of the matrix . is then the covariance matrix of the Gaussian copula
4.3.1 Route generation
In a transit route network design problem (TRNDP), the network structure is not usually explicitly defined, which gives rise to a huge number of possibilities for routes on the network. However, a sizable proportion of these routes may not be desirable. Therefore, we first find a set of candidate routes based on several criteria via a route generation procure.
First, a set of candidate bus stops is established from all bus stops in . The candidate set of stops are constructed considering door to door accessibility as well as the connectivity to the public transport services beyond the campus. Accordingly, the candidate bus stops include the feasible stop locations in the vicinity of demand hotspots, and the existing bus stops that are already being served by conventional bus lines. These candidate bus stops are indexed arbitrarily with , where is the number of candidate bus stops. These indices remain the same throughout in our formulation.
After determining the set of candidate stops, the shortest path for buses between each pair of candidate stops is obtained. The routes were then constructed by incrementally adding candidate bus stops to the end of the routes. The stops are connected to the adjacent bus stops by the shortest bus path. The quality of each route is evaluated before another candidate bus stop is added to the route. We consider a route to be desirable if the route has little overlapping path, and a route length between 0.75 and 1.5 kilometres. To determine how much the route paths are overlapping on each other, we plot the route paths on a twodimensional discretized space (see Figure 11). The fitness of a route is then given by the proportion of grids with multiple route paths passing through them. Routes with more than 25 percent of the paths overlapping on each other are discarded, and the procedure repeats by adding another candidate bus stop to the end of the route until all the bus stops have been exhausted or that the route length has exceeded the desired range. After the routes have been generated, the temporal route lengths can be precalculated, assuming a constant velocity of travel throughout.
In this route generation procedure, circular routes, i.e. routes that have the same departure and arrival station, are permitted. However, using the procedure described above, one would get multiple distinct routes which represent the same route albeit the difference in the departure and arrival station as shown in Figure 12. Thus, we circumvent the problem by limiting the departure and arrival station of any circular route to the station with the smallest index.
This route generation procedure effectively eliminates the decision variables for selecting bus stops for each route, and also for calculating the temporal route length, based on the selected stops, in order to determine the frequency of the service. Instead, they are replaced with binary decision variables, one for each routecandidate, to select routes form the set of candidates.
4.4 TRNDP as a maxutility maxflow problem
As discussed in Section 2.2.1, TRNDP is usually solved independent of the frequency determination problem. However, some authors formulated either a single highly nonlinear problem, or a bilevel mixed integer program to solve the two problems simultaneously. In these cases, the problems became very complex and rely on heuristics to solve for suboptimal solutions.
In this Section, we shall formally describe the formulation of a maxutility maxflow problem to solve the TRNDP and frequency determination problem simultaneously. The objective of the problem is to maximize the total time savings (utility) and the flow of demands from the source nodes to a dummy sink node node . The source nodes are made up of OD nodes, each representing an OD pair. Then we have nodes which represent the candidate routes, and a special node to represent the ”walking route”. In essence, if none of the routes have enough capacity or does not help to reduce travel time for demands of any OD pair, the demands will flow to node , which indicates that the passengers travelling between OD will have to walk from its origin to the destination. Finally, we also have nodes that indicates the number of buses assigned to the candidate routes.
An illustration of the formulation can be found in Figure 13. The maxutility maxflow formulation can be interpreted as a twostage process as shown in the figure. In stage one, the demands for all OD pairs are assigned to multiple candidate routes. Then in stage two, the number of buses allocated to the routes are determined.
The notations that are used in the formulation that follows are given in the list below.
Notation  Definition 
Indices  
Origin node index.  
Destination node index.  
Candidate route index.  
Number of buses allocated index.  
Parameters  
Utility for the transit assignment stage, given by the time savings from taking a ride on candidate route compared to walking directly from origin to destination , for each passenger traveling from origin to destination .  
Utility for the buses allocation stage, given by the negative of the average waiting time for candidate route , with buses allocated to the route.  
Temporal route length of candidate route in minutes.  
Total travel demand from origin to destination .  
Walk time between origin and destination .  
Ride time on candidate route for passengers traveling from origin to destination .  
Total walk time from origin to boarding node on candidate route , and walk time from alighting node on candidate route to destination .  
Maximum capacity of each shuttle bus.  
Number of routes permitted.  
Number of origindestination pairs.  
Number of candidate routes.  
Fleet size.  
Decision variables  
Demands flowing from node to node .  
Demands flowing from node to node  
Equals to 1 if buses are allocated to candidate route .  
Frequency of candidate route when allocated buses. 
In this formulation, we maximize the total time savings of all passengers as such:
(13) 
where and represent the edge costs for stages one and two respectively. In stage one, the edge costs are given by the time savings per passenger, namely:
(14) 
Note that does not take into account the wait time at the bus stop, which will be taken care of in . is defined as the average waiting time for candidate route , given that the route is allocated buses, which is given by the following:
Comments
There are no comments yet.