Online Framework for Demand-Responsive Stochastic Route Optimization

by   Inon Peled, et al.
Nanyang Technological University

This study develops an online predictive optimization framework for operating a fleet of autonomous vehicles to enhance mobility in an area, where there exists a latent spatio-temporal distribution of demand for commuting between locations. The proposed framework integrates demand prediction and supply optimization in the network design problem. For demand prediction, our framework estimates a marginal demand distribution for each Origin-Destination pair of locations through Quantile Regression, using counts of crowd movements as a proxy for demand. The framework then combines these marginals into a joint demand distribution by constructing a Gaussian copula, which captures the structure of correlation between different Origin-Destination pairs. For supply optimization, we devise a demand-responsive service, based on linear programming, in which route structure and frequency vary according to the predicted demand. We evaluate our framework using a dataset of movement counts, aggregated from WiFi records of a university campus in Denmark, and the results show that our framework outperforms conventional methods for route optimization, which do not utilize the full predictive distribution.



There are no comments yet.



Predicting origin-destination ride-sourcing demand with a spatio-temporal encoder-decoder residual multi-graph convolutional network

With the rapid development of mobile-internet technologies, on-demand ri...

Contextualized Spatial-Temporal Network for Taxi Origin-Destination Demand Prediction

Taxi demand prediction has recently attracted increasing research intere...

Spatio-Temporal Analysis of On Demand Transit: A Case Study of Belleville, Canada

The rapid increase in the cyber-physical nature of transportation, avail...

Estimating Latent Demand of Shared Mobility through Censored Gaussian Processes

Transport demand is highly dependent on supply, especially for shared tr...

Modeling Censored Mobility Demand through Quantile Regression Neural Networks

Shared mobility services require accurate demand models for effective se...

A simulation sandbox to compare fixed-route, flexible-route transit, and on-demand microtransit system designs

With advances in emerging technologies, options for operating public tra...
This week in AI

Get the week's most popular data science and artificial intelligence research sent straight to your inbox every Saturday.

1 Introduction

Various institutions around the world are increasingly incorporating autonomous vehicle fleets into their on-campus mobility solutions ntu2018autonomous ; linc2018largest ; driverless2018 . Whereas traditional bus services often follow fixed itineraries, autonomous vehicle fleets can have their itineraries dynamically adapted in real-time, e.g., per changing demand for mobility during a work day. In other words, autonomous mobility services are amenable to demand-responsive routing, if demand can be predicted ahead of time.

Fortunately, institutional campuses are often finely meshed with WiFi access points, which can in turn be used as sensors of crowd presence sevtsuk2009mapping ; meneses2012large . By using WiFi information to detect movements of wireless devices across campus, an estimation of current and future demand for mobility can be made. An opportunity thus emerges for real-time, demand-responsive autonomous mobility.

The transport literature is rich with methods for demand prediction and route optimization. Multiple studies on these methods concentrate only on one of the two aspects, leaving the other to external or future works. Moreover, multiple approaches do not account for uncertainty in the demand, and use point estimates instead of more informative full distributions. In this study, however, we offer a framework for demand-responsive routing, which explicitly incorporates both demand prediction and route optimization, while leveraging demand uncertainty for robustness in optimization.

1.1 Overview of Our Solution Framework

Figure 1:

Our online predictive optimization framework. (a) Data about crowd movements is collected via WiFi probing. (b) The marginal distribution of each OD pair is estimated through Quantile Regression. (c) A copula combines the marginal distributions into a joint demand distribution. (d) Samples are drawn from the joint distribution. (e) An optimal route is computed for each sample. (f) The most frequently computed optimal route is selected.

The setting in which we apply our framework is that of an autonomous mobility service, for instance, a fleet of autonomous shuttles in a university or hospital. People move in the serviceable area between connected locations, e.g., buildings connected by roads. These movements are captured by a network of sensors, e.g., a network of WiFi hotspots probing people’s wireless devices.

We partition the temporal dimension into consecutive lags (e.g., ), so that for each OD pair, in every lag , the corresponding number of movements follows some latent distribution. This is the marginal distribution for one OD pair at lag , which may be correlated with the marginals of other OD pairs at that lag. Overall, there also exists a latent joint distribution for all OD pairs at lag , which yields the number of simultaneous movements in the serviced area while accounting for all correlations.

The number of movements between each OD pair represents the demand for the mobility service, while the fleet of vehicles represents the supply. Based on the data collected at lag , our online framework both predicts the demand and accordingly optimizes the supply for , as illustrated in Figure 1. For demand prediction, the framework estimates the marginal distributions, which we offer to do through Quantile Regression. For supply optimization, the framework computes the best service routes using a scenario-based method, which samples from an estimated joint distribution, and then solves a linear optimization program each sample. As a key link between demand prediction and supply optimization, our framework constructs a copula, which combines the marginal distributions into a joint distribution.

This predictive optimization framework thus not only caters explicitly for both demand prediction and supply optimization, but also keeps them decoupled thanks to the use of a copula. As the copula preserves all correlations between OD pairs in the joint distribution, we are free to work more easily with marginal distributions during demand prediction. We advocate here for estimating each marginal by estimating several quantiles of its Cumulative Distribution Function. Importantly, our framework retains uncertainty in travel demand (rather than reduce it to point estimates), and takes advantage of this uncertainty during route optimization. Our optimization scheme is robust: instead of pre-defining a set of scenarios for optimization, we independently sample travel demand vectors from the copula, solve an optimization problem for each vector, and aggregate the solutions into a single optimal result.

1.2 Summary of Contributions

The main contributions of this paper are thus as follows:

  • We develop a predictive optimization framework for demand-responsive transit network design, which decouples demand prediction from adaptive supply optimization, and joins them through a copula.

  • On the demand side, we develop a Quantile Regression model for predicting the marginal distribution of movement counts for each Origin-Destination pair.

  • On the supply side, we devise a robust optimization method for demand-responsive transit network design under stochastic demand, based on a route generation procedure and a novel max-utility max-flow formulation.

  • We demonstrate the capabilities of our predictive optimization framework through a case study based on real-world WiFi data.

1.3 Paper Structure

The rest of this paper is structured as follows. In Section 2, we review literature on demand prediction and supply optimization under uncertainty, and compare our study to previous studies in this research field. In Section 3, we present models for demand prediction with uncertainty, which we offer to do through Quantile Regression. There, we also apply the prediction models to a case study of autonomous mobility in a Danish university campus, based on actual WiFi records. In Section 4, we present a novel method for online, demand-responsive transit network design under stochastic demand. In Section 5, we apply this stochastic optimization method to the aforementioned case study. Finally, in Section 6, we summarize our findings, draw conclusions and identify future research directions.

2 Literature Review

2.1 Demand Prediction with Uncertainty

There are several benefits for preserving uncertainty in predictions, rather than providing only point estimates, such as the predictive mean or standard deviation. Uncertainty in predictions conveys a degree of confidence, so that decisions can then be made more intelligently. For example, when given a full predictive distribution, a supply optimization method can prepare for a full range of possible scenarios, from best case to worst case. In contrast, giving only the predictive mean and standard deviation may be misleading, as e.g., the mean of a multi-modal distribution may well lie in a neighborhood of low probability.

Quantile Regression (QR) can be used to approximate a full predictive distribution by estimating several of its quantiles, without assuming any particular parametric form for the distribution. The QR model itself can follow either of several functional forms, such as linear or splines koenker2005quantile , non-linear or non-parametric with Gaussian Processes antunes2017review ; yang2018power , or vector-valued sangnier2016joint . As more quantiles are used, the approximation which QR yields becomes more precise and more robust to artifacts in the true predictive distribution, such as multi-modality and non-symmetry.

In this paper, we evaluate the proposed framework through a case study, for which linear QR outperforms other QR models. Nevertheless, the framework supports multiple models other than QR for predicting arbitrary marginal distributions. For a moderately sized dataset as in our case study, Bayesian Inference


can also be used with proper modeling of random variables and their dependency structure. Alternatively, Deep Neural Network models can be constructed along with prediction intervals, as described in

mazloumi2011prediction and khosravi2011comprehensive .

2.2 Transit Route Network Design Problem (TRNDP)

The transit route network design problem (TRNDP) deals with the planning of optimal routes for transit services. Extensive research into TRNDP variants and solution methods has been conducted since the late 1960s. Most studies had different assumptions and simplifications in their models due to the complexity of the problem, which is combinatorial in nature. We refer the readers to Kepaptsoglou2009-sw

for an extensive review of previous studies on TRNDP up to year 2007. The author classified the studies according to their objective functions, decision variables, transit network structure, demand patterns and characteristics, as well as the solution methods. Here we shall focus on two aspects relevant to our study: the mathematical programming formulation for TRNDP, and demands elasticity and stochasticity.

2.2.1 Mathematical programming formulation for TRNDP

The transit route planning process typically consists of five steps, as outlined in Ceder1986-lk . The process is sequential, in that the decisions at each step become the input for the next step. The process starts with a network design problem, where the bus stops and routes are decided. Then the frequencies are determined based on the fleet size, followed by timetabling and scheduling, and finally the scheduling of drivers. However, it is clear that while the solutions obtained at each step of the process are optimal, they may not necessarily be optimal overall. Despite the temptation to formulate a single model to globally optimize the bus planning procedure, one must note the complexity involved, as the problem at each step of the process is highly combinatorial. Here we shall review the steps relevant to our study: network design, and frequency determination.

For network design, TRNDP instances are usually formulated based on a network graph, where the nodes represent transit stops, and edges represent connective paths between nodes. The objective is then to select which transit stops to serve, and the order of visiting them in each of the routes, based on travel demands and generalized costs. A solution method for network design gives the optimal routes and their corresponding temporal route length.

The frequency determination problem, on the other hand, is to optimally allocate vehicles to the different routes. This allocation largely depends on the temporal route lengths. The integration of the two problems can increase the complexity substantially, and often results either in a single highly nonlinear model Cipriani2012-zm ; Fan2008-lg ; Fan_Wei2006-mj , or a bi-level mixed integer model Szeto2011-qj ; Szeto2014-pn .

2.2.2 Demands elasticity and stochasticity

While some previous studies have considered demand elasticity, which is an inherent property of a real transit network, most studies have only considered fixed demand. In Fan_Wei2006-mj , Fan and Machemehl have attributed this to the NP-hard complexity of TRNDP. The consideration of elastic demands often results in an iterative procedure, which repeatedly chooses routes structure and demand splits until some convergence criterion is achieved Cipriani2012-yr ; Lee_Young-Jae2005-on ; Fan_Wei2006-lc

. However, this would mean that no optimal solution can be guaranteed due to the heuristic nature of the problem. Moreover, in this method, demands are not considered as directly dependent on the service quality. Instead, the elasticity is internalized in demands stochasticity alongside other factors that affect demand stochasticity, e.g., weather and seasonal variations.

One common method of dealing with demands stochasticity is by doing robust optimization. Robust optimization comes in several forms, including minimax robust optimization An2015-tv ; Laporte2010-ud ; Lou2009-et ; An2016-bx , and scenario-based robust optimization. While minimax robust optimization guarantees feasibility for all possible demand scenarios, it can sometimes be overly conservative as acknowledged in An2015-tv

. Also, the method does not make good use of the information embedded in the distribution of demands data, because taking the worst-case scenario essentially reduces the probability distribution to a single point value. In contrast, we optimize routes based on quantile estimates of the true demand distribution, and thus obtain routes which are optimal in expectation.

Another commonly used method to improve robustness is scenario-based robust optimization. Stochasticity is then generated either by adding random perturbations to the average demands, by explicitly constructing demand scenarios for different seasons and/or time of the day, or by random sampling of parameters from their probability densities. In Amiripour_S_M_Mahdi2014-om Amiripour et al. consider TRNDP with variable demands. To simulate demand stochasticity, they generate perturbations of a demand matrix, each by adding stochastic noise. While the method does add robustness to the solution by introducing random noises to the demands, the random perturbations may not necessarily reflect the stochasticity that can be observed in collected demand data. In contrast, our framework generates demand stochasticity not through random permutation, but by estimating the true predictive distribution itself. As more data is collected over time, the framework updates this estimation of the true joint distribution, so that its samples from the distribution conform to updated stochastic information.

Another paper by Amiripour et al. Amiripour2014-rp studied the design of a large-scale bus network with seasonal demand variation. They propose a scenario-based route optimization method based on metaheuristic methods. Their mathematical programming model yields an optimal route for a whole year by enumerating scenarios for each season, and minimizing the weighted sum of the objective for each season. This study bears some similarities to our paper, but one major difference is that they assume fixed demands, whereas we consider demands to be dynamic and stochastic.

2.3 Combining Demand Prediction with Supply Optimization

A seminal example of combining demand prediction with supply optimization is found in Stein1978DAR (1978). There, the author studies itinerary optimization for Dial-a-Ride services, and assumes that users’ requests are independently drawn from a uniform spatial distribution over the serviced area. This assumption is overly simplifying for a shuttle service on a university campus, where student movements are often spatio-temporally correlated and unevenly distributed.

More recent combined methods can be traced back to cortes2009hybrid (2009), who use a hybrid state-space model for both demand estimation and route optimization. The authors deal with delivery-and-pickup with predetermined stations, and combine both solution components within the formulation of the objective function. As noted in ichoua2006exploiting , such coupling is also present in a line of precedent paper by Powell and his team (powell1988comparative ; powell1996stochastic ; godfrey2002adaptive ; powell2003stochastic ; spivey2004dynamic ). Contrary to cortes2009hybrid , our approach decouples demand estimation from supply optimization, so that each can be treated separately, and joins them through a copula (as defined later in Sec. 4.2). The use of copula further allows us to first derive marginal demand distributions for each OD pair, rather than estimate an entire joint distribution at once. Similarly to cortes2009hybrid , our approach is non-myopic, in that it considers multiple future scenarios for improving user gains.

In 2013, Ferrucci et al. ferrucci2013pro

presented another approach to demand-responsive supply optimization, whereby the service area is segmented, and each segment is assumed to follow a Poisson distribution with time-dependent rate. In contrast, our framework does not presume a particular form of demand distribution, and can rather approximate it through e.g., Quantile Regression (QR, as defined in Sec.


In 2018, Iglesias et al. iglesias2018data

presented a data-driven framework to rebalance an autonomous on-demand fleet. For demand prediction, they propose only a Deep Neural Network model with Long-Short Term Memory (LSTM), which is not interpretable and does not yield uncertainty estimates. In contrast, we compare multiple baseline models, and offer an interpretable linear QR model for demand prediction with uncertainty estimates.

Let us conclude the literature review with two recent (2017) investigations that bear particular resemblance to ours. The first is by Alonso-Mora et al. alonso2017predictive , where the authors study urban-scale route optimization for a hypothetical fleet of self-driving taxis. Similarly to our approach, they too leverage historical data for real-time demand prediction, and construct a marginal probability distribution for each OD pair. Nevertheless, the method of alonso2017predictive relies on the frequentist approach, which can be beneficial for large datasets, but fails to retain enough uncertainty for smaller datasets as in our case.

It should be noted that as in alonso2017predictive , our predictive optimization framework is also amenable to parallelization. On the demand prediction side, the marginal distributions for each OD pair can be obtained in parallel if independent models are used. On the supply optimization side, the linear optimization program can be solved in parallel for each independent sample from the joint distribution of demand.

The second study which particularly resembles ours is by Miller et al. miller2017predictive , where the authors too predictively optimize a small fleet of on-campus autonomous shuttles, with the objective of minimizing expected customer waiting time. However, whereas they focus on predicting key locations for proactive positioning of the vehicles, our objective is to predictively optimize complete vehicle routes. Also, their sources of demand information are users’ requests via a dedicated smartphone app, and sensors that the few vehicle carry around. In contrast, our source of demand information is a network of hundreds of WiFi hotspots, fixed all over campus, which thus provide high spatio-temporal observability. Finally, whereas our framework offers online predictive optimization, miller2017predictive identify key positions offline and leave online demand prediction to future work.

3 Demand Prediction through Quantile Regression

In this Section, we describe how Quantile Regression (QR) can be applied to demand prediction with uncertainty. To facilitate the explanation, we conduct a case study of predictive route optimization for an autonomous shuttle service, which is planned to start operating in the Lyngby campus of the Technological University of Denmark (DTU) in 2019. For this autonomous mobility project, we obtained a dataset of WiFi records, collected from various buildings on the campus for several weeks in 2017 and 2018. We also use this case study later on to describe and test our stochastic route optimization method.

3.1 Data for Demand Estimation

Figure 2: Data in our case study: (1(a)) Observed locations in DTU Lyngby Campus, each covering one or more buildings. (1(b)) Observed hourly movements between all locations, with dates specified for Mondays.

The data in this case study consists of crowd movements between different locations in DTU Lyngby campus, illustrated in Figure 1(a). In each location, WiFi access points probe for wireless devices, and we are given hourly aggregated counts of wireless idevices that change location. The aggregated counts contain no information about individual devices. We use the counts of wireless devices as a proxy for counts of people on campus, and assume that approximation errors (e.g., due to uncovered locations or varying number of devices per person) amount to systematic noise in the observations.

And so, for each of OD pairs, we have a time series of hourly aggregated movements. Each time series ranges from 17-Nov-2017 00:00 to 14-Jan-2018 23:59, and we remove all data in 23:0006:59, during which there are nearly no movements on campus (this may also be a maintenance window for the autonomous shuttles). Before fitting prediction models, we partition the lags as 17-Nov-20177-Jan-2018 ( days), and 8-Jan-201814-Jan-2018 ( days). Figure 1(b) illustrates the various time periods.

3.2 Quantile Regression

For each OD pair, there exists a latent marginal distribution of number of movements at lag . We estimate the density of this distribution through quantiles of its Cumulative Distribution Function (CDF): for any , the ’th quantile is the smallest real that the CDF maps to .

And so, for each and each OD pair, we use Quantile Regression (QR) to estimate , the ’th quantile of the number of observed movements at lag , . The approximated marginal distribution is then defined as following:


where is the estimated ’th quantile at lag . To illustrate, Figure 3 shows an example of for one of our QR models, which we define later.

Figure 3: Example output of a Quantile Regression model (). Top: estimated quantiles for every OD pair at one lag ( 2018-01-08 20:00), where axes are logarithmic between and . Bottom: estimated quantiles for the entire test set of one OD pair ().

Before fitting models, we transform the time series for each OD pair in two steps. First, because is non-stationary, we difference it as , which is stationary in for all OD pairs, as we verify through augmented Dickey-Fuller test with . Second, we remove from the Christmas holiday period: [23-Dec-2017, , 1-Jan-2018], during which there were nearly no movements on campus. Using the transformed time series, we train each QR model on (without Christmas), and then evaluate its predictive performance on , by calculating the following measures for each OD pair:

  • Mean Titled Loss:

    where .

  • Percent of measured values that fall within the quantile range:

  • Mean length of quantile range:

  • Number of pairwise quantile crossings:

A QR prediction model is better if it achieves lower total over all OD pairs. The other measures are averaged for each model over all OD pairs, and we use them as additional indicators of performance quality: it is preferred to bring closer to while simultaneously obtaining lower mean , and it is also preferred to obtain fewer quantile crossings.

For the remainder of this Section, we define the QR prediction models in our experiments. Table 1 summarizes the overall predictive performance of all QR models, and highlights the best performing model.

Model Total Mean Mean Mean Crossings
w. Seasonality
w/o Seasonality 459.428
w. Sorting
w. Exams
w/o first week
FC Linear, Common LR
FC Linear, LR per OD
FC Linear, LR per OD and
, Common Params
, Params per OD
, Params per OD and
All ODs Together
similar to
All Quantiles Together
Parameter Sharing
Common Params
Params per
Table 1: Predictive performance of Quantile Regression models.

3.2.1 Modeling by Historical Percentiles

For any lag , let denote the time-of-day of , and let denote the day-of-week of , so that . For any OD pair, let denote the observed number of movements, and let denote the modeled ’th quantile.

Our first model is a naive Historical Percentiles model , in which for each OD pair and each , is the ’th percentile of

Hereafter, the superscript denotes that we fit the prediction model independently for each and each OD pair.

As Table 1 shows, achieves far from , which indicates that the underlying time series is not a simple repetition of historical patterns. Moreover, if instead of we fit on the ’th and ’th quantiles – namely the minima and maxima in the train set – we obtain mean only , which is far from . This means that for some OD pairs, there are movement counts in the test set that lie outside of the range of values in the train set, which confirms that historical percentiles are not good predictors for this data.

3.2.2 Linear Quantile Regression

We proceed to more flexible Linear Quantile Regression (LQR) models, so that for each OD pair and independently:


where is iff , is iff , and

’s are parameters to be estimated. Here too, as for all other QR prediction models, the loss function to be minimized is the total

. We train all LQR models via Iterative Weighted Least Squares with an Epanechnikov kernel and Hall-Sheather bandwidth selection hall1988distribution , and clip any negative predictions to zero.

Our first linear model achieves better total and significantly better mean than . Next, we attempt to further improve performance by removing seasonality from the data. That is, we transform into , where and are, respectively, the mean and standard deviation of . However, performs worse on than does on .

We thus proceed to run LQR again on , and this time eliminate quantile crossings by sorting in ascending order for every . This results in a better performing model , and as an example, Figure 3 illustrates its predictions for one OD pair.

During , exams took place on campus. Hence we next add to each data vector a binary feature , which indicates whether the data is in the exam period, namely:


The corresponding model is , which we train with quantile sorting. yields nearly the same total as does , and mean noticeably closer to , with a small increase in mean . We thus designate as the best performing model so far.

3.2.3 Deep Quantile Regression

Figure 4: , where are input features at lag , and are trainable parameters.

Whereas LQR models are parametric and linear, Deep Neural Network (DNN) models provide a non-parametric approximation of the true predictive distribution. As a baseline, we wish to first obtain similar performance for as for , hence our first deep model is as in Figure 4

: a fully-connected, feed-forward neural network, where a single linear unit combines all input features. While training

, we use the first week of as a validation set for early stop. Consequently, we aim for to first perform as well as , for which we have omitted the first week from the train set too.

Hyper-parameter selection can strongly affect the performance of DNN, hence before training, we use Bayesian Optimization to tune a learning rate in . To this end, we partition into three subsets: the first in , the following in , and the remaining days in . In every iteration, the optimizer tries a different learning rate to train the DNN on , while using for early stop, and then obtains the total of the trained DNN on . The optimal learning rate is the one which minimizes this total after at most iterations. Note that while the same learning rate may be common to several DNN models, we still fit each of them independently for each and each OD pair.

(a) Common LR
(b) LR per OD ()
(c) LR per OD and ()
Figure 5: Bayesian Optimization of learning rate (LR) of . The best LR, indicated by a vertical line, can shift considerably for the same OD pair under different optimization resolutions.

First, we optimize a single learning rate, common to all OD pairs and all . The optimization process is described in Figure 5, and the optimal common learning rate turns out to be . Nevertheless, this learning rate yields worse performance than , as shown in Table 1. We thus next optimize the learning rate independently for each OD pair, and obtain a distribution of learning rates as in Figure 4(b). This results in a little improvement over the common learning rate, but performance is still noticeably worse than . Finally, we optimize the learning rate independently for every combination of OD and , as summarized in Figure 6(a), but gain no performance improvement.

In summation, deep QR did not perform as well as linear QR, despite learning rate optimization. The dataset is thus not large enough for effective deep learning with backpropagation, hence we do not attempt to further improve performance, e.g., using kernel regularization, dropout, or recurrent units. As an alternative attempt at non-parametric QR, we next experiment with Gradient Boosting.

3.2.4 Gradient Boosting

Figure 6: A Gradient Boosting model passes the input

through each decision tree until reaching a leaf, then outputs the sum of all these leaves.

Gradient Boosting (GBoost) friedman2002stochastic builds an ensemble of regression trees incrementally using gradient descent. As in previous prediction models, the loss function being boosted is total . Figure 6 shows an example of for one OD pair and a single .

As for , we partition as , and use Bayesian Optimization to find the best hyper-parameters for : learning rate in , maximum tree depth in , and maximum number of trees in . Figure 6(b) shows that for all , the optimal learning for follows a similar pattern of distributions as for (Figure 6(a)

), albeit with greater variance for

. Nevertheless, Table 1 shows that contrary to , the performance of improves when optimization is carried out separately for each OD and . Compared to , the best performing has similar total and mean , but worse mean .

Figure 7: Box plot of distribution of optimal learning rate over all OD pairs, for each .

3.3 Modeling Spatio-Temporal Dependencies

For every type of prediction model above (LQR, DNN, GBoost), we have built an independent model for each of the combinations of OD pair and . Let us now examine models which process together multiple OD pairs or multiple quantiles, and so may be able to exploit spatio-temporal dependencies in the data. Indeed, Figure 8 shows that for some couples of OD pairs, the time series of movements are strongly correlated.

Figure 8: Pearson correlation between the time series of every couple of different OD pairs.

3.3.1 Linear Quantile Regression on Multiple Time Series

We begin with a linear QR model which processes all OD pairs together, independently for each , as:


where for an arbitrary, pre-fixed ordering of the OD pairs, is binary and indicates whether the data corresponds to the ’th OD pair. We train and test using the same methods as for . Table 1 summarizes the performance of , which is seen to be significantly worse than .

Next, we try linear model , where for each OD pair independently:


where is a hyper-parameter, and is the ’th past lag of the ’th OD pair. is therefore similar to , namely Vector Autoregression of order

, where each of several response variables is modeled on the last

lags of all response variables. We have experimented with , and noticed that performance deteriorated as increased. Table 1 summarizes the performance of for , which is seen to perform better than , but still worse than .

3.3.2 Multivariate Deep Quantile Regression


Figure 9: Multivariate deep Quantile Regression models for any OD pair. are input features at lag .

Next, we try multivariate variants of the DNN models which we built earlier. These variants process multiple quantiles together, and so take less time to train and optimize than the earlier univariate, independent models.

Our first deep multivariate model is as illustrated in Figure 8(a), such that for each OD pair independently, all quantiles are modeled together. We again partition as , and use Bayesian Optimization to find the best learning rate in , common to all OD pairs, which turns out to be . The performance of in Table 1 is seen to be worse than , but better than the univariate models.

Modeling multiple quantiles together can thus result in improved performance. We next try to further improve performance by introducing a hidden layer between the input and the output layers, as in Figure 8(b), so that different quantiles share some trainable parameters. Table 1 shows that this model, which we name , performs badly.

3.4 Gradient Boosting on Multiple Time Series

Finally, we fit Gradient Boosting (GBoost) models on the time series of all OD pairs together, independently for each . As in Section 3.2.4, we use Bayesian Optimization to find the best hyper-parameters, and summarize results in Table 1.

First, we optimize a single set of hyper-parameters, common to all . This results in model , which outperforms , our previously best performing prediction model. That is, compared to , achieves better total and significantly better mean , with a small decrease in mean .

Next, we optimize hyper-parameters independently for each . The resulting model achieves similar total and better mean vs. , but does not improve over . In summation, yields the best prediction quality among all QR models.

3.5 Conclusions on Demand Prediction with Quantile Regression

We have experimented with four types of QR models for predicting distributions of future demand: naive Historical Percentiles, Linear QR, Deep QR, and Gradient Boosting QR. Our experiments include independent models for each OD pair and each , as well as multivariate models which simultaneously process multiple OD pairs or multiple quantiles. Table 1 summarizes the experiment results.

Historical Percentiles performs rather poorly, which indicates that the data at hand does not follow a simple repeating pattern. For Linear and Deep QR, independent models mostly yield better prediction quality than multivariate models. For Gradient Boosting QR, however, multivariate models outperform independent models, such that is the overall best performing prediction model.

In conclusion, model can exploit spatio-temporal dependencies in the data to some extent. Indeed, Figure 8 shows that for some OD pairs, the time series of movement counts are strongly correlated.

4 Supply Optimization

In this Section, we describe in detail the supply optimization part of our predictive stochastic optimization framework. First, we discuss usual methods for dealing with stochastic linear programs, then present our approach to it. Then, we look at the methods used to achieve predicitive optimization, and finally we apply this framework on the test case of DTU Lyngby, for which we have already obtained marginal demand distributions in the previous section.

4.1 Stochastic Transit Network Design Problem

The theoretical setting we consider is that of a transit network, represented by a directed graph , where is the set of nodes, and is the set of edges (links). The set of nodes consists of a set of demand nodes where movements originate and arrive, as well as a set of bus stop nodes . The Transit Route Network Design Problem (TRNDP) has the following objective: find an optimal transit route in along with an optimal frequency setting for the transit service.

In its classic form, TRNDP is usually formulated as a static linear program with parameters that take on deterministic values. In practice, however, parameters are often not static but stochastic, so that they follow some probability distributions. Approaches for dealing with such stochasticity commonly reduce the distributions to point estimates. One such approach is to use the expected values of parameter distributions. While the expected value formulation may be simple to obtain, its solution nonetheless lacks robustness. To overcome this weakness, maximin robust optimization can be used, whereby the maximum of the support () of the parameters is taken instead of their expected values (). Doing so ensures that the solution is feasible for all possible combinations of parameters. This approach is often called the worst-case scenario approach, because the solution space encapsulates all possible combinations, including the worst-case scenario.

Nevertheless, there are drawbacks to both approaches for dealing with stochasticity in parameters. First, by reducing the parameter distributions to point values, both approaches discard of useful information in the full distributions. Second, while the worst-case scenario approach yields robust optimization, it can sometimes be overly conservative, and so miss potentially optimal solutions. Therefore, we next devise a robust solution method for stochastic TRNDP, which takes advantage of the full predictive distributions of parameters. As (stochastic) TRNDP is an NP-hard combinatorial problem, solution methods for TRNDP are often approximate, and so is our method.

Any instance of a static linear program is uniquely defined by its parameters. For a stochastic linear program, we can generate a static instance by setting each parameter as a sample from its distribution. Consequently, our solution method for stochastic TRNDP generates multiple instances by independent sampling, and solves each instance separately. Each solution is an optimal route for the corresponding instance, and different instances may yield the same solution. We rank these routes by their frequency of occurrence in the set of independent solutions, and choose the route with highest frequency as the overall optimal solution.

4.2 Sampling from the Joint Demand Distribution

As discussed in the previous subsection, to obtain multiple instances of the stochastic TRNDP problem, we need to be able to jointly sample from the marginal distributions of the parameters. In order to achieve joint sampling of the distributions, we require a function that pieces together the individual distributions to form a joint distribution. The function, in essence, retains the correlation structure between the different parameters, and provides us with a mean of obtaining joint samples. In this Section, we will introduce the readers to Gaussian copula, and how it is used in our predictive stochastic optimization framework.

To sample from the joint distribution of demand over all OD pairs, we first construct a Gaussian copula, defined as follows. Let be the marginal cumulative distribution functions (CDFs) of demand parameters for each OD pair , respectively. The random vector is distributed per the joint cumulative distribution function (CDF) of its components:


If each of is continuous, then by Sklar’s theorem, is uniquely defined as a function of these marginal CDFs, namely


is then called a copula, and by using this formulation, we decouple the correlation structure in from the individual marginals. A commonly used type of copula is the Gaussian copula,



is the joint CDF of the zero-mean multivariate normal distribution with covariance matrix


We can use the Gaussian copula to obtain samples from the joint distribution of . To obtain such a sample, we first draw from (Eq. 8). Then, using the multivariate normal CDF , we transform into . Finally, we map to the desired sample through the inverses of the marginal CDFs, as


4.3 Covariance Matrix for the Copula

In this study, we assume that only travel demands are stochastic, while other parameters, such as travel time between nodes, remain constant over time. To construct the copula, we assume for simplicity that the correlation structure of the data is time-invariant. As such, it suffices to compute the covariance matrix of the copula offline once, based on historical data. We note, however, that this computation can be efficiently maintained online as new data becomes known, so that the copula retains the updated state of correlation.

To construct the covariance matrix for the Gaussian copula in our framework, we use the observed movement counts in , as following. First, for the demand nodes in the TRNDP graph , we calculate for each OD pair its empirical marginal CDF , based on all its historically observed movement counts in . For each , we then collect the corresponding observations of all OD pairs into a travel demand vector of size ,


where the travel demand from any demand node to itself, i.e. for , is . Next, we transform element-wise into the Gaussian layer as follows:


where is the standard univariate normal CDF.

Finally, we collect the transformed vectors for all into a matrix , and calculate the covariance matrix in the transformed space as:


where is given by the column-wise expectation of the matrix . is then the covariance matrix of the Gaussian copula

4.3.1 Route generation

In a transit route network design problem (TRNDP), the network structure is not usually explicitly defined, which gives rise to a huge number of possibilities for routes on the network. However, a sizable proportion of these routes may not be desirable. Therefore, we first find a set of candidate routes based on several criteria via a route generation procure.

First, a set of candidate bus stops is established from all bus stops in . The candidate set of stops are constructed considering door to door accessibility as well as the connectivity to the public transport services beyond the campus. Accordingly, the candidate bus stops include the feasible stop locations in the vicinity of demand hotspots, and the existing bus stops that are already being served by conventional bus lines. These candidate bus stops are indexed arbitrarily with , where is the number of candidate bus stops. These indices remain the same throughout in our formulation.

Figure 10: The nodes used in our case study. The nodes are labeled with their indices. The prefix ’D’ in the node labels indicates a demand node, while the rest are candidate bus stops.

After determining the set of candidate stops, the shortest path for buses between each pair of candidate stops is obtained. The routes were then constructed by incrementally adding candidate bus stops to the end of the routes. The stops are connected to the adjacent bus stops by the shortest bus path. The quality of each route is evaluated before another candidate bus stop is added to the route. We consider a route to be desirable if the route has little overlapping path, and a route length between 0.75 and 1.5 kilometres. To determine how much the route paths are overlapping on each other, we plot the route paths on a two-dimensional discretized space (see Figure 11). The fitness of a route is then given by the proportion of grids with multiple route paths passing through them. Routes with more than 25 percent of the paths overlapping on each other are discarded, and the procedure repeats by adding another candidate bus stop to the end of the route until all the bus stops have been exhausted or that the route length has exceeded the desired range. After the routes have been generated, the temporal route lengths can be precalculated, assuming a constant velocity of travel throughout.

In this route generation procedure, circular routes, i.e. routes that have the same departure and arrival station, are permitted. However, using the procedure described above, one would get multiple distinct routes which represent the same route albeit the difference in the departure and arrival station as shown in Figure 12. Thus, we circumvent the problem by limiting the departure and arrival station of any circular route to the station with the smallest index.

This route generation procedure effectively eliminates the decision variables for selecting bus stops for each route, and also for calculating the temporal route length, based on the selected stops, in order to determine the frequency of the service. Instead, they are replaced with binary decision variables, one for each route-candidate, to select routes form the set of candidates.

Figure 11: Example of route set reduction for our case study: (a) Map plot of an example circular route, with bus stops indicated by circles. The route starts and ends at the filled circle. (b) The same route, plotted on a two-dimensional discretized space. Orange tiles indicate overlapping route parts. (c) Enlarged view of the area highlighted in grey in (b).
Figure 12: Four different representations of the same circular route, with bus stops indicated by circles. Each route originates from the filled circle, then visits the remaining stops in counter-clockwise order.

4.4 TRNDP as a max-utility max-flow problem

As discussed in Section 2.2.1, TRNDP is usually solved independent of the frequency determination problem. However, some authors formulated either a single highly nonlinear problem, or a bi-level mixed integer program to solve the two problems simultaneously. In these cases, the problems became very complex and rely on heuristics to solve for suboptimal solutions.

In this Section, we shall formally describe the formulation of a max-utility max-flow problem to solve the TRNDP and frequency determination problem simultaneously. The objective of the problem is to maximize the total time savings (utility) and the flow of demands from the source nodes to a dummy sink node node . The source nodes are made up of OD nodes, each representing an OD pair. Then we have nodes which represent the candidate routes, and a special node to represent the ”walking route”. In essence, if none of the routes have enough capacity or does not help to reduce travel time for demands of any OD pair, the demands will flow to node , which indicates that the passengers travelling between OD will have to walk from its origin to the destination. Finally, we also have nodes that indicates the number of buses assigned to the candidate routes.

An illustration of the formulation can be found in Figure 13. The max-utility max-flow formulation can be interpreted as a two-stage process as shown in the figure. In stage one, the demands for all OD pairs are assigned to multiple candidate routes. Then in stage two, the number of buses allocated to the routes are determined.

The notations that are used in the formulation that follows are given in the list below.

Notation Definition
Origin node index.
Destination node index.
Candidate route index.
Number of buses allocated index.
Utility for the transit assignment stage, given by the time savings from taking a ride on candidate route compared to walking directly from origin to destination , for each passenger traveling from origin to destination .
Utility for the buses allocation stage, given by the negative of the average waiting time for candidate route , with buses allocated to the route.
Temporal route length of candidate route in minutes.
Total travel demand from origin to destination .
Walk time between origin and destination .
Ride time on candidate route for passengers traveling from origin to destination .
Total walk time from origin to boarding node on candidate route , and walk time from alighting node on candidate route to destination .
Maximum capacity of each shuttle bus.
Number of routes permitted.
Number of origin-destination pairs.
Number of candidate routes.
Fleet size.
Decision variables
Demands flowing from node to node .
Demands flowing from node to node
Equals to 1 if buses are allocated to candidate route .
Frequency of candidate route when allocated buses.
Table 2: Notations used in the max-utility max-flow formulation.

In this formulation, we maximize the total time savings of all passengers as such:


where and represent the edge costs for stages one and two respectively. In stage one, the edge costs are given by the time savings per passenger, namely:


Note that does not take into account the wait time at the bus stop, which will be taken care of in . is defined as the average waiting time for candidate route , given that the route is allocated buses, which is given by the following: