Riding on the wave of the sharing economy, car-sharing services such as Car2go (https://www.car2go.com), Wunder Mobility (https://www.wundermobility.com), TURO (https://www.turo.com), Zipcar (https://www.zipcar.com) and Communauto (https://www.communauto.com) play an increasingly important role in offering economical and environmentally conscious mobility options to citizens, especially in densely populated urban areas. For society, car sharing can save parking space and reduce traffic congestion and air pollution. For individual users, it entails fewer ownership responsibilities and lower costs to satisfy their mobility needs. In addition, car sharing provides users with a wide range of vehicles, allowing them to match vehicles to trip purposes. The earliest car-sharing efforts can be traced back to the 1940s in Europe and the 1980s in North America. Despite these relatively early origins, only the past decade has seen significant growth in large-scale car-sharing businesses, which can mainly be attributed to the proliferation of the mobile internet.
A car-sharing service can be financed by public and/or private entities and managed by a service organization that maintains a fleet of cars and light trucks in a network of vehicle locations. Individuals gain access to car sharing by becoming members of the organization. Typically, members pay a modest fixed charge plus a usage fee each time they use a vehicle. Vehicles are usually parked in lots located in neighborhoods or at transit stations. A member can reserve a vehicle by phone or online. Once the reservation is approved, the reserved vehicle is assigned to the member, who picks it up at the appointed time and leaves it either at a designated car-sharing location, which may differ from the pick-up point (one-way car-sharing systems), or anywhere within a specified zone (free-floating car-sharing systems).
Three levels of decision-making, namely strategic, tactical and operational, are involved in car-sharing management [4, 5]. Strategic decisions include determining the mode of the network (one-way, two-way or free-floating), the number, location and capacity of stations, and the fleet size. Tactical decisions mainly involve management policies that govern the service in the medium term, such as reservation and pricing policies. Operational decisions are those that must be made on a daily basis according to dynamic market and fleet conditions. Typical examples include placing initial inventories at each location and relocating vehicles across the network to accommodate realized demand. In this paper, we propose a data-driven optimization framework to support both vehicle relocation and initial inventory placement decisions in car-sharing management. We begin by reviewing related work in the literature.
I-A Related Works
Vehicle relocation problems in the car-sharing context have been studied extensively. One major stream of work models the car-sharing relocation problem (CSRP) with deterministic optimization techniques, whose models are solved either by exact large-scale optimization algorithms such as Lagrangian relaxation and branch-and-bound, or by heuristics such as neighborhood search and simulated annealing. For example, Gambella et al. formulate the electric vehicle relocation problem (EVRP) as two mixed integer programming (MIP) models that maximize the profit associated with the trips performed by users during operating and non-operating hours, respectively. The models take EV battery consumption and the recharging process into account. Because of the computational complexity, two model-based heuristic algorithms, based on relocation removal and a rolling horizon mechanism, are designed to solve the relocation model. The experimental results show that the proposed algorithms achieve near-optimal solutions and outperform the solutions obtained by CPLEX under a time limit. Similarly, other authors investigate the electric vehicle fleet size and trip pricing problem, formulated as a mixed-integer non-linear programming (MINLP) model that maximizes overall profit by defining both long-term resource allocation and short-term operation strategies. Specifically, the MINLP model optimizes station location, station capacity and fleet size simultaneously. To solve this large-scale MINLP problem, a customized gradient algorithm is introduced and validated in a real case study. An integrated framework for electric vehicle re-balancing and staff relocation (EVR&SR) has also been proposed. The EVR&SR problem is represented using a space-time network and formulated as a mixed-integer linear programming (MILP) model that minimizes the overall cost, including investment costs and operating expenses. The framework considers both the strategic allocation of EVs and staff and the operational decisions of EV relocation and staff relocation. Since even medium-scale instances cannot be solved effectively by CPLEX or Gurobi, a Lagrangian relaxation-based approach, which decomposes the primal problem into a group of subproblems solved with dynamic programming and a greedy algorithm, is introduced to tackle large-scale instances; it reaches near-optimal solutions in a short time. A more general framework involving a multi-objective MILP model and a virtual hub has also been introduced. The multi-objective model considers both vehicle relocation and electric charging requirements, while the virtual hub serves as an aggregation device to handle the extremely large number of relocation variables. The problem can be solved by a standard branch-and-bound approach that generates the efficient frontier and balances the operator's and users' benefits to maximize the operator's net revenue.
To ensure the flexibility of car-sharing services, another study proposes a two-stage optimization model that optimizes destination locations and maximizes the manager's profit. However, none of the aforementioned studies considers uncertain parameters such as demand, supply or travel time, so these modeling approaches cannot be applied directly to our CSRP.
Another line of literature models CSRP using stochastic programming (SP). A closely related problem, the bike-sharing allocation and re-balancing problem (BSA&RP), has been formulated as a two-stage SP model that minimizes the total expected penalty, i.e., the sum of the penalties charged for delivery, re-balancing, extra and excess inventory, and stock-outs. In that model, the initial allocation at the strategic level is the first-stage decision, while re-balancing is handled in the second stage; a solution-based heuristic built on scenario generation is devised to solve it. A multi-stage stochastic linear programming (SLP) model has also been developed for optimizing the strategic allocation of car-sharing vehicles (OSACV) under dynamic and uncertain demand. In that setting, each vehicle is assumed to be in use, in transit and empty, or stationary and empty, and the travel time between locations is assumed to be one day. The objective is to maximize total expected profit, which includes revenue and moving costs at both the strategic and operational levels. Since the model involves seven stages, a scenario tree approach is used to solve it. Other authors address large-scale instances of the dynamic repositioning and routing problem (DRRP) with stochastic customer demand; with simple extensions, the DRRP applies to similar fields such as bike sharing. A two-stage SP model based on a network flow formulation is built to minimize expected cost, where customer arrivals and starting times are assumed to follow a Poisson distribution, and an iterative algorithm called SPAR (separable projective approximation routine) is adapted to solve the model in a real-world case study. Nevertheless, these models and approaches cannot be applied directly in a data-driven environment, since they do not make accurate use of historical data. Furthermore, SP-based models assume that the probability distribution is known and of a specific type. In real historical data, however, the distribution may involve many, or even infinitely many, parameters and cannot be described by a simple known distribution such as the Gaussian or Poisson distributions assumed in these works.
I-B Research Gaps
Nowadays, with the rapid development of urban transportation, a huge amount of data is generated every day, which is significantly changing intelligent transportation systems [13, 14]. However, the growing volume of data brings new challenges to the traditional optimization of the car-sharing relocation problem (CSRP), which plays a key role in car-sharing services. For example, variability in customer demand (traffic flow) strongly affects inventory levels, and inappropriate decisions may lead to a poor service level. Therefore, handling uncertainty in a data-driven environment is the key issue for CSRP.
The major limitation of previous SP-based works is that the probability distribution is assumed to be known or is estimated from experience. In those works, the distribution is determined by decision-makers using a parametric approach: they first select a specific parametric distribution (e.g., a Gaussian distribution) and then estimate its parameters by statistical methods. However, in most real applications, the true distribution may be too complex to be described by simple parametric forms. We therefore explore machine learning approaches to make the SP model more practical. Recently, combining machine learning (ML) or deep learning (DL) with optimization techniques, known as data-driven optimization, has become a trend in the operations research (OR) community [17, 18]. A few researchers have leveraged ML to make optimization models more realistic and applied this idea in the chemical industry [19, 20]; specifically, they applied the Dirichlet process mixture model (DPMM) and principal component analysis (PCA) within a distributionally robust optimization (DRO) model, which does not fit our purpose. To the best of our knowledge, no similar work has been applied to CSRP.
I-C Objectives and Contributions
Building on previous works [19, 20] and applying their concept to CSRP, we propose an innovative data-driven stochastic programming framework named DDKSP, which integrates a non-parametric approach, kernel density estimation (KDE), with a stochastic programming model. Specifically, unlike previous work in which the probability distribution is assumed to be known or is estimated by a parametric approach, the probability distribution of customer demand is estimated from data by KDE. A two-stage non-linear stochastic programming model with the derived parameters is then proposed to formulate the CSRP. Finally, the sample average approximation (SAA) method combined with Benders decomposition is used to solve the two-stage non-linear SP model. Our framework can readily be extended to similar problems such as bike-sharing and EV-sharing [21, 22, 23, 24].
The rest of the paper is organized as follows. Section 2 presents the problem description and formulation. Section 3 describes the DDKSP framework, which involves KDE, the SAA method and Benders decomposition. Data preprocessing and numerical experiments are presented in Section 4. Finally, Section 5 concludes the paper and outlines future work.
II Problem Formulation
II-A Problem Statement
We study the CSRP, a typical centralized decision-making problem involving two roles: a car-sharing company and its customers. Consider a one-way car-sharing system (pickup at one location and drop-off at any location), in which the company owns a number of vehicles and operates a number of locations for car dispatch. Customers reserve cars in advance and pick them up at specific locations. The CSRP can be viewed as a two-stage decision-making problem, described as follows. In the first stage (the strategic phase), during a time window before customer demand is realized (e.g., from midnight to 4 am), each location is allocated a certain number of cars (the initial inventory decision), which incurs holding costs. In the second stage (the operational phase), after the actual customer demand is revealed (we assume a cutoff time, e.g., 4 am, after which no further orders are accepted for the day), customers who reserved cars visit the locations to pick up their vehicles, which generates revenue. Meanwhile, the company's truck carriers must dynamically move cars from lower-demand to higher-demand locations to prevent vehicle imbalance, which incurs moving costs.
Because the first-stage decision must be made before the second stage, decision-makers must choose the number of cars at each location so as to cope with all possible realizations of customer demand (called scenarios in stochastic programming): more cars incur more holding cost, while fewer cars incur more moving cost. The mathematical model must therefore hedge against demand uncertainty. Accordingly, the objective of CSRP is to maximize the overall expected profit, which comprises total revenue, holding costs at each location and moving costs between locations. In this sense, the CSRP in this work answers two questions: (1) how many vehicles should be placed at each location before actual demand is revealed, and (2) how cars should be moved between locations to satisfy customer demand while maximizing overall profit.
In this work, the most critical concern for CSRP is how to model uncertainty in a data-driven environment. For simplicity, customer demand is the only uncertain parameter considered. Since the CSRP is a typical two-stage problem with demand uncertainty, we formulate it as a two-stage stochastic programming model, in which decision variables are divided into two groups: first-stage (here-and-now) variables, determined before actual demand is revealed, and second-stage (wait-and-see) variables, determined after demand is realized.
We make the following assumptions.
We assume that vehicle reservations are made before the operational phase (second stage) starts; that is, customers cannot cancel or delay their reservations.
We assume that all vehicles are in the same condition, i.e., the fleet consists of homogeneous cars.
We assume that historical customer demand at each location is available, so that demand distributions can be estimated from historical data.
We assume that actual demand at all locations is realized simultaneously.
II-B Model Formulation
In this section, we present two CSRP formulations: a deterministic model and its two-stage SP counterpart. Note that the SP model requires probability distributions of demand. For clarity, the notation is listed below.
The set of regional origins and/or destinations
The set of scenarios
: holding cost at location .
: moving cost from location to location .
: the average demand of location .
: first-stage decision variable which denotes the number of vehicles at location .
: the second-stage decision variable which denotes the number of vehicles moving from location to location under scenario .
Random Variables (for stochastic programming model)
: random demand, i.e., the number of cars that will be picked up by customers at location .
: the probability of scenario .
II-B1 Deterministic CSRP Model
In the deterministic model, we allocate a limited fleet of vehicles to locations so as to maximize overall profit. For simplicity, we use average demands. The deterministic CSRP model can be formulated as follows.
The objective function (1) maximizes overall profit, i.e., total revenue minus total holding cost. Constraint (2) ensures that the total number of vehicles does not exceed the fleet capacity, which can easily be estimated from historical data. Constraints (3) have a two-fold meaning. If the number of cars allocated to a location exceeds the customer demand there, the number of vehicles moved out of that location cannot exceed the difference between its number of cars and its customer demand. Otherwise, no cars move out of the location, since its available vehicles fall short of its customer demand. Constraints (4) and (5) define the domains of the decision variables.
Although the deterministic model tackles the problem in a simple way, relying on average demands may produce solutions that carry high risk or are even infeasible once actual demand is realized. Moreover, the objective function (1) is piecewise linear, so it must be reformulated as a linear function before solving.
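If, as is natural for car sharing, the piecewise-linear part of objective (1) is revenue earned only on demand that can actually be served, i.e. a term of the form $r\min(x_i, d_i)$, the standard linearization is sketched below. This form of (1) is our assumption, and the symbols ($r$ revenue per car, $h_i$ holding cost, $x_i$ allocated cars, $d_i$ average demand, auxiliary $z_i$) are labels chosen here, not the paper's notation.

$$\max_{x,\,z}\ \sum_{i}\left(r\,z_i - h_i x_i\right) \quad \text{s.t.} \quad z_i \le x_i,\quad z_i \le d_i,\quad z_i \ge 0 \quad \forall i.$$

Because $r > 0$ and the objective is maximized, each $z_i$ is pushed up to its tightest bound, so $z_i = \min(x_i, d_i)$ at any optimum and the linear model is equivalent to the piecewise-linear one. The trick works only because the concave $\min$ term is being maximized; a $\min$ term being minimized would instead require binary variables.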
II-B2 Two-Stage SP CSRP Model
Car-sharing operators wish to maximize expected profit over all possible realizations of the scenarios. Since customer demand is uncertain, we assume that demand scenarios are sampled from probability distributions derived from historical data. The two-stage SP model of CSRP can then be formulated as follows.
The objective function (6) maximizes overall profit, defined as revenue minus overall cost (the sum of holding and moving/transfer costs). Constraint (7) is identical to constraint (2). Constraints (8) play the same two-fold role as constraints (3), but are defined for each SP scenario. Specifically, if the number of cars allocated to a location exceeds its customer demand under a scenario, the number of vehicles moved out of that location under that scenario cannot exceed the difference between its number of cars and its customer demand under that scenario. Otherwise, no cars move out of that location under that scenario. Constraints (9) and (10) define the domains of the decision variables.
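A compact way to read constraints (7) and (8) is to write the model in the generic two-stage form below. This is a sketch under stated assumptions: the symbols ($x_i$, $y_{ij}^s$, $d_i^s$, $p_s$, $h_i$, $c_{ij}$, fleet size $C$) are our own labels, and $R(\cdot)$ is a placeholder for the revenue term of (6), whose exact form is not shown in this text.

$$\max_{x \ge 0}\ -\sum_{i} h_i x_i + \sum_{s \in S} p_s\, Q(x, d^s) \quad \text{s.t.} \quad \sum_{i} x_i \le C,$$

$$Q(x, d^s) = \max_{y^s \ge 0}\ \Big\{ R(x, y^s, d^s) - \sum_{i}\sum_{j} c_{ij}\, y_{ij}^s \Big\} \quad \text{s.t.} \quad \sum_{j} y_{ij}^s \le \max\{x_i - d_i^s,\ 0\} \quad \forall i.$$

The inner maximization is the second-stage (recourse) problem, solved once per scenario after demand is revealed. The $\max\{\cdot, 0\}$ on the right-hand side simply writes the two cases of constraint (8) together; since $x$ is fixed when $Q$ is evaluated, that right-hand side is a constant within each recourse problem.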
Inspired by the integration of ML with OR, we propose the DDKSP framework, briefly described as follows. The framework has four components: the ML/DL component (KDE in our setting) extracts probability distributions from uncertain data; the SP component models the problem; the SAA and Benders decomposition component reformulates and solves the SP model; and the last component produces the final decisions. The DDKSP framework is illustrated in Fig. 1. Note that the framework can readily be extended by replacing components. For example, the ML/DL component can adopt general supervised or unsupervised learning algorithms depending on the specific problem, the SP component can be replaced by robust optimization (RO) or distributionally robust optimization (DRO), and the SAA and Benders decomposition component can be replaced by other large-scale decomposition algorithms such as column generation or Lagrangian relaxation.
For the first component, we adopt kernel density estimation (KDE). KDE is a typical non-parametric approach that describes a probability distribution without specifying its functional form in advance. Let f be the density function of the uncertain parameter; given a set of data $x_1, \ldots, x_n$, the KDE of f is
$$\hat{f}_h(x) = \frac{1}{nh}\sum_{i=1}^{n} K\!\left(\frac{x - x_i}{h}\right),$$
where K is the kernel function and h is the bandwidth. In this work, we use the Gaussian kernel:
$$K(u) = \frac{1}{\sqrt{2\pi}}\exp\!\left(-\frac{u^{2}}{2}\right).$$
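A minimal plain-Python sketch of the Gaussian-kernel density estimator follows. The text does not state how the bandwidth h is chosen, so the normal-reference rule of thumb $h = 1.06\,\hat\sigma\,n^{-1/5}$ is used here as an assumption; the function names are hypothetical. In practice, a library implementation such as `scipy.stats.gaussian_kde` or scikit-learn's `KernelDensity` would normally be used instead.

```python
import math
from typing import Sequence


def silverman_bandwidth(data: Sequence[float]) -> float:
    """Rule-of-thumb bandwidth h = 1.06 * sigma * n**(-1/5).

    Assumption: the text does not say how h is chosen; this normal-reference
    rule is a common default but tends to over-smooth multimodal data.
    """
    n = len(data)
    mean = sum(data) / n
    sigma = math.sqrt(sum((x - mean) ** 2 for x in data) / (n - 1))  # sample std
    return 1.06 * sigma * n ** (-1 / 5)


def gaussian_kernel(u: float) -> float:
    """Standard normal density K(u) = exp(-u^2 / 2) / sqrt(2 * pi)."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def kde_pdf(x: float, data: Sequence[float], h: float) -> float:
    """Kernel density estimate f_h(x) = (1 / (n h)) * sum_i K((x - x_i) / h)."""
    return sum(gaussian_kernel((x - xi) / h) for xi in data) / (len(data) * h)
```

On a made-up bimodal sample with daily demand clustered around 100 and 300 (illustrative numbers, not the taxi data), the estimate is higher at 100 than at 200, whereas a single Gaussian fitted to the same data would peak at 200, where no demand was observed.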
III-B Two-Stage SP CSRP Model Reformulation
Unlike the deterministic model, which off-the-shelf commercial solvers handle effectively, the two-stage SP model requires reformulation, because a continuous probability distribution implies infinitely many scenarios. In this paper, we use sample average approximation (SAA), a Monte Carlo method, to reformulate the two-stage SP model. The SAA procedure is summarized as follows.
Input: probability distribution, number of samples, sample size, and the two-stage SP model
Output: the optimal value
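The individual SAA steps are not reproduced in this text. As a hedged illustration, the sketch below applies the usual SAA recipe to a deliberately simplified single-location problem without relocation, maximizing the expected value of r·min(x, D) minus h·x. Scenarios are drawn from a Gaussian KDE by picking a historical observation at random and adding N(0, h²) noise, clipped at zero (the clipping is our assumption, to keep demand non-negative). The toy model, function names and parameters are hypothetical, not the paper's implementation.

```python
import random
from typing import Callable, List, Sequence, Tuple


def sample_kde(data: Sequence[float], bandwidth: float, n: int,
               rng: random.Random) -> List[float]:
    """Draw n demand scenarios from a Gaussian KDE of `data`.

    Sampling a KDE = pick an observation uniformly, then add N(0, h^2) noise.
    Assumption: negative draws are clipped to 0 because demand cannot be negative.
    """
    return [max(0.0, rng.choice(data) + rng.gauss(0.0, bandwidth)) for _ in range(n)]


def avg_profit(x: int, scenarios: Sequence[float], r: float, h: float) -> float:
    """Sample-average profit of placing x cars: mean of r * min(x, d) minus h * x."""
    return sum(r * min(x, d) for d in scenarios) / len(scenarios) - h * x


def solve_saa(scenarios: Sequence[float], r: float, h: float,
              capacity: int) -> Tuple[int, float]:
    """Solve one SAA problem exactly by enumerating x = 0..capacity."""
    best_x = max(range(capacity + 1), key=lambda x: avg_profit(x, scenarios, r, h))
    return best_x, avg_profit(best_x, scenarios, r, h)


def saa(sampler: Callable[[int], List[float]], m: int, n: int, n_eval: int,
        r: float, h: float, capacity: int) -> Tuple[int, float, float]:
    """Basic SAA loop for a maximization problem.

    1. Solve m independent SAA problems with n scenarios each.
    2. Their mean optimal value estimates an upper bound on the true optimum.
    3. Re-evaluate every candidate on a fresh sample of n_eval scenarios and keep
       the best; its value estimates a lower bound (slightly optimistic, because
       the same sample is used to pick the candidate).
    """
    results = [solve_saa(sampler(n), r, h, capacity) for _ in range(m)]
    upper = sum(value for _, value in results) / m
    eval_sample = sampler(n_eval)
    best_x = max((x for x, _ in results),
                 key=lambda x: avg_profit(x, eval_sample, r, h))
    return best_x, upper, avg_profit(best_x, eval_sample, r, h)
```

A typical call passes a KDE sampler, e.g. `saa(lambda k: sample_kde(history, 70.0, k, random.Random(0)), ...)`; enumeration is only viable because the toy has one integer variable, whereas the full CSRP would hand each SAA problem to a MILP solver.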
Note that in the SAA reformulation, the objective function becomes
where is the number of scenarios. Moreover, the objective function is still non-linear, so we introduce an auxiliary variable to linearize it. Let . Then the two-stage SP model becomes
III-C Two-Stage SP CSRP Model Decomposition
After the reformulation, the two-stage SP model becomes a very large-scale deterministic model. For example, with 50 locations and 1000 scenarios, the number of second-stage decision variables is 50 × 50 × 1000 = 2,500,000. Solving a model of this size effectively requires a decomposition algorithm; in this work, we use Benders decomposition. Benders decomposition is an effective algorithm for solving large mixed integer linear programming (MILP) models: the primal model is decomposed into a master problem (MP) and a group of subproblems (SUBP) in dual form, and the solution is obtained by iteratively solving the SUBP and updating the MP.
For convenience, we omit the constant term in what follows and divide the reformulated model into an MP
and a SUBP in dual form
where and are the dual variables of the SUBP, and are fixed values determined by the MP. In each iteration, the MP adjusts these values and passes them to the SUBP. The algorithm is summarized as follows.
where is a very small fraction, usually set between and . Therefore, upon termination, either the upper bound or the lower bound can be taken as the optimal objective value, within this tolerance.
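The MP/SUBP iteration and gap-based stopping rule can be illustrated on a toy problem. The sketch below is a hedged, single-cut L-shaped (Benders) loop for one location with a continuous allocation x in [0, U]; it is not the CSRP model. For scenario s, the recourse problem max{r z : z ≤ x, z ≤ d_s, z ≥ 0} has the dual min{π x + μ d_s : π + μ ≥ r, π, μ ≥ 0}, whose optimum (π = r if x < d_s, otherwise μ = r) is known in closed form, so each optimality cut θ ≤ (1/N) Σ_s (π_s x + μ_s d_s) comes directly from the dual. Because x is one-dimensional, the MP is solved by checking the interval endpoints and pairwise cut intersections instead of calling an LP solver, and the relative-gap stopping rule is our assumption. All names and parameters are hypothetical.

```python
from typing import List, Sequence, Tuple

Cut = Tuple[float, float]  # optimality cut: theta <= a + b * x


def subproblem_dual(x: float, d: float, r: float) -> Tuple[float, float]:
    """Closed-form optimal duals (pi, mu) of max{r z : z <= x, z <= d, z >= 0}."""
    return (r, 0.0) if x < d else (0.0, r)


def solve_master(cuts: List[Cut], h: float, upper: float) -> Tuple[float, float]:
    """MP: maximize -h x + min_k (a_k + b_k x) over x in [0, upper].

    The objective is concave and piecewise linear in one variable, so an optimum
    lies at an endpoint or at the intersection of two cuts.
    """
    candidates = [0.0, upper]
    for i, (a1, b1) in enumerate(cuts):
        for a2, b2 in cuts[i + 1:]:
            if b1 != b2:
                x = (a2 - a1) / (b1 - b2)  # where a1 + b1 x = a2 + b2 x
                if 0.0 <= x <= upper:
                    candidates.append(x)

    def value(x: float) -> float:
        return -h * x + min(a + b * x for a, b in cuts)

    best = max(candidates, key=value)
    return best, value(best)


def benders(demands: Sequence[float], r: float, h: float, upper: float,
            eps: float = 1e-6, max_iter: int = 100) -> Tuple[float, float, float]:
    """Single-cut Benders loop; returns (best x, lower bound, upper bound)."""
    n = len(demands)
    cuts: List[Cut] = [(r * upper, 0.0)]  # trivial initial bound theta <= r * U
    best_x, lb, ub = 0.0, float("-inf"), float("inf")
    for _ in range(max_iter):
        x, ub = solve_master(cuts, h, upper)  # MP value is an upper bound
        duals = [subproblem_dual(x, d, r) for d in demands]  # SUBPs in dual form
        # Aggregated optimality cut: theta <= mean over s of (pi_s * x + mu_s * d_s)
        a = sum(mu * d for (_, mu), d in zip(duals, demands)) / n
        b = sum(pi for pi, _ in duals) / n
        profit = -h * x + a + b * x  # true expected profit at x: a lower bound
        if profit > lb:
            best_x, lb = x, profit
        if ub - lb <= eps * max(1.0, abs(lb)):  # assumed relative-gap rule
            break
        cuts.append((a, b))
    return best_x, lb, ub
```

For demand scenarios [3, 5, 7, 9] with r = 10, h = 4 and U = 20, the loop closes the gap after five master solves at x = 7 with profit 27, matching the critical-fractile solution of this newsvendor-type toy.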
IV Numerical Experiments
Experiment Design. We design a set of experiments. First, we preprocess and analyze the data, aggregating trip records into demand and analyzing the demand distributions. Next, both the non-parametric KDE and parametric approaches (Gaussian, Laplace and Poisson) are applied to derive probability distributions for the SP model. We then compare the SP model with the deterministic model in terms of objective value and running time. Moreover, we validate and compare KDE against the three parametric approaches (Gaussian, Laplace and Poisson distributions). Finally, we examine the two-stage decisions on a single day's record.
Experiment Setup. The algorithms (SAA, Benders decomposition (BD), KDE and the parametric approaches) are implemented in Python 3.7, and the mathematical models are solved by Gurobi 8.1 (academic version, https://www.gurobi.com/academia/academic-program-and-licenses/) on an Intel i7 platform with 16 GB RAM running Windows 10. The deterministic parameters in our SP model, such as revenue and transfer cost, can easily be estimated from the data set. For simplicity, in the following experiments, the revenue per car is set to $100, the transfer cost is roughly estimated from the distance between locations and ranges from 10 to 100, the number of available vehicles is set to 16,000, and the holding cost is assumed to follow a Gaussian distribution with parameters .
IV-A Data Analysis
The data sets are the New York City taxi trip records (https://www1.nyc.gov/site/tlc/about/tlc-trip-record-data.page). We collected three years (July 2016 to June 2019) of green taxi trip records, archived by month, as the data source. We split the data into a training set (July 2016 to December 2018, 919 days) and a testing set (January 2019 to June 2019, 181 days); each monthly file contains a large number of raw trip records with a complex structure. For example, the 2018-01 file contains 793,529 records and 19 attributes. For our purpose, we use six attributes, listed in Table I. In this data set, New York City is divided into 259 locations; details of the zone division are available at https://data.world/nyc-taxi-limo/taxi-zone-lookup. The main data processing task is to aggregate the trip records into daily demand. After processing, we selected the 20 locations with the highest average demand (location IDs 74, 41, 7, 75, 255, 82, 166, 42, 181, 97, 129, 25, 95, 244, 33, 260, 256, 66, 223 and 65, sorted by demand in descending order), which are plotted on the map in Fig. LABEL:fig:nyc.
|Attribute||Description|
|PULocationID||pickup location ID|
|DOLocationID||dropoff location ID|
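The aggregation step described above, turning trip records into daily demand per location and ranking locations by average demand, can be sketched with the Python standard library. Only `PULocationID` comes from Table I; the pickup-time column name `lpep_pickup_datetime`, its timestamp format, and the definition of demand as the number of pickups per location per day are assumptions, and the function names are hypothetical.

```python
import csv
from collections import defaultdict
from datetime import datetime
from typing import Dict, Iterable, Iterator, List, Mapping, Tuple

# PULocationID is listed in Table I; the pickup-time column name and the
# timestamp format below are assumptions about the green-taxi files.
PICKUP_TIME = "lpep_pickup_datetime"
PICKUP_LOC = "PULocationID"
TIME_FORMAT = "%Y-%m-%d %H:%M:%S"


def read_trips(path: str) -> Iterator[Dict[str, str]]:
    """Stream trip records from one monthly CSV file."""
    with open(path, newline="") as f:
        yield from csv.DictReader(f)


def daily_demand(rows: Iterable[Mapping[str, str]]) -> Dict[Tuple[str, int], int]:
    """Aggregate trips into demand = number of pickups per (date, location)."""
    counts: Dict[Tuple[str, int], int] = defaultdict(int)
    for row in rows:
        day = datetime.strptime(row[PICKUP_TIME], TIME_FORMAT).date().isoformat()
        counts[(day, int(row[PICKUP_LOC]))] += 1
    return dict(counts)


def top_locations(demand: Mapping[Tuple[str, int], int], k: int) -> List[int]:
    """Return the k locations with the highest average daily demand.

    Average daily demand is total pickups divided by the number of days in the
    period; that divisor is shared by all locations, so ranking by totals is
    equivalent.
    """
    totals: Dict[int, int] = defaultdict(int)
    for (_, loc), count in demand.items():
        totals[loc] += count
    return sorted(totals, key=totals.__getitem__, reverse=True)[:k]
```

Chaining the monthly files, e.g. `daily_demand(itertools.chain.from_iterable(read_trips(p) for p in paths))`, streams the records so that no month has to be held in memory at once.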
Among the 20 high-demand locations estimated from the data sets, demand follows mainly two types of distributions: unimodal and bimodal, the latter covering most locations. For the first type, a specific functional form for the density, such as a Gaussian distribution, can be assumed; in other words, parametric methods can be applied, and most SP-related works adopt this approach. For the second type, no particular parametric form provides an appropriate representation of the real density. In such cases, non-parametric or semi-parametric approaches such as KDE or a Gaussian mixture model (GMM) must be considered.
Most parametric methods work well for unimodal distributions but cannot capture bimodal ones, which is why the KDE approach is adopted in this work.
IV-B Stochastic Model vs. Deterministic Model Results
To compare the deterministic model with the SP model under different scenarios, we generate five groups of scenarios for the SP model from the probability distributions derived by KDE, with 20, 50, 100, 200 and 500 scenarios, respectively. Each group is run 10 times under SAA. In addition, we solve the deterministic model using the average demand computed from the training set (919 days) and from the testing set (181 days). The average objective values and elapsed times are shown in Table II.
|Number of Scenarios||Objective Value||Elapsed Time (s)|
|deterministic (average on training set)||$1,325,723||0.24|
|deterministic (average on testing set)||$1,017,054||0.24|
Based on the experimental results, we conclude that the two-stage SP model yields a higher profit than the deterministic model: the objective value of the two-stage SP model is 11.56% and 45.42% higher than that of the deterministic counterpart using training-set and testing-set average demands, respectively. In addition, with average demands, the overall profit on the training set is higher than that on the testing set.
IV-C Validation of Parametric Approaches
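Besides the non-parametric approach, we also use several popular parametric distributions (Gaussian, Laplace and Poisson) as customer demand distributions. The parameters of the Laplace, Gaussian and Poisson distributions are estimated from the sample data by maximum likelihood estimation (MLE), which yields the following estimators:
$$\hat{\mu}_{\mathrm{L}} = \operatorname{median}(x_1, \ldots, x_n), \qquad \hat{b} = \frac{1}{n}\sum_{i=1}^{n} \left| x_i - \hat{\mu}_{\mathrm{L}} \right|,$$
$$\hat{\mu} = \frac{1}{n}\sum_{i=1}^{n} x_i, \qquad \hat{\sigma}^2 = \frac{1}{n}\sum_{i=1}^{n} (x_i - \hat{\mu})^2, \qquad \hat{\lambda} = \frac{1}{n}\sum_{i=1}^{n} x_i,$$
where n denotes the number of samples.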
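These closed-form maximum likelihood estimators are one-liners in Python; the sketch below uses only the standard `statistics` module (`mean`, `pstdev`, `median`), and the function names are hypothetical. Note that the Gaussian MLE uses the 1/n (population) variance rather than the unbiased 1/(n-1) version, and that for an even number of samples any value between the two middle observations maximizes the Laplace likelihood; the midpoint returned by `statistics.median` is one such choice.

```python
import statistics
from typing import Dict, Sequence


def fit_gaussian(data: Sequence[float]) -> Dict[str, float]:
    """Gaussian MLE: sample mean and population (1/n) standard deviation."""
    return {"mu": float(statistics.mean(data)), "sigma": float(statistics.pstdev(data))}


def fit_laplace(data: Sequence[float]) -> Dict[str, float]:
    """Laplace MLE: location = median, scale = mean absolute deviation from it."""
    mu = float(statistics.median(data))
    b = sum(abs(x - mu) for x in data) / len(data)
    return {"mu": mu, "b": b}


def fit_poisson(data: Sequence[float]) -> Dict[str, float]:
    """Poisson MLE: rate = sample mean."""
    return {"lam": float(statistics.mean(data))}
```

Each fitted distribution can then replace the KDE sampler when generating SAA scenarios, which is the comparison reported in Table III.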
|Number of Scenarios||KDE||Gaussian||Laplace||Poisson|
Table III compares KDE with the three parametric approaches. The overall profit obtained with the Gaussian distribution is slightly higher than with the Laplace distribution, and both exceed that obtained with the Poisson distribution. However, all parametric approaches are inferior to the non-parametric KDE in terms of overall profit, yielding on average 3.72%, 4.58% and 11% lower profit, respectively.
IV-D Two-Stage Decision Making
In the two-stage SP model, the solution has two parts: the first-stage decision variables, which denote the number of cars placed at each location (the initial inventory level) before demand is realized, and the second-stage decision variables, which denote the number of cars moved between locations for re-balancing. We design a set of experiments in this subsection.
First, the first-stage decisions are derived from the two-stage SP model using KDE, Poisson, Laplace and Gaussian distributions fitted to the training set (30 months); the results are shown in Table IV, Table V, Table VI and Table VII, respectively. Taking Table IV as an example, rows correspond to the number of scenarios in the SP model and columns to the top 20 locations by demand (in descending order) mentioned above. We conclude that the solutions obtained with KDE are more stable (lower variance) than those obtained with the Poisson, Laplace and Gaussian distributions. In practice, decision-makers can use the average values as the first-stage decisions.
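The stability claim can be quantified in several ways. One simple option, sketched below as an assumption about how such a comparison might be made, is to compute for each location the sample variance of its first-stage allocation across the scenario-count rows (20 to 500 scenarios) and average it over locations, with a lower value meaning more stable; `column_means` gives the per-location averages suggested as practical first-stage decisions. The function names are hypothetical, and the check uses made-up numbers rather than values from Tables IV to VII.

```python
import statistics
from typing import List, Sequence


def column_means(table: Sequence[Sequence[float]]) -> List[float]:
    """Per-location average allocation over the scenario-count rows."""
    return [statistics.mean(col) for col in zip(*table)]


def mean_column_variance(table: Sequence[Sequence[float]]) -> float:
    """Average over locations of the sample variance across scenario-count rows.

    Rows = runs with different numbers of scenarios; columns = locations.
    """
    return float(statistics.mean([statistics.variance(col) for col in zip(*table)]))
```

Comparing this score across the four distribution types gives a single number per approach, which makes the stability ranking easy to check when the number of locations grows.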
|scenario||top 20 locations with highest demands|
|scenario||top 20 locations with highest demands|
|scenario||top 20 locations with highest demands|
|scenario||top 20 locations with highest demands|
Second, after actual demand is revealed, decision-makers must decide how to move vehicles between locations (the second-stage decision). We validate this using a single day's record (2019-01-01) from the testing set, shown in Table VIII. Given the first-stage decisions from KDE, Poisson, Laplace and Gaussian, the resulting second-stage decisions are shown in Table X, Table XI, Table XII and Table XIII, respectively. In each table, rows denote the locations that cars move into and columns denote the locations that cars move out of; each cell gives the number of cars moved between the two locations. For convenience, both rows and columns are restricted to the top 20 locations by demand mentioned above. Note that the first-stage decisions used here come from the 20-scenario runs of the four distribution types; the moving results may differ if the 50-, 100-, 200- or 500-scenario runs are used instead. In this case, the total number of car moves under KDE is much smaller than under the three parametric approaches. Moreover, for a given data set, the distribution type and its parameters have a great impact on the results of the stochastic programming model. For example, Table VII shows that the first-stage decision under Poisson differs considerably from the other three, especially at the first location, which in turn leads to the different second-stage decision shown in Table XIII. These outcomes are based on a single day's record and will differ for other days.
Finally, we investigate the profits of the different approaches over the entire testing set. Specifically, we compute and compare the overall profit using KDE, Gaussian, Laplace and Poisson on the testing set over six months (181 days); the results are shown in Fig. 5, 9. The plots indicate that KDE outperforms the other three approaches in overall profit. On average, the Gaussian and Laplace distributions rank second and third, respectively, with a slight gap to KDE, while the Poisson distribution yields 11% lower profit than KDE. These results are summarized in Table IX.
V Conclusions and Future Work
In this paper, we proposed a data-driven stochastic programming framework, DDKSP, to solve the CSRP using the New York taxi trip record data sets. In practice, the demand distribution is likely to be time-varying and to evolve gradually (or at least its parameters vary over time), which renders the fitted model outdated and degrades the quality of the resulting solutions. To describe this evolution more precisely, we will investigate Bayesian learning, which focuses on the posterior probability distribution obtained from a prior distribution and the likelihood of the current data. Namely, we will explore a dynamic data-driven stochastic programming model for the CSRP.
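One standard way to realize this idea, sketched here under the assumption that the daily demand at a single location is Poisson with an unknown rate, is the conjugate Gamma-Poisson update: a Gamma(α, β) prior (shape α, rate β) combined with observed daily counts x_1, …, x_n gives the posterior Gamma(α + Σx_t, β + n). The class name, the prior values, and the optional `forget` factor (which down-weights older days, a heuristic for drifting demand) are illustrative assumptions, not part of DDKSP.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class GammaPoissonRate:
    """Gamma(shape=alpha, rate=beta) belief about a Poisson demand rate (cars/day)."""
    alpha: float
    beta: float

    def update(self, daily_counts: Iterable[int], forget: float = 1.0) -> "GammaPoissonRate":
        """Return the posterior after observing the given daily counts.

        forget=1.0 is the exact conjugate Bayesian update; forget < 1 discounts
        older evidence before each new day (a heuristic for slowly drifting demand).
        """
        if not 0.0 < forget <= 1.0:
            raise ValueError("forget must be in (0, 1]")
        alpha, beta = self.alpha, self.beta
        for x in daily_counts:
            alpha = forget * alpha + x   # add the observed count to the shape
            beta = forget * beta + 1.0   # one more day of exposure
        return GammaPoissonRate(alpha, beta)

    @property
    def mean(self) -> float:
        """Posterior mean of the demand rate."""
        return self.alpha / self.beta


prior = GammaPoissonRate(alpha=20.0, beta=2.0)  # hypothetical prior, mean 10 cars/day
posterior = prior.update([12, 15, 9])
print(posterior.mean)  # 11.2
drifting = prior.update([12, 15, 9], forget=0.5)
print(drifting.mean)   # 11.0
```

With `forget=1.0` the update is exact Bayesian inference; a value below 1 lets recent days dominate. In a dynamic version of the model, demand scenarios could be drawn using the posterior rather than a fixed fitted distribution.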
Additionally, the proposed framework treats customer demand on a daily basis and can therefore be regarded as an offline data-driven framework. In some applications, such as the taxi dispatch problem, customer demand may fluctuate sharply within hours or even minutes. We therefore plan to explore data-driven optimization frameworks with online learning from real-time data in future work. Moreover, for simplicity, this paper ignores several practical factors. For example, we do not consider the capacity of locations or the route conditions during rebalancing, which may lead to different transportation costs. In future work, we will extend the two-stage SP model to a more practical one.
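As a rough illustration of the online setting, the sketch below maintains an exponentially weighted estimate of demand for each (location, hour) pair and refreshes it as each hourly count arrives, rather than refitting a daily distribution offline. The class, the `smoothing` parameter, and the hourly granularity are hypothetical choices; exponential smoothing is only a simple stand-in for whatever online learning method the future framework adopts.

```python
from typing import Dict, Tuple


class OnlineDemandEstimator:
    """Exponentially weighted demand estimate per (location, hour-of-day), updated online."""

    def __init__(self, smoothing: float = 0.2, initial: float = 0.0) -> None:
        if not 0.0 < smoothing <= 1.0:
            raise ValueError("smoothing must be in (0, 1]")
        self.smoothing = smoothing
        self.initial = initial
        self.estimates: Dict[Tuple[int, int], float] = {}

    def observe(self, location: int, hour: int, count: int) -> float:
        """Blend a newly observed hourly count into the running estimate and return it."""
        key = (location, hour)
        old = self.estimates.get(key, self.initial)
        new = old + self.smoothing * (count - old)
        self.estimates[key] = new
        return new

    def predict(self, location: int, hour: int) -> float:
        """Current estimate; falls back to `initial` for unseen (location, hour) pairs."""
        return self.estimates.get((location, hour), self.initial)


est = OnlineDemandEstimator(smoothing=0.5)
est.observe(location=7, hour=8, count=10)  # estimate becomes 5.0
est.observe(location=7, hour=8, count=6)   # estimate becomes 5.5
print(est.predict(7, 8))  # 5.5
print(est.predict(3, 8))  # 0.0
```

A larger `smoothing` value reacts faster to sudden fluctuations at the cost of noisier estimates.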
-  M. Bruglieri, F. Pezzella, and O. Pisacane, “A two-phase optimization method for a multiobjective vehicle relocation problem in electric carsharing,” Journal of Combinatorial Optimization, vol. 36, pp. 162–193, 2018.
-  S. Shaheen, D. Sperling, and C. Wagner, “Carsharing in Europe and North America: past, present, and future,” 1998.
-  R. Vosooghi, J. Puchinger, M. Jankovic, and G. Sirin, “A critical analysis of travel demand estimation for new one-way carsharing systems,” in 2017 IEEE 20th International Conference on Intelligent Transportation Systems (ITSC). IEEE, 2017, pp. 199–205.
-  S. Illgen and M. Höck, “Literature review of the vehicle relocation problem in one-way car sharing networks,” Transportation Research Part B: Methodological, 2018.
-  R. Cavagnini, L. Bertazzi, F. Maggioni, and M. Hewitt, “A two-stage stochastic optimization model for the bike sharing allocation and rebalancing problem,” 2018.
-  C. Gambella, E. Malaguti, F. Masini, and D. Vigo, “Optimizing relocation operations in electric car-sharing,” Omega, vol. 81, pp. 234–245, 2018.
-  K. Huang, G. H. de Almeida Correia, and K. An, “Solving the station-based one-way carsharing network planning problem with relocations and non-linear demand,” Transportation Research Part C: Emerging Technologies, vol. 90, pp. 1–17, 2018.
-  M. Zhao, X. Li, J. Yin, J. Cui, L. Yang, and S. An, “An integrated framework for electric vehicle rebalancing and staff relocation in one-way carsharing systems: Model formulation and Lagrangian relaxation-based solution approach,” Transportation Research Part B: Methodological, vol. 117, pp. 542–572, 2018.
-  B. Boyacı, K. G. Zografos, and N. Geroliminis, “An optimization framework for the development of efficient one-way car-sharing systems,” European Journal of Operational Research, vol. 240, no. 3, pp. 718–733, 2015.
-  A. Di Febbraro, N. Sacco, and M. Saeednia, “One-way car-sharing profit maximization by means of user-based vehicle relocation,” IEEE Transactions on Intelligent Transportation Systems, vol. 20, no. 2, pp. 628–641, 2018.
-  W. D. Fan, “Optimizing strategic allocation of vehicles for one-way car-sharing systems under demand uncertainty,” in Journal of the Transportation Research Forum, vol. 53, no. 3, 2014.
-  J. Warrington and D. Ruchti, “Two-stage stochastic approximation for dynamic rebalancing of shared mobility systems,” Transportation Research Part C: Emerging Technologies, vol. 104, pp. 110–134, 2019.
-  J. Zhang, F.-Y. Wang, K. Wang, W.-H. Lin, X. Xu, and C. Chen, “Data-driven intelligent transportation systems: A survey,” IEEE Transactions on Intelligent Transportation Systems, vol. 12, no. 4, pp. 1624–1639, 2011.
-  L. Zhu, F. R. Yu, Y. Wang, B. Ning, and T. Tang, “Big data analytics in intelligent transportation systems: A survey,” IEEE Transactions on Intelligent Transportation Systems, vol. 20, no. 1, pp. 383–398, 2018.
-  Y. Lv, Y. Duan, W. Kang, Z. Li, and F.-Y. Wang, “Traffic flow prediction with big data: a deep learning approach,” IEEE Transactions on Intelligent Transportation Systems, vol. 16, no. 2, pp. 865–873, 2014.
-  Y. LeCun, Y. Bengio, and G. Hinton, “Deep learning,” Nature, vol. 521, no. 7553, p. 436, 2015.
-  Y. Bengio, A. Lodi, and A. Prouvost, “Machine learning for combinatorial optimization: a methodological tour d’horizon,” arXiv preprint arXiv:1811.06128, 2018.
-  E. Larsen, S. Lachapelle, Y. Bengio, E. Frejinger, S. Lacoste-Julien, and A. Lodi, “Predicting solution summaries to integer linear programs under imperfect information with machine learning,” arXiv preprint arXiv:1807.11876, 2018.
-  C. Ning and F. You, “Data-driven stochastic robust optimization: General computational framework and algorithm leveraging machine learning for optimization under uncertainty in the big data era,” Computers & Chemical Engineering, vol. 111, pp. 115–133, 2018.
-  C. Shang and F. You, “Distributionally robust optimization for planning and scheduling under uncertainty,” Computers & Chemical Engineering, vol. 110, pp. 53–68, 2018.
-  S. Faridimehr, S. Venkatachalam, and R. B. Chinnam, “A stochastic programming approach for electric vehicle charging network design,” IEEE Transactions on Intelligent Transportation Systems, vol. 20, no. 5, pp. 1870–1882, 2018.
-  M. Cocca, D. Giordano, M. Mellia, and L. Vassio, “Free floating electric car sharing: A data driven approach for system design,” IEEE Transactions on Intelligent Transportation Systems, 2019.
-  ——, “Data driven optimization of charging station placement for EV free floating car sharing,” in 2018 21st International Conference on Intelligent Transportation Systems (ITSC). IEEE, 2018, pp. 2490–2495.
-  X. Huo, X. Wu, M. Li, N. Zheng, and G. Yu, “The allocation problem of electric car-sharing system: A data-driven approach,” Transportation Research Part D: Transport and Environment, vol. 78, p. 102192, 2020.
-  A. Ben-Tal, L. El Ghaoui, and A. Nemirovski, Robust optimization. Princeton University Press, 2009, vol. 28.
-  E. Delage and Y. Ye, “Distributionally robust optimization under moment uncertainty with application to data-driven problems,” Operations Research, vol. 58, no. 3, pp. 595–612, 2010.
-  C. M. Bishop, Neural networks for pattern recognition. Oxford University Press, 1995.
-  T. Santoso, S. Ahmed, M. Goetschalckx, and A. Shapiro, “A stochastic programming approach for supply chain network design under uncertainty,” European Journal of Operational Research, vol. 167, no. 1, pp. 96–115, 2005.
-  J. F. Benders, “Partitioning procedures for solving mixed-variables programming problems,” Computational Management Science, vol. 2, no. 1, pp. 3–19, 2005.
-  C. Ning and F. You, “Optimization under uncertainty in the era of big data and deep learning: When machine learning meets mathematical programming,” Computers & Chemical Engineering, 2019.
Appendix A Moving between Locations Based on the First-Stage Decision