Car-sharing, a concept that dates from the middle of the 20th century Shaheen et al. (1998), is growing rapidly in many cities. Its popularity stems from the benefits of the sharing system, such as saving parking space and reducing traffic congestion and air pollution Bruglieri et al. (2018). To use a car-sharing service, a customer normally reserves a vehicle by phone or over the Internet. Once the reservation is approved, the vehicle is assigned to the customer, who picks it up at an appointed time and returns it to a car-sharing location. The return location may be the same as the pick-up point (round-trip car-sharing systems), a different station (one-way car-sharing systems), or anywhere in a specified zone (free-floating car-sharing systems) Vosooghi et al. (2017); Illgen and Höck (2018). Typically, car-sharing systems are financed by public and/or private entities and managed by service providers, who are involved in strategic, tactical, and operational decision-making. Strategic decisions include determining the number, location, and capacity of stations for car rental and return, whereas tactical decisions include allocation decisions. Daily operational decisions include determining how to periodically redistribute cars among stations.
Nowadays, with the rapid development of urban transportation, a huge amount of data is generated every day. Emerging technologies such as wireless sensor networks (WSN), cloud computing, and Big Data make it possible to collect, store, and analyze these data effectively and efficiently. However, the growing volume of data brings new challenges to traditional car-sharing optimization.
To address these challenges, several optimization approaches are under investigation, including deterministic modeling and optimization under uncertainty.
Among deterministic models, Gambella et al. (2018) propose an MIP model along with two heuristic algorithms to optimize the electric vehicle relocation problem. Huang et al. (2018) investigate one-way station-based relocation under non-linear demand and propose a mixed-integer non-linear model. Xu et al. (2018) study the electric vehicle fleet size and trip pricing (EVFS&TP) problem for one-way car-sharing services, taking into account the practical requirements of vehicle relocation and personnel assignment; they develop a mixed-integer nonlinear and nonconvex programming model. Li et al. (2016) focus on the Share-a-Ride Problem (SARP), which aims to maximize the profit of serving a set of passengers and parcels with a set of homogeneous vehicles, and devise an adaptive large neighborhood search heuristic. Zhao et al. (2018) devise an integrated framework to minimize the total cost, including EV and staff investment, EV rebalancing, and staff relocation costs; the model is reformulated and solved by a Lagrangian relaxation approach. Boyacı et al. (2015) explore one-way vehicle-sharing systems, taking vehicle relocation and electric vehicle charging requirements into consideration; a multi-objective optimization model is developed and solved by branch-and-bound.
Among stochastic programming models, Brandstätter et al. (2017) solve strategic optimization problems for car-sharing systems with electric cars using a two-stage stochastic programming model, and use a heuristic algorithm to tackle large-scale instances. Biondi et al. (2016) optimize a car-sharing system with uncertain demands from the perspective of queueing theory. Fan and Machemehl (2004) consider the stochastic dynamic vehicle allocation problem (SDVAP) and formulate a multi-stage stochastic programming model to maximize profits and to manage fleets of vehicles in both time and space. Later, Fan (2014) develops a stochastic programming model to optimize the strategic allocation of vehicles for one-way car-sharing systems under demand uncertainty. Cavagnini et al. (2018) propose a two-stage stochastic optimization model for a bike-sharing system that comprises one depot and multiple capacitated stations.
Although the aforementioned works address the car-sharing problem from different perspectives, most of them model it deterministically, without uncertain parameters such as demand, supply, or travel time. Only a few use optimization-under-uncertainty techniques, and none of them exploit accurate probability information from historical data. Moreover, most SP-based mathematical models assume that the probability distribution is known and of a specific type. In real historical data, however, the distribution may require many, even infinitely many, parameters and cannot be described by a simple known distribution such as the Gaussian. We discuss this topic further in the numerical experiment section.
There are several ways to hedge against uncertainty using optimization techniques. In stochastic programming (SP) Birge and Louveaux (2011), uncertainty is modeled through discrete or continuous probability distributions; in other words, SP models rely heavily on probability information from historical data. In fuzzy programming (FP) Zimmermann (1978), uncertain parameters are treated as fuzzy numbers and constraints as fuzzy sets. In robust optimization (RO) Ben-Tal et al. (2009), uncertainty is described by a set of possible realizations called the uncertainty set. In distributionally robust optimization (DRO) Delage and Ye (2010), uncertainty is described by an ambiguity set, which contains a family of probability distributions. In this work, we are primarily interested in extracting accurate probability distribution information from historical data. To this end, we use two-stage SP to solve the car-sharing problem.
To overcome the aforementioned issue, we use machine learning to make the SP model more practical. Recently, integrating machine learning (ML) with optimization techniques has become a trend in the operations research (OR) community Bengio et al. (2018); Larsen et al. (2018). A few researchers have leveraged ML to make optimization more realistic, especially in big data and data-driven optimization Ning and You (2018); Shang and You (2018). Our work follows this trend to solve the car-sharing problem. Specifically, we propose a framework with two major components. In the ML part, we use a non-parametric approach, kernel density estimation (KDE), to extract a more accurate probability distribution from historical data; in the OR part, stochastic programming models are constructed from that distribution. To the best of our knowledge, this is the first framework of this kind for the car-sharing problem under demand uncertainty. The contributions of this work can be summarized as follows.
- We use a non-parametric approach, kernel density estimation, to extract arbitrary probability distributions of user demand from historical data, namely the New York taxi trip data set.
- We propose a two-stage stochastic programming model that uses this probability distribution information to formulate the car-sharing problem.
- We integrate the sample average approximation method with the Benders decomposition algorithm to solve the two-stage stochastic programming model.
The rest of the paper is organized as follows. Section 2 describes the problem. Section 3 formulates the deterministic and two-stage stochastic models. Section 4 describes the solution framework, which combines the sample average approximation (SAA) method with the Benders decomposition algorithm. Data preprocessing and numerical experiments are presented in Section 5. Finally, Section 6 concludes the paper and outlines future work.
2 Problem Description
In this article, we address the car-sharing problem under demand uncertainty using a two-stage stochastic programming model. The objective is to maximize the overall profit, which accounts for total revenue, holding costs at each location, and moving costs between locations. We study a car-sharing system managed by a service provider, in which decision-making is centralized. The decisions are divided into two stages. In the first stage, at the beginning of the day (e.g., at 12:00 AM), the number of vehicles at each location must be determined. In the second stage, after the real demand is revealed (e.g., once no new orders for the day are accepted), the truck carriers must decide how many vehicles to relocate between locations.
The most critical modeling concern in this work is how to represent uncertainty. For convenience, only customer demands are treated as uncertain parameters. In the SP paradigm, uncertain parameters are normally modeled as random variables with specific probability distributions extracted from historical data. Unlike existing works, which assume that the uncertain parameters follow a known probability distribution such as a Gaussian, Poisson, or log-normal distribution, we allow the uncertain parameters to follow a distribution of any type, including non-parametric ones.
In the two-stage SP model, the decision variables are divided into two groups: the first-stage (here-and-now) decision variables, which must be determined before the real demands are revealed, and the second-stage (wait-and-see) decision variables, which are determined after the real demands are realized. Based on the problem statement, we make the following assumptions.
- The allocation resource is limited and cannot satisfy all demands.
- The holding cost incurred depends on the number of vehicles and on the location.
- The moving cost depends on the specific route.
- The car-sharing service must finish within one day.
3 Model Formulation
In this section, we present the car-sharing model formulations: the deterministic model and its two-stage SP counterpart. Note that the SP model requires probability distributions. Unlike most existing works, which assume that the probability distributions of the uncertain parameters are known, we obtain the distribution information from a non-parametric learning approach, kernel density estimation. For clarity, the notation is listed below.
Set of locations (regional origins and/or destinations), indexed by i and j.
Set of scenarios, indexed by s.
Holding cost at location i.
Moving cost from location i to location j.
Average demand at location i.
First-stage decision variable: the number of vehicles at location i.
Random Variables (for stochastic programming model)
Random demand: the number of cars that will be picked up by customers at location i.
Probability of scenario s.
Second-stage decision variable: the number of vehicles moved from location i to location j under scenario s.
3.1 Deterministic Model
In the deterministic model, we allocate the limited vehicles to different locations in order to maximize the overall profit. For convenience, we use the average demands. The deterministic model for the car-sharing problem can be formulated as follows.
The objective function (1) maximizes the overall profit, which equals the difference between total revenue and total holding cost. Constraint (2) ensures that the total number of vehicles does not exceed the capacity, which can be easily estimated from historical data. Constraints (3) guarantee that each location satisfies the customer demand. Constraints (4) and (5) define the types of the decision variables.
Although the deterministic model is simple to solve, using average demands may lead to an optimal solution that carries high risk or is even infeasible once actual demands are realized. Additionally, the objective function (1) is piecewise linear, so it must be reformulated as a linear function before solving.
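The linearization step can be illustrated with a short worked form. This is only a sketch under the assumption that the revenue term in (1) has the form $r\min(x_i, d_i)$. The symbols $r$ (revenue per rented vehicle), $h_i$ (holding cost), $d_i$ (average demand), $x_i$ (vehicles placed at location $i$), and the auxiliary variable $z_i$ are names chosen here for illustration and may differ from the original notation.

```latex
% Piecewise-linear objective (assumed form of the revenue term)
\[
\max_{x}\ \sum_{i} \bigl( r \min(x_i, d_i) - h_i x_i \bigr)
\]
% Equivalent linear objective with auxiliary variables z_i
\[
\max_{x,\,z}\ \sum_{i} \bigl( r z_i - h_i x_i \bigr)
\quad \text{s.t.} \quad z_i \le x_i,\quad z_i \le d_i,\quad z_i \ge 0 \quad \forall i
\]
```

Because $r > 0$ and the problem is a maximization, each $z_i$ is pushed up to $\min(x_i, d_i)$ at an optimum, so both forms attain the same optimal value once the remaining constraints of the model are added unchanged.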
3.2 Two-Stage Stochastic Programming Model
The two-stage stochastic programming model of the car-sharing problem can be formulated as follows.
As in the deterministic model, the objective function (6) maximizes the overall profit, given by the difference between the revenue and the overall cost (the sum of the holding cost and the moving/transferring cost). Constraint (7) states that the number of vehicles cannot exceed the capacity of the car-sharing firm. Constraint (8) implies that the total number of cars moved from location i to the other locations cannot exceed the number of cars at location i. Constraints (9) and (10) define the types of the decision variables.
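One possible algebraic reading of this description is sketched below, with tags that follow the numbering used in the text. It is an illustration, not the original formulation: the symbols $r$ (revenue per rented vehicle), $h_i$ (holding cost), $c_{ij}$ (moving cost), $p_s$ (scenario probability), $\xi_{is}$ (demand at location $i$ in scenario $s$), $V$ (fleet capacity), $x_i$, and $y_{ijs}$ are assumed names, and both the form of the revenue term and the integrality of the variables are assumptions.

```latex
\begin{align}
\max_{x,\,y}\quad
  & \sum_{s} p_s \Bigl[\, r \sum_{i} \min\bigl(a_{is},\, \xi_{is}\bigr)
      - \sum_{i}\sum_{j} c_{ij}\, y_{ijs} \Bigr] - \sum_{i} h_i x_i \tag{6} \\
\text{s.t.}\quad
  & \sum_{i} x_i \le V \tag{7} \\
  & \sum_{j} y_{ijs} \le x_i && \forall i,\ s \tag{8} \\
  & x_i \in \mathbb{Z}_{\ge 0} && \forall i \tag{9} \\
  & y_{ijs} \in \mathbb{Z}_{\ge 0} && \forall i,\ j,\ s \tag{10}
\end{align}
% Assumed shorthand: vehicles available at location i after relocation in scenario s
% a_{is} = x_i - \sum_{j} y_{ijs} + \sum_{j} y_{jis}
```

Under this reading, the $\min$ term makes the objective piecewise linear, which is consistent with the need for an auxiliary variable in the reformulation step.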
4.1 Model Reformulation
Unlike the deterministic model, which off-the-shelf commercial solvers can solve effectively, the two-stage SP model normally requires reformulation because a continuous probability distribution contains infinitely many scenarios. In this paper, we use sample average approximation (SAA) Santoso et al. (2005), a Monte Carlo method, to reformulate the two-stage SP model. Several variants of SAA appear under different names Geyer and Thompson (1992); Mak et al. (1999); Plambeck et al. (1996); Shapiro and Homem-de Mello (1998). To reduce computation, we consider a simplified version. The SAA procedure can be summarized as follows.
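A generic SAA loop of this kind might look like the following minimal sketch. It is not the authors' procedure: it assumes a toy single-location problem without relocation, a hypothetical profit function with revenue 100 and holding cost 20, and it replaces the solver with enumeration over integer decisions. All function names are illustrative.

```python
import random
import statistics


def profit(x, demand, revenue=100.0, holding=20.0):
    """Profit at one location: revenue on rented cars minus holding cost."""
    return revenue * min(x, demand) - holding * x


def average_profit(x, scenarios):
    """Sample-average objective for a fixed first-stage decision x."""
    return sum(profit(x, d) for d in scenarios) / len(scenarios)


def solve_saa_problem(scenarios, capacity):
    """Solve one SAA problem by enumerating integer x (stand-in for a solver)."""
    x_best = max(range(capacity + 1), key=lambda x: average_profit(x, scenarios))
    return x_best, average_profit(x_best, scenarios)


def saa(sample_demand, n_scenarios, n_replications, n_eval, capacity, rng):
    """Return (chosen x, out-of-sample estimate, mean in-sample optimum)."""
    solutions = []
    for _ in range(n_replications):
        scenarios = [sample_demand(rng) for _ in range(n_scenarios)]
        solutions.append(solve_saa_problem(scenarios, capacity))
    # Mean of the replication optima: an optimistic estimate for a maximization.
    in_sample = statistics.mean(value for _, value in solutions)
    # Re-evaluate each candidate on a larger independent sample and keep the best.
    eval_sample = [sample_demand(rng) for _ in range(n_eval)]
    candidates = sorted({x for x, _ in solutions})
    x_star = max(candidates, key=lambda x: average_profit(x, eval_sample))
    return x_star, average_profit(x_star, eval_sample), in_sample


rng = random.Random(42)
x_star, estimate, in_sample = saa(
    lambda g: max(0, round(g.gauss(50, 15))), 100, 10, 5000, 120, rng)
print(x_star, round(estimate, 2), round(in_sample, 2))
```

The gap between the in-sample mean and the out-of-sample estimate gives a rough indication of whether the number of scenarios is large enough.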
Note that in the SAA reformulation, the objective function becomes
where N is the number of scenarios. The objective function is still non-linear, so we introduce an auxiliary variable to transform it into a linear one. The two-stage SP model then becomes
4.2 Model Decomposition
After the final reformulation, the two-stage SP model becomes a very large-scale deterministic model. For example, with 50 locations and 1000 scenarios, the number of second-stage decision variables is 50 × 50 × 1000 = 2,500,000. Solving such a large-scale model effectively requires a decomposition algorithm. In this section, we use Benders decomposition Benders (2005). For convenience, we neglect the constant N in the following. We then divide the reformulated model into a master problem (MP)
and a subproblem (SP) in dual form
and are the dual variables of the SP, while and are fixed values determined by the MP. In each iteration, the MP adjusts these values and passes them to the SP. Finally, the algorithm can be summarized as follows.
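The interplay between MP and SP can be sketched on a toy instance. The code below is an illustration under strong assumptions, not the authors' algorithm: it uses a single location without relocation, so that for a fixed x the scenario subproblem `max r*z s.t. z <= x, z <= d_s, z >= 0` has the dual `min u*x + v*d_s s.t. u + v >= r, u, v >= 0`, which can be solved in closed form. The master problem is solved by enumeration instead of a MIP solver, and all names are hypothetical.

```python
def benders_single_location(demands, revenue, holding, capacity,
                            tol=1e-9, max_iter=50):
    """Multi-cut Benders loop for max -h*x + (1/N) * sum_s r*min(x, d_s)."""
    n = len(demands)
    # cuts[s] stores dual points (u, v); each yields theta_s <= u*x + v*d_s.
    # The dual-feasible point (0, r) gives the initial bound theta_s <= r*d_s.
    cuts = [[(0.0, revenue)] for _ in demands]
    best_x, lower = 0, float("-inf")

    def master_value(x):
        # theta_s is the tightest cut for scenario s at this x.
        theta = sum(min(u * x + v * d for u, v in cuts[s])
                    for s, d in enumerate(demands))
        return -holding * x + theta / n

    for _ in range(max_iter):
        # Master problem: enumerate integer x (stand-in for a MIP solver).
        x = max(range(capacity + 1), key=master_value)
        upper = master_value(x)  # relaxation value, an upper bound
        # Subproblems: closed-form optimal duals for the fixed x.
        value = -holding * x
        for s, d in enumerate(demands):
            u, v = (revenue, 0.0) if x <= d else (0.0, revenue)
            cuts[s].append((u, v))       # new optimality cut
            value += (u * x + v * d) / n  # true second-stage value at x
        if value > lower:
            best_x, lower = x, value     # best feasible solution so far
        if upper - lower <= tol:
            break
    return best_x, lower


x_opt, profit_opt = benders_single_location([3, 5, 8, 10], 10.0, 4.0, 12)
print(x_opt, profit_opt)
```

In the full model the subproblem would be a relocation LP per scenario and the dual values would come from the solver, but the bound-tightening loop has the same shape.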
5 Numerical Experiment
Experiment Setup. All algorithms (KDE, SAA, and BD) are implemented in Python 3.7, and the mathematical models are solved with Gurobi 8.1 on a machine with an Intel i7 CPU and 16 GB RAM running Windows 10.
Experiment Design. We devise a group of experiments. After data preprocessing and distribution estimation by KDE, we first examine the running times and expected profits for different numbers of scenarios and compare the outcomes of the deterministic model with those of the two-stage SP model. Second, we compare the results obtained with the non-parametric KDE approach against those obtained with several parametric distributions, such as the Gaussian distribution, in terms of expected profit. These experiments are based on the training set. Finally, we compare the expected values obtained on the training set with those on the testing set. Specifically, we fix the first-stage decision variables at the values obtained from the training set and then compute the overall expected value when the demands come from the testing set.
5.1 Data Preprocessing
The data come from the New York taxi trip records (https://www1.nyc.gov/site/tlc/about/tlc-trip-record-data.page). We collected three years (July 2016 to June 2019) of green taxi trip records, which are archived by month, as the data source. We then split the three years of data into a training set (July 2016 to December 2018) and a testing set (January 2019 to June 2019). Each monthly data set contains a large number of raw single-trip records with a complex structure. Take the data set 2018-01 for example: it contains 793,529 records and 19 attributes. It is worth noting that the deterministic parameters in our SP model, such as the revenue and the transferring cost, can be estimated easily from the data set. For convenience, in the following experiments the revenue per car is set to $100, the transferring cost is set to $5 by rough estimation, the number of available vehicles is set to 15,000, and the holding cost is assumed to follow the Gaussian distribution with parameters . Additionally, the data set divides New York City into 259 different locations; we pick the 20 locations with the highest demands and aggregate their demands by day (914 days for the training set and 181 days for the testing set). The pickup and drop-off locations are stored as location IDs in the PULocationID and DOLocationID attributes, respectively. Details of the New York City location division can be found at https://data.world/nyc-taxi-limo/taxi-zone-lookup.
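The daily aggregation described above could be done along the following lines. This is a minimal sketch using only the standard library. The column name `lpep_pickup_datetime` and its `YYYY-MM-DD hh:mm:ss` format are assumptions about the green-taxi schema, the in-memory sample is made up for illustration, and days without any pickup simply do not appear in the result.

```python
import csv
import io
from collections import Counter, defaultdict


def daily_demand_by_location(csv_file, top_k=20):
    """Count pickups per (location, day) and keep the top_k busiest locations."""
    totals = Counter()            # total pickups per location
    daily = defaultdict(Counter)  # pickups per location and calendar day
    for row in csv.DictReader(csv_file):
        location = row["PULocationID"]
        day = row["lpep_pickup_datetime"][:10]  # assumed 'YYYY-MM-DD' prefix
        totals[location] += 1
        daily[location][day] += 1
    top = [loc for loc, _ in totals.most_common(top_k)]
    return {loc: dict(daily[loc]) for loc in top}


# Tiny made-up sample standing in for one monthly trip file.
sample = io.StringIO(
    "lpep_pickup_datetime,PULocationID,DOLocationID\n"
    "2018-01-01 00:18:50,74,41\n"
    "2018-01-01 09:05:11,74,75\n"
    "2018-01-01 10:30:00,41,74\n"
    "2018-01-02 08:45:27,74,166\n"
)
demand = daily_demand_by_location(sample, top_k=1)
print(demand)
```

For the real monthly files, the same function can be applied to an opened file object and the per-month results merged before selecting the top locations.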
5.2 Probability Distribution and Sampling Results
After applying KDE, the demand probability distribution of each location is illustrated in the following figure, in which the bars denote the raw demand and the curves are the approximate distributions derived by KDE.
Among the 20 highest-demand locations, there are mainly two types of distribution. One is unimodal, as can be seen in Figure 1. The other, which covers most locations, is bimodal, as can be seen in the following figure.
For the first type, a specific functional form for the density, such as the Gaussian distribution, can be assumed; in other words, parametric methods can be applied in these cases. For the second type, a particular parametric form cannot adequately represent the real density, and we must use non-parametric approaches such as KDE.
Most parametric methods work well for unimodal distributions but not for bimodal ones. That is why the KDE approach is introduced in this work.
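To make the role of KDE concrete, the sketch below fits a Gaussian-kernel density estimate to a synthetic bimodal demand history and draws scenarios from it. It is an illustration only: the data are simulated, the bandwidth uses the normal-reference rule of thumb (the paper does not state its bandwidth choice), and clipping samples at zero is a simplification introduced here.

```python
import math
import random
import statistics


def rule_of_thumb_bandwidth(data):
    """Normal-reference rule of thumb: h = 1.06 * sigma * n^(-1/5)."""
    return 1.06 * statistics.stdev(data) * len(data) ** (-0.2)


def kde_pdf(x, data, h):
    """Gaussian-kernel density estimate at point x."""
    norm = len(data) * h * math.sqrt(2.0 * math.pi)
    return sum(math.exp(-0.5 * ((x - xi) / h) ** 2) for xi in data) / norm


def kde_sample(data, h, size, rng):
    """Draw scenarios from the KDE: pick an observation, add N(0, h^2) noise,
    and clip at zero because demand cannot be negative."""
    return [max(0.0, rng.gauss(rng.choice(data), h)) for _ in range(size)]


rng = random.Random(0)
# Synthetic bimodal "daily demand" with a low and a high regime.
history = ([rng.gauss(200, 20) for _ in range(300)]
           + [rng.gauss(500, 40) for _ in range(200)])
h = rule_of_thumb_bandwidth(history)
scenarios = kde_sample(history, h, 1000, rng)
print(round(h, 1), round(kde_pdf(200, history, h), 5), round(kde_pdf(350, history, h), 5))
```

A single Gaussian fitted to the same history would place most of its mass between the two regimes, which is exactly where the estimated density is lowest.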
5.3 Stochastic Model vs. Deterministic Model Results
In this experiment, we generate 5 groups of scenarios for the SP model from the probability distributions derived by KDE. The numbers of scenarios are 20, 50, 100, 200, and 500, and each group is run 10 times. Additionally, we solve the deterministic model using the average demands calculated from the training set (average over 914 days) and from the testing set (average over 181 days). The average objective values and elapsed times are shown in the table below.
| Number of scenarios | Objective value | Elapsed time (s) |
| --- | --- | --- |
| deterministic (average on training set) | $1,325,723 | 0.24 |
| deterministic (average on testing set) | $1,017,054 | 0.24 |
Based on the experimental results, we conclude that the two-stage SP model yields a higher profit than the deterministic model. With average demands, the overall profit on the training set is higher than that on the testing set. Meanwhile, as discussed at the beginning of this section, the elapsed time grows approximately linearly as the number of scenarios increases.
5.4 Validations on Parametric Distributions
Besides the non-parametric approach, we also evaluate several popular parametric distributions (Gaussian, lognormal, Laplace, and exponential) in terms of average expected value. Among these, the SP models based on the exponential and lognormal distributions turn out to be infeasible. The reason is that a high average demand at a specific location produces samples with extremely high demand, which renders the SP model infeasible.
| Number of scenarios | Outcome (Laplace) | Outcome (Gaussian) |
As can be seen from the table, the overall profit obtained with the Laplace distribution is slightly better than that obtained with the Gaussian distribution. However, both parametric approaches are inferior to KDE in terms of overall profit.
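The parametric baselines in this comparison can be sketched as follows. This is a hedged illustration, not the authors' code: it assumes maximum-likelihood fits, uses a simulated demand history, and samples the Laplace distribution as the difference of two unit exponentials. Printing the largest draw per family gives a feel for how heavy-tailed fits can generate extreme demands.

```python
import math
import random
import statistics


def fit_gaussian(data):
    """MLE for a Gaussian: sample mean and population standard deviation."""
    return statistics.mean(data), statistics.pstdev(data)


def fit_laplace(data):
    """MLE for a Laplace: median and mean absolute deviation from it."""
    loc = statistics.median(data)
    return loc, sum(abs(x - loc) for x in data) / len(data)


def fit_exponential(data):
    """MLE for an exponential: rate = 1 / sample mean."""
    return 1.0 / statistics.mean(data)


def fit_lognormal(data):
    """MLE for a lognormal: mean and population sd of the log data (data > 0)."""
    logs = [math.log(x) for x in data]
    return statistics.mean(logs), statistics.pstdev(logs)


def sample_laplace(loc, scale, rng):
    """Laplace draw as the difference of two unit-rate exponentials."""
    return loc + scale * (rng.expovariate(1.0) - rng.expovariate(1.0))


rng = random.Random(1)
history = [max(1.0, rng.gauss(700, 60)) for _ in range(500)]  # simulated demand
mu, sigma = fit_gaussian(history)
loc, scale = fit_laplace(history)
rate = fit_exponential(history)
log_mu, log_sigma = fit_lognormal(history)
draws = {
    "gaussian": [rng.gauss(mu, sigma) for _ in range(1000)],
    "laplace": [sample_laplace(loc, scale, rng) for _ in range(1000)],
    "exponential": [rng.expovariate(rate) for _ in range(1000)],
    "lognormal": [rng.lognormvariate(log_mu, log_sigma) for _ in range(1000)],
}
for name, values in draws.items():
    print(name, round(max(values)))
```

The exponential fit is the clearest case: its standard deviation equals its mean, so a location with a high average demand produces occasional draws several times that average.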
5.5 Solutions Comparisons on Testing Sets
In the two-stage SP model, a solution has two parts: the first-stage decision variables, which denote the number of cars placed at each location before demands are realized, and the second-stage decision variables, which denote the number of cars moved between locations. In this experiment, we take the first-stage decision values obtained from the two-stage SP model built on the training set and use them to evaluate the overall profit of the two-stage SP model built on the testing set (6 months, 181 days).
As can be seen from the results, with the first-stage decision variables fixed, the gap between the outcomes on the training and testing sets ranges from 0.35% to 46.57%, with an average of 22.1%. Meanwhile, comparing with Table 1, we conclude that most profits obtained by the SP model on the testing set are higher than the profit obtained by the deterministic model on the testing set.
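The evaluation protocol can be illustrated on a toy case. The sketch below assumes a single location without relocation, a hypothetical revenue of 100 and holding cost of 20, and defines the gap as the absolute difference relative to the training value; the paper does not state its exact gap formula, so that definition is an assumption.

```python
def expected_profit(x, demands, revenue=100.0, holding=20.0):
    """Average profit of a fixed first-stage decision x over observed demands."""
    return sum(revenue * min(x, d) - holding * x for d in demands) / len(demands)


def relative_gap(train_value, test_value):
    """Assumed gap definition: |train - test| / |train|."""
    return abs(train_value - test_value) / abs(train_value)


train_demands = [40, 50, 60]  # made-up training observations
test_demands = [30, 45, 50]   # made-up testing observations

# First stage: choose x on the training data (enumeration as a solver stand-in).
x_fixed = max(range(101), key=lambda x: expected_profit(x, train_demands))
train_value = expected_profit(x_fixed, train_demands)
# Out-of-sample check: keep x fixed and re-evaluate on the testing data.
test_value = expected_profit(x_fixed, test_demands)
gap = relative_gap(train_value, test_value)
print(x_fixed, train_value, round(test_value, 2), round(100 * gap, 2))
```

Only the first-stage decision is carried over; in the full model the second-stage relocation would be re-optimized for each testing scenario.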
6 Conclusions and Future Work
In this paper, we propose a framework that combines kernel density estimation, to estimate the location demands, with a two-stage stochastic programming model to solve the car-sharing problem under demand uncertainty. In the real world, the demand distribution is time-variant and evolves gradually (or at least its parameters vary), which renders the data-driven system outdated and degrades the resulting solution quality Ning and You (2019). To describe this evolution more precisely, we will investigate Bayesian learning, which focuses on the posterior probability distribution obtained from the prior probability distribution and the likelihood of the current data. Namely, we will explore a dynamic data-driven stochastic programming model for the car-sharing problem.
Additionally, the proposed framework aggregates location demands by day. For some real-time applications, however, the demand within a day should be treated as time-series data and handled by time-series prediction algorithms from machine learning. We will explore this topic in future work.
Finally, for convenience, this paper leaves some factors out. For example, we do not consider the capacity of locations or the route conditions during rebalancing, which may lead to different transportation costs. Later on, we will extend the two-stage SP model to a more practical one.
References
- Shaheen et al. (1998) S. Shaheen, D. Sperling, C. Wagner, Carsharing in Europe and North America: past, present, and future (1998).
- Bruglieri et al. (2018) M. Bruglieri, F. Pezzella, A two-phase optimization method for a multiobjective vehicle relocation problem in electric carsharing systems, Journal of Combinatorial Optimization 36 (2018) 162–193.
- Vosooghi et al. (2017) R. Vosooghi, J. Puchinger, M. Jankovic, G. Sirin, A critical analysis of travel demand estimation for new one-way carsharing systems, in: 2017 IEEE 20th International Conference on Intelligent Transportation Systems (ITSC), IEEE, pp. 199–205.
- Illgen and Höck (2018) S. Illgen, M. Höck, Literature review of the vehicle relocation problem in one-way car sharing networks, Transportation Research Part B: Methodological (2018).
- Gambella et al. (2018) C. Gambella, E. Malaguti, F. Masini, D. Vigo, Optimizing relocation operations in electric car-sharing, Omega 81 (2018) 234–245.
- Huang et al. (2018) K. Huang, G. H. de Almeida Correia, K. An, Solving the station-based one-way carsharing network planning problem with relocations and non-linear demand, Transportation Research Part C: Emerging Technologies 90 (2018) 1–17.
- Xu et al. (2018) M. Xu, Q. Meng, Z. Liu, Electric vehicle fleet size and trip pricing for one-way carsharing services considering vehicle relocation and personnel assignment, Transportation Research Part B: Methodological 111 (2018) 60–82.
- Li et al. (2016) B. Li, D. Krushinsky, T. Van Woensel, H. A. Reijers, An adaptive large neighborhood search heuristic for the share-a-ride problem, Computers & Operations Research 66 (2016) 170–180.
- Zhao et al. (2018) M. Zhao, X. Li, J. Yin, J. Cui, L. Yang, S. An, An integrated framework for electric vehicle rebalancing and staff relocation in one-way carsharing systems: Model formulation and lagrangian relaxation-based solution approach, Transportation Research Part B: Methodological 117 (2018) 542–572.
- Boyacı et al. (2015) B. Boyacı, K. G. Zografos, N. Geroliminis, An optimization framework for the development of efficient one-way car-sharing systems, European Journal of Operational Research 240 (2015) 718–733.
- Brandstätter et al. (2017) G. Brandstätter, M. Kahr, M. Leitner, Determining optimal locations for charging stations of electric car-sharing systems under stochastic demand, Transportation Research Part B: Methodological 104 (2017) 17–35.
- Biondi et al. (2016) E. Biondi, C. Boldrini, R. Bruno, Optimal deployment of stations for a car sharing system with stochastic demands: A queueing theoretical perspective, in: 2016 IEEE 19th International Conference on Intelligent Transportation Systems (ITSC), IEEE, pp. 1089–1095.
- Fan and Machemehl (2004) W. Fan, R. Machemehl, A multi-stage monte carlo sampling based stochastic programming model for the dynamic vehicle allocation problem, Technical Report, 2004.
- Fan (2014) W. D. Fan, Optimizing strategic allocation of vehicles for one-way car-sharing systems under demand uncertainty, in: Journal of the Transportation Research Forum, volume 53.
- Cavagnini et al. (2018) R. Cavagnini, L. Bertazzi, F. Maggioni, M. Hewitt, A two-stage stochastic optimization model for the bike sharing allocation and rebalancing problem, 2018.
- Birge and Louveaux (2011) J. R. Birge, F. Louveaux, Introduction to stochastic programming, Springer Science & Business Media, 2011.
- Zimmermann (1978) H.-J. Zimmermann, Fuzzy programming and linear programming with several objective functions, Fuzzy Sets and Systems 1 (1978) 45–55.
- Ben-Tal et al. (2009) A. Ben-Tal, L. El Ghaoui, A. Nemirovski, Robust optimization, volume 28, Princeton University Press, 2009.
- Delage and Ye (2010) E. Delage, Y. Ye, Distributionally robust optimization under moment uncertainty with application to data-driven problems, Operations Research 58 (2010) 595–612.
- Bengio et al. (2018) Y. Bengio, A. Lodi, A. Prouvost, Machine learning for combinatorial optimization: a methodological tour d’horizon, arXiv preprint arXiv:1811.06128 (2018).
- Larsen et al. (2018) E. Larsen, S. Lachapelle, Y. Bengio, E. Frejinger, S. Lacoste-Julien, A. Lodi, Predicting solution summaries to integer linear programs under imperfect information with machine learning, arXiv preprint arXiv:1807.11876 (2018).
- Ning and You (2018) C. Ning, F. You, Data-driven stochastic robust optimization: General computational framework and algorithm leveraging machine learning for optimization under uncertainty in the big data era, Computers & Chemical Engineering 111 (2018) 115–133.
- Shang and You (2018) C. Shang, F. You, Distributionally robust optimization for planning and scheduling under uncertainty, Computers & Chemical Engineering 110 (2018) 53–68.
- Santoso et al. (2005) T. Santoso, S. Ahmed, M. Goetschalckx, A. Shapiro, A stochastic programming approach for supply chain network design under uncertainty, European Journal of Operational Research 167 (2005) 96–115.
- Geyer and Thompson (1992) C. J. Geyer, E. A. Thompson, Constrained monte carlo maximum likelihood for dependent data, Journal of the Royal Statistical Society: Series B (Methodological) 54 (1992) 657–683.
- Mak et al. (1999) W.-K. Mak, D. P. Morton, R. K. Wood, Monte carlo bounding techniques for determining solution quality in stochastic programs, Operations research letters 24 (1999) 47–56.
- Plambeck et al. (1996) E. L. Plambeck, B.-R. Fu, S. M. Robinson, R. Suri, Sample-path optimization of convex stochastic performance functions, Mathematical Programming 75 (1996) 137–176.
- Shapiro and Homem-de Mello (1998) A. Shapiro, T. Homem-de Mello, A simulation-based approach to two-stage stochastic programming with recourse, Mathematical Programming 81 (1998) 301–325.
- Benders (2005) J. F. Benders, Partitioning procedures for solving mixed-variables programming problems, Computational Management Science 2 (2005) 3–19.
- Ning and You (2019) C. Ning, F. You, Optimization under uncertainty in the era of big data and deep learning: When machine learning meets mathematical programming, Computers & Chemical Engineering (2019).