
DDKSP: A Data-Driven Stochastic Programming Framework for Car-Sharing Relocation Problem

Car sharing is a popular research topic in the sharing economy. In this paper, we investigate the car-sharing relocation problem (CSRP) under uncertain demands. In practice, customer demands follow complicated probability distributions that cannot be described by parametric approaches. To overcome this problem, we propose an innovative framework called Data-Driven Kernel Stochastic Programming (DDKSP), which integrates a non-parametric approach, kernel density estimation (KDE), with a two-stage stochastic programming (SP) model. Specifically, the probability distributions are derived from historical data by KDE and are used as the uncertain input parameters for SP. The CSRP is formulated as a two-stage SP model. A Monte Carlo method called sample average approximation (SAA) and the Benders decomposition algorithm are introduced to solve the large-scale optimization model. Finally, numerical experiments based on New York taxi trip data sets show that the proposed framework outperforms pure parametric approaches based on Gaussian, Laplace and Poisson distributions, which yield 3.72%, 4.58% and 11% lower overall profit, respectively.


I Introduction

Riding on the wave of the sharing economy, car-sharing services such as Car2go (https://www.car2go.com), Wunder Mobility (https://www.wundermobility.com), TURO (https://www.turo.com), Zipcar (https://www.zipcar.com) and Communauto (https://www.communauto.com) play an increasingly important role in offering economical and environmentally conscious mobility options to citizens, especially in highly populated urban areas. For society, car sharing can save parking space and reduce traffic congestion and air pollution [1]. For individual users, it entails fewer ownership responsibilities and lower costs to satisfy their mobility needs. In addition, car sharing provides users with a wide range of vehicles, which allows them to match vehicles to trip purposes. The earliest car-sharing services can be traced back to the 1940s in Europe and the 1980s in North America [2]. Despite these early origins, only the past decade has seen significant growth in large-scale car-sharing businesses, which can be mainly attributed to the proliferation of the mobile internet.

A car-sharing service can be financed by public and/or private entities and is managed by a service organization that maintains a fleet of cars and light trucks in a network of vehicle locations. Individuals gain access to car sharing by becoming members of the organization. Typically, a member pays a modest fixed charge plus a usage fee each time they use a vehicle. Vehicles are usually deployed in a lot located in a neighborhood or at a transit station. A member can reserve a vehicle by phone or over the Internet. Once the reservation is approved, the vehicle is assigned to the member, who picks it up at an appointed time and returns it at a specific car-sharing location, which may be the same as the pick-up point (two-way systems), a different station (one-way car-sharing systems [3]), or anywhere in a specified zone (free-floating car-sharing systems [4]).

Three levels of decision-making, namely the strategic, tactical and operational levels, are involved in the management of car sharing [4, 5]. Strategic decisions include determining the mode of the network (one-way, two-way or free-floating), the number, location and capacity of stations, and the fleet size. Tactical decisions mainly involve management policies that govern the service in the medium term, such as reservation and pricing policies. Operational decisions are those that need to be made on a daily basis according to the dynamic market and fleet conditions. Typical examples include placing initial inventories at each location and relocating vehicles across the network of locations to accommodate the realized demands. In this paper, we propose a data-driven optimization framework to support vehicle relocation decisions as well as initial inventory placement decisions in car-sharing management. To begin with, we review the related works in the literature.

I-A Related Works

Vehicle relocation problems in the car-sharing context have been studied extensively in the literature. One major stream of work models the CSRP with sophisticated deterministic optimization techniques; the resulting models are solved either by exact large-scale optimization algorithms such as Lagrangian relaxation and branch-and-bound, or by heuristic algorithms such as neighborhood search and simulated annealing. For example, Gambella et al. [6] formulate the electric vehicle relocation problem (EVRP) as two mixed-integer programming (MIP) models that maximize the profit associated with the trips performed by users in operating hours and non-operating hours, respectively. The model settings take EV battery consumption and the recharging process into consideration. Because of the computational complexity, two model-based heuristic algorithms, built on removing relocation and rolling horizon mechanisms, are designed to solve the relocation model. The experimental results show that the proposed algorithms achieve near-optimal solutions and outperform the solutions obtained by CPLEX under a time limit. Similarly, the authors in [7] investigate the electric vehicle fleet size and trip pricing problem, which is formulated as a mixed-integer non-linear programming (MINLP) model that maximizes the overall profit by defining both the long-term resource allocation and the short-term operation strategy. Specifically, the proposed MINLP model optimizes the station location, station capacity and fleet size simultaneously. To solve this large-scale MINLP problem, a customized gradient algorithm is introduced and validated in a real case study. An integrated framework for electric vehicle re-balancing and staff relocation (EVR&SR) is proposed in [8]. The EVR&SR is represented using a space-time network and formulated as a mixed-integer linear programming (MILP) model that minimizes the overall cost, including investment costs and operating expenses. The framework considers both the optimal allocation plan of EVs and staff at the strategic level and the EV and staff relocation decisions. Since even medium-scale instances cannot be solved effectively by CPLEX or Gurobi, a Lagrangian relaxation-based solution approach, which decomposes the primal problem into a group of sub-problems solved with dynamic programming and a greedy algorithm, is introduced to tackle large-scale problem instances; it reaches near-optimal solutions in a short time. In [9], a more general framework involving a multi-objective MILP model and a virtual hub is introduced. In detail, the multi-objective model considers both vehicle relocation and electrical charging requirements, while the virtual hub is an aggregation that tackles the extremely large number of relocation variables. The problem can be solved by a typical branch-and-bound approach, which generates the efficient frontier and reaches a trade-off between the operator's and the users' benefits to maximize the net revenue for the operator. To guarantee the flexibility of the car-sharing service, [10] proposes a two-stage optimization model that involves optimizing destination locations and maximizing the manager's profit. However, the aforementioned studies do not consider uncertain parameters such as demand, supply and travelling time. Thus, these modeling approaches cannot be directly applied to our CSRP.

Another line of literature models the CSRP with stochastic programming techniques. A similar application, the bike sharing allocation and re-balancing problem (BSA & RP), is introduced in [5]. To minimize the total expected penalty, which is the sum of all penalties charged for delivery, re-balancing, extra and excess inventory, and stock-out, the problem is formulated as a two-stage stochastic programming model. In this model, the initial allocation at the strategic level is the first-stage decision, while re-balancing is handled in the second stage. A solution-based heuristic algorithm built on scenario generation is devised to solve the model. In [11], a multi-stage stochastic linear programming (SLP) model is developed for optimizing the strategic allocation of car-sharing vehicles (OSACV) under dynamic and uncertain demands. In the problem settings, vehicles are assumed to be in use, in transit empty, or stationary empty, and the travelling time between locations is one day. The aim is to maximize the total expected profit, which involves revenue and moving costs at both the strategic and operational levels. Since the SP model involves seven stages, a scenario tree approach is used to solve the complex multi-stage SP model. In [12], the authors address large-scale dynamic repositioning and routing problem (DRRP) instances with stochastic customer demand. With simple extensions, the DRRP can be applied in many similar fields such as bike-sharing. A two-stage stochastic programming model based on a network flow formulation is built to minimize the expected cost, wherein customer arrivals and starting times are assumed to follow a Poisson distribution. An iterative algorithm called SPAR (separable, projective, approximation, routine) is adapted to solve the model in a real-world case study. Nevertheless, the above models and approaches cannot be applied directly in a data-driven environment, since they do not make accurate use of historical data. Furthermore, mathematical models formulated with SP assume that the probability distribution is known and of a specific type. In real historical data, however, the distribution may involve many, even infinitely many, parameters and cannot be described by a simple known distribution such as the Gaussian distribution or the Poisson distribution assumed in [12].

I-B Research Gaps

Nowadays, with the rapid development of urban transportation, a huge amount of data is generated every day, which leads to significant changes in intelligent transportation systems [13, 14]. However, the growing volume of data brings new challenges to the traditional optimization of the car-sharing relocation problem (CSRP), which plays a key role in car-sharing systems (CSS). For example, the variability of customer demand (traffic flow) has a great impact on the inventory level, and inappropriate decisions may lead to a poor service level [15]. Therefore, how to handle uncertainty in a data-driven environment is the key issue for the CSRP.

The major limitation of previous SP-related works is that the probability distribution is assumed to be known or is estimated from experience. In those works, the probability distributions are determined by decision-makers using a parametric approach: the decision-makers first select a specific parametric distribution (e.g., the Gaussian distribution), and the parameters of that distribution are then determined by statistical methods. However, in most real applications, the true distribution may be too complex to be described by simple parametric approaches. Therefore, we explore related machine learning approaches to make the SP model more practical. Recently, combining machine learning (ML) / deep learning (DL) [16] with optimization techniques has become a trend in the operations research (OR) community [17, 18]; this is known as data-driven optimization. A few researchers have leveraged the advantages of ML to make optimization models more realistic and applied the idea in the chemical industry [19, 20]. In detail, they applied the Dirichlet process mixture model (DPMM) and principal component analysis (PCA) to a distributionally robust optimization (DRO) model, which does not serve our purpose. To the best of our knowledge, no similar work has been applied to the CSRP.

I-C Objectives and Contributions

In light of the results from previous works [19, 20], and to apply the concept to the CSRP, we propose an innovative data-driven stochastic programming framework named DDKSP, which organically integrates a non-parametric approach, kernel density estimation (KDE), with a stochastic programming model. Specifically, unlike previous relevant work in which the probability distributions are assumed to be known or are estimated by a parametric approach, the probability distributions of customer demands are extracted from data by KDE. Then a two-stage non-linear stochastic programming model with the derived distributions is proposed to formulate the CSRP. Finally, the sample average approximation method combined with the Benders decomposition algorithm is introduced to solve the two-stage non-linear SP model. It is worth noting that our proposed framework can be easily extended to similar problems such as bike-sharing and EV-sharing problems [21, 22, 23, 24].

The rest of the paper is organized as follows. The problem description and formulation are discussed in Section II. Section III describes the DDKSP framework, which involves KDE, the sample average approximation (SAA) method and the Benders decomposition algorithm. Data preprocessing and numerical experiments are presented in Section IV. Finally, we conclude our work and propose future work in Section V.

II Problem Formulation

II-A Problem Statement

We study the CSRP, which is a typical decision-making problem in a centralized environment. It involves two roles: a car company and its customers. Consider a one-way car-sharing system (pick-up at one location and drop-off at any location) in which a car company owns a number of vehicles and operates a number of locations for car dispatch. Customers reserve cars in advance and pick them up at a specific location. The CSRP can be considered as a two-stage decision-making problem, described as follows. In the first stage (the strategic phase), during a time window (e.g., from midnight to 4 am) before the upcoming customer demands are realized, each vehicle location is allocated a certain number of cars (the initial inventory decision), which incurs holding costs. In the second stage (the operational phase), after the real customer demand is revealed (we assume that there is a deadline after which no customer orders are accepted for the day, e.g., 4 am), customers who reserved cars visit the locations to pick up the vehicles, which brings revenue. Meanwhile, the truck carriers of the car company must dynamically move cars from lower-demand locations to higher-demand locations to prevent an imbalance of vehicles among locations, which incurs moving costs.

The first-stage decision must be made before the second stage; that is, the decision-makers must decide the most appropriate number of cars at each location to cover all the possibilities (called scenarios in stochastic programming) of customer demands, while keeping the moving cost as low as possible (more cars incur more holding cost, fewer cars incur more moving cost). The mathematical model must therefore be able to hedge against the uncertainty of customer demands. Based on the problem settings, the objective of the CSRP is to maximize the overall expected profit, which involves the total revenue, the holding costs at each location and the moving costs between locations. In this sense, the CSRP in this work focuses on answering the following questions: (1) how many vehicles are required at each location before the real demands are revealed, and (2) how to move cars between locations in order to satisfy customer demands while maximizing the overall profit.

In this work, the most critical concern for the CSRP is how to model uncertainty in a data-driven environment. For convenience, only customer demand is considered as an uncertain parameter. Since the CSRP is a typical two-stage problem with demand uncertainty, we use a two-stage stochastic programming model to formulate the problem. In the two-stage SP model, decision variables are divided into two groups: the first-stage decision variables (here-and-now), which must be determined before the real demands are revealed, and the second-stage decision variables (wait-and-see), which are determined after the real demands are realized.

Without loss of generality, the following assumptions are made in the problem settings.

Assumption 1.

We assume that the vehicle reservations in our work are determined before the operational phase (second-stage) starts, which implies that the customers cannot cancel or delay the reservations.

Assumption 2.

We assume that all vehicles are in the same condition, which means that homogeneous cars are provided to customers.

Assumption 3.

We assume that the historical customer demand at each location is available, which indicates that the probability distribution information can be derived from historical data.

Assumption 4.

It is assumed that the true demands at all the locations are realized simultaneously.

II-B Model Formulation

In this section, we discuss the CSRP model formulations, namely the deterministic model and its two-stage SP counterpart. It is worth noting that probability distributions are required for the SP model. For clarity, the notation is listed below.
Indices/Sets
Locations: regional origins and/or destinations.
Scenarios: the set of scenarios.
Parameters
Holding cost at each location.
Moving cost from one location to another.
Average demand at each location.
Decision Variables
First-stage decision variable: the number of vehicles at each location.
Second-stage decision variable: the number of vehicles moved from one location to another under each scenario.
Random Variables (for stochastic programming model)
Random demand: the number of cars that will be picked up by customers at each location.
Scenario probability: the probability of each scenario.

II-B1 Deterministic CSRP Model

In the deterministic model, we allocate the limited vehicles to different locations in order to maximize the overall profit. For convenience, we use the average demands. The deterministic model for the CSRP can be formulated as follows.

(1)

s.t.

(2)
(3)
(4)
(5)

The objective function (1) maximizes the overall profit, which equals the difference between the total revenue and the total holding cost. Constraint (2) ensures that the total number of vehicles does not exceed the capacity, which can be easily estimated from historical data. Constraints (3) have a two-fold meaning. If the number of cars allocated to a location is higher than the customer demand at that location, then the number of vehicles that move out cannot exceed the difference between the number of cars at the location and its customer demand. Otherwise, the quantity of available vehicles is lower than the customer demand at the location and no cars move out. Constraints (4) and (5) specify the types of the decision variables.

Although the deterministic model handles the optimization problem in a simple way, using average demands may lead to an optimal solution that carries high risk or is even infeasible in practice. Additionally, it is worth noting that the objective function (1) is piecewise linear; therefore, it must be reformulated as a linear function before solving.
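As an illustration of such a linearization, suppose (as an assumption, since the exact form depends on the model's notation) that the piecewise-linear revenue term at a location is $r \min(x_i, \bar{d}_i)$, where $r$ is the revenue per car, $x_i$ the number of allocated vehicles and $\bar{d}_i$ the average demand. These symbols and the auxiliary variable $z_i$ are illustrative only. Because the term is maximized and $r > 0$, the minimum can be replaced by a variable bounded by both of its arguments:

```latex
\max \; r \min(x_i, \bar{d}_i)
\quad \Longleftrightarrow \quad
\max \; r \, z_i
\quad \text{s.t.} \quad z_i \le x_i, \quad z_i \le \bar{d}_i, \quad z_i \ge 0 .
```

At an optimum, $z_i$ is pushed up to the smaller of the two bounds, so it takes the value $\min(x_i, \bar{d}_i)$ and the objective becomes linear.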

II-B2 Two-Stage SP CSRP Model

Car-sharing operators wish to maximize the expected profit over all possible scenario realizations. Since customer demands are uncertain, we assume that the demand scenarios are sampled from the probability distributions derived from historical data. The two-stage SP model of the CSRP can then be formulated as follows.

(6)

s.t.

(7)
(8)
(9)
(10)

The objective function (6) maximizes the overall profit, which is the difference between the revenue and the overall cost (the sum of the holding cost and the moving/transferring cost). Constraint (7) is identical to constraint (2). Like constraints (3), constraints (8) have a two-fold meaning; unlike constraints (3), they involve the SP scenarios. Specifically, if the number of cars allocated to a location is higher than the customer demand at that location under a given scenario, then the number of vehicles that move out under that scenario cannot exceed the difference between the number of cars at the location and its customer demand under that scenario. Otherwise, no cars move out from the location under that scenario. Constraints (9) and (10) specify the types of the decision variables.
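The two-fold meaning of the outflow constraints can be sketched in a few lines of Python. This is a minimal illustration, not the model itself: the function and variable names are hypothetical, and the bound is read as "at most the surplus", i.e. the outflow of a location is capped by max(allocated - demand, 0) in each scenario.

```python
def max_outflow(allocated, demand):
    """Largest number of cars that may leave a location under one scenario."""
    return max(allocated - demand, 0)


def moves_are_feasible(allocation, demand, moves):
    """Check a relocation plan for a single scenario.

    allocation[i] : cars placed at location i in the first stage
    demand[i]     : realized demand at location i in this scenario
    moves[i][j]   : cars moved from location i to location j
    """
    for i, row in enumerate(moves):
        if any(m < 0 for m in row):
            return False  # move quantities must be non-negative
        if sum(row) > max_outflow(allocation[i], demand[i]):
            return False  # more cars leave than the surplus allows
    return True


# Hypothetical two-location example: location 0 has a surplus of 4 cars,
# location 1 has a shortage, so it may not send any car away.
allocation = [10, 4]
demand = [6, 9]
plan_ok = moves_are_feasible(allocation, demand, [[0, 4], [0, 0]])
plan_too_many = moves_are_feasible(allocation, demand, [[0, 5], [0, 0]])
plan_from_shortage = moves_are_feasible(allocation, demand, [[0, 0], [1, 0]])
print(plan_ok, plan_too_many, plan_from_shortage)
```

In a solver-based implementation the `max` would itself need to be linearized; the sketch only checks a given plan.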

III DDKSP

Inspired by the idea of integrating ML with OR, we propose the DDKSP framework, which is briefly described as follows. The DDKSP framework involves four components: the ML/DL part (KDE in our problem setting) extracts probability distributions from uncertain data, the SP part models the problem, the SAA & Benders decomposition part reformulates and solves the SP model, and the last part yields the final decisions. The DDKSP framework is illustrated in Fig. 1. It is worth noting that our framework can be readily extended by replacing components. For example, the ML/DL part can adopt general supervised and unsupervised learning algorithms depending on the specific problem, the SP part can be replaced by Robust Optimization (RO) [25] or Distributionally Robust Optimization (DRO) [26], and the SAA & Benders decomposition part can be replaced by other large-scale decomposition algorithms such as column generation or Lagrangian relaxation.

Fig. 1: The overview of DDKSP framework

III-A KDE

For the first component, we adopt kernel density estimation (KDE). KDE is a typical non-parametric approach that describes a probability distribution without specifying its form in advance [27]. Let f be the density function of the uncertain parameter. Given a set of data $x_1, x_2, \dots, x_n$, the KDE of f is obtained as

$$\hat{f}_h(x) = \frac{1}{nh} \sum_{i=1}^{n} K\left(\frac{x - x_i}{h}\right),$$

where K is the kernel function and h is the bandwidth. In this work, we select the Gaussian kernel, which is given by

$$K(u) = \frac{1}{\sqrt{2\pi}} \exp\left(-\frac{u^2}{2}\right).$$
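A Gaussian KDE of this kind can be sketched with the Python standard library alone. The sketch below is illustrative: the demand data are synthetic, the bandwidth is picked by hand rather than by a selection rule, and the function names are hypothetical. Sampling uses the standard fact that a draw from a Gaussian KDE is a uniformly chosen data point plus Gaussian noise with standard deviation h.

```python
import math
import random


def gaussian_kernel(u):
    """Standard normal density, used as the kernel K."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def kde(x, data, h):
    """Kernel density estimate at x with bandwidth h."""
    n = len(data)
    return sum(gaussian_kernel((x - xi) / h) for xi in data) / (n * h)


def sample_from_kde(data, h, rng):
    """Draw one demand scenario from the estimated density."""
    return rng.gauss(rng.choice(data), h)


rng = random.Random(0)
# Synthetic bimodal daily demands (two demand regimes), for illustration only.
demands = ([rng.gauss(300, 20) for _ in range(200)]
           + [rng.gauss(500, 25) for _ in range(200)])
h = 15.0  # bandwidth chosen by hand for this sketch

grid = [150 + k for k in range(551)]          # evaluation points 150..700
density = [kde(x, demands, h) for x in grid]
area = sum(density)                           # Riemann sum with unit step
scenario = sample_from_kde(demands, h, rng)
print(round(area, 3), round(scenario, 1))
```

The estimate keeps both modes of the data, which a single Gaussian, Laplace or Poisson fit cannot do.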

III-B Two-Stage SP CSRP Model Reformulation

The deterministic model can be solved effectively by off-the-shelf commercial solvers. The two-stage SP model, by contrast, normally requires reformulation, since a continuous probability distribution contains infinitely many scenarios. In this paper, we use sample average approximation (SAA) [28], a Monte Carlo method, to reformulate the two-stage SP model. The SAA procedure can be summarized as follows.

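Algorithm 1 Sample Average Approximation
Input: the probability distribution, the number of samples, the sample size N, and the two-stage SP model
Output: the approximate optimal value
initialize the sample counter to zero
while the counter is less than the number of samples do
     increase the counter by one
     generate a sample of N scenarios according to the probability distribution;
     reformulate the model over the N sampled scenarios
     solve the model and record its optimal value and optimal solution;
end while
return the average of the recorded optimal values as the approximate optimal result.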
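The SAA loop can be sketched on a deliberately small stand-in problem. The example below is an assumption-laden toy, not the CSRP model: it has a single location, a profit of revenue times served demand minus holding cost times allocated cars, a hypothetical Gaussian demand sampler, and each sampled problem is solved by enumeration instead of a solver. It returns the average of the optimal values over the replications.

```python
import random
import statistics


def saa(sample_demand, n_replications, n_scenarios, capacity,
        revenue, holding_cost, rng):
    """Sample average approximation on a one-location toy problem."""
    values, solutions = [], []
    for _ in range(n_replications):
        # Draw N scenarios from the demand distribution.
        scenarios = [sample_demand(rng) for _ in range(n_scenarios)]

        # Sample-average objective: the expectation is replaced by a mean.
        def avg_profit(x):
            served = sum(min(x, d) for d in scenarios) / n_scenarios
            return revenue * served - holding_cost * x

        # Solve the sampled problem (enumeration is enough for this toy).
        best_x = max(range(capacity + 1), key=avg_profit)
        values.append(avg_profit(best_x))
        solutions.append(best_x)
    return statistics.mean(values), solutions


def sample_demand(rng):
    """Hypothetical demand sampler: rounded, non-negative Gaussian demand."""
    return max(0, round(rng.gauss(50, 10)))


rng = random.Random(1)
value, xs = saa(sample_demand, n_replications=10, n_scenarios=200,
                capacity=100, revenue=100.0, holding_cost=20.0, rng=rng)
print(round(value, 1), xs)
```

In the full framework, `sample_demand` would draw from the KDE-based distribution and the enumeration step would be replaced by the decomposed optimization model.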

Notice that in the SAA reformulation the expectation in the objective function is replaced by the average over the sampled scenarios, where N is the number of scenarios. The objective function is still non-linear. We therefore introduce an auxiliary variable to transform the non-linear objective function into a linear one. The two-stage SP model then becomes

s.t.

(11)
(12)
(13)
(14)
(15)
(16)

III-C Two-Stage SP CSRP Model Decomposition

After the reformulation, the two-stage SP model becomes a very large-scale deterministic model. For example, with 50 locations and 1000 scenarios, the number of second-stage decision variables is 50 × 50 × 1000 = 2,500,000. To solve such a large-scale model effectively, a decomposition algorithm is required. In this work, we use Benders decomposition [29] to solve the reformulated model. Benders decomposition is an effective algorithm for solving mixed-integer linear programming (MILP) models: the primal model is decomposed into one master problem (MP) and a group of subproblems (SUBP) in dual form, and the solution is obtained by iteratively solving the SUBP and updating the MP.

For convenience, in the following we neglect the constant. We then divide the reformulated model into an MP

(17)

and a SUBP in the dual form

(18)

s.t.

(19)

where the SUBP is expressed in its dual variables and the remaining quantities are fixed values determined by the MP. In each iteration, the MP adjusts these values and passes them to the SUBP. The algorithm can be summarized as follows.

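Algorithm 2 Benders Decomposition for Two-Stage SP CSRP Model
Input: the reformulated two-stage SP model and a convergence tolerance
Output: the optimal solution
initialize the upper bound and the lower bound;
while the gap between the upper bound and the lower bound exceeds the tolerance do
     given the fixed values from the MP, solve the SUBP
     if the SUBP is unbounded then
         get an extreme ray of the SUBP and add a feasibility cut to the MP
     else if the SUBP is optimal then
         get an extreme point of the SUBP and add an optimality cut to the MP
         update the bound obtained from the SUBP
     else
         the original model is infeasible
     end if
     solve the MP
     update the bound obtained from the MP
end while
return either the upper bound or the lower bound as the optimal value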

where the convergence tolerance is a very small fractional number. Therefore, in our case, either the upper bound or the lower bound at termination can be taken as the optimal value.
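The interplay of master problem, dual subproblems and cuts can be sketched on a one-location toy problem. The sketch rests on several assumptions that are not taken from the model above: the profit is revenue times min(x, d) minus holding cost times x, the subproblem dual is solved in closed form, and the master problem is solved by enumeration rather than by a MILP solver. Because this toy subproblem is always feasible and bounded, only optimality cuts appear. For a maximization problem, the master problem gives the upper bound and evaluated first-stage solutions give the lower bound.

```python
def benders_toy(scenarios, capacity, revenue, holding_cost,
                tol=1e-6, max_iter=50):
    """Multi-cut Benders sketch for
    max -holding_cost*x + (1/N) * sum_s revenue*min(x, d_s), x in {0..capacity}.
    """
    n = len(scenarios)
    cuts = [[] for _ in scenarios]   # per scenario: (slope, constant) pairs
    lower, upper = float("-inf"), float("inf")
    best_x, x = 0, 0                 # x is the current first-stage trial value

    def master(xc):
        # theta_s is bounded above by every cut collected for scenario s.
        theta = sum(min(u * xc + const for u, const in cs) for cs in cuts) / n
        return -holding_cost * xc + theta

    for _ in range(max_iter):
        # Subproblem s for fixed x: max{revenue*z : z <= x, z <= d_s, z >= 0}.
        # Dual: min{u*x + v*d_s : u + v >= revenue, u, v >= 0}, solved directly.
        second_stage = 0.0
        for s, d in enumerate(scenarios):
            u, v = (revenue, 0.0) if x <= d else (0.0, revenue)
            cuts[s].append((u, v * d))             # optimality cut
            second_stage += (u * x + v * d) / n
        value = -holding_cost * x + second_stage   # true profit of x
        if value > lower:
            lower, best_x = value, x
        # Master problem: enumeration is enough for a small integer range.
        x = max(range(capacity + 1), key=master)
        upper = master(x)
        if upper - lower <= tol:
            break
    return best_x, lower, upper


scenarios = [3, 7, 7, 10, 12]  # hypothetical demand scenarios
best_x, lower, upper = benders_toy(scenarios, capacity=15,
                                   revenue=100.0, holding_cost=30.0)
print(best_x, lower, upper)
```

At termination the two bounds coincide up to the tolerance, which is why either one can be reported as the optimal value.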

IV Numerical Experiment

Experiment Design. We design a group of experiments. To begin with, we perform data pre-processing and analysis, including aggregating the trip data into demands and analyzing the demand distributions. After that, both the non-parametric approach (KDE) and the parametric approaches (Gaussian, Laplace and Poisson) are applied to derive probability distributions for the SP model. Then we compare the SP model with the deterministic model in terms of objective values and running time. Moreover, we validate and compare KDE against the three parametric approaches. Finally, we explore and show the two-stage decision making based on a one-day record.

Experiment Setup. The algorithms (SAA, Benders decomposition (BD), KDE and the parametric approaches) are implemented in Python 3.7, and the mathematical models are solved by Gurobi 8.1, academic version (https://www.gurobi.com/academia/academic-program-and-licenses/), on a platform with an Intel i7 CPU, 16 GB RAM and Windows 10. It is worth noting that the deterministic parameters of our SP model, such as the revenue and the transferring cost, can be estimated easily from the data set. For convenience, in the following experiments the revenue per car is set to $100, the transferring cost is roughly estimated from the distance between locations and ranges from 10 to 100, the number of available vehicles is set to 16,000, and the holding cost is assumed to follow a Gaussian distribution.

IV-A Data Analysis

The data sets come from the New York taxi trip records (https://www1.nyc.gov/site/tlc/about/tlc-trip-record-data.page). We collected three years (July 2016 - June 2019) of green taxi trip records, which are archived by month, as the data source. We split the three years of data into a training set (from July 2016 to December 2018, 919 days) and a testing set (from January 2019 to June 2019, 181 days). Each monthly data set contains a large number of raw single-trip records with a complex structure. Take the data set 2018-01 for example: it contains 793,529 records and 19 attributes. For our application, we use the 6 attributes listed in Table I. Additionally, in this data set New York City is divided into 259 different locations; the details of the location division can be found at https://data.world/nyc-taxi-limo/taxi-zone-lookup. The main task of data processing is to aggregate the trip records into daily demands. After the data processing, we selected the 20 locations with the highest average demands (location IDs: 74, 41, 7, 75, 255, 82, 166, 42, 181, 97, 129, 25, 95, 244, 33, 260, 256, 66, 223 and 65, sorted by demand in descending order), which are plotted on a map in the accompanying figure.

Attribute                Description
lpep_pickup_datetime     pickup time
lpep_dropoff_datetime    dropoff time
PULocationID             pickup location ID
DOLocationID             dropoff location ID
trip_distance            total distance
fare_amount              passenger fare
TABLE I: Major attributes in the data set
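The aggregation of trip records into daily demands per pickup location can be sketched as follows. This is a minimal illustration using the column names of Table I; the sample rows are made up, the timestamp is assumed to start with a `YYYY-MM-DD` date, and the helper names are hypothetical.

```python
import csv
import io
from collections import Counter


def daily_demand(csv_file):
    """Count pickups per (date, pickup location) from trip records."""
    counts = Counter()
    for row in csv.DictReader(csv_file):
        day = row["lpep_pickup_datetime"][:10]   # assumed 'YYYY-MM-DD' prefix
        counts[(day, int(row["PULocationID"]))] += 1
    return counts


def top_locations(counts, k):
    """Return the k locations with the highest total (hence average) demand."""
    totals = Counter()
    for (_, location), c in counts.items():
        totals[location] += c
    return [location for location, _ in totals.most_common(k)]


# Made-up sample in the column layout of Table I (values are illustrative).
sample = io.StringIO(
    "lpep_pickup_datetime,lpep_dropoff_datetime,PULocationID,"
    "DOLocationID,trip_distance,fare_amount\n"
    "2018-01-01 00:18:50,2018-01-01 00:24:39,74,41,1.2,6.5\n"
    "2018-01-01 08:05:10,2018-01-01 08:20:00,74,7,2.8,11.0\n"
    "2018-01-01 09:30:00,2018-01-01 09:41:12,41,74,1.9,8.5\n"
    "2018-01-02 07:45:31,2018-01-02 07:58:02,74,75,2.1,9.0\n"
    "2018-01-02 10:12:44,2018-01-02 10:20:03,41,42,1.0,6.0\n"
    "2018-01-02 18:02:09,2018-01-02 18:30:55,7,74,4.3,16.5\n"
)
counts = daily_demand(sample)
print(counts[("2018-01-01", 74)], top_locations(counts, 2))
```

Since all locations share the same number of days, ranking by total demand is equivalent to ranking by average daily demand.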

Among the 20 high-demand locations, the demand distributions estimated from the data sets fall mainly into two types. One is unimodal; the other, which covers most locations, is bimodal. For the first type, a specific functional form for the density, such as the Gaussian distribution, can be assumed; in other words, parametric methods can be applied in these cases. Most SP-related works adopt this approach. For the second type, a particular parametric form cannot provide an appropriate representation of the real density. In such cases, we must consider non-parametric or semi-parametric approaches such as KDE or the Gaussian mixture model (GMM).

Most parametric methods work well for unimodal distributions but not for bimodal ones. That is why the KDE approach is introduced in this work.

IV-B Stochastic Model vs. Deterministic Model Results

To compare the deterministic model with the SP model under different numbers of scenarios, we generate 5 groups of scenarios for the SP model from the probability distributions derived by KDE. The numbers of scenarios are 20, 50, 100, 200 and 500, and each group is run 10 times under SAA. Additionally, we solve the deterministic model using the average demands calculated from the training set (average demand over 919 days) and from the testing set (average demand over 181 days). The average objective values and elapsed times are reported in Table II.

Number of Scenarios                        Objective Value    Time Elapsed (s)
20                                         $1,477,845         2.73
50                                         $1,487,606         6.87
100                                        $1,475,688         10.89
200                                        $1,484,367         21.73
500                                        $1,469,642         53.12
deterministic (average on training set)    $1,325,723         0.24
deterministic (average on testing set)     $1,017,054         0.24
TABLE II: Average objective value and elapsed time under different numbers of scenarios

Based on the experimental results, we conclude that the two-stage SP model yields a higher profit than the deterministic model. Averaged over the five scenario groups, the objective value of the two-stage SP model is 11.56% and 45.42% higher than that of the deterministic counterpart on the training set and the testing set, respectively. Additionally, with average demands, the overall profit on the training set is higher than that on the testing set.
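The two percentages can be reproduced from the values in Table II with a few lines of Python. The variable names are illustrative; the figures are copied from the table.

```python
# Objective values of the SP model for 20, 50, 100, 200 and 500 scenarios.
sp_values = [1_477_845, 1_487_606, 1_475_688, 1_484_367, 1_469_642]
# Objective values of the deterministic model (training and testing averages).
det_train, det_test = 1_325_723, 1_017_054

sp_mean = sum(sp_values) / len(sp_values)
gain_train = 100 * (sp_mean / det_train - 1)   # relative gain in percent
gain_test = 100 * (sp_mean / det_test - 1)

print(f"mean SP objective: {sp_mean:,.0f}")
print(f"gain over deterministic: {gain_train:.2f}% (training), {gain_test:.2f}% (testing)")
```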

IV-C Validations on Parametric Approaches

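Besides the non-parametric approach, we also use several popular parametric distributions (Gaussian, Laplace and Poisson) to model the customer demands in the data sets. The parameters of the Laplace \((\mu_L, b)\), Gaussian \((\mu_G, \sigma^2)\) and Poisson \((\lambda)\) distributions are estimated by maximum likelihood estimation (MLE) from the sample data \(x_1, \dots, x_N\), which gives the following estimators:

\[ \hat{\mu}_L = \operatorname{median}(x_1, \dots, x_N), \qquad \hat{b} = \frac{1}{N}\sum_{i=1}^{N} \lvert x_i - \hat{\mu}_L \rvert \]

\[ \hat{\mu}_G = \frac{1}{N}\sum_{i=1}^{N} x_i, \qquad \hat{\sigma}^2 = \frac{1}{N}\sum_{i=1}^{N} (x_i - \hat{\mu}_G)^2 \]

\[ \hat{\lambda} = \frac{1}{N}\sum_{i=1}^{N} x_i \]

where \(N\) denotes the number of samples.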
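The closed-form maximum likelihood estimators for these three families can be sketched with the Python standard library. The function names and the sample values are hypothetical; the sketch only shows the textbook estimators (sample mean and 1/N variance for Gaussian, median and mean absolute deviation for Laplace, sample mean for Poisson), not the code used in the experiments.

```python
import statistics

def mle_gaussian(xs):
    """Return (mean, variance); the MLE of the variance divides by N, not N - 1."""
    mu = statistics.fmean(xs)
    sigma2 = statistics.fmean((x - mu) ** 2 for x in xs)
    return mu, sigma2

def mle_laplace(xs):
    """Return (location, scale): the sample median and the mean absolute deviation from it."""
    loc = statistics.median(xs)
    scale = statistics.fmean(abs(x - loc) for x in xs)
    return loc, scale

def mle_poisson(xs):
    """Return the rate, which is the sample mean."""
    return statistics.fmean(xs)

# Hypothetical daily demands at one location.
demands = [310, 295, 620, 580, 305, 640]
print(mle_gaussian(demands))
print(mle_laplace(demands))
print(mle_poisson(demands))
```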

Number of Scenarios | KDE | Gaussian | Laplace | Poisson
20 | $1,477,845 | $1,467,117 | $1,425,569 | $1,299,895
50 | $1,487,606 | $1,422,868 | $1,402,279 | $1,312,471
100 | $1,475,688 | $1,417,811 | $1,417,403 | $1,315,831
200 | $1,484,367 | $1,406,112 | $1,412,343 | $1,321,364
500 | $1,469,642 | $1,406,103 | $1,398,546 | $1,332,124
Average | $1,479,030 | $1,424,002 | $1,411,228 | $1,316,337
TABLE III: Average objective value under different probability distributions

The comparison between KDE and the three parametric approaches is shown in Table III. The overall profit obtained with the Gaussian distribution is slightly higher than that obtained with the Laplace distribution, and both are higher than that obtained with the Poisson distribution. However, all of the parametric approaches are inferior to the non-parametric KDE approach in terms of overall profit: on average they are 3.72%, 4.58% and 11% lower, respectively.
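The percentages quoted above follow directly from the last row of Table III. A minimal check, with variable names chosen here for illustration:

```python
# Average objective values copied from the last row of Table III.
avg_objective = {
    "KDE": 1_479_030,
    "Gaussian": 1_424_002,
    "Laplace": 1_411_228,
    "Poisson": 1_316_337,
}

kde = avg_objective["KDE"]
# Relative shortfall of each parametric approach with respect to KDE, in percent.
gap_vs_kde = {
    name: 100 * (kde - value) / kde
    for name, value in avg_objective.items()
    if name != "KDE"
}

for name, gap in gap_vs_kde.items():
    print(f"{name}: {gap:.2f}% lower than KDE")
```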

IV-D Two-Stage Decision Making

In the two-stage SP model, a solution involves two parts: the first-stage decision variables, which denote the numbers of cars placed at each location (i.e., the initial inventory levels) before demands are realized, and the second-stage decision variables, which denote the numbers of cars moved between locations for rebalancing. We design a group of experiments in this subsection.

Firstly, the values of the first-stage decision variables are derived from the two-stage SP model using KDE, Gaussian, Laplace and Poisson based on the training set (30 months); the results under the different distributions are shown in Table IV, Table V, Table VI and Table VII, respectively. Taking Table IV as an example, the rows denote the numbers of scenarios in the SP model, and the columns denote the top 20 locations with the highest demands (sorted in descending order), as mentioned before. We conclude that the solutions by KDE are more stable (lower variance) than those by the Gaussian, Laplace and Poisson distributions. In practical applications, the decision-makers can use the average values as the first-stage decisions.

scenario top 20 locations with highest demands
20 1544 1469 1529 1119 1034 736 825 483 452 849 513 630 466 593 580 495 447 498 413 325
50 1541 1308 1055 1215 1074 978 732 504 664 663 653 663 591 609 561 509 469 468 403 340
100 1595 1356 1212 1052 1046 876 770 474 641 652 630 634 655 560 544 534 528 505 406 330
200 1564 1293 1315 1059 1008 822 822 507 655 681 658 642 596 535 573 549 490 473 428 330
500 1567 1338 1316 1079 1027 843 814 473 599 660 638 634 620 544 557 529 499 462 451 350
TABLE IV: Values of First Stage Decision Variables under KDE
scenario top 20 locations with highest demands
20 1393 1488 1637 1044 1085 790 888 502 485 903 469 616 501 468 527 476 463 463 447 355
50 1545 1390 1170 982 1092 867 809 641 485 718 633 624 581 576 521 543 586 414 454 369
100 1553 1244 1391 1120 999 902 813 470 639 648 656 651 609 560 514 534 482 448 422 345
200 1539 1248 1288 1073 1028 871 785 566 690 704 622 637 588 560 559 523 499 428 443 349
500 1562 1300 1229 1099 1032 850 814 572 653 658 630 637 593 579 539 532 490 455 431 345
TABLE V: Values of First Stage Decision Variables under Gaussian
scenario top 20 locations with highest demands
20 1267 1297 1164 1273 1223 687 519 733 607 862 538 625 565 560 568 648 554 465 467 377
50 1670 1255 1401 801 920 914 798 427 526 894 621 717 630 423 585 540 537 518 472 351
100 1607 1275 1383 1061 983 849 798 520 649 586 633 615 596 550 582 526 497 433 491 366
200 1523 1312 1250 1002 1028 938 814 561 575 672 665 627 594 594 541 505 500 485 452 362
500 1522 1322 1255 1104 1021 872 781 596 614 683 618 634 569 571 568 537 501 449 440 343
TABLE VI: Values of First Stage Decision Variables under Laplace
scenario top 20 locations with highest demands
20 238 1541 1527 1330 1193 1063 961 900 812 826 0 689 0 662 648 582 585 539 516 388
50 0 1483 1466 1276 1149 1052 275 834 796 812 698 679 672 646 612 582 563 527 492 386
100 0 1492 1477 1261 1151 1032 475 861 752 791 717 679 661 608 580 572 550 519 457 365
200 0 1481 1443 1282 1139 1008 546 829 787 783 707 660 648 623 601 561 565 498 473 366
500 0 1472 1439 1281 1117 1011 757 796 755 779 698 665 633 621 591 544 531 502 449 359
TABLE VII: Values of First Stage Decision Variables under Poisson
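The suggestion to use average values as first-stage decisions can be sketched with the KDE results of Table IV. The code computes, for each location, the mean and the population standard deviation across the five scenario counts. The variable names are illustrative, and the same computation applies unchanged to Tables V to VII.

```python
import statistics

# First-stage decisions under KDE, copied from Table IV.
# Keys: number of scenarios; values: cars placed at locations 0..19.
kde_first_stage = {
    20:  [1544, 1469, 1529, 1119, 1034, 736, 825, 483, 452, 849,
          513, 630, 466, 593, 580, 495, 447, 498, 413, 325],
    50:  [1541, 1308, 1055, 1215, 1074, 978, 732, 504, 664, 663,
          653, 663, 591, 609, 561, 509, 469, 468, 403, 340],
    100: [1595, 1356, 1212, 1052, 1046, 876, 770, 474, 641, 652,
          630, 634, 655, 560, 544, 534, 528, 505, 406, 330],
    200: [1564, 1293, 1315, 1059, 1008, 822, 822, 507, 655, 681,
          658, 642, 596, 535, 573, 549, 490, 473, 428, 330],
    500: [1567, 1338, 1316, 1079, 1027, 843, 814, 473, 599, 660,
          638, 634, 620, 544, 557, 529, 499, 462, 451, 350],
}

# Transpose so that each tuple holds one location's values across scenario counts.
per_location = list(zip(*kde_first_stage.values()))
mean_per_location = [statistics.fmean(col) for col in per_location]
std_per_location = [statistics.pstdev(col) for col in per_location]

for loc, (m, s) in enumerate(zip(mean_per_location, std_per_location)):
    print(f"location {loc}: mean = {m:.1f}, std = {s:.1f}")
```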

Secondly, after the real demands are revealed, the decision-makers must decide the vehicle moving strategy between locations (second-stage decision-making). We validate this using a one-day record (2019-01-01) from the testing set, which is shown in Table VIII. Based on the first-stage decisions from KDE, Gaussian, Laplace and Poisson, the outcomes of the second-stage decisions are shown in Table X, Table XI, Table XII and Table XIII, respectively. The tables are structured as follows: the rows denote the locations that cars move into, while the columns denote the locations that cars move out of, and each cell gives the number of cars moved between the two locations. For convenience, the row and column indices are the top 20 locations with the highest demands, as mentioned above. Note that the first-stage decision values we use are those of the 20-scenario case for each of the four distributions; the moving results may vary if the 50-, 100-, 200- or 500-scenario cases are adopted. In this use case, the total number of cars moved under KDE (314) is lower than under Gaussian (349), Laplace (572) and Poisson (1,930). We also conclude that, given the data set, the distribution type and its parameters have a great impact on the result of the stochastic programming model. For example, Table VII shows that the first-stage decision under Poisson is quite different from the other three, especially at the first location, which leads to the different second-stage decision shown in Table XIII. It is also worth noting that these outcomes are based on a single-day record; they will differ for other days.

Location 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19
Demand 1370 687 1120 861 1041 374 780 487 505 785 326 308 325 572 536 373 325 289 663 245
TABLE VIII: True demands on 2019-01-01 for two-stage SP model testing
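The link between the first-stage decision and the required relocation can be sketched by comparing the 20-scenario KDE row of Table IV with the true demands of Table VIII. The code only computes the shortage at each location, max(demand - inventory, 0); it does not solve the second-stage optimization, which also decides where the cars come from. Variable names are illustrative.

```python
# First-stage inventory under KDE with 20 scenarios (first row of Table IV).
inventory = [1544, 1469, 1529, 1119, 1034, 736, 825, 483, 452, 849,
             513, 630, 466, 593, 580, 495, 447, 498, 413, 325]

# True demands on 2019-01-01 (Table VIII).
demand = [1370, 687, 1120, 861, 1041, 374, 780, 487, 505, 785,
          326, 308, 325, 572, 536, 373, 325, 289, 663, 245]

# Locations whose realized demand exceeds the cars placed there.
shortage = {
    loc: d - x
    for loc, (x, d) in enumerate(zip(inventory, demand))
    if d > x
}
total_shortage = sum(shortage.values())

print(shortage)
print(total_shortage)
```

The locations with a shortage in this sketch are the only non-empty rows of Table X, and each shortage equals the corresponding row sum.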

Finally, we investigate the profits obtained by the different approaches over the entire testing set. Specifically, we compute and compare the overall profit using KDE, Gaussian, Laplace and Poisson for six months (181 days); the results are shown in Fig. 5 (January to March) and Fig. 9 (April to June). The plots indicate that the KDE approach outperforms the other three approaches in terms of overall profit. On average, the Gaussian and Laplace distributions rank second and third, respectively, with a small gap to KDE, while the Poisson distribution yields roughly 10% lower daily average profit than KDE. The summarized result is shown in Table IX.

Fig. 5: The profit from January to March: (a) January, (b) February, (c) March.
Fig. 9: The profit from April to June: (a) April, (b) May, (c) June.
Approach | KDE | Gaussian | Laplace | Poisson
Profit | $1,339,604 | $1,317,018 | $1,304,749 | $1,200,684
TABLE IX: Daily Average Profits on Testing Sets

V Conclusions and Future Work

In this paper, we propose a data-driven stochastic programming framework, DDKSP, to solve the CSRP using New York taxi trip record data sets. In the real world, the demand distribution is time-variant and evolves gradually (or at least the parameters of the distribution vary), which can render a model fitted to historical data outdated and deteriorate the resulting solution quality [30]. To describe this evolution more precisely, we will investigate Bayesian learning, which focuses on the posterior probability distribution obtained from the prior probability distribution and the likelihood of the current data. Namely, we will explore a dynamic data-driven stochastic programming model for the CSRP.

Additionally, the proposed framework treats customer demands on a daily basis, so it can be considered an offline data-driven framework. In several applications, such as the taxi dispatch problem, customer demands may fluctuate intensively within hours or even minutes. Therefore, we will explore data-driven optimization frameworks with online learning using real-time data in future work. Meanwhile, for convenience, this paper leaves some other factors out of consideration. For example, we do not consider the capacity of locations or the route conditions of rebalancing, which may lead to different transportation costs. Later on, we will extend the two-stage SP model to a more practical one.

References

  • [1] M. Bruglieri, F. Pezzella, and O. Pisacane, “A two-phase optimization method for a multiobjective vehicle relocation problem in electric carsharing systems,” Journal of Combinatorial Optimization, vol. 36, pp. 162–193, 2018.
  • [2] S. Shaheen, D. Sperling, and C. Wagner, “Carsharing in Europe and North America: past, present, and future,” 1998.
  • [3] R. Vosooghi, J. Puchinger, M. Jankovic, and G. Sirin, “A critical analysis of travel demand estimation for new one-way carsharing systems,” in 2017 IEEE 20th International Conference on Intelligent Transportation Systems (ITSC). IEEE, 2017, pp. 199–205.
  • [4] S. Illgen and M. Höck, “Literature review of the vehicle relocation problem in one-way car sharing networks,” Transportation Research Part B: Methodological, 2018.
  • [5] R. Cavagnini, L. Bertazzi, F. Maggioni, and M. Hewitt, “A two-stage stochastic optimization model for the bike sharing allocation and rebalancing problem,” 2018.
  • [6] C. Gambella, E. Malaguti, F. Masini, and D. Vigo, “Optimizing relocation operations in electric car-sharing,” Omega, vol. 81, pp. 234–245, 2018.
  • [7] K. Huang, G. H. de Almeida Correia, and K. An, “Solving the station-based one-way carsharing network planning problem with relocations and non-linear demand,” Transportation Research Part C: Emerging Technologies, vol. 90, pp. 1–17, 2018.
  • [8] M. Zhao, X. Li, J. Yin, J. Cui, L. Yang, and S. An, “An integrated framework for electric vehicle rebalancing and staff relocation in one-way carsharing systems: Model formulation and lagrangian relaxation-based solution approach,” Transportation Research Part B: Methodological, vol. 117, pp. 542–572, 2018.
  • [9] B. Boyacı, K. G. Zografos, and N. Geroliminis, “An optimization framework for the development of efficient one-way car-sharing systems,” European Journal of Operational Research, vol. 240, no. 3, pp. 718–733, 2015.
  • [10] A. Di Febbraro, N. Sacco, and M. Saeednia, “One-way car-sharing profit maximization by means of user-based vehicle relocation,” IEEE Transactions on Intelligent Transportation Systems, vol. 20, no. 2, pp. 628–641, 2018.
  • [11] W. D. Fan, “Optimizing strategic allocation of vehicles for one-way car-sharing systems under demand uncertainty,” in Journal of the Transportation Research Forum, vol. 53, no. 3, 2014.
  • [12] J. Warrington and D. Ruchti, “Two-stage stochastic approximation for dynamic rebalancing of shared mobility systems,” Transportation Research Part C: Emerging Technologies, vol. 104, pp. 110–134, 2019.
  • [13] J. Zhang, F.-Y. Wang, K. Wang, W.-H. Lin, X. Xu, and C. Chen, “Data-driven intelligent transportation systems: A survey,” IEEE Transactions on Intelligent Transportation Systems, vol. 12, no. 4, pp. 1624–1639, 2011.
  • [14] L. Zhu, F. R. Yu, Y. Wang, B. Ning, and T. Tang, “Big data analytics in intelligent transportation systems: A survey,” IEEE Transactions on Intelligent Transportation Systems, vol. 20, no. 1, pp. 383–398, 2018.
  • [15] Y. Lv, Y. Duan, W. Kang, Z. Li, and F.-Y. Wang, “Traffic flow prediction with big data: a deep learning approach,” IEEE Transactions on Intelligent Transportation Systems, vol. 16, no. 2, pp. 865–873, 2014.
  • [16] Y. LeCun, Y. Bengio, and G. Hinton, “Deep learning,” Nature, vol. 521, no. 7553, p. 436, 2015.
  • [17] Y. Bengio, A. Lodi, and A. Prouvost, “Machine learning for combinatorial optimization: a methodological tour d’horizon,” arXiv preprint arXiv:1811.06128, 2018.
  • [18] E. Larsen, S. Lachapelle, Y. Bengio, E. Frejinger, S. Lacoste-Julien, and A. Lodi, “Predicting solution summaries to integer linear programs under imperfect information with machine learning,” arXiv preprint arXiv:1807.11876, 2018.
  • [19] C. Ning and F. You, “Data-driven stochastic robust optimization: General computational framework and algorithm leveraging machine learning for optimization under uncertainty in the big data era,” Computers & Chemical Engineering, vol. 111, pp. 115–133, 2018.
  • [20] C. Shang and F. You, “Distributionally robust optimization for planning and scheduling under uncertainty,” Computers & Chemical Engineering, vol. 110, pp. 53–68, 2018.
  • [21] S. Faridimehr, S. Venkatachalam, and R. B. Chinnam, “A stochastic programming approach for electric vehicle charging network design,” IEEE Transactions on Intelligent Transportation Systems, vol. 20, no. 5, pp. 1870–1882, 2018.
  • [22] M. Cocca, D. Giordano, M. Mellia, and L. Vassio, “Free floating electric car sharing: A data driven approach for system design,” IEEE Transactions on Intelligent Transportation Systems, 2019.
  • [23] ——, “Data driven optimization of charging station placement for ev free floating car sharing,” in 2018 21st International Conference on Intelligent Transportation Systems (ITSC). IEEE, 2018, pp. 2490–2495.
  • [24] X. Huo, X. Wu, M. Li, N. Zheng, and G. Yu, “The allocation problem of electric car-sharing system: A data-driven approach,” Transportation Research Part D: Transport and Environment, vol. 78, p. 102192, 2020.
  • [25] A. Ben-Tal, L. El Ghaoui, and A. Nemirovski, Robust optimization. Princeton University Press, 2009, vol. 28.
  • [26] E. Delage and Y. Ye, “Distributionally robust optimization under moment uncertainty with application to data-driven problems,” Operations Research, vol. 58, no. 3, pp. 595–612, 2010.
  • [27] C. M. Bishop et al., Neural networks for pattern recognition. Oxford University Press, 1995.
  • [28] T. Santoso, S. Ahmed, M. Goetschalckx, and A. Shapiro, “A stochastic programming approach for supply chain network design under uncertainty,” European Journal of Operational Research, vol. 167, no. 1, pp. 96–115, 2005.
  • [29] J. F. Benders, “Partitioning procedures for solving mixed-variables programming problems,” Computational Management Science, vol. 2, no. 1, pp. 3–19, 2005.
  • [30] C. Ning and F. You, “Optimization under uncertainty in the era of big data and deep learning: When machine learning meets mathematical programming,” Computers & Chemical Engineering, 2019.

Appendix A Moving between Locations Based on the First-Stage Decision

0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
3 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
4 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 7 0 0
5 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
6 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
7 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 4
8 0 0 0 0 0 0 45 0 0 0 0 0 0 8 0 0 0 0 0 0
9 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
10 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
11 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
12 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
13 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
14 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
15 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
16 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
17 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
18 174 0 0 0 0 0 0 0 0 0 0 0 0 13 44 19 0 0 0 0
19 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
TABLE X: Vehicle Moving between locations based on the first-stage decision under KDE
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
3 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
4 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
5 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
6 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
7 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
8 0 20 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
9 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
10 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
11 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
12 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
13 0 0 0 0 44 0 0 0 0 0 0 0 0 0 0 0 0 0 0 60
14 0 9 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
15 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
16 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
17 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
18 23 90 0 0 0 0 0 0 0 0 0 0 0 0 0 103 0 0 0 0
19 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
TABLE XI: Vehicle Moving between locations based on the first-stage decision under Gaussian
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19
0 0 103 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
3 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
4 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
5 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
6 0 261 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
7 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
8 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
9 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
10 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
11 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
12 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
13 0 12 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
14 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
15 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
16 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
17 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
18 0 196 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
19 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
TABLE XII: Vehicle Moving between locations based on the first-stage decision under Laplace
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19
0 0 854 0 0 0 0 0 0 0 0 0 0 0 0 112 0 0 166 0 0
1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
3 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
4 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
5 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
6 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
7 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
8 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
9 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
10 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 260 0 0 66
11 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
12 0 0 0 0 117 0 0 0 0 41 0 0 0 90 0 0 0 0 0 77
13 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
14 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
15 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
16 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
17 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
18 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 147 0 0 0 0
19 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
TABLE XIII: Vehicle Moving between locations based on the first-stage decision under Poisson
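The relocation effort of the four approaches can be compared by summing the non-zero cells of Tables X to XIII. The sketch below stores each table sparsely as (move-in location, move-out location) pairs; the dictionary layout and names are chosen here for illustration.

```python
# Non-zero cells of Tables X-XIII, keyed by (move_in_location, move_out_location).
moves = {
    "KDE": {
        (4, 17): 7, (7, 19): 4, (8, 6): 45, (8, 13): 8,
        (18, 0): 174, (18, 13): 13, (18, 14): 44, (18, 15): 19,
    },
    "Gaussian": {
        (8, 1): 20, (13, 4): 44, (13, 19): 60, (14, 1): 9,
        (18, 0): 23, (18, 1): 90, (18, 15): 103,
    },
    "Laplace": {
        (0, 1): 103, (6, 1): 261, (13, 1): 12, (18, 1): 196,
    },
    "Poisson": {
        (0, 1): 854, (0, 14): 112, (0, 17): 166,
        (10, 16): 260, (10, 19): 66,
        (12, 4): 117, (12, 9): 41, (12, 13): 90, (12, 19): 77,
        (18, 15): 147,
    },
}

# Total number of cars relocated on 2019-01-01 under each approach.
totals = {name: sum(cells.values()) for name, cells in moves.items()}

for name, total in totals.items():
    print(f"{name}: {total} cars moved")
```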