The pervasive use of tablets and mobile devices has increased the popularity of round-the-clock online shopping, which calls for the sustainable development of the logistics industry. For instance, online orders generated over one billion package deliveries in 2013, and this number is predicted to grow by 28.8% in 2018. For online shoppers, speed and cost are the two major concerns, and they pay relatively more attention to cost. Unfortunately, speed and cost usually conflict with each other. In other words, speeding up the shipping process often requires more personnel and vehicles on the road, incurring extra cost as a result. For example, users in the US have to pay a considerable amount for express service, such as 5 dollars per package for same-day delivery. Therefore, logistics services that lower the cost while still ensuring that packages arrive on time are preferable.
With the continuous development and pervasive daily use of positioning and mobile Internet technologies, rich data on the status of vehicles, passengers, and packages (e.g., real-time positions, origins and destinations of passengers and packages, and the available transport capacity of a vehicle) can easily be recorded and accessed in real time [10, 56]. In this context, crowdshipping (also termed crowd logistics or crowdsourcing logistics), which has received increasing attention from both academia and industry in the last few years, has been recognized as a promising and cost-effective way to alleviate this conflict by transporting passengers and packages simultaneously in a shared space and transport network [14, 17, 21, 30, 41]. In line with previous research, and with a particular focus on taxis, we propose having packages take hitchhiking rides collaboratively with existing taxis that are transporting passengers on the street, i.e., exploiting the existing mobility of taxi drivers [4, 32]. We illustrate the basic idea with the following example.
Suppose a package is to be delivered from A to B. Here we simply assume that both A and B have facilities, such as smart parcel boxes, that can store packages temporarily. Opportunistically, a passenger at A makes a real-time taxi ordering request, intending to go to B. (Passengers commonly order taxis in real time with mobile apps such as Uber (https://www.uber.com/): to make a request, a passenger usually provides his/her origin and intended destination; the request is broadcast locally, and the taxi driver who accepts it comes to pick up the passenger.) Once a taxi responds to the request, we can assign the package delivery task to its driver. More specifically, we ask the taxi driver to first collect the package and then pick up the passenger. After dropping off the passenger at B, the taxi driver leaves the package at an appointed location. Finally, the package is delivered to, or collected by, its receiver at B.
As the above example shows, this approach requires only small additional effort and time from the taxi drivers involved, without degrading service quality or interrupting passengers. We formulate the proposed taxi-based package delivery as a route planning problem whose objective is to deliver packages to their destinations on time (by the deadline given by users). To make it feasible to organize passenger flow and package flow seamlessly via the taxi transport network, we need to address the following two major research challenges:
Package flow and passenger flow are incompatible in time and space. More specifically, 1) compared to package flow, passenger flow exhibits salient peak-hour patterns; 2) for financial reasons, most passengers take taxis only when the destination is close, e.g., within 4 km, whereas for packages the destination is generally far from the origin (e.g., more than 5 km away). Therefore, we argue that routing algorithms based solely on query matching may not work [17, 30], since a single hitchhiking ride may not be able to deliver a package to its destination; instead, a collaborative relay of taxis is needed.
Requests for packages and passengers arrive as streams with high uncertainty. Although regular spatial and temporal patterns of passenger flow have been unveiled from taxi GPS trajectory data at a coarse granularity, it is challenging to predict passenger demand accurately at a fine granularity (e.g., the passenger demand on a given road segment in the next 5 minutes) [37, 53, 59]. The same holds for package demand. In summary, uncertainties exist in both types of requests, which hinders the estimation and comparison of the time cost of package routing paths.
With the above-mentioned research objective and challenges, the main contributions of the paper are:
We propose a novel mixed passenger-and-package transport mode that leverages the unintentional cooperation among a crowd of occupied taxis to deliver packages across the city on time, so as to simultaneously lower transport cost and improve transport efficiency.
We formulate the package routing problem as the arriving-on-time problem to tackle the uncertainties of passenger and package requests, and we propose a probabilistic framework named CrowdExpress, which solves it in two phases. In the first phase, we build the package transport network offline by mining historical taxi GPS trajectory data. In the second phase, for each package delivery request generated in real time, we propose an online adaptive taxi scheduling algorithm, based on a probabilistic model called maxProb, that iteratively determines the next stop of the package “on the fly”. The algorithm monitors real-time taxi ordering requests, recursively computes the maximum arriving-on-time probability if the delivery task is assigned to the currently available taxi, and compares it with that of waiting for future taxi rides, based on the real-time package location and the remaining time budget.
We conduct extensive evaluations using road network data and taxi GPS trajectory data generated by over 19,000 taxis over one month in New York City (NYC) to verify the efficiency and effectiveness of CrowdExpress. Results demonstrate that CrowdExpress responds within 25 milliseconds. Moreover, it can handle around 9,500 packages daily with a success rate over 94% (i.e., over 94% of packages are delivered on time), consistently outperforming the baseline approaches.
The rest of the paper is organized as follows. In Section II, we review related work and show how this paper differs from prior research. In Section III, we introduce some basic concepts, the assumptions we make, and the formal problem formulation, and give an overview of the system. We present the technical details of our two-phase approach in Sections IV and V, respectively. We evaluate the performance of the proposed framework in Section VI. Finally, we conclude the paper and chart future directions in Section VII.
II Related Work
Here, we review the related work, which falls into two groups: the first consists of work on crowdsourced logistics, and the second focuses on taxi trajectory mining and the applications it supports.
II-A Crowdsourced Logistics
Crowdsourcing has been used for many different applications, from problem solving to various sensing tasks [26, 45, 58]. Two papers specifically target the package delivery problem by leveraging the spatial and temporal overlaps between crowdsourcing workers. Sadilek et al. recruited a group of Twitter users, asking each person to pass the assigned package to another Twitter user who happened to be nearby (within a certain distance). However, this work has two main limitations: 1) It is hard to track and coordinate the users, since people rarely share their location information continuously via geotagged tweets. The choice of suitable deliverers is therefore limited, probably resulting in longer and uncontrollable package delivery delays. 2) It may not be practical to ask a participant to make a dedicated trip to pass the package to another suitable user, as this may interrupt his/her ongoing activities (e.g., having a conversation or dinner with friends), which are hard to infer from the user's geotagged tweets. Similarly, the work that recruited mobile users based on space and time overlaps inferred from cell towers has similar limitations: 1) Cell towers may be sparse in certain areas, making it difficult for mobile users to relay packages there. 2) Location data are available only when people make calls, so the number of mobile users that can be recruited is limited. Beyond the solutions in [36, 44], several papers that appeared in the past two years leverage the abundant existing passenger-delivery trips to let packages hitchhike [4, 3, 17, 22, 27, 30, 31]. Although passenger flow and package flow are transported together, these papers fail to consider their distinct patterns in time and space.
Specifically, they formulate the problem as a share-a-ride problem and insert package requests into passenger-delivery trips, which may fail to deliver packages in real cases, as argued previously. To make matters worse, in their solutions, taxis have to make several dedicated stops and detours while carrying passengers, which degrades the service quality for passengers. At the current stage, research on crowdsourced logistics has mainly focused on how to efficiently discover the ‘optimal’ package delivery paths and has almost completely ignored the multi-criteria design of the package relay network [4, 14, 15, 27]. How to design the package relay network (e.g., the optimal number and locations of package interchange stations) is vital and challenging, and should be treated as a separate research issue [2, 51]. In this paper, we exploit existing taxi services to deliver packages and simply cluster frequent pick-up and drop-off points to obtain package interchange stations, without any optimization techniques. Packages can be temporarily stored at interchange stations between rides, so no time overlap or pairwise contact between participants is needed. Our method therefore requires less effort from participants and can transport packages over longer distances. In addition, similar to our previous work, we try to minimize the impact on the quality and experience of passenger service. Nevertheless, CrowdExpress differs from that work in its objective, proposed solution, and evaluation environment. Specifically, CrowdExpress aims to deliver each package to its destination on time (by the deadline required by the user), rather than as quickly as possible. We argue that this objective is more reasonable and better matches real-life application scenarios. In CrowdExpress, we formulate the problem as finding arriving-on-time paths and propose a probabilistic framework to address it.
We evaluate the performance in NYC using real-world taxi trajectory data. Because this data is publicly available, we chose the NYC taxi trajectory data for our experiments, hoping to lower the data barrier and attract more researchers with different backgrounds to this promising interdisciplinary field.
II-B Taxi Trajectory Data Mining
Information mined from taxi trajectory data can benefit taxi drivers, passengers, and city planners. For taxi drivers, who are mostly interested in earning more while minimizing fuel cost, many papers have tried to recommend areas with more potential passengers, e.g., [23, 55]. In addition, Zhang et al. investigated the differences between efficient and inefficient taxi services in terms of passenger searching, passenger delivery, and service-region preference. Work on recommending the best corners to catch taxis, ordering free taxis in real time, and estimating taxi fees aims to improve passengers' experiences. An interesting work detected anomalous taxi rides and warned passengers “on the fly” that they were being taken on an unnecessary detour. For city planners, taxi trajectory data provides a rich data source to identify flaws in city planning [33, 62], probe traffic conditions, estimate travel demands, infer land-use efficiency [34, 40], suggest bus routes, etc. Castro et al. provided a good survey on understanding city dynamics using taxi trajectory data. Recent studies also combine taxi trajectory data with other data sources, such as POI data, Foursquare check-in data, and Flickr image data, to enable smarter applications such as inferring building functions, personalized and fuel-efficient travel route planning, and real-time travel purpose inference [9, 12, 18, 39, 54, 63]. Another stream revisits well-known research issues using advanced machine learning techniques. For instance, Xu et al. forecasted taxi demands in real time using recurrent neural networks, and Tong et al. applied a simple linear regression model with more than 200 million feature dimensions to predict the original taxi demands for each POI.
However, most of the existing work that leverages taxi trajectory data focuses on transporting passengers. Little attention has been paid to shipping goods. To the best of our knowledge, we are among the first to target this new application.
III Preliminaries, Problem Statement and System Overview
In this section, we define some basic concepts, state the assumptions we make, and give a formal problem statement. Finally, we give an overview of the proposed CrowdExpress system.
III-A1 Basic Concepts
We define the basic concepts used in this work as follows:
Definition 1 (Taxi Trajectory).
A taxi trajectory is a sequence of GPS points corresponding to a single passenger-delivery trip. Here, a taxi trajectory is represented by its Origin-Destination (OD) pair, where the origin is the road segment on which the trip starts and the destination is the one on which it ends. Its travel time is the difference between the ending and starting times.
Definition 2 (Real-time Taxi Ordering Request).
A real-time taxi ordering request is defined as a triplet consisting of the passenger's origin, the passenger's intended destination, and the time at which the passenger submits the request.
Definition 3 (Package Transport Network).
A package transport network is a directed graph $G = (V, E)$, consisting of a node set $V$ and an edge set $E$, where each node in $V$ is an interchange station responsible for collecting and storing packages. The edge set $E$ is a subset of the cross product $V \times V$. Each edge $(v_i, v_j) \in E$ is a non-stop directed transport route from node $v_i$ to node $v_j$, implying that there is abundant passenger flow between them on which packages can hitchhike. Note that an edge in the package transport network has a different meaning from an edge in the road network: there can be multiple driving paths over the road network connecting two interchange stations.
Definition 4 (Package Delivery Request).
A package delivery request is defined as a triplet consisting of the origin and destination of the package delivery and the time at which the user submits the request (i.e., its birth time). Both the origin and the destination must be nodes of the package transport network, i.e., packages originate and end at interchange stations.
Definition 5 (Time Slot Slicing).
We divide a day into time slots (periods) according to the day type, since traffic conditions vary across time slots, resulting in large variance in travel time. A workday is divided into three time slots and a rest day into two, as detailed in Fig. 1.
Definition 6 (Travel Time Probability Function).
Each edge $(v_i, v_j)$ in the package transport network is associated with an independent random travel time (cost) $X_{ij}$, whose probability density function is denoted by $f_{ij}(t)$. The density $f_{ij}$ varies across time slots.
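For instance, the probability that a package sent directly from node $v_i$ to node $v_j$ spends a time in the interval $[0, t]$, i.e., that its travel time $X_{ij}$ (with density $f_{ij}$) does not exceed $t$, follows from the definition, as shown in Eq. 1:
$$P(X_{ij} \le t) = \int_{0}^{t} f_{ij}(\tau)\, d\tau \quad (1)$$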
The travel time probability function in each time slot can be obtained separately according to Eq. 1. Moreover, since a probability density is non-negative, the travel time probability is a monotonically non-decreasing function of the time budget.
Definition 7 (Travel Time Discretization).
To simplify the calculation of the travel time probability along an edge, we treat travel time as discrete. More precisely, we use a piecewise constant function with equal step width to discretize travel times (the same step width is used throughout the paper).
In the discrete case, the integral in Eq. 1 can be replaced by the following ratio:
$$P(X_{ij} \le t) = \frac{n_{ij}(t)}{N_{ij}} \quad (2)$$
where $N_{ij}$ is the number of all possible travel times from node $v_i$ to node $v_j$ recorded in the historical taxi trajectories, and $n_{ij}(t)$ is the number of those travel times no greater than $t$ after discretization. Fig. 2 shows an example network whose edges carry such discretized travel time distributions.
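The ratio in Eq. 2 can be sketched in a few lines of Python. This is an illustrative sketch, not the authors' implementation: it assumes a hypothetical 5-minute step width (the paper's value is not stated in this section) and assumes each raw travel time is rounded up to the next multiple of the step; the helper names `discretize`, `edge_pmf`, and `prob_within` are hypothetical.

```python
import math
from collections import Counter

STEP = 5.0  # hypothetical step width in minutes (assumption, not the paper's stated value)


def discretize(minutes: float, step: float = STEP) -> float:
    """Round a travel time up to the next multiple of `step` (assumed rounding rule)."""
    return math.ceil(minutes / step) * step


def edge_pmf(travel_times: list, step: float = STEP) -> dict:
    """Empirical probability of each discretized travel time on one edge."""
    counts = Counter(discretize(t, step) for t in travel_times)
    total = len(travel_times)
    return {t: c / total for t, c in sorted(counts.items())}


def prob_within(travel_times: list, budget: float, step: float = STEP) -> float:
    """Eq. 2: n_ij(t) / N_ij, the share of discretized travel times no greater than `budget`."""
    n = sum(1 for t in travel_times if discretize(t, step) <= budget)
    return n / len(travel_times)
```

Rounding up is a conservative choice: a discretized time never understates the recorded one, so the resulting probabilities do not overstate the chance of arriving on time.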
Definition 8 (Arriving-on-Time Probability).
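The arriving-on-time probability of a package delivery path within a given time budget (i.e., deadline) $T$ is defined as the ratio of the number of path travel times no greater than $T$ to the number of all possible path travel times. Suppose that the package is shipped from $v_i$ to $v_k$ via $v_j$, where $f_{ij}$ and $f_{jk}$ are the travel time densities of the two edges; then
$$P_{ijk}(T) = \int_{0}^{T} f_{ij}(\tau) \int_{0}^{T-\tau} f_{jk}(s)\, ds\, d\tau \quad (3)$$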
Clearly, the integral becomes more complicated as the path grows longer (i.e., contains more interchange stations), since more travel time combinations are generated.
Similarly, to ease the computation of the integral in Eq. 3, we treat travel times as discrete, so that the integral reduces to a sum. Taking the two-edge path shown in Fig. 2 as an example, its arriving-on-time probability within 15 minutes can be computed by distinguishing two cases.
For this path, two cases lead to the package arriving by the deadline. Case I: if the first recruited taxi driver spends no more than 5 minutes on the first segment, the remaining time margin is sufficient for the second recruited taxi driver to reach the destination by the deadline with certainty. Case II: if the first recruited taxi driver takes more than 5 minutes on the first segment, the package arrives on time only if the second recruited taxi driver completes the second segment within 5 minutes.
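The two cases above amount to a discrete convolution of the two edges' travel time distributions. The sketch below assumes each edge is described by a hypothetical probability mass function (a dict from discretized travel time in minutes to probability) and that the two legs are independent; the numbers used to exercise it are made up and are not the values in Fig. 2.

```python
def path_prob_two_edges(pmf1: dict, pmf2: dict, deadline: float) -> float:
    """Discrete version of Eq. 3 for a two-edge path.

    pmf1, pmf2: hypothetical {discretized travel time: probability} for each edge.
    Sums P(first leg = t1) * P(second leg <= deadline - t1) over all feasible t1,
    assuming the two legs' travel times are independent.
    """
    total = 0.0
    for t1, p1 in pmf1.items():
        if t1 > deadline:
            continue  # the first leg alone already misses the deadline
        remaining = deadline - t1
        p2 = sum(p for t2, p in pmf2.items() if t2 <= remaining)
        total += p1 * p2
    return total
```

For example, with a first leg of `{5: 0.6, 10: 0.4}` and a second leg of `{5: 0.7, 10: 0.3}`, a 15-minute deadline gives 0.6 × 1 (Case I) + 0.4 × 0.7 (Case II) ≈ 0.88.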
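For a given path, the arriving-on-time probability increases (or at least stays unchanged) when the deadline is extended; mathematically, writing $P(T)$ for the path's arriving-on-time probability within deadline $T$, for any two deadlines $T_1 \le T_2$ we have $P(T_1) \le P(T_2)$.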
The proof can be found in Appendix A. ∎
Definition 9 (Maximum Probability of Arriving-on-Time).
For an OD pair, the maximum probability of arriving on time (with time cost no greater than $T$) is defined as the maximum of the arriving-on-time probabilities over all possible paths from the origin to the destination. In other words, it serves as an upper bound on the arriving-on-time probability of any individual path between the pair.
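If a package at node $v_i$ is first sent to node $v_j$, the probability that it spends a time in the interval $[\tau, \tau + d\tau]$ on edge $(v_i, v_j)$ is $f_{ij}(\tau)\,d\tau$, where $f_{ij}$ is the edge's travel time density, so the time margin left at node $v_j$ is $T - \tau$. By Bellman's principle of optimality [6, 38], whichever node the package is sent to next, it must then follow the optimal routing strategy from that node to the destination within the remaining time $T - \tau$. Therefore, writing $u_i(T)$ for the maximum probability of arriving on time at the destination $v_d$ from node $v_i$ within $T$, and $N(v_i)$ for the set of out-neighbours of $v_i$, the maximum probability of arriving on time can be defined recursively as:
$$u_i(T) = \max_{v_j \in N(v_i)} \int_{0}^{T} f_{ij}(\tau)\, u_j(T-\tau)\, d\tau, \qquad u_d(T) = 1 \ \text{for all } T \ge 0.$$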
Intuitively, according to Definition 9, computing the maximum probability of arriving on time for an OD pair requires finding all possible paths from the origin to the destination, which is well known to be an NP-hard problem. To make the concept clear, we use the example shown in Fig. 2 again. It is easy to enumerate the three paths between the OD pair in Fig. 2, and for each path, the respective arriving-on-time probability within a given deadline can be obtained in the same way as in Eq. 3; the maximum probability of arriving on time is attained by the path with the largest such probability. It is obvious that the maximum probability of arriving on time at the destination itself equals 1 for any deadline, since no travel time is needed if the package stays still. We exclude round trips in this study.
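Rather than enumerating paths, the recursion in Definition 9 can be evaluated directly once travel times are discretized. The sketch below is a hypothetical illustration under the assumptions that every travel time is an integer number of steps of at least one (so the remaining budget strictly decreases and the recursion terminates) and that edges are independent; the network `GRAPH` and its probabilities are invented for the example, and memoization via `functools.lru_cache` stands in for whatever tabulation the authors actually use.

```python
from functools import lru_cache

# Hypothetical network: node -> {neighbour: {travel time in steps (>= 1): probability}}.
GRAPH = {
    "A": {"B": {1: 0.6, 2: 0.4}, "C": {2: 0.5, 3: 0.5}},
    "B": {"D": {1: 0.7, 2: 0.3}},
    "C": {"D": {1: 1.0}},
    "D": {},
}


def max_on_time_prob(graph, dest):
    """Return u(node, budget), the maximum probability of reaching `dest` within
    `budget` steps, via the discrete recursion u_i(T) = max_j sum_t p_ij(t) u_j(T - t)."""

    @lru_cache(maxsize=None)
    def u(node, budget):
        if budget < 0:
            return 0.0
        if node == dest:
            return 1.0  # staying at the destination costs no time
        best = 0.0
        for nxt, pmf in graph[node].items():
            prob = sum(p * u(nxt, budget - t) for t, p in pmf.items() if t <= budget)
            best = max(best, prob)
        return best

    return u
```

On this toy network, `u = max_on_time_prob(GRAPH, "D")` gives `u("A", 3)` ≈ 0.88 (attained via B) and `u("A", 2)` ≈ 0.42; each (node, budget) pair is evaluated once, so no explicit path enumeration is needed.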
In this work, we make the following assumptions.
Taxis cannot be recruited to take part in package delivery tasks while delivering passengers. We further assume that the recruited taxis have enough room to store the package.
Unlike [4, 30, 31], we recruit taxis to collect packages before picking up passengers and to unload packages after dropping off passengers, so as to minimize the potential impact on passengers' experiences. This assumption aims to guarantee the quality of taxi service for passengers. In fact, unlike ride-sharing among passengers, it is infeasible for taxi drivers to stop deliberately to pick up packages on the way while delivering passengers. In passenger ride-sharing, passengers can easily be picked up on the way because they can proactively wait and get into the car at the appointed locations. In contrast, to collect a package, a taxi driver has to take a sequence of complex actions, including parking, getting out of the car, loading the package, and getting back into the car.
The taxi drivers are willing to accept the assigned package delivery tasks.
We believe that this assumption is realistic given proper incentive mechanisms. A prime principle in designing such mechanisms is that the reward should match the effort put in by the drivers; for example, if a place is harder to reach or to park at, the incentive should be higher. However, designing a proper incentive mechanism is beyond the scope of this paper.
The packages are traceable.
During delivery, a package is either stored at an interchange station or carried by a scheduled taxi. Each interchange station is authorized and has a unique ID; each taxi is registered with the taxi management department and also has a unique ID. This assumption addresses package security, since any package damage or loss would undermine this novel package delivery service.
III-B Problem Statement
Collaborative crowdsourced package delivery leveraging relays of passenger-occupied taxis can be viewed as the problem of finding arriving-on-time paths, and can thus be formulated as follows.
Given:
A historical set of taxi trajectory records, e.g., from the past month in the target city;
A set of real-time taxi ordering requests from mobile phone apps and a set of real-time package delivery requests, both of which arrive as streams;
A deadline specified by the user for each package delivery request.
Objectives: Build a package transport network (i.e., identify the interchange stations and estimate the edge values) from the historical taxi trajectory data, and find, for each package delivery request, a delivery path that brings the package to its destination by the deadline. However, such a path may be neither unique nor guaranteed to exist. To mitigate this issue, we transform the problem into the arriving-on-time problem, i.e., finding the path that is expected to have the maximum probability of arriving on time (if even this optimal path cannot deliver the package on time, the delivery is deemed unsuccessful).
The following two constraints should be satisfied.
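Only taxis that accept real-time taxi ordering requests submitted after the package delivery request is posted can be scheduled, i.e., the submission time of the taxi ordering request must be later than the birth time of the package delivery request.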
A recruited taxi can be available to participate again only after completing the current task (i.e., dropping off the package at the predefined interchange station).
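The two constraints can be expressed as a simple eligibility check. This is a minimal sketch with hypothetical class and field names (for example, `taxi_id` and the `TaxiRegistry` helper are not from the paper), and it assumes request times are comparable numbers such as minutes.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class TaxiOrderingRequest:
    taxi_id: str        # hypothetical: id of the taxi that accepted the request
    origin: str         # passenger's origin
    destination: str    # passenger's intended destination
    submit_time: float  # submission time (assumed unit: minutes)


@dataclass(frozen=True)
class PackageDeliveryRequest:
    origin: str         # interchange station where the package starts
    destination: str    # interchange station where the package must end
    birth_time: float   # time the user submitted the request (minutes)


@dataclass
class TaxiRegistry:
    """Hypothetical helper tracking taxis that are currently carrying a package."""
    busy: set = field(default_factory=set)

    def can_schedule(self, taxi_req, pkg_req) -> bool:
        # Constraint 1: the taxi request must be submitted after the package request.
        if taxi_req.submit_time <= pkg_req.birth_time:
            return False
        # Constraint 2: a taxi still carrying a package cannot be recruited again.
        return taxi_req.taxi_id not in self.busy

    def assign(self, taxi_id: str) -> None:
        self.busy.add(taxi_id)

    def release(self, taxi_id: str) -> None:
        # Called once the package has been dropped at the predefined interchange station.
        self.busy.discard(taxi_id)
```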
III-C System Overview
We develop CrowdExpress, a two-phase system consisting of offline package transport network building and online taxi scheduling and package routing, as shown in Fig. 3. For each package delivery request, it finds the route with the maximum probability of arriving on time within the given deadline by collaboratively recruiting taxis that have been reserved by (i.e., are occupied by) passengers.
Phase I is an offline process that takes historical taxi trajectory data as input and aims to identify the package interchange stations, estimate the edge values, and find the reference paths for every OD pair. Based on the constructed package transport network, Phase II takes the streaming taxi ordering requests as input and handles each incoming package delivery request in four online steps, namely Exploitability Checking, Probability Computation and Comparison, Package Information Updating, and Stopping Criteria Checking. The system finally outputs the corresponding package delivery paths. The technical details are presented in the next two sections.
IV Phase I: Offline Package Transport Network Building
The task of offline package transport network building is to identify the interchange stations (i.e., node locations) and to estimate the travel time distributions (i.e., edge values). We achieve these objectives with a three-step procedure, detailed as follows.
IV-A Package Interchange Station Identification
One basic principle for identifying interchange stations is that they should be located where passengers are frequently picked up or dropped off, so as to take full advantage of passenger-sending rides for package hitchhiking and to minimize the extra effort imposed on taxi drivers. Fortunately, such information (i.e., pick-up and drop-off points) can easily be extracted from the taxi trajectory data. We then cluster these points using the DBSCAN algorithm, as it can group nearby points into clusters of arbitrary shape. Finally, locations near the centroid of each cluster and along the roadside are chosen as the interchange stations for package collection, storage, and receipt. The locations of the package interchange stations in the Manhattan area of NYC are shown in Fig. 4. No interchange stations are identified in Upper Manhattan, since fewer passengers take taxis to go there.
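A minimal pure-Python DBSCAN over pick-up and drop-off points, followed by centroid computation, illustrates this step. It assumes the points have already been projected to planar coordinates in metres; `eps` and `min_pts` are hypothetical parameters since their values are not given here, and the O(n²) neighbour search only suits small samples (a library implementation such as scikit-learn's `DBSCAN` would be the practical choice).

```python
import math


def dbscan(points, eps, min_pts):
    """Minimal DBSCAN on planar (x, y) points in metres; returns labels, -1 = noise."""
    n = len(points)
    labels = [None] * n  # None marks an unvisited point

    def neighbours(i):
        return [j for j in range(n) if math.dist(points[i], points[j]) <= eps]

    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_pts:
            labels[i] = -1  # noise for now; may later become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # former noise becomes a border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neigh = neighbours(j)
            if len(j_neigh) >= min_pts:
                queue.extend(j_neigh)  # j is a core point: keep expanding
    return labels


def cluster_centroids(points, labels):
    """Centroid of each cluster (noise excluded): candidate station locations."""
    sums = {}
    for (x, y), lab in zip(points, labels):
        if lab == -1:
            continue
        sx, sy, c = sums.get(lab, (0.0, 0.0, 0))
        sums[lab] = (sx + x, sy + y, c + 1)
    return {lab: (sx / c, sy / c) for lab, (sx, sy, c) in sums.items()}
```

In the text, the final station is placed near the centroid and along the roadside; snapping a centroid to the road network is outside this sketch.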
IV-B Edge Estimation
IV-B1 From Trajectory Data to Passenger Flow
It is straightforward to infer the passenger flow between any two interchange stations during a given time slot from the trajectory data. Specifically, we first group the trajectories by their starting time, i.e., by time slot. Second, to compute the passenger flow from interchange station $v_i$ to $v_j$, we count the trajectories $tr$ satisfying Eqs. 6 and 7. Note that there may be no passenger flow between some pairs of interchange stations.
$$\mathrm{dist}(\mathrm{loc}(v_i),\, tr.o) \le \delta \quad (6)$$
$$\mathrm{dist}(tr.d,\, \mathrm{loc}(v_j)) \le \delta \quad (7)$$
where $tr.o$ and $tr.d$ are the origin and destination points of $tr$, respectively; $\mathrm{loc}(\cdot)$ returns the latitude and longitude of the given interchange station; $\mathrm{dist}(a, b)$ calculates the driving distance from point $a$ to point $b$; and $\delta$ is a user-specified parameter. The physical meaning of $\delta$ is that any passenger-delivery ride that starts and ends near a pair of interchange stations (i.e., within driving distance $\delta$) can be hitchhiked for package delivery between this pair. Hence, for a given OD pair, a larger $\delta$ yields a larger passenger flow. It is worth noting that a single trajectory may satisfy Eqs. 6 and 7 for multiple interchange station pairs; in other words, it can provide a package hitchhiking ride between all these pairs. Therefore, a larger $\delta$ also yields more interchange station pairs per trajectory, making each trajectory more capable of providing hitchhiking rides. However, for passengers, a larger $\delta$ may mean a longer wait for the reserved taxi, since the taxi driver might have to travel farther to collect the package before picking up the passenger. To limit this additional waiting time, we set $\delta$ to 500 meters.
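Counting passenger flow per ordered station pair can be sketched as follows. The sketch assumes the trajectories have already been grouped by time slot and that points are planar coordinates in metres; importantly, it uses straight-line distance as a stand-in for the driving distance in Eqs. 6 and 7, which would require a road-network router. The function name and data layout are hypothetical.

```python
import math
from collections import Counter

DELTA = 500.0  # metres, the threshold chosen in the text


def passenger_flow(trips, stations, delta=DELTA):
    """Count trips usable as hitchhiking rides for each ordered station pair.

    trips: list of ((ox, oy), (dx, dy)) origin/destination points in metres,
           all from the same time slot (assumed pre-grouped).
    stations: dict station_id -> (x, y).
    Straight-line distance approximates driving distance (an assumption).
    """
    flow = Counter()
    for origin, dest in trips:
        # Stations whose location is within delta of the trip origin (Eq. 6 analogue).
        starts = [s for s, loc in stations.items() if math.dist(loc, origin) <= delta]
        # Stations within delta of the trip destination (Eq. 7 analogue).
        ends = [s for s, loc in stations.items() if math.dist(dest, loc) <= delta]
        for s in starts:
            for e in ends:
                if s != e:  # round trips are excluded
                    flow[(s, e)] += 1
    return flow
```

A single trip may be counted for several station pairs, which mirrors the observation that a larger threshold makes each trajectory useful for more pairs.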
IV-B2 From Passenger Flow to Time Distributions
An edge value consists of two parts: the waiting time and the driving time. The driving time is simply the travel time of each qualifying taxi trajectory.
The waiting time on edge $(v_i, v_j)$ is defined as the time required to wait for a suitable hitchhiking ride that can transport a package from $v_i$ to $v_j$ directly. To estimate it, we model passengers' taxi-taking behavior as a Non-Homogeneous Poisson Process (NHPP). Based on the historical frequency of passengers taking taxis from $v_i$ to $v_j$ (i.e., the passenger flow), we can estimate the waiting time of packages at the interchange stations in different time slots. Specifically, treating the arrival rate as constant within a time slot, the waiting time on the edge from $v_i$ to $v_j$ is:
$$w_{ij} = \frac{L}{\lambda_{ij}} \quad (8)$$
where $\lambda_{ij}$ is the average number of passengers taking taxis from $v_i$ to $v_j$ during the given time slot, and $L$ is the length of that time slot. Note that the waiting time obtained by Eq. 8 is an expected value; the actual waiting time can be much shorter when a suitable passenger-sending trip happens to be available.
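Assuming the waiting time takes the form L/λ, i.e., the mean waiting time of a Poisson process whose rate λ/L is constant within a time slot of length L, it can be computed and sanity-checked by simulation. Both function names are hypothetical, and the Monte Carlo check relies on the memorylessness of exponential inter-arrival times.

```python
import random


def expected_wait(trips_in_slot: float, slot_length: float) -> float:
    """Mean wait L / lambda for a Poisson process with constant rate within one slot.
    Returns infinity when no trips were observed on the edge in that slot."""
    if trips_in_slot <= 0:
        return float("inf")
    return slot_length / trips_in_slot


def simulated_wait(trips_in_slot, slot_length, samples=100_000, seed=0):
    """Monte Carlo check: by memorylessness, the wait from any instant is exponential."""
    rng = random.Random(seed)
    rate = trips_in_slot / slot_length
    return sum(rng.expovariate(rate) for _ in range(samples)) / samples
```

For example, 12 qualifying trips in a 60-minute slot give an expected wait of 5 minutes, and the simulated average lands close to that value.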
Because the waiting time component is accounted for in advance when computing the arriving-on-time probability of a given path within a given time budget, the corresponding time distribution is obtained by discretizing only the driving time component, according to Definition 7. Fig. 5 shows the time distributions after discretization. As can be observed, driving times longer than 40 minutes account for only a small fraction (less than 1.38%) of the NYC taxi trajectory data. Note that the edge value is set to $\infty$ (i.e., the edge is effectively absent) if there was no passenger flow (or less than a certain amount) on the respective edge in history.
IV-C Reference Path Finding
On the basis of the constructed package transport network, we find two reference paths for each OD pair: the shortest path when the value of each edge is taken as its minimum travel time, and the shortest path when the value of each edge is taken as its maximum travel time. More specifically, given an OD pair and taking the minimum travel time on every edge, we find the shortest path using Dijkstra's algorithm; it represents the best case, i.e., the most efficient delivery if the package is sent via that path. Similarly, taking the maximum travel time on every edge, we find the corresponding shortest path, which, by comparison, represents the worst case, i.e., the least efficient delivery via that path. The total travel times of these two paths are also recorded; they are used to guide the online taxi scheduling and package routing upon real-time taxi ordering requests in the second phase, as detailed in the next section. Note that although finding two shortest paths for every OD pair is time-consuming (with $n$ nodes in the network, there are at most $n(n-1)$ OD pairs, excluding round trips), it can be done offline.
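The two reference paths can be sketched with a standard heap-based Dijkstra run twice, once with each edge weighted by its minimum travel time and once by its maximum. The edge distributions use a hypothetical dict-of-dicts layout (station -> neighbour -> {travel time: probability}), and any waiting-time offset is ignored in this sketch for brevity.

```python
import heapq


def dijkstra(adj, source, target):
    """Shortest path by total weight; adj: node -> {neighbour: weight}.
    Returns (total_weight, path), or (inf, []) if target is unreachable."""
    dist = {source: 0.0}
    prev = {}
    heap = [(0.0, source)]
    visited = set()
    while heap:
        d, node = heapq.heappop(heap)
        if node in visited:
            continue  # stale heap entry
        visited.add(node)
        if node == target:
            break
        for nxt, w in adj.get(node, {}).items():
            nd = d + w
            if nd < dist.get(nxt, float("inf")):
                dist[nxt] = nd
                prev[nxt] = node
                heapq.heappush(heap, (nd, nxt))
    if target not in visited:
        return float("inf"), []
    path = [target]
    while path[-1] != source:
        path.append(prev[path[-1]])
    return dist[target], path[::-1]


def reference_paths(edge_pmfs, source, target):
    """Best-case path (each edge at its minimum travel time) and worst-case path
    (each edge at its maximum travel time)."""
    min_adj = {u: {v: min(pmf) for v, pmf in nbrs.items()} for u, nbrs in edge_pmfs.items()}
    max_adj = {u: {v: max(pmf) for v, pmf in nbrs.items()} for u, nbrs in edge_pmfs.items()}
    return dijkstra(min_adj, source, target), dijkstra(max_adj, source, target)
```

Because only the extreme keys of each distribution are used, the two runs bracket the delivery time of the corresponding paths without any probability computation.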
V Phase II: Online Taxi Scheduling and Package Routing
Given the OD pair of a package, online taxi scheduling is a decision-making task: the system must decide whether to use the currently available hitchhiking ride for package delivery or to wait for future rides, based on the taxi ordering requests arriving in real time and the remaining time margin. Online package routing is much simpler: it just assigns the delivery task to the scheduled taxi, so the package is delivered to the next stop, which is the same as the destination of the taxi. The two steps affect each other. On the one hand, both the potential hitchhiking rides for package delivery and the time margin are closely related to the current stop where the package is located; on the other hand, the next stop that the package heads to is determined by the scheduled taxi. In the following, we mainly focus on online taxi scheduling, which includes the following operations.
V-A Exploitability Checking
Before triggering this phase, we first conduct exploitability checking, i.e., we determine 1) whether the origin of an upcoming taxi ordering request is close to the location of the package; and 2) whether the ride ends at one of the interchange stations. If both conditions are met, we further compare the maximum arriving-on-time probability of sending the package via the taxi's destination station (hitchhiking the current ride) with the maximum arriving-on-time probability of sending it via any other potential next station, i.e., any other neighbour of the current station in the transport network (waiting for the future). If the former value is greater, the package is sent out immediately by hitchhiking the current taxi ride; otherwise, the system waits for new taxi ordering requests that may lead to a higher arriving-on-time probability, and the decision is made again given the new time margin. Thus, the core of online taxi scheduling is the computation and comparison of the maximum arriving-on-time probabilities.
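The two exploitability conditions can be sketched as a pair of distance checks. Here "close to" is assumed to mean within a hypothetical radius of 300 m of great-circle (haversine) distance, stations are given as a hypothetical `{station_id: (lat, lon)}` dictionary, and a ride ending back at the package's current station is treated as not exploitable; none of these choices is specified in the text.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius

def haversine_km(p, q):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (p[0], p[1], q[0], q[1]))
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def nearest_station(point, stations, radius_km):
    """Closest station id within radius_km of point, or None."""
    best, best_d = None, radius_km
    for sid, loc in stations.items():
        d = haversine_km(point, loc)
        if d <= best_d:
            best, best_d = sid, d
    return best

def exploitable_next_stop(pickup, dropoff, package_station, stations, radius_km=0.3):
    """Return the interchange station the ride would carry the package to, or None.

    Condition 1: the ride starts near the package's current station.
    Condition 2: the ride ends near some (other) interchange station."""
    if haversine_km(pickup, stations[package_station]) > radius_km:
        return None
    nxt = nearest_station(dropoff, stations, radius_km)
    return nxt if nxt != package_station else None
```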
V-B Probability Computation and Comparison
V-B1 Probability Computation if Hitchhiking the Current
According to the definition, the maximum arriving-on-time probability in the case of hitchhiking the current ride (supposing that the recruited taxi will go to its destination station, which becomes the package's next stop) can be computed as follows:
where the path runs from the package's current station to its destination with the taxi's destination station as the next stop; the elapsed time is the difference between the occurring time of the taxi ordering request and the birth time of the package delivery request; the deadline is the given time duration; and the last term is the maximum arriving-on-time probability from the next stop to the destination, as defined in Definition 5.
By definition, it is easy to compute the arriving-on-time probability of a determined path under a given deadline. By contrast, it is rather challenging to obtain the value of the latter part of Eq. 9. A naive and straightforward way is to first enumerate all possible paths from the next stop to the destination, then compute the arriving-on-time probability of each, and finally pick the maximum as the final value. This trivial method cannot work in practice, because the number of possible paths between an OD pair can grow exponentially with the size of the network. Moreover, full enumeration is unnecessary, since some branches in the transport network can be trimmed recursively. We propose a novel algorithm named maxProb to compute the probability, which mainly consists of two operations, i.e., initialization and depth-first searching.
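To make the "easy" part concrete: assuming that travel times on different edges are independent discrete random variables (each edge holding a PMF over minutes, a representation assumed here), the arriving-on-time probability of a fixed path is the mass of the convolution of its edge PMFs that falls within the remaining time. The function names are hypothetical.

```python
def convolve(pmf_a, pmf_b):
    """PMF of the sum of two independent discrete travel times {minutes: prob}."""
    out = {}
    for ta, pa in pmf_a.items():
        for tb, pb in pmf_b.items():
            out[ta + tb] = out.get(ta + tb, 0.0) + pa * pb
    return out

def path_pmf(path, edge_pmf):
    """Travel-time PMF of a fixed path; edge_pmf[(u, v)] is that edge's PMF."""
    total = {0: 1.0}  # zero elapsed time with certainty at the start
    for u, v in zip(path, path[1:]):
        total = convolve(total, edge_pmf[(u, v)])
    return total

def on_time_probability(path, edge_pmf, budget):
    """P(total travel time along path <= budget minutes)."""
    return sum(p for t, p in path_pmf(path, edge_pmf).items() if t <= budget)
```

The cost of each call is linear in the path length (times the PMF sizes), which is why evaluating one determined path is cheap while maximizing over all paths is not.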
Initialization: From the next stop to the destination, the two shortest (reference) paths found in Phase I are readily available. We further obtain the two corresponding reference paths from the current station to the destination via the next stop, and compute their arriving-on-time probabilities given the remaining time; these two probabilities serve as boundaries that guide branch trimming. Here, the package is sent from the current station to the next stop directly, and from the next stop to the destination via the intermediate stops determined by the corresponding reference path.
Depth-First-Searching: From the next stop to the destination, we mainly apply the Depth-First-Search (DFS) method to recursively obtain each possible path and compute the maximum arriving-on-time probability. One exception is when the user specifies an extremely long deadline, i.e., when the remaining time is long enough that the package is guaranteed to arrive on time via the reference path. In that case, no DFS is needed and simple taxi scheduling is sufficient. The overall procedure of DFS starting from the next stop can be summarized as follows.
The core function finds the next package stop after the current node, with the pseudocode shown in Algorithm 1. The first task is to obtain the neighbouring stations of the current node from the topology of the constructed package transport network (Line 1). DFS then tries each neighbouring node in turn as the next stop in the loop (Lines 4-9). Whether a neighbour can be the next package stop is determined by the inequality in Line 6. The inequality compares the reference probability, which is first initialized in Line 3 and relies on the time cost of the reference path estimated from the historical taxi trajectory data, with its right side, the maximum arriving-on-time probability in the ideal case where the package is shipped from the neighbour to the destination in the most efficient way (i.e., the time cost of each edge in the package transport network is minimal). If the inequality holds, there may exist a path that yields a higher arriving-on-time probability than the reference path, so DFS continues recursively from that neighbour with the same procedure; otherwise, DFS stops and the corresponding branch is trimmed. Thus, a recursion either stops at some intermediate node or generates a complete path that reaches the destination. If a valid path is found, the reference probability is updated with its probability if and only if that probability is greater than the current value.
The whole DFS ends when all neighbours of the starting node have been checked by repeatedly calling the above recursive procedure. Finally, the maximum arriving-on-time probability of assigning the package delivery task to the currently available taxi is the final value of the reference probability.
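Algorithm 1 itself is not reproduced here, so the following is only a generic branch-and-bound depth-first search in the spirit of maxProb, not the authors' code. It assumes independent discrete edge PMFs over minutes, seeds the incumbent with a reference probability (`best`), and trims a branch whenever an optimistic bound, obtained by letting every remaining edge take its minimum travel time, cannot beat the incumbent. Because actual travel times are never shorter than the minimum, this bound never under-estimates, so trimming never discards a better path. The `init_pmf` and `avoid` parameters are hypothetical conveniences for the hitchhiking case, in which the first hop to the taxi's destination is fixed.

```python
import heapq

def convolve(a, b):
    """PMF of the sum of two independent discrete travel times {minutes: prob}."""
    out = {}
    for ta, pa in a.items():
        for tb, pb in b.items():
            out[ta + tb] = out.get(ta + tb, 0.0) + pa * pb
    return out

def prob_within(pmf, budget):
    """P(T <= budget) for a travel-time PMF."""
    return sum(p for t, p in pmf.items() if t <= budget)

def min_time_to(dest, edge_pmf):
    """Fastest possible time from every node to dest, using each edge's minimum
    travel time (Dijkstra on the reversed graph)."""
    rev = {}
    for (u, v), pmf in edge_pmf.items():
        rev.setdefault(v, []).append((u, min(pmf)))
    dist, heap = {dest: 0}, [(0, dest)]
    while heap:
        d, v = heapq.heappop(heap)
        if d > dist[v]:
            continue  # stale heap entry
        for u, w in rev.get(v, []):
            if d + w < dist.get(u, float("inf")):
                dist[u] = d + w
                heapq.heappush(heap, (d + w, u))
    return dist

def max_prob(start, dest, edge_pmf, budget, best=0.0, init_pmf=None, avoid=()):
    """Branch-and-bound DFS over simple paths from start to dest.

    best: incumbent (reference) probability; the result is never below it."""
    succ = {}
    for (u, v) in edge_pmf:
        succ.setdefault(u, []).append(v)
    lb = min_time_to(dest, edge_pmf)

    def dfs(node, acc_pmf, visited):
        nonlocal best
        if node == dest:
            best = max(best, prob_within(acc_pmf, budget))
            return
        for nxt in succ.get(node, []):
            if nxt in visited or nxt not in lb:
                continue  # avoid cycles and nodes that cannot reach dest
            new_pmf = convolve(acc_pmf, edge_pmf[(node, nxt)])
            # Optimistic bound: every remaining edge takes its minimum time.
            if prob_within(new_pmf, budget - lb[nxt]) > best:
                dfs(nxt, new_pmf, visited | {nxt})
            # otherwise the whole branch below nxt is trimmed

    dfs(start, init_pmf if init_pmf is not None else {0: 1.0}, {start, *avoid})
    return best
```

For the hitchhiking case, a call such as `max_prob(next_stop, dest, edges, remaining, init_pmf=edges[(current, next_stop)], avoid=(current,))` (names hypothetical) fixes the first hop and searches only the continuation.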
V-B2 Probability Computation if Waiting for the Future
If future taxi rides heading to any of the other neighbouring nodes of the current station (i.e., excluding the destination of the current taxi) could lead to a higher arriving-on-time probability than hitchhiking the current ride, waiting is the better decision. The maximum arriving-on-time probability of waiting for the future can be computed as follows:
where the remaining time margin is further reduced by the waiting-time component of the edge value from the current station to the corresponding neighbour. As can be seen, the major difference between Eq. 9 and Eq. 10 is the time margin: less time margin is left for the package delivery, since waiting for future taxi rides induces an additional time cost. Here, we simply use the average waiting time to approximate this additional time cost.
Similarly, all maximum arriving-on-time probabilities of waiting for the future exploitable taxi rides from can be computed, and the maximal one among them will be chosen to represent the maximum arriving-on-time probability if assigning the package delivery task to the future taxis.
V-B3 Probability Comparison
As discussed, once a real-time taxi ordering request is received, online taxi scheduling and package routing are activated, and, depending on the comparison of the two maximum arriving-on-time probabilities, the package is either shipped to an intermediate stop by hitchhiking the current ride or stays at the current stop. Note that since the remaining time margin shrinks as time goes by, the two probabilities computed in Eqs. 9 and 10 change dynamically, so the better decision (hitchhiking the current ride or waiting for the future) can also be adjusted adaptively on the fly.
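The hitchhike-or-wait rule of Eqs. 9 and 10 can be sketched as follows. `max_prob_from` is a hypothetical callable standing in for the maxProb computation, `mean_wait` is an assumed table of average waiting times per edge, and ties are resolved in favour of waiting because the text sends the package immediately only when the hitchhiking probability is strictly greater.

```python
def decide(current, taxi_dest, neighbours, elapsed, deadline, max_prob_from, mean_wait):
    """Hitchhike the current ride or wait for a future one.

    max_prob_from(next_stop, budget): hypothetical callable returning the maximum
        arriving-on-time probability when the package goes to next_stop next with
        `budget` minutes left (e.g. a maxProb-style search).
    mean_wait[(u, v)]: assumed average waiting time (minutes) for a ride u -> v.
    Returns ("hitchhike", p_now) or ("wait", p_future)."""
    margin = deadline - elapsed               # remaining time margin
    p_now = max_prob_from(taxi_dest, margin)  # hitchhiking the current ride
    # Waiting for the future: the margin shrinks by the mean waiting time.
    p_future = max(
        (max_prob_from(n, margin - mean_wait[(current, n)])
         for n in neighbours if n != taxi_dest),
        default=0.0,
    )
    if p_now > p_future:
        return "hitchhike", p_now
    return "wait", p_future
```

Re-running `decide` for every new exploitable request, with the updated `elapsed`, reproduces the on-the-fly adjustment described above.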
V-C Package Information Updating
After the package is sent to the next station, whether by hitchhiking the current ride or a future one, the information about the package delivery request should be updated. Specifically, the origin of the package is set to the station where the package is now located, and the birth time is set to the time when the package arrives at that stop. The updated package delivery request is then used as the input of Phase II.
V-D Stopping Criteria Check
For a package delivery, the previous three operations are conducted iteratively until one of the following two stopping criteria is satisfied: 1) the package has arrived at its destination; or 2) time has run out (the package cannot be delivered by the deadline), in which case the system reports a failure. For failed package deliveries, empty taxis can be dedicated to sending the packages to their destinations; however, this topic is out of the scope of this paper.
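Putting Phase II together, the loop below is a simplified event-driven sketch of iterative scheduling, package information updating and stopping-criteria checking. Rides are assumed to arrive as hypothetical `(request_time, from_station, to_station, travel_minutes)` tuples already matched to stations and sorted by time, and the scheduling policy is passed in as a hypothetical `choose` callable, so that maxProb or any baseline could be plugged in.

```python
from dataclasses import dataclass

@dataclass
class Package:
    origin: str        # station where the package currently waits
    destination: str
    birth_time: float  # minutes; reset each time the package reaches a new stop
    due_time: float    # absolute deadline (minutes)

def route_package(pkg, rides, choose):
    """Repeat Phase II until the package arrives or time runs out.

    rides: (request_time, from_station, to_station, travel_minutes) tuples,
        sorted by request_time and already matched to stations (hypothetical).
    choose(pkg, to_station, now): hypothetical policy, True = hitchhike this ride.
    Returns (status, relays) with status "delivered" or "failed"."""
    relays = 0
    for t, src, dst, minutes in rides:
        if t > pkg.due_time:
            return "failed", relays              # stopping criterion 2
        if src != pkg.origin or t < pkg.birth_time:
            continue                             # not exploitable for this package
        if not choose(pkg, dst, t):
            continue                             # wait for a future ride
        relays += 1
        arrival = t + minutes
        pkg.origin, pkg.birth_time = dst, arrival  # package information updating
        if dst == pkg.destination:               # stopping criterion 1
            return ("delivered" if arrival <= pkg.due_time else "failed"), relays
    return "failed", relays                      # rides exhausted before delivery
```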
VI Experimental Evaluations
In this section, we empirically evaluate the performance of the proposed maxProb algorithm. We first introduce the experimental setup, the baseline algorithms used for comparison, and the evaluation metrics, and then present results on algorithm efficiency and effectiveness. Finally, we discuss some open research issues to be addressed in future work.
VI-A Experimental Setup
VI-A1 Experimental Data
We use real-world datasets for the evaluation, i.e., road network data extracted from OpenStreetMap (www.openstreetmap.org) and one month of taxi trajectory data generated by over 19,000 taxis in New York City (NYC), US. Readers can refer to  for the details on how to extract the road network from the crowd-sourced open platform (i.e., OpenStreetMap) correctly. We determine the package interchange stations according to the algorithm discussed earlier.
For the taxi trajectory data, we split the data into training and testing sets according to the date. Specifically, the training set contains taxi trajectories from 1 to 20 January 2013, which are used to build the package transport network. It should be noted that, for the taxi trajectory data in NYC, no detailed travel routes between the pick-up and drop-off points are provided due to privacy considerations. The testing set contains trajectories from 21 to 31 January 2013, which are used as the real-time taxi ordering requests for testing the performance of the proposed maxProb algorithm. Table I shows some basic statistics of the taxi trajectory and road network data.
|Dataset|Statistic|Value|
|---|---|---|
|Taxi trajectory|Number of taxis|>19,000|
|Taxi trajectory|Number of occupied rides|13 M|
|Road network|Number of road intersections|11,999|
|Road network|Number of road segments|15,202|
VI-A2 Package Delivery Request
Since the datasets do not contain information about package delivery requests, we apply a simple mechanism to simulate them. The proposed package delivery system targets city-wide person-to-person service. Hence, to simulate a package delivery request, we randomly generate its birth time, origin and destination, where both the origin and the destination must be interchange stations.
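A minimal sketch of such a generator is given below. Uniform birth times within a time window and a uniform choice of two distinct interchange stations are assumptions consistent with, but not spelled out by, the description above; the dictionary keys and the 8:00 to 18:00 default window (matching the daytime window used in the experiments) are illustrative choices.

```python
import random

def simulate_requests(n, stations, start_hour=8, end_hour=18, seed=None):
    """Generate n synthetic package delivery requests, sorted by birth time.

    birth_minute: uniform in [start_hour*60, end_hour*60] (minutes after midnight).
    origin/destination: two distinct interchange stations chosen uniformly."""
    rng = random.Random(seed)
    requests = []
    for _ in range(n):
        origin, destination = rng.sample(list(stations), 2)
        requests.append({
            "birth_minute": rng.uniform(start_hour * 60, end_hour * 60),
            "origin": origin,
            "destination": destination,
        })
    requests.sort(key=lambda r: r["birth_minute"])
    return requests
```

Passing a fixed `seed` makes a simulated request set reproducible across the compared algorithms.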
VI-A3 Evaluation Environment
All evaluations in this paper are implemented in Java under the Eclipse J2SE 1.5 integrated development environment, and run on an Intel Core i5-4950 PC with 8 GB of RAM and the Windows 7 operating system.
VI-A4 Evaluation Metrics
We adopt the following three metrics to evaluate the proposed maxProb.
Success Rate. The success rate is the ratio of the number of packages delivered successfully within a given deadline (i.e., time duration) to the total number of packages (i.e., the number of simulated package delivery requests).
where the evaluated path for each package delivery request is the optimal path generated by the proposed maxProb algorithm, and each package is judged against its own given deadline. The delivery performance is better if a higher success rate is achieved under a shorter deadline.
Regarding the deadline setting, we do not use a single absolute value for all package deliveries, since packages originating (ending) at different locations need very different delivery times. Instead, we set a relative deadline for each individual package delivery: the deadline is the average travel time of the package's two reference paths, which naturally differs between packages with different OD pairs, plus an extra time imposed by the user. A smaller extra time indicates that the user needs the package more urgently and wants it to arrive sooner.
Number of Relays. The number of relays during a package delivery is defined as the number of participating taxis (Eq. 13).
On the one hand, fewer relays generally mean a lower chance of package loss or damage, and perhaps lower overhead costs. On the other hand, fewer participating taxis may imply lower reward costs paid to taxi drivers. Thus, the performance is better if the number of relays is smaller.
Package Throughput. The package throughput is the average number of package deliveries that the system completes successfully per day. The system achieves better performance if the package throughput is higher.
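The two remaining metrics can be computed from per-package outcomes as sketched below, assuming each outcome is a hypothetical `(delivered_on_time, relays, day)` tuple. Following the convention stated for the relay experiments, relays are averaged only over on-time deliveries; the throughput is averaged over the days that appear in the records.

```python
def average_relays(outcomes):
    """Mean number of participating taxis over on-time deliveries only.
    outcomes: list of (delivered_on_time, relays, day) tuples."""
    relays = [r for ok, r, _ in outcomes if ok]
    return sum(relays) / len(relays) if relays else 0.0

def daily_throughput(outcomes):
    """Successful deliveries per day, averaged over the days present in outcomes."""
    days = {day for _, _, day in outcomes}
    successes = sum(1 for ok, _, _ in outcomes if ok)
    return successes / len(days) if days else 0.0
```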
VI-B Baseline Algorithms
To show the superior performance of our proposed algorithm, we compare it with the following three baseline algorithms.
(1) FCFS - This method adopts the First-Come-First-Served strategy. Specifically, the package is assigned to the first taxi that will pick up a passenger near the interchange station where the package is located, regardless of the taxi's destination. In fact, this algorithm always favors the strategy of hitchhiking the current ride, and can be viewed as an extension of the simple and well-known flooding strategy [5, 28].
(2) DesCloser - This method assigns the package to the first taxi that will head to somewhere closer to the destination of the package, compared to the current station of the package. This algorithm implements a distance-based geo-cast scheme that is commonly seen in other domains [60, 64].
(3) Direct - This method waits for a taxi that travels directly to the package's destination, so that the package is delivered without any intermediate stops. Specifically, the package is assigned to a taxi that will pick up a passenger near the interchange station where the package is located and drop the passenger off near the interchange station that is the package's destination. Thus, only one taxi participates and no intermediate stops are needed.
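Once a ride has passed exploitability checking, the three baselines differ only in their acceptance rule, which can be sketched as simple predicates. Straight-line distance on planar station coordinates is an assumption standing in for whatever distance DesCloser actually uses, and the function signatures are hypothetical.

```python
import math

def distance(a, b):
    """Straight-line distance between two planar (x, y) coordinates (assumed km)."""
    return math.hypot(a[0] - b[0], a[1] - b[1])

def fcfs(pkg_at, pkg_dest, taxi_dest, coords):
    # First-Come-First-Served: accept any exploitable ride, wherever it goes.
    return True

def des_closer(pkg_at, pkg_dest, taxi_dest, coords):
    # Accept only if the taxi's destination is closer to the package's
    # destination than the package's current station is.
    return distance(coords[taxi_dest], coords[pkg_dest]) < distance(coords[pkg_at], coords[pkg_dest])

def direct(pkg_at, pkg_dest, taxi_dest, coords):
    # Accept only a ride that ends at the package's destination.
    return taxi_dest == pkg_dest
```

None of these predicates looks at the remaining time margin, which is exactly the gap maxProb's probability comparison is meant to close.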
Remark. Each relay in DesCloser is effective, as it ensures that the package moves towards its destination step by step, whereas some relays in FCFS can be ineffective, as the package may move further away from its destination. Direct may be inefficient for package deliveries between stations with little passenger flow in between. However, none of the three baseline algorithms takes the arriving-on-time probability of package deliveries into account, which probably results in a high failure rate.
VI-C Experimental Objectives
We plan the experiments to address the following questions.
Question 1: How much computational resource is required to generate the response for a package delivery request?
Question 2: How does maxProb perform under different given deadlines?
Question 3: How does maxProb perform w.r.t the birth time of package deliveries?
Question 4: How does maxProb perform w.r.t the locations (both origins and destinations) of package deliveries?
Question 5: How many packages can be delivered daily on average (i.e., throughput) with the proposed system?
The first question concerns the efficiency of maxProb, and Questions 2-4 concern its effectiveness. To answer the first question, we measure the response time of the algorithms. Since passenger flows are both time- and space-dependent, to assess the effectiveness of the different algorithms, we calculate their success rates and numbers of relays for packages dispatched to different parts of the city at different times of the day. For Question 5, we test the throughput of the proposed system and the success rate under different numbers of package requests generated per hour, and also examine the system throughput given different numbers (densities) of interchange stations in the designated city.
VI-D Experimental Results
VI-D1 Results of Response Time
We first analyze the main operations involved in each of the four algorithms. For a given package delivery request, when a new real-time taxi ordering request comes in, all four algorithms need to determine whether the ride starts near the current origin of the package and ends at some interchange station, i.e., exploitability checking. FCFS recruits the first taxi that satisfies these criteria, whereas DesCloser further needs to determine whether the taxi's destination is closer to the destination of the package than the package's current location is. Direct also needs to determine whether the taxi will head to the destination of the package. Thus, one more comparison operation is required for both DesCloser and Direct. For maxProb, the procedure is even more complicated, mainly requiring the additional probability computation and comparison operations discussed previously. Each algorithm repeats its own procedure at each intermediate station (except for Direct), so the total response time is the computational time accumulated over all iterations.
We compare the average response times of the four algorithms in Fig. 6. The average response time of FCFS is the largest, while that of Direct is the smallest. The average response times of DesCloser and maxProb lie in between, with DesCloser taking slightly more time than maxProb. More precisely, the average response time of Direct is within 7 milliseconds, that of maxProb is around 22 milliseconds, and that of FCFS is no more than 30 milliseconds. These results indicate that all four algorithms are quite efficient and can plan and adjust shipping routes in real time. We also observe an interesting phenomenon: although FCFS involves only two simple comparisons for each candidate taxi, it is the most time-consuming method in accumulated terms. In comparison, maxProb, which contains the most sophisticated operations, needs a shorter response time than both FCFS and DesCloser. We attribute this to the fact that FCFS and DesCloser require more rounds of computation (i.e., more relays) than maxProb.
We further evaluate the effectiveness of branch trimming in the probability computation of maxProb in terms of average response time saving, with the results shown in Fig. 7. For better illustration, we also highlight the results for deadlines within the range of [20, 40] minutes in the top-left part of the figure. Branch trimming yields a significant time saving. Specifically, the gap between the average response times with and without branch trimming grows exponentially as the given deadline increases, whereas the average response times with branch trimming remain stable and small under all given deadlines. The same 100 package delivery requests are used for all efficiency studies.
VI-D2 Results of Success Rate
We present the success rate of maxProb under different deadlines in Fig. 8. More specifically, the extra time is varied from 20 to 100 minutes in steps of 20 minutes. As one can see, the success rate is high under all deadlines, with a value above 90%. The success rate first increases slightly and then decreases gradually as users allow longer deadlines. The highest success rate, around 95%, appears when the extra time is set to 60 minutes. This phenomenon seems counterintuitive at first glance, since intuition suggests that the success rate should be higher under a longer deadline. The root cause is that, given a longer deadline, the arriving-on-time probability of hitchhiking the current ride becomes greater, as guaranteed by Theorem 1. In the extreme case, this probability always equals one and dominates the alternatives; as a consequence, maxProb tends to select the hitchhiking-the-current strategy when routing packages. Under such circumstances, maxProb degrades to FCFS to some extent, causing the slight decrease in the success rate. To avoid such degradation, the key is to lower the value computed by Eq. 9. One potential solution is thus to reduce the remaining time margin during package routing when applying maxProb. For instance, if the hitchhiking-the-current strategy is always preferred during package routing, the remaining time margin can be manually reduced to 90% of the true one imposed by the user. It should be noted that the success rate of FCFS is much lower than that of maxProb, as verified in the following experiments. The number of package delivery requests is 10,000, with birth times uniformly distributed over the daytime (i.e., from 8:00 to 18:00).
We are also interested in the success rate at different times of the day. As can be observed in Fig. 9, the success rate is stable and high (above 95%) from 8:00 to 18:00, demonstrating the practical usefulness of the proposed system during the daytime. The success rate is extremely low during the late night and early morning; for instance, it is below 10% from 2:00 to 5:00. To understand the root cause, we also show the number of passenger-delivery trips at different times of the day in Fig. 9. As expected, the success rate is highly correlated with the number of passenger-delivery trips: more passenger-delivery trips provide more hitchhiking opportunities for package deliveries and thus a higher success rate. Interestingly, the success rate at 17:00 is still high even though the number of passenger-delivery trips at that time is not large compared with nearby hours. This is because, on the one hand, hitchhiking opportunities for package deliveries accumulate over the given deadline (usually longer than an hour), and, on the other hand, the number of passenger-delivery trips increases and remains high over the next two hours. By contrast, the number of passenger-delivery trips during the late night and early morning is continuously small; during these periods, although hitchhiking opportunities accumulate, they remain insufficient, resulting in a poor success rate. In this study, 1,000 package delivery requests are generated for each hour, and the extra time is fixed to 60 minutes.
We report the success rates of the four algorithms w.r.t. the time of day. For simplicity, instead of reporting the success rate for every hour, we split a day into two time slots, i.e., daytime from 8:00 to 18:00 and night time for the rest. As shown in Fig. 10, the success rate is higher in the daytime than at night for all algorithms except FCFS, which achieves similarly low success rates in both time slots. Direct achieves fairly close performance in the daytime and at night. As predicted, the success rate of FCFS is the lowest, i.e., below 10%. Compared with the three baseline algorithms, maxProb achieves the best success rate both in the daytime and at night. In this experiment, 10,000 package delivery requests are used, with birth times uniformly distributed over the daytime and night time, respectively; the extra time is fixed to 60 minutes.
In reality, more passengers take taxis at some interchange stations than at others, which provides more hitchhiking opportunities for package deliveries and probably leads to a better success rate. We thus manually categorize the interchange stations into two classes (i.e., popular and unpopular) in advance, based on their total numbers of historical pick-ups and drop-offs: 19 interchange stations are labeled as popular and 15 as unpopular. We further identify three categories of package delivery requests according to the labels of their origin and destination stations, as shown in Table II. We then test the success rate for each category, using 10,000 package delivery requests per category.
|Category|Description|
|---|---|
|Category I|Packages start and end at popular stations (p2p)|
|Category II|Packages start at a popular station and end at an unpopular one, or vice versa (p2u or u2p)|
|Category III|Packages start and end at unpopular stations (u2u)|
Fig. 11 compares the success rates of the four algorithms under a given deadline (the extra time is fixed to 60 minutes in this study) for the three categories. maxProb achieves the best performance for all three categories, while FCFS generally performs the worst. One exception is Direct on Category III packages: without any relay at intermediate stations, Direct obtains an extremely low success rate (under 5%), because there is insufficient passenger flow between two unpopular stations. For the same algorithm, performance also differs across categories of package delivery requests: it is the best for Category I packages and the worst for Category III packages. In particular, maxProb ensures that around 90% of Category I packages (around 75% for Categories II and III) arrive on time. By contrast, FCFS delivers fewer than 10% of the packages in every category by the deadline. Similar to the previous results, DesCloser performs better than FCFS and worse than maxProb for all three categories.
VI-D3 Results of Number of Relays
We compare the number of relays of maxProb under different given deadlines, with the results shown in Fig. 12. Note that, in all experiments on the number of relays, relays are counted and averaged only over packages delivered on time. One can observe that the number of relays increases with the given deadline. One possible reason is that, as discussed, maxProb is inclined to hitchhike the incoming taxi immediately under longer deadlines, which results in more ineffective relays (i.e., the package often moves back and forth instead of steadily towards its destination).
We also show how the number of relays obtained by maxProb varies at different times of the day in Fig. 13. Overall, the number of relays remains more or less unchanged (around 3) throughout the day, except in the early morning, when it is relatively smaller, probably because of the very low passenger flow at that time.
We report the number of relays of the four algorithms for the daytime and night time, respectively, in Fig. 14. As expected, the number of relays for Direct is always one. For the other three algorithms, slightly more relays are generally required in the daytime than at night. Moreover, somewhat surprisingly, the numbers of relays resulting from DesCloser and FCFS are quite close to that of maxProb, implying that, for a successful package delivery, the number of relays is largely independent of the adopted algorithm. In future work, we will investigate the potential causes in more depth, both qualitatively and quantitatively, e.g., through the geographical and temporal distributions of successful package deliveries.
We further report the results of the number of relays for four algorithms w.r.t the package categories, as shown in Fig. 15. Similarly, the number of relays for Direct is one, regardless of the package categories. Compared to the other two algorithms (i.e., DesCloser and FCFS), maxProb requires slightly fewer relays for all three package categories. Similar to the results of the success rate w.r.t the package categories, for all algorithms except for Direct, the performance in terms of the number of relays is the best for the Category I packages; the performance for the Category III packages is the worst.
VI-D4 Results of Package Throughput
Evaluating the package throughput of the system is necessary, since throughput is a primary consideration in real-life deployments. Fig. 16 shows the number of packages that the proposed system can transport successfully w.r.t. the total number of package delivery requests generated per day, together with the success rate. More specifically, the package throughput increases gradually with the number of generated requests before approaching a stable value, whereas the success rate declines almost linearly. As can be seen, the maximum package throughput is around 20,000 per day; however, the corresponding success rate is quite low (around 40%) and may not be acceptable in real life. To make the proposed system practical in real situations, the package throughput should be kept at around 9,500 per day to maintain a relatively high success rate.
We further examine the package throughput under different densities (numbers) of interchange stations, as shown in Fig. 17. As expected, the more interchange stations the system has, the higher the throughput it achieves (10,000 package delivery requests are used in this study). We attribute this mainly to an improved utilization ratio of taxi trips for package deliveries, defined as the ratio between the number of taxi trips involved in package deliveries and the total number of taxi trips. We thus also plot the utilization ratio under different densities of interchange stations in Fig. 17. As evidenced, the utilization ratio increases (35.5%, 58.3% and 64.9%, respectively) as the interchange stations in the city become denser (18, 34 and 46 stations, respectively). Moreover, there is still considerable room to further increase the package throughput by placing more interchange stations to better utilize taxi trips. We are aware that the choice of interchange stations (different locations but the same number) may affect the throughput result, but we expect the overall trend to remain the same.
VI-E Open Research Issues
In this section, we discuss some open issues which are not resolved in this work but will be addressed in the future.
Package Delivery Request. Modeling package delivery requests is a challenging task that requires additional data sources, such as demographics, socio-economics, land use, and online purchasing activities. How to accurately model city-wide package flow distributions can be a separate research problem in itself, and, as far as we know, no reliable model exists yet. In the scope of this paper, we focus on discovering near-optimal package delivery paths with high efficiency, given any package delivery request. Hence, we simply generate package birth times, origins and destinations randomly. In the future, we plan to incorporate more real-world data sources to model city-wide package delivery requests and integrate them into our framework.
Transport Network Optimization. Current research on crowd logistics mostly focuses on developing advanced package routing algorithms. In contrast, less attention has been paid to optimizing the package transport network (i.e., the number and locations of interchange stations). In the future, we plan to address the two issues jointly.
Multi-objective Optimization. On-time performance is one of the important optimization objectives of crowdsourced logistics. For logistics service providers, other objectives are equally important, such as the money paid to taxi drivers: the more taxi drivers are recruited, the more rewards must be paid. Thus, minimizing the number of package relays is worth considering. For taxi drivers, minimizing the detour driving distance is of great importance. In the future, we plan to formulate crowd delivery via hitchhiking taxi rides as a multi-objective optimization problem.
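One standard way to reason about several competing objectives is Pareto dominance: a candidate path is kept only if no other candidate is at least as good on every objective and strictly better on one. The minimal Python sketch below applies this to the three objectives named above (delivery time, number of relays, detour distance). It is an illustration of the general technique under assumed inputs, not the formulation the authors intend; the candidate names and numbers are made up and are not experimental results.

```python
from typing import NamedTuple


class CandidatePath(NamedTuple):
    """A hypothetical candidate delivery path scored on three objectives."""
    name: str
    delivery_time_min: float  # lower is better: on-time performance
    num_relays: int           # lower is better: fewer drivers to reward
    detour_km: float          # lower is better: less detour for drivers


def objectives(p: CandidatePath) -> tuple[float, float, float]:
    return (p.delivery_time_min, p.num_relays, p.detour_km)


def dominates(a: CandidatePath, b: CandidatePath) -> bool:
    """True if a is no worse than b on every objective and strictly better on one."""
    fa, fb = objectives(a), objectives(b)
    return all(x <= y for x, y in zip(fa, fb)) and any(x < y for x, y in zip(fa, fb))


def pareto_front(candidates: list[CandidatePath]) -> list[CandidatePath]:
    """Keep the candidates that no other candidate dominates (input order preserved)."""
    return [c for c in candidates
            if not any(dominates(other, c) for other in candidates if other is not c)]


paths = [
    CandidatePath("A", 60.0, 3, 2.0),
    CandidatePath("B", 75.0, 1, 1.0),
    CandidatePath("C", 80.0, 3, 2.5),  # dominated by both A and B
]
print([p.name for p in pareto_front(paths)])  # ['A', 'B']
```

A and B both survive because each trades one objective against another (A is faster, B needs fewer relays); choosing between them is where provider and driver preferences would enter.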
Practical Issues. Many practical issues remain to be addressed before the system can be realized in practice. The capacity of interchange stations is one example. On the one hand, interchange stations frequently visited by passengers are more likely to be recruited for relaying packages, and thus generally require a larger capacity. On the other hand, the spatial and temporal traffic patterns of package deliveries also have a critical impact on the required capacity. Other practical issues, such as incentive mechanisms for taxi drivers, package pricing methods, the cooperation cost of interchange stations, package sizes, and standard package containers, also need further investigation.
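To make the link between temporal traffic patterns and station capacity concrete, the storage an interchange station needs is at least the peak number of packages it holds at the same time. The minimal Python sketch below computes that peak with a sweep over arrival and departure events. It assumes, hypothetically, that the arrival and departure time of every package relayed through a station is known; it is an illustration, not part of the proposed framework.

```python
def peak_occupancy(stays: list[tuple[float, float]]) -> int:
    """Maximum number of packages held simultaneously at one interchange station.

    stays: (arrival_time, departure_time) of each package relayed through the station.
    """
    events: list[tuple[float, int]] = []
    for arrive, depart in stays:
        events.append((arrive, +1))   # package dropped off by an incoming taxi
        events.append((depart, -1))   # package picked up by an outgoing taxi
    # Sorting tuples puts -1 before +1 at equal timestamps, so a package
    # leaving at time t frees its slot before one arriving at t takes it.
    events.sort()
    current = peak = 0
    for _, delta in events:
        current += delta
        peak = max(peak, current)
    return peak


# Four packages passing through one station (times in minutes)
print(peak_occupancy([(0, 10), (5, 15), (10, 20), (12, 14)]))  # 3
```

Running this per station over a day of simulated relays would show how the same number of packages can require very different capacities depending on when they arrive.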
VII Conclusion and Future Work
In this paper, we present a novel framework called CrowdExpress for package delivery path planning. The framework exploits hitchhiking rides provided by occupied taxis to transport packages on time without degrading the quality of passenger services. More specifically, we first built the package transport network by mining taxi GPS trajectory data offline. Then we proposed a novel two-phase approach for package delivery path planning. Using real-world datasets, including road network data and a large-scale taxi trajectory dataset generated by over 19,000 taxis over one month in New York City, US, we compared our proposed method with three baseline algorithms and showed that our method is more efficient and effective.
In the future, we plan to broaden and deepen this work in several directions. First, we plan to link package deadlines to package pricing. Currently, we set a single relative deadline for all package deliveries and treat them equally. We plan to assign different priorities to packages according to users' expected arrival times. For instance, users would be charged more if they want their packages to arrive earlier, and new package routing algorithms would need to be developed accordingly. Second, we intend to improve system throughput and coverage, for example by grouping packages with nearby destinations and by optimizing the number and locations of interchange stations. Finally, we plan to implement and test our system with real users in actual settings and collect feedback on how to further improve the service.
The work was supported by the National Key Research and Development Project of China (No. 2017YFB1002000), the National Science Foundation of China (No. 61602067 and No. 61872050), Chongqing Basic and Frontier Research Program (No. cstc2018jcyjAX0551) and the Fundamental Research Funds for the Central Universities (No. 2018cdqyjsj0024). Chao Chen is the corresponding author.
-  L. Alarabi, A. Eldawy, R. Alghamdi, and M. F. Mokbel. TAREEG: a mapreduce-based web service for extracting spatial data from openstreetmap. In Proc. of the ACM SIGMOD, pages 897–900, 2014.
-  T. H. Ali, S. Radhakrishnan, S. Pulat, and N. C. Gaddipati. Relay network design in freight transportation systems. Transportation Research Part E: Logistics and Transportation Review, 38(6):405–422, 2002.
-  C. Archetti, M. Savelsbergh, and M. G. Speranza. The vehicle routing problem with occasional drivers. European Journal of Operational Research, 254(2):472–480, 2016.
-  A. M. Arslan, N. Agatz, L. Kroon, and R. Zuidwijk. Crowdsourced delivery—a dynamic pickup and delivery problem with ad hoc drivers. Transportation Science, to appear, 2018.
-  A. Balasubramanian, B. Levine, and A. Venkataramani. DTN routing as a resource allocation problem. In Proc. of ACM SIGCOMM, pages 373–384, 2007.
-  R. E. Bellman and S. E. Dreyfus. Applied dynamic programming. Princeton University Press, 2015.
-  P. S. Castro, D. Zhang, C. Chen, S. Li, and G. Pan. From taxi GPS traces to social and community dynamics: A survey. ACM Computing Surveys, pages 17:1–17:34, 2013.
-  P. S. Castro, D. Zhang, and S. Li. Urban traffic modelling and prediction using large scale taxi GPS traces. In Proc. of Pervasive, pages 57–72, 2012.
-  C. Chen, S. Jiao, S. Zhang, W. Liu, L. Feng, and Y. Wang. Tripimputor: Real-time imputing taxi trip purpose leveraging multi-sourced urban data. IEEE Transactions on Intelligent Transportation Systems, 99:1–13, 2018.
-  C. Chen, J. Ma, Y. Susilo, Y. Liu, and M. Wang. The promises of big data and small data for travel behavior (aka human mobility) analysis. Transportation Research Part C: Emerging Technologies, 68:285–299, 2016.
-  C. Chen, D. Zhang, P. Castro, N. Li, L. Sun, S. Li, and Z. Wang. iBOAT: Isolation-based online anomalous trajectory detection. IEEE Transactions on Intelligent Transportation Systems, 14(2):806–818, 2013.
-  C. Chen, D. Zhang, B. Guo, X. Ma, G. Pan, and Z. Wu. TripPlanner: Personalized trip planning leveraging heterogeneous crowdsourced digital footprints. IEEE Transactions on Intelligent Transportation Systems, 16(3):1259–1273, 2015.
-  C. Chen, D. Zhang, N. Li, and Z.-H. Zhou. B-Planner: Planning bidirectional night bus routes using large-scale taxi GPS traces. IEEE Transactions on Intelligent Transportation Systems, 15(4):1451–1465, 2014.
-  C. Chen, D. Zhang, X. Ma, B. Guo, L. Wang, Y. Wang, and E. Sha. CrowdDeliver: Planning city-wide package delivery paths leveraging the crowd of taxis. IEEE Transactions on Intelligent Transportation Systems, 18(6):1478–1496, 2017.
-  W. Chen, M. Mes, and M. Schutten. Multi-hop driver-parcel matching problem with time windows. Flexible Services and Manufacturing Journal, pages 1–37, 2017.
-  T. Crainic, L. Gobbato, G. Perboli, and W. Rei. Logistics capacity planning: A stochastic bin packing formulation and a progressive hedging metaheuristic. Technical report, CIRRELT-2014-66, 2014.
-  A. Devari. Crowdsourced last mile delivery using social networks. Master’s thesis, The State University of New York at Buffalo, 2016.
-  Y. Ding, C. Chen, S. Zhang, B. Guo, Z. Yu, and Y. Wang. GreenPlanner: Planning personalized fuel-efficient driving routes using multi-sourced urban data. In IEEE International Conference on Pervasive Computing and Communications (PerCom), pages 207–216, 2017.
-  A. Doan, R. Ramakrishnan, and A. Y. Halevy. Crowdsourcing systems on the world-wide web. Commun. ACM, 54(4):86–96, Apr. 2011.
-  M. Ester, H.-P. Kriegel, J. Sander, X. Xu, et al. A density-based algorithm for discovering clusters in large spatial databases with noise. In Proc. of ACM KDD, volume 96, pages 226–231, 1996.
-  D. J. Fagnant and K. M. Kockelman. The travel and environmental implications of shared autonomous vehicles, using agent-based model scenarios. Transportation Research Part C: Emerging Technologies, 40:1–13, 2014.
-  E. Fatnassi, J. Chaouachi, and W. Klibi. Planning and operating a shared goods and passengers on-demand rapid transit system for sustainable city-logistics. Transportation Research Part B: Methodological, 81:440–460, 2015.
-  Y. Ge, H. Xiong, A. Tuzhilin, K. Xiao, M. Gruteser, and M. Pazzani. An energy-efficient mobile recommender system. In Proc. of ACM KDD, pages 899–908, 2010.
-  B. L. Golden, L. Levy, and R. Vohra. The orienteering problem. Naval Research Logistics, 34(3):307–318, 1987.
-  J. González-Feliu, M. G. Cedillo-Campo, and J. L. García-Alcaraz. An emission model as an alternative to OD matrix in urban goods transport modelling. Dyna, 81(187):249–256, 2014.
-  B. Guo, Z. Wang, Z. Yu, Y. Wang, N. Y. Yen, R. Huang, and X. Zhou. Mobile crowd sensing and computing: The review of an emerging human-powered sensing paradigm. ACM Computing Surveys, 48(1):7:1–7:31, 2015.
-  N. Kafle, B. Zou, and J. Lin. Design and modeling of a crowdsource-enabled system for urban parcel relay and delivery. Transportation Research Part B: Methodological, 99:62–82, 2017.
-  Y.-B. Ko and N. H. Vaidya. Flooding-based geocasting protocols for mobile ad hoc networks. ACM/Springer MONET, 7(6):471–480, 2002.
-  B. Leng, H. Du, J. Wang, L. Li, and Z. Xiong. Analysis of taxi drivers’ behaviors within a battle between two taxi apps. IEEE Transactions on Intelligent Transportation Systems, 17(1):296–300, 2015.
-  B. Li, D. Krushinsky, H. A. Reijers, and T. Van Woensel. The share-a-ride problem: People and parcels sharing taxis. European Journal of Operational Research, 238(1):31–40, 2014.
-  B. Li, D. Krushinsky, T. Van Woensel, and H. A. Reijers. The share-a-ride problem with stochastic travel times and stochastic delivery locations. Transportation Research Part C: Emerging Technologies, 67:95–108, 2016.
-  Y. Liu, B. Guo, C. Chen, H. Du, Z. Yu, D. Zhang, and H. Ma. Foodnet: Toward an optimized food delivery network based on spatial crowdsourcing. IEEE Transactions on Mobile Computing, to appear, 2018.
-  Y. Liu, C. Liu, N. Yuan, L. Duan, Y. Fu, H. Xiong, S. Xu, and J. Wu. Exploiting heterogeneous human mobility patterns for intelligent bus routing. In Proc. of IEEE ICDM, pages 360–369, Dec 2014.
-  Y. Liu, F. Wang, Y. Xiao, and S. Gao. Urban land uses and traffic ‘source-sink areas’: Evidence from GPS-enabled taxi data in Shanghai. Landscape and Urban Planning, 106(1):73–87, 2012.
-  R. Lowe and M. Rigby. The last mile–exploring the online purchasing and delivery journey. Technical report, Barclays, 2014.
-  J. McInerney, A. Rogers, and N. R. Jennings. Crowdsourcing physical package delivery using the existing routine mobility of a local population. In the Orange D4D Challenge, 2014.
-  L. Moreira-Matias, J. Gama, M. Ferreira, J. Mendes-Moreira, and L. Damas. Predicting taxi–passenger demand using streaming data. IEEE Transactions on Intelligent Transportation Systems, 14(3):1393–1402, 2013.
-  Y. Nie and Y. Fan. Arriving-on-time problem: discrete algorithm that ensures convergence. Transportation Research Record: Journal of the Transportation Research Board, (1964):193–200, 2006.
-  B. Pan, Y. Zheng, D. Wilkie, and C. Shahabi. Crowd sensing of traffic anomalies based on human mobility and social media. In Proc. of ACM SIGSPATIAL, pages 344–353, 2013.
-  G. Pan, G. Qi, Z. Wu, D. Zhang, and S. Li. Land-use classification using taxi GPS traces. IEEE Transactions on Intelligent Transportation Systems, 14(1):113–123, 2013.
-  H. B. Rai, S. Verlinde, J. Merckx, and C. Macharis. Crowd logistics: an opportunity for more sustainable urban freight transport? European Transport Research Review, 9(3):39, 2017.
-  A. J. Rohm and V. Swaminathan. A typology of online shoppers based on shopping motivations. Journal of Business Research, 57(7):748–757, 2004.
-  A. Sadilek, H. Kautz, and J. P. Bigham. Finding your friends and following them to where you are. In Proc. of WSDM, pages 723–732, 2012.
-  A. Sadilek, J. Krumm, and E. Horvitz. Crowdphysics: Planned and opportunistic crowdsourcing for physical tasks. In Proc. ICWSM, 2013.
-  I. Semanjski and S. Gautama. Crowdsourcing mobility insights–reflection of attitude based segments on high resolution mobility behaviour data. Transportation Research Part C: Emerging Technologies, 71:434–446, 2016.
-  S. Skiena. Dijkstra’s algorithm. Implementing Discrete Mathematics: Combinatorics and Graph Theory with Mathematica, Reading, MA: Addison-Wesley, pages 225–227, 1990.
-  X. Tan, Y. Shu, X. Lu, P. Cheng, and J. Chen. Characterizing and modeling package dynamics in express shipping service network. In Proc of IEEE Congress on Big Data, pages 144–151, 2014.
-  R. Tarjan. Depth-first search and linear graph algorithms. SIAM Journal on Computing, 1(2):146–160, 1972.
-  The Economist. Same-day dreamer. http://www.economist.com/news/business/, 2014.
-  Y. Tong, Y. Chen, Z. Zhou, L. Chen, J. Wang, Q. Yang, J. Ye, and W. Lv. The simpler the better: a unified approach to predicting original taxi demands based on large-scale online platforms. In Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 1653–1662, 2017.
-  H. Üster and P. Kewcharoenwong. Strategic design and analysis of a relay network in truckload transportation. Transportation Science, 45(4):505–523, 2011.
-  J. Xu, R. Rahmatizadeh, L. Bölöni, and D. Turgut. Real-time prediction of taxi demand using recurrent neural networks. IEEE Transactions on Intelligent Transportation Systems, 19(8):2572–2581, 2018.
-  H. Yao, F. Wu, J. Ke, X. Tang, Y. Jia, S. Lu, P. Gong, J. Ye, and Z. Li. Deep multi-view spatial-temporal network for taxi demand prediction. In Proc. of the AAAI Conference on Artificial Intelligence (AAAI’18), 2018.
-  Z. Yu, H. Xu, Z. Yang, and B. Guo. Personalized travel package with multi-point-of-interest recommendation based on crowdsourced user footprints. IEEE Transactions on Human-Machine Systems, 46(1):151–158, 2016.
-  N. J. Yuan, Y. Zheng, L. Zhang, and X. Xie. T-finder: A recommender system for finding passengers and vacant taxis. IEEE Transactions on Knowledge and Data Engineering, 25(10):2390–2403, 2013.
-  D. Zhang, B. Guo, and Z. Yu. The emergence of social and community intelligence. Computer, (7):21–28, 2011.
-  D. Zhang, L. Sun, B. Li, C. Chen, G. Pan, S. Li, and Z. Wu. Understanding taxi service strategies from taxi GPS traces. IEEE Transactions on Intelligent Transportation Systems, 16(1):123–135, 2014.
-  D. Zhang, L. Wang, H. Xiong, and B. Guo. 4w1h in mobile crowd sensing. IEEE Communications Magazine, 52(8):42–48, 2014.
-  J. Zhang, Y. Zheng, D. Qi, R. Li, X. Yi, and T. Li. Predicting citywide crowd flows using deep spatio-temporal residual networks. Artificial Intelligence, 259:147–166, 2018.
-  L. Zhang, B. Yu, and J. Pan. Geomob: A mobility-aware geocast scheme in metropolitans via taxicabs and buses. In Proc. of INFOCOM, pages 1279–1787, April 2014.
-  X. Zheng, X. Liang, and K. Xu. Where to wait for a taxi? In Proc. of ACM UrbComp, pages 149–156, 2012.
-  Y. Zheng, Y. Liu, J. Yuan, and X. Xie. Urban computing with taxicabs. In Proc. of UbiComp, pages 89–98, 2011.
-  C. Zhong, X. Huang, S. M. Arisona, G. Schmitt, and M. Batty. Inferring building functions from a probabilistic model using public transportation data. Computers, Environment and Urban Systems, 48:124–137, 2014.
-  M. Zorzi and R. Rao. Geographic random forwarding (GeRaF) for ad hoc and sensor networks: multihop performance. IEEE Transactions on Mobile Computing, 2(4):337–348, 2003.
Appendix A Proof of Theorem 1
The theorem can be proven by induction, detailed as follows:
Base Case: When (in which is the length of the given path, measured as the number of interchange stations on the path minus one), it is obvious that we have , according to Definition 6.
Induction Step: Let be given and suppose Eq. 4 is true for , that is, we have:
where is the probability density function on the edge from the origin to the first stop.
Conclusion: By the principle of induction, Eq. 4 is true for all . ∎