How to eliminate detour behaviors in E-hailing: On-line detection and Pricing regulation

10/15/2019 ∙ by Qiong Tian, et al.

With the fast development of information and communication technology (ICT), the taxi business has become a typical electronic commerce mode. However, a traditional problem persists in taxi service: greedy drivers may deliberately take unnecessary detours to overcharge passengers. Detecting these fraudulent behaviors is essential to ensure high-quality taxi service. In this paper, we propose a novel framework for detecting and analyzing detour misbehavior both in off-line databases and among on-line trips. Applied to real-world taxi data, the framework achieves remarkable performance (AUC > 0.98) in the off-line phase and high precision (AUC > 0.9) in on-line detection. In addition, we provide constructive suggestions on pricing regulation to curb detours. Finally, several value-added commercial applications at DiDi built on our method have improved its map service.




I Introduction

In recent years, advances in sensing, communication, storage and computing have changed our lives in many ways. One representative example is the widely used online ride-hailing platforms, such as Uber, Lyft and DiDi, which have redefined the taxi industry. The scale and richness of the digital trajectories collected by these platforms give us opportunities to understand social behavior and community dynamics in different contexts, with great potential to transform services in areas ranging from public safety and urban planning to transportation management [1].

In modern cities, taxi service is a typical electronic commerce mode supervised by Global Positioning Systems (GPS) and online ride-hailing systems. According to 2018 statistics, over 100,000 taxis run every day in Beijing and about 70,000 in Shanghai. Gathering and analyzing these large-scale taxi trajectories therefore gives us a great opportunity to reveal hidden "facts" about city dynamics and human behavior [2]. Moreover, many value-added applications, such as transportation management, city planning and personalized services, can benefit from big data analysis of taxi GPS datasets.

Taxi service has long faced a typical problem: greedy drivers may deliberately take unnecessary detours to overcharge passengers, especially passengers traveling in unfamiliar cities. Currently, experienced staff detect taxi driving fraud by manually checking the trajectories of the taxis concerned, based on passenger feedback. However, it is extremely difficult for these staff to supervise suspicious taxis efficiently and precisely because the number of trips is growing exponentially. Moreover, many frauds are never noticed by passengers. Given that most detour trajectories deviate significantly from the routes recommended by the online platforms, it is crucial to detect and disclose detour trips efficiently and automatically to ensure the quality of taxi service.

Traditional anomaly detection methods fall into four categories: distance-based, density-based, similarity-based and learning-based methods. For instance, Knorr et al. (2000) [3] presented a distance-based algorithm that proved highly efficient for outlier detection in large datasets. Breunig et al. (2000) [4] introduced a method for finding outliers in a multidimensional dataset that depends on the local density of each point's neighborhood, where the neighborhood is defined by the distance to the MinPts-th nearest neighbor. Zhang et al. (2011) [5] proposed an isolation-based anomalous trajectory (iBAT) detection method that achieved remarkable detection performance; its main idea is to discover, based on similarity, the trajectories that are "few" and "different" from the clusters of normal trajectories. Chen et al. (2013) [6] then introduced the isolation-based online anomalous trajectory (iBOAT) method, which detects anomalous trajectories "on the fly". In recent years, learning-based techniques have also been applied to anomaly detection [7, 8, 9, 10].

After reviewing existing studies on anomalous trajectory detection, we find that they all focus on detecting anomalous behavior at the trip level. Moreover, they discuss how to discover anomalies rather than how to prevent them from happening. Most of these studies also provide only a global evaluation of a given trajectory, which leaves new questions for trajectory mining: Which parts of the trajectory are responsible for the anomaly? Does the driver take an anomalous route deliberately or unintentionally? Answering these questions requires finer-grained detection that studies drivers' anomalous behavior segment by segment. Such finer-grained detection also helps distinguish trajectory outliers caused by road network changes from those caused by drivers' intentions. In addition, most studies of taxi trajectory detection assume a fixed OD pattern, i.e., anomalous trajectories are detected for a given origin and destination [5, 6, 11]. In an on-line setting, however, the OD distribution of real-world trips is highly flexible, and the destination of an unfinished trip may still change. Monitoring accurately with such limited information requires a carefully designed real-time framework that integrates a variety of information.

In view of these challenges, we propose a framework to discover anomalies both in off-line databases and among on-line trips. Within this framework, two mechanisms, real-time warning and pricing policy, are introduced in later phases to eliminate detour behavior. In summary, the major contributions of our paper are as follows:

  • By extracting a distance-based feature and a time-based feature in the off-line phase, the proposed method naturally falls into the category of supervised learning. Using a logistic regression model, it achieves remarkable performance on millions of real-world taxi trips.

  • Based on the off-line model, we further use the estimated parameters for on-line detection and propose a real-time warning mechanism. In on-line experiments, the method ultimately achieves high precision in warning anomalous drivers.

  • To eliminate detour behavior in the long term, we also conduct a comprehensive discussion of the pricing mechanism in E-hailing platforms. Based on extensive experiments, we offer several pricing-regulation suggestions that motivate drivers to avoid detours.

  • Last but not least, the statistics generated by our method can be used to diagnose changes in the road network and to evaluate driver behavior, showing that the method supports innovative applications with broad commercial value.

The rest of the paper is organized as follows. Section 2 gives preliminaries and the problem statement. Section 3 elaborates the framework, which consists of off-line classification, on-line detection and pricing regulation. Section 4 presents quantitative and qualitative experimental results, and Section 5 describes two innovative applications. Finally, Section 6 concludes the paper.

II Problem Formulation

This section first introduces the notation used throughout the paper. A single-mode road network is represented by a directed graph consisting of a node set and a link (segment) set. The key concepts are defined as follows:

Definition 1: A segment is the basic unit of the road network, delimited by intersection nodes; each segment in the road network has a specific direction.

For illustration, Figure 1 shows the road network model near Beihang University, Beijing. The red points A and B are street intersections, so the link between A and B is a standard segment in this paper.

Definition 2: A GPS point is a triple consisting of its latitude, its longitude and the time at which the point was generated.

Definition 3: A taxi trajectory is a sequence of GPS points generated by an occupied taxi and ordered by timestamp.

For such a trip, the first GPS point is the origin and the last GPS point is the destination.

Because GPS positions are updated every few seconds, one taxi trajectory may contain hundreds (or even thousands) of records, most of which are redundant for identifying the trip. To reduce computational complexity, we map-match each taxi trajectory onto a sequence of consecutive road segments, i.e., we transform it into a series of ordered segments. One of the most popular map-matching approaches is based on a Hidden Markov Model (HMM), which finds the most likely sequence of hidden states given a sequence of observations [12]. Using this method, we transform each taxi trajectory into an abstract trajectory, defined below.
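The HMM step can be illustrated with a minimal Viterbi decoder. This is a sketch under stated assumptions, not the implementation used in [12] or by the authors: the candidate segments per GPS point, the Gaussian emission on the GPS-to-segment distance, the exponential transition penalty on the gap between route distance and straight-line distance, and the parameters `sigma` and `beta` are all illustrative choices. The function names are hypothetical.

```python
import math
from itertools import groupby
from typing import Dict, List, Sequence, Tuple


def viterbi_map_match(
    candidates: Sequence[Sequence[str]],
    gps_dist: Sequence[Dict[str, float]],
    route_gap: Dict[Tuple[str, str], float],
    sigma: float = 5.0,
    beta: float = 10.0,
) -> List[str]:
    """Return the most likely candidate segment for every GPS point.

    candidates[t]   -- candidate segment IDs near GPS point t
    gps_dist[t][c]  -- distance (m) from GPS point t to candidate c
    route_gap[a, b] -- |route distance a->b - straight-line distance| (m);
                       missing pairs are treated as unreachable
    """

    def log_emission(d: float) -> float:
        # Gaussian measurement noise; the constant normalizer is dropped.
        return -0.5 * (d / sigma) ** 2

    def log_transition(a: str, b: str) -> float:
        # Exponential penalty on implausible jumps between candidates.
        return -route_gap.get((a, b), math.inf) / beta

    scores = {c: log_emission(gps_dist[0][c]) for c in candidates[0]}
    backpointers: List[Dict[str, str]] = []
    for t in range(1, len(candidates)):
        new_scores: Dict[str, float] = {}
        pointers: Dict[str, str] = {}
        for c in candidates[t]:
            prev, best = max(
                ((p, s + log_transition(p, c)) for p, s in scores.items()),
                key=lambda item: item[1],
            )
            new_scores[c] = best + log_emission(gps_dist[t][c])
            pointers[c] = prev
        scores = new_scores
        backpointers.append(pointers)

    # Backtrack from the best final state.
    path = [max(scores, key=scores.get)]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    return path[::-1]


def to_abstract_trajectory(matched: List[str]) -> List[str]:
    """Collapse consecutive repeats: many GPS points fall on one segment."""
    return [seg for seg, _ in groupby(matched)]


path = viterbi_map_match(
    candidates=[["A", "B"], ["C", "D"]],
    gps_dist=[{"A": 3.0, "B": 4.0}, {"C": 10.0, "D": 2.0}],
    route_gap={("A", "C"): 0.0, ("A", "D"): 50.0, ("B", "C"): 0.0, ("B", "D"): 0.0},
)
print(path)  # ['B', 'D']
```

In the toy input, segment A is closer to the first GPS point, but the decoder picks B because only B connects plausibly to D, the best match for the second point. This is the point of decoding the whole sequence rather than snapping each point independently.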

Fig. 1: Examples of Segments

Definition 4: An abstract trajectory is the ordered sequence of segments produced by the map-matching process, where each element is a segment unit traversed by the trip.

To provide E-hailing drivers with real-time traffic guidance, online ride-hailing platforms automatically update the route plan along the trajectory so as to guide the driver with a dynamic shortest path [19, 20]. For instance, Figure 2 shows the system generating new route plans as a trip proceeds. By monitoring how successive route plans change, platform operators obtain valid evidence about driver behavior. We therefore define route plans as follows.

Definition 5: A one-off route plan is a sequence of segments generated by the platform, where each element is a segment unit of the route plan.

Definition 6: Given an abstract trajectory, its route plan set is the set of route recommendations issued for that trip, sorted by creation time. In other words, each abstract trajectory has a route plan set in which every segment of the trajectory has a corresponding route plan, namely the plan issued while the taxi was on that segment.

After assigning GPS points to segments and introducing route plans, we work with abstract trajectories and route plans in the rest of this paper. Our goal is to discover the anomalous segments of a trajectory and to identify detour behavior among trips. Formally, the problem is defined as follows.

Fig. 2: Examples of real-time route plans

Problem: Given a trajectory and its corresponding route plans, this paper addresses the following problems:

  • Off-line classification: The method should achieve high detection accuracy with a low false alarm rate when classifying detour trips and normal trips in an off-line dataset.

  • On-line detection: As the trajectory and route plans are updated in real time, the method should provide a real-time indicator for monitoring the ongoing trajectory.

  • Long-term regulation: In the long term, we should identify the primary causes of detour behavior and offer the platform suggestions for preventing detours.

The notation used in the subsequent analysis is listed in Table I.

Variable Explanation
Road segment
GPS point
Abstract trajectory
One-off route plan
Destination changing probability
Route plans of a given trajectory
Set of anomalous segments
Set of anomalous distances of a given trajectory
Set of anomalous segments of a given trajectory
Set of historical trajectories
Set of labeled detour trips
Utility of E-hailing drivers
Indicator of log-odds in the categorization model
Proportion of detours in a given trajectory set
Base fare; distance and time thresholds covered by the base fare
Actual destination, initial destination of trajectory
Unit distance and unit time
Fare rate per unit distance and per unit time
Operating cost per unit distance, opportunity cost per unit time
Ratio of detour distance; ratio of delay time
Coefficients to be estimated in the categorization model
Estimated coefficients from the categorization model
TABLE I: Notations

III Methodology

Having defined the necessary notation and stated the problem, we present the detour detection method in this section. As shown in Figure 3, the method consists of four phases. The first phase focuses on data preparation, whose output feeds the detour trajectory detection of the second phase. The second phase performs off-line classification on the pre-processed data in four steps: addressing anomalous segments, quantifying them, evaluating the trip and LR-based categorization. Building on the off-line model, the remaining phases propose an on-line detection mechanism and a pricing regulation to eliminate detour behavior from the platform's perspective.

Fig. 3: Overview of our method

III-A Data preparation

From the large volume of taxi trajectories, each consisting of many GPS points, valid taxi OD pairs can be extracted. These historical trajectories and their route plans provide the platform with strong evidence for identifying abnormal driver behavior.

As shown in Figure 3, we first collect GPS points for the target cities. Next, we apply map matching to transform the GPS points into segment sequences, obtaining the set of abstract trajectories. We then retrieve each trajectory's route plans from the database by trajectory ID. The result is a structured index table with three columns: the trajectory identifier, the abstract trajectory and its route plans. On top of this table we define a lookup function that returns the abstract trajectory and the route plans for a given trajectory identifier. For instance, for the trajectory and route plans shown in Figure 2, the lookup yields the entry shown in Table II.
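The index table and its lookup function can be sketched as a dictionary keyed by trajectory identifier. This sketch assumes trajectories and route plans are stored as lists of segment IDs, and that each segment has exactly one aligned route plan (the plan issued while the taxi was on that segment, listing the segments still to be driven). `TripRecord`, `build_index` and `lookup` are hypothetical names, not part of any DiDi system.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class TripRecord:
    trip_id: str
    segments: List[str]           # abstract trajectory (Definition 4)
    route_plans: List[List[str]]  # route_plans[i]: plan issued while on segments[i]

    def __post_init__(self) -> None:
        # Assumption: one route plan per segment, aligned by index.
        if len(self.segments) != len(self.route_plans):
            raise ValueError("each segment needs exactly one route plan")


def build_index(records: List[TripRecord]) -> Dict[str, TripRecord]:
    """Index table: trajectory identifier -> (trajectory, route plans)."""
    return {r.trip_id: r for r in records}


def lookup(index: Dict[str, TripRecord],
           trip_id: str) -> Tuple[List[str], List[List[str]]]:
    """Return (abstract trajectory, route plans) for a trajectory identifier."""
    record = index[trip_id]
    return record.segments, record.route_plans


index = build_index([
    TripRecord("trip-1", ["s1", "s2", "x3", "s4"],
               [["s2", "s3", "s4"], ["s3", "s4"], ["s4"], []]),
])
segments, plans = lookup(index, "trip-1")
```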

Identifier  Abstract trajectory  Route plans
TABLE II: Example of the index table

III-B Off-line classification

This part discusses how to classify an off-line trajectory as a normal trip or a detour trip. On the processed data, we perform four steps: addressing anomalous segments, quantifying them, evaluating the trip and categorization.

III-B1 Addressing anomalous segments

Before detecting detour trajectories, we need to locate the anomalous segments that cause deviations in a trajectory. A deviation occurs when the current segment differs from the next segment of the previous route plan. We therefore define the set of anomalous segments of a given trajectory as follows:

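Definition 7: Given an abstract trajectory $T = (s_1, s_2, \dots, s_n)$, where $R_{i-1}$ denotes the route plan issued while the taxi was on segment $s_{i-1}$, the set of anomalous segments $A(T)$ contains every segment $s_i$ that satisfies

$$s_i \neq \operatorname{next}(R_{i-1}), \qquad (1)$$

where $\operatorname{next}(R_{i-1})$ is the segment that $R_{i-1}$ prescribes after $s_{i-1}$.

Equation (1) indicates that the current position deviates from the schedule of the previous step. In Figure 2, for example, when the trip reaches the deviation point, the driver chooses a different route from the original plan, so the segment entered there belongs to the anomalous set.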

Extracting this set gives an efficient mechanism for locating the anomalous segments of a trajectory. The next stage assigns each anomalous segment a score that quantifies its degree of anomaly.
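Definition 7 can be checked segment by segment. The sketch below assumes, as an illustrative convention, that each route plan lists the segments still to be driven after the segment on which it was issued, so the segment the previous plan prescribed is simply its first element. An empty plan (the taxi is already at the destination) is skipped.

```python
from typing import List, Sequence


def anomalous_segments(segments: Sequence[str],
                       route_plans: Sequence[Sequence[str]]) -> List[int]:
    """Indices i where segments[i] differs from the next segment that the
    previous route plan (route_plans[i - 1]) told the driver to take."""
    anomalous = []
    for i in range(1, len(segments)):
        previous_plan = route_plans[i - 1]
        if previous_plan and segments[i] != previous_plan[0]:
            anomalous.append(i)
    return anomalous


segments = ["s1", "s2", "x3", "s4"]
route_plans = [["s2", "s3", "s4"], ["s3", "s4"], ["s4"], []]
print(anomalous_segments(segments, route_plans))  # [2]
```

Here the plan issued on `s2` prescribed `s3`, but the driver entered `x3`, so only index 2 is flagged. The driver then rejoins the plan at `s4`, which matches the pattern in Figure 4.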

III-B2 Quantifying anomalous segments

In practice, the previous and current route plans differ significantly at an anomalous segment. Both scenarios in Figure 4 show a driver taking a new route instead of following the previous route plan; after a stretch of inconsistent segments, the current route plan usually coincides again with the previous plan because the destination is unchanged. Figure 4.A shows a long detour, whereas Figure 4.B shows the driver taking a shortcut relative to the previous route plan.

Fig. 4: Examples of anomalous segments

To characterize the degree of anomaly of each anomalous segment, we take into account both the route distance and the travel time at that segment; the principal idea is to compare the current and the previous route plans in terms of distance and time. Therefore:

Definition 8: Given an abstract trajectory, $\Delta\mathcal{D} = \{\Delta d_i\}$ is the set of anomalous distances and $\Delta\mathcal{T} = \{\Delta t_i\}$ is the set of anomalous times at its anomalous segments $s_i$, defined as

$$\Delta d_i = D(R_i) - D(R_{i-1}), \qquad (2)$$

$$\Delta t_i = \hat{t}(R_i) - \hat{t}(R_{i-1}), \qquad (3)$$

where $R_{i-1}$ and $R_i$ are the previous and the current route plans, each taken as the remaining route to the destination, $D(\cdot)$ calculates the total network (topology) distance of a route, and $\hat{t}(\cdot)$ calculates its estimated travel time. In practice, we assume that enough taxi trajectories are available to estimate travel times on specific sequences of segments, and we employ the Wide-Deep-Recurrent (WDR) learning model [14] for this estimation.

Intuitively, Equation (2) defines the anomalous distance as the difference between the remaining distance under the current route plan and that under the previous route plan, and Equation (3) defines the anomalous time as the difference between their estimated travel times at the current time. The advantage of this measurement is that it helps pinpoint the reasons behind deviations, such as avoiding congestion or relying on driving experience. Three scenarios follow from the signs of the anomalous distance and the anomalous time:

  • If both the anomalous distance and the anomalous time are positive, the deviation at the segment leads to a worse outcome: more travel distance and more travel time.

  • If exactly one of them is positive, the deviation leads to more complex situations that need further discussion:

    • The driver chooses a longer route to avoid a traffic jam on the initial route plan (positive anomalous distance, negative anomalous time).

    • The driver takes what he believes is a shortcut compared to the original plan, but unfortunately gets stuck in traffic (negative anomalous distance, positive anomalous time).

  • If both are negative, the driver has clearly taken a better route.
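Equations (2) and (3) and the three scenarios can be sketched as follows. Assumptions, all illustrative: routes are lists of segment IDs, both routes are measured from the deviation point (the new route is the anomalous segment plus the current plan, and the old route is the previous plan), and `est_time` is a hypothetical stand-in for a learned travel-time model such as WDR [14]. Here it simply sums fixed per-segment times.

```python
from typing import Callable, Dict, List, Sequence, Tuple


def route_length(route: Sequence[str], seg_len: Dict[str, float]) -> float:
    return sum(seg_len[s] for s in route)


def anomaly_deltas(
    segments: Sequence[str],
    route_plans: Sequence[Sequence[str]],
    anomalous_idx: Sequence[int],
    seg_len: Dict[str, float],
    est_time: Callable[[Sequence[str]], float],
) -> List[Tuple[int, float, float]]:
    """(index, delta_distance, delta_time) for each anomalous segment."""
    out = []
    for i in anomalous_idx:
        new_route = [segments[i], *route_plans[i]]  # what the driver now follows
        old_route = list(route_plans[i - 1])         # what the plan prescribed
        dd = route_length(new_route, seg_len) - route_length(old_route, seg_len)
        dt = est_time(new_route) - est_time(old_route)
        out.append((i, dd, dt))
    return out


def scenario(dd: float, dt: float) -> str:
    if dd > 0 and dt > 0:
        return "worse: longer and slower"
    if dd <= 0 and dt <= 0:
        return "better: shorter and faster"
    if dd > 0:
        return "longer but faster (e.g. avoiding a jam)"
    return "shorter but slower (e.g. stuck in traffic)"


seg_len = {"s1": 1.0, "s2": 1.0, "s3": 1.0, "s4": 1.0, "x3": 3.0}
seg_time = {"s1": 1.0, "s2": 1.0, "s3": 5.0, "s4": 1.0, "x3": 2.0}  # s3 is jammed


def est_time(route: Sequence[str]) -> float:
    # Hypothetical stand-in for a learned travel-time model.
    return sum(seg_time[s] for s in route)


deltas = anomaly_deltas(["s1", "s2", "x3", "s4"],
                        [["s2", "s3", "s4"], ["s3", "s4"], ["s4"], []],
                        [2], seg_len, est_time)
print(deltas, scenario(deltas[0][1], deltas[0][2]))
```

In the example, the driver avoids the jammed segment `s3` by taking the longer `x3`: the anomalous distance is +2 and the anomalous time is -3, which is the "avoid the traffic jam" sub-case rather than a fraudulent detour.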

III-B3 Evaluating the trip

The previous stages produce the set of anomalous segments together with the deviation distance and travel time of each. Once the trip has finished, we further need a holistic evaluation that aggregates the anomalous distances and times over the whole trajectory.

In reality, however, some deviations at the tail of a trajectory may introduce strong noise and lead us to misjudge the driver's purpose. As shown in Figure 5.A-D, the driver heads for an actual destination (Figure 5.D) that differs from the initial setting, without reporting it to the system in advance. In such cases, the purposes of the deviations fall into several kinds: changing to a totally different destination, choosing an accessible place to drop off, and so on. Consequently, the anomalous segments at the tail (e.g., those in Figure 5.B and D) are meaningless to analyze because the recorded destination is inaccurate.

Fig. 5: Examples of changing initial destination

To distinguish these purposes, we introduce a destination changing probability, which is computed from the Euclidean distance between the actual destination and the initial destination of the trajectory.

We introduce an empirical probability threshold to identify the purpose of a driver's deviations. If the destination changing probability exceeds the threshold, we assume that the driver wants to go to an entirely different destination, and we exclude such special cases from this study. Otherwise, we attribute the difference between the actual and initial destinations to the accessibility of the destination (parking, traffic rules and so on) rather than to a change of destination. In this case, we apply a tail filtering algorithm to remove the noise at the tail of the trajectory and update the set of anomalous segments.

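With the anomalous distances and times computed as above, once the trip has finished and the set of anomalous segments $A$ has been updated, the distance-based feature $F_d$ and the time-based feature $F_t$ of the trajectory are defined as

$$F_d = \frac{\sum_{s_i \in A} \Delta d_i}{D_{\text{act}}}, \qquad F_t = \frac{\sum_{s_i \in A} \Delta t_i}{T_{\text{act}}},$$

where $\Delta d_i$ and $\Delta t_i$ are the anomalous distance and anomalous time of segment $s_i$, $D_{\text{act}}$ is the actual travel distance of the trajectory and $T_{\text{act}}$ is its actual travel time.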

In summary, we aggregate the anomalous distances and anomalous travel times into a holistic distance-based feature and a holistic time-based feature. For a given trajectory, the former can be read as the ratio of detour distance and the latter as the ratio of delay time.
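The aggregation into the two ratio features can be sketched as follows. It assumes the time-based feature divides the summed anomalous time by the trip's actual travel time and, by analogy, that the distance-based feature divides the summed anomalous distance by the trip's actual distance. The same function also covers the on-line case of Section III-C, where the unknown totals are replaced by estimates. Here we assume "driven so far plus remaining plan", which may differ from the exact replacement the paper uses.

```python
from typing import Sequence, Tuple


def trip_features(deltas: Sequence[Tuple[float, float]],
                  total_distance: float,
                  total_time: float) -> Tuple[float, float]:
    """Return (distance-based feature, time-based feature).

    deltas holds (delta_distance, delta_time) for each anomalous segment
    that survived tail filtering.
    """
    f_d = sum(dd for dd, _ in deltas) / total_distance
    f_t = sum(dt for _, dt in deltas) / total_time
    return f_d, f_t


# Off-line: the trip is finished, so its actual distance and time are known.
offline = trip_features([(2.0, -3.0), (1.0, 5.0)], total_distance=10.0, total_time=20.0)

# On-line (assumption): estimate the totals as driven-so-far plus remaining plan.
driven_km, remaining_plan_km = 6.0, 4.0
elapsed_min, remaining_est_min = 12.0, 8.0
online = trip_features([(2.0, -3.0)],
                       driven_km + remaining_plan_km,
                       elapsed_min + remaining_est_min)
print(offline, online)
```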

III-B4 Logistic-regression-based categorization

Logistic regression (also known as the logit model) is widely used to model the outcome of a categorical dependent variable [16]. It measures the relationship between the categorical dependent variable and one or more independent variables by estimating probabilities with a logistic function, i.e., the cumulative distribution function of the logistic distribution. Since we have computed a distance-based feature and a time-based feature for each trip, we apply logistic regression to classify trips as detour or non-detour.

Suppose the detour indicator $y$ is a binary variable ($y = 1$ for a detour trip) and the independent variables are the features $\mathbf{x} = (x_1, x_2)$, i.e., the distance-based and time-based features. The logistic model expresses the probability of a detour for a given $\mathbf{x}$ as

$$P(y = 1 \mid \mathbf{x}) = \frac{1}{1 + \exp\left(-(\beta_0 + \beta_1 x_1 + \beta_2 x_2)\right)}.$$

Applying the logit transformation to this probability gives

$$\ln \frac{P(y = 1 \mid \mathbf{x})}{1 - P(y = 1 \mid \mathbf{x})} = \beta_0 + \beta_1 x_1 + \beta_2 x_2,$$

where $P/(1-P)$ is called the odds, i.e., the ratio of the probability of a detour to the probability of a non-detour, and its logarithm is the log-odds, denoted $z$ in the rest of the paper. By transforming the probability into a log-odds, the logit function linearizes the association between the probability and the independent variables; in this sense, the model is a generalized linear model.

Although the model is a generalized linear model, ordinary least-squares estimation is not applicable; maximum likelihood estimation (MLE) is used instead, and in practice the likelihood is handled through its logarithm. Given $N$ labeled trips $(\mathbf{x}_k, y_k)$, coefficients $\beta = (\beta_0, \beta_1, \beta_2)$, and $p_k = P(y_k = 1 \mid \mathbf{x}_k)$, the log-likelihood function is

$$\ell(\beta) = \sum_{k=1}^{N} \left[ y_k \ln p_k + (1 - y_k) \ln (1 - p_k) \right],$$

and the MLE is

$$\hat{\beta} = \arg\max_{\beta} \ell(\beta).$$
Common techniques for solving the MLE problem include gradient descent and quasi-Newton methods. For brevity, we do not detail the optimization algorithm here; in practice, the model can be fitted with the LogisticRegression class of the Python scikit-learn package.
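A minimal fitting sketch with scikit-learn on synthetic data is shown below. The data are made up and only mimic the two ratio features; they are not the paper's dataset. Note that `LogisticRegression` applies an L2 penalty by default, so a very large `C` is used here to approximate the unpenalized MLE. The helper `log_likelihood` evaluates the same log-likelihood written above, using the identity $y\ln p + (1-y)\ln(1-p) = yz - \ln(1+e^{z})$ for $z$ the log-odds.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Synthetic stand-in for labeled trips: columns are (F_d, F_t).
n = 2000
X = rng.exponential(scale=0.1, size=(n, 2))
true_b0, true_b = -3.0, np.array([12.0, 8.0])
p = 1.0 / (1.0 + np.exp(-(true_b0 + X @ true_b)))
y = (rng.random(n) < p).astype(int)

# A very large C makes the default L2 penalty negligible,
# so the fit approximates the unpenalized MLE.
model = LogisticRegression(C=1e6, max_iter=1000).fit(X, y)
b0_hat, b_hat = model.intercept_[0], model.coef_[0]


def log_likelihood(b0, b, X, y):
    """sum_k [y_k ln p_k + (1 - y_k) ln(1 - p_k)], written via the log-odds z."""
    z = b0 + X @ b
    return float(np.sum(y * z - np.logaddexp(0.0, z)))  # logaddexp(0, z) = ln(1 + e^z)


print(b0_hat, b_hat)
print(log_likelihood(b0_hat, b_hat, X, y), log_likelihood(true_b0, true_b, X, y))
```

Because the fitted coefficients maximize the sample log-likelihood, they should score at least as high as the generating coefficients on the same data. `model.predict_proba(X)[:, 1]` then gives the detour probability of each trip.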

III-C On-line detection

This part discusses how to monitor anomalous behavior in real time. As the taxi moves, the set of anomalous segments and the sets of anomalous distances and anomalous times are updated with all the information known at each timestamp. Thus, given the current segment at a timestamp, the latest distance-based and time-based features are calculated in the same way as in the off-line phase, using only the information available so far.
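Note that, because the trip is unfinished, the actual travel distance in Equation 5 is replaced by an estimate of the overall distance, and the actual travel time is likewise replaced by an estimate of the overall travel time. We can then calculate the log-odds at the current segment $s_i$:

$$z_i = \hat{\beta}_0 + \hat{\beta}_1 F_d^{(i)} + \hat{\beta}_2 F_t^{(i)},$$

where $F_d^{(i)}$ and $F_t^{(i)}$ are the latest distance-based and time-based features, and $\hat{\beta}_0$, $\hat{\beta}_1$ and $\hat{\beta}_2$ are the coefficients estimated by the off-line categorization model (equation 10).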

The latest log-odds therefore serves as the key indicator of the driver's behavior:

  • If the log-odds exceeds 0, i.e., the estimated detour probability exceeds 0.5, the ongoing trip falls into the detour category. In our on-line detection setting, a warning is then triggered by sending a message to the driver.

  • Otherwise, the ongoing trip is treated as normal; if a warning was issued at an earlier segment, the platform cancels it and informs the driver.
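The warn/cancel rule above is a two-state machine driven by the log-odds. The sketch below assumes that the per-segment features are computed elsewhere and that the coefficient values are hypothetical; only the threshold of 0 comes from the text. `OnlineDetourMonitor` is an illustrative name.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class OnlineDetourMonitor:
    """Warn when the log-odds of a detour exceeds 0; cancel when it drops back."""
    beta0: float   # intercept from the off-line model
    beta_d: float  # coefficient of the distance-based feature
    beta_t: float  # coefficient of the time-based feature
    warning_active: bool = False
    events: List[str] = field(default_factory=list)

    def update(self, f_d: float, f_t: float) -> float:
        z = self.beta0 + self.beta_d * f_d + self.beta_t * f_t
        if z > 0 and not self.warning_active:
            self.warning_active = True
            self.events.append("warn")    # e.g. send a message to the driver
        elif z <= 0 and self.warning_active:
            self.warning_active = False
            self.events.append("cancel")  # inform the driver the warning is lifted
        return z


monitor = OnlineDetourMonitor(beta0=-3.0, beta_d=12.0, beta_t=8.0)  # hypothetical values
for f_d, f_t in [(0.0, 0.0), (0.3, 0.05), (0.3, 0.1), (0.1, 0.0)]:
    monitor.update(f_d, f_t)
print(monitor.events)  # ['warn', 'cancel']
```

Issuing a message only on state transitions, rather than at every segment above the threshold, keeps the driver from receiving repeated warnings for the same deviation.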

In the case depicted in Figure 6, the driver departs from the initial route plan and then receives a warning because the log-odds exceeds the baseline of 0. The driver later deviates from the subsequent plan as well, after which the log-odds returns to a normal level and the warning is cancelled.

Fig. 6: Example of real-time detection

For E-hailing companies, analyzing driver behavior with our on-line detection method provides an efficient evaluation of driver performance. It can also help these companies build a good service environment with well-behaved, self-disciplined drivers.

III-D Long-term pricing regulation

In the long term, the pursuit of higher income may be the primary cause of detour trips. Therefore, beyond on-line detection, setting a proper fare rate has a significant influence on drivers' income and can reduce the number of detours over time. To analyze these income-driven misbehaviors, we first discuss the pricing mechanism of E-hailing platforms. A piece-wise linear fare structure is widely used in the taxi industry:

$$\mathrm{Fare}(L, \tau) = f_0 + p_d\,(L - L_0)\,H(L - L_0) + p_t\,(\tau - \tau_0)\,H(\tau - \tau_0),$$

where $f_0$ is the base fare, which applies as long as the total distance $L$ and the overall travel time $\tau$ do not exceed $L_0$ and $\tau_0$, respectively; $p_d$ and $p_t$ are the fare rates per unit distance and per unit time; and the Heaviside function $H(\cdot)$ indicates whether the distance or the travel time has reached the charging threshold.

To analyze income-related anomalies and gain a basic understanding of detour behavior, we define a driver's utility function in terms of the unit of distance and the unit of time, a coefficient of operating cost per unit distance, and the loss of opportunity cost per unit time, which depends on the base fare.

The utility function quantifies a detour in terms of its monetary return. If the driver takes a detour, increasing both distance and travel time, he receives additional fare income but pays an additional fuel (operating) cost and opportunity cost. Changes in detour utility therefore strongly influence whether a driver decides to take a detour.
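The fare structure and the detour trade-off can be explored numerically. `fare` follows the piece-wise linear structure described above. `detour_gain` is a hypothetical simplification, not the paper's utility function: it computes the extra fare earned by a detour minus an operating cost per unit distance and an opportunity cost per unit time. All tariff and cost values are made up for illustration.

```python
def heaviside(x: float) -> float:
    return 1.0 if x > 0 else 0.0


def fare(L: float, tau: float, f0: float, L0: float, tau0: float,
         p_d: float, p_t: float) -> float:
    """Base fare plus metered distance and time beyond the thresholds."""
    return (f0
            + p_d * (L - L0) * heaviside(L - L0)
            + p_t * (tau - tau0) * heaviside(tau - tau0))


def detour_gain(L: float, tau: float, dL: float, dtau: float,
                tariff: dict, c_d: float, c_t: float) -> float:
    """Hypothetical net gain of a detour: extra fare minus extra costs."""
    extra_fare = fare(L + dL, tau + dtau, **tariff) - fare(L, tau, **tariff)
    return extra_fare - c_d * dL - c_t * dtau


tariff = dict(f0=13.0, L0=3.0, tau0=10.0, p_d=2.3, p_t=0.5)  # made-up tariff (km, min)
print(fare(10.0, 20.0, **tariff))                               # ≈ 34.1
print(detour_gain(10.0, 20.0, 2.0, 6.0, tariff, c_d=0.8, c_t=0.6))  # ≈ 2.4
```

With these numbers, a 2 km, 6 min detour still nets about 2.4 after costs. Raising the opportunity cost per unit time `c_t` (for example, when regular trips earn more per minute) shrinks that margin and can make it negative, which is the lever the pricing discussion below exploits.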

IV Results and experiments

IV-A Data collection

In this study, we select the period from December 1 to December 31, 2018, and collect the floating-car data for these days from four major cities (Beijing, Shanghai, Guangzhou and Shenzhen). After removing abnormal cases with very short travel times or extremely high travel speeds, we obtain millions of samples. Using passenger feedback and historical manual verification (the customer response system at DiDi), every trip is labeled as a detour or not. Table III lists the statistics of the four datasets. The empirical threshold parameter is set to 0.01 based on the recommendations of experienced experts at DiDi Chuxing.

City Beijing Shanghai Guangzhou Shenzhen
Driver (ten thousand)
Period 2018/12/01-2018/12/31
Training (%) 40
Testing (%) 60
TABLE III: Statistics of the off-line datasets

IV-B Off-line model evaluation

The results of the logistic regression model are presented in Table IV. The coefficients of the distance-based and time-based features differ across cities, but their signs are all positive, and the p-values show that both variables are significant. This is consistent with reality: a larger detour distance ratio or delay time ratio increases the probability that a trip is a detour. Figure 7 depicts the classification boundaries for the four cities.

City Variable Coefficient Standard Error Chi-Square P-value
Beijing Intercept
Shanghai Intercept
Guangzhou Intercept
Shenzhen Intercept
TABLE IV: Outputs for the logistic regression model
Fig. 7: The Classification in four cities

We evaluate the model using the AUC (Area Under the ROC Curve). The true positive rate (TPR, the fraction of anomalous trips that are correctly detected) and the false positive rate (FPR, the fraction of normal trips predicted to be anomalous) are two important measures of an anomaly detection method; a good method has a high TPR and a low FPR. The ROC curve plots the TPR (y-axis) against the FPR (x-axis), and the AUC is the area under this curve. Figure 8 depicts the ROC curves of the proposed method on the four datasets, whose AUC values are listed in Table V. The proposed method achieves a high detection rate while keeping the false alarm rate low: on all datasets, over 90% of detour trajectories are detected at a 10% false alarm rate.
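Both quantities quoted above, the AUC and the detection rate at a fixed false alarm rate, can be read off a single ROC computation. The sketch uses scikit-learn's `roc_auc_score` and `roc_curve`. The helper name and the `fpr_budget` parameter are our own, and the toy labels and scores are illustrative, not the paper's data.

```python
import numpy as np
from sklearn.metrics import roc_auc_score, roc_curve


def auc_and_tpr_at_fpr(y_true, scores, fpr_budget=0.10):
    """Return (AUC, best TPR achievable with FPR <= fpr_budget)."""
    auc = roc_auc_score(y_true, scores)
    fpr, tpr, _ = roc_curve(y_true, scores)
    return auc, float(np.max(tpr[fpr <= fpr_budget]))


auc, tpr_at_10 = auc_and_tpr_at_fpr([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
print(auc, tpr_at_10)  # 0.75 0.5
```

In practice, `scores` would be the detour probabilities (or log-odds) produced by the fitted logistic model on the test split.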

Fig. 8: The ROC curves of the Logistic regression model

For comparison, we also evaluate the iBAT method [5]. As shown in Table V, LR achieves high AUC values on all datasets, whereas iBAT achieves lower values, suggesting that the logistic method under our framework outperforms iBAT in detecting detour behavior. This is because iBAT suffers from data sparsity: when a trajectory pattern is infrequent, it is difficult to detect outliers based on similarity. Our method overcomes this obstacle and achieves remarkable performance by incorporating the corresponding route plan information.

City Beijing Shanghai Guangzhou Shenzhen
TABLE V: The AUC value of LR and iBAT method

IV-C On-line detection performance

To evaluate the on-line detection phase, we divide each trip into ten stages according to its completeness. Based on the number of warned trips at each stage, we plot the AUC of each stage in Figure 9. Two findings stand out. First, the number of warned trips increases dramatically during the first 50% of the trip; in other words, most detour behavior may occur in the first half of the trip. Second, because little information is available during the first 50% of the trip, the AUC is relatively low (below 0.8) and many warnings are misjudged. However, as the trip proceeds, most misjudged warnings are cancelled by our on-line mechanism, and the AUC ultimately reaches an excellent level (AUC > 0.9) in the final stages.

Fig. 9: The AUC of warned trips of on-line detection in four cities

IV-D Pricing policy

city 00:00-06:00 06:00-12:00 12:00-17:00 17:00-21:00 21:00-24:00
TABLE VI: The pricing regulation in DiDi Chuxing

To describe detour behavior in each time interval, we define the detour proportion of an interval as the ratio of the number of detour trips to the total number of trips in that interval:

$$\rho = \frac{N_{\text{detour}}}{N_{\text{total}}}.$$
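Computing the detour proportion is a grouped count. The sketch assumes that each trip is reduced to its start hour and its detour label, and it bins trips by the five intervals of Table VI. The paper's figures may use finer bins. `detour_proportion` and `interval_of` are hypothetical helpers.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

# Assumption: the five intervals of Table VI; finer bins work the same way.
INTERVALS = [(0, 6), (6, 12), (12, 17), (17, 21), (21, 24)]


def interval_of(hour: int) -> Tuple[int, int]:
    for start, end in INTERVALS:
        if start <= hour < end:
            return (start, end)
    raise ValueError(f"hour out of range: {hour}")


def detour_proportion(trips: Iterable[Tuple[int, bool]]) -> Dict[Tuple[int, int], float]:
    """trips: (start hour, is_detour). Returns detour count / total count per interval."""
    total: Counter = Counter()
    detours: Counter = Counter()
    for hour, is_detour in trips:
        key = interval_of(hour)
        total[key] += 1
        detours[key] += int(is_detour)
    return {key: detours[key] / total[key] for key in total}


print(detour_proportion([(3, True), (4, False), (8, False), (22, True)]))
# {(0, 6): 0.5, (6, 12): 0.0, (21, 24): 1.0}
```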
Figure 10 displays the trip quantity and the detour proportion in different time intervals of a day. As expected, trip quantities in all four cities peak in the evening (17:00-19:00) and bottom out in the early hours (3:00-5:00). In sharp contrast, the detour proportion peaks in the early hours, then gradually decreases and shows only small fluctuations during the daytime.

To explain these changes, we further examine drivers' total income and the number of on-duty drivers over a day. As Figure 11 shows, total income is high during the morning and evening peaks but drops sharply in the early hours (3:00-5:00) because of lower demand and fewer drivers. From these statistics, we compute the average income per unit time of all drivers.

Fig. 10: Detour Distributions under different time interval in a day
Fig. 11: Total income and driver quantity in four cities

As mentioned above, each driver has a detour utility given by equation 15, and Figure 12 shows the average utility. To evaluate the gains from detours, Figure 12 also plots the average income per unit time for comparison. Consistent with the low daytime detour proportion, the detour utility approaches the average income during the daytime; that is, a daytime detour has little effect on monetary income. In contrast, during the early hours and around midnight, the detour utility far exceeds the average income, which means that deliberate detours can bring much more profit than normal driving.

Fig. 12: Utilities in four cities
Fig. 13: Relation between detour proportion and Utility/Alpha4
city intercept of
TABLE VII: Estimated in four cities

To give an intuitive suggestion, we introduce a piecewise linear approximation that describes how the detour proportion changes with the detour utility across time intervals:


where the coefficients of the approximation are parameters to be estimated. Figure 13 shows the fitted function in the four cities, and the estimated parameters are listed in Table VII.
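Because the exact functional form of the approximation did not survive extraction, the sketch below assumes a single-breakpoint (hinge) model p(u) = b + k * max(u - u0, 0), where u is the detour utility of an interval and p its detour proportion. For each candidate breakpoint u0 the model is linear in (b, k), so ordinary least squares fits it, and the breakpoint with the smallest squared error is kept. Under this assumed form, b > 0 would describe detours that occur even without extra profit, while b close to 0 would describe a threshold u0 below which detours vanish. The function name and the synthetic data are hypothetical.

```python
import numpy as np


def fit_hinge(u, p, candidates=None):
    """Fit p ~= b + k * max(u - u0, 0) by grid search over the breakpoint u0.

    Returns (b, k, u0). For each candidate u0, (b, k) come from least squares;
    the candidate with the smallest sum of squared errors wins.
    """
    u = np.asarray(u, dtype=float)
    p = np.asarray(p, dtype=float)
    if candidates is None:
        candidates = np.unique(u)  # try every observed utility as a breakpoint
    best = None
    for u0 in candidates:
        X = np.column_stack([np.ones_like(u), np.maximum(u - u0, 0.0)])
        coef, *_ = np.linalg.lstsq(X, p, rcond=None)
        sse = float(np.sum((X @ coef - p) ** 2))
        if best is None or sse < best[0]:
            best = (sse, float(u0), float(coef[0]), float(coef[1]))
    _, u0, b, k = best
    return b, k, u0


# Synthetic data generated from a known hinge, for illustration only.
u = np.linspace(0.0, 2.0, 21)
p = 0.01 + 0.05 * np.maximum(u - 1.0, 0.0)
b, k, u0 = fit_hinge(u, p)
print(round(b, 4), round(k, 4), round(u0, 4))  # 0.01 0.05 1.0
```

A grid search keeps the sketch simple and exact for a single breakpoint; models with several breakpoints would need a dedicated segmented-regression method.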

The first interesting observation is that the estimated parameters for Guangzhou and Shenzhen are larger than 0, which means that some drivers still take detours even though they cannot earn more profit by doing so. Not surprisingly, in Beijing and Shanghai no detours occur when the detour utility is below a threshold, which equals the axis intercept listed in Table VII. In the cases above, the platform could raise prices in these cities to increase the average income per unit time of all drivers; as a result, drivers who take detours would receive a lower (even negative) utility than before. In this way, the platform can encourage drivers in these cities to take normal routes while maintaining their income level.

V Applications

From the platform's perspective, the method presented in this paper also has wide commercial value for optimizing map services, such as identifying changes in the road network and recommending better routes. To achieve this goal, we use the conditional probability that the driver refuses the route plan:

$$P(\text{refuse} \mid \text{plan}) = \frac{P(\text{refuse},\ \text{plan})}{P(\text{plan})}$$

where $P(\text{plan})$ denotes the probability that the system provides the route plan, and $P(\text{refuse},\ \text{plan})$ is the probability that the system provides the route plan and the driver takes a different route from it. In practice, both probabilities can be estimated from historical trip data.
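In the simplest counting estimate, both probabilities are replaced by historical frequencies, so the conditional probability of refusing the route plan at a road section becomes the number of trips in which the driver left the planned route at that section divided by the number of trips whose plan contained it. The sketch below assumes a hypothetical trip record of `(planned_sections, deviated_sections)`; real map-matched trajectories would first need to be reduced to this form.

```python
from collections import Counter


def refusal_probability(trips):
    """Estimate P(driver deviates at section s | plan contains s) per section.

    trips: iterable of (planned_sections, deviated_sections) pairs, where
    deviated_sections lists the planned sections the driver did not follow.
    """
    planned = Counter()   # trips whose plan contained the section
    deviated = Counter()  # of those, trips where the driver left the plan there
    for planned_sections, deviated_sections in trips:
        plan = set(planned_sections)
        planned.update(plan)
        deviated.update(set(deviated_sections) & plan)  # ignore unplanned ids
    return {s: deviated[s] / planned[s] for s in planned}


# Toy history for illustration only.
history = [
    (["A", "B", "C"], ["B"]),
    (["A", "B", "C"], []),
    (["A", "B", "D"], ["B"]),
    (["A", "D"], []),
]
probs = refusal_probability(history)
print({s: round(v, 2) for s, v in sorted(probs.items())})
# {'A': 0.0, 'B': 0.67, 'C': 0.0, 'D': 0.0}
```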

Fig. 14: Examples of the refusal probability under different time intervals in a day

Road sections with a high refusal probability are more likely to reveal road network changes that the route plan does not yet reflect. Taking Figure 14 as an example, if a temporary incident occurs on the planned route at 12:00-13:00, the refusal probability spikes suddenly and then falls back to a relatively stable level. Therefore, we can monitor changes in the refusal probability in real time and further diagnose road network changes by taking more factors (road grades, traffic flow, traffic rules, weather, and so on) into consideration. By combining our method with other detection frameworks [17, 18], several kinds of network changes (e.g., new traffic rules, newly closed roads, cancelled traffic rules, and newly opened roads, shown in Figure 15 A-D) can be discovered to correct our route recommendations.
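One simple way to flag such sudden jumps in real time, shown here as an illustration rather than the method used in the paper, is a rolling threshold: a new value is flagged when it exceeds the mean of the preceding window by more than k standard deviations. The window length, k, the standard-deviation floor, and the toy hourly series are all assumptions for the example.

```python
from statistics import mean, pstdev


def detect_spikes(series, window=6, k=3.0, min_std=0.01):
    """Return indices whose value exceeds mean + k * std of the preceding window.

    series: refusal probabilities of one road section, ordered in time.
    min_std floors the standard deviation so that a perfectly flat history
    does not turn every tiny fluctuation into a spike.
    """
    spikes = []
    for i in range(window, len(series)):
        history = series[i - window:i]
        mu = mean(history)
        sigma = max(pstdev(history), min_std)
        if series[i] > mu + k * sigma:
            spikes.append(i)
    return spikes


# Toy hourly series: stable around 0.05, with a jump at index 8.
p = [0.05, 0.06, 0.05, 0.04, 0.05, 0.06, 0.05, 0.05, 0.40, 0.12, 0.06, 0.05]
print(detect_spikes(p))  # [8]
```

Because the spike itself enters the following windows, later values are compared against an inflated baseline; a production detector might exclude flagged points from the history.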

Fig. 15: Examples of road changes under different closed deviation sections

Conversely, a deviation on a road section with a low refusal probability indicates that the detour is more likely caused by the driver's behavior (taking a roundabout route or violating the rules) than by road conditions. After we applied the above process to evaluate drivers' performance on the DiDi platform for one month, driver misbehaviors were reduced by more than 60%.
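The attribution step can be sketched as a threshold rule on the per-section refusal probability. The cut-off value below is a hypothetical choice, not one reported in the paper, and the function name and labels are made up for illustration.

```python
def attribute_deviation(section, refusal_prob, threshold=0.3):
    """Label a deviation at `section` using that section's refusal probability.

    refusal_prob: dict mapping section id -> estimated refusal probability.
    threshold: hypothetical cut-off separating road changes from driver behavior.
    """
    if section not in refusal_prob:
        return "insufficient_data"  # no history: do not blame the driver
    if refusal_prob[section] >= threshold:
        return "road_change_suspected"  # many drivers leave the plan here
    return "driver_behavior"  # most drivers follow the plan here


probs = {"B": 0.67, "C": 0.02}
print(attribute_deviation("B", probs))  # road_change_suspected
print(attribute_deviation("C", probs))  # driver_behavior
```

Treating unseen sections separately avoids penalizing drivers on roads the history does not cover, such as newly opened ones.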

VI Conclusion

In this paper, we have investigated the problem of how to eliminate detour behaviors on E-hailing platforms, motivated by the fact that anomalous trajectories can reveal many hidden “facts” about city dynamics and human behaviors. To solve the problem, we propose a novel framework for detecting and analyzing detour misbehaviors both in the off-line database and among on-line trips. Applying our framework to real-world taxi data, we achieved remarkable performance in the off-line phase (AUC > 0.98; 100% of detour trajectories can be detected at a false alarm rate below 10%) and excellent performance in on-line detection (AUC > 0.9). In addition, based on extensive experiments, we provide constructive suggestions on pricing regulation to control the occurrence of detours. Finally, two commercial value-added applications at DiDi that build on our method have yielded good results in improving the taxi service.

In the future, more real-world applications, such as road network change mining and recommendation service correction, will be developed to validate our method. We believe that these value-added applications can benefit from our proposed method and significantly improve the level of taxi service.


This work was done during the second author's internship in the Map Department of DiDi Chuxing. The authors would like to thank Siyuan Feng, a product manager at DiDi Chuxing, and Yong Liu, an algorithm engineer at DiDi Chuxing, for their kind help and comprehensive advice.


  • [1] D. Zhang, B. Guo, and Z. Yu. 2011. The emergence of social and community intelligence. Computer, 44, 7(2011), 21-28.
  • [2] H. Yuan, Y. Qian, R. Yang, and M. Ren. 2014. Human mobility discovering and movement intention detection with GPS trajectories. Decision Support Systems, 63 (2014), 39-51.
  • [3] E. M. Knorr, R. T. Ng, and V. Tucakov. 2000. Distance-based outliers: algorithms and applications. VLDB Journal, 8, 3-4(2000), 237–253.
  • [4] M. M. Breunig, H.-P. Kriegel, R. T. Ng, and J. Sander. 2000. LOF: Identifying density-based local outliers. ACM SIGMOD Record, 29, 2(2000),93-104.
  • [5] D. Zhang, N. Li, Z.H. Zhou, C. Zhen, L. Sun, and S.J. Li. 2011. iBAT: detecting anomalous taxi trajectories from GPS traces. In Proceedings of the 13th international conference on Ubiquitous computing. ACM, 99-108.
  • [6] C. Chen, D. Zhang, P.S. Castro, N. Li, L. Sun, and S. Li. 2013. iBOAT: Isolation-Based Online Anomalous Trajectory Detection. IEEE Transactions on Intelligent Transportation Systems,14, 2(2013), 806–818.
  • [7] X. Li, J. Han, S. Kim, and H. Gonzalez. 2007. Roam: rule-and motif-based anomaly detection in massive moving object data sets. In Proceedings of the 2007 SIAM International Conference on Data Mining. SIAM, 273–284.
  • [8] R. R. Sillito and R. B. Fisher. 2008. Semi-supervised learning for anomalous trajectory detection. In Proceedings of the British Machine Vision Conference. BMVC. 1035–1044.
  • [9] G. N. Xiao, Z. Juan, and C. Zhang. 2015. Travel mode detection based on GPS track data and Bayesian networks. Computers, Environment and Urban Systems, 54(2015), 14-22.
  • [10] S. Dabiri and K. Heaslip. 2018. Inferring transportation modes from GPS trajectories using a convolutional neural network. Transportation Research Part C: Emerging Technologies, 86(2018), 360-371.
  • [11] Z. Zhou, W.C. Dou, G.C. Jia, C.H. Hu, X.L. Xu, X.T. Wu, and J.G. Pan. 2016. A Method for Real-time Trajectory Monitoring to Improve Taxi Service Using GPS Big Data. Information and Management, 53, 8(2016), 964-977.
  • [12] P. Newson and J. Krumm. 2009. Hidden Markov map matching through noise and sparseness. In Proceedings of the 17th ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems. ACM, 336–343.
  • [13] J. Zobel and A. Moffat. 2006. Inverted files for text search engines. ACM Computing Surveys, 38, 2(2006), 1–55.
  • [14] Z. Wang, K. Fu, and J. P. Ye. 2018. Learning to estimate the travel time. In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining.
  • [15] L. Bergroth, H. Hakonen, and T. Raita. 2000. A survey of longest common subsequence algorithms. In Proceedings of the Seventh International Symposium on String Processing and Information Retrieval. IEEE, 39–48.
  • [16] S. A. Czepiel. 2009. Maximum likelihood estimation of logistic regression models: theory and implementation. Scott A. Czepiel's homepage.
  • [17] J. Wang, C. Wang, X. Song, and V. Raghavan. 2017. Automatic intersection and traffic rule detection by mining motor-vehicle GPS trajectories. Computers, Environment and Urban Systems, 64(2017), 19-29.
  • [18] E. D'Andrea and F. Marcelloni. 2017. Detection of traffic congestion and incidents from GPS trace analysis. Expert Systems with Applications, 73(2017), 43-56.
  • [19] H. Bast, D. Delling, A. Goldberg, M. Müller-Hannemann, T. Pajor, P. Sanders, et al. 2016. Route planning in transportation networks. In Algorithm Engineering.
  • [20] C. C. Martin, P. R. Thrift, and M. C. Lineberry. 1993. Systems and methods for planning the scheduling travel routes. US Patent 5,272,638.