Revisiting Driver Anonymity in ORide

01/16/2021
by   Deepak Kumaraswamy, et al.

Ride Hailing Services (RHS) have become a popular means of transportation, and with their popularity come concerns about the privacy of riders and drivers. ORide is a privacy-preserving RHS proposed in 2017 that uses Somewhat Homomorphic Encryption (SHE). In its protocol, a rider and all drivers in a zone send their encrypted coordinates to the RHS Service Provider (SP), who computes the squared Euclidean distances between them and forwards them to the rider. The rider decrypts these and selects the optimal driver with the least Euclidean distance. In this work, we demonstrate a location-harvesting attack where an honest-but-curious rider, making only a single ride request, can determine the exact coordinates of about half of the responding drivers even when only the distances between the rider and the drivers are given. The significance of our attack lies not in inferring the location of the optimal driver (which is anyway sent to the rider in clear after ride establishment) but in inferring the locations of other drivers in the zone, who aren't (supposed to be) revealed to the rider as per the protocol. We validate our attack by running experiments on zones of varying sizes in arbitrarily selected big cities. Our attack is based on enumerating lattice points on a circle of sufficiently small radius and eliminating solutions based on conditions imposed by the application scenario. Finally, we propose a modification to ORide aimed at thwarting our attack and show that this modification provides sufficient driver anonymity while preserving ride matching accuracy.

I Introduction

Ride Hailing Services such as Uber and Lyft are becoming more popular worldwide every year. According to Pew Research [1], the number of Americans who have used RHS has more than doubled since 2015. In order to provide the service, RHS Service Providers (SP) collect upfront information about individuals desiring to use their services, which include riders and drivers who are part of the network. In addition, details of rides offered and accepted are also collected as part of their billing and statistics gathering. This raises a number of privacy concerns among the individual users. Though the SP would, in general, keep the information secure given the need to keep its reputation high, there is nothing to prevent a breach of privacy if either the provider turns malicious or if someone with access to information internal to the provider wants to mine the information for personal gain [2, 3].

A ride hailing service consists of three parties, namely, the SP, a rider who has subscribed for services of the SP and a set of drivers involved in ride selection. The SP is modeled as an honest-but-curious adversary. We consider the threat model where the rider attempts to mount a location-harvesting attack on participating drivers. While a number of solutions proposed in the last few years preserve the privacy of riders and drivers with respect to the SP, only a few works look at privacy issues of drivers with respect to riders. The work Geo-locating Drivers [4] analyzes features and web APIs of non-privacy-preserving RHS apps which can be used to extract privacy-sensitive driver data. PrivateRide by Pham et al. [5] describes how riders or other malicious outsiders posing as riders can harvest personal information of drivers for purposes of stalking, blackmailing or other malicious activities. Apart from user-profiling, there are several instances where leakage of information regarding driver locations can lead to serious threats. The article [6] reports how a gang made use of ride-hailing apps to harvest driver locations for robbery. Another article [7] claims that some regular (non-SP) taxi drivers, pretending to be customers, located Uber vehicles to attack them. These incidents provide a motivation to consider driver-threat models that leak locations of drivers to riders.

One of the early privacy-preserving ride hailing services is ORide [8] proposed at the USENIX Security Symposium 2017. While the primary focus of this proposal was to provide an oblivious ride-matching solution to riders while preserving the privacy of riders and drivers from the SP, it also considers location-harvesting attacks against drivers by a malicious set of riders who create and cancel fake ride requests simultaneously from multiple locations. ORide ensures the anonymity of the drivers and riders with respect to SP primarily through the use of a Somewhat Homomorphic Encryption (SHE) scheme. There are more recent works that also propose privacy-preserving RHS, and an overview of the works related to RHS is given in Section IV.

In ORide, the SP collects SHE encrypted coordinates of the drivers in the zone of the rider, homomorphically computes the Euclidean distances between the rider and drivers, and then sends these encrypted values to the rider. The rider then decrypts the encrypted distances, chooses the nearest driver and proceeds with ride establishment (the ORide protocol is recalled in Section II-A). This clearly leaks the distances of even those drivers who were not selected to offer the ride. But given that there are many possibilities for the coordinates of the driver, even if only their distance is known, one would expect that in practice the exact driver location is anonymous. However, we show that while the protocol hides personal information of the drivers it offers only limited anonymity for the drivers’ locations w.r.t. a rider who requests a ride.

I-A Our Contribution

In this work, we show a location-harvesting attack on the drivers in the ORide protocol. Along with the privacy for riders and drivers with respect to the SP, ORide also claims that its design offers location privacy for drivers with respect to riders by preventing location-harvesting attacks. This is done using deposit tokens and permutation of driver indices for each ride request, which prevents a malicious rider from making fake ride requests and triangulating locations of all drivers in the zone [8, §8]. We show in Section II-B that even an honest-but-curious rider, with only one ride request and response, can recover the exact coordinates of about half of the drivers who respond to her ride request. Such an attack is not easy for the SP to detect, unlike attacks that involve simultaneous ride requests and cancellations. We remark here that, except for driver location information, no personal driver information is revealed in our attack. Nonetheless, in Section II-D we discuss practical scenarios where revealing only the drivers' locations (without their identities) can be harmful.

Our attack is motivated by the classical Gauss' circle problem [9, Ch. 9]. ORide uses a map-projection system such as UTM [10] to work with planar integer coordinates. Recovering the integer coordinates $(x_D, y_D)$ of the driver by the rider reduces to solving $(x_R - x_D)^2 + (y_R - y_D)^2 = N$, where $x_R, y_R, N$ are non-negative integers that are known to the rider. Relabelling this equation as $x^2 + y^2 = N$ results in a variant of the Gauss' circle problem. Since $N$ is sufficiently small, it is feasible to enumerate all the lattice points (i.e., points with integer coordinates) on the circle of radius $\sqrt{N}$. In our case, since $N$ always corresponds to the case where a solution is known to exist, we experimentally observe the number of solutions to be about 20 on average (over our choice of zones in Table I). Then we use the following ideas to further eliminate the potential solutions: (i) the driver coordinates must be in the same zone as that of the rider, (ii) the driver is typically expected to be at a motorable location such as a road, though the rider can book the ride from anywhere. This allows us to eliminate most of the possibilities (see Algorithm 1) and reduce the number of solutions from 20 to about 2 on average. In Section II-C, we validate our attack by running experiments over zones of different sizes for four arbitrarily chosen big cities, and show that a rider can determine the exact locations of 45% of the responding drivers (see Table I). Our attack takes only about 2 seconds per driver on average on a commodity laptop. We stress that we are not only using the geographical information to eliminate locations, but also the fact that all coordinates are encoded as integers and hence there are only a handful of locations to enumerate on the circle in the first place. Our attack exploits an inherent property of SHE schemes, namely the requirement of integer-like encoded inputs for exact arithmetic [11]. We also believe that the abstraction of our attack as enumerating lattice points on a circle (and also our extension to other distance metrics in Appendix A) is generic and will motivate similar exploits in other privacy-preserving solutions that use SHE.

In Section III, we propose a modification to the ORide protocol which serves as a solution to overcome our attack. Here the driver obfuscates her location by choosing random coordinates within a certain distance $\rho$ from her original location. Now the rider receives Euclidean distances that are homomorphically computed between her and the driver's anonymized location. Accordingly, we modify the rider's attack from Section II to account for the fact that these anonymized coordinates (which are represented by different lattice solutions) may not lie on a road. However, the anonymized coordinates will definitely have a road within distance $\rho$, since the driver was originally on a road. Through experiments we analyze this new attack on the proposed modification and evaluate its effect on driver anonymity and accuracy (refer to Table II). The optimal driver chosen in this case (based on the least Euclidean distance between the rider and the anonymized drivers' locations) is sufficiently close to the time-wise closest driver (who takes the least time to arrive at the rider's location). Our solution is therefore viable in practice and is successful in preserving driver anonymity.

In Appendix A, we investigate possible alternate modifications to ORide in an attempt to mitigate our attack. However we show that these non-trivial techniques are eventually vulnerable to the same attack.

In Section IV, we discuss related works on privacy-preserving RHS and also briefly discuss the applicability of our attacks to these works.

II Analysis of ORide Protocol

In this section we briefly recall the ORide protocol of [8], followed by a security analysis of the protocol at the rider’s end. We then describe our attack that would allow a rider to predict a driver’s location with good accuracy, and present the results of practical experiments.

II-A ORide: A Privacy-Preserving Ride Hailing Service

As mentioned in Section I, ORide is a privacy-preserving ride hailing service that uses an SHE scheme to match riders with drivers. In the process, identities and locations of drivers and riders are not revealed to the SP. The protocol provides accountability for SP and law-enforcement agencies in case of a malicious driver or rider. It also supports convenience features like automatic payment and reputation-rating of drivers/riders. In short, it is a complete and practical solution along with novel methods that help keep the identity of drivers and riders oblivious to the SP, together with accountability and convenience. The experiments done in their paper use real datasets consisting of taxi rides in New York city [12]. Their instantiation provides 112-bit security, based on the FV SHE scheme [13] which relies on the hardness of the Ring Learning With Errors (RLWE) problem.

We give below a high-level overview of the ORide protocol relevant to our attack. (For more details, the reader is referred to the original paper.) The registered drivers periodically advertise their geographical zones to the SP. These zones are predefined by the SP and available to drivers and riders. The size of a zone is chosen in such a way that there are sufficiently many riders and drivers to ensure anonymity while maintaining the efficiency of ride-matching. When a rider wishes to hail a ride, she generates an ephemeral FV public/private key-pair $(pk, sk)$. She encrypts her planar coordinates $(x_R, y_R)$ using $pk$ and sends the ciphertexts to the SP along with $pk$ and her zone $Z$. SP broadcasts the public key received from the rider to each driver in $Z$. The $i$th driver $D_i$ encrypts her planar coordinates $(x_{D_i}, y_{D_i})$ using $pk$ and sends them to SP. SP homomorphically computes the squared values of the Euclidean distances between each driver and the rider in parallel, and sends the encrypted result to the rider. The rider decrypts the ciphertext sent by SP to obtain the squared Euclidean distance to each driver $D_i$. She then selects the driver with the smallest squared Euclidean distance and notifies the SP of the selected driver. This selected driver is in turn notified by the SP. As part of the ride establishment protocol, a secure channel is then established between the rider and the driver. They then proceed to service the ride request as per the protocol. Further steps, although important, are not relevant to our work and, hence, we do not mention them here.
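The message flow above can be illustrated with a small simulation. This is only a sketch under loud assumptions: `Sealed` is a hypothetical placeholder that performs no encryption at all (it merely marks values the SP would treat as opaque FV ciphertexts), and the function names are ours, not ORide's. Its purpose is to make visible what the rider ends up holding after one round.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sealed:
    """Placeholder for an FV ciphertext. NOT encryption (illustrative only)."""
    value: int


def sp_squared_distance(rider_ct, driver_ct):
    """The function the SP evaluates under encryption: (xr-xd)^2 + (yr-yd)^2."""
    (xr, yr), (xd, yd) = rider_ct, driver_ct
    dx = xr.value - xd.value
    dy = yr.value - yd.value
    return Sealed(dx * dx + dy * dy)


def match_ride(rider_xy, drivers_xy):
    """Simulate one ride request; return (chosen driver index, all distances)."""
    rider_ct = tuple(Sealed(c) for c in rider_xy)                     # rider -> SP
    driver_cts = [tuple(Sealed(c) for c in d) for d in drivers_xy]    # drivers -> SP
    replies = [sp_squared_distance(rider_ct, d) for d in driver_cts]  # SP -> rider
    distances = [r.value for r in replies]                            # rider "decrypts"
    best = min(range(len(distances)), key=distances.__getitem__)
    return best, distances
```

For example, `match_ride((0, 0), [(3, 4), (1, 1), (6, 8)])` selects driver index 1, but the rider also learns the squared distances 25 and 100 of the two drivers she did not select.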

The threat model addressed in the paper is that of an honest-but-curious SP, whereas the drivers and riders are active adversaries who do not collude with the SP. All the plaintext information is encoded as integer polynomials before encrypting with the FV SHE scheme. In ORide, the apps on the drivers and riders use a map-projection system such as UTM [10] to convert pairs of floating-point latitudes and longitudes to planar integer coordinates. Drivers use third-party services like Google Maps or TomTom for navigation.

II-B Attack: Predicting Driver Locations

We now analyze the ORide protocol from the rider’s end. For ease of explanation, the rest of our paper shall refer to the squared Euclidean distance between two points as simply the Euclidean distance. In the ORide protocol, before a rider finally chooses the closest driver, she is given a list of Euclidean distances corresponding to drivers in her zone. In this case, the rider only gets to know the Euclidean distance to each driver, and not the driver’s exact coordinates. Mathematically, this would mean that there are infinite possibilities for the driver’s location on the circumference of a circle defined from the rider’s perspective.

On the contrary, we show that the driver's Euclidean distance allows the rider to identify the actual location of a driver with good probability. We show that by identifying road networks on a live map (using Google Maps API [14]), along with the fact that ORide uses integer coordinates, the number of possible driver locations from the rider's perspective can be reduced significantly, to around 2 locations on average.

Remark. While we make use of the fact that ORide uses integer coordinates, our attack would also work for fixed-point encoding of the coordinates. This is because the current (exact) techniques for fixed-point encodings for RLWE-based SHE schemes essentially use the scaled-integer representation [11].

Before we proceed with our analysis, we make the assumption that when the rider requests a ride and when each driver in the zone sends her encrypted coordinates to SP, the drivers are on road (since we use Google Maps API in our experiments, these include city roads, parking lot roads and many other categories, as specified by the definition of a road segment by Google Maps [14]). This assumption is reasonable since a vast majority of the active drivers at any point in time constantly move around the city looking for potential rides or about to finish serving another ride.

When we say that a driver's coordinates lie on road, we mean that the coordinates lie within the borders of the road. The current standards for lane width in the United States recommend that each lane be 3 metres wide on average [15]. Since many roads within a city consist of 2 lanes, we assume that a pair of coordinates lies on road if the location is within 3 metres of the centre of a road (neighborhoods in many cities around the world consist mostly of 2-lane roads, so our experiments give a fairly accurate idea of location recovery probabilities). We stress that the drivers can be anywhere in the zone on any road and our experiments indeed follow this distribution.
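To make the 3-metre criterion concrete, the following sketch checks it with plain planar geometry. It is an illustration only: the experiments in this paper query the Google Maps Road API rather than local geometry, and the representation of roads as a list of centreline segments, as well as the names `distance_to_segment` and `lies_on_road`, are assumptions made for this example.

```python
import math


def distance_to_segment(p, a, b):
    """Distance (in metres) from point p to the segment a-b in planar coordinates."""
    px, py = p
    ax, ay = a
    bx, by = b
    dx, dy = bx - ax, by - ay
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0:  # degenerate segment: a and b coincide
        return math.hypot(px - ax, py - ay)
    # Projection of p onto the line through a and b, clamped to the segment.
    t = ((px - ax) * dx + (py - ay) * dy) / seg_len_sq
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def lies_on_road(p, centrelines, half_width=3.0):
    """True if p is within half_width metres of any road centreline segment."""
    return any(distance_to_segment(p, a, b) <= half_width for a, b in centrelines)
```

With a single east-west centreline `[((0, 0), (100, 0))]`, the point `(50, 2)` counts as on road while `(50, 4)` does not.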

Rider's attack. A rider performs the following attack to obtain a set of possible locations for a driver. At the time of ride request, let the rider coordinates be $(x_R, y_R)$ and the driver coordinates be $(x_D, y_D)$, which the rider does not know. Let the rider's zone be denoted by $Z$. SP receives the encrypted values of $x_R, y_R, x_D, y_D$, then homomorphically computes the Euclidean distance in encrypted form, and the rider decrypts this to obtain $N = (x_R - x_D)^2 + (y_R - y_D)^2$. If $N$ is not too large (refer to Section II-C for a concrete discussion on bounds for $N$), the rider can efficiently find all integer solutions to the equation $x^2 + y^2 = N$. The rider could use a simple algorithm to accomplish this: keep a solution-set, and for every integer $x$ with $0 \le x \le \sqrt{N}$, compute $y = \sqrt{N - x^2}$. If $y$ is an integer, add the coordinates $(x, y)$, $(x, -y)$, $(-x, y)$, $(-x, -y)$, $(y, x)$, $(y, -x)$, $(-y, x)$ and $(-y, -x)$ into this set.
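The enumeration step can be sketched as follows. This is a minimal illustration rather than the code used in our experiments; the function name is hypothetical, and as a small variation the loop stops once $x$ exceeds $y$, which the eight-fold symmetry permits. Integer square roots (`math.isqrt`, Python 3.8+) are used so that the integrality test is exact.

```python
from math import isqrt


def lattice_points_on_circle(n):
    """Return the set of all integer pairs (x, y) with x*x + y*y == n."""
    points = set()
    x = 0
    while 2 * x * x <= n:          # only x <= y is needed thanks to symmetry
        y_sq = n - x * x
        y = isqrt(y_sq)
        if y * y == y_sq:          # y is an exact integer
            for a, b in ((x, y), (y, x)):
                points.update({(a, b), (a, -b), (-a, b), (-a, -b)})
        x += 1
    return points
```

For instance, `lattice_points_on_circle(25)` yields the twelve points $(0, \pm 5)$, $(\pm 5, 0)$, $(\pm 3, \pm 4)$ and $(\pm 4, \pm 3)$, whereas `lattice_points_on_circle(3)` is empty.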

Now, the rider maintains a set $L$ containing the possible driver locations. For each integral solution $(x, y)$ satisfying $x^2 + y^2 = N$, the rider identifies potential driver coordinates as $(x_R + x, y_R + y)$ and adds them to $L$ if they are inside $Z$.

Once the rider obtains the set $L$ of possible driver coordinates, she checks whether each solution lies on road. (Google Maps Road API [14] can be used to achieve this.) The rider now obtains a filtered set $F$ of coordinates that are inside $Z$ and also lie on a road. Note that since the actual coordinates of the driver also satisfy these conditions, they are always present in this set. The cardinality of $F$ denotes the number of predicted locations for a driver. If this cardinality is exactly one, then the rider has successfully predicted the driver's exact location. The attack is summarized in Algorithm 1.

We present an illustrative example in Figure 1. Consider a large zone in Dallas, USA, with a cartesian grid embedded over the road view of the map. Consider a driver and a rider pair inside this zone. The rider is said to be located at the origin, and let the driver be at coordinates (which agrees with our assumption that drivers lie on road). The rider is given the Euclidean distance to this driver. She then obtains all lattice points lying on this circle: . Out of these, the rider filters out coordinates that lie on road (shown as green dots in Figure 1) to obtain . Note that the driver’s actual coordinates belong to . Note also that if the rider’s location, her zone and the Euclidean distance to this driver was given as input to Algorithm 1, we would receive as outputs .

Fig. 1: Illustrative example of location prediction of a single driver by a rider.

Remark. In the scenario of ORide, the rider's zone usually consists of multiple drivers. Note that in Algorithm 1, when calculating the possible coordinates for the $i$th driver $D_i$, the analysis that a rider performs for one driver is independent of the analysis for other drivers. Therefore, the averaged results for multiple drivers in one execution of the attack (if the driver locations are randomly and independently sampled subject to the above conditions) are equivalent to the averaged results over multiple executions of the attack in the case of only a single driver present inside the zone.

Our implementation of the attack will therefore consider one driver inside the zone, and average the results over multiple experiments, considering randomly chosen rider and driver locations each time.

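Input: The rider's zone $Z$, number of drivers $n$ inside $Z$, rider's coordinates $(x_R, y_R)$, Euclidean distances $N_i$ between the rider and driver $D_i$ ($1 \le i \le n$)
Output: For each driver $D_i$, $F_i$ denotes the prediction set made for the location of $D_i$ by the rider
Procedure Predict_Driver($Z$, $n$, $(x_R, y_R)$):
        avg $\leftarrow 0$; exact $\leftarrow 0$
        for each driver $D_i$ do
               Receive: $N_i$ from SP
               // Store unique lattice points
               $L_i \leftarrow \emptyset$
               for $x \leftarrow 0$ to $\lfloor\sqrt{N_i}\rfloor$ do
                      $y \leftarrow \sqrt{N_i - x^2}$
                      if $y \in \mathbb{Z}$ then
                             // If $y$ is an integer, collect the possible symmetric values for $x$ and $y$
                             $T \leftarrow \{(\pm x, \pm y), (\pm y, \pm x)\}$
                             for $(a, b) \in T$ do
                                    // Compute predicted location for $D_i$
                                    $(x', y') \leftarrow (x_R + a, y_R + b)$
                                    if $(x', y')$ is inside $Z$ then
                                           $L_i \leftarrow L_i \cup \{(x', y')\}$
                                    end if
                             end for
                      end if
               end for
               // Filtered lattice points on road
               $F_i \leftarrow \emptyset$
               for $(x', y') \in L_i$ do
                      // Use Google Maps Road API to check if the coordinates lie on road
                      if $(x', y')$ lies on road then
                             $F_i \leftarrow F_i \cup \{(x', y')\}$
                      end if
               end for
               // Size of $F_i$ is the number of locations that the rider has predicted for $D_i$
               avg $\leftarrow$ avg $+ |F_i|$
               if $|F_i| = 1$ then
                      // Exact driver loc predictions
                      exact $\leftarrow$ exact $+ 1$
               end if
        end for
        Output: $F_1, \dots, F_n$, avg$/n$, exact$/n$
Algorithm 1 Location-harvesting Attack on ORide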
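A compact sketch of the per-driver part of Algorithm 1 is given below. It is not the released implementation: the axis-aligned square zone, the function name `predict_driver`, and the `on_road` callback (a hypothetical stand-in for a road lookup such as a Roads API query) are assumptions made to keep the example self-contained.

```python
from math import isqrt


def predict_driver(zone, rider, n, on_road):
    """Return the filtered candidate set for one driver.

    zone    : (x_min, y_min, x_max, y_max), planar integer coordinates
    rider   : (x_r, y_r), the rider's own coordinates
    n       : squared Euclidean distance decrypted by the rider
    on_road : callable (x, y) -> bool, hypothetical road-lookup stand-in
    """
    x_min, y_min, x_max, y_max = zone
    xr, yr = rider
    in_zone = set()
    for x in range(isqrt(n) + 1):
        y = isqrt(n - x * x)
        if x * x + y * y != n:      # y is not an integer for this x
            continue
        # x runs over the full range, so sign flips cover every lattice point.
        for dx, dy in {(x, y), (x, -y), (-x, y), (-x, -y)}:
            cx, cy = xr + dx, yr + dy
            if x_min <= cx <= x_max and y_min <= cy <= y_max:
                in_zone.add((cx, cy))
    return {p for p in in_zone if on_road(*p)}
```

If the returned set has exactly one element, the rider has recovered the driver's exact coordinates; otherwise its size is the number of remaining candidate locations.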

II-C Implementation of Our Attack

Using Google Maps API for Python [16], we performed experiments to validate our attack across four arbitrarily chosen cities: New York City, Dallas, Los Angeles and London. (The code for our attack presented in Section II can be accessed at https://github.com/deepakkavoor/rhs-attack.) We ran our experiments over zones of sizes {1 km², 4 km², 9 km², 25 km², 100 km², 400 km², 900 km²} (with the exception of New York City due to its geography having multiple smaller discontiguous areas). For each city, and for each zone size $A$, we performed 30 experiments. In each experiment, a random square zone $Z$ of area equal to $A$ was chosen. We chose a random latitude-longitude pair inside $Z$ for the rider. For a driver, we similarly chose a random latitude-longitude pair inside $Z$ that was on road (Google Maps Road API was used to accomplish this).

These coordinates were converted to UTM coordinates using the utm library for Python [17]. The Euclidean distance between these UTM coordinates was made available to the rider. Finally, we obtained the driver’s filtered set of probable locations as described in Algorithm 1. After obtaining the predicted driver coordinates, we averaged the number of such predicted locations over multiple experiments. We also counted the percentage of experiments in which the rider predicted exactly one location. With these considerations, our results for varying grid sizes and cities are shown in Table I.
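The conversion and distance steps can be sketched as below. This is a hedged illustration: `to_planar` relies on the third-party `utm` package and its `from_latlon` function, rounding to the nearest metre is our assumption about how integer coordinates are obtained, and both points are assumed to fall in the same UTM zone. The helper names are hypothetical.

```python
def to_planar(lat, lon):
    """Project a latitude-longitude pair to integer UTM (easting, northing) in metres.

    Requires the third-party `utm` package; it is imported lazily so that the
    rest of this sketch runs without it. Rounding is an assumption of this example.
    """
    import utm
    easting, northing, _zone_number, _zone_letter = utm.from_latlon(lat, lon)
    return round(easting), round(northing)


def squared_distance(rider_xy, driver_xy):
    """The value the rider decrypts: squared Euclidean distance in metres squared."""
    (xr, yr), (xd, yd) = rider_xy, driver_xy
    return (xr - xd) ** 2 + (yr - yd) ** 2
```

Because both inputs are integers, the result is always an integer, which is what makes the lattice-point enumeration applicable.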

Zone size   Output avg of Algorithm 1                    Output exact of Algorithm 1
(km²)       New York   Dallas   Los Angeles   London     New York   Dallas   Los Angeles   London
1           2.6        1.8      2.5           1.6        32%        52%      32%           60%
2           2.4        1.6      2.1           1.6        48%        52%      36%           68%
4           2.0        2.0      1.8           1.7        44%        44%      52%           56%
9           2.7        1.9      2.1           2.3        38%        56%      48%           40%
25          2.6        2.2      2.4           2.1        36%        44%      36%           48%
100         2.7        2.1      2.2           1.8        32%        44%      44%           56%
400         2.3        2.1      2.5           1.8        36%        48%      28%           56%
900         -          2.8      3.1           2.4        -          40%      24%           40%
TABLE I: The rider's prediction based on Algorithm 1, averaged over 30 experiments.

II-C1 Timings

Our experiments were performed on an Intel Core i5-8250U CPU @ 1.60 GHz with 8 GB RAM running Ubuntu 18.04.4 LTS. On average, one experiment (as described above) took 2 seconds for each driver, showing that our attack is indeed efficient, thus allowing a rider to practically obtain any driver's coordinates with good confidence.

II-C2 Interpretation of values in Table I

Our experiments showed that the average number of solutions to $x^2 + y^2 = N$ over all the aforementioned zone sizes was 20. When we filter these solutions based on whether they lie inside a zone and on road, the average number of possible driver coordinates was 2 (as indicated by the value avg in Table I), which is a significant reduction. Although it may seem that Euclidean distance gives fair anonymity to driver coordinates, our attack shows that in practice, this is not the case, and a rider can indeed find the driver's location with good probability. We also note from the average value of exact in Table I that the rider can predict the driver's exact location around 45% of the time.

Note that in each city, as zones get bigger, the number of lattice solutions and filtered coordinates tend to increase leading to higher avg. More lattice solutions imply that the event when a rider predicts exactly one location for a driver is rare, thus decreasing the value of exact. This trend can be verified from Table I.

II-C3 Anonymity Sets

In ORide, when the rider makes a request, she sends her zone identity to SP, and the SP now knows which zone the rider is in. This zone could contain the rider’s home/work address. As pointed out in the ORide paper [8], SP might be able to guess the identities of the riders if this pick-up zone had a limited number of ride activities, and a limited number of riders (as an extreme example, a zone where only one rider lives). Therefore, ORide defines zones in such a way that each zone has at least a large minimum number of ride requests per day. This large minimum is referred to by them as the anonymity-set size. The choice of size of these zones is left to the SP, based on balancing the communication bandwidth requirements and sizes of anonymity sets in those zones. (A very high anonymity set would mean that the demand for rides in that zone is high, leading to longer ride matching times and higher bandwidth usage).

We justify our choice of zone sizes, ranging from 1 km² to 900 km², for our experiments:

  • In a densely populated city like New York City (population density 11,084 persons/km², per https://worldpopulationreview.com/us-cities), where more people tend to use ride hailing services, a smaller zone size would suffice to achieve the required anonymity-set size. In a sparse city like Dallas (population density 1,590 persons/km²), where fewer ride-hailing activities occur, these zones would have to be bigger in size to achieve the same anonymity for riders. Taking into consideration the different possible zone sizes in both densely populated and sparse cities, the experiments validate our attack in zones of areas ranging from 1 km² to 900 km².

  • We analyzed the NYC Uber-Dataset [18] for May 2019, and deduced that the demand for taxi rides was very high in Manhattan compared to the other boroughs of NYC. We chose May since this month had one of the highest numbers of ride requests for Uber in 2019. Based on this, we followed the zone demarcation that was proposed by ORide: each Census Tract (CT) [19] in Manhattan is considered as one zone. The boroughs of Queens and Bronx are merged into one zone, and the boroughs of Brooklyn and Staten Island are merged into one zone. The size of each CT in Manhattan varies between 1 km² and 4 km², which corresponds to the zone size for areas of higher activity. Since the boroughs other than Manhattan have less activity, these zones are expected to have a larger area. Indeed, the combined area of Queens and Bronx is around 390 km², and the combined area of Brooklyn and Staten Island is around 330 km². Since this is the primary zone demarcation proposed by the authors of ORide, we found it reasonable to include these ranges of areas for our experiments in Table I.

We next discuss a few details involved in the implementation of our attack.

  • Although zones can be of any geographical shape, we chose square zones for ease of choosing random coordinates inside its boundary, and to simplify checking whether a given coordinate lies inside the zone.

  • Latitude-longitude coordinates were converted into UTM format using Python's utm library. On average, this conversion results in a difference of 0.5 metres between the original coordinate and the planar coordinate's representation. For all practical purposes, this difference is very small, and the two coordinates can be considered to represent the same location.

  • As discussed earlier, based on the NYC Uber-Dataset and ORide's proposed demarcation, even large sparse zones that have a sufficiently big anonymity-set would rarely exceed 900 km². Note that the Euclidean distance between two UTM coordinates is equal to the distance (in metres) between the latitude-longitude representations of those points. Hence, the possible value of $N$, even for a 30 km × 30 km grid, would be at most about $1.8 \times 10^9$. Since there exists an $O(\sqrt{N})$ algorithm to compute solutions to $x^2 + y^2 = N$, it is indeed feasible for the rider to perform this analysis on modern computers in very little time, even for different zone structures chosen by SP.
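As a quick sanity check of the bound on $N$, assume the largest zone is a 30 km × 30 km square and that coordinates are expressed in metres. The squared distance is then largest when rider and driver sit at opposite corners:

```latex
N \;\le\; (3\times 10^{4})^{2} + (3\times 10^{4})^{2} \;=\; 1.8\times 10^{9},
\qquad
\sqrt{N} \;\le\; 3\sqrt{2}\times 10^{4} \;\approx\; 4.24\times 10^{4}.
```

Under these assumptions, a loop over all integers $x$ with $0 \le x \le \sqrt{N}$ needs on the order of 42,000 iterations per driver, which is consistent with the short running times reported in Section II-C1.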

Remark. We give a brief insight into the number of drivers inside a zone, which is at most about 400. The zone demarcation proposed for New York by the authors of ORide was discussed briefly above. According to ride information for May 2019 in the NYC Uber-Dataset, a zone in Manhattan had at most 6,000 ride requests per day. (We chose the month of May since it experienced one of the highest numbers of ride requests in the year 2019.) We make the same assumption that the authors of ORide did: the drop-off zone for a driver is her waiting zone for new ride requests. Moreover, as in ORide, we assume that the waiting time between a driver's drop-off event and her next pick-up event is at most 30 minutes. This would mean that during a ride-request event, the available drivers to answer this request are the ones who had a drop-off event inside that zone in the 30 minutes preceding the ride request. We considered the top 20 high-ride zones, and for each zone, grouped the ride requests for a day based on 30 minute intervals. Each 30 minute interval consisted of at most 400 drop-off events inside each zone. This would imply that when a ride request occurs at any time of the day, at most 400 drivers would be waiting in that zone to service this request. We stress that there are at most 400 drivers in all zone demarcations considered above, and as the zone size increases, the density of drivers (the number of drivers available for a ride request in 1 km²) in that zone decreases.
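The grouping step in this remark can be sketched as follows. The record format, a sequence of `(zone_id, timestamp)` pairs, and the function name are hypothetical; the actual dataset has its own schema, and the windows here are aligned to the clock (00:00, 00:30, ...), which is an assumption of this example.

```python
from collections import Counter
from datetime import datetime


def max_dropoffs_per_window(dropoffs, window_minutes=30):
    """Return {zone_id: largest number of drop-offs in any aligned time window}.

    dropoffs: iterable of (zone_id, datetime) pairs (hypothetical record format).
    """
    counts = Counter()
    for zone_id, ts in dropoffs:
        # Bucket index within the day, e.g. 08:05 and 08:20 share bucket 16.
        bucket = (ts.date(), (ts.hour * 60 + ts.minute) // window_minutes)
        counts[(zone_id, bucket)] += 1
    peak = {}
    for (zone_id, _bucket), count in counts.items():
        peak[zone_id] = max(peak.get(zone_id, 0), count)
    return peak
```

Under the assumption that a driver waits in her drop-off zone for at most one window, the per-zone peak is an upper estimate of the number of drivers available to answer a ride request.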

II-D Impact of our Attack

We have experimentally shown that our location-harvesting attack can identify the exact location of a driver in about 45% of the cases. Equivalently, this means that in a zone of around 400 available drivers, a ride request leaks the locations of around 180 drivers to the rider. Although our attack doesn't reveal additional driver data such as user profiles, this leak of location information could still cause potential threats to drivers and ultimately affect the SP's reputation. For instance, according to [7], it is claimed that non-SP taxi drivers try to identify locations of Uber vehicles and attack them. There are also reports of people using ride-hailing apps to locate and rob drivers registered to the SP [6]. In general, ensuring privacy of the locations of drivers in the zone should be an important aspect of any privacy-preserving RHS. Notably, our work (with just a single ride request and response to an honest-but-curious rider) refutes the claim made by ORide that it is designed to prevent location-harvesting attacks by malicious riders who make multiple fake ride requests. We think that this flaw in ORide is not merely an implementation error. The requirement of integer-encoded inputs is inherent to current SHE schemes, and this helps us obtain a small number of lattice points on the circle.

III Mitigation of our Attack

We propose a solution where the driver can thwart our attack by anonymizing her location. Each driver could choose a random coordinate within a circle of fixed radius around herself, encrypt and send these random coordinates to the SP instead of her original coordinates. We show that this modification to ORide provides sufficient anonymity while preserving ride matching accuracy, and is therefore a reasonable solution to mitigate our attack. We analyze the effect of this technique on driver anonymity and provide concrete values for ride matching accuracy through experiments using Google Maps API.

Remark. Appendix A discusses other ideas that may intuitively seem to thwart our attack. However, we show that those modifications are vulnerable to our attack from Section II and hence do not preserve driver anonymity.

III-A Anonymizing Driver Locations

By anonymizing her location, each driver can try to preserve the privacy of her location with respect to a rider. Let each driver (at coordinates $(x_d, y_d)$) choose a circle of radius $R$ centered at her location (where $R$ is publicly known), and pick random UTM coordinates $(x'_d, y'_d)$ inside this circle. The driver encrypts $(x'_d, y'_d)$ (instead of $(x_d, y_d)$ as suggested by ORide) and sends it to the SP. We refer to $(x'_d, y'_d)$ as the anonymized driver coordinates.
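The driver-side step can be sketched as below. This is only an illustration under stated assumptions: the text does not fix the sampling distribution, so the sketch assumes a uniform choice among integer lattice points inside the circle (integer coordinates because the SHE scheme needs integer-encoded inputs), and the function name `anonymize` is hypothetical. Encryption is omitted.

```python
import random

def anonymize(x, y, radius, rng=random):
    """Pick integer coordinates uniformly among lattice points within
    `radius` of (x, y), using rejection sampling on the bounding square.

    Assumption: uniform sampling; the source text only says "random".
    """
    while True:
        dx = rng.randint(-radius, radius)
        dy = rng.randint(-radius, radius)
        if dx * dx + dy * dy <= radius * radius:
            return x + dx, y + dy

# Example: a driver at hypothetical UTM coordinates, anonymization radius 50 m.
anon_x, anon_y = anonymize(385000, 3768000, 50, random.Random(7))
```

Rejection sampling keeps every lattice point in the disk equally likely; sampling an angle and a radius and then rounding would bias the result and could land just outside the circle.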

As per the original attack in Section II, the rider obtains a Euclidean distance and enumerates all lattice points that correspond to this distance. Due to the changes described above, these lattice points need not correspond to possible driver coordinates. They instead represent possible anonymized driver coordinates. In the original attack, the rider would next filter each lattice point based on whether it is on a road or not. But a lattice point that represents an anonymized driver location may not lie on a road, even though the original driver did. Filtering in this way would lead the rider to erroneous conclusions, and she may discard a lattice point that actually corresponds to the anonymized driver location.
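For reference, the enumeration step recalled above can be sketched as follows. This is a straightforward search rather than the authors' implementation; the function names and the sample coordinates are hypothetical, and `n` stands for the decrypted squared Euclidean distance.

```python
from math import isqrt

def lattice_points_on_circle(n):
    """All integer pairs (a, b) with a*a + b*b == n."""
    points = []
    r = isqrt(n)
    for a in range(-r, r + 1):
        rest = n - a * a
        b = isqrt(rest)
        if b * b == rest:
            points.append((a, b))
            if b != 0:
                points.append((a, -b))
    return points

def candidate_locations(rider, n):
    """Shift the lattice points so the circle is centred on the rider."""
    xr, yr = rider
    return [(xr + a, yr + b) for a, b in lattice_points_on_circle(n)]

# Example: squared distance 25 around a rider at (100, 200).
candidates = candidate_locations((100, 200), 25)
```

With the anonymization in place, each candidate is a possible anonymized location rather than a possible true location.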

Observe that within distance $R$ (the anonymization radius) of the anonymized driver coordinates, there will always lie a road (since the original driver was on a road). We modify the rider’s attack accordingly to cope with this fix. Suppose the rider has discovered a lattice point. If there are no roads at all within a circle of radius $R$ centered at this point (for instance, a lattice point in the middle of a park), the rider can conclude that this point is not the driver’s anonymized coordinates. So, the best option that a rider has (to improve her attack against this obfuscation technique) is to filter each lattice point based on whether there is a road within distance $R$ of that point. As we see in Section III-B, the probability that a lattice point is filtered out in a dense city is low if we choose an appropriate value of $R$. This prevents the rider from eliminating many lattice points, thus improving driver anonymity. Moreover, this technique preserves accuracy when compared to ORide, as we show in Section III-C.
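The modified filtering rule can be sketched as below. Roads are modelled here as a plain list of sampled road points, which is an assumption made only for illustration (a real attacker would query a map service instead); the function names and all coordinates are hypothetical.

```python
def has_road_within(point, road_points, radius):
    """True if some sampled road point lies within `radius` of `point`."""
    px, py = point
    r2 = radius * radius
    return any((px - qx) ** 2 + (py - qy) ** 2 <= r2 for qx, qy in road_points)

def filter_candidates(candidates, road_points, radius):
    """Keep only lattice points that could be anonymized driver locations."""
    return [p for p in candidates if has_road_within(p, road_points, radius)]

# Hypothetical data: two road samples and four candidate lattice points.
roads = [(0, 0), (100, 0)]
points = [(30, 40), (60, 60), (100, 49), (300, 300)]
kept_50 = filter_candidates(points, roads, 50)
kept_150 = filter_candidates(points, roads, 150)
```

In this toy data the larger radius retains more candidates, which mirrors the argument that a suitably chosen radius leaves the rider with fewer points she can rule out.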

III-B Anonymity of Drivers with respect to Rider

The value of the anonymization radius $R$ is public and should be decided by the SP, who can in fact implement the end-user application in such a way that the driver’s device locally computes $R$ based on the current location of the driver. If the driver’s device senses that she is in a densely populated city (and thus there are many roads within close proximity of an arbitrary point in that region of the city), a smaller $R$ can be chosen. On the other hand, if the device determines that the driver is in a location where there are very few roads within distance $R$ of an arbitrary point in that region, a larger $R$ is chosen (for instance, in a sparsely populated city with low road density). This choice of $R$ based on the concentration of roads around the driver is motivated by the modification to the rider’s attack discussed at the end of Section III-A (the rider’s attack now tries to filter lattice points based on the availability of roads within distance $R$ of each lattice point solution).

We consider the number of (anonymized) driver locations predicted by a rider as a measure of anonymity for that driver. This depends on the number of lattice solutions for the Euclidean distance between the (anonymized) driver location and the rider. It also depends on the number of solutions that the rider can further filter out based on the availability of roads within distance $R$ (the anonymization radius) of each solution. We expect anonymity to increase with $R$ due to the higher probability of finding a road within distance $R$ of any location.

Similar to the setup in Section II-C, in the following discussion we average results over 25 experiments, where each experiment chooses a random zone of size 4 km² in the mentioned city (along with random coordinates for a rider and driver) and runs the modified rider’s attack for filtering coordinates.

For a small value of the anonymization radius $R$ such as 10 m, any coordinate within distance $R$ of some point is practically the same location. We experimentally observed that the average anonymity for a driver in Los Angeles was around 3, which is close to what we observe in the original attack (see Table I). Hence small values of $R$ should not be chosen since they offer low anonymity.

In a densely populated city such as Los Angeles, most locations within the city are expected to have roads within reasonable distance. For an anonymization radius of $R = 50$ m, we observed that the area surrounding most lattice points in Los Angeles had at least one road within 50 m. From a rider’s perspective, this means that most lattice solutions obtained by her are possible choices for the anonymized driver coordinates. Experiments showed that the average number of filtered lattice points when $R = 50$ m was 14 (meaning the rider has 14 possible anonymized locations of a driver). This provides sufficient anonymity to a driver in practice, since the probability of correctly predicting a driver’s anonymized coordinates is only $1/14$. This is certainly an improvement compared to the corresponding probability for ORide (Table I).

Considering Dallas, a city with relatively sparse road density, we observed that a significant number of locations did not have roads within 50 m, and this allowed the rider to filter out many possible lattice solutions. Our experiments suggest that choosing an anonymization radius of $R = 150$ m prevents the rider from doing so and offers sufficient anonymity, which averaged around 16.

III-C Accuracy of Ride-matching

When a driver chooses random coordinates within distance $R$ (the anonymization radius) instead of her own location, the Euclidean distance is now computed between the rider’s location and the anonymized driver location.

Among all drivers in the zone, suppose an optimal driver is chosen according to some metric $M$. For example, if $M$ represents Euclidean distance, the optimal driver is the one with the least Euclidean distance from her location to the rider in the case of ORide, and the one with the least Euclidean distance from her anonymized location to the rider in the case of our modified solution. Let $t_M$ be the time taken for this optimal driver to reach the rider. Let $t_{\min}$ be the minimum time taken among all drivers in the zone to reach the rider (corresponding to the time-wise closest driver). We evaluate the accuracy of metric $M$ as the percentage of experiments in which $t_M - t_{\min}$ is less than or equal to 1 minute (in practice it is acceptable for the rider to wait one extra minute compared to the time-wise closest driver). Google Maps API was used to determine the time taken for a driver to reach the rider.
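The accuracy measure defined above can be sketched as follows. The function names and the sample numbers are hypothetical; in the paper the travel times come from Google Maps API, whereas here they are simply supplied as inputs.

```python
def run_experiment(drivers):
    """drivers: list of (metric_value, travel_time_minutes) pairs.

    Returns (time of the driver chosen by the metric, minimum time overall).
    """
    chosen = min(drivers, key=lambda d: d[0])
    fastest_time = min(t for _, t in drivers)
    return chosen[1], fastest_time

def accuracy(experiments, tolerance_minutes=1.0):
    """Percentage of experiments where the chosen driver is at most
    `tolerance_minutes` slower than the time-wise closest driver."""
    hits = sum(1 for t_chosen, t_best in experiments
               if t_chosen - t_best <= tolerance_minutes)
    return 100.0 * hits / len(experiments)

# Two made-up experiments: one hit (0.5 min slower), one miss (6 min slower).
results = [
    run_experiment([(10, 5.0), (12, 4.5)]),
    run_experiment([(10, 9.0), (11, 3.0)]),
]
score = accuracy(results)
```

The same routine applies to both settings being compared: only the metric values change, depending on whether they are computed from actual or anonymized driver locations.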

We chose zones of varying sizes in Los Angeles and Dallas. In each experiment, a random rider and 400 drivers were chosen in each zone. We compared the accuracy of the Euclidean metric for anonymization radii $R = 50$ m and $R = 150$ m in both scenarios: when used in the context of ORide (computed between the rider and the driver’s actual location) and when used in the fix to our attack (computed between the rider and the driver’s anonymized location). As discussed previously, we chose $R = 50$ m for Los Angeles and $R = 150$ m for Dallas, respectively, to ensure sufficient anonymity. Moreover, the zone sizes are chosen to be smaller in Los Angeles (see Section II-C) and larger in Dallas. The inferred accuracies were averaged over 25 experiments (see Table II). We see that our solution indeed provides sufficient driver anonymity with respect to the rider while preserving the accuracy of ride matching compared to ORide.

Choosing a large anonymization radius $R$ to achieve greater anonymity in a small zone (where driver density is high) leads to loss of accuracy. This seems intuitively correct, since having a large anonymity radius in a small zone with high driver density greatly changes the ordering of drivers based on Euclidean distances. To verify this concretely, we used a setup similar to the one described above and observed that with $R = 150$ m and a 4 km² zone size in Los Angeles, the accuracy of ORide was around 84% whereas that of the modified solution was only 70%. So, $R$ should increase with zone size, both to preserve accuracy and to preserve driver anonymity (by preventing the filtering of lattice solutions based on the availability of roads).

City          Zone size (km²)   Radius (m)   ORide   Our solution
Los Angeles   4                 50           84%     80%
Los Angeles   25                50           92%     90%
Dallas        100               150          83%     83%
TABLE II: Comparison of accuracy of selecting best driver in ORide vs. our solution (with anonymized driver locations), averaged over 25 experiments.

IV Related Works

We primarily consider here references related to privacy preserving ride-hailing services. We also consider some ride-sharing services that specifically deal with driver privacy.

Uber is one of the popular RHS providers, alongside Lyft, DiDi, OLA, Taxify and others. An in-depth analysis of the practices followed by Uber and of the impact of surge pricing on passengers and drivers is given by Chen et al. [20]. The Guardian [21] reports how anonymized data about New York City taxi drivers can easily be converted back to its original form to obtain personal information. Different threat models are widely considered in the literature, namely, a malicious driver targeting riders, and an honest-but-curious SP harvesting information about riders and drivers with the intention of selling it to other entities for advertising purposes, or with potentially malicious intentions to target high-profile individuals. Privacy of the driver is given much less attention, so much so that in a few papers the actual driver locations are revealed to the SP as well as to the rider [22], [23]. As motivated in Section I, there can be instances where a malicious rider targets drivers of a specific SP. For example, a competitor SP can masquerade as a rider to collect driver profile information or statistics in order to target the drivers belonging to the specific SP. Geo-locating Drivers by Zhao et al. [4] studies the leakage of sensitive data and, in particular, evaluates the threat to driver information. They show that a malicious outsider SP can harvest driver data by analyzing the APIs of non-privacy-preserving apps provided to drivers by Uber, Lyft and other widely deployed SPs.

PrivateRide by Pham et al. [5] is one of the first papers to address privacy in RHS. The locations of riders are kept hidden by means of a cloaked region, and location privacy is preserved by using cryptographically secure constructs. They tackle the issue of a malicious outsider posing as a rider to harvest driver information by releasing such information only after the ride request is fulfilled and the driver and rider are in close proximity. A recent work by Khazbak et al. [22] improves upon PrivateRide by providing obfuscation techniques (spatial and temporal cloaking) for rider locations, achieving better results in selecting the closest driver at the cost of slightly more computational overhead. However, the drivers’ locations are revealed to the rider. Duan et al. [24] improve on the work of Khazbak et al. [22] by proposing a discount-based solution that gives riders an incentive based on the performance decrease in driver selection, while preserving rider privacy. In their model, drivers upload their exact locations to the SP. B-ride [23] is a ride-sharing solution that uses a cloaking area for the rider, but the pickup locations of the drivers are sent to the rider. The work in [25] is a ride-sharing solution for long-distance travel in which the SP matches a driver, whose pre-defined spatial regions are encrypted, with riders whose sources and destinations lie in those regions.

ORide [8] is a follow-up work by the authors of PrivateRide that provides more robust privacy and accountability guarantees, and has been described earlier in this paper. All the following works try to improve upon ORide by proposing different models of privacy-preserving closest-driver selection by the SP. We note here that our attack is relevant in cases where the rider gets to make a choice, and is not applicable in situations where the SP selects a single suitable driver and provides the same to the rider. pRide by Luo et al. [26] proposes a privacy-preserving ride-matching service involving two non-colluding servers, one being the SP and the other a third-party Crypto Provider (CP). The solution makes use of the Road Network Embedding (RNE) [27] technique to transform a road network into a higher-dimensional space so that the distance between any two nodes in the network can be computed efficiently. They propose two solutions, one using the Paillier cryptosystem and another using the BGN cryptosystem. The homomorphically encrypted driver and rider locations received by the SP are sent to the CP along with random noise, where they are decrypted and garbled. The SP then uses a garbled circuit to find the closest driver to the rider and completes the ride request. They show high accuracy in matching the closest driver while preserving the privacy of driver and rider locations. The disadvantages of this scheme are its use of a second server (the CP) that must not collude with the SP, which may be inconvenient to realize in practice, and the high communication cost between the two servers. lpRide by Yu et al. [28] improves upon pRide by performing all the homomorphic distance computations on a single SP server, thus eliminating the high communication cost incurred when two servers are involved. They use a modified Paillier cryptosystem [29] for encrypting the RNE-transformed locations of rider and driver. Wang et al. propose TRACE [30], which uses bilinear pairing for encrypting driver and rider locations. PSRide by Yu et al. [31] uses the Paillier cryptosystem and Yao’s garbled circuit with two servers along the same lines as pRide, and hence suffers from some of the disadvantages mentioned above.

V Conclusion

In this paper we present an attack on a privacy-preserving RHS, ORide [8]. We show that an honest-but-curious rider can determine the coordinates of nearly half the number of drivers in a zone even when only the Euclidean distance between the rider and a driver is available to the rider. Our attack involves enumeration of lattice points on a circle of appropriate radius and subsequent elimination of lattice points based on geographic conditions. Finally we propose a modification to the ORide protocol as a strategy to mitigate our attack. Here a driver anonymizes her location by choosing a random coordinate within a circle of certain radius around herself. We show through concrete experiments that this technique preserves driver anonymity and accuracy of ride matching.

Although protocols may seem secure in theory, several complications and vulnerabilities may arise when they are deployed in practice, as demonstrated by our attack in Section II. In the future, it will be interesting to experimentally investigate the notion of driver privacy with respect to both the SP and the rider in more recent works following ORide (lpRide [28], pRide [26]).

Appendix A

We look at potential ways in which our attack can be thwarted and analyze their efficacy. In the first scenario, in order to obfuscate driver locations, the SP homomorphically adds noise to driver distances before sending them to the rider. For this case, we show that a rider can still break anonymity by recovering the original distances between the rider and the drivers. In the second scenario, the SP uses the $p$-norm metric instead of the Euclidean distance, and we show that our attack also extends to this case.

Note that increasing zone sizes is not a countermeasure to our attack. As discussed in Section II-C, zone sizes should be small enough (less than 1000 km² in practice) to ensure efficient ride-matching times and lower bandwidth costs.

A.1 Homomorphic Noise Addition by SP

In order to thwart our attack, the SP could try to obfuscate driver locations by transforming the (encrypted squared) Euclidean distances using a random monotonic polynomial with integer coefficients and of small degree, as suggested by Kesarwani et al. [32]. Integer coefficients are needed for ease of representation in homomorphic computations, monotonicity is needed to maintain the sorting order of the distance inputs (so that the rider obtains the correct order upon decryption), and low polynomial degree is required for efficient homomorphic evaluation. Let $d$ be the Euclidean distance between the rider and a driver in her zone.

The rider would get to know from the SP, for each driver in her zone, the value $F(d)$ for some random monotonic integer polynomial $F$ of low degree. Note that $F$ is unknown to the rider, but the degree of $F$, the range of its coefficients and the range of the input distances are publicly known. We claim that the rider can obtain the actual distance $d$.
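The obfuscation step itself can be sketched in plaintext as below. This is not the construction of [32] and it does not perform the recovery discussed next: the sampling rule (non-negative integer coefficients with a positive leading coefficient, which makes the polynomial strictly increasing on non-negative inputs) is an assumption chosen for simplicity, the function names are hypothetical, and no encryption is involved.

```python
import random

def random_monotonic_poly(degree, max_coeff, rng):
    """Coefficients c[0..degree] (c[i] multiplies x**i), all non-negative,
    with a positive leading coefficient (assumed sampling rule).
    Such a polynomial is strictly increasing for x >= 0 when degree >= 1."""
    coeffs = [rng.randint(0, max_coeff) for _ in range(degree + 1)]
    coeffs[-1] = rng.randint(1, max_coeff)
    return coeffs

def evaluate(coeffs, x):
    """Horner evaluation of the polynomial at integer x."""
    result = 0
    for c in reversed(coeffs):
        result = result * x + c
    return result

rng = random.Random(1)
poly = random_monotonic_poly(3, 100, rng)
squared_distances = [25, 400, 169, 10000]   # made-up inputs
masked = [evaluate(poly, d) for d in squared_distances]
```

Because the polynomial is increasing on the input range, sorting the masked values yields the same driver ordering as sorting the true distances, which is the property the SP needs.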

[33] provides a method of recovering a monotonic integer polynomial of low degree and bounded input range when only sufficiently many outputs evaluated at integer points are provided. We used the publicly available SageMath [34] code from the authors of [33] with parameters similar to that described in [32], namely , and . Next one obtains outputs by evaluating this polynomial on the distances . These two steps are the same as what the SP would do (homomorphically) once it receives inputs from the rider and all drivers in a particular zone. The values, , and are the only values given to the SageMath code in the experiments to recover (squared) Euclidean distances to drivers for various zone sizes. We correlated back the results of the recovery with the input distances and verified that in all cases the recovered distances matched correctly, which means that the rider can proceed with the attack mentioned in Section II after recovering values. The result of recovery of the distances in various zone sizes and the time to recover is given in Table III.

Zone size (km²)   1    4    9    25    100   900
Time (seconds)    38   54   60   274   934   7469
TABLE III: Time taken to recover the distances and the monotonic polynomial for different zone sizes

A.2 $p$-norm Metric by SP

In order to mitigate our attack in Section II, the SP may try to homomorphically compute the $p$-norm (instead of the Euclidean distance) of ciphertexts and send it to the rider. Let $(x_r, y_r)$ and $(x_d, y_d)$ denote the coordinates of a rider and a driver, respectively. The rider would thus obtain $(x_r - x_d)^p + (y_r - y_d)^p$ for each driver in her zone (the value of $p$ should not be too large, to allow efficient homomorphic computations by the SP).

Note that if $p$ is odd, $(x_r - x_d)^p$ could represent a negative value. Since ORide uses ciphertext packing and non-Boolean circuit representation with the underlying SHE library [8], it is very inefficient to compute the absolute value homomorphically. Hence, the SP would have to use only even values for $p$.

Let $p = 2k$, and let $N$ denote the value received by the rider. In the rider’s attack, she now has to enumerate all lattice points $(x, y)$ satisfying the equation $x^{2k} + y^{2k} = N$. Observe that if $(x, y)$ is a solution to this equation, then the lattice point $(x^k, y^k)$ is a solution to $x^2 + y^2 = N$. This implies that the solution set of lattice points satisfying $x^{2k} + y^{2k} = N$ is no larger than the solution set of lattice points satisfying $x^2 + y^2 = N$. Based on our experiments on various zones and cities, we have estimated the number of lattice points satisfying $x^2 + y^2 = N$ to be around 20 (refer to Section II-C). This means that on average, the lattice points satisfying $x^{2k} + y^{2k} = N$ cannot be greater than 20 in number. The rider (similar to the rest of the attack) can then check whether each lattice point lies in the zone and on a road, to reduce the number of possible predicted driver locations. In this way, our attack also applies when the SP uses the $p$-norm instead of the Euclidean distance.
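The reduction used in this argument can be checked on small numbers with the sketch below. The helper names `iroot` and `lattice_points_even_power`, and the example value 17, are hypothetical; the code searches for the solutions directly and then confirms that raising each coordinate to the power p/2 gives a point on the corresponding circle.

```python
def iroot(value, p):
    """Largest integer r >= 0 with r**p <= value (binary search)."""
    lo, hi = 0, 1
    while hi ** p <= value:
        hi *= 2
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if mid ** p <= value:
            lo = mid
        else:
            hi = mid
    return lo

def lattice_points_even_power(n, p):
    """All integer (x, y) with x**p + y**p == n, for even p >= 2."""
    if p < 2 or p % 2:
        raise ValueError("p must be an even integer >= 2")
    points = set()
    x = 0
    while x ** p <= n:
        rest = n - x ** p
        y = iroot(rest, p)
        if y ** p == rest:
            # Even powers ignore signs, so add all sign combinations.
            for sx in (x, -x):
                for sy in (y, -y):
                    points.add((sx, sy))
        x += 1
    return sorted(points)

# Example with p = 4: solutions of x**4 + y**4 == 17.
quartic = lattice_points_even_power(17, 4)
mapped = {(x ** 2, y ** 2) for x, y in quartic}
```

For p = 2 the same function enumerates the lattice points on the circle itself, so it can also be used to compare the two solution sets for a given value.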

Acknowledgment

The authors would like to thank Sonata Software Limited, Bengaluru, India for funding this work. We also thank the anonymous reviewers of ACM CCS 2020 and USENIX Security 2021 for their useful suggestions regarding obfuscation of driver location to preserve driver privacy.

References