## 1 Introduction

Ride-hailing services such as Uber and Lyft have become a popular choice of transportation in the past decade [9]. By offering convenience and reliability to their customers, these services are well suited for intra-city commutes. A ride-hailing service usually consists of three entities: the ride-hailing server (RS), riders or customers (RCs) and drivers or vehicles (RVs). The RS is primarily responsible for hosting the ride-hailing service publicly. Drivers can register with this service and become identified as certified RVs. A customer who wishes to use this service can sign up as an RC and request a ride. Depending on the pick-up and destination locations, the RS forwards this ride request from the RC to suitable RVs in the region. A list of nearby available RVs is revealed, along with their reputations, to the RC, who then makes a suitable choice.

However, revealing locations of RCs/RVs to other entities can have severe consequences. A pick-up location could correspond to the residential address of an RC, which can be used for stalking/kidnapping. There have also been instances when RVs registered to a particular ride-hailing service have been targeted by regular taxi drivers or targeted for theft [14, 4]. Preserving the privacy of users’ sensitive locations has become a primary concern in ride-hailing services. Generally, the RS is assumed to be honest-but-curious. This means that the RS tries to learn as much information as possible without maliciously deviating from the ride-hailing protocol. Such a model is reasonable to assume since the RS wishes to preserve its reputation among the public. But it is still dangerous for the RS to learn locations of RCs and RVs, in case the RS later turns malicious or becomes a victim of cyberattacks [3, 8].

In the past few years, there have been many works that focus on ensuring location privacy of RCs and RVs in the context of ride-hailing services. Section 5 contains an overview of recent papers in this area. These works use cryptographic primitives to hide sensitive location information from the RS, while trying to ensure efficiency and ride-matching accuracy.

In this paper, we focus on TRACE [16], proposed by Wang et al. in 2018. TRACE is a privacy-preserving solution to ride-hailing services. Here, the RS first spatially divides each city into quadrants. RCs and RVs mask their sensitive location information using randomness and then forward it to the RS. The RS then identifies the quadrant in which RCs and RVs lie, without finding out their exact locations. To ensure efficiency and accuracy, the ride request from an RC is forwarded only to RVs that are in the same quadrant as the RC. The RC then makes a choice among RVs that lie in its vicinity to finalize ride establishment. Since the RS knows the distribution of RVs in different quadrants, it can periodically change its spatial division of the city to optimize bandwidth usage, reduce waiting time and improve accuracy.

TRACE uses masking with random secrets to prevent other entities of the protocol from learning the underlying message. At a high level, a large prime is chosen and the plaintext is multiplied by a random integer modulo that prime. These masked messages are encrypted using shared keys to prevent external eavesdroppers from gaining any useful information. Since TRACE uses lightweight cryptographic techniques and simple modular arithmetic, it is efficient in practice. The security guarantees for TRACE state that the RS cannot learn the exact locations of RCs and RVs apart from the quadrant they are in. Additionally, RCs and RVs cannot learn the secret spatial division maintained by the RS, since this could reveal the density of drivers across the city, among other proprietary information and trade secrets of the RS.

### 1.1 Our Contribution

We propose an attack on TRACE and disprove the above security claims by showing that the RS can indeed retrieve the exact locations of all RCs and RVs. Secondly, we show that RCs and RVs can learn the secret spatial division information maintained by RS. These attacks constitute a total break of the privacy objectives of TRACE. The underlying idea behind our attack is to eliminate the (unknown) randomness shared across different messages when other entities mask their location values. This allows one to efficiently obtain an overdetermined system of linear (modular) equations in the unknown plaintext locations. We stress that this attack is purely algebraic, and does not make any geometric assumptions about the region. Our attack is efficient (runs in time quadratic in the security parameters) and holds even when all entities are honest-but-curious. For instance, with the recommended security parameters from [16], an RV can recover the quadtree maintained by RS in under a minute (see Table 2) and the RS can recover the exact location of an RV in under a second (see Table 3).

The rest of our paper is organized as follows. In Section 2, we describe relevant steps of the TRACE protocol from [16]. The first attack in Section 3.1 describes how RCs and RVs can recover the secret quadtree maintained by RS. The second attack in Section 3.2 describes how the RS can recover exact locations of RCs and RVs. We briefly discuss a modification to the TRACE protocol that prevents only the first attack, and argue that the second attack (which is more severe than the first) is hard to thwart. Algorithms 1 and 2 summarize the above two attacks. Section 4 provides details about our experimental setup and evaluates the efficiency and success rate of our attack in practice (see Tables 2 and 3). Section 5 gives an overview of recent works in the area of privacy-preserving ride-hailing services. We conclude our paper and provide remarks about future work in Section 6.

## 2 Overview of TRACE

This section contains a high-level overview of the TRACE protocol [16]. Details that are not directly relevant to our attack are omitted. For more information, the reader is referred to the original paper.

### 2.1 Preliminaries

A quadtree with nodes is a data structure used to represent the partition of a 2-D space into quadrants and subquadrants. Each node in the tree is associated with four coordinates denoting corners of the quadrant represented by that node. Every non-leaf node in the quadtree has four children denoting the division of that quadrant into four subquadrants. An example is presented in Figure 1.

Given a point and a quadrant with , we can easily check if lies within the quadrant by doing the following [16, Section III]. For each compute

(1) |

where . If all , then lies within the quadrant, otherwise it does not. Given a quadtree, this idea can be extended to find the quadrant/node of the tree in which lies. Starting at the root, among its four children, find that quadrant/node in which lies; then recurse on its children until a leaf is encountered.
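The containment test and the quadtree descent described above can be sketched as follows. This is a minimal illustration, not the exact expression of Equation (1): it assumes the four corners of each quadrant are listed in anticlockwise order and uses the standard cross-product sign test for each edge. The names `edge_sign`, `in_quadrant`, `Node`, `square` and `locate` are hypothetical and introduced only for this example.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

Point = Tuple[int, int]


def edge_sign(p: Point, v: Point, v_next: Point) -> int:
    """Cross product (v_next - v) x (p - v); positive when p is left of the edge."""
    return (v_next[0] - v[0]) * (p[1] - v[1]) - (v_next[1] - v[1]) * (p[0] - v[0])


def in_quadrant(p: Point, corners: List[Point]) -> bool:
    """True if p is strictly inside the quadrant (corners in anticlockwise order)."""
    return all(edge_sign(p, corners[j], corners[(j + 1) % 4]) > 0 for j in range(4))


@dataclass
class Node:
    corners: List[Point]                       # four corners, anticlockwise
    children: List["Node"] = field(default_factory=list)


def square(x0: int, y0: int, x1: int, y1: int) -> List[Point]:
    """Axis-aligned rectangle as an anticlockwise corner list (assumed layout)."""
    return [(x0, y0), (x1, y0), (x1, y1), (x0, y1)]


def locate(root: Node, p: Point) -> Optional[Node]:
    """Descend from the root to the deepest node that strictly contains p."""
    if not in_quadrant(p, root.corners):
        return None
    node = root
    while node.children:
        nxt = next((c for c in node.children if in_quadrant(p, c.corners)), None)
        if nxt is None:          # p sits on a boundary between children
            return node
        node = nxt
    return node
```

For example, a root covering the square from (0, 0) to (4, 4) with four equal children sends the point (3, 1) to the lower-right child. Points that lie exactly on an edge fail the strict test, so the descent stops at the parent in that case.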

### 2.2 System Design and Security Goals

System Design. The three primary entities in the TRACE protocol are the ride-hailing server/service provider (RS), the customer/rider (RC) and the vehicle/driver (RV). All of the aforementioned entities are assumed to be honest-but-curious. This means that they wish to learn as much information as they can about the other entities without violating any protocol steps.

RS is mainly responsible for forwarding requests/responses between RCs and RVs. As part of the protocol, RS maintains a spatial division of the city into quadrants and uses it to identify regions in which RCs and RVs lie. It does so in such a way that RCs and RVs do not learn any information about the spatial division, while RS does not learn the exact locations of RCs and RVs. The RC can choose a pick-up point and send a ride-hailing request to RS, who then forwards it to the RVs that lie in close vicinity of RC. RVs submit their masked location information to RS at regular intervals, allowing the RS to have an idea of distribution of RVs in the city. Depending on the density of RVs, RS can periodically optimize its space division to improve ride-matching accuracy.

Threat Model. We assume the same threat model that is considered in TRACE. All entities are assumed to be honest-but-curious, that is, they follow the protocol specification but may infer additional data from the observed transcripts. RS does not collude with RCs and RVs (to try and obtain information about customers), since it has an incentive to maintain high reputation.

Security Goals. It is essential to ensure that location information of RCs and RVs is not revealed to other entities. The spatial division maintained by RS should also be kept secret, as this could reveal information about density of drivers in a city and other proprietary information/trade secrets of RS. The authors of TRACE claim that the following security requirements are satisfied during the protocol execution.

###### Claim 1

RS creates a quadtree containing information about spatial division of the city into quadrants, and masks it with a randomly chosen secret to compute . Given , RCs and RVs do not learn anything about .

###### Claim 2

RS can only learn the quadrants in which RCs lie. RS does not obtain any other information about the exact pick-up locations of RCs.

###### Claim 3

RS can only learn the quadrants in which RVs lie. RS does not obtain any other information about the exact locations of RVs.

### 2.3 TRACE Protocol

This section describes the execution of the TRACE protocol. Figure 2 gives a summarized view of the messages exchanged between different entities.

RS acts as a central entity for forwarding messages between RCs and RVs. It establishes shared keys with RCs and RVs through the Diffie-Hellman key exchange. All messages exchanged between RS and RCs, RVs are encrypted using a symmetric encryption scheme. The authentication of entities is ensured by signing these messages using the BLS signature scheme [1]. The notations used in the TRACE protocol and their descriptions are provided in Table 1.

| Notation | Description |
|---|---|
| RS | Ride-hailing server (service provider) |
| RC | Customer (rider) |
| RV | Vehicle (driver) |
| | Security parameters of TRACE |
| | Spatial division (quadtree) maintained by RS |
| | Large primes chosen by RS |
| | Large primes chosen by RC |
| | Coordinates of -th vertex in the -th quadtree node |
| | Random values used by RS when masking |
| | Masked quadtree computed by RS |
| | Coordinates of RV |
| | Random values chosen by RV when masking |
| | Random permutation chosen by RV |
| | Data aggregated by masking and |
| | Pick-up coordinates of RC |
| | Square with at its center |
| | Length of a side of |
| | Random values used by RC when masking |
| | Data aggregated by masking and |

For convenience, the remainder of this paper shall refer to subscripts and as simply and , respectively.

Step 0. RS publishes details about different system parameters (for example, the group and its generator used in the signature scheme, public key of RS, choice of symmetric encryption). RCs and RVs also establish their public keys. RS announces security parameters . As we shall see subsequently, they specify the size of different randomness used when masking location information. Step 3 elaborates on the constraint that should exist among these four parameters to ensure correctness of the protocol.

RS chooses two large public primes and (of size bits and bits, respectively) and a random secret known only to itself.

Step 1. RS divides the two-dimensional space into squares or rectangles represented by a quadtree

with nodes. The -th quadrant has four corners where . RS wishes to learn the quadrant in which each RV lies without learning its exact location. To do this, RS sends a masked version of to RV. Concretely, RS chooses 24 random values () of size bits each. For every vertex of , let be the vertex adjacent to it in the anticlockwise direction, i.e. . RS masks this vertex by computing

The values are public, whereas are only known to RS. The masked coordinate is

where denotes concatenation. Next, RS computes the masked quadrant

for , to get the masked quadtree

It then encrypts and forwards it to RV.

Step 2. RV decrypts this message and uses along with its own randomness to mask its location . For , RV chooses a fresh random number (each bits long) and computes

RV chooses a random permutation to reorder the -indices for each . That is,

The order within each is still preserved, that is,

RV encrypts and forwards it to RS.

Step 3. RS obtains that contains the masked location of each RV, and does the following computations to identify the quadrant/node of the quadtree in which RV lies.

Similarly,

Next, RS computes the difference

Compare this to Equation (1). Since is always positive, RS can identify whether RV lies in by checking if is positive for all . Using the method described in Section 2.1, RS can query the quadtree to identify the exact quadrant where RV lies.

Note that it was necessary to remove the modulus with respect to when obtaining and , otherwise those values would always be positive irrespective of whether RV was inside the quadrant or not. To remove this modulus it is sufficient if the following is always true during the computation of (a similar condition exists for ).

Let denote the bit length of a non-negative integer. Recall that . To ensure the above conditions hold, the parameters are chosen such that

(2) |

Moreover, the size of the location coordinates is assumed to be negligible compared to these security parameters. In [16], the above values are set as .
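The role of the size constraint can be illustrated with a toy example. The sketch below does not reproduce the TRACE masking formulas; it assumes a simplified multiplicative mask `s * value mod p` with a hypothetical 61-bit prime, and only shows why the two quantities must be unmasked as integers before subtracting: a difference taken modulo the prime is never negative, so its sign carries no information.

```python
import secrets

P = (1 << 61) - 1  # toy prime modulus (a Mersenne prime); an assumption for this sketch


def mask(value: int, s: int) -> int:
    """Simplified multiplicative mask (not the TRACE formula)."""
    return (s * value) % P


def unmask(masked: int, s: int) -> int:
    """Multiply by the inverse of s modulo P; exact if 0 <= value < P."""
    return (pow(s, -1, P) * masked) % P


def signed_difference(u: int, v: int, s: int) -> int:
    """Unmask each value first, then subtract over the integers."""
    return unmask(mask(u, s), s) - unmask(mask(v, s), s)


def modular_difference(u: int, v: int, s: int) -> int:
    """Subtract under the modulus; the result always lies in [0, P)."""
    return unmask((mask(u, s) - mask(v, s)) % P, s)


s = secrets.randbelow(P - 1) + 1           # random non-zero secret
print(signed_difference(10, 25, s))        # negative: the sign is preserved
print(modular_difference(10, 25, s) > 0)   # True: the sign is lost
```

The first variant is only correct because both unmasked values are smaller than the modulus, which is the kind of guarantee the constraint on the security parameters is meant to provide.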

Step 4. RC receives from RS. Now the RC tries to mask its location with respect to the quadtree and send it to RS. Suppose the pick-up point of RC is . RC chooses a square of side (where is 1 km) with this pickup point at its center. Let the vertices of this square be . Recall that in Step 2, each RV masked its location with respect to and computed . RC also does an equivalent computation here; after receiving from RS, it computes a masking for each of the four vertices of to obtain .

Next, RC chooses a public prime of size bits, a public prime of bits, a secret and 4 random values of bits each. It computes

RC encrypts and sends it to RS.

Step 5. The goal here is to convey the masked location information from RC to the RVs that are “nearby”. RS decrypts the message from RC to get . Similar to Step 3, for each of , RS obtains the quadrant in which the vertex represented by (i.e. ) lies. With this, RS knows the quadrants in which the corners of square lie. RS can construct a region enclosing . From Step 3, RS also knows the quadrants in which each RV lies. RS encrypts and sends it to those RVs that lie in (call these RVs SRVs).

Step 6. SRV receives from RS and tries to add in masked information about its own location to these values. It chooses three random ’s of bits each and computes

SRV encrypts and sends to RC via RS.

Step 7. RC uses (that contain masked information of RC’s and SRVs’ locations) to check if that SRV is within distance .

When , the SRV is within the circle query range of radius around . Call such SRVs CRVs.

Once again (similar to Step 3) we need to eliminate the modulus with respect to (otherwise would always be positive even if the SRV had distance ). With the relationship imposed on the security parameters (Equation (2) in Step 3), the following condition holds and the modulus is removed.

Step 8. RC masks its take-off point using (similar to Step 2) to create , and forwards along with the list of CRVs to RS. (The take-off point usually lies very close to the RC’s pick-up point from Step 4). Similar to Step 3, RS uses to identify the subregion in which the take-off point lies. RS chooses a random location in this subregion and forwards it to CRVs. Each CRV inspects to make a decision on whether to accept this ride-hailing request from RC. The CRVs who decide to accept send an “Accept Response” to RS. RS forwards the list of ready and available CRVs to RC. RC chooses a suitable CRV from this list, and this CRV is informed about the same by RS. Later, the RC and the chosen CRV proceed with ride establishment by negotiating a shared session key and by exchanging information such as location, phone number, reputation, etc.

## 3 Attack on TRACE

This section presents two attacks which (with high empirical probability) disprove the following privacy claims made about TRACE. First, in Section 3.1, we show that RCs and RVs can obtain the secret spatial division (quadtree) information maintained by RS (violation of Claim 1). We also discuss a modification to the TRACE protocol as a countermeasure for this attack. Secondly, in Section 3.2, we show how the RS can identify exact locations of all RCs and RVs (violation of Claims 2, 3). We also briefly argue why this attack is not straightforward to thwart. In both attacks, the entities recover location coordinates modulo prime . This is the same as recovering the actual integer values since is a very large prime and the coordinate values are negligibly small compared to . Steps from the TRACE protocol described in Section 2.3 will be referred to as and when needed. In Section 4, we shall experimentally evaluate the success probability of our attacks.

### 3.1 RCs, RVs obtain Quadtree

After an RV receives the masked quadtree computed by RS (Step 2), we show how it can recover all underlying vertices of the quadtree’s nodes. This same principle allows an RC to obtain information about the quadtree as well (recall that each RC receives from RS in Step 4).

Intuition. Intuitively, our attack works as follows. Each quadtree node is masked by the RS using random values , resulting in . When an RV receives , it knows but does not know . For a single , the number of equations involved is (since there is one equation for each , ). The number of unknowns involved in is (, ’s and quadrant vertices , ). A key observation is that if one considers along with a different , the number of equations is . However the number of unknowns involved is (, ’s and quadrant vertices , where ). That is, considering an additional gives 24 new equations but introduces only 8 new variables. This would allow RV to solve this system of modular equations and obtain the secrets along with quadrant vertices of and .
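The counting argument can be written out for a general number of masked nodes. The derivation below is a sketch under an assumption inferred from Step 1: the unknowns are one secret, 24 random values shared by all nodes, and 8 vertex coordinates per node. The symbol $k$ (number of masked nodes considered) is introduced here for illustration only.

```latex
\[
\underbrace{24k}_{\text{equations}} \;\ge\; \underbrace{1 + 24 + 8k}_{\text{unknowns}}
\;\iff\; 16k \ge 25 \;\iff\; k \ge 2 .
\]
% k = 1: 24 equations against 33 unknowns.
% k = 2: 48 equations against 41 unknowns.
```

Having at least as many equations as unknowns is only a necessary condition; the equations are not linear in the secret randomness, which is why the formal attack first eliminates it.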

Formal attack. Without loss of generality, we show how an RV can recover vertices of quadrants when given (i.e. ). The first task is to eliminate the unknown randomness . This can be done by subtracting from . For , we get the following equations.

(3) | ||||

(4) | ||||

(5) | ||||

(6) | ||||

(7) | ||||

(8) |

Here . The parameters are unknown to RV along with the 16 variables . RV can obtain linear (modular) equations in these variables by eliminating as follows.

Compare and :

(9) |

Compare and :

(10) |

Compare and :

(11) |

Compare and :

(12) |

Consider Equations (9)–(15) for all . There are linear (modular) equations in the unknowns . This can be treated as a linear system of equations with elements from the field , and standard techniques from linear algebra such as Gaussian elimination can be applied to find solutions for in .
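Solving such a system over a prime field can be sketched with textbook Gauss-Jordan elimination, where division is replaced by multiplication with a modular inverse. The function below is a generic illustration and not the authors' implementation; the name `solve_mod_p` and its return convention are assumptions made for this example. It returns the rank of the coefficient matrix together with the unique solution, or `None` when the rank is below the number of unknowns or the system is inconsistent.

```python
from typing import List, Optional, Tuple


def solve_mod_p(A: List[List[int]], b: List[int], p: int) -> Tuple[int, Optional[List[int]]]:
    """Gauss-Jordan elimination on [A | b] over the field of integers modulo prime p."""
    rows, cols = len(A), len(A[0])
    # Augmented matrix with every entry reduced modulo p.
    M = [[a % p for a in row] + [bi % p] for row, bi in zip(A, b)]
    r = 0  # next pivot row, equals the rank found so far
    for c in range(cols):
        pivot = next((i for i in range(r, rows) if M[i][c]), None)
        if pivot is None:
            continue  # no pivot in this column
        M[r], M[pivot] = M[pivot], M[r]
        inv = pow(M[r][c], -1, p)              # modular inverse (Python 3.8+)
        M[r] = [(v * inv) % p for v in M[r]]
        for i in range(rows):
            if i != r and M[i][c]:
                f = M[i][c]
                M[i] = [(vi - f * vr) % p for vi, vr in zip(M[i], M[r])]
        r += 1
        if r == rows:
            break
    rank = r
    # A zero coefficient row with a non-zero right-hand side means no solution.
    inconsistent = any(not any(row[:cols]) and row[cols] for row in M)
    if rank < cols or inconsistent:
        return rank, None
    return rank, [M[i][cols] for i in range(cols)]


# Overdetermined but consistent toy system modulo 101 with solution (5, 7).
print(solve_mod_p([[2, 1], [1, 3], [3, 4]], [17, 26, 43], 101))
```

The running time is cubic in the matrix dimensions, which is negligible for systems with a few dozen unknowns.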

#### 3.1.1 Existence of a unique solution

Suppose we represent Equations (9)–(15) using matrix notation as , where , ,

, and vector

represents the 16 unknown quadrant vertices of . We observed that , and the RV cannot obtain unique solutions for from this system. Hence we propose a modification to our attack such that equals the number of unknowns. Previously, considering only gave us 28 equations and unknowns. If we instead consider and take pairwise combinations, we end up with equations and unknowns (which is slightly better). But we observed that in some cases, the resulting matrix had rank . Next, considering and taking pairwise combinations gives us equations and unknowns. We observed (from experiments described in Section 4) that the corresponding matrix always had rank , and an RV can therefore solve this system to get the unique values (in ) of quadrant vertices for . One can proceed further and consider more , but that would be redundant since rank already equals the number of unknowns.
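The growth of the system under pairwise combination can be tabulated with a few lines of code. The sketch assumes that every pair of masked nodes contributes 28 equations (the figure stated for a single pair) and that every node contributes 8 unknown coordinates (16 unknowns for two nodes); the counts for three and four nodes are derived from these two assumptions rather than quoted from the source. The function name `system_size` is hypothetical.

```python
from math import comb

EQUATIONS_PER_PAIR = 28  # stated for one pair of masked nodes
UNKNOWNS_PER_NODE = 8    # four vertices with two coordinates each


def system_size(k: int) -> tuple:
    """(number of equations, number of unknowns) when combining k nodes pairwise."""
    return comb(k, 2) * EQUATIONS_PER_PAIR, k * UNKNOWNS_PER_NODE


for k in (2, 3, 4):
    print(k, system_size(k))
```

The number of equations grows quadratically in the number of nodes while the number of unknowns grows linearly, which is consistent with the observation that a small number of nodes already suffices for full rank. The equation count is only an upper bound on the rank, so the rank itself still has to be checked on the actual matrix.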

We now formalize the above idea. Let the linear system defined by Equations (9)–(15) (for vertices of ) be denoted by

(16) |

Here and are submatrices corresponding to unknown vertices of and , respectively. Note that , , . In the same manner, take all pairwise combinations from and compute . Define a linear system that considers all the above systems simultaneously.
