I Introduction
Location-based services (LBSs) have been increasingly employed in a variety of applications, including navigation, resource tracking, recommendation, advertising, games, and authentication. One popular application is to discover interesting locations from collected location data and provide them to third parties. When the providers of LBSs publish the geographic locations of users, the accurate locations may reveal private information, such as home addresses, health conditions, and political orientation.
To prevent or mitigate this privacy breach, many location obfuscation techniques have been proposed to hide the accurate locations of users while providing approximate information usable by LBSs. For example, dummy location insertion [1] generates dummy points and makes a user's location indistinguishable among a set of locations, which provides anonymity. The spatial cloaking technique [2] chooses a sufficiently large region that includes indistinguishable locations to achieve anonymity. The location perturbation technique [3] adds controlled random noise to each location and guarantees differential privacy, independently of any side information that an adversary may possess.
Such perturbation techniques have been developed to construct more practical mechanisms for location obfuscation. The planar Laplacian mechanism [4] satisfies geo-indistinguishability, a notion that extends differential privacy with the Euclidean distance. The optimal geo-indistinguishable mechanism [5] minimizes the quality loss caused by the perturbation while preserving geo-indistinguishability.
Although these geo-indistinguishable mechanisms hide the concrete locations, to our knowledge no prior work has investigated the relationship between geo-indistinguishability and anonymity. In this paper, we show that geo-indistinguishability does not guarantee anonymity. This means that location data obfuscated by geo-indistinguishable mechanisms might be vulnerable to re-identification attacks (e.g., [6, 7]), for instance when the LBS provider shares the obfuscated data with a malicious data analyst. Moreover, such leakage of user identity information can be efficiently detected and quantified using an automated tool such as [8, 9].
In this work we empirically explore the relationships among obfuscation, anonymity, and utility for users and for data analysts in geo-indistinguishable location obfuscation. In particular, we propose a method for effectively anonymizing the obfuscated data by deleting some data before publishing them to third parties. An overview of the method is shown in Fig. 1.
The contributions of this paper are summarized as follows:

We evaluate the anonymity of the location data obfuscated by two location obfuscation mechanisms: the planar Laplacian mechanism (PL) and the optimal geo-indistinguishable mechanism (OPT). We empirically show that OPT satisfies stronger k-anonymity than PL.

We propose the notion of asymptotic anonymity, which generalizes k-anonymity to an approximate anonymity of sampled users.

We show that the location deletion method, which simply removes the locations of the users who do not satisfy k-anonymity, makes the location dataset anonymous while preserving geo-indistinguishability. In particular, we demonstrate that the optimal mechanism requires deleting a smaller number of users than the planar Laplacian mechanism to achieve k-anonymity.

We demonstrate by experiments that the utility for users and for data analysts is better under the optimal mechanism than under the planar Laplacian mechanism.
II Preliminaries
For a finite set S, we denote by |S| the number of elements in S, and by 𝔻(S) the set of all probability distributions over S.
II-A Obfuscation Mechanism
In this work we consider a number of users, each reporting some rough information on a single geographic location x to an LBS (location-based service) provider while keeping the exact location hidden from the provider. To compute an obfuscated location y, each user uses a location obfuscation mechanism that adds a certain noise to x and outputs the result as y.
Formally, let X be a finite set of all possible locations of the users, and Y be a finite set of all (possibly fake) locations reported by the users. Then a location obfuscation mechanism (or simply an obfuscator) Q is a probabilistic algorithm that, given an original location x ∈ X, outputs a reported location y ∈ Y. We denote by Q(y | x) the conditional probability that the mechanism outputs y given input x.
The probability distribution of the original locations is represented by the prior π ∈ 𝔻(X), and the prior probability of a location x is denoted by π(x).
II-B Geo-indistinguishability
Geo-indistinguishability [4] is a notion of location privacy that can be regarded as a variant of local differential privacy [10] in which the privacy budget ε is multiplied by the Euclidean distance d(x, x′) between locations x and x′.
Definition 1 (geo-indistinguishability)
Given ε > 0, an obfuscation mechanism Q provides ε-geo-indistinguishability if for any inputs x, x′ ∈ X and any output y ∈ Y, we have:
Q(y | x) ≤ e^{ε·d(x, x′)} · Q(y | x′),
where d(x, x′) denotes the Euclidean distance between x and x′.
Then the degree to which two locations x and x′ can be distinguished from the output is bounded in proportion to the distance between them. This implies that geo-indistinguishability allows an adversary to infer approximate information about the original location (e.g., that a user is in Paris), but hides the exact location (e.g., a home address) from her. By relaxing the privacy requirement in this way, the amount of noise added to the location can be significantly reduced compared to local differential privacy [10]. Consequently, geo-indistinguishability is useful for implementing practical LBSs such as POI (point of interest) retrieval [4].
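For mechanisms over finite sets of locations, the defining inequality can be verified directly. Below is a minimal sketch, assuming the mechanism is encoded as a row-stochastic matrix `Q` (with `Q[x, y]` the probability of reporting `y` on input `x`) and `dist` a matrix of pairwise Euclidean distances; these names are ours, not from the paper:

```python
import numpy as np

def is_geo_indistinguishable(Q, dist, eps, tol=1e-9):
    """Check eps-geo-indistinguishability of a finite mechanism:
    for all inputs x1, x2 and every output y,
    Q[x1, y] <= exp(eps * dist[x1, x2]) * Q[x2, y]."""
    n = Q.shape[0]
    for x1 in range(n):
        for x2 in range(n):
            bound = np.exp(eps * dist[x1, x2])
            # A single violated output y falsifies the property.
            if np.any(Q[x1] > bound * Q[x2] + tol):
                return False
    return True
```

For instance, a two-location exponential mechanism with Q(y | x) ∝ e^{−ε·d(x, y)} passes this check, while the identity mechanism (no noise) fails it.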
II-C Planar Laplacian (PL) Mechanism
The planar Laplacian mechanism (PL) [4] is an example of a mechanism providing ε-geo-indistinguishability. It generates random noise according to a two-dimensional Laplace distribution and obfuscates an original location x by adding the noise to x. In this paper we use a variant of the planar Laplacian mechanism, which outputs a symbol ⊥ when the obfuscated location falls outside the area of interest A.
Formally, denoting by Lap(y | x) ∝ e^{−ε·d(x, y)} the standard planar Laplacian distribution, the variant planar Laplacian mechanism is defined by:
PL(y | x) = Lap(y | x) for y ∈ A, and PL(⊥ | x) = Σ_{y′ ∉ A} Lap(y′ | x).
Intuitively, A is selected to give the best utility by preventing unnecessarily frequent outputs of ⊥.
Proposition 1
The variant planar Laplacian mechanism satisfies ε-geo-indistinguishability.
Proof:
The variant mechanism can be seen as a cascade of the standard planar Laplacian (which does not output ⊥) and the postprocessing algorithm that maps each y ∉ A to ⊥. It is easy to see by the triangle inequality that the standard planar Laplacian satisfies ε-geo-indistinguishability. Since differential privacy is immune to postprocessing, the variant mechanism also provides ε-geo-indistinguishability.
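The sampling procedure for this variant can be sketched as follows, using the polar decomposition of the continuous planar Laplace distribution (the radius follows a Gamma(2, 1/ε) law) and a caller-supplied membership test `in_area` for the area of interest; both names are illustrative assumptions, not from the original implementation:

```python
import numpy as np

BOTTOM = None  # stands for the symbol "⊥" reported for out-of-area points

def planar_laplace_sample(x, eps, rng):
    """Draw a noisy point from the planar Laplace distribution centred at x,
    whose density is proportional to exp(-eps * ||z - x||)."""
    theta = rng.uniform(0.0, 2.0 * np.pi)  # direction, uniform on [0, 2*pi)
    # The radius has density eps^2 * r * exp(-eps * r), i.e. a Gamma(2, 1/eps) law.
    r = rng.gamma(shape=2.0, scale=1.0 / eps)
    return x + r * np.array([np.cos(theta), np.sin(theta)])

def pl_variant(x, eps, in_area, rng):
    """Variant mechanism: report the noisy point if it falls inside the area
    of interest (membership test `in_area`), and the symbol ⊥ otherwise."""
    z = planar_laplace_sample(np.asarray(x, dtype=float), eps, rng)
    return tuple(z) if in_area(z) else BOTTOM
```

The expected distance of the noisy point from the original is 2/ε, which quantifies the utility cost of a smaller privacy budget.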
II-D Optimal Geo-indistinguishable (OPT) Mechanism
The planar Laplacian mechanism is efficiently computable, but the utility of the reported location may not be optimal. For this reason, Bordenabe et al. [5] propose an optimal ε-geo-indistinguishable location obfuscation mechanism, denoted OPT, that given a privacy budget ε minimizes the quality loss (QL), defined as the expected value of the Euclidean distance between the original and reported locations:
QL = Σ_{x ∈ X} Σ_{y ∈ Y} π(x) · Q(y | x) · d(x, y).
The optimal mechanism can be obtained by solving a linear optimization problem that minimizes QL while satisfying ε-geo-indistinguishability. However, this linear program has O(|X|³) constraints. To reduce the number of constraints to O(|X|²), they show an approximation technique based on a spanning graph of the set of locations. See [5] for details.
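For small instances the linear program can be solved directly, without the spanning-graph approximation. The following sketch uses `scipy.optimize.linprog` (the paper's evaluation uses MATLAB's linprog; the encoding and variable names here are our own assumptions):

```python
import numpy as np
from scipy.optimize import linprog

def optimal_mechanism(pi, dist, eps):
    """Solve the LP: minimize the expected quality loss
    sum_{x,y} pi[x] * Q[x,y] * dist[x,y]
    subject to eps-geo-indistinguishability and row-stochasticity.
    Variables are the entries of Q, flattened row by row (here Y = X)."""
    n = len(pi)
    c = (pi[:, None] * dist).ravel()  # objective coefficients
    A_ub, b_ub = [], []
    for x1 in range(n):
        for x2 in range(n):
            if x1 == x2:
                continue
            bound = np.exp(eps * dist[x1, x2])
            for y in range(n):
                # Geo-indistinguishability: Q[x1,y] - e^{eps*d(x1,x2)} * Q[x2,y] <= 0
                row = np.zeros(n * n)
                row[x1 * n + y] = 1.0
                row[x2 * n + y] = -bound
                A_ub.append(row)
                b_ub.append(0.0)
    A_eq = np.zeros((n, n * n))  # each row of Q sums to 1
    for x in range(n):
        A_eq[x, x * n:(x + 1) * n] = 1.0
    res = linprog(c, A_ub=np.array(A_ub), b_ub=np.array(b_ub),
                  A_eq=A_eq, b_eq=np.ones(n), bounds=(0, 1))
    return res.x.reshape(n, n)
```

For two locations at distance 1 under a uniform prior and ε = 1, the optimal quality loss is 1/(1 + e) ≈ 0.269, attained by keeping each location in place with probability e/(1 + e).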
II-E k-Anonymity
The notion of k-anonymity [11] of a user ensures that the user cannot be distinguished from at least k − 1 other users being at the same location. More formally, for a positive integer k, we say that the users at a location y are k-anonymous if n_y ≥ k, where n_y is the number of users who report y as their location. We also say that a dataset of locations satisfies k-anonymity if for every location y in the dataset, the users at y are k-anonymous. In this definition k-anonymity depends only on the users that have the lowest level of anonymity, and does not take the other users into account.
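This definition translates directly into a count-based check over a sample of reported locations; a minimal sketch:

```python
from collections import Counter

def satisfies_k_anonymity(reported_locations, k):
    """A dataset of reported locations is k-anonymous iff every location
    appearing in it is reported by at least k users."""
    counts = Counter(reported_locations)
    return all(n_y >= k for n_y in counts.values())
```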
III Anonymization of Obfuscated Location Data
In this section we address some limitations in the definition of k-anonymity and introduce two anonymity notions that generalize it. The first notion measures an obfuscator's capability of anonymization independently of the number of sampled users. The second notion extends the first one to take into account the fact that different users in the dataset may have different levels of anonymity. Finally, we present a simple solution for enhancing the anonymity of the obfuscated data while preserving geo-indistinguishability.
III-A Limitations in the Definition of k-Anonymity
k-anonymity is not always useful to evaluate the level of anonymity of sampled users.
First, k-anonymity in the context of location privacy depends on the number n of the LBS's users in the sample data, and does not solely express an obfuscator Q's capability of anonymization. For instance, if the number of sampled users increases, then k-anonymity tends to hold for a larger value of k (roughly proportional to n) for the same π and Q. In other words, k-anonymity is not defined as a property of Q independent of the number of sampled users.
Second, different users in the dataset may have different anonymity levels, whereas the k-anonymity of the dataset depends only on the users that have the lowest level of anonymity. Hence k-anonymity is not expressive enough to take into account the different anonymity levels of the other users.
IiiB Asymptotic Anonymity
To overcome the first limitation described in Section III-A, we introduce a notion that expresses an obfuscator Q's capability of anonymization independently of the number of sampled users. Intuitively, for an α ∈ (0, 1], we define the notion of α-asymptotic anonymity as an extension of k-anonymity where, for any sufficiently large number n of users, each user is indistinguishable from roughly α·n other users.
Formally, this notion is defined using the probability p_y = Σ_{x ∈ X} π(x) · Q(y | x) that the obfuscation mechanism outputs y, as follows.
Definition 2 (α-asymptotic anonymity)
Given a threshold α ∈ (0, 1], the users at a location y are α-asymptotically anonymous if p_y ≥ α. Given a prior π and an obfuscator Q, we say that Q provides α-asymptotic anonymity if for all y ∈ Y, p_y > 0 implies p_y ≥ α, where p_y = Σ_{x ∈ X} π(x) · Q(y | x).
Note that α itself can be computed from π and Q, independently of the number n of sampled users. When Q provides α-asymptotic anonymity, the number of users required to achieve k-anonymity is roughly given by k / α.
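The quantities above can be computed directly from a prior vector and a mechanism matrix; a minimal sketch, where `pi` and `Q` are our own encodings of π and Q:

```python
import math
import numpy as np

def asymptotic_anonymity_level(pi, Q):
    """Largest alpha such that the mechanism provides alpha-asymptotic
    anonymity: the minimum nonzero output probability
    p_y = sum_x pi[x] * Q[x, y]."""
    p = pi @ Q  # output distribution over reported locations
    return float(p[p > 0].min())

def users_needed_for_k_anonymity(k, alpha):
    """Roughly k / alpha users suffice for k-anonymity to hold."""
    return math.ceil(k / alpha)
```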
Example 1 (Anonymity of the prior and posterior)
Let us formalize the asymptotic anonymity before/after applying a mechanism Q. The prior π provides α₀-asymptotic anonymity for α₀ = min{ π(x) : π(x) > 0 }, while Q provides α₁-asymptotic anonymity for α₁ = min{ p_y : p_y > 0 }.¹ To achieve k-anonymity before (resp. after) applying Q, the number of users should be roughly k / α₀ (resp. k / α₁).
¹Remarkably, asymptotic anonymity contrasts with the Bayes vulnerability (a.k.a. the converse of the Bayes risk [12]) in quantitative information flow. Instead of minimization, the prior/posterior Bayes vulnerabilities are respectively max_x π(x) and Σ_y max_x π(x) · Q(y | x), and represent the probabilities of an adversary correctly guessing x in one attempt before/after observing y.
For a large number n of users, we can approximate the maximum such α from the sample by min{ n_y / n : n_y > 0 }, where n_y is the number of users reporting y; this converges to the true value quickly as shown in Fig. 5.
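This sample-based approximation is simply the minimum relative frequency among the locations occurring in the sample; a short sketch:

```python
from collections import Counter

def empirical_alpha(reported_locations):
    """Estimate alpha from a sample as the minimum relative frequency
    min_y n_y / n over the locations occurring in the sample."""
    n = len(reported_locations)
    counts = Counter(reported_locations)
    return min(counts.values()) / n
```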
As we will see in Section IV-B1, α-asymptotic anonymity (resp. k-anonymity) holds only for small values of α (resp. k). This implies that the obfuscation mechanism does not necessarily provide k-anonymity to all users, although it hides the exact original locations in terms of geo-indistinguishability.
III-C (α, β)-Asymptotic Anonymity
Similarly to k-anonymity, the definition of α-asymptotic anonymity also suffers from the second limitation described in Section III-A. To evaluate the different levels of anonymity of different users, we introduce another notion that relaxes α-asymptotic anonymity by allowing some rate of errors. Roughly speaking, the new notion expresses that, given sample data with n users, at least (1 − β)·n users are α·n-anonymous.
Definition 3 ((α, β)-asymptotic anonymity)
Let p_y = Σ_{x ∈ X} π(x) · Q(y | x). Given an α ∈ (0, 1] and an acceptable error rate β ∈ [0, 1], Q provides (α, β)-asymptotic anonymity if
Σ_{y : 0 < p_y < α} p_y ≤ β.
This notion can be used to roughly estimate the utility loss in anonymizing the location data. When there are n users in the dataset, at most β·n users are not α·n-anonymous. If we remove the location data of these users, then the dataset will satisfy k-anonymity (for k ≈ α·n) while the utility of the dataset deteriorates proportionally to the number of deleted users.
III-D Location Deletion Method (LDM) for k-Anonymity
As explained so far, geo-indistinguishable mechanisms are useful to hide the exact locations from the LBS provider, whereas they may not provide anonymity of the obfuscated location data. When the LBS provider wishes to publish such obfuscated data to third parties, a simple solution to achieve k-anonymity is what we call the location deletion method (LDM), i.e., to delete the obfuscated locations that do not satisfy k-anonymity. Then the modified dataset satisfies k-anonymity while preserving geo-indistinguishability thanks to the immunity of differential privacy to postprocessing.
More specifically, given a threshold α, the minimum number of users that should be removed is approximately given by:
n · Σ_{y ∈ D} p_y,
where D = { y ∈ Y : 0 < p_y < α } and p_y = Σ_{x ∈ X} π(x) · Q(y | x). When Q is a Laplacian mechanism, all locations occur with nonzero probabilities, and thus the approximate number of deleted users is n · Σ_{y : p_y < α} p_y. We will demonstrate the effect of this combination of obfuscation and anonymization by experiments in Section IV.
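The deletion estimate above can be sketched as follows, again encoding the prior and mechanism as `pi` and `Q` (our own names):

```python
import numpy as np

def expected_deletions(pi, Q, n_users, alpha):
    """Approximate number of users removed by the location deletion method:
    n times the total mass of reported locations whose output probability
    is positive but below the threshold alpha."""
    p = pi @ Q
    low_mass = float(p[(p > 0) & (p < alpha)].sum())
    return n_users * low_mass
```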
IV Experimental Evaluation
In this section we empirically compare the two obfuscation mechanisms, the planar Laplacian (PL) and the optimal geo-indistinguishable mechanism (OPT), and illustrate how the location deletion method enhances the anonymity of obfuscated data and affects the utility for users and for data analysts.
IV-A Experimental Setup
We performed experiments using the Foursquare dataset (Global-scale Check-in Dataset) [13]. This dataset includes location check-ins by users all over the world. In our experiments, we used the check-ins in Manhattan. We assumed that each user obfuscated a single location using an ε-geo-indistinguishable obfuscation mechanism Q and sent the obfuscated location to the LBS provider.
We divided Manhattan into regions at regular intervals. Let X be the set of these regions, and π the empirical distribution of the users' locations over X. We defined the distance between two regions as the Euclidean distance between their central points, normalized so that the distance between two adjacent regions is one.
As obfuscation mechanisms Q, we employed the planar Laplacian mechanism PL (Section II-C) and the optimal geo-indistinguishable mechanism OPT (Section II-D). For OPT, we solved the optimization problem² that minimizes QL while satisfying ε-geo-indistinguishability using the linear programming solver linprog in MATLAB. For both PL and OPT, we varied the privacy budget ε over a range of values that have been widely used in the literature [14].
²In OPT we set the dilation factor of the spanning graph to a fixed value.
After obtaining all obfuscated regions, we applied the location deletion method to remove the regions that do not satisfy k-anonymity. We denote by PL+LDM (resp. OPT+LDM) the application of PL (resp. OPT) postprocessed by the location deletion method.
IV-B Experimental Results
We show the experimental results on anonymity and utility.
IV-B1 k-Anonymity before anonymization
By experiments we found that, unless we add much noise, the obfuscation does not provide k-anonymity for k ≥ 2; that is, some obfuscated location is reported by only a single user. This held for both the planar Laplacian mechanism and the optimal mechanism across the tested values of ε.
IV-B2 Utility for users
In Fig. 2 we compare the planar Laplacian mechanism with the optimal mechanism in terms of the utility for users. Specifically, we evaluated the quality loss, i.e., the average Euclidean distance between the original region and the obfuscated region. We also evaluated the number of users who remain in the same region after obfuscation. As shown in Fig. 2, for a larger ε, smaller noise is added, hence both mechanisms have better utility for users: they decrease the quality loss and increase the number of users remaining in the same regions. The results also demonstrate that the optimal mechanism outperforms the planar Laplacian in terms of the utility for users. This is because the optimal mechanism chooses locations that minimize the expected distance, which also makes more users remain in the same regions.
IV-B3 Utility for data analysts
In Fig. 3 we compare the planar Laplacian mechanism with the optimal mechanism in terms of the utility for data analysts. The graphs show the ratio of deleted users for a smaller value of k on the left and for a larger value of k on the right. For the planar Laplacian mechanism we also show the ratio of users reporting ⊥ as their obfuscated region.
According to these graphs, the ratio of deleted users is significantly smaller under the optimal mechanism than under the planar Laplacian mechanism. To see this in detail, we present maps of Manhattan that plot the density of the user locations without noise (Fig. 4(a)), of those obfuscated by the planar Laplacian mechanism (Fig. 4(b)), and of those obfuscated by the optimal mechanism (Fig. 4(c)).
In Fig. 4(b) we see that the planar Laplacian spreads the population over the whole map. This is because it draws the direction of the noise uniformly from [0, 2π) for each location. For a small ε, the reported regions are therefore distributed almost uniformly. Hence for a small value of k, only a few obfuscated regions need to be deleted to achieve k-anonymity (Fig. 3 on the left), whereas for a large value of k, most of the obfuscated locations need to be deleted (Fig. 3 on the right).
In contrast, the optimal geo-indistinguishable mechanism concentrates more users in the crowded regions, as shown in Fig. 4(c). To see this in detail, note that for a more crowded region x, the prior probability π(x) is larger. Since the optimal mechanism tries to minimize the quality loss Σ_{x,y} π(x) · Q(y | x) · d(x, y), if π(x) is larger then it assigns more probability to regions y closer to x. Hence the users located in the crowded regions tend not to move under the obfuscation. Conversely, the users outside the crowded regions tend to move to one of the closest crowded regions, to the extent that geo-indistinguishability allows.
Owing to this concentration, the optimal mechanism provides (α, β)-asymptotic anonymity with a smaller error rate β. In our experiments, far fewer users failed to satisfy k-anonymity under the optimal mechanism than under the planar Laplacian mechanism. This means that the location deletion method removes a smaller number of users under the optimal mechanism, which thus has a better utility for data analysts.
To sum up, the optimal mechanism is more effective than the planar Laplacian mechanism in terms of the utility both for users and for data analysts, while providing geo-indistinguishability and k-anonymity.
IV-B4 Convergence of the empirical value of α
Table I: The value of α for the prior (without noise), after applying the planar Laplacian mechanism, and after applying the optimal mechanism.
In Fig. 5 we show how the empirically computed value of α converges to the value displayed in Table I as the number of users increases. In the experiments we uniformly sampled a subset of size n from the original dataset, applied each mechanism with a fixed ε, and computed the maximum α such that all n users are α·n-anonymous. These graphs imply that α is (roughly) independent of n, and thus α-asymptotic anonymity can be seen as a property of the prior and the obfuscator alone. Therefore α is useful for predicting that, for a different number n of sampled users, the dataset roughly satisfies k-anonymity for k ≈ α·n.
V Conclusion
We have empirically evaluated the k-anonymity of the location data obfuscated by the planar Laplacian mechanism and by the optimal geo-indistinguishable mechanism, and shown that the latter provides stronger anonymity in the sense that it requires removing fewer users to achieve k-anonymity. To analyze this formally, we have introduced the notion of asymptotic anonymity. We have also demonstrated that the optimal mechanism has better utility for users and for data analysts.
In future work we plan to develop a utility-optimal obfuscator satisfying both geo-indistinguishability and k-anonymity. We will also explore rigorous statistical foundations of obfuscation, and relationships with quantitative information flow.
References
 [1] H. Kido, Y. Yanagisawa, and T. Satoh, “Protection of location privacy using dummies for location-based services,” in Proc. of ICDE Workshops, 2005, p. 1248.
 [2] M. Gruteser and D. Grunwald, “Anonymous usage of location-based services through spatial and temporal cloaking,” in Proc. of MobiSys. USENIX, 2003.
 [3] A. Machanavajjhala, D. Kifer, J. M. Abowd, J. Gehrke, and L. Vilhuber, “Privacy: Theory meets practice on the map,” in Proc. of ICDE. IEEE, 2008, pp. 277–286.
 [4] M. E. Andrés, N. E. Bordenabe, K. Chatzikokolakis, and C. Palamidessi, “Geo-indistinguishability: differential privacy for location-based systems,” in Proc. of CCS’13. ACM, 2013, pp. 901–914.
 [5] N. E. Bordenabe, K. Chatzikokolakis, and C. Palamidessi, “Optimal geo-indistinguishable mechanisms for location privacy,” in Proc. of CCS’14, 2014, pp. 251–262.

 [6] Z. Montazeri, A. Houmansadr, and H. Pishro-Nik, “Achieving perfect location privacy in Markov models using anonymization,” in Proc. of ISITA’16, 2016, pp. 355–359.
 [7] T. Murakami, “A succinct model for re-identification of mobility traces based on small training data,” in Proc. of ISITA’18, 2018, to appear.
 [8] T. Chothia, Y. Kawamoto, and C. Novakovic, “A tool for estimating information leakage,” in Proc. of CAV’13, 2013, pp. 690–695.
 [9] ——, “LeakWatch: Estimating information leakage from Java programs,” in Proc. of ESORICS’14 Part II, 2014, pp. 219–236.
 [10] J. C. Duchi, M. I. Jordan, and M. J. Wainwright, “Local privacy and statistical minimax rates,” in Proc. of FOCS’13, 2013, pp. 429–438.
 [11] L. Sweeney, “k-anonymity: A model for protecting privacy,” Int. Journal of Uncertainty, Fuzziness and Knowledge-Based Systems, vol. 10, no. 5, pp. 557–570, 2002.
 [12] K. Chatzikokolakis, C. Palamidessi, and P. Panangaden, “On the Bayes risk in information-hiding protocols,” J. of Comp. Security, vol. 16, no. 5, pp. 531–571, 2008.
 [13] D. Yang, D. Zhang, and B. Qu, “Participatory cultural mapping based on collective behavior data in location-based social networks,” ACM Transactions on Intelligent Systems and Technology, vol. 7, no. 3, pp. 30:1–30:23, 2015.
 [14] J. Hsu, M. Gaboardi, A. Haeberlen, S. Khanna, A. Narayan, B. C. Pierce, and A. Roth, “Differential privacy: An economic method for choosing epsilon,” in Proc. of CSF’14, 2014, pp. 398–410.