Optimal noise functions for location privacy on continuous regions

05/24/2018
by   Ehab ElSalamouny, et al.

Users of location-based services (LBSs) are highly vulnerable to privacy risks since they need to disclose, at least partially, their locations to benefit from these services. One possibility to limit these risks is to obfuscate the location of a user by adding random noise drawn from a noise function. In this paper, we require the noise functions to satisfy a generic location privacy notion called ℓ-privacy, which makes the position of the user in a given region X relatively indistinguishable from other points in X. We also aim at minimizing the loss in the service utility due to such obfuscation. While existing optimization frameworks regard the region X restrictively as a finite set of points, we consider the more realistic case in which the region is continuous with a non-zero area. In this situation, we demonstrate that circular noise functions are enough to satisfy ℓ-privacy on X and equivalently on the entire space without any penalty in the utility. Afterwards, we describe a large parametric space of noise functions that satisfy ℓ-privacy on X, and show that this space always has an optimal member, regardless of ℓ and X. We also investigate the recent notion of ϵ-geo-indistinguishability as an instance of ℓ-privacy, and prove in this case that with respect to any increasing loss function, the planar Laplace noise function is optimal for any region having a non-zero area.


1 Introduction

The popularity of hand-held devices with positioning capabilities, such as smartphones, has led to the development of location-based services (LBSs). In an LBS, the device of a user sends a request together with his geographical position to the service provider, who personalizes the service according to the reported location. The usefulness of these LBSs comes at the cost of various privacy risks as discussed by [22, 18, 12]. For example, based on the disclosed locations of the user, an adversary can identify the points of interest of the user, such as the home and workplace, predict his mobility and even reconstruct part of his social network.

To limit these risks, one possibility for achieving location privacy is to make the position of a user indistinguishable to some degree from other locations. A recent trend of research [27, 26, 1, 11] has been directed to obfuscating the user’s location in the submitted queries and has led to several quantifications of location privacy. For instance, the authors of [27, 26] have developed a framework in which the location privacy of the user is measured by the expected adversary’s error in estimating the user’s real location. However, this quantification depends on the user’s prior distribution (i.e., his probabilities to be in the individual points of the considered space) and also on the strong assumption that the adversary knows this prior.

Since it is hard to control or even to assess the knowledge of the adversary, another work [1] has introduced the notion of ϵ-geo-indistinguishability, which abstracts away from both the knowledge of the adversary and the prior of the user. This notion describes the required protection as a guarantee on the obfuscation mechanism itself. Informally, a mechanism should not report an output that influences too much the knowledge of the adversary about the user’s real location. More precisely, a mechanism satisfies ϵ-geo-indistinguishability if the log of the ratio between the probability of reporting an output when the user is at location $p$, and that probability when he is instead at location $p'$, does not exceed a distinguishability $\epsilon\, d(p, p')$, in which $\epsilon$ is a fixed privacy parameter and $d(p, p')$ is the distance between $p$ and $p'$. This means that the user’s position is hardly distinguishable from nearby points, while being increasingly (i.e., at a linear rate) distinguishable from far away points. The notion of ϵ-geo-indistinguishability is inspired by differential privacy, which was proposed in [8] to protect the privacy of the participants in statistical databases. In principle, the addition or removal of a participant in the database should have a minor impact on the output of an algorithm operating on the database. In that sense, ϵ-geo-indistinguishability, similarly to differential privacy, abstracts from the adversary’s knowledge, and restricts the information disclosed through the mechanism to the observer.
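The log-ratio bound described above can be checked numerically on a concrete mechanism. The following is a minimal sketch, assuming the planar Laplace density $\frac{\epsilon^2}{2\pi} e^{-\epsilon\, d(p, z)}$ of [1] for reporting $z$ from the real location $p$; the function names, the test points and the value of $\epsilon$ are illustrative choices, not taken from the paper.

```python
import math

def planar_laplace_pdf(p, z, eps):
    """Density of reporting z when the real location is p (planar Laplace)."""
    return (eps ** 2 / (2 * math.pi)) * math.exp(-eps * math.dist(p, z))

def log_ratio(p1, p2, z, eps):
    """Log of the ratio between the densities of output z from p1 and from p2."""
    return math.log(planar_laplace_pdf(p1, z, eps) / planar_laplace_pdf(p2, z, eps))

eps = 0.5                               # hypothetical privacy parameter
p1, p2 = (0.0, 0.0), (3.0, 4.0)         # two real locations at distance 5
outputs = [(1.0, 1.0), (-2.0, 5.0), (10.0, -3.0), (3.0, 4.0)]

bound = eps * math.dist(p1, p2)         # allowed distinguishability eps * d(p1, p2)
worst = max(abs(log_ratio(p1, p2, z, eps)) for z in outputs)

# By the triangle inequality the log-ratio never exceeds the bound.
print(worst <= bound + 1e-9)
```

The bound is met with equality when the output coincides with one of the two locations, which is why a small floating-point tolerance is used in the comparison.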

The idea of restricting the distinguishability between each pair of locations in a geographical region $X$ is generalized in [11] to give rise to the notion of ℓ-privacy. Here $\ell$ is a function that specifies for every distance a maximum level of distinguishability. The function $\ell$ can take various forms depending on the user’s privacy requirements. For example, if the distinguishability between two points is required to increase linearly with their distance $d$, setting $\ell(d) = \epsilon\, d$ yields ϵ-geo-indistinguishability. Alternatively, if only the distinguishability between nearby points (within distance $D$) is required to be restricted, setting $\ell(d) = \epsilon$ for $d \le D$ and $\ell(d) = \infty$ otherwise leads to another instance called $(D, \epsilon)$-location privacy [11].
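A distinguishability function is simply a map from distances to maximum distinguishability levels, so the two instances mentioned above are easy to write down. This is a sketch under assumptions: the linear form follows ϵ-geo-indistinguishability, while the threshold form (a constant level up to a distance `D`, unrestricted beyond it) is our reading of the second instance; all names are illustrative.

```python
import math

def geo_indistinguishability(eps):
    """Linear distinguishability: ell(d) = eps * d."""
    return lambda d: eps * d

def threshold_privacy(D, eps):
    """Assumed threshold form: ell(d) = eps within distance D, unrestricted beyond."""
    return lambda d: eps if d <= D else math.inf

def max_probability_ratio(ell, d):
    """Largest allowed ratio e^{ell(d)} between output probabilities at distance d."""
    return math.exp(ell(d))

linear = geo_indistinguishability(0.5)
nearby = threshold_privacy(D=2.0, eps=1.0)

print(linear(4.0), nearby(1.5), nearby(3.0))
print(max_probability_ratio(linear, 4.0))
```

An infinite level means that no constraint is imposed on the corresponding pair of locations.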

Obfuscating the position reported to the LBS provider causes a degradation in the quality of the obtained service since it is tuned to the reported location instead of the real one. This degradation is typically measured by a loss function $\mathcal{L}$ specifying the loss $\mathcal{L}(d)$ (as a non-negative number) when the distance between the real position of the user and the reported one is $d$. The utility of the mechanism for a user is therefore measured by the expected value of the loss function, taking into account the prior distribution of the user and the probabilistic obfuscation performed by the mechanism.

In this work, our main objective is to provide a mathematically grounded framework for optimizing the trade-off between the utility of the LBS requested by the user and his location privacy within a geographical region $X$. A previous approach that was adopted in [4] for the case of ϵ-geo-indistinguishability is to regard the region $X$ as a finite set of points and to assume that the outputs of a mechanism are also drawn from $X$. In this situation, an optimal mechanism is obtained by solving a linear optimization problem that minimizes the expected loss (taking the user’s prior into account) subject to the privacy constraints. Here, the main difficulty is that the number of linear constraints is too large, because the distinguishability is restricted between every two points in $X$ and for every output of the mechanism. Despite the improvement proposed by the authors of [4] to reduce the number of constraints, the size of $X$ still has to be very small (e.g., 50 to 75 points) to solve the problem in a reasonable time.
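To give a feeling for this growth, the following rough count assumes one privacy constraint per ordered pair of distinct locations and per output, with outputs drawn from the same finite set. This counting convention is an assumption made for illustration; the exact formulation in [4] may differ.

```python
def num_privacy_constraints(n):
    """Ordered pairs of distinct locations (n * (n - 1)) times n possible outputs."""
    return n * (n - 1) * n

for n in (50, 75, 1000):
    print(n, num_privacy_constraints(n))
```

Under this convention the count grows cubically with the number of points, which is consistent with the restriction to very small sets of locations.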

While it is always possible to discretize any geographical region into a finite set of points, this discretization usually incurs a significant loss of quality for the users. For example, to construct a mechanism that satisfies ℓ-privacy for the users in Paris using the above linear optimization, we would need to divide its map into a grid of a feasible size (e.g., 63 cells as shown in Figure 1), making every cell 1.5 km × 1.5 km. In this discretization scheme, the position of every user is always approximated by the center of the enclosing cell before being obfuscated by the mechanism. Figure 1 displays one cell in which a user located near its north-east corner asks for the nearest restaurant to his position. In this case, he would get an answer that is tailored, in the best case, to the center of his cell, which is 0.812 km away from him. Clearly, the situation becomes more problematic as we consider larger regions.
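The approximation error introduced by such a grid is bounded by the half-diagonal of a cell. The short computation below uses the 1.5 km side length and the 0.812 km example from the text; the variable names are illustrative.

```python
import math

side_km = 1.5                                   # cell side length from the example
worst_case_km = side_km * math.sqrt(2) / 2      # distance from a corner to the center
example_user_km = 0.812                         # user of Figure 1, near the corner

print(round(worst_case_km, 3))
print(example_user_km < worst_case_km)
```

The user in the example is therefore not even the worst case: a user standing exactly on a cell corner is a little over one kilometer away from the point that the mechanism actually obfuscates.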

Figure 1: Approach in which Paris is represented by a finite set of cells: (a) Division of the city into 63 square cells. The side length of every cell is 1.5 km. (b) One cell in which the user is 812 meters away from the center.

We take a different approach centered on mechanisms that we call “symmetric”. In these mechanisms, a single distribution $\mu$, called the “noise distribution”, is used to sample the noise added to the user’s location to produce the reported output. Since the added noise is essentially a Euclidean vector, the distribution $\mu$ is also regarded as a probability measure on the subsets of the Euclidean vector space. This distribution can be described more succinctly in many situations by a probability density function (pdf), which we refer to as the “noise function” of $\mu$. This scheme is both simple and scalable with respect to the topology and the size of the considered region $X$ since it is based on one probability distribution (i.e., on the noise) that is used at every position of the user in $X$. Moreover, the expected loss is independent of the user’s prior, making the notion of an optimal noise dependent only on the region and the considered loss function. In this work, we provide a framework that investigates the above approach in the general setting of the distinguishability (privacy) function $\ell$ and the region of interest $X$, aiming to find the optimal noise function with respect to an arbitrary loss function. More precisely, our main contributions can be summarized as follows.

Main contributions.

  • We extend symmetric mechanisms [11] by using their noise distributions instead of their pdfs (i.e., noise functions), since the latter may not exist in some cases (e.g., when the distribution assigns non-zero probabilities to discrete vectors). In this extension, we describe the precise condition on a distribution to exhibit a noise function, and the condition on this function to satisfy ℓ-privacy. This privacy condition turns out to be independent of the continuity restriction that was imposed in [11] on all noise functions.

  • When the region $X$ is continuous with a non-zero area, we prove that some practical instances of ℓ-privacy are satisfied on $X$ only if they are satisfied on the entire space $\mathbb{R}^2$. Based on this result, the class of circular noise functions turns out to be general enough (i.e., without any penalty in the utility due to restriction to this class) to satisfy ℓ-privacy on any region having a non-zero area. This extends the special case in which the region is a disc in $\mathbb{R}^2$ as shown in [11].

  • For any setting of the distinguishability function $\ell$, the set of locations $X$ and the loss function $\mathcal{L}$, we describe precise conditions that allow a space of noise functions to have an optimal member for $X$ with respect to $\ell$ and $\mathcal{L}$. Based on these conditions, we describe a parametric space of noise functions that always admits such an optimal member.

  • We consider the instance $\ell(d) = \epsilon\, d$, which corresponds to the notion of ϵ-geo-indistinguishability [1], and prove that in this setting the planar Laplacian noise function (a two-dimensional version of the Laplace density function) is optimal, with respect to any increasing loss function and for any region having a non-zero area.

Outline of the paper.

First, in Section 2, we review the related work before introducing in Section 3 some preliminaries, such as the notions of mechanisms, ℓ-privacy and the utility measure. Then, in Section 4, we develop the formal tools to analyze the privacy of noise distributions and their corresponding noise functions. Afterwards, in Section 5, we focus on continuous regions having non-zero areas and discuss the conditions of satisfying ℓ-privacy on them, before discussing in Section 6 the existence of optimal noise functions for an arbitrary setting of the distinguishability function $\ell$, the region $X$ and the loss function $\mathcal{L}$. As a case study, we describe in Section 7 the optimal noise function for ϵ-geo-indistinguishability, and finally summarize our conclusions and directions for future work in Section 8.

2 Related work

One way to define location privacy is with respect to the ability of an adversary to identify the user’s location [29]. One of the first attempts to achieve location privacy in this direction was to hide the association between the user’s identity and his location by removing his identity from the request submitted to the LBS provider or replacing it with a pseudonym [24]. However, it turns out that the user’s identity can be uncovered by correlating his disclosed locations with some background knowledge [2, 21, 13]. This issue motivated recent approaches focusing on obfuscating the user’s location itself before sending it to the server. For example, the authors of [19, 14] proposed a k-anonymization of the user location, in the sense that the region reported to the LBS provider, called a “cloak”, ensures that the user is indistinguishable from $k-1$ other users. However, as shown by [29], this guarantee may sometimes be inconsistent with the location privacy of the requesting user, for instance if the $k$ users are in the same location or at least in a small area. In addition, the protection provided by this “cloaking” technique depends heavily on the background knowledge of the adversary. To address this shortcoming, the authors of [27, 26] have developed another metric for location privacy, which is the expected error of the adversary’s estimation of the user’s location. The larger this error is, the higher the level of privacy given to the user. In this quantification, it is explicitly assumed that the adversary knows the user’s prior.

Since it is hard in practice to assess the knowledge of adversaries, especially in the presence of public sources of information [11], a recent concept that is inspired by differential privacy [8] is to quantify location privacy instead by the amount of information leaked through the privacy mechanism itself. This makes the measure independent of both the user’s prior and the adversary’s knowledge. Differential privacy has been used for instance by the authors of [7] in a non-interactive setting to sanitize the transit data of the users of the Montréal transportation system. To allow such sanitization despite the inherent high dimensionality of the considered data, the authors adopted a data-dependent approach to restrict the output domain of the sanitization mechanism in the light of the underlying database. In our work, we focus on interactive mechanisms sanitizing the user’s location each time he sends a request to an LBS. An adaptation of differential privacy in this setting was proposed by the authors of [1], in which the distinguishability between the user’s location and another point (in a fixed domain $X$) increases linearly with the distance between the two points. This makes the user’s location indistinguishable from nearby points, while being increasingly distinguishable from points further away. A generalization of this model has been proposed in [11] in which the distinguishability between two points, modeled by a generic function $\ell$, still depends on the distance between them, but may take various forms depending on the privacy requirements of the user. The article [11] also introduced a restricted form of “symmetric mechanisms”, which we extend in terms of the underlying noise distributions.

With respect to optimizing the trade-off between privacy and the expected loss, in addition to [4] which we already mentioned in the introduction, the authors of [28] considered this problem from a different perspective. They relied on the view of location privacy as the expected adversary’s error in estimating the user’s real location (as in [27, 26] above) and proposed to construct the mechanism that maximizes the user’s privacy, while respecting a certain threshold on the utility. They also assume that the adversary has an optimal strategy that exploits his knowledge about the user’s prior to guess the real location. This construction is performed by solving a linear optimization problem in which the number of constraints is quadratic with respect to the number of locations in the considered region $X$, and therefore has the same efficiency limitations as the methodology used in [4].

According to the distinction made by [26] between sporadic and continuous location exposure, we focus in this article on the sporadic case in which the locations reported by the user are sparsely distributed over time such that they can be considered independent of each other. In this case it is sufficient to sanitize each single location in an independent manner. However, in the continuous exposure scenario, the successive reported locations are correlated and therefore other approaches are required to protect the user’s entire trace. For instance, [7] describes an efficient mechanism to sanitize a collection of mobility traces in a non-interactive fashion, while in the interactive setting of accessing LBSs, other techniques such as the predictive mechanism [6] may be used to mitigate the impact of the correlation between the user’s successive locations on his privacy.

Finally, we want to point out that our notion of symmetric mechanism is similar to the noise-adding mechanism of [15] in the sense that both of them add continuous obfuscation noise independently of the original data, and the two articles aim to optimize the added noise. However, they differ in two main aspects. First, while the mechanism in [15] adds real-valued noise to the numerical query results, our mechanisms add vector-valued noise to the user’s real position. Second, while [15] aims to satisfy the standard ϵ-differential privacy for statistical databases, our goal is more general in the sense that we want to satisfy ℓ-privacy for the user’s locations. The authors of [15] also described in another work [16] a (nearly) optimal noise-adding mechanism satisfying the approximate $(\epsilon, \delta)$-differential privacy for integer-valued and histogram queries.

3 Preliminaries

We consider a user who may be located anywhere in a certain domain of locations $X \subseteq \mathbb{R}^2$, and uses an obfuscation mechanism $K$ to produce a noisy position, which is reported to the LBS server. Thus, a mechanism is modeled by a probabilistic function that takes the user’s real location $p \in X$ and reports a position $z$ to the LBS provider. We write this probabilistic event as $K(p) = z$. The difference between the reported and real locations is a Euclidean vector $\mathbf{u} = \vec{z} - \vec{p}$, which we coin as the noise vector. (Throughout this paper, we denote the space of points, i.e., locations, by $\mathbb{R}^2$, while the space of Euclidean vectors is represented by $\vec{\mathbb{R}}^2$.) The input domain $X$ of the mechanism is arbitrary and is usually specified to capture all the points that the user may visit. The output domain of the mechanism, on the other hand, is assumed to be the entire space $\mathbb{R}^2$.

3.1 ℓ-privacy

A mechanism $K$ satisfies ℓ-privacy (for a user) on a domain of locations $X$ if it guarantees that for each region $S \subseteq \mathbb{R}^2$, the probability of reporting a point in $S$ when the user is at $p$, i.e. $P(K(p) \in S)$, is not “too different” from that probability when he is instead at $p'$ (both $p$ and $p'$ are in $X$). The restriction on this difference between the two probabilities depends on the distance between $p$ and $p'$ (i.e. $d(p, p')$) and the specification of a distinguishability function $\ell$. More formally, we recall the definition of this notion from [11].

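Definition 1 (ℓ-privacy [11]).

For a distinguishability function $\ell$, a mechanism $K$ satisfies ℓ-privacy on $X$ if for all $p, p' \in X$ and all $S \subseteq \mathbb{R}^2$ it holds

$$P(K(p) \in S) \le e^{\ell(d(p, p'))}\, P(K(p') \in S).$$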

Note that the level of privacy is controlled by the behavior of $\ell$ with respect to the distance between the two points. The distinguishability function $\ell$ may also be seen as modeling the risk of distinguishing the user’s location from others at distance $d$. For example, the risk level may get lower as the distance grows and is accordingly modeled by an increasing $\ell$.

3.2 Symmetric mechanisms

A mechanism $K$ is called “symmetric” if sampling the noise vector is independent of the real location of the user [11]. More precisely, a symmetric mechanism samples a noise vector $\mathbf{u}$ using a fixed probability distribution $\mu$ on the subsets of the vector space $\vec{\mathbb{R}}^2$ and then reports to the LBS server the user’s location after adding $\mathbf{u}$ to it. We call $\mu$ the noise distribution of $K$.

In [11], a symmetric mechanism was defined using the probability density function (pdf) of the distribution $\mu$, assuming that this pdf exists for $\mu$. Moreover, this pdf, which is also called a noise function, was assumed to be continuous everywhere in each bounded subregion of $\vec{\mathbb{R}}^2$, except possibly on finitely many analytic curves. In our reasoning about optimality, we will abstract from these assumptions and base our analysis on the noise distribution $\mu$ as a probability measure before studying its pdf (if it exists). More precisely, in Section 4 we will redefine a symmetric mechanism in a more generic manner using its noise distribution $\mu$, demonstrate the precise conditions on $\mu$ to satisfy ℓ-privacy, and then proceed to study its corresponding pdf (i.e., its noise function).
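The sampling procedure of a symmetric mechanism can be sketched in a few lines. The example below is an illustration rather than the construction of the paper: it uses planar Laplace noise, drawn in polar coordinates with a uniform angle and a radius following a Gamma distribution with shape 2 and scale 1/ϵ, which is the radial law of the density $\frac{\epsilon^2}{2\pi} e^{-\epsilon |\mathbf{u}|}$. All function names are hypothetical.

```python
import math
import random

def sample_planar_laplace(eps, rng):
    """Draw one noise vector: uniform direction, Gamma(2, 1/eps) magnitude."""
    theta = rng.uniform(0.0, 2.0 * math.pi)
    radius = rng.gammavariate(2.0, 1.0 / eps)
    return (radius * math.cos(theta), radius * math.sin(theta))

def symmetric_mechanism(location, sample_noise):
    """Report the real location plus a noise vector drawn independently of it."""
    dx, dy = sample_noise()
    return (location[0] + dx, location[1] + dy)

rng = random.Random(42)                 # fixed seed, only for reproducibility
real_location = (2.35, 48.85)           # hypothetical planar coordinates
reported = symmetric_mechanism(real_location,
                               lambda: sample_planar_laplace(1.0, rng))
print(reported)
```

The key design point is that `sample_noise` takes no argument: the noise distribution never sees the real location, which is exactly what makes the mechanism symmetric.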

3.3 Loss functions and the expected loss

The utility of a mechanism for the user is measured by the expected (average) “loss” incurred due to reporting noisy locations instead of the real ones. This requires specifying a loss function $\mathcal{L} : [0, \infty) \to [0, \infty)$ that assigns to each noise magnitude a loss value. In general, the expected loss depends on the prior probabilities $\pi$ of visiting the points of $X$, and of course on the mechanism. However, if the mechanism is symmetric (i.e., the noise vector is sampled using a fixed noise distribution $\mu$ as described earlier), the expected loss is independent of $X$ and the prior distribution $\pi$. Assuming that $\mu$ has a probability density function (i.e., a noise function) $\mathcal{F}$, it was shown in [11] that the expected loss of $\mathcal{F}$ with respect to $\mathcal{L}$ is given by

$$\Psi(\mathcal{F}, \mathcal{L}) = \int_{\vec{\mathbb{R}}^2} \mathcal{F}(\mathbf{u})\, \mathcal{L}(|\mathbf{u}|)\, d\lambda(\mathbf{u}). \tag{1}$$

In practice, the loss function $\mathcal{L}$ is defined by the user depending on the target LBS. For example, if he wants to query the set of nearest restaurants to his position, $\mathcal{L}$ may be defined to increase with the noise magnitude (e.g., $\mathcal{L}(d) = d$); i.e., the smaller the perturbation of his location, the more useful the response to his query. Alternatively, for a weather forecasting service, $\mathcal{L}$ may take the value 0 if the noise magnitude is within a certain threshold in which the weather is almost uniform, while it takes larger values beyond this threshold.

4 Noise distributions and noise functions

As mentioned previously in Section 3.2, a symmetric mechanism is determined by its noise distribution $\mu$, which corresponds to its probability measure on the subsets of the vector space $\vec{\mathbb{R}}^2$. Therefore, we define a symmetric mechanism in the following by its corresponding distribution $\mu$.

For any set of points $S \subseteq \mathbb{R}^2$, let $\vec{S}$ be the set of position vectors that correspond to the points in $S$. In addition, for any set of vectors $A \subseteq \vec{\mathbb{R}}^2$ and a vector $\mathbf{u}$, let $A + \mathbf{u}$ be the translation image of $A$ by $\mathbf{u}$. Finally, let $\mu(A)$ be the probability that the sampled noise vector is a member of $A$. Then we define a symmetric mechanism using its underlying noise distribution as follows.

Definition 2 (Symmetric mechanism).

A mechanism $K$ is said to be symmetric if there is a noise distribution $\mu$ on the subsets of the vector space $\vec{\mathbb{R}}^2$ such that for every input location $p$ and every region $S \subseteq \mathbb{R}^2$, it holds that

$$P(K(p) \in S) = \mu(\vec{S} - \vec{p}).$$

The above definition means that an output point in $S$ is produced by first sampling a noise vector from $\vec{S} - \vec{p}$ using $\mu$, and then adding this vector to the user’s position $p$. It is important to characterize when exactly a noise distribution satisfies ℓ-privacy on a set of locations $X$. By Definition 1 of ℓ-privacy, the probability of any output of the mechanism should not vary substantially (subject to the function $\ell$) from the probability of this event if the user’s position in $X$ changes by a vector $\mathbf{u}$. If a fixed noise distribution $\mu$ is used for sampling noise vectors independently of the input location, this statement can be translated to an equivalent condition on the distribution $\mu$. This condition has to take into account all displacements that the user can make in $X$. Therefore, in the following we denote by $\Delta_X$ the set of all possible displacement vectors in $X$ (i.e., $\Delta_X = \{\vec{p} - \vec{p}' : p, p' \in X\}$).

Theorem 1 (ℓ-private distributions).

A noise distribution $\mu$ satisfies ℓ-privacy on the domain $X$ if and only if for every $\mathbf{u} \in \Delta_X$ and every $A \subseteq \vec{\mathbb{R}}^2$ it holds

$$\mu(A) \le e^{\ell(|\mathbf{u}|)}\, \mu(A + \mathbf{u}). \tag{2}$$
Proof.

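First we show that Def. 1 implies Inequality (2) in the theorem. Consider any $A \subseteq \vec{\mathbb{R}}^2$ and any displacement vector $\mathbf{u}$ in $X$. Then there must be two points $p, p' \in X$ such that $\mathbf{u} = \vec{p} - \vec{p}'$. Let $S$ be a planar region such that $\vec{S} - \vec{p} = A$. Therefore $\vec{S} - \vec{p}' = A + \mathbf{u}$. Using Def. 2 we obtain $P(K(p) \in S) = \mu(A)$ and $P(K(p') \in S) = \mu(A + \mathbf{u})$. Now using Def. 1, we get $P(K(p) \in S) \le e^{\ell(d(p, p'))}\, P(K(p') \in S)$, which yields Inequality (2) by substituting $d(p, p') = |\mathbf{u}|$.

Conversely, we show that Inequality (2) implies the inequality in Def. 1. Consider any region $S \subseteq \mathbb{R}^2$ and any $p, p' \in X$. Let $A = \vec{S} - \vec{p}$ and $\mathbf{u} = \vec{p} - \vec{p}'$. As shown above, $\vec{S} - \vec{p}' = A + \mathbf{u}$. Substituting these equalities in (2) with $|\mathbf{u}| = d(p, p')$, we obtain $\mu(\vec{S} - \vec{p}) \le e^{\ell(d(p, p'))}\, \mu(\vec{S} - \vec{p}')$, which leads, using Def. 2, to the inequality of Def. 1. ∎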

4.1 Noise functions

Since the noise vectors are sampled from the vector space $\vec{\mathbb{R}}^2$, which is clearly continuous, it makes sense to describe a noise distribution $\mu$ by a corresponding probability density function (pdf) $\mathcal{F}$. We coin this pdf as the “noise function” of $\mu$. However, in general, this function may not exist for $\mu$. For instance, if $\mu$ is a distribution on a discrete set of noise vectors in $\vec{\mathbb{R}}^2$, then $\mu$ has no noise function. The necessary and sufficient condition on $\mu$ to have a noise function is given by the Radon-Nikodym theorem [25, Theorem 5.4], which is formulated using the Lebesgue measure $\lambda$ of every subset of $\vec{\mathbb{R}}^2$. Precisely, a distribution $\mu$ has a noise function if and only if every null subset of $\vec{\mathbb{R}}^2$ (i.e., having Lebesgue measure zero) also has probability 0. In formal terms, this property means that $\mu(A) = 0$ whenever $\lambda(A) = 0$. A distribution $\mu$ that has this property is said to be “absolutely continuous” with respect to $\lambda$, and this is written as $\mu \ll \lambda$. In this case, the Lebesgue differentiation theorem relates the noise distribution $\mu$ to its noise function $\mathcal{F}$, and leads to the following important characterization of ℓ-privacy in terms of $\mathcal{F}$.

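Theorem 2 (ℓ-private noise functions).

Let $\mu$ be a noise distribution satisfying $\mu \ll \lambda$. Then $\mu$ and its noise function $\mathcal{F}$ satisfy ℓ-privacy on a domain $X$ if and only if there is a null set $N \subset \vec{\mathbb{R}}^2$ such that for all vectors $\mathbf{u}, \mathbf{u}' \notin N$ whose difference $\mathbf{u} - \mathbf{u}'$ is a displacement vector in $X$ (i.e., $\mathbf{u} - \mathbf{u}' \in \Delta_X$), it holds

$$\mathcal{F}(\mathbf{u}) \le e^{\ell(|\mathbf{u} - \mathbf{u}'|)}\, \mathcal{F}(\mathbf{u}'). \tag{3}$$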
Proof.

Since , it follows by the Radon-Nikodym theorem that there is a noise function on the vector space satisfying , for every . Now Let be a ball of radius around . It follows by the Lebesgue differentiation theorem that

(4)

In other words there is a null set (empty or has ) such that the above equation is satisfied for every . Now consider any such that . Then it also holds that . Since satisfies -privacy on , it holds by Theorem 1 that Note that is and therefore . It is also easy to see that . Thus we have

By taking the limits of the above equation when and substituting the two limits using Equation (4) we obtain .

Conversely, suppose that Inequality (3) holds for every such that . Consider any fixed . Then by this inequality, it holds that a.e. in . Let . Then by integrating the latter inequality on any set we get

The above theorem is useful to check whether or not a given noise function satisfies ℓ-privacy. In fact, Condition (3) describes the constraints on the values of $\mathcal{F}$ to satisfy ℓ-privacy. This raises another issue, which is central to the objective of this paper: whether these constraints can be used to derive an “optimal” noise function. In general, the answer is negative because for any $\mathcal{F}$ satisfying ℓ-privacy, Condition (3) may be violated for some null set $N$ that may be anywhere in $\vec{\mathbb{R}}^2$. In other words, if we want to construct an optimal noise function, then for any $\mathbf{u}, \mathbf{u}'$ such that $\mathbf{u} - \mathbf{u}'$ is a displacement vector in $X$, we do not know whether the inequality in (3) should hold for the values of $\mathcal{F}$ at $\mathbf{u}, \mathbf{u}'$ or not. However, the answer to the above question is positive if the values of $\mathcal{F}$ at the vectors in $N$ can be “regulated” such that (3) holds everywhere in $\vec{\mathbb{R}}^2$. In this case, we would have a strict condition that is satisfied for every such pair $\mathbf{u}, \mathbf{u}'$. It turns out that such “regulation” is possible if the distinguishability function $\ell$ is regular, as we define in the following.

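Definition 3 (Regular distinguishability functions).

A distinguishability function $\ell$ is said to be regular if for every $\mathbf{u}, \mathbf{v} \in \vec{\mathbb{R}}^2$, it holds that

$$\ell(|\mathbf{u} + \mathbf{v}|) \le \ell(|\mathbf{u}|) + \ell(|\mathbf{v}|).$$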

Note that the magnitude $|\cdot|$ is a norm on vectors and therefore it respects the well-known triangle inequality $|\mathbf{u} + \mathbf{v}| \le |\mathbf{u}| + |\mathbf{v}|$. Therefore by Definition 3, a distinguishability function $\ell$ is regular if the triangle inequality for vectors still holds when $\ell$ is applied to every one of its terms. An instance of regular distinguishability functions is obtained when the distinguishability is proportional to the above magnitude (i.e., $\ell(|\mathbf{u}|) = \epsilon\, |\mathbf{u}|$). This function describes exactly the notion of ϵ-geo-indistinguishability [1], for which we describe an optimal noise function in Section 7.1. In general, for any regular distinguishability function $\ell$, the following theorem confirms that every noise function can always be regulated to satisfy the privacy Condition (3) everywhere in $\vec{\mathbb{R}}^2$.
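Regularity, read as "the triangle inequality still holds after applying the distinguishability function to each term", can be probed numerically. The sketch below searches a finite grid of vectors for a violation; it can refute regularity but cannot prove it, and the grid, the tolerance and the quadratic counterexample are illustrative assumptions.

```python
import itertools
import math

def find_regularity_violation(ell, vectors, tol=1e-12):
    """Return a pair (u, v) with ell(|u+v|) > ell(|u|) + ell(|v|), or None."""
    for u, v in itertools.product(vectors, repeat=2):
        total = (u[0] + v[0], u[1] + v[1])
        lhs = ell(math.hypot(*total))
        rhs = ell(math.hypot(*u)) + ell(math.hypot(*v))
        if lhs > rhs + tol:
            return (u, v)
    return None

grid = [(x, y) for x in range(-3, 4) for y in range(-3, 4)]

linear = lambda d: 0.5 * d        # geo-indistinguishability with eps = 0.5
quadratic = lambda d: d ** 2      # hypothetical non-regular choice

print(find_regularity_violation(linear, grid))
print(find_regularity_violation(quadratic, grid) is not None)
```

For the quadratic function, two equal unit vectors already break the inequality (4 on the left against 2 on the right), whereas no violation can exist for the linear function because it merely rescales the triangle inequality.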

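Theorem 3 (regulating noise functions).

Let $\ell$ be a regular distinguishability function. Then for every domain of locations $X$ and every noise function $\mathcal{F}$ satisfying ℓ-privacy on $X$, there is a noise function $\mathcal{F}' = \mathcal{F}$ a.e. in $\vec{\mathbb{R}}^2$ such that for all vectors $\mathbf{u}, \mathbf{u}'$ whose difference $\mathbf{u} - \mathbf{u}'$ is a displacement vector in $X$ (i.e., $\mathbf{u} - \mathbf{u}' \in \Delta_X$), it holds

$$\mathcal{F}'(\mathbf{u}) \le e^{\ell(|\mathbf{u} - \mathbf{u}'|)}\, \mathcal{F}'(\mathbf{u}'). \tag{5}$$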
Proof.

Let be regular, and for any set of locations let be a noise function satisfying -privacy on . According to Theorem 2 there is a null set such that for every it holds

Define as follows. For every , let , and for every let . Note that this infimum exists because is nonempty and is lower bounded by . Observe also that a.e. In the following we show that Inequality (5) holds for every two vectors in . First, it is easy to see that for all , Inequality (5) holds since at these vectors. Now for every and , it holds by the definition of that . Based on the hypothesis that is regular, we also claim for every that

(6)

which implies that for all and . Thus we conclude that Inequality (5) holds for every and . We prove Inequality (6) as follows. Suppose this inequality does not hold for some . Then there are such that , i.e. . Since it also holds that because is regular, we obtain which contradicts with the fact that since .

Finally consider any . We show that . Consider any arbitrary small . By the definition of , there must be such that . Recalling that , and using the inequality which was already proved, we obtain . Since is regular, it holds that . Therefore for every . Taking the limits of this inequality as yields . ∎

Theorem 3 allows us to assume without loss of generality that the privacy Constraints (5) are satisfied for every pair of vectors whose difference is a displacement vector in $X$. In fact, since the regulated noise function equals the original one almost everywhere, the integrals of these two functions are the same on any subset of $\vec{\mathbb{R}}^2$. This means that the regulated function is, like the original one, a valid pdf and also has the same expected loss. As mentioned earlier, this conclusion is useful when we derive the optimal noise function satisfying ℓ-privacy for some domain $X$, because we do not need to consider noise functions in which (5) is violated on a null set.

4.2 Circular noise functions

A noise function is called “circular” if all noise vectors having the same magnitude are drawn with the same probability density [11]. This probability density is determined by an underlying function , which we call the “radial” of . Thus, for every vector it holds that . In this case, it is easy to express the expected loss of with respect to a loss function as

(7)

It is also easy to ensure that assigns total probability 1 to all vectors in by the following constraint that we coin as the “total probability law”.

(8)
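As a numerical illustration of (7) and (8), the following minimal sketch assumes that the expected loss of a circular noise function is the integral of R(r) L(r) 2πr over all magnitudes r, and that the total probability law requires the integral of R(r) 2πr to equal 1. The symbol names R (radial) and L (loss), the helper names, the value of the privacy parameter, and the choice of the planar Laplace radial as a test case are assumptions made for illustration only.

```python
import math

def laplace_radial(r, eps):
    """Radial of the planar Laplace noise function: density at magnitude r."""
    return (eps ** 2) / (2 * math.pi) * math.exp(-eps * r)

def polar_integral(radial, weight, r_max, steps=200_000):
    """Midpoint-rule estimate of the integral of radial(r) * weight(r) * 2*pi*r on [0, r_max]."""
    h = r_max / steps
    total = 0.0
    for i in range(steps):
        r = (i + 0.5) * h
        total += radial(r) * weight(r) * 2 * math.pi * r
    return total * h

eps = 0.5  # hypothetical privacy parameter
radial = lambda r: laplace_radial(r, eps)

# Total probability law: should be close to 1 (the tail beyond r_max is negligible).
total_probability = polar_integral(radial, lambda r: 1.0, r_max=100.0)

# Expected loss for the loss function L(r) = r; analytically this equals 2 / eps.
expected_loss = polar_integral(radial, lambda r: r, r_max=100.0)
```

The truncation at `r_max` is a numerical convenience; a radial with a heavier tail would need a larger cutoff or an analytic tail correction.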

We now describe the condition on a circular noise function to satisfy -privacy for a domain . This condition depends on the set that captures every two noise magnitudes required to have a restricted distinguishability from each other. This distinguishability for a pair of magnitudes must ensure that every two vectors having these magnitudes are properly indistinguishable from each other. Therefore the distinguishability for is exactly the “minimal” distinguishability defined as

Theorem 4 (-privacy of circular noise functions).

A circular noise function having a radial satisfies -privacy on a domain of locations if and only if there is a discrete set of noise magnitudes such that for all it holds that

(9)
Proof.

Suppose that satisfies -privacy on . Then its noise distribution also satisfies it. By the circularity of , the probability of any ball of radius around a vector depends only on the magnitude of (and ) regardless of its direction. Let denote this probability. Now satisfies -privacy if and only if it satisfies the condition of Theorem 1 that can be written for as

for all and This condition (according to Theorem 1) considers all vectors such that and having the magnitudes respectively. The minimum distinguishability is taken to ensure that the distinguishability between every is properly upper-bounded by . By the Lebesgue differentiation theorem, the derivative of with respect to the Lebesgue measure on exists and is equal to almost everywhere in . By the circularity of , this means that for a discrete set of magnitudes, it holds for every that . Applying this limit to the two sides of the above inequality, we obtain the condition stated by the theorem.

Conversely we show that this condition implies that satisfies -privacy as follows. For every such that , there must be such that . Thus , hence . Since , we have . Note that this inequality holds for all vectors in except those having magnitudes in . Thus, this inequality holds for all vectors in where is the set composed of the union of the discrete set of circles having their radii in . Since is clearly a null set in , it follows from Theorem 2 that satisfies -privacy. ∎

The minimal distinguishability depends, by its definition, on and . For example, if is the entire space of locations , and the distinguishability is increasing with , it is easy to see that is exactly .
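The whole-space case can be checked numerically. The sketch below assumes that, for an increasing distinguishability function ℓ on the entire plane, the minimal distinguishability between magnitudes r and r' is ℓ(|r − r'|), so that condition (9) reads R(r) ≤ exp(ℓ(|r − r'|)) R(r'). The linear ℓ(d) = εd, the two candidate radials, and all function names are illustrative assumptions. A finite grid can expose a violation but cannot prove that the condition holds everywhere.

```python
import math

EPS = 0.5  # hypothetical privacy parameter

def log_laplace_radial(r):
    """Logarithm of the planar Laplace radial."""
    return math.log(EPS ** 2 / (2 * math.pi)) - EPS * r

def log_gaussian_radial(r, sigma=1.0):
    """Logarithm of an isotropic Gaussian radial."""
    return -math.log(2 * math.pi * sigma ** 2) - r ** 2 / (2 * sigma ** 2)

def linear_ell(d):
    """Assumed distinguishability function: linear in the distance d."""
    return EPS * d

def satisfies_condition_on_grid(log_radial, ell, magnitudes, tol=1e-9):
    """Check log R(r) - log R(s) <= ell(|r - s|) for every pair of grid magnitudes."""
    return all(
        log_radial(r) - log_radial(s) <= ell(abs(r - s)) + tol
        for r in magnitudes
        for s in magnitudes
    )

magnitudes = [0.1 * k for k in range(101)]  # 0.0, 0.1, ..., 10.0
laplace_ok = satisfies_condition_on_grid(log_laplace_radial, linear_ell, magnitudes)
gaussian_ok = satisfies_condition_on_grid(log_gaussian_radial, linear_ell, magnitudes)
```

Working with logarithms avoids underflow at large magnitudes. The Gaussian radial fails because its log-density decays quadratically, so the gap between magnitudes 0 and 10 far exceeds the linear bound.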

Based on Theorem 4, the trade-off between the location privacy provided by a noise function and its utility can be observed. In particular, if the incurred loss increases with the noise magnitude, then to provide a reasonable utility, the noise function should intuitively assign high probability densities to short noise vectors to reduce the loss. However, in view of Theorem 4, if this function is too biased towards short vectors, it may violate -privacy. Optimizing this trade-off is therefore an interesting issue that we investigate in our work.

We now highlight an important merit of circular noise functions when the domain is a “disk” in the planar space . Informally, every noise function satisfying -privacy can be replaced by a circular one that provides the same utility as and also satisfies -privacy. While this result was proved in [11] under a continuity assumption (on noise functions) described in Section 3.2, the following theorem removes the need for this assumption and establishes the result in general when the distinguishability function is regular. Furthermore, this theorem gives a stronger statement about : its radial satisfies the condition (9) of -privacy with no exceptions on a discrete set of magnitudes. In this case, we say that “strictly” satisfies -privacy on .

Theorem 5 (Generality of circular noise functions).

Let be a regular distinguishability function, and be a disk in . For every noise function satisfying -privacy for and for every loss function , there exists a circular noise function (with a radial ) such that and strictly satisfies -privacy on , which means that

Proof.

Since is regular, it holds by Theorem 3 that for every noise function satisfying -privacy there is a noise function that satisfies Inequality (5) for every two vectors in . Let be a circular noise function (with a radial ) defined on using the polar coordinates of every vector as . By this definition satisfies the total probability law (8) and is therefore a valid radial. It can also be verified that using Equations (7) and (1) (as in the proof of Theorem 15 in [11]). Finally, by the same argument as in the proof of Theorem 23 in [11], it follows that for all . ∎
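The utility part of this construction can be illustrated numerically. The sketch below assumes that the radial of the circular function is obtained by averaging the original noise function over all directions at each magnitude, i.e. R(r) = (1/2π) ∫ F(r, θ) dθ. This reading of the definition, the non-circular test density `skewed_noise`, and all helper names are assumptions for illustration. The example only checks that the expected loss is preserved and says nothing about the privacy of the test density.

```python
import math

EPS = 0.5  # hypothetical decay parameter of the test density

def skewed_noise(r, theta):
    """A non-circular pdf in polar form: Laplace-like decay, denser towards theta = 0."""
    return EPS ** 2 / (2 * math.pi) * math.exp(-EPS * r) * (1 + 0.5 * math.cos(theta))

def circular_radial(noise_polar, r, angles=90):
    """Average noise_polar(r, theta) over a uniform grid of directions."""
    step = 2 * math.pi / angles
    return sum(noise_polar(r, k * step) for k in range(angles)) / angles

def expected_loss_polar(noise_polar, loss, r_max=80.0, r_steps=2000, angles=90):
    """Expected loss of a general noise function, integrated in polar coordinates."""
    h, step = r_max / r_steps, 2 * math.pi / angles
    total = 0.0
    for i in range(r_steps):
        r = (i + 0.5) * h
        ring = sum(noise_polar(r, k * step) for k in range(angles)) * step
        total += ring * loss(r) * r * h
    return total

def expected_loss_radial(radial, loss, r_max=80.0, r_steps=2000):
    """Expected loss of a circular noise function computed from its radial."""
    h = r_max / r_steps
    total = 0.0
    for i in range(r_steps):
        r = (i + 0.5) * h
        total += radial(r) * loss(r) * 2 * math.pi * r * h
    return total

loss = lambda r: r  # loss equal to the noise magnitude
radial = lambda r: circular_radial(skewed_noise, r)
loss_original = expected_loss_polar(skewed_noise, loss)
loss_circular = expected_loss_radial(radial, loss)
```

Because the loss depends only on the magnitude, averaging over directions leaves the integrand on each ring unchanged, which is why the two values agree up to rounding.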

Finally, an important strength of the approach is that sampling a noise vector from circular functions is very simple compared to sampling from non-circular ones. A generic algorithm for this sampling is described in [11].
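To illustrate why sampling is simple in the circular case, the following minimal sketch draws the direction uniformly and the magnitude by numerical inverse-CDF lookup, using the fact that the magnitude has density 2πr R(r). It is not the algorithm of [11]; the function names, the grid resolution, the cutoff `r_max`, and the planar Laplace radial used as input are assumptions for illustration.

```python
import bisect
import math
import random

def build_radius_cdf(radial, r_max, steps=20_000):
    """Tabulate the CDF of the noise magnitude, whose density is 2*pi*r*radial(r)."""
    h = r_max / steps
    grid, cdf, acc = [0.0], [0.0], 0.0
    for i in range(steps):
        mid = (i + 0.5) * h
        acc += 2 * math.pi * mid * radial(mid) * h
        grid.append((i + 1) * h)
        cdf.append(acc)
    # Normalize so the table ends at 1, absorbing truncation and quadrature error.
    return grid, [c / acc for c in cdf]

def sample_noise(grid, cdf, rng):
    """Draw one noise vector: uniform direction, magnitude by inverse-CDF lookup."""
    u = rng.random()
    k = min(max(bisect.bisect_left(cdf, u), 1), len(cdf) - 1)
    lo, hi = cdf[k - 1], cdf[k]
    frac = 0.0 if hi == lo else (u - lo) / (hi - lo)
    r = grid[k - 1] + frac * (grid[k] - grid[k - 1])  # linear interpolation
    theta = rng.uniform(0.0, 2 * math.pi)
    return r * math.cos(theta), r * math.sin(theta)

eps = 0.5  # hypothetical privacy parameter
laplace_radial = lambda r: eps ** 2 / (2 * math.pi) * math.exp(-eps * r)
grid, cdf = build_radius_cdf(laplace_radial, r_max=60.0)

rng = random.Random(0)  # fixed seed for reproducibility
samples = [sample_noise(grid, cdf, rng) for _ in range(20_000)]
mean_magnitude = sum(math.hypot(x, y) for x, y in samples) / len(samples)
mean_x = sum(x for x, _ in samples) / len(samples)
```

For the planar Laplace radial the mean magnitude should be close to 2/ε, and the mean of each coordinate close to zero. A non-circular noise function would instead require sampling from a genuinely two-dimensional density.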

5 Noise distributions on continuous regions

In Section 4, we established the conditions under which a noise distribution and its corresponding noise function satisfy -privacy on an arbitrary domain . In the following, we focus on the case in which is a continuous region with a nonzero area, such as a country, a city, or in general a region that contains a dense set of points of interest. In this case we find, under a mild condition on and the distinguishability function , that satisfying the conditions of -privacy on is equivalent to satisfying them on the entire planar space .

Theorem 6 (Satisfying -privacy for continuous regions).

Let be a distinguishability function satisfying for some distance that for all . Let also be any region that contains a disk of diameter . In this case a noise distribution satisfies -privacy on if and only if it satisfies -privacy on .

Proof.

It is clear that if a noise distribution satisfies -privacy on , it must satisfy it on since .

Conversely, suppose that satisfies -privacy on and satisfies the stated condition. We show in this case that must satisfy -privacy for . More precisely, we demonstrate that the condition of -privacy described by Inequality (2) is satisfied on the domain . Observe that is the entire vector space , and therefore for every two points , we have . Therefore, we proceed by showing that for any

If , it is easy to see that and therefore the above inequality holds since satisfies -privacy on . Otherwise, if , there is a sequence of points on the line connecting and such that and , and every two successive points are apart, except which are at most apart, i.e. for , and . Since satisfies -privacy on and , we have

which implies that

Since for all , it follows that . It is also clear that