As analysis and research based on sensitive personal data become more widespread and more significant, the risk of privacy violations grows with them. One of the most successful proposals to address privacy protection is differential privacy (DP) [DworkDP1, DworkDP2], a mathematical property that makes it difficult for an attacker to detect the presence of a record in a dataset, and which is typically achieved by adding noise to the results of queries performed on the dataset. More recently, the local variant of differential privacy (LDP) [DuchiLDP] has gained substantial popularity, because the noise is applied at the user’s end and therefore no trusted curator is needed. The local model of privacy is particularly suitable for situations in which the user communicates her personal data in exchange for some service. One such scenario is the use of location-based services (LBS), where the user typically sends her location in exchange for information such as the shortest path to a destination, points of interest in the surroundings, traffic information, friends nearby, etc. The local model of differential privacy can be implemented directly on the user’s device (tablet, smartphone, etc.), and the fact that the user directly controls her privacy-protection mechanism is very appealing. However, the drawback of injecting noise locally into the datum is that it also decreases the utility from the perspective of the user (besides that of the data collector), because the quality of service (QoS) is usually affected negatively by the loss of accuracy of the data. Therefore, in the local model of privacy protection, it is essential to take into account the utility of the data on both the users’ and the service providers’ sides.
A lot of research has been done to address the privacy-utility trade-off in the local model of privacy. In standard LDP, the focus is typically on optimizing the utility from the point of view of the data collector, i.e., devising mechanisms and post-processing methods that allow deriving maximally accurate statistics from the collection of noisy data [DuchiLDP, GoogleRappor]. Statistical precision is essential for the studies and analytics performed by data consumers, for purposes that include, but are not limited to, improving the standard of service, providing useful statistics to companies and research institutions, and training machine learning models [GoogleRappor, AppleDP].
In contrast to the above, in other domains, such as location-privacy, the focus is usually on optimizing the QoS, i.e., the utility from the point of view of the user. Notably, this is the case for the framework proposed by Shokri et al. [ShokriLocationPrivacy, ShokriPrivacyGames], and for geo-indistinguishability [AndresKostasCatuscia_GeoInd], a variant of LDP that takes into account the distance between locations [Bordenabe:14:CCS, Fernandes:21:LICS]. The preservation of the best possible quality of service despite the obfuscation operated on the data is what motivates these efforts.
Notably, the optimization of statistical utility does not necessarily imply a substantial improvement in the QoS, nor vice-versa (although they go in the same direction, in the sense that both tend to preserve as much as possible the original information, under the privacy constraint). A counterexample is provided by Example II.1 in the Related Work Section. Thus, we acknowledge and appreciate the existence of a three-way entanglement between the privacy level of the obfuscation, the QoS that is still fostered by a datum after being obfuscated, and the statistical information that can be extracted from a collection of such obfuscated data. To the best of our knowledge, the problem of devising a privacy-protection mechanism that optimizes at the same time the users’ QoS and the statistical utility, has never been addressed by the research community.
In this paper, we explore this three-way conflict between privacy and the two kinds of utility, and we propose a method to find a mechanism for location-privacy that satisfies geo-indistinguishability and strides towards optimizing this triadic privacy-utility trade-off. We use information-theoretic notions to quantify location-privacy and QoS (mutual information and distortion, respectively, where “distortion” is used in the sense of rate-distortion theory, a sub-area of information theory), and a statistical notion (the divergence between distributions) to quantify statistical utility. Since the data distribution is typically unknown, we start by assuming a uniform distribution and find the Pareto-optimal channel between privacy and QoS for this distribution, using the Blahut-Arimoto algorithm (which, indeed, depends on the input distribution). Using this channel as the obfuscation mechanism, we let the users obfuscate their data, and we collect the obfuscated data. We then post-process these data to best estimate the original distribution, using an expectation-maximization (EM) method known as the iterative Bayesian update. We repeat this procedure starting, this time, from the estimated distribution, in order to obtain a channel with a better Pareto-optimality (between privacy and QoS), in the sense that it is optimal for a distribution closer to the true one. By continuing to repeat this procedure, we obtain an increasingly refined channel and an increasingly accurate estimate of the true distribution. We formally prove the convergence of our method by translating our model into a Markov chain and giving a probabilistic characterization of its limiting behaviour. The convergence then follows from the uniqueness of the stationary distribution of the Markov chain. We also validate this result experimentally.
In the process of establishing the soundness of our method, we have also proved various desirable mathematical properties, like its uniform continuity. Most notably, we have discovered that the two seemingly unrelated algorithms mentioned above, the Blahut-Arimoto and the iterative Bayesian update, are actually duals of each other, thus establishing a bridge between the QoS and the statistical notion of utility.
Finally, we perform experiments on real location data from the Gowalla datasets for San Francisco and Paris, and empirically show the convergence of our method and the achievement of a three-way optimality between privacy, QoS, and statistical utility, thus confirming the theoretical results. The experiments also show the efficacy of the combination of the proposed mechanism and the post-processing method (the iterative Bayesian update), in that the estimation of the original distribution is very accurate, especially when measured using a notion of distance between distributions that is compatible with the ground distance used to measure the QoS, namely the Earth Mover’s distance, which is considered a canonical way to lift a distance on a certain domain to a distance between distributions on the same domain.
In summary, the contributions of this paper are as follows:
We propose an iterative method (Optima3) that produces a geo-indistinguishable privacy mechanism which maximizes the privacy of the users under a given threshold for QoS and provides a very accurate statistical estimation of the distribution of the original data (i.e., the most likely original distribution), thus advancing towards a three-way optimization of privacy and utility.
We prove that Optima3 is uniformly continuous and show its convergence by translating it into the framework of Markov chains. We confirm with experiments on real data that Optima3 has the desired behaviour.
We show that the Blahut-Arimoto algorithm and the iterative Bayesian update are duals of each other, thus establishing a connection between the information-theoretic field of Rate-Distortion Theory and the Statistical EM method.
The paper is organized as follows: Section II discusses related work. Section III introduces preliminary notions related to privacy and the information-theoretic concepts relevant to our work. Section IV illustrates the duality between the Blahut-Arimoto algorithm and the iterative Bayesian update. Section V lays the foundation for our method. Section VI translates our method into the framework of Markov chains to study its limiting behaviour. Section VII is dedicated to the mathematical soundness of our proposed method, illustrating, with proofs, some desirable mathematical properties it satisfies. Section VIII presents the experimental results using Gowalla check-in data from Paris and San Francisco, confirming that our method converges and produces a geo-indistinguishable privacy channel that achieves a three-way privacy-utility optimality. Section IX concludes the paper.
II Related work
The clash between privacy and utility has been widely studied in the literature [BrickellPvcyUtility:2008, TraftTradeoff:2014]. Optimization techniques for differential privacy and utility in the context of statistical databases have been analyzed intensively from various perspectives in the recent past [GhoshOptimalPrivacyUtility:2012, GupteOptimalPrivacyforMinMaxAgents, LiOptimalLinearQueries:2010]. There has been some recent work on devising privacy mechanisms that optimally limit the privacy risk against Bayesian inference attacks while maximizing the utility [ShokriLocationPrivacy, ShokriPrivacyGames]. In [Oya:17:CCS], Oya et al. examine a two-way optimal location-privacy preserving mechanism and analyse it with respect to different privacy and utility metrics.
In [Oya:19:EuroSnP], Oya et al. consider the idea of the optimal location-privacy preserving mechanisms (LPPM) proposed by Shokri et al. in [ShokriLocationPrivacy], which maximize the average adversary error (used as the notion of privacy) under some bound on the QoS loss. Oya et al. [Oya:19:EuroSnP] then use the EM method to design blank-slate models that they empirically show to outperform the traditional hardwired models. However, a fundamental problem with this approach is that there can exist LPPMs which are optimal in the sense of [ShokriLocationPrivacy], but for which the EM method fails to converge or does not give sensible results. Such examples have been explored by ElSalamouny et al. in Section II of [EhabConvergenceIBU]. Indeed, [EhabConvergenceIBU] points out several mistakes in the theoretical results of [AgarwalIBU], on which Oya et al. rely to prove the convergence of the method proposed in [Oya:19:EuroSnP].
An example of a location-privacy mechanism that achieves Shokri et al.’s optimality in terms of QoS, but has no statistical utility at all, is the following. Consider three collinear locations, $a$, $b$ and $c$, where $b$ is in the middle between $a$ and $c$ and at distance $r$ from each of them. Assume that the prior probability distribution on these three locations is uniform and that the constraint on the utility is that the average QoS loss should not exceed $2r/3$. Then one mechanism that optimizes the QoS, according to [ShokriLocationPrivacy], is the one that maps all locations to $b$. However, we observe that this mechanism is far from providing optimal statistical utility. In fact, the obfuscated locations can only be $b$’s, and they do not provide any information on what the original distribution could be. In other words, given $n$ obfuscated locations (all $b$’s), every distribution on $\{a, b, c\}$ has the same likelihood of being the original one.
The metrics we consider for privacy, QoS and statistical utility are fairly standard in the literature. A typical information-theoretic quantification of privacy uses the mutual information (MI) between the original and the noisy data, which is directly related to the conditional entropy, exploiting the compatible relationship between MI and DP [Cuff:16:CCS]. Measuring privacy with MI is widespread in the literature: to gauge anonymity [Zhu:05:ICDCS, Chatzikokolakis:08:IC], to estimate privacy in machine learning, complementing the typical loss function used for training, i.e., cross entropy [Abadi:16:CoRR, Tripathy:19:ACCC, Romanelli:20:CSF, Huang:17:Entropy], and to assess location-privacy [Oya:17:CCS].
Different metrics to measure QoS for the users have been proposed and studied of late. One of the most popular choices of such a utility-metric to measure quality loss established by a privacy mechanism is the average loss or average distortion, quantifying how much quality a user loses on average, with respect to a chosen metric. This notion has gained the spotlight recently in the community [AndresKostasCatuscia_GeoInd, NicolasKostasCatusciaOptimalGeoInd, ChatzikokolakisPalamidessiStronati2015, ChatzikokolakisElSalamounyPalamidessi_PracticalLocation2017, ShokriLocationPrivacy] since it is the most intuitive notion of QoS.
A standard approach to determine the statistical utility of a dataset privatized with an LDP mechanism is to estimate the distribution of the original data from that of the noisy data. One of the most flexible and powerful techniques to do this is the EM method of iterative Bayesian update (IBU) [AgarwalIBU, agrawal2005privacy], which has been studied and analyzed recently [EhabConvergenceIBU, EhabGIBU].
In the quest to resolve this triadic face-off between privacy, QoS, and statistical utility, we aim in this paper to produce a privacy channel that minimizes MI, which we use to measure privacy for location data, for an allowed threshold of average distortion, which we use to represent the quality loss, and to reconstruct the distribution of the original data using an EM method such as IBU.
III-A Standard notions of privacy
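Definition III.1 (Differential privacy [DworkDP1, DworkDP2]).
For a certain query, a randomizing mechanism $\mathcal{K}$ provides $\epsilon$-differential privacy (DP) if, for all neighbouring datasets (i.e., datasets differing in exactly one place), $D$ and $D'$, and all $S \subseteq \mathrm{Range}(\mathcal{K})$, we have
$$P[\mathcal{K}(D) \in S] \le e^{\epsilon}\, P[\mathcal{K}(D') \in S].$$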
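Definition III.2 (Local differential privacy [DuchiLDP]).
Let $\mathcal{X}$ and $\mathcal{Y}$ denote the spaces of original and noisy data, respectively. A randomizing mechanism $\mathcal{R}$ provides $\epsilon$-local differential privacy (LDP) if, for all $x, x' \in \mathcal{X}$, and all $y \in \mathcal{Y}$, we have
$$P[\mathcal{R}(x) = y] \le e^{\epsilon}\, P[\mathcal{R}(x') = y].$$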
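Definition III.3 (Geo-indistinguishability [AndresKostasCatuscia_GeoInd]).
Let $\mathcal{X}$ be a space of locations and let $d_E(x_1, x_2)$ denote the Euclidean distance between $x_1$ and $x_2$. A randomizing mechanism $\mathcal{R}$ is $\epsilon$-geo-indistinguishable if for all $x_1, x_2 \in \mathcal{X}$, and every $y \in \mathcal{X}$, we have
$$P[\mathcal{R}(x_1) = y] \le e^{\epsilon\, d_E(x_1, x_2)}\, P[\mathcal{R}(x_2) = y].$$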
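Definition III.4 (Mutual information [ShannonInfoTheory]).
Let $(X, Y)$ be a pair of random variables defined over the discrete space $\mathcal{X} \times \mathcal{Y}$. The mutual information (MI) of $X$ and $Y$ is given as:
$$I(X; Y) = \sum_{x \in \mathcal{X}} \sum_{y \in \mathcal{Y}} P_{X,Y}(x, y) \log \frac{P_{X,Y}(x, y)}{P_X(x)\, P_Y(y)},$$
where $P_{X,Y}$ is the joint PMF of $X$ and $Y$, and $P_X$ and $P_Y$ are the marginal PMFs of $X$ and $Y$, respectively.
In the context of location-privacy, we obfuscate the original locations to other locations in the same space. In other words, we consider the space of the original and the noisy data to be the same, i.e., $\mathcal{X} = \mathcal{Y}$.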
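As a minimal sketch of how the mutual information between original and obfuscated locations can be evaluated, the snippet below computes $I(X;Y)$ from a prior and a row-stochastic channel. The function name `mutual_information` and the two toy channels are illustrative assumptions, not part of the paper; natural logarithms are used, so the result is in nats.

```python
import math

def mutual_information(prior, channel):
    """I(X;Y) in nats for prior[x] and a row-stochastic channel[x][y] (hypothetical helper)."""
    n_in, n_out = len(prior), len(channel[0])
    # Output marginal: P_Y(y) = sum_x prior[x] * channel[x][y]
    out = [sum(prior[x] * channel[x][y] for x in range(n_in)) for y in range(n_out)]
    mi = 0.0
    for x in range(n_in):
        for y in range(n_out):
            joint = prior[x] * channel[x][y]
            if joint > 0:  # terms with zero joint probability contribute nothing
                mi += joint * math.log(joint / (prior[x] * out[y]))
    return mi

uniform_prior = [0.5, 0.5]
identity_channel = [[1.0, 0.0], [0.0, 1.0]]   # reveals the input completely
constant_channel = [[1.0, 0.0], [1.0, 0.0]]   # output is independent of the input
mi_identity = mutual_information(uniform_prior, identity_channel)
mi_constant = mutual_information(uniform_prior, constant_channel)
```

The identity channel leaks the whole entropy of the prior, whereas the constant channel leaks nothing, which matches the use of low MI as a proxy for high privacy.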
III-B Notion of utility
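Definition III.5 (Quality of service).
For a discrete space of locations, $\mathcal{X}$, let $d: \mathcal{X}^2 \to \mathbb{R}_{\ge 0}$ be any distortion (loss) metric. Let $X$ be a random variable on $\mathcal{X}$ with PMF $\pi_{\mathcal{X}}$, and let $\mathcal{C}$ denote any randomizing location-privacy mechanism with $\mathcal{C}_{xy}$ giving the probability of a location $x$ being obfuscated as location $y$, i.e., $\sum_{y \in \mathcal{X}} \mathcal{C}_{xy} = 1$ for every $x \in \mathcal{X}$.
We define the quality of service (QoS) of $\mathcal{C}$ for $X$ as the average distortion w.r.t. the loss function $d(\cdot,\cdot)$, given as:
$$\mathrm{AvgD}(X, \mathcal{C}, d) = \sum_{x, y \in \mathcal{X}} \pi_{\mathcal{X}}(x)\, \mathcal{C}_{xy}\, d(x, y).$$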
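The average distortion is the prior-weighted expected loss $\sum_{x,y} \pi(x)\,\mathcal{C}_{xy}\,d(x,y)$. The sketch below evaluates it for a toy setting of three collinear points at unit spacing with a mechanism that reports the middle point for every input; the helper name `average_distortion` and the unit spacing are assumptions made for illustration.

```python
def average_distortion(prior, channel, locations, dist):
    """Expected loss sum_{x,y} prior[x] * channel[x][y] * dist(x, y) (hypothetical helper)."""
    n = len(locations)
    return sum(prior[i] * channel[i][j] * dist(locations[i], locations[j])
               for i in range(n) for j in range(n))

locations = [0.0, 1.0, 2.0]              # three collinear points, unit spacing (assumed)
prior = [1 / 3, 1 / 3, 1 / 3]            # uniform prior
to_middle = [[0.0, 1.0, 0.0]] * 3        # every location is reported as the middle one
identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]

loss_to_middle = average_distortion(prior, to_middle, locations, lambda a, b: abs(a - b))
loss_identity = average_distortion(prior, identity, locations, lambda a, b: abs(a - b))
```

Reporting the true location costs nothing, while collapsing everything to the middle point costs two thirds of the spacing on average.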
Definition III.6 (Full-support probability distribution).
Let $\theta$ be a probability distribution defined on the space $\mathcal{X}$. We call $\theta$ a full-support distribution on $\mathcal{X}$ if $\theta(x) > 0$ for every $x \in \mathcal{X}$.
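Definition III.7 (Iterative Bayesian update [AgarwalIBU]).
Let $\mathcal{C}$ be a privacy mechanism that locally obfuscates locations on the discrete space $\mathcal{X}$ such that $\mathcal{C}_{xy} = P(y \mid x)$ for all $x, y \in \mathcal{X}$. Let $X_1, \ldots, X_n$ be i.i.d. random variables on $\mathcal{X}$ following some PMF $\pi_{\mathcal{X}}$. Let $Y_i$ denote the random variable of the output when $X_i$ is obfuscated with $\mathcal{C}$ for all $i \in \{1, \ldots, n\}$.
Suppose we have a realisation $\{y_1, \ldots, y_n\}$ of $\{Y_1, \ldots, Y_n\}$ produced with $\mathcal{C}$. Let $q$ be the empirical distribution of the locations obtained from the observed $\{y_1, \ldots, y_n\}$. Then the iterative Bayesian update (IBU) is an iterative EM procedure that aims to estimate $\pi_{\mathcal{X}}$ by converging to the maximum likelihood estimate (MLE) for the observed locations under the given mechanism. IBU works as follows:
1. Start with any full-support PMF $\theta^{(0)}$ on $\mathcal{X}$ as an “initial guess”.
2. Iterate
$$\theta^{(t+1)}(x) = \sum_{y \in \mathcal{X}} q(y)\, \frac{\theta^{(t)}(x)\, \mathcal{C}_{xy}}{\sum_{z \in \mathcal{X}} \theta^{(t)}(z)\, \mathcal{C}_{zy}}$$
for all $x \in \mathcal{X}$.
The convergence of IBU has been studied in [AgarwalIBU] and revised in [EhabConvergenceIBU]. In the cases where IBU converges to an MLE, let the limiting estimate of $\pi_{\mathcal{X}}$ be called $\hat{\pi}_{\mathcal{X}}$, i.e., $\hat{\pi}_{\mathcal{X}} = \lim_{t \to \infty} \theta^{(t)}$. Since, for an observed set of noisy locations, it is determined completely by the privatizing mechanism $\mathcal{C}$ and the starting PMF $\theta^{(0)}$, let $\hat{\pi}_{\mathcal{X}}$ be functionally denoted as $\Gamma_{IBU}(\theta^{(0)}, \mathcal{C})$.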
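The IBU iteration can be sketched in a few lines. The code below assumes the standard EM update, in which each estimate is re-weighted by the posterior of the input given every observed output; the function name `ibu`, the fixed iteration count, and the toy binary channel are illustrative assumptions rather than the authors' implementation. The observed distribution is taken to be the exact output distribution of the channel, which idealizes an infinite number of samples.

```python
def ibu(channel, observed, theta0, iterations=1000):
    """Iterative Bayesian update for channel[x][y] and empirical output PMF observed[y]."""
    theta = list(theta0)
    n_in, n_out = len(channel), len(observed)
    for _ in range(iterations):
        # Output distribution predicted by the current estimate
        pred = [sum(theta[x] * channel[x][y] for x in range(n_in)) for y in range(n_out)]
        # Average the Bayesian posterior of each input over the observed outputs
        theta = [sum(observed[y] * theta[x] * channel[x][y] / pred[y]
                     for y in range(n_out) if pred[y] > 0)
                 for x in range(n_in)]
    return theta

channel = [[0.8, 0.2], [0.3, 0.7]]       # toy invertible channel (assumed)
true_pmf = [0.6, 0.4]
# Idealized "empirical" distribution: the exact output distribution of the channel
observed = [sum(true_pmf[x] * channel[x][y] for x in range(2)) for y in range(2)]
estimate = ibu(channel, observed, theta0=[0.5, 0.5])
```

Because this toy channel is invertible and the observed distribution lies in its image, the MLE coincides with the true PMF, and the iteration recovers it from a uniform initial guess.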
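Definition III.8 (Earth mover’s distance [kantarovich]).
Let $\pi_1$ and $\pi_2$ be PMFs defined over a discrete space of locations, $\mathcal{X}$. For a metric $d: \mathcal{X}^2 \to \mathbb{R}_{\ge 0}$, the earth mover’s distance (EMD) or the Kantorovich–Rubinstein metric is defined as:
$$\mathrm{EMD}(\pi_1, \pi_2) = \min_{\mu \in \Omega(\pi_1, \pi_2)} \sum_{x, y \in \mathcal{X}} d(x, y)\, \mu(x, y),$$
where $\Omega(\pi_1, \pi_2)$ is the set of all joint distributions over $\mathcal{X}^2$ whose marginals are $\pi_1$ and $\pi_2$, i.e., $\sum_{y \in \mathcal{X}} \mu(x, y) = \pi_1(x)$ and $\sum_{x \in \mathcal{X}} \mu(x, y) = \pi_2(y)$ for every $x, y \in \mathcal{X}$.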
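In general the EMD requires solving a transportation linear program. For intuition only, the sketch below covers the special case of locations lying on a line, where the optimal transport cost reduces to accumulating the mass that has to cross each gap between consecutive points. The function name `emd_on_line` is hypothetical, and this shortcut does not apply to planar location data as used in the paper.

```python
def emd_on_line(positions, p, q):
    """EMD between PMFs p and q on collinear points, with ground distance |a - b|."""
    order = sorted(range(len(positions)), key=lambda i: positions[i])
    total, carried = 0.0, 0.0
    for a, b in zip(order, order[1:]):
        carried += p[a] - q[a]                      # net mass that must cross the gap a -> b
        total += abs(carried) * (positions[b] - positions[a])
    return total

positions = [0.0, 1.0, 2.0]
emd_ends = emd_on_line(positions, [1.0, 0.0, 0.0], [0.0, 0.0, 1.0])
emd_collapse = emd_on_line(positions, [1 / 3, 1 / 3, 1 / 3], [0.0, 1.0, 0.0])
```

Moving all mass from one end to the other costs the full length of the segment, whereas moving a uniform distribution onto the middle point costs two thirds of the unit spacing.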
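Definition III.9 (Statistical utility).
Suppose we have a privacy mechanism $\mathcal{C}$ that obfuscates location data on the discrete space $\mathcal{X}$. Let $\pi_{\mathcal{X}}$ be the PMF of the original location data sampled from $\mathcal{X}$ and let $\hat{\pi}_{\mathcal{X}}$ be the PMF on $\mathcal{X}$ which estimates $\pi_{\mathcal{X}}$ using IBU for the mechanism $\mathcal{C}$. Then we define the statistical utility of the privacy mechanism $\mathcal{C}$ as $\mathrm{EMD}(\hat{\pi}_{\mathcal{X}}, \pi_{\mathcal{X}})$.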
III-C Optimization of privacy and QoS
We recall that in this work we use mutual information to measure privacy and average distortion to measure QoS of a certain privacy mechanism. Therefore, to optimize privacy and QoS, we wish to find a privacy channel that minimizes mutual information (i.e., maximizes privacy) under an allowed level of average distortion. This optimization problem can be solved using the well-known Blahut-Arimoto algorithm [Blahut72computationof, Arimoto1972AnAF].
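Definition III.10 (Rate-distortion function [ShannonInfoTheory]).
Let $\mathcal{X}$ and $\mathcal{Y}$ be a pair of discrete spaces and $\mathbb{C}$ be the space of all channels encoding from $\mathcal{X}$ to $\mathcal{Y}$. Suppose $X$ is a random variable defined over $\mathcal{X}$ and fix a $d^* \in \mathbb{R}_{\ge 0}$.
The rate-distortion (RD) function for $X$ and $d^*$ under the distortion metric $d$ is defined as:
$$RD(X, d^*) = \min_{\mathcal{C} \in \mathbb{C}\,:\ \mathrm{AvgD}(X, \mathcal{C}, d) \le d^*} I(X; Y_{\mathcal{C}}),$$
where $Y_{\mathcal{C}}$ is the random variable on $\mathcal{Y}$ that denotes the output of the encoding of $X$ with any $\mathcal{C} \in \mathbb{C}$.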
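Definition III.11 (Blahut-Arimoto algorithm [Blahut72computationof, Arimoto1972AnAF]).
Let $X$ be a random variable on the discrete space of locations $\mathcal{X}$ with a PMF $\pi_{\mathcal{X}}$. For a metric $d$ and a fixed $d^*$, we wish to find a channel giving $RD(X, d^*)$. In other words, we wish to find a channel $\hat{\mathcal{C}}$ such that:
$$\hat{\mathcal{C}} = \arg\min_{\mathcal{C} \in \mathbb{C}\,:\ \mathrm{AvgD}(X, \mathcal{C}, d) \le d^*} I(X; Y_{\mathcal{C}}),$$
where $\mathbb{C}$ and $Y_{\mathcal{C}}$ are the same as in Definition III.10. The Blahut-Arimoto algorithm (BA) is an iterative method to find such an optimal channel $\hat{\mathcal{C}}$, given as follows:
1. Start with any stochastic channel $\mathcal{C}^{(0)}$, i.e., $\sum_{y \in \mathcal{X}} \mathcal{C}^{(0)}_{xy} = 1$ for all $x \in \mathcal{X}$.
2. Iterate
$$c_t(y) = \sum_{x \in \mathcal{X}} \pi_{\mathcal{X}}(x)\, \mathcal{C}^{(t)}_{xy}, \qquad \mathcal{C}^{(t+1)}_{xy} = \frac{c_t(y)\, e^{-\beta d(x, y)}}{\sum_{z \in \mathcal{X}} c_t(z)\, e^{-\beta d(x, z)}},$$
where $\beta > 0$ is the negative of the slope of the RD function for a maximum allowance $d^*$ for the average distortion, given as $\beta = -\frac{\mathrm{d}\, RD(X, d^*)}{\mathrm{d}\, d^*}$. We refer to $\beta$ as the loss parameter, which captures and reflects the role of $d^*$ within BA. Let the optimal channel obtained this way be functionally represented as $\Lambda_{BA}(\pi_{\mathcal{X}}, \mathcal{C}^{(0)})$, since, for a fixed $d^*$, the limiting channel is uniquely determined by the choices of the original distribution, $\pi_{\mathcal{X}}$, and the initial channel, $\mathcal{C}^{(0)}$.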
In [csizar], Csiszár proved the convergence of BA when $\mathcal{X}$ is finite, which guarantees that, in the context of our work, BA produces a channel $\hat{\mathcal{C}}$ that maximizes privacy for a given allowance of quality loss.
In [Oya:17:CCS], Oya et al. proved that the limiting channel of BA, $\hat{\mathcal{C}}$, with a loss parameter $\beta$ satisfies $2\beta$-geo-indistinguishability if the chosen metric is a Euclidean metric.
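A minimal sketch of BA, assuming the standard alternating update (output marginal, then exponentially re-weighted channel) parameterized directly by the loss parameter $\beta$ rather than by the distortion bound. The function name `blahut_arimoto`, the fixed iteration count, and the three collinear test points are assumptions for illustration. The final check verifies numerically that the resulting channel respects the $2\beta$-geo-indistinguishability bound on this toy instance.

```python
import math

def blahut_arimoto(prior, dist, beta, iterations=500):
    """Approximate limiting BA channel for prior[x], distance matrix dist[x][y], slope beta."""
    n = len(prior)
    kernel = [[math.exp(-beta * dist[x][y]) for y in range(n)] for x in range(n)]
    channel = [[1.0 / n] * n for _ in range(n)]      # uniform initial channel
    for _ in range(iterations):
        # Output marginal induced by the current channel
        out = [sum(prior[x] * channel[x][y] for x in range(n)) for y in range(n)]
        # Re-weight the marginal by exp(-beta * distance) and renormalize each row
        norm = [sum(out[z] * kernel[x][z] for z in range(n)) for x in range(n)]
        channel = [[out[y] * kernel[x][y] / norm[x] for y in range(n)] for x in range(n)]
    return channel

points = [0.0, 1.0, 2.0]                              # toy collinear locations (assumed)
dist = [[abs(a - b) for b in points] for a in points]
beta = 1.0
channel = blahut_arimoto([0.5, 0.3, 0.2], dist, beta)

# Check the 2*beta geo-indistinguishability inequality for every triple (x, x', y)
geo_ind_ok = all(channel[x][y] <= math.exp(2 * beta * dist[x][z]) * channel[z][y] + 1e-12
                 for x in range(3) for z in range(3) for y in range(3))
```

Larger values of `beta` put more weight on nearby outputs, trading privacy for a smaller average distortion.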
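|Notation|Meaning|
|LPPM|Location-privacy preserving mechanism|
|PMF|Probability mass function|
|IBU|Iterative Bayesian update algorithm|
|MLE|Maximum likelihood estimate|
|QoS|Quality of service|
|EMD|Earth mover’s distance|
|AvgD|Average distortion under a given measure|
|w.r.t.|With respect to|
|a.k.a.|Also known as|
|w.l.o.g.|Without loss of generality|
|$\mathcal{X}$|Finite space of source locations|
|$\mathbb{C}_{\mathcal{X}}$|Space of all stochastic channels on $\mathcal{X}$|
|$\Pi_{\mathcal{X}}$|Simplex of all full-support PMFs on $\mathcal{X}$|
|$n$|Number of samples|
|$\{x_1, \ldots, x_n\}$|Sample of original locations|
|$d^*$|Maximum average distortion|
|$\beta$|Slope parameter of RD function|
|$\pi_{\mathcal{X}}$|PMF of the original locations (true PMF)|
|$\mathcal{C}^{U}$|Uniform channel over $\mathcal{X}$, i.e., $\mathcal{C}^{U}_{xy} = 1/|\mathcal{X}|$|
|$\Lambda_{BA}(\theta, \mathcal{C})$|Limiting channel by BA starting with $\theta$, $\mathcal{C}$|
|$\Gamma_{IBU}(\theta, \mathcal{C})$|MLE of $\pi_{\mathcal{X}}$ by IBU starting with $\theta$ under $\mathcal{C}$|
|$\Gamma^{(t)}_{IBU}(\theta, \mathcal{C})$|Estimate by the $t$-th iteration of IBU starting with $\theta$ under $\mathcal{C}$|
|$N_{BA}$|Number of iterations needed for BA to converge|
|$N_{IBU}$|Number of iterations needed for IBU to converge|
|$\theta_t$|Estimate of $\pi_{\mathcal{X}}$ by Optima3 after $t$ iterations starting with $\theta_0$|
|$\Pi^{N}_{\mathcal{X}}$|Discretized $\Pi_{\mathcal{X}}$ with each component of the PMFs divided in $N$ parts|
|$\mathcal{P}(\theta' \mid \theta)$|Prob. of Optima3 estimating $\theta'$ starting from $\theta$|
|$\Phi$|Transition matrix of Optima3 as a Markov chain over $\Pi^{N}_{\mathcal{X}}$|
|$\psi$|Stationary distribution of the Markov chain of Optima3|
|$\Theta_t$|PMF in $\Pi^{N}_{\mathcal{X}}$ estimated by Optima3 after $t$ iterations|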
IV Duality between IBU and BA
satisfying geo-indistinguishability, as proposed by Ghosh et al. in [GhoshOptimalPrivacyUtility:2012]. Flipping the roles of the input and the output, and taking the input distribution to be the empirical distribution of the observed locations obfuscated by this mechanism, we reduce (4) to the iterative step of IBU. As we consider a finite space of locations, we benefit from the fact that BA converges, which implies that its limit exists and is unique. This equivalence with IBU’s iterative step shows that the limit gives a unique MLE for the observed locations under the mechanism. Thus, we discover a duality between IBU and BA, and exploit it to obtain a new approach to proving the following theorem, complementing the proof given by ElSalamouny et al. for Corollary 2 in [EhabConvergenceIBU].
Theorem IV.1 (Corollary 2 in [EhabConvergenceIBU]). IBU always converges to the unique MLE of the observed distribution for location data on a finite space under the planar geometric mechanism with the Euclidean metric.
Proof. A direct consequence of the duality between IBU and BA, as described above. ∎
V Three-way optimal mechanism: Optima3
Being motivated by the duality of IBU and BA, we proceed to propose an iterative method, which we name Optima3, in order to furnish a geo-indistinguishable location-privacy channel that maximizes privacy for a given level of quality loss and preserves the statistical utility of the dataset.
In the scope of this paper, we consider location data sampled from a finite and, therefore, discrete space $\mathcal{X}$. Let the sampling PMF of the original locations be $\pi_{\mathcal{X}}$, which we shall refer to as the true distribution or true PMF of the data. We assume a black-box setting for sampling from the true distribution, i.e., we assume that we can sample locations from $\mathcal{X}$ following the true distribution without explicitly knowing $\pi_{\mathcal{X}}$; these samples act as auxiliary data to construct our mechanism. Such a setting is common in the literature [EhabConvergenceIBU, EhabGIBU], and is similar to what Oya et al. adopted in [Oya:19:EuroSnP]. In this work, to be able to achieve geo-indistinguishability, we adhere to the classic Euclidean metric to measure the ground distance between two locations. Optima3 proceeds as follows:
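1. Start with any stochastic channel $\mathcal{C}^{(0)}$ and a full-support PMF $\theta_0$ defined on $\mathcal{X}$. Set $t = 0$.
2. In step $t$:
i) For a fixed $d^*$, the maximum average distortion, compute the limiting channel of BA for the current estimate, $\hat{\mathcal{C}}^{(t)} = \Lambda_{BA}(\theta_t, \mathcal{C}^{(0)})$.
ii) Do a black-box sampling of $n$ i.i.d. locations from $\mathcal{X}$ following $\pi_{\mathcal{X}}$, and obtain true location samples $\{x_1, \ldots, x_n\}$.
iii) Obfuscate $\{x_1, \ldots, x_n\}$ with $\hat{\mathcal{C}}^{(t)}$ to produce noisy samples $\{y_1, \ldots, y_n\}$, which give the empirical distribution $q$ of the observed locations.
iv) Estimate $\theta_{t+1}$ from $q$ using IBU under $\hat{\mathcal{C}}^{(t)}$, and set $t = t + 1$.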
The targeted three-way optimal channel obtained with Optima3 exists and it guarantees geo-indistinguishability.
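The overall loop can be sketched as follows: alternate between BA for the current estimate, black-box sampling and obfuscation, and IBU on the noisy samples. This is a sketch under stated assumptions, not the authors' implementation: the function names (`ba_channel`, `ibu`, `optima3`), the fixed iteration counts, the use of the loss parameter `beta` in place of the distortion bound, and the choice to start IBU from the current estimate are all illustrative.

```python
import math
import random

def ba_channel(prior, dist, beta, iters=300):
    """BA started from the uniform channel (whose output marginal is uniform)."""
    n = len(prior)
    k = [[math.exp(-beta * dist[x][y]) for y in range(n)] for x in range(n)]
    out = [1.0 / n] * n
    ch = [[1.0 / n] * n for _ in range(n)]
    for _ in range(iters):
        norm = [sum(out[z] * k[x][z] for z in range(n)) for x in range(n)]
        ch = [[out[y] * k[x][y] / norm[x] for y in range(n)] for x in range(n)]
        out = [sum(prior[x] * ch[x][y] for x in range(n)) for y in range(n)]
    return ch

def ibu(ch, q, theta, iters=300):
    """IBU on the empirical noisy distribution q, started from theta."""
    n = len(theta)
    for _ in range(iters):
        pred = [sum(theta[x] * ch[x][y] for x in range(n)) for y in range(n)]
        theta = [sum(q[y] * theta[x] * ch[x][y] / pred[y] for y in range(n) if pred[y] > 0)
                 for x in range(n)]
    return theta

def optima3(sample_true, dist, beta, n_samples, rounds, rng):
    """Alternate BA, black-box sampling with obfuscation, and IBU for a number of rounds."""
    n = len(dist)
    theta = [1.0 / n] * n                                   # uniform starting PMF
    ch = None
    for _ in range(rounds):
        ch = ba_channel(theta, dist, beta)                  # channel for the current estimate
        xs = [sample_true() for _ in range(n_samples)]      # black-box samples of true locations
        ys = [rng.choices(range(n), weights=ch[x])[0] for x in xs]   # local obfuscation
        q = [ys.count(y) / n_samples for y in range(n)]     # empirical noisy distribution
        theta = ibu(ch, q, theta)                           # estimate fed to the next round
    return theta, ch

rng = random.Random(0)
true_pmf = [0.5, 0.3, 0.2]                                  # unknown to the method itself
points = [0.0, 1.0, 2.0]
dist = [[abs(a - b) for b in points] for a in points]
estimate, channel = optima3(lambda: rng.choices(range(3), weights=true_pmf)[0],
                            dist, beta=2.0, n_samples=2000, rounds=5, rng=rng)
```

The true PMF is only accessed through the sampling callback, mirroring the black-box assumption; the returned estimate and channel are valid probability objects, and how close the estimate is to the true PMF depends on the number of samples and on `beta`.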
It is straightforward to observe that the goal of Optima3 is to produce a privacy channel that minimizes mutual information for a maximum average distortion. To avoid biasing BA towards any particular starting point, we feed BA, in each iteration of Optima3, with the uniform channel $\mathcal{C}^{U}$, i.e., $\mathcal{C}^{U}_{xy} = 1/|\mathcal{X}|$ for all $x, y \in \mathcal{X}$. Let the privacy channel generated this way, for a fixed maximum average distortion $d^*$ and always starting from a uniform initial channel, after $t$ iterations, be functionally represented as $\hat{\mathcal{C}}^{(t)}(\theta_0)$, as the entire Optima3 method is determined by just the starting full-support PMF $\theta_0$.
To evaluate the statistical utility of $\hat{\mathcal{C}}^{(t)}(\theta_0)$, we measure the EMD between the true distribution $\pi_{\mathcal{X}}$ and the estimated PMF $\theta_t$ at the end of $t$ iterations of Optima3. Thus, the quantity $\mathrm{EMD}(\theta_t, \pi_{\mathcal{X}})$ parameterizes the utility of $\hat{\mathcal{C}}^{(t)}(\theta_0)$ for the service providers. In this work we use the same Euclidean distance as the underlying metric for computing both the EMD and the average distortion. This consistent use of the Euclidean distance ties together the notion of utility on the users’ side and that on the service providers’ side. The opportunity to use the same metric to quantify both QoS and statistical utility for a privacy channel was one of the motivations for using EMD to compare the true and the estimated PMFs. On this note, we observe one of the most crucial properties of the optimal channel generated by BA in the context of IBU.
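Theorem V.2. For a starting PMF $\theta$ on a finite $\mathcal{X}$ and a uniform initial channel $\mathcal{C}^{U}$, let the limiting channel generated by BA be $\hat{\mathcal{C}}$. Then there is a unique MLE for a set of observed locations on $\mathcal{X}$ which are obfuscated with $\hat{\mathcal{C}}$.
Proof. In Appendix A. ∎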
Theorem V.2 shows that in each iteration of Optima3, IBU will estimate the unique MLE given by the noisy locations under the optimal channel engendered by BA. This is one of the major aspects where Optima3 triumphs over the method proposed by Oya et al. in [Oya:19:EuroSnP] which relies on the flawed theoretical results by [AgarwalIBU] and adheres to the idea of optimal LPPMs presented by Shokri et al. in [ShokriQuantifyingLocPriv2011]. As discussed earlier, ElSalamouny et al. illustrate various LPPMs which would be optimal by Shokri et al.’s standards in [ShokriQuantifyingLocPriv2011] but may not have a unique MLE for a given set of observed data obfuscated by them. This is one of the principal reasons why the EM method capitalized on by Oya et al. in [Oya:19:EuroSnP] is not reliable to estimate the true PMF to a desirable degree of accuracy.
VI Optima3 as a Markov chain
Remark 2 and Theorem V.2 attest that, in a single iteration, Optima3 produces a unique channel optimizing location-privacy and QoS, and that from there we converge to a unique MLE of the observed noisy locations sanitized with that channel. Now, in order to examine the convergence of Optima3, we model it as a Markov process.
First of all, we acknowledge that discretizing the probability simplex is reasonable under realistic computational constraints. Let $\Pi_{\mathcal{X}}$ be the probability simplex of full-support PMFs on the finite space of locations $\mathcal{X}$. We discretize the interval $[0, 1]$ into $N$ equal intervals, meaning that we allow $\theta(x)$ to take only values that are positive integer multiples of $1/N$, for every $x \in \mathcal{X}$ and for every $\theta \in \Pi_{\mathcal{X}}$. Note that, according to our needs or the available computational capacity, $N$ can be made as large as desired; the only requirement is that $N$ be finite. For example, when Python is used for the computation, $N$ can be taken to be very large.
An important consequence of such a discretization of the simplex of full-support PMFs on $\mathcal{X}$ is that it guarantees the finiteness of the probability simplex. Let $\Pi^{N}_{\mathcal{X}}$ be the discretized probability simplex on $\mathcal{X}$. Note that $|\Pi^{N}_{\mathcal{X}}| \le N^{|\mathcal{X}|}$ and, hence, the size of the discretized probability simplex is finite, since we are working on a finite space of locations. An alternative perspective is that $N$ introduces a discretized mesh within the continuous $\Pi_{\mathcal{X}}$, and therefore every full-support PMF in $\Pi^{N}_{\mathcal{X}}$ lies on the discrete mesh. The coarseness of this mesh is tuned by how large or small $N$ is. In particular, the mesh becomes dense in $\Pi_{\mathcal{X}}$ as $N \to \infty$. For a given $N$, let the size of the discretized probability simplex on $\mathcal{X}$ be $K$, i.e., $|\Pi^{N}_{\mathcal{X}}| = K$. Figure 1 illustrates an example of such a discretization of a probability simplex on a location space of size 3 by introducing a mesh on a continuous probability simplex. In this example, $K$ is the number of points of intersection of the grid inside the triangle denoting the probability simplex for full-support PMFs on three locations.
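To make the finiteness of the discretized simplex concrete, the sketch below enumerates every full-support PMF whose entries are positive integer multiples of $1/N$. This reading of the discretization and the helper name `discretized_simplex` are assumptions for illustration. Under this reading the number of grid points is the number of compositions of $N$ into as many positive parts as there are locations, i.e., a binomial coefficient.

```python
from math import comb

def discretized_simplex(k, N):
    """All full-support PMFs on k locations whose entries are positive multiples of 1/N."""
    def compositions(parts, total):
        # Ordered ways to write `total` as a sum of `parts` positive integers
        if parts == 1:
            if total >= 1:
                yield (total,)
            return
        for first in range(1, total - parts + 2):
            for rest in compositions(parts - 1, total - first):
                yield (first,) + rest
    return [tuple(c / N for c in comp) for comp in compositions(k, N)]

grid = discretized_simplex(3, 5)     # three locations, mesh width 1/5
grid_size = len(grid)                # equals comb(N - 1, k - 1) under this reading
```

For three locations and a mesh width of one fifth, the grid contains exactly the interior lattice points of the triangle, and its size grows polynomially in `N` for a fixed number of locations.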
Now we relate this idea of making the probability simplex finite to the aim of modelling the long-term behaviour of Optima3. Examining how our method works, we see that, starting with any PMF $\theta \in \Pi^{N}_{\mathcal{X}}$, the chance of ending up at some $\theta' \in \Pi^{N}_{\mathcal{X}}$ at the end of one iteration is determined by the noise injected by the channel generated by BA. In other words, in iteration $t$ of Optima3, feeding $\theta_t$ to BA and starting it with the uniform channel $\mathcal{C}^{U}$, the limiting channel of BA, $\hat{\mathcal{C}}^{(t)}$, is uniquely determined (Remark 2), and once we have the noisy locations obfuscated with $\hat{\mathcal{C}}^{(t)}$, the limiting distribution produced with IBU is the unique MLE of the observed locations under $\hat{\mathcal{C}}^{(t)}$ (Theorem V.2). Thus, the only source of randomness in the cycle of Optima3 is the addition of noise by the geo-indistinguishable LPPM $\hat{\mathcal{C}}^{(t)}$. In particular, the probability of ending up in a certain $\theta'$ after one cycle of Optima3 starting from some $\theta$ is some function of the optimal channel that BA converged to within this cycle.
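Let us go a step further and make the notion of probability distributions one level more abstract by introducing $\mathcal{P}(\cdot \mid \theta)$, a probability distribution over $\Pi^{N}_{\mathcal{X}}$ for every $\theta \in \Pi^{N}_{\mathcal{X}}$. $\mathcal{P}(\theta' \mid \theta)$, essentially, gives the probability of a cycle of Optima3 landing in a certain PMF $\theta'$ starting from some PMF $\theta$. This immediately allows us to look at Optima3 as a Markov chain over the finite state space $\Pi^{N}_{\mathcal{X}}$ with a transition matrix $\Phi$ such that $\Phi_{\theta \theta'} = \mathcal{P}(\theta' \mid \theta)$ for every pair of PMFs $\theta, \theta' \in \Pi^{N}_{\mathcal{X}}$. With this Markov chain interpretation of Optima3, let $\Theta_t$ be the random variable denoting the state in $\Pi^{N}_{\mathcal{X}}$ we are at in the $t$-th iteration of Optima3.
A key element to note here is that, due to the discretization of the probability simplex into $\Pi^{N}_{\mathcal{X}}$, Optima3 can be modelled as a discrete-time Markov chain on a finite state space.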
Definition VI.1 (Irreducible Markov chain).
A discrete-time Markov chain with a transition matrix $\Phi$ over a finite state space $S$ is irreducible if for all its states $i, j \in S$, there exists $t \in \mathbb{N}$ such that $\Phi^{t}_{ij} > 0$, where $\Phi^{t}$ denotes the transition matrix for the Markov chain at time $t$.
Definition VI.2 (Period of Markov chain).
For a discrete-time Markov chain with a transition matrix $\Phi$ over a finite state space, let $T_i = \{t \ge 1 : \Phi^{t}_{ii} > 0\}$ for any state $i$ be the set of all time-steps at which the Markov chain has a non-zero probability of starting and ending in $i$. Then the period of state $i$ is $\gcd(T_i)$.
Definition VI.3 (Aperiodic Markov chain).
A discrete-time Markov chain over a finite state space is called aperiodic if the period of all of its states is 1.
Definition VI.4 (Stationary distribution).
For a discrete-time Markov chain with a transition matrix $\Phi$ over a finite state space, a distribution $\psi$ on the state space is called a stationary distribution if $\psi \Phi = \psi$.
Definition VI.5 (First hitting time).
For a discrete-time Markov chain $\{Z_t\}_{t \ge 0}$ with a transition matrix $\Phi$ over a finite state space, let the first hitting time for a state $i$ be defined as $\tau_i = \min\{t \ge 1 : Z_t = i\}$.
Optima3 is an irreducible Markov chain.
In Appendix A. ∎
Optima3 is an aperiodic Markov chain.
In Appendix A. ∎
Now we have the foundations laid to analyze the long-term behaviour of Optima3 seen as a Markov chain. In particular, we now aim to investigate the limiting behaviour of Optima3 if it is left to run for long enough. In this regard, we appeal to a well-known result in the theory of Markov chains: an irreducible discrete-time Markov chain over a finite state space has a unique stationary distribution (Theorem 3.3 in [FreedmanMCConverge]). As a consequence, we have the following theorem.
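Theorem VI.3. Optima3, seen as a discrete-time Markov chain with transition matrix $\Phi$ over the finite state space $\Pi^{N}_{\mathcal{X}}$, has a unique stationary distribution $\psi$ over $\Pi^{N}_{\mathcal{X}}$ and it is given by
$$\psi(\theta) = \frac{1}{\mathbb{E}[\tau_{\theta} \mid \Theta_0 = \theta]} \qquad (5)$$
for every $\theta \in \Pi^{N}_{\mathcal{X}}$.
Proof. Immediate from Corollary 39 and Theorem 54 by Serfozo in [serfozo2009basics]; explicitly, Theorem 3.3 in [FreedmanMCConverge]. ∎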
Exploiting the fact that we have modelled Optima3 as a Markov chain over a finite state space, as a direct consequence of Theorem VI.3, we obtain the following theorem.
Theorem VI.4. Let $\psi$ be the unique stationary distribution for the discrete-time Markov chain of Optima3 over the finite state space $\Pi^{N}_{\mathcal{X}}$. Then, over time, the estimation of the true PMF given by Optima3 follows the distribution $\psi$, i.e., $\lim_{t \to \infty} P[\Theta_t = \theta] = \psi(\theta)$ for every $\theta \in \Pi^{N}_{\mathcal{X}}$.
Proof. An immediate corollary of the Perron–Frobenius theorem [wikiPerronFrobeniustheorem]. An explicit proof has been given by Freedman in Theorem 4.9 of [FreedmanMCConverge]. ∎
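The independence of the limit from the starting state can be illustrated on a small irreducible and aperiodic chain. The three-state transition matrix below is an arbitrary toy example, not the transition matrix of Optima3, and the helper names are hypothetical; the sketch simply propagates two different initial distributions and compares the results with the stationary distribution obtained by hand from detailed balance.

```python
def step(distribution, P):
    """One step of the chain: row vector times transition matrix."""
    n = len(P)
    return [sum(distribution[i] * P[i][j] for i in range(n)) for j in range(n)]

def limiting_distribution(start, P, steps=200):
    """Distribution of the chain after a fixed number of steps from `start`."""
    current = list(start)
    for _ in range(steps):
        current = step(current, P)
    return current

# Toy irreducible, aperiodic chain on three states (assumed for illustration)
P = [[0.50, 0.50, 0.00],
     [0.25, 0.50, 0.25],
     [0.00, 0.50, 0.50]]

from_first = limiting_distribution([1.0, 0.0, 0.0], P)
from_last = limiting_distribution([0.0, 0.0, 1.0], P)
stationary = [0.25, 0.5, 0.25]       # solves psi * P = psi for this toy matrix
```

Both starting points end up at the same distribution, which is also left unchanged by one more step of the chain.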
Let us interpret Theorem VI.4. One notable consequence is that, irrespective of the initial PMF from which we start the first cycle of Optima3, after enough iterations the estimate of the true PMF produced by our method follows a fixed distribution that can be computed independently and beforehand using (5).
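The independence from the starting point can be illustrated on a toy chain. The sketch below is not the Markov chain of Optima3: the 3-state transition matrix `P` and the helper names are hypothetical, chosen only because the chain is irreducible and aperiodic. It propagates two different initial distributions and shows that both approach the same stationary distribution.

```python
def step(dist, P):
    """One step of the chain: the row vector dist multiplied by P."""
    n = len(P)
    return [sum(dist[i] * P[i][j] for i in range(n)) for j in range(n)]

def run(dist, P, iters=200):
    """Distribution of the chain after `iters` steps when started from dist."""
    for _ in range(iters):
        dist = step(dist, P)
    return dist

# Hypothetical irreducible, aperiodic chain on three states.
P = [[0.50, 0.50, 0.00],
     [0.25, 0.50, 0.25],
     [0.00, 0.50, 0.50]]

from_first = run([1.0, 0.0, 0.0], P)  # start deterministically in state 0
from_last = run([0.0, 0.0, 1.0], P)   # start deterministically in state 2

print(from_first)
print(from_last)
```

For this particular matrix, solving the stationarity condition by hand gives (0.25, 0.5, 0.25), and both runs agree with it up to floating-point error.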
Moreover, having a sufficiently large number of samples, in the iteration, for an arbitrary , the black-box sampling of the locations, as described in step 2.ii) in the description of Optima3 in Section V, should empirically approximate the true PMF . This would also imply that the empirical distribution of the noisy locations, obfuscated with , is expected to be close to .
With enough samples we expect to reach this scenario, and once we do, starting from any PMF , the MLE of the observed locations will be, approximately, the true PMF , and IBU estimates it quite efficiently [EhabConvergenceIBU]. Mathematically, as , in the first iteration of Optima3
and the cycle repeats in the subsequent iterations of Optima3.
In summary, the intuition is that, for a sufficiently large number of location samples, the expected hitting time for in the Markov chain is fairly small, i.e., we expect to arrive at an approximation of (hit) the true PMF within a rather small number of iterations of Optima3. This implies is large and, hence, we can make the following remark.
Combining Theorem VI.3 with the above discussion, we can say that the privacy channel generated by Optima3, which optimizes privacy and QoS, gives the best statistical utility with a very high probability if we have a sufficiently large number of samples, thus establishing a three-way optimality. In other words, letting be the unique MLE of for a given set of observed locations and a geo-indistinguishable channel produced by BA, for any in , we will have:
VII Further mathematical analysis
In this section, we examine further the mathematical properties of our proposed method. In order to measure the distances between PMFs within intermediate steps of Optima3, we shall use the total variation distance and the norm.
Definition VII.1 (Total variation).
For any two PMFs , the total variation (TV) distance between them, is defined as:
Definition VII.2 ( norm).
For any two PMFs , the norm between them, is defined as:
Note that for any pair of , we have .
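As a numerical illustration of how the two distances relate, the sketch below assumes that the norm in question is the ℓ1 norm and that the TV distance is the largest difference in probability that the two PMFs assign to any event. Under these assumptions the standard identity is that the ℓ1 distance equals twice the TV distance. The PMFs `p` and `q` and the function names are hypothetical.

```python
def l1_distance(p, q):
    """Sum of absolute pointwise differences between two PMFs."""
    return sum(abs(a - b) for a, b in zip(p, q))

def tv_distance(p, q):
    """Largest gap in probability over all events.

    On a finite space the maximum is attained on the event {x : p(x) > q(x)},
    so it suffices to add up the positive parts of p - q.
    """
    return sum(max(a - b, 0.0) for a, b in zip(p, q))

p = [0.50, 0.25, 0.25]
q = [0.25, 0.25, 0.50]

print(tv_distance(p, q), l1_distance(p, q))
```

The values are chosen to be exactly representable in floating point, so the factor of two can be checked directly.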
VII-A Properties of BA under Optima3
In [PalaiyanurDistCts], Palaiyanur et al. studied the uniform continuity of the rate-distortion function. In order to elaborate on the analytical behaviour of BA under Optima3, we shall take advantage of the results in [PalaiyanurDistCts]. In particular, we note that using any distortion metric on , for every location , we have such that . Therefore, we satisfy Condition (Z) proposed by Palaiyanur et al. in Section II of [PalaiyanurDistCts]. This allows us to apply Lemma 2 of [PalaiyanurDistCts].
As is finite, we can define the maximum distortion on as . Let us fix as the maximum allowance for the average distortion. Let us, further, define the minimum possible non-zero distortion on as . Then we have the following result.
Let such that . Let and be random variables on with PMFs and , respectively. Then we have:
Immediate from Lemma 2 of [PalaiyanurDistCts]. ∎
Let be positive constants. Let be such that . Then
where is the Lambert W function of some integer order .
In Appendix A. ∎
Let us fix a maximum average distortion under a chosen distortion metric for the subsequent analysis. We are now prepared to show an interesting property of the rate-distortion function by setting and . Then we can assert that it is uniformly continuous if we focus only on the “small jumps” of the RD function.
For satisfying , there exists such that for all ,
where and are random variables on with PMFs and , respectively.
For a starting PMF and an initial channel, we know that BA converges to the channel that attains the unique RD function [csizar]. We can, therefore, conclude that the uniform continuity of the RD function implies the uniform continuity of BA under the same condition of (jump) as in Corollary VII.2.1.
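For readers who want to experiment with the behaviour of BA discussed above, the following is a minimal sketch of the classical Blahut-Arimoto iteration in its Lagrangian form. It is an illustration under stated assumptions rather than the implementation used in Optima3: the parameter `beta` is assumed to play the role of the loss parameter (a larger value penalizes distortion more strongly), and the four locations on a line, the prior `p`, the distortion matrix `d`, and all function names are hypothetical.

```python
from math import exp

def blahut_arimoto(p, d, beta, iters=50):
    """Alternate between the channel update and the output-marginal update."""
    n = len(p)
    q = [1.0 / n] * n  # output marginal induced by a uniform starting channel
    C = []
    for _ in range(iters):
        # Channel update: C[x][y] is proportional to q[y] * exp(-beta * d[x][y]).
        C = []
        for x in range(n):
            w = [q[y] * exp(-beta * d[x][y]) for y in range(n)]
            z = sum(w)
            C.append([v / z for v in w])
        # Marginal update: q[y] is the probability of reporting y under p and C.
        q = [sum(p[x] * C[x][y] for x in range(n)) for y in range(n)]
    return C

def average_distortion(p, C, d):
    """Expected distortion between true and reported locations."""
    n = len(p)
    return sum(p[x] * C[x][y] * d[x][y] for x in range(n) for y in range(n))

p = [0.4, 0.3, 0.2, 0.1]                                # hypothetical prior
d = [[abs(x - y) for y in range(4)] for x in range(4)]  # distance on a line
noisy = blahut_arimoto(p, d, beta=0.5)
sharp = blahut_arimoto(p, d, beta=2.0)

print(average_distortion(p, noisy, d), average_distortion(p, sharp, d))
```

In this toy setting the channel obtained with the larger `beta` has the lower average distortion, which matches the qualitative trade-off between injected noise and distortion described in the text.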
VII-B Properties of IBU under Optima3
To compare successive iterations of IBU, we shall use the TV distance between them. For notational convenience, let us extend the functional representation of IBU , as introduced in Definition III.7, as follows. For any LPPM over the finite space and starting with any full-support , let denote the iteration of IBU, as in (1), for all . Therefore, in the context of any cycle of Optima3, in one cycle
Suppose is a privacy channel over . Then, having a sufficiently large number of samples, every step of IBU is probabilistically uniformly continuous w.r.t. , i.e., as , for all , there exists w.p. 1 such that for all :
In Appendix A. ∎
As an immediate corollary of Lemma VII.3, we can elevate the uniform continuity of a single step of IBU to the entire method and formally assert the following.
For an optimal privacy channel , derived with BA, defined over , if we have a sufficiently large number of samples, IBU is a probabilistically uniformly continuous transformation w.r.t. , i.e., as , for all , there exists a w.p. 1 such that for every :
In Appendix A. ∎
For an optimal privacy channel over generated with BA, with a sufficiently large number of samples, IBU is a Lipschitz continuous transformation w.r.t. , i.e., as , there exists constant w.p. 1 such that for any pair of PMFs :
In Appendix A. ∎
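The IBU step analysed in this subsection can be sketched as follows. This is a minimal illustration of the standard iterative Bayesian update (an expectation-maximization iteration for a known channel), assuming it coincides with the update in (1). The 2×2 channel `C`, the observed `counts`, and the function name `ibu` are hypothetical and are not taken from the experiments.

```python
def ibu(C, counts, theta0, iters=500):
    """Iterative Bayesian update for a known channel C[x][y] = P(report y | true x).

    counts[y] is the number of times the noisy value y was observed and
    theta0 is a full-support starting guess for the true PMF.
    """
    n_in, n_out = len(C), len(C[0])
    total = sum(counts)
    emp = [c / total for c in counts]  # empirical PMF of the noisy data
    theta = theta0[:]
    for _ in range(iters):
        # Predicted distribution of the noisy data under the current estimate.
        out = [sum(theta[x] * C[x][y] for x in range(n_in)) for y in range(n_out)]
        # Average, over the observations, of the posterior of each true value.
        theta = [theta[x] * sum(emp[y] * C[x][y] / out[y]
                                for y in range(n_out) if out[y] > 0)
                 for x in range(n_in)]
    return theta

C = [[0.8, 0.2],
     [0.3, 0.7]]
# A true PMF of (0.6, 0.4) pushed through C gives noisy frequencies (0.6, 0.4).
counts = [600, 400]
estimate = ibu(C, counts, [0.5, 0.5])

print(estimate)
```

Starting from the uniform guess, the iterates in this toy example approach (0.6, 0.4), the PMF whose image under `C` matches the observed frequencies.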
VII-C Compiling BA and IBU under Optima3
Having discussed several desirable analytical properties of the RD function, BA, and IBU, we are now in a position to comment on the uniform continuity of Optima3.
Optima3 is a uniformly continuous transformation w.r.t. norm, i.e., supposing Optima3 runs for iterations, for all there exists such that for every :
In Appendix A. ∎
VIII Experimental results
In this section, we describe the empirical results obtained by carrying out experimental analysis to illustrate and validate the working of our proposed method. We perform our experiments on real location data from the Gowalla dataset [Gowalla:online, cho2011friendship]. We consider Gowalla check-ins from (i) a northern part of San Francisco bounded by latitudes (37.7228, 37.7946) and longitudes (-122.5153, -122.3789) covering an area of 12 km × 8 km discretized with a grid; (ii) a central part of Paris bounded by latitudes (48.8286, 48.8798) and longitudes (2.2855, 2.3909) covering an area of 8 km × 6 km discretized with a grid. In this setting, we work with 123,108 check-in locations in San Francisco and 10,260 check-in locations in Paris. Figure 1(a) shows the particular points of check-ins from Paris and San Francisco and Figure 1(b) displays the distribution with which these location data are spread across the two cities.
We implemented Optima3 on the datasets from Paris and San Francisco separately to judge its performance on real location data. For the location dataset from Paris, 15 cycles of Optima3 were run, where each cycle comprised 8 iterations of BA and 10 iterations of IBU. In the case of San Francisco, we ran Optima3 for 8 cycles, with 5 iterations each of BA and IBU in every cycle. In both cases, we assigned the value of the loss parameter parameterizing the maximum allowance of average distortion, , to be and . This was done to test the performance of Optima3 in estimating the true PMF under two different levels of privacy. Each experiment was run for 5 rounds to account for the randomness of the sampling and obfuscation. In each cycle of Optima3 across all the settings, BA was initiated with the uniform channel and a uniform distribution over the locations as the “starting guess” of the true distribution.
The larger of the two values of the loss parameter gives rise to a geo-indistinguishable location-privacy mechanism with BA that injects less noise than that produced with the smaller value. As a result, we obtain a more accurate estimation of the true PMF via Optima3 for the larger value than for the smaller one, as less local noise has been injected for the former than for the latter. However, in both cases, the EMD between the true and the estimated PMFs is very low, indicating a good performance of the mechanism produced with Optima3 in preserving the statistical utility of the data. Moreover, for both Paris and San Francisco, Optima3 seems to significantly improve its estimation of the true PMF with every iteration until it converges to the MLE of the observed locations.
Comparing Figure 3 with Figure 1(b), we see that the estimations of the true distributions of the locations in Paris and San Francisco by IBU under Optima3 for both the settings of the loss parameter are fairly accurate. However, as we would expect, the privacy mechanism obtained under Optima3 harbours better statistical utility for than for because of the level of local noise injected.
We now turn to Figures 4 and 5 to analyze the performance of Optima3 in preserving the statistical utility for the location datasets of Paris and San Francisco, respectively. Figures 3(b) and 3(a) show the behaviour of the EMD between the true distribution of the locations in Paris and its estimate by IBU under Optima3 in each of its 15 cycles. One of the most crucial observations here is that the EMD between the true and the estimated PMFs seems to decrease with the number of iterations and finally to converge. This implies that the PMFs estimated by Optima3 improve at the end of each cycle and go on to converge to the MLE of the noisy locations under the channel produced by BA, which estimates the true PMF of the locations. Empirically, this suggests the convergence of the entire method. This is a major difference from the work of [Oya:19:EuroSnP] which, as we pointed out before, may encounter an LPPM that is optimal according to the standards set by Shokri et al. in [ShokriQuantifyingLocPriv2011] but for which the EM method used to estimate the true distribution fails to converge, as illustrated in Example II.1.
In particular, we observe a significant improvement in the estimation of the true PMF after the first cycle of Optima3, after which the estimated PMF converges to the MLE of the observed locations, following quite a stable trend. In order to visualize this road to convergence of Optima3 more distinctly, in Figures 3(d) and 3(c) we focus on the second iteration of Optima3 onward for and , respectively.
We observe exactly the same trend for the San Francisco dataset. Figures 4(a) and 4(b) show the behaviour of the statistical utility of the privacy channel generated by BA under each of the 8 cycles of Optima3 for and , respectively. To magnify the path to convergence, we show the behaviour of Optima3 from its second cycle onward in Figures 4(c) and 4(d) for and , respectively.
The explicit values (up to 5 significant digits after the decimal) of the EMDs between the true and the estimated PMFs on the location data from Paris and San Francisco for both the settings of the loss parameter can be found in Tables III and IV, respectively.
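[Table III: EMD between the true and the estimated PMFs for the Paris dataset; columns Round 1 to Round 5 for each of the two settings of the loss parameter.]
[Table IV: EMD between the true and the estimated PMFs for the San Francisco dataset; columns Round 1 to Round 5 for each of the two settings of the loss parameter.]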
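To give a feel for the EMD values reported in this section, the following minimal sketch computes the EMD in a deliberately simplified setting: PMFs supported on equally spaced points of a line with unit spacing, where the EMD reduces to the sum of absolute differences of the cumulative distributions. The experiments in this paper use two-dimensional grids, for which a general optimal-transport solver would be needed, so the function `emd_1d` and the example PMFs are illustrative assumptions only.

```python
def emd_1d(p, q):
    """EMD between two PMFs on equally spaced points of a line (unit spacing)."""
    total, carry = 0.0, 0.0
    for a, b in zip(p, q):
        carry += a - b       # surplus mass that must be moved past this point
        total += abs(carry)  # moving it one cell further costs |carry|
    return total

# All the mass has to travel two cells.
print(emd_1d([1.0, 0.0, 0.0], [0.0, 0.0, 1.0]))
# Half of the mass has to travel two cells (or all of it one cell on average).
print(emd_1d([0.5, 0.5, 0.0], [0.0, 0.5, 0.5]))
```

An EMD close to zero therefore means that little probability mass has to be moved, and only over short distances, to turn the estimated PMF into the true one.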
VIII-A QoS vs statistical utility under Optima3
We go on to probe the behaviour and the compatibility of the two paradigms of utility we considered. In particular, we investigate how the following two quantities behave against each other: i) the EMD between the true and the estimated distributions at the end of all the cycles of Optima3 until its empirical convergence, and ii) the maximum average distortion under the privacy channel generated by Optima3, which is specified with the loss parameter . We recall (3) and note that the higher the value of , the less local noise is injected into the data, which is consistent with our observations in Figure 3.
We continue working with the location data from Paris and San Francisco obtained from the Gowalla dataset in the same framework as described before. We consider taking the values from the set , and for each value of the loss parameter, we run Optima3 on the datasets of Paris and San Francisco using the same number of iterations as used in the previous experiments – for the Paris dataset, within each cycle of Optima3 we implement 8 iterations of BA and 10 iterations of IBU, running Optima3 for 15 cycles, and for the San Francisco dataset, we use 5 iterations for each of BA and IBU within a single cycle of Optima3 and run the entire method for 8 cycles. We run the experiments for 5 rounds for each value of to account for the randomness of noise insertion.
After running all the batches of Optima3 on the two datasets, we examine the trend of the final estimated distributions for the different loss parameters representing the maximum average distortion w.r.t. the Euclidean distance. Specifically, we plot the EMD between the true and the final estimated PMFs for each of the two datasets after the completion of Optima3 against the different in .
Figures 5(a) and 5(b) show that the statistical utility, when plotted against the loss parameter denoting the maximum average distortion, produces a Pareto curve for each of the two location datasets. We see that the EMD between the true and the estimated PMFs, for a fixed number of cycles of Optima3 run with a given number of iterations for BA and IBU such that the entire method numerically converges, decreases markedly and becomes stable as increases, i.e., as the amount of locally injected noise decreases. This depicts an improvement of the estimated PMF until it converges to the true PMF with an increase of the loss parameter. This observation is consistent with the Pareto-behaviour of mutual information with the maximum average distortion as studied in rate-distortion theory [ShannonInfoTheory], and thus, we empirically bridge together the two ends of utility we focused on in this paper.
VIII-B Connecting the theoretical results
The experimental analysis on the real locations from Paris and San Francisco supports the theory developed in Sections V, VI, and VII with validations and visualisations. The very accurate estimate of the true distributions of the data using IBU for the privacy channels produced by BA under Optima3 provides evidence towards our goal of furnishing a three-way optimal location-privacy mechanism satisfying geo-indistinguishability.
The outcomes of the experiments are consistent with the uniform continuity of Optima3 and provide empirical evidence for Theorem VII.6. Moreover, from the experiments, we can draw an interesting parallel by viewing Optima3 as a discrete-time Markov chain over the finite space . We recall that Theorem VI.4 suggests that the limiting behaviour of the PMF estimated by Optima3 follows a unique distribution over , where was obtained with the help of (5) in Theorem VI.3. Furthermore, Remark 5 suggests that this limiting distribution of the estimates, , should assign a high probability to Optima3 estimating the MLE of the observed locations under the privacy channel generated by BA, which is unique by Theorem V.2. Figures 4 and 5 reinforce this point by empirically demonstrating the convergence of Optima3 to the MLE across all the rounds of experiments, supporting the claim that the MLE has the highest chance of being estimated by Optima3 at the end of each cycle and empirically backing up (7). This supports the claim that the privacy channel obtained with BA, which optimizes MI and AvgD within each cycle of Optima3, also provides the best statistical utility over time given a sufficient number of samples and gives a very accurate estimate of the true PMF, moving towards a three-way optimization between privacy and the two notions of utility.
In this paper we have considered the triadic conflict between location-privacy, users’ quality of service, and statistical utility of the collected data, addressing the problem of finding the mechanism that would optimize