Imagine dropping a few drops of ink into a glass of water. The ink drops spread out, forming complicated tendrils that coil back on each other, expanding quickly, until all of the ink has diffused and the liquid is a slightly darker shade than its original colour. There is no physical process by which you can make the diffusing ink coalesce back into its original droplets. This intuition is at the heart of what we call computational cloaking. Because it is physically impossible to reconstruct the ink droplet exactly, we should be able to hide, or keep private in a precise sense, its original location. When mathematicians and physicists refer to cloaking, they usually mean transformation optics (Greenleaf et al., 2009), the design of optical devices with special customised effects on wave propagation. In this paper, we exploit the ill-conditioning of inverse problems to design algorithms that release differentially private measurements of the physical system.
We are motivated by the explosion in the power and ubiquity of lightweight (thermal, light, motion, etc.) sensors. These data offer important benefits to society. For example, thermal sensor data now plays an important role in controlling HVAC systems and minimising energy consumption in smart buildings (Lin et al., 2002; Beltran et al., 2013). However, these sensors also collect data inside intimate spaces, homes and workspaces, so the information contained in the data is sensitive. To continue with the example of thermal sensor data, one might consider sources of heat to be people, whose locations we aim to keep private.
Our work indicates that it is possible to produce locally differentially private sensor measurements that both keep the exact locations of the heat sources private and permit recovery of the general vicinity
of the sources. That is, the locally private data can be used to recover an estimate $\hat{x}$ that is close to the true source vector $x$ in the Earth Mover Distance (EMD). This is the second aspect of our work: algorithms that reconstruct sparse signals with error guarantees with respect to the EMD (rather than the more traditional $\ell_1$ or $\ell_2$ error, in which accurate recovery is impossible).
1.1 Source Localization
Suppose that we have a vector $x$ of length $n$ that represents the strengths and positions of our “sources.” The $i$th entry represents the strength of the source at position $i$. Further, suppose that we take $m$ linear measurements of our source vector; we observe
$$y = Ax,$$
where $A$ represents some generic linear physical transformation of our original data. Let us also assume that the source vector consists of at most $k$ sources (or non-zero entries). The straightforward linear inverse problem is to determine $x$, given $A$ and a noisy version of $y$. More precisely, given noisy measurements $y = Ax + w$, can we produce an estimate $\hat{x}$ that is still useful?
For physical processes such as diffusion, intuitively, we can recover the approximate geographic vicinity of the sources. This is exactly the concept of closeness captured by the Earth Mover Distance (EMD). Thus, in this paper, we aim to recover an estimate $\hat{x}$ that is close to $x$
in the EMD. The EMD can be defined between any two probability distributions on a finite discrete metric space. It computes the amount of work required to transform one distribution into the other.
Definition 1 (Earth Mover Distance)
(Rubner et al., 2000) Let $x$ and $\hat{x}$ be two probability distributions on the discrete space $[n]$ with metric $d$. Let $\mathcal{F}$ be the set of flows, i.e. matrices $F \in \mathbb{R}^{n \times n}$ with $F_{ij} \ge 0$, $\sum_j F_{ij} = x_i$ and $\sum_i F_{ij} = \hat{x}_j$. Then
$$\mathrm{EMD}(x, \hat{x}) = \min_{F \in \mathcal{F}} \sum_{i,j} F_{ij} \, d(i, j).$$
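On the discrete line with unit spacing between positions, the EMD has a particularly simple closed form: the $\ell_1$ distance between the two cumulative distribution functions. A minimal numpy sketch (the function name is ours, not notation from the paper):

```python
import numpy as np

def emd_1d(p, q):
    """Earth Mover Distance between two distributions on {0, 1, ..., n-1}
    with unit ground distance: the L1 distance between their CDFs."""
    return np.abs(np.cumsum(np.asarray(p, float) - np.asarray(q, float))).sum()

# Moving all of the mass two positions to the right costs 2 units of work.
p = [1.0, 0.0, 0.0]
q = [0.0, 0.0, 1.0]
print(emd_1d(p, q))  # -> 2.0
```

The same function is a convenient sanity check for the recovery guarantees discussed later: an estimate can be far from the truth in $\ell_2$ while being very close in EMD.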
1.2 Differential Privacy
To understand our definition of cloaking, we give a very brief introduction to differential privacy in this section. A more in-depth introduction can be found in Dwork and Roth (2014). Differential privacy has emerged over the past decade as the leading definition of privacy for privacy-preserving data analysis. A database is a vector in $\mathcal{X}^n$ for some data universe $\mathcal{X}$. We call two databases $D, D' \in \mathcal{X}^n$ adjacent or “neighbouring” if they differ in at most one coordinate.
Definition 2 ($(\epsilon, \delta)$-Differential Privacy)
(Dwork et al., 2006) A randomised algorithm $M$ is $(\epsilon, \delta)$-differentially private if for all adjacent databases $D$, $D'$ and events $E$,
$$\Pr[M(D) \in E] \le e^{\epsilon} \Pr[M(D') \in E] + \delta.$$
To understand this definition, suppose the database contains some sensitive information about Charlie and the data analyst, Lucy, produces some statistic about the database via a differentially private algorithm. Then Lucy can give Charlie the following guarantee: an adversary given access to the output cannot determine whether the database was $D$ or $D'$, where $D$ contains Charlie’s true data and $D'$ has Charlie’s data replaced with an arbitrary element of $\mathcal{X}$.
1.3 Computational Cloaking Precisely
First, we clarify exactly what information we would like to keep private. We consider the coordinates of $x$ to be our data; that is, the locations of the sources are what we would like to keep private. We assume that there exists a metric $d$ on the set of possible source locations, which induces the EMD on the set of source vectors. For the remainder of this work, we will assume that the metric is such that every pair of source locations is connected by a path that travels via neighbours.
When the matrix $A$ represents a physical process, we usually cannot hope to keep the existence of a source private and also recover an estimate of $x$ that is close in the EMD. However, it may be possible to keep the exact location private while allowing recovery of the “general vicinity” of the source. In fact, we will show in Section 4 that this is possible for diffusion on the discrete 1-dimensional line, and in Section 5 that we can generalise these results to diffusion on a general graph. We now narrow our definition of “neighbouring” databases to capture this idea.
For $\Delta > 0$, two source vectors $x$ and $x'$ are $\Delta$-neighbours if
$$x - x' = w(e_i - e_j) \quad \text{for some weight } w > 0 \text{ and locations } i, j \text{ with } w \, d(i, j) \le \Delta.$$
The larger $\Delta$ is, the less stringent the neighbouring condition is, so the more privacy we are providing. This definition has two important instances. We can move a source of weight 1 by $\Delta$ units, hiding the location of a large heat source (like a fire) within a small area. Also, we can move a source of weight $\Delta$ by 1 unit, hiding the location of that small heat source (like a person) over a much larger area. We will usually drop the $\Delta$ when referring to neighbouring vectors.
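Both instances can be checked numerically on the discrete line, where the EMD between such a pair is exactly the moved weight times the distance moved. A small sketch with toy numbers of our own choosing ($\Delta = 0.5$):

```python
import numpy as np

def emd_1d(p, q):
    # EMD on the discrete line with unit spacing: L1 distance between CDFs.
    return np.abs(np.cumsum(np.asarray(p, float) - np.asarray(q, float))).sum()

Delta = 0.5
x = np.zeros(10)
x[2] = 1.0                      # a single source of weight 1 at position 2

# Instance 1: move weight Delta by 1 unit (x - x1 = Delta * (e_2 - e_3)).
x1 = x.copy(); x1[2] -= Delta; x1[3] += Delta

# Instance 2: move a smaller weight 0.1 by 5 units (x - x2 = 0.1 * (e_2 - e_7)).
x2 = x.copy(); x2[2] -= 0.1; x2[7] += 0.1

print(emd_1d(x, x1), emd_1d(x, x2))  # both equal Delta = 0.5
```

In both cases the pair is at EMD exactly $\Delta$: the lighter the mass moved, the larger the region over which its location is hidden.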
A locally differentially private algorithm is a private algorithm in which the individual data points are made private before they are collated by the data analyst. In many of our motivating examples the measurements are taken at distinct locations (for example, at the sensors) prior to being transmitted to the data analyst. Thus, the “local” part of the title refers to the fact that we consider algorithms where each measurement, $y_i$, is made private individually. This is desirable since the data analyst (e.g. landlord, government) is often the entity the consumer would like to be protected against. Also, it is often the case that the data must be communicated via some untrusted channel (Walters et al., 2007; FTC, 2015). Usually this step would involve encrypting the data, incurring significant computational and communication overhead. However, if the data is made private prior to being sent, then there is less need for encryption. We then wish to use this locally differentially private data to recover an estimate of the source vector that is close in the EMD. The structure of the problem is as follows:
Design algorithms $\mathcal{A}$ (the privatiser) and $\mathcal{R}$ (the recovery algorithm) such that:
(Privacy) For all neighbouring source vectors $x$ and $x'$, indices $i$, and Borel measurable sets $E$, we have
$$\Pr[\mathcal{A}(Ax)_i \in E] \le e^{\epsilon} \Pr[\mathcal{A}(Ax')_i \in E] + \delta.$$
(Utility) $\mathrm{EMD}\big(x, \mathcal{R}(\mathcal{A}(Ax))\big)$ is small.
1.4 Related Work
An in-depth survey on differential privacy and its links to machine learning and signal processing can be found in (Sarwate and Chaudhuri, 2013). The body of literature on general and local differential privacy is vast, so we restrict our discussion to work that is directly related. There is a growing body of literature on differentially private sensor data (Liu et al., 2012; Li et al., 2015; Wang et al., 2016; Jelasity and Birman, 2014; Eibl and Engel, 2016). Much of this work is concerned with differentially private release of aggregate statistics derived from sensor data and the difficulty of maintaining privacy over a period of time (called the continual monitoring problem).
Connections between privacy and signal recovery have been explored previously in the literature. Dwork et al. (2007) considered the recovery problem with noisy measurements where the matrix $A$ has i.i.d. standard Gaussian entries. Newer results of Bun et al. (2014) can be interpreted in a similar light where $A$ is a binary matrix. Compressed sensing has also been used in the privacy literature as a way to reduce the amount of noise needed to maintain privacy (Li et al., 2011; Roozgard et al., 2016).
There are also several connections between sparse signal recovery and inverse problems (Farmer et al., 2013; Burger et al., 2010; Haber, 2008; Landa et al., 2011). The heat source identification problem is severely ill-conditioned and, hence, it is known that noisy recovery is impossible in common norms like $\ell_1$ and $\ell_2$. This has resulted in a lack of interest in developing theoretical bounds (Li et al., 2014); thus the mathematical analysis and numerical algorithms for inverse heat source problems are still very limited.
To the best of the authors’ knowledge, the papers most closely related to this work are Li et al. (2014), Beddiaf et al. (2015) and Bernstein and Fernandez-Granda (2017). All of these papers attempt to circumvent the condition number lower bounds by changing the error metric to capture “the recovered solution is geographically close to the true solution,” as in this paper. Our algorithm is the same as that of Li et al., who also consider the Earth Mover Distance (EMD); our upper bound is a generalisation of theirs to source vectors with more than one source. Beddiaf et al. follow a line of work that attempts to find the sources using minimisation and regularisation techniques. In work concurrent to ours, Bernstein and Fernandez-Granda also considered heat source localization, framed as deconvolution of the Gaussian kernel. They proved that a slight variant of Basis Pursuit Denoising solves the problem exactly, assuming enough sensors and sufficient separation between sources. They also arrive at a result similar to Theorem 14 for the noisy case (Bernstein and Fernandez-Granda, 2017, Theorem 2.7).
2 Privacy of measurements and Ill-conditioned Matrices
2.1 The Private Algorithm
Because we assume that our sensors are computationally lightweight, the algorithm is simply for each sensor to add Gaussian noise locally to its own measurement before sending the perturbed measurement to the central node. (Gaussian noise is not the only option for achieving privacy; there has been some work on the optimal type of noise to add (Geng and Viswanath, 2016).) The question then is: how much noise should we add to maintain privacy? The following lemma says, essentially, that the standard deviation of the noise added to a statistic should be proportional to how much the statistic can vary between neighbouring data sets. Let $f : \mathcal{X}^n \to \mathbb{R}^m$ be a function and let $\Delta f = \max_{D, D' \text{ adjacent}} \|f(D) - f(D')\|_2$ (called the sensitivity of $f$).
Lemma 4 (The Gaussian Mechanism)
(Dwork and Roth, 2014) Let $\epsilon \in (0, 1)$ and $\delta > 0$, and let $\sigma \ge \sqrt{2 \ln(1.25/\delta)} \, \Delta f / \epsilon$. Then
$$M(D) = f(D) + \mathcal{N}(0, \sigma^2 I_m)$$
is an $(\epsilon, \delta)$-differentially private algorithm.
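A minimal sketch of the Gaussian mechanism as stated above (the $\sigma$ formula is the standard one from Dwork and Roth (2014); the function and variable names are ours):

```python
import numpy as np

def gaussian_mechanism(value, sensitivity, eps, delta, rng=None):
    # Release a statistic with (eps, delta)-DP by adding N(0, sigma^2) noise,
    # with sigma chosen as in the Gaussian mechanism of Dwork and Roth (2014).
    if rng is None:
        rng = np.random.default_rng()
    sigma = np.sqrt(2 * np.log(1.25 / delta)) * sensitivity / eps
    noisy = np.asarray(value, dtype=float) + rng.normal(0.0, sigma, np.shape(value))
    return noisy, sigma

# Hypothetical sensor readings with sensitivity 0.5, at (eps, delta) = (1, 1e-5).
noisy, sigma = gaussian_mechanism([20.1, 21.3], sensitivity=0.5, eps=1.0, delta=1e-5)
```

Note that the noise scale depends only on the sensitivity and the privacy parameters, not on the data itself.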
Let us apply the Gaussian mechanism to the general linear inverse problem. As we discussed previously, ill-conditioned source localization problems behave poorly under the addition of noise. Intuitively, this should mean we need only add a small amount of noise to mask the original data. We show that this statement is partially true. However, there is a fundamental difference between the notion of a problem being ill-conditioned (as defined by the condition number) and being easily kept private. Let $a_i$ be the $i$th column of $A$.
With the definition of $\Delta$-neighbours presented in Definition 3, the algorithm $\mathcal{A}(y) = y + \mathcal{N}(0, \sigma^2 I_m)$ is an $(\epsilon, \delta)$-differentially private algorithm, where
$$\sigma = \frac{\sqrt{2 \ln(1.25/\delta)}}{\epsilon} \, \Delta \max_{d(i,j) = 1} \|a_i - a_j\|_2.$$
Let $s_1 \ge s_2 \ge \cdots \ge s_n \ge 0$ be the spectrum of $A$. The condition number, $\kappa(A)$, is a measure of how ill-conditioned this inverse problem is. It is defined as
$$\kappa(A) = \|A\|_2 \|A^{+}\|_2 = \frac{s_1}{s_n},$$
where $A^{+}$ is the pseudo-inverse of $A$. The larger the condition number, the more ill-conditioned the problem is (Belsley et al., 1980).
The following matrix illustrates the difference between how ill-conditioned a matrix is and how much noise we need to add to maintain privacy. Suppose
$$A = \begin{pmatrix} 1 & 0 \\ 0 & \gamma \end{pmatrix},$$
where $\gamma$ is small. While this problem is ill-conditioned ($\kappa(A) = 1/\gamma$ is large), we still need to add considerable noise to the first coordinate of $y$ to maintain privacy, since moving a source between the two locations changes the first measurement by a full unit.
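As a concrete numerical check (using a diagonal matrix with a small entry $\gamma$ as a stand-in for the example above):

```python
import numpy as np

gamma = 1e-6
A = np.diag([1.0, gamma])

# The condition number is huge: the inverse problem is badly ill-conditioned...
kappa = np.linalg.cond(A)

# ...yet the privacy-relevant quantity, the distance between the columns
# (how much the measurements change when a unit source moves), is about 1,
# so substantial noise is still required to hide a source's location.
col_gap = np.linalg.norm(A[:, 0] - A[:, 1])

print(kappa, col_gap)
```

Ill-conditioning (a ratio of extreme singular values) and cheap privacy (all columns nearly identical) are therefore different properties, as the lemmas below make precise.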
A necessary condition for the noise scale $\max_{d(i,j)=1} \|a_i - a_j\|_2$ to be small is that the matrix is almost rank 1; that is, the spectrum should be almost 1-sparse. In contrast, the condition that $\kappa(A)$ is large is only a condition on the maximum and minimum singular values. The following lemma says that if the amount of noise we need to add is small, then the problem is necessarily ill-conditioned.
Let be a matrix such that then
where $\Delta$ is the parameter in Definition 3.
Suppose $x'$ is a neighbouring source to $x$; then
Since we could have replaced locations 1 and 2 with any pair of neighbours, we have
The following lemma gives a characterisation of the noise scale in terms of the spectrum of $A$. It verifies that the matrix must be almost rank 1, in the sense that the spectrum should be dominated by the largest singular value.
If the required noise scale is small, then for any pair of neighbouring locations $i$ and $j$ the corresponding columns are close, where closeness is quantified in terms of the diameter $D$ of the space of source locations.
Conversely, if and then .
Let and be neighbouring sources. Now, assume then
Suppose, without loss of generality, that the first column of $A$ has maximal norm, and let $\tilde{A}$ be the matrix whose columns are all duplicates of the first column of $A$. Recall that the trace norm of a matrix is the sum of its singular values. Since $\tilde{A}$ is rank 1, its trace norm equals its spectral norm; thus,
Conversely, suppose the spectrum is dominated by the largest singular value. Using the SVD, we can write $A = \sum_i s_i u_i v_i^T$, where the $u_i$ and $v_i$ are the left and right singular vectors, respectively. Thus,
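The near-rank-1 behaviour is easy to see numerically for a diffusion-like matrix (our own illustration, with a Gaussian kernel standing in for a diffusion matrix):

```python
import numpy as np

# 1-D Gaussian (heat-like) kernel matrix: sources and sensors on a grid in [0, 1].
n = 20
t = np.linspace(0.0, 1.0, n)
mu = 2.0                                    # large diffusion time
A = np.exp(-(t[:, None] - t[None, :]) ** 2 / (4 * mu))

s = np.linalg.svd(A, compute_uv=False)      # singular values, descending
gap = max(np.linalg.norm(A[:, i] - A[:, i + 1]) for i in range(n - 1))

# Neighbouring columns are nearly identical (small noise needed for privacy),
# and the spectrum is dominated by the top singular value: A is nearly rank 1.
print(gap, s[1] / s[0])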
2.2 Backdoor Access via Pseudo-randomness
It has been observed previously in the privacy literature that replacing a random noise generator with a cryptographically secure pseudorandom noise generator in an efficient differentially private algorithm creates an algorithm that satisfies a weaker notion of privacy, computational differential privacy (Mironov et al., 2009). While differential privacy is secure against any adversary, computational differential privacy is secure only against computationally bounded adversaries. In the following definition, $\kappa$ is a security parameter that controls various quantities in our construction.
Definition 8 (Simulation-based Computational Differential Privacy (SIM-CDP))
(Mironov et al., 2009) A family, $\{M_\kappa\}_{\kappa \in \mathbb{N}}$, of probabilistic algorithms is $\epsilon_\kappa$-SIM-CDP if there exists a family of $\epsilon_\kappa$-differentially private algorithms $\{M'_\kappa\}_{\kappa \in \mathbb{N}}$ such that for every probabilistic polynomial-time adversary $\mathcal{A}$, every polynomial $p(\cdot)$, every sufficiently large $\kappa \in \mathbb{N}$, every dataset $D \in \mathcal{X}^n$ with $n \le p(\kappa)$, and every advice string $z$ of size at most $p(\kappa)$, it holds that
$$\big| \Pr[\mathcal{A}(M_\kappa(D), z) = 1] - \Pr[\mathcal{A}(M'_\kappa(D), z) = 1] \big| \le \mathrm{negl}(\kappa).$$
That is, $M_\kappa(D)$ and $M'_\kappa(D)$ are computationally indistinguishable.
The transition to pseudo-randomness, of course, has the obvious advantage that pseudo-random noise is easier to generate than truly random noise. In our case, it also has the additional benefit that, given access to the seed value, pseudo-random noise can be removed, allowing us to build a “backdoor” into the algorithm. Suppose we have a trusted data analyst who wants access to the most accurate measurement data, but does not have the capacity to protect sensitive data from being intercepted in transmission. Suppose also that this party stores the seed value of each sensor and that the randomness in our locally private algorithm is replaced with pseudo-randomness. Then the consumers are protected against an eavesdropping computationally bounded adversary, and the trusted party has access to the noiseless measurement data (though this data may still be corrupted by sensor noise that was not intentionally injected). This solution may be preferable to simply encrypting the data during transmission, since there may be untrusted parties to whom we wish to give access to the private version of the data.
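The backdoor mechanism can be sketched with a seeded pseudorandom generator (this is our own toy illustration; a real deployment would use a cryptographically secure PRNG, which `numpy`'s generator is not):

```python
import numpy as np

SEED = 2023                    # stored by both the sensor and the trusted party

def sensor_release(y, sigma, seed):
    """Sensor side: add (pseudo)random Gaussian noise before transmission."""
    noise = np.random.default_rng(seed).normal(0.0, sigma, y.shape)
    return y + noise

def trusted_denoise(y_noisy, sigma, seed):
    """Trusted party: regenerate the identical pseudorandom noise and remove it."""
    noise = np.random.default_rng(seed).normal(0.0, sigma, y_noisy.shape)
    return y_noisy - noise

y = np.array([3.1, 2.7, 4.0])                     # hypothetical raw measurements
released = sensor_release(y, sigma=1.0, seed=SEED)
recovered = trusted_denoise(released, sigma=1.0, seed=SEED)   # exactly y
```

Anyone without the seed sees only the noisy release; the seed holder recovers the measurements exactly, which is precisely the content of Corollary 9.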
Corollary 9 (Informal)
Replacing the randomness in Proposition 13 with pseudo-randomness produces a local simulation-based computational differentially private algorithm for the same task. In addition, any trusted party with access to the seed of the random number generator can use the output of the private algorithm to generate the original data.
3 Recovery algorithm and Examples
We claimed that the private data is both useful and differentially private. In this section we discuss recovering an estimate of $x$ from the noisy measurements. Algorithms for recovering a sparse vector from noisy data have been explored extensively in the compressed sensing literature. However, theoretical results in this area typically assume that the measurement matrix is sufficiently nice. Diffusion matrices are typically very far from satisfying the niceness conditions required by current theoretical results. Nonetheless, in this section we discuss the use of a common sparse recovery algorithm, Basis Pursuit Denoising (BPD), for ill-conditioned matrices. The use of BPD to recover source vectors with the heat kernel was proposed by Li et al. (2014), who studied the case of a 1-sparse source vector.
We begin with a discussion of known results for BPD from the compressed sensing literature. While the theoretical results for BPD do not hold in any meaningful way for ill-conditioned diffusion matrices, we present them here to provide context for the use of this algorithm to recover a sparse vector. We then proceed to discuss the performance of BPD on private data in some examples: diffusion on the 1D unit interval and diffusion on general graphs.
3.1 Basis Pursuit Denoising
Basis Pursuit Denoising minimises the $\ell_1$-norm subject to the constraint that the measurements of the proposed source vector should be close in the $\ell_2$-norm to the noisy sensor measurements. To simplify our discussion, let $\sigma$ be the standard deviation of the noise added to the sensor measurements. The bound in Algorithm 1 is chosen to ensure that the true source vector $x$ is a feasible point with high probability.
(Hsu et al., 2012) Let $w \sim \mathcal{N}(0, \sigma^2 I_m)$; then for all $t > 0$,
$$\Pr\left[\|w\|_2^2 > \sigma^2 \big(m + 2\sqrt{mt} + 2t\big)\right] \le e^{-t}.$$
So for large $m$ and small $\sigma$, we have $\|w\|_2 = O(\sigma\sqrt{m})$ with high probability.
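A quick Monte Carlo check of this Gaussian-norm tail bound (our own sketch, with arbitrary parameter choices):

```python
import numpy as np

m, sigma, t = 100, 1.0, 1.0
rng = np.random.default_rng(0)
w = rng.normal(0.0, sigma, size=(20000, m))     # 20000 draws of the noise vector

# Tail bound: P(||w||^2 > sigma^2 (m + 2 sqrt(mt) + 2t)) <= e^{-t}.
threshold = sigma**2 * (m + 2 * np.sqrt(m * t) + 2 * t)
frac = np.mean(np.sum(w**2, axis=1) > threshold)

# The empirical exceedance fraction sits well below the bound e^{-t}.
print(frac, np.exp(-t))
```

In practice the bound is quite loose for moderate $m$, which is why the feasibility radius in Algorithm 1 can safely be set to the high-probability level.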
3.2 Basis Pursuit Denoising for RIP matrices
To present the results in this section more cleanly, rather than keeping track of the exact noise magnitude we introduce generic error parameters $\eta, \eta' > 0$. Basis Pursuit Denoising,
$$\min_{x} \|x\|_1 \quad \text{subject to} \quad \|Ax - y\|_2 \le \eta, \qquad (2)$$
is the convex relaxation of the problem we would like to solve, $\ell_0$-minimisation:
$$\min_{x} \|x\|_0 \quad \text{subject to} \quad \|Ax - y\|_2 \le \eta. \qquad (3)$$
The minimiser of the $\ell_0$ norm is the sparsest solution. Unfortunately, this version of the problem is NP-hard, so in order to obtain an efficient algorithm we relax to the $\ell_1$ norm. The $\ell_1$ norm is the “smallest” convex function that places a unit penalty on unit coefficients and zero penalty on zero coefficients. Since the relaxation is convex, we can use convex optimisation techniques to solve it. In the next section we’ll discuss an appropriate optimisation algorithm. In this section, we focus on when the solution to the relaxed version (2) is similar to the solution of (3).
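The constrained problem (2) and its Lagrangian (LASSO) form are solved by standard convex methods. As an illustrative stand-in — not the solver used in the paper — here is ISTA, a simple proximal-gradient method for the Lagrangian form, demonstrating that $\ell_1$ relaxation recovers sparse vectors when $A$ is well-conditioned (all names and parameters below are our own choices):

```python
import numpy as np

def ista(A, y, lam=0.05, iters=2000):
    # Minimise 0.5 * ||Ax - y||_2^2 + lam * ||x||_1 by gradient steps on the
    # smooth part followed by soft-thresholding (the prox of the l1 norm).
    L = np.linalg.norm(A, 2) ** 2            # Lipschitz constant of the gradient
    x = np.zeros(A.shape[1])
    for _ in range(iters):
        g = x - A.T @ (A @ x - y) / L        # gradient step
        x = np.sign(g) * np.maximum(np.abs(g) - lam / L, 0.0)  # prox step
    return x

rng = np.random.default_rng(1)
m, n, k = 40, 50, 3
A = rng.normal(size=(m, n)) / np.sqrt(m)     # well-conditioned random atoms
x_true = np.zeros(n)
x_true[[5, 20, 40]] = [1.0, -1.0, 0.5]
x_hat = ista(A, A @ x_true)

# The k largest entries of |x_hat| sit on the true support {5, 20, 40}.
print(sorted(np.argsort(np.abs(x_hat))[-k:].tolist()))
```

For diffusion matrices the same code runs, but as the text explains, the recovery guarantees below no longer apply and closeness must be measured in EMD instead.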
We call the columns of $A$, denoted $a_i$, atoms. We will assume for this section that $\|a_i\|_2 = 1$ for all $i$. Notice that the vector $Ax$ is the linear combination of the $a_i$ with coefficients given by the entries of $x$, so we can think of recovering the vector $x$ as recovering the coefficients (this is where BPD gets its name: we are pursuing the basis vectors that make up $y$). A key parameter of the matrix $A$ is its coherence:
$$\mu(A) = \max_{i \ne j} |\langle a_i, a_j \rangle|.$$
Like the quantities in Section 2, the coherence is a measure of how similar the atoms of $A$ are. The larger the coherence, the more similar the atoms are, which makes them difficult to distinguish. For accurate sparse recovery, it is preferable for the coherence to be small. The following theorem relates the solutions of Equations (2) and (3).
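Computing the coherence is a one-liner, and it makes the contrast concrete: diffusion-style atoms are far more coherent than generic random atoms (our own illustration, with arbitrary sizes):

```python
import numpy as np

def coherence(A):
    """mu(A): largest absolute inner product between distinct unit-norm atoms."""
    U = A / np.linalg.norm(A, axis=0)        # normalise the columns
    G = np.abs(U.T @ U)
    np.fill_diagonal(G, 0.0)                 # ignore the diagonal <a_i, a_i> = 1
    return G.max()

n = 30
t = np.linspace(0.0, 1.0, n)
heat = np.exp(-(t[:, None] - t[None, :]) ** 2 / (4 * 0.1))   # diffusion-like atoms
rand = np.random.default_rng(0).normal(size=(n, n))          # generic atoms

print(coherence(heat), coherence(rand))
```

Adjacent diffusion atoms are almost identical (coherence near 1), which is exactly why the incoherence-based guarantees below fail for diffusion matrices.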
Theorem 11 says that if the matrix is incoherent, then the solution $\hat{x}$ to the convex relaxation (Algorithm 1) is at least as sparse as a solution to (3) with error tolerance somewhat smaller than $\eta$. Also, $\hat{x}$ only recovers source locations that also appear in $x$, although it may not recover all of the source locations that appear in $x$. The final property bounds the weight assigned to any source identified in $\hat{x}$ and not $x$. The worst-case discrepancy between $\hat{x}$ and $x$ occurs when the noise concentrates its weight on a single atom. In our case, the noise vector has i.i.d. Gaussian coordinates and hence is unlikely to concentrate its weight.
The key property for exact recovery of $x$, rather than just its support, is that $A$ is a near isometry on sparse vectors. A matrix $A$ satisfies the Restricted Isometry Property (RIP) of order $k$ with restricted isometry constant $\delta_k$ if $\delta_k$ is the smallest constant such that for all $k$-sparse vectors $x$,
$$(1 - \delta_k)\|x\|_2^2 \le \|Ax\|_2^2 \le (1 + \delta_k)\|x\|_2^2.$$
If $x$ is a feasible point and $\delta_{2k}$ is small, then we can guarantee that $\hat{x}$ and $x$ are close in the $\ell_2$ norm.
The exact constant is given explicitly in Candès (2008) and is rather small; for example, when $\delta_{2k} = 0.2$, the constant is at most $8.5$.
Theorems 11 and 12 only provide meaningful results for matrices with small $\mu(A)$ and $\delta_{2k}$. Unfortunately, the coherence and restricted isometry constants of ill-conditioned matrices, and in particular diffusion matrices, are both large. It is somewhat surprising, then, that BPD recovers well in the examples we explore in the following sections.
4 Diffusion on the Unit Interval
Let us define the linear physical transformation explicitly for heat source localization. To distinguish this special case from the general one, we denote the measurement matrix by $A^{(\mu)}$ (instead of $A$). For heat diffusion, we have a diffusion constant $c$ and a time $T$ at which we take our measurements; let $\mu = cT$ in what follows. Let $x \in \mathbb{R}^n$ and suppose the support of $x$ is contained in the discrete set $\{1/n, 2/n, \dots, 1\}$. Suppose we take measurements at the $m$ locations $\{1/m, 2/m, \dots, 1\}$, so that $y_j$ is the measurement of the sensor at location $j/m$ at time $T$, and we have
$$y_j = \frac{1}{\sqrt{4\pi\mu}} \sum_{i=1}^{n} e^{-(j/m - i/n)^2 / (4\mu)} x_i.$$
The heat kernel, $A^{(\mu)}$, is severely ill-posed to invert, due to the fact that as heat dissipates, the measurement vectors for different source vectors become increasingly close (Weber, 1981). Figure 1 shows the typical behaviour of Algorithm 1 with the matrix $A^{(\mu)}$. As can be seen in the figure, this algorithm returns an estimate $\hat{x}$ that is indeed close to $x$ in the EMD but not close in more traditional norms like $\ell_1$ and $\ell_2$. This phenomenon was noticed by Li et al. (2014), who proved that if $x$ consists of a single source then $\mathrm{EMD}(x, \hat{x})$ is small.
With the definition of neighbours presented in Definition 3 and restricting to we have
For all we have
Figure 2 shows calculations of $\sigma$ with varying parameters. The vertical axes are scaled to emphasise the asymptotics. These calculations suggest that the analysis in Proposition 13 is asymptotically tight in $m$, $\mu$ and $\Delta$.
Suppose that $x$ is a source vector, and assume the following:
Assumptions 1 and 2 state that $m$ needs to be large enough that there is a sensor close to each possible source, and that we need to take the measurements before the heat diffuses too much. Assumption 3 says that the sources need to be sufficiently far apart. We can remove this assumption by noting that every source vector is close in the EMD to a source vector whose sources are well separated.
This result generalises a result of Li et al. (2014) to source vectors with more than one source. Our proof is a generalisation of their proof and is contained in Section 6. In order to obtain a recovery bound for the private data, we set $\eta$ proportional to $\sigma\sqrt{m}$. The asymptotics of this bound are contained in Table 1. It is interesting to note that, unlike in the constant case, the error increases as $\mu \to 0$ (as well as when $\mu \to \infty$). This is because as $\mu \to 0$ the inverse problem becomes less ill-conditioned, so we need to add more noise.
The following theorem gives a lower bound on the estimation error of the noisy recovery problem.
where the infimum is over all estimators $\hat{x}$, the supremum is over all source vectors $x$, and $w$ is sampled from $\mathcal{N}(0, \sigma^2 I_m)$.
Note that this lower bound matches our upper bound asymptotically in most parameters and is slightly loose in the remainder, varying by a constant factor from our theoretical upper bound. Experimental results (contained in the extended version) suggest that the true error decays at a rate between the two. A consequence of Theorem 15 is that if two peaks are too close together, then it is impossible for an estimator to differentiate between the true source vector and the source vector that has a single peak located in the middle. Before we prove Theorem 15, we need the following generalisation of the upper bound in Proposition 13.
The proof of Theorem 15 will be an application of Fano’s inequality, a classic result from information theory. Suppose $P$ and $Q$ are probability distributions on the same space. Then the Kullback–Leibler (KL) divergence of $P$ and $Q$ is defined by
$$D_{KL}(P \,\|\, Q) = \int \log\left(\frac{dP}{dQ}\right) dP.$$
For a collection of probability distributions $\mathcal{P}$, the KL diameter is defined by
$$d_{KL}(\mathcal{P}) = \sup_{P, Q \in \mathcal{P}} D_{KL}(P \,\|\, Q).$$
If $(T, d)$ is a metric space and $\epsilon > 0$, then we define the $\epsilon$-packing number of $T$, denoted $M(\epsilon, T, d)$, to be the largest number of disjoint balls of radius $\epsilon$ that fit in $T$. The following version of Fano’s lemma is found in Yu (1997).
Lemma 17 (Fano’s Inequality)
Let $(T, d)$ be a metric space and $\{P_\theta : \theta \in T\}$ be a collection of probability measures. For any totally bounded $T' \subseteq T$ and $\epsilon > 0$,
$$\inf_{\hat{\theta}} \sup_{\theta \in T'} \mathbb{E}_{P_\theta}\!\left[d(\hat{\theta}, \theta)^2\right] \ge \frac{\epsilon^2}{4}\left(1 - \frac{d_{KL}(\{P_\theta : \theta \in T'\}) + \log 2}{\log M(\epsilon, T', d)}\right),$$
where the infimum is over all estimators.
[Proof of Theorem 15] For any source vector $x$, let $P_x$ be the probability distribution induced on the perturbed measurements by the process $y = A^{(\mu)} x + w$. Then the inverse problem becomes estimating which distribution the perturbed measurement vector is sampled from. Let $x$ and $x'$ be two source vectors. Then
$$D_{KL}(P_x \,\|\, P_{x'}) = \frac{\|A^{(\mu)}(x - x')\|_2^2}{2\sigma^2} \le C$$
for some constant $C$, where we use the fact that the KL-divergence is additive over independent random variables, along with Lemma 16. Now, let $T$ be a set of source vectors that are all at an EMD of $\epsilon$ from each other. Thus, by Lemma 17,
5 Diffusion on Graphs
In this section we generalise to diffusion on an arbitrary graph. As usual, our aim is to protect the exact location of a source, while allowing the neighbourhood to be revealed. Diffusion on graphs models not only heat spread in a graph, but also the path of a random walker in a graph and the spread of rumours, viruses or information in a social network. A motivating example is whisper networks where participants share information that they would not like attributed to them. We would like people to be able to spread information without fear of retribution, but also be able to approximately locate the source of misinformation. The work in this section does not directly solve this problem since in our setting each node’s data corresponds to their probability of knowing the rumour, rather than a binary yes/no variable. In future work, we would like to extend this work to designing whisper network systems with differential privacy guarantees. If a graph displays a community structure, then we would like to determine which community the source is in, without being able to isolate an individual person within that community.
Let $G$ be a connected, undirected graph with $n$ nodes. The matrix $W$ contains the edge weights, so $W_{ij}$ is the weight of the edge between node $i$ and node $j$, and the diagonal matrix $D$ has $D_{ii}$ equal to the sum of the $i$-th row of $W$. The graph Laplacian is $L = D - W$. As above, we also have a parameter $\mu$ controlling the rate of diffusion. If the initial distribution is given by $x$, then the distribution after diffusion is given by the linear equation $y = e^{-\mu L} x$ (Thanou et al., 2017). We will use $A_G$ to denote the matrix $e^{-\mu L}$. Note that, unlike in the previous section, no heat leaves the domain (i.e., the boundary conditions are different).
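The construction above is easy to reproduce numerically; a sketch for a path graph, computing $A_G = e^{-\mu L}$ via the eigendecomposition of the symmetric Laplacian (sizes and $\mu$ are our own choices):

```python
import numpy as np

# Path graph on n nodes: W holds the edge weights, D the degrees, L = D - W.
n, mu = 6, 0.5
W = np.diag(np.ones(n - 1), 1) + np.diag(np.ones(n - 1), -1)
D = np.diag(W.sum(axis=1))
L = D - W

# A_G = exp(-mu * L), computed from the eigendecomposition of L.
lam, U = np.linalg.eigh(L)
A_G = U @ np.diag(np.exp(-mu * lam)) @ U.T

# Heat is conserved on the graph: L annihilates the all-ones vector, so each
# column of A_G sums to 1 (no heat leaves the domain, unlike the interval).
print(A_G.sum(axis=0))
```

The column sums being exactly 1 is the numerical face of the different boundary conditions noted above.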
The graph $G$ has a metric on the nodes given by the shortest-path distance between any two nodes. Recall that in Lemma 7 we expressed the amount of noise needed for privacy in terms of the spectrum of the measurement matrix. Let
$$0 = \lambda_1 \le \lambda_2 \le \cdots \le \lambda_n$$
be the eigenvalues of $L$; then $e^{-\mu\lambda_1} \ge \cdots \ge e^{-\mu\lambda_n}$ are the eigenvalues of $A_G$. For any connected graph $G$, the Laplacian $L$ is positive semidefinite, and 0 is an eigenvalue with multiplicity 1 whose eigenvector is the all-ones vector.
For any graph $G$,
where $u_i$ is the $i$th row of the matrix $U$ whose columns are the left singular vectors of $A_G$.
With the set-up as in Lemma 7, we have
Since the first eigenvector of $A_G$ is the all-ones vector, we have
An immediate consequence of Lemma 18 is that the noise scale is bounded above by
The second-smallest eigenvalue of $L$, $\lambda_2$ (called the algebraic connectivity), is related to the connectivity of the graph, in particular its expansion properties, maximum cut, diameter and mean distance (Mohar, 1991). As the graph becomes more connected, the rate of diffusion increases, so the amount of noise needed for privacy decreases. The dependence on the rows of the matrix of left singular vectors is intriguing, as these rows arise in several other areas of numerical analysis. Their norms are called leverage scores (Drineas et al., 2012), and they appear in graph clustering algorithms.
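The connectivity effect is easy to quantify by comparing the algebraic connectivity of a sparsely connected graph and a densely connected one (our own illustration):

```python
import numpy as np

def lambda_2(W):
    """Algebraic connectivity: second-smallest eigenvalue of L = D - W."""
    L = np.diag(W.sum(axis=1)) - W
    return np.sort(np.linalg.eigvalsh(L))[1]

n = 6
path = np.diag(np.ones(n - 1), 1) + np.diag(np.ones(n - 1), -1)  # path graph
complete = np.ones((n, n)) - np.eye(n)                           # complete graph

# The complete graph is far better connected (lambda_2 = n versus ~0.27),
# so diffusion mixes faster and less noise is needed for privacy.
print(lambda_2(path), lambda_2(complete))
```

For the complete graph on $n$ nodes, $\lambda_2 = n$, the largest possible value, while for the path graph $\lambda_2 = 2(1 - \cos(\pi/n))$ shrinks towards 0 as the path grows.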