In the past few years, there has been a wave of commercially deployed systems for differentially private [DMNS06] data analysis, most notably Google’s RAPPOR [EPK14] running in the Chrome Browser and Apple’s implementation in iOS. These deployments make it possible to perform statistical analysis of users’ sensitive data without compromising those users’ individual privacy. These systems operate in the so-called (non-interactive) local model of differential privacy. In these protocols, each user separately produces a differentially private response, and sends this response to a possibly untrusted aggregator who can then extract some useful information about the set of users. Local protocols have several advantages: each user’s privacy is ensured even if the other users and aggregator are all malicious, and they require each party to asynchronously send only a single short message. However, the most expressive and accurate differentially private algorithms can only be implemented in the central model of differential privacy, in which all users send their data in the clear to a trusted aggregator who runs some differentially private algorithm.
Starting with Beimel, Nissim, and Omri [BNO08] and Kasiviswanathan et al. [KLN08], there has been considerable effort to understand the relative power of the local and central models of differential privacy [CSS12, HKR12, DJW13, BS15]. These works show quantitative (roughly quadratic) separations between the number of samples required to solve certain learning problems in these two models. The classic example of such a problem is estimating the mean of a -valued random variable. In the central model this problem can be solved up to error with -differential privacy using samples via the Laplace mechanism [DMNS06]. In contrast, local protocols require samples for the same problem.
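As a rough illustration of the central-model side of this comparison, the following sketch estimates the mean of a bit-valued variable with the Laplace mechanism. It assumes the samples take values in {0, 1}, that the number of samples is public, and that neighboring datasets differ by swapping one sample; the function names are hypothetical and are not taken from [DMNS06].

```python
import random

def laplace_noise(scale, rng):
    """Draw Laplace noise as the scaled difference of two Exp(1) variables."""
    return scale * (rng.expovariate(1.0) - rng.expovariate(1.0))

def private_mean(bits, epsilon, rng):
    """Central-model estimate of the mean of 0/1 samples (illustrative sketch)."""
    n = len(bits)
    # Assumption: swapping one sample moves the empirical mean by at most 1/n,
    # so Laplace noise of scale 1/(epsilon * n) gives epsilon-differential privacy.
    return sum(bits) / n + laplace_noise(1.0 / (epsilon * n), rng)
```

The added noise has magnitude on the order of 1/(epsilon * n), which is much smaller than the sampling error once n is moderately large.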
However, there are more powerful differentially private algorithms in the central model that seem to have no analogue in the local model, most notably the celebrated exponential mechanism of McSherry and Talwar [MT07]. The exponential mechanism is one of the most fundamental tools for analyzing high-dimensional datasets subject to differential privacy, and has numerous applications: PAC learning [KLN08], synthetic data generation [BLR13], frequent itemset mining [BLST10], feature selection and sparse regression [TS13, TTZ15, LCS16], multiple hypothesis testing [DSZ15], and many more. For all of these problems, the best algorithms in the local model require an exponential increase in the number of samples.
Kasiviswanathan et al. [KLN08] showed that there is no general purpose analogue of the exponential mechanism in the local model. Specifically, they showed that the exponential mechanism is sufficient to PAC learn the class of PARITY functions over using samples, whereas any local model protocol for this problem requires a number of samples that grows exponentially with the dimension . Therefore, any local model protocol implementing the exponential mechanism requires exponentially more samples. However, this lower bound relies on a general connection between the local model and statistical-query algorithms, and is therefore somewhat specific to PAC learning, not quantitatively tight, and somewhat involved technically.
The contribution of this work is to prove quantitatively tight lower bounds for local protocols solving very simple and natural selection problems. Specifically, we consider two natural selection problems: finding the most significant feature and stochastic linear optimization over a simplex or -ball. Our proofs use the information-theoretic framework for analyzing differentially private algorithms [MMP10, DJW13, BS15], and are arguably simpler and more direct than the lower bounds of Kasiviswanathan et al. Conceptually, the message of our results is that locally differentially private algorithms are severely limited for solving sparse, high-dimensional problems.
1.1 Our Results
Selecting a Significant Feature. One of the simplest applications of the exponential mechanism is to select the coordinate of a distribution over with the largest mean. Specifically, we have some distribution over with mean and each user receives an iid sample . The goal is to identify the coordinate such that is as large as possible. This is the prototypical application of the exponential mechanism, which shows that this problem can be solved in a differentially private way using , and this bound is tight [BU17, SU17]. In contrast, the natural local model protocol for this algorithm is to use randomized response [War65] to estimate each coordinate of , which requires samples—an exponentially worse dependence on . Our first result shows that this simple local algorithm is optimal up to constant factors, even if we restrict to be a product distribution.
Theorem 1.1 (Informal).
Suppose there is a non-interactive -differentially private local protocol that takes iid samples from an arbitrary product distribution over with mean and outputs a coordinate such that
then . The same holds if we replace with .
Proof Overview. Our proof of Theorem 1.1 uses information-theoretic techniques most similar to those used by McGregor et al. [MMP10]; Duchi, Jordan, and Wainwright [DJW13]; and Bassily and Smith [BS15] to analyze two-party and local differentially private algorithms. As with most lower bounds for statistical estimation, we construct a hard family of distributions such that (1) any algorithm satisfying the accuracy assumption of Theorem 1.1 can be used to uniquely identify the specific distribution , and (2) any algorithm that can be used to identify the specific distribution requires many samples. In our construction,
is just the uniform distribution over except that the -th coordinate is biased with mean , so it is easy to see that (1) is satisfied.
To establish (2), it (roughly) suffices to show that for every -differentially private algorithm ,
is small where is the uniform distribution on . The resulting sample complexity bound will be roughly the inverse of (1). The typical way to bound (1) is to argue that the family of distributions form a packing where is small for every . However, this will not work in our case because for every there is a differentially private algorithm such that , which is too large to establish Theorem 1.1. This algorithm ignores all coordinates except and privately estimates the mean of the -th coordinate. Intuitively, in order to bound (1), we need to formalize the intuition that every differentially private algorithm has a “privacy budget” that must be shared across the coordinates, so each coordinate may use only a small fraction of the privacy budget. Specifically, we are able to show that for every -differentially private algorithm ,
which intuitively means that on average the algorithm uses -differential privacy per coordinate. We establish this fact using a Fourier-analytic argument.
Stochastic Sparse Linear Optimization. A fundamental tool in statistical estimation is stochastic convex/concave optimization. Unsurprisingly, there has been considerable attention on understanding the sample complexity of differentially private stochastic optimization in both the central model [CMS11, KST12, BST14, TTZ15, Ull15] and the local model [DJW13, STU17]. One important case is sparse optimization, where we want to maximize over in the simplex and every function in the support of is concave and has -bounded gradients.
In the special case where the functions are linear, these problems can be solved privately in the central model using samples via the exponential mechanism. More generally, Talwar, Thakurta, and Zhang [TTZ15] showed that any concave maximization problem in this setting can be solved in the central model using samples. In contrast, all known local protocols for this sort of optimization problem require at least samples, even for the special case of linear optimization.
Theorem 1.2 (Informal).
Suppose there is a non-interactive -differentially private local protocol that takes iid samples from an arbitrary product distribution over with mean and outputs a vector such that
then . The same holds if we replace the simplex in with the -unit-ball in .
We remark that Theorem 1.2 is quantitatively tight for linear optimization problems, since randomized response can be used to solve this problem with sample complexity . For more general concave maximization problems, there are interactive local model protocols that match the lower bound of Theorem 1.2 up to polynomial factors [TTZ15], but there is no matching upper bound for non-interactive local protocols (and such an upper bound is unlikely [STU17]).
A dataset is a collection of samples in some domain . Two datasets are neighboring if can be obtained from by adding, removing, or swapping one sample.
Definition 2.1 (Differential Privacy [DMNS06]).
A randomized algorithm is -differentially private if for every neighboring datasets , and every ,
The parameters may be functions of .
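For reference, the guarantee in Definition 2.1 is commonly written as the inequality below. The symbols are assumed notation rather than notation confirmed by this text: A is the randomized algorithm, X and X' are neighboring datasets, S is a set of possible outputs, and ε and δ are the privacy parameters.

```latex
\Pr[A(X) \in S] \;\le\; e^{\varepsilon} \cdot \Pr[A(X') \in S] + \delta
```

Setting δ = 0 gives the pure form of the guarantee, which is the case the lower bounds in this paper focus on after Lemma 2.3.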
2.1 Locally Differentially Private Protocols
In a local protocol, each of the samples is held by a separate party. Each party randomizes their data independently, and then submits them to an aggregator, who collects the random messages and produces some output.
Definition 2.2 (Local Differentially Private Protocols).
We say that an algorithm is a (non-interactive) local protocol if it can be written in the form
for some local randomizer and some aggregator . In this case we write . We say that a (non-interactive) local protocol satisfies -local differential privacy if the local randomizer (but not necessarily ) satisfies -differential privacy.
We remark that none of our results would change if we allowed each sample to be randomized using a different local randomizer . However, it will ease notation slightly to focus on the case where each sample uses the same local randomizer .
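The structure in Definition 2.2 can be sketched in a few lines. The example below is one simple instantiation, not necessarily the protocol analyzed in this paper: each party holds a vector in {-1, +1}^d (an assumption), picks one coordinate at random, and reports it through randomized response [War65]; the aggregator debiases the reports and outputs the coordinate with the largest estimated mean. All function names are hypothetical.

```python
import math
import random

def local_randomizer(x, epsilon, rng):
    """Report one random coordinate of x, flipped with probability 1/(1+e^eps)."""
    j = rng.randrange(len(x))
    keep = math.exp(epsilon) / (1.0 + math.exp(epsilon))
    bit = x[j] if rng.random() < keep else -x[j]
    return j, bit

def aggregate(reports, d, epsilon):
    """Debias the reports and return the coordinate with the largest estimate."""
    sums, counts = [0.0] * d, [0] * d
    for j, bit in reports:
        sums[j] += bit
        counts[j] += 1
    # E[bit] = tanh(eps/2) * x_j, so dividing by tanh(eps/2) removes the bias.
    shrink = math.tanh(epsilon / 2.0)
    means = [sums[j] / (counts[j] * shrink) if counts[j] else 0.0 for j in range(d)]
    return max(range(d), key=lambda j: means[j])

def local_protocol(samples, epsilon, rng):
    """Apply the same randomizer to every sample, then run the aggregator."""
    reports = [local_randomizer(x, epsilon, rng) for x in samples]
    return aggregate(reports, len(samples[0]), epsilon)
```

Only the randomizer touches raw data, so only it needs to be differentially private; here changing a party's vector changes the probability of any report by a factor of at most e^epsilon.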
The following lemma of Bun, Nelson, and Stemmer [BNS17] asserts that it is essentially without loss of generality to focus on -locally differentially private protocols.
Lemma 2.3 ([BNS17]).
Suppose is an -differentially private local protocol with and . Then there is an -differentially private local protocol such that for every dataset , .
Above, denotes the statistical distance between the two distributions.
2.2 Information-Theoretic Preliminaries
Our proof relies on some standard information-theoretic tools, which we quickly recall. Readers who are familiar with lower bounds for statistical estimation can likely skip this section.
We use the convention . Let
be a probability distribution over some support and let denote its probability mass function. (For notational convenience we assume the support is finite and use the probability mass function; however, this is not essential for any of our results.) The entropy of is defined to be
For a joint distribution, the conditional entropy is , and the mutual information is .
We will use two notions of distance between probability distributions and . Namely, the KL divergence and the χ²-divergence are defined as
We will use the fact that upper bounds . That is, . We also use the fact that where denotes the product distribution with marginals and , respectively. The final tool we use is (a convenient special case of) Fano’s Inequality: if are jointly distributed random variables, each supported on , then
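The two facts about divergences used above can be checked numerically on small examples. The sketch below assumes the standard definitions, KL(P‖Q) = Σ p(x) log(p(x)/q(x)) and χ²(P‖Q) = Σ (p(x) − q(x))² / q(x), with natural logarithms and with Q positive wherever P is; the function names are illustrative.

```python
import math

def kl_divergence(p, q):
    """KL(P || Q) = sum_x p(x) * log(p(x) / q(x)), treating 0 * log 0 as 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def chi2_divergence(p, q):
    """chi^2(P || Q) = sum_x (p(x) - q(x))^2 / q(x); assumes q(x) > 0."""
    return sum((pi - qi) ** 2 / qi for pi, qi in zip(p, q))

def mutual_information(joint):
    """I(X; Y) computed as the KL divergence from the joint to the product of marginals."""
    px = [sum(row) for row in joint]
    py = [sum(col) for col in zip(*joint)]
    flat_joint = [joint[i][j] for i in range(len(px)) for j in range(len(py))]
    flat_prod = [px[i] * py[j] for i in range(len(px)) for j in range(len(py))]
    return kl_divergence(flat_joint, flat_prod)
```

For instance, with P = (0.5, 0.3, 0.2) and Q = (0.25, 0.25, 0.5) the KL divergence is below the χ²-divergence, and two perfectly correlated uniform bits have mutual information log 2.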
3 Main Lower Bound
In this section we introduce and prove a tight lower bound for locally differentially private protocols solving a certain hard statistical estimation problem. In Section 4 we will formalize the connection between this hard problem and the problems discussed in the introduction.
Before stating our lower bound, we introduce the hard statistical estimation problem. Let and be parameters. Let be the uniform distribution over . For every and , we define the distribution to be uniform over except that with probability . Formally,
Let be uniform over and let be uniform over . Let be the uniform mixture over distributions . Note that by construction, for every , . As the parameters will be clear from context, we will omit them and simply write . The next theorem is a tight lower bound on the number of samples needed to infer given drawn iid from the conditional distribution .
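A sampler for this hard family might look as follows. The exact constants are assumptions made for illustration: the domain is taken to be {-1, +1}^d, the hidden parameters are a coordinate j and a sign b, and coordinate j equals b with probability (1 + alpha) / 2 so that its mean is b * alpha.

```python
import random

def sample_hard_instance(d, alpha, n, rng):
    """Draw (j, b) uniformly, then n iid samples from the biased distribution."""
    j = rng.randrange(d)
    b = rng.choice((-1, 1))
    samples = []
    for _ in range(n):
        # Every coordinate starts out as an independent uniform sign.
        x = [rng.choice((-1, 1)) for _ in range(d)]
        # Assumed bias: coordinate j equals b with probability (1 + alpha) / 2.
        x[j] = b if rng.random() < (1.0 + alpha) / 2.0 else -b
        samples.append(x)
    return j, b, samples
```

The estimation task is then to recover the pair (j, b) from privatized versions of the samples alone.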
Theorem 3.1.
Fix any , and . Let be the distributions defined above (with parameters ). Suppose is an -locally differentially private protocol and is such that
We prove Theorem 3.1 using an information-theoretic argument. The main novelty is to establish a bound on the mutual information between the output of the local randomizer and the distribution parameters , which is done in Lemma 3.2. First we present the outline of the argument and introduce this key lemma, and afterwards we will prove the key lemma.
Proof of Theorem 3.1.
Throughout the proof, we fix the parameters . Let the distributions be as defined above (with parameters ). Let be a locally differentially private protocol with local randomizer . Let be iid samples from the conditional distribution and let be the differentially private responses. Let be the marginal distribution of each . Then we can write
where the inequality is because conditioning can only reduce entropy, and because are independent conditioned on .
We will use as shorthand for the conditional distribution . That is, is the distribution obtained by running on a random sample . Note that, .
(Chain rule for KL-divergence)
(χ² upper bounds KL)
The main technical part of the proof is to establish the following lemma, which we defer to later.
Lemma 3.2.
If is -differentially private, and , then
To complete the proof, we now turn our attention to proving Lemma 3.2.
Proof of Lemma 3.2.
As a shorthand, we will use
Note that in the first case the randomness is over , , and , whereas in the second case the probability is over and . Recall the definition of -divergence,
In order to bound (4), we will fix a particular choice of and analyze the quantity . Let be the distribution of when we condition on a specific sample . Note that since we are conditioning on a particular sample , the distribution parameters become irrelevant. Then we can write
where we have defined the function . The following claim is now immediate from the definition of -differential privacy.
Claim 3.3.
For every choice of , .
In order to complete the proof we need the following claim. Intuitively, the claim states that the function has low correlation on average with each coordinate of . The proof of the claim uses Fourier analysis of the function .
Claim 3.4.
For every choice of , and every function
Proof of Claim 3.4.
Recall the definition of the Fourier transform over the Boolean hypercube:
Aside from the definition, the only fact we will need about Fourier analysis is Parseval’s identity, which asserts that
Now, recall that we defined so that . Therefore,
where the final equality uses the fact that . To finish the proof, we have
This completes the proof of the claim. ∎
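The Fourier-analytic facts used in this proof can be verified by brute force for small dimensions. The sketch below assumes the standard definitions over {-1, +1}^d: the coefficient of a set S is the expectation of f(x) times the product of the coordinates in S under the uniform distribution, and Parseval's identity equates the sum of squared coefficients with the expectation of f(x)². The helper names are illustrative.

```python
import itertools
import math

def fourier_coefficient(f, d, subset):
    """hat f(S) = E_x[f(x) * prod_{i in S} x_i] for x uniform on {-1,+1}^d."""
    total = 0.0
    for x in itertools.product((-1, 1), repeat=d):
        total += f(x) * math.prod(x[i] for i in subset)
    return total / 2 ** d

def parseval_sides(f, d):
    """Return (sum_S hat f(S)^2, E_x[f(x)^2]); Parseval says they are equal."""
    subsets = itertools.chain.from_iterable(
        itertools.combinations(range(d), k) for k in range(d + 1))
    lhs = sum(fourier_coefficient(f, d, s) ** 2 for s in subsets)
    rhs = sum(f(x) ** 2 for x in itertools.product((-1, 1), repeat=d)) / 2 ** d
    return lhs, rhs

def degree_one_weight(f, d):
    """Sum over j of hat f({j})^2, the squared correlations with single coordinates."""
    return sum(fourier_coefficient(f, d, (j,)) ** 2 for j in range(d))
```

Because the degree-one weight is only part of the full Parseval sum, it can never exceed the expectation of f(x)², which is the kind of averaged bound over coordinates that the claim relies on.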
In order to make the proof simpler, we symmetrized the distribution by using the random sign-bit . However, it is easy to see that the sign bit is not necessary to obtain the lower bound, simply because any algorithm that is accurate with probability at least conditioned on is also accurate with probability at least when is random. Therefore we obtain the following easy corollary.
Fix any , and . Let be the distributions defined above (with parameters ). Suppose is an -locally differentially private protocol and is such that
4 Lower Bounds for Stochastic Optimization
In this section we show how Theorem 3.1 implies tight lower bounds for locally differentially private protocols solving an important and widely studied problem in private stochastic optimization. Consider stochastic linear optimization over the ball where the linear objective functions have gradients bounded in the norm. We use
to denote the and unit balls in , respectively. Any point
gives rise to a linear loss function by . Given a distribution over , we obtain a stochastic linear optimization problem to find
The next theorem is an application of Theorem 3.1, and shows that any locally differentially private algorithm for solving stochastic linear optimization in the setting requires sample complexity growing polynomially in .
Theorem 4.1.
Fix any and . Suppose is an -locally differentially private protocol such that for every product distribution over ,
Recall that the vertices of the ball in are the vectors where denotes the -th coordinate-basis vector in . Since the maximizer of the linear objective function must occur at one of the vertices, we can write
Thus, where is the coordinate whose mean has the largest absolute value and is the sign of the mean of the -th coordinate.
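The vertex argument is easy to make concrete. The sketch below assumes the feasible set is the ℓ1 unit ball and the objective is the inner product of the chosen point with the mean vector, here called mu; both names are illustrative.

```python
def l1_ball_maximizer(mu):
    """Return a vertex sign(mu_j) * e_j of the l1 unit ball maximizing <x, mu>."""
    # The best coordinate is the one whose mean has the largest absolute value.
    j = max(range(len(mu)), key=lambda i: abs(mu[i]))
    vertex = [0.0] * len(mu)
    vertex[j] = 1.0 if mu[j] >= 0 else -1.0
    return vertex

def objective(x, mu):
    """Linear objective <x, mu>."""
    return sum(xi * mi for xi, mi in zip(x, mu))
```

For mu = (0.1, -0.4, 0.2) the maximizer is the negated second basis vector, with objective value 0.4, and no other vertex does better.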
Now, let be the distributions defined in Theorem 3.1. For these distributions, by construction we have and . Moreover, for any other , Thus, for any we can conclude that
Thus, any that approximately maximizes the linear objective uniquely identifies the correct distribution parameters . So, if satisfies the assumption of the corollary, and is the uniform mixture of the distributions , then
Therefore, any algorithm that outputs an -approximate maximizer can also be used to identify the distribution parameters . By Theorem 3.1, any such algorithm requires at least samples. ∎
We remind the reader that Theorem 4.1 is optimal up to constant factors, as there is a simple algorithm based on randomized response that finds an -approximate maximizer of the objective function using samples.
Theorem 4.1 should be contrasted with the fact that in the central model there is a simple differentially private algorithm for solving stochastic linear optimization that requires only logarithmically many samples in . The algorithm is an immediate application of the exponential mechanism [MT07], and its guarantee is stated in the next theorem.
Theorem 4.2 (Application of [MT07]).
Fix and . For some , there is an -differentially private (but not locally differentially private) algorithm such that for any distribution over ,
In fact, Talwar, Thakurta, and Zhang [TTZ15] showed an algorithm that achieves a similar guarantee (although with slightly weaker parameters) even for the more general problem of maximizing an arbitrary concave function whose gradients are bounded in the norm over the ball. Thus, there is a stark contrast between the central model, where we can solve any -bounded concave maximization problem using samples, and the local model, where even maximizing an -bounded linear objective requires samples.
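To make the central-model algorithm concrete, here is a minimal sketch of the exponential mechanism applied to linear optimization over the ℓ1 ball. It assumes samples in {-1, +1}^d, a public sample count, and neighboring datasets that differ by swapping one sample; the candidate set is the 2d vertices and each candidate is scored by the empirical objective. The function names are hypothetical, and the constants are the textbook ones rather than those of Theorem 4.2.

```python
import math
import random

def exponential_mechanism(scores, sensitivity, epsilon, rng):
    """Sample index i with probability proportional to exp(eps * score_i / (2 * sens))."""
    top = max(scores)  # subtracting the max avoids overflow in exp
    weights = [math.exp(epsilon * (s - top) / (2.0 * sensitivity)) for s in scores]
    return rng.choices(range(len(scores)), weights=weights, k=1)[0]

def private_vertex(samples, epsilon, rng):
    """Pick a vertex (j, sign) of the l1 ball that approximately maximizes the mean."""
    n, d = len(samples), len(samples[0])
    means = [sum(x[j] for x in samples) / n for j in range(d)]
    # Candidate 2*j is +e_j and candidate 2*j+1 is -e_j.
    scores = [s * m for m in means for s in (1.0, -1.0)]
    # Assumption: swapping one +/-1 sample moves each score by at most 2/n.
    k = exponential_mechanism(scores, 2.0 / n, epsilon, rng)
    return k // 2, (1 if k % 2 == 0 else -1)
```

The aggregator here sees the raw samples, which is exactly what the local model forbids.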
Optimization Over the Simplex.
Fix and . Suppose is an -locally differentially private protocol such that for every product distribution over ,
where . Then
We thank Adam Smith for highlighting this problem. We thank Thomas Steinke, Matthew Joseph, Aaron Roth, and Bo Waggonner for helpful discussions.
- [BLR13] Avrim Blum, Katrina Ligett, and Aaron Roth. A learning theory approach to noninteractive database privacy. J. ACM, 60(2):12, 2013.
- [BLST10] Raghav Bhaskar, Srivatsan Laxman, Adam Smith, and Abhradeep Thakurta. Discovering frequent patterns in sensitive data. In ACM International Conference on Knowledge Discovery and Data Mining (KDD), 2010.
- [BNO08] Amos Beimel, Kobbi Nissim, and Eran Omri. Distributed private data analysis: Simultaneously solving how and what. In CRYPTO, 2008.
- [BNS17] Mark Bun, Jelani Nelson, and Uri Stemmer. Heavy hitters and the structure of local privacy. arXiv preprint arXiv:1711.04740, 2017.
- [BS15] Raef Bassily and Adam Smith. Local, private, efficient protocols for succinct histograms. In ACM Symposium on Theory of Computing (STOC), 2015.
- [BST14] Raef Bassily, Adam Smith, and Abhradeep Thakurta. Private empirical risk minimization: Efficient algorithms and tight error bounds. In Symposium on Foundations of Computer Science (FOCS), 2014.
- [BU17] Mitali Bafna and Jonathan Ullman. The price of selection in differential privacy. In Conference on Learning Theory (COLT), 2017.
- [CMS11] Kamalika Chaudhuri, Claire Monteleoni, and Anand D. Sarwate. Differentially private empirical risk minimization. Journal of Machine Learning Research, 12:1069–1109, 2011.
- [CSS12] T.-H. Hubert Chan, Elaine Shi, and Dawn Song. Optimal lower bound for differentially private multi-party aggregation. In European Symposium on Algorithms (ESA), 2012.
- [DJW13] John C Duchi, Michael I Jordan, and Martin J Wainwright. Local privacy and statistical minimax rates. In IEEE Symposium on Foundations of Computer Science (FOCS), 2013.
- [DMNS06] Cynthia Dwork, Frank McSherry, Kobbi Nissim, and Adam Smith. Calibrating noise to sensitivity in private data analysis. In Theory of Cryptography Conference (TCC), 2006.
- [DSZ15] Cynthia Dwork, Weijie Su, and Li Zhang. Private false discovery rate control. CoRR, abs/1511.03803, 2015.
- [EPK14] Úlfar Erlingsson, Vasyl Pihur, and Aleksandra Korolova. RAPPOR: Randomized aggregatable privacy-preserving ordinal response. In ACM Conference on Computer and Communications Security (CCS), 2014.
- [HKR12] Justin Hsu, Sanjeev Khanna, and Aaron Roth. Distributed private heavy hitters. In International Colloquium on Automata, Languages, and Programming (ICALP), 2012.
- [KLN08] Shiva Prasad Kasiviswanathan, Homin K. Lee, Kobbi Nissim, Sofya Raskhodnikova, and Adam Smith. What can we learn privately? In IEEE Symposium on Foundations of Computer Science (FOCS), 2008.
- [KST12] Daniel Kifer, Adam Smith, and Abhradeep Thakurta. Private convex empirical risk minimization and high-dimensional regression. In Conference on Learning Theory (COLT), 2012.
- [LCS16] Jing Lei, Anne-Sophie Charest, Aleksandra Slavkovic, Adam Smith, and Stephen Fienberg. Differentially private model selection with penalized and constrained likelihood. arXiv preprint arXiv:1607.04204, 2016.
- [MMP10] Andrew McGregor, Ilya Mironov, Toniann Pitassi, Omer Reingold, Kunal Talwar, and Salil P. Vadhan. The limits of two-party differential privacy. In IEEE Symposium on Foundations of Computer Science (FOCS), 2010.
- [MT07] Frank McSherry and Kunal Talwar. Mechanism design via differential privacy. In IEEE Symposium on Foundations of Computer Science (FOCS), 2007.
- [STU17] Adam Smith, Abhradeep Thakurta, and Jalaj Upadhyay. Is interaction necessary for distributed private learning? In IEEE Symposium on Security and Privacy (Oakland), 2017.
- [SU17] Thomas Steinke and Jonathan Ullman. Tight lower bounds for differentially private selection. In IEEE Symposium on Foundations of Computer Science (FOCS), 2017.
- [TS13] Abhradeep Thakurta and Adam Smith. Differentially private feature selection via stability arguments, and the robustness of the LASSO. In Conference on Learning Theory (COLT), 2013.
- [TTZ15] Kunal Talwar, Abhradeep Thakurta, and Li Zhang. Nearly optimal private LASSO. In Advances in Neural Information Processing Systems, NIPS, 2015.
- [Ull15] Jonathan Ullman. Private multiplicative weights beyond linear queries. In ACM Symposium on Principles of Database Systems (PODS), 2015.
- [War65] Stanley L Warner. Randomized response: A survey technique for eliminating evasive answer bias. Journal of the American Statistical Association, 60(309):63–69, 1965.