In the last decade, differential privacy (Dwork et al., 2006b) has become the de facto gold standard of privacy-preserving data analysis. Moreover, in recent years, there has been a growing interest in devising differentially private techniques for statistical inference (see Related Work below). However, by and large, these works have focused on the centralized model, where the dataset in its entirety is given to a trusted curator who has direct access to the data. This is in contrast to the trust-free local model (Warner, 1965; Kasiviswanathan et al., 2008), in which each individual perturbs her own data and broadcasts the noisy (and privacy-preserving) outcome. The local model has grown in popularity in recent years, with practical, large-scale deployments (see Erlingsson et al. (2014); Apple Press Info (2016)). Yet only a handful of works (Duchi et al., 2013a, b; Gaboardi and Rogers, 2018; Sheffet, 2018) examine differentially private statistical inference techniques in the local model.
This work focuses on the task of mean estimation in the local model. The problem is composed of $n$ iid samples drawn from a Gaussian $\mathcal{N}(\mu,\sigma^2)$ whose mean $\mu$ lies within a known bound, and $\sigma$ is either provided as an input (known variance case) or left unspecified (unknown variance case). We point out that the privacy analysis of our algorithms holds even if the assumption of normal data is not satisfied, whereas our utility analysis relies on this assumption. The goal of our algorithms is to provide an estimation of $\mu$, which may be represented in multiple forms. The classical approach in statistical inference is to represent the likelihood that each point on the real line is $\mu$ with a probability distribution: in the case of known variance (Z-test) the output is a Gaussian distribution, and in the case of unknown variance (t-test) the output is a t-distribution. This distribution allows an analyst to estimate a confidence interval for $\mu$ based on the random sample of data, where non-privately the length of the interval scales as $\sigma/\sqrt{n}$ (treating the confidence level as a constant). Based on confidence intervals, one is able to reject (or fail to reject) certain hypotheses about $\mu$, such as the hypothesis that $\mu$ takes a specific value or that the means of two (or more) separate collections of samples are identical.
The goal of this work is to provide upper- and lower-bounds for the problem of mean-estimation under -local differentially private (LDP) algorithms assuming the data is drawn from an unknown Gaussian. For our upper bounds in the case of known variance, we design a -LDP algorithm, which yields a confidence interval of length provided that . In the case of unknown variance we give an algorithm that returns a valid confidence interval of similar length assuming we have a lower bound on the value of the unknown . For our lower-bounds, we prove that any -LDP algorithm must return an interval whose length is , proving the optimality of our technique up to a -factor. In the known variance case, our algorithm results in a private -test, which we also assess empirically.
1.1 Our Techniques: Overview
In our algorithms, we use two canonical LDP algorithms: Randomized Response (Warner, 1965; Kasiviswanathan et al., 2008) and Bit Flipping (in its various versions) (Erlingsson et al., 2014; Bassily and Smith, 2015; Bassily et al., 2017). Both mechanisms are known; for completeness, in Section 2 we provide utility bounds for these building blocks under randomly drawn input.
The Known Variance Case.
In the known variance case, our approach is a direct LDP implementation of the ideas behind the algorithm of Karwa and Vadhan (2018) who provide a private confidence interval in the centralized model. We equipartition the interval where is assumed to be between into sub-intervals of length , and use the above-mentioned Bit Flipping mechanism to find the most likely interval. The most common interval must be within distance from the mean (with high probability) of the underlying Gaussian distribution. This allows us to narrow in on an interval of length which should hold new points from the same distribution with probability at least .
Once we have found this interval, we merely project each datapoint onto and add Gaussian noise of to the projection, and then average the outcomes. This implies we have i.i.d sample points from a Gaussian of mean and variance . (Footnote 1: Actually, this is an approximation of the distribution, since we clip the original Gaussian. However, since the probability mass we remove is , the TV-dist to this distribution is .) Thus, , the average of these noisy datapoints, is also sampled from a Gaussian, whose variance is . We can thus represent the likelihood that each point on is the mean by using a Gaussian , which is our analog to the Z-test. Moreover, the interval of length centered at is a -confidence interval. Details appear in Section 3, where in Section 3.1 we also present some empirical assessment of our Z-test.
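The second stage described above (project, add Gaussian noise, average) can be sketched as follows. This is only an illustration under stated assumptions: the function name `noisy_mean_interval` and its parameters are hypothetical, the interval `[lo, hi]` is taken as given, and the noise scale uses the common Gaussian-mechanism calibration $\sqrt{2\ln(1.25/\delta)}\cdot(\text{hi}-\text{lo})/\epsilon$ (usually stated for $\epsilon<1$), which may differ from the constants used in Algorithm 1.

```python
import math
import random
from statistics import NormalDist

def noisy_mean_interval(data, lo, hi, sigma, eps, delta, beta, rng=random):
    """Clip each point to [lo, hi], add Gaussian noise locally, then average.

    Returns the noisy mean and a (1 - beta) interval built from the Gaussian
    approximation N(mean, (sigma^2 + noise_sd^2) / n), ignoring clipping.
    """
    # Assumed calibration of the per-user Gaussian noise (hypothetical constant).
    noise_sd = (hi - lo) * math.sqrt(2.0 * math.log(1.25 / delta)) / eps
    # Each user reports only her clipped value plus fresh noise.
    reports = [min(max(x, lo), hi) + rng.gauss(0.0, noise_sd) for x in data]
    n = len(reports)
    mean_hat = sum(reports) / n
    # Standard deviation of the average of n noisy reports.
    sd_of_mean = math.sqrt((sigma ** 2 + noise_sd ** 2) / n)
    z = NormalDist().inv_cdf(1.0 - beta / 2.0)
    return mean_hat, (mean_hat - z * sd_of_mean, mean_hat + z * sd_of_mean)
```

Note how the per-user noise variance is added to $\sigma^2$ before dividing by $n$, so the interval length is driven by the noise scale rather than by $\sigma$ whenever the clipping interval is wide relative to $\epsilon$.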
The Unknown Bounded Variance Case.
We now consider the case of unknown variance, where instead of knowing we are provided bounds on the smallest and largest (resp.) values of the variance: . First, we illustrate our algorithm in the case where we know . This is of course the more natural case, as we think of as large and as reasonable. Later, we discuss how to deal with the case of general unknown variance.
In this case, the approach of Karwa and Vadhan (2018) is to estimate the variance using the pairwise differences of the datapoints. That is due to the property of Gaussians where the difference between two iid samples is . This however is an approach that only works in the centralized model, where one is able to observe two datapoints without noise. In the local model, we are forced to use a different approach.
The approach we follow is to do binary search for different quantiles of the Gaussian, an approach which has appeared before in certain testers, and in particular in the work of Feldman (2017). Given a quantile , a continuous and smooth distribution , our goal is to find the threshold point such that for a given tolerance parameter . In each iteration , we hold an interval which is guaranteed to hold , and we use the middle point of this interval as our current guess. Denoting as the current interval’s mid-point, we use enough of the dataset to estimate up to error , and then either halt (if the estimated probability is approximately ) or recurse on either the left- or right-half of the interval. Since our initial interval is (of length ) and we must halt when we reach an interval of length (we treat as a constant), then the number of iterations overall is .
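A minimal sketch of this binary search is given below. It is not the paper's algorithm as stated: the function name, the even split of the data across a fixed number of iterations, and the parameter names are assumptions made for illustration. Each probability is estimated with randomized response on the bit indicating whether a datapoint falls at or below the current mid-point, and each datapoint is used in a single iteration only.

```python
import math
import random

def ldp_quantile_search(data, lo, hi, target, tol, eps, iterations, rng=random):
    """Binary search for a threshold t with Pr[x <= t] close to `target`."""
    p_keep = math.exp(eps) / (1.0 + math.exp(eps))
    batch = len(data) // iterations  # a fresh batch of users per iteration
    mid = (lo + hi) / 2.0
    for t in range(iterations):
        mid = (lo + hi) / 2.0
        chunk = data[t * batch:(t + 1) * batch]
        # Each user reports the bit 1[x <= mid] through randomized response.
        noisy = [b if rng.random() < p_keep else 1 - b
                 for b in (1 if x <= mid else 0 for x in chunk)]
        # De-bias the observed fraction of ones.
        est = (sum(noisy) / len(noisy) - (1.0 - p_keep)) / (2.0 * p_keep - 1.0)
        if abs(est - target) <= tol:
            return mid          # estimated probability is close enough: halt
        if est < target:
            lo = mid            # threshold too small: recurse on the right half
        else:
            hi = mid            # threshold too large: recurse on the left half
    return mid
```

Because every iteration consumes its own batch, the sample size needed grows with the number of iterations, which is why the length of the initial interval enters the sample complexity.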
And so, we first run binary search until we find a point for which we estimate that . We then find a point for which we estimate that . Due to the properties of a Gaussian, and . Of course, we do not have access to the actual quantiles, but rather just an estimation of them, but we are still able to show that w.p. it holds that . (These bounds explain why taking as a constant, say , suffices for our needs.) We can thus run the algorithm for the known variance case with this estimation of the variance on the remainder of the dataset. The full details of our algorithm appear in Section 4.
The General Unknown Variance Case.
In the general case, where isn’t known, we begin by testing to see if the variance is or by estimating the probability that a new datapoint falls inside the interval . If this probability is large then we have that and we can use the previous algorithm for unknown bounded variance; whereas if this probability is small, it must be that , and we run a very different algorithm. Instead of binary search, we merely estimate using the first half of the points, and then estimate using the remaining half of the points. Denoting and as the points on the real line for which the CDF of a standard normal equals and respectively, we can now interpolate a Gaussian curve that matches to and to and infer its mean and variance accordingly. The key point is that both and are within distance of the true mean ; so by known properties of the Gaussian distribution, estimating and up to an error of implies a similar error guarantee in estimating . This approach is discussed in Section 5.
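The interpolation step can be made concrete with a short sketch. The names `a`, `b`, `p_a`, `p_b` are hypothetical: `a` and `b` are two fixed thresholds and `p_a`, `p_b` are the (estimated) probabilities that a datapoint falls at or below them. Matching $(a-\mu)/\sigma$ and $(b-\mu)/\sigma$ to the corresponding standard normal quantiles gives two linear equations in $\mu$ and $\sigma$.

```python
from statistics import NormalDist

def fit_gaussian_from_two_cdf_values(a, p_a, b, p_b):
    """Recover (mu, sigma) of a Gaussian from its CDF values at a and b."""
    std_normal = NormalDist()
    # Points where the standard normal CDF equals p_a and p_b.
    z_a = std_normal.inv_cdf(p_a)
    z_b = std_normal.inv_cdf(p_b)
    # Solve (a - mu) / sigma = z_a and (b - mu) / sigma = z_b.
    sigma = (b - a) / (z_b - z_a)
    mu = a - sigma * z_a
    return mu, sigma
```

With exact CDF values the recovery is exact; with estimated probabilities the error in $\mu$ and $\sigma$ depends on how flat the normal CDF is near `a` and `b`, which is why it matters that both thresholds lie within a bounded number of standard deviations of the mean.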
Lastly, we give bounds on any -LDP algorithm that approximates the mean of a Gaussian distribution. Formally, we say an algorithm -solves the mean-estimation problem if its input is a sample of points drawn iid from a Gaussian distribution with for some given parameter , and its output is an interval such that w.p. and furthermore . Note that the probability is taken over both the sample draws and the coin-tosses of the algorithm. We prove that any one-shot -locally differentially private algorithm (one in which each datapoint is queried only once) that -solves the mean-estimation problem must satisfy and also .
In addition, we also provide lower bounds for any one-shot -LDP algorithm that approximates the quantile of a given distribution using iid samples from . Our bounds show that dependency on certain parameters () is necessary. In particular, if (or ) is left unspecified (namely and ), then no LDP algorithm can -solve the mean-estimation problem.
Note that our upper-bounds are given by -LDP algorithms, yet our lower bounds deal only with -LDP algorithms. However, a recent result of Bun et al. (2018) shows that in the local model (as opposed to the centralized model) any -LDP algorithm is equivalent to a -LDP algorithm. Further details appear in the Preliminaries.
1.2 Related Work
Several works have studied the intersection of differential privacy and statistics (Dwork and Lei, 2009; Smith, 2011; Chaudhuri and Hsu, 2012; Duchi et al., 2013a, b; Dwork et al., 2015), mostly focusing on robust statistics; but only a handful of works rigorously study the significance and power of hypothesis testing under differential privacy (Vu and Slavkovic, 2009; Uhler et al., 2013; Wang et al., 2015; Gaboardi et al., 2016; Kifer and Rogers, 2017; Cai et al., 2017; Sheffet, 2017; Karwa and Vadhan, 2018). Vu and Slavkovic (2009) looked at the sample size for privately testing the bias of a coin. Johnson and Shmatikov (2013), Uhler et al. (2013) and Yu et al. (2014) focused on the Pearson $\chi^2$-test, showing that the noise added by differential privacy vanishes asymptotically as the number of datapoints goes to infinity, and proposed a private $\chi^2$-based test which they study empirically. Wang et al. (2015), Gaboardi et al. (2016), and Kifer and Rogers (2017) then revised the statistical tests themselves to incorporate the additional noise due to privacy as well as the randomness in the data sample. Cai et al. (2017) give a private identity tester based on a noisy $\chi^2$-test over large bins, Sheffet (2017) studies private Ordinary Least Squares using the JL transform, and Aliakbarpour et al. (2018) study identity and equivalence testing. All of these works, however, deal with the centralized model of differential privacy.
A few additional works are highly related to this work. Karwa and Vadhan (2018) give matching upper- and lower-bounds on the confidence intervals for the mean of a population, also in the centralized model. Duchi et al. (2013a, b) give matching upper- and lower-bounds on robust estimators in the local model, and in particular discuss mean estimation. However, their bounds are related to minimax bounds rather than mean estimation or -tests. Gaboardi and Rogers (2018) and Sheffet (2018) study the asymptotic power and the sample complexity (respectively) of a variety of chi-squared based hypothesis tests in the local model. Finally, we mention the related work of Feldman (2017) who also discusses mean estimation using a version of a statistical query oracle which is thus related to LDP. Similar to our approach, Feldman (2017) also uses the folklore approach of binary search in the case the input variance is significantly smaller than the given bounding interval.
We will write the dataset where . Our goal is to develop confidence intervals for the mean subject to local differential privacy in two settings: (1) known variance, (2) unknown variance. We assume that the mean is in some finite interval and similarly for the standard deviation, if it is not known a priori. We first present the definition of differential privacy in the curator model, where the algorithm takes a single element from universe as input.
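An algorithm $\mathcal{A}:\mathcal{X}\to\mathcal{Y}$ is $(\epsilon,\delta)$-differentially private (DP) if for all $x,x'\in\mathcal{X}$ and for all outcomes $S\subseteq\mathcal{Y}$, we have
$$\Pr[\mathcal{A}(x)\in S]\le e^{\epsilon}\,\Pr[\mathcal{A}(x')\in S]+\delta.$$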
We then define local differential privacy, formalized by Kasiviswanathan et al. (2008), which does not require individuals to release their raw data to some curator, but rather each data entry is perturbed to prevent the true entry from being stored.
Definition 2 (LR Oracle).
Given a dataset , a local randomizer oracle takes as input an index and an -DP algorithm , and outputs chosen according to the distribution of , i.e. .
Definition 3 (Kasiviswanathan et al. (2008)).
An algorithm is -local differentially private (LDP) if it accesses the input database via the LR oracle with the following restriction: if for are the ’s invocations of on index , then each for is - DP and , .
In this work we present and prove bounds regarding one-shot mechanisms, where an algorithm is allowed to only query a user once and then she is never queried again.
We say a randomized mechanism is one-shot locally differentially private if for any dataset input , interacts with datum by first choosing a single differentially private mechanism , applying and then only post-processing the resulting output without any further interaction with . In other words, has only one round of interaction with any datapoint. As a result, is merely post-processing of the length vector of outputs .
Note that the definition of a one-shot mechanism does not rule out choosing the separate mechanisms adaptively — it is quite possible that depends on previous outcomes for . The definition only rules out the possibility of revisiting the datum of an individual based on prior responses from this datum.
We now present a result from Bun et al. (2018), which shows that approximate differential privacy, i.e. -DP where , cannot provide more accurate answers than pure-differential privacy, i.e. , in the local setting. This is another significant difference between the local and central model due to the fact that approximate-DP answers can be significantly more accurate than pure-DP answers in the central model.
Theorem 5 (Bun et al. (2018)).
Fix parameter . Let be -LDP with and . Then there exists an algorithm that is -LDP and has total variation distance of at most from for any input .
This result will prove to be useful in showing that our locally private confidence interval widths are tight up to polylogarithmic terms. Note that this result was extended to other values of by Cheu et al. (2018).
We next define our utility goal, which is to find confidence intervals that contain the mean parameter with high probability, where the probability is over the sample and the randomness of the LDP algorithm.
Definition 6 (Confidence Interval).
An algorithm produces a valid -confidence interval for the mean of the underlying Gaussian distribution if the following holds
Our primary objective is to design an algorithm that is -LDP that also produces a valid -confidence interval.
Throughout this paper, we use several concentration bounds, especially for Gaussians, where it is known that for any we have
A useful tool in our analysis is the following well-known variation of McDiarmid’s inequality. The Hoeffding inequality is a direct result of it, in the case all random variables are distributed i.i.d.
[McDiarmid’s Inequality] Let be independent random variables. Denote and such that and . Then for any we have
2.1 Existing Locally Private Mechanisms
A basic approach to preserve differential privacy is to use additive random noise. Suppose each datum is sampled from an interval of length . Then adding random noise taken from to each datum (independently) guarantees -differential privacy (Dwork et al., 2006b); and adding random noise taken from to each datum (independently) guarantees -differential privacy (Dwork et al., 2006a).
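As a rough illustration of the additive-noise approach, the sketch below perturbs a single datum locally. The function names are hypothetical and the calibration constants are assumptions: the Laplace scale is the interval length divided by $\epsilon$, and the Gaussian standard deviation uses the common $\sqrt{2\ln(1.25/\delta)}$ calibration (usually stated for $\epsilon<1$), which may differ from the constants intended in the text.

```python
import math
import random

def laplace_mechanism(x, lo, hi, eps, rng=random):
    """Clip x to [lo, hi] and add Laplace noise of scale (hi - lo) / eps."""
    clipped = min(max(x, lo), hi)
    scale = (hi - lo) / eps
    # The difference of two iid exponentials with mean `scale` is Laplace(0, scale).
    noise = rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)
    return clipped + noise

def gaussian_mechanism(x, lo, hi, eps, delta, rng=random):
    """Clip x to [lo, hi] and add Gaussian noise (assumed calibration)."""
    clipped = min(max(x, lo), hi)
    noise_sd = (hi - lo) * math.sqrt(2.0 * math.log(1.25 / delta)) / eps
    return clipped + rng.gauss(0.0, noise_sd)
```

Both reports are unbiased for the clipped value, so averaging many of them recovers the mean of the clipped data up to noise that shrinks with the number of users.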
Another canonical -local differentially private algorithm is the randomized response algorithm (Warner, 1965). In this mechanism, each datum is a bit and on each datum we operate independently, applying where
It is straight-forward to see that on an input composed of many s and many s, the expected number of s in the output is
and so the naïve estimator for the number of s in the input is
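The mechanism and its de-biased estimator can be sketched as follows, assuming the standard form of randomized response in which each bit is kept with probability $e^{\epsilon}/(1+e^{\epsilon})$ and flipped otherwise. The function names are hypothetical.

```python
import math
import random

def randomized_response(bit, eps, rng=random):
    """Report the true bit w.p. e^eps / (1 + e^eps), the flipped bit otherwise."""
    p_keep = math.exp(eps) / (1.0 + math.exp(eps))
    return bit if rng.random() < p_keep else 1 - bit

def estimate_ones(noisy_bits, eps):
    """De-biased estimate of the number of 1s in the original input."""
    n = len(noisy_bits)
    p_keep = math.exp(eps) / (1.0 + math.exp(eps))
    # E[#reported ones] = n1 * p_keep + (n - n1) * (1 - p_keep); solve for n1.
    return (sum(noisy_bits) - n * (1.0 - p_keep)) / (2.0 * p_keep - 1.0)
```

The division by $2p_{\text{keep}}-1$ is what inflates the error for small $\epsilon$: as $\epsilon \to 0$ this factor tends to zero and the estimator's variance blows up.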
The following claim summarizes a folklore result about input chosen iid from a distribution. This will be useful in the sequel for our results.
Let be a domain and let be a distribution over this domain. Given a predicate , we denote . Given i.i.d draws from , denote by the randomized response estimator in (1) applied to the bits . Fix any . Then if then we have that
The proof applies both the Hoeffding and the McDiarmid inequality. Denoting as the number of s in the sampled input, we argue that when is large enough we have that
The first of the two inequalities is an immediate consequence of the Hoeffding bound, stating that in the process of sampling the entries from the distribution, since . Having fixed the input to have exactly ones, it is evident that is a function of the -bit input , with and where each datum can affect its value by at most . McDiarmid’s inequality thus states that
as . ∎
Another useful locally differentially private algorithm is the bit flipping algorithm (Erlingsson et al., 2014; Bassily and Smith, 2015). Let be a domain and let be a partition of into types. This allows us to identify each datum in our dataset with a -dimensional vector indicating the type using a standard basis vector, or one-hot vector. The Bit Flipping mechanism now runs an independent randomized response mechanism for each coordinate separately, where the privacy-loss for each coordinate is set as . Therefore, per datum we output a vector , and seeing as each coordinate is slightly skewed towards or , then de-biasing with the following estimator is likely to produce a good approximation of the true histogram for the input dataset:
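A minimal sketch of bit flipping and its de-biased histogram is shown below. It assumes a per-coordinate privacy loss of $\epsilon/2$ (two one-hot vectors differ in at most two coordinates); this value and the function names are assumptions for illustration.

```python
import math
import random

def bit_flipping(type_index, d, eps, rng=random):
    """Apply randomized response with budget eps/2 to each one-hot coordinate."""
    p_keep = math.exp(eps / 2.0) / (1.0 + math.exp(eps / 2.0))
    one_hot = [1 if j == type_index else 0 for j in range(d)]
    return [b if rng.random() < p_keep else 1 - b for b in one_hot]

def estimate_histogram(reports, eps):
    """De-biased estimate of the fraction of inputs of each type."""
    n, d = len(reports), len(reports[0])
    p_keep = math.exp(eps / 2.0) / (1.0 + math.exp(eps / 2.0))
    counts = [sum(r[j] for r in reports) for j in range(d)]
    # Invert E[count_j] = n_j * p_keep + (n - n_j) * (1 - p_keep), then normalize.
    return [(c - n * (1.0 - p_keep)) / (n * (2.0 * p_keep - 1.0)) for c in counts]
```

The estimated frequencies need not be non-negative or sum to one; for locating the heaviest bin only their relative order matters.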
Again, our focus is on the performance of the bit flipping mechanism over random input. Specifically, in the sequel we will use the following property.
Let be a domain and let be a distribution over this domain. Given a domain partition , we denote as the vector whose th entry is . Given , we denote the bit-flipping histogram applied to the -dimensional standard-basis vectors . Fix any . Then if then we have that
The proof is similar to the proof of Claim 8, replacing the naive bounds with a union bound. We apply both the Hoeffding and the McDiarmid inequality. Denote the empirical histogram over the types specified by over the drawn inputs as . We argue that when is large enough we have that and
The first of the two inequalities is an immediate consequence of a union bound along with the Hoeffding bound, stating that in the process of sampling the entries from the distribution,
since . Having fixed the input to have exactly entries of each type , it is evident that is a function of the input composed of standard basis vectors in -dimensions. Our unbiased estimator thus satisfies that , and moreover, each datum can affect the value of by at most . Applying a union bound along with McDiarmid’s inequality, we get that
as . ∎
3 Confidence Intervals for the Mean with Known Variance
In this section we assume that is known and we want to estimate a confidence interval for based on a sample of users, subject to local differential privacy. As in Karwa and Vadhan (2018), we will break the algorithm into two parts. First, we discretize the interval into bins of width , so that we have a collection of disjoint intervals.
where . Denoting as the function that maps each to the indicating vector of the bin it resides in, and assigns any point outside the interval the all- vector, we can now apply the Bit Flipping mechanism to estimate the histogram over the bins. Next, we find the bin with the largest count, denoted , and argue this bin is within two standard deviations of the true population mean . We then move to the second part of the algorithm, where we place an interval of length around the -th bin which is likely to hold all remaining points (a point outside this interval is projected onto the nearest point in ). Adding Gaussian noise to each point suffices to make the noisy result -differentially private, and yet we can still sum over all points and obtain an estimation of the population mean which is close up to . Details are given in Algorithm 1. We comment that we could replace the noise in the latter part by Laplace noise (rather than Gaussian) and obtain a -LDP algorithm; this however would prevent us from (naïvely) using the algorithm for the purpose of a Z-test.
The following two theorems prove that Algorithm 1 satisfies the required privacy and utility results.
This follows from the fact that Algorithm 1 applies one of two locally differentially private mechanisms to each datum: either bit flipping (which is known to be -LDP) or additive random noise using Gaussian noise (a -LDP algorithm). ∎
Let and . Set + 1. If we have , then . Furthermore,
The utility analysis of our algorithm follows a similar analysis to Lemma 2.3 in Karwa and Vadhan (2018). First note that Claim 9 assures us that if , then each coordinate of is -close coordinate-wise to the true population histogram over the bins. We show that for sufficiently large, selecting to be the largest coordinate of implies that we are close to within a constant multiple of the standard deviation .
Let and be known and . Let . If , then selecting as the largest coordinate of the histogram we have that w.p. the following holds
The proof follows from the analysis done in Claim 1 of Karwa and Vadhan (2018). We order the entries of the histogram in a non-ascending order as . We then have the following difference between the largest bin and the 3rd largest bin (note that the largest and second largest bin might have equal counts in the extreme case where the mean lies precisely between the two bins, but in any case the 3rd largest bin will be at least one standard deviation from the mean and must have noticeably smaller count)
If , then the index for the corresponding largest entry of will be within of the ratio . Since each bin width is , we have . All that is left is to apply Claim 9 with accuracy parameter set as and . This completes the proof. ∎
Next, conditioned on finding such that , we argue that the interval is sufficiently large so that w.h.p the projection onto this interval does not alter even a single one of the datapoints in .
Suppose is an index satisfying the result of Lemma 12. Fix , and let . Then
We use the inequality . Lemma 12 bounds . Known concentration bounds for Gaussians give that . Applying a union bound over bad events concludes the proof. ∎
We can now provide the full utility analysis of Algorithm 1. Namely, we argue that we indeed obtain a locally differentially private estimate for the mean of our data in the known variance case. We advise the reader to compare this result to Theorem 4.1 in Karwa and Vadhan (2018) where the dependency on is (mainly) additive rather than multiplicative.
Proof of Theorem 11.
By definition, we have that , and by the symmetry of the Gaussian PDF we have that . Therefore, subject to Lemmas 12 and 13 holding, . Thus we have that w.p. it holds that the output of our algorithm satisfies proving the first part of the theorem.
The second part of the theorem follows from standard bounds on the Normal distribution, which state that . The remainder follows from the definition of and in Algorithm 1, and the fact that when then . ∎
Fix , set , and let . There exists an algorithm that returns a valid -confidence interval that is -LDP and
3.1 Experiment: Z-Test
As in Algorithm 1, we denote and . Following the proof of Theorem 11, we have that, under the assumption that no datapoint is clipped, all datapoints we use in the latter part of Algorithm 1 are sampled from . This allows us to infer that (w.p. ) the average of the datapoints in is sampled from . Just as in Algorithm 1, denoting as the average of the noisy datapoints, we now can define an approximation of the likelihood: . As a result, for any interval on the reals we can associate a likelihood of , and we know that w.p. it indeed holds that . This mimics the power of a Z-test (Hogg et al., 2005); in particular we can now compare two intervals as to which one is more likely to hold , compare populations, etc.
Note however that, as opposed to the standard Z-test, the result of Algorithm 1 only gives confidence bounds up to an error of . So for example, given two intervals and we can safely argue that it is more likely that than only when . Similarly, if we wish to draw an interval whose likelihood to contain is for some , we must pick a corresponding -confidence interval from . Naturally, this limits us to the setting where , or conversely: we can never allow for more certainty than the parameter specified as an input for Algorithm 1.
Subject to this caveat, Algorithm 1 allows us to perform a Z-test in a similar fashion to the standard Z-test, after we omit the first datapoints from our sample. One of the more common uses of the Z-test is to test whether a given sample behaves in a similar fashion to the general population. For example, suppose that the SAT scores of the entire population are distributed like a Gaussian of mean and variance . Taking a sample of SAT scores from one specific city, we can apply the Z-test to see if we can reject the null hypothesis that the scores in this city are distributed just as they are in the general population. Should we have samples of SAT scores which happen to be distributed from for some , then sufficiently large (with dependency on ) should allow us to reject this null hypothesis with confidence . We set out to discover precisely this notion of utility, using our locally private Z-test.
The Experiment: We tested our LDP Z-test on iid samples from a Gaussian. We set the null-hypothesis to be , whereas the samples were drawn from the alternative hypothesis with . We run our experiments in the known variance case with a fixed bound and . In each set of experiments we vary while keeping . In Figure 0(a), we plot the average p-value over 1,000 trials for our Z-test when the data is actually generated with sample size and mean that varies. In Figure 0(b), we plot the empirical power of our test over 1,000 trials where we fix and vary the sample size . Our figures show the tradeoffs between the privacy parameter, the alternate we are comparing the null to, and the sample size. The results themselves match the theory pretty well and emphasize the magnitude of the needed sample size. For we need 10,000 sample points to reject the null hypothesis w.h.p. When , even 100,000 sample points do not suffice to reject the null hypothesis w.h.p despite the fact that the difference between the means of the null and the alternative is times greater than the variance. This is a setting where non-privately we can reject the null hypothesis with a sample size . This illustrates (yet again) how LDP relies on the abundance of data.
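For readers who wish to reproduce this kind of experiment, a p-value for the private Z-test could be computed along the following lines. This is a sketch under assumptions: the test is taken to be two-sided, clipping is ignored, and the function and parameter names (`noise_sd` being the standard deviation of the per-user Gaussian noise) are hypothetical.

```python
import math
from statistics import NormalDist

def private_z_test_p_value(mean_hat, mu0, sigma, noise_sd, n):
    """Two-sided p-value for H0: mu = mu0, given the average of n noisy reports."""
    # Under H0 the noisy average is approximately N(mu0, (sigma^2 + noise_sd^2) / n).
    sd_of_mean = math.sqrt((sigma ** 2 + noise_sd ** 2) / n)
    z = (mean_hat - mu0) / sd_of_mean
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))
```

Since `noise_sd` typically dominates `sigma`, the sample size needed to reach a given p-value grows roughly with the square of the noise scale, which is consistent with the large sample sizes reported above.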
4 Mean Estimation with Unknown (Bounded) Variance
In this section we discuss the problem of locally private mean estimation in the case where the variance of the underlying population is unknown. For ease of exposition, we separate this case into two sub-cases. First, we assume that the variance is bounded by some ; this case is the sole focus of this section, as it is the more likely of the two. In the second case, we consider very large variance (), a case which Karwa and Vadhan (2018) do not analyze, and it is deferred to Section 5. As our lower bounds show, our algorithm must be provided bounds and such that . As we show, the dependency of our parameters on these upper- and lower-bounds on the variance is logarithmic (so, for example, is a useful bound for us).
Our overall approach in this section mimics the same approach from Algorithm 1. Our goal is to find a suitably large, yet sufficiently tight interval that is likely to hold the latter part of the input. However, finding this -interval cannot be done using the off-the-shelf bit flipping mechanism as that requires that we know the granularity of each bin in advance. Indeed, if we discretize the interval with an upper-bound on the variance, each bin might be far too large and result in an interval which is far larger than the variance of the underlying population; and if we were to discretize with a lower-bound on the variance we cannot guarantee substantial differences between the bins that are close to . And so, we abandon the idea of finding a histogram on the data. Instead, we propose finding a good approximation for via quantile estimation based on a binary search. This result is likely to be of independent interest. Once we establish formal guarantees on our locally private binary search algorithm (privacy and utility bounds), we plug those into our confidence interval estimation algorithm in Subsection 4.2.
4.1 Locally Private Binary Search and Quantile Estimation
We now show how to estimate quantiles of a probability distribution using randomized response and binary search. We assume our domain is contained in the real line and that there exists some distribution over this domain. We define the quantile as . Given a target probability , let be the quantile we want to estimate, namely . We will say that is a -quantile of when . Since our algorithm is randomized and therefore uses only estimations, we must allow for some error , and find some such that with high probability.
Our binary search begins with some bounded interval guaranteed to contain , i.e.