Subsampled Rényi Differential Privacy and Analytical Moments Accountant

07/31/2018 ∙ by Yu-Xiang Wang, et al. ∙ The Regents of the University of California

We study the problem of subsampling in differential privacy (DP), a question that lies at the center of many successful differentially private machine learning algorithms. Specifically, we provide a tight upper bound on the Rényi Differential Privacy (RDP) (Mironov, 2017) parameters for algorithms that: (1) subsample the dataset, and then (2) apply a randomized mechanism M to the subsample, in terms of the RDP parameters of M and the subsampling probability parameter. This result generalizes the classic subsampling-based "privacy amplification" property of (ϵ,δ)-differential privacy, which applies to only one fixed pair of (ϵ,δ), to a stronger version that exploits properties of each specific randomized algorithm and satisfies an entire family of (ϵ(δ),δ)-differential privacy guarantees for all δ ∈ [0,1]. Our experiments confirm the advantage of using our techniques over keeping track of (ϵ,δ) directly, especially in the setting where we need to compose many rounds of data access.


1 Introduction

Differential privacy (DP) is a mathematical definition of privacy proposed by Dwork et al. (2006b). Ever since its introduction, DP has been widely adopted: it has become the de facto standard privacy definition in the academic world and has seen wide adoption in industry as well (Erlingsson et al., 2014; Apple, 2017; Uber Security, 2017). DP provides provable protection against adversaries with arbitrary side information and computational power, allows clear quantification of privacy losses, and satisfies graceful composition over multiple accesses to the same data. Over the past decade, a large body of work has been developed to design basic algorithms and tools for achieving differential privacy, to understand the privacy-utility trade-offs in different data access setups, and to integrate differential privacy with machine learning and statistical inference. We refer the reader to (Dwork & Roth, 2013) for a more comprehensive overview.

Rényi Differential Privacy (RDP, see Definition 4) (Mironov, 2017) is a recent refinement of differential privacy (Dwork et al., 2006b). It offers a unified view of ϵ-differential privacy (pure DP), (ϵ,δ)-differential privacy (approximate DP), and the related notion of Concentrated Differential Privacy (Dwork & Rothblum, 2016; Bun & Steinke, 2016). The RDP point of view on differential privacy is particularly useful when the dataset is accessed by a sequence of randomized mechanisms, as in this case a moments accountant technique can be used to effectively keep track of the usual (ϵ,δ)-DP parameters across the entire range of δ (Abadi et al., 2016).

A prime use case for the moments accountant technique is the NoisySGD algorithm (Song et al., 2013; Bassily et al., 2014) for differentially private learning, which iteratively executes:

(1)   θ_{t+1} = θ_t − η_t ( Σ_{i∈I_t} ∇ℓ(θ_t; x_i) + Z_t ),   Z_t ∼ N(0, σ²I_d),

where θ_t is the model parameter at the t-th step, η_t is the learning rate, ℓ(θ; x_i) is the loss function of data point x_i, ∇ is the standard gradient operator, I_t is an index set of size m that we draw uniformly at random from {1, …, n}, and Z_t ∼ N(0, σ²I_d) is isotropic Gaussian noise. Adding Gaussian noise (also known as the Gaussian mechanism) is a standard way of achieving (ϵ,δ)-differential privacy (Dwork et al., 2006a; Dwork & Roth, 2013; Balle & Wang, 2018). Since in the NoisySGD case the randomized algorithm first chooses (subsamples) the mini-batch randomly before adding the Gaussian noise, the overall scheme can be viewed as a subsampled Gaussian mechanism. Therefore, with the right setting of σ, each iteration of NoisySGD can be thought of as a private release of a stochastic gradient.
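To make (1) concrete, the following is a minimal NumPy sketch of a single NoisySGD step; the clipping threshold, noise multiplier, and the grad_loss callable are illustrative assumptions rather than prescriptions from this paper.

```python
import numpy as np

def noisy_sgd_step(theta, data, grad_loss, lr=0.1, m=64, sigma=4.0, clip=1.0, rng=None):
    """One NoisySGD update: subsample a mini-batch without replacement,
    sum the clipped per-example gradients, and add Gaussian noise."""
    rng = np.random.default_rng() if rng is None else rng
    idx = rng.choice(len(data), size=m, replace=False)        # subsample step
    grads = np.stack([grad_loss(theta, data[i]) for i in idx])
    norms = np.linalg.norm(grads, axis=1, keepdims=True)
    grads *= np.minimum(1.0, clip / np.maximum(norms, 1e-12))  # bound the sensitivity
    noise = rng.normal(0.0, sigma * clip, size=theta.shape)    # Gaussian mechanism
    return theta - lr * (grads.sum(axis=0) + noise)
```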

More generally, a subsampled randomized algorithm first takes a subsample of the dataset generated through some subsampling procedure (there are different subsampling methods, such as Poisson subsampling, sampling without replacement, sampling with replacement, etc.), and then applies a known randomized mechanism M on the subsampled data points. It is important to exploit the randomness in subsampling because if M is (ϵ,δ)-DP, then (informally) the subsampled mechanism obeys (ϵ′,δ′)-DP for some smaller (ϵ′,δ′) related to the sampling procedure. This is often referred to as the "privacy amplification" lemma (informally, this lemma states that, if a private algorithm is run on a random subset of a larger dataset, and the identity of that subset remains hidden, then this new algorithm provides better privacy protection, reflected through improved privacy parameters, to the entire dataset as a whole than the original algorithm did). This is a key property that enables NoisySGD and variants to achieve optimal rates in convex problems (Bassily et al., 2014), and to work competitively in Bayesian learning (Wang et al., 2015) and deep learning (Abadi et al., 2016) settings. A side note is that privacy amplification is also the key underlying technical tool for characterizing learnability in statistical learning (Wang et al., 2016) and achieving tight sample complexity bounds for simple function classes (Beimel et al., 2013; Bun et al., 2015).

While privacy amplification via subsampling is a very important tool for designing good private algorithms, computing the RDP parameters of a subsampled mechanism is a non-trivial task. A natural question, with wide-ranging implications for designing successful differentially private algorithms, is the following: Can we obtain good bounds on the privacy parameters of a subsampled mechanism in terms of the privacy parameters of the original mechanism? With the exception of the special case of the Gaussian mechanism under Poisson subsampling analyzed in (Abadi et al., 2016), there is no analytical formula available to generically convert the RDP parameters of a mechanism to the RDP parameters of the subsampled mechanism.

In this paper, we tackle this central problem in private data analysis and provide the first general result in this area. Specifically, we analyze RDP amplification under a sampling without replacement procedure subsample, which takes a dataset of n points and outputs a sample from the uniform distribution over all its subsets of size m. Our contributions can be summarized as follows:

  • We provide a tight bound (Theorem 9) on the RDP parameter ϵ_{M∘subsample}(α) of a subsampled mechanism M ∘ subsample in terms of the RDP parameter ϵ_M(α) of the original mechanism M itself and the subsampling ratio γ = m/n. Here, α is the order of the Rényi divergence in the RDP definition (see Definition 4 and the following discussion). This is the first general result in this area that can be applied to any RDP mechanism. For example, in addition to providing RDP parameter bounds for the subsampled Gaussian mechanism case, our result enables analytic calculation of similar bounds for many more commonly used privacy mechanisms including subsampled Laplace mechanisms, subsampled randomized response mechanisms, subsampled "posterior sampling" algorithms under exponential family models (Geumlek et al., 2017), etc. Even for the subsampled Gaussian mechanism our bounds are tighter than those provided by Abadi et al. (2016) (albeit the subsampling procedure and the dataset neighboring relation they use are slightly different from ours).

  • Consider a mechanism M with RDP parameter ϵ_M(α). Interestingly, our bound on the RDP parameter of the subsampled mechanism indicates that as the order α of RDP increases, there is a phase transition point α*. For α ≤ α*, the subsampled mechanism has an RDP parameter of order O(γ²α), while for α > α*, the RDP parameter either quickly converges to ϵ_M(α), which does not depend on γ, or tapers off at log(1 + γ(e^{ϵ_M(∞)} − 1)), which happens when ϵ_M(∞) < ∞. The subsampled Gaussian mechanism falls into the first category, while the subsampled Laplace mechanism falls into the second.

  • Our analysis reveals a new theoretical quantity of interest that has not been investigated before — a ternary version of the Pearson-Vajda divergence (formally defined in Appendix B). A privacy definition defined through this divergence seems naturally coupled with understanding the effects of subsampling, just like how Rényi differential privacy (RDP) (Mironov, 2017) seems naturally coupled with understanding the effects of composition.

  • From a computational efficiency perspective, we propose an efficient data structure to keep track of the Rényi differential privacy parameters in symbolic form, and to output the corresponding (ϵ,δ)-differential privacy guarantee as needed using efficient numerical methods. This avoids the need to specify a discrete list of moments ahead of time as required in the moments accountant method of Abadi et al. (2016) (see the discussion in Section 3.3). Finally, our experiments confirm the improvements in privacy parameters that can be obtained by applying our bounds.

We end this introduction with a methodological remark. The main result of this paper is the bound in Theorem 9, which at first glance looks cumbersome. The remarks following the statement of the theorem in Section 3.1 discuss some of the asymptotic implications of this bound, as well as its meaning in several special cases. These provide intuitive explanations justifying the tightness of the bound. In practice, however, asymptotic bounds are of limited interest: concrete bounds with explicit, tight constants that can be efficiently computed are needed to provide the best possible privacy-utility trade-off in practical applications of differential privacy. Thus, our results should be interpreted under this point of view, which is summarized by the leitmotif “in differential privacy, constants matter”.

2 Background and Related Work

In this section, we review some background about differential privacy, some related privacy notions, and the technique of moments accountant.

Differential Privacy and the Privacy Loss Random Variable.

We start with the definition of (ϵ,δ)-differential privacy. We assume that X is the domain that the datapoints are drawn from. We call two datasets X and X′ neighboring (adjacent) if they differ in at most one data point, meaning that we can obtain X′ by replacing one data point from X by another arbitrary data point. We denote this relation by X ≃ X′.

Definition 1 (Differential Privacy).

A randomized algorithm M is (ϵ,δ)-DP (differentially private) if for every pair of neighboring datasets X ≃ X′ (i.e., datasets that differ by only one datapoint) and every possible (measurable) output set S, the following inequality holds: Pr[M(X) ∈ S] ≤ e^ϵ · Pr[M(X′) ∈ S] + δ.

The definition ensures that it is information-theoretically impossible for an adversary to infer whether the input dataset is X or X′ beyond a certain confidence, hence offering a degree of plausible deniability to individuals in the dataset. Here, ϵ and δ are what we call privacy loss parameters, and the smaller they are, the stronger the privacy guarantee is. A helpful way to work with differential privacy is in terms of tail bounds on the privacy loss random variable. Let P and Q be the probability distributions induced by M on neighboring datasets X and X′ respectively; then the privacy loss random variable is defined as Z := log(P(o)/Q(o)), where o ∼ P. Up to constant factors, (ϵ,δ)-DP (Definition 1) is equivalent to requiring that the probability of the privacy loss random variable exceeding ϵ is at most δ for all neighboring datasets X ≃ X′. (For meaningful guarantees, δ is typically taken to be "cryptographically" small.) An important strength of differential privacy is the ability to reason about cumulative privacy loss under composition of multiple analyses on the same dataset.

Classical design of differentially private mechanisms takes these privacy parameters as inputs, and the algorithm then carefully introduces randomness to satisfy the privacy constraint (Definition 1) while simultaneously trying to achieve good utility (performance) bounds. However, this paradigm has shifted a bit recently, as it has become clear that a more fine-grained analysis tailored to specific mechanisms can yield more favorable privacy-utility trade-offs and better privacy loss parameters under composition (see, e.g., Dwork & Rothblum, 2016; Abadi et al., 2016; Balle & Wang, 2018).

A common technique for achieving differential privacy with a real-valued function f is the addition of noise calibrated to f's sensitivity Δ, which is defined as the maximum of the absolute distance |f(X) − f(X′)| over adjacent inputs X ≃ X′. (The restriction to a scalar-valued function is intended to simplify this presentation, but is not essential.) In this paradigm, the Gaussian mechanism is defined as: M(X) = f(X) + N(0, σ²). A single application of the Gaussian mechanism to a function f with sensitivity Δ satisfies (ϵ,δ)-differential privacy if ϵ < 1 and σ ≥ Δ√(2 log(1.25/δ))/ϵ (Dwork & Roth, 2013, Theorem 3.22). (Balle & Wang (2018) show that a more complicated relation between σ, ϵ, and δ yields an if and only if statement.)
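For concreteness, here is a minimal sketch of this classical calibration (the exact, tighter calibration of Balle & Wang (2018) would replace the closed-form σ below):

```python
import numpy as np

def gaussian_mechanism(value, sensitivity, eps, delta, rng=None):
    """Release value + N(0, sigma^2) with the classical calibration
    sigma = sensitivity * sqrt(2 * log(1.25 / delta)) / eps (requires eps < 1)."""
    assert 0 < eps < 1, "the classical calibration is stated for eps in (0, 1)"
    rng = np.random.default_rng() if rng is None else rng
    sigma = sensitivity * np.sqrt(2 * np.log(1.25 / delta)) / eps
    return value + rng.normal(0.0, sigma)
```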

Stochastic Gradient Descent and Subsampling Lemma.

A popular way of designing differentially private machine learning models is to use Stochastic Gradient Descent (SGD) with differentially private releases of (sometimes clipped) gradients evaluated on mini-batches of a dataset (Song et al., 2013; Bassily et al., 2014; Wang et al., 2015; Foulds et al., 2016; Abadi et al., 2016). Algorithmically, these methods are nearly the same and are all based on the NoisySGD idea presented in (1). They differ primarily in how they keep track of their privacy loss. Song et al. (2013) use a sequence of disjoint mini-batches to ensure each data point is used only once in every data pass. The results in (Bassily et al., 2014; Wang et al., 2016; Foulds et al., 2016) make use of the privacy amplification lemma to take advantage of the randomness introduced by subsampling. The first privacy amplification lemma appeared in (Kasiviswanathan et al., 2011; Beimel et al., 2013), with many subsequent improvements in different settings. For the case of (ϵ,δ)-DP, Balle et al. (2018) provide a unified account of privacy amplification techniques for different types of subsampling and dataset neighboring relations. In this paper, we work in the subsampling without replacement setup, which satisfies the following privacy amplification lemma for (ϵ,δ)-DP.

Definition 2 (Subsample).

Given a dataset X of n points, the procedure subsample selects a random sample from the uniform distribution over all subsets of X of size m. The ratio γ := m/n is defined as the sampling parameter of the subsample procedure.

Lemma 3 ((Ullman, 2017); this result follows from Ullman's proof, though the notes state a weaker result; see also (Balle et al., 2018)).

If M is (ϵ,δ)-DP, then the mechanism M′ that applies M ∘ subsample obeys (ϵ′, δ′)-DP with ϵ′ = log(1 + γ(e^ϵ − 1)) and δ′ = γδ.

Roughly, the lemma says that subsampling with probability γ amplifies an (ϵ,δ)-DP algorithm into a (O(γϵ), γδ)-DP algorithm for a sufficiently small choice of ϵ. The overall differential privacy guarantees in (Wang et al., 2015; Bassily et al., 2014; Foulds et al., 2016) were obtained by keeping track of the privacy loss over each iterative update of the model parameters using the strong composition theorem in differential privacy (Dwork et al., 2010), which gives roughly Õ(γϵ√k)-DP (the Õ notation hides various logarithmic factors) for k iterations of an arbitrary (O(γϵ), γδ)-DP algorithm (see Appendix A for a discussion of various composition results in differential privacy).
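Both ingredients of this recipe are easy to state in code. Below is a sketch under the statements of Lemma 3 and the strong composition theorem as given in Appendix A; the function names are illustrative.

```python
import math

def amplify_by_subsampling(eps, delta, gamma):
    """Lemma 3: an (eps, delta)-DP mechanism applied to a gamma-fraction
    subsample is (log(1 + gamma * (e^eps - 1)), gamma * delta)-DP."""
    return math.log1p(gamma * math.expm1(eps)), gamma * delta

def strong_composition(eps, delta, k, delta_slack):
    """Dwork et al. (2010): k-fold composition of (eps, delta)-DP is
    (sqrt(2 k log(1/delta_slack)) * eps + k * eps * (e^eps - 1),
     k * delta + delta_slack)-DP."""
    eps_total = math.sqrt(2 * k * math.log(1 / delta_slack)) * eps \
        + k * eps * math.expm1(eps)
    return eps_total, k * delta + delta_slack
```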

The work of Abadi et al. (2016) was the first to take advantage of the fact that each iteration of NoisySGD is a subsampled Gaussian mechanism, and used a mechanism-specific way of doing the strong composition. Their technique, referred to as the moments accountant, is described below.

Cumulant Generating Functions, Moments Accountant, and Rényi Differential Privacy. The moments accountant technique of Abadi et al. (2016) centers around the cumulant generating function (CGF, or the log of the moment generating function) of the privacy loss random variable:

(2)   K(λ) := log E_{o∼P}[ e^{λ log(P(o)/Q(o))} ].

After a change of measure, this is equivalent to:

K(λ) = log E_{o∼Q}[ (P(o)/Q(o))^{λ+1} ].

If two random variables have identical CGFs, then they are identically distributed (almost everywhere). In other words, this function characterizes the entire distribution of the privacy loss random variable.

Before explaining the details behind the moments accountant technique, we introduce the notion of Rényi differential privacy (RDP) (Mironov, 2017) as a generalization of differential privacy that uses the α-Rényi divergence between the distributions P and Q.

Definition 4 (Rényi Differential Privacy).

We say that a mechanism M is (α, ϵ)-RDP with order α ∈ (1, ∞) if for all neighboring datasets X ≃ X′,

D_α( M(X) ‖ M(X′) ) := (1/(α−1)) log E_{o∼M(X′)}[ ( Pr[M(X) = o] / Pr[M(X′) = o] )^α ] ≤ ϵ.

As α → ∞, RDP reduces to ϵ-DP (pure DP); i.e., a randomized mechanism M is ϵ-DP if and only if for any two adjacent inputs X and X′ it satisfies D_∞(M(X) ‖ M(X′)) ≤ ϵ. As α → 1, the RDP notion reduces to a Kullback-Leibler-based privacy notion, which is equivalent to a bound on the expectation of the privacy loss random variable. For a detailed exposition of the guarantees and properties of Rényi differential privacy that mirror those of differential privacy, see Section III of Mironov (2017). Here, we highlight two key properties that are relevant for this paper.

Lemma 5 (Adaptive Composition of RDP, Proposition 1 of (Mironov, 2017)).

If M₁, which takes the dataset X as input, obeys (α, ϵ₁)-RDP, and M₂, which takes the dataset X and the output of M₁ as input, obeys (α, ϵ₂)-RDP, then their composition obeys (α, ϵ₁ + ϵ₂)-RDP.

Lemma 6 (RDP to DP conversion, Proposition 3 of (Mironov, 2017)).

If M obeys (α, ϵ)-RDP, then M obeys (ϵ + log(1/δ)/(α − 1), δ)-DP for all 0 < δ < 1.
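For instance, Lemma 6 is a one-line conversion (a direct transcription, included here for later reference):

```python
import math

def rdp_to_dp(alpha, eps, delta):
    """Lemma 6: an (alpha, eps)-RDP mechanism is (eps', delta)-DP with
    eps' = eps + log(1/delta) / (alpha - 1), for any 0 < delta < 1."""
    return eps + math.log(1 / delta) / (alpha - 1)
```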

RDP Functional View. While RDP for each fixed α can be used as a standalone privacy measure, we emphasize its functional view in which ϵ is a function of α for α ∈ (1, ∞), and this function is completely determined by the mechanism M. This is denoted by ϵ_M(·), and with this notation, mechanism M satisfies (α, ϵ_M(α))-RDP in Definition 4. In other words,

ϵ_M(α) := sup_{X ≃ X′} D_α( M(X) ‖ M(X′) ).

Here ϵ_M(·) is referred to as the RDP parameter. We drop the subscript M from ϵ_M(·) when it is clear from the context. We use ϵ_M(∞) (or ϵ(∞)) to denote the case where α = ∞, which indicates that the mechanism is ϵ-DP (pure DP) with ϵ = ϵ_M(∞).

Our goal is, given a mechanism M that satisfies (α, ϵ_M(α))-RDP, to investigate the RDP parameter of the subsampled mechanism M ∘ subsample, i.e., to get a bound ϵ′(α) such that the mechanism M ∘ subsample satisfies (α, ϵ′(α))-RDP.

Note that ϵ_M(·) is equivalent to a data-independent upper bound of the CGF K(·) (as defined in (2)), up to a scaling transformation (with λ = α − 1), as noted by the following remark.

Remark 7 (RDP vs. CGF).

A randomized mechanism M with privacy loss CGF K(·) obeys (λ + 1, K(λ)/λ)-RDP for all λ > 0.

The idea of the moments accountant (Abadi et al., 2016) is essentially to keep track of the evaluations of the CGF at a list of fixed locations through Lemma 5; Lemma 6 then allows one to find the smallest ϵ given a desired δ, or vice versa, using:

(3)   ϵ(δ) = min_{λ > 0} [ K(λ) + log(1/δ) ] / λ,
(4)   δ(ϵ) = min_{λ > 0} exp( K(λ) − λϵ ).

Using the convexity of the CGF and the monotonicity of the Rényi divergence in α (Van Erven & Harremos, 2014, Corollary 2, Theorem 3), we observe that the optimization problem in (4) is log-convex and the optimization problem in (3) is unimodal/quasi-convex. Therefore, the optimization problem in (3) (similarly, in (4)) can be solved to an arbitrary accuracy τ in time O(log(λ*/τ)) using the bisection method, where λ* is the optimal value of λ in (3) (similarly, (4)). The same result holds even if all we have is (possibly noisy) blackbox access to K or its derivative (see more details in Appendix G).
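As a sketch of how (3) can be solved numerically: since the objective is unimodal in λ, a derivative-free golden-section search suffices (used here in place of bisection on the derivative; the Gaussian CGF in the example is a standard fact, included purely as a test case).

```python
import math

def eps_from_cgf(cgf, delta, lam_max=1e4, tol=1e-6):
    """Solve (3): eps(delta) = min_{lam > 0} (K(lam) + log(1/delta)) / lam,
    by golden-section search over the quasi-convex objective."""
    obj = lambda lam: (cgf(lam) + math.log(1 / delta)) / lam
    phi = (math.sqrt(5) - 1) / 2
    lo, hi = tol, lam_max
    while hi - lo > tol:
        a, b = hi - phi * (hi - lo), lo + phi * (hi - lo)
        lo, hi = (lo, b) if obj(a) <= obj(b) else (a, hi)
    return obj((lo + hi) / 2)

# Gaussian mechanism, sensitivity 1, sigma = 4: K(lam) = lam*(lam+1)/(2*sigma^2).
print(eps_from_cgf(lambda lam: lam * (lam + 1) / 32.0, delta=1e-5))
```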

For other useful properties of the CGF, an elementary proof of its convexity, and how convexity implies the monotonicity of the Rényi divergence, see Appendix H.

Other Related Work. A closely related notion to RDP is that of zero-concentrated differential privacy (zCDP) introduced in (Bun & Steinke, 2016) (see also (Dwork & Rothblum, 2016)). zCDP is related to the CGF of the privacy loss random variable, as we note here.

Remark 8 (Relation between CGF and Zero-concentrated Differential Privacy).

If a randomized mechanism M obeys (ξ, ρ)-zCDP for some parameters ξ, ρ ≥ 0, then the CGF obeys K(λ) ≤ ξλ + ρλ(λ + 1). On the other hand, if M's privacy loss r.v. has CGF K(·), then M is also (ξ, ρ)-zCDP for all ξ, ρ such that the quadratic function ξλ + ρλ(λ + 1) ≥ K(λ) for all λ > 0.

In general, the RDP view of privacy is broader than the CDP view, as it captures finer information. For CDP, subsampling does not improve the privacy parameters (Bun et al., 2018). A truncated variant of zCDP (tCDP) was very recently proposed by Bun et al. (2018), who studied the effect of subsampling on tCDP. While this independent work attempts to solve a problem closely related to ours, the two are not directly comparable in that they deal with the amplification properties of tCDP while we deal with those of Rényi DP (and therefore CDP without truncation). A simple consequence of this difference is that the popular subsampled Gaussian mechanism explained above, which is covered by our analysis, is not directly covered by the amplification properties of tCDP.

3 Our Results

In this section, we first present our main result, an amplification theorem for Rényi differential privacy via subsampling. We first provide the upper bound, and then discuss the optimality of this bound. Based on these bounds, in Section 3.3, we discuss an idea for implementing a data structure that can efficiently track privacy parameters under composition.

3.1 “Privacy Amplification” for RDP

We start with our main theorem, which bounds ϵ_{M∘subsample}(α) for the mechanism M ∘ subsample in terms of ϵ_M(·) of the mechanism M and the sampling parameter γ used in the subsample procedure. Missing details from this section are collected in Appendix B.

Theorem 9 (RDP for Subsampled Mechanisms).

Given a dataset of n points drawn from a domain X and a (randomized) mechanism M that takes an input from X^m for m ≤ n, let the randomized algorithm M ∘ subsample be defined as: (1) subsample: subsample without replacement m datapoints of the dataset (sampling parameter γ = m/n), and (2) apply M: a randomized algorithm taking the subsampled dataset as the input. For all integers α ≥ 2, if M obeys (α, ϵ(α))-RDP, then this new randomized algorithm M ∘ subsample obeys (α, ϵ′(α))-RDP where

ϵ′(α) ≤ (1/(α−1)) log( 1 + γ² (α choose 2) min{ 4(e^{ϵ(2)} − 1), e^{ϵ(2)} min{2, (e^{ϵ(∞)} − 1)²} } + Σ_{j=3}^{α} γ^j (α choose j) e^{(j−1)ϵ(j)} min{ 2, (e^{ϵ(∞)} − 1)^j } ).
The bound in the above theorem might appear complicated, and this is partly because of our efforts to get a precise non-asymptotic bound (and not just a O(·) bound) that can be implemented in a real system. Some additional practical considerations related to evaluating the bound in this theorem, such as the computational resources needed, numerical stability issues, etc., are discussed in Appendix G. The phase transition behavior of this bound, noted in the introduction, is probably most easily observed in Figure 1 (Section 4), where we empirically illustrate the behavior of this bound for commonly used subsampled mechanisms. Before discussing the proof idea, we mention a few remarks about this result.
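The bound is straightforward to evaluate numerically. Below is a sketch that follows the displayed statement of Theorem 9 directly (the numerical-stability refinements of Appendix G are omitted); ϵ(·) is passed as a function that may return infinity, as it does at α = ∞ for the Gaussian mechanism.

```python
from math import comb, exp, expm1, inf, log

def subsampled_rdp_bound(eps, gamma, alpha):
    """Theorem 9 upper bound on the RDP of M o subsample at integer alpha >= 2,
    given the RDP curve eps(j) of the base mechanism M."""
    e_inf = expm1(eps(inf)) if eps(inf) < inf else inf   # e^{eps(inf)} - 1
    j2 = gamma**2 * comb(alpha, 2) * min(4 * expm1(eps(2)),
                                         exp(eps(2)) * min(2.0, e_inf**2))
    tail = sum(gamma**j * comb(alpha, j) * exp((j - 1) * eps(j))
               * min(2.0, e_inf**j) for j in range(3, alpha + 1))
    return log(1 + j2 + tail) / (alpha - 1)

# Subsampled Gaussian, sensitivity 1, sigma = 4, gamma = 0.01:
gauss = lambda a: a / 32.0 if a < inf else inf
print(subsampled_rdp_bound(gauss, gamma=0.01, alpha=16))
```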

Generality. Our results cover any Rényi differentially private mechanism, including those based on any exponential family distribution (see Geumlek et al., 2017, and our exposition in Appendix I). As mentioned earlier, previously such a bound (even asymptotically) was only known for the special case of the subsampled Gaussian mechanism (Abadi et al., 2016).

Pure DP. In particular, Theorem 9 also covers pure-DP mechanisms (such as the Laplace and randomized response mechanisms), which have a bounded ϵ(∞). In this case, we can upper bound everything within the logarithm of Theorem 9 with a binomial expansion:

1 + Σ_{j=2}^{α} γ^j (α choose j) (e^{ϵ(∞)} − 1)^j ≤ ( 1 + γ(e^{ϵ(∞)} − 1) )^α,

which results in a bound of the form

ϵ′(α) ≤ (α/(α−1)) log( 1 + γ(e^{ϵ(∞)} − 1) ).

As α → ∞, the expression converges to log(1 + γ(e^{ϵ(∞)} − 1)), which gives quantitatively the same result as the privacy amplification result in Lemma 3 for pure DP, modulo an extra multiplicative factor of α/(α−1) that becomes negligible as α grows.

Bound under Additional Assumptions. The bound in Theorem 9 can be strengthened under additional assumptions on the RDP guarantee. We defer a detailed discussion of this topic to Appendix B.5 (see Theorem 27), but note that a consequence is that one can replace the min{·} terms in the above bound with an exact evaluation given by the forward finite difference operator of an appropriately defined functional. We also note that these additional assumptions hold for the Gaussian mechanism.

In particular, for the subsampled Gaussian mechanism applied to functions with sensitivity 1 (i.e., ϵ(α) = α/(2σ²)), the dominant part of the upper bound on ϵ′(α) arises from the j = 2 term. Firstly, since the Gaussian mechanism does not have a bounded ϵ(∞), this term simplifies to γ²(α choose 2) min{ 4(e^{ϵ(2)} − 1), 2e^{ϵ(2)} }. Let us consider two regimes: (a) σ large, (b) σ small. When σ is large, 4(e^{ϵ(2)} − 1) becomes the tight term in the min. In this case, for small γ and α, the overall bound simplifies to O(γ²α/σ²) (matching the asymptotic bound given in Appendix C). When σ is small, 2e^{ϵ(2)} becomes the tight term in the min. This (small σ) is a regime that the results of Abadi et al. (2016) do not cover.

Integer to Real-valued α. The above calculations rely on a binomial expansion and thus only work for integer α's. To extend them to any real-valued α, we can use the relation between RDP and CGF mentioned in Remark 7, and the fact that the CGF is a convex function (see Lemma 36 in Appendix H). The convexity of K(·) implies that a piecewise linear interpolation yields a valid upper bound for all real λ.

Corollary 10.

Let ⌊λ⌋ and ⌈λ⌉ denote the floor and ceiling of λ. Then, K(λ) ≤ (⌈λ⌉ − λ) K(⌊λ⌋) + (λ − ⌊λ⌋) K(⌈λ⌉).

The bound on K(λ) can be translated into an RDP parameter bound as noted in Remark 7.
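A sketch of Corollary 10, which extends a CGF known at integer orders to all real λ:

```python
import math

def cgf_at_real(cgf_int, lam):
    """Corollary 10: convexity of the CGF makes piecewise linear
    interpolation between integer evaluations a valid upper bound."""
    lo, hi = math.floor(lam), math.ceil(lam)
    if lo == hi:
        return cgf_int(lo)
    return (hi - lam) * cgf_int(lo) + (lam - lo) * cgf_int(hi)
```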

Proof Idea. The proof of this theorem is roughly split into three parts (see Appendix B.1). In the first part, we define a new family of privacy definitions called ternary-|χ|^α-differential privacy (based on a ternary version of the Pearson-Vajda divergence) and show that it handles subsampling naturally (Proposition 16, Appendix B.1). In the second part, we bound the Rényi DP using the ternary-|χ|^α-differential privacy and apply the subsampling lemma from the first part. In the third part, we propose a number of ways of converting the expression stated in terms of ternary-|χ|^α-differential privacy back to RDP (Lemmas 17, 18, 19, Appendix B.1). Each of these conversion strategies yields different coefficients in the sum inside the logarithm defining ϵ′(α); our bound accounts for all these strategies at once by taking the minimum of these coefficients.

3.2 A lower bound of the RDP for subsampled mechanisms

We now discuss whether our bound in Theorem 9 can be improved. First, we provide a short answer: it cannot be improved in general.

Proposition 11.

Let M be a randomized algorithm that takes a dataset in X^m as input. If M obeys (α, ϵ(α))-RDP, and there exists a pair of adjacent datasets whose output distributions attain the bound ϵ(α) simultaneously for all integers α ≥ 2 (e.g., this condition is true for all output perturbation mechanisms for counting queries), then the RDP function of M ∘ subsample obeys the following lower bound for all integers α ≥ 2:

ϵ_{M∘subsample}(α) ≥ (1/(α−1)) log( (1−γ)^{α−1}(1 + (α−1)γ) + Σ_{j=2}^{α} (α choose j) (1−γ)^{α−j} γ^j e^{(j−1)ϵ(j)} ).

Proof.

Consider two datasets X, X′ where X contains n identical data points and X′ differs from X only in its last data point. By construction, M ∘ subsample(X) is distributed as p, and M ∘ subsample(X′) is distributed as the mixture (1−γ)p + γq, where p and q denote the output distributions of M on subsamples that, respectively, exclude and include the differing point. It follows that

e^{(α−1) ϵ_{M∘subsample}(α)} ≥ E_{o∼p}[ ( (1−γ) + γ q(o)/p(o) )^α ] = Σ_{j=0}^{α} (α choose j) (1−γ)^{α−j} γ^j E_{o∼p}[ (q(o)/p(o))^j ].

When we take p, q to be the pair in the assumption that attains the RDP upper bound, we can replace E_{o∼p}[(q/p)^j] in the above bound with e^{(j−1)ϵ(j)}, as claimed. ∎

Let us compare the above lower bound to our upper bound in Theorem 9 in two regimes. When the j = 2 term is the dominating factor in the summation, both the upper and lower bounds are of order γ²(α choose 2)(e^{ϵ(2)} − 1); in other words, they match up to a constant multiplicative factor. For other parameter configurations, note that (1−γ)^{α−j} ≤ 1, so our bound in Theorem 9 (with the min evaluating to 2) is tight up to an additive factor that vanishes as γ → 0. We provide explicit comparisons of the upper and lower bounds in the numerical experiments presented in Section 4.

The longer answer to this question of optimality is more intricate. The RDP bound can be substantially improved when we consider a more fine-grained per-instance RDP, in the same flavor as per-instance (ϵ,δ)-DP (Wang, 2018). The only difference from standard RDP is that ϵ(α) is now parameterized by a pair of fixed adjacent datasets. This point is illustrated in Appendix C, where we discuss an asymptotic approximation of the Rényi divergence for the subsampled Gaussian mechanism.

3.3 Analytical Moments Accountant

Our theoretical results above allow us to build an analytical moments accountant for composing differentially private mechanisms. This is a data structure that tracks the CGF of a (potentially adaptive) sequence of mechanisms in symbolic form (or as an evaluation oracle). It supports subsampling before applying a mechanism M, in which case the CGF is adjusted accordingly using the RDP amplification bound in Theorem 9. The data structure allows data analysts to query the smallest ϵ for a given δ (or vice versa) for (ϵ,δ)-DP using (3) (or (4)).

Practically, our analytical moments accountant improves over the moments accountant proposed by Abadi et al. (2016) in several noteworthy ways: (1) our approach allows one to keep track of the CGFs at all orders λ in symbolic form without paying an infinite memory cost, whereas the moments accountant (Abadi et al., 2016) requires a predefined list of λ's and pays memory proportional to the size of that list; (2) our approach completely avoids the numerical integration used by the moments accountant; and finally (3) our approach supports subsampling for generic RDP mechanisms, while the moments accountant was built to support only Gaussian mechanisms. All of this translates into an efficient and accurate way of tracking ϵ's and δ's when composing differentially private mechanisms.

We design the data structure to be numerically stable, and efficient in both space and time. In particular, it tracks CGFs with O(1) time to compose a new mechanism, and uses space only linear in the number of unique mechanisms applied (rather than the total number of mechanisms applied). Using the convexity of CGFs and the monotonicity of RDP, we are able to provide conversion to (ϵ,δ)-DP within accuracy τ with oracle complexity O(log(λ*/τ)), where λ* is the optimal value of λ. Similarly for δ(ϵ) queries.

Note that for subsampled mechanisms, direct evaluation of the upper bound in Theorem 9 already takes time polynomial in α. To make the data structure truly scalable, we devise a number of ways to approximate the bound using only O(1) evaluations of ϵ(·). More details about our analytical moments accountant and substantiation of the above claims are provided in Appendix G.
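To summarize the interface, here is a deliberately minimal sketch of such an accountant; the actual data structure (Appendix G) replaces the grid search in get_eps with the bisection discussed after (3) and (4), and all names below are illustrative.

```python
import math

class AnalyticalMomentsAccountant:
    """Track the total privacy-loss CGF of composed mechanisms symbolically
    (as callables) and convert to (eps, delta)-DP on demand."""

    def __init__(self):
        self._mechs = {}  # name -> (cgf callable, number of applications)

    def compose(self, name, cgf, count=1):
        _, old = self._mechs.get(name, (cgf, 0))
        self._mechs[name] = (cgf, old + count)  # space ~ number of unique mechanisms

    def total_cgf(self, lam):
        # CGFs add under adaptive composition (Lemma 5 via Remark 7).
        return sum(c * cgf(lam) for cgf, c in self._mechs.values())

    def get_eps(self, delta, lam_grid=range(1, 500)):
        # eps(delta) = min_lam (K(lam) + log(1/delta)) / lam, cf. (3).
        return min((self.total_cgf(l) + math.log(1 / delta)) / l for l in lam_grid)

# Example: 100 adaptive applications of a Gaussian mechanism with sigma = 4.
acct = AnalyticalMomentsAccountant()
acct.compose("gaussian_sigma4", lambda lam: lam * (lam + 1) / 32.0, count=100)
print(acct.get_eps(delta=1e-5))
```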

4 Experiments and Discussion

In this section, we present numerical experiments to demonstrate our upper and lower bounds of RDP for subsampled mechanisms and the usage of the analytical moments accountant. In particular, we consider three popular randomized privacy mechanisms: (1) the Gaussian mechanism, (2) the Laplace mechanism, and (3) the randomized response mechanism, and investigate the amplification effect of subsampling with these mechanisms on RDP. The RDP parameters of these three mechanisms are known in analytical form (see Mironov, 2017, Table II):

ϵ_Gaussian(α) = α/(2σ²),
ϵ_Laplace(α) = (1/(α−1)) log( (α/(2α−1)) e^{(α−1)/b} + ((α−1)/(2α−1)) e^{−α/b} ),
ϵ_RandResp(α) = (1/(α−1)) log( p^α (1−p)^{1−α} + (1−p)^α p^{1−α} ).

[Figure 1 panels: (a) subsampled Gaussian, (b) subsampled Laplace, (c) subsampled randomized response in the high privacy regime; (d)-(f) the same three mechanisms in the low privacy regime.]

Figure 1: The RDP parameter ϵ′(α) of the three subsampled mechanisms as a function of the order α, with the same subsampling rate γ in all the experiments. The top row illustrates the case where the base mechanism (before amplification using subsampling) is in a relatively high privacy regime, and the bottom row shows the low privacy regime. The RDP upper bound obtained through Theorem 9 is shown as the blue curve, and the corresponding lower bound obtained through Proposition 11 is shown as the red dashed curve. For the Gaussian case, we also present the RDP bound obtained through the asymptotic Gaussian approximation idea explained in Appendix C.

Here σ² represents the variance of the Gaussian perturbation, b the scale of the Laplace perturbation, and p the probability of replying truthfully in randomized response. We considered two groups of parameters for the three base mechanisms.

High Privacy Regime:

We set σ, b, and p so that, under the standard differential privacy calibration, each of the Gaussian, Laplace, and randomized response base mechanisms satisfies a strong (small-ϵ) differential privacy guarantee.

Low Privacy Regime:

We set σ, b, and p so that, under the standard differential privacy calibration, each base mechanism satisfies only a weak (large-ϵ) differential privacy guarantee.

The same subsampling ratio γ is used for both regimes.
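For reference, the analytical RDP curves above translate directly into code; a sketch following the Table II expressions as displayed earlier in this section:

```python
import math

def rdp_gaussian(sigma, a):
    """RDP of the sensitivity-1 Gaussian mechanism: alpha / (2 sigma^2)."""
    return a / (2 * sigma**2)

def rdp_laplace(b, a):
    """RDP of the sensitivity-1 Laplace mechanism with scale b, alpha > 1."""
    return math.log((a / (2 * a - 1)) * math.exp((a - 1) / b)
                    + ((a - 1) / (2 * a - 1)) * math.exp(-a / b)) / (a - 1)

def rdp_randresp(p, a):
    """RDP of randomized response with truthful-response probability p."""
    return math.log(p**a * (1 - p)**(1 - a)
                    + (1 - p)**a * p**(1 - a)) / (a - 1)
```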

In Figure 1, we plot the upper and lower bounds (as well as asymptotic approximations, where applicable) of the RDP parameter ϵ′(α) for the subsampled mechanisms as a function of α. As we can see, the upper and lower bounds match up to a multiplicative constant for all three mechanisms in all regimes. There is a phase transition in the subsampled Gaussian case, as we expect, in both the upper and the lower bound. For the subsampled Gaussian mechanism in Figures 1(a) and 1(d), the RDP parameter bounds additionally match up to an additive factor (not visible in log scale) for large α. The RDP parameters for the subsampled Laplace and subsampled randomized response mechanisms (in the second and third columns) are both linear in α at the beginning, and then flatten out as α grows.

For the Gaussian mechanism, we also plot an asymptotic approximation obtained under the assumption that the size of the input dataset grows while the subsampling ratio γ is kept constant. In fact, we derive two asymptotic approximations: one for the case of "good" data and one for "bad" data. The approximations and the definitions of "good" and "bad" data can be found in Appendix C. The asymptotic Gaussian approximation with the "bad" data in Example 28 matches the lower bound almost exactly up to the phase transition point, in both the high- and low-privacy regimes. The Gaussian approximation for the "good" data is smaller than the lower bound, especially in the low-privacy regime, highlighting that we could potentially gain a lot by performing a dataset-dependent analysis.

[Figure 2 panels: (a) subsampled Gaussian, (b) subsampled Laplace, (c) subsampled randomized response in the high privacy regime; (d)-(f) the same three mechanisms in the low privacy regime.]

Figure 2: Comparison of techniques for strong composition of (ϵ,δ)-DP over k data accesses with three different subsampled mechanisms. We plot ϵ as a function of the number of rounds of composition k for a fixed δ (note that smaller is better). The top row illustrates the case where the base mechanism (before amplification using subsampling) is in a relatively high privacy regime, and the bottom row shows the low privacy regime. We consider two baselines: the naïve composition that simply adds up the per-round ϵ's and δ's, and the strong composition through the result of (Kairouz et al., 2015) with an optimal choice of per-round parameters computed for every k. The blue curve is based on composition applied to the RDP upper bound obtained through Theorem 9, and the red dashed curve is based on composition applied to the lower bound on RDP obtained through Proposition 11. For the Gaussian case, we also present the curves based on applying the composition to the RDP bound obtained through the Gaussian approximation idea explained in Appendix C.

In Figure 2, we plot the overall (ϵ,δ)-DP guarantee, for a fixed δ, as we compose each of the three subsampled mechanisms k times. The ϵ is obtained as a function of k for each mechanism separately by issuing the corresponding query to our analytical moments accountant. Our results are compared to algorithm-independent techniques for differential privacy, namely naïve composition and strong composition. The strong composition baseline is carefully calibrated for each k by choosing an appropriate per-round pair (ϵ₀, δ₀) such that the overall (ϵ,δ)-DP guarantee that comes from composing k rounds using Kairouz et al. (2015) meets the target δ while minimizing ϵ. Each round is described by the (ϵ₀, δ₀)-DP guarantee obtained from the standard subsampling lemma (Lemma 3), and ϵ is obtained as a function of δ via (3).

Not surprisingly, both our approach and strong composition give an O(√k) scaling of ϵ, while the naïve composition has an O(k) scaling throughout. An interesting observation for the subsampled Gaussian mechanism is that the RDP approach initially performs worse than the naïve composition and the strong composition with the standard subsampling lemma. Our RDP lower bound certifies that this is not an artifact of our analysis but rather a fundamental limitation of the approach that uses RDP to obtain (ϵ,δ)-DP guarantees. We believe this is a manifestation of the same phenomenon that leads to the sub-optimality of the classical analysis of the Gaussian mechanism (Balle & Wang, 2018), which also relies on converting a bound on the CGF of the privacy loss into an (ϵ,δ)-DP guarantee, and might be addressed using the necessary and sufficient condition for (ϵ,δ)-DP in terms of tail probabilities of the privacy loss random variable given in (Balle & Wang, 2018, Theorem 5). Luckily, such an artifact does not affect the typical usage of RDP: as the number of rounds of composition continues to grow, we end up with an ϵ that is about an order of magnitude smaller than the baseline approaches in the high privacy regime (see Figure 2(a)) and five orders of magnitude smaller in the low privacy regime (see Figure 2(d)).

The results for composing subsampled Laplace mechanisms and subsampled randomized response mechanisms are shown in Figures 2(b), 2(c), 2(e), and 2(f). Unlike in the subsampled Gaussian case, the RDP-based approach achieves about the same or a better bound for all k when compared to what can be obtained using the subsampling lemma and strong composition.

5 Conclusion

In this paper, we have studied the effect of subsampling (without replacement) in amplifying Rényi differential privacy (RDP). Specifically, we established tight upper and lower bounds for the RDP parameter of the randomized algorithm that first subsamples the dataset and then applies M to the subsample, in terms of the RDP parameter of M. Our analysis also reveals interesting theoretical insights into the connection of subsampling to a linearized privacy random variable, higher-order discrete differences of moment generating functions, as well as a ternary version of the Pearson-Vajda divergence that appears fundamental in understanding and analyzing the effect of subsampling. In addition, we designed a data structure called the analytical moments accountant, which composes RDP for randomized algorithms (including subsampled ones) in symbolic form and allows efficient conversion of RDP to (ϵ,δ)-DP for any δ (or ϵ) of choice. These results substantially expand the scope of mechanisms with RDP guarantees to cover subsampled versions of the Gaussian mechanism, the Laplace mechanism, randomized response, posterior sampling, and so on, which facilitates flexible differentially private algorithm design. We compared our approach to the standard approach that applies the subsampling lemma to (ϵ,δ)-DP directly and then applies strong composition; in our experiments we observe an order of magnitude improvement in the privacy parameters with our bounds when composing the subsampled Gaussian mechanism over multiple rounds.

Future work includes applying this technique to more advanced mechanisms for differentially private training of neural networks, addressing the data-dependent per-instance RDP for subsampled mechanisms, connecting the problem more tightly with statistical procedures that use subsampling/resampling as key components, such as the bootstrap and the jackknife, as well as combining the new approach with subsampling-based sublinear algorithms for exploratory data analysis.

Acknowledgment

The authors thank Ilya Mironov and Kunal Talwar for helpful discussions and the clarification of their proof of Lemma 3 in (Abadi et al., 2016).

References

  • Abadi et al. (2016) Abadi, M., Chu, A., Goodfellow, I., McMahan, H. B., Mironov, I., Talwar, K., & Zhang, L. (2016). Deep learning with differential privacy. In ACM SIGSAC Conference on Computer and Communications Security (CCS-16), (pp. 308–318). ACM.
  • Apple (2017) Apple, D. (2017). Learning with privacy at scale. Apple Machine Learning Journal.
  • Balle et al. (2018) Balle, B., Barthe, G., & Gaboardi, M. (2018). Privacy amplification by subsampling: Tight analyses via couplings and divergences. In NIPS.
  • Balle & Wang (2018) Balle, B., & Wang, Y.-X. (2018). Improving gaussian mechanism for differential privacy: Analytical calibration and optimal denoising. International Conference in Machine Learning (ICML).
  • Bassily et al. (2014) Bassily, R., Smith, A., & Thakurta, A. (2014). Private empirical risk minimization: Efficient algorithms and tight error bounds. In Foundations of Computer Science (FOCS-14), (pp. 464–473). IEEE.
  • Beimel et al. (2013) Beimel, A., Nissim, K., & Stemmer, U. (2013). Characterizing the sample complexity of private learners. In Conference on Innovations in Theoretical Computer Science (ITCS-13), (pp. 97–110). ACM.
  • Bernard et al. (2017) Bernard, T. S., Hsu, T., Perlroth, N., & Lieber, R. (2017). Equifax says cyberattack may have affected 143 million in the us. The New York Times, Sept, 7.
  • Bobkov et al. (2016) Bobkov, S., Chistyakov, G., & Götze, F. (2016). Rényi divergence and the central limit theorem. arXiv preprint arXiv:1608.01805.
  • Bun et al. (2018) Bun, M., Dwork, C., Rothblum, G. N., & Steinke, T. (2018). Composable and versatile privacy via truncated CDP. In STOC-18 (to appear).
  • Bun et al. (2015) Bun, M., Nissim, K., Stemmer, U., & Vadhan, S. (2015). Differentially private release and learning of threshold functions. In Foundations of Computer Science (FOCS), 2015 IEEE 56th Annual Symposium on, (pp. 634–649). IEEE.
  • Bun & Steinke (2016) Bun, M., & Steinke, T. (2016). Concentrated differential privacy: Simplifications, extensions, and lower bounds. In Theory of Cryptography Conference, (pp. 635–658). Springer.
  • Cadwalladr & Graham-Harrison (2018) Cadwalladr, C., & Graham-Harrison, E. (2018). Revealed: 50 million facebook profiles harvested for cambridge analytica in major data breach. The Guardian, 17.
  • Dajani et al. (2017) Dajani, A., Lauger, A., Singer, P., Kifer, D., Reiter, J., Machanavajjhala, A., Garfinkel, S., Dahl, S., Graham, M., Karwa, V., Kim, H., Leclerc, P., Schmutte, I., Sexton, W., Vilhuber, L., & Abowd, J. (2017). The modernization of statistical disclosure limitation at the u.s. census bureau. Census Scientific Advisory Commitee Meetings.
    URL https://www2.census.gov/cac/sac/meetings/2017-09/statistical-disclosure-limitation.pdf
  • Dwork et al. (2006a) Dwork, C., Kenthapadi, K., McSherry, F., Mironov, I., & Naor, M. (2006a). Our data, ourselves: Privacy via distributed noise generation. In International Conference on the Theory and Applications of Cryptographic Techniques, (pp. 486–503). Springer.
  • Dwork et al. (2006b) Dwork, C., McSherry, F., Nissim, K., & Smith, A. (2006b). Calibrating noise to sensitivity in private data analysis. In Theory of cryptography, (pp. 265–284). Springer.
  • Dwork & Roth (2013) Dwork, C., & Roth, A. (2013). The algorithmic foundations of differential privacy. Theoretical Computer Science, 9(3-4), 211–407.
  • Dwork & Rothblum (2016) Dwork, C., & Rothblum, G. N. (2016). Concentrated differential privacy. arXiv preprint arXiv:1603.01887.
  • Dwork et al. (2010) Dwork, C., Rothblum, G. N., & Vadhan, S. (2010). Boosting and differential privacy. In Foundations of Computer Science (FOCS), 2010 51st Annual IEEE Symposium on, (pp. 51–60). IEEE.
  • Erlingsson et al. (2014) Erlingsson, Ú., Pihur, V., & Korolova, A. (2014). Rappor: Randomized aggregatable privacy-preserving ordinal response. In Proceedings of the 2014 ACM SIGSAC conference on computer and communications security, (pp. 1054–1067). ACM.
  • European Parliament & Council of the European Union (2016) European Parliament, & Council of the European Union (2016). Regulation (eu) 2016/679 of the european parliament and of the council of 27 april 2016 on the protection of natural persons with regard to the processing of personal data and on the free movement of such data, and repealing directive 95/46/ec (general data protection regulation). Official Journal of the European Union.
  • Foulds et al. (2016) Foulds, J., Geumlek, J., Welling, M., & Chaudhuri, K. (2016). On the theory and practice of privacy-preserving bayesian data analysis. In Conference on Uncertainty in Artificial Intelligence (UAI-16), (pp. 192–201). AUAI Press.
  • Geumlek et al. (2017) Geumlek, J., Song, S., & Chaudhuri, K. (2017). Renyi differential privacy mechanisms for posterior sampling. In Advances in Neural Information Processing Systems, (pp. 5295–5304).
  • Gil et al. (2013) Gil, M., Alajaji, F., & Linder, T. (2013). Rényi divergence measures for commonly used univariate continuous distributions. Information Sciences, 249, 124–131.
  • Kairouz et al. (2015) Kairouz, P., Oh, S., & Viswanath, P. (2015). The composition theorem for differential privacy. In International Conference on Machine Learning (ICML-15).
  • Kasiviswanathan et al. (2011) Kasiviswanathan, S. P., Lee, H. K., Nissim, K., Raskhodnikova, S., & Smith, A. (2011). What can we learn privately? SIAM Journal on Computing, 40(3), 793–826.
  • Lukacs (1970) Lukacs, E. (1970). Characteristic functions. Griffin.
  • Mironov (2017) Mironov, I. (2017). Rényi differential privacy. In Computer Security Foundations Symposium (CSF), 2017 IEEE 30th, (pp. 263–275). IEEE.
  • Murtagh & Vadhan (2016) Murtagh, J., & Vadhan, S. (2016). The complexity of computing the optimal composition of differential privacy. In Theory of Cryptography Conference, (pp. 157–175). Springer.
  • Nielsen & Nock (2014) Nielsen, F., & Nock, R. (2014). On the chi square and higher-order chi distances for approximating f-divergences. IEEE Signal Processing Letters, 21(1), 10–13.
  • Song et al. (2013) Song, S., Chaudhuri, K., & Sarwate, A. D. (2013). Stochastic gradient descent with differentially private updates. In Conference on Signal and Information Processing.
  • Sweeney (2015) Sweeney, L. (2015). Only you, your doctor, and many others may know. Technology Science, 2015092903.
  • Uber Security (2017) Uber Security (2017). Uber releases open source project for differential privacy. https://medium.com/uber-security-privacy/differential-privacy-open-source-7892c82c42b6.
  • Ullman (2017) Ullman, J. (2017). Cs7880: Rigorous approaches to data privacy, spring 2017. http://www.ccs.neu.edu/home/jullman/PrivacyS17/HW1sol.pdf.
  • Vajda (1973) Vajda, I. (1973). χ^α-divergence and generalized Fisher information. In Prague Conference on Information Theory, Statistical Decision Functions and Random Processes, (p. 223). Academia.
  • Van Erven & Harremos (2014) Van Erven, T., & Harremos, P. (2014). Rényi divergence and Kullback-Leibler divergence. IEEE Transactions on Information Theory, 60(7), 3797–3820.
  • Wang (2018) Wang, Y.-X. (2018). Per-instance differential privacy. Journal of Confidentiality and Privacy, to appear.
  • Wang et al. (2015) Wang, Y.-X., Fienberg, S., & Smola, A. (2015). Privacy for free: Posterior sampling and stochastic gradient monte carlo. In International Conference on Machine Learning (ICML-15), (pp. 2493–2502).
  • Wang et al. (2016) Wang, Y.-X., Lei, J., & Fienberg, S. E. (2016). Learning with differential privacy: Stability, learnability and the sufficiency and necessity of erm principle. Journal of Machine Learning Research, 17(183), 1–40.

Appendix A Composition of Differentially Private Mechanisms

Composition theorems for differential privacy allow a modular design of privacy-preserving mechanisms based on mechanisms for simpler subtasks:

Theorem 12 (Naïve composition, Dwork et al. (2006a)).

A mechanism that permits k adaptive interactions with mechanisms that each preserve (ϵ, δ)-differential privacy (and does not access the database otherwise) ensures (kϵ, kδ)-differential privacy.

A stronger composition is also possible as shown by Dwork et al. (2010).

Theorem 13 (Strong composition, Dwork et al. (2010)).

Let ϵ, δ ≥ 0 and δ′ > 0. A mechanism that permits k adaptive interactions with mechanisms that each preserve (ϵ, δ)-differential privacy ensures (√(2k log(1/δ′)) ϵ + kϵ(e^ϵ − 1), kδ + δ′)-differential privacy.

Kairouz et al. (2015) recently gave an optimal composition theorem for differential privacy, which provides an exact characterization of the best privacy parameters that can be guaranteed when composing a number of -differentially private mechanisms. Unfortunately, the resulting optimal composition bound is quite complex to state exactly, and indeed is even #P-complete to compute exactly when composing mechanisms with different parameters (Murtagh & Vadhan, 2016).

Appendix B Proofs and Missing Details from Section 3.1

In this section, we fill in the missing details and proofs from Section 3.1. We first define a few quantities needed to establish our results.

Pearson-Vajda Divergence and the Moments of the Linearized Privacy Random Variable. The Pearson-Vajda divergence (or χ^α-divergence) of order α ≥ 1 is defined as follows (Vajda, 1973):

(5)   D_{χ^α}(P ‖ Q) := E_{o∼Q}[ ( (P(o) − Q(o)) / Q(o) )^α ].

This is closely related to the moments of the privacy random variable, in that (P − Q)/Q is the linearized version of log(P/Q). More interestingly, while the jth moment of the privacy random variable is the jth derivative of the MGF evaluated at 0:

E[ (log(P/Q))^j ] = M^{(j)}(0),   where M(λ) := E_{o∼P}[ e^{λ log(P(o)/Q(o))} ],

at least for even orders α, the χ^α-divergence is the αth order forward finite difference of the MGF evaluated at λ = −1:

(6)   D_{χ^α}(P ‖ Q) = Δ^α M(−1).

In the above expression, the αth order forward difference operator Δ^α is defined recursively via

(7)   Δ^α f := Δ(Δ^{α−1} f),

where Δ denotes the first order forward difference operator, defined by Δf(x) := f(x + 1) − f(x) for any function f. See Appendix D for more information on Δ and its connection to binomial numbers.
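In code, the forward difference in (6) and (7) unrolls into the familiar alternating binomial sum; a small sketch:

```python
from math import comb

def forward_difference(f, x, order):
    """order-th forward finite difference with unit step:
    Delta^order f(x) = sum_k (-1)^(order - k) * C(order, k) * f(x + k)."""
    return sum((-1) ** (order - k) * comb(order, k) * f(x + k)
               for k in range(order + 1))
```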

B.1 A Sketch of the Proof of Theorem 9

In this section, we present a sketch of the proof of our main theorem. The argument is divided into three parts. In the first part, we define a new family of privacy definitions called ternary-|χ|^α-differential privacy and show that it handles subsampling naturally. In the second part, we bound the Rényi DP using the ternary-|χ|^α-differential privacy and apply its subsampling lemma. In the third part, we propose several different ways of converting the expression stated in terms of ternary-|χ|^α-differential privacy back to RDP, hence giving rise to the results stated in the remarks following Theorem 9.

Part 1: Ternary-|χ|^α-divergence and Natural Subsampling. The ternary-|χ|^α-divergence is a novel quantity that measures the discrepancy of three distributions instead of two. Let p, q, r be three probability distributions (we think of p, q, r as the distributions M(X), M(X′), M(X″), respectively, for mutually adjacent datasets X, X′, X″); we define

D_{|χ|^α}( p, q ‖ r ) := E_{o∼r}[ | (p(o) − q(o)) / r(o) |^α ].

Using this ternary-|χ|^α-divergence notion, we define ternary-|χ|^α-differential privacy as follows. Analogously to RDP, where we considered ϵ as a function of the order α, we consider the ternary-|χ|^α-DP parameter as a function of α.

Definition 14 (Ternary-|χ|^α-differential privacy).

We say that a randomized mechanism M is (α, ϵ(α))-ternary-|χ|^α-DP if for all mutually adjacent datasets X, X′, X″:

[ D_{|χ|^α}( M(X), M(X′) ‖ M(X″) ) ]^{1/α} ≤ ϵ(α).

Here, the mutually adjacent condition means X ≃ X′, X ≃ X″, and X′ ≃ X″, and ϵ(·) is a function from the orders α to the nonnegative reals. Note that the above definition generalizes the following binary-|χ|^α-differential privacy definition, which works with the standard Pearson-Vajda |χ|^α-divergences (as defined in (5), with absolute values).

Definition 15 (Binary-|χ|^α-differential privacy).

We say that a randomized mechanism M is (α, ϵ(α))-binary-|χ|^α-DP if for all adjacent datasets X ≃ X′:

[ D_{|χ|^α}( M(X) ‖ M(X′) ) ]^{1/α} ≤ ϵ(α).

Again, ϵ(·) is a function from the orders α to the nonnegative reals.

As we described earlier, this notion of privacy shares many features of RDP and could be of independent interest. It subsumes pure ϵ-DP and implies an entire family of (ϵ,δ)-DP guarantees through Markov's inequality. We provide additional details on this point in Appendix F.

For our ternary-|χ|^α-differential privacy, what makes it stand out relative to Rényi DP is that it allows privacy amplification to occur in an extremely clean fashion, as the following proposition states:

Proposition 16 (Subsampling Lemma for Ternary-|χ|^α-DP).

If a mechanism M obeys (α, ϵ(α))-ternary-|χ|^α-DP, then the algorithm M ∘ subsample obeys (α, γϵ(α))-ternary-|χ|^α-DP.

The entire proof is presented in Appendix B.2. The key idea involves conditioning on subsampling events, constructing dummy random variables to match up each of these events, and using Jensen's inequality to convert the intractable ternary-|χ|^α-DP of a mixture distribution into that of three simple distributions that come from mutually adjacent datasets.

Part 2: Bounding RDP with Ternary-|χ|^α-DP. We will now show that (a transformation of) the quantity of interest, the RDP of the subsampled mechanism, can be expressed as a linear combination of a sequence of binary-|χ|^j-DP parameters for integers j through a binomial (Newton's series) expansion of the moment generating function:

(8)   E_{o∼q}[ (p(o)/q(o))^α ] = Σ_{j=0}^{α} (α choose j) E_{o∼q}[ ( (p(o) − q(o)) / q(o) )^j ].

Observe that the j = 0 term equals 1 and the j = 1 term equals 0, so it suffices to bound E_{o∼q}[ ((p − q)/q)^j ] for j = 2, …, α.

Note that the binary-|χ|^j-divergence is a special case of the ternary-|χ|^j-divergence with r = q; therefore,

sup_{X ≃ X′} D_{|χ|^j}( M(X) ‖ M(X′) ) ≤ sup_{mutually adjacent X, X′, X″} D_{|χ|^j}( M(X), M(X′) ‖ M(X″) ).

The same holds if we restrict the maximum on the left to distributions p, q induced by adjacent datasets, and the maximum on the right to distributions p, q, r induced by mutually adjacent datasets. For the subsampled mechanism, the right-hand side of the above inequality can be bounded by Proposition 16. Putting these together, we can bound each term of (8) as

E_{o∼q}[ ( (p(o) − q(o)) / q(o) )^j ] ≤ ( γϵ(j) )^j,

where the mechanism M satisfies (j, ϵ(j))-ternary-|χ|^j-DP and p, q denote the distributions M ∘ subsample(X), M ∘ subsample(X′), respectively, for adjacent datasets X, X′. Using this result along with the definition of Rényi differential privacy (from Definition 4) implies the following bound on the RDP parameter:

(9)   ϵ_{M∘subsample}(α) ≤ (1/(α−1)) log( 1 + Σ_{j=2}^{α} (α choose j) γ^j ϵ(j)^j ).

Part 3: Bounding Ternary-|χ|^α-DP using RDP. It remains to bound the ternary-|χ|^j-DP parameters ϵ(j) using the RDP of M. We provide several ways of doing so; plugging them into (9) shows how the various terms in the bound of Theorem 9 arise. Missing proofs are presented in Appendix B.3.

  • The 4(e^{ϵ(2)} − 1) Term. To begin with, we show that binary-|χ|^α-DP and ternary-|χ|^α-DP are equivalent up to a constant factor of 2.

    Lemma 17.

    If a randomized mechanism M is (α, ϵ(α))-binary-|χ|^α-DP, then it is (α, ϵ̃(α))-ternary-|χ|^α-DP for some ϵ̃ satisfying ϵ̃(α) ≤ 2ϵ(α).

    For the special case of α = 2, the χ²-divergence obeys

    D_{χ²}( M(X) ‖ M(X′) ) = e^{ϵ(2)} − 1.

    Using the bound from Lemma 17 relating the binary and ternary-|χ|^α-DP then gives the 4(e^{ϵ(2)} − 1) term.

  • The 2e^{(j−1)ϵ(j)} Term. Now, we provide a bound for the ternary divergence of general order j in terms of RDP. We start with the following simple lemma.

    Lemma 18.

    Let X, Y be nonnegative random variables. Then, for any integer j ≥ 1,

    E[ |X − Y|^j ] ≤ E[ X^j ] + E[ Y^j ].

    This "triangle inequality"-like result exploits the nonnegativity of X and Y and captures the intrinsic cancellations of the terms of a binomial expansion. If we do not have nonnegativity, the standard expansion will have a factor of 2^j rather than 2 (see, e.g., Proposition 3.2 of Bobkov et al. (2016)).

    An alternative bound is tighter in the case when X and Y are related to each other by a multiplicative bound. Note that this bound is only going to be useful when M has a bounded ϵ(∞), such as when M satisfies a pure ϵ-DP guarantee.

    Lemma 19.

    Let X, Y be nonnegative random variables such that, with probability 1, e^{−ϵ} ≤ X/Y ≤ e^{ϵ}. Then, for any integer j ≥ 1,

    E[ |X − Y|^j ] ≤ (e^ϵ − 1)^j E[ Y^j ].