1 Introduction
Subsampling is a fundamental tool in the design and analysis of differentially private mechanisms. Broadly speaking, the intuition behind the “privacy amplification by subsampling” principle is that the privacy guarantees of a differentially private mechanism can be amplified by applying it to a small random subsample of records from a given dataset. In machine learning, many classes of algorithms involve sampling operations, e.g. stochastic optimization methods and Bayesian inference algorithms, and it is not surprising that results quantifying the privacy amplification obtained via subsampling play a key role in designing differentially private versions of these learning algorithms
(Bassily et al., 2014; Wang et al., 2015; Abadi et al., 2016; Jälkö et al., 2017; Park et al., 2016b, a). Additionally, from a practical standpoint subsampling provides a straightforward method to obtain privacy amplification when the final mechanism is only available as a black box. For example, in Apple’s iOS and Google’s Chrome deployments of differential privacy for data collection the privacy parameters are hard-coded into the implementation and cannot be modified by the user. In settings of this type, if the default privacy parameters are not satisfactory one could achieve a stronger privacy guarantee by devising a strategy that only submits to the mechanism a random sample of the data.
Despite the practical importance of subsampling, existing tools to bound privacy amplification only work for specific forms of subsampling and typically come with cumbersome proofs providing no information about the tightness of the resulting bounds. In this paper we remedy this situation by providing a general framework for deriving tight privacy amplification results that can be applied to any of the subsampling strategies considered in the literature. Our framework builds on a characterization of differential privacy in terms of divergences (Barthe and Olmedo, 2013). This characterization has been used before for program verification (Barthe et al., 2012, 2016), while we use it here for the first time in the context of algorithm analysis. In order to do this, we develop several novel analytical tools, including advanced joint convexity (a property of divergences with respect to mixture distributions) and privacy profiles (a general tool describing the privacy guarantees that private algorithms provide).
One of our motivations to initiate a systematic study of privacy amplification by subsampling is that this is an important primitive for the design of differentially private algorithms which has received less attention than other building blocks like composition theorems (Dwork et al., 2010; Kairouz et al., 2017; Murtagh and Vadhan, 2016). Given the relevance of sampling operations in machine learning, it is important to understand the limitations of privacy amplification and to develop a fine-grained understanding of its theoretical properties. Our results provide a first step in this direction by showing how privacy amplification resulting from different sampling techniques can be analyzed by means of a single set of tools, and by showing how these tools can be used for proving lower bounds. Our analyses also highlight the importance of choosing a sampling technique that is well-adapted to the notion of neighbouring datasets under consideration. A second motivation is that subsampling provides a natural example of mechanisms where the output distribution is a mixture. Because mixtures have an additive structure and differential privacy is defined in terms of a multiplicative guarantee, analyzing the privacy guarantees of mechanisms whose output distribution is a mixture is in general a challenging task. Although our analyses are specialized to mixtures arising from subsampling, we believe the tools we develop in terms of couplings and divergences will also be useful to analyze other types of mechanisms involving mixture distributions. Finally, we want to remark that privacy amplification results also play a role in analyzing the generalization and sample complexity properties of private learning algorithms (Kasiviswanathan et al., 2011; Beimel et al., 2013; Bun et al., 2015; Wang et al., 2016); an in-depth understanding of the interplay between sampling and differential privacy might also have applications in this direction.
2 Problem Statement and Methodology Overview
A mechanism $\mathcal{M} : X \to \mathbb{P}(Z)$ with input space $X$ and output space $Z$ is a randomized algorithm that on input $x$ outputs a sample from the distribution $\mathcal{M}(x)$ over $Z$. Here $\mathbb{P}(Z)$ denotes the set of probability measures on the output space $Z$. We implicitly assume $Z$ is equipped with a sigma-algebra of measurable subsets and a base measure, in which case $\mathbb{P}(Z)$ is restricted to probability measures that are absolutely continuous with respect to the base measure. In most cases of interest $Z$ is either a discrete space equipped with the counting measure or a Euclidean space equipped with the Lebesgue measure. We also assume $X$ is equipped with a binary symmetric relation $\simeq_X$ defining the notion of neighbouring inputs.
Let $\varepsilon \geq 0$ and $\delta \in [0,1]$. A mechanism $\mathcal{M}$ is said to be $(\varepsilon,\delta)$-differentially private w.r.t. $\simeq_X$ if for every pair of inputs $x \simeq_X x'$ and every measurable subset $E \subseteq Z$ we have
$$\Pr[\mathcal{M}(x) \in E] \leq e^{\varepsilon} \Pr[\mathcal{M}(x') \in E] + \delta. \tag{1}$$
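For our purposes, it will be more convenient to express differential privacy in terms of $\alpha$-divergences (also known in the literature as elementary divergences (Österreicher, 2002) and hockey-stick divergences (Sason and Verdú, 2016)). Concretely, the $\alpha$-divergence ($\alpha \geq 1$) between two probability measures $\mu, \mu' \in \mathbb{P}(Z)$ is defined as
$$D_{\alpha}(\mu \| \mu') = \sup_{E} \left( \mu(E) - \alpha \mu'(E) \right) = \int_{Z} \left[ \frac{d\mu}{d\mu'}(z) - \alpha \right]_{+} d\mu'(z) = \sum_{z \in Z} \left[ \mu(z) - \alpha \mu'(z) \right]_{+}, \tag{2}$$
where $E$ ranges over all measurable subsets of $Z$, $[\bullet]_{+} = \max\{\bullet, 0\}$, and the last equality is a specialization for discrete $Z$. Here $d\mu/d\mu'$ denotes the Radon-Nikodym derivative between $\mu$ and $\mu'$. In particular, if $\mu$ and $\mu'$ have densities $p$ and $p'$ with respect to some base measure $\lambda$, then $d\mu/d\mu' = p/p'$. It is easy to see (Barthe and Olmedo, 2013) that $\mathcal{M}$ is $(\varepsilon,\delta)$-differentially private if and only if $D_{e^{\varepsilon}}(\mathcal{M}(x) \| \mathcal{M}(x')) \leq \delta$ for every $x$ and $x'$ such that $x \simeq_X x'$.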
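As an illustration of this characterization, the following Python sketch evaluates the discrete form of the divergence, taken here to be $\sum_z [\mu(z) - \alpha\mu'(z)]_+$, on binary randomized response. The function names and the choice of randomized response as the example mechanism are assumptions made for illustration only.

```python
import math

def hockey_stick(mu, mu_prime, alpha):
    """Discrete alpha-divergence: sum over z of max(mu(z) - alpha * mu'(z), 0)."""
    support = set(mu) | set(mu_prime)
    return sum(max(mu.get(z, 0.0) - alpha * mu_prime.get(z, 0.0), 0.0)
               for z in support)

def randomized_response(bit, eps):
    """Output distribution over {0, 1}: report the true bit w.p. e^eps / (1 + e^eps)."""
    p_true = math.exp(eps) / (1.0 + math.exp(eps))
    return {bit: p_true, 1 - bit: 1.0 - p_true}

eps = 1.0
mu, mu_prime = randomized_response(0, eps), randomized_response(1, eps)

# At alpha = e^eps the divergence vanishes (up to rounding), i.e. (eps, 0)-DP.
delta_at_eps = hockey_stick(mu, mu_prime, math.exp(eps))
# At a smaller privacy parameter the divergence is strictly positive.
delta_at_half_eps = hockey_stick(mu, mu_prime, math.exp(eps / 2))
```

Evaluating the divergence at several values of $\alpha$ in this way traces out the trade-off between the two privacy parameters for a fixed pair of neighbouring inputs.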
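In order to emphasize the relevant properties of $\mathcal{M}$ from a privacy amplification point of view, we introduce the concepts of privacy profile and group-privacy profiles. The privacy profile $\delta_{\mathcal{M}}$ of a mechanism $\mathcal{M}$ is a function associating to each privacy parameter $\alpha = e^{\varepsilon}$ a bound on the $\alpha$-divergence between the results of running the mechanism on two adjacent datasets, i.e. $\delta_{\mathcal{M}}(\varepsilon) = \sup_{x \simeq_X x'} D_{e^{\varepsilon}}(\mathcal{M}(x) \| \mathcal{M}(x'))$ (we discuss the properties of this tool in more detail in the next section). Informally speaking, the privacy profile represents the set of all privacy parameters under which a mechanism provides differential privacy. In particular, recall that an $(\varepsilon,\delta)$-DP mechanism is also $(\varepsilon',\delta')$-DP for any $\varepsilon' \geq \varepsilon$ and any $\delta' \geq \delta$. The privacy profile defines a curve in $[0,\infty) \times [0,1]$ that separates the space of privacy parameters into two regions: the ones for which $\mathcal{M}$ satisfies differential privacy and the ones for which it does not. This curve exists for every mechanism $\mathcal{M}$, even for mechanisms that satisfy pure DP for some value of $\varepsilon$. When the mechanism is clear from the context we might slightly abuse our notation and write $\delta(\varepsilon)$ or $\delta(\alpha)$ for the corresponding privacy profile. To define group-privacy profiles $\delta_{\mathcal{M},k}$ ($k \geq 1$) we use the path-distance $d$ induced by $\simeq_X$:
$$d(x,x') = \min\{ k : \exists\, x_1, \ldots, x_{k-1},\ x \simeq_X x_1,\ x_1 \simeq_X x_2,\ \ldots,\ x_{k-1} \simeq_X x' \}.$$
With this notation, we define $\delta_{\mathcal{M},k}(\varepsilon) = \sup_{d(x,x') \leq k} D_{e^{\varepsilon}}(\mathcal{M}(x) \| \mathcal{M}(x'))$. Note that $\delta_{\mathcal{M}} = \delta_{\mathcal{M},1}$.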
Problem Statement
A well-known method for increasing the privacy of a mechanism is to apply the mechanism to a random subsample of the input database, rather than to the database itself. Intuitively, the method decreases the chances of leaking information about a particular individual because nothing about that individual can be leaked in the cases where the individual is not included in the subsample. The question addressed in this paper is how to devise methods for quantifying amplification and for proving optimality of the bounds. This turns out to be a surprisingly subtle problem.
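Formally, let $X$ and $Y$ be two sets equipped with neighbouring relations $\simeq_X$ and $\simeq_Y$ respectively. We assume that both $X$ and $Y$ contain databases (modelled as sets, multisets, or tuples) over a universe $\mathcal{U}$ that represents all possible records contained in a database. A subsampling mechanism is a randomized algorithm $\mathcal{S} : X \to \mathbb{P}(Y)$ that takes as input a database $x$ and outputs a finitely supported distribution over datasets in $Y$. Note that we find it convenient to distinguish between $X$ and $Y$ because $x$ and $y$ might not always have the same type. For example, sampling with replacement from a set $x$ yields a multiset $y$.
The problem of privacy amplification can now be stated as follows: let $\mathcal{M} : Y \to \mathbb{P}(Z)$ be a mechanism with privacy profile $\delta_{\mathcal{M}}$ with respect to $\simeq_Y$, and let $\mathcal{S}$ be a subsampling mechanism. Consider the subsampled mechanism $\mathcal{M}^{\mathcal{S}} : X \to \mathbb{P}(Z)$ given by $\mathcal{M}^{\mathcal{S}}(x) = \mathcal{M}(\mathcal{S}(x))$, where the composition notation means we feed a sample from $\mathcal{S}(x)$ into $\mathcal{M}$. The goal is to relate the privacy profiles of $\mathcal{M}$ and $\mathcal{M}^{\mathcal{S}}$ via an inequality of the form: for every $\varepsilon \geq 0$, there exists $\varepsilon' \geq 0$ such that $\delta_{\mathcal{M}^{\mathcal{S}}}(\varepsilon') \leq h(\delta_{\mathcal{M}}(\varepsilon))$, where $h$ is some function to be determined. In terms of differential privacy, this can be read as saying that if $\mathcal{M}$ is $(\varepsilon,\delta)$-DP, then the subsampled mechanism $\mathcal{M}^{\mathcal{S}}$ is $(\varepsilon', h(\delta))$-DP for some $\varepsilon' = \varepsilon'(\varepsilon)$. This is a privacy amplification statement because the new mechanism has better privacy parameters than the original one.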
A full specification of this problem requires formalizing the following three ingredients: (i) dataset representation specifying whether the inputs to the mechanism are sets, multisets, or tuples; (ii) neighbouring relations in $X$ and $Y$, including the usual remove/add-one and substitute-one relations; (iii) subsampling method and its parameters, with the most commonly used being subsampling without replacement, subsampling with replacement, and Poisson subsampling.
Regardless of the specific setting being considered, the main challenge in the analysis of privacy amplification by subsampling resides in the fact that the output distribution of the mechanism is a mixture distribution. In particular, writing $\mu_x = \mathcal{M}^{\mathcal{S}}(x)$ for any $x \in X$ and taking $\omega = \mathcal{S}(x)$ to be the (finitely supported) distribution over subsamples from $x$ produced by the subsampling mechanism, we can write $\mu_x = \sum_{y} \omega(y) \mathcal{M}(y) = \omega \mathcal{M}$, where $\mathcal{M}$ denotes the Markov kernel operating on measures defined by the mechanism $\mathcal{M}$. Consequently, proving privacy amplification results requires reasoning about the mixtures obtained when sampling from two neighbouring datasets $x \simeq_X x'$, and about how the privacy parameters are affected by the mixture.
Our Contribution
We provide a unified method for deriving privacy amplification by subsampling bounds (Section 3). Our method recovers all existing results in the literature and allows us to derive novel amplification bounds (Section 4). In most cases our method also provides optimal constants which are shown to be tight by a generic lower bound (Section 5). Our analysis relies on properties of divergences and privacy profiles, together with two additional ingredients.
The first ingredient is a novel advanced joint convexity property providing upper bounds on the divergence between overlapping mixture distributions. In the specific context of differential privacy this result yields for every $x \simeq_X x'$:
$$D_{e^{\varepsilon'}}(\mathcal{M}^{\mathcal{S}}(x) \| \mathcal{M}^{\mathcal{S}}(x')) \leq \eta D_{e^{\varepsilon}}(\mu_1 \| (1-\beta)\mu_0 + \beta\mu_1'), \tag{3}$$
for $\varepsilon' = \log(1 + \eta(e^{\varepsilon} - 1))$, some $\beta \in [0,1]$, and $\eta = \mathrm{TV}(\mathcal{S}(x), \mathcal{S}(x'))$ being the total variation distance between the distributions over subsamples. Here $\mu_0, \mu_1, \mu_1'$ are suitable measures obtained from $\mathcal{S}(x)$ and $\mathcal{S}(x')$ through a coupling and projection operation. In particular, the proof of advanced joint convexity uses ideas from probabilistic couplings, and more specifically the maximal coupling construction (see Theorem 2 and its proof for more details). It is also interesting to note that the non-linear relation $\varepsilon' = \log(1 + \eta(e^{\varepsilon} - 1))$ already appears in some existing privacy amplification results (e.g. Li et al. (2012)). Although for small $\varepsilon$ and $\eta$ this relation yields $\varepsilon' = O(\eta\varepsilon)$, our results show that the more complicated non-linear relation is in fact a fundamental aspect of privacy amplification by subsampling.
The second ingredient in our analysis establishes an upper bound for the divergences occurring in the right hand side of (3) in terms of group-privacy profiles. It states that under suitable conditions, we have $D_{e^{\varepsilon}}(\nu\mathcal{M} \| \nu'\mathcal{M}) \leq \sum_{k \geq 1} \lambda_k \delta_{\mathcal{M},k}(\varepsilon)$ for suitable choices of $\lambda_k$. Again, the proof of the inequality uses tools from probabilistic couplings.
The combination of these results yields a bound on the privacy profile of $\mathcal{M}^{\mathcal{S}}$ as a function of the group-privacy profiles of $\mathcal{M}$. Based on this inequality, we will establish several privacy amplification results and prove tightness results. This methodology can be applied to any of the settings discussed above in terms of dataset representation, neighbouring relation, and type of subsampling. Table 1 summarizes several results that can be obtained with our method (see Section 4 for details). The supplementary material also contains plots illustrating our bounds (Figure 0(a)) and proofs of all the results presented in the paper.
3 Tools: Couplings, Divergences and Privacy Profiles
We next introduce several tools that will be used to support our analyses. The first and second tools are known, whereas the remaining tools are new and of independent interest.
Divergences
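The following characterization follows immediately from the definition of $\alpha$-divergence in terms of the supremum over $E$.
Theorem 1 ((Barthe and Olmedo, 2013)).
A mechanism $\mathcal{M}$ is $(\varepsilon,\delta)$-differentially private with respect to $\simeq_X$ if and only if $\sup_{x \simeq_X x'} D_{e^{\varepsilon}}(\mathcal{M}(x) \| \mathcal{M}(x')) \leq \delta$.
Note that in the statement of the theorem we take $\alpha = e^{\varepsilon}$. Throughout the paper we sometimes use these two notations interchangeably to make expressions more compact.
We now state consequences of the definition of $\alpha$-divergence: (i) $0 \leq D_{\alpha}(\mu \| \mu') \leq 1$; (ii) the function $\alpha \mapsto D_{\alpha}(\mu \| \mu')$ is monotonically decreasing; (iii) the function $(\mu, \mu') \mapsto D_{\alpha}(\mu \| \mu')$ is jointly convex. Furthermore, one can show that $\lim_{\alpha \to \infty} D_{\alpha}(\mu \| \mu') = 0$ if and only if $\mathrm{supp}(\mu) \subseteq \mathrm{supp}(\mu')$.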
Couplings
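Couplings are a standard tool for deriving upper bounds for the statistical distance between distributions. Concretely, it is well-known that the total variation distance between two distributions $\nu, \nu' \in \mathbb{P}(Y)$ satisfies $\mathrm{TV}(\nu, \nu') \leq \Pr_{(y,y') \sim \pi}[y \neq y']$ for any coupling $\pi$, where equality is attained by taking the so-called maximal coupling. We recall the definition of coupling and provide a construction of the maximal coupling, which we shall use in later sections.
A coupling between two distributions $\nu, \nu' \in \mathbb{P}(Y)$ is a distribution $\pi \in \mathbb{P}(Y \times Y)$ whose marginals along the projections $(y,y') \mapsto y$ and $(y,y') \mapsto y'$ are $\nu$ and $\nu'$ respectively. Couplings always exist, and furthermore, there exists a maximal coupling, which exactly characterizes the total variation distance between $\nu$ and $\nu'$. Let $\eta = \mathrm{TV}(\nu, \nu') = 1 - \sum_{y} \min\{\nu(y), \nu'(y)\}$, where $\mathrm{TV}$ denotes the total variation distance, and (for $0 < \eta < 1$) let $\nu_0(y) = \min\{\nu(y), \nu'(y)\}/(1-\eta)$, $\nu_1(y) = (\nu(y) - (1-\eta)\nu_0(y))/\eta$ and $\nu_1'(y) = (\nu'(y) - (1-\eta)\nu_0(y))/\eta$. The maximal coupling between $\nu$ and $\nu'$ is defined as the mixture $\pi = (1-\eta)\pi_0 + \eta\pi_1$, where $\pi_0(y,y') = \nu_0(y)\,\mathbb{I}[y = y']$, and $\pi_1(y,y') = \nu_1(y)\nu_1'(y')$. Projecting the maximal coupling along the marginals yields the overlapping mixture decompositions $\nu = (1-\eta)\nu_0 + \eta\nu_1$ and $\nu' = (1-\eta)\nu_0 + \eta\nu_1'$.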
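The overlapping mixture decomposition obtained from the maximal coupling can be sketched for finitely supported distributions as follows. This is a minimal illustration under the assumption that distributions are given as dictionaries mapping outcomes to probabilities and that the total variation distance lies strictly between 0 and 1; the function name `overlapping_decomposition` is hypothetical.

```python
def overlapping_decomposition(nu, nu_prime):
    """Return (eta, nu0, nu1, nu1_prime) such that
    nu  = (1 - eta) * nu0 + eta * nu1   and
    nu' = (1 - eta) * nu0 + eta * nu1',
    where eta is the total variation distance. Assumes 0 < eta < 1."""
    support = set(nu) | set(nu_prime)
    # Pointwise minimum: the mass shared by both distributions.
    overlap = {y: min(nu.get(y, 0.0), nu_prime.get(y, 0.0)) for y in support}
    eta = 1.0 - sum(overlap.values())
    nu0 = {y: w / (1.0 - eta) for y, w in overlap.items() if w > 0.0}
    # Residual mass of each distribution, renormalized by eta.
    nu1 = {y: (nu.get(y, 0.0) - overlap[y]) / eta
           for y in support if nu.get(y, 0.0) > overlap[y]}
    nu1_prime = {y: (nu_prime.get(y, 0.0) - overlap[y]) / eta
                 for y in support if nu_prime.get(y, 0.0) > overlap[y]}
    return eta, nu0, nu1, nu1_prime

eta, nu0, nu1, nu1_prime = overlapping_decomposition(
    {"a": 0.5, "b": 0.5}, {"a": 0.5, "c": 0.5})
```

In this toy example the two residual components are supported on `"b"` and `"c"` respectively, so their supports are disjoint, as expected from a maximal coupling.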
Advanced Joint Convexity
The privacy amplification phenomenon is tightly connected to an interesting new form of joint convexity for divergences, which we call advanced joint convexity.
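Theorem 2 (Advanced Joint Convexity of $D_{\alpha}$; proofs of all our results are presented in the appendix).
Let $\mu, \mu' \in \mathbb{P}(Z)$ be measures satisfying $\mu = (1-\eta)\mu_0 + \eta\mu_1$ and $\mu' = (1-\eta)\mu_0 + \eta\mu_1'$ for some $\eta$, $\mu_0$, $\mu_1$, and $\mu_1'$. Given $\alpha \geq 1$, let $\alpha' = 1 + \eta(\alpha - 1)$ and $\beta = \alpha'/\alpha$. Then the following holds:
$$D_{\alpha'}(\mu \| \mu') = \eta D_{\alpha}(\mu_1 \| (1-\beta)\mu_0 + \beta\mu_1'). \tag{4}$$
Note that writing $\alpha = e^{\varepsilon}$ and $\alpha' = e^{\varepsilon'}$ in the above theorem we get the relation $\varepsilon' = \log(1 + \eta(e^{\varepsilon} - 1))$. Applying standard joint convexity to the right hand side above we conclude: $D_{\alpha'}(\mu \| \mu') \leq (1-\beta)\eta D_{\alpha}(\mu_1 \| \mu_0) + \beta\eta D_{\alpha}(\mu_1 \| \mu_1')$. Note that applying joint convexity directly on $D_{\alpha'}(\mu \| \mu')$ instead of advanced joint convexity yields a weaker bound which implies amplification for the $\delta$ privacy parameter, but not for the $\varepsilon$ privacy parameter.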
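A quick numerical sanity check of advanced joint convexity can be sketched on three-point distributions. The check below assumes the identity takes the form $D_{\alpha'}(\mu\|\mu') = \eta D_{\alpha}(\mu_1 \| (1-\beta)\mu_0 + \beta\mu_1')$ with $\alpha' = 1 + \eta(\alpha - 1)$ and $\beta = \alpha'/\alpha$; the particular distributions and parameter values are arbitrary choices made for illustration.

```python
def hockey_stick(mu, mu_prime, alpha):
    """Discrete alpha-divergence for distributions given as aligned lists."""
    return sum(max(p - alpha * q, 0.0) for p, q in zip(mu, mu_prime))

def mix(weight, first, second):
    """Mixture (1 - weight) * first + weight * second."""
    return [(1.0 - weight) * a + weight * b for a, b in zip(first, second)]

eta, alpha = 0.3, 2.0
mu0 = [0.5, 0.3, 0.2]        # shared component
mu1 = [0.7, 0.2, 0.1]        # component specific to mu
mu1_prime = [0.1, 0.2, 0.7]  # component specific to mu'

mu = mix(eta, mu0, mu1)
mu_prime = mix(eta, mu0, mu1_prime)

alpha_prime = 1.0 + eta * (alpha - 1.0)
beta = alpha_prime / alpha

lhs = hockey_stick(mu, mu_prime, alpha_prime)
rhs = eta * hockey_stick(mu1, mix(beta, mu0, mu1_prime), alpha)
# Relaxation obtained by applying standard joint convexity to the right hand side.
relaxed = eta * ((1.0 - beta) * hockey_stick(mu1, mu0, alpha)
                 + beta * hockey_stick(mu1, mu1_prime, alpha))
```

Here `lhs` and `rhs` agree up to floating-point rounding, while `relaxed` is an upper bound on both.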
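When using advanced joint convexity to analyze privacy amplification we consider two elements $x$ and $x'$ and fix the following notation. Let $\omega = \mathcal{S}(x)$ and $\omega' = \mathcal{S}(x')$ and $\mu = \omega\mathcal{M}$ and $\mu' = \omega'\mathcal{M}$, where we use the notation $\mathcal{M}$ to denote the Markov kernel associated with mechanism $\mathcal{M}$ operating on measures over $Y$. We then consider the mixture factorization of $\omega$ and $\omega'$ obtained by taking the decompositions induced by projecting the maximal coupling on the first and second marginals: $\omega = (1-\eta)\omega_0 + \eta\omega_1$ and $\omega' = (1-\eta)\omega_0 + \eta\omega_1'$. It is easy to see from the construction of the maximal coupling that $\omega_1$ and $\omega_1'$ have disjoint supports and $\eta$ is the smallest probability such that this condition holds. In this way we obtain the canonical mixture decompositions $\mu = (1-\eta)\mu_0 + \eta\mu_1$ and $\mu' = (1-\eta)\mu_0 + \eta\mu_1'$, where $\mu_0 = \omega_0\mathcal{M}$, $\mu_1 = \omega_1\mathcal{M}$ and $\mu_1' = \omega_1'\mathcal{M}$.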
Privacy Profiles
We state some important properties of privacy profiles. Our first result illustrates our claim that the “privacy curve” exists for every mechanism in the context of the Laplace output perturbation mechanism.
Theorem 3.
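Let $f : X \to \mathbb{R}$ be a function with global sensitivity $\Delta = \sup_{x \simeq_X x'} |f(x) - f(x')|$. Suppose $\mathcal{M}(x) = f(x) + \mathrm{Lap}(b)$ is a Laplace output perturbation mechanism with noise parameter $b$. The privacy profile of $\mathcal{M}$ is given by $\delta_{\mathcal{M}}(\varepsilon) = [1 - \exp(\frac{\varepsilon - \theta}{2})]_{+}$, where $\theta = \Delta/b$.
The well-known fact that the Laplace mechanism with $b \geq \Delta/\varepsilon$ is $(\varepsilon, 0)$-DP follows from this result by noting that $\delta_{\mathcal{M}}(\varepsilon) = 0$ for any $\varepsilon \geq \theta$. However, Theorem 3 also provides more information: it shows that for $\varepsilon < \Delta/b$ the Laplace mechanism with noise parameter $b$ satisfies $(\varepsilon,\delta)$-DP with $\delta = \delta_{\mathcal{M}}(\varepsilon)$.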
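The privacy curve of the Laplace mechanism can be evaluated with a few lines of Python. The sketch assumes the profile has the closed form $\delta(\varepsilon) = [1 - \exp((\varepsilon - \theta)/2)]_+$ with $\theta = \Delta/b$; the function and parameter names are illustrative.

```python
import math

def laplace_privacy_profile(eps, sensitivity, scale):
    """delta(eps) for Laplace noise of the given scale added to a query
    with the given global sensitivity (assumed closed form)."""
    theta = sensitivity / scale
    return max(1.0 - math.exp((eps - theta) / 2.0), 0.0)

# With sensitivity 1 and scale 1 the mechanism is (1, 0)-DP ...
delta_pure = laplace_privacy_profile(1.0, sensitivity=1.0, scale=1.0)
# ... and for smaller eps it still satisfies (eps, delta)-DP with delta > 0.
delta_half = laplace_privacy_profile(0.5, sensitivity=1.0, scale=1.0)
delta_zero = laplace_privacy_profile(0.0, sensitivity=1.0, scale=1.0)
```

The returned values decrease from $1 - e^{-\theta/2}$ at $\varepsilon = 0$ down to zero at $\varepsilon = \theta$, which is the curve separating achievable from non-achievable privacy parameters.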
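For mechanisms that only satisfy approximate DP, the privacy profile provides information about the behaviour of $\delta$ as we increase $\varepsilon$. The classical analysis for the Gaussian output perturbation mechanism provides some information in this respect. Recall that for a function $f : X \to \mathbb{R}^d$ with $L_2$ global sensitivity $\Delta$ the mechanism $\mathcal{M}(x) = f(x) + \mathcal{N}(0, \sigma^2 I)$ satisfies $(\varepsilon,\delta)$-DP if $\sigma^2 \geq 2\Delta^2\log(1.25/\delta)/\varepsilon^2$ and $\varepsilon \in (0,1)$ (cf. (Dwork and Roth, 2014, Theorem A.1)). This can be rewritten as $\delta_{\mathcal{M}}(\varepsilon) \leq 1.25\, e^{-\varepsilon^2/2\theta^2}$ for $\varepsilon \in (0,1)$, where $\theta = \Delta/\sigma$. Recently, Balle and Wang (Balle and Wang, 2018) gave a new analysis of the Gaussian mechanism that is valid for all values of $\varepsilon$. Their analysis can be interpreted as providing an expression for the privacy profile of the Gaussian mechanism in terms of the CDF of a standard normal distribution $\Phi$.
Theorem 4 ((Balle and Wang, 2018)).
Let $f : X \to \mathbb{R}^d$ be a function with $L_2$ global sensitivity $\Delta$. For any $\sigma > 0$ let $\theta = \Delta/\sigma$. The privacy profile of the Gaussian mechanism $\mathcal{M}(x) = f(x) + \mathcal{N}(0, \sigma^2 I)$ is given by $\delta_{\mathcal{M}}(\varepsilon) = \Phi(\theta/2 - \varepsilon/\theta) - e^{\varepsilon}\Phi(-\theta/2 - \varepsilon/\theta)$.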
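The Gaussian privacy profile is equally easy to evaluate numerically. The following sketch assumes the expression $\delta(\varepsilon) = \Phi(\theta/2 - \varepsilon/\theta) - e^{\varepsilon}\Phi(-\theta/2 - \varepsilon/\theta)$ with $\theta = \Delta/\sigma$, as in the analysis of Balle and Wang (2018), and compares it with the classical bound $1.25\,e^{-\varepsilon^2/2\theta^2}$; function names are illustrative.

```python
import math

def std_normal_cdf(t):
    """CDF of the standard normal distribution, via the complementary error function."""
    return 0.5 * math.erfc(-t / math.sqrt(2.0))

def gaussian_privacy_profile(eps, sensitivity, sigma):
    """delta(eps) for the Gaussian mechanism (assumed closed form)."""
    theta = sensitivity / sigma
    return (std_normal_cdf(theta / 2.0 - eps / theta)
            - math.exp(eps) * std_normal_cdf(-theta / 2.0 - eps / theta))

def classical_gaussian_bound(eps, sensitivity, sigma):
    """Classical upper bound on delta, stated for eps in (0, 1)."""
    theta = sensitivity / sigma
    return 1.25 * math.exp(-eps ** 2 / (2.0 * theta ** 2))

delta_exact = gaussian_privacy_profile(0.5, sensitivity=1.0, sigma=1.0)
delta_classical = classical_gaussian_bound(0.5, sensitivity=1.0, sigma=1.0)
delta_at_zero = gaussian_privacy_profile(0.0, sensitivity=1.0, sigma=1.0)
```

For these parameters the exact profile is noticeably smaller than the classical bound, and unlike the latter it can be evaluated at any $\varepsilon \geq 0$.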
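Interestingly, the proof of Theorem 4 implicitly provides a characterization of privacy profiles in terms of privacy loss random variables that holds for any mechanism. Recall that the privacy loss random variable of a mechanism $\mathcal{M}$ on inputs $x \simeq_X x'$ is defined as $L^{x,x'}_{\mathcal{M}} = \log(\mu(z)/\mu'(z))$, where $\mu = \mathcal{M}(x)$, $\mu' = \mathcal{M}(x')$, and $z \sim \mu$.
Theorem 5 ((Balle and Wang, 2018)).
The privacy profile of any mechanism $\mathcal{M}$ satisfies
$$\delta_{\mathcal{M}}(\varepsilon) = \sup_{x \simeq_X x'} \left( \Pr[L^{x,x'}_{\mathcal{M}} > \varepsilon] - e^{\varepsilon} \Pr[L^{x',x}_{\mathcal{M}} < -\varepsilon] \right).$$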
The characterization above generalizes the well-known inequality $\delta_{\mathcal{M}}(\varepsilon) \leq \sup_{x \simeq_X x'} \Pr[L^{x,x'}_{\mathcal{M}} > \varepsilon]$ (e.g. see (Dwork and Roth, 2014)). This bound is often used to derive $(\varepsilon,\delta)$-DP guarantees from other notions of privacy defined in terms of the moment generating function of the privacy loss random variable, including concentrated DP (Dwork and Rothblum, 2016), zero-concentrated DP (Bun and Steinke, 2016), Rényi DP (Mironov, 2017), and truncated concentrated DP (Bun et al., 2018). We now show that a reverse implication also holds: privacy profiles can be used to recover all the information provided by the moment generating function of the privacy loss random variable.
Theorem 6.
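Given a mechanism $\mathcal{M}$ and inputs $x \simeq_X x'$ let $\mu = \mathcal{M}(x)$ and $\mu' = \mathcal{M}(x')$. For $s \geq 0$, define the moment generating function $\varphi^{x,x'}_{\mathcal{M}}(s) = \mathbb{E}[\exp(s L^{x,x'}_{\mathcal{M}})]$. Then we have
$$\varphi^{x,x'}_{\mathcal{M}}(s) = 1 + s(s+1) \int_{0}^{\infty} \left( e^{s\varepsilon} D_{e^{\varepsilon}}(\mu \| \mu') + e^{-(s+1)\varepsilon} D_{e^{\varepsilon}}(\mu' \| \mu) \right) d\varepsilon.$$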
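In particular, if $D_{e^{\varepsilon}}(\mu \| \mu') = D_{e^{\varepsilon}}(\mu' \| \mu)$ holds for every $x \simeq_X x'$ (for example, this is satisfied by all output perturbation mechanisms with symmetric noise distributions), then $\sup_{x \simeq_X x'} \varphi^{x,x'}_{\mathcal{M}}(s) \leq 1 + s(s+1) \int_{0}^{\infty} (e^{s\varepsilon} + e^{-(s+1)\varepsilon})\, \delta_{\mathcal{M}}(\varepsilon)\, d\varepsilon$.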
Group-privacy Profiles
Recall that the $k$-th group-privacy profile of a mechanism $\mathcal{M}$ is defined as $\delta_{\mathcal{M},k}(\varepsilon) = \sup_{d(x,x') \leq k} D_{e^{\varepsilon}}(\mathcal{M}(x) \| \mathcal{M}(x'))$. A standard group privacy analysis (if $\mathcal{M}$ is $(\varepsilon,\delta)$-DP with respect to $\simeq_Y$, then it is $(k\varepsilon, \frac{e^{k\varepsilon}-1}{e^{\varepsilon}-1}\delta)$-DP with respect to $\simeq_Y^k$, cf. (Vadhan, 2017, Lemma 2.2)) immediately yields $\delta_{\mathcal{M},k}(\varepsilon) \leq \frac{e^{\varepsilon}-1}{e^{\varepsilon/k}-1}\,\delta_{\mathcal{M}}(\varepsilon/k)$. However, “white-box” approaches based on full knowledge of the privacy profile of $\mathcal{M}$ can be used to improve this result for specific mechanisms. For example, it is not hard to see that, combining the expressions from Theorems 3 and 4 with the triangle inequality on the global sensitivity of changing $k$ records in a dataset, one obtains bounds that improve on the “black-box” approach for all ranges of parameters for the Laplace and Gaussian mechanisms. This is one of the reasons why we state our bounds directly in terms of (group-)privacy profiles (a numerical comparison can be found in the supplementary material).
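The gap between the black-box and white-box approaches can be illustrated numerically for the Laplace mechanism. The sketch below rests on two assumptions: that the black-box bound has the form $\frac{e^{\varepsilon}-1}{e^{\varepsilon/k}-1}\,\delta(\varepsilon/k)$, and that the white-box bound is obtained by plugging sensitivity $k\Delta$ into the Laplace profile $[1 - \exp((\varepsilon - \theta)/2)]_+$. Function names are illustrative.

```python
import math

def laplace_profile(eps, theta):
    """Laplace privacy profile with theta = sensitivity / scale (assumed closed form)."""
    return max(1.0 - math.exp((eps - theta) / 2.0), 0.0)

def black_box_group_profile(eps, k, profile):
    """Generic group-privacy bound from the base profile alone (requires eps > 0)."""
    return math.expm1(eps) / math.expm1(eps / k) * profile(eps / k)

def white_box_laplace_group_profile(eps, k, theta):
    """Changing k records moves the query by at most k * sensitivity."""
    return laplace_profile(eps, k * theta)

theta, k, eps = 1.0, 2, 1.0
black_box = black_box_group_profile(eps, k, lambda e: laplace_profile(e, theta))
white_box = white_box_laplace_group_profile(eps, k, theta)
```

With these values the white-box bound is roughly 0.39 while the black-box bound is roughly 0.59, which is consistent with the improvement described in the text.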
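Distance-compatible Coupling
The last tool we need to prove general privacy amplification bounds based on $\alpha$-divergences is the existence of a certain type of couplings between two distributions like the ones occurring in the right hand side of (4). Recall that any coupling $\pi$ between two distributions $\nu, \nu' \in \mathbb{P}(Y)$ can be used to rewrite the mixture distributions $\mu = \nu\mathcal{M}$ and $\mu' = \nu'\mathcal{M}$ as $\mu = \sum_{y,y'} \pi_{y,y'}\mathcal{M}(y)$ and $\mu' = \sum_{y,y'} \pi_{y,y'}\mathcal{M}(y')$. Using the joint convexity of $D_{\alpha}$ and the definition of group-privacy profiles we get the bound
$$D_{e^{\varepsilon}}(\nu\mathcal{M} \| \nu'\mathcal{M}) \leq \sum_{y,y'} \pi_{y,y'} D_{e^{\varepsilon}}(\mathcal{M}(y) \| \mathcal{M}(y')) \leq \sum_{y,y'} \pi_{y,y'}\, \delta_{\mathcal{M}, d_Y(y,y')}(\varepsilon). \tag{5}$$
Since this bound holds for any coupling $\pi$, one can set out to optimize it by finding a coupling that minimizes the right hand side of (5). We show that the existence of couplings whose support is contained inside a certain subset of $Y \times Y$ is enough to obtain an optimal bound. Furthermore, we show that when this condition is satisfied the resulting bound depends only on $\nu$ and the group-privacy profiles of $\mathcal{M}$.
We say that two distributions $\nu, \nu' \in \mathbb{P}(Y)$ are $d_Y$-compatible if there exists a coupling $\pi$ between $\nu$ and $\nu'$ such that for any $(y,y') \in \mathrm{supp}(\pi)$ we have $d_Y(y,y') = d_Y(y, \mathrm{supp}(\nu'))$, where the distance between a point $y$ and the set $\mathrm{supp}(\nu')$ is defined as the distance between $y$ and the closest point in $\mathrm{supp}(\nu')$.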
Theorem 7.
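Let $C(\nu, \nu')$ be the set of all couplings between $\nu$ and $\nu'$ and for $k \geq 1$ let $Y_k = \{ y \in \mathrm{supp}(\nu) : d_Y(y, \mathrm{supp}(\nu')) = k \}$. If $\nu$ and $\nu'$ are $d_Y$-compatible, then the following holds:
$$\min_{\pi \in C(\nu,\nu')} \sum_{y,y'} \pi_{y,y'}\, \delta_{\mathcal{M}, d_Y(y,y')}(\varepsilon) = \sum_{k \geq 1} \nu(Y_k)\, \delta_{\mathcal{M},k}(\varepsilon). \tag{6}$$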
Applying this result to the bound resulting from the right hand side of (4) yields most of the concrete privacy amplification results presented in the next section.
4 Privacy Amplification Bounds
In this section we provide explicit privacy amplification bounds for the most common subsampling methods and neighbouring relations found in the literature on differential privacy, and provide pointers to existing bounds and other related work. For our analysis we work with order-independent representations of datasets without repetitions, i.e. sets. This is mostly for technical convenience, since all our results also hold if one considers datasets represented as tuples or multisets. Note however that subsampling with replacement for a set can yield a multiset; hence we introduce suitable notations for sets and multisets.
Fix a universe of records $\mathcal{U}$. We write $2^{\mathcal{U}}$ and $\mathbb{N}^{\mathcal{U}}$ for the spaces of all sets and multisets with records from $\mathcal{U}$. Note every set is also a multiset. For $n \geq 0$ we also write $2^{\mathcal{U}}_n$ and $\mathbb{N}^{\mathcal{U}}_n$ for the spaces of all sets and multisets containing exactly $n$ records from $\mathcal{U}$ (in the case of multisets records are counted with multiplicity). Given $x \in \mathbb{N}^{\mathcal{U}}$ and $u \in \mathcal{U}$ we write $x_u$ for the number of occurrences of $u$ in $x$. The support of a multiset $x$ is defined as the set $\mathrm{supp}(x) = \{u \in \mathcal{U} : x_u > 0\}$ of elements that occur at least once in $x$. Given multisets $x, x'$ we write $x' \subseteq x$ to denote that $x'_u \leq x_u$ for all $u \in \mathcal{U}$.
For order-independent datasets represented as multisets it is natural to consider the two following neighbouring relations. The remove/add-one relation $\simeq_r$ is obtained by letting $x \simeq_r x'$ hold whenever $x \subseteq x'$ with $|x| = |x'| - 1$ or $x' \subseteq x$ with $|x| = |x'| + 1$; i.e. $x'$ is obtained by removing or adding a single element to $x$. The substitute-one relation $\simeq_s$ is obtained by letting $x \simeq_s x'$ hold whenever $\|x - x'\|_1 = 2$ and $|x| = |x'|$; i.e. $x'$ is obtained by replacing an element in $x$ with a different element from $\mathcal{U}$. Note how $\simeq_r$ relates pairs of datasets with different sizes, while $\simeq_s$ only relates pairs of datasets with the same size.
Poisson Subsampling
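Perhaps the most well-known privacy amplification result refers to the analysis of Poisson subsampling with respect to the remove/add-one relation. In this case the subsampling mechanism $\mathcal{S}^{\mathrm{po}}_{\gamma}$ takes a set $x$ and outputs a sample $y$ from the distribution $\omega = \mathcal{S}^{\mathrm{po}}_{\gamma}(x)$ supported on all sets $y \subseteq x$ given by $\omega(y) = \gamma^{|y|}(1-\gamma)^{|x|-|y|}$. This corresponds to independently adding to $y$ with probability $\gamma$ each element from $x$. Now, given a mechanism $\mathcal{M}$ with privacy profile $\delta_{\mathcal{M}}$ with respect to $\simeq_r$, we are interested in bounding the privacy profile of the subsampled mechanism $\mathcal{M}^{\mathcal{S}^{\mathrm{po}}_{\gamma}}$ with respect to $\simeq_r$.
Theorem 8.
Let $\mathcal{M}' = \mathcal{M}^{\mathcal{S}^{\mathrm{po}}_{\gamma}}$. For any $\varepsilon \geq 0$ we have $\delta_{\mathcal{M}'}(\varepsilon') \leq \gamma\,\delta_{\mathcal{M}}(\varepsilon)$, where $\varepsilon' = \log(1 + \gamma(e^{\varepsilon} - 1))$.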
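To make the Poisson subsampling result concrete, the following sketch implements the sampler and computes amplified privacy parameters. It assumes the bound takes the form $\varepsilon' = \log(1 + \gamma(e^{\varepsilon} - 1))$ and $\delta' = \gamma\delta$, where $\gamma$ is the inclusion probability; the function names are illustrative.

```python
import math
import random

def poisson_subsample(dataset, gamma, rng=random):
    """Keep each record independently with probability gamma."""
    return [record for record in dataset if rng.random() < gamma]

def poisson_amplification(eps, delta, gamma):
    """Privacy parameters of the subsampled mechanism (assumed form of the bound)."""
    eps_prime = math.log1p(gamma * math.expm1(eps))
    return eps_prime, gamma * delta

# A (1.0, 1e-5)-DP base mechanism run on a 1% Poisson subsample.
eps_prime, delta_prime = poisson_amplification(1.0, 1e-5, gamma=0.01)
```

For small $\gamma$ the amplified $\varepsilon'$ is close to $\gamma(e^{\varepsilon} - 1)$, which is larger than the naive guess $\gamma\varepsilon$ whenever $\varepsilon > 0$.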
Privacy amplification with Poisson sampling was used in (Chaudhuri and Mishra, 2006; Beimel et al., 2010; Kasiviswanathan et al., 2011; Beimel et al., 2014), which considered loose bounds. A proof of this tight result in terms of DP was first given in (Li et al., 2012). In the context of the moments accountant technique based on the moment generating function of the privacy loss random variable, (Abadi et al., 2016) provide an amplification result for Gaussian output perturbation mechanisms under Poisson subsampling.
Sampling Without Replacement
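Another known result on privacy amplification corresponds to the analysis of sampling without replacement with respect to the substitution relation. In this case one considers the subsampling mechanism $\mathcal{S}^{\mathrm{wo}}_{m}$ that given a set $x$ of size $n$ outputs a sample from the uniform distribution $\omega = \mathcal{S}^{\mathrm{wo}}_{m}(x)$ over all subsets $y \subseteq x$ of size $m \leq n$. Then, for a given mechanism $\mathcal{M}$ with privacy profile $\delta_{\mathcal{M}}$ with respect to the substitution relation $\simeq_s$ on sets of size $m$, we are interested in bounding the privacy profile of the mechanism $\mathcal{M}^{\mathcal{S}^{\mathrm{wo}}_{m}}$ with respect to the substitution relation $\simeq_s$ on sets of size $n$.
Theorem 9.
Let $\mathcal{M}' = \mathcal{M}^{\mathcal{S}^{\mathrm{wo}}_{m}}$. For any $\varepsilon \geq 0$ we have $\delta_{\mathcal{M}'}(\varepsilon') \leq (m/n)\,\delta_{\mathcal{M}}(\varepsilon)$, where $\varepsilon' = \log(1 + (m/n)(e^{\varepsilon} - 1))$.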
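In practice one often wants to go in the opposite direction: given a target $\varepsilon'$ for the subsampled mechanism, find the largest $\varepsilon$ the base mechanism may use. The sketch below inverts the relation $\varepsilon' = \log(1 + (m/n)(e^{\varepsilon} - 1))$, which is assumed here to be the form of the bound for sampling $m$ out of $n$ records without replacement; function names are illustrative.

```python
import math
import random

def sample_without_replacement(dataset, m, rng=random):
    """Uniformly sample m distinct records from a list of distinct records."""
    return rng.sample(dataset, m)

def amplified_epsilon(eps, m, n):
    """eps' = log(1 + (m/n) * (e^eps - 1)), the assumed amplification relation."""
    return math.log1p((m / n) * math.expm1(eps))

def base_epsilon_for_target(target_eps, m, n):
    """Invert the relation: the base eps whose amplified value equals target_eps."""
    return math.log1p(math.expm1(target_eps) / (m / n))

# Budget for the base mechanism when sampling 100 of 10000 records
# and aiming for eps' = 0.1 overall.
base_eps = base_epsilon_for_target(0.1, m=100, n=10000)
round_trip = amplified_epsilon(base_eps, m=100, n=10000)
batch = sample_without_replacement(list(range(10000)), 100)
```

Because the relation is non-linear, the base budget grows only logarithmically in $n/m$ for a fixed target, rather than linearly as the small-$\varepsilon$ approximation would suggest.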
This setting has been used in (Beimel et al., 2013; Bassily et al., 2014; Wang et al., 2016) with non-tight bounds. A proof of this tight bound formulated in terms of $(\varepsilon,\delta)$-DP can be directly recovered from Ullman’s class notes (Ullman, 2017), although the stated bound is weaker. Rényi DP amplification bounds for subsampling without replacement were developed in (Wang et al., 2018).
Sampling With Replacement
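Next we consider the case of sampling with replacement with respect to the substitution relation $\simeq_s$. The subsampling with replacement mechanism $\mathcal{S}^{\mathrm{wr}}_{m}$ takes a set $x$ of size $n$ and outputs a sample $y$ from the multinomial distribution $\omega = \mathcal{S}^{\mathrm{wr}}_{m}(x)$ over all multisets $y$ of size $m$ with $\mathrm{supp}(y) \subseteq x$, given by $\omega(y) = \frac{m!}{n^m}\prod_{u \in x}\frac{1}{y_u!}$. In this case we suppose the base mechanism $\mathcal{M}$ is defined on multisets and has privacy profile $\delta_{\mathcal{M}}$ with respect to $\simeq_s$. We are interested in bounding the privacy profile of the subsampled mechanism $\mathcal{M}^{\mathcal{S}^{\mathrm{wr}}_{m}}$ with respect to $\simeq_s$.
Theorem 10.
Let $\mathcal{M}' = \mathcal{M}^{\mathcal{S}^{\mathrm{wr}}_{m}}$. Given $\varepsilon \geq 0$ and $\varepsilon' = \log(1 + (1 - (1 - 1/n)^m)(e^{\varepsilon} - 1))$ we have
$$\delta_{\mathcal{M}'}(\varepsilon') \leq \sum_{k=1}^{m} \binom{m}{k} \left(\frac{1}{n}\right)^{k} \left(1 - \frac{1}{n}\right)^{m-k} \delta_{\mathcal{M},k}(\varepsilon).$$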
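The bound for sampling with replacement involves group-privacy profiles because a substituted record may appear several times in the sample. The following sketch evaluates it under the assumption that the bound is a binomial-weighted sum $\sum_{k=1}^{m}\binom{m}{k}(1/n)^k(1-1/n)^{m-k}\,\delta_{k}(\varepsilon)$ with $\varepsilon' = \log(1 + (1-(1-1/n)^m)(e^{\varepsilon}-1))$. The callable `group_profile(k, eps)` is a hypothetical interface for the $k$-th group-privacy profile of the base mechanism.

```python
import math

def with_replacement_bound(eps, m, n, group_profile):
    """Return (eps_prime, delta_prime) for sampling m records with replacement
    from a dataset of size n (assumed form of the bound)."""
    # Probability that a fixed record appears at least once in the sample.
    eta = 1.0 - (1.0 - 1.0 / n) ** m
    eps_prime = math.log1p(eta * math.expm1(eps))
    # k = number of times the differing record is drawn, Binomial(m, 1/n).
    delta_prime = sum(
        math.comb(m, k) * (1.0 / n) ** k * (1.0 - 1.0 / n) ** (m - k)
        * group_profile(k, eps)
        for k in range(1, m + 1)
    )
    return eps_prime, delta_prime

# Sanity check with a trivial profile equal to 1 for every group size:
# the weights then sum to the inclusion probability eta.
eps_prime, delta_prime = with_replacement_bound(1.0, m=5, n=10,
                                                group_profile=lambda k, e: 1.0)
```

Any of the group-privacy profiles discussed earlier, black-box or white-box, can be passed as `group_profile`.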
Hybrid Neighbouring Relations
Using our method it is also possible to analyze new settings which have not been considered before. One interesting example occurs when there is a mismatch between the two neighbouring relations arising in the analysis. For example, suppose one knows the group-privacy profiles of a base mechanism $\mathcal{M}$ with respect to the substitution relation $\simeq_s$. In this case one could ask whether it makes sense to study the privacy profile of the subsampled mechanism with respect to the remove/add relation $\simeq_r$. In principle, this makes sense in settings where the size of the inputs to $\mathcal{M}$ is restricted due to implementation constraints (e.g. limited by the memory available in a GPU used to run a private mechanism that computes a gradient on a mini-batch of size $m$). In this case one might still be interested in analyzing the privacy loss incurred from releasing such stochastic gradients under the remove/add relation. Note that this setting cannot be implemented using sampling without replacement since under the remove/add relation we cannot a priori guarantee that the input dataset will have at least size $m$ because the size of the dataset must be kept private (Vadhan, 2017). Furthermore, one cannot hope to get a meaningful result about the privacy profile of the subsampled mechanism across all input sets in $2^{\mathcal{U}}$; instead the privacy guarantee will depend on the size of the input dataset as shown in the following result.
Theorem 11.
Let . For any and we have
where .
When the Neighbouring Relation is “Incompatible”
Now we consider a simple example where distancecompatible couplings are not available: Poisson subsampling with respect to the substitution relation. Suppose are sets of size related by the substitution relation . Let and and note that . Let and , . In this case the factorization induced by the maximal coupling is obtained by taking , , and . Now the support of contains sets of sizes between and , while the supports of and contain sets of sizes between and . From this observation one can deduce that and are not compatible, and and are not compatible.
This argument shows that the method we used to analyze the previous settings cannot be extended to analyze Poisson subsampling under the substitution relation, regardless of whether the privacy profile of the base mechanism is given in terms of the replacement/addition or the substitution relation. This observation suggests that some pairings between subsampling method and neighbouring relation are more natural than others. Nonetheless, even without distance-compatible couplings it is possible to provide privacy amplification bounds for Poisson subsampling with respect to the substitution relation, although the resulting bound is quite cumbersome. The corresponding statement and analysis can be found in the supplementary material.
5 Lower Bounds
In this section we show that many of the results given in the previous section are tight by constructing a randomized membership mechanism that attains these upper bounds. For the sake of generality, we state the main construction in terms of tuples instead of multisets. In fact, we prove a general lemma that can be used to obtain tightness results for any subsampling mechanism and any neighbouring relation satisfying two natural assumptions.
For let be the randomized response mechanism that given returns with probability and with probability . Note that for this mechanism is DP. Let and . For any and define . It is easy to verify that . Now let be a universe containing at least two elements. For and we define the randomized membership mechanism that given a tuple returns . We say that a subsampling mechanism defined on some set is natural if the following two conditions are satisfied: (1) for any and , if then there exists such that ; (2) for any and , if then we have for every .
Lemma 12.
Let be equipped with a neighbouring relation such that there exist with and . Suppose is a natural subsampling mechanism and let . For any and we have
We can now apply this lemma to show that the first three results from previous section are tight. This requires specializing from tuples to (multi)sets, and plugging in the definitions of neighbouring relation, subsampling mechanism, and used in each of these theorems.
6 Conclusions
We have developed a general method for reasoning about privacy amplification by subsampling. Our method is applicable to many different settings, some of which have already been studied in the literature, and others which are new. Technically, our method leverages two new tools of independent interest: advanced joint convexity and privacy profiles. In the future, it would be interesting to study whether our tools can be extended to give concrete bounds on privacy amplification for other privacy notions such as concentrated DP (Dwork and Rothblum, 2016), zero-concentrated DP (Bun and Steinke, 2016), Rényi DP (Mironov, 2017), and truncated concentrated DP (Bun et al., 2018). A good starting point is Theorem 6, which establishes relations between privacy profiles and moment generating functions of the privacy loss random variable. An alternative approach is to extend the recent results for Rényi DP amplification by subsampling without replacement given in (Wang et al., 2018) to more general notions of subsampling and neighbouring relations.
Acknowledgments
This research was initiated during the 2017 Probabilistic Programming Languages workshop hosted by McGill University’s Bellairs Research Institute.
References
 Abadi et al. [2016] Martín Abadi, Andy Chu, Ian Goodfellow, H Brendan McMahan, Ilya Mironov, Kunal Talwar, and Li Zhang. Deep learning with differential privacy. In Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security, pages 308–318. ACM, 2016.
 Balle and Wang [2018] Borja Balle and Yu-Xiang Wang. Improving the gaussian mechanism for differential privacy: Analytical calibration and optimal denoising. In Proceedings of the 35th International Conference on Machine Learning, ICML, 2018.
 Barthe and Olmedo [2013] Gilles Barthe and Federico Olmedo. Beyond differential privacy: Composition theorems and relational logic for f-divergences between probabilistic programs. In International Colloquium on Automata, Languages, and Programming, pages 49–60. Springer, 2013.
 Barthe et al. [2012] Gilles Barthe, Boris Köpf, Federico Olmedo, and Santiago Zanella Béguelin. Probabilistic relational reasoning for differential privacy. In Symposium on Principles of Programming Languages (POPL), pages 97–110, 2012.
 Barthe et al. [2016] Gilles Barthe, Marco Gaboardi, Benjamin Grégoire, Justin Hsu, and Pierre-Yves Strub. Proving differential privacy via probabilistic couplings. In Symposium on Logic in Computer Science (LICS), pages 749–758, 2016.
 Bassily et al. [2014] Raef Bassily, Adam Smith, and Abhradeep Thakurta. Private empirical risk minimization: Efficient algorithms and tight error bounds. In Foundations of Computer Science (FOCS), 2014 IEEE 55th Annual Symposium on, pages 464–473. IEEE, 2014.
 Beimel et al. [2010] Amos Beimel, Shiva Prasad Kasiviswanathan, and Kobbi Nissim. Bounds on the sample complexity for private learning and private data release. In Theory of Cryptography Conference, pages 437–454. Springer, 2010.
 Beimel et al. [2013] Amos Beimel, Kobbi Nissim, and Uri Stemmer. Characterizing the sample complexity of private learners. In Proceedings of the 4th conference on Innovations in Theoretical Computer Science, pages 97–110. ACM, 2013.
 Beimel et al. [2014] Amos Beimel, Hai Brenner, Shiva Prasad Kasiviswanathan, and Kobbi Nissim. Bounds on the sample complexity for private learning and private data release. Machine learning, 94(3):401–437, 2014.
 Bun and Steinke [2016] Mark Bun and Thomas Steinke. Concentrated differential privacy: Simplifications, extensions, and lower bounds. In Theory of Cryptography - 14th International Conference, TCC 2016-B, Beijing, China, October 31 - November 3, 2016, Proceedings, Part I, pages 635–658, 2016.
 Bun et al. [2015] Mark Bun, Kobbi Nissim, Uri Stemmer, and Salil Vadhan. Differentially private release and learning of threshold functions. In Foundations of Computer Science (FOCS), 2015 IEEE 56th Annual Symposium on, pages 634–649. IEEE, 2015.

 Bun et al. [2018] Mark Bun, Cynthia Dwork, Guy Rothblum, and Thomas Steinke. Composable and versatile privacy via truncated CDP. In Symposium on Theory of Computing, STOC, 2018.
 Chaudhuri and Mishra [2006] Kamalika Chaudhuri and Nina Mishra. When random sampling preserves privacy. In Annual International Cryptology Conference, pages 198–213. Springer, 2006.
 Dwork and Roth [2014] Cynthia Dwork and Aaron Roth. The algorithmic foundations of differential privacy. Foundations and Trends in Theoretical Computer Science, 9(3-4):211–407, 2014.
 Dwork and Rothblum [2016] Cynthia Dwork and Guy N Rothblum. Concentrated differential privacy. arXiv preprint arXiv:1603.01887, 2016.
 Dwork et al. [2010] Cynthia Dwork, Guy N Rothblum, and Salil Vadhan. Boosting and differential privacy. In Foundations of Computer Science (FOCS), 2010 51st Annual IEEE Symposium on, pages 51–60. IEEE, 2010.

 Jälkö et al. [2017] Joonas Jälkö, Antti Honkela, and Onur Dikmen. Differentially private variational inference for non-conjugate models. In Proceedings of the Thirty-Third Conference on Uncertainty in Artificial Intelligence, UAI 2017, Sydney, Australia, August 11-15, 2017, 2017.
 Kairouz et al. [2017] Peter Kairouz, Sewoong Oh, and Pramod Viswanath. The composition theorem for differential privacy. IEEE Transactions on Information Theory, 63(6):4037–4049, 2017.
 Kasiviswanathan et al. [2011] Shiva Prasad Kasiviswanathan, Homin K Lee, Kobbi Nissim, Sofya Raskhodnikova, and Adam Smith. What can we learn privately? SIAM Journal on Computing, 40(3):793–826, 2011.
 Li et al. [2012] Ninghui Li, Wahbeh Qardaji, and Dong Su. On sampling, anonymization, and differential privacy or, k-anonymization meets differential privacy. In Proceedings of the 7th ACM Symposium on Information, Computer and Communications Security, pages 32–33. ACM, 2012.
 Mironov [2017] Ilya Mironov. Rényi differential privacy. In 30th IEEE Computer Security Foundations Symposium, CSF 2017, Santa Barbara, CA, USA, August 21-25, 2017, pages 263–275, 2017.
 Murtagh and Vadhan [2016] Jack Murtagh and Salil Vadhan. The complexity of computing the optimal composition of differential privacy. In Theory of Cryptography Conference, pages 157–175. Springer, 2016.
 Österreicher [2002] Ferdinand Österreicher. Csiszár’s f-divergences - basic properties. RGMIA Res. Rep. Coll, 2002.
 Park et al. [2016a] Mijung Park, James R. Foulds, Kamalika Chaudhuri, and Max Welling. Private topic modeling. CoRR, abs/1609.04120, 2016a.
 Park et al. [2016b] Mijung Park, James R. Foulds, Kamalika Chaudhuri, and Max Welling. Variational Bayes in private settings (VIPS). CoRR, abs/1611.00340, 2016b.
 Sason and Verdú [2016] Igal Sason and Sergio Verdú. f-divergence inequalities. IEEE Transactions on Information Theory, 62(11):5973–6006, 2016.
 Ullman [2017] Jonathan Ullman. CS7880: Rigorous approaches to data privacy. http://www.ccs.neu.edu/home/jullman/PrivacyS17/HW1sol.pdf, 2017.
 Vadhan [2017] Salil P. Vadhan. The complexity of differential privacy. In Tutorials on the Foundations of Cryptography, pages 347–450. 2017.
 Wang et al. [2015] Yu-Xiang Wang, Stephen Fienberg, and Alex Smola. Privacy for free: Posterior sampling and stochastic gradient Monte Carlo. In Proceedings of the 32nd International Conference on Machine Learning (ICML-15), pages 2493–2502, 2015.
 Wang et al. [2016] Yu-Xiang Wang, Jing Lei, and Stephen E. Fienberg. Learning with differential privacy: Stability, learnability and the sufficiency and necessity of ERM principle. Journal of Machine Learning Research, 17(183):1–40, 2016.
 Wang et al. [2018] Yu-Xiang Wang, Borja Balle, and Shiva Kasiviswanathan. Subsampled Rényi differential privacy and analytical moments accountant. ArXiv e-prints, 2018.
Appendix A Proofs from Section 3
Proof of Theorem 2.
It suffices to check that for any ,
Plugging this identity in the definition of we get the desired equality
∎
Proof of Theorem 3.
Suppose and assume without loss of generality that and . Plugging the density of the Laplace distribution in the definition of divergence we get