1 Introduction
Modern statistical analysis and machine learning are overwhelmingly applied to data concerning people. Valuable datasets generated from the personal devices and online behavior of billions of individuals contain data on location, web search histories, media consumption, physical activity, social networks, and more. This is on top of continuing large-scale analysis of traditionally sensitive data records, including those collected by hospitals, schools, and the Census. This reality requires the development of tools to perform large-scale data analysis in a way that still protects the privacy of individuals represented in the data.

Unfortunately, the history of data privacy for many years consisted of ad hoc attempts at "anonymizing" personal information, followed by high-profile de-anonymizations. These include the release of AOL search logs, de-anonymized by the New York Times [BZ06]; the Netflix Challenge dataset, de-anonymized by Narayanan and Shmatikov [NS08]; the realization that participants in genome-wide association studies could be identified from publicly released aggregate statistics such as minor allele frequencies [HSR08]; and the reconstruction of individual-level census records from aggregate statistical releases [Abo18].
Thus, a rigorous and principled privacy-preserving framework was urgently needed to prevent breaches of personal information in data analysis. In this context, differential privacy has put private data analysis on firm theoretical foundations [DMNS06, DKM06]. This definition has become tremendously successful: in addition to an enormous and growing academic literature, it has been adopted as a key privacy technology by Google [EPK14], Apple [App17], Microsoft [DKY17], and the US Census Bureau [Abo18]. The definition involves two privacy parameters, $\varepsilon \ge 0$ and $0 \le \delta \le 1$.
Definition 1.1 ([DMNS06, DKM06]). A randomized algorithm $M$ that takes as input a dataset consisting of $n$ individuals is $(\varepsilon, \delta)$-differentially private (DP) if, for any pair of datasets $S$ and $S'$ that differ in the record of a single individual, and any event $E$,
(1)    $\mathbb{P}[M(S) \in E] \le \mathrm{e}^{\varepsilon}\, \mathbb{P}[M(S') \in E] + \delta.$
When $\delta = 0$, the guarantee is simply called $\varepsilon$-DP.
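As a concrete illustration of inequality (1), consider the classic randomized-response mechanism on a single bit: the respondent reports the true bit with probability $\mathrm{e}^{\varepsilon}/(1 + \mathrm{e}^{\varepsilon})$ and the flipped bit otherwise. The sketch below is our own illustration, not code from the paper; it checks the defining inequality with $\delta = 0$ directly on the exact output probabilities.

```python
import math

def randomized_response_probs(true_bit: int, epsilon: float):
    """Return [P(output = 0), P(output = 1)] for one respondent's bit."""
    p_truth = math.exp(epsilon) / (1 + math.exp(epsilon))  # report truthfully
    probs = [1 - p_truth, 1 - p_truth]
    probs[true_bit] = p_truth
    return probs

def is_eps_dp(epsilon: float) -> bool:
    """Check inequality (1) with delta = 0 for every event over one output bit."""
    p = randomized_response_probs(0, epsilon)  # "neighboring datasets": bit 0 vs bit 1
    q = randomized_response_probs(1, epsilon)
    for event in [[], [0], [1], [0, 1]]:
        pe = sum(p[o] for o in event)
        qe = sum(q[o] for o in event)
        if pe > math.exp(epsilon) * qe + 1e-12 or qe > math.exp(epsilon) * pe + 1e-12:
            return False
    return True

print(is_eps_dp(1.0))  # True: every event satisfies the eps-DP inequality
```

The bound is tight here: for the event {output = 0}, the two probabilities differ by exactly the factor $\mathrm{e}^{\varepsilon}$.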
In this definition, datasets are fixed and the probabilities are taken only over the randomness of the mechanism (in the differential privacy literature, a randomized algorithm is often referred to as a mechanism). In particular, the event $E$ can be any measurable set in the range of $M$. To achieve differential privacy, a mechanism is necessarily randomized. Take as an example the problem of privately releasing the average cholesterol level of the individuals in the dataset $S = (x_1, \ldots, x_n)$, where each $x_i$ corresponds to an individual (here we identify the individual with his/her cholesterol level). A privacy-preserving mechanism may take the form
$$M(S) = \frac{x_1 + \cdots + x_n}{n} + \text{noise}.$$
The level of the noise term has to be sufficiently large to mask the characteristics of any individual's cholesterol level, while not being so large as to distort the population average for accuracy purposes. Consequently, the probability distributions of $M(S)$ and $M(S')$ are close to each other for any datasets $S$ and $S'$ that differ in only one individual record.

Differential privacy is most naturally defined through a hypothesis testing problem from the perspective of an attacker who aims to distinguish $S$ from $S'$ based on the output of the mechanism. This statistical viewpoint was first observed by [WZ10] and then further developed by [KOV17], which is a direct inspiration for our work. In short, consider the hypothesis testing problem
(2)    $H_0:$ the underlying dataset is $S$   versus   $H_1:$ the underlying dataset is $S'$,
and call Alice the only individual that is in $S$ but not in $S'$. As such, rejecting the null hypothesis corresponds to detecting the absence of Alice, whereas accepting the null hypothesis corresponds to detecting her presence in the dataset. Using the output of an $(\varepsilon, \delta)$-DP mechanism, the power (that is, 1 minus the type II error) of any test at significance level $\alpha \in [0, 1]$ has an upper bound of $\mathrm{e}^{\varepsilon} \alpha + \delta$ (a more precise bound is given in Proposition 2.3). This bound is only slightly larger than $\alpha$ provided that $\varepsilon$ and $\delta$ are small and, therefore, any test is essentially powerless. Put differently, differential privacy with small privacy parameters protects against any inference of the presence of Alice, or any other individual, in the dataset.

Despite its apparent success, there are good reasons to want to relax the original definition of differential privacy, which has led to a long line of proposals for such relaxations. The most important shortcoming is that $(\varepsilon, \delta)$-DP does not tightly handle composition. Composition concerns how privacy guarantees degrade under repeated application of mechanisms to the same dataset; it is what renders the design of differentially private algorithms modular. Without compositional properties, it would be nearly impossible to develop complex differentially private data analysis methods. Although it has been known since the original papers defining differential privacy [DMNS06, DKM06] that the composition of an $(\varepsilon_1, \delta_1)$-DP mechanism and an $(\varepsilon_2, \delta_2)$-DP mechanism yields an $(\varepsilon_1 + \varepsilon_2, \delta_1 + \delta_2)$-DP mechanism, the corresponding upper bound on the power of any test no longer tightly characterizes the tradeoff between significance level and power for testing between $S$ and $S'$. In [DRV10], Dwork, Rothblum, and Vadhan gave an improved composition theorem, but it still fails to capture the correct hypothesis testing tradeoff. This is for a fundamental reason: $(\varepsilon, \delta)$-DP is misparameterized in the sense that the guarantees of a composition of $(\varepsilon, \delta)$-DP mechanisms cannot be exactly characterized by any single pair of parameters $(\varepsilon, \delta)$. Worse, given any $\delta$, finding the smallest $\varepsilon$ that most tightly approximates the correct tradeoff between significance level and type II error for a composition of a sequence of differentially private algorithms is computationally hard [MV16], so in practice one must resort to approximations.
Given that composition and modularity are first-order desiderata for a useful privacy definition, these are substantial drawbacks, and they often push practical algorithms with meaningful privacy guarantees out of reach.
In light of this, substantial recent effort has been devoted to developing relaxations of differential privacy for which composition can be handled exactly. This line of work includes several variants of "concentrated differential privacy" [DR16, BS16], "Rényi differential privacy" [Mir17], and "truncated concentrated differential privacy" [BDRS18]. These definitions are tailored to exactly and easily track the "privacy cost" of compositions of the most basic primitive in differential privacy: the perturbation of a real-valued statistic with Gaussian noise.
While this direction of privacy relaxation has been quite fruitful, there are still several places one might wish for improvement. First, these notions of differential privacy no longer have hypothesis testing interpretations, but are rather based on studying divergences that satisfy a certain information processing inequality. There are good reasons to prefer definitions based on hypothesis testing. Most immediately, hypothesis testing based definitions provide an easy way to interpret the guarantees of a privacy definition. More fundamentally, a theorem due to Blackwell (see Theorem 2.7) provides a formal sense in which a tight understanding of the tradeoff between type I and type II errors for the hypothesis testing problem of distinguishing between $M(S)$ and $M(S')$ contains at least as much information as any divergence between the distributions $M(S)$ and $M(S')$ (so long as the divergence satisfies the information processing inequality).
Second, certain simple and fundamental primitives associated with differential privacy, most notably privacy amplification by subsampling [KLN11], either fail to apply to the existing relaxations of differential privacy or require a substantially more complex analysis [WBK18]. This is especially problematic when analyzing the privacy guarantees of stochastic gradient descent, arguably the most popular present-day optimization algorithm, since subsampling is inherent to this algorithm. At best, this difficulty can be overcome with heavy technical machinery; for example, it necessitated the development by Abadi et al. [ACG16] of the numerical moments accountant method to sidestep the issue.
1.1 Our Contributions
In this work, we introduce a new relaxation of differential privacy that avoids these issues and has other attractive properties. Rather than giving a "divergence" based relaxation of differential privacy, we start fresh from the hypothesis testing interpretation of differential privacy, and obtain a new privacy definition by allowing the full tradeoff between type I and type II errors in the simple hypothesis testing problem (2) to be governed by some function $f$. The functional privacy parameter $f$ is to this new definition what the pair $(\varepsilon, \delta)$ is to the original definition of differential privacy. Notably, this definition, which we term $f$-differential privacy ($f$-DP) and which captures $(\varepsilon, \delta)$-DP as a special case, is accompanied by a powerful and elegant toolkit for reasoning about composition. Here, we highlight some of our contributions:
An Algebra for Composition. We show that our privacy definition is closed and tight under composition, meaning that the tradeoff between type I and type II errors that results from the composition of an $f$-DP mechanism with a $g$-DP mechanism can always be exactly described by a certain function $f \otimes g$. This function can be expressed via $f$ and $g$ in an algebraic fashion, thereby allowing for lossless reasoning about composition. In contrast, $(\varepsilon, \delta)$-DP, or any other privacy definition parameterized by a small number of real values, artificially restricts itself. By allowing a function to keep track of the privacy guarantee of the mechanism, our new privacy definition avoids the pitfall of premature summarization (to quote Susan Holmes [Hol19], "premature summarization is the root of all evil in statistics") in intermediate steps and, consequently, yields a comprehensive delineation of the overall privacy guarantee. See Section 3 for more details.
A Central Limit Phenomenon. We define a single-parameter family of $f$-DP guarantees that uses the type I and type II error tradeoff in distinguishing the standard normal distribution $\mathcal{N}(0, 1)$ from $\mathcal{N}(\mu, 1)$ for $\mu \ge 0$. This is referred to as Gaussian differential privacy (GDP). In light of the hypothesis testing interpretation (2), the $\mu$-GDP guarantee can be interpreted as saying that determining whether or not Alice is in the dataset is at least as difficult as telling apart $\mathcal{N}(0, 1)$ and $\mathcal{N}(\mu, 1)$ based on one draw. Moreover, we show that GDP is a "canonical" privacy guarantee in a fundamental sense: for any privacy definition that retains a hypothesis testing interpretation, we prove that the privacy guarantee of a composition, under appropriate scaling, converges to GDP in the limit. This central limit theorem type of result is remarkable not only for its theoretical implications, but also because it provides a computationally tractable tool for analytically approximating the privacy loss under composition. Figure 1 demonstrates that this tool yields surprisingly accurate approximations to the exact tradeoff in testing the hypotheses (2), and substantially improves on existing privacy guarantees in terms of type I and type II errors. See Section 2.2 and Section 3 for a thorough discussion.

A Primal-Dual Perspective. We show a general duality between $f$-DP and infinite collections of $(\varepsilon, \delta)$-DP guarantees. This duality is useful in two ways. First, it allows one to analyze an algorithm in the framework of $f$-DP and then convert back to an $(\varepsilon, \delta)$-DP guarantee at the end, if desired. More fundamentally, this duality provides an approach to importing techniques developed for $(\varepsilon, \delta)$-DP into the framework of $f$-DP. As an important application, we use this duality to reason simply about privacy amplification by subsampling for $f$-DP, by leveraging existing results for $(\varepsilon, \delta)$-DP. This is in contrast to divergence-based notions of privacy, in which reasoning about amplification by subsampling is difficult.
Taken together, this collection of attractive properties renders $f$-DP a mathematically coherent, computationally efficient, and versatile framework for privacy-preserving data analysis. To demonstrate the practical use of this hypothesis testing based framework, we give a substantially sharper analysis of the privacy guarantees of noisy stochastic gradient descent, improving on previous special-purpose analyses that reasoned about divergences rather than directly about hypothesis testing [ACG16]. This application is presented in Section 5.
2 $f$-Differential Privacy and Its Basic Properties
In Section 2.1, we give a formal definition of $f$-DP. Section 2.2 introduces Gaussian differential privacy, an important special case of $f$-DP. In Section 2.3, we highlight some appealing properties of this new privacy notion from an information-theoretic perspective. Next, Section 2.4 develops a deep connection between $f$-DP and $(\varepsilon, \delta)$-DP. Finally, we discuss the group privacy properties of $f$-DP.
Before moving on, we first establish several key pieces of notation from the differential privacy literature.

Dataset. A dataset $S$ is a collection of $n$ records, each corresponding to an individual. Formally, we write the dataset as $S = (x_1, \ldots, x_n)$, where each individual $x_i \in X$ for some abstract space $X$. Two datasets $S = (x_1, \ldots, x_n)$ and $S' = (x'_1, \ldots, x'_n)$ are said to be neighbors if they differ in exactly one record; that is, there exists an index $i$ such that $x_j = x'_j$ for all $j \ne i$ and $x_i \ne x'_i$.
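The neighboring relation is straightforward to operationalize. A minimal sketch, ours rather than the paper's (the list-of-records representation is an assumption):

```python
def are_neighbors(s, s_prime):
    """Two equal-length datasets are neighbors iff exactly one record differs."""
    if len(s) != len(s_prime):
        return False
    return sum(x != y for x, y in zip(s, s_prime)) == 1

# One cholesterol record changed: the datasets are neighbors.
print(are_neighbors([170, 180, 165], [170, 182, 165]))  # True
```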

Mechanism. A mechanism $M$ refers to a randomized algorithm that takes as input a dataset $S$ and releases some (randomized) statistics $M(S)$ of the dataset in some abstract space $Y$. For example, a mechanism can release the average salary of the individuals in the dataset plus some random noise.
2.1 Tradeoff Functions and $f$-DP
All variants of differential privacy informally require that it be hard to distinguish any pair of neighboring datasets based on the information released by a private mechanism $M$. From an attacker's perspective, it is natural to formalize this notion of "indistinguishability" as a hypothesis testing problem for two neighboring datasets $S$ and $S'$:
$$H_0: \text{the underlying dataset is } S \quad \text{versus} \quad H_1: \text{the underlying dataset is } S'.$$
The output of the mechanism $M$ serves as the basis for performing this hypothesis test. Denote by $P$ and $Q$ the probability distributions of the mechanism applied to the two datasets, namely $M(S)$ and $M(S')$, respectively. The fundamental difficulty in distinguishing the two hypotheses is best delineated by the optimal tradeoff between the achievable type I and type II errors. More precisely, consider a rejection rule $0 \le \phi \le 1$ (a rejection rule takes as input the released results of the mechanism; we flip a coin and reject the null hypothesis with probability $\phi$), with type I and type II error rates defined as
$$\alpha_\phi = \mathbb{E}_P[\phi], \qquad \beta_\phi = 1 - \mathbb{E}_Q[\phi],$$
respectively. The two errors are well known to satisfy, for example, the constraint
(3)    $\alpha_\phi + \beta_\phi \ge 1 - \mathrm{TV}(P, Q),$
where the total variation distance $\mathrm{TV}(P, Q)$ is the supremum of $|P(A) - Q(A)|$ over all measurable sets $A$. Instead of this rough constraint, we seek to characterize the fine-grained tradeoff between the two errors. Explicitly, fixing the type I error at any level, we consider the minimal achievable type II error. This motivates the following definition.

Definition 2.1 (tradeoff function). For any two probability distributions $P$ and $Q$ on the same space, define the tradeoff function $T(P, Q) : [0, 1] \to [0, 1]$ as
$$T(P, Q)(\alpha) = \inf\{\beta_\phi : \alpha_\phi \le \alpha\},$$
where the infimum is taken over all (measurable) rejection rules.
The tradeoff function serves as a clear-cut boundary between the achievable and unachievable regions of type I and type II errors, rendering it a complete characterization of the fundamental difficulty in testing between the two hypotheses. In particular, the greater this function is, the harder it is to distinguish the two distributions. For completeness, we remark that the infimum in Definition 2.1 is achieved by the likelihood ratio test, a fundamental result known as the Neyman–Pearson lemma, which we state in the appendix as Theorem A.1.
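For distributions on a finite set, the tradeoff function of Definition 2.1 can be computed exactly by sweeping over likelihood ratio tests, as the Neyman–Pearson lemma prescribes. The sketch below is our own illustration, not code from the paper; it sorts outcomes by likelihood ratio and traces the lower boundary of achievable $(\alpha, \beta)$ pairs, with randomization between adjacent thresholds giving the linear interpolation.

```python
def tradeoff(P, Q, alpha):
    """Evaluate T(P, Q)(alpha) for finite distributions given as dicts outcome -> prob.

    Optimal tests reject the null first where the evidence for Q is strongest,
    i.e. where dP/dQ is smallest; randomizing between adjacent thresholds
    traces the lower convex boundary of achievable (alpha, beta) pairs.
    """
    outcomes = sorted(set(P) | set(Q),
                      key=lambda o: P.get(o, 0.0) / max(Q.get(o, 0.0), 1e-300))
    vertices, a, b = [(0.0, 1.0)], 0.0, 1.0
    for o in outcomes:
        a += P.get(o, 0.0)   # rejecting o adds to the type I error...
        b -= Q.get(o, 0.0)   # ...and removes from the type II error
        vertices.append((a, b))
    for (a0, b0), (a1, b1) in zip(vertices, vertices[1:]):
        if a0 <= alpha <= a1:
            t = 0.0 if a1 == a0 else (alpha - a0) / (a1 - a0)
            return b0 + t * (b1 - b0)
    return 0.0

# Bernoulli(0.8) under H0 vs Bernoulli(0.3) under H1: rejecting on outcome 0
# gives the vertex (alpha, beta) = (0.2, 0.3).
P, Q = {0: 0.2, 1: 0.8}, {0: 0.7, 1: 0.3}
print(tradeoff(P, Q, 0.2))  # approximately 0.3
```

One can check on examples like this that the resulting curve is convex, nonincreasing, and below $1 - \alpha$, as the characterization below requires.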
A function $f : [0, 1] \to [0, 1]$ is called a tradeoff function if it is equal to $T(P, Q)$ for some distributions $P$ and $Q$. Below we give a necessary and sufficient condition for $f$ to be a tradeoff function. This characterization reveals, for example, that $\max\{f, g\}$ is a tradeoff function if both $f$ and $g$ are tradeoff functions.

Proposition (characterization of tradeoff functions). A function $f : [0, 1] \to [0, 1]$ is a tradeoff function if and only if $f$ is convex, continuous, nonincreasing, and $f(x) \le 1 - x$ for $x \in [0, 1]$. (Convexity itself implies continuity on the open interval $(0, 1)$, and together with monotonicity it implies continuity at $x = 1$; hence the continuity condition only matters at $x = 0$.)
Now, we propose a new generalization of differential privacy built on top of tradeoff functions. Below, we write $f \ge g$ for two functions defined on $[0, 1]$ if $f(\alpha) \ge g(\alpha)$ for all $\alpha \in [0, 1]$, and we abuse notation by identifying $M(S)$ and $M(S')$ with their corresponding probability distributions. Note that if $T(P, Q) \ge T(\tilde{P}, \tilde{Q})$, then in a very strong sense $P$ and $Q$ are harder to distinguish than $\tilde{P}$ and $\tilde{Q}$ at any level of type I error.
Definition 2.2 ($f$-differential privacy). Let $f$ be a tradeoff function. A mechanism $M$ is said to be $f$-differentially private if
$$T\big(M(S), M(S')\big) \ge f$$
for all neighboring datasets $S$ and $S'$.
A graphical illustration of this definition is shown in Figure 2. Letting $P$ and $Q$ be distributions such that $f = T(P, Q)$, this privacy definition amounts to saying that a mechanism is $f$-DP if distinguishing any two neighboring datasets based on the released information is at least as difficult as distinguishing $P$ and $Q$ based on a single draw. In contrast to existing definitions of differential privacy, our new definition is parameterized by a function, as opposed to several real-valued parameters (e.g., $\varepsilon$ and $\delta$). This functional perspective offers a complete characterization of "privacy," thereby avoiding the pitfall of summarizing statistical information too early. This fact is crucial to the development of a composition theorem for $f$-DP in Section 3. Although this completeness comes at the cost of increased complexity, as we will see in Section 2.2, a simple family of tradeoff functions can often closely capture the privacy loss in many scenarios.
Naturally, the definition of $f$-DP is symmetric in the same sense as the neighboring relationship, which by definition is symmetric. Observe that this privacy notion also requires
$$T\big(M(S'), M(S)\big) \ge f$$
for any neighboring pair $S, S'$. Therefore, it is desirable to restrict our attention to "symmetric" tradeoff functions. The following proposition shows that this restriction does not lead to any loss of generality.

Proposition (symmetrization). Let a mechanism $M$ be $f$-DP. Then, $M$ is $\max\{f, f^{-1}\}$-DP, where the inverse function is defined as
(4)    $f^{-1}(\alpha) := \inf\{t \in [0, 1] : f(t) \le \alpha\}$
for $\alpha \in [0, 1]$. (Equation (4) is the standard definition of the left-continuous inverse of a decreasing function. When $f$ is strictly decreasing and bijective as a mapping, (4) corresponds to the inverse function in the ordinary sense. However, this is not true in general.)
Writing $f = T(P, Q)$, we can express the inverse as $f^{-1} = T(Q, P)$, which therefore is also a tradeoff function. As a consequence, $\max\{f, f^{-1}\}$ continues to be a tradeoff function (by the characterization above) and, moreover, is symmetric in the sense that
$$\big(\max\{f, f^{-1}\}\big)^{-1} = \max\{f, f^{-1}\}.$$
Importantly, this symmetrization gives a tighter bound in the privacy definition since $\max\{f, f^{-1}\} \ge f$. In the remainder of the paper, therefore, tradeoff functions will always be assumed to be symmetric unless otherwise specified. We prove the symmetrization proposition in Appendix A.
We conclude this subsection by showing that $f$-DP is a generalization of $(\varepsilon, \delta)$-DP. This foreshadows a deeper connection between $f$-DP and $(\varepsilon, \delta)$-DP that will be discussed in Section 2.4. Denote
(5)    $f_{\varepsilon,\delta}(\alpha) = \max\big\{0,\; 1 - \delta - \mathrm{e}^{\varepsilon}\alpha,\; \mathrm{e}^{-\varepsilon}(1 - \delta - \alpha)\big\}$
for $\varepsilon \ge 0$ and $0 \le \delta \le 1$; this is a tradeoff function. Figure 3 shows the graph of this function and its evident symmetry. The following result is adapted from [WZ10].

Proposition 2.3 ([WZ10]). A mechanism $M$ is $(\varepsilon, \delta)$-DP if and only if $M$ is $f_{\varepsilon,\delta}$-DP.
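The tradeoff function in (5) is simple to evaluate pointwise. A small sketch of ours, not code from the paper:

```python
import math

def f_eps_delta(alpha, eps, delta):
    """The tradeoff function of (eps, delta)-DP, equation (5)."""
    return max(0.0,
               1 - delta - math.exp(eps) * alpha,
               math.exp(-eps) * (1 - delta - alpha))

# With eps = ln 2 and delta = 0, at alpha = 0.25 the steep branch is active:
# 1 - 2 * 0.25 = 0.5.
print(f_eps_delta(0.25, math.log(2), 0.0))  # approximately 0.5
```

At $\alpha = 0$ the function takes the value $1 - \delta$: even with no type I error budget, a $\delta$ fraction of type II error can be shaved off, which is exactly the "failure probability" role $\delta$ plays.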
2.2 Gaussian Differential Privacy
This subsection introduces a parametric family of $f$-DP guarantees, where $f$ is the tradeoff function of two normal distributions. We refer to this specialization as Gaussian differential privacy (GDP). GDP enjoys many desirable properties that lead to its central role in this paper. Among others, we can now precisely define a tradeoff function with a single parameter. To this end, let
$$G_\mu := T\big(\mathcal{N}(0, 1), \mathcal{N}(\mu, 1)\big)$$
for $\mu \ge 0$. An explicit expression for this tradeoff function reads
(6)    $G_\mu(\alpha) = \Phi\big(\Phi^{-1}(1 - \alpha) - \mu\big),$
where $\Phi$ denotes the standard normal CDF. For completeness, we provide a proof of (6) in Appendix A. This tradeoff function is decreasing in $\mu$ in the sense that $G_{\mu_1} \ge G_{\mu_2}$ if $\mu_1 \le \mu_2$. We now define GDP:
Definition 2.4. A mechanism $M$ is said to satisfy $\mu$-Gaussian differential privacy ($\mu$-GDP) if it is $G_\mu$-DP. That is,
$$T\big(M(S), M(S')\big) \ge G_\mu$$
for all neighboring datasets $S$ and $S'$.
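Equation (6) can be evaluated with only the standard library: $\Phi$ via math.erf, and $\Phi^{-1}$ via bisection. This is our own sketch; in practice a library such as SciPy would supply the inverse CDF directly.

```python
import math

def Phi(x):
    """Standard normal CDF."""
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

def Phi_inv(p, lo=-10.0, hi=10.0):
    """Inverse standard normal CDF by bisection; ample accuracy for plotting."""
    for _ in range(100):
        mid = (lo + hi) / 2
        lo, hi = (mid, hi) if Phi(mid) < p else (lo, mid)
    return (lo + hi) / 2

def G(mu, alpha):
    """Tradeoff function of N(0,1) vs N(mu,1), equation (6)."""
    return Phi(Phi_inv(1 - alpha) - mu)

# mu = 0 means the two hypotheses are identical (perfect privacy):
# the tradeoff curve is the line beta = 1 - alpha.
print(round(G(0.0, 0.3), 6))  # 0.7
```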
GDP has several attractive properties. First, this privacy definition is fully described by the single mean parameter $\mu$ of a unit-variance Gaussian distribution, which makes the privacy guarantee easy to describe and interpret. For instance, one can see from the right panel of Figure 3 that a small value of $\mu$ guarantees a reasonable amount of privacy, whereas for large $\mu$ almost nothing is being promised. Second, loosely speaking, GDP plays a role among all hypothesis testing based notions of privacy similar to the role the Gaussian distribution plays among general probability distributions. We formalize this important point by proving central limit theorems for $f$-DP in Section 3, which, roughly speaking, say that $f$-DP converges to GDP under composition in the limit. Lastly, as shown in the remainder of this subsection, GDP precisely characterizes the Gaussian mechanism, one of the most fundamental building blocks of differential privacy.
Consider the problem of privately releasing a univariate statistic $\theta(S)$ of the dataset $S$. Define the sensitivity of $\theta$ as
$$\mathrm{sens}(\theta) = \sup_{S, S'} |\theta(S) - \theta(S')|,$$
where the supremum is over all neighboring datasets. The Gaussian mechanism adds Gaussian noise to the statistic $\theta$ in order to obscure whether $\theta$ is computed on $S$ or $S'$. The following result shows that the Gaussian mechanism, with noise properly scaled to the sensitivity of the statistic, satisfies GDP.
Theorem 2.5. Define the Gaussian mechanism that operates on a statistic $\theta$ as $M(S) = \theta(S) + \xi$, where $\xi \sim \mathcal{N}\big(0, \mathrm{sens}(\theta)^2/\mu^2\big)$. Then, $M$ is $\mu$-GDP.
Proof of Theorem 2.5. Recognizing that $M(S)$ and $M(S')$ are normally distributed with means $\theta(S)$ and $\theta(S')$, respectively, and common variance $\sigma^2 = \mathrm{sens}(\theta)^2/\mu^2$, we get
$$T\big(M(S), M(S')\big) = G_{|\theta(S) - \theta(S')|/\sigma}.$$
By the definition of sensitivity, $|\theta(S) - \theta(S')|/\sigma \le \mathrm{sens}(\theta)/\sigma = \mu$. Therefore, we get
$$T\big(M(S), M(S')\big) \ge G_\mu.$$
This completes the proof. ∎
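In code, Theorem 2.5 amounts to calibrating the noise scale to $\sigma = \mathrm{sens}(\theta)/\mu$. A sketch under assumed inputs (our illustration, not the paper's code): a mean of values known to lie in a bounded range, so that replacing one of the $n$ records moves the statistic by at most $(\text{upper} - \text{lower})/n$.

```python
import random

def gaussian_mechanism_mean(data, lower, upper, mu):
    """Release a mu-GDP estimate of the mean of values known to lie in [lower, upper].

    For the clamped mean of n points, the sensitivity is (upper - lower) / n,
    so Theorem 2.5 calls for additive noise N(0, sens^2 / mu^2).
    """
    n = len(data)
    clamped = [min(max(x, lower), upper) for x in data]
    sens = (upper - lower) / n
    sigma = sens / mu
    return sum(clamped) / n + random.gauss(0.0, sigma), sigma

random.seed(0)
estimate, sigma = gaussian_mechanism_mean([180, 200, 170, 250], 100, 300, mu=0.5)
print(sigma)  # sens = 200 / 4 = 50, so sigma = 50 / 0.5 = 100.0
```

The large noise scale here reflects both the tiny dataset and the strong privacy target; the scale shrinks as $1/n$ for fixed $\mu$.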
As the proof above implies, GDP offers the tightest possible privacy bound on the Gaussian mechanism. More precisely, the Gaussian mechanism in Theorem 2.5 satisfies
(7)    $G_\mu(\alpha) = \inf_{S, S'} T\big(M(S), M(S')\big)(\alpha),$
where the infimum is over all neighboring datasets and is (asymptotically) achieved at two neighboring datasets such that $|\theta(S) - \theta(S')| = \mathrm{sens}(\theta)$, irrespective of the type I error $\alpha$. As such, the characterization by GDP is precise in a pointwise sense. In contrast, for a general mechanism the right-hand side of (7) is not necessarily a convex function of $\alpha$ and, in such a case, is not a tradeoff function according to Definition 2.1. This nice property of the Gaussian mechanism is related to the log-concavity of Gaussian distributions. See Proposition A.3 for a detailed treatment of log-concave distributions.
2.3 Postprocessing and the Informativeness of $f$-DP
Intuitively, a data analyst cannot make a statistical analysis more disclosive merely by processing the output of the mechanism $M$. This is called the postprocessing property, a natural requirement that any notion of privacy, including our definition of $f$-DP, should satisfy.

To formalize this point for $f$-DP, denote by $\mathrm{Proc} : Y \to Z$ a (randomized) algorithm that maps the output of $M$ to some space $Z$, yielding a new mechanism that we denote by $\mathrm{Proc} \circ M$. The following result confirms the postprocessing property of $f$-DP.
Proposition 2.6. If a mechanism $M$ is $f$-DP, then its postprocessing $\mathrm{Proc} \circ M$ is also $f$-DP.

Proposition 2.6 is a consequence of the following lemma. Let $\mathrm{Proc}(P)$ denote the probability distribution of $\mathrm{Proc}(\xi)$ with $\xi$ drawn from $P$. Define $\mathrm{Proc}(Q)$ likewise.

Lemma (postprocessing). For any two distributions $P$ and $Q$, we have
$$T\big(\mathrm{Proc}(P), \mathrm{Proc}(Q)\big) \ge T(P, Q).$$
This lemma means that, from the perspective of tradeoff functions, postprocessed distributions can only become more difficult to tell apart than the original distributions. While the same property holds for many divergence-based measures of indistinguishability, such as the Rényi divergences (see Appendix B for the definition and its relation to tradeoff functions) used by the concentrated differential privacy family of definitions [DR16, BS16, Mir17, BDRS18], a consequence of the following theorem is that tradeoff functions offer the most informative measure among them all. This remarkable converse of Proposition 2.6 is due to Blackwell (see also Theorem 2.5 in [KOV17]).
Theorem 2.7 ([Bla50], Theorem 10). Let $P, Q$ be probability distributions on $Y$ and $P', Q'$ be probability distributions on $Z$. The following two statements are equivalent:

(a) $T(P', Q') \ge T(P, Q)$.

(b) There exists a randomized algorithm $\mathrm{Proc} : Y \to Z$ such that $\mathrm{Proc}(P) = P'$ and $\mathrm{Proc}(Q) = Q'$.
To appreciate the implication of this theorem, we begin by observing that postprocessing induces an order (in general not a partial order) on pairs of distributions, called the Blackwell order (see, e.g., [Rag11]). Specifically, if condition (b) above holds, then we write $(P, Q) \succeq_{\mathrm{Blackwell}} (P', Q')$ and interpret this as "$(P, Q)$ is easier to distinguish than $(P', Q')$ in the Blackwell sense." Similarly, when $T(P, Q) \le T(P', Q')$, we write $(P, Q) \succeq_T (P', Q')$ and interpret this as "$(P, Q)$ is easier to distinguish than $(P', Q')$ in the testing sense." In general, any privacy measure used in defining a privacy notion induces an order on pairs of distributions. Assuming the postprocessing property for the privacy notion, the induced order must be consistent with $\succeq_{\mathrm{Blackwell}}$. Concretely, denote by $R_{\succeq}$ the set of all pairs comparable under an order $\succeq$. As is clear, a privacy notion satisfies the postprocessing property if and only if its induced order $\succeq$ satisfies $R_{\succeq} \supseteq R_{\succeq_{\mathrm{Blackwell}}}$.
Therefore, for any reasonable privacy notion, the set $R_{\succeq}$ must be large enough to contain $R_{\succeq_{\mathrm{Blackwell}}}$. However, it is also desirable that $R_{\succeq}$ not be too large. For example, consider a privacy notion based on a trivial divergence that assigns the value zero to every pair of distributions. The associated set of comparable pairs is the largest possible and, meanwhile, the notion is not informative at all in terms of measuring the indistinguishability of two distributions.

The argument above suggests that moving from the "minimal" order $\succeq_{\mathrm{Blackwell}}$ toward a "maximal" order entails a loss of information. Remarkably, $f$-DP is the most informative differential privacy notion from this perspective, because its induced order coincides with the Blackwell order. In stark contrast, this is not true for the orders induced by other popular privacy notions such as Rényi differential privacy and $(\varepsilon, \delta)$-DP. We prove this claim in Appendix B and further justify the informativeness of $f$-DP by providing general tools that losslessly convert $f$-DP guarantees into divergence-based privacy guarantees.
2.4 A Primal-Dual Perspective
In this subsection, we show that $f$-DP is equivalent to an infinite collection of $(\varepsilon, \delta)$-DP guarantees via the convex conjugate of the tradeoff function. As a consequence, we can view $f$-DP as the primal privacy representation and, accordingly, its dual representation is the collection of $(\varepsilon, \delta)$-DP guarantees. Taking this powerful viewpoint, many results from the large body of work on $(\varepsilon, \delta)$-DP carry over to $f$-DP in a seamless fashion. In particular, this primal-dual perspective is crucial to our analysis of "privacy amplification by subsampling" in Section 4. All proofs are deferred to Appendix A.
First, we present the result that converts a collection of $(\varepsilon, \delta)$-DP guarantees into an $f$-DP guarantee.

Proposition 2.8 (Dual to Primal). Let $I$ be an arbitrary index set such that each $i \in I$ is associated with $\varepsilon_i \in [0, \infty)$ and $\delta_i \in [0, 1]$. A mechanism is $(\varepsilon_i, \delta_i)$-DP for all $i \in I$ if and only if it is $f$-DP with
$$f = \sup_{i \in I} f_{\varepsilon_i, \delta_i}.$$
This proposition follows easily from the equivalence of $(\varepsilon, \delta)$-DP and $f_{\varepsilon,\delta}$-DP (Proposition 2.3). We remark that the function $f$ constructed above remains a symmetric tradeoff function.
The more interesting direction is to convert $f$-DP into a collection of $(\varepsilon, \delta)$-DP guarantees. Recall that the convex conjugate of a function $f$ defined on $(-\infty, \infty)$ is defined as
(8)    $f^*(y) = \sup_{x} \; yx - f(x).$
To define the conjugate of a tradeoff function $f$, we extend its domain by setting $f(x) = +\infty$ for $x < 0$ and $x > 1$. With this adjustment, the supremum in (8) is effectively taken over $0 \le x \le 1$.
Proposition (Primal to Dual). For a symmetric tradeoff function $f$, a mechanism is $f$-DP if and only if it is $\big(\varepsilon, \delta(\varepsilon)\big)$-DP for all $\varepsilon \ge 0$ with $\delta(\varepsilon) = 1 + f^*(-\mathrm{e}^{\varepsilon})$.
For example, taking $f = G_\mu$, the following corollary provides a lossless conversion from GDP to a collection of $(\varepsilon, \delta)$-DP guarantees. This conversion is exact and, therefore, any other $(\varepsilon, \delta)$-DP guarantee derived for the Gaussian mechanism is implied by this corollary. See Figure 4 for an illustration of this result.

Corollary (GDP to DP). A mechanism is $\mu$-GDP if and only if it is $\big(\varepsilon, \delta(\varepsilon)\big)$-DP for all $\varepsilon \ge 0$, where
$$\delta(\varepsilon) = \Phi\Big(-\frac{\varepsilon}{\mu} + \frac{\mu}{2}\Big) - \mathrm{e}^{\varepsilon}\,\Phi\Big(-\frac{\varepsilon}{\mu} - \frac{\mu}{2}\Big).$$
This corollary has appeared earlier in [BW18]. In a related direction, [BBG18] proposed the "privacy profile," which in essence corresponds to an infinite collection of $(\varepsilon, \delta)$ pairs. The notion of privacy profile mainly serves as an analytical tool in [BBG18].
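The GDP-to-DP conversion above is a closed-form expression, so it is one line of code. A quick sketch of $\delta(\varepsilon)$ using only the standard library (our illustration of the corollary's formula):

```python
import math

def Phi(x):
    """Standard normal CDF."""
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

def delta_of_eps(eps, mu):
    """delta(eps) such that mu-GDP is equivalent to (eps, delta(eps))-DP for all eps."""
    return Phi(-eps / mu + mu / 2) - math.exp(eps) * Phi(-eps / mu - mu / 2)

# For mu = 1 at eps = 0: delta = Phi(0.5) - Phi(-0.5), about 0.383.
print(round(delta_of_eps(0.0, 1.0), 3))  # 0.383
```

As expected, $\delta(\varepsilon)$ decreases rapidly in $\varepsilon$: a single GDP guarantee packages a whole curve of $(\varepsilon, \delta)$ trade-offs.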
The primal-dual perspective provides a useful bridge between the two privacy definitions. In some cases, it is easier to work with $f$-DP, leveraging the interpretability and informativeness of tradeoff functions, as seen in the development of composition theorems for $f$-DP in Section 3. Meanwhile, $(\varepsilon, \delta)$-DP is more convenient to work with in cases where the lower complexity of two parameters is helpful, for example, in the proof of the privacy amplification by subsampling theorem for $f$-DP. In short, our approach in Section 4 is to first work in the dual world and use existing subsampling theorems for $(\varepsilon, \delta)$-DP, and then convert the results back to $f$-DP using a slightly more advanced version of Proposition 2.8.
2.5 Group Privacy
The notion of $f$-DP can be extended to address the privacy of a group of individuals, and a question of interest is to quantify how privacy degrades as the group size grows. To set up the notation, we say that two datasets $S$ and $S'$ are $k$-neighbors (where $k \ge 1$ is an integer) if there exist datasets $S = S_0, S_1, \ldots, S_k = S'$ such that $S_i$ and $S_{i+1}$ are neighboring or identical for all $i = 0, 1, \ldots, k - 1$. Equivalently, $S$ and $S'$ are $k$-neighbors if they differ in at most $k$ individuals. Accordingly, a mechanism $M$ is said to be $f$-DP for groups of size $k$ if
$$T\big(M(S), M(S')\big) \ge f$$
for all $k$-neighbors $S$ and $S'$.
In the following theorem, we use $h^{\circ k}$ to denote the $k$-fold iterative composition of a function $h$. For example, $h^{\circ 1} = h$ and $h^{\circ 2} = h \circ h$.

Theorem (group privacy). If a mechanism is $f$-DP, then it is $\big[1 - (1 - f)^{\circ k}\big]$-DP for groups of size $k$. In particular, if a mechanism is $\mu$-GDP, then it is $k\mu$-GDP for groups of size $k$.

For completeness, $1 - (1 - f)^{\circ k}$ is a tradeoff function and, moreover, remains symmetric if $f$ is symmetric. These two facts and the theorem above are proved in Appendix A. As revealed in the proof, the privacy bound in general cannot be improved, thereby showing that the group privacy operation in the $f$-DP framework is closed and tight. In addition, it is easy to see that $1 - (1 - f)^{\circ k} \le f$ by recognizing that a tradeoff function satisfies $f(x) \le 1 - x$, so that $1 - f(x) \ge x$. This is consistent with the intuition that detecting changes in a group of individuals becomes easier as the group size increases.
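The group-privacy bound $1 - (1 - f)^{\circ k}$ is easy to compute numerically by iterating $x \mapsto 1 - f(x)$. The sketch below is our own illustration: for pure $\varepsilon$-DP at a small type I error, the iterated bound coincides with the classical $k\varepsilon$ group guarantee, and it never exceeds $f$ itself.

```python
import math

def f_eps(alpha, eps):
    """Tradeoff function of eps-DP, i.e. equation (5) with delta = 0."""
    return max(0.0, 1 - math.exp(eps) * alpha, math.exp(-eps) * (1 - alpha))

def group_bound(alpha, eps, k):
    """Evaluate [1 - (1 - f_eps)^{circ k}](alpha) by iterating x -> 1 - f_eps(x)."""
    x = alpha
    for _ in range(k):
        x = 1 - f_eps(x, eps)
    return 1 - x

# At small alpha the group bound agrees with the k*eps-DP tradeoff function.
eps, k, alpha = 0.2, 3, 0.001
print(group_bound(alpha, eps, k))  # matches f_eps(alpha, k * eps) here
print(f_eps(alpha, k * eps))
```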
As an interesting consequence of the theorem above, the group privacy of $\varepsilon$-DP in the limit corresponds to the tradeoff function of two Laplace distributions. Recall that the density of $\mathrm{Lap}(\mu, b)$ is $\frac{1}{2b}\,\mathrm{e}^{-|x - \mu|/b}$.

Proposition (group privacy limit). Fix $\mu > 0$ and set $\varepsilon = \mu/k$. As $k \to \infty$, we have
$$1 - (1 - f_{\varepsilon, 0})^{\circ k} \to T\big(\mathrm{Lap}(0, 1), \mathrm{Lap}(\mu, 1)\big).$$
The convergence is uniform over $[0, 1]$.

Two remarks are in order. First, $T(\mathrm{Lap}(0, 1), \mathrm{Lap}(\mu, 1))$ is not equal to $f_{\varepsilon,\delta}$ for any $\varepsilon, \delta$ and, therefore, $(\varepsilon, \delta)$-DP is not expressive enough to measure privacy under the group operation. Second, the approximation in this proposition is accurate even for moderately small $k$. The proof of the proposition is deferred to Appendix A.
3 Composition and Limit Theorems
Imagine that an analyst performs a sequence of analyses on a private dataset, in which each analysis is informed by prior analyses on the same dataset. Provided that every analysis alone is private, the question is whether all analyses collectively are private, and if so, how the privacy guarantee degrades as the number of analyses increases, namely, under composition. It is essential for a notion of privacy to gracefully handle composition; without this, the privacy analysis of complex algorithms would be almost impossible.
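Operationally, such an adaptive sequence of analyses is just a loop in which each mechanism sees the dataset together with all previous outputs. A schematic sketch of ours (the two placeholder mechanisms and their noise scales are assumptions for illustration only, with no calibrated privacy guarantee):

```python
import random

def compose(mechanisms, dataset):
    """Run mechanisms adaptively: each one sees the dataset and all prior outputs."""
    outputs = []
    for mech in mechanisms:
        outputs.append(mech(dataset, tuple(outputs)))
    return tuple(outputs)

# Placeholder mechanisms: a noisy mean, then a query chosen based on the first output.
random.seed(1)
m1 = lambda s, prev: sum(s) / len(s) + random.gauss(0, 1.0)
m2 = lambda s, prev: (max(s) + random.gauss(0, 1.0)) if prev[0] > 0 else min(s)
result = compose([m1, m2], [1.0, 2.0, 3.0])
print(len(result))  # 2
```

The privacy question of this section is exactly how the guarantees of the individual `mech` calls combine into a guarantee for `result` as a whole.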
Now, we describe the composition of two mechanisms. For simplicity, in this section we write $X$ for the space of datasets and abuse notation by using $n$ to refer to the number of mechanisms in the composition (as will become clear, this use of $n$ is consistent with the literature on central limit theorems). Let $M_1 : X \to Y_1$ be the first mechanism and $M_2 : X \times Y_1 \to Y_2$ be the second mechanism. In brief, $M_2$ takes as input the output of the first mechanism in addition to the dataset. With the two mechanisms in place, the joint mechanism $M : X \to Y_1 \times Y_2$ is defined as
(9)    $M(S) = \big(y_1, M_2(S, y_1)\big),$
where $y_1 = M_1(S)$. (Alternatively, we can write $M(S) = (M_1(S), M_2(S, M_1(S)))$, in which case it is necessary to specify that $M_1$ should be run only once in this expression.) Roughly speaking, the distribution of $M(S)$ is constructed from the marginal distribution of $M_1(S)$ on $Y_1$ and the conditional distribution of $M_2(S, y_1)$ on $Y_2$ given $M_1(S) = y_1$. The composition of more than two mechanisms follows recursively. In general, given a sequence of mechanisms $M_i : X \times Y_1 \times \cdots \times Y_{i-1} \to Y_i$ for $i = 1, \ldots, n$, we can recursively define their composition $M : X \to Y_1 \times \cdots \times Y_n$.
Put differently, $M(S)$ can be interpreted as the trajectory of a Markov chain whose initial distribution is that of $M_1(S)$ and whose transition kernel at the $i$-th step is given by $M_i(S, \cdot)$. Using the language above, the goal of this section is to relate the privacy loss of $M$ to that of its building blocks $M_1, \ldots, M_n$ in the $f$-DP framework. In short, Section 3.1 develops a general composition theorem for $f$-DP. In Section 3.2, we identify a central limit theorem phenomenon of composition in the $f$-DP framework, which can be used as an approximation tool, just as we use the central limit theorem for sums of random variables. This approximation is extended to and improved for $(\varepsilon,\delta)$-DP in Section 3.3.
3.1 A General Composition Theorem
The main thrust of this subsection is to demonstrate that composition is closed and tight in the $f$-DP framework (Section 2.5 shows that $f$-DP is "closed and tight" in a similar sense, in terms of the guarantees of group privacy). This result is formally stated in Theorem 3.2, which shows that the composed mechanism remains $f$-DP, with the overall trade-off function taking the form of a certain product. To define the product, consider two trade-off functions $f$ and $g$ given as $f = T(P, Q)$ and $g = T(P', Q')$ for some probability distributions $P, Q, P', Q'$.
Definition 3.1.
The tensor product of two trade-off functions $f = T(P, Q)$ and $g = T(P', Q')$ is defined as
$$f \otimes g := T(P \times P', Q \times Q').$$
Throughout the paper, write $f_1 \otimes f_2 \otimes \cdots \otimes f_n$ for the iterated product, and denote by $f^{\otimes n}$ the $n$-fold tensor product of $f$ with itself. The well-definedness of $f_1 \otimes \cdots \otimes f_n$ rests on the associativity of the tensor product, which we will soon illustrate.
By definition, $f \otimes g$ is also a trade-off function. Nevertheless, it remains to be shown that the tensor product is well-defined: that is, the definition is independent of the choice of distributions used to represent a trade-off function. More precisely, assuming $f = T(\tilde P, \tilde Q)$ and $g = T(\tilde P', \tilde Q')$ for some other distributions $\tilde P, \tilde Q, \tilde P', \tilde Q'$, we need to ensure that $T(P \times P', Q \times Q') = T(\tilde P \times \tilde P', \tilde Q \times \tilde Q')$.
We defer the proof of this intuitive fact to Appendix C. Below we list some other useful properties of the tensor product of trade-off functions, whose proofs are given in Appendix D (these properties make the class of trade-off functions a commutative monoid; informally, a monoid is a group without the inverse operator).

1. The tensor product $\otimes$ is commutative and associative.
2. If $g_1 \geqslant g_2$, then $f \otimes g_1 \geqslant f \otimes g_2$.
3. $f \otimes \mathrm{Id} = f$, where $\mathrm{Id}(x) = 1 - x$ for $0 \leqslant x \leqslant 1$ is the identity trade-off function.
4. $(f \otimes g)^{-1} = f^{-1} \otimes g^{-1}$. See the definition of the inverse in (4).
Note that $\mathrm{Id} = T(P, P)$ is the trade-off function of two identical distributions. Property 4 implies that when $f$ and $g$ are symmetric trade-off functions, their tensor product $f \otimes g$ is also symmetric.
Now we state the main theorem of this subsection. Its proof is given in Appendix C.
Theorem 3.2.
Let $M_i(\cdot\,, y_1, \ldots, y_{i-1})$ be $f_i$-DP for all $(y_1, \ldots, y_{i-1}) \in Y_1 \times \cdots \times Y_{i-1}$. Then the $n$-fold composed mechanism $M : X \to Y_1 \times \cdots \times Y_n$ is $f_1 \otimes f_2 \otimes \cdots \otimes f_n$-DP.
This theorem shows that the composition of $f_i$-DP mechanisms remains $f$-DP or, put differently, composition is closed in the $f$-DP framework. Moreover, the privacy bound $f_1 \otimes \cdots \otimes f_n$ in Theorem 3.2 is tight in the sense that it cannot be improved in general. To see this point, consider the case where the second mechanism completely ignores the output of the first mechanism. In that case, the composition obeys
$$T\bigl(M(S), M(S')\bigr) = T\bigl(M_1(S), M_1(S')\bigr) \otimes T\bigl(M_2(S), M_2(S')\bigr).$$
Next, taking neighboring datasets $S, S'$ such that $T(M_1(S), M_1(S')) = f_1$ and $T(M_2(S), M_2(S')) = f_2$, one concludes that $f_1 \otimes f_2$ is the tightest possible bound on the two-fold composition. For comparison, the advanced composition theorem for $(\varepsilon,\delta)$-DP does not admit a single pair of optimal parameters [DRV10]. In particular, no pair of parameters $(\varepsilon, \delta)$ can exactly capture the privacy of the composition of $(\varepsilon,\delta)$-DP mechanisms. See Section 3.3 and Figure 5 for more elaboration.
In the case of GDP, composition enjoys a simple and convenient formulation due to the identity
$$G_{\mu_1} \otimes G_{\mu_2} \otimes \cdots \otimes G_{\mu_n} = G_{\mu},$$
where $\mu = \sqrt{\mu_1^2 + \cdots + \mu_n^2}$. This formula is due to the rotational invariance of Gaussian distributions with identity covariance. We provide the proof in Appendix D. The following corollary formally summarizes this finding.
Corollary 3.3.
The $n$-fold composition of $\mu_i$-GDP mechanisms is $\sqrt{\mu_1^2 + \cdots + \mu_n^2}$-GDP.
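Corollary 3.3 is straightforward to use in computations. The sketch below (function names are mine) composes GDP guarantees and evaluates the trade-off function $G_\mu(\alpha) = \Phi\bigl(\Phi^{-1}(1-\alpha) - \mu\bigr)$:

```python
import numpy as np
from scipy.stats import norm

def gdp_tradeoff(alpha, mu):
    # G_mu(alpha) = Phi(Phi^{-1}(1 - alpha) - mu)
    return norm.cdf(norm.ppf(1.0 - np.asarray(alpha, dtype=float)) - mu)

def compose_gdp(mus):
    # Corollary 3.3: composing mu_i-GDP mechanisms yields
    # sqrt(mu_1^2 + ... + mu_n^2)-GDP
    return float(np.sqrt(np.sum(np.square(mus))))

# Four 1-GDP mechanisms compose to 2-GDP, since sqrt(4 * 1^2) = 2
mu_total = compose_gdp([1.0, 1.0, 1.0, 1.0])
```

Because the composed parameter is a simple Euclidean norm, an analyst can budget the per-step parameters $\mu_i$ in advance to hit a target overall guarantee.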
On a related note, the pioneering work [KOV17] was the first to take the hypothesis testing viewpoint in the study of privacy composition and to use Blackwell's theorem as an analytic tool therein. In particular, the authors offered a composition theorem for $(\varepsilon,\delta)$-DP that improves on the advanced composition theorem [DRV10]. Following this work, [MV16] provided a self-contained proof by essentially proving the special case of Blackwell's theorem needed in this setting. In contrast, our new proof of Theorem 3.2 makes use only of the Neyman–Pearson lemma, thereby circumventing the heavy machinery of Blackwell's theorem. This simpler proof better illuminates the essence of the composition theorem.
3.2 Central Limit Theorems for Composition
In this subsection, we identify a central limit theorem type phenomenon of composition in the $f$-DP framework. Our main results, roughly speaking, show that trade-off functions corresponding to small privacy leakage accumulate to $G_\mu$ for some $\mu > 0$ under composition. Equivalently, the privacy of the composition of many "very private" mechanisms is best measured by GDP in the limit. This identifies GDP as the focal privacy definition among the family of $f$-DP guarantees, including $(\varepsilon,\delta)$-DP. More precisely, all privacy definitions that are based on a hypothesis testing formulation of "indistinguishability" converge to the guarantees of GDP in the limit of composition. We remark that [SMM18] proved a conceptually related central limit theorem for random variables corresponding to the privacy loss. Their theorem is used to reason about the non-adaptive composition of $(\varepsilon,\delta)$-DP mechanisms. In contrast, our central limit theorem is concerned with the optimal hypothesis testing trade-off functions under composition. Moreover, our theorem applies to adaptive composition, where each mechanism is informed by prior interactions with the same database.
From a computational viewpoint, these limit theorems yield an efficient method of approximating the composition of general $f$-DP mechanisms. This is appealing for analyzing the privacy properties of algorithms built from many components applied in sequence. For comparison, the exact computation of privacy guarantees under composition can be computationally hard [MV16] and, thus, tractable approximations are important. Using our central limit theorems, the computation of the overall privacy guarantee in Theorem 3.2 can be reduced to the evaluation of a single mean parameter in a GDP guarantee. We give an exemplary application of this powerful technique in Section 5.
Explicitly, the mean parameter in the approximation depends on certain functionals of the trade-off functions (although a trade-off function $f$ satisfies $f'(x) \leqslant 0$ almost everywhere on $[0,1]$, we prefer to use $|f'(x)|$ instead of $-f'(x)$ for aesthetic reasons):
$$\mathrm{kl}(f) := -\int_0^1 \log |f'(x)| \,\mathrm{d}x, \qquad \kappa_2(f) := \int_0^1 \log^2 |f'(x)| \,\mathrm{d}x,$$
$$\kappa_3(f) := \int_0^1 \bigl|\log |f'(x)|\bigr|^3 \,\mathrm{d}x, \qquad \bar\kappa_3(f) := \int_0^1 \bigl|\log |f'(x)| + \mathrm{kl}(f)\bigr|^3 \,\mathrm{d}x.$$
All of these functionals take values in $[0, \infty]$, and the last one is defined for $f$ such that $\mathrm{kl}(f) < \infty$. In essence, these functionals are calculating moments of the log-likelihood ratio of $P$ and $Q$ such that $f = T(P, Q)$. In particular, all of these functionals are 0 if $f = \mathrm{Id}$, which corresponds to zero privacy leakage. As its name suggests, $\mathrm{kl}(f)$ is the Kullback–Leibler (KL) divergence of $P$ and $Q$ and, therefore, $\mathrm{kl}(f) \geqslant 0$. Detailed elaboration on these functionals is deferred to Appendix D.
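For $f = G_\mu$ these functionals admit closed forms: a change of variables gives $\log|f'(x)| = \mu\,\Phi^{-1}(1-x) - \mu^2/2$, so $\mathrm{kl}(G_\mu) = \mu^2/2$ (the familiar Gaussian KL divergence) and $\kappa_2(G_\mu) = \mu^2 + \mu^4/4$. The sketch below checks the first two functionals numerically by the midpoint rule; the closed-form slope and the tolerances are my own working, not formulas quoted from the paper:

```python
import numpy as np
from scipy.stats import norm

def log_slope_gdp(x, mu):
    # log |f'(x)| for f = G_mu equals mu * Phi^{-1}(1 - x) - mu^2 / 2
    return mu * norm.ppf(1.0 - x) - mu**2 / 2.0

def kl_functional(mu, n_grid=200_000):
    # kl(f) = -int_0^1 log|f'(x)| dx, evaluated by the midpoint rule
    x = (np.arange(n_grid) + 0.5) / n_grid
    return -float(np.mean(log_slope_gdp(x, mu)))

def kappa2_functional(mu, n_grid=200_000):
    # kappa_2(f) = int_0^1 log^2|f'(x)| dx, evaluated by the midpoint rule
    x = (np.arange(n_grid) + 0.5) / n_grid
    return float(np.mean(log_slope_gdp(x, mu) ** 2))

assert abs(kl_functional(1.0) - 0.5) < 1e-4        # kl(G_1) = 1/2
assert abs(kappa2_functional(1.0) - 1.25) < 1e-2   # kappa_2(G_1) = 5/4
```

The midpoint grid is symmetric about $x = 1/2$, so the odd part of the integrand cancels exactly and only the tail truncation contributes error.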
In the following theorem, $\mathrm{kl}$ denotes the vector
$$\mathrm{kl} := \bigl(\mathrm{kl}(f_1), \ldots, \mathrm{kl}(f_n)\bigr),$$
and $\kappa_2, \kappa_3, \bar\kappa_3$ are defined similarly; in addition, $\|\cdot\|_1$ and $\|\cdot\|_2$ denote the $\ell_1$ and $\ell_2$ norms, respectively.

Theorem. Let $f_1, \ldots, f_n$ be symmetric trade-off functions such that $\kappa_3(f_i) < \infty$ for all $1 \leqslant i \leqslant n$. Denote
$$\mu := \frac{2\|\mathrm{kl}\|_1}{\sqrt{\|\kappa_2\|_1 - \|\mathrm{kl}\|_2^2}}, \qquad \gamma := \frac{0.56\,\|\bar\kappa_3\|_1}{\bigl(\|\kappa_2\|_1 - \|\mathrm{kl}\|_2^2\bigr)^{3/2}},$$
and assume $\gamma < \tfrac12$. Then, for all $\alpha \in [\gamma, 1 - \gamma]$, we have (extending $G_\mu$ to be 1 on $(-\infty, 0)$ and 0 on $(1, +\infty)$, the restriction on $\alpha$ can be removed)
(10) $$G_\mu(\alpha + \gamma) - \gamma \;\leqslant\; f_1 \otimes f_2 \otimes \cdots \otimes f_n(\alpha) \;\leqslant\; G_\mu(\alpha - \gamma) + \gamma.$$
Loosely speaking, the lower bound in (10) shows that the composition of $f_i$-DP mechanisms for $1 \leqslant i \leqslant n$ is approximately $\mu$-GDP and, in addition, the upper bound demonstrates that the tightness of this approximation is controlled by $\gamma$. In the case where all $f_i$ are equal to some $f \neq \mathrm{Id}$, the theorem reveals that the composition becomes blatantly non-private as $n \to \infty$ because $\mu \to \infty$. More interesting applications of the theorem, however, are cases where each $f_i$ is close to the "perfect privacy" trade-off function $\mathrm{Id}$, so that collectively $\mu$ is convergent and $\gamma$ vanishes as $n \to \infty$ (see the example in Section 5). For completeness, the condition $\kappa_3(f_i) < \infty$ (which implies that the other three functionals are also finite) excludes the case $f_i(0) < 1$ and, in particular, $(\varepsilon,\delta)$-DP with $\delta > 0$. We introduce an easy and general technique in Section 3.3 to deal with this issue.
From a technical viewpoint, the theorem above can be thought of as a Berry–Esseen-type central limit theorem. Its detailed proof, as well as that of the asymptotic version below, is provided in Appendix D.
Next, we present an asymptotic counterpart for the composition of $f$-DP mechanisms. In analogy with classical central limit theorems, below we consider a triangular array of mechanisms $\{M_{ni} : 1 \leqslant i \leqslant n\}$, where $M_{ni}$ is $f_{ni}$-DP for $1 \leqslant i \leqslant n$.
Theorem. Let $\{f_{ni} : 1 \leqslant i \leqslant n\}$ be a triangular array of symmetric trade-off functions, and assume the following limits for some constants $K \geqslant 0$ and $s > 0$ as $n \to \infty$:
1. $\sum_{i=1}^{n} \mathrm{kl}(f_{ni}) \to K$;
2. $\max_{1 \leqslant i \leqslant n} \mathrm{kl}(f_{ni}) \to 0$;
3. $\sum_{i=1}^{n} \kappa_2(f_{ni}) \to s^2$;
4. $\sum_{i=1}^{n} \kappa_3(f_{ni}) \to 0$.
Then, we have
$$\lim_{n \to \infty} f_{n1} \otimes f_{n2} \otimes \cdots \otimes f_{nn}(\alpha) = G_{2K/s}(\alpha)$$
uniformly for all $\alpha \in [0,1]$.
Taken together, this theorem and Theorem 3.2 amount to saying that the composition of the mechanisms $M_{n1}, \ldots, M_{nn}$ is asymptotically $2K/s$-GDP. In fact, this asymptotic version is a consequence of the Berry–Esseen-type theorem, as one can show $\mu \to 2K/s$ and $\gamma \to 0$ for the triangular array of symmetric trade-off functions. This central limit theorem implies that GDP is the only parameterized family of trade-off functions that can faithfully represent the effects of composition. In contrast, neither $\varepsilon$-DP nor $(\varepsilon,\delta)$-DP can be losslessly tracked under composition: the parameterized families $\{f_{\varepsilon,0}\}$ and $\{f_{\varepsilon,\delta}\}$ cannot represent the trade-off function that results in the limit of composition.
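As a concrete consistency check, take the array $f_{ni} = G_{c/\sqrt{n}}$, for which the exact composition is $G_c$ by Corollary 3.3. Using the closed forms $\mathrm{kl}(G_\mu) = \mu^2/2$ and $\kappa_2(G_\mu) = \mu^2 + \mu^4/4$ (my own computation, not a formula from the paper), the limits are $K = c^2/2$ and $s^2 = c^2$, and the predicted parameter $2K/s$ recovers $c$:

```python
import numpy as np

def kl_gdp(mu):
    # kl(G_mu) = mu^2 / 2, the KL divergence between N(0,1) and N(mu,1)
    return mu**2 / 2.0

def kappa2_gdp(mu):
    # kappa_2(G_mu) = mu^2 + mu^4 / 4
    return mu**2 + mu**4 / 4.0

def clt_limit_parameter(n, c):
    # Triangular array f_ni = G_{c/sqrt(n)}: sum the per-row functionals
    # and form the parameter 2K/s from the asymptotic theorem
    mu_i = c / np.sqrt(n)
    K = n * kl_gdp(mu_i)               # -> c^2 / 2 as n grows
    s = np.sqrt(n * kappa2_gdp(mu_i))  # -> c as n grows
    return 2.0 * K / s

# The prediction approaches the exact composed parameter c as n grows
err_small_n = abs(clt_limit_parameter(100, 1.0) - 1.0)
err_large_n = abs(clt_limit_parameter(1_000_000, 1.0) - 1.0)
```

The finite-$n$ discrepancy comes entirely from the $\mu^4/4$ term in $\kappa_2$, which is negligible under the triangular-array scaling.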
The conditions for the use of this theorem are reminiscent of Lindeberg's condition in the central limit theorem for independent random variables. Under the proper scaling, both $\mathrm{kl}(f_{ni})$ and $\kappa_2(f_{ni})$ are of order $1/n$ for most of the trade-off functions. As a consequence, the cumulative effects of the moment functionals are bounded. Furthermore, as with Lindeberg's condition, the second condition of the theorem requires that no single mechanism make a significant contribution to the composition in the limit.
In passing, we remark that $K$ and $s$ satisfy the relationship $s^2 = 2K$ in all examples of the application of this theorem in this paper, including those in Sections 3.3 and 5.2 as well as their corollaries. As such, the composition is asymptotically $\sqrt{2K}$-GDP. A proof of this interesting observation, or the construction of a counterexample, is left for future work.
3.3 Composition of $(\varepsilon,\delta)$-DP: Beating Berry–Esseen
Now, we extend the central limit theorems to $(\varepsilon,\delta)$-DP. As shown by Proposition 2.3, $(\varepsilon,\delta)$-DP is equivalent to $f_{\varepsilon,\delta}$-DP and, therefore, it suffices to approximate the trade-off function $f_{\varepsilon_1,\delta_1} \otimes \cdots \otimes f_{\varepsilon_n,\delta_n}$ by making use of the composition theorem for $f$-DP mechanisms. As pointed out in Section 3.2, however, the moment conditions required by the two central limit theorems exclude the case where $\delta > 0$.
To overcome the difficulty caused by a nonzero $\delta$, we start by observing the useful fact that
(11) $$f_{\varepsilon,\delta} = f_{\varepsilon,0} \otimes f_{0,\delta}.$$
This decomposition, along with the commutative and associative properties of the tensor product, shows
$$f_{\varepsilon_1,\delta_1} \otimes \cdots \otimes f_{\varepsilon_n,\delta_n} = \bigl(f_{\varepsilon_1,0} \otimes \cdots \otimes f_{\varepsilon_n,0}\bigr) \otimes \bigl(f_{0,\delta_1} \otimes \cdots \otimes f_{0,\delta_n}\bigr).$$
This identity allows us to work on the $\varepsilon$ part and the $\delta$ part separately. In short, the $\varepsilon$ part can now be approximated by $G_\mu$ by invoking the central limit theorems of Section 3.2. For the $\delta$ part, we can iteratively apply the rule
(12) $$f_{0,\delta_1} \otimes f_{0,\delta_2} = f_{0,\,1 - (1-\delta_1)(1-\delta_2)}$$
to obtain $f_{0,\delta_1} \otimes \cdots \otimes f_{0,\delta_n} = f_{0,\,1 - (1-\delta_1)\cdots(1-\delta_n)}$. This rule is best seen via the interesting fact that $f_{0,\delta}$ is the trade-off function of two shifted uniform distributions: $f_{0,\delta} = T\bigl(U(0,1), U(\delta, 1+\delta)\bigr)$. Now, a central limit theorem for $(\varepsilon,\delta)$-DP is just a stone's throw away. In what follows, the privacy parameters $\varepsilon$ and $\delta$ are arranged in a triangular array $\{(\varepsilon_{ni}, \delta_{ni}) : 1 \leqslant i \leqslant n\}$.

Theorem. Assume
$$\sum_{i=1}^{n} \varepsilon_{ni}^2 \to \mu^2, \qquad \max_{1 \leqslant i \leqslant n} \varepsilon_{ni} \to 0, \qquad \sum_{i=1}^{n} \delta_{ni} \to \delta$$
for some nonnegative constants $\mu$ and $\delta$ as $n \to \infty$. Then, we have
$$f_{\varepsilon_{n1},\delta_{n1}} \otimes \cdots \otimes f_{\varepsilon_{nn},\delta_{nn}} \to G_\mu \otimes f_{0,\,1-\mathrm{e}^{-\delta}}$$
uniformly over $[0,1]$ as $n \to \infty$.
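The $\delta$ part of this machinery is easy to verify numerically. Using the closed form $f_{0,\delta}(\alpha) = \max(0, 1 - \delta - \alpha)$, rule (12) says that composing the $\delta$'s multiplies the retained probabilities $1 - \delta_i$; a small sketch (function names mine):

```python
import numpy as np

def f_0_delta(alpha, delta):
    # f_{0,delta}(alpha) = max(0, 1 - delta - alpha), the trade-off
    # function of U(0,1) versus U(delta, 1 + delta)
    return np.maximum(0.0, 1.0 - delta - np.asarray(alpha, dtype=float))

def compose_deltas(deltas):
    # Iterating rule (12): the composed delta is 1 - prod_i (1 - delta_i)
    remaining = 1.0
    for d in deltas:
        remaining *= 1.0 - d
    return 1.0 - remaining

# Ten steps with delta = 0.001 each compose to 1 - 0.999^10, which is
# slightly less than the naive sum 0.01
total_delta = compose_deltas([0.001] * 10)
```

This also illustrates the remark below: $1 - \prod_i (1 - \delta_i)$, not the sum of the $\delta_i$, is the exact composed value.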
Remark 1.
A formal proof is provided in Appendix D. The assumptions concerning $\delta_{ni}$ give rise to $1 - \prod_{i=1}^{n} (1 - \delta_{ni}) \to 1 - \mathrm{e}^{-\delta}$. In general, tensoring with $f_{0,\delta}$ is equivalent to scaling the graph of the trade-off function toward the origin by a factor of $1 - \delta$. This property is specified by the following formula, whose proof we leave to Appendix D: