Gaussian Differential Privacy

05/07/2019 ∙ by Jinshuo Dong, et al. ∙ University of Pennsylvania

Differential privacy has seen remarkable success as a rigorous and practical formalization of data privacy in the past decade. But it also has some well-known weaknesses: notably, it does not tightly handle composition. This weakness has inspired several recent relaxations of differential privacy based on Rényi divergences. We propose an alternative relaxation of differential privacy, which we term "f-differential privacy", that has a number of appealing properties and avoids some of the difficulties associated with divergence based relaxations. First, it preserves the hypothesis testing interpretation of differential privacy, which makes its guarantees easily interpretable. It allows for lossless reasoning about composition and post-processing, and notably, a direct way to import existing tools from differential privacy, including privacy amplification by subsampling. We define a canonical single-parameter family of definitions within our class which we call "Gaussian Differential Privacy", defined based on the hypothesis testing of two shifted Gaussian distributions. We show that this family is focal by proving a central limit theorem, which shows that the privacy guarantees of any hypothesis-testing based definition of privacy (including differential privacy) converge to Gaussian differential privacy in the limit under composition. We also prove a finite (Berry–Esseen style) version of the central limit theorem, which gives a useful tool for tractably analyzing the exact composition of potentially complicated expressions. We demonstrate the use of the tools we develop by giving an improved analysis of the privacy guarantees of noisy stochastic gradient descent.


1 Introduction

Modern statistical analysis and machine learning are overwhelmingly applied to data concerning people. Valuable datasets generated from personal devices and the online behavior of billions of individuals contain data on location, web search histories, media consumption, physical activity, social networks, and more. This is on top of continuing large-scale analysis of traditionally sensitive data records, including those collected by hospitals, schools, and the Census. This reality requires the development of tools to perform large-scale data analysis in a way that still protects the privacy of individuals represented in the data.

Unfortunately, the history of data privacy for many years consisted of ad-hoc attempts at “anonymizing” personal information, followed by high profile de-anonymizations. This includes the release of AOL search logs, de-anonymized by the New York Times [BZ06], the Netflix Challenge dataset, de-anonymized by Narayanan and Shmatikov [NS08], the realization that participants in genome-wide association studies could be identified from aggregate statistics such as minor allele frequencies that were publicly released [HSR08], and the reconstruction of individual-level census records from aggregate statistical releases [Abo18].

Thus, we urgently needed a rigorous and principled privacy-preserving framework to prevent breaches of personal information in data analysis. In this context, differential privacy has put private data analysis on firm theoretical foundations [DMNS06, DKM06]. This definition has become tremendously successful: in addition to an enormous and growing academic literature, it has been adopted as a key privacy technology by Google [EPK14], Apple [App17], Microsoft [DKY17], and the US Census Bureau [Abo18]. The definition of this new concept involves the privacy parameters ε and δ.

Definition 1.1 ([DMNS06, DKM06]).

A randomized algorithm M that takes as input a dataset consisting of n individuals is (ε, δ)-differentially private (DP) if for any pair of datasets S and S′ that differ in the record of a single individual, and any event E,

P[ M(S) ∈ E ] ≤ e^ε · P[ M(S′) ∈ E ] + δ. (1)

When δ = 0, the guarantee is simply called ε-DP.

In this definition, the datasets S and S′ are fixed, and the probabilities are taken only over the randomness of the mechanism (a randomized algorithm is often referred to as a mechanism in the differential privacy literature). In particular, the event E can be any measurable set in the range of M. To achieve differential privacy, a mechanism is necessarily randomized. Take as an example the problem of privately releasing the average cholesterol level of the individuals in the dataset S = (x₁, …, xₙ), where each xᵢ corresponds to an individual (here we identify the individual with his/her cholesterol level). A privacy-preserving mechanism may take the form

M(S) = (x₁ + ⋯ + xₙ)/n + noise.

The level of the noise term has to be sufficiently large to mask the characteristics of any individual's cholesterol level, while not being so large as to distort the population average and compromise accuracy. Consequently, the probability distributions of M(S) and M(S′) are close to each other for any datasets S and S′ that differ in only one individual record.
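For concreteness, the following minimal Python sketch (ours, not from the paper; the noise scale σ is an arbitrary illustrative choice rather than a calibrated one) implements such a noisy release of an average.

    import numpy as np

    def release_average(levels, sigma=5.0, rng=None):
        # Noisy release of the dataset average; sigma is an illustrative,
        # uncalibrated noise scale (calibration is discussed in Section 2.2).
        rng = np.random.default_rng() if rng is None else rng
        return float(np.mean(levels)) + rng.normal(loc=0.0, scale=sigma)

    # Neighboring datasets: exactly one individual's cholesterol level differs.
    S       = [180.0, 200.0, 220.0, 240.0]
    S_prime = [180.0, 200.0, 220.0, 320.0]
    print(release_average(S), release_average(S_prime))

With enough noise, the two printed outputs are draws from two nearby distributions, which is exactly the closeness that Definition 1.1 quantifies.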

Differential privacy is most naturally defined through a hypothesis testing problem from the perspective of an attacker who aims to distinguish S from S′ based on the output of the mechanism. This statistical viewpoint was first observed by [WZ10] and then further developed by [KOV17], which is a direct inspiration for our work. In short, consider the hypothesis testing problem

H₀: the true dataset is S    versus    H₁: the true dataset is S′, (2)

and call Alice the only individual that is in S but not in S′. As such, rejecting the null hypothesis corresponds to detecting the absence of Alice, whereas accepting the null hypothesis corresponds to detecting the presence of Alice in the dataset. Using the output of an (ε, δ)-DP mechanism, the power (that is, 1 minus the type II error) of any test at significance level α ∈ [0, 1] has an upper bound of e^ε α + δ (a more precise bound is given in Proposition 2.3). Indeed, writing R for the rejection region of the test, the definition of (ε, δ)-DP gives

power = P[ M(S′) ∈ R ] ≤ e^ε P[ M(S) ∈ R ] + δ ≤ e^ε α + δ.

This bound is only slightly larger than α provided that ε and δ are small and, therefore, any test is essentially powerless. Put differently, differential privacy with small privacy parameters protects against any inference of the presence of Alice, or any other individual, in the dataset.

Despite its apparent success, there are good reasons to want to relax the original definition of differential privacy, and this has led to a long line of proposals for such relaxations. The most important shortcoming is that (ε, δ)-DP does not tightly handle composition. Composition concerns how privacy guarantees degrade as mechanisms are repeatedly applied to the same dataset; good compositional behavior is what renders the design of differentially private algorithms modular. Without compositional properties, it would be near impossible to develop complex differentially private data analysis methods. Although it has been known since the original papers defining differential privacy [DMNS06, DKM06] that the composition of an (ε₁, δ₁)-DP mechanism and an (ε₂, δ₂)-DP mechanism yields an (ε₁ + ε₂, δ₁ + δ₂)-DP mechanism, the corresponding upper bound e^{ε₁+ε₂} α + δ₁ + δ₂ on the power of any test at significance level α no longer tightly characterizes the trade-off between significance level and power for testing S against S′. In [DRV10], Dwork, Rothblum, and Vadhan gave an improved composition theorem, but it still fails to capture the correct hypothesis testing trade-off. This is for a fundamental reason: (ε, δ)-DP is mis-parameterized in the sense that the guarantees of the composition of (εᵢ, δᵢ)-DP mechanisms cannot be characterized by any single pair of parameters (ε, δ). Worse, given any δ, finding the parameter ε that most tightly approximates the correct trade-off between significance level and type II error for the composition of a sequence of differentially private algorithms is computationally hard [MV16], and so in practice one must resort to approximations. Given that composition and modularity are first-order desiderata for a useful privacy definition, these are substantial drawbacks, and they often continue to push practical algorithms with meaningful privacy guarantees out of reach.

In light of this, substantial recent effort has been devoted to developing relaxations of differential privacy for which composition can be handled exactly. This line of work includes several variants of “concentrated differential privacy” [DR16, BS16], “Rényi differential privacy” [Mir17], and “truncated concentrated differential privacy” [BDRS18]. These definitions are tailored to be able to exactly and easily track the “privacy cost” of compositions of the most basic primitive in differential privacy, which is the perturbation of a real valued statistic with Gaussian noise.

While this direction of privacy relaxation has been quite fruitful, there are still several places where one might wish for improvement. First, these notions of differential privacy no longer have hypothesis testing interpretations, but are rather based on studying divergences that satisfy a certain information processing inequality. There are good reasons to prefer definitions based on hypothesis testing. Most immediately, hypothesis testing based definitions provide an easy way to interpret the guarantees of a privacy definition. More fundamentally, a theorem due to Blackwell (see Theorem 2.7) provides a formal sense in which a tight understanding of the trade-off between type I and type II errors for the problem of distinguishing between M(S) and M(S′) contains only more information than any divergence between the distributions of M(S) and M(S′) (so long as the divergence satisfies the information processing inequality).

Second, certain simple and fundamental primitives associated with differential privacy, most notably privacy amplification by subsampling [KLN11], either fail to apply to the existing relaxations of differential privacy or require a substantially more complex analysis [WBK18]. This is especially problematic when analyzing the privacy guarantees of stochastic gradient descent (arguably the most popular present-day optimization algorithm), as subsampling is inherent to this algorithm. At best, this difficulty can be overcome with heavy technical machinery; for example, it led Abadi et al. [ACG16] to develop the numerical moments accountant method to sidestep the issue.

1.1 Our Contributions

In this work, we introduce a new relaxation of differential privacy that avoids these issues and has other attractive properties. Rather than giving a "divergence" based relaxation of differential privacy, we start afresh from the hypothesis testing interpretation of differential privacy, and obtain a new privacy definition by allowing the full trade-off between type I and type II errors in the simple hypothesis testing problem (2) to be governed by some function f. The functional privacy parameter f is to this new definition what (ε, δ) is to the original definition of differential privacy. Notably, this definition, which we term f-differential privacy (f-DP) and which captures (ε, δ)-DP as a special case, is accompanied by a powerful and elegant toolkit for reasoning about composition. Here, we highlight some of our contributions:

An Algebra for Composition. We show that our privacy definition is closed and tight under composition, which means that the trade-off between type I and type II errors that results from the composition of an f-DP mechanism with a g-DP mechanism can always be exactly described by a certain function h. This function can be expressed via f and g in an algebraic fashion, thereby allowing for lossless reasoning about composition. In contrast, (ε, δ)-DP, like any privacy definition specified by a small number of parameters, artificially restricts what can be tracked under composition. By allowing a function to keep track of the privacy guarantee of the mechanism, our new privacy definition avoids the pitfall of premature summarization (to quote Susan Holmes [Hol19], "premature summarization is the root of all evil in statistics") in intermediate steps and, consequently, yields a comprehensive delineation of the overall privacy guarantee. See more details in Section 3.

A Central Limit Phenomenon. We define a single-parameter family of f-DP guarantees based on the type I and type II error trade-off in distinguishing the standard normal distribution N(0, 1) from N(μ, 1) for μ ≥ 0. This is referred to as Gaussian differential privacy (GDP). In terms of the hypothesis testing interpretation of differential privacy (2), the GDP guarantee can be interpreted as saying that determining whether or not Alice is in the dataset is at least as difficult as telling apart N(0, 1) and N(μ, 1) based on one draw. Moreover, we show that GDP is a "canonical" privacy guarantee in a fundamental sense: for any privacy definition that retains a hypothesis testing interpretation, we prove that the privacy guarantee of composition with an appropriate scaling converges to GDP in the limit. This central limit theorem type of result is remarkable not only because of its profound theoretical implications, but also because it provides a computationally tractable tool for analytically approximating the privacy loss under composition. Figure 1 demonstrates that this tool yields surprisingly accurate approximations to the exact trade-off in testing the hypotheses (2), or substantially improves on the existing privacy guarantee in terms of type I and type II errors. See Section 2.2 and Section 3 for a thorough discussion.

Figure 1: Left: Our central limit theorem based approximation (in blue) is very close to the composition of just a few mechanisms (in red). The tightest possible approximation via an (ε, δ)-DP guarantee (in black) is substantially looser. See Figure 5 for the parameter setup. Right: Privacy analysis of stochastic gradient descent used to train a convolutional neural network on MNIST [LC10]. The f-DP framework yields a privacy guarantee (in red) for this problem that is significantly better than the optimal (ε, δ)-DP guarantee (in black) derived from the moments accountant (MA) method [ACG16]. Put simply, our analysis shows that stochastic gradient descent releases less sensitive information than expected in the literature. See Section 5 for more plots and details.

A Primal-Dual Perspective. We show a general duality between f-DP and infinite collections of (ε, δ)-DP guarantees. This duality is useful in two ways. First, it allows one to analyze an algorithm in the framework of f-DP, and then convert back to an (ε, δ)-DP guarantee at the end, if desired. More fundamentally, this duality provides an approach for importing techniques developed for (ε, δ)-DP into the framework of f-DP. As an important application, we use this duality to show how to reason simply about privacy amplification by subsampling for f-DP, by leveraging existing results for (ε, δ)-DP. This is in contrast to divergence based notions of privacy, for which reasoning about amplification by subsampling is difficult.

Taken together, this collection of attractive properties renders f-DP a mathematically coherent, computationally efficient, and versatile framework for privacy-preserving data analysis. To demonstrate the practical use of this hypothesis testing based framework, we give a substantially sharper analysis of the privacy guarantees of noisy stochastic gradient descent, improving on previous special-purpose analyses that reasoned about divergences rather than directly about hypothesis testing [ACG16]. This application is presented in Section 5.

2 f-Differential Privacy and Its Basic Properties

In Section 2.1, we give a formal definition of f-DP. Section 2.2 introduces Gaussian differential privacy, a special case of f-DP. In Section 2.3, we highlight some appealing properties of this new privacy notion from an information-theoretic perspective. Next, Section 2.4 offers a profound connection between f-DP and (ε, δ)-DP. Finally, we discuss the group privacy properties of f-DP in Section 2.5.

Before moving on, we first establish several key pieces of notation from the differential privacy literature.

  • Dataset. A dataset S is a collection of n records, each corresponding to an individual. Formally, we write the dataset as S = (x₁, …, xₙ), where each individual record xᵢ belongs to some abstract space X. Two datasets S = (x₁, …, xₙ) and S′ = (x′₁, …, x′ₙ) are said to be neighbors if they differ in exactly one record, that is, there exists an index i such that xⱼ = x′ⱼ for all j ≠ i and xᵢ ≠ x′ᵢ.

  • Mechanism. A mechanism M refers to a randomized algorithm that takes as input a dataset S and releases some (randomized) statistics M(S) of the dataset in some abstract space Y. For example, a mechanism can release the average salary of the individuals in the dataset plus some random noise.

2.1 Trade-off Functions and f-DP

All variants of differential privacy informally require that it be hard to distinguish any pair of neighboring datasets based on the information released by a private mechanism M. From an attacker's perspective, it is natural to formalize this notion of "indistinguishability" as a hypothesis testing problem for two neighboring datasets S and S′:

H₀: the true dataset is S    versus    H₁: the true dataset is S′.

The output of the mechanism M serves as the basis for testing the two hypotheses. Denote by P and Q the probability distributions of the mechanism applied to the two datasets, namely M(S) and M(S′), respectively. The fundamental difficulty in distinguishing the two hypotheses is best delineated by the optimal trade-off between the achievable type I and type II errors. More precisely, consider a rejection rule 0 ≤ φ ≤ 1 (a rejection rule takes as input the released output of the mechanism; we flip a coin and reject the null hypothesis with probability φ), with type I and type II error rates defined as

α_φ = E_P[φ]   and   β_φ = 1 − E_Q[φ],

respectively. The two errors are well known to satisfy, for example, the constraint

α_φ + β_φ ≥ 1 − TV(P, Q), (3)

where the total variation distance TV(P, Q) is the supremum of |P(A) − Q(A)| over all measurable sets A. Instead of this rough constraint, we seek to characterize the fine-grained trade-off between the two errors. Explicitly, fixing the type I error at any level, we consider the minimal achievable type II error. This motivates the following definition.

Definition 2.1 (trade-off function).

For any two probability distributions P and Q on the same space, define the trade-off function T(P, Q) : [0, 1] → [0, 1] as

T(P, Q)(α) = inf { β_φ : α_φ ≤ α },

where the infimum is taken over all (measurable) rejection rules φ.

The trade-off function serves as a clear-cut boundary between the achievable and unachievable regions of type I and type II errors, rendering itself a complete characterization of the fundamental difficulty of testing between the two hypotheses. In particular, the greater this function is, the harder it is to distinguish the two distributions. For completeness, we remark that the minimal type II error in the definition above is achieved by the likelihood ratio test, a fundamental result known as the Neyman–Pearson lemma, which we state in the appendix as Theorem A.1.

A function f : [0, 1] → [0, 1] is called a trade-off function if it is equal to T(P, Q) for some distributions P and Q. Below we give a necessary and sufficient condition for f to be a trade-off function. This characterization reveals, for example, that min{f, g} is a trade-off function if both f and g are trade-off functions.

Proposition. A function f : [0, 1] → [0, 1] is a trade-off function if and only if f is convex, continuous, non-increasing, and f(x) ≤ 1 − x for x ∈ [0, 1]. (Convexity itself implies continuity on (0, 1); in addition, convexity and the constraint f(x) ≤ 1 − x imply continuity at 1. Hence, the continuity condition only matters at 0.)

Now, we propose a new generalization of differential privacy built on top of trade-off functions. Below, we write f ≥ g for two functions defined on [0, 1] if f(α) ≥ g(α) for all α ∈ [0, 1], and we abuse notation by identifying M(S) and M(S′) with their corresponding probability distributions. Note that if T(P, Q) ≥ T(P̃, Q̃), then in a very strong sense P and Q are harder to distinguish than P̃ and Q̃ at any level of type I error.

Definition 2.2 (f-differential privacy).

Let f be a trade-off function. A mechanism M is said to be f-differentially private if

T( M(S), M(S′) ) ≥ f

for all neighboring datasets S and S′.

A graphical illustration of this definition is shown in Figure 2. Letting P and Q be distributions such that f = T(P, Q), this privacy definition amounts to saying that a mechanism is f-DP if distinguishing any two neighboring datasets based on the released information is at least as difficult as distinguishing P from Q based on a single draw. In contrast to existing definitions of differential privacy, our new definition is parameterized by a function, as opposed to several real-valued parameters (e.g., ε and δ). This functional perspective offers a complete characterization of "privacy", thereby avoiding the pitfall of summarizing statistical information too early. This fact is crucial to the development of a composition theorem for f-DP in Section 3. Although this completeness comes at the cost of increased complexity, as we will see in Section 2.2, a simple family of trade-off functions can often closely capture the privacy loss in many scenarios.

Figure 2: Three different examples of T(M(S), M(S′)). Only the dashed line corresponds to a trade-off function satisfying f-DP.

Naturally, the definition of f-DP is symmetric in the same sense as the neighboring relationship, which by definition is symmetric. Observe that this privacy notion also requires

T( M(S′), M(S) ) ≥ f

for any neighboring pair S and S′. Therefore, it is desirable to restrict our attention to "symmetric" trade-off functions. The following proposition shows that this restriction does not lead to any loss of generality.

Proposition. Let a mechanism M be f-DP. Then, M is f^S-DP with f^S = max{ f, f⁻¹ }, where the inverse function is defined as

f⁻¹(α) := inf{ t ∈ [0, 1] : f(t) ≤ α } (4)

for α ∈ [0, 1]. (Equation (4) is the standard definition of the left-continuous inverse of a decreasing function. When f is strictly decreasing with f(0) = 1 and f(1) = 0, and hence bijective as a mapping of [0, 1] onto itself, (4) corresponds to the inverse function in the ordinary sense, i.e., f⁻¹(f(α)) = α. However, this is not true in general.)

Writing f = T(P, Q), we can express the inverse as f⁻¹ = T(Q, P), which therefore is also a trade-off function. As a consequence of this, f^S = max{f, f⁻¹} continues to be a trade-off function and, moreover, is symmetric in the sense that (f^S)⁻¹ = f^S. Importantly, this symmetrization gives a tighter bound in the privacy definition since f^S ≥ f. In the remainder of the paper, therefore, trade-off functions will always be assumed to be symmetric unless otherwise specified. We prove this proposition in Appendix A.

We conclude this subsection by showing that f-DP is a generalization of (ε, δ)-DP. This foreshadows a deeper connection between f-DP and (ε, δ)-DP that will be discussed in Section 2.4. Denote

f_{ε,δ}(α) = max{ 0, 1 − δ − e^ε α, e^{−ε}(1 − δ − α) } (5)

for 0 ≤ α ≤ 1, which is a trade-off function. Figure 3 shows the graph of this function and its evident symmetry. The following result is adapted from [WZ10].

Proposition 2.3 ([WZ10]).

A mechanism M is (ε, δ)-DP if and only if M is f_{ε,δ}-DP.
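As a quick illustration, the following Python sketch (ours) evaluates f_{ε,δ} directly from the closed form (5); it is simply the pointwise maximum of 0 and two lines.

    import numpy as np

    def f_eps_delta(alpha, eps, delta):
        # Trade-off function of (eps, delta)-DP from equation (5).
        alpha = np.asarray(alpha, dtype=float)
        return np.maximum.reduce([
            np.zeros_like(alpha),
            1.0 - delta - np.exp(eps) * alpha,
            np.exp(-eps) * (1.0 - delta - alpha),
        ])

    alphas = np.linspace(0.0, 1.0, 5)
    print(f_eps_delta(alphas, eps=1.0, delta=0.05))

By Proposition 2.3, a mechanism is (ε, δ)-DP exactly when its trade-off function lies above this piecewise linear curve for every pair of neighboring datasets.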

Figure 3: Left: f_{ε,δ} is a piecewise linear function and is symmetric with respect to the line y = x. Its two nontrivial segments have slopes −e^ε and −e^{−ε} and intercepts 1 − δ and e^{−ε}(1 − δ). Right: Trade-off functions of unit-variance Gaussian distributions with different means. The case μ = 0.5 is reasonably private, μ = 1 is borderline private, and μ = 3 is basically non-private: an adversary can control the type I and type II errors simultaneously at only 0.07. In the case of μ = 6 (which almost coincides with the axes), the two errors can both be as small as 0.001.

2.2 Gaussian Differential Privacy

This subsection introduces a parametric family of f-DP guarantees, where f is the trade-off function of two normal distributions. We refer to this specialization as Gaussian differential privacy (GDP). GDP enjoys many desirable properties that lead to its central role in this paper. Among others, we can now precisely define the trade-off function with a single parameter. To define this notion, let

G_μ := T( N(0, 1), N(μ, 1) )

for μ ≥ 0. An explicit expression for this trade-off function reads

G_μ(α) = Φ( Φ⁻¹(1 − α) − μ ), (6)

where Φ denotes the standard normal CDF. For completeness, we provide a proof of (6) in Appendix A. This trade-off function is decreasing in μ in the sense that G_μ ≤ G_{μ′} if μ ≥ μ′. We now define GDP:

Definition 2.4.

A mechanism M is said to satisfy μ-Gaussian Differential Privacy (μ-GDP) if it is G_μ-DP. That is,

T( M(S), M(S′) ) ≥ G_μ

for all neighboring datasets S and S′.

GDP has several attractive properties. First, this privacy definition is fully described by the single mean parameter μ of a unit-variance Gaussian distribution, which makes it easy to describe and interpret the privacy guarantees. For instance, one can see from the right panel of Figure 3 that μ = 0.5 guarantees a reasonable amount of privacy, whereas at μ = 6 almost nothing is being promised. Second, loosely speaking, GDP occupies a role among all hypothesis testing based notions of privacy similar to the role that the Gaussian distribution occupies among general probability distributions. We formalize this important point by proving central limit theorems for f-DP in Section 3, which, roughly speaking, say that f-DP converges to GDP under composition in the limit. Lastly, as shown in the remainder of this subsection, GDP precisely characterizes the Gaussian mechanism, one of the most fundamental building blocks of differential privacy.
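The next Python sketch (ours) evaluates G_μ from (6) together with the smallest value at which the type I and type II errors can be made equal; solving α = G_μ(α) gives α = Φ(−μ/2), which recovers the informal reading of the right panel of Figure 3.

    import numpy as np
    from scipy.stats import norm

    def G(alpha, mu):
        # Trade-off function of N(0,1) versus N(mu,1), equation (6).
        return norm.cdf(norm.ppf(1.0 - np.asarray(alpha, dtype=float)) - mu)

    def balanced_error(mu):
        # Smallest achievable common value of the type I and type II errors,
        # obtained by solving alpha = G(alpha, mu).
        return norm.cdf(-mu / 2.0)

    for mu in [0.5, 1.0, 3.0, 6.0]:
        print(mu, balanced_error(mu))   # roughly 0.40, 0.31, 0.07, 0.001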

Consider the problem of privately releasing a univariate statistic θ(S) of the dataset S. Define the sensitivity of θ as

sens(θ) = sup_{S, S′} | θ(S) − θ(S′) |,

where the supremum is over all neighboring datasets S and S′. The Gaussian mechanism adds Gaussian noise to the statistic θ in order to obscure whether θ is computed on S or S′. The following result shows that the Gaussian mechanism with noise properly scaled to the sensitivity of the statistic satisfies GDP.

Theorem 2.5.

Define the Gaussian mechanism that operates on a statistic θ as M(S) = θ(S) + ξ, where ξ ∼ N(0, sens(θ)²/μ²). Then, M is μ-GDP.

Proof of Theorem 2.5.

Recognizing that M(S) = θ(S) + ξ and M(S′) = θ(S′) + ξ are normally distributed with means θ(S) and θ(S′), respectively, and common variance σ² = sens(θ)²/μ², we get

T( M(S), M(S′) ) = G_{ |θ(S) − θ(S′)| / σ }.

By the definition of sensitivity, |θ(S) − θ(S′)| / σ ≤ sens(θ)/σ = μ. Therefore, we get

T( M(S), M(S′) ) = G_{ |θ(S) − θ(S′)| / σ } ≥ G_μ.

This completes the proof. ∎

As implied by the proof above, GDP offers the tightest possible privacy bound for the Gaussian mechanism. More precisely, the Gaussian mechanism in Theorem 2.5 satisfies

G_μ(α) = inf_{S, S′} T( M(S), M(S′) )(α), (7)

where the infimum is over all neighboring datasets and is (asymptotically) achieved by neighboring datasets with |θ(S) − θ(S′)| = sens(θ), irrespective of the type I error α. As such, the characterization by GDP is precise in the pointwise sense. In contrast, for a general mechanism the right-hand side of (7) is not necessarily a convex function of α and, in such a case, is not a trade-off function according to Definition 2.1. This nice property of the Gaussian mechanism is related to the log-concavity of Gaussian distributions. See Proposition A.3 for a detailed treatment of log-concave distributions.
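In code, the calibration in Theorem 2.5 amounts to setting the noise standard deviation to sens(θ)/μ. The sketch below (ours; the statistic and dataset are toy choices) releases a bounded mean under μ-GDP.

    import numpy as np

    def gaussian_mechanism(theta_value, sensitivity, mu, rng=None):
        # Releases theta(S) + N(0, sigma^2) with sigma = sens(theta)/mu,
        # which is mu-GDP by Theorem 2.5.
        rng = np.random.default_rng() if rng is None else rng
        sigma = sensitivity / mu
        return theta_value + rng.normal(scale=sigma)

    # Toy example: theta(S) is the mean of values in [0, 1], so changing one
    # of the n records moves theta by at most 1/n.
    S = np.random.default_rng(0).uniform(size=100)
    n = len(S)
    print(gaussian_mechanism(S.mean(), sensitivity=1.0 / n, mu=0.5))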

2.3 Post-Processing and the Informativeness of f-DP

Intuitively, a data analyst cannot make a statistical analysis more disclosive merely by processing the output of the mechanism M. This is called the post-processing property, a natural requirement that any notion of privacy, including our definition of f-DP, should satisfy.

To formalize this point for f-DP, denote by Proc a (randomized) algorithm that maps the output M(S), which lives in some space Y, to some other space Z, yielding a new mechanism that we denote by Proc ∘ M. The following result confirms the post-processing property of f-DP.

Proposition 2.6.

If a mechanism M is f-DP, then its post-processing Proc ∘ M is also f-DP.

Proposition 2.6 is a consequence of the following lemma. Let Proc(P) denote the probability distribution of Proc(ξ) with ξ drawn from P, and define Proc(Q) likewise.

Lemma. For any two distributions P and Q, we have T( Proc(P), Proc(Q) ) ≥ T(P, Q).

This lemma means that post-processed distributions can only become more difficult to tell apart than the original distributions from the perspective of trade-off functions. While the same property holds for many divergence based measures of indistinguishability, such as the Rényi divergences (see Appendix B for the definition and its relation to trade-off functions) used by the concentrated differential privacy family of definitions [DR16, BS16, Mir17, BDRS18], a consequence of the following theorem is that trade-off functions offer the most informative measure among all of them. This remarkable converse of Proposition 2.6 is due to Blackwell (see also Theorem 2.5 in [KOV17]).

Theorem 2.7 ([Bla50], Theorem 10).

Let P, Q be probability distributions on Y and P′, Q′ be probability distributions on Z. The following two statements are equivalent:

  1. T(P, Q) ≤ T(P′, Q′).

  2. There exists a randomized algorithm Proc : Y → Z such that Proc(P) = P′ and Proc(Q) = Q′.

To appreciate the implication of this theorem, we begin by observing that post-processing induces an order on pairs of distributions (in general not a partial order), which is called the Blackwell order (see, e.g., [Rag11]). Specifically, if condition 2 above holds, we say that (P, Q) is easier to distinguish than (P′, Q′) in the Blackwell sense; similarly, when T(P, Q) ≤ T(P′, Q′), we say that (P, Q) is easier to distinguish than (P′, Q′) in the testing sense. In general, any privacy measure used in defining a privacy notion induces such an order on pairs of distributions, and if the privacy notion is to satisfy the post-processing property, the induced order must be consistent with the Blackwell order. Concretely, a privacy notion satisfies the post-processing property if and only if the set of pairs that are comparable under its induced order contains all pairs that are comparable in the Blackwell sense.

Therefore, for any reasonable privacy notion, this set of comparable pairs must be large enough to contain the Blackwell-comparable pairs. However, it is also undesirable for the set to be too large. For example, consider the privacy notion based on a trivial divergence that assigns the value 0 to every pair of distributions: its set of comparable pairs is the largest possible and, meanwhile, it is not informative at all in terms of measuring the indistinguishability of two distributions.

The argument above suggests that moving from the "minimal" Blackwell order toward a "maximal" order leads to information loss. Remarkably, f-DP is the most informative differential privacy notion from this perspective because, by Theorem 2.7, the testing order it induces comprises exactly the Blackwell-comparable pairs. In stark contrast, this is not true for the orders induced by other popular privacy notions such as Rényi differential privacy and (ε, δ)-DP. We prove this claim in Appendix B and further justify the informativeness of f-DP by providing general tools that can losslessly convert f-DP guarantees into divergence based privacy guarantees.

2.4 A Primal-Dual Perspective

In this subsection, we show that f-DP is equivalent to an infinite collection of (ε, δ)-DP guarantees via the convex conjugate of the trade-off function. As a consequence of this, we can view f-DP as the primal privacy representation and, accordingly, its dual representation is the collection of (ε, δ)-DP guarantees. Taking this powerful viewpoint, many results from the large body of work on (ε, δ)-DP can be carried over to f-DP in a seamless fashion. In particular, this primal-dual perspective is crucial to our analysis of "privacy amplification by subsampling" in Section 4. All proofs are deferred to Appendix A.

First, we present the result that converts a collection of (ε, δ)-DP guarantees into an f-DP guarantee.

Proposition 2.8 (Dual to Primal).

Let I be an arbitrary index set such that each i ∈ I is associated with ε_i ∈ [0, ∞) and δ_i ∈ [0, 1]. A mechanism is (ε_i, δ_i)-DP for all i ∈ I if and only if it is f-DP with

f = sup_{i ∈ I} f_{ε_i, δ_i}.

This proposition follows easily from the equivalence of (ε, δ)-DP and f_{ε,δ}-DP. We remark that the function f constructed above remains a symmetric trade-off function.

The more interesting direction is to convert f-DP into a collection of (ε, δ)-DP guarantees. Recall that the convex conjugate of a function g defined on (−∞, ∞) is defined as

g*(y) = sup_{−∞ < x < ∞} ( yx − g(x) ). (8)

To define the conjugate of a trade-off function f, we extend its domain by setting f(x) = +∞ for x < 0 and x > 1. With this adjustment, the supremum in (8) is effectively taken over 0 ≤ x ≤ 1.

Proposition (Primal to Dual). For a symmetric trade-off function f, a mechanism is f-DP if and only if it is (ε, δ(ε))-DP for all ε ≥ 0 with δ(ε) = 1 + f*(−e^ε).

Figure 4: Each (ε, δ(ε))-DP guarantee corresponds to two supporting linear functions (symmetric to each other) of the trade-off function describing the complete f-DP guarantee. In general, characterizing a privacy guarantee using only a subset of the (ε, δ)-DP guarantees (for example, only those with small ε) would result in information loss.

For example, taking f = G_μ, the following corollary provides a lossless conversion from GDP to a collection of (ε, δ)-DP guarantees. This conversion is exact and, therefore, any other (ε, δ)-DP guarantee derived for the Gaussian mechanism is implied by this corollary. See Figure 4 for an illustration of this result.

Corollary. A mechanism is μ-GDP if and only if it is (ε, δ(ε))-DP for all ε ≥ 0, where

δ(ε) = Φ( −ε/μ + μ/2 ) − e^ε Φ( −ε/μ − μ/2 ).

This corollary has appeared earlier in [BW18]. Along this direction, [BBG18] further proposed the "privacy profile", which in essence corresponds to an infinite collection of (ε, δ) pairs. The notion of privacy profile mainly serves as an analytical tool in [BBG18].
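The conversion in the corollary is easy to evaluate numerically; the following sketch (ours) tabulates δ(ε) for a μ-GDP guarantee.

    import numpy as np
    from scipy.stats import norm

    def delta_of_eps(eps, mu):
        # delta(eps) from the corollary above: the smallest delta for which a
        # mu-GDP mechanism is (eps, delta)-DP.
        return norm.cdf(-eps / mu + mu / 2.0) - np.exp(eps) * norm.cdf(-eps / mu - mu / 2.0)

    for eps in [0.5, 1.0, 2.0, 3.0]:
        print(eps, delta_of_eps(eps, mu=1.0))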

The primal-dual perspective provides a useful tool for bridging the two privacy definitions. In some cases, it is easier to work with f-DP by leveraging the interpretability and informativeness of trade-off functions, as seen in the development of the composition theorems for f-DP in Section 3. Meanwhile, (ε, δ)-DP is more convenient to work with in cases where the lower complexity of two parameters is helpful, for example, in the proof of the privacy amplification by subsampling theorem for f-DP. In short, our approach in Section 4 is to first work in the dual world and use existing subsampling theorems for (ε, δ)-DP, and then convert the results back to f-DP using a slightly more advanced version of Proposition 2.8.

2.5 Group Privacy

The notion of f-DP can be extended to address the privacy of a group of individuals, and a question of interest is to quantify how privacy degrades as the group size grows. To set up the notation, we say that two datasets S and S′ are k-neighbors (where k ≥ 1 is an integer) if there exist datasets S = S₀, S₁, …, S_k = S′ such that S_i and S_{i+1} are neighboring or identical for all i = 0, 1, …, k − 1. Equivalently, S and S′ are k-neighbors if they differ in at most k individual records. Accordingly, a mechanism M is said to be f-DP for groups of size k if

T( M(S), M(S′) ) ≥ f

for all k-neighbors S and S′.

In the following theorem, we use f^{∘k} to denote the k-fold iterative composition of a function f; for example, f^{∘1}(x) = f(x) and f^{∘2}(x) = f(f(x)).

Theorem. If a mechanism is f-DP, then it is [1 − (1 − f)^{∘k}]-DP for groups of size k. In particular, if a mechanism is μ-GDP, then it is kμ-GDP for groups of size k.

For completeness, 1 − (1 − f)^{∘k} is a trade-off function and, moreover, remains symmetric if f is symmetric. These two facts, along with the theorem above, are proved in Appendix A. As revealed in the proof, the privacy bound 1 − (1 − f)^{∘k} in general cannot be improved, thereby showing that the group operation in the f-DP framework is closed and tight. In addition, it is easy to see that 1 − (1 − f)^{∘k} ≤ f by recognizing that any trade-off function satisfies 1 − f(x) ≥ x. This is consistent with the intuition that detecting changes in groups of individuals becomes easier as the group size increases.

As an interesting consequence of the group privacy bound above, the group privacy of ε-DP in the limit corresponds to the trade-off function of two Laplace distributions. Recall that the density of Lap(μ, b) is (2b)⁻¹ e^{−|x−μ|/b}.

Proposition. Fix μ > 0 and set ε = μ/k. As k → ∞, we have

1 − (1 − f_{ε,0})^{∘k} → T( Lap(0, 1), Lap(μ, 1) ).

The convergence is uniform over [0, 1].

Two remarks are in order. First, T(Lap(0,1), Lap(μ,1)) is not equal to f_{ε,δ} for any ε and δ and, therefore, (ε, δ)-DP is not expressive enough to measure privacy under the group operation. Second, the approximation given by this proposition is very accurate even for small k: the pre-limit trade-off function 1 − (1 − f_{μ/k,0})^{∘k} is already uniformly close to its Laplace limit over [0, 1] for moderate values of k. The proof of this proposition is deferred to Appendix A.
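A numerical sketch of this convergence (ours; it takes the statement of the proposition as given) compares the group privacy bound with the Laplace trade-off function, the latter computed via the monotone likelihood-ratio test.

    import numpy as np
    from scipy.stats import laplace

    def f_eps0(x, eps):
        # Trade-off function of pure eps-DP, i.e. f_{eps,0} from equation (5).
        x = np.asarray(x, dtype=float)
        return np.maximum.reduce([np.zeros_like(x),
                                  1.0 - np.exp(eps) * x,
                                  np.exp(-eps) * (1.0 - x)])

    def group_privacy_bound(alpha, eps, k):
        # 1 - (1 - f_{eps,0})^{compose k}: the bound for groups of size k.
        y = np.asarray(alpha, dtype=float)
        for _ in range(k):
            y = 1.0 - f_eps0(y, eps)      # apply x -> 1 - f(x) once
        return 1.0 - y

    def laplace_tradeoff(alpha, mu):
        # T(Lap(0,1), Lap(mu,1)): threshold the observation (monotone LR).
        t = laplace.ppf(1.0 - np.asarray(alpha, dtype=float))
        return laplace.cdf(t, loc=mu)

    alphas = np.linspace(0.05, 0.95, 5)
    mu, k = 2.0, 100
    print(group_privacy_bound(alphas, eps=mu / k, k=k))
    print(laplace_tradeoff(alphas, mu))

The two printed rows should be close, illustrating the uniform convergence claimed above.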

3 Composition and Limit Theorems

Imagine that an analyst performs a sequence of analyses on a private dataset, in which each analysis is informed by prior analyses on the same dataset. Provided that every analysis alone is private, the question is whether all analyses collectively are private, and if so, how the privacy degrades as the number of analyses increases, namely under composition. It is essential for a notion of privacy to gracefully handle composition, without which the privacy analysis of complex algorithms would be almost impossible.

Now, we describe the composition of two mechanisms. For simplicity, in this section we abuse notation by using n to refer to the number of mechanisms in the composition (as will be clear later, this use of n is consistent with the literature on central limit theorems). Let M₁ be the first mechanism, which takes a dataset S as input and returns an output in a space Y₁, and let M₂ be the second mechanism, which returns an output in a space Y₂; in brief, M₂ takes as input the output of the first mechanism in addition to the dataset. With the two mechanisms in place, the joint mechanism M is defined as

M(S) = ( y₁, M₂(S, y₁) ), (9)

where y₁ = M₁(S). (Alternatively, we can write M(S) = (M₁(S), M₂(S, M₁(S))), in which case it is necessary to specify that M₁ should be run only once in this expression.) Roughly speaking, the distribution of M(S) is constructed from the marginal distribution of M₁(S) on Y₁ and the conditional distribution of M₂(S, y₁) on Y₂ given y₁ = M₁(S). The composition of more than two mechanisms follows recursively. In general, given a sequence of mechanisms M_i for i = 1, …, n, where M_i takes as input the dataset S along with the outputs y₁, …, y_{i−1} of the previous mechanisms and returns an output y_i ∈ Y_i, we can recursively define the joint mechanism

M(S) = ( y₁, y₂, …, y_n )

as their composition. Put differently, M(S) can be interpreted as the trajectory of a Markov chain whose initial distribution is given by M₁(S) and whose transition kernel at the i-th step is M_i(S, y₁, …, y_{i−1}).

Using the language above, the goal of this section is to relate the privacy loss of the composition M to that of the individual mechanisms M₁, …, M_n in the f-DP framework. In short, Section 3.1 develops a general composition theorem for f-DP. In Section 3.2, we identify a central limit theorem phenomenon of composition in the f-DP framework, which can be used as an approximation tool, just as the central limit theorem is used for random variables. This approximation is extended to, and improved for, (ε, δ)-DP in Section 3.3.

3.1 A General Composition Theorem

The main thrust of this subsection is to demonstrate that the composition of private mechanisms is closed and tight in the f-DP framework (Section 2.5 shows that f-DP is "closed and tight" in a similar sense in terms of the guarantees of group privacy). This result is formally stated in Theorem 3.2, which shows that the composed mechanism remains f-DP, with the trade-off function taking the form of a certain product. To define this product, consider two trade-off functions f and g that are given as f = T(P, Q) and g = T(P′, Q′) for some probability distributions P, Q, P′, and Q′.

Definition 3.1.

The tensor product of two trade-off functions f = T(P, Q) and g = T(P′, Q′) is defined as

f ⊗ g := T( P × P′, Q × Q′ ).

Throughout the paper, we write f ⊗ g ⊗ h for (f ⊗ g) ⊗ h, and denote by f^{⊗n} the n-fold tensor product of f. The well-definedness of such expressions rests on the associativity of the tensor product, which we will soon establish.

By definition, f ⊗ g is also a trade-off function. Nevertheless, it remains to be shown that the tensor product is well-defined, that is, that the definition is independent of the choice of distributions used to represent a trade-off function. More precisely, assuming f = T(P̃, Q̃) for some other distributions P̃ and Q̃, we need to ensure that

T( P × P′, Q × Q′ ) = T( P̃ × P′, Q̃ × Q′ ).

We defer the proof of this intuitive fact to Appendix C. Below we list some other useful properties of the tensor product of trade-off functions (these properties make the class of trade-off functions a commutative monoid; informally, a monoid is a group without the inverse operator). The proofs are placed in Appendix D.

  1. The tensor product ⊗ is commutative and associative.

  2. If g₁ ≥ g₂, then f ⊗ g₁ ≥ f ⊗ g₂.

  3. f ⊗ Id = Id ⊗ f = f, where the identity trade-off function is Id(x) = 1 − x for 0 ≤ x ≤ 1.

  4. (f ⊗ g)⁻¹ = f⁻¹ ⊗ g⁻¹. See the definition of the inverse in (4).

Note that Id = T(P, P) is the trade-off function of two identical distributions. Property 4 implies that when f and g are symmetric trade-off functions, their tensor product f ⊗ g is also symmetric.

Now we state the main theorem of this subsection. Its proof is given in Appendix C.

Theorem 3.2.

Let M_i( · ; y₁, …, y_{i−1}) be f_i-DP for every fixed value of (y₁, …, y_{i−1}) and every i = 1, …, n. Then the n-fold composed mechanism M is f₁ ⊗ f₂ ⊗ ⋯ ⊗ f_n-DP.

This theorem shows that the composition of f_i-DP mechanisms remains f-DP with f = f₁ ⊗ ⋯ ⊗ f_n or, put differently, composition is closed in the f-DP framework. Moreover, the privacy bound f₁ ⊗ ⋯ ⊗ f_n in Theorem 3.2 is tight in the sense that it cannot be improved in general. To see this point, consider the case where the second mechanism completely ignores the output of the first mechanism. In that case, the two-fold composition obeys

T( M(S), M(S′) ) = T( M₁(S) × M₂(S), M₁(S′) × M₂(S′) ) = T( M₁(S), M₁(S′) ) ⊗ T( M₂(S), M₂(S′) ).

Next, taking neighboring datasets S and S′ such that T(M₁(S), M₁(S′)) = f₁ and T(M₂(S), M₂(S′)) = f₂, one concludes that f₁ ⊗ f₂ is the tightest possible bound on the two-fold composition. For comparison, the advanced composition theorem for (ε, δ)-DP does not admit a single pair of optimal parameters [DRV10]. In particular, no pair of ε and δ can exactly capture the privacy of the composition of (ε, δ)-DP mechanisms. See Section 3.3 and Figure 5 for more elaboration.

In the case of GDP, composition enjoys a simple and convenient formulation due to the identity

G_{μ₁} ⊗ G_{μ₂} ⊗ ⋯ ⊗ G_{μₙ} = G_μ,

where μ = √(μ₁² + μ₂² + ⋯ + μₙ²). This formula is due to the rotational invariance of Gaussian distributions with identity covariance. We provide the proof in Appendix D. The following corollary formally summarizes this finding.

Corollary 3.3.

The n-fold composition of μ_i-GDP mechanisms (i = 1, …, n) is √(μ₁² + ⋯ + μₙ²)-GDP.
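In code, this composition bookkeeping is a single root-sum-square; the sketch below (ours) also shows the common design choice of splitting an overall budget μ evenly across n steps by running each step at μ/√n.

    import numpy as np

    def compose_gdp(mus):
        # Corollary 3.3: composing mu_i-GDP mechanisms gives sqrt(sum mu_i^2)-GDP.
        mus = np.asarray(mus, dtype=float)
        return float(np.sqrt(np.sum(mus ** 2)))

    print(compose_gdp([0.1] * 100))   # 100 steps at 0.1-GDP compose to 1.0-GDP
    print(compose_gdp([0.3, 0.4]))    # composes to 0.5-GDP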

On a related note, the pioneering work [KOV17] is the first to take the hypothesis testing viewpoint in the study of privacy composition and to use Blackwell's theorem as an analytic tool therein. In particular, the authors offered a composition theorem for (ε, δ)-DP that improves on the advanced composition theorem of [DRV10]. Following this work, [MV16] provided a self-contained proof by essentially proving the special case of Blackwell's theorem that is needed for (ε, δ)-DP. In contrast, our novel proof of Theorem 3.2 only makes use of the Neyman–Pearson lemma, thereby circumventing the heavy machinery of Blackwell's theorem. This simple proof better illuminates the essence of the composition theorem.

3.2 Central Limit Theorems for Composition

In this subsection, we identify a central limit theorem type phenomenon of composition in the f-DP framework. Our main results (the two theorems below), roughly speaking, show that trade-off functions corresponding to small privacy leakage accumulate to G_μ for some μ under composition. Equivalently, the privacy of the composition of many "very private" mechanisms is best measured by GDP in the limit. This identifies GDP as the focal privacy definition among the family of f-DP guarantees, including (ε, δ)-DP. More precisely, all privacy definitions that are based on a hypothesis testing formulation of "indistinguishability" converge to the guarantees of GDP in the limit of composition. We remark that [SMM18] proved a conceptually related central limit theorem for random variables corresponding to the privacy loss. That theorem is used to reason about the non-adaptive composition of (ε, δ)-DP mechanisms. In contrast, our central limit theorems are concerned with the optimal hypothesis testing trade-off functions, and they apply to the setting of composition in which each mechanism is informed by prior interactions with the same database.

From a computational viewpoint, these limit theorems yield an efficient method of approximating the composition of general f-DP mechanisms. This is very appealing for analyzing the privacy properties of algorithms that are comprised of many building blocks run in a sequence. For comparison, the exact computation of privacy guarantees under composition can be computationally hard [MV16] and, thus, tractable approximations are important. Using our central limit theorems, the computation of the exact overall privacy guarantee in Theorem 3.2 can be reduced to the evaluation of a single mean parameter μ in a GDP guarantee. We give an exemplary application of this powerful technique in Section 5.

Explicitly, the mean parameter μ in the approximation depends on certain moment functionals of the trade-off functions (although a trade-off function f satisfies f′(x) ≤ 0 almost everywhere on [0, 1], we prefer to use |f′(x)| instead of −f′(x) for aesthetic reasons):

kl(f) := −∫₀¹ log|f′(x)| dx,
κ₂(f) := ∫₀¹ log²|f′(x)| dx,
κ₃(f) := ∫₀¹ |log|f′(x)||³ dx,
κ̄₃(f) := ∫₀¹ |log|f′(x)| + kl(f)|³ dx.

All of these functionals take values in [0, ∞], and the last is defined for f such that κ₃(f) < ∞. In essence, these functionals are calculating moments of the log-likelihood ratio of P and Q, where f = T(P, Q). In particular, all of these functionals are 0 if f = Id, which corresponds to zero privacy leakage. As its name suggests, kl(f) is the Kullback–Leibler (KL) divergence of P and Q and, therefore, kl(f) ≥ 0. Detailed elaboration on these functionals is deferred to Appendix D.
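As a sanity check on these functionals (ours, relying on the integral expressions displayed above), the sketch below evaluates kl and κ₂ for f = G_μ by numerical integration and compares them with the standard Gaussian computations kl(G_μ) = μ²/2 and κ₂(G_μ) = μ² + μ⁴/4.

    import numpy as np
    from scipy.integrate import quad
    from scipy.stats import norm

    def log_abs_deriv_G(alpha, mu):
        # log|G_mu'(alpha)|, i.e. the log-likelihood ratio log(q/p) at the
        # rejection threshold with type I error alpha, for P=N(0,1), Q=N(mu,1).
        z = norm.ppf(1.0 - alpha)
        return norm.logpdf(z - mu) - norm.logpdf(z)

    mu = 1.0
    kl, _     = quad(lambda a: -log_abs_deriv_G(a, mu), 0.0, 1.0)
    kappa2, _ = quad(lambda a: log_abs_deriv_G(a, mu) ** 2, 0.0, 1.0)
    print(kl, mu ** 2 / 2)               # both close to 0.5
    print(kappa2, mu ** 2 + mu ** 4 / 4) # both close to 1.25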

In the following theorem, kl denotes the vector ( kl(f₁), …, kl(f_n) ), and the vectors κ₂, κ₃, κ̄₃ are defined similarly; in addition, ‖·‖₁ and ‖·‖₂ denote the ℓ₁ and ℓ₂ norms, respectively.

Theorem. Let f₁, …, f_n be symmetric trade-off functions such that κ₃(f_i) < ∞ for all i. Denote

μ := 2‖kl‖₁ / √( ‖κ₂‖₁ − ‖kl‖₂² ),    γ := 0.56 ‖κ̄₃‖₁ / ( ‖κ₂‖₁ − ‖kl‖₂² )^{3/2},

and assume γ < 1/2. Then, for all α ∈ [γ, 1 − γ], we have (one can extend G_μ to be 1 on (−∞, 0) and 0 on (1, +∞) so that the assumption γ < 1/2 can be removed)

G_μ(α + γ) − γ ≤ f₁ ⊗ f₂ ⊗ ⋯ ⊗ f_n(α) ≤ G_μ(α − γ) + γ. (10)

Loosely speaking, the lower bound in (10) shows that the composition of f_i-DP mechanisms for i = 1, …, n is approximately μ-GDP and, in addition, the upper bound demonstrates that the tightness of this approximation is specified by γ. In the case where all f_i are equal to some fixed f, the theorem reveals that the composition becomes blatantly non-private as n → ∞, because μ is of order √n. More interesting applications of the theorem, however, are cases where each f_i is close to the "perfect privacy" trade-off function Id, so that collectively μ converges and γ vanishes as n → ∞ (see the example in Section 5). For completeness, we note that the condition κ₃(f_i) < ∞ required for the use of this theorem (which implies that the other three functionals are also finite) excludes the case where f_i(0) < 1; in particular, it excludes f_{ε,δ} with δ > 0. We introduce an easy and general technique in Section 3.3 to deal with this issue.

From a technical viewpoint, this result can be thought of as a Berry–Esseen type central limit theorem. Its detailed proof, as well as that of the asymptotic version below, is provided in Appendix D.

Next, we present an asymptotic counterpart of the theorem above for the composition of f-DP mechanisms. In analogy with classical central limit theorems, below we consider a triangular array of mechanisms { M_{ni} : 1 ≤ i ≤ n }, where M_{ni} is f_{ni}-DP for 1 ≤ i ≤ n.

Theorem. Let { f_{ni} : 1 ≤ i ≤ n } be a triangular array of symmetric trade-off functions and assume the following limits for some constants K ≥ 0 and s > 0 as n → ∞:

  1. ∑_{i=1}^{n} kl(f_{ni}) → K;

  2. max_{1≤i≤n} kl(f_{ni}) → 0;

  3. ∑_{i=1}^{n} κ₂(f_{ni}) → s²;

  4. ∑_{i=1}^{n} κ₃(f_{ni}) → 0.

Then, we have

f_{n1} ⊗ f_{n2} ⊗ ⋯ ⊗ f_{nn}(α) → G_{2K/s}(α)

uniformly for all α ∈ [0, 1].

Taken together, this theorem and Theorem 3.2 amount to saying that the composition of the mechanisms M_{n1}, …, M_{nn} is asymptotically 2K/s-GDP. In fact, this asymptotic version is a consequence of the Berry–Esseen type theorem above, as one can show that the corresponding μ converges to 2K/s and γ vanishes for such a triangular array of symmetric trade-off functions. This central limit theorem implies that GDP is the only parameterized family of trade-off functions that can faithfully represent the effects of composition. In contrast, neither ε-DP nor (ε, δ)-DP can be losslessly tracked under composition: the parameterized family f_{ε,δ} cannot represent the trade-off function that arises in the limit under composition.

The conditions for the use of this theorem are reminiscent of Lindeberg's condition in the central limit theorem for independent random variables. The proper scaling of the trade-off functions is that both kl(f_{ni}) and κ₂(f_{ni}) are of order 1/n for most of the mechanisms; as a consequence, the cumulative effects of the moment functionals are bounded. Furthermore, as with Lindeberg's condition, the second condition of the theorem requires that no single mechanism have a significant contribution to the composition in the limit.

In passing, we remark that K and s satisfy the relationship s² = 2K in all applications of this central limit theorem in this paper, including those in Section 3.3 and Section 5.2 as well as their corollaries. As such, the composition is asymptotically √(2K)-GDP in these cases. A proof of this interesting observation, or the construction of a counterexample, is left for future work.

3.3 Composition of (ε, δ)-DP: Beating Berry–Esseen

Now, we extend the central limit theorems to (ε, δ)-DP. As shown by Proposition 2.3, (ε, δ)-DP is equivalent to f_{ε,δ}-DP and, therefore, it suffices to approximate the trade-off function f_{ε₁,δ₁} ⊗ f_{ε₂,δ₂} ⊗ ⋯ ⊗ f_{εₙ,δₙ} by making use of the composition theorem for f-DP mechanisms. As pointed out in Section 3.2, however, the moment conditions required by the two central limit theorems exclude the case where δ > 0.

To overcome the difficulty caused by a nonzero δ, we start by observing the useful fact that

f_{ε,δ} = f_{ε,0} ⊗ f_{0,δ}. (11)

This decomposition, along with the commutativity and associativity of the tensor product, shows that

f_{ε₁,δ₁} ⊗ ⋯ ⊗ f_{εₙ,δₙ} = ( f_{ε₁,0} ⊗ ⋯ ⊗ f_{εₙ,0} ) ⊗ ( f_{0,δ₁} ⊗ ⋯ ⊗ f_{0,δₙ} ).

This identity allows us to work on the ε part and the δ part separately. In short, the ε part can now be approximated by G_μ for an appropriate μ by invoking the central limit theorems of Section 3.2. For the δ part, we can iteratively apply the rule

f_{0,δ₁} ⊗ f_{0,δ₂} = f_{0, 1−(1−δ₁)(1−δ₂)} (12)

to obtain f_{0,δ₁} ⊗ ⋯ ⊗ f_{0,δₙ} = f_{0, 1−(1−δ₁)(1−δ₂)⋯(1−δₙ)}. This rule is best seen via the interesting fact that f_{0,δ} is the trade-off function of two shifted uniform distributions: f_{0,δ} = T( U(0, 1), U(δ, 1 + δ) ).

Now, a central limit theorem for (ε, δ)-DP is just a stone's throw away. In what follows, the privacy parameters are arranged in a triangular array { (ε_{ni}, δ_{ni}) : 1 ≤ i ≤ n }.

Theorem. Assume

∑_{i=1}^{n} ε_{ni}² → μ²,   max_{1≤i≤n} ε_{ni} → 0,   ∑_{i=1}^{n} δ_{ni} → δ,   max_{1≤i≤n} δ_{ni} → 0

for some nonnegative constants μ and δ as n → ∞. Then, we have

f_{ε_{n1},δ_{n1}} ⊗ f_{ε_{n2},δ_{n2}} ⊗ ⋯ ⊗ f_{ε_{nn},δ_{nn}} → G_μ ⊗ f_{0, 1−e^{−δ}}

uniformly over [0, 1] as n → ∞.
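Under the assumptions of this theorem, a crude numerical reading (ours) of the limiting guarantee is as follows: the ε part contributes G_μ with μ ≈ √(∑ ε_{ni}²), and the δ part contributes f_{0,δ′} with δ′ = 1 − ∏(1 − δ_{ni}), which is exact by rule (12) and tends to 1 − e^{−δ}.

    import numpy as np

    def clt_limit_parameters(eps, delta):
        # Approximate limiting parameters suggested by the theorem above:
        # mu for the eps part (valid in the small-eps regime) and the exactly
        # composed delta for the delta part.
        eps = np.asarray(eps, dtype=float)
        delta = np.asarray(delta, dtype=float)
        mu = float(np.sqrt(np.sum(eps ** 2)))
        delta_total = float(1.0 - np.prod(1.0 - delta))
        return mu, delta_total

    n = 1000
    print(clt_limit_parameters([1.0 / np.sqrt(n)] * n, [1e-6 / n] * n))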

Remark 1.

A formal proof is provided in Appendix D. The assumptions concerning the δ_{ni} give rise to the factor f_{0, 1−e^{−δ}} in the limit. In general, tensoring with f_{0,δ} is equivalent to scaling the graph of the trade-off function toward the origin by a factor of 1 − δ. This property is specified by the following formula, and we leave its proof to Appendix D: