Hypothesis Testing Interpretations and Renyi Differential Privacy

05/24/2019 · by Borja Balle, et al.

Differential privacy is the gold standard in data privacy, with applications in the public and private sectors. While differential privacy is a formal mathematical definition from the theoretical computer science literature, it is also understood by statisticians and data experts thanks to its hypothesis testing interpretation. This informally says that one cannot effectively test whether a specific individual has contributed her data by observing the output of a private mechanism---no test can have both high significance and high power. In this paper, we show that recently proposed relaxations of differential privacy based on Rényi divergence do not enjoy a similar interpretation. Specifically, we introduce the notion of k-generatedness for an arbitrary divergence, where the parameter k captures the hypothesis testing complexity of the divergence. We show that the divergence used for differential privacy is 2-generated, and hence it satisfies the hypothesis testing interpretation. In contrast, Rényi divergence is only ∞-generated, and hence has no hypothesis testing interpretation. We also show sufficient conditions for general divergences to be k-generated.


1 Introduction

Differential privacy (Dwork et al., 2006) is a formal notion of data privacy which enables accurate statistical analyses on populations while preserving the privacy of the individuals contributing their data. Differential privacy is supported by a rich theory, which simplifies the design and formal analysis of private algorithms. This theory has helped make differential privacy a de facto standard for privacy-preserving data analysis. Over the last years, differential privacy has come into use in the private sector (Kenthapadi et al., 2019) by companies such as Google (Erlingsson et al., 2014; Papernot et al., 2018), Apple (team at Apple, 2017), and Uber (Johnson et al., 2018), and in the public sector by agencies such as the U.S. Census Bureau (Abowd, 2018; Garfinkel et al., 2018). A common challenge that all uses of differential privacy face is explaining it to users and policy makers. Indeed, differential privacy first emerged in the theoretical computer science community, and was only subsequently considered in other research areas interested in data privacy. For this reason, several works have attempted to provide different interpretations of the semantics of differential privacy in an effort to make it more accessible.

One approach that has been particularly successful, especially when introducing differential privacy to people versed in statistical data analysis, is the hypothesis testing interpretation of differential privacy (Wasserman and Zhou, 2010; Kairouz et al., 2015). One can imagine an experiment where one wants to test, through the output of a differentially private mechanism, the null hypothesis that a given individual has contributed her data to a particular dataset; the alternative hypothesis is that the individual has not contributed her data. Then, the definition of differential privacy guarantees—and is in fact equivalent to requiring—that every hypothesis test designed for such an experiment has either high significance (a high rate of Type I errors) or low power (a high rate of Type II errors). In fact, this interpretation goes even further, because it also explains the privacy parameters as quantities regulating this experiment and the level of acceptable significance and power.

Recently, several relaxations of differential privacy have been proposed (Mironov, 2017; Bun and Steinke, 2016; Bun et al., 2018; Dong et al., 2019). Most of these new privacy definitions have been proposed as privacy notions with better composition properties than differential privacy. Having better composition can become a key advantage when a high number of data accesses is needed for a single analysis (e.g., in private deep learning (Abadi et al., 2016)). Technically, many of these relaxations are formulated as bounds on the Rényi divergence between the distribution obtained when running a private mechanism over a dataset where an individual has contributed her data and the distribution obtained when the private mechanism is run over the dataset where that individual's data is removed.

In this work we show formally that the relaxations of differential privacy based on the Rényi divergence do not support the same hypothesis testing interpretation as differential privacy. The main technical reason for this is that the Rényi divergence has a finer granularity than the divergence that defines standard differential privacy. To quantify this difference we introduce the notion of k-generatedness for a divergence. Intuitively, this notion expresses the number of decisions that are needed in a test to fully characterize the divergence. We show that the divergence that is traditionally used for differential privacy is 2-generated, and this allows one to interpret differential privacy according to the standard hypothesis testing interpretation. On the other hand, Rényi divergence is not k-generated for any finite k, though we show that it is ∞-generated (where by ∞ we mean that it is infinitely, but countably, generated). This says that to characterize these relaxations of differential privacy through an experiment similar to the one used in the hypothesis testing interpretation, one needs to have an infinite number of possible decisions available. This shows a semantic separation between standard differential privacy and relaxations based on Rényi divergence.

In addition, we study a sufficient condition to guarantee that a divergence is k-generated: divergences defined as a supremum of a quasi-convex function over probabilities of k-partitions are k-generated. This allows one to construct divergences supporting a hypothesis testing interpretation by defining them through a suitable quasi-convex function, which yields a k-generated divergence. The condition is also necessary for quasi-convex divergences, characterizing k-generation for all quasi-convex divergences.

Summarizing, our contributions are:

  • We introduce the notion of k-generatedness for divergences. This notion describes the complexity of a divergence in terms of the number of possible decisions that are needed in a test to fully characterize the divergence.

  • We show that the divergence used to characterize differential privacy is 2-generated, supporting the usual hypothesis testing interpretation of differential privacy.

  • We show that Rényi divergence is ∞-generated, ruling out a hypothesis testing interpretation for privacy notions based on it.

  • We give sufficient and necessary conditions for a quasi-convex divergence to be k-generated.

Related work.

Several works have studied the semantics of formal notions of data privacy and differential privacy (Dwork, 2006; Wasserman and Zhou, 2010; Dwork and Roth, 2013; Kifer and Machanavajjhala, 2011, 2014; Hsu et al., 2014; Kasiviswanathan and Smith, 2015). The hypothesis testing interpretation of differential privacy was first introduced by Wasserman and Zhou (2010) and then used in a formal way to study the optimal composition theorem for differential privacy (Kairouz et al., 2015). Several works (Mironov, 2017; Bun and Steinke, 2016; Bun et al., 2018; Dong et al., 2019) have used divergences to reason about privacy leakages. As discussed in the introduction, several of these works are based on Rényi divergence (Mironov, 2017; Bun and Steinke, 2016; Bun et al., 2018). Dong et al. (2019) propose defining new notions of privacy based on the hypothesis testing interpretation; our work lends support to this direction, showing that other existing variants of privacy do not enjoy a hypothesis testing interpretation. The hypothesis testing interpretation of differential privacy has also inspired techniques in formal verification (Sato, 2016; Sato et al., 2017), including techniques to detect violations in differentially private implementations (Ding et al., 2018).

2 Background: hypothesis testing, privacy, and Rényi divergences

2.1 Hypothesis testing interpretation for (ε, δ)-differential privacy

We view randomized algorithms as functions M : X → P(Y) from a set X of inputs to the set of discrete probability distributions over a set Y of outputs. We assume that X is equipped with a symmetric adjacency relation—informally, inputs are datasets and two inputs x and x′ are adjacent iff they differ in the data of a single individual.

Definition 1 (Differential Privacy (DP) (Dwork et al., 2006)).

Let ε ≥ 0 and δ ∈ [0, 1]. A randomized algorithm M : X → P(Y) is (ε, δ)-differentially private if for every pair of adjacent inputs x and x′, and every subset E ⊆ Y, we have:

Pr[M(x) ∈ E] ≤ e^ε · Pr[M(x′) ∈ E] + δ.

Wasserman and Zhou (2010); Kairouz et al. (2015) proposed a useful interpretation of this guarantee in terms of hypothesis testing. Suppose that x and x′ are adjacent inputs. The observer sees the output of running a private mechanism on one of these inputs—but does not see the particular input—and wants to guess whether the input was x or x′.

In the terminology of statistical hypothesis testing, let y be an output of a randomized mechanism M, and take the following null and alternative hypotheses:

H0 : y came from M(x),    H1 : y came from M(x′)

One simple way of deciding between the two hypotheses is to fix a rejection region S ⊆ Y; if the observation is in S then the null hypothesis is rejected, and if the observation is not in S then the null hypothesis is not rejected. These decision rules are known as deterministic decision rules.

Each decision rule can err in two possible ways. A false alarm (i.e. Type I error) occurs when the null hypothesis is true but rejected; this error rate is defined as α = Pr[M(x) ∈ S]. On the other hand, the decision rule may incorrectly fail to reject the null hypothesis, a false negative (i.e. Type II error); the probability of missed detection is defined as β = Pr[M(x′) ∉ S]. There is a natural tradeoff between these two errors—a rule with a larger rejection region will be less likely to incorrectly fail to reject but more likely to incorrectly reject, while a rule with a smaller rejection region will be less likely to incorrectly reject but more likely to incorrectly fail to reject.

Differential privacy can now be reformulated in terms of these error rates.

Theorem 2 (Wasserman and Zhou (2010); Kairouz et al. (2015)).

A randomized algorithm M is (ε, δ)-differentially private if and only if for every pair of adjacent inputs x and x′, and any rejection region S ⊆ Y, we have:

Pr[M(x) ∈ S] + e^ε · Pr[M(x′) ∉ S] ≥ 1 − δ   and   e^ε · Pr[M(x) ∈ S] + Pr[M(x′) ∉ S] ≥ 1 − δ.

Intuitively, the lower bound on the sum of the two error rates means that no decision rule is capable of achieving low Type I error and low Type II error simultaneously. Thus, the output distributions from any two adjacent inputs are statistically hard to distinguish.
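To make Theorem 2 concrete, here is a minimal Python sketch (an illustration, not from the paper; the mechanism and parameter values are arbitrary) that instantiates M as ε-randomized response on a single bit and checks both inequalities for every rejection region S.

    import itertools
    import math

    def randomized_response(eps, true_bit):
        """Output distribution over {0, 1} of eps-DP randomized response on true_bit."""
        p = math.exp(eps) / (1 + math.exp(eps))  # probability of reporting the true bit
        return {true_bit: p, 1 - true_bit: 1 - p}

    eps, delta = 1.0, 0.0
    mu1 = randomized_response(eps, 0)  # M(x):  distribution under the null hypothesis H0
    mu2 = randomized_response(eps, 1)  # M(x'): distribution under the alternative H1

    outputs = [0, 1]
    for r in range(len(outputs) + 1):
        for S in itertools.combinations(outputs, r):           # every rejection region S
            alpha = sum(mu1[y] for y in S)                     # Type I error:  Pr[M(x) in S]
            beta = sum(mu2[y] for y in outputs if y not in S)  # Type II error: Pr[M(x') not in S]
            assert alpha + math.exp(eps) * beta >= 1 - delta - 1e-12
            assert math.exp(eps) * alpha + beta >= 1 - delta - 1e-12
    print("Both inequalities of Theorem 2 hold for every rejection region.")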

Following Kairouz et al. (2015), we can also reformulate the definition of differential privacy in terms of a privacy region describing the attainable pairs of Type I and Type II errors.

Theorem 3 (Kairouz et al. (2015)).

A randomized algorithm M is (ε, δ)-differentially private if and only if for every pair of adjacent inputs x and x′, the pair of Type I and Type II errors of every rejection region lies in R(ε, δ),

where the privacy region R(ε, δ) is defined as:

R(ε, δ) = {(α, β) ∈ [0, 1]² : α + e^ε · β ≥ 1 − δ and e^ε · α + β ≥ 1 − δ}.

Since the original introduction of differential privacy, researchers have proposed several other variants based on Rényi divergence. The central question of this paper is: can we give similar hypothesis testing interpretations to these (and other) variants of differential privacy?

2.2 Relaxations of differential privacy based on Rényi divergence

We recall here notions of differential privacy based on Rényi divergence.

Definition 4 (Rényi divergence (Renyi, 1961)).

Let α ∈ (1, ∞). The Rényi divergence of order α between two probability distributions μ1 and μ2 on a space X is defined by:

D_α(μ1 ‖ μ2) = 1/(α − 1) · log Σ_{x ∈ X} μ1(x)^α · μ2(x)^(1 − α).    (1)

The above definition does not consider the cases α = 1 and α = ∞. However, we can see the divergence as a function of α for fixed distributions and consider the limits. We have:

D_1(μ1 ‖ μ2) = lim_{α → 1} D_α(μ1 ‖ μ2) = Σ_{x ∈ X} μ1(x) · log(μ1(x)/μ2(x)),
D_∞(μ1 ‖ μ2) = lim_{α → ∞} D_α(μ1 ‖ μ2) = log max_{x ∈ X} μ1(x)/μ2(x).

The first limit is the well-known KL divergence, while the second limit is the max divergence that bounds the pointwise ratio of probabilities; standard ε-differential privacy bounds this divergence on distributions from adjacent inputs.
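For concreteness, the following Python sketch (not from the paper; the example distributions are arbitrary) computes D_α for discrete distributions given as dictionaries and compares orders close to 1 and very large orders against the KL and max divergences.

    import math

    def renyi_divergence(mu1, mu2, alpha):
        """D_alpha(mu1 || mu2) for discrete distributions given as {outcome: probability}."""
        s = sum(mu1[x] ** alpha * mu2[x] ** (1 - alpha) for x in mu1)
        return math.log(s) / (alpha - 1)

    def kl_divergence(mu1, mu2):   # the alpha -> 1 limit
        return sum(mu1[x] * math.log(mu1[x] / mu2[x]) for x in mu1)

    def max_divergence(mu1, mu2):  # the alpha -> infinity limit
        return max(math.log(mu1[x] / mu2[x]) for x in mu1)

    mu1 = {"a": 0.6, "b": 0.3, "c": 0.1}
    mu2 = {"a": 0.3, "b": 0.4, "c": 0.3}
    print(renyi_divergence(mu1, mu2, 1.001), kl_divergence(mu1, mu2))   # nearly equal
    print(renyi_divergence(mu1, mu2, 200.0), max_divergence(mu1, mu2))  # nearly equal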

There are several notions of differential privacy based on Rényi divergence, differing in whether the bound holds for all orders or just some orders. The first notion we consider is Rényi Differential Privacy (RDP) (Mironov, 2017).

Definition 5 (Rényi Differential Privacy (RDP) (Mironov, 2017)).

Let α ∈ (1, ∞) and ε ≥ 0. A randomized algorithm M is (α, ε)-Rényi differentially private if for every pair x and x′ of adjacent inputs, we have D_α(M(x) ‖ M(x′)) ≤ ε.

Rényi differential privacy considers a fixed value of the order α. In contrast, zero-Concentrated Differential Privacy (zCDP) (Bun and Steinke, 2016) quantifies over all possible α.

Definition 6 (zero-Concentrated Differential Privacy (zCDP) (Bun and Steinke, 2016)).

A randomized algorithm M is ρ-zero concentrated differentially private if for every pair of adjacent inputs x and x′, we have

D_α(M(x) ‖ M(x′)) ≤ ρ · α   for every α ∈ (1, ∞).    (2)

Truncated Concentrated Differential Privacy (tCDP) (Bun et al., 2018) quantifies over all α below a given threshold ω.

Definition 7 (Truncated Concentrated Differential Privacy (tCDP) (Bun et al., 2018)).

A randomized algorithm M is (ρ, ω)-truncated concentrated differentially private if for every pair of adjacent inputs x and x′, we have

D_α(M(x) ‖ M(x′)) ≤ ρ · α   for every α ∈ (1, ω).    (3)

These notions are all motivated by bounds on the privacy loss of a randomized algorithm. This quantity is defined by

L_{x,x′}(y) = log( Pr[M(x) = y] / Pr[M(x′) = y] ),

where x and x′ are two adjacent inputs. Intuitively, the privacy loss measures how much information is revealed by an output y. While output values with a large privacy loss are highly revealing—they are far more likely to result from the private input x rather than the different private input x′—if these outputs are only seen with small probability then it may be reasonable to discount their influence. The different privacy definitions bound different moments of this privacy loss, treated as a random variable when y is drawn from the output of the algorithm on input x. The following table summarizes these bounds.

Privacy        Bound on privacy loss
(ε, δ)-DP      maximum value
(α, ε)-RDP     α-th moment
ρ-zCDP         all moments
(ρ, ω)-tCDP    moments up to the cutoff ω

In particular, DP bounds the maximum value of the privacy loss (technically speaking, this is true only for sufficiently well-behaved distributions (Meiser, 2018)), (α, ε)-RDP bounds the α-th moment, ρ-zCDP bounds all moments, and (ρ, ω)-tCDP bounds the moments up to the cutoff ω. Many conversions are known between these definitions; for instance, RDP, zCDP, and tCDP are known to sit between pure ε-differential privacy and (ε, δ)-differential privacy in terms of expressivity, up to some modification of the parameters. While this means that RDP, zCDP, and tCDP can sometimes be analyzed by reduction to standard differential privacy, converting between the different notions requires weakening the parameters, and often the privacy analysis is simpler or more precise when working with RDP, zCDP, or tCDP directly. The interested reader can refer to the original papers (Bun and Steinke, 2016; Mironov, 2017; Bun et al., 2018).
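To make the table concrete, here is a small sketch (an illustration, not from the paper; the mechanism and parameter values are arbitrary) that computes the privacy loss of ε-randomized response and checks the DP-style bound on its maximum together with an RDP-style bound on an exponential moment, using the identity E_{y∼M(x)}[e^((α−1)·L(y))] = e^((α−1)·D_α(M(x) ‖ M(x′))).

    import math

    def renyi_divergence(mu1, mu2, alpha):
        s = sum(mu1[y] ** alpha * mu2[y] ** (1 - alpha) for y in mu1)
        return math.log(s) / (alpha - 1)

    eps, alpha = 1.0, 3.0
    p = math.exp(eps) / (1 + math.exp(eps))
    mu1 = {0: p, 1: 1 - p}  # M(x):  eps-randomized response on input bit 0
    mu2 = {0: 1 - p, 1: p}  # M(x'): eps-randomized response on the adjacent input bit 1

    loss = {y: math.log(mu1[y] / mu2[y]) for y in mu1}  # privacy loss L(y)

    # DP-style bound: the maximum privacy loss equals eps for randomized response.
    assert max(loss.values()) <= eps + 1e-12

    # RDP-style bound: the exponential (alpha - 1)-moment of the loss under M(x) equals
    # exp((alpha - 1) * D_alpha(M(x) || M(x'))), and D_alpha <= D_infinity = eps.
    moment = sum(mu1[y] * math.exp((alpha - 1) * loss[y]) for y in mu1)
    d_alpha = renyi_divergence(mu1, mu2, alpha)
    assert abs(moment - math.exp((alpha - 1) * d_alpha)) < 1e-9
    assert moment <= math.exp((alpha - 1) * eps) + 1e-9
    print("max loss:", max(loss.values()), " D_3:", d_alpha)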

3 k-generated divergences

In this section we establish that RDP, zCDP and tCDP cannot be described in terms of hypothesis testing. Our main technical tool is the new notion of k-generatedness. We first formulate this notion for general divergences, then consider specific divergences from differential privacy.

3.1 Background and notation

We use standard notation and terminology from discrete probability. We let [0, 1] and [0, ∞] stand for the unit interval and the positive extended real line respectively. We let P(X) denote the set of probability distributions over a set X. When X is a finite set with k elements, i.e. X = {1, …, k}, we sometimes treat P(X) as a subset of [0, 1]^k. For every a ∈ X, the Dirac distribution δ_a centered at a is defined by δ_a(x) = 1 if x = a and δ_a(x) = 0 otherwise. We also define the convex combination Σ_i λ_i · μ_i of distributions μ_i ∈ P(X) pointwise by (Σ_i λ_i · μ_i)(x) = Σ_i λ_i · μ_i(x). It is easy to check that for every λ_i ≥ 0 such that Σ_i λ_i = 1, we have Σ_i λ_i · μ_i ∈ P(X).

For any probability distribution μ ∈ P(X) and f : X → P(Z) (called a probabilistic decision rule), we define f♯μ ∈ P(Z) to be (f♯μ)(z) = Σ_{x ∈ X} μ(x) · f(x)(z) for every z ∈ Z. For any probability distribution μ ∈ P(X) and a function g : X → Z (called a deterministic decision rule), we define g♯μ ∈ P(Z) to be (g♯μ)(z) = Σ_{x : g(x) = z} μ(x) for every z ∈ Z. We have g♯μ = (δ ∘ g)♯μ, where δ ∘ g is the probabilistic decision rule mapping x to the Dirac distribution δ_{g(x)}.

3.2 Divergences between probability distributions

We start from a very general definition of divergences. Our notation includes the domain of definition of the divergence; this distinction will be important when introducing the concept of k-generatedness.

Definition 8.

A divergence Δ is a family of functions

Δ^X : P(X) × P(X) → [0, ∞],

one for each discrete set X. We use the notation Δ^X(μ1, μ2) to denote the “distance” between the distributions μ1 and μ2.

Our notion of divergence subsumes the general notion of f-divergence from the literature (Csiszár, 1963; Csiszár and Shields, 2004). Moreover, (ε, δ)-differential privacy can be reformulated using the divergence Δ_{e^ε} (Barthe and Olmedo, 2013) defined as follows:

Δ_{e^ε}^X(μ1, μ2) = Σ_{x ∈ X} max(μ1(x) − e^ε · μ2(x), 0).

Specifically, a randomized algorithm M is (ε, δ)-differentially private if and only if for every pair of adjacent inputs x and x′, we have Δ_{e^ε}(M(x), M(x′)) ≤ δ.
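As a quick sanity check of this reformulation (a sketch based on the formula above, with ε-randomized response as an arbitrary example mechanism), the following snippet computes Δ_{e^ε} for discrete distributions and confirms that ε-randomized response satisfies Δ_{e^ε}(M(x), M(x′)) = 0 ≤ δ.

    import math

    def hockey_stick(mu1, mu2, eps):
        """The divergence Delta_{e^eps}: sum over outcomes of max(mu1(y) - e^eps * mu2(y), 0)."""
        return sum(max(mu1[y] - math.exp(eps) * mu2[y], 0.0) for y in mu1)

    eps = 1.0
    p = math.exp(eps) / (1 + math.exp(eps))
    mu1 = {0: p, 1: 1 - p}  # eps-randomized response on input 0
    mu2 = {0: 1 - p, 1: p}  # eps-randomized response on the adjacent input 1

    print(hockey_stick(mu1, mu2, eps))      # 0.0: the mechanism is (eps, 0)-DP
    print(hockey_stick(mu1, mu2, eps / 2))  # > 0: it is not (eps/2, 0)-DP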

Many useful properties of divergences have been explored in the literature. Our technical development will involve the following two properties.

  • (post-processing inequality) A divergence Δ satisfies the post-processing inequality iff for every f : X → P(Z) and μ1, μ2 ∈ P(X), Δ^Z(f♯μ1, f♯μ2) ≤ Δ^X(μ1, μ2).

  • (quasi-convexity) A divergence Δ is quasi-convex iff for every λ_i ≥ 0 such that Σ_i λ_i = 1 and every discrete set X,

    Δ^X(Σ_i λ_i · μ_i, Σ_i λ_i · ν_i) ≤ sup_i Δ^X(μ_i, ν_i).

These light restrictions are satisfied by many common divergences. Besides Rényi divergences, they also hold for all f-divergences (Csiszár, 1963; Csiszár and Shields, 2004).

3.3 -generatedness: definitions and basic properties

We now introduce the notion of k-generatedness. Informally, k-generatedness is a measure of the number of decisions that are needed in a hypothesis test to characterize a divergence.

Definition 9.

Let k ∈ ℕ ∪ {∞}. A divergence Δ is k-generated if there exists a set Z such that |Z| = k and for every discrete set X and μ1, μ2 ∈ P(X),

Δ^X(μ1, μ2) = sup_{f : X → P(Z)} Δ^Z(f♯μ1, f♯μ2).

We say that Δ is deterministically k-generated if there exists a set Z such that |Z| = k and for every discrete set X and μ1, μ2 ∈ P(X),

Δ^X(μ1, μ2) = sup_{g : X → Z} Δ^Z(g♯μ1, g♯μ2).

Lemma 10.

The following basic properties hold for k-generated divergences.

  • If Δ is 1-generated, then Δ is constant, i.e. for every discrete set X there exists c ∈ [0, ∞] such that for every μ1, μ2 ∈ P(X), we have Δ^X(μ1, μ2) = c.

  • If Δ is (deterministically) k-generated, then it is also (deterministically) k′-generated for every k′ ≥ k.

  • If Δ is deterministically k-generated and satisfies the post-processing inequality, then it is also k-generated.

  • If Δ satisfies the post-processing inequality, then it is also ∞-generated.

The following theorem shows that every k-generated divergence is also deterministically k-generated, so long as it is quasi-convex.

Theorem 11.

Any k-generated quasi-convex divergence is also deterministically k-generated.

To prove the equivalence we use a weak version of the Birkhoff-von Neumann theorem, which states that every probabilistic decision rule can be decomposed into a convex combination of deterministic ones.

Theorem 12 (Weak Birkhoff-von Neumann).

Let X and Z be finite sets. Then for every probabilistic decision rule f : X → P(Z), there exist weights λ_1, …, λ_m ∈ [0, 1] with Σ_i λ_i = 1 and deterministic decision rules g_1, …, g_m : X → Z such that f(x) = Σ_i λ_i · δ_{g_i(x)} for any x ∈ X.

3.4 2-generatedness and hypothesis testing

In general, k-generated divergences have a close connection to the number of decisions that are needed in a hypothesis test to fully characterize the divergence. For instance, a divergence that is 2-generated has a straightforward interpretation in terms of the traditional hypothesis testing interpretation under probabilistic decision rules. For any 2-generated divergence Δ and bound d ∈ [0, ∞], we can define an analogous privacy region for hypothesis testing:

R_Δ(d) = {(α, β) ∈ [0, 1]² : Δ^{{0,1}}((α, 1 − α), (1 − β, β)) ≤ d},

where a distribution over {0, 1} is written as (probability of rejection, probability of acceptance).

From the isomorphism P({0, 1}) ≅ [0, 1], the following equivalence follows directly from the definitions.

Lemma 13.

A divergence Δ is 2-generated if and only if the following condition holds: for every discrete set X, all μ1, μ2 ∈ P(X), and every d ∈ [0, ∞],

Δ^X(μ1, μ2) ≤ d   if and only if   (E_{x∼μ1}[f(x)], E_{x∼μ2}[1 − f(x)]) ∈ R_Δ(d) for every f : X → [0, 1].

Here, every function of type f : X → [0, 1] can be seen as a (probabilistic) decision rule, giving the probability of rejecting the null hypothesis on each outcome. Therefore, the probabilities E_{x∼μ1}[f(x)] and E_{x∼μ2}[1 − f(x)] can be seen as the Type I error and Type II error of the corresponding test.

Hence, the above lemma says that Δ is 2-generated if and only if we can bound, according to the region R_Δ, the Type I and Type II errors of every test. Moreover, if a divergence is quasi-convex, this is equivalent to hypothesis testing under the more common deterministic decision rules. Thus, for a quasi-convex and 2-generated Δ, a bound on Δ^X(μ1, μ2) yields the corresponding condition on the Type I and Type II errors under every rejection region S ⊆ X.

4 Applications to differential privacy

4.1 (ε, δ)-differential privacy is 2-generated

In our framework, the hypothesis testing interpretation of (ε, δ)-differential privacy follows from the fact that the divergence Δ_{e^ε} is 2-generated.

Theorem 14.

The divergence Δ_{e^ε} is 2-generated for all ε ≥ 0.

By Lemma 13 and Theorems 11 and 14, we can re-prove that the notion of differential privacy can be characterized by hypothesis testing with both probabilistic and deterministic decision rules.

It is worth noticing that the result above says that the divergence Δ_{e^ε} can be fully characterized in terms of traditional hypothesis tests, i.e. in terms of binary decision rules and the divergence Δ_{e^ε} over the space {0, 1}. This means that we do not lose anything by looking at differential privacy through the lens of the hypothesis testing interpretation. This is not the case for other privacy definitions based on Rényi divergence, as we will show in the next section.
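The following sketch (an illustration in our notation, not code from the paper) checks this numerically on a small example: the largest value of the binary divergence Δ_{e^ε} over all deterministic rejection regions S ⊆ X coincides with Δ_{e^ε}(μ1, μ2) computed on the full three-element space.

    import itertools
    import math

    def hockey_stick(mu1, mu2, eps):
        return sum(max(mu1[y] - math.exp(eps) * mu2[y], 0.0) for y in mu1)

    def pushforward(mu, region):
        """Binary distribution (reject, accept) induced by a rejection region."""
        reject = sum(mu[y] for y in region)
        return {"reject": reject, "accept": 1.0 - reject}

    eps = 0.5
    mu1 = {"a": 0.5, "b": 0.3, "c": 0.2}
    mu2 = {"a": 0.2, "b": 0.3, "c": 0.5}

    full = hockey_stick(mu1, mu2, eps)  # divergence on the full three-element space
    outcomes = list(mu1)
    best_binary = max(
        hockey_stick(pushforward(mu1, S), pushforward(mu2, S), eps)
        for r in range(len(outcomes) + 1)
        for S in itertools.combinations(outcomes, r)
    )
    print(full, best_binary)
    assert abs(full - best_binary) < 1e-12  # the binary tests already attain the full value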

4.2 Other examples

Along similar lines to what we showed for the divergence Δ_{e^ε}, one can show that the total variation distance (given by TV(μ1, μ2) = ½ · Σ_x |μ1(x) − μ2(x)|) is also 2-generated.

Recently, Dong et al. (2019) proposed a formal definition of data privacy based on the notion of trade-off function and satisfying the hypothesis testing interpretation, similarly to differential privacy. We can characterize the trade-off functions between Type I and Type II errors they use by a family of divergences indexed by the admissible Type I error level; by varying this level we obtain the actual trade-off function. It is easy to show that this family of divergences is also 2-generated.

4.3 Rényi divergence is ∞-generated

Rényi divergence is not 2-generated. To see this, take a three-element space X and a suitable pair of distributions μ1 and μ2 on X; a simple calculation then shows that the supremum of D_α over all decision rules from X into a two-element space is strictly smaller than D_α(μ1 ‖ μ2).
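The specific distributions used in the original example are not reproduced here; the following sketch makes the same point numerically with an arbitrary pair of three-point distributions. It restricts attention to deterministic binary decision rules, which suffices because Rényi divergence is quasi-convex (Section 3.2), so the supremum over probabilistic binary rules is attained by deterministic ones.

    import itertools
    import math

    def renyi_divergence(mu1, mu2, alpha):
        s = sum(mu1[y] ** alpha * mu2[y] ** (1 - alpha) for y in mu1)
        return math.log(s) / (alpha - 1)

    def pushforward(mu, region):
        inside = sum(mu[y] for y in region)
        return {"in": inside, "out": 1.0 - inside}

    alpha = 2.0
    mu1 = {"a": 0.7, "b": 0.2, "c": 0.1}
    mu2 = {"a": 0.1, "b": 0.2, "c": 0.7}

    full = renyi_divergence(mu1, mu2, alpha)
    outcomes = list(mu1)
    best_binary = max(
        renyi_divergence(pushforward(mu1, S), pushforward(mu2, S), alpha)
        for r in range(1, len(outcomes))  # non-trivial rejection regions only
        for S in itertools.combinations(outcomes, r)
    )
    print(full, best_binary)
    assert best_binary < full  # no binary decision rule recovers the full Renyi divergence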

Similar results can be shown for the divergences used for zCDP and tCDP for specific values of the privacy parameters (see the appendix).

In general, the Rényi divergence is exactly ∞-generated. First, it is not k-generated for any finite k.

Theorem 15.

For any α ∈ (1, ∞), the α-Rényi divergence is not k-generated for any finite k.

By Lemma 10, we conclude that the α-Rényi divergence is exactly ∞-generated. Moreover, thanks to the continuity of Rényi divergence (Liese and Vajda, 2006), we can generalize this result to uncountable domains and general probability measures.

On the hypothesis testing interpretation of Rényi divergence

The results above imply that we cannot have an analogue of Lemma 13 for Rényi divergence. Specifically, we cannot fully characterize the Rényi divergence between two distributions in terms of hypothesis tests—or more precisely, in terms of binary decision rules and the Rényi divergence over the set {0, 1}.

Let X be an infinite set. For every finite set Z and all μ1, μ2 ∈ P(X), the post-processing inequality gives

sup_{f : X → P(Z)} D_α(f♯μ1 ‖ f♯μ2) ≤ D_α(μ1 ‖ μ2),

and for some choices of μ1 and μ2 this inequality is strict, so if we consider only decision rules with a finite number of decisions, we do not fully capture the Rényi divergence between two distributions.

In fact, every divergence Δ can be approximated by a k-generated version by picking a set Z such that |Z| = k and setting:

Δ_k^X(μ1, μ2) = sup_{f : X → P(Z)} Δ^Z(f♯μ1, f♯μ2).

One example of this phenomenon is the 2-generated version of the Kullback-Leibler divergence. This is a well-studied divergence often referred to as the binary relative entropy. We can take a similar approach for the Rényi divergence of an arbitrary order α and study these restrictions. However, it is not clear whether these divergences would have good properties for privacy.

If instead one wants to focus just on the traditional version of Rényi divergence, its ∞-generatedness tells us that to fully characterize it through an experiment, we need to have an infinite number of possible decisions available.

5 A characterization of k-generated divergences

As we have seen, k-generated divergences satisfy a number of useful properties, and known divergences from the literature can be classified according to this parameter. In the other direction, we give a simple condition to ensure that a divergence is k-generated: suprema of quasi-convex functions over size-k partitions determine k-generated divergences.

Theorem 16.

Let X be a countable domain, let F : [0, 1]^k × [0, 1]^k → [0, ∞] be a quasi-convex function, and define the following divergence:

Δ_F^X(μ1, μ2) = sup_{(A_1, …, A_k)} F((μ1(A_1), …, μ1(A_k)), (μ2(A_1), …, μ2(A_k))),

where the supremum ranges over all partitions of X into k disjoint subsets A_1, …, A_k. Then the divergence Δ_F is k-generated.

We sketch the proof for discrete probability distributions; it also holds for general measures.

Proof.

One direction is not hard to show: any k-partition of X defines a map sending each point to the index of its block, which is a deterministic decision rule g : X → {1, …, k}; hence the supremum over decision rules in the definition of k-generatedness is at least Δ_F^X(μ1, μ2).

For the reverse direction, given a probabilistic decision rule f : X → P({1, …, k}), we can apply Theorem 12 to decompose f as a convex combination Σ_i λ_i · δ_{g_i} of deterministic decision rules g_i. By quasi-convexity of F, we have:

F(f♯μ1, f♯μ2) ≤ max_i F(g_i♯μ1, g_i♯μ2) ≤ Δ_F^X(μ1, μ2). ∎

As a converse to Theorem 11, this result characterizes k-generated quasi-convex divergences. It also serves as a useful tool to construct new divergences with a hypothesis testing interpretation, by varying the quasi-convex function F.
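As a concrete instance of the construction (a sketch with one particular choice of F, not an example from the paper), take k = 2 and F(p, q) = |p_1 − q_1|, which is quasi-convex; the resulting Δ_F is the total variation distance mentioned in Section 4.2, and brute-forcing the supremum over 2-partitions recovers the usual closed form.

    import itertools

    def delta_F(mu1, mu2):
        """Sup over 2-partitions (A, X \\ A) of F = |mu1(A) - mu2(A)| (Theorem 16, k = 2)."""
        outcomes = list(mu1)
        best = 0.0
        for r in range(len(outcomes) + 1):
            for A in itertools.combinations(outcomes, r):
                best = max(best, abs(sum(mu1[x] for x in A) - sum(mu2[x] for x in A)))
        return best

    def total_variation(mu1, mu2):
        return 0.5 * sum(abs(mu1[x] - mu2[x]) for x in mu1)

    mu1 = {"a": 0.5, "b": 0.3, "c": 0.2}
    mu2 = {"a": 0.2, "b": 0.3, "c": 0.5}
    print(delta_F(mu1, mu2), total_variation(mu1, mu2))  # both print 0.3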

6 Conclusion

In this paper we have shown that recent relaxations of differential privacy defined in terms of Rényi divergence do not have a hypothesis testing interpretation similar to the one for standard differential privacy. We introduced the notion of k-generatedness for a divergence, which quantifies the number of decisions that are needed in an experiment, similar to the ones used in hypothesis testing, to fully characterize the divergence. This notion is also a measure of the complexity that tools for formal verification may need to handle. We leave the study of this connection for future work.

References

Appendix A Weak Birkhoff-von Neumann Theorem

Theorem 17 (Weak Birkhoff-von Neumann theorem).

Let X and Z be finite sets with |X| = n and |Z| = k. For any f : X → P(Z), there are λ_1, …, λ_m ∈ [0, 1] with Σ_i λ_i = 1 and deterministic decision rules g_1, …, g_m : X → Z such that f(x) = Σ_i λ_i · δ_{g_i(x)} for any x ∈ X.

The cardinality k of Z can be relaxed to a countably infinite cardinal, and then the families {λ_i} and {g_i} may be infinite.

Proof.

Consider the following matrix representation of f: write X = {x_1, …, x_n} and Z = {z_1, …, z_k}, and let F be the n × k matrix with

F_{i,j} = f(x_i)(z_j),

so that every row of F sums to 1.

For any g : X → Z, the matrix representation of g is the 0/1 matrix G with G_{i,j} = 1 if g(x_i) = z_j and G_{i,j} = 0 otherwise, satisfying that for any i there is exactly one j such that G_{i,j} = 1. Conversely, any matrix satisfying this condition corresponds to some function g : X → Z. We give an algorithm decomposing F into a convex sum of such matrices:

  1. Let ~f_0 = F and r_0 = 1. We have Σ_j (~f_0)_{i,j} = r_0 for all i.

  2. For given ~f_m and r_m satisfying Σ_j (~f_m)_{i,j} = r_m for all i, we define α_{m+1}, r_{m+1}, g_{m+1}, and ~f_{m+1} as follows:

     α_{m+1} = min_s max_t (~f_m)_{s,t},
     r_{m+1} = r_m − α_{m+1},
     (g_{m+1})_{i,j} = 1 if j = argmax_s (~f_m)_{i,s} and 0 otherwise,
     ~f_{m+1} = ~f_m − α_{m+1} · g_{m+1}.

  3. If r_{m+1} = 0 then we terminate. Otherwise, we repeat the previous step.

In each step, we obtain the following conditions:

  • We have α_{m+1} ≥ 0, because α_{m+1} is the minimum over rows of the row maximum of a matrix with nonnegative entries.

  • We have (~f_{m+1})_{i,j} ≥ 0 whenever (~f_m)_{i,j} ≥ 0, because the only entry modified in row i is its maximal entry, and α_{m+1} ≤ max_t (~f_m)_{i,t}.

  • We have Σ_j (~f_{m+1})_{i,j} = r_{m+1} for any i from the following equation:

    Σ_j (~f_{m+1})_{i,j} = Σ_j (~f_m)_{i,j} − α_{m+1} = r_m − α_{m+1} = r_{m+1}.

    When s attains the minimum defining α_{m+1} and t = argmax_u (~f_m)_{s,u}, we obtain (~f_{m+1})_{s,t} = 0 while (~f_m)_{s,t} > 0. This implies that the number of zero entries in ~f_m increases in this operation.

  • We also have r_{m+1} ≥ 0 for all m, because α_{m+1} = min_s max_t (~f_m)_{s,t} ≤ Σ_t (~f_m)_{s,t} = r_m.

Therefore the construction of α_m, g_m, and ~f_m terminates within n·k steps. When the construction terminates at step M (so r_M = 0 also holds), we have a convex decomposition of F given by F = Σ_{m=1}^{M} α_m · g_m where Σ_{m=1}^{M} α_m = 1. By taking functions h_m : X → Z such that g_m is the matrix representation of h_m, we obtain f(x) = Σ_{m=1}^{M} α_m · δ_{h_m(x)} for any x ∈ X, with α_m ∈ [0, 1] and Σ_m α_m = 1. ∎
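The following Python sketch implements the greedy decomposition described in the proof (a minimal illustration; the matrix is represented as a list of rows, and each deterministic rule is represented by the column index chosen in every row).

    def decompose(f, tol=1e-12):
        """Greedily decompose a row-stochastic matrix f (a list of rows) into a convex
        combination sum_m alpha_m * g_m of deterministic decision rules; each rule is
        returned as the list of column indices chosen in every row."""
        residual = [row[:] for row in f]
        weights, rules = [], []
        remaining = 1.0
        while remaining > tol:
            # Column of the maximal entry in every row, and alpha = min over rows of that maximum.
            cols = [max(range(len(row)), key=row.__getitem__) for row in residual]
            alpha = min(row[j] for row, j in zip(residual, cols))
            if alpha <= tol:
                break
            weights.append(alpha)
            rules.append(cols)
            for row, j in zip(residual, cols):
                row[j] -= alpha
            remaining -= alpha
        return weights, rules

    f = [[0.5, 0.3, 0.2],
         [0.1, 0.6, 0.3]]
    weights, rules = decompose(f)
    print(weights, rules)

    # Reconstruct f from the decomposition to check correctness.
    recon = [[0.0] * len(f[0]) for _ in f]
    for alpha, cols in zip(weights, rules):
        for i, j in enumerate(cols):
            recon[i][j] += alpha
    print(recon)  # matches f up to floating-point error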

Appendix B Generalizing quasi-convex characterization of -generation

Recall that Theorem 16 shows that suprema of quasi-convex functions over k-partitions of a countable domain are k-generated quasi-convex divergences. This section generalizes this result to probability measures over general measurable spaces.

Theorem 18 (k-generatedness in the measurable setting).

Assume that F is quasi-convex and continuous. Then, for any measurable space X, the divergence Δ_F (defined as in Theorem 16, with the supremum taken over measurable k-partitions of X) satisfies

Δ_F^X(μ1, μ2) = sup_{f} Δ_F^{{1, …, k}}(f♯μ1, f♯μ2),

where the supremum ranges over measurable decision rules f : X → P({1, …, k}); that is, Δ_F is k-generated in the measurable setting.

Proof.

We easily calculate as follows (all functions are assumed to be measurable):

Note that we treat {1, …, k} as a finite discrete space. Consider the family of finite sets (discrete spaces) defined as follows:

We fix a measurable function f : X → P({1, …, k}) and treat P({1, …, k}) as a subset of [0, 1]^k. For each such f, we define a measurable partition of X by

We next define and as follows: is the unique element