1 Introduction
Differential privacy (Dwork et al., 2006) is a formal notion of data privacy that enables accurate statistical analyses of populations while preserving the privacy of the individuals contributing their data. Differential privacy is supported by a rich theory, which simplifies the design and formal analysis of private algorithms. This theory has helped make differential privacy a de facto standard for privacy-preserving data analysis. In recent years, differential privacy has come into use in the private sector (Kenthapadi et al., 2019) at companies such as Google (Erlingsson et al., 2014; Papernot et al., 2018), Apple (team at Apple, 2017), and Uber (Johnson et al., 2018), and in the public sector at agencies such as the U.S. Census Bureau (Abowd, 2018; Garfinkel et al., 2018). A common challenge across all uses of differential privacy is explaining it to users and policy makers. Indeed, differential privacy first emerged in the theoretical computer science community, and was only subsequently considered in other research areas interested in data privacy. For this reason, several works have attempted to provide different interpretations of the semantics of differential privacy in an effort to make it more accessible.
One approach that has been particularly successful, especially when introducing differential privacy to people versed in statistical data analysis, is the hypothesis testing interpretation of differential privacy (Wasserman and Zhou, 2010; Kairouz et al., 2015). One can imagine an experiment where one wants to test, through a differentially private mechanism, the null hypothesis that an individual $x$ (for every possible $x$) has contributed her data to a particular dataset $D$. One can also imagine that the alternative hypothesis is that the individual has not contributed her data. Then, the definition of differential privacy guarantees (and is in fact equivalent to requiring) that every hypothesis test designed for such an experiment has either a high significance level (it has a high rate of Type I errors) or low power (it has a high rate of Type II errors). In fact, this interpretation goes even further, because it also explains the privacy parameters as quantities regulating this experiment and the level of acceptable significance and power.
Recently, several relaxations of differential privacy have been proposed (Mironov, 2017; Bun and Steinke, 2016; Bun et al., 2018; Dong et al., 2019). Most of these new privacy definitions have been proposed as privacy notions with better composition properties than differential privacy. Better composition can become a key advantage when a large number of data accesses is needed for a single analysis (e.g., in private deep learning (Abadi et al., 2016)). Technically, many of these relaxations are formulated as bounds on the Rényi divergence between the distribution obtained when running a private mechanism over a dataset where an individual $x$ has contributed her data and the distribution obtained when the mechanism is run over the dataset where $x$'s data is removed.

In this work we show formally that the relaxations of differential privacy based on the Rényi divergence do not support the same hypothesis testing interpretation as differential privacy. The main technical reason for this is that the Rényi divergence has a finer granularity than the divergence that defines standard differential privacy. To quantify this difference we introduce the notion of $k$-generatedness for a divergence. Intuitively, this notion expresses the number of decisions that are needed in a test to fully characterize the divergence. We show that the divergence traditionally used for differential privacy is $2$-generated, and this allows one to interpret differential privacy according to the standard hypothesis testing interpretation. On the other hand, Rényi divergence is not $k$-generated for any finite $k$, though we show that it is $\infty$-generated (where by $\infty$ we mean that it is infinitely, but countably, generated). This says that to characterize these relaxations of differential privacy through an experiment similar to the one used in the hypothesis testing interpretation, one needs to have an infinite number of possible decisions available. This shows a semantic separation between standard differential privacy and relaxations based on Rényi divergence.
In addition, we study a sufficient condition guaranteeing that a divergence is $k$-generated: divergences defined as a supremum of a quasiconvex function $F$ over probabilities of $k$-partitions are $k$-generated. This allows one to construct divergences supporting the hypothesis testing interpretation by requiring them to be defined through an $F$ giving a $2$-generated divergence. The condition is also necessary for quasiconvex divergences, characterizing $k$-generatedness for all quasiconvex divergences.

Summarizing, our contributions are:

We introduce the notion of $k$-generatedness for divergences. This notion describes the complexity of a divergence in terms of the number of possible decisions that are needed in a test to fully characterize the divergence.

We show that the divergence used to characterize differential privacy is $2$-generated, supporting the usual hypothesis testing interpretation of differential privacy.

We show that Rényi divergence is $\infty$-generated, ruling out a hypothesis testing interpretation for privacy notions based on it.

We give sufficient and necessary conditions for a quasiconvex divergence to be $k$-generated.
Related work.
Several works have studied the semantics of formal notions of data privacy and differential privacy (Dwork, 2006; Wasserman and Zhou, 2010; Dwork and Roth, 2013; Kifer and Machanavajjhala, 2011, 2014; Hsu et al., 2014; Kasiviswanathan and Smith, 2015). The hypothesis testing interpretation of differential privacy was first introduced by Wasserman and Zhou (2010) and then used in a formal way to study the optimal composition theorem for differential privacy (Kairouz et al., 2015). Several works (Mironov, 2017; Bun and Steinke, 2016; Bun et al., 2018; Dong et al., 2019) have used divergences to reason about privacy leakage. As discussed in the introduction, several of these works are based on Rényi divergence (Mironov, 2017; Bun and Steinke, 2016; Bun et al., 2018). Dong et al. (2019) propose to define new notions of privacy based on the hypothesis testing interpretation; our work lends support to this direction, showing that other existing variants of privacy do not enjoy a hypothesis testing interpretation. The hypothesis testing interpretation of differential privacy has also inspired techniques in formal verification (Sato, 2016; Sato et al., 2017), including techniques to detect violations in differentially private implementations (Ding et al., 2018).
2 Background: hypothesis testing, privacy, and Rényi divergences
2.1 Hypothesis testing interpretation for differential privacy
We view randomized algorithms as functions $M : X \to \mathrm{Prob}(Y)$ from a set $X$ of inputs to the set $\mathrm{Prob}(Y)$ of discrete probability distributions over a set $Y$ of outputs. We assume that $X$ is equipped with a symmetric adjacency relation: informally, inputs are datasets, and two inputs $D$ and $D'$ are adjacent iff they differ in the data of a single individual.
Definition 1 (Differential Privacy (DP) (Dwork et al., 2006)).
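Let $\varepsilon \geq 0$ and $0 \leq \delta \leq 1$. A randomized algorithm $M : X \to \mathrm{Prob}(Y)$ is $(\varepsilon, \delta)$-differentially private if for every pair of adjacent inputs $D$ and $D'$, and every subset $S \subseteq Y$, we have:
$$\Pr[M(D) \in S] \leq e^{\varepsilon} \Pr[M(D') \in S] + \delta.$$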
Wasserman and Zhou (2010) and Kairouz et al. (2015) proposed a useful interpretation of this guarantee in terms of hypothesis testing. Suppose that $D$ and $D'$ are adjacent inputs. The observer sees the output of running a private mechanism $M$ on one of these inputs, but does not see the particular input, and wants to guess whether the input was $D$ or $D'$.
In the terminology of statistical hypothesis testing, let $y \in Y$ be an output of a randomized mechanism $M$, and take the following null and alternative hypotheses:
$$H_0 : y \text{ came from } D, \qquad H_1 : y \text{ came from } D'.$$
One simple way of deciding between the two hypotheses is to fix a rejection region $S \subseteq Y$; if the observation $y$ is in $S$ then the null hypothesis is rejected, and if the observation $y$ is not in $S$ then the null hypothesis is not rejected. These decision rules are known as deterministic decision rules.
Each decision rule can err in two possible ways. A false alarm (i.e. Type I error) occurs when the null hypothesis is true but rejected. This error rate is defined as $\mathrm{PFA}(D, D', M, S) = \Pr[M(D) \in S]$. On the other hand, the decision rule may incorrectly fail to reject the null hypothesis, a false negative (i.e. Type II error). The probability of missed detection is defined as $\mathrm{PMD}(D, D', M, S) = \Pr[M(D') \in \overline{S}]$, where $\overline{S} = Y \setminus S$. There is a natural tradeoff between these two errors: a rule with a larger rejection region will be less likely to incorrectly fail to reject but more likely to incorrectly reject, while a rule with a smaller rejection region will be less likely to incorrectly reject but more likely to incorrectly fail to reject.
Differential privacy can now be reformulated in terms of these error rates.
Theorem 2 (Wasserman and Zhou (2010); Kairouz et al. (2015)).
A randomized algorithm $M : X \to \mathrm{Prob}(Y)$ is $(\varepsilon, \delta)$-differentially private if and only if for every pair of adjacent inputs $D$ and $D'$, and any rejection region $S \subseteq Y$, we have: $\mathrm{PFA}(D, D', M, S) + e^{\varepsilon}\, \mathrm{PMD}(D, D', M, S) \geq 1 - \delta$ and $e^{\varepsilon}\, \mathrm{PFA}(D, D', M, S) + \mathrm{PMD}(D, D', M, S) \geq 1 - \delta$.
Intuitively, the lower bound on the sum of the two error rates means that no decision rule is capable of achieving low Type I error and low Type II error simultaneously. Thus, the output distributions from any two adjacent inputs are statistically hard to distinguish.
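As a minimal sketch of how this error-rate characterization can be checked numerically, the following Python snippet enumerates every rejection region for a binary randomized response mechanism and tests the two inequalities $\mathrm{PFA} + e^{\varepsilon}\,\mathrm{PMD} \geq 1 - \delta$ and $e^{\varepsilon}\,\mathrm{PFA} + \mathrm{PMD} \geq 1 - \delta$ (the form given by Kairouz et al. (2015)). The mechanism, the helper names (`subsets`, `error_rates`, `satisfies_theorem2`), and the choice $\varepsilon = \ln 3$ are illustrative assumptions and are not taken from the cited works.

```python
import math
from itertools import chain, combinations


def subsets(outputs):
    """Enumerate every subset of a finite output set (every rejection region)."""
    items = list(outputs)
    return chain.from_iterable(combinations(items, r) for r in range(len(items) + 1))


def error_rates(p_null, p_alt, region):
    """Return (Type I, Type II) error rates of the rule 'reject H0 iff y in region'."""
    pfa = sum(p_null[y] for y in region)                       # H0 true, yet rejected
    pmd = sum(q for y, q in p_alt.items() if y not in region)  # H1 true, yet H0 kept
    return pfa, pmd


def satisfies_theorem2(p_null, p_alt, eps, delta, tol=1e-12):
    """Check both error-rate inequalities for every rejection region."""
    bound = 1.0 - delta
    for region in subsets(p_null):
        pfa, pmd = error_rates(p_null, p_alt, region)
        if pfa + math.exp(eps) * pmd < bound - tol:
            return False
        if math.exp(eps) * pfa + pmd < bound - tol:
            return False
    return True


# Hypothetical mechanism: randomized response on one bit, which reports the
# true bit with probability e^eps / (1 + e^eps).
eps = math.log(3.0)
keep = math.exp(eps) / (1.0 + math.exp(eps))
out_d = {0: keep, 1: 1.0 - keep}        # output distribution on input D (true bit 0)
out_d_prime = {0: 1.0 - keep, 1: keep}  # output distribution on adjacent D' (true bit 1)

ok_at_eps = satisfies_theorem2(out_d, out_d_prime, eps, 0.0)
ok_at_half_eps = satisfies_theorem2(out_d, out_d_prime, eps / 2.0, 0.0)
print(ok_at_eps, ok_at_half_eps)
```

In this sketch the inequalities hold at the mechanism's own $\varepsilon$ and fail at $\varepsilon / 2$, since some rejection region then distinguishes the two inputs too well. Exhaustive enumeration is only feasible for very small output sets.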
Following Kairouz et al. (2015), we can also reformulate the definition of differential privacy in terms of a privacy region describing the attainable pairs of Type I and Type II errors.
Theorem 3 (Kairouz et al. (2015)).
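A randomized algorithm $M : X \to \mathrm{Prob}(Y)$ is $(\varepsilon, \delta)$-differentially private if and only if for every pair of adjacent inputs $D$ and $D'$, and every rejection region $S \subseteq Y$,
$$(\mathrm{PFA}(D, D', M, S),\ \mathrm{PMD}(D, D', M, S)) \in R(\varepsilon, \delta),$$
where $\mathrm{PFA}$ and $\mathrm{PMD}$ are the Type I and Type II error rates, and the privacy region is defined as:
$$R(\varepsilon, \delta) = \{ (x, y) \in [0,1] \times [0,1] \mid 1 - x \leq e^{\varepsilon} y + \delta \}.$$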
Since the original introduction of differential privacy, researchers have proposed several other variants based on Rényi divergence. The central question of this paper is: can we give similar hypothesis testing interpretations to these (and other) variants of differential privacy?
2.2 Relaxations of differential privacy based on Rényi divergence
We recall here notions of differential privacy based on Rényi divergence.
Definition 4 (Rényi divergence (Renyi, 1961)).
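Let $\alpha > 1$. The Rényi divergence of order $\alpha$ between two probability distributions $\mu_1$ and $\mu_2$ on a space $X$ is defined by:
$$D^{\alpha}_X(\mu_1 \| \mu_2) = \frac{1}{\alpha - 1} \log \sum_{x \in X} \mu_2(x) \left( \frac{\mu_1(x)}{\mu_2(x)} \right)^{\alpha}. \tag{1}$$
The above definition does not cover the cases $\alpha = 1$ and $\alpha = +\infty$. However, we can view the divergence as a function of $\alpha$ for fixed distributions and consider the limits. We have:
$$D^{1}_X(\mu_1 \| \mu_2) = \lim_{\alpha \to 1^{+}} D^{\alpha}_X(\mu_1 \| \mu_2) = \sum_{x \in X} \mu_1(x) \log \frac{\mu_1(x)}{\mu_2(x)}, \qquad D^{\infty}_X(\mu_1 \| \mu_2) = \lim_{\alpha \to +\infty} D^{\alpha}_X(\mu_1 \| \mu_2) = \log \sup_{x \in X} \frac{\mu_1(x)}{\mu_2(x)}.$$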
The first limit is the well-known KL divergence, while the second limit is the max divergence that bounds the pointwise ratio of probabilities; standard $(\varepsilon, 0)$-differential privacy bounds this divergence on distributions from adjacent inputs.
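The following Python sketch illustrates these definitions on finite distributions: it evaluates the Rényi divergence $\frac{1}{\alpha-1}\log\sum_x \mu_2(x)(\mu_1(x)/\mu_2(x))^{\alpha}$ and compares it with the KL divergence for $\alpha$ close to $1$ and with the max divergence for large $\alpha$. The function names and the two example distributions are assumptions chosen for illustration only.

```python
import math


def renyi_divergence(mu1, mu2, alpha):
    """(1/(alpha-1)) * log sum_x mu2(x) * (mu1(x)/mu2(x))**alpha, for finite alpha > 1."""
    total = 0.0
    for p, q in zip(mu1, mu2):
        if p == 0.0:
            continue          # points without mu1-mass contribute nothing
        if q == 0.0:
            return math.inf   # mu1 puts mass where mu2 has none
        total += q * (p / q) ** alpha
    return math.log(total) / (alpha - 1.0)


def kl_divergence(mu1, mu2):
    """Limit case alpha -> 1: sum_x mu1(x) * log(mu1(x)/mu2(x))."""
    total = 0.0
    for p, q in zip(mu1, mu2):
        if p == 0.0:
            continue
        if q == 0.0:
            return math.inf
        total += p * math.log(p / q)
    return total


def max_divergence(mu1, mu2):
    """Limit case alpha -> infinity: log of the largest pointwise ratio."""
    ratios = [math.inf if q == 0.0 else p / q for p, q in zip(mu1, mu2) if p > 0.0]
    return math.log(max(ratios))


# Illustrative distributions on a three-point space (not taken from the text).
mu1 = (0.5, 0.3, 0.2)
mu2 = (0.2, 0.3, 0.5)

near_one = renyi_divergence(mu1, mu2, 1.0001)
order_two = renyi_divergence(mu1, mu2, 2.0)
large_order = renyi_divergence(mu1, mu2, 200.0)
print(kl_divergence(mu1, mu2), near_one, order_two, large_order, max_divergence(mu1, mu2))
```

On this example the divergence increases with the order $\alpha$, moving from the KL divergence towards the max divergence.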
There are several notions of differential privacy based on Rényi divergence, differing in whether the bound holds for all orders or just some orders. The first notion we consider is Rényi Differential Privacy (RDP) (Mironov, 2017).
Definition 5 (Rényi Differential Privacy (RDP) (Mironov, 2017)).
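Let $\alpha \in [1, \infty]$. A randomized algorithm $M : X \to \mathrm{Prob}(Y)$ is $(\alpha, \rho)$-Rényi differentially private if for every pair $D$ and $D'$ of adjacent inputs, we have
$$D^{\alpha}_Y(M(D) \| M(D')) \leq \rho.$$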
Rényi differential privacy considers a fixed value of $\alpha$. In contrast, zero-Concentrated Differential Privacy (zCDP) (Bun and Steinke, 2016) quantifies over all possible $\alpha > 1$.
Definition 6 (zero-Concentrated Differential Privacy (zCDP) (Bun and Steinke, 2016)).
A randomized algorithm $M : X \to \mathrm{Prob}(Y)$ is $(\xi, \rho)$-zero concentrated differentially private if for every pair of adjacent inputs $D$ and $D'$, we have
$$\forall \alpha > 1.\quad D^{\alpha}_Y(M(D) \| M(D')) \leq \xi + \alpha \rho. \tag{2}$$
Truncated Concentrated Differential Privacy (tCDP) (Bun et al., 2018) quantifies over all $\alpha$ below a given threshold.
Definition 7 (Truncated Concentrated Differential Privacy (tCDP) (Bun et al., 2018)).
A randomized algorithm $M : X \to \mathrm{Prob}(Y)$ is $(\rho, \omega)$-truncated concentrated differentially private if for every pair of adjacent inputs $D$ and $D'$, we have
$$\forall\, 1 < \alpha < \omega.\quad D^{\alpha}_Y(M(D) \| M(D')) \leq \alpha \rho. \tag{3}$$
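These notions are all motivated by bounds on the privacy loss of a randomized algorithm $M$. This quantity is defined by
$$\mathcal{L}^{D \to D'}(y) = \log \frac{\Pr[M(D) = y]}{\Pr[M(D') = y]},$$
where $D$ and $D'$ are two adjacent inputs. Intuitively, the privacy loss measures how much information is revealed by an output $y$. While output values with a large privacy loss are highly revealing (they are far more likely to result from a private input $D$ than from a different private input $D'$), if these outputs are only seen with small probability then it may be reasonable to discount their influence. The different privacy definitions bound different moments of this privacy loss, treated as a random variable when $y$ is drawn from the output of the algorithm on input $D$. The following table summarizes these bounds, writing $\mathcal{L}$ for $\mathcal{L}^{D \to D'}(y)$ with $y \sim M(D)$.

| Privacy | Bound on privacy loss |
|---|---|
| $(\varepsilon, \delta)$-DP | $\Pr[\mathcal{L} \leq \varepsilon] \geq 1 - \delta$ |
| $(\alpha, \rho)$-RDP | $\mathbb{E}[e^{(\alpha - 1)\mathcal{L}}] \leq e^{(\alpha - 1)\rho}$ |
| $(\xi, \rho)$-zCDP | $\forall \alpha \in (1, \infty).\ \mathbb{E}[e^{(\alpha - 1)\mathcal{L}}] \leq e^{(\alpha - 1)(\xi + \alpha\rho)}$ |
| $(\rho, \omega)$-tCDP | $\forall \alpha \in (1, \omega).\ \mathbb{E}[e^{(\alpha - 1)\mathcal{L}}] \leq e^{(\alpha - 1)\alpha\rho}$ |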
In particular, DP bounds the maximum value of the privacy loss (technically speaking, this is true only for sufficiently well-behaved distributions (Meiser, 2018)), RDP bounds the $\alpha$-moment, zCDP bounds all moments, and tCDP bounds the moments up to some cutoff $\omega$. Many conversions are known between these definitions; for instance, RDP, zCDP, and tCDP are known to sit between $(\varepsilon, 0)$ and $(\varepsilon, \delta)$ differential privacy in terms of expressivity, up to some modification in the parameters. While this means that RDP, zCDP, and tCDP can sometimes be analyzed by reduction to standard differential privacy, converting between the different notions requires weakening the parameters, and often the privacy analysis is simpler or more precise when working with RDP, zCDP, or tCDP directly. The interested reader can refer to the original papers (Bun and Steinke, 2016; Mironov, 2017; Bun et al., 2018).
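The link between a Rényi divergence bound and a moment bound on the privacy loss can be made explicit with a short derivation. The sketch below assumes discrete output distributions with common support, writes $\mathcal{L}^{D \to D'}(y) = \log(\Pr[M(D) = y] / \Pr[M(D') = y])$ for the privacy loss, and takes $\alpha > 1$; the notation is chosen here for illustration.

```latex
\begin{align*}
D^{\alpha}_Y(M(D) \,\|\, M(D'))
  &= \frac{1}{\alpha - 1} \log \sum_{y \in Y} \Pr[M(D') = y]
     \left( \frac{\Pr[M(D) = y]}{\Pr[M(D') = y]} \right)^{\alpha} \\
  &= \frac{1}{\alpha - 1} \log \sum_{y \in Y} \Pr[M(D) = y]
     \left( \frac{\Pr[M(D) = y]}{\Pr[M(D') = y]} \right)^{\alpha - 1} \\
  &= \frac{1}{\alpha - 1} \log \mathbb{E}_{y \sim M(D)}
     \left[ e^{(\alpha - 1)\, \mathcal{L}^{D \to D'}(y)} \right].
\end{align*}
% Since alpha > 1, the bound D^alpha <= rho is therefore equivalent to
% E_{y ~ M(D)}[ exp((alpha - 1) L(y)) ] <= exp((alpha - 1) rho).
```

Under these assumptions, a bound $D^{\alpha}_Y(M(D) \| M(D')) \leq \rho$ is the same statement as a bound on one exponential moment of the privacy loss, which is how the RDP, zCDP, and tCDP conditions can be read as moment bounds.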
3 $k$-generated divergences
In this section we establish that RDP, zCDP and tCDP cannot be described in terms of hypothesis testing. Our main technical tool is the new notion of $k$-generatedness. We first formulate this notion for general divergences, then consider specific divergences from differential privacy.
3.1 Background and notation
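We use standard notation and terminology from discrete probability. We let $[0,1]$ and $[0,\infty]$ stand for the unit interval and the positive extended real line respectively. We let $\mathrm{Prob}(X)$ denote the set of probability distributions over a set $X$. When $X$ is a finite set with $k$ elements, i.e. $|X| = k$, we sometimes treat $\mathrm{Prob}(X)$ as a subset of $[0,1]^k$. Moreover, for every $x \in X$, the Dirac distribution $\mathbf{d}_x$ centered at $x$ is defined by $\mathbf{d}_x(x') = 1$ if $x' = x$ and $\mathbf{d}_x(x') = 0$ otherwise. Moreover, we define the convex combination of $\mu_1, \dots, \mu_n \in \mathrm{Prob}(X)$ with weights $a_1, \dots, a_n$ to be $(\sum_i a_i \mu_i)(x) = \sum_i a_i \mu_i(x)$. It is easy to check that for every $a_1, \dots, a_n \in [0,1]$ such that $\sum_i a_i = 1$, we have $\sum_i a_i \mu_i \in \mathrm{Prob}(X)$.
For any probability distribution $\mu \in \mathrm{Prob}(X)$ and $\gamma : X \to \mathrm{Prob}(Y)$, we define $\gamma(\mu) \in \mathrm{Prob}(Y)$ to be $\gamma(\mu)(y) = \sum_{x \in X} \gamma(x)(y) \cdot \mu(x)$ for every $y \in Y$. For any probability distribution $\mu \in \mathrm{Prob}(X)$ and a function $\gamma : X \to Y$ (called a deterministic decision rule), we define $\gamma(\mu) \in \mathrm{Prob}(Y)$ to be $\gamma(\mu)(y) = \sum_{x \in \gamma^{-1}(y)} \mu(x)$ for every $y \in Y$. We have $\gamma(\mu) = (\mathbf{d} \circ \gamma)(\mu)$, where $(\mathbf{d} \circ \gamma)(x) = \mathbf{d}_{\gamma(x)}$.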
3.2 Divergences between probability distributions
We start from a very general definition of divergences. Our notation includes the domain of definition of the divergence; this distinction will be important when introducing the concept of generatedness.
Definition 8.
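A divergence $\Delta$ is a family of functions
$$\Delta = \{ \Delta_X : \mathrm{Prob}(X) \times \mathrm{Prob}(X) \to [0, \infty] \}_{X \text{ discrete set}}.$$
We use the notation $\Delta_X(\mu_1 \| \mu_2)$ to denote the “distance” between distributions $\mu_1$ and $\mu_2$.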
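Our notion of divergence subsumes the general notion of $f$-divergence from the literature (Csiszár, 1963; Csiszár and Shields, 2004). Moreover, differential privacy can be reformulated using the $\varepsilon$-divergence $\Delta^{\varepsilon}$ (Barthe and Olmedo, 2013) defined as follows:
$$\Delta^{\varepsilon}_X(\mu_1 \| \mu_2) = \sup_{S \subseteq X} \left( \Pr[\mu_1 \in S] - e^{\varepsilon} \Pr[\mu_2 \in S] \right).$$
Specifically, a randomized algorithm $M : X \to \mathrm{Prob}(Y)$ is $(\varepsilon, \delta)$-differentially private if and only if for every pair of adjacent inputs $D$ and $D'$, we have $\Delta^{\varepsilon}_Y(M(D) \| M(D')) \leq \delta$.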
Many useful properties of divergences have been explored in the literature. Our technical development will involve the following two properties.

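(postprocessing inequality) A divergence $\Delta$ satisfies the postprocessing inequality iff for every $\gamma : X \to \mathrm{Prob}(Y)$ and every $\mu_1, \mu_2 \in \mathrm{Prob}(X)$, $\Delta_Y(\gamma(\mu_1) \| \gamma(\mu_2)) \leq \Delta_X(\mu_1 \| \mu_2)$.

(quasiconvexity) A divergence $\Delta$ is quasiconvex iff for every $a_1, \dots, a_n \in [0,1]$ such that $\sum_i a_i = 1$ and every discrete set $X$, we have $\Delta_X(\sum_i a_i \mu_i \| \sum_i a_i \mu_i') \leq \max_i \Delta_X(\mu_i \| \mu_i')$ for all $\mu_i, \mu_i' \in \mathrm{Prob}(X)$.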
These light restrictions are satisfied by many common divergences. Besides Rényi divergences, they also hold for all $f$-divergences (Csiszár, 1963; Csiszár and Shields, 2004).
3.3 $k$-generatedness: definitions and basic properties
We now introduce the notion of $k$-generatedness. Informally, $k$-generatedness is a measure of the number of decisions that are needed in a hypothesis test to characterize a divergence.
Definition 9.
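Let $k \in \mathbb{N} \cup \{\infty\}$. A divergence $\Delta$ is $k$-generated if there exists a set $Y$ such that $|Y| = k$ and for every discrete set $X$ and $\mu_1, \mu_2 \in \mathrm{Prob}(X)$,
$$\Delta_X(\mu_1 \| \mu_2) = \sup_{\gamma : X \to \mathrm{Prob}(Y)} \Delta_Y(\gamma(\mu_1) \| \gamma(\mu_2)).$$
We say that $\Delta$ is deterministically $k$-generated if there exists a set $Y$ such that $|Y| = k$ and for every discrete set $X$ and $\mu_1, \mu_2 \in \mathrm{Prob}(X)$,
$$\Delta_X(\mu_1 \| \mu_2) = \sup_{\gamma : X \to Y} \Delta_Y(\gamma(\mu_1) \| \gamma(\mu_2)).$$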
Lemma 10.
The following basic properties hold for all generated divergences.

If $\Delta$ is $1$-generated, then $\Delta$ is constant, i.e. for every discrete set $X$ there exists $c \in [0, \infty]$ such that for every $\mu_1, \mu_2 \in \mathrm{Prob}(X)$, we have $\Delta_X(\mu_1 \| \mu_2) = c$.

If $\Delta$ is (deterministically) $k$-generated, then it is also (deterministically) $(k+1)$-generated.

If $\Delta$ is deterministically $k$-generated and satisfies the postprocessing inequality, then it is also $k$-generated.

If $\Delta$ satisfies the postprocessing inequality, then it is also $\infty$-generated.
The following theorem shows that every $k$-generated divergence is also deterministically $k$-generated, so long as it is quasiconvex.
Theorem 11.
Any $k$-generated quasiconvex divergence is also deterministically $k$-generated.
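To prove the equivalence we use a weak version of the Birkhoff-von Neumann theorem, which states that every probabilistic decision rule can be decomposed into a convex combination of deterministic ones.
Theorem 12 (Weak Birkhoff-von Neumann).
Let $k \in \mathbb{N}$ and $l \in \mathbb{N}$. Let $X$ and $Y$ be sets such that $|X| = k$ and $|Y| = l$. Then for every $\gamma : X \to \mathrm{Prob}(Y)$, there exist $N \in \mathbb{N}$, $a_1, \dots, a_N \in [0,1]$ and $\gamma_1, \dots, \gamma_N : X \to Y$ such that $\sum_{i=1}^{N} a_i = 1$ and $\gamma(x) = \sum_{i=1}^{N} a_i \mathbf{d}_{\gamma_i(x)}$ for any $x \in X$.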
3.4 $2$-generatedness and hypothesis testing
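In general, $k$-generated divergences have a close connection to the number of decisions that are needed in a hypothesis test to fully characterize the divergence. For instance, a divergence that is $2$-generated has a straightforward interpretation in terms of the traditional hypothesis testing interpretation under probabilistic decision rules. For any $2$-generated divergence $\Delta$, we can define an analogous privacy region for hypothesis testing:
$$R^{\Delta}(\rho) = \{ (x, y) \in [0,1] \times [0,1] \mid \Delta_{\{\mathrm{Acc}, \mathrm{Rej}\}}((1 - x, x) \| (y, 1 - y)) \leq \rho \},$$
where a pair $(p, 1 - p)$ denotes the distribution assigning probability $p$ to $\mathrm{Acc}$ and $1 - p$ to $\mathrm{Rej}$.
From the isomorphism $\mathrm{Prob}(\{\mathrm{Acc}, \mathrm{Rej}\}) \cong [0,1]$, the following equivalence follows from definitions.
Lemma 13.
A divergence $\Delta$ is $2$-generated if and only if the following condition holds for every discrete set $X$, every $\mu_1, \mu_2 \in \mathrm{Prob}(X)$ and every $\rho \in [0, \infty]$:
$$\Delta_X(\mu_1 \| \mu_2) \leq \rho \iff \forall \gamma : X \to \mathrm{Prob}(\{\mathrm{Acc}, \mathrm{Rej}\}).\ (\Pr[\gamma(\mu_1) = \mathrm{Rej}],\ \Pr[\gamma(\mu_2) = \mathrm{Acc}]) \in R^{\Delta}(\rho).$$
Here, every function of type $X \to \mathrm{Prob}(\{\mathrm{Acc}, \mathrm{Rej}\})$ can be seen as a (probabilistic) decision rule, determining the acceptance or rejection of a null hypothesis. Therefore, the probabilities $\Pr[\gamma(\mu_1) = \mathrm{Rej}]$ and $\Pr[\gamma(\mu_2) = \mathrm{Acc}]$ can be seen as the Type I error and Type II error of the corresponding test.
Hence, the above lemma says that $\Delta$ is $2$-generated if and only if we can bound, according to the region $R^{\Delta}(\rho)$, the Type I and Type II errors of every test. Moreover, if a divergence is quasiconvex, this is equivalent to hypothesis testing under the more common deterministic decision rules. Thus, for quasiconvex and $2$-generated $\Delta$, we have the condition $(\Pr[\mu_1 \in S],\ \Pr[\mu_2 \notin S]) \in R^{\Delta}(\rho)$ on Type I and Type II errors under every rejection region $S \subseteq X$ whenever $\Delta_X(\mu_1 \| \mu_2) \leq \rho$.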
4 Applications to differential privacy
4.1 $(\varepsilon, \delta)$-differential privacy is $2$-generated
In our framework, the hypothesis testing interpretation of differential privacy follows from the fact that the $\varepsilon$-divergence is $2$-generated.
Theorem 14.
The $\varepsilon$-divergence $\Delta^{\varepsilon}$ is $2$-generated for all $\varepsilon \geq 0$.
By Lemma 13 and Theorems 11 and 14, we can reprove that $(\varepsilon, \delta)$-differential privacy can be characterized by hypothesis testing with both probabilistic and deterministic decision rules.
It is worth noticing that the result above says that the $\varepsilon$-divergence can be fully characterized in terms of traditional hypothesis tests, i.e. in terms of binary decision rules and the $\varepsilon$-divergence over a two-element space. This means that we do not lose anything in looking at differential privacy through the lens of the hypothesis testing interpretation. This is not the case for other privacy definitions based on Rényi divergence, as we will show in the next section.
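A small numerical sketch can make this concrete. It assumes the definition $\Delta^{\varepsilon}_X(\mu_1 \| \mu_2) = \sup_{S \subseteq X} (\mu_1(S) - e^{\varepsilon} \mu_2(S))$, uses the fact that this supremum is attained by the set of points where $\mu_1(x) > e^{\varepsilon} \mu_2(x)$, and compares it with the best value reachable through deterministic binary decision rules. The helper names and the example distributions are hypothetical.

```python
import math
from itertools import product


def eps_divergence(mu1, mu2, eps):
    """sup_S (mu1(S) - e^eps * mu2(S)), attained by S = {x : mu1(x) > e^eps * mu2(x)}."""
    return sum(max(p - math.exp(eps) * q, 0.0) for p, q in zip(mu1, mu2))


def pushforward(mu, rule, k):
    """Distribution of the decision when point i is mapped to decision rule[i]."""
    out = [0.0] * k
    for mass, decision in zip(mu, rule):
        out[decision] += mass
    return out


def best_binary_test(mu1, mu2, eps):
    """Largest divergence between the outcomes of any deterministic binary rule."""
    n = len(mu1)
    return max(
        eps_divergence(pushforward(mu1, rule, 2), pushforward(mu2, rule, 2), eps)
        for rule in product(range(2), repeat=n)
    )


# Illustrative distributions on a four-point space (not taken from the text).
mu1 = (0.5, 0.3, 0.15, 0.05)
mu2 = (0.2, 0.3, 0.1, 0.4)
eps = 0.2

full = eps_divergence(mu1, mu2, eps)
binary = best_binary_test(mu1, mu2, eps)
print(full, binary)
```

In this example the two printed values agree up to rounding, which is what $2$-generatedness predicts: binary tests already recover the divergence on the full space.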
4.2 Other examples
Along similar lines to what we showed for the $\varepsilon$-divergence, one can show that the total variation distance (given by $\mathrm{TV}_X(\mu_1 \| \mu_2) = \frac{1}{2} \sum_{x \in X} |\mu_1(x) - \mu_2(x)|$) is also $2$-generated.
Recently, Dong et al. (2019) proposed a formal definition of data privacy based on the notion of tradeoff function and satisfying the hypothesis testing interpretation, similarly to differential privacy. We can characterize the tradeoff functions between Type I errors and Type II errors they use by the following family of divergences
By using we obtain the actual tradeoff function. It is easy to show that this family of divergences is also $2$-generated.
4.3 Rényi divergence is $\infty$-generated
Rényi divergence is not $2$-generated. To see this let and let be defined by and , and . Let . Then a simple calculation shows:
Similar results can be shown for the divergences used for zCDP and tCDP for specific values of the privacy parameters (see the appendix).
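The failure of $2$-generatedness can also be observed numerically. The sketch below uses its own illustrative numbers (two distributions on a three-point space and order $\alpha = 2$, which are assumptions and not the example from the text) and compares the Rényi divergence on the full space with the best value obtained through deterministic binary decision rules. Because Rényi divergence is quasiconvex, randomized binary rules cannot do better than the best deterministic one.

```python
import math
from itertools import product


def renyi_divergence(mu1, mu2, alpha):
    """(1/(alpha-1)) * log sum_x mu2(x) * (mu1(x)/mu2(x))**alpha, for finite alpha > 1."""
    total = 0.0
    for p, q in zip(mu1, mu2):
        if p == 0.0:
            continue          # points without mu1-mass contribute nothing
        if q == 0.0:
            return math.inf   # mu1 puts mass where mu2 has none
        total += q * (p / q) ** alpha
    return math.log(total) / (alpha - 1.0)


def pushforward(mu, rule, k):
    """Distribution of the decision when point i is mapped to decision rule[i]."""
    out = [0.0] * k
    for mass, decision in zip(mu, rule):
        out[decision] += mass
    return out


# Illustrative choice of distributions and order.
mu1 = (0.5, 0.3, 0.2)
mu2 = (0.2, 0.3, 0.5)
alpha = 2.0

full = renyi_divergence(mu1, mu2, alpha)
best_binary = max(
    renyi_divergence(pushforward(mu1, rule, 2), pushforward(mu2, rule, 2), alpha)
    for rule in product(range(2), repeat=len(mu1))
)
print(full, best_binary)
```

Here the divergence on the three-point space is $\log 1.63$, while no binary rule exceeds $\log 1.5625$, so for these distributions two decisions are not enough to recover the Rényi divergence.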
In general, Rényi divergence is exactly $\infty$-generated. First, it is not $k$-generated for any finite $k$.
Theorem 15.
For any $\alpha \in (1, \infty)$, the Rényi divergence $D^{\alpha}$ is not $k$-generated for any finite $k$.
By Lemma 10, we conclude that Rényi divergence is exactly $\infty$-generated. Moreover, thanks to the continuity of Rényi divergence (Liese and Vajda, 2006), we can generalize this result to uncountable domains and general probability measures.
On the hypothesis testing interpretation of Rényi divergence
The results above imply that we cannot have an analogue of Lemma 13 for Rényi divergence. Specifically, we cannot fully characterize the Rényi divergence between two distributions in terms of hypothesis tests, or more precisely, in terms of binary decision rules and Rényi divergence over a two-element set.
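Let $X$ be an infinite set. For every finite set $Y$, we have for some $\mu_1, \mu_2 \in \mathrm{Prob}(X)$,
$$\sup_{\gamma : X \to \mathrm{Prob}(Y)} D^{\alpha}_Y(\gamma(\mu_1) \| \gamma(\mu_2)) \leq D^{\alpha}_X(\mu_1 \| \mu_2),$$
but this inequality is strict, so if we consider only decision rules with a finite number of decisions, we do not fully capture the Rényi divergence between two distributions.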
In fact, every divergence $\Delta$ can be approximated by a $k$-generated version $\overline{\Delta}^{k}$ by picking a set $Y$ such that $|Y| = k$ and setting:
$$\overline{\Delta}^{k}_X(\mu_1 \| \mu_2) = \sup_{\gamma : X \to \mathrm{Prob}(Y)} \Delta_Y(\gamma(\mu_1) \| \gamma(\mu_2)).$$
One example of this phenomenon is the $2$-generated version $\overline{\mathrm{KL}}^{2}$ of the Kullback-Leibler divergence $\mathrm{KL}$. This is a well-studied divergence often referred to as the binary relative entropy. We can take a similar approach for the Rényi divergence of an arbitrary order $\alpha$, and study these restrictions. However, it is not clear whether these divergences would give good properties for privacy.

If instead one wants to focus just on the traditional version of Rényi divergence, the $\infty$-generatedness tells us that to fully characterize it through an experiment, we need to have an infinite number of possible decisions available.
5 A characterization of $k$-generated divergences
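As we have seen, $k$-generated divergences satisfy a number of useful properties; known divergences from the literature can be classified according to this parameter $k$. In the other direction, we give a simple condition to ensure that a divergence is $k$-generated: suprema of quasiconvex functions over size-$k$ partitions determine $k$-generated divergences.

Theorem 16.
Let $X$ be a countable domain, let $F : [0,1]^{2k} \to [0, \infty]$ be a quasiconvex function, and define the following divergence:
$$\Delta^{F}_X(\mu_1 \| \mu_2) = \sup_{\{A_i\}_{i=1}^{k} \text{ partition of } X} F(\mu_1(A_1), \dots, \mu_1(A_k), \mu_2(A_1), \dots, \mu_2(A_k)).$$
Then the divergence $\Delta^{F}$ is $k$-generated.
We sketch the proof for discrete probability distributions; it also holds for general measures.
Proof.
We compare $\Delta^{F}_X(\mu_1 \| \mu_2)$ with $\sup_{\gamma} \Delta^{F}_{[k]}(\gamma(\mu_1) \| \gamma(\mu_2))$, where $[k]$ is a $k$-element set and $\gamma$ ranges over decision rules $X \to \mathrm{Prob}([k])$. The direction $\leq$ is not hard to show: any partition defines a map from each point to its part, which is a deterministic decision rule $\gamma : X \to [k]$.
For the reverse direction $\geq$, given a decision rule $\gamma : X \to \mathrm{Prob}([k])$ we can apply Theorem 12 to decompose $\gamma$ as a convex combination $\gamma = \sum_j a_j (\mathbf{d} \circ \gamma_j)$, where each $\mathbf{d} \circ \gamma_j$ corresponds to a deterministic decision rule $\gamma_j : X \to [k]$. By quasiconvexity of $F$, we have:
$$\Delta^{F}_{[k]}(\gamma(\mu_1) \| \gamma(\mu_2)) \leq \max_j \Delta^{F}_{[k]}(\gamma_j(\mu_1) \| \gamma_j(\mu_2)) \leq \Delta^{F}_X(\mu_1 \| \mu_2).$$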
As a converse to Theorem 11, this result characterizes $k$-generated quasiconvex divergences. It also serves as a useful tool to construct new divergences with a hypothesis testing interpretation, by varying the quasiconvex function $F$.
6 Conclusion
In this paper we have shown that recent relaxations of differential privacy defined in terms of Rényi divergence do not have a hypothesis testing interpretation similar to the one for standard differential privacy. We introduced the notion of $k$-generatedness for a divergence, which quantifies the number of decisions that are needed in an experiment similar to the ones used in hypothesis testing to fully characterize the divergence. This notion is also a measure of the complexity that tools for formal verification may have. We leave the study of this connection for future work.
References
 Abadi et al. [2016] Martín Abadi, Andy Chu, Ian J. Goodfellow, H. Brendan McMahan, Ilya Mironov, Kunal Talwar, and Li Zhang. Deep learning with differential privacy. In ACM SIGSAC Conference on Computer and Communications Security (CCS), Vienna, Austria, pages 308–318, 2016. doi: 10.1145/2976749.2978318. URL http://doi.acm.org/10.1145/2976749.2978318.
 Abowd [2018] John M. Abowd. The U.S. census bureau adopts differential privacy. In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, KDD 2018, London, UK, August 19-23, 2018, page 2867, 2018. doi: 10.1145/3219819.3226070. URL https://doi.org/10.1145/3219819.3226070.
 Barthe and Olmedo [2013] Gilles Barthe and Federico Olmedo. Beyond differential privacy: Composition theorems and relational logic for divergences between probabilistic programs. In International Colloquium on Automata, Languages and Programming (ICALP), Riga, Latvia, volume 7966 of Lecture Notes in Computer Science, pages 49–60. Springer-Verlag, 2013. doi: 10.1007/978-3-642-39212-2_8. URL https://certicrypt.gforge.inria.fr/2013.ICALP.pdf.
 Bun and Steinke [2016] Mark Bun and Thomas Steinke. Concentrated differential privacy: Simplifications, extensions, and lower bounds. In IACR Theory of Cryptography Conference (TCC), Beijing, China, volume 9985 of Lecture Notes in Computer Science, pages 635–658. Springer-Verlag, 2016. doi: 10.1007/978-3-662-53641-4_24.

 Bun et al. [2018] Mark Bun, Cynthia Dwork, Guy N. Rothblum, and Thomas Steinke. Composable and versatile privacy via truncated CDP. In ACM SIGACT Symposium on Theory of Computing (STOC), Los Angeles, California, 2018.
 Csiszár [1963] Imre Csiszár. Eine informationstheoretische ungleichung und ihre anwendung auf den beweis der ergodizitat von markoffschen ketten. Magyar. Tud. Akad. Mat. Kutató Int. Közl, 8:85–108, 1963.
 Csiszár and Shields [2004] I. Csiszár and P.C. Shields. Information theory and statistics: A tutorial. Foundations and Trends® in Communications and Information Theory, 1(4):417–528, 2004. ISSN 1567-2190. doi: 10.1561/0100000004. URL http://dx.doi.org/10.1561/0100000004.
 Ding et al. [2018] Zeyu Ding, Yuxin Wang, Guanhong Wang, Danfeng Zhang, and Daniel Kifer. Detecting violations of differential privacy. In Proceedings of the 2018 ACM SIGSAC Conference on Computer and Communications Security, CCS 2018, Toronto, ON, Canada, October 15-19, 2018, pages 475–489, 2018. doi: 10.1145/3243734.3243818. URL https://doi.org/10.1145/3243734.3243818.
 Dong et al. [2019] Jinshuo Dong, Aaron Roth, and Weijie J. Su. Gaussian Differential Privacy. arXiv e-prints, art. arXiv:1905.02383, May 2019.
 Dwork [2006] Cynthia Dwork. Differential privacy. In International Colloquium on Automata, Languages and Programming (ICALP), Venice, Italy, pages 1–12. 2006. URL https://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.83.7534&rep=rep1&type=pdf.
 Dwork and Roth [2013] Cynthia Dwork and Aaron Roth. The algorithmic foundations of differential privacy. Foundations and Trends® in Theoretical Computer Science, 9(3–4), 2013. ISSN 1551-305X. doi: 10.1561/0400000042. URL http://dx.doi.org/10.1561/0400000042.
 Dwork et al. [2006] Cynthia Dwork, Frank McSherry, Kobbi Nissim, and Adam D. Smith. Calibrating noise to sensitivity in private data analysis. In IACR Theory of Cryptography Conference (TCC), New York, New York, volume 3876 of Lecture Notes in Computer Science, pages 265–284. Springer-Verlag, 2006. doi: 10.1007/11681878_14.
 Erlingsson et al. [2014] Úlfar Erlingsson, Vasyl Pihur, and Aleksandra Korolova. RAPPOR: randomized aggregatable privacy-preserving ordinal response. In Proceedings of the 2014 ACM SIGSAC Conference on Computer and Communications Security, Scottsdale, AZ, USA, November 3-7, 2014, pages 1054–1067, 2014. doi: 10.1145/2660267.2660348. URL https://doi.org/10.1145/2660267.2660348.
 Garfinkel et al. [2018] Simson L. Garfinkel, John M. Abowd, and Sarah Powazek. Issues encountered deploying differential privacy. In Proceedings of the 2018 Workshop on Privacy in the Electronic Society, WPES@CCS 2018, Toronto, ON, Canada, October 15-19, 2018, pages 133–137, 2018. doi: 10.1145/3267323.3268949. URL https://doi.org/10.1145/3267323.3268949.
 Hsu et al. [2014] Justin Hsu, Marco Gaboardi, Andreas Haeberlen, Sanjeev Khanna, Arjun Narayan, Benjamin C. Pierce, and Aaron Roth. Differential privacy: An economic method for choosing epsilon. In IEEE 27th Computer Security Foundations Symposium, CSF 2014, Vienna, Austria, 19-22 July, 2014, pages 398–410, 2014. doi: 10.1109/CSF.2014.35. URL https://doi.org/10.1109/CSF.2014.35.
 Johnson et al. [2018] Noah M. Johnson, Joseph P. Near, and Dawn Song. Towards practical differential privacy for SQL queries. PVLDB, 11(5):526–539, 2018. URL http://www.vldb.org/pvldb/vol11/p526johnson.pdf.

 Kairouz et al. [2015] Peter Kairouz, Sewoong Oh, and Pramod Viswanath. The composition theorem for differential privacy. In Proceedings of the 32nd International Conference on Machine Learning, ICML 2015, Lille, France, 6-11 July 2015, pages 1376–1385, 2015. URL http://jmlr.org/proceedings/papers/v37/kairouz15.html.
 Kasiviswanathan and Smith [2015] Shiva Prasad Kasiviswanathan and Adam D. Smith. On the ‘semantics’ of differential privacy: A bayesian formulation. CoRR, abs/0803.3946, 2015. URL http://arxiv.org/abs/0803.3946.
 Kenthapadi et al. [2019] Krishnaram Kenthapadi, Ilya Mironov, and Abhradeep Guha Thakurta. Privacy-preserving data mining in industry. In Proceedings of the Twelfth ACM International Conference on Web Search and Data Mining, WSDM 2019, Melbourne, VIC, Australia, February 11-15, 2019, pages 840–841, 2019. doi: 10.1145/3289600.3291384. URL https://doi.org/10.1145/3289600.3291384.
 Kifer and Machanavajjhala [2011] Daniel Kifer and Ashwin Machanavajjhala. No free lunch in data privacy. In Proceedings of the ACM SIGMOD International Conference on Management of Data, SIGMOD 2011, Athens, Greece, June 12-16, 2011, pages 193–204, 2011. doi: 10.1145/1989323.1989345. URL https://doi.org/10.1145/1989323.1989345.
 Kifer and Machanavajjhala [2014] Daniel Kifer and Ashwin Machanavajjhala. Pufferfish: A framework for mathematical privacy definitions. ACM Trans. Database Syst., 39(1):3:1–3:36, 2014. doi: 10.1145/2514689. URL https://doi.org/10.1145/2514689.
 Liese and Vajda [2006] Friedrich Liese and Igor Vajda. On divergences and informations in statistics and information theory. IEEE Transactions on Information Theory, 52(10):4394–4412, Oct 2006. ISSN 0018-9448. doi: 10.1109/TIT.2006.881731.
 Meiser [2018] Sebastian Meiser. Approximate and probabilistic differential privacy definitions. IACR Cryptology ePrint Archive, 2018:277, 2018. URL https://eprint.iacr.org/2018/277.
 Mironov [2017] Ilya Mironov. Rényi differential privacy. In IEEE Computer Security Foundations Symposium (CSF), Santa Barbara, California, pages 263–275, 2017. doi: 10.1109/CSF.2017.11. URL https://arxiv.org/abs/1702.07476.
 Papernot et al. [2018] Nicolas Papernot, Shuang Song, Ilya Mironov, Ananth Raghunathan, Kunal Talwar, and Úlfar Erlingsson. Scalable private learning with PATE. In 6th International Conference on Learning Representations, ICLR 2018, Vancouver, BC, Canada, April 30  May 3, 2018, Conference Track Proceedings, 2018. URL https://openreview.net/forum?id=rkZB1XbRZ.
 Renyi [1961] Alfred Renyi. On measures of entropy and information. In Berkeley Symposium on Mathematical Statistics and Probability, Volume 1: Contributions to the Theory of Statistics, pages 547–561, Berkeley, Calif., 1961. University of California Press. URL http://projecteuclid.org:443/euclid.bsmsp/1200512181.
 Sato [2016] Tetsuya Sato. Approximate relational Hoare logic for continuous random samplings. Electronic Notes in Theoretical Computer Science, 325:277–298, 2016. ISSN 1571-0661. doi: https://doi.org/10.1016/j.entcs.2016.09.043. URL http://www.sciencedirect.com/science/article/pii/S1571066116300949. Conference on the Mathematical Foundations of Programming Semantics (MFPS), Pittsburgh, Pennsylvania.
 Sato et al. [2017] Tetsuya Sato, Gilles Barthe, Marco Gaboardi, Justin Hsu, and Shinya Katsumata. Approximate span liftings: Compositional semantics for relaxations of differential privacy. CoRR, abs/1710.09010, 2017. URL http://arxiv.org/abs/1710.09010. (The extended version of this paper).
 team at Apple [2017] Differential Privacy team at Apple. Learning with privacy at scale, 2017. https://machinelearning.apple.com/2017/12/06/learningwithprivacyatscale.html.
 Wasserman and Zhou [2010] Larry Wasserman and Shuheng Zhou. A statistical framework for differential privacy. Journal of the American Statistical Association, 105(489):375–389, 2010. doi: 10.1198/jasa.2009.tm08651. URL https://doi.org/10.1198/jasa.2009.tm08651.
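Appendix A Weak Birkhoff-von Neumann Theorem
Theorem 17 (Weak Birkhoff-von Neumann theorem).
Let $X$ and $Y$ be finite sets with $|X| = k$ and $|Y| = l$. For any $\gamma : X \to \mathrm{Prob}(Y)$, there are $a_1, \dots, a_N \in [0,1]$ and $\gamma_1, \dots, \gamma_N : X \to Y$ such that $\sum_{i=1}^{N} a_i = 1$ and $\gamma(x) = \sum_{i=1}^{N} a_i \mathbf{d}_{\gamma_i(x)}$ for any $x \in X$.
The cardinal $k$ can be relaxed to the countably infinite cardinal, and then the families $\{a_i\}$ and $\{\gamma_i\}$ may be infinite.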
Proof.
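Consider the following matrix representation $f$ of $\gamma$, writing $X = \{x_1, \dots, x_k\}$ and $Y = \{y_1, \dots, y_l\}$:
$$f = (f_{i,j})_{1 \leq i \leq k,\ 1 \leq j \leq l}$$
where $f_{i,j} = \gamma(x_i)(y_j)$ and $\sum_{j} f_{i,j} = 1$ for any $i$.
For any $\gamma' : X \to Y$, the matrix representation $g$ of $\mathbf{d} \circ \gamma'$ is
$$g_{i,j} = \begin{cases} 1 & \gamma'(x_i) = y_j \\ 0 & \text{(otherwise)} \end{cases}$$
satisfying that for any $i$, there is exactly one $j$ such that $g_{i,j} = 1$ and $g_{i,j'} = 0$ for $j' \neq j$. Conversely, any matrix satisfying this condition corresponds to some function $X \to Y$. Consider the family $G$ of matrix representations of maps of the form $\mathbf{d} \circ \gamma'$. We give an algorithm decomposing $f$ into a convex sum of elements of $G$:

Let $\tilde{f}_0 = f$ and $r_0 = 1$. We have $\sum_j (\tilde{f}_0)_{i,j} = r_0$ for all $i$.

For given $\tilde{f}_m$ and $r_m$ satisfying $\sum_j (\tilde{f}_m)_{i,j} = r_m$ for all $i$, we define $\alpha_{m+1}$, $r_{m+1}$, $g_{m+1}$, and $\tilde{f}_{m+1}$ as follows:
$$\begin{aligned}
\alpha_{m+1} &= \min_s \max_t\, (\tilde{f}_m)_{s,t}, \\
r_{m+1} &= r_m - \alpha_{m+1}, \\
(g_{m+1})_{i,j} &= \begin{cases} 1 & j = \operatorname{argmax}_s (\tilde{f}_m)_{i,s} \\ 0 & \text{(otherwise)} \end{cases}, \\
\tilde{f}_{m+1} &= \tilde{f}_m - \alpha_{m+1} \cdot g_{m+1}.
\end{aligned}$$

If $r_{m+1} = 0$ then we terminate. Otherwise, we repeat the previous step.
In each step, we obtain the following conditions:

We have $g_{m+1} \in G$ because $g_{m+1}$ can be written as $\mathbf{d} \circ \gamma_{m+1}$, where $\gamma_{m+1}(x_i) = y_j$ for $j = \operatorname{argmax}_s (\tilde{f}_m)_{i,s}$.

We have $(\tilde{f}_{m+1})_{i,j} \geq 0$ whenever all entries of $\tilde{f}_m$ are nonnegative because
$$(\tilde{f}_{m+1})_{i,j} = \begin{cases} \max_t (\tilde{f}_m)_{i,t} - \min_s \max_t (\tilde{f}_m)_{s,t} & j = \operatorname{argmax}_s (\tilde{f}_m)_{i,s} \\ (\tilde{f}_m)_{i,j} & \text{(otherwise)}. \end{cases}$$

We have $\sum_j (\tilde{f}_{m+1})_{i,j} = r_{m+1}$ for any $i$ from the following equation:
$$\sum_j (\tilde{f}_{m+1})_{i,j} = \sum_j (\tilde{f}_m)_{i,j} - \alpha_{m+1} \sum_j (g_{m+1})_{i,j} = r_m - \alpha_{m+1}.$$
When $s = \operatorname{argmin}_s \max_t (\tilde{f}_m)_{s,t}$ and $t = \operatorname{argmax}_t (\tilde{f}_m)_{s,t}$, we obtain $(\tilde{f}_{m+1})_{s,t} = 0$ while $(\tilde{f}_m)_{s,t} > 0$ (since $r_m > 0$). This implies that the number of $0$ entries in $\tilde{f}_m$ increases in this operation.

We also have $0 \leq (\tilde{f}_{m+1})_{i,j} \leq r_{m+1}$ for all $i, j$ because the entries of $\tilde{f}_{m+1}$ are nonnegative and each row sums to $r_{m+1}$.
Therefore the construction of $\alpha_m$, $r_m$, $g_m$, and $\tilde{f}_m$ terminates within $k \cdot l$ steps. When the construction terminates at the step $N$ ($\tilde{f}_N = 0$ also holds), we have a convex decomposition of $f$ by $f = \sum_{m=1}^{N} \alpha_m g_m$ where $\sum_{m=1}^{N} \alpha_m = r_0 - r_N = 1$. By taking $\gamma_m$ such that $g_m$ is a matrix representation of $\mathbf{d} \circ \gamma_m$, we obtain $\gamma(x) = \sum_{m=1}^{N} a_m \mathbf{d}_{\gamma_m(x)}$ for any $x \in X$ with $a_m = \alpha_m$ and $\sum_{m=1}^{N} a_m = 1$. ∎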
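The greedy procedure in this proof can be sketched in a few lines of Python. The version below is an illustration under assumptions: it represents a decision rule as a row-stochastic matrix of `fractions.Fraction` entries so that the termination test is exact, breaks ties in the argmax by taking the smallest column index, and uses the hypothetical function name `decompose`.

```python
from fractions import Fraction


def decompose(matrix):
    """Write a row-stochastic matrix as a convex combination of 0/1 matrices
    that have exactly one 1 per row.

    Returns a list of (weight, rule) pairs, where rule[i] is the column
    selected for row i by the corresponding deterministic rule.
    """
    if any(sum(row) != 1 for row in matrix):
        raise ValueError("every row must sum to 1")
    residual = [list(row) for row in matrix]
    remaining = Fraction(1)            # common row sum of the residual matrix
    parts = []
    while remaining > 0:
        # Column of the largest entry in each row (ties: smallest index).
        rule = [max(range(len(row)), key=row.__getitem__) for row in residual]
        # Weight = minimum over rows of the row maximum.
        weight = min(row[j] for row, j in zip(residual, rule))
        for row, j in zip(residual, rule):
            row[j] -= weight           # at least one entry drops to zero
        remaining -= weight
        parts.append((weight, tuple(rule)))
    return parts


F = Fraction
matrix = [
    [F(1, 2), F(1, 2), F(0)],
    [F(1, 3), F(1, 3), F(1, 3)],
    [F(1, 4), F(1, 4), F(1, 2)],
]
parts = decompose(matrix)

# Rebuild the matrix from the decomposition as a sanity check.
rebuilt = [
    [sum(w for w, rule in parts if rule[i] == j) for j in range(3)]
    for i in range(3)
]
print(parts)
```

Exact rational arithmetic is a design choice of this sketch: with floating-point entries the test `remaining > 0` may never become false because of rounding, so a tolerance would be needed instead.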
Appendix B Generalizing quasiconvex characterization of $k$-generatedness
Recall that Theorem 16 shows that suprema of quasiconvex functions over $k$-partitions of a countable domain are $k$-generated quasiconvex divergences. This section generalizes this result to probability measures over general measurable spaces.
Theorem 18 ($k$-generatedness in measurable setting).
Assume that is quasiconvex and continuous. For any measurable space , we have
Proof.
We easily calculate as follows (functions are assumed to be measurable):
Note that we treat as a finite discrete space. Consider the family of finite sets (discrete spaces) defined as follows:
We fix a measurable function and treat as a subset of . For each , we define a measurable partition of by
We next define and as follows: is the unique element satisfying , and we choose