1 Introduction
Differential privacy (DP) (Dwork et al., 2006) has emerged over the last decade as a strong de facto standard for privacy-preserving computation in the context of statistical analysis. The success of DP is based, at least in part, on the availability of robust building blocks (e.g., the Laplace, exponential and Gaussian mechanisms) together with relatively simple rules for analyzing complex mechanisms built out of these blocks (e.g., composition and robustness to postprocessing). The inherent tension between privacy and utility in practical applications has sparked a renewed interest in the development of further rules leading to tighter privacy bounds. One trend in this direction is to find ways to measure the privacy introduced by sources of randomness that are not accounted for by standard composition rules. Generally speaking, these are referred to as privacy amplification rules, with prominent examples being amplification by subsampling (Chaudhuri and Mishra, 2006; Kasiviswanathan et al., 2011; Li et al., 2012; Beimel et al., 2013, 2014; Bun et al., 2015; Balle et al., 2018; Wang et al., 2019), shuffling (Erlingsson et al., 2019; Cheu et al., 2019; Balle et al., 2019) and iteration (Feldman et al., 2018).
Motivated by these considerations, in this paper we initiate a systematic study of privacy amplification by stochastic postprocessing. Specifically, given a DP mechanism producing (probabilistic) outputs in and a Markov operator defining a stochastic transition between and , we are interested in measuring the privacy of the postprocessed mechanism producing outputs in . The standard postprocessing property of DP states that is at least as private as . Our goal is to understand under what conditions the postprocessed mechanism is strictly more private than . Roughly speaking, this amplification should be nontrivial when the operator “forgets” information about the distribution of its input . Our main insight is that, at least when , the forgetfulness of from the point of view of DP can be measured using similar tools to the ones developed to analyze the speed of convergence, i.e. mixing, of the Markov process associated with .
In this setting, we provide three types of results, each associated with a standard method used in the study of convergence for Markov processes. First, Section 3 provides DP amplification results for the case where the operator satisfies a uniform mixing condition. These include standard conditions used in the analysis of Markov chains on discrete spaces, including the well-known Dobrushin coefficient and Doeblin’s minorization condition (Levin and Peres, 2017). Although in principle uniform mixing conditions can also be defined in more general non-discrete spaces (Del Moral et al., 2003), most Markov operators of interest in do not exhibit uniform mixing since the speed of convergence depends on how far apart the initial inputs are. Convergence analyses in this case rely on more sophisticated tools, including Lyapunov functions (Meyn and Tweedie, 2012), coupling methods (Lindvall, 2002) and functional inequalities (Bakry et al., 2013).
Following these ideas, Section 4 investigates the use of coupling methods to quantify privacy amplification by postprocessing under Rényi DP (Mironov, 2017). These methods apply to operators given by, e.g., Gaussian and Laplace distributions, for which uniform mixing does not hold. Results in this section are intimately related to the privacy amplification by iteration phenomenon studied in (Feldman et al., 2018) and can be interpreted as extensions of their main results to more general settings. In particular, our analysis provides sharper bounds when iterating strict contractions and leads to an exponential improvement on the privacy amplification by iteration of Noisy SGD in the strongly convex case.
Our last set of results concerns the case where is replaced by a family of operators forming a Markov semigroup (Bakry et al., 2013). This is the natural setting for continuous-time Markov processes, and includes diffusion processes defined in terms of stochastic differential equations (Øksendal, 2003). In Section 5 we associate (a collection of) diffusion mechanisms to a diffusion semigroup. Interestingly, these mechanisms are, by construction, closed under postprocessing in the sense that . We show the Gaussian mechanism falls into this family (since Gaussian noise is closed under addition) and also present a new mechanism based on the Ornstein-Uhlenbeck process which in many cases has better mean squared error than the Gaussian mechanism. Our main result on diffusion mechanisms provides a generic Rényi DP guarantee based on an intrinsic notion of sensitivity derived from the geometry induced by the semigroup. The proof relies on a heat flow argument reminiscent of the analysis of mixing in diffusion processes based on functional inequalities (Bakry et al., 2013).
2 Background
We start by introducing notation and concepts that will be used throughout the paper. We write , and .
Probability.
Let be a measurable space with sigma-algebra and base measure . We write to denote the set of probability distributions on . Given a probability distribution and a measurable event we write for a random variable , denote its expectation under by , and can get back its distribution as . Given two distributions (or, in general, arbitrary measures) we write to denote that is absolutely continuous with respect to , in which case there exists a Radon-Nikodym derivative . We shall reserve the notation to denote the density of with respect to the base measure. We also write to denote the set of couplings between and ; i.e. is a distribution on with marginals and . The support of a distribution is .
Markov Operators.
We will use to denote the set of Markov operators defining a stochastic transition map between and and satisfying that is measurable for every measurable . Markov operators act on distributions on the left through , and on functions on the right through , which can also be written as with . The kernel of a Markov operator (with respect to ) is the function associating with the density of with respect to a fixed measure.
Divergences.
A popular way to measure dissimilarity between distributions is to use Csiszár divergences , where is convex with . Taking yields the total variation distance , and the choice with gives the hockey-stick divergence , which satisfies
It is easy to check that is monotonically decreasing and . All Csiszár divergences satisfy joint convexity and the data processing inequality for any Markov operator . Rényi divergences, which do not belong to the family of Csiszár divergences, are another way to compare distributions. For the Rényi divergence of order is defined as , and also satisfies the data processing inequality. Finally, to measure similarity between we sometimes use the Wasserstein distance:
Differential Privacy.
A mechanism is a randomized function that takes a dataset over some universe of records and returns a (sample from) distribution . We write to denote two datasets differing in a single record. We say that satisfies DP if (Dwork et al., 2006); this divergence characterization of DP is due to Barthe and Olmedo (2013). Furthermore, we say that satisfies RDP if (Mironov, 2017).
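For distributions on a finite set, the divergences of this section can be evaluated directly. The sketch below is illustrative only and assumes the standard definitions, which are not reproduced in the text above: total variation $\frac{1}{2}\sum_z |\mu(z)-\nu(z)|$, hockey-stick divergence $\sum_z [\mu(z) - e^{\varepsilon}\nu(z)]_+$, and Rényi divergence $\frac{1}{\alpha-1}\log\sum_z \mu(z)^{\alpha}\nu(z)^{1-\alpha}$. The function names and the binary randomized response mechanism are hypothetical choices made for the example.

```python
import math

def total_variation(mu, nu):
    """TV distance between two distributions given as equal-length lists."""
    return 0.5 * sum(abs(p - q) for p, q in zip(mu, nu))

def hockey_stick(mu, nu, eps):
    """Hockey-stick divergence of order e^eps: sum of [mu - e^eps * nu]_+ ."""
    return sum(max(p - math.exp(eps) * q, 0.0) for p, q in zip(mu, nu))

def renyi(mu, nu, alpha):
    """Renyi divergence of order alpha > 1; assumes nu > 0 wherever mu > 0."""
    total = sum(p ** alpha * q ** (1.0 - alpha) for p, q in zip(mu, nu) if p > 0)
    return math.log(total) / (alpha - 1.0)

def push_forward(mu, kernel):
    """Distribution mu K, where kernel[y][z] is the probability of moving y -> z."""
    return [sum(mu[y] * kernel[y][z] for y in range(len(mu)))
            for z in range(len(kernel[0]))]

# Binary randomized response (illustrative): report the true bit with probability p.
eps = 1.0
p = math.exp(eps) / (1.0 + math.exp(eps))
out_0, out_1 = [p, 1.0 - p], [1.0 - p, p]

# Under the divergence characterization of DP, the smallest delta at level eps
# is the larger of the two hockey-stick divergences; here it vanishes up to rounding.
delta = max(hockey_stick(out_0, out_1, eps), hockey_stick(out_1, out_0, eps))
print("delta at eps = 1:", delta)
```

Applying `push_forward` with any row-stochastic kernel to both output distributions can only decrease each of these divergences, which is the data processing inequality mentioned above.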
3 Amplification From Uniform Mixing
We start our analysis of privacy amplification by stochastic postprocessing by considering settings where the Markov operator satisfies one of the following uniform mixing conditions.
Definition 1.
Let be a Markov operator, and . We say that is:

Dobrushin if ,

Dobrushin if ,

Doeblin if there exists a distribution such that for all ,

ultramixing if for all we have and .
Most of these conditions arise in the context of mixing analyses in Markov chains. In particular, the Dobrushin condition can be traced back to (Dobrushin, 1956), while Doeblin’s condition was introduced earlier (Doeblin, 1937) (see also (Nummelin, 2004)). Ultramixing is a strengthening of Doeblin’s condition used in (Del Moral et al., 2003). The Dobrushin condition is, on the other hand, new and is designed to be a generalization of Dobrushin tailored for amplification under the hockey-stick divergence.
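For a Markov operator on a finite space, represented as a row-stochastic matrix, the constants in these conditions can be computed by brute force. The sketch below assumes the usual forms of the conditions, which are not legible in Definition 1 as reproduced here: the Dobrushin constant is the largest total variation distance between two rows, the best Doeblin constant is one minus the total mass of the column-wise minimum, and the ultramixing constant is one minus the smallest ratio between entries in the same column. The function names and the example matrix are hypothetical.

```python
def dobrushin_constant(kernel):
    """Largest total variation distance between any two rows of the kernel."""
    return max(
        0.5 * sum(abs(a - b) for a, b in zip(row_i, row_j))
        for row_i in kernel
        for row_j in kernel
    )

def doeblin_constant(kernel):
    """Smallest gamma such that every row dominates (1 - gamma) * omega for some omega."""
    n_out = len(kernel[0])
    return 1.0 - sum(min(row[z] for row in kernel) for z in range(n_out))

def ultramixing_constant(kernel):
    """Smallest gamma such that K[y][z] >= (1 - gamma) * K[y'][z]; needs positive entries."""
    n_out = len(kernel[0])
    smallest_ratio = min(
        row_i[z] / row_j[z]
        for row_i in kernel
        for row_j in kernel
        for z in range(n_out)
    )
    return 1.0 - smallest_ratio

K = [
    [0.5, 0.3, 0.2],
    [0.2, 0.5, 0.3],
    [0.3, 0.2, 0.5],
]
print(dobrushin_constant(K), doeblin_constant(K), ultramixing_constant(K))
```

On this example the three constants are ordered from smallest (Dobrushin) to largest (ultramixing), in line with ultramixing being the most demanding condition and Dobrushin the least.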
It is not hard to see that Dobrushin’s condition is the weakest among these conditions, and in fact we have the implications summarized in Figure 2 (see Lemma 3). This explains why the amplification bounds in the following result are increasingly strong, and in particular why the first two only provide amplification in , while the last two also amplify the parameter.
The implications in Figure 2 hold.
Proof.
That Dobrushin implies Dobrushin follows directly from .
To see that Doeblin implies Dobrushin we observe that the kernel of a Doeblin operator must satisfy for any . Thus, we can use the characterization of in terms of a minimum to get
Finally, to get the Doeblin condition for an operator satisfying ultramixing we recall from (Del Moral et al., 2003, Lemma 4.1) that for such an operator we have that is satisfied for any probability distribution and . Thus, taking to have full support we obtain Doeblin’s condition with . ∎
Let be an DP mechanism. For a given Markov operator , the postprocessed mechanism satisfies:
(1) DP with if is Dobrushin,
(2) DP with if is Dobrushin with (we take the convention whenever , in which case the Dobrushin condition is obtained with respect to the divergence ),
(3) DP with and if is Doeblin,
(4) DP with and if is ultramixing.
A few remarks about this result are in order. First we note that (2) is stronger than (1) since the monotonicity of hockeystick divergences implies . Also note how in the results above we always have , and in fact the form of is the same as obtained under amplification by subsampling when, e.g., a fraction of the original dataset is kept. This is not a coincidence since the proofs of (3) and (4) leverage the overlapping mixtures technique used to analyze amplification by subsampling in (Balle et al., 2018). However, we note that for (3) we can have even with . In fact the Doeblin condition only leads to an amplification in if .
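These remarks can be explored numerically. The sketch below assumes the following closed forms, which are not legible in the theorem as reproduced here and should be checked against the published statement: $\delta' = \gamma\delta$ for a $\gamma$-Dobrushin operator; $\varepsilon' = \log(1 + \gamma(e^{\varepsilon} - 1))$ for both Doeblin and ultramixing operators (the same form as amplification by subsampling); $\delta' = \gamma(1 - e^{\varepsilon' - \varepsilon}(1 - \delta))$ under Doeblin; and $\delta' = \gamma\delta e^{\varepsilon' - \varepsilon}$ under ultramixing. All function names are hypothetical.

```python
import math

def amplified_eps(eps, gamma):
    """Assumed form eps' = log(1 + gamma * (e^eps - 1)), as in subsampling."""
    return math.log(1.0 + gamma * (math.exp(eps) - 1.0))

def dobrushin_bound(eps, delta, gamma):
    """Assumed guarantee for a gamma-Dobrushin operator: only delta shrinks."""
    return eps, gamma * delta

def doeblin_bound(eps, delta, gamma):
    """Assumed guarantee for a gamma-Doeblin operator."""
    new_eps = amplified_eps(eps, gamma)
    return new_eps, gamma * (1.0 - math.exp(new_eps - eps) * (1.0 - delta))

def ultramixing_bound(eps, delta, gamma):
    """Assumed guarantee for a gamma-ultramixing operator."""
    new_eps = amplified_eps(eps, gamma)
    return new_eps, gamma * delta * math.exp(new_eps - eps)

eps, delta, gamma = 1.0, 1e-5, 0.5
for name, bound in [("Dobrushin", dobrushin_bound),
                    ("Doeblin", doeblin_bound),
                    ("ultramixing", ultramixing_bound)]:
    print(name, bound(eps, delta, gamma))

# Under the assumed Doeblin form, a pure-DP input (delta = 0) can still
# yield a strictly positive delta' after postprocessing.
print(doeblin_bound(eps, 0.0, gamma))
```

With these forms, setting `gamma = 1.0` returns the original parameters in every case, which matches the plain postprocessing property.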
For convenience, we split the proof of Theorem 3 into four separate statements, each corresponding to one of the claims in the theorem.
Recall that a Markov operator is Dobrushin if .
Let be an DP mechanism. If is a Dobrushin Markov operator, then the composition is DP.
Proof.
This follows directly from the strong Markov contraction lemma established by Cohen et al. (1993) in the discrete case and by Del Moral et al. (2003) in the general case (see also (Raginsky, 2016)). In particular, this lemma states that for any divergence in the sense of Csiszár we have . Letting and for some and applying this inequality to yields the result. ∎
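The contraction used in this proof is easy to probe numerically on finite spaces. The sketch below assumes the contraction takes the form $D(\mu K \| \nu K) \le \gamma\, D(\mu \| \nu)$ with $\gamma$ the largest total variation distance between rows of $K$, and tests it for the hockey-stick divergence on randomly drawn kernels and distributions. It is a sanity check under those assumptions, not a proof, and every name in it is hypothetical.

```python
import math
import random

def hockey_stick(mu, nu, eps):
    """Hockey-stick divergence of order e^eps between finite distributions."""
    return sum(max(p - math.exp(eps) * q, 0.0) for p, q in zip(mu, nu))

def push_forward(mu, kernel):
    """Distribution mu K for a row-stochastic kernel."""
    return [sum(mu[y] * kernel[y][z] for y in range(len(mu)))
            for z in range(len(kernel[0]))]

def dobrushin_constant(kernel):
    """Largest total variation distance between any two rows."""
    return max(0.5 * sum(abs(a - b) for a, b in zip(r, s))
               for r in kernel for s in kernel)

def random_distribution(n, rng):
    """Normalize n uniform draws into a probability vector."""
    weights = [rng.random() for _ in range(n)]
    total = sum(weights)
    return [w / total for w in weights]

def worst_violation(trials=200, n=4, eps=0.3, seed=0):
    """Largest observed value of D(mu K || nu K) - gamma * D(mu || nu)."""
    rng = random.Random(seed)
    worst = -float("inf")
    for _ in range(trials):
        kernel = [random_distribution(n, rng) for _ in range(n)]
        mu, nu = random_distribution(n, rng), random_distribution(n, rng)
        gamma = dobrushin_constant(kernel)
        lhs = hockey_stick(push_forward(mu, kernel), push_forward(nu, kernel), eps)
        worst = max(worst, lhs - gamma * hockey_stick(mu, nu, eps))
    return worst

print(worst_violation())  # should not exceed zero beyond rounding error
```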
Next we prove amplification when is a Dobrushin operator. Recall that a Markov operator is Dobrushin if . We will require the following technical lemmas in the proof of Theorem 3.
Let denote the fact . If is Dobrushin, then we have
Proof.
Note that the condition on can be written as . This shows that by hypothesis the condition already holds for the distributions with . Thus, all we need to do is prove that these distributions are extremal for among all distributions with . Let and define and . Working in the discrete setting for simplicity, we can write , with an equivalent expression for . Now we use the joint convexity of to write
∎
Let . Then we have
Proof.
Define to be the set of points where is dominated by , and let denote its complement. Then we have the identities
Thus we obtain the desired result since
∎
Let be an DP mechanism and let . If is a Dobrushin Markov operator, then the composition is DP.
Proof.
Fix and for some and let . We start by constructing overlapping mixture decompositions for and as follows. First, define the function and let be the probability distribution with density , where we used Lemma 3. Now note that by construction we have the inequalities
Assuming without loss of generality that , these inequalities imply that we can construct probability distributions and such that
Now we observe that the distributions and defined in this way have disjoint support. To see this we first use the identity to see that
Thus we have . A similar argument applied to shows that on the other hand , and thus .
Finally, we proceed to use the mixture decomposition of and and the condition to bound as follows. By using the mixture decompositions we get
where . Thus, applying the definition of , using the linearity of Markov operators, and the monotonicity we obtain the bound:
where the last inequality follows from Lemma 3. ∎
Recall that a Markov operator is Doeblin if there exists a distribution such that for all . The proof of amplification for Doeblin operators further leverages overlapping mixture decompositions like the one used in Theorem 3, but this time the mixture arises at the level of the kernel itself.
Let be an DP mechanism. If is a Doeblin Markov operator, then the composition is DP with and .
Proof.
Fix and for some . Let be a witness that is Doeblin and let be the constant Markov operator given by for all . Doeblin’s condition implies that the following is again a Markov operator:
Thus, we can write as the mixture and then use the advanced joint convexity property of (Balle et al., 2018, Theorem 2) with to obtain the following:
where . Finally, using the immediate bounds and , we get
∎
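The mixture decomposition at the heart of this proof can be made concrete for a finite kernel. The sketch below assumes the Doeblin condition reads $K(y) \ge (1-\gamma)\,\omega$ for every input $y$, takes $(1-\gamma)\,\omega$ to be the column-wise minimum of the matrix, and forms the residual operator $\tilde{K} = (K - (1-\gamma)K_\omega)/\gamma$, where $K_\omega$ is the constant operator returning $\omega$. The names and the example matrix are hypothetical, and the code assumes $0 < \gamma < 1$.

```python
def doeblin_decomposition(kernel):
    """Split K into (1 - gamma) * K_omega + gamma * K_tilde; assumes 0 < gamma < 1."""
    n_out = len(kernel[0])
    # Column-wise minimum: the largest measure dominated by every row.
    minorant = [min(row[z] for row in kernel) for z in range(n_out)]
    mass = sum(minorant)                      # equals 1 - gamma
    gamma = 1.0 - mass
    omega = [m / mass for m in minorant]      # witness distribution
    residual = [[(row[z] - minorant[z]) / gamma for z in range(n_out)]
                for row in kernel]
    return gamma, omega, residual

K = [
    [0.5, 0.3, 0.2],
    [0.2, 0.5, 0.3],
    [0.3, 0.2, 0.5],
]
gamma, omega, K_tilde = doeblin_decomposition(K)
print("gamma:", gamma)
print("omega:", omega)
for row in K_tilde:
    print(row, "row sum:", sum(row))
```

Each row of `K_tilde` is nonnegative and sums to one, so the residual is again a Markov operator, which is the property the proof relies on.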
Our last amplification result applies to operators satisfying the ultramixing condition of Del Moral et al. (2003). We say that a Markov operator is ultramixing if for all we have and . The proof strategy is based on the ideas from the previous proof, although in this case the argument is slightly more technical as it involves a strengthening of the Doeblin condition implied by ultramixing that only holds under a specific support.
Let be an DP mechanism. If is a ultramixing Markov operator, then the composition is DP with and .
Proof.
Fix and for some . The proof follows a similar strategy as the one used in Theorem 3, but coupled with the following consequence of the ultramixing property: for any probability distribution and we have (Del Moral et al., 2003, Lemma 4.1). We use this property to construct a collection of mixture decompositions for as follows. Let and take and . By the ultramixing condition and the argument used in the proof of Theorem 3, we can show that
is a Markov operator from into . Here is the constant Markov operator . Furthermore, the expression for and the definition of imply that
(1) 
Now note that the mixture decompositions and and the advanced joint convexity property of (Balle et al., 2018, Theorem 2) with yield
where . Using (1) we can expand the remaining divergence above as follows:
where we used the definition of and joint convexity. Since was arbitrary, we can now take the limit to obtain the bound . ∎
We conclude this section by noting that the conditions in Definition 1, despite being quite natural, might be too stringent for proving amplification for DP mechanisms on, say, . One way to see this is to interpret the operator as a mechanism and to note that the uniform mixing conditions on can be rephrased in terms of local DP (LDP) (Kasiviswanathan et al., 2011) properties (see Table 2; the blanket condition is a necessary condition for LDP introduced by Balle et al. (2019) to analyze privacy amplification by shuffling), where the supremum is taken over any pair of inputs (instead of neighboring ones). This motivates the results in the next section, where we look for finer conditions to prove amplification by stochastic postprocessing.
4 Amplification From Couplings
In this section we turn to coupling-based proofs of amplification by postprocessing under the Rényi DP framework. Our first result is a measure-theoretic generalization of the shift-reduction lemma in (Feldman et al., 2018) which does not rely on the vector-space structure of the underlying space.
Given a coupling with , we construct a transport Markov operator with kernel , where and (here we use the convention ). It is immediate to verify from the definition that is a Markov operator satisfying the transport property .
Let , and . For any distribution and coupling we have
(2) 
Proof.
Let and be as in the statement, and let . Note that taking and to be the corresponding transport operators we have . Now, given a let denote the marginal of on the second coordinate. In particular, if denotes the joint distribution of and , then we have . Thus, by the data processing inequality we have
The final step is to expand the RHS of the derivation above as follows:
where the suprema are taken with respect to . ∎
Note that this result captures the data processing inequality for Rényi divergences since taking yields . The next examples illustrate the use of this theorem to obtain amplification by operators corresponding to the addition of Gaussian and Laplace noise.
Example 1 (Tightness).
To show that (2) is tight we consider the simple scenario of adding Gaussian noise to the output of a Gaussian mechanism. In particular, suppose for some function with global sensitivity and the Markov operator is given by . The postprocessed mechanism is given by , which satisfies RDP. We now show how this result also follows from Theorem 4. Given two datasets we write and with . We take for some to be determined later, and couple and through a translation , yielding a coupling with and a transport operator with kernel . Plugging these into (2) we get
Finally, taking with yields .
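The target value in this example can be checked numerically. The sketch below assumes the standard closed form for the Rényi divergence between two Gaussians with a common variance, $\alpha\Delta^2/(2\sigma^2)$, and uses the fact that adding independent Gaussian noise sums the variances. The scalar setting, the parameter values and the function names are illustrative assumptions; the integral is approximated with a plain trapezoid rule.

```python
import math

def rdp_gaussian(alpha, sensitivity, sigma):
    """Assumed closed form: Renyi divergence of order alpha between N(0, sigma^2)
    and N(sensitivity, sigma^2)."""
    return alpha * sensitivity ** 2 / (2.0 * sigma ** 2)

def log_normal_pdf(x, mean, var):
    """Log-density of a one-dimensional Gaussian."""
    return -0.5 * math.log(2.0 * math.pi * var) - (x - mean) ** 2 / (2.0 * var)

def renyi_numeric(alpha, mean_p, mean_q, var, lo=-30.0, hi=30.0, n=60000):
    """Trapezoid approximation of (1/(alpha-1)) * log of the integral of p^alpha q^(1-alpha)."""
    step = (hi - lo) / n
    total = 0.0
    for i in range(n + 1):
        x = lo + i * step
        weight = 0.5 if i in (0, n) else 1.0
        log_integrand = (alpha * log_normal_pdf(x, mean_p, var)
                         + (1.0 - alpha) * log_normal_pdf(x, mean_q, var))
        total += weight * math.exp(log_integrand)
    return math.log(total * step) / (alpha - 1.0)

alpha, sensitivity = 2.0, 1.0
sigma_mechanism, sigma_operator = 1.0, 0.5

# Noise of the postprocessed mechanism: the two variances add up.
sigma_total = math.hypot(sigma_mechanism, sigma_operator)

print("mechanism alone :", rdp_gaussian(alpha, sensitivity, sigma_mechanism))
print("postprocessed   :", rdp_gaussian(alpha, sensitivity, sigma_total))
print("numerical check :", renyi_numeric(alpha, sensitivity, 0.0, sigma_total ** 2))
```

The postprocessed value is strictly smaller than the value for the mechanism alone, which is the amplification that the coupling argument in this example is meant to recover.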
Example 2 (Iterated Laplace).
To illustrate the flexibility of this technique, we also apply it to get an amplification result for iterated Laplace noise, in which Laplace noise is added to the output of a Laplace mechanism. We begin with a negative result: there is no amplification in the DP regime.
Let for some function with global sensitivity and let the Markov operator be given by . The postprocessed mechanism does not achieve DP for any . Note that achieves DP and achieves DP.
Proof.
This can be shown by directly analyzing the distribution arising from the sum of two independent Laplace variables. Let denote this distribution. In the following equations, we assume . Due to symmetry around the origin, densities at negative values can be found by looking instead at the corresponding positive location.
The integration on the middle term varies between the cases and . Finishing this derivation and replacing with to account for both positive and negative values, we get a complete expression for our density.
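As a numerical companion to this derivation, the sketch below assumes the standard density of the sum of two independent centered Laplace variables with scales $b_1$ and $b_2$ (symbols chosen here, not taken from the text): $\frac{b_1 e^{-|z|/b_1} - b_2 e^{-|z|/b_2}}{2(b_1^2 - b_2^2)}$ when $b_1 \neq b_2$, and $\frac{1}{4b}\left(1 + \frac{|z|}{b}\right)e^{-|z|/b}$ when $b_1 = b_2 = b$. It compares these formulas with a direct numerical convolution and then evaluates the log-ratio of densities far in the tail. All names are hypothetical.

```python
import math

def laplace_pdf(x, scale):
    """Density of a centered Laplace distribution."""
    return math.exp(-abs(x) / scale) / (2.0 * scale)

def sum_laplace_pdf(z, b1, b2):
    """Assumed closed-form density of the sum of two independent Laplace variables."""
    a = abs(z)
    if b1 == b2:
        return (1.0 + a / b1) * math.exp(-a / b1) / (4.0 * b1)
    return (b1 * math.exp(-a / b1) - b2 * math.exp(-a / b2)) / (2.0 * (b1 ** 2 - b2 ** 2))

def convolve_numeric(z, b1, b2, half_width=40.0, step=0.001):
    """Trapezoid approximation of the convolution of the two Laplace densities at z."""
    n = int(round(2.0 * half_width / step))
    total = 0.0
    for i in range(n + 1):
        t = -half_width + i * step
        weight = 0.5 if i in (0, n) else 1.0
        total += weight * laplace_pdf(t, b1) * laplace_pdf(z - t, b2)
    return total * step

def privacy_loss(output, sensitivity, b1, b2):
    """Log-ratio of the output densities for two inputs whose values differ by `sensitivity`."""
    return math.log(sum_laplace_pdf(output, b1, b2)
                    / sum_laplace_pdf(output - sensitivity, b1, b2))

for z, b1, b2 in [(0.0, 1.0, 1.0), (1.5, 1.0, 1.0), (1.5, 1.0, 2.0)]:
    print(z, b1, b2, sum_laplace_pdf(z, b1, b2), convolve_numeric(z, b1, b2))

# Far in the tail the loss approaches sensitivity / max(b1, b2).
print(privacy_loss(-30.0, 1.0, 1.0, 2.0))
```

Under these assumptions the tail behaviour of the log-ratio is governed by the larger of the two scales, so adding the second layer of Laplace noise does not improve the pure DP parameter beyond what the larger scale alone provides.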