1 Introduction
The Gaussian mechanism is the workhorse for a multitude of differentially private learning algorithms [46, 10, 1]. While simple enough for mathematical reasoning and privacy accounting analyses, its continuous nature presents a number of challenges in practice. For example, it cannot be exactly represented on finite computers, making it prone to numerical errors that can break its privacy guarantees [35]. Moreover, it cannot be used in distributed learning settings with cryptographic multi-party computation primitives involving modular arithmetic, such as secure aggregation [12, 11]. To address these shortcomings, the binomial and (distributed) discrete Gaussian mechanisms were recently introduced [19, 2, 15, 25]. Unfortunately, both have their own drawbacks: the privacy loss for the binomial mechanism can be infinite with a nonzero probability, and the discrete Gaussian (a) is not closed under summation (i.e., the sum of discrete Gaussians is not a discrete Gaussian), complicating analysis in distributed settings and leading to performance worse than the continuous Gaussian in the highly distributed, low-noise regime [25]; and (b) requires a sampling algorithm that is not shipped with mainstream machine learning or data analysis software packages, making it difficult for engineers to use in production settings (naïve implementations may lead to catastrophic privacy errors).

Our contributions To overcome these limitations, we introduce and analyze the multidimensional Skellam mechanism, a mechanism based on adding noise distributed according to the difference of two independent Poisson random variables. The Skellam noise is closed under summation (i.e., sums of Skellam random variables are again Skellam distributed) and can be sampled from easily, since efficient Poisson samplers are widely available in numerical software packages. Being discrete in nature also means that it can mesh well with cryptographic protocols and can lead to communication savings.
To analyze the privacy guarantees of the Skellam mechanism and compare it with other mechanisms, we provide a numerical evaluation of the privacy loss random variable and prove a sharp bound on the Rényi divergence between two shifted Skellam distributions. Our careful analysis shows that for a multidimensional query function with $\ell_1$ sensitivity $\Delta_1$ and $\ell_2$ sensitivity $\Delta_2$, the Skellam mechanism with variance $\mu$ achieves $(\alpha, \varepsilon(\alpha))$ Rényi differential privacy (RDP) [36] for $\varepsilon(\alpha) \le \frac{\alpha\Delta_2^2}{2\mu} + \min\left(\frac{(2\alpha-1)\Delta_2^2 + 6\Delta_1}{4\mu^2}, \frac{3\Delta_1}{2\mu}\right)$ (see Theorem 3.5). This implies that the RDP guarantees are at most $1 + O(1/\mu)$ times worse than those of the Gaussian mechanism.

To analyze the performance of the Skellam mechanism in practice, we consider a differentially private and communication-constrained federated learning (FL) setting [26] where the noise is added locally to the $d$-dimensional discretized client updates that are then summed securely via a cryptographic protocol, such as secure aggregation (SecAgg) [11, 12]. We provide an end-to-end algorithm that appropriately discretizes the data and applies the Skellam mechanism along with modular arithmetic to bound the range of the data and communication costs before applying SecAgg.
We show on distributed mean estimation and two benchmark FL datasets, Federated EMNIST [14] and Stack Overflow [8], that our method can match the performance of the continuous Gaussian baseline under tight privacy and communication budgets, despite using generic RDP amplification via sampling [51] for our approach and the precise RDP analysis for the subsampled Gaussian mechanism [37]. Our method is implemented in TensorFlow Privacy [32] and TensorFlow Federated [24] and will be open-sourced (footnote 1: https://github.com/google-research/federated/tree/master/distributed_dp). While we mostly focus on FL applications, the Skellam mechanism can also be applied in other contexts of learning and analytics, including centralized settings.

Related work The Skellam mechanism was first introduced in the context of computational differential privacy from lattice-based cryptography [49]
and private Bayesian inference [45]. However, the privacy analyses in the prior work do not readily extend to the multidimensional case, and they give direct bounds for pure or approximate DP, which makes only advanced composition theorems [28, 22] directly applicable in learning settings where the mechanism is applied many times. For example, the guarantees from [49] lead to poor accuracy-privacy tradeoffs as demonstrated in Fig. 1. Moreover, we show in Section 3.1 that extending the direct privacy analysis to the multidimensional setting is nontrivial because the worst-case neighboring dataset pair is unknown in this case. For these reasons, our tight privacy analysis via a sharp RDP bound makes the Skellam mechanism practical for learning applications for the first time. These guarantees (almost) match those of the Gaussian mechanism and allow us to use generic RDP amplification via subsampling methods [51].

The closest mechanisms to Skellam are the binomial [2, 19] and the discrete Gaussian mechanisms [15, 25]. The binomial mechanism can (asymptotically) match the continuous Gaussian mechanism (when properly scaled). However, it does not achieve Rényi or zero-concentrated DP [36, 13] and has a privacy loss that can be infinite with a nonzero probability, leading to catastrophic privacy failures. The discrete Gaussian mechanism yields Rényi DP and can be applied to distributed settings [25], but it requires a sampling algorithm that is not yet available in data analysis software packages despite being explored in the lattice-based cryptography community (e.g., [43, 18, 38]). The discrete Gaussian is also not closed under summation and the divergence can be large in highly distributed low-noise settings (e.g., quantile estimation [6] and federated analytics [42]), which causes privacy degradation. See the end of Section 4 for more discussion.

2 Preliminaries
We begin by providing a formal definition for differential privacy (DP) [20].
Definition 2.1 (Differential Privacy).
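For $\varepsilon, \delta \ge 0$, a randomized mechanism $M$ satisfies $(\varepsilon, \delta)$-DP if for all neighboring datasets $D, D'$ and all $S$ in the range of $M$, we have that
$$\Pr[M(D) \in S] \le e^{\varepsilon} \Pr[M(D') \in S] + \delta,$$
where $D$ and $D'$ are neighboring pairs if they can be obtained from each other by adding or removing all the records that belong to a particular user.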
In our experiments we consider user-level differential privacy, i.e., $D$ and $D'$ are neighboring pairs if one of them can be obtained from the other by adding or removing all the records associated with a single user [33]. This is stronger than the commonly used notion of item-level privacy where, if a user contributes multiple records, only the addition or removal of one record is protected.
We also make use of Rényi differential privacy (RDP) [36] which allows for tight privacy accounting.
Definition 2.2 (Rényi Differential Privacy).
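A mechanism $M$ satisfies $(\alpha, \varepsilon)$-RDP if for any two neighboring datasets $D, D'$, we have that $D_\alpha(M(D) \,\|\, M(D')) \le \varepsilon$, where $D_\alpha(P \,\|\, Q)$ is the Rényi divergence between $P$ and $Q$ and is given by
$$D_\alpha(P \,\|\, Q) = \frac{1}{\alpha - 1} \log \mathbb{E}_{x \sim Q}\left[\left(\frac{P(x)}{Q(x)}\right)^{\alpha}\right].$$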
A closely related privacy notion is zero-concentrated DP (zCDP) [21, 13]. In fact, $\rho$-zCDP is equivalent to simultaneously satisfying an infinite family of RDP guarantees, namely $(\alpha, \rho\alpha)$ Rényi differential privacy for all $\alpha \in (1, \infty)$. The following conversion lemma from [13, 15, 7] relates RDP to $(\varepsilon, \delta)$-DP.
Lemma 2.3.
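If $M$ satisfies $(\alpha, \varepsilon)$-RDP, then, for any $\delta > 0$, $M$ satisfies $(\varepsilon_{\mathrm{DP}}(\delta), \delta)$-DP, where
$$\varepsilon_{\mathrm{DP}}(\delta) = \varepsilon + \frac{\log(1/(\alpha\delta))}{\alpha - 1} + \log(1 - 1/\alpha).$$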
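As a rough illustration of how an RDP curve is turned into an $(\varepsilon, \delta)$ guarantee, the sketch below minimizes over a grid of orders. It assumes the classical conversion $\varepsilon = \varepsilon(\alpha) + \log(1/\delta)/(\alpha - 1)$ from [36], which is simpler and somewhat looser than tighter conversions in the literature. The function names and the grid of integer orders are hypothetical choices made for this example.

```python
import math


def gaussian_rdp(alpha, sensitivity, variance):
    """RDP of the Gaussian mechanism at order alpha: alpha * Delta^2 / (2 * sigma^2)."""
    return alpha * sensitivity ** 2 / (2.0 * variance)


def rdp_to_dp(rdp_curve, delta, orders):
    """Return (epsilon, best_order) using eps = rdp(alpha) + log(1/delta) / (alpha - 1).

    This is the classical conversion, assumed here for simplicity.
    """
    best_eps, best_alpha = math.inf, None
    for alpha in orders:
        eps = rdp_curve(alpha) + math.log(1.0 / delta) / (alpha - 1.0)
        if eps < best_eps:
            best_eps, best_alpha = eps, alpha
    return best_eps, best_alpha


# Example: unit sensitivity, unit variance, delta = 1e-5, integer orders 2..128.
eps, alpha = rdp_to_dp(lambda a: gaussian_rdp(a, 1.0, 1.0), delta=1e-5, orders=range(2, 129))
print(f"eps = {eps:.4f} at alpha = {alpha}")
```

Because the minimization is over a finite grid, the reported $\varepsilon$ is an upper bound on what a finer grid would give; any RDP curve can be passed in place of the Gaussian one.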
For any query function $q$, we define the $\ell_p$ sensitivity as $\Delta_p = \max_{D, D'} \|q(D) - q(D')\|_p$, where $D$ and $D'$ are neighboring pairs differing by adding or removing all the records from a particular user. We also include the RDP guarantees of the discrete Gaussian mechanism (same RDP guarantees as the continuous Gaussian mechanism) to which we compare our method.
Definition 2.4 (The Discrete Gaussian Mechanism [15]).
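Given an integer-valued query $q(x) \in \mathbb{Z}^d$ and noise variance $\sigma^2$, the Discrete Gaussian (DGaussian) Mechanism is given by
$$\mathrm{DG}_\sigma(x; q) = q(x) + Z, \quad Z_i \sim \mathcal{N}_{\mathbb{Z}}(0, \sigma^2) \text{ independently for each coordinate } i,$$
and $\mathcal{N}_{\mathbb{Z}}(0, \sigma^2)$ denotes the discrete Gaussian distribution defined in Equation (1) of [15]. The discrete Gaussian mechanism achieves $(\alpha, \alpha\Delta_2^2/(2\sigma^2))$ Rényi DP.

3 The Skellam Mechanism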
We begin by presenting the definition of the Skellam distribution, which is the basis of the Skellam Mechanism for releasing integer-valued multidimensional queries.
Definition 3.1 (Skellam Distribution).
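The multidimensional Skellam distribution $\mathrm{Sk}_{\Delta,\mu}$ over $\mathbb{Z}^d$ with mean $\Delta \in \mathbb{Z}^d$ and variance $\mu$ is given with each coordinate $X_i$ distributed independently as
$$X_i \sim \mathrm{Sk}_{\Delta_i,\mu} \quad \text{with} \quad P(X_i = k) = e^{-\mu} I_{k - \Delta_i}(\mu)$$
for $k \in \mathbb{Z}$. Here, $I_\nu(x)$ is the modified Bessel function of the first kind. A key property of Skellam random variables which motivates their use in DP is that they are closed under summation, i.e., if $X_1 \sim \mathrm{Sk}_{\Delta_1,\mu_1}$ and $X_2 \sim \mathrm{Sk}_{\Delta_2,\mu_2}$ are independent, then $X_1 + X_2 \sim \mathrm{Sk}_{\Delta_1 + \Delta_2,\, \mu_1 + \mu_2}$. This follows from the fact that a Skellam random variable can be obtained by taking the difference between two independent Poisson random variables with equal means. (Footnote 2: We only consider the symmetric version of Skellam, but it is often more generally defined as the difference of independent Poisson random variables with different variances.) We are now ready to introduce the Skellam Mechanism.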
Definition 3.2 (The Skellam Mechanism).
Given an integer-valued query $q(x) \in \mathbb{Z}^d$, we define the Skellam Mechanism as $\mathrm{Sk}_\mu(x; q) = q(x) + Z$ with $Z \sim \mathrm{Sk}_{0,\mu}$,
and the total error of the mechanism is bounded by .
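The definition above can be sketched in a few lines with NumPy's Poisson sampler. This is a minimal illustration, not the released implementation: the function names are hypothetical, and the choice of Poisson mean (half the target variance for each of the two draws) follows from the variance of a Poisson variable equalling its mean. The last lines compare noise drawn centrally with the sum of several smaller local draws, which is one empirical way to look at closure under summation.

```python
import numpy as np


def sample_skellam(variance, size, rng):
    """Symmetric Skellam noise as the difference of two independent Poisson draws.

    Each Poisson draw has mean variance / 2, so the difference has mean 0
    and variance `variance`.
    """
    lam = variance / 2.0
    return rng.poisson(lam, size) - rng.poisson(lam, size)


def skellam_mechanism(query_value, variance, rng):
    """Release an integer-valued query with additive Skellam noise."""
    query_value = np.asarray(query_value, dtype=np.int64)
    return query_value + sample_skellam(variance, query_value.shape, rng)


rng = np.random.default_rng(0)
mu, n_clients, n_samples = 10.0, 5, 200_000

# Noise drawn once with variance mu versus the sum of n_clients local draws,
# each with variance mu / n_clients.
central = sample_skellam(mu, n_samples, rng)
distributed = sum(sample_skellam(mu / n_clients, n_samples, rng) for _ in range(n_clients))
print(central.var(), distributed.var())

noisy_answer = skellam_mechanism([3, -1, 0], mu, rng)
```

Both empirical variances should land close to `mu`, and the output stays integer-valued, which is what lets it be combined with modular arithmetic later on.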
The Skellam mechanism was first introduced in [49] for the scalar case. As our goal is to apply the Skellam mechanism in the learning context, we have to address the following challenges. (1) Tight privacy compositions: Learning algorithms are iterative in nature and require the application of the DP mechanism many times. The current direct approximate DP analysis in [49] can be combined with advanced composition (AC) theorems [28, 22], but that leads to poor privacy-accuracy tradeoffs (see Fig. 1). (2) Privacy analysis for multidimensional queries: In learning algorithms, the differentially private queries are multidimensional (where the dimension equals the number of model parameters). Using composition theorems leads to poor accuracy-privacy tradeoffs, and a direct extension of the approximate DP guarantee [49] to the multidimensional case leads to a strong dependence on the $\ell_1$ sensitivity, which is prohibitively large in high dimensions. (3) Data discretization: The gradients are naturally continuous vectors but we would like to apply an integer-based mechanism. This requires properly discretizing the data while making sure that the norm of the vectors (sensitivity of the query) is preserved. We will tackle challenges (1) and (2) in the remainder of this section and leave (3) for the next section.
3.1 Tight Numerical Accounting via Privacy Loss Distributions
We begin by defining the notion of privacy loss distributions (PLDs).
Definition 3.3 (Privacy Loss Distribution).
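For a multidimensional discrete privacy mechanism $M$ and neighboring datasets $D, D'$, for any $y$, we define $f_{D,D'}(y) = \log\frac{\Pr[M(D) = y]}{\Pr[M(D') = y]}$. The privacy loss random variable of $M$ at $(D, D')$ is $Z_{D,D'} = f_{D,D'}(M(D))$ [22]. The privacy loss distribution (PLD) of $M$, denoted by $\mathrm{PLD}_{D,D'}$, is the distribution of $Z_{D,D'}$.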
The PLD of a mechanism can be used to characterize its DP guarantees.
Lemma 3.4.
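A mechanism $M$ is $(\varepsilon, \delta)$-DP if and only if $\delta \ge \mathbb{E}_{Z \sim \mathrm{PLD}_{D,D'}}\left[(1 - e^{\varepsilon - Z})_+\right]$ for all neighboring datasets $D, D'$, where $(\cdot)_+ = \max(0, \cdot)$.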
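To make Lemma 3.4 concrete, the following sketch evaluates the expectation $\mathbb{E}[(1 - e^{\varepsilon - Z})_+]$ for scalar Skellam noise and one fixed pair of outputs that differ by an integer shift. It is an illustration under stated assumptions rather than the accountant used in the experiments: the pmf is computed by directly convolving two Poisson pmfs (each with mean $\mu/2$), the support is truncated at a hypothetical `max_abs`, and only one direction of the neighboring pair is evaluated.

```python
import math


def skellam_pmf_table(mu, max_abs, terms=600):
    """pmf of symmetric Skellam noise with variance mu on {-max_abs, ..., max_abs}.

    Uses P(X = k) = sum_j Pois(j + |k|; mu / 2) * Pois(j; mu / 2).
    """
    lam = mu / 2.0
    pois = [math.exp(j * math.log(lam) - lam - math.lgamma(j + 1))
            for j in range(terms + max_abs + 1)]
    table = {}
    for k in range(max_abs + 1):
        p = sum(pois[j] * pois[j + k] for j in range(terms))
        table[k] = table[-k] = p
    return table


def delta_for_epsilon(eps, mu, shift, max_abs=200):
    """E[(1 - exp(eps - Z))_+] with Z = log(P(y) / Q(y)) and y drawn from P.

    P is Skellam noise centred at 0 and Q is the same noise centred at `shift`.
    The expectation simplifies to sum_y max(0, P(y) - exp(eps) * Q(y)).
    """
    pmf = skellam_pmf_table(mu, max_abs + abs(shift))
    total = 0.0
    for y in range(-max_abs, max_abs + 1):
        p, q = pmf[y], pmf[y - shift]
        total += max(0.0, p - math.exp(eps) * q)
    return total


for eps in (0.0, 0.1, 0.5):
    print(eps, delta_for_epsilon(eps, mu=100.0, shift=1))
```

At $\varepsilon = 0$ the quantity reduces to the total variation distance between the two shifted distributions, and it decreases as $\varepsilon$ grows.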
When a mechanism is applied $T$ times on a dataset, the overall PLD of the composed mechanism at $(D, D')$ is the $T$-fold convolution of $\mathrm{PLD}_{D,D'}$ [22]. Since discrete convolutions can be computed efficiently using fast Fourier transforms (FFTs) and the expectation in Lemma 3.4 can be numerically approximated, PLDs are attractive for tight numerical accounting [30, 34, 17].

Applying the above to the Skellam mechanism, a direct calculation shows that $Z_{D,D'} = \sum_{i=1}^{d} \log\frac{I_{X_i}(\mu)}{I_{X_i + \Delta_i}(\mu)}$ with $\Delta = q(D) - q(D')$, where the $X_i$ are i.i.d. according to $\mathrm{Sk}_{0,\mu}$. When $d = 1$, it suffices to look at $Z_\Delta = \log\frac{I_X(\mu)}{I_{X + \Delta}(\mu)}$, where $X \sim \mathrm{Sk}_{0,\mu}$ and $\Delta$ is the sensitivity of $q$. Since $X$ has a discrete and symmetric probability distribution and the log-ratio function is monotonic in $X$, the distribution of $Z_\Delta$ can be easily characterized. This gives us a tight numerical accountant for the Skellam mechanism in the scalar case, which we use to compare it with both the Gaussian and discrete Gaussian mechanisms. Fig. 1 shows this comparison, highlighting the competitiveness of the Skellam mechanism and the problem of combining the direct analysis of [49] with advanced composition (AC) theorems. When $d > 1$, there are combinatorially many $\Delta$'s that need to be considered, even when the sensitivity of $q$ is bounded. The discrete Gaussian mechanism faces a similar issue (see Theorem 15 of [15]). To provide a tight privacy analysis in the multidimensional case, we prove a bound on the RDP guarantees of the Skellam mechanism in the next subsection. Figs. 1 and 2 show that our bound is tight and that the Skellam mechanism is competitive in high dimensions.

3.2 Tight Accounting via Rényi Differential Privacy
The following theorem states our main theoretical result, providing a relatively sharp bound on the RDP properties of the Skellam mechanism.
Theorem 3.5.
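For $\alpha \in \mathbb{Z}$, $\alpha > 1$ and sensitivity $\Delta \in \mathbb{Z}$, the Skellam Mechanism is $(\alpha, \varepsilon(\alpha))$-RDP with
$$\varepsilon(\alpha) \le \frac{\alpha\Delta^2}{2\mu} + \min\left(\frac{(2\alpha - 1)\Delta^2 + 6\Delta}{4\mu^2}, \frac{3\Delta}{2\mu}\right). \tag{3.1}$$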
For comparison, recall that the Gaussian mechanism is $(\alpha, \alpha\Delta^2/(2\mu))$-RDP. The bound we provide is at most $1 + O(1/\mu)$ times worse than the bound for the Gaussian, which is negligible for all practical choices of $\mu$, especially as the privacy requirements increase. (Footnote 3: The restriction that $\alpha$ needs to be an integer is a technical one owing to known bounds on Bessel functions. In practice, as we show, this restriction has a negligible effect.) Next we show a simple corollary which follows via the independent composition of RDP across dimensions.
Corollary 3.6.
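The multidimensional Skellam Mechanism is $(\alpha, \varepsilon(\alpha))$-RDP with
$$\varepsilon(\alpha) \le \frac{\alpha\Delta_2^2}{2\mu} + \min\left(\frac{(2\alpha - 1)\Delta_2^2 + 6\Delta_1}{4\mu^2}, \frac{3\Delta_1}{2\mu}\right), \tag{3.2}$$
where $\Delta_1$ and $\Delta_2$ are the $\ell_1$ and $\ell_2$ sensitivities respectively.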
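One way to get a feel for how close Skellam is to Gaussian under RDP is to compute the Rényi divergence between two shifted scalar Skellam distributions numerically and set it next to the Gaussian value $\alpha\Delta^2/(2\mu)$. The sketch below does this in pure Python. It is only a numerical sanity check under stated assumptions: the pmf comes from a truncated convolution of two Poisson pmfs with mean $\mu/2$ each, the support is cut at a hypothetical `max_abs`, and the helper names are made up for this example.

```python
import math


def skellam_pmf_table(mu, max_abs, terms=600):
    """pmf of symmetric Skellam noise with variance mu on {-max_abs, ..., max_abs}."""
    lam = mu / 2.0
    pois = [math.exp(j * math.log(lam) - lam - math.lgamma(j + 1))
            for j in range(terms + max_abs + 1)]
    table = {}
    for k in range(max_abs + 1):
        p = sum(pois[j] * pois[j + k] for j in range(terms))
        table[k] = table[-k] = p
    return table


def renyi_divergence_shifted_skellam(alpha, shift, mu, max_abs=200):
    """D_alpha(P || Q) where P is Skellam noise centred at `shift` and Q at 0."""
    pmf = skellam_pmf_table(mu, max_abs + abs(shift))
    total = 0.0
    for y in range(-max_abs, max_abs + 1):
        p, q = pmf[y - shift], pmf[y]
        if p > 0.0 and q > 0.0:
            # Work in log space so that tiny tail probabilities do not overflow.
            total += math.exp(alpha * math.log(p) + (1.0 - alpha) * math.log(q))
    return math.log(total) / (alpha - 1.0)


mu, shift = 100.0, 1
for alpha in (2, 4, 8):
    skellam_value = renyi_divergence_shifted_skellam(alpha, shift, mu)
    gaussian_value = alpha * shift ** 2 / (2.0 * mu)
    print(alpha, skellam_value, gaussian_value)
```

For moderate $\mu$ the two columns should be nearly indistinguishable, which is the behaviour the analytical bound is meant to capture.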
3.2.1 Proof Overview for Theorem 3.5
In this subsection, we provide the proof of Theorem 3.5 assuming a technical bound on the ratios of Bessel functions presented as Lemma 3.7, which is the core of our analysis and may be of independent interest. We provide a proof overview for Lemma 3.7, deferring the full proof to the appendix.
On a macroscopic level, our proof structure mimics the RDP proof for the Gaussian mechanism [36], and the main object of our interest is to bound the following quantity, defined for any :
(3.3) 
The following lemma states our main bound on this quantity.
Lemma 3.7.
For any , with and , we have that for all
Note that in contrast if we consider the analogous notion of for the Gaussian mechanism (replacing with the Gaussian density ), we readily get the bound , which is the same as our bound up to lower order terms. We now provide the proof of Theorem 3.5.
Proof of Theorem 3.5.
We now provide an overview for the proof of Lemma 3.7 highlighting the crux of the argument. As a first step we collect some known facts regarding Bessel functions. It is known that for and , , is a decreasing function in , and is an increasing function in [47]. A succession of works considers bounding the ratio of successive Bessel functions , which is a natural quantity to consider given the objective in Lemma 3.7. We use the following very tight characterization of this ratio, recently proved in [44, Theorem 5].
Lemma 3.8.
For any define the following function we have that
where is defined as .
Standard bounds such as those appearing in [5, 49] lead to the following conclusion:
While the above bound is significantly easier to work with, it leads to an RDP guarantee of Gaussian RDP + . In high dimensions this manifests as and overall leads to a constant multiplicative factor over the Gaussian. On the other hand we prove a Gaussian RDP + bound. Our proof of Lemma 3.7 splits into various cases depending on the signs of the quantities involved. We show the derivation for a single case below and defer the full proof to the appendix.
Proof of Lemma 3.7 in the case , .
4 Applying the Skellam Mechanism to Federated Learning
With a sharp RDP analysis for the multidimensional Skellam mechanism presented in the previous section, we are now ready to apply it to differentially private federated learning. We first outline the general problem setting and then describe our approach under central and distributed DP models.
Problem setting At a high level, we consider the distributed mean estimation problem. There are $n$ clients each holding a vector $x_i$ in $\mathbb{R}^d$ such that for all $i$, the vector norm is bounded as $\|x_i\|_2 \le c$ for some $c \ge 0$. We denote the set of vectors as $X = \{x_i\}_{i=1}^{n}$, and the aim is for each client to communicate its vector to a central server, which then aggregates them as $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$ for an external analyst. In federated learning, the client vectors are the model gradients or model deltas after training on the clients’ local datasets, and this procedure can be repeated for many rounds. A large dimension and a large number of rounds thus necessitate accounting methods that provide tight privacy compositions for high-dimensional queries.
We are primarily concerned with three metrics for this procedure and their tradeoffs: (1) Privacy: the mean should be differentially private with a reasonably small $\varepsilon$; (2) Error: we wish to minimize the expected error; and (3) Communication: we wish to minimize the average number of bits communicated per coordinate. Characterizing this tradeoff is an important research problem. For example, it has been recently shown [50] that without formal privacy guarantees, the client training data could still be revealed by the model updates; on the other hand, applying differential privacy [48] to these updates can degrade the final utility.
Skellam for central DP The central DP model refers to adding Skellam noise onto the non-private aggregate before releasing it to the external analyst. One important consideration is that the model updates in FL are continuous in nature, while Skellam is a discrete probability distribution. One approach is to appropriately discretize the client updates, e.g., via uniform quantization (which involves scaling the inputs by a factor $s$ for some bitwidth $b$ followed by stochastic rounding for unbiased estimates), and the server can convert the private aggregate back to real numbers at the end. (Footnote 4: Example of stochastic rounding: 42.3 has 0.7 and 0.3 probability to be rounded to 42 and 43, respectively. Other discretization schemes are possible; we do not explore this direction further in this work.) Note that this allows us to reparameterize the variance of the added Skellam noise as $s^2\mu$, giving the following simple corollary based on Cor. 3.6:

Corollary 4.1 (Scaled Skellam Mechanism).
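With a scaling factor $s$, the multidimensional Skellam Mechanism is $(\alpha, \varepsilon(\alpha))$-RDP with
$$\varepsilon(\alpha) \le \frac{\alpha\Delta_2^2}{2\mu} + \min\left(\frac{(2\alpha - 1)\Delta_2^2}{4s^2\mu^2} + \frac{3\Delta_1}{2s^3\mu^2}, \frac{3\Delta_1}{2s\mu}\right). \tag{4.1}$$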
As $s$ increases, the RDP of scaled Skellam rapidly approaches that of Gaussian as the second term above approaches 0, suggesting that under practical regimes with moderate compression bitwidth, Skellam should perform competitively compared to Gaussian. Another aspect worth noting is that rounding vector coordinates from reals to integers can inflate the sensitivity, and thus more noise is required for the same privacy. To this end, we leverage the conditional rounding procedure introduced in [25] to obtain a bounded norm on the scaled and rounded client vector:
Proposition 4.2 (Norm of stochastically rounded vector [25]).
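Let $\tilde{x}$ be a stochastic rounding of vector $x \in \mathbb{R}^d$ to the integer grid $\mathbb{Z}^d$. Then, for $\beta \in (0, 1)$, we have
$$\Pr\left[\|\tilde{x}\|_2^2 \le \|x\|_2^2 + \frac{d}{4} + \sqrt{2\log(1/\beta)}\left(\|x\|_2 + \frac{\sqrt{d}}{2}\right)\right] \ge 1 - \beta. \tag{4.2}$$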
Conditional rounding is thus defined as retrying the stochastic rounding on the scaled vector until the norm of the rounded vector is within the probabilistic bound above (which also gives the inflated sensitivity). We can then add Skellam noise to the aggregate according to this inflated sensitivity before undoing the quantization (unscaling). Note that a larger scaling before rounding reduces the norm inflation and the extra noise needed (Fig. 3 right).
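The discretization steps described above (scale, round stochastically, retry until the norm is acceptable) can be sketched as follows. This is a minimal illustration with NumPy: the function names, the scaling factor, and the value of `beta` are hypothetical, and the norm threshold in `assumed_sq_norm_bound` is our reading of the high-probability bound from [25], which should be treated as an assumption rather than a verified transcription.

```python
import math
import numpy as np


def stochastic_round(x, rng):
    """Round each coordinate up with probability equal to its fractional part."""
    floor = np.floor(x)
    round_up = rng.random(x.shape) < (x - floor)
    return (floor + round_up).astype(np.int64)


def conditional_round(x, max_sq_norm, rng, max_tries=1000):
    """Retry stochastic rounding until the squared l2 norm is within max_sq_norm."""
    for _ in range(max_tries):
        rounded = stochastic_round(x, rng)
        if float(np.sum(rounded.astype(np.float64) ** 2)) <= max_sq_norm:
            return rounded
    raise RuntimeError("no rounding satisfied the norm bound")


def assumed_sq_norm_bound(x, beta):
    """Assumed form of the high-probability bound on the rounded squared norm."""
    d, norm = x.size, float(np.linalg.norm(x))
    return norm ** 2 + d / 4.0 + math.sqrt(2.0 * math.log(1.0 / beta)) * (norm + math.sqrt(d) / 2.0)


rng = np.random.default_rng(0)
scale = 16.0                                 # hypothetical scaling factor
x = scale * rng.standard_normal(32) / 8.0    # a scaled client vector
bound = assumed_sq_norm_bound(x, beta=0.6)
x_rounded = conditional_round(x, bound, rng)
```

The retry loop slightly biases the rounding, which is the price paid for a hard bound on the sensitivity; a larger `beta` tightens the threshold at the cost of more retries.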
Skellam for distributed DP with secure aggregation A stronger notion of privacy in FL can be obtained via the distributed DP model [25] that leverages secure aggregation (SecAgg [12]). The fact that the Skellam distribution is closed under summation allows us to easily extend from central DP to distributed DP. Under this model, the client vectors are quantized as in the central DP model, but the Skellam noise is now added locally, with each of the $n$ clients contributing a $1/n$ fraction of the central noise variance. Then, the noisy client updates are summed via SecAgg ($b$ bits per coordinate for field size $2^b$), which only reveals the noisy aggregate to the server. While the local noise might be insufficient for local DP guarantees, the aggregated noise at the server provides privacy and utility comparable to the central DP model, thus removing the need to trust the central aggregator. Note that the modulo operations introduced by SecAgg do not impact privacy, as they can be viewed as a post-processing of an already differentially private query.
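The flow in this paragraph can be mimicked with plain modular arithmetic. The sketch below is a toy stand-in, not a secure aggregation protocol: `modular_sum` simply adds the encoded vectors modulo the field size to imitate what the server would learn, and all function names and parameter values are hypothetical. Each client draws Skellam noise with a $1/n$ share of the target central variance, as the difference of two Poisson draws.

```python
import numpy as np


def client_encode(x_int, local_variance, modulus, rng):
    """Add local Skellam noise to an integer vector and reduce modulo the field size."""
    lam = local_variance / 2.0
    noise = rng.poisson(lam, x_int.shape) - rng.poisson(lam, x_int.shape)
    return np.mod(x_int + noise, modulus)


def modular_sum(encoded_vectors, modulus):
    """Stand-in for SecAgg: the server only sees the sum modulo the field size."""
    return np.mod(np.sum(encoded_vectors, axis=0), modulus)


def server_decode(aggregate, modulus):
    """Map residues in [0, m) back to the signed range [-m/2, m/2)."""
    return np.where(aggregate >= modulus // 2, aggregate - modulus, aggregate)


rng = np.random.default_rng(0)
n_clients, dim, modulus = 10, 8, 2 ** 16
central_variance = 4.0

clients = rng.integers(-5, 6, size=(n_clients, dim))   # already-discretized updates
encoded = [client_encode(x, central_variance / n_clients, modulus, rng) for x in clients]
noisy_sum = server_decode(modular_sum(encoded, modulus), modulus)
print(noisy_sum - clients.sum(axis=0))                  # the aggregated Skellam noise
```

As long as the noisy sum stays inside the signed range of the field, decoding recovers it exactly; otherwise the result wraps around, which is why the field size has to be chosen with the signal and noise magnitudes in mind.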
We remark on several properties of the distributed Skellam compared to the distributed discrete Gaussian (DDGauss [25]). (1) DDGauss is not closed under summation, and the divergence between discrete Gaussians can lead to notable privacy degradation in settings such as quantile estimation [6] and federated analytics [42] with a sufficiently large number of clients and small local noise (see also the left side of Fig. 2 and Fig. 3). While scaling mitigates this issue, it also requires additional bitwidth, which makes Skellam attractive under tight communication constraints. (2) Sampling from Skellam only requires sampling from Poisson, for which efficient implementations are widely available in numerical software packages. While efficient discrete Gaussian sampling has also been explored in the lattice-based cryptography community (e.g., [43, 18, 38]), we believe the accessibility of Skellam samplers would help facilitate the deployment of DP to FL settings with mobile and edge devices. See Appendix D for more discussion. (3) In practice, where the scaling factor $s$ is large (dictated by the bitwidth $b$), both Skellam (cf. Cor. 4.1) and DDGauss (with an exponentially small divergence) quickly approach Gaussian under RDP, and any differences will be negligible (Fig. 3).
5 Empirical Evaluation
In this section, we empirically evaluate the Skellam mechanism on two sets of experiments: distributed mean estimation and federated learning. In both cases, we focus on the distributed DP model, but note that the Skellam mechanism can be easily adapted to the central DP setting as discussed in the previous section. Unless otherwise stated, we use RDP accounting for all experiments due to the high-dimensional data and the ease of composition (Section 3). To obtain $\Delta_1$ for Skellam RDP, we note that $\Delta_1 \le \min(\sqrt{d}\,\Delta_2, \Delta_2^2)$ since $\|x\|_1 \le \sqrt{d}\,\|x\|_2$ in general and $\|x\|_1 \le \|x\|_2^2$ for integers.

Under the distributed DP model, we also introduce a random orthogonal transformation [29, 2, 25] before discretizing and aggregating the client vectors (which can be reverted after the aggregation); this makes the vector coordinates sub-Gaussian and helps spread the magnitudes of the vector coordinates across all dimensions, thus reducing the errors from quantization and potential wrap-around from SecAgg modulo operations. Moreover, by approximating the sub-exponential tail of the Skellam distribution as sub-Gaussian, we can derive a heuristic for choosing following [25] based on a bound on the variance of the aggregated signal, as . We choose such that are bounded within the SecAgg field size , where is a small constant.

Algorithm 1 summarizes the aggregation procedure for the distributed Skellam mechanism via secure aggregation as well as the parameters used for the experiments. In summary, we have an $\ell_2$ clip norm ; per-coordinate bitwidth ; target central noise variance ; number of clients ; signal bound multiplier ; and rounding bias . We fix for all experiments. Note that the per-coordinate bitwidth is for the aggregated sum as it determines the field size of SecAgg. For federated learning, we also consider the number of rounds and the total number of clients (thus the uniform sampling ratio at every round). Our experiments are implemented in Python, TensorFlow Privacy [32], and TensorFlow Federated [24]. See also Appendix for additional results and more details on the experimental setup.
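The effect of the random orthogonal transformation mentioned above can be illustrated on a worst-case input whose mass sits on a single coordinate. The sketch below uses a dense random orthogonal matrix obtained from a QR decomposition, which is an assumption made purely for illustration; this section does not state which transform is used, and in high dimensions a structured transform (for example a randomized Hadamard transform) would be a much cheaper choice.

```python
import numpy as np


def random_rotation(dim, rng):
    """Random orthogonal matrix from the QR decomposition of a Gaussian matrix."""
    q, r = np.linalg.qr(rng.standard_normal((dim, dim)))
    return q * np.sign(np.diag(r))   # fix column signs so the draw is not biased


rng = np.random.default_rng(0)
dim = 64
rotation = random_rotation(dim, rng)

x = np.zeros(dim)
x[0] = 1.0                           # all of the mass on one coordinate
rotated = rotation @ x               # applied by the client before discretization
recovered = rotation.T @ rotated     # reverted by the server after aggregation

print(np.abs(x).max(), np.abs(rotated).max())
```

The rotation preserves the $\ell_2$ norm while shrinking the largest coordinate, so fewer bits are needed per coordinate before values risk wrapping around the field.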
5.1 Distributed Mean Estimation (DME)
We first consider DME as the generalization of (single round) FL. We randomly generate client vectors from the $d$-dimensional sphere with radius , and compute the true mean . We then compute the private estimate of with the distributed Skellam mechanism (Algorithm 1) as . For a strong baseline, we use the analytic Gaussian mechanism [9] with tight accounting (see also Figure 2). In Figure 4, we plot the MSE as with 95% confidence interval (small shaded region) over 10 dataset initializations across different values of , , and . Results demonstrate that Skellam can match Gaussian even with clients as long as the bitwidth is sufficient. We emphasize that the communication cost depends logarithmically on , and to put numbers into context, Google’s production next-word prediction models [23, 39] use and the production DP language model [40] uses .

5.2 Federated Learning
Setup We evaluate on three public federated datasets with real-world characteristics: Federated EMNIST [16], Shakespeare [31, 14], and Stack Overflow next word prediction (SONWP [8]). EMNIST is an image classification dataset for handwritten digits and letters; Shakespeare is a text dataset for next-character prediction based on the works of William Shakespeare; and SONWP is a large-scale text dataset for next-word prediction based on user questions/answers from stackoverflow.com. We emphasize that all datasets have natural client heterogeneity that is representative of practical FL problems: the images in EMNIST are grouped by the writer of the handwritten digits, the lines in Shakespeare are grouped by the speaking role, and the sentences in SONWP are grouped by the corresponding Stack Overflow user. We train a small CNN with model size for EMNIST and use the recurrent models defined in [41] for Shakespeare and SONWP. The hyperparameters for the experiments follow those from [25, 6, 27, 41] and tuning is limited. For EMNIST, we follow [25] and fix , , , client learning rate , server learning rate , and client batch size . For Shakespeare, we follow [6] and fix , , , , and , and we sweep . For SONWP, we follow [27] and fix , , , , and , and we sweep and limit max examples per client to 256. In all cases, clients train for 1 epoch on their local datasets, and the client updates are weighted uniformly (as opposed to weighting by number of examples). See Appendix for more results and full details on datasets, models, and hyperparameters.
Results Figure 5 summarizes the FL experiments. For EMNIST and Shakespeare, we report the average test accuracy over the last 100 rounds. For SONWP, we report the top-1 accuracy (without padding, out-of-vocab, or beginning/end-of-sentence tokens) on the test set. The results indicate that Skellam performs as well as Gaussian despite relying on generic RDP amplification via sampling [51] (cf. Fig. 3) and that Skellam matches DDG consistently under realistic regimes. This bears significant practical relevance given the advantages of Skellam over DDG in real-world deployments.
6 Conclusion
We have introduced the multidimensional Skellam mechanism for federated learning. We analyzed the Skellam mechanism through the lens of approximate DP, privacy loss distributions, and Rényi divergences, and derived a sharp RDP bound that enables Skellam to match Gaussian and discrete Gaussian in practical settings as demonstrated by our large-scale experiments. Since Skellam is closed under summation and efficient samplers are widely available, it represents an attractive alternative to distributed discrete Gaussian as it easily extends from the central DP model to the distributed DP model. Being a discrete mechanism can also bring potential communication savings over continuous mechanisms and make Skellam less prone to attacks that exploit floating-point arithmetic on digital computers. Some interesting future work includes: (1) our scalar PLD analysis for Skellam suggests room for improvements on our multidimensional analysis via a complete PLD characterization, and (2) our results on FL may be further improved via a targeted analysis for RDP amplification via sampling akin to [37]. Overall, this work is situated within the active area of private machine learning and aims at making ML more trustworthy. One potential negative impact is that our method could be (deliberately or inadvertently) misused, such as sampling the wrong noise or using a minuscule scaling factor, to provide nonexistent privacy guarantees for real users’ data. We nevertheless believe our results have positive impact as they facilitate the deployment of differential privacy in practice.

References
 [1] Martin Abadi, Andy Chu, Ian Goodfellow, H Brendan McMahan, Ilya Mironov, Kunal Talwar, and Li Zhang. Deep learning with differential privacy. In Proceedings of the 2016 ACM SIGSAC conference on computer and communications security, pages 308–318, 2016.
 [2] Naman Agarwal, Ananda Theertha Suresh, Felix Xinnan X Yu, Sanjiv Kumar, and Brendan McMahan. cpSGD: Communication-efficient and differentially-private distributed sgd. In Advances in Neural Information Processing Systems, pages 7564–7575, 2018.
 [3] Maruan Al-Shedivat, Jennifer Gillenwater, Eric Xing, and Afshin Rostamizadeh. Federated learning via posterior averaging: A new perspective and practical algorithms. In ICLR, 2021.
 [4] Martin R. Albrecht and Michael Walter. dgs, Discrete Gaussians over the Integers. Available at https://bitbucket.org/malb/dgs, 2018.
 [5] Donald E Amos. Computation of modified bessel functions and their ratios. Mathematics of Computation, 28(125):239–251, 1974.
 [6] Galen Andrew, Om Thakkar, H Brendan McMahan, and Swaroop Ramaswamy. Differentially private learning with adaptive clipping. arXiv preprint arXiv:1905.03871, 2019.
 [7] S. Asoodeh, J. Liao, F. P. Calmon, O. Kosut, and L. Sankar. A better bound gives a hundred rounds: Enhanced privacy guarantees via f-divergences. In 2020 IEEE International Symposium on Information Theory (ISIT), pages 920–925, 2020.
 [8] The TensorFlow Federated Authors. Tensorflow federated stack overflow dataset, 2019.
 [9] Borja Balle and Yu-Xiang Wang. Improving the gaussian mechanism for differential privacy: Analytical calibration and optimal denoising. In International Conference on Machine Learning, pages 394–403. PMLR, 2018.
 [10] Raef Bassily, Adam Smith, and Abhradeep Thakurta. Private empirical risk minimization: Efficient algorithms and tight error bounds. In 2014 IEEE 55th Annual Symposium on Foundations of Computer Science, pages 464–473. IEEE, 2014.
 [11] James Bell, K. A. Bonawitz, Adrià Gascón, Tancrède Lepoint, and Mariana Raykova. Secure single-server aggregation with (poly)logarithmic overhead. Cryptology ePrint Archive, Report 2020/704, 2020. https://eprint.iacr.org/2020/704.
 [12] Keith Bonawitz, Vladimir Ivanov, Ben Kreuter, Antonio Marcedone, H Brendan McMahan, Sarvar Patel, Daniel Ramage, Aaron Segal, and Karn Seth. Practical secure aggregation for privacy-preserving machine learning. In Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security, pages 1175–1191, 2017.
 [13] Mark Bun and Thomas Steinke. Concentrated differential privacy: Simplifications, extensions, and lower bounds. In Theory of Cryptography Conference, pages 635–658. Springer, 2016.
 [14] Sebastian Caldas, Sai Meher Karthik Duddu, Peter Wu, Tian Li, Jakub Konečnỳ, H Brendan McMahan, Virginia Smith, and Ameet Talwalkar. Leaf: A benchmark for federated settings. arXiv preprint arXiv:1812.01097, 2018.
 [15] Clément Canonne, Gautam Kamath, and Thomas Steinke. The discrete gaussian for differential privacy. In NeurIPS, 2020.

 [16] Gregory Cohen, Saeed Afshar, Jonathan Tapson, and Andre Van Schaik. Emnist: Extending mnist to handwritten letters. In 2017 International Joint Conference on Neural Networks (IJCNN), pages 2921–2926. IEEE, 2017.
 [17] Google Differential Privacy Team. Privacy loss distributions. https://github.com/google/differential-privacy/blob/master/accounting/docs/Privacy_Loss_Distributions.pdf, 2020.
 [18] Nagarjun C Dwarakanath and Steven D Galbraith. Sampling from discrete gaussians for lattice-based cryptography on a constrained device. Applicable Algebra in Engineering, Communication and Computing, 25(3):159–180, 2014.
 [19] Cynthia Dwork, Krishnaram Kenthapadi, Frank McSherry, Ilya Mironov, and Moni Naor. Our data, ourselves: Privacy via distributed noise generation. In Annual International Conference on the Theory and Applications of Cryptographic Techniques, pages 486–503. Springer, 2006.
 [20] Cynthia Dwork, Frank McSherry, Kobbi Nissim, and Adam Smith. Calibrating noise to sensitivity in private data analysis. In Theory of cryptography conference, pages 265–284. Springer, 2006.
 [21] Cynthia Dwork and Guy N Rothblum. Concentrated differential privacy. arXiv preprint arXiv:1603.01887, 2016.
 [22] Cynthia Dwork, Guy N Rothblum, and Salil Vadhan. Boosting and differential privacy. In 2010 IEEE 51st Annual Symposium on Foundations of Computer Science, pages 51–60. IEEE, 2010.
 [23] Andrew Hard, Kanishka Rao, Rajiv Mathews, Swaroop Ramaswamy, Françoise Beaufays, Sean Augenstein, Hubert Eichner, Chloé Kiddon, and Daniel Ramage. Federated learning for mobile keyboard prediction. arXiv preprint arXiv:1811.03604, 2018.
 [24] Alex Ingerman and Krzys Ostrowski. Introducing tensorflow federated, 2019.
 [25] Peter Kairouz, Ziyu Liu, and Thomas Steinke. The distributed discrete gaussian mechanism for federated learning with secure aggregation. In International Conference on Machine Learning. PMLR, 2021.
 [26] Peter Kairouz, Brendan McMahan, et al. Advances and open problems in federated learning. arXiv preprint arXiv:1912.04977, 2019.
 [27] Peter Kairouz, Brendan McMahan, Shuang Song, Om Thakkar, Abhradeep Thakurta, and Zheng Xu. Practical and private (deep) learning without sampling or shuffling. arXiv preprint arXiv:2103.00039, 2021.
 [28] Peter Kairouz, Sewoong Oh, and Pramod Viswanath. The composition theorem for differential privacy. In International conference on machine learning, pages 1376–1385. PMLR, 2015.
 [29] Jakub Konečnỳ, H Brendan McMahan, Felix X Yu, Peter Richtárik, Ananda Theertha Suresh, and Dave Bacon. Federated learning: Strategies for improving communication efficiency. arXiv preprint arXiv:1610.05492, 2016.

 [30] Antti Koskela, Joonas Jälkö, and Antti Honkela. Computing tight differential privacy guarantees using fft. In International Conference on Artificial Intelligence and Statistics, pages 2560–2569. PMLR, 2020.
 [31] Brendan McMahan, Eider Moore, Daniel Ramage, Seth Hampson, and Blaise Aguera y Arcas. Communication-efficient learning of deep networks from decentralized data. In Artificial Intelligence and Statistics, pages 1273–1282, 2017.
 [32] H Brendan McMahan, Galen Andrew, Ulfar Erlingsson, Steve Chien, Ilya Mironov, Nicolas Papernot, and Peter Kairouz. A general approach to adding differential privacy to iterative training procedures. arXiv preprint arXiv:1812.06210, 2018.

 [33] H Brendan McMahan, Daniel Ramage, Kunal Talwar, and Li Zhang. Learning differentially private recurrent language models. In ICLR, 2018.
 [34] Sebastian Meiser and Esfandiar Mohammadi. Tight on budget? tight bounds for r-fold approximate differential privacy. In Proceedings of the 2018 ACM SIGSAC Conference on Computer and Communications Security, pages 247–264, 2018.
 [35] Ilya Mironov. On significance of the least significant bits for differential privacy. In Proceedings of the 2012 ACM conference on Computer and communications security, pages 650–661, 2012.
 [36] Ilya Mironov. Rényi differential privacy. In 2017 IEEE 30th Computer Security Foundations Symposium (CSF), pages 263–275. IEEE, 2017.
 [37] Ilya Mironov, Kunal Talwar, and Li Zhang. Rényi differential privacy of the sampled gaussian mechanism. arXiv preprint arXiv:1908.10530, 2019.
 [38] Thomas Prest, Thomas Ricosset, and Mélissa Rossi. Simple, fast and constant-time gaussian sampling over the integers for falcon. In Second PQC Standardization Conference, 2019.
 [39] Swaroop Ramaswamy, Rajiv Mathews, Kanishka Rao, and Françoise Beaufays. Federated learning for emoji prediction in a mobile keyboard. arXiv preprint arXiv:1906.04329, 2019.
 [40] Swaroop Ramaswamy, Om Thakkar, Rajiv Mathews, Galen Andrew, H Brendan McMahan, and Françoise Beaufays. Training production language models without memorizing user data. arXiv preprint arXiv:2009.10031, 2020.
 [41] Sashank Reddi, Zachary Charles, Manzil Zaheer, Zachary Garrett, Keith Rush, Jakub Konečnỳ, Sanjiv Kumar, and H Brendan McMahan. Adaptive federated optimization. arXiv preprint arXiv:2003.00295, 2020.

 [42] Google Research. Federated analytics: Collaborative data science without data collection, May 2020.
 [43] Sujoy Sinha Roy, Frederik Vercauteren, and Ingrid Verbauwhede. High precision discrete gaussian sampling on fpgas. In International Conference on Selected Areas in Cryptography, pages 383–401. Springer, 2013.
 [44] Diego Ruiz-Antolín and Javier Segura. A new type of sharp bounds for ratios of modified bessel functions. Journal of Mathematical Analysis and Applications, 443(2):1232–1246, 2016.
 [45] Aaron Schein, Zhiwei Steven Wu, Alexandra Schofield, Mingyuan Zhou, and Hanna Wallach. Locally private bayesian inference for count models. In International Conference on Machine Learning, pages 5638–5648. PMLR, 2019.
 [46] Shuang Song, Kamalika Chaudhuri, and Anand D Sarwate. Stochastic gradient descent with differentially private updates. In 2013 IEEE Global Conference on Signal and Information Processing, pages 245–248. IEEE, 2013.
 [47] VR Thiruvenkatachar and TS Nanjundiah. Inequalities concerning bessel functions and orthogonal polynomials. In Proceedings of the Indian Academy of Sciences-Section A, volume 33, page 373. Springer, 1951.
 [48] Florian Tramèr and Dan Boneh. Differentially private learning needs better features (or much more data). arXiv preprint arXiv:2011.11660, 2020.
 [49] Filipp Valovich and Francesco Alda. Computational differential privacy from lattice-based cryptography. In International Conference on Number-Theoretic Methods in Cryptology, pages 121–141. Springer, 2017.
 [50] Hongxu Yin, Arun Mallya, Arash Vahdat, Jose M Alvarez, Jan Kautz, and Pavlo Molchanov. See through gradients: Image batch recovery via gradinversion. arXiv preprint arXiv:2104.07586, 2021.
 [51] Yuqing Zhu and Yu-Xiang Wang. Poission subsampled rényi differential privacy. In International Conference on Machine Learning, pages 7634–7642. PMLR, 2019.
Appendix A Proof of Lemma 3.7
Before moving forward, we state the following lemma, which follows from a simple calculation.
Lemma A.1.
For any positive real number and any , we have that
Further for any positive reals we have that
where as defined in Lemma 3.8.
Proof.
The first inequality follows easily from the definition of and by noting that the function . For the second inequality, by the definition of we have that
Now, consider the scalar function for . Note that the function is monotonically increasing, concave and has values between with . Putting these facts together we have that for any
∎
Lemma A.2.
Given two nonnegative integers we have that
We are now ready to provide the proof of Lemma 3.7.
Proof of Lemma 3.7.
We prove the statement for , a similar analysis applies for the case by switching to .
Since Lemma 3.8 applies only in the case when is positive, we need to handle the negative case by noting that for integer . This requires a case analysis. We begin with the first case.
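As a numerical aside on the handling of negative orders: the sketch below assumes that the identity invoked here is the standard reflection symmetry I_{-n}(x) = I_n(x) of the modified Bessel function of the first kind for integer n, and that the symmetric Skellam probability mass function with variance mu is e^{-mu} I_k(mu), i.e. the law of the difference of two independent Poisson variables of mean mu/2 each. Both are assumptions about the elided formulas, and the helper names `bessel_i`, `poisson_pmf`, and `skellam_pmf_by_convolution` are illustrative rather than taken from the paper. The check compares the Bessel form against a direct Poisson convolution that does not presuppose the symmetry.

```python
import math


def bessel_i(order: int, x: float, terms: int = 60) -> float:
    """Modified Bessel function of the first kind for integer order, via its power series.

    Negative orders are mapped to |order|, which encodes the assumed
    reflection symmetry I_{-n}(x) = I_n(x) for integer n.
    """
    n = abs(order)
    return sum(
        (x / 2.0) ** (2 * m + n) / (math.factorial(m) * math.factorial(m + n))
        for m in range(terms)
    )


def poisson_pmf(k: int, lam: float) -> float:
    """P(N = k) for N ~ Poisson(lam), k >= 0."""
    return math.exp(-lam) * lam ** k / math.factorial(k)


def skellam_pmf_by_convolution(k: int, mu: float, cutoff: int = 60) -> float:
    """P(N1 - N2 = k) for independent N1, N2 ~ Poisson(mu / 2).

    Computed by truncated direct summation over N2 = j, N1 = j + k,
    without using any Bessel identity.
    """
    lam = mu / 2.0
    start = max(0, -k)  # need j + k >= 0
    return sum(poisson_pmf(j + k, lam) * poisson_pmf(j, lam) for j in range(start, cutoff))


if __name__ == "__main__":
    mu = 3.0
    for k in range(-4, 5):
        direct = skellam_pmf_by_convolution(k, mu)
        via_bessel = math.exp(-mu) * bessel_i(k, mu)
        print(f"k={k:+d}  convolution={direct:.12f}  bessel={via_bessel:.12f}")
```

If the two columns agree for both signs of k, the negative-order case can indeed be reduced to the positive-order one, which is the role this symmetry plays in the case analysis.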
Case 1 
In this case replacing setting we get that
(A.1) 
where we know that . Now consider the following calculation.
where the first inequality follows from Lemma 3.8, the second inequality follows from the fact that for all we have that , and the third inequality follows from Lemma A.1.
Case 2 
In this case replacing setting we get that
(A.2) 
where we know that . Now consider the following calculation.
where the first inequality follows from Lemma 3.8, the second inequality follows from the fact that for all we have that , and the third inequality follows from Lemma A.1.
Case 3 
In this case we first note that
Next, applying Lemma A.2 to the above expression, we get that