1 Introduction
A principal goal within trustworthy machine learning is the design of privacy-preserving algorithms. In recent years, differential privacy (DP) (Dwork et al., 2006b;a) has gained significant popularity as a privacy notion due to the strong protections that it ensures. This has led to several practical deployments, including by Google (Erlingsson et al., 2014; Shankland, 2014), Apple (Greenberg, 2016; Apple Differential Privacy Team, 2017), Microsoft (Ding et al., 2017), and the U.S. Census Bureau (Abowd, 2018). DP guarantees are often expressed in terms of parameters ε and δ, with small values indicating that the algorithm is less likely to leak information about any individual within a set of people providing data. It is common to set ε to a small positive constant, and δ to be inverse-polynomial in the number n of users.
DP can be enforced for any statistical or machine learning task, and it is particularly well-studied for the real summation problem, where each user i holds a real number x_i ∈ [0, 1], and the goal is to estimate the sum x_1 + ⋯ + x_n. This constitutes a basic building block within machine learning, with extensions including (private) distributed mean estimation (see, e.g., Biswas et al., 2020; Girgis et al., 2021), stochastic gradient descent (Song et al., 2013; Bassily et al., 2014; Abadi et al., 2016; Agarwal et al., 2018), and clustering (Stemmer and Kaplan, 2018; Stemmer, 2020).

The real summation problem, which is the focus of this work, has been well-studied in several models of DP. In the central model, where a curator has access to the raw data and is required to produce a private data release, the smallest possible absolute error is known to be Θ_ε(1); this can be achieved via the ubiquitous Laplace mechanism (Dwork et al., 2006b), which is also known to be nearly optimal (see the supplementary material for more discussion) for the most interesting regime of small ε. In contrast, for the more privacy-stringent local setting (Kasiviswanathan et al., 2008; see also Warner, 1965), where each message sent by a user is supposed to be private, the smallest error is known to be Θ_ε(√n) (Beimel et al., 2008; Chan et al., 2012). This significant gap between the achievable central and local utilities has motivated the study of intermediate models of DP. The shuffle model (Bittau et al., 2017; Erlingsson et al., 2019; Cheu et al., 2019) reflects the setting where the user reports are randomly permuted before being passed to the analyzer; the output of the shuffler is required to be private. Two variants of the shuffle model have been studied: in the multi-message case (e.g., Cheu et al., 2019), each user can send multiple messages to the shuffler; in the single-message setting, each user sends one message (e.g., Erlingsson et al., 2019; Balle et al., 2019).
For the real summation problem, it is known that the smallest possible absolute error in the single-message shuffle model is Θ̃(n^{1/6}) (Balle et al., 2019), where Θ̃ hides a polylogarithmic factor in n, in addition to a dependency on ε. In contrast, multi-message shuffle protocols exist with a near-central accuracy of O_{ε,δ}(1) (Ghazi et al., 2020c; Balle et al., 2020), but they suffer several drawbacks: the number of messages sent per user is required to be at least three, each message has to be substantially longer than in the non-private case, and, in particular, the number of bits of communication per user has to grow faster than in the non-private setting. This (at least) threefold communication blowup relative to a non-private setting can be a limitation in real-time reporting use cases (where encryption of each message may be required and the associated cost can become dominant) and in federated learning settings (where great effort is undertaken to compress the gradients). Our work shows that near-central accuracy and near-zero communication overhead are possible for real aggregation over sufficiently many users:
Theorem 1.
For any ε = O(1) and δ ∈ (0, 1), there is an (ε, δ)-DP real summation protocol in the shuffle model whose mean squared error (MSE) is at most 1 + o(1) times the MSE of the Laplace mechanism with parameter ε, in which each user sends 1 + o(1) messages in expectation, and each message contains (1/2 + o(1)) log₂ n bits.
Note that each o(1) above hides a small term that vanishes as the number n of users grows. Moreover, the number of bits per message is equal, up to lower-order terms, to that needed to achieve the same MSE even without any privacy constraints.
Theorem 1 follows from an analogous result for the case of integer aggregation, where each user is given an element of the set {0, 1, …, Δ} (with Δ an integer), and the goal of the analyzer is to estimate the sum of the users' inputs. We refer to this task as the Δ-summation problem.
For Δ-summation, the standard mechanism in the central model is the Discrete Laplace (aka Geometric) mechanism, which first computes the true answer and then adds to it a noise term sampled from the Discrete Laplace distribution with parameter ε/Δ (Ghosh et al., 2012); the Discrete Laplace distribution with parameter a, denoted DLap(a), has probability mass proportional to e^{−a|k|} at each integer k. We can achieve an error arbitrarily close to this mechanism in the shuffle model, with minimal communication overhead:

Theorem 2.
For any ε = O(1) and δ ∈ (0, 1), there is an (ε, δ)-DP Δ-summation protocol in the shuffle model whose MSE is at most 1 + o(1) times that of the Discrete Laplace mechanism with parameter ε/Δ, and where each user sends 1 + o(1) messages in expectation, with each message containing ⌈log₂(2Δ + 1)⌉ bits.
In Theorem 2, the o(1) term hides a factor that vanishes as n grows. We also note that the number of bits per message in the protocol is within a single bit of the minimum message length needed to compute the sum without any privacy constraints. Incidentally, for Δ = 1, Theorem 2 improves the communication overhead obtained by Ghazi et al. (2020b). (This improvement turns out to be crucial in practice, as our experiments show.)
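To make the central-model baseline concrete, the Discrete Laplace mechanism can be sketched as follows. This is an illustrative sketch, not the paper's code; the sampler uses the standard fact that the difference of two i.i.d. geometric random variables is Discrete-Laplace distributed, and the choice a = ε/Δ (calibrating to the sensitivity Δ) is the standard central-model setting.

```python
import numpy as np

def sample_discrete_laplace(a: float, rng: np.random.Generator) -> int:
    # DLap(a) has probability mass proportional to exp(-a * |k|) at each integer k.
    # If X, Y are i.i.d. Geometric(1 - e^{-a}), then X - Y ~ DLap(a).
    p = 1.0 - np.exp(-a)
    return int(rng.geometric(p) - rng.geometric(p))

def discrete_laplace_mechanism(xs, eps: float, Delta: int, rng: np.random.Generator) -> int:
    # Central model: compute the true Delta-summation answer and add DLap(eps / Delta).
    return sum(xs) + sample_discrete_laplace(eps / Delta, rng)
```

The variance of DLap(a) is 2e^{−a}/(1 − e^{−a})², which is the MSE benchmark the shuffle protocol aims to match.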
Using Theorem 1 as a black box, we obtain the following corollary for the sparse vector summation problem, where each user is given a k-sparse (possibly high-dimensional) vector of norm at most 1, and the goal is to compute the sum of all user vectors with minimal error.

Corollary 3.
For every dimension d and sparsity k, there is an (ε, δ)-DP algorithm for k-sparse vector summation in d dimensions in the shuffle model whose error is at most 1 + o(1) times that of the Laplace mechanism with parameter ε, and where each user sends k(1 + o(1)) messages in expectation, and each message contains a number of bits logarithmic in d and n.
1.1 Technical Overview
We will now describe the high-level technical ideas underlying our protocol and its analysis. Since the real summation protocol can be obtained from the Δ-summation protocol using known randomized discretization techniques (e.g., from Balle et al. (2020)), we focus only on the latter. For simplicity of presentation, we will sometimes be informal here; everything will be formalized later.
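For intuition, the discretization step can be sketched as follows; this is our illustrative sketch of standard unbiased randomized rounding, not necessarily the exact procedure of Balle et al. (2020):

```python
import numpy as np

def randomized_round(x: float, Delta: int, rng: np.random.Generator) -> int:
    # Map x in [0, 1] to an integer in {0, ..., Delta}, unbiased: E[output] = x * Delta.
    scaled = x * Delta
    low = int(np.floor(scaled))
    # Round up with probability equal to the fractional part of x * Delta.
    return low + int(rng.random() < scaled - low)
```

Each user rounds x_i, the Δ-summation protocol is run on the rounded values, and the analyzer divides the result by Δ; the rounding contributes MSE at most n/(4Δ²) after rescaling, which becomes negligible when Δ grows suitably with n.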
Infinite Divisibility.
To achieve a performance similar to the central-DP Discrete Laplace mechanism (described before Theorem 2) in the shuffle model, we face several obstacles. To begin with, the noise has to be divided among all users, instead of being added centrally. Fortunately, this can be solved through the infinite divisibility of the Discrete Laplace distribution (see (Goryczka and Xiong, 2017) for a discussion on distributed noise generation via infinite divisibility): for every n, there is a distribution D_{1/n} for which, if each user i samples a noise term η_i independently from D_{1/n}, then η_1 + ⋯ + η_n has the same distribution as DLap(ε/Δ).
To implement the above idea in the shuffle model, each user has to be able to send their noise to the shuffler. Following Ghazi et al. (2020b), we can send such a noise term in unary, i.e., if η_i ≥ 0 we send the message +1 a total of η_i times, and otherwise we send the message −1 a total of |η_i| times. (Since the distribution D_{1/n} has a small tail probability, η_i will mostly lie in {−1, 0, 1}, meaning that a non-unary encoding of the noise would not significantly reduce the communication.) This is in addition to user i sending their own input x_i (in binary, as a single message) if it is nonzero; if we were to send x_i in unary like the noise, it could require as many as Δ messages, which is undesirable for us since we later pick Δ to grow with n for real summation. The analyzer is simple: sum up all the messages.
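In code, this basic report (not yet private for Δ > 1, as discussed next) could look as follows; this is a sketch under the description above, with the noise share assumed to be already sampled from D_{1/n}:

```python
def user_messages(x: int, noise: int) -> list:
    # The input is sent in binary as a single message (if nonzero);
    # the noise share is sent in unary as copies of +1 or -1.
    msgs = [x] if x != 0 else []
    msgs += [1] * noise if noise >= 0 else [-1] * (-noise)
    return msgs

def analyzer(shuffled_messages) -> int:
    # The analyzer simply sums all messages.
    return sum(shuffled_messages)
```

For example, combining the reports of a user with input 5 and noise −2 and a user with input 0 and noise 3 yields the noised sum 5 − 2 + 3 = 6.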
Unfortunately, this approach is not DP in the shuffle model for Δ > 1 because, even after shuffling, the analyzer can still see c_y, the number of messages equal to y, which for every y ∈ {2, …, Δ} is exactly the number of users whose input is equal to y.
ZeroSum Noise over NonBinary Alphabets.
To overcome this issue, we have to "noise" the values themselves, while at the same time preserving the accuracy. We achieve this by making some users send additional messages whose sum is equal to zero; e.g., a user may send +1 and −1 in conjunction with the previously described messages. Since the analyzer just sums up all the messages, this additional zero-sum noise does not affect accuracy.
The bulk of our technical work is in the privacy proof of such a protocol. To understand the challenge, notice that the analyzer still sees the counts c_y, which are now highly correlated due to the zero-sum noise added. This is unlike most DP algorithms in the literature, where noise terms are added independently to each coordinate. Our main technical insight is that, by a careful change of basis, we can "reduce" the view to the independent-noise case.
To illustrate our technique, let us consider the case Δ = 2. In this case, there are two zero-sum "noise atoms" that a user might send: {+1, −1} and {+2, −1, −1}. These two kinds of noise are sent independently, i.e., whether the user sends {+1, −1} does not affect whether {+2, −1, −1} is also sent. After shuffling, the analyzer sees the counts (c_{−1}, c_1, c_2). Observe that there is a one-to-one mapping between these counts and (s, w, c_2) defined by s = c_1 + 2c_2 − c_{−1} and w = c_{−1} − 2c_2, meaning that we may prove the privacy of the latter view instead. Consider the effect of sending the {+1, −1} noise: w is increased by one, whereas s and c_2 are completely unaffected. Similarly, when we send the {+2, −1, −1} noise, c_2 is increased by one, whereas s and w are completely unaffected. Hence, the noise terms added to w and c_2 are now independent! Finally, s is exactly the sum of all messages, which was noised by the central noise explained earlier.
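The following small check illustrates the change-of-basis idea numerically for Δ = 2, under our reading of the argument (the atoms {+1, −1} and {+2, −1, −1} and the specific transformed coordinates are illustrative assumptions): each atom moves exactly one transformed coordinate, while the sum s is untouched.

```python
from collections import Counter

def transformed_view(messages):
    c = Counter(messages)
    s = c[1] + 2 * c[2] - c[-1]   # sum of all messages (0s contribute nothing)
    w = c[-1] - 2 * c[2]          # affected only by the {+1, -1} atom
    return s, w, c[2]             # c[2] is affected only by the {+2, -1, -1} atom

base = [1, 2, 0]                  # three users' inputs, sent as messages
s0, w0, t0 = transformed_view(base)

s1, w1, t1 = transformed_view(base + [1, -1])      # add one {+1, -1} atom
assert (s1, t1) == (s0, t0) and w1 == w0 + 1

s2, w2, t2 = transformed_view(base + [2, -1, -1])  # add one {+2, -1, -1} atom
assert (s2, w2) == (s0, w0) and t2 == t0 + 1
```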
A vital detail omitted in the previous discussion is that the central noise, which affects c_1 and c_{−1}, is not canceled out in w. Indeed, in our formal proof we need a special argument (Lemma 9) to deal with this noise.
Moreover, generalizing this approach to larger values of Δ requires overcoming additional challenges: (i) the basis change has to be carried out over the integers ℤ, which precludes a direct use of classic tools from linear algebra such as the Gram–Schmidt process, and (ii) special care has to be taken when selecting the new basis so as to ensure that the sensitivity does not significantly increase, which would require more added noise (this complication leads to the usage of the matrix-based linear query problem in Section 4).

1.2 Related Work
Summation in the Shuffle Model.
Our work is most closely related to that of Ghazi et al. (2020b), who gave a protocol for the case Δ = 1 (i.e., binary summation); our protocol can be viewed as a generalization of theirs. As explained above, this generalization requires significant novel technical and conceptual ideas; for example, the basis change was not (directly) required by Ghazi et al. (2020b).
The idea of splitting the input into multiple additive shares dates back to the "split-and-mix" protocol of Ishai et al. (2006), whose analysis was improved in Ghazi et al. (2020c); Balle et al. (2020) to get the aforementioned shuffle DP algorithms for aggregation. These analyses all crucially rely on the addition being over a finite group. Since we actually want to sum over the integers and there are n users, this approach requires the group size to be at least nΔ to prevent an "overflow". This also means that each user needs to send at least log₂(nΔ) bits per message. On the other hand, by dealing with integers directly, each of our messages is only roughly log₂ Δ bits, further reducing the communication.
From a technical standpoint, our approach is also different from that of Ishai et al. (2006), as we analyze the privacy of the protocol, instead of its security as in their paper. This allows us to overcome the known lower bound on the number of messages for information-theoretic security (Ghazi et al., 2020c), and obtain a DP protocol with 1 + o(1) messages (where the o(1) term is as in Theorem 1).
The Shuffle DP Model.
Recent research on the shuffle model of DP includes work on aggregation mentioned above (Balle et al., 2019; Ghazi et al., 2020c; Balle et al., 2020), analytics tasks including computing histograms and heavy hitters (Ghazi et al., 2021a; Balcer and Cheu, 2020; Ghazi et al., 2020a;b; Cheu and Zhilyaev, 2021), counting distinct elements (Balcer et al., 2021; Chen et al., 2021) and private mean estimation (Girgis et al., 2021), as well as k-means clustering (Chang et al., 2021).
Aggregation in Machine Learning.
We note that communication-efficient private aggregation is a core primitive in federated learning (see Kairouz et al. (2019) and the references therein). It is also naturally related to mean estimation in distributed models of DP (e.g., Gaboardi et al., 2019). Finally, we point out that communication efficiency is a common requirement in distributed learning and optimization, and substantial effort is spent on compression of the messages sent by users, through multiple methods including hashing, pruning, and quantization (see, e.g., Zhang et al., 2013; Alistarh et al., 2017; Suresh et al., 2017; Acharya et al., 2019; Chen et al., 2020).
1.3 Organization
We start with some background in Section 2. Our protocol is presented in Section 3. Its privacy property is established and the parameters are set in Section 4. Experimental results are given in Section 5. We discuss some interesting future directions in Section 6. All missing proofs can be found in the Supplementary Material (SM).
2 Preliminaries and Notation
We use [n] to denote the set {1, 2, …, n}.
Probability.
For any distribution D, we write X ∼ D to denote a random variable X that is distributed as D. For two distributions D₁ and D₂, let D₁ + D₂ (resp., D₁ − D₂) denote the distribution of X + Y (resp., X − Y), where X ∼ D₁ and Y ∼ D₂ are independent. For n ∈ ℕ, we use n · D to denote the distribution of X₁ + ⋯ + X_n, where X₁, …, X_n ∼ D are independent.

A distribution D over non-negative integers is said to be infinitely divisible if and only if, for every n ∈ ℕ, there exists a distribution D_{1/n} such that n · D_{1/n} is identical to D, where the sum is over n independent samples.
The negative binomial distribution with parameters r > 0 and p ∈ (0, 1), denoted NB(r, p), has probability mass binom(k + r − 1, k) (1 − p)^r p^k at every integer k ≥ 0. NB(r, p) is infinitely divisible; specifically, NB(r, p) = n · NB(r/n, p) for every n ∈ ℕ.

Differential Privacy.
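Infinite divisibility of the negative binomial distribution is easy to check empirically. The snippet below compares NB(r, p) with the sum of m independent NB(r/m, p) samples; note that numpy's `negative_binomial(n, p)` counts failures before n successes with success probability p, which may be parameterized differently from the convention above, but divisibility in the first parameter holds either way.

```python
import numpy as np

rng = np.random.default_rng(0)
r, p, m, N = 2.0, 0.4, 10, 100000

# Draw directly from NB(r, p), and as a sum of m shares from NB(r/m, p).
direct = rng.negative_binomial(r, p, size=N)
split = sum(rng.negative_binomial(r / m, p, size=N) for _ in range(m))

# Under numpy's convention, both have mean r * (1 - p) / p = 3.0.
assert abs(direct.mean() - 3.0) < 0.1
assert abs(split.mean() - 3.0) < 0.1
```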
Two input datasets X = (x_1, …, x_n) and X′ = (x′_1, …, x′_n) are said to be neighboring if and only if they differ on at most a single user's input, i.e., x_i = x′_i for all but one i ∈ [n].
Shuffle DP Model.
A protocol over inputs x_1, …, x_n in the shuffle DP model (Bittau et al., 2017; Erlingsson et al., 2019; Cheu et al., 2019) consists of three procedures. A local randomizer takes an input x_i and outputs a set of messages. The shuffler takes the multisets output by the local randomizer applied to each of x_1, …, x_n, and produces a random permutation of the messages as output. Finally, the analyzer takes the output of the shuffler and computes the output of the protocol. Privacy in the shuffle model is enforced on the output of the shuffler when a single input is changed.
3 Generic Protocol Description
Below we describe our protocol for Δ-summation that is private in the shuffle DP model. In our protocol, the randomizer sends messages, each of which is an integer in {−Δ, …, Δ}. The analyzer simply sums up all the incoming messages. The messages sent by the randomizer can be categorized into three classes:
- Input: each user i sends their input x_i if it is nonzero.

- Central Noise: this is the noise whose sum is equal to the Discrete Laplace noise commonly used by algorithms in the central DP model. This noise is sent in "unary" as +1 or −1 messages.

- Zero-Sum Noise: finally, we "flood" the messages with noise that cancels out. This noise comes from a carefully chosen subcollection 𝒜 of the collection of all multisets of {−Δ, …, Δ} whose sum of elements is equal to zero (e.g., {+1, −1} may belong to 𝒜). Note that while such a collection may in principle be infinite, we will later set 𝒜 to be finite, resulting in an efficient protocol; for more details, see Theorem 12 and the paragraph succeeding it. We will refer to each A ∈ 𝒜 as a noise atom.
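The three message classes can be sketched together as a simplified randomizer/analyzer pair. This is illustrative only: the atom collection and the distribution parameters below are placeholders, not the calibrated choices from Section 4.

```python
import numpy as np

ATOMS = [(1, -1), (2, -1, -1)]  # example zero-sum noise atoms; each sums to 0

def randomizer(x: int, rng: np.random.Generator):
    msgs = [x] if x != 0 else []
    # Central noise: two negative binomial shares, sent in unary as +1/-1 messages.
    plus = int(rng.negative_binomial(0.1, 0.5))
    minus = int(rng.negative_binomial(0.1, 0.5))
    msgs += [1] * plus + [-1] * minus
    # Zero-sum flooding noise: append whole atoms; they cancel in the final sum.
    for atom in ATOMS:
        for _ in range(int(rng.negative_binomial(0.2, 0.5))):
            msgs.extend(atom)
    return msgs, plus - minus  # also return this user's central-noise share

def analyzer(all_messages) -> int:
    return sum(all_messages)
```

Because every flooding atom sums to zero, the analyzer's output equals the true sum plus only the total central noise, regardless of how much flooding is sent.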
Algorithms 1 and 2 show the generic form of our protocol, which we refer to as the Correlated Noise mechanism. The protocol is specified by the following infinitely divisible distributions over the non-negative integers: the "central" noise distribution, and, for every noise atom A ∈ 𝒜, a "flooding" noise distribution 𝒟_A.
Note that since A is a multiset, Line 8 goes over each element the same number of times it appears in A; e.g., if A = {+2, −1, −1}, the iteration for the element −1 is executed twice.
3.1 Error and Communication Complexity
We now state generic forms for the MSE and communication cost of the protocol:
Observation 5.
The MSE of the protocol is equal to the variance of the total central noise added across all users.
We stress here that the per-user noise distribution itself is not the Discrete Laplace distribution; we pick it so that the sum of the noise terms across all users follows the desired distribution. As a result, the MSE is indeed equal to the variance of the Discrete Laplace noise.
Observation 6.
Each user sends at most 1 + E[number of noise messages] messages in expectation, each consisting of ⌈log₂(2Δ + 1)⌉ bits.
4 Parameter Selection and Privacy Proof
The focus of this section is on selecting concrete distributions to instantiate the protocol and formalizing its privacy guarantees, ultimately proving Theorem 2. First, in Section 4.1, we introduce additional notation and reduce our task to proving a privacy guarantee for a protocol in the central model. With these simplifications, we give a generic form of the privacy guarantees in Section 4.2. Sections 4.3 and 4.4 are devoted to a more concrete selection of parameters. Finally, Theorem 2 is proved in Section 4.5.
4.1 Additional Notation and Simplifications
Matrix-Vector Notation.
We use boldface letters to denote vectors and matrices, and standard letters to refer to their coordinates (e.g., if v is a vector, then v_i refers to its ith coordinate). For convenience, we allow general index sets for vectors and matrices; e.g., for an index set I, we write v_I to denote the tuple (v_i)_{i ∈ I}. Operations such as addition, scalar-vector/matrix multiplication, and matrix-vector multiplication are defined naturally.

For an index i, we use e_i to denote the ith vector in the standard basis; that is, its i-indexed coordinate is equal to 1 and each of the other coordinates is equal to 0. Furthermore, we use 0 to denote the all-zeros vector.
Let 𝒴 denote the set {−Δ, …, Δ} of possible messages, and let c denote the vector (c_y)_{y ∈ 𝒴} of message counts. Recall that a noise atom A is a multiset of elements from 𝒴. It is useful to also think of A as a vector in ℤ_{≥0}^𝒴, where its yth entry denotes the number of times y appears in A. We overload the notation and use A to represent both the multiset and its corresponding vector.

Let C denote the matrix whose rows are indexed by 𝒴 and whose columns are indexed by [Δ] ∪ 𝒜, where the column indexed by j ∈ [Δ] is e_j and the column indexed by A ∈ 𝒜 is the vector representation of A. In other words, C is a concatenation of these column vectors. Furthermore, let c₁ denote the row of C indexed by 1, and let C′ denote the matrix C with this row removed.
Next, we think of each input dataset as its histogram h ∈ ℤ_{≥0}^{[Δ]}, where h_j denotes the number of users i such that x_i = j. Under this notation, two input datasets are neighbors iff their histograms h and h′ satisfy h′ = h − e_a + e_b for some a, b ∈ {0} ∪ [Δ] (where we let e_0 := 0). For each histogram h, we write h̄ to denote the vector resulting from appending zeros to the beginning of h; more formally, for every y ∈ 𝒴, we let h̄_y = 0 if y ≤ 0 and h̄_y = h_y if y ≥ 1.
An Equivalent Central DP Algorithm.
A benefit of using infinitely divisible noise distributions is that they allow us to translate our protocols to equivalent ones in the central model, where the total sum of the noise terms has a well-understood distribution. In particular, with the notation introduced above, Algorithm 1 corresponds to Algorithm 3 in the central model:
Observation 7.
CorrNoiseRandomizer is DP in the shuffle model if and only if CorrNoiseCentral is DP in the central model.
Given Observation 7, we can focus on proving the privacy guarantee of CorrNoiseCentral in the central model, which will occupy the majority of this section.
Noise Addition Mechanisms for MatrixBased Linear Queries.
The noise addition mechanism for Δ-summation (as defined in Section 1) works by first computing the sum and then adding to it a noise random variable sampled from 𝒟, where 𝒟 is a distribution over the integers. Note that under our vector notation above, the noise addition mechanism simply outputs Σ_{j ∈ [Δ]} j · h_j + Z, where Z ∼ 𝒟.
It will be helpful to consider a generalization of the Δ-summation problem, in which the weighting above is replaced by an arbitrary matrix (so that the added noise is now also a vector).
To define such a problem formally, let I be any index set. Given a matrix M ∈ ℤ^{I × [Δ]}, the M-linear query problem is to compute, given an input histogram h, an estimate of Mh. (Similar definitions are widely used in the literature; see, e.g., (Nikolov et al., 2013). The main distinction of our definition is that we only consider integer-valued matrices and histograms.) Equivalently, one can think of each user as holding a column vector of M or the all-zeros vector 0, and the goal is to compute the sum of these vectors.

The noise addition algorithms for Δ-summation can be easily generalized to the linear query case: for a collection (𝒟_i)_{i ∈ I} of distributions, the noise addition mechanism samples Z_i ∼ 𝒟_i independently for each i ∈ I, and then outputs Mh + Z.
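The matrix-based noise addition mechanism can be sketched as follows; the toy matrix and per-coordinate noise sampler here are placeholders for illustration.

```python
import numpy as np

def noise_addition_mechanism(M: np.ndarray, h: np.ndarray, sample_noise, rng) -> np.ndarray:
    # Output M h + Z, where Z_i is sampled independently per output coordinate.
    y = M @ h
    z = np.array([sample_noise(i, rng) for i in range(M.shape[0])])
    return y + z

# Toy example: two possible inputs (Delta = 2), three linear queries.
M = np.array([[1, 0],
              [0, 1],
              [1, 2]])
h = np.array([2, 3])  # histogram: two users hold input 1, three hold input 2
```

With a sampler that always returns zero, the mechanism reduces to the exact answer Mh; plugging in the negative binomial samplers of Section 4.3 gives the private version.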
4.2 Generic Privacy Guarantee
With all the necessary notation ready, we can now state our main technical theorem, which gives a privacy guarantee in terms of a right inverse of the matrix C′ from Section 4.1:
Theorem 8.
Let B denote any right inverse of C′ (i.e., C′B = I) whose entries are integers. Suppose that the following holds:

- The noise addition mechanism with noise distribution 𝒟 is (ε₁, δ₁)-DP for Δ-summation.

- The noise addition mechanism with noise distributions (𝒟_i)_i is (ε₂, δ₂)-DP for the linear query problem given by B, where b_i denotes the ith column of B.

Then, CorrNoiseCentral, with its central noise distribution and flooding noise distributions 𝒟_A set according to 𝒟 and the (𝒟_i)_i (with the distribution of the atom {+1, −1} receiving an additional summand, as discussed below), is (ε₁ + ε₂, δ₁ + δ₂)-DP for Δ-summation.
The right inverse B indeed represents the "change of basis" alluded to in the introduction. It will be specified in the next subsections, along with the noise distributions.
As one might have noticed from Theorem 8, the atom {+1, −1} is somewhat different from the other noise atoms, as its noise distribution is the sum of two terms. A high-level explanation for this is that our central noise is sent as +1 and −1 messages, and we would like to use the {+1, −1} noise atom to "flood out" the correlations left in these messages. A precise version of this statement is given below in Lemma 9. We remark that the first output coordinate alone has exactly the same distribution as the noise addition mechanism. The main challenge in this analysis is that the two coordinates are correlated; indeed, this is where the additional random variable helps "flood out" the correlation.
Lemma 9.
Let M be a mechanism that, on input histogram h, works as follows:

- Sample Z₁ and Z₂ independently from the (aggregated) central noise distribution.

- Sample V from the flooding distribution.

- Output the pair (Σ_{j ∈ [Δ]} j · h_j + Z₁ − Z₂, Z₂ + V).
If the noise addition mechanism is DP for Δ-summation, then M is DP with comparable parameters.
Lemma 9 is a direct improvement over the main analysis of Ghazi et al. (2020b), whose proof (which works only for Δ = 1) requires the noise addition mechanism to satisfy DP with stronger parameters for summation; we remove this loss. Our novel insight is that, when conditioning on the second output coordinate rather than on the value of the underlying noise variable, the distributions of the first coordinate on neighboring datasets are quite similar, up to a "shift" and a bounded multiplicative factor. This allows us to "match" the two probability masses and achieve the improvement. The full proof of Lemma 9 is deferred to the SM.
Let us now show how Lemma 9 can be used to prove Theorem 8. At a high level, we run the mechanism from Lemma 9 and the noise addition mechanism, and argue that we can use their outputs to construct the output of CorrNoiseCentral. The intuition behind this is that the first coordinate of the output of M gives the weighted sum of the desired output, and the second coordinate gives the number of messages used to flood the central noise. As for the noise addition mechanism, since B is a right inverse of C′, we can use its output to reconstruct the numbers of messages c_y for y ≠ 1. The number of +1 messages can then be reconstructed from the weighted sum and the numbers of all other messages. These ideas are encapsulated in the proof below.
Proof of Theorem 8.
Consider a mechanism M′ defined as follows:

- First, run the mechanism M from Lemma 9 on the input histogram h to arrive at an output pair.

- Second, run the noise addition mechanism for the linear query problem on h to get an output vector.

- Output the vector of message counts computed from these two outputs as described above.
From Lemma 9 and our assumption on the central noise, the first step of M′ is DP. By the assumption that the noise addition mechanism is DP for the linear query problem and by the basic composition theorem, the first two steps of M′ are DP. The last step of M′ only uses the output from the first two steps; hence, by the post-processing property of DP, we can conclude that M′ is indeed DP.
Next, we claim that has the same distribution as with the specified parameters. To see this, recall from that we have , and , where and are independent. Furthermore, from the definition of the noise addition mechanism, we have
where are independent, and denotes after replacing its first coordinate with zero.
Notice that . Using this and our assumption that , we get
Let ; we can see that each entry is independently distributed as . Finally, we have
Recall that for all ; equivalently, . Thus, we have
Hence, we can conclude that ; this implies that has the same distribution as the mechanism . ∎
4.3 Negative Binomial Mechanism
Having established a generic privacy guarantee for our algorithm, we now have to specify distributions that satisfy the conditions in Theorem 8 while keeping the number of messages sent for the noise small. Similarly to Ghazi et al. (2020b), we use the negative binomial distribution. Its privacy guarantee is summarized below. (For the exact statement we use here, please refer to Theorem 13 of Ghazi et al. (2021b), which contains a correction of calculation errors in Theorem 13 of Ghazi et al. (2020b).)
Theorem 10 (Ghazi et al. (2020b)).
For any ε > 0 and δ ∈ (0, 1), there is a choice of parameters r and p (depending on ε, δ, and Δ) such that the noise addition mechanism with noise distribution NB(r, p) is (ε, δ)-DP for Δ-summation.
We next extend Theorem 10 to the linear query problem. To state the formal guarantees, we say that a vector u is dominated by a vector v iff |u_i| ≤ v_i for every coordinate i.
Corollary 11.
Suppose that every column of B is dominated by a vector v. For each i, choose parameters r_i and p_i appropriately (as in Theorem 10, scaled according to v). Then, the noise addition mechanism with noise distributions NB(r_i, p_i) is DP for the linear query problem given by B.
4.4 Finding a Right Inverse
A final step before we can apply Theorem 8 is to specify the noise atom collection 𝒜 and the right inverse B of C′. Below we give such a right inverse in which every column is dominated by a vector v that is "small". This allows us to then use the negative binomial mechanism from the previous section with a "small" amount of noise. How "small" v needs to be depends on the expected number of messages sent. (We note that in our noise selection below, every atom A ∈ 𝒜 has at most three elements; in other words, the number of atoms sent and the number of resulting messages will be within a factor of three of each other.)
Theorem 12.
There exist a noise atom collection 𝒜 of size O(Δ), an integer right inverse B of C′, and a vector v with small entries such that every column of B is dominated by v.
The full proof of Theorem 12 is deferred to the SM. The main idea is to essentially proceed via Gaussian elimination on C′. However, we have to be careful about the choice of the order of rows/columns on which to run the elimination, as otherwise it might produce a non-integer matrix or one whose columns are not "small". In our proof, we order the rows based on their absolute values, and we set 𝒜 to be the collection consisting of {+1, −1} together with, for each y ∈ {2, …, Δ}, a three-element atom containing +y and two negative elements summing to −y (e.g., {+2, −1, −1}). In other words, these are the noise atoms we send in our protocol.
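The defining property of the right inverse is easy to sanity-check numerically; the matrices below are a toy example, not the ones constructed in the proof of Theorem 12.

```python
import numpy as np

# A wide integer matrix and a candidate integer right inverse.
C = np.array([[1, 0, 1],
              [0, 1, -1]])
B = np.array([[1, 0],
              [0, 1],
              [0, 0]])

product = C @ B
assert np.array_equal(product, np.eye(2, dtype=int))  # C B = I, so B is a right inverse
```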
4.5 Specific Parameter Selection: Proof of Theorem 2
Proof of Theorem 2.
Let 𝒜 and B be as in Theorem 12, and choose the noise distributions as follows:

- the central noise distribution is a negative binomial distribution chosen as in Theorem 10;

- the flooding distribution of the atom {+1, −1} is the sum of the corresponding distribution from Corollary 11 and an additional negative binomial term;

- for every other atom A ∈ 𝒜, the flooding distribution 𝒟_A is the corresponding negative binomial distribution from Corollary 11.
From Theorem 10, the noise addition mechanism is DP for Δ-summation. Corollary 11 implies that the noise addition mechanism is DP for the linear query problem. As a result, Theorem 8 ensures that CorrNoiseCentral with the parameters specified above is DP.
The error claim follows immediately from Observation 5, together with the fact that the total noise variance matches that of the Discrete Laplace mechanism. Using Observation 6, we can also bound the expected number of messages per user by 1 + o(1),
where each message consists of ⌈log₂(2Δ + 1)⌉ bits. ∎
5 Experimental Evaluation
We compare our "correlated noise" Δ-summation protocol (Algorithms 1 and 2) against known algorithms in the literature, namely the IKOS "split-and-mix" protocol (Ishai et al., 2006; Balle et al., 2020; Ghazi et al., 2020c) and the fragmented version of RAPPOR (Erlingsson et al., 2014; 2020). (The RAPPOR randomizer starts with a one-hot encoding of the input, which is a (Δ + 1)-bit string, and flips each bit with a certain probability. Fragmentation means that, instead of sending the entire bit string, the randomizer only sends the coordinates that are set to 1, each as a separate message. This is known to reduce both the communication and the error in the shuffle model (Cheu et al., 2019; Erlingsson et al., 2020).) We also include the Discrete Laplace mechanism (Ghosh et al., 2012) in our plots for comparison, although it is not directly implementable in the shuffle model. We do not include the generalized (i.e., (Δ + 1)-ary) Randomized Response algorithm (Warner, 1965), as it always incurs at least as large an error as RAPPOR.

For our protocol, we set the distributions in Theorem 8 so that its MSE is close to that of the Discrete Laplace mechanism. While the parameters set in our proofs give a theoretically vanishing overhead guarantee, they turn out to be rather impractical; instead, we resort to a tighter numerical approach to find the parameters. We discuss this, together with the setting of parameters for the other alternatives, in the SM.
For all mechanisms, the root mean square error (RMSE) and the (expected) communication per user depend only on the privacy and problem parameters, and are independent of the input data. We next summarize our findings.
Error.
The IKOS algorithm has the same error as the Discrete Laplace mechanism (in the central model), whereas our algorithm's error is slightly larger, due to its noise parameter being chosen slightly more conservatively than ε. On the other hand, the RMSE of RAPPOR grows as n increases, but it seems to converge as n becomes larger. (For the two largest values of n in our experiments, we found that the RMSEs differ by less than 1%.)

However, the key takeaway here is that the RMSEs of IKOS, Discrete Laplace, and our algorithm grow only linearly in Δ, but the RMSE of RAPPOR is proportional to Δ^{3/2}. This is illustrated in Figure 1(a).
Communication.
While the IKOS protocol achieves the same accuracy as the Discrete Laplace mechanism, it incurs a large communication overhead, as each message sent consists of Ω(log(nΔ)) bits and each user needs to send multiple messages. By contrast, when fixing Δ and taking n large, both RAPPOR and our algorithm send only about one message per user, of length O(log Δ) bits. This is illustrated in Figure 1(b). Note that the number of bits sent for IKOS is indeed not a monotone function of n since, as n increases, the number of messages required decreases but the length of each message increases.
Finally, we demonstrate the effect of varying the privacy parameter in Figure 1(c) for a fixed value of n. Although Theorem 2 suggests how the communication overhead should grow, we have observed larger gaps in the experiments. This seems to stem from the fact that, while the predicted growth would have been observed if we were using our analytic formula, our tighter parameter computation (detailed in Appendix H.1 of the SM) finds a protocol with even smaller communication, suggesting that the actual growth might be slower, though we do not know of a formal proof of this. Unfortunately, for large Δ, the optimization problem for the parameter search gets too demanding and our program does not find a good solution, leading to the "larger gap" observed in the experiments.
6 Conclusions
In this work, we presented a DP protocol for real and sparse vector aggregation in the shuffle model with accuracy arbitrarily close to the best possible central accuracy, and with relative communication overhead tending to zero as the number of users increases. It would be very interesting to generalize our protocol and obtain qualitatively similar guarantees for dense vector summation.
We also point out that in the low-privacy regime (large ε), the staircase mechanism is known to significantly improve upon the Laplace mechanism (Geng and Viswanath, 2016; Geng et al., 2015); the former achieves MSE that is exponentially small in ε, while the latter has MSE Θ(1/ε²). An interesting open question is to achieve such a gain in the shuffle model.
References
 Deep learning with differential privacy. In CCS, pp. 308–318.
 The US Census Bureau adopts differential privacy. In KDD, pp. 2867–2867.
 Hadamard response: estimating distributions privately, efficiently, and with little communication. In AISTATS, pp. 1120–1129.
 cpSGD: communication-efficient and differentially-private distributed SGD. In NeurIPS, pp. 7575–7586.
 QSGD: communication-efficient SGD via gradient quantization and encoding. In NIPS, pp. 1709–1720.
 Learning with privacy at scale. Apple Machine Learning Journal.
 Connecting robust shuffle privacy and pan-privacy. In SODA, pp. 2384–2403.
 Separating local & shuffled differential privacy via histograms. In ITC, pp. 1:1–1:14.
 The privacy blanket of the shuffle model. In CRYPTO, pp. 638–667.
 Private summation in the multi-message shuffle model. In CCS, pp. 657–676.
 Private empirical risk minimization: efficient algorithms and tight error bounds. In FOCS, pp. 464–473.
 Distributed private data analysis: simultaneously solving how and what. In CRYPTO, pp. 451–468.
 CoinPress: practical private mean and covariance estimation. In NeurIPS.
 Prochlo: strong privacy for analytics in the crowd. In SOSP, pp. 441–459.
 Optimal lower bound for differentially private multi-party aggregation. In ESA, pp. 277–288.
 Locally private means in one round. In ICML.
 On distributed differential privacy and counting distinct elements. In ITCS.
 Breaking the communication-privacy-accuracy trilemma. In NeurIPS.
 Distributed differential privacy via shuffling. In EUROCRYPT, pp. 375–403.
 Differentially private histograms in the shuffle model from fake users. CoRR abs/2104.02739.
 Collecting telemetry data privately. In NIPS, pp. 3571–3580.
 Our data, ourselves: privacy via distributed noise generation. In EUROCRYPT, pp. 486–503.
 Calibrating noise to sensitivity in private data analysis. In TCC, pp. 265–284.
 Encode, shuffle, analyze privacy revisited: formalizations and empirical evaluation. CoRR abs/2001.03618.
 Amplification by shuffling: from local to central differential privacy via anonymity. In SODA, pp. 2468–2479.
 RAPPOR: randomized aggregatable privacy-preserving ordinal response. In CCS, pp. 1054–1067.
 Locally private mean estimation: z-test and tight confidence intervals. In AISTATS, pp. 2545–2554.
 The staircase mechanism in differential privacy. IEEE J. Sel. Top. Signal Process. 9 (7), pp. 1176–1184.
 The optimal noise-adding mechanism in differential privacy. IEEE TOIT 62 (2), pp. 925–951.
 Pure differentially private summation from anonymous messages. In ITC.
 On the power of multiple anonymous messages. In EUROCRYPT.
 Private counting from anonymous messages: near-optimal accuracy with vanishing communication overhead. In ICML, pp. 3505–3514.
 B. Ghazi, R. Kumar, P. Manurangsi, and R. Pagh (2021b). Private counting from anonymous messages: near-optimal accuracy with vanishing communication overhead. CoRR abs/2106.04247. Note: this version contains a correction of calculation errors in Theorem 13 of Ghazi et al. (2020b).
 Private aggregation from fewer anonymous messages. In EUROCRYPT, pp. 798–827.
 Universally utility-maximizing privacy mechanisms. SIAM J. Comput. 41 (6), pp. 1673–1693.
 Shuffled model of federated learning: privacy, communication and accuracy trade-offs. In AISTATS.
 A comprehensive comparison of multiparty secure additions with differential privacy. IEEE Trans. Dependable Secur. Comput. 14 (5), pp. 463–477.
 Apple’s “differential privacy” is about collecting your data – but not your data. Wired, June 13.
 Cryptography from anonymity. In FOCS, pp. 239–248.
 Advances and open problems in federated learning. CoRR abs/1912.04977.
 What can we learn privately?. In FOCS, pp. 531–540.
 The geometry of differential privacy: the sparse and approximate cases. In STOC, pp. 351–360.
 Integrated public use microdata series (IPUMS) USA: version 10.0 [dataset]. Minneapolis, MN.
 How Google tricks itself to protect Chrome user privacy. CNET, October.
 Stochastic gradient descent with differentially private updates. In GlobalSIP, pp. 245–248.
 Differentially private means with constant multiplicative error. In NeurIPS, pp. 5436–5446.
 Locally private k-means clustering. In SODA, pp. 548–559.
 Distributed mean estimation with limited communication. In ICML, pp. 3329–3337.
 Randomized response: a survey technique for eliminating evasive answer bias. JASA 60 (309), pp. 63–69.
 Information-theoretic lower bounds for distributed statistical estimation with communication constraints. In NIPS, pp. 2328–2336.
Supplementary Material
Appendix A Additional Preliminaries
We start by introducing some additional notation and lemmas that will be used in our proofs.
Definition 13.
The hockey stick divergence between two (discrete) distributions $P$ and $Q$ is defined as
$d_{e^{\varepsilon}}(P \,\|\, Q) = \sum_{x} \left[ P(x) - e^{\varepsilon} \cdot Q(x) \right]_{+},$
where $[a]_{+} := \max\{a, 0\}$.
The following is a well-known restatement of DP in terms of the hockey stick divergence:
Lemma 14.
An algorithm $\mathcal{A}$ is $(\varepsilon, \delta)$-DP iff for all neighboring datasets $X$ and $X'$, it holds that $d_{e^{\varepsilon}}(\mathcal{A}(X) \,\|\, \mathcal{A}(X')) \leq \delta$.
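To make the hockey stick divergence concrete, the following small sketch (our own illustration, not part of the protocol) evaluates it for binary randomized response, whose output distributions on neighboring inputs are known in closed form:

```python
import math

def hockey_stick(p, q, eps):
    """d_{e^eps}(P || Q) = sum_x max(0, P(x) - e^eps * Q(x)), with P, Q as dicts."""
    support = set(p) | set(q)
    return sum(max(0.0, p.get(x, 0.0) - math.exp(eps) * q.get(x, 0.0))
               for x in support)

# Randomized response on a bit: report the true bit w.p. e^eps / (1 + e^eps).
eps = 1.0
keep = math.exp(eps) / (1 + math.exp(eps))
out_given_0 = {0: keep, 1: 1 - keep}  # output distribution when the bit is 0
out_given_1 = {0: 1 - keep, 1: keep}  # output distribution when the bit is 1

# Randomized response is (eps, 0)-DP, so the divergence at threshold e^eps
# vanishes (up to floating point), while any smaller threshold gives a
# strictly positive value, i.e., a necessarily nonzero delta.
print(hockey_stick(out_given_0, out_given_1, eps))       # ≈ 0
print(hockey_stick(out_given_0, out_given_1, eps / 2))   # > 0
```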
We will also use the following result from (Ghazi et al., 2020b, Lemma 21), which allows us to easily compute the differential privacy parameters of noise addition algorithms for summation.
Lemma 15 (Ghazi et al. (2020b)).
A $\mathcal{D}$-noise addition mechanism for summation is $(\varepsilon, \delta)$-DP for $\delta = \max\left\{ d_{e^{\varepsilon}}(\mathcal{D} \,\|\, \mathcal{D} + 1),\; d_{e^{\varepsilon}}(\mathcal{D} \,\|\, \mathcal{D} - 1) \right\}$, where $\mathcal{D} \pm 1$ denotes the distribution of $Z \pm 1$ for $Z \sim \mathcal{D}$.
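The shift-by-one hockey stick quantities appearing in this lemma are easy to evaluate numerically. The sketch below (our own illustration; the truncation level and parameters are choices of ours, not from the paper) computes them for a truncated discrete Laplace distribution:

```python
import math

def dlap_pmf(alpha, K):
    """Truncated discrete Laplace: P(k) proportional to alpha^|k| on {-K, ..., K}."""
    w = {k: alpha ** abs(k) for k in range(-K, K + 1)}
    z = sum(w.values())
    return {k: v / z for k, v in w.items()}

def hockey_stick(p, q, eps):
    """d_{e^eps}(P || Q) = sum_x max(0, P(x) - e^eps * Q(x)), with P, Q as dicts."""
    support = set(p) | set(q)
    return sum(max(0.0, p.get(x, 0.0) - math.exp(eps) * q.get(x, 0.0))
               for x in support)

def delta_of_noise(pmf, eps):
    """delta of a noise addition mechanism: hockey stick against the noise shifted by +/-1."""
    plus = {k + 1: v for k, v in pmf.items()}
    minus = {k - 1: v for k, v in pmf.items()}
    return max(hockey_stick(pmf, plus, eps), hockey_stick(pmf, minus, eps))

eps = 1.0
pmf = dlap_pmf(math.exp(-eps), K=200)
# Discrete Laplace noise with alpha = e^-eps is eps-DP for sensitivity-1 sums,
# so delta is essentially 0 here (up to the negligible truncation mass);
# evaluating at a smaller privacy budget yields a strictly positive delta.
print(delta_of_noise(pmf, eps))       # ≈ 0
print(delta_of_noise(pmf, eps / 2))   # > 0
```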