Differentially Private Aggregation in the Shuffle Model: Almost Central Accuracy in Almost a Single Message

09/27/2021 ∙ by Badih Ghazi, et al. ∙ Google Københavns Uni

The shuffle model of differential privacy has attracted attention in the literature due to it being a middle ground between the well-studied central and local models. In this work, we study the problem of summing (aggregating) real numbers or integers, a basic primitive in numerous machine learning tasks, in the shuffle model. We give a protocol achieving error arbitrarily close to that of the (Discrete) Laplace mechanism in the central model, while each user only sends 1 + o(1) short messages in expectation.




1 Introduction

A principal goal within trustworthy machine learning is the design of privacy-preserving algorithms. In recent years, differential privacy (DP) (Dwork et al., 2006b; a) has gained significant popularity as a privacy notion due to the strong protections that it ensures. This has led to several practical deployments including by Google (Erlingsson et al., 2014; Shankland, 2014), Apple (Greenberg, 2016; Apple Differential Privacy Team, 2017), Microsoft (Ding et al., 2017), and the U.S. Census Bureau (Abowd, 2018). DP properties are often expressed in terms of parameters and , with small values indicating that the algorithm is less likely to leak information about any individual within a set of people providing data. It is common to set to a small positive constant (e.g., ), and to inverse-polynomial in .

DP can be enforced for any statistical or machine learning task, and it is particularly well-studied for the real summation problem, where each user $i$ holds a real number $x_i$, and the goal is to estimate $\sum_i x_i$. This constitutes a basic building block within machine learning, with extensions including (private) distributed mean estimation (see, e.g., Biswas et al., 2020; Girgis et al., 2021), stochastic gradient descent (Song et al., 2013; Bassily et al., 2014; Abadi et al., 2016; Agarwal et al., 2018), and clustering (Stemmer and Kaplan, 2018; Stemmer, 2020).

The real summation problem, which is the focus of this work, has been well-studied in several models of DP. In the central model where a curator has access to the raw data and is required to produce a private data release, the smallest possible absolute error is known to be ; this can be achieved via the ubiquitous Laplace mechanism (Dwork et al., 2006b), which is also known to be nearly optimal (see the supplementary material for more discussion) for the most interesting regime of . In contrast, for the more privacy-stringent local setting (Kasiviswanathan et al., 2008; see also Warner, 1965) where each message sent by a user is supposed to be private, the smallest error is known to be (Beimel et al., 2008; Chan et al., 2012). This significant gap between the achievable central and local utilities has motivated the study of intermediate models of DP. The shuffle model (Bittau et al., 2017; Erlingsson et al., 2019; Cheu et al., 2019) reflects the setting where the user reports are randomly permuted before being passed to the analyzer; the output of the shuffler is required to be private. Two variants of the shuffle model have been studied: in the multi-message case (e.g., Cheu et al., 2019), each user can send multiple messages to the shuffler; in the single-message setting each user sends one message (e.g., Erlingsson et al., 2019; Balle et al., 2019).

For the real summation problem, it is known that the smallest possible absolute error in the single-message shuffle model is (Balle et al., 2019). (Here hides a polylogarithmic factor in , in addition to a dependency on .) In contrast, multi-message shuffle protocols exist with a near-central accuracy of (Ghazi et al., 2020c; Balle et al., 2020), but they suffer several drawbacks: the number of messages sent per user is required to be at least , each message has to be substantially longer than in the non-private case, and in particular, the number of bits of communication per user has to grow with . This (at least) three-fold communication blow-up relative to a non-private setting can be a limitation in real-time reporting use cases (where encryption of each message may be required and the associated cost can become dominant) and in federated learning settings (where great effort is undertaken to compress the gradients). Our work shows that near-central accuracy and near-zero communication overhead are possible for real aggregation over sufficiently many users:

Theorem 1.

For any , there is an -DP real summation protocol in the shuffle model whose mean squared error (MSE) is at most the MSE of the Laplace mechanism with parameter , each user sends messages in expectation, and each message contains bits.

Note that hides a small term. Moreover, the number of bits per message is equal, up to lower order terms, to that needed to achieve MSE even without any privacy constraints.

Theorem 1 follows from an analogous result for the case of integer aggregation, where each user is given an element in the set (with an integer), and the goal of the analyzer is to estimate the sum of the users’ inputs. We refer to this task as the -summation problem.

For -summation, the standard mechanism in the central model is the Discrete Laplace (aka Geometric) mechanism, which first computes the true answer and then adds to it a noise term sampled from the Discrete Laplace distribution with parameter  (Ghosh et al., 2012). (The Discrete Laplace distribution with parameter , denoted by , has probability mass at each .) We can achieve an error arbitrarily close to this mechanism in the shuffle model, with minimal communication overhead:
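The central-model baseline described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, the noise is parametrized so that its probability mass is proportional to `exp(-s * |k|)`, and the choice `s = eps / delta_max` is the usual sensitivity scaling assumed here for inputs in `{0, ..., delta_max}`.

```python
import math
import random

def sample_geometric(q, rng):
    """Sample k in {0, 1, 2, ...} with P(k) = (1 - q) * q**k, for 0 < q < 1."""
    # Inverse-transform sampling with U uniform on (0, 1].
    u = 1.0 - rng.random()
    return int(math.floor(math.log(u) / math.log(q)))

def sample_discrete_laplace(s, rng):
    """Difference of two i.i.d. geometrics; P(k) is proportional to exp(-s * |k|)."""
    q = math.exp(-s)
    return sample_geometric(q, rng) - sample_geometric(q, rng)

def central_discrete_laplace_sum(inputs, delta_max, eps, rng):
    """Exact integer sum plus Discrete Laplace noise (assumed rate eps / delta_max)."""
    assert all(0 <= x <= delta_max for x in inputs)
    return sum(inputs) + sample_discrete_laplace(eps / delta_max, rng)

if __name__ == "__main__":
    rng = random.Random(0)
    print(central_discrete_laplace_sum([0, 2, 3, 1], delta_max=3, eps=1.0, rng=rng))
```

The curator here sees the raw inputs, which is exactly what the shuffle-model protocol avoids.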

Theorem 2.

For any , there is an -DP -summation protocol in the shuffle model whose MSE is at most that of the Discrete Laplace mechanism with parameter , and where each user sends messages in expectation, with each message containing bits.

In Theorem 2, the hides a factor. We also note that the number of bits per message in the protocol is within a single bit from the minimum message length needed to compute the sum without any privacy constraints. Incidentally, for , Theorem 2 improves the communication overhead obtained by Ghazi et al. (2020b) from to . (This improvement turns out to be crucial in practice, as our experiments show.)

Using Theorem 1 as a black-box, we obtain the following corollary for the -sparse vector summation problem, where each user is given a -sparse (possibly high-dimensional) vector of norm at most , and the goal is to compute the sum of all user vectors with minimal error.

Corollary 3.

For every , and , there is an -DP algorithm for -sparse vector summation in dimensions in the shuffle model whose error is at most that of the Laplace mechanism with parameter , and where each user sends messages in expectation, and each message contains bits.

1.1 Technical Overview

We will now describe the high-level technical ideas underlying our protocol and its analysis. Since the real summation protocol can be obtained from the -summation protocol using known randomized discretization techniques (e.g., from Balle et al. (2020)), we focus only on the latter. For simplicity of presentation, we will sometimes be informal here; everything will be formalized later.

Infinite Divisibility.

To achieve a similar performance to the central-DP Discrete Laplace mechanism (described before Theorem 2) in the shuffle model, we face several obstacles. To begin with, the noise has to be divided among all users, instead of being added centrally. Fortunately, this can be solved through the infinite divisibility of Discrete Laplace distributions (see Goryczka and Xiong (2017) for a discussion on distributed noise generation via infinite divisibility): there is a distribution for which, if each user samples a noise independently from , then has the same distribution as .
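The infinite-divisibility idea can be checked empirically. The sketch below assumes one concrete construction: a Discrete Laplace variable is the difference of two geometric variables, and a geometric variable is a negative binomial with shape 1, so each of `n_users` users can draw the difference of two negative binomials with shape `1 / n_users`. The function name and parametrization are hypothetical; NumPy's `negative_binomial` counts failures before the given (possibly non-integer) number of successes.

```python
import numpy as np

def sample_total_noise(n_users, s, trials, rng):
    """Sum of per-user noise shares, repeated `trials` times.

    Each share is NB(1/n, p) - NB(1/n, p); the total over n users is the
    difference of two geometrics, i.e. Discrete Laplace with mass
    proportional to exp(-s * |k|).
    """
    p_success = 1.0 - np.exp(-s)  # NumPy's success probability
    r = 1.0 / n_users             # fractional shape, one share per user
    plus = rng.negative_binomial(r, p_success, size=(trials, n_users))
    minus = rng.negative_binomial(r, p_success, size=(trials, n_users))
    shares = plus - minus
    return shares.sum(axis=1)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    totals = sample_total_noise(n_users=50, s=1.0, trials=20000, rng=rng)
    q = np.exp(-1.0)
    print("empirical variance:", totals.var())
    print("Discrete Laplace variance:", 2 * q / (1 - q) ** 2)
```

With many users, most individual shares are zero, which is why sending the noise in unary costs few extra messages.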

To implement the above idea in the shuffle model, each user has to be able to send their noise to the shuffler. Following Ghazi et al. (2020b), we can send such a noise in unary, i.e., if we send the message times and otherwise we send the message times. (Since the distribution has a small tail probability, will mostly be in , meaning that non-unary encoding of the noise does not significantly reduce the communication.) This is in addition to user sending their own input (in binary, as a single message) if it is non-zero. (If we were to send in unary similar to the noise, it would require possibly as many as messages, which is undesirable for us since we later pick to be for real summation.) The analyzer is simple: sum up all the messages.

Unfortunately, this zero-sum noise approach is not shuffle DP for because, even after shuffling, the analyzer can still see , the number of messages , which is exactly the number of users whose input is equal to for .

Zero-Sum Noise over Non-Binary Alphabets.

To overcome this issue, we have to “noise” the values themselves, while at the same time preserving the accuracy. We achieve this by making some users send additional messages whose sum is equal to zero; e.g., a user may send in conjunction with previously described messages. Since the analyzer just sums up all the messages, this additional zero-sum noise still does not affect accuracy.

The bulk of our technical work is in the privacy proof of such a protocol. To understand the challenge, notice that the analyzer still sees the ’s, which are now highly correlated due to the zero-sum noise added. This is unlike most DP algorithms in the literature where noise terms are added independently to each coordinate. Our main technical insight is that, by a careful change of basis, we can “reduce” the view to the independent-noise case.

To illustrate our technique, let us consider the case where . In this case, there are two zero-sum “noise atoms” that a user might send: and . These two kinds of noise are sent independently, i.e., whether the user sends does not affect whether is also sent. After shuffling, the analyzer sees . Observe that there is a one-to-one mapping between this and defined by , meaning that we may prove the privacy of the latter instead. Consider the effect of sending the noise: is increased by one, whereas are completely unaffected. Similarly, when we send noise, is increased by one, whereas are completely unaffected. Hence, the noise added to are now independent! Finally, is exactly the sum of all messages, which was noised by the noise explained earlier.

A vital detail omitted in the previous discussion is that the noise, which affects , is not canceled out in . Indeed, in our formal proof we need a special argument (Lemma 9) to deal with this noise.

Moreover, generalizing this approach to larger values of requires overcoming additional challenges: (i) the basis change has to be carried out over the integers $\mathbb{Z}$, which precludes a direct use of classic tools from linear algebra such as the Gram–Schmidt process, and (ii) special care has to be taken when selecting the new basis so as to ensure that the sensitivity does not significantly increase, which would require more added noise (this complication leads to the usage of the -linear query problem in Section 4).

1.2 Related Work

Summation in the Shuffle Model.

Our work is most closely related to that of Ghazi et al. (2020b) who gave a protocol for the case where (i.e., binary summation) and our protocol can be viewed as a generalization of theirs. As explained above, this requires significant novel technical and conceptual ideas; for example, the basis change was not (directly) required by Ghazi et al. (2020b).

The idea of splitting the input into multiple additive shares dates back to the “split-and-mix” protocol of Ishai et al. (2006) whose analysis was improved in Ghazi et al. (2020c); Balle et al. (2020) to get the aforementioned shuffle DP algorithms for aggregation. These analyses all crucially rely on the addition being over a finite group. Since we actually want to sum over integers and there are users, this approach requires the group size to be at least to prevent an “overflow”. This also means that each user needs to send at least bits. On the other hand, by dealing with integers directly, each of our messages is only bits, further reducing the communication.

From a technical standpoint, our approach is also different from that of Ishai et al. (2006) as we analyze the privacy of the protocol, instead of its security as in their paper. This allows us to overcome the known lower bound of on the number of messages for information-theoretic security (Ghazi et al., 2020c), and obtain a DP protocol with messages (where is as in Theorem 1).

The Shuffle DP Model.

Recent research on the shuffle model of DP includes work on aggregation mentioned above (Balle et al., 2019; Ghazi et al., 2020c; Balle et al., 2020), analytics tasks including computing histograms and heavy hitters (Ghazi et al., 2021a; Balcer and Cheu, 2020; Ghazi et al., 2020a; b; Cheu and Zhilyaev, 2021), counting distinct elements (Balcer et al., 2021; Chen et al., 2021) and private mean estimation (Girgis et al., 2021), as well as -means clustering (Chang et al., 2021).

Aggregation in Machine Learning.

We note that communication-efficient private aggregation is a core primitive in federated learning (see Section of Kairouz et al. (2019) and the references therein). It is also naturally related to mean estimation in distributed models of DP (e.g., Gaboardi et al., 2019). Finally, we point out that communication efficiency is a common requirement in distributed learning and optimization, and substantial effort is spent on compression of the messages sent by users, through multiple methods including hashing, pruning, and quantization (see, e.g., Zhang et al., 2013; Alistarh et al., 2017; Suresh et al., 2017; Acharya et al., 2019; Chen et al., 2020).

1.3 Organization

We start with some background in Section 2. Our protocol is presented in Section 3. Its privacy property is established and the parameters are set in Section 4. Experimental results are given in Section 5. We discuss some interesting future directions in Section 6. All missing proofs can be found in the Supplementary Material (SM).

2 Preliminaries and Notation

We use to denote .


For any distribution , we write to denote a random variable that is distributed as . For two distributions , let (resp., ) denote the distribution of (resp., ) where are independent. For , we use to denote the distribution of where .

A distribution over non-negative integers is said to be infinitely divisible if and only if, for every , there exists a distribution such that is identical to , where the sum is over distributions.

The negative binomial distribution with parameters , , denoted , has probability mass at all . is infinitely divisible; specifically, .
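Infinite divisibility of the negative binomial can be verified numerically by convolving probability masses. The sketch below assumes the convention that NB(r, p) puts mass C(k + r - 1, k) (1 - p)^r p^k on each non-negative integer k; the function names are hypothetical. It checks that two independent NB(r/2, p) variables sum to NB(r, p).

```python
import math

def nb_pmf(k, r, p):
    """Mass of NB(r, p) at k, assuming the form C(k+r-1, k) * (1-p)**r * p**k."""
    # lgamma handles non-integer r, which infinite divisibility requires.
    log_coeff = math.lgamma(k + r) - math.lgamma(k + 1) - math.lgamma(r)
    return math.exp(log_coeff + r * math.log(1 - p) + k * math.log(p))

def convolve(pmf_a, pmf_b):
    """Masses of the sum of two independent variables, truncated to len(pmf_a)."""
    out = [0.0] * len(pmf_a)
    for i, a in enumerate(pmf_a):
        for j, b in enumerate(pmf_b):
            if i + j < len(out):
                out[i + j] += a * b
    return out

if __name__ == "__main__":
    K, r, p = 30, 1.0, 0.3
    half = [nb_pmf(k, r / 2, p) for k in range(K)]
    full = [nb_pmf(k, r, p) for k in range(K)]
    gap = max(abs(a - b) for a, b in zip(convolve(half, half), full))
    print("max difference:", gap)
```

The truncation is harmless because the mass of the sum at k only depends on masses at indices up to k.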

Differential Privacy.

Two input datasets and are said to be neighboring if and only if they differ on at most a single user’s input, i.e., for all but one .

Definition 4 (Differential Privacy (DP) Dwork et al. (2006b; a)).

Let . A randomized algorithm taking as input a dataset is said to be -differentially private (-DP) if for any two neighboring datasets and , and for any subset of outputs of , it holds that .

Shuffle DP Model.

A protocol over inputs in the shuffle DP model (Bittau et al., 2017; Erlingsson et al., 2019; Cheu et al., 2019) consists of three procedures. A local randomizer takes an input and outputs a set of messages. The shuffler takes the multisets output by the local randomizer applied to each of , and produces a random permutation of the messages as output. Finally, the analyzer takes the output of the shuffler and computes the output of the protocol. Privacy in the shuffle model is enforced on the output of the shuffler when a single input is changed.
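The three procedures can be wired together as follows. This is only a schematic sketch: `run_shuffle_protocol`, `identity_randomizer`, and `sum_analyzer` are hypothetical names, and the placeholder randomizer shown adds no noise, so it is not private. It only illustrates where a real local randomizer would plug in.

```python
import random

def run_shuffle_protocol(inputs, randomizer, analyzer, rng):
    """Apply the randomizer per user, shuffle all messages, then analyze."""
    messages = []
    for x in inputs:
        # Each user contributes a (possibly empty) collection of messages.
        messages.extend(randomizer(x, rng))
    # The shuffler outputs a uniformly random permutation of all messages,
    # hiding which user sent which message.
    rng.shuffle(messages)
    return analyzer(messages)

def identity_randomizer(x, rng):
    """Non-private placeholder: send the input itself if it is non-zero."""
    return [x] if x != 0 else []

def sum_analyzer(messages):
    return sum(messages)

if __name__ == "__main__":
    rng = random.Random(1)
    print(run_shuffle_protocol([0, 3, 1, 0, 2], identity_randomizer, sum_analyzer, rng))
```

Privacy is required of the shuffled message list, which is the only thing the analyzer receives.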

3 Generic Protocol Description

Below we describe the protocol for -summation that is private in the shuffle DP model. In our protocol, the randomizer will send messages, each of which is an integer in . The analyzer simply sums up all the incoming messages. The messages sent from the randomizer can be categorized into three classes:


  • Input: each user will send if it is non-zero.

  • Central Noise: This is the noise whose sum is equal to the Discrete Laplace noise commonly used by algorithms in the central DP model. This noise is sent in “unary” as or messages.

  • Zero-Sum Noise: Finally, we “flood” the messages with noise that cancels out. This noise comes from a carefully chosen sub-collection of the collection of all multisets of whose sum of elements is equal to zero (e.g., may belong to ). (Note that while may be infinite, we will later set it to be finite, resulting in an efficient protocol; for more details, see Theorem 12 and the paragraph succeeding it.) We will refer to each as a noise atom.

Algorithms 1 and 2 show the generic form of our protocol, which we refer to as the Correlated Noise mechanism. The protocol is specified by the following infinitely divisible distributions over : the “central” noise distribution , and for every , the “flooding” noise distribution .

1:  procedure CorrNoiseRandomizer
2:     if
3:        Send
4:     Sample
5:     Send copies of , and copies of
6:     for
7:        Sample
8:        for
9:           Send copies of
Algorithm 1 -Summation Randomizer
1:  procedure CorrNoiseAnalyzer
2:      multiset of messages received
3:     return
Algorithm 2 -Summation Analyzer

Note that since is a multiset, Line 8 goes over each element the same number of times it appears in ; e.g., if , the iteration is executed twice.
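The structure of the randomizer and analyzer can be sketched in Python. This is an illustrative sketch, not the authors' code. The noise distributions are passed in as hypothetical callables (`sample_central`, `sample_flood`), noise atoms are represented as tuples, and the sketch assumes that the central noise is drawn as two independent non-negative counts, one for `+1` messages and one for `-1` messages.

```python
import random

def corr_noise_randomizer(x, sample_central, atoms, sample_flood, rng):
    """Return the list of messages one user sends."""
    messages = []
    if x != 0:
        messages.append(x)                 # the input, as a single message
    z_plus = sample_central(rng)           # central noise, sent in unary
    z_minus = sample_central(rng)
    messages += [+1] * z_plus + [-1] * z_minus
    for atom in atoms:                     # each atom sums to zero
        copies = sample_flood(atom, rng)
        for m in atom:                     # a repeated element is visited repeatedly
            messages += [m] * copies
    return messages

def corr_noise_analyzer(messages):
    """Sum all received messages."""
    return sum(messages)

if __name__ == "__main__":
    rng = random.Random(0)
    atoms = [(-1, 1), (-1, -1, 2)]         # hypothetical zero-sum atoms
    flood = lambda atom, rng: rng.randint(0, 3)
    received = []
    for x in [0, 2, 1, 2, 0]:
        received += corr_noise_randomizer(x, lambda rng: 0, atoms, flood, rng)
    rng.shuffle(received)
    print(corr_noise_analyzer(received))   # flooding noise cancels in the sum
```

With the central noise switched off as in the usage example, the analyzer recovers the exact sum however much zero-sum noise is sent, which is the accuracy property the zero-sum design relies on.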

3.1 Error and Communication Complexity

We now state generic forms for the MSE and communication cost of the protocol:

Observation 5.

MSE is .

We stress here that the distribution itself is not the Discrete Laplace distribution; we pick it so that is . As a result, is indeed equal to the variance of the Discrete Laplace noise.

Observation 6.

Each user sends at most messages in expectation, each consisting of bits.

4 Parameter Selection and Privacy Proof

The focus of this section is on selecting concrete distributions to initiate the protocol and formalize its privacy guarantees, ultimately proving Theorem 2. First, in Section 4.1, we introduce additional notation and reduce our task to proving a privacy guarantee for a protocol in the central model. With these simplifications, we give a generic form of privacy guarantees in Section 4.2. Section 4.3 and Section 4.4 are devoted to a more concrete selection of parameters. Finally, Theorem 2 is proved in Section 4.5.

4.1 Additional Notation and Simplifications

Matrix-Vector Notation.

We use boldface letters to denote vectors and matrices, and standard letters to refer to their coordinates (e.g., if is a vector, then refers to its th coordinate). For convenience, we allow general index sets for vectors and matrices; e.g., for an index set , we write to denote the tuple . Operations such as addition, scalar-vector/matrix multiplication or matrix-vector multiplication are defined naturally.

For , we use to denote the th vector in the standard basis; that is, its -indexed coordinate is equal to and each of the other coordinates is equal to . Furthermore, we use  to denote the all-zeros vector.

Let denote , and denote the vector . Recall that a noise atom is a multiset of elements from . It is useful to also think of as a vector in where its th entry denotes the number of times appears in . We overload the notation and use to both represent the multiset and its corresponding vector.

Let denote the matrix whose rows are indexed by and whose columns are indexed by where . In other words, is a concatenation of column vectors . Furthermore, let denote , and denote the matrix with row removed.

Next, we think of each input dataset as its histogram where denotes the number of such that . Under this notation, two input datasets are neighbors iff and . For each histogram , we write to denote the vector resulting from appending zeros to the beginning of ; more formally, for every , we let if and if .

An Equivalent Central DP Algorithm.

A benefit of using infinitely divisible noise distributions is that they allow us to translate our protocols to equivalent ones in the central model, where the total sum of the noise terms has a well-understood distribution. In particular, with the notation introduced above, Algorithm 1 corresponds to Algorithm 3 in the central model:

1:  procedure CorrNoiseCentral()
2:     Sample
3:     for
4:        Sample
6:     return .
Algorithm 3 Central Algorithm (Matrix-Vector Notation)
Observation 7.

CorrNoiseRandomizer is -DP in the shuffle model if and only if CorrNoiseCentral is -DP in the central model.

Given Observation 7, we can focus on proving the privacy guarantee of CorrNoiseCentral in the central model, which occupies the majority of this section.

Noise Addition Mechanisms for Matrix-Based Linear Queries.

The -noise addition mechanism for -summation (as defined in Section 1) works by first computing the summation and adding to it a noise random variable sampled from , where is a distribution over integers. Note that under our vector notation above, the -noise addition mechanism simply outputs where .

It will be helpful to consider a generalization of the -summation problem, which allows above to be changed to any matrix (where the noise is now also a vector).

To define such a problem formally, let be any index set. Given a matrix , the -linear query problem is to compute, given an input histogram , an estimate of . (Equivalently, one can think of each user as holding a column vector of or the all-zeros vector , and the goal is to compute the sum of these vectors.) Similar definitions are widely used in the literature (see, e.g., Nikolov et al., 2013); the main distinction of our definition is that we only consider integer-valued and .

The noise addition algorithms for -summation can be easily generalized to the -linear query case: for a collection of distributions, the -noise addition mechanism samples (specifically, sample independently for each ) and then outputs .
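As a small illustration of the noise addition mechanism for linear queries, the sketch below applies an integer matrix to a histogram and adds one independent noise term per output coordinate. All names are hypothetical, and the histogram is assumed to count, for each non-zero value j, the number of users whose input equals j. Plain summation is then recovered with the single-row matrix `[1, 2, ..., delta_max]`.

```python
import numpy as np

def histogram(inputs, delta_max):
    """h[j-1] = number of users whose input equals j, for j = 1..delta_max."""
    h = np.zeros(delta_max, dtype=np.int64)
    for x in inputs:
        if x != 0:
            h[x - 1] += 1
    return h

def linear_query_noise_addition(W, h, samplers, rng):
    """Return W @ h plus independent integer noise, one sampler per row of W."""
    W = np.asarray(W, dtype=np.int64)
    h = np.asarray(h, dtype=np.int64)
    assert W.shape == (len(samplers), h.shape[0])
    noise = np.array([sample(rng) for sample in samplers], dtype=np.int64)
    return W @ h + noise

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    h = histogram([0, 2, 3, 3, 1], delta_max=3)
    W = [[1, 2, 3]]                       # summation as a one-row linear query
    # Example noise: negative binomial with arbitrary illustrative parameters.
    samplers = [lambda rng: int(rng.negative_binomial(2.0, 0.5))]
    print(linear_query_noise_addition(W, h, samplers, rng))
```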

4.2 Generic Privacy Guarantee

With all the necessary notation ready, we can now state our main technical theorem, which gives a privacy guarantee in terms of a right inverse of the matrix :

Theorem 8.

Let denote any right inverse of (i.e., ) whose entries are integers. Suppose that the following holds:


  • The -noise addition mechanism is -DP for -summation.

  • The -noise addition mechanism is -DP for the -linear query problem, where denotes the th column of .

Then, CorrNoiseCentral with the following parameter selections is -DP for -summation:


  • .

  • .

  • for all .

The right inverse indeed represents the “change of basis” alluded to in the introduction. It will be specified in the next subsections along with the noise distributions .

As one might have noticed from Theorem 8, is somewhat different than other noise atoms, as its noise distribution is the sum of and . A high-level explanation for this is that our central noise is sent as messages and we would like to use the noise atom to “flood out” the correlations left in messages. A precise version of this statement is given below in Lemma 9. We remark that the first output coordinate alone has exactly the same distribution as the -noise addition mechanism. The main challenge in this analysis is that the two coordinates are correlated through ; indeed, this is where the random variable helps “flood out” the correlation.

Lemma 9.

Let be a mechanism that, on input histogram , works as follows:


  • Sample independently from .

  • Sample from .

  • Output .

If the -noise addition mechanism is -DP for -summation, then is -DP.

Lemma 9 is a direct improvement over the main analysis of Ghazi et al. (2020b), whose proof (which works only for ) requires the -noise addition mechanism to be -DP for -summation; we remove this factor. Our novel insight is that when conditioned on , rather than conditioned on for some , the distributions of are quite similar, up to a “shift” of and a multiplicative factor of . This allows us to “match” the two probability masses and achieve the improvement. The full proof of Lemma 9 is deferred to SM.

Let us now show how Lemma 9 can be used to prove Theorem 8. At a high level, we run the mechanism from Lemma 9 and the -noise addition mechanism, and argue that we can use their output to construct the output for CorrNoiseCentral. The intuition behind this is that the first coordinate of the output of gives the weighted sum of the desired output, and the second coordinate gives the number of messages used to flood the central noise. As for the -noise addition mechanism, since is a right inverse of , we can use them to reconstruct the number of messages . The number of 1 messages can then be reconstructed from the weighted sum and all the numbers of other messages. These ideas are encapsulated in the proof below.

Proof of Theorem 8.

Consider defined as follows:

  1. First, run from Lemma 9 on input histogram to arrive at an output

  2. Second, run -mechanism for -linear query on to get an output .

  3. Output computed by letting and then

From Lemma 9 and our assumption on , is -DP. By assumption that the -noise addition mechanism is -DP for -linear query and by the basic composition theorem, the first two steps of are -DP. The last step of only uses the output from the first two steps; hence, by the post-processing property of DP, we can conclude that is indeed .

Next, we claim that has the same distribution as with the specified parameters. To see this, recall from that we have , and , where and are independent. Furthermore, from the definition of the -noise addition mechanism, we have

where are independent, and denotes after replacing its first coordinate with zero.

Notice that . Using this and our assumption that , we get

Let ; we can see that each entry is independently distributed as . Finally, we have

Recall that for all ; equivalently, . Thus, we have

Hence, we can conclude that ; this implies that has the same distribution as the mechanism . ∎

4.3 Negative Binomial Mechanism

Having established a generic privacy guarantee of our algorithm, we now have to specify the distributions that satisfy the conditions in Theorem 8 while keeping the number of messages sent for the noise small. Similar to Ghazi et al. (2020b), we use the negative binomial distribution. Its privacy guarantee is summarized below. (For the exact statement we use here, please refer to Theorem 13 of Ghazi et al. (2021b), which contains a correction of calculation errors in Theorem 13 of Ghazi et al. (2020b).)

Theorem 10 (Ghazi et al. (2020b)).

For any and , let and . The -additive noise mechanism is -DP for -summation.

We next extend Theorem 10 to the -linear query problem. To state the formal guarantees, we say that a vector is dominated by vector iff . (We use the convention and for all .)

Corollary 11.

Let . Suppose that every column of is dominated by . For each , let , and . Then, -noise addition mechanism is -DP for -linear query.

4.4 Finding a Right Inverse

A final step before we can apply Theorem 8 is to specify the noise atom collection and the right inverse of . Below we give such a right inverse where every column is dominated by a vector that is “small”. This allows us to then use the negative binomial mechanism in the previous section with a “small” amount of noise. How “small” is depends on the expected number of messages sent; this is governed by . With this notation, the guarantee of our right inverse can be stated as follows. (We note that in our noise selection below, every has at most three elements. In other words, and will be within a factor of three of each other.)

Theorem 12.

There exist of size , with and such that and every column of is dominated by .

The full proof of Theorem 12 is deferred to SM. The main idea is to essentially proceed via Gaussian elimination on . However, we have to be careful about our choice of orders of rows/columns to run the elimination on, as otherwise it might produce a non-integer matrix or one whose columns are not “small”. In our proof, we order the rows based on their absolute values, and we set to be the collection of and for all . In other words, these are the noise atoms we send in our protocol.

Figure 1: Error and communication complexity of our “correlated noise” -summation protocol compared to other protocols. Panels: (a) RMSEs for varying ; (b) expected bits sent for varying ; (c) expected bits sent for varying .

4.5 Specific Parameter Selection: Proof of Theorem 2

Proof of Theorem 2.

Let be as in Theorem 12, , and,


  • ,

  • where and ,

  • where and for all .

From Theorem 10, the -noise addition mechanism is -DP for -summation. Corollary 11 implies that the -noise addition mechanism is -DP for -linear query. As a result, since and , Theorem 8 ensures that CorrNoiseCentral with parameters as specified in the theorem is -DP.

The error claim follows immediately from Observation 5 together with the fact that . Using Observation 6, we can also bound the expected number of messages as

where each message consists of bits. ∎

5 Experimental Evaluation

We compare our “correlated noise” -summation protocol (Algorithms 1 and 2) against known algorithms in the literature, namely the IKOS “split-and-mix” protocol (Ishai et al., 2006; Balle et al., 2020; Ghazi et al., 2020c) and the fragmented version of RAPPOR (Erlingsson et al., 2014; 2020). (The RAPPOR randomizer starts with a one-hot encoding of the input, which is a -bit string, and flips each bit with a certain probability. Fragmentation means that, instead of sending the entire -bit string, the randomizer only sends the coordinates that are set to 1, each as a separate -bit message. This is known to reduce both the communication and the error in the shuffle model (Cheu et al., 2019; Erlingsson et al., 2020).) We also include the Discrete Laplace mechanism (Ghosh et al., 2012) in our plots for comparison, although it is not directly implementable in the shuffle model. We do not include the generalized (i.e., -ary) Randomized Response algorithm (Warner, 1965) as it always incurs at least as large error as RAPPOR.
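The fragmented RAPPOR randomizer described above can be sketched as follows. The function name, the `domain_size` and `flip_prob` parameters, and the encoding of inputs as indices `0..domain_size-1` are assumptions for illustration. The flip probability that yields a given privacy level and the analyzer's debiasing step are not covered here.

```python
import random

def fragmented_rappor_randomizer(x, domain_size, flip_prob, rng):
    """Return the indices of the noisy one-hot bits equal to 1, one message per index."""
    # One-hot encode the input.
    bits = [1 if j == x else 0 for j in range(domain_size)]
    # Flip each bit independently with probability flip_prob.
    noisy = [b ^ 1 if rng.random() < flip_prob else b for b in bits]
    # Fragmentation: send only the coordinates set to 1, each separately.
    return [j for j, b in enumerate(noisy) if b == 1]

if __name__ == "__main__":
    rng = random.Random(0)
    print(fragmented_rappor_randomizer(2, domain_size=4, flip_prob=0.1, rng=rng))
```

Each message is a coordinate index, so its length is logarithmic in the domain size rather than linear.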

For our protocol, we set in Theorem 8 to , meaning that its MSE is that of . While the parameters set in our proofs give a theoretically vanishing overhead guarantee, they turn out to be rather impractical; instead, we resort to a tighter numerical approach to find the parameters. We discuss this, together with the setting of parameters for the other alternatives, in the SM.

For all mechanisms, the root mean square error (RMSE) and the (expected) communication per user depend only on , and are independent of the input data. We next summarize our findings.


The IKOS algorithm has the same error as the Discrete Laplace mechanism (in the central model), whereas our algorithm’s error is slightly larger due to being slightly smaller than . On the other hand, the RMSE for RAPPOR grows as increases, but it seems to converge as becomes larger. (For and , we found that RMSEs differ by less than 1%.)

However, the key takeaway here is that the RMSEs of IKOS, Discrete Laplace, and our algorithm grow only linearly in , but the RMSE of RAPPOR is proportional to . This is illustrated in Figure 1(a).


While the IKOS protocol achieves the same accuracy as the Discrete Laplace mechanism, it incurs a large communication overhead, as each message sent consists of bits and each user needs to send multiple messages. By contrast, when fixing and taking , both RAPPOR and our algorithm only send messages, each of length and respectively. This is illustrated in Figure 1(b). Note that the number of bits sent for IKOS is indeed not a monotone function since, as increases, the number of messages required decreases but the length of each message increases.

Finally, we demonstrate the effect of varying in Figure 0(c) for a fixed value of . Although Theorem 2 suggests that the communication overhead should grow roughly linearly with , we have observed larger gaps in the experiments. This seems to stem from the fact that, while the growth would have been observed if we were using our analytic formula, our tighter parameter computation (detailed in Appendix H.1 of SM) finds a protocol with even smaller communication, suggesting that the actual growth might be less than , though we do not know of a formal proof of this. Unfortunately, for large , the optimization problem for the parameter search becomes too demanding and our program does not find a good solution, leading to the “larger gap” observed in the experiments.

6 Conclusions

In this work, we presented a DP protocol for real and -sparse vector aggregation in the shuffle model with accuracy arbitrarily close to the best possible central accuracy, and with relative communication overhead tending to 0 as the number of users increases. It would be very interesting to generalize our protocol and obtain qualitatively similar guarantees for dense vector summation.

We also point out that in the low privacy regime ($\varepsilon \gg 1$), the staircase mechanism is known to significantly improve upon the Laplace mechanism (Geng and Viswanath, 2016; Geng et al., 2015); the former achieves MSE that is exponentially small in $\varepsilon$ while the latter has MSE $\Theta(1/\varepsilon^2)$. An interesting open question is to achieve such a gain in the shuffle model.

References


  • M. Abadi, A. Chu, I. Goodfellow, H. B. McMahan, I. Mironov, K. Talwar, and L. Zhang (2016) Deep learning with differential privacy. In CCS, pp. 308–318. Cited by: §1.
  • J. M. Abowd (2018) The US Census Bureau adopts differential privacy. In KDD, pp. 2867–2867. Cited by: §1.
  • J. Acharya, Z. Sun, and H. Zhang (2019) Hadamard response: estimating distributions privately, efficiently, and with little communication. In AISTATS, pp. 1120–1129. Cited by: §1.2.
  • N. Agarwal, A. T. Suresh, F. Yu, S. Kumar, and H. B. McMahan (2018) cpSGD: communication-efficient and differentially-private distributed SGD. In NeurIPS, pp. 7575–7586. Cited by: §1.
  • D. Alistarh, D. Grubic, J. Li, R. Tomioka, and M. Vojnovic (2017) QSGD: communication-efficient SGD via gradient quantization and encoding. In NIPS, pp. 1709–1720. Cited by: §1.2.
  • Apple Differential Privacy Team (2017) Learning with privacy at scale. Apple Machine Learning Journal. Cited by: §1.
  • V. Balcer, A. Cheu, M. Joseph, and J. Mao (2021) Connecting robust shuffle privacy and pan-privacy. In SODA, pp. 2384–2403. Cited by: §1.2.
  • V. Balcer and A. Cheu (2020) Separating local & shuffled differential privacy via histograms. In ITC, pp. 1:1–1:14. Cited by: §1.2.
  • B. Balle, J. Bell, A. Gascón, and K. Nissim (2019) The privacy blanket of the shuffle model. In CRYPTO, pp. 638–667. Cited by: §1.2, §1, §1.
  • B. Balle, J. Bell, A. Gascón, and K. Nissim (2020) Private summation in the multi-message shuffle model. In CCS, pp. 657–676. Cited by: §D.5, §1.1, §1.2, §1.2, §1, §5.
  • R. Bassily, A. Smith, and A. Thakurta (2014) Private empirical risk minimization: efficient algorithms and tight error bounds. In FOCS, pp. 464–473. Cited by: §1.
  • A. Beimel, K. Nissim, and E. Omri (2008) Distributed private data analysis: simultaneously solving how and what. In CRYPTO, pp. 451–468. Cited by: §1.
  • S. Biswas, Y. Dong, G. Kamath, and J. R. Ullman (2020) CoinPress: practical private mean and covariance estimation. In NeurIPS, Cited by: §1.
  • A. Bittau, Ú. Erlingsson, P. Maniatis, I. Mironov, A. Raghunathan, D. Lie, M. Rudominer, U. Kode, J. Tinnés, and B. Seefeld (2017) Prochlo: strong privacy for analytics in the crowd. In SOSP, pp. 441–459. Cited by: §1, §2.
  • T.-H. H. Chan, E. Shi, and D. Song (2012) Optimal lower bound for differentially private multi-party aggregation. In ESA, pp. 277–288. Cited by: §1.
  • A. Chang, B. Ghazi, R. Kumar, and P. Manurangsi (2021) Locally private k-means in one round. In ICML, Cited by: §1.2.
  • L. Chen, B. Ghazi, R. Kumar, and P. Manurangsi (2021) On distributed differential privacy and counting distinct elements. In ITCS, Cited by: §1.2.
  • W. Chen, P. Kairouz, and A. Özgür (2020) Breaking the communication-privacy-accuracy trilemma. In NeurIPS, Cited by: §1.2.
  • A. Cheu, A. D. Smith, J. Ullman, D. Zeber, and M. Zhilyaev (2019) Distributed differential privacy via shuffling. In EUROCRYPT, pp. 375–403. Cited by: §1, §2, footnote 11.
  • A. Cheu and M. Zhilyaev (2021) Differentially private histograms in the shuffle model from fake users. CoRR abs/2104.02739. Cited by: §1.2.
  • B. Ding, J. Kulkarni, and S. Yekhanin (2017) Collecting telemetry data privately. In NIPS, pp. 3571–3580. Cited by: §1.
  • C. Dwork, K. Kenthapadi, F. McSherry, I. Mironov, and M. Naor (2006a) Our data, ourselves: privacy via distributed noise generation. In EUROCRYPT, pp. 486–503. Cited by: §1, Definition 4.
  • C. Dwork, F. McSherry, K. Nissim, and A. Smith (2006b) Calibrating noise to sensitivity in private data analysis. In TCC, pp. 265–284. Cited by: §1, §1, Definition 4.
  • Ú. Erlingsson, V. Feldman, I. Mironov, A. Raghunathan, S. Song, K. Talwar, and A. Thakurta (2020) Encode, shuffle, analyze privacy revisited: formalizations and empirical evaluation. CoRR abs/2001.03618. Cited by: §5, footnote 11.
  • Ú. Erlingsson, V. Feldman, I. Mironov, A. Raghunathan, K. Talwar, and A. Thakurta (2019) Amplification by shuffling: from local to central differential privacy via anonymity. In SODA, pp. 2468–2479. Cited by: §1, §2.
  • Ú. Erlingsson, V. Pihur, and A. Korolova (2014) RAPPOR: randomized aggregatable privacy-preserving ordinal response. In CCS, pp. 1054–1067. Cited by: §1, §5.
  • M. Gaboardi, R. Rogers, and O. Sheffet (2019) Locally private mean estimation: Z-test and tight confidence intervals. In AISTATS, pp. 2545–2554. Cited by: §1.2.
  • Q. Geng, P. Kairouz, S. Oh, and P. Viswanath (2015) The staircase mechanism in differential privacy. IEEE J. Sel. Top. Signal Process. 9 (7), pp. 1176–1184. Cited by: §6.
  • Q. Geng and P. Viswanath (2016) The optimal noise-adding mechanism in differential privacy. IEEE Trans. Inf. Theory 62 (2), pp. 925–951. Cited by: §6.
  • B. Ghazi, N. Golowich, R. Kumar, P. Manurangsi, R. Pagh, and A. Velingker (2020a) Pure differentially private summation from anonymous messages. In ITC, Cited by: §1.2.
  • B. Ghazi, N. Golowich, R. Kumar, R. Pagh, and A. Velingker (2021a) On the power of multiple anonymous messages. In EUROCRYPT, Cited by: §1.2.
  • B. Ghazi, R. Kumar, P. Manurangsi, and R. Pagh (2020b) Private counting from anonymous messages: near-optimal accuracy with vanishing communication overhead. In ICML, pp. 3505–3514. Cited by: Appendix A, §H.1, §1.1, §1.2, §1.2, §1, §4.2, §4.3, Theorem 10, Lemma 15, footnote 10.
  • B. Ghazi, R. Kumar, P. Manurangsi, and R. Pagh (2021b) Private counting from anonymous messages: near-optimal accuracy with vanishing communication overhead. CoRR abs/2106.04247. Note: This version contains a correction of calculation errors in Theorem 13 of Ghazi et al. (2020b). Cited by: footnote 10.
  • B. Ghazi, P. Manurangsi, R. Pagh, and A. Velingker (2020c) Private aggregation from fewer anonymous messages. In EUROCRYPT, pp. 798–827. Cited by: §1.2, §1.2, §1.2, §1, §5.
  • A. Ghosh, T. Roughgarden, and M. Sundararajan (2012) Universally utility-maximizing privacy mechanisms. SIAM J. Comput. 41 (6), pp. 1673–1693. Cited by: Appendix B, §1, §5.
  • A. M. Girgis, D. Data, S. Diggavi, P. Kairouz, and A. T. Suresh (2021) Shuffled model of federated learning: privacy, communication and accuracy trade-offs. In AISTATS, Cited by: §1.2, §1.
  • S. Goryczka and L. Xiong (2017) A comprehensive comparison of multiparty secure additions with differential privacy. IEEE Trans. Dependable Secur. Comput. 14 (5), pp. 463–477. Cited by: footnote 4.
  • A. Greenberg (2016) Apple’s “differential privacy” is about collecting your data – but not your data. Wired, June 13. Cited by: §1.
  • Y. Ishai, E. Kushilevitz, R. Ostrovsky, and A. Sahai (2006) Cryptography from anonymity. In FOCS, pp. 239–248. Cited by: §1.2, §1.2, §5.
  • P. Kairouz, H. B. McMahan, B. Avent, A. Bellet, M. Bennis, A. N. Bhagoji, K. Bonawitz, Z. Charles, G. Cormode, R. Cummings, et al. (2019) Advances and open problems in federated learning. CoRR abs/1912.04977. Cited by: §1.2.
  • S. P. Kasiviswanathan, H. K. Lee, K. Nissim, S. Raskhodnikova, and A. Smith (2008) What can we learn privately? In FOCS, pp. 531–540. Cited by: §1.
  • A. Nikolov, K. Talwar, and L. Zhang (2013) The geometry of differential privacy: the sparse and approximate cases. In STOC, pp. 351–360. Cited by: footnote 8.
  • S. Ruggles, S. Flood, R. Goeken, J. Grover, E. Meyer, J. Pacas, and M. Sobek (2020) Integrated public use microdata series (IPUMS) USA: version 10.0 [dataset]. Minneapolis, MN. Cited by: §H.3.
  • S. Shankland (2014) How Google tricks itself to protect Chrome user privacy. CNET, October. Cited by: §1.
  • S. Song, K. Chaudhuri, and A. D. Sarwate (2013) Stochastic gradient descent with differentially private updates. In GlobalSIP, pp. 245–248. Cited by: §1.
  • U. Stemmer and H. Kaplan (2018) Differentially private k-means with constant multiplicative error. In NeurIPS, pp. 5436–5446. Cited by: §1.
  • U. Stemmer (2020) Locally private k-means clustering. In SODA, pp. 548–559. Cited by: §1.
  • A. T. Suresh, F. X. Yu, S. Kumar, and H. B. McMahan (2017) Distributed mean estimation with limited communication. In ICML, pp. 3329–3337. Cited by: §1.2.
  • S. L. Warner (1965) Randomized response: a survey technique for eliminating evasive answer bias. JASA 60 (309), pp. 63–69. Cited by: §1, §5.
  • Y. Zhang, J. C. Duchi, M. I. Jordan, and M. J. Wainwright (2013) Information-theoretic lower bounds for distributed statistical estimation with communication constraints. In NIPS, pp. 2328–2336. Cited by: §1.2.

Supplementary Material

Appendix A Additional Preliminaries

We start by introducing some additional notation and a few lemmas that will be used in our proofs.

Definition 13.

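The $\varepsilon$-hockey stick divergence of two (discrete) distributions $\mathcal{D}, \mathcal{D}'$ is defined as

$$d_{\varepsilon}(\mathcal{D} \,\|\, \mathcal{D}') := \sum_{v \in \mathrm{supp}(\mathcal{D})} \left[\mathcal{D}(v) - e^{\varepsilon} \cdot \mathcal{D}'(v)\right]_+,$$

where $[y]_+ := \max\{y, 0\}$.

The following is a (well-known) simple restatement of DP in terms of hockey stick divergence:

Lemma 14.

An algorithm $\mathcal{A}$ is $(\varepsilon, \delta)$-DP iff for any neighboring datasets $X$ and $X'$, it holds that $d_{\varepsilon}(\mathcal{A}(X) \,\|\, \mathcal{A}(X')) \le \delta$.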
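For finite discrete output distributions, the hockey stick characterization of DP can be checked numerically. The sketch below represents a distribution as a dictionary from outcomes to probabilities and sums the positive parts of p(v) - e^eps * q(v); the function names `hockey_stick` and `satisfies_dp`, the dictionary representation, and the numerical tolerance `tol` are assumptions made for illustration. It checks a single pair of output distributions in both directions, so a full DP check would have to repeat it over all neighboring datasets.

```python
import math


def hockey_stick(p, q, eps):
    """Sum over outcomes v of max(p(v) - e^eps * q(v), 0).

    p, q -- dicts mapping outcomes to probabilities (missing keys mean 0).
    """
    scale = math.exp(eps)
    # Outcomes outside the support of p contribute 0, so iterating over p suffices.
    return sum(max(pv - scale * q.get(v, 0.0), 0.0) for v, pv in p.items())


def satisfies_dp(p, q, eps, delta, tol=1e-12):
    """Check the divergence bound in both directions for one neighboring pair."""
    return (hockey_stick(p, q, eps) <= delta + tol
            and hockey_stick(q, p, eps) <= delta + tol)


def randomized_response(bit, eps):
    """Output distribution of binary randomized response on input bit."""
    keep = math.exp(eps) / (1.0 + math.exp(eps))
    return {bit: keep, 1 - bit: 1.0 - keep}
```

As a usage example, `satisfies_dp(randomized_response(0, 1.0), randomized_response(1, 1.0), 1.0, 0.0)` tests binary randomized response at the privacy level it was built for, while lowering the `eps` passed to `satisfies_dp` makes the divergence strictly positive.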

We will also use the following result from (Ghazi et al., 2020b, Lemma 21), which allows us to easily compute the differential privacy parameters of noise addition algorithms for -summation.

Lemma 15 (Ghazi et al. (2020b)).

An -noise addition mechanism for -summation is