The shuffle model.
The shuffle model of differential privacy [3, 2] considers a data collector that receives messages from $n$ users (possibly multiple messages from each user). The shuffle model assumes that a mechanism is in place to provide anonymity to each of the messages, i.e., in the curator's view, the messages have been shuffled by a random unknown permutation.
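Following the notation in [1], we define a protocol in the shuffle model to be a pair of algorithms $\mathcal{P} = (\mathcal{R}, \mathcal{A})$, where $\mathcal{R} : \mathbb{X} \to \mathbb{Y}^m$ and $\mathcal{A} : \mathbb{Y}^{nm} \to \mathbb{O}$, for number of users $n$ and number of messages $m$. We call $\mathcal{R}$ the local randomizer, $\mathbb{Y}$ the message space of the protocol, $\mathcal{A}$ the analyzer of $\mathcal{P}$, and $\mathbb{O}$ the output space. The overall protocol implements a mechanism $\mathcal{P} : \mathbb{X}^n \to \mathbb{O}$ as follows. Each user $i$ holds a data record $x_i$, to which she applies the local randomizer to obtain a vector of messages $\vec{y}_i = \mathcal{R}(x_i)$. The multiset union of all messages is then shuffled and submitted to the analyzer. We write $\mathcal{S}(\vec{y}_1, \ldots, \vec{y}_n)$ to denote the random shuffling step, where $\mathcal{S} : \mathbb{Y}^{nm} \to \mathbb{Y}^{nm}$ is a shuffler that applies a random permutation to its inputs. In summary, the output of $\mathcal{P}(x_1, \ldots, x_n)$ is given by $\mathcal{A} \circ \mathcal{S} \circ \mathcal{R}^n(\vec{x}) = \mathcal{A}(\mathcal{S}(\mathcal{R}(x_1), \ldots, \mathcal{R}(x_n)))$.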
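The composition $\mathcal{A} \circ \mathcal{S} \circ \mathcal{R}^n$ can be illustrated with a minimal Python sketch. The function name `run_shuffle_protocol` is hypothetical, and the shuffler is modeled with `random.Random.shuffle`, which is not a cryptographically secure permutation; a real deployment would rely on a trusted or cryptographic shuffling service.

```python
import random
from typing import Callable, List, Sequence, TypeVar

X = TypeVar("X")  # type of a user's data record
Y = TypeVar("Y")  # type of a single message
O = TypeVar("O")  # type of the analyzer's output


def run_shuffle_protocol(
    inputs: Sequence[X],
    randomizer: Callable[[X], List[Y]],
    analyzer: Callable[[List[Y]], O],
    rng: random.Random,
) -> O:
    """Compute A(S(R(x_1), ..., R(x_n))) for a shuffle-model protocol P = (R, A)."""
    # Each user i applies the local randomizer R to her record x_i, producing m messages.
    messages: List[Y] = []
    for x in inputs:
        messages.extend(randomizer(x))
    # The shuffler S applies a uniformly random permutation to all n*m messages,
    # so the analyzer cannot link a message to the user who sent it.
    rng.shuffle(messages)
    # The shuffled list is exactly the analyzer's view M = S o R^n.
    return analyzer(messages)
```

For example, with the randomizer `lambda x: [x, 0]` and `sum` as the analyzer, `run_shuffle_protocol([1, 2, 3], ...)` returns `6` whatever permutation the shuffler picks, since the sum is permutation-invariant.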
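To prove privacy we will refer to the mechanism $\mathcal{M} = \mathcal{S} \circ \mathcal{R}^n$, which captures the view of the analyzer in an execution of the protocol. Accordingly, we say that $\mathcal{P}$ is $(\varepsilon, \delta)$-differentially private if for every pair of $n$-tuples of inputs $\vec{x}$ and $\vec{x}'$ differing in one coordinate, and every collection $T$ of multisets of $\mathbb{Y}$ of size $nm$, i.e. every possible subset of views of the analyzer, we have
$$\Pr[\mathcal{M}(\vec{x}) \in T] \le e^{\varepsilon} \Pr[\mathcal{M}(\vec{x}') \in T] + \delta.$$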
In this paper we are concerned with the problem of real summation, where each $x_i$ is a real number in $[0, 1]$ and the goal of the protocol is for the analyzer to obtain a differentially private estimate of $\sum_{i=1}^n x_i$.
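Our proposed protocol uses a fixed-point encoding of a real number $x \in [0, 1]$ with integer precision $p > 0$ and randomized rounding, which we define as $\bar{x} = \lfloor xp \rfloor + \mathrm{Ber}(xp - \lfloor xp \rfloor)$, where $\mathrm{Ber}(b)$ denotes a Bernoulli random variable with success probability $b$.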
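Lemma 1.1. For any $\vec{x} \in [0, 1]^n$, $\mathbb{E}\left[\left(\sum_{i=1}^n \bar{x}_i/p - \sum_{i=1}^n x_i\right)^2\right] \le \frac{n}{4p^2}$.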
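Proof. Let $W_i$ be $\bar{x}_i/p - x_i$, and note that $\mathbb{E}[W_i] = 0$ and $\mathrm{Var}(W_i) \le 1/(4p^2)$, because $pW_i$ is a centered Bernoulli variable, whose variance is at most $1/4$. Since the $W_i$ are independent, it follows that
$$\mathbb{E}\left[\left(\sum_{i=1}^n W_i\right)^2\right] = \sum_{i=1}^n \mathrm{Var}(W_i) \le \frac{n}{4p^2}.$$ ∎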
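A minimal sketch of the randomized-rounding encoding, together with the exact per-user variance that Lemma 1.1 bounds by $1/(4p^2)$; the helper names `fixed_point_encode` and `rounding_variance` are hypothetical.

```python
import math
import random


def fixed_point_encode(x: float, p: int, rng: random.Random) -> int:
    """Randomized rounding: floor(x*p) + Ber(x*p - floor(x*p))."""
    scaled = x * p
    low = math.floor(scaled)
    frac = scaled - low
    # Round up with probability equal to the fractional part, so E[result] = x*p.
    return low + (1 if rng.random() < frac else 0)


def rounding_variance(x: float, p: int) -> float:
    """Exact variance of fixed_point_encode(x, p) / p, namely frac*(1-frac)/p^2."""
    frac = x * p - math.floor(x * p)
    return frac * (1.0 - frac) / p**2
```

Since `frac * (1 - frac)` is maximized at `frac = 0.5`, the variance never exceeds `1 / (4 * p**2)`, and summing over $n$ independent users gives the $n/(4p^2)$ of the lemma.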
Differential Privacy from Statistical Distance.
Our argument relies on statistical distance which, for consistency with […], we define as the maximal advantage of a distinguisher in telling two distributions $D$ and $D'$ apart, namely $\mathrm{SD}(D, D') = \max_{T} |\Pr[D \in T] - \Pr[D' \in T]|$. We will show that the view of the analyzer in our protocol is close in statistical distance to the output of a differentially private mechanism. The following lemma (also stated by Wang et al. [8]) says that this suffices to conclude that our protocol is differentially private.
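Lemma 1.2. Let $\mathcal{M}$ and $\mathcal{M}'$ be mechanisms (for instance, the views of two protocols $\mathcal{P}$ and $\mathcal{P}'$) such that $\mathrm{SD}(\mathcal{M}(\vec{x}), \mathcal{M}'(\vec{x})) \le 2^{-\sigma}$ for a security parameter $\sigma$ and all inputs $\vec{x}$. If $\mathcal{M}'$ is $(\varepsilon, \delta)$-DP, then $\mathcal{M}$ is $(\varepsilon, \delta + (1 + e^{\varepsilon})2^{-\sigma})$-DP.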
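Proof. Fix neighboring inputs $\vec{x}, \vec{x}'$ and a set $T$. Since $\mathcal{M}'$ is $(\varepsilon, \delta)$-DP, $\Pr[\mathcal{M}'(\vec{x}) \in T] \le e^{\varepsilon} \Pr[\mathcal{M}'(\vec{x}') \in T] + \delta$, and the statistical distance bound gives $|\Pr[\mathcal{M}(\vec{z}) \in T] - \Pr[\mathcal{M}'(\vec{z}) \in T]| \le 2^{-\sigma}$ for any input $\vec{z}$. It follows that
$$\Pr[\mathcal{M}(\vec{x}) \in T] \le \Pr[\mathcal{M}'(\vec{x}) \in T] + 2^{-\sigma} \le e^{\varepsilon}\Pr[\mathcal{M}'(\vec{x}') \in T] + \delta + 2^{-\sigma} \le e^{\varepsilon}\Pr[\mathcal{M}(\vec{x}') \in T] + \delta + (1 + e^{\varepsilon})2^{-\sigma}.$$ ∎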
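For distributions with finite support, the statistical distance defined above equals half of the $\ell_1$ distance between the probability mass functions. The hypothetical helpers below compute it both ways with exact `fractions.Fraction` arithmetic; the second maximizes over every event $T$ by brute force and is only feasible for tiny supports.

```python
from fractions import Fraction
from itertools import chain, combinations
from typing import Dict, Hashable


def statistical_distance(P: Dict[Hashable, Fraction], Q: Dict[Hashable, Fraction]) -> Fraction:
    """SD(P, Q) = max_T |P(T) - Q(T)|, computed as half the L1 distance."""
    support = set(P) | set(Q)
    zero = Fraction(0)
    return sum(abs(P.get(z, zero) - Q.get(z, zero)) for z in support) / 2


def statistical_distance_bruteforce(P, Q) -> Fraction:
    """Same quantity, maximizing |P(T) - Q(T)| over every event T (exponential time)."""
    support = sorted(set(P) | set(Q))
    zero = Fraction(0)
    events = chain.from_iterable(combinations(support, r) for r in range(len(support) + 1))
    return max(abs(sum(P.get(z, zero) - Q.get(z, zero) for z in T)) for T in events)
```

The maximizing event is the set of outcomes where the first distribution puts more mass than the second, which is why the two computations agree.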
The Discrete Laplace Distribution.
In this work we use a discrete version of the Laplace mechanism, which consists of adding a discrete random variable to the input. We refer to this distribution as the discrete Laplace distribution. The distribution is over $\mathbb{Z}$; we write it $\mathrm{DLap}(\alpha)$, for $\alpha \in (0, 1)$, and it has probability mass function proportional to $\alpha^{|k|}$ at $k \in \mathbb{Z}$. Adding noise from this distribution to an integer-valued function with sensitivity $\Delta$ provides $\varepsilon$-differential privacy with $\alpha = e^{-\varepsilon/\Delta}$, analogously to the Laplace mechanism on $\mathbb{R}$. This distribution also appeared in [7], though under the name symmetric geometric.
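The sketch below, assuming the parameterization above (normalizing constant $(1-\alpha)/(1+\alpha)$, since $\sum_k \alpha^{|k|} = (1+\alpha)/(1-\alpha)$), evaluates the pmf and samples $\mathrm{DLap}(\alpha)$ as the difference of two i.i.d. geometric variables; helper names are hypothetical. The privacy claim rests on the ratio $\Pr[Z = k]/\Pr[Z = k + \Delta] \le \alpha^{-\Delta} = e^{\varepsilon}$.

```python
import math
import random


def dlap_pmf(k: int, alpha: float) -> float:
    """P[Z = k] for Z ~ DLap(alpha): alpha^|k| normalized over the integers."""
    return (1 - alpha) / (1 + alpha) * alpha ** abs(k)


def sample_geometric(alpha: float, rng: random.Random) -> int:
    """Failures before the first success: P[G = k] = (1 - alpha) * alpha^k."""
    # Inversion: P[floor(log U / log alpha) >= k] = P[U <= alpha^k] = alpha^k.
    u = 1.0 - rng.random()  # uniform in (0, 1], avoids log(0)
    return math.floor(math.log(u) / math.log(alpha))


def sample_dlap(alpha: float, rng: random.Random) -> int:
    """The difference of two i.i.d. geometric variables is DLap(alpha)."""
    return sample_geometric(alpha, rng) - sample_geometric(alpha, rng)
```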
2 Secure Distributed Summation
Ishai et al. [6] showed how to use anonymous communication as a building block for a variety of tasks, including securely computing $n$-party summation over $\mathbb{Z}_q$. This setting coincides with the shuffle model presented above, and hence the precise result by Ishai et al. can be restated as follows (we give a detailed proof of this lemma in Section 5).
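Let $\mathcal{P}$ be a shuffle model protocol with view $\mathcal{M}$, and let $f$ be a function. We say that $\mathcal{P}$ is $\sigma$-secure for computing $f$ if, for any inputs $\vec{x}, \vec{x}'$ such that $f(\vec{x}) = f(\vec{x}')$, we have
$$\mathrm{SD}(\mathcal{M}(\vec{x}), \mathcal{M}(\vec{x}')) \le 2^{-\sigma}.$$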
Lemma 2.1 ([6]).
There exists a $\sigma$-secure protocol in the shuffle model for summation in $\mathbb{Z}_q$ with $O(\log q \cdot (\log q + \sigma + \log n))$ bits of communication per party.
The protocol by Ishai et al. is very simple. Let $x_i \in \mathbb{Z}_q$ be the input of the $i$th party. Party $i$ generates $m$ additive shares of $x_i$ ($m$ can be reduced by almost a factor of two, as explained in Section 5.1), i.e., it generates $m - 1$ independent uniformly random elements of $\mathbb{Z}_q$, denoted $y_{i,1}, \ldots, y_{i,m-1}$, and then computes $y_{i,m} = x_i - \sum_{j=1}^{m-1} y_{i,j} \bmod q$. Party $i$ then submits each $y_{i,j}$ as a separate message to the shuffler. The shuffler shuffles all messages together and sends them on to the server, who adds up all the received messages modulo $q$ and obtains $\sum_i x_i \bmod q$, as required. This protocol is $\sigma$-secure, as stated in the lemma.
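A minimal Python sketch of this share, shuffle and sum protocol follows. The names `share`, `analyze` and `secure_sum` are hypothetical, and `random.Random` stands in for the cryptographically secure randomness (and secure shuffler) that a real implementation would require.

```python
import random
from typing import List


def share(x: int, m: int, q: int, rng: random.Random) -> List[int]:
    """Split x in Z_q into m additive shares: m-1 uniform values plus a correction."""
    shares = [rng.randrange(q) for _ in range(m - 1)]
    shares.append((x - sum(shares)) % q)  # makes the shares sum to x modulo q
    return shares


def analyze(messages: List[int], q: int) -> int:
    """The server only needs the sum of all received messages modulo q."""
    return sum(messages) % q


def secure_sum(inputs: List[int], m: int, q: int, rng: random.Random) -> int:
    messages = [y for x in inputs for y in share(x, m, q, rng)]
    rng.shuffle(messages)  # the shuffler hides which party sent which share
    return analyze(messages, q)
```

Any $m - 1$ shares of a single party are uniformly random and independent of its input; hiding the input also from the sum of all $m$ shares is exactly what the shuffling provides, as analyzed in Section 5.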
3 Distributed Noise Addition
Given that a communication-efficient protocol for secure exact integer summation in the shuffle model exists, we would now like to use it for private real summation. Intuitively, this task boils down to defining a local randomizer that takes a private value $x_i \in [0, 1]$ and outputs a privatized value $z_i$ in the discrete domain $\mathbb{Z}_q$ such that $\sum_i z_i$ is differentially private and can be post-processed to a good approximation of $\sum_i x_i$.
A simple solution is to have a designated party add the noise required in the curator model. This is, however, not a satisfying solution, as it does not withstand collusions and/or dropouts. To address this, Shi et al. [7] proposed a solution where each party adds enough noise to provide $\varepsilon$-differential privacy in the curator model with a certain probability, which results in an $(\varepsilon, \delta)$-differentially private protocol. However, one can do strictly better: the total noise can be reduced by a factor of … if each party adds a discrete random variable such that the sum of the $n$ contributions is exactly enough to provide $\varepsilon$-differential privacy, and this also results in pure differential privacy. A discrete random variable with this property is provided in [4], where it is shown that a discrete Laplace random variable can be expressed as the sum of $n$ differences of two Pólya random variables (the Pólya distribution is a generalization of the negative binomial distribution). Concretely, if $X_1, \ldots, X_n$ and $Y_1, \ldots, Y_n$ are independent $\mathrm{Pólya}(1/n, \alpha)$ random variables, then $\sum_{i=1}^n (X_i - Y_i)$ has a discrete Laplace distribution, i.e. $\sum_{i=1}^n (X_i - Y_i) \sim \mathrm{DLap}(\alpha)$. This allows us to distribute the Laplace mechanism, which is what we shall do in our protocol presented in the next section.
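The decomposition can be checked numerically without any sampling. The sketch below assumes the Pólya$(r, \alpha)$ pmf $\frac{\Gamma(k+r)}{k!\,\Gamma(r)}\alpha^k(1-\alpha)^r$; it convolves $n$ copies of Pólya$(1/n, \alpha)$ and compares the result with the geometric pmf $(1-\alpha)\alpha^k$, i.e. Pólya$(1, \alpha)$. The difference of two independent such geometric variables is then $\mathrm{DLap}(\alpha)$. Function names are hypothetical.

```python
import math
from typing import List


def polya_pmf(k: int, r: float, alpha: float) -> float:
    """P[X = k] for X ~ Polya(r, alpha), computed in log space via lgamma."""
    log_p = (math.lgamma(k + r) - math.lgamma(k + 1) - math.lgamma(r)
             + k * math.log(alpha) + r * math.log(1 - alpha))
    return math.exp(log_p)


def convolve(a: List[float], b: List[float]) -> List[float]:
    """pmf of the sum of two independent variables, truncated to {0, ..., K-1}."""
    K = len(a)
    return [sum(a[j] * b[k - j] for j in range(k + 1)) for k in range(K)]


def sum_of_polyas_pmf(n: int, alpha: float, K: int) -> List[float]:
    """pmf on {0, ..., K-1} of the sum of n i.i.d. Polya(1/n, alpha) variables."""
    base = [polya_pmf(k, 1.0 / n, alpha) for k in range(K)]
    total = base
    for _ in range(n - 1):
        total = convolve(total, base)
    return total
```

Truncation is harmless here: all variables are non-negative, so the first $K$ entries of each convolution are exact.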
4 Private Summation
In this section we prove a theorem which says that, given a secure integer summation protocol, we can construct a differentially private real summation protocol. We then combine this theorem with Lemma 2.1 to derive a protocol, given explicitly, for differentially private real summation.
Theorem 4.1. Given a $\sigma$-secure protocol in the shuffle model for $n$-party summation in $\mathbb{Z}_q$, for any $q$, with … communication per party, there exists an $(\varepsilon, (1 + e^{\varepsilon})2^{-\sigma})$-differentially private protocol in the shuffle model for real summation with standard error … and communication bounded by ….
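Proof. Let $p = …$ be the precision of the fixed-point encoding. We will exhibit the resulting protocol $\mathcal{P} = (\mathcal{R}, \mathcal{A})$, with $q$ defined as follows. $\mathcal{P}$ executes the given summation protocol over $\mathbb{Z}_q$ with $q = …$. On input $x_i$, $\mathcal{R}$ first computes the fixed-point encoding $\bar{x}_i$ of the input with precision $p$, then adds noise $X_i - Y_i$, where $X_i, Y_i$ are independent $\mathrm{Pólya}(1/n, e^{-\varepsilon/p})$ random variables, and finally applies the local randomizer of the summation protocol to $z_i = \bar{x}_i + X_i - Y_i \bmod q$. $\mathcal{A}$ obtains $z = \sum_i z_i \bmod q \in \{0, \ldots, q-1\}$ from the analyzer of the summation protocol and decodes it by returning $z/p$ if $z \le …$, and $(z - q)/p$ otherwise. This addresses a potential underflow of the sum in $\mathbb{Z}_q$. To see that $\mathcal{P}$ has error …, note that it has the accuracy of the discrete Laplace mechanism when adding integers, except when the total noise added has magnitude greater than …, in which case we may incur additional error; but this only happens with probability …. Hence, the error of this protocol is bounded by ….
To show that this protocol is private, we will compare the mechanism $\mathcal{M}$ to another mechanism $\mathcal{M}'$ (which can be considered to be computed in the curator model) which is $\varepsilon$-differentially private and such that $\mathrm{SD}(\mathcal{M}(\vec{x}), \mathcal{M}'(\vec{x})) \le 2^{-\sigma}$ for all $\vec{x}$, from which the result follows by Lemma 1.2.
$\mathcal{M}'$ is defined to be the result of the following procedure. First compute the noisy encoding $z_i$ of each input $x_i$ as in $\mathcal{R}$, then take the sum $\sum_i z_i \bmod q$, and then output the view of the summation protocol with first input $\sum_i z_i \bmod q$ and all other inputs $0$.
Note that $\sum_i (X_i - Y_i) \sim \mathrm{DLap}(e^{-\varepsilon/p})$, and that the sensitivity of $\sum_i \bar{x}_i$ is $p$. It follows that $\sum_i z_i$ is $\varepsilon$-differentially private, and thus by the post-processing property so is $\mathcal{M}'$.
It remains to show that $\mathrm{SD}(\mathcal{M}(\vec{x}), \mathcal{M}'(\vec{x})) \le 2^{-\sigma}$, which we will do by demonstrating the existence of a coupling. First let the noise added to input $x_i$ by $\mathcal{R}$ be the same in both mechanisms, and note that this results in the inputs to the summation protocol within $\mathcal{M}$ and the inputs to the summation protocol within $\mathcal{M}'$ having the same sum. It then follows immediately from the $\sigma$-security of the summation protocol that these two instantiations of it can be coupled to have identical outputs except with probability $2^{-\sigma}$, as required. ∎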
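Putting the pieces of the proof together, here is a non-authoritative end-to-end sketch of the protocol. The parameter values in the usage (`p=10`, `q=2**20`, `m=8`) and the decoding threshold `q // 2` are illustrative assumptions, not the values prescribed by the paper. Pólya variables are sampled by inverting the pmf recurrence $P[k+1]/P[k] = \alpha(k+r)/(k+1)$, and `random.Random` is not a cryptographically secure generator.

```python
import math
import random
from typing import List


def sample_polya(r: float, alpha: float, rng: random.Random) -> int:
    """Inversion sampling of Polya(r, alpha), starting from P[0] = (1 - alpha)^r."""
    u = rng.random()
    k, pmf = 0, (1 - alpha) ** r
    cdf = pmf
    while cdf < u and pmf > 0.0:  # pmf > 0 guards against float saturation of cdf
        pmf *= alpha * (k + r) / (k + 1)  # ratio P[k+1] / P[k]
        k += 1
        cdf += pmf
    return k


def local_randomizer(x: float, n: int, eps: float, p: int, q: int, m: int,
                     rng: random.Random) -> List[int]:
    # 1. Fixed-point encoding with randomized rounding.
    scaled = x * p
    enc = math.floor(scaled) + (1 if rng.random() < scaled - math.floor(scaled) else 0)
    # 2. This user's share of the discrete Laplace noise (sensitivity of the sum is p).
    alpha = math.exp(-eps / p)
    noisy = enc + sample_polya(1 / n, alpha, rng) - sample_polya(1 / n, alpha, rng)
    # 3. Additive secret sharing in Z_q, one message per share.
    shares = [rng.randrange(q) for _ in range(m - 1)]
    shares.append((noisy - sum(shares)) % q)
    return shares


def decode(z: int, p: int, q: int) -> float:
    # Assumed threshold: values above q // 2 are read as negative totals that wrapped.
    return z / p if z <= q // 2 else (z - q) / p


def analyzer(messages: List[int], p: int, q: int) -> float:
    return decode(sum(messages) % q, p, q)


def private_sum(xs: List[float], eps: float, p: int, q: int, m: int,
                rng: random.Random) -> float:
    n = len(xs)
    messages = [y for x in xs for y in local_randomizer(x, n, eps, p, q, m, rng)]
    rng.shuffle(messages)  # the shuffler
    return analyzer(messages, p, q)
```

Because the $2n$ Pólya$(1/n, \alpha)$ contributions aggregate to $\mathrm{DLap}(\alpha)$, no single user has to add the full noise, yet the server sees exactly one discrete Laplace draw on the encoded sum.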
The choice of $p$ was made so that the error in the discretization is of the same order as the error due to the added noise; this recovers the same order of error as in the curator model. Taking … results in the leading term of the total error matching that of the curator model, at the cost of a small constant-factor increase in communication.
There exists an $(\varepsilon, \delta)$-differentially private protocol in the shuffle model for real summation with error $O(1/\varepsilon)$ and $O(\log(n/\delta))$ messages per party, each of length $O(\log n)$ bits.
Such a protocol can be constructed from the proofs of Theorem 4.1 and Lemma 2.1, and is given explicitly by taking the local randomizer given in Algorithm … and the analyzer given in Algorithm …, with parameters $p = …$, $q = …$, and $m = …$. This results in a mean squared error of … and communication of … bits per party. In Section 5.1 we explain how the choice of $m$, and thus the required communication, can actually be reduced by almost a factor of two.
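As a back-of-the-envelope check, and assuming no wrap-around modulo $q$, the mean squared error of such a protocol is at most the rounding bound $n/(4p^2)$ plus $\mathrm{Var}(\mathrm{DLap}(\alpha))/p^2 = 2\alpha/(p^2(1-\alpha)^2)$ with $\alpha = e^{-\varepsilon/p}$, since the two error sources are independent and have mean zero. This is our own estimate, not the paper's exact expression; the helper name `mse_estimate` is hypothetical. For large $p$ the noise term approaches $2/\varepsilon^2$, the variance of the Laplace mechanism in the curator model.

```python
import math


def mse_estimate(n: int, eps: float, p: int) -> float:
    """Rounding bound plus discrete Laplace variance, both rescaled by 1/p.

    Ignores the extra error from wrap-around modulo q (assumed negligible).
    """
    alpha = math.exp(-eps / p)
    one_minus_alpha = -math.expm1(-eps / p)  # 1 - alpha, computed without cancellation
    rounding = n / (4 * p**2)  # bound on the randomized-rounding error (Lemma 1.1)
    noise = 2 * alpha / one_minus_alpha**2 / p**2  # Var(DLap(alpha)) / p^2
    return rounding + noise
```

For example, with $n = 10^4$, $\varepsilon = 1$ and $p = 100$ the estimate is about $0.25 + 2 = 2.25$.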
5 Summation by Anonymity
In this section we provide a proof of Lemma 2.1. All the ideas for the proof are provided in [6], but we reproduce the proof here, keeping track of constants to facilitate setting the parameters of the protocol. The following definition and lemma from [5] are fundamental to why this protocol is secure.
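Let $\mathcal{H}$ be a family of functions mapping $A$ to $B$. We say $\mathcal{H}$ is universal, or a universal family of hash functions, if, for $h$ selected uniformly at random from $\mathcal{H}$, for every $a, a' \in A$ with $a \neq a'$,
$$\Pr_h[h(a) = h(a')] \le \frac{1}{|B|}.$$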
Lemma 5.1 (Leftover Hash Lemma (special case)).
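Let $k, \ell \in \mathbb{N}$ with $\ell \le k$, and let $\mathcal{H}$ be a universal family of hash functions mapping $k$ bits to $\ell$ bits. Let $a$, $h$, and $u$ be chosen independently uniformly at random from $\{0,1\}^k$, $\mathcal{H}$, and $\{0,1\}^\ell$ respectively. Then
$$\mathrm{SD}\big((h(a), h), (u, h)\big) \le \frac{1}{2} \cdot 2^{-(k-\ell)/2}.$$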
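To begin with, we consider the case of securely adding two uniformly random inputs $U_1, U_2 \in \mathbb{Z}_q$. Recall that $\mathcal{P}$ is the protocol of the statement of Lemma 2.1, and let $V(x_1, x_2)$ be shorthand for $\mathcal{M}(x_1, x_2)$, i.e. the view of the analyzer in an execution of protocol $\mathcal{P}$ with two parties and inputs $x_1, x_2$. We write $V$ for $V(U_1, U_2)$ and … for …. Finally, let $U$ be an independent uniformly random element of $\mathbb{Z}_q$.
Lemma 5.2. Suppose …. Then $\mathrm{SD}((V, U_1), (V, U)) \le$ ….
Proof. For $y \in \mathbb{Z}_q^{2m}$ and $S \subseteq \{1, \ldots, 2m\}$ with $|S| = m$, let $h_y(S) = \sum_{i \in S} y_i$. The functions $h_y$ form a universal family of hash functions from the $m$-element subsets of $\{1, \ldots, 2m\}$ to $\mathbb{Z}_q$: if $S \neq S'$, there is an index $i \in S \setminus S'$, so $h_y(S) - h_y(S')$ depends on $y_i$ with coefficient $1$, and hence $\Pr_y[h_y(S) = h_y(S')] = 1/q$. Let $Y$ be a uniformly random element of $\mathbb{Z}_q^{2m}$ and $S$ an independent uniformly random $m$-element subset. Note that $(V, U_1)$ has the same distribution as $(Y, h_Y(S))$, which follows from the intuition that $V$ corresponds to $2m$ uniformly random numbers shuffled together, $U_1$ can be obtained by adding up $m$ of them, and $U_2$ is the sum of the rest; likewise, $(V, U)$ has the same distribution as $(Y, U)$.
The result now follows immediately from the fact that the Leftover Hash Lemma, applied to the $\binom{2m}{m}$ equally likely values of $S$ and the range $\mathbb{Z}_q$, implies that $\mathrm{SD}((Y, h_Y(S)), (Y, U)) \le \frac{1}{2}\sqrt{q/\binom{2m}{m}}$. ∎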
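For toy parameters, both the universality of $h_y(S) = \sum_{i \in S} y_i$ and the Leftover Hash Lemma bound $\frac{1}{2}\sqrt{q/\binom{2m}{m}}$ can be verified exhaustively. The sketch below uses hypothetical helper names, is exponential in $m$, and assumes the convention that statistical distance is half the $\ell_1$ distance.

```python
from fractions import Fraction
from itertools import combinations, product
from math import comb, sqrt


def subsets(m: int):
    """All m-element subsets of {0, ..., 2m-1}, as tuples of indices."""
    return list(combinations(range(2 * m), m))


def h(y, S, q: int) -> int:
    """h_y(S): sum of the entries of y indexed by S, modulo q."""
    return sum(y[i] for i in S) % q


def is_universal(m: int, q: int) -> bool:
    """Check Pr_y[h_y(S) = h_y(S')] <= 1/q for all S != S' by enumerating y."""
    ys = list(product(range(q), repeat=2 * m))
    for S, T in combinations(subsets(m), 2):
        collisions = sum(1 for y in ys if h(y, S, q) == h(y, T, q))
        if Fraction(collisions, len(ys)) > Fraction(1, q):
            return False
    return True


def sd_hash_vs_uniform(m: int, q: int) -> Fraction:
    """Exact SD((Y, h_Y(S)), (Y, U)) = average over y of SD(h_y(S), U)."""
    ys = list(product(range(q), repeat=2 * m))
    subs = subsets(m)
    total = Fraction(0)
    for y in ys:
        counts = [0] * q
        for S in subs:
            counts[h(y, S, q)] += 1
        total += sum(abs(Fraction(c, len(subs)) - Fraction(1, q)) for c in counts) / 2
    return total / len(ys)


def lhl_bound(m: int, q: int) -> float:
    return 0.5 * sqrt(q / comb(2 * m, m))
```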
Now we can use this to solve the case of two arbitrary inputs.
Lemma 5.3. If $x_1, x_2, x_1', x_2' \in \mathbb{Z}_q$ satisfy $x_1 + x_2 = x_1' + x_2'$, then we have
Markov’s inequality provides that
and thus by the triangle inequality
and so for every and we have
Combining the last two inequalities gives the result. ∎
Combining these two lemmas gives that, for $x_1, x_2, x_1', x_2' \in \mathbb{Z}_q$ such that $x_1 + x_2 = x_1' + x_2'$,
From which the following lemma is immediate
Lemma 5.4. If $m$ satisfies … and $x_1, x_2, x_1', x_2' \in \mathbb{Z}_q$ are such that $x_1 + x_2 = x_1' + x_2'$, then
We will now generalize to the case of $n$-party summation.
Proof of Lemma 2.1.
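Let $\vec{x}, \vec{x}'$ be two distinct possible inputs to the protocol; we say that they are related by a basic step if they have the same sum and differ in only two entries. Any two distinct inputs with the same sum are related by at most $n - 1$ basic steps: fix the first $n - 1$ coordinates one at a time, each time moving the correction onto the next coordinate, after which the last coordinate matches because the sums agree. We will show that if $m$ is taken to be … and $\vec{x}$ and $\vec{x}'$ are related by a basic step, then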
from which the lemma follows by the triangle inequality for statistical distance.
Let $\vec{x}$ and $\vec{x}'$ be related by a basic step and suppose without loss of generality that they differ in the first two coordinates. Taking …, by Lemma 5.4 we can couple the messages sent by the first two parties on input $\vec{x}$ with the messages they send on input $\vec{x}'$ so that they match with probability at least …. Independently of that, we can couple the messages of the other parties so that they always match, as each of them has the same input in both cases. This gives a coupling exhibiting that equation (2) holds. ∎
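The decomposition into basic steps used in this proof can be made concrete. The hypothetical helper below assumes all entries lie in $\{0, \ldots, q-1\}$; it fixes coordinates from left to right, pushing each correction onto the next coordinate, so it needs at most $n - 1$ steps, and each step changes exactly two entries while preserving the sum modulo $q$.

```python
from typing import List


def basic_step_path(x: List[int], x_prime: List[int], q: int) -> List[List[int]]:
    """Walk from x to x_prime (same sum mod q) changing exactly two entries per step."""
    if len(x) != len(x_prime) or sum(x) % q != sum(x_prime) % q:
        raise ValueError("inputs must have the same length and the same sum mod q")
    cur = list(x)
    path = [list(cur)]
    for i in range(len(x) - 1):
        if cur[i] != x_prime[i]:
            # Set coordinate i to its target and compensate on coordinate i+1.
            delta = (x_prime[i] - cur[i]) % q
            cur[i] = x_prime[i]
            cur[i + 1] = (cur[i + 1] - delta) % q
            path.append(list(cur))
    # The last coordinate now matches automatically, because the sums agree mod q.
    return path
```

For instance, with $q = 7$ the inputs `[1, 2, 3, 4]` and `[4, 0, 6, 0]` are connected by three basic steps, matching the $n - 1$ bound.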
It may seem counterintuitive to require more messages the more parties there are (for fixed $q$ and $\sigma$). The addition of the $\log_2 n$ term to $m$ is necessary for our proof of Lemma 2.1. This is because we are trying to stop the adversary from learning a greater variety of things when there are more parties. However, it may be the case that Theorem 4.1 could follow from a weaker guarantee than the one provided by Lemma 2.1, and such a property might hold without this term.
It is an open problem to prove a lower bound greater than two on the number of messages required to get error … on real summation. A proof that one message is not enough is given in [1].
5.1 Improving the Constants
The constants implied by this proof can be improved by using a sharper bound for $\binom{2m}{m}$ in inequality (1). Using the bound … gives that taking $m$ to be the ceiling of the root of … suffices in the statement of Lemma 5.4. The resulting value of $m$ is …. Adding $\log_2 n$ to the root before taking the ceiling gives a value of $m$ for which Lemma 2.1 holds.
[1] Borja Balle, James Bell, Adrià Gascón, and Kobbi Nissim. The privacy blanket of the shuffle model. CoRR, abs/1903.02837, 2019.
[2] Albert Cheu, Adam D. Smith, Jonathan Ullman, David Zeber, and Maxim Zhilyaev. Distributed differential privacy via shuffling. In Advances in Cryptology - EUROCRYPT 2019, 2019.
[3] Úlfar Erlingsson, Vitaly Feldman, Ilya Mironov, Ananth Raghunathan, Kunal Talwar, and Abhradeep Thakurta. Amplification by shuffling: From local to central differential privacy via anonymity. In Proceedings of the Thirtieth Annual ACM-SIAM Symposium on Discrete Algorithms, pages 2468–2479. SIAM, 2019.
[4] S. Goryczka and L. Xiong. A comprehensive comparison of multiparty secure additions with differential privacy. IEEE Transactions on Dependable and Secure Computing, 14(5):463–477, Sep. 2017.
[5] Russell Impagliazzo and David Zuckerman. How to recycle random bits. In Proc. 30th FOCS, 1989.
[6] Yuval Ishai, Eyal Kushilevitz, Rafail Ostrovsky, and Amit Sahai. Cryptography from anonymity. In FOCS, pages 239–248. IEEE Computer Society, 2006.
[7] Elaine Shi, Richard Chow, T.-H. Hubert Chan, Dawn Song, and Eleanor Rieffel. Privacy-preserving aggregation of time-series data. In NDSS, 2011.
[8] Yu-Xiang Wang, Stephen E. Fienberg, and Alexander J. Smola. Privacy for free: Posterior sampling and stochastic gradient Monte Carlo. In ICML, volume 37 of JMLR Workshop and Conference Proceedings, pages 2493–2502. JMLR.org, 2015.