Differentially Private Summation with Multi-Message Shuffling

06/20/2019 ∙ by Borja Balle, et al. ∙ Georgetown University

In recent work, Cheu et al. (Eurocrypt 2019) proposed a protocol for $n$-party real summation in the shuffle model of differential privacy with $O_{\epsilon,\delta}(1)$ error and $\Theta(\epsilon\sqrt{n})$ one-bit messages per party. In contrast, every local model protocol for real summation must incur error $\Omega(\sqrt{n})$, and there exist protocols matching this lower bound which require just one bit of communication per party. Whether this gap in the number of messages is necessary was left open by Cheu et al. In this note we show a protocol with $O_{\epsilon,\delta}(1)$ error and $O_{\epsilon,\delta}(\log(n))$ messages of size $O(\log(n))$. This protocol is based on the work of Ishai et al. (FOCS 2006) showing how to implement distributed summation from secure shuffling, and on the observation that this allows simulating the Laplace mechanism in the shuffle model.


1 Preliminaries

The shuffle model.

The shuffle model of differential privacy [3, 2] considers a data collector that receives messages from users (possibly multiple messages from each user). The shuffle model assumes that a mechanism is in place to provide anonymity to each of the messages, i.e., in the curator’s view, the messages have been shuffled by a random unknown permutation.

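Following the notation in [2], we define a protocol $P$ in the shuffle model to be a pair of algorithms $P = (R, A)$, where $R : \mathcal{X} \to \mathcal{Y}^m$ and $A : \mathcal{Y}^{nm} \to \mathcal{Z}$, for number of users $n$ and number of messages $m$. We call $R$ the local randomizer, $\mathcal{Y}$ the message space of the protocol, $A$ the analyzer of $P$, and $\mathcal{Z}$ the output space. The overall protocol implements a mechanism $P : \mathcal{X}^n \to \mathcal{Z}$ as follows. Each user $i$ holds a data record $x_i$, to which she applies the local randomizer to obtain a vector of messages $\vec{y}_i = R(x_i)$. The multiset union of all messages is then shuffled and submitted to the analyzer. We write $S(\vec{y}_1, \ldots, \vec{y}_n)$ to denote the random shuffling step, where $S : \mathcal{Y}^{nm} \to \mathcal{Y}^{nm}$ is a shuffler that applies a random permutation to its inputs. In summary, the output of $P(x_1, \ldots, x_n)$ is given by $A \circ S \circ R^n(\vec{x}) = A(S(R(x_1), \ldots, R(x_n)))$.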
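As an illustration, the composition $A \circ S \circ R^n$ can be sketched in Python as below. This is a minimal sketch, not code from the paper: the function name `run_shuffle_protocol` is hypothetical, and the shuffler is modelled by `random.Random.shuffle`, i.e. a uniformly random permutation of all $nm$ messages.

```python
import random
from typing import Callable, List, Sequence, TypeVar

X = TypeVar("X")  # input (data record) type
Y = TypeVar("Y")  # message type
Z = TypeVar("Z")  # output type


def run_shuffle_protocol(
    inputs: Sequence[X],
    local_randomizer: Callable[[X], List[Y]],
    analyzer: Callable[[List[Y]], Z],
    rng: random.Random,
) -> Z:
    """Compute A(S(R(x_1), ..., R(x_n))) for a shuffle-model protocol P = (R, A)."""
    messages: List[Y] = []
    for x in inputs:
        # Each user applies the local randomizer R to her own record.
        messages.extend(local_randomizer(x))
    # The shuffler S applies a uniformly random permutation to all messages,
    # so the analyzer cannot tell which user sent which message.
    rng.shuffle(messages)
    # The analyzer A sees only the shuffled messages.
    return analyzer(messages)
```

For example, with a local randomizer that forwards its input as a single message and `sum` as the analyzer, the protocol simply outputs $\sum_i x_i$; privacy only comes from the choice of $R$.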

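To prove privacy we will refer to the mechanism $S \circ R^n$, which captures the view of the analyzer in an execution of the protocol. Accordingly, we say that $P$ is $(\epsilon, \delta)$-differentially private if for every pair of $n$-tuples of inputs $\vec{x}$ and $\vec{x}'$ differing in one co-ordinate, and every collection $T$ of multisets of $\mathcal{Y}$ of size $nm$, i.e. every possible subset of views of the analyzer, we have

$$\Pr[S \circ R^n(\vec{x}) \in T] \le e^{\epsilon} \Pr[S \circ R^n(\vec{x}') \in T] + \delta.$$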

Real summation.

In this paper we are concerned with the problem of real summation, where each $x_i$ is a real number in $[0, 1]$ and the goal of the protocol is for the analyser to obtain a differentially private estimate of $\sum_{i=1}^n x_i$.

Randomized rounding.

Our proposed protocol uses a fixed-point encoding of a real number $x \in [0, 1]$ with integer precision $p > 0$ and randomized rounding, which we define as $\bar{x} = \lfloor xp \rfloor + \mathrm{Ber}(xp - \lfloor xp \rfloor)$.

Lemma 1.1.

For any $x \in [0, 1]$, $\mathbb{E}[\bar{x}] = xp$ and $\mathrm{Var}[\bar{x}] \le 1/4$.

Proof.

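Let $y$ be $xp - \lfloor xp \rfloor \in [0, 1)$, and note that $\mathbb{E}[\mathrm{Ber}(y)] = y$ and $\mathrm{Var}[\mathrm{Ber}(y)] = y(1 - y)$. It follows that $\mathbb{E}[\bar{x}] = \lfloor xp \rfloor + y = xp$ and $\mathrm{Var}[\bar{x}] = y(1 - y) \le 1/4$. ∎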
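A minimal Python sketch of randomized rounding, assuming the standard rule $\bar{x} = \lfloor xp \rfloor + \mathrm{Ber}(xp - \lfloor xp \rfloor)$, together with an empirical check of the properties $\mathbb{E}[\bar{x}] = xp$ and $\mathrm{Var}[\bar{x}] \le 1/4$. The function name `randomized_round` is hypothetical, and the Bernoulli draw uses Python's `random` module, which is not a cryptographically secure source of randomness.

```python
import math
import random
import statistics


def randomized_round(x: float, p: int, rng: random.Random) -> int:
    """Encode x in [0, 1] as an integer in {0, ..., p} by randomized rounding.

    Returns floor(x * p) + Ber(x * p - floor(x * p)), which is unbiased for
    x * p (hypothetical helper, not taken from the paper).
    """
    if not 0.0 <= x <= 1.0:
        raise ValueError("x must lie in [0, 1]")
    scaled = x * p
    base = math.floor(scaled)
    frac = scaled - base  # probability of rounding up
    return base + (1 if rng.random() < frac else 0)


if __name__ == "__main__":
    rng = random.Random(42)
    samples = [randomized_round(0.37, 10, rng) for _ in range(50_000)]
    # Expect a mean close to x * p = 3.7 and a variance of about 0.21 <= 1/4.
    print(statistics.mean(samples), statistics.pvariance(samples))
```

Only the two integers adjacent to $xp$ can be produced, which is why the variance is that of a single Bernoulli variable.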

Differential Privacy from Statistical Distance.

Our argument relies on statistical distance which, for consistency with [6], we define as the maximal advantage of a distinguisher in telling two distributions $X$ and $Y$ apart, namely $\mathrm{SD}(X, Y) = \max_{T} |\Pr[X \in T] - \Pr[Y \in T]|$. We will show that the view of the analyzer in our protocol is close in statistical distance to the output of a differentially private mechanism. The following lemma (also stated as a proposition by Wang et al. [8]) says that this suffices to conclude that our protocol is differentially private.
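For finite distributions the maximal distinguishing advantage equals half the $\ell_1$ distance between the probability mass functions, since the best distinguisher accepts exactly the outcomes that are more likely under the first distribution. The sketch below (hypothetical helper names, exact arithmetic with `fractions.Fraction`) computes both quantities by brute force so they can be compared on a small example.

```python
from fractions import Fraction
from itertools import combinations
from typing import Dict, Hashable

Dist = Dict[Hashable, Fraction]


def statistical_distance(d1: Dist, d2: Dist) -> Fraction:
    """Half the L1 distance between two finite probability mass functions."""
    support = set(d1) | set(d2)
    return sum(
        (abs(d1.get(z, Fraction(0)) - d2.get(z, Fraction(0))) for z in support),
        Fraction(0),
    ) / 2


def max_advantage(d1: Dist, d2: Dist) -> Fraction:
    """max over events T of |Pr[X in T] - Pr[Y in T]|, enumerating every event."""
    support = list(set(d1) | set(d2))
    best = Fraction(0)
    for size in range(len(support) + 1):
        for event in combinations(support, size):
            p1 = sum((d1.get(z, Fraction(0)) for z in event), Fraction(0))
            p2 = sum((d2.get(z, Fraction(0)) for z in event), Fraction(0))
            best = max(best, abs(p1 - p2))
    return best
```

For instance, a fair bit and the distribution $\{0: 1/4, 1: 1/4, 2: 1/2\}$ are at distance $1/2$ under both definitions. The enumeration in `max_advantage` is exponential in the support size and is only meant as a check.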

Lemma 1.2.

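Let $P$ and $P'$ be protocols such that $\mathrm{SD}(P(\vec{x}), P'(\vec{x})) \le 2^{-\sigma}$, for a security parameter $\sigma$ and all inputs $\vec{x}$. If $P'$ is $(\epsilon, \delta)$-DP, then $P$ is $(\epsilon, \delta + (1 + e^{\epsilon})2^{-\sigma})$-DP.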

Proof.

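For any neighboring inputs $\vec{x}, \vec{x}'$, $P'$ satisfies $\Pr[P'(\vec{x}) \in T] \le e^{\epsilon}\Pr[P'(\vec{x}') \in T] + \delta$, and $P$ satisfies $|\Pr[P(\vec{z}) \in T] - \Pr[P'(\vec{z}) \in T]| \le 2^{-\sigma}$, for any input $\vec{z}$ and $T$. It follows that $\Pr[P(\vec{x}) \in T] \le e^{\epsilon}\Pr[P(\vec{x}') \in T] + \delta + (1 + e^{\epsilon})2^{-\sigma}$. ∎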
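Written out in full, the chain of inequalities behind the last step is the following. Here $\eta$ denotes the bound on the statistical distance between the outputs of $P$ and $P'$ on every input (under the reading of Lemma 1.2 used here, $\eta = 2^{-\sigma}$), $P'$ is assumed to be $(\epsilon, \delta)$-DP, and $\vec{x}, \vec{x}'$ are neighboring inputs.

```latex
\begin{align*}
\Pr[P(\vec{x}) \in T]
  &\le \Pr[P'(\vec{x}) \in T] + \eta
     && \text{(closeness on input } \vec{x}\text{)} \\
  &\le e^{\epsilon}\Pr[P'(\vec{x}') \in T] + \delta + \eta
     && \text{(} P' \text{ is } (\epsilon,\delta)\text{-DP)} \\
  &\le e^{\epsilon}\bigl(\Pr[P(\vec{x}') \in T] + \eta\bigr) + \delta + \eta
     && \text{(closeness on input } \vec{x}'\text{)} \\
  &= e^{\epsilon}\Pr[P(\vec{x}') \in T] + \delta + (1 + e^{\epsilon})\eta .
\end{align*}
```

The factor $e^{\epsilon}$ in front of the second $\eta$ is why the closeness bound must be small relative to $e^{-\epsilon}\delta$ for the conversion to be useful.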

The Discrete Laplace

In this work we use a discrete version of the Laplace mechanism, which consists of adding a discrete random variable to the input. We refer to this distribution as the discrete Laplace distribution. The distribution is supported on $\mathbb{Z}$; we write it $\mathrm{DLap}(\alpha)$, and it has probability mass function proportional to $\alpha^{|k|}$ for $k \in \mathbb{Z}$, where $\alpha \in (0, 1)$. Adding noise from this distribution to a function with sensitivity $\Delta$ provides $\epsilon$-differential privacy with $\alpha = e^{-\epsilon/\Delta}$, analogously to the Laplace mechanism on $\mathbb{R}$. This distribution also appeared in [7], though under the name symmetric geometric.
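A small Python sketch of $\mathrm{DLap}(\alpha)$, assuming the normalised mass function $\frac{1 - \alpha}{1 + \alpha}\alpha^{|k|}$ (the normalising constant is our own computation, not quoted from the paper). Samples are drawn as the difference of two independent geometric variables, a standard construction; the helper names are hypothetical. The function `max_privacy_ratio` checks numerically that with $\alpha = e^{-\epsilon/\Delta}$ the mass function changes by at most a factor $e^{\epsilon}$ when its argument shifts by at most $\Delta$, which is the property behind the privacy claim.

```python
import math
import random


def dlap_pmf(k: int, alpha: float) -> float:
    """P[K = k] for K ~ DLap(alpha), i.e. proportional to alpha ** |k| on the integers."""
    return (1.0 - alpha) / (1.0 + alpha) * alpha ** abs(k)


def sample_geometric(alpha: float, rng: random.Random) -> int:
    """G with P[G = k] = (1 - alpha) * alpha ** k, k = 0, 1, 2, ..., by inverse transform."""
    u = 1.0 - rng.random()  # uniform on (0, 1], so log(u) is finite
    return math.floor(math.log(u) / math.log(alpha))


def sample_dlap(alpha: float, rng: random.Random) -> int:
    """The difference of two i.i.d. geometric variables is DLap(alpha)."""
    return sample_geometric(alpha, rng) - sample_geometric(alpha, rng)


def max_privacy_ratio(alpha: float, sensitivity: int, radius: int = 50) -> float:
    """Largest pmf(k) / pmf(k + d) over |k| <= radius and |d| <= sensitivity."""
    return max(
        dlap_pmf(k, alpha) / dlap_pmf(k + d, alpha)
        for k in range(-radius, radius + 1)
        for d in range(-sensitivity, sensitivity + 1)
    )
```

With $\epsilon = 1$ and $\Delta = 3$, `max_privacy_ratio(math.exp(-1 / 3), 3)` evaluates to $e$ up to floating-point rounding.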

2 Secure Distributed Summation

Ishai et al. [6] showed how to use anonymous communications as a building block for a variety of tasks, including securely computing $n$-party summation over $\mathbb{Z}_q$. This setting coincides with the shuffle model presented above, and hence the precise result by Ishai et al. can be restated as follows (we give a detailed proof of this lemma in Section 5).

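Let $P = (R, A)$ be a shuffle model protocol, and let $f$ be a function. We say that $P$ is $\sigma$-secure for computing $f$ if, for any $\vec{x}, \vec{x}'$ such that $f(\vec{x}) = f(\vec{x}')$, we have

$$\mathrm{SD}(S \circ R^n(\vec{x}), S \circ R^n(\vec{x}')) \le 2^{-\sigma}.$$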

Lemma 2.1 ([6]).

There exists a $\sigma$-secure protocol in the shuffle model for summation in $\mathbb{Z}_q$ with  communication per party.

The protocol by Ishai et al. is very simple. Let $x_i \in \mathbb{Z}_q$ be the input of the $i$th party. Party $i$ generates $m$ additive shares of $x_i$ ($m$ can be reduced by almost a factor of two as explained in Section 5.1), i.e., it generates $m - 1$ independent uniformly random elements from $\mathbb{Z}_q$, denoted $y_{i,1}, \ldots, y_{i,m-1}$, and then computes $y_{i,m} = x_i - \sum_{j=1}^{m-1} y_{i,j} \bmod q$. Party $i$ then submits each $y_{i,j}$ as a separate message to the shuffler. The shuffler then shuffles all $nm$ messages together and sends them on to the server, who adds up all the received messages modulo $q$ and obtains $\sum_{i=1}^n x_i \bmod q$ as required. This is $\sigma$-secure as stated in the lemma.
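The following Python sketch mirrors this description: each party splits its input into $m$ additive shares modulo $q$, the messages are shuffled, and the analyzer adds them up. It also computes, by exhaustive enumeration for toy parameters, the exact distribution of the analyzer's view (the multiset of messages) and the statistical distance between the views for two inputs with the same sum. Function names are hypothetical, and the toy parameters are far too small to give meaningful security; they only illustrate what the security definition measures.

```python
import itertools
import random
from collections import Counter
from fractions import Fraction
from typing import Dict, List, Sequence, Tuple


def share(x: int, m: int, q: int, rng: random.Random) -> List[int]:
    """Split x in Z_q into m additive shares: m - 1 uniform values plus a correction."""
    shares = [rng.randrange(q) for _ in range(m - 1)]
    shares.append((x - sum(shares)) % q)
    return shares


def secure_sum(xs: Sequence[int], m: int, q: int, rng: random.Random) -> int:
    """One run of the protocol: share, shuffle all n*m messages, add them up mod q."""
    messages = [y for x in xs for y in share(x, m, q, rng)]
    rng.shuffle(messages)  # the shuffler hides which party sent which share
    return sum(messages) % q  # the analyzer


def view_distribution(xs: Sequence[int], m: int, q: int) -> Dict[Tuple[int, ...], Fraction]:
    """Exact distribution of the analyzer's view (sorted multiset of all messages)."""
    n = len(xs)
    counts: Counter = Counter()
    for coins in itertools.product(range(q), repeat=(m - 1) * n):
        view: List[int] = []
        for i, x in enumerate(xs):
            r = list(coins[i * (m - 1):(i + 1) * (m - 1)])
            view.extend(r + [(x - sum(r)) % q])
        counts[tuple(sorted(view))] += 1
    total = q ** ((m - 1) * n)
    return {v: Fraction(c, total) for v, c in counts.items()}


def statistical_distance(d1, d2) -> Fraction:
    """Half the L1 distance between two finite distributions given as dicts."""
    support = set(d1) | set(d2)
    return sum(
        (abs(d1.get(z, Fraction(0)) - d2.get(z, Fraction(0))) for z in support),
        Fraction(0),
    ) / 2
```

For $n = 2$, $m = 2$, $q = 3$, the enumeration gives statistical distance $4/9$ between the views for inputs $(0, 0)$ and $(1, 2)$, which have the same sum, while inputs with different sums give distance $1$ because the analyzer recovers the sum. Driving the first quantity down to $2^{-\sigma}$ is what requires $m$ to grow.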

3 Distributed Noise Addition

Given that a communication-efficient protocol for secure exact integer summation in the shuffle model exists, we would now like to use it for private real summation. Intuitively, this task boils down to defining a local randomiser that takes a private value $x_i \in [0, 1]$ and outputs a privatized value $y_i$ in the discrete domain $\mathbb{Z}_q$ such that $\sum_i y_i$ is differentially private and can be post-processed to a good approximation of $\sum_i x_i$.

A simple solution is to have a designated party add the noise required in the curator model. This is, however, not a satisfying solution, as it does not withstand collusions and/or dropouts. To address this, Shi et al. [7] proposed a solution where each party adds enough noise to provide $\epsilon$-differential privacy in the curator model with a small probability depending on $n$ and $\delta$, which results in an $(\epsilon, \delta)$-differentially private protocol. However, one can do strictly better: the total noise can be reduced if each party adds a discrete random variable such that the sum of the contributions is exactly enough to provide $\epsilon$-differential privacy, and this also results in pure differential privacy. A discrete random variable with this property is provided in [4], where it is shown that a discrete Laplace random variable can be expressed as the sum of $n$ differences of two Pólya random variables (the Pólya distribution is a generalization of the negative binomial distribution; $\mathrm{Pol}(r, \alpha)$ has probability mass function $\frac{\Gamma(k + r)}{k!\,\Gamma(r)}(1 - \alpha)^r \alpha^k$ for $k \in \{0, 1, 2, \ldots\}$). Concretely, if $X_1, \ldots, X_n$ and $Y_1, \ldots, Y_n$ are independent $\mathrm{Pol}(1/n, \alpha)$ random variables, then $\sum_{i=1}^n (X_i - Y_i)$ has a discrete Laplace distribution, i.e. $\sum_{i=1}^n (X_i - Y_i) \sim \mathrm{DLap}(\alpha)$. This allows us to distribute the discrete Laplace mechanism, which is what we shall do in our protocol presented in the next section.
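The decomposition can be checked by simulation. The sketch below samples $\mathrm{Pol}(r, \alpha)$ as a gamma-Poisson mixture (a Poisson variable whose rate is drawn from a Gamma distribution with shape $r$ and scale $\alpha/(1-\alpha)$), a standard way to obtain a negative binomial variable with non-integer $r$; this sampling method is our choice and is not prescribed by [4]. Each of the $n$ parties contributes $X_i - Y_i$ with $X_i, Y_i \sim \mathrm{Pol}(1/n, \alpha)$, and the total can be compared with $\mathrm{DLap}(\alpha)$, whose variance is $2\alpha/(1-\alpha)^2$ and whose mass at zero is $(1-\alpha)/(1+\alpha)$. Helper names are hypothetical.

```python
import random


def sample_poisson(lam: float, rng: random.Random) -> int:
    """Poisson(lam): count arrivals of a unit-rate Poisson process in [0, lam]."""
    count = 0
    t = rng.expovariate(1.0)
    while t <= lam:
        count += 1
        t += rng.expovariate(1.0)
    return count


def sample_polya(r: float, alpha: float, rng: random.Random) -> int:
    """Pol(r, alpha), P[k] proportional to Gamma(k + r) / k! * alpha**k, via gamma-Poisson mixing."""
    lam = rng.gammavariate(r, alpha / (1.0 - alpha))  # shape r, scale alpha / (1 - alpha)
    return sample_poisson(lam, rng)


def party_noise(n: int, alpha: float, rng: random.Random) -> int:
    """One party's share of the noise: difference of two independent Pol(1/n, alpha) draws."""
    return sample_polya(1.0 / n, alpha, rng) - sample_polya(1.0 / n, alpha, rng)


def total_noise(n: int, alpha: float, rng: random.Random) -> int:
    """Sum of the n parties' contributions, which should be distributed as DLap(alpha)."""
    return sum(party_noise(n, alpha, rng) for _ in range(n))
```

Counting exponential arrivals avoids the underflow of $e^{-\lambda}$ that the textbook Poisson sampler suffers for large rates, at the cost of time linear in $\lambda$.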

4 Private Summation

In this section we prove a lemma which says that, given a secure integer summation protocol, we can construct a differentially private real summation protocol. We then combine this lemma with Lemma 2.1 to derive an explicit protocol for differentially private real summation.

Lemma 4.1.

Given a -secure protocol in the shuffle model for -party summation in , for any , with communication per party, there exists an

-differentially private protocol in the shuffle model for real summation with standard error

and communication bounded by .

Proof.

Let be . We will exhibit the resulting protocol , with and , with defined as follows. executes with , and thus . is the result of first computing a fixed-point encoding of the input with precision , and then adding noise . decodes by returning if , and otherwise. This addresses a potential underflow of the sum in . To see that has error , note that it has the accuracy of the discrete Laplace mechanism when adding integers, except when the total noise added has magnitude greater than , in which case we may incur additional error, but this only happens with probability . Hence, the error of this protocol is bounded by .

To show that this protocol is private we will compare the mechanism to another mechanism (which can be considered to be computed in the curator model) which is -differentially private and such that for all , from which the result follows by Lemma 1.2.

is defined to be the result of the following procedure. First apply to each input , then take the sum and then output the result of with first input and all other inputs .

Note that , and that the sensitivity of is . It follows that is -differentially private and thus, by the post-processing property, so is .

It remains to show that , which we will do by demonstrating the existence of a coupling. First let the noise added to input by be the same in both mechanisms and note that this results in the inputs to within and the inputs to within having the same sum. It then follows immediately from Lemma 2.1 that these two instantiations of can be coupled to have identical outputs except with probability , as required. ∎

The choice of the precision was made so that the error due to discretization is of the same order as the error due to the added noise; this recovers the same order of error as in the curator model. Taking a larger precision results in the leading term of the total error matching the curator model, at the cost of a small constant-factor increase in communication.
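The trade-off can be made concrete with a short calculation. Assume the estimator $\frac{1}{p}\bigl(\sum_i \bar{x}_i + N\bigr)$ with $N \sim \mathrm{DLap}(\alpha)$ and $\alpha = e^{-\epsilon/p}$ (the sum of the $\bar{x}_i$ has sensitivity $p$), and ignore the rare wrap-around modulo $q$. By Lemma 1.1 and independence its variance is at most $\frac{1}{p^2}\bigl(\frac{2\alpha}{(1-\alpha)^2} + \frac{n}{4}\bigr)$. The hypothetical helper below evaluates the two terms separately: with $p = \lceil\sqrt{n}\rceil$ both are constants (the noise term is close to $2/\epsilon^2$, the variance of the curator-model Laplace mechanism with sensitivity 1), while a larger $p$ makes the discretization term negligible.

```python
import math
from typing import Tuple


def error_terms(n: int, eps: float, p: int) -> Tuple[float, float]:
    """Variance contributions (noise, discretization) to the estimate of sum(x_i).

    Assumes alpha = exp(-eps / p) and no wrap-around modulo q (hypothetical helper).
    """
    alpha = math.exp(-eps / p)
    noise = 2.0 * alpha / (1.0 - alpha) ** 2 / p ** 2  # Var[DLap(alpha)] rescaled by 1/p^2
    discretization = n / (4.0 * p ** 2)  # n parties, each Var[x_bar_i] <= 1/4, rescaled
    return noise, discretization


if __name__ == "__main__":
    n, eps = 10_000, 1.0
    for p in (math.ceil(math.sqrt(n)), n):
        print(p, error_terms(n, eps, p))
```

With $n = 10^4$ and $\epsilon = 1$, $p = 100$ gives a noise term of about $2$ and a discretization term of $0.25$, whereas $p = n$ keeps the noise term at about $2$ and shrinks the discretization term to $2.5 \cdot 10^{-5}$, while $\log_2 q$ (and hence the message size) grows only by a constant factor.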


Combining Lemmas 2.1 and 4.1 we can conclude the following theorem.

Theorem 4.1.

There exists an -differentially private protocol in the shuffle model for real summation with error and messages per party, each of length bits.

Such a protocol can be constructed from the proofs of these lemmas and is given explicitly by taking the local randomiser given in the local randomiser algorithm (algo:locrand) and the analyzer given in the aggregation algorithm (algo:agg), with parameters , , and . This results in a mean squared error of

and communication of bits per party. In Section 5.1 we explain how the choice of and thus the required communication can actually be reduced by almost a factor of two.
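Since the algorithm listings are not reproduced here, the following is only a hedged reconstruction of the overall protocol from the description in the proof of Lemma 4.1: randomized rounding, a per-party share of discrete Laplace noise as in Section 3, then the additive sharing of Section 2. It rests on explicit assumptions: precision $p = \lceil\sqrt{n}\rceil$, $\alpha = e^{-\epsilon/p}$, caller-chosen $q$ and $m$, and a hypothetical decoding rule that maps the sum modulo $q$ to its representative in a window around $[0, np]$ to undo a possible underflow. The paper's own values of $q$ and $m$ are not restated, and the values in the test are illustrative only and carry no security guarantee.

```python
import math
import random
from typing import List, Sequence


def sample_polya(r: float, alpha: float, rng: random.Random) -> int:
    """Pol(r, alpha) via a gamma-Poisson mixture (Poisson counted from exponential arrivals)."""
    lam = rng.gammavariate(r, alpha / (1.0 - alpha))
    count, t = 0, rng.expovariate(1.0)
    while t <= lam:
        count, t = count + 1, t + rng.expovariate(1.0)
    return count


def local_randomizer(x: float, n: int, eps: float, p: int, q: int, m: int,
                     rng: random.Random) -> List[int]:
    """Hypothetical local randomizer: encode, add a share of DLap noise, split into m shares."""
    scaled = x * p
    base = math.floor(scaled)
    x_bar = base + (1 if rng.random() < scaled - base else 0)  # randomized rounding
    alpha = math.exp(-eps / p)  # assumed: the sum of the x_bar has sensitivity p
    noisy = x_bar + sample_polya(1.0 / n, alpha, rng) - sample_polya(1.0 / n, alpha, rng)
    shares = [rng.randrange(q) for _ in range(m - 1)]
    shares.append((noisy - sum(shares)) % q)  # additive sharing modulo q
    return shares


def analyzer(messages: Sequence[int], n: int, p: int, q: int) -> float:
    """Hypothetical analyzer: add mod q, undo a possible underflow, rescale by 1/p."""
    z = sum(messages) % q
    if z > n * p + (q - n * p) // 2:  # assumed decoding window around [0, n*p]
        z -= q  # the noisy sum was negative and wrapped around
    return z / p


def run(xs: Sequence[float], eps: float, q: int, m: int, rng: random.Random) -> float:
    """One end-to-end execution with the shuffler modelled by random.shuffle."""
    n = len(xs)
    p = math.ceil(math.sqrt(n))
    messages = [y for x in xs for y in local_randomizer(x, n, eps, p, q, m, rng)]
    rng.shuffle(messages)
    return analyzer(messages, n, p, q)
```

For instance, with $n = 100$ parties all holding $0$, about half of the runs produce a negative noisy sum that wraps around modulo $q$; the decoding step turns these back into small negative estimates rather than values close to $q/p$.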

5 Summation by Anonymity

In this section we provide a proof of Lemma 2.1. All the ideas for the proof are provided in [6], but we reproduce the proof here, keeping track of constants to facilitate setting the parameters of the protocol. The following definition and lemma from [5] are fundamental to why this protocol is secure.

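Let $\mathcal{H}$ be a family of functions mapping $A$ to $B$. We say $\mathcal{H}$ is universal, or a universal family of hash functions, if, for $h$ selected uniformly at random from $\mathcal{H}$, for every $a, a' \in A$ with $a \neq a'$,

$$\Pr[h(a) = h(a')] \le \frac{1}{|B|}.$$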

Lemma 5.1 (Leftover Hash Lemma (special case)).

Let , , and let be a universal family of hash functions mapping bits to bits. Let , and be chosen independently uniformly at random from , and respectively. Then

To begin with we consider the case of securely adding two uniformly random inputs . Recall that is the protocol of the statement of the lemma, and let be shorthand for , i.e. the view of the analyzer in an execution of protocol with inputs . We write for and for . Finally let be an independent uniformly random element of .

Lemma 5.2.

Suppose . Then, .

Proof.

For and let . are a universal family of hash functions from to . Let be an independent uniformly random element of . Note that has the same distribution as , which follows from the intuition that corresponds to random numbers shuffled together, and can be obtained by adding up of them, and letting be the sum of the rest.

The result now follows immediately from the fact that the Leftover Hash Lemma implies that . ∎
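The symbols in this proof were lost, but the construction it describes (random elements of $\mathbb{Z}_q$, with one input recovered by adding up a subset of them) suggests the subset-sum family $h_{\vec{a}}(\vec{s}) = \sum_i a_i s_i \bmod q$, indexed by a uniformly random $\vec{a} \in \mathbb{Z}_q^k$ and applied to indicator vectors $\vec{s} \in \{0,1\}^k$. Treating that family as an assumption, the brute-force sketch below checks that it is universal for small $k$ and $q$. Any two distinct indicator vectors collide with probability exactly $1/q$, because their difference has a coordinate equal to $\pm 1$ whose coefficient $a_i$ is uniform. Function names are hypothetical.

```python
import itertools
from fractions import Fraction
from typing import Sequence


def subset_sum_hash(a: Sequence[int], s: Sequence[int], q: int) -> int:
    """h_a(s) = sum_i a_i * s_i mod q (assumed form of the hash family)."""
    return sum(ai * si for ai, si in zip(a, s)) % q


def collision_probability(s: Sequence[int], t: Sequence[int], q: int) -> Fraction:
    """Exact Pr over uniform a in Z_q^k that h_a(s) == h_a(t)."""
    k = len(s)
    hits = sum(
        subset_sum_hash(a, s, q) == subset_sum_hash(a, t, q)
        for a in itertools.product(range(q), repeat=k)
    )
    return Fraction(hits, q ** k)


def is_universal(k: int, q: int) -> bool:
    """Check Pr[h(s) = h(t)] <= 1/q for every pair of distinct s, t in {0, 1}^k."""
    vectors = list(itertools.product((0, 1), repeat=k))
    return all(
        collision_probability(s, t, q) <= Fraction(1, q)
        for s, t in itertools.combinations(vectors, 2)
    )
```

The check enumerates all $q^k$ hash keys, so it is only feasible for tiny parameters; the general argument is the one-line observation above.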

Now we can use this to solve the case of two arbitrary inputs.

Lemma 5.3.

If satisfy , then we have

Proof.

Markov’s inequality provides that

and thus by the triangle inequality

Note that

and so for every and we have

Combining the last two inequalities gives the result. ∎

Combining these two lemmas gives that, for such that ,

(1)

from which the following lemma is immediate.

Lemma 5.4.

If and such that then

We will now generalize to the case of -party summation.

Proof of Lemma 2.1.

Let $\vec{x}, \vec{x}'$ be two distinct possible inputs to the protocol. We say that they are related by a basic step if they have the same sum and differ in only two entries. It is easy to see that any two distinct inputs with the same sum are connected by a chain of at most $n - 1$ basic steps. We will show that, for a suitable choice of $m$, if $\vec{x}$ and $\vec{x}'$ are related by a basic step then

(2)

from which the lemma follows by the triangle inequality for statistical distance.

Let $\vec{x}$ and $\vec{x}'$ be related by a basic step and suppose w.l.o.g. that they differ in the first two co-ordinates. With $m$ chosen as above, by Lemma 5.4 we can couple the values sent by the first two parties on input $\vec{x}$ with the values they send on input $\vec{x}'$ so that they match except with probability at most the right-hand side of (2). Independently of that, we can couple the messages of the other parties so that they always match, as each of them has the same input in both cases. This gives a coupling exhibiting that Equation (2) holds. ∎
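One way to see the bound on the number of basic steps is the following greedy chain: for $i = 1, \ldots, n - 1$, set coordinate $i$ to its target value and compensate in coordinate $i + 1$, so that the sum modulo $q$ is preserved. Once the first $n - 1$ coordinates agree, the last one agrees as well because the sums are equal. The Python sketch below (hypothetical function name, inputs assumed to lie in $\{0, \ldots, q - 1\}$) builds such a chain.

```python
from typing import List, Sequence


def basic_step_chain(x: Sequence[int], x_target: Sequence[int], q: int) -> List[List[int]]:
    """Return x = v_0, v_1, ..., v_k = x_target with k <= n - 1, where consecutive
    vectors differ in exactly two entries and all have the same sum modulo q."""
    if len(x) != len(x_target):
        raise ValueError("inputs must have the same length")
    if sum(x) % q != sum(x_target) % q:
        raise ValueError("inputs must have the same sum modulo q")
    current = [v % q for v in x]
    chain = [list(current)]
    for i in range(len(x) - 1):
        delta = (x_target[i] - current[i]) % q
        if delta == 0:
            continue  # coordinate i already matches; no step needed
        current[i] = (current[i] + delta) % q  # fix coordinate i
        current[i + 1] = (current[i + 1] - delta) % q  # compensate to keep the sum
        chain.append(list(current))
    return chain
```

Each step changes coordinate $i$ by a nonzero amount and coordinate $i + 1$ by its negation, so consecutive inputs differ in exactly two entries, as a basic step requires.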

Remark 5.1.

It may seem counterintuitive to require more messages the more parties there are (for fixed $q$ and $\sigma$). The addition of the $n$-dependent term to $m$ is necessary for the proof of Lemma 2.1. This is because we are trying to stop the adversary from learning a greater variety of things when there are more parties. However, Theorem 4.1 might follow from a weaker guarantee than the one provided by Lemma 2.1, and such a property might hold without this term.

It is an open problem to prove a lower bound greater than two on the number of messages required to get error on real summation. A proof that one message is not enough is given in [1].

5.1 Improving the Constants

The constants implied by this proof can be improved by using a sharper bound for in inequality 1. Using the bound gives that taking to be the ceiling of the root of

suffices in the statement of Lemma 5.4. The resulting value of is

Adding to the root before taking the ceiling gives a value of for which Lemma 2.1 holds.

References

  • [1] Borja Balle, James Bell, Adrià Gascón, and Kobbi Nissim. The privacy blanket of the shuffle model. CoRR, abs/1903.02837, 2019.
  • [2] Albert Cheu, Adam D. Smith, Jonathan Ullman, David Zeber, and Maxim Zhilyaev. Distributed differential privacy via shuffling. In Advances in Cryptology - EUROCRYPT 2019, 2019.
  • [3] Úlfar Erlingsson, Vitaly Feldman, Ilya Mironov, Ananth Raghunathan, Kunal Talwar, and Abhradeep Thakurta. Amplification by shuffling: From local to central differential privacy via anonymity. In Proceedings of the Thirtieth Annual ACM-SIAM Symposium on Discrete Algorithms, pages 2468–2479. SIAM, 2019.
  • [4] S. Goryczka and L. Xiong. A comprehensive comparison of multiparty secure additions with differential privacy. IEEE Transactions on Dependable and Secure Computing, 14(5):463–477, Sep. 2017.
  • [5] Russell Impagliazzo and David Zuckerman. How to recycle random bits. In Proc. 30th FOCS, 1989.
  • [6] Yuval Ishai, Eyal Kushilevitz, Rafail Ostrovsky, and Amit Sahai. Cryptography from anonymity. In FOCS, pages 239–248. IEEE Computer Society, 2006.
  • [7] Elaine Shi, Richard Chow, T.-H. Hubert Chan, Dawn Song, and Eleanor Rieffel. Privacy-preserving aggregation of time-series data. In NDSS, 2011.
  • [8] Yu-Xiang Wang, Stephen E. Fienberg, and Alexander J. Smola. Privacy for free: Posterior sampling and stochastic gradient Monte Carlo. In ICML, volume 37 of JMLR Workshop and Conference Proceedings, pages 2493–2502. JMLR.org, 2015.