Scalable and Differentially Private Distributed Aggregation in the Shuffled Model

06/19/2019 ∙ by Badih Ghazi, et al. ∙ Google ∙ IT University of Copenhagen

Federated learning promises to make machine learning feasible on distributed, private datasets by implementing gradient descent using secure aggregation methods. The idea is to compute a global weight update without revealing the contributions of individual users. Current practical protocols for secure aggregation work in an "honest but curious" setting, where a curious adversary observing all communication to and from the server cannot learn any private information as long as the server is honest and follows the protocol. A more scalable and robust primitive for privacy-preserving protocols is shuffling of user data, so as to hide the origin of each data item. Highly scalable and secure protocols for shuffling, so-called mixnets, have been proposed as a primitive for privacy-preserving analytics in the Encode-Shuffle-Analyze framework by Bittau et al. Recent papers by Cheu et al. and Balle et al. have formalized the "shuffled model" and suggested protocols for secure aggregation that achieve differential privacy guarantees. Their protocols come at a cost, though: either the expected aggregation error or the amount of communication per user scales as a polynomial n^Ω(1) in the number of users n. In this paper we propose a simple and more efficient protocol for aggregation in the shuffled model, where communication as well as error increases only polylogarithmically in n. Our new technique is a conceptual "invisibility cloak" that makes users' data almost indistinguishable from random noise while introducing zero distortion on the sum.


1 Introduction

We consider the problem of privately summing numbers in the shuffled model recently defined by Cheu et al. cheu19 . For consistency with the literature we will use the term aggregation for the sum operation. Consider n users with data x_1, …, x_n ∈ [0,1]. In the shuffled model, user i applies a randomized encoder algorithm E that maps x_i to a multiset of m messages, where m is a parameter. Then a trusted shuffler S takes all nm messages and outputs them in random order. Finally, an analyzer algorithm A maps the shuffled output S(E(x_1), …, E(x_n)) to an estimate of the sum ∑_{i=1}^n x_i.
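To make the roles of encoder, shuffler, and analyzer concrete, here is a minimal Python sketch of the pipeline. This is our own illustration: the function names and the trivial pass-through encoder are not from the paper, and the pass-through encoder provides no privacy.

```python
import random

def shuffler(messages):
    # The trusted shuffler outputs the union of all users' messages in a
    # uniformly random order, hiding which user sent which message.
    shuffled = list(messages)
    random.shuffle(shuffled)
    return shuffled

def run_protocol(inputs, encoder, analyzer):
    # Each user i runs the randomized encoder on x_i, producing a multiset
    # of messages; the analyzer sees only the shuffled union of all messages.
    messages = [msg for x in inputs for msg in encoder(x)]
    return analyzer(shuffler(messages))

# With a trivial (non-private) encoder that sends x itself, the analyzer
# recovers the exact sum; the paper's encoder is given in Section 1.3.
print(run_protocol([0.2, 0.5, 0.3], lambda x: [x], sum))  # approximately 1.0
```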

A protocol in the shuffled model is (ε, δ)-differentially private if the output S(E(x_1), …, E(x_n)) of the shuffler is (ε, δ)-differentially private (see the definition in Section 2.1), where probabilities are with respect to the random choices made in the encoder algorithm E and the shuffler S. The privacy claim is justified by the existence of highly scalable protocols for privately implementing the shuffling primitive bittau17 ; cheu19 .

Two protocols for aggregation in the shuffled model were recently suggested by Balle et al. balle19 and Cheu et al. cheu19 . We discuss these further in Section 1.2, but note here that all previously known protocols have either communication or error that grows as n^Ω(1). This is unavoidable for single-message protocols, by the lower bound in balle19 , but it has been unclear whether such a trade-off is necessary in general. Cheu et al. cheu19 explicitly mention it as an open problem to investigate this question.

1.1 Our Results

We show that such a trade-off is not necessary: it is possible to avoid the n^Ω(1) factor in both the error bound and the amount of communication per user. The precise results obtained depend on the notion of “neighboring dataset” in the definition of differential privacy. We consider the standard notion of neighboring dataset in differential privacy, namely that the input of a single user is changed, and show:

Theorem 1.

Let ε and δ be any real numbers with 0 < ε, δ < 1. There exists a protocol in the shuffled model that is (ε, δ)-differentially private under single-user changes and whose expected error, number of messages per user, and message size all grow only polylogarithmically in the number of users n (for fixed ε and δ).

We also consider a different notion, similar to the gold standard of secure multi-party computation: two datasets are considered neighboring if their sums (taken after discretization) are identical. This notion turns out to allow much better privacy, even with zero noise in the final sum; the only error in the protocol comes from representing the terms of the sum in bounded precision.
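For illustration (with a hypothetical precision p = 100 of our own choosing), two datasets can differ in every coordinate and still be neighboring under this notion:

```python
p = 100  # hypothetical precision: inputs are rounded down to multiples of 1/p

def rounded_sum(xs):
    # Sum after discretization, i.e. the sum of floor(p * x) over all users.
    return sum(int(x * p) for x in xs)

x  = [0.25, 0.50, 0.25]
xp = [0.10, 0.40, 0.50]   # all three users changed their inputs...
assert rounded_sum(x) == rounded_sum(xp)   # ...but the rounded sums agree,
# so x and xp are neighboring under sum-preserving changes.
```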

Theorem 2.

Let ε and δ be any real numbers with 0 < ε, δ < 1, and let p be an integer precision parameter. There exists a protocol in the shuffled model that is (ε, δ)-differentially private under sum-preserving changes, has worst-case error n/p (the error due to rounding each input to precision 1/p), and whose number of messages per user and message size grow only polylogarithmically in n (for fixed ε and δ).

In addition to analyzing the error and privacy of our new protocol, we consider its resilience to untrusted users who may deviate from the protocol. While the shuffled model is vulnerable to such attacks in general balle19 , we argue in Section 2.5 that the privacy guarantees of our protocol are robust even against a large fraction of colluding users. For reasons of exposition we show Theorem 2 before Theorem 1. The technical ideas behind our new results are discussed in Section 1.3. Next, we discuss implications for machine learning and the relation to previous work.

1.2 Discussion of Related Work and Applications

Our protocol is applicable in any setting where secure aggregation is used. Below we mention some of the most significant examples and compare with existing results in the literature.

Federated Learning.

Our main application in a machine learning context is gradient descent-based federated learning mcmahan2016communication . The idea is to avoid collecting user data, and instead compute weight updates in a distributed manner by sending model parameters to users, locally running stochastic gradient descent on private data, and aggregating model updates over all users. Using a secure aggregation protocol (see e.g. practicalSecAgg ) guards against information leakage from the update of a single user, since the server only learns the aggregated model update. A federated learning system based on these principles is currently used by Google to train neural networks on data residing on users’ phones GoogleBlog17 .

Current practical secure aggregation protocols, such as that of Bonawitz et al. practicalSecAgg , have per-user computation and communication costs that grow with the number of users n, and hence total communication that grows quadratically in n. This limits the number of users that can participate in the secure aggregation protocol. In addition, the privacy analysis assumes an “honest but curious” server that does not deviate from the protocol, so some level of trust in the secure aggregation server is required. In contrast, protocols based on shuffling work with much weaker assumptions on the server bittau17 ; cheu19 . In addition to this advantage, the total work and communication of our new protocol scale near-linearly with the number of users.

Differentially Private Aggregation in the Shuffled Model.

It is known that gradient descent can work well even if data is accessible only in noised form in order to achieve differential privacy abadi2016deep . Note that in order to run gradient descent in a differentially private manner, the privacy parameters need to be chosen in such a way that the combined privacy loss over many iterations is limited.
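As a back-of-the-envelope illustration of this point, using only the basic composition theorem (which simply adds up privacy parameters; the numbers below are made up):

```python
# Basic composition: running T steps, each (eps, delta)-differentially
# private, yields (T * eps, T * delta)-differential privacy overall.
# (Advanced composition theorems give tighter bounds.)
T, eps_step, delta_step = 1000, 0.01, 1e-9
eps_total, delta_total = T * eps_step, T * delta_step
print(eps_total, delta_total)   # 10.0, 1e-06
```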

Reference            | #messages per user | Message size | Expected error       | Privacy protection
Cheu et al. cheu19   | ε√n                | 1            | √(log(1/δ))/ε        | Single-user change
Balle et al. balle19 | 1                  | log n        | n^{1/6}              | Single-user change
New                  | polylog(n)         | polylog(n)   | polylog(n)           | Single-user change
New                  | polylog(n)         | polylog(n)   | n/p (rounding only)  | Sum-preserving change
Figure 1: Comparison of differentially private aggregation protocols in the shuffled model with (ε, δ)-differential privacy. The number of users is n, and p is an integer precision parameter. Message sizes are in bits; asymptotic notation is suppressed for readability. We consider two types of privacy protection, corresponding to different notions of “neighboring dataset” in differential privacy: in the first one, which was considered in previous papers, datasets are considered neighboring if they differ in the data of a single user. In the latter, datasets are considered neighboring if they have the same sum.

Each aggregation protocol shown in Figure 1 represents a different trade-off, optimizing different parameters. Our protocols are the only ones that avoid n^Ω(1) factors in both the communication per user and the error.

Private Sketching and Statistical Learning.

At first glance it may seem that aggregation is a rather weak primitive for combining data from many sources in order to analyze it. However, research in the area of data stream algorithms has uncovered many non-trivial algorithms that are small linear sketches, see e.g. cormode2011synopses ; woodruff2014sketching . Linear sketches over the integers (or over a finite field) can be implemented using secure aggregation by computing linear sketches locally and summing them up over some range that is large enough to hold the sum. This unlocks many differentially private protocols in the shuffled model, e.g., estimation of ℓ_p-norms, quantiles, heavy hitters, and the number of distinct elements.

Second, as observed in cheu19 , we can translate any statistical query over a distributed data set to an aggregation problem over numbers in [0,1]. That is, every learning problem solvable using a small number of statistical queries kearns1998efficient can be solved privately and efficiently in the shuffled model.
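A sketch of this reduction (our own toy example): a statistical query is just the average of a [0,1]-valued predicate over the dataset, so each user feeds its predicate value into the aggregation protocol. Here `private_sum` stands in for any private aggregation protocol; the plain `sum` used below provides no privacy.

```python
def statistical_query(inputs, predicate, private_sum):
    # Each user locally evaluates the [0, 1]-valued predicate on its data;
    # the aggregation protocol sums these values, and dividing by n
    # answers the statistical query.
    values = [predicate(x) for x in inputs]
    return private_sum(values) / len(values)

# Example: fraction of users whose (hypothetical) ages exceed a threshold.
ages = [23, 35, 61, 47]
print(statistical_query(ages, lambda a: 1.0 if a >= 40 else 0.0, sum))  # 0.5
```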

1.3 Invisibility Cloak Protocol

We use a technique from protocols for secure multi-party aggregation (see e.g. secAggSurvey ): ensure that the individual numbers passed to the analyzer are fully random by adding random noise terms, but coordinate the noise such that all noise terms cancel and the sum remains the same as the sum of the original data. Our new insight is that in the shuffled model the addition of zero-sum noise can be done without coordination between the users. Instead, each user individually produces m numbers that are fully random except that they sum to (a scaled and rounded version of) its input x_i, and passes them to the shuffler. This is visualized in Figure 2. Conceptually, the noise we introduce acts as an invisibility cloak: the data is still there, and can still be aggregated, but it is almost impossible to extract any other information from it.

Figure 2: Diagram of the Invisibility Cloak Protocol for secure multi-party aggregation
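A minimal sketch of this idea in Python (the function name `share` and the parameter names m and q are ours, not the paper's): each user splits its value into m additive shares modulo q, so that any m - 1 of the shares are uniformly random, yet all users' shares together still sum to the true total.

```python
import random

def share(value, m, q):
    # Split `value` (an element of Z_q) into m additive shares modulo q.
    # Any m - 1 of the shares are independent and uniformly random; only
    # the sum of all m shares carries information.
    shares = [random.randrange(q) for _ in range(m - 1)]
    shares.append((value - sum(shares)) % q)
    return shares

# The zero-sum noise cancels without any coordination between users:
q, m = 2**20 + 1, 5
values = [123, 456, 789]
messages = [s for v in values for s in share(v, m, q)]
assert sum(messages) % q == sum(values) % q
```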

The details of our encoder are given as Algorithm 1. For parameters m, p, and q to be specified later, it converts each input x_i into a multiset of m random values in Z_q whose sum, up to scaling and rounding, equals x_i. When the output of all encoders is composed with a shuffler, this directly gives differential privacy with respect to sum-preserving changes of data (where the sum is considered after rounding). To achieve differential privacy with respect to single-user changes, the protocol must be combined with a pre-randomizer that adds noise to each x_i with some probability; see the discussion in Section 2.4.

Our analyzer is given as Algorithm 2. It computes z as the sum of the inputs (received from the shuffler) modulo q, which by definition of the encoder is guaranteed to equal the sum of the scaled, rounded inputs modulo q. If no pre-randomization noise is present, this sum lies in {0, …, np}, and the output z/p is within n/p of the true sum ∑_i x_i. In the setting where a pre-randomizer adds noise to some inputs, however, we may have z > np, in which case we round to the nearest feasible output sum, 0 or n.

Privacy Intuition.

The output of each encoder is very close to fully random in the sense that any m - 1 of the m values it outputs are independent and uniformly random. Only by summing all m outputs of an encoder (or of several encoders) do we get a value that is not uniformly random. On the other hand, many size-m subsets of the shuffled messages look like the output of an encoder, in the sense that the sum of their elements corresponds to a feasible value x̄. In fact, something stronger is true: for every possible input with the same sum as the true input (sum taken after scaling and rounding), we can, with high probability, find a splitting of the shuffler’s output consistent with that input. Furthermore, the number of such splittings is about the same for each potential input.
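The following small experiment (ours, not from the paper; parameters chosen only to keep it cheap) illustrates the point: the sums of random size-m subsets of the shuffled messages spread out over essentially all of Z_q, so a subset whose sum is a feasible value does not identify an actual encoder output.

```python
import random
from collections import Counter

def share(value, m, q):   # additive-sharing encoder, as in Figure 2
    shares = [random.randrange(q) for _ in range(m - 1)]
    shares.append((value - sum(shares)) % q)
    return shares

q, m = 101, 4   # toy parameters
messages = [s for v in (17, 42, 63) for s in share(v, m, q)]

# Tally the sums of many random size-m subsets of the shuffled messages.
tally = Counter(sum(random.sample(messages, m)) % q for _ in range(100_000))
# The tallies spread over essentially all of Z_q: a size-m subset whose sum
# is feasible does not identify an actual encoder output.
```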

Our technique can be compared to the recently proposed “privacy blanket” balle19 , which introduces uniform, random noise to replace some inputs. Since that paper operates in a single-message model, there is no possibility of ensuring perfect noise cancellation, so the number of noise terms needs to be kept small, which in turn means that a rather coarse discretization is required for differential privacy. Since the noise we add is zero-sum, we can add much more noise, and thus we do not need a coarse discretization, ultimately resulting in much higher accuracy.

Encoder(x; m, p, q):
       Input: x ∈ [0,1], integer parameters m, p, q
       Output: Multiset {y_1, …, y_m} of elements of Z_q
       Let x̄ = ⌊x p⌋
       for j = 1, …, m - 1 do
              y_j ← uniformly random element of Z_q
       y_m = (x̄ - ∑_{j<m} y_j) mod q
       return {y_1, …, y_m}
Algorithm 1 Invisibility Cloak Encoder Algorithm
Analyzer(y_1, …, y_N):
       Input: y_1, …, y_N ∈ Z_q, integer parameters n, p, q with q > 2np, q odd
       Output: estimate of ∑_i x_i
       Let z = (∑_{j=1}^{N} y_j) mod q
       if z ≤ np then return z/p;
       else if z ≤ (np + q)/2 then return n;
       else return 0;
Algorithm 2 Analyzer
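Putting the pieces together, a toy end-to-end run might look as follows. This is a sketch under our assumed parameter names m, p, and q; the pre-randomizer needed for Theorem 1 is omitted, so without it the clamping branches of Algorithm 2 are never taken and the run below is only private with respect to sum-preserving changes.

```python
import random

def encoder(x, m, q, p):
    # Algorithm 1 (sketch): round x down to an integer multiple of 1/p,
    # then split the integer part into m additive shares modulo q.
    xbar = int(x * p)
    shares = [random.randrange(q) for _ in range(m - 1)]
    shares.append((xbar - sum(shares)) % q)
    return shares

def analyzer(messages, q, p):
    # Algorithm 2 (sketch): sum modulo q and rescale. Without the
    # pre-randomizer the sum always lands in [0, n*p], so the clamping
    # branches are omitted here.
    return (sum(messages) % q) / p

n, m, p = 100, 8, 10**6
q = 2 * n * p + 1            # an odd modulus large enough to avoid wrap-around
xs = [random.random() for _ in range(n)]
msgs = [s for x in xs for s in encoder(x, m, q, p)]
random.shuffle(msgs)         # the trusted shuffler
est = analyzer(msgs, q, p)
assert abs(est - sum(xs)) <= n / p   # all error comes from rounding
```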

2 Analysis

Overview.

We first consider privacy with respect to sum-preserving changes to the input, arguing that observing the output of the shuffler gives almost no information on the input, apart from the sum. Our proof strategy is to show privacy in the setting of two players and then argue that this implies privacy for n players, essentially because the two-player privacy holds regardless of the behavior of the other players. In the two-player case we first argue that, with high probability, the outputs of the encoders satisfy a smoothness condition saying that every potential input pair x′_1, x′_2 to the encoders corresponds to roughly the same number of divisions of the shuffler outputs into two sets of size m. Finally we argue that smoothness, in conjunction with all elements being unique, implies privacy.

2.1 Preliminaries

Notation.

We write x ∼ S to denote a value x sampled uniformly at random from a finite set S, and denote by Perm(I) the set of all permutations of a set I. Unless stated otherwise, sets in this paper will be multisets. It will be convenient to work with indexed multisets whose elements are identified by indices in some set I. We can represent a multiset A with index set I as a function from I to the universe of elements. Multisets A and B, with index sets I and J, are considered identical if there exists a bijection π : I → J such that A(i) = B(π(i)) for all i ∈ I. For disjoint I and J we define the union of A and B as the function defined on I ∪ J that maps i ∈ I to A(i) and j ∈ J to B(j).

Differential Privacy and the Shuffled Model.

We consider the established notion of differential privacy, formalizing that the output distribution of an algorithm does not differ much between a given dataset and any “neighboring” dataset.

Definition 1.

Let A be a randomized algorithm taking a dataset as input, and let ε > 0 and δ ∈ [0, 1) be given parameters. Then A is said to be (ε, δ)-differentially private if for all neighboring datasets D and D′ and for all subsets S of the image of A, it is the case that Pr[A(D) ∈ S] ≤ e^ε · Pr[A(D′) ∈ S] + δ, where the probability is over the randomness used by the algorithm A.

We consider two notions of “neighboring dataset”: 1) the input of a single user is changed, but all other inputs are the same, and 2) the sum of the user inputs is preserved. In the latter case we consider the sum after rounding each input down to the nearest multiple of 1/p, for a large integer parameter p; i.e., x′ is a neighbor of x if and only if ∑_i ⌊p x_i⌋ = ∑_i ⌊p x′_i⌋. (Alternatively, just assume that the input is discretized such that each p x_i is an integer.)

In the shuffled model, the algorithm that we want to show differentially private is the composition of the shuffler and the encoder algorithms run on the user inputs. In contrast to the local model of differential privacy, the outputs of individual encoders do not need to be differentially private. We refer to cheu19 for details.

2.2 Common lemmas

Consider some indexed multiset Y of 2m elements of Z_q that can possibly be obtained as the union of the outputs of two encoders. Further, let 𝒯 denote the collection of subsets of the index set of Y of size m. For each T ∈ 𝒯 define σ(T) = (∑_{i∈T} Y(i)) mod q. We will be interested in the following property of a given (fixed) multiset Y:

Definition 2.

A multiset Y is γ-smooth if the distribution of the values σ(T), for T chosen uniformly from 𝒯, is close to uniform in the sense that Pr_T[σ(T) = v] = (1 ± γ)/q for every v ∈ Z_q.

We write Y_γ for the collection of multisets that are γ-smooth and contain no duplicate elements.

Given x_1, x_2 ∈ [0,1] such that p x_1 and p x_2 are integers, consider the multisets E(x_1) and E(x_2), and let Y = E(x_1) ∪ E(x_2) be their multiset union. The multiset Y
is a random variable due to the random choices made by the encoder algorithm.
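To make Definition 2 concrete, here is a small experiment of our own (with toy parameters): enumerate all size-m index subsets of the union of two encoder outputs and measure the worst multiplicative deviation γ of the subset-sum distribution from the uniform distribution over Z_q.

```python
import random
from itertools import combinations
from collections import Counter

def share(value, m, q):   # encoder sketch from Section 1.3
    shares = [random.randrange(q) for _ in range(m - 1)]
    shares.append((value - sum(shares)) % q)
    return shares

m, q = 10, 97                           # toy parameters
Y = share(13, m, q) + share(55, m, q)   # union of two encoder outputs

# Enumerate all size-m subsets and compare the subset-sum distribution
# to the uniform distribution over Z_q.
counts = Counter(sum(T) % q for T in combinations(Y, m))
total = sum(counts.values())            # binomial(2m, m) subsets in all
gamma = max(abs(counts.get(v, 0) * q / total - 1) for v in range(q))
print(gamma)   # typically small: subset sums are spread almost uniformly
```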

Lemma 1.

For every γ ∈ (0,1) and for every choice of x_1 and x_2, provided N = |𝒯| ≥ 4q/γ, we have Pr[Y ∉ Y_γ] ≤ 2m²/q + 4q²/(γ²(N - 2)).

Proof of Lemma 1.

We first upper bound the probability that the multiset Y has any duplicate elements. For indices i ≠ j, consider the event that Y(i) = Y(j). Since m ≥ 3, we have that every pair of distinct values is uniform in Z_q and independent, so Pr[Y(i) = Y(j)] = 1/q. A union bound over all pairs yields an upper bound of (2m choose 2)/q ≤ 2m²/q on the probability that there is at least one duplicate pair.

Second, we bound the probability that Y is not γ-smooth. Let T_1 and T_2 denote the index sets of E(x_1) and E(x_2) inside Y. Then, by definition of the encoder, σ(T_1) = p x_1 mod q and σ(T_2) = p x_2 mod q with probability 1. For each T ∈ 𝒯 \ {T_1, T_2} we have that σ(T) is uniformly random in Z_q, over the randomness of the encoders. Furthermore, observe that the random variables σ(T), for T ∈ 𝒯 \ {T_1, T_2}, are pairwise independent. Let X_{T,v}
be the indicator random variable that is 1 if and only if
σ(T) = v. Let N = |𝒯|. For each T ∈ 𝒯 \ {T_1, T_2} and v ∈ Z_q we have E[X_{T,v}] = 1/q. The sum ∑_{T ∈ 𝒯} X_{T,v} equals the number of sets T ∈ 𝒯 such that σ(T) = v. Since σ(T_1) and σ(T_2) are fixed, it will be helpful to disregard these fixed terms. Thus we define W_v = ∑_{T ∈ 𝒯 \ {T_1, T_2}} X_{T,v}, which is a sum of N - 2 pairwise independent terms, each with expectation 1/q. We bound the variance of W_v:

Var(W_v) = E[(W_v - (N - 2)/q)²] = ∑_{T ∈ 𝒯 \ {T_1, T_2}} E[(X_{T,v} - 1/q)²] ≤ ∑_{T ∈ 𝒯 \ {T_1, T_2}} E[X_{T,v}] = (N - 2)/q.

The second equality uses that E[(X_{T,v} - 1/q)(X_{T′,v} - 1/q)] = 0 for T ≠ T′, because it is the expectation of a product of two independent, zero-mean random variables. The inequality holds because X_{T,v} is an indicator variable. By Chebychev’s inequality over the random choices in the encoder, for any t > 0:

Pr[|W_v - (N - 2)/q| ≥ t] ≤ (N - 2)/(q t²).   (1)

For each v ∈ Z_q we have E[W_v] = (N - 2)/q. Using this for upper and lower bounding in (1), and choosing t = γ(N - 2)/(2q), we get:

Pr[|W_v - (N - 2)/q| ≥ γ(N - 2)/(2q)] ≤ 4q/(γ²(N - 2)).

A union bound over all v ∈ Z_q implies that with probability at least 1 - 4q²/(γ²(N - 2)):

W_v = (1 ± γ/2)(N - 2)/q for every v ∈ Z_q.   (2)

Conditioned on (2) we have:

Pr_{T ∈ 𝒯}[σ(T) = v] ≤ (W_v + 2)/N ≤ ((1 + γ/2)(N - 2)/q + 2)/N ≤ (1 + γ)/q.

The final inequality uses the assumption that N ≥ 4q/γ. A similar computation shows that, conditioned on (2), Pr_{T ∈ 𝒯}[σ(T) = v] ≥ (1 - γ)/q. Adding the failure probabilities for duplicates and for (2) completes the proof. ∎

Corollary 1.

For every γ ∈ (0,1) and β ∈ (0,1), taking q ≥ 4m²/β and m = O(log(q/(γβ))) ensures Pr[Y ∉ Y_γ] ≤ β.

Proof.

We invoke Lemma 1 with these parameters. The probability bound is 2m²/q + 4q²/(γ²(N - 2)), where N = |𝒯| grows exponentially in m. Because the first term is at most β/2 by the choice of q, and the second term is at most β/2 by the choice of m, this shows the stated bound. ∎

Denote by e(x; y_1, …, y_{m-1}) the sequence obtained by the deterministic encoding in Algorithm 1 once the random values y_1, …, y_{m-1} are given. Moreover, we denote by Y(x; y_1, …, y_{m-1}) the corresponding multiset.

Lemma 2.

For any multiset Y with 2m distinct elements and for any x_1 and x_2 with p x_1 and p x_2 integers, it is the case that Pr[E(x_1) ∪ E(x_2) = Y] = ((m!)²/q^{2(m-1)}) · #{T ∈ 𝒯 : σ(T) = p x_1 mod q and σ(T^c) = p x_2 mod q}.

Proof of Lemma 2.

Using the fact that all the elements in Y are distinct, we have that every ordering of a size-m sub-multiset of Y with sum p x_1 mod q arises from exactly one choice of the random values y_1, …, y_{m-1}, so an encoder produces any given size-m multiset with the correct sum with probability m!/q^{m-1}. Multiplying the probabilities for the two (independent) encoders and summing over all valid splittings (T, T^c) gives the claim. ∎

2.3 Analysis of Privacy under Sum-Preserving Changes

Lemma 3.

For any Y ∈ Y_γ and for all x_1, x_2, x′_1, x′_2 that are integer multiples of 1/p and that satisfy x_1 + x_2 = x′_1 + x′_2, it is the case that Pr[E(x_1) ∪ E(x_2) = Y] ≤ ((1 + γ)/(1 - γ)) · Pr[E(x′_1) ∪ E(x′_2) = Y].

Proof of Lemma 3.

We denote by σ(Y) the sum of all elements in the multiset Y, taken modulo q. We define

N_1 = #{T ∈ 𝒯 : σ(T) = p x_1 mod q},   (3)

the number of size-m index subsets of Y whose elements sum to p x_1 modulo q. We similarly define N′_1 by replacing x_1 in (3) by x′_1.
Since all elements of Y are distinct, Lemma 2 implies that

Pr[E(x_1) ∪ E(x_2) = Y] = ((m!)²/q^{2(m-1)}) · N_1 if σ(Y) = p(x_1 + x_2) mod q, and 0 otherwise.   (4)

Similarly, we have that

Pr[E(x′_1) ∪ E(x′_2) = Y] = ((m!)²/q^{2(m-1)}) · N′_1 if σ(Y) = p(x′_1 + x′_2) mod q, and 0 otherwise.   (5)

Since Y is γ-smooth, Definition 2 implies that

(1 - γ)|𝒯|/q ≤ N_1, N′_1 ≤ (1 + γ)|𝒯|/q.   (6)

By Equations (4) and (5) and the assumption that x_1 + x_2 = x′_1 + x′_2 (as well as the assumption that x_1, x_2, x′_1, x′_2 are all integer multiples of 1/p), we get that for every γ-smooth Y whose sum is not equal to p(x_1 + x_2) mod q, it is the case that

Pr[E(x_1) ∪ E(x_2) = Y] = Pr[E(x′_1) ∪ E(x′_2) = Y] = 0,   (7)

and for every γ-smooth Y whose sum is equal to p(x_1 + x_2) mod q, the ratio of Equations (4) and (5) along with (6) gives that

Pr[E(x_1) ∪ E(x_2) = Y] / Pr[E(x′_1) ∪ E(x′_2) = Y] = N_1/N′_1 ≤ (1 + γ)/(1 - γ).   (8)

∎

Lemma 4.

Suppose (x_1, x_2) and (x′_1, x′_2) are integer multiples of 1/p satisfying x_1 + x_2 = x′_1 + x′_2, and suppose that Pr[E(x_1) ∪ E(x_2) ∉ Y_γ] ≤ β and Pr[E(x′_1) ∪ E(x′_2) ∉ Y_γ] ≤ β, where β ∈ (0, 1). Moreover, suppose that for any set S consisting of multisets from Y_γ, we have the following guarantee:

Pr[E(x_1) ∪ E(x_2) ∈ S] ≤ e^ε · Pr[E(x′_1) ∪ E(x′_2) ∈ S]   (9)

for some ε ≥ 0. Then, it follows that for any set S of multisets consisting of 2m elements from Z_q,

Pr[E(x_1) ∪ E(x_2) ∈ S] ≤ e^ε · Pr[E(x′_1) ∪ E(x′_2) ∈ S] + β.

Proof of Lemma 4.

Without loss of generality, assume Pr[E(x_1) ∪ E(x_2) ∈ S] ≥ Pr[E(x′_1) ∪ E(x′_2) ∈ S] (by symmetry, otherwise there is nothing to prove). For ease of notation, let Y = E(x_1) ∪ E(x_2) and Y′ = E(x′_1) ∪ E(x′_2).

Suppose S is an arbitrary set of multisets of 2m elements from Z_q. We let S_γ denote S ∩ Y_γ. Then, we observe that

Pr[Y ∈ S] ≤ Pr[Y ∈ S_γ] + Pr[Y ∉ Y_γ] ≤ e^ε · Pr[Y′ ∈ S_γ] + β ≤ e^ε · Pr[Y′ ∈ S] + β,   (10)

where (10) follows from (9) and the fact that Pr[Y ∉ Y_γ] ≤ β. This completes the proof. ∎

Lemma 5.

Suppose x_1, x_2 and x′_1, x′_2 are such that x_1 + x_2 = x′_1 + x′_2 (each of these being an integer multiple of 1/p). Then, for any set S of multisets consisting of 2m elements from Z_q, we have Pr[E(x_1) ∪ E(x_2) ∈ S] ≤ e^ε · Pr[E(x′_1) ∪ E(x′_2) ∈ S] + δ, where ε = ln((1 + γ)/(1 - γ)) and δ = 2m²/q + 4q²/(γ²(N - 2)).

Proof of Lemma 5.

For ease of notation, let Y = E(x_1) ∪ E(x_2) and Y′ = E(x′_1) ∪ E(x′_2). We now consider any set S of multisets of 2m elements from Z_q. Observe that

Pr[Y ∉ Y_γ], Pr[Y′ ∉ Y_γ] ≤ δ,   (11)
Pr[Y = Y] ≤ e^ε · Pr[Y′ = Y] for every Y ∈ Y_γ,   (12)

where (11) and (12) follow from Lemma 1 and Lemma 3, respectively. The desired result now follows from a direct application of Lemma 4. ∎

Using Lemma 5, which analyzes the differential privacy guarantee for a single sum-preserving swap (a change of two users’ inputs that keeps their sum fixed), as a building block, we can derive a differential privacy result with respect to general sum-preserving changes.

Lemma 6.

Suppose x and x′ have coordinates that are integer multiples of 1/p, satisfying ∑_i x_i = ∑_i x′_i, and that x′ can be obtained from x by a series of k sum-preserving swaps. Then, for any set S of multisets, we have Pr[E(x_1) ∪ ⋯ ∪ E(x_n) ∈ S] ≤ e^{ε′} · Pr[E(x′_1) ∪ ⋯ ∪ E(x′_n) ∈ S] + δ′, where ε′ = kε, δ′ = k e^{kε} δ, and ε and δ are as in Lemma 5.

Proof of Lemma 6.

We prove the lemma by induction on k. Note that the case k = 1 holds by Lemma 5.

Now, for the inductive step, suppose the lemma holds for k - 1. We wish to show that it also holds for k. Note that there exists some x″ such that (1) x″ can be obtained from x by a series of k - 1 sum-preserving swaps, and (2) x′ can be obtained from x″ by a single sum-preserving swap. By the inductive hypothesis, we have that