We consider the problem of privately summing numbers in the shuffled model recently defined by Cheu et al. cheu19. For consistency with the literature we will use the term aggregation for the sum operation. Consider users with data . In the shuffled model user applies a randomized encoder algorithm that maps to a multiset of messages, , where is a parameter. Then a trusted shuffler takes all messages and outputs them in random order. Finally, an analyzer algorithm maps the shuffled output to an estimate of the sum.
A protocol in the shuffled model is -differentially private if is -differentially private (see definition in Section 2.1), where probabilities are with respect to the random choices made in the algorithm and the shuffler. The privacy claim is justified by the existence of highly scalable protocols for privately implementing the shuffling primitive bittau17; cheu19.
Two protocols for aggregation in the shuffled model were recently suggested by Balle et al. balle19 and Cheu et al. cheu19 . We discuss these further in Section 1.2, but note here that all previously known protocols have either communication or error that grows as . This is unavoidable for single-message protocols, by the lower bound in balle19 , but it has been unclear if such a trade-off is necessary in general. Cheu et al. cheu19 explicitly mention it as an open problem to investigate this question.
1.1 Our Results
We show that a trade-off is not necessary — it is possible to avoid the factor in both the error bound and the amount of communication per user. The precise results obtained depend on the notion of “neighboring dataset” in the definition of differential privacy. We consider the standard notion of neighboring dataset in differential privacy, that the input of a single user is changed, and show:
Let and be any real numbers. There exists a protocol in the shuffled model that is -differentially private under single-user changes, has expected error , and where each encoder sends messages of bits.
We also consider a different notion similar to the gold standard of secure multi-party computation: two datasets are considered neighboring if their sums (taken after discretization) are identical. This notion turns out to allow much better privacy, even with zero noise in the final sum; the only error in the protocol comes from representing the terms of the sum in bounded precision.
Let and be any real numbers and let . There exists a protocol in the shuffled model that is -differentially private under sum-preserving changes, has worst-case error , and where each encoder sends messages of bits.
In addition to analyzing error and privacy of our new protocol we consider its resilience towards untrusted users that may deviate from the protocol. While the shuffled model is vulnerable to such attacks in general balle19 , we argue in Section 2.5 that the privacy guarantees of our protocol are robust even to a large fraction of colluding users. For reasons of exposition we show Theorem 2 before Theorem 1. The technical ideas behind our new results are discussed in Section 1.3. Next, we discuss implications for machine learning and the relation to previous work.
1.2 Discussion of Related Work and Applications
Our protocol is applicable in any setting where secure aggregation is applied. Below we mention some of the most significant examples and compare to existing results in the literature.
Our main application in a machine learning context is gradient descent-based federated learning mcmahan2016communication. The idea is to avoid collecting user data, and instead compute weight updates in a distributed manner by sending model parameters to users, locally running stochastic gradient descent on private data, and aggregating model updates over all users. Using a secure aggregation protocol (see e.g. practicalSecAgg) guards against information leakage from the update of a single user, since the server only learns the aggregated model update. A federated learning system based on these principles is currently used by Google to train neural networks on data residing on users’ phones GoogleBlog17.
Current practical secure aggregation protocols such as that of Bonawitz et al. practicalSecAgg have user computation cost and total communication complexity , where is the number of users. This limits the number of users that can participate in the secure aggregation protocol. In addition, the privacy analysis assumes an “honest but curious” server that does not deviate from the protocol, so some level of trust in the secure aggregation server is required. In contrast, protocols based on shuffling work with much weaker assumptions on the server bittau17; cheu19. In addition to this advantage, the total work and communication of our new protocol scale near-linearly with the number of users.
Differentially Private Aggregation in the Shuffled Model.
It is known that gradient descent can work well even if data is accessible only in noised form, in order to achieve differential privacy abadi2016deep . Note that in order to run gradient descent in a differentially private manner, privacy parameters need to be chosen in such a way that the combined privacy loss over many iterations is limited.
|Expected error||Privacy protection|
|Cheu et al. cheu19||
|Balle et al. balle19||Single-user change|
Each aggregation protocol shown in Figure 1 represents a different trade-off, optimizing different parameters. Our protocols are the only ones that avoid factors in both the communication per user and the error.
Private Sketching and Statistical Learning.
At first glance it may seem that aggregation is a rather weak primitive for combining data from many sources in order to analyze it. However, research in the area of data stream algorithms has uncovered many non-trivial algorithms that are small linear sketches, see e.g. cormode2011synopses ; woodruff2014sketching . Linear sketches over the integers (or over a finite field) can be implemented using secure aggregation by computing linear sketches locally and summing them up over some range that is large enough to hold the sum. This unlocks many differentially private protocols in the shuffled model, e.g. estimation of
-norms, quantiles, heavy hitters, and number of distinct elements.
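As an illustration of how a linear sketch reduces to aggregation, the following minimal sketch lets each user build a small vector of bucket counts locally and then sums the vectors coordinate-wise modulo a fixed range. The names `WIDTH`, `MODULUS`, `bucket`, `local_sketch`, and `aggregate` are hypothetical, and the plain modular sum only stands in for a private aggregation protocol.

```python
import hashlib

WIDTH = 8          # number of counters in the sketch (illustrative choice)
MODULUS = 2**16    # range assumed large enough to hold every counter's sum

def bucket(item: str) -> int:
    """Deterministically hash an item to a counter index."""
    digest = hashlib.sha256(item.encode()).digest()
    return int.from_bytes(digest[:4], "big") % WIDTH

def local_sketch(items):
    """Linear sketch of one user's data: a vector of bucket counts."""
    sketch = [0] * WIDTH
    for item in items:
        sketch[bucket(item)] += 1
    return sketch

def aggregate(sketches):
    """Coordinate-wise sum modulo MODULUS (stand-in for private aggregation)."""
    total = [0] * WIDTH
    for s in sketches:
        total = [(a + b) % MODULUS for a, b in zip(total, s)]
    return total

user_data = [["cat", "dog"], ["dog"], ["dog", "emu", "cat"]]
combined = aggregate(local_sketch(d) for d in user_data)
pooled = local_sketch([x for d in user_data for x in d])
```

By linearity, `combined` equals `pooled`, the sketch that a single party holding all of the data would have computed, provided no counter exceeds `MODULUS`.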
Second, as observed in cheu19 we can translate any statistical query over a distributed data set to an aggregation problem over numbers in . That is, every learning problem solvable using a small number of statistical queries kearns1998efficient can be solved privately and efficiently in the shuffled model.
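The reduction from a statistical query to aggregation can be sketched as follows, under the assumption that each user's contribution is a number in the unit interval. The function name `statistical_query` and the example predicate are hypothetical; the plain `sum` marks the single step that a private aggregation protocol would replace.

```python
def statistical_query(records, predicate):
    """Estimate the average of predicate(record) over all users."""
    # Each user evaluates the query on their own record only.
    contributions = [float(predicate(r)) for r in records]
    if not all(0.0 <= c <= 1.0 for c in contributions):
        raise ValueError("contributions are assumed to lie in [0, 1]")
    total = sum(contributions)  # the only step that requires aggregation
    return total / len(records)

ages = [23, 35, 41, 19, 52, 67]
fraction_over_40 = statistical_query(ages, lambda age: age > 40)
```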
1.3 Invisibility Cloak Protocol
We use a technique from protocols for secure multi-party aggregation (see e.g. secAggSurvey): ensure that individual numbers passed to the analyzer are fully random by adding random noise terms, but coordinate the noise such that all noise terms cancel and the sum remains the same as the sum of the original data. Our new insight is that in the shuffled model the addition of zero-sum noise can be done without coordination between the users. Instead, each user individually produces numbers that are fully random except that they sum to , and passes them to the shuffler. This is visualized in Figure 2. Conceptually the noise we introduce acts as an invisibility cloak: the data is still there and possible to aggregate, but it is almost impossible to gain any other information from it.
The details of our encoder are given as Algorithm 1. For parameters , , and to be specified later it converts each input to a set of random values whose sum, up to scaling and rounding, equals . When the output of all encoders is composed with a shuffler this directly gives differential privacy with respect to sum-preserving changes of data (where the sum is considered after rounding). To achieve differential privacy with respect to single-user changes the protocol must be combined with a pre-randomizer that adds noise to each with some probability, see discussion in Section 2.4.
Our analyzer is given as Algorithm 2. It computes as the sum of the inputs (received from the shuffler) modulo , which by definition of the encoder is guaranteed to equal the sum of scaled, rounded inputs. If this sum will be in and will be within of the true sum . In the setting where a pre-randomizer adds noise to some inputs, however, we may have in which case we round to the nearest feasible output sum, or .
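The encoder, shuffler, and analyzer described above can be sketched in a few lines. This is a hedged illustration and not a transcription of Algorithms 1 and 2: the parameter names `N` (modulus), `k` (scaling factor), and `m` (messages per user) are assumptions, the pre-randomizer and the final rounding step are omitted, and `N` is assumed to exceed the largest possible scaled sum so that the modular sum does not wrap around.

```python
import math
import random

def encode(x, N, k, m, rng):
    """Split floor(x * k) into m values mod N that sum to it.

    The first m - 1 values are uniform; the last one fixes the sum.
    """
    scaled = math.floor(x * k)
    shares = [rng.randrange(N) for _ in range(m - 1)]
    shares.append((scaled - sum(shares)) % N)
    return shares

def shuffle(messages, rng):
    """Return all messages in uniformly random order."""
    out = list(messages)
    rng.shuffle(out)
    return out

def analyze(shuffled, N, k):
    """Sum the shuffled messages mod N and undo the scaling."""
    return (sum(shuffled) % N) / k

rng = random.Random(0)
inputs = [0.25, 0.5, 0.75]
N, k, m = 1009, 100, 4  # illustrative values with N > len(inputs) * k
messages = [s for x in inputs for s in encode(x, N, k, m, rng)]
estimate = analyze(shuffle(messages, rng), N, k)
```

Because the noise terms of each user cancel modulo `N`, `estimate` equals the sum of the scaled and rounded inputs divided by `k`, whatever random values were drawn.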
The output of each encoder is very close to fully random in the sense that every set of values are independent and uniformly random. Only by summing exactly the outputs of an encoder (or several encoders) do we get a value that is not uniformly random. On the other hand, many size- subsets look like the output of an encoder in the sense that the sum of elements corresponds to a feasible value . In fact, something stronger is true: For every possible input with the same sum as the true input (sum taken after scaling and rounding) we can, with high probability, find a splitting of the shuffler’s output consistent with that input. Furthermore, the number of such splittings is about the same for each potential input.
Our technique can be compared to the recently proposed “privacy blanket” balle19 , which introduces uniform, random noise to replace some inputs. Since that paper operates in a single-message model there is no possibility of ensuring perfect noise cancellation, and thus the number of noise terms needs to be kept small, which in turn means that a rather coarse discretization is required for differential privacy. Since the noise we add is zero-sum we can add much more noise, and thus we do not need a coarse discretization, ultimately resulting in much higher accuracy.
We first consider privacy with respect to sum-preserving changes to the input, arguing that observing the output of the shuffler gives almost no information on the input, apart from the sum. Our proof strategy is to show privacy in the setting of two players and then argue that this implies privacy for players, essentially because the two-player privacy holds regardless of the behavior of the other players. In the two-player case we first argue that with high probability the outputs of the encoders satisfy a smoothness condition saying that every potential input , to the encoders corresponds to roughly the same number of divisions of the shuffler outputs into sets of size . Finally we argue that smoothness in conjunction with the elements being unique implies privacy.
We use to denote a value uniformly sampled from a finite set , and denote by the set of all permutations of . Unless stated otherwise, sets in this paper will be multisets. It will be convenient to work with indexed multisets whose elements are identified by indices in some set . We can represent a multiset with index set as a function . Multisets and with index sets and are considered identical if there exists a bijection such that for all . For disjoint and we define the union of and as the function defined on that maps to and to .
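To make the indexed-multiset conventions concrete, the following sketch represents a multiset as a Python dictionary from indices to values. The helper names `same_multiset` and `union` are hypothetical. For finite index sets, a value-preserving bijection exists exactly when every value occurs equally often on both sides, which is what the comparison of counters checks.

```python
from collections import Counter

def same_multiset(a: dict, b: dict) -> bool:
    """True iff some bijection between the index sets preserves all values."""
    return Counter(a.values()) == Counter(b.values())

def union(a: dict, b: dict) -> dict:
    """Union of two indexed multisets with disjoint index sets."""
    if a.keys() & b.keys():
        raise ValueError("index sets must be disjoint")
    return {**a, **b}

A = {1: 7, 2: 7, 3: 4}
B = {"x": 4, "y": 7, "z": 7}   # same values under different indices
C = {10: 5}
AC = union(A, C)
```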
Differential Privacy and the Shuffled Model.
We consider the established notion of differential privacy, formalizing that the output distribution does not differ much between a certain data set and any “neighboring” dataset.
Let be a randomized algorithm taking as input a dataset and let and be given parameters. Then, is said to be -differentially private if for all neighboring datasets and and for all subsets of the image of , it is the case that , where the probability is over the randomness used by the algorithm .
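For reference, the standard form of this condition can be written as below. The symbol names are assumptions made for this display: $\mathcal{M}$ for the algorithm, $X$ and $X'$ for the neighboring datasets, $S$ for the subset of the image, and $\varepsilon$, $\delta$ for the privacy parameters.

```latex
\Pr[\mathcal{M}(X) \in S] \;\le\; e^{\varepsilon} \cdot \Pr[\mathcal{M}(X') \in S] + \delta
```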
We consider two notions of “neighboring dataset”: 1) That the input of a single user is changed, but all other inputs are the same, and 2) That the sum of user inputs is preserved. In the latter case we consider the sum after rounding to the nearest lower multiple of , for a large integer parameter , i.e., is a neighbor of if and only if . (Alternatively, just assume that the input is discretized such that is integer.)
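The two neighboring notions can be contrasted with a small sketch. It assumes that rounding is to the nearest lower multiple of `1/k` for an integer `k`, so that comparing sums of `floor(x * k)` is equivalent; the function names are hypothetical, and the single-user check accepts datasets that differ in at most one position.

```python
import math

def discretized_sum(xs, k):
    """Sum of inputs after rounding each down to a multiple of 1/k, times k."""
    return sum(math.floor(x * k) for x in xs)

def single_user_neighbors(xs, ys):
    """Datasets of equal size that differ in at most one user's input."""
    return len(xs) == len(ys) and sum(a != b for a, b in zip(xs, ys)) <= 1

def sum_preserving_neighbors(xs, ys, k):
    """Datasets of equal size whose discretized sums coincide."""
    return len(xs) == len(ys) and discretized_sum(xs, k) == discretized_sum(ys, k)

xs = [0.25, 0.5, 0.75]
ys = [0.5, 0.25, 0.75]   # two users changed, same sum
zs = [0.25, 0.5, 1.0]    # one user changed, different sum
```

Here `xs` and `ys` are neighbors only under the sum-preserving notion, while `xs` and `zs` are neighbors only under the single-user notion.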
In the shuffled model, the algorithm that we want to show differentially private is the composition of the shuffler and the encoder algorithm run on user inputs. In contrast to the local model of differential privacy, the outputs of encoders do not need to be differentially private. We refer to cheu19 for details.
2.2 Common lemmas
Let , and consider some indexed multiset that can possibly be obtained as the union of the outputs of two encoders. Further, let denote the collection of subsets of of size . For each define . We will be interested in the following property of a given (fixed) multiset :
A multiset is -smooth if the distribution of values for is close to uniform in the sense that for every .
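For small parameters, smoothness can be checked by brute force. The sketch below assumes one plausible reading of the definition: for a multiset of values modulo `N`, the fraction of size-`m` index subsets whose sum equals any given residue must lie within a factor `1 ± gamma` of `1/N`. The names `subset_sum_fractions`, `is_smooth`, and `gamma` are hypothetical, and the enumeration is exponential in `m`, so it serves only as an illustration.

```python
from collections import Counter
from itertools import combinations

def subset_sum_fractions(values, m, N):
    """Fraction of size-m index subsets whose sum mod N equals each residue."""
    # combinations() treats positions as distinct, matching indexed multisets.
    counts = Counter(sum(c) % N for c in combinations(values, m))
    total = sum(counts.values())
    return [counts.get(x, 0) / total for x in range(N)]

def is_smooth(values, m, N, gamma):
    """Check that every residue's fraction is within (1 +/- gamma) / N."""
    return all((1 - gamma) / N <= f <= (1 + gamma) / N
               for f in subset_sum_fractions(values, m, N))

spread = list(range(10))   # ten distinct values, reduced modulo N = 5
constant = [0] * 10        # every subset sums to the same residue
```

With `m = 5` and `N = 5`, the list `spread` passes the check for `gamma = 0.05`, whereas `constant` fails it for any `gamma` below `N - 1`.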
We name the collection of multisets that are -smooth and contain distinct elements:
Given such that and are integers, consider the multisets and , and let be their multiset union. The multiset
is a random variable due to the random choices made by the encoder algorithm.
For every , and for every choice of we have .
Proof of Lemma 1.
We first upper bound the probability that the multiset has any duplicate elements. For consider the event that . Since we have that every pair of distinct values are uniform in and independent, so . A union bound over all pairs yields an upper bound of on the probability that there is at least one duplicate pair.
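The union bound over pairs can be evaluated numerically. The sketch below assumes that the multiset has `count` elements, that each pair collides with probability `1/N`, and, only for the comparison value, that all elements are fully independent and uniform, which is stronger than the pairwise independence used in the proof. The function names are hypothetical.

```python
from math import comb

def union_bound(count, N):
    """Upper bound on the probability of any duplicate among count values."""
    return comb(count, 2) / N

def exact_collision_probability(count, N):
    """Collision probability for fully independent uniform values mod N."""
    p_distinct = 1.0
    for i in range(count):
        p_distinct *= (N - i) / N
    return 1.0 - p_distinct

m, N = 8, 10007                              # illustrative parameters
bound = union_bound(2 * m, N)                # two encoders, m values each
exact = exact_collision_probability(2 * m, N)
```

The value `exact` never exceeds `bound`, and for `N` much larger than the square of the number of elements the two are close.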
Second, we bound the probability that is not -smooth. Let and . Then by definition of the encoder, and with probability 1. For each we have that is uniformly random in the range , over the randomness of the encoder. Furthermore, observe that the random variables are pairwise independent. Let
be the indicator random variable that is 1 if and only if. Let . For each and we have . The sum equals the number of sets in such that . Since and it will be helpful to disregard these fixed terms in . Thus we define , which is a sum of pairwise independent terms, each with expectation . Define
. We bound the variance of:
The second equality uses that for because it is a product of two independent, zero-mean random variables. The inequality holds because is an indicator function. By Chebyshev’s inequality over the random choices in the encoder, for any :
For we can bound as follows:
Using this for upper and lower bounding in (1), and choosing we get:
A union bound over all implies that with probability at least :
Conditioned on (2) we have:
The final inequality uses the assumption that . A similar computation shows that conditioned on (2), . ∎
For , and ,
We invoke Lemma 1 with and . The probability bound is
Because and this shows the stated bound. ∎
Denote by the sequence obtained by the deterministic encoding for given values in Algorithm 1. Moreover, we denote by the corresponding multiset.
For any and for any and , it is the case that
Proof of Lemma 2.
Using the fact that all the elements in are distinct, we have that
2.3 Analysis of Privacy under Sum-Preserving Changes
For any and for all that are integer multiples of and that satisfy , it is the case that .
Proof of Lemma 3.
We denote by the sum of all elements in the set . We define
Similarly, we have that
Since is -smooth, Definition 2 implies that
Suppose and that are integer multiples of satisfying for all , where . Moreover, suppose that for any set consisting of multisets of elements from , we have the following guarantee:
for some . Then, it follows that for any set of multisets consisting of elements from ,
Proof of Lemma 4.
Without loss of generality, assume and (by symmetry). Thus, for . For ease of notation, let and .
Suppose is an arbitrary set of multisets of elements from . For any , we let denote
Suppose and such that (each of these being an integer multiple of ) and for all , where . Then, for any set of multisets consisting of elements from , we have , where and .
Proof of Lemma 5.
Using Lemma 5 as a building block for analyzing differential privacy guarantees in the context of sum-preserving swaps, we can derive a differential privacy result with respect to general sum-preserving changes.
Suppose and have coordinates that are integer multiples of satisfying and can be obtained from by a series of sum-preserving swaps. Then, for any , we have , where , , and .
Proof of Lemma 6.
We prove the lemma by induction on . Note that the case holds by Lemma 5.
Now, for the inductive step, suppose the lemma holds for . We wish to show that it also holds for . Note that there exists some such that (1) can be obtained from by a series of sum-preserving swaps and (2) can be obtained from by a single sum-preserving swap. By the inductive hypothesis, we have that