Private Aggregation from Fewer Anonymous Messages

09/24/2019 · Badih Ghazi et al.

Consider the setup where n parties are each given a number x_i ∈F_q and the goal is to compute the sum ∑_i x_i in a secure fashion and with as little communication as possible. We study this problem in the anonymized model of Ishai et al. (FOCS 2006) where each party may broadcast anonymous messages on an insecure channel. We present a new analysis of the one-round “split and mix” protocol of Ishai et al. In order to achieve the same security parameter, our analysis reduces the required number of messages by a Θ(log n) multiplicative factor. We complement our positive result with lower bounds showing that the dependence of the number of messages on the domain size, the number of parties, and the security parameter is essentially tight. Using a reduction of Balle et al. (2019), our improved analysis of the protocol of Ishai et al. yields, in the same model, an (ε, δ)-differentially private protocol for aggregation that, for any constant ε > 0 and any δ = 1/poly(n), incurs only a constant error and requires only a constant number of messages per party. Previously, such a protocol was known only for Ω(log n) messages per party.


1 Introduction

We study one-round multi-party protocols for the problem of secure aggregation: each of n parties holds an element of the field F_q, and we wish to compute the sum of these elements while satisfying the security property that for every two inputs with the same sum, the resulting transcripts are “indistinguishable.” The protocols we consider work in the anonymized model, where parties are able to send anonymous messages through an insecure channel and indistinguishability is measured by the statistical distance between the two transcripts (i.e., this is information-theoretic rather than computational security). This model was introduced by Ishai et al. [14] in their work on cryptography from anonymity. (Ishai et al. in fact considered a more general model in which the adversary is allowed to corrupt some of the parties; see the discussion at the end of Section 1.1 for more details.) We refer to [14, 7] for a discussion of cryptographic realizations of an anonymous channel.

The secure aggregation problem in the anonymized model was studied already by Ishai et al. [14], who gave an elegant one-round “split and mix” protocol. In their protocol, each party holding a private input x_i sends m anonymized messages consisting of random elements of F_q conditioned on summing to x_i. Upon receiving these anonymized messages from all n parties, the server adds them up and outputs the result. Pseudocode of this protocol is shown as Algorithm 1. Ishai et al. [14] show that as long as the number m of messages per party exceeds a certain threshold (a Θ(log n) factor larger than the bound we prove in Theorem 1.1), this protocol is secure in the sense that the statistical distance between transcripts resulting from inputs with the same sum is at most 2^{-σ}, where σ denotes the security parameter.
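
As a concrete illustration, here is a minimal Python sketch of this pipeline (illustrative only, not code from [14]; it assumes a prime modulus q so that integers mod q form a field, and the parameter names are for exposition): each party splits its input into m random shares summing to it, the n·m messages are shuffled, and the analyzer simply adds everything up.

import random

def split_and_mix_encode(x, m, q):
    # Split x into m random shares in Z_q that sum to x (mod q).
    shares = [random.randrange(q) for _ in range(m - 1)]
    shares.append((x - sum(shares)) % q)
    return shares

def shuffle_and_aggregate(all_messages, q):
    # The shuffler randomly permutes all n*m messages; the analyzer just sums them.
    random.shuffle(all_messages)
    return sum(all_messages) % q

# Toy run: n = 5 parties, m = 3 messages each, prime field size q = 101.
q, m = 101, 3
inputs = [12, 7, 99, 0, 55]
messages = [s for x in inputs for s in split_and_mix_encode(x, m, q)]
assert shuffle_and_aggregate(messages, q) == sum(inputs) % q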

Differentially Private Aggregation in the Shuffled Model.

An exciting recent development in differential privacy is the shuffled model, which is closely related to the aforementioned anonymized model. The shuffled model provides a middle ground between two widely studied models of differential privacy. In the central model, the data structure released by the analyzer is required to be differentially private, whereas the local model enforces the more stringent requirement that the messages sent by each party be private. While protocols in the central model generally allow better accuracy, they require a much greater level of trust to be placed in the analyzer, an assumption that may be unsuitable for certain applications. The shuffled model is based on the Encode-Shuffle-Analyze architecture of Bittau et al. [5]; it was first formalized by Cheu et al. [7] and recently studied in [4, 7, 10, 11]. It seeks to bridge the two aforementioned models and assumes the presence of a trusted shuffler that randomly permutes all incoming messages from the parties before passing them to the analyzer (see Section 2 for formal definitions). The shuffled model is particularly compelling because it allows the possibility of obtaining more accurate, communication-efficient protocols than in the local model while placing far less trust in the analyzer than in the central model. Indeed, the power of the shuffled model has been illustrated by a number of recent works that have designed algorithms in this model for a wide range of problems, such as privacy amplification, histograms, heavy hitters, and range queries [7, 10, 4, 11].

The appeal of the shuffled model provides the basis for our study of differentially private protocols for aggregation in this work. Most relevant to the present work are the recent differentially private protocols for aggregation of real numbers in the shuffled model given by [12, 3]. The strongest of these results [3] shows that an extension of the split and mix protocol yields an (ε, δ)-differentially private protocol for aggregation that, for constant ε and δ = 1/poly(n), incurs only constant expected error while using O(log n) messages per party, each consisting of O(log n) bits.

1.1 Our Results

We prove that the split and mix protocol is in fact secure for a much smaller number of messages. In particular, for the same security parameter, the number of messages required by our analysis is a Θ(log n) factor smaller than the bound of [14]:

Theorem 1.1 (Improved upper bound for split and mix).

Let n and q be positive integers and σ be a positive real number. The split and mix protocol (Algorithm 1 and [14]) with n parties and inputs in F_q is 2^{-σ}-secure for m messages per party, where m = O(1 + (σ + log q)/log n).

An interesting case to keep in mind is when the field size q and the inverse statistical distance 2^σ are both bounded by a polynomial in n. In this case, Theorem 1.1 implies that the protocol already works with a constant number of messages per party, improving upon the previously known O(log n) bound. We recently learned that Balle et al. [2] have, independently of our work, obtained an analysis of the split and mix protocol that provides guarantees for a smaller number of messages, but we are not aware of the specifics.

To complement our analysis, we show that, in terms of the number of messages sent by each party, Theorem 1.1 is essentially tight, not just for the split and mix protocol but for every one-round protocol:

Theorem 1.2 (Lower bound for every one-round protocol).

Let n and q be positive integers, and let σ be a positive real number. In any 2^{-σ}-secure one-round aggregation protocol over F_q in the anonymized model, each of the n parties must send Ω((σ + log q)/(log n + log σ)) messages.

The lower bound holds regardless of the message size, and it asymptotically matches the upper bound of Theorem 1.1 under the mild assumption that σ is bounded by a polynomial in n. When σ is larger, the bound is tight up to a lower-order multiplicative factor.

We note here that our lower bound is on the number of messages. In terms of the number of bits of communication per party, improvements are still possible when the field size q is large. We discuss this further in Section 5.

As stated earlier, the differentially private aggregation protocols of [3, 12] both use extensions of the split and mix protocol. Moreover, Balle et al. use the security guarantee of the split and mix protocol as a black box and derive a differential privacy guarantee from it [3, Lemma 4.1]. Specifically, when ε is a constant and δ = 1/poly(n), their proof uses the split and mix protocol with field size q = poly(n). Previous analyses required Ω(log n) messages per party in this regime; our analysis, by contrast, works with only a constant number of messages. In general, Theorem 1.1 implies (ε, δ)-differential privacy with a Θ(log n) factor fewer messages than previously known:

Corollary 1 (Differentially private aggregation in the shuffled model).

Let n be a positive integer, and let ε and δ be positive real numbers. There is an (ε, δ)-differentially private aggregation protocol in the shuffled model for real-valued inputs in [0, 1] having absolute error O(1/ε) in expectation, using O(1 + log(1/δ)/log n) messages per party, each consisting of O(log n) bits.

A more comprehensive comparison between our differentially private aggregation protocol in Corollary 1 and previous protocols is presented in Figure 1.

We end this subsection by remarking that Ishai et al. [14] in fact considered a setting that is more general than what we have described so far. Specifically, they allow the adversary to corrupt a certain number of parties. In addition to the transcript of the protocol, the adversary knows the inputs and messages of these corrupted parties. (Alternatively, one can think of the corrupted parties as colluding to learn information about the remaining parties.) As already observed in [14], the security of the split and mix protocol still holds in this setting, except that n is now the number of honest (i.e., uncorrupted) parties. In other words, Theorem 1.1 remains true in this more general setup, but with n being the number of honest parties instead of the total number of parties.

1.2 Applications and Related Work

At first glance it may seem that aggregation is a rather weak primitive for combining data from many sources in order to analyze it. However, in important approaches to machine learning and distributed/parallel data processing, the mechanism for combining the computations of different parties is aggregation of vectors. Since vector aggregation can be built in a straightforward way from scalar aggregation, our results can be applied in these settings.

Before discussing this in more detail, we mention that it is shown in [14] that summation protocols can be used as building blocks for realizing general secure computations in a specific setup where a server mediates the computation of a function on data held by the other parties. However, this result assumes a somewhat weak security model (see Appendix D of [14] for more details).

Machine Learning.

Secure aggregation has applications in so-called federated machine learning. The idea is to train a machine learning model without collecting data from any party: model parameters are sent to all parties, each party locally runs stochastic gradient descent on its private data, and the resulting model updates are aggregated over all parties. For learning algorithms based on gradient descent, a secure aggregation primitive can thus be used to compute global weight updates without compromising privacy [18, 17]. It is known that gradient descent can work well even if data is accessible only in noised form, as needed to achieve differential privacy [1].

Beyond gradient descent, as observed in [7], we can translate any statistical query over a distributed data set to an aggregation problem over numbers in [0, 1]. That is, every learning problem solvable using a small number of statistical queries [15] can be solved privately and efficiently based on secure aggregation.
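
For instance, a statistical query given by a predicate phi can be answered as follows (a toy sketch with names chosen for exposition; the field size q is assumed to exceed the number of parties so that the sum does not wrap around):

def statistical_query_via_aggregation(data_points, phi, q):
    # Each party reports phi(x_i) in {0, 1}; the secure sum of these bits,
    # divided by n, equals the statistical query answer (1/n) * sum_i phi(x_i).
    # q must exceed the number of parties so the modular sum does not wrap.
    n = len(data_points)
    contributions = [int(phi(x)) for x in data_points]   # one value per party
    secure_sum = sum(contributions) % q                   # computed by the aggregation protocol
    return secure_sum / n

# Example: fraction of parties whose value exceeds 10.
print(statistical_query_via_aggregation([3, 14, 15, 92, 6], lambda x: x > 10, q=101))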

Sketching.

Research in the area of data stream algorithms has uncovered many non-trivial algorithms that are compact linear sketches; see, e.g., [8, 23]. As noted already in [14], linear sketches can be implemented using secure aggregation: each party computes a linear sketch of its data locally, and aggregation is used to compute the sum of these sketches, which is exactly the sketch of the whole dataset. Typically, linear sketches do not reveal much information about their input and are robust to the noise needed to ensure differential privacy, though specific guarantees depend on the sketch in question. We refer to [16, 19, 20] for examples and further discussion.
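
A minimal illustration of this observation (a toy example; the random sign projection below is just one simple linear sketch): because sketching is a linear map, the sum of the parties' local sketches equals the sketch of the combined data, and that sum is exactly what a secure aggregation primitive computes.

import numpy as np

def linear_sketch(vector, seed, width=8):
    # A toy linear sketch: multiply by a fixed random +/-1 matrix (CountSketch-like).
    rng = np.random.default_rng(seed)
    signs = rng.choice([-1, 1], size=(width, len(vector)))
    return signs @ vector

# Each party sketches its local data vector; aggregating the sketches
# gives the sketch of the sum of all local vectors, by linearity.
seed = 0
local_vectors = [np.array([1, 0, 2, 0]), np.array([0, 3, 0, 1])]
sum_of_sketches = sum(linear_sketch(v, seed) for v in local_vectors)
sketch_of_sum = linear_sketch(sum(local_vectors), seed)
assert np.array_equal(sum_of_sketches, sketch_of_sum)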

Secure aggregation protocols.

Secure aggregation protocols are well studied, both under cryptographic assumptions and with respect to differential privacy. We refer to the survey of Goryczka et al. [13] for an overview, but note that our approach leads to protocols that use less communication than existing (multi-round) protocols. The trust assumptions needed for implementing a shuffler (e.g., using a mixnet) are, however, slightly different from the assumptions typically used for secure aggregation protocols. Practical secure aggregation typically relies on an honest-but-curious assumption; see, e.g., [6]. In that setting, such protocols typically require several rounds of communication, with communication and computation per party that grow with the number of parties. A more recent work [21] using homomorphic threshold encryption gives a protocol with three messages and constant communication and computation per party, in addition to a (reusable) two-message setup phase. By contrast, our aggregation protocol has a single round with constant communication and computation per party, albeit in the presence of a trusted shuffler.

Other related models.

A very recent work [22] has designed an extension of the shuffled model, called Multi Uniform Random Shufflers, and analyzed its trust model and privacy-utility tradeoffs. Since they consider a more general model, our differentially private aggregation protocol carries over to their setup as well.

There has also been work on aggregation protocols in the multiple-server setting, e.g., the PRIO system [9]; there the protocol is secure as long as at least one server is honest. The trust assumptions of PRIO are thus somewhat different from those underlying shuffling and mixnets. While each party would be able to check the output of a shuffler to see whether its message is present, such a check is not possible in the PRIO protocol, making server manipulation invisible even if the number of parties is known. On the other hand, PRIO handles malicious parties that try to manipulate the result of a summation by submitting illegal data, a challenge that has not yet been addressed for summation in the shuffled model and that would be interesting future work.

Figure 1: Comparison of differentially private aggregation protocols in the shuffled model with (ε, δ)-differential privacy. For each of Cheu et al. [7], Balle et al. [4], Ghazi et al. [12], Balle et al. [3], and this work (Corollary 1), the table reports the number of messages per party, the message size (in bits), and the expected error; the number of parties is n. For readability, asymptotic notation is suppressed.

1.3 The Split and Mix Protocol

The protocol of [14] is shown in Algorithm 1. To describe the main guarantee proved in [14] regarding Algorithm 1, we need some notation. For any input sequence x = (x_1, …, x_n) ∈ F_q^n, we denote by S(x) the distribution on F_q^{nm} obtained by sampling, for each party i, shares y_{i,1}, …, y_{i,m} ∈ F_q uniformly at random conditioned on y_{i,1} + ⋯ + y_{i,m} = x_i, sampling a random permutation π of the nm shares, and outputting the shares in the order given by π. Ishai et al. [14] proved that once m exceeds a threshold depending on n, q, and the security parameter σ, for any two input sequences x and x' having the same sum (in F_q), the distributions S(x) and S(x') are 2^{-σ}-close in statistical distance.

Input: x ∈ F_q, positive integer parameter m
Output: Multiset {y_1, …, y_m} of elements of F_q
for j = 1, …, m − 1 do
       y_j ← uniformly random element of F_q
y_m ← x − (y_1 + ⋯ + y_{m−1}) (in F_q)
return {y_1, …, y_m}
Algorithm 1: Split and mix encoder from [14]
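
To make the security guarantee concrete, the following snippet (illustrative, not part of the paper) computes the exact distribution of the shuffled multiset of messages for two tiny inputs with the same sum and prints their statistical distance; by Theorem 1.1, this distance shrinks as the number of messages m grows.

from itertools import product
from collections import Counter

def multiset_distribution(inputs, m, q):
    # Exact distribution of the multiset of shuffled messages for the given inputs.
    counts = Counter()
    free = [list(product(range(q), repeat=m - 1)) for _ in inputs]
    for choice in product(*free):
        msgs = []
        for x, shares in zip(inputs, choice):
            msgs.extend(shares)
            msgs.append((x - sum(shares)) % q)   # last share fixed so the block sums to x
        counts[tuple(sorted(msgs))] += 1
    total = sum(counts.values())
    return {k: v / total for k, v in counts.items()}

def statistical_distance(p, r):
    keys = set(p) | set(r)
    return 0.5 * sum(abs(p.get(k, 0.0) - r.get(k, 0.0)) for k in keys)

# Two inputs with the same sum over F_3, with m = 2 messages per party.
print(statistical_distance(multiset_distribution([1, 2], 2, 3),
                           multiset_distribution([0, 0], 2, 3)))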

1.4 Overview of Proofs

We now give a short overview of the proofs of Theorems 1.1 and 1.2. For ease of notation, we define X_s to be the set of all input vectors in F_q^n with a fixed sum s.

Upper Bound.

To describe the main idea behind our upper bound, we start with the following additional notation. For every s ∈ F_q, we denote by U_s the distribution on F_q^{nm} generated uniformly at random conditioned on all coordinates summing to s.

To prove Theorem 1.1, we have to show that for any two input sequences x, x' ∈ X_s, the statistical distance between S(x) and S(x') is at most 2^{-σ}. By the triangle inequality, it suffices to show that for every x ∈ X_s, the statistical distance between S(x) and U_s is at most 2^{-σ-1} (Theorem 3.1). Note that U_s puts equal mass on all vectors in F_q^{nm} whose coordinates sum to s. Thus, our task boils down to showing that the mass put by S(x) on a random sample from U_s is well-concentrated. We prove this via a second moment method (specifically, Chebyshev’s inequality). This amounts to computing the mean and bounding the variance. The former is a simple calculation, whereas the latter reduces to proving a probabilistic bound on the rank deficit of a certain random matrix (Theorem 3.2). A main ingredient in the proof of this bound is a combinatorial characterization of the rank deficit of the relevant matrices (Lemma 2).
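
In symbols, the conversion from a second-moment (variance) bound to a statistical-distance bound is standard; the following sketch, in the notation above and writing Δ for statistical distance, is only a schematic rendering (the proof in Section 3 routes the same idea through Chebyshev's inequality):

\[
\Delta\bigl(S(x),\, U_s\bigr)
 \;=\; \frac{1}{2} \sum_{y} \bigl|\Pr[S(x) = y] - U_s(y)\bigr|
 \;=\; \frac{1}{2}\, \mathbb{E}_{y \sim U_s}\!\left|\frac{\Pr[S(x) = y]}{U_s(y)} - 1\right|
 \;\le\; \frac{1}{2} \sqrt{\operatorname{Var}_{y \sim U_s}\!\left[\frac{\Pr[S(x) = y]}{U_s(y)}\right]},
\]

since the likelihood ratio Pr[S(x) = y]/U_s(y) has mean 1 under U_s. Thus it suffices to bound the variance of this ratio, which is proportional to the counting quantity N_y introduced in Section 3.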

Lower Bound.

For the lower bound (Theorem 1.2), our proof consists of two parts: a “security-dependent” lower bound (driven by the security parameter σ) and a “field-dependent” lower bound (driven by the field size q). Combining the two yields Theorem 1.2.

The security-dependent lower bound follows from the following statement (see Theorem 4.2): if Enc is the encoder of any aggregation protocol in the anonymized model for n parties with m messages sent per party, then there is a vector x ∈ F_q^n such that the statistical distance between the distributions of the shuffled output corresponding to inputs 0 (the all-zeros vector) and x is at least n^{-O(m)}.

Let us first sketch a simple proof for the particular case of the split and mix protocol. In this case, we set x = (1, 0, …, 0), and we bound the statistical distance from below by considering the “distinguisher” which chooses a uniformly random permutation of the received messages and accepts iff the first m messages in this order sum to 0. We can argue (see Subsection 4.2) that the probability that this distinguisher accepts under the distribution corresponding to the all-zeros input is larger, by an additive factor quantified there, than the probability that it accepts under the distribution corresponding to x. To generalize this idea to arbitrary encoders (beyond Ishai et al.’s protocol), it is natural to consider a distinguisher which accepts iff the first m messages form a valid output of the encoder on input zero. Unlike the case of Ishai et al., in general when these m messages do not all come from the same party, it is not necessarily true that the acceptance probability would be the same for both distributions. To circumvent this, for an input value b, let w_b be the smallest integer such that the w_b-message marginal of the encoding of 0 and that of input b are substantially different; the distinguisher then performs an analogous check on w_b messages (instead of m as before). Another complication is that we can no longer simply use the input vector (1, 0, …, 0) as in the lower bound for Ishai et al.’s protocol sketched above: the w_1-message marginal of the encoding of 0 could deviate from that of some other input more substantially than from that of input 1, which could significantly affect the acceptance probability. Hence, in the actual proof, we instead pick the value b that minimizes w_b among all numbers in F_q, and use the input vector (b, 0, …, 0).

Next, we turn to the field-dependent lower bound (see Theorem 4.1). The key idea is to show that, for any encoder, there exist distinct inputs x, x' with the same sum such that the statistical distance between the corresponding shuffled output distributions is at least 1 − n^{nm}/q^{n−1} (see Lemma 4). We do so by proving the same lower bound on the average statistical distance between the shuffled output distributions over all pairs x, x' ∈ X_s.

The average statistical distance described above can be written as the sum, over all possible outputs y, of the average difference in probability mass assigned to y by the two output distributions. Thus, we consider how to lower bound this coordinate-wise probability mass difference for an arbitrary y.

There are at most n^{nm} ways to associate each of the nm elements of y with a particular party. Since any individual party’s encoding uniquely determines the corresponding input, it follows that any shuffled output y could have arisen from at most n^{nm} inputs in X_s. Moreover, since there are exactly q^{n−1} input vectors in X_s, it follows that there are at least q^{n−1} − n^{nm} inputs in X_s that cannot possibly result in y as an output. This implies that the average coordinate-wise probability mass difference, over all pairs of inputs in X_s, is at least 1 − n^{nm}/q^{n−1} times the average probability mass assigned to y over all inputs in X_s. Summing this up over all y yields the desired bound.
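
Concretely, writing P_x for the shuffled output distribution on input x and (·)^+ for the positive part, the chain of inequalities in the preceding paragraph can be rendered as follows (a back-of-the-envelope summary under the natural reading of the quantities involved):

\[
\mathop{\mathbb{E}}_{x, x' \in X_s}\bigl[\Delta(P_x, P_{x'})\bigr]
 \;=\; \sum_{y}\, \mathop{\mathbb{E}}_{x, x'}\bigl[(P_x(y) - P_{x'}(y))^{+}\bigr]
 \;\ge\; \Bigl(1 - \frac{n^{nm}}{q^{\,n-1}}\Bigr) \sum_{y}\, \mathop{\mathbb{E}}_{x}\bigl[P_x(y)\bigr]
 \;=\; 1 - \frac{n^{nm}}{q^{\,n-1}},
\]

which is Ω(1), and in particular larger than 2^{-σ}, unless n^{nm} = Ω(q^{n−1}), i.e., unless m = Ω(log q / log n).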

Organization of the Rest of the Paper

We start with some preliminaries in Section 2. We prove our main upper bound (Theorem 1.1) in Section 3. We prove our lower bound (Theorem 1.2) in Section 4. The proof of Corollary 1 appears in Appendix 0.B.

2 Preliminaries

2.1 Protocols

In this paper, we are concerned with the question of how many messages are needed for protocols to achieve certain security or privacy guarantees. We now formally define the notion of protocols in the models of interest to us.

We first define the notion of a secure protocol in the shuffled model. An n-user secure protocol in the shuffled model, P = (Enc, A), consists of a randomized encoder (also known as a local randomizer) Enc : X → S^m and an analyzer A : S^{nm} → Z. Here, S is known as the message alphabet, S^m is the message space for each user, and Z is the output space of the protocol. The protocol implements the following mechanism: each party i holds an input x_i ∈ X and encodes it as Enc(x_i) ∈ S^m. (Note that Enc(x_i) may be random, based on the private randomness of party i.) The concatenation of the encodings, (Enc(x_1), …, Enc(x_n)) ∈ S^{nm}, is then passed to a trusted shuffler, which chooses a uniformly random permutation π on nm elements and applies it to the nm messages. The shuffled output is submitted to the analyzer, which then outputs A(π(Enc(x_1), …, Enc(x_n))).

In this paper, we will be concerned with protocols for aggregation, in which X = Z = F_q (a finite field on q elements) and

A(π(Enc(x_1), …, Enc(x_n))) = x_1 + ⋯ + x_n with probability 1,

i.e., the protocol always outputs the sum of the parties’ inputs, regardless of the randomness of the encoder and the shuffler.

A related notion that we consider in this work is that of a one-round protocol in the anonymized model. The notion is similar to that of a secure protocol in the shuffled model, except that there is no shuffler; rather, the analyzer receives the multiset of messages obtained by pooling together all messages from each of the parties’ encodings. It is straightforward to see that the two models are equivalent, in the sense that a protocol in one model works in the other and the distributions of the view of the analyzer are the same.

2.2 Distributions Related to a Protocol

To study a protocol and determine its security and privacy, it is convenient to introduce notation for several probability distributions related to the protocol. First, we use Enc(x) to denote the distribution of the (random) encoding of an input x:

Definition 1.

For a protocol with encoding function Enc, we let Enc(x) denote the distribution of outputs over S^m obtained by applying the (randomized) encoder Enc to x.

Furthermore, for a vector x = (x_1, …, x_n), we use Enc(x) to denote the distribution of the concatenation of the encodings of x_1, …, x_n, as stated more formally below.

Definition 2.

For an n-party protocol with encoding function Enc and a vector x = (x_1, …, x_n) ∈ X^n, we let Enc(x) denote the distribution over S^{nm} obtained by applying Enc individually to each element of x, i.e., Enc(x) = (Enc(x_1), …, Enc(x_n)).

Finally, we define S^{Enc}(x) to be Enc(x) after random shuffling. Notice that S^{Enc}(x) is the distribution of the transcript seen by the analyzer.

Definition 3.

For an n-party protocol with encoding function Enc and a vector x = (x_1, …, x_n) ∈ X^n, we let S^{Enc}(x) denote the distribution over S^{nm} obtained by applying Enc to the elements of x and then shuffling the resulting nm-tuple, i.e., S^{Enc}(x) is the distribution of

π(Enc(x_1), …, Enc(x_n))

for a uniformly random permutation π over nm elements.

2.3 Security and Privacy

Given two distributions P and Q, we let Δ(P, Q) denote the statistical distance (a.k.a. total variation distance) between P and Q.

We begin with a notion of γ-security for the computation of a function f, which essentially says that distinct inputs with a common function value should be (almost) indistinguishable:

Definition 4 (γ-security).

An n-user one-round protocol P = (Enc, A) in the anonymized model is said to be γ-secure for computing a function f if for any inputs x, x' ∈ X^n such that f(x) = f(x'), we have

Δ(S^{Enc}(x), S^{Enc}(x')) ≤ γ.

In this paper, we will primarily be concerned with the function that sums the inputs of all parties, i.e., f given by f(x_1, …, x_n) = x_1 + ⋯ + x_n (over F_q).

We now define the notion of (ε, δ)-differential privacy. We say that two input vectors x and x' are neighboring if they differ on at most one party’s data, i.e., x_i = x'_i for all but one value of i.

Definition 5 ((ε, δ)-differential privacy).

An algorithm M is (ε, δ)-differentially private if for every pair of neighboring input vectors x, x' and every set S of outputs, we have

Pr[M(x) ∈ S] ≤ e^ε · Pr[M(x') ∈ S] + δ,

where the probability is over the randomness of M.

We now define (ε, δ)-differential privacy specifically in the shuffled model.

Definition 6.

A protocol with encoder Enc is (ε, δ)-differentially private in the shuffled model if the algorithm given by

(x_1, …, x_n) ↦ π(Enc(x_1), …, Enc(x_n))

is (ε, δ)-differentially private, where π is a uniformly random permutation on nm elements.
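
As a quick numerical illustration of Definition 5 (a small helper written for this exposition, not taken from the paper): for distributions over a finite set, the smallest δ for which the (ε, δ) constraint holds against a fixed pair of output distributions has a simple closed form, so the definition can be checked on toy examples.

import math

def hockey_stick_divergence(p, r, eps):
    # Smallest delta such that Pr_p[S] <= e^eps * Pr_r[S] + delta for all events S.
    keys = set(p) | set(r)
    return sum(max(p.get(k, 0.0) - math.exp(eps) * r.get(k, 0.0), 0.0) for k in keys)

# Toy check: two output distributions over {0, 1, 2} arising from neighboring inputs.
p = {0: 0.5, 1: 0.3, 2: 0.2}
r = {0: 0.4, 1: 0.4, 2: 0.2}
print(hockey_stick_divergence(p, r, eps=0.1))  # smallest feasible delta at eps = 0.1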

3 Proof of Theorem 1.1

In this section, we prove Theorem 1.1, i.e., that the split and mix protocol of Ishai et al. is 2^{-σ}-secure even for m = O(1 + (σ + log q)/log n) messages, improving upon the known bounds of [14, 3, 12].

Since we only consider Ishai et al.’s split and mix protocol in this section, we will drop the superscript from S^{Enc}(x) and simply write S(x) to refer to the shuffled output distribution of the protocol. Recall that, by the definition of the protocol, S(x) is generated as follows: for every i ∈ [n], sample y_{i,1}, …, y_{i,m} ∈ F_q uniformly at random conditioned on y_{i,1} + ⋯ + y_{i,m} = x_i. Then, pick a random permutation π of the nm sampled values and output them in the order given by π.

Showing that the protocol is 2^{-σ}-secure is, by definition, equivalent to showing that Δ(S(x), S(x')) ≤ 2^{-σ} for all inputs x, x' ∈ F_q^n such that x_1 + ⋯ + x_n = x'_1 + ⋯ + x'_n.

In fact, we prove a stronger statement: each S(x) is 2^{-σ-1}-close (in statistical distance) to the distribution that is uniform over all vectors in F_q^{nm} whose coordinates sum to x_1 + ⋯ + x_n, as stated below.

Theorem 3.1.

For every s ∈ F_q, let U_s denote the distribution on F_q^{nm} generated uniformly at random conditioned on all coordinates summing to s. For any parameter σ > 0 and any m ≥ C(1 + (σ + log q)/log n), for a sufficiently large absolute constant C, the following holds: for every x ∈ X_s, the statistical distance between S(x) and U_s is at most 2^{-σ-1}.

Plugging in the value of m from Theorem 1.1, Theorem 3.1 immediately implies Theorem 1.1 via the triangle inequality.

We now outline the overall proof approach. First, observe that U_s puts probability mass equally across all vectors whose coordinates sum to s, whereas S(x) puts mass proportional to the number of permutations π such that y_{π((i−1)m+1)} + ⋯ + y_{π(im)} = x_i for all i ∈ [n]. Thus, our task boils down to proving that this latter number of permutations is well-concentrated (for a random y drawn from U_s). We prove this via a second moment method (specifically, Chebyshev’s inequality). Carrying this out amounts to computing the first moment and upper-bounding the second moment of this number. The former is a simple calculation, whereas the latter involves proving an inequality regarding the rank of a certain random matrix (Theorem 3.2). We do so by providing a combinatorial characterization of the rank deficit of the relevant matrices (Lemma 2).

The rest of this section is organized as follows. In Subsection 3.1, we define appropriate random variables, state the bound we want for the second moment (Lemma 1), and show how it implies our main theorem (Theorem 3.1). Then, in Subsection 3.2, we relate the second moment to the rank of a random matrix (Proposition 1). Finally, we give a probabilistic bound on the rank of such a random matrix in Subsection 3.3 (Theorem 3.2).

3.1 Bounding Statistical Distance via Second Moment Method

From now on, let us fix an input vector x = (x_1, …, x_n) ∈ F_q^n and let s = x_1 + ⋯ + x_n. The variables we define below will depend on x (or s) but, for notational convenience, we avoid indicating these dependencies in the variables’ names.

For every y ∈ F_q^{nm}, let N_y denote the number of permutations π of [nm] such that y_{π((i−1)m+1)} + ⋯ + y_{π(im)} = x_i for all i ∈ [n]. (If derived directly from the definition of S(x), π here should be replaced by π^{−1}; the two definitions are equivalent since π ↦ π^{−1} is a bijection on permutations.) From the definition of S(x), its probability mass function is

Pr[S(x) = y] = N_y / ((nm)! · q^{n(m−1)}).    (1)

As stated earlier, Theorem 3.1 is essentially about the concentration of N_y, which we will prove via the second moment method. To facilitate the proof, for every permutation π of [nm], let us also denote by E_π(y) the indicator variable of the event “y_{π((i−1)m+1)} + ⋯ + y_{π(im)} = x_i for all i ∈ [n]”. Note that by definition we have

N_y = ∑_{π ∈ S_{nm}} E_π(y),    (2)

where S_{nm} denotes the set of all permutations of [nm].

When we think of y as a random variable distributed according to U_s, the mean of E_π(y) (and hence of N_y) can be easily computed: the probability that y satisfies “y_{π((i−1)m+1)} + ⋯ + y_{π(im)} = x_i” is exactly 1/q for each i ∈ [n − 1], and these events are independent. Furthermore, when all of these events hold, it is automatically the case that the condition for i = n holds as well. Hence, we immediately have:

Observation 1.

For every π ∈ S_{nm},

E_{y∼U_s}[E_π(y)] = q^{−(n−1)}.    (3)

The more challenging part is upper-bounding the second moment of N_y (where we once again think of y as a random variable drawn from U_s). This is equivalent to upper-bounding the expectation of E_{π_1}(y) · E_{π_2}(y), where π_1 and π_2 are independent uniformly random permutations of [nm] and y is once again drawn from U_s. On this front, we will show the following bound in the next subsections.
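
Spelled out (a schematic rendering of the step just described), expanding N_y via (2) and averaging over independent uniformly random π_1, π_2 gives

\[
\mathbb{E}_{y \sim U_s}\bigl[N_y^2\bigr]
 \;=\; \sum_{\pi_1, \pi_2 \in S_{nm}} \mathbb{E}_{y \sim U_s}\bigl[E_{\pi_1}(y)\, E_{\pi_2}(y)\bigr]
 \;=\; \bigl((nm)!\bigr)^2 \cdot \mathbb{E}_{\pi_1, \pi_2}\Bigl[\mathbb{E}_{y \sim U_s}\bigl[E_{\pi_1}(y)\, E_{\pi_2}(y)\bigr]\Bigr],
\]

so bounding the second moment reduces to bounding E_{y∼U_s}[E_{π_1}(y) E_{π_2}(y)] for a typical pair (π_1, π_2), which is what Proposition 1 and Theorem 3.2 accomplish.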

Lemma 1.

For every x ∈ F_q^n, we have

E_{y∼U_s}[N_y^2] ≤ ((nm)!/q^{n−1})^2 · (1 + η),    (4)

where η = η(n, m, q) is an explicit error term arising from Proposition 1 and Theorem 3.2 below.

The precise form of η is not important for what follows; the only property we need in order to show concentration of N_y is that the right-hand side of (4) is dominated by the ((nm)!/q^{n−1})^2 term, i.e., that the parenthesized factor 1 + η is close to 1. This is indeed the case for m as in Theorem 3.1.

The bound in Lemma 1 will be proved in the subsequent subsections. For now, let us argue why such a bound implies our main theorem (Theorem 3.1).

Proof of Theorem 3.1.

First, notice that (2) and Observation 1 together imply that

E_{y∼U_s}[N_y] = (nm)!/q^{n−1}.    (5)

For convenience, let us define μ as (nm)!/q^{n−1}.

We now bound the second moment of N_y as follows: by Lemma 1, E_{y∼U_s}[N_y^2] ≤ μ^2 (1 + η).

Combining this with (5) gives Var_{y∼U_s}[N_y] = E_{y∼U_s}[N_y^2] − μ^2 ≤ η μ^2.

In other words, N_y is concentrated around its mean μ whenever η is small.

Hence, by Chebyshev’s inequality, we have, for every t ∈ (0, 1),

Pr_{y∼U_s}[N_y ≤ (1 − t) μ] ≤ η / t^2.    (6)

Finally, notice that the statistical distance between S(x) and U_s is

Δ(S(x), U_s) = ∑_y (U_s(y) − Pr[S(x) = y])^+ = E_{y∼U_s}[(1 − N_y/μ)^+] ≤ t + Pr_{y∼U_s}[N_y ≤ (1 − t) μ] ≤ t + η/t^2,

where the middle equality uses (1) and the definition of μ. For m as in the theorem statement, η is small enough that choosing t = 2^{−σ−2} makes the right-hand side at most 2^{−σ−1}, which completes the proof. ∎

3.2 Relating Moments to Rank of Random Matrices

Having shown how Lemma 1 implies our main theorem (Theorem 3.1), we now move on to prove Lemma 1 itself. In this subsection, we deal with the first half of the proof by relating the quantity on the left-hand side of (4) to a quantity involving the rank of a certain random matrix.

3.2.1 Warm-Up: (Re-)Computing the First Moment

As a first step, let us define below a class of matrices that will be used throughout.

Definition 7.

For every permutation π ∈ S_{nm}, let us denote by M_π the n × nm matrix over F_q whose i-th row is the indicator vector of the set {π((i−1)m+1), …, π(im)}. More formally, (M_π)_{i,j} = 1 if j ∈ {π((i−1)m+1), …, π(im)}, and (M_π)_{i,j} = 0 otherwise.

Before we describe how these matrices relate to the second moment, let us illustrate their relation to the first moment by sketching an alternative way to prove Observation 1. To do so, let us rearrange the left-hand side of (3) as

E_{y∼U_s}[E_π(y)] = #{y ∈ F_q^{nm} : y_1 + ⋯ + y_{nm} = s and E_π(y) = 1} / q^{nm−1}.

Now, observe that E_π(y) = 1 iff M_π y = x (viewing x as a column vector; note that M_π y = x already forces y_1 + ⋯ + y_{nm} = s, since the rows of M_π sum to the all-ones vector). Since the rows of the matrix M_π have pairwise-disjoint supports, the matrix is always full rank (over F_q), i.e., rank(M_π) = n. This means that the number of values of y satisfying the aforementioned equation is q^{nm−n}. Plugging this into the above expansion gives

E_{y∼U_s}[E_π(y)] = q^{nm−n} / q^{nm−1} = q^{−(n−1)}.

Hence, we have rederived (3).
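
As a concrete illustration (with an arbitrarily chosen permutation, for exposition only): for n = 2 parties and m = 2 messages, the permutation π of {1, 2, 3, 4} given by π(1) = 3, π(2) = 1, π(3) = 4, π(4) = 2 yields

\[
M_\pi \;=\; \begin{pmatrix} 1 & 0 & 1 & 0 \\ 0 & 1 & 0 & 1 \end{pmatrix},
\]

whose first row is the indicator of {π(1), π(2)} = {1, 3} and whose second row is the indicator of {π(3), π(4)} = {2, 4}; the rows have disjoint supports, so M_π has full rank n = 2.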

3.2.2 Relating Second Moment to Rank

In the previous subsection, we saw the relation of the matrix M_π to the first moment. We will now state an analogous relation for the second moment. Specifically, we will rephrase the left-hand side of (4) as a quantity involving the matrices M_{π_1} and M_{π_2}. To do so, we will need the following additional notation:

Definition 8.

For a pair of permutations π_1, π_2 ∈ S_{nm}, we let M_{π_1,π_2} denote the (column-wise) concatenation of M_{π_1} and M_{π_2}, i.e., the 2n × nm matrix whose first n rows are those of M_{π_1} and whose last n rows are those of M_{π_2}.

Furthermore, let the rank deficit of M_{π_1,π_2} be D(π_1, π_2) := 2n − rank(M_{π_1,π_2}). (Note that D(π_1, π_2) is equal to the corank of M_{π_1,π_2}.)

Analogous to the relationship between the first moment and M_π seen in the previous subsection, the quantity E_{y∼U_s}[E_{π_1}(y) E_{π_2}(y)] is in fact proportional to the number of solutions of a certain system of linear equations, which can be bounded in terms of D(π_1, π_2). This allows us to bound the former, as formalized below.

Proposition 1.

For every pair of permutations π_1, π_2 ∈ S_{nm}, we have

E_{y∼U_s}[E_{π_1}(y) · E_{π_2}(y)] ≤ q^{D(π_1, π_2) − 1} / q^{2(n−1)}.

Proof.

First, let us rearrange the left-hand side term as

E_{y∼U_s}[E_{π_1}(y) E_{π_2}(y)] = #{y ∈ F_q^{nm} : y_1 + ⋯ + y_{nm} = s and E_{π_1}(y) = E_{π_2}(y) = 1} / q^{nm−1}.    (7)

Now, notice that E_{π_1}(y) = 1 iff M_{π_1} y = x. Similarly, E_{π_2}(y) = 1 iff M_{π_2} y = x. In other words, E_{π_1}(y) = E_{π_2}(y) = 1 iff

M_{π_1,π_2} y = (x, x).

The number of solutions y to the above equation is at most q^{nm − rank(M_{π_1,π_2})} = q^{nm − 2n + D(π_1, π_2)}. Plugging this back into (7), we get

E_{y∼U_s}[E_{π_1}(y) E_{π_2}(y)] ≤ q^{nm − 2n + D(π_1, π_2)} / q^{nm − 1} = q^{D(π_1, π_2) − 1} / q^{2(n−1)},

as desired. ∎

3.3 Probabilistic Bound on Rank Deficit of Random Matrices

The final step of our proof is to bound the probability that the rank deficit of M_{π_1,π_2} is large when π_1 and π_2 are random. Such a bound is encapsulated in Theorem 3.2 below. Notice that Proposition 1 and Theorem 3.2 immediately yield Lemma 1.

Theorem 3.2.

For any positive integers n, m, and q, we have

Pr_{π_1, π_2}[D(π_1, π_2) ≥ d] ≤ n^{−Ω(m(d−1))}

for all d ≥ 1, where π_1 and π_2 are independent uniformly random permutations of [nm].

3.3.1 Characterization of Rank Deficit via Matching Partitions.

To prove Theorem 3.2, we first give a “compact” and convenient characterization of the rank deficit of M_{π_1,π_2}. In order to do this, we need some additional notation: we say that a partition P = {P_1, …, P_k} of a universe is non-empty if every part P_ℓ is non-empty. Moreover, for a set T ⊆ [n], we use I_T to denote the set of message indices ∪_{i∈T} {(i−1)m+1, …, im}. Finally, we need the following definition of matching partitions.

Definition 9.

Let π_1, π_2 be any pair of permutations of [nm]. A pair of non-empty partitions P = {P_1, …, P_k} and Q = {Q_1, …, Q_k} of [n], with the same number k of parts, is said to match with respect to (π_1, π_2) iff

π_1(I_{P_ℓ}) = π_2(I_{Q_ℓ})    (8)

for all ℓ ∈ [k]. When π_1, π_2 are clear from the context, we may omit “with respect to (π_1, π_2)” from the terminology.

Condition (8) might look a bit mysterious at first glance. However, there is a very simple equivalent condition in terms of the matrices M_{π_1} and M_{π_2}: P and Q match iff, for every ℓ ∈ [k], the sum of the rows of M_{π_1} indexed by P_ℓ coincides with the sum of the rows of M_{π_2} indexed by Q_ℓ, i.e., ∑_{i∈P_ℓ} (M_{π_1})_i = ∑_{i∈Q_ℓ} (M_{π_2})_i.
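
For a tiny concrete example (in the notation above, with permutations chosen for exposition only): take n = m = 2, let π_1 be the identity permutation of {1, 2, 3, 4}, and let π_2 be given by π_2(1) = 3, π_2(2) = 4, π_2(3) = 1, π_2(4) = 2. Then the pair of partitions P = {{1}, {2}} and Q = {{2}, {1}} matches: party 1’s block under π_1 occupies positions {1, 2}, which is exactly party 2’s block under π_2, and symmetrically for the other part. Correspondingly,

\[
M_{\pi_1,\pi_2} \;=\; \begin{pmatrix} 1 & 1 & 0 & 0 \\ 0 & 0 & 1 & 1 \\ 0 & 0 & 1 & 1 \\ 1 & 1 & 0 & 0 \end{pmatrix},
\]

which has rank 2, so D(π_1, π_2) = 2n − 2 = 2, matching the number of parts of the partitions.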

An easy-to-use equivalence is the following: D(π_1, π_2) ≥ d holds iff there exists a pair of matching partitions P and Q with at least d parts. We only use one direction of this relation, which we prove below.

Lemma 2.

For any permutations π_1, π_2 ∈ S_{nm} and any positive integer d, if D(π_1, π_2) ≥ d, then there exists a pair of matching partitions P and Q with at least d parts.

Proof.

We will prove the contrapositive. Let π_1, π_2 be any permutations, and suppose that there is no pair of matching partitions P and Q with at least d parts. We will show that D(π_1, π_2) < d, or equivalently rank(M_{π_1,π_2}) > 2n − d.

Consider any pair of matching partitions P = {P_1, …, P_k} and Q = {Q_1, …, Q_k} that maximizes the number of parts k. (Note that at least one pair of matching partitions always exists, namely the trivial pair ({[n]}, {[n]}).) From our assumption, we must have k ≤ d − 1.

For every part P_ℓ, let us pick an arbitrary element i_ℓ ∈ P_ℓ. Consider all rows of M_{π_1,π_2}, except the i_ℓ-th rows for all ℓ ∈ [k] (i.e.