The past few years have seen a wave of commercially deployed systems [16, 28] for analysis of users’ sensitive data in the local model of differential privacy (LDP). LDP systems have several features that make them attractive in practice, and limit the barriers to adoption. Each user only sends private data to the data collector, so users do not need to fully trust the collector, and the collector is not saddled with legal or ethical obligations. Moreover, these protocols are relatively simple and scalable, typically requiring each party to asynchronously send just a single short message.
However, the local model imposes strong constraints on the utility of the algorithm. These constraints preclude the most useful differentially private algorithms, which require a central model where the users’ data is sent in the clear, and the data collector is trusted to perform only differentially private computations. Compared to the central model, the local model requires enormous amounts of data, both in theory and in practice (see e.g.  and the discussion in ). Unsurprisingly, the local model has so far only been used by large corporations like Apple and Google with billions of users.
In principle, there is no dilemma between the central and local models, as any algorithm can be implemented without a trusted data collector using cryptographic multiparty computation (MPC). However, despite dramatic recent progress in the area of practical MPC, existing techniques still require large costs in terms of computation, communication, and number of rounds of interaction between the users and data collector, and are considerably more difficult for companies to extend and maintain.
In this work, we initiate the analytic study of an intermediate model for distributed differential privacy called the shuffled model. This model, a special case of the ESA framework of , augments the standard model of local differential privacy with an anonymous channel (also called a shuffler) that collects messages from the users, randomly permutes them, and then forwards them to the data collector for analysis. For certain applications, this model overcomes the limitations on accuracy of local algorithms while preserving many of their desirable features. However, under natural constraints, this model is dramatically weaker than the central model. In more detail, we make two primary contributions:
We give a simple, non-interactive algorithm in the shuffled model for estimating a single Boolean-valued statistical query (also known as a counting query) that essentially matches the error achievable by centralized algorithms. We also show how to extend this algorithm to estimate a bounded real-valued statistical query, albeit at an additional cost in communication. These protocols are sufficient to implement any algorithm in the statistical queries model , which includes methods such as gradient descent.
We consider the ubiquitous variable-selection problem, a simple but canonical optimization problem. Given a set of counting queries, the variable-selection problem is to identify the query with nearly largest value (i.e. an “approximate argmax”). We prove that the sample complexity of variable selection in a natural restriction of the shuffled model is exponentially larger than in the central model. The restriction is that each user send only a single message into the shuffle, as opposed to a set of messages; we call this the one-message shuffled model. Our positive results show that the sample complexity in the shuffled model is polynomially smaller than in the local model. Taken together, our results give evidence that the central, shuffled, and local models are strictly ordered in the accuracy they can achieve for selection. Our lower bounds follow from a structural result showing that any algorithm that is private in the one-message shuffled model is also private in the local model with weak, but non-trivial, parameters.
In concurrent and independent work, Erlingsson et al.  give conceptually similar positive results for local protocols aided by a shuffler. We give a more detailed comparison between our work and theirs after giving a thorough description of the model and our results (Section 2.3).
1.1 Background and Related Work
Models for Differentially Private Algorithms. Differential privacy  is a restriction on the algorithm that processes a dataset to provide statistical summaries or other output. It ensures that, no matter what an attacker learns by interacting with the algorithm, it would have learned nearly the same thing whether or not the dataset contained any particular individual’s data . Differential privacy is now widely studied, and algorithms satisfying the criterion are increasingly deployed [1, 23, 16].
There are two well-studied models for implementing differentially-private algorithms. In the central model, raw data are collected at a central server where they are processed by a differentially private algorithm. In the local model [33, 18, 14], each individual applies a differentially private algorithm locally to their data and shares only the output of the algorithm—called a report or response—with a server that aggregates users’ reports. The local model allows individuals to retain control of their data since privacy guarantees are enforced directly by their devices. It avoids the need for a single, widely-trusted entity and the resulting single point of security failure. The local model has witnessed an explosion of research in recent years, ranging from theoretical work to deployed implementations. A complete survey is beyond the scope of this paper.
Unfortunately, for most tasks there is a large, unavoidable gap between the accuracy that is achievable in the two models. Beimel et al.  and Chan et al.  show that estimating the sum of bits, one held by each player, requires error $\Omega(\sqrt{n}/\varepsilon)$ in the local model, while an error of just $O(1/\varepsilon)$ is possible in the central model.  extended this lower bound to a wide range of natural problems, showing that the error must blow up by at least $\Omega(\sqrt{n})$, and often by an additional factor growing with the data dimension. More abstractly, Kasiviswanathan et al.  showed that the power of the local model is equivalent to the statistical query model  from learning theory. They used this to show an exponential separation between the accuracy and sample complexity of local and central algorithms. Subsequently, an even more natural separation arose for the variable-selection problem [12, 30], which we also consider in this work.
Implementing Central-Model Algorithms in Distributed Models. In principle, one could also use the powerful, general tools of modern cryptography, such as multiparty computation (MPC), or secure function evaluation, to simulate central model algorithms in a setting without a trusted server , but such algorithms currently impose bandwidth and liveness constraints that make them impractical for large deployments. In contrast, Google  now uses local differentially private protocols to collect certain usage statistics from hundreds of millions of users’ devices.
A number of specific, efficient MPC algorithms have been proposed for differentially private functionalities. They generally either (1) focus on simple summations and require a single “semi-honest”/“honest-but-curious” server that aggregates user answers, as in Shi et al. , Chan et al. , Bonawitz et al.  ; or (2) allow general computations, but require a network of servers, a majority of whom are assumed to behave honestly, as in Corrigan-Gibbs and Boneh . As they currently stand, these approaches have a number of drawbacks: they either require users to trust that a server maintained by a service provider is behaving (semi-)honestly, or they require that a coalition of service providers collaborate to run protocols that reveal to each other who their users are and what computations they are performing on their users’ data. It is possible to avoid these issues by combining anonymous communication layers and MPC protocols for universal circuits but, with current techniques, such modifications destroy the efficiency gains relative to generic MPC.
Thus, a natural question—relevant no matter how the state of the art in MPC evolves—is to identify simple (and even minimal) primitives that can be implemented via MPC in a distributed model and are expressive enough to allow for sophisticated private data analysis. In this paper, we show that shuffling is a powerful primitive for differentially private algorithms.
Mixnets. One way to realize the shuffling functionality is via a mixnet. A mix network, or mixnet, is a protocol involving several computers that takes as input a sequence of encrypted messages, and outputs a uniformly random permutation of those messages’ plaintexts. Introduced by , the basic idea now exists in many variations. In its simplest instantiation, the network consists of a sequence of servers, whose identities and ordering are public information. (Variations on this idea based on onion routing allow the user to specify a secret path through a network of mixes.) Messages, each one encrypted with all the servers’ keys, are submitted by users to the first server. Once enough messages have been submitted, each server in turn performs a shuffle in which the server removes one layer of encryption and sends a permutation of the messages to the next server. In a verifiable shuffle, the server also produces a cryptographic proof that the shuffle preserved the multi-set of messages. The final server sends the messages to their final recipients, which might be different for each message. A variety of efficient implementations of mixnets with verifiable shuffles exist (see, e.g., [22, 5] and citations therein).
Another line of work [32, 29] shows how to use differential privacy in addition to mixnets to make communication patterns differentially private for the purposes of anonymous computation. Despite the superficial similarity, this line of work is orthogonal to ours, which is about how to use mixnets themselves to achieve (more accurate) differentially private data analysis.
Shufflers as a Primitive for Private Data Analysis. This paper studies how to use a shuffler (e.g. a mixnet) as a cryptographic primitive to implement differentially-private algorithms. Bittau et al.  propose a general framework, dubbed encode-shuffle-analyze (or ESA), which generalizes the local and central models by allowing a local randomized encoding step performed on user devices, a permutation step in which encrypted encodings are shuffled, and a final randomized process that analyzes the permuted encodings. We ask what privacy guarantee can be provided if we rely only on the local encoding and the shuffle —the analyst is untrusted. In particular, we are interested in protocols that are substantially more accurate than is possible in the local model (in which the privacy guarantee relies entirely on the encoding ). This general question was left open by Bittau et al. .
One may think of the shuffled model as specifying a highly restricted MPC primitive on which we hope to base privacy. Relative to general MPC, the use of mixnets for shuffling provides several advantages: First, there already exist a number of highly efficient implementations. Second, their trust model is simple and robust—as long as a single one of the servers performs its shuffle honestly, the entire process is a uniformly random permutation, and our protocols’ privacy guarantees will hold. The architecture and trust guarantees are also easy to explain to nonexperts (say, with metaphors of shuffled cards or shell games). Finally, mixnets automatically provide a number of additional features that are desirable for data collection: they can maintain secrecy of a company’s user base, since each company’s users could use that company’s server as their first hop; and they can maintain secrecy of the company’s computations, since the specific computation is done by the analyst. Note that we think of a mixnet here as operating on large batches of messages, whose size is denoted by . (In implementation, this requires a fair amount of latency, as the collection point must receive sufficiently many messages before proceeding—see Bittau et al. ).
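The trust claim above (one honest server suffices for a uniformly random permutation) can be checked exactly on a toy example. The following is a minimal sketch, not a mixnet implementation: it ignores encryption entirely, and it assumes that dishonest servers apply fixed permutations chosen independently of the honest server's randomness. The names `HONEST`, `apply_permutation`, and `output_distribution` are hypothetical.

```python
from collections import Counter
from fractions import Fraction
from itertools import permutations

HONEST = "honest"  # hypothetical marker for a server that shuffles uniformly


def apply_permutation(perm, messages):
    """Reorder messages so that output position j holds messages[perm[j]]."""
    return tuple(messages[i] for i in perm)


def output_distribution(messages, servers):
    """Exact distribution over output orderings after a chain of servers.

    Each entry of `servers` is either HONEST (a uniformly random permutation)
    or a fixed permutation tuple chosen by a dishonest server.
    """
    all_perms = list(permutations(range(len(messages))))
    dist = {tuple(messages): Fraction(1)}
    for server in servers:
        choices = all_perms if server == HONEST else [server]
        nxt = Counter()
        for state, prob in dist.items():
            for perm in choices:
                nxt[apply_permutation(perm, state)] += prob / len(choices)
        dist = dict(nxt)
    return dist


# One honest server sandwiched between two servers using fixed permutations.
dist = output_distribution(("a", "b", "c"), [(1, 0, 2), HONEST, (2, 1, 0)])
```

In this toy setting every one of the six orderings of the three messages receives the same probability, regardless of which fixed permutations the other servers use.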
Understanding the possibilities and limitations of shuffled protocols for private data analysis is interesting from both theoretical and practical perspectives. It provides an intermediate abstraction, and we give evidence that it lies strictly between the central and local models. Thus, it sheds light on the minimal cryptographic primitives needed to get the central model’s accuracy. It also provides an attractive platform for near-term deployment , for the reasons listed above.
For the remainder of this paper, we treat the shuffler as an abstract service that randomly permutes a set of messages. We leave a discussion of the many engineering, social, and cryptographic implementation considerations to future work.
2 Overview of Results
The Shuffled Model. In our model, there are users, each with data . Each user applies some encoder to their data and sends the messages . In the one-message shuffled model, each user sends message. The messages are sent to a shuffler that takes these messages and outputs them in a uniformly random order. The shuffled set of messages is then passed through some analyzer to estimate some function . Thus, the protocol consists of the tuple . We say that the algorithm is -differentially private in the shuffled model if the algorithm satisfies -differential privacy. For more detail, see the discussion leading to Definition 3.2.
In contrast to the local model, differential privacy is now a property of all users’ messages, and the may be functions of . However, if an adversary were to inject additional messages, then it would not degrade privacy, provided that those messages are independent of the honest users’ data. Thus, we may replace , in our results, as a lower bound on the number of honest users in the system. For example, if we have a protocol that is private for users, but instead we have users of which we assume at least a fraction are honest, the protocol will continue to satisfy differential privacy.
2.1 Algorithmic Results
Our main result shows how to estimate any bounded, real-valued linear statistic (a statistical query) in the shuffled model with error that nearly matches the best possible utility achievable in the central model. For every , and every and every function , there is a protocol in the shuffled model that is -differentially private, and for every and every ,
Each user sends one-bit messages.
For comparison, in the central model, the Laplace mechanism achieves $(\varepsilon, 0)$-differential privacy and error $O(1/\varepsilon)$. In contrast, error $\Omega(\sqrt{n}/\varepsilon)$ is necessary in the local model. Thus, for answering statistical queries, this protocol essentially has the best properties of the local and central models (up to logarithmic factors).
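For reference, the central-model baseline can be sketched as follows. This is a minimal illustration of the standard Laplace mechanism applied to a sum of bits, assuming a trusted collector that sees the raw data; the function names and the sampling method (difference of two exponentials) are choices made for this sketch and are not taken from the paper.

```python
import random


def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def central_bit_sum(bits, epsilon, rng):
    """Release sum(bits) with Laplace noise of scale 1/epsilon.

    Changing one user's bit changes the sum by at most 1 (sensitivity 1),
    so noise of scale 1/epsilon gives pure epsilon-differential privacy.
    """
    return sum(bits) + laplace_noise(1.0 / epsilon, rng)


rng = random.Random(0)
bits = [rng.randint(0, 1) for _ in range(1000)]
errors = [abs(central_bit_sum(bits, 1.0, rng) - sum(bits)) for _ in range(20000)]
mean_abs_error = sum(errors) / len(errors)
```

The expected absolute error equals the noise scale, so it does not grow with the number of users.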
In the special case of estimating a sum of bits (or a Boolean-valued linear statistic), our protocol has a slightly nicer guarantee and form. For every , and every and every function , there is a protocol in the shuffled model that is -differentially private, and for every and every ,
Each user sends a single one-bit message.
The protocol corresponding to Theorem 2.1 is extremely simple:
For some appropriate choice of , each user with input outputs
with probability and a uniformly random bit with probability . When is not too small, .
The analyzer collects the shuffled messages and outputs
Intuition. In the local model, an adversary can map the set of observations to users. Thus, to achieve -differential privacy, the parameter should be set close to . In our model, the attacker sees only the anonymized set of observations , whose distribution can be simulated using only . Hence, to ensure that the protocol is differentially private, it suffices to ensure that is private, which we show holds for .
Communication Complexity. Our protocol for real-valued queries requires bits per user. In contrast, the local model requires just a single bit, but incurs error . A generalization of Theorem 2.1 gives error and sends bits per user, but we do not know if this tradeoff is necessary. Closing this gap is an interesting open question.
2.2 Negative Results
We also prove negative results for algorithms in the one-message shuffled model. These results hinge on a structural characterization of private protocols in the one-message shuffled model.
If a protocol satisfies -differential privacy in the one-message shuffled model, then satisfies -differential privacy. Therefore, is -differentially private in the local model.
Using Theorem 2.2 (and a transformation of  from -DP to -DP in the local model), we can leverage existing lower bounds for algorithms in the local model to obtain lower bounds on algorithms in the shuffled model.
Variable Selection. In particular, consider the following variable selection problem: given a dataset , output such that
(The approximation term is somewhat arbitrary—any sufficiently small constant fraction of will lead to the same lower bounds and separations.)
Any local algorithm (with ) for selection requires , whereas in the central model the exponential mechanism  solves this problem for . The following lower bound shows that for this ubiquitous problem, the one-message shuffled model cannot match the central model.
If is a -differentially private protocol in the one-message shuffled model that solves the selection problem (with high probability) then . Moreover this lower bound holds even if is drawn iid from a product distribution over .
In Section 6, we also prove lower bounds for the well studied histogram problem, showing that any one-message shuffled-model protocol for this problem must have error growing (polylogarithmically) with the size of the data domain. In contrast, in the central model it is possible to release histograms with no dependence on the domain size, even for infinite domains.
We remark that our lower bound proofs do not apply if the algorithm sends multiple messages through the shuffler. However, we do not know whether beating the bounds is actually possible. Applying our bit-sum protocol times (together with differential privacy’s composition property) shows that samples suffice in the general shuffled model. We also do not know if this bound can be improved. We leave it as an interesting direction for future work to fully characterize the power of the shuffled model.
2.3 Comparison to 
In concurrent and independent work, Erlingsson et al.  give conceptually similar positive results for local protocols aided by a shuffler. Specifically, they prove a general amplification result: adding a shuffler to any protocol satisfying local differential privacy improves the privacy parameters, often quite significantly. This amplification result can be seen as a partial converse to our transformation from shuffled protocols to local protocols (Theorem 2.2).
Their result applies to any local protocol, whereas our protocol for bit-sums (Theorem 2.1) applies specifically to the one-bit randomized response protocol. However, when specialized to randomized response, their result is quantitatively weaker than ours. As stated, their results only apply to local protocols that satisfy -differential privacy for . In contrast, the proof of Theorem 2.1 shows that, for randomized response, local differential privacy can be amplified to . Our best attempt at generalizing their proof to the case of does not give any amplification for local protocols with . Specifically, our best attempt at applying their method to the case of randomized response yields a shuffled protocol that is -differentially private and has error , which is just slightly better than the error that can be achieved without a shuffler.
3 Model and Preliminaries
In this section, we define terms and notation used throughout the paper. We use $\mathrm{Ber}(p)$ to denote the Bernoulli distribution over $\{0,1\}$, which has value 1 with probability $p$ and 0 with probability $1-p$. We will use $\mathrm{Bin}(n,p)$ to denote the binomial distribution (i.e. the sum of $n$ independent samples from $\mathrm{Ber}(p)$).
3.1 Differential Privacy
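Let $X = (x_1, \dots, x_n) \in \mathcal{X}^n$ be a dataset consisting of elements from some universe $\mathcal{X}$. We say two datasets $X, X'$ are neighboring if they differ on at most one user’s data, and denote this $X \sim X'$.
[Differential Privacy ] An algorithm $M : \mathcal{X}^n \to \mathcal{Z}$ is $(\varepsilon, \delta)$-differentially private if for every $X \sim X' \in \mathcal{X}^n$ and every $T \subseteq \mathcal{Z}$
$$\Pr[M(X) \in T] \le e^{\varepsilon} \, \Pr[M(X') \in T] + \delta,$$
where the probability is taken over the randomness of $M$.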
Differential privacy satisfies two extremely useful properties:
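[Post-Processing ] If an algorithm $M$ is $(\varepsilon, \delta)$-differentially private, then for every function $A$, the composition $A \circ M$ is $(\varepsilon, \delta)$-differentially private.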
3.2 Differential Privacy in the Shuffled Model
In our model, there are users, each of whom holds data . We will use to denote the dataset of all users’ data. We say two datasets are neighboring if they differ on at most one user’s data, and denote this .
The protocols we consider consist of three algorithms:
is a randomized encoder that takes as input a single user’s data and outputs a set of messages . If , then is in the one-message shuffled model.
is a shuffler that takes a set of messages and outputs these messages in a uniformly random order. Specifically, on input , chooses a uniformly random permutation and outputs .
is some analysis function or analyzer that takes a set of messages and attempts to estimate some function from these messages.
We denote the overall protocol . The mechanism by which we achieve privacy is
where both and are randomized. We will use to denote the output of the protocol. However, by the post-processing property of differential privacy (Lemma 3.1), it will suffice to consider the privacy of , which will imply the privacy of . We are now ready to define differential privacy for protocols in the shuffled model.
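The three-part structure described above can be sketched generically. This is an illustrative skeleton only: the names `shuffle_messages` and `run_protocol`, the explicit `rng` argument, and the toy encoder are assumptions made for this sketch. The shuffler here is an in-process permutation rather than a cryptographic service.

```python
import random
from typing import Callable, List, Sequence, TypeVar

X = TypeVar("X")
Y = TypeVar("Y")
Z = TypeVar("Z")


def shuffle_messages(messages: Sequence[Y], rng: random.Random) -> List[Y]:
    """Shuffler: output the messages in a uniformly random order."""
    shuffled = list(messages)
    rng.shuffle(shuffled)
    return shuffled


def run_protocol(
    encoder: Callable[[X, random.Random], List[Y]],
    analyzer: Callable[[List[Y]], Z],
    data: Sequence[X],
    rng: random.Random,
) -> Z:
    """Encode each user's data, pool and shuffle all messages, then analyze."""
    pooled = [msg for x in data for msg in encoder(x, rng)]
    return analyzer(shuffle_messages(pooled, rng))


# Toy usage: an encoder that sends the input as a single message (no privacy
# at all) and an analyzer that sums the shuffled messages.
def identity_encoder(x, rng):
    return [x]


count = run_protocol(identity_encoder, sum, [1, 0, 1, 1], random.Random(1))
```

Because the analyzer only ever receives the shuffled pool, any privacy guarantee has to come from the encoder and the shuffle together, which is the point of analyzing the shuffled messages rather than the analyzer's output.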
[Differential Privacy in the Shuffled Model] A protocol is -differentially private if the algorithm is -differentially private (Definition 3.1).
In this model, privacy is a property of the entire set of users’ messages and of the shuffler, and thus may depend on the number of users . When we wish to refer to or with a specific number of users , we will denote this by or .
We remark that if an adversary were to inject additional messages, then it would not degrade privacy, provided that those messages are independent of the honest users’ data. Thus, we may replace , in our results, with an assumed lower bound on the number of honest users in the system.
In some of our results it will be useful to have a generic notion of accuracy for a protocol . [Accuracy of Distributed Protocols] Protocol is -accurate for the function if, for every , we have where is some application-dependent distance measure.
As with the privacy guarantees, the accuracy of the protocol may depend on the number of users , and we will use when we want to refer to the protocol with a specific number of users.
Composition of Differential Privacy. We will use the following useful composition property for protocols in the shuffled model, which is an immediate consequence of Lemma 3.1 and the post-processing Lemma 3.1. This lemma allows us to directly compose protocols in the shuffled model while only using the shuffler once, rather than using the shuffler independently for each protocol being composed. [Composition of Protocols in the Shuffled Model] If for are each -differentially private in the shuffled model, and is defined as
then, for every , the composed protocol is -differentially private in the shuffled model for .
3.2.1 Local Differential Privacy
If the shuffler were replaced with the identity function (i.e. if it did not randomly permute the messages) then we would be left with exactly the local model of differential privacy. That is, a locally differentially private protocol is a pair of algorithms , and the output of the protocol is . A protocol is differentially private in the local model if and only if the algorithm is differentially private. In Section 6 we will see that if is a differentially private protocol in the one-message shuffled model, then itself must satisfy local differential privacy for non-trivial , and thus is a differentially private local protocol for the same problem.
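To make the local guarantee concrete, consider a one-bit encoder that replaces the user's bit by a uniformly random bit with some probability and otherwise reports it unchanged. The sketch below computes the exact local privacy parameter of that encoder, with no shuffler. The symbol `p` for the randomization probability and the function name `local_epsilon` are assumptions made for this illustration.

```python
import math


def local_epsilon(p):
    """Exact epsilon of the one-bit encoder that, with probability p, replaces
    the input bit by a uniformly random bit and otherwise reports it as is."""
    if not 0.0 < p <= 1.0:
        raise ValueError("p must lie in (0, 1]")
    keep = 1.0 - p / 2.0  # probability the report equals the input bit
    flip = p / 2.0        # probability the report differs from the input bit
    # The worst-case likelihood ratio between the two possible inputs.
    return math.log(keep / flip)
```

For example, `local_epsilon(0.5)` is `log(3)`, and the value grows without bound as `p` approaches 0, which is why an encoder with little randomization offers only a weak guarantee on its own.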
4 A Protocol for Boolean Sums
In this section we describe and analyze a protocol for computing a sum of bits, establishing Theorem 2.1 in the introduction.
4.1 The Protocol
In our model, the data domain is and the function being computed is . Our protocol, , is specified by a parameter that allows us to trade off the level of privacy and accuracy. Note that may be a function of the number of users . We will discuss in Section 4.3 how to set this parameter to achieve a desired level of privacy. For intuition, one may wish to think of the parameter when is not too small.
The basic outline of is as follows. Roughly, a random set of users will choose randomly, and the remaining will choose to be their input bit . The output of each user is the single message . The outputs are then shuffled and the output of the protocol is the sum of the messages, shifted and scaled so that it is an unbiased estimator of the sum of the input bits.
The protocol is described in Algorithm 1. The full name of this protocol is , where the superscript serves to distinguish it from the real sum protocol (Section 5). Because of the clear context of this section, we drop the superscript. Since the analysis of both the privacy and accuracy of the algorithm will depend on the number of users , we will use to denote the protocol and its components in the case where the number of users is .
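The outline above can be sketched in a few lines. This is a hedged illustration rather than a transcription of Algorithm 1: it parameterizes the encoder by a per-user randomization probability `p` (an assumption of this sketch), and the debiasing constants are derived from the requirement that the estimate be unbiased.

```python
import random


def encode_bit(x, p, rng):
    """With probability p send a uniformly random bit, otherwise send x."""
    if rng.random() < p:
        return rng.randint(0, 1)
    return x


def analyze(messages, p):
    """Debias the message sum, using E[sum(y)] = (1 - p) * sum(x) + p * n / 2.

    Requires p < 1, since at p = 1 the messages carry no information.
    """
    n = len(messages)
    return (sum(messages) - p * n / 2.0) / (1.0 - p)


def bit_sum_protocol(bits, p, rng):
    """Encode every bit, shuffle the messages, and return the debiased sum."""
    messages = [encode_bit(x, p, rng) for x in bits]
    rng.shuffle(messages)  # the shuffler: uniformly random order
    return analyze(messages, p)


rng = random.Random(0)
bits = [1] * 600 + [0] * 400
estimate = bit_sum_protocol(bits, 0.1, rng)
```

Since the analyzer only uses the sum of the messages, the shuffle does not change the estimate; it only changes what an observer of the messages can attribute to each user.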
4.2 Privacy Analysis
In this section we will prove that satisfies -differential privacy. Note that if then each user’s output is independent of their input, so the protocol trivially satisfies -differential privacy, and thus our goal is to prove an upper bound on the parameter that suffices to achieve a given .
[Privacy of ] There are absolute constants such that the following holds for . For every , and , there exists a such that is differentially private and,
In the remainder of this section we will prove Theorem 4.2.
The first step in the proof is the observation that the output of the shuffler depends only on . It will be more convenient to analyze the algorithm (Algorithm 2) that simulates . Claim 4.2 shows that the output distribution of is indeed the same as that of the output . Therefore, privacy of carries over to .
For every , , and every ,
Fix any .
Let denote the (random) set of people for whom in . Notice that
which is the same as (1). This concludes the proof. ∎
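The claim can be sanity-checked numerically for small parameters. The sketch below computes two exact distributions and compares them: the sum of the messages in the real protocol, and the output of a simulator that first draws the number of randomizing users, then a uniformly random set of that size, and finally adds fresh coin flips. Algorithm 2 is not reproduced here, so the simulator's exact shape, the probability `p`, and all function names are assumptions of this sketch.

```python
from math import comb


def binomial_pmf(n, q):
    """pmf of Bin(n, q) as a list indexed by the outcome."""
    return [comb(n, t) * q**t * (1 - q) ** (n - t) for t in range(n + 1)]


def real_sum_pmf(n, k, p):
    """pmf of sum(y) when k of the n inputs are 1 and each user independently
    sends a uniform bit with probability p and their own bit otherwise."""
    ones = binomial_pmf(k, 1 - p / 2)   # a user holding 1 reports 1 w.p. 1 - p/2
    zeros = binomial_pmf(n - k, p / 2)  # a user holding 0 reports 1 w.p. p/2
    pmf = [0.0] * (n + 1)
    for a, pa in enumerate(ones):
        for b, pb in enumerate(zeros):
            pmf[a + b] += pa * pb
    return pmf


def simulated_sum_pmf(n, k, p):
    """pmf of the simulator: draw s ~ Bin(n, p), a uniform size-s set H of
    randomizing users, then output (ones outside H) + Bin(s, 1/2)."""
    pmf = [0.0] * (n + 1)
    for s, ps in enumerate(binomial_pmf(n, p)):
        coins = binomial_pmf(s, 0.5)
        for j in range(max(0, s - (n - k)), min(k, s) + 1):
            # j = number of users holding 1 that land in H (hypergeometric).
            pj = comb(k, j) * comb(n - k, s - j) / comb(n, s)
            for t, pt in enumerate(coins):
                pmf[k - j + t] += ps * pj * pt
    return pmf


real = real_sum_pmf(8, 3, 0.3)
simulated = simulated_sum_pmf(8, 3, 0.3)
max_gap = max(abs(a - b) for a, b in zip(real, simulated))
```

The two distributions agree up to floating-point rounding, which matches the intuition that the randomizing users form a uniformly random set whose size is binomial.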
Now we establish that in order to demonstrate privacy of , it suffices to analyze . If is differentially private, then is differentially private.
Fix any number of users . Consider the randomized algorithm that takes a number and outputs a uniformly random string that has ones. If is differentially private, then the output of is differentially private by the post-processing lemma.
We will analyze the privacy of in three steps. First we show that for any sufficiently large , the final step (encapsulated by Algorithm 3) will ensure differential privacy for some parameters. We then show that for any sufficiently large value and chosen randomly with , the privacy parameters actually improve significantly in the regime where is close to ; this sampling of is performed by Algorithm 4. Finally, we show that when is chosen randomly then is sufficiently large with high probability.
For any and any such that , is -differentially private for
Fix neighboring datasets , any such that , and any . If the point at which differ lies within , the two distributions are identical. Hence, without loss of generality we assume that and for some .
Define and so that by Hoeffding’s inequality (Theorem E), . For any we have,
Thus to complete the proof, it suffices to show that for any and
Because and , we have . Thus,
Now we define so that
Then we can calculate
|( is binomial)|
|( so )|
which completes the proof. ∎
Next, we consider the case where is a random subset of with a fixed size . In this case we will use an amplification via sampling argument [20, 26] to argue that the randomness of improves the privacy parameters by a factor of roughly , which will be crucial when .
For any and any , is differentially private for
As in the previous section, fix where . selects uniformly from and runs ; let denote the realization of . To enhance readability, we will use the shorthand . For any , we aim to show that
First, we have
When user outputs a uniformly random bit, their private value has no impact on the distribution. Hence, , and
Since is sufficiently large, by Claim 4.2 we have .
Observe that , so
which completes the proof. ∎
We now come to the actual algorithm , where is not fixed but is random. The analysis of yields a bound on the privacy parameter that increases with , so we will complete the analysis of by using the fact that, with high probability, is almost as large as .
For any and , is differentially private where
The proof is in Appendix A.