Separating Local & Shuffled Differential Privacy via Histograms

11/15/2019 · by Victor Balcer, et al.

Recent work in differential privacy has highlighted the shuffled model as a promising avenue to compute accurate statistics while keeping raw data in users' hands. We present a protocol in this model that estimates histograms with error independent of the domain size. This implies an arbitrarily large gap in sample complexity between the shuffled and local models. On the other hand, the models are equivalent when we impose the constraints of pure differential privacy and single-message randomizers.



1 Introduction

The local model [KLNRS08] of differentially private computation has minimal trust assumptions: each user executes a privacy-preserving algorithm on their data and sends the output message to an analyzer. While this model has appeal to the users (their data is never shared in the clear), the analyzer receives very noisy signals. For example, the computation of d-bin histograms in the local model incurs error that grows with d [BS15]. But when users trust the analyzer with their raw data (the central model), there is an algorithm whose error is independent of d [BNS16].

Because the local and central models lie at the extremes of trust, recent work has focused on the intermediate shuffled model [BittauEMMRLRKTS17, CSU+19]. A protocol in this model has each user execute a randomized algorithm on their data, which generates a vector of messages. A trusted shuffler applies a uniformly random permutation to the messages of all users. The anonymity provided by the shuffler allows users to introduce less noise than in the local model (to achieve the same level of privacy). This prompts the following questions:

How well separated is the shuffled model from the local model?

How close is the shuffled model to the central model?

1.1 Our Results

In this work, we provide a new protocol for histograms in the shuffled model. It achieves a bound on error that is independent of the number of bins d (Section 3). Table 1 presents existing results alongside ours: all previous bounds on error in the shuffled model depend on d.

For data drawn from an underlying distribution, the results of [CSU+19, BBGN19, GPV19] all imply a polynomial separation in sample complexity between the local and shuffled models. (For example, [CSU+19] implies a sample complexity for Bernoulli mean estimation in the shuffled model that is polynomially smaller than the lower bound for the local model [BNO08].) In Section 3.3, we show that our histogram protocol implies a much stronger result:

Theorem 1.1 (Informal).

Under approximate differential privacy, the separation in sample complexity between the local and shuffled models can be made arbitrarily large.

We also prove that some problems require polynomially more samples in the sequentially interactive local model than in the shuffled model. In Section 4, we complement Theorem 1.1 with a proof that the shuffled model collapses to the local model under more constrained settings:

Theorem 1.2 (Informal).

Under pure differential privacy with single-message randomizers, the shuffled model is equivalent to the local model.

Model    | No. Messages / User | Source
Local    | 1                   | [BS15]
Shuffled |                     | [CSU+19]; [GGK+19] (w.h.p.); [GGK+19]; [This Paper]
Central  | N/A                 | [DMNS06, BNS16, BBKN14, HT10]

Table 1: Comparison of results for the histogram problem. To simplify the presentation, we assume constant success probability.

2 Preliminaries

We define a dataset x ∈ X^n to be an ordered tuple of n rows, where each row is drawn from a data universe X and corresponds to the data of one user. Two datasets x, x′ are considered neighbors (denoted x ∼ x′) if they differ in exactly one row.

Definition 2.1 (Differential Privacy [DMNS06]).

An algorithm M satisfies (ε, δ)-differential privacy if, for every pair of neighboring datasets x ∼ x′ and every set of outputs Y,

Pr[M(x) ∈ Y] ≤ e^ε · Pr[M(x′) ∈ Y] + δ.

We say an (ε, δ)-differentially private algorithm satisfies pure differential privacy when δ = 0 and approximate differential privacy when δ > 0. For pure differential privacy, we may omit the δ parameter and simply write ε-differential privacy.

Definition 2.2 (Local Model [KLNRS08]).

A protocol in the non-interactive local model (the literature also includes interactive variants; see [JMNR19] for definitions of sequential and full interactivity) consists of two algorithms:

  • A randomizer R that takes as input a single user’s data and outputs a message.

  • An analyzer A that takes as input all user messages and computes the output of the protocol.

We denote the protocol P = (R, A). We assume that the number of users n is public and available to both R and A. For ease of notation, we overload R and write R(x) = (R(x_1), …, R(x_n)). The evaluation of the protocol on input x is P(x) = A(R(x)).

Definition 2.3 (Differential Privacy for Local Protocols).

A local protocol P = (R, A) satisfies (ε, δ)-differential privacy for n users if its randomizer R is (ε, δ)-differentially private.

Definition 2.4 (Shuffled Model [BittauEMMRLRKTS17, CSU+19]).

A protocol in the shuffled model consists of three algorithms:

  • A randomizer R that takes as input a single user’s data and outputs a vector of messages whose length may be randomized. If, on every input, the probability of sending exactly one message is 1, then the protocol is said to be single-message.

  • A shuffler S that concatenates all message vectors and then applies a uniformly random permutation. For example, when there are three users each sending two messages, there are 6! = 720 permutations of the six messages, and all are equally likely to be the output of the shuffler.

  • An analyzer A that takes a permutation of messages and generates the output of the protocol.

As in the local model, we denote the protocol P = (R, S, A) and assume that the number of users n is accessible to both R and A. The evaluation of the protocol on input x is P(x) = A(S(R(x_1), …, R(x_n))).
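The composition above can be sketched in a few lines of code. This is an illustration of the execution model only (the function names are ours, not the paper's); the example randomizer and analyzer stand in for arbitrary R and A.

```python
import random

def run_shuffled_protocol(randomizer, analyzer, data, rng=None):
    """Evaluate P(x) = A(S(R(x_1), ..., R(x_n))).

    Each user applies the randomizer to their own row; the shuffler
    concatenates all message vectors and applies a uniformly random
    permutation; the analyzer sees only the shuffled messages.
    """
    rng = rng or random.Random(0)
    messages = []
    for row in data:
        messages.extend(randomizer(row))  # R may emit several messages
    rng.shuffle(messages)                 # S: uniformly random permutation
    return analyzer(messages)

# Example: each user reports their bit once and the analyzer sums.
# The shuffle destroys the link between messages and senders, but a
# permutation-invariant analyzer such as sum is unaffected.
result = run_shuffled_protocol(lambda x: [x], sum, [1, 0, 1, 1])  # result == 3
```

Note that the analyzer receives only the permuted multiset of messages; any protocol whose analyzer depends on message order cannot gain information, which is the source of the model's privacy amplification.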

Definition 2.5 (Differential Privacy for Shuffled Protocols [CSU+19]).

A shuffled protocol P = (R, S, A) satisfies (ε, δ)-differential privacy for n users if the algorithm (S ∘ R)(x) = S(R(x_1), …, R(x_n)) is (ε, δ)-differentially private.

For any positive integer d, let [d] denote the set {1, …, d}. For any j ∈ [d], we define the function c_j : X^n → [0, 1] as the normalized count of j in the input:

c_j(x) = |{i : x_i = j}| / n.

We use histogram to refer to the vector of normalized counts (c_1(x), …, c_d(x)). For measuring the accuracy of a histogram protocol P, we use the following metrics:

Definition 2.6.

A histogram protocol P has (α, β)-per-query accuracy if, for every input x and every j ∈ [d],

Pr[ |P(x)_j − c_j(x)| ≤ α ] ≥ 1 − β.

Definition 2.7.

A histogram protocol P has (α, β)-simultaneous accuracy if, for every input x,

Pr[ ∀ j ∈ [d], |P(x)_j − c_j(x)| ≤ α ] ≥ 1 − β.
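The distinction between the two metrics can be made concrete: per-query accuracy bounds each bin's error separately, while simultaneous accuracy bounds the maximum error across all bins at once. A small sketch (ours, for illustration only):

```python
def per_query_errors(estimate, truth):
    """Absolute error on each bin separately; per-query accuracy
    bounds each of these individually with probability 1 - beta."""
    return [abs(e - t) for e, t in zip(estimate, truth)]

def simultaneous_error(estimate, truth):
    """Max error over all bins at once; simultaneous accuracy bounds
    this single quantity with probability 1 - beta."""
    return max(per_query_errors(estimate, truth))

truth = [0.5, 0.25, 0.25, 0.0]
estimate = [0.48, 0.27, 0.25, 0.0]
# Here every per-query error is at most 0.02, so the simultaneous
# error is also 0.02 (up to floating-point rounding).
```

Simultaneous accuracy is the stronger requirement: a protocol with (α, β)-simultaneous accuracy has (α, β)-per-query accuracy, but the converse requires a union bound over the d bins.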

3 The Power of Multiple Messages for Histograms

In this section, we present an (ε, δ)-differentially private histogram protocol in the multi-message shuffled model whose simultaneous accuracy does not depend on the universe size. We start by presenting a private protocol for releasing a single count that always outputs 0 if the true count is 0 and otherwise outputs a noisy estimate. The histogram protocol uses this counting protocol to estimate the frequency of every domain element. Its simultaneous error is the maximum noise introduced to the nonzero counts: there are at most n of these, and n may be much smaller than d.

3.1 A Two-Message Protocol for Binary Sums

In the counting protocol (Figure 1), each user adds their bit x ∈ {0, 1} to a Bernoulli value b ∼ Ber(p) and reports a vector of that length; the contents of each vector are copies of the message 1. Because the shuffler only reports a uniformly random permutation, the observable information is equivalent to a noisy sum: the noise is distributed as Bin(n, p), where p is chosen so that there is sufficient variance to ensure (ε, δ)-differential privacy.

To streamline the presentation and analysis, we assume the parameters ε, δ, and n are such that p ∈ (0, 1). Privacy holds for a broader parameter regime when p is set to a different function of ε, δ, and n; we refer the interested reader to Theorem 4.11 of [CSU+19].

Randomizer (counting protocol), on input x ∈ {0, 1}:

  1. Let p be the parameter fixed above.

  2. Sample b ∼ Ber(p).

  3. Output a vector of x + b copies of the message 1.

Analyzer (counting protocol), on the shuffled messages:

  1. Let y be the number of messages received.

  2. Let z = (y − n) / n.

  3. Output max(z, 0).

Figure 1: The pseudocode for the counting protocol, a private shuffled protocol for normalized binary sums
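A runnable sketch of the counting protocol follows. The function names are ours, and the constant p, which the analysis pins down as a function of n, ε, and δ, is left here as a free parameter; the estimator clips (y − n)/n at zero, which is what yields the zero-preservation property discussed above, at the price of underestimating each count by roughly 1 − p.

```python
import random

def count_randomizer(x, p, rng):
    """User with bit x sends x + b copies of the message 1,
    where b ~ Ber(p): at most two one-bit messages."""
    b = 1 if rng.random() < p else 0
    return [1] * (x + b)

def count_analyzer(num_messages, n):
    """Shift the observed count by n and clip at zero. An all-zero
    input gives y ~ Bin(n, p) <= n, so the output is 0 w.p. 1."""
    return max(0.0, (num_messages - n) / n)

def run_count(data, p, rng=None):
    rng = rng or random.Random(1)
    messages = []
    for x in data:
        messages += count_randomizer(x, p, rng)
    rng.shuffle(messages)  # the shuffler hides who sent what
    return count_analyzer(len(messages), len(data))
```

For example, with p = 1 every user sends exactly one noise message, the bias vanishes, and `run_count([1, 0, 1, 1], 1.0)` returns the exact normalized sum 0.75; any all-zero input returns exactly 0.0 for every p < 1.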
Theorem 3.1.

For any ε, δ ∈ (0, 1) and any n such that the parameter p of Figure 1 lies in (0, 1), the counting protocol has the following properties:

  i. It is (ε, δ)-differentially private in the shuffled model.

  ii. For any β > 0, the error is at most α with probability at least 1 − β, where α depends on n, ε, δ, and β.

  iii. If the true count Σ_i x_i is 0, then the output is 0 with probability 1.

  iv. Each user sends at most two one-bit messages.

Proof of Part i.

Let b_i denote the random bit generated by the i-th user, so that the total number of messages is y = Σ_i (x_i + b_i). Observe that learning y suffices to reconstruct the output of the shuffler, since all messages have the same value. Thus, the privacy of this protocol is equivalent to the privacy of releasing y.

By post-processing, it suffices to show the privacy of x ↦ Σ_i x_i + η, where η ∼ Bin(n, p). Because privacy follows almost immediately from technical claims in [GGK+19], we defer the proof to Appendix A. ∎

Proof of Part ii.

Fix any β > 0. For shorthand, write c = (1/n) Σ_i x_i, so that y = nc + Σ_i b_i. A Chernoff bound implies that the following event E occurs with probability at least 1 − β: the binomial noise Σ_i b_i deviates from its mean np by at most n(α − (1 − p)). The remainder of the proof assumes E has occurred.

We perform a case analysis. If y > n, then the output is z = (y − n)/n, and by construction the error is |z − c| = (n − Σ_i b_i)/n, which under E is at most α.

If y ≤ n, then the output is 0 and the error is exactly c. We argue that E implies c ≤ α. By construction, nc = y − Σ_i b_i ≤ n − Σ_i b_i, and under E the right-hand side is at most nα. Rearranging terms yields c ≤ α, which concludes the proof. ∎

Proof of Part iii.

If Σ_i x_i = 0, then y is drawn from Bin(n, p), which implies y ≤ n with probability 1. Hence, max((y − n)/n, 0) = 0. ∎

3.2 A Multi-Message Protocol for Histograms

In the histogram protocol (Figure 2), each user encodes their data x ∈ [d] as a one-hot vector (y_1, …, y_d), where y_j = 1 if and only if x = j. The counting protocol is then executed on each coordinate, with all d executions done in one round of shuffling. To remove ambiguity between executions, each message in execution j has value j.

Randomizer (histogram protocol), on input x ∈ [d]:

  1. For each j ∈ [d], let y_j = 1 if x = j (and 0 otherwise), and let m_j be the output of the counting randomizer on y_j, with each message given the value j.

  2. Output the concatenation of all m_j.

Analyzer (histogram protocol), on the shuffled messages:

  1. For each j ∈ [d], collect all messages of value j, then compute z_j by running the counting analyzer on them.

  2. Output (z_1, …, z_d).

Figure 2: The pseudocode for the histogram protocol, a private shuffled protocol for histograms
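The steps of Figure 2 can be sketched end to end. As before, the names and the free parameter p are ours, and the clipped estimator max(0, (count − n)/n) is the assumed per-bin analyzer; the bin tag j plays the disambiguating role described in the text.

```python
import random

def hist_randomizer(x, d, p, rng):
    """Run the binary-sum randomizer once per bin j in [d], tagging
    every message with j so the analyzer can split the executions."""
    msgs = []
    for j in range(d):
        bit = 1 if x == j else 0          # one-hot encoding of x
        b = 1 if rng.random() < p else 0  # Bernoulli noise for bin j
        msgs += [j] * (bit + b)
    return msgs

def hist_analyzer(messages, n, d):
    """Per bin: count the tagged messages, shift by n, clip at zero,
    so bins held by no user are estimated as exactly 0."""
    counts = [0] * d
    for m in messages:
        counts[m] += 1
    return [max(0.0, (c - n) / n) for c in counts]

def run_hist(data, d, p, rng=None):
    rng = rng or random.Random(7)
    msgs = []
    for x in data:
        msgs += hist_randomizer(x, d, p, rng)
    rng.shuffle(msgs)  # one round of shuffling covers all d executions
    return hist_analyzer(msgs, len(data), d)
```

The key structural point is visible here: a bin that no user holds receives at most n noise messages, so its clipped estimate is 0 with probability 1, which is what makes the simultaneous error independent of d.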
Theorem 3.2.

For any ε, δ ∈ (0, 1) and any n such that the counting protocol is well defined, the histogram protocol has the following properties:

  i. It is (ε, δ)-differentially private in the shuffled model.

  ii. It has (α, β)-per-query accuracy for any β > 0, where α is independent of d.

  iii. It has (α, β)-simultaneous accuracy for any β > 0, where α is independent of d.

  iv. Each user sends at most d + 1 messages, each of length O(log d) bits.

The accuracy guaranteed by this protocol is close to what is possible in the central model: there is a stability-based algorithm whose simultaneous error is also independent of d [BNS16]. However, in our protocol, each user communicates d + 1 messages of O(log d) bits each. It remains an open question whether or not this communication can be reduced while maintaining similar accuracy.

Because the simultaneous error of any single-message histogram protocol must grow with d [CSU+19], this protocol is also proof that the single-message model is a strict subclass of the multi-message model. This separation was previously shown by [BBGN19] for the summation problem.

Proof of Part i.

Fix any neighboring pair of datasets x ∼ x′, and let w, w′ ∈ [d] be the values on which they differ. For any j ∈ [d], the count of j in the output of the shuffler is independent of the count of every other value, because each execution of the counting protocol is independent. As in Step 1 of the analyzer, for j ∈ [d], let m_j (resp. m′_j) be the multiset of all messages on input x (resp. x′) that have value j.

For any j ∉ {w, w′}, m_j is identically distributed to m′_j. For each of the two values j ∈ {w, w′}, the j-th executions of the counting protocol on x and x′ are run on neighboring bit vectors. So by Theorem 3.1 Part i, the distribution of m_j is close to that of m′_j in the sense of differential privacy. (ε, δ)-differential privacy of the whole protocol follows by composition over these two coordinates. ∎

Proof of Parts ii and iii.

Notice that the j-th element of the output is identically distributed with an execution of the counting protocol on the bits indicating whether each user holds j: formally, P(x)_j has the distribution of the counting protocol run on (1[x_1 = j], …, 1[x_n = j]). Per-query accuracy immediately follows from Theorem 3.1 Part ii.

To bound the simultaneous error, we leverage the property that when c_j(x) = 0, the counting protocol reports a nonzero value with probability 0 (Theorem 3.1 Part iii). Hence only the values j with c_j(x) ≠ 0, of which there are at most n, can contribute error. Let α be the error bound defined in Theorem 3.1 Part ii for a failure probability lowered by a factor of n. A union bound over the at most n nonzero counts shows that every estimate is within α of its true count with probability at least 1 − β. This concludes the proof. ∎

3.3 Applications

In this section, we argue that solving the pointer-chasing problem in the non-interactive local model requires arbitrarily more samples than in the shuffled model. We also show that solving the multi-party pointer jumping problem in the sequentially interactive local model requires polynomially more samples than in the shuffled model. Both problems reduce to a task that we call support identification, and we show that our histogram protocol solves it with relatively few samples.

Definition 3.3 (Support Identification Problem).

The support identification problem is specified by two positive integers k ≤ d. For a set Y ⊆ [d] of size k, let U_Y denote the uniform distribution over Y; the set of problem instances is {U_Y : Y ⊆ [d], |Y| = k}. A protocol solves the (k, d)-support identification problem with sample complexity n if, given n users with data independently sampled from any problem instance U_Y, it identifies Y with probability at least 99/100.

Claim 3.4.

Under approximate differential privacy, the sample complexity of the (k, d)-support identification problem in the shuffled model is independent of d.

Proof.

For the purposes of this proof, we assume there is some bijection between X and [d], so that any reference to j ∈ [d] corresponds directly to an element of X and vice versa. Consider the following protocol: execute the histogram protocol on the n samples from U_Y, and then output the items whose estimated frequencies are nonzero. We will prove that these items are precisely those of Y, with probability at least 99/100.

For now, let τ > 0 be a threshold to be chosen later. Let E_1 be the event that some element of the support Y has empirical frequency less than τ in the sample. Let E_2 be the event that the histogram protocol (1) estimates the frequency of some element of Y with error at least τ or (2) overestimates the frequency of some element not in Y. If neither event occurs, every element of Y has estimated frequency greater than zero and no element outside of Y has estimated frequency more than 0. Hence, it suffices to show that E_1 and E_2 each occur with probability at most 1/200.

By a balls-into-bins argument, a number of users that is polynomial in k and independent of d suffices to ensure that E_1 occurs with probability at most 1/200, for τ = Θ(1/k). And by Theorem 3.2, a number of users independent of d suffices to ensure that E_2 occurs with probability at most 1/200, for this τ. The proof is complete by a union bound. ∎
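The reduction step at the heart of this proof is simple enough to write down (a sketch with names of our choosing); once the histogram protocol has run, recovering the support is just a matter of reading off the nonzero estimates:

```python
def identify_support(estimates):
    """Reduction step: after the histogram protocol runs, the answer
    is exactly the set of bins with nonzero estimated frequency."""
    return {j for j, v in enumerate(estimates) if v > 0}

# Support elements keep positive (if deflated) estimates, while
# elements outside the support are estimated as exactly 0.
support = identify_support([0.30, 0.0, 0.15, 0.0, 0.22])  # {0, 2, 4}
```

The correctness of this step is exactly what events E_1 and E_2 guard: E_1 ensures every support element appears often enough that its underestimate stays positive, and E_2 ensures no absent element is overestimated above zero.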

Definition 3.5 (Pointer-Chasing Problem [JMR19]).

The pointer-chasing problem is denoted PC(k, ℓ), where k and ℓ are positive integer parameters. A problem instance consists of two permutations of [ℓ], which together induce a sequence of pointers. A protocol solves PC(k, ℓ) with sample complexity n if, given n independent samples from any instance, it outputs the k-th integer in the sequence with probability at least 99/100.

To solve a pointer-chasing instance, note that it suffices to identify the support of the sampled distribution. Because the support has constant size, the histogram protocol can be used to solve the problem with a number of samples independent of ℓ. But [JMR19] give a lower bound for non-interactive local protocols that grows with ℓ. So there is an arbitrarily large separation between the non-interactive shuffled and non-interactive local models (Theorem 1.1).

Definition 3.6 (Multi-Party Pointer Jumping Problem [JMNR19]).

The multi-party pointer jumping problem is specified by positive integer parameters. A problem instance is a sequence of labelings, one for each level of a complete tree, where each label of a node is an integer indexing one of its children. The labelings imply a root-leaf path: if the t-th node on the path has label i, then the (t+1)-st node on the path is the i-th child of the t-th node. A protocol solves the problem with sample complexity n if, given n samples from any instance, it identifies the root-leaf path with probability at least 99/100.

As with pointer-chasing, the multi-party pointer jumping problem is immediately solved once the support is identified, which takes a number of samples independent of the tree size in the shuffled model. But [JMNR19] give a polynomially larger lower bound for the problem in the local model, even allowing for sequential interactivity. We emphasize that this does not immediately imply a polynomial separation between the models.

Model    | Pointer-Chasing           | Multi-Party Pointer Jumping
Local    | [JMR19] (non-interactive) | [JMNR19] (sequentially interactive)
Shuffled | *                         | *
Central  | *                         | *

Table 2: The sample complexity of private pointer-chasing and multi-party pointer jumping. Results marked by * follow from a reduction to histograms.

4 Pure Differential Privacy in the Shuffled Model

In this section, we prove that any single-message shuffled protocol satisfying pure ε-differential privacy can be simulated by a local protocol under the same privacy constraint.

Theorem 4.1 (Formalization of Thm. 1.2).

For any single-message shuffled protocol P = (R, S, A) that satisfies ε-differential privacy, there exists a local protocol P_L that satisfies ε-differential privacy and whose output is identically distributed to P(x) on every input x.

We start with the following claim, which strengthens a theorem of [CSU+19] for the special case of pure differential privacy in the shuffled model.

Claim 4.2.

Let P = (R, S, A) be any single-message shuffled protocol that satisfies ε-differential privacy. Then its randomizer R is an ε-differentially private algorithm.

Proof.

Assume for contradiction that R is not ε-differentially private. Then there are values w, w′ ∈ X and a set of messages Y such that

Pr[R(w) ∈ Y] > e^ε · Pr[R(w′) ∈ Y].

Let x = (w, w, …, w) and x′ = (w′, w, …, w), a pair of neighboring datasets. Now consider Y^n, the set of message vectors where each message belongs to Y. Membership in Y^n is invariant under shuffling, so

Pr[S(R(x)) ∈ Y^n] = Pr[R(w) ∈ Y] · Pr[R(w) ∈ Y]^{n−1} > e^ε · Pr[R(w′) ∈ Y] · Pr[R(w) ∈ Y]^{n−1} = e^ε · Pr[S(R(x′)) ∈ Y^n],

which contradicts the fact that S ∘ R is ε-differentially private. ∎

Now we are ready to prove Theorem 4.1.

Proof of Theorem 4.1.

Consider the analyzer A_L that applies a uniformly random permutation to its input and then executes A. Then P_L = (R, A_L) is a local protocol that simulates P, as the two protocols have the same output distribution on every input. And by Claim 4.2, the randomizer R is ε-differentially private. ∎

One might conjecture that Claim 4.2 also holds for multi-message protocols, which would immediately generalize Theorem 4.1. However, this is not the case:

Claim 4.3 (Informal).

There exists a multi-message shuffled protocol that is differentially private, but whose randomizer is not differentially private.

The counterexample is described and analyzed in Appendix B.

Acknowledgments

We are grateful to Daniel Alabi and Maxim Zhilyaev for discussions that shaped the presentation of this work. We are also indebted to Matthew Joseph and Jieming Mao for directing us to the pointer-chasing and multi-party pointer-jumping problems.

References

Appendix A Privacy via Smooth Distributions

Ghazi, Golowich, Kumar, Pagh and Velingker [GGK+19] identify a class of distributions and argue that, if a noise value is sampled from such a distribution, adding it to a 1-sensitive sum ensures differential privacy of that sum.

Definition A.1 (Smooth Distributions [GGK+19]).

A distribution D over the integers is (ε, δ)-smooth if, for every shift s ∈ {−1, 0, 1} and every set of outcomes T,

Pr_{Y∼D}[Y ∈ T] ≤ e^ε · Pr_{Y∼D}[Y + s ∈ T] + δ.

Lemma A.2 (Smoothness for Privacy [GGK+19]).

Let f be a function such that |f(x) − f(x′)| ≤ 1 for all neighboring x ∼ x′. Let D be an (ε, δ)-smooth distribution. The algorithm that takes as input x, then samples Y ∼ D and reports f(x) + Y, satisfies (ε, δ)-differential privacy.

Lemma A.3 (Binomial Distribution is Smooth [GGK+19]).

For any positive integer n, any p ∈ (0, 1), and a suitable δ, the distribution Bin(n, p) is (ε, δ)-smooth with an ε that shrinks as the variance np(1 − p) grows.
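The smoothness of the binomial distribution can be checked numerically. The sketch below (ours; the exact quantifiers in the definition above are paraphrased) measures the largest ratio between the probabilities of adjacent outcomes of Bin(n, 1/2) over a central range, which is the quantity smoothness asks to be at most e^ε outside a δ-probability tail.

```python
from math import comb, exp

def binom_pmf(n, k, p):
    """Exact Bin(n, p) probability mass at k."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def max_adjacent_ratio(n, p, lo, hi):
    """Largest ratio between probabilities of adjacent outcomes
    (in either direction) over the central range [lo, hi]."""
    worst = 1.0
    for k in range(lo, hi):
        a, b = binom_pmf(n, k, p), binom_pmf(n, k + 1, p)
        worst = max(worst, a / b, b / a)
    return worst

# Bin(1000, 1/2): within ~3 standard deviations of the mean, shifting
# the outcome by 1 changes its probability by less than a factor of
# e^{0.25}, so adding this noise to a 1-sensitive sum hides a unit change.
ratio = max_adjacent_ratio(1000, 0.5, 453, 547)
```

Since Pr[k]/Pr[k+1] = (k+1)/(n−k) for the binomial, the ratio degrades toward the tails; this is why the lemma only gives smoothness up to a δ-probability exceptional set, and why more variance (larger np(1 − p)) buys a smaller ε.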

Corollary A.4.

Fix any ε, δ ∈ (0, 1). Let p be the parameter of the counting protocol. The algorithm that takes as input x ∈ {0, 1}^n, then samples η ∼ Bin(n, p) and reports Σ_i x_i + η, satisfies (ε, δ)-differential privacy.

Proof.

In the assumed parameter regime, the variance np(1 − p) is large enough that Lemma A.3 implies η is sampled from an (ε, δ)-smooth distribution. The sum Σ_i x_i changes by at most 1 between neighboring datasets, so by Lemma A.2, the algorithm is (ε, δ)-differentially private. ∎

Appendix B Claim 4.2 does not Generalize

In this appendix, we show that Claim 4.2 does not generalize to the multi-message case. We do so by describing a counterexample protocol (Figure 3) that computes a binary sum.

Randomizer (counterexample), on input x ∈ {0, 1}:

  1. Let the parameters be as in the analysis below.

  2. Sample η ∼ NB(1/n, p), a negative binomial distribution. (In the case where r is an integer, NB(r, p) is the distribution of the number of failures until r successes, where the probability of a success is p. In the case where r is a positive real, this distribution is also called the Pólya distribution.)

  3. If …, then …; otherwise, sample ….

  4. Output a vector of messages whose length is determined by x and the samples above.

Analyzer (counterexample):

  1. Output an estimate of the sum computed from the number of messages received.

Figure 3: The pseudocode for the counterexample, a shuffled protocol for binary sums
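The proof below rests on an infinite-divisibility fact: the sum of n i.i.d. NB(1/n, p) samples is distributed as Geom(p), so the aggregate noise visible to the analyzer is geometric even though each user's share is tiny. The sketch below (ours; the protocol's exact parameters are not reproduced here) checks this numerically, sampling real-parameter negative binomials via the standard Gamma–Poisson mixture.

```python
import random
from math import exp

def neg_binomial(r, p, rng):
    """Sample NB(r, p) for real r > 0 (the Polya case) via the
    Gamma-Poisson mixture: NB(r, p) = Poisson(G), G ~ Gamma(r, (1-p)/p)."""
    g = rng.gammavariate(r, (1 - p) / p)
    threshold, k, prod = exp(-g), 0, rng.random()
    while prod > threshold:  # Knuth's Poisson sampler (small means)
        k += 1
        prod *= rng.random()
    return k

def total_noise(n, p, rng):
    """Sum of n i.i.d. NB(1/n, p) shares: all the analyzer can see.
    By divisibility, this sum is distributed as Geom(p), the number
    of failures before the first success."""
    return sum(neg_binomial(1.0 / n, p, rng) for _ in range(n))

# Sanity check: the empirical mean of the summed noise matches the
# Geom(p) mean (1 - p) / p, which is 1.0 for p = 1/2.
rng = random.Random(0)
mean = sum(total_noise(5, 0.5, rng) for _ in range(20000)) / 20000
```

Each user's NB(1/n, p) share is 0 with high probability, which is exactly why the randomizer alone offers no privacy: only the aggregate carries enough noise.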
Claim B.1.

For any ε, the protocol of Figure 3 has the following properties:

  i. The protocol is differentially private in the shuffled model.

  ii. The randomizer is not differentially private.

Proof of Part i.

In the protocol, each user's noise sample is drawn from NB(1/n, p). Due to independence, the sum over all n of these samples has the characteristic function of NB(1, p). But this is the characteristic function of the geometric distribution with parameter p. (Geom(p) is the distribution of the number of failures until the first success, where the probability of a success is p.)

Choose any input and observe that the sum over the random variables sampled in Step 3 contributes an independent noise term. It follows that the output of the shuffler is a string of 1's whose length is the sum of a sample from that distribution and a sample from Geom(p).

Choose any pair of neighboring inputs. Suppose we could prove that, for any number of observed messages,

(1)

Then for all output events we would have the required privacy inequality in one direction; by a symmetric argument, the other direction follows as well.

To prove (1), we begin with the first case.