There is an inherent tension in data analysis between privacy and statistical utility. This tension is captured by the Fundamental Law of Information Recovery: revealing “overly accurate answers to too many questions will destroy privacy” (this formulation is from [DR14-monograph], while the quantitative version is from [DinurN03]). This tension, however, is not equally pronounced for every set of queries an analyst may wish to evaluate on a sensitive dataset. As a simple illustration, a single query repeated many times is much easier to answer while preserving privacy than a collection of random queries. For this reason, one of the basic goals of algorithmic privacy research is to design efficient private algorithms that optimally adapt to the structure of any given collection of queries. Phrased more specifically, this goal is to design algorithms achieving nearly optimal sample complexity (the minimum dataset size required to privately produce answers to within a prescribed accuracy) for any given workload of queries. In this work, we address this problem in the context of answering statistical queries in both the central and local models of differential privacy. Building on the projection mechanism [NTZ], and using the ideas behind Dudley’s chaining inequality, we propose new algorithms for privately answering statistical queries. Our algorithms are efficient and achieve instance-optimal sample complexity (up to constant factors, in certain parameter regimes) with respect to a large class of differentially private algorithms. Specifically, for every collection of statistical queries, our algorithms provide answers with constant average squared error with the minimum dataset size requirement amongst all algorithms satisfying concentrated differential privacy (CDP) [DBLP:journals/corr/DworkR16, BunS16].
We further show that our algorithmic techniques can be adapted to work in the local model of differential privacy, where they again achieve optimal sample complexity amongst all algorithms with constant average squared error.
A dataset is a multiset of elements from a data universe . A statistical (also referred to in the literature as “linear”) query is specified by a function . Overloading notation, the value of the query on a dataset is
where dataset elements appear in the sum with their multiplicity. A query workload is simply a set of statistical queries. We use the notation
for the vector of answers to the queries inon dataset . In the centralized setting in which the dataset is held by a single trusted curator, we model privacy by (zero-)concentrated differential privacy. This definition was introduced by Bun and Steinke [BunS16], and is essentially equivalent to the original definition of (mean-)concentrated differential privacy proposed by Dwork and Rothblum [DBLP:journals/corr/DworkR16], and closely related to Mironov’s Rényi differential privacy [Mironov17]. Before we state the definition, we recall that two datasets of size are neighboring if we can obtain from by replacing only one of its elements with another element of the universe .
A randomized algorithm satisfies -zCDP if for any two neighboring datasets and , and all ,
where denotes the Rényi divergence of order measured in nats.
For the definition of Rényi divergence and further discussion of concentrated differential privacy, we refer the reader to Section 2.2. For now, we remark that concentrated differential privacy is intermediate in strength between “pure” (-) and “approximate” (-)differential privacy, in the sense that every mechanism satisfying -differential privacy also satisfies -zCDP, and every mechanism satisfying -zCDP also satisfies -differential privacy for every (see [BunS16]). Our privacy-preserving techniques (i.e., Gaussian noise addition) give privacy guarantees which are most precisely captured by concentrated differential privacy. In general, concentrated differential privacy captures a rich subclass (arguably, the vast majority) of the techniques in the differential privacy literature, including Laplace and Gaussian noise addition, the exponential mechanism [McSherryT07], sparse vector [DR14-monograph], and private multiplicative weights [HardtR10]. Crucially, concentrated differential privacy admits a simple and tight optimal composition theorem which matches the guarantees of the so-called “advanced composition” theorem for approximate differential privacy [DworkRV10]
. Because of these properties, concentrated differential privacy and its variants have been adopted in a number of recent works on private machine learning, for example [PrivateDL, ParkFCW16, Lee17, PATE]. We also study the local model of differential privacy, in which the sensitive dataset is no longer held by a single trusted curator, but is instead distributed among parties where each party holds a single element . The parties engage in an interactive protocol with a potentially untrusted server, whose goal is to learn approximate answers to the queries in . Each party is responsible for protecting her own privacy, in the sense that the joint view of every party except should be -differentially private with respect to the input . (See Section 2.3 for the precise details of the definition.) Almost all industrial deployments of differential privacy, including those at Google [RAPPOR], Apple [AppleDP], and Microsoft [MicrosoftDP], operate in the local model, and it has been the subject of intense study over the past few years [BassilySmith15, BassilyNST17, DJW-ASA]. While the local model of privacy offers stronger guarantees to individuals, it is more restrictive in terms of the available privacy-preserving techniques. In particular, algorithms based on the exponential mechanism in general cannot be implemented in the local model [KLNRS]. Nevertheless, we show that our algorithms can be relatively easily adapted to the local model with guarantees analogous to the ones we get in the centralized model. We believe this is evidence for the flexibility of our approach. In order to discuss error guarantees for private algorithms, let us first introduce some notation. We consider two natural measures of error: average (or root-mean-squared) error, and worst-case error. For an algorithm we define its error on a query workload and databases of size as follows.
where each maximum is over all datasets of size , is the answer to query given by the algorithm on input , and expectations are taken with respect to the random choices of . The notation is used analogously in the local model, with an interactive protocol in the place of the algorithm .
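To fix ideas, the two error measures just described can be computed in a few lines of code. This sketch is purely illustrative: the two-query threshold workload, the dataset, and all function names below are our own choices, not objects defined in the paper.

```python
import numpy as np

def answer_vector(queries, dataset):
    """Vector of statistical-query answers: each query's value on the dataset
    is its average over the dataset elements, counted with multiplicity."""
    return np.array([np.mean([q(x) for x in dataset]) for q in queries])

def avg_error(true_ans, released):
    """Average (root-mean-squared) error over the query workload."""
    return float(np.sqrt(np.mean((np.asarray(true_ans) - np.asarray(released)) ** 2)))

def worst_error(true_ans, released):
    """Worst-case (maximum absolute) error over the query workload."""
    return float(np.max(np.abs(np.asarray(true_ans) - np.asarray(released))))

# Illustrative workload: two threshold queries on the universe {0, ..., 9}.
queries = [lambda x, t=t: float(x >= t) for t in (3, 7)]
D = [1, 4, 4, 8]                 # a multiset; elements appear with multiplicity
ans = answer_vector(queries, D)  # array([0.75, 0.25])
```

A private algorithm is then judged by these quantities in expectation over its internal randomness, and in the worst case over datasets of a given size.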
1.2 Main Results
For the rest of this paper we will work with an equivalent formulation of the query release problem which is more natural from the perspective of geometric techniques, and will also ease our notation. For a given workload of queries , we can define the set by . We can identify each data universe element with the element of , so we can think of as just a multiset of elements of ; the true query answers then just become the mean of the elements in . This motivates us to introduce the mean point problem: given a dataset of elements from a (finite) universe , approximate the mean , where, as usual, the dataset elements are enumerated with repetition. We assume that the algorithm is explicitly given the set . By analogy with the query release problem, we can define measures of error for any given dataset by
Similarly, we can define for any finite universe the error measures
Algorithms for the query release problem can be used for the corresponding mean point problem, and vice versa, with the same error and privacy guarantees. Therefore, for the rest of the paper we will focus on the mean point problem with the understanding that analogous results for query release follow immediately. In this work, we give query release algorithms whose error guarantees naturally adapt to the properties of the queries, or, equivalently, we give algorithms for the mean point problem that adapt to the geometric properties of the set . Notice that for any datasets and of size that differ in a single element, , where is the set of all pairwise sums of elements in and . Since differential privacy should hide the difference between and , a private algorithm should not reveal where in the set the true mean lies. This suggests that the size of , and, relatedly, the size of itself, should control the sample complexity of the mean point problem. However, it is non-trivial to identify the correct notion of “size”, and to design algorithms whose sample complexity adapts to this notion. In this work we adopt separation numbers, which quantify how many well-separated points can be packed in , as a measure of the size of . More precisely, for any set and , we define the separation number (a.k.a. packing number) as
That is, is the size of the largest set of points in whose normalized pairwise distances are all greater than . Analogously, we define the separation number as
Our bounds for average error will be expressed in terms of , and the bounds for worst case error will be expressed in terms of . We will give algorithms whose sample complexity is controlled by the separation numbers, and we will also prove nearly matching lower bounds in the regime of constant error.
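As a concrete (and non-normative) illustration of separation numbers, a maximal separated subset of a finite set can be built greedily. The normalization by the square root of the dimension below mirrors the convention of pairing Euclidean separation numbers with average error; the function name is our own.

```python
import numpy as np

def greedy_separated_set(points, r):
    """Greedily build an inclusion-maximal r-separated subset of `points`
    under the normalized metric d(x, y) = ||x - y||_2 / sqrt(dim). Its size
    lower bounds the separation number at scale r; by maximality, the same
    set is also an r-cover of `points`."""
    pts = np.asarray(points, dtype=float)
    dim = pts.shape[1]
    chosen = []
    for p in pts:
        if all(np.linalg.norm(p - q) / np.sqrt(dim) > r for q in chosen):
            chosen.append(p)
    return np.array(chosen)
```

That a maximal separated set is simultaneously a packing and a cover at the same scale is exactly the duality between packings and coverings exploited later in the paper.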
1.2.1 Average Error
We propose two new algorithms for private query release: the Coarse Projection Mechanism and the Chaining Mechanism. Both algorithms refine the Projection Mechanism of Nikolov, Talwar, and Zhang [NTZ]. Recall that the Projection Mechanism simply adds sufficient Gaussian noise to in order to guarantee differential privacy, and then projects the noisy answer vector back onto the convex hull of , i.e. outputs the closest point to the noisy answer vector in the convex hull of with respect to the norm. Miraculously, this projection step can dramatically reduce the error of the original noisy answer vector. The resulting error can be bounded by the Gaussian mean width of , which in turn is always at most polylogarithmic in the cardinality of . Our first refined algorithm, which we call the Coarse Projection Mechanism, instead projects onto a minimal -cover of , i.e. a set such that contains . (Here, “+” denotes the Minkowski sum, i.e. the set of all pairwise sums, and is the unit Euclidean ball in .) Since this cover is potentially a much smaller set than , the projection may incur less error this way. The size of a minimal cover is closely related to the separation number, and the separation numbers themselves are related to the Gaussian mean width by Dudley’s chaining inequality. We use these connections in the analysis of our algorithm. The guarantees of the mechanism are captured by the next theorem.
There exists a constant and a -zCDP algorithm (the Coarse Projection Mechanism) for the mean point problem, which for any finite , achieves average error as long as
Moreover, runs in time .
We note that the sample complexity of the Coarse Projection Mechanism can be much lower than that of the Projection Mechanism. For example, consider a set defined as a circular cone with apex at the origin, and a ball of radius centered at as its base. (Here is the first standard basis vector.) Then, a direct calculation reveals that in order to achieve average error , the Projection Mechanism requires a dataset of size at least . By contrast, the Coarse Projection Mechanism would project onto the line segment from the apex to the center of the base and achieve error with a dataset of size . While in this example is not finite, it can be discretized to a finite set without significantly changing the sample complexity. Inspired by the proof of Dudley’s inequality, we give an alternative Chaining Mechanism whose error guarantees are incomparable to the Coarse Projection Mechanism. Instead of just taking a single cover of , we take a sequence of progressively finer covers . This allows us to write as the Minkowski sum , where the diameter of decreases with , while its cardinality grows. We can then decompose the mean point problem over into a sequence of mean point problems, which we solve individually with the projection mechanism. The next theorem captures the guarantees of this mechanism.
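To make the chaining idea concrete, the following sketch builds greedy covers at geometrically shrinking scales and decomposes each universe element into a telescoping sum of increments, one per scale. The greedy cover construction and all names here are illustrative stand-ins; the actual mechanism works with minimal covers and runs a private mean estimator on each set of increments.

```python
import numpy as np

def chaining_decomposition(points, levels):
    """Greedy covers of a finite point set at scales diam/2, diam/4, ...,
    and the induced telescoping decomposition
        w = c_1(w) + (c_2(w) - c_1(w)) + ... + (w - c_L(w)),
    where c_j(w) is w's nearest point in the scale-j cover."""
    pts = np.asarray(points, dtype=float)
    diam = max(np.linalg.norm(a - b) for a in pts for b in pts)
    covers = []
    for j in range(1, levels + 1):
        radius = diam / 2 ** j
        cover = [pts[0]]
        for p in pts:
            if min(np.linalg.norm(p - c) for c in cover) > radius:
                cover.append(p)
        covers.append(cover)

    def nearest(cover, p):
        return min(cover, key=lambda c: np.linalg.norm(p - c))

    decompositions = []
    for p in pts:
        chain = [nearest(cover, p) for cover in covers]
        increments = [chain[0]]
        increments += [chain[j] - chain[j - 1] for j in range(1, levels)]
        increments.append(p - chain[-1])
        decompositions.append(increments)
    return covers, decompositions
```

The increments at level j have norm bounded by roughly the sum of two adjacent cover radii, while the number of distinct increments is controlled by the cover sizes; this trade-off between diameter and cardinality is what drives the error analysis.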
There exists a constant and a -zCDP algorithm (the Chaining Mechanism) for the mean point problem, which for any finite achieves average error as long as
Moreover, runs in time .
While our algorithms are generic, we show that for constant error, they achieve optimal sample complexity for any given workload of queries. To be more precise about the instance-optimality of our results, we define the sample complexity of answering a query workload with error under -zCDP by
In the local model we analogously define and with the minimum taken over all protocols satisfying -local differential privacy. For context, let us recall the sample complexity in the centralized model of some known algorithms, and how it compares to the best possible sample complexity. For average error, the projection mechanism [NTZ] can answer any workload with error at most under -zCDP as long as
It is known that there exist workloads for which this bound on matches up to constant factors. One particularly natural example here is the workload of -way marginals on the universe , which consists of queries [BunUV14]. Thus, the sample complexity of private query release with respect to worst-case workloads of any given size is well-understood. However, we know much less about optimal mechanisms and the behavior of for specific workloads . This behavior can depend strongly on the workload. For example, for the workload of threshold queries defined on a totally ordered universe by , we have sample complexity only . This motivates the following problems:
Characterize in terms of natural quantities associated with .
Identify efficient algorithms whose sample complexity on any workload nearly matches the optimal sample complexity .
We call algorithms with the property in item 2 above approximately instance-optimal. Note that it is a priori not clear that there should exist any efficient instance-optimal algorithms. Here our notion of efficiency is polynomial time in , the number of queries, and the size of the universe . This is natural, as this is the size of the input to the algorithm, which needs to take a description of the queries in addition to the database. One could wish for a more efficient algorithm when the queries are specified implicitly, for example by a circuit, but this has been shown to be impossible in general under cryptographic assumptions [DworkNRRV09]. We prove lower bounds showing instance optimality for our algorithms when the error parameter is constant. Once again, we state the lower bounds for the mean point problem, rather than the query release problem. The equivalence of the two problems implies that we get the same optimality results for query release as for the mean point problem. To state the results we extend our notation above to the mean point problem, and define
and, analogously for and . Building on the packing lower bounds of Bun and Steinke [BunS16], we show that separation numbers also provide lower bounds on . The following theorem is proved in the appendix.
For any finite and every , we have
Comparing the lower bound (3) with our algorithmic guarantees (1) and (2), we see that the algorithms in Theorems 1.2 and 1.3 can achieve error on databases of size at most , where hides factors polynomial in and is a constant. In other words, when the error is constant, our mechanisms have sample complexity which is instance-optimal up to constant factors. The constant error regime is practically the most interesting one and is widely studied in the differential privacy literature. It captures the natural problem of identifying the smallest database size on which the mean point problem (resp. the query release problem) can be solved with non-trivial error. In his survey [Vadhan17] Vadhan asked explicitly for a characterization of the sample complexity of counting queries in the constant error regime under approximate differential privacy (Open Problem 5.25). Our results make a step towards resolving this question by giving a characterization for the rich subclass of algorithms satisfying concentrated differential privacy. Beyond the constant error regime, proving instance optimality results with tight dependence on the error parameter remains a tantalizing open problem. We note that we are not aware of any for which the sample complexity of the Chaining Mechanism is suboptimal by more than a factor.
1.2.3 Worst-Case Error
Using a variant of the chaining mechanism from Theorem 1.3, we get a guarantee for worst-case error as well.
There exists constant , and a -zCDP algorithm that for any finite achieves as long as
Moreover, runs in time .
This result shows the flexibility of the Chaining Mechanism. The analysis of the Coarse Projection Mechanism relied crucially on Dudley’s inequality, which is tailored to Euclidean space and the Gaussian mean width. There are, in general, no mechanisms with worst-case error guarantees whose sample complexity depends on the Gaussian mean width, so it is unclear how to adapt the Coarse Projection Mechanism to worst-case error. Nevertheless, by incorporating the idea of chaining used in the proof of Dudley’s inequality inside the algorithm itself, we are able to derive an analogous result. A lower bound analogous to Theorem 1.4 for worst-case error reveals that the sample complexity of the algorithm in Theorem 1.5 on workload with error is at most . That is, we get instance-optimality up to a factor for constant .
1.2.4 Local Differential Privacy
Illustrating further the flexibility of our techniques, we show that the Coarse Projection Mechanism and the Chaining Mechanism can be adapted to the local model. The protocols we design are non-interactive, with each party sending a single message to the server, and satisfy pure -local differential privacy. The protocols are in fact very similar to our algorithms in the central model, except that instead of Gaussian noise we use a variant of the local mean estimation algorithm from [DJW-ASA] to achieve privacy. The other steps in the Coarse Projection and the Chaining Mechanisms are either pre- or post-processing of the data and can be adapted seamlessly to the local model.
There exists a constant and a non-interactive -LDP protocol that for any finite achieves average error as long as
Furthermore, there exists a non-interactive -LDP protocol that achieves average error
as long as
Both protocols run in time .
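To give a flavor of the local protocols, here is a minimal report-and-average sketch of a non-interactive LDP pipeline. We stress that, for simplicity, this sketch adds coordinate-wise Laplace noise calibrated to the l1 sensitivity; this is a crude stand-in for the [DJW-ASA] local mean estimation algorithm that the actual protocols use, with a worse dependence on the dimension. All names are ours.

```python
import numpy as np

def local_randomizer(x, eps, bound, rng):
    """eps-LDP report of a vector x with ||x||_inf <= bound: add Laplace noise
    at scale (l1 sensitivity)/eps to each coordinate. Swapping x for another
    element can move each coordinate by at most 2*bound, so the l1 sensitivity
    of the full vector is 2*bound*dim."""
    x = np.asarray(x, dtype=float)
    l1_sensitivity = 2.0 * bound * x.size
    return x + rng.laplace(0.0, l1_sensitivity / eps, size=x.shape)

def server_mean(reports):
    """The (possibly untrusted) server simply averages the unbiased reports."""
    return np.mean(reports, axis=0)
```

Since each report is unbiased, the server's average concentrates around the true mean as the number of parties grows, at a rate degraded by the per-report noise.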
Moreover, for constant average error , our algorithms achieve instance-optimal sample complexity up to constant factors. This is true even with respect to -LDP algorithms permitting “sequential” interaction between parties (see Section 2.3 for details of the model). The theorem is proved in the appendix using the framework of Bassily and Smith [BassilySmith15].
For any finite , every , and any satisfying for a sufficiently large constant , we have
It is an interesting open problem to extend these instance optimality results to worst-case error. While the lower bound extends in a straightforward way, our mechanisms do not, as there is no analog of the projection mechanism for worst-case error, and also no analog of the multiplicative weights mechanism in the local model. Moreover, it is known that in the local model packing lower bounds like the one in Theorem 1.7 can be exponentially far from the true sample complexity with respect to worst-case error. For instance, Kasiviswanathan et al. [KLNRS] showed that learning parities over the universe has sample complexity exponential in in the local model, and learning parities easily reduces to answering parity queries with small constant worst-case error. At the same time, packing lower bounds can only show a lower bound which is polynomial in . Thus, worst-case error has substantially different behavior from average error in the local model and requires different techniques.
1.3 Related Work
Instance-optimal query release was previously studied in a line of work that brought the tools of asymptotic convex geometry to differential privacy [HardtT10, BhaskaraDKT12, NTZ, Nikolov15, KattisN17]. However, despite significant effort, completely resolving these questions for approximate differential privacy appears to remain out of reach of current techniques. The papers [HardtT10, BhaskaraDKT12] focus on pure differential privacy, and their results only apply for very small values of , while here we focus on the regime of constant . A characterization for pure differential privacy with constant is known [RothNotes, BunS16, Vadhan17] based on the same geometric quantities considered in this work. When phrased in our language, these works show that for every constant error parameter , the sample complexity of the mean point problem with pure differential privacy is characterized up to constant factors by the logarithm of an appropriate separation number of the set . The sample complexity lower bound follows from a packing argument. Meanwhile, the upper bound is obtained by using the exponential mechanism of McSherry and Talwar [McSherryT07] to identify a point in a minimal cover of which is as close as possible to . Unlike our algorithms, this application of the exponential mechanism runs in time super-polynomial in . While we prove instance-optimality of our algorithms using similar lower bound techniques (i.e., the generalization of packing arguments to CDP from [BunS16]), our new algorithms appear to be completely different. There is no known analogue of the exponential mechanism that is tailored to achieve optimal sample complexity for CDP, and our algorithms are instead based on the projection mechanism. The papers [NTZ, Nikolov15] focus on approximate differential privacy, and give results for the entire range of , but their bounds are loose by factors polynomial in . 
We avoid such gaps, since for many natural workloads, such as marginal queries, is exponential in the other natural parameters of . The recent paper [KattisN17] is also very closely related to our work, but does not prove tight upper and lower bounds on for arbitrary .
In this section we define basic notation, state the definitions of concentrated differential privacy and local differential privacy, and state the known algorithms which will serve as building blocks for our own algorithms. We also describe the geometric tools which will be used throughout this paper.
We use the notation to denote the existence of an absolute constant such that , where and themselves may depend on a number of parameters. Similarly, denotes the existence of an absolute constant such that . We use and for the standard and norms. We use to denote the unit ball in . For two subsets , the notation denotes the Minkowski sum, i.e. the set
. For a real-valued random variable we use the notation .
2.2 Concentrated Differential Privacy
Recalling Definition 1.1, we say that a randomized algorithm satisfies -zCDP if for any two neighboring datasets and all , we have
Here, denotes the Rényi divergence of order . With absolutely continuous with respect to , this quantity is defined as
For two random variables , the divergence is defined as the divergence of their probability densities. One of the crucial properties of CDP is the following tight composition theorem which matches the guarantees of the so-called “advanced composition” theorem for approximate differential privacy [DworkRV10].
Lemma 2.1 ([BunS16]).
Assume that the algorithm satisfies -zCDP, and, for every in the range of , the algorithm satisfies -zCDP. Then the algorithm defined by satisfies -zCDP.
We remark that as a special case of Lemma 2.1, one can take to be a -zCDP algorithm which does not directly access the sensitive dataset at all. In this case, the combined algorithm satisfies -zCDP, showing that zCDP algorithms can be postprocessed without affecting their privacy guarantees. Our algorithms are designed by carefully applying two basic building blocks: the Projection and the Private Multiplicative Weights mechanisms. Below we state their guarantees for the mean point problem. In order to state the error guarantees for the projection mechanism, we need a couple of definitions. First, let us define the support function of a set on any by , where is the standard inner product. If is a standard Gaussian random variable in , then we define the Gaussian mean width of a set by .
Lemma 2.2 ([NTZ]).
Let and let . There exists a mechanism (The Projection Mechanism) such that, for every finite set ,
Moreover, runs in time .
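The noise addition step underlying Lemma 2.2 can be sketched as follows. The calibration sigma = Delta / sqrt(2 rho) for the Gaussian mechanism under rho-zCDP is standard [BunS16]; the function names and interface are our own, and the projection step of the mechanism is omitted here.

```python
import numpy as np

def zcdp_sigma(l2_sensitivity, rho):
    """Gaussian noise scale: adding N(0, sigma^2 I) to a statistic with the
    given l2 sensitivity satisfies rho-zCDP when sigma = Delta / sqrt(2*rho),
    a standard fact from [BunS16]."""
    return l2_sensitivity / np.sqrt(2.0 * rho)

def noisy_mean(dataset, diameter, rho, rng):
    """rho-zCDP release of the mean of a dataset whose elements lie in a set
    of the given l2 diameter: replacing one of n elements moves the mean by
    at most diameter/n in l2 norm."""
    data = np.asarray(dataset, dtype=float)
    n, d = data.shape
    return data.mean(axis=0) + rng.normal(0.0, zcdp_sigma(diameter / n, rho), size=d)
```

Composing k such releases with parameters rho_1, ..., rho_k yields a (rho_1 + ... + rho_k)-zCDP algorithm by Lemma 2.1.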
Lemma 2.3 ([HardtR10]).
There exists a mechanism (The Private Multiplicative Weights Mechanism) such that, for any finite ,
Moreover, runs in time .
2.3 Local Differential Privacy
In the local model, the private database is distributed among parties, each party holding exactly one element of . For convenience, we index the parties by the integers from to , and denote by the element of held by party . The parties together with a server engage in a protocol in order to compute some function of the entire database . Here we consider sequentially interactive protocols (with non-interactive ones as a special case), as defined by Duchi, Wainwright, and Jordan [DJW-ASA]. The protocol is defined by a collection of randomized algorithms . Algorithm takes as input and a message received from party , and produces a pair , where is sent to party , and is sent to the server. Parties and are exceptions: only takes as input, and only produces as output. Then the server runs a randomized algorithm on inputs to produce the final output of the protocol. We use to denote the union of all outputs of the algorithms. The running time of the protocol is the total running time of the algorithms and . Note that a special case of a sequentially interactive protocol is a non-interactive protocol, in which ignores and only depends on its private input . Non-interactive protocols roughly capture the randomized response model of Warner [Warner-RR], and their study in the context of differential privacy goes back to [DworkMNS06]. To formulate our privacy definition in the local model, let us recall the notions of max-divergence and approximate max-divergence, defined for any two random variables and on the same probability space by
where the supremum is over measurable sets in the support of . With this notation, the standard definition of an -differentially private algorithm [DworkMNS06] is as follows.
A randomized algorithm satisfies -differential privacy if for datasets and we have
We will need the simple composition theorem for differential privacy. See the book [DR14-monograph] for a proof.
Assume that the algorithm satisfies -differential privacy, and, for every in the range of , the algorithm satisfies -differential privacy. Then the algorithm defined by satisfies -differential privacy.
The privacy definition for a sequentially interactive protocol in the local model we adopt is as follows.
A protocol in the local model satisfies -local differential privacy (LDP) if the algorithm satisfies -differential privacy with respect to the single element dataset , and, for every and every in the range of , the algorithm satisfies -differential privacy with respect to the single element dataset . When a protocol satisfies -LDP, we also say that it satisfies -LDP.
We note that while our protocols work in the non-interactive pure LDP model (i.e. ), our lower bounds work against the larger class of sequentially interactive protocols and approximate LDP (i.e. sufficiently small but nonzero ).
2.4 Packings, Coverings, and Dudley’s Inequality
Recall the definitions of separation numbers given in the Introduction: for a set and a proximity parameter we denote
Note the non-standard scaling of , which we chose because it corresponds better to the definition of average error. To prove the optimality of our algorithms, we make use of the well-known duality between packings (captured by separation numbers) and coverings. We say that a set is a -covering of with respect to metric if for every there exists a point such that . This definition gives rise to the family of covering numbers (in or ) of a compact set , defined by
The next lemma relating separation numbers to covering numbers is folklore (see e.g. Chapter 4 of [AGM-book] for a proof).
Let be a compact subset of , and be a real number. Let be a maximal subset of with respect to inclusion s.t. (resp. ). Then is a -cover of , i.e. for any there exists a such that (resp. ). This implies
We will sometimes have to contend explicitly with the sets described above. A set is called a -separated set with respect to a metric if for every , we have . In what follows, when we discuss -separated sets in the context of the error measure the underlying metric will be the scaled norm , and in the context of the error measure the underlying metric will be the norm . Dudley’s Inequality is a tool which allows us to relate the Gaussian mean width of a set with the family of covering numbers of at all scales. (Note that the normalization factor of appearing in our definition of separation/covering numbers causes this statement to differ by a factor of from its usual formulation.)
Lemma 2.8 ([ledoux1991probability, Chapter 11.1]).
For any subset , with diameter , we have
2.5 Subgaussian Random Variables
We recall the standard definition of a subgaussian random variable.
We say that a mean zero random variable is -subgaussian, if for every fixed , we have
For an arbitrary random variable we say that it is -subgaussian if is -subgaussian.
We recall some basic facts about subgaussian random variables in the appendix.
3 Decompositions of the Universe
In this section we show a simple decomposition lemma (Lemma 3.3) that underlies all of our new algorithms. We begin by identifying an important property common to both error measures and which will be essential to Lemma 3.3.
Definition 3.1 (Subadditive error measure).
Let be a finite universe enumerated as . For each , consider an arbitrary decomposition , and define , . This decomposition induces, for any dataset , a pair of datasets . We say that an error measure is a subadditive error measure if for every finite universe , every decomposition as above, every dataset and every pair of algorithms (or local protocols) , we have
where the algorithm (resp. local protocol) is defined by .
Both error measures of interest in this paper are subadditive error measures.
Both and are subadditive error measures.
The proof of this claim can be found in the appendix. It follows directly from the triangle inequality for and norms respectively.
Let be a subset of a Minkowski sum , and let be functions, respectively, from to such that . Consider an arbitrary subadditive error measure . Let be a sequence of algorithms (respectively protocols in the local model) such that for every ,
satisfies -zCDP (resp. -LDP).
Then we can construct a -zCDP mechanism (resp. -LDP protocol) with , where (resp. ). Moreover, the running time of is bounded by the sum of the running times of , and the sum of the running times to compute on vectors from . If are non-interactive local protocols, then so is .
We first prove the lemma for CDP. For a database , we can consider a sequence of induced databases , where is derived from by applying pointwise to each one of its elements. Given a database we compute independently for every , and release . The privacy of follows from the composition properties of zCDP (Lemma 2.1) and postprocessing: by the composition lemma we know that releasing satisfies -zCDP, and by postprocessing has the same privacy guarantee. Moreover, the error bound is satisfied by inductively applying subadditivity of the error measure. Indeed, for any specific database , we have , and therefore
The proof for LDP is analogous: each party , given input , for each , runs the local protocol with input ). The protocols can be run in parallel. At the end the server can compute and output the sum of the outputs of the local protocols. The error analysis is the same as above, and the privacy bound follows from the simple composition theorem (Lemma 2.5) for (pure) differential privacy. ∎
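The construction in this proof can be sketched in a few lines; the decomposition function and the component mechanisms below are illustrative stand-ins.

```python
import numpy as np

def composed_mechanism(mechanisms, decompose, dataset):
    """Lemma 3.3 sketch: split each element w into parts with w = sum_i f_i(w),
    run the i-th mechanism on the i-th induced dataset, and release the sum of
    the answers. Privacy adds up under zCDP composition; error adds up by
    subadditivity of the error measure."""
    m = len(mechanisms)
    induced = [[decompose(w)[i] for w in dataset] for i in range(m)]
    return sum(M(np.asarray(D_i, dtype=float)) for M, D_i in zip(mechanisms, induced))
```

Running it with exact (non-private) means and the decomposition w = floor(w) + (w - floor(w)) recovers the true mean, which is the subadditivity statement with zero error on each part.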
4 Algorithms for Concentrated Differential Privacy
In this section we define our two new algorithms in the centralized model. In the subsequent section we describe how to adapt them to the local model.
4.1 The Coarse Projection Mechanism
In this section, we prove Theorem 1.2 giving the guarantees of the coarse projection mechanism. For a finite , let be an inclusion-maximal -separated subset of with respect to the metric . Let . We claim that : this follows since, by Lemma 2.7, is a -cover of . Let be the projection mechanism (as in Lemma 2.2) and let be the trivial -zCDP mechanism where for all . Note that has error . We now invoke Lemma 3.3 with and as described, and using the (subadditive) error measure . As mentioned in the introduction, this gives the following simple mechanism:
Round each element of the dataset to the nearest point in the covering to get a rounded dataset .
Add enough Gaussian noise to to preserve -zCDP; let the resulting noisy vector be .
Output the closest point to in the convex hull of , with respect to the norm.
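The three steps above can be sketched end to end. The Frank-Wolfe subroutine for projecting onto the convex hull of a finite set is our own illustrative choice of projection algorithm (any Euclidean projection onto the hull would do), and all names are ours.

```python
import numpy as np

def project_conv(y, vertices, iters=2000):
    """Euclidean projection of y onto conv(vertices) via Frank-Wolfe: linear
    minimization over a polytope reduces to a minimum over its vertices, so
    every iterate is an explicit convex combination of the given points."""
    V = np.asarray(vertices, dtype=float)
    x = V[0].copy()
    for t in range(iters):
        s = V[np.argmin(V @ (x - y))]       # vertex minimizing <gradient, s>
        x += (2.0 / (t + 2.0)) * (s - x)    # standard step size 2/(t+2)
    return x

def coarse_projection_mechanism(dataset, cover, rho, rng):
    """Sketch of the three steps: round each element to its nearest cover
    point, privatize the rounded mean with Gaussian noise calibrated to
    rho-zCDP, and project back onto the convex hull of the cover."""
    W = np.asarray(dataset, dtype=float)
    C = np.asarray(cover, dtype=float)
    rounded = np.array([C[np.argmin(np.linalg.norm(C - w, axis=1))] for w in W])
    n, d = rounded.shape
    diam = max(np.linalg.norm(a - b) for a in C for b in C)
    sigma = (diam / n) / np.sqrt(2.0 * rho)  # l2 sensitivity of the mean is diam/n
    noisy = rounded.mean(axis=0) + rng.normal(0.0, sigma, size=d)
    return project_conv(noisy, C)
```

Note that every Frank-Wolfe iterate is a convex combination of cover points, so the output always lies in the convex hull of the cover, as the projection step requires.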
We will use the following lemma to analyze the error incurred by the projection mechanism (corresponding to steps 2. and 3. above).
For any and any -separated set with diameter , we have
By Dudley’s inequality (Lemma 2.8), we have
Now we can bound the two summands on the right hand side separately. Note that for , because is -separated, we have : indeed, every covering of radius has to contain every point of . Therefore
On the other hand, for the second summand, we have
This completes the proof. ∎
Hence, as soon as , the first term is bounded by , and the total error . By applying Lemma 4.1, we can deduce that it is enough to have
Since , we have for each . Therefore, as soon as