Differential privacy is compatible with a tremendous number of powerful data analysis tasks, including essentially any statistical learning problem [KLN11, CMS11, BST14] and the generation of synthetic data consistent with exponentially large families of statistics [BLR13, RR10, HR10, GRU12, NTZ13]. Unfortunately, it is also beset with a comprehensive set of computational hardness results. Of course, it inherits all of the computational hardness results from the (non-private) agnostic learning literature: for example, even the simplest learning tasks, like finding the best conjunction or linear separator to approximately minimize classification error, are hard [FGKP09, FGRW12, DOSW11]. In addition, tasks that are easy absent privacy constraints can become hard when these constraints are added. For example, although it is information-theoretically possible to privately construct synthetic data consistent with all $d$-way marginals for $d$-dimensional data, privately constructing synthetic data for even $2$-way marginals is computationally hard [UV10]. These hardness results extend even to providing numeric answers to more than quadratically many statistical queries [Ull16].
How should we proceed in the face of pervasive computational hardness? We might take inspiration from machine learning, which has not been slowed, despite the fact that its most basic problems (e.g. learning linear separators) are already hard even to approximate. Instead, the field has employed heuristics with tremendous success, including exact optimization of convex surrogate loss functions (as in the case of SVMs), decision tree heuristics, gradient-based methods for differentiable but non-convex problems (as in back-propagation for training neural networks), and integer programming solvers (as in recent work on interpretable machine learning [UR16]). Other fields such as operations research have similarly developed sophisticated heuristics, including integer program solvers and SAT solvers, that are able to routinely solve problems that are hard in the worst case.
The case of private data analysis is different, however. If we are only concerned with performance (as is the case for most machine learning and combinatorial optimization tasks), we have the freedom to try different heuristics, and evaluate our algorithms in practice. Thus the design of heuristics that perform well in practice can be undertaken as an empirical science. In contrast, differential privacy is an inherently worst-case guarantee that cannot be evaluated empirically (see [GM18] for lower bounds for black-box testing of privacy definitions).
In this paper, we build a theory for how to employ non-private heuristics (of which there are many, benefitting from many years of intense optimization) to solve computationally hard problems in differential privacy. Our goal is to guide the design of practical algorithms about which we can still prove theorems:
We will aim to prove accuracy theorems under the assumption that our heuristics solve some non-private problem optimally. We are happy to make this assumption when proving our accuracy theorems, because accuracy is something that can be empirically evaluated on the datasets that we are interested in. An assumption like this is also necessary, because we are designing algorithms for problems that are computationally hard in the worst case. However:
We aim to prove that our algorithms are differentially private in the worst case, even under the assumption that our heuristics might fail in an adversarial manner.
1.1 Overview of Our Results
Informally, we give a collection of results showing the existence of oracle-efficient algorithms for privately solving learning and synthetic data generation problems defined by discrete classes of functions that have a special (but common) combinatorial structure. One might initially ask whether it is possible to give a direct reduction from a non-private but efficient algorithm for solving a learning problem to an efficient private algorithm for solving the same learning problem without requiring any special structure at all. However, this is impossible, because there are classes of functions (namely those that have finite VC-dimension but infinite Littlestone dimension) that are known to be learnable absent the constraint of privacy, but are not privately learnable in an information-theoretic sense [BNSV15, ALMM18]. The main question we leave open is whether being information theoretically learnable under the constraint of differential privacy is sufficient for oracle-efficient private learning. We give a barrier result suggesting that it might not be.
Before we summarize our results in more detail, we give some informal definitions.
We begin by defining the kinds of oracles that we will work with, and end-goals that we will aim for. We will assume the existence of oracles for (non-privately) solving learning problems: for example, an oracle which can solve the empirical risk minimization problem for discrete linear threshold functions. Because ultimately oracles will be implemented using heuristics, we consider two types of oracles:
Certifiable heuristic oracles might fail, but when they succeed, they come with a certificate of success. Many heuristics for solving integer programs are certifiable, including cutting planes methods and branch and bound methods. SAT Solvers (and any other heuristic for solving a decision problem in NP) are also certifiable.
We define an oracle-efficient non-robustly differentially private algorithm to be an algorithm that runs in polynomial time in all relevant parameters given access to an oracle for some problem, and has an accuracy guarantee and a differential privacy guarantee which may both be contingent on the guarantees of the oracle; i.e. if the oracle is replaced with a heuristic, the algorithm may no longer be differentially private. Although in certain situations (e.g. when we have very high confidence that our heuristics actually do succeed on all instances we will ever encounter) it might be acceptable to have a privacy guarantee that is contingent on having an infallible oracle, we would much prefer a privacy guarantee that holds in the worst case. We say that an oracle-efficient algorithm is robustly differentially private if its privacy guarantee is not contingent on the behavior of the oracle, and holds in the worst case, even if an adversary is in control of the heuristic that stands in for our oracle.
1.1.2 Learning and Optimization
Our first result is a reduction from efficient non-private learning to efficient private learning over any class of functions $Q$ that has a small universal identification set [GKS93]. A universal identification set of size $m$ is a set of $m$ examples such that the labelling of these examples by a function $q \in Q$ is enough to uniquely identify $q$. Equivalently, a universal identification set can be viewed as a separator set [SKS16]: for any pair of functions $q \neq q' \in Q$, there must be some example $x$ in the universal identification set such that $q(x) \neq q'(x)$. We will use these terms interchangeably throughout the paper. We show that if $Q$ has a universal identification set of size $m$, then given an oracle which solves the empirical risk minimization problem (non-privately) over $Q$, there is an $\epsilon$-differentially private algorithm with additional running time scaling linearly with $m$ and error scaling linearly with $m^2/\epsilon$ that solves the private empirical risk minimization problem over $Q$. The error can be improved to $O(m^{1.5}\sqrt{\log(1/\delta)}/\epsilon)$, while satisfying $(\epsilon, \delta)$-differential privacy. Many well studied discrete concept classes from the PAC learning literature have small universal identification sets. For example, in $d$ dimensions, boolean conjunctions, disjunctions, parities, and halfspaces defined over the hypercube have universal identification sets of size $d$. This means that for these classes, our oracle-efficient algorithm has error that is larger than the generic optimal (and computationally inefficient) learner from [KLN11] by a factor of $O(\sqrt{d})$. Other classes of functions also have small universal identification sets; for example, decision lists have universal identification sets of size $d^2$.
The reduction described above has the disadvantage that not only its accuracy guarantees — but also its proof of privacy — depend on the oracle correctly solving the empirical risk minimization problem it is given; it is non-robustly differentially private. This shortcoming motivates our main technical result: a generic reduction that takes as input any oracle-efficient non-robustly differentially private algorithm (i.e. an algorithm whose privacy proof might depend on the proper functioning of the oracle) and produces an oracle-efficient robustly differentially private algorithm, whenever the oracle is implemented with a certifiable heuristic. As discussed above, this class of heuristics includes the integer programming algorithms used in most commercial solvers. In combination with our first result, we obtain robustly differentially private oracle-efficient learning algorithms for conjunctions, disjunctions, discrete halfspaces, and any other class of functions with a small universal identification set.
1.1.3 Synthetic Data Generation
We then proceed to the task of constructing synthetic data consistent with a class of queries $Q$. Following [HRU13, GGAH14], we view the task of synthetic data generation as the process of computing an equilibrium of a particular zero sum game played between a data player and a query player. In order to compute this equilibrium, we need to be able to instantiate two objects in an oracle-efficient manner:
a private learning algorithm for $Q$ (this corresponds to solving the best response problem for the “query player”), and
a no-regret learning algorithm for a dual class of functions that results from swapping the role of the data element and the query function (this allows the “data player” to obtain a diminishing regret bound in simulated play of the game).
The no-regret learning algorithm need not be differentially private. From our earlier results, we are able to construct an oracle-efficient robustly differentially private learning algorithm for $Q$ whenever it has a small universal identification set. On the other hand, Syrgkanis et al. [SKS16] show how to obtain an oracle-efficient no-regret learning algorithm for a class of functions under the same condition. Hence, we obtain an oracle-efficient robustly differentially private synthetic data generation algorithm for any class of functions $Q$ for which both $Q$ and its dual class have small universal identification sets. Fortunately, this is the case for many interesting classes of functions, including boolean disjunctions, conjunctions, discrete halfspaces, and parity functions. The result is that we obtain oracle-efficient algorithms for generating private synthetic data for all of these classes. We note that the oracle used by the data player need not be certifiable.
1.1.4 A Barrier Result
Finally, we exhibit a barrier to giving oracle-efficient private learning algorithms for all classes of functions known to be privately learnable. We identify a class of private learning algorithms called perturbed empirical risk minimizers (pERMs) which output the query that exactly minimizes some perturbation of their empirical risk on the dataset. This class of algorithms includes the ones we give in this paper, as well as many other differentially private learning algorithms, including the exponential mechanism and report-noisy-min. We show that any private pERM can be efficiently used as a no-regret learning algorithm with regret guarantees that depend on the scale of the perturbations it uses. This allows us to reduce to a lower bound on the running time of oracle-efficient online learning algorithms due to Hazan and Koren [HK16]. The result is that there exist finite classes of queries $Q$ such that any oracle-efficient differentially private pERM algorithm must introduce perturbations that are polynomially large in the size of $Q$, whereas any such class is information-theoretically privately learnable with error that scales only with $\log |Q|$.
The barrier implies that if oracle-efficient differentially private learning algorithms are as powerful as inefficient differentially private learning algorithms, then these general oracle efficient private algorithms must not be perturbed empirical risk minimizers. We conjecture that the set of problems solvable by oracle-efficient differentially private learners is strictly smaller than the set of problems solvable information theoretically under the constraint of differential privacy, but leave this as our main open question.
1.2 Additional Related Work
Conceptually, the most closely related piece of work is the “DualQuery” algorithm of [GGAH14], which in the terminology of our paper is a robustly private oracle-efficient algorithm for generating synthetic data for $k$-way marginals for constant $k$. The main idea in [GGAH14] is to formulate the private optimization problem that needs to be solved so that the only computationally hard task is one that does not depend on private data. There are other algorithms that can straightforwardly be put into this framework, like the projection algorithm from [NTZ13]. This approach immediately makes the privacy guarantees independent of the correctness of the oracle, but significantly limits the algorithm design space. In particular, the DualQuery algorithm (and the oracle-efficient version of the projection algorithm from [NTZ13]) has running time that is proportional to $|Q|$, and so can only handle polynomially sized classes of queries (which is why $k$ needs to be held constant). The main contribution of our paper is to be able to handle private optimization problems in which the hard computational step is not independent of the private data. This is significantly more challenging, and is what allows us to give oracle-efficient robustly private algorithms for constructing synthetic data for exponentially large families $Q$. It is also what lets us give oracle-efficient private learning algorithms over exponentially large $Q$ for the first time.
A recent line of work starting with the “PATE” algorithm [PAE16] together with more recent theoretical analyses of similar algorithms by Dwork and Feldman, and Bassily, Thakkar, and Thakurta [DF18, BTT18] can be viewed as giving oracle-efficient algorithms for an easier learning task, in which the goal is to produce a finite number of private predictions rather than privately output the model that makes the predictions. These can be turned into oracle efficient algorithms for outputting a private model under the assumption that the mechanism has access to an additional source of unlabeled data drawn from the same distribution as the private data, but that does not need privacy protections. In this setting, there is no need to take advantage of any special structure of the hypothesis class , because the information theoretic lower bounds on private learning proven in [BNSV15, ALMM18] do not apply. In contrast, our results apply without the need for an auxiliary source of non-private data.
Privately producing contingency tables, and synthetic data that encode them — i.e. the answers to statistical queries defined by conjunctions of features — has been a key challenge problem in differential privacy at least since [BCD07]. Since then, a number of algorithms and hardness results have been given [UV10, GHRU13, KRSU10, TUV12, HRS12, FK14, CTUW14]. This paper gives the first oracle-efficient algorithm for generating synthetic data consistent with a full contingency table, and the first oracle-efficient algorithm for answering arbitrary conjunctions to near optimal error.
Technically, our work is inspired by Syrgkanis et al. [SKS16] who show how a small separator set (equivalently a small universal identification set) can be used to derive oracle-efficient no-regret algorithms in the contextual bandit setting. The small separator property has found other uses in online learning, including in the oracle-efficient construction of nearly revenue optimal auctions [DHL17]. Hazan and Koren [HK16] show lower bounds for oracle-efficient no-regret learning algorithms in the experts setting, which forms the basis of our barrier result. More generally, there is a rich literature studying oracle-efficient algorithms in machine learning [BDH05, BBB08, BILM16] and optimization [BTHKM15] as a means of dealing with worst-case hardness, and more recently, for machine learning subject to fairness constraints [ABD18, KNRW18, AIK18].
We also make crucial use of a property of differentially private algorithms, first shown by [CLN16]: when differentially private algorithms are run on databases of size $n$ with privacy parameter $\epsilon \approx 1/\sqrt{n}$, they have similar output distributions when run on datasets that are sampled from the same distribution, rather than just on neighboring datasets. In [CLN16], this was used as a tool to show the existence of robustly generalizing algorithms (also known as distributionally private algorithms in [BLR13]). We prove a new variant of this fact that holds when the datasets are not sampled i.i.d. and use it for the first time in an analysis to prove differential privacy. The technique might be of independent interest.
2.1 Differential Privacy Tools
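Let $\mathcal{X}$ denote a $d$-dimensional data domain (e.g. $\mathbb{R}^d$ or $\{0,1\}^d$). We write $n$ to denote the size of a dataset $S$. We call two data sets $S, S' \in \mathcal{X}^n$ neighbors (written as $S \sim S'$) if $S$ can be derived from $S'$ by replacing a single data point with some other element of $\mathcal{X}$.
Fix $\epsilon, \delta \geq 0$. A randomized algorithm $A : \mathcal{X}^* \to \mathcal{O}$ is $(\epsilon, \delta)$-differentially private if for every pair of neighboring data sets $S \sim S' \in \mathcal{X}^*$, and for every event $\Omega \subseteq \mathcal{O}$: $\Pr[A(S) \in \Omega] \leq \exp(\epsilon) \Pr[A(S') \in \Omega] + \delta$.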
Differentially private computations enjoy two nice properties:
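Let $A : \mathcal{X}^n \to \mathcal{O}$ be any $(\epsilon, \delta)$-differentially private algorithm, and let $f : \mathcal{O} \to \mathcal{O}'$ be any function. Then the algorithm $f \circ A : \mathcal{X}^n \to \mathcal{O}'$ is also $(\epsilon, \delta)$-differentially private.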
Post-processing implies that, for example, every decision process based on the output of a differentially private algorithm is also differentially private.
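Let $A_1 : \mathcal{X}^n \to \mathcal{O}$, $A_2 : \mathcal{X}^n \times \mathcal{O} \to \mathcal{O}'$ be such that $A_1$ is $(\epsilon_1, \delta_1)$-differentially private, and $A_2(\cdot, o)$ is $(\epsilon_2, \delta_2)$-differentially private for every $o \in \mathcal{O}$. Then the algorithm $A : \mathcal{X}^n \to \mathcal{O}'$ defined as $A(x) = A_2(x, A_1(x))$ is $(\epsilon_1 + \epsilon_2, \delta_1 + \delta_2)$-differentially private.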
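The Laplace distribution plays a fundamental role in differential privacy. The Laplace distribution centered at $0$ with scale $b$ is the distribution with probability density function $\mathrm{Lap}(z \mid b) = \frac{1}{2b} e^{-|z|/b}$. We write $X \sim \mathrm{Lap}(b)$ when $X$ is a random variable drawn from a Laplace distribution with scale $b$. Let $f : \mathcal{X}^n \to \mathbb{R}^k$ be an arbitrary function. The $\ell_1$ sensitivity of $f$ is defined to be $\Delta_1(f) = \max_{S \sim S'} \|f(S) - f(S')\|_1$. The Laplace mechanism with parameter $\epsilon$ simply adds noise drawn independently from $\mathrm{Lap}(\Delta_1(f)/\epsilon)$ to each coordinate of $f(S)$.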
Theorem 3 ([DMNS06]).
The Laplace mechanism is $\epsilon$-differentially private.
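The Laplace mechanism described above can be sketched in a few lines. This is only an illustration under stated assumptions: the helper names `sample_laplace` and `laplace_mechanism` are hypothetical, the sampler uses the fact that the difference of two i.i.d. exponential variables is Laplace-distributed, and a production implementation would need additional care with floating-point arithmetic.

```python
import random

def sample_laplace(scale, rng):
    """Draw one sample from the Laplace distribution centered at 0.

    The difference of two i.i.d. exponentials with mean `scale`
    is Laplace-distributed with that scale.
    """
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def laplace_mechanism(values, l1_sensitivity, epsilon, rng):
    """Add independent Laplace noise of scale l1_sensitivity / epsilon
    to each coordinate of `values`."""
    scale = l1_sensitivity / epsilon
    return [v + sample_laplace(scale, rng) for v in values]

# Usage: release the fraction of ones in a binary dataset.
# Replacing one of the n records changes the mean by at most 1/n.
rng = random.Random(0)
data = [1, 0, 1, 1, 0, 1, 0, 1]
true_mean = sum(data) / len(data)
noisy_mean = laplace_mechanism([true_mean], 1.0 / len(data), epsilon=1.0, rng=rng)[0]
```

Passing the random generator explicitly keeps the sketch reproducible, which is convenient when checking accuracy empirically.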
2.2 Statistical Queries and Separator Sets
We study learning (optimization) and synthetic data generation problems for statistical queries defined over a data universe $\mathcal{X}$. A statistical query over $\mathcal{X}$ is a function $q : \mathcal{X} \to \{0,1\}$. A statistical query can represent, e.g. any binary classification model or the binary loss function that it induces. Given a dataset $S \in \mathcal{X}^n$, the value of a statistical query $q$ on $S$ is defined to be $q(S) = \frac{1}{n}\sum_{i=1}^{n} q(S_i)$. In this paper, we will generally think about query classes $Q$ that represent standard hypothesis classes from learning theory, like conjunctions, disjunctions, halfspaces, etc.
In this paper, we will make crucial use of universal identification sets for classes of statistical queries. Universal identification sets are equivalent to separator sets, defined (in a slightly more general form) in [SKS16].
A set $U \subseteq \mathcal{X}$ is a universal identification set or separator set for a class of statistical queries $Q$ if for every pair of distinct queries $q, q' \in Q$, there is an $x \in U$ such that: $q(x) \neq q'(x)$.
If $|U| = m$, then we say that $Q$ has a separator set of size $m$.
Many classes of statistical queries defined over the boolean hypercube have separator sets of size proportional to their VC-dimension. For example, boolean conjunctions, disjunctions, halfspaces defined over the hypercube, and parity functions in $d$ dimensions all have separator sets of size $d$. When we solve learning problems over these classes, we will be interested in the set of queries that define the 0/1 loss function over these classes; as we observe in Appendix A, if a hypothesis class has a separator set of size $m$, then so does the class of queries representing the empirical loss for functions in that hypothesis class.
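To make the separator set definition concrete, the following sketch checks it by brute force on a small example. The example class (monotone conjunctions over four boolean coordinates) and the candidate set (the all-ones point with one coordinate flipped to 0) are chosen here for illustration; the function names are hypothetical.

```python
from itertools import combinations

def is_separator_set(U, queries):
    """Return True if every pair of distinct queries disagrees on some x in U."""
    for q1, q2 in combinations(queries, 2):
        if all(q1(x) == q2(x) for x in U):
            return False
    return True

def monotone_conjunction(T):
    """Query that is 1 exactly when x[i] == 1 for every index i in T."""
    return lambda x: int(all(x[i] == 1 for i in T))

d = 4
# All 2^d monotone conjunctions, one per subset T of the coordinates.
Q = [monotone_conjunction(T) for r in range(d + 1) for T in combinations(range(d), r)]
# Candidate separator set: the all-ones point with coordinate i set to 0.
# The conjunction over T evaluates to 0 on the i-th point exactly when i is in T.
U = [tuple(0 if j == i else 1 for j in range(d)) for i in range(d)]
print(is_separator_set(U, Q))  # True
```

The check runs in time quadratic in the number of queries, so it is only useful for sanity-checking small instances, not for the exponentially large classes the text targets.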
2.3 Learning and Synthetic Data Generation
We study private learning as empirical risk minimization (the connection between in-sample risk and out-of-sample risk is standard, and follows from e.g. VC-dimension bounds [KV94] or directly from differential privacy (see e.g. [BST14, DFH15])). Such problems can be cast as finding a function $q$ in a class $Q$ that minimizes $q(S)$, subject to differential privacy (observe that the empirical risk of a hypothesis is a statistical query; see Appendix A). We will therefore study minimization problems over classes of statistical queries generally:
We say that a randomized algorithm $M : \mathcal{X}^n \to Q$ is an $(\alpha, \beta)$-minimizer for $Q$ if for every dataset $S \in \mathcal{X}^n$, with probability $1 - \beta$, it outputs $M(S) = q$ such that: $q(S) \leq \min_{q^* \in Q} q^*(S) + \alpha$.
Synthetic data generation, on the other hand, is the problem of constructing a new dataset that approximately agrees with the original dataset with respect to a fixed set of statistical queries:
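We say that a randomized algorithm $M : \mathcal{X}^n \to \mathcal{X}^*$ is an $(\alpha, \beta)$-accurate synthetic data generation algorithm for $Q$ if for every dataset $S \in \mathcal{X}^n$, with probability $1 - \beta$, it outputs $M(S) = S'$ such that for all $q \in Q$: $|q(S) - q(S')| \leq \alpha$.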
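Both accuracy notions above can be evaluated empirically on a given dataset, which is the property the paper relies on when it assumes the oracle succeeds. A minimal sketch follows, assuming a small finite query class so that brute-force enumeration is feasible; the function names and the toy class of single-coordinate queries are hypothetical.

```python
def query_value(q, dataset):
    """Value of a statistical query on a dataset: the average of q over its points."""
    return sum(q(x) for x in dataset) / len(dataset)

def excess_risk(q_out, queries, dataset):
    """How far q_out is from the best query in the class on this dataset."""
    best = min(query_value(q, dataset) for q in queries)
    return query_value(q_out, dataset) - best

def max_query_error(queries, dataset, synthetic):
    """Largest disagreement between the dataset and the synthetic data over the class."""
    return max(abs(query_value(q, dataset) - query_value(q, synthetic)) for q in queries)

# Toy class: the three single-coordinate queries over {0,1}^3.
Q = [lambda x, i=i: x[i] for i in range(3)]
S = [(1, 0, 1), (1, 1, 1), (0, 1, 1), (1, 0, 0)]
S_syn = [(1, 0, 1), (1, 1, 1)]
print(excess_risk(Q[0], Q, S))        # 0.25
print(max_query_error(Q, S, S_syn))   # 0.25
```

An output meets the minimizer guarantee on this dataset when `excess_risk` is at most the target error, and the synthetic data meets its guarantee when `max_query_error` is.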
2.4 Oracles and Oracle Efficient Algorithms
We discuss several kinds of oracle-efficient algorithms in this paper. It will be useful for us to study oracles that solve weighted generalizations of the minimization problem, in which each datapoint $x \in S$ is paired with a real-valued weight $w \in \mathbb{R}$. In the literature on oracle-efficiency in machine learning, these are widely employed, and are known as cost-sensitive classification oracles. Via a simple translation and re-weighting argument, they are no more powerful than unweighted minimization oracles, but are more convenient to work with.
A weighted optimization oracle for a class of statistical queries $Q$ is a function $O^* : (\mathcal{X} \times \mathbb{R})^* \to Q$ that takes as input a weighted dataset $WD \in (\mathcal{X} \times \mathbb{R})^*$ and outputs a query $q = O^*(WD)$ such that $q \in \arg\min_{q^* \in Q} \sum_{(x_i, w_i) \in WD} w_i \, q^*(x_i)$.
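For intuition, a weighted optimization oracle can always be realized by exhaustive search when the query class is small. The sketch below is such a stand-in, not an efficient oracle: it takes time linear in the size of the class, which is exactly what the paper's oracle abstraction is meant to avoid. The names are hypothetical.

```python
def weighted_value(q, weighted_dataset):
    """Weighted sum of q over (point, weight) pairs; weights may be negative."""
    return sum(w * q(x) for x, w in weighted_dataset)

def brute_force_oracle(queries, weighted_dataset):
    """Return a query minimizing the weighted sum, by exhaustive search."""
    return min(queries, key=lambda q: weighted_value(q, weighted_dataset))

# Toy class: the two single-coordinate queries over {0,1}^2.
Q = [lambda x, i=i: x[i] for i in range(2)]
WD = [((1, 0), 1.0), ((0, 1), 2.0), ((1, 1), -0.5)]
best = brute_force_oracle(Q, WD)
print(weighted_value(best, WD))  # 0.5
```

In practice this role would be played by a heuristic such as an integer programming solver, as the surrounding text discusses.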
In this paper, we will study algorithms that have access to weighted optimization oracles for learning problems that are computationally hard. Since we do not believe that such oracles have worst-case polynomial time implementations, in practice, we will instantiate such oracles with heuristics that are not guaranteed to succeed. There are two failure modes for a heuristic: it can fail to produce an output at all, or it can output an incorrect query. The distinction can be important. We call a heuristic that might fail to produce an output, but never outputs an incorrect solution a certifiable heuristic optimization oracle:
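A certifiable heuristic optimization oracle for a class of queries $Q$ is a polynomial time algorithm $O : (\mathcal{X} \times \mathbb{R})^* \to Q \cup \{\bot\}$ that takes as input a weighted dataset $WD \in (\mathcal{X} \times \mathbb{R})^*$ and either outputs $O(WD) = q \in \arg\min_{q^* \in Q} \sum_{(x_i, w_i) \in WD} w_i \, q^*(x_i)$ or else outputs $\bot$ (“Fail”). If it outputs a statistical query $q$, we say the oracle has succeeded.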
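The defining feature of a certifiable oracle is that it either returns a provably optimal query or reports failure, and never returns a wrong answer. The toy sketch below mimics this with a budgeted exhaustive search: it can only certify optimality if it is allowed to examine the whole class. The budget parameter `max_evaluations` and the use of `None` for "Fail" are assumptions of this illustration, not part of the paper.

```python
def certifiable_oracle(queries, weighted_dataset, max_evaluations):
    """Exhaustive search that returns None ("Fail") unless it can certify optimality.

    The answer is certified only when every query in the class was examined.
    """
    if len(queries) > max_evaluations:
        return None  # budget too small to certify an optimal answer
    return min(queries, key=lambda q: sum(w * q(x) for x, w in weighted_dataset))

# Toy class: the two single-coordinate queries over {0,1}^2.
Q = [lambda x, i=i: x[i] for i in range(2)]
WD = [((1, 0), 1.0), ((0, 1), 2.0), ((1, 1), -0.5)]
print(certifiable_oracle(Q, WD, max_evaluations=1))        # None
print(certifiable_oracle(Q, WD, max_evaluations=10) is Q[0])  # True
```

Real certifiable heuristics such as branch and bound behave analogously: a run that terminates within its time limit comes with a proof of optimality, and a run that does not is treated as a failure.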
In contrast, a heuristic optimization oracle (that is not certifiable) has no guarantees of correctness. Without loss of generality, such oracles never need to return “Fail” (since they can always instead output a default statistical query in this case).
A (non-certifiable) heuristic optimization oracle for a class of queries $Q$ is an arbitrary polynomial time algorithm $O : (\mathcal{X} \times \mathbb{R})^* \to Q$. Given a call to the oracle defined by a weighted dataset $WD$ we say that the oracle has succeeded on this call up to error $\alpha$ if it outputs a query $q$ such that $\sum_{(x_i, w_i) \in WD} w_i \, q(x_i) \leq \min_{q^* \in Q} \sum_{(x_i, w_i) \in WD} w_i \, q^*(x_i) + \alpha$. If it succeeds up to error 0, we just say that the heuristic oracle has succeeded. Note that there may not be any efficient procedure to determine whether the oracle has succeeded up to error $\alpha$.
We say an algorithm $A_O$ is (certifiable)-oracle dependent if throughout the course of its run it makes a series of (possibly adaptive) calls to a (certifiable) heuristic optimization oracle $O$. An oracle-dependent algorithm $A_O$ is oracle equivalent to an algorithm $A$ if given access to a perfect optimization oracle $O^*$, $A_{O^*}$ induces the same distribution on outputs as $A$. We now state an intuitive lemma (that could also be taken as a more formal definition of oracle equivalence). See the Appendix for a proof.
Let be a certifiable-oracle dependent algorithm that is oracle equivalent to . Then for any fixed input dataset , there exists a coupling between and such that .
We will also discuss differentially private heuristic optimization oracles, in order to state additional consequences of our construction in Section 4. Note that because differential privacy precludes exact computations, differentially private heuristic oracles are necessarily non-certifiable, and will never succeed up to error 0.
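A weighted $(\epsilon, \delta)$-differentially private $(\alpha, \beta)$-accurate learning oracle for a class of statistical queries $Q$ is an $(\epsilon, \delta)$-differentially private algorithm $O : (\mathcal{X} \times \mathbb{R})^* \to Q$ that takes as input a weighted dataset $WD \in (\mathcal{X} \times \mathbb{R})^*$ and outputs a query $q \in Q$ such that with probability $1 - \beta$: $\sum_{(x_i, w_i) \in WD} w_i \, q(x_i) \leq \min_{q^* \in Q} \sum_{(x_i, w_i) \in WD} w_i \, q^*(x_i) + \alpha$.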
We say that an algorithm is oracle-efficient if given access to an oracle (in this paper, always a weighted optimization oracle for a class of statistical queries) it runs in polynomial time in the length of its input, and makes a polynomial number of calls to the oracle. In practice, we will be interested in the performance of oracle-efficient algorithms when they are instantiated with heuristic oracles. Thus, we further require oracle-efficient algorithms to halt in polynomial time even when the oracle fails. When we design algorithms for optimization and synthetic data generation problems, their -accuracy guarantees will generally rely on all queries to the oracle succeeding (possibly up to error ). If our algorithms are merely oracle equivalent to differentially private algorithms, then their privacy guarantees depend on the correctness of the oracle. However, we would prefer that the privacy guarantee of the algorithm not depend on the success of the oracle. We call such algorithms robustly differentially private.
An oracle-efficient algorithm is $(\epsilon, \delta)$-robustly differentially private if it satisfies $(\epsilon, \delta)$-differential privacy even under worst-case performance of a heuristic optimization oracle. In other words, it is differentially private for every heuristic oracle that it might be instantiated with.
We write that an oracle efficient algorithm is non-robustly differentially private to mean that it is oracle equivalent to a differentially private algorithm.
3 Oracle Efficient Optimization
In this section, we show how weighted optimization oracles can be used to give differentially private oracle-efficient optimization algorithms for many classes of queries with performance that is worse only by a $\sqrt{m}$ factor compared to that of the (computationally inefficient) exponential mechanism. The first algorithm we give is not robustly differentially private: that is, its differential privacy guarantee relies on having access to a perfect oracle. We then show how to make that algorithm (or any other algorithm that is oracle equivalent to a differentially private algorithm) robustly differentially private when instantiated with a certifiable heuristic optimization oracle.
3.1 A (Non-Robustly) Private Oracle Efficient Algorithm
In this section, we give an oracle-efficient (non-robustly) differentially private optimization algorithm that works for any class of statistical queries that has a small separator set. Intuitively, it is attempting to implement the “Report-Noisy-Min” algorithm (see e.g. [DR14]), which outputs the query $q$ that minimizes a (perturbed) estimate $\hat{q}(S) = q(S) + Z_q$, where $Z_q$ is an independent Laplace perturbation for each $q \in Q$. Because Report-Noisy-Min samples an independent perturbation for each query $q$, it is inefficient: its run time is linear in $|Q|$. Our algorithm, “Report Separator-Perturbed Min” (RSPM), instead augments the dataset in a way that implicitly induces perturbations of the query values $q(S)$. The perturbations are no longer independent across queries, and so to prove privacy, we need to use the structure of a separator set.
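For comparison with RSPM, here is a minimal sketch of Report-Noisy-Min as described above: one independent Laplace perturbation per query, followed by an argmin. The noise scale is left as a parameter because the value required for a given privacy level is not derived in this sketch, and the function names are hypothetical.

```python
import random

def sample_laplace(scale, rng):
    """Laplace sample as the difference of two i.i.d. exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def report_noisy_min(queries, dataset, noise_scale, rng):
    """Return the index of the query with the smallest independently perturbed value."""
    n = len(dataset)
    noisy = []
    for q in queries:  # one fresh Laplace draw per query, so time is linear in |Q|
        value = sum(q(x) for x in dataset) / n
        noisy.append(value + sample_laplace(noise_scale, rng))
    return min(range(len(queries)), key=noisy.__getitem__)

# Toy class: the three single-coordinate queries over {0,1}^3.
Q = [lambda x, i=i: x[i] for i in range(3)]
S = [(1, 0, 1), (1, 1, 1), (0, 0, 1), (1, 0, 0)]
choice = report_noisy_min(Q, S, noise_scale=0.5, rng=random.Random(0))
```

The explicit loop over `queries` is the bottleneck the text refers to: for a class of exponential size, both the sampling and the argmin are infeasible.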
The algorithm is straightforward: it simply augments the dataset with one copy of each element of the separator set, each with a weight drawn independently from the Laplace distribution. All original elements in the dataset are assigned weight 1. The algorithm then simply passes this weighted dataset to the weighted optimization oracle, and outputs the resulting query. The number of random variables that need to be sampled is therefore now equal to the size of the separator set, instead of the size of $Q$. The algorithm is closely related to a no-regret learning algorithm given in [SKS16]; the only difference is in the magnitude of the noise added, and in the analysis, since we need a substantially stronger form of stability.
It is thus immediate that the Report Separator-Perturbed Min algorithm is oracle-efficient whenever the size of the separator set is polynomial: it simply augments the dataset with a single copy of each of $m$ separator elements, makes $m$ draws from the Laplace distribution, and then makes a single call to the oracle:
The Report Separator-Perturbed Min algorithm is oracle-efficient.
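The description of RSPM above translates almost directly into code. The following is a sketch under stated assumptions rather than the paper's exact algorithm: the Laplace scale is passed in as `noise_scale` instead of being derived from the privacy parameter and the separator set size, the oracle is a brute-force stand-in over a toy class of single-coordinate queries, and all names are hypothetical.

```python
import random

def sample_laplace(scale, rng):
    """Laplace sample as the difference of two i.i.d. exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def rspm(dataset, separator_set, oracle, noise_scale, rng):
    """Augment the data with Laplace-weighted separator points, then call the oracle once."""
    weighted = [(x, 1.0) for x in dataset]  # original points get weight 1
    weighted += [(e, sample_laplace(noise_scale, rng)) for e in separator_set]
    return oracle(weighted)

d = 3

def brute_force_oracle(weighted):
    """Stand-in oracle for the toy class q_i(x) = x[i]; returns the minimizing index i."""
    return min(range(d), key=lambda i: sum(w * x[i] for x, w in weighted))

# The standard basis vectors separate the single-coordinate queries:
# q_i is 1 on the i-th basis vector and 0 on the others.
U = [tuple(int(j == i) for j in range(d)) for i in range(d)]
S = [(1, 0, 1), (1, 1, 1), (0, 0, 1), (1, 0, 0)]
q_index = rspm(S, U, brute_force_oracle, noise_scale=2.0, rng=random.Random(0))
```

Only as many Laplace draws are made as there are separator points, and the oracle is called exactly once, regardless of how large the query class is.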
The accuracy analysis for the Report Separator-Perturbed Min algorithm is also straightforward, and follows by bounding the weighted sum of the additional entries added to the original data set.
The Report Separator-Perturbed Min algorithm is an -minimizer for for:
Let be the query returned by RSPM, and let be the true minimizer . Then we show that with probability . By the CDF of the Laplace distribution and a union bound over the random variables , we have that with probability :
Since for every query , , this means that with probability , . Similarly . Combining these bounds gives:
as desired, where the second inequality follows because by definition, is the true minimizer on the weighted dataset . ∎
We can bound the expected error of RSPM using Theorem 5 as well. If we denote the error of RSPM by , we’ve shown that for all , . Thus for all . Let . Since is non-negative:
Hence , and so .
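The union-bound step in the accuracy proof can be checked numerically. For a Laplace variable with scale $b$, the probability that its absolute value exceeds $t$ is $e^{-t/b}$, so each of $m$ draws exceeds $b \ln(m/\beta)$ with probability $\beta/m$, and their absolute sum is at most $m \, b \ln(m/\beta)$ with probability at least $1 - \beta$. The sketch below estimates the failure rate by simulation; the generic scale `scale` and the function names are assumptions of this illustration.

```python
import math
import random

def sample_laplace(scale, rng):
    """Laplace sample as the difference of two i.i.d. exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def tail_bound_failure_rate(m, scale, beta, trials, rng):
    """Fraction of trials in which the sum of |noise| exceeds m * scale * ln(m / beta)."""
    threshold = m * scale * math.log(m / beta)
    failures = 0
    for _ in range(trials):
        total = sum(abs(sample_laplace(scale, rng)) for _ in range(m))
        if total > threshold:
            failures += 1
    return failures / trials

rate = tail_bound_failure_rate(m=5, scale=2.0, beta=0.1, trials=5000, rng=random.Random(0))
```

The observed rate is typically far below `beta`, since the union bound controls every coordinate simultaneously and is therefore loose for the sum.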
The privacy analysis is more delicate, and relies on the correctness of the oracle.
If $O^*$ is a weighted optimization oracle for $Q$, then the Report Separator-Perturbed Min algorithm is $\epsilon$-differentially private.
We begin by introducing some notation. Given a weighted dataset , and a query , let be the value when is evaluated on the weighted dataset given the realization of the noise . To allow us to distinguish queries that are output by the algorithm on different datasets and different realizations of the perturbations, write . Fix any , and define:
to be the event defined on the perturbations that the mechanism outputs query . Given a fixed we define a mapping on noise vectors as follows:
We now make a couple of observations about the function .
Fix any and any pair of neighboring datasets . Let be such that is the unique minimizer . Then . In particular, this implies that for any such :
For this argument, it will be convenient to work with un-normalized versions of our queries, so that — i.e. we do not divide by the dataset size . Note that this change of normalization does not change the identity of the minimizer. Under this normalization, the queries are now -sensitive, rather than sensitive.
Recall that . Suppose for point of contradiction that . This in particular implies that
We first observe that . This follows because:
Here the first inequality follows because the un-normalized queries are 1-sensitive, and the second follows because is the unique minimizer.
Next, we write:
Consider each term in the final sum: . Observe that by construction, each of these terms is non-negative: Clearly if , then the term is . Further, if , then by construction, . Finally, by the definition of a separator set, we know that there is at least one index such that . Thus, we can conclude:
where the final inequality follows from applying inequality 1. But rearranging, this means that , which contradicts the assumption that . ∎
Let $p$ denote the probability density function of the joint distribution of the Laplace random variables, and by abuse of notation also of each individual $\eta_i$.
For any :
For any index and , we have . In particular, if , . Since for all and , we have:
Fix any class of queries that has a finite separator set . For every dataset there is a subset such that:
On the restricted domain , there is a unique minimizer
be the set of values that do not result in unique minimizers .
Because is a finite set (any class of binary-valued queries with a separator set of size m can be no larger than 2^m), by a union bound it suffices to show that for any two distinct queries ,
This follows from the continuity of the Laplace distribution. Let be any index such that (recall that by the definition of a separator set, such an index is guaranteed to exist). For any fixed realization of , there is a single value of that equalizes and . But any single value is realized with probability .
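The tie-breaking step can be written out as a short worked equation. This is only a sketch under assumed notation that is not fixed by the text above: S is the dataset, e_1, ..., e_m are the separator elements, η_j is the Laplace weight placed on e_j, q and q' are two distinct queries, and the queries are un-normalized. With all η_j for j ≠ i held fixed, the two perturbed objectives coincide only at a single value of η_i:

```latex
\[
\sum_{x \in S} q(x) + \sum_{j=1}^{m} \eta_j\, q(e_j)
  \;=\;
\sum_{x \in S} q'(x) + \sum_{j=1}^{m} \eta_j\, q'(e_j)
\quad\Longleftrightarrow\quad
\eta_i \;=\;
\frac{\sum_{x \in S} \bigl(q'(x) - q(x)\bigr)
      + \sum_{j \neq i} \eta_j \bigl(q'(e_j) - q(e_j)\bigr)}
     {q(e_i) - q'(e_i)} .
\]
```

The denominator is non-zero precisely because q and q' disagree on e_i, and a continuous random variable takes any one fixed value with probability zero.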
In Appendix B, we give a somewhat more complicated analysis to show that by using Gaussian perturbations rather than Laplace perturbations, it is possible to improve the accuracy of the RSPM algorithm by a factor of , at the cost of satisfying -differential privacy:
The Gaussian RSPM algorithm is -differentially private, and is an oracle-efficient -minimizer for any class of functions that has a universal identification set of size for:
See Appendix B for the algorithm and its analysis.
It is instructive to compare the accuracy that we can obtain with oracle-efficient algorithms to the accuracy that can be obtained via the (inefficient, and generally optimal) exponential mechanism based generic learner from [KLN11]. The existence of a universal identification set for of size implies (and for many interesting classes of queries, including conjunctions, disjunctions, parities, and discrete halfspaces over the hypercube, this is an equality — see Appendix A). Thus, the exponential-mechanism based learner from [KLN11] is -accurate for:
Comparing this bound to ours, we see that we can obtain oracle-efficiency at a cost of roughly a factor of in our error bound. Whether or not this cost is necessary is an interesting open question.
We can conclude that for a wide range of hypothesis classes including boolean conjunctions, disjunctions, decision lists, discrete halfspaces, and several families of circuits of logarithmic depth (see Appendix A) there is an oracle-efficient differentially private learning algorithm that obtains accuracy guarantees within small polynomial factors of the optimal guarantees of the (inefficient) exponential mechanism.
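To make the shape of the Laplace-based RSPM computation concrete, the following is a minimal sketch rather than the algorithm as specified. It assumes that the weighted dataset consists of the data points with weight 1 together with the separator elements carrying independent Laplace weights, that the noise scale is 2m/ε (an assumption of this sketch), and that a brute-force search over a small query class can stand in for the weighted optimization oracle. All function names are hypothetical.

```python
import random


def laplace(rng, scale):
    # The difference of two i.i.d. exponentials with mean `scale`
    # is distributed as Laplace(0, scale).
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def brute_force_oracle(queries, weighted_data):
    # Hypothetical stand-in for a weighted optimization oracle: returns the
    # index of the query minimizing the weighted (un-normalized) objective.
    def objective(j):
        return sum(w * queries[j](x) for x, w in weighted_data)
    return min(range(len(queries)), key=objective)


def rspm(dataset, queries, separator, epsilon, rng):
    m = len(separator)
    scale = 2.0 * m / epsilon  # assumed noise scale, not taken from the text
    noise = [laplace(rng, scale) for _ in separator]
    # Data points get weight 1; separator elements get Laplace weights.
    weighted = [(x, 1.0) for x in dataset] + list(zip(separator, noise))
    return brute_force_oracle(queries, weighted)


# Toy class: q_i(x) = x[i] on {0,1}^3. The unit vectors form a separator set,
# since any two distinct q_i, q_j disagree on the i-th unit vector.
queries = [lambda x, i=i: x[i] for i in range(3)]
separator = [(1, 0, 0), (0, 1, 0), (0, 0, 1)]
dataset = [(1, 1, 0)] * 50
chosen = rspm(dataset, queries, separator, epsilon=1000.0, rng=random.Random(0))
```

With a very large ε the perturbation is negligible and the sketch returns the true minimizer; for small ε the perturbation can change which query is returned.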
3.2 A Robustly Differentially Private Oracle-Efficient Algorithm
The RSPM algorithm is not robustly differentially private, because its privacy proof depends on the oracle succeeding. This is an undesirable property for RSPM and other algorithms like it, because we do not expect to have access to actual oracles for hard problems even if we expect that there are certain families of problems for which we can reliably solve typical instances. (There may be situations in which it is acceptable to use non-robustly differentially private oracle-efficient algorithms, for example if the optimization oracle is so reliable that it has never been observed to fail on the domain of interest. But robust differential privacy provides a worst-case guarantee, which is preferable.) In this section, we show how to remedy this: we give a black box reduction, starting from a (non-robustly) differentially private algorithm that is implemented using a certifiable heuristic oracle , and producing a robustly differentially private algorithm for solving the same problem. (Recall that heuristics for solving integer programs, such as cutting planes methods, branch and bound, and branch and cut methods as implemented in commercial solvers, as well as SAT solvers, are certifiable.) will be -differentially private for a parameter that we may choose, and will have a factor of roughly running time overhead on top of . So if is oracle efficient, so is whenever the chosen value of . If the oracle never fails, then we can prove utility guarantees for it when has such guarantees, since it just runs (using a smaller privacy parameter) on a random sub-sample of the original dataset. But the privacy guarantees hold even in the worst case of the behavior of the oracle. We call this reduction the Private Robust Subsampling Meta Algorithm or PRSMA.
3.2.1 Intuition and Proof Outline
Before we describe the analysis of PRSMA, a couple of remarks are helpful in order to set the stage.
At first blush, one might be tempted to assert that if an oracle-efficient non-robustly differentially private algorithm is implemented using a certifiable heuristic oracle, then it will sample from a differentially private distribution conditioned on the event that the heuristic oracle doesn’t fail. But a moment’s thought reveals that this isn’t so: the possibility of failures both on the original dataset and on the (exponentially many) neighboring datasets can substantially change the probabilities of arbitrary events , and how these probabilities differ between neighboring datasets.
Next, one might think of the following simple candidate solution: Run the algorithm roughly many times in order to check that the failure probability of the heuristic algorithm on is , and then output a sample of only if this is so. But this doesn’t work either: the failure probability itself will change if we replace with a neighboring dataset , and so this won’t be differentially private. In fact, there is no reason to think that the failure probability of will be a low sensitivity function of , so there is no way to privately estimate the failure probability to non-trivial error.
It is possible to use the subsample-and-aggregate procedure of [NRS07] to randomly partition the dataset into pieces , and privately estimate on how many of these pieces fails with probability . The algorithm can then fail if this private count is not sufficiently large. In fact, this is the first thing that PRSMA does, in lines 1-10, setting for those pieces such that it seems that the probability of failure is , and setting for the others.
But the next step of the algorithm is to randomly select one of the partition elements amongst the set that passed the earlier test: i.e. amongst the set such that — and return one of the outputs that had been produced by running . It is not immediately clear why this should be private, because which partition elements passed the test is not itself differentially private. Showing that this results in a differentially private output is the difficult part of the analysis.
To get an idea of the problem that we need to overcome, consider the following situation which our analysis must rule out: Fix a partition of the dataset , and imagine that each partition element passes: we have for all . Now suppose that there is some event such that , but is close to 0 for all . Since , and the final output is drawn from a uniformly random partition element, this means that PRSMA outputs an element of with probability . Suppose that on a neighboring dataset , no longer passes the test and has . Since it is no longer a candidate to be selected at the last step, we now have that on , PRSMA outputs an element of with probability close to . This is a violation of -differential privacy for any non-trivial value of (i.e. ).
The problem is that (fixing a partition of into ) moving to a neighboring dataset can potentially arbitrarily change the probability that any single element survives to step 11 of the algorithm, which can in principle change the probability of arbitrary events by an additive term, rather than a multiplicative factor.
Since we are guaranteed that (with high probability) if we make it to step 11 without failing, then at least elements have survived with , it would be sufficient for differential privacy if for every event , the probabilities were within a constant factor of each other, for all . Then a change of whether a single partition element survives with or not would only add or remove an fraction of the total probability mass on event . While this seems like a “differential-privacy”-like property, it is not clear that the fact that is differentially private can help us here, because the partition elements are not neighboring datasets; in fact, they are disjoint. But as we show, it does in fact guarantee this property if we set the privacy parameter to be sufficiently small (to roughly in step 5).
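The contrast between the bad case and the desired property can be illustrated numerically. The sketch below is only an illustration with hypothetical numbers: it treats the final output as a draw from a uniformly random surviving partition element, and compares the probability of an event before and after one element stops surviving. The function names and the specific probabilities are assumptions made for this example.

```python
def output_prob(event_probs, survived):
    # Probability of the event when the output comes from a uniformly
    # random surviving partition element.
    kept = [p for p, ok in zip(event_probs, survived) if ok]
    return sum(kept) / len(kept)


def ratio_when_dropped(event_probs, drop_index):
    # Multiplicative change in the event probability when the element at
    # `drop_index` no longer survives the test.
    k = len(event_probs)
    everyone = [True] * k
    one_dropped = [i != drop_index for i in range(k)]
    p = output_prob(event_probs, everyone)
    q = output_prob(event_probs, one_dropped)
    return max(p / q, q / p)


K = 10
# Bad case: one element puts all its mass on the event, the rest almost none.
bad = [1.0] + [1e-9] * (K - 1)
# Good case: all per-element probabilities are within a factor of 2.
good = [0.2] + [0.1] * (K - 1)

bad_ratio = ratio_when_dropped(bad, 0)    # enormous: not a bounded factor
good_ratio = ratio_when_dropped(good, 0)  # close to 1, shrinking as K grows
```

When the per-element probabilities are within a factor c of each other (with c at least 1), a short calculation shows the ratio is at most 1 + c/(K - 1), which is the kind of multiplicative bound that differential privacy requires.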
With this intuition setting the stage, the roadmap of the proof is as follows. For notational simplicity, we write to denote , the oracle-efficient algorithm when implemented with a perfect oracle.
We observe that -differential privacy implies that the log-probability of any event when is run on changes by less than an additive factor of when an element of is changed. We use a method of bounded differences argument to show that this implies that the log-probability density function concentrates around its expectation, where the randomness is over the subsampling of from . A similar result is proven in [CLN16] to show that differentially private algorithms achieve what they call “perfect generalization.” We need to prove a generalization of their result because in our case, the elements of are not selected independently of one another. This guides our choice of in step 5 of the algorithm. (Lemma 5)
We show that with high probability, for every such that after step 10 of the algorithm, fails with probability at most . By Lemma 1, this implies that it is -close in total variation distance to .
We observe that fixing a partition, on a neighboring dataset, only one of the partition elements changes — and hence changes its probability of having . Since with high probability, conditioned on PRSMA not failing, partition elements survive with , parts 1 and 2 imply that changing a single partition element only changes the probability of realizing any outcome event by a multiplicative factor of .
3.2.2 The Main Theorem
PRSMA is differentially private when given as input:
An oracle-efficient non-robustly differentially private algorithm implemented with a certifiable heuristic oracle , and
Privacy parameters where and .
We analyze PRSMA with privacy parameters and , optimizing the constants at the end. Fix an input dataset with , and an adjacent dataset , such that without loss of generality differ in the element . We denote the PRSMA routine with input and dataset by . We first observe that:
This is immediate since the indicator for a failure is a post-processing of the Laplace mechanism. Since can affect at most one oracle failure, is 1-sensitive, and so publishing satisfies -differential privacy since it is an invocation of the Laplace Mechanism defined in Section 2.1. (This can also be viewed as an instantiation of the “sub-sample and aggregate” procedure of [NRS07].)
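The first phase described here can be sketched in a few lines. This is a hedged illustration, not the PRSMA pseudocode: it assumes equal-sized pieces, a pass/fail flag per piece, Laplace noise of scale 1/ε added to the count of passing pieces (matching a sensitivity-1 count), and a caller-supplied threshold. The function names and the threshold are hypothetical.

```python
import random


def laplace(rng, scale):
    # The difference of two i.i.d. exponentials with mean `scale`
    # is distributed as Laplace(0, scale).
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def random_partition(dataset, k, rng):
    # Shuffle a copy and cut it into k equal-sized pieces
    # (any remainder is dropped in this sketch).
    shuffled = list(dataset)
    rng.shuffle(shuffled)
    size = len(shuffled) // k
    return [shuffled[i * size:(i + 1) * size] for i in range(k)]


def noisy_pass_count(pass_flags, epsilon, rng):
    # Changing one data point changes one piece, hence at most one flag,
    # so the count has sensitivity 1 and Laplace(1/epsilon) noise suffices.
    return sum(pass_flags) + laplace(rng, 1.0 / epsilon)


def survives_test(pass_flags, threshold, epsilon, rng):
    # Post-processing of the noisy count: continue only if it is large enough.
    return noisy_pass_count(pass_flags, epsilon, rng) >= threshold


pieces = random_partition(range(100), k=10, rng=random.Random(1))
flags = [1] * 8 + [0] * 2  # hypothetical: 8 of 10 pieces passed
```

Because the decision to fail depends on the data only through the noisy count, it inherits the privacy guarantee of the Laplace mechanism by post-processing.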
We now proceed to the meat of the argument. To establish differential privacy we must reason about the probability of arbitrary events , rather than just individual outputs . We want to show:
We first fix some notation and define a number of events that we will need to reason about. Let:
be the uniform distribution over equal sized partitions of that the datasets are drawn from in line ; i.e. where is the partition of into .
denote , our oracle-efficient algorithm when instantiated with a perfect oracle . i.e. is the -differentially private distribution that we ideally want to sample from.
be the event that the Laplace noise in step of PRSMA has magnitude greater than .
. We will use o to denote a particular set . Let be the set . Given , let denote .
be the event that for all : .
denote the index of the randomly chosen in step of PRSMA.
be the event that the draw is such that the probabilities of outputting when run on any two are within a multiplicative factor of . Lemma 5 formally defines and shows . Let denote the set of on which event holds.
We now bound the probabilities of several of these events. By the CDF of the Laplace distribution, we have and by a union bound:
Let be the event . By the above calculation and another union bound, . Our proof now proceeds via a sequence of lemmas. All missing proofs appear in Appendix C. We first show that occurs with high probability.
Let . Let be an differentially private algorithm, where: Fix , and let . Define to be the event
Then over the random draw of