A One-Pass Private Sketch for Most Machine Learning Tasks

by   Benjamin Coleman, et al.
Rice University

Differential privacy (DP) is a compelling privacy definition that explains the privacy-utility tradeoff via formal, provable guarantees. Inspired by recent progress toward general-purpose data release algorithms, we propose a private sketch, or small summary of the dataset, that supports a multitude of machine learning tasks including regression, classification, density estimation, near-neighbor search, and more. Our sketch consists of randomized contingency tables that are indexed with locality-sensitive hashing and constructed with an efficient one-pass algorithm. We prove competitive error bounds for DP kernel density estimation. Existing methods for DP kernel density estimation scale poorly, often exponentially slower with an increase in dimensions. In contrast, our sketch can quickly run on large, high-dimensional datasets in a single pass. Exhaustive experiments show that our generic sketch delivers a similar privacy-utility tradeoff when compared to existing DP methods at a fraction of the computation cost. We expect that our sketch will enable differential privacy in distributed, large-scale machine learning settings.



1 Introduction

Large-scale data collection is an integral component of the modern machine learning pipeline. The success of a learning algorithm critically depends on the quantity and quality of the input data. Although vast amounts of data are available, the information is often of a personal or sensitive nature. Models can leak substantial information about individual participants, even if we only release predictions or outputs  fredrikson2014privacy ; fredrikson2015model ; shokri2017membership . Privacy is, therefore, an important design factor for machine learning algorithms.

To protect against diverse and sophisticated attacks, $\epsilon$-differential privacy has emerged as a theoretically rigorous definition of privacy with robust guarantees dwork2006differential . Informally, an algorithm is differentially private if the inclusion (or exclusion) of any specific data record cannot substantially alter the output. In this paper, we consider the task of privately releasing a function $f_D(q) = \sum_{x \in D} f(x, q)$ that applies a pairwise operation $f(x, q)$ to each record in a dataset $D$ and returns the sum. Although one could directly evaluate and release the result with the exponential mechanism, we can only evaluate $f_D(q)$ a finite number of times before the privacy budget runs out. The function release problem is to release a private summary of $f_D$ that can answer an unlimited number of queries hall2013differential . We also seek error bounds for all queries (by "all queries", we mean all values of $q$; there is no polynomial algorithm for all general queries ullman2013answering ), not just the global minimum as with empirical risk minimization chaudhuri2011differentially or a finite set of linear queries as in hardt2012simple .

Many machine learning problems can be solved in the function release framework. When $f(x, q)$ is a loss function, we can train a model by minimizing $f_D(q)$ over $q$. When $f(x, q)$ is a kernel function, $f_D(q)$ is the kernel density estimate (up to normalization), a convenient way to approximate the likelihood function in a classification model. As a result, function release has received considerable theoretical attention. There are elegant, general and powerful techniques to release essentially any function hall2013differential ; alda2017bernstein ; formalmirshani19a ; wang2013efficient . However, function release has not yet been widely adopted in practice because existing methods fail to scale beyond small, low-dimensional datasets. The practical utility of function release is limited by quadratic runtime and exponential memory requirements. For instance, many algorithms release $f_D$ via function approximation over an interpolation lattice, but the size of the lattice grows exponentially with the number of dimensions.

Our Contribution:

In this work, we propose a scalable approach to function release using the RACE sketch, a recent development in (non-private) data streaming algorithms. The RACE sketch can approximate pairwise kernel sums on streaming data for a specific class of kernels known as locality-sensitive hash (LSH) kernels. By restricting our attention to LSH kernel sums, we obtain fast streaming algorithms for private function release. RACE sketches consist of a small array of integers (4 MB) and are well-suited to large-scale distributed settings. Private sketches are sought after by practitioners because they combine algorithmic efficiency with privacy guarantees apple2017 . We show how to construct a private RACE sketch for the LSH kernel sum $f_D(q) = \sum_{x \in D} k(x, q)$. We prove that our sketch is $\epsilon$-differentially private and derive pointwise error bounds for our approximation to $f_D(q)$. Our bounds are competitive with existing methods but come at a smaller computation cost. RACE easily scales to datasets with hundreds of dimensions and millions of entries.

Although the restriction to LSH kernels might seem limiting, we argue that most machine learning tasks can be performed using the RACE sketch. We show how to express classification, linear regression, kernel density estimation (KDE), anomaly detection and mode finding in terms of specific LSH kernel compositions. We conduct an exhaustive set of experiments with KDE, classification and linear regression. Our experiments show that RACE can release useful functions for many machine learning tasks with a competitive privacy-utility tradeoff.

2 Background

We consider a dataset $D$ of $N$ points in $\mathbb{R}^d$. Although our analysis naturally extends to any metric space, we restrict our attention to $\mathbb{R}^d$ for the sake of presentation.

2.1 Differential Privacy

We use the well-established definition of differential privacy dwork2006differential .

Definition 1.

Differential Privacy dwork2006differential . A randomized function $\mathcal{A}$ is said to provide $(\epsilon, \delta)$-differential privacy if for all neighboring databases $D$ and $D'$ (which differ in at most one element) and all subsets $O$ in the codomain of $\mathcal{A}$,

$$\Pr[\mathcal{A}(D) \in O] \le e^{\epsilon} \Pr[\mathcal{A}(D') \in O] + \delta$$

The parameter $\epsilon$ is the privacy budget. The privacy budget limits the amount of information that can leak about any individual element of $D$. If $\delta > 0$, then $\mathcal{A}(D)$ might leak more information, but only with probability up to $\delta$. In this paper, we consider $\delta = 0$, which is simply called "$\epsilon$-differential privacy." The Laplace mechanism dwork2006differential is a general method to satisfy $\epsilon$-differential privacy. By adding zero-mean Laplace noise to a real-valued function, we obtain differential privacy if the noise is scaled based on the sensitivity of the function (Definition 2).

Definition 2.

Sensitivity dwork2006differential . For a function $f: \mathcal{D} \to \mathbb{R}^m$, the L1-sensitivity of $f$ is

$$\Delta = \sup \| f(D) - f(D') \|_1$$

where the supremum is taken over all neighboring datasets $D$ and $D'$.

Theorem 1.

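Laplace Mechanism dwork2006differential . Let $f: \mathcal{D} \to \mathbb{R}^m$ be a non-private function with sensitivity $\Delta$ and let $Z \sim \mathrm{Lap}(\Delta / \epsilon)^m$ (an $m$-dimensional i.i.d. Laplace vector). Then the function $f(D) + Z$ provides $\epsilon$-differential privacy.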
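The Laplace mechanism can be illustrated with a few lines of Python. This is a minimal sketch using only the standard library, not code from the paper: the function names `laplace_noise` and `laplace_mechanism` are hypothetical, and the caller is assumed to supply the correct L1-sensitivity.

```python
import random

def laplace_noise(scale, rng):
    """Draw one zero-mean Laplace sample with the given scale.

    The difference of two i.i.d. exponential variables with mean `scale`
    follows a Laplace(0, scale) distribution.
    """
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def laplace_mechanism(values, sensitivity, epsilon, rng=None):
    """Add Laplace(sensitivity / epsilon) noise to every entry of `values`.

    `sensitivity` is assumed to be the L1-sensitivity of the function
    that produced `values`; this sketch does not verify it.
    """
    if epsilon <= 0 or sensitivity <= 0:
        raise ValueError("epsilon and sensitivity must be positive")
    rng = rng or random.Random()
    scale = sensitivity / epsilon
    return [v + laplace_noise(scale, rng) for v in values]

# Example: a counting query changes by at most 1 between neighboring datasets.
noisy_count = laplace_mechanism([1000.0], sensitivity=1.0, epsilon=0.5,
                                rng=random.Random(0))
```

A smaller privacy budget gives a larger noise scale, which is the privacy-utility tradeoff in its simplest form.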

2.2 Related Work

There are several techniques to release the kernel sum $f_D(q) = \sum_{x \in D} k(x, q)$ with differential privacy. A common approach is to decompose $f_D$ into a set of weighted basis functions wasserman2010statistical . We truncate the basis expansion to $m$ terms and represent $f_D$ as a set of weights in $\mathbb{R}^m$. The weights are made private via the Laplace mechanism and used to release a private version of $f_D$. Each basis term in the representation increases the quality of the approximation but degrades the quality of the weights, since the privacy budget is shared among the $m$ weights. This is a bias-variance tradeoff: we trade variance in the form of Laplace noise with bias in the form of truncation error. The Fourier basis hall2013differential , Bernstein basis alda2017bernstein , trigonometric polynomial basis wang2013efficient and various kernel bases have all been used for private function release. Such methods are most effective when $f_D$ is a smooth function.

An alternative set of techniques relies on functional data analysis formalmirshani19a or synthetic databases hardt2012simple ; balog2018differentially . In formalmirshani19a , the authors use densities over function spaces to release a smoothed approximation to $f_D$. The main idea of hardt2012simple and balog2018differentially is to release a set of weighted synthetic points that can be used to estimate $f_D$. Table 1 compares these methods based on approximation error and computation.

Method Error Bound Runtime Comments
polynomials alda2017bernstein
. Memory is also
exponential in .
PFDA formalmirshani19a
and are task-dependent
-differential privacy
MWEM hardt2012simple
is a set of query points. Holds
with probability
polynomials wang2013efficient
The result holds with probability
This work
Applies only for LSH kernels.
Efficient streaming algorithm.
Table 1: Summary of related methods to release the kernel sum for an point dataset in . Unless otherwise stated, the error is attained with probability and -differential privacy. We hide constant factors and adjust results to estimate rather than the KDE () when necessary. is a kernel smoothness parameter.

2.3 Locality-Sensitive Hashing

LSH Functions:

An LSH family $\mathcal{H}$ is a family of functions $h: \mathbb{R}^d \to \mathbb{Z}$ with the following property: under $h$, similar points have a high probability of having the same hash value. We say that a collision occurs whenever two points have the same hash code, i.e. $h(x) = h(y)$. In this paper, we use a slightly different definition than the original indyk1998approximate because we require the collision probability at all points.

Definition 3.

We say that a hash family $\mathcal{H}$ is locality-sensitive with collision probability $k(\cdot, \cdot)$ if for any two points $x$ and $y$ in $\mathbb{R}^d$, $h(x) = h(y)$ with probability $k(x, y)$ under a uniform random selection of $h$ from $\mathcal{H}$.
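As a concrete instance of Definition 3, the signed random projection (SRP) family hashes a point to the sign of a random Gaussian projection, and its collision probability is the well-known $1 - \theta / \pi$, where $\theta$ is the angle between the two points. The sketch below compares that closed form with an empirical collision rate. The function names are illustrative and the trial count is an arbitrary choice.

```python
import math
import random

def srp_hash(x, direction):
    """Signed random projection: the sign bit of <direction, x>."""
    return 1 if sum(w * xi for w, xi in zip(direction, x)) >= 0 else 0

def empirical_collision_rate(x, y, trials=20000, seed=0):
    """Fraction of randomly drawn SRP functions under which x and y collide."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        direction = [rng.gauss(0.0, 1.0) for _ in x]
        hits += srp_hash(x, direction) == srp_hash(y, direction)
    return hits / trials

def srp_collision_probability(x, y):
    """Closed form 1 - theta / pi, with theta the angle between x and y."""
    dot = sum(a * b for a, b in zip(x, y))
    norm = math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in y))
    theta = math.acos(max(-1.0, min(1.0, dot / norm)))
    return 1.0 - theta / math.pi

x, y = [1.0, 0.0], [1.0, 1.0]          # 45 degrees apart
exact = srp_collision_probability(x, y)
observed = empirical_collision_rate(x, y)
```

The collision probability is defined for every pair of points, which is the property the definition above asks for.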

LSH Kernels:

When the collision probability $k(x, y)$ is a monotone decreasing function of the distance metric $\mathrm{dist}(x, y)$, one can show that $k(x, y)$ is a positive semidefinite kernel function coleman2020race . We say that a kernel function $k(x, y)$ is an LSH kernel if it forms the collision probability for an LSH family. For a kernel to be an LSH kernel, it must obey the conditions described in chierichetti2012preserving . A number of well-known LSH families induce useful kernels gionis1999similarity . For example, there are LSH kernels that closely resemble the cosine, Laplace and multivariate Student kernels coleman2020race .

2.4 RACE Sketch

LSH kernels are interesting because there are efficient algorithms to estimate the quantity

$$f_D(q) = \sum_{x \in D} k(x, q)$$

when $k(x, q)$ is an LSH kernel. In coleman2020race , the authors present a one-pass streaming algorithm to estimate $f_D(q)$. The algorithm produces a RACE (Repeated Array of Count Estimators) sketch $\mathcal{S}$, a 2D array of integers with $R$ rows and $W$ columns that we index using LSH functions. This array is sufficient to report an estimate of $f_D(q)$ for any query $q$. We begin by constructing $R$ functions $\{h_1, \dots, h_R\}$ from an LSH family with the desired collision probability. When an element $x$ arrives from the stream, we hash $x$ to get $R$ hash values, one for each row of $\mathcal{S}$. We increment row $r$ at location $h_r(x)$ and repeat for all elements in the dataset. To approximate $f_D(q)$, we return the mean of $\mathcal{S}[r, h_r(q)]$ over the $R$ rows.
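The construction and query steps described above can be sketched in pure Python. This is an illustrative, non-private sketch and not the authors' implementation: the class name `RaceSketch` is hypothetical, and the LSH family is assumed to be a concatenation of `bits` signed random projections, which gives each row $2^{\text{bits}}$ columns and collision probability $(1 - \theta / \pi)^{\text{bits}}$.

```python
import random

class RaceSketch:
    """Non-private RACE sketch built from concatenated signed random projections."""

    def __init__(self, dim, rows, bits, seed=0):
        rng = random.Random(seed)
        self.rows, self.cols = rows, 2 ** bits
        # One independent hash function (a set of `bits` directions) per row.
        self.directions = [
            [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(bits)]
            for _ in range(rows)
        ]
        self.counts = [[0.0] * self.cols for _ in range(rows)]

    def _hash(self, r, x):
        """Hash code of x in row r: the projection signs read as a binary number."""
        code = 0
        for direction in self.directions[r]:
            bit = sum(w * xi for w, xi in zip(direction, x)) >= 0
            code = (code << 1) | int(bit)
        return code

    def add(self, x):
        """One streaming update: increment exactly one counter in every row."""
        for r in range(self.rows):
            self.counts[r][self._hash(r, x)] += 1

    def query(self, q):
        """Mean over rows of the counter that q hashes to."""
        total = sum(self.counts[r][self._hash(r, q)] for r in range(self.rows))
        return total / self.rows

sketch = RaceSketch(dim=2, rows=200, bits=2, seed=1)
for _ in range(50):
    sketch.add([1.0, 0.0])
    sketch.add([-1.0, 0.0])
estimate = sketch.query([0.0, 1.0])   # true kernel sum is 100 * 0.25 = 25
```

Each element touches one counter per row, so the update cost is `rows` hash evaluations regardless of how many elements have already been seen.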

RACE for Density Estimation:

This streaming algorithm approximates the kernel density estimate (KDE) for all possible queries in a single pass. RACE is also a mergeable summary with an efficient distributed implementation. In practice, RACE estimates the KDE with 1% error using a 4 MB array, even for large high-dimensional datasets. To prove rigorous error bounds, observe that each row of $\mathcal{S}$ is an unbiased estimator of $f_D(q)$. The main theoretical result of coleman2020race is stated below as Theorem 2.

Theorem 2.

Unbiased RACE Estimator coleman2020race . Suppose that $X$ is the query result for one of the rows of $\mathcal{S}$. That is, $X = \mathcal{S}[r, h_r(q)]$ for some row $r$. Then $\mathbb{E}[X] = f_D(q)$.

RACE for Empirical Risk Minimization:

Theorem 2 holds for all collision probabilities, even those that are neither continuous nor positive semidefinite. Thus, we can use RACE to approximate the empirical risk for a variety of losses. Suppose we are given a dataset $D$ of training examples and a loss function $L(x, \theta)$, where $\theta$ is a parameter that describes the model. Empirical Risk Minimization (ERM) consists of finding a parameter $\theta$ (and thus a model) that minimizes the mean loss over the training set. RACE can approximate the empirical risk when $L(x, \theta)$ can be expressed in terms of LSH collision probabilities. Although we cannot analytically find the gradient of the RACE count values, derivative-free optimization conn2009introduction is highly effective for RACE sketches. With the right LSH family, RACE can perform many machine learning tasks in the one-pass streaming setting.

3 Private Sketches with RACE

We propose a private version of the RACE sketch. We obtain $\epsilon$-differential privacy by applying the Laplace mechanism to each count in the RACE sketch array. Algorithm 1 introduces a differentially private method to release the RACE sketch, illustrated in Figure 1. It is straightforward to see that Algorithm 1 only requires $O(NR)$ hash computations. Assuming fixed $R$, we have $O(N)$ runtime.

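  Input: Dataset $D$, privacy budget $\epsilon$, LSH family $\mathcal{H}$, dimensions $R \times W$
  Output: Private sketch $\mathcal{S} \in \mathbb{R}^{R \times W}$
  Initialize: $R$ independent LSH functions $\{h_1, \dots, h_R\}$ from the LSH family $\mathcal{H}$; $\mathcal{S} \leftarrow 0^{R \times W}$
  for $x \in D$ do
     for $r$ in $1$ to $R$ do
        Increment $\mathcal{S}[r, h_r(x)]$
     end for
  end for
  $\mathcal{S} \leftarrow \mathcal{S} + Z$, where each $Z_{r,w} \sim \mathrm{Lap}(R / \epsilon)$ independently
Algorithm 1 Private RACE sketch
  Input: Sketch $\mathcal{S}$, query $q$, the same LSH functions $\{h_1, \dots, h_R\}$ from Algorithm 1
  Output: Estimate $\hat{f}_D(q)$ of $f_D(q)$
  $\hat{f}_D(q) \leftarrow 0$
  for $r$ in $1$ to $R$ do
     $\hat{f}_D(q) \leftarrow \hat{f}_D(q) + \frac{1}{R} \mathcal{S}[r, h_r(q)]$
  end for
Algorithm 2 RACE query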
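The privatization step of Algorithm 1 and the query of Algorithm 2 can be sketched as follows. This is a hedged illustration rather than the paper's code: the function names are hypothetical, the count array is assumed to have been built already, and the noise scale $R / \epsilon$ is taken from the Appendix argument in which each of the $R$ rows receives budget $\epsilon / R$ and every counter has sensitivity 1.

```python
import random

def privatize_race(counts, epsilon, rng=None):
    """Return a noisy copy of a RACE count array (a list of R rows).

    Every counter is perturbed with independent Laplace noise of scale
    R / epsilon. The input array is left unchanged.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    rng = rng or random.Random()
    scale = len(counts) / epsilon

    def noise():
        # Difference of two exponentials with mean `scale` is Laplace(0, scale).
        return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

    return [[c + noise() for c in row] for row in counts]

def query_race(sketch, codes):
    """Mean over rows of the counter selected by codes[r] in row r.

    `codes` holds the query's hash value under each row's LSH function.
    """
    return sum(row[c] for row, c in zip(sketch, codes)) / len(sketch)

counts = [[10, 0], [3, 7]]                     # toy 2 x 2 sketch
private = privatize_race(counts, epsilon=1.0, rng=random.Random(0))
answer = query_race(private, codes=[0, 1])     # noisy version of (10 + 7) / 2
```

Because the noise is added once, before release, any number of later queries reuse the same noisy array and consume no further privacy budget.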

3.1 Privacy

For the purposes of Definition 1, we consider the function $\mathcal{A}$ to be Algorithm 1. The codomain of $\mathcal{A}$ is the set of all RACE sketches with $R$ rows and $W$ columns. Our main theorem is that the value returned by $\mathcal{A}$ is $\epsilon$-differentially private. That is, our sketch is differentially private. We begin the proof by applying the parallel composition theorem dwork2014algorithmic to the counters in one row, which each see a disjoint partition of $D$. Then, we apply the sequential composition theorem dwork2014algorithmic to the set of rows. Due to space limitations, we defer the full proof to the Appendix.

Theorem 3.

For any $R > 0$, $W > 0$, and LSH family $\mathcal{H}$, the output of Algorithm 1, or the RACE sketch $\mathcal{S}$, is $\epsilon$-differentially private.

3.2 Utility

Since one can construct many learning algorithms from a sufficiently good estimate of $f_D(q)$ alda2017bernstein , we focus on utility guarantees for the RACE estimate of $f_D(q)$. Since RACE is a collection of unbiased estimators for $f_D(q)$, our proof strategy is to bound the distance between the RACE estimate and the mean. To bound the variance of the private RACE estimator, we add the independent Laplace noise variance to the bound from Theorem 2. Theorem 4 follows using the median-of-means procedure.

Theorem 4.

Let be the median-of-means estimate using an -differentially private RACE sketch with rows and . Then with probability ,

Theorem 4 suggests a tradeoff for which there is an optimal value of $R$. If we increase the number of rows in the sketch, we improve the estimator but must add more Laplace noise. To get our main utility guarantee, we choose an optimal $R$ that minimizes the error bound.
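The shape of this tradeoff can be worked out with a short calculation. This is a sketch under stated assumptions, not a restatement of the theorem: $V$ stands for the per-row variance bound from Theorem 2, $C$ for the constant in the median-of-means lemma, and each counter is assumed to carry Laplace noise of scale $R / \epsilon$ (variance $2R^2 / \epsilon^2$), as in the Appendix proof.

```latex
\begin{align*}
\operatorname{Var}[\text{noisy row estimate}]
  &\le V + \frac{2R^{2}}{\epsilon^{2}}, \\
\text{error}^{2}
  &\le C \log\frac{1}{\delta} \cdot \frac{1}{R}\left(V + \frac{2R^{2}}{\epsilon^{2}}\right)
   = C \log\frac{1}{\delta}\left(\frac{V}{R} + \frac{2R}{\epsilon^{2}}\right), \\
\frac{d}{dR}\left(\frac{V}{R} + \frac{2R}{\epsilon^{2}}\right)
  &= -\frac{V}{R^{2}} + \frac{2}{\epsilon^{2}} = 0
  \quad\Longrightarrow\quad R^{\star} = \epsilon \sqrt{V / 2}, \\
\frac{V}{R^{\star}} + \frac{2R^{\star}}{\epsilon^{2}}
  &= \frac{2\sqrt{2V}}{\epsilon}.
\end{align*}
```

The first term shrinks as rows are added while the second grows, so under these assumptions the squared error at the optimum scales with $\sqrt{V} / \epsilon$.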

Corollary 1.

Put . Then the approximation error bound is

If we divide both sides of Corollary 1 by $f_D(q)$, we bound the relative (or percent) error rather than the absolute error. Corollary 1 suggests that it is hardest to achieve a small relative error when $f_D(q)$ is small. This agrees with our intuition about how the KDE should behave under differential privacy guarantees. Fewer individuals make heavy contributions to $f_D(q)$ in low-density regions than in high-density ones, so the effect of the noise is worse.

Figure 1: Illustration of Algorithm 1 for . We hash each element in the stream with LSH functions having collision probability . In this example, , and . We increment the highlighted cells. The addition of the Laplace noise is not shown in the figure, but is done by perturbing each count in .

4 Applications

Because RACE can release LSH kernel sums, our sketch is broadly useful for many algorithms in machine learning. In particular, we discuss private density estimation, classification, regression, mode finding, anomaly detection and diversity sampling using RACE.

Kernel Density Estimation: To use RACE for KDE, we select one or more LSH kernels and construct sketches with Algorithm 1. We require one RACE sketch for each kernel and bandwidth setting, and we return the result of Algorithm 2. Note that the query no longer has access to $N$, the number of elements in the private dataset. Therefore, we estimate $N$ directly from the private sketch.
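The text does not spell out how the dataset size is recovered. One plausible approach, shown here as an assumption rather than the authors' method, uses the fact that every element increments exactly one counter per row, so each row sums to the dataset size plus zero-mean noise. Both function names are hypothetical.

```python
def estimate_dataset_size(sketch):
    """Estimate the dataset size as the average row sum of a (noisy) RACE array."""
    row_sums = [sum(row) for row in sketch]
    return sum(row_sums) / len(row_sums)

def kde_from_sum(kernel_sum_estimate, n_estimate):
    """Normalize a kernel-sum estimate into a density estimate."""
    if n_estimate <= 0:
        raise ValueError("estimated dataset size must be positive")
    return kernel_sum_estimate / n_estimate

toy_sketch = [[3.0, 1.0], [2.0, 2.0]]          # noiseless toy example, 4 elements
n_hat = estimate_dataset_size(toy_sketch)
density = kde_from_sum(2.0, n_hat)
```

Since this only post-processes the released sketch, it does not spend additional privacy budget.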

Mode Finding:

One can apply gradient-free optimization to KDE to recover the modes of the data distribution. This works surprisingly well, but in general the KDE is a non-convex function. One can also apply linear programming techniques that use information about the hash function partitions to identify a point in a partition with the highest count values.

Naive Bayes Classification:

Using kernel density classification, a well-developed result from statistical hypothesis testing, we can construct classifiers with RACE under both the maximum-likelihood and maximum a posteriori (MAP) decision rules. Suppose we are given a training set $D$ with $C$ classes and a query $q$. We can represent the empirical likelihood of $q$ under class $c$ with a sketch of the KDE for class $c$. Algorithm 2 returns an estimate of this probability, which may be used directly by a naive Bayes classifier or other type of probabilistic learner.
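The decision rule can be sketched independently of how the per-class densities are obtained. In this minimal illustration, `class_density` is a hypothetical mapping from each label to a function that returns a density estimate (for example, a wrapper around a per-class sketch query); the clamping of negative estimates is an added assumption for noisy outputs.

```python
import math

def classify(query, class_density, class_prior=None):
    """Pick the class with the largest (optionally prior-weighted) density.

    class_density: dict mapping label -> function(query) -> density estimate.
    class_prior:   optional dict mapping label -> prior probability. When it
                   is omitted, this is the maximum-likelihood rule; otherwise
                   it is the MAP rule.
    """
    best_label, best_score = None, -math.inf
    for label, density in class_density.items():
        # Noisy private estimates can dip below zero; clamp them at zero.
        score = max(density(query), 0.0)
        if class_prior is not None:
            score *= class_prior[label]
        if score > best_score:
            best_label, best_score = label, score
    return best_label

densities = {"a": lambda q: 0.2, "b": lambda q: 0.5}
ml_label = classify([0.0], densities)
map_label = classify([0.0], densities, class_prior={"a": 0.9, "b": 0.1})
```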

Anomaly Detection / Sampling: Anomaly detection can be cast as a KDE classification problem. If Algorithm 2 reports a low density for $q$, then the training set contains few elements similar to $q$ and thus $q$ is an outlier. This principle is behind the algorithms in luo2018arrays and coleman2019diversified .

Linear Regression: If we use an asymmetric LSH family shrivastava2014asymmetric , we can construct a valid surrogate loss for the linear regression loss . The key insight is that the signed random projection (SRP) LSH kernel is monotone in the inner product. If we apply SRP to both and , we obtain an LSH kernel with two components. One component is monotone increasing with the inner product, while the other is monotone decreasing. The resulting surrogate loss is a monotone function of the L2 loss and a convex function of .

5 Experiments

Dataset | N | d | Bandwidth | Description | Task
NYC | 25k | 1 | 5k | NYC salaries (2018) | KDE
SF | 29k | 1 | 5k | SF salaries (2018) | KDE
skin | 241k | 3 | 5.0 | RGB skin tones | KDE
codrna | 57k | 8 | 0.5 | RNA genomic data | KDE
nomao | 34k | 26 | 0.6 | User location data | Classification
occupancy | 17k | 5 | 0.5 | Building occupancy | Classification
pulsar | 17k | 8 | 0.1 | Pulsar star data | Classification
airfoil | 1.4k | 9 | - | Airfoil parameters and sound level | Regression
naval | 11k | 16 | - | Frigate turbine propulsion | Regression
gas | 3.6k | 128 | - | Gas sensor, different concentrations | Regression
Table 2: Datasets used for KDE, classification and regression experiments. Each dataset has N entries with d features. The bandwidth column gives the kernel bandwidth.
Preprocess days 2.3 days 6 hr 13 min
Query - 6.2 ms 1.2 ms 0.4 ms
Table 3: Computation time for KDE on the skin dataset. Note that we were unable to run PFDA on this dataset. Instead, we report the PFDA runtime for the (smaller) NYC dataset.

We perform an exhaustive comparison for KDE, classification and regression. For KDE, we estimate the density of the salaries for New York City (NYC) and San Francisco (SF) city employees in 2018, as well as the high-dimensional densities for the skin and codrna UCI datasets. We use the same LSH kernel as the authors of coleman2020race . We use UCI datasets for the regression and classification experiments. Most private KDE methods require the data to lie in the unit sphere or cube; we scale the inputs accordingly. Table 2 presents the datasets used in our experiments. We make the following considerations for our baseline methods.

KDE: We implemented several function release methods from Table 1. We also compare against the Fourier basis estimator hall2013differential ; wasserman2010statistical and the kernel mean embedding (KME) method balog2018differentially . We use the Python code released by balog2018differentially and implemented the other methods in Python. To give a fair comparison, we profiled and optimized each of our baselines. We were unable to run Fourier and PFDA in more than one dimension because they both require function evaluations on a Bravais lattice, which scales exponentially. We show the density estimates and their errors in Figure 2.


Classification: We compare a maximum-likelihood RACE classifier against a regularized logistic regression classifier trained using objective perturbation chaudhuri2011differentially . We average over the Laplace noise and report the accuracy on a held-out test set in Figure 3.

Regression: We compare RACE regression against five algorithms: sufficient statistics perturbation (SSP), objective perturbation (ObjPert), posterior sampling (OPS), and adaptive versions of SSP and OPS (AdaSSP and AdaOPS wang2018revisiting ). We use the MATLAB code released by wang2018revisiting and report the mean squared error (MSE) on a held-out test set in Figure 4.

Figure 2: Privacy-utility tradeoff for private function release methods. We report the L2 function error and the mean relative error for 2000 held-out queries.

Computation: Table 3 displays the computation time needed to construct a useful function release. Note that Bernstein release can run faster if we use fewer interpolation points (see Table 1), but we still required at least 12 hours of computation for competitive results. The Bernstein mechanism requires many binomial coefficient evaluations, which were expensive even when we used optimized C code. KME requires a large-scale kernel matrix computation, and PFDA required several days for an expensive eigenvalue computation. RACE construction time varies based on $R$, but is substantially faster than all baselines.

Figure 3: Binary classification experiments. We show the privacy-utility tradeoff for a private logistic regression classifier and the RACE max-likelihood classifier. Average over 10 repetitions.
Figure 4: Linear regression experiments. We show the privacy-utility tradeoff for RACE and several other linear regression methods. Average over 10 repetitions.

6 Discussion

Function Release at Scale:

RACE is ideal for private function release in large-scale distributed applications. Although PFDA and the Bernstein mechanism have the strongest error bounds, they required days to produce the experimental results in Figure 2. This is a serious barrier in practice - it would require computations to run the Bernstein mechanism on the UCI gas dataset. The RACE sketch has a small memory footprint, inexpensive streaming updates and can quickly run on high-dimensional datasets. We believe that RACE can make differentially private function release a viable tool for real-world applications.

Our sketch is also convenient to use and deploy in a production environment because it is relatively insensitive to hyperparameters. In general, we found that density estimation requires a larger sketch with more rows $R$ than classification or regression. Classification and regression problems benefit from a smaller hash range than function release. Higher dimensional problems also require a larger hash range. However, these choices are not critical and any with will provide good results. Hyperparameter tuning can be accomplished using very little of the overall privacy budget.

Privacy and Utility:

Our sketch beats interpolation-based methods for private function release when the dataset has more than a few dimensions and when $f_D$ is not smooth. Depending on smoothness, our error rate improves upon the rates of wang2013efficient and hardt2012simple for LSH kernels. In our experiments (Figure 2), the Bernstein mechanism outperforms RACE on the SF dataset but fails to capture the nuances of the NYC salary distribution, which has sharp peaks. RACE preserves the details of $f_D$ because the Laplace noise can only make local changes to the RACE structure. Suppose we generate a Laplace noise value that is unusually large for one particular RACE counter. Queries that do not involve the problematic counter are unaffected. In contrast, if we perturb one or two of the most important weights of a series estimator, the changes propagate to all queries.

Although RACE can only estimate LSH kernels, the space of LSH kernels is sufficiently large that we can release many useful functions. Our RACE surrogate loss outperformed objective perturbation chaudhuri2011differentially and compared favorably to other linear regression baselines. RACE also performed well in our classification experiments for , providing a competitive utility tradeoff for practical privacy budgets. Our experiments show that RACE can privately release useful functions for many machine learning problems.

7 Conclusion

We have presented RACE, a differentially private sketch that can be used for a variety of machine learning tasks. RACE is competitive with the state of the art on tasks including density estimation, classification, and linear regression. RACE sketches can be constructed in the one-pass streaming setting and are highly computationally efficient. At the same time, our sketch offers good performance on many machine learning tasks and is reasonably efficient in terms of the privacy budget. Given the utility, simplicity and speed of the algorithm, we expect that RACE will enable differentially private machine learning in large-scale settings.

Broader Impact

Privacy is a serious concern in machine learning and a barrier to the widespread adoption of useful technology. For example, machine learning models can substantially improve medical treatments, inform social policy, and help with financial decision-making. While such applications are overall beneficial to society, they often rely on sensitive personal data that users may not want to disclose. For instance, a 2019 survey found that a large majority (about 80%) of Americans are concerned about, but do not understand, how technology companies use the data they collect pew2019 . Data management practices have recently come under increased scrutiny, especially after the enactment of the European GDPR Act in 2018. To improve public confidence in their services, Google erlingsson2014rappor and Apple apple2017 both advertise differential privacy as a central part of their data collection process.

Differential privacy provides strong protection against malicious data use but is difficult to apply in practice. This is particularly true for functional data release, which is often prohibitively expensive in terms of computation cost. Our work substantially reduces the computational cost of this useful technique to achieve differential privacy. By reducing the computational requirement, we make privacy more accessible and attainable for data publishers, since large computing infrastructure is no longer needed to release private information. This directly reduces the economic cost of data privacy, an important consideration for governments and private companies. Our sketching method also allows private functional release to be applied to more real-world datasets, making it easier to release high-dimensional information with differential privacy.


This section contains proofs and detailed discussion of the theorems.


We consider a function $\mathcal{A}$ that takes in a dataset $D$ and outputs a RACE sketch. $\mathcal{A}$ is a randomized function, where the randomness is over the choice of the LSH functions and the (independent) Laplace noise. In this context, the $\epsilon$-differentially private query is the RACE sketch $\mathcal{S}$. Our main theorem is that the sketch is $\epsilon$-differentially private. Because differential privacy is robust to post-processing, we can then query $\mathcal{S}$ as many times as desired once the sketch is released.

Proof Sketch:

The sketch consists of $R$ independent rows. Each row consists of $W$ count values. First, we prove that we can privately release one of the rows by adding Laplace noise $\mathrm{Lap}(1 / \epsilon)$ (i.e. sensitivity 1). This is described in Lemma 1. Then, we apply Lemma 1 with $\epsilon / R$ to release all the rows, proving the main theorem.

Lemma 1.

Consider one row of the RACE sketch, and add independent Laplace noise $\mathrm{Lap}(1 / \epsilon)$ to each counter. The row can be released with $\epsilon$-differential privacy.


Consider just one of the counters. It is easy to see that the sensitivity of the counter is 1 because changing a single element from the dataset can only create a change of $\pm 1$ in the counter. By the Laplace mechanism, this counter can be released with $\epsilon$-differential privacy by adding Laplace noise $\mathrm{Lap}(1 / \epsilon)$.

The row contains $W$ counters, and we can release each one with the Laplace mechanism. Therefore, we have $W$ mechanisms (one to release each counter) which are independently $\epsilon$-differentially private. The parallel composition theorem dwork2014algorithmic states that if each mechanism is computed on a disjoint subset of the dataset, then we can compute all of the mechanisms (i.e. release all of the counters) with $\epsilon$-differential privacy.

This is indeed the case for the RACE sketch. Under the LSH function, each element in the dataset maps to exactly one of the $W$ counters. Thus, the counters are computed on disjoint subsets of the dataset and we can release them with $\epsilon$-differential privacy. ∎
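The disjointness argument can be checked on a toy example. In the sketch below, `toy_hash` is a stand-in for an LSH function (an assumption made purely for illustration) and the datasets are arbitrary integers. Adding one record changes exactly one counter of the row by one.

```python
def row_counts(dataset, hash_fn, width):
    """Counters of one RACE row: a histogram of hash codes."""
    counts = [0] * width
    for x in dataset:
        counts[hash_fn(x)] += 1
    return counts

def l1_distance(a, b):
    """L1 distance between two counter rows of equal length."""
    return sum(abs(u - v) for u, v in zip(a, b))

WIDTH = 4

def toy_hash(x):
    """Illustrative stand-in for an LSH function: bucket by remainder."""
    return x % WIDTH

base = [1, 2, 3, 5, 8, 13]
neighbor = base + [21]                      # one additional record
row_a = row_counts(base, toy_hash, WIDTH)
row_b = row_counts(neighbor, toy_hash, WIDTH)
changed = l1_distance(row_a, row_b)
touched = sum(u != v for u, v in zip(row_a, row_b))
```

Both `changed` and `touched` equal 1 here, which is the property that lets the per-counter guarantees compose in parallel within a row.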

Theorem 3.

For any $R > 0$, $W > 0$, and LSH family $\mathcal{H}$, the output of Algorithm 1, or the RACE sketch $\mathcal{S}$, is $\epsilon$-differentially private.


The sketch is composed of $R$ independent rows. Consider the mechanism from Lemma 1 to release a single row with $\epsilon / R$-differential privacy. The sequential composition theorem dwork2014algorithmic states that given $R$ mechanisms (one to release each row) which are independently $\epsilon / R$-differentially private, we can compute all the mechanisms (i.e. release all rows) with $\epsilon$-differential privacy.

To construct the sketch, we apply Lemma 1 to each row with $\epsilon / R$-differential privacy by adding independent Laplace noise $\mathrm{Lap}(R / \epsilon)$ to each counter. The sequential composition theorem guarantees that we can release all $R$ rows with $\epsilon$-differential privacy. ∎


First, we introduce the median-of-means trick in Lemma 2. The median-of-means procedure is a standard analysis technique with many applications in the streaming literature. Lemma 2 is a special case of Theorem 2.1 from alon1999space .

Lemma 2.

Let $X_1, \dots, X_R$ be i.i.d. random variables with mean $\mu$ and variance at most $\sigma^2$. To get the median-of-means estimate $\hat{\mu}$, break the $R$ random variables into $k$ groups with $m$ elements in each group.

Put and 222For simplicity, we suppose and are integers that evenly divide . See alon1999space for the complete analysis.. Then with probability at least , the deviation of the estimate from the mean is

Each row of the sketch provides an estimator to be used in the median-of-means technique. By adding the Laplace noise variance, we can find in Lemma 2 and provide performance guarantees.
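The median-of-means procedure itself is short. The sketch below is a generic illustration that assumes, as in the footnote to Lemma 2, that the number of groups evenly divides the number of values; the function name is hypothetical.

```python
import statistics

def median_of_means(values, groups):
    """Split `values` into `groups` equal blocks, average each block, take the median."""
    if groups <= 0 or len(values) % groups != 0:
        raise ValueError("groups must evenly divide the number of values")
    size = len(values) // groups
    means = [sum(values[i * size:(i + 1) * size]) / size for i in range(groups)]
    return statistics.median(means)

# One badly corrupted block barely moves the estimate.
row_estimates = [1.0, 1.0, 1.0, 2.0, 2.0, 2.0, 100.0, 100.0, 100.0]
robust = median_of_means(row_estimates, groups=3)
plain_mean = sum(row_estimates) / len(row_estimates)
```

For a RACE sketch, `values` would be the per-row counters selected by the query, so an unusually large noise draw in a single row has limited influence on the final estimate.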

Theorem 4.

Let be the median-of-means estimate using an -differentially private RACE sketch with rows and . Then with probability ,


To use median-of-means, we must bound the variance of the RACE estimate after we add the Laplace noise. Since the Laplace noise is independent of the LSH functions, we simply add the Laplace noise variance to the variance bound in Theorem 2.

Using this variance bound with Lemma 2 we have the following statement with probability .

Corollary 1.

Put . Then the approximation error bound is


Take the derivative of with respect to to find that minimizes the bound. Put and . The corollary is obtained by substituting into Theorem 4. We may replace with in the inequality because
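The minimization can be checked numerically. The sketch below assumes the bound has the form $\left(\tilde{f}_K(q)^2/R + 2R/(N^2\epsilon^2)\right)^{1/2}\sqrt{32\log(1/\delta)}$ and treats $R$ as a real number, ignoring rounding to an integer row count. Under that assumption, setting the derivative with respect to $R$ to zero gives $R = N\epsilon\tilde{f}_K(q)/\sqrt{2}$. The function names `error_bound` and `best_rows` are hypothetical.

```python
import math

def error_bound(R, N, eps, f_tilde, delta):
    """Assumed form of the bound: sampling term plus Laplace-noise term."""
    variance_term = f_tilde ** 2 / R + 2.0 * R / (N ** 2 * eps ** 2)
    return math.sqrt(32.0 * math.log(1.0 / delta) * variance_term)

def best_rows(N, eps, f_tilde):
    """Real-valued R that zeroes the derivative of the assumed bound."""
    return N * eps * f_tilde / math.sqrt(2.0)
```

At this choice of $R$ the two terms are equal, and the assumed bound simplifies to $\sqrt{64\sqrt{2}\,\tilde{f}_K(q)\log(1/\delta)/(N\epsilon)}$, so the error decays as $1/\sqrt{N\epsilon}$.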


  • [1] Francesco Aldà and Benjamin IP Rubinstein. The Bernstein mechanism: Function release under differential privacy. In Thirty-First AAAI Conference on Artificial Intelligence, 2017.
  • [2] Noga Alon, Yossi Matias, and Mario Szegedy. The space complexity of approximating the frequency moments. Journal of Computer and System Sciences, 58(1):137–147, 1999.
  • [3] Brooke Auxier, Lee Rainie, Monica Anderson, Andrew Perrin, Madhu Kumar, and Erica Turner. Americans and privacy: Concerned, confused and feeling lack of control over their personal information. Washington, DC: Pew Research Center, 2019.
  • [4] Matej Balog, Ilya Tolstikhin, and Bernhard Schölkopf. Differentially private database release via kernel mean embeddings. In International Conference on Machine Learning, pages 414–422, 2018.
  • [5] Kamalika Chaudhuri, Claire Monteleoni, and Anand D Sarwate. Differentially private empirical risk minimization. Journal of Machine Learning Research, 12(Mar):1069–1109, 2011.
  • [6] Flavio Chierichetti and Ravi Kumar. LSH-preserving functions and their applications. In Proceedings of the Twenty-Third Annual ACM-SIAM Symposium on Discrete Algorithms, SODA ’12, pages 1078–1094, USA, 2012. Society for Industrial and Applied Mathematics.
  • [7] Benjamin Coleman, Benito Geordie, Li Chou, RA Leo Elworth, Todd J Treangen, and Anshumali Shrivastava. Diversified RACE sampling on data streams applied to metagenomic sequence analysis. bioRxiv, page 852889, 2019.
  • [8] Benjamin Coleman and Anshumali Shrivastava. Sub-linear RACE sketches for approximate kernel density estimation on streaming data. In Proceedings of the 2020 World Wide Web Conference. International World Wide Web Conferences Steering Committee, 2020.
  • [9] Andrew R Conn, Katya Scheinberg, and Luis N Vicente. Introduction to derivative-free optimization, volume 8. SIAM, 2009.
  • [10] Cynthia Dwork. Differential privacy. In 33rd International Colloquium on Automata, Languages and Programming, part II (ICALP 2006), volume 4052 of Lecture Notes in Computer Science, pages 1–12. Springer Verlag, July 2006.
  • [11] Cynthia Dwork, Aaron Roth, et al. The algorithmic foundations of differential privacy. Foundations and Trends® in Theoretical Computer Science, 9(3–4):211–407, 2014.
  • [12] Úlfar Erlingsson, Vasyl Pihur, and Aleksandra Korolova. RAPPOR: Randomized aggregatable privacy-preserving ordinal response. In Proceedings of the 2014 ACM SIGSAC Conference on Computer and Communications Security, pages 1054–1067, 2014.
  • [13] Matt Fredrikson, Somesh Jha, and Thomas Ristenpart. Model inversion attacks that exploit confidence information and basic countermeasures. In Proceedings of the 22nd ACM SIGSAC Conference on Computer and Communications Security, pages 1322–1333, 2015.
  • [14] Matthew Fredrikson, Eric Lantz, Somesh Jha, Simon Lin, David Page, and Thomas Ristenpart. Privacy in pharmacogenetics: An end-to-end case study of personalized warfarin dosing. In 23rd USENIX Security Symposium (USENIX Security 14), pages 17–32, 2014.
  • [15] Aristides Gionis, Piotr Indyk, Rajeev Motwani, et al. Similarity search in high dimensions via hashing. In VLDB, volume 99, pages 518–529, 1999.
  • [16] Rob Hall, Alessandro Rinaldo, and Larry Wasserman. Differential privacy for functions and functional data. Journal of Machine Learning Research, 14(Feb):703–727, 2013.
  • [17] Moritz Hardt, Katrina Ligett, and Frank McSherry. A simple and practical algorithm for differentially private data release. In Advances in Neural Information Processing Systems, pages 2339–2347, 2012.
  • [18] Piotr Indyk and Rajeev Motwani. Approximate nearest neighbors: towards removing the curse of dimensionality. In Proceedings of the Thirtieth Annual ACM Symposium on Theory of Computing, pages 604–613. ACM, 1998.
  • [19] George H. John and Pat Langley. Estimating continuous distributions in Bayesian classifiers. In Proceedings of the Eleventh Conference on Uncertainty in Artificial Intelligence, UAI’95, pages 338–345, San Francisco, CA, USA, 1995. Morgan Kaufmann Publishers Inc.
  • [20] Chen Luo and Anshumali Shrivastava. Arrays of (locality-sensitive) count estimators (ACE): Anomaly detection on the edge. In Proceedings of the 2018 World Wide Web Conference, pages 1439–1448. International World Wide Web Conferences Steering Committee, 2018.
  • [21] Ardalan Mirshani, Matthew Reimherr, and Aleksandra Slavković. Formal privacy for functional data with Gaussian perturbations. In Proceedings of the 36th International Conference on Machine Learning, volume 97 of Proceedings of Machine Learning Research, pages 4595–4604, Long Beach, California, USA, 09–15 Jun 2019. PMLR.
  • [22] Reza Shokri, Marco Stronati, Congzheng Song, and Vitaly Shmatikov. Membership inference attacks against machine learning models. In 2017 IEEE Symposium on Security and Privacy (SP), pages 3–18. IEEE, 2017.
  • [23] Anshumali Shrivastava and Ping Li. Asymmetric LSH (ALSH) for sublinear time maximum inner product search (MIPS). In Advances in Neural Information Processing Systems, pages 2321–2329, 2014.
  • [24] Apple Differential Privacy Team. Learning with privacy at scale. Apple Machine Learning Journal, 1(8), 2017.
  • [25] Jonathan Ullman. Answering $n^{2+o(1)}$ counting queries with differential privacy is hard. In Proceedings of the Forty-Fifth Annual ACM Symposium on Theory of Computing, STOC 2013, pages 361–370, New York, NY, USA, 2013. Association for Computing Machinery.
  • [26] Yu-Xiang Wang. Revisiting differentially private linear regression: optimal and adaptive prediction & estimation in unbounded domain. Proceedings of the Thirty-Fourth Conference on Uncertainty in Artificial Intelligence, pages 93–103, Monterey, California USA, 06-10 August 2018.
  • [27] Ziteng Wang, Kai Fan, Jiaqi Zhang, and Liwei Wang. Efficient algorithm for privately releasing smooth queries. In Advances in Neural Information Processing Systems, pages 782–790, 2013.
  • [28] Larry Wasserman and Shuheng Zhou. A statistical framework for differential privacy. Journal of the American Statistical Association, 105(489):375–389, 2010.