1 Introduction
Large-scale data collection is an integral component of the modern machine learning pipeline. The success of a learning algorithm critically depends on the quantity and quality of the input data. Although vast amounts of data are available, the information is often of a personal or sensitive nature. Models can leak substantial information about individual participants, even if we only release predictions or outputs fredrikson2014privacy ; fredrikson2015model ; shokri2017membership . Privacy is, therefore, an important design factor for machine learning algorithms.
To protect against diverse and sophisticated attacks, differential privacy has emerged as a theoretically rigorous definition of privacy with robust guarantees dwork2006differential . Informally, an algorithm is differentially private if the inclusion (or exclusion) of any specific data record cannot substantially alter the output. In this paper, we consider the task of privately releasing a function that applies a pairwise operation to each record in a dataset and returns the sum. Although one could directly evaluate and release the result with the exponential mechanism, we can only evaluate the function a finite number of times before the privacy budget runs out. The function release problem is to release a private summary of the function that can answer an unlimited number of queries hall2013differential . We also seek error bounds for all queries (by "all queries", we mean all values of the query; there is no polynomial-time algorithm for all general queries ullman2013answering ), not just the global minimum as with empirical risk minimization chaudhuri2011differentially or a finite set of linear queries as in hardt2012simple .
Many machine learning problems can be solved in the function release framework. When the pairwise operation is a loss function, we can train a model by minimizing the released sum. When it is a kernel function, the sum is the kernel density estimate, a convenient way to approximate the likelihood function in a classification model. As a result, function release has received considerable theoretical attention. There are elegant, general and powerful techniques to release essentially any function hall2013differential ; alda2017bernstein ; formalmirshani19a ; wang2013efficient . However, function release has not yet been widely adopted in practice because existing methods fail to scale beyond small, low-dimensional datasets. The practical utility of function release is plagued by issues such as quadratic runtime and exponential memory requirements. For instance, many algorithms release the function via approximation over an interpolation lattice, but the size of the lattice grows exponentially with the dimension.
Our Contribution:
In this work, we propose a scalable approach to function release using the RACE sketch, a recent development in (non-private) data streaming algorithms. The RACE sketch can approximate pairwise kernel sums on streaming data for a specific class of kernels known as locality-sensitive hash (LSH) kernels. By restricting our attention to LSH kernel sums, we obtain fast streaming algorithms for private function release. RACE sketches consist of a small array of integers (4 MB) and are well-suited to large-scale distributed settings. Private sketches are sought after by practitioners because they combine algorithmic efficiency with privacy guarantees apple2017 . We show how to construct a private RACE sketch for the LSH kernel sum. We prove that our sketch is differentially private and derive pointwise error bounds for our approximation to the kernel sum. Our bounds are competitive with existing methods but come at a smaller computation cost. RACE easily scales to datasets with hundreds of dimensions and millions of entries.
Although the restriction to LSH kernels might seem limiting, we argue that most machine learning tasks can be performed using the RACE sketch. We show how to express classification, linear regression, kernel density estimation (KDE), anomaly detection and mode finding in terms of specific LSH kernel compositions. We conduct an exhaustive set of experiments with KDE, classification and linear regression. Our experiments show that RACE can release useful functions for many machine learning tasks with a competitive privacy-utility tradeoff.
2 Background
We consider a dataset $\mathcal{D}$ of $N$ points in $\mathbb{R}^d$. Although our analysis naturally extends to any metric space, we restrict our attention to $\mathbb{R}^d$ for the sake of presentation.
2.1 Differential Privacy
We use the well-established definition of differential privacy dwork2006differential .
Definition 1.
Differential Privacy dwork2006differential A randomized function $\mathcal{M}$ is said to provide $(\epsilon, \delta)$-differential privacy if for all neighboring databases $\mathcal{D}$ and $\mathcal{D}'$ (which differ in at most one element) and all subsets $S$ in the codomain of $\mathcal{M}$,
$$\Pr[\mathcal{M}(\mathcal{D}) \in S] \le e^{\epsilon} \Pr[\mathcal{M}(\mathcal{D}') \in S] + \delta.$$
The parameter $\epsilon$ is the privacy budget. The privacy budget limits the amount of information that can leak about any individual element of $\mathcal{D}$. If $\delta > 0$, then $\mathcal{M}$ might leak more information, but only with probability up to $\delta$. In this paper, we consider $\delta = 0$, which is simply called "$\epsilon$-differential privacy." The Laplace mechanism dwork2006differential is a general method to satisfy $\epsilon$-differential privacy. By adding zero-mean Laplace noise to a real-valued function, we obtain differential privacy if the noise is scaled based on the sensitivity of the function (Definition 2).
Definition 2.
Sensitivity dwork2006differential For a function $f$, the L1-sensitivity of $f$ is
$$\Delta f = \sup_{\mathcal{D}, \mathcal{D}'} \| f(\mathcal{D}) - f(\mathcal{D}') \|_1$$
where the supremum is taken over all neighboring datasets $\mathcal{D}$ and $\mathcal{D}'$.
Theorem 1.
Laplace Mechanism dwork2006differential Let $f$ be a non-private function with sensitivity $\Delta f$ and let $Z \sim \mathrm{Lap}(\Delta f / \epsilon)^d$ (a $d$-dimensional i.i.d. Laplace vector). Then the function $f + Z$ provides $\epsilon$-differential privacy.
2.2 Related Work
There are several techniques to release the kernel sum with differential privacy. A common approach is to decompose the function into a set of weighted basis functions wasserman2010statistical . We truncate the basis expansion to $T$ terms and represent the function as a set of $T$ weights. The weights are made private via the Laplace mechanism and used to release a private version of the function. Each basis term in the representation increases the quality of the approximation but degrades the quality of the weights, since the privacy budget is shared among the $T$ weights. This is a bias-variance tradeoff: we trade variance in the form of Laplace noise for bias in the form of truncation error. The Fourier basis hall2013differential , Bernstein basis alda2017bernstein , trigonometric polynomial basis wang2013efficient and various kernel bases have all been used for private function release. Such methods are most effective when the target is a smooth function. An alternative set of techniques relies on functional data analysis formalmirshani19a or synthetic databases hardt2012simple ; balog2018differentially . In formalmirshani19a , the authors use densities over function spaces to release a smoothed approximation to the function. The main idea of hardt2012simple and balog2018differentially is to release a set of weighted synthetic points that can be used to estimate the function. Table 1 compares these methods based on approximation error and computation.
Method  Error Bound  Runtime  Comments  
2.3 Locality-Sensitive Hashing
LSH Functions:
An LSH family $\mathcal{H}$ is a family of functions with the following property: under a function drawn from $\mathcal{H}$, similar points have a high probability of having the same hash value. We say that a collision occurs whenever two points have the same hash code, i.e. $h(x) = h(y)$. In this paper, we use a slightly different definition than the original indyk1998approximate because we require the collision probability at all points.
Definition 3.
We say that a hash family $\mathcal{H}$ is locality-sensitive with collision probability $k(\cdot, \cdot)$ if for any two points $x$ and $y$ in $\mathbb{R}^d$, $h(x) = h(y)$ with probability $k(x, y)$ under a uniform random selection of $h$ from $\mathcal{H}$.
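As a concrete example of this definition, the signed random projection (SimHash) family has the classical closed-form collision probability $1 - \theta(x, y)/\pi$, where $\theta$ is the angle between the points. Below is a minimal sketch; the class and function names are ours, not from the paper:

```python
import numpy as np

class SRPHash:
    """Signed random projection (SimHash): h(x) = sign(w^T x) for a
    random Gaussian direction w. Pr[h(x) = h(y)] = 1 - theta/pi,
    where theta is the angle between x and y."""

    def __init__(self, dim, seed=None):
        rng = np.random.default_rng(seed)
        self.w = rng.standard_normal(dim)

    def hash(self, x):
        # Hash code in {0, 1}: which side of the random hyperplane x lies on.
        return int(np.dot(self.w, x) >= 0)

def srp_collision_probability(x, y):
    """Closed-form collision probability (the LSH kernel) for SRP."""
    cos = np.dot(x, y) / (np.linalg.norm(x) * np.linalg.norm(y))
    theta = np.arccos(np.clip(cos, -1.0, 1.0))
    return 1.0 - theta / np.pi
```

Drawing many independent hash functions and counting empirical collisions recovers the closed-form probability, which is the property the RACE sketch exploits.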
LSH Kernels:
When the collision probability is a monotone decreasing function of the distance between the points, one can show that $k(x, y)$ is a positive semidefinite kernel function coleman2020race . We say that a kernel function is an LSH kernel if it forms the collision probability for an LSH family. For a kernel to be an LSH kernel, it must obey the conditions described in chierichetti2012preserving . A number of well-known LSH families induce useful kernels gionis1999similarity . For example, there are LSH kernels that closely resemble the cosine, Laplace and multivariate Student kernels coleman2020race .
2.4 RACE Sketch
LSH kernels are interesting because there are efficient algorithms to estimate the quantity
$$K_{\mathcal{D}}(q) = \sum_{x \in \mathcal{D}} k(x, q) \qquad (1)$$
when $k$ is an LSH kernel. In coleman2020race , the authors present a one-pass streaming algorithm to estimate $K_{\mathcal{D}}(q)$. The algorithm produces a RACE (Repeated Array of Count Estimators) sketch $A$, a 2D array of integers with $L$ rows and $R$ columns that we index using LSH functions. This array is sufficient to report $K_{\mathcal{D}}(q)$ for any query $q$. We begin by constructing $L$ functions from an LSH family with the desired collision probability. When an element $x$ arrives from the stream, we hash it to get $L$ hash values, one for each row of $A$. We increment row $i$ at location $h_i(x)$ and repeat for all elements in the dataset. To approximate $K_{\mathcal{D}}(q)$, we return the mean of $A[i, h_i(q)]$ over the $L$ rows.
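The streaming construction described above can be sketched in a few lines. This is a minimal non-private illustration with names of our own choosing (`RACE`, `add`, `query`); the paper's actual implementation is in coleman2020race :

```python
import numpy as np

class RACE:
    """Repeated Array of Count Estimators for an LSH kernel sum.
    A is an L x R array of integer counters; row i is indexed by an
    independent LSH function h_i with range R."""

    def __init__(self, lsh_functions, num_buckets):
        self.hashes = lsh_functions  # list of L callables mapping x -> int
        self.R = num_buckets
        self.A = np.zeros((len(lsh_functions), num_buckets), dtype=np.int64)

    def add(self, x):
        # One increment per row: L hash computations per stream element.
        for i, h in enumerate(self.hashes):
            self.A[i, h(x) % self.R] += 1

    def query(self, q):
        # Mean, over rows, of the counter that q hashes to.
        return float(np.mean([self.A[i, h(q) % self.R]
                              for i, h in enumerate(self.hashes)]))
```

Each counter in row $i$ holds the number of dataset elements colliding with its bucket under $h_i$, so the queried counter is a count of collisions with $q$, whose expectation is the kernel sum.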
RACE for Density Estimation:
This streaming algorithm approximates the kernel density estimate (KDE) for all possible queries in a single pass. RACE is also a mergeable summary with an efficient distributed implementation. In practice, RACE estimates the KDE with 1% error using a 4 MB array, even for large high-dimensional datasets. To prove rigorous error bounds, observe that each row of the array is an unbiased estimator of the kernel sum. The main theoretical result of coleman2020race is stated below as Theorem 2.
Theorem 2.
Unbiased RACE Estimator coleman2020race Suppose that $X$ is the query result for one of the rows of the sketch. Then $X$ is an unbiased estimator of the kernel sum in Equation (1), and the variance of $X$ is bounded (see coleman2020race for the explicit bound).
RACE for Empirical Risk Minimization:
Theorem 2 holds for all collision probabilities, even those that are neither continuous nor positive semidefinite. Thus, we can use RACE to approximate the empirical risk for a variety of losses. Suppose we are given a dataset of training examples and a loss function $\ell(x, \theta)$, where $\theta$ is a parameter that describes the model. Empirical Risk Minimization (ERM) consists of finding a parameter $\theta$ (and thus a model) that minimizes the mean loss over the training set. RACE can approximate the empirical risk when the loss can be expressed in terms of LSH collision probabilities. Although we cannot analytically find the gradient of the RACE count values, derivative-free optimization conn2009introduction is highly effective for RACE sketches. With the right LSH family, RACE can perform many machine learning tasks in the one-pass streaming setting.
3 Private Sketches with RACE
We propose a private version of the RACE sketch. We obtain differential privacy by applying the Laplace mechanism to each count in the RACE sketch array. Algorithm 1 introduces a differentially private method to release the RACE sketch, illustrated in Figure 1. It is straightforward to see that Algorithm 1 only requires $L$ hash computations per element. Assuming a fixed sketch size, we have $O(N)$ runtime for a dataset of $N$ elements.
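As a sketch of this idea (not Algorithm 1 verbatim), one can privatize a finished count array by adding Laplace noise to every counter. Following the composition argument in the Appendix, we assume each row has sensitivity 1 and receives budget $\epsilon / L$, giving noise scale $L / \epsilon$ per counter:

```python
import numpy as np

def privatize_race(A, epsilon, rng=None):
    """Release a RACE count array A (L x R rows of counters) with
    epsilon-differential privacy. Assumption: each row has sensitivity 1
    (one element touches exactly one counter per row), so each row gets
    budget epsilon / L and each counter receives Laplace noise with
    scale L / epsilon."""
    rng = np.random.default_rng() if rng is None else rng
    L = A.shape[0]
    noise = rng.laplace(loc=0.0, scale=L / epsilon, size=A.shape)
    return A + noise
```

Because the noise is zero-mean and independent of the data, the noisy counters remain unbiased estimates of the original counts; only their variance grows as the budget shrinks.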
3.1 Privacy
For the purposes of Definition 1, we consider the function to be Algorithm 1. The codomain of this function is the set of all RACE sketches with $L$ rows and $R$ columns. Our main theorem is that the value returned by Algorithm 1 is differentially private. That is, our sketch is differentially private. We begin the proof by applying the parallel composition theorem dwork2014algorithmic to the counters in one row, which each see a disjoint partition of the dataset. Then, we apply the sequential composition theorem dwork2014algorithmic to the set of rows. Due to space limitations, we defer the full proof to the Appendix.
Theorem 3.
For any $\epsilon > 0$, number of rows $L$, number of columns $R$, and LSH family $\mathcal{H}$, the output of Algorithm 1 (the RACE sketch) is $\epsilon$-differentially private.
3.2 Utility
Since one can construct many learning algorithms from a sufficiently good estimate of the kernel sum alda2017bernstein , we focus on utility guarantees for the RACE estimate. Since RACE is a collection of unbiased estimators for the kernel sum, our proof strategy is to bound the distance between the RACE estimate and the mean. To bound the variance of the private RACE estimator, we add the independent Laplace noise variance to the bound from Theorem 2. Theorem 4 follows using the median-of-means procedure.
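The median-of-means step can be illustrated as follows; the helper name and grouping scheme are our own simplification of the standard procedure:

```python
import numpy as np

def median_of_means(estimates, num_groups):
    """Median-of-means over per-row RACE estimates: split the L row
    estimates into num_groups groups, average within each group, and
    return the median of the group means. The median step makes the
    estimate robust to a few rows with large Laplace noise."""
    groups = np.array_split(np.asarray(estimates, dtype=float), num_groups)
    return float(np.median([g.mean() for g in groups]))
```

A single unusually noisy row can drag the plain mean arbitrarily far from the truth, but it corrupts only one group mean, which the median then discards.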
Theorem 4.
Let $\hat{K}(q)$ be the median-of-means estimate using an $\epsilon$-differentially private RACE sketch with $L$ rows. Then with probability $1 - \delta$,
Theorem 4 suggests a tradeoff for which there is an optimal value of $L$. If we increase the number of rows in the sketch, we improve the estimator but must add more Laplace noise. To get our main utility guarantee, we choose an optimal $L$ that minimizes the error bound.
Corollary 1.
Put . Then the approximation error bound is
If we divide both sides of Corollary 1 by the kernel sum, we bound the relative (or percent) error rather than the absolute error. Corollary 1 suggests that it is hardest to achieve a small relative error when the kernel sum is small. This agrees with our intuition about how the KDE should behave under differential privacy guarantees. Fewer individuals make heavy contributions to the sum in low-density regions than in high-density ones, so the effect of the noise is worse.
4 Applications
Because RACE can release LSH kernel sums, our sketch is broadly useful for many algorithms in machine learning. In particular, we discuss private density estimation, classification, regression, mode finding, anomaly detection and diversity sampling using RACE.
Kernel Density Estimation: To use RACE for KDE, we select one or more LSH kernels and construct sketches with Algorithm 1. We require one RACE sketch for each kernel and bandwidth setting, and we return the result of Algorithm 2. Note that the query no longer has access to $N$, the number of elements in the private dataset. Therefore, we estimate $N$ directly from the private sketch.
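A hedged sketch of such a query, assuming the row-sum trick for estimating $N$: each dataset element increments exactly one counter per row, so every row of the noiseless array sums to $N$, and the mean row sum of the noisy array is a noisy estimate of $N$ (function and variable names are ours):

```python
import numpy as np

def private_kde_query(S, hashes, q):
    """Estimate KDE(q) from a private RACE array S (L x R). N is not
    public, so we estimate it from the sketch itself: each row of the
    un-noised array sums to N, so the mean row sum estimates N."""
    row_vals = [S[i, h(q) % S.shape[1]] for i, h in enumerate(hashes)]
    kernel_sum = float(np.mean(row_vals))   # estimate of sum_x k(x, q)
    n_estimate = float(S.sum(axis=1).mean())  # estimate of dataset size
    return kernel_sum / n_estimate
```

Both quantities are computed from the released sketch alone, so the query consumes no additional privacy budget (post-processing).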
Mode Finding:
One can apply gradientfree optimization to KDE to recover the modes of the data distribution. This works surprisingly well, but in general the KDE is a nonconvex function. One can also apply linear programming techniques that use information about the hash function partitions to identify a point in a partition with the highest count values.
Naive Bayes Classification:
Using kernel density classification, a well-developed result from statistical hypothesis testing john1995estimating , we can construct classifiers with RACE under both the maximum-likelihood and maximum a posteriori (MAP) decision rules. Suppose we are given a training set with $m$ classes and a query $q$. We can represent the empirical likelihood of each class with a sketch of the KDE for that class. Algorithm 2 returns an estimate of this probability, which may be used directly by a naive Bayes classifier or other type of probabilistic learner.
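A minimal illustration of the maximum-likelihood decision rule, assuming one sketch array per class and a shared set of hash functions (the names here are hypothetical, not from the paper):

```python
import numpy as np

def classify(sketches, hashes, q):
    """Maximum-likelihood decision rule: one private RACE array per
    class; predict the class whose sketch reports the highest
    (unnormalized) density at the query q."""
    def density(S):
        return float(np.mean([S[i, h(q) % S.shape[1]]
                              for i, h in enumerate(hashes)]))
    scores = [density(S) for S in sketches]
    return int(np.argmax(scores))
```

Normalizing each score by the per-class size estimate (as in the KDE query) would turn this into the MAP rule with empirical class priors.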
Anomaly Detection / Sampling: Anomaly detection can be cast as a KDE classification problem. If Algorithm 2 reports a low density for a query, then the training set contains few elements similar to the query, which is thus an outlier. This principle is behind the algorithms in luo2018arrays and coleman2019diversified .
Linear Regression: If we use an asymmetric LSH family shrivastava2014asymmetric , we can construct a valid surrogate loss for the linear regression loss. The key insight is that the signed random projection (SRP) LSH kernel is monotone in the inner product. If we apply an asymmetric SRP transformation to the data and the query, we obtain an LSH kernel with two components. One component is monotone increasing with the inner product, while the other is monotone decreasing. The resulting surrogate loss is a monotone function of the L2 loss and a convex function of the model parameter.
5 Experiments
Dataset  N  d  Bandwidth  Description  Task  
NYC  25k  1  5k  NYC salaries (2018)  KDE 
SF  29k  1  5k  SF salaries (2018)  
skin  241k  3  5.0  RGB skin tones  
codrna  57k  8  0.5  RNA genomic data  
nomao  34k  26  0.6  User location data  Classification 
occupancy  17k  5  0.5  Building occupancy  
pulsar  17k  8  0.1  Pulsar star data  
airfoil  1.4k  9    Airfoil parameters and sound level  Regression 
naval  11k  16    Frigate turbine propulsion  
gas  3.6k  128    Gas sensor, different concentrations 
PFDA  Bernstein  KME  RACE  
Preprocess  days  2.3 days  6 hr  13 min 
Query    6.2 ms  1.2 ms  0.4 ms 
We perform an exhaustive comparison for KDE, classification and regression. For KDE, we estimate the density of the salaries for New York City (NYC) and San Francisco (SF) city employees in 2018, as well as the high-dimensional densities for the skin and codrna UCI datasets. We use the same LSH kernel as the authors of coleman2020race . We use UCI datasets for the regression and classification experiments. Most private KDE methods require the data to lie in the unit sphere or cube; we scale the inputs accordingly. Table 2 presents the datasets used in our experiments. We make the following considerations for our baseline methods.
KDE: We implemented several function release methods from Table 1. We also compare against the Fourier basis estimator hall2013differential ; wasserman2010statistical and the kernel mean embedding (KME) method balog2018differentially . We use the Python code released by balog2018differentially and implemented the other methods in Python. To give a fair comparison, we profiled and optimized each of our baselines. We were unable to run Fourier and PFDA in more than one dimension because they both require function evaluations on a Bravais lattice, whose size scales exponentially with the dimension. We show the density estimates and their errors in Figure 2.
Classification:
We compare a maximum-likelihood RACE classifier against a regularized logistic regression classifier trained using objective perturbation chaudhuri2011differentially . We average over the Laplace noise and report the accuracy on a held-out test set in Figure 3.
Regression: We compare RACE regression against five algorithms: sufficient statistics perturbation (SSP), objective perturbation (ObjPert), posterior sampling (OPS), and adaptive versions of SSP and OPS (AdaSSP and AdaOPS wang2018revisiting ). We use the MATLAB code released by wang2018revisiting and report the mean squared error (MSE) on a held-out test set in Figure 4.
Computation: Table 3 displays the computation time needed to construct a useful function release. Note that Bernstein release can run faster if we use fewer interpolation points (see Table 1), but we still required at least 12 hours of computation for competitive results. The Bernstein mechanism requires many binomial coefficient evaluations, which were expensive even when we used optimized C code. KME requires a large-scale kernel matrix computation, and PFDA required several days for an expensive eigenvalue computation. RACE construction time varies with the sketch size, but is substantially faster than all baselines.
6 Discussion
Function Release at Scale:
RACE is ideal for private function release in large-scale distributed applications. Although PFDA and the Bernstein mechanism have the strongest error bounds, they required days to produce the experimental results in Figure 2. This is a serious barrier in practice: it would require a prohibitive number of computations to run the Bernstein mechanism on the UCI gas dataset. The RACE sketch has a small memory footprint, supports inexpensive streaming updates, and can quickly run on high-dimensional datasets. We believe that RACE can make differentially private function release a viable tool for real-world applications.
Our sketch is also convenient to use and deploy in a production environment because it is relatively insensitive to hyperparameters. In general, we found that density estimation requires a larger sketch with more rows than classification or regression. Classification and regression problems benefit from a smaller hash range than function release. Higher-dimensional problems also require a larger hash range. However, these choices are not critical, and a wide range of settings will provide good results. Hyperparameter tuning can be accomplished using very little of the overall privacy budget.
Privacy and Utility:
Our sketch beats interpolation-based methods for private function release when the dataset has more than a few dimensions and when the target function is not smooth. Depending on smoothness, our error rate improves upon the rates of wang2013efficient and hardt2012simple for LSH kernels. In our experiments (Figure 2), the Bernstein mechanism outperforms RACE on the SF dataset but fails to capture the nuances of the NYC salary distribution, which has sharp peaks. RACE preserves these details because the Laplace noise can only make local changes to the RACE structure. Suppose we generate a Laplace noise value that is unusually large for one particular RACE counter. Queries that do not involve the problematic counter are unaffected. In contrast, if we perturb one or two of the most important weights of a series estimator, the changes propagate to all queries.
Although RACE can only estimate LSH kernels, the space of LSH kernels is sufficiently large that we can release many useful functions. Our RACE surrogate loss outperformed objective perturbation chaudhuri2011differentially and compared favorably to other linear regression baselines. RACE also performed well in our classification experiments, providing a competitive utility tradeoff for practical privacy budgets. Our experiments show that RACE can privately release useful functions for many machine learning problems.
7 Conclusion
We have presented RACE, a differentially private sketch that can be used for a variety of machine learning tasks. RACE is competitive with the state of the art on tasks including density estimation, classification, and linear regression. RACE sketches can be constructed in the onepass streaming setting and are highly computationally efficient. At the same time, our sketch offers good performance on many machine learning tasks and is reasonably efficient in terms of the privacy budget. Given the utility, simplicity and speed of the algorithm, we expect that RACE will enable differentially private machine learning in largescale settings.
Broader Impact
Privacy is a serious concern in machine learning and a barrier to the widespread adoption of useful technology. For example, machine learning models can substantially improve medical treatments, inform social policy, and help with financial decision-making. While such applications are overall beneficial to society, they often rely on sensitive personal data that users may not want to disclose. For instance, a 2019 survey found that a large majority (roughly 80%) of Americans are concerned with but do not understand how technology companies use the data they collect pew2019 . Data management practices have recently come under increased scrutiny, especially after the enactment of the European GDPR Act in 2018. To improve public confidence in their services, Google erlingsson2014rappor and Apple apple2017 both advertise differential privacy as a central part of their data collection process.
Differential privacy provides strong protection against malicious data use but is difficult to apply in practice. This is particularly true for functional data release, which is often prohibitively expensive in terms of computation cost. Our work substantially reduces the computational cost of this useful technique to achieve differential privacy. By reducing the computational requirement, we make privacy more accessible and attainable for data publishers, since large computing infrastructure is no longer needed to release private information. This directly reduces the economic cost of data privacy, an important consideration for governments and private companies. Our sketching method also allows private functional release to be applied to more realworld datasets, making it easier to release highdimensional information with differential privacy.
Appendix
This section contains proofs and detailed discussion of the theorems.
Privacy
We consider a function that takes in a dataset and outputs a RACE sketch. This is a randomized function, where the randomness is over the choice of the LSH functions and the (independent) Laplace noise. In this context, the differentially private query is the RACE sketch itself. Our main theorem is that the sketch is differentially private. Because differential privacy is robust to postprocessing, we can then query the sketch as many times as desired once it is released.
Proof Sketch:
The sketch consists of $L$ independent rows. Each row consists of $R$ count values. First, we prove that we can privately release one of the rows by adding Laplace noise to each counter (each row has sensitivity 1). This is described in Lemma 1. Then, we apply Lemma 1 with privacy budget $\epsilon / L$ per row to release all the rows, proving the main theorem.
Lemma 1.
Consider one row of the RACE sketch, and add independent Laplace noise $\mathrm{Lap}(1/\epsilon)$ to each counter. The row can be released with $\epsilon$-differential privacy.
Proof.
Consider just one of the counters. It is easy to see that the sensitivity of the counter is 1, because changing a single element of the dataset can only create a change of at most 1 in the counter. By the Laplace mechanism, this counter can be released with $\epsilon$-differential privacy by adding Laplace noise $\mathrm{Lap}(1/\epsilon)$.
The row contains $R$ counters, and we can release each one with the Laplace mechanism. Therefore, we have $R$ mechanisms (one to release each counter) which are independently $\epsilon$-differentially private. The parallel composition theorem dwork2014algorithmic states that if each mechanism is computed on a disjoint subset of the dataset, then we can compute all of the mechanisms (i.e. release all of the counters) with $\epsilon$-differential privacy.
This is indeed the case for the RACE sketch. Under the LSH function, each element in the dataset maps to exactly one of the $R$ counters. Thus, the counters are computed on disjoint subsets of the dataset and we can release them with $\epsilon$-differential privacy. ∎
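The disjointness argument can also be checked numerically. The sketch below (with toy hash functions of our own choosing) builds count arrays for two neighboring datasets and verifies that each row changes by exactly 1 in L1 norm:

```python
import numpy as np

def race_counts(dataset, hashes, R):
    """Non-private RACE counts: one increment per element per row."""
    A = np.zeros((len(hashes), R), dtype=np.int64)
    for x in dataset:
        for i, h in enumerate(hashes):
            A[i, h(x) % R] += 1
    return A

# Neighboring datasets: they differ in exactly one element.
hashes = [lambda x: x, lambda x: 2 * x + 1]
D1 = [1, 2, 3]
D2 = [1, 2, 3, 7]
diff = np.abs(race_counts(D2, hashes, R=16) - race_counts(D1, hashes, R=16))

# The extra element touches exactly one counter per row, so the
# per-row L1 sensitivity is 1 and the total across L rows is L.
assert all(row.sum() == 1 for row in diff)
```

This is exactly the sensitivity structure that Lemma 1 and the composition theorems rely on.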
Theorem 3.
For any $\epsilon > 0$, number of rows $L$, number of columns $R$, and LSH family $\mathcal{H}$, the output of Algorithm 1 (the RACE sketch) is $\epsilon$-differentially private.
Proof.
The sketch is composed of $L$ independent rows. Consider the mechanism from Lemma 1 to release a single row with $\epsilon/L$-differential privacy. The sequential composition theorem dwork2014algorithmic states that given $L$ mechanisms (one to release each row) which are independently $\epsilon/L$-differentially private, we can compute all the mechanisms (i.e. release all rows) with $\epsilon$-differential privacy.
To construct the sketch, we apply Lemma 1 to each row by adding independent Laplace noise $\mathrm{Lap}(L/\epsilon)$ to each counter. The sequential composition theorem guarantees that we can release all rows with $\epsilon$-differential privacy. ∎
Utility
First, we introduce the median-of-means trick in Lemma 2. The median-of-means procedure is a standard analysis technique with many applications in the streaming literature. Lemma 2 is a special case of Theorem 2.1 from alon1999space .
Lemma 2.
Let $X_1, \ldots, X_L$ be i.i.d. random variables with mean $\mu$ and variance $\sigma^2$. To get the median-of-means estimate, break the random variables into groups with an equal number of elements in each group, average within each group, and take the median of the group means. (For simplicity, we suppose the group sizes are integers that evenly divide $L$; see alon1999space for the complete analysis.) Then with probability at least $1 - \delta$, the deviation of the estimate from the mean is
Each row of the sketch provides an estimator to be used in the median-of-means technique. By adding the Laplace noise variance, we can find the variance $\sigma^2$ in Lemma 2 and provide performance guarantees.
Theorem 4.
Let $\hat{K}(q)$ be the median-of-means estimate using an $\epsilon$-differentially private RACE sketch with $L$ rows. Then with probability $1 - \delta$,
Proof.
To use median-of-means, we must bound the variance of the RACE estimate after we add the Laplace noise. Since the Laplace noise is independent of the LSH functions, we simply add the Laplace noise variance to the variance bound in Theorem 2.
Using this variance bound with Lemma 2, we have the following statement with probability $1 - \delta$.
∎
Corollary 1.
Put . Then the approximation error bound is
Proof.
Take the derivative of the bound in Theorem 4 with respect to $L$ to find the value of $L$ that minimizes the bound. The corollary is obtained by substituting this value into Theorem 4 and simplifying the resulting expression.
∎
References

[1]
Francesco Aldà and Benjamin I. P. Rubinstein. The Bernstein mechanism: Function release under differential privacy. In Thirty-First AAAI Conference on Artificial Intelligence, 2017.
[2]
Noga Alon, Yossi Matias, and Mario Szegedy. The space complexity of approximating the frequency moments. Journal of Computer and System Sciences, 58(1):137–147, 1999.
 [3] Brooke Auxier, Lee Rainie, Monica Anderson, Andrew Perrin, Madhu Kumar, and Erica Turner. Americans and privacy: Concerned, confused and feeling lack of control over their personal information. Washington, DC: Pew Research Center, 2019.
 [4] Matej Balog, Ilya Tolstikhin, and Bernhard Schölkopf. Differentially private database release via kernel mean embeddings. In International Conference on Machine Learning, pages 414–422, 2018.
 [5] Kamalika Chaudhuri, Claire Monteleoni, and Anand D Sarwate. Differentially private empirical risk minimization. Journal of Machine Learning Research, 12(Mar):1069–1109, 2011.
 [6] Flavio Chierichetti and Ravi Kumar. LSH-preserving functions and their applications. In Proceedings of the Twenty-Third Annual ACM-SIAM Symposium on Discrete Algorithms, SODA ’12, pages 1078–1094, USA, 2012. Society for Industrial and Applied Mathematics.
 [7] Benjamin Coleman, Benito Geordie, Li Chou, RA Leo Elworth, Todd J Treangen, and Anshumali Shrivastava. Diversified race sampling on data streams applied to metagenomic sequence analysis. bioRxiv, page 852889, 2019.
 [8] Benjamin Coleman and Anshumali Shrivastava. Sublinear race sketches for approximate kernel density estimation on streaming data. In Proceedings of the 2020 World Wide Web Conference. International World Wide Web Conferences Steering Committee, 2020.
 [9] Andrew R Conn, Katya Scheinberg, and Luis N Vicente. Introduction to derivativefree optimization, volume 8. Siam, 2009.
 [10] Cynthia Dwork. Differential privacy. In 33rd International Colloquium on Automata, Languages and Programming, part II (ICALP 2006), volume 4052 of Lecture Notes in Computer Science, pages 1–12. Springer Verlag, July 2006.
 [11] Cynthia Dwork, Aaron Roth, et al. The algorithmic foundations of differential privacy. Foundations and Trends® in Theoretical Computer Science, 9(3–4):211–407, 2014.
 [12] Úlfar Erlingsson, Vasyl Pihur, and Aleksandra Korolova. Rappor: Randomized aggregatable privacypreserving ordinal response. In Proceedings of the 2014 ACM SIGSAC conference on computer and communications security, pages 1054–1067, 2014.
 [13] Matt Fredrikson, Somesh Jha, and Thomas Ristenpart. Model inversion attacks that exploit confidence information and basic countermeasures. In Proceedings of the 22nd ACM SIGSAC Conference on Computer and Communications Security, pages 1322–1333, 2015.
 [14] Matthew Fredrikson, Eric Lantz, Somesh Jha, Simon Lin, David Page, and Thomas Ristenpart. Privacy in pharmacogenetics: An endtoend case study of personalized warfarin dosing. In 23rd USENIX Security Symposium (USENIX Security 14), pages 17–32, 2014.
 [15] Aristides Gionis, Piotr Indyk, Rajeev Motwani, et al. Similarity search in high dimensions via hashing. In VLDB, volume 99, pages 518–529, 1999.
 [16] Rob Hall, Alessandro Rinaldo, and Larry Wasserman. Differential privacy for functions and functional data. Journal of Machine Learning Research, 14(Feb):703–727, 2013.
 [17] Moritz Hardt, Katrina Ligett, and Frank McSherry. A simple and practical algorithm for differentially private data release. In Advances in Neural Information Processing Systems, pages 2339–2347, 2012.

[18]
Piotr Indyk and Rajeev Motwani. Approximate nearest neighbors: towards removing the curse of dimensionality. In Proceedings of the Thirtieth Annual ACM Symposium on Theory of Computing, pages 604–613. ACM, 1998.
 [19] George H. John and Pat Langley. Estimating continuous distributions in Bayesian classifiers. In Proceedings of the Eleventh Conference on Uncertainty in Artificial Intelligence, UAI’95, pages 338–345, San Francisco, CA, USA, 1995. Morgan Kaufmann Publishers Inc.
 [20] Chen Luo and Anshumali Shrivastava. Arrays of (localitysensitive) count estimators (ace): Anomaly detection on the edge. In Proceedings of the 2018 World Wide Web Conference, pages 1439–1448. International World Wide Web Conferences Steering Committee, 2018.
 [21] Ardalan Mirshani, Matthew Reimherr, and Aleksandra Slavković. Formal privacy for functional data with Gaussian perturbations. In Proceedings of the 36th International Conference on Machine Learning, volume 97 of Proceedings of Machine Learning Research, pages 4595–4604, Long Beach, California, USA, 09–15 Jun 2019. PMLR.
 [22] Reza Shokri, Marco Stronati, Congzheng Song, and Vitaly Shmatikov. Membership inference attacks against machine learning models. In 2017 IEEE Symposium on Security and Privacy (SP), pages 3–18. IEEE, 2017.
 [23] Anshumali Shrivastava and Ping Li. Asymmetric lsh (alsh) for sublinear time maximum inner product search (mips). In Advances in Neural Information Processing Systems, pages 2321–2329, 2014.
 [24] Apple Differential Privacy Team. Learning with privacy at scale. Apple Machine Learning Journal, 1(8), 2017.
 [25] Jonathan Ullman. Answering n^{2+o(1)} counting queries with differential privacy is hard. In Proceedings of the Forty-Fifth Annual ACM Symposium on Theory of Computing, STOC 2013, pages 361–370, New York, NY, USA, 2013. Association for Computing Machinery.
 [26] Yu-Xiang Wang. Revisiting differentially private linear regression: optimal and adaptive prediction & estimation in unbounded domain. In Proceedings of the Thirty-Fourth Conference on Uncertainty in Artificial Intelligence, pages 93–103, Monterey, California, USA, 06–10 August 2018.
 [27] Ziteng Wang, Kai Fan, Jiaqi Zhang, and Liwei Wang. Efficient algorithm for privately releasing smooth queries. In Advances in Neural Information Processing Systems, pages 782–790, 2013.
 [28] Larry Wasserman and Shuheng Zhou. A statistical framework for differential privacy. Journal of the American Statistical Association, 105(489):375–389, 2010.