1 Introduction
With the unprecedented availability of datasets containing sensitive personal information, there are increasing concerns that statistical analysis of such datasets may compromise individual privacy. These concerns give rise to statistical methods that provide privacy guarantees at the cost of statistical accuracy, but there has been very limited understanding of the optimal tradeoff between statistical accuracy and privacy cost.
A rigorous definition of privacy is a prerequisite for such an understanding. Differential privacy, introduced in Dwork et al. [15], is arguably the most widely adopted definition of privacy in statistical data analysis. The promise of a differentially private algorithm is protection of any individual’s privacy from an adversary who has access to the algorithm’s output and, sometimes, even the rest of the data.
Differential privacy has gained significant attention in the machine learning community over the past few years [16, 1, 19, 13] and has found its way into real-world applications developed by Google [21], Apple [9], Microsoft [10], and the U.S. Census Bureau [2].

A common approach to developing differentially private algorithms is to perturb the output of a non-private algorithm with random noise. When the observations are continuous, differential privacy can be guaranteed by adding Laplace/Gaussian noise to the non-private output [16]. For discrete data, differential privacy can be achieved by adding Gumbel noise to utility score functions (also known as the exponential mechanism). Naturally, the processed output suffers some loss of accuracy, which has been observed and studied in the literature; see, for example, Wasserman and Zhou [39], Smith [32], Lei [25], Bassily et al. [5], and Dwork et al. [17]. However, given a certain privacy constraint, it is still unclear what the best achievable statistical accuracy is, or in other words, what the optimal tradeoff between privacy cost and statistical accuracy is.
The goal of this paper is to provide a quantitative characterization of the tradeoff between privacy cost and statistical accuracy under the statistical minimax framework. Specifically, we consider this problem for mean estimation and linear regression models, in both classical and high-dimensional settings, under the differential privacy constraint, which is formally defined as follows.
Definition 1 (Differential Privacy [15]).
A randomized algorithm M is (ε, δ)-differentially private if and only if for every pair of adjacent datasets X and X′, and for any measurable set S,

P(M(X) ∈ S) ≤ e^ε · P(M(X′) ∈ S) + δ,

where we say two datasets X and X′ are adjacent if and only if they differ by exactly one individual record.
According to the definition, the two parameters ε and δ control the level of privacy against an adversary who attempts to detect the presence of a certain subject in the sample. Roughly speaking, ε is an upper bound on the amount of influence an individual’s record has on the released information, and δ is the probability that this bound fails to hold, so the privacy constraint becomes more stringent as ε and δ tend to 0.

We establish the necessary cost of privacy by first providing minimax lower bounds for the estimation accuracy under this differential privacy constraint. The results show that estimators with privacy guarantees generally exhibit very different rates of convergence from their non-private counterparts. As a first example, consider d-dimensional mean estimation under the squared ℓ₂ loss: Theorem 2.2 in Section 2 shows that, for a sufficiently large sample size, any (ε, δ)-differentially private algorithm must incur, in addition to the standard statistical error, an extra error of at least the order d²/(n²ε²). This lower bound is established using a general technique presented in Theorem 2.1, which reduces establishing minimax risk lower bounds to designing and analyzing a tracing adversary that aims to detect the presence of an individual in a dataset via the output of a differentially private procedure applied to that dataset. The design and analysis of the tracing adversary make use of a novel generalization of the fingerprinting lemma, a concept from cryptography [6]. The connections between tracing adversaries, the fingerprinting lemma, and differential privacy have been observed in [35], [7], and [18], but those discussions are primarily concerned with discrete distributions. In this paper, we provide a continuous version of the fingerprinting lemma that enables us to establish minimax lower bounds for a greater variety of statistical problems; more discussion is given in Section 2 as well as the Supplementary Material [8].
Further, we argue that these necessary costs of privacy, as shown by the lower bounds for the minimax rates, are in fact sharp for both mean estimation and linear regression. We construct efficient algorithms and establish matching upper bounds up to logarithmic factors. These algorithms are based on several differentially private subroutines, such as the Gaussian mechanism, reporting the noisy top, and their modifications. In particular, for high-dimensional linear regression, we propose a novel private iterative hard thresholding pursuit algorithm, based on a privately truncated version of stochastic gradient descent. This private truncation step effectively enforces the sparsity of the resulting estimator and leads to optimal control of the privacy cost (see Section 4.2 for details). To the best of our knowledge, these algorithms are the first to achieve the minimax optimal rates of convergence in high-dimensional statistical estimation problems with the differential privacy guarantee. Our Theorems 3.1, 3.2, 4.1, and 4.3 together provide matching upper and lower bounds for both mean estimation and linear regression in high-dimensional and classical settings, up to logarithmic factors.

Related literature
Several prior works study how privacy constraints compromise estimation accuracy. In theoretical computer science, Smith [32] showed that, under strong conditions on the privacy parameters, some point estimators attain the statistical convergence rates and hence privacy can be gained for free. [5, 17, 34] proposed differentially private algorithms for convex empirical risk minimization, principal component analysis, and high-dimensional regression, and investigated the convergence rates of the excess risk. In addition, [7, 36, 3] considered the optimal estimation of sample quantities, such as marginals and top selection, under privacy constraints. Unlike most prior works, which focused on excess risks or the release of sample quantities, our focus is population parameter estimation. Theoretical properties of excess risks or sample quantities can be very different from those of population parameters; see [11] for more discussion.

More recent works study differential privacy in the context of statistical estimation. [39] observed that locally differentially private schemes seem to yield slower convergence rates than the optimal minimax rates in general; [12] developed a framework for statistical minimax rates under the local privacy constraint; in addition, Rohde and Steinberger [31] established minimax optimal rates of convergence under local differential privacy and exhibited a mechanism, based on randomized response, that is minimax optimal for “nearly” linear functionals. However, local privacy is a much stronger notion than differential privacy and is hardly compatible with high-dimensional problems [12]. As we shall see in this paper, the cost of differential privacy in statistical estimation behaves quite differently from that of local privacy.
Organization of the paper
The rest of the paper is organized as follows. Section 2 introduces a general technical tool for deriving lower bounds of the minimax risk under the differential privacy constraint. The new technical tool is then applied in Section 3 to the high-dimensional mean estimation problem; both minimax lower bound results and algorithms with matching upper bounds are obtained. Section 4 further applies the general lower bound technique to investigate the minimax lower bounds of the linear regression problem under the differential privacy constraint, in both low-dimensional and high-dimensional settings. The upper bounds are also obtained by providing novel differentially private algorithms and analyzing their risks. The results together show that our bounds are rate-optimal up to logarithmic factors. Simulation studies are carried out in Section 5 to show the advantages of our proposed algorithms. Section 6 applies our algorithms to real data sets with potentially sensitive information that warrants privacy-preserving methods. Section 7 discusses extensions to other statistical estimation problems with privacy constraints. The proofs are given in Section 8.
Definitions and notation
We conclude this section by introducing notation that will be used in the rest of the paper. For a positive integer n, [n] denotes the set {1, 2, …, n}. For a vector v, we use ||v||₂ and ||v||₀ to denote the usual vector ℓ₂ and ℓ₀ norms, respectively, where the ℓ₀ norm counts the number of nonzero entries of a vector. For any set S ⊆ [d] and vector v, v_S denotes the |S|-dimensional vector consisting of the entries v_j such that j ∈ S. The Frobenius norm of a matrix A is denoted by ||A||_F, and the spectral norm of A is ||A||₂. In addition, we use λ_min(A) and λ_max(A) to denote the smallest and the largest eigenvalues of A. Matrix norms induced by the corresponding vector norms are defined analogously. In addition, det(A) denotes the determinant of A. The empirical measure of a sample is denoted by P_n. For a set A, we use A^c to denote its complement, and 1(A) denotes the indicator function of A. We use c, C, and their subscripted variants to denote generic constants that may vary from line to line.

2 A General Lower Bound for Minimax Risk with Differential Privacy
This section presents a general minimax lower bound technique for statistical estimation problems under the differential privacy constraint. As an application, we use this technique to establish a tight lower bound for differentially private mean estimation.
Our lower bound technique is based on a tracing adversary that attempts to detect the presence of an individual data entry in a data set, given knowledge of an estimator computed from that data set. If one can construct a tracing adversary that is effective at this task whenever the estimator is accurate, an argument by contradiction yields a lower bound on the accuracy of differentially private estimators: if a differentially private estimator computed from a data set were sufficiently accurate, the tracing adversary would be able to determine the presence of an individual data entry in the data set, contradicting the differential privacy guarantee. In other words, the privacy guarantee and the tracing adversary together ensure that a differentially private estimator cannot be “too accurate”.
2.1 Background and problem formulation
Let P denote a family of distributions supported on a set X, and let θ = θ(P) denote a population quantity of interest. The statistician has access to a data set of n i.i.d. samples drawn from some distribution P ∈ P.

With the data, our goal is to estimate the population parameter θ by an estimator belonging to the collection of all (ε, δ)-differentially private procedures. The performance of an estimator is measured by its distance to the truth θ under a metric induced by a norm, combined with a loss function that is monotonically increasing on [0, ∞); this paper studies the resulting minimax risk for differentially private estimation of the population parameter θ.

In this paper, the privacy parameters ε and δ are taken in essentially the most permissive regime under which differential privacy remains a nontrivial guarantee: [33] shows that δ ≪ 1/n is essentially the weakest privacy guarantee that is still meaningful.
2.2 Lower bound by tracing
Consider a tracing adversary A that outputs IN if it determines that a certain sample is in the data set after seeing the estimator, and outputs OUT otherwise. We define the index set of samples that are determined as IN by the adversary A. A survey of tracing adversaries and their relationship with differential privacy can be found in [19] and the references therein.
Our general lower bound technique requires some regularity conditions for and : for every , we assume that there exists a such that for every , , and . The two statistical problems investigated in this paper, mean estimation and linear regression, satisfy this property.
The following theorem shows that minimax lower bounds for statistical estimation problems with privacy constraint can be constructed if there exist effective tracing adversaries:
Theorem 2.1.
Suppose is an i.i.d. sample from a distribution , and assume that and satisfy the regularity conditions described above. Given a tracing adversary that satisfies the following two properties when ,

completeness:

soundness: , where is an adjacent dataset of with replaced by ,
then if , for some , and , we have
Completeness and soundness roughly correspond to “true positives” and “false positives” in classification: completeness requires the adversary to return a nontrivial result when its input is accurate; soundness guarantees that an individual is unlikely to be identified as IN if the estimator that the adversary uses is independent of that individual. When a tracing adversary satisfies these properties, Theorem 2.1 conveniently leads to a minimax risk lower bound; that is, Theorem 2.1 reduces constructing minimax risk lower bounds to finding complete and sound tracing adversaries.
In the next section, we illustrate this technique by designing a complete and sound tracing adversary for the classical mean estimation problem.
2.3 A first application: private mean estimation in the classical setting
Consider the d-dimensional sub-Gaussian distribution family defined as follows, where μ denotes the mean of the distribution, and e_j denotes the j-th standard basis vector of R^d.
Following the notation introduced in Section 2.1, and . Further, we take and , so that our risk function is simply the error. The minimax risk is then denoted by
We propose a tracing adversary:
where is a fresh independent draw from . The adversary is indeed complete and sound, as desired:
Lemma 2.3.1.
If , there is a distribution , such that


, where is an adjacent dataset of with replaced by .
Intuitively, this adversary is constructed as follows. Without privacy constraints, a natural estimator for the mean is the sample mean. On the one hand, when a record does not belong to the sample, the adversary’s score is a sum of independent zero-mean random variables and concentrates around zero. On the other hand, when the record belongs to the sample, the score is positively biased, and the adversary is more likely to output IN than OUT.

In view of Theorem 2.1, it follows that
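To make this mechanism concrete, the following Python sketch simulates a toy tracing adversary of this flavor: a candidate record is scored by the inner product of (record − fresh draw z) with the released estimate. The function name, constants, and simulation setup are all illustrative choices of ours, not the exact statistic analyzed in Lemma 2.3.1; the point is only that the in-sample score is larger on average, so thresholding it detects membership.

```python
import numpy as np

def tracing_scores(trials=200, d=200, n=50, seed=0):
    """Average adversary scores for in-sample vs. out-of-sample records.

    A toy tracing adversary for mean estimation: score a candidate record
    by the inner product of (record - fresh draw z) with the released
    estimate. Illustrative only; not the paper's exact construction.
    """
    rng = np.random.default_rng(seed)
    in_total = out_total = 0.0
    for _ in range(trials):
        mean = 0.5 * rng.choice([-1.0, 1.0], size=d)  # unknown population mean
        data = mean + rng.normal(size=(n, d))         # sample used by the estimator
        estimate = data.mean(axis=0)                  # accurate non-private estimate
        z = mean + rng.normal(size=d)                 # fresh independent draw
        x_out = mean + rng.normal(size=d)             # record NOT in the sample
        in_total += (data[0] - z) @ estimate          # score of an in-sample record
        out_total += (x_out - z) @ estimate           # score of an out-of-sample record
    return in_total / trials, out_total / trials
```

Averaged over trials, the in-sample score exceeds the out-of-sample score by roughly d/n, which is exactly the correlation between a record and an accurate sample mean that the adversary exploits.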
Combining this with the well-known statistical minimax lower bound (see, for example, Lehmann and Casella [24]), namely
we arrive at the minimax lower bound result for differentially private mean estimation.
Theorem 2.2.
Let denote the collection of all differentially private algorithms, and let be an i.i.d. sample drawn from . Suppose that , for some and , then
Remark 1.
In comparison, applying Barber and Duchi [4]’s lower bound argument to our current model yields
Remark 2.
The minimax lower bound characterizes the cost of privacy in the mean estimation problem: the cost of privacy dominates the statistical risk when .
3 Privacy Cost of High-dimensional Mean Estimation
In this section and the subsequent Section 4, we consider the high-dimensional setting, where the dimension can exceed the sample size and the population parameters of interest, such as the mean vector or the regression coefficient, are sparse. For each statistical problem investigated, we present a minimax risk lower bound under the differential privacy constraint, as well as a procedure with a differential privacy guarantee that attains the lower bound up to logarithmic factors.
3.1 Private high-dimensional mean estimation
We first consider the problem of estimating the sparse mean vector of a d-dimensional sub-Gaussian distribution, where d can possibly be much larger than the sample size n. We denote the parameter space of interest by
where the sparsity level is controlled by the parameter .
The tracing adversary for this problem is given by
where is an independent draw from , and
Given an estimate computed from a data set, the tracing adversary attempts to identify whether an individual belongs to the data set by calculating the difference of two correlation statistics over those coordinates where the estimate is large. If the individual belongs to the data set, the former statistic is correlated with the estimate and is likely to be larger than the latter.
Formally, the tracing adversary is complete and sound under appropriate sample size constraint:
Lemma 3.1.1.
If , there is a distribution such that


, where is an adjacent data set of with replaced by .
In conjunction with our general lower bound result Theorem 2.1, we have
Theorem 3.1.
Let denote the collection of all differentially private algorithms, and let be an i.i.d. sample drawn from . Suppose that , for some , and , then
The first term is the statistical minimax lower bound for sparse mean estimation (see, for example, [23]), and the second term is due to the privacy constraint. Comparing the two terms shows that, in high-dimensional sparse mean estimation, the cost of differential privacy is significant when
In the next section, we present a differentially private procedure that attains this convergence rate up to a logarithmic factor.
3.2 Rate-optimal procedures
The rate-optimal algorithms in this paper utilize some classical subroutines from the differential privacy literature, such as the Laplace and Gaussian mechanisms and reporting the noisy maximum of a vector. Before describing our rate-optimal algorithms in detail, it is helpful to review these relevant results, which will also serve as building blocks for the differentially private linear regression methods in Section 4.
3.2.1 Basic differentially private procedures
It is frequently the case that differential privacy can be attained by adding properly scaled noise to the output of a non-private algorithm. Among the most prominent examples are the Laplace and Gaussian mechanisms.
The Laplace and Gaussian mechanisms
As the names suggest, the Laplace and Gaussian mechanisms achieve differential privacy by perturbing the output of an algorithm with Laplace and Gaussian noise, respectively. The scale of the noise is determined by the sensitivity of the algorithm:
Definition 2.
For any algorithm mapping a dataset to , the sensitivity of is
For algorithms with finite sensitivity, the differential privacy guarantee can be attained by adding noise sampled from a Laplace distribution.
Lemma 3.2.1 (The Laplace mechanism [16]).
For any algorithm mapping a dataset to such that , the Laplace mechanism, given by
where is an i.i.d. sample drawn from , achieves differential privacy.
Similarly, adding Gaussian noise to algorithms with finite sensitivity guarantees differential privacy.
Lemma 3.2.2 (The Gaussian mechanism [16]).
For any algorithm mapping a dataset to such that , the Gaussian mechanism, given by
where is an i.i.d. sample drawn from , achieves differential privacy.
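A minimal Python sketch of both mechanisms, assuming the classical noise calibrations (Laplace scale Δ₁/ε for ℓ₁ sensitivity Δ₁; Gaussian scale Δ₂·sqrt(2·log(1.25/δ))/ε for ℓ₂ sensitivity Δ₂ and ε < 1). The function names and the example are ours, not the paper's:

```python
import numpy as np

def laplace_mechanism(value, l1_sensitivity, epsilon, rng):
    """Add Laplace(l1_sensitivity / epsilon) noise coordinate-wise.

    The standard calibration yielding epsilon-differential privacy for an
    algorithm with l1 sensitivity l1_sensitivity.
    """
    scale = l1_sensitivity / epsilon
    return value + rng.laplace(scale=scale, size=np.shape(value))

def gaussian_mechanism(value, l2_sensitivity, epsilon, delta, rng):
    """Add Gaussian noise calibrated to the l2 sensitivity.

    Uses the classical calibration sigma = Delta_2 * sqrt(2 log(1.25/delta)) / epsilon,
    which gives (epsilon, delta)-differential privacy for epsilon in (0, 1).
    """
    sigma = l2_sensitivity * np.sqrt(2 * np.log(1.25 / delta)) / epsilon
    return value + rng.normal(scale=sigma, size=np.shape(value))

# Example: privately release the mean of n values known to lie in [0, 1];
# replacing one value moves the mean by at most 1/n, so sensitivity is 1/n.
rng = np.random.default_rng(0)
x = rng.uniform(size=1000)
private_mean = laplace_mechanism(x.mean(), l1_sensitivity=1.0 / 1000, epsilon=1.0, rng=rng)
```

With sensitivity 1/n the added noise has scale 1/(nε), so for bounded data the privacy cost of releasing a mean vanishes at rate 1/(nε), which is the scalar version of the tradeoffs studied in this paper.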
An important application of these mechanisms is the differentially private selection of the maximum/minimum, which also plays a crucial role in our high-dimensional mean estimation algorithm. Next, we review some algorithms for differentially private selection, to provide concrete examples and to prepare for stating the main algorithms.
Differentially private selection
Selecting the maximum (in absolute value) coordinate of is a straightforward application of the Laplace mechanism, as follows:
Algorithm 1: PrivateMax
1:Sample .
2:For , compute the noisy version .
3:Return and , where is an independent draw from .
Lemma 3.2.3 ([20]).
If , then PrivateMax is differentially private.
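A hedged sketch of a PrivateMax-style routine (report-noisy-max). The split of the budget between selecting the index and releasing the value, encoded in the factor 2 below, is an illustrative choice, not necessarily the paper's exact constants:

```python
import numpy as np

def private_max(v, sensitivity, epsilon, rng):
    """Report-noisy-max sketch: perturb each |v_j| with Laplace noise,
    return the winning index and a noisy copy of the winning entry.
    Noise scale 2 * sensitivity / epsilon is a common calibration that
    spends half the budget on selection and half on the released value."""
    scale = 2 * sensitivity / epsilon
    noisy = np.abs(v) + rng.laplace(scale=scale, size=len(v))
    j = int(np.argmax(noisy))
    return j, v[j] + rng.laplace(scale=scale)
```

When the gap between the largest entry and the rest dwarfs the noise scale, the correct coordinate is returned with overwhelming probability.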
In applications, we are often interested in finding the top s entries with s much smaller than the dimension. A natural method for this task is an iterative “Peeling” algorithm that runs the PrivateMax algorithm s times, with appropriately chosen privacy parameters in each iteration.
Algorithm 2: Peeling
1:Set .
2:for to do
3: Run PrivateMax to obtain .
4: Remove from .
5:end for
6:Report the selected pairs.
Lemma 3.2.4 ([20]).
If , then Peeling is differentially private.
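The Peeling idea can be sketched as follows. Splitting the budget uniformly as ε/s per round is one simple composition choice made here for illustration; the paper tunes the per-round privacy parameters more carefully:

```python
import numpy as np

def peeling(v, s, sensitivity, epsilon, rng):
    """Top-s selection by repeatedly running a noisy-max step and
    removing the winner from the candidate set. Budget split (epsilon/s
    per round) and noise constants are illustrative."""
    v = np.asarray(v, dtype=float)
    remaining = list(range(len(v)))
    selected = []
    scale = 2 * sensitivity * s / epsilon  # per-round Laplace scale
    for _ in range(s):
        noisy = np.abs(v[remaining]) + rng.laplace(scale=scale, size=len(remaining))
        j = remaining[int(np.argmax(noisy))]
        selected.append((j, v[j] + rng.laplace(scale=scale)))
        remaining.remove(j)
    return selected
```

The per-round noise grows linearly in s under this naive split, which is one reason the sparsity level enters the privacy cost of the high-dimensional procedures below.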
With these differentially private selection subroutines in hand, we are ready to present the high-dimensional mean estimation algorithm in the next section.
3.2.2 Differentially private mean estimation in high dimensions
Let denote projection onto the ball of radius in , where is a tuning parameter for the truncation level. With suitably chosen , the following algorithm attains the minimax lower bound in Theorem 3.1, up to at most a logarithmic factor in .
Algorithm 3: Private Highdimensional Mean Estimation
1:Compute
2:Find the top components of by running Peeling and set the remaining components to 0. Denote the resulting vector by .
3:Return
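The three steps above might be sketched as follows, assuming ℓ₂ truncation of each sample and a peeling-style noisy top-s selection. The truncation radius, budget split, and noise scales are illustrative stand-ins for the tuning in Theorem 3.2:

```python
import numpy as np

def private_sparse_mean(X, s, radius, epsilon, rng):
    """Sketch of Algorithm 3: project each sample onto an l2 ball of the
    given radius, average, then keep a noisy top-s of the coordinates and
    zero out the rest. Constants are illustrative, not the paper's."""
    n, d = X.shape
    # Step 1: truncated sample mean.
    shrink = np.maximum(np.linalg.norm(X, axis=1) / radius, 1.0)
    mean = (X / shrink[:, None]).mean(axis=0)
    # Each coordinate of the truncated mean moves by at most 2*radius/n
    # when one sample is replaced; split the budget over s peeling rounds.
    scale = 2 * (2 * radius / n) * s / epsilon
    # Steps 2-3: noisy top-s selection with noisy reported values.
    est = np.zeros(d)
    remaining = list(range(d))
    for _ in range(s):
        noisy = np.abs(mean[remaining]) + rng.laplace(scale=scale, size=len(remaining))
        j = remaining[int(np.argmax(noisy))]
        est[j] = mean[j] + rng.laplace(scale=scale)
        remaining.remove(j)
    return est
```

Truncation bounds the sensitivity so the Laplace noise can be calibrated, and the top-s selection keeps the noise cost proportional to the sparsity rather than the ambient dimension.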
In view of Theorem 3.1, the theorem below shows that the high-dimensional mean estimation algorithm is rate-optimal up to a logarithmic factor.
Theorem 3.2.
For with , if , and , then Algorithm 3 is differentially private, and

if there exists a constant such that , when ,

otherwise, with the choice of for a sufficiently large constant ,
Remark 3.
Remark 4.
The role of the truncation parameter is to control the sensitivity of the sample mean so that the Laplace/Gaussian mechanisms are applicable. The fixed truncation level can be replaced by a differentially private estimator that consistently estimates the range of the sample; examples of such an estimator can be found in [25]. This remark applies to all truncation tuning parameters in the algorithms of Sections 3 and 4.
3.2.3 Differentially private algorithms in the classical setting
In the classical setting of , the optimal rate of convergence of the mean estimation problem can be achieved simply by a noisy, truncated sample mean: given an i.i.d. sample , the estimator is defined as
where denotes projection onto the ball of radius in , and is an independent draw from . The theoretical guarantees for this estimator are summarized in the theorem below.
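A sketch of such a noisy truncated mean, using the classical Gaussian-mechanism calibration (our constants, not necessarily the paper's; the ℓ₂ sensitivity of the truncated mean is 2·radius/n):

```python
import numpy as np

def private_mean_classical(X, radius, epsilon, delta, rng):
    """Noisy truncated sample mean: project each observation onto the l2
    ball of the given radius, average, and add Gaussian noise calibrated
    to the l2 sensitivity 2 * radius / n of the truncated mean."""
    n, d = X.shape
    shrink = np.maximum(np.linalg.norm(X, axis=1) / radius, 1.0)
    mean = (X / shrink[:, None]).mean(axis=0)
    sigma = (2 * radius / n) * np.sqrt(2 * np.log(1.25 / delta)) / epsilon
    return mean + rng.normal(scale=sigma, size=d)
```

Provided the radius is large enough that truncation rarely binds, the truncation bias is negligible and the total error is the statistical error plus Gaussian noise of order radius/(nε), matching the shape of the bound in Theorem 3.3.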
Theorem 3.3.
For an i.i.d. sample with satisfying , is a differentially private procedure, and:

if there exists a constant such that , when ,

otherwise, with the choice of for a sufficiently large constant ,
By comparing with Theorem 2.2, we see that the noisy truncated sample mean achieves the optimal rate of convergence up to a factor of .
4 Privacy Cost of Linear Regression
In this section, we investigate the cost of differential privacy in linear regression, with primary focus on the high-dimensional setting, where the dimension can exceed the sample size and the regression coefficient is assumed to be sparse; the classical, low-dimensional case will also be covered. Through the general lower bound technique described in Section 2, we establish minimax lower bounds that match the rates of our differentially private procedures up to logarithmic factors.
4.1 Lower bound for high-dimensional linear regression
For high-dimensional sparse linear regression, we consider the following distribution space
where the parameter of interest is defined such that is the best linear approximation of , and is a generic constant. For brevity, we use to denote .
Let denote an i.i.d. sample drawn from some . We propose the tracing adversary
where , and is a fresh independent sample with covariates .
This adversary satisfies the following properties:
Lemma 4.1.1.
Suppose that , then when , there is a distribution such that


, where is an adjacent dataset of with replaced by .
The proof of this lemma, which appears in the supplementary material, includes a novel generalization of the fingerprinting lemma (see [35], [7], and [18]) to Gaussian random variables, which may be of independent interest.
We note that the extra assumption in Lemma 4.1.1 can be gained “for free”: when it fails to hold, there is an automatic lower bound. On the other hand, when the assumption holds, the general lower bound result in Theorem 2.1 is applicable, and we obtain the following lower bound result.
Theorem 4.1.
Let denote the collection of all differentially private algorithms, and suppose the dataset consists of i.i.d. entries drawn from . Suppose that , for some , and , then
4.2 Upper bound for high-dimensional linear regression
For high-dimensional sparse linear regression, we propose the following differentially private LASSO algorithm, which splits the sample of size into subsamples of size and iterates through the subsamples by a truncated gradient descent with random perturbation.
Algorithm 4: Differentially Private LASSO
1:Inputs: privacy parameters , design matrix , response vector , step size , sparsity tuning parameter , truncation tuning parameter , and the number of iterations .
2:Randomly split into subsets of size each.
3:Initialize the algorithm with an sparse vector .
4:for do
5: , where denotes projection onto the ball of radius in .
6: = Peeling
7:end for
8:Output .
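The structure of Algorithm 4 might be sketched as follows: batched gradient steps, an ℓ₂ truncation of the iterate, and a noisy top-s hard-thresholding step in place of the full Peeling subroutine. The batching, step size, and noise scale below are illustrative stand-ins for the paper's carefully accounted choices:

```python
import numpy as np

def dp_sparse_regression(X, y, s, eta, radius, epsilon, T, rng):
    """Sketch of a private iterative hard thresholding scheme: split the
    sample into T batches; on each batch take a gradient step from the
    current s-sparse iterate, project the iterate onto an l2 ball, and
    keep a noisy top-s of its coordinates. Noise scale is a crude
    stand-in for the paper's privacy accounting."""
    n, d = X.shape
    batches = np.array_split(rng.permutation(n), T)
    beta = np.zeros(d)
    for idx in batches:
        Xb, yb = X[idx], y[idx]
        grad = Xb.T @ (Xb @ beta - yb) / len(idx)   # least-squares gradient
        beta = beta - eta * grad
        nb = np.linalg.norm(beta)
        if nb > radius:                             # l2 truncation step
            beta *= radius / nb
        # Noisy top-s selection (a stand-in for the Peeling subroutine).
        noisy = np.abs(beta) + rng.laplace(scale=radius / (epsilon * len(idx)), size=d)
        top = np.argsort(noisy)[-s:]
        hard = np.zeros(d)
        hard[top] = beta[top]
        beta = hard
    return beta
```

The hard-thresholding step keeps every iterate s-sparse, which is what confines the noise cost to the support size rather than the ambient dimension, mirroring the role of the private truncation step described above.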
Theorem 4.2.
Let be an i.i.d. sample drawn from . If we have that

, the covariance matrix of , satisfies for some constant ,

for some , and

the tuning parameters satisfy for a sufficiently large constant , , and for , it holds that
then is differentially private, and it holds with high probability that
To the best of our knowledge, this is the first differentially private LASSO algorithm with parameter estimation consistency guarantees. In addition, in view of Theorem 4.1, the proposed algorithm achieves the optimal rate of convergence up to a logarithmic term.
4.3 Linear regression in the classical setting
In the classical linear regression problem, we have i.i.d. observations drawn from some that belongs to the distribution space
where the parameter of interest is defined such that is the best linear approximation of , and is a generic constant.
To apply Theorem 2.1 to derive the lower bound for the linear regression model, we consider the following tracing adversary:
where is a fresh independent draw with the same covariates as .
The next lemma summarizes the soundness and completeness properties of the tracing adversary.
Lemma 4.3.1.
If and , there is a distribution , such that


, where is an adjacent dataset of with replaced by .
As in the high-dimensional setting, the extra assumption in this lemma can be gained “for free”.
Our minimax lower bound for private linear regression in the classical setting is presented in the theorem below:
Theorem 4.3.
Let denote the collection of all differentially private algorithms, and suppose that , for some and , then
Similar to the other lower bound results, the two terms in this minimax lower bound correspond to the statistical risk and the risk due to privacy constraint respectively.
4.3.1 Differentially private algorithms in the classical setting
In the classical setting of , the optimal rate of convergence for differentially private linear regression can be achieved directly by perturbing the OLS estimator with suitably calibrated noise.
Let denote the OLS estimator; we consider the noisy estimator
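A sketch of this perturbed-OLS idea. The sensitivity bound, encoded here via a hypothetical `clip` parameter, and the Gaussian noise scale are illustrative placeholders: a real calibration must bound how much the OLS map can change when one observation is replaced, e.g. by truncating covariates and responses, as the paper does:

```python
import numpy as np

def noisy_ols(X, y, epsilon, delta, clip, rng):
    """Perturbed OLS sketch: compute the least-squares solution and add
    Gaussian noise. The 'clip' parameter is a hypothetical stand-in for
    a sensitivity bound on the OLS map; this is not the paper's exact
    construction."""
    n, d = X.shape
    beta_hat = np.linalg.lstsq(X, y, rcond=None)[0]
    sigma = clip * np.sqrt(2 * np.log(1.25 / delta)) / (n * epsilon)
    return beta_hat + rng.normal(scale=sigma, size=d)
```

With noise of order 1/(nε) per coordinate, the privacy cost is lower-order relative to the statistical error in the regime of Theorem 4.3.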