I Introduction
Differential privacy, introduced by Dwork et al. (2006b), is a framework for quantifying to what extent individual privacy in a statistical dataset is preserved while releasing useful aggregate information about the dataset. Differential privacy provides strong privacy guarantees by requiring the near-indistinguishability of whether or not an individual is in the dataset, based on the released information. For more motivation and background on differential privacy, we refer the reader to the survey by Dwork (2008) and the book by Dwork and Roth (2014).
The classic notion of differential privacy is ε-differential privacy, which imposes an upper bound on the multiplicative distance between the probability distributions of the randomized query outputs for any two neighboring datasets; the standard approach to preserving ε-differential privacy is to add Laplacian noise to the query output. Since its introduction, differential privacy has spawned a large body of research on differentially private data-releasing mechanism design, and the noise-adding mechanism has been applied in many machine learning algorithms to preserve differential privacy, e.g., logistic regression
(Chaudhuri and Monteleoni, 2008), empirical risk minimization (Chaudhuri et al., 2011), online learning (Jain et al., 2012), statistical risk minimization (Duchi et al., 2012), deep learning (Shokri and Shmatikov, 2015; Abadi et al., 2016; Phan et al., 2016; Agarwal et al., 2018), hypothesis testing (Sheffet, 2018), matrix completion (Jain et al., 2018), expectation maximization (Park et al., 2017), and principal component analysis
(Chaudhuri et al., 2012; Ge et al., 2018). To fully make use of the randomized query outputs, it is important to understand the fundamental tradeoff between privacy and utility (accuracy). Ghosh et al. (2009) studied a very general utility-maximization framework for a single count query with sensitivity one under ε-differential privacy. Gupte and Sundararajan (2010) derived the optimal noise probability distributions for a single count query with sensitivity one for minimax (risk-averse) users. Geng and Viswanath (2016b) derived the optimal ε-differentially private noise-adding mechanism for a single real-valued query function with arbitrary query sensitivity, and showed that the optimal noise distribution has a staircase-shaped probability density function.
Geng et al. (2015) generalized the result of Geng and Viswanath (2016b) to a two-dimensional query output space for the ℓ1 cost function, and showed the optimality of a two-dimensional staircase-shaped probability density function. Soria-Comas and Domingo-Ferrer (2013) independently derived the staircase-shaped noise probability distribution under a different optimization framework. A relaxed notion of ε-differential privacy is (ε, δ)-differential privacy, introduced by Dwork et al. (2006a). The common interpretation of (ε, δ)-differential privacy is that it is ε-differential privacy "except with probability δ" (Mironov, 2017). The standard approach to preserving (ε, δ)-differential privacy is the Gaussian mechanism, which adds Gaussian noise to the query output. Geng and Viswanath (2016a) studied the tradeoff between utility and privacy for a single integer-valued query function under (ε, δ)-differential privacy, and showed that for ℓ1 and ℓ2 cost functions, the discrete uniform noise distribution is optimal for (0, δ)-differential privacy when the query sensitivity is one, and is asymptotically optimal as the query sensitivity Δ → ∞. Balle and Wang (2018) improved the classic analysis of the Gaussian mechanism for (ε, δ)-differential privacy in the high privacy regime (ε → 0), and developed an optimal Gaussian mechanism whose variance is calibrated directly using the Gaussian cumulative distribution function instead of a tail bound approximation.
I-A Our Contributions
In this work, we study the fundamental tradeoff between privacy and utility in (0, δ)-differential privacy for a single real-valued query function. Within the class of query-output independent noise-adding mechanisms, we derive the optimal noise distribution for (0, δ)-differential privacy under a general cost-minimization framework similar to those of Ghosh et al. (2009); Geng and Viswanath (2016b); Geng et al. (2015); Geng and Viswanath (2016a). Under a mild technical condition on the noise probability distribution¹
¹In this paper we assume that the noise probability distribution places more probability on smaller noise values than on larger ones. This condition is satisfied by a large class of probability distributions, including the uniform distribution, the Laplacian distribution, and the Gaussian distribution. While the optimality result in this paper depends on this assumption, we believe the assumption can be removed.
, we show that the optimal noise probability distribution is a uniform distribution with a probability mass at the origin, which can be viewed as the distribution of the product of a uniform random variable and a Bernoulli random variable. The probability mass at the origin can be zero or nonzero, depending on the value of δ. We explicitly derive the optimal noise distribution for general cost functions, including the ℓ1 (noise magnitude) and ℓ2 (noise power) cost functions, and show when the probability concentration at the origin occurs. Compared with the improved Gaussian mechanism for (ε, δ)-differential privacy (Balle and Wang, 2018), our result demonstrates a twofold and threefold improvement in the high privacy regime in the context of minimizing the noise magnitude and the noise power, respectively. The improvement is more pronounced in the low privacy regime.
Comparing the exact optimality results under ε-differential privacy and under (0, δ)-differential privacy, we show that, given the same amount of privacy constraint, (0, δ)-differential privacy yields a higher utility than ε-differential privacy in the high privacy regime.
Our result is consistent with the existing result for (0, δ)-differential privacy in the discrete setting (Geng and Viswanath, 2016a), which shows that the discrete uniform distribution is optimal for an integer-valued query function when the query sensitivity is one, and asymptotically optimal as Δ → ∞ for general query sensitivity. Interestingly, our result identifies a probability concentration phenomenon in the continuous setting for a single real-valued query function.
I-B Organization
The paper is organized as follows. In Section II, we give some preliminaries on differential privacy, and formulate the tradeoff between privacy and utility under (0, δ)-differential privacy for a single real-valued query function as a functional optimization problem. Section III presents the optimal noise probability distribution preserving (0, δ)-differential privacy, subject to a mild technical condition. Section IV applies our main result to a class of moment cost functions, and derives the explicit forms of the optimal noise probability distributions with minimum noise magnitude and minimum noise power, respectively. Section V compares our result with the improved Gaussian mechanism in the context of minimizing the noise magnitude and the noise power.
II Problem Formulation
In this section, we first give some preliminaries on differential privacy, and then formulate the tradeoff between privacy and utility under (0, δ)-differential privacy for a single real-valued query function as a functional optimization problem.
II-A Background on Differential Privacy
Consider a real-valued query function q : D → ℝ,
where D is the set of all possible datasets. The real-valued query function q will be applied to a dataset, and the query output is a real number. Two datasets D1, D2 ∈ D are called neighboring datasets if they differ in at most one element, i.e., one is a proper subset of the other and the larger dataset contains just one additional element (Dwork, 2008). A randomized query-answering mechanism K for the query function q will randomly output a number with a probability distribution that depends on the query output q(D), where D is the dataset.
Definition 1 (ε-differential privacy (Dwork, 2008)).
A randomized mechanism K gives ε-differential privacy if for all datasets D1 and D2 differing on at most one element, and for all measurable sets S ⊆ ℝ,
Pr[K(D1) ∈ S] ≤ e^ε Pr[K(D2) ∈ S], (1)
where K(D) is the random output of the mechanism K when the query function q is applied to the dataset D.
The ε-differential privacy constraint (1) imposes an upper bound e^ε on the multiplicative distance between the two probability distributions. It essentially requires that, for all neighboring datasets, the probability distributions of the output of the randomized mechanism should be approximately the same. Therefore, for any individual record, its presence or absence in the dataset will not significantly affect the output of the mechanism, which makes it hard for adversaries with arbitrary background knowledge to infer information about any individual from the released query output. The parameter ε quantifies how private the mechanism is: the smaller ε is, the more private the randomized mechanism is.
The standard approach to preserving ε-differential privacy is to perturb the query output by adding random noise with a Laplacian distribution whose scale is proportional to the sensitivity Δ of the query function q, where the sensitivity of a real-valued query function is defined as follows:
Definition 2 (Query Sensitivity (Dwork, 2008)).
For a real-valued query function q, the sensitivity of q is defined as
Δ := max |q(D1) − q(D2)|,
where the maximum is taken over all datasets D1, D2 differing in at most one element.
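The Laplace mechanism described above can be sketched in a few lines. This is a generic illustration of the standard ε-differential privacy baseline, not the mechanism derived in this paper; the function and parameter names are ours.

```python
import math
import random

def laplace_mechanism(query_output, sensitivity, epsilon, rng=random):
    """Standard Laplace mechanism for epsilon-differential privacy: perturb the
    query output with zero-mean Laplace noise of scale sensitivity/epsilon."""
    scale = sensitivity / epsilon
    # Inverse-CDF sampling of a Laplace(0, scale) random variable.
    u = rng.random() - 0.5                       # uniform on [-0.5, 0.5)
    noise = -scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))
    return query_output + noise
```

A smaller ε yields a larger scale Δ/ε and hence noisier, more private outputs.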
Introduced by Dwork et al. (2006a), (ε, δ)-differential privacy is a relaxed version of ε-differential privacy, which relaxes the constraint (1) with an additive term δ.
Definition 3 ((ε, δ)-differential privacy (Dwork et al., 2006a)).
A randomized mechanism K gives (ε, δ)-differential privacy if for all datasets D1 and D2 differing on at most one element, and for all measurable sets S ⊆ ℝ,
Pr[K(D1) ∈ S] ≤ e^ε Pr[K(D2) ∈ S] + δ.
In the special case where ε = 0, the constraint for (0, δ)-differential privacy is
Pr[K(D1) ∈ S] ≤ Pr[K(D2) ∈ S] + δ. (2)
It is easy to see that (0, δ)-differential privacy imposes an upper bound δ on the additive distance between the two probability distributions.
II-B (0, δ)-Differential Privacy Constraint on the Noise Probability Distribution
A standard approach for preserving (0, δ)-differential privacy is the class of query-output independent noise-adding mechanisms, where random noise is added to the query output. Given a dataset D, a query-output independent noise-adding mechanism K releases the query output q(D) corrupted by an additive random noise X with probability distribution P:
K(D) = q(D) + X. (3)
The (0, δ)-differential privacy constraint (2) on the mechanism (3) is that for any d such that |d| ≤ Δ (corresponding to the query outputs for two neighboring datasets),
P(S) ≤ P(S + d) + δ for all measurable sets S ⊆ ℝ,
where S + d is defined as the set {s + d | s ∈ S}.
Equivalently, the (0, δ)-differential privacy constraint on the noise probability distribution P is
P(S) ≤ P(S + d) + δ, for all measurable sets S ⊆ ℝ and all d with |d| ≤ Δ. (4)
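Constraint (4) can be checked numerically for a noise density given on a uniform grid: for a fixed shift d, the supremum of P(S) − P(S + d) over all sets S is attained by S = {x : f(x) > f(x + d)}. The sketch below (function name and grid setup are ours) evaluates this worst case for a uniform density of width Δ/δ, which meets the constraint with equality.

```python
import numpy as np

def additive_dp_slack(f, dx, shift_bins):
    """Worst-case P(S) - P(S + d) over all measurable S, for a density f given
    on a uniform grid with spacing dx, where d = shift_bins * dx.  The density
    is taken to be zero outside the grid."""
    g = np.roll(f, -shift_bins)          # g[j] approximates f(x_j + d)
    if shift_bins > 0:
        g[-shift_bins:] = 0.0            # undo np.roll's wraparound at the edge
    return float(np.sum(np.maximum(f - g, 0.0)) * dx)

# Uniform noise on [-a/2, a/2] with a = Delta/delta: the worst-case slack over
# shifts |d| <= Delta equals delta, so constraint (4) holds with equality.
Delta, delta, dx = 1.0, 0.1, 0.01
a = Delta / delta
f = np.full(int(round(a / dx)), 1.0 / a)          # uniform density on the grid
slack = additive_dp_slack(f, dx, int(round(Delta / dx)))
```

Widening the uniform support beyond Δ/δ would strictly decrease the slack, at the price of a larger expected noise cost.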
II-C Utility Model
Consider a cost function L(·) : ℝ → ℝ, which is a function of the additive noise in the query-output noise-adding mechanism. Given an additive noise x, the cost is L(x), and thus the expected cost under the noise probability distribution P is
∫_{x ∈ ℝ} L(x) P(dx). (5)
Our objective is to minimize the expected cost over all noise probability distributions P that preserve (0, δ)-differential privacy.
II-D Optimization Problem
Combining the utility model (5) with the privacy constraint (4), the tradeoff between privacy and utility under (0, δ)-differential privacy is the functional optimization problem
minimize over P: ∫_{x ∈ ℝ} L(x) P(dx) (6)
subject to P(S) ≤ P(S + d) + δ, for all measurable S ⊆ ℝ and all |d| ≤ Δ. (7)
III Main Result
In this section, we solve the functional optimization problem (6) for (0, δ)-differential privacy, and present our main result in Theorem 2. Under a mild technical condition on the probability distribution (see Property 2), we show that the optimal noise probability distribution is a uniform distribution with a probability mass at the origin, which can be viewed as the distribution of the product of a uniform random variable and a Bernoulli random variable.
We assume that the cost function satisfies a natural property.
Property 1.
L(·) is a symmetric function, and monotonically increasing for x ≥ 0, i.e., L(·) satisfies
L(x) = L(−x) for all x ∈ ℝ,
and
L(x1) ≤ L(x2) for all 0 ≤ x1 ≤ x2.
First, we show that without loss of generality, we only need to consider symmetric noise probability distributions.
Lemma 1. Given any noise probability distribution P satisfying the (0, δ)-differential privacy constraint (4), there exists a symmetric noise probability distribution P′ that satisfies (4) with an expected cost no larger than that of P.
Proof.
Define P′ as follows: for any measurable set S ⊆ ℝ,
P′(S) := (P(S) + P(−S))/2, where −S := {−s | s ∈ S}.
It is easy to see that P′ is a symmetric probability distribution. As the loss function L(·) is symmetric, we have ∫ L(x) P′(dx) = ∫ L(x) P(dx). Next we show that P′ also satisfies the (0, δ)-differential privacy constraint. Indeed, for all measurable S and all d such that |d| ≤ Δ, we have
P′(S) = (P(S) + P(−S))/2 ≤ (P(S + d) + δ + P(−S − d) + δ)/2 = P′(S + d) + δ.
∎
Due to Lemma 1, we can restrict ourselves to symmetric noise probability distributions.
As the loss function is monotonically increasing in the noise magnitude, we impose a mild and natural condition on the symmetric noise probability distribution, which requires the noise probability distribution to place more probability on small noise than on large noise. More precisely:
Property 2.
Given a symmetric probability measure P, we say P is monotonically decreasing if, for any 0 ≤ x1 ≤ x2 and any interval length ℓ > 0, P([x1, x1 + ℓ]) ≥ P([x2, x2 + ℓ]).
Property 2 is satisfied by a large class of probability distributions, including the uniform distribution, the Laplacian distribution and the Gaussian distribution.
Let 𝒫 denote the set of symmetric probability measures that are monotonically decreasing.
Lemma 2.
Given P ∈ 𝒫, for any x ≠ 0 we have P({x}) = 0, i.e., P cannot have a nonzero probability mass at any single point except the origin 0.
Proof.
Suppose there exists x ≠ 0 such that P({x}) = c > 0. Since P is symmetric, we can assume x > 0. Since P is monotonically decreasing, P({y}) ≥ c for any y ∈ (0, x), which implies P((0, x)) = +∞. This contradicts the fact that P is a probability measure. ∎
Within the class of monotonically decreasing probability distributions, we identify a necessary and sufficient condition for preserving (0, δ)-differential privacy.
Theorem 1.
Given P ∈ 𝒫, P satisfies the (0, δ)-differential privacy constraint (7) if and only if P([−Δ/2, Δ/2]) ≤ δ.
Proof.
First we show that it is a necessary condition. Assume P satisfies the (0, δ)-differential privacy constraint (7). Consider S = [−Δ/2, +∞) and d = Δ in (7), and we have
P([−Δ/2, +∞)) ≤ P([Δ/2, +∞)) + δ,
and thus P([−Δ/2, Δ/2)) ≤ δ. Due to Lemma 2, P({Δ/2}) = 0, and therefore P([−Δ/2, Δ/2]) ≤ δ.
Next we show that it is a sufficient condition. Assume P([−Δ/2, Δ/2]) ≤ δ. As P is symmetric and the (0, δ)-differential privacy constraint (7) applies to all measurable subsets S and all d such that |d| ≤ Δ, it is equivalent to show that P(S) − P(S + d) ≤ δ for all measurable S and all 0 ≤ d ≤ Δ.
Since P is symmetric and monotonically decreasing, it is easy to see that P(S) − P(S + d) is maximized when S = [−d/2, +∞). Therefore, for all measurable S and all 0 ≤ d ≤ Δ,
P(S) − P(S + d) ≤ P([−d/2, +∞)) − P([d/2, +∞)) = P([−d/2, d/2)) ≤ P([−Δ/2, Δ/2]) ≤ δ.
This concludes the proof of Theorem 1. ∎
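The characterization in Theorem 1 can be sanity-checked numerically: for a symmetric, monotonically decreasing density, the worst-case additive gap sup over S of P(S) − P(S + Δ) should equal the central-interval mass P([−Δ/2, Δ/2]). A self-contained check for a standard Gaussian density with Δ = 1 (the grid discretization is ours):

```python
import numpy as np
from math import erf, sqrt, pi

dx = 1e-3
x = np.arange(-8.0, 8.0, dx)
f = np.exp(-x ** 2 / 2.0) / sqrt(2.0 * pi)   # standard Gaussian density

shift = int(round(1.0 / dx))                 # shift by Delta = 1
g = np.roll(f, -shift)                       # g[j] approximates f(x_j + 1)
g[-shift:] = 0.0                             # density is zero outside the grid
# Worst-case P(S) - P(S + Delta), attained by S = {x : f(x) > f(x + Delta)}.
gap = float(np.sum(np.maximum(f - g, 0.0)) * dx)

# Mass of the central interval [-1/2, 1/2]: 2*Phi(1/2) - 1 = erf(1/(2*sqrt(2))).
central = erf(0.5 / sqrt(2.0))
```

Up to discretization error, `gap` and `central` agree, consistent with the worst-case set being [−Δ/2, +∞).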
Consider a class of probability distributions {P_a} parameterized by a ∈ [Δ/δ, +∞), where the probability mass at the origin is defined as
P_a({0}) := (aδ − Δ)/(a − Δ),
and, except at the point 0, P_a has a uniform probability distribution over the set [−a/2, a/2] with probability density (1 − δ)/(a − Δ) (see Figure 1).
Let 𝒫_δ denote the set of all symmetric and monotonically decreasing probability distributions satisfying the (0, δ)-differential privacy constraint (7). Our main result on the optimal noise probability distribution is:
Theorem 2.
If the cost function L(·) satisfies Property 1, then for any δ ∈ (0, 1) and any P ∈ 𝒫_δ,
inf over a ∈ [Δ/δ, +∞) of ∫_{x ∈ ℝ} L(x) P_a(dx) ≤ ∫_{x ∈ ℝ} L(x) P(dx),
i.e., the minimum-cost noise distribution can be found within the family {P_a}.
Proof.
First note that for any a ∈ [Δ/δ, +∞), P_a is symmetric and monotonically decreasing, and
P_a([−Δ/2, Δ/2]) = P_a({0}) + Δ(1 − δ)/(a − Δ) = δ,
so by Theorem 1, P_a ∈ 𝒫_δ.
Applying a similar argument as in Lemma 20 of Geng and Viswanath (2016b), we can use a sequence of symmetric, piecewise-constant probability density functions with a probability mass at the origin to approximate any P ∈ 𝒫_δ (see Figure 2). More precisely, given a probability distribution P ∈ 𝒫_δ, which may have a nonzero probability mass at 0, for each positive integer n, define the probability distribution P_n as follows:
P_n({0}) := P({0}),
and over the set ℝ \ {0}, P_n has a symmetric, piecewise-constant probability density function that assigns to each interval [(i − 1)Δ/(2n), iΔ/(2n)) the same probability as P.
It is easy to see that P_n is symmetric and monotonically decreasing with P_n([−Δ/2, Δ/2]) = P([−Δ/2, Δ/2]) ≤ δ, and thus, due to Theorem 1, P_n ∈ 𝒫_δ. Due to the definition of the Riemann–Stieltjes integral, we have ∫ L(x) P_n(dx) → ∫ L(x) P(dx) as n → ∞.
Therefore, we only need to consider probability distributions with a probability mass at the origin and a symmetric, monotonically decreasing, piecewise-constant probability density function on ℝ \ {0}.
First we show that, without loss of generality, we can assume the probability density function outside [−Δ/2, Δ/2] is a step function, i.e., there exists a such that the probability density function is constant in [Δ/2, a/2] (and, symmetrically, in [−a/2, −Δ/2]) and is zero in (a/2, +∞). Indeed, we can rearrange the probability distribution outside [−Δ/2, Δ/2] to make the probability density uniform within a certain interval, with the probability density the same as the first bucket outside [−Δ/2, Δ/2] (see Figure 3). This will not increase the cost, due to the fact that L(·) is a monotonically increasing function on [0, +∞). Since we are not changing the probability distribution in [−Δ/2, Δ/2], due to Theorem 1, the probability distribution after the rearrangement also satisfies the (0, δ)-differential privacy constraint (7).
Then we show that the probability distribution in [−Δ/2, Δ/2] shall be uniform as well, except for the mass at the origin. Indeed, if the distribution in [−Δ/2, Δ/2] is not uniform, we can decrease the probability density over [−Δ/2, Δ/2] to the density at the endpoint Δ/2, and move the extra probability mass to the origin (see Figure 4). Due to the fact that L(·) is a monotonically increasing function on [0, +∞), this will not increase the cost. As P([−Δ/2, Δ/2]) is unchanged, the new probability distribution satisfies the (0, δ)-differential privacy constraint (7).
In the last step, we show that P([−Δ/2, Δ/2]) should be exactly δ. If it is strictly less than δ, then we can reduce the probability density over the region farthest from the origin and increase the probability mass near the origin to make P([−Δ/2, Δ/2]) = δ. Due to the monotonicity of L(·) and Theorem 1, we conclude that this reduces the cost while preserving the (0, δ)-differential privacy constraint.
This concludes the proof of Theorem 2. ∎
A natural and simple algorithm to generate random noise with probability distribution P_a is given in Algorithm 1.
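Algorithm 1 amounts to the uniform-times-Bernoulli factorization noted above. A minimal sketch under our parameterization (a mass p0 at the origin plus a uniform density on [−a/2, a/2]; the names are ours, not the paper's):

```python
import random

def sample_noise(a, p0, rng=random):
    """One draw from the noise distribution: with probability p0 return 0 (the
    mass at the origin), otherwise a point uniform on [-a/2, a/2].  This equals
    a Uniform[-a/2, a/2] variable times an independent Bernoulli(1 - p0)."""
    if rng.random() < p0:
        return 0.0
    return rng.uniform(-a / 2.0, a / 2.0)
```

When p0 = 0 this reduces to plain uniform noise, the optimal choice in the high privacy regime.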
IV Applications
In this section, we apply our main result, Theorem 2, to derive an explicit expression for the parameter a in the optimal noise probability distribution P_a for the class of moment cost functions L(x) = |x|^m in Theorem 3. Applying Theorem 3 to the cases m = 1 and m = 2, we get the optimal noise probability distributions with minimum noise amplitude and minimum noise power for (0, δ)-differential privacy, respectively.
Let C(a) := ∫_{x ∈ ℝ} L(x) P_a(dx), i.e., C(a) denotes the expected cost under the noise probability distribution P_a for the cost function L(·).
Theorem 3.
Given δ ∈ (0, 1) and the query sensitivity Δ. For the general moment cost function L(x) = |x|^m, where m ≥ 1, the optimal noise probability distribution to preserve (0, δ)-differential privacy with query sensitivity Δ is P_{a*} with
a* = max(Δ/δ, (m + 1)Δ/m),
and the minimum cost is
(Δ/(2δ))^m / (m + 1) if δ < m/(m + 1), and (1 − δ)((m + 1)Δ/(2m))^m if δ ≥ m/(m + 1).
Proof.
It is easy to see that the moment cost function L(x) = |x|^m satisfies Property 1. We can compute the cost via
C(a) = ∫_{x ∈ ℝ} |x|^m P_a(dx) = (1 − δ)/(a − Δ) · 2 ∫_0^{a/2} x^m dx = (1 − δ) a^{m+1} / (2^m (m + 1)(a − Δ)).
As C(a) is a continuous function of a over [Δ/δ, +∞), and C(a) → +∞ as a → +∞, the minimum of C(a) is achieved either at the boundary point a = Δ/δ or at a point where the derivative C′(a) = 0.
Compute the derivative of C(a) via
C′(a) = (1 − δ) a^m (m a − (m + 1)Δ) / (2^m (m + 1)(a − Δ)²).
Set C′(a) = 0 and we get a = (m + 1)Δ/m.
It is easy to calculate that
C(Δ/δ) = (Δ/(2δ))^m / (m + 1) and C((m + 1)Δ/m) = (1 − δ)((m + 1)Δ/(2m))^m.
It is easy to see that when δ < m/(m + 1), the point where the derivative is zero satisfies (m + 1)Δ/m < Δ/δ and thus lies outside the domain [Δ/δ, +∞); since C′(a) > 0 for a > (m + 1)Δ/m, the minimum is achieved at a = Δ/δ. When δ ≥ m/(m + 1), we have (m + 1)Δ/m ≥ Δ/δ, and C((m + 1)Δ/m) ≤ C(Δ/δ). Indeed,
(1 − δ)((m + 1)Δ/(2m))^m ≤ (Δ/(2δ))^m / (m + 1), (8)
where (8) holds as it is equivalent to (1 − δ) δ^m (m + 1)^{m+1} ≤ m^m, and the left-hand side is decreasing in δ for δ ≥ m/(m + 1), with equality at δ = m/(m + 1).
Therefore, when δ ≥ m/(m + 1), the minimum of C(a) is achieved at a = (m + 1)Δ/m.
In conclusion, for the |x|^m cost function, the optimal a is
a* = Δ/δ if δ < m/(m + 1), and a* = (m + 1)Δ/m if δ ≥ m/(m + 1),
and the minimum cost is
C(a*) = (Δ/(2δ))^m / (m + 1) if δ < m/(m + 1), and C(a*) = (1 − δ)((m + 1)Δ/(2m))^m if δ ≥ m/(m + 1).
∎
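The case analysis above can be cross-checked numerically. The expression below is our reconstruction of C(a), assuming P_a has mass (aδ − Δ)/(a − Δ) at the origin and density (1 − δ)/(a − Δ) on [−a/2, a/2]; a grid search over a ≥ Δ/δ should recover the stationary point (m + 1)Δ/m when δ ≥ m/(m + 1), and the boundary Δ/δ otherwise.

```python
def cost(a, m, delta, Delta):
    """Expected |x|**m cost of P_a: the uniform part contributes
    (1 - delta)/(a - Delta) * 2 * integral_0^{a/2} x**m dx, and the mass
    at the origin contributes nothing."""
    return (1.0 - delta) * a ** (m + 1) / (2 ** m * (m + 1) * (a - Delta))

def best_a(m, delta, Delta, n=100000):
    """Grid search for the cost-minimizing a over [Delta/delta, Delta/delta + 10*Delta]."""
    lo = Delta / delta
    grid = [lo + i * (10.0 * Delta) / n for i in range(n + 1)]
    return min(grid, key=lambda a: cost(a, m, delta, Delta))
```

For example, with Δ = 1: for m = 1 and δ = 0.75 the search returns a ≈ 2 (the stationary point), while for m = 1 and δ = 0.25 it returns the boundary a = 4.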
Applying Theorem 3 to the cases where m = 1 and m = 2, we derive the optimal noise probability distributions for (0, δ)-differential privacy with minimum noise amplitude and minimum noise power, respectively.
Corollary 4 (Optimal Noise Amplitude).
Given δ ∈ (0, 1) and the query sensitivity Δ, to minimize the expected amplitude of the noise (i.e., the ℓ1 cost function L(x) = |x|), the optimal noise probability distribution is P_{a*} with
a* = max(Δ/δ, 2Δ),
and the minimum expected noise amplitude is
Δ/(4δ) if δ ≤ 1/2, and (1 − δ)Δ otherwise. (9)
Corollary 5 (Optimal Noise Power).
Given δ ∈ (0, 1) and the query sensitivity Δ, to minimize the expected power of the noise (i.e., the ℓ2 cost function L(x) = x²), the optimal noise probability distribution is P_{a*} with
a* = max(Δ/δ, 3Δ/2),
and the minimum expected noise power is
Δ²/(12δ²) if δ ≤ 2/3, and (9/16)(1 − δ)Δ² otherwise. (10)
V Comparison to Gaussian Mechanism
In this section, we compare our result with the classic Gaussian mechanism, which adds Gaussian noise to preserve (ε, δ)-differential privacy. We show a twofold and threefold improvement over the improved Gaussian mechanism of Balle and Wang (2018) in the high privacy regime for minimizing the noise magnitude and the noise power, respectively, and we show that the gain is more pronounced in the low privacy regime.
A classic result on the Gaussian mechanism is that for any ε, δ ∈ (0, 1), adding Gaussian noise with standard deviation σ = √(2 ln(1.25/δ)) Δ/ε preserves (ε, δ)-differential privacy (Dwork and Roth, 2014). This result does not apply to (0, δ)-differential privacy, as it would require σ to be +∞ when ε = 0. For (0, δ)-differential privacy, Balle and Wang (2018) developed an optimal Gaussian mechanism whose variance is calibrated directly using the Gaussian cumulative distribution function instead of a tail bound approximation. Balle and Wang (2018) show the following:
Theorem 6 (Theorem 2 in Balle and Wang (2018)).
A Gaussian output perturbation mechanism with noise N(0, σ²) preserves (ε, δ)-differential privacy if and only if
Φ(Δ/(2σ) − εσ/Δ) − e^ε Φ(−Δ/(2σ) − εσ/Δ) ≤ δ,
where Φ is the standard Gaussian cumulative distribution function. In particular, for ε = 0 the condition reduces to 2Φ(Δ/(2σ)) − 1 ≤ δ.
It is easy to see that the Gaussian noise distribution N(0, σ²) has an expected noise amplitude σ√(2/π) and an expected noise power σ².
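The ε = 0 calibration can be reproduced numerically. A sketch under our reading of the condition above (the smallest σ with 2Φ(Δ/(2σ)) − 1 ≤ δ, found by bisection; function names and the chosen δ are ours):

```python
from math import erf, sqrt, pi

def std_normal_cdf(x):
    """Standard Gaussian CDF via the error function."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def gaussian_sigma(delta, Delta):
    """Smallest sigma with 2*Phi(Delta/(2*sigma)) - 1 <= delta, i.e. the
    epsilon = 0 Gaussian calibration, solved by bisection."""
    lo, hi = 1e-9 * Delta, 1e9 * Delta
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if 2.0 * std_normal_cdf(Delta / (2.0 * mid)) - 1.0 > delta:
            lo = mid          # constraint violated: need more noise
        else:
            hi = mid
    return hi

# Expected amplitude of the calibrated Gaussian vs. the optimal mechanism's
# Delta/(4*delta) from (9), in the high privacy regime.
delta, Delta = 0.05, 1.0
gauss_amplitude = gaussian_sigma(delta, Delta) * sqrt(2.0 / pi)
optimal_amplitude = Delta / (4.0 * delta)
```

At δ = 0.05 the calibrated Gaussian's expected amplitude is noticeably larger than the optimal Δ/(4δ), illustrating the gain discussed below.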
[Tables comparing the expected noise amplitude and the expected noise power of the optimal Gaussian mechanism and of the optimal mechanism derived in this work.]
For comparison, our results (9) and (10) show that the minimum expected noise magnitude and noise power are Δ/(4δ) and Δ²/(12δ²) in the medium/high privacy regime (δ ≤ 1/2 and δ ≤ 2/3, respectively). Therefore, our result shows a twofold and threefold multiplicative gain over the improved Gaussian mechanism of Balle and Wang (2018) for (0, δ)-differential privacy in the high privacy regime for minimizing the noise magnitude and the noise power, respectively.
In the low privacy regime, the gap is more pronounced: as δ → 1, the cost of the optimal noise from (9) and (10) converges to zero proportionally to 1 − δ, while the cost of the Gaussian mechanism vanishes far more slowly.
We plot the ratio of the optimal noise magnitude to the Gaussian mechanism's noise magnitude in Fig. 5, and the ratio of the optimal noise power to the Gaussian mechanism's noise power in Fig. 6. We conclude that the optimal (0, δ)-differentially private mechanism derived in this work reduces the noise magnitude and the noise power by factors of two and three in the high privacy regime, and the improvement is more pronounced in the low privacy regime.
References
 Abadi et al. (2016) Martin Abadi, Andy Chu, Ian Goodfellow, H. Brendan McMahan, Ilya Mironov, Kunal Talwar, and Li Zhang. Deep learning with differential privacy. In Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security, CCS ’16, pages 308–318. ACM, 2016.
 Agarwal et al. (2018) Naman Agarwal, Ananda Theertha Suresh, Felix Xinnan Yu, Sanjiv Kumar, and Brendan McMahan. cpSGD: Communication-efficient and differentially-private distributed SGD. In Advances in Neural Information Processing Systems. 2018.
 Balle and Wang (2018) Borja Balle and YuXiang Wang. Improving the Gaussian mechanism for differential privacy: Analytical calibration and optimal denoising. In Proceedings of the 35th International Conference on Machine Learning, volume 80 of Proceedings of Machine Learning Research, pages 394–403. PMLR, 2018.
 Chaudhuri and Monteleoni (2008) Kamalika Chaudhuri and Claire Monteleoni. Privacy-preserving logistic regression. In Neural Information Processing Systems, pages 289–296, 2008.
 Chaudhuri et al. (2011) Kamalika Chaudhuri, Claire Monteleoni, and Anand D. Sarwate. Differentially private empirical risk minimization. Journal of Machine Learning Research, 12:1069–1109, 2011.
 Chaudhuri et al. (2012) Kamalika Chaudhuri, Anand Sarwate, and Kaushik Sinha. Near-optimal differentially private principal components. In Advances in Neural Information Processing Systems 25, pages 989–997. 2012.
 Duchi et al. (2012) John Duchi, Michael Jordan, and Martin Wainwright. Privacy aware learning. In Advances in Neural Information Processing Systems, pages 1430–1438, 2012.
 Dwork (2008) Cynthia Dwork. Differential Privacy: A Survey of Results. In Theory and Applications of Models of Computation, volume 4978, pages 1–19, 2008.
 Dwork and Roth (2014) Cynthia Dwork and Aaron Roth. The algorithmic foundations of differential privacy. Foundations and Trends in Theoretical Computer Science, 9(3–4):211–407, 2014.
 Dwork et al. (2006a) Cynthia Dwork, Krishnaram Kenthapadi, Frank McSherry, Ilya Mironov, and Moni Naor. Our data, ourselves: privacy via distributed noise generation. In Proceedings of the 24th Annual International Conference on the Theory and Applications of Cryptographic Techniques, EUROCRYPT’06, pages 486–503. Springer-Verlag, 2006a.
 Dwork et al. (2006b) Cynthia Dwork, Frank McSherry, Kobbi Nissim, and Adam Smith. Calibrating noise to sensitivity in private data analysis. In Theory of Cryptography, volume 3876 of Lecture Notes in Computer Science, pages 265–284. Springer Berlin / Heidelberg, 2006b.

 Ge et al. (2018) Jason Ge, Zhaoran Wang, Mengdi Wang, and Han Liu. Minimax-optimal privacy-preserving sparse PCA in distributed systems. In Proceedings of the Twenty-First International Conference on Artificial Intelligence and Statistics, volume 84 of Proceedings of Machine Learning Research, pages 1589–1598. PMLR, 09–11 Apr 2018.
 Geng and Viswanath (2016a) Quan Geng and Pramod Viswanath. Optimal noise adding mechanisms for approximate differential privacy. IEEE Transactions on Information Theory, 62(2):952–969, Feb 2016a.
 Geng and Viswanath (2016b) Quan Geng and Pramod Viswanath. The optimal noise-adding mechanism in differential privacy. IEEE Transactions on Information Theory, 62(2):925–951, Feb 2016b.
 Geng et al. (2015) Quan Geng, Peter Kairouz, Sewoong Oh, and Pramod Viswanath. The staircase mechanism in differential privacy. IEEE Journal of Selected Topics in Signal Processing, 9(7):1176–1184, Oct 2015.

 Ghosh et al. (2009) Arpita Ghosh, Tim Roughgarden, and Mukund Sundararajan. Universally utility-maximizing privacy mechanisms. In Proceedings of the 41st Annual ACM Symposium on Theory of Computing, STOC ’09, pages 351–360. ACM, 2009.
 Gupte and Sundararajan (2010) Mangesh Gupte and Mukund Sundararajan. Universally optimal privacy mechanisms for minimax agents. In Symposium on Principles of Database Systems, pages 135–146, 2010.
 Jain et al. (2012) Prateek Jain, Pravesh Kothari, and Abhradeep Thakurta. Differentially private online learning. In Proceedings of the 25th Annual Conference on Learning Theory, volume 23 of Proceedings of Machine Learning Research, pages 24.1–24.34. PMLR, 25–27 Jun 2012.
 Jain et al. (2018) Prateek Jain, Om Dipakbhai Thakkar, and Abhradeep Thakurta. Differentially private matrix completion revisited. In Proceedings of the 35th International Conference on Machine Learning, volume 80 of Proceedings of Machine Learning Research, pages 2215–2224. PMLR, 10–15 Jul 2018.
 Mironov (2017) Ilya Mironov. Rényi differential privacy. In 2017 IEEE 30th Computer Security Foundations Symposium (CSF), pages 263–275, Aug. 2017.
 Park et al. (2017) Mijung Park, James Foulds, Kamalika Chaudhuri, and Max Welling. DP-EM: Differentially private expectation maximization. In Proceedings of the 20th International Conference on Artificial Intelligence and Statistics, volume 54 of Proceedings of Machine Learning Research, pages 896–904. PMLR, 2017.
 Phan et al. (2016) NhatHai Phan, Yue Wang, Xintao Wu, and Dejing Dou. Differential privacy preservation for deep auto-encoders: an application of human behavior prediction. In AAAI, 2016.
 Sheffet (2018) Or Sheffet. Locally private hypothesis testing. In Proceedings of the 35th International Conference on Machine Learning, volume 80 of Proceedings of Machine Learning Research, pages 4605–4614. PMLR, 10–15 Jul 2018.
 Shokri and Shmatikov (2015) Reza Shokri and Vitaly Shmatikov. Privacy-preserving deep learning. In Proceedings of the 22nd ACM SIGSAC Conference on Computer and Communications Security, CCS ’15, pages 1310–1321. ACM, 2015.
 Soria-Comas and Domingo-Ferrer (2013) Jordi Soria-Comas and Josep Domingo-Ferrer. Optimal data-independent noise for differential privacy. Information Sciences, 250:200–214, 2013.