Smoothed Differential Privacy

07/04/2021
by   Ao Liu, et al.
Rensselaer Polytechnic Institute

Differential privacy (DP) is a widely-accepted and widely-applied notion of privacy based on worst-case analysis. Often, DP classifies most mechanisms without external noise as non-private [Dwork et al., 2014], and external noise, such as Gaussian or Laplacian noise [Dwork et al., 2006], is introduced to improve privacy. In many real-world applications, however, adding external noise is undesirable and sometimes prohibited. For example, presidential elections often require a deterministic rule to be used [Liu et al., 2020], and even small noise can dramatically decrease the prediction accuracy of deep neural networks, especially for underrepresented classes [Bagdasaryan et al., 2019]. In this paper, we propose a natural extension and relaxation of DP following the worst average-case idea behind the celebrated smoothed analysis [Spielman and Teng, 2004]. Our notion, smoothed DP, can effectively measure the privacy leakage of mechanisms without external noise under realistic settings. We prove several strong properties of smoothed DP, including composability and robustness to post-processing. We prove that any discrete mechanism with sampling procedures is more private than what DP predicts, whereas many continuous mechanisms with sampling procedures are still non-private under smoothed DP. Experimentally, we first verify that discrete sampling mechanisms are private in real-world elections. We then apply the smoothed DP notion to quantized gradient descent, which indicates that some neural networks can be private without adding any extra noise. We believe that these results contribute to the theoretical foundation of realistic privacy measures beyond worst-case analysis.


1 Introduction

Differential privacy (DP) is a widely-used and widely-accepted notion of privacy, which serves as a de facto measure of privacy in academia and industry. DP is often achieved by adding external noise to published information (Dwork et al., 2014). However, external noise is procedurally or practically unacceptable in many real-world applications. For example, presidential elections often require a deterministic rule to be used (Liu et al., 2020). Notice that even under a deterministic mechanism (voting rule), the overall election procedure is intrinsically randomized due to internal noise, as illustrated in the following example.

A Motivating Example.

Due to COVID-19, many voters in the 2020 US presidential election chose to submit their votes by mail. Unfortunately, it was estimated that the US postal service might have lost up to 300,000 mail-in ballots (a small fraction of all votes) (Bogage and Ingraham, 2020). For the purpose of illustration, suppose these votes are distributed uniformly at random, and the histogram of votes is announced after election day. Should publishing the histogram be viewed as a significant threat to voters' privacy?

According to DP, publishing the histogram poses a significant threat to privacy, because in a worst-case scenario where all votes are for the Republican candidate except one for the Democratic candidate, that one agent's vote is non-private: the published histogram reveals it (Dwork et al., 2014). More precisely, the privacy parameter ε is much worse than the threshold for private mechanisms (where n is the number of agents; see Section 2 for the formal definition of ε). Moreover, in this (worst) case, the utility of the adversary is large (see Section 2 for the formal definition of utility), which means the adversary can make accurate predictions about every agent's preferences. Nevertheless, the worst case has never happened, even approximately, in the modern history of US presidential elections. In fact, no candidate has received more than 70% of the vote since 1920 (Leip, 2021), when the Progressive Party dissolved; since then, no party other than the Democrats and Republicans has been a major force (Wikipedia contributors, 2021).

Suppose that a fixed fraction of the votes were randomly lost in the presidential election of each year since 1920; we present the adversary's utility of predicting the unknown votes in Figure 1. It can be seen that the adversary has very limited utility (always smaller than the threshold for private mechanisms), which means that the adversary cannot learn much from the published histogram of votes. We also observe an interesting decreasing trend, which implies that the elections are becoming more private. This is mostly due to the growth of the voting population, to which the adversary's utility is exponentially related (Theorem 2; notice that the y-axis is in log scale). In Appendix A, we show that the elections remain private even when only a much smaller fraction of the votes is lost.

Figure 1: The privacy level of US presidential elections
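To make the setting concrete, the following sketch simulates the mechanism behind this example: lose a random subset of ballots, publish the histogram of the rest, and compare the published counts for two databases that differ in a single vote. The vote totals and loss rate are made up for illustration and are not the real election numbers.

```python
import random
from collections import Counter

def publish_histogram(votes, lost_fraction):
    """Publish the histogram of the ballots that were NOT lost (ballots are lost uniformly at random)."""
    kept = random.sample(votes, int(round(len(votes) * (1 - lost_fraction))))
    return Counter(kept)

votes = ["D"] * 52_000 + ["R"] * 48_000     # hypothetical two-party race
neighbor = ["R"] + votes[1:]                # one voter's ballot is changed

print("original :", publish_histogram(votes, lost_fraction=0.003))
print("neighbor :", publish_histogram(neighbor, lost_fraction=0.003))
```

The two published histograms typically differ by far more than one vote purely because of the random loss, which is exactly why the single unknown ballot is hard to infer.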

As another example, for deep neural networks (DNNs), even adding slight noise can lead to dramatic decreases in prediction accuracy, especially for underrepresented classes (Bagdasaryan et al., 2019). Internal noise also widely exists in machine learning, for example, in the standard practice of cross-validation as well as in training (e.g., batch sampling when training DNNs).

As shown in these examples, the worst-case privacy according to DP might be too strong to serve as a practical measure for evaluating and comparing mechanisms without external noises in real-world applications. This motivates us to ask the following question.

How can we measure privacy for mechanisms without external noise under realistic models?

The question is highly challenging, as the answer should be less stringent than the worst-case analysis in DP. Average-case analysis is a natural candidate, but since "all models are wrong" (Box, 1979), any privacy measure designed for a certain distribution over data may not work well for other distributions. Moreover, the new measure should ideally satisfy the desirable properties that played a central role in the success of DP, including composability and robustness to post-processing. These properties make it easier for mechanism designers to figure out the privacy level of mechanisms.

We believe that smoothed analysis (Spielman, 2005) provides a promising framework for addressing this question. Smoothed analysis is an extension and combination of worst-case and average-case analyses that inherits the advantages of both. It measures the expected performance of algorithms under slight random perturbations of worst-case inputs. Compared with average-case analysis, the assumptions of smoothed analysis are much more natural. Compared with worst-case analysis, smoothed analysis better describes the real-world performance of algorithms. For example, it successfully explained why some algorithms with exponential worst-case complexity (e.g., the simplex algorithm) are faster in practice than some polynomial-time algorithms.

Our Contributions. The main merit of this paper is a new notion of privacy for mechanisms without external noise, called smoothed differential privacy (smoothed DP, or SDP for short), which applies smoothed analysis to the privacy parameter ε_{M,δ}(X) (Definition 2) as a function of the database X. In our model, the "ground truth" distribution of each agent comes from a set Π of distributions over data points, on top of which nature adds random noise. Formally, our smoothed analysis of ε is defined as

sup_{(π_1, …, π_n) ∈ Π^n} E_{X ∼ (π_1, …, π_n)} [ ε_{M,δ}(X) ],

where X ∼ (π_1, …, π_n) means that for every i ≤ n, the i-th entry in the database X follows the distribution π_i.

Theoretically, we prove that smoothed DP satisfies many desirable properties, including two properties also satisfied by standard DP: robustness to post-processing (Proposition 2) and composability (Proposition 3). Beyond that, we prove two properties unique to smoothed DP, called pre-processing (Proposition 4) and distribution reduction (Proposition 5), which make it easier for mechanism designers to figure out the privacy level when the set of distributions Π is hard to estimate. Under smoothed DP, we find that many discrete mechanisms without external noise (and with small internal noise) are significantly more private than guaranteed by DP. For example, the sampling-histogram mechanism in our motivating example has an exponentially small ε (Theorem 2), which implies that the mechanism protects voters' privacy in elections; this is in accordance with the observation on US election data in the motivating example. We also note that the sampling-histogram mechanism is widely used in machine learning (e.g., SGD in quantized DNNs). In comparison, smoothed DP implies a similar privacy level as standard DP for many continuous mechanisms. We prove that smoothed DP and standard DP imply the same privacy level for the widely-used sampling-average mechanism when the inputs are continuous (Theorem 3).

Experimentally, we numerically evaluate the privacy level of the sampling-histogram mechanism using US presidential election data. Simulation results show an exponentially small ε, which is in accordance with our Theorem 2. Our second experiment shows that a one-step stochastic gradient descent (SGD) update in quantized DNNs (Banner et al., 2018; Hubara et al., 2017) also has an exponentially small ε. This result implies that SGD with gradient quantization can already be private without adding any external noise. In comparison, the standard DP notion always requires extra (external) noise to make the network private, at the cost of a significant reduction in accuracy.

Related Work and Discussions. There is a large body of literature on the theory and practice of DP and its extensions. We believe that the smoothed DP introduced in this paper is novel. To the best of our knowledge, it is most similar to distributional DP (Bassily et al., 2013), which measures privacy given the adversary's (probabilistic) belief about the data he/she is interested in. Our smoothed DP is different both conceptually and technically. The adversary in distributional DP only has probabilistic information about the database and is much weaker than the smoothed DP adversary, who has complete information. Technically, distributional DP considers randomness in both the mechanism and the adversary's belief about the database, while smoothed DP only considers the randomness in the dataset (generated by nature). We prove that smoothed DP serves as an upper bound to distributional DP (Proposition 1).

Rényi DP (Mironov, 2017), Gaussian DP (Dong et al., 2019) and concentrated DP (Bun and Steinke, 2016; Dwork and Rothblum, 2016) aim to provide tighter privacy bounds for adaptive mechanisms; these three notions generalize the measure of distance between distributions to other divergences. Bayesian DP (Triastcyn and Faltings, 2020) tries to provide an "affordable" measure of privacy that requires less external noise than DP. With similar objectives, Bun and Steinke (2019) add noise according to the average sensitivity instead of the worst-case sensitivity required by DP. However, external noise is still required in (Bun and Steinke, 2019) and (Triastcyn and Faltings, 2020).

Quantized neural networks (Marchesi et al., 1993; Tang and Kwan, 1993; Balzer et al., 1991; Fiesler et al., 1990) were initially designed to make hardware implementations of DNNs easier. In the recent decade, quantized neural networks have become a research hotspot again owing to their growing applications on mobile devices (Hubara et al., 2016, 2017; Guo, 2018). In quantized neural networks, the weights (Anwar et al., 2015; Kim and Smaragdis, 2016; Zhou et al., 2017; Lin et al., 2016, 2017), activation functions (Vanhoucke et al., 2011; Courbariaux et al., 2015; Rastegari et al., 2016; Mishra et al., 2018) and/or gradients (Seide et al., 2014; Banner et al., 2018; Alistarh et al., 2017; Du et al., 2020) are quantized. When the gradients are quantized, both the training and inference of DNNs are accelerated (Banner et al., 2018; Zhu et al., 2020). Gradient quantization can also reduce the communication cost when DNNs are trained on distributed systems (Guo, 2018).

Smoothed analysis (Spielman and Teng, 2004) is a widely-accepted analysis tool in machine learning (Kalai et al., 2009; Manthey and Röglin, 2009), computational social choice (Xia, 2020; Baumeister et al., 2020; Xia, 2021; Liu and Xia, 2021), and other areas (Brunsch et al., 2013; Bhaskara et al., 2014; Blum and Dunagan, 2002). In the differential privacy literature, smoothed analysis is a widely-accepted tool for calculating the sensitivity of mechanisms under realistic settings (Nissim et al., 2007; Bun and Steinke, 2019). The analysis of sensitivity plays a central role in the procedure of adding external noise (usually Laplacian or Gaussian). However, the above-mentioned smoothed analysis of sensitivity has many pitfalls in real-world applications (Steinke and Ullman, 2020). We also note that even with smoothed analysis of the sensitivity, external noise is still required for private mechanisms under DP.

2 Differential Privacy and Its Interpretations

In this paper, we use n to denote the number of records (entries) in a database X, and 𝒳 denotes the set of all possible records. n also represents the number of agents when each individual can contribute only one record. Let d(X, X′) denote the number of differing records (the distance) between databases X and X′. We say that two databases are neighboring if they contain no more than one differing entry.

Definition 1 (Differential privacy).

Let M denote a randomized algorithm and S be a subset of the image space of M. M is said to be (ε, δ)-differentially private for some ε, δ ≥ 0 if, for any S and any pair of inputs X and X′ such that d(X, X′) ≤ 1,

Pr[M(X) ∈ S] ≤ e^ε · Pr[M(X′) ∈ S] + δ.

Notice that the randomness comes from the mechanism M, while the worst case is taken over the databases X and X′.
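For contrast with the noise-free mechanisms studied later in the paper, the usual way to satisfy Definition 1 is to add external noise calibrated to the query's sensitivity. Below is a minimal sketch of the Laplace mechanism for a counting query; the records, predicate and ε value are illustrative.

```python
import random

def laplace_count(database, predicate, epsilon):
    """Release a count with Laplace noise of scale 1/epsilon (a counting query has sensitivity 1)."""
    true_count = sum(1 for record in database if predicate(record))
    # Difference of two independent Exp(epsilon) draws is Laplace(0, 1/epsilon).
    noise = random.expovariate(epsilon) - random.expovariate(epsilon)
    return true_count + noise

database = [23, 41, 35, 62, 57, 19, 44]     # made-up records (e.g., ages)
print(laplace_count(database, predicate=lambda age: age >= 40, epsilon=0.5))
```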

DP guarantees immunity to many kinds of attacks (e.g., linkage attacks (Nguyen et al., 2013) and reconstruction attacks (Dwork et al., 2014)). Take reconstruction attacks for example, where the adversary has access to a subset of the database (such information may come from public databases, social media, etc.). In an extreme situation, an adversary knows all but one agent's records. To protect the data of every single agent, DP commonly requires δ to be sufficiently small relative to 1/n (Page 18 of Dwork et al. (2014)). Next, we recall two common views on how DP helps protect privacy even in the extreme situation of reconstruction attacks.

View 1: DP guarantees that the adversary's prediction cannot be too accurate (Wasserman and Zhou, 2010; Kairouz et al., 2015). Assume that the adversary knows all entries except the i-th. Let X_{−i} denote the database with its i-th entry removed. With the information provided by the output of M, the adversary can infer the missing entry by testing the following two hypotheses:

H_0: The missing entry is x_i (or equivalently, the database is X).

H_1: The missing entry is x_i′ (or equivalently, the database is X′).

Suppose that after observing the output of M, the adversary uses a rejection-region rule for hypothesis testing (the adversary can use any decision rule; the rejection-region rule is adopted just as an example), where H_0 is rejected if and only if the output is in the rejection region S. For any fixed S, the decision rule can be wrong in two possible ways: false positive (Type I error) and false negative (Type II error). The Type I error rate is α = Pr[M(X) ∈ S], and the Type II error rate is β = Pr[M(X′) ∉ S]. According to the definition of DP, for any neighboring X and X′, α and β cannot both be small. When ε and δ are both small, both α and β become close to 1/2 (the error rates of a random guess), which means the adversary cannot get much information from the output of M.
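For concreteness, the standard form of this trade-off for an (ε, δ)-DP mechanism is shown below, together with one worked numeric instance; the parameter values in the example are illustrative and not taken from the paper.

```latex
% Type I/II error trade-off implied by (\epsilon,\delta)-DP
% (cf. Wasserman and Zhou, 2010; Kairouz et al., 2015).
\[
  \alpha + e^{\epsilon}\beta \;\ge\; 1-\delta,
  \qquad
  \beta + e^{\epsilon}\alpha \;\ge\; 1-\delta .
\]
% Worked instance with illustrative values: for \epsilon = 0.1 and \delta = 10^{-6},
% forcing \alpha = 0.05 gives
% \beta \ge e^{-0.1}\,(1 - 10^{-6} - 0.05) \approx 0.86,
% so at least one of the two error rates stays close to that of a random guess.
```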

View 2: With probability at least 1 − δ, the output of M is insensitive to the change of one record (Dwork et al., 2014).
In more detail, (ε, δ)-DP guarantees that the distribution of M's output will not change significantly when one record changes. Here, "change" corresponds to adding, removing or replacing one record of the database. Mathematically, Page 18 of Dwork et al. (2014) shows that given any pair of neighboring databases X and X′,

Pr_{y ∼ M(X)} [ | ln( Pr[M(X) = y] / Pr[M(X′) = y] ) | ≤ ε ] ≥ 1 − δ,

where the probability (the probability notion used in (Dwork et al., 2014) is different from the standard definition of probability; see Appendix B for formal descriptions) is taken over y ∼ M(X), the output of M. The above inequality shows that the change of one record cannot make an output significantly more likely or significantly less likely (with probability at least 1 − δ). Dwork et al. (2014) (Page 25) also claims that the above formula guarantees the adversary cannot learn too much information about any single record of the database by observing the output of M.

3 Smoothed Differential Privacy

Recall that DP is based on worst-case analysis over all possible databases. However, as described in the motivating example, the worst-case privacy sometimes does not represent the overall privacy leakage on real-world databases. In this section, we introduce smoothed DP, which applies smoothed analysis (instead of worst-case analysis) to the privacy parameter ε. All missing proofs of this section can be found in Appendix E.

3.1 The database-wise privacy parameter

We first introduce the database-wise parameter ε_{M,δ}(X), which measures the privacy leakage of a mechanism M when its input is X.

Definition 2 (Database-wise privacy parameter ε_{M,δ}(X)).

Let M denote a randomized mechanism. Given any database X and any δ ≥ 0, define the database-wise privacy parameter ε_{M,δ}(X) as the smallest ε for which the (ε, δ)-DP condition of Definition 1 holds at X and all of its neighboring databases (see the rendering sketched below).
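One possible formal rendering of this definition, assuming (as Theorem 1 below and Appendix B suggest) that it is stated through the δ-approximate max divergence D_∞^δ:

```latex
% A plausible formal rendering of Definition 2, assuming the database-wise parameter
% is expressed via the \delta-approximate max divergence of Appendix B; taking the
% worst case over all databases X recovers standard (\epsilon,\delta)-DP (Theorem 1).
\[
  \epsilon_{\mathcal{M},\delta}(X)
  \;=\;
  \max_{X'\,:\;d(X,X')\le 1}
  \max\!\Big(
    D_{\infty}^{\delta}\!\big(\mathcal{M}(X)\,\big\|\,\mathcal{M}(X')\big),\;
    D_{\infty}^{\delta}\!\big(\mathcal{M}(X')\,\big\|\,\mathcal{M}(X)\big)
  \Big).
\]
```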

In Lemma 8 (Appendix D), we will show that ε_{M,δ}(X) measures the utility of adversaries.

DP as the worst-case analysis of ε_{M,δ}(X). In the next theorem, we show that the worst-case analysis of ε_{M,δ}(X) is equivalent to the standard notion of DP.

Theorem 1 (DP in terms of ε_{M,δ}(X)).

Mechanism M is (ε, δ)-differentially private if and only if max_{X ∈ 𝒳^n} ε_{M,δ}(X) ≤ ε.

Next, we present three views to illustrate that ε_{M,δ}(X) can be seen as a natural bound on the privacy leakage at database X. The first two views are similar to the two common views about DP in Section 2; the third is deferred to Appendix D.

View 1: ε_{M,δ}(X) bounds the adversary's prediction accuracy when the database is X.
We consider the same setting as View 1 of DP. By the same reasoning, for a fixed database X, the Type I and Type II error rates satisfy the analogue of the bound in Section 2 with ε replaced by ε_{M,δ}(X). Then, by the definition of ε_{M,δ}(X), α and β cannot both be small when the database is X.

View 2: With probability at least 1 − δ, the output of M is insensitive to the change of one record.
Given any mechanism M, any δ and any pair of neighboring databases X and X′, the inequality of View 2 in Section 2 holds with ε replaced by ε_{M,δ}(X).

The rigorous claims of this view can be found in Appendix B.

3.2 The formal definition of smoothed DP

With the database-wise privacy parameter defined in the last subsection, we formally define smoothed DP, where the worst-case "ground truth" distribution of every agent is allowed to be any distribution from a set Π, on top of which nature adds random noise to generate the database. Formally, smoothed differential privacy is defined as follows.

Definition 3 (Smoothed DP).

Let Π be a set of distributions over 𝒳. We say M is (ε, δ, Π)-smoothed differentially private if

sup_{(π_1, …, π_n) ∈ Π^n} E_{X ∼ (π_1, …, π_n)} [ ε_{M,δ}(X) ] ≤ ε,

where X ∼ (π_1, …, π_n) means that for every i ≤ n, the i-th entry in the database X follows π_i.

In the following statements, we show that smoothed DP defends against reconstruction attacks in ways similar to DP under realistic settings (a third statement, on Bayesian adversaries, is given in Appendix D).

Smoothed DP guarantees that the adversary's prediction cannot be too accurate under realistic settings. The database-wise privacy parameter ε_{M,δ}(X) bounds the Type I and Type II errors when the input is X; smoothed DP, as a smoothed analysis of ε_{M,δ}(X), therefore bounds the smoothed Type I and Type II errors. Mathematically, an (ε, δ, Π)-smoothed DP mechanism guarantees the corresponding bound on the expected error rates, where the expectation is over X drawn from the worst-case (π_1, …, π_n) ∈ Π^n.

Under realistic settings, the output of M is insensitive to the change of one record with probability at least 1 − δ. Mathematically, an (ε, δ, Π)-smoothed DP mechanism guarantees the analogue of the inequality in View 2 of Section 2, in expectation over X drawn from the worst-case (π_1, …, π_n) ∈ Π^n.

As smoothed DP replaces the worst-case analysis with smoothed analysis, we also view a sufficiently small δ (as in Section 2) as a requirement for private mechanisms under smoothed DP.

4 Properties of Smoothed DP

We first reveal a relationship between smoothed DP, DP, and distributional DP (DDP, see Definition 4 in Appendix C for its formal definition) (Bassily et al., 2013).

Proposition 1 (DP ⇒ Smoothed DP ⇒ DDP).

Given any mechanism M with domain 𝒳^n and any set Π of distributions over 𝒳:
If M is (ε, δ)-DP, then M is also (ε, δ, Π)-smoothed DP.
If M is (ε, δ, Π)-smoothed DP, then M is also (ε, δ, Π)-DDP.

The above proposition shows that DP can guarantee smoothed DP, and smoothed DP can guarantee DDP. The proof and additional discussions about Proposition 1 can be found in Appendix C.

Next, we present four properties of smoothed DP and discuss how they can help mechanism designers figure out the smoothed DP level of mechanisms. We first present the robustness to post-processing property, which says that no function can make a mechanism less private without adding extra knowledge about the database. The post-processing property of smoothed DP can be used to upper bound the privacy level of many mechanisms: a private data-preprocessing step guarantees the privacy of the whole mechanism, and the rest of the mechanism can then be arbitrarily designed. The proofs of all four properties of smoothed DP can be found in Appendix F.

Proposition 2 (Post-processing).

Let M be an (ε, δ, Π)-smoothed DP mechanism. For any function f (which can also be randomized), f ∘ M is also (ε, δ, Π)-smoothed DP.

Then, we introduce the composition theorem for smoothed DP, which bounds the smoothed DP guarantee when two or more mechanisms publish their outputs about the same database.

Proposition 3 (Composition).

Let M_i be an (ε_i, δ_i, Π)-smoothed DP mechanism for every i ∈ {1, …, k}, and define M as M(X) = (M_1(X), …, M_k(X)). Then, M is smoothed DP with parameters that accumulate over the k mechanisms (see the reading sketched below and Appendix F).
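Assuming the composed parameters accumulate exactly as under standard DP's basic composition (the precise statement is in Appendix F), the proposition can be read as follows:

```latex
% Basic-composition reading of Proposition 3, assuming the same sum rule as standard DP;
% the exact composed parameters are the ones stated in Appendix F.
\[
  \mathcal{M}_i \text{ is } (\epsilon_i,\delta_i,\Pi)\text{-smoothed DP for } i=1,\dots,k
  \;\Longrightarrow\;
  \mathcal{M}=(\mathcal{M}_1,\dots,\mathcal{M}_k) \text{ is }
  \Big(\textstyle\sum_{i=1}^{k}\epsilon_i,\ \sum_{i=1}^{k}\delta_i,\ \Pi\Big)\text{-smoothed DP}.
\]
```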

In practice, Π might be hard to characterize accurately. The following proposition introduces the pre-processing property of smoothed DP, which says the distribution of data can be replaced by the distribution of features (extracted by any deterministic function). For example, in deep learning, the distribution of data can be replaced by the distribution of gradients, which is usually much easier to estimate in real-world training processes. More technically, the pre-processing property guarantees that any deterministic data pre-processing step is not harmful to privacy. To simplify notation, we let f(π) be the distribution of f(X) where X ∼ π. For any set of distributions Π, we let f(Π) = {f(π) : π ∈ Π}.

Proposition 4 (Pre-processing for deterministic functions).

Let f be a deterministic function and M be a randomized mechanism. Then, M ∘ f is (ε, δ, Π)-smoothed DP if M is (ε, δ, f(Π))-smoothed DP.

The following proposition shows that any two sets of distributions with the same convex hull lead to the same privacy level under smoothed DP. With this property, mechanism designers can ignore all inner points and only consider the vertices of the convex hull when calculating the privacy level of mechanisms. Let CH(Π) denote the convex hull of Π.

Proposition 5 (Distribution reduction).

Given any ε, δ ≥ 0 and any Π and Π′ such that CH(Π) = CH(Π′), an anonymous mechanism M is (ε, δ, Π)-smoothed DP if and only if M is (ε, δ, Π′)-smoothed DP.
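To illustrate how distribution reduction is used in practice (cf. the election experiment in Section 6), here is a minimal sketch that identifies which members of a set of discrete distributions are vertices of its convex hull; the three-candidate distributions are made up.

```python
# Identify the convex-hull vertices of a set of discrete distributions, so that,
# by Proposition 5, only those vertices need to be considered when computing the
# smoothed DP level. The example distributions are made up for illustration.
import numpy as np
from scipy.spatial import ConvexHull

def hull_vertex_distributions(distributions):
    """Return the members of `distributions` that are vertices of their convex hull."""
    # Each distribution sums to 1, so drop the last coordinate to avoid a
    # degenerate (lower-dimensional) point set before calling ConvexHull.
    points = np.asarray(distributions, dtype=float)[:, :-1]
    hull = ConvexHull(points)
    return [distributions[i] for i in hull.vertices]

Pi = [
    (0.90, 0.05, 0.05),    # hypothetical vote-share distribution
    (0.20, 0.70, 0.10),    # hypothetical vote-share distribution
    (0.50, 0.20, 0.30),    # hypothetical vote-share distribution
    (0.533, 0.317, 0.150)  # an inner point (a mixture of the three above)
]
print(hull_vertex_distributions(Pi))   # the inner point is dropped
```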

5 Use Smoothed DP as a Measure of Privacy

In this section, we use smoothed DP to measure the privacy of some commonly-used mechanisms whose randomness is intrinsic and unavoidable (as opposed to external noise such as Gaussian or Laplacian noise). Our analysis focuses on two widely-used algorithms in which the intrinsic randomness comes from sampling (without replacement). We also compare the privacy levels under smoothed DP and DP. All missing proofs of this section are presented in Appendix G.

5.1 Discrete mechanisms are more private than what DP predicts

In this section, we study the smoothed DP property of the (discrete) sampling-histogram mechanism (SHM), which is widely used as a pre-processing step in many real-world applications such as the training of DNNs. As smoothed DP satisfies post-processing (Proposition 2) and pre-processing (Proposition 4), the smoothed DP of SHM can upper bound the smoothed DP of many mechanisms used in practice that are based on SHM.

SHM first samples data from the database and then outputs the histogram of the samples. Formally, we define the sampling-histogram mechanism in Algorithm 1. Note that we require all data in the database to be chosen from a finite set 𝒳.

Algorithm 1 Sampling-histogram mechanism
1: Inputs: A finite set 𝒳, the number of samples m, and a database X = (x_1, …, x_n) where x_i ∈ 𝒳 for all i
2: Randomly sample m data points from X without replacement; denote the sampled data by y_1, …, y_m
3: Output: The histogram of y_1, …, y_m
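A minimal Python sketch of Algorithm 1; the variable names (database, m, support) are chosen here for illustration, and the mechanism's only randomness is the sampling step.

```python
import random
from collections import Counter

def sampling_histogram(database, m, support):
    """Algorithm 1 sketch: sample m records without replacement and output their histogram."""
    if not all(x in support for x in database):
        raise ValueError("every record must come from the finite support")
    sampled = random.sample(database, m)          # the only randomness: sampling without replacement
    histogram = Counter({x: 0 for x in support})  # keep zero-count bins in the output
    histogram.update(sampled)
    return dict(histogram)

# Illustrative use: 1,000 discrete records over the support {0, 1, 2}.
database = [random.choice([0, 1, 2]) for _ in range(1_000)]
print(sampling_histogram(database, m=900, support={0, 1, 2}))
```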

Smoothed DP of mechanisms based on SHM. The smoothed DP of SHM can be used to upper bound the smoothed DP of the following groups of mechanisms. The first group is deterministic voting rules, as presented in the motivating example in the Introduction; the sampling procedure in SHM mimics the lost votes. The second group is machine learning algorithms based on randomly-sampled training data, such as cross-validation; the (random) selection of the training data corresponds to SHM. Notice that many training algorithms are essentially based on the histogram of the training data (instead of the ordering of data points). Therefore, the overall training procedure can be viewed as SHM plus a post-processing function (the learning algorithm). Consequently, the smoothed DP of SHM can be used to upper bound the smoothed DP of such procedures. The third group is the SGD of DNNs with gradient quantization (Zhu et al., 2020; Banner et al., 2018), where the gradients are rounded to 8 bits in order to accelerate the training and inference of DNNs. The smoothed DP of SHM can be used to bound the privacy leakage in each SGD step, where a batch (a subset of the training set) is first sampled and the gradient is the average of the gradients of the sampled data.

DP vs. Smoothed DP for SHM. We are ready to present the main theorem of this paper, which indicates that SHM is very private under some mild assumptions. We say a distribution π is strictly positive if there exists a positive constant c such that π assigns probability at least c to every element in its support. A set of distributions Π is strictly positive if there exists a positive constant c such that every π ∈ Π is strictly positive (by c). The strictly-positive assumption is often considered mild in elections (Xia, 2020) and discrete machine learning (Laird and Saul, 1994), even though it may not hold for every step of SGD.

Theorem 2 (DP vs. Smoothed DP for SHM).

For any SHM, denoted by M, given any strictly positive set of distributions Π, any finite set 𝒳, and any number of samples, we have:
(Smoothed DP) M is smoothed DP with ε exponentially small in n. (This exponential upper bound on ε also depends on the number of samples and on |𝒳|, the cardinality of 𝒳; in general, fewer samples or a larger 𝒳 results in a larger ε. See Appendix G.1 for detailed discussions.)
(Tightness of the smoothed DP bound) The exponential bound is asymptotically tight: for any asymptotically smaller ε, there does not exist a δ such that M is (ε, δ)-smoothed DP.

(DP) In contrast, under standard DP the privacy parameter is at least a constant: for any ε below that constant, there does not exist a δ such that M is (ε, δ)-DP.

The above theorem says that the privacy leakage is exponentially small under real-world application scenarios. In comparison, DP cares too much about extremely rare cases and predicts a constant privacy leakage. Also, note that our theorem allows the number of samples to be of the same order as n. For example, when the number of samples is a constant fraction of n, SHM is smoothed DP with exponentially small parameters, which is an acceptable privacy level in many real-world applications. We also prove similar bounds for SHM with replacement in Appendix H.

5.2 Smoothed DP predicts a similar privacy level to DP for continuous mechanisms

In this section, we show that sampling mechanisms with continuous support are still not privacy-preserving under smoothed DP. Our result indicates that neural networks without quantized parameters are not private without external noise (i.e., Gaussian or Laplacian noise).

Algorithm 2 Continuous sampling average
1: Inputs: The number of samples m and a database X = (x_1, …, x_n) where each x_i is a real number
2: Randomly sample m data points from X without replacement; denote the sampled data by y_1, …, y_m
3: Output: The average (1/m) Σ_{j=1}^{m} y_j

We use the sampling-average algorithm (Algorithm 2) as the standard algorithm for continuous mechanisms. Because sampling-average can be treated as SHM plus an averaging step, the non-privacy of sampling-average also implies that SHM with continuous support is non-private, by the post-processing property of smoothed DP.
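For comparison, a correspondingly minimal sketch of Algorithm 2 (again, the only randomness is the sampling step); the data values are made up.

```python
import random

def sampling_average(database, m):
    """Algorithm 2 sketch: sample m real-valued records without replacement and output their mean."""
    sampled = random.sample(database, m)
    return sum(sampled) / m

data = [0.31, 1.72, 2.24, 0.95, 1.18, 1.83, 0.47, 2.51]   # made-up continuous records
print(sampling_average(data, m=6))
```

Intuitively, with continuous records the sampled values are essentially unique, so the exact average can betray whether a particular record was included; Theorem 3 below makes this failure formal.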

Theorem 3 (Smoothed DP for continuous sampling average).

For any continuous sampling-average algorithm M, given any set of strictly positive (a distribution π is strictly positive by c > 0 if its PDF is at least c everywhere on its support) distributions over a continuous domain, any n and any number of samples, there does not exist an acceptably small pair (ε, δ) such that M is (ε, δ)-smoothed DP.

6 Experiments

Smoothed DP in elections. We use a similar setting as the motivating example, where a fraction of the votes is randomly lost. We numerically calculate the smoothed DP parameter ε. Here, the set of distributions Π includes the vote distributions of all 57 congressional districts of the 2020 presidential election. Using the distribution-reduction property of smoothed DP (Proposition 5), we can remove all distributions in Π except DC and NE-2 (DC refers to Washington, D.C. and NE-2 refers to Nebraska's 2nd congressional district), which are the vertices of the convex hull of Π. Figure 2 (left) shows that the smoothed parameter ε is exponentially small in n, which matches our Theorem 2. We find that ε is also exponentially small under the other settings we tested, which indicates that the sampling-histogram mechanism is more private than DP predicts under our settings. See Appendix I for experiments with different settings of δ and different ratios of lost votes.

Figure 2: DP and smoothed DP (SDP) under realistic settings. In both plots, the vertical axes are in log scale, and the pink dashed line presents the ε parameter of DP (for any choice of δ). The left plot is an accurate calculation of ε; the shaded area shows the 99% confidence interval (CI) in the right plot.

SGD with 8-bit gradient quantization.

According to the pre-processing property of smoothed DP, the smoothed DP of the (discrete) sampling-average mechanism upper bounds the smoothed DP of SGD (for one step). In 8-bit neural networks for computer-vision tasks, the gradients usually follow Gaussian distributions (Banner et al., 2018). We thus let the set of distributions Π consist of 8-bit quantized Gaussian distributions (see Appendix I for the formal definition). The standard deviation, 0.12, is the same as the standard deviation of gradients in a ResNet-18 network trained on the CIFAR-10 dataset (Banner et al., 2018). We use a standard batch size. Figure 2 (right) shows that the smoothed parameter ε is exponentially small in n for SGD with 8-bit gradient quantization. This result implies that neural networks trained with quantized gradients can be private without adding external noise.
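The sketch below shows the view of one quantized-SGD step used in this experiment: per-example gradients are quantized to 8 bits, a batch is sampled, and only the (discrete) average is released. The Gaussian gradient model and the standard deviation 0.12 follow the setting described above; the batch size, clipping range, and quantizer below are illustrative assumptions, not the paper's exact implementation.

```python
import numpy as np

def quantize_8bit(x, lo=-1.0, hi=1.0):
    """Map values in [lo, hi] onto a uniform 256-level grid (a simple 8-bit quantizer)."""
    levels = 255
    clipped = np.clip(x, lo, hi)
    return np.round((clipped - lo) / (hi - lo) * levels) / levels * (hi - lo) + lo

rng = np.random.default_rng(0)
per_example_grads = rng.normal(loc=0.0, scale=0.12, size=50_000)  # Gaussian gradient model
quantized = quantize_8bit(per_example_grads)                      # discretize before release

batch = rng.choice(quantized, size=256, replace=False)            # batch sampling (no replacement)
print("released batch-average gradient:", batch.mean())
```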

7 Conclusions and Future Works

We propose a novel notion for measuring the privacy leakage of mechanisms without external noise under realistic settings. One promising next step is to apply our smoothed DP notion to the entire training process of quantized DNNs. Is a quantized DNN private without external noise? If not, what level of external noise needs to be added, and how should we add it in an optimal way? More generally, we believe that our work has the potential of making many algorithms private without requiring too much external noise.

References

  • Alistarh et al. [2017] Dan Alistarh, Demjan Grubic, Jerry Li, Ryota Tomioka, and Milan Vojnovic. Qsgd: Communication-efficient sgd via gradient quantization and encoding. Advances in Neural Information Processing Systems, 30:1709–1720, 2017.
  • Anwar et al. [2015] Sajid Anwar, Kyuyeon Hwang, and Wonyong Sung. Fixed point optimization of deep convolutional neural networks for object recognition. In 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 1131–1135. IEEE, 2015.
  • Bagdasaryan et al. [2019] Eugene Bagdasaryan, Omid Poursaeed, and Vitaly Shmatikov. Differential privacy has disparate impact on model accuracy. Advances in Neural Information Processing Systems, 32:15479–15488, 2019.
  • Balzer et al. [1991] Wolfgang Balzer, Masanobu Takahashi, Jun Ohta, and Kazuo Kyuma. Weight quantization in boltzmann machines. Neural Networks, 4(3):405–409, 1991.
  • Banner et al. [2018] Ron Banner, Itay Hubara, Elad Hoffer, and Daniel Soudry. Scalable methods for 8-bit training of neural networks. In Proceedings of the 32nd International Conference on Neural Information Processing Systems, pages 5151–5159, 2018.
  • Bassily et al. [2013] Raef Bassily, Adam Groce, Jonathan Katz, and Adam Smith. Coupled-worlds privacy: Exploiting adversarial uncertainty in statistical data privacy. In 2013 IEEE 54th Annual Symposium on Foundations of Computer Science, pages 439–448. IEEE, 2013.
  • Baumeister et al. [2020] Dorothea Baumeister, Tobias Hogrebe, and Jörg Rothe. Towards reality: Smoothed analysis in computational social choice. In Proceedings of the 19th International Conference on Autonomous Agents and MultiAgent Systems, pages 1691–1695, 2020.
  • Bhaskara et al. [2014] Aditya Bhaskara, Moses Charikar, Ankur Moitra, and Aravindan Vijayaraghavan. Smoothed analysis of tensor decompositions. In Proceedings of the forty-sixth annual ACM symposium on Theory of computing, pages 594–603, 2014.
  • Blum and Dunagan [2002] Avrim Blum and John Dunagan. Smoothed analysis of the perceptron algorithm for linear programming. Carnegie Mellon University, 2002.
  • Bogage and Ingraham [2020] Jacob Bogage and Christopher Ingraham. Usps ballot problems unlikely to change outcomes in competitive states. The Washington Post, 2020.
  • Box [1979] George EP Box. Robustness in the strategy of scientific model building. In Robustness in statistics, pages 201–236. Elsevier, 1979.
  • Brunsch et al. [2013] Tobias Brunsch, Kamiel Cornelissen, Bodo Manthey, and Heiko Röglin. Smoothed analysis of belief propagation for minimum-cost flow and matching. In International Workshop on Algorithms and Computation, pages 182–193. Springer, 2013.
  • Bun and Steinke [2016] Mark Bun and Thomas Steinke. Concentrated differential privacy: Simplifications, extensions, and lower bounds. In Theory of Cryptography Conference, pages 635–658. Springer, 2016.
  • Bun and Steinke [2019] Mark Bun and Thomas Steinke. Average-case averages: Private algorithms for smooth sensitivity and mean estimation. Advances in Neural Information Processing Systems, 2019.
  • Courbariaux et al. [2015] Matthieu Courbariaux, Yoshua Bengio, and Jean-Pierre David. Binaryconnect: training deep neural networks with binary weights during propagations. In Proceedings of the 28th International Conference on Neural Information Processing Systems-Volume 2, pages 3123–3131, 2015.
  • Dong et al. [2019] Jinshuo Dong, Aaron Roth, and Weijie J Su. Gaussian differential privacy. arXiv preprint arXiv:1905.02383, 2019.
  • Du et al. [2020] Yuqing Du, Sheng Yang, and Kaibin Huang. High-dimensional stochastic gradient quantization for communication-efficient edge learning. IEEE Transactions on Signal Processing, 68:2128–2142, 2020.
  • Dwork and Rothblum [2016] Cynthia Dwork and Guy N Rothblum. Concentrated differential privacy. arXiv preprint arXiv:1603.01887, 2016.
  • Dwork et al. [2006] Cynthia Dwork, Krishnaram Kenthapadi, Frank McSherry, Ilya Mironov, and Moni Naor. Our data, ourselves: Privacy via distributed noise generation. In Annual International Conference on the Theory and Applications of Cryptographic Techniques, pages 486–503. Springer, 2006.
  • Dwork et al. [2014] Cynthia Dwork, Aaron Roth, et al. The algorithmic foundations of differential privacy. Foundations and Trends in Theoretical Computer Science, 9(3-4):211–407, 2014.
  • Fiesler et al. [1990] Emile Fiesler, Amar Choudry, and H John Caulfield. Weight discretization paradigm for optical neural networks. In Optical interconnections and networks, volume 1281, pages 164–173. International Society for Optics and Photonics, 1990.
  • Guo [2018] Yunhui Guo. A survey on methods and theories of quantized neural networks. arXiv preprint arXiv:1808.04752, 2018.
  • Hubara et al. [2016] Itay Hubara, Matthieu Courbariaux, Daniel Soudry, Ran El-Yaniv, and Yoshua Bengio. Binarized neural networks. In Proceedings of the 30th International Conference on Neural Information Processing Systems, pages 4114–4122, 2016.
  • Hubara et al. [2017] Itay Hubara, Matthieu Courbariaux, Daniel Soudry, Ran El-Yaniv, and Yoshua Bengio. Quantized neural networks: Training neural networks with low precision weights and activations. The Journal of Machine Learning Research, 18(1):6869–6898, 2017.
  • Kairouz et al. [2015] Peter Kairouz, Sewoong Oh, and Pramod Viswanath. The composition theorem for differential privacy. In International conference on machine learning, pages 1376–1385. PMLR, 2015.
  • Kalai et al. [2009] Adam Tauman Kalai, Alex Samorodnitsky, and Shang-Hua Teng. Learning and smoothed analysis. In 2009 50th Annual IEEE Symposium on Foundations of Computer Science, pages 395–404. IEEE, 2009.
  • Kim and Smaragdis [2016] Minje Kim and Paris Smaragdis. Bitwise neural networks. arXiv preprint arXiv:1601.06071, 2016.
  • Laird and Saul [1994] Philip Laird and Ronald Saul. Discrete sequence prediction and its applications. Machine learning, 15(1):43–68, 1994.
  • Leip [2021] Dave Leip. Dave Leip’s Atlas of the US Presidential Elections. Dave Leip, 2021. https://uselectionatlas.org/2020.php.
  • Lin et al. [2016] Darryl Lin, Sachin Talathi, and Sreekanth Annapureddy. Fixed point quantization of deep convolutional networks. In International conference on machine learning, pages 2849–2858. PMLR, 2016.
  • Lin et al. [2017] Xiaofan Lin, Cong Zhao, and Wei Pan. Towards accurate binary convolutional neural network. In Proceedings of the 31st International Conference on Neural Information Processing Systems, pages 344–352, 2017.
  • Liu and Xia [2021] Ao Liu and Lirong Xia. The smoothed likelihood of doctrinal paradoxes. arXiv preprint arXiv:2105.05138, 2021.
  • Liu et al. [2020] Ao Liu, Yun Lu, Lirong Xia, and Vassilis Zikas. How private are commonly-used voting rules? In Conference on Uncertainty in Artificial Intelligence, pages 629–638. PMLR, 2020.
  • Manthey and Röglin [2009] Bodo Manthey and Heiko Röglin. Worst-case and smoothed analysis of k-means clustering with bregman divergences. In International Symposium on Algorithms and Computation, pages 1024–1033. Springer, 2009.
  • Marchesi et al. [1993] Michele Marchesi, Gianni Orlandi, Francesco Piazza, and Aurelio Uncini. Fast neural networks without multipliers. IEEE transactions on Neural Networks, 4(1):53–62, 1993.
  • Mironov [2017] Ilya Mironov. Rényi differential privacy. In 2017 IEEE 30th Computer Security Foundations Symposium (CSF), pages 263–275. IEEE, 2017.
  • Mishra et al. [2018] Asit Mishra, Eriko Nurvitadhi, Jeffrey J Cook, and Debbie Marr. Wrpn: Wide reduced-precision networks. In International Conference on Learning Representations, 2018.
  • Nguyen et al. [2013] Hiep H Nguyen, Jong Kim, and Yoonho Kim. Differential privacy in practice. Journal of Computing Science and Engineering, 7(3):177–186, 2013.
  • Nissim et al. [2007] Kobbi Nissim, Sofya Raskhodnikova, and Adam Smith. Smooth sensitivity and sampling in private data analysis. In Proceedings of the thirty-ninth annual ACM symposium on Theory of computing, pages 75–84, 2007.
  • Rastegari et al. [2016] Mohammad Rastegari, Vicente Ordonez, Joseph Redmon, and Ali Farhadi. Xnor-net: Imagenet classification using binary convolutional neural networks. In European conference on computer vision, pages 525–542. Springer, 2016.
  • Seide et al. [2014] Frank Seide, Hao Fu, Jasha Droppo, Gang Li, and Dong Yu. 1-bit stochastic gradient descent and its application to data-parallel distributed training of speech dnns. In Fifteenth Annual Conference of the International Speech Communication Association, 2014.
  • Spielman [2005] Daniel A Spielman. The smoothed analysis of algorithms. In International Symposium on Fundamentals of Computation Theory, pages 17–18. Springer, 2005.
  • Spielman and Teng [2004] Daniel A Spielman and Shang-Hua Teng. Smoothed analysis of algorithms: Why the simplex algorithm usually takes polynomial time. Journal of the ACM (JACM), 51(3):385–463, 2004.
  • Steinke and Ullman [2020] Thomas Steinke and Jonathan Ullman. The pitfalls of average-case differential privacy. DifferentialPrivacy.org, 07 2020. https://differentialprivacy.org/average-case-dp/.
  • Tang and Kwan [1993] Chuan Zhang Tang and Hon Keung Kwan. Multilayer feedforward neural networks with single powers-of-two weights. IEEE Transactions on Signal Processing, 41(8):2724–2727, 1993.
  • Triastcyn and Faltings [2020] Aleksei Triastcyn and Boi Faltings. Bayesian differential privacy for machine learning. In International Conference on Machine Learning, pages 9583–9592. PMLR, 2020.
  • Vanhoucke et al. [2011] Vincent Vanhoucke, Andrew Senior, and Mark Z Mao. Improving the speed of neural networks on cpus. In Deep Learning and Unsupervised Feature Learning NIPS Workshop, page 4, 2011.
  • Wasserman and Zhou [2010] Larry Wasserman and Shuheng Zhou. A statistical framework for differential privacy. Journal of the American Statistical Association, 105(489):375–389, 2010.
  • Wikipedia contributors [2021] Wikipedia contributors. Two-party system — Wikipedia, the free encyclopedia. https://en.wikipedia.org/w/index.php?title=Two-party_system&oldid=1023343859, 2021. [Online; accessed 28-May-2021].
  • Xia [2020] Lirong Xia. The Smoothed Possibility of Social Choice. In Proceedings of NeurIPS, 2020.
  • Xia [2021] Lirong Xia. How Likely Are Large Elections Tied? In Proceedings of ACM EC, 2021.
  • Zhou et al. [2017] Aojun Zhou, Anbang Yao, Yiwen Guo, Lin Xu, and Yurong Chen. Incremental network quantization: Towards lossless cnns with low-precision weights. arXiv preprint arXiv:1702.03044, 2017.
  • Zhu et al. [2020] Feng Zhu, Ruihao Gong, Fengwei Yu, Xianglong Liu, Yanfei Wang, Zhelong Li, Xiuqi Yang, and Junjie Yan. Towards unified int8 training for convolutional neural network. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 1969–1979, 2020.


Appendix A Additional Discussions about the Motivating Example

A.1 Detailed setting about Figure 1

In the motivating example, the adversary's utility is computed from the adjusted utility defined above Lemma 8. We set a threshold of accuracy in Figure 1; in other words, the adversary omits the utilities coming from predictors with accuracy no higher than this threshold. To compare the privacy of different years, we assume that the US government only publishes the number of votes for the top-2 candidates. The parameter plotted in Figure 1 follows the database-wise privacy parameter in Definition 2, and the threshold matches the setting of Lemma 8. In all experiments related to our motivating example, we directly calculate the probabilities, so our numerical results do not involve any randomness.

A.2 The motivating example under other settings

Figure 3 presents the privacy level of US presidential elections (the motivating example) under different settings. It shows similar information as Figure 1 under different choices of the accuracy threshold and of the ratio of lost votes. In all settings of Figure 3, the US presidential election is much more private than what DP predicts. Figure 3(d) shows that the US presidential election is also private when only a small fraction of the votes is lost.

Figure 3: The privacy level of US presidential elections under different settings.

Appendix B Detailed explanations about DP and smoothed DP

In this section, we present the formal definition of the "probabilities" used in View 2 of our interpretations of DP and smoothed DP. To simplify notation, we define the δ-approximate max divergence between two random variables Y and Z as

D_∞^δ(Y ‖ Z) = max_{S ⊆ Supp(Y): Pr[Y ∈ S] ≥ δ} ln( (Pr[Y ∈ S] − δ) / Pr[Z ∈ S] ),

where Supp(Y) represents the support of the random variable Y. In particular, we use D_∞(Y ‖ Z) to denote D_∞^0(Y ‖ Z), which is the max divergence. One can see that a mechanism M is (ε, δ)-DP if and only if, for any pair of neighboring databases X and X′, both D_∞^δ(M(X) ‖ M(X′)) ≤ ε and D_∞^δ(M(X′) ‖ M(X)) ≤ ε. The next lemma shows the connection between D_∞^δ and D_∞.

Lemma 6 (Lemma 3.17 in (Dwork et al., 2014)).

D_∞^δ(Y ‖ Z) ≤ ε if and only if there exists a random variable Y′ such that Δ(Y, Y′) ≤ δ and D_∞(Y′ ‖ Z) ≤ ε, where Δ(·, ·) denotes the statistical distance.

In (Dwork et al., 2014), "an event happens with probability at most δ" means that there exists a random variable Y′ such that Δ(Y, Y′) ≤ δ and D_∞(Y′ ‖ Z) ≤ ε. In other words, after modifying the distribution of Y by at most δ of its mass, the probability ratio between the modified distribution and Z is always upper bounded by e^ε. This explains the meaning of "probability" in (Dwork et al., 2014).

Appendix C Relationship between Smoothed DP and other privacy notions

As smoothed DP is the smoothed analysis of ε_{M,δ}(X) while DP is its worst-case analysis, DP ⇒ smoothed DP follows intuitively. In this section, we focus on showing that smoothed DP is an upper bound of DDP. To simplify notation, we let X_{−i} (or (π_1, …, π_{i−1}, π_{i+1}, …, π_n)) denote the database (or the vector of distributions) with only the i-th entry removed.

Definition 4 (Distributional Differential Privacy (DDP)).

A mechanism M is (ε, δ, Π)-distributionally differentially private (DDP) if there is a simulator Sim such that for any (π_1, …, π_n) ∈ Π^n, any i ≤ n, any datum x_i and any subset S of the image space of M, the distribution of M's output (conditioned on the i-th entry being x_i) and the distribution of Sim's output on the database without the i-th entry satisfy the pair of (ε, δ) inequalities of Definition 1.

Proposition 7 (Smoothed DP ⇒ DDP).

Any (ε, δ, Π)-smoothed DP mechanism is always (ε, δ, Π)-DDP.

Proof.

According to the definition of Smoothed DP, we have that

where

Similarly, we may rewrite the DDP conditions as , where

Next, we compare and .

where, for any set of outputs S, the simulator Sim's distribution of outputs is defined as follows,

By symmetry, the same bound holds with the roles of the two distributions exchanged, and the proposition follows by the definition of DDP. ∎

Appendix D An Additional View of Smoothed DP

View 3: ε_{M,δ}(X) tightly bounds Bayesian adversaries' utilities.
We consider the same adversary as in View 1. Since the adversary has no information about the missing entry, he/she may assume a uniform prior distribution over the missing entry. Let x_i denote the missing entry. Observing an output from mechanism M, the adversary forms the posterior distribution over x_i.

A Bayesian predictor predicts the missing entry by maximizing the posterior probability. For the uniform prior, the 0/1 loss of the Bayesian predictor given an output is defined as follows: a correct prediction has zero loss and any incorrect prediction has loss one.

Then, we define the adjusted utility of the adversary (in Bayesian prediction), which is the expectation of a normalized version of this loss. Mathematically, for a database X, we define the adjusted utility with a threshold as follows.

In short, the adjusted utility is the worst-case expectation of the normalized loss, where the contribution from predictors with loss larger than the threshold is omitted. In particular, with a threshold that omits nothing, an always-correct predictor has utility 1 and a random-guess predictor has utility 0. For example, consider a coin-flipping mechanism over a binary support that outputs the true entry with probability p and the flipped entry otherwise. When p = 1, the entry is non-private because the adversary can directly learn it from the output of M; correspondingly, the adjusted utility of the adversary is 1 for any threshold. When p = 1/2, the mechanism gives an output uniformly at random, so the output of M cannot provide any information to the adversary; correspondingly, the adjusted utility of the adversary is 0 for any threshold. In the next lemma, we reveal a connection between the adversary's utility and ε_{M,δ}(X).
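To make the coin-flipping example concrete, here is a small simulation under the reading above (the mechanism reveals the missing bit with probability p and flips it otherwise; the values of p are illustrative):

```python
import random

def coin_flip_mechanism(true_bit, p, rng):
    """Reveal the missing bit with probability p; otherwise output the flipped bit."""
    return true_bit if rng.random() < p else 1 - true_bit

def empirical_accuracy(p, trials=100_000, seed=0):
    rng = random.Random(seed)
    correct = 0
    for _ in range(trials):
        true_bit = rng.randint(0, 1)                     # uniform prior over the missing entry
        observed = coin_flip_mechanism(true_bit, p, rng)
        guess = observed if p >= 0.5 else 1 - observed   # maximum-posterior guess
        correct += (guess == true_bit)
    return correct / trials

for p in (1.0, 0.75, 0.5):
    print(f"p = {p:.2f}: Bayesian predictor accuracy ~ {empirical_accuracy(p):.3f}")
```

The empirical accuracy is approximately p, interpolating between an always-correct predictor (p = 1) and a random guess (p = 1/2).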

Lemma 8.

Given mechanism and any pair of neighboring databases ,

Lemma 8 shows that the adjusted utility is upper bounded in terms of ε_{M,δ}(X). Moreover, we provide both upper and lower bounds on the adjusted utility in Lemma 10 in Appendix E.1, which means that ε_{M,δ}(X) is a good measure of the privacy level of M in that regime. In the following corollary, we show that the smoothed DP parameter ε upper bounds the adjusted utility of the adversary.

Corollary 9.

Given mechanism and any pair of neighboring databases ,

ε in smoothed DP bounds the adversary's utility in (Bayesian) prediction under realistic settings. Following similar reasoning as in View 1, we know that the utility of the adversary under realistic settings (the smoothed utility) cannot be larger than a quantity determined by ε. Mathematically, an (ε, δ, Π)-smoothed DP mechanism guarantees the corresponding bound on the expected adjusted utility.

Appendix E Missing Proofs for Section 3: Smoothed Differential Privacy

To simplify notations, we let .

E.1 Tight bounds on the adjusted utility

Lemma 10.

Given ε and δ satisfying the stated condition, a mechanism M, and any pair of neighboring databases X and X′,

Proof.

To simplify notations, for any output , we let

Then, we define the utility of the adversary when the database is X:

Let x_i be the entry on which X and X′ differ; we have,

To further simplify notation, we define the adjusted utility with threshold for an output as follows.

Note that the threshold is for the utility (not for the accuracy). Then, it is easy to find that

Using the above notations, we have,

Then, we let and analyze the first term,