Differentially Private ERM Based on Data Perturbation

by   Yilin Kang, et al.

In this paper, after observing that different training data instances affect the machine learning model to different extents, we attempt to improve the performance of differentially private empirical risk minimization (DP-ERM) from a new perspective. Specifically, we measure the contribution of each training data instance to the final machine learning model and add random noise only to selected instances. Since the key of our method is to measure each data instance separately, we propose a new ‘data perturbation’ based (DB) paradigm for DP-ERM: adding random noise to the original training data and achieving (ϵ,δ)-differential privacy on the final machine learning model, while also protecting the original data. By introducing the Influence Function (IF), we quantitatively measure the impact of the training data on the final model. Theoretical and experimental results show that our proposed DBDP-ERM paradigm enhances the model performance significantly.




1 Introduction

In recent decades, machine learning has been widely used in fields such as data mining and pattern recognition (Ivezić et al., 2019; Kavakiotis et al., 2017; Fu et al., 2019; Fatima and Pasha, 2017). Although machine learning has been shown to be effective and powerful in learning meaningful patterns directly from data, it faces many problems. One of the most widespread concerns is privacy, as tremendous quantities of data are collected to support ‘data-hungry’ machine learning algorithms. Many machine learning models are trained on crowdsourced data, which often contains sensitive personal information (Yuen et al., 2011; Feng et al., 2017), especially in medical research and financial fraud detection. Without protection of the training data, any leakage of the original data reveals sensitive information about individuals. However, the original data is not the only problem: the model parameters trained on this data can also disclose the privacy of individuals indirectly (Fredrikson et al., 2014, 2015; Shokri et al., 2017).

Differential privacy (DP) (Dwork et al., 2006, 2014) is a theoretically rigorous tool to prevent sensitive information leakage and has become a popular framework to guarantee privacy. It preserves privacy by introducing a certain amount of random noise to the model parameters, in order to block malicious adversaries from inferring any single individual included in the dataset by observing the machine learning model. As such, differential privacy has attracted the attention of many researchers and has been applied to numerous machine learning methods, such as regression (Chaudhuri and Monteleoni, 2009; Smith et al., 2018; Bernstein and Sheldon, 2019), principal component analysis (PCA) (Chaudhuri et al., 2013; Wang and Xu, 2019), Markov chain Monte Carlo (MCMC) (Heikkilä et al., 2019), boosting (Dwork et al., 2010; Zhao et al., 2018), deep learning (Shokri and Shmatikov, 2015; Abadi et al., 2016), generative adversarial networks (GANs) (Xu et al., 2019; Wu et al., 2019), graph algorithms (Ullman and Sealfon, 2019; Arora and Upadhyay, 2019) and other fields.

Empirical risk minimization (ERM), a popular optimization method which covers a variety of machine learning tasks, also faces privacy problems. There is a long list of works focusing on the combination of differential privacy and ERM (Chaudhuri et al., 2011; Kifer et al., 2012; Song et al., 2013; Bassily et al., 2014; Wang et al., 2017; Zhang et al., 2017), in which there are three main approaches of adding noise: output perturbation, objective perturbation and gradient perturbation (i.e. adding random noise to the final model, the objective function and the gradient, respectively).

Besides, input perturbation is an approach that adds random noise to the training data. It is usually connected with local differential privacy (LDP), where it protects individual privacy when communicating with the server (Duchi et al., 2013; Zheng et al., 2017; Fukuchi et al., 2017). Additionally, as an important gradient descent method, SGD has also attracted significant attention for differential privacy (Song et al., 2013). However, the accuracy of DP-SGD methods is often unsatisfactory. Thus, several performance improving methods have been proposed for DP-SGD (Phan et al., 2017; Wu et al., 2017; Wang et al., 2019).

Meanwhile, in previous works, different data instances are always treated the same when considering privacy. However, in real scenarios, different training data instances affect the model in different ways, so treating them all the same lacks ‘common sense’ and may be one of the reasons for the low accuracy of DP algorithms.

Motivated by the definition of differential privacy (which will be clarified in detail in Section 2), in this paper, we provide a new perspective to improve the performance: different data instances are treated differently when adding random noise in the training process. By observing that different data instances make different contributions to the final machine learning model, we set a threshold and add random noise only to selected instances in order to improve the performance. In particular, we follow the ‘common sense’ and only add random noise to those data instances which alter the final model parameters significantly, rather than considering all of them as being the same. To achieve the aim of ‘measuring each data instance separately’, we propose a new ‘data perturbation’ based (DB) paradigm in the field of DP-ERM, guaranteeing differential privacy by perturbing the original training data. Meanwhile, by adding noise to the original data, we also protect the training data to some extent. Additionally, we introduce the Influence Function (IF) into our method to quantitatively measure the contributions made by different data instances.

The rest of the paper is organized as follows. First, we present some preliminaries in Section 2. Then, we propose our Data perturbation based Differentially Private ERM (DBDP-ERM) algorithm in detail in Section 3. In Section 4, we give ‘performance improving’ DBDP-ERM by considering the contribution made by each data instance. In Section 5, we present the experimental results. We introduce some previous work related to our method, such as DP-SGD, in Section 6. Finally, we conclude the paper in Section 7.

2 Preliminaries

In this section, we revisit the definitions of ERM and differential privacy, along with some popular perturbation methods when adding noise.

2.1 Empirical Risk Minimization

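In this part, we do not consider privacy. In general, supposing there are $n$ data instances in the dataset $D$ in total, the objective function of ERM is:

$$L(\theta, D) = \frac{1}{n} \sum_{i=1}^{n} \ell(\theta, z_i), \tag{1}$$

where $z_i \in D$ is a data instance, $\ell(\cdot)$ denotes the loss function and $\theta$ denotes the model parameters.

In ERM, our goal is to find the optimal $\theta$ that minimizes the objective function, which is formally defined as:

$$\arg\min_{\theta} L(\theta, D), \tag{2}$$

and is represented by $\theta^*$.

Assuming $z_i = (x_i, y_i)$, where $x_i$ is the feature and $y_i$ is the label, and considering the case of binary classification, the data space is $\mathcal{X} \subseteq \mathbb{R}^d$, the label set $\mathcal{Y}$ contains two labels, and we assume $\|x_i\|_2 \le 1$, i.e. $\mathcal{X}$ is the unit ball.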
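The ERM objective and its minimization can be made concrete with a small sketch. It assumes the logistic loss, labels encoded as -1 and +1, plain gradient descent and a toy four-point dataset inside the unit ball; none of these choices are fixed by the text, and all function names are illustrative.

```python
import math

def logistic_loss(theta, x, y):
    """Loss of one instance z = (x, y), with y assumed to be -1 or +1."""
    margin = y * sum(t * xi for t, xi in zip(theta, x))
    return math.log(1.0 + math.exp(-margin))

def empirical_risk(theta, data):
    """ERM objective: the average loss over all data instances."""
    return sum(logistic_loss(theta, x, y) for x, y in data) / len(data)

def gradient(theta, data):
    """Gradient of the empirical risk with respect to theta."""
    grad = [0.0] * len(theta)
    for x, y in data:
        margin = y * sum(t * xi for t, xi in zip(theta, x))
        coeff = -y / (1.0 + math.exp(margin))
        for j, xj in enumerate(x):
            grad[j] += coeff * xj / len(data)
    return grad

def minimize(data, dim, lr=0.5, steps=200):
    """Plain (non-private) gradient descent on the empirical risk."""
    theta = [0.0] * dim
    for _ in range(steps):
        g = gradient(theta, data)
        theta = [t - lr * gj for t, gj in zip(theta, g)]
    return theta

# Toy dataset (hypothetical); every feature vector has norm below 1.
data = [([0.6, 0.2], 1), ([0.5, -0.1], 1), ([-0.4, 0.3], -1), ([-0.7, -0.2], -1)]
theta_hat = minimize(data, dim=2)
```

Here `theta_hat` plays the role of the (near) optimal model; no privacy protection is applied at this stage.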

2.2 Differential Privacy

Two databases differing by one single element are denoted as $D \sim D'$ and called adjacent databases. Based on adjacent databases, differential privacy is defined as follows.

Definition 1 (Differential Privacy (Dwork et al., 2014))

A randomized function $\mathcal{A}: \mathcal{D} \rightarrow \mathbb{R}^p$ is $(\epsilon, \delta)$-differentially private if, for all adjacent databases $D \sim D'$,

$$\Pr[\mathcal{A}(D) \in S] \le e^{\epsilon} \Pr[\mathcal{A}(D') \in S] + \delta, \tag{3}$$

where $S \subseteq \mathrm{range}(\mathcal{A})$ and $p$ is the number of parameters.

Differential privacy requires that adjacent datasets lead to similar distributions on the output of a randomized algorithm $\mathcal{A}$. This implies that an adversary cannot infer whether an individual participates in the training process because essentially the same conclusions about an individual will be drawn whether or not that individual’s data was used.
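To see what the definition demands, the following sketch checks the inequality exhaustively for a tiny mechanism. Randomized response on a one-bit database is used purely as an illustration (it is not a method from this paper): the two possible databases, bit 0 and bit 1, are adjacent, and the mechanism reports the true bit with probability e^ϵ / (1 + e^ϵ). The function names are hypothetical.

```python
import math
from itertools import chain, combinations

def randomized_response_dist(bit, epsilon):
    """Output distribution of randomized response on a one-bit database."""
    p_truth = math.exp(epsilon) / (1.0 + math.exp(epsilon))
    return {bit: p_truth, 1 - bit: 1.0 - p_truth}

def satisfies_dp(dist_a, dist_b, epsilon, delta, tol=1e-12):
    """Check P[A(D) in S] <= e^eps * P[A(D') in S] + delta for every
    output set S, in both directions (D and D' are interchangeable)."""
    outputs = sorted(set(dist_a) | set(dist_b))
    subsets = chain.from_iterable(
        combinations(outputs, k) for k in range(len(outputs) + 1)
    )
    for s in subsets:
        pa = sum(dist_a.get(o, 0.0) for o in s)
        pb = sum(dist_b.get(o, 0.0) for o in s)
        if pa > math.exp(epsilon) * pb + delta + tol:
            return False
        if pb > math.exp(epsilon) * pa + delta + tol:
            return False
    return True

d0 = randomized_response_dist(0, epsilon=1.0)
d1 = randomized_response_dist(1, epsilon=1.0)
print(satisfies_dp(d0, d1, epsilon=1.0, delta=0.0))   # budget the mechanism was built for
print(satisfies_dp(d0, d1, epsilon=0.5, delta=0.0))   # a stricter budget
```

The same mechanism passes the check at the budget it was designed for and fails at a stricter one unless δ absorbs the excess, which is the trade-off the pair (ϵ,δ) expresses.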

2.3 Perturbation Methods

There are three main perturbation methods in the field of differential privacy: output perturbation, gradient perturbation and objective perturbation.

Output Perturbation In the method of output perturbation, noise is added directly to the final model (in this paper, we denote a model by its parameters):

$$\theta_{priv} = \theta^* + b, \tag{4}$$

where $\theta^*$ is the trained model and $b$ is the noise guaranteeing differential privacy.

The output perturbation method is commonly used because it is simple to implement: noise is only added to the final model.
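A minimal sketch of output perturbation is given below. It assumes Gaussian noise with a standard deviation `sigma` chosen by hand; in an actual differentially private algorithm, `sigma` must be calibrated to the sensitivity of the training procedure and to the privacy budget, which this sketch does not attempt. The names are illustrative.

```python
import random

def output_perturbation(theta, sigma, rng):
    """Return a noisy copy of the final parameters: theta plus Gaussian noise."""
    return [t + rng.gauss(0.0, sigma) for t in theta]

rng = random.Random(0)
theta_final = [0.8, -0.3, 0.1]          # hypothetical trained parameters
theta_priv = output_perturbation(theta_final, sigma=0.05, rng=rng)
```

Training itself is untouched; only the released parameters are randomized.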

Gradient Perturbation In the gradient perturbation method, noise is added to the gradient when training, which leads a gradient descent process at round $t$ to:

$$\theta_{t+1} = \theta_t - \alpha \left( \nabla L(\theta_t, D) + b \right), \tag{5}$$

where $\theta_t$ is the model at round $t$, $b$ is the noise and $\alpha$ is the learning rate.

The gradient perturbation method is feasible and popular because most machine learning algorithms are based on the gradient descent method.
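One noisy update step can be sketched as follows, again with an uncalibrated, hand-picked `sigma` and hypothetical names. The example objective, half the squared norm of the parameters, is chosen only because its gradient is the parameter vector itself.

```python
import random

def noisy_gradient_step(theta, grad, lr, sigma, rng):
    """One descent step with Gaussian noise added to each gradient coordinate."""
    return [t - lr * (g + rng.gauss(0.0, sigma)) for t, g in zip(theta, grad)]

rng = random.Random(0)
theta = [1.0, -2.0]
for _ in range(20):
    grad = list(theta)                  # gradient of 0.5 * ||theta||^2
    theta = noisy_gradient_step(theta, grad, lr=0.5, sigma=0.01, rng=rng)
```

Because noise enters at every round, the total privacy cost accumulates over the iterations, which is why accounting over rounds matters for this family of methods.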

Objective Perturbation In the method of objective perturbation, noise is added to the objective function:


where is a function of the random noise .

During training, the perturbed objective function is directly optimized:


This method has rarely been used in recent years because it is often difficult to optimize an objective function that contains random variables.

3 Data Perturbation Based Differentially Private ERM

In this section, before introducing our performance improving method, we first introduce our proposed Data perturbation based Differentially Private ERM (DBDP-ERM) method in detail as its foundation, and provide a theoretical analysis of privacy and excess empirical risk.

In this paper, when it comes to data perturbation, we transform to , where is the random noise. The original data is denoted as and the perturbed data is denoted as for clarity. With random noise , the original data is protected.

As a result, the objective function is transformed to:


where denotes the perturbed objective function.

For generality, we discuss the mini-batch method here. Supposing that the sampling probability is and the sampled set is whose size is denoted by , i.e. .

In this way, the updating of model parameters becomes:


We provide our DBDP-ERM method in Algorithm 1.

1:  Input: Dataset , learning rate , iteration rounds , sampling probability
2:  Initialize the model parameters as randomly.
3:  For all data instances (), transform them to .
4:  , where and .
5:  for  to  do
6:     Randomly choose samples for gradient descent, denoted by .
7:     .
8:  end for
9:  .
10:  return ,
Algorithm 1 DBDP-ERM

By applying Algorithm 1, we obtain a private model, which guarantees the differential privacy of the machine learning model, and a perturbed dataset, which protects the original training data.
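The overall flow of Algorithm 1 can be sketched as below. This is a simplified illustration under several assumptions that the text does not confirm: noise is added to the features only and labels are kept, the loss is the logistic loss with labels -1 and +1, each round draws a fixed-size batch instead of sampling with a probability, and `sigma` is a free parameter rather than the noise bound of Theorem 1. All names are hypothetical.

```python
import math
import random

def perturb_dataset(data, sigma, rng):
    """Add Gaussian noise to every feature vector; labels stay unchanged (assumption)."""
    return [([xi + rng.gauss(0.0, sigma) for xi in x], y) for x, y in data]

def logistic_grad(theta, x, y):
    """Gradient of the logistic loss on one instance, with y in {-1, +1}."""
    margin = y * sum(t * xi for t, xi in zip(theta, x))
    coeff = -y / (1.0 + math.exp(margin))
    return [coeff * xi for xi in x]

def dbdp_sgd(data, sigma, lr, rounds, batch_size, seed=0):
    """Perturb the data once, then run mini-batch SGD on the perturbed data only."""
    rng = random.Random(seed)
    noisy = perturb_dataset(data, sigma, rng)
    theta = [rng.uniform(-0.1, 0.1) for _ in data[0][0]]   # random initialization
    for _ in range(rounds):
        batch = rng.sample(noisy, batch_size)
        grad = [0.0] * len(theta)
        for x, y in batch:
            for j, gj in enumerate(logistic_grad(theta, x, y)):
                grad[j] += gj / batch_size
        theta = [t - lr * g for t, g in zip(theta, grad)]
    return theta, noisy   # the trained model and the perturbed dataset

data = [([0.6, 0.2], 1), ([0.5, -0.1], 1), ([-0.4, 0.3], -1), ([-0.7, -0.2], -1)]
theta_priv, data_priv = dbdp_sgd(data, sigma=0.1, lr=0.5, rounds=50, batch_size=2)
```

The point of the design is that the optimizer never touches the original data after the perturbation step, so both returned objects are derived from noisy inputs.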

Theorem 1

In Algorithm 1, for , if is -Lipschitz over , has an infimum , is of the same order as and


it is (ϵ,δ)-differentially private for some constant .

We provide a sketch of the proof here, details of which are shown in the full version of this paper. By first order Taylor expansion at point , we have:


For the sake of simplicity, we denote by (Footnote 1: here, both and its -norm (i.e. ) represent how the loss function scales the perturbation added on data, so it is reasonable to replace the former by the latter) and transform equation (11) to:


in which the perturbation is led by .

By equation (12), and noting that , we have the perturbation on the gradient (Footnote 2: the ‘perturbation on the gradient’ here does not mean that we add noise on the gradient; the perturbation on the gradient is ‘passed’ from the noise we add on the training data):


With in (10), for some constant , the variance of this Gaussian perturbation is larger than , which means (ϵ,δ)-differential privacy according to Theorem 1 in (Abadi et al., 2016).

Note that in Theorem 1, we only assume that the loss function is -Lipschitz, but do not require its convexity. So, Algorithm 1 also applies to non-convex loss functions.
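The idea that noise on the data is ‘passed’ to the gradient can be illustrated with a generic first-order expansion. The notation below ($\ell$ for the loss, $\theta$ for the parameters, $x$ for a feature vector, $\eta$ for the added noise with standard deviation $\sigma$, and $J$ for the mixed derivative) is assumed for this illustration and may differ from the symbols used in the full proof; it is a sketch of the mechanism, not the proof itself.

```latex
\begin{align*}
\nabla_\theta \ell(\theta, x + \eta)
  &\approx \nabla_\theta \ell(\theta, x) + J(\theta, x)\,\eta,
  \qquad J(\theta, x) = \nabla_x \nabla_\theta \ell(\theta, x), \\
\eta \sim \mathcal{N}(0, \sigma^2 I)
  &\;\Longrightarrow\;
  J(\theta, x)\,\eta \sim \mathcal{N}\!\left(0,\; \sigma^2 J(\theta, x)\, J(\theta, x)^{\top}\right).
\end{align*}
```

Under this approximation the induced gradient perturbation is again Gaussian, with a scale determined by how strongly the loss couples data and parameters. This is why the noise added to the data has to be calibrated against that scale before a gradient-level privacy result can be invoked.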

4 Performance Improving Method

In this section, we propose a ‘performance improving’ method based on the DBDP-ERM proposed in Section 3.

Motivated by the original definition of differential privacy (equation (3)), of which the key is the change in the model caused by altering a single data instance, we consider the contributions made by different data instances to the final model. In particular, if the effect of a data instance on the final machine learning model is so small that an attacker cannot detect it by observing the model, there is no need to add noise when training with that instance. From this perspective, DBDP-ERM naturally fits our method when combined with SGD, in which ‘analyzing each data instance individually’ is an important step.

Now, only one problem is left: how can we quantitatively measure the impact of the data instances on the model? A classic technique from robust statistics, the Influence Function (IF), provides the inspiration.

4.1 Influence Function

The influence function, brought to modern machine learning by (Koh and Liang, 2017), traces a model’s prediction through the learning algorithm and back to its training data, and can therefore measure the contribution of each data instance to the machine learning model.

Formally, the contribution of data instance is defined by , where . From (Koh and Liang, 2017), we have:


where is the Hessian and is assumed positive definite (PD).

By equation (14), we can quantitatively measure the changes on the model caused by data instances. In particular, it measures how the model changes if we ‘drop’ one of the data instances, which is in line with the definition of differential privacy. If is small, data instance contributes little to the model, so the privacy cost when training with is relatively small and vice versa. Based on this observation, we propose a performance improving version of our proposed DBDP-ERM method.
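The following sketch compares an influence-function estimate with the exact effect of dropping one instance. It assumes a deliberately simple setting that is not taken from the paper: a scalar least-squares model with a closed-form optimum, where the Hessian is a single number, and a synthetic dataset. The estimate follows the form given in (Koh and Liang, 2017), in which removing an instance changes the parameters by roughly the inverse Hessian times the loss gradient at that instance, divided by the number of instances.

```python
def fit(data):
    """Closed-form minimizer of (1/n) * sum of 0.5 * (theta * x - y)^2, scalar theta."""
    return sum(x * y for x, y in data) / sum(x * x for x, _ in data)

def influence_estimate(data, z):
    """Estimated parameter change from dropping z: (1/n) * H^{-1} * grad of the loss at z."""
    n = len(data)
    theta = fit(data)
    hessian = sum(x * x for x, _ in data) / n   # second derivative of the objective
    x, y = z
    grad = x * (theta * x - y)                  # derivative of the loss at z
    return grad / (n * hessian)

def exact_change(data, index):
    """Parameter change obtained by actually refitting without data[index]."""
    reduced = data[:index] + data[index + 1:]
    return fit(reduced) - fit(data)

# Synthetic data (hypothetical): y is roughly 2 * x with a small alternating offset.
data = [(0.05 * i, 0.1 * i + (0.02 if i % 2 else -0.02)) for i in range(1, 21)]
estimate = influence_estimate(data, data[4])
exact = exact_change(data, 4)
```

The estimate avoids retraining, which is what makes a per-instance contribution check affordable inside a training loop; its accuracy degrades for instances that carry a large share of the total curvature.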

4.2 Performance Improving DBDP-ERM

To measure contributions of different data instances, we need a (near) optimal model to calculate . For simple tasks, the analytical solution of the optimal model can be achieved; and for complicated tasks, we need to ‘pre-train’ a near optimal model: .

Then, we set a threshold for adding noise. In particular, after choosing a data instance by SGD, we calculate its contribution by equation (14), using the ‘pre-trained’ model (for the sake of simplicity, both the analytical solution and the pre-trained near optimal model are denoted by the same symbol). Then, if the contribution is larger than the threshold, we add random noise when optimizing with this instance; otherwise, we do not add noise.

The performance improving method is detailed in Algorithm 2. First, we define the division between vectors (whose dimensions are the same) as division on the corresponding elements, i.e. if and , then .

1:  Input: Dataset , learning rate , local iteration rounds , global update rounds
2:  Get the analytical solution or pre-trained model and calculate .
3:  Initialize the global model and the local model with :
4:  Initialize dataset with privacy: .
5:  for  to  do
6:     for  to  do
7:        Choose data instance randomly.
8:        Calculate contribution built by :
9:        if  then
11:        else
12:           Sample , and train the model by , i.e.
13:           .
14:        end if
15:     end for
17:  end for
19:  Add all (last version) perturbed data instances to , if there is no perturbation, add the original one.
20:  return ,
Algorithm 2 Performance Improving DBDP-ERM

In line 9 of Algorithm 2, we check whether the change in the model caused by the chosen instance is large enough for attackers to infer an individual. If the change is not noticeable (measured by ϵ and δ, as in the definition of differential privacy in equation (3)), noise is not added; otherwise, noise is added.

It can be observed that, in Algorithm 2, if the privacy budget ϵ is higher, the constraint in line 9 is looser, which means that fewer data instances will meet the condition for adding noise. As a result, the performance of our proposed performance improving method will be better when ϵ is higher.

Algorithm 2 outputs a perturbed machine learning model along with a perturbed dataset. With the perturbed dataset, we can train the private model directly, and the privacy of the original data is protected to some extent.
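The selection logic of Algorithm 2 can be sketched in a heavily simplified form. The assumptions here go beyond the text: the model is a scalar least-squares model, the contribution is always computed at a fixed reference model `theta_ref` (so there is no split into local and global rounds), and both the threshold `tau` and the noise scale `sigma` are hand-picked placeholders rather than the quantities derived from ϵ and δ in the paper.

```python
import random

def selective_dbdp_sgd(data, theta_ref, tau, sigma, lr, rounds, seed=0):
    """SGD that perturbs an instance only when its estimated contribution exceeds tau."""
    rng = random.Random(seed)
    n = len(data)
    hessian = sum(x * x for x, _ in data) / n
    theta = 0.0
    released = list(data)                 # last released version of each instance
    for _ in range(rounds):
        i = rng.randrange(n)
        x, y = data[i]
        # Influence-style estimate of how much dropping this instance moves the model.
        contribution = abs(x * (theta_ref * x - y)) / (n * hessian)
        if contribution > tau:
            x = x + rng.gauss(0.0, sigma)  # perturb only influential instances
            released[i] = (x, y)
        theta -= lr * x * (theta * x - y)  # SGD step on the (possibly noisy) instance
    return theta, released

data = [(0.05 * i, 0.1 * i + (0.02 if i % 2 else -0.02)) for i in range(1, 21)]
theta_ref = sum(x * y for x, y in data) / sum(x * x for x, _ in data)
theta_all, released_all = selective_dbdp_sgd(
    data, theta_ref, tau=-1.0, sigma=0.05, lr=0.5, rounds=500)
theta_none, released_none = selective_dbdp_sgd(
    data, theta_ref, tau=float("inf"), sigma=0.05, lr=0.5, rounds=500)
```

The two calls show the extremes: a threshold below zero perturbs every sampled instance, while an infinite threshold perturbs none and reduces to ordinary SGD. Any useful setting lies in between, and instances that were never perturbed are released in their original form.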

Remark 1

In Algorithm 2, the model is changed by each iteration . As a result, the contribution ‘fluctuates’ with the model. However, the global model is not ‘fixed in real time’ if , which is due to the consideration of calculation cost, leading to be an estimated value during local iterations. If the local iteration number is set to 1, the global model is synchronized with the local model , which means the contribution is the exact value.

4.3 Privacy Guarantees

In Algorithm 2, for those data instances whose contributions on the final model (Footnote 4: the ‘final model’ is relative, because the model changes with each optimization iteration, as clarified in Remark 1) are bounded by ϵ and δ (i.e. line 9 works), (ϵ,δ)-differential privacy on these data instances is naturally guaranteed. So, we only need to consider data instances that ‘fall into’ line 11.

Theorem 2

In Algorithm 2, for and , if is -Lipschitz over and


where (similar to Section 3) and , it is (ϵ,δ)-differentially private for some constant .

Note that in Theorem 2, , which is not of the same order as (), and we relax the assumption .

Again, we only give a sketch of proof here and more details can be found in the full version of the paper.

With , we simplify SGD in Algorithm 2 to:


Then, with in (15), the perturbation on the gradient when training with is larger than , which means (ϵ,δ)-differential privacy according to (Abadi et al., 2016).

Remark 2

As can be observed in (15), the noise added to each data instance is not the same. For data instance with larger , less random noise is added. This is because represents how the loss function scales the perturbation on data, and the ‘scaled perturbation’ is ‘passed’ to the gradient. It easily follows that if the scale given by is larger, the original added noise will be lower and vice versa.

A comparison between our method and some of the best previous methods on noise bound is shown in Table 1. As can be observed, the noise added to data in our method is much better than the gradient method proposed in (Song et al., 2013) by a factor of approximately and almost the same as the gradient perturbation methods proposed in (Abadi et al., 2016) and (Wang et al., 2019).

4.4 Utility

Method | Perturbation | Noise Type | Noise Bound | Utility
(Song et al., 2013) | Gradient | Exponential | | not given
(Abadi et al., 2016) | Gradient | Gaussian | | not given
(Wu et al., 2017) | Output | Gaussian | |
(Wang et al., 2019) | Gradient | Gaussian | |
DBDP-SGD | Input | Gaussian | |
(Performance Improving) DBDP-SGD | Input | Gaussian | | (for the worst case)

  • If the noise type is ‘Exponential’, noise , the noise bound is represented by .

  • If the noise type is ‘Gaussian’, the noise bound is represented by the variance.

Table 1: Comparisons on noise bound and utility between our method and other methods.

It can be observed that, in Algorithm 2, noise is only added when training with the subset of data instances that have a sufficient effect on the model. So, in the worst case, all training data contribute substantially to the final model and, as a result, noise is added in each iteration. Under these circumstances, we analyze the excess empirical risk bound of our method.

Theorem 3

Suppose that over , the loss function is -smooth, the objective function is differentiable and -strongly convex, is the same as in equation (15) and the learning rate . We have:


where is the number of parameters in the model, is the same as in equation (2) and . (Footnote 5: Here, hides factors polynomial in and .)

The proof sketch is similar to (Wang et al., 2017), details of which are shown in the Appendix. When considering the worst case, we add noise to the data instance in each iteration. Here, we denote -smooth as and -strongly convex as .

By equation (16), the perturbation on data is scaled and ‘passed’ to the gradient by , leading to the perturbation on the gradient being . So, first, we consider the bound of at iteration :


By re-arranging and summing over iterations, we have:


Then, with :

Remark 3

By improving the proof process in (Wang et al., 2017), we improve the last term in equation (19) to , where the term in the numerator is instead in (Wang et al., 2017). As a result, the excess empirical risk bound is improved by a factor of .

It can be observed that for the worst case, the excess empirical risk bound of our method is almost the same as some of the best previous methods, which suggests that in standard settings, where noise is added in only some of the iterations, our method performs better. Comparisons between our method and previous methods are shown in Table 1. As can be seen, our method is better than the methods proposed in (Wang et al., 2019) and (Wu et al., 2017) by factors of approximately and , respectively.

Figure 1: Accuracy over ϵ on different datasets; LR denotes the logistic regression model and MLP denotes the deep learning model. Panels: (a) KDDCup99 (LR), (b) Adult (LR), (c) Bank (LR), (d) KDDCup99 (MLP), (e) Adult (MLP), (f) Bank (MLP).

Figure 2: Optimality gap over ϵ on different datasets; LR denotes the logistic regression model and MLP denotes the deep learning model. Panels: (a) KDDCup99 (LR), (b) Adult (LR), (c) Bank (LR), (d) KDDCup99 (MLP), (e) Adult (MLP), (f) Bank (MLP).

5 Experiments

Experiments are performed on the classification task. Considering that the key of our method is to analyze each data instance individually, we compare it with previous DP-SGD methods. Specifically, we compare our method with the gradient perturbation method proposed in (Abadi et al., 2016), the output perturbation method proposed in (Wu et al., 2017) and the DP-LSSGD method proposed in (Wang et al., 2019) (Footnote 6: The performance fluctuates sharply on some datasets). The performance is measured in terms of accuracy and the optimality gap. The latter, denoted by , represents the excess empirical risk.
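The optimality gap can be computed as in the sketch below, which reads it as the empirical risk of the private model minus the empirical risk of the non-private optimum. The scalar least-squares model, the synthetic data and the names are assumptions made for illustration, and the ‘private’ model is simulated by shifting the optimum by a fixed amount.

```python
def empirical_risk(theta, data):
    """Average squared-error loss of a scalar linear model."""
    return sum(0.5 * (theta * x - y) ** 2 for x, y in data) / len(data)

def optimality_gap(theta_priv, theta_star, data):
    """Excess empirical risk of the private model over the optimal model."""
    return empirical_risk(theta_priv, data) - empirical_risk(theta_star, data)

data = [(0.05 * i, 0.1 * i + (0.02 if i % 2 else -0.02)) for i in range(1, 21)]
theta_star = sum(x * y for x, y in data) / sum(x * x for x, _ in data)
gap = optimality_gap(theta_star + 0.1, theta_star, data)   # stand-in for a private model
```

A gap close to zero means the private model fits the training data almost as well as the model trained without privacy protection.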

Depending on the sizes of the datasets, we apply both a logistic regression model and a deep learning model to the datasets KDDCup99 (Hettich and Bay, 1999), Adult (Dua and Graff, 2017) and Bank (Moro et al., 2014), and only a logistic regression model to the datasets Breast Cancer (Mangasarian and Wolberg, 1990), Credit Card Fraud (Bontempi and Worldline, 2018) and Iris (Dua and Graff, 2017). The total numbers of data instances are 70000, 45222, 41188, 699, 984 and 150, respectively. In the experiments, the deep learning model is a Multi-layer Perceptron (MLP) with one hidden layer whose size is the same as the input layer. Training and testing datasets are chosen randomly. Due to space limitations, the results on the datasets Breast Cancer, Credit Card Fraud and Iris are shown in the Appendix.

In all experiments, total iteration rounds and learning rate are chosen by cross-validation. For our performance improving method, the number of global rounds is set to (i.e. ). We evaluate the performance over the differential privacy budget ϵ, which is set from 0.01 to 7. δ is set differently according to different datasets and can be considered as a constant.

Figure 1 shows that as the privacy budget ϵ increases, so does the accuracy, which follows our intuition. Moreover, the accuracy of our proposed DBDP-ERM method is similar to some of the best previous DP-SGD methods. With the performance improving algorithm proposed in Algorithm 2, the accuracy rises on most datasets, which means that our proposed method is effective. Meanwhile, when ϵ is small, the difference between the accuracy of the DBDP-ERM method and the performance improving one is also small and, in some cases, the accuracy of the former is even higher (e.g. in Figure 1 when ). However, as ϵ increases, the performance improving method becomes more and more competitive, which is consistent with our theoretical analysis.

It can be observed that, in Figure 2, the optimality gap of the DBDP-ERM method is also similar to previous methods. Moreover, on some datasets, by applying our proposed performance improving algorithm, the optimality gap of our method is almost 0, which means that it achieves almost the same performance as the model without privacy consideration in some scenarios. In addition, similar to the accuracy in Figure 1, the optimality gap decreases as ϵ increases, which follows our intuition.

Experimental results show that our proposed DBDP-ERM method achieves similar performance to some of the best previous methods and our performance improving algorithm significantly improves the performance under most circumstances. The results on deep learning models are similar to those on traditional machine learning models.

6 Related Work

The first method on DP-ERM was proposed in (Chaudhuri et al., 2011), in which two perturbation methods were introduced: output and objective perturbation. A new perturbation method, gradient perturbation, was proposed in (Song et al., 2013), in which DP-SGD was also analyzed. Based on these works, various other methods were proposed to improve the noise bound and the excess empirical risk bound. The accuracy of the objective perturbation method was improved by (Kifer et al., 2012). The excess empirical risk bounds of the methods proposed in (Chaudhuri et al., 2011) and (Kifer et al., 2012) were improved by (Bassily et al., 2014), guaranteeing differential privacy by gradient perturbation. (Zhang et al., 2017) proposed an output perturbation method, achieving a better utility bound compared with (Bassily et al., 2014). By introducing an advanced gradient descent method, Prox-SVRG (Xiao and Zhang, 2014), DP-SVRG was proposed in (Wang et al., 2017), in which optimal or near optimal utility bounds were achieved with less gradient complexity. Besides, input perturbation is an approach that adds noise to the original data, which is usually connected with LDP (Duchi et al., 2013; Zheng et al., 2017; Fukuchi et al., 2017). LDP is concerned with the privacy of the training data rather than the privacy of the machine learning model, which is different from our method.

When it comes to DP-SGD, several methods have also been proposed to improve the performance. An output perturbation method was introduced to DP-SGD in (Wu et al., 2017), in which the sensitivity of SGD was analyzed in a novel way and better accuracy was achieved. The moments accountant method was proposed in (Abadi et al., 2016) to keep track of the privacy loss and was applied to SGD in deep learning. Aiming to achieve better performance, (Phan et al., 2017) added more noise to those features less ‘relevant’ to the final model, leading to higher accuracy. A Laplacian smoothing operator was introduced to DP-SGD when updating model parameters and a new method, DP-LSSGD, was proposed in (Wang et al., 2019), achieving better performance.

7 Conclusions

In this paper, we propose a method to improve the performance of DP-ERM from a new perspective: adding random noise to some of the training data, rather than treating all of them the same when guaranteeing differential privacy. To achieve this goal, of which the key is ‘considering each data instance separately and measuring its contribution’, we propose a new paradigm for differentially private ERM: Data perturbation based DP-ERM (DBDP-ERM), which adds random noise to the original data and provides (ϵ,δ)-differential privacy on the final machine learning model, while also protecting the original data to some extent. By introducing the influence function (IF), we analyze the contribution of each data instance to the final model. We then add noise to the data instances that make significant contributions during training, and do not change those whose contributions are small. Through detailed theoretical analysis, we demonstrate that our proposed DBDP-ERM method, along with the performance improving algorithm, achieves almost the same performance as some of the best previous methods without weakening the privacy guarantees. Meanwhile, experimental results demonstrate that our performance improving method significantly improves both the accuracy and the excess empirical risk, on both traditional machine learning models (such as logistic regression) and deep learning models (such as MLP). In future work, we will attempt to remove the assumption that the loss function is strongly convex and generalize our proposed paradigm to new scenarios, such as deep learning.


  • M. Abadi, A. Chu, I. Goodfellow, H. B. McMahan, I. Mironov, K. Talwar, and L. Zhang (2016) Deep learning with differential privacy. In Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security, pp. 308–318. Cited by: §1, §3, §4.3, §4.3, Table 1, §5, §6.
  • R. Arora and J. Upadhyay (2019) On differentially private graph sparsification and applications. In Advances in Neural Information Processing Systems, pp. 13378–13389. Cited by: §1.
  • R. Bassily, A. Smith, and A. Thakurta (2014) Private empirical risk minimization: efficient algorithms and tight error bounds. In 2014 IEEE 55th Annual Symposium on Foundations of Computer Science, pp. 464–473. Cited by: §1, §6.
  • G. Bernstein and D. R. Sheldon (2019)

    Differentially private bayesian linear regression

    In Advances in Neural Information Processing Systems, pp. 523–533. Cited by: §1.
  • G. Bontempi and Worldline (2018) ULB the machine learning group. Université Libre de Bruxelles, the Computer Science Department, the Machine Learning Group. External Links: Link Cited by: §5.
  • K. Chaudhuri, C. Monteleoni, and A. D. Sarwate (2011) Differentially private empirical risk minimization. Journal of Machine Learning Research 12, pp. 1069–1109. Cited by: §1, §6.
  • K. Chaudhuri and C. Monteleoni (2009) Privacy-preserving logistic regression. In Advances in neural information processing systems, pp. 289–296. Cited by: §1.
  • K. Chaudhuri, A. D. Sarwate, and K. Sinha (2013) A near-optimal algorithm for differentially-private principal components. The Journal of Machine Learning Research 14 (1), pp. 2905–2943. Cited by: §1.
  • D. Dua and C. Graff (2017) UCI machine learning repository. University of California, Irvine, School of Information and Computer Sciences. External Links: Link Cited by: §5.
  • J. C. Duchi, M. I. Jordan, and M. J. Wainwright (2013) Local privacy and statistical minimax rates. In 2013 IEEE 54th Annual Symposium on Foundations of Computer Science, pp. 429–438. Cited by: §1, §6.
  • C. Dwork, F. McSherry, K. Nissim, and A. Smith (2006) Calibrating noise to sensitivity in private data analysis. In Theory of cryptography conference, pp. 265–284. Cited by: §1.
  • C. Dwork, A. Roth, et al. (2014) The algorithmic foundations of differential privacy. Foundations and Trends® in Theoretical Computer Science 9 (3–4), pp. 211–407. Cited by: §1, Definition 1.
  • C. Dwork, G. N. Rothblum, and S. Vadhan (2010) Boosting and differential privacy. In 2010 IEEE 51st Annual Symposium on Foundations of Computer Science, pp. 51–60. Cited by: §1.
  • M. Fatima and M. Pasha (2017) Survey of machine learning algorithms for disease diagnostic. Journal of Intelligent Learning Systems and Applications 9 (01), pp. 1. Cited by: §1.
  • W. Feng, Z. Yan, H. Zhang, K. Zeng, Y. Xiao, and Y. T. Hou (2017) A survey on security, privacy, and trust in mobile crowdsourcing. IEEE Internet of Things Journal 5 (4), pp. 2971–2992. Cited by: §1.
  • M. Fredrikson, S. Jha, and T. Ristenpart (2015) Model inversion attacks that exploit confidence information and basic countermeasures. In Proceedings of the 22nd ACM SIGSAC Conference on Computer and Communications Security, pp. 1322–1333. Cited by: §1.
  • M. Fredrikson, E. Lantz, S. Jha, S. Lin, D. Page, and T. Ristenpart (2014) Privacy in pharmacogenetics: an end-to-end case study of personalized warfarin dosing. In 23rd USENIX Security Symposium (USENIX Security 14), pp. 17–32. Cited by: §1.
  • G. Fu, Y. Levin-Schwartz, Q. Lin, and D. Zhang (2019) Machine learning for medical imaging. Journal of healthcare engineering 2019, pp. 1–2. Cited by: §1.
  • K. Fukuchi, Q. K. Tran, and J. Sakuma (2017) Differentially private empirical risk minimization with input perturbation. In International Conference on Discovery Science, pp. 82–90. Cited by: §1, §6.
  • M. Heikkilä, J. Jälkö, O. Dikmen, and A. Honkela (2019) Differentially private markov chain monte carlo. In Advances in Neural Information Processing Systems 32, pp. 4115–4125. Cited by: §1.
  • S. Hettich and S. D. Bay (1999) The uci kdd archive [http://kdd.ics.uci.edu]. Irvine, CA: University of California, Department of Information and Computer Science. Cited by: §5.
  • Ž. Ivezić, A. J. Connolly, J. T. VanderPlas, and A. Gray (2019) Statistics, data mining, and machine learning in astronomy: a practical python guide for the analysis of survey data. Princeton University Press. Cited by: §1.
  • I. Kavakiotis, O. Tsave, A. Salifoglou, N. Maglaveras, I. Vlahavas, and I. Chouvarda (2017) Machine learning and data mining methods in diabetes research. Computational and Structural Biotechnology Journal 15, pp. 104–116. Cited by: §1.
  • D. Kifer, A. Smith, and A. Thakurta (2012) Private convex empirical risk minimization and high-dimensional regression. In Conference on Learning Theory, pp. 25–1. Cited by: §1, §6.
  • P. W. Koh and P. Liang (2017) Understanding black-box predictions via influence functions. In Proceedings of the 34th International Conference on Machine Learning-Volume 70, pp. 1885–1894. Cited by: §4.1, §4.1.
  • O. L. Mangasarian and W. H. Wolberg (1990) Cancer diagnosis via linear programming. Technical report, University of Wisconsin-Madison Department of Computer Sciences. Cited by: §5.
  • S. Moro, P. Cortez, and P. Rita (2014) A data-driven approach to predict the success of bank telemarketing. Decision Support Systems 62, pp. 22–31. Cited by: §5.
  • N. Phan, X. Wu, H. Hu, and D. Dou (2017) Adaptive laplace mechanism: differential privacy preservation in deep learning. In 2017 IEEE International Conference on Data Mining (ICDM), pp. 385–394. Cited by: §1, §6.
  • R. Shokri and V. Shmatikov (2015) Privacy-preserving deep learning. In Proceedings of the 22nd ACM SIGSAC conference on computer and communications security, pp. 1310–1321. Cited by: §1.
  • R. Shokri, M. Stronati, C. Song, and V. Shmatikov (2017) Membership inference attacks against machine learning models. In 2017 IEEE Symposium on Security and Privacy (SP), pp. 3–18. Cited by: §1.
  • M. Smith, M. Álvarez, M. Zwiessele, and N. D. Lawrence (2018) Differentially private regression with gaussian processes. In Proceedings of the Twenty-First International Conference on Artificial Intelligence and Statistics, Vol. 84, pp. 1195–1203. Cited by: §1.
  • S. Song, K. Chaudhuri, and A. D. Sarwate (2013) Stochastic gradient descent with differentially private updates. In 2013 IEEE Global Conference on Signal and Information Processing, pp. 245–248. Cited by: §1, §1, §4.3, Table 1, §6.
  • J. Ullman and A. Sealfon (2019) Efficiently estimating erdos-renyi graphs with node differential privacy. In Advances in Neural Information Processing Systems, pp. 3765–3775. Cited by: §1.
  • B. Wang, Q. Gu, M. Boedihardjo, F. Barekat, and S. J. Osher (2019) DP-lssgd: a stochastic optimization method to lift the utility in privacy-preserving erm. arXiv preprint arXiv:1906.12056. Cited by: §1, §4.3, §4.4, Table 1, §5, §6.
  • D. Wang and J. Xu (2019) Principal component analysis in the local differential privacy model. Theoretical Computer Science. Cited by: §1.
  • D. Wang, M. Ye, and J. Xu (2017) Differentially private empirical risk minimization revisited: faster and more general. In Advances in Neural Information Processing Systems, pp. 2722–2731. Cited by: §1, §4.4, §6, Remark 3.
  • B. Wu, S. Zhao, C. Chen, H. Xu, L. Wang, X. Zhang, G. Sun, and J. Zhou (2019) Generalization in generative adversarial networks: a novel perspective from privacy protection. In Advances in Neural Information Processing Systems, pp. 306–316. Cited by: §1.
  • X. Wu, F. Li, A. Kumar, K. Chaudhuri, S. Jha, and J. Naughton (2017) Bolt-on differential privacy for scalable stochastic gradient descent-based analytics. In Proceedings of the 2017 ACM International Conference on Management of Data, pp. 1307–1322. Cited by: §1, §4.4, Table 1, §5, §6.
  • L. Xiao and T. Zhang (2014) A proximal stochastic gradient method with progressive variance reduction. SIAM Journal on Optimization 24 (4), pp. 2057–2075. Cited by: §6.
  • C. Xu, J. Ren, D. Zhang, Y. Zhang, Z. Qin, and K. Ren (2019) GANobfuscator: mitigating information leakage under gan via differential privacy. IEEE Transactions on Information Forensics and Security 14 (9), pp. 2358–2371. Cited by: §1.
  • M. Yuen, I. King, and K. Leung (2011) A survey of crowdsourcing systems. In 2011 IEEE Third International Conference on Privacy, Security, Risk and Trust and 2011 IEEE Third International Conference on Social Computing, pp. 766–773. Cited by: §1.
  • J. Zhang, K. Zheng, W. Mou, and L. Wang (2017) Efficient private erm for smooth objectives. arXiv preprint arXiv:1703.09947. Cited by: §1, §6.
  • L. Zhao, L. Ni, S. Hu, Y. Chen, P. Zhou, F. Xiao, and L. Wu (2018) InPrivate digging: enabling tree-based distributed data mining with differential privacy. In IEEE INFOCOM 2018-IEEE Conference on Computer Communications, pp. 2087–2095. Cited by: §1.
  • K. Zheng, W. Mou, and L. Wang (2017) Collect at once, use effectively: making non-interactive locally private learning possible. In Proceedings of the 34th International Conference on Machine Learning, Vol. 70, pp. 4130–4139. Cited by: §1, §6.