Introduction
The prevalence of personal and connected devices, along with the ubiquity of the Internet in our daily lives, has led to an explosion of data being collected on individuals. These data are a treasure trove enabling services from personalized shopping to personalized healthcare. Access to more data is a key factor in developing better and more accurate machine learning models. While collecting more intracompany data is one solution, another is to share data across companies. Intercompany data aggregation can expand the data available to all participating companies at an accelerated pace. However, companies are not always willing to volunteer their data, primarily due to privacy concerns for both the individuals providing the data and the entities collecting, storing, and acting on these data (Dwork et al., 2017).
Bluecore is a Business-to-Business SaaS company that collects on-site and offline traffic data from e-commerce companies to enable marketers to grow their customer base, identify their best customers, and maximize their lifetime value. This is done through machine learning models that predict, for example, a customer’s lifetime value, their affinity to products, and their propensity to engage with an email. Each client’s data are typically stored in a silo separate from the others. When training a model for a client, only that client’s data are used. However, if Bluecore were to combine its different datasets in a risk-free way, it could augment the utility of its models for each of its clients.
Differential privacy (DP) is a fast-growing field that provides provable guarantees on data privacy while maintaining utility. Privacy guarantees garner the trust of individuals and are key to enabling a greater level of trust and collaboration amongst companies. We propose and evaluate two DP-based frameworks to enable the training of machine learning models across various companies’ datasets. These approaches provide the benefits of aggregation while ensuring that no company can glean precise information on any individual customer in another company’s dataset. The first framework is based on the Differentially Private Permutation-based Stochastic Gradient Descent (DPPSGD) algorithm (Wu et al., 2017) and the second is based on the Approximate Minima Perturbation (AMP) algorithm (Iyengar et al.); both apply to techniques with convex objective functions, such as logistic regression (LR). This paper focuses on the aggregation of LR for binary classification models as a starting point, since LR is one of the most widely used algorithms. We show that our DP framework using the DPPSGD algorithm provides a 9.72% average lift in performance over a non-aggregated baseline in a real-world cold-start setting, and an average lift of 4.85% using the AMP method.
Organization of the paper: We first present some background on differential privacy and then describe the two DP methods employed (DPPSGD and AMP) with their implementation details. Subsequently, we cover the experimental results obtained. They are followed by the business impact of this work and we conclude with future investigation directions.
Key contributions:

A framework for improving model performance by aggregating siloed data in a differentially private manner.

A detailed description of the strongly convex DPPSGD algorithm with mini-batching that allows for faster convergence without compromising on privacy, including a derivation of the key loss function parameters.

A full comparison of DPPSGD and AMP in a model aggregation framework.

Experimental results showing the benefit of DP aggregation in the case of a cold-start problem with limited data.
Problem Description
As data collection for Bluecore’s partners (clients) only begins when they sign up for its services, it could take several months before the models have the data needed for good performance. While each partner may suffer from limited data, enough data exist collectively across the partner base to build higher performing models. However, partners are hesitant to have their data combined with others due to privacy concerns. Therefore, we set out to find a technical solution that could allow for aggregation across partners while preserving the privacy of each partner’s data.
One approach could be to build a model on each partner’s data separately and then aggregate the models using an ensemble approach. Unfortunately, this method would not guarantee privacy, as it has been shown that machine learning models leak data. Barreno et al. (2010) show that models can memorize patterns in the training data, exposing specific data points. Moreover, even in cases of only black-box access to the model, membership inference attacks (Shokri et al., 2017) can determine whether a datapoint was used in training, and model inversion attacks (Fredrikson, Jha, and Ristenpart, 2015) can pinpoint the value of specific features of a training datapoint.
To address the above privacy issues, we teamed up with the impact team at Georgian Partners, one of our venture capital investors. This paper presents the result of this collaboration, which shows that DP can provide a viable approach for aggregating partner-specific models while simultaneously guaranteeing that no one partner can learn specific information about an individual datapoint in another partner’s dataset. We also show that this differentially private aggregation provides business value and can mitigate the cold-start problem. Specifically, we focus on Bluecore’s propensity-to-convert model, which is an LR model that estimates the probability that a customer will make a purchase in the near future. We expect that these findings would extend to our other propensity models, which also use LR.
Related Work
Data privacy has always been a primary concern for organizations and individuals. Masking an individual’s personally identifiable information is not enough: it has been shown that 87% of Americans could be identified by using auxiliary information such as ZIP code, birthday, and gender (Sweeney, 2000). In an effort to hinder identification of individuals and private data, a major area of research emerged that studies how to extract meaningful statistical information from databases while preserving privacy. Recently, Uber (Mohan et al., 2012; Johnson, Near, and Song) and Google (Erlingsson, Pihur, and Korolova, 2014; Fanti, Pihur, and Erlingsson, 2016) have released tools to query a database in a private way.
There are two traditional techniques for preserving privacy: input and output perturbation. Input perturbation originated from survey techniques that inject random noise when a participant answers the survey, such as Randomized Response (Warner, 1965). In output perturbation, an exact answer is first computed, and random noise is then added to it (Reiss, 1984; Traub, Yemini, and Woźniakowski, 1984; Beck, 1980). However, these classical techniques often generate too much noise, rendering the information extraction process impractical. For example, it has been shown that, in some cases, unless the noise injected into the database is so large that the database becomes unusable, the database could be recovered in polynomial time by an adversary (Dinur and Nissim, 2003). Following this work, Dwork et al. (2006) suggested calibrating the noise to the sensitivity of the query function to keep meaningful statistical information. This study coined a new definition of privacy, named Differential Privacy, which is the definition used in this work. Given two neighboring databases $D$ and $D'$ which differ in only one data entry, and for all events $S$, a non-interactive randomized algorithm $\mathcal{A}$ is said to be $(\varepsilon, \delta)$-differentially private if:
$$\Pr[\mathcal{A}(D) \in S] \le e^{\varepsilon}\, \Pr[\mathcal{A}(D') \in S] + \delta \qquad (1)$$
In particular, if $\delta = 0$, the term $\varepsilon$-differentially private is used instead. By analyzing the sensitivity of the query functions and injecting noise following a Laplace distribution, Dwork et al. (2006) showed that differential privacy can be achieved with much less noise than traditional approaches. For a survey of differential privacy based querying techniques, we refer the reader to Dwork (2008).
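As a minimal illustration of calibrating noise to sensitivity, the sketch below applies the Laplace mechanism to a counting query, whose sensitivity is 1 because adding or removing one record changes the count by at most 1. The names `laplace_noise` and `private_count` are illustrative and do not come from the cited works.

```python
import random


def laplace_noise(scale, rng):
    """Zero-mean Laplace sample, drawn as the difference of two exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_count(records, predicate, epsilon, rng):
    """Counting query released with Laplace noise of scale sensitivity / epsilon.

    A count has sensitivity 1, so the noise scale is 1 / epsilon.
    """
    true_count = sum(1 for r in records if predicate(r))
    return true_count + laplace_noise(1.0 / epsilon, rng)


if __name__ == "__main__":
    rng = random.Random(0)
    purchases = [12.0, 80.5, 3.2, 45.0, 60.1]
    # Smaller epsilon means more noise and stronger privacy.
    print(private_count(purchases, lambda p: p > 40, epsilon=0.5, rng=rng))
```

The only data-dependent quantity released is the noisy count; the exact count never leaves the function.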
There is, however, a drawback to these private querying techniques: the number of queries that can be made to the database is limited by a privacy budget, as discussed in Friedman and Schuster (2010). The same limitation carries over to differentially private models in which each query to the model is differentially private, but the model parameters themselves are not. In these cases, each query leaks a small amount of information about the original data, and while a small number of queries may not constitute a significant leakage, a large number of queries may invalidate any meaningful privacy guarantees. To overcome the restrictions of private queries, model differential privacy has gained popularity, as it does not impose any privacy budget limitation on predictions. The learning mechanism is treated as one query to the database, allowing the model to subsequently be used an unlimited number of times to make predictions (Chaudhuri, Monteleoni, and Sarwate, 2011; Bassily, Smith, and Thakurta, 2014).
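The budget limitation on private queries can be sketched with a small accountant. This is a hypothetical helper, assuming basic sequential composition in which the epsilons of successive queries simply add up; tighter composition results exist and are not modeled here.

```python
class PrivacyBudget:
    """Tracks cumulative epsilon under basic sequential composition (an assumption)."""

    def __init__(self, total_epsilon):
        self.total = total_epsilon
        self.spent = 0.0

    def charge(self, epsilon):
        """Record one query costing `epsilon`; refuse it if the budget would be exceeded."""
        if self.spent + epsilon > self.total + 1e-12:
            raise RuntimeError("privacy budget exhausted")
        self.spent += epsilon
        return self.total - self.spent  # remaining budget


if __name__ == "__main__":
    budget = PrivacyBudget(total_epsilon=1.0)
    print(budget.charge(0.4))  # remaining budget after the first query
    print(budget.charge(0.4))  # remaining budget after the second query
    # A third query of cost 0.4 would raise RuntimeError.
```

A privately trained model, by contrast, pays its privacy cost once at training time, and predictions made from it afterwards do not call `charge` again.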
Chaudhuri and Monteleoni (2009) introduced a differentially private LR model by analyzing the sensitivity of a regularized logistic loss and perturbing the learned weights with noise that is proportional to the bound on the sensitivity. Recent work (Wu et al., 2017) proposed a new technique named Differentially Private Permutation-based Stochastic Gradient Descent (DPPSGD) that also injects noise on the model output weights. The study provides a new analysis of sensitivity that allows for injecting less noise and for faster convergence. Performance is therefore preserved while strong privacy is guaranteed. The DPPSGD algorithm is, however, limited to models with convex or strongly convex objective functions. In parallel, methods that inject noise during the optimization process emerged. Song, Chaudhuri, and Sarwate (2013) proposed adding noise at each update of the gradient descent, but this created high variation in the training process. Later, Abadi et al. (2016) derived tighter privacy bounds for a similar gradient perturbation method. Another technique consists of perturbing the objective function itself and outputting the model parameters that minimize the transformed loss, as in Chaudhuri, Monteleoni, and Sarwate (2011). Unfortunately, the privacy guarantees only hold when the algorithm outputs the exact minimum of the noisy objective, which can be impractical to find in some cases. More recent work by Iyengar et al. presents a novel algorithm applicable to any convex loss, Approximate Minima Perturbation (AMP), that can provide privacy and utility guarantees even when the released model is not necessarily the exact minimum of the perturbed objective.
Methodologies
As described in previous sections, the goal is to enable transfer learning through model aggregation. In order to protect the individual’s privacy, the models must be trained in a private way. Selecting which private approach to use largely depends on the nature of the problem and the availability of data. After exploring different methods to build differentially private machine learning models, we identified DPPSGD (Wu et al., 2017) and AMP (Iyengar et al.) as the best approaches for our use case of solving the cold-start problem, where we have few labeled data. In this section, we provide a detailed description of the DPPSGD LR approach that employs mini-batches, and an overview of the AMP approach.
Notation: Throughout this paper, we use $\|\cdot\|$ to indicate the $L_2$ norm. Vectors are written in boldface and sets in calligraphic type. A list of all parameters used for both algorithms is provided in Table 1.

Table 1: Parameters used for both algorithms.

Scope    Description
Shared   Number of partners considered
Shared   Training set
Shared   Size of training set
Shared   Dimension of training set
Shared   Loss function
Shared   Regularization parameter
Shared   Loss parameter
Shared   Lipschitz constant
Shared   Smoothness
Shared   Sensitivity
Shared   Strong convexity parameter
DPPSGD   Learning rate at a given iteration
DPPSGD   Batch size
DPPSGD   Hypothesis space
DPPSGD   Radius of the hypothesis space
AMP      Bound on the norm of the loss’s gradient
Privacy  Privacy parameters ($\varepsilon$, $\delta$)
DPPSGD
In the DPPSGD algorithm (Wu et al., 2017), stochastic gradient descent (SGD) is treated as a black box, and Laplace noise is only added to the model output at the end of the optimization process. The authors provide a novel analysis of the convergence of permutation-based SGD and a tighter bound on the sensitivity of the algorithm, and show that little noise is needed to achieve reasonable privacy guarantees. The key advantages of the DPPSGD algorithm are that its implementation is simple and that it relies on SGD, a generic optimization technique that can be applied to other machine learning techniques based on convex optimization.
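The output perturbation step can be sketched as follows. This is a hedged illustration, not the authors' implementation: it assumes the noise vector has density proportional to exp(-epsilon * ||kappa|| / sensitivity), the multivariate Laplace-style construction used for output perturbation by Chaudhuri, Monteleoni, and Sarwate (2011), and it takes the sensitivity as an input rather than deriving it. The function names are illustrative.

```python
import math
import random


def sample_output_noise(dim, sensitivity, epsilon, rng):
    """Noise with density proportional to exp(-epsilon * ||kappa|| / sensitivity)."""
    # Direction: uniform on the unit sphere, via a normalized Gaussian vector.
    g = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    g_norm = math.sqrt(sum(v * v for v in g))
    # Magnitude: Gamma with shape `dim` and scale `sensitivity / epsilon`.
    magnitude = rng.gammavariate(dim, sensitivity / epsilon)
    return [magnitude * v / g_norm for v in g]


def perturb_weights(weights, sensitivity, epsilon, rng):
    """Add one noise draw to the trained weights; SGD itself stays a black box."""
    noise = sample_output_noise(len(weights), sensitivity, epsilon, rng)
    return [w + k for w, k in zip(weights, noise)]


if __name__ == "__main__":
    rng = random.Random(0)
    trained = [0.8, -0.3, 0.1]  # hypothetical weights from a finished SGD run
    print(perturb_weights(trained, sensitivity=0.05, epsilon=1.0, rng=rng))
```

Because the expected noise norm is `dim * sensitivity / epsilon`, any tightening of the sensitivity bound translates directly into less noise on the released weights.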
Strongly Convex DPPSGD Algorithm
Wu et al. indicate that using mini-batching can improve the sensitivity bound by a factor of the batch size, thus effectively lowering the amount of noise to be injected (Wu et al., 2017). We therefore implement the strongly convex version of the DPPSGD algorithm with mini-batching throughout this paper, and we simply refer to this algorithm as DPPSGD going forward. Our approach necessitates a custom implementation of LR, since DPPSGD demands not only a model output perturbation but also two crucial modifications to the training process to ensure convergence. First, the learning rate at each iteration must follow the min-based schedule prescribed by Wu et al. (2017). Second, the hypothesis space, within which the weight vector w lives, is constrained to a ball of fixed radius. Therefore, at the end of each iteration, we compute the norm of the weights and project w down to the ball if the norm exceeds the radius. After all iterations have finished, w is projected onto the ball regardless of its norm. By doing so, the norm of w is normalized to the radius, and noise has less impact on w when the radius is relatively large.
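The training loop described above can be sketched in plain Python. Several choices here are assumptions rather than the authors' code: the per-example loss is taken to be `C * log(1 + exp(-y * <w, x>)) + (lam / 2) * ||w||^2` with labels in {-1, +1}, the step size is taken to be `min(1 / beta, 1 / (gamma * t))` (our reading of the min-based schedule, to be checked against Wu et al. (2017)), and the data are reshuffled on every pass.

```python
import math
import random


def project_to_ball(w, radius):
    """Scale w back onto the ball of the given radius if it lies outside it."""
    norm = math.sqrt(sum(v * v for v in w))
    if norm > radius:
        return [v * radius / norm for v in w]
    return list(w)


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def psgd_logreg(X, y, C, lam, radius, beta, gamma, batch_size, passes, rng):
    """Permutation-based mini-batch SGD for regularized LR (assumed loss, see lead-in)."""
    dim = len(X[0])
    w = [0.0] * dim
    t = 0
    for _ in range(passes):
        order = list(range(len(X)))
        rng.shuffle(order)  # random permutation of the data (reshuffled each pass here)
        for start in range(0, len(order), batch_size):
            batch = order[start:start + batch_size]
            t += 1
            eta = min(1.0 / beta, 1.0 / (gamma * t))  # assumed step-size schedule
            grad = [lam * v for v in w]  # gradient of the regularizer
            for i in batch:
                margin = y[i] * sum(wj * xj for wj, xj in zip(w, X[i]))
                coeff = -C * y[i] * sigmoid(-margin) / len(batch)
                for j in range(dim):
                    grad[j] += coeff * X[i][j]
            w = [wj - eta * gj for wj, gj in zip(w, grad)]
            w = project_to_ball(w, radius)  # keep w inside the hypothesis space
    # Final step described in the text: rescale so the norm equals the radius.
    norm = math.sqrt(sum(v * v for v in w))
    if norm > 0:
        w = [v * radius / norm for v in w]
    return w
```

The returned weights are not yet private; a noise vector calibrated to the algorithm's sensitivity still has to be added before release.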
We have identified two preconditions that must be met in order to implement the strongly convex DPPSGD:

the loss function must be strongly convex for all w

all datapoints must be scaled such that their norm is at most 1.
The regularized sigmoid binary cross-entropy loss, presented in equation (2), fulfills the first condition (Grant, Boyd, and Ye, 2008):
(2) 
The represent the records’ labels. We discuss in detail how to achieve precondition 2 in the Experimental Results section.
There are three main parameters specific to the objective function which can be derived: , a tight bound on , , a tight bound on , and . Given the loss function , it is clear that . The detailed derivation of and is shown below. For , since
, and using the chain rule, we get
Since and  
Then, since and ,  
(3) 
Therefore, we set to . We now present the derivation for .
(4) 
(5) 
Therefore, we set to be . All other parameters are hyper parameters that can be tuned privately. The specifics will be discussed in the Experimental Results section.
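As a hedged worked example of this kind of derivation, assume the loss has the common form below, with labels $y \in \{-1, 1\}$, scaled datapoints $\|\mathbf{x}\| \le 1$, and weights constrained to $\|\mathbf{w}\| \le R$. The symbols $C$ (loss parameter), $\lambda$ (regularization parameter), $R$ (radius), $L$ (Lipschitz constant), $\beta$ (smoothness), and $\gamma$ (strong convexity parameter) are our notation for this sketch, and the resulting constants hold only under these assumptions; they may differ from the expressions the authors derive.

```latex
\begin{align*}
\ell(\mathbf{w}) &= C \log\!\bigl(1 + e^{-y\,\mathbf{w}^\top \mathbf{x}}\bigr)
                   + \tfrac{\lambda}{2}\,\|\mathbf{w}\|^2 \\[4pt]
\nabla \ell(\mathbf{w}) &= -C\,y\,\sigma(-y\,\mathbf{w}^\top \mathbf{x})\,\mathbf{x} + \lambda\,\mathbf{w}
  \;\Longrightarrow\;
  \|\nabla \ell(\mathbf{w})\| \le C\,\|\mathbf{x}\| + \lambda\,\|\mathbf{w}\| \le C + \lambda R =: L \\[4pt]
\nabla^2 \ell(\mathbf{w}) &= C\,\sigma(z)\bigl(1 - \sigma(z)\bigr)\,\mathbf{x}\mathbf{x}^\top + \lambda I,
  \quad z = y\,\mathbf{w}^\top \mathbf{x}
  \;\Longrightarrow\;
  \|\nabla^2 \ell(\mathbf{w})\| \le \tfrac{C}{4}\,\|\mathbf{x}\|^2 + \lambda \le \tfrac{C}{4} + \lambda =: \beta \\[4pt]
\nabla^2 \ell(\mathbf{w}) &\succeq \lambda I
  \;\Longrightarrow\; \gamma = \lambda
\end{align*}
```

The gradient bound uses $0 < \sigma(\cdot) < 1$ and $|y| = 1$, and the Hessian bound uses $\sigma(z)(1 - \sigma(z)) \le 1/4$. Under this form, increasing $C$ and decreasing $\lambda$ push the constants in opposite directions, which is why only one of the two needs to be varied during tuning.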
AMP
The other differentially private model that we consider is the AMP algorithm (Iyengar et al.). The AMP algorithm is a mix of objective perturbation and output perturbation. First, the objective function is perturbed with a Gaussian noise term. To overcome the challenge of finding the exact minimum, the AMP algorithm allows for an approximation: the algorithm optimizes over the perturbed objective function and stops when the norm of the gradient of the perturbed objective is within a predetermined threshold. The algorithm then releases the resulting weights plus another random variable drawn from a Gaussian distribution with a variance that is linearly dependent on the threshold.
In the AMP algorithm, the $(\varepsilon, \delta)$ privacy budget is split into two parts. The first part is used to create the Gaussian noise that is added to the objective function, while the second is used to compute the Gaussian noise that is added to the model output. The privacy and utility guarantees are analyzed thoroughly in Iyengar et al., where pseudocode for the algorithm can also be found. Furthermore, the loss function used is the same as in equation (2), with one of its parameters left unused and another set as required by the privacy guarantees of the paper.
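The two-stage structure of AMP can be sketched as below. This is a structural illustration only, under loud assumptions: the noise scales `sigma_obj` and `sigma_out` are hypothetical inputs that are not calibrated to any privacy guarantee here (Iyengar et al. specify how to derive them from the budget split and the gradient threshold), the loss is an averaged logistic loss with labels in {-1, +1}, and plain gradient descent stands in for the solver used in the released package.

```python
import math
import random


def _sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def amp_sketch(X, y, lam, grad_threshold, sigma_obj, sigma_out, rng,
               step=0.5, max_iters=10000):
    """Objective perturbation, approximate minimization, then output perturbation."""
    n, dim = len(X), len(X[0])
    # Stage 1: a Gaussian vector b adds the linear term <b, w> to the objective.
    b = [rng.gauss(0.0, sigma_obj) for _ in range(dim)]
    w = [0.0] * dim
    grad_norm = float("inf")
    for _ in range(max_iters):
        # Gradient of: mean logistic loss + (lam / 2) * ||w||^2 + <b, w>
        grad = [lam * wj + bj for wj, bj in zip(w, b)]
        for xi, yi in zip(X, y):
            margin = yi * sum(wj * xj for wj, xj in zip(w, xi))
            coeff = -yi * _sigmoid(-margin) / n
            for j in range(dim):
                grad[j] += coeff * xi[j]
        grad_norm = math.sqrt(sum(g * g for g in grad))
        if grad_norm <= grad_threshold:
            break  # approximate minimum reached: gradient norm within the threshold
        w = [wj - step * gj for wj, gj in zip(w, grad)]
    # Stage 2: Gaussian noise on the released weights. A real implementation
    # should refuse to release when the threshold was not reached.
    released = [wj + rng.gauss(0.0, sigma_out) for wj in w]
    return released, grad_norm
```

Returning the final gradient norm lets the caller verify that the stopping condition was actually met before trusting the released weights.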
Experimental Results
In this section, we first describe the real-world datasets used for all experimentation and the data preprocessing procedures. Next, we present the experimental design and results of three different experiments using both DPPSGD and AMP.
Real-world Datasets
The datasets used for the experiments capture customer-website interactions from 38 of Bluecore’s retail partners, all from the same business vertical. For each partner, we extract customer data in the same manner as for the model currently live in production. Each customer is represented by 19 engineered features that capture their browse and purchase behavior. Each record is labeled according to whether a purchase occurred in the subsequent 15 days. For each partner, we obtain one “ramped-up” dataset (size shown in Figure 5) that contains one year’s worth of customer records. To simulate a cold-start setting, we also obtain one “cold-start” dataset (size shown in Figure 4), which only includes records that were collected in the last month. To allow for a consistent comparison between the models built on “ramped-up” and “cold-start” datasets, the features were engineered in a time-agnostic manner.
Private vs Public Data: When training differentially private models, the data used fall into two categories: private or public. Private data are deemed sensitive, and the DP model is designed to protect them. Public data are widely available or not sensitive, and therefore the DP model does not need to protect them: they are commonly used for parameter tuning, to help bypass information leakage, or, in this case, to train the aggregate model. In this paper, when building an aggregate model for a given target partner, that partner’s own proprietary data do not pose any privacy restrictions; hence they are public data with regard to the target partner. However, data from other companies that the target company seeks to leverage are considered private.
Data Preprocessing
Both the DPPSGD and AMP approaches require that the loss have a Lipschitz constant. To achieve this, we bound the norm of the feature vectors x, including the bias term, by 1. This must be done in a private manner, and we cannot leverage other partners’ datasets to normalize the target partner’s. In order to achieve normalization independently from other samples, we first pick a threshold that bounds the norm of each datapoint, such that all datapoints whose norm exceeds the threshold are considered outliers and discarded. We then add the bias term to each datapoint, and all values of the resulting vector are divided by a constant so that its norm is at most 1. The logic is as follows: given that ,
(6)
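The preprocessing steps can be sketched as follows. The specific bias value and divisor are assumptions of this sketch: `bias` is left as a free parameter, and dividing by `sqrt(threshold**2 + bias**2)` is one choice that guarantees a norm of at most 1 for every retained datapoint, using only constants and never other samples.

```python
import math


def preprocess(points, threshold, bias):
    """Drop outliers, append a bias feature, and rescale so each norm is <= 1.

    The scale depends only on `threshold` and `bias` (assumed constants), so the
    normalization of one datapoint never depends on any other datapoint.
    """
    scale = math.sqrt(threshold ** 2 + bias ** 2)
    kept = []
    for x in points:
        norm = math.sqrt(sum(v * v for v in x))
        if norm > threshold:
            continue  # treated as an outlier and discarded
        kept.append([v / scale for v in x] + [bias / scale])
    return kept


if __name__ == "__main__":
    raw = [[3.0, 4.0], [0.3, 0.4], [1.0, 0.0]]
    for row in preprocess(raw, threshold=1.0, bias=1.0):
        print(row, math.sqrt(sum(v * v for v in row)))
```

For a retained datapoint, the squared norm of the extended vector is at most `threshold**2 + bias**2`, so dividing by the square root of that quantity yields a norm of at most 1.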
Experimental Design
Using each partner’s respective training datasets, we train a differentially private LR classifier for both “ramped-up” and “cold-start” datasets using either the DPPSGD or the AMP algorithm. Next, for each partner, referred to as the target partner, an ensemble model is trained as follows:

we feed the target partner’s training datapoints into all of the partnerspecific private models and get all of their predictions
The aggregation framework is shown in Figure 1. Each line type (full, dashed, dash-dotted) represents the model training and aggregation process for a different target partner.
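The prediction-gathering step of the aggregation can be sketched as building one row of partner-model scores per target datapoint. The helper names are hypothetical, the toy scorers stand in for the partner-specific private models, and the learner that is subsequently fit on these rows is deliberately left out, since it is not specified in this section.

```python
import math


def make_logistic_scorer(weights):
    """Toy stand-in for a partner-specific private LR model (illustrative only)."""
    def score(x):
        z = sum(w * v for w, v in zip(weights, x))
        return 1.0 / (1.0 + math.exp(-z))
    return score


def stack_predictions(partner_models, target_points):
    """Row i holds every partner model's predicted probability for target datapoint i."""
    return [[model(x) for model in partner_models] for x in target_points]


if __name__ == "__main__":
    models = [make_logistic_scorer([1.0, 0.0]), make_logistic_scorer([0.0, -1.0])]
    target = [[0.0, 0.0], [2.0, 2.0]]
    for row in stack_predictions(models, target):
        print(row)
```

Only the already-privatized models touch the target partner's data here, so the other partners' raw records are never accessed during aggregation.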
DPPSGD
When training differentially private models, parameter tuning must be done privately. There exist several differentially private parameter tuning algorithms (Chaudhuri, Monteleoni, and Sarwate, 2011; Wu et al., 2017); however, in our case, we use the target partner’s data (which are considered “public” data) to tune the parameters. We find the best performing parameters using the target partner’s data and apply them when training the other partners’ models. We repeat the same process for each partner in turn. The free parameters that we can tune for DPPSGD are the loss parameter C and the regularization parameter. Since the two have proportionally inverse effects, we choose to fix one at a standard value that ensures numerical stability and faster convergence during the gradient descent, and vary the other. The batch size, among other settings, is set as suggested in Wu et al. (2017). The values fixed or spanned during parameter tuning are summarized in Table 2, along with the privacy parameters.
Parameter  Value Assigned 

0.001  
0.01 
AMP
For the AMP algorithm, the free parameters are , and the privacy parameters. We follow the guidelines outlined in Iyengar et al. for setting all of them, and their values are summarized in Table 3.
Parameter  Value Assigned 

1  
0.01  
0.99, 0.01  
0.99, 0.01 
Ensemble Model
Performance Measure
We evaluate the learned models on each partner’s test set. As an evaluation measure, we use the area under the ROC curve (AUC).
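For reference, the AUC can be computed directly from its pairwise interpretation: the probability that a randomly chosen positive record is scored above a randomly chosen negative one, with ties counted as one half. The function below is an illustrative quadratic-time sketch, not the evaluation code used in the experiments.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison of positives and negatives."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l != 1]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative record")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5  # ties count as half a win
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    print(roc_auc([1, 0, 1, 0], [0.9, 0.8, 0.2, 0.1]))  # 0.75
```

A value of 0.5 corresponds to a random classifier, which is the reference point used when discussing heavily perturbed models.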
Experimental Setup
Both differentially private algorithms and the ensemble model are implemented following the pseudocode in Wu et al. (2017) and Iyengar et al., and are written in Python 3.6. The implementation of DPPSGD uses the MXNet (Chen et al., 2015) package, while AMP is implemented using the open-source package provided with the paper (Iyengar et al.), which employs SciPy’s minimize procedure and BFGS solver.
Summary of Experiments
Non-private Baseline
For each partner, we train a non-private and unperturbed model on the respective “cold-start” dataset as a baseline for comparison. We also train a non-private and unperturbed model on the respective “ramped-up” dataset. The non-private models are trained with SGD using the MXNet package.
Experiment 1: Varying $\varepsilon$
Prior to the aggregation, we investigated the impact of $\varepsilon$ on LR performance. For each partner’s “cold-start” dataset, we averaged the performance of models trained on 10 different random train/test splits, in order to account for the randomness induced by data partitioning. Additionally, for each $\varepsilon$, we sampled the noise vector 100 times before adding it to the model weights, in order to average out the randomness of the noise sampling process. Figure 2 shows the quartiles of AUC across all partners for individual DPPSGD and AMP private models while varying $\varepsilon$. The box labeled as no noise represents the non-private, unperturbed model, i.e., our baseline. As shown in Figure 2 for DPPSGD, there is a clear trend where smaller $\varepsilon$, which corresponds to stronger privacy guarantees, yields less accurate learners. As expected, for extremely high levels of noise (the smallest values of $\varepsilon$), the performance is close to that of a random classifier. As the amplitude of the added noise decreases, the performance approaches the “true” performance of the model, i.e., that of the unperturbed model. For AMP, the variance in performance is higher than for DPPSGD at equally high levels of noise. AMP is inherently more prone to variance, as noise is also added to the objective function. We also note that AMP performs poorly even for higher privacy budgets.
Based on our business requirements and the model performance results, we fixed a single value of $\varepsilon$ as the privacy parameter for the rest of our experiments. In Figure 3, we display for each partner the average AUC of the “cold-start” private and non-private models, handling randomness in data splitting and noise sampling as above. Unsurprisingly, the baseline almost always outperforms the noisy model, except in some cases for DPPSGD where the performance is similar (for instance partners 1, 8, or 12). For AMP, the performance generally remains close to that of a random classifier and is slightly better for some partners (4, 7, 22, or 24, for instance).
Experiment 2: Aggregation
We then compute the AUC across all partners for the aggregated private models. The results are illustrated in Figure 4. The first subplot displays the relative AUC lift of the “cold-start” private target partner’s model, aggregated with the other partners’ “ramped-up” models, over the non-private baseline (computed via a model trained non-privately on a “cold-start” dataset). This is the main result of this study, as it shows the utility of aggregation in our specific use case: augmenting a “cold-start” partner’s performance with “ramped-up” partners. On average, the aggregation frameworks built with private models using DPPSGD and AMP provide a lift of 9.72% and 4.85%, respectively. These results highlight the benefits of aggregation (especially for AMP): while the noise injection sometimes led to a significant degradation in performance at the individual partner level (Figure 3), the aggregation provided a lift that counterbalanced it. The second subplot compares the same aggregated performance as above, but against a “ramped-up” non-private baseline. The aim is to see how well the aggregation can do compared to the performance that a partner can hope for once fully “ramped-up”. The DPPSGD aggregation framework provides an average lift of 8.92%, while the AMP aggregation framework provides an average lift of 3.69%. The third subplot represents the size of the “cold-start” training set. After aggregation, and for both methods, most partners benefit from a lift over the “cold-start” baseline. For DPPSGD (resp. AMP), only partners 7, 19, and 25 (resp. 2, 7, 14, 19, 25, and 31) take a hit in performance. We see that the most significant lifts benefit small-sized partners (e.g. 5, 9, and 29 for DPPSGD; 0, 9, and 29 for AMP) but also some large ones (e.g. 30 for both). This last remark, along with the results of the comparison with the “ramped-up” baseline, highlights that data quantity is not the only driver of model performance. The quality of the transfer learning may also lie in the diversity of the different datasets’ distributions.
Additionally, we ran the aggregation in the case where the target partner also uses the “ramped-up” dataset, to highlight the benefits of aggregation even in non-cold-start scenarios. Figure 5 shows the relative lift in AUC of aggregation using only “ramped-up” models over the “ramped-up” non-private baseline. Once again, we see that aggregation benefits most partners. Here, DPPSGD (resp. AMP) yields an average lift of 8.16% (resp. 7.38%).
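The lift figures in this experiment can be read as relative improvements in AUC. The helper below is a sketch that assumes the usual definition, the AUC difference divided by the baseline AUC and expressed as a percentage, averaged over partners; the AUC values in the usage example are made up for illustration and are not results from this study.

```python
def relative_lift(model_auc, baseline_auc):
    """Relative AUC lift over the baseline, in percent (assumed definition)."""
    return 100.0 * (model_auc - baseline_auc) / baseline_auc


def average_lift(model_aucs, baseline_aucs):
    """Mean of the per-partner relative lifts."""
    lifts = [relative_lift(m, b) for m, b in zip(model_aucs, baseline_aucs)]
    return sum(lifts) / len(lifts)


if __name__ == "__main__":
    # Hypothetical per-partner AUCs, for illustration only.
    aggregated = [0.66, 0.70, 0.58]
    baseline = [0.60, 0.65, 0.60]
    print(average_lift(aggregated, baseline))
```

Averaging per-partner lifts, rather than computing one lift from pooled AUCs, gives every partner equal weight regardless of dataset size.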
Experiment 3: Varying Number of Partners
Finally, we studied the impact of the number of partners on performance. To do so, for a range of values of k, we sampled a subset of k partners 100 times, ran the aggregation of the private models of the selected partners for each subsample, and reported the average AUC lift over the non-private baseline. The results are shown in Figure 6. As we add more partners, the variance in performance decreases, but the mean remains more or less stable. This indicates that aggregation can provide a benefit even when only a small number of datasets is available. However, the variance also shows that the effect of each dataset on the target partner’s performance could be large. As more and more datasets are aggregated, the relative influence of each one diminishes, ensuring that no single dataset drives all of the improvement in performance.
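The resampling procedure can be sketched as follows. The callable `evaluate_lift` is a hypothetical placeholder for running the aggregation on a chosen subset of partners and returning the resulting AUC lift; the toy evaluator in the usage example exists only to make the sketch runnable.

```python
import random
import statistics


def lift_by_subset_size(partner_ids, k, evaluate_lift, rng, repeats=100):
    """Sample `repeats` random subsets of k partners; return mean and spread of the lift."""
    lifts = [evaluate_lift(rng.sample(partner_ids, k)) for _ in range(repeats)]
    return statistics.mean(lifts), statistics.pstdev(lifts)


if __name__ == "__main__":
    rng = random.Random(0)
    partners = list(range(10))

    def toy_evaluate(subset):
        # Toy stand-in: pretend each partner contributes a lift equal to its id.
        return sum(subset) / len(subset)

    for k in (2, 5, 10):
        print(k, lift_by_subset_size(partners, k, toy_evaluate, rng))
```

With the toy evaluator, the mean stays near the same value as k grows while the spread shrinks, mirroring the qualitative pattern described above.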
Business Impact and Future Work
At Bluecore, the cold-start problem directly affects the performance of our models for partners that have recently signed up. This leads to either deploying underperforming models for partners eager to use them or delaying model deployment until a critical mass of data is collected. The approach presented in this paper provides us with the ability to bridge, in a differentially private manner, the gaps between our clients’ siloed data, and in turn enables us to provide better models sooner. Consequently, this will directly drive a higher return on investment for our partners and more revenue for Bluecore.
This paper has shown marked improvement for our propensity-to-convert model. Future work will study whether this improvement also manifests itself in our other LR-based models, such as the models predicting propensity to open, click, and unsubscribe. Additionally, we employ a wide array of algorithms beyond LR that do not have strongly convex objective functions. For those, we plan to explore DP aggregation through private aggregation of teacher ensembles (Papernot et al., 2018; Abadi et al., 2016).
Conclusion
Differential privacy provides privacy guarantees for individuals while enabling insights at the entire population level. This leads individuals to more willingly share their data in return for improved machine learning products and insights. Furthermore, in SaaS companies that collect and store client companies’ data in a siloed manner, differential privacy can also encourage entities or companies with similar data to share information amongst each other.
We proposed a framework for private model aggregation using differential privacy. We analyzed the framework with two different private model generation algorithms: DPPSGD and AMP. Through extensive experimentation, we observed that in a cold-start setting our framework can provide an average model performance lift of 9.72% using DPPSGD and 4.85% using AMP. Furthermore, our results show that aggregation can be beneficial even in a fully ramped-up setting. We also observed that as the number of client datasets aggregated increases, the contribution of each dataset to the gains achieved through aggregation is reduced.
References
 Abadi et al. (2016) Abadi, M.; Chu, A.; Goodfellow, I.; McMahan, H. B.; Mironov, I.; Talwar, K.; and Zhang, L. 2016. Deep learning with differential privacy. Proceedings of the 23rd ACM Conference on Computer and Communications Security.
 Barreno et al. (2010) Barreno, M.; Nelson, B.; Joseph, A. D.; and Tygar, J. 2010. The security of machine learning. Machine Learning 81(2):121–148.
 Bassily, Smith, and Thakurta (2014) Bassily, R.; Smith, A.; and Thakurta, A. 2014. Private empirical risk minimization: Efficient algorithms and tight error bounds. In Foundations of Computer Science (FOCS), 2014 IEEE 55th Annual Symposium on, 464–473. IEEE.
 Beck (1980) Beck, L. L. 1980. A security mechanism for statistical databases. ACM Transactions on Database Systems (TODS) 5(3):316–338.
 Chaudhuri and Monteleoni (2009) Chaudhuri, K., and Monteleoni, C. 2009. Privacy-preserving logistic regression. In Advances in Neural Information Processing Systems, 289–296.
 Chaudhuri, Monteleoni, and Sarwate (2011) Chaudhuri, K.; Monteleoni, C.; and Sarwate, A. D. 2011. Differentially private empirical risk minimization. Journal of Machine Learning Research 12(Mar):1069–1109.
 Chen and Guestrin (2016) Chen, T., and Guestrin, C. 2016. Xgboost: A scalable tree boosting system. In Proceedings of the 22nd acm sigkdd conference on knowledge discovery and data mining, 785–794. ACM.
 Chen et al. (2015) Chen, T.; Li, M.; Li, Y.; Lin, M.; Wang, N.; Wang, M.; Xiao, T.; Xu, B.; Zhang, C.; and Zhang, Z. 2015. Mxnet: A flexible and efficient machine learning library for heterogeneous distributed systems. arXiv preprint arXiv:1512.01274.
 Dinur and Nissim (2003) Dinur, I., and Nissim, K. 2003. Revealing information while preserving privacy. In Proceedings of the 22nd ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems, 202–210. ACM.
 Dwork et al. (2006) Dwork, C.; McSherry, F.; Nissim, K.; and Smith, A. 2006. Calibrating noise to sensitivity in private data analysis. In Theory of Cryptography Conference, 265–284. Springer.
 Dwork et al. (2017) Dwork, C.; Smith, A.; Steinke, T.; and Ullman, J. 2017. Exposed! a survey of attacks on private data. Annual Review of Statistics and Its Application 4:61–84.
 Dwork (2008) Dwork, C. 2008. Differential privacy: A survey of results. In International Conference on Theory and Applications of Models of Computation, 1–19. Springer.
 Erlingsson, Pihur, and Korolova (2014) Erlingsson, Ú.; Pihur, V.; and Korolova, A. 2014. Rappor: Randomized aggregatable privacy-preserving ordinal response. In Proceedings of the 2014 ACM SIGSAC conference on computer and communications security, 1054–1067. ACM.
 Fanti, Pihur, and Erlingsson (2016) Fanti, G.; Pihur, V.; and Erlingsson, Ú. 2016. Building a rappor with the unknown: Privacy-preserving learning of associations and data dictionaries. Proceedings on Privacy Enhancing Technologies 2016(3):41–61.
 Fredrikson, Jha, and Ristenpart (2015) Fredrikson, M.; Jha, S.; and Ristenpart, T. 2015. Model inversion attacks that exploit confidence information and basic countermeasures. In Proceedings of the 22nd ACM SIGSAC Conference on Computer and Communications Security, 1322–1333. ACM.
 Friedman and Schuster (2010) Friedman, A., and Schuster, A. 2010. Data mining with differential privacy. In Proceedings of the 16th ACM SIGKDD international conference on Knowledge discovery and data mining, 493–502. ACM.
 Grant, Boyd, and Ye (2008) Grant, M.; Boyd, S.; and Ye, Y. 2008. Cvx: Matlab software for disciplined convex programming.
 Iyengar et al. Iyengar, R.; Near, J. P.; Song, D.; Thakkar, O.; Thakurta, A.; and Wang, L. Towards practical differentially private convex optimization. IEEE.
 Johnson, Near, and Song Johnson, N.; Near, J. P.; and Song, D. Towards practical differential privacy for SQL queries. Vertica 1:1000.
 Mohan et al. (2012) Mohan, P.; Thakurta, A.; Shi, E.; Song, D.; and Culler, D. 2012. Gupt: privacy preserving data analysis made easy. In Proceedings of the 2012 ACM SIGMOD International Conference on Management of Data, 349–360. ACM.
 Papernot et al. (2018) Papernot, N.; Song, S.; Mironov, I.; Raghunathan, A.; Talwar, K.; and Erlingsson, Ú. 2018. Scalable private learning with pate. arXiv preprint arXiv:1802.08908.
 Reiss (1984) Reiss, S. P. 1984. Practical data-swapping: The first steps. ACM Transactions on Database Systems (TODS) 9(1):20–37.
 Shokri et al. (2017) Shokri, R.; Stronati, M.; Song, C.; and Shmatikov, V. 2017. Membership inference attacks against machine learning models. In Security and Privacy (SP), 2017 IEEE Symposium on, 3–18. IEEE.
 Song, Chaudhuri, and Sarwate (2013) Song, S.; Chaudhuri, K.; and Sarwate, A. D. 2013. Stochastic gradient descent with differentially private updates. In Global Conference on Signal and Information Processing (GlobalSIP), 2013 IEEE, 245–248. IEEE.
 Sweeney (2000) Sweeney, L. 2000. Simple demographics often identify people uniquely. Health (San Francisco) 671:1–34.
 Traub, Yemini, and Woźniakowski (1984) Traub, J. F.; Yemini, Y.; and Woźniakowski, H. 1984. The statistical security of a statistical database. ACM TODS 9(4):672–679.
 Warner (1965) Warner, S. L. 1965. Randomized response: A survey technique for eliminating evasive answer bias. Journal of the American Statistical Association 60(309):63–69.
 Wu et al. (2017) Wu, X.; Li, F.; Kumar, A.; Chaudhuri, K.; Jha, S.; and Naughton, J. 2017. Bolt-on differential privacy for scalable stochastic gradient descent-based analytics. In Proceedings of the 2017 ACM International Conference on Management of Data, 1307–1322. ACM.