I Introduction
Machine learning-as-a-service (MLaaS) has seen an explosion of interest with the development of cloud platform services. Many cloud service providers, such as Amazon [1], Google [2], IBM [3], and Microsoft [4], have launched such MLaaS platforms. These services allow consumers and application companies to leverage powerful machine learning and artificial intelligence technologies without requiring in-house domain expertise. Most MLaaS platforms offer two categories of services. (1) Machine learning model training. This type of service allows users and application companies to upload their datasets (often sensitive) and perform task-specific analysis, including private machine learning and data analytics. The ultimate goal of this service is to construct one or more trained predictive models. (2) Hosting service for pretrained models. This service provides pretrained models with a prediction API. Consumers are able to select and query such APIs to obtain task-specific data analytic results on their own query data.
With the exponential growth of digital data in governments, enterprises, and social media, there has also been a growing demand for data privacy protections, leading to legislation such as HIPAA [5], GDPR [6], and the 2018 California privacy law [7]. Such legislation puts limits on the sharing and transmission of the data analyzed by these platforms and used to train predictive models. All MLaaS providers and platforms must therefore comply with such privacy regulations.
With the new opportunities of MLaaS and the growing attention on privacy compliance, we have seen a rapid increase in the study of potential vulnerabilities involved in deploying MLaaS platforms and services. Membership inference attacks and adversarial examples represent two specific vulnerabilities against deep learning models trained and deployed using MLaaS platforms. With more mission critical cyber applications and systems using machine learning algorithms as a critical functional component, such vulnerabilities are a major and growing threat to the safety of cybersystems in general and the trust and accountability of algorithmic decision making in particular.
Membership Inference. Membership inference refers to the ability of an attacker to infer the membership of training examples used during model training. We call a membership inference attack a black-box attack if the attacker has access only to the prediction API of a privately trained model hosted by an MLaaS provider. A black-box attacker therefore does not have any knowledge of either the private training process or the privately trained model.
Consider a financial institution with a large database of previous loan applications. This financial institution leverages its large quantity of data in conjunction with an MLaaS platform to develop a predictive model. The goal of constructing such a model is to provide predictions when given individuals’ personal financial data as input, such as the likelihood of being approved for a loan under multiple credit evaluation categories. This privately trained predictive model is then deployed with an MLaaS API at different offices of this financial institution. When potential applicants provide their own data to the query API, they receive some prediction statistics on their chance of receiving the desired loan and perhaps even the ways in which they may improve their likelihood of loan approval. In the context of the loan application approval model, a membership inference attack considers a scenario wherein a user of the prediction service is an attacker. This attacker provides data of a target individual and, based on the model’s output, tries to infer whether that individual had applied for a loan at the given financial institution.
Both the financial institution and the individual have an interest in protecting against such membership inference attacks. Loan applicants (such as the target individual) consider their applications and financial data to be sensitive and private. They do not want their loan to be public knowledge. The financial institution also owns the training dataset and the privately trained model and likely considers its training data to be not only confidential for consumer privacy but also an organizational asset [8]. It is therefore a high priority for MLaaS providers and companies to protect their private training datasets against membership inference risks in order to maintain and increase their competitive edge.
Adversarial Machine Learning.
The second category of vulnerability is the adversarial input attacks against modern machine learning models, also referred to as adversarial machine learning (ML). Adversarial ML attacks are broadly classified into two categories: (1) evasion attacks wherein attackers aim to mislead pretrained models and cause inaccurate output; and (2) poisoning attacks which aim to generate a poisonous trained model by manipulating its construction during the training process. Such poisoned models will misbehave at prediction time and can be deceived by the attacker. Adversarial deep learning research to date has been primarily centered on the generation of adversarial examples by injecting the minimal amount of perturbation to benign examples required to either (1) cause a pretrained classification model to misclassify the query examples with high confidence or (2) cause a training algorithm to produce a learning model with inaccurate or toxic behavior.
The risks of adversarial ML have triggered a flurry of attention and research efforts on developing defense methods against such deception attacks. As predictive models take a critical role in many sensitive or mission-critical systems and application domains, such as healthcare, self-driving cars, and cyber manufacturing, there is a growing demand for privacy-preserving machine learning which is secure against these attacks. For example, the ability of adversarial ML attacks to trick a self-driving car into identifying a stop sign as a speed limit sign poses a significant safety risk. Given that most of the prediction models targeted are hosted by MLaaS providers and kept private with only a black-box access API, one common approach in developing adversarial example attacks is to use a substitute model of the target prediction model. This substitute model can be constructed in two steps. First, use membership inference methods to infer the training data of the target model and its distribution. Then, utilize this data and distribution information to train a substitute model.
Interestingly, most of the research efforts to date have been centered on defense methods against already developed adversarial examples [9] but few efforts have been dedicated to the countermeasures against membership inference risks and an attacker’s ability to develop an effective substitute model.
In this paper, we focus on investigating two key problems regarding model vulnerability to membership inference attacks. First, we are interested in understanding how skewness in training data may impact the membership inference threat. Second, we are interested in understanding a frequently asked question: can differentially private model training mitigate membership inference vulnerability? This includes several related questions, such as when such mitigation might be effective and the reasons why differential privacy may not always be the magic bullet to fully conquer all membership inference threats.
Machine learning with Differential Privacy.
Differential privacy provides a formal mathematical framework, which bounds the impact of individual instances on the output of a function when this function is constructed in a differentially private manner. In the context of deep learning, a deep neural network model is said to be differentially private if its training function is differentially private, therefore guaranteeing the privacy of its training data. Thus, conceptually, differential privacy provides a natural mitigation strategy against membership inference threats. If training processes could limit the impact that any single individual instance may have on the model output, then the differential privacy theory [10] would guarantee that an attacker would be incapable of identifying with high confidence that an individual example is included in the training dataset. Additionally, recent research has indicated that differential privacy also has a connection to the model robustness against adversarial examples [11].

Unfortunately, differential privacy can be challenging to implement efficiently in deep neural network training for a number of reasons. First, it introduces a substantial number of parameters into the machine learning process, which already has an overwhelming number of hyperparameters for performance tuning. Second, existing differentially private deep learning methods tend to have a high cost in prolonged training time and lower training and testing accuracy. The effort for improving deep learning with differential privacy has therefore been centered on improving training efficiency and maintaining high training accuracy [12, 13]. We argue that balancing privacy, security, and utility remains an open challenge for supporting differential privacy in the context of machine learning in general, and deep neural network model training in particular.
Contributions of the paper. In this paper, we present a privacy analysis and compliance evaluation system, called MPLens, which investigates Membership Privacy through a multidimensional Lens. MPLens aims to expose membership inference vulnerabilities, including those unique to varying distributions of the training data. We also leverage MPLens to investigate differential privacy as a mitigation technique for membership inference risk in the context of deep neural network model training. Our privacy analysis system can serve for both MLaaS providers and data scientists to conduct privacy analysis and privacy compliance evaluation. This paper presents our initial design and implementation of MPLens and it makes three original contributions.
First, through MPLens, we demonstrate how membership inference attack methodologies can be leveraged in adversarial ML. Datasets developed by using the model prediction API for MLaaS not only reveal private information about the training dataset, such as the underlying distributions of the private training data, but they can also be used in developing and validating the adverse utility of adversarial examples.
Second, MPLens identifies and highlights that the vulnerability of pretrained models to the membership inference attack is not uniform when the training data itself is skewed. We show that risk from membership inference attacks is routinely increased when models use skewed training data. This vulnerability variation becomes particularly acute in federated learning environments wherein participants are likely to hold information representing different subsets of the population and therefore may incur different vulnerability to attack due to their participation. We argue that an in-depth understanding of such disparities in privacy risks represents an important aspect of promoting fairness and accountability in machine learning.
Finally, we investigate the effectiveness of differential privacy as a mitigation technique against membership inference attacks, with a focus on deep learning models. We discuss the tradeoffs of implementing such a mitigation strategy for preventing membership inference and the impact of differential privacy on different classes when deep neural network (DNN) models are trained using skewed training datasets.
II Membership Inference Attacks
Attackers conducting membership inference attacks seek to identify whether or not an individual is a member of the dataset used to train a particular target machine learning model. We discuss the definition and the generation of membership inference attacks in this section, which will serve as the basic reference model of membership inference.
II-A Attack Definition
In studying membership inference attacks there are two primary sets of processes at play: (1) the training, deployment, and use of the machine learning model which the attacker is targeting for inference and (2) the development and use of the membership inference attack. Each of these two elements has guiding predefined objectives impacting respective outputs.
II-A1 Machine Learning Model Training and Prediction
The training of and prediction using the machine learning model which the attacker is targeting may be formalized as follows. Consider a dataset $D$ comprised of $n$ training instances $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$ with each instance $x_i$ containing $m$ features, denoted by $x_i = (x_{i,1}, x_{i,2}, \ldots, x_{i,m})$, and a class value $y_i \in \{1, \ldots, k\}$, where $k$ is a finite integer $\geq 2$. Let $F_t$ be the target model trained using this dataset $D$. $F_t$ is then deployed as a service such that users can provide a feature vector $x \in \mathbb{R}^m$, and the service will then output a probability vector $p \in \mathbb{R}^k$ of the form $p = (p_1, p_2, \ldots, p_k)$, where $p_i \in [0, 1]$ for all $i$, and $\sum_{i=1}^{k} p_i = 1$. The prediction class label $y$ according to $F_t$ for a feature vector $x$ is the class with highest probability value in $p$. Therefore $y = \arg\max_{i} p_i$.
II-A2 Membership Inference Definition
Given some level of access to the trained model, the attacker conducts his or her own training to develop a binary classifier which serves as the membership inference attack model. The most limited access environment in which an attacker may conduct the membership inference attack is the black-box access environment. That is, an environment wherein the attacker may only query the target model through some machine learning-as-a-service API and receive only the corresponding prediction vectors.
Let us consider an attacker with such black-box access to a target model $F_t$. Given only a query input $x$ and output $F_t(x)$ from some target model $F_t$ trained using a dataset $D$, the membership inference attacker attempts to identify whether or not $x \in D$.
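The prediction interface described above can be sketched in a few lines. This is only an illustration under stated assumptions: the linear scoring model, the `WEIGHTS` values, and the function names are hypothetical stand-ins for a real target model, which a black-box attacker would never see. What matters is the shape of the output: a probability vector that sums to one, with the predicted label given by its largest entry.

```python
import math

def softmax(scores):
    """Turn raw class scores into a probability vector p with sum(p) == 1."""
    shift = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def target_model(x, weights):
    """Hypothetical target model: one linear score per class, then softmax."""
    scores = [sum(w_j * x_j for w_j, x_j in zip(w, x)) for w in weights]
    return softmax(scores)

def predicted_label(p):
    """The predicted class is the index of the largest probability in p."""
    return max(range(len(p)), key=lambda i: p[i])

# Toy 3-class model over 2 features (weights are illustrative only).
WEIGHTS = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]
x = [2.0, 0.5]
p = target_model(x, WEIGHTS)   # the only thing a black-box attacker observes
y = predicted_label(p)
```

The membership inference question is then whether the pair of query and returned vector reveals that the query was part of the training data.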
| Dataset | Accuracy of Membership Inference (%) |
|---|---|
| Adult | 59.89 |
| MNIST | 61.75 |
| CIFAR-10 | 90.44 |
| Purchases-10 | 82.29 |
| Purchases-20 | 88.98 |
| Purchases-50 | 93.71 |
| Purchases-100 | 95.74 |

Table I: Attack accuracies targeting decision tree models. Baseline accuracy against which to compare results is 50%.
Many different datasets and model types have demonstrated vulnerability to membership inference attacks in black-box settings. Table I reports accuracy results for black-box attackers targeting decision tree models for problems ranging from binary classification (Adult) to 100-class classification (Purchases-100). We note that all experiments evaluated the attack model against an equal number of instances in the target training dataset $D$ as those not in $D$. The baseline membership inference accuracy is therefore 50%. We refer readers to [14] for more details on these datasets and the experimental setup.
These results demonstrate both the viability of membership inference attacks as well as the variation in vulnerability between datasets. This accentuates the need for practitioners to evaluate their system’s specific vulnerability.
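To make the 50% baseline concrete, the following sketch scores a trivial attacker on a balanced evaluation set and then expresses the Table I accuracies as a gain over that baseline. The function name and the evaluation set size are assumptions chosen for illustration; the accuracy values are copied from Table I.

```python
def attack_accuracy(guesses, membership):
    """Fraction of correct in/out guesses (1 = "in", 0 = "out")."""
    correct = sum(g == m for g, m in zip(guesses, membership))
    return correct / len(membership)

# Balanced evaluation set: as many members as non-members.
membership = [1] * 500 + [0] * 500
always_in = [1] * len(membership)     # trivial attacker that always says "in"
baseline = attack_accuracy(always_in, membership)

# Reported accuracies from Table I (decision tree targets), in percent.
table_i = {"Adult": 59.89, "MNIST": 61.75, "CIFAR-10": 90.44,
           "Purchases-10": 82.29, "Purchases-20": 88.98,
           "Purchases-50": 93.71, "Purchases-100": 95.74}
gain_over_baseline = {name: round(acc - 100 * baseline, 2)
                      for name, acc in table_i.items()}
```

On a balanced set, any attacker that ignores the model output scores exactly the baseline, so the gain is the part of the accuracy that is attributable to information leaked by the target model.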
Recently, researchers showed similar membership inference vulnerability in settings where attackers have white-box access to the target model, including the output from the intermediate layers of a pretrained neural network model or the gradients for the target instance [15]. Interestingly, this study showed that the intermediate layer outputs, in most cases, do not lead to significant improvements in attack accuracy. For example, with the CIFAR-100 dataset and AlexNet model structure, a black-box attack achieves 74.6% accuracy while the white-box attack achieves 75.18% accuracy. This result further supports the understanding that attackers can gain sufficient knowledge from only black-box access to pretrained models, which is common in MLaaS platforms. Attackers do not require either full or even partial knowledge of the pretrained target model, as black-box access already exposes the primary source of membership inference vulnerability.
II-B Attack Generation
The attack generation process can vary significantly based on the power of the attacker. For example, the attack proposed in [16] requires knowledge of the training error of the target model $F_t$. The attack technique proposed in [17], however, requires computational power and involves the training of multiple machine learning models. The techniques proposed in [18] are different still in that they require the attacker to develop effective threshold values. Figure 1 gives a workflow sketch of the membership inference attack generation algorithm. We use the shadow model technique documented in [17] and [14] to describe the attack generation process of membership inference attacks, while noting that many of the processes may be applicable to other attack generation techniques.
II-B1 Generating Shadow Data and Substitute Models
In the shadow model technique, an attacker must first generate or access a shadow dataset $D'$, a synthetic labeled dataset intended to mirror the data in the target training dataset $D$. While [17] and [14] both outline potential approaches to generating such a synthetic dataset from scratch, we would like to note that in many cases, attackers may also have examples of their own which can be used as seeds for the shadow data generation process or to bootstrap their shadow dataset. Consider our example of the financial institution. A competitor to the target institution may in fact have their own customer data, which could be leveraged to bootstrap a shadow dataset.
Once the attacker has developed the shadow dataset $D'$, the next phase of the membership inference attack is to leverage $D'$ to train and observe a series of shadow models. Specifically, the shadow dataset is used to train multiple shadow models, each of which is designed to emulate the behavior of the target model. Each shadow model is trained on a subset of the shadow dataset $D'$. As the attacker knows which portion of $D'$ was provided to each shadow model, the attacker may then observe the shadow models’ behavior in response to instances which were in their training set versus behavior in response to those that were held out.
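The shadow model phase can be sketched as follows. Everything here is a toy stand-in chosen so the example runs with the standard library alone: the two-cluster synthetic data, the `MemorizingModel` class (a nearest-neighbor style scorer that is deliberately overfitted), and the half/half split are assumptions for illustration, not the procedure of [17] or [14]. The point is the bookkeeping: each shadow model is trained on a known subset, and its outputs are recorded together with an "in" or "out" label.

```python
import math
import random

def make_shadow_data(n, rng):
    """Synthetic labeled points: two noisy 2-D clusters, classes 0 and 1."""
    data = []
    for _ in range(n):
        label = rng.randrange(2)
        center = (0.0, 0.0) if label == 0 else (1.0, 1.0)
        x = (rng.gauss(center[0], 0.7), rng.gauss(center[1], 0.7))
        data.append((x, label))
    return data

class MemorizingModel:
    """Toy shadow model: scores each class by the distance to its nearest
    training point, so training points tend to get confident outputs."""
    def fit(self, data):
        self.data = list(data)
        return self

    def predict_proba(self, x):
        scores = []
        for c in (0, 1):
            d = min(math.dist(x, xi) for xi, yi in self.data if yi == c)
            scores.append(-5.0 * d)
        m = max(scores)
        exps = [math.exp(s - m) for s in scores]
        total = sum(exps)
        return [e / total for e in exps]

def train_shadow_models(shadow_data, num_models, rng):
    """Train each shadow model on a random half of the shadow data and record
    its output on members ("in", 1) and on held-out points ("out", 0)."""
    observations = []
    for _ in range(num_models):
        shuffled = shadow_data[:]
        rng.shuffle(shuffled)
        half = len(shuffled) // 2
        train, held_out = shuffled[:half], shuffled[half:]
        model = MemorizingModel().fit(train)
        observations += [(model.predict_proba(x), 1) for x, _ in train]
        observations += [(model.predict_proba(x), 0) for x, _ in held_out]
    return observations

rng = random.Random(0)
shadow_data = make_shadow_data(200, rng)
observations = train_shadow_models(shadow_data, num_models=4, rng=rng)
```

Each entry of `observations` pairs a prediction vector with the membership label the attacker already knows, which is the raw material for the attack dataset.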
II-B2 Generating Attack Datasets and Models
Attackers use the observations of the shadow models to develop an attack dataset $D^*$ which captures the difference between the output generated by the shadow models for instances included in the training data and those previously unseen by the models.
Once the attack dataset $D^*$ has been developed, $D^*$ is used to generate a binary classifier $F_a$ which provides predictions on whether an instance was previously known to a model based on the model’s output from that instance. At attack time this binary classifier may then be deployed against the target model service in a black-box setting. The attack model takes as input prediction vectors of the same structure as those provided by the shadow models and contained within $D^*$, and produces as output a prediction of $0$ or $1$ representing “out” and “in” respectively, with the former indicating an instance that was not in the training dataset of the target model and the latter indicating an instance that was included.
The totality of these two phases, (1) generating shadow data and substitute models and (2) generating attack datasets and models, constitutes the primary process for constructing the membership inference attack.
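The second phase can be sketched in the same spirit. As a simplifying assumption, the binary attack classifier below is a one-feature decision stump on the top predicted probability rather than a full learned model, and the shadow-model observations are hypothetical hand-written values. The names `to_attack_features`, `fit_threshold_attack`, and `attack_model` are illustrative only.

```python
def to_attack_features(prediction_vector):
    """Attack-model input: here just the top predicted probability."""
    return max(prediction_vector)

def fit_threshold_attack(attack_dataset):
    """Pick the confidence threshold that best separates "in" (1) from
    "out" (0) on the attack dataset; a one-feature decision stump."""
    candidates = sorted({conf for conf, _ in attack_dataset})
    best_threshold, best_correct = 0.0, -1
    for t in candidates:
        correct = sum((conf >= t) == bool(label)
                      for conf, label in attack_dataset)
        if correct > best_correct:
            best_threshold, best_correct = t, correct
    return best_threshold

def attack_model(prediction_vector, threshold):
    """Returns 1 ("in") or 0 ("out") for a black-box prediction vector."""
    return int(to_attack_features(prediction_vector) >= threshold)

# Hypothetical shadow-model observations: (prediction vector, in/out label).
observations = [
    ([0.97, 0.02, 0.01], 1), ([0.05, 0.91, 0.04], 1), ([0.10, 0.08, 0.82], 1),
    ([0.55, 0.30, 0.15], 0), ([0.40, 0.45, 0.15], 0), ([0.20, 0.10, 0.70], 0),
]
attack_dataset = [(to_attack_features(p), label) for p, label in observations]
threshold = fit_threshold_attack(attack_dataset)

# At attack time, only the target model's output for the query is needed.
guess = attack_model([0.93, 0.04, 0.03], threshold)
```

A richer attack classifier would consume the whole prediction vector, but the interface stays the same: prediction vector in, membership guess out.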
Attack Accuracy (%) with Noisy Target Data (Target) and with Noisy Shadow Data (Shadow)

| Degree of Noise | Target: CIFAR-10 | Target: Purchases-10 | Target: Purchases-20 | Target: Purchases-50 | Shadow: CIFAR-10 | Shadow: Purchases-10 | Shadow: Purchases-20 | Shadow: Purchases-50 |
|---|---|---|---|---|---|---|---|---|
| 0 | 67.49 | 66.69 | 80.70 | 88.52 | 67.49 | 66.69 | 80.70 | 88.52 |
| 0.1 | 65.37 | 68.72 | 80.40 | 86.38 | 66.85 | 67.37 | 80.23 | 88.81 |
| 0.2 | 63.88 | 66.01 | 77.47 | 85.85 | 65.36 | 66.86 | 81.20 | 88.35 |
| 0.3 | 60.43 | 62.90 | 73.93 | 84.66 | 64.74 | 67.46 | 80.32 | 88.84 |
| 0.4 | 60.48 | 60.07 | 68.23 | 83.21 | 62.64 | 66.91 | 80.14 | 88.63 |
| 0.5 | 58.33 | 58.29 | 64.73 | 79.12 | 60.94 | 67.61 | 80.36 | 88.17 |
| 0.6 | 57.53 | 57.58 | 61.51 | 73.50 | 60.09 | 66.68 | 80.08 | 88.54 |
| 0.7 | 55.97 | 54.94 | 59.78 | 70.43 | 58.92 | 67.67 | 80.83 | 88.58 |
| 0.8 | 55.35 | 54.44 | 58.21 | 67.16 | 58.66 | 66.73 | 80.49 | 88.54 |
| 0.9 | 54.07 | 54.03 | 57.91 | 65.21 | 57.57 | 68.06 | 80.52 | 87.84 |
| 1.0 | 53.95 | 52.72 | 56.02 | 62.44 | 56.55 | 67.32 | 80.43 | 87.70 |
III Characterization of Membership Inference
III-A Impact of Model Based Factors on Membership Inference
The most widely acknowledged factor impacting vulnerability to membership inference attacks is the degree of overfitting in the trained target model. Shokri et al. [17] demonstrate that the more overfitted a DNN model is, the more it leaks under membership inference attacks. Yeom et al. [16] investigated the role of overfitting from both the theoretical and the experimental perspectives. While their results confirm that models become more vulnerable as they overfit more severely, the authors also state that overfitting is not the only factor leading to model vulnerability under the membership inference attack. Truex et al. [14] further demonstrate that several other model based factors also play important roles in causing model vulnerability to membership inference, such as classification problem complexity, in-class standard deviation, and the type of machine learning model targeted.
III-B Impact of Attacker Knowledge on Membership Inference
Another category of factors that may cause model vulnerability to membership inference attacks is the type and scope of knowledge which attackers may have about the target model and its training parameters. For example, Truex et al. [14] identified the impact that attacker knowledge with respect to both the training data of the target model and the target data has on the accuracy of the membership inference attack. This was evaluated by varying the degree of noise in the shadow dataset and target data used by the attacker. Table II shows the experimental results on four datasets with four types of learning tasks. The datasets include the CIFAR-10 dataset, which contains 32×32 color images of different classes of objects, while the Purchases datasets were developed from the Kaggle Acquire Valued Shoppers Challenge dataset containing the shopping history of several thousand individuals. Each instance in the Purchases datasets then represents an individual and each feature represents a particular product. If an individual has a purchase history with this product in the Kaggle Acquire Valued Shoppers Challenge dataset, there will be a 1 for the feature and otherwise a 0. The instances are then clustered into different shopping profile types which are treated as the classes. Table II reports results for Purchases datasets considering 10, 20, and 50 different shopping profile types.
The experiments in Table II demonstrate the impact of the attacker knowledge of the target data points by evaluating how adding varying degrees of noise to data features may impact the success rate of membership inference attacks. Noise is uniformly sampled from a range set by the degree of noise and added to the normalized features. Given a level of uncertainty about or inaccuracy in the target instance $x$ on the part of the attacker, represented by a corresponding degree of noise, Table II evaluates how effective the attacker remains in launching a membership inference attack to identify if $x \in D$ or $x \notin D$, where $D$ is the training dataset of the target model. The results in Table II are reported for four logistic regression models, each one trained on a different dataset, with gradually increasing degrees of noise.

We make two interesting observations from Table II. First, for all four datasets, the more accurate (the less noisy) the attacker knowledge about $x$ is, the higher the model vulnerability (attack success rate) to membership inference. This shows that the accuracy of attacker knowledge about the targeted examples is an important factor in determining model vulnerability and attack success rate (in terms of attack accuracy). Second, in comparison to the noisy target data, adding noise to the shadow dataset results in a less severe drop in accuracy. Similar trends are however still observed, with slightly higher attack success rates under smaller noise values for all four datasets. This set of experiments demonstrates that attackers with different knowledge and different levels of resources may have different success rates in launching a membership inference attack. Thus, model vulnerability should be evaluated by taking into account potential or available attacker knowledge.
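The noise experiment can be sketched as below. The sampling interval is an assumption: the sketch draws noise from a uniform distribution on [0, degree], which may differ from the exact range used in [14]. The second half only restates Table II, comparing the accuracy at noise degree 0 with the accuracy at noise degree 1.0 for each dataset.

```python
import random

def add_uniform_noise(features, degree, rng):
    """Perturb each normalized feature with noise drawn from U(0, degree).
    The interval [0, degree] is an assumption made for this sketch."""
    return [x + rng.uniform(0.0, degree) for x in features]

rng = random.Random(42)
x = [0.0, 1.0, 0.0, 1.0, 1.0]            # e.g. binary purchase indicators
noisy_x = add_uniform_noise(x, 0.5, rng)

# Accuracy (%) at noise 0 and at noise 1.0, copied from Table II.
TABLE_II_ENDPOINTS = {
    "CIFAR-10":     {"clean": 67.49, "noisy_target": 53.95, "noisy_shadow": 56.55},
    "Purchases-10": {"clean": 66.69, "noisy_target": 52.72, "noisy_shadow": 67.32},
    "Purchases-20": {"clean": 80.70, "noisy_target": 56.02, "noisy_shadow": 80.43},
    "Purchases-50": {"clean": 88.52, "noisy_target": 62.44, "noisy_shadow": 87.70},
}
# (drop with noisy target data, drop with noisy shadow data), in points.
drops = {name: (round(r["clean"] - r["noisy_target"], 2),
                round(r["clean"] - r["noisy_shadow"], 2))
         for name, r in TABLE_II_ENDPOINTS.items()}
```

For every dataset the first entry of `drops` exceeds the second, which is the arithmetic behind the observation that noisy target data hurts the attacker more than noisy shadow data.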
III-C Transferability of Membership Inference
Inspired by the transferability of adversarial examples [19], [20], [21], [22], membership inference attacks are also shown to be transferable. That is, an attack model trained on an attack dataset containing the outputs from a set of shadow models is effective not only when the shadow models and the target model are of the same type but also when the shadow model type varies. This property further opens the door to black-box attackers who do not have any knowledge of the target model.
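Purchases-20: attack model type (rows) by shadow model type (columns)

| Attack Model | Shadow: DT | Shadow: kNN | Shadow: LR | Shadow: NB |
|---|---|---|---|---|
| DT | 88.98 | 87.49 | 72.08 | 81.84 |
| kNN | 88.23 | 72.57 | 84.75 | 74.27 |
| LR | 89.02 | 88.11 | 88.99 | 83.57 |
| NB | 88.96 | 78.60 | 89.05 | 66.34 |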
Table III demonstrates this property of membership inference attacks. It reports the membership inference attack accuracy for various attack configurations against a decision tree model trained on the Purchases-20 dataset. It shows the transferability of membership inference attacks for different combinations of four different model types used as the attack model type (rows) and the shadow model type (columns). In this experiment, the target prediction model is a decision tree. We observe that while using a decision tree as the shadow model results in the most consistent membership inference attack success compared to other combinations, multiple combinations with both kNN and logistic regression (LR) shadow models achieve attack success within 5% of the most successful attack configuration using decision tree shadow models. Table III also shows that multiple types of models can be successful as the binary attack classifier $F_a$. This set of experiments also shows that the worst attack performances against the DT target prediction model are seen when the shadow models are trained using Naïve Bayes (NB), with the worst performance reported when NB is the model type of the attack model as well. These results indicate that (1) the same strategy used for selecting the shadow model type may not be optimal for the attack model and (2) shadow models of types other than the target model type may still lead to successful membership inference attacks. We refer readers to [14] for additional detail.
The transferability study in Table III indicates that an attacker does not always need to know the exact target model configuration to launch an effective membership inference attack, as attack models can be transferable from one target model type to another. And although finding the most effective attack strategy can be a challenging task for attackers, vulnerability to membership inference attacks remains serious even with suboptimal attack configurations, with almost all configurations in Table III reporting attack accuracy well above the 50% baseline.
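The pattern in Table III can be checked with a few lines of arithmetic. The sketch below only restates the table values; the summary statistics chosen (the spread of each shadow model column and the count of configurations within 5 points of the best one) are illustrative choices, not measures defined in the paper.

```python
# Attack accuracy (%) from Table III: attack model type -> shadow model type.
TABLE_III = {
    "DT":  {"DT": 88.98, "kNN": 87.49, "LR": 72.08, "NB": 81.84},
    "kNN": {"DT": 88.23, "kNN": 72.57, "LR": 84.75, "NB": 74.27},
    "LR":  {"DT": 89.02, "kNN": 88.11, "LR": 88.99, "NB": 83.57},
    "NB":  {"DT": 88.96, "kNN": 78.60, "LR": 89.05, "NB": 66.34},
}

configs = [(attack, shadow, acc)
           for attack, row in TABLE_III.items()
           for shadow, acc in row.items()]
best = max(configs, key=lambda c: c[2])
worst = min(configs, key=lambda c: c[2])

# Spread (max - min) of accuracy across attack models, per shadow model type.
shadow_types = ["DT", "kNN", "LR", "NB"]
spread = {s: max(TABLE_III[a][s] for a in TABLE_III)
             - min(TABLE_III[a][s] for a in TABLE_III)
          for s in shadow_types}
most_consistent_shadow = min(spread, key=spread.get)

# Configurations within 5 accuracy points of the best one.
near_best = [c for c in configs if c[2] >= best[2] - 5.0]
```

The decision tree column has by far the smallest spread, matching the observation that decision tree shadow models give the most consistent results against a decision tree target, while the weakest configuration pairs NB shadow models with an NB attack model.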
III-D Training Data Skewness on Membership Attacks
The fourth important dimension of membership inference vulnerability is the risk imbalance across different prediction classes when the training data is skewed. Even when the overall membership inference vulnerability appears limited with attack success close to the 50% baseline (in or out random guess), there may be subgroups within the training data, which display significantly more vulnerability.
For example, Figure 2 illustrates the impact of data skewness on membership inference vulnerability. In this set of experiments, we measure the membership inference attack accuracy for a decision tree target model trained on the publicly available Adult dataset [23]. The Adult dataset contains 48,842 instances, each with 14 different features, and presents a binary classification problem wherein one wishes to identify if an individual’s yearly salary is more than $50K or at most $50K. The class distribution, however, is skewed, with only a minority of instances being labeled as more than $50K. As overfitting is widely considered a key factor in membership inference vulnerability, we simulate overfitting by increasing the depth of the target decision tree model. Figure 2 shows the impact of overfitting (increasing along the x-axis) on both the aggregate membership inference vulnerability in terms of membership inference attack accuracy (accuracy over all classes) and the minority membership inference vulnerability (attack accuracy over the minority class). In this case aggregate vulnerability is the accuracy of the membership inference attack evaluated on an equal number of randomly selected examples seen by the target model (“in”) as unseen (“out”), while the minority vulnerability reports the membership inference attack accuracy evaluated on only the subset of the previously selected examples whose class is more than $50K (the minority class).
We observe from Figure 2 that the minority class has an increased risk under the membership inference attack as the model overfits more severely. This follows the intuition that minority class members have fewer other instances amongst whom they can hide in the training set and thus are more easily exposed under membership inference attacks. This is broadly consistent with the observation that smaller training dataset sizes can lead to a greater overall risk for membership inference [17]. We argue that it is important for both data owners and MLaaS providers to consider vulnerability not just for the entire training dataset, but also the level of risk for minority populations specifically when evaluating privacy compliance.
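The distinction between aggregate and minority vulnerability amounts to scoring the same attack on two different subsets. The records below are hypothetical and were written by hand to show how an aggregate figure can hide a much higher minority figure; they are not results from the Adult experiments.

```python
def attack_accuracy(records):
    """records: (class label, attacker guess, true membership) triples,
    with 1 = "in" and 0 = "out" for both guess and membership."""
    return sum(guess == member for _, guess, member in records) / len(records)

# Hypothetical balanced evaluation set: four "in" and four "out" examples.
evaluation = [
    ("<=50K", 1, 1), ("<=50K", 0, 1), ("<=50K", 1, 0), ("<=50K", 0, 0),
    ("<=50K", 0, 1), ("<=50K", 0, 0), (">50K", 1, 1), (">50K", 0, 0),
]

aggregate = attack_accuracy(evaluation)
minority = attack_accuracy([r for r in evaluation if r[0] == ">50K"])
majority = attack_accuracy([r for r in evaluation if r[0] == "<=50K"])
```

Reporting the per-class figures next to the aggregate makes this kind of disparity visible during a privacy compliance evaluation.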
IV Mitigation Strategies and Algorithms
IV-A Differential Privacy
Differential privacy is a formal privacy framework with a theoretical foundation and rigorous mathematical guarantees when effectively employed [10]. A machine learning algorithm is defined to be differentially private if and only if the inclusion of a single instance in the training dataset will cause only statistically insignificant changes to the output of the algorithm. Theoretical limits are set on such output changes in the definition of differential privacy, which is given formally as follows:
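Definition 1 (Differential Privacy [10])
A randomized mechanism $\mathcal{M}$ provides $(\epsilon, \delta)$-differential privacy if for any two neighboring databases $D_1$ and $D_2$ that differ in only a single entry, $\forall S \subseteq \mathrm{Range}(\mathcal{M})$,
$$\Pr(\mathcal{M}(D_1) \in S) \leq e^{\epsilon} \Pr(\mathcal{M}(D_2) \in S) + \delta \qquad (1)$$
If $\delta = 0$, $\mathcal{M}$ is said to satisfy $\epsilon$-differential privacy.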
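As a small sanity check of inequality (1), the sketch below tests it exhaustively for randomized response on a one-entry database. Randomized response is not a mechanism used in this paper; it is chosen here only because its output range is finite, so every subset $S$ can be enumerated. The function names and the truth probability of 0.75 are assumptions for illustration.

```python
import math
from itertools import chain, combinations

def randomized_response_dist(bit, p_truth=0.75):
    """Output distribution of randomized response on a one-entry database."""
    return {bit: p_truth, 1 - bit: 1.0 - p_truth}

def satisfies_dp(dist_a, dist_b, epsilon, delta=0.0):
    """Check Pr[M(D1) in S] <= e^eps * Pr[M(D2) in S] + delta for every
    subset S of a finite output range, in both directions."""
    outputs = sorted(set(dist_a) | set(dist_b))
    subsets = chain.from_iterable(
        combinations(outputs, r) for r in range(len(outputs) + 1))
    bound = math.exp(epsilon)
    for subset in subsets:
        pa = sum(dist_a.get(o, 0.0) for o in subset)
        pb = sum(dist_b.get(o, 0.0) for o in subset)
        # Small tolerance guards against floating-point rounding.
        if pa > bound * pb + delta + 1e-12 or pb > bound * pa + delta + 1e-12:
            return False
    return True

d1 = randomized_response_dist(0)   # database holding the bit 0
d2 = randomized_response_dist(1)   # neighboring database holding the bit 1
holds_at_ln3 = satisfies_dp(d1, d2, epsilon=math.log(3))   # ratio 0.75/0.25 = 3
holds_at_half = satisfies_dp(d1, d2, epsilon=0.5)
```

The worst-case probability ratio between the two neighboring databases is 3, so the inequality holds with $\delta = 0$ exactly when $e^{\epsilon} \geq 3$.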
In the remainder of the paper, we focus on $\epsilon$-differential privacy for presentation convenience. To achieve $\epsilon$-differential privacy (DP), noise defined by $\epsilon$ is added to the algorithm’s output. This noise is proportional to the sensitivity of the output. Sensitivity measures the maximum change of the output due to the inclusion of a single data instance.
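Definition 2 (Sensitivity [10])
For $f: \mathcal{D} \rightarrow \mathbb{R}^d$, the sensitivity of $f$ is
$$\Delta f = \max_{D_1, D_2} \| f(D_1) - f(D_2) \|_2 \qquad (2)$$
for all $D_1$, $D_2$ differing in at most one element.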
The noise mechanism which is used is therefore bounded by both the sensitivity of the function $f$, denoted $\Delta f$, and the privacy parameter $\epsilon$. For example, consider the Gaussian mechanism defined as follows:
Definition 3 (Gaussian Noise Mechanism)
$$\mathcal{M}(D) = f(D) + \mathcal{N}(0, \Delta f^2 \sigma^2)$$
where $\mathcal{N}(0, \Delta f^2 \sigma^2)$ is the normal distribution with mean $0$ and standard deviation $\Delta f \sigma$. A single application of the Gaussian mechanism in Definition 3 to a function $f$ with sensitivity $\Delta f$ satisfies $(\epsilon, \delta)$-differential privacy for $\epsilon < 1$ and suitably large $\delta$; see [24] for the exact condition relating $\delta$, $\sigma$, and $\epsilon$.

Additionally, there exist several useful properties of differential privacy for either multiple iterations of a differentially private function or the combination of multiple different functions wherein each satisfies a corresponding differential privacy guarantee. These composition properties are important for machine learning processes which often involve multiple passes over the training dataset $D$. The formal composition properties of differential privacy include the following:
Definition 4 (Composition properties [24, 25])
Let $f_1, f_2, \ldots, f_n$ be $n$ algorithms, such that for each $i \in [1, n]$, $f_i$ satisfies $\epsilon_i$-DP. Then, the following properties hold:

Sequential Composition: Releasing the outputs $f_1(D), f_2(D), \ldots, f_n(D)$ satisfies $(\sum_{i=1}^{n} \epsilon_i)$-DP.

Parallel Composition: Executing each algorithm on a disjoint subset of $D$ satisfies $\max_i(\epsilon_i)$-DP.

Immunity to Postprocessing: Computing a function of the output of a differentially private algorithm does not deteriorate its privacy, e.g., publicly releasing the output of $f_i(D)$ or using it as an input to another algorithm does not violate $\epsilon_i$-DP.
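The Gaussian mechanism and the composition properties above can be illustrated with a minimal Python sketch. The function names and the toy mean query are assumptions made for illustration; the calibration used is the classical choice $\sigma = \sqrt{2\ln(1.25/\delta)}/\epsilon$, which satisfies the condition stated for the Gaussian mechanism when $\epsilon < 1$.

```python
import math
import random

def gaussian_sigma(epsilon, delta):
    """Noise multiplier for (epsilon, delta)-DP (classical bound, needs epsilon < 1)."""
    if not 0 < epsilon < 1:
        raise ValueError("classical Gaussian calibration assumes 0 < epsilon < 1")
    return math.sqrt(2.0 * math.log(1.25 / delta)) / epsilon

def gaussian_mechanism(value, sensitivity, epsilon, delta, rng=random):
    """Release value plus Gaussian noise with standard deviation sensitivity * sigma."""
    sigma = gaussian_sigma(epsilon, delta)
    return value + rng.gauss(0.0, sensitivity * sigma)

def sequential_budget(epsilons):
    """Total budget when all mechanisms run on the same data."""
    return sum(epsilons)

def parallel_budget(epsilons):
    """Total budget when each mechanism runs on a disjoint subset."""
    return max(epsilons)

# Hypothetical example: a noisy mean of values known to lie in [0, 100].
ages = [23, 35, 41, 29, 52]
true_mean = sum(ages) / len(ages)
sensitivity = 100 / len(ages)  # changing one record moves the mean by at most this
noisy_mean = gaussian_mechanism(true_mean, sensitivity, epsilon=0.5, delta=1e-5)
```

Releasing three such noisy statistics on the same data with budgets 0.1, 0.2, and 0.3 would consume `sequential_budget([0.1, 0.2, 0.3])`, whereas running them on disjoint partitions would consume only the largest of the three.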
Differential privacy can be applied to different types of machine learning models. Due to space constraints, in the rest of the paper we focus specifically on differentially private training of deep neural network (DNN) models.
IV-B Mechanisms for Differentially Private Deep Learning
DNNs are complex, sequentially stacked neural networks containing multiple layers of interconnected nodes. Each node represents the dataset in a unique way and each layer of networked nodes processes the input from previous layers using learned weights and a predefined activation function. The objective in training a DNN is to find the optimal weight values for each node in the multitier networks. This is accomplished by making multiple passes over the entire dataset with each pass constituting one epoch. Within each epoch, the entire dataset is partitioned into many minibatches of equal size and the algorithm processes these batches sequentially, each including only a subset of the data. When processing one batch, the data is fed forward through the network using the existing weight values. A predefined loss function is computed for the errors made by the neural network learner with respect to the current batch of data. An optimizer, such as stochastic gradient descent (SGD), is then used to propagate these errors backward through the network. The weights are then updated according to the errors and the learning rate set by the training algorithm. The higher the learning rate value, the larger the update made in response to the backward propagation of errors.
Differentially private deep learning can be implemented by adding a small amount of noise to the updates made to the network such that there is only a marginal difference between the following two scenarios: (1) when a particular individual is included within the training dataset and (2) when the individual is absent from the training dataset. The noise added to the updates is sampled from a Gaussian distribution with scale determined by an appropriate noise parameter $\sigma$ corresponding to a desired level of privacy and controlled sensitivity. That is, the privacy budget $\epsilon$, according to Definition 1, should constrain the value of $\sigma$ at each epoch. Let $E$ denote the number of epochs for the DNN training, a predefined hyperparameter set in the training configuration as the termination condition. Let one epoch satisfy $(\epsilon_i, \delta_i)$-differential privacy given $\sigma$ from the Gaussian mechanism in Definition 3. Then, a traditional accounting using the composition properties of differential privacy (Definition 4) would dictate that $E$ epochs result in an overall privacy guarantee of $(\epsilon, \delta)$ if $\epsilon = E\epsilon_i$, $\delta = E\delta_i$, and each epoch employs the Gaussian mechanism with the same value $\sigma$. We refer to this approach as the fixed noise perturbation method [12]. An alternative approach proposed by Yu et al. in [13] advocates variable noise perturbation, which uses a decaying function to manage the total privacy budget $\epsilon$ and defines a variable noise scale based on different settings of $\sigma$ for each epoch ($\sigma_1, \ldots, \sigma_E$), aiming to add a decreasing amount of noise as the training progresses. Thus, we have $\sigma_i \neq \sigma_j$ for $i \neq j$, and $\sigma_i$ for a given epoch $i$ is bounded by its allocated privacy budget $\epsilon_i$. The same overall privacy guarantee is met when $\sum_{i=1}^{E} \epsilon_i = \epsilon$ for the differentially private DNN training of $E$ epochs.
IV-C Differentially Private Deep Learning with Fixed $\sigma$
The first differentially private approach to training deep neural networks was proposed by Abadi et al. [12] and implemented on the TensorFlow deep learning framework [26]. A summary of their approach is given in Algorithm 1. To apply differential privacy, the sensitivity of each epoch is bounded by a clipping value $C$, specifying that an instance may impact weight updates by at most the value $C$. To achieve differential privacy, weight updates at the end of each batch include noise injection according to the sensitivity defined by $C$ and the scale of noise $\sigma$. The choice of $\sigma$ is directly related to the overall privacy guarantee.
Let $\sigma = \sqrt{2 \log(1.25/\delta)}/\epsilon$; then, according to differential privacy theory [24], each step (processing of a batch) is $(\epsilon, \delta)$-differentially private. If the batch is randomly sampled from $D$ with sampling ratio $q$, then additional properties of random sampling [27, 28] may be applied. Each step then becomes $(O(q\epsilon), q\delta)$-differentially private. The moments accountant privacy accounting method is also introduced in [12] to prove that Algorithm 1 is differentially private given appropriate parameter settings.
We refer to Algorithm 1 as the fixed noise perturbation approach as each epoch is treated equally by introducing the same noise scale to every parameter update.
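The clip-then-noise update described above can be sketched in a few lines of Python. This is a minimal illustration of the general recipe rather than a reproduction of Algorithm 1: the function names, the plain-list gradient representation, and the choice to add noise to the summed gradient before averaging are assumptions.

```python
import math
import random

def clip_gradient(grad, clip_norm):
    """Scale one example's gradient so its L2 norm is at most clip_norm."""
    norm = math.sqrt(sum(g * g for g in grad))
    scale = 1.0 / max(1.0, norm / clip_norm)
    return [g * scale for g in grad]

def dp_sgd_step(weights, per_example_grads, lr, clip_norm, sigma, rng=random):
    """One noisy update: clip each example's gradient, sum, add noise, average."""
    dim = len(weights)
    total = [0.0] * dim
    for grad in per_example_grads:
        clipped = clip_gradient(grad, clip_norm)
        for j in range(dim):
            total[j] += clipped[j]
    batch_size = len(per_example_grads)
    # Gaussian noise with standard deviation sigma * clip_norm per coordinate.
    noisy_avg = [
        (total[j] + rng.gauss(0.0, sigma * clip_norm)) / batch_size
        for j in range(dim)
    ]
    return [w - lr * g for w, g in zip(weights, noisy_avg)]

# Hypothetical two-parameter model and a batch of two per-example gradients.
new_weights = dp_sgd_step([0.0, 0.0], [[3.0, 4.0], [0.3, 0.4]],
                          lr=0.1, clip_norm=1.0, sigma=1.1)
```

Clipping is applied per example so that no single instance can move the summed gradient by more than `clip_norm`, which is what lets the noise scale be tied to that bound.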
IV-D Differentially Private Deep Learning with Variable $\sigma$
The variable noise perturbation approach to differentially private deep learning is proposed by Yu et al. [13]. It extends the fixed noise scale of $\sigma$ over the total $E$ epochs in [12] by introducing two new capabilities. First, Yu et al. [13] pointed out a limitation of the approach outlined in Algorithm 1. Namely, Algorithm 1 specifically calls for random sampling, wherein each batch is selected randomly with replacement. However, the most popular implementation for partitioning a dataset into minibatches in many deep learning frameworks is random shuffling, wherein the dataset is shuffled and then partitioned into evenly sized batches. In order to develop a differentially private DNN model under random shuffling, Yu et al. [13] extend Algorithm 1 of [12] by introducing a new privacy accounting method.
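The difference between the two batching strategies can be made concrete with a short Python sketch. The helper names are hypothetical, and the sampling variant shown draws each batch independently from the full index set, which is one simple reading of random sampling; other sampling schemes exist.

```python
import random

def sampled_batches(n, batch_size, num_batches, rng):
    """Random sampling: every batch is drawn independently from all n indices,
    so an index may land in several batches of an epoch, or in none."""
    return [rng.sample(range(n), batch_size) for _ in range(num_batches)]

def shuffled_batches(n, batch_size, rng):
    """Random shuffling: permute once, then cut into consecutive batches,
    so every index appears exactly once per epoch."""
    order = list(range(n))
    rng.shuffle(order)
    return [order[i:i + batch_size] for i in range(0, n, batch_size)]

rng = random.Random(0)
by_sampling = sampled_batches(n=10, batch_size=5, num_batches=2, rng=rng)
by_shuffling = shuffled_batches(n=10, batch_size=5, rng=rng)
```

Because shuffling guarantees that each instance is touched exactly once per epoch, the amplification argument that relies on independent random sampling does not carry over directly, which is why a separate accounting method is needed.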
Additionally, Yu et al. [13] analyze the problem of using a fixed noise scale (fixed $\sigma$ values), and propose applying different noise scales to the weight updates at different stages of the training process. Specifically, Yu et al. [13] propose a set of methods for privacy budget allocation, which improve model accuracy by progressively reducing the noise scale as the training progresses. The variable noise scale approach is inspired by two observations. First, as the training progresses, the model begins to converge, causing the noise being introduced to the updates to potentially become more impactful. This slows down the rate of model convergence and causes later epochs to no longer increase model accuracy compared to nonprivate scenarios. Second, the research on improving training accuracy and convergence rate of DNN training has led to a new generation of learning rate functions that replace the constant learning rate baseline by decaying learning rate functions and cyclic learning rates [29, 30]. Yu et al. [13] employed a similar set of decay functions to add noise at a decreased scale. That is, the noise defined by $\sigma_j$ at the $j$-th epoch as the training progresses is less than $\sigma_i$ at the $i$-th epoch given $i < j$. The performance of four different types of decay functions, which introduce a variable noise scale by partitioning the privacy budget over the epochs, was evaluated.
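As an illustration of what a decaying noise schedule looks like, the following Python sketch implements two generic decay shapes, exponential and step decay. The functional forms, parameter names, and constants are assumptions borrowed from common learning rate schedules; they are not claimed to be the exact functions or settings evaluated in [13].

```python
import math

def exponential_decay(sigma0, k, epoch):
    """Noise scale shrinks smoothly: sigma0 * exp(-k * epoch)."""
    return sigma0 * math.exp(-k * epoch)

def step_decay(sigma0, drop, period, epoch):
    """Noise scale is multiplied by drop once every period epochs."""
    return sigma0 * (drop ** (epoch // period))

# Hypothetical schedule over five epochs starting from sigma0 = 8.0.
schedule = [exponential_decay(8.0, 0.05, t) for t in range(5)]
```

A smaller noise scale in later epochs spends more privacy budget per epoch, so any such schedule has to be paired with a privacy accountant that confirms the total stays within the overall budget.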
IV-E Important Implementation Factors
IV-E1 Choosing Epsilon
In differentially private algorithms, the $\epsilon$ value dictates the amount of noise which must be introduced into the DNN model training and therefore the theoretical privacy bound. Choosing the correct $\epsilon$ value requires a careful balance between the privacy leakage that is tolerable in the practical setting and the tolerable utility loss.
For example, Naldi and D’Acquisto propose a method in [31] for finding the optimal $\epsilon$ value to meet a certain accuracy for Laplacian mechanisms. Lee and Clifton alternatively employ the approach in [32] to analyze a particular adversarial model. Hsu et al. [33], on the other hand, take an individual’s perspective on data privacy by opt-in incentivization. Kohli and Laskowski [34] also promote choosing an $\epsilon$ based on individual privacy preferences.
Despite these existing approaches, determining the “right” $\epsilon$ value remains a complex problem and is likely to be highly dependent on the privacy policy of the organization that owns the model and the dataset, the vulnerability of the model, the sensitivity of the data, and the tolerance to utility loss in the given setting. Additionally, there might be scenarios, such as the healthcare setting, in which even small degrees of utility loss are intolerable and are combined with stringent privacy constraints given highly sensitive data. In these cases, it may be hard or even impossible to find a good $\epsilon$ value for existing differentially private DNN training techniques.
IV-E2 The Role of Transfer Learning
Another key consideration is the use of transfer learning for dealing with more complex datasets [12, 13]. For example, model parameters may be initialized by training on a nonprivate, similar dataset. The private dataset is then only used to further hone a subset of the model parameters. This helps to reduce the number of parameters affected by the noise addition required by differential privacy. The use of transfer learning, however, relies on a strong assumption that such a nonprivate, similar dataset exists and is available. In many cases, this assumption may be unrealistic.
IV-E3 Parameter Optimization
In addition to the privacy budget parameters $(\epsilon, \delta)$, differentially private deep learning introduces influential privacy parameters, such as the clipping value $C$ that bounds the sensitivity for each epoch and the noise scale approach (including potential decay parameters) for $\sigma$. The settings of these privacy parameters may impact both the training convergence rate, and thus the training time, and the training and testing accuracy. However, tuning these privacy parameters becomes much more complex as we need to take into account the many learning hyperparameters already present in DNN training, such as the number of epochs, the batch size, the learning rate policy, and the optimization algorithm. These hyperparameters need to be carefully configured for high performance training of deep neural networks in a nonprivate setting. For deep learning with differential privacy, one needs to configure the privacy parameters by considering the side effect on other hyperparameters and reconfigure the previously tuned learning hyperparameters to account for the privacy approach.
For example, for fixed total privacy budget values $(\epsilon, \delta)$, too small a number of epochs may result in reduced accuracy due to insufficient time to learn. A higher number of epochs, however, will result in a higher $\sigma$ value required in each epoch (recall Algorithm 1). Therefore, a carefully tuned hyperparameter for a nonprivate setting, such as the optimal number of epochs, may no longer be effective when differentially private deep learning is enabled.
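The tension between the number of epochs and the per-epoch noise can be illustrated numerically. The sketch below assumes the naive sequential accounting, in which a fixed total budget is split evenly across epochs, together with the classical Gaussian calibration; tighter accountants such as the moments accountant would give smaller values, and the function name and the treatment of `delta` as a per-epoch value are simplifying assumptions.

```python
import math

def per_epoch_sigma(total_epsilon, delta, epochs):
    """Noise multiplier per epoch when total_epsilon is split evenly
    across epochs under naive sequential composition."""
    eps_i = total_epsilon / epochs
    return math.sqrt(2.0 * math.log(1.25 / delta)) / eps_i

# Hypothetical budgets: the required noise grows linearly with the epoch count.
for epochs in (10, 50, 100):
    print(epochs, round(per_epoch_sigma(1.0, 1e-5, epochs), 1))
```

Under these assumptions, doubling the number of epochs doubles the noise that must be added in every epoch, which is why an epoch count tuned for nonprivate training may not transfer.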
A number of challenging questions remain open problems in differentially private DNN training, such as at what point do more epochs lead to accuracy loss under a particular privacy setting? What is the right noise decaying function for effective deep learning with differential privacy? Can we learn privately with high accuracy given complex datasets? The balance of the many parameters in a differentially private deep learning system presents new challenges to practitioners.
With these questions in mind, we develop MPLens, a membership privacy analysis system, which facilitates the evaluation of model vulnerability against membership inference attacks. Through MPLens, we investigate the effectiveness of differential privacy as a mitigation technique against membership inference attacks, including the tradeoffs of implementing such a mitigation strategy for preventing membership inference and the impact of differential privacy on different classes for the DNN models trained using skewed training datasets. We also highlight how the vulnerability of pretrained models under the membership inference attack is not uniform when the training data itself is skewed with minority populations. We show how this vulnerability variation may cause increased risks in federated learning systems.
V MPLens: System Overview
MPLens is designed as a privacy analysis and privacy compliance evaluation system for both data scientists and MLaaS providers to understand the membership inference vulnerability risk involved in their model training and model prediction process. Figure 3 provides an overview of the system architecture. The system allows providers to specify a set of factors that are critical to privacy analysis. Example factors include the data used to train their model, what data might be held by the attacker, what attack technique might be used, the degree of data skewness in the training set, whether the prediction model is constructed using differentially private model training, what configurations are used for the set of differential privacy parameters, and so forth. Given the model input, the MPLens evaluation system reports the overall statistics on the vulnerability of the model, the per-class vulnerability, as well as the vulnerability of any sensitive populations such as specific minority groups. Example statistics include attack accuracy, precision, recall, and F1 score. We also include attacker confidence for true positives, false positives, true negatives, and false negatives, the average distance from both the false positives and the false negatives to the training data, and the time required to execute the attack.
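The headline statistics listed above follow directly from the attack's confusion matrix. The following Python sketch shows one way to compute them; the function name and the boolean encoding (True meaning "in the training set") are assumptions, not the MPLens interface.

```python
def attack_metrics(predicted_member, is_member):
    """Accuracy, precision, recall, and F1 of membership predictions.
    Both arguments are equal-length sequences of booleans (True = 'in')."""
    pairs = list(zip(predicted_member, is_member))
    tp = sum(1 for p, y in pairs if p and y)
    fp = sum(1 for p, y in pairs if p and not y)
    fn = sum(1 for p, y in pairs if not p and y)
    tn = sum(1 for p, y in pairs if not p and not y)
    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}

# Hypothetical attack output on four query instances.
report = attack_metrics([True, True, False, False],
                        [True, False, True, False])
```

When the evaluation set contains equally many members and nonmembers, an accuracy of 0.5 corresponds to random guessing, so values are read relative to that baseline.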
V-A Target Model Training
Overfitting is the first factor that MPLens measures for conducting membership vulnerability analysis. MPLens specifically highlights the overfitting analysis by reporting the Accuracy Difference between the target model training accuracy and testing accuracy. This enables MLaaS providers and domain-specific data scientists to understand whether their vulnerability might be linked to overfitting. As previously indicated, while overfitting is strongly correlated with membership inference vulnerability, it is not the only source of vulnerability. Thus, when MPLens indicates undesirable vulnerability in the absence of significant overfitting, analysis may be triggered to investigate whether the vulnerability is linked to other model or data characteristics such as those discussed in Section III.
V-B Attacker Knowledge
Our MPLens system is by design customizable to understand multiple attack scenarios. For instance, users may specify the shadow data, which is used by the attacker. This allows the user to consider a scenario in which the attacker has access to some subset of the target model’s training data, one where the attacker has access to some data that are drawn from the same distribution as the training data of the target model, or one where the attacker has noisy and inaccurate data, such as that evaluated in Table II and possibly generated through blackbox probing [17, 14].
The MPLens system is additionally customizable with respect to the attack method, including the shadow-model-based attack techniques [17] and the threshold-based attack techniques [16]. Furthermore, when using a threshold-based attack, our MPLens system can either accept predetermined values representing attacker knowledge of the target model error or determine good threshold values through shadow model training.
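A threshold-based attack of the kind mentioned above can be sketched as follows. This minimal Python example assumes the attacker compares the target model's cross-entropy loss on a query instance against a threshold, and that the threshold is either supplied directly or estimated as the average loss on shadow training data. Both the decision rule and the function names are illustrative assumptions rather than the exact procedure of [16] or of MPLens.

```python
import math

def cross_entropy(prob_vector, true_label):
    """Loss of a single prediction vector with respect to its true class."""
    return -math.log(max(prob_vector[true_label], 1e-12))

def threshold_attack(prob_vector, true_label, loss_threshold):
    """Predict 'in' (True) when the per-instance loss is at most the threshold."""
    return cross_entropy(prob_vector, true_label) <= loss_threshold

def threshold_from_shadow(shadow_outputs):
    """Estimate a threshold as the mean loss over shadow training instances,
    given as a list of (prob_vector, true_label) pairs."""
    losses = [cross_entropy(p, y) for p, y in shadow_outputs]
    return sum(losses) / len(losses)

# Hypothetical queries against a two-class target model.
confident = threshold_attack([0.9, 0.1], true_label=0, loss_threshold=0.2)
uncertain = threshold_attack([0.6, 0.4], true_label=0, loss_threshold=0.2)
```

The rule encodes the intuition that a model tends to incur lower loss on instances it was trained on than on unseen instances.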
These customizations allow MLaaS providers and users of MPLens to specify the types of attackers they wish to analyze, analyze their model vulnerability against such attackers, and evaluate the privacy compliance of their model training and model prediction services.
V-C Transferability
Another aspect of privacy analysis is related to the specific model training methods used to generate membership inference attacks and whether different methods result in significant variations in membership inference vulnerability. Consider the attack method from [17]: the user can specify not only the attacker’s data but also the shadow model training algorithm and the membership inference binary classifier training algorithm. Each element is customizable as a system parameter when configuring MPLens for a specific privacy risk evaluation. The MPLens system makes no assumption on how the attacker develops the shadow dataset, what knowledge is included in the data, whether the attacker has knowledge of the target model algorithm, or what attack technique is used. This flexibility allows MPLens to support evaluation across various transferable attack configurations.
VI Experimental Results and Analysis
VI-A Datasets
All experiments reported in this section were conducted using the following four datasets.
CIFAR10
The CIFAR10 dataset contains 60,000 color images [35] and is publicly available. Each image is formatted to be 32 x 32. The CIFAR10 dataset contains 10 classes with 6,000 images each: airplane, automobile, bird, cat, deer, dog, frog, horse, ship, and truck. The problem is therefore a 10-class classification problem where the task is to identify which class is depicted in a given image.
CIFAR100
Also publicly available, CIFAR100 similarly contains 60,000 color images [35] formatted to 32 x 32. It represents 100 classes, ranging from various animals to pieces of household furniture and types of vehicles. Each class has 600 available images. The problem is therefore a 100-class image classification problem.
MNIST
MNIST is a publicly available dataset containing 70,000 images of handwritten digits [36]. Each image is formatted to be 32 x 32 and processed such that the digit is at the center of the image. The MNIST dataset constitutes a 10-class classification problem where the task is to identify which digit between 0 and 9, inclusive, is contained within a given image.
Labeled Faces in the Wild
The Labeled Faces in the Wild (LFW) database contains face photographs for unconstrained face recognition with more than 13,000 images of faces collected from the web. Each face has been labeled with the name of the person pictured. 1,680 of the people pictured have two or more distinct photos in the data set. Each person is then labeled with a gender and race (including mixed races). Data is then selected for the top 22 classes which were represented with a sufficient number of data points.
VI-B Membership Inference Risk: Adversarial Examples
Given a target model and its training data, the membership inference attack, using blackbox access to the model prediction API, can be used to create a representative dataset. This representative dataset can be leveraged to generate substitute models for the given target model. One can then use such substitute models to generate adversarial examples using different adversarial attack methods [37].
Figures 77 provide visualization plots for the comparison of 2D PCA given the images of the dog and truck classes in CIFAR10. The plots are divided relative to the membership inference attack prediction output including the true target model training data (Figure 7), the data predicted by the membership inference attack as training data (Figure 7), and the data predicted as nontraining data by the membership inference attack (Figure 7).
These plots illustrate the accuracy of the distribution of instances predicted as in the target model’s training dataset through membership inference. They clearly demonstrate how even with the inclusion of false positives, an attacker can create a good representation of the training data distribution, particularly compared to those instances not predicted to be in the target training data. An attacker can therefore easily train a substitute model on this representative dataset which then enables the generation of adversarial examples by attacking this substitute model. Examples developed to successfully attack a substitute model trained on the instances in Figure 7 are likely to also be successful against a model trained on the instances in Figure 7.
VI-C Membership Inference Risk: Data Skewness
To date, the membership inference attack has been studied either using training datasets with uniform class distribution or without specific consideration of the impact of any data skewness. However, as we demonstrated earlier, the risk of membership inference vulnerability can vary when class representation is skewed. Minority classes can display increased risk to membership inference attack as models struggle to more effectively generalize past the training data when fewer instances are given. In this section, we focus on studying the impact of data skewness on vulnerability to membership inference attacks.
In Figure 7 we investigate the impact of data skewness on membership inference vulnerability by controlling the representation of a single class. We reduce the automobile images from the CIFAR10 dataset to only 1% of the data and then increase the representation until the dataset is again balanced with automobiles representing 10% of the training images. We then plot the aggregate membership inference vulnerability which is the overall membership inference attack accuracy evaluated across all classes as well as the vulnerability of just the automobile class.
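The controlled skew described above can be reproduced with a small subsampling helper. The Python sketch below keeps all instances of the other classes and samples just enough instances of the chosen class to reach a target share of the resulting dataset; the function name and the toy label list are assumptions for illustration.

```python
import random

def skew_class(labels, target_class, target_fraction, rng):
    """Return sorted indices of a subset in which target_class makes up
    roughly target_fraction of the data; other classes are kept in full."""
    keep_other = [i for i, y in enumerate(labels) if y != target_class]
    candidates = [i for i, y in enumerate(labels) if y == target_class]
    # Solve k / (k + len(keep_other)) = target_fraction for k.
    k = round(target_fraction * len(keep_other) / (1.0 - target_fraction))
    k = min(k, len(candidates))
    return sorted(keep_other + rng.sample(candidates, k))

# Hypothetical balanced dataset: 10 classes with 100 instances each.
labels = [c for c in range(10) for _ in range(100)]
subset = skew_class(labels, target_class=1, target_fraction=0.01,
                    rng=random.Random(0))
```

Sweeping `target_fraction` from 0.01 up to 0.10 on a balanced ten-class dataset moves the chosen class from a small minority back to parity with the other classes.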
Figure 7 demonstrates that in cases where the automobile class constitutes 5% or less of the total training dataset, i.e. the automobile class has fewer than half as many instances as each of the other classes, this skewness will result in the automobile class displaying more severe vulnerability to membership inference attack.
Interestingly, the automobile class displays lower membership inference vulnerability than the average vulnerability reported in the CIFAR10 dataset when the dataset is balanced. However, when the automobile class becomes a minority class with fewer than half the instances of each of the other classes, the vulnerability shifts to be greater than that reported by the overall model. This gap becomes greater as continued decreased representation results in continued increased vulnerability for the automobile class.
Table IV: Membership inference attack accuracy by target population (LFW dataset).
Target Population | Attack Accuracy (%)
Aggregate | 70.14
Male Images | 68.18
Female Images | 76.85
White Race Images | 62.77
Racial Minority Images | 89.90
Table IV additionally shows the vulnerability of a DNN target model trained on the LFW dataset to membership inference attacks. We analyze this vulnerability by breaking down the aggregated vulnerability across the top 22 classes into four different (nondisjoint) subsets of the LFW dataset: Male, Female, White Race, and Racial Minority. We observe that the training examples of racial minorities experience the highest attack success rate (89.90%) and are thus highly vulnerable to membership inference attacks compared to images of white individuals. Similarly, female images, which make up a minority of the training data, demonstrate higher average vulnerability (76.85%) compared with images of males (68.18%).
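A breakdown of this kind amounts to computing attack accuracy over overlapping subsets of the same evaluation set. The following Python sketch shows one way to do this with boolean masks; the function name, group names, and toy data are hypothetical and do not reproduce the reported numbers.

```python
def subgroup_attack_accuracy(predicted_member, is_member, groups):
    """Attack accuracy (%) per subgroup. groups maps a subgroup name to a
    list of booleans marking which instances belong to it; subgroups may
    overlap, so an instance can count toward several of them."""
    results = {}
    for name, mask in groups.items():
        hits = [p == y
                for p, y, m in zip(predicted_member, is_member, mask) if m]
        results[name] = 100.0 * sum(hits) / len(hits) if hits else None
    return results

# Hypothetical attack output on four query images.
breakdown = subgroup_attack_accuracy(
    predicted_member=[True, False, True, True],
    is_member=[True, True, True, False],
    groups={
        "aggregate": [True, True, True, True],
        "female": [True, True, False, False],
        "minority": [False, False, True, False],
    },
)
```

Reporting the aggregate alongside each subgroup makes it visible when a moderate overall attack accuracy hides a much higher accuracy on a small population.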
To provide deeper insight and more intuitive illustration for the increased vulnerability of minority groups under membership inference attacks, Table V provides 7 individual examples of images in the LFW dataset targeted by the membership inference attack. That is, given a query with each example image, the target model predicts the individual’s race and gender (22 separate classes) which the attack model, a binary classifier, uses to predict if that image was “in” or “out” of the target model’s training dataset. The last row reports the ground truth as to whether or not the image was in the target model training set.
Table V legend: ✓ = correct prediction, ✗ = wrong prediction
Image | 1 | 2 | 3 | 4 | 5 | 6 | 7
Target Confidence (%) | ✓ 99.99 | ✓ 65.81 | ✓ 72.56 | ✗ 62.30 | ✓ 99.99 | ✗ 99.63 | ✓ 98.38
Attacker Confidence (%) | ✓ 86.10 | ✓ 50.49 | ✗ 61.85 | ✓ 72.06 | ✓ 56.40 | ✓ 99.88 | ✗ 53.29
In Training Data? | in | out | out | out | in | out | out
Through Table V, we highlight how minority populations are more likely to be identified by an attacker with a higher degree of confidence. We next discuss each example from left to right to articulate the impact of data skewness on model vulnerability to the membership inference attack.
For the 1st image, the target model is highly confident with its prediction and its prediction is indeed correct. Using the membership inference attack model, the attacker predicts that the model must have seen this example with high confidence and it succeeds in the membership inference attack.
For the 2nd image, the target model is less confident with its prediction, although the prediction outcome is correct. The attacker succeeds in the membership inference attack because it correctly predicts that this example is not in the training set, though the attacker’s confidence in this membership inference is much lower (close to 50%) than for the attack on the 1st image. We conjecture that the relatively low prediction confidence of the target model likely contributes to the attacker being unable to obtain high confidence for the membership inference attack.
The 3rd image is predicted by the target model correctly with a confidence of 72.56%, which is about 11.5% more confidence than that for the 2nd image. However, the attacker wrongly predicts that the example is in the training set when the ground truth shows that this example is not in the training set. Applying the same logic as with the 1st image, i.e., that a confident and accurate target model prediction indicates that the image was in the training dataset, could have misled the attacker.
For the 4th image, the target model has an incorrect prediction with the confidence of 62.30%. The attacker correctly predicts that this example is not in the training set. It is clear that a somewhat confident and yet incorrect prediction by the target model is likely to result in high attacker confidence that this minority individual has not been seen during the training.
These four images highlight the compounding downfall for minority populations. Models are more likely to overfit these populations. This leads to poor test accuracy for these populations and makes them more vulnerable to attack. As the 3rd example shows, the way to fool attackers is to have an accurate target model that can show reasonable confidence when classifying minority test images.
We next compare the results from the previous four images which represented minority classes with results from three images representing the majority class (white male images). For the 5th image in Table V, the target model predicts correctly with high confidence and the attacker is able to correctly predict that the image was in the training data. This result can be interpreted through comparison with the attacker performance for the 1st image. For the 5th query image which is from the majority group, the attacker predicts correctly that the image has been seen in training with barely over 50% in confidence, showing relatively high uncertainty compared with the 1st image from a minority class. This indicates that model accuracy and confidence are weaker indicators with respect to membership inference vulnerability for the majority class.
For the 6th image, the target model produces an incorrect prediction with high confidence. The attacker is very confident that the query image is not in the training dataset, which is indeed the truth. This demonstrates a rare potential vulnerability for the majority class: when the target model has high confidence in an inaccurate prediction, an attack model is able to confidently succeed in the membership inference attack. Through this example and the above analysis, we see that the majority classes have two advantages compared to the minority classes: (1) it is less common for the target model to demonstrate this vulnerability of misclassification with high confidence; and (2) for majority classes, accuracy and privacy are aligned rather than being competing objectives.
For the 7th image, the target model makes a correct prediction with high confidence. The attacker makes the incorrect prediction that the example was in the training dataset of the target model, when in truth the example is not in the training set. This membership attack failed and is the flip side of the 5th image, with both reporting low attack confidence. Again, a comparison with the 1st image, which is from a minority class, shows how model confidence and accuracy may lead to membership inference vulnerability for the minority classes in a way that is not true for the majority classes.
VI-D Mitigation with Differentially Private Training
The second core component of our experimental analysis is to use MPLens to investigate the effectiveness of differential privacy applied to deep learning models as a countermeasure for membership inference mitigation.
To define utility loss we follow [38] and consider $\mathrm{Loss} = 1 - \frac{acc_{p}}{acc_{np}}$, where $acc_{np}$ represents accuracy in a nonprivate setting and $acc_{p}$ is the accuracy when differential privacy is employed for the same model and data.
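As a small worked example, the Python sketch below computes utility loss under the assumption that it is the relative accuracy drop of the private model with respect to the nonprivate one, $1 - acc_{p}/acc_{np}$. The function name and the sample accuracies are hypothetical.

```python
def utility_loss(acc_nonprivate, acc_private):
    """Relative accuracy drop of the private model versus the nonprivate
    model (assumed definition: 1 - acc_private / acc_nonprivate)."""
    if acc_nonprivate <= 0:
        raise ValueError("nonprivate accuracy must be positive")
    return 1.0 - acc_private / acc_nonprivate

# Hypothetical accuracies: 0.80 without privacy, 0.60 with privacy.
loss = utility_loss(0.80, 0.60)
```

Under this definition a value of 0 means the private model matches the nonprivate one, and larger values mean that more of the nonprivate accuracy has been given up.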
VI-D1 Model and Problem Complexity
II Membership Inference Attacks
Attackers conducting membership inference attacks seek to identify whether or not an individual is a member of the dataset used to train a particular target machine learning model. We discuss the definition and the generation of membership inference attacks in this section, which will serve as the basic reference model of membership inference.
II-A Attack Definition
In studying membership inference attacks there are two primary sets of processes at play: (1) the training, deployment, and use of the machine learning model which the attacker is targeting for inference and (2) the development and use of the membership inference attack. Each of these two elements has guiding predefined objectives impacting respective outputs.
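II-A1 Machine Learning Model Training and Prediction
The training of and prediction using the machine learning model which the attacker is targeting may be formalized as follows. Consider a dataset $D$ comprised of $n$ training instances $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$ with each instance $x_i$ containing $m$ features, denoted by $x_i = (x_{i,1}, x_{i,2}, \ldots, x_{i,m})$, and a class value $y_i \in \mathbb{Z}_k$, where $k$ is a finite integer $\geq 2$. Let $F_t: \mathbb{R}^m \to \mathbb{R}^k$ be the target model trained using this dataset $D$. $F_t$ is then deployed as a service such that users can provide a feature vector $x \in \mathbb{R}^m$, and the service will then output a probability vector $p \in \mathbb{R}^k$ of the form $p = (p_1, p_2, \ldots, p_k)$, where $p_i \in [0, 1]$ for all $i$, and $\sum_{i=1}^{k} p_i = 1$. The prediction class label $y$ according to $F_t$ for a feature vector $x$ is the class with highest probability value in $p$. Therefore $y = \arg\max_{i \in \mathbb{Z}_k} F_t(x)$.
II-A2 Membership Inference Definition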
Given some level of access to the trained model $F_t$, the attacker conducts his or her own training to develop a binary classifier which serves as the membership inference attack model. The most limited access environment in which an attacker may conduct the membership inference attack is the blackbox access environment. That is, an environment wherein the attacker may only query the target model through some machine learning as a service API and receive only the corresponding prediction vectors.
Let us consider an attacker with such blackbox access to $F_t$. Given only a query input $x$ and output $F_t(x)$ from some target model $F_t$ trained using a dataset $D$, the membership inference attacker attempts to identify whether or not $x \in D$.
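The blackbox setting described above can be sketched as a narrow interface: the attacker submits a feature vector and sees only the returned probability vector. In the Python example below, `toy_target` is a hypothetical stand-in for a deployed target model and is not a trained network; the other helper names are likewise illustrative.

```python
import math

def softmax(scores):
    """Turn raw scores into a probability vector that sums to 1."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def predict_label(prob_vector):
    """Predicted class: the index with the highest probability."""
    return max(range(len(prob_vector)), key=lambda i: prob_vector[i])

def toy_target(x):
    """Hypothetical three-class stand-in for a deployed target model."""
    return softmax([sum(x), -sum(x), 0.0])

def black_box_query(target_model, x):
    """Everything a blackbox attacker observes for one query."""
    return target_model(x)

p = black_box_query(toy_target, [1.0, 2.0])
label = predict_label(p)
```

The membership inference attack model then takes a vector such as `p` as input and outputs a binary "in" or "out" decision, without ever seeing the target model's parameters or training data.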
Dataset | Accuracy of Membership Inference (%)
Adult | 59.89
MNIST | 61.75
CIFAR10 | 90.44
Purchases10 | 82.29
Purchases20 | 88.98
Purchases50 | 93.71
Purchases100 | 95.74
Table I: Attack accuracies targeting decision tree models. Baseline accuracy against which to compare results is 50%.
Many different datasets and model types have demonstrated vulnerability to membership inference attacks in blackbox settings. Table I reports accuracy results for blackbox attackers targeting decision tree models for problems ranging from binary classification (Adult) to 100-class classification (Purchases100). We note that all experiments evaluated the attack model against an equal number of instances in the target training dataset $D$ as those not in $D$. The baseline membership inference accuracy is therefore 50%. We refer readers to [14] for more details on these datasets and the experimental setup.
These results demonstrate both the viability of membership inference attacks as well as the variation in vulnerability between datasets. This accentuates the need for practitioners to evaluate their system’s specific vulnerability.
Recently, researchers showed similar membership inference vulnerability in settings where attackers have whitebox access to the target model, including the output from the intermediate layers of a pretrained neural network model or the gradients for the target instance [15]. Interestingly, this study showed that the intermediate layer outputs, in most cases, do not lead to significant improvements in attack accuracy. For example, with the CIFAR100 dataset and AlexNet model structure, a blackbox attack achieves 74.6% accuracy while the whitebox attack achieves 75.18% accuracy. This result further supports the understanding that attackers can gain sufficient knowledge from only the blackbox access to pretrained models that is common in MLaaS platforms. Attackers do not require either full or even partial knowledge of the pretrained target model, as blackbox access already exposes the primary source of membership inference vulnerability.
IiB Attack Generation
The attack generation process can vary significantly based on the power of the attacker. For example, the attack proposed in [16] requires knowledge of the training error of the target model. The attack technique proposed in [17], however, requires computational power and involves the training of multiple machine learning models. The techniques proposed in [18] are different still in that they require the attacker to develop effective threshold values. Figure 1 gives a workflow sketch of the membership inference attack generation algorithm. We use the shadow model technique documented in [17] and [14] to describe the attack generation process, while noting that many of the steps may be applicable to other attack generation techniques.
IiB1 Generating Shadow Data and Substitute Models
In the shadow model technique, an attacker must first generate or access a shadow dataset, a synthetic labeled dataset that mirrors the data in the target model's training dataset. While [17] and [14] both outline potential approaches to generating such a synthetic dataset from scratch, we would like to note that in many cases, attackers may also have examples of their own which can be used as seeds for the shadow data generation process or to bootstrap their shadow dataset. Consider our example of the financial institution. A competitor to the target institution may in fact have their own customer data, which could be leveraged to bootstrap a shadow dataset.
Once the attacker has developed the shadow dataset, the next phase of the membership inference attack is to leverage it to train and observe a series of shadow models. Specifically, the shadow dataset is used to train multiple shadow models, each of which is designed to emulate the behavior of the target model. Each shadow model is trained on a subset of the shadow dataset. As the attacker knows which portion of the shadow dataset was provided to each shadow model, the attacker may then observe the shadow models' behavior in response to instances which were in their training set versus their behavior in response to those that were held out.
IiB2 Generating Attack Datasets and Models
Attackers use the observations of the shadow models to develop an attack dataset which captures the difference between the output generated by the shadow models for instances included in their training data and for those previously unseen by the models.
Once the attack dataset has been developed, it is used to train a binary classifier which predicts whether an instance was previously known to a model based on the model's output for that instance. At attack time, this binary classifier may then be deployed against the target model service in a blackbox setting. The attack model takes as input prediction vectors of the same structure as those provided by the shadow models and contained within the attack dataset, and produces as output a prediction of 0 or 1, representing "out" and "in" respectively, with the former indicating an instance that was not in the training dataset of the target model and the latter indicating an instance that was included.
Together, these two phases, (1) generating shadow data and substitute models and (2) generating attack datasets and models, constitute the primary processes for constructing the membership inference attack.
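The two phases above can be sketched end to end on toy data. This is only an illustration under strong simplifying assumptions, not the implementation of [17] or [14]: the "model" is a hypothetical memorizing scorer that returns a single confidence value rather than a full prediction vector, the shadow data are random 2-D points, and the attack classifier is a simple threshold learned from the attack dataset. All function names are made up for this sketch.

```python
import math
import random


def train_memorizing_model(train_points):
    """Toy stand-in for an overfit model: confidence decays with the
    distance to the nearest training point (1.0 for memorized points)."""
    def predict_confidence(x):
        nearest = min(math.dist(x, t) for t in train_points)
        return math.exp(-10.0 * nearest)
    return predict_confidence


def build_attack_dataset(shadow_data, num_shadow_models, rng):
    """Phase 1: train shadow models on known subsets and record their
    outputs, labeled 1 ("in") for training instances and 0 ("out") otherwise."""
    attack_dataset = []
    for _ in range(num_shadow_models):
        shuffled = rng.sample(shadow_data, len(shadow_data))
        half = len(shuffled) // 2
        train_split, held_out = shuffled[:half], shuffled[half:]
        shadow_model = train_memorizing_model(train_split)
        attack_dataset += [(shadow_model(x), 1) for x in train_split]
        attack_dataset += [(shadow_model(x), 0) for x in held_out]
    return attack_dataset


def train_attack_model(attack_dataset):
    """Phase 2: binary attack classifier that answers "in" when the
    confidence reaches the threshold that best separates the attack dataset."""
    def num_correct(threshold):
        return sum((conf >= threshold) == bool(label) for conf, label in attack_dataset)
    best = max((conf for conf, _ in attack_dataset), key=num_correct)
    return lambda confidence: 1 if confidence >= best else 0


rng = random.Random(0)


def sample_points(n):
    return [(rng.random(), rng.random()) for _ in range(n)]


# Attacker side: shadow data, shadow models, attack model.
shadow_data = sample_points(200)
attack_model = train_attack_model(build_attack_dataset(shadow_data, 4, rng))

# Victim side: a target model trained on data the attacker never saw.
target_members, target_nonmembers = sample_points(100), sample_points(100)
target_model = train_memorizing_model(target_members)

# Blackbox attack: only the target model's outputs are used.
hits = sum(attack_model(target_model(x)) == 1 for x in target_members)
hits += sum(attack_model(target_model(x)) == 0 for x in target_nonmembers)
attack_accuracy = hits / (len(target_members) + len(target_nonmembers))
```

Because the toy model memorizes its training points completely, the attack here is far easier than against a real classifier; the sketch is meant to show the data flow between the shadow dataset, the attack dataset, and the attack model rather than realistic accuracy.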
Table II: Membership inference attack accuracy (%) for increasing degrees of noise. The first four result columns use noisy target data; the last four use noisy shadow data.
Degree of Noise  CIFAR10  Purchases10  Purchases20  Purchases50  CIFAR10  Purchases10  Purchases20  Purchases50  
0  67.49  66.69  80.70  88.52  67.49  66.69  80.70  88.52 
0.1  65.37  68.72  80.40  86.38  66.85  67.37  80.23  88.81 
0.2  63.88  66.01  77.47  85.85  65.36  66.86  81.20  88.35 
0.3  60.43  62.90  73.93  84.66  64.74  67.46  80.32  88.84 
0.4  60.48  60.07  68.23  83.21  62.64  66.91  80.14  88.63 
0.5  58.33  58.29  64.73  79.12  60.94  67.61  80.36  88.17 
0.6  57.53  57.58  61.51  73.50  60.09  66.68  80.08  88.54 
0.7  55.97  54.94  59.78  70.43  58.92  67.67  80.83  88.58 
0.8  55.35  54.44  58.21  67.16  58.66  66.73  80.49  88.54 
0.9  54.07  54.03  57.91  65.21  57.57  68.06  80.52  87.84 
1.0  53.95  52.72  56.02  62.44  56.55  67.32  80.43  87.70 
Iii Characterization of Membership Inference
Iiia Impact of Model Based Factors on Membership Inference
The most widely acknowledged factor impacting vulnerability to membership inference attacks is the degree of overfitting in the trained target model. Shokri et al. [17] demonstrate that the more overfitted a DNN model is, the more it leaks under membership inference attacks. Yeom et al. [16] investigated the role of overfitting from both the theoretical and the experimental perspectives. While their results confirm that models become more vulnerable as they overfit more severely, the authors also state that overfitting is not the only factor leading to model vulnerability under the membership inference attack. Truex et al. [14] further demonstrate that several other model based factors also play important roles in causing model vulnerability to membership inference, such as classification problem complexity, in-class standard deviation, and the type of machine learning model targeted.
IiiB Impact of Attacker Knowledge on Membership Inference
Another category of factors that may cause model vulnerability to membership inference attacks is the type and scope of knowledge which attackers may have about the target model and its training parameters. For example, Truex et al. [14] identified the impact that attacker knowledge with respect to both the training data of the target model and the target data has on the accuracy of the membership inference attack. This was evaluated by varying the degree of noise in the shadow dataset and target data used by the attacker. Table II shows the experimental results on four datasets with four types of learning tasks. The datasets include the CIFAR10 dataset, which contains 32 x 32 color images of 10 different classes of objects, while the Purchases datasets were developed from the Kaggle Acquire Valued Shoppers Challenge dataset containing the shopping history of several thousand individuals. Each instance in the Purchases datasets represents an individual and each feature represents a particular product. If an individual has a purchase history with this product in the Kaggle Acquire Valued Shoppers Challenge dataset, the feature is set to 1, and otherwise to 0. The instances are then clustered into different shopping profile types which are treated as the classes. Table II reports results for Purchases datasets considering 10, 20, and 50 different shopping profile types.
The experiments in Table II demonstrate the impact of the attacker knowledge of the target data points by evaluating how adding varying degrees of noise to data features may impact the success rate of membership inference attacks. Noise is sampled uniformly, over a range set by the degree of noise, and added to the normalized feature values. Given a level of uncertainty or inaccuracy in the data on the part of the attacker, represented by a corresponding degree of noise, Table II evaluates how effective the attacker remains in launching a membership inference attack to identify whether or not a target instance was in the training dataset. The results in Table II are reported for four logistic regression models, each one trained on a different dataset, with gradually increasing degrees of noise.
We make two interesting observations from Table II. First, for all four datasets, the more accurate (the less noisy) the attacker knowledge about the target data is, the higher the model vulnerability (attack success rate) to membership inference. This shows that the accuracy of attacker knowledge about the targeted examples is an important factor in determining model vulnerability and attack success rate (in terms of attack accuracy). Second, in comparison to the noisy target data, adding noise to the shadow dataset results in a less severe drop in accuracy. Similar trends are however still observed, with slightly higher attack success rates under smaller degrees of noise for all four datasets. This set of experiments demonstrates that attackers with different knowledge and different levels of resources may have different success rates in launching a membership inference attack. Thus, model vulnerability should be evaluated by taking into account potential or available attacker knowledge.
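The noise experiment can be approximated with a small helper. This is a sketch under stated assumptions rather than the procedure of [14]: the sampling interval [0, degree_of_noise], the features normalized to [0, 1], and the clipping of noisy values back into [0, 1] are all choices made for this illustration, and the function name is hypothetical.

```python
import random


def add_uniform_noise(instances, degree_of_noise, rng):
    """Perturb each normalized feature with noise drawn uniformly from
    [0, degree_of_noise].

    The sampling interval and the clipping at 1.0 are assumptions of this sketch.
    """
    noisy = []
    for features in instances:
        noisy.append([min(1.0, value + rng.uniform(0.0, degree_of_noise))
                      for value in features])
    return noisy


rng = random.Random(0)
# Two made-up instances with features already normalized to [0, 1].
target_data = [[0.2, 0.9, 0.0], [0.5, 0.5, 1.0]]
for degree in (0.0, 0.5, 1.0):
    print(degree, add_uniform_noise(target_data, degree, rng))
```

Applying the helper to the attacker's copy of the target data, or to the shadow dataset before shadow model training, corresponds to the two halves of Table II.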
IiiC Transferability of Membership Inference
Similar to the transferability of adversarial examples [19], [20], [21], [22], membership inference attacks are also shown to be transferable. That is, an attack model trained on an attack dataset containing the outputs from a set of shadow models is effective not only when the shadow models and the target model are of the same type but also when the shadow model type varies. This property further opens the door to blackbox attackers who do not have any knowledge of the target model.
Purchases20  Shadow Model Type  
Attack Model  DT  kNN  LR  NB 
DT  88.98  87.49  72.08  81.84 
kNN  88.23  72.57  84.75  74.27 
LR  89.02  88.11  88.99  83.57 
NB  88.96  78.60  89.05  66.34 
Table III demonstrates this property of membership inference attacks. It reports the membership inference attack accuracy for various attack configurations against a decision tree model trained on the Purchases20 dataset. It shows the transferability of membership inference attacks for different combinations of four different model types used as the attack model type (rows) and the shadow model type (columns). In this experiment, the target prediction model is a decision tree. We observe that while using a decision tree as the shadow model results in the most consistent membership inference attack success compared to other combinations, multiple combinations with both kNN and logistic regression (LR) shadow models achieve attack success within 5 percentage points of the most successful attack configuration using decision tree shadow models. Table III also shows that multiple types of models can be successful as the binary attack classifier. This set of experiments also shows that the worst attack performances against the DT target prediction model are seen when the shadow models are trained using Naïve Bayes (NB), with the worst performance reported when NB is the model type of the attack model as well. These results indicate that (1) the same strategy used for selecting the shadow model type may not be optimal for the attack model and (2) shadow models of types other than the target model type may still lead to successful membership inference attacks. We refer readers to [14] for additional detail.
The transferability study in Table III indicates that an attacker does not always need to know the exact target model configuration to launch an effective membership inference attack, as attack models can be transferable from one target model type to another. Although finding the most effective attack strategy can be a challenging task for attackers, vulnerability to membership inference attacks remains serious even with suboptimal attack configurations, with 15 of the 16 configurations in Table III reporting attack accuracy above 70% and half of them above 87%.
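The observations drawn from Table III can be checked mechanically. The short script below copies the accuracies from Table III and computes, for each shadow model type, the spread of attack accuracy across attack model types; the variable and function names are illustrative only.

```python
# Attack accuracy (%) from Table III; rows are attack model types and
# columns are shadow model types, both in the order DT, kNN, LR, NB.
model_types = ["DT", "kNN", "LR", "NB"]
table_iii = {
    "DT":  [88.98, 87.49, 72.08, 81.84],
    "kNN": [88.23, 72.57, 84.75, 74.27],
    "LR":  [89.02, 88.11, 88.99, 83.57],
    "NB":  [88.96, 78.60, 89.05, 66.34],
}


def column(shadow_type):
    """Accuracies of all attack model types for one shadow model type."""
    index = model_types.index(shadow_type)
    return [table_iii[attack_type][index] for attack_type in model_types]


# Spread (max minus min) of attack accuracy for each shadow model type.
spread_by_shadow = {s: max(column(s)) - min(column(s)) for s in model_types}
most_consistent_shadow = min(spread_by_shadow, key=spread_by_shadow.get)

# (attack model type, shadow model type) pair with the lowest accuracy.
worst_config = min(
    ((a, s) for a in model_types for s in model_types),
    key=lambda pair: table_iii[pair[0]][model_types.index(pair[1])],
)

# Number of configurations clearly above the 50% baseline.
above_70 = sum(acc > 70.0 for row in table_iii.values() for acc in row)
```

The same kind of grid, with shadow and attack model types as the two axes, is a convenient way to organize a transferability evaluation for other target models.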
IiiD Training Data Skewness on Membership Attacks
The fourth important dimension of membership inference vulnerability is the risk imbalance across different prediction classes when the training data is skewed. Even when the overall membership inference vulnerability appears limited with attack success close to the 50% baseline (in or out random guess), there may be subgroups within the training data, which display significantly more vulnerability.
For example, Figure 2 illustrates the impact of data skewness on membership inference vulnerability. In this set of experiments, we measure the membership inference attack accuracy for a decision tree target model trained on the publicly available Adult dataset [23]. The Adult dataset contains 48,842 instances, each with 14 different features, and presents a binary classification problem wherein one wishes to identify whether an individual's yearly salary is greater than $50K or at most $50K. The class distribution, however, is skewed, with less than 25% of instances being labeled as greater than $50K. As overfitting is widely considered a key factor in membership inference vulnerability, we simulate overfitting by increasing the depth of the target decision tree model. Figure 2 shows the impact of overfitting (increasing along the x-axis) on both the aggregate membership inference vulnerability in terms of membership inference attack accuracy (accuracy over all classes) and the minority membership inference vulnerability (attack accuracy over the minority class). In this case, aggregate vulnerability is the accuracy of the membership inference attack evaluated on an equal number of randomly selected examples seen by the target model ("in") as unseen ("out"), while the minority vulnerability reports the membership inference attack accuracy evaluated on only the subset of the previously selected examples whose class is greater than $50K (the minority class).
We observe from Figure 2 that the minority class has an increased risk under the membership inference attack as the model overfits more severely. This follows the intuition that minority class members have fewer other instances amongst whom they can hide in the training set and thus are more easily exposed under membership inference attacks. This is consistent with the observation that smaller training dataset sizes can lead to a greater overall risk for membership inference [17]. We argue that it is important for both data owners and MLaaS providers to consider vulnerability not just for the entire training dataset, but also the level of risk for minority populations specifically when evaluating privacy compliance.
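The difference between aggregate and minority vulnerability can be made concrete with a small calculation. The records below are invented purely for illustration and are not results from Figure 2; each record holds the true membership, the attack's guess, and the class label of the instance.

```python
def attack_accuracy(records):
    """records: (true_membership, predicted_membership, class_label) triples."""
    return sum(truth == guess for truth, guess, _ in records) / len(records)


def minority_attack_accuracy(records, minority_label):
    """Attack accuracy restricted to instances of the minority class."""
    minority_records = [r for r in records if r[2] == minority_label]
    return attack_accuracy(minority_records)


# Hypothetical attack outputs on a balanced in/out evaluation set:
# (was in training set?, attack said "in"?, class label).
records = [
    (1, 1, ">50K"), (0, 0, ">50K"), (1, 1, ">50K"), (0, 0, ">50K"),
    (1, 0, "<=50K"), (0, 0, "<=50K"), (1, 1, "<=50K"), (0, 1, "<=50K"),
    (1, 0, "<=50K"), (0, 0, "<=50K"), (1, 1, "<=50K"), (0, 1, "<=50K"),
]
aggregate = attack_accuracy(records)
minority = minority_attack_accuracy(records, ">50K")
```

In this made-up example the aggregate accuracy looks moderate while every minority instance is classified correctly, which is the kind of gap that an aggregate-only report would hide.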
Iv Mitigation Strategies and Algorithms
Iva Differential Privacy
Differential privacy is a formal privacy framework with a theoretical foundation and rigorous mathematical guarantees when effectively employed [10]. A machine learning algorithm is defined to be differentially private if and only if the inclusion of a single instance in the training dataset will cause only statistically insignificant changes to the output of the algorithm. Theoretical limits are set on such output changes in the definition of differential privacy, which is given formally as follows:
Definition 1 (Differential Privacy [10])
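A randomized mechanism $\mathcal{K}$ provides $(\epsilon, \delta)$-differential privacy if for any two neighboring databases $D_1$ and $D_2$ that differ in only a single entry, $\forall S \subseteq \mathrm{Range}(\mathcal{K})$,
$$\Pr[\mathcal{K}(D_1) \in S] \le e^{\epsilon} \Pr[\mathcal{K}(D_2) \in S] + \delta \qquad (1)$$
If $\delta = 0$, $\mathcal{K}$ is said to satisfy $\epsilon$-differential privacy.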
In the remainder of the paper, we focus on $\epsilon$-differential privacy for presentation convenience. To achieve $\epsilon$-differential privacy (DP), noise defined by $\epsilon$ is added to the algorithm's output. This noise is proportional to the sensitivity of the output. Sensitivity measures the maximum change of the output due to the inclusion of a single data instance.
Definition 2 (Sensitivity [10])
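For $f: \mathcal{D} \to \mathbb{R}^d$, the sensitivity of $f$ is
$$S_f = \max_{D_1, D_2} \| f(D_1) - f(D_2) \|_2 \qquad (2)$$
for all $D_1$, $D_2$ differing in at most one element.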
The noise mechanism which is used is therefore bounded by both the sensitivity of the function $f$, $S_f$, and the privacy parameter $\epsilon$. For example, consider the Gaussian mechanism defined as follows:
Definition 3 (Gaussian Noise Mechanism)
$$M(D) \triangleq f(D) + \mathcal{N}(0, S_f^2 \sigma^2)$$
where $\mathcal{N}(0, S_f^2 \sigma^2)$ is the normal distribution with mean $0$ and standard deviation $S_f \sigma$.
A single application of the Gaussian mechanism in Definition 3 to a function $f$ with sensitivity $S_f$ satisfies $(\epsilon, \delta)$-differential privacy if $\sigma \ge \sqrt{2 \ln(1.25/\delta)}/\epsilon$ and $\epsilon < 1$ [24].
Additionally, there exist several useful properties of differential privacy for either multiple iterations of a differentially private function or the combination of multiple different functions wherein each satisfies a corresponding differential privacy guarantee. These composition properties are important for machine learning processes, which often involve multiple passes over the training dataset. The formal composition properties of differential privacy include the following:
Definition 4 (Composition properties [24, 25])
Let $f_1, f_2, \ldots, f_n$ be $n$ algorithms, such that for each $i \in [1, n]$, $f_i$ satisfies $\epsilon_i$-DP. Then, the following properties hold:
Sequential Composition: Releasing the outputs $f_1(D), f_2(D), \ldots, f_n(D)$ satisfies $(\sum_{i=1}^{n} \epsilon_i)$-DP.
Parallel Composition: Executing each algorithm on a disjoint subset of $D$ satisfies $(\max_i \epsilon_i)$-DP.
Immunity to Postprocessing: Computing a function of the output of a differentially private algorithm does not deteriorate its privacy, e.g., publicly releasing the output of $f_i(D)$ or using it as an input to another algorithm does not violate $\epsilon_i$-DP.
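As a rough illustration of Definitions 3 and 4, the sketch below calibrates Gaussian noise for a counting query and tracks the privacy cost of several releases. It assumes the standard calibration $\sigma \ge \sqrt{2 \ln(1.25/\delta)}/\epsilon$ for $0 < \epsilon < 1$; the function names and the example data are hypothetical.

```python
import math
import random


def gaussian_sigma(epsilon, delta):
    """Noise multiplier sufficient for (epsilon, delta)-DP with the Gaussian
    mechanism; this calibration is only valid for 0 < epsilon < 1."""
    if not 0.0 < epsilon < 1.0:
        raise ValueError("this calibration assumes 0 < epsilon < 1")
    return math.sqrt(2.0 * math.log(1.25 / delta)) / epsilon


def gaussian_mechanism(true_value, sensitivity, epsilon, delta, rng):
    """Release true_value plus Gaussian noise with standard deviation
    sensitivity * sigma."""
    sigma = gaussian_sigma(epsilon, delta)
    return true_value + rng.gauss(0.0, sensitivity * sigma)


def sequential_composition(epsilons):
    """Privacy cost of releasing several outputs computed on the same data."""
    return sum(epsilons)


def parallel_composition(epsilons):
    """Privacy cost when each algorithm runs on a disjoint subset of the data."""
    return max(epsilons)


rng = random.Random(0)
ages = [34, 45, 29, 61, 50]  # made-up records
# A counting query changes by at most 1 when one record is added or removed,
# so its sensitivity is 1.
noisy_count = gaussian_mechanism(len(ages), sensitivity=1.0,
                                 epsilon=0.5, delta=1e-5, rng=rng)

budgets = [0.5, 0.25, 0.25]
total_if_same_data = sequential_composition(budgets)
total_if_disjoint_data = parallel_composition(budgets)
```

Post-processing needs no accounting at all: rounding `noisy_count` or feeding it into another computation leaves the privacy cost unchanged.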
Differential privacy can be applied to different types of machine learning models. Due to space constraints, in the rest of the paper we specifically focus on differentially private training of deep neural network (DNN) models.
IvB Mechanisms for Differentially Private Deep Learning
DNNs are complex, sequentially stacked neural networks containing multiple layers of interconnected nodes. Each node represents the dataset in a unique way and each layer of networked nodes processes the input from previous layers using learned weights and a predefined activation function. The objective in training a DNN is to find the optimal weight values for each node in the multitier networks. This is accomplished by making multiple passes over the entire dataset with each pass constituting one epoch. Within each epoch, the entire dataset is partitioned into many minibatches of equal size and the algorithm processes these batches sequentially, each including only a subset of the data. When processing one batch, the data is fed forward through the network using the existing weight values. A predefined loss function is computed for the errors made by the neural network learner with respect to the current batch of data. An optimizer, such as stochastic gradient descent (SGD), is then used to propagate these errors backward through the network. The weights are then updated according to the errors and the learning rate set by the training algorithm. The higher the learning rate value, the larger the update made in response to the backward propagation of errors.
Differentially private deep learning can be implemented by adding a small amount of noise to the updates made to the network such that there is only a marginal difference between the following two scenarios: (1) when a particular individual is included within the training dataset and (2) when the individual is absent from the training dataset. The noise added to the updates is sampled from a Gaussian distribution with scale determined by an appropriate noise parameter $\sigma$ corresponding to a desired level of privacy and controlled sensitivity. That is, the privacy budget $\epsilon$, according to Definition 1, should constrain the value of $\sigma$ at each epoch. Let $E$ denote the number of epochs for the DNN training, a predefined hyperparameter set in the training configuration as the termination condition. Let one epoch satisfy $(\epsilon_i, \delta_i)$-differential privacy given $\sigma$ from the Gaussian mechanism in Definition 3. Then, a traditional accounting using the composition properties of differential privacy (Definition 4) would dictate that $E$ epochs would result in an overall privacy guarantee of $(\epsilon, \delta)$-differential privacy if $\epsilon = E\epsilon_i$, $\delta = E\delta_i$, and each epoch employed the Gaussian mechanism with the same value $\sigma$. We refer to this approach as the fixed noise perturbation method [12]. An alternative approach proposed by Yu et al. in [13] advocates a variable noise perturbation approach, which uses a decaying function to manage the total privacy budget $\epsilon$ and define a variable noise scale based on different settings of the per-epoch budget for each epoch, aiming to add a decreasing amount of noise as the training progresses. Thus, the noise scale differs from epoch to epoch, and the noise scale for a given epoch is bounded by its allocated privacy budget. The same overall privacy guarantee is met when the per-epoch budgets sum to $\epsilon$ over the $E$ epochs of differentially private DNN training.
IvC Differentially Private Deep Learning with Fixed $\sigma$
The first differentially private approach to training deep neural networks is proposed by Abadi et al. [12] and implemented on the TensorFlow deep learning framework [26]. A summary of their approach is given in Algorithm 1. To apply differential privacy, the sensitivity of each epoch is bounded by a clipping value $C$, specifying that an instance may impact weight updates by at most the value $C$. To achieve differential privacy, weight updates at the end of each batch include noise injection according to the sensitivity defined by $C$ and the scale of noise $\sigma$. The choice of $\sigma$ is directly related to the overall privacy guarantee.
Let $\sigma = \sqrt{2 \log(1.25/\delta)}/\epsilon$, then according to differential privacy theory [24], each step (processing of a batch) is $(\epsilon, \delta)$-differentially private. If the batch is randomly sampled from the training dataset, then additional properties of random sampling [27, 28] may be applied. Each step then becomes $(O(q\epsilon), q\delta)$-differentially private, where $q$ is the sampling ratio, i.e., the batch size divided by the dataset size. The moments accountant privacy accounting method is also introduced in [12] to prove that Algorithm 1 is $(\epsilon, \delta)$-differentially private given appropriate parameter settings.
We refer to Algorithm 1 as the fixed noise perturbation approach as each epoch is treated equally by introducing the same noise scale to every parameter update.
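The core of the fixed noise perturbation approach, one noisy weight update per batch, can be sketched as follows. This is a simplified illustration in plain Python and not the TensorFlow implementation of [12]: gradients are passed in as lists of floats, and the function names and example values are hypothetical.

```python
import math
import random


def clip_gradient(gradient, clip_norm):
    """Scale the gradient down so that its L2 norm is at most clip_norm."""
    norm = math.sqrt(sum(g * g for g in gradient))
    scale = 1.0 / max(1.0, norm / clip_norm)
    return [g * scale for g in gradient]


def private_batch_update(weights, per_example_gradients, clip_norm, sigma,
                         learning_rate, rng):
    """One noisy weight update: clip each example's gradient, sum the clipped
    gradients, add Gaussian noise with standard deviation sigma * clip_norm
    to each coordinate, average over the batch, and take a gradient step."""
    clipped = [clip_gradient(g, clip_norm) for g in per_example_gradients]
    batch_size = len(clipped)
    new_weights = []
    for j, weight in enumerate(weights):
        summed = sum(g[j] for g in clipped)
        noisy = summed + rng.gauss(0.0, sigma * clip_norm)
        new_weights.append(weight - learning_rate * noisy / batch_size)
    return new_weights


rng = random.Random(0)
weights = [0.0, 0.0]
gradients = [[3.0, 4.0], [0.3, 0.4]]  # L2 norms 5.0 and 0.5
updated = private_batch_update(weights, gradients, clip_norm=1.0, sigma=1.1,
                               learning_rate=0.1, rng=rng)
```

Clipping is what makes the sensitivity of the summed gradient known in advance: no single instance can change the sum by more than `clip_norm`, so the noise scale can be set relative to it.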
IvD Differentially Private Deep Learning with Variable $\sigma$
The variable noise perturbation approach to differentially private deep learning is proposed by Yu et al. [13]. It extends the fixed noise scale $\sigma$ used over all training epochs in [12] by introducing two new capabilities. First, Yu et al. [13] pointed out a limitation of the approach outlined in Algorithm 1. Namely, Algorithm 1 specifically calls for random sampling, wherein each batch is selected randomly with replacement. However, the most popular implementation for partitioning a dataset into minibatches in many deep learning frameworks is random shuffling, wherein the dataset is shuffled and then partitioned into evenly sized batches. In order to develop a differentially private DNN model under random shuffling, Yu et al. [13] extend Algorithm 1 of [12] by introducing a new privacy accounting method.
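The difference between the two batching strategies is easy to see in code. The sketch below reflects one reading of the description above, in which random sampling draws every batch independently from the full dataset while random shuffling partitions a single shuffled copy; the function names are hypothetical.

```python
import random


def batches_by_random_sampling(dataset, batch_size, num_batches, rng):
    """Each batch is drawn independently from the full dataset, so an
    instance may appear in several batches of an epoch or in none."""
    return [rng.sample(dataset, batch_size) for _ in range(num_batches)]


def batches_by_random_shuffling(dataset, batch_size, rng):
    """Shuffle once, then cut into consecutive batches, so every instance
    appears in exactly one batch per epoch."""
    shuffled = rng.sample(dataset, len(dataset))
    return [shuffled[i:i + batch_size] for i in range(0, len(shuffled), batch_size)]


rng = random.Random(0)
dataset = list(range(12))
sampled = batches_by_random_sampling(dataset, batch_size=4, num_batches=3, rng=rng)
shuffled = batches_by_random_shuffling(dataset, batch_size=4, rng=rng)
```

Since the privacy analysis depends on how often an instance can influence the updates, the two strategies call for different accounting, which is the gap the method of [13] addresses.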
Additionally, Yu et al. [13] analyze the problem of using a fixed noise scale (fixed $\sigma$ values), and propose applying different noise scales to the weight updates at different stages of the training process. Specifically, Yu et al. [13] propose a set of methods for privacy budget allocation, which improve model accuracy by progressively reducing the noise scale as the training progresses. The variable noise scale approach is inspired by two observations. First, as the training progresses, the model begins to converge, causing the noise being introduced to the updates to potentially become more impactful. This slows down the rate of model convergence and causes later epochs to no longer increase model accuracy compared to non-private scenarios. Second, the research on improving training accuracy and convergence rate of DNN training has led to a new generation of learning rate functions that replace the constant learning rate baseline with decaying learning rate functions and cyclic learning rates [29, 30]. Yu et al. [13] employed a similar set of decay functions to add noise at a decreasing scale. That is, the noise scale at a later epoch is less than the noise scale at an earlier epoch. The performance of four different types of decay functions, which introduce a variable noise scale by partitioning the privacy budget over epochs, was evaluated.
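To give a feel for what a decaying noise scale looks like, the sketch below defines a few schedules modeled on common learning rate decay functions. The functional forms, parameter names, and default values are assumptions made for illustration; this section does not specify which four decay functions were evaluated in [13].

```python
import math


def fixed_noise(sigma0, epoch):
    """Baseline: the same noise scale at every epoch."""
    return sigma0


def time_based_decay(sigma0, epoch, k=0.1):
    """Noise scale shrinks as 1 / (1 + k * epoch)."""
    return sigma0 / (1.0 + k * epoch)


def exponential_decay(sigma0, epoch, k=0.1):
    """Noise scale shrinks by a constant factor exp(-k) per epoch."""
    return sigma0 * math.exp(-k * epoch)


def step_decay(sigma0, epoch, factor=0.5, period=10):
    """Noise scale is multiplied by `factor` once every `period` epochs."""
    return sigma0 * factor ** (epoch // period)


schedule = [exponential_decay(6.0, epoch) for epoch in range(5)]
```

A smaller noise scale consumes more privacy budget per epoch, so any such schedule has to be paired with an accounting step that confirms the total budget is not exceeded.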
IvE Important Implementation Factors
IvE1 Choosing Epsilon
In differentially private algorithms, the $\epsilon$ value dictates the amount of noise which must be introduced into the DNN model training and therefore the theoretical privacy bound. Choosing the correct $\epsilon$ value requires a careful balance between the tolerable privacy leakage given the practical setting and the tolerable utility loss.
For example, Naldi and D'Acquisto propose a method in [31] for finding the optimal $\epsilon$ value to meet a certain accuracy for Laplacian mechanisms. Lee and Clifton alternatively employ the approach in [32] to analyze a particular adversarial model. Hsu et al. [33], on the other hand, take an individual's perspective on data privacy by opt-in incentivization. Kohli and Laskowski [34] also promote choosing an $\epsilon$ based on individual privacy preferences.
Despite these existing approaches, determining the "right" $\epsilon$ value remains a complex problem and is likely to be highly dependent on the privacy policy of the organization that owns the model and the dataset, the vulnerability of the model, the sensitivity of the data, and the tolerance to utility loss in the given setting. Additionally, there might be scenarios, such as the healthcare setting, in which even small degrees of utility loss are intolerable and are combined with stringent privacy constraints given highly sensitive data. In these cases, it may be hard or even impossible to find a good $\epsilon$ value for existing differentially private DNN training techniques.
IvE2 The Role of Transfer Learning
Another key consideration is the use of transfer learning for dealing with more complex datasets [12, 13]. For example, model parameters may be initialized by training on a non-private, similar dataset. The private dataset is then only used to further hone a subset of the model parameters. This helps to reduce the number of parameters affected by the noise addition required by differential privacy. The use of transfer learning, however, relies on a strong assumption that such a non-private, similar dataset exists and is available. In many cases, this assumption may be unrealistic.
IvE3 Parameter Optimization
In addition to the privacy budget parameters $(\epsilon, \delta)$, differentially private deep learning introduces other influential privacy parameters, such as the clipping value $C$ that bounds the sensitivity for each epoch and the noise scale approach (including potential decay parameters) for $\sigma$. The settings of these privacy parameters may impact both the training convergence rate, and thus the training time, and the training and testing accuracy. However, tuning these privacy parameters becomes much more complex as we need to take into account the many learning hyperparameters already present in DNN training, such as the number of epochs, the batch size, the learning rate policy, and the optimization algorithm. These hyperparameters need to be carefully configured for high performance training of deep neural networks in a non-private setting. For deep learning with differential privacy, one needs to configure the privacy parameters by considering the side effects on other hyperparameters and reconfigure the previously tuned learning hyperparameters to account for the privacy approach.
For example, for a fixed total privacy budget $(\epsilon, \delta)$, too small a number of epochs may result in reduced accuracy due to insufficient time to learn. A higher number of epochs, however, will result in a higher $\sigma$ value required in each epoch (recall Algorithm 1). Therefore, a carefully tuned hyperparameter for a non-private setting, such as the optimal number of epochs, may no longer be effective when differentially private deep learning is enabled.
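The tension between the number of epochs and the per-epoch noise can be illustrated numerically. The sketch below assumes the simplest possible accounting, in which the total budget is split evenly across epochs under sequential composition and each epoch uses the standard Gaussian calibration (valid for a per-epoch $\epsilon$ below 1); tighter accounting methods such as the moments accountant give smaller noise scales. The function name and budget values are hypothetical.

```python
import math


def per_epoch_sigma(total_epsilon, total_delta, num_epochs):
    """Noise scale each epoch needs when the total budget is split evenly
    across epochs and each epoch uses the Gaussian mechanism."""
    epsilon_i = total_epsilon / num_epochs
    delta_i = total_delta / num_epochs
    return math.sqrt(2.0 * math.log(1.25 / delta_i)) / epsilon_i


# Same total budget, increasing number of epochs (per-epoch epsilon stays below 1).
for num_epochs in (10, 50, 100):
    print(num_epochs, round(per_epoch_sigma(8.0, 1e-5, num_epochs), 2))
```

Under this accounting the required noise scale grows faster than linearly in the number of epochs, which is why the best epoch count for non-private training is rarely the best choice under a privacy budget.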
A number of challenging questions remain open problems in differentially private DNN training, such as at what point do more epochs lead to accuracy loss under a particular privacy setting? What is the right noise decaying function for effective deep learning with differential privacy? Can we learn privately with high accuracy given complex datasets? The balance of the many parameters in a differentially private deep learning system presents new challenges to practitioners.
With these questions in mind, we develop MPLens, a membership privacy analysis system, which facilitates the evaluation of model vulnerability against membership inference attacks. Through MPLens, we investigate the effectiveness of differential privacy as a mitigation technique against membership inference attacks, including the tradeoffs of implementing such a mitigation strategy for preventing membership inference and the impact of differential privacy on different classes for the DNN models trained using skewed training datasets. We also highlight how the vulnerability of pretrained models under the membership inference attack is not uniform when the training data itself is skewed with minority populations. We show how this vulnerability variation may cause increased risks in federated learning systems.
V MPLens: System Overview
MPLens is designed as a privacy analysis and privacy compliance evaluation system for both data scientists and MLaaS providers to understand the membership inference vulnerability risk involved in their model training and model prediction process. Figure 3 provides an overview of the system architecture. The system allows providers to specify a set of factors that are critical to privacy analysis. Example factors include the data used to train their model, what data might be held by the attacker, what attack technique might be used, the degree of data skewness in the training set, whether the prediction model is constructed using differentially private model training, what configurations are used for the set of differential privacy parameters, and so forth. Given the model input, the MPLens evaluation system reports the overall statistics on the vulnerability of the model, the per-class vulnerability, as well as the vulnerability of any sensitive populations such as specific minority groups. Example statistics include attack accuracy, precision, recall, and F1 score. We also include attacker confidence for true positives, false positives, true negatives, and false negatives, the average distance from both the false positives and the false negatives to the training data, and the time required to execute the attack.
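The basic attack statistics listed above can be computed from the true and predicted membership labels. The function below is a generic sketch of these standard metrics and is not taken from the MPLens code base; its name and the example labels are hypothetical.

```python
def attack_statistics(true_membership, predicted_membership):
    """Accuracy, precision, recall, and F1 score of a membership inference
    attack, where 1 means "in" and 0 means "out"."""
    pairs = list(zip(true_membership, predicted_membership))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Made-up labels: three members and three non-members.
stats = attack_statistics([1, 1, 1, 0, 0, 0], [1, 1, 0, 1, 0, 0])
```

Restricting the input lists to the instances of a single class or sensitive subgroup yields the corresponding per-class or per-population statistics.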
Va Target Model Training
Overfitting is the first factor that MPLens measures for conducting membership vulnerability analysis. MPLens specifically highlights the overfitting analysis by reporting the Accuracy Difference between the target model training accuracy and testing accuracy. This enables MLaaS providers and domain-specific data scientists to understand whether their vulnerability might be linked to overfitting. As previously indicated, while overfitting is strongly correlated with membership inference vulnerability, it is not the only source of vulnerability. Thus, when MPLens indicates undesirable vulnerability in the absence of significant overfitting, analysis may be triggered to investigate whether the vulnerability is linked to other model or data characteristics such as those discussed in Section III.
VB Attacker Knowledge
Our MPLens system is by design customizable to understand multiple attack scenarios. For instance, users may specify the shadow data, which is used by the attacker. This allows the user to consider a scenario in which the attacker has access to some subset of the target model’s training data, one where the attacker has access to some data that are drawn from the same distribution as the training data of the target model, or one where the attacker has noisy and inaccurate data, such as that evaluated in Table II and possibly generated through blackbox probing [17, 14].
The MPLens system is additionally customizable with respect to the attack method, including the shadow model based attack techniques [17] and the threshold-based attack techniques [16]. Furthermore, when using a threshold-based attack, our MPLens system can either accept predetermined values representing attacker knowledge of the target model error or determine good threshold values through the shadow model training.
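A threshold-based attack can be very compact. The sketch below is written in the spirit of the attacks in [16] but is not their exact formulation: it assumes the attacker knows the true label of the query, uses cross-entropy as the loss, and takes the target model's average training loss as the threshold. All names and numbers are hypothetical.

```python
import math


def cross_entropy_loss(prediction_vector, true_label):
    """Loss of the target model on one instance, given its prediction vector."""
    return -math.log(max(prediction_vector[true_label], 1e-12))


def threshold_attack(prediction_vector, true_label, loss_threshold):
    """Predict "in" (1) when the loss is at most the threshold, else "out" (0)."""
    loss = cross_entropy_loss(prediction_vector, true_label)
    return 1 if loss <= loss_threshold else 0


# Hypothetical attacker knowledge: the target model's average training loss.
average_training_loss = 0.2

confident_output = [0.05, 0.9, 0.05]   # low loss on the true label 1
uncertain_output = [0.4, 0.35, 0.25]   # high loss on the true label 1

guess_confident = threshold_attack(confident_output, 1, average_training_loss)
guess_uncertain = threshold_attack(uncertain_output, 1, average_training_loss)
```

When the training error is not known in advance, the threshold can instead be chosen from shadow model outputs, for example by selecting the value that best separates "in" and "out" instances in the attack dataset.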
These customizations allow MLaaS providers and users of MPLens to specify the types of attackers they wish to analyze, analyze their model vulnerability against such attackers, and evaluate the privacy compliance of their model training and model prediction services.
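To illustrate the threshold-based attack configuration described above, the sketch below predicts membership from the target model's per-instance loss and, when no predetermined threshold is available, selects one from shadow data. It is an illustration under stated assumptions rather than the MPLens code: the function names are hypothetical, and choosing the threshold by maximizing shadow attack accuracy is one plausible selection rule, not necessarily the one used in the paper.

```python
from typing import List, Sequence, Tuple


def threshold_attack(losses: Sequence[float], threshold: float) -> List[bool]:
    """Predict "in" (True) when the target model's loss on an instance is at most the threshold."""
    return [loss <= threshold for loss in losses]


def fit_threshold(shadow_losses: Sequence[float], shadow_is_member: Sequence[bool]) -> Tuple[float, float]:
    """Pick the candidate threshold that maximizes attack accuracy on shadow data.

    Candidates are the observed shadow losses themselves; returns (threshold, accuracy).
    """
    best_threshold, best_accuracy = 0.0, -1.0
    for candidate in sorted(set(shadow_losses)):
        guesses = threshold_attack(shadow_losses, candidate)
        accuracy = sum(g == m for g, m in zip(guesses, shadow_is_member)) / len(shadow_losses)
        if accuracy > best_accuracy:  # keep the first (smallest) threshold on ties
            best_threshold, best_accuracy = candidate, accuracy
    return best_threshold, best_accuracy
```

A predetermined threshold corresponds to calling `threshold_attack` directly, while `fit_threshold` stands in for deriving the threshold through shadow model training.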
V-C Transferability
Another aspect of privacy analysis is related to the specific model training methods used to generate membership inference attacks and whether different methods result in significant variations in membership inference vulnerability. For the attack method from [17], the user can specify not only the attacker’s data but also the shadow model training algorithm and the membership inference binary classifier training algorithm. Each element is customizable as a system parameter when configuring MPLens for specific privacy risk evaluation. The MPLens system makes no assumption on how the attacker develops the shadow dataset, what knowledge is included in the data, whether the attacker has knowledge of the target model algorithm, or what attack technique is used. This flexibility allows MPLens to support evaluation across various transferable attack configurations.
VI Experimental Results and Analysis
VI-A Datasets
All experiments reported in this section were conducted using the following four datasets.
CIFAR10
The CIFAR10 dataset contains 60,000 color images [35] and is publicly available. Each image is formatted to be 32 x 32. The CIFAR10 dataset contains 10 classes with 6,000 images each: airplane, automobile, bird, cat, deer, dog, frog, horse, ship and truck. The problem is therefore a 10class classification problem where the task is to identify which class is depicted in a given image.
CIFAR100
Also publicly available, CIFAR100 similarly contains 60,000 color images [35] formatted to 32 x 32. 100 classes are represented ranging from various animals, pieces of household furniture, or types of vehicles. Each class has 600 available images. The problem is therefore a 100class image classification problem.
MNIST
MNIST is a publicly available dataset containing 70,000 images of handwritten digits [36]. Each image is formatted to be 32 x 32 and processed such that the digit is at the center of the image. The MNIST dataset constitutes a 10-class classification problem where the task is to identify which digit between 0 and 9, inclusive, is contained within a given image.
Labeled Faces in the Wild
The Labeled Faces in the Wild (LFW) database contains face photographs for unconstrained face recognition with more than 13,000 images of faces collected from the web. Each face has been labeled with the name of the person pictured. 1,680 of the people pictured have two or more distinct photos in the data set. Each person is then labeled with a gender and race (including mixed races). Data is then selected for the top 22 classes which were represented with a sufficient number of data points.
VI-B Membership Inference Risk: Adversarial Examples
Given a target model and its training data, the membership inference attack, using blackbox access to the model prediction API, can be used to create a representative dataset. This representative dataset can be leveraged to generate substitute models for the given target model. One can then use such substitute models to generate adversarial examples using different adversarial attack methods [37].
Figures 77 provide visualization plots for the comparison of 2D PCA given the images of the dog and truck classes in CIFAR10. The plots are divided relative to the membership inference attack prediction output including the true target model training data (Figure 7), the data predicted by the membership inference attack as training data (Figure 7), and the data predicted as nontraining data by the membership inference attack (Figure 7).
These plots illustrate the accuracy of the distribution of instances predicted as in the target model’s training dataset through membership inference. They clearly demonstrate how even with the inclusion of false positives, an attacker can create a good representation of the training data distribution, particularly compared to those instances not predicted to be in the target training data. An attacker can therefore easily train a substitute model on this representative dataset which then enables the generation of adversarial examples by attacking this substitute model. Examples developed to successfully attack a substitute model trained on the instances in Figure 7 are likely to also be successful against a model trained on the instances in Figure 7.
VI-C Membership Inference Risk: Data Skewness
To date, the membership inference attack has been studied either using training datasets with uniform class distribution or without specific consideration of the impact of any data skewness. However, as we demonstrated earlier, the risk of membership inference vulnerability can vary when class representation is skewed. Minority classes can display increased risk to membership inference attack as models struggle to more effectively generalize past the training data when fewer instances are given. In this section, we focus on studying the impact of data skewness on vulnerability to membership inference attacks.
In Figure 7 we investigate the impact of data skewness on membership inference vulnerability by controlling the representation of a single class. We reduce the automobile images from the CIFAR10 dataset to only 1% of the data and then increase the representation until the dataset is again balanced with automobiles representing 10% of the training images. We then plot the aggregate membership inference vulnerability which is the overall membership inference attack accuracy evaluated across all classes as well as the vulnerability of just the automobile class.
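The controlled skew described above can be produced by subsampling a single class until it makes up a chosen share of the dataset. The sketch below is a hypothetical helper, not the authors' experimental code; `downsample_class`, its parameters, and the use of integer class labels are assumptions made for illustration.

```python
import random
from typing import List, Sequence


def downsample_class(labels: Sequence[int], target_class: int, fraction: float, seed: int = 0) -> List[int]:
    """Return indices of a dataset in which target_class makes up about `fraction` of the instances.

    All other classes are kept in full; only target_class is subsampled.
    """
    if not 0.0 < fraction < 1.0:
        raise ValueError("fraction must be in (0, 1)")
    target_idx = [i for i, y in enumerate(labels) if y == target_class]
    other_idx = [i for i, y in enumerate(labels) if y != target_class]
    # Solve k / (k + len(other_idx)) = fraction for the number k of kept target instances.
    keep = round(fraction * len(other_idx) / (1.0 - fraction))
    keep = min(keep, len(target_idx))  # cannot keep more instances than exist
    rng = random.Random(seed)
    return sorted(other_idx + rng.sample(target_idx, keep))
```

Sweeping `fraction` from 0.01 to 0.10 on a balanced 10-class dataset reproduces the range described above, ending with the original balanced dataset.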
Figure 7 demonstrates that in cases where the automobile class constitutes 5% or less of the total training dataset, i.e. the automobile class has fewer than half as many instances as each of the other classes, this skewness will result in the automobile class displaying more severe vulnerability to membership inference attack.
Interestingly, the automobile class displays lower membership inference vulnerability than the average vulnerability reported in the CIFAR10 dataset when the dataset is balanced. However, when the automobile class becomes a minority class with fewer than half the instances of each of the other classes, the vulnerability shifts to be greater than that reported by the overall model. This gap becomes greater as continued decreased representation results in continued increased vulnerability for the automobile class.
Target Population  Attack Accuracy (%) 
Aggregate  70.14 
Male Images  68.18 
Female Images  76.85 
White Race Images  62.77 
Racial Minority Images  89.90 
Table IV additionally shows the vulnerability of a DNN target model trained on the LFW dataset to membership inference attacks. We analyze this vulnerability by breaking down the aggregated vulnerability across the top 22 classes into four different (non-disjoint) subsets of the LFW dataset: Male, Female, White Race, and Racial Minority. We observe that the training examples of racial minorities experience the highest attack success rate (89.90%) and are thus highly vulnerable to membership inference attacks compared to images of white individuals (62.77%). Similarly, female images, which make up a minority of the training data, demonstrate higher average vulnerability (76.85%) compared with images of males (68.18%).
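A per-population breakdown of this kind amounts to evaluating attack accuracy over overlapping subsets of the same evaluation set. The following sketch shows one way to compute it; the function name and the boolean-mask representation of subgroups are assumptions for illustration and are not taken from MPLens.

```python
from typing import Dict, Sequence


def subgroup_attack_accuracy(
    is_member: Sequence[bool],
    predicted_member: Sequence[bool],
    groups: Dict[str, Sequence[bool]],
) -> Dict[str, float]:
    """Attack accuracy over all instances ("Aggregate") and over each named subgroup.

    groups maps a subgroup name to a boolean mask; masks may overlap (non-disjoint subsets).
    """
    correct = [t == p for t, p in zip(is_member, predicted_member)]
    report = {"Aggregate": sum(correct) / len(correct)}
    for name, mask in groups.items():
        selected = [c for c, in_group in zip(correct, mask) if in_group]
        if selected:  # skip empty subgroups rather than divide by zero
            report[name] = sum(selected) / len(selected)
    return report
```

Because the masks are independent of one another, an instance can count toward several subgroups, for example both a gender subset and a race subset.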
To provide deeper insight and more intuitive illustration for the increased vulnerability of minority groups under membership inference attacks, Table V provides 7 individual examples of images in the LFW dataset targeted by the membership inference attack. That is, given a query with each example image, the target model predicts the individual’s race and gender (22 separate classes) which the attack model, a binary classifier, uses to predict if that image was “in” or “out” of the target model’s training dataset. The last row reports the ground truth as to whether or not the image was in the target model training set.
✓ = correct prediction, ✗ = wrong prediction
Target Confidence (%)  ✓ 99.99  ✓ 65.81  ✓ 72.56  ✗ 62.30  ✓ 99.99  ✗ 99.63  ✓ 98.38
Attacker Confidence (%)  ✓ 86.10  ✓ 50.49  ✗ 61.85  ✓ 72.06  ✓ 56.40  ✓ 99.88  ✗ 53.29
In Training Data?  in  out  out  out  in  out  out
Through Table V, we highlight how minority populations are more likely to be identified by an attacker with a higher degree of confidence. We next discuss each example from left to right to articulate the impact of data skewness on model vulnerability to the membership inference attack.
For the 1st image, the target model is highly confident with its prediction and its prediction is indeed correct. Using the membership inference attack model, the attacker predicts that the model must have seen this example with high confidence and it succeeds in the membership inference attack.
For the 2nd image, the target model is less confident with its prediction, although the prediction outcome is correct. The attacker succeeds in the membership inference attack because it correctly predicts that this example is not in the training set, though the attacker’s confidence in this membership inference is much lower (close to 50%) than for the 1st image. We conjecture that the relatively low prediction confidence of the target model contributes to the attacker being unable to obtain high confidence in the membership inference attack.
The 3rd image is predicted by the target model correctly with a confidence of 72.56%, about 6.75 percentage points higher than that for the 2nd image (65.81%). However, the attacker wrongly predicts that the example is in the training set when the ground truth shows that it is not. The attacker may have been misled by applying the same logic as with the 1st image, i.e., that a confident and accurate target model prediction indicates that the image was in the training dataset.
For the 4th image, the target model has an incorrect prediction with the confidence of 62.30%. The attacker correctly predicts that this example is not in the training set. It is clear that a somewhat confident and yet incorrect prediction by the target model is likely to result in high attacker confidence that this minority individual has not been seen during the training.
These four images highlight the compounding downfall for minority populations. Models are more likely to overfit these populations. This leads to poor test accuracy for these populations and makes them more vulnerable to attack. As the 3rd example shows, the way to fool attackers is to have an accurate target model that can show reasonable confidence when classifying minority test images.
We next compare the results from the previous four images which represented minority classes with results from three images representing the majority class (white male images). For the 5th image in Table V, the target model predicts correctly with high confidence and the attacker is able to correctly predict that the image was in the training data. This result can be interpreted through comparison with the attacker performance for the 1st image. For the 5th query image which is from the majority group, the attacker predicts correctly that the image has been seen in training with barely over 50% in confidence, showing relatively high uncertainty compared with the 1st image from a minority class. This indicates that model accuracy and confidence are weaker indicators with respect to membership inference vulnerability for the majority class.
For the 6th image, the target model produces an incorrect prediction with high confidence. The attacker is very confident that the query image is not in the training dataset, which is indeed the truth. This demonstrates a rare potential vulnerability for the majority class: when the target model has high confidence in an inaccurate prediction, an attack model is able to confidently succeed in the membership inference attack. Through this example and the above analysis, we see that the majority classes have two advantages compared to the minority classes: (1) it is less common for the target model to demonstrate this vulnerability of misclassification with high confidence; and (2) for majority classes, accuracy and privacy are aligned rather than competing objectives.
For the 7th image, the target model makes a correct prediction with high confidence. The attacker incorrectly predicts that the example was in the training dataset of the target model, when in truth it was not. This membership attack failed and is the flip side of the 5th image, with both reporting low attack confidence. Again, comparison with the 1st image, which is from a minority class, shows how model confidence and accuracy may lead to membership inference vulnerability for the minority classes in a way that is not true for the majority classes.
VI-D Mitigation with Differentially Private Training
The second core component of our experimental analysis is to use MPLens to investigate the effectiveness of differential privacy applied to deep learning models as a countermeasure for membership inference mitigation.
To define utility loss we follow [38] and consider $1 - \frac{Acc_{dp}}{Acc_{np}}$, where $Acc_{np}$ represents accuracy in a non-private setting and $Acc_{dp}$ is the accuracy when differential privacy is employed for the same model and data.
VI-D1 Model and Problem Complexity
III Characterization of Membership Inference
III-A Impact of Model Based Factors on Membership Inference
The most widely acknowledged factor impacting vulnerability to membership inference attacks is the degree of overfitting in the trained target model. Shokri et al. [17] demonstrate that the more overfitted a DNN model is, the more it leaks under membership inference attacks. Yeom et al. [16] investigated the role of overfitting from both the theoretical and the experimental perspectives. While their results confirm that models become more vulnerable as they overfit more severely, the authors also state that overfitting is not the only factor leading to model vulnerability under the membership inference attack. Truex et al. [14] further demonstrate that several other model based factors also play important roles in causing model vulnerability to membership inference, such as classification problem complexity, in-class standard deviation, and the type of machine learning model targeted.
III-B Impact of Attacker Knowledge on Membership Inference
Another category of factors that may cause model vulnerability to the membership inference attacks is the type and scope of knowledge which attackers may have about the target model and its training parameters. For example, Truex et al. [14] identified the impact that attacker knowledge with respect to both the training data of the target model and the target data has on the accuracy of the membership inference attack. This was evaluated by varying the degree of noise in the shadow dataset and target data used by the attacker. Table II shows the experimental results on four datasets with four types of learning tasks. The datasets include the CIFAR10 dataset, which contains 32 x 32 color images of different classes of objects, and the Purchases datasets, which were developed from the Kaggle Acquire Valued Shoppers Challenge dataset containing the shopping history of several thousand individuals. Each instance in the Purchases datasets represents an individual and each feature represents a particular product. If an individual has a purchase history with this product in the Kaggle Acquire Valued Shoppers Challenge dataset, there will be a 1 for the feature and otherwise a 0. The instances are then clustered into different shopping profile types which are treated as the classes. Table II reports results for Purchases datasets considering 10, 20, and 50 different shopping profile types.
The experiments in Table II demonstrate the impact of the attacker knowledge of the target data points by evaluating how adding varying degrees of noise to data features affects the success rate of membership inference attacks. The noise is sampled uniformly and added to normalized features. Given a level of uncertainty or inaccuracy on the part of the attacker about a target data point, represented by a corresponding degree of noise, Table II evaluates how effective the attacker remains in launching a membership inference attack to identify whether or not the data point was in the training dataset. The results in Table II are reported for four logistic regression models, each one trained on a different dataset, with gradually increasing degrees of noise.
We make two interesting observations from Table II. First, for all four datasets, the more accurate (the less noisy) the attacker knowledge about the target data is, the higher the model vulnerability (attack success rate) to membership inference. This shows that the accuracy of attacker knowledge about the targeted examples is an important factor in determining model vulnerability and attack success rate (in terms of attack accuracy). Second, in comparison to the noisy target data, adding noise to the shadow dataset results in a less severe drop in accuracy. Similar trends are however still observed, with slightly higher attack success rates under smaller noise values for all four datasets. This set of experiments demonstrates that attackers with different knowledge and different levels of resources may have different success rates in launching a membership inference attack. Thus, model vulnerability should be evaluated by taking into account potential or available attacker knowledge.
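A noisy attacker view of the data, as used in this kind of experiment, can be simulated by perturbing each normalized feature. The sketch below is only an illustration: the helper name is hypothetical, and drawing noise uniformly from the interval between zero and the noise level is an assumption, since the exact sampling range is not recoverable from the text here.

```python
import random
from typing import List, Sequence


def add_uniform_noise(features: Sequence[float], noise_level: float, seed: int = 0) -> List[float]:
    """Perturb each feature with noise drawn uniformly from [0, noise_level].

    Assumption: features are already normalized (for example to [0, 1]); the exact
    sampling range used in the source experiments is not reproduced here.
    """
    if noise_level < 0:
        raise ValueError("noise_level must be non-negative")
    rng = random.Random(seed)
    return [x + rng.uniform(0.0, noise_level) for x in features]
```

Applying the helper to the target data models an attacker with inaccurate knowledge of the queried instances, while applying it to the shadow dataset models an attacker with inaccurate training-like data.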
III-C Transferability of Membership Inference
Inspired by the transferability of adversarial examples [19], [20], [21], [22], membership inference attacks are also shown to be transferable. That is, an attack model trained on an attack dataset containing the outputs from a set of shadow models is effective not only when the shadow models and the target model are of the same type but also when the shadow model type varies. This property further opens the door to black-box attackers who do not have any knowledge of the target model.
Purchases20  Shadow Model Type  
Attack Model  DT  kNN  LR  NB 
DT  88.98  87.49  72.08  81.84 
kNN  88.23  72.57  84.75  74.27 
LR  89.02  88.11  88.99  83.57 
NB  88.96  78.60  89.05  66.34 
Table III demonstrates this property of membership inference attacks. It reports the membership inference attack accuracy for various attack configurations against a decision tree (DT) target model trained on the Purchases20 dataset, covering different combinations of four model types used as the attack model type (rows) and the shadow model type (columns). We observe that while using a decision tree as the shadow model results in the most consistent membership inference attack success compared to other combinations, multiple combinations with both kNN and logistic regression (LR) shadow models achieve attack success within 5% of the most successful attack configuration using decision tree shadow models. Table III also shows that multiple types of models can be successful as the binary attack classifier. This set of experiments also shows that the worst attack performances against the DT target prediction model are seen when the shadow models are trained using Naïve Bayes (NB), with the worst performance reported when NB is the model type of the attack model as well. These results indicate that (1) the same strategy used for selecting the shadow model type may not be optimal for the attack model and (2) shadow models of types other than the target model type may still lead to successful membership inference attacks. We refer readers to [14] for additional detail.
The transferability study in Table III indicates that an attacker does not always need to know the exact target model configuration to launch an effective membership inference attack, as attack models can be transferable from one target model type to another. And although finding the most effective attack strategy can be a challenging task for attackers, vulnerability to membership inference attack remains serious even with suboptimal attack configurations, with almost all configurations reporting attack accuracy above 70% and many above 85%.
III-D Training Data Skewness on Membership Attacks
The fourth important dimension of membership inference vulnerability is the risk imbalance across different prediction classes when the training data is skewed. Even when the overall membership inference vulnerability appears limited with attack success close to the 50% baseline (in or out random guess), there may be subgroups within the training data, which display significantly more vulnerability.
For example, Figure 2 illustrates the impact of data skewness on membership inference vulnerability. In this set of experiments, we measure the membership inference attack accuracy for a decision tree target model trained on the publicly available Adult dataset [23]. The Adult dataset contains 48,842 instances, each with 14 different features, and presents a binary classification problem wherein one wishes to identify if an individual’s yearly salary is >$50K or ≤$50K. The class distribution, however, is skewed, with instances labeled >$50K forming the minority class. As overfitting is widely considered a key factor in membership inference vulnerability, we simulate overfitting by increasing the depth of the target decision tree model. Figure 2 shows the impact of overfitting (increasing along the X-axis) on both the aggregate membership inference vulnerability in terms of membership inference attack accuracy (accuracy over all classes) and the minority membership inference vulnerability (attack accuracy over the minority class). In this case, aggregate vulnerability is the accuracy of the membership inference attack evaluated on an equal number of randomly selected examples seen by the target model (“in”) as unseen (“out”), while the minority vulnerability reports the membership inference attack accuracy evaluated on only the subset of the previously selected examples whose class is >$50K (the minority class).
We observe from Figure 2 that the minority class has an increased risk under the membership inference attack as the model overfits more severely. This follows the intuition that minority class members have fewer other instances amongst whom they can hide in the training set and thus are more easily exposed under membership inference attacks. This aligns, to some extent, with the observation that smaller training dataset sizes can lead to a greater overall risk for membership inference [17]. We argue that it is important for both data owners and MLaaS providers to consider vulnerability not just for the entire training dataset, but also the level of risk for minority populations specifically when evaluating privacy compliance.
IV Mitigation Strategies and Algorithms
IV-A Differential Privacy
Differential privacy is a formal privacy framework with a theoretical foundation and rigorous mathematical guarantees when effectively employed [10]. A machine learning algorithm is defined to be differentially private if and only if the inclusion of a single instance in the training dataset will cause only statistically insignificant changes to the output of the algorithm. Theoretical limits are set on such output changes in the definition of differential privacy, which is given formally as follows:
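Definition 1 (Differential Privacy [10])
A randomized mechanism $\mathcal{M}$ provides $(\epsilon, \delta)$-differential privacy if for any two neighboring databases $D_1$ and $D_2$ that differ in only a single entry, $\forall S \subseteq Range(\mathcal{M})$,
(1) $\Pr[\mathcal{M}(D_1) \in S] \le e^{\epsilon} \Pr[\mathcal{M}(D_2) \in S] + \delta$
If $\delta = 0$, $\mathcal{M}$ is said to satisfy $\epsilon$-differential privacy.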
In the remainder of the paper, we focus on $\epsilon$-differential privacy for presentation convenience. To achieve $\epsilon$-differential privacy (DP), noise defined by $\epsilon$ is added to the algorithm’s output. This noise is proportional to the sensitivity of the output. Sensitivity measures the maximum change of the output due to the inclusion of a single data instance.
Definition 2 (Sensitivity [10])
For $f: D \rightarrow \mathbb{R}^d$, the sensitivity of $f$ is
(2) $S_f = \max_{D_1, D_2} \| f(D_1) - f(D_2) \|_2$
for all $D_1$, $D_2$ differing in at most one element.
The noise mechanism which is used is therefore bounded by both the sensitivity of the function $f$, $S_f$, and the privacy parameter $\epsilon$. For example, consider the Gaussian mechanism defined as follows:
Definition 3 (Gaussian Noise Mechanism)
$\mathcal{M}(D) = f(D) + \mathcal{N}(0, S_f^2 \sigma^2)$, where $\mathcal{N}(0, S_f^2 \sigma^2)$ is the normal distribution with mean $0$ and standard deviation $S_f \sigma$.
A single application of the Gaussian mechanism in Definition 3 to a function $f$ with sensitivity $S_f$ satisfies $(\epsilon, \delta)$-differential privacy if $\delta \ge \frac{4}{5} \exp(-(\sigma \epsilon)^2 / 2)$ and $\epsilon < 1$ [24].
Additionally, there exist several nice properties of differential privacy for either multiple iterations of a differentially private function or the combination of multiple different functions wherein each satisfies a corresponding $\epsilon_i$-differential privacy. These composition properties are important for machine learning processes which often involve multiple passes over the training dataset $D$. The formal composition properties of differential privacy include the following:
Definition 4 (Composition properties [24, 25])
Let $f_1, f_2, \ldots, f_n$ be $n$ algorithms, such that for each $i \in [1, n]$, $f_i$ satisfies $\epsilon_i$-DP. Then, the following properties hold:

Sequential Composition: Releasing the outputs $f_1(D), f_2(D), \ldots, f_n(D)$ satisfies $(\sum_{i=1}^{n} \epsilon_i)$-DP.

Parallel Composition: Executing each algorithm on a disjoint subset of $D$ satisfies $\max_i(\epsilon_i)$-DP.

Immunity to Post-processing: Computing a function of the output of a differentially private algorithm does not deteriorate its privacy, e.g., publicly releasing the output of $f_i(D)$ or using it as an input to another algorithm does not violate $\epsilon_i$-DP.
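The Gaussian mechanism and the composition properties can be made concrete with a few lines of code. The sketch below is illustrative only: the function names are hypothetical, and the noise multiplier uses the classical calibration sqrt(2 ln(1.25/delta)) / epsilon, which is one setting that satisfies the condition stated with Definition 3; it is an assumption, not a claim about how any particular library calibrates noise.

```python
import math
import random
from typing import Sequence


def gaussian_sigma(epsilon: float, delta: float) -> float:
    """Noise multiplier sigma = sqrt(2 ln(1.25/delta)) / epsilon (classical bound, for epsilon < 1)."""
    if not (0.0 < epsilon < 1.0 and 0.0 < delta < 1.0):
        raise ValueError("requires 0 < epsilon < 1 and 0 < delta < 1")
    return math.sqrt(2.0 * math.log(1.25 / delta)) / epsilon


def gaussian_mechanism(value: float, sensitivity: float, sigma: float, rng: random.Random) -> float:
    """Release value plus Gaussian noise with standard deviation sensitivity * sigma."""
    return value + rng.gauss(0.0, sensitivity * sigma)


def sequential_epsilon(epsilons: Sequence[float]) -> float:
    """Sequential composition: budgets add up when all algorithms see the same data."""
    return sum(epsilons)


def parallel_epsilon(epsilons: Sequence[float]) -> float:
    """Parallel composition: on disjoint subsets the largest budget dominates."""
    return max(epsilons)
```

For instance, three releases over the same dataset with budgets 0.1, 0.2, and 0.3 consume a total budget of 0.6 under sequential composition, but only 0.3 if each release touches a disjoint subset.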
Differential privacy can be applied to different types of machine learning models. Due to space constraints, in the rest of the paper we specifically focus on differentially private training of deep neural network (DNN) models.
IV-B Mechanisms for Differentially Private Deep Learning
DNNs are complex, sequentially stacked neural networks containing multiple layers of interconnected nodes. Each node represents the dataset in a unique way and each layer of networked nodes processes the input from previous layers using learned weights and a predefined activation function. The objective in training a DNN is to find the optimal weight values for each node in the multi-tier networks. This is accomplished by making multiple passes over the entire dataset with each pass constituting one epoch. Within each epoch, the entire dataset is partitioned into many mini-batches of equal size and the algorithm processes these batches sequentially, each including only a subset of the data. When processing one batch, the data is fed forward through the network using the existing weight values. A predefined loss function is computed for the errors made by the neural network learner with respect to the current batch of data. An optimizer, such as stochastic gradient descent (SGD), is then used to propagate these errors backward through the network. The weights are then updated according to the errors and the learning rate set by the training algorithm. The higher the learning rate value, the larger the update made in response to the backward propagation of errors.
Differentially private deep learning can be implemented by adding a small amount of noise to the updates made to the network such that there is only a marginal difference between the following two scenarios: (1) when a particular individual is included within the training dataset and (2) when the individual is absent from the training dataset. The noise added to the updates is sampled from a Gaussian distribution with scale determined by an appropriate noise parameter $\sigma$ corresponding to a desired level of privacy and controlled sensitivity. That is, the privacy budget $\epsilon$, according to Definition 1, should constrain the value of $\sigma$ at each epoch. Let $E$ denote the number of epochs for the DNN training, a predefined hyperparameter set at the training configuration as the termination condition. Let one epoch satisfy $\epsilon_i$-differential privacy given $\sigma$ from the Gaussian mechanism in Definition 3. Then, a traditional accounting using the composition properties of differential privacy (Definition 4) would dictate that $E$ epochs would result in an overall privacy guarantee of $\epsilon$ if $\epsilon = E \times \epsilon_i$, and each epoch employed the Gaussian mechanism with the same value $\sigma$. We refer to this approach as the fixed noise perturbation method [12]. An alternative approach proposed by Yu et al. in [13] advocates a variable noise perturbation approach, which uses a decaying function to manage the total privacy budget $\epsilon$ and define a variable noise scale $\sigma_t$ based on different settings of $\epsilon_t$ for each different epoch $t$ ($1 \le t \le E$), aiming to add a variable amount of noise to the $t$th epoch in a decreasing manner as the training progresses in epochs. Thus, we have $\epsilon_i \neq \epsilon_j$ for $i \neq j$, and $\sigma_t$ for a given epoch $t$ is bounded by its allocated privacy budget $\epsilon_t$. The same overall privacy guarantee is met when $\sum_{t=1}^{E} \epsilon_t = \epsilon$ for the differentially private DNN training of $E$ epochs.
IV-C Differentially Private Deep Learning with Fixed $\sigma$
The first differentially private approach to training deep neural networks is proposed by Abadi et al. [12] and implemented on the TensorFlow deep learning framework [26]. A summary of their approach is given in Algorithm 1. To apply differential privacy, the sensitivity of each epoch is bounded by a clipping value $C$, specifying that an instance may impact weight updates by at most the value $C$. To achieve differential privacy, weight updates at the end of each batch include noise injection according to the sensitivity defined by $C$ and the scale of noise $\sigma$. The choice of $\sigma$ is directly related to the overall privacy guarantee.
Let $\sigma = \sqrt{2 \log(1.25/\delta)} / \epsilon$, then according to differential privacy theory [24], each step (processing of a batch) is $(\epsilon, \delta)$-differentially private. If the batch is randomly sampled from $D$ with sampling ratio $q$, then additional properties of random sampling [27, 28] may be applied. Each step then becomes $(O(q\epsilon), q\delta)$-differentially private. The moments accountant privacy accounting method is also introduced in [12] to prove that Algorithm 1 is $(O(q\epsilon\sqrt{T}), \delta)$-differentially private, for $T$ training steps, given appropriate parameter settings.
We refer to Algorithm 1 as the fixed noise perturbation approach as each epoch is treated equally by introducing the same noise scale to every parameter update.
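The per-batch update described in this subsection (bound each instance's influence by clipping, then add Gaussian noise scaled to the clipping value) can be sketched as follows. This is a simplified, pure-Python illustration rather than a reproduction of Algorithm 1 or of any framework's implementation: the function names are hypothetical, per-example gradients are assumed to be given as plain lists, and no privacy accounting is performed.

```python
import math
import random
from typing import List, Sequence


def clip_gradient(grad: Sequence[float], clip_norm: float) -> List[float]:
    """Scale a per-example gradient so that its L2 norm is at most clip_norm."""
    norm = math.sqrt(sum(g * g for g in grad))
    scale = 1.0 / max(1.0, norm / clip_norm)
    return [g * scale for g in grad]


def noisy_sgd_step(
    weights: Sequence[float],
    per_example_grads: Sequence[Sequence[float]],
    learning_rate: float,
    clip_norm: float,
    sigma: float,
    rng: random.Random,
) -> List[float]:
    """One batch update: clip each example's gradient, sum, add Gaussian noise, average, descend."""
    batch_size = len(per_example_grads)
    clipped = [clip_gradient(g, clip_norm) for g in per_example_grads]
    updated = []
    for j, w in enumerate(weights):
        total = sum(g[j] for g in clipped)
        # Noise standard deviation is sigma * clip_norm, i.e. scaled to the clipped sensitivity.
        noisy_mean = (total + rng.gauss(0.0, sigma * clip_norm)) / batch_size
        updated.append(w - learning_rate * noisy_mean)
    return updated
```

Setting `sigma` to zero recovers ordinary mini-batch SGD with gradient clipping, which is a convenient sanity check; in the fixed noise perturbation approach the same `sigma` would be passed at every step.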
IV-D Differentially Private Deep Learning with Variable $\sigma$
The variable noise perturbation approach to differentially private deep learning is proposed by Yu et al. [13]. It extends the fixed noise scale of $\sigma$ over the total $E$ epochs in [12] by introducing two new capabilities. First, Yu et al. [13] pointed out a limitation of the approach outlined in Algorithm 1. Namely, Algorithm 1 specifically calls for random sampling, wherein each batch is selected randomly with replacement. However, the most popular implementation for partitioning a dataset into mini-batches in many deep learning frameworks is random shuffling, wherein the dataset is shuffled and then partitioned into evenly sized batches. In order to develop a differentially private DNN model under random shuffling, Yu et al. [13] extend Algorithm 1 of [12] by introducing a new privacy accounting method.
Additionally, Yu et al. [13] analyze the problem of using a fixed noise scale (fixed $\sigma$ values), and propose employing different noise scales for the weight updates at different stages of the training process. Specifically, Yu et al. [13] propose a set of methods for privacy budget allocation, which improve model accuracy by progressively reducing the noise scale as the training progresses. The variable noise scale approach is inspired by two observations. First, as the training progresses, the model begins to converge, causing the noise being introduced to the updates to potentially become more impactful. This slows down the rate of model convergence and causes later epochs to no longer increase model accuracy compared to non-private scenarios. Second, the research on improving training accuracy and convergence rate of DNN training has led to a new generation of learning rate functions that replace the constant learning rate baseline with decaying learning rate functions and cyclic learning rates [29, 30]. Yu et al. [13] employed a similar set of decay functions to add noise at a decreased scale. That is, the noise defined by $\sigma_j$ at the $j$th epoch as the training progresses is less than $\sigma_i$ at the $i$th epoch given $i < j$. The performance of four different types of decay functions to introduce variable noise scale by partitioning $\epsilon$ over $E$ epochs was evaluated.
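The difference between the fixed and the variable noise perturbation approaches can be seen by comparing per-epoch noise schedules. The sketch below uses exponential decay purely as an example of a decay function; the function names, the decay form, and the parameter `decay_rate` are assumptions for illustration, and the sketch does not perform the privacy accounting needed to relate a schedule to an overall privacy budget.

```python
import math
from typing import List


def fixed_noise_schedule(sigma: float, epochs: int) -> List[float]:
    """Fixed noise perturbation: the same noise scale at every epoch."""
    return [sigma] * epochs


def exponential_decay_schedule(sigma_0: float, decay_rate: float, epochs: int) -> List[float]:
    """Variable noise perturbation: sigma_t = sigma_0 * exp(-decay_rate * t) for t = 0..epochs-1."""
    return [sigma_0 * math.exp(-decay_rate * t) for t in range(epochs)]
```

With a positive `decay_rate`, later epochs receive strictly less noise than earlier ones, matching the decreasing noise scale described above, whereas the fixed schedule treats every epoch equally.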
IV-E Important Implementation Factors
IV-E1 Choosing Epsilon
In differentially private algorithms, the $\epsilon$ value dictates the amount of noise that must be introduced into DNN model training and therefore the theoretical privacy bound. Choosing the correct $\epsilon$ value requires carefully balancing the privacy leakage that is tolerable in the practical setting against the tolerable utility loss.
For example, Naldi and D’Acquisto propose a method in [31] for finding the optimal $\epsilon$ value to meet a certain accuracy for Laplacian mechanisms. Lee and Clifton alternatively employ an approach in [32] that analyzes a particular adversarial model. Hsu et al. [33], on the other hand, take an individual’s perspective on data privacy through opt-in incentivization. Kohli and Laskowski [34] also promote choosing an $\epsilon$ based on individual privacy preferences.
Despite these existing approaches, determining the “right” $\epsilon$ value remains a complex problem and is likely to be highly dependent on the privacy policy of the organization that owns the model and the dataset, the vulnerability of the model, the sensitivity of the data, and the tolerance to utility loss in the given setting. Additionally, there might be scenarios, such as the healthcare setting, in which even small degrees of utility loss are intolerable and are combined with stringent privacy constraints given highly sensitive data. In these cases, it may be hard or even impossible to find a good $\epsilon$ value for existing differentially private DNN training techniques.
IV-E2 The Role of Transfer Learning
Another key consideration is the use of transfer learning for dealing with more complex datasets [12, 13]. For example, model parameters may be initialized by training on a non-private, similar dataset. The private dataset is then used only to fine-tune a subset of the model parameters. This helps to reduce the number of parameters affected by the noise addition that differential privacy requires. The use of transfer learning, however, relies on the strong assumption that such a non-private, similar dataset exists and is available. In many cases, this assumption may be unrealistic.
IV-E3 Parameter Optimization
In addition to the privacy budget parameters $(\epsilon, \delta)$, differentially private deep learning introduces influential privacy parameters, such as the clipping value that bounds the sensitivity for each epoch and the noise scale approach (including potential decay parameters) for $\sigma$. The settings of these privacy parameters may affect both the training convergence rate, and thus the training time, and the training and testing accuracy. However, tuning these privacy parameters becomes much more complex because we need to take into account the many learning hyperparameters already present in DNN training, such as the number of epochs, the batch size, the learning rate policy, and the optimization algorithm. These hyperparameters need to be carefully configured for high-performance training of deep neural networks even in a non-private setting. For deep learning with differential privacy, one needs to configure the privacy parameters while considering their side effects on other hyperparameters, and reconfigure the previously tuned learning hyperparameters to account for the privacy approach.
For example, for a fixed total privacy budget $(\epsilon, \delta)$, too few epochs may result in reduced accuracy due to insufficient time to learn. A higher number of epochs, however, will result in a higher $\sigma$ value required at each epoch (recall Algorithm 1). Therefore, a carefully tuned hyperparameter for a non-private setting, such as the optimal number of epochs, may no longer be effective when differentially private deep learning is enabled.
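The interaction between the number of epochs and the per-epoch noise can be made concrete with a deliberately naive calculation. The sketch below assumes the total budget is split evenly across epochs by basic sequential composition, and calibrates the noise with the classical Gaussian-mechanism bound $\sigma \geq \sqrt{2 \ln(1.25/\delta)}/\epsilon$ for unit sensitivity and $\epsilon < 1$. The function names and the budget of (0.9, 1e-5) are illustrative assumptions. Tighter accounting methods give smaller values, so only the direction of the effect should be read from the output.

```python
import math

def gaussian_sigma(epsilon, delta, sensitivity=1.0):
    """Classical Gaussian-mechanism calibration (valid for 0 < epsilon < 1)."""
    if not 0.0 < epsilon < 1.0:
        raise ValueError("this bound assumes 0 < epsilon < 1")
    return sensitivity * math.sqrt(2.0 * math.log(1.25 / delta)) / epsilon

def per_epoch_sigma(total_epsilon, total_delta, epochs):
    """Split the total budget evenly across epochs (naive sequential composition)."""
    return gaussian_sigma(total_epsilon / epochs, total_delta / epochs)

# Hypothetical total budget; more epochs leave less budget, hence more noise, per epoch.
sigmas = {epochs: per_epoch_sigma(0.9, 1e-5, epochs) for epochs in (1, 10, 50, 100)}
for epochs, sigma in sigmas.items():
    print(f"{epochs:>3} epochs -> sigma per epoch = {sigma:.1f}")
```

Under this accounting, the required noise grows faster than linearly in the number of epochs, which is why an epoch count tuned for non-private training may need to be revisited.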
A number of challenging questions remain open problems in differentially private DNN training, such as at what point do more epochs lead to accuracy loss under a particular privacy setting? What is the right noise decaying function for effective deep learning with differential privacy? Can we learn privately with high accuracy given complex datasets? The balance of the many parameters in a differentially private deep learning system presents new challenges to practitioners.
With these questions in mind, we develop MPLens, a membership privacy analysis system, which facilitates the evaluation of model vulnerability against membership inference attacks. Through MPLens, we investigate the effectiveness of differential privacy as a mitigation technique against membership inference attacks, including the tradeoffs of implementing such a mitigation strategy for preventing membership inference and the impact of differential privacy on different classes for the DNN models trained using skewed training datasets. We also highlight how the vulnerability of pretrained models under the membership inference attack is not uniform when the training data itself is skewed with minority populations. We show how this vulnerability variation may cause increased risks in federated learning systems.
V MPLens: System Overview
MPLens is designed as a privacy analysis and privacy compliance evaluation system that helps both data scientists and MLaaS providers understand the membership inference vulnerability risk involved in their model training and model prediction processes. Figure 3 provides an overview of the system architecture. The system allows providers to specify a set of factors that are critical to privacy analysis. Example factors include the data used to train their model, what data might be held by the attacker, what attack technique might be used, the degree of data skewness in the training set, whether the prediction model is constructed using differentially private model training, what configurations are used for the set of differential privacy parameters, and so forth. Given the model input, the MPLens evaluation system reports the overall statistics on the vulnerability of the model, the per-class vulnerability, as well as the vulnerability of any sensitive populations such as specific minority groups. Example statistics include attack accuracy, precision, recall, and F1 score. We also include attacker confidence for true positives, false positives, true negatives, and false negatives, the average distance from both the false positives and the false negatives to the training data, and the time required to execute the attack.
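The attack statistics listed above follow the standard definitions for a binary classifier, where the positive class is "member of the training set". The sketch below shows how they might be computed from attacker guesses and ground truth; the function name `attack_metrics` and the toy data are hypothetical and are not part of MPLens.

```python
def attack_metrics(predicted_member, is_member):
    """Accuracy, precision, recall, and F1 for binary membership predictions."""
    pairs = list(zip(predicted_member, is_member))
    tp = sum(1 for p, y in pairs if p and y)          # predicted in, truly in
    fp = sum(1 for p, y in pairs if p and not y)      # predicted in, truly out
    tn = sum(1 for p, y in pairs if not p and not y)  # predicted out, truly out
    fn = sum(1 for p, y in pairs if not p and y)      # predicted out, truly in
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": (tp + tn) / len(pairs),
            "precision": precision, "recall": recall, "f1": f1}

# Toy data: attacker guesses vs. ground truth for eight query points.
predicted = [True, True, True, False, False, True, False, False]
truth = [True, True, False, False, True, True, False, False]
metrics = attack_metrics(predicted, truth)
print(metrics)
```

Restricting the inputs to the queries of one class or one population yields the per-class and per-population variants of the same statistics.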
V-A Target Model Training
Overfitting is the first factor that MPLens measures for conducting membership vulnerability analysis. MPLens specifically highlights the overfitting analysis by reporting the Accuracy Difference between the target model training accuracy and testing accuracy. This enables MLaaS providers and domain-specific data scientists to understand whether their vulnerability might be linked to overfitting. As previously indicated, while overfitting is strongly correlated with membership inference vulnerability, it is not the only source of vulnerability. Thus, when MPLens indicates undesirable vulnerability in the absence of significant overfitting, further analysis may be triggered to investigate whether the vulnerability is linked to other model or data characteristics such as those discussed in Section III.
V-B Attacker Knowledge
Our MPLens system is by design customizable to understand multiple attack scenarios. For instance, users may specify the shadow data used by the attacker. This allows the user to consider a scenario in which the attacker has access to some subset of the target model’s training data, one in which the attacker has access to some data drawn from the same distribution as the training data of the target model, or one in which the attacker has noisy and inaccurate data, such as that evaluated in Table II and possibly generated through black-box probing [17, 14].
The MPLens system is additionally customizable with respect to the attack method, including the shadow-model-based attack techniques [17] and the threshold-based attack techniques [16]. Furthermore, when using a threshold-based attack, MPLens can either accept predetermined values representing attacker knowledge of the target model error or determine good threshold values through shadow model training.
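A threshold-based attack of this kind can be sketched as follows. The sketch assumes one common form, in which a query is declared a training member when the target model's loss on it falls at or below a threshold; whether this matches the exact decision rule of [16] is an assumption. Both function names are hypothetical, and using the shadow model's mean training loss as the threshold is only one simple calibration choice.

```python
def threshold_attack(losses, threshold):
    """Predict 'member' when the target model's loss on a query is at most the threshold."""
    return [loss <= threshold for loss in losses]

def calibrate_threshold(shadow_train_losses):
    """One simple choice (an assumption here): the shadow model's mean training loss."""
    return sum(shadow_train_losses) / len(shadow_train_losses)

# Hypothetical numbers for illustration only.
shadow_train_losses = [0.05, 0.10, 0.02, 0.23]  # shadow model's losses on its own training data
query_losses = [0.01, 0.90, 0.08, 1.70]         # target model's losses on the attacker's queries

threshold = calibrate_threshold(shadow_train_losses)
predictions = threshold_attack(query_losses, threshold)
print(threshold, predictions)
```

Passing a fixed `threshold` corresponds to the case where the attacker already knows the target model error, while `calibrate_threshold` stands in for estimating it through shadow model training.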
These customizations allow MLaaS providers and users of MPLens to specify the types of attackers they wish to analyze, analyze their model vulnerability against such attackers, and evaluate the privacy compliance of their model training and model prediction services.
V-C Transferability
Another aspect of privacy analysis concerns the specific model training methods used to generate membership inference attacks and whether different methods result in significant variations in membership inference vulnerability. For the attack method from [17], the user can specify not only the attacker’s data but also the shadow model training algorithm and the training algorithm of the membership inference binary classifier. Each element is customizable as a system parameter when configuring MPLens for a specific privacy risk evaluation. The MPLens system makes no assumption about how the attacker develops the shadow dataset, what knowledge is included in the data, whether the attacker has knowledge of the target model algorithm, or what attack technique is used. This flexibility allows MPLens to support evaluation across various transferable attack configurations.
VI Experimental Results and Analysis
VI-A Datasets
All experiments reported in this section were conducted using the following four datasets.
CIFAR-10
The CIFAR-10 dataset contains 60,000 color images [35] and is publicly available. Each image is formatted to be 32 x 32. The CIFAR-10 dataset contains 10 classes with 6,000 images each: airplane, automobile, bird, cat, deer, dog, frog, horse, ship, and truck. The problem is therefore a 10-class classification problem where the task is to identify which class is depicted in a given image.
CIFAR-100
Also publicly available, CIFAR-100 similarly contains 60,000 color images [35] formatted to 32 x 32. It represents 100 classes, including various animals, pieces of household furniture, and types of vehicles. Each class has 600 available images. The problem is therefore a 100-class image classification problem.
MNIST
MNIST is a publicly available dataset containing 70,000 images of handwritten digits [36]. Each image is formatted to be 32 x 32 and processed such that the digit is at the center of the image. The MNIST dataset constitutes a 10-class classification problem where the task is to identify which digit between 0 and 9, inclusive, is contained within a given image.
Labeled Faces in the Wild
The Labeled Faces in the Wild (LFW) database contains face photographs for unconstrained face recognition with more than 13,000 images of faces collected from the web. Each face has been labeled with the name of the person pictured. 1,680 of the people pictured have two or more distinct photos in the data set. Each person is then labeled with a gender and race (including mixed races). Data is then selected for the top 22 classes which were represented with a sufficient number of data points.
VI-B Membership Inference Risk: Adversarial Examples
Given a target model and its training data, the membership inference attack, using black-box access to the model prediction API, can be used to create a representative dataset. This representative dataset can be leveraged to generate substitute models for the given target model. One can then use such substitute models to generate adversarial examples using different adversarial attack methods [37].
Figure 7 provides visualization plots comparing 2D PCA projections of the images of the dog and truck classes in CIFAR-10. The plots are divided according to the membership inference attack prediction output: the true target model training data (Figure 7), the data predicted by the membership inference attack as training data (Figure 7), and the data predicted as non-training data by the membership inference attack (Figure 7).
These plots illustrate the accuracy of the distribution of instances predicted as in the target model’s training dataset through membership inference. They clearly demonstrate how even with the inclusion of false positives, an attacker can create a good representation of the training data distribution, particularly compared to those instances not predicted to be in the target training data. An attacker can therefore easily train a substitute model on this representative dataset which then enables the generation of adversarial examples by attacking this substitute model. Examples developed to successfully attack a substitute model trained on the instances in Figure 7 are likely to also be successful against a model trained on the instances in Figure 7.
VI-C Membership Inference Risk: Data Skewness
To date, the membership inference attack has been studied either using training datasets with uniform class distribution or without specific consideration of the impact of any data skewness. However, as we demonstrated earlier, the risk of membership inference vulnerability can vary when class representation is skewed. Minority classes can display increased risk to membership inference attack because models struggle to generalize beyond the training data when fewer instances are given. In this section, we focus on studying the impact of data skewness on vulnerability to membership inference attacks.
In Figure 7 we investigate the impact of data skewness on membership inference vulnerability by controlling the representation of a single class. We reduce the automobile images from the CIFAR-10 dataset to only 1% of the data and then increase the representation until the dataset is again balanced, with automobiles representing 10% of the training images. We then plot the aggregate membership inference vulnerability, which is the overall membership inference attack accuracy evaluated across all classes, as well as the vulnerability of just the automobile class.
Figure 7 demonstrates that in cases where the automobile class constitutes 5% or less of the total training dataset, i.e., the automobile class has fewer than half as many instances as each of the other classes, this skewness results in the automobile class displaying more severe vulnerability to membership inference attack.
Interestingly, the automobile class displays lower membership inference vulnerability than the average vulnerability reported for the CIFAR-10 dataset when the dataset is balanced. However, when the automobile class becomes a minority class with fewer than half the instances of each of the other classes, its vulnerability becomes greater than that reported for the overall model. This gap widens as further decreases in representation result in further increases in vulnerability for the automobile class.
Target Population         Attack Accuracy (%)
Aggregate                 70.14
Male Images               68.18
Female Images             76.85
White Race Images         62.77
Racial Minority Images    89.90
Table IV additionally shows the vulnerability of a DNN target model trained on the LFW dataset to membership inference attacks. We analyze this vulnerability by breaking down the aggregated vulnerability across the top 22 classes into four different (non-disjoint) subsets of the LFW dataset: Male, Female, White Race, and Racial Minority. We observe that the training examples of racial minorities experience the highest attack success rate (89.90%) and are thus highly vulnerable to membership inference attacks compared to images of white individuals (62.77%). Similarly, female images, which make up a minority of the training data, demonstrate higher average vulnerability (76.85%) compared with images of males (68.18%).
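A breakdown of this kind amounts to computing the attack accuracy separately over overlapping subsets of the query set. The sketch below illustrates the bookkeeping with a hypothetical helper, `attack_accuracy_by_group`, and made-up toy records; the numbers it prints are not LFW results.

```python
from collections import defaultdict

def attack_accuracy_by_group(records):
    """records: iterable of (groups, predicted_member, is_member); groups may overlap."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for groups, predicted, actual in records:
        # Every record also counts toward the aggregate over all queries.
        for group in list(groups) + ["aggregate"]:
            total[group] += 1
            correct[group] += int(predicted == actual)
    return {group: 100.0 * correct[group] / total[group] for group in total}

# Hypothetical toy records, not LFW results.
records = [
    (["male", "white"], True, False),
    (["male", "white"], True, True),
    (["female", "white"], False, False),
    (["male", "minority"], True, True),
    (["female", "minority"], False, False),
]
accuracy = attack_accuracy_by_group(records)
print(accuracy)
```

Because each record belongs to one gender group and one race group, the subsets are non-disjoint, and the per-group accuracies need not average to the aggregate value.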
To provide deeper insight and more intuitive illustration for the increased vulnerability of minority groups under membership inference attacks, Table V provides 7 individual examples of images in the LFW dataset targeted by the membership inference attack. That is, given a query with each example image, the target model predicts the individual’s race and gender (22 separate classes) which the attack model, a binary classifier, uses to predict if that image was “in” or “out” of the target model’s training dataset. The last row reports the ground truth as to whether or not the image was in the target model training set.
✓ = correct prediction, ✗ = wrong prediction
Image                      1st      2nd      3rd      4th      5th      6th      7th
Target Confidence (%)      ✓ 99.99  ✓ 65.81  ✓ 72.56  ✗ 62.30  ✓ 99.99  ✗ 99.63  ✓ 98.38
Attacker Confidence (%)    ✓ 86.10  ✓ 50.49  ✗ 61.85  ✓ 72.06  ✓ 56.40  ✓ 99.88  ✗ 53.29
In Training Data?          in       out      out      out      in       out      out
Through Table V, we highlight how minority populations are more likely to be identified by an attacker with a higher degree of confidence. We next discuss each example from left to right to articulate the impact of data skewness on model vulnerability to the membership inference attack.
For the 1st image, the target model is highly confident with its prediction and its prediction is indeed correct. Using the membership inference attack model, the attacker predicts that the model must have seen this example with high confidence and it succeeds in the membership inference attack.
For the 2nd image, the target model is less confident in its prediction, although the prediction outcome is correct. The attacker succeeds in the membership inference attack because it correctly predicts that this example is not in the training set, though the attacker’s confidence in this inference is much lower (close to 50%) than for the 1st image. We conjecture that the relatively low prediction confidence of the target model contributes to the attacker’s inability to obtain high confidence in the membership inference attack.
The 3rd image is predicted correctly by the target model with a confidence of 72.56%, higher than the 65.81% for the 2nd image. However, the attacker wrongly predicts that the example is in the training set when the ground truth shows that it is not. The attacker may have been misled by applying the same logic as with the 1st image, i.e., that a confident and accurate target model prediction indicates that the image was in the training dataset.
For the 4th image, the target model makes an incorrect prediction with a confidence of 62.30%. The attacker correctly predicts that this example is not in the training set. A somewhat confident and yet incorrect prediction by the target model is likely to result in high attacker confidence that this minority individual was not seen during training.
These four images highlight the compounding downfall for minority populations. Models are more likely to overfit these populations. This leads to poor test accuracy for these populations and makes them more vulnerable to attack. As the 3rd example shows, the way to fool attackers is to have an accurate target model that can show reasonable confidence when classifying minority test images.
We next compare the results from the previous four images which represented minority classes with results from three images representing the majority class (white male images). For the 5th image in Table V, the target model predicts correctly with high confidence and the attacker is able to correctly predict that the image was in the training data. This result can be interpreted through comparison with the attacker performance for the 1st image. For the 5th query image which is from the majority group, the attacker predicts correctly that the image has been seen in training with barely over 50% in confidence, showing relatively high uncertainty compared with the 1st image from a minority class. This indicates that model accuracy and confidence are weaker indicators with respect to membership inference vulnerability for the majority class.
For the 6th image, the target model produces an incorrect prediction with high confidence. The attacker is very confident that the query image is not in the training dataset, which is indeed the truth. This demonstrates a rare potential vulnerability for the majority class: when the target model has high confidence in an inaccurate prediction, an attack model is able to confidently succeed in the membership inference attack. Through this example and the above analysis, we see that the majority classes have two advantages compared to the minority classes: (1) it is less common for the target model to demonstrate this vulnerability of misclassification with high confidence; and (2) for majority classes, accuracy and privacy are aligned rather than being competing objectives.
For the 7th image, the target model makes a correct prediction with high confidence. The attacker incorrectly predicts that the example was in the training dataset of the target model, when in truth it was not. This membership attack fails and is the flip side of the 5th image, with both reporting low attack confidence. Again, a comparison with the 1st image, which is from a minority class, shows how model confidence and accuracy may lead to membership inference vulnerability for the minority classes in a way that is not true for the majority classes.
VI-D Mitigation with Differentially Private Training
The second core component of our experimental analysis is to use MPLens to investigate the effectiveness of differential privacy applied to deep learning models as a countermeasure for membership inference mitigation.
To define utility loss we follow [38] and consider where represents accuracy in a nonprivate setting and is the accuracy when differential privacy is employed for the same model and data.
VI-D1 Model and Problem Complexity
IV Mitigation Strategies and Algorithms
IV-A Differential Privacy
Differential privacy is a formal privacy framework with a theoretical foundation and rigorous mathematical guarantees when effectively employed [10]. A machine learning algorithm is defined to be differentially private if and only if the inclusion of a single instance in the training dataset will cause only statistically insignificant changes to the output of the algorithm. Theoretical limits are set on such output changes in the definition of differential privacy, which is given formally as follows:
Definition 1 (Differential Privacy [10])
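A randomized mechanism $\mathcal{M}$ provides $(\epsilon, \delta)$-differential privacy if for any two neighboring databases $D_1$ and $D_2$ that differ in only a single entry, and for all $S \subseteq \mathrm{Range}(\mathcal{M})$,
$$\Pr[\mathcal{M}(D_1) \in S] \leq e^{\epsilon} \Pr[\mathcal{M}(D_2) \in S] + \delta \tag{1}$$
If $\delta = 0$, $\mathcal{M}$ is said to satisfy $\epsilon$-differential privacy.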
In the remainder of the paper, we focus on differential privacy for presentation convenience. To achieve differential privacy (DP), noise calibrated to the privacy parameter $\epsilon$ is added to the algorithm’s output. This noise is proportional to the sensitivity of the output, which measures the maximum change in the output due to the inclusion of a single data instance.
Definition 2 (Sensitivity [10])
For $f: \mathcal{D} \rightarrow \mathbb{R}^d$, the sensitivity of $f$ is
$$S_f = \max_{D_1, D_2} \lVert f(D_1) - f(D_2) \rVert_2 \tag{2}$$
for all $D_1$, $D_2$ differing in at most one element.
The noise mechanism used is therefore bounded by both the sensitivity $S_f$ of the function $f$ and the privacy parameter $\epsilon$. For example, consider the Gaussian mechanism defined as follows:
Definition 3 (Gaussian Noise Mechanism)
$$\mathcal{M}(D) = f(D) + \mathcal{N}(0, S_f^2 \sigma^2)$$
where $\mathcal{N}(0, S_f^2 \sigma^2)$ is the normal distribution with mean $0$ and standard deviation $S_f \sigma$. A single application of the Gaussian mechanism in Definition 3 to a function $f$ with sensitivity $S_f$ satisfies $(\epsilon, \delta)$-differential privacy if $\delta \geq \frac{4}{5} \exp(-(\sigma \epsilon)^2 / 2)$ and $\epsilon < 1$ [24].
Additionally, differential privacy has several useful properties that cover either multiple iterations of a differentially private function or the combination of multiple different functions, each of which satisfies its own differential privacy guarantee. These composition properties are important for machine learning processes, which often involve multiple passes over the training dataset $D$. The formal composition properties of differential privacy include the following:
Definition 4 (Composition properties [24, 25])
Let $f_1, \ldots, f_n$ be $n$ algorithms, such that for each $i \in [n]$, $f_i$ satisfies $\epsilon_i$-DP. Then, the following properties hold:
Sequential Composition: Releasing the outputs $f_1(D), \ldots, f_n(D)$ satisfies $\left(\sum_{i=1}^{n} \epsilon_i\right)$-DP.
Parallel Composition: Executing each algorithm on a disjoint subset of $D$ satisfies $\max_i(\epsilon_i)$-DP.
Immunity to Post-processing: Computing a function of the output of a differentially private algorithm does not deteriorate its privacy, e.g., publicly releasing the output of $f_i(D)$ or using it as an input to another algorithm does not violate $\epsilon_i$-DP.
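Definitions 3 and 4 can be illustrated with a short sketch. It assumes the Gaussian mechanism adds independent zero-mean noise with standard deviation $S_f \sigma$ to each output coordinate, and that the single-application condition is $\delta \geq \frac{4}{5} \exp(-(\sigma \epsilon)^2 / 2)$ with $\epsilon < 1$. All function names and the numeric values are hypothetical choices for the example.

```python
import math
import random

def gaussian_mechanism(value, sensitivity, sigma, rng):
    """Add N(0, (sensitivity * sigma)^2) noise to each coordinate of `value`."""
    return [v + rng.gauss(0.0, sensitivity * sigma) for v in value]

def delta_lower_bound(sigma, epsilon):
    """Smallest delta permitted by the assumed condition (requires 0 < epsilon < 1)."""
    if not 0.0 < epsilon < 1.0:
        raise ValueError("the condition assumes 0 < epsilon < 1")
    return 0.8 * math.exp(-((sigma * epsilon) ** 2) / 2.0)

def sequential_epsilon(epsilons):
    """All algorithms run on the same data: the budgets add up."""
    return sum(epsilons)

def parallel_epsilon(epsilons):
    """Each algorithm runs on a disjoint subset: the largest budget dominates."""
    return max(epsilons)

rng = random.Random(0)
noisy = gaussian_mechanism([1.0, 2.0, 3.0], sensitivity=1.0, sigma=4.0, rng=rng)
budgets = [0.1, 0.2, 0.3]
print(noisy)
print(delta_lower_bound(sigma=8.0, epsilon=0.5))
print(sequential_epsilon(budgets), parallel_epsilon(budgets))
```

Post-processing needs no accounting function of its own: anything computed from `noisy` alone inherits the guarantee of the mechanism that produced it.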
Differential privacy can be applied to different types of machine learning models. Due to space constraints, in the rest of the paper we specifically focus on differentially private training of deep neural network (DNN) models.
IV-B Mechanisms for Differentially Private Deep Learning
DNNs are complex, sequentially stacked neural networks containing multiple layers of interconnected nodes. Each node represents the dataset in a unique way and each layer of networked nodes processes the input from previous layers using learned weights and a predefined activation function. The objective in training a DNN is to find the optimal weight values for each node in the multi-tier network. This is accomplished by making multiple passes over the entire dataset, with each pass constituting one epoch. Within each epoch, the entire dataset is partitioned into many minibatches of equal size and the algorithm processes these batches sequentially, each including only a subset of the data. When processing one batch, the data is fed forward through the network using the existing weight values. A predefined loss function is computed for the errors made by the neural network learner with respect to the current batch of data. An optimizer, such as stochastic gradient descent (SGD), is then used to propagate these errors backward through the network. The weights are then updated according to the errors and the learning rate set by the training algorithm. The higher the learning rate value, the larger the update made in response to the backward propagation of errors.
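The epoch and minibatch structure described above can be seen in miniature by replacing the DNN with a one-dimensional linear model, for which the gradients can be written by hand. The sketch below is a non-private toy example under that assumption; the function name, data, and hyperparameter values are hypothetical.

```python
import random

def sgd_linear_fit(xs, ys, epochs=300, batch_size=4, learning_rate=0.05, seed=0):
    """Fit y = w * x + b with minibatch SGD on squared error."""
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    indices = list(range(len(xs)))
    for _ in range(epochs):  # one epoch = one full pass over the data
        rng.shuffle(indices)
        for start in range(0, len(indices), batch_size):
            batch = indices[start:start + batch_size]
            # Forward pass and loss gradient, averaged over the batch.
            grad_w = sum(2 * (w * xs[i] + b - ys[i]) * xs[i] for i in batch) / len(batch)
            grad_b = sum(2 * (w * xs[i] + b - ys[i]) for i in batch) / len(batch)
            # Weight update scaled by the learning rate.
            w -= learning_rate * grad_w
            b -= learning_rate * grad_b
    return w, b

xs = [i / 10 for i in range(20)]   # inputs in [0, 1.9]
ys = [3.0 * x + 1.0 for x in xs]   # noiseless targets generated with w = 3, b = 1
w, b = sgd_linear_fit(xs, ys)
print(round(w, 2), round(b, 2))
```

In a DNN the two hand-written gradients are replaced by backpropagation through all layers, but the loop over epochs, the partition into minibatches, and the learning-rate-scaled update are the same.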
Differentially private deep learning can be implemented by adding a small amount of noise to the updates made to the network such that there is only a marginal difference between the following two scenarios: (1) when a particular individual is included within the training dataset and (2) when the individual is absent from the training dataset. The noise added to the updates is sampled from a Gaussian distribution with scale determined by an appropriate noise parameter
corresponding to a desired level of privacy and controlled sensitivity. That is, the privacy budget , according to Definition 1, should constrain the value of at each epoch. Let denote the number of epochs for the DNN training, a predefined hyperparameter set at the training configuration as the termination condition. Let one epoch satisfy differential privacy given from the Gaussian mechanism in Definition 3. Then, a traditional accounting using the composition properties of differential privacy (Definition 4) would dictate that epochs would result in an overall privacy guarantee of if