Towards Demystifying Membership Inference Attacks

by   Stacey Truex, et al.

Membership inference attacks seek to infer membership of individual training instances of a model to which an adversary has black-box access through a machine learning-as-a-service API. Aiming at providing an in-depth characterization of membership privacy risks against machine learning models, this paper presents a comprehensive study towards demystifying membership inference attacks from two complementary perspectives. First, we provide a generalized formulation of the development of a black-box membership inference attack model. Second, we characterize the importance of model choice on model vulnerability through a systematic evaluation of a variety of machine learning models and model combinations using multiple datasets. Through formal analysis and empirical evidence from extensive experimentation, we characterize under what conditions a model may be vulnerable to such black-box membership inference attacks. We show that membership inference vulnerability is data-driven and its attack models are largely transferable. Though different model types display different vulnerabilities to membership inferences, so do different datasets. Our empirical results additionally show that (1) using the type of target model under attack within the attack model may not increase attack effectiveness and (2) collaborative learning in federated systems exposes vulnerabilities to membership inference risks when the adversary is a participant in the federation. We also discuss countermeasure and mitigation strategies.



1. Introduction

Machine learning-as-a-service has seen an explosion of interest with the development of cloud platform services. Amazon (ama, 2018), Microsoft (Copeland et al., 2015), IBM (ibm, 2018), and Google (goo, 2018) have all launched such machine learning-as-a-service platforms. These services allow companies to leverage powerful machine learning and artificial intelligence technologies without requiring in-house domain expertise. Machine learning-as-a-service platforms allow users to upload their data, run various data analytics or model building processes, and deploy the trained models to services of their own. Given this landscape, new interest has been given to the potential vulnerabilities of such machine learning services.

One such vulnerability is membership inference. Let us consider a cancer treatment center with a large database of valuable patient data. Let our cancer treatment center then leverage a machine learning-as-a-service platform to develop a predictive model which, when given a patient’s data as input, can predict cancer-related health outcomes. The treatment center then utilizes the cloud deployment option to create a service of their own wherein users can log in, provide their own health information, and receive predictions in return. A membership inference attack considers a scenario wherein a user of such a black-box prediction service is an adversary. The adversary can provide the health information of another individual and, based on the model’s output, try to infer whether that individual is a cancer patient at the treatment center.

There are two primary parties who are interested in protecting against such membership inference attacks: the patient and the cancer treatment center. Previous patients of the cancer treatment center consider their membership private and do not want their patronage to be public knowledge. For example, consider the case of a patient, Alice, of the treatment center. Let Alice be under consideration for a job at Bob’s company. Bob can leverage the cancer treatment center’s service to infer whether or not Alice is a patient. Upon learning of Alice’s inclusion in the cancer treatment center’s database, Bob decides not to hire Alice in favor of a candidate who, he believes, will have lower healthcare costs for the company.

In addition to concerns of patients such as Alice, we also must consider the interests of the cancer treatment center, the owner of the training dataset and the trained model under attack. In today’s market, across many domains, data is considered an organizational asset (Lake and Crowther, 2013). While internal data has always been a driver of decision making for most companies, the role of data has been moving steadily closer to the core of many industries. A company’s data therefore holds intrinsic value to the organization. Additionally, this training data is the source of the machine learning model under attack. Training this model not only requires front-end capital, time, and resources but also holds competitive business value for the treatment center. The treatment center may even charge a fee per evaluation. It is therefore essential from the cancer treatment center’s perspective to protect their private database.

Membership Inference Risks vs. Differential Privacy. Membership inference violates the privacy of both the individual participants involved in the model training and the owner of the training dataset. The former involves membership privacy of individuals who are participants in the model training and the latter involves risks of unauthorized leakages of business value or trade secrets. The ultimate goal of membership privacy is to protect against the risk of membership leakage of individuals in the data used in training a machine learning model.

Unlike membership privacy, when a model is secured by differential privacy, it means that the model trained on the original dataset will produce almost identical predictions to a model trained on a neighboring dataset, which differs from the original by exactly one instance (Dwork et al., 2014). Differential privacy therefore protects the content privacy and the output privacy of the model, whereas membership privacy refers to membership inference against machine learning models and is centered on inferring the membership of the input data, but not the content of the input data, from the output result of the model.

Membership Inference vs. Adversarial Examples. Adversarial learning to date has focused on attacking deployed deep learning models. Most existing membership inference attacks similarly attack deep learning models, utilizing deep neural networks (DNNs) for training both the target model under attack and the attack model (Shokri et al., 2017; Long et al., 2018; Hayes et al., 2017; Carlini et al., 2018). However, membership inference attacks are different from adversarial examples with respect to both the attack generation process and the adverse effect of attacks, and the two represent different classes of security and privacy intrusion problems under the general umbrella of adversarial machine learning. Concretely, adversarial deep learning research to date has been centered on the generation of adversarial examples by injecting a minimal amount of perturbation into a benign example such that the pre-trained classification model will misclassify it with high probability. Thus, adversarial example-based attacks aim at altering the output of the model prediction without being visually noticed. In contrast, a membership inference attack does not alter the prediction output at all; it succeeds simply by making a membership inference from the prediction output.

Scope and Contributions of the paper. In this paper, we investigate membership inference attacks under the black-box access scenario in which an adversary may probe the prediction API with input and receive the prediction output from the privately trained model. Our research results are novel from three perspectives. First, we describe a systematic approach to construct a membership inference attack model and the general formulation of each component of the attack model generation framework. We show that generating a membership inference attack model is a complex and multi-step strategic process. Second, to understand when and how membership inference attacks work and why certain models and datasets are more vulnerable, we take a holistic approach with extensive empirical evidence to study and characterize membership inference attacks across different target model types, different types of training datasets, and different combinations of model types for generating the attack training dataset and attack models. Finally, we introduce and investigate a new membership threat, insider membership inference, which is launched by a member of a federated learning system against other participants in a collaborative learning environment. As federated learning systems become more popular with promises of increased accuracy and privacy, highlighting and understanding this risk is an important part of membership inference mitigation efforts.

2. Membership Inference Attacks

In this section, we formalize membership inference attacks against machine learning models as follows: Given an instance $x$ and black-box access to a classification model $F_t$ trained on a dataset $D$, can an adversary infer with high confidence that the instance $x$ was contained in $D$ at the training time of $F_t$? This definition states that membership inference focuses on the question of the membership of $x$ in $D$ and not on the contents of $x$. This divergence separates membership inference from existing areas of privacy research, such as differential privacy (Blum et al., 2005; Vaidya et al., 2014; Dwork, 2008) or secure multiparty computation (Wu et al., 2016; De Cock et al., 2017; Cramer et al., 2015). Also notable is that membership inference attacks are at the local level: an adversary wishes to know whether a particular $x$ is in $D$, not to learn $D$ in its entirety.

Figure 1. The workflow of a Membership Inference Attack.

Figure 1 illustrates the workflow of membership inference attack development. Given a training dataset $D$ and a classification model $F_t$ trained on $D$, the machine learning service provider may provide a classification service through a prediction API. This API offers users black-box access to the model $F_t$. Users may send prediction queries with their own data to the service and receive classification predictions. An adversary uses such a service to collect information about the private dataset $D$ on which the prediction model $F_t$ was privately trained. By leveraging any public or background knowledge of the training dataset $D$ or the target model $F_t$, an adversary builds a membership inference attack model to deploy for launching membership inference attacks in real time.

To gain an in-depth understanding of the general formulation of the membership inference attack model, we first characterize the types of adversarial knowledge and datasets required to train the attack model as well as the attack cost, the attack value, and their evaluation metrics. We then present a systematic formulation of general attacks in Section 


2.1. Threat Model and Assumptions

2.1.1. Machine Learning-As-A-Service: Black-box Access

Recall from Figure 1 that the machine learning service provider publishes the trained target model $F_t$ through a black-box access API, which accepts service requests from users in the form of a prediction query for input $x$ and returns the predicted class and the prediction probability vector. The input and output formats of the API are given by the service provider as part of a service agreement. However, $F_t$ and the dataset $D$ on which $F_t$ is trained remain private. Only the prediction API is exposed to users, thus ensuring only black-box access to $F_t$.

A question one may ask is: “how can an adversary with only black-box access to the prediction API perform a membership inference attack without knowing anything about $F_t$ and the dataset $D$?” Generally speaking, $F_t$ is trained as an approximation of an ideal function $f$ for a training dataset $D$, where $y = f(x)$ is the true class for the sample instance $x$. Let $\hat{y} = f'(x)$ denote the output of a candidate model $f'$. An optimal model is then the one that minimizes the average loss defined by a chosen loss function $L(y, \hat{y})$ for all samples in the training set $D$, weighted by their posterior probability. The posterior probability, $P(y \mid x)$, is defined as the probability of class $y$ being the label of sample $x$. For many application specific problems, $f$ is a non-deterministic function. That is, if $x$ is sampled repeatedly, different values of $y$ may be given. In this case, the optimal choice of the class for sample object $x$ among all candidate class labels is the class that minimizes the expected loss for a given sample $x$. The target model $F_t$ is then assigned to be the optimal model for the training dataset $D$ upon the completion of the training and testing phases.

There are currently a multitude of well-studied algorithms available to determine . Without loss of generality to which algorithm or loss function was chosen to identify , we simply maintain that is a function which maps feature vectors to a class , . Our target function therefore creates a decision boundary which separates the feature space into sets in which each set is associated with a candidate class value in . As these sets and corresponding class assignments are chosen to minimize the loss function over the training dataset , the decision boundaries are strongly informed by the training dataset and will in turn be the core of the trained machine learning model .

2.1.2. Adversarial Knowledge

We characterize the membership inference threat model based on prior adversarial knowledge. We broadly categorize this adversarial knowledge into three categories: black-box, grey-box, and white-box data knowledge.

Black-Box Knowledge. An adversary is said to have black-box knowledge when the adversary does not have any specialized knowledge of the training data. However, black-box knowledge may include the input and output of the service API as well as publicly available information about the target prediction model. For example, if the service provider is our cancer treatment center, then the adversary may have access to relevant statistics curated by the government and published for the public good. These include demographic information, such as the likelihood of different age groups or genders contracting certain cancers, and clinical information, such as the prevalence of co-occurrence of different diseases with various cancer types.

Grey-Box Knowledge. We characterize grey-box knowledge as specialized population-level knowledge. This may include population-level statistics that describe the distribution of features in the target model’s training data. For example, in addition to publicly available distributions on the average age of cancer patients (black-box knowledge), the adversary may know the average age of a cancer patient seen at the target treatment center (specific statistics).

White-Box Knowledge. White-box knowledge characterizes scenarios where the training data $D$ for the target model is sampled from a constrained population or in a skewed fashion such that an adversary has access to some versions of real data in $D$ but not the complete training set. For example, a noisy version of the real data may be accessible which resembles $D$ with the addition of some noise or missing values (Shokri et al., 2017). Adversaries with white-box knowledge can therefore develop or access true “in” samples and employ active learning techniques on these known samples to develop a very accurate dataset to mirror $D$.


The adversary with white-box knowledge is the most powerful adversary whereas the adversary with only black-box knowledge represents the most difficult attack environment where the adversaries are limited to (i) publicly available information, (ii) black-box queries to the prediction API, and (iii) the output of classification prediction from the target model. This is the setting we use to formulate membership inference attacks and to characterize adverse effects and divergence of membership inferences.

2.2. Attack Value vs. Attack Cost

It is generally accepted that systems security should never operate as an all-or-nothing mechanism. Systems must always seek to optimize two sets of factors: cost of defense vs value of assets to the system owner and cost of attack vs value of assets to the adversary. These principles hold true to deployed learning systems with respect to membership inference attacks.

In the context of membership inference attacks, value can be characterized by attack accuracy as an evaluation of what level of leakage is present in the target model $F_t$ or what amount of knowledge an attacker can expect to gain. The cost of attack refers to the knowledge and work necessary for an adversary to launch a successful attack. When characterizing cost, we consider knowledge cost as well as development cost. For example, to launch an effective attack, how much knowledge of the training dataset $D$ or of $F_t$ does an adversary need? Gaining this knowledge should be considered a type of cost for the adversary. Alternatively, cost can also be characterized by how computationally expensive it is to develop an effective attack model $F_a$.

The accuracy of an attack can be characterized by a number of metrics. For example, an accuracy measure may be defined as the likelihood that the attack model correctly identifies $x \in D$ or $x \notin D$ for an instance $x$ and the training dataset $D$. Alternatively, accuracy can be defined as a precision measure, which indicates the fraction of the instances inferred as members that are indeed members of $D$; that is, it focuses on the cases in which the attack model says an instance is a member.
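To make the two metrics concrete, the following is a minimal sketch, assuming the attack model's decisions and the ground-truth membership are available as parallel lists of booleans. The function name `attack_metrics` and the toy inputs are illustrative assumptions, not part of the paper.

```python
def attack_metrics(predicted_in, actually_in):
    """Return (accuracy, precision) of membership decisions.

    predicted_in: list of bool, True where the attack model says "member".
    actually_in:  list of bool, True where the instance really was in the
                  training dataset. Both names are hypothetical.
    """
    pairs = list(zip(predicted_in, actually_in))
    true_pos = sum(1 for p, a in pairs if p and a)
    false_pos = sum(1 for p, a in pairs if p and not a)
    correct = sum(1 for p, a in pairs if p == a)
    accuracy = correct / len(pairs)
    # Precision: of everything inferred as a member, how much truly is one.
    precision = true_pos / (true_pos + false_pos) if (true_pos + false_pos) else 0.0
    return accuracy, precision


# Toy example: four instances, two inferred as members, one of them correctly.
acc, prec = attack_metrics([True, True, False, False],
                           [True, False, False, True])
```

Accuracy rewards correct "out" decisions as well, whereas precision only looks at instances the attack flagged as members, which is why the two can diverge on imbalanced data.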

3. General Attack Formulation

Figure 2. Membership Attack Model Development.

Using attack accuracy, the problem of membership inference is defined as follows: given a query input $x$ and black-box access to the target model $F_t$ trained on a dataset $D$, the membership inference attack answers the question of whether $x \in D$ is true or false. The attack is successful if the attacker can determine with high confidence that $x \in D$ is true.

At the most abstract level, membership inference attack models are binary classifiers. Given an instance $x$ and a target model $F_t$, the goal of a membership inference attack model is to identify whether or not $x$ was contained within the dataset $D$ used to train $F_t$.

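Let $D$ consist of $n$ training instances $(x_1, y_1), \ldots, (x_n, y_n)$, where each $x_i$ consists of $m$ features, denoted by $x_i = (x_{i,1}, \ldots, x_{i,m})$, and $y_i \in \{1, \ldots, k\}$, where $k$ is a finite integer value $\geq 2$. Let $F_t$ be the target model trained using this dataset $D$. Given a particular feature vector $x$, $F_t$ will then output a probability vector of the form $p = (p_1, \ldots, p_k)$, where $p_j \in [0, 1]$ and $\sum_{j=1}^{k} p_j = 1$. The predicted class label $y$ for a feature vector $x$ is the class with the highest probability value in $p$. Therefore $y = \arg\max_{j} p_j$.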

Given the adversary’s black-box access to the target model $F_t$ via the prediction service API, an adversary is able to query $F_t$ with any number of instances to receive corresponding probability vectors. The adversary uses this probing access, along with any prior knowledge, to build a representation of adversarial knowledge of the training dataset $D$. The first building block for implementing a black-box membership inference attack is to leverage this knowledge to generate a synthetic labeled dataset $D'$ to mirror the data in $D$. This synthetic, labeled dataset is artificially simulated and called a shadow dataset of $D$. Although the word “shadow” was borrowed from shadow copying for systems creating backup data copies (Sankaran et al., 2004), the shadow dataset in our context should be thought of as a synthetic version of the real training dataset $D$. $D'$ is then used to generate an attack training dataset $D^*$, which is required to train the final membership attack model, a binary classifier $F_a$.

Figure 2 highlights these three primary phases in the development of the membership inference attack: (1) development of a shadow dataset, (2) generation of an attack model training dataset, and (3) training and deployment of the membership inference attack model.

3.1. Development of a Shadow Dataset

Figure 3. Development of a Shadow Dataset.

Given a target model , its training dataset , and black-box adversarial knowledge , the development of a shadow dataset is the first step in generating a membership inference attack model. consists of training instances where each consists of features equivalent to those in and each is a predicted class label in . Note that and are known via the service API and thus consistent across and . The cardinality of , however, remains unknown and therefore and are likely to differ. The shadow dataset generation process leverages the prediction service API to manage the creation and control the quality of , as shown in Figure 3.

3.1.1. API Probing

While the training set and its cardinality (size) are unknown to an adversary, the adversary can probe the service API to reveal structural information, such as the number of features $m$, the data types of those features, and the number of classes $k$. This knowledge can be obtained by the adversary by sending in trial query instances and observing the responses.

We refer to the complete set of adversarial knowledge as . This includes prior knowledge as well as that inferred from this API probing. We do not state any limitations on except that , as the membership inference attack becomes trivial when . The cost of launching an effective membership inference attack includes the work of this probing phase by the adversary.

As a result of this API probing, the adversary can construct a skeleton dataset , which is similar to in structure and, ideally, any should be a viable instance that could be included in .

3.1.2. Shadow Data Generation

There are several ways to generate a quality shadow dataset with a small number of query probing attempts. Below we highlight four categories of techniques: statistics-based, active learning-based, query-based, and region-based generation.

Statistics-Based Generation. In statistics-based generation, the adversary leverages population-level statistics of the features in to create samples for . Given known distributions for features an adversary may conduct random sampling to construct these new samples. Features may be treated independently where an instance is generated through random samplings of distributions, each distribution corresponding to either a different feature or the class label. Alternatively, sampling may account for feature relationships. This may be done, for example, when our adversary has knowledge of statistics on disease co-occurrence.
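As an illustration of the independent-feature variant described above, here is a minimal sketch. It assumes every feature follows a normal distribution with a known mean and standard deviation and that class frequencies are known; the distributions, their parameters, and the helper name `sample_instance` are all hypothetical.

```python
import random


def sample_instance(feature_dists, class_weights, rng):
    """Draw one synthetic labeled instance.

    feature_dists: list of (mean, std) pairs, one per feature (assumed normal).
    class_weights: dict mapping class label -> population-level frequency.
    Each feature and the label are sampled independently of one another.
    """
    features = [rng.gauss(mean, std) for mean, std in feature_dists]
    labels = list(class_weights)
    weights = [class_weights[c] for c in labels]
    label = rng.choices(labels, weights=weights, k=1)[0]
    return features, label


rng = random.Random(0)
feature_dists = [(60.0, 10.0), (25.0, 4.0)]  # made-up (mean, std) per feature
class_weights = {0: 0.7, 1: 0.3}             # made-up class frequencies
shadow = [sample_instance(feature_dists, class_weights, rng) for _ in range(100)]
```

Accounting for feature relationships, such as disease co-occurrence, would require sampling from a joint distribution instead of the independent draws used here.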

Active Learning-Based Generation. Active learning is a technique developed in the semi-supervised machine learning domain (Zhu, 2005). Active learning has been developed to address the problem of a largely unlabeled training dataset when assigning accurate labels is an expensive task. For example, in the development of a spam filter one may have access to a large number of unlabeled emails (Sculley, 2007). It is a very expensive proposal to suggest that a human read the millions of emails to provide labels, yet labels are necessary to develop an accurate filter. To address this problem, representative samples are selected and labeled. Given this subset of training instances which now have accurate labels, an automated process takes over and propagates the label logic to other instances. Active learning techniques may be combined with statistics-based generation in black-box or grey-box data knowledge scenarios where a large number of samples may be generated through random sampling of the features but class labels assigned through intervention by the adversary followed by active learning.

Query-Based Generation. When using query-based generation, an adversary will generate a random sample and then query the target model. The target model will then provide a probability vector output. In this instance the adversary will want to identify instances for which the machine learning service provides a class label with relatively high confidence. That is, the adversary will search for instances in which the output has a value above some predefined threshold. This, again, may be combined with other techniques. For example, query-based generation may be used to provide a seed for active learning or statistics-based generation may inform the development of the instances sent to the service provider.
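The filtering step of query-based generation can be sketched as follows. The function `query_target` is a toy stand-in for the black-box prediction API (an assumption for illustration only; a real adversary would call the service), and the sampling range and threshold are arbitrary.

```python
import math
import random


def query_target(x):
    """Hypothetical stand-in for the prediction API: returns [p_class0, p_class1]."""
    p1 = 1.0 / (1.0 + math.exp(-(x[0] - x[1])))
    return [1.0 - p1, p1]


def query_based_samples(n_samples, n_features, threshold, rng, max_queries=10000):
    """Keep random samples that the target classifies with high confidence."""
    kept = []
    for _ in range(max_queries):
        if len(kept) == n_samples:
            break
        x = [rng.uniform(-3.0, 3.0) for _ in range(n_features)]
        probs = query_target(x)
        confidence = max(probs)
        if confidence >= threshold:
            # Label the sample with the class the target is confident about.
            kept.append((x, probs.index(confidence)))
    return kept


rng = random.Random(1)
samples = query_based_samples(n_samples=20, n_features=2, threshold=0.9, rng=rng)
```

The `max_queries` cap reflects that each query is part of the attack cost; a tighter threshold yields cleaner labels at the price of more queries.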

Region-Based Generation. Region-based generation follows a clustering-based logic. Given an instance $x$ with label $y$, region-based generation will seek to generate instances $x'$ where $d(x, x')$ is below some pre-determined threshold for a pre-chosen distance function $d$. The new instance $x'$ is then assigned the same label $y$. One way an adversary may use region-based generation is in conjunction with white-box data knowledge. Given knowledge of some instances in the training dataset, or very similar to those in it, an adversary can use region-based generation to expand this knowledge into a larger number of highly accurate instances to construct the shadow dataset.
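A minimal sketch of region-based generation follows, assuming Euclidean distance as the pre-chosen distance function and simple rejection sampling to stay within the threshold. The seed instance, radius, and function names are illustrative assumptions.

```python
import math
import random


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))


def region_samples(seed_x, seed_label, radius, n_new, rng):
    """Generate n_new instances within `radius` of seed_x, all sharing its label."""
    out = []
    while len(out) < n_new:
        candidate = [xi + rng.uniform(-radius, radius) for xi in seed_x]
        # Reject candidates from the corners of the box that exceed the radius.
        if euclidean(candidate, seed_x) < radius:
            out.append((candidate, seed_label))
    return out


rng = random.Random(2)
seed = [0.2, -1.0, 3.5]  # a known (or near-known) instance; values are made up
neighbours = region_samples(seed, seed_label=1, radius=0.5, n_new=25, rng=rng)
```

The radius plays the role of the distance threshold: too large and the copied label becomes unreliable, too small and the new instances add little diversity.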

Several factors may determine which concrete technique will be chosen by an adversary, such as the knowledge contained in or the query probing results. For example, an adversary who has grey-box data knowledge may be more likely to rely on statistics-based generation due to the specificity of the available statistics to whereas an adversary with black-box knowledge may want to augment statistics-based generation with query-based generation due to lower confidence in their non-specific, population-level knowledge of . On the other hand, active learning-based or region-based generation would likely be a popular choice for adversaries with white-box data knowledge where the adversary may leverage their insider knowledge of . These are just some of the considerations in development of an effective membership inference attack model.

Figure 4. Shadow dataset development using query-based and region-based techniques with black-box data knowledge. Figures adapted from images in (Pedregosa et al., 2011) and (MATLAB, 2010)

In Figure 4 we show an example of the shadow dataset development process using a combination of query-based and region-based techniques for an adversary with black-box data knowledge. The adversary first randomly generates a starting point using distributions in . This point is then queried to the service provider which provides in return. The point is then updated and queried again. This is continued until a confidence threshold is reached. That is, the process stops when a point is found such that meets a predefined confidence threshold.

A hyper-cube is then constructed surrounding . A set of new samples are then generated by randomly sampling from the hyper-cube region. Each point is assigned the class and added, along with to the shadow dataset . The entire process is then repeated beginning with sampling a new starting point from and ending with a new set of samples added to . The adversary will continue this repetition until a satisfactory number of samples have been added to .
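The combined process can be sketched as below. The source does not specify how the point is updated between queries, so this sketch assumes a simple random perturbation that is kept only when it raises the target's confidence. `query_target` is again a toy stand-in for the prediction API, and all parameter values are arbitrary.

```python
import math
import random


def query_target(x):
    """Hypothetical stand-in for the prediction API: returns [p_class0, p_class1]."""
    p1 = 1.0 / (1.0 + math.exp(-(x[0] + x[1])))
    return [1.0 - p1, p1]


def find_confident_point(n_features, threshold, rng, step=0.5, max_queries=5000):
    """Update a random starting point until the target's confidence meets the threshold."""
    x = [rng.uniform(-1.0, 1.0) for _ in range(n_features)]
    best = max(query_target(x))
    for _ in range(max_queries):
        if best >= threshold:
            return x
        candidate = [xi + rng.uniform(-step, step) for xi in x]
        confidence = max(query_target(candidate))
        if confidence > best:  # assumed update rule: keep only improvements
            x, best = candidate, confidence
    return None  # query budget exhausted


def hypercube_samples(center, half_width, n_new, rng):
    """Sample inside a hyper-cube around `center`; every point gets the center's class."""
    probs = query_target(center)
    label = probs.index(max(probs))
    points = [[c + rng.uniform(-half_width, half_width) for c in center]
              for _ in range(n_new)]
    return [(center, label)] + [(p, label) for p in points]


rng = random.Random(3)
center = find_confident_point(n_features=2, threshold=0.95, rng=rng)
batch = hypercube_samples(center, half_width=0.2, n_new=10, rng=rng)
```

Repeating these two calls with fresh starting points, and appending each batch to the shadow dataset, mirrors the outer loop described above.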

3.2. Generation of an Attack Model Training Set

Figure 5. Generation of an Attack Model Training Set.

Upon the completion of the shadow dataset development, the adversary will proceed to utilize the shadow dataset to develop the membership attack dataset for training a binary classifier as the final attack model, as shown in Figure 5. Given that each instance in consists of a feature-vector and its known-class, denoted by , the adversary can define an attack generation function denoted by . takes a feature vector-known class pair as input and outputs an attack training instance, consisting of two pieces of information: a probability vector and a binary class label, indicating “in” or “out”. There are several approaches to generate using . For example, the adversary can train a new model over which simulates the private target model . In this case, we call a shadow model of . Given that the adversary does not know the original training set nor the size of , the adversary may leverage ensemble learning techniques (Dietterich, 2000), such as data partition-based ensemble, model-based ensemble, or hybrid ensemble models, to improve the quality of shadow models, aiming at simulating the target model . Thus, the attack model training set generation function can be viewed as an ensemble of the set of shadow models. These shadow models seek to characterize the decision boundary of the target model. More specifically, the shadow models aim at mirroring the sensitivity of the target decision boundary to individual instances.

Consider a data partition based ensemble approach (Shokri et al., 2017). The adversary partitions the shadow dataset into and . is then divided into partitions (), one partition for each shadow model. Each partition of will then be used to train a single shadow model . Here we intentionally do not specify the machine learning model type of as this is yet another design choice made by an adversary. The decision may be informed if the adversary knows the type of or chosen using some other criteria, we leave this decision unspecified to remove the constraint that an adversary must know the type of . Next, will be evaluated against . The corresponding outputs will then be labeled as “out”. Additionally, a sample of size is taken from the partition used to train and evaluated against with the corresponding outputs labeled as “in”. By combining these output-label pairs, we obtain the attack train data .
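The partition-based procedure can be sketched as follows. The shadow model here is a deliberately simple distance-weighted voting classifier written for this example, not a model type used in the paper; the size of the "in" sample (capped at the held-out set size) and all names are assumptions.

```python
import math
import random


class KernelClassifier:
    """Toy shadow model: class probabilities from distance-weighted votes."""

    def __init__(self, n_classes):
        self.n_classes = n_classes
        self.data = []

    def fit(self, data):
        self.data = list(data)
        return self

    def predict_proba(self, x):
        votes = [1e-9] * self.n_classes  # small floor keeps the sum positive
        for xi, yi in self.data:
            d2 = sum((a - b) ** 2 for a, b in zip(x, xi))
            votes[yi] += math.exp(-d2 / 0.1)
        total = sum(votes)
        return [v / total for v in votes]


def build_attack_training_set(shadow, k, n_classes, rng):
    """Return (probability_vector, "in" | "out") rows from k shadow models."""
    data = shadow[:]
    rng.shuffle(data)
    half = len(data) // 2
    train, test = data[:half], data[half:]
    partitions = [train[i::k] for i in range(k)]  # one partition per shadow model
    rows = []
    for part in partitions:
        model = KernelClassifier(n_classes).fit(part)
        for x, _ in test:  # never seen by this shadow model
            rows.append((model.predict_proba(x), "out"))
        members = rng.sample(part, min(len(part), len(test)))
        for x, _ in members:  # seen during this shadow model's training
            rows.append((model.predict_proba(x), "in"))
    return rows


rng = random.Random(4)
shadow = []
for _ in range(60):
    x = [rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0)]
    shadow.append((x, 1 if x[0] + x[1] > 0 else 0))
attack_rows = build_attack_training_set(shadow, k=3, n_classes=2, rng=rng)
```

Any model exposing `fit` and `predict_proba` could be swapped in for the toy classifier, which is the design freedom the text leaves to the adversary.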

Figure 5 highlights the workflow of generating an attack model training set. An adversary may choose a single model for efficiency or an ensemble of models to increase the size or generality of . An adversary may diversify model types in an ensemble when ’s type is unknown. If an ensemble is used, there are choices on size, sampling, and aggregation that must be made and can be informed by the adversary’s knowledge of the target in various ways. We stress that while we formulate the membership inference attack using a general model, there exist many implementation variants.

Effect of Ensemble Methods. Combining multiple different models reduces the risk of choosing the wrong hypothesis within the hypothesis space of a particular problem. Also, using multiple models allows for more effective local search, which many machine learning algorithms perform in various ways, and limits the impact of the local optima problem. Finally, a combination of chosen hypotheses allows for an expansion of the hypothesis space (Dietterich, 2000). Two common ways to accomplish this diversity are bagging and boosting.

Bagging is accomplished by either drawing a sample of training examples from the original dataset randomly and with replacement, or by creating disjoint subsets of the original training data called cross-validated committees (Parmanto et al., 1996). A commonly used implementation of bagging is the Random Forest ensemble model (Breiman, 2001).
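The two ways of forming per-model training sets described above can be sketched in a few lines. The helper names are hypothetical, and the second function follows the text's description of disjoint subsets.

```python
import random


def bootstrap_sample(data, rng):
    """Draw len(data) examples uniformly at random, with replacement."""
    return [rng.choice(data) for _ in data]


def disjoint_subsets(data, k, rng):
    """Shuffle and split the data into k non-overlapping subsets."""
    shuffled = data[:]
    rng.shuffle(shuffled)
    return [shuffled[i::k] for i in range(k)]


rng = random.Random(5)
data = list(range(12))                # stand-in for training instances
bag = bootstrap_sample(data, rng)     # may contain repeats and omit some items
subsets = disjoint_subsets(data, 3, rng)
```

Each ensemble member would then be trained on its own bag or subset, which is what introduces diversity among the models.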

Using the boosting technique, a weight is maintained for each instance within the training dataset. Each model is then trained iteratively to minimize the weighted error on the training dataset. The weights of the training instances are updated to put more emphasis on the misclassified examples. The AdaBoost ensemble model (Freund and Schapire, 1995) is a popular implementation of boosting.

The use of multiple models via ensemble learning for attack data generation decreases the risk of choosing the wrong hypothesis. Given an adversary who has only black-box knowledge of the target model (regardless of the adversary’s knowledge type of the training dataset), there is no guaranteed method of reproducing the target model’s behavior on that dataset. A diversity of generation models will minimize the risk that the adversary is only capturing one behavior type or candidate decision boundary shape. This again accentuates the need for boosting or bagging to ensure that the model set is diverse.

3.3. Generating the Membership Attack Model

Figure 6. Training and Deployment of the Membership Inference Attack Model

The attack model training set contains the outputs from the generation function. It consists of a set of instances, each of which is the output of the generation function for some input. This attack model training dataset will then be used to generate the final attack model, which takes as input a probability vector output for an instance and outputs a binary classification of “in” or “out”. Similarly, we make no assumptions on how the adversary’s knowledge is used to inform the attack model but rather say that this knowledge is available to an adversary during the generation of the membership attack model. A number of machine learning models and techniques can be leveraged to train a binary classification-based attack model using the attack model training set. The attack model can then be deployed against the output of the target model such that, ideally, given an instance, it outputs “in” if the instance is in the target model’s training dataset and “out” if it is not. Figure 6 visualizes this final phase of developing a membership inference attack.

Regardless of how complex the chosen training process is, whether it is a distance evaluation or a complex machine learning model, this phase produces the final attack model which will be deployed for membership inference attack against the target model in real time.
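The two phases described above can be sketched end to end with scikit-learn. This is only an illustrative sketch under stated assumptions, not the configuration used in the experiments: the data is synthetic, the generation models are random forests trained on random halves of the adversary’s data, the attack model is a shallow decision tree, and all names (`build_attack_training_set`, `membership_guess`, and so on) are hypothetical.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data (assumption). The first half plays the role of the
# target's data; the second half is data the adversary holds.
X, y = make_classification(n_samples=4000, n_features=20, n_informative=10,
                           n_classes=4, random_state=0)
X_target, y_target = X[:2000], y[:2000]
X_adv, y_adv = X[2000:], y[2000:]


def build_attack_training_set(X, y, n_models=3):
    """Collect generation-model outputs labeled 1 ("in") or 0 ("out")."""
    vectors, labels = [], []
    for seed in range(n_models):
        order = np.random.RandomState(seed).permutation(len(X))
        in_idx, out_idx = order[: len(X) // 2], order[len(X) // 2:]
        # Generation model: trained only on the "in" half.
        gen = RandomForestClassifier(n_estimators=25, random_state=seed)
        gen.fit(X[in_idx], y[in_idx])
        vectors += [gen.predict_proba(X[in_idx]), gen.predict_proba(X[out_idx])]
        labels += [np.ones(len(in_idx), dtype=int),
                   np.zeros(len(out_idx), dtype=int)]
    return np.vstack(vectors), np.concatenate(labels)


attack_X, attack_y = build_attack_training_set(X_adv, y_adv)

# Attack model: a binary classifier over probability vectors.
attack_model = DecisionTreeClassifier(max_depth=4, random_state=0)
attack_model.fit(attack_X, attack_y)

# Deployment: the target is trained on the first 1000 target instances, so the
# remaining 1000 are non-members. The adversary sees only predict_proba output.
target = RandomForestClassifier(n_estimators=25, random_state=42)
target.fit(X_target[:1000], y_target[:1000])
membership_guess = attack_model.predict(target.predict_proba(X_target))
```

Here `membership_guess` holds 1 where the attack model predicts “in” and 0 where it predicts “out”; comparing it against the known split of `X_target` gives the attack accuracy for this toy setting.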

4. General Attack Characterization

We have thus far provided a general formulation of membership inference attacks. In this section, we characterize such attacks through a systematic evaluation of a variety of machine learning models and model combinations using multiple datasets. We show that membership inference vulnerability is data-driven and its attack models are largely transferable. Although the target model is a dominating factor in determining vulnerability, attack data generation techniques need not explicitly mirror the target model. Finally, we show that membership inference attacks can persist as insider attacks in federated systems.

4.1. Experimental Setup

We conduct a series of experiments to support our characterization of membership inference attacks. Due to space constraints, in this section we report empirical evidence for four types of machine learning models (logistic regression, k-nearest neighbor, decision tree, and Naïve Bayes) in addition to deep neural network results in (Shokri et al., 2017). A total of seven datasets are used in these experiments: Adult, MNIST, CIFAR-10, Purchases-10, Purchases-20, Purchases-50, and Purchases-100. Our experiments were conducted using Python (Python Core Team, 2017) and algorithms available with scikit-learn (Pedregosa et al., 2011). All reported results are averaged across 10 runs each using 10-fold cross-validation.


The Adult dataset is available on the UCI Machine Learning Repository (Dheeru and Karra Taniskidou, 2017) and contains 48,842 instances described by 14 different features. The feature set contains both continuous (e.g., age, hours per week) and discrete (e.g., education, marital status) values. This dataset presents a binary classification problem wherein one wishes to identify if an individual makes more than 50K or at most 50K in yearly salary.


MNIST is a publicly available dataset containing 70,000 images of handwritten digits (LeCun et al., 2010). Each image is formatted to be 32 x 32 and processed such that the digit is at the center of the image. The MNIST dataset constitutes a 10-class classification problem where the task is to identify which digit between 0 and 9, inclusive, is contained within a given image.


The CIFAR-10 dataset, also publicly available, contains 60,000 color images (Krizhevsky and Hinton, 2009). Again, each image is formatted to be 32 x 32. The CIFAR-10 dataset also has 10 classes: airplane, automobile, bird, cat, deer, dog, frog, horse, ship and truck. Each class has 6,000 available images. The problem is therefore a 10-class classification problem where the task is to identify which of the 10 classes is depicted in a given image.


Finally, we developed a number of purchases datasets similar to what was done in (Shokri et al., 2017). The purchases datasets were developed from the Kaggle Acquire Valued Shoppers Challenge dataset which contains the shopping history of several thousand individuals. From this dataset we create new datasets wherein each instance represents an individual and each feature represents a particular product. If an individual has purchased this product, there will be a 1 for the feature and otherwise a 0. The instances are then clustered into different shopping profile types. These cluster assignments are treated as the classes. We created different datasets with 10, 20, 50, and 100 shopping profile types. The classification problem then becomes: given a shopper’s purchase history, identify their shopping profile type.
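As a rough illustration of this construction, the sketch below builds a binary shopper-by-product matrix and clusters it into shopping profile types. The purchase matrix is random stand-in data, and the use of K-means is an assumption, since the clustering algorithm is not named above; `make_purchases_dataset` is a hypothetical helper.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.RandomState(0)

# Hypothetical purchase history: 500 shoppers x 60 products,
# with 1 meaning the shopper bought the product and 0 otherwise.
purchases = (rng.rand(500, 60) < 0.2).astype(int)


def make_purchases_dataset(purchases, n_profiles):
    """Cluster shoppers; the cluster index becomes the class label."""
    km = KMeans(n_clusters=n_profiles, n_init=10, random_state=0)
    profile = km.fit_predict(purchases)
    return purchases, profile


# Analogous to Purchases-10 and Purchases-20, at toy scale.
X10, y10 = make_purchases_dataset(purchases, 10)
X20, y20 = make_purchases_dataset(purchases, 20)
```

Varying `n_profiles` over the same feature matrix mirrors how the four Purchases datasets differ only in the number of classes.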

We highlight these datasets as each shows fundamentally different results in previous work and the makeup of each is significantly different. For example, the Adult dataset presents a binary classification problem with numeric and factor features and was demonstrated to be resilient to membership inference attacks in previous work with precision results of 50.3%. The Purchases-50 dataset by comparison has 50 different class values, only binary features, and was much more vulnerable to attack in previous work, with precision results of 86.0% (Shokri et al., 2017).

4.2. Membership Inference: Data-Driven

Recall Section 3. There are many choices an adversary can make during the three phases of developing the membership inference attack model, each of which may impact the model accuracy and consequently the attack success rate. Two elements that the adversary cannot control, however, are the training dataset and the target model. Our experimental results show that membership inference attacks are data-driven. That is, the make-up of the training dataset strongly correlates with the corresponding target model’s vulnerability to membership inference attacks. Potential targets of such attacks can therefore use knowledge of their data to evaluate their risk. Table 1 compares the seven different datasets on three characteristics: (1) feature distribution, (2) number of classes, and (3) accuracy of (vulnerability to) membership inference.


| Dataset | In-Class Standard Deviation | Number of Classes | Accuracy of Membership Inference (%) |
|---|---|---|---|
| Adult | 0.1433 | 2 | 59.89 |
| MNIST | 0.1586 | 10 | 61.75 |
| CIFAR-10 | 0.2301 | 10 | 90.44 |
| Purchases-10 | 0.3820 | 10 | 82.29 |
| Purchases-20 | 0.3873 | 20 | 88.98 |
| Purchases-50 | 0.3873 | 50 | 93.71 |
| Purchases-100 | 0.3832 | 100 | 95.74 |
Table 1. Comparison of datasets versus membership inference attack accuracy using a decision tree model.

The number of classes is important as it characterizes the number of regions into which the input space is divided. The more classes, the smaller each region. With smaller regions, there is less uninformed space and in fact the regions will more tightly surround the provided training instances. This will make any single instance more likely to alter the decision boundary as space is “tighter” between the regions. If an instance is more likely to impact the decision boundary of the target model, then an adversary will be more likely to infer its inclusion in the training dataset.

Another side of this same argument can be seen in the in-class standard deviation metric. This value captures feature distributions by addressing the following: within a dataset, given all instances of the same class, how similar are the feature vectors? The Adult and MNIST datasets, for example, have significantly lower standard deviations than the Purchases datasets despite having more instances of each class. This demonstrates more uniformity within classes of the Adult and MNIST datasets as compared to the Purchases datasets. If an instance is exceedingly similar to other instances of the same class then it will be less likely to noticeably impact the decision boundary during training. If, however, instances within a class are notably different, then the inclusion of each instance may significantly impact the decision boundary. Therefore, the uniformity of the target training data within each class will, in addition to the number of classes, impact an adversary’s ability to identify the inclusion of a particular instance.
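The exact formula behind the in-class standard deviation is not spelled out here, so the following is only one plausible reading, offered as a sketch: for each class, take the per-feature standard deviation over that class’s instances, average over features, and then average over classes. The function name `in_class_std` and the toy arrays are hypothetical.

```python
import numpy as np


def in_class_std(X, y):
    """Mean over classes of the mean per-feature standard deviation.

    Assumed definition; the source only describes the metric informally.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    per_class = [X[y == c].std(axis=0).mean() for c in np.unique(y)]
    return float(np.mean(per_class))


y = np.array([0, 0, 1, 1])
# Instances of each class are identical: no in-class variation.
X_uniform = np.array([[0.0, 1.0], [0.0, 1.0], [1.0, 0.0], [1.0, 0.0]])
# Instances of each class disagree on every feature.
X_varied = np.array([[0.0, 1.0], [1.0, 0.0], [1.0, 0.0], [0.0, 1.0]])

low = in_class_std(X_uniform, y)
high = in_class_std(X_varied, y)
```

Under this reading, a lower value means instances of the same class look more alike, which is the situation the text associates with lower vulnerability.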

The accuracy results in Table 1 are reported from experiments using the same attack development process targeting decision tree models trained on each of the seven datasets. The variation in results demonstrates that factors such as in-class standard deviation and number of classes are drivers of a model’s susceptibility to membership inference. This leads us to characterize membership inference attacks as data-driven.

4.3. Transferable Attack Models

Recent studies on adversarial machine learning attacks such as evasion attacks or poisoning attacks (Szegedy et al., 2013)(Papernot et al., 2016b)(Papernot et al., 2016a)(Rozsa et al., 2016) have shown that maliciously generated adversarial examples tend to transfer from one model to another. This is an important property as it opens the door to adversaries in the black-box scenario. That is, an adversary need not know details of their target model to launch a successful attack. Through extensive experiments, we observe that membership inference attacks are similarly transferable.

For example, the results in Table 2 are the accuracy of membership inference attacks using various attack configurations against a decision tree model trained on the Purchases-20 dataset. The relative consistency seen in Table 2 demonstrates that many attack configurations are viable. An adversary does not need to know the target model configuration to launch an effective attack. This suggests that membership inference attack models are transferable from one target model to another, provided that the targets are trained on the same dataset.

| Attack Model \ Attack Data Generation Model (Purchases-20) | DT | k-NN | LR | NB |
|---|---|---|---|---|
| DT | 88.98 | 87.49 | 72.08 | 81.84 |
| k-NN | 88.23 | 72.57 | 84.75 | 74.27 |
| LR | 89.02 | 88.11 | 88.99 | 83.57 |
| NB | 88.96 | 78.60 | 89.05 | 66.34 |
Table 2. Accuracy of membership inference attack against a decision tree target model trained on the Purchases-20 dataset.

Next, we evaluate the standard deviation of membership inference attack results for a membership inference attack model learned over the output of an attack data generation model and deployed against a target model across the seven datasets for the following three scenarios: (1) varying the model types of the attack data generation model and the attack model while keeping the target model type fixed, (2) varying the target and attack model types while keeping the attack data generation model type fixed, and (3) varying the target and attack data generation model types while keeping the attack model type fixed.

Table 3 shows the results of the experiments on various combinations of model types for the CIFAR-10 dataset. We calculate the average standard deviation for scenario (1) as follows: $\sigma_{\mathrm{DT}}$ denotes the standard deviation of all accuracy values in rows 1, 5, 9, and 13, where the target model is a decision tree (DT). Similarly, $\sigma_{k\text{-NN}}$ corresponds to rows 2, 6, 10, and 14, $\sigma_{\mathrm{LR}}$ to rows 3, 7, 11, and 15, and $\sigma_{\mathrm{NB}}$ to rows 4, 8, 12, and 16. We then take the average standard deviation in accuracy for scenario (1) to be the average of $\sigma_{\mathrm{DT}}$, $\sigma_{k\text{-NN}}$, $\sigma_{\mathrm{LR}}$, and $\sigma_{\mathrm{NB}}$. For scenario (3) we follow a similar process but with row sets 1-4, 5-8, 9-12, and 13-16. Scenario (2) is calculated by averaging the standard deviations for each of the 4 columns.

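| Row | Attack Model | Target Model | Generation: DT | Generation: k-NN | Generation: LR | Generation: NB |
|---|---|---|---|---|---|---|
| 1 | DT | DT | 90.44% | 85.64% | 60.48% | 65.78% |
| 2 | DT | k-NN | 54.92% | 69.32% | 55.01% | 51.38% |
| 3 | DT | LR | 53.84% | 61.06% | 61.10% | 50.02% |
| 4 | DT | NB | 50.46% | 50.58% | 49.98% | 50.20% |
| 5 | k-NN | DT | 89.96% | 81.55% | 89.07% | 61.10% |
| 6 | k-NN | k-NN | 55.33% | 68.32% | 62.45% | 50.89% |
| 7 | k-NN | LR | 51.34% | 59.58% | 64.78% | 50.09% |
| 8 | k-NN | NB | 50.12% | 50.61% | 50.46% | 50.11% |
| 9 | LR | DT | 90.37% | 90.11% | 88.81% | 66.98% |
| 10 | LR | k-NN | 51.72% | 69.90% | 65.29% | 55.64% |
| 11 | LR | LR | 50.01% | 64.34% | 67.40% | 54.49% |
| 12 | LR | NB | 50.54% | 50.63% | 50.60% | 50.29% |
| 13 | NB | DT | 90.42% | 89.86% | 90.52% | 63.71% |
| 14 | NB | k-NN | 50.33% | 68.31% | 57.65% | 53.08% |
| 15 | NB | LR | 50.00% | 64.22% | 67.63% | 53.54% |
| 16 | NB | NB | 50.58% | 50.44% | 50.58% | 50.01% |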
Table 3. Accuracy for CIFAR-10 dataset across experiments with various attack, data generation, and target models.
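The averaging procedure for the three scenarios can be reproduced directly from the Table 3 values. The sketch below uses only the Python standard library; the accuracies are copied from Table 3, while the use of the population standard deviation (`pstdev`) rather than the sample standard deviation is an assumption, as the text does not say which was used.

```python
from statistics import mean, pstdev

# Table 3 (CIFAR-10) accuracies in percent. Rows are grouped by attack model
# (DT, k-NN, LR, NB) and, within each group, ordered by target model
# (DT, k-NN, LR, NB). Columns are the attack data generation model.
TABLE3 = [
    [90.44, 85.64, 60.48, 65.78], [54.92, 69.32, 55.01, 51.38],
    [53.84, 61.06, 61.10, 50.02], [50.46, 50.58, 49.98, 50.20],
    [89.96, 81.55, 89.07, 61.10], [55.33, 68.32, 62.45, 50.89],
    [51.34, 59.58, 64.78, 50.09], [50.12, 50.61, 50.46, 50.11],
    [90.37, 90.11, 88.81, 66.98], [51.72, 69.90, 65.29, 55.64],
    [50.01, 64.34, 67.40, 54.49], [50.54, 50.63, 50.60, 50.29],
    [90.42, 89.86, 90.52, 63.71], [50.33, 68.31, 57.65, 53.08],
    [50.00, 64.22, 67.63, 53.54], [50.58, 50.44, 50.58, 50.01],
]


def flatten(rows):
    return [value for row in rows for value in row]


# Scenario (1): fix the target model, e.g. rows 1, 5, 9, 13 for DT.
fixed_target = mean(pstdev(flatten(TABLE3[t::4])) for t in range(4))
# Scenario (2): fix the generation model, i.e. one column at a time.
fixed_generation = mean(pstdev([row[g] for row in TABLE3]) for g in range(4))
# Scenario (3): fix the attack model, i.e. rows 1-4, 5-8, 9-12, 13-16.
fixed_attack = mean(pstdev(flatten(TABLE3[4 * a:4 * a + 4])) for a in range(4))

# Divide by 100 to express the values as fractions rather than percentage points.
print(fixed_target / 100, fixed_generation / 100, fixed_attack / 100)
```

The value for the fixed target model comes out clearly smaller than the other two, which is the pattern the analysis relies on.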

We follow this process for each of the seven datasets and summarize the results in Table 4. This allows us to compare the impact of model variation for the target model, the attack data generation model, and the attack model. We observe that the standard deviation of membership inference attack results is relatively small against a fixed target model when compared to a fixed attack data generation model or fixed attack model. A smaller standard deviation indicates a larger impact of the fixed model: accuracy remains stable while the other two models vary, so the fixed model has more influence over attack accuracy than the varied models.

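| Dataset | Std. Dev. in Accuracy, Fixed Target Model | Std. Dev. in Accuracy, Fixed Attack Data Generation Model | Std. Dev. in Accuracy, Fixed Attack Model |
|---|---|---|---|
| Adult | 0.0093 | 0.0335 | 0.0328 |
| MNIST | 0.0126 | 0.0347 | 0.0351 |
| CIFAR-10 | 0.0643 | 0.1233 | 0.1366 |
| Purchases-10 | 0.0396 | 0.1069 | 0.1074 |
| Purchases-20 | 0.0545 | 0.1336 | 0.1352 |
| Purchases-50 | 0.0705 | 0.1468 | 0.1482 |
| Purchases-100 | 0.0849 | 0.1468 | 0.1452 |

Table 4. Standard deviation between accuracy results with (1) fixed target model type and varying attack data generation and attack model types, (2) fixed attack data generation model type and varying target and attack model types, and (3) fixed attack model type and varying target and attack data generation model types.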

Table 4 clearly shows that for all datasets the deviation is minimized when the target model is fixed. This indicates that variation in the attack data generation model and the attack model has a comparatively low impact on attack success rates. This supports the evidence in Table 2 that an adversary need not make particularly informed choices of attack data generation model or attack model to develop an attack model. We therefore characterize membership inference attacks, like others in the adversarial learning domain, as transferable.

4.4. Attacks Across Model Types

Many works in adversarial machine learning focus on the vulnerability of deep learning models whose uses range from image classification (He et al., 2016)(Kisačanin, 2017) to speech recognition (Deng et al., 2013)(Hinton et al., 2012) to natural language processing (Mikolov et al., 2013). By generating adversarial examples which are tweaked in ways that are unnoticeable to humans, adversaries can exploit the models’ complexities to force misclassification (Kurakin et al., 2016)(Carlini and Wagner, 2017). This trend has also influenced the study of membership inference problems.

Most existing efforts on membership inference (Long et al., 2018)(Hayes et al., 2017)(Carlini et al., 2018)(Shokri et al., 2017) have been focused on deep learning models. We argue that the complexity exploited in traditional adversarial learning attacks is not explicitly leveraged in membership inference attacks. Additionally, areas where membership inference would be most alarming, such as healthcare (Lee and Yoon, 2017), e-commerce, banking, and government often deploy simpler model types, such as decision trees as model understanding is prioritized for safe use of predictive services. This leads to a natural question: are model types outside of deep learning methods susceptible to membership inference? Our empirical study on all seven datasets and four types of models demonstrates that not only are other model types vulnerable to membership inference attacks but that the model type is in fact very influential in determining the extent of that vulnerability.

The basic hypothesis of the membership inference attack is that models respond differently to instances which they have “seen” versus those they have not. Under this hypothesis it is clear that model behavior and sensitivity are likely to impact vulnerability. We therefore consider a variety of model types outside of neural networks. We consider models from 4 different major categories: linear models (logistic regression), Bayesian models (Naïve Bayes), cluster models (k-nearest neighbor with k=5), and tree models (CART decision trees).

For equal comparison to the previous work in (Shokri et al., 2017), we use the shadow model implementation of the membership inference attack. All results from our experiments use 10-fold cross-validation and are averaged across 10 runs for a randomly selected sample of 10,000 instances.

Dataset LR k-NN DT NB NN
Adult 50.13 51.39 55.49 50.22 50.30
MNIST 53.25 50.44 56.66 50.48 51.70
CIFAR-10 70.25 65.99 83.94 50.03 78.00
Purchases-10 64.56 53.53 73.85 50.61 55.00
Purchases-20 75.85 55.36 81.94 50.79 59.00
Purchases-50 81.61 58.19 88.88 52.08 86.00
Purchases-100 83.78 60.11 92.19 54.93 93.50
Table 5. Precision of membership inference attack across 5 model types.

We can clearly see in Table 5 that other models are in fact vulnerable to membership inference and that both the training data (as discussed in Section 4.2) and the model type play an important role in understanding a particular model’s risk. As shown in Table 5, despite the variety in our datasets, the highest precision is seen with the decision tree model for all datasets except Purchases-100 while the Naïve Bayes models consistently show exceedingly low precision across all datasets.

In general, a target model whose decision boundary is unlikely to be drastically impacted by a particular instance will be more resilient to membership inference attacks. For example, the Naïve Bayes algorithm independently considers the probability of a given class for each feature. Therefore, given sufficient training samples, a single instance only marginally affects these probabilities. This explains the low numbers consistently seen when attacking a Naïve Bayes model. By contrast, a decision tree leaf node will consider a unique feature combination to determine class rather than each feature in isolation. The introduction of a single instance, if that instance displays a unique combination of feature set and class, may cause a decision tree to grow an entire new branch. This sensitivity to single instances makes membership inference attacks more successful when targeting decision tree models.

Consequently, it is important to understand that while different datasets display different vulnerabilities to membership inference, so do different model types. This also indicates that machine learning-as-a-service providers who are wary of membership inference attacks against their deployed models may also be able to use model choice to help mitigate vulnerability.

4.5. Variation in Generation Model

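| Dataset | Highest-Accuracy Model Types (target, generation, attack) | Accuracy | All Set to Target Type | Accuracy | All Set to Generation Type | Accuracy | All Set to Attack Type | Accuracy |
|---|---|---|---|---|---|---|---|---|
| Adult | (DT, DT, NB) | 59.91% | DT | 59.89% | DT | 59.89% | NB | 50.18% |
| MNIST | (DT, DT, LR) | 61.80% | DT | 61.75% | DT | 61.75% | LR | 54.38% |
| CIFAR-10 | (DT, LR, NB) | 90.52% | DT | 90.44% | LR | 67.40% | NB | 50.01% |
| Purchases-10 | (DT, k-NN, DT) | 82.45% | DT | 82.29% | k-NN | 53.78% | DT | 82.29% |
| Purchases-20 | (DT, LR, NB) | 89.05% | DT | 88.98% | LR | 80.50% | NB | 51.29% |
| Purchases-50 | (DT, LR, LR) | 93.77% | DT | 93.71% | LR | 88.60% | LR | 88.60% |
| Purchases-100 | (k-NN, LR, DT) | 95.86% | k-NN | 95.74% | LR | 90.23% | DT | 62.19% |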
Table 6. Model set up with maximum accuracy averaged across 10 runs using 10-fold cross validation. Maximum configuration is then compared to configurations where model type is consistent across the target, generation, and attack models using each model type represented in the maximum configuration.

Instinctually, one may believe that the model used to generate attack data must be of the same type as the target model. We briefly discuss why previous research has made this same assumption and then investigate its veracity and seek to explain why it does not strictly hold.

In (Shokri et al., 2017), the authors claim that the shadow model implementation of a membership inference attack requires the shadow models be trained in a similar way to the target model, an assumption followed by later work such as (Long et al., 2017). Consistent with our formalization of membership inference attacks, this claim is equivalent to saying that the attack data generation technique must mirror the behavior of the target model under attack.

The reasoning behind this assumption is intuitive. Let us say the target model is an approximation of an ideal function for a given dataset. Then, given a data point for which the adversary aims to identify membership in the training data, the output provided to the adversary will be the target model’s prediction on that data point. This output will then be provided to the attack model to classify the data point as “in” or “out”. Let the binary classifier which serves as the attack model be trained from the output of a generation model. That is, the generation model’s outputs on the adversary’s inputs make up the attack model training data.

We can clearly see that our attack model is therefore trained on the output of one function, the generation model, and deployed against the output of another, the target model. It is understandable then that previous work would seek to have the behavior of the generation model mirror the behavior of the target model. This assumption naturally extends to say that the generation model, in an attempt to mirror the target model, must be, or intuitively should be, of the same model type as the target model in a successful membership inference attack.

However, this assumption is not necessary to launch an effective membership inference attack. We conducted a set of experiments varying model type combinations of the target, attack data generation, and attack models, considering the four candidate model types. The combination with the highest accuracy for each dataset is reported in Table 6. For each dataset, we consider the target model type, attack data generation model type, and attack model type of the membership inference attack which reported the highest accuracy. These types will naturally vary for different datasets. We also report accuracy for scenarios when all three model types are set to the target model type of that highest-accuracy configuration. We similarly report when all three types are set to its attack data generation model type and to its attack model type.

For example, we recall that in Table 3 the highest accuracy was reported when the target model was a decision tree, the data generation model was a logistic regression model, and the attack model was a Naïve Bayes model. Therefore, for CIFAR-10 the target type is DT, the generation type is LR, and the attack type is NB, as shown in column 2 of Table 6 along with the attack accuracy under these settings. For the CIFAR-10 dataset, we then also report the attack accuracy when all the models are decision trees (the target type), when all models are logistic regression models (the generation type), and finally when all models are Naïve Bayes models (the attack type). We repeat this process for all seven datasets.

We observe that, across datasets, it is not necessary for all models to be of the same type, as no maximally accurate combination contained the same model type across all three phases of the membership attack development. Additionally, the attack data generation model, contrary to what was previously assumed, does not need to strictly mirror the target model for a successful membership inference attack. In fact, for 5 out of the 7 datasets the highest accuracy was reported when the attack data generation model was of a different type than the target model. The reason behind this non-intuitive phenomenon lies in a more precise understanding of the role of the generation model. Although it was previously assumed that the role was to mirror the behavior of the target model, we assert that the generation model’s role is to characterize how the target model may be impacted by the inclusion of a particular instance, that is, how the decision boundary of the target model may reveal the inclusion of an instance.

The generation algorithm is therefore trying to characterize probability distributions related to a decision boundary which either has or has not been informed by the instance. From this perspective, it is now more clear that the vulnerability is more closely related to two elements: distribution of the data and sensitivity of the decision boundary. If a decision boundary for a given dataset is notably impacted by the inclusion of a given instance then membership inference attacks are likely to be more successful.

In summary, we have demonstrated that, despite natural intuition, it is not strictly necessary for the attack data generation technique to be of the same type as the target model for a membership inference attack to be successful. Additionally, we again see that the dominating factors are the target model type and dataset. We note that when all model types are set to the target model type of the highest-accuracy configuration, the resulting accuracy is within 0.16% of the maximum reported accuracy. By comparison, setting all model types to the attack data generation model type of that configuration decreases attack accuracy by up to 28.67%. Setting all model types to its attack model type demonstrates an even larger decrease in many cases, as the attack model type is reported to be a Naïve Bayes model for multiple datasets. As we previously identified the Naïve Bayes model as robust against membership inference attack, it is unsurprising that attack accuracy will decrease significantly in these cases.

While success is close to maximal in all scenarios when the target model type is known, we re-accentuate the attack success seen in Table 6 with a mixture of model types. This again supports the conclusion that an adversary need not have this level of insider knowledge to launch a successful attack. Rather, when a model and its training dataset are particularly vulnerable, a variety of attack scenarios are likely to demonstrate success.

4.6. Federated Learning and Insider Attacks

4.6.1. Insider Attack Model

To this point we have exclusively discussed “outsider” membership inference attacks. That is, membership inference attacks which are launched by an adversary who is only a user of the target model through black-box access to the target service prediction API. We now introduce the threat of insider membership inference attacks. We define these attacks to be those launched by a participant in a federated learning system.

In recent years there has been an increased interest in the role of federated learning systems in addressing privacy concerns in data mining. The intuition behind federated systems to protect against membership inference is as follows: if an adversary is able to identify that a certain instance is contained within the training data of a model and that model is the result of a federated learning system, then any individual participant will have plausible deniability with respect to their individual dataset. However, such federations open the door to a new risk through insider attacks.

The difference between an attack on federated systems and outsider membership inference attacks is that, in federated systems, the training dataset is divided amongst multiple parties who engage in collaborative learning to provide predictions to the machine learning service. We consider the following loosely federated system: given a set of parties, there exists one independent dataset belonging to each party. Each party then trains its own model using its dataset as the corresponding training data.

Within this environment, new instances will be evaluated as follows. On a given input, each party’s model will output a probability vector. The individual parties will then share their outputs either openly with one another or with some aggregation service to compute the final output, the point-wise average of the individual probability vectors. Any outside adversary using this service will only have access to this final probability vector. Any individual party will therefore have plausible deniability because, if an instance is identified as a member of the training data, the adversary is unlikely to identify which training set specifically.
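The aggregation step can be illustrated with a few lines of plain Python. The probability vectors below are made-up values for a three-party, two-class federation, and `aggregate` is a hypothetical helper name.

```python
def aggregate(prob_vectors):
    """Point-wise average of the parties' probability vectors."""
    n_parties = len(prob_vectors)
    return [sum(column) / n_parties for column in zip(*prob_vectors)]


# Hypothetical outputs of three parties' models on the same input.
party_outputs = [
    [0.9, 0.1],
    [0.6, 0.4],
    [0.3, 0.7],
]

# An outside adversary only ever observes this averaged vector.
final_output = aggregate(party_outputs)
```

An outsider sees only `final_output`, whereas an insider such as the aggregation service also sees every row of `party_outputs`; that difference in visibility is what the insider attack discussed next exploits.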

However, we must also consider adversaries who are members of the federated learning system, that is, the aggregation service or a participating party. Under this scenario the “insider” will have access to the individual probability vectors. The insider membership inference attack then becomes: given these probability vectors, is the adversary able to identify which party’s dataset a training instance belongs to?

We now consider when parties may participate in a federated system. Consider three candidate parties, each with its own training dataset. Let us assume the extreme case that the three datasets are statistically equivalent. Then, the three trained models will approximate the same ideal function. Given this setup, it is then likely that, on any input, the three models’ outputs will also be statistically equivalent. Here, there is no accuracy gain for any of the three parties through collaboration. They are therefore unlikely to be motivated to create a federated learning system.

Alternatively, consider three datasets so significantly different that the three trained models may be considered independent. Let us now assume the models have accuracies of 75%, 80%, and 70% and that a majority voting aggregation scheme is used. Such a federation has, on any input, an 84.5% chance of classifying it accurately, an accuracy higher than that of any individual model. Under these conditions, the three parties are much more likely to form a federation.
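The 84.5% figure can be checked by enumerating which models are correct. The sketch below assumes, as the text does, that the models err independently, and additionally assumes that the federation is counted as correct exactly when a strict majority of models is correct; `majority_vote_accuracy` is a hypothetical helper.

```python
from itertools import product


def majority_vote_accuracy(accuracies):
    """Probability that a strict majority of independent models is correct."""
    total = 0.0
    for outcome in product([True, False], repeat=len(accuracies)):
        if 2 * sum(outcome) > len(accuracies):
            probability = 1.0
            for is_correct, accuracy in zip(outcome, accuracies):
                probability *= accuracy if is_correct else 1.0 - accuracy
            total += probability
    return total


# 0.42 (all three correct) + 0.18 + 0.105 + 0.14 (exactly two correct)
federation_accuracy = majority_vote_accuracy([0.75, 0.80, 0.70])
```

The four contributing terms sum to 0.845, above the best individual accuracy of 0.80.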

It is therefore reasonable to assume parties are likely to form federated learning systems when their individual datasets are sufficiently different. Unfortunately, this leads to sufficiently different decision boundaries for different parties. These diverging decision boundaries open the door to effective insider membership inference attacks as an adversary will notice differences among the parties’ individual probability vectors.

4.6.2. Insider Attack Risk in Federated Systems

In Table 7 we see that even datasets showing resilience to outsider membership inference are vulnerable to an insider membership inference attack. We created a federated system of three parties for the datasets with the fewest classes (those listed in Table 7) so that each party has sufficient instances of each class to learn a meaningful decision boundary. Given a federation of three parties, any party behaving as an adversary will have an attack precision baseline of 50%. This allows for comparison with the outsider inference attacks. Both the Adult and MNIST datasets showed minimal vulnerability in the outsider membership inference attacks and experienced significant jumps in vulnerability in the insider attack scenario, while the CIFAR-10 and Purchases-10 datasets show similar precision results in both scenarios, notably outperforming the baseline.

| Dataset | Outsider Inference Precision | Insider Inference Precision |
|---|---|---|
| Adult | 55.49 | 70.12 |
| MNIST | 56.66 | 68.18 |
| CIFAR-10 | 83.94 | 82.01 |
| Purchases-10 | 73.85 | 74.30 |
Table 7. Insider inference precision in federated systems with 3 parties. Baseline is 50%. Model type is set to decision tree.

In Figure 7, we plot the decision boundaries created by three different decision tree models trained on disjoint subsets of the Adult dataset. We plot the decision boundaries relative to the capital loss and education number features. The section enclosed within the blue box highlights a portion of the decision boundary which notably differs between the plots. The second level of Figure 7 is a zoomed-in view of this section for all three plots. On the third level we then plot the positive training instances that informed each decision boundary in this region. It is clear that the long region identified as the positive class in the third plot was informed by significantly more positive instances than the other two decision plots. It is decision boundary differences such as those demonstrated here, and what they reveal of the underlying training data, that reveal ownership in the insider membership inference attack.

Figure 7. Decision boundaries for 3 different participants in a federated system using the Adult dataset. An area where the decision boundaries are significantly different is highlighted as well as the training instances provided to each participant relevant to the highlighted area.

This is supported by the accuracy seen for federations with different data distribution characteristics. When constructing our federated learning systems for these experiments we first sampled target class distributions at random to create different scenarios which may be seen in deployed federated learning environments.

In Figure 8 we show the relationship between the accuracy of the insider membership inference attack and the distance between the two targeted parties’ data. That is, let a 3-party federation be formed wherein one party behaves maliciously and launches an insider membership inference attack against the other two. Then, for each class, we look at how similar the instances of that class in one targeted party’s dataset are to the instances of the same class in the other’s. This similarity is averaged across all classes. If the datasets are more similar then their in-class distance will be lower. In Figure 8 we can see that the adversary will be less successful when the two targeted datasets have a closer in-class measure than when the datasets are very different.

Figure 8. In-class distance between different parties compared with insider attack accuracy for the Adult dataset with a 3-party federated learning system.

Unfortunately this leads to a catch-22 scenario. Parties with very similar looking training datasets will not be motivated to participate in a federation as they are less likely to see significant increases in classification accuracy. Parties with very different looking training datasets, however, will be more vulnerable to insider membership inference.

We argue that the risk of insider membership inference attack is of particular concern as participants are likely to assume they are less vulnerable than in outsider scenarios due to the plausible deniability protections inherent in federated learning systems. The potential for insider attacks, however, calls for any federated learning system to adopt a robust trust policy that considers such risk.

5. Mitigation Techniques

We categorize mitigation techniques to protect against membership inference into two categories: model hardening and API hardening.

Model Hardening. Model hardening mitigation techniques are implemented during the training phase of the target model. We suggest four such techniques: (1) model choice, (2) fit control, (3) regularization, and (4) anonymization. In model choice, a service provider may introduce concerns of membership inference into their model selection process. For example, as was demonstrated in our experimentation, a Naïve Bayes target model will be much more resilient to membership inference attacks than a decision tree and therefore may be the preferred model type for a particular machine learning service. For fit control, service providers may leverage parameters such as the decision tree’s complexity parameter to prevent overfitting and therefore decrease inference risk. Another technique is regularization, where noise is added to a model’s loss function. This technique is particularly relevant to deep learning models. Finally, the service provider may apply anonymization techniques to the training data prior to training. That is, if the training dataset is made k-anonymous prior to the training of the target model, then the impact of a single instance may be hidden among others.
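As an illustration of the anonymization step, the sketch below checks whether a small table is k-anonymous with respect to a chosen set of quasi-identifier columns, before and after a simple generalization. The column names, the toy records, and the helpers `is_k_anonymous` and `generalize_age` are hypothetical and not taken from the experiments in this paper.

```python
from collections import Counter


def is_k_anonymous(records, quasi_identifiers, k):
    """True if every combination of quasi-identifier values that appears
    in the records is shared by at least k records."""
    counts = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return all(c >= k for c in counts.values())


def generalize_age(record, width=10):
    """Hypothetical generalization: replace an exact age with a range."""
    low = (record["age"] // width) * width
    return {**record, "age": f"{low}-{low + width - 1}"}


# Toy records loosely modeled on Adult-style attributes (assumed values).
records = [
    {"age": 31, "education_num": 13, "income": ">50K"},
    {"age": 36, "education_num": 13, "income": "<=50K"},
    {"age": 42, "education_num": 9, "income": "<=50K"},
    {"age": 47, "education_num": 9, "income": ">50K"},
]
quasi = ["age", "education_num"]

raw_ok = is_k_anonymous(records, quasi, k=2)
generalized = [generalize_age(r) for r in records]
generalized_ok = is_k_anonymous(generalized, quasi, k=2)
print(raw_ok, generalized_ok)
```

With exact ages every record is unique on the quasi-identifiers, so the raw table is not 2-anonymous; after bucketing ages into decades each combination is shared by two records.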

API Hardening. API hardening techniques are implemented during the prediction phase through the machine learning service. For example, the service API may introduce noise into the prediction vector before returning it to the user. This will reduce the adversary’s understanding of exactly where an instance lies with respect to the target model’s decision boundary. Another option is to reduce the dimensionality of the prediction vector. This can be done either by limiting the return value to the top k values in the prediction vector or even by returning only the prediction label.
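The three API hardening options above could be applied as post-processing on the prediction vector, roughly as sketched below. The function names are hypothetical, and the choice of uniform noise followed by renormalization is an assumption; the text does not prescribe a noise distribution.

```python
import random


def add_noise(probs, scale, rng):
    """Perturb each probability with uniform noise in [-scale, scale],
    clip at zero, and renormalize so the vector still sums to 1.
    The uniform distribution is an assumption made for this sketch."""
    noisy = [max(p + rng.uniform(-scale, scale), 0.0) for p in probs]
    total = sum(noisy)
    if total == 0.0:
        return [1.0 / len(probs)] * len(probs)
    return [p / total for p in noisy]


def top_k(probs, k):
    """Return only the k highest-scoring (class index, probability) pairs."""
    ranked = sorted(enumerate(probs), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]


def label_only(probs):
    """Return only the index of the predicted class."""
    return max(range(len(probs)), key=probs.__getitem__)


prediction = [0.05, 0.70, 0.20, 0.05]
noisy_prediction = add_noise(prediction, scale=0.05, rng=random.Random(0))

print(noisy_prediction)
print(top_k(prediction, 2))
print(label_only(prediction))
```

Each step returns strictly less information about where the instance lies relative to the decision boundary, which is also why each step costs the legitimate user some utility.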

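| Mitigation | Parameter | Model Accuracy | Attack Accuracy |
|---|---|---|---|
| None | | 55% | 83% |
| Dimension Reduction | | 55% | 83% |
| Dimension Reduction | | 55% | 82% |
| Dimension Reduction | label | 55% | 73% |
| Regularization | L2 | 56% | 80% |
| Regularization | L2 | 57% | 73% |
| Regularization | L2 | 56% | 66% |
| Regularization | L2 | 35% | 52% |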
Table 8. Results of mitigation techniques in (Shokri et al., 2017) using dataset of Texas hospital admissions which contains 100 classes. A neural network target model is used.

An issue that pervades every mitigation technique is a loss of utility, as demonstrated in Table 8. With dimension reduction, there is no significant reduction in attack accuracy until only the label is returned. Consider a hospital processing images of mass scans for cancer classification. A service which says “This mass is cancerous.” is significantly less useful than a service which says “There is a 56% chance that this mass is cancerous.” Additionally, even with the strongest dimension reduction, the attack accuracy of 73% still notably outperforms the baseline. Regularization, on the other hand, is able to decrease attack accuracy to 52%. Unfortunately, to gain this level of protection, the noise introduced to the model decreases model accuracy to 35%. This is a significant challenge in the mitigation of membership inference attacks.
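The trade-off can be made concrete with a little arithmetic on the Table 8 figures. The sketch below uses only the unmitigated row, the label-only row, and the last L2 row; the helper name `tradeoff` and the row labels are illustrative.

```python
# (mitigation, model accuracy %, attack accuracy %), copied from Table 8.
TABLE_8 = [
    ("no mitigation", 55, 83),
    ("dimension reduction, label only", 55, 73),
    ("L2 regularization, last row", 35, 52),
]


def tradeoff(rows):
    """For each mitigated row, return the percentage points of model accuracy
    lost and of attack accuracy removed, relative to the first row."""
    _, base_model, base_attack = rows[0]
    return {
        name: {"model_loss": base_model - model,
               "attack_drop": base_attack - attack}
        for name, model, attack in rows[1:]
    }


for name, deltas in tradeoff(TABLE_8).items():
    print(f"{name}: model accuracy -{deltas['model_loss']} points, "
          f"attack accuracy -{deltas['attack_drop']} points")
```

Returning only the label removes 10 points of attack accuracy at no cost in measured model accuracy, although the returned output is less informative, while the last L2 setting removes 31 points of attack accuracy at a cost of 20 points of model accuracy.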

6. Related Work

Membership inference is a young area, but a few works since (Shokri et al., 2017) have investigated the risk of membership inference attacks. Most of the existing proposals focus on deep learning models and are influenced by adversarial deep learning research such as (Goodfellow and Jones, 2015)(Papernot et al., 2016c)(Yuan et al., 2017). For example, (Long et al., 2018) identifies vulnerable instances for membership inference attacks exclusively relating to deep learning models, while (Carlini et al., 2018) seeks to define a measure of deep learning model vulnerability, with respect to the model’s encoding of a random secret within the training data, orthogonal to membership inference. (Hayes et al., 2017) studies membership inference in generative adversarial networks (GANs), and shows that the level of generalization required to mitigate membership inference in GANs will lead to worse results in accuracy and utility. This serves as independent evidence for the mitigation strategies and research direction we promote, which include methods for anonymizing the training datasets while preserving the model training quality.

(Long et al., 2017) proposed a measure of risk at the data instance level and evaluated it on Adult and Purchases-10 with the attack model, the target model under attack, and the shadow model all of the same type. The identified instance-level risks exemplify our analysis that membership inference attacks are data-driven.

Another line of study on membership inference relates to the impact of overfitting, based on the belief that the cause of membership inference is model overfitting. (Yeom et al., 2017) investigates this belief and concludes that overfitting is not necessary for a model to show vulnerability to membership inference. Their investigation is limited to the role of overfitting and assumes a powerful adversary with prior knowledge of the target model’s average training loss. This work does suggest that membership inference vulnerability is more complex than just overfitting to the training data.

Application-specific work, such as (Pyrgelis et al., 2017), has studied membership inference vulnerability specific to location data under a powerful adversary with deep prior knowledge. Though this work aims at attacking aggregate data rather than a trained target model and its training data, it does demonstrate the risk of membership inference attacks in a privacy-conscious domain.

Our work is mainly inspired by (Shokri et al., 2017), the first exploratory work in membership inference, which shows membership inference risks for a deep learning trained target model and attack model. In this paper we extend the work done in (Shokri et al., 2017) to a more general setting towards demystifying the adverse effect of membership inference across different types of models with both general and empirical characterization of why membership inference attacks are more effective in some scenarios than in others.

7. Conclusion

We have presented the first generalized framework for the development of a membership inference attack model. This general formulation enables an in-depth characterization of membership inference attacks against different types of machine learning models. Through extensive experimentation and empirical evidence, we show when and why machine learning models may be vulnerable to membership inference attacks. By exploring a variety of machine learning model types and their correlations with respect to the three phases of the attack generation process, we present five interesting characteristics of membership inference attacks: (1) they are data-driven attacks, (2) attack models are transferable, (3) target model type is a strong indicator of model vulnerability, (4) attack data generation techniques need not explicitly mirror the target model, and (5) membership inference attacks can persist as insider attacks in federated systems. We also include a discussion on countermeasures and mitigation methods against membership inference attacks.

Our research on membership inference attacks and membership privacy continues along several dimensions. First, we are engaged in the development of countermeasures and defense methods. Second, we are currently studying the scale and diversity of membership inference attacks in federated and collaborative learning systems. Third, we are investigating the complex relationships between membership inference attacks, membership privacy, and differential privacy.


This research is partially supported by the National Science Foundation under Grants SaTC 1564097, NSF 1547102, and an RCN BD Fellowship, provided by the Research Coordination Network (RCN) on Big Data and Smart Cities. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the RCN or National Science Foundation.