The widespread adoption of machine learning (ML) in real-world applications, such as image recognition, speech recognition, and user behavior prediction, creates equally widespread challenges. As companies and organizations integrate ML components into their products and services, this ubiquitous deployment draws increasing scrutiny of its security and privacy implications.
In recent years, the threat of membership inference (MI) attacks against classifiers, in which an attacker detects whether a specific data instance was used as training data, has become real and, as a fundamental challenge in ML privacy, has received much attention in the literature [shokri2017membership, long2017towards, yeom2018privacy, nasr2018machine, nasr2018comprehensive]. Most modern-day classifiers use a softmax function to produce a vector of class probabilities for each input instance. An attacker who is given access to the target classifier $F$ seeks to determine whether an instance $(x, y)$, where $x$ is the feature vector and $y$ is its label, is a member of the dataset $D$ used to train $F$. Effective MI attacks pose a significant threat to the privacy of the individuals whose records have been used in training data. For example, cancer patient records may be used to train a classifier to predict what kind of treatment would be most promising for a future patient. However, if high-accuracy MI attacks can be carried out against this classifier, access to the classifier may enable adversaries to learn sensitive personal information about patients whose data were used as training data.
It is known that overfitted classifiers are more susceptible to MI attacks than classifiers with small generalization gaps [shokri2017membership, salem2018ml, yeom2018privacy]. Given a classifier, its generalization gap is defined by $g = a_{\mathrm{train}} - a_{\mathrm{test}}$, where $a_{\mathrm{train}}$ is the classifier's accuracy on training data, and $a_{\mathrm{test}}$ is the classifier's accuracy on testing data. A classifier that overfits has a large generalization gap. Intuitively, the generalization gap of a classifier is closely related to its vulnerability to MI attacks. A trivial attack, first introduced by Yeom et al. [yeom2018privacy], predicts that an input $(x, y)$ is a member of the training data if and only if the classifier yields the correct label on $x$. This attack requires knowing the true label, and its effectiveness is, unsurprisingly, closely tied to the generalization gap. More precisely, the advantage of this Baseline attack (how much better it does than random guessing on balanced queries, half members and half non-members) is $g/2$, because the attack's accuracy is $\frac{1}{2} + \frac{g}{2}$. Thus, as long as a generalization gap exists, the Baseline attack will have some level of effectiveness.
Contributions. In this work, we do an in-depth study of MI attacks and defenses. We claim the following contributions:
We systematically catalog existing MI attacks, also introducing a new MI attack, called the Instance-Probability attack.
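We perform extensive experimental evaluations of state-of-the-art MI attacks and defense methods, highlighting an interesting quantitative relationship between the generalization gap, $g$, and the largest advantage achieved by any existing MI attack, $v$, over all classifiers and all datasets. We empirically observe that $v$ is bounded below by $g/2$, the advantage of the Baseline attack, and bounded above by a quantity that also depends on $g$ (Equation (1)).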
Figure 1 illustrates this phenomenon over multiple image classifiers and datasets. On the scatter plot, each point represents one classifier on one dataset. The $x$ axis gives the generalization gap $g$, and the $y$ axis gives $v$. The first part of Equation (1), namely the lower bound $g/2 \le v$, holds because the Baseline attack achieves an advantage of $g/2$. The second part of Equation (1), namely the upper bound on $v$, is an empirical observation from extensive experiments.
This relationship helps us understand why MI attacks are effective against classifiers on some datasets but not on others. For example, it explains the phenomenon that the MNIST dataset is almost immune to MI attacks [shokri2017membership, salem2018ml]. The generalization gap of the standard neural network classifiers on MNIST is only around 0.02. Similarly, classifiers trained for CIFAR-100 are more vulnerable to MI attacks than those trained for CIFAR-10 because the former have larger generalization gaps.
This relationship also helps to guide research on MI attacks. For example, some researchers have proposed to defend against MI attacks by limiting access to the probability vector, e.g., restricting the probability vector to only the top $k$ classes, coarsening the precision of the probability vector, or increasing its entropy by using temperature in the softmax function. These defenses are ineffective against the Baseline attack. More recently, in [nasr2018machine, jia2019memguard], researchers proposed new defenses against MI attacks, and claimed that the effectiveness of MI attacks is reduced almost to random guessing. However, these defenses do not reduce the generalization gap $g$.
Given this relationship, it becomes clear that while the defenses proposed in [nasr2018machine, jia2019memguard] target some specific MI attacks, and are able to reduce their effectiveness, they fail to defend against other MI attacks such as the Baseline attack.
Our main contribution is a new, principled, and more effective defense against MI attacks. The relationship in Equation (1) suggests that in order to defend against MI attacks, one needs to close the generalization gap. Hence, we propose to intentionally reduce training accuracy to match the test accuracy. We achieve this through a regularization penalty added to the training loss function. Our new regularizer is a permutation-invariant function (a set function) that takes the empirical distributions of the softmax output over all training and validation data instances as input, and outputs a penalty that forces classifier training to make these empirical distributions match. Our penalty uses the Maximum Mean Discrepancy (MMD) [fortet1953convergence], in the form of a nonparametric kernel two-sample test as introduced by Gretton et al. [gretton2012kernel], which gives us a measure of the difference between the two empirical distributions. However, in practice, MMD alone tends to reduce both the training and test accuracies, an unwanted outcome. To tackle this challenge, we propose to combine MMD with mix-up training [zhang2018mixup], which, in our experiments, garners the benefits of MMD without the test accuracy penalty. Our resulting MMD+Mix-up regularizer achieves significantly lower training accuracies in our experiments while keeping test accuracies in line with those obtained with the original loss function. Compared to the state-of-the-art defense Mem-Guard [jia2019memguard], the resulting MMD+Mix-up defense not only frustrates significantly more attacks, but is also significantly faster to train (it can be trained on large neural networks) and incurs no extra computational cost at inference time.
Organization. The rest of this paper is organized as follows. Section 2 summarizes all existing attacks and defenses. Section 3 describes our proposed attacks and defenses. Section 4 presents extensive evaluation on all attacks and defenses described in this paper. Related work is discussed in Section 5. Section 6 concludes this paper. An appendix gives additional information.
Table 1: Summary of the MI attacks considered in this paper.
|Attack||Granularity||Feature||Method||Adversary model||Source|
|Class-Vector||Class||Probability vector||Neural network||Training-plus-Data||Shokri et al. 2017 [shokri2017membership]|
|Instance-Vector||Instance||Probability vector||KL divergence to average||Training-plus-Data||Long et al. 2017 [long2017towards]|
|Global-Loss||Global||Training loss||Threshold||Training-plus-Data||Yeom et al. 2018 [yeom2018privacy]|
|Global-Probability||Global||Probability of correct label||Threshold||Training-plus-Data||Variant of Global-Loss|
|Global-TopThree||Global||Top 3 values in the probability vector||Neural network||Training-plus-Data||Salem et al. 2018 [salem2018ml]|
|Global-TopOne||Global||Top 1 value in the probability vector||Threshold||Probability-Vector Oracle||Salem et al. 2018 [salem2018ml]|
|Baseline||Global||Predicted class||Binary||Label-only Oracle||Yeom et al. 2018 [yeom2018privacy]|
|Instance-Probability||Instance||Probability of correct label||Threshold||Training-plus-Data||This paper|
2 Membership Inference Attacks and Defenses
In this section, we discuss existing membership inference (MI) attacks and defenses.
2.1 Adversary Models and Nomenclature
We consider ML models that are used as $k$-class classifiers. Each instance is a pair $(x, y)$, where $x$ is the feature and $y$ is the label. A classifier typically uses a softmax function, and is trained with a cross-entropy loss function. For any input instance $x$, the target classifier $F$ outputs $F(x)$, a length-$k$ vector of non-negative real values that are the output of the softmax function. The components of $F(x)$ sum up to 1 and are generally interpreted as the confidence values that $x$ belongs to each of the $k$ classes. The predicted class is the one with the highest corresponding value in $F(x)$. In some cases, the values in the vector are calibrated so that they indeed represent probabilities; in other cases, these values sum up to 1 but do not have a real probability interpretation. For convenience, we call $F(x)$ the probability vector.
The adversary's goal is to determine whether an instance $(x, y)$ is a member of the dataset $D$ used for training a target classifier $F$. We consider the following three adversary models, in order of decreasing strength.
Training-plus-Data. The adversary knows the training process, such as the classifier architecture, hyperparameters, and training algorithms, and the distribution of the data used in training. In addition, the adversary has oracle access to the target classifier $F$; that is, the adversary can query $F$ with an instance $x$ and obtain the probability vector $F(x)$.
Knowledge of the training process and data distribution enables the adversary to train shadow classifiers that behave similarly to the target classifier, and learn information about the target classifier through the shadow classifiers.
Probability-Vector Oracle. The adversary has no knowledge (or, equivalently, does not utilize any such knowledge) of the training process or the data distribution. Thus the adversary can only exploit the oracle access to the target classifier, which gives a probability vector for each instance.
Label-only Oracle. When querying the target classifier, the adversary is given only the predicted label of the instance, and not the probability vector. Since most existing attacks extract features from the probability vectors, they are not applicable under this adversary model; the exception is the Baseline attack, which needs only the predicted label.
In [nasr2018comprehensive], other adversary models for MI attacks were considered. These include white-box attacks, in which the attacker also knows the parameters of the target classifier, and federated training settings, in which the attacker may also observe the updates during training. Similar to [salem2018ml, jia2019memguard, carlini2018text, long2017towards, long2018understanding, nasr2018machine], we focus on the black-box setting, where the adversary uses oracle access to the target classifier, although a Training-plus-Data attacker has access to sufficient information to train shadow classifiers similar to the target classifier. Threats from attacks in the black-box setting are more serious, because such attacks can be carried out by anyone who can query a classifier, e.g., in the Machine Learning as a Service setting. Understanding threats from MI attacks in the black-box setting also helps in understanding MI attacks in other settings.
Table 1 summarizes the attacks considered in this paper. We assign a two-part name to each attack. The second part of a name is based on the features used in the attack. The first part of a name is based on the granularity of the MI attack. An attack may use one Global model or threshold value for all instances, or use one model (or threshold value) for each Class or each Instance.
2.2 Summary of Existing Attacks
We now summarize black-box MI attacks that have been proposed in the literature.
The Class-Vector Attack [shokri2017membership]. To our knowledge, Shokri et al. [shokri2017membership] presented the first study on MI attacks against classifiers. The attack is in the Training-plus-Data adversary model, and for each instance $(x, y)$, the probability vector $F(x)$ is the feature for determining whether $(x, y)$ was used in training $F$.
The adversary knows a dataset $D'$, which is from the same distribution as the dataset used to train the target classifier. The adversary creates $m$ samples $D'_1, \dots, D'_m$ from $D'$, and trains $m$ shadow classifiers $F_1, \dots, F_m$, one from each $D'_i$. These shadow classifiers generate training data for MI classifiers. The attacker trains $k$ MI classifiers, one for each class. The classifier for class $c$ is trained using instances in $D'$ that are of class $c$. For each such instance $(x, c)$, one can obtain $m$ instances for training the MI classifier for class $c$, one from each shadow classifier. With the $i$-th shadow classifier, one has the probability vector $F_i(x)$ as the feature, and whether $(x, c) \in D'_i$ as the label. Each MI classifier takes a probability vector as input, and produces a binary classification result. These attack classifiers are feed-forward neural networks with one fully-connected hidden layer of size 64, with ReLU activation functions. When trying to determine the membership of an instance $(x, y)$, one feeds $F(x)$ to the MI classifier for class $y$ to obtain a binary membership prediction.
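The construction of the per-class MI training sets described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: shadow classifiers are stand-in callables that return a probability vector, shadow training sets are represented as sets of indices into the adversary's dataset, and the names `build_mi_training_sets`, `adv_data`, `shadow_sets`, and `shadow_models` are hypothetical.

```python
def build_mi_training_sets(adv_data, shadow_sets, shadow_models, num_classes):
    """Return, for each class, a list of (probability_vector, is_member) pairs.

    adv_data      : list of (x, y) instances known to the adversary
    shadow_sets   : list of sets of indices into adv_data, one per shadow classifier
    shadow_models : list of callables, shadow_models[i](x) -> probability vector
    """
    per_class = [[] for _ in range(num_classes)]
    for idx, (x, y) in enumerate(adv_data):
        # Each shadow classifier contributes one training example for the
        # MI classifier of class y.
        for members, model in zip(shadow_sets, shadow_models):
            per_class[y].append((model(x), idx in members))
    return per_class


# Toy data and toy "classifiers", hypothetical and for illustration only.
adv_data = [(0.1, 0), (0.9, 1), (0.4, 0)]
shadow_sets = [{0, 1}, {2}]
shadow_models = [lambda x: [1.0 - x, x], lambda x: [0.5, 0.5]]
per_class = build_mi_training_sets(adv_data, shadow_sets, shadow_models, num_classes=2)
```

Each list in `per_class` would then be used to fit the binary MI classifier for that class.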
The Baseline (Global-Label) Attack [yeom2018privacy]. Yeom et al. [yeom2018privacy] analyzed the relationship between overfitting and membership, and proposed two attacks, one of which predicts that an instance $(x, y)$ is a member of the training set of $F$ if and only if $F$ gives the correct label on $x$. This attack can be applied in the Label-only Oracle adversary model, in which the adversary is given only the label.
The advantage of the Baseline attack can be estimated from the training and testing accuracy. Let $a_{\mathrm{train}}$ be the training accuracy, and $a_{\mathrm{test}}$ be the testing accuracy. Then $g = a_{\mathrm{train}} - a_{\mathrm{test}}$ is the generalization gap. Given a balanced evaluation set, the accuracy of the Baseline attack is the average of its accuracy on members and non-members. By definition, its accuracy on members is about $a_{\mathrm{train}}$ and its accuracy on non-members is about $1 - a_{\mathrm{test}}$. Its overall accuracy is thus $\frac{1}{2}\left(a_{\mathrm{train}} + 1 - a_{\mathrm{test}}\right) = \frac{1}{2} + \frac{g}{2}$. Its advantage is therefore $g/2$. This estimate is an approximation, but our experiments show that it is quite accurate. Since this attack is so simple and broadly applicable, we call it the Baseline attack. This attack's advantage provides a lower-bound estimate of a classifier's vulnerability to MI attacks.
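The arithmetic behind this estimate can be checked with a short sketch. The function name `baseline_attack_estimate` and the accuracy values are hypothetical and chosen only for illustration; the formula is the one derived in the paragraph above, with advantage measured as accuracy minus one half.

```python
from typing import Tuple


def baseline_attack_estimate(train_acc: float, test_acc: float) -> Tuple[float, float]:
    """Estimate the Baseline attack's accuracy and advantage on a balanced query set."""
    acc_on_members = train_acc            # members are flagged when classified correctly
    acc_on_non_members = 1.0 - test_acc   # non-members are cleared when misclassified
    accuracy = 0.5 * (acc_on_members + acc_on_non_members)
    advantage = accuracy - 0.5            # gain over random guessing, i.e. half the gap
    return accuracy, advantage


# Hypothetical accuracies, not results from the paper.
accuracy, advantage = baseline_attack_estimate(train_acc=0.99, test_acc=0.75)
```

With a gap of 0.24 in this made-up example, the estimated advantage is half of that.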
The Global-Loss Attack [yeom2018privacy]. This attack uses the probability vector $F(x)$ for an instance $(x, y)$ with true label $y$ to compute the cross-entropy loss: $\ell(x, y) = -\log F(x)_y$, where $F(x)_y$ is the probability value for the true label $y$. The attack predicts $(x, y)$ to be a member when $\ell(x, y)$ is smaller than the average loss of all training instances. We consider this attack to be in the Training-plus-Data adversary model, because the average training loss is not normally provided. The natural way to obtain it is to build one or more shadow classifiers.
The Global-Probability Attack. We note that the Global-Loss attack effectively predicts an instance to be a member if the probability for the correct label is above some threshold. It fixes the threshold based on the average loss over all training instances. This threshold may not achieve the maximum accuracy. We thus also consider using shadow classifiers and training data to compute the threshold that achieves the best accuracy. We call this the Global-Probability attack.
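The two threshold rules can be contrasted in a small sketch. It assumes the true-label probabilities of known members and non-members have already been collected from shadow classifiers; the exhaustive search over candidate thresholds is one simple way to pick the most accurate threshold and is an assumption, as are the function names and the toy numbers.

```python
import math


def global_loss_attack(p_true, avg_train_loss):
    """Predict 'member' when the cross-entropy loss is below the average training loss."""
    loss = -math.log(max(p_true, 1e-12))  # clamp to avoid log(0)
    return loss < avg_train_loss


def best_probability_threshold(member_probs, non_member_probs):
    """Search for the threshold on the true-label probability with the best accuracy."""
    candidates = sorted(set(member_probs) | set(non_member_probs))
    total = len(member_probs) + len(non_member_probs)
    best_t, best_acc = 0.0, -1.0
    for t in candidates:
        # Rule being scored: predict 'member' when the probability is >= t.
        correct = sum(p >= t for p in member_probs) + sum(p < t for p in non_member_probs)
        acc = correct / total
        if acc > best_acc:
            best_t, best_acc = t, acc
    return best_t, best_acc


# Hypothetical shadow-classifier outputs, for illustration only.
member_probs = [0.99, 0.95, 0.90, 0.60]
non_member_probs = [0.97, 0.70, 0.40, 0.30]
threshold, shadow_accuracy = best_probability_threshold(member_probs, non_member_probs)
```

Note that the Global-Loss rule is itself a probability threshold: the loss is below the average loss exactly when the true-label probability exceeds the exponential of the negated average loss.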
The Global-TopOne Attack [salem2018ml]. Instead of using the probability of the correct label, Salem et al. [salem2018ml] proposed to use the highest value in the probability vector. They proposed an interesting threshold-choosing approach that exploits oracle access to the target classifier. One randomly generates some data instances, which are non-members with high probability, and queries the target classifier with these instances. One then chooses the threshold using a top percentile among the Top 1 probabilities from the probability vectors of these instances. Experiments over a range of percentile values showed decent performance.
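The threshold-choosing step can be sketched as below. This is only an illustration under assumptions: `query_fn` stands in for oracle access to the target classifier, the percentile is taken by a simple rank rule that may differ from the original implementation, and the toy classifier and inputs are hypothetical.

```python
import math


def top_one_threshold(query_fn, random_inputs, top_percent):
    """Pick the value at the top `top_percent` percent of Top 1 probabilities.

    query_fn      : callable returning a probability vector (stand-in for the oracle)
    random_inputs : randomly generated instances, assumed to be non-members
    """
    top_ones = sorted((max(query_fn(x)) for x in random_inputs), reverse=True)
    # Rank-based percentile (an assumed convention).
    rank = max(1, math.ceil(len(top_ones) * top_percent / 100.0))
    return top_ones[rank - 1]


def top_one_attack(query_fn, x, threshold):
    """Predict 'member' when the highest probability reaches the threshold."""
    return max(query_fn(x)) >= threshold


# Toy two-class "classifier" and toy random inputs, for illustration only.
toy_classifier = lambda x: [x, 1.0 - x]
random_inputs = [0.5, 0.6, 0.7, 0.8, 0.9, 0.55, 0.65, 0.75, 0.85, 0.95]
threshold = top_one_threshold(toy_classifier, random_inputs, top_percent=20)
```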
The Global-TopThree Attack [salem2018ml]. Salem et al. [salem2018ml] proposed an attack that uses only the top three values in the probability vector for the MI attack. This is in part motivated by the Probability-Vector Oracle model, in which the adversary cannot train shadow classifiers to generate training data for MI classifiers. Salem et al. [salem2018ml] proposed a data-transferring attack, where the adversary trains one shadow classifier using a different dataset and a different classifier structure. Since the number of classes may differ between the shadow classifier and the target classifier, the adversary chooses the top 3 values in the probability vector (top 2 in the case of binary classification) as the features for the MI attack. Furthermore, only a single global MI classifier is used.
The Instance-Vector Attack [long2017towards]. Long et al. [long2017towards] proposed three MI attacks, all in the Training-plus-Data adversary model. The first attack in [long2017towards] is called the untargeted attack, which trains an MI classifier that takes a probability vector $F(x)$ and the label $y$ as the feature, and whether $(x, y)$ is used in training the classifier as the label. This is essentially equivalent to the Class-Vector attack discussed above. The Class-Vector attack trains $k$ classifiers, one for each class. Even though the untargeted attack trains just one classifier, the class label is taken as an input to the classifier, enabling the classifier to adapt based on the class label. Because it is essentially equivalent to the Class-Vector attack, we do not consider it separately in the rest of this paper.
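The remaining two attacks in [long2017towards] train instance-specific MI classifiers. That is, there is one MI classifier for each instance $(x, y)$. To enable this, one creates samples $D'_1, \dots, D'_m$ of the adversary's dataset $D'$, and trains shadow classifiers, where $F_i^{\mathrm{out}}$ is trained with $D'_i$ without $(x, y)$, and $F_i^{\mathrm{in}}$ is trained with $D'_i$ together with $(x, y)$. For each instance $(x, y)$, one has probability vectors $F_i^{\mathrm{out}}(x)$, where $1 \le i \le m$, from classifiers trained without $(x, y)$, and probability vectors $F_i^{\mathrm{in}}(x)$, from classifiers trained with $(x, y)$. The intuition is that if $(x, y)$ is used in training $F$, then the probability vector $F(x)$ should be more similar to the latter than to the former.
Long et al. [long2017towards] investigated two ways to measure this similarity, and found that the more effective approach is to use the Kullback-Leibler (KL) divergence. More specifically, the adversary calculates $\bar{p}^{\mathrm{out}}$, the average of all $F_i^{\mathrm{out}}(x)$, and $\bar{p}^{\mathrm{in}}$, the average of all $F_i^{\mathrm{in}}(x)$. That is, $\bar{p}^{\mathrm{out}}$ is the average probability vector on $x$ for classifiers trained without using $(x, y)$, and $\bar{p}^{\mathrm{in}}$ is the average probability vector for classifiers trained using $(x, y)$. To determine whether $(x, y)$ is used to train $F$, one computes the KL divergence of $\bar{p}^{\mathrm{out}}$ and of $\bar{p}^{\mathrm{in}}$ with $F(x)$. The KL divergence between two discrete probability distributions $P$ and $Q$ is defined to be $D_{\mathrm{KL}}(P \,\|\, Q) = \sum_{j} P(j) \log \frac{P(j)}{Q(j)}$. If $F(x)$ is closer in KL divergence to $\bar{p}^{\mathrm{out}}$ than to $\bar{p}^{\mathrm{in}}$, one predicts that $(x, y)$ is not used in training $F$; otherwise the adversary predicts that this instance was used in the training.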
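The KL-based decision rule can be sketched as follows. The argument order of the divergence (target vector first, average vector second) is an assumption, since the text above does not fix it; the function names and the toy probability vectors are likewise hypothetical.

```python
import math


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions given as equal-length lists."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q) if pi > 0)


def average_vector(vectors):
    """Component-wise average of a list of probability vectors."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def instance_vector_attack(target_vec, out_vecs, in_vecs):
    """Predict 'member' unless the target's vector is closer to the 'out' average."""
    avg_out = average_vector(out_vecs)  # shadows trained without the instance
    avg_in = average_vector(in_vecs)    # shadows trained with the instance
    # Assumed direction: KL(target || average).
    return kl_divergence(target_vec, avg_in) <= kl_divergence(target_vec, avg_out)


# Hypothetical shadow-classifier outputs for one instance.
out_vecs = [[0.5, 0.3, 0.2], [0.6, 0.2, 0.2]]
in_vecs = [[0.9, 0.05, 0.05], [0.8, 0.1, 0.1]]
looks_like_member = instance_vector_attack([0.88, 0.06, 0.06], out_vecs, in_vecs)
looks_like_non_member = instance_vector_attack([0.5, 0.3, 0.2], out_vecs, in_vecs)
```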
2.3 Existing Defenses
Multiple defense mechanisms have been proposed in existing literature to mitigate the threat of membership inference attack. We summarize existing defenses in the following.
$L_2$-Regularizer and modifying predictions. In [shokri2017membership], the authors presented the relationship between overfitting and membership inference attacks, showing that using an $L_2$-regularizer to reduce overfitting can help defend against membership inference attacks. In addition, the authors proposed several mitigation strategies that reduce the information given by the prediction vectors, such as providing only the top-$k$ probabilities and using a high temperature in the softmax function.
Min-Max Game. Nasr et al. [nasr2018machine] proposed a Min-Max Game style defense to train a secure target classifier. During the training of the target classifier, a defender’s attack classifier is trained simultaneously to launch the membership inference attack. The optimization objective of the target classifier is to reduce the prediction loss while minimizing the membership inference attack accuracy. This is equivalent to adding a new regularization term to the training process, which is called adversarial regularization.
Dropout. Salem et al. [salem2018ml] explored using Dropout as a defense against membership inference attacks. Dropout was first proposed in [srivastava2014dropout] and has been shown to be useful in reducing overfitting.
Model Stacking. Salem et al. [salem2018ml] also proposed to use model stacking as a defense. Model stacking is a common ensemble technique used in machine learning applications, which combines multiple weak classifiers together to make the final prediction.
Mem-Guard. Jia et al. [jia2019memguard] proposed a noise-adding defense based on adversarial attacks. Instead of directly adding regularization to the training of the target classifier, Jia et al. chose to add noise to the prediction vectors after training. One membership inference attack classifier is trained by the defender, and the defender tries to add noise to the prediction vectors to fool this attack classifier. To find the noise for each instance, the authors proposed to search for an adversarial example, based on that instance, against the defender's attack classifier. The adversarial perturbation found for each instance is the desired noise. The noisy prediction vectors, which can perfectly fool the defender's attack classifier, can fool the attacker's attack classifier with high probability if the two attack classifiers are similar.
Differential Privacy. Differential privacy [dwork2008differential, dwork2006calibrating, Dwo06] is a widely used privacy-preserving technique. Many defense techniques based on differential privacy add noise to the training process of a classifier. One example is DP-SGD, proposed in [abadi2016deep], in which noise is added to the gradient to ensure data privacy. Training with differential privacy provides a theoretical guarantee against any MI attack, at the cost of a large drop in accuracy.
Limitations. Some of the proposed defenses (such as dropout and $L_2$ regularization) implicitly reduce the generalization gap as a side effect. However, since reducing the gap was not the goal, the gap typically remains significant. Other defenses withhold information needed for some MI attacks; they do not affect the Baseline attack, which requires only the predicted label on an instance. Some recent defenses (e.g., Min-Max Game and Mem-Guard) target specific MI attacks, and try to render these attacks ineffective. However, these defenses do not affect the generalization gap, and hence do not reduce the effectiveness of the Baseline attack.
3 New Attack and Defense
In this section we propose our MI attack and introduce our MI defense.
3.1 Proposed MI Attack
The Instance-Probability Attack. We introduce a new attack that uses $F(x)_y$, the probability value for the true label $y$. The difference between the Global-Probability attack and this attack is that the former trains a single classifier that takes $F(x)_y$ and determines whether $(x, y)$ is a member, whereas this attack trains a different classifier for each instance $(x, y)$. This attack uses the Training-plus-Data adversary model, and requires training multiple shadow classifiers so that for each instance $(x, y)$, there exist a number of shadow classifiers trained with $(x, y)$, and a number of shadow classifiers trained without $(x, y)$. We use $F(x)_y$ as the feature for determining the membership of $(x, y)$. The MI classifier essentially finds a threshold based on the training data generated from the shadow classifiers.
The goal of introducing this attack is to explore what would be the most effective MI attack. Through experiments, we have found that essentially all existing attacks rely on the predicted probability of the correct label. Our proposed Instance-Probability Attack attempts to fully utilize this piece of information, by finding an instance-specific threshold using shadow classifiers.
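A per-instance threshold of this kind can be sketched as below. The text above does not specify how the threshold is fitted, so the midpoint rule used here is an assumption made only for illustration, as are the function names and the toy probabilities.

```python
def instance_threshold(in_probs, out_probs):
    """Midpoint between the mean true-label probabilities of the two shadow groups.

    in_probs  : true-label probabilities from shadow classifiers trained with the instance
    out_probs : true-label probabilities from shadow classifiers trained without it
    The midpoint rule is an assumed, simple way to fit a one-dimensional threshold.
    """
    mean_in = sum(in_probs) / len(in_probs)
    mean_out = sum(out_probs) / len(out_probs)
    return 0.5 * (mean_in + mean_out)


def instance_probability_attack(target_prob, in_probs, out_probs):
    """Predict 'member' when the target's true-label probability exceeds the threshold."""
    return target_prob >= instance_threshold(in_probs, out_probs)


# Hypothetical shadow-classifier outputs for a single instance.
in_probs = [0.9, 0.8, 1.0]
out_probs = [0.3, 0.5, 0.4]
threshold = instance_threshold(in_probs, out_probs)
```

Because the threshold is fitted separately for every instance, an instance that is easy to classify even when held out ends up with a higher threshold than a hard one.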
3.2 Proposed MI Defense
As we mentioned in the introduction, the generalization gap determines both a theoretical lower bound and an empirical upper bound for a classifier’s vulnerability to MI attacks. We thus propose to focus on reducing this gap to defend against MI attacks.
3.2.1 MMD-based Regularization Loss of Empirical Training and Validation Distributions
We propose to intentionally reduce training accuracy to match testing accuracy. To achieve this goal, we add to the training loss function a regularizing term that is the difference between the output distribution of the training set and that of a validation set.
We need a quantitative and differentiable metric to measure the difference, and choose to use Maximum Mean Discrepancy (MMD) [fortet1953convergence, gretton2012kernel]. MMD is used to construct statistical tests to determine if two samples are drawn from different distributions, based on a Reproducing Kernel Hilbert Space (RKHS) [borgwardt2006integrating]. Let $X = \{x_1, \dots, x_n\}$ and $Y = \{y_1, \dots, y_m\}$ be the random variable sets drawn from distributions $P$ and $Q$. The empirical estimation of the distance between $P$ and $Q$, as defined by MMD, is:
$$\mathrm{Distance}(X, Y) = \left\| \frac{1}{n} \sum_{i=1}^{n} \phi(x_i) - \frac{1}{m} \sum_{j=1}^{m} \phi(y_j) \right\|_{\mathcal{H}} \qquad (2)$$
where $\mathcal{H}$ is a universal RKHS, $\phi$ maps each softmax output into $\mathcal{H}$, and $x_i$ ($y_j$) is the softmax output of the $i$-th training ($j$-th validation) instance. To speed up training, rather than using a minimax objective to learn $\phi$ in Equation (2), as is common when MMD is used in modern transfer-learning applications [long2017deep], we use a more traditional approach and let $\phi$ be the feature map of the Gaussian kernel [gretton2012kernel]. The sums $\frac{1}{n} \sum_{i} \phi(x_i)$ and $\frac{1}{m} \sum_{j} \phi(y_j)$ are empirical estimates of the mean embeddings of the training and validation softmax distributions [gretton2012kernel] and, thus, Equation (2) is a valid two-sample test. We leave the task of performing a stronger two-sample test with a minimax-optimized MMD objective as future work. Note that the regularizer Distance in Equation (2) is invariant to permutations of the ordering of the training ($x_i$) and validation ($y_j$) softmax outputs of the data instances; hence, Distance is a permutation-invariant function (a set function).
In order to calculate the MMD regularization loss, we need one mini-batch of training instances and one mini-batch of validation instances. Since MMD is differentiable with respect to the neural network parameters (e.g., those of a CNN), we obtain two sets of gradients: the first set is calculated based on training instances; the second set is based on validation instances. To update our classifier, we use only the first set of gradients, since the second set would optimize over the validation instances, which is undesirable as it could overfit them and make their empirical distribution significantly different from that of the (future) test data.
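For intuition, the squared MMD between two mini-batches of softmax outputs can be computed with the kernel trick, as in the sketch below. It uses the standard biased estimator with a single Gaussian kernel in plain Python; the bandwidth value, the function names, and the toy batches are assumptions, and the gradient handling described above (discarding gradients through the validation batch) is not shown because this sketch is not tied to an autodiff framework.

```python
import math


def gaussian_kernel(a, b, bandwidth=1.0):
    """Gaussian (RBF) kernel between two vectors; the bandwidth is an assumed value."""
    sq_dist = sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return math.exp(-sq_dist / (2.0 * bandwidth ** 2))


def mmd_squared(train_out, val_out, bandwidth=1.0):
    """Biased estimate of the squared MMD between two sets of softmax outputs."""
    def mean_kernel(us, vs):
        total = sum(gaussian_kernel(u, v, bandwidth) for u in us for v in vs)
        return total / (len(us) * len(vs))

    # Squared RKHS distance between the two empirical mean embeddings.
    return (mean_kernel(train_out, train_out)
            + mean_kernel(val_out, val_out)
            - 2.0 * mean_kernel(train_out, val_out))


# Hypothetical softmax outputs: confident on training data, less so on validation data.
train_batch = [[0.98, 0.01, 0.01], [0.97, 0.02, 0.01]]
val_batch = [[0.60, 0.30, 0.10], [0.50, 0.25, 0.25]]
penalty = mmd_squared(train_batch, val_batch)
```

Reordering either batch leaves the value unchanged, which is the permutation invariance noted above.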
3.2.2 Mix-up Training Augmentation
We combine MMD with mix-up training, first introduced by Zhang et al. [zhang2018mixup]. This training strategy uses linear interpolation of two different training instances to generate a mixed instance, and trains the classifier with the mixed instance. The generation of mixed instances can be described as follows:
$$\tilde{x} = \lambda x_i + (1 - \lambda) x_j, \qquad \tilde{y} = \lambda y_i + (1 - \lambda) y_j$$
Here $x_i$ and $x_j$ are instance feature vectors randomly drawn from the training set; $y_i$ and $y_j$ are the one-hot label encodings corresponding to $x_i$ and $x_j$; and $\lambda \in [0, 1]$ is the interpolation weight. The mixed instance $(\tilde{x}, \tilde{y})$ is used in training. Zhang et al. [zhang2018mixup] show that mix-up training can improve generalization, resulting in higher accuracy on CIFAR-10 and CIFAR-100. This, in turn, reduces the generalization gap. Also, intuitively, since only the mixed instances are used in training, the classifier is not directly trained on the original training instances, and should not remember them as well.
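The interpolation step can be sketched in a few lines. Drawing the interpolation weight from a Beta distribution follows Zhang et al.; the value of the Beta parameter, the function name `mixup`, and the toy vectors are assumptions for illustration.

```python
import random


def mixup(x_i, y_i, x_j, y_j, lam):
    """Linearly interpolate two instances and their one-hot labels with weight lam."""
    x_mix = [lam * a + (1.0 - lam) * b for a, b in zip(x_i, x_j)]
    y_mix = [lam * a + (1.0 - lam) * b for a, b in zip(y_i, y_j)]
    return x_mix, y_mix


# Interpolation weight drawn from Beta(alpha, alpha); alpha = 1.0 is an assumed value.
rng = random.Random(0)
lam = rng.betavariate(1.0, 1.0)

# Toy two-dimensional features with one-hot labels, and a fixed weight for readability.
x_mix, y_mix = mixup([1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0], lam=0.25)
```

The mixed label remains a valid probability vector, so the usual cross-entropy loss can be applied to it unchanged.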
4 Evaluation of MI Attacks and Defenses
We first describe the detailed experimental settings in Sections 4.1 and 4.2. We summarize the evaluation results of the MI attacks in Section 4.3. These results motivate our proposed approaches for defending against the MI attacks, which are presented in Section 4.4.
4.1 Experimental Setup
|Largest attack advantage||0.173||0.362||0.191||0.145||0.020|
|Baseline attack advantage||0.122||0.317||0.143||0.101||0.010|
|Class-Vector attack advantage||0.162||0.317||0.161||0.106||0.013|
|Global-Loss attack advantage||0.173||0.359||0.191||0.143||0.013|
|Global-Probability attack advantage||0.160||0.351||0.177||0.145||0.013|
|Global-TopOne attack advantage||0.127||0.269||0.067||0.116||0.011|
|Global-TopThree attack advantage||0.152||0.288||0.121||0.125||0.012|
|Instance-Vector attack advantage||0.122||0.222||0.177||0.133||0.020|
|Instance-Probability attack advantage||0.173||0.362||0.167||0.088||0.017|
Datasets. We use the following datasets, which are used in existing work on MI attacks.
CIFAR-10 and CIFAR-100. These are benchmark datasets for evaluating image classification algorithms and systems. Each contains 60,000 color images of size 32 × 32, divided into 50,000 for training and 10,000 for testing. In CIFAR-100, the images are divided into 100 classes, with 600 images for each class. In CIFAR-10, the images are divided into 10 classes, with 6,000 images for each class. These two datasets are widely used to evaluate membership inference attacks [shokri2017membership, salem2018ml, nasr2018machine, nasr2018comprehensive, yeom2018privacy].
PURCHASE-100. This dataset is based on the "Acquire Valued Shoppers" challenge from Kaggle. It includes shopping records for several thousand individuals. We obtained the processed and simplified version of this dataset from the authors of [shokri2017membership]. Each data instance has 600 binary features. The dataset is clustered into 100 classes, and the task is to predict the class for each customer. It contains 197,324 data instances. This dataset is also widely used to evaluate membership inference attacks [shokri2017membership, salem2018ml, nasr2018machine, nasr2018comprehensive, jia2019memguard].
TEXAS-100. This dataset includes hospital discharge data. The records in the dataset contain information about inpatient stays in several health care facilities published by the Texas Department of State Health Services. Data records have features about the external causes of injury, the diagnosis, the procedures the patient underwent, and generic information. We obtained a processed version of the dataset from the authors of [shokri2017membership]. This dataset contains 67,330 records and 6,170 binary features which represent the 100 most frequent medical procedures. The records are clustered into 100 classes, each representing a different type of patient. This dataset is used to evaluate membership inference attack in [shokri2017membership, nasr2018machine, nasr2018comprehensive, jia2019memguard].
MNIST. This is a dataset of 70,000 handwritten digits. The size of each image is 32 × 32. The images are cropped to 28 × 28 so that the digits are located at the center of the image. There are 10 classes of different digits in this dataset. There are 60,000 training images and 10,000 testing images. This dataset is used to evaluate membership inference attacks in [shokri2017membership, salem2018ml].
We choose the state-of-the-art NN architecture for these datasets, and use PyTorch [paszke2017automatic], a widely used deep learning framework in academia, to implement these neural networks.
For CIFAR-10 and CIFAR-100, we use AlexNet [krizhevsky2012imagenet], VGG [Simonyan15], 3 ResNet [He_2016_CVPR] architectures, and 2 DenseNet [Huang_2017_CVPR] architectures. For PURCHASE-100 and TEXAS-100, we follow the NN architecture described in [nasr2018comprehensive]. For MNIST dataset, we follow the NN architecture described in [shokri2017membership].
Following [shokri2017membership, nasr2018machine], target classifiers on the CIFAR-10, CIFAR-100, TEXAS-100, and MNIST datasets are trained with instances, and target classifiers for the PURCHASE-100 dataset are trained with instances. (The PURCHASE-100 dataset contains about three times as much data as the other datasets.) We also study the effect of varying the number of training instances and the number of epochs in Section 4.3.
The training recipes for all datasets and classifiers are summarized in Table 6 in the appendix. For widely used benchmark datasets, namely CIFAR-10, CIFAR-100, and MNIST, the training and testing accuracy numbers in Table 6 are for the situation of using the whole training set for training. As a result, these testing accuracy numbers in Table 6 are generally higher than those in membership inference experiments, where classifiers are trained using only a subset of the instances. These numbers are similar to the ones reported elsewhere, demonstrating that the training was done correctly. The PURCHASE-100 and TEXAS-100 datasets are not widely used benchmark datasets, and we report numbers when using and training instances, respectively.
In the implementation of our MMD loss, we choose to reduce the difference between the probability vector distributions of members and non-members for each class. That is, a batch of training samples and a batch of validation samples in the same class are used together to compute the MMD score and only the gradients based on the batch of training samples are used.
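The per-class computation described above can be illustrated with a minimal sketch. It assumes a linear kernel, under which the (biased) MMD estimate reduces to the squared distance between the mean probability vectors of the two batches; the kernel actually used is not stated in this section, and the function names are hypothetical. The sketch only computes the score. In an autograd framework, the validation outputs would additionally be treated as constants so that only the training batch contributes gradients.

```python
from collections import defaultdict


def mean_vector(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def linear_mmd(train_probs, val_probs):
    """Squared distance between the mean probability vectors of two batches.

    This is the biased MMD estimate under a linear kernel; the kernel
    choice is an assumption made for this sketch.
    """
    mu_train = mean_vector(train_probs)
    mu_val = mean_vector(val_probs)
    return sum((a - b) ** 2 for a, b in zip(mu_train, mu_val))


def per_class_mmd(train_probs, train_labels, val_probs, val_labels):
    """Return {class: MMD score} for every class present in both batches."""
    train_by_class = defaultdict(list)
    val_by_class = defaultdict(list)
    for probs, label in zip(train_probs, train_labels):
        train_by_class[label].append(probs)
    for probs, label in zip(val_probs, val_labels):
        val_by_class[label].append(probs)
    return {
        c: linear_mmd(train_by_class[c], val_by_class[c])
        for c in train_by_class
        if c in val_by_class
    }
```

Classes that appear in only one of the two batches are skipped, since no member versus non-member comparison is possible for them within that batch.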
For different classifiers, the weight parameter for the MMD loss is different. The details can be found in Table 6. We choose the weight so that the generalization gap can be reduced mostly while the testing accuracy drop is less than , compared to the classifiers trained without defense. In practice, the weight parameter, which implies the privacy-accuracy tradeoff, can be chosen based on the users’ need. We adopt the implementation from [transferlearning.xyz].
For the PURCHASE-100 and TEXAS-100 datasets, we observed that using mix-up training leads to training failure, perhaps because the data features are binary. We thus do not evaluate the defenses that involve mix-up on them; the only applicable defense is the MMD loss defense.
4.2 MI Attack Workflow (via shadow classifiers)
Several MI attacks need to utilize “shadow classifiers” to train the attack classifiers. We train 50 shadow classifiers as our default choice, and we follow the workflow proposed in [shokri2017membership], with minor differences in how the training sets for shadow classifiers and target classifiers are generated, in order to achieve better efficiency for instance-level attacks (the Instance-Vector attack and the Instance-Probability attack).
Training set generation. We randomly divide into three disjoint parts: , and . is the evaluation set and has 5000 instances. MI attacks are evaluated on accuracy of determining membership of instances in . The target classifier is trained with half of the instances in and the rest sampled from . Each shadow classifier is trained with a randomly sampled half of the instances in and the rest sampled from . This way, each shadow classifier is trained with half of the evaluation set, like the target classifier. At the same time, at least of the instances used for training the target classifier are never used in training any shadow classifier.
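The split described above can be sketched as follows. The names `eval_set`, `target_pool` and `shadow_pool` are hypothetical stand-ins for the three disjoint parts, and splitting the non-evaluation data evenly between the two pools is an assumption of this sketch rather than something stated in the text.

```python
import random


def make_training_sets(all_ids, eval_size, train_size, num_shadow, seed=0):
    """Build the evaluation set and the target/shadow training sets.

    Each training set contains a random half of the evaluation set plus
    instances drawn from its own pool, so the non-evaluation part of the
    target training set is never seen by any shadow classifier.
    """
    rng = random.Random(seed)
    ids = list(all_ids)
    rng.shuffle(ids)

    eval_set = ids[:eval_size]
    rest = ids[eval_size:]
    middle = len(rest) // 2  # assumption: even split of the remaining data
    target_pool, shadow_pool = rest[:middle], rest[middle:]

    half = eval_size // 2

    def draw(pool):
        members = rng.sample(eval_set, half)
        others = rng.sample(pool, train_size - half)
        return members + others

    target_train = draw(target_pool)
    shadow_trains = [draw(shadow_pool) for _ in range(num_shadow)]
    return eval_set, target_train, shadow_trains
```

Because every shadow classifier is trained on a different random half of the evaluation set, each evaluation instance is a member for some shadow classifiers and a non-member for others, which is what instance-level attacks need.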
4.3 Evaluation of Attacks
We use attack advantage as the metric. An attack’s accuracy is measured on an evaluation set in which one half consists of instances used to train the classifier and the other half consists of instances drawn from the same distribution as the training instances but not used in training. Precision and recall are similar to accuracy in our experiments because the evaluation set is well balanced. Flipping an unbiased coin yields an accuracy of 0.5. The advantage of the attack is defined as its accuracy minus 0.5.
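As a small illustration of the metric, the sketch below computes the advantage from a list of membership guesses on a balanced evaluation set. It assumes that advantage is accuracy minus the coin-flip accuracy of 0.5, which is consistent with the baseline attack's advantage being half of the generalization gap; the function name is hypothetical.

```python
def attack_advantage(predicted_member, is_member):
    """Advantage of an attack: accuracy on the evaluation set minus 0.5.

    predicted_member: list of bools, the attack's guess for each instance.
    is_member: list of bools, the ground-truth membership of each instance.
    The evaluation set is assumed to be balanced (half members).
    """
    if len(predicted_member) != len(is_member) or not is_member:
        raise ValueError("inputs must be non-empty and of equal length")
    correct = sum(p == m for p, m in zip(predicted_member, is_member))
    return correct / len(is_member) - 0.5
```

An attack that guesses every instance correctly has advantage 0.5, and one that is no better than a coin flip has advantage 0.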
Figure 2 shows the attack advantage of each attack on different classifier and dataset combinations. Table 2 provides an average summary for all the attacks, including average training and testing accuracy. From both Figure 2 and Table 2, we can see that the largest attack advantage is between and .
From Table 2, we can see that on MNIST the generalization gap is close to 0, and so is the largest attack advantage among all attacks. For this reason, we do not include the MNIST dataset in Figure 2 or in the following discussion of different attacks.
For discussions below, we call readers’ attention to Table 2, especially the three rows for generalization gap, largest attack advantage, and baseline attack advantage.
The Baseline attack. The empirically observed baseline attack advantage is very close to the theoretical prediction of half of the generalization gap. We also observe that, comparing with other attacks, the baseline attack is pretty strong, even though it is trivial to implement and requires minimal access to the target classifier. For example, on CIFAR-100 and TEXAS-100, the two datasets that are more vulnerable to MI attacks, while the baseline attack has an advantage of , the best attack achieves or less.
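The relation between the baseline attack and the generalization gap can be checked on toy data. The sketch below implements the rule "member if and only if the classifier predicts the correct label" and measures advantage as accuracy minus 0.5 on a balanced set; the function names and the toy labels are illustrative only.

```python
def baseline_attack(predicted_labels, true_labels):
    """Guess 'member' (True) exactly when the classifier's label is correct."""
    return [p == t for p, t in zip(predicted_labels, true_labels)]


def baseline_advantage(member_preds, member_labels,
                       nonmember_preds, nonmember_labels):
    """Advantage of the baseline attack on a balanced evaluation set."""
    guesses_on_members = baseline_attack(member_preds, member_labels)
    guesses_on_nonmembers = baseline_attack(nonmember_preds, nonmember_labels)
    # A guess is right if it says "member" for a member
    # or "non-member" for a non-member.
    correct = sum(guesses_on_members) + sum(not g for g in guesses_on_nonmembers)
    total = len(guesses_on_members) + len(guesses_on_nonmembers)
    return correct / total - 0.5


# Toy example: training accuracy 1.0, testing accuracy 0.6, so the gap is 0.4.
labels = [0, 1, 2, 3, 4]
advantage = baseline_advantage(labels, labels, [0, 1, 2, 0, 0], labels)
```

In this toy case the attack is right on all five members and on the two misclassified non-members, so its accuracy is 0.7 and its advantage is about 0.2, half of the 0.4 gap.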
The Instance-Probability attack. Our proposed Instance-Probability attack outperforms the other attacks in most settings, even though the differences between this attack and some other attacks (such as Class-Vector and Global-Probability) are usually small. Jumping slightly ahead, we note that these differences widen in Section 4.4, when we add defenses. A notable exception is that on the PURCHASE-100 dataset, the Instance-Probability attack under-performs the Global-Probability attack by . Recall that both attacks use the predicted probability of the correct label to determine membership. This result means that choosing a single threshold for all instances performs better here than choosing a threshold for each instance; the reason is that the number of shadow classifiers is relatively small, so that choosing a threshold for each instance overfits.
The Class-Vector and Global-Probability Attacks. The Class-Vector attack and the Global-Probability attack generally perform similarly to the best attack, and sometimes are the best attack.
The Global-Loss Attack. This attack performs similarly to the Global-Probability attack most of the time. They both choose a threshold on the predicted probability of the correct label. The Global-Loss uses the average training loss to set the threshold.
The Global-TopOne Attack. This attack uses the highest value in the probability vector to determine membership. Compared with the Global-Probability attack, which uses the probability for the correct label, this attack performs strictly worse. We observe that the differences in advantage between these two attacks are highly correlated with training accuracy. The following table shows this relationship; the difference row is the difference between the advantage of the Global-Probability attack and that of the Global-TopOne attack.
This is because, when training accuracy is low, the top-one probability for many training instances is not that of the correct label.
The Global-TopThree Attack. This attack performs similarly to the Global-TopOne attack across all datasets. In this attack, the three highest probability values are used as the feature. Since the highest value in the probability vector is used in the Global-TopOne attack as well as this attack, the remaining two probability values do not seem to be effective in improving the attack.
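The features used by these global attacks can be written down directly. The sketch below, with hypothetical function names, extracts the three features from a probability vector and shows a case where the top-one value is not the probability of the correct label.

```python
def correct_label_probability(probs, label):
    """Feature used by the Global-Probability attack."""
    return probs[label]


def top_one(probs):
    """Feature used by the Global-TopOne attack: the highest probability."""
    return max(probs)


def top_three(probs):
    """Feature used by the Global-TopThree attack: the three highest
    probabilities, in descending order."""
    return sorted(probs, reverse=True)[:3]


# A misclassified instance: the true label is class 2, but class 1 wins.
probs = [0.1, 0.6, 0.3]
features = (correct_label_probability(probs, 2), top_one(probs), top_three(probs))
```

When the classifier is correct, the first two features coincide; they differ only on misclassified instances, which are more common among training instances when training accuracy is low.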
The Instance-Vector Attack. This attack performs only slightly better than the baseline attack on CIFAR-10 dataset, TEXAS-100 dataset and PURCHASE-100 dataset. On CIFAR-100 dataset, the Instance-Vector attack performs even worse than our baseline attack. This suggests that using KL-divergence is not the most effective way to exploit the information.
Varying size of training set. Intuitively, a classifier’s vulnerability to MI attacks is related to the size of the training set. Here we explore the effect of the training set size by using three different sizes: on CIFAR-10 and CIFAR-100 datasets, with ResNet-20. The two plots in the first row of Figure 4 show the results, with one set of attack advantage bars for each of the three training set sizes. We see that the attack advantage values drop as the size of the training set increases. However, the drop is relatively small. When the size of the training set is increased from 10000 to 30000, the maximum attack advantage drops from to and to on CIFAR-10 and CIFAR-100 dataset, respectively.
The most important observation from the evaluation for all different dataset, network and training set size combinations is that the relationship always holds.
Table 3: Attack advantage on CIFAR-10, averaged across classifiers, under different levels of defense.

| Metric | No Defense | Mix-up Alone (ablation) | MMD Alone (ablation) | MMD+Mix-up |
|---|---|---|---|---|
| Largest attack advantage | 0.173 | 0.126 | 0.124 | 0.093 |
| Baseline attack advantage | 0.122 | 0.081 | 0.109 | 0.066 |
| Class-Vector attack advantage | 0.162 | 0.098 | 0.122 | 0.062 |
| Global-Loss attack advantage | 0.173 | 0.077 | 0.124 | 0.063 |
| Global-Prob. attack advantage | 0.160 | 0.102 | 0.121 | 0.068 |
| Global-TopOne attack advantage | 0.127 | 0.083 | 0.081 | 0.049 |
| Global-TopThree attack advantage | 0.152 | 0.083 | 0.088 | 0.050 |
| Instance-Vector attack advantage | 0.122 | 0.126 | 0.099 | 0.092 |
| Instance-Prob. attack advantage | 0.173 | 0.126 | 0.123 | 0.093 |

Table 4: Attack advantage on CIFAR-100, averaged across classifiers, under different levels of defense.

| Metric | No Defense | Mix-up Alone (ablation) | MMD Alone (ablation) | MMD+Mix-up |
|---|---|---|---|---|
| Largest attack advantage | 0.362 | 0.332 | 0.240 | 0.168 |
| Baseline attack advantage | 0.317 | 0.245 | 0.211 | 0.140 |
| Class-Vector attack advantage | 0.317 | 0.229 | 0.139 | 0.088 |
| Global-Loss attack advantage | 0.359 | 0.217 | 0.189 | 0.097 |
| Global-Prob. attack advantage | 0.351 | 0.276 | 0.227 | 0.147 |
| Global-TopOne attack advantage | 0.269 | 0.241 | 0.085 | 0.083 |
| Global-TopThree attack advantage | 0.288 | 0.241 | 0.130 | 0.105 |
| Instance-Vector attack advantage | 0.222 | 0.300 | 0.059 | 0.133 |
| Instance-Prob. attack advantage | 0.362 | 0.332 | 0.240 | 0.168 |
4.4 Evaluation of Defenses
Tables 3 and 4 give the numbers averaged across the classifiers with different levels of defense. Figure 3 shows the effect of different levels of defense on different classifier and dataset combinations. Notice that in Figure 3, only the generalization gap, the baseline attack advantage and the largest attack advantage are reported for each dataset, network and defense level combination.
One main observation is that using MMD loss or mix-up alone each offers some reduction in the generalization gap, and combining mix-up with MMD loss offers a better defense. However, the generalization gap is not fully closed by the defenses. With both defenses, the largest attack advantage on CIFAR-10 decreases from 0.173 to 0.093, and that on CIFAR-100 decreases from 0.362 to 0.168.
Another observation is that with both MMD loss and mix-up, most attacks in the literature become much less effective and perform similarly to the baseline.
Our proposed Instance-Probability attack performs the best. Interestingly, the Instance-Vector attack, which performs significantly worse than other attacks on CIFAR-10 and CIFAR-100 when no defense is used, performs close to the Instance-Probability attack when both defenses are deployed. It appears that when evaluating defense effectiveness, one needs to consider instance-level attacks. If we were to consider only the Class-Vector attack, we would have concluded that the proposed defenses are highly effective, since its advantage is greatly reduced.
Perhaps the most important finding from these experimental results is that the relationship remains valid in all experiments with different kinds of defenses.
Regarding the PURCHASE-100 and TEXAS-100 datasets, the only applicable defense is the MMD-loss defense. With the MMD-loss defense deployed, the largest attack advantage drops from 0.140 to 0.089 and from 0.178 to 0.146, respectively.
Varying training set size. The two plots in the second row of Figure 4 show the effect of the number of training instances under the defenses. We explored three different numbers of training instances: ,, for ResNet-20 on the CIFAR-10 and CIFAR-100 datasets. We can see that the attack advantage drops gradually when more training instances are used; however, the reduction is smaller than that obtained by deploying the defenses. Note that adding more training instances can be combined with deploying defenses.
Table 5: Attack advantage under no defense, Mixup+MMD, and Mem-Guard, averaged over AlexNet, VGG-16 and ResNet-20.

| Defense level | CIFAR-10: No Def. | CIFAR-10: Mixup+MMD | CIFAR-10: Mem-Guard | CIFAR-100: No Def. | CIFAR-100: Mixup+MMD | CIFAR-100: Mem-Guard |
|---|---|---|---|---|---|---|
| Largest attack advantage | 0.166 | 0.067 | 0.113 | 0.356 | 0.166 | 0.324 |
| Baseline attack advantage | 0.116 | 0.067 | 0.112 | 0.333 | 0.166 | 0.324 |
| Global-Probability attack advantage | 0.156 | 0.067 | 0.112 | 0.356 | 0.166 | 0.320 |
| Global-Loss attack advantage | 0.166 | 0.056 | 0.113 | 0.356 | 0.155 | 0.319 |
| Global-TopOne attack advantage | 0.120 | 0.049 | 0.028 | 0.249 | 0.103 | 0.093 |
| Global-TopThree attack advantage | 0.140 | 0.052 | 0.027 | 0.273 | 0.104 | 0.063 |
| Class-Vector attack advantage | 0.137 | 0.054 | 0.113 | 0.320 | 0.115 | 0.316 |
4.5 Comparison against Existing Defenses
In this section, we compare our defense with existing defenses.
In modern deep neural network training, the L2 regularizer and dropout are widely used. In our experiments, we already use these two techniques during training, so we do not compare our defense with them. Model stacking, according to [jia2019memguard], incurs high label loss, which implies a large drop in testing accuracy. In this paper, we focus on defenses that are able to maintain the testing accuracy while defending against membership inference attacks.
According to [jia2019memguard], the Mem-Guard defense can outperform all existing defenses, including the MIN-MAX Game defense [nasr2018machine]; we therefore focus our comparison on Mem-Guard. We adopt the code from [jia2019memguard] and use the default parameters. In our experiments, we test the Baseline, Global-Probability, Global-Loss, Global-TopOne, Global-TopThree and Class-Vector attacks against Mem-Guard. We exclude the two instance-level attacks here because training a model with Mem-Guard is slow, and we would need to train 50 shadow classifiers to learn instance-specific information. Even for a small CNN like AlexNet, it takes 14 hours to train one classifier on CIFAR-10 with Mem-Guard. For comparison, it takes 20 minutes to train one AlexNet classifier with our MMD+Mix-up defense.
For the same reason, when comparing with Mem-Guard, we use only three of the seven NN architectures used in earlier evaluations. More specifically, we use AlexNet, VGG-16 and ResNet-20 on CIFAR-10 and CIFAR-100 dataset. We chose these three because they are small, and faster to train with Mem-Guard.
The results (averaged over AlexNet, VGG-16 and ResNet-20) are shown in Table 5. We see that the Mem-Guard defense can fully defend against the Global-TopOne and Global-TopThree attacks. However, for the Global-Probability and Class-Vector attacks, the Mem-Guard defense can only reduce the attack advantages to that of the baseline attack. Mem-Guard has no impact on the baseline attack, since it does not reduce the generalization gap. Our defense, which focuses on reducing the generalization gap, provides better protection: the largest attack advantage under our defense is lower than that under the Mem-Guard defense by a noticeable margin (0.046 for CIFAR-10 and 0.158 for CIFAR-100).
Differential privacy offers a principled defense against MI attacks, in the sense that the parameter determines an upperbound on the advantage of any possible MI attacks. However, current techniques for satisfying DP do not yet offer practically effective defense against MI attacks. Existing techniques incur heavy cost in terms of testing accuracy. For example, as reported in [abadi2016deep], when the privacy budget is , the test accuracy (trained using all data) is on CIFAR-10 dataset. The test accuracy without differential privacy guarantee is using an architecture similar to [abadi2016deep], and over 95% using DenseNet. From the perspective of privacy, one should note that ensures a theoretical upperbound of on MI advantage, and yields a theoretical upperbound of . In contrast, while our defenses are deployed, the largest attack advantage is 0.093 and 0.168 for CIFAR-10 dataset and CIFAR-100 dataset, respectively.
5 Related Work
Membership inference from summary statistics was studied in the context of Genome-Wide Association Studies (GWAS). Homer et al. [homer2008resolving] proposed attacks that could tell with high confidence whether an individual participated in a GWAS study, assuming that the individual’s DNA is known. The attack works even if the group includes hundreds of individuals. Because of the privacy concerns raised by such attacks, a number of institutions, including the US National Institutes of Health (NIH) and the Wellcome Trust in London, decided to restrict access to data from GWAS. Wang et al. [WLW+09] improved these attack techniques.
Li et al. [LQS+13] introduced the membership privacy framework, which prevents an adversary from significantly improving its confidence that an entity is in the dataset or not, and showed that ε-differential privacy is equivalent to membership privacy where the adversary’s prior belief is such that the probability of any instance’s membership is independent from that of any other instance. This framework was used in Backes et al. [backes2016membership] to address privacy concerns with genomics data.
Shokri et al. [shokri2017membership] presented the first study of MI attack on classification models. The Class-Vector attack evaluated in this paper is from [shokri2017membership]. Shokri et al. [shokri2017membership] observed that overfitting is highly related to MI attacks, and regularization techniques such as dropout and L2 regularization may be applied to mitigate MI attacks. It is shown in Shokri et al. [shokri2017membership] that strong L2 regularization leads to rapid drop of test accuracy. The classifier models considered in this paper are already trained using dropout.
Yeom et al. [yeom2018privacy] quantitatively analyzed the relationship between the MI attacks and the loss over both training set and testing set. The Global-Loss attack and the baseline attack are from this paper. Salem et al. [salem2018ml] studied effect of MI attacks using less and less information. They reported that only one shadow model would be sufficient to launch the attack proposed in Shokri et al. [shokri2017membership]. The Global-TopOne and Global-TopThree attacks are from this paper. These attacks are evaluated in our paper. See Section 2 for details on these attacks. For defense, Salem et al.[salem2018ml] proposed using Dropout and model stacking as defense against MI attacks. Since model stacking also reduces the generalization gap, it can also mitigate the threat of MI attacks.
Long et al.[long2017towards] proposed the Instance-Vector attack. Long et al. [long2018understanding] first assess which instances are more vulnerable to MI attacks, and then carry out attacks against those vulnerable ones. Similar attacks can be carried out by analyzing the distribution of an instance’s predicted probability in shadow models, and are interesting future work.
Nasr et al. [nasr2018machine] proposed a privacy MIN-MAX game to help defend against MI attacks. The main idea comes from GAN [goodfellow2014generative]. By replacing the discriminator with an attacker, benign users can train the target model and the attack model simultaneously. The expected result is a target model that can resist the attack used in the training. From the numbers reported in [nasr2018machine], one can see that while models trained in this way can resist the MI attacks in Nasr et al. [nasr2018machine], the generalization gap is such that the baseline attack would have a higher advantage. Nasr et al. [nasr2018comprehensive] introduced white-box MI attacks that leverage gradient information. Their experiments showed that attacks exploiting the gradient corresponding to the last layer can achieve a higher advantage than the generalization gap. Investigating white-box attacks is interesting future work.
studied MI attacks on traditional machine learning algorithms such as logistic regression and K-nearest neighbors. Their paper showed that logistic regression, K-nearest neighbors, decision tree and Naive Bayes can be easily attacked by the membership attack. Carlini et al.[carlini2018text] investigated membership inference attacks on text data and showed that the deep language models can unintentionally memorize training data, which is vulnerable to membership inference attack. Hayes et al. [hayes2019logan] showed that GAN is also vulnerable to membership attack. They utilized the output of the discriminator of the GAN to determine the membership. We point out that the baseline attack and attacks based on the predicted probability of the correct label can be carried out in these settings as well.
Jia et al. [jia2019memguard] applied adversarial example generation to defending against membership inference attacks. In their defense, called “Mem-Guard”, the defender trains its own membership inference attack model and generates, for each instance, an adversarial perturbation that fools this model. The perturbation is then added as “noise” to the prediction vector of each instance, with the aim of also fooling the attacker’s attack model. We compared our defense with the Mem-Guard defense in Section 4.5.
In this work, we have demonstrated a quantitative relationship between the effectiveness of black-box MI attacks and the generalization gap. We then exploited this relationship to propose a new MI defense that intentionally reduces the training accuracy as a way to defend against MI attacks. In particular, our defense is based on a new regularizer (MMD+Mix-up) that combines MMD loss with mix-up training to match the softmax output of the training and validation distributions, thus reducing the effectiveness of MI attacks—though not yet eliminating them. Our experiments show that our MMD+Mix-up defense does not hurt the classifier’s test accuracy. We have also used the same insights to propose a new, highly effective, MI attack: the Instance-Probability attack.
A few open questions remain. One is whether the generalization gap can be closed completely without reducing the test accuracy. Another intriguing question is whether there is an explanation for the empirical observation that the generalization gap seems to upper bound black-box MI attack advantages. A future research direction is the extension of our methods to white-box attacks.
7.1 Training recipe and model performance
We report the training accuracy and testing accuracy for each model when training from scratch. The training set sizes are 50000, 50000, 10000, 20000 and 60000 for the CIFAR-10, CIFAR-100, TEXAS, PURCHASE and MNIST datasets, respectively. See Table 6 for more details.
7.2 The effect of our defense on prediction distributions
In Figure 5, we show the effect of our defenses on the distributions of the probability of the correct label for members and non-members. The neural network used here is ResNet-18 and the dataset is CIFAR-100. As shown in Figure 5 (a), when no defense is deployed, the two distributions differ substantially. The probability of the correct label for members is mostly concentrated near 1. For non-members, however, a noticeable fraction is clustered near 0, which means the predictions for this fraction are wrong. When the mix-up defense is added, the two distributions become similar, but the difference is still observable. After adding the MMD loss defense, the two distributions become almost the same. These changes in the distributions explain why our defenses work. The generalization gap can be roughly estimated by the gap between the two cumulative distributions at the point where the probability of the correct label reaches 0.5; by this measure, the gap shrinks gradually as more defenses are applied. However, the generalization gap is not fully eliminated and is still larger than when all defenses are deployed. This means the attacker can still gain some attack advantage on the target model.
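The rough estimate mentioned above, the gap between the two cumulative distributions at 0.5, can be computed as in the sketch below. The function names are hypothetical. The estimate rests on the observation that an instance whose correct-label probability exceeds 0.5 is necessarily classified correctly, so the fraction above 0.5 approximates the accuracy of each group from below.

```python
def empirical_cdf(values, x):
    """Fraction of values that are less than or equal to x."""
    return sum(v <= x for v in values) / len(values)


def estimated_gap(member_probs, nonmember_probs, x=0.5):
    """Gap between the non-member and member CDFs at x.

    Inputs are probabilities of the correct label. A larger share of
    non-members at or below x than of members indicates a larger gap.
    """
    return empirical_cdf(nonmember_probs, x) - empirical_cdf(member_probs, x)
```

For example, if no member but half of the non-members have a correct-label probability at or below 0.5, the estimate is 0.5.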
| dataset | Model | learning rate | epochs | schedule | batch size | train acc. (%) | test acc. (%) | MMD loss weight |
Table 6: Training recipe and model performance for different models. The learning rate is multiplied by 0.1 when the current epoch is in the schedule. Note that ResNet-18 and DenseNet-121 are designed for the ImageNet dataset; these models therefore start with a convolutional layer with kernel size 7×7, which leads to slightly worse performance than their CIFAR counterparts, ResNet-20 and DenseNet_BC(100,12).
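The step schedule described in the caption can be sketched as a small function. The function name and the example values (initial rate 0.1, drops at epochs 150 and 225) are illustrative assumptions, not values taken from Table 6.

```python
def learning_rate(initial_lr, schedule, epoch, factor=0.1):
    """Learning rate at a given epoch under a step schedule.

    The rate is multiplied by `factor` once for every scheduled epoch
    that has been reached, so each drop persists for later epochs.
    """
    drops = sum(1 for scheduled_epoch in schedule if scheduled_epoch <= epoch)
    return initial_lr * factor ** drops


# Hypothetical recipe: start at 0.1, drop at epochs 150 and 225.
rates = [learning_rate(0.1, [150, 225], e) for e in (0, 149, 150, 225)]
```

In PyTorch, the same behavior is provided by the built-in `torch.optim.lr_scheduler.MultiStepLR` scheduler, with the schedule passed as `milestones` and the factor as `gamma`.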