Membership inference (MI) against machine learning models has been extensively studied from the adversary's perspective Shokri et al. (2017); Yeom et al. (2018); Salem et al. (2019); Nasr et al. (2018); Long et al. (2018); Jia et al. (2019); Song et al. (2019). An adversary applies membership inference algorithms to steal privacy-sensitive membership information about the target model's training data Backes et al. (2016); Pyrgelis et al. (2017). For example, participation in a disease-specific dataset indicates a diagnosis of that disease Backes et al. (2016). Hence, from the view of legitimate users and machine learning service providers, it is desirable to suppress such attacks as much as possible, i.e., to diminish the MI success rate of the adversary.
On the other side, we advocate that membership inference can enable legitimate users to better control their data, e.g., to infer whether a service provider uses their data to train a public model. In the machine learning context, the influence of a user on a trained model should be erased on request, as the learned model leaks private information about its training data Fredrikson et al. (2015); Wu et al. (2016); Shokri et al. (2017); Hitaj et al. (2017); Zhu et al. (2019). This aligns with the spirit of “the right to be forgotten” in the European Union's General Data Protection Regulation (GDPR) and the California Consumer Privacy Act in the United States. In Section 4, we show that membership inference can be used to verify the compliance of data deletion in machine learning (also known as machine unlearning), which has attracted much attention in recent years Ginart et al. (2019); Bourtoule et al. (2019); Guo et al. (2019). We argue that this ability benefits not only the users but also the machine learning service providers, because it has been observed that applications with good privacy protection promote user adoption Alsdurf et al. (2020). In this case, it is desirable to enhance the MI success rate for the legitimate users.
It seems that we cannot achieve these two goals simultaneously because their objectives contradict each other. In this paper, however, we propose leveraging private data augmentation as a solution. We first show that one can utilize the data augmentation used in training to boost the MI success rate. Then we show that legitimate users can protect membership privacy by applying data augmentation secretly.
The core technique for boosting the MI success rate is to utilize the set of augmented data points, instead of a single data point, to do membership inference. Specifically, given the predictions of a set of augmented data points, we formulate membership inference as a set classification problem. We then design classifiers based on permutation invariant features: one using a threshold on basic statistics of the losses, and the other training a neural network on the moments of the losses. Our algorithms significantly improve the success rate over existing membership inference algorithms when the same augmented data points are used in the training process. Moreover, without knowledge of the augmented data used in training, our algorithms show no improvement over existing algorithms. Therefore, we argue that the adversary cannot take advantage of our algorithms if the augmented data is kept secret by the legitimate user.
In this paper, we focus on black-box membership inference Shokri et al. (2017); Yeom et al. (2018); Salem et al. (2019); Song et al. (2019); Sablayrolles et al. (2019). The black-box setting naturally arises in machine learning as a service (MLaaS) systems. In MLaaS, a service provider trains an ML model on private crowdsourced data and releases the model to users through a prediction API. Under the black-box setting, one has access to the model's output for a given example. Typical outputs are the loss value Yeom et al. (2018); Sablayrolles et al. (2019) and the predicted logits of all classes Shokri et al. (2017); Salem et al. (2019). In this paper, we use the loss value of a given example as suggested in Sablayrolles et al. (2019).
Our contributions can be summarized as follows. We design new membership inference algorithms against machine learning models and achieve significantly higher inference accuracy than existing algorithms. We also demonstrate that the adversary cannot take advantage of our algorithms if the augmented data used in training is kept secret. Furthermore, we compose a set of membership inference results over a given dataset to verify the compliance of deleting that dataset from a trained model. To the best of our knowledge, this is the first work to explore such a usage of membership inference. Our verification process is easy to implement and achieves high confidence thanks to our strong MI routine.
1.1 Related work
Existing implementations of membership inference include those based on neural networks and those based on simple metrics. Shokri et al. (2017) and Salem et al. (2019) train a neural network to infer the membership of a given example, with multiple shadow models generating the training data. Song et al. (2019) use a threshold on the prediction confidence to determine whether an example is a member. Yeom et al. (2018) and Sablayrolles et al. (2019) classify a sample as being in the training set if its loss is lower than a predefined threshold. All existing attacks use the target model's output on only a single example, even when data augmentation is used.
In the machine learning context, “the right to be forgotten” requires removing the influence of given examples from a trained model Ginart et al. (2019); Bourtoule et al. (2019); Guo et al. (2019). Ginart et al. (2019) construct quantized and divide-and-conquer $k$-means clustering algorithms to avoid retraining the model from scratch. Bourtoule et al. (2019) divide the dataset into multiple shards, train a sub-model on each shard, and only retrain the sub-model affected by a deletion request. Guo et al. (2019) focus on linear models, where they use a second-order update to cancel out the target example's influence and then add noise to achieve certified removal.
Apart from designing provable algorithms that perform data deletion from a trained model, prior work has also considered how to verify the compliance of requested deletion. Sommer et al. (2020) use backdoor attacks Liu et al. (2018); Gu et al. (2019) to verify whether a server deletes given data faithfully. They force the model to memorize a specific trigger for each user, which requires quite a number of intentionally poisoned examples to succeed and decreases the model's test performance. In contrast, we use membership inference as the basic oracle, which neither requires an individual user to have a large dataset nor hurts the target model's test performance.
1.2 Paper organization
The rest of this paper is organized as follows. In Section 2, we introduce background knowledge. We present two new membership inference algorithms in Section 3 and show that a malicious adversary cannot take advantage of them. Section 4 introduces how to use membership inference to verify the compliance of machine unlearning. Section 5 presents experimental results on different models and datasets. Finally, we conclude in Section 6.
2 Preliminaries

We use $D = \{(x_i, y_i)\}_{i=1}^{n}$ to denote a dataset with pairs of feature and label. The dataset may be constructed from a set of users $U$, where user $u$ holds $D_u$; the crowdsourced dataset $D = \bigcup_{u \in U} D_u$ is the union of all users' datasets. A trained model $f_\theta$ with parameters $\theta$ is a mapping from the feature space to the label space. Given the model's prediction and the target label, the loss function $\ell(f_\theta(x), y)$ computes the loss value, e.g., the cross-entropy loss for classification problems. We write $\ell(x)$ for $\ell(f_\theta(x), y)$ when there is no ambiguity.
Data augmentation generates similar variants of each example to enlarge the training set. We use $\mathcal{T}$ to denote the set of all possible transformations. For a given example $x$, each transformation function $t \in \mathcal{T}$ generates one augmented example $t(x)$. For example, if the data are images, each $t$ could be the combination of a rotation and a random crop; the set $\mathcal{T}$ then contains the transformations with all possible rotation degrees and cropping locations. The size of $\mathcal{T}$ may be unlimited, and in practice we can only use a subset. We use $T \subseteq \mathcal{T}$ to denote a subset with $k$ transformation instances, and we adjust $k$ to control the strength of data augmentation.
Let $\mathcal{A}$ be a membership inference algorithm against a trained model $f_\theta$. The output of $\mathcal{A}$ is a boolean value that denotes whether a given example is in the training set of $f_\theta$. For example, the algorithm based on a loss threshold Yeom et al. (2018); Sablayrolles et al. (2019) can be formulated as
$$\mathcal{A}_{\mathrm{loss}}(x) = \mathbb{1}\left[\ell(x) < \tau\right],$$
where $\tau$ is a predefined scalar threshold. Song et al. (2019) use the prediction confidence to infer the membership: they classify an example as a member if the output confidence is higher than a predefined scalar threshold,
$$\mathcal{A}_{\mathrm{conf}}(x) = \mathbb{1}\left[\max\nolimits_{i}\, f_\theta(x)_i > \tau\right].$$
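As a minimal illustration, the two baseline rules above can be sketched in a few lines; the function names and the toy numbers below are ours, not from the paper.

```python
import numpy as np

def loss_threshold_mi(loss: float, tau: float) -> bool:
    """Loss-threshold rule (Yeom et al. style): predict 'member' if the loss is below tau."""
    return loss < tau

def confidence_threshold_mi(probs: np.ndarray, tau: float) -> bool:
    """Confidence rule (Song et al. style): predict 'member' if the top confidence exceeds tau."""
    return float(np.max(probs)) > tau

# Toy usage: a confidently fit example versus an uncertain one.
member_probs = np.array([0.97, 0.01, 0.02])
non_member_probs = np.array([0.40, 0.35, 0.25])
print(loss_threshold_mi(0.05, tau=0.5))           # low loss -> predicted member
print(confidence_threshold_mi(member_probs, 0.9))
print(confidence_threshold_mi(non_member_probs, 0.9))
```

In practice the thresholds are tuned on shadow models or held-out data, as discussed later in the paper.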
3 Membership Inference with Data Augmentation
In this section, we first show why existing algorithms perform badly when the target model is trained with data augmentation. Then we present our new MI algorithms, which exploit the information of a set of augmented instances rather than a single example. Finally, we show that the adversary cannot take advantage of the proposed algorithms if the augmentation used in training is kept secret from the adversary.
3.1 On the limitation of existing MI algorithms
The success of existing MI algorithms largely depends on the generalization gap of the target model, which measures how differently the target model performs on the training set and the test set. MI is easy when the generalization gap is large. For example, Shokri et al. (2017) achieve an inference success rate higher than 70% against a model trained on the CIFAR10 dataset Krizhevsky and Hinton (2009), where the gap between the target model's training and test accuracy is nearly 40%. They show the inference accuracy drops quickly as the target model's generalization gap diminishes (Table 2 in Shokri et al. (2017)).
It has been observed that data augmentation, a powerful weapon against overfitting, can significantly reduce the success rate of existing MI algorithms Sablayrolles et al. (2019). To clearly illustrate the influence of data augmentation, we plot the distribution of losses in Figure 1. We use the ResNet110 model He et al. (2016) to fit the CIFAR10 dataset Krizhevsky and Hinton (2009), with the same transformation pool as He et al. (2016), which contains horizontal flipping and random cropping. As shown in Figure 1, the overlap between the losses of training examples and the losses of test examples is much larger when data augmentation is used. For a loss inside the overlap area, it is impossible to classify its membership confidently. The overlap area thus sets a ceiling on the success rate of MI algorithms that use a single loss value as the feature.
3.2 New membership inference algorithms
As the existing MI algorithms are inherently limited by the large overlap area in Figure 1, we propose leveraging more information from data augmentation to boost the MI success rate. With data augmentation, the model is trained with the set of augmented data points $\{t(x) : t \in T\}$. Consequently, we have a set of outputs $\{f_\theta(t(x)) : t \in T\}$ and a set of losses $L(x) = \{\ell(t(x)) : t \in T\}$. We omit $f_\theta$ and $y$ for readability when there is no ambiguity. Instead of using a single loss as in Sablayrolles et al. (2019), we use $L(x)$ to do the membership inference. The inference task is then to classify $L(x)$ into either the training or the test set. For each $x$, $L(x)$ is a valid empirical distribution. We plot the distributions of the basic statistics of $L(x)$ in Figure 2. The overlap area in Figure 2 is smaller compared to Figure 1 because $L(x)$ contains more information than a single loss value.
A smaller overlap area indicates that it is easier to distinguish examples from the training and test sets. Therefore, the results in Figure 2 suggest that using the basic statistics of $L(x)$ is better than using a single loss. Motivated by this, we first explore using the mean or standard deviation of $L(x)$ as the input feature for set classification.
The threshold $\tau$ can either be tuned as a hyperparameter or set based on the outputs of shadow models. From Table 1, we can see that Algorithm 1 outperforms existing algorithms by a large margin.
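Under our reading of the paper, Algorithm 1 reduces to thresholding a summary statistic of the augmented losses; the following is a hedged sketch (function names, the `statistic` switch, and the toy threshold are ours).

```python
import numpy as np

def augmented_losses(loss_fn, model, x, y, transforms):
    """Collect the loss of every augmented copy t(x) -- the set L(x) in the paper's notation."""
    return np.array([loss_fn(model(t(x)), y) for t in transforms])

def algorithm1_mi(losses: np.ndarray, tau: float, statistic: str = "mean") -> bool:
    """Sketch of Algorithm 1: threshold a basic statistic (mean or std) of L(x)."""
    stat = losses.mean() if statistic == "mean" else losses.std()
    return stat < tau  # small statistic -> predicted member

# Toy usage with made-up loss sets.
print(algorithm1_mi(np.array([0.10, 0.20, 0.15]), tau=0.5))  # small mean -> member
print(algorithm1_mi(np.array([2.0, 3.0, 2.5]), tau=0.5))     # large mean -> non-member
```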
As the mean and standard deviation capture only coarse information about $L(x)$, it is natural to ask if we can design features incorporating more of the information in $L(x)$. A straightforward solution is to train a neural network which takes the entire set $L(x)$ as input and outputs the final decision. However, this straightforward solution performs badly; the experimental results are relegated to the Appendix. One reason is that using the raw losses as input is not invariant to permutations of the losses. As a set classification problem, the order of elements in $L(x)$ should not change the final decision Zaheer et al. (2017). Using the raw losses as input does not possess this property, because permuting the input losses changes the outputs of the neurons. This makes it hard for the inference model to distinguish what is a useful feature for classification and what is noise induced by the positions of the losses.
In order to design features which are invariant to permutations of the losses, we use the raw moments of $L(x)$. The $m$-th raw moment of a probability density (mass) function $p$ can be computed as
$$\mu_m = \mathbb{E}_{z \sim p}\left[z^m\right].$$
The moments of $L(x)$ can be computed easily because it is a valid empirical distribution with uniform probability mass:
$$\hat{\mu}_m = \frac{1}{k} \sum_{t \in T} \ell(t(x))^m.$$
For probability distributions on bounded intervals, the moments of all orders uniquely determine the distribution (known as the Hausdorff moment problem, Shohat and Tamarkin (1943)). More importantly, shuffling the elements of $L(x)$ does not change the resulting moments. In Algorithm 2, we use the raw moments as features and train a neural network to infer the membership.
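The empirical moments and their permutation invariance can be checked in a few lines; the helper name below is ours.

```python
import numpy as np

def raw_moment_features(losses: np.ndarray, max_order: int) -> np.ndarray:
    """Permutation-invariant features: the first max_order raw moments of the
    empirical loss distribution, mu_m = (1/k) * sum_i losses[i]**m."""
    return np.array([np.mean(losses ** m) for m in range(1, max_order + 1)])

# Shuffling the loss set leaves the features unchanged (set classification).
L = np.array([0.3, 0.1, 0.7, 0.2])
shuffled = np.random.permutation(L)
print(np.allclose(raw_moment_features(L, 4), raw_moment_features(shuffled, 4)))  # True
```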
The training data of Algorithm 2 can be collected by using shadow models Shokri et al. (2017) or by assuming prior knowledge of part of the target model's training data Nasr et al. (2018). We find that training the inference network in Algorithm 2 needs only several hundred training points, far fewer than previous neural-network-based algorithms require; e.g., Shokri et al. (2017) need thousands of training points to train the inference model. This is because the moments of the losses are easier features to fit than the output logits used in Shokri et al. (2017).
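The end-to-end pipeline of Algorithm 2 (moment features, then a small learned classifier) can be sketched as follows. This is a simplified stand-in, not the paper's implementation: we substitute a numpy logistic regression for the inference network and use synthetic loss sets where members have small losses.

```python
import numpy as np

def moment_features(loss_sets: np.ndarray, max_order: int = 4) -> np.ndarray:
    """Each row of loss_sets is one example's set of augmented losses L(x);
    columns of the result are the raw moments mu_1 .. mu_max_order."""
    return np.stack([np.mean(loss_sets ** m, axis=1) for m in range(1, max_order + 1)], axis=1)

def fit_inference_model(X: np.ndarray, y: np.ndarray, lr: float = 0.5, steps: int = 2000):
    """Logistic-regression stand-in for the paper's small inference network,
    trained by gradient descent on standardized moment features."""
    mu, sd = X.mean(0), X.std(0) + 1e-8
    Xs = (X - mu) / sd
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(steps):
        p = 1 / (1 + np.exp(-(Xs @ w + b)))
        g = p - y                      # gradient of the logistic loss
        w -= lr * Xs.T @ g / len(y)
        b -= lr * g.mean()
    return lambda Xn: (1 / (1 + np.exp(-(((Xn - mu) / sd) @ w + b)))) > 0.5

# Synthetic stand-in data: members have small losses, non-members larger ones.
rng = np.random.default_rng(0)
members = rng.exponential(0.1, size=(200, 16))
non_members = rng.exponential(1.0, size=(200, 16))
X = moment_features(np.vstack([members, non_members]))
y = np.array([1] * 200 + [0] * 200)
predict = fit_inference_model(X, y)
print((predict(X) == y).mean())  # high accuracy on this clearly separable data
```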
To demonstrate the effectiveness of Algorithms 1 and 2, we evaluate them on both the small convolution model used in the membership inference literature Shokri et al. (2017); Sablayrolles et al. (2019) and a standard ResNet He et al. (2016). We benchmark our algorithms under different strengths of augmentation. For the baselines, we report the best result among using each $t \in T$ and the loss of the original image; we do not use the loss of the original image for our algorithms. We use the benchmark dataset CIFAR10, which has 10 classes of real-world objects. We use 6 common operations to construct $\mathcal{T}$, including rotation, translation, random erasing, etc.; the details are introduced in the Appendix. We sample a subset $T \subseteq \mathcal{T}$ of fixed size. Other implementation details are the same as those used in Section 5. The results are presented in Table 1. For a given strength of augmentation, Algorithms 1 and 2 outperform existing methods significantly. Algorithm 2 has the best performance in general because it utilizes the most information from $L(x)$. Surprisingly, our algorithms on models trained with data augmentation sometimes perform even better than the baseline algorithms on models trained without data augmentation. More experiments with varying $k$ are presented in Section 5.
3.3 Using secret transformations to protect membership privacy
Algorithms 1 and 2 give legitimate users new and better ways to verify their influence on trained models. The question left is how to prevent the adversary from taking advantage of our algorithms to steal private membership information.
We propose using private data augmentation to suppress the adversary without hurting the inference accuracy of legitimate users. When the augmented copies of $x$ are in the training set, the elements of $L(x)$ are small because deep neural networks can memorize the augmented examples; otherwise, the elements of $L(x)$ are usually large. This contrast makes MI an easier task, so whether $T$ is used in training matters. Legitimate users can utilize this phenomenon to make our attacks available only to themselves. Specifically, instead of choosing the same $T$ for all users, each user can construct $T$ privately. The malicious attacker cannot acquire useful knowledge of $T$ by random guessing as long as the pool $\mathcal{T}$ is large enough. We record the transformation set used for each user and use it to evaluate our algorithms after training. This process is similar to setting a password: without the “password” ($T$), the malicious attacker can only use the original image or randomly chosen transformations, which yields significantly lower inference success rates. In Table 2, we show the inference success rates when the adversary has only partial or no knowledge of the transformation set used in training. The inference success rate drops quickly as the adversary's knowledge of $T$ diminishes. We illustrate the above process in Figure 3.
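One way to realize the password analogy is to derive $T$ from a private seed, so the same secret subset can be regenerated for verification after training. The sketch below is our simplification, not the paper's mechanism: transformations are encoded as hypothetical (operation index, parameter) pairs.

```python
import random

def draw_private_transforms(seed: int, k: int, pool_size: int):
    """Hypothetical sketch: a user derives a secret subset T of the transformation
    pool from a private seed, like setting a password. Each transformation is
    represented here as an (operation index, parameter) pair."""
    rng = random.Random(seed)
    ops = rng.sample(range(pool_size), k)            # k distinct operations from the pool
    params = [rng.uniform(-1.0, 1.0) for _ in ops]   # one random parameter per operation
    return list(zip(ops, params))

# The same seed reproduces the same T for later verification...
assert draw_private_transforms(42, 4, 1000) == draw_private_transforms(42, 4, 1000)
# ...while an attacker guessing a different seed gets a different T.
assert draw_private_transforms(42, 4, 1000) != draw_private_transforms(7, 4, 1000)
```

A large pool makes guessing the exact subset by chance negligible, which matches the paper's requirement that $\mathcal{T}$ be large enough.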
4 Use Membership Inference to Verify Machine Unlearning
In this section, we show that membership inference can be used to verify the compliance of machine unlearning. We compose the results of membership inference on individual examples in a given set to infer whether the influence of that set has been removed from the target model. We use $D_u$ and $n$ to denote the target dataset and its size, respectively. For every example $x$ in $D_u$, the inference oracle $\mathcal{A}(x)$ returns $1$ if it predicts the example was used in training, and $0$ otherwise. Let $S = \sum_{x \in D_u} \mathcal{A}(x)$ be the total number of positive predictions. A large $S$ indicates that the user's data has not been deleted. Following Sommer et al. (2020), we formulate a hypothesis testing problem. Specifically, we define the null hypothesis $H_0$: $D_u$ is not in the training set of the given model (the server has removed the influence of $D_u$ from the model), and the alternative hypothesis $H_1$: $D_u$ is in the training set. There are two types of errors we might make. The Type I error rate $\alpha$ (false positive) and the Type II error rate $\beta$ (false negative) are
$$\alpha = \Pr[\text{reject } H_0 \mid H_0 \text{ is true}], \qquad \beta = \Pr[\text{accept } H_0 \mid H_1 \text{ is true}].$$
We accept $H_0$ if $S$ is smaller than a given threshold $s$; otherwise we accept $H_1$. Let $p_0$ be the probability that the inference oracle correctly predicts that an example is not a member of the training set, and let $p_1$ be the probability that it correctly predicts a member. We assume these probabilities are the same and independent for every example in $D_u$. For a given threshold $s$, the probability that we make a Type I error is
$$\alpha(s) = \Pr[S \ge s \mid H_0] = \sum_{j = s}^{n} \binom{n}{j} (1 - p_0)^{j}\, p_0^{\,n - j},$$
which means that $H_0$ is true but we reject it because $S$ is large. Analogously, the probability of making a Type II error is
$$\beta(s) = \Pr[S < s \mid H_1] = \sum_{j = 0}^{s - 1} \binom{n}{j}\, p_1^{\,j} (1 - p_1)^{\,n - j}.$$
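The error rates of this test are binomial tail probabilities, so the verification procedure is easy to implement. Below is a minimal sketch (the oracle accuracies `p0`, `p1` and the tolerance are placeholder values, not results from the paper): it evaluates both error rates and picks the smallest threshold meeting a Type I tolerance, which also minimizes the Type II error since the two move in opposite directions as the threshold grows.

```python
from math import comb

def type1_error(s: int, n: int, p0: float) -> float:
    """alpha(s) = P[S >= s | H0]: each of n independent oracle calls fires a
    false positive with probability 1 - p0 when the data was truly deleted."""
    return sum(comb(n, j) * (1 - p0) ** j * p0 ** (n - j) for j in range(s, n + 1))

def type2_error(s: int, n: int, p1: float) -> float:
    """beta(s) = P[S < s | H1]: each call detects a true member with probability p1."""
    return sum(comb(n, j) * p1 ** j * (1 - p1) ** (n - j) for j in range(s))

def choose_threshold(n: int, p0: float, p1: float, alpha_tol: float):
    """Smallest s meeting the Type I tolerance (alpha decreases and beta
    increases in s, so this choice also minimizes beta)."""
    for s in range(n + 1):
        if type1_error(s, n, p0) <= alpha_tol:
            return s, type2_error(s, n, p1)
    return n + 1, 1.0  # no feasible threshold

# Placeholder usage: a reasonably strong oracle and a small target dataset.
s, beta = choose_threshold(n=30, p0=0.9, p1=0.9, alpha_tol=0.01)
print(s, beta)
```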
5 More Experiments
In this section, we sample a random subset $T$ for every example. We first compare our algorithms with existing inference algorithms. Then we use membership inference to verify the compliance of machine unlearning.
5.1 Comparison with existing algorithms
We conduct more experiments on both the CIFAR10 and CIFAR100 datasets. CIFAR100 has 100 classes of real-world objects and is therefore harder than CIFAR10. We also evaluate our algorithms on the Wide ResNet model Zagoruyko and Komodakis (2016); we use WRN16-8, which has more than 10 million parameters. The baseline algorithms are the loss-threshold and confidence-threshold rules from Section 2. For the baselines, we report the best result among using every element in $T$ and the loss of the original image. For all threshold-based algorithms, we tune the thresholds to separate the training and test sets optimally Sablayrolles et al. (2019); Song et al. (2019). For Algorithm 2, we build the training data of the inference network from examples drawn from the training and test sets (see Appendix B.1). The inference network has one hidden layer, and we compute raw moments of the losses as input features. We use disjoint examples from the training set and the test set to evaluate the inference success rates; these examples have no overlap with the inference model's training data. All models are trained on a single Tesla P40 GPU. More details such as training recipes and model configurations are available in the Appendix. We plot the results with varying $k$ in Figure 4. Our algorithms achieve high inference success rates against well-generalized models. Both Algorithms 1 and 2 outperform existing algorithms by a large margin across models and datasets.
5.2 Machine unlearning verification
We use the trained models to evaluate the proposed verification method. We use the membership inference success rate on the training set to estimate $p_1$ and the success rate on the test set to estimate $p_0$. We use $n$ to denote the size of the target dataset. For a given $n$ and a tolerance $\alpha^*$ on the Type I error, we choose the threshold $s$ to minimize the Type II error $\beta(s)$ subject to $\alpha(s) \le \alpha^*$. The experiment results are shown in Table 3.
Table 3 columns: Data (Model), Attack, MI accuracy, and the resulting error rates.
As shown in Table 3, our verification mechanism needs a smaller $n$ than the verification based on backdoor attacks Sommer et al. (2020) while still achieving very high confidence. Moreover, our mechanism does not need to poison the data with wrong labels and therefore causes no damage to the target model's test performance. The target model WRN16-8 in Table 3 achieves 94.6% test accuracy on the CIFAR10 dataset. This suggests that membership inference is a powerful and effective way to verify the compliance of requested data deletion.
6 Conclusion

In this paper, we propose membership inference with private data augmentation, which benefits the legitimate data owners while suppressing the adversary. For a given example, we use the losses of its augmented variants to infer its membership. We present two algorithms: one thresholding the mean or standard deviation of the losses, and the other training a neural network on the moments of the losses as permutation invariant features. The proposed algorithms achieve high membership inference accuracy against models with good generalization. Moreover, we show that malicious adversaries cannot take advantage of our algorithms if the benign users apply data augmentation privately.
This work gives legitimate data owners new ways to determine their contribution to a trained ML model without hurting membership privacy. It also promotes communication between the privacy-preserving machine learning community and practitioners of state-of-the-art ML applications by exploring accurate membership inference algorithms against models with high test accuracy. This work has no explicit ethical concerns, and the proposed algorithms do not leverage biases in data.
References

- COVI white paper. arXiv preprint arXiv:2005.08502.
- Membership privacy in microRNA-based studies. In Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security.
- Machine unlearning. arXiv preprint arXiv:1912.03817.
- California Consumer Privacy Act. https://oag.ca.gov/privacy/ccpa
- European Union's General Data Protection Regulation. https://gdpr-info.eu/
- Model inversion attacks that exploit confidence information and basic countermeasures. In ACM SIGSAC Conference on Computer and Communications Security.
- Making AI forget you: data deletion in machine learning. In Advances in Neural Information Processing Systems.
- BadNets: evaluating backdooring attacks on deep neural networks. IEEE Access.
- Certified data removal from machine learning models. arXiv preprint arXiv:1911.03030.
- Deep residual learning for image recognition. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
- Deep models under the GAN: information leakage from collaborative deep learning. In Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security.
- MemGuard: defending against black-box membership inference attacks via adversarial examples. In 2019 ACM SIGSAC Conference on Computer and Communications Security.
- Learning multiple layers of features from tiny images. Technical report.
- Trojaning attack on neural networks. Network and Distributed Systems Security (NDSS) Symposium.
- Understanding membership inferences on well-generalized learning models. arXiv preprint arXiv:1802.04889.
- Machine learning with membership privacy using adversarial regularization. In ACM SIGSAC Conference on Computer and Communications Security.
- Knock knock, who's there? Membership inference on aggregate location data. arXiv preprint arXiv:1708.06145.
- White-box vs black-box: Bayes optimal strategies for membership inference. In International Conference on Machine Learning.
- ML-Leaks: model and data independent membership inference attacks and defenses on machine learning models. Network and Distributed Systems Security (NDSS) Symposium.
- The problem of moments. American Mathematical Society.
- Membership inference attacks against machine learning models. In IEEE Symposium on Security and Privacy (SP).
- Towards probabilistic verification of machine unlearning. arXiv preprint arXiv:2003.04247.
- Privacy risks of securing machine learning models against adversarial examples. In ACM SIGSAC Conference on Computer and Communications Security.
- Data and model dependencies of membership inference attack. Proceedings on Privacy Enhancing Technologies.
- A methodology for formalizing model-inversion attacks. In IEEE Computer Security Foundations Symposium.
- Privacy risk in machine learning: analyzing the connection to overfitting. In IEEE 31st Computer Security Foundations Symposium (CSF).
- Wide residual networks. arXiv preprint arXiv:1605.07146.
- Deep sets. In Advances in Neural Information Processing Systems.
- Deep leakage from gradients. In Advances in Neural Information Processing Systems.
Appendix A Supplementary materials for Section 3
A.1 Details of $\mathcal{T}$
We use standard operations from the image processing literature, such as rotation, translation, shearing, etc. The details are listed below.
- Flip the image horizontally with a fixed probability.
- Pad the image and take a crop at a random location.
- Rotate the image by a random number of degrees.
- Translate the image by a random number of pixels.
- Shear the image by a random number of degrees.
- Erase a box at a random location.
For each $t \in T$, the operations are applied in a random order, and the parameters of each operation are also randomly chosen. We record the sequence of operations and the parameters of each operation to save a chosen transformation. We then use the saved transformations to evaluate the MI algorithms after training.
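The record-and-replay step above can be sketched as serializing each transformation's operation order and parameters; the operation names and parameter encoding below are our illustrative assumptions, not the paper's format.

```python
import json
import random

OPS = ("flip", "crop", "rotate", "translate", "shear", "erase")

def sample_transform_spec(rng: random.Random, ops=OPS) -> dict:
    """Draw a random operation order with random parameters, and return a
    JSON-serializable spec so the exact transformation can be replayed when
    evaluating the MI algorithms after training."""
    order = rng.sample(ops, len(ops))                          # random application order
    params = {op: round(rng.uniform(-1.0, 1.0), 6) for op in order}  # placeholder parameters
    return {"order": order, "params": params}

rng = random.Random(0)
spec = sample_transform_spec(rng)
saved = json.dumps(spec)            # persist alongside the user's record
assert json.loads(saved) == spec    # replayable after training
```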
A.2 Using raw outputs as features for the inference model

We show that using the raw outputs of the target model to train the inference model yields poor inference accuracy. We evaluate two approaches: one using the raw losses, and the other using the output logits of all augmented examples plus the ground-truth label. In the latter, we concatenate the outputs of all augmented instances into a one-dimensional tensor. The pseudocodes of the two approaches are shown in Algorithms 3 and 4, respectively. The word ‘specification’ in the pseudocode denotes the architecture and hyperparameters of the target model. See Appendix B.1 for details.
We use 200 (2500) examples from the training set and 200 (2500) examples from the test set to build the training data for the two approaches, respectively. The configuration of the inference model is the same as in Algorithm 2. The results are shown in Table 4; they suggest that the raw outputs are less informative than the moments of the losses. This may be because the raw outputs are not invariant to permutations of the augmented instances.
Appendix B Supplementary materials for Section 5
B.1 Implementation details of experiments
The small model used in Section 3 contains two convolution layers, a global pooling layer, and a fully connected layer. Following Sablayrolles et al. (2019) and Song et al. (2019), we use a subset of examples as the training set. The model is trained with an initial learning rate of 0.01, and we decay the learning rate at the 100th epoch. The ResNet110 and WRN16-8 models are adopted from the original papers, and we train them with the same hyperparameters. For ResNet110 and WRN16-8, we split the data into training and test sets.
We use examples from the training set and the test set to build the training set for Algorithm 2. We compute moments up to a fixed order, which determines the dimension of the input. The inference model has one hidden layer and uses the Tanh non-linearity as the activation function. We train the attack network for a fixed number of steps. To compute the inference success rate, we use held-out examples from the training set and the test set.
b.2 More experiments
In Table 5, we compare our algorithms with the two baseline algorithms. We apply a different data augmentation to every example. The experiment settings are the same as those in Section 5. The results in Table 5 further justify the effectiveness of our algorithms. Interestingly, although the generalization gap of WRN16-8 is smaller than that of ResNet110, the algorithms achieve higher inference accuracy on WRN16-8. This phenomenon aligns with the finding in Tonni et al. that the inference accuracy is affected not only by the generalization gap but also by the model and dataset in use.