Membership Inference with Privately Augmented Data Endorses the Benign while Suppressing the Adversary

07/21/2020 · by Da Yu, et al. · Microsoft · Sun Yat-sen University

Membership inference (MI) in machine learning decides whether a given example is in the target model's training set. It can be used in two ways: adversaries use it to steal private membership information, while legitimate users can use it to verify whether their data has been forgotten by a trained model. MI is therefore a double-edged sword for privacy-preserving machine learning. In this paper, we propose using privately augmented data to sharpen its good side while passivating its bad side. To sharpen the good side, we exploit the data augmentation used in training to boost the accuracy of membership inference. Specifically, we compose a set of augmented instances for each sample and formulate membership inference as a set classification problem, i.e., classifying a set of augmented data points instead of a single point. We design permutation-invariant features based on the losses of the augmented instances. Our approach significantly improves MI accuracy over existing algorithms. To passivate the bad side, we apply different data augmentation methods to each legitimate user and keep the augmented data secret. We show that malicious adversaries cannot benefit from our algorithms if they are ignorant of the augmented data used in training. Extensive experiments demonstrate the superior efficacy of our algorithms. Our source code is available at the anonymous GitHub page <https://github.com/AnonymousDLMA/MI_with_DA>.


1 Introduction

Membership inference (MI) against machine learning models has been extensively studied from the adversary's perspective Shokri et al. (2017); Yeom et al. (2018); Salem et al. (2019); Nasr et al. (2018); Long et al. (2018); Jia et al. (2019); Song et al. (2019). An adversary applies membership inference algorithms to steal privacy-sensitive membership information about a target model's training data Backes et al. (2016); Pyrgelis et al. (2017). For example, participation in a disease-specific dataset reveals a diagnosis of that disease Backes et al. (2016). Hence, from the view of legitimate users and machine learning service providers, it is desirable to suppress such attacks as much as possible, i.e., to diminish the MI success rate of the adversary.

On the other side, we advocate that membership inference can enable legitimate users to better control their data, e.g., to infer whether a service provider uses their data to train a public model. In the machine learning context, the influence of a user on a trained model should be erased upon request because the learned model leaks private information about its training data Fredrikson et al. (2015); Wu et al. (2016); Shokri et al. (2017); Hitaj et al. (2017); Zhu et al. (2019). This aligns with the spirit of "the right to be forgotten" in the European Union's General Data Protection Regulation (GDPR) [5] and the California Consumer Privacy Act in the United States [4]. In Section 4, we show that membership inference can be used to verify the compliance of data deletion in machine learning (also known as machine unlearning), which has attracted considerable attention in recent years Ginart et al. (2019); Bourtoule et al. (2019); Guo et al. (2019). We argue that this ability benefits not only the users but also the machine learning service providers, because applications with good privacy protection have been observed to promote user adoption Alsdurf et al. (2020). In this case, it is desirable to enhance the MI success rate for legitimate users.

It seems that we cannot achieve these two goals simultaneously because of their contradictory objectives. In this paper, however, we propose leveraging private data augmentation as a solution. We first show that one can utilize the data augmentation used in training to boost the MI success rate. Then we show that legitimate users can protect membership privacy by applying data augmentation secretly.

The core technique for boosting the MI success rate is to use the set of augmented data points, instead of a single data point, to perform membership inference. Specifically, given the predictions on a set of augmented data points, we formulate membership inference as a set classification problem. We then design classifiers based on permutation-invariant features: one thresholds basic statistics of the losses, and the other trains a neural network on the moments of the losses. Our algorithms significantly improve the success rate over existing membership inference algorithms when the same augmented data points are used in the training process. Moreover, without knowledge of the augmented data used in training, our algorithms offer no improvement over existing ones. Therefore, we argue that the adversary cannot take advantage of our algorithms if the augmented data is kept secret by the legitimate user.

In this paper, we focus on black-box membership inference Shokri et al. (2017); Yeom et al. (2018); Salem et al. (2019); Song et al. (2019); Sablayrolles et al. (2019). The black-box setting naturally arises in machine learning as a service (MLaaS) systems. In MLaaS, a service provider trains an ML model on private crowdsourced data and releases the model to users through a prediction API. Under the black-box setting, one has access to the model's output on a given example. Typical outputs are the loss value Yeom et al. (2018); Sablayrolles et al. (2019) and the predicted logits of all classes Shokri et al. (2017); Salem et al. (2019). In this paper, we use the loss value of a given example, as suggested in Sablayrolles et al. (2019).

Our contributions can be summarized as follows. We design new membership inference algorithms against machine learning models and achieve significantly higher inference accuracy than existing algorithms. We also demonstrate that the adversary cannot take advantage of our algorithms if the augmented data used in training is kept secret. Furthermore, we aggregate membership inference results over a given dataset to verify that the dataset has been deleted from a trained model. To the best of our knowledge, this is the first work to explore such a usage of membership inference. Our verification process is easy to implement and achieves high confidence based on our strong MI routine.

1.1 Related work

Existing implementations of membership inference include those based on neural networks and those based on simple metrics. Shokri et al. (2017) and Salem et al. (2019) train a neural network to infer the membership of a given example, using multiple shadow models to generate training data. Song et al. (2019) use a threshold on the prediction confidence to determine whether an example is a member. Yeom et al. (2018) and Sablayrolles et al. (2019) classify a sample as a member of the training set if its loss is lower than a predefined threshold. All existing attacks use only the target model's output on a single example, even when data augmentation is used.

In the machine learning context, "the right to be forgotten" requires removing the influence of given examples from a trained model Ginart et al. (2019); Bourtoule et al. (2019); Guo et al. (2019). Ginart et al. (2019) construct quantized and divide-and-conquer k-means clustering algorithms to avoid retraining the model from scratch. Bourtoule et al. (2019) divide the dataset into multiple shards, train sub-models on each shard, and only retrain the sub-models affected by a deletion request. Guo et al. (2019) focus on linear models, where they use a second-order update to cancel out the target example's influence and then add noise to achieve certified removal.

Apart from designing provable algorithms that delete data from a trained model, prior work has also considered how to verify compliance with a requested deletion. Sommer et al. (2020) use backdoor attacks Liu et al. (2018); Gu et al. (2019) to verify whether a server deletes given data faithfully. They force the model to memorize a specific trigger for each user, which requires a substantial number of intentionally poisoned examples to succeed and decreases the model's test performance. In contrast, we use membership inference as the basic oracle, which neither requires each user to hold a large dataset nor hurts the target model's test performance.

1.2 Paper organization

The rest of this paper is organized as follows. In Section 2, we introduce the necessary background. We present two new membership inference algorithms in Section 3 and show that the malicious adversary cannot take advantage of them. Section 4 introduces how to use membership inference to verify the compliance of machine unlearning. Section 5 presents experimental results on different models and datasets. Finally, we conclude in Section 6.

2 Preliminary

We use $D = \{(x_i, y_i)\}_{i=1}^{n}$ to denote a dataset with $n$ pairs of feature and label. The dataset may be constructed from a set of users $\mathcal{U}$; the crowdsourced dataset is the union of all users' datasets. A trained model $f_\theta$ with parameters $\theta$ is a mapping from the feature space to the label space. Given the model's prediction $f_\theta(x)$ and the target label $y$, the loss function $\ell(f_\theta(x), y)$ computes the loss value, e.g., the cross-entropy loss for classification problems. We write $\ell(x)$ for $\ell(f_\theta(x), y)$ when there is no ambiguity.

Data augmentation generates similar variants of each example to enlarge the training set. We use $\mathcal{T}$ to denote the set of all possible transformations. For a given example $x$, each transformation $\rho \in \mathcal{T}$ generates one augmented example $\rho(x)$. For instance, if the data are images, each $\rho$ could be a combination of rotation and random cropping, and $\mathcal{T}$ then contains the transformations with all possible rotation degrees and cropping locations. The size of $\mathcal{T}$ may be unbounded, so in practice we can only use a subset. We use $T \subset \mathcal{T}$ to denote a subset with $k$ transformation instances and adjust $k$ to control the strength of data augmentation.
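As a concrete illustration, the following sketch samples such a subset $T$ of $k$ fixed transformations. The torchvision-based operations, the parameter ranges, and the idea of deriving the set from a seed are our assumptions for illustration, not the paper's exact pool.

```python
# Minimal sketch: sampling a subset T of k deterministic transformations.
# The operations and parameter ranges below are illustrative assumptions.
import random
import torchvision.transforms.functional as TF

def sample_transformations(k, seed, size=32, pad=4):
    """The seed identifies the subset; each sampled transform is fixed and replayable."""
    rng = random.Random(seed)
    transforms = []
    for _ in range(k):
        flip = rng.random() < 0.5
        angle = rng.uniform(-15, 15)      # rotation in degrees
        top = rng.randint(0, 2 * pad)     # crop offset after padding
        left = rng.randint(0, 2 * pad)

        def t(img, flip=flip, angle=angle, top=top, left=left):
            if flip:
                img = TF.hflip(img)
            img = TF.pad(img, pad)
            img = TF.crop(img, top, left, size, size)
            return TF.rotate(img, angle)
        transforms.append(t)
    return transforms

# Example: the k augmented views of one image under a fixed subset T.
# views = [t(image) for t in sample_transformations(k=10, seed=123)]
```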

Let $\mathcal{A}$ be a membership inference algorithm against a trained model $f_\theta$. The output of $\mathcal{A}$ is a boolean value that denotes whether a given example $(x, y)$ is in the training set of $f_\theta$. For example, the algorithm based on a loss threshold Yeom et al. (2018); Sablayrolles et al. (2019) can be formulated as

$$\mathcal{A}_{\mathrm{loss}}(x, y) = \mathbb{1}\left[\,\ell(f_\theta(x), y) < \tau\,\right],$$

where $\tau$ is a predefined scalar threshold. Song et al. (2019) use the prediction confidence to infer membership. They classify an example as a member if the output confidence is higher than a predefined scalar threshold:

$$\mathcal{A}_{\mathrm{conf}}(x) = \mathbb{1}\left[\,\max_{c} f_\theta(x)_c > \tau\,\right].$$
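For concreteness, here is a minimal sketch of these two single-example baselines, assuming a PyTorch classifier that returns logits; the function names are ours.

```python
# Sketch of the two single-example baselines described above; the names
# loss_attack / confidence_attack and the cross-entropy choice are assumptions.
import torch
import torch.nn.functional as F

def loss_attack(model, x, y, tau):
    """Predict 'member' when the per-example loss is below the threshold tau."""
    with torch.no_grad():
        loss = F.cross_entropy(model(x.unsqueeze(0)), y.unsqueeze(0))
    return loss.item() < tau

def confidence_attack(model, x, tau):
    """Predict 'member' when the maximum softmax confidence exceeds tau."""
    with torch.no_grad():
        probs = F.softmax(model(x.unsqueeze(0)), dim=1)
    return probs.max().item() > tau
```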

3 Membership Inference with Data Augmentation

In this section, we first show why existing algorithms perform badly when the target model is trained with data augmentation. Then we present our new MI algorithms, which exploit the information from a set of augmented instances rather than a single example. Finally, we show that the adversary cannot take advantage of the proposed algorithms if the augmentation used in training is kept secret from the adversary.

3.1 On the limitation of existing MI algorithms

The success of existing MI algorithms largely depends on the generalization gap of the target model, which measures how differently the model performs on the training set and the test set. MI is easy when the generalization gap is large. For example, Shokri et al. (2017) achieve an inference success rate higher than 70% against a model trained on the CIFAR10 dataset Krizhevsky and Hinton (2009) whose gap between training and test accuracy is nearly 40%. They also show that the inference accuracy drops quickly as the target model's generalization gap diminishes (Table 2 in Shokri et al. (2017)).

It has been observed that data augmentation, a powerful weapon against overfitting, can significantly reduce the success rate of existing MI algorithms Sablayrolles et al. (2019). In order to clearly illustrate the influence of data augmentation, we plot the distribution of losses in Figure 1. We use the ResNet110 model He et al. (2016) to fit the CIFAR10 dataset Krizhevsky and Hinton (2009), with the same transformation pool as He et al. (2016), which contains horizontal flipping and random cropping, and sample a fixed-size subset $T$ from it. As shown in Figure 1, the overlap between the losses of training examples and the losses of test examples is much larger when data augmentation is used. For a loss value inside the overlap area, it is impossible to classify its membership confidently. The overlap area therefore sets a ceiling on the success rate of MI algorithms that use a single loss value as the feature.

Figure 1: Loss distributions from models trained with and without data augmentation on the CIFAR10 dataset. The model is ResNet110. The plot uses 10000 examples from the training set and 10000 examples from the test set. The overlap area (dark region) between the training and test distributions is significantly larger when data augmentation is used.

3.2 New membership inference algorithms

As the existing MI algorithms are inherently limited by the large overlap area in Figure 1, we propose leveraging more information from data augmentation to boost the MI success rate. With data augmentation, the model is trained with a set of augmented data points $T(x) = \{\rho_1(x), \ldots, \rho_k(x)\}$. Consequently, we have a set of outputs based on $T(x)$ and a set of losses $L(x) = \{\ell(\rho_1(x)), \ldots, \ell(\rho_k(x))\}$. We hide $\theta$ and $y$ for readability when there is no ambiguity. Instead of using a single loss as in Sablayrolles et al. (2019), we use $L(x)$ to do the membership inference. The inference task is then to classify $L(x)$ into either the training or the test set. For each $x$, $L(x)$ is a valid empirical distribution. We plot the distributions of basic statistics of $L(x)$ in Figure 2. The overlap area in Figure 2 is smaller than in Figure 1 because $L(x)$ contains more information than a single loss value.
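For concreteness, the following sketch collects $L(x)$ by querying the target model on the $k$ augmented copies, reusing the transformation list from the earlier sketch; a black-box per-example loss oracle is assumed.

```python
# Sketch: collect the loss set L(x) over the k augmented copies of an example.
# `transforms` comes from the earlier sampling sketch; the target model is
# queried in a black-box fashion for per-example losses.
import torch
import torch.nn.functional as F

def augmented_losses(model, x, y, transforms):
    losses = []
    with torch.no_grad():
        for t in transforms:
            logits = model(t(x).unsqueeze(0))
            losses.append(F.cross_entropy(logits, y.unsqueeze(0)).item())
    return losses  # the empirical distribution used by Algorithms 1 and 2
```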

Figure 2: Distributions of the mean and the standard deviation of $L(x)$. The experimental setting is the same as in Figure 1. When using the mean or the standard deviation of $L(x)$ as the metric, the overlap area between the training and test distributions is smaller than when using a single loss.

A smaller overlap area indicates that it is easier to distinguish training examples from test examples. The results in Figure 2 therefore suggest that using basic statistics of $L(x)$ is better than using a single loss. Motivated by this, we first explore using the mean or the standard deviation of $L(x)$ as the feature for set classification.

Input : losses $L(x)$ of the target example $x$; threshold $\tau$; metric function $\phi$ that computes the mean or the standard deviation.
Output : boolean value, denotes whether $x$ is a member.
Compute $s = \phi(L(x))$. Return $\mathbb{1}[\, s < \tau \,]$.
Algorithm 1 Membership inference based on basic statistics of $L(x)$ ($\mathcal{A}_{\mathrm{mean}}$, $\mathcal{A}_{\mathrm{std}}$).

Following previous threshold-based algorithms Sablayrolles et al. (2019); Song et al. (2019), the threshold $\tau$ in Algorithm 1 can either be tuned as a hyperparameter or set based on the outputs of shadow models. From Table 1, we can see that Algorithm 1 outperforms existing algorithms by a large margin.
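Algorithm 1 reduces to a few lines in practice; here is a minimal sketch, assuming the loss set has already been collected (e.g., with the earlier augmented_losses sketch) and that $\tau$ has been tuned as described above.

```python
# Sketch of Algorithm 1: threshold a basic statistic of the augmented losses.
# The statistic ("mean" or "std") and the threshold tau are chosen by the
# caller, e.g. tuned on shadow-model outputs as discussed above.
import statistics

def basic_statistic_attack(losses, tau, metric="mean"):
    if metric == "mean":
        s = statistics.fmean(losses)
    else:
        s = statistics.pstdev(losses)
    return s < tau  # members tend to have small (and tightly clustered) losses
```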

As the mean and standard deviation capture only coarse information about $L(x)$, it is natural to ask whether we can design features that incorporate most of the information in $L(x)$. A straightforward solution is to train a neural network that takes the entire $L(x)$ as input and outputs the final decision. However, this straightforward solution performs badly; the experimental results are relegated to the Appendix. One reason is that using raw losses as input is not invariant to permutations of $L(x)$. As a set classification problem, the order of the elements in $L(x)$ should not change the final decision Zaheer et al. (2017). Using raw losses as input does not possess this property, because permuting the input losses changes the outputs of the neurons. This makes it hard for the inference model to distinguish features that are useful for classification from noise induced by the positions of the losses.

In order to design features that are invariant to permutations of the losses, we use the raw moments of $L(x)$. The $m$-th raw moment of a probability density (mass) function $p$ can be computed as

$$\mu_m = \mathbb{E}_{z \sim p}\left[z^{m}\right].$$

The moments of $L(x)$ can be computed easily because it is a valid empirical distribution with uniform probability mass: $\mu_m = \frac{1}{k}\sum_{i=1}^{k} \ell_i^{m}$, where $\ell_i$ is the loss of the $i$-th augmented instance. For probability distributions on bounded intervals, the moments of all orders uniquely determine the distribution (known as the Hausdorff moment problem Shohat and Tamarkin (1943)). More importantly, shuffling the elements of $L(x)$ does not change the resulting moments. In Algorithm 2, we use the raw moments as features and train a neural network to infer membership.

Input : losses $L(x)$ of the target example $x$; loss sets from the training and test sets; maximum moment order $M$; specification of the inference network $h$.
Output : boolean value, denotes whether $x$ is a member.
for each collected loss set $L = \{\ell_1, \ldots, \ell_k\}$ do
      for $m = 1, \ldots, M$ do
            Compute the $m$-th raw moment of $L$: $\mu_m = \frac{1}{k}\sum_{i=1}^{k} \ell_i^{m}$. Normalize $\mu_m$.
      end for
      Create the tuple $((\mu_1, \ldots, \mu_M), b)$, where $b = 1$ if $L$ comes from the training set and $b = 0$ otherwise.
end for
Use the created tuples and the specification to train the inference model $h$. Return $h(\mu_1(L(x)), \ldots, \mu_M(L(x)))$.
Algorithm 2 Membership inference based on moments of $L(x)$ ($\mathcal{A}_{\mathrm{moments}}$).

The training data for Algorithm 2 can be collected by using shadow models Shokri et al. (2017) or by assuming prior knowledge of part of the target model's training data Nasr et al. (2018). We find that training the inference network in Algorithm 2 requires only several hundred training points, far fewer than previous neural-network-based algorithms; e.g., Shokri et al. (2017) need thousands of training points to train the inference model. This is because the moments of the losses are easier features to fit than the output logits used in Shokri et al. (2017).
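As a concrete illustration of Algorithm 2, the following sketch builds the moment features and fits a small inference network with scikit-learn. The moment order M, the hidden width, and the standardization step are our assumptions rather than the paper's exact configuration.

```python
# Sketch of Algorithm 2: raw-moment features plus a small inference network.
# The moment order M, hidden width, and standardization step are assumptions.
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

def moment_features(losses, M=5):
    """Permutation-invariant features: the first M raw moments of the losses."""
    losses = np.asarray(losses)
    return np.array([np.mean(losses ** m) for m in range(1, M + 1)])

def train_inference_model(member_loss_sets, nonmember_loss_sets, M=5):
    X = [moment_features(L, M) for L in member_loss_sets + nonmember_loss_sets]
    y = [1] * len(member_loss_sets) + [0] * len(nonmember_loss_sets)
    clf = make_pipeline(
        StandardScaler(),
        MLPClassifier(hidden_layer_sizes=(64,), activation="tanh", max_iter=2000),
    )
    return clf.fit(np.stack(X), y)

# membership = clf.predict([moment_features(target_losses, M=5)])[0] == 1
```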

To demonstrate the effectiveness of Algorithms 1 and 2, we evaluate them on both the small convolutional model used in the membership inference literature Shokri et al. (2017); Sablayrolles et al. (2019) and a standard ResNet He et al. (2016). We benchmark our algorithms under two augmentation strengths $k$. For the baselines, we report the best result among using each augmented instance and the loss of the original image; we do not use the loss of the original image for our algorithms. We use the benchmark dataset CIFAR10, which has 10 classes of real-world objects. We use 6 common operations to construct $\mathcal{T}$, including rotation, translation, random erasing, etc.; the details are given in the Appendix. We sample a subset $T$ with $k$ transformations. Other implementation details are the same as those used in Section 5. The results are presented in Table 1. For a given strength of augmentation, Algorithms 1 and 2 outperform existing methods significantly. Algorithm 2 has the best performance in general because it utilizes the most information in $L(x)$. Surprisingly, our algorithms on models trained with data augmentation sometimes perform even better than the baseline algorithms on models trained without data augmentation. More experiments with varying $k$ are presented in Section 5.

Model             Test acc.   A_loss   A_conf   A_mean   A_std   A_moments
2-layer ConvNet   59.7        83.7     83.4     N/A      N/A     N/A
                  64.6        82.2     82.1     90.3     90.9    91.3
                  66.8        63.5     63.5     69.4     71.6    71.4
ResNet110         84.9        65.4     65.3     N/A      N/A     N/A
                  89.4        63.2     63.2     68.5     68.7    71.4
                  92.7        59.3     58.7     66.3     66.9    67.1
Table 1: Membership inference success rates (in %) on the CIFAR10 dataset. The number under each membership inference algorithm is its inference accuracy; we use bold font to denote the best inference accuracy. The baseline attacks A_loss and A_conf are introduced in Section 2. The rows with N/A entries correspond to models trained without data augmentation. Test acc. denotes the target model's classification accuracy on the test set.

3.3 Using secret transformations to protect membership privacy

Algorithms 1 and 2 give legitimate users new and better ways to verify their influence on trained models. The remaining question is how to prevent the adversary from taking advantage of our algorithms to steal private membership information.

We propose using private data augmentation to suppress the adversary without hurting the inference accuracy of legitimate users. When $T(x)$ is in the training set, the elements of $L(x)$ are small because deep neural networks can memorize $T(x)$. Otherwise, the elements of $L(x)$ are usually large. This contrast makes MI an easier task, so whether $T(x)$ is used in training matters. Legitimate users can exploit this phenomenon to make our attacks available only to themselves. Specifically, instead of choosing the same $T$ for all users, each user can construct $T$ privately. A malicious attacker cannot acquire useful knowledge of $T$ by random guessing as long as the pool $\mathcal{T}$ is large enough. We record the transformation set used for each user and use it to evaluate our algorithms after training. This process is similar to setting a password. Without the "password" ($T$), the malicious attacker can only use the original image or randomly chosen transformations, which yields a significantly lower inference success rate. In Table 2, we show the inference success rates when the adversary has only partial or no knowledge of the transformation set used in training. The inference success rate drops quickly as the adversary's knowledge of $T$ diminishes. We illustrate this process in Figure 3.
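To make the "password" analogy concrete, the following sketch composes the earlier illustrative helpers (sample_transformations, augmented_losses, basic_statistic_attack): only a caller who knows the user's secret seed reproduces the training-time augmentations, while any other seed yields an uninformative transformation set.

```python
# Sketch only: composes the earlier illustrative helpers; the seed plays the
# role of the user's secret "password" for the transformation set T.
def probe_membership(model, x, y, tau, k, seed):
    T_secret = sample_transformations(k=k, seed=seed)
    losses = augmented_losses(model, x, y, T_secret)
    return basic_statistic_attack(losses, tau, metric="mean")

# The legitimate user calls probe_membership(..., seed=their_secret); an
# adversary can only guess a seed and therefore queries unrelated augmentations.
```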

Transformations in T known to the adversary   10     8      6      4      2      0
A_moments (Algorithm 2)                       67.1   66.5   65.4   63.5   61.6   55.7
A_std (Algorithm 1)                           66.9   66.0   65.8   64.3   61.4   54.5
Table 2: Membership inference success rates (in %) with partial or no knowledge of the augmented data used in training. We train the ResNet110 model on the CIFAR10 dataset with $k = 10$. The numbers in the first row denote how many of the transformations in $T$ the adversary has access to. When the adversary knows none of them, we randomly sample a fresh set of 10 transformations to evaluate our algorithms.

4 Use Membership Inference to Verify Machine Unlearning

Figure 3: Illustration of using private data augmentation to protect membership privacy. Each user first chooses a transformation set $T$ secretly. The users then jointly train a model by sharing the augmented examples, or the gradients computed on the augmented data, with service provider A. Service provider A releases the trained model through a black-box API. If service provider B also has this user as its customer, it can use the user's data to probe the membership of that data in the published model. However, a malicious attacker has no access to $T$ and therefore has a low inference success rate.

In this section, we show that membership inference can be used to verify the compliance of machine unlearning. We combine the results of membership inference on the individual examples of a given set to infer whether the influence of that set has been removed from the target model. We use $S$ and $m$ to denote the target dataset and its size, respectively. For every example $x$ in $S$, the MI oracle $\mathcal{A}$ returns $1$ if it predicts the example was used in training, otherwise it returns $0$. Let $n_1$ be the total number of positive predictions. A large $n_1$ indicates that the user's data has not been deleted. Following Sommer et al. (2020), we formulate a hypothesis testing problem. Specifically, we define the null hypothesis $H_0$: $S$ is not in the training set of the given model (the server has removed the influence of $S$ from the model), and the alternative hypothesis $H_1$: $S$ is in the training set. There are two types of errors we might make. The Type I error rate $\alpha$ (false positive) and the Type II error rate $\beta$ (false negative) are

$$\alpha = \Pr[\text{accept } H_1 \mid H_0] \quad \text{and} \quad \beta = \Pr[\text{accept } H_0 \mid H_1].$$

We accept $H_0$ if $n_1$ is smaller than a given threshold $t$; otherwise we accept $H_1$. Let $q$ be the probability that the inference oracle correctly predicts a non-member of the training set, and let $p$ be the probability that it correctly predicts a member. We assume these probabilities are the same and independent for every example in $S$. For a given threshold $t$, the probability of making a Type I error is

$$\alpha = \Pr[n_1 > t \mid H_0] = \sum_{j = t+1}^{m} \binom{m}{j} (1-q)^{j} q^{m-j},$$

which means that $H_0$ is true but we reject it because $n_1$ is large. Analogously, the probability of making a Type II error is

$$\beta = \Pr[n_1 \le t \mid H_1] = \sum_{j = 0}^{t} \binom{m}{j} p^{j} (1-p)^{m-j}.$$

The probability $q$ can be estimated by the inference success rate on the test set, and the probability $p$ can be estimated by the inference success rate on the training set. For a given tolerance on $\alpha$, we choose $t$ to minimize the Type II error $\beta$. The experimental results are shown in Section 5.2.
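The threshold selection described above reduces to a binomial tail computation; here is a minimal sketch, assuming SciPy and an illustrative tolerance alpha = 0.05 (the paper's exact tolerance is not reproduced here).

```python
# Sketch of the verification test: given estimates of q and p, pick the
# smallest decision threshold t whose Type I error stays below alpha and
# report the resulting Type II error, using the binomial model above.
from scipy.stats import binom

def unlearning_verification(m, q, p, alpha=0.05):
    # Increasing t lowers the Type I error and raises the Type II error, so
    # the smallest feasible t minimizes beta; t = m is always feasible.
    for t in range(m + 1):
        type1 = 1.0 - binom.cdf(t, m, 1.0 - q)   # P(n1 > t | data deleted)
        if type1 <= alpha:
            type2 = binom.cdf(t, m, p)           # P(n1 <= t | data retained)
            return t, type1, type2

# t, a, b = unlearning_verification(m=30, q=0.96, p=0.65)  # illustrative numbers
```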

5 More Experiments

In this section, we sample a random transformation subset $T$ for every example. We first compare our algorithms with existing inference algorithms. Then we use membership inference to verify the compliance of machine unlearning.

5.1 Comparison with existing algorithms

We conduct more experiments on both the CIFAR10 and CIFAR100 datasets. CIFAR100 has 100 classes of real-world objects and is therefore harder than CIFAR10. We also evaluate our algorithms on the Wide ResNet model Zagoruyko and Komodakis (2016); specifically, we use WRN16-8, which has more than 10 million parameters. The baseline algorithms are $\mathcal{A}_{\mathrm{loss}}$ and $\mathcal{A}_{\mathrm{conf}}$. For the baselines, we report the best result among using every element in $T$ and the loss of the original image. For all threshold-based algorithms, we tune the thresholds to separate the training and test sets optimally Sablayrolles et al. (2019); Song et al. (2019). For Algorithm 2, we draw examples from the training set and the test set to build the training data of the inference network. The inference network has one hidden layer, and we compute raw moments up to a fixed order as input features. We use held-out examples from the training set and the test set to evaluate the inference success rates; these examples have no overlap with the inference model's training data. All models are trained on a single Tesla P40 GPU. More details, such as training recipes and model configurations, are available in the Appendix. We plot the results with varying $k$ in Figure 4. Our algorithms achieve high inference success rates against well-generalized models, and both Algorithm 1 and Algorithm 2 outperform existing algorithms by a large margin on different models and datasets.

Figure 4: Membership inference success rates with varying $k$. The left y-axis denotes the membership inference attack success rate. The right y-axis denotes the test accuracy of the target model. The advantage of our algorithms is clear on different datasets and models with varying choices of $k$.

5.2 Machine unlearning verification

We use the trained models to evaluate the proposed verification method. We use the membership inference success rate on the training set to estimate $p$ and the success rate on the test set to estimate $q$. We use $m$ to denote the size of the target dataset. For a given $m$ and a tolerance on the Type I error $\alpha$, we choose the threshold $t$ to minimize $\beta$. The experimental results are shown in Table 3.

Data (Model)           Attack      MI accuracy   q       p       β
CIFAR10 (ResNet110)    A_loss      58.8%         0.203   0.973   1.0     1.0
                       A_moments   67.1%         0.382   0.960   0.458   0.031
CIFAR100 (ResNet110)   A_loss      71.9%         0.482   0.956   0.139
                       A_moments   80.2%         0.653   0.951
CIFAR10 (WRN16-8)      A_loss      61.9%         0.253   0.986   1.0     0.35
                       A_moments   70.1%         0.423   0.978   0.283   4.6
CIFAR100 (WRN16-8)     A_loss      76.8%         0.560   0.977
                       A_moments   83.4%         0.688   0.979
Table 3: Machine unlearning verification confidence on the CIFAR10 and CIFAR100 datasets. We use ResNet110 and WRN16-8. Notation: $q$ and $p$ denote the MI oracle's prediction accuracy on the test set and the training set, respectively, and $m$ is the size of the target dataset. We set a fixed tolerance on the Type I error $\alpha$. A smaller Type II error $\beta$ indicates higher confidence.

As shown in Table 3, our verification mechanism needs a smaller target dataset size $m$ than the verification based on backdoor attacks Sommer et al. (2020) while still achieving very high confidence. Moreover, our mechanism does not need to poison the data with wrong labels and therefore causes no damage to the target model's test performance. The target model WRN16-8 in Table 3 achieves 94.6% test accuracy on the CIFAR10 dataset. This suggests that membership inference is a powerful and effective way to verify compliance with requested data deletion.

6 Conclusion

In this paper, we propose membership inference with private data augmentation, which benefits legitimate data owners while suppressing the adversary. For a given example, we use the losses of its augmented instances to infer its membership. We present two algorithms: one uses the mean and standard deviation of the losses, and the other trains a neural network on the moments of the losses as permutation-invariant features. The proposed algorithms achieve high membership inference accuracy against models with good generalization. Moreover, we show that malicious adversaries cannot take advantage of our algorithms if the benign users apply data augmentation privately.

Broader Impact

This work benefits legitimate data owners with new ways to assess their contribution to a trained ML model without hurting membership privacy. By exploring accurate membership inference algorithms against models with high test accuracy, it also fosters communication between researchers working on privacy-preserving machine learning and those working on state-of-the-art ML applications. This work has no explicit ethical concerns, and the proposed algorithms do not leverage biases in data.

References

  • H. Alsdurf, Y. Bengio, T. Deleu, P. Gupta, D. Ippolito, R. Janda, M. Jarvie, T. Kolody, S. Krastev, T. Maharaj, et al. (2020) COVI white paper. arXiv preprint arXiv:2005.08502. Cited by: §1.
  • M. Backes, P. Berrang, M. Humbert, and P. Manoharan (2016) Membership privacy in microrna-based studies. In Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security, Cited by: §1.
  • L. Bourtoule, V. Chandrasekaran, C. Choquette-Choo, H. Jia, A. Travers, B. Zhang, D. Lie, and N. Papernot (2019) Machine unlearning. arXiv preprint arXiv:1912.03817. Cited by: §1.1, §1.
  • [4] California consumer privacy act. Note: https://oag.ca.gov/privacy/ccpa Cited by: §1.
  • [5] European union’s general data protection regulation. Note: https://gdpr-info.eu/ Cited by: §1.
  • M. Fredrikson, S. Jha, and T. Ristenpart (2015) Model inversion attacks that exploit confidence information and basic countermeasures. In ACM SIGSAC Conference on Computer and Communications Security, Cited by: §1.
  • A. Ginart, M. Guan, G. Valiant, and J. Y. Zou (2019) Making ai forget you: data deletion in machine learning. In Advances in Neural Information Processing Systems, Cited by: §1.1, §1.
  • T. Gu, K. Liu, B. Dolan-Gavitt, and S. Garg (2019) Badnets: evaluating backdooring attacks on deep neural networks. IEEE Access. Cited by: §1.1.
  • C. Guo, T. Goldstein, A. Hannun, and L. van der Maaten (2019) Certified data removal from machine learning models. arXiv preprint arXiv:1911.03030. Cited by: §1.1, §1.
  • K. He, X. Zhang, S. Ren, and J. Sun (2016) Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Cited by: §3.1, §3.2.
  • B. Hitaj, G. Ateniese, and F. Pérez-Cruz (2017) Deep models under the GAN: information leakage from collaborative deep learning. In Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security, Cited by: §1.
  • J. Jia, A. Salem, M. Backes, Y. Zhang, and N. Z. Gong (2019) MemGuard: defending against black-box membership inference attacks via adversarial examples. In 2019 ACM SIGSAC Conference on Computer and Communications Security, Cited by: §1.
  • A. Krizhevsky and G. Hinton (2009) Learning multiple layers of features from tiny images. Cited by: §3.1, §3.1.
  • Y. Liu, S. Ma, Y. Aafer, W. Lee, J. Zhai, W. Wang, and X. Zhang (2018) Trojaning attack on neural networks. Network and Distributed Systems Security (NDSS) Symposium. Cited by: §1.1.
  • Y. Long, V. Bindschaedler, L. Wang, D. Bu, X. Wang, H. Tang, C. A. Gunter, and K. Chen (2018) Understanding membership inferences on well-generalized learning models. arXiv preprint arXiv:1802.04889. Cited by: §1.
  • M. Nasr, R. Shokri, and A. Houmansadr (2018) Machine learning with membership privacy using adversarial regularization. In ACM SIGSAC Conference on Computer and Communications Security, Cited by: §1, §3.2.
  • A. Pyrgelis, C. Troncoso, and E. De Cristofaro (2017) Knock knock, who’s there? membership inference on aggregate location data. arXiv preprint arXiv:1708.06145. Cited by: §1.
  • A. Sablayrolles, M. Douze, Y. Ollivier, C. Schmid, and H. Jégou (2019) White-box vs black-box: bayes optimal strategies for membership inference. International Conference on Machine Learning. Cited by: §B.1, §1.1, §1, §2, §3.1, §3.2, §3.2, §3.2, §5.1.
  • A. Salem, Y. Zhang, M. Humbert, P. Berrang, M. Fritz, and M. Backes (2019) Ml-leaks: model and data independent membership inference attacks and defenses on machine learning models. Network and Distributed Systems Security (NDSS) Symposium. Cited by: §1.1, §1, §1.
  • J. A. Shohat and J. D. Tamarkin (1943) The problem of moments. American Mathematical Soc.. Cited by: §3.2.
  • R. Shokri, M. Stronati, C. Song, and V. Shmatikov (2017) Membership inference attacks against machine learning models. In IEEE Symposium on Security and Privacy (SP), Cited by: §1.1, §1, §1, §1, §3.1, §3.2, §3.2.
  • D. M. Sommer, L. Song, S. Wagh, and P. Mittal (2020) Towards probabilistic verification of machine unlearning. arXiv preprint arXiv:2003.04247. Cited by: §1.1, §4, §5.2.
  • L. Song, R. Shokri, and P. Mittal (2019) Privacy risks of securing machine learning models against adversarial examples. In ACM SIGSAC Conference on Computer and Communications Security, Cited by: §B.1, §1.1, §1, §1, §2, §3.2, §5.1.
  • S. M. Tonni, F. Farokhi, D. Vatsalan, and D. Kaafar (2020) Data and model dependencies of membership inference attack. Proceedings on Privacy Enhancing Technologies. Cited by: §B.2.
  • X. Wu, M. Fredrikson, S. Jha, and J. F. Naughton (2016) A methodology for formalizing model-inversion attacks. In IEEE Computer Security Foundations Symposium, Cited by: §1.
  • S. Yeom, I. Giacomelli, M. Fredrikson, and S. Jha (2018) Privacy risk in machine learning: analyzing the connection to overfitting. In IEEE 31st Computer Security Foundations Symposium (CSF), Cited by: §1.1, §1, §1, §2.
  • S. Zagoruyko and N. Komodakis (2016) Wide residual networks. arXiv preprint arXiv:1605.07146. Cited by: §5.1.
  • M. Zaheer, S. Kottur, S. Ravanbakhsh, B. Poczos, R. R. Salakhutdinov, and A. J. Smola (2017) Deep sets. In Advances in neural information processing systems, Cited by: §3.2.
  • L. Zhu, Z. Liu, and S. Han (2019) Deep leakage from gradients. In Advances in Neural Information Processing Systems, Cited by: §1.

Appendix A Supplementary materials for Section 3

A.1 Details of the transformation pool $\mathcal{T}$

We use standard operations from the image processing literature, such as rotation, translation, and shearing. The details are listed below.

  1. Flip the image horizontally with a fixed probability.

  2. Pad the image and take a crop at a random location.

  3. Rotate the image by a randomly chosen angle.

  4. Translate the image by a randomly chosen number of pixels.

  5. Shear the image by a randomly chosen angle.

  6. Erase a box at a random location.

For each transformation $\rho \in T$, the operations are applied in a random order, and the parameters of each operation are also randomly chosen. We record the sequence of operations and their parameters to save a chosen transformation, and then use the saved transformations to evaluate the MI algorithms after training.
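A minimal sketch of this record-and-replay step is given below, with an illustrative (not the paper's exact) operation pool and parameter ranges.

```python
# Sketch of saving and replaying one chosen transformation as a recorded
# sequence of (operation, parameters); operation names and ranges are
# illustrative, not the exact pool from this appendix.
import random
import torchvision.transforms.functional as TF

OPS = {
    "hflip":  lambda img, p: TF.hflip(img) if p["apply"] else img,
    "rotate": lambda img, p: TF.rotate(img, p["angle"]),
    "shear":  lambda img, p: TF.affine(img, angle=0, translate=[0, 0],
                                       scale=1.0, shear=p["degrees"]),
}

def record_transformation(rng):
    order = list(OPS)
    rng.shuffle(order)                       # random operation order
    params = {"hflip": {"apply": rng.random() < 0.5},
              "rotate": {"angle": rng.uniform(-15, 15)},
              "shear": {"degrees": rng.uniform(-10, 10)}}
    return [(name, params[name]) for name in order]

def apply_transformation(img, recorded):
    for name, p in recorded:
        img = OPS[name](img, p)
    return img

# recorded = record_transformation(random.Random(42))
# view = apply_transformation(image, recorded)   # reproducible after training
```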

A.2 Using raw outputs as features for the inference model

We show that using the raw outputs of the target model to train the inference model yields poor inference accuracy. We evaluate two approaches: one using the raw losses ($\mathcal{A}_{\mathrm{raw}}$), and the other using the output logits of all augmented examples plus the ground-truth label ($\mathcal{A}_{\mathrm{logits}}$). In both cases, we concatenate the outputs of all augmented instances into a one-dimensional tensor. The pseudocode of $\mathcal{A}_{\mathrm{raw}}$ and $\mathcal{A}_{\mathrm{logits}}$ is shown in Algorithms 3 and 4, respectively. The word 'specification' in the pseudocode denotes the architecture and hyperparameters of the inference network. See Appendix B.1 for details.

Input : losses $L(x)$ of the target example $x$; loss sets from the training and test sets; specification of the inference network $h$.
Output : boolean value, denotes whether $x$ is a member.
for each collected loss set $L$ do
      Concatenate the elements of $L$ into a vector $v$. Create the tuple $(v, b)$, where $b = 1$ if $L$ comes from the training set and $b = 0$ otherwise.
end for
Use the created tuples and the specification to train the inference model $h$. Return $h(v(L(x)))$.
Algorithm 3 Membership inference based on raw losses ($\mathcal{A}_{\mathrm{raw}}$).

We use 200 (2500) examples from the training set and 200 (2500) examples from the test set to build the training data for $\mathcal{A}_{\mathrm{raw}}$ ($\mathcal{A}_{\mathrm{logits}}$). The configuration of the inference model is the same as for $\mathcal{A}_{\mathrm{moments}}$. The results are shown in Table 4 and suggest that the raw outputs are less informative than the moments of the losses. This may be because the raw outputs are not invariant to permutations of the augmented instances.

Dataset    Model       Test acc.   A_raw   A_logits   A_moments
CIFAR10    ResNet110   92.8        61.8    56.9       67.2
CIFAR10    WRN16-8     94.6        63.0    58.3       70.1
CIFAR100   ResNet110   69.9        74.1    51.7       80.2
CIFAR100   WRN16-8     76.1        77.2    52.9       83.4
Table 4: Membership inference success rates (in %) on the CIFAR10/100 datasets. The numbers in the third column denote the target model's top-1 test accuracy. We use bold font to denote the best inference accuracy.
Input : logits of the augmented copies of the target example $x$ and its label $y$; logit sets from the training and test sets; specification of the inference network $h$.
Output : boolean value, denotes whether $x$ is a member.
for each collected set of logits do
      Concatenate all elements of the set and the label into a vector $v$. Create the tuple $(v, b)$, where $b = 1$ if the set comes from the training set and $b = 0$ otherwise.
end for
Use the created tuples and the specification to train the inference model $h$. Return $h(v(x))$.
Algorithm 4 Membership inference based on raw logits ($\mathcal{A}_{\mathrm{logits}}$).

Appendix B Supplementary materials for Section 5

B.1 Implementation details of experiments

The small model used in Section 3 contains two convolution layers, a global pooling layer, and a fully connected layer. Following Sablayrolles et al. (2019) and Song et al. (2019), we use a subset of the examples as the training set. The model is trained with an initial learning rate of 0.01, which we decay at the 100-th epoch. The ResNet110 and WRN16-8 models are adopted from the original papers, and we train them with the same hyperparameters. For ResNet110 and WRN16-8, we use separate subsets of examples as the training set and the test set.

We draw examples from the training set and the test set to build the training set for $\mathcal{A}_{\mathrm{moments}}$. We compute moments up to a maximum order $M$, which yields an $M$-dimensional input. The inference model has one hidden layer with Tanh non-linearity as the activation function. We train the attack network for a fixed number of steps with a fixed learning rate. To compute the inference success rate, we use held-out examples from the training set and the test set.

B.2 More experiments

In Table 5, we compare our algorithms with the two baseline algorithms $\mathcal{A}_{\mathrm{loss}}$ and $\mathcal{A}_{\mathrm{conf}}$. We apply a different augmentation subset to every example. The experimental settings are the same as those in Section 5. The results in Table 5 further justify the effectiveness of our algorithms. Interestingly, although the generalization gap of WRN16-8 is smaller than that of ResNet110, the algorithms achieve higher inference accuracy on WRN16-8. This phenomenon aligns with the finding in Tonni et al. (2020) that the inference accuracy is affected not only by the generalization gap but also by the model and dataset in use.

Dataset    Model       Test acc.   A_loss   A_conf   A_mean   A_std   A_moments
CIFAR10    ResNet110   92.8        59.3     59.2     66.3     67.0    67.2
CIFAR10    WRN16-8     94.6        61.9     61.9     67.2     68.8    70.1
CIFAR100   ResNet110   69.9        71.9     71.8     79.2     79.9    80.2
CIFAR100   WRN16-8     76.1        76.9     76.8     81.9     82.9    83.4
Table 5: Membership inference success rates (in %) on the CIFAR10/100 datasets. The numbers in the third column denote the target model's top-1 test accuracy. The baseline algorithms A_loss and A_conf are introduced in Section 2. We use bold font to denote the best inference accuracy.