PrivGAN: Protecting GANs from membership inference attacks at low cost
Generative Adversarial Networks (GANs) have made releasing synthetic images a viable approach to sharing data without releasing the original dataset. It has been shown that such synthetic data can be used for a variety of downstream tasks, such as training classifiers that would otherwise require the original dataset to be shared. However, recent work has shown that GAN models and their synthetically generated data can be used to infer training set membership by an adversary who has access to the entire dataset and some auxiliary information. Here we develop a new GAN architecture (privGAN) that provides protection against this mode of attack while leading to negligible loss in downstream performance. Our architecture explicitly prevents overfitting to the training set, thereby providing implicit protection against white–box attacks. The main contributions of this paper are: i) we propose a novel GAN architecture that can generate synthetic data in a privacy preserving manner and demonstrate the effectiveness of our model against white–box attacks on several benchmark datasets, ii) we provide a theoretical understanding of the optimal solution of the GAN loss function, and iii) we demonstrate on two common benchmark datasets that synthetic images generated by privGAN lead to negligible loss in downstream performance when compared against non–private GANs. While we have focused on benchmarking privGAN exclusively on image datasets, the architecture of privGAN is not specific to images and can easily be extended to other types of datasets.
Much of the recent progress in Machine Learning and related areas has been strongly dependent on the open sharing of datasets. A recent study showed that the increase in the number of open datasets in biology has led to a strongly correlated increase in the number of data analysis papers (Piwowar and Vision, 2013). Moreover, in many specialized application areas, the development of novel algorithms is contingent upon the public availability of relevant data. While the public availability of data is essential for reproducible science, for datasets containing sensitive information it poses a threat to the privacy of the individuals in the dataset.
Several simple approaches, such as anonymization of personally identifiable information, have been adopted to de–identify record datasets. However, such approaches are prone to de–anonymization attacks when the adversary has access to additional information about the individuals in the dataset (Narayanan and Shmatikov, 2008). In the case of images, simple approaches such as blurring of faces or eyes have been shown to be easily overcome by adversaries with access to auxiliary data, using model inversion or image similarity based attacks (Li et al., 2014; Cavedon et al., 2011). To overcome such attacks, more sophisticated approaches have been proposed that fall under the umbrella term ’privacy preserving machine learning’ (Al-Rubaie and Chang, 2019; Agrawal and Srikant, 2000). In the case of images, examples of state–of–the–art privacy preserving machine learning methods include adversarial obfuscators to protect against reconstruction attacks (Li et al., 2019), image fusion techniques (Jourabloo et al., 2015), and differential privacy based techniques (Abadi et al., 2016; Xie et al., 2018). Another increasingly popular direction of research has focused on generating synthetic samples using generative models such as Generative Adversarial Networks (GANs). Since the synthetic samples do not represent real individuals, such methods have become increasingly popular in medical imaging (Yi et al., 2019; Zheng et al., 2018).
Despite the popularity of GANs as a privacy preserving way to share data, recent work has shown that such data is vulnerable to membership inference attacks (Hayes et al., 2019; Chen et al., 2019; Hilprecht et al., 2019). These papers demonstrate that such models are vulnerable both to black–box attacks (when the attacker only has access to synthetic data) and to white–box attacks (when the attacker also has access to the trained model). However, they consistently show that white–box attacks are much more effective; hence, in this paper we focus primarily on such attacks. Moreover, since it is increasingly common for machine learning researchers to make their code and models public, white–box attacks are also increasingly practical.
Although most of the work on privacy attacks has been relatively recent, DPGAN (Differentially Private GAN) has been shown to be a potential protection against such attacks (Hayes et al., 2019). DPGAN is an extension of differentially private deep learning (Abadi et al., 2016), which works by adding noise to the stochastic gradient descent (SGD) steps to enforce provable privacy bounds on the trained model. However, multiple papers (Hayes et al., 2019; Xie et al., 2018) have shown that this approach leads to a sharp degradation of sample quality compared to non–private GANs. This in turn affects the performance of models trained on such synthetic data for downstream tasks (Beaulieu-Jones et al., 2019).
To address this trade–off between privacy and sample quality, we developed a novel GAN architecture, namely priv(ate)GAN. We demonstrate empirically that, against the white–box attacks of (Hayes et al., 2019), privGAN provides protection comparable to that of DPGAN without sacrificing sample quality. Our primary contributions in this paper are: i) proposing a novel privacy preserving GAN architecture, ii) providing a theoretical analysis of the optimal solutions to the GAN minimax problem, iii) empirically comparing the performance of our architecture against baselines on white–box attacks, and iv) empirically comparing the quality of our generated samples with different baselines.
In this section, we will motivate and introduce privGANs. In addition, we will provide the theoretical results for the optimal generator and discriminator.
Before introducing privGANs, let us define the original GAN. In the original (non–private) GAN, the discriminator $D$ and the generator $G$ play a two-player minimax game with value function $V(G, D)$:
$$\min_G \max_D V(G, D) = \mathbb{E}_{x \sim p_{data}(x)}[\log D(x)] + \mathbb{E}_{z \sim p_z(z)}[\log(1 - D(G(z)))], \quad (1)$$
where $p_z(z)$ is the pre-defined input noise distribution, and $p_{data}(x)$ is the distribution of the real data $X$. Despite their success in generating realistic samples without releasing the original dataset, GANs are vulnerable to membership inference attacks (Hayes et al., 2019) because they overfit to the samples in the training set.
Having defined the original GAN architecture, we note that one of the major privacy risks posed by GANs comes from the fact that the distribution of the training samples can often be quite different from the distribution of the entire dataset. This leaves the original GAN vulnerable to an adversary with access to the larger pool of samples of which the training set was a subset. The adversary can easily identify the training set membership of samples using the observation that the trained GAN is more likely to identify samples in its training set as ’real’ than samples that were not in the training set. Since this vulnerability is entirely predicated on the GAN overfitting to the distribution of the training set (which is an empirical sample of the true data distribution), a potential solution is to better approximate the true data distribution instead of the distribution of the training set.
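To tackle this problem, we introduce privGANs, which combine $N$ generator–discriminator pairs $(G_i, D_i)$ with a privacy discriminator $D_p$ that penalizes overfitting to the training samples. Given a hyperparameter $N$, we randomly divide the real data $X$ into $N$ equal subsets: $X = \bigcup_{i=1}^{N} X_i$. The value function for a privGAN is defined as:
$$V(\{G_i\}_{i=1}^N, \{D_i\}_{i=1}^N, D_p) = \sum_{i=1}^{N} \Big( \mathbb{E}_{x \sim p_i(x)}[\log D_i(x)] + \mathbb{E}_{z \sim p_z(z)}[\log(1 - D_i(G_i(z)))] \Big) + \lambda \sum_{i=1}^{N} \mathbb{E}_{z \sim p_z(z)}[\log D_p^i(G_i(z))],$$
where $p_i(x)$ is the real data distribution of $X_i$ for $i \in \{1, \ldots, N\}$, $p_z(z)$ is the pre-defined input noise distribution, $\lambda > 0$ is a hyperparameter, and $D_p^i(x)$ represents the probability of $x$ being generated by generator $G_i$, satisfying $\sum_{i=1}^{N} D_p^i(x) = 1$. Figure 1 shows an illustration of the privGAN architecture. Accordingly, the optimization problem for privGANs is
$$\min_{\{G_i\}_{i=1}^N} \; \max_{\{D_i\}_{i=1}^N, \, D_p} V(\{G_i\}_{i=1}^N, \{D_i\}_{i=1}^N, D_p). \quad (2)$$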
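To make the objective concrete, the following is a minimal plain-Python sketch (not the authors' Keras implementation) that randomly partitions a dataset into N equal subsets and forms a Monte-Carlo estimate of the privGAN value function from discriminator outputs. The function names `split_into_equal_subsets` and `privgan_value`, and the input layout (one list of scores per generator), are hypothetical choices made for illustration.

```python
import math
import random
from typing import List, Sequence


def split_into_equal_subsets(data: Sequence, n_splits: int, seed: int = 0) -> List[list]:
    """Randomly partition `data` into `n_splits` disjoint subsets of equal size.

    Any remainder (len(data) % n_splits items) is dropped so that all
    splits have exactly the same size.
    """
    items = list(data)
    random.Random(seed).shuffle(items)
    size = len(items) // n_splits
    return [items[i * size:(i + 1) * size] for i in range(n_splits)]


def privgan_value(d_real, d_fake, dp_fake, lam: float) -> float:
    """Monte-Carlo estimate of the privGAN value function V.

    d_real[i]  : scores D_i(x) in (0, 1) for real samples x from split X_i
    d_fake[i]  : scores D_i(G_i(z)) in (0, 1) for samples produced by G_i
    dp_fake[i] : for each sample G_i(z), the privacy discriminator's
                 probability vector [D_p^1, ..., D_p^N] (sums to 1)
    lam        : the hyperparameter lambda
    """
    n = len(d_real)
    value = 0.0
    for i in range(n):
        # Standard GAN terms for the i-th generator/discriminator pair.
        value += sum(math.log(s) for s in d_real[i]) / len(d_real[i])
        value += sum(math.log(1.0 - s) for s in d_fake[i]) / len(d_fake[i])
        # Privacy term: log-probability that D_p assigns to the generator
        # that actually produced the sample (index i).
        value += lam * sum(math.log(p[i]) for p in dp_fake[i]) / len(dp_fake[i])
    return value


# A maximally confused configuration (every D_i outputs 0.5, D_p is uniform)
# evaluates to -N * (log 4 + lam * log N).
n, lam = 2, 1.0
v = privgan_value([[0.5] * 3] * n, [[0.5] * 3] * n, [[[0.5, 0.5]] * 3] * n, lam)
print(v, -n * (math.log(4) + lam * math.log(n)))
```

Maximization over the discriminators and minimization over the generators would, in practice, be done by gradient steps on these same terms; the sketch only evaluates the objective.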
We will show in the following subsection the theoretical results for privGANs.
We first provide explicit expressions for the optimal discriminators given the generators.
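Theorem 1. Fixing the generators $\{G_i\}_{i=1}^N$ and the hyperparameter $\lambda$, the optimal discriminators of Equation (2) are given by:
$$D_i^*(x) = \frac{p_i(x)}{p_i(x) + p_{g_i}(x)}, \qquad D_p^{i*}(x) = \frac{p_{g_i}(x)}{\sum_{j=1}^{N} p_{g_j}(x)},$$
for $i \in \{1, \ldots, N\}$, where $p_{g_i}$ is the distribution of $G_i(z)$ given $z \sim p_z(z)$ for $i \in \{1, \ldots, N\}$.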
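The closed form for the optimal privacy discriminator can be sanity-checked numerically. At a fixed point $x$, with $a_i = p_{g_i}(x)$, $D_p^*(x)$ should maximize $\sum_i a_i \log D_p^i(x)$ over the probability simplex. The plain-Python sketch below compares the closed form against many random points of the simplex; the density values in `a` are arbitrary illustrative numbers, not taken from any trained model.

```python
import math
import random


def privacy_objective(a, y):
    """f(y) = sum_i a_i * log(y_i): the pointwise objective maximized by D_p."""
    return sum(ai * math.log(yi) for ai, yi in zip(a, y))


def optimal_dp(a):
    """Closed form of Theorem 1: D_p^{i*}(x) = p_{g_i}(x) / sum_j p_{g_j}(x)."""
    total = sum(a)
    return [ai / total for ai in a]


def random_simplex_point(n, rng):
    """A random point in the interior of the probability simplex."""
    w = [rng.random() + 1e-9 for _ in range(n)]
    s = sum(w)
    return [wi / s for wi in w]


rng = random.Random(0)
a = [0.7, 0.2, 1.3]  # illustrative values of p_{g_i}(x) at a single point x
best = optimal_dp(a)
best_value = privacy_objective(a, best)
# By Gibbs' inequality, no point of the simplex should beat the closed form.
worst_gap = min(best_value - privacy_objective(a, random_simplex_point(len(a), rng))
                for _ in range(10_000))
print(best, worst_gap)
```

The gap is $S \cdot \mathrm{KL}(a/S \,\|\, y)$ with $S = \sum_i a_i$, so it is non-negative for every candidate $y$, which is exactly why the normalized densities are optimal.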
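Proof. Decompose the value function as
$$V(\{G_i\}, \{D_i\}, D_p) = \sum_{i=1}^{N} V(G_i, D_i) + \lambda \sum_{i=1}^{N} \mathbb{E}_{z \sim p_z(z)}[\log D_p^i(G_i(z))],$$
where $V(G_i, D_i)$ is the value function of Equation (1) with $G$, $D$ and $p_{data}$ replaced by $G_i$, $D_i$ and $p_i$.
Note that the first term depends only on $\{D_i\}_{i=1}^N$, while the second term depends on $D_p$ alone. By Proposition 1 of (Goodfellow et al., 2014), $D_i^*$ maximizes $V(G_i, D_i)$ for $i \in \{1, \ldots, N\}$.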
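Note that
$$\sum_{i=1}^{N} \mathbb{E}_{z \sim p_z(z)}[\log D_p^i(G_i(z))] = \int_x \sum_{i=1}^{N} p_{g_i}(x) \log D_p^i(x)\, dx.$$
Then it is equivalent to solve, for each $x$, the optimization problem $\max_{y_1, \ldots, y_N} f(y_1, \ldots, y_N)$, where $f(y_1, \ldots, y_N) = \sum_{i=1}^{N} a_i \log y_i$ with $a_i = p_{g_i}(x)$ and $y_i = D_p^i(x)$, under the constraints $\sum_{i=1}^{N} y_i = 1$ and $y_i \ge 0$ for $i \in \{1, \ldots, N\}$. It is reasonable to assume that $a_i > 0$, since the probability density function is always positive. It is easy to verify that $f$ is concave, given any positive $a_i$'s. Note that $y_i = a_i / \sum_{j=1}^{N} a_j$ for $i \in \{1, \ldots, N\}$ solves the first-order conditions of the Lagrangian, $\partial_{y_i}\big[f(y_1, \ldots, y_N) - \mu(\sum_{j=1}^{N} y_j - 1)\big] = a_i / y_i - \mu = 0$ with $\mu = \sum_{j=1}^{N} a_j$, for any positive $a_i$'s. By concavity, it therefore maximizes $f$ for any positive $a_i$'s, which completes the proof. ∎
Similar to the original GAN (Goodfellow et al., 2014), define
$$C(\{G_i\}_{i=1}^N) = \max_{\{D_i\}_{i=1}^N, \, D_p} V(\{G_i\}, \{D_i\}, D_p) = V(\{G_i\}, \{D_i^*\}, D_p^*).$$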
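Theorem 2. Assume $p_i = p_{data}$ for $i \in \{1, \ldots, N\}$. The minimum achievable value of $C(\{G_i\}_{i=1}^N)$ is $-N(\log 4 + \lambda \log N)$. This value is achieved if and only if $p_{g_i} = p_{data}$ for $i \in \{1, \ldots, N\}$.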
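Proof. It is easy to verify that when $p_{g_i} = p_{data}$ for $i \in \{1, \ldots, N\}$, $C$ achieves $-N(\log 4 + \lambda \log N)$.
By its definition, $C$ can be rewritten as:
$$C(\{G_i\}) = \sum_{i=1}^{N} \max_{D_i} V(G_i, D_i) + \lambda \max_{D_p} \sum_{i=1}^{N} \mathbb{E}_{z \sim p_z(z)}[\log D_p^i(G_i(z))].$$
By Theorem 1 and a few algebraic manipulations, we have
$$C(\{G_i\}) = -N(\log 4 + \lambda \log N) + \sum_{i=1}^{N} \Big[ \mathrm{KL}\Big(p_i \,\Big\|\, \frac{p_i + p_{g_i}}{2}\Big) + \mathrm{KL}\Big(p_{g_i} \,\Big\|\, \frac{p_i + p_{g_i}}{2}\Big) \Big] + \lambda \sum_{i=1}^{N} \mathrm{KL}\Big(p_{g_i} \,\Big\|\, \frac{1}{N}\sum_{j=1}^{N} p_{g_j}\Big), \quad (3)$$
where $\mathrm{KL}(p \,\|\, q)$ stands for the KL-divergence between two distributions $p$ and $q$. Note that the Jensen-Shannon divergence (JSD) between $N$ distributions is defined as $\mathrm{JSD}(p_1, \ldots, p_N) = \frac{1}{N}\sum_{i=1}^{N} \mathrm{KL}\big(p_i \,\|\, \frac{1}{N}\sum_{j=1}^{N} p_j\big)$. Then, Equation (3) becomes
$$C(\{G_i\}) = -N(\log 4 + \lambda \log N) + 2\sum_{i=1}^{N} \mathrm{JSD}(p_i \,\|\, p_{g_i}) + \lambda N \, \mathrm{JSD}(p_{g_1}, \ldots, p_{g_N}) \ge -N(\log 4 + \lambda \log N).$$
Since the JSD is non-negative and equals zero only when its arguments coincide, the minimum is achieved if and only if $p_{g_i} = p_i$ for $i \in \{1, \ldots, N\}$ and $p_{g_1} = \cdots = p_{g_N}$. With $p_i = p_{data}$, this is exactly $p_{g_i} = p_{data}$ for all $i$, which completes the proof.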
∎
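The algebra in the proof of Theorem 2 can be checked on small discrete distributions. The plain-Python sketch below uses made-up distributions on three outcomes. It evaluates $C$ both by plugging the optimal discriminators of Theorem 1 into $V$ and through the JSD form, and it checks that the minimum $-N(\log 4 + \lambda \log N)$ is reached when every generator matches $p_{data}$ in the ideal case $p_i = p_{data}$.

```python
import math


def kl(p, q):
    """KL(p || q) for discrete distributions (q > 0 wherever p > 0)."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def jsd(*dists):
    """Generalized JSD: (1/N) * sum_i KL(p_i || (1/N) * sum_j p_j)."""
    n = len(dists)
    mix = [sum(col) / n for col in zip(*dists)]
    return sum(kl(p, mix) for p in dists) / n


def c_direct(splits, gens, lam):
    """C({G_i}): V evaluated at the optimal discriminators of Theorem 1."""
    total = 0.0
    for p, q in zip(splits, gens):
        total += sum(a * math.log(a / (a + b)) for a, b in zip(p, q) if a > 0)
        total += sum(b * math.log(b / (a + b)) for a, b in zip(p, q) if b > 0)
    denom = [sum(col) for col in zip(*gens)]  # sum_j p_{g_j}(x) at each outcome x
    for q in gens:
        total += lam * sum(b * math.log(b / d) for b, d in zip(q, denom) if b > 0)
    return total


def c_jsd_form(splits, gens, lam):
    """The same quantity written in the JSD form used in the proof of Theorem 2."""
    n = len(gens)
    return (-n * (math.log(4) + lam * math.log(n))
            + 2 * sum(jsd(p, q) for p, q in zip(splits, gens))
            + lam * n * jsd(*gens))


p_data = [0.5, 0.3, 0.2]
splits = [p_data, p_data]                  # ideal case: p_i = p_data
gens = [[0.6, 0.2, 0.2], [0.3, 0.4, 0.3]]  # arbitrary generator distributions
lam = 1.0
min_value = -2 * (math.log(4) + lam * math.log(2))
print(c_direct(splits, gens, lam), c_jsd_form(splits, gens, lam))
print(c_direct(splits, [p_data, p_data], lam), min_value)
```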
Remark 1. Theorem 2 suggests that privGANs and GANs yield the same solution, $p_{g_i} = p_{data}$, when $p_i = p_{data}$ for $i \in \{1, \ldots, N\}$. This is true when there are infinitely many samples.
In Theorems 1 and 2, we have focused on the ideal situation where we have access to $p_{data}$, i.e., $p_i = p_{data}$ for every data split. In a practical scenario this is not true, which is what makes white–box attacks against GANs effective. In the following lemma we demonstrate that privGAN acts as a regularization that prevents the optimal generators (and hence the discriminators) from overfitting to the empirical distributions.
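Lemma 1. Assume that $\{p_{g_i}^*\}_{i=1}^N$ minimizes $C(\{G_i\}_{i=1}^N)$ for a fixed positive $\lambda$. Then minimizing $C$ is equivalent to
$$\min_{\{p_{g_i}\}_{i=1}^N} \sum_{i=1}^{N} \mathrm{JSD}(p_i \,\|\, p_{g_i}) \quad \text{s.t.} \quad \mathrm{JSD}(p_{g_1}, \ldots, p_{g_N}) \le \epsilon, \quad (4)$$
where $p_{g_i}$ is the distribution of $G_i(z)$ given $z \sim p_z(z)$ for $i \in \{1, \ldots, N\}$, and $\epsilon = \mathrm{JSD}(p_{g_1}^*, \ldots, p_{g_N}^*)$.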
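Proof. Since $N$ and $\lambda$ are fixed, reformulate $C$ as
$$C(\{G_i\}) = -N(\log 4 + \lambda \log N) + 2\sum_{i=1}^{N} \mathrm{JSD}(p_i \,\|\, p_{g_i}) + \lambda N \, \mathrm{JSD}(p_{g_1}, \ldots, p_{g_N}). \quad (5)$$
Assume that $\{\tilde p_{g_i}\}_{i=1}^N$ solves Equation (4). It also minimizes $C$: since $\{p_{g_i}^*\}$ is feasible for Equation (4), we have $\sum_i \mathrm{JSD}(p_i \,\|\, \tilde p_{g_i}) \le \sum_i \mathrm{JSD}(p_i \,\|\, p_{g_i}^*)$, and $\mathrm{JSD}(\tilde p_{g_1}, \ldots, \tilde p_{g_N}) \le \epsilon = \mathrm{JSD}(p_{g_1}^*, \ldots, p_{g_N}^*)$, so Equation (5) gives $C(\{\tilde p_{g_i}\}) \le C(\{p_{g_i}^*\})$.
We then show that $\{p_{g_i}^*\}$ is a solution of Equation (4). If this were not true, there would exist $\{p'_{g_i}\}_{i=1}^N$ such that $\sum_i \mathrm{JSD}(p_i \,\|\, p'_{g_i}) < \sum_i \mathrm{JSD}(p_i \,\|\, p_{g_i}^*)$ and $\mathrm{JSD}(p'_{g_1}, \ldots, p'_{g_N}) \le \epsilon$. Then, by Equation (5), $C(\{p'_{g_i}\}) < C(\{p_{g_i}^*\})$. This contradicts the assumption that $\{p_{g_i}^*\}$ minimizes $C$, which completes the proof. ∎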
The results stated in Theorem 2 and Lemma 1 provide an intuitive understanding of the properties of the optimal generator distributions. It is easy to see from Theorem 2 that the cost function reduces to a trade–off between the distance of each generator distribution from its corresponding data split and its distance from the other generator distributions. Hence the privacy discriminator can be seen as a regularization that prevents the generators from overfitting to their corresponding data splits. Since the effectiveness of white–box attacks depends solely on GANs overfitting to their training data, this should reduce the efficacy of such attacks. The reformulation of the optimization problem for the generators in Lemma 1 provides a more explicit way to bound the distance between the generator distributions, which can be explored in future work to provide privacy guarantees. Remark 1 shows that if the sample size is large, the optimal generator distributions for privGAN are the same as those of a non–private GAN.
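The trade–off can be illustrated on a toy problem that is not from the experiments in this paper. In the plain-Python sketch below, two data splits have different empirical distributions, Bernoulli(0.3) and Bernoulli(0.7), standing in for splits that differ through sampling noise. A grid search over Bernoulli generator distributions then minimizes the JSD form of $C$ for $N = 2$. As $\lambda$ grows, the optimal generators are pulled toward each other instead of each copying its own split. The helper names are hypothetical.

```python
import math


def kl(p, q):
    """KL(p || q) for discrete distributions (q > 0 wherever p > 0)."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def jsd(*dists):
    """Generalized JSD: (1/N) * sum_i KL(p_i || (1/N) * sum_j p_j)."""
    n = len(dists)
    mix = [sum(col) / n for col in zip(*dists)]
    return sum(kl(p, mix) for p in dists) / n


def bern(u):
    """Bernoulli(u) written as a two-outcome distribution."""
    return [u, 1.0 - u]


def best_generators(a, b, lam, steps=99):
    """Grid-search the generator pair (u, v) that minimizes C for N = 2.

    The constant -N(log 4 + lam * log N) is dropped because it does not
    change the minimizer.
    """
    grid = [(k + 1) / (steps + 1) for k in range(steps)]  # 0.01, ..., 0.99
    best = None
    for u in grid:
        for v in grid:
            cost = (2 * (jsd(bern(a), bern(u)) + jsd(bern(b), bern(v)))
                    + lam * 2 * jsd(bern(u), bern(v)))
            if best is None or cost < best[0]:
                best = (cost, u, v)
    return best[1], best[2]


gaps = []
for lam in (0.0, 1.0, 10.0):
    u, v = best_generators(0.3, 0.7, lam)
    gaps.append(abs(v - u))
    print(f"lambda={lam:>4}: p_g1={u:.2f}, p_g2={v:.2f}, gap={abs(v - u):.2f}")
```

With $\lambda = 0$ each generator reproduces its own split exactly; larger $\lambda$ shrinks the gap between the two generator distributions, which is the regularizing effect described above.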
While an alternating minimization strategy seems like a reasonable choice for training privGAN, it runs into practical issues due to the presence of the privacy discriminator ($D_p$). In the initial epochs, since the generators produce mostly noise, there is no substantial difference between the distributions of the data generated by each generator. This makes it very difficult for $D_p$ to differentiate between these distributions, and as a result the contribution of $D_p$ to the loss function of each generator is small. To avoid this, we first train $D_p$ on the different partitions of the training data (corresponding to each generator) for a small number of epochs, $n_{dp}$. This allows $D_p$ to learn the differences between the data partitions. However, this makes it hard for the generators to beat $D_p$ (and very easy for $D_p$ to win). We fix this problem by freezing $D_p$ for the first $n_d$ epochs after this pre-training, while only allowing the generator–discriminator pairs to train. This allows the generators to learn to generate realistic data while developing the ability to beat a static $D_p$. Once $D_p$ is allowed to train after $n_d$ epochs, there is a sudden spike in the loss of the generators; however, this transient soon settles as long as the generators had enough time to converge under the static $D_p$. The overall training algorithm for privGAN can be seen in Algorithm 1, and a comparison of the generated image quality for different combinations of $n_{dp}$ and $n_d$ can be seen in Figure 2. Some settings of $n_{dp}$ lead to large initial transients in the combined loss, which eventually subside. The effect of $n_d$ is less dramatic, although for a fixed $n_{dp}$ the convergence of the combined loss is smoother for some values of $n_d$ than for others. However, the relative values of the various losses after … epochs are quite stable to the choice of $n_{dp}$ and $n_d$.
We use the following standard open datasets for our experiments: i) MNIST, ii) fashion–MNIST, iii) CIFAR-10, and iv) Labeled Faces in the Wild (LFW). MNIST and fashion–MNIST are grayscale datasets of 28×28 images (60,000 training samples, 10,000 test samples). MNIST consists of images of handwritten digits, while fashion–MNIST contains images of simple clothing items. Both contain a balanced number of samples from 10 classes. CIFAR-10 is a color (RGB) dataset of everyday objects with 32×32 images (50,000 training samples, 10,000 test samples). LFW is a dataset of 13,233 face images of 5,749 individuals. We use the grayscale version of the dataset made available through scikit–learn.
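The staged training schedule for privGAN described earlier in this section can be sketched as a plain-Python helper that only decides which components are updated in each epoch. Here `n_dp` is the number of $D_p$ pre-training epochs and `n_d` the number of subsequent epochs during which $D_p$ stays frozen. These names, `privgan_schedule` and `dp_pretraining_pairs` are hypothetical, and the actual gradient updates of Algorithm 1 are deliberately left out.

```python
from typing import List, Sequence, Tuple


def dp_pretraining_pairs(splits: Sequence[Sequence]) -> List[Tuple[object, int]]:
    """Label every real sample with the index of its data split.

    During pre-training, D_p learns to predict this index, i.e. to tell the
    partitions (one per generator) apart.
    """
    return [(x, i) for i, split in enumerate(splits) for x in split]


def privgan_schedule(n_dp: int, n_d: int, n_epochs: int) -> List[Tuple[str, bool, bool]]:
    """Per-epoch plan as (phase, train_generator_discriminator_pairs, train_dp)."""
    plan = [("pretrain_dp", False, True)] * n_dp  # D_p alone, on the real splits
    for epoch in range(n_epochs):
        if epoch < n_d:
            plan.append(("freeze_dp", True, False))  # pairs train vs. a static D_p
        else:
            plan.append(("joint", True, True))  # all components are updated
    return plan


plan = privgan_schedule(n_dp=2, n_d=3, n_epochs=6)
print([phase for phase, _, _ in plan])
```

Keeping the schedule separate from the update code makes it easy to compare different combinations of `n_dp` and `n_d`, as in Figure 2.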
The white–box attack on a simple GAN is performed as outlined in (Hayes et al., 2019). Briefly, the attack assumes that the adversary possesses the trained model along with the entire dataset (a fraction of which was used in training). The attacker is also assumed to know what fraction of the dataset was used in training, but to have no other information about the training set. The attack then proceeds by using the discriminator of the trained GAN to obtain a probability score for each sample in the dataset (see Algorithm 2). The samples are then sorted in descending order of probability score, and the top-scoring samples, in the same proportion as the known training fraction, are output as the likely training set. The attack is evaluated by calculating what fraction of the predicted training set was actually in the training set.
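The attack just described reduces to ranking samples by discriminator score. A minimal plain-Python sketch, assuming the scores have already been computed with the trained discriminator and are keyed by a hypothetical integer sample id:

```python
from typing import Dict, Set


def white_box_attack(scores: Dict[int, float], train_fraction: float) -> Set[int]:
    """Predict the training set as the top `train_fraction` of samples by score."""
    k = int(round(train_fraction * len(scores)))
    ranked = sorted(scores, key=scores.get, reverse=True)  # most 'real'-looking first
    return set(ranked[:k])


def attack_accuracy(predicted: Set[int], true_train: Set[int]) -> float:
    """Fraction of the predicted training set that was actually used in training."""
    return len(predicted & true_train) / len(predicted)


# Toy example: an overfit discriminator scores its 10 training samples highly.
scores = {i: (0.9 if i < 10 else 0.2) for i in range(100)}
predicted = white_box_attack(scores, train_fraction=0.1)
print(attack_accuracy(predicted, true_train=set(range(10))))
```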
Since a privGAN model has multiple generator–discriminator pairs, the previously described attack cannot be directly applied to it. However, for a successful white–box attack, each of the discriminators should score samples from the training corpus higher than those not used in training (note: the training sets are of the same size for both private and non–private GANs). Hence, we modify the previous approach by computing a ’mean’ and a ’max’ probability score for each sample, taking the mean or the max over the scores from all discriminators (see Algorithm 3). We then sort the samples by each of these aggregate scores and select the same top fraction of samples as the predicted training set. Evaluation is performed as described before.
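The modification for privGAN only changes how a per-sample score is obtained. The plain-Python sketch below (hypothetical names; `per_disc_scores` holds one score dictionary per discriminator $D_1, \ldots, D_N$) aggregates with the mean or the max and then keeps the same top fraction as before.

```python
from statistics import mean
from typing import Dict, List, Set


def aggregate_scores(per_disc_scores: List[Dict[int, float]], how: str) -> Dict[int, float]:
    """Combine the scores of all privGAN discriminators into one score per sample."""
    agg = {"mean": mean, "max": max}[how]
    return {s: agg([scores[s] for scores in per_disc_scores]) for s in per_disc_scores[0]}


def top_fraction(scores: Dict[int, float], fraction: float) -> Set[int]:
    """Samples whose aggregate score falls in the top `fraction`."""
    k = int(round(fraction * len(scores)))
    return set(sorted(scores, key=scores.get, reverse=True)[:k])


# Toy example: D_1 overfits to samples 0-4 (its split), D_2 to samples 5-9.
d1 = {i: 0.9 if i < 5 else (0.3 if i < 10 else 0.1) for i in range(100)}
d2 = {i: 0.9 if 5 <= i < 10 else (0.3 if i < 5 else 0.1) for i in range(100)}
for how in ("mean", "max"):
    print(how, sorted(top_fraction(aggregate_scores([d1, d2], how), 0.1)))
```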
For the task of comparing the accuracy of white–box attacks, we compare against two baselines: i) a non–private GAN, and ii) random selection of samples to be predicted as training samples, with probability equal to the fraction of samples that belong to the training set. For the task of evaluating the downstream performance of models, we compare against two baselines: i) a model trained on data generated using non–private GANs, and ii) a model trained on real training data.
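The random baseline needs no model. If each sample is flagged as a training member with probability equal to the training fraction, each flagged sample is a true member with that same probability, so the expected attack accuracy equals the training fraction. A quick Monte-Carlo check in plain Python (a hypothetical helper; sample ids 0 to n_train−1 are treated as the true members):

```python
import random


def random_baseline_accuracy(n_total: int, n_train: int, trials: int, seed: int = 0) -> float:
    """Pooled precision of flagging each sample as 'member' with prob. n_train / n_total."""
    rng = random.Random(seed)
    p = n_train / n_total
    hits = guessed = 0
    for _ in range(trials):
        for s in range(n_total):
            if rng.random() < p:  # predict "member" with probability p
                guessed += 1
                hits += s < n_train  # the first n_train ids are the true members
    return hits / guessed


print(random_baseline_accuracy(n_total=1000, n_train=100, trials=200))
```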
For MNIST, fashion–MNIST and LFW, we use standard fully connected networks for both generators and discriminators. The generator and discriminator architecture details can be found in the Appendix. Identical generator and discriminator architectures are used for GAN and privGAN. While evaluating white–box attack accuracies, we trained all GAN models with an Adam (Kingma and Ba, 2014) optimizer with a learning rate of … for … epochs. While evaluating performance on downstream classification tasks, we trained all GAN models with an Adam optimizer with a learning rate of … for … epochs. For the classifier, we use a simple CNN model (see the architecture in the Appendix). For the CNN model, we used the same learning rate but trained for fewer epochs, since the model converges quickly. In all cases we used a batch size of 256.
To test the effectiveness of white–box attacks, models were trained on 10% of the data, as in (Hayes et al., 2019). Reported numbers are averages over … runs. For each run, 10% of the dataset was randomly chosen to be the training set. For the task of evaluating the downstream performance of GANs, a separate generative model was trained for each class of the training dataset. Here the training dataset refers to the pre–defined training set available for MNIST and fashion–MNIST.
Table 1: Accuracy of the white–box attack (fraction of the predicted training set that was actually in the training set). privGAN entries are (mean-score attack, max-score attack).
Dataset | Rand. | GAN | privGAN (λ = 0.1) | privGAN (λ = 1) | privGAN (λ = 10) | DPGAN (ε = …) |
---|---|---|---|---|---|---|
MNIST | 0.1 | 0.346 | (0.144, 0.147) | (0.12, 0.116) | (0.096, 0.097) | 0.098 |
f-MNIST | 0.1 | 0.420 | (0.192, 0.305) | (0.192, 0.255) | (0.095, 0.095) | 0.102 |
LFW | 0.1 | 0.724 | (0.148, 0.137) | (0.107, 0.094) | (0.163, 0.169) | 0.109 |
CIFAR-10 | 0.1 | 0.723 | (0.568, 0.313) | (0.424, 0.221) | Did not converge | 0.107 |
To compare the privacy loss of privGAN with the baselines, we performed a white–box attack as described in Section 3.2. Since privGAN has multiple generator/discriminator pairs, we use a modified attack designed specifically for privGAN (see Algorithm 3). For each dataset, we train a GAN, privGAN (λ ∈ {0.1, 1, 10}) and DPGAN on 10% of the dataset. The goal of the white–box attack is then to identify the training set from the total dataset. We observe in Table 1 that for all datasets privGAN leads to a substantial decrease in the accuracy of the white–box attack when compared to the non–private GAN. Increasing λ generally reduces the accuracy of the white–box attack, although this does not seem to be true for LFW. This is most likely because for very high values of λ, privGAN prioritizes minimizing the difference between the generated distributions, which can in some cases worsen performance. In all cases, the privGAN model corresponding to the best performing value of λ yields performance comparable to the random chance and DPGAN baselines.
A qualitative way to evaluate how well GANs are protected against white–box attacks is to compare the distribution of discriminator scores for samples in the training set with that for samples outside the training set. The more similar the distributions are, the harder it is for an adversary to tell the samples apart. For privGAN, since there are multiple discriminators, we look at the outputs of a randomly chosen discriminator. In Figure 3 we see that privGAN does indeed bring the two distributions closer, and that the similarity between them increases with λ. On the other hand, for a non–private GAN the two distributions are very different, which explains the high accuracy of white–box attacks in that case.
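Figure 3 makes this comparison visually. One possible way to summarise it with a single number, a swapped-in technique that is not used in this paper, is the two-sample Kolmogorov–Smirnov statistic between the training-set scores and the held-out scores: 0 means identical empirical distributions and 1 means perfectly separable ones. A plain-Python sketch:

```python
from bisect import bisect_right
from typing import Sequence


def ks_statistic(a: Sequence[float], b: Sequence[float]) -> float:
    """Two-sample Kolmogorov-Smirnov statistic: max gap between the empirical CDFs.

    Both empirical CDFs are step functions that only change at sample points,
    so checking every observed value is enough to find the supremum.
    """
    a_sorted, b_sorted = sorted(a), sorted(b)
    gap = 0.0
    for x in a_sorted + b_sorted:
        cdf_a = bisect_right(a_sorted, x) / len(a_sorted)
        cdf_b = bisect_right(b_sorted, x) / len(b_sorted)
        gap = max(gap, abs(cdf_a - cdf_b))
    return gap


# Illustrative scores from one discriminator (made-up numbers).
train_scores = [0.91, 0.88, 0.95, 0.97]
held_out_scores = [0.42, 0.55, 0.61, 0.38]
print(ks_statistic(train_scores, held_out_scores))
```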
We compare the downstream performance of privGAN against non–private GANs in two ways: i) qualitative comparison of the generated images, ii) quantitative comparison on a downstream classification task.
For the first task, we qualitatively compare the quality of images generated by privGAN with different settings of λ to those generated by a non–private GAN (Figure 6). The image quality for all three values of λ (0.1, 1, 10) is quite comparable to that of the images generated by the non–private GAN. However, the image quality does decrease as λ increases. We also see that certain classes become overrepresented as λ increases. This is studied in greater detail in the following section.
To quantitatively test the downstream performance, we split the pre–defined training sets of MNIST and fashion–MNIST by class and trained a privGAN for each class. We then generated the same number of samples per class and created a new synthetic training set. This training set was used to train a CNN classification model (architecture in the Appendix), which was then tested on the pre–defined test set of each dataset. The baselines used for comparison were: i) a CNN trained on the real training set, ii) a CNN trained on a training set generated by a non–private GAN, and iii) a CNN trained on a training set generated by a DPGAN. We can see in Figure 4 that the accuracy of the CNN remains almost unchanged for privGAN with different values of λ compared to the non–private GAN. While this is true for both datasets, the drop in accuracy is slightly larger in the case of MNIST and increases with λ. It is interesting to note that the performance of DPGAN is quite poor on both datasets, despite the relatively large value of ε used.
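The construction of the synthetic training set can be sketched as follows. Each entry of `generators` is a hypothetical stand-in (a callable returning n samples) for a trained per-class generative model. Which of privGAN's generators the samples are drawn from is not specified above, so that detail is left abstract in this plain-Python sketch.

```python
import random
from typing import Callable, Dict, List, Tuple

Sample = List[float]


def build_synthetic_training_set(
    generators: Dict[int, Callable[[int], List[Sample]]],
    per_class: int,
    seed: int = 0,
) -> List[Tuple[Sample, int]]:
    """Draw `per_class` samples from each class-conditional generator and label them."""
    data = [(x, label) for label, gen in generators.items() for x in gen(per_class)]
    random.Random(seed).shuffle(data)  # mix classes before training the classifier
    return data


# Toy usage: three "classes" whose hypothetical generators emit constant vectors.
toy = {c: (lambda n, c=c: [[float(c), float(c)] for _ in range(n)]) for c in range(3)}
synthetic = build_synthetic_training_set(toy, per_class=5)
print(len(synthetic))
```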
To test the effect of the hyper–parameter choices on sample quality we focus on two attributes: i) unambiguity of the class of the generated images, ii) relative abundance of different classes in generated images. We measure the unambiguity of the class of the generated images using the entropy of the predicted class probabilities (using a CNN trained on real images). The average entropy of the entire dataset is then reported for different hyper–parameter settings. The class diversity of generated images is measured by using the pre–trained CNN to first identify the most probable class per sample and then using it to calculate the relative abundance of each class in the generated dataset (scaled to sum to 1). We then calculate the entropy of the relative class abundance which we report as the class diversity.
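Both metrics are simple functions of the classifier's predicted class probabilities. A plain-Python sketch, assuming natural-log entropy (the log base is not stated above) and probabilities already produced by the CNN trained on real images:

```python
import math
from collections import Counter
from typing import List, Sequence


def entropy(probs: Sequence[float]) -> float:
    """Shannon entropy (natural log) of a discrete distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def average_entropy(pred_probs: List[Sequence[float]]) -> float:
    """Mean per-sample entropy of the classifier outputs (higher = more ambiguous)."""
    return sum(entropy(p) for p in pred_probs) / len(pred_probs)


def class_diversity(pred_probs: List[Sequence[float]], n_classes: int) -> float:
    """Entropy of the relative abundance of the most probable class per sample."""
    top = Counter(max(range(n_classes), key=lambda c: p[c]) for p in pred_probs)
    abundance = [top[c] / len(pred_probs) for c in range(n_classes)]  # sums to 1
    return entropy(abundance)


# Illustrative classifier outputs for three generated images (made-up numbers).
probs = [[0.9, 0.05, 0.05], [0.1, 0.8, 0.1], [0.34, 0.33, 0.33]]
print(average_entropy(probs), class_diversity(probs, n_classes=3))
```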
We see in Figure 5a that as λ is increased (fixing the number of generators to 2), both average entropy and class diversity decrease monotonically. This implies that as λ increases, the class ambiguity of the samples decreases, but so does the class diversity. We see in Figure 5b that as the number of generators is increased (fixing λ), average entropy increases monotonically. This increase in average entropy is accompanied by an increase in class diversity (although the latter increase is not monotonic). This in turn implies that as the number of generators increases, both the class ambiguity of the samples and the class diversity increase. Note that as the number of generators is increased (for a fixed dataset size), the size of each data split decreases.
Based on these results, λ and the number of generators impact the quality of samples generated by privGAN in opposite ways. Since these two parameters interact, their optimal values are inter–dependent and most likely dataset dependent.
Here we present a novel GAN framework (namely privGAN) that utilizes multiple generator–discriminator pairs to prevent the model from overfitting to the training set. Through a theoretical analysis of the optimal generators and discriminators, we demonstrate that, in the ideal case where each data split follows the true data distribution, the optimal solutions are identical to those of a non–private GAN. We also demonstrate that in the more practical scenario, where the training data is a sample of the entire dataset, the privGAN loss function is equivalent to a regularization that prevents overfitting to the training set. The regularization provided by privGAN could also lead to improved learning of the data distribution, which will be the focus of future work.
To demonstrate the utility of privGAN, we focus on the application of preventing white–box attacks against GANs. We demonstrate that while non–private GANs are highly vulnerable to such attacks, privGAN provides strong protection against them. While we focus on white–box attacks in this paper, we argue that protecting against them also protects against black–box attacks, which have consistently been shown to be less effective. We also demonstrate that, compared to another popular defense against such attacks (DPGAN), privGAN does not negatively affect the quality of the generated samples, as evidenced by performance on downstream learning tasks such as classification. We also characterize the effect of different privGAN hyper–parameters on sample quality, measured through two different metrics.
While the major focus of the current paper has been to characterize the properties of privGAN and empirically show the protection it provides against white–box attacks, future work could focus on finding theoretical privacy guarantees for our approach. The privGAN architecture could also have applications in related areas such as transfer learning. Hence, another direction of future work could be to extend privGAN to such application areas and demonstrate its benefits on practical datasets, such as in healthcare.
Beaulieu-Jones et al. (2019). Privacy-preserving generative deep neural networks support clinical data sharing. Circulation: Cardiovascular Quality and Outcomes 12 (7), pp. e005122.
… Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 152–159.
Here we outline the different layers used in the model architectures for different datasets, along with the associated optimization hyper–parameters. It is important to note that the same choices are made for the non–private GAN, privGAN and DPGAN in all cases. Layers are listed in sequential order.
Generator:
Dense(units, input size)
LeakyReLU()
Dense(units)
LeakyReLU()
Dense(units)
LeakyReLU()
Dense(units, activation = ’tanh’)
Discriminator:
Dense(units)
LeakyReLU()
Dense(units)
LeakyReLU()
Dense(units)
LeakyReLU()
Dense(units, activation = ’sigmoid’)
Privacy discriminator:
Dense(units)
LeakyReLU()
Dense(units)
LeakyReLU()
Dense(units)
LeakyReLU()
Dense(units = number of generators, activation = ’softmax’)
An Adam optimizer with … and a learning rate of … was used for optimization.
Generator:
Dense(units, input size)
LeakyReLU()
Dense(units)
LeakyReLU()
Dense(units)
LeakyReLU()
Dense(units, activation = ’tanh’)
Discriminator:
Dense(units)
LeakyReLU()
Dense(units)
LeakyReLU()
Dense(units)
LeakyReLU()
Dense(units, activation = ’sigmoid’)
Privacy discriminator:
Dense(units)
LeakyReLU()
Dense(units)
LeakyReLU()
Dense(units)
LeakyReLU()
Dense(units = number of generators, activation = ’softmax’)
An Adam optimizer with … and a learning rate of … was used for optimization.
Generator:
Dense(units, input size, target shape)
LeakyReLU()
Conv2DTranspose(filters, kernel size, strides)
LeakyReLU()
Conv2DTranspose(filters, kernel size, strides)
LeakyReLU()
Conv2DTranspose(filters, kernel size, strides, activation = ’tanh’)
Discriminator:
Conv2D(filters, kernel size, strides)
Reshape(target shape)
Conv2D(filters, kernel size, strides)
LeakyReLU()
Conv2D(filters, kernel size, strides)
LeakyReLU()
Conv2D(filters, kernel size, strides)
LeakyReLU()
Dense(units, activation = ’sigmoid’)
Privacy discriminator:
Conv2D(filters, kernel size, strides)
Reshape(target shape)
Conv2D(filters, kernel size, strides)
LeakyReLU()
Conv2D(filters, kernel size, strides)
LeakyReLU()
Conv2D(filters, kernel size, strides)
LeakyReLU()
Dense(units = number of generators, activation = ’softmax’)
An Adam optimizer with … and a learning rate of … was used for optimization.
CNN classifier:
Conv2D(filters, kernel size, activation = ’relu’)
Conv2D(filters, kernel size, activation = ’relu’)
MaxPooling2D(pool size)
Dense(units, activation = ’relu’)
Dense(units, activation = ’softmax’)
An Adam optimizer with … and a learning rate of … was used for optimization.
All models are implemented in Keras with a TensorFlow backend. DPGAN was implemented using the TensorFlow Privacy package (https://github.com/tensorflow/privacy).