
MACE: A Flexible Framework for Membership Privacy Estimation in Generative Models

09/11/2020
by Xiyang Liu, et al.

Generative models are widely used for publishing synthetic datasets. Despite practical successes, recent works have shown that some generative models may leak the privacy of the data used during training. Membership inference attacks aim to determine whether a sample was used in the training set, given query access to the model API. Despite recent work in this area, many of the attacks designed against generative models require very specific attributes from the learned models (e.g., discriminator scores or generated images). Furthermore, many of these attacks are heuristic and do not provide effective bounds for privacy loss. In this work, we formally study the membership privacy leakage risk of generative models. Specifically, we formulate membership privacy as a statistical divergence between training samples and hold-out samples, and propose sample-based methods to estimate this divergence. Unlike previous works, our proposed metric and estimators make realistic and flexible assumptions. First, we use a generalizable metric as an alternative to accuracy, since practical model training often leads to imbalanced train/hold-out splits. Second, our estimators can estimate the statistical divergence using any scalar- or vector-valued attribute of the learned model instead of very specific attributes. Furthermore, we show a connection to differential privacy, which allows our proposed estimators to provide a data-driven certificate for understanding the privacy budget needed by differentially private generative models. We demonstrate the utility of our framework through experiments on different generative models using various model attributes, yielding new insights about membership leakage and model vulnerabilities.


Introduction

How can we release sensitive datasets while protecting the privacy of the individuals in them? Recent progress in generative machine learning has provided a new path to sharing such data Han et al. (2018); Yi et al. (2019). However, recent work has demonstrated that even these approaches are vulnerable to privacy leakage, since they have been shown to overfit to their training datasets Liu et al. (2018). Adversaries can exploit this vulnerability, with query access to the trained models or the synthetic data outputs, to infer membership of samples in the training datasets Hilprecht et al. (2019); Hayes et al. (2019); Mukherjee et al. (2019); Chen et al. (2019).

To overcome this, researchers have developed many generative machine learning approaches that are protected against privacy attacks. These approaches generally fall under two broad classes: i) differentially private approaches, and ii) empirical approaches. Differentially private (DP) approaches such as DPGAN Xie et al. (2018), DP-cGAN Torkzadehmahani et al. (2019), and PATE-GAN Jordon et al. (2018) rely on adding noise during the training process (either to gradients or to labels). While these approaches come with formal guarantees of privacy, they have been shown to lead to poor sample quality for reasonable privacy bounds Hayes et al. (2019). Empirical approaches such as privGAN Mukherjee et al. (2019) show higher sample quality but provide no privacy guarantees. Such approaches typically demonstrate their privacy benefits by empirical evaluation against previously proposed attacks. While the empirical evaluations seen in Hayes et al. (2019); Hilprecht et al. (2019) provide a data-driven way to quantify privacy loss, they are often unprincipled and do not take into account the imbalanced nature of the membership inference problem Jayaraman and Evans (2019) (there are usually many more non-members than members).

On the other hand, differential privacy is capable of providing strong query-specific theoretical guarantees. Currently, all DP-based generative modeling approaches (to the best of our knowledge) treat the model parameters themselves as the query, thereby providing the strongest possible guarantees. However, since releasing the entire set of model parameters is not necessarily the only, or even the most common, query against generative models, there is a distinct need to quantify the membership loss for other types of queries (e.g., release of a partial model or release of only synthetic data). Furthermore, DP approaches require knowledge of the model training process to provide guarantees, which makes it impossible to provide guarantees retrospectively for pre-trained models. In the case of discriminative models, this has motivated the formal study of alternate approaches that quantify membership privacy loss without relying on purely empirical evaluation Jayaraman and Evans (2019); Jayaraman et al. (2020). Not only do these approaches provide an alternate way to quantify membership privacy loss for differentially private models, allowing them to be trained for more epochs or with less added noise, they also extend to non-differentially-private approaches.

In this paper, we build on these prior works to develop a flexible framework to measure membership privacy loss in generative models (MACE: Membership privACy Estimation). Our framework is built on a rigorous formulation of membership privacy loss (given a model and a query function) as a statistical divergence between the distributions of training-set and non-training-set samples. Our primary contributions are the following: i) we show a connection between our membership privacy loss and differential privacy; ii) we develop principled sample-based estimators to measure membership privacy leakage of pre-trained models against any scalar- or vector-valued query; iii) to deal with the imbalanced nature of membership privacy attacks, we allow users to select their preferred privacy loss metric from a generalized family of metrics Koyejo et al. (2014) and create custom statistical divergences. An overview of our framework is given in Figure 1. Finally, we test our framework using a variety of queries against different classes of generative models.

Figure 1: Overview of the MACE framework.

Preliminaries

Notation

Let $z$ be a data point drawn from the data distribution $\mathcal{D}$. Let $S = (z_1, \ldots, z_n)$ be an ordered list of $n$ points, sampled from $\mathcal{D}$, which we refer to as the training set. We write $z \sim S$ to denote uniform sampling from the finite training set $S$. We also write $z \sim \mathcal{D} \setminus S$ to denote uniform sampling from the data distribution excluding the training set $S$, which we refer to as sampling from the hold-out set. For a set of samples $\{z_i\}$, we define associated labels $\{m_i\}$, where $m_i = 1$ if $z_i$ is in the training set and $m_i = 0$ otherwise. In the rest of the paper, we assume the training set $S$ is fixed.

Generative Adversarial Networks (GANs)

Generative adversarial networks are the most common class of generative models. The original GAN algorithm Goodfellow et al. (2014) learns the distribution of a dataset by adversarially training two modules, a generator and a discriminator. The goal of the generator is to learn a transformation that converts a random vector into a realistic data sample; the goal of the discriminator is to reliably distinguish synthetic samples (produced by the generator) from real samples. The mathematical formulation of this problem is

$$\min_G \max_D \; \mathbb{E}_{x \sim p_{\text{data}}}\big[\log D(x)\big] + \mathbb{E}_{v \sim p_v}\big[\log\big(1 - D(G(v))\big)\big], \qquad (1)$$

where $p_{\text{data}}$ is the real data distribution and $p_v$ is the distribution of the random input vector $v$. In this work, we examine our framework on the original GAN and some of its variations.

Membership Inference Attack

The goal of a membership inference attack (MIA) Li et al. (2013); Shokri et al. (2017); Truex et al. (2018); Long et al. (2017) is to infer whether a sample was used in the training data or not. As described previously in Mukherjee et al. (2019), for a given learning algorithm, the goal of an MIA adversary is to learn the function that maps a sample $z$ to $1$ if $z$ is in the training set and to $0$ otherwise.

In this paper, we focus on the specific case where the learning algorithm is a generative model (such as a GAN). The study of MIAs against generative models is a relatively new research area. Hayes et al. (2019) first demonstrated MIAs against GANs. They propose: i) a black-box adversary that trains a shadow GAN on the released synthetic data, and ii) a white-box adversary that thresholds the discriminator score of a released GAN. Hilprecht et al. (2019) demonstrates a black-box MIA adversary that uses only the generator of a GAN (or its synthetic samples) and operates by thresholding the L2 distance between the query sample and the closest synthetic sample. In summary, these existing works choose a particular query function of the pre-trained model and build a strong binary classifier as the adversary. Despite the empirical success of such approaches, several open problems remain: i) most papers use accuracy as the metric to evaluate MIA adversaries, yet in practice the hold-out set can be much larger than the training set, and accuracy is known to be a rather poor metric for imbalanced classification problems; ii) threshold-based adversaries are chosen heuristically and may be hard to transfer to different datasets and models. Our framework addresses these issues by deriving the theoretically strongest possible adversary for a generalized metric. Unlike other approaches, we also provide confidence intervals for individual-level privacy loss.

Membership Privacy

In this section, we formulate our membership privacy estimation framework.

Membership Privacy Framework

To evaluate the membership privacy leakage of a pre-trained model or a query of the model, we define Experiment 1, inspired by Yeom et al. (2018); Jayaraman et al. (2020). The experiment assumes we have sampled a training set $S$ of size $n$ from the data distribution $\mathcal{D}$. A learning algorithm is then trained on $S$, but we only have access to the pre-trained model through a query function $q$, which takes a sample from $\mathcal{D}$ as an input. In practical scenarios, data providers could release either partial/full model parameters (including the architecture) or only the output of the model. In the case of generative models, the output of the model corresponds to synthetic datasets. Specific choices of queries are discussed later. In this section, we do not distinguish between types of query function, for simplicity.

Experiment 1 (Yeom et al. (2018); Jayaraman et al. (2020)).

Let $\mathcal{T}$ be a learning algorithm and $\mathcal{D}$ the data distribution. We first sample a training set $S$ of size $n$ and obtain a model trained by algorithm $\mathcal{T}$. Let $q$ be a query function of the pre-trained model, and let $\mathcal{A}$ be an adversary, i.e., a binary classifier for membership inference. The membership experiment proceeds as follows:

  1. Randomly sample $m \in \{0, 1\}$ such that $m = 1$ with probability $\lambda$.

  2. If $m = 1$, uniformly sample $z$ from the training set $S$; otherwise, uniformly sample $z$ from the hold-out set.

  3. Output utility score $1$ if $\mathcal{A}(q(z)) = m$; otherwise output $0$.

Based on the utility scores, Yeom et al. (2018) defines the following metric for measuring privacy leakage through $q$.

Definition 1 (Yeom et al. (2018)).

The membership advantage of the query $q$ with respect to an adversary $\mathcal{A}$ is defined as

$$\mathrm{Adv}(\mathcal{A}, q) = \Pr[\mathcal{A}(q(z)) = 1 \mid m = 1] - \Pr[\mathcal{A}(q(z)) = 1 \mid m = 0], \qquad (2)$$

where the probability is taken over the random draw of $z$ from the training set or the hold-out set, while the training set itself is fixed. Note that if the adversary $\mathcal{A}$ is random guessing, we have $\mathrm{Adv} = 0$; if $\mathcal{A}(q(z)) = m$ always holds, we have $\mathrm{Adv} = 1$.

The prior probability $\lambda$ of sampling from the training set is usually set to $1/2$ to form a balanced binary classification problem. This assumption has been widely adopted, e.g., in Yeom et al. (2018). However, in practice, $\lambda$ is usually much smaller than $1/2$, as Jayaraman et al. (2020); Rezaei and Liu (2020) point out. To mitigate this imbalanced dataset problem, we introduce a generalized-metric-based estimator for practical purposes. The details are explained in a following section.

Bayes Optimal Classifier

As previously mentioned, a drawback of several existing MIA approaches is their use of heuristic classifiers as adversaries. The inherent drawback of such approaches is that the attack performance becomes a function of how good a classifier is selected. To overcome this, we seek to use the strongest possible classifier for the given classification task, i.e., the Bayes optimal classifier Sablayrolles et al. (2019); Mukherjee et al. (2019). This leads to the following lemma.

Lemma 2 (Bayes optimal adversary, Sablayrolles et al. (2019)).

Given the query $q$, the data distribution $\mathcal{D}$, the training set $S$, and the prior probability $\lambda$, the Bayes optimal classifier maximizing the membership advantage is given by

(3)

Thus, we have

(4)

where

(5)

Note: Proofs of lemmas and theorems can be found in the Appendix. We will henceforth refer to the quantity in Eq. (5) as the individual-level privacy loss of a sample $z$. We have the following definition and lemma for membership privacy.

Definition 3 ($\varepsilon$-membership privacy).

We say a query function $q$ satisfies strong $\varepsilon$-membership privacy if and only if, for any choice of $z$, the individual-level privacy loss is at most $\varepsilon$:

(6)

Lemma 4.

If a query function is $\varepsilon$-membership private, then the membership advantage of any adversary $\mathcal{A}$ satisfies a corresponding upper bound determined by $\varepsilon$.

For simplicity, we refer to the advantage of the Bayes optimal adversary as the optimal membership advantage in the rest of the paper.

Connection to Differential Privacy

Membership privacy is connected to $\varepsilon$-differential privacy through the following theorem.

Theorem 1.

If the training algorithm is $\varepsilon$-differentially private, then for any choice of query $q$ and sample $z$, the individual-level privacy loss satisfies

(7)

where the constant in the bound depends on the prior probability $\lambda$.

Specifically, when the prior probability is $1/2$, the bound simplifies. As an example, plugging in a concrete privacy budget immediately yields a numerical bound. This provides an intuitive way to understand the privacy budget selection in DP methods.

Membership Privacy Testing

In this section, we aim to provide a practical and principled approach to estimating the individual-level privacy loss and the optimal membership advantage through Experiment 1.

Membership Privacy Testing

First, we construct an estimator of the individual-level privacy loss. We then use it to estimate the optimal membership advantage. For simplicity, denote by $p_1$ the distribution of $q(z)$ for training samples ($m = 1$) and by $p_0$ the distribution for hold-out samples ($m = 0$); their support is the range of the query function $q$. The problem now reduces to estimating the statistical difference between $p_1$ and $p_0$ defined by Eq. (5). In the following sub-sections, we provide practical estimators for both discrete and continuous query functions $q$.

Discrete Query

When $q$ is discrete, for a particular output $t$, let $p_1(t)$ and $p_0(t)$ denote the probabilities of observing $t$ for training and hold-out samples, respectively. Frequency-based plug-in methods have been widely used empirically Mukherjee et al. (2019); Yaghini et al. (2019) to estimate these probabilities: prior works simply collect samples from Experiment 1 and plug in the empirical fractions as estimates of $p_1(t)$ and $p_0(t)$. We account for the estimation error of this process using Clopper-Pearson confidence intervals Clopper and Pearson (1934): we compute a Clopper-Pearson lower bound and a Clopper-Pearson upper bound for the two frequencies, which yields the following confidence interval.

Lemma 5.

The Clopper-Pearson bounds above yield a confidence interval for the individual-level privacy loss at each discrete output $t$.
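For concreteness, the following Python sketch computes exact Clopper-Pearson bounds with scipy and combines them into a conservative bound on the frequency ratio between training and hold-out samples at a given discrete output. The function names and the particular ratio-based combination are illustrative assumptions, not the exact construction of Lemma 5.

```python
import numpy as np
from scipy.stats import beta

def clopper_pearson(k, n, alpha=0.05):
    """Exact (1 - alpha) two-sided Clopper-Pearson interval for a binomial proportion."""
    lower = beta.ppf(alpha / 2, k, n - k + 1) if k > 0 else 0.0
    upper = beta.ppf(1 - alpha / 2, k + 1, n - k) if k < n else 1.0
    return lower, upper

def discrete_ratio_bound(train_outputs, holdout_outputs, t, alpha=0.05):
    """Conservative upper bound on p1(t) / p0(t) for a discrete query output t,
    using an upper CP bound for the training frequency and a lower CP bound for
    the hold-out frequency (illustrative; not the paper's exact Lemma 5)."""
    n1, n0 = len(train_outputs), len(holdout_outputs)
    k1 = int(np.sum(np.asarray(train_outputs) == t))
    k0 = int(np.sum(np.asarray(holdout_outputs) == t))
    _, p1_hi = clopper_pearson(k1, n1, alpha)
    p0_lo, _ = clopper_pearson(k0, n0, alpha)
    return np.inf if p0_lo == 0 else p1_hi / p0_lo

# Example: discriminator scores discretized into 100 equally spaced bins.
rng = np.random.default_rng(0)
train_bins = rng.integers(0, 100, size=5000)    # stand-ins for binned scores
holdout_bins = rng.integers(0, 100, size=5000)
print(discrete_ratio_bound(train_bins, holdout_bins, t=42))
```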

Continuous Query

If $q$ is a $d$-dimensional continuous random variable, let $p_1$ and $p_0$ denote the densities of the query outputs for training and hold-out samples, respectively. We first use kernel density estimators (KDEs) for both $p_1$ and $p_0$.

Recall that for samples $t_1, \ldots, t_N$ from an unknown distribution with density $p$, a KDE with bandwidth $h$ and kernel function $K$ is given by

$$\hat{p}_h(t) = \frac{1}{N h^d} \sum_{i=1}^{N} K\!\left(\frac{t - t_i}{h}\right). \qquad (8)$$

We then have the following plug-in confidence interval for the KDE.

Lemma 6 (Chen (2017)).

With probability $1 - \alpha$, the KDE satisfies the pointwise confidence bound

(9)

whose width depends on the kernel, the bandwidth, the sample size, and the $1 - \alpha/2$ quantile of a standard normal distribution.

Using this, we have the following confidence interval for the individual-level privacy loss.

Lemma 7.

The corresponding confidence interval follows by combining the KDE confidence bounds for $p_1$ and $p_0$.
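The continuous-query estimator can be realized with off-the-shelf Gaussian KDEs. The sketch below is illustrative rather than the paper's implementation: it fits separate KDEs to training and hold-out query outputs and attaches a textbook normal-approximation pointwise confidence band; the exact constants in Lemmas 6 and 7 may differ, and the final line simply reports the plug-in log-density ratio on a grid.

```python
import numpy as np
from scipy.stats import gaussian_kde, norm

def kde_with_pointwise_ci(samples, eval_points, alpha=0.05):
    """Gaussian KDE density estimate with an asymptotic pointwise CI.
    Uses Var[p_hat(t)] ~= p(t) * R(K) / (n * h^d) for a Gaussian kernel,
    where R(K) = (2 * sqrt(pi))^(-d). This is a textbook approximation,
    not necessarily the exact interval of Lemma 6."""
    samples = np.atleast_2d(samples)          # shape (d, n), as gaussian_kde expects
    d, n = samples.shape
    kde = gaussian_kde(samples)               # bandwidth chosen by Scott's rule
    h = kde.factor * samples.std(axis=1).mean()
    p_hat = kde(np.atleast_2d(eval_points))
    r_kernel = (2.0 * np.sqrt(np.pi)) ** (-d)
    se = np.sqrt(np.maximum(p_hat, 0) * r_kernel / (n * h ** d))
    z = norm.ppf(1 - alpha / 2)
    return p_hat, p_hat - z * se, p_hat + z * se

# Example: 1-d discriminator scores for training vs. hold-out samples.
rng = np.random.default_rng(1)
q_train = rng.normal(0.6, 0.1, size=2000)     # stand-ins for q(z), z in S
q_hold = rng.normal(0.5, 0.1, size=2000)      # stand-ins for q(z), z not in S
grid = np.linspace(0.0, 1.0, 5)
p1, p1_lo, p1_hi = kde_with_pointwise_ci(q_train, grid)
p0, p0_lo, p0_hi = kde_with_pointwise_ci(q_hold, grid)
print(np.abs(np.log(p1 / p0)))                # plug-in log-density ratio on the grid
```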

Estimation of the Optimal Membership Advantage

For a particular sample $z$, we can now estimate a confidence interval for its individual-level privacy loss through Experiment 1. To estimate the optimal membership advantage, we first split the samples into two partitions: we use the first partition to estimate the empirical distributions (for discrete queries) or the KDEs (for continuous queries), and the second partition to estimate the advantage. The procedure is outlined in Algorithm 1.

Input: number of samples $N$, membership prior $\lambda$, training set $S$, query function $q$, confidence level
Output: estimated optimal membership advantage
1:  Perform Experiment 1 and draw two sets of samples $T_1$ and $T_2$ with membership labels $M_1$ and $M_2$, respectively.
2:  for each sample $z_i$ in $T_2$ do
3:        Estimate the individual-level privacy loss of $z_i$ using the distributions estimated from $T_1$.
4:  end for
5:  Aggregate the per-sample estimates over $T_2$ into the estimated optimal membership advantage.
Algorithm 1: Estimate the optimal membership advantage through Experiment 1
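Below is a minimal Python sketch of how Algorithm 1 can be realized for a scalar query. It assumes the reconstructed form of Definition 1 (advantage = TPR minus FPR) and a plug-in Bayes rule that compares the prior-weighted density estimates; the paper's exact estimator and the aggregation in Step 5 may differ. The `population` array stands in for the hold-out distribution, and the toy query at the end is a hypothetical memorization proxy.

```python
import numpy as np
from scipy.stats import gaussian_kde

def estimate_optimal_advantage(query_fn, train_set, population, lam=0.5,
                               n_samples=4000, rng=None):
    """Sketch of Algorithm 1: run Experiment 1, fit KDEs on the first half of
    the samples, apply the plug-in Bayes rule on the second half, and report
    the empirical advantage (TPR - FPR) of that rule."""
    rng = rng or np.random.default_rng()
    # Step 1: Experiment 1 -- draw samples with membership prior lam.
    members = rng.binomial(1, lam, size=n_samples)
    xs = [train_set[rng.integers(len(train_set))] if m == 1
          else population[rng.integers(len(population))] for m in members]
    q = np.array([query_fn(x) for x in xs])
    half = n_samples // 2
    # Steps 2-4: estimate p1, p0 from the first partition.
    p1 = gaussian_kde(q[:half][members[:half] == 1])
    p0 = gaussian_kde(q[:half][members[:half] == 0])
    # Plug-in Bayes decision on the second partition.
    q2, m2 = q[half:], members[half:]
    decide = (lam * p1(q2)) > ((1 - lam) * p0(q2))
    tpr = decide[m2 == 1].mean()
    fpr = decide[m2 == 0].mean()
    return tpr - fpr   # Step 5: empirical optimal membership advantage

# Toy usage: training points get noisy near-zero nearest-neighbor distances,
# so the query loosely mimics an overfit model's behavior.
rng = np.random.default_rng(2)
train = rng.normal(0.0, 1.0, size=(1000, 2))
pop = rng.normal(0.0, 1.0, size=(20000, 2))   # stand-in for the hold-out distribution
score = lambda x: -np.min(np.linalg.norm(train - x, axis=1)) + 0.05 * rng.normal()
print(estimate_optimal_advantage(score, train, pop, lam=0.5, rng=rng))
```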

Membership Privacy Estimation by Generalized Metric

As pointed out in Jayaraman et al. (2020); Rezaei and Liu (2020), membership privacy leakage is in reality a highly imbalanced problem. For example, the training set in a medical dataset may consist of data from patients admitted to a clinical study with a particular health condition, while the distribution may represent data from all patients (in the world) with the same condition. Notably, in previous works Hayes et al. (2019); Hilprecht et al. (2019); Mukherjee et al. (2019) where accuracy/precision/recall are used as metrics to measure privacy risk, datasets are randomly split into a training set and a hold-out set with an imbalanced prior. For example, Hayes et al. (2019) uses only a fraction of the dataset for training, and the rest serves as the hold-out set. As pointed out in Rezaei and Liu (2020), for discriminative models one can construct a trivial attack by simply checking whether a sample is correctly classified. Such a simple attack would result in high accuracy/precision/recall during the privacy evaluation stage but would be useless in practice.

To overcome these issues, Jayaraman and Evans (2019); Jayaraman et al. (2020) suggest using the positive predictive value (PPV), defined as the fraction of true members among all positive membership predictions made by an adversary. Here, we allow users even more flexibility by adopting a generalized metric, first defined in Koyejo et al. (2014), which can be written as a ratio of linear combinations of the following population quantities for binary classification: true positives (TP), false positives (FP), false negatives (FN), and true negatives (TN).

The generalized metric is defined as

$$\Phi = \frac{a_{11}\,\mathrm{TP} + a_{10}\,\mathrm{FP} + a_{01}\,\mathrm{FN} + a_{00}\,\mathrm{TN} + a_0}{b_{11}\,\mathrm{TP} + b_{10}\,\mathrm{FP} + b_{01}\,\mathrm{FN} + b_{00}\,\mathrm{TN} + b_0}, \qquad (10)$$

where the $a$ and $b$ coefficients are pre-defined scalars. For example, commonly used metrics can be represented as follows:

$$\mathrm{Accuracy} = \mathrm{TP} + \mathrm{TN}, \qquad (11)$$
$$\mathrm{PPV} = \frac{\mathrm{TP}}{\mathrm{TP} + \mathrm{FP}}. \qquad (12)$$
Thus, through Experiment 1, we define the following metric to measure membership privacy.

Definition 8.

The expected membership advantage of an adversary $\mathcal{A}$ for query $q$ under the generalized metric $\Phi$ is defined as

(13)

Now we seek to estimate the optimal adversary for this generalized metric. The characterization of the Bayes classifier in Lemma 2 is no longer applicable. In fact, Koyejo et al. (2014) have shown that the Bayes optimal classifier thresholds the posterior probability of membership, i.e., it takes the form $\mathbb{1}[\eta(t) > \delta^*]$, where $\eta(t) = \Pr[m = 1 \mid q(z) = t]$ and $\delta^*$ is a constant that depends on the metric $\Phi$.

In this case, based on Koyejo et al. (2014), we derive the following consistent two-step plug-in estimator for the optimal membership advantage under the generalized metric $\Phi$. Similarly to Algorithm 1, we first split the samples into three partitions: we use the first partition to obtain an estimator of the posterior $\eta$, the second partition to find the optimal threshold, and the third to estimate membership privacy under $\Phi$ with the resulting classifier. The procedure is outlined in Algorithm 2 (a sketch follows the algorithm). As before, if $q$ is discrete we apply the plug-in estimator directly, and if $q$ is continuous we apply the KDE estimator in Step 2.

Input: number of samples $N$, membership prior $\lambda$, training set $S$, query function $q$, generalized metric $\Phi$
Output: estimated membership privacy under $\Phi$
1:  Perform Experiment 1 and draw three sets of samples $T_1$, $T_2$, and $T_3$ with membership labels $M_1$, $M_2$, and $M_3$, respectively.
2:  Estimate the posterior $\hat{\eta}$ from the samples in $T_1$.
3:  Find the threshold $\hat{\delta}$ that maximizes $\Phi$ on $T_2$.
4:  Let $\hat{\mathcal{A}}(t) = \mathbb{1}[\hat{\eta}(t) > \hat{\delta}]$.
5:  Evaluate $\Phi$ for $\hat{\mathcal{A}}$ on $T_3$.
Algorithm 2: Estimate membership privacy under the generalized metric $\Phi$ through Experiment 1
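A possible realization of Algorithm 2 for a scalar query is sketched below. The metric functions (accuracy and PPV), the KDE-based posterior estimate, and the grid search over the threshold are illustrative choices under the reconstructed forms of Eqs. (11)-(12); they are not the paper's exact implementation.

```python
import numpy as np
from scipy.stats import gaussian_kde

def ppv(tp, fp, fn, tn):
    return tp / (tp + fp) if (tp + fp) > 0 else 0.0

def accuracy(tp, fp, fn, tn):
    return (tp + tn) / (tp + fp + fn + tn)

def confusion(pred, m):
    tp = np.sum((pred == 1) & (m == 1)); fp = np.sum((pred == 1) & (m == 0))
    fn = np.sum((pred == 0) & (m == 1)); tn = np.sum((pred == 0) & (m == 0))
    return tp, fp, fn, tn

def algorithm2(q, m, lam, metric=ppv, n_grid=99):
    """Two-step plug-in estimator (sketch): estimate the posterior eta on T1,
    pick the threshold maximizing the metric on T2, then evaluate on T3."""
    idx = np.array_split(np.arange(len(q)), 3)      # three partitions T1, T2, T3
    q1, m1 = q[idx[0]], m[idx[0]]
    p1 = gaussian_kde(q1[m1 == 1]); p0 = gaussian_kde(q1[m1 == 0])
    eta = lambda t: lam * p1(t) / (lam * p1(t) + (1 - lam) * p0(t))
    thresholds = np.linspace(0.01, 0.99, n_grid)
    scores = [metric(*confusion((eta(q[idx[1]]) > d).astype(int), m[idx[1]]))
              for d in thresholds]
    d_star = thresholds[int(np.argmax(scores))]     # threshold chosen on T2
    pred3 = (eta(q[idx[2]]) > d_star).astype(int)
    return metric(*confusion(pred3, m[idx[2]]))     # evaluate on T3
```

Passing `metric=accuracy` corresponds to the balanced case, while `metric=ppv` matches the imbalanced setting discussed above.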

Attack on Generative Models

In this section, we seek to understand what kinds of information could be released to the public with limited privacy leakage. In DP-based generative models, the weights of both the discriminator and the generator are differentially private, so it is safe to release the whole pre-trained model, including weights and architecture, since membership privacy is bounded via Theorem 1. However, such strong privacy guarantees require a large amount of noise added to the gradients, which results in low sample quality. In this section, we construct several example query functions so that we can compare membership privacy leakage under different data-release scenarios.

Following Chen et al. (2019); Hilprecht et al. (2019); Hayes et al. (2019), we divide our attack settings based on accessibility of model components: (1) access to models; (2) access only to generated images. In the following sub-sections we consider some representative examples of query functions for each of these two attack settings.

Accessible Models

In this case, the adversary has access to the models. Typically, white-box attacks assume that the adversary has access to partial or full model parameters, while black-box attacks assume that the adversary can only query a model API released to the public. Existing work Sablayrolles et al. (2019) has shown theoretically that releasing model parameters does not make the attack more successful if the model is properly trained. For generative models, especially GANs, the most successful known attack uses the output of the discriminator Hayes et al. (2019): intuitively, if a sample appeared in the training set, the discriminator is more likely to output a high value. Notice that prior works do not use the Bayes optimal adversary and instead use a heuristic adversary that thresholds discriminator scores, with the threshold selected manually.
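As an illustration, a discriminator-score query can be wrapped as follows (assuming a Keras-style discriminator with a scalar output; the model handle and preprocessing are placeholders), alongside the manually thresholded adversary used in prior work for contrast:

```python
import numpy as np

def make_discriminator_query(discriminator):
    """Wrap a trained discriminator (e.g., a Keras model mapping an image to a
    scalar score) as a query function q(z). The discriminator handle is a
    placeholder for whatever model the data provider releases."""
    def q(z):
        z = np.asarray(z, dtype=np.float32)[None, ...]   # add batch dimension
        return float(discriminator.predict(z, verbose=0).ravel()[0])
    return q

# A manually thresholded adversary, as in prior work, for comparison with the
# Bayes-optimal plug-in rule: predict "member" when the score exceeds tau.
def threshold_adversary(q, tau):
    return lambda z: int(q(z) > tau)
```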

Accessible Synthetic Datasets

In the common practice of synthetic data release, researchers or data providers may consider releasing only generated datasets or just the generator. However, prior works Chen et al. (2019); Hilprecht et al. (2019) have shown that releasing the generator or synthetic datasets can also cause privacy leakage. Specifically, for the case where generators are released, Chen et al. (2019) considers a query function based on the released generator. Hilprecht et al. (2019) first generates a large synthetic dataset using the generator and then uses the following query function:

(14)

where $d$ is some distance metric and $B_\epsilon$ is the $\epsilon$-ball defined with respect to $d$. In both cases, it is reported that attacking generators or synthetic datasets is not as successful as attacking discriminator scores.

Similar to these approaches, here we consider a simple query function (defined below). For a sample $z$ drawn in Experiment 1, we use the nearest-neighbor distance to the synthetic dataset $X_{\text{syn}}$ as the query function:

$$q(z) = \min_{\hat{x} \in X_{\text{syn}}} d(z, \hat{x}), \qquad (15)$$

where $d$ is a distance metric. In the experiments section, we show performance comparable to Hilprecht et al. (2019).
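A sketch of this query, assuming the synthetic samples and the query sample have already been mapped into a common embedding space (the embedding step itself is omitted and the stand-in data are illustrative):

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def make_nn_distance_query(synthetic_embeddings):
    """q(z) = min over synthetic samples x_hat of d(z, x_hat), with d the
    Euclidean distance in embedding space. `synthetic_embeddings` is an (N, D)
    array of embedded generator outputs."""
    nn = NearestNeighbors(n_neighbors=1).fit(synthetic_embeddings)
    def q(z_embedding):
        dist, _ = nn.kneighbors(np.atleast_2d(z_embedding))
        return float(dist[0, 0])
    return q

# Usage with random stand-in embeddings (a real pipeline would embed generator
# samples and query samples with the same network).
rng = np.random.default_rng(3)
q = make_nn_distance_query(rng.normal(size=(10000, 64)))
print(q(rng.normal(size=64)))
```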

Note: Aside from these two common classes of attacks, hybrid attacks that combine them may also be possible. Since the design of such attacks is beyond the scope of this paper, we leave it for future work. Furthermore, in the main paper we restrict ourselves to demonstrating our framework on some common classes of attacks with scalar-valued queries. However, attacks against vector-valued queries may be useful in special circumstances (such as attacking models with multiple generators/discriminators Mukherjee et al. (2019)); some examples of such attacks are reported in the Appendix.

Figure 2: Training progress (epochs) vs. membership advantage of WGAN-GP estimated from discriminator scores, with average confidence intervals: (a) 100 bins of discriminator scores on MNIST; (b) Gaussian KDE with bandwidth 1 on MNIST; (c) 100 bins of discriminator scores on CIFAR-10; (d) Gaussian KDE with bandwidth 1 on CIFAR-10; (e) 100 bins of discriminator scores on CIFAR-10 with a different training-set size; (f) Gaussian KDE with bandwidth 1 on CIFAR-10 with a different training-set size.

Experiments

In this section, we perform practical membership privacy estimation on pre-trained generative models.

Setup

We first demonstrate the effectiveness of our methods on the MNIST and CIFAR-10 datasets using WGAN-GP Gulrajani et al. (2017) (and the original GAN Goodfellow et al. (2014) in the appendix). MNIST contains 70,000 grayscale images of handwritten digits from 0 to 9. CIFAR-10 comprises 10 classes of 32x32 RGB images, with 60,000 images in total. Both are commonly used in the GAN literature. Following common practice in membership inference attacks on generative models Hayes et al. (2019); Mukherjee et al. (2019); Chen et al. (2019), we choose a random subset of the entire dataset as the training set to show overfitting. For simplicity, we perform Experiment 1 without replacement and use the whole dataset for estimation.

Estimation with Confidence Intervals

We begin our privacy estimation with discriminator scores, which is also the most popular attack for discriminative models. To compare discrete and continuous queries, we use the same setting as prior works Yaghini et al. (2019); Mukherjee et al. (2019) and bin the discriminator scores into 100 equally spaced bins. We then apply Algorithm 1 to the discrete confidence scores from the discriminator. Note that such discretization mimics a common real-world scenario in which we do not observe the exact confidence output (e.g., the COMPAS system outputs a score from 1 to 10 instead of a full confidence score). As a comparison, we also apply a KDE with a Gaussian kernel and a fixed bandwidth. Throughout the experiments, we fix the prior and the confidence level. Figure 2 shows the estimated optimal membership advantage and the average confidence interval over the course of training. The confidence interval of the KDE falls entirely inside the confidence interval of the discretized query, which shows that our KDE-based Algorithm 1 is more accurate than the commonly used binning strategy Mukherjee et al. (2019). The estimated membership advantage starts at 0 and increases with training: as expected, with more epochs, the model starts to memorize the training set.

Figure 3 (a) shows the membership advantage estimated from generated synthetic datasets using the nearest-neighbor distance. The L2 distance is not perceptually meaningful for these 28x28 images in pixel space; instead, we compute an embedding for each sample using a pre-trained VGG network with perceptual loss Zhang et al. (2018). The estimated membership advantage is much smaller than with discriminator scores, indicating that the privacy risk of publishing a generated synthetic dataset is lower than that of publishing the white-box model, including discriminator and generator. Our results are also comparable to Hilprecht et al. (2019). Figure 3 (b) shows the estimated membership advantage for the same model trained on MNIST, for different sizes of the synthetic dataset. Increasing the size of the synthetic dataset increases the risk of privacy leakage, though releasing synthetic data remains significantly more private than releasing the models.

Figure 3: (a) Training progress (epochs) vs. membership advantage estimated from generated synthetic datasets; (b) estimated membership advantage for different synthetic dataset sizes.

Estimation with Generalized Metrics

As mentioned previously, our flexible privacy estimation framework can be used to assess privacy under generalized metrics. As a sanity check, we compare PPV and accuracy as metrics for membership privacy estimation, since both have been commonly used in the MIA literature Jayaraman and Evans (2019); Jayaraman et al. (2020). We set the prior so as to construct an imbalanced dataset. Figure 4 shows PPV and accuracy during training. The baseline accuracy for such an imbalanced dataset is 0.9, which can lead to misleading reporting. PPV, in contrast, provides a better understanding of how successful the adversary is at predicting training-set membership Jayaraman and Evans (2019). Our generalized metric allows users to explore a wide variety of other metrics and potentially identify the metrics best suited to specific situations.

Figure 4: Training progress (epochs) vs. PPV and accuracy on MNIST, estimated from discriminator scores using Algorithm 2.

Differential Privacy vs. Membership Privacy

As shown in Theorem 1, membership privacy loss is bounded by differential privacy guarantees. Differential privacy considers the worst-case scenario, in which the privacy loss of each individual sample is bounded. Depending on the data distribution, it is possible that the expected membership advantage is small while some outliers in the data are much more vulnerable than others. In this section, we first examine the privacy risk of subgroups of samples in non-DP models and then perform experiments on DP-cGAN Torkzadehmahani et al. (2019).

As pointed out by Yaghini et al. (2019), membership leakage can be worse for some subgroups than for others in discriminative models. We construct a toy dataset from MNIST by forming a new imbalanced dataset containing digit zeros and digit sixes; this dataset has also been used for anomaly detection Bandaragoda et al. (2014). We perform our privacy estimation by KDE for each subgroup. Figure 5 (a) shows that the membership advantage for digit zeros is consistently lower than for digit sixes. This means that even if the average-case membership advantage is small, the privacy risk for specific subgroups can be larger.

Figure 5: (a) Estimated membership advantage (by Algorithm 1) for digit zeros and digit sixes over training epochs; (b) membership advantage estimated for DP-cGAN as a function of the differential privacy budget.

Figure 5 (b) shows the membership advantage of DP-cGAN Torkzadehmahani et al. (2019) trained with different privacy budgets. As we can see, the theoretical upper bound implied by the privacy budget is much larger than the membership advantage estimated from discriminator scores.

Figure 6 shows the per-instance privacy loss for both privacy budgets. As expected, even the worst observed per-instance privacy loss is strictly below the worst-case bound implied by the corresponding budget.

Figure 6: Per-instance privacy loss for DP-cGAN under two privacy budgets: (a) per-instance loss for the first budget; (b) per-instance loss for the second budget.

Conclusions

Understanding membership privacy is critical for data providers who release synthetic datasets to the public. Here we developed a formal framework that provides a certificate of privacy leakage given a query function of a pre-trained model. We demonstrated the flexibility of our framework across multiple query types, model types, and metrics.

Our experiments indicate that releasing pre-trained GAN models results in much higher membership privacy leakage than releasing only synthetic datasets. However, we also show that the larger the released synthetic dataset, the greater the membership privacy leakage. Finally, while DP-based methods come with strong worst-case privacy guarantees, realistic MIAs against them are often much less effective than those guarantees suggest, even at the per-instance level.

While we focus on commonly used queries in this paper, better-designed queries could lead to stronger MIAs. Future work can aim at using our MACE framework to identify better queries for generative models in general as well as for specific models.

References

Ethics statement

Our current work is motivated by ethical concerns around the use of generative models to enable privacy-preserving sharing of sensitive data. This has become a popular area of research due to the new General Data Protection Regulation (GDPR) policies, which enforce strict regulations on the sharing of personal data. Our framework provides a practical way to evaluate privacy loss in generative models in response to specific MIAs. We hope that this will help both algorithm developers and data sharers audit the privacy loss of their algorithms. It is important, however, to be mindful that MACE provides query-specific certificates, which should not be conflated with how private a model is against any attack. Furthermore, when comparing different algorithms, note that algorithms trained on different portions of the data may not be directly comparable; performing such comparisons may lead to misleading conclusions.

Appendix

Proof of Lemma 2

Proof.

By Definition 1, the membership advantage is the difference between the adversary's true positive rate and false positive rate. This is maximized by the Bayes classifier given in Lemma 2, and plugging in this Bayes classifier yields the optimal membership advantage. Sablayrolles et al. (2019) provides a similar analysis for accuracy. ∎

Proof of Theorem 1

Proof.

$\varepsilon$-differential privacy is defined as follows:

Definition 9 (pure differential privacy).

We say a randomized algorithm $\mathcal{M}$ is $\varepsilon$-differentially private if, for any pair of neighbouring databases $S$ and $S'$ that differ in one record and any output event $O$, we have

$$\Pr[\mathcal{M}(S) \in O] \le e^{\varepsilon}\, \Pr[\mathcal{M}(S') \in O]. \qquad (16)$$

By the post-processing property, $\varepsilon$-differential privacy implies that the query output satisfies, for any record $z$, whether or not $z$ is in the dataset,

(17)

Theorem 1 then follows directly from the fact that

(18)

∎

Proof of Lemma 6 and Lemma 7

Proof.

This follows from the fact that the mapping involved is monotonically increasing on the relevant domain. ∎

Model architectures and hyper-parameters

Here we outline the layers used in the model architectures for the different datasets (a Keras sketch of the MNIST version follows the lists below). The last layers of the discriminators in the WGAN experiments do not have sigmoid activation functions. The hyper-parameters are chosen to be the same as in Goodfellow et al. (2014); Gulrajani et al. (2017).

MNIST

Generator layers

  • Dense(units, input size)

  • LeakyReLU()

  • Dense(units)

  • LeakyReLU()

  • Dense(units)

  • LeakyReLU()

  • Dense(units, activation = ’tanh’)

Discriminator layers

  • Dense(units)

  • LeakyReLU()

  • Dense(units)

  • LeakyReLU()

  • Dense(units)

  • LeakyReLU()

  • Dense(units, activation = ’sigmoid’)
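For concreteness, a Keras sketch of the MNIST generator and discriminator lists above is given below. The layer order follows the lists, but the unit counts, LeakyReLU slope, and latent dimension are not given in this version of the text and are assumed values rather than the paper's hyper-parameters; for the WGAN-GP experiments the final sigmoid would be dropped, as noted above.

```python
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Dense, Input, LeakyReLU

LATENT_DIM = 100          # assumed latent size
ALPHA = 0.2               # assumed LeakyReLU slope

def build_mnist_generator():
    # Dense/LeakyReLU stack ending in a tanh layer producing a flattened 28x28 image.
    return Sequential([
        Input(shape=(LATENT_DIM,)),
        Dense(256), LeakyReLU(ALPHA),
        Dense(512), LeakyReLU(ALPHA),
        Dense(1024), LeakyReLU(ALPHA),
        Dense(28 * 28, activation="tanh"),
    ])

def build_mnist_discriminator():
    # Mirror-image Dense/LeakyReLU stack; drop the final sigmoid for WGAN-GP.
    return Sequential([
        Input(shape=(28 * 28,)),
        Dense(1024), LeakyReLU(ALPHA),
        Dense(512), LeakyReLU(ALPHA),
        Dense(256), LeakyReLU(ALPHA),
        Dense(1, activation="sigmoid"),
    ])
```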

CIFAR-10

Generator layers

  • Conv2DTranspose(filters, kernel size, strides)

  • ReLU()

  • Conv2DTranspose(filters, kernel size, strides, padding)

  • ReLU()

  • Conv2DTranspose(filters, kernel size, strides, padding)

  • ReLU()

  • Conv2DTranspose(filters, kernel size, strides, padding, activation = 'tanh')

Discriminator layers

  • Conv2D(filters, kernel size, strides)

  • Reshape(target shape)

  • Conv2D(filters, kernel size, strides)

  • LeakyReLU()

  • Conv2D(filters, kernel size, strides)

  • LeakyReLU()

  • Conv2D(filters, kernel size, strides)

  • LeakyReLU()

  • Dense(units, activation = ’sigmoid’)

DP-cGAN implementation

We adapted the code and datasets from Torkzadehmahani et al. (2019) (https://github.com/reihaneh-torkzadehmahani/DP-CGAN). The privacy budget is varied over a range of values with a fixed delta and a fixed clipping value.

Multi-dimensional Query

Our framework can also be used with multi-dimensional query functions. As an example, we show an attack on generated synthetic datasets where the query function is the vector of distances to the $k$ nearest neighbors ($k = 3$ in Figure 7; a sketch follows this paragraph). Figure 7 shows results for a WGAN-GP model trained on MNIST.
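A sketch of this vector-valued query and the corresponding multivariate KDE (the library choices and the random stand-in embeddings are illustrative, not the paper's implementation):

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors
from scipy.stats import gaussian_kde

def knn_distance_query(synthetic_embeddings, k=3):
    """q(z) in R^k: distances to the k nearest synthetic samples."""
    nn = NearestNeighbors(n_neighbors=k).fit(synthetic_embeddings)
    return lambda z: nn.kneighbors(np.atleast_2d(z))[0].ravel()

def fit_multivariate_kdes(q_train, q_hold):
    """Fit separate k-dimensional KDEs to member and non-member query vectors.
    gaussian_kde expects data of shape (k, n)."""
    return gaussian_kde(np.asarray(q_train).T), gaussian_kde(np.asarray(q_hold).T)

# Toy usage with random embeddings standing in for embedded synthetic samples.
rng = np.random.default_rng(4)
q = knn_distance_query(rng.normal(size=(5000, 32)), k=3)
q_members = np.stack([q(rng.normal(size=32)) for _ in range(500)])
q_nonmembers = np.stack([q(rng.normal(size=32)) for _ in range(500)])
p1, p0 = fit_multivariate_kdes(q_members, q_nonmembers)
print(np.abs(np.log(p1(q_members[:3].T) / p0(q_members[:3].T))))
```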

Figure 7: Training progress (epochs) vs. membership advantage estimated from the 3-nearest-neighbor distances to the generated synthetic dataset.

Privacy Estimation for JS-GAN

Here we also include membership privacy estimation results for JS-GAN Goodfellow et al. (2014) in Figure 8.

Figure 8: Training progress (epochs) vs. membership advantage of JS-GAN Goodfellow et al. (2014) estimated from discriminator scores, with confidence intervals: (a) 100 bins of discriminator scores on MNIST; (b) Gaussian KDE on MNIST; (c) 100 bins of discriminator scores on CIFAR-10; (d) Gaussian KDE on CIFAR-10.

Multi-dimensional Query for privGAN

To demonstrate the utility of attacks with multi-dimensional queries against non-standard generative models, we present a case study of privGAN Mukherjee et al. (2019). privGAN is a GAN architecture that consists of multiple generators and discriminators and is designed to provide membership privacy. The training set is partitioned to train each generator-discriminator pair separately, and a privacy discriminator is used as a regularizer to prevent overfitting. The attack used in Mukherjee et al. (2019) first computes the score of each discriminator, attacks using each score separately, and then takes the maximum as the final estimated privacy loss. Here we instead use a 2-d query consisting of both discriminator scores to formulate an alternative attack. Figure 9 compares the two strategies. We use the same hyper-parameters and privacy ratio as Mukherjee et al. (2019), with two discriminator-generator pairs as an example. The red curve is based on the maximum of the two 1-d attacks on the individual discriminator scores at each epoch; the blue curve is based directly on the 2-d query. As Figure 9 shows, the 2-d query yields a much higher membership advantage than the combination of the two 1-d queries. Although privGAN overfits less than a JS-GAN (Figure 8), releasing both discriminators could therefore increase the privacy risk.

Figure 9: Training progress (epochs) vs. membership advantage of privGAN estimated from the two discriminator scores.