Generative machine learning models such as Variational Autoencoders (VAE) and Generative Adversarial Networks (GAN) infer rules about the distribution of training data to generate new images, tables or numeric datasets that follow the training data distribution. The decision whether to use a GAN or a VAE depends on the learning task and dataset. However, similar to machine learning models for classification [CLK+18, FJR15, NSH19, SSSS17, ZLH19], trained generative models leak information about individual training data records [CYZF20, HMDC19, HHB19]. Anonymization of the training data or a training optimizer with differential privacy (DP) can reduce such leakage by limiting the privacy loss that an individual in the training data would encounter when contributing their data [ACM+16, BRG+21, JE19]. Depending on the privacy parameter, differential privacy has a significant impact on the accuracy of the generative model since the perturbation affects how closely generated samples follow the training data distribution. Balancing privacy and accuracy for differentially private generative models is a challenging task for data scientists since the privacy parameter only states an upper bound on the privacy loss. In contrast, quantifying the privacy loss under a concrete attack such as membership inference allows us to quantify and compare the accuracy-privacy trade-off between differentially private generative models.
This paper compares the privacy-accuracy trade-off for differentially private VAE. This is motivated by previous work that has identified VAE as more prone to membership inference attacks than GAN [HHB19]. Hence, data scientists may want to particularly consider the use of differential privacy when training VAE. In particular, we formulate an experimental study to validate whether our methodology allows us to identify sweet spots w.r.t. the privacy-accuracy trade-off in VAE. We conduct experiments for two datasets covering image and activity data, and for three different local and central differential privacy mechanisms. We make the following contributions:
Quantifying the privacy-accuracy trade-off under membership inference attacks for differentially private VAE.
Comparing local and central differential privacy w.r.t. the privacy-accuracy trade-off for image and motion data VAE.
This paper is structured as follows. Preliminaries are provided in Section 2. We formulate our approach for quantifying and comparing the privacy-accuracy trade-off for DP VAE in Section 3. Section 4 introduces reference datasets and learning tasks. Section 5 presents the evaluation and is followed by a discussion in Section 6. We discuss related work in Section 7. Section 8 provides conclusions.
2 Preliminaries
In the following we provide preliminaries on VAE, MI and DP.
2.1 Variational Autoencoders
Generative models are trained to learn the joint probability distribution of features and labels of a training dataset. We focus on Variational Autoencoders (VAE) [KW14] as generative model. VAE consist of two neural networks: an encoder and a decoder. During training a record is given to the encoder, which outputs the mean and standard deviation of a Gaussian distribution. A latent variable is then sampled from this Gaussian distribution and fed into the decoder. After successful training the reconstruction should be close to the original record. During training two terms are minimized: first, the reconstruction error; second, the Kullback-Leibler (KL) divergence between the distribution of latent variables and the unit Gaussian. The KL divergence term prevents the network from only memorizing certain latent variables since the distribution should be similar to the unit Gaussian. Kingma et al. [KW14] motivate the training objective as a lower bound on the log-likelihood and suggest training encoder and decoder jointly by using the reparameterization trick. Samples are generated from the VAE by sampling a latent variable and passing it through the decoder. Similar to conditional GAN, conditional VAE generate samples for a specific label by utilizing a condition as input to encoder and decoder.
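To make the sampling step above concrete, the following minimal numpy sketch shows the reparameterization trick and the KL term against the unit Gaussian. All names are our own illustration; the paper's actual models are neural networks rather than the toy arrays used here.

```python
import numpy as np

rng = np.random.default_rng(0)

def reparameterize(mu, log_var, rng):
    """Sample z = mu + sigma * eps with eps ~ N(0, I).

    Moving the randomness into eps keeps the path from the encoder
    outputs (mu, log_var) to z differentiable during training."""
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * log_var) * eps

def kl_divergence(mu, log_var):
    """KL(N(mu, sigma^2) || N(0, 1)), summed over latent dimensions."""
    return -0.5 * np.sum(1 + log_var - mu**2 - np.exp(log_var))

# Toy encoder outputs for a batch of 4 records with 2 latent dimensions.
mu = np.zeros((4, 2))
log_var = np.zeros((4, 2))
z = reparameterize(mu, log_var, rng)
print(z.shape)                     # (4, 2)
print(kl_divergence(mu, log_var))  # 0.0 when mu = 0 and sigma = 1
```

The KL term vanishes exactly when the latent distribution matches the unit Gaussian, which is why it counteracts memorization of record-specific latent codes.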
2.2 Reconstruction Membership Inference Attack against VAE
Membership inference (MI) attacks against machine learning models aim to identify the membership or non-membership of an individual record w.r.t. the training dataset of a target model. To exploit differences in the generated samples of a trained target model the MI adversary uses a statistical attack model. Therefore, the adversary computes a similarity or error metric for individual records. After having calculated such a metric for a set of records, the adversary labels the records with the highest similarity, or lowest error, as members and all other records as non-members. For VAE the reconstruction loss quantifies how close a reconstructed training record is to the original training data record. Based on the reconstruction loss Hilprecht et al. formulate the reconstruction MI attack against VAE, which outperforms prior work [HHB19]. The reconstruction MI attack assumes that a reconstructed training record will have a smaller reconstruction loss than a reconstructed test record and repeatedly computes the reconstruction for a record by drawing the latent variable from the record-specific latent distribution. The mean reconstruction distance over the drawn samples is then calculated by Eq. (1). Furthermore, the reconstruction MI attack depends on the availability of a distance measure. In this work we use the generic Mean Squared Error (MSE) and the image-domain-specific Structural Similarity Index Measure (SSIM) as distance measures. A record is likely a training record in case of a small mean reconstruction distance for MSE, or a similarity close to 1 for SSIM.
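The attack logic above can be sketched in a few lines. The following is an illustrative numpy version with MSE as distance measure; the `encode`/`decode` placeholders and the toy model are our own assumptions, not the paper's architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

def mean_reconstruction_distance(x, encode, decode, n_samples=10, rng=rng):
    """Average MSE between x and n_samples reconstructions, each drawn
    from the record-specific latent distribution (mu, sigma)."""
    mu, log_var = encode(x)
    dists = []
    for _ in range(n_samples):
        eps = rng.standard_normal(mu.shape)
        z = mu + np.exp(0.5 * log_var) * eps
        x_hat = decode(z)
        dists.append(np.mean((x - x_hat) ** 2))
    return float(np.mean(dists))

def attack(records, encode, decode, n_members):
    """Label the n_members records with the smallest mean
    reconstruction distance as training-set members."""
    d = [mean_reconstruction_distance(x, encode, decode) for x in records]
    order = np.argsort(d)
    members = set(order[:n_members].tolist())
    return [i in members for i in range(len(records))]

# Toy model: records inside [0, 1] reconstruct well, others do not.
encode = lambda x: (np.clip(x, 0.0, 1.0), np.full_like(x, -6.0))
decode = lambda z: z
records = [np.zeros(3), np.full(3, 5.0)]
print(attack(records, encode, decode, n_members=1))  # [True, False]
```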
2.3 Differential Privacy
For a dataset, differential privacy (DP) [dwork2006a] can either be used centrally to perturb a function or locally to perturb individual records. In central DP (CDP) an aggregation function is first evaluated and then perturbed by a trusted server. Due to the perturbation it is no longer possible for an adversary to confidently determine whether the function was evaluated on a dataset, or on some neighboring dataset differing in one record. Privacy is provided to records in the dataset as their impact on the function output is limited. Mechanisms that follow Definition 1 are used for the perturbation [dwork2006b]. CDP holds for all possible differences between neighboring datasets by scaling noise to the global sensitivity of Definition 2. To apply CDP in VAE we use a differentially private version [ACM+16] of the Adam [kingma2015] stochastic gradient optimizer (we used TensorFlow Privacy: https://github.com/tensorflow/privacy). We refer to this CDP optimizer as DP-Adam. DP-Adam represents a differentially private training mechanism that updates the weight coefficients of a neural network per training step based on clipped, Gaussian-perturbed gradients.
Definition 1 ((ε, δ)-Central Differential Privacy).
A mechanism M gives (ε, δ)-central differential privacy if for all datasets D, D′ differing in at most one element, and all outputs S ⊆ Range(M): Pr[M(D) ∈ S] ≤ e^ε · Pr[M(D′) ∈ S] + δ.
Definition 2 (Global Sensitivity).
Let D and D′ be neighboring. The global sensitivity of a function f, denoted by Δf, is defined as Δf = max_{D, D′} ‖f(D) − f(D′)‖.
Definition 3 (Gaussian Mechanism [dwork2014]).
Let ε ∈ (0, 1) be arbitrary. For c² > 2 ln(1.25/δ), the Gaussian mechanism with parameter σ ≥ c · Δf / ε gives (ε, δ)-CDP, adding noise scaled to N(0, σ²).
We refer to the perturbation of records as local DP (LDP) [wang2017]. LDP is the standard choice when the server which evaluates a function is untrusted. In the experiments within this work we use a local randomizer to perturb each record independently. Since a record may contain multiple correlated features, a local randomizer must be applied sequentially to each feature, which results in a linearly increasing privacy loss. We adapt the definitions of Kasiviswanathan et al. [KLN+08] in Definition 4 to achieve LDP by using local randomizers. A series of local randomizer executions per record composes to a local algorithm according to Definition 5. ε-local algorithms are ε-local differentially private [KLN+08], where ε is the summation of all composed privacy losses. In this work we will use the mechanism by Fan [Fan18] for LDP image pixelization. Their mechanism applies the Laplace mechanism of Definition 6 to each pixel. A neighborhood parameter represents the neighborhood in which LDP is provided. Full neighborhood for an image dataset would require that any picture can become any other picture. In general, providing DP or LDP within a large neighborhood will require high privacy parameter values to retain meaningful image structure. Small privacy parameters will result in random black and white images.
Definition 4 (Local Differential Privacy).
A local randomizer (mechanism) LR is ε-local differentially private, if ε ≥ 0 and for all possible inputs v, v′ and all possible outcomes o of LR: Pr[LR(v) = o] ≤ e^ε · Pr[LR(v′) = o].
Definition 5 (Local Algorithm).
An algorithm is ε-local if it accesses the database D via local randomizers with the following restriction: for all indices i, if LR_1, …, LR_k are the algorithm's invocations of local randomizers on index i, where each LR_j is an ε_j-local randomizer, then ε_1 + … + ε_k ≤ ε.
Definition 6 (Laplace Mechanism [dwork2014]).
Given a numerical query function f, the Laplace mechanism M(x) = f(x) + L is an ε-differentially private mechanism, where the noise L is drawn from a Laplace distribution with scale Δf / ε.
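As an illustration of Definition 6, the sketch below adds Laplace noise per feature and applies it pixel-wise under sequential composition. This is a deliberately simplified per-pixel variant for exposition only; Fan's actual pixelization mechanism [Fan18] differs (it operates on averaged pixel grids), so treat the function names and budget split here as our own assumptions.

```python
import numpy as np

def laplace_mechanism(value, sensitivity, epsilon, rng):
    """epsilon-DP Laplace mechanism: add Lap(0, sensitivity/epsilon) noise."""
    scale = sensitivity / epsilon
    return value + rng.laplace(loc=0.0, scale=scale, size=np.shape(value))

def pixelize_ldp(image, epsilon, rng):
    """Naive per-pixel LDP sketch: each pixel in [0, 255] is one feature
    with sensitivity 255, and sequential composition splits the total
    budget epsilon evenly across all pixels."""
    n_pixels = image.size
    noisy = laplace_mechanism(image.astype(float), 255.0,
                              epsilon / n_pixels, rng)
    return np.clip(noisy, 0, 255)

rng = np.random.default_rng(0)
img = np.full((8, 8), 128.0)
out = pixelize_ldp(img, epsilon=64.0, rng=rng)
print(out.shape)  # (8, 8)
```

The even budget split illustrates why the per-record privacy loss grows linearly with the number of correlated features, as noted above.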
We furthermore use a domain-independent LDP mechanism specifically for VAE, to which we refer as VAE-LDP. VAE-LDP by Weggenmann et al. [WRA+22] allows a data scientist to use a VAE as an LDP mechanism to perturb data. This is achieved by limiting the encoder's mean and adding noise to the encoder's standard deviation before sampling the latent code during training. After training, the resulting VAE is used to perturb records. In this work we limit the resulting mean of the encoder to the interval (−1, 1) by using the tanh activation function. Furthermore, we introduce noise according to a noise bound by enforcing a lower bound on the standard deviation of the encoder.
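The two constraints just described can be sketched as follows. This is a minimal numpy illustration under our own naming; the actual VAE-LDP mechanism of Weggenmann et al. [WRA+22] defines the noise bound more precisely.

```python
import numpy as np

def bounded_encoder_outputs(raw_mu, raw_log_sigma, sigma_min):
    """Bound the encoder mean to (-1, 1) with tanh and enforce a lower
    bound sigma_min on the standard deviation, so the noise added when
    sampling the latent code never falls below the bound."""
    mu = np.tanh(raw_mu)
    sigma = np.maximum(np.exp(raw_log_sigma), sigma_min)
    return mu, sigma

rng = np.random.default_rng(0)
mu, sigma = bounded_encoder_outputs(np.array([10.0, -10.0, 0.0]),
                                    np.array([-5.0, 0.0, 1.0]),
                                    sigma_min=0.5)
z = mu + sigma * rng.standard_normal(mu.shape)
print(mu.min() > -1, mu.max() < 1, sigma.min() >= 0.5)
```

Bounding the mean limits each record's influence on the latent code, while the standard deviation floor guarantees a minimum amount of perturbation per sampled latent variable.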
3 Accuracy and Privacy for Variational Autoencoders
We compare the privacy-accuracy trade-off for differentially private VAE to support a data scientist in choosing privacy parameters. For this we formulate a framework to quantify privacy and accuracy as well as the privacy-accuracy trade-off for differentially private VAE with local or central differential privacy. The framework is depicted in Figure 1. The framework first splits a dataset into three distinct subsets: training data, validation data and test data. The target model VAE is trained on the training data and optimized on the validation data. After training, we use the target model to generate a new dataset with the same distribution as the training data. We use the generated dataset as input for the target classifier. Our framework quantifies privacy by means of a MI adversary performing a MI attack (cf. Section 2.2). The MI attack dataset for training and evaluating the MI attack model is sampled equally from the training and test data. We use the framework to calculate the baseline trade-off, as well as the CDP and LDP trade-offs. The baseline trade-off is calculated from the baseline target classifier test accuracy and the MI attack without any DP mechanism. For the CDP trade-off the target model is trained with DP-Adam (cf. Section 2.3).
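The data handling in the framework can be sketched as follows: a three-way split and a balanced MI attack dataset drawn equally from the training split (members) and the test split (non-members). The helper names and split fractions are illustrative only.

```python
import numpy as np

def split_dataset(n_records, frac_train, frac_val, rng):
    """Shuffle indices and split them into train / validation / test."""
    idx = rng.permutation(n_records)
    n_train = int(frac_train * n_records)
    n_val = int(frac_val * n_records)
    return (idx[:n_train], idx[n_train:n_train + n_val],
            idx[n_train + n_val:])

def mi_attack_dataset(train_idx, test_idx, n_each, rng):
    """Balanced MI attack set: equally many members (from the training
    split) and non-members (from the test split)."""
    members = rng.choice(train_idx, n_each, replace=False)
    non_members = rng.choice(test_idx, n_each, replace=False)
    records = np.concatenate([members, non_members])
    labels = np.array([1] * n_each + [0] * n_each)
    return records, labels

rng = np.random.default_rng(0)
train, val, test = split_dataset(100, 0.5, 0.2, rng)
records, labels = mi_attack_dataset(train, test, 10, rng)
print(len(train), len(val), len(test), labels.sum())  # 50 20 30 10
```

The balanced construction ensures that the random guessing baseline of the MI attack is well defined.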
The LDP trade-off can be computed in three settings to which we refer as LDP-Train, LDP-Full, and VAE-LDP. In LDP-Train an LDP mechanism is applied solely to the training data, but not to the validation and test data. This scheme is similar to Denoising Autoencoders [VLL+10]. However, we evaluated the LDP-Train setting and observed it to be mostly impractical for VAE since it introduces a transfer learning task. In particular, working on two different data distributions for training and test data leads to distant latent representations and contrasting reconstructions. This neither benefits the target classifier test accuracy nor reduces MI attack performance in comparison to perturbing both training and test data. Hence, we only mention LDP-Train for the sake of completeness but will not discuss LDP-Train in the rest of this work. In LDP-Full, the entire dataset is perturbed and the training objective of the target model and the target classifier is changed implicitly (i.e., performance on perturbed data). VAE-LDP perturbs generated data by training a perturbation model that follows the target model architecture to enforce LDP.
The use of LDP also leads to MI attack variations. In particular, the MI attack can either be evaluated against perturbed or unperturbed records. We argue that in the LDP-Full setting the MI attack performance against unperturbed records is particularly relevant from the viewpoint of the MI adversary, since the unperturbed records represent the actual sensitive information and otherwise the attack model would solely learn to differentiate two distributions by the perturbation skew. Hence, within this work for the LDP settings we exclusively consider the MI attack performance against unperturbed records.
We evaluate the accuracy of the VAE target model based on the performance of a subsequent target classifier that is trained on generated data and evaluated on test data. This is a common approach to evaluate the accuracy of generative models [FOGD19, JYS19, TKP19]. To evaluate the accuracy of the MI attack we use the Average Precision of the Precision-Recall curve (MI AP), which considers membership as sensitive information (i.e., neglecting non-membership). The MI AP quantifies the integral under the precision-recall curve as a weighted mean of the precision per threshold and the increase in recall from the previous threshold. Using the accuracy of such a curve instead of a singular value allows us to measure the MI attack performance under optimal conditions. For example, the MI adversary could decide to increase the assumed certainty by raising the threshold closer to 1. Independently of the target model accuracy, the data scientist might be interested in lowering MI AP below a predefined threshold that is motivated by legislation (similar to the HIPAA requirement on group sizes [hhs]).
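The MI AP computation described above can be implemented directly. The following pure-Python sketch sums precision at each threshold weighted by the increase in recall; a library routine such as scikit-learn's `average_precision_score` computes the same quantity.

```python
def average_precision(scores, labels):
    """MI AP: integral under the precision-recall curve, computed as the
    sum of precision-at-threshold weighted by the recall increase."""
    pairs = sorted(zip(scores, labels), key=lambda p: -p[0])
    n_members = sum(labels)
    tp, ap, prev_recall = 0, 0.0, 0.0
    for i, (_, y) in enumerate(pairs, start=1):
        tp += y
        precision = tp / i
        recall = tp / n_members
        ap += (recall - prev_recall) * precision
        prev_recall = recall
    return ap

# A perfect attack ranks all members above all non-members -> AP = 1.0.
print(average_precision([0.9, 0.8, 0.2, 0.1], [1, 1, 0, 0]))  # 1.0
```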
We quantify the relative trade-off between accuracy and privacy by the trade-off measure of [BRG+21], which considers the relative difference between the change in test accuracy for the target classifier and the change in MI AP for the adversary. We slightly extend the original definition [BRG+21] to hold for generic accuracy scores that can be used to quantify the accuracy of the target model as well as the success of the attacker. Let one measure rate the performance of the attack and another measure rate the performance of the target model. The original scores are measured without DP, while the perturbed scores are measured for a specific privacy parameter. Furthermore, both measures are normalized against the uniform random guessing baseline, which depends on the chosen measure; for a classification task with a given number of classes it equals the inverse of the number of classes. Eq. (2) provides our adjusted definition. Similar to the original definition we bound the trade-off measure so that it does not approach infinity when one measure drops while the other remains stable. Depending on its sign, the measure highlights whether the relative loss in model accuracy exceeds or falls below the relative loss in attack performance. In general, a large gain in privacy, i.e., a large drop in attack performance, at a small target model accuracy drop cost is beneficial. Hence the data scientist seeks to maximize the trade-off measure.
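Since Eq. (2) is not reproduced here, the following is only a plausible sketch of such a relative trade-off score under our own assumptions: the relative drop in attack performance minus the relative drop in model accuracy, both normalized against the random guessing baseline and clipped to [-1, 1]. The exact definition in [BRG+21] may differ.

```python
def trade_off(attack_dp, attack_orig, acc_dp, acc_orig,
              attack_rand, acc_rand):
    """Illustrative relative privacy-accuracy trade-off score (our own
    sketch, not necessarily Eq. (2)): positive values mean the relative
    drop in attack performance exceeds the relative drop in accuracy."""
    rel_attack_drop = (attack_orig - attack_dp) / (attack_orig - attack_rand)
    rel_acc_drop = (acc_orig - acc_dp) / (acc_orig - acc_rand)
    return max(-1.0, min(1.0, rel_attack_drop - rel_acc_drop))

# Large privacy gain at a small accuracy cost -> score close to 1.
v = trade_off(attack_dp=0.5, attack_orig=1.0, acc_dp=0.78,
              acc_orig=0.8, attack_rand=0.5, acc_rand=0.05)
print(v)
```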
4 Datasets and Learning Tasks
Within this work we use two reference datasets for image and activity data.
Labeled Faces in the Wild (LFW).
LFW is a reference dataset for image classification [HMLL12]. We resize the images by using a bilinear filter and normalize the pixels for improved accuracy. Images are distributed unbalanced across the classes with a minimum of 6 and a maximum of 530 pictures per class. We consider the most frequent 20 and 50 classes, to which we refer as LFW20 and LFW50. 50% of the data is allocated to training, 20% to validation and 30% to testing. Our VAE target model is an extension of the architecture by Hou et al. [HSSQ17] and depicted in Figure 1(a). The encoder consists of four convolutional layers; the decoder comprises a dense layer followed by four convolutional layers with a stride of one and Leaky ReLU as activation function. Before each convolutional layer of the decoder we perform upsampling by a scale of two with the nearest neighbor method. New data is generated by randomly drawing a latent variable from a multivariate Gaussian distribution, which is passed through the decoder to create a new record. The target classifier is built upon a pre-trained VGG-Very-Deep-16 (VGG16) model [SZ15]. The first part of VGG16 consists of multiple blocks of convolutional layers and max-pooling layers for feature extraction. The second part of VGG16 is a fully-connected network for classification. After loading the pre-trained weights (https://github.com/rcmalli/keras-vggface) we keep the convolutional core and train the classification part.
MotionSense (MS).
MS is a reference dataset for human activity recognition with accelerometer and gyroscope sensor measurements [MCCH18]. Each measurement consists of twelve datapoints. Measurements are labeled with activities such as walking downstairs, jogging, and sitting. The associated learning task is to label a time series of measurements collected at 50 Hz with the corresponding activity. The VAE target model shall reconstruct such a time series. We normalize the data and group the measurements into time series of 10 seconds. 10% of the data is allocated to the training and validation data each, and the remaining 80% is allocated to the test data. Using 10% of the data for training is in line with previous work on MI against generative ML models [CYZF20, HMDC19, HHB19]. For the target model we use a multitask approach in which the encoder consists of a simple LSTM layer with 164 cells followed by two dense layers for the mean and standard deviation, which are used to sample the latent variable through the reparameterization trick. The decoder starts with a repeat vector unit for the latent variable. This allows us to create sequences and pass them to an LSTM layer. Furthermore, a second LSTM layer with twelve units is used to output sequences for each sensor. To support the reconstruction task we additionally input the latent variable to a classifier. Figure 1(b) shows the target model architecture. New data is generated by passing training records of a given class through the encoder to create latent variables, which are then passed through the decoder to generate a record. We have to sample from the class-specific latent distribution since the latent space is clustered as a consequence of the multitask classifier. The overall loss is balanced with weights for the KL-loss, reconstruction loss and classifier loss respectively. The target classifier is based on the Human Activity Recognition Convolutional Neural Network (HARCNN) architecture for time series data by Saeed [saeed2016]. In HARCNN each convolutional layer is followed by a dropout layer to learn a more general representation of the data. The final two fully-connected layers are used for classification.
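The windowing step described above (50 Hz measurements with twelve datapoints, grouped into 10-second series) can be sketched as follows; the function name and non-overlapping windows are our own assumptions.

```python
import numpy as np

def to_windows(measurements, hz=50, seconds=10):
    """Group a (T, n_features) stream of sensor measurements into
    non-overlapping windows of hz * seconds time steps each."""
    steps = hz * seconds
    n_windows = measurements.shape[0] // steps
    trimmed = measurements[: n_windows * steps]  # drop the incomplete tail
    return trimmed.reshape(n_windows, steps, measurements.shape[1])

# 1 minute of 50 Hz data with twelve datapoints per measurement.
stream = np.zeros((50 * 60, 12))
windows = to_windows(stream)
print(windows.shape)  # (6, 500, 12)
```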
5 Evaluation
Instead of comparing privacy parameters directly, we designed and performed an experiment to compare the privacy-accuracy trade-off in different DP settings. The experiment quantifies the target classifier test accuracy and MI AP by using the framework depicted in Figure 1 (cf. Section 3). We discuss the experiment for each dataset in four parts. First, we state the baseline test accuracy of the target classifier on non-generated data to provide information on the general drop in test accuracy between generated and non-generated data. Second and third, we discuss the CDP and LDP results. Fourth, the results for VAE-LDP are presented. For CDP, LDP and VAE-LDP the experiment results are depicted in two figures each, stating target classifier accuracy and MI AP over the privacy parameter. In each figure we also state the original target classifier test accuracy and MI AP for unperturbed data.
For each dataset the target model is trained for a fixed number of epochs after which the target model test loss did not decrease significantly and the target classifier accuracy did not increase anymore. The target classifier is trained on generated samples from the VAE until the target classifier test data loss stagnates (i.e., early stopping). This experiment design avoids overfitting and increases the real-world relevance of our results. For CDP we use DP-Adam, which samples noise from a Gaussian distribution (cf. Definition 3). We use the heuristic of Abadi et al. [ACM+16] and set the clipping norm as the median of the norms of the unclipped gradients over the course of 100 training epochs. We evaluate increasing CDP noise regimes for the target model by evaluating several noise multipliers. The noise levels cover a wide range from baseline accuracy to naive majority vote. The exact values are presented in Table 2 in the appendix. Due to the varying LDP mechanisms we state the privacy parameter for a single mechanism execution per feature per dataset in the next sections and summarize the values in Table 2. VAE-LDP perturbation models are trained with various noise bounds. Again, the corresponding exact values are presented in Table 2. For the MI attack we randomly draw records both from the training and the test data. The experiments were run on Amazon Web Services Elastic Compute Cloud instances of type “p2.xlarge” (https://aws.amazon.com/ec2) with 64 GiB RAM. This instance type is optimized for GPU computing. We implemented our experiments in Python 3.8 and use TensorFlow Privacy (https://github.com/tensorflow/privacy). We provide all code on GitHub (https://github.com/SAP-samples/security-research-vae-dp-mia). We identify hyperparameter values for batch size, epochs and learning rate for all target classifiers with Bayesian optimization.
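The clipping-and-noise core of DP-Adam as described above can be illustrated with a few lines of numpy. This is a hand-rolled sketch for exposition only; the experiments themselves use TensorFlow Privacy rather than this code.

```python
import numpy as np

def dp_gradient(per_example_grads, l2_norm_clip, noise_multiplier, rng):
    """Core of a DP optimizer step: clip each per-example gradient to
    l2_norm_clip, average, and add Gaussian noise with standard
    deviation noise_multiplier * l2_norm_clip / batch_size."""
    clipped = []
    for g in per_example_grads:
        norm = max(np.linalg.norm(g), 1e-12)
        clipped.append(g * min(1.0, l2_norm_clip / norm))
    mean_grad = np.mean(clipped, axis=0)
    sigma = noise_multiplier * l2_norm_clip / len(per_example_grads)
    return mean_grad + rng.normal(0.0, sigma, size=mean_grad.shape)

rng = np.random.default_rng(0)
grads = [np.array([3.0, 4.0]), np.array([0.3, 0.4])]  # norms 5.0 and 0.5
noisy = dp_gradient(grads, l2_norm_clip=1.0, noise_multiplier=1.1, rng=rng)
print(noisy.shape)  # (2,)
```

Clipping bounds each record's influence on the update (the sensitivity), which is what allows the Gaussian mechanism of Definition 3 to provide CDP guarantees per training step.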
On non-generated baseline images the target classifier achieves its baseline test accuracies for LFW20 and LFW50. For generated images we provide two accuracy metrics: the SSIM of the images generated by the target model and the test accuracy of the target classifier. Figure 4 states the accuracy metrics for unperturbed and CDP-perturbed VAE. The figure illustrates that the unperturbed VAE does not generate images with close proximity to the baseline images. However, the images still suffice to produce target classifier test accuracies well above majority voting. Shapes of the head, hair, and some facial expressions as well as the background can be observed for reconstructed images in Figure 6 in the appendix. We also use SSIM as a domain-specific distance metric for the reconstruction MI attack. Figure 3 illustrates that the reconstruction MI attack yields a perfect MI AP of 1 for unperturbed VAE. This high MI AP is due to the large gap between train and test SSIM.
Figure 4 states the CDP test accuracy over the privacy parameter. The steady accuracy decrease is due to the closing target model train-test gap, which we state in Table 2 in the appendix. The resulting regularization also lowers the SSIM of the generated images. A particularly sharp drop in SSIM is observable for one noise level. For this datapoint posterior collapse occurs: the encoder produces noisy means and standard deviations, leading to unstable latent codes which in turn are ignored by the decoder. In consequence, the decoder produces reconstructions independently of the latent code, leading to an increased reconstruction loss, while mean and standard deviation become constant and minimize the KL-loss [LTGN19]. As a consequence the target classifier resorts to majority vote. The CDP MI AP over the privacy parameter is stated in Figure 3. The increased regularization caused by CDP at the same time lowers the MI AP. In addition, due to the inherent label imbalance in LFW the VAE reconstruction of loosely populated classes is worse than the reconstruction for classes with more records. Still, the resulting privacy-accuracy trade-off leaves space for compromise: a data scientist willing to accept only a low MI AP must choose a small privacy parameter and accept a lower target classifier test accuracy, whereas raising the acceptable MI AP threshold allows for a larger privacy parameter and a higher target classifier test accuracy.
For LDP we use differentially private image pixelization (cf. Section 2.3) to create LDP training and test datasets within a fixed neighborhood. Figure 3 presents the LDP test accuracy and SSIM over the privacy parameter. In contrast to the CDP experiments, the target classifier test accuracy and target model SSIM metrics do not show a regularization effect caused by the introduced noise for LDP. The train-test gap narrows only slightly and the random noise introduced into the dataset makes the reconstruction task for the VAE more difficult. Thus, the reconstruction MI attack AP in Figure 3 remains nearly unchanged until the point at which the target model SSIM and the target classifier test accuracy are already at poor levels and little room for compromise exists.
VAE-LDP accuracy over the noise bound is presented in Figure 3. Counterintuitively, the test accuracy even rises and the train-test gap and SSIM gap narrow. This is due to the VAE-LDP perturbation model, which reconstructs only essential facial features and leaves the background grey when faced with small noise bounds. Hence the learning task for the target classifier and the reconstruction task for the VAE are simplified. Figure 6 in the appendix underlines this observation by showing the same image for VAE-LDP with increasing noise. The performance of the reconstruction attack against VAE-LDP in Figure 3 also decreases as the SSIM gap closes. All in all, the results point towards an advantage of the VAE-LDP mechanism over the LDP image pixelization mechanism. The main disadvantage of the VAE-LDP mechanism over image pixelization is the increased effort to optimize the perturbation model hyperparameters.
Due to the absence of a domain-specific accuracy metric we solely consider test accuracy as accuracy metric for this dataset. The target classifier for MS achieves its baseline test accuracy on non-generated data. Figure 4 states the test accuracy for original and CDP-perturbed data. The test accuracy drops for generated data, which is due to the target model being unable to reconstruct time series for all activities equally well. The reconstruction MI attack has not been used for time series data in previous work and we suggest to use MSE as the reconstruction MI attack distance metric. The original MI attack performance is depicted in Figure 4 and achieves a low MI AP. We see three main reasons for the low MI AP in comparison to LFW. First, MS is more balanced than LFW. Second, there are significantly more records in MS than in LFW, and thus more records per class allow the model to learn a more general representation. Third, sensor measurements exhibit ambiguities and thus the target model tends to learn general trends instead of absolute values.
The CDP target classifier test accuracy only slightly worsens with increasing noise, as illustrated in Figure 4. This is mostly due to the target classifier resorting to majority vote for particular activities with increasing noise. Figure 7 in the appendix shows the confusion matrix for the target classifier at one noise level. The target classifier resorts to majority vote for classes 0 to 3, which represent different types of movements, but is still able to distinguish the classes which represent standing and sitting. The latter two activities are of a different nature than the movements and remain distinguishable under noise. The MI AP illustrated in Figure 4 again shows the ineffectiveness of the reconstruction MI attack against the MS time series data.
For LDP we use the Laplace mechanism to perturb each measurement (cf. Section 2.3) and specify the sensitivity per sensor as the maximum of all corresponding observed values to create differentially private time series. Figure 4 shows the target classifier accuracy over the privacy parameter. Notably, the target classifier test accuracy increases slightly before dropping sharply. Here, small noise levels actually positively influence the target model training and hence also allow the target classifier to better distinguish between different classes. In general, the simple LDP mechanism used within this experiment seems to prevent the target model from inferring structural information and in turn limits the reconstruction and meaningful generation of records. Figure 4 presents the MI attack performance. The MI AP decreases already at the largest privacy parameter and remains close to the baseline for all further values.
VAE-LDP test accuracy over the noise bound is depicted in Figure 4. In comparison to LFW, the MS perturbation models do not focus on the essential features of the data and in turn the target classifier cannot benefit from the increased perturbation. Due to this the predictions also shift to a majority vote for one class and lower the test accuracy significantly. The VAE-LDP MI AP over the noise bound is illustrated in Figure 4. Note that one outlier is present where the target model did not learn a continuous latent space and thus the reconstruction of records from the training data suffered. However, the VAE-LDP results show similar trends as the above LDP results.
6 Discussion
This section discusses the findings of this paper w.r.t. comparing the privacy-accuracy trade-off for differentially private VAE.
Image data yields higher MI attack performance than time-series data.
The reconstruction MI attack has been shown effective for image data in prior work [CYZF20, HHB19], despite being fairly simple and only taking one metric for disparate behaviour of the target model into consideration. This is in line with the identified gap in image reconstruction for LFW, which the adversary was able to exploit by using SSIM as a distance measure for the reconstruction MI attack. For MS we were not able to identify a measure that provides equal success. Since activity measurements exhibit many ambiguities, the target model learns to reconstruct relative trends instead of concrete measurements that represent a specific movement. Therefore, the target model generalizes more and is less prone to MI attacks. Additionally, previous research [NSH19, SSSS17] has shown that large datasets with few classes are generally less vulnerable to MI attacks.
Small noise yields favorable relative privacy-accuracy trade-off for image data.
For CDP and image data we recommend using as little noise as possible. The relative accuracy drop for the target classifier largely exceeds the performance loss for the adversary throughout the CDP experiments for LFW. This trend is illustrated in Figure 5, which highlights that the drop in target classifier test accuracy is always larger than the privacy gain by reduced MI AP. For MS the reconstruction MI attack only achieves a performance close to random guessing already against original data. Hence, small DP noise is already sufficient to push the MI AP to random guessing. This is reflected in Figure 5, where we see an optimal trade-off already for small noise. Similarly, for LDP Figure 5 shows only few favorable trade-offs for both datasets and settings. These few favorable trade-offs again indicate that differentially private image pixelization and the Laplace mechanism disproportionately harm model accuracy over protecting privacy. Compared to CDP, LDP shows better trade-offs for small privacy parameters. However, LDP generally gives up more accuracy compared to the gain in privacy.
VAE-LDP outperforms LDP and CDP w.r.t. the relative privacy-accuracy trade-off.
In our experiments, the VAE-LDP mechanism yielded the best trade-off between target classifier test accuracy and MI AP. This finding is supported by the trade-off values depicted in Figure 5. We identified the interaction between the perturbation models, which retain essential image features, and the targeted classification task as the primary reason for the superior trade-off. The trade-off values for the VAE-LDP experiments highlight that small noise bounds already protect from the reconstruction MI attack. For larger noise bounds, however, the trade-off measure only offers limited informative value since the MI AP pivots around random guessing while the target classifier test accuracy is bound by the overall classification baseline.
VAE are highly susceptible to noise introduced during training.
Our results indicate that CDP leads to a regularization effect and directly addresses a key driver of MI AP. However, CDP also requires additional hyperparameter optimization and increases the computational cost. LDP mechanisms consume information within the data to foster protection, and hence the test accuracy decrease heavily depends on how the LDP mechanism alters the training data. For example, differentially private image pixelization damages the structure of images to preserve privacy. The more information is consumed by the LDP mechanism, the worse the target classifier test accuracy becomes. This effect is clearly visible for the MS dataset, where the decrease in target classifier accuracy approaches the overall classification baseline. When this characteristic is present, MI is affected mostly as a consequence of diminishing model performance. This is facilitated by the lack of a regularization effect, which preserves a relative gap for the MI attacks to exploit. The VAE-LDP mechanism preserves essential features of the LFW dataset during perturbation. The preservation of essential features is beneficial to the overall classification task as the test accuracy remains high while the MI AP decreases.
7 Related Work
We discuss related work from three categories. First, we briefly discuss generative models and accuracy metrics for generative models. Second, we provide background on differential privacy in generative models. Third, we introduce related work on membership inference attacks against generative models.
Generative Adversarial Networks by Goodfellow et al. [GPM+14] represent an alternative to VAE. We focus on VAE since, in comparison to GAN, VAE were observed to be more prone to MI attacks [HHB19]. Salimans et al. [SGZ+16] introduce the Inception Score to automatically evaluate the utility of sampled images from generative models. The main advantage of the Inception Score over other metrics such as SSIM is its correlation with human judgement. However, Barratt et al. [BS18] point out that the Inception Score is foremost meaningful for the ImageNet dataset due to pre-training. Therefore, we consider the test accuracy of a target classifier to evaluate VAE accuracy.
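For reference, the Inception Score is computed from the class probabilities a pre-trained classifier assigns to generated samples, as IS = exp(E_x[KL(p(y|x) || p(y))]). A minimal sketch, where the function name and the probability-matrix input format are our own assumptions:

```python
import numpy as np

def inception_score(probs, eps=1e-12):
    """Inception Score from a (num_samples x num_classes) matrix of
    classifier probabilities p(y|x): exp of the mean KL divergence
    between each conditional p(y|x) and the marginal p(y)."""
    probs = np.asarray(probs, dtype=float)
    marginal = probs.mean(axis=0)  # p(y), averaged over samples
    kl = np.sum(probs * (np.log(probs + eps) - np.log(marginal + eps)), axis=1)
    return float(np.exp(kl.mean()))
```

Uniform predictions yield a score of 1, while confident and diverse predictions push the score toward the number of classes.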
Torkzadehmahani et al. [TKP19] propose the DP-cGAN framework to generate differentially private data and labels. Similar to our work, they train target classifiers on the generated data to evaluate model accuracy. We instead consider VAE with LDP and CDP. Jordon et al. [JYS19] extend the differentially private federated learning architecture PATE [PSM+18] to GAN. Similar to us, they analyze the accuracy of a target classifier for various privacy parameters, yet Jordon et al. do not discuss privacy aside from the privacy parameter ε. Frigerio et al. [FOGD19] evaluate a CDP GAN for time series data, also w.r.t. MI attacks. We additionally consider LDP and quantify the trade-off between privacy and accuracy. Takahashi et al. [TTOK20] propose an enhanced version of DP-SGD for VAE by adjusting the noise that is injected into the loss terms. We use DP-Adam, where their improvement is not applicable.
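DP-SGD and DP-Adam share the same core privatization step: clip each per-example gradient to a fixed L2 norm, sum, and add Gaussian noise calibrated to the clipping norm. A minimal sketch with illustrative parameter values (the function name and defaults are our own, not a specific library's API):

```python
import numpy as np

def dp_gradient(per_example_grads, clip_norm=1.0, noise_mult=1.1, rng=None):
    """Sketch of the DP-SGD/DP-Adam gradient step: clip each per-example
    gradient to L2 norm `clip_norm`, sum, add Gaussian noise with std
    noise_mult * clip_norm, and average over the batch."""
    rng = np.random.default_rng(rng)
    clipped = []
    for g in per_example_grads:
        g = np.asarray(g, dtype=float)
        norm = np.linalg.norm(g)
        clipped.append(g / max(1.0, norm / clip_norm))  # scale down if too large
    total = np.sum(clipped, axis=0)
    total += rng.normal(scale=noise_mult * clip_norm, size=total.shape)
    return total / len(per_example_grads)
```

The noisy averaged gradient is then handed to the plain (Adam or SGD) update rule; the noise multiplier and clipping norm jointly determine the privacy loss via the moments accountant.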
Hayes et al. [HMDC19] propose the LOGAN framework for MI attacks against GAN under various assumptions on the adversary's knowledge. For their black-box attacks, they train a separate discriminator model to distinguish between members and non-members. In contrast, we consider statistical MI attack models, which allow MI attacks against generative models without the need to train a separate attack model. Hilprecht et al. [HHB19] propose Monte-Carlo MI attacks against GAN and VAE. We use their reconstruction MI attack and are the first to consider this attack under differential privacy. Chen et al. [CYZF20] extend the reconstruction MI attack to a partial black-box setting where the adversary solely has access to the latent space but not to the internal parameters of the generative model. Their attack composes different losses targeting various aspects of a model and takes the reconstruction as well as the latent representation into consideration. We also ran all experiments within this paper for their attack, and the consideration of the latent representation led to strictly weaker MI AP. The gradient matching attack of Zhu et al. [ZLH19] strives for the reconstruction of training data from publicly available gradients. In contrast, we focus on the identification of training data.
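The reconstruction MI attack scores each candidate record by how well the generative model reconstructs it, on the intuition that training members reconstruct with lower error. A minimal Monte-Carlo sketch in the spirit of Hilprecht et al., where `encode` and `decode` are hypothetical stand-ins for the VAE's stochastic encoder and decoder:

```python
import numpy as np

def reconstruction_mi_scores(records, encode, decode, n_samples=10, rng=None):
    """Monte-Carlo reconstruction MI sketch: reconstruct each candidate
    record several times through the VAE and score it by its negative
    mean squared reconstruction error. Higher scores suggest membership."""
    rng = np.random.default_rng(rng)
    scores = []
    for x in records:
        errs = []
        for _ in range(n_samples):
            z = encode(x, rng)   # sample a latent code for x
            x_hat = decode(z)    # reconstruct from the latent sample
            errs.append(np.mean((np.asarray(x) - np.asarray(x_hat)) ** 2))
        scores.append(-float(np.mean(errs)))
    return scores
```

Thresholding these scores (or ranking candidates by them) yields the membership decision; no separate attack model needs to be trained, which is the statistical setting considered in this paper.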
8 Conclusion
We evaluated a validation framework for quantifying the relative privacy-accuracy trade-off for VAE. We used the framework to compare two LDP mechanisms and one CDP mechanism for image and time series data w.r.t. their privacy-accuracy trade-off. In particular, the LFW image recognition dataset was very susceptible to the reconstruction MI attack, whereas the MotionSense activity recognition dataset, with more records and fewer classes, was mostly resistant to MI. The CDP mechanism offered a more consistent decrease in MI attack performance, whereas the LDP mechanisms showed varying levels of protection depending on the chosen privacy parameter and setting. The relative privacy-accuracy trade-off highlights that protection often comes at a disproportionately high accuracy cost.