Distributed generation of privacy preserving data with user customization

04/20/2019 ∙ by Xiao Chen, et al.

Distributed devices such as mobile phones can produce and store large amounts of data that can enhance machine learning models; however, this data may contain private information specific to the data owner that prevents the release of the data. We wish to reduce the correlation between user-specific private information and data while maintaining the useful information. Rather than learning a large model to achieve privatization from end to end, we introduce a decoupling of the creation of a latent representation and the privatization of data that allows user-specific privatization to occur in a distributed setting with limited computation and minimal disturbance on the utility of the data. We leverage a Variational Autoencoder (VAE) to create a compact latent representation of the data; however, the VAE remains fixed for all devices and all possible private labels. We then train a small generative filter to perturb the latent representation based on individual preferences regarding the private and utility information. The small filter is trained by utilizing a GAN-type robust optimization that can take place on a distributed device. We conduct experiments on three popular datasets: MNIST, UCI-Adult, and CelebA, and give a thorough evaluation including visualizing the geometry of the latent embeddings and estimating the empirical mutual information to show the effectiveness of our approach.






1 Introduction

The success of machine learning algorithms relies not only on technical methodologies, but also on the availability of large datasets such as images (Krizhevsky et al., 2012; Oshri et al., 2018); however, data can often contain sensitive information, such as race or age, that may hinder the owner's ability to release the data to exploit its utility. We are interested in exploring methods of providing privatized data such that sensitive information cannot be easily inferred from the adversarial perspective, while preserving the utility of the dataset. In particular, we consider a setting where many participants are independently gathering data that will be collected by a third party. Each participant is incentivized to label their own data with useful information; however, they have the option to create private labels for information that they do not wish to share with the database. In the case where data contains a large amount of information, such as images, there can be an overwhelming number of potential private and utility label combinations (skin color, age, gender, race, location, medical conditions, etc.). The large number of combinations prevents training a separate method to obscure each set of labels at a central location. Furthermore, when participants are collecting data on their personal devices such as mobile phones, they would like to remove private information before the data leaves their devices. The large number of personal label combinations, coupled with the use of mobile devices, requires a privacy scheme that is computationally efficient on a small device. In this paper, we propose a method of generating private datasets that makes use of a fixed encoding, thus requiring only a few small neural networks to be trained for each label combination. This approach allows data-collecting participants to select any combination of private and utility labels and remove them from the data on their own mobile device before sending any information to a third party.

In the context of publishing datasets with privacy and utility guarantees, we briefly review a number of similar approaches that have been recently considered, and discuss why they are inadequate for the distributed and customizable setting we have proposed. Traditional work in generating private datasets has made use of differential privacy (DP) (Dwork et al., 2006), which involves injecting certain random noise into the data to prevent the identification of sensitive information (Dwork et al., 2006; Dwork, 2011; Dwork et al., 2014). However, finding a globally optimal perturbation using DP may be too stringent a privacy condition in many high dimensional data applications. Therefore, more recent literature describes research that commonly uses Autoencoders (Kingma & Welling, 2013) to create a compact latent representation of the data, which does not contain private information but does encode the useful information (Edwards & Storkey, 2015; Abadi & Andersen, 2016; Beutel et al., 2017; Madras et al., 2018; Song et al., 2018; Chen et al., 2018). A few papers combine strategies involving both DP and Autoencoders (Hamm, 2017; Liu et al., 2017); however, all of these recent strategies require training a separate Autoencoder for each possible combination of private and utility labels. Training an Autoencoder for each privacy combination can be computationally prohibitive, especially when working with high dimensional data or when computation must be done on a small local device such as a mobile phone. Therefore, such methods are unable to handle our proposed scenario, where each participant must locally train a data generator that obscures their individual choice of private and utility labels. Additionally, reducing the computation and communication burden when dealing with distributed data is beneficial in many other potential applications such as federated learning (McMahan et al., 2016).

We verify our idea on three datasets. The first is the MNIST dataset (LeCun & Cortes, 2010) of handwritten digits, commonly used as a synthetic example in the machine learning literature. We have two cases involving this dataset: In MNIST Case 1, we preserve information regarding whether the digit contains a circle (i.e. digits 0, 6, 8, 9), but privatize the value of the digit itself. In MNIST Case 2, we preserve information on the parity (even or odd digit), but privatize whether or not the digit is greater than or equal to 5. Figure 1 shows a sample of the original dataset along with the same sample perturbed to remove information on the digit identity while maintaining whether the digit contains a circle. The input to the algorithm is the original dataset with labels, while the output is the perturbed data as shown. The second is the UCI-Adult income dataset (Dua & Graff, 2017), which contains records of 45,222 anonymous adults from the 1994 US Census and is used to predict whether an individual has an annual income over $50,000. The third dataset is the CelebA dataset (Liu et al., 2015), containing color images of celebrity faces. For this realistic example, we preserve whether the celebrity is smiling, while privatizing many different labels (gender, age, etc.) independently to demonstrate our capability to privatize a wide variety of labels with a single latent representation.

Primarily, this paper introduces a decoupling of the creation of a latent representation and the privatization of data that allows the privatization to occur in a distributed setting with limited computation and minimal disturbance on the utility of the data. Additional contributions include a variant on the Autoencoder to improve robustness of the decoder for reconstructing perturbed data, and thorough investigations into: (i) the latent geometry of the data distribution before and after privatization, (ii) how training against a cross entropy loss adversary impacts the mutual information between the data and the private label, and (iii) the use of f-divergence based constraints in a robust optimization and their relationship to more common norm-ball based constraints and differential privacy. These supplemental investigations provide a deeper understanding of how our scheme is able to achieve privatization.

(a) Sample of original images
(b) Same images perturbed to privatize digit ID
Figure 1: Visualization of digits pre- and post-noise injection and adversarial training. We find that digit IDs are randomly switched while circle digits remain circle digits and non-circle digits remain as non-circle digits.

2 Problem Statement and Methodology

Inspired by a few recent studies (Louizos et al., 2015; Huang et al., 2017a; Madras et al., 2018; Chen et al., 2018), we consider data privatization as a game between two players: the data generator (data owner) and the adversary (discriminator). The generator tries to inject noise that will privatize certain sensitive information contained in the data, while the adversary tries to infer this sensitive information from the data. In order to deal with high dimensional data, we first learn a latent representation or manifold of the data distribution, and then inject noise with specific latent features to reduce the correlation between released data and sensitive information. After the noise has been added to the latent vector, the data can be reconstructed and published without fear of releasing sensitive information. To summarize, the input to our system is the original dataset with both useful and private labels, and the output is a perturbed dataset that has reduced statistical correlation with the private labels but maintains information related to the useful labels.

We consider the general setting where a data owner holds a dataset that consists of original data x, private/sensitive labels y, and useful labels u; thus, each sample has a record (x, y, u). We denote the function g as a general mechanism for the data owner to release the data, and the released data as x̃ = g(x, y, u). Because y is private information, it won't be released. Thus, for the record (x, y, u), the released data can be described as (x̃, u). We simplify the problem by considering only the case where the utility labels u are maintained unchanged. The corresponding perturbed data x̃ and utility attributes u are published for use. The adversary builds a learning algorithm h to infer the sensitive information given the released data, i.e. ŷ = h(x̃), where ŷ is the estimate of y. The goal of the adversary is to minimize the inference loss ℓ(ŷ, y) on the private labels. Similarly, we denote the estimate of the utility labels as û = ν(x̃). We quantify the utility of the released data through another loss function ℓ(û, u) that captures the utility attributes. The data owner wants to maximize the loss that the adversary experiences, to protect the sensitive information, while maintaining the data utility by minimizing the utility loss. Given the previous settings, the data-releasing game can be expressed as follows:

    min_{g, ν} max_h  E[ ℓ(ν(g(x)), u) − λ ℓ(h(g(x)), y) ],

where λ is a hyper-parameter weighing the trade-off between the different losses, and the expectation is taken over all samples from the dataset. The loss functions in this game are flexible and can be tailored to a specific metric that the data owner is interested in. For example, a typical loss function for classification problems is cross-entropy loss (De Boer et al., 2005). Because optimizing over the functions g, h, and ν directly is hard to implement, we use a variant of the min-max game that leverages neural networks to approximate the functions. The foundation of our approach is to construct a good posterior approximation for the data distribution in a latent space, then to inject context-aware noise through a filter in the latent space, and finally to run the adversarial training to achieve convergence, as illustrated in figure 2. Specifically, we consider the data owner playing the generator role that comprises a Variational Autoencoder (VAE) (Kingma & Welling, 2013) structure with an additional noise-injection filter in the latent space. We use θ, φ, and ψ to denote the parameters of the neural networks that represent the data owner (generator), adversary (discriminator), and utility learner (util-classifier), respectively. Moreover, the generator parameters θ consist of the encoder parameters θ_e, decoder parameters θ_d, and filter parameters θ_f. The encoder and decoder parameters are trained independently of the privatization process and left fixed. Hence, we have

    min_{θ_f, ψ} max_φ  E[ ℓ(ν_ψ(g_θ(x)), u) − λ ℓ(h_φ(g_θ(x)), y) ]   s.t.   D(p_z̃, p_z) ≤ b,

where D is a distance or divergence measure between the filtered and raw latent distributions, and b is the corresponding distortion budget.
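To make the trade-off this objective encodes concrete, the following toy NumPy sketch (ours, not the authors' implementation) retrains logistic adversary and utility classifiers on a synthetic 2-D latent space, once on raw codes and once after an idealized filter that simply suppresses the private dimension; all names and the synthetic labels are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def xent(p, y, eps=1e-9):
    # mean binary cross-entropy
    return -np.mean(y * np.log(p + eps) + (1 - y) * np.log(1 - p + eps))

def train_logistic(z, y, lr=2.0, steps=400):
    # plain gradient descent on the convex logistic loss
    w = np.zeros(z.shape[1])
    for _ in range(steps):
        w -= lr * z.T @ (sigmoid(z @ w) - y) / len(z)
    return w

# Toy latent codes: dim 0 carries the private label, dim 1 the utility label.
z = rng.normal(size=(500, 2))
y_priv = (z[:, 0] > 0).astype(float)
u_util = (z[:, 1] > 0).astype(float)

def released_losses(filt):
    # adversary and utility losses after each classifier retrains on released codes
    zt = filt(z)
    w_adv, w_util = train_logistic(zt, y_priv), train_logistic(zt, u_util)
    return xent(sigmoid(zt @ w_adv), y_priv), xent(sigmoid(zt @ w_util), u_util)

identity = lambda v: v
suppress_private = lambda v: v * np.array([0.0, 1.0])  # idealized "filter"

adv_raw, util_raw = released_losses(identity)
adv_filt, util_filt = released_losses(suppress_private)
# The filter pushes the adversary toward chance-level loss (log 2) at little utility cost.
```

Here a hand-picked mask stands in for the trained generative filter; the point is only the shape of the objective: the released representation should raise the adversary's loss while keeping the utility loss low.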

Figure 2: Privatization architecture. We decompose the whole privatization procedure into two steps: 1) training an encoder and decoder; 2) learning a generative filter.

In principle, we perform the following three steps to complete each experiment.

1) Train a VAE for the generator without noise injection or min-max adversarial training. Rather than imposing the distortion budget at the beginning, we first train the following objective:

    L(θ_e, θ_d) = −E_{q_{θ_e}(z|x)}[ log p_{θ_d}(x|z) ] + KL( q_{θ_e}(z|x) ‖ p(z) ),    (1)

where the posterior distribution q_{θ_e}(z|x) is characterized by an encoding network, and p_{θ_d}(x|z) is similarly characterized by the decoding network. The distribution p(z) is a prior distribution that is usually chosen to be a multivariate Gaussian for reparameterization purposes (Kingma & Welling, 2013). When dealing with high dimensional data, we develop a variant of the preceding objective that captures three items: the reconstruction loss, KL divergence on latent representations, and improved robustness of the decoder network to perturbations in the latent space (as shown in equation (6)). We discuss more details of training a VAE and its variant in section 6.1.
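As a concrete check on the KL term in this objective, here is a short sketch (ours) for the usual diagonal-Gaussian posterior against a standard-normal prior, comparing the closed form used in VAE training with a Monte Carlo estimate built from reparameterized samples.

```python
import numpy as np

rng = np.random.default_rng(1)

def kl_diag_gauss(mu, logvar):
    # closed-form KL( N(mu, diag(exp(logvar))) || N(0, I) )
    return 0.5 * np.sum(np.exp(logvar) + mu**2 - 1.0 - logvar)

mu = np.array([0.5, -1.0])
logvar = np.array([0.2, -0.3])
kl_closed = kl_diag_gauss(mu, logvar)

# Monte Carlo check of E_q[log q(z|x) - log p(z)] with z = mu + sigma * eps
eps = rng.normal(size=(200_000, 2))
z = mu + np.exp(0.5 * logvar) * eps
log_q = -0.5 * np.sum(logvar + eps**2 + np.log(2 * np.pi), axis=1)
log_p = -0.5 * np.sum(z**2 + np.log(2 * np.pi), axis=1)
kl_mc = float(np.mean(log_q - log_p))
```

The reparameterized samples are exactly the ones a VAE's training loop would draw, so the two estimates agree up to Monte Carlo error.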

2) Formulate a robust optimization to run the min-max GAN-type training (Goodfellow et al., 2014) with the noise injection, which comprises a linear or nonlinear filter (the filter can be a small neural network), while freezing the weights of the encoder and decoder. In this phase, we instantiate several divergence metrics and various distortion budgets to run our experiments (details in section 6.2). When we fix the encoder, the latent variable is z ~ q_{θ_e}(z|x) (or z for short), and the new altered latent representation can be expressed as z̃ = f_{θ_f}(z, ε), where f represents a filter function and ε is the injected noise. The classifiers h and ν can take the latent vector as input when training the filter to reduce the computational burden, as is done in our experiments. We focus on a canonical form for the adversarial training and cast our problem into the following robust optimization problem:

    min_{θ_f, ψ} max_φ  E[ ℓ(ν_ψ(z̃), u) − λ ℓ(h_φ(z̃), y) ]   s.t.   D_f(p_z̃ ‖ p_z) ≤ b,

where D_f is the f-divergence in this case. We disclose some connections between the divergence-based constraints and norm-ball perturbation-based constraints in section 6.2. We also discuss the ability to incorporate differential privacy mechanisms, which we use as a baseline, through the constrained optimization, as well as its distinctions, in section 6.3 of the appendix.
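For intuition about the f-divergence constraint used here, the following sketch (ours) evaluates D_f for discrete distributions with two standard generators: f(t) = t log t (KL) and f(t) = |t − 1|/2 (total variation).

```python
import numpy as np

def f_divergence(p, q, f):
    # D_f(p || q) = sum_x q(x) f(p(x)/q(x)) for discrete distributions with q > 0
    p, q = np.asarray(p, float), np.asarray(q, float)
    return float(np.sum(q * f(p / q)))

kl_gen = lambda t: t * np.log(t)          # generator for KL(p || q)
tv_gen = lambda t: 0.5 * np.abs(t - 1.0)  # generator for total variation

p = np.array([0.7, 0.2, 0.1])
q = np.array([0.4, 0.4, 0.2])

kl_val = f_divergence(p, q, kl_gen)
tv_val = f_divergence(p, q, tv_gen)
```

A distortion budget then simply bounds how far the filtered latent distribution may drift from the raw one under the chosen generator.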

3) Learn the adaptive classifiers for both the adversary and utility according to the released data x̃ (or z̃ if the classifier takes the latent vector as input), where the perturbed data x̃ (or z̃) is generated based on the trained generator. We validate the performance of our approach by comparing metrics such as classification accuracy and empirical mutual information. Furthermore, we visualize the geometry of the latent representations, e.g. figure 4, to give intuitions behind how our framework achieves privacy.
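The empirical mutual information mentioned above can be estimated in several ways; one simple option (our sketch, not necessarily the authors' estimator) is the plug-in estimate computed from joint label counts:

```python
import numpy as np

def plug_in_mi(a, b):
    # plug-in mutual information (in nats) between two discrete label arrays
    a_vals, a_idx = np.unique(a, return_inverse=True)
    b_vals, b_idx = np.unique(b, return_inverse=True)
    joint = np.zeros((len(a_vals), len(b_vals)))
    for i, j in zip(a_idx, b_idx):
        joint[i, j] += 1.0
    joint /= joint.sum()
    pa = joint.sum(axis=1, keepdims=True)
    pb = joint.sum(axis=0, keepdims=True)
    nz = joint > 0
    return float(np.sum(joint[nz] * np.log(joint[nz] / (pa @ pb)[nz])))

rng = np.random.default_rng(2)
y = rng.integers(0, 2, size=5000)
leaky = np.where(rng.random(5000) < 0.9, y, 1 - y)  # release that leaks y 90% of the time
clean = rng.integers(0, 2, size=5000)               # release independent of y

mi_leaky = plug_in_mi(y, leaky)
mi_clean = plug_in_mi(y, clean)
```

A successful privatization should drive the estimate between the released data's predicted labels and the private label toward zero, as in the `clean` case above.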

3 Experiments and Results

We present the results of experiments on the datasets MNIST, UCI-adult, and CelebA to demonstrate the effectiveness of our approach.

MNIST Case 1: We consider the digit number itself as the private attribute and whether the digit contains a circle as the utility attribute. Figure 1 shows samples of this case as introduced before. Specific classification results before and after privatization are given in the form of confusion matrices in figures 2(a) and 2(b), demonstrating a significant reduction in private-label classification accuracy. These results are supported by our illustrations of the latent space geometry in Figure 4 via uniform manifold approximation and projection (UMAP) (McInnes et al., 2018). Specifically, figure 3(b) shows a clear separation between circle digits (on the right) and non-circle digits (on the left). We also investigate the sensitivity of classification accuracy for both labels with respect to the distortion budget (for KL-divergence) in Figure 2(c), demonstrating that increasing the distortion budget rapidly decreases the private-label accuracy while maintaining the utility-label accuracy. We also compare these results to a baseline method based on differential privacy (an additive Gaussian mechanism discussed in section 6.3), and we find that this additive Gaussian mechanism performs worse than our generative adversarial filter: it yields both lower utility and weaker privacy (i.e. higher prediction accuracy on the private labels) than the min-max generative filter approach.

(a) raw
(b) privatized
(c) classification accuracy
Figure 3: Classifying digits in MNIST. Original digits can be easily classified with more than 90% accuracy on average, yet the new perturbed digits have a significantly lower accuracy as expected. Specifically, many circle digits are incorrectly classified as other circle digits and similarly for the non-circle digits. Figure 2(c) demonstrates that classification accuracy on the private label decreases quickly while classification on the utility label remains nearly constant as the distortion budget increases. Furthermore, our approach is superior to the baseline Gaussian mechanism based on differential privacy.
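For reference, the additive Gaussian mechanism used as the differential-privacy baseline can be sketched as below (our illustration); we use the classic analytic calibration σ = Δ₂·sqrt(2 ln(1.25/δ))/ε, which is valid for ε < 1, and we assume unit ℓ2 sensitivity for the latent code.

```python
import numpy as np

def gaussian_mechanism(v, sensitivity, eps, delta, rng):
    # classic (eps, delta)-DP Gaussian mechanism, valid for eps < 1:
    # sigma = sensitivity * sqrt(2 ln(1.25/delta)) / eps
    sigma = sensitivity * np.sqrt(2.0 * np.log(1.25 / delta)) / eps
    return v + rng.normal(scale=sigma, size=v.shape), sigma

rng = np.random.default_rng(3)
latent = rng.normal(size=10)  # a latent code to be released
noisy, sigma = gaussian_mechanism(latent, sensitivity=1.0, eps=0.5, delta=1e-5, rng=rng)
```

Because the noise scale depends only on (ε, δ) and the sensitivity, not on which labels are private, such a mechanism degrades all attributes alike, consistent with the weaker utility-privacy trade-off of the baseline reported above.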

MNIST Case 2: This case has the same setting as the experiment given in Rezaei et al. (2018), where we consider odd or even digits as the target utility and large or small value (digit ≥ 5 or not) as the private label. Rather than training a generator based on a fixed classifier, as done in Rezaei et al. (2018), we take a different modeling and evaluation approach that allows the adversarial classifier to update dynamically. We find that the classification accuracy of the private attribute drops from 95% to 65% as the distortion budget grows. Meanwhile, our generative filter doesn't deteriorate the target utility too much, maintaining a classification accuracy above 87% for the utility label as the distortion increases, as shown in figure 6(b). We discuss more results in the appendix section 6.6, together with results verifying the reduction of mutual information between the data and the private labels.

(a) raw
(b) moderate perturbation
(c) heavy perturbation
Figure 4: Visualization of the latent geometry. The original embedding in figure 3(a) is clearly segmented into individual clusters for each digit; however, when we allow a moderate distortion budget, as shown in figure 3(b), the digits are separated according to the circle or non-circle property by the gap between the left and right clouds of points. A larger distortion budget nests all samples close together with some maintained local clusters, as seen in figure 3(c).

While using the privatization scheme from case 2, we measure the classification accuracy of the circle attribute from case 1. This tests how the distortion budget prevents excessive loss of information about non-target attributes. The circle attribute from case 1 is not included in the loss function when training for case 2; however, as seen in Table 1, the classification accuracy on the circle attribute is diminished no more than that of the target attribute (odd). A more detailed plot of the classification accuracy can be found in figure 6(c) in the appendix section 6.6. This demonstrates that the privatized data maintains utility beyond the predefined utility labels used in training.

Data Private attr. Utility attr. Non-target attr.
emb-raw 0.951 0.952 0.951
emb-g-filter 0.687 0.899 0.900
Table 1: Accuracy of the private label (≥ 5), target label (odd), and non-target label (circle) for the MNIST dataset. The raw embedding yielded by the VAE is denoted emb-raw; the embedding yielded by the generative filter is denoted emb-g-filter.

UCI-Adult: We conduct the experiment by setting the private label to be gender and the utility label to be income. All the data is preprocessed to binary values for ease of training. We compare our method with the Variational Fair Autoencoder (VFAE) (Louizos et al., 2015) and Lagrangian Mutual Information-based Fair Representations (LMIFR) (Song et al., 2018). The corresponding accuracy and area under the receiver operating characteristic curve (AUROC) for classifying the private label and utility label are shown in Table 2. Our method has the lowest accuracy and the smallest AUROC on the privatized gender attribute. Although our method doesn't perform best on classifying the utility label, it still achieves comparable results in terms of both accuracy and AUROC, as Table 2 shows.

Model Private attr. Utility attr.
 acc. auroc. acc. auroc.
VAE (Kingma & Welling, 2013) 0.850 ± 0.007 0.843 ± 0.004 0.837 ± 0.009 0.755 ± 0.005
VFAE (Louizos et al., 2015) 0.802 ± 0.009 0.703 ± 0.013 0.851 ± 0.004 0.761 ± 0.011
LMIFR (Song et al., 2018) 0.728 ± 0.014 0.659 ± 0.012 0.829 ± 0.009 0.741 ± 0.013
Ours (w. generative filter) 0.717 ± 0.006 0.632 ± 0.011 0.822 ± 0.005 0.731 ± 0.015
Table 2: Accuracy (acc.) and area under the ROC curve (auroc.) of the private label (gender) and target label (income) for the UCI-Adult dataset.
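The AUROC metric in Table 2 can be computed without any plotting via the Mann–Whitney formulation: the probability that a random positive example receives a higher score than a random negative one. A minimal sketch (ours):

```python
import numpy as np

def auroc(scores, labels):
    # AUROC = P(score_pos > score_neg) + 0.5 * P(tie), over all pos/neg pairs
    pos = scores[labels == 1]
    neg = scores[labels == 0]
    gt = (pos[:, None] > neg[None, :]).sum()
    eq = (pos[:, None] == neg[None, :]).sum()
    return (gt + 0.5 * eq) / (len(pos) * len(neg))

scores = np.array([0.9, 0.8, 0.7, 0.6, 0.4, 0.2])
labels = np.array([1, 1, 0, 1, 0, 0])
score_auc = auroc(scores, labels)  # one misranked pair of nine pulls this below 1.0
```

An AUROC near 0.5 on the private label, as the privatized representations approach, means the adversary's scores rank positives no better than chance.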

CelebA: For the CelebA dataset, we consider the case where there exist many private and utility label combinations depending on the user's preferences. Specifically, we experiment on the private labels gender (male), age (young), attractive, eyeglasses, big nose, big lips, high cheekbones, or wavy hair, and we set the utility label as smiling for each private label to simplify the experiment. Table 3 shows the classification results of multiple approaches. Our trained generative adversarial filter reduces the accuracy down to 73% on average, which is only 6% more than the worst-case accuracy, demonstrating the ability to protect the private attributes. Meanwhile, we sacrifice only a small amount of utility accuracy (a 3% drop), which ensures that the privatized data can still serve the desired classification tasks (all details are summarized in Table 3). We show samples of the gender-privatized images in Figure 5, which exhibit the desired phenomenon that some female images are switched into male images and some male images are changed into female images. More example images on other privatized attributes can be found in appendix section 6.7.

(a) raw
(b) same samples gender privatized
Figure 5: Sampled images. We try to protect gender as the private attribute and keep the expression of smiling as the utility. We switch some female faces to males and also some males to females while preserving the celebrity’s smile. The blurriness of the privatized images is due to the compactness of the latent representation that is generated from the VAE model and not from our privatization scheme. More details can be found in figure 10 in section 6.7.

Model Male Young Attractive H. Cheekbones B. lips B. nose Eyeglasses W. Hair Avg (private attr.) Smiling (utility attr.)
Liu et al. (2015) 0.98 0.87 0.81 0.87 0.68 0.78 0.99 0.80 0.84 0.92
Torfason et al. (2016) 0.98 0.89 0.82 0.87 0.73 0.83 0.99 0.84 0.87 0.93
VAE-emb 0.90 0.84 0.80 0.85 0.68 0.78 0.98 0.78 0.83 0.86
Random-guess 0.51 0.76 0.54 0.54 0.67 0.77 0.93 0.63 0.67 0.50
VAE-g-filter 0.61 0.76 0.62 0.78 0.67 0.78 0.93 0.66 0.73 0.83

Table 3: Classification accuracy on CelebA. The row VAE-emb is our initial classification accuracy on the latent vectors from our trained encoder. The row Random-guess demonstrates the worst possible classification accuracy. The row VAE-g-filter is the classification accuracy on the perturbed latent vectors yielded by our generative adversarial filters. The state-of-the-art classifier (Torfason et al., 2016) achieves 87% accuracy on the listed private labels on average, while our trained VAE achieves comparable accuracy (83% on average) on the more compact latent representations. More importantly, our trained generative adversarial filter reduces the accuracy down to 73% on average, which is only 6% more than the worst-case accuracy, demonstrating the ability to protect the private attributes. Meanwhile, we sacrifice only a small amount of utility accuracy (a 3% drop), which ensures that the privatized data can still serve the desired classification tasks.

4 Discussion

In order to clarify how our scheme can be run in a local and distributed fashion, we perform a basic experiment with 2 independent users to demonstrate this capability. The first user adopts the label of digit ≥ 5 as private and odd or even as the utility label. The second user prefers the opposite and wishes to privatize odd or even and maintain digit ≥ 5 as the utility label. We first partition the MNIST dataset into 10 equal parts, where the first part belongs to one user and the second part belongs to the other user. The remaining eight parts have already been made suitable for public use, either through privatization or because they do not contain information that their original owners consider sensitive. This public data is then encoded into its 10-dimensional representations and passed on to the two users for the purpose of training an appropriate classifier, rather than one trained on a single user's biased dataset. Since the data is already encoded into its representation, the footprint is very small when training the classifiers. Then, the generative filter for each user is trained separately and only on the single partition of personal data. Meanwhile, the adversarial and utility classifiers for each user are trained separately on the 8 parts of public data combined with the one part of personal data. The final result is 2 generative filters, one for each user, each corresponding to its owner's choice of private and utility labels. After the generative filters privatize the data, we evaluate the classification accuracy on the private and utility labels as measured by adversaries trained on the full privatized dataset. Table 4 shows the classification accuracy on the two users' privatized data. This shows how multiple generative filters can be trained independently to successfully privatize small subsets of data.

Classifier type User 1 (privatize ≥ 5) User 2 (privatize odd)
Private attr. 0.679 0.651
Utility attr. 0.896 0.855
Table 4: Accuracy of adversarial classifiers on the two users' private labels.
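Structurally, this two-user experiment can be sketched as below (ours): a toy stand-in filter, fit only on each user's own partition of codes from the shared frozen encoder, masks whichever latent dimension is most correlated with that user's private label. The masking filter and all names are illustrative simplifications of the trained generative filter.

```python
import numpy as np

rng = np.random.default_rng(4)

# Shared, frozen "encoder" output: dim 0 encodes attribute A, dim 1 attribute B.
codes = rng.normal(size=(1000, 2))
attr_a = (codes[:, 0] > 0).astype(int)
attr_b = (codes[:, 1] > 0).astype(int)

parts = np.array_split(np.arange(1000), 10)  # ten data owners, as in the MNIST setup
users = [
    {"rows": parts[0], "private": attr_a},  # user 1 privatizes A, keeps B
    {"rows": parts[1], "private": attr_b},  # user 2 privatizes B, keeps A
]

def fit_mask_filter(z_local, y_local):
    # toy stand-in for a generative filter: zero out the latent dimension
    # most correlated with this user's private label
    corr = [abs(np.corrcoef(z_local[:, d], y_local)[0, 1]) for d in range(z_local.shape[1])]
    mask = np.ones(z_local.shape[1])
    mask[int(np.argmax(corr))] = 0.0
    return mask

masks = [fit_mask_filter(codes[u["rows"]], u["private"][u["rows"]]) for u in users]
```

Each filter is fit independently and only on one tenth of the data, yet each releases codes that obscure that user's own choice of private attribute.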

We notice that our classification results on the utility label (smiling) in the CelebA experiment perform worse than the state-of-the-art classifiers presented in Liu et al. (2015) and Torfason et al. (2016). However, the main purpose of our approach is not building the best image classifier. Instead of constructing a large feature vector (through convolution, pooling, and non-linear activation operations), we compress a face image down to a 50-dimensional vector as the embedding (we use a VAE-type architecture to compress the image down to a 100-dimensional vector, then enforce the first 50 dimensions as the mean and the second 50 dimensions as the variance). We make the perturbation through a generative filter to yield a vector with the same dimensions. Finally, we construct a neural network with two fully-connected layers and an ELU activation after the first layer to perform the classification task. We believe the deficit in accuracy is due to the compact dimensionality of the representations and the simplified structure of the classifiers. We expect that a more powerful state-of-the-art classifier trained on the released private images will still demonstrate decreased accuracy on the private labels compared to the original non-private images while maintaining higher accuracy on the utility labels.

We also discover an interesting connection between our idea of decorrelating the released data from sensitive attributes and the notion of learning fair representations (Zemel et al., 2013; Louizos et al., 2015; Madras et al., 2018). In fairness learning, the notion of demographic parity requires the prediction of a target output to be independent of certain protected attributes. We find our generative filter could be used to produce a fair representation after privatizing the raw embedding, which shares a similar idea to demographic parity. Other notions in fairness learning, such as equalized odds and equal opportunity (Hardt et al., 2016), are left for future work.
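The demographic parity notion referenced here has a direct empirical check: compare positive-prediction rates across the groups defined by the protected attribute. A minimal sketch (ours):

```python
import numpy as np

def demographic_parity_gap(pred, protected):
    # |P(yhat = 1 | s = 1) - P(yhat = 1 | s = 0)|; zero means parity holds exactly
    return abs(pred[protected == 1].mean() - pred[protected == 0].mean())

rng = np.random.default_rng(6)
s = rng.integers(0, 2, size=10_000)             # protected attribute
fair = rng.integers(0, 2, size=10_000)          # predictions independent of s
biased = (rng.random(10_000) < 0.2 + 0.6 * s)   # acceptance rate 0.2 vs 0.8 by group

gap_fair = demographic_parity_gap(fair, s)
gap_biased = demographic_parity_gap(biased, s)
```

If the filter succeeds in decorrelating the released embedding from the protected attribute, any downstream predictor's parity gap is driven toward zero, as in the `fair` case.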

5 Conclusion

In this paper, we propose an architecture for privatizing data while maintaining utility that decouples representation learning from privatization for use in a distributed manner. Rather than training a very deep neural network or imposing a particular discriminator to judge real or fake images, we rely on a pretrained VAE that can create a comprehensive low dimensional representation of the raw data. We then find smart perturbations in the latent space according to customized requirements (e.g. various choices of the private label and utility label), using a robust optimization approach. Such an architecture and procedure enables small devices, such as mobile phones or home assistants (e.g. Google Home Mini), to run a lightweight learning algorithm to privatize data under various settings on the fly.


6 Appendix

6.1 VAE training

A variational autoencoder is a generative model defining a joint probability distribution between a latent variable z and the original input x. We assume the data is generated from a parametric distribution p_θ(x|z) that depends on the latent variable z, where θ are the parameters of a neural network, which usually is a decoder net. Maximizing the marginal likelihood p_θ(x) directly is usually intractable. Thus, we use the variational inference method proposed by Kingma & Welling (2013) to optimize over an alternative distribution q_φ(z|x) with an additional KL divergence term KL(q_φ(z|x) ‖ p(z)), where φ are the parameters of a neural net and p(z) is an assumed prior over the latent space. The resulting cost function is often called the evidence lower bound (ELBO):

    ELBO = E_{q_φ(z|x)}[ log p_θ(x|z) ] − KL( q_φ(z|x) ‖ p(z) ).

Maximizing the ELBO implicitly maximizes the log-likelihood of p_θ(x). The negative objective (also known as the negative ELBO) can be interpreted as minimizing the reconstruction loss of a probabilistic autoencoder while regularizing the posterior distribution towards a prior. Although the loss of the VAE is mentioned in many studies (Kingma & Welling, 2013; Louizos et al., 2015), we include the derivation for completeness. The evidence lower bound holds for any distribution q(z|x) by Jensen's inequality:

    log p_θ(x) = log E_{q(z|x)}[ p_θ(x|z) p(z) / q(z|x) ] ≥ E_{q(z|x)}[ log p_θ(x|z) ] − KL( q(z|x) ‖ p(z) ),    (i)

where (i) holds because we treat the encoder net as the distribution q(z|x). By placing the corresponding parameters of the encoder and decoder networks, and the negative sign, on the ELBO expression, we get the loss function in equation (1). The architecture of the encoder and decoder for the MNIST experiments is explained in section 6.5.

In our experiments with the MNIST dataset, the negative ELBO objective works well because each pixel value (0 black or 1 white) is generated from a Bernoulli distribution. However, in the experiments on CelebA, we change the reconstruction loss into the ℓ2 norm of the difference between the raw and reconstructed samples, because the RGB pixels are not Bernoulli random variables. We still add the regularizing KL term as follows:

    L = E ‖x − Dec(z)‖₂² + β KL( q_φ(z|x) ‖ p(z) ).

Throughout the experiments we use a Gaussian N(0, I) as the prior p(z), x is sampled from the data, z ~ q_φ(z|x), and β is a hyper-parameter. The reconstruction loss uses the ℓ2 norm by default, although the ℓ1 norm is acceptable too.

When training the VAE, we additionally ensure that small perturbations in the latent space will not yield huge deviations in the reconstructed space. More specifically, we denote the encoder and decoder as $\mathrm{Enc}$ and $\mathrm{Dec}$ respectively. The generator can be considered as a composition of an encoding and decoding process, i.e. $g = \mathrm{Dec} \circ \mathrm{Enc}$, where we ignore the inputs here for the purpose of simplifying the explanation. One desired intuitive property for the decoder is that small changes in the input latent space still produce plausible faces similar to the original latent space when reconstructed. Thus, we would like to impose a Lipschitz continuity property on the decoder, i.e. for two points $z_1, z_2$, we assume $\|\mathrm{Dec}(z_1) - \mathrm{Dec}(z_2)\| \le L\,\|z_1 - z_2\|$, where $L$ is some Lipschitz constant (or equivalently $\|\mathrm{Dec}(z_1) - \mathrm{Dec}(z_2)\| / \|z_1 - z_2\| \le L$). In the implementation of our experiments, the gradient penalty for each batch (with size $b$) is
$$\frac{1}{b}\sum_{i=1}^{b}\big\|\nabla_{z_i}\mathrm{Dec}(z_i)\big\|.$$
It is worth noticing that $\nabla_Z \sum_{i=1}^{b} \mathrm{Dec}(z_i)$ recovers every individual gradient $\nabla_{z_i}\mathrm{Dec}(z_i)$, because $\nabla_{z_j}\mathrm{Dec}(z_i) = 0$ when $i \ne j$. Thus, we define $Z = (z_1, \dots, z_b)$ as the batched latent input, and use $\nabla_Z \sum_i \mathrm{Dec}(z_i)$ to avoid the iterative calculation of each gradient within a batch. The loss used for training the VAE is modified to be
$$L(\theta, \phi) = \mathbb{E}\,\big\|x - \mathrm{Dec}_\theta(z)\big\|_2^2 + \lambda\, D_{\mathrm{KL}}\big(q_\phi(z|x)\,\|\,p(z)\big) + \gamma\,\mathbb{E}\,\big\|\nabla_{\hat z}\mathrm{Dec}_\theta(\hat z)\big\|,$$
where $x$ are samples drawn from the image data, $z \sim q_\phi(z|x)$, $\lambda$ and $\gamma$ are hyper-parameters, and $\nabla_{\hat z}$ means the gradient with respect to $\hat z$. Finally, $\hat z$ is defined by sampling $z \sim q_\phi(z|x)$, $z_p \sim p(z)$, and $\alpha \sim \mathrm{Uniform}[0, 1]$, and returning $\hat z = \alpha z + (1 - \alpha) z_p$. We optimize over $\hat z$ to ensure that points in between the prior and learned latent distribution maintain a similar reconstruction to points within the learned distribution. This gives us the Lipschitz continuity properties we desire for points perturbed outside of the original learned latent distribution.
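The sampling of points between the prior and the learned latent distribution can be sketched as follows in NumPy; the names and shapes are illustrative assumptions, and in practice the mean and standard deviation would come from the encoder:

```python
import numpy as np

def sample_interpolated(mu, sigma, rng):
    """Draw z from the encoder's Gaussian q(z|x), z_p from the prior N(0, I),
    and return a random convex combination z_hat = a*z + (1-a)*z_p."""
    b, d = mu.shape
    z = mu + sigma * rng.standard_normal((b, d))      # z ~ q(z|x)
    z_p = rng.standard_normal((b, d))                 # z_p ~ p(z) = N(0, I)
    a = rng.uniform(0.0, 1.0, size=(b, 1))            # one alpha per sample
    return a * z + (1.0 - a) * z_p

rng = np.random.default_rng(0)
mu, sigma = np.zeros((4, 10)), np.ones((4, 10))       # toy batch of 4, latent dim 10
z_hat = sample_interpolated(mu, sigma, rng)
```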

6.2 Robust Optimization and Adversarial Training

In this part, we formulate the generator training as a robust optimization. Essentially, the generator tries to construct a new latent distribution that reduces the correlation between data samples and sensitive labels while maintaining the correlation with utility labels, by leveraging the appropriate generative filters. The new latent distribution, however, cannot deviate too much from the original distribution (bounded by $\epsilon$) in order to maintain the general quality of the reconstructed images. To simplify the notation, we write the classifier without its parameters (a similar notion applies to the encoder and decoder). We also consider the triplet $(z', s, u)$ as the input data, where $z'$ is the perturbed version of the original embedding $z$, which is the latent representation of image $x$. The values $s$ and $u$ are the sensitive label and utility label respectively. Without loss of generality, we succinctly express the adversary's loss as $\ell(z', s)$ (similarly expressing the utility loss as $\ell(z', u)$). We assume the sample input $z'$ follows the distribution $q'$ that needs to be determined. Thus, the robust optimization is

$$\min \;\; \max_{q' :\, D_f(q' \,\|\, q) \le \epsilon} \;\; \mathbb{E}_{(z', s) \sim q'}\big[\ell(z', s)\big],$$
where $q$ is the distribution of the raw embedding $z$. The $f$-divergence (Nguyen et al., 2010; Cichocki & Amari, 2010) between $q'$ and $q$ is
$$D_f(q' \,\|\, q) = \int f\!\left(\frac{q'(z)}{q(z)}\right) q(z)\, d\mu(z)$$
(assuming $q'$ and $q$ are absolutely continuous with respect to measure $\mu$). A few typical divergences (Nowozin et al., 2016), depending on the choices of $f$, are

  1. KL-divergence $\int q' \log\frac{q'}{q}\, d\mu$, by taking $f(t) = t \log t$

  2. reverse KL-divergence $\int q \log\frac{q}{q'}\, d\mu$, by taking $f(t) = -\log t$

  3. $\chi^2$-divergence $\int \frac{(q' - q)^2}{q}\, d\mu$, by taking $f(t) = (t - 1)^2$.

In the remainder of this section, we focus on the KL and $\chi^2$ divergences to build a connection between the divergence based constraint we use and the norm-ball based constraints seen in Wong & Kolter (2018); Madry et al. (2017); Koh et al. (2018), and Papernot et al. (2017).
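As a quick sanity check on these definitions, the discrete form $D_f(q' \| q) = \sum_i q_i\, f(q'_i / q_i)$ recovers the three divergences above with the stated choices of $f$; a small NumPy sketch (the two distributions are chosen arbitrarily):

```python
import numpy as np

def f_divergence(q_new, q, f):
    """D_f(q_new || q) = sum_i q_i * f(q_new_i / q_i) for discrete distributions."""
    t = q_new / q
    return float(np.sum(q * f(t)))

q_new = np.array([0.5, 0.3, 0.2])
q = np.array([0.4, 0.4, 0.2])

kl = f_divergence(q_new, q, lambda t: t * np.log(t))        # KL(q_new || q)
rev_kl = f_divergence(q_new, q, lambda t: -np.log(t))       # KL(q || q_new)
chi_sq = f_divergence(q_new, q, lambda t: (t - 1.0) ** 2)   # chi^2(q_new || q)
```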

6.2.1 Extension to multivariate Gaussian

When we train the VAE in the beginning, we impose the latent distribution to be a multivariate Gaussian by penalizing the KL-divergence between $q_\phi(z|x)$ and a prior normal distribution $\mathcal{N}(0, I)$, where $I$ is the identity matrix. Without loss of generality, we can assume the raw encoded variable $z$ follows the Gaussian distribution $q = \mathcal{N}(\mu, \Sigma)$ (more precisely $\mathcal{N}(\mu(x), \Sigma(x))$, where the mean and variance depend on samples $x$, but we suppress the $x$ to simplify notation). The new perturbed distribution is then also a Gaussian $q' = \mathcal{N}(\mu', \Sigma')$. Thus, the constraint for the KL divergence becomes
$$D_{\mathrm{KL}}(q' \,\|\, q) = \frac{1}{2}\left[\operatorname{tr}\big(\Sigma^{-1}\Sigma'\big) + (\mu - \mu')^{\top}\Sigma^{-1}(\mu - \mu') - d + \ln\frac{\det\Sigma}{\det\Sigma'}\right] \le \epsilon.$$

If we further consider the case that $\Sigma' = \Sigma$, then
$$D_{\mathrm{KL}}(q' \,\|\, q) = \frac{1}{2}(\mu - \mu')^{\top}\Sigma^{-1}(\mu - \mu') \le \epsilon.$$
When $\Sigma = \sigma^2 I$, the preceding constraint is equivalent to $\|\mu - \mu'\|_2^2 \le 2\sigma^2\epsilon$. It is worth mentioning that such a divergence based constraint is also connected to a norm-ball based constraint on samples. When every variance of $q$ is at least $\sigma^2$, i.e. $\Sigma \succeq \sigma^2 I$, it can be discovered that
$$D_{\mathrm{KL}}(q' \,\|\, q) = \frac{1}{2}(\mu - \mu')^{\top}\Sigma^{-1}(\mu - \mu') \overset{(i)}{\le} \frac{\|\mu - \mu'\|_2^2}{2\sigma^2},$$
where (i) uses the assumption $\Sigma \succeq \sigma^2 I$. This implies that the norm-ball ($\ell_2$-norm) based perturbation constraint $\|\mu - \mu'\|_2^2 \le 2\sigma^2\epsilon$ automatically satisfies $D_{\mathrm{KL}}(q' \,\|\, q) \le \epsilon$. In the other case, where $\Sigma \preceq \sigma^2 I$, constraint satisfaction is switched, so the constraint $D_{\mathrm{KL}}(q' \,\|\, q) \le \epsilon$ automatically satisfies $\|\mu - \mu'\|_2^2 \le 2\sigma^2\epsilon$.
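The relation between the KL constraint and the norm-ball can be checked numerically; below is a minimal NumPy sketch with a hypothetical diagonal parameterization, verifying that equal covariances $\sigma^2 I$ reduce the KL divergence to $\|\mu - \mu'\|_2^2 / (2\sigma^2)$:

```python
import numpy as np

def kl_diag_gaussians(mu1, var1, mu2, var2):
    """KL( N(mu1, diag(var1)) || N(mu2, diag(var2)) ), closed form."""
    return 0.5 * float(np.sum(var1 / var2 + (mu1 - mu2) ** 2 / var2
                              - 1.0 + np.log(var2 / var1)))

# Equal covariances sigma^2 * I: KL reduces to ||mu1 - mu2||^2 / (2 sigma^2).
mu1, mu2 = np.zeros(2), np.ones(2)
var = np.full(2, 0.5)                                   # sigma^2 = 0.5
kl = kl_diag_gaussians(mu1, var, mu2, var)
norm_bound = float(np.sum((mu1 - mu2) ** 2)) / (2 * 0.5)
```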

In the case of the $\chi^2$-divergence, with $\Sigma' = \Sigma$,
$$D_{\chi^2}(q' \,\|\, q) = \exp\!\big((\mu - \mu')^{\top}\Sigma^{-1}(\mu - \mu')\big) - 1.$$

When $\Sigma = \sigma^2 I$, we have the following simplified expression
$$D_{\chi^2}(q' \,\|\, q) = \exp\!\left(\frac{c}{\sigma^2}\right) - 1,$$
where $c = \|\mu - \mu'\|_2^2$. Letting $D_{\chi^2}(q' \,\|\, q) \le \epsilon$ indicates $c \le \sigma^2 \ln(1 + \epsilon)$. Since the value of $c$ is always non-negative as a squared norm, the bound is valid for any $\epsilon \ge 0$. Thus, we have $\|\mu - \mu'\|_2^2 \le \sigma^2 \ln(1 + \epsilon)$. Therefore, when the divergence constraint is satisfied, we will have a norm-ball constraint, which is similar to equation (10) with a new constant for the distortion budget.
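The closed form used here, $\chi^2 = \exp(\|\mu - \mu'\|^2 / \sigma^2) - 1$ for equal variances, can be verified by quadrature in one dimension; a small NumPy sketch:

```python
import numpy as np

def chi_sq_gaussians_1d(mu_new, mu, sigma):
    """chi^2( N(mu_new, s^2) || N(mu, s^2) ) by trapezoidal quadrature of
    the integral of p_new(x)^2 / p(x) dx, minus 1."""
    x = np.linspace(min(mu_new, mu) - 12 * sigma, max(mu_new, mu) + 12 * sigma, 200001)
    p_new = np.exp(-(x - mu_new) ** 2 / (2 * sigma ** 2)) / np.sqrt(2 * np.pi * sigma ** 2)
    p = np.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / np.sqrt(2 * np.pi * sigma ** 2)
    y = p_new ** 2 / p
    integral = float(np.sum((y[1:] + y[:-1]) * 0.5 * np.diff(x)))
    return integral - 1.0

numeric = chi_sq_gaussians_1d(1.0, 0.0, 1.0)
closed_form = float(np.exp(1.0 ** 2 / 1.0 ** 2) - 1.0)  # exp(|mu_new - mu|^2 / s^2) - 1
```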

We make use of these relationships in our implementation as follows. We define $\mu(\cdot)$ to be the function that splits the last layer of the output of the encoder part of the pretrained VAE, $\mathrm{Enc}$, and takes the first half portion as the mean of the latent distribution. We let $\sigma^2(\cdot)$ be the function that takes the second half portion to be the (diagonal) of the variance of the latent distribution. Then, our implementation of the norm-ball constraint on the shift of the mean is equivalent to the constraints derived above. As shown in the previous two derivations, optimizing over this constraint is similar to optimizing over the defined KL and $\chi^2$-divergence constraints.
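A minimal sketch of this splitting (names are ours; whether the second half encodes a variance directly or a log-variance is an implementation detail we assume here):

```python
import numpy as np

def split_gaussian_params(enc_out):
    """Split the encoder's last layer into a mean (first half) and a
    diagonal variance (second half, interpreted as a log-variance)."""
    d = enc_out.shape[-1] // 2
    mu = enc_out[..., :d]
    var = np.exp(enc_out[..., d:])
    return mu, var

def within_budget(mu, mu_pert, var, eps):
    """Surrogate for the KL constraint when Sigma' = Sigma = diag(var):
    KL = 0.5 * sum((mu - mu_pert)^2 / var) <= eps."""
    kl = 0.5 * float(np.sum((mu - mu_pert) ** 2 / var))
    return kl <= eps
```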

6.3 Comparison with Differential Privacy

The basic intuition behind (differential) privacy is that it is very hard to tell whether a released sample originated from raw sample $z$ or $z'$, thus protecting the privacy of the raw samples. Designing such a releasing scheme, often called a channel (or mechanism), usually requires some randomized response or noise perturbation. The goal of such a releasing scheme does not necessarily involve reducing the correlation between released data and sensitive labels. Yet, it is worth investigating the notion of differential privacy and comparing the empirical performance of a typical scheme in our setting due to its prevalence in the traditional privacy literature. In this section, we use the words channel, scheme, mechanism, and filter interchangeably, as they have the same meaning in our context. We also overload the notation of $\epsilon$ and $\delta$ because of the conventions in the differential privacy literature.

Definition 1.

($\epsilon$, $\delta$)-differential privacy (Dwork et al., 2014). Let $\epsilon, \delta \ge 0$. A channel $T$ from space $\mathcal{Z}$ to output space $\mathcal{O}$ is ($\epsilon$, $\delta$)-differentially private if for all measurable sets $S \subseteq \mathcal{O}$ and all neighboring samples $z$ and $z'$,
$$\mathbb{P}\big(T(z) \in S\big) \le e^{\epsilon}\,\mathbb{P}\big(T(z') \in S\big) + \delta.$$

An alternative way to interpret this definition is that, with high probability $1 - \delta$, we have a bounded likelihood ratio $\mathbb{P}(T(z) \in S)/\mathbb{P}(T(z') \in S) \le e^{\epsilon}$, which is close to 1 as $\epsilon$ goes to 0. Consequently, it is difficult to tell if the observation is from $z$ or $z'$ when the ratio is close to 1. In the ongoing discussion, we consider the classical Gaussian mechanism $T(z) = f(z) + n$ with $n \sim \mathcal{N}(0, \sigma_n^2 I)$, where $f$ is some function (or query) that is defined on the latent space. We first include a theorem from Dwork et al. (2014) to disclose how differential privacy using the Gaussian mechanism can be satisfied by our baseline implementation and the constraints in the robust optimization formulation. We denote a pair of neighboring inputs as $(z, z')$ for abbreviation.

Theorem 6.1.

(Dwork et al. (2014), Theorem A.1) For any $\epsilon \in (0, 1)$, the Gaussian mechanism with parameter $\sigma_n \ge \sqrt{2\ln(1.25/\delta)}\,\Delta_2(f)/\epsilon$ is ($\epsilon$, $\delta$)-differentially private, where $\Delta_2(f)$ denotes the $\ell_2$-sensitivity of $f$.
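A small helper computing the calibrated noise scale from the theorem (the function name is ours; this is the standard formula, not code from the paper):

```python
import math

def gaussian_mechanism_sigma(eps, delta, l2_sensitivity):
    """Noise scale satisfying Dwork et al. (2014), Theorem A.1:
    sigma >= sqrt(2 * ln(1.25/delta)) * Delta_2(f) / eps
    gives (eps, delta)-DP for eps in (0, 1)."""
    assert 0.0 < eps < 1.0 and 0.0 < delta < 1.0
    return math.sqrt(2.0 * math.log(1.25 / delta)) * l2_sensitivity / eps

sigma = gaussian_mechanism_sigma(eps=0.5, delta=1e-5, l2_sensitivity=1.0)
```

Tighter privacy (smaller eps or delta) demands a larger noise scale, matching the intuition in the proposition below that a larger distortion budget is needed.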

In our baseline implementation of the Gaussian mechanism, we apply the additive Gaussian noise on the samples directly to yield
$$z' = z + n, \qquad n \sim \mathcal{N}(0, \sigma_n^2 I)$$
(i.e. $f$ is the identity mapping). If $z \sim \mathcal{N}(\mu, \Sigma)$, we have $z' \sim \mathcal{N}(\mu, \Sigma + \sigma_n^2 I)$. Therefore, considering KL-divergence as an example, we have
$$D_{\mathrm{KL}}(q' \,\|\, q) \overset{(i)}{=} \frac{1}{2}\sum_{i=1}^{d}\left[\frac{\sigma_n^2}{\sigma_i^2} - \ln\!\left(1 + \frac{\sigma_n^2}{\sigma_i^2}\right)\right],$$
where (i) assumes $\Sigma = \operatorname{diag}(\sigma_1^2, \dots, \sigma_d^2)$ follows the diagonal structure as it does in the VAE setting. Since $\sigma_n \ge \sqrt{2\ln(1.25/\delta)}\,\Delta_2(f)/\epsilon$ guarantees (by theorem 6.1) that the Gaussian mechanism is ($\epsilon$, $\delta$)-differentially private, we can prove that the KL-distortion budget allows for differential privacy through the following proposition:

Proposition 1.

The additive Gaussian mechanism with the adjustable variance $\sigma_n^2$ is ($\epsilon$, $\delta$)-differentially private when the distortion budget $\epsilon_{\mathrm{KL}}$ satisfies
$$\epsilon_{\mathrm{KL}} \ge \frac{1}{2}\sum_{i=1}^{d}\left[\frac{\sigma_n^2}{\sigma_i^2} - \ln\!\left(1 + \frac{\sigma_n^2}{\sigma_i^2}\right)\right],$$
where $\sigma_n = \sqrt{2\ln(1.25/\delta)}\,\Delta_2(f)/\epsilon$, and $\sigma_i^2$ are the diagonal elements of the sample covariance $\Sigma$.

The proof of this proposition immediately follows from using theorem 6.1 on equation (16). This relationship is intuitive, as increasing the budget of the divergence implicitly allows a larger variance of the additive noise $n$. Thus it becomes difficult to distinguish whether the released sample originated from $z$ or $z'$ in the latent space. The above example is just one complementary case of how the basic Gaussian mechanism falls within our divergence constraints in the context of differential privacy. There are many other potential extensions, such as obfuscating the gradient, or a randomized response to generate new samples.

In our implementation, the noise is generated from $\mathcal{N}(0, \sigma_n^2 I)$, where $\sigma_n$ is learned through the back-propagation of the robust optimization loss function, subject to the divergence constraint with a prescribed budget $\epsilon$.

6.4 Why Does the Cross-Entropy Loss Work

In this section, we build the connection between the cross-entropy loss and mutual information to give intuition for why maximizing the cross-entropy loss in our optimization reduces the correlation between released data and sensitive labels. Given that the encoder part of the VAE is fixed, we focus on the latent variable $z$ for the remaining discussion in this section.

The mutual information between latent variable $z$ and sensitive label $s$ can be expressed as follows:
$$I(z; s) = H(s) - H(s \mid z) = H(s) + \mathbb{E}_{p(z, s)}\big[\log p(s \mid z)\big] \ge H(s) + \mathbb{E}_{p(z, s)}\big[\log q(s \mid z)\big],$$
where $p$ is the data distribution, $q(s \mid z)$ is an approximating distribution, which is similar to the last logit layer of a neural network classifier, and the inequality holds because the gap is $\mathbb{E}_z[D_{\mathrm{KL}}(p(s|z)\,\|\,q(s|z))] \ge 0$. Then, the term $-\mathbb{E}_{p(z,s)}[\log q(s \mid z)]$ is the cross-entropy loss (the corresponding negative log-likelihood is $-\log q(s \mid z)$). In classification problems, minimizing the cross-entropy loss enlarges the value of $\mathbb{E}_{p(z,s)}[\log q(s \mid z)]$. Consequently, this pushes the lower bound of $I(z; s)$ in equation (22) as high as possible, indicating high mutual information.

However, in our robust optimization, we maximize the cross-entropy, thus decreasing the value of $\mathbb{E}[\log q(s \mid z)]$ (more specifically $\mathbb{E}[\log q(s \mid z')]$, given that the mutual information we care about is between the new representation $z'$ and the sensitive label $s$ in our application). Thus, the lower bound in equation (22) takes a smaller value, which indicates the mutual information can be lower than before. Such observations are also supported by the empirical estimates of mutual information shown in figure 9.
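The variational lower bound on mutual information can be illustrated on a small discrete toy example: plugging in the true posterior $q(s|z) = p(s|z)$ makes the bound tight, while a weaker classifier (here, uniform) gives a strictly smaller value. The joint distribution below is arbitrary:

```python
import numpy as np

# Hypothetical toy joint distribution over (z, s): rows are z values, columns s labels.
p_zs = np.array([[0.30, 0.10],
                 [0.05, 0.25],
                 [0.15, 0.15]])
p_z = p_zs.sum(axis=1, keepdims=True)
p_s = p_zs.sum(axis=0)
p_s_given_z = p_zs / p_z

H_s = -float(np.sum(p_s * np.log(p_s)))
mi = float(np.sum(p_zs * np.log(p_zs / (p_z * p_s))))    # exact I(Z; S)

def lower_bound(q_s_given_z):
    """H(S) + E_{p(z,s)}[log q(s|z)] -- the variational lower bound on I(Z; S)."""
    return H_s + float(np.sum(p_zs * np.log(q_s_given_z)))

exact = lower_bound(p_s_given_z)                 # q = p(s|z): the bound is tight
uniform = lower_bound(np.full_like(p_zs, 0.5))   # a poor classifier: a looser bound
```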

6.5 Experiment Details

6.5.1 VAE architecture

The MNIST dataset contains 60000 samples of gray-scale handwritten digits with size 28-by-28 pixels in the training set, and 10000 samples in the testing set. When running experiments on MNIST, we convert the images into 784 dimensional vectors, and construct a network with the following structure for the VAE:

The aligned CelebA dataset contains 202599 samples. We crop each image down to 64-by-64 pixels with 3 color (RGB) channels, pick the first 182000 samples as the training set, and leave the remainder as the testing set. The encoder and decoder architectures for the CelebA experiments are described in Table 5 and Table 6.

We use Adam (Kingma & Ba, 2014) to optimize the network parameters throughout all training procedures, with batch sizes of 100 and 24 for MNIST and CelebA respectively.

Name Configuration Replication
initial layer

conv2d=(3, 3), stride=(1, 1),

padding=(1, 1), channel in = 3, channel out =
dense block1

batch norm, relu, conv2d=(1, 1), stride=(1, 1),

batch norm, relu, conv2d=(3, 3), stride=(1, 1), 12
growth rate = , channel in = 2
transition block1 batch norm, relu, 1
conv2d=(1, 1), stride=(1, 1), average pooling=(2, 2),
channel in = ,
channel out =
dense block2 batch norm, relu, conv2d=(1, 1), stride=(1, 1), 12
batch norm, relu, conv2d=(3, 3), stride=(1, 1),
growth rate=, channel in = ,
transition block2 batch norm, relu, 1
conv2d=(1, 1), stride=(1, 1), average pooling=(2, 2),
channel in =
channel out=
dense block3 batch norm, relu, conv2d=(1, 1), stride=(1, 1), 12
batch norm, relu, conv2d=(3, 3), stride=(1, 1),
growth rate = , channel in =
transition block3 batch norm, relu, 1
conv2d=(1, 1), stride=(1, 1), average pooling=(2, 2),
channel in =
channel out =
output layer batch norm, fully connected 100 1
Table 5: Encoder Architecture in CelebA experiments. We use the DenseNet (Huang et al., 2017b) architecture with a slight modification to embed the raw images into a compact latent vector, with growth rate and depth
Name Configuration Replication
initial layer fully connected 4096 1
reshape block resize 4096 to 1
decode block conv transpose=(3, 3), stride=(2, 2), 4
padding=(1, 1), outpadding=(1, 1),
relu, batch norm
decoder block conv transpose=(5, 5), stride=(1, 1), 1
padding=(2, 2)
Table 6: Decoder Architecture in CelebA experiments.

6.5.2 Filter Architecture

We use a generative linear filter throughout our experiments. In the MNIST experiments, we compress the latent embedding down to a 10-dimensional vector. For MNIST Case 1, we use a 10-dimensional Gaussian random vector $n \sim \mathcal{N}(0, I)$ concatenated with a 10-dimensional one-hot vector $s$ representing the digit id label. We use the linear filter $W$ to ingest the concatenated vector, and add the corresponding output to the original embedding vector $z$ to yield $z'$. Thus the mechanism is
$$z' = z + W\,[n; s],$$
where $W$ is a matrix in $\mathbb{R}^{10 \times 20}$. For MNIST Case 2, we use a similar procedure except the private label is a binary label (i.e. digit value $\ge 5$ or not). Thus, the corresponding one-hot vector is 2-dimensional. As we keep $n$ as a 10-dimensional vector, the corresponding linear filter is a matrix in $\mathbb{R}^{10 \times 12}$.

In the CelebA experiments, we create the generative filter following the same methodology as in equation (23), with some changes to the dimensions of $W$ and the one-hot label vector.
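A minimal NumPy sketch of this linear filter mechanism for MNIST Case 1 (the weights here are random placeholders; in the paper the filter is learned via the robust optimization):

```python
import numpy as np

rng = np.random.default_rng(0)
latent_dim, label_dim = 10, 10
W = rng.standard_normal((latent_dim, latent_dim + label_dim)) * 0.1  # 10 x 20 filter

def privatize(z, digit_id):
    """z' = z + W @ [noise; one_hot(digit_id)]: perturb the embedding with a
    linear filter driven by fresh Gaussian noise and the label."""
    noise = rng.standard_normal(latent_dim)
    one_hot = np.zeros(label_dim)
    one_hot[digit_id] = 1.0
    return z + W @ np.concatenate([noise, one_hot])

z = rng.standard_normal(latent_dim)
z_priv = privatize(z, digit_id=3)
```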

6.5.3 Adversarial classifiers

In the MNIST experiments, we use a small architecture consisting of two fully-connected layers with an exponential linear unit (ELU) to serve as the adversarial privacy and utility classifiers. The specific structure of the classifier is depicted as follows:

In the CelebA experiments, we construct a two-layered neural network that is shown as follows:

The classifiers ingest the embedded vectors and output unnormalized logits for the private label or utility label. The classification results can be found in table 3.

6.6 More Results of MNIST Experiments

In this section, we illustrate detailed results for the MNIST experiment when we set whether the digit is odd or even as the utility label, and whether the digit is greater than or equal to 5 as the private label. We first show samples of raw images and privatized images in figure 6. We show the classification accuracy and its sensitivity in figure 7. Furthermore, we display the geometry of the latent space in figure 8.

In addition to the classification accuracy, we evaluate the mutual information to justify that our generative filter indeed decreases the correlation between released data and private labels, as shown in figure 9.

6.6.1 Utility of Odd or Even

We present some examples of digits when the utility label is odd or even in figure 6. The confusion matrix in figure 6(a) shows that the false positive rate and false negative rate are almost equivalent, indicating that the perturbation resulting from the filter doesn't necessarily favor one type (pure positive or negative) of samples. Figure 6(b) shows that the generative filter, learned through min-max robust optimization, outperforms the Gaussian mechanism under the same distortion budget. The Gaussian mechanism reduces the accuracy of both private and utility labels, while the generative filter maintains the accuracy of the utility label while decreasing the accuracy of the private label as the distortion budget goes up.

Furthermore, the distortion budget prevents the generative filter from distorting non-target attributes too severely. This allows the data to maintain some information even if it is not specified in the filter’s loss function. Figure 6(c) shows the accuracy with the added non-target label of circle from MNIST case 1.

(a) Sample of original digits
(b) Same images with large-valued digits privatized
Figure 6: MNIST case 2: Visualization of digits pre and post noise injection and adversarial training. We discover that some large-valued digits ($\ge 5$) are randomly switched to low-valued ($< 5$) digits (or vice versa) while some even digits