
Protecting Sensitive Attributes via Generative Adversarial Networks

Recent advances in computing have made it possible to collect large amounts of data on personal activities and private living spaces. Collecting and publishing a dataset in this environment can raise concerns over the privacy of the individuals in the dataset. In this paper we examine these privacy concerns. In particular, given a target application, how can we mask sensitive attributes in the data while preserving the utility of the data in that target application? Our focus is on protecting attributes that are hidden and can be inferred from the data by machine learning algorithms. We propose a generic framework that (1) removes the knowledge useful for inferring sensitive information, but (2) preserves the knowledge relevant to a given target application. We use deep neural networks and generative adversarial networks (GAN) to create privacy-preserving perturbations. Our noise-generating network is compact and efficient enough to run on mobile devices. Through extensive experiments, we show that our method outperforms conventional methods in effectively hiding the sensitive attributes while guaranteeing high performance for the target application. Our results hold for new neural network architectures not seen during training, and the perturbed data are suitable for training new classifiers.


1. Introduction

Figure 1. Using our method (PR-GAN), Alice sends anonymized data (denoted by blurred faces) to an outside server, making sure her identity is not revealed but her gender is visible. When an adversary intercepts her message, he cannot de-anonymize her identity using a classifier. However, the adversary can reveal Bob's identity, as Bob uses a naive noise generation method. (Footnote 1: Graphics are partially obtained from the Twitter open-source emoji repository.)

In recent years, we have witnessed an explosive growth in the use of data-driven techniques in every aspect of our lives. Massive amounts of data are collected and processed to predict consumers' behavior, to improve an airport's safety measures, or to make energy delivery to buildings more efficient. While we celebrate the convenience brought to us by these technologies, the collected data can often be personal and contain attributes that are extremely sensitive. With the recent breakthroughs in machine learning, datasets that look innocent can be used to reveal sensitive information. For example, one can infer the number of people talking, their identities and social relationships, and the type of environment based solely on background noise (42; 28; 30; 31; 29). Data from electricity meters can reveal sensitive information such as average household incomes and the occupants' age distributions (43). GPS trajectories can reveal social ties (40; 13) and are highly distinctive for each person (10). Using WiFi scans, one can discover occupancy patterns of private households (20). The collection and usage of such data raise important questions of ethics and privacy. It is thus crucial that we find the correct balance between using data to improve the quality of our lives and making sure sensitive information is effectively protected.

The most widely used notion of privacy has been Differential Privacy (DP) (11). Consider a database in which each entry is a piece of sensitive data (e.g., a patient's medical history) which cannot be released, but aggregated queries on these entries are considered non-sensitive and allowed. More specifically, consider a database $D$ and a clone of it, $D'$, that lacks only one record $r$. If the answers to a query on the two databases are almost indistinguishable (the probabilities of any outcome differ by a factor of at most $e^{\epsilon}$), there is a good chance we cannot infer whether $r$ was in the database or not. This is termed $\epsilon$-differential privacy. Despite DP's flexibility in providing a privacy guarantee in different applications, it can sometimes be too restrictive (3) and in some applications can hurt the utility of the data as a result (15).

There are other metrics for privacy that are suited to well-structured (e.g., relational) data, such as $k$-anonymity (38), among a multitude of others. These criteria attempt to offer guarantees about the ability of an attacker to easily recognize a certain record within a database. All of these techniques, however, rely heavily on a priori knowledge of which features in the data are either sensitive themselves or can be linked to sensitive attributes. In well-structured data, such as a patient's medical record, we often have a clear idea of which attributes are sensitive, such as the patient's name, address or other personally identifying information. This is a key distinction from our work, as we focus mainly on unstructured data, such as images or binary vectors, where sensitive attributes are not known beforehand and have to be automatically discovered and erased.

In this paper, we argue that in certain application scenarios, we are able to perform a more targeted privacy protection which takes into account the content of the data. We notice that it is not enough to remove or perturb only the sensitive attributes to guarantee privacy, as sensitive information may be embedded in multiple attributes and may be learned through a classifier. We introduce a procedure called PR-GAN: the user may specify a function $f_t$ (such as a classifier) that predicts target attributes and a function $f_s$ that predicts sensitive attributes. We then perturb the input data in a way that $f_t$ continues performing well while the performance of $f_s$ is severely impaired. The function $f_t$ describes data utility and the function $f_s$ characterizes the privacy concern.

Consider a case where volunteers want to contribute their photos to train a classifier for a given task such as gender identification, but they do not want their identities to be revealed in case the data ends up in the wrong hands. Since the organization that keeps the data may not be trusted to properly address the privacy concerns, a better approach is ensuring privacy from the source. Figure 1 shows such a case where Alice and Bob want to share their photos with an external server and they use perturbations to conceal their identity. An adversary, using an auxiliary dataset, has trained an identity-revealing model. He then intercepts Bob's message midway and reveals his identity using this model. Our goal is that if Alice's photo, perturbed by our method, is intercepted, the adversary will not succeed in revealing her identity. Keep in mind that for their photos to be useful, Alice and Bob both have to make sure that the photos retain enough utility to train a gender identification model.

To produce the tailored perturbation, we use Generative Adversarial Networks (GAN) (16). A standard GAN is composed of two neural networks contesting with each other in a zero-sum game framework. More specifically, a GAN simultaneously trains two models: a generative model $G$ and a discriminative model $D$. $G$ and $D$ take turns in training, where $G$ tries to maximize the probability of $D$ making a mistake while $D$ tries to predict whether an input is fake (produced by $G$) as accurately as possible. After training, the generator $G$ produces artificial data that the discriminator $D$ cannot distinguish from the original data. GANs have achieved visually appealing results in image generation (4), with the quality of the synthesis improving by the year (19; 46). Although the training of the noise generator network ($G$) might be expensive, the network itself can be very compact and, as shown in Section 5.3, suitable for running on personal devices. One can first train it with as many computational resources as required, then deploy it to remote devices. Once there, it can be used to anonymize data at the source. This procedure is depicted in Figure 2.

Figure 2. Left: the noise generator network is trained on a server, with enough computational resources, then it’s deployed to users’ devices. Right: users can share data anonymized via PR-GAN with external entities.

To test our method, we conduct extensive experiments with three datasets: (1) MNIST, a standard benchmark image dataset, (2) a WiFi signal dataset for indoor localization, and (3) PubFig, a dataset of faces. For each dataset we show compelling evidence of the performance of our perturbations, compared to baseline approaches. Our method is capable of finding a good trade-off in difficult situations where the sensitive and target attributes are highly correlated. Through experiments we demonstrate that although we plug in specific classifiers in our training, the perturbation works for new classifiers not seen before. The perturbed data can be used for inference purposes as well as training new models for target application with a high performance.

The key properties of our approach can be summarized below.

  1. We provide a new framework for application-driven private data publishing which allows for elaborate user-specified constraints. The trade-off between privacy and utility is built into our proposed framework. This means that when sensitive and target attributes are correlated, our solution allows users to find the right trade-off between privacy (in terms of removing sensitive information) and utility (in terms of preserving information on target attributes). Our proposed framework is fairly generic and, as shown in our experiments, applicable to different types of input data and user specifications.

  2. We provide a theoretical analysis of the proposed PR-GAN under idealized settings, combined with experimental results on a variety of datasets.

  3. To generate noise, our model does not need the whole dataset to be present. Once trained, the generator can be deployed on individual devices and generate perturbations locally from source. Unlike prior works on data perturbation in a centralized setting, our method is computationally efficient and is capable of making perturbations on large datasets.

  4. Our model does not transform the feature space to perturb the data. This means that our perturbed data can be used alongside the original data in target classification tasks with high utility.

We first survey prior work and then present our solution.

2. Related Work

Privacy in Learning Algorithms A lot of work has focused on modifying existing training and inference algorithms to protect the privacy of training data, for example, a differentially private training algorithm for deep neural networks in which noise is added to the gradient during each iteration (2; 34; 36), and a "teacher-student" model using aggregated information instead of the original signals (32). It has also been proposed to train a classifier in the cloud, with multiple users uploading their perturbed data to a central node without ever revealing their original private data (26; 21).

Privacy-Preserving Data Publishing A different approach to preserving privacy is making sure sensitive elements of the data are removed before publishing it, often called privacy-preserving data publishing (PPDP) (7). A main line of work in this field focuses on transforming numerical data into a secondary feature space, such that certain statistical properties are preserved and data mining tasks can be done with minimal performance loss (25; 9; 27). These methods guarantee data utility only in this secondary space. This means that a classifier trained on the perturbed data is not guaranteed to perform well on the original data. This can be troublesome in a scenario where a classification model is trained on public, perturbed data and is then deployed on users' personal devices to perform a task locally on non-perturbed private user data. Our perturbed data can be used in conjunction with the original data in specified target applications. In addition, some of the methods in this category rely on expensive computations, which renders them infeasible on large datasets.

Adversarial Learning In recent years GANs have been successfully used to produce adversarial examples that can fool a classifier into predicting wrong classes (41). Some have formulated the problem of privacy protection as producing adversarial examples for an identity-revealing classifier (6; 18). However, we demonstrate through experiments that the absence of our proposed function, $f_t$, that maintains the utility of the data, leads to a weaker utility guarantee for the published data.

Similar efforts have been made in the fairness literature to make sure that certain attributes (e.g., gender or race) in a dataset do not create unwanted bias that affects decision-making systems (5; 14). There is a key distinction between our work and that of Edwards and Storkey (14) in how we train our model. To train their model on a face dataset, the authors give two sets of data to the network, where in the second set, the last name of the subjects is artificially placed on each image. By providing two sets of acceptable and unacceptable samples, they are letting the model know what a safe-to-publish image looks like prior to training. In our method, the model relies only on the two classifiers, for the target and the sensitive attributes, to learn what the published results should look like. It is also unclear whether the corresponding networks in their work will reach an optimal state, given that they are trained from scratch together with the generative model.

Keep in mind that our approach builds on top of the existing literature on privacy-preserving learning algorithms. After using our method to remove sensitive information from a dataset, one can apply any of the existing privacy-preserving learning algorithms to further protect the privacy of users. Finally, although our model uses a classifier to guarantee high utility for certain desired tasks, a GAN by nature produces artificial samples indistinguishable from real ones, which makes the published data potentially useful for applications not specified by $f_t$.

3. Problem Definition

Suppose that we have a dataset $X$ where the $i$-th entry $x_i$ is a vector coming from an unknown distribution $p_{data}$, with a corresponding sensitive label $y_i^s$ and a target label $y_i^t$. We also have two functions: $f_s$, which predicts the sensitive labels, and $f_t$, which predicts the target labels. Given a prediction error function $\ell$, the goal is then to produce a perturbed version of $X$, $X'$, such that the following is minimized:

$$\ell\big(f_t(X'), Y^t\big) - \lambda\, \ell\big(f_s(X'), Y^s\big) \qquad (1)$$

Here, $\ell$ is a suitable loss function and $\lambda$ determines the trade-off between privacy and utility. This definition extends directly to any categorical or numerical $y^s$ and $y^t$ by choosing the correct loss function. The perturbed data, $X'$, is then released for public use.

4. PR-GAN Design

4.1. Architecture

Figure 3. Architecture of the proposed model. The target classifier $C_t$ and the sensitive classifier $C_s$ are approximations of $f_t$ and $f_s$, pre-trained and fixed during training. $G$ and $D$ are trained in an adversarial setting.

As approximations of $f_t$ and $f_s$, we have two discriminative classifiers, $C_t$ and $C_s$, called the target and the sensitive classifier. We also use a GAN, with a generative model $G$ in charge of producing the perturbed data, $x'$, and a discriminative model $D$ which distinguishes $x'$ from $x$. $p_g$ represents the distribution of the data generated by $G$. Note that the model can be easily extended to accommodate multiple classifiers for both sensitive and target attributes. Both $C_t$ and $C_s$ are pre-trained classifiers plugged into our network. The advantage of using pre-trained classifiers is that we can use classifiers with complex architectures, such as VGG16 for images (37), to guide the training of $G$ and $D$. It would be extremely difficult to train such networks alongside $G$ and $D$ from scratch. The overall structure of the network is shown in Figure 3.

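The process starts by the generator $G$ taking the original instance $x$ as input and generating the perturbed version of the data, $x' = G(x)$. Then $x'$ is fed to the discriminators $D$, $C_t$ and $C_s$. $D$'s goal is distinguishing real data from perturbed data. $D(x)$ represents the probability that $x$ comes from the original data rather than the produced data $p_g$. We can write its loss function as below:

$$L_D = -\mathbb{E}_{x \sim p_{data}}\big[\log D(x)\big] - \mathbb{E}_{x \sim p_{data}}\big[\log\big(1 - D(G(x))\big)\big] \qquad (2)$$

$G$ has multiple objectives. For the trade-off between privacy and utility, we can rewrite (1) to:

$$L_{PU} = \mathbb{E}_{x \sim p_{data}}\big[\ell\big(C_t(G(x)), y^t\big) - \lambda\, \ell\big(C_s(G(x)), y^s\big)\big] \qquad (3)$$

where $\ell$ is a suitable loss function (e.g., cross-entropy loss) and $\lambda$ controls the relative trade-off between privacy and utility. $G$ also wants to fool the GAN discriminator $D$ in order to create perturbed data indistinguishable from the real data; the loss function for this will be:

$$L_{GAN} = \mathbb{E}_{x \sim p_{data}}\big[\log\big(1 - D(G(x))\big)\big] \qquad (4)$$

Finally, it has been shown that using regularization can stabilize GAN training and also provides an additional lever to control the utility of the perturbed data by limiting the overall perturbation added to the data (19). Here, we use a hinge loss:

$$L_{hinge} = \mathbb{E}_{x \sim p_{data}}\big[\max\big(0, \lVert G(x) - x \rVert_2 - c\big)\big] \qquad (5)$$

where $c$ is the maximum distance allowed before any loss is incurred.

Our full objective for $G$, using (3), (4) and (5), can be written as:

$$L_G = L_{PU} + \alpha L_{GAN} + \beta L_{hinge} \qquad (6)$$

where $\alpha$ and $\beta$ control the relative importance of the three losses. At each iteration, we alternate between training $G$ and $D$ while $C_t$ and $C_s$ are previously trained and fixed in the network.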
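The generator objective described above can be illustrated with a small numeric sketch. It is only one possible reading, under stated assumptions: cross-entropy as the classifier loss, a saturating GAN term $\log(1 - D(G(x)))$, an L2 norm inside the hinge, and hypothetical weight names `lam`, `alpha`, `beta` and threshold `c`. All function names are illustrative and the classifier outputs are passed in as plain probability lists rather than computed by real networks.

```python
import math


def cross_entropy(probs, label):
    """Cross-entropy of a predicted probability vector against an integer label."""
    return -math.log(max(probs[label], 1e-12))


def hinge_penalty(x, x_perturbed, c):
    """Penalize the L2 size of the perturbation only beyond the allowed distance c (assumed norm)."""
    dist = math.sqrt(sum((a - b) ** 2 for a, b in zip(x, x_perturbed)))
    return max(0.0, dist - c)


def generator_loss(target_probs, y_target, sens_probs, y_sens, d_fake,
                   x, x_perturbed, lam=1.0, alpha=1.0, beta=1.0, c=0.5):
    """Combine the three generator terms for a single example.

    target_probs / sens_probs: outputs of the fixed target / sensitive classifiers on G(x).
    d_fake: discriminator output D(G(x)), the probability that G(x) is real.
    """
    utility = cross_entropy(target_probs, y_target)   # keep the target label predictable
    privacy = cross_entropy(sens_probs, y_sens)       # make the sensitive label hard to predict
    privacy_utility = utility - lam * privacy
    gan = math.log(max(1.0 - d_fake, 1e-12))          # lower when the discriminator is fooled
    return privacy_utility + alpha * gan + beta * hinge_penalty(x, x_perturbed, c)
```

In this sketch the loss decreases when the target classifier stays confident in the true target label, when the sensitive classifier loses confidence in the true sensitive label, when the discriminator is fooled, and when the perturbation stays within distance `c`.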

To fine-tune the model parameters with a fixed utility threshold in mind, we explore the parameter space to minimize $acc(C_s)$ while keeping $acc(C_t)$ above the threshold. Similarly, given a fixed privacy budget, we can maximize $acc(C_t)$ while keeping $acc(C_s)$ below the budget. Here, $acc(\cdot)$ denotes the accuracy score.
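The selection rule in the paragraph above can be sketched as a simple filter over already-evaluated configurations. The record layout (`target_acc`, `sensitive_acc`) and the function names are hypothetical; the sketch assumes each candidate has been trained and scored on a validation set beforehand.

```python
def select_under_utility_threshold(results, utility_threshold):
    """Among configurations meeting the utility threshold, pick the lowest sensitive accuracy."""
    feasible = [r for r in results if r["target_acc"] >= utility_threshold]
    if not feasible:
        return None  # no configuration satisfies the utility constraint
    return min(feasible, key=lambda r: r["sensitive_acc"])


def select_under_privacy_budget(results, privacy_budget):
    """Among configurations within the privacy budget, pick the highest target accuracy."""
    feasible = [r for r in results if r["sensitive_acc"] <= privacy_budget]
    if not feasible:
        return None  # no configuration satisfies the privacy constraint
    return max(feasible, key=lambda r: r["target_acc"])
```

Both functions return `None` when the constraint cannot be met, which signals that the search space has to be widened or the constraint relaxed.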

4.2. Theoretical Analysis

4.2.1. GAN Optimality

In the original GAN design, the generator $G$ defines a probability distribution $p_g$ as the distribution of samples generated by $G$. Under favorable assumptions, the distribution $p_g$ converges to a good estimator of $p_{data}$, the distribution of training data (16). In our case, as we introduce additional classifiers and gradients, we wish to understand how these classifiers, $C_s$ and $C_t$, modify the optimal solution and the final distribution.

Since GAN takes an iterative approach, the discriminator and the generator are optimized alternately. We follow the same assumption as in (16): there is enough capacity and training time, and the discriminator is allowed to reach its optimum given a fixed generator $G$.

Fixed $G$, Optimize $D$.

Notice that the discriminator in our design uses the same loss function as in the original GAN. So the following claim is still true.

Lemma 4.1 (Optimal Discriminator (16)).

For a fixed generator $G$, the optimal discriminator is

$$D^*(x) = \frac{p_{data}(x)}{p_{data}(x) + p_g(x)}$$

where $p_{data}(x)$ is the probability that $x$ is from the data and $p_g(x)$ is the probability that $x$ is from the generator $G$.
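As a sanity check of the optimal-discriminator result from the original GAN analysis (16), the following sketch evaluates the standard discriminator loss on a small discrete domain. The distributions are made-up toy values and the function names are illustrative; all probabilities are assumed strictly positive so the logarithms are defined.

```python
import math


def optimal_discriminator(p_data, p_g):
    """Pointwise optimum p_data(x) / (p_data(x) + p_g(x)) on a discrete domain."""
    return [pd / (pd + pg) for pd, pg in zip(p_data, p_g)]


def discriminator_loss(d, p_data, p_g):
    """Standard GAN discriminator loss, written as an explicit sum over the domain."""
    return -sum(pd * math.log(di) + pg * math.log(1.0 - di)
                for di, pd, pg in zip(d, p_data, p_g))


p_data = [0.5, 0.3, 0.2]   # toy data distribution (assumption)
p_g = [0.2, 0.3, 0.5]      # toy generator distribution (assumption)
d_star = optimal_discriminator(p_data, p_g)
d_other = [min(0.99, di + 0.1) for di in d_star]  # any other discriminator
print(discriminator_loss(d_star, p_data, p_g) < discriminator_loss(d_other, p_data, p_g))
```

When the two distributions coincide, the optimal discriminator outputs 0.5 everywhere and the loss equals $\log 4$, which is the value attained at the global optimum of the original GAN.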

Fixed $D$, Optimize $G$.

In the original GAN, the global minimum for the generator is achieved if and only if the generated distribution $p_g$ is the same as $p_{data}$: $p_g = p_{data}$. In our setting we show that the global minimum is achieved if the generator $G$ is an area preserving flipping map on $p_{data}$, when such a map exists.

To explain what an area preserving flipping map is, we first consider the two classifiers $C_s$ and $C_t$. Let us suppose $C_s$ and $C_t$ are binary classifiers for now. A piece of data falls in one of the following four categories: $A_{00}, A_{01}, A_{10}, A_{11}$, where $A_{ij}$, $i, j \in \{0, 1\}$, contains the data items with label $i$ under $C_s$ and label $j$ under $C_t$. If each data item in $A_{ij}$ is changed by $G$ to a data item in category $A_{\bar{i}j}$ with $\bar{i} = 1 - i$, we will be able to completely fool $C_s$ and pass $C_t$.

Definition 4.2.

Denote by $\Omega$ the domain of data. An area preserving flipping map $f: \Omega \to \Omega$ satisfies two conditions:

  • Flipping property: $f$ maps each data item $x \in A_{ij}$ to an item $f(x) \in A_{\bar{i}j}$, with $\bar{i} = 1 - i$, and

  • Area preserving property: $\mu(f(S)) = \mu(S)$, $\forall S \subseteq \Omega$, where $\mu$ is the probability measure on $\Omega$. That is, the total probability measure of $S$ before and after the mapping is the same.

Lemma 4.3.

If $G$ is such an area preserving flipping map on $\Omega$ with measure $p_{data}$, the generator loss is minimized.

Proof. Consider $S'$ as the collection of outputs from $G$ when the input is taken from $S$ with the distribution $p_{data}$, i.e., $S' = G(S)$. By the area preserving property, we have $p_{data}(S') = p_{data}(S)$; also $p_g(S') = p_{data}(S)$ by definition. Thus $p_g(S') = p_{data}(S')$ for any $S'$.

This essentially ensures that the output $G(x)$, with $x$ following the distribution $p_{data}$, also follows the same distribution $p_{data}$. Thus the total loss of the GAN is still minimized if the discriminator is the optimal discriminator $D^*$. Further, the flipping property ensures that for any input $x$, the manipulated output $G(x)$ completely fails $C_s$ and passes $C_t$. Thus the total loss corresponding to $C_s$ and $C_t$ is minimized as well. ∎

Now the natural question is, when can we find an area preserving flipping map on our data? We start with a definition on $C_s$ and $C_t$.

Definition 4.4.

The sensitive classifier $C_s$ and the target classifier $C_t$ are called balanced if $\mu(A_{0j}) = \mu(A_{1j})$, for $j \in \{0, 1\}$.

Lemma 4.5.

An area preserving flipping map with $C_s$ and $C_t$ exists if and only if $C_s$ and $C_t$ are balanced.

Proof. Clearly, if $\mu(A_{0j}) \neq \mu(A_{1j})$ for some $j$, then we cannot satisfy the flipping property and the area preserving property simultaneously. On the other hand, when the total probability measures of $A_{0j}$ and $A_{1j}$ are the same, there is an area preserving map that maps $A_{0j}$ to $A_{1j}$. First, an area preserving map exists between any two distributions (33). Now we define a distribution $\mu_0$, which is proportional to $\mu$ for $x \in A_{0j}$, and $0$ otherwise. Similarly we define a distribution $\mu_1$ which is proportional to $\mu$ for $x \in A_{1j}$, and $0$ otherwise. Now the area preserving map from $\mu_0$ to $\mu_1$ is an area preserving flipping map with $C_s$ and $C_t$. ∎

In summary, assuming sufficient capacity for the generator and the discriminator, and binary classifiers that are balanced, it is possible to find an area preserving flipping map with $C_s$ and $C_t$.
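The balance condition and the flipping map can be made concrete on a finite sample, where the uniform empirical measure stands in for the probability measure. This is an illustrative sketch only: items are represented by their (sensitive, target) label pairs, the function names are hypothetical, and any bijection on a uniformly weighted finite set is treated as area preserving.

```python
from collections import Counter, defaultdict


def is_balanced(labels):
    """labels: list of (sensitive, target) binary label pairs, one per data item."""
    counts = Counter(labels)
    return all(counts[(0, j)] == counts[(1, j)] for j in (0, 1))


def flipping_map(labels):
    """Pair each item with one that has the opposite sensitive label and the same target label.

    Returns a dict index -> index, or None when the classes are not balanced.
    """
    groups = defaultdict(list)
    for idx, pair in enumerate(labels):
        groups[pair].append(idx)
    mapping = {}
    for j in (0, 1):
        zeros, ones = groups[(0, j)], groups[(1, j)]
        if len(zeros) != len(ones):
            return None  # no bijection between the two categories exists
        for u, v in zip(zeros, ones):
            mapping[u] = v
            mapping[v] = u
    return mapping
```

Because the returned mapping is a permutation of the sample, it leaves the empirical distribution unchanged while flipping every sensitive label, which mirrors the two properties required of the map in the continuous setting.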

Theorem 4.6.

When the sensitive classifier $C_s$ and the target classifier $C_t$ are balanced, the global minimum of the generator is achieved if $G$ is an area preserving flipping map with respect to $C_s$ and $C_t$.

An example in which $C_s$ and $C_t$ are not balanced is shown in Figure 4. In this case, we have far more data samples in $A_{00}$ and $A_{11}$ than in $A_{01}$ and $A_{10}$. $C_s$ and $C_t$ are strongly correlated: when $x$ has label $i$ under $C_s$, it very likely has label $i$ under $C_t$ as well. It is nearly impossible to protect the sensitive features and reveal the target features at the same time. There is a trade-off between the two objectives: either the distribution $p_g$ is different from $p_{data}$ (hurting generalization of the model), or the privacy protection cannot be ideal. This trade-off will be examined and evaluated in the next section.

Figure 4. $C_s$ and $C_t$ are not balanced.

For any two distributions, the area preserving map is not unique. Thus, the area preserving flipping map when $C_s$ and $C_t$ are balanced is not unique either. Finding one is not trivial, though, since we do not have $p_{data}$. This is mainly what the neural network optimizer is trying to achieve.

When there are multiple target/sensitive classifiers, the conclusion above can be easily extended. A flipping map will now flip all labels of the sensitive classifiers and maintain the labels of target classifiers. When the classifiers are not binary, a flipping map will change a label to any other label in the sensitive classifier.

4.2.2. Utility and Privacy Protection

In our architecture, two specific classifiers $C_s$ and $C_t$ are used. A natural question to ask is how much the generated data depends on these choices of classifiers. If the perturbed data fail $C_s$, we give a bound on the accuracy of a different (unseen) sensitive classifier under reasonable conditions.

Definition 4.7 (Total variation distance (8)).

For two probability distributions $P$ and $Q$ on $\Omega$, the total variation distance between them is defined by

$$\delta(P, Q) = \sup_{A \subseteq \Omega} \lvert P(A) - Q(A) \rvert$$

Informally, the total variation distance measures the largest change in probability over all events. For discrete probability distributions, the total variation distance is just half the $\ell_1$ distance between the vectors in the probability simplex representing the two distributions. The proof can be found in the supplemental material.
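For discrete distributions, the definition above can be checked directly. The sketch below computes the total variation distance as half the $\ell_1$ distance and compares it against a brute-force maximum over all events; the function names and the toy distributions in the usage line are illustrative, and the brute-force version is only practical for very small domains.

```python
from itertools import combinations


def total_variation(p, q):
    """Half the L1 distance between two discrete distributions on the same domain."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))


def total_variation_bruteforce(p, q):
    """Largest gap |P(A) - Q(A)| over every event A (exponential in the domain size)."""
    n = len(p)
    best = 0.0
    for r in range(n + 1):
        for event in combinations(range(n), r):
            gap = abs(sum(p[i] for i in event) - sum(q[i] for i in event))
            best = max(best, gap)
    return best


print(total_variation([0.5, 0.3, 0.2], [0.2, 0.3, 0.5]))
```

The two functions agree because the maximizing event is the set of outcomes where one distribution puts more mass than the other.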

Theorem 4.8 ().

Suppose that the original data and generated data

are from distributions with total variation distance less than

. Consider an instance with ground truth label under a classification task with two classifiers and ) each with accuracy . If the perturbed data successfully fools , i.e., , then the perturbed data also fools :


By the definition that and are from distributions with bounded total variation distance, we have

If was used in the training of and one cannot infer sensitive labels with , we now show that the accuracy for on labeling and differently.

as claimed. ∎

Here, one of the two classifiers denotes the model used during the training process, and the other denotes a model that has not been seen before. Intuitively, this result formalizes the observation that well-trained classifiers should possess close decision boundaries in high-probability regions. In such settings, the perturbation that misleads one sensitive classifier will be able to protect the hidden attributes against other sensitive classifiers with high probability. Although the total variation distance here measures the distance between the distributions of original and perturbed data instead of the distance between two actual instances, it can characterize the data manifold dynamics and guarantee that the perturbation added to the data can protect the privacy of certain sensitive attributes against arbitrarily trained classifiers.

Basically, from Theorem 4.8, we can see that for a new sensitive classifier, the perturbed data will also have high probability to fail the classifier and protect privacy. Similarly, this holds for the target classifier as well. This will be further evaluated in the experiment section later.

4.3. Implementation Details

In order to minimize the dependencies between different components in our model, we slice a dataset into 3 parts equal in size and class proportions. We use the first slice, $S_1$, to train $C_s$ and $C_t$. We then use the second slice, $S_2$, to train $G$ and $D$ while using the previously trained $C_s$ and $C_t$. Finally, we use the last slice, $S_3$, for testing purposes. Each slice is further divided into training and testing parts, with a ratio of 4:1, denoted by $S_i^{train}$ and $S_i^{test}$ for a slice $S_i$.
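The slicing scheme above can be sketched with the standard library alone. This is a minimal illustration, assuming a single label per item is used for stratification; the function names and the seeding strategy are hypothetical and not taken from the original implementation.

```python
import random
from collections import defaultdict


def stratified_slices(labels, n_slices=3, seed=0):
    """Split item indices into n_slices parts with (near-)equal size and class proportions."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, y in enumerate(labels):
        by_class[y].append(idx)
    slices = [[] for _ in range(n_slices)]
    for idxs in by_class.values():
        rng.shuffle(idxs)
        for k, idx in enumerate(idxs):
            slices[k % n_slices].append(idx)  # deal each class round-robin across slices
    return slices


def split_4_to_1(indices, labels, seed=0):
    """Split one slice into training and testing parts with a 4:1 ratio per class."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx in indices:
        by_class[labels[idx]].append(idx)
    train, test = [], []
    for idxs in by_class.values():
        rng.shuffle(idxs)
        cut = len(idxs) * 4 // 5
        train.extend(idxs[:cut])
        test.extend(idxs[cut:])
    return train, test
```

With three slices, the first would train the two classifiers, the second the generator and discriminator, and the third would be held out for testing, each after its own 4:1 split.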

Note that our method does not depend on any specific architecture for $C_s$ and $C_t$, and any model supporting gradient updates can be used here. We assume full access to the prediction results of the pre-trained classifiers. Since the training of the classifiers for sensitive and target attributes, and also of the generative network, is done by the data contributor or publisher and not by adversaries, this assumption is valid. However, we assume that an adversary, using a separate dataset (which can be public), trains a classifier to retrieve sensitive information and then attacks the published data. Our goal is to prevent such attacks while preserving utility.

5. Experiments

5.1. Datasets

Below, we go over the datasets we have used along with the sensitive and target attributes we have defined for each:

  1. MNIST (24): A dataset of handwritten digits, which includes 60,000 training and 10,000 test examples. For this dataset, we define the target attribute as the parity (being odd or even) of the numbers and the sensitive attribute as whether or not a digit is greater than . Note that this is only a hypothetical application to showcase the strength of our method on a well-studied dataset.

  2. PubFig Faces (23): This dataset includes 58,797 images of 200 people. Inspired by the concerns around identity-revealing capabilities of face images, we define sensitive attributes as the identity of each subject while each person’s gender is the target attribute. This can happen in a scenario where subjects are willing to donate images to train a classifier, but are afraid about their identities being revealed. To achieve a higher performance, we aligned the images using MTCNN (44) and removed duplicate images for each person. We then filtered out subjects with less than images. This left us with 6,553 images, 2,279 of women and 4,274 of men, from a total of people. We used the VGG16 (37) architecture with modified top layers to perform both classification tasks.

  3. UJI Indoor Localization (39): Here, signal strengths of WiFi access points (WAP) are recorded for 21,048 locations inside different buildings. The buildings have a total of floors. In addition, each instance has a coordinate, which we use to cluster locations on each floor into groups. We define the sensitive attribute as the specific cluster a user was in, and the target attribute the floor on which the user was. This is inspired by a scenario where contributors of the data are willing to reveal their location up to a certain granularity. Although the signal strengths are numerical, we achieved better results by changing the signals into binary attributes indicating the presence or absence of signal from a WAP. For brevity we call this dataset the WiFi dataset from now on.

The detailed architectures of the classifiers for each task and each dataset are provided in the supplemental materials.

Method       Sensitive Accuracy
             MNIST     PubFig    WiFi
PR-GAN       0.125     0.175     0.177
NGP          0.305     0.211     0.178
AP           0.897     0.571     0.477
DP           0.806     0.783     0.464
Original*    0.984     0.807     0.759
  * Non-perturbed data.

Table 1. Sensitive accuracy of each method on each dataset at a fixed target accuracy threshold; lower is better.

5.2. Baseline

In our experiments, we compare our method against the following baselines:

Figure 5. The architecture of the two generative baselines compared to our method in the experiments.
  • Naive Generative Privacy (NGP): We have argued that by utilizing a GAN's structure, we can produce more realistic perturbed data, similar to the original. This will in turn increase the utility of the resulting datasets. To test this hypothesis we create an alternative architecture by removing $D$. We expect this method to provide a lower privacy guarantee (higher sensitive accuracy) given a fixed utility threshold.

  • Adversarial Privacy (AP): We believe that the existence of a target classifier ($C_t$) to guide the training of $G$ is essential for a better utility guarantee, and so we compare our method to an alternative architecture where $C_t$ is removed. This essentially formulates the problem of privacy protection as defending against an adversary model (in our case $C_s$). We expect this method to provide a lower privacy guarantee (higher sensitive accuracy) given a fixed utility threshold.

  • Differential Privacy (DP): For real-valued vectors (image datasets), we use the Laplace Mechanism, known to achieve $\epsilon$-differential privacy, where independent noise is added to each pixel with the Laplacian distribution:

    $$\mathrm{Lap}(z \mid b) = \frac{1}{2b} \exp\left(-\frac{\lvert z \rvert}{b}\right)$$

    Here, $b$ is the scale parameter of the distribution and this method achieves $\epsilon$-differential privacy with $\epsilon$ inversely proportional to $b$ (12). For the WiFi dataset, where attributes are binary, we use a Randomized Response (RR) approach (12) where, for each bit of information, we report its true value with probability $p$ or else report either $0$ or $1$ uniformly at random. Such a mechanism provides $\ln\frac{1+p}{1-p}$-differential privacy. We apply this perturbation mechanism to each of the signals and report the result for each record.
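The two DP baselines can be sketched as follows. This is an illustration under assumptions, not the configuration used in the experiments: the `sensitivity` parameter (default 1.0), the function names, and the use of Python's `random` module are all choices made here. Laplace noise is sampled as the difference of two exponential draws, which is a standard way to obtain a Laplace variable with the given scale.

```python
import math
import random


def laplace_mechanism(values, epsilon, sensitivity=1.0, rng=None):
    """Add independent Laplace noise with scale sensitivity / epsilon to each value."""
    rng = rng or random.Random()
    scale = sensitivity / epsilon
    rate = 1.0 / scale
    # The difference of two i.i.d. exponential variables is Laplace-distributed.
    return [v + rng.expovariate(rate) - rng.expovariate(rate) for v in values]


def randomized_response(bits, p, rng=None):
    """Report each bit truthfully with probability p, otherwise a uniformly random bit."""
    rng = rng or random.Random()
    return [b if rng.random() < p else rng.randint(0, 1) for b in bits]


def rr_epsilon(p):
    """Privacy level of the randomized response above, valid for 0 <= p < 1."""
    return math.log((1.0 + p) / (1.0 - p))
```

A larger `epsilon` gives a smaller noise scale in the Laplace mechanism, and a larger `p` gives a larger `rr_epsilon(p)`; in both cases less noise means a weaker privacy guarantee.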

Our method is denoted by PR-GAN throughout experiments.

5.3. Running on Mobile Devices

As discussed earlier, once training is complete, we can detach the trained generator ($G$) and deploy it on remote devices to produce perturbations for users at the source. This has the advantage that users will not need to trust an external entity with the safety of their sensitive information. The complexity and efficiency of a neural network depend on many factors, but as is common practice (35), we measure them by counting the number of parameters in a network and the number of floating-point operations (FLOP). In Table 2, we compare the complexity of our networks with state-of-the-art networks designed specifically to run on mobile devices. Our generator networks are more compact and computationally less expensive than these networks, which indicates that it is possible to deploy and use them on mobile devices using currently available technologies.

Network                FLOP    Parameters
MobileNetV1 1.0 (17)   575M    4.2M
ShuffleNet 1.5x (45)   292M    3.4M
NasNet-A (47)          564M    5.3M
MobileNetV2 1.0 (35)   300M    3.4M
Our Generators
MNIST                  1.6M    235.4K
PubFig                 232M    644.2K
WiFi                   2.1M    1.1M
Table 2. The complexity of our models compared to state-of-the-art models designed for the ImageNet (22) classification task on mobile devices.
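To make the two complexity measures concrete, the sketch below counts parameters and multiply-accumulate operations for fully-connected and standard convolution layers. It is an illustration under stated assumptions: the text does not say whether a multiply-accumulate is counted as one or two FLOPs, so the code reports multiply-accumulates, and the toy network is hypothetical, not one of the generators in Table 2.

```python
def dense_cost(n_in, n_out):
    """Return (parameters, multiply-accumulates) of a fully-connected layer."""
    params = n_in * n_out + n_out  # weights plus biases
    macs = n_in * n_out            # one multiply-accumulate per weight
    return params, macs


def conv2d_cost(c_in, c_out, k_h, k_w, out_h, out_w):
    """Return (parameters, multiply-accumulates) of a standard 2-D convolution."""
    params = k_h * k_w * c_in * c_out + c_out  # kernels plus biases
    macs = out_h * out_w * k_h * k_w * c_in * c_out
    return params, macs


def network_cost(layer_costs):
    """Sum per-layer (parameters, multiply-accumulates) pairs."""
    total_params = sum(p for p, _ in layer_costs)
    total_macs = sum(m for _, m in layer_costs)
    return total_params, total_macs


# Hypothetical toy network: one 3x3 convolution followed by a classifier layer.
toy = [
    conv2d_cost(c_in=1, c_out=8, k_h=3, k_w=3, out_h=28, out_w=28),
    dense_cost(n_in=8 * 28 * 28, n_out=10),
]
params, macs = network_cost(toy)
```

Activation functions, dropout and pooling are ignored here because they contribute no parameters and comparatively few operations.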

5.4. Performance

Dataset   Target Accuracy (%)           Avg. Utility   Sensitive Accuracy (%)                Avg. Privacy
          Model 1   Model 2   Model 3   Drop (%)       Model 1   Model 2   Model 3   Random  Drop (%)
MNIST     95.19     94.07     92.73     1.79           12.49     43.51     46.18     50.00   0.00
PubFig    95.47     95.23     95.71     0.12           17.5      10.71     17.38     6.67    0.00
WiFi      75.77     76.96     72.27     1.75           17.75     19.77     20.66     0.97    2.47
Table 3. Performance of the original (Model 1) and two new models (Models 2 and 3) in sensitive and target classification tasks. Our results transfer to new architectures with minimal change.

To show that our method is capable of effectively concealing the sensitive attributes while preserving the information about target attributes, we compare our method against the baselines across datasets with a fixed utility () threshold. We select a threshold of for the two image datasets, PubFig and MNIST, and for the WiFi dataset. As we will see later on, due to high correlation between the sensitive and target attributes in the WiFi dataset, it is harder to effectively conceal sensitive attributes while keeping the target attributes almost intact. We dive deeper into the trade-off between utility and privacy in a case study on the WiFi dataset in Section 5.5.

Recall from Section 4.3 that the dataset is divided into three slices: the first is used to train and , the second to train the networks and , and the third is used for testing. To tune the hyperparameters of the neural networks, we further split the GAN’s training data into two parts with a ratio of 4:1, preserving the class proportions, and use the smaller part as a validation set to keep the utility above the fixed threshold. For the methods based on DP, the optimal value of the privacy parameter is found by iterating over a range of values and selecting the largest one (corresponding to the smallest added noise) for which the utility is above the set threshold; we then report the resulting sensitive accuracy.
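The 4:1 split that preserves class proportions can be sketched as a stratified split. This is a minimal standard-library illustration, not the authors' code: the function name `stratified_split`, the `val_fraction` argument and the example labels are assumptions introduced here.

```python
import random
from collections import defaultdict


def stratified_split(labels, val_fraction, rng):
    """Split sample indices into (train, validation) lists, class by class,
    so that each class keeps roughly the same share in both parts."""
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    train_idx, val_idx = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_val = round(len(indices) * val_fraction)
        val_idx.extend(indices[:n_val])
        train_idx.extend(indices[n_val:])
    return sorted(train_idx), sorted(val_idx)


# Hypothetical labels; a 4:1 ratio corresponds to a validation fraction of 1/5.
labels = [0] * 50 + [1] * 30 + [2] * 20
train_idx, val_idx = stratified_split(labels, val_fraction=0.2, rng=random.Random(0))
```

Splitting each class separately avoids a validation set that over- or under-represents a rare class, which would make the utility estimate used for tuning less reliable.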

The performance of the methods, along with the performance of the classifier on sensitive attributes on the original, non-perturbed data, is available in Table 1. First, note that the two methods that do not use the target attributes in producing perturbations (DP and AP) achieve far less promising results than the other two methods. Furthermore, as the objectives become more complicated, moving from binary attributes (WiFi dataset) to image data (MNIST and PubFig), the gap between the two groups grows wider. Finally, our method, taking advantage of the adversarial training of a GAN, hides the sensitive attributes more effectively given the same utility threshold. This shows that the GAN plays an essential part in achieving superior results. It is also worth noting that in the case of image datasets, we achieved a significant reduction in sensitive accuracy while choosing a threshold very close to the original accuracy values ( for MNIST and for PubFig).

Figure 6. The trade-off between achieved privacy and utility loss budget for the WiFi dataset.

5.5. Utility vs. Privacy

Ideally, one looks to perturb the data in such a way that a classifier on sensitive attributes fails completely (with accuracy close to that of a random classifier) while a classifier on target attributes continues to perform as before. However, when the two objectives are in conflict because the sensitive and target attributes are correlated, this might not be possible. In these cases, a good trade-off between privacy and utility is desirable. Here, we test our method against the baselines over different utility loss budgets (a maximum allowed drop in target accuracy) and compare the achieved privacy (the drop in sensitive accuracy).

The utility loss budget is chosen from the interval , corresponding to . Since we optimize the hyperparameters over many settings for all methods, we were only able to carry out this experiment on the WiFi dataset with the resources available to us.

The results are shown in Figure 6, where the x-axis is the utility loss budget and the y-axis is the achieved privacy. For every budget, our method outperforms the others, and as we increase the budget, the margin between our achieved privacy and that of the others grows larger. Also worth noting is that the methods that are not guided by a classifier on target attributes (DP and AP) have a lower privacy gain per budget compared to the other two (PR-GAN and NGP). The results show that in difficult conditions, our method is capable of achieving a higher privacy guarantee given the same utility drop budget.

5.6. Transferability

We now test whether our results transfer to new models with a different architecture. We take the perturbed datasets produced in Section 5.4, and use two neural networks with new architectures trained on the original training data to perform the target and sensitive classification tasks. We then compare the resulting accuracy values with those reported in Section 5.4. We expect the results to remain relatively the same. The three architectures used for each dataset are shown in the tables in Appendix A.

On the target classification task, we prefer no drop in accuracy when we change the model’s architecture. For sensitive attributes, we would like to see the new model performing worse than the original model or a random classifier (a classifier that outputs a class selected uniformly at random). We formally define a drop in utility and privacy incurred by substituting the model’s architecture as:

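(7) Utility Drop $= \max\bigl(0,\ \mathrm{acc}(M) - \mathrm{acc}(M')\bigr)$
(8) Privacy Drop $= \max\bigl(0,\ \mathrm{acc}(M') - \max\{\mathrm{acc}(M),\ \mathrm{acc}(RC)\}\bigr)$

where $M$ is the original model, $M'$ the new model, RC a random classifier and $\mathrm{acc}(\cdot)$ the accuracy of each model in the corresponding classification task.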
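A minimal sketch of Equations (7) and (8), under the assumption that each drop is floored at zero and that the averages in Table 3 are taken over the two new models. The function and variable names are introduced here for illustration; the accuracy values are copied from Table 3.

```python
def utility_drop(acc_original, acc_new):
    """Loss in target accuracy when switching models, floored at zero."""
    return max(0.0, acc_original - acc_new)


def privacy_drop(acc_original, acc_new, acc_random):
    """Gain in sensitive accuracy beyond both the original model and a
    random classifier, floored at zero."""
    return max(0.0, acc_new - max(acc_original, acc_random))


def average(values):
    values = list(values)
    return sum(values) / len(values)


# Accuracies (%) from Table 3: (Model 1, Model 2, Model 3[, Random]).
target = {
    "MNIST": (95.19, 94.07, 92.73),
    "PubFig": (95.47, 95.23, 95.71),
    "WiFi": (75.77, 76.96, 72.27),
}
sensitive = {
    "MNIST": (12.49, 43.51, 46.18, 50.00),
    "PubFig": (17.5, 10.71, 17.38, 6.67),
    "WiFi": (17.75, 19.77, 20.66, 0.97),
}

avg_utility_drop = {
    name: average(utility_drop(m1, m) for m in (m2, m3))
    for name, (m1, m2, m3) in target.items()
}
avg_privacy_drop = {
    name: average(privacy_drop(m1, m, rc) for m in (m2, m3))
    for name, (m1, m2, m3, rc) in sensitive.items()
}
```

Under this reading, the computed averages agree with the Avg. Utility Drop and Avg. Privacy Drop columns of Table 3 up to rounding, which supports the floor-at-zero interpretation.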

Note that the goal here is that the result on one neural network transfers to another with minimal changes to either utility or privacy. Table 3 reports the models’ accuracy values and the average utility and privacy drops over the two new architectures. The drops in privacy and utility incurred by a change in the network’s architecture are very low. The highest drop in utility is equal to while the highest drop in privacy is equal to , and in both image datasets there is no drop in our privacy guarantee for either of the new architectures, which is ideal. Note that in the case of MNIST sensitive attributes, although the two new architectures have a performance significantly higher than that of the original model, they are both below the random classifier performance threshold (50%). Since no one can guarantee a performance lower than that of a random classifier, this is ideal. These results suggest that the effects of our perturbations are transferable to other neural networks with different architectures. Since an adversary can choose any model to attack our perturbations, it is important to design a method with utility guarantees which can be extended to other networks with arbitrary architectures.

5.7. Training Utility

Dataset   Target Accuracy (%)
          Inference   Training
MNIST     95.19       96.72
PubFig    95.47       98.84
WiFi      75.77       73.72
Table 4. Utility of perturbed data for Inference (where a model trained on original data is tested on perturbed data) and Training (where a model is trained on perturbed data and tested on original data) purposes.

In previous sections, we demonstrated that our published datasets can be used for inference tasks on target attributes with high utility guarantees. In another scenario, individuals may share their anonymized data, perturbed using our method, with an external entity to contribute to the training of a new model. This model can in turn be deployed on the individuals’ devices to perform classification tasks on their raw, private data. To see if we can provide the same level of utility in this scenario as we did in the inference tasks, we train new models on the perturbed datasets produced in Section 5.4 and test them on the original data. The results are available in Table 4. The accuracy changes only slightly: it rises by 1.53 points on MNIST and 3.37 points on PubFig, and falls by 2.05 points on WiFi. This indicates that our method is capable of providing a high utility guarantee for both inference and training purposes. This experiment shows a key advantage of our method over previous works where the perturbed data is in a transformed feature space and unsuitable for training models that can be tested on the original data (25; 9; 27).

6. Conclusion and Future Work

In this work, we have tried to bridge the gap between privacy-preserving data publishing and deep generative models, a field that is on the rise and is used extensively in other areas such as adversarial learning. We showed that it is possible to use deep neural networks as clues for generating tailored perturbations. By choosing this approach, not only can we effectively protect sensitive information, but we can also maintain the information necessary for a given target application. Note that the goal here is to fool classifiers on specific tasks, not human beings. The results might seem clearly distinguishable from a human’s point of view, but a machine might be unable to tell the difference.

Our experiments showed our method’s clear advantage over conventional methods, its capability in finding a good trade-off between privacy and utility, its utility for both training and inference tasks, and its ability to be used on mobile devices with limited computational resources.

Finally, as improved generative models are proposed, we can easily plug them into our framework to achieve better results. We believe that there are many interesting avenues of research to continue this work, including utilizing different GAN architectures to perturb different types of data (e.g., time series or very high resolution images), or guiding the users on the data they are about to share with a trusted central unit, to help train models without revealing private and potentially sensitive information.

Appendix A Generator Architecture Details

Here are detailed architectures used in our experiments. Model 1 is the original architecture used across all experiments. Models 2 and 3 are the additional architectures used in the transferability experiment.

Model 1             Model 2             Model 3
                    Conv(64,8,8)+Relu   Conv(32,3,3)+Relu
Conv(64,5,5)+Relu   Dropout(0.2)        Conv(32,3,3)+Relu
Dropout(0.25)       Conv(128,6,6)+Relu  MaxPooling(2,2)
FC(128)+Relu        Conv(128,5,5)+Relu  Conv(64,3,3)+Relu
Dropout(0.5)        Dropout(0.5)        Conv(64,3,3)+Relu
FC(2)+Softmax       FC(2)+Softmax       MaxPooling(2,2)
Table 5. Model architectures on MNIST (Conv: convolution layer, FC: fully-connected layer).

Model 1         Model 2         Model 3
FC(256)+Relu    FC(1024)+Relu   FC(256)+Relu
Dropout(0.5)    Dropout(0.5)    Dropout(0.5)
FC(128)+Relu    FC(512)+Relu    FC(256)+Relu
Dropout(0.5)    Dropout(0.5)    Dropout(0.5)
FC(64)+Relu     FC()            FC()
FC()            Softmax         Softmax
Table 6. Model architectures on UJI Indoor Localization dataset (FC: fully-connected layer).

Model 1         Model 2         Model 3
VGG16 base      VGG16 base      VGG16 base
FC(1024)+Relu   FC(512)+Relu    FC(512)+Relu
Dropout(0.5)    Dropout(0.5)    Dropout(0.5)
FC(512)+Relu    FC(512)+Relu    FC(256)+Relu
FC()            FC()            FC()
Softmax         Softmax         Softmax
Table 7. Model architectures on PubFig dataset, built on top of VGG16 network (37) (minus the topmost layers).