Machine learning has benefited numerous mobile services, such as speech-based assistants (e.g. Siri) and book recommendation based on reading logs (e.g. Youboox). Many such services submit user data, e.g. sound, images, and human activity records, to the service provider, posing well-known privacy risks (Abadi et al., 2016; Dwork et al., 2017; Bhatia et al., 2016). Our goal is to avoid disclosing raw data to service providers by creating a device-local intermediate component that encodes the raw data and sends only the encoded data to the service provider. The encoded data must be both useful and private. For inference-based services, utility can be quantified by the inference accuracy that the service provider achieves using a discriminative model. Privacy can be quantified by the disclosure risk of private information.
Existing solutions to the privacy concern struggle to balance these two seemingly conflicting objectives: privacy vs. utility. An obvious and widely practiced solution is to transform the raw data into task-specific features and upload only the features, as in Google Now (GoogleNow, 2018) and Google Cloud (GoogleCloud, 2018). This not only reduces the data utility but is also vulnerable to reverse models that reconstruct the raw data from the extracted features (Mahendran and Vedaldi, 2015). Ossia et al. (2017) additionally apply dimensionality reduction, Siamese fine-tuning, and noise injection to the features before sending them to the service provider. This unfortunately results in a further loss of utility.
Unlike previous work, we employ deep models and adversarial training to automatically learn features that strike a good tradeoff between privacy and utility. Our key idea is to judiciously combine discriminative learning, which minimizes the task-specific discriminative error and maximizes the user-specified privacy discriminative error, with generative learning, which maximizes the agnostic privacy reconstruction error. Specifically, we present the Privacy Adversarial Network (PAN), an end-to-end deep model, and its training algorithm. PAN controls three types of gradients in back-propagation, derived from the utility discriminative error, the privacy discriminative error, and the privacy reconstruction error, to guide the training of a feature extractor.
As shown in Fig. 2, a PAN consists of four parts: a feature extractor (Encoder, E), a utility discriminator (UD), an adversarial privacy reconstructor (PR), and an adversarial privacy discriminator (PD). The output of the Encoder feeds the inputs of the UD, PR, and PD. We envision that the Encoder runs on mobile devices to extract features from raw data. The UD represents the inference service and ensures the utility of the extracted features. PAN emulates two types of adversaries to ensure privacy: the PD emulates a malicious party that seeks to extract private information, e.g. user location; the PR emulates one that seeks to reconstruct the raw data from the features. We present a novel algorithm to explicitly train PAN via an adversarial process that alternates between training the Encoder with the UD to improve utility and confronting the Encoder with the two adversaries, PD and PR, to enhance privacy. All four parts evolve together iteratively during the training phase. From the manifold perspective, the separate flows of gradients from the UD, PD, and PR through the Encoder in back-propagation iteratively produce a feature manifold that is both useful and private.
Using digit recognition (MNIST (LeCun, 1998)), image classification (CIFAR-10 (Krizhevsky et al., 2014) and ImageNet (Deng et al., 2009)), sound sensing (UbiSound (Sicong et al., 2017)), human activity recognition (Har (UCI, 2017)), and driver behavior prediction (StateFarm (Kaggle, 2019)), we show that PAN is effective in training the Encoder to generate deep features that provide a better privacy-utility tradeoff than other privacy-preserving methods. Surprisingly, we observe that the adversarially learned features, from which redundant information is removed for privacy, even surpass discriminatively learned features in recognition accuracy. That is, removing task-irrelevant information for privacy actually improves generalization and, as a result, utility.
2. Problem Definition of Mobile Data Privacy Preserving
This section mathematically formulates the problem of the utility-privacy tradeoff for mobile data. Many appealing cloud-based services exist today that require data from mobile users. For example, as shown in Fig. 1, a user takes a picture of a product and sends it to a cloud-based service to find out how to purchase it, a service Amazon actually provides. The picture, on the other hand, can accidentally contain sensitive information, e.g., faces and other identifying objects in the background. Therefore, the user faces a tough challenge: how to obtain the service without trusting the service provider with the sensitive information?
Toward addressing this challenge, our key insight is that most services do not actually need the raw data. The user can encode the raw data into a representation through an Encoder on the mobile device and send only the representation to the service provider. The representation should ideally have the following two properties:
Utility: it must contain enough task-relevant information to be useful for the intended service, e.g., high accuracy for object recognition;
Privacy: it must contain little task-irrelevant information, especially information that the user considers sensitive.
In this work, we focus on classification-based services. Therefore, the utility of the Encoder output is measured by the task inference error (e.g. cross entropy) at the service provider. We quantify the privacy of the Encoder output by the privacy leak risk of the raw data under all possible attack models. Since the Encoder is distributed to mobile users, we assume it is available to both service providers and potential attackers. That is, both the service provider and the malicious party can train their models using raw data and the corresponding Encoder output. As such, the desirable properties of the Encoder output over a dataset can be restated as two optimization objectives, one for utility and one for privacy.
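The two objectives can be sketched compactly as follows. The notation is assumed here for illustration, because the original symbols are not recoverable from the text: $x$ is a raw sample from a dataset $\mathcal{D}$, $E$ is the Encoder, $\mathcal{L}_U$ is the task inference error at the service provider, $A$ ranges over attack models, and $\mathcal{L}_P$ is the privacy leak risk.

```latex
\begin{align}
\text{Utility:} \quad & \min_{E} \sum_{x \in \mathcal{D}} \mathcal{L}_U\big(E(x)\big) \\
\text{Privacy:} \quad & \min_{E} \, \max_{A} \sum_{x \in \mathcal{D}} \mathcal{L}_P\big(A(E(x))\big)
\end{align}
```

In this reading, the inner maximization models the strongest attacker and the outer minimization models the Encoder that defends against it.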
The first objective (Utility) is well-understood for discriminative learning, and achievable via a standard optimization process on the Encoder (E) and the corresponding specialist discriminative model, i.e., minimizing the cross entropy between the predicted task label and the ground truth in a supervised manner (Kruse et al., 2013).
The second objective (Privacy) has two parts. The inner part, the attacker's optimization, is opposite to the outer part, the Encoder's optimization. Therefore, the Encoder employed by the mobile user and the specialist attacker used by the malicious party are adversarial to each other in their optimization objectives. Given the information loss in the Encoder output for the sake of privacy, a loss of utility appears certain in theory. One would only hope to find a good, ideally Pareto-optimal, tradeoff between privacy and utility in devising the Encoder. However, as we will show later, the Encoder discovered via PAN actually improves privacy and utility at the same time, a result that can be explained by the practical limits of deep learning in §5.
To quantify privacy exactly, one would in theory have to enumerate the privacy leak risks under all possible attackers. Moreover, the measurement of the privacy leak risk is an open problem in itself (Mendes and Vilela, 2017). Therefore, we approximate privacy with two specific attackers, each with its own measurement of privacy, elaborated below.
Specified privacy quantification, in which the user specifies which inference tasks should be forbidden, and privacy is quantified by the accuracy of these tasks. For example, users may want to prevent a malicious party from inferring their identity. In this case, privacy can be measured by the inaccuracy of identity inference, and the privacy leak risk can be defined as the inference accuracy of a discriminative model employed by the attacker.
Intuitive privacy quantification, in which the privacy leak risk is agnostic of the inference tasks undertaken by the attacker. In this work, we quantify this agnostic privacy by the difference between the raw data and the data reconstructed by a malicious party from the Encoder output. We choose this reconstruction error as the agnostic measure for two reasons. First, the raw data in theory contains all the information, and the difference between the raw data and the reconstructed data is computationally straightforward and intuitive. Second, prior work has already shown that it is possible to reconstruct the raw data from feature representations optimized for accuracy (Mahendran and Vedaldi, 2015; Radford et al., 2015; Zhong et al., 2016).
3. Design of PAN
To find a good, hopefully Pareto-optimal tradeoff between utility and privacy, we design PAN to learn an Encoder via a careful combination of discriminative, generative, and adversarial training. As we will show in §4, to our surprise, the resulting Encoder actually improves utility and privacy at the same time.
3.1. Architecture of PAN
As shown in Fig. 2, PAN employs two additional neural network modules, the utility discriminator (UD) and the privacy attacker, to quantify utility and privacy, respectively, in training the Encoder. The utility discriminator simulates the intended classification service; when PAN is trained by the service provider, the utility discriminator can be the same discriminative model used by the service. The privacy attacker, i.e., the intuitive privacy reconstructor (PR) and the specified privacy discriminator (PD), simulates a malicious attacker that attempts to obtain sensitive information from the encoded features. These modules are trained end-to-end to learn the Encoder that users run to extract deep features from raw data. The training is an iterative process that we elaborate in §3.2. Below we first introduce PAN's neural network architecture, along with some empirically gained design insights.
The Encoder (E) consists of an input layer, multiple convolutional layers, pooling layers, and batch-normalization layers. Each convolutional layer applies a convolution operation with a set of trainable filters to output an activation map. We note that the careful use of pooling layers and batch-normalization layers contributes to the utility and privacy of the deep features. The batch-normalization layer normalizes the activation map output by the previous layer by subtracting the batch mean and dividing by the batch standard deviation (Ioffe and Szegedy, 2015). It helps the features' utility because it keeps the activations from becoming too high or too low and thus has a regularization effect (Ioffe and Szegedy, 2015). It contributes to the features' privacy as well, since it makes it harder for an attacker to recover sensitive information from normalized features. The pooling layer then takes the maximum or average value of each sub-region of the previous layer to form more compact features, which reduces the computational error and avoids over-fitting (Giusti et al., 2013). It helps privacy because no un-pooling technique can recover fine details from the resulting features by shifting small parts and precisely arranging them into a larger meaningful structure (Milletari et al., 2016).
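The two layer types discussed above can be illustrated with a minimal sketch in plain Python. This is not the paper's implementation: the function names `batch_normalize` and `max_pool_1d` are hypothetical, the data is one-dimensional for brevity, and the learnable scale and shift of a real batch-normalization layer are omitted.

```python
import math


def batch_normalize(batch, eps=1e-5):
    """Normalize each feature position across a batch of equal-length vectors.

    Subtracts the batch mean and divides by the batch standard deviation.
    """
    n = len(batch)
    dim = len(batch[0])
    out = [[0.0] * dim for _ in range(n)]
    for j in range(dim):
        column = [row[j] for row in batch]
        mean = sum(column) / n
        var = sum((v - mean) ** 2 for v in column) / n
        std = math.sqrt(var + eps)  # eps avoids division by zero
        for i in range(n):
            out[i][j] = (batch[i][j] - mean) / std
    return out


def max_pool_1d(values, window=2):
    """Keep only the maximum of each non-overlapping window."""
    return [max(values[i:i + window]) for i in range(0, len(values), window)]


# Two different inputs collapse to the same pooled features, so the
# fine details inside each window cannot be recovered afterwards.
print(max_pool_1d([1, 3, 2, 2]))  # [3, 2]
print(max_pool_1d([3, 0, 2, 1]))  # [3, 2]
```

The pooling example shows the many-to-one mapping that the text relies on for privacy: several distinct inputs share one pooled output.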
The Utility Discriminator (UD) builds a multi-layer perceptron (MLP) with several fully-connected layers to process the deep features and output the task classification results (Kruse et al., 2013). We also note that a service provider can explore any classification architecture for its utility discriminator model, given the Encoder or its binary version. We choose the MLP architecture because some of the most successful CNN architectures, e.g. VGG and AlexNet, can be viewed as the Encoder plus an MLP. The standard cross entropy between the utility discriminator's prediction output and the task ground truth measures the utility error.
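As a small worked example of the utility error described above, the sketch below computes the cross entropy between a classifier's raw output scores and a ground-truth class index. The function name `cross_entropy` and the use of unnormalized scores (logits) are assumptions made for illustration; the log-sum-exp form is used for numerical stability.

```python
import math


def cross_entropy(logits, true_class):
    """Cross entropy of softmax(logits) against a one-hot true class.

    Computed as log(sum(exp(logits))) - logits[true_class], with the
    maximum logit factored out to avoid overflow.
    """
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(v - m) for v in logits))
    return log_sum_exp - logits[true_class]


# A confident correct prediction gives a small error,
# a confident wrong prediction gives a large one.
low = cross_entropy([5.0, 0.0], true_class=0)
high = cross_entropy([0.0, 5.0], true_class=0)
print(low < high)  # True
```

The same quantity, computed against the privacy label instead of the task label, would serve as the error of a privacy discriminator.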
The Privacy Attacker employs the two privacy attack models presented at the end of §2. Specifically, the privacy discriminator (PD) evaluates the recognition accuracy of the private class from the encoded features, and the privacy reconstructor (PR) quantifies the intuitive reconstruction error between the reconstructed (mimic) data and the raw data.
The Specified Privacy Discriminator (PD) employs an MLP classifier similar to the utility discriminator (UD) to predict the user-specified private class, e.g. personal identity, from the features. The difference is that the multi-layer PD maps the features to the corresponding private classes. As noted before, the architecture and training algorithm of PAN can easily incorporate other architectures as the privacy discriminator (PD). The error between the predicted private class and the privacy label measures the specified privacy leak risk.
The Intuitive Privacy Reconstructor (PR) is a usual Encoder turned upside down, composed of multiple un-pooling layers and deconvolutional layers. The un-pooling operation is realized by feature resizing or nearest-value padding (Mahendran and Vedaldi, 2015). The deconvolutional layer then densifies the sparse activations obtained by un-pooling through reverse convolution operations (Zeiler et al., 2010). The PR simulates a malicious party and quantifies the intuitive privacy error. After obtaining a (binary) version of the Encoder, a malicious party is free to explore any neural architecture to reconstruct the raw data. In this work, we examine multiple reconstructor architectures and select the one with the lowest reconstruction error as the specialist privacy reconstructor. We also include an exactly layer-to-layer reversed architecture that mirrors the Encoder, to emulate a powerful adversarial reconstructor that knows the internals of the Encoder throughout training. The reconstruction error, e.g. Euclidean distance, between the raw data and the reconstructed data measures the disclosure risk of agnostic privacy information.
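To make the reconstruction error concrete, the following sketch un-pools a short feature vector by nearest-value padding and measures the Euclidean distance to the raw input. It is a toy illustration under stated assumptions: the function names are hypothetical, the data is one-dimensional, and no deconvolution is applied.

```python
import math


def unpool_nearest(values, factor=2):
    """Nearest-value un-pooling: repeat each value `factor` times."""
    return [v for v in values for _ in range(factor)]


def euclidean_distance(a, b):
    """Euclidean distance between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


raw = [1, 3, 2, 2]           # raw input
pooled = [3, 2]              # its max-pooled features (window of 2)
reconstructed = unpool_nearest(pooled)  # [3, 3, 2, 2]

# The first element was lost by pooling, so the reconstruction is off by 2.
print(euclidean_distance(raw, reconstructed))  # 2.0
```

A larger distance means the attacker recovers less of the raw data, which is the sense in which the text treats reconstruction error as a privacy measure.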
3.2. Training Algorithm of PAN
Our goal with PAN is to train an Encoder whose output is both useful, i.e., leading to high inference accuracy when used for classification tasks, and private, i.e., leading to low privacy inference accuracy when maliciously processed and high reconstruction error when reverse engineered by the attacker. As we noted in §2, the utility and privacy objectives can compete when taken naively. The key idea of PAN's training algorithm is to train the Encoder along with the utility discriminator and the two types of privacy attackers, which specialize in discrimination and reconstruction, respectively. Given a training dataset of triples, each consisting of the raw data, the true task label, and the privacy label, we train a PAN through an iterative process with the following four stages:
Discriminative training (utility) mainly maximizes the accuracy to train a specialist utility discriminator (UD); mathematically, it minimizes the cross entropy between the predicted class and the true label.
Discriminative training (privacy) minimizes the cross entropy between the predicted private class and the private ground truth, primarily to train a specialist privacy discriminator (PD).
Generative training minimizes the reconstruction error to train a specialist privacy reconstructor (PR).
Adversarial training minimizes the sum error to find a privacy-utility tradeoff. Specifically, it trains the Encoder to suppress the utility error and increase the privacy errors.
The sum error is a Lagrangian function of the utility error and the privacy errors. The three Lagrange multipliers attached to these errors control the relative importance of privacy and utility. When we set the multiplier of the reconstruction error or the multiplier of the privacy discrimination error to zero, PAN only trains the Encoder to resist the specified privacy discriminator or the intuitive privacy reconstructor, respectively.
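One plausible reading of the adversarial objective is a weighted sum in which the utility error enters with a positive sign and the two privacy errors with negative signs, so that minimizing it suppresses the former and increases the latter. The sketch below encodes that reading. The function name, the argument names, and the default multiplier values of 1.0 are assumptions for illustration and are not taken from the paper.

```python
def adversarial_objective(utility_error, pd_error, pr_error,
                          lam_u=1.0, lam_d=1.0, lam_r=1.0):
    """Weighted sum error minimized when updating the Encoder.

    utility_error: cross entropy of the utility discriminator (lower is better)
    pd_error:      cross entropy of the privacy discriminator (higher is more private)
    pr_error:      reconstruction error of the privacy reconstructor (higher is more private)
    lam_*:         Lagrange multipliers; placeholder defaults, not the paper's values
    """
    return lam_u * utility_error - lam_d * pd_error - lam_r * pr_error


# With lam_d = 0 the Encoder is trained only against the reconstructor.
print(adversarial_objective(0.5, 2.0, 3.0))              # -4.5
print(adversarial_objective(0.5, 2.0, 3.0, lam_d=0.0))   # -2.5
```

Setting either privacy multiplier to zero removes the corresponding attacker from the objective, which matches the two special cases described in the text.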
Algorithm 1 summarizes the training algorithm of PAN. We leverage mini-batch techniques to split the training data into small batches, over which we calculate the average gradient to reduce the variance of the gradients, which balances training robustness and efficiency (line 3) (Li et al., 2014). Within each epoch, we first perform the standard discriminative and generative stages (lines 5, 6, and 7) to initialize the Encoder's weights and train the specialist utility discriminator (UD), privacy discriminator (PD), and privacy reconstructor (PR). We then perform the adversarial stage (line 9) to shift the utility-privacy tradeoff by tuning the Encoder weights. We note that the number of steps in line 4 is a hyper-parameter of the first three stages. Running these steps followed by a single iteration of the fourth stage helps synchronize the convergence speed of the four training stages, borrowing existing techniques from generative adversarial networks (Goodfellow et al., 2014). Our implementation uses an empirically optimized value for this hyper-parameter. We leverage the Adam optimizer (Kingma and Ba, 2014) with an adaptive learning rate for all four stages (lines 5, 6, 7, and 9).
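The alternation described above can be sketched as a schedule generator. This is only an outline of the ordering of stages, not Algorithm 1 itself: the name `pan_stage_schedule`, the stage labels, the parameter name `k`, and the assumption that the `k` steps are repeated for every mini-batch are all hypothetical.

```python
def pan_stage_schedule(num_batches, k):
    """Yield (batch_index, stage) pairs in the assumed training order.

    For each mini-batch, the three specialist stages run k times,
    followed by a single adversarial update of the Encoder.
    """
    for batch_index in range(num_batches):
        for _ in range(k):
            yield batch_index, "discriminative_UD"  # train the utility discriminator
            yield batch_index, "discriminative_PD"  # train the privacy discriminator
            yield batch_index, "generative_PR"      # train the privacy reconstructor
        yield batch_index, "adversarial_E"          # tune the Encoder against PD and PR


for batch_index, stage in pan_stage_schedule(num_batches=1, k=2):
    print(batch_index, stage)
```

In a real training loop, each yielded stage would correspond to one optimizer step on the matching loss; the generator only makes the interleaving explicit.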
In this section, we evaluate PAN’s performance using six classification services for mobile apps, with a focus on the utility-privacy tradeoff. We compare PAN against alternative methods reported in the literature and visualize the results for insight into why PAN excels.
| No. | Target task (utility label) | Private attribute (privacy label) | Dataset | Description |
|---|---|---|---|---|
| | Digit ( classes) | None | MNIST (LeCun, 1998) | images |
| | Image ( classes) | None | CIFAR-10 (Krizhevsky et al., 2014) | images |
| | Image ( classes) | None | ImageNet (Deng et al., 2009) | images |
| | Acoustic event ( classes) | None | UbiSound (Sicong et al., 2017) | audio clips |
| | Human activity ( classes) | Human identity ( classes) | Har (UCI, 2017) | records of accelerometer and gyroscope |
| | Driver behavior ( classes) | Driver identity ( classes) | StateFarm (Kaggle, 2019) | images |
4.1. Experiment Setup
Evaluation applications and datasets. We evaluate PAN, especially the resulting Encoder, with six commonly used mobile applications/services, whose benchmark datasets are summarized in Table 1. The target task on MNIST (LeCun, 1998) is handwritten digit recognition. The agnostic private information in the real-world raw images may include individual handwriting style and the background paper. We use one subset of the images for PAN training and a separate subset for validation and testing. The target task on CIFAR-10 (Krizhevsky et al., 2014) and ImageNet (Deng et al., 2009) is image classification. The agnostic private information in the real-world raw images may involve background location, color, and brand. In both cases, we choose a subset of the images for training and use the remaining images for testing. The target task on UbiSound (Sicong et al., 2017) is acoustic event recognition. The agnostic private information covers background voices and environment information. We use one set of audio clips for training and another for testing. The target task on Har (UCI, 2017) is human activity identification based on accelerometer and gyroscope records. The specified private attribute we intend to hide is user identity, and the agnostic private information we expect to protect may include individual habits. We randomly select one set of records for training and another for testing. The target task on StateFarm (Kaggle, 2019) is driver behavior prediction. The specified private attribute we choose is driver identity, and the agnostic private information within the real-world raw images can include face and gender. We use one set of images for training and another for testing.
Evaluation models. In PAN, we leverage a utility discriminator (UD), a privacy discriminator (PD), and a privacy reconstructor (PR) to train and validate the Encoder (E). In the training phase, we follow successful neural network architectures to build PAN's Encoder (E), Utility Discriminator (UD), and Privacy Discriminator (PD) for different types of datasets. For example, according to the sample shape in each dataset, we choose LeNet, AlexNet, or VGG-16 as the reference architecture. To evaluate the learned Encoder in the testing phase, we leverage another set of separately trained Utility Discriminator (UD) and Privacy Attackers (PD and PR), given PAN's Encoder output, to simulate the service provider and malicious parties. In particular, we ensemble multiple optional MLP architectures to simulate the service provider's Utility Discriminator (UD) for task recognition, as well as the malicious attacker's Privacy Discriminator (PD) for private attribute prediction. These MLP models have different fully-connected architectures, obtained by using varying scales of singular value decomposition, sparse-coding factorization, and global-average computation to replace the initial fully-connected layers. We also employ multiple generative architectures and select the most powerful one as the privacy reconstruction attacker (PR). To emulate a powerful adversary that knows the Encoder when training the attackers, we include a PR model that exactly mirrors the Encoder for each task.
Prototype implementation. PAN has two phases: an offline phase to train the Encoder, and an online phase where we deploy the learned Encoder as a middleware on mobile platforms to encode the raw sensor data into features.
In the offline phase, we use the TensorFlow Python library (team, 2018) to train the Encoder, utility discriminator, privacy discriminator, and privacy reconstructor using the datasets summarized in Table 1. We leverage h5py (Collette, 2018) to save the trained models separately. To speed up the training, we use a server with four GeForce GTX 1080 Ti GPUs and CUDA 9.0. In the online phase, we prototype the mobile side on the Android platform, i.e., a Xiaomi Mi6 smartphone, using the TensorFlow Mobile framework (Google, 2018b). We store the learned Encoder in the smartphone's L2-cache using Android's LruCache API (Google, 2018a), which speeds up the on-device data encoding. The Encoder intercepts the incoming testing data and encodes it into features, which are then fed into the corresponding Android Apps for real-world performance evaluation.
4.2. Comparison Baselines
We employ four types of state-of-the-art privacy-preserving baselines to evaluate PAN. The DNN method provides a high utility standard, and the DP, FL, and Hybrid DNN methods set a strict utility-privacy tradeoff benchmark for PAN. The detailed settings of the baseline approaches and PAN are as follows.
Noisy (DP) method perturbs the raw data by adding Laplace noise with various noise factors and then submits the noisy data to the service provider. This is a typical local differential privacy (DP) method (He and Cai, 2017; Dwork et al., 2010). The utility of the noisy data is tested by the task recognition accuracy (e.g. driver behavior on StateFarm) of an MLP classifier with multiple fully-connected layers. The specified privacy is measured by the inference accuracy over the private attribute (e.g. driver identity on StateFarm) of another MLP classifier. The intuitive privacy is evaluated by the average information loss over the corresponding testing set of each dataset.
Noisy (FL) method perturbs the data by adding Gaussian noise whose mean and variance are set according to (Papernot et al., 2018). The Gaussian noise included in the noisy data can provide rigorous guarantees of differential privacy using less local noise. This is widely used in the noisy aggregation scheme of federated learning (FL) (Truex et al., 2018; Papernot et al., 2018). We test the utility, the specified privacy, and the intuitive privacy of this noisy data using the same methodology as for the DP baseline.
DNN method encodes the raw data into features using a deep encoder with multiple convolutional and pooling layers, and exposes the features to the service provider (GoogleCloud, 2018; GoogleNow, 2018). The utility of the DNN features is measured by the inference accuracy of a classifier with multiple fully-connected layers. The specified privacy is tested by the inference accuracy over the private attribute of another privacy classifier with multiple fully-connected layers. The intuitive privacy is tested by the reconstruction error of a decoder with multiple deconvolutional and un-pooling layers.
Hybrid DNN method further perturbs the above DNN features through additional lossy processes, i.e., principal components analysis (PCA) and adding Laplace noise (Ossia et al., 2017) with varying noise factors, before delivering them to the service provider. The utility, the specified privacy, and the intuitive privacy of the perturbed features are tested by a task classifier, a private attribute classifier, and a decoder, respectively, using the same methodology as for the DNN baseline.
PAN automatically transforms the raw data into features using the learned Encoder. In particular, we evaluate the following two variants of PAN, which are trained to defend against different types of privacy attackers for different benchmark tasks/datasets (Table 1):
The first variant of PAN is trained with one privacy attacker, i.e., the Privacy Reconstructor (PR), by setting the multiplier of the privacy discrimination error to zero in the adversarial training objective (Eq.(5)). We train this variant on all six datasets.
The second variant of PAN is trained with two privacy attackers, i.e., the Privacy Discriminator (PD) and the Privacy Reconstructor (PR), for the application datasets that come with both utility labels and private attribute labels (i.e., Har and StateFarm).
The utility of the Encoder output of both PAN variants is tested by the task inference accuracy of the service provider's utility discriminator (UD), which is a classifier. The specified privacy of the Encoder output of the PAN trained with both PD and PR is evaluated by the inference accuracy of the attacker's privacy discriminator (PD), which is also a classifier. As for the intuitive reconstruction privacy of both variants, we select the most powerful decoder as the privacy reconstructor (PR) to evaluate it.
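The two noise-based baselines described above can be sketched with the standard library alone. The helper names are hypothetical, and `mean_abs_difference` is only an assumed stand-in for the "average information loss", whose exact formula is not recoverable from the text. Laplace noise is drawn as the difference of two independent exponential variables, which is a standard construction.

```python
import random


def add_laplace_noise(sample, scale, rng):
    """Add zero-mean Laplace noise with the given scale to every element."""
    if scale <= 0:
        raise ValueError("scale must be positive")
    rate = 1.0 / scale
    # The difference of two i.i.d. exponentials with mean `scale`
    # follows a Laplace distribution with location 0 and that scale.
    return [v + rng.expovariate(rate) - rng.expovariate(rate) for v in sample]


def add_gaussian_noise(sample, mean, std, rng):
    """Add Gaussian noise with the given mean and standard deviation."""
    return [v + rng.gauss(mean, std) for v in sample]


def mean_abs_difference(a, b):
    """Average absolute element-wise difference (assumed loss measure)."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


rng = random.Random(0)           # fixed seed for repeatability
raw = [0.5] * 1000               # made-up raw data
noisy_dp = add_laplace_noise(raw, scale=0.1, rng=rng)
noisy_fl = add_gaussian_noise(raw, mean=0.0, std=0.1, rng=rng)
print(mean_abs_difference(raw, noisy_dp), mean_abs_difference(raw, noisy_fl))
```

Larger noise scales increase the measured difference from the raw data, and in the baselines this comes at the price of lower task accuracy.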
4.3. Utility vs. Privacy Tradeoffs
This subsection evaluates PAN in terms of the utility obtained by the service provider and the two kinds of privacy against malicious attackers, compared with the four privacy-preserving baselines (see §4.2). Figure 3 and Figure 4 summarize the Pareto fronts of the testing privacy-utility tradeoffs achieved by the four baselines and PAN. In this set of experiments, we train the PAN with PR only on all six application datasets based on utility labels, and train the PAN with both PD and PR on the Har and StateFarm datasets, which come with both utility labels and private attribute labels (see Table 1).
First, PAN's Encoder output achieves the best privacy-utility tradeoff among all compared methods. In Figure 3, the performance of PAN's Encoder output lies in the upper right corner, with maximized utility and maximized privacy, on the digit recognition application (MNIST), and lies toward the upper right, with maximized utility and competitive privacy compared with the four baselines, on the image classification applications (CIFAR-10 and ImageNet) and the audio sensing application (UbiSound). In Figure 4, PAN is also in the upper right corner, with maximized utility and maximized privacy, on both the human activity recognition application (Har) and the driver behavior prediction application (StateFarm). Here we convert the privacy measure that is expected to be minimized into one that is to be maximized. When we consider the weighted combination of utility and privacy as a quantifiable metric of the utility-privacy tradeoff, the Encoder outputs of both PAN variants achieve the best tradeoff value under the default relative-importance settings. In contrast, the DNN method provides unacceptably low privacy, and the Hybrid DNN, Noisy (DP), and Noisy (FL) methods offer high privacy at the cost of utility degradation.
Second, the utility (i.e., task inference accuracy) of PAN's Encoder output is at least as good as, and sometimes even better than, that of the four baseline methods across different applications. Specifically, the task inference accuracy of the PAN trained with PR only remains at a high level on MNIST, CIFAR-10, ImageNet, UbiSound, Har, and StateFarm, and that of the PAN trained with both PD and PR remains high on Har and StateFarm. It is even better than the utility of standard DNN features. With its carefully calibrated Gaussian noise distribution, the Noisy (FL) method yields better utility than the Noisy (DP) method. However, the task inference accuracy of the Noisy (DP), Noisy (FL), and Hybrid DNN baselines is seriously unstable across applications because of the injected noise. We also see that the utility of the PAN trained with both PD and PR on Har and StateFarm is slightly better than that of the PAN trained with PR only. This implies that PAN with two adversaries learns better features than PAN with only one generative-model-based adversary.
Third, PAN's Encoder output in both variants considerably improves privacy over the DNN method and achieves privacy competitive with the other three baselines. Moreover, the privacy quantified by PAN's own privacy discriminator (PD) and privacy reconstructor (PR) (the dashed lines in Figures 3 and 4) is comparable with that measured by third-party attackers (the solid black triangles in Figures 3 and 4). We train the third-party attackers using the binary version of PAN's Encoder. This result demonstrates the strong adversarial ability of PAN's privacy discriminator and privacy reconstructor.
Summary. First, although the two PAN variants cannot always outperform the baseline methods in both utility and privacy, they achieve the best Pareto front for the utility-privacy tradeoff across various adversaries and applications. Second, the utility, i.e., inference accuracy, of PAN's Encoder output is even better than that of the standard DNN. We will revisit this surprising result in §4.6 and §5.
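For readers unfamiliar with the term, a Pareto front over (utility, privacy) pairs can be computed as below. This is a generic sketch, not the paper's evaluation code: the function name is hypothetical, both coordinates are treated as "higher is better", and the example points are made up rather than taken from the figures.

```python
def pareto_front(points):
    """Return the points that no other point dominates.

    A point q dominates p when q is at least as good in both
    coordinates and strictly better in at least one.
    """
    front = []
    for i, p in enumerate(points):
        dominated = any(
            q[0] >= p[0] and q[1] >= p[1] and (q[0] > p[0] or q[1] > p[1])
            for j, q in enumerate(points)
            if j != i
        )
        if not dominated:
            front.append(p)
    return front


# Made-up (utility, privacy) pairs for five hypothetical methods.
points = [(0.9, 0.2), (0.8, 0.8), (0.5, 0.9), (0.7, 0.7), (0.4, 0.4)]
print(pareto_front(points))  # [(0.9, 0.2), (0.8, 0.8), (0.5, 0.9)]
```

A method on the front cannot be improved in one objective without losing in the other, which is the sense in which the summary speaks of the best Pareto front.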
4.4. Impact of the Lagrangian Multipliers
An important step in PAN's training is to determine the Lagrange multipliers in the adversarial training stage (see Eq.(5)). We verify that we can tune PAN's utility-privacy tradeoff point by setting different multipliers in the adversarial training phase, as shown in Figure 5. For the PAN trained with PR only, we fix two of the multipliers and evaluate the influence of the remaining one on the tradeoff performance over two typical applications, with five discrete choices of its value. For the PAN trained with both PD and PR, we empirically fix two of the multipliers and compare six discrete choices of the third. Within this search space, we identify an optimal multiplier value for the PAN trained with PR only on digit classification (MNIST) and non-speech sound recognition (UbiSound), and for the PAN trained with both PD and PR on human activity detection (Har) and driver behavior classification (StateFarm).
Summary. The three Lagrange multipliers give PAN the flexibility to satisfy different requirements on the utility-privacy tradeoff, according to the relative importance of utility and privacy budgets across various tasks/applications. We note that exhaustively searching for the optimal multipliers is costly, since one can always search a finer-grained discrete space. An alternative for future work is to leverage an automated search technique, e.g. the deep deterministic policy gradient algorithm, for efficient searching.
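The discrete search over multipliers discussed above amounts to a grid search, sketched below. Everything here is illustrative: `grid_search` and `toy_score` are hypothetical names, the grids are made-up values, and in practice the scoring function would train and test a PAN for each setting, which is why a finer grid quickly becomes expensive.

```python
from itertools import product


def grid_search(evaluate, grid_u, grid_d, grid_r):
    """Return ((lam_u, lam_d, lam_r), score) with the highest score.

    `evaluate` is a caller-supplied function that maps one multiplier
    setting to a tradeoff score (higher is better).
    """
    best = None
    for lam in product(grid_u, grid_d, grid_r):
        score = evaluate(*lam)
        if best is None or score > best[1]:
            best = (lam, score)
    return best


def toy_score(lam_u, lam_d, lam_r):
    """Toy stand-in for a real train-and-test run; peaks at (1.0, 0.5, 0.1)."""
    return -((lam_u - 1.0) ** 2 + (lam_d - 0.5) ** 2 + (lam_r - 0.1) ** 2)


best_setting, best_score = grid_search(
    toy_score, grid_u=[0.5, 1.0, 2.0], grid_d=[0.1, 0.5], grid_r=[0.1, 1.0]
)
print(best_setting)  # (1.0, 0.5, 0.1)
```

The number of evaluations is the product of the grid sizes, so refining each grid multiplies the cost, which motivates the automated search mentioned in the summary.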
Encoder's cost on the Xiaomi Mi6 smartphone:

| Application | Latency () | Storage () | Energy () |
|---|---|---|---|
| Digit recognition (MNIST) | 26 | 135 | 0.8 |
| Image classification (CIFAR-10) | 31 | 198 | 1.6 |
| Image classification (ImageNet) | 102 | 310 | 3.2 |
| Acoustic event recognition (UbiSound) | 42 | 213 | 1.8 |
| Human activity prediction (Har) | 27 | 269 | 0.9 |
4.5. Performance on Smartphone
We next evaluate PAN on a commercial off-the-shelf smartphone with six Android applications.
4.5.1. Resource Cost of PAN’s Encoder on Smartphone
This subsection evaluates the run-time execution cost (e.g. latency, storage, and energy consumption) of the learned PAN Encoder for encoding different formats of data on the Xiaomi Mi6 smartphone. Specifically, we deploy the learned Encoder on the smartphone to intercept each incoming testing sample and encode it into features (i.e., the Encoder output). The Encoder output is then fed into the corresponding Android App for task recognition and privacy validation. In this experiment, the task classifier is embedded in the corresponding Android App, and the privacy validation models (i.e., the private attribute classifier and the privacy reconstructor) are executed in the cloud to attack the Encoder output collected by the Android App. We summarize the on-device execution cost of PAN's Encoder for encoding five formats of data in Table 2. In particular, we load PAN's Encoder (parameters and architecture files) into the smartphone cache to speed up processing, since it occupies little storage. The multiply-accumulate (MAC) operations of the Encoder network run on the smartphone CPU (Sicong et al., 2017). As Table 2 shows, PAN's Encoder has a small memory footprint, encoding latency, and energy cost for each encoding pass over the raw data.
Summary. PAN's Encoder does not incur notably high resource cost, so it is compact enough to deploy on resource-constrained mobile platforms as a data preprocessing middleware. In particular, it uses little memory because the Encoder contains only convolutional layers, without storage-intensive fully-connected layers. The execution delay is only several milliseconds (Liu et al., 2018). The energy cost is insignificant compared with the Xiaomi Mi6's battery capacity.
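The storage argument can be made concrete by counting parameters. The layer sizes below are hypothetical and are not PAN's actual Encoder configuration; they only illustrate why a single fully-connected layer can outweigh a convolutional layer by two orders of magnitude.

```python
def conv_params(kernel, c_in, c_out):
    """Weights plus biases of a 2-D convolution with a square kernel."""
    return kernel * kernel * c_in * c_out + c_out


def dense_params(n_in, n_out):
    """Weights plus biases of a fully-connected layer."""
    return n_in * n_out + n_out


BYTES_PER_FLOAT32 = 4

# Hypothetical layer sizes, chosen only for illustration.
conv = conv_params(kernel=3, c_in=64, c_out=64)
dense = dense_params(n_in=8 * 8 * 64, n_out=1024)

print(conv, conv * BYTES_PER_FLOAT32)    # 36928 parameters, 147712 bytes
print(dense, dense * BYTES_PER_FLOAT32)  # 4195328 parameters, 16781312 bytes
```

A convolution shares its kernel across all spatial positions, so its size is independent of the feature-map resolution, whereas a fully-connected layer needs one weight per input-output pair.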
|Input to App||Utility (%)||Specified privacy (%)||Intuitive privacy|
|Case A: Raw image|
|Case B: DNN features|
|Case C: PAN’s Encoder output|
4.5.2. Case studies on driver behavior recognition App
The user inputs data to an Android App that recognizes driver behavior. Meanwhile, the user wants to hide the specified private attribute (e.g. the driver identity) and other agnostic private information (e.g. the driver race and car model). Therefore, the user leverages PAN to encode the raw image into features and delivers only the Encoder output to the Android App. We artificially play an example trace of driver behavior during the study, using driver images selected from the testing samples of StateFarm. We consider three cases of input data to the driver behavior recognition Android App: Case A, raw data; Case B, features generated by a standard DNN; and Case C, PAN's Encoder output. Table 3 shows the evaluation results on the driver behavior recognition App for the three cases. In Case A with raw image input, the driver behavior classification accuracy (utility) by the App is , while the adversary's accuracy of predicting the private attribute, i.e., driver identity, is . In Case B with DNN feature input, the utility of classifying driver behavior is , and the private driver identity inference accuracy by the malicious attacker is . This indicates that both the raw data and the DNN features reveal information correlated with the private attribute. Case C with PAN's Encoder output improves the driver behavior recognition accuracy () and reduces the private identity prediction accuracy by . Meanwhile, the intuitive reconstruction privacy in Case C is , which is larger than that in Case A () and Case B (). A larger reconstruction error implies a lower risk of private information leakage.
Summary. This outcome demonstrates that PAN's Encoder improves utility with quantified privacy guarantees.
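The three quantities compared in this case study can be computed as in the sketch below. It rests on assumptions: utility and specified privacy are taken to be classification accuracies of the App and of the adversary, and the reconstruction privacy is taken to be a mean squared error between the raw input and its reconstruction, which may differ from the exact error measure used in the paper. All values are toy data, not results from the paper.

```python
def accuracy(predictions, labels):
    """Percentage of predictions that match the labels."""
    if len(predictions) != len(labels) or not labels:
        raise ValueError("inputs must be non-empty and of equal length")
    correct = sum(p == y for p, y in zip(predictions, labels))
    return 100.0 * correct / len(labels)


def mean_squared_error(original, reconstructed):
    """Average squared difference between two equally sized flat signals."""
    if len(original) != len(reconstructed) or not original:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = ((a - b) ** 2 for a, b in zip(original, reconstructed))
    return sum(squared) / len(original)


# Toy values, not results from the paper.
# Utility: the App's accuracy on the intended task (driver behavior).
utility = accuracy(["phone", "drive", "drink", "drive"],
                   ["phone", "drive", "drive", "drive"])
# Specified privacy: the adversary's accuracy on the private attribute
# (driver identity); lower means less leakage.
identity_leak = accuracy([3, 1, 2, 0], [3, 2, 1, 0])
# Reconstruction privacy: error between raw pixels and the adversary's
# reconstruction; higher means less leakage.
reconstruction_error = mean_squared_error([0.0, 0.5, 1.0, 0.5],
                                          [0.5, 0.5, 0.5, 0.5])

print(utility, identity_leak, reconstruction_error)
```

Note that the two privacy measures point in opposite directions: a good Encoder lowers the adversary's accuracy and raises the reconstruction error.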
4.6. Visualization of PAN’s Encoder Output
In this subsection, we further visualize the PAN’s Encoder output in terms of feature distribution and the reconstruction privacy to seek insight into answering the following questions: what is the impact of PAN on the learned features, how does PAN disentangle the feature components relevant to privacy from those relevant to utility, and how well does PAN preserve the reconstruction privacy of raw data?
4.6.1. Visualization of PAN’s Encoder Output on Feature Space
Fig. 6 and Fig. 7 visualize the feature manifolds derived by DNN, DNN (resized), and PAN. First, in Fig. 6, PAN's Encoder output is as highly separable in the feature space as that of the DNN method, which indicates its utility for task recognition. In contrast, the manifold derived by the Hybrid baseline, which applies PCA and noise addition to the DNN features, is blurry; this is why the resized features of the Hybrid DNN method hurt utility. Moreover, the feature distribution (manifold) formed by PAN is the most constrictive compared to those from the DNN and Hybrid baselines, which leads to the improved utility. Second, by pushing the features away from redundant private information for the sake of privacy, PAN makes the manifold more constrictive and thereby enhances utility. Specifically, Fig. 7 zooms in on two categories of images from ImageNet to show in more detail how PAN and DNN form the feature manifold to achieve the utility-privacy tradeoff. The target task is to classify the two categories, “sailboat” and “bus”. The private background information in the “sailboat” raw image is “water”, and that in the “bus” image is “road”. PAN pushes features towards the constrictive space dominated by the samples without redundant (private) information, i.e., “sailboat without water” and “bus without road”, which guarantees privacy, avoids over-fitting, and improves utility as well. The DNN method, in contrast, may capture the background (private) information “water” and “road” and retain it in the feature manifold to help classify “sailboat” and “bus”, and therefore hurts privacy. We defer the theoretical interpretation of this result to Section 5.
4.6.2. Visualization of PAN’s Encoder output on Reconstruction Privacy
Fig. 8 visualizes the reconstruction privacy of PAN's Encoder output, in comparison to the baseline approaches, using two “bus” image samples from ImageNet. For a fair comparison, we adopt the same architectures for the encoder (i.e., 12 conv layers, 5 pooling layers, and 1 batch-normalization layer) and the privacy reconstructor (i.e., the encoder turned upside down) to decode the features generated by DNN, Hybrid DNN, and PAN. The images reconstructed from the DNN features convey both the target object “bus” information and the private background “road” information, indicating a high risk of private background leakage. Adding noise to the images in DP and FL, or adding noise to the features in the Hybrid DNN baseline, obfuscates both the utility-related “bus” information and the privacy-correlated background “road” information, protecting privacy at the cost of task detection accuracy (utility). PAN, instead, only muddles the utility-irrelevant private information “road”, making reconstruction of the background information impossible without compromising utility.
5. Manifold based Interpretation
Our evaluation reported above shows that PAN is able to train an Encoder that improves utility and privacy at the same time. This section attempts to provide a theoretical interpretation of this surprising result.
We resort to the manifold perspective of the deep model. It is common in the literature to assume that high-dimensional raw data lies on a lower-dimensional manifold (Chien and Chen, 2016). A DNN can also be viewed as a parametric manifold learner that utilizes the nonlinear mapping of multi-layer architectures and connection weights. We decompose the input data into two orthogonal lower-dimensional manifold components. The first component is both necessary and sufficient for task recognition (e.g. driver behavior). Thus, ideally, we want our training algorithm to rely solely on this information for task recognition. Formally, for the utility discriminator (UD), this implies that its output on the input is determined by the first component alone. The second component, orthogonal to the first, may or may not contain information about the objective class, but it is dispensable for task detection. In practice, real data does have redundant correlations, so the second component may be learned for task recognition even though it is unnecessary. However, revealing the second component is likely to disclose some sensitive information (e.g. driver identity and background information) and thus hurt privacy. If we assume that there does exist a sweet-spot tradeoff between utility and privacy, which we hope to find, then it must be the case that the first component is not sensitive.
The features learned by standard discriminative learning, which minimizes the classification error based on information from the input, will most likely overlap (i.e., have a non-zero projection) with both components. The overlap with the second component compromises privacy (as evident from our experiments). Meanwhile, the projection of the feature manifold on the second component is significant, as it might capture extra sensitive features that help task recognition accuracy. Apart from the privacy concern, the redundant correlation in the second component is also likely to be spurious and present only in the training data. Thus, merely minimizing the classification loss can lead to over-fitting.
This is where we can kill two birds with one stone via an adversarial process. In PAN, the Encoder is trained with the utility-specified discriminative learning objective (Eq.(2)) and the privacy-imposed adversarial learning objective (Eq.(5)) to remove extra sensitive information from the features, as shown in Fig. 9. The transformed manifold formed by the Encoder is forced by the discriminative learning objective (Eq.(2)), just as in the traditional approach, to contain information from both the first and the second component. However, the adversarial training objective (Eq.(5)) pushes the features away from (or orthogonal to) the second component. In this way, we obtain privacy as well: since the feature is a function of an input with two manifold components, being orthogonal to the second component forces it to depend only on the first.
Meanwhile, from a generalization perspective, the spurious information from the second component that might over-fit the training data is iteratively removed by the adversarial training objective (Eq.(5)), leading to enhanced generalization. For example, as shown in Fig. 7, if we want to discriminate between “bus” and “sailboat”, the background information “road” in the image can help in most cases but can also mislead when the test image contains a “sailboat” being transported on the “road”. Therefore, by considering the background information, a standard DNN may not generalize well. In contrast, because the background may contain sensitive information and contribute to the reconstruction error, PAN is likely to train the Encoder to remove information about the background and, as a result, improve the task accuracy.
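The orthogonality argument can be illustrated with a linear toy example. This is only a sketch under strong assumptions: the input is two-dimensional, one axis stands for the task-relevant component and the other for the redundant (private) component, and the adversarial objective is idealized as an exact projection. PAN's actual Encoder is nonlinear and is trained by gradient descent, not by projection.

```python
def dot(u, v):
    """Inner product of two equally sized vectors."""
    return sum(a * b for a, b in zip(u, v))


def remove_component(w, direction):
    """Project `w` onto the subspace orthogonal to the unit vector `direction`."""
    scale = dot(w, direction)
    return [wi - scale * di for wi, di in zip(w, direction)]


# Assumed toy setup: axis 0 carries task information,
# axis 1 carries redundant (private) information.
redundant_axis = [0.0, 1.0]

w_standard = [0.8, 0.6]  # a feature direction that overlaps with both axes
w_private = remove_component(w_standard, redundant_axis)

x_a = [2.0, 5.0]   # same task content ...
x_b = [2.0, -3.0]  # ... different redundant content

# The standard feature changes with the redundant content, so it leaks it.
print(dot(w_standard, x_a), dot(w_standard, x_b))
# The projected feature is identical for both inputs: it depends on axis 0 only.
print(dot(w_private, x_a), dot(w_private, x_b))
```

In this toy setting, a test input whose redundant content differs from the training data (the "sailboat on a road" case) cannot change the projected feature, which mirrors the generalization argument above.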
The above interpretation highlights the possibility that utility and privacy are not completely competing objectives in practice. We believe that a rigorous formalism and a thorough investigation of this phenomenon are necessary to shed more insight and derive better designs.
6. Related Work
Our work is inspired by and closely related to the following works.
Data Privacy Protection in Machine Learning based Services: Randomized noise addition (He and Cai, 2017) and differential privacy (Dwork et al., 2014; Abadi et al., 2016) techniques are widely used by service providers to remove personal identities from released datasets. They provide strong privacy guarantees but often lead to a significant reduction in utility (as shown in Section 4.3). GAP (Huang et al., 2017) learns a privatization scheme to sanitize datasets and limit the risk of inference attacks on personal private attributes. Erdogdu et al. design a privacy mapping scheme for continuously released time series of user data to protect the correlated private information in the dataset (Erdogdu et al., 2015). Federated learning (Truex et al., 2018; Papernot et al., 2018) techniques prevent inference over the sensitive data exchanged between distributed parties during training through noisy aggregation of the parties' resulting models. However, all of the above techniques are tailored to datasets or to the statistics of datasets, which makes them unsuitable for our problem setting, i.e., run-time data privacy protection in the online inference phase. Meanwhile, applying such statistical information to a new context-aware case is still an open problem.
PAN is a very different approach toward preserving the privacy at run-time. As the raw data is generated, it is intercepted by a trained Encoder and the encoded features are then fed into the untrusted service.
Data Utility-Privacy Tradeoff using Adversarial Networks:
Adversarial networks have been explored for data privacy protection, in which two or more players defend against others with conflicting utility/privacy goals.
Oh et al. introduce an adversarial game to learn an image obfuscation strategy, in which the user and the recogniser (attacker) strive for antagonistic goals: dis-/enabling recognition (Oh et al., 2017).
Wu et al. (Wu et al., 2018) propose an adversarial framework to learn the degradation transformation (e.g. anonymized video) of video inputs. This framework optimizes the tradeoff between task performance and privacy budgets.
However, both practices only consider protecting privacy against attackers that perform “vision” recognition on a specific data format (e.g. image), which is insufficient across the diverse data modalities in ubiquitous computing. In contrast, PAN allows users to specify the utility and privacy quantification for different data formats according to application requirements. We have evaluated PAN's usability across image, audio, and motion data formats in Section 4.
OLYMPUS (Raval et al., 2019) learns a data obfuscator (i.e., AutoEncoder) to jointly minimize privacy and utility loss, where the privacy and utility requirements are modeled as adversarial networks.
Although the above works share the idea of adversarial learning with ours, they use a generative model to obfuscate the raw data in a homomorphic way. In contrast, PAN uses the Encoder to learn a downsampling transformation of the raw data into features, and sends the features, rather than any form of obfuscated or synthetic data, to service providers. A byproduct of this encoding is more efficient data communication from the mobile device to the service provider.
Deep Feature Learning for Utility or Privacy: Edwards et al. propose adversarially learned representations that are both fair (independent of sensitive attributes) and discriminative for the prediction task (Edwards and Storkey, 2015). However, they target fair decisions by quantifying the dependence between the representation and sensitive variables, and thus provide no privacy guarantees. Our work is closely related to (Chen et al., 2018), which employs a variational GAN to learn representations that hide the personal identity and preserve the facial expression. However, it employs a generative model to minimize the reconstruction error for realistic image synthesis, which is vulnerable to agnostic privacy attacks by reverse engineering. In contrast, we maximize the reconstruction error for intuitive privacy preservation. Also, discriminative and generative models are widely studied for latent feature learning, improving task inference but facilitating data reconstruction (Radford et al., 2015; Zhong et al., 2016); they would make intuitive privacy protection even harder. Osia et al. (Ossia et al., 2017) employ a combination of dimensionality reduction, noise addition, and Siamese fine-tuning to preserve privacy. Importantly, both its dimensionality reduction and its Siamese fine-tuning are based on discriminative training. Specifically, its Siamese fine-tuning seeks to reduce the intra-class variation in features amongst training samples for the intended classification service. While the authors show that these methods improve privacy, there is no systematic way to make tradeoffs between privacy and utility. In contrast, PAN presents a rigorous mechanism to discover good tradeoffs via a combination of discriminative, generative, and adversarial training. Malekzadeh et al. (Malekzadeh et al., 2018) present a privacy-preserving transformation called Replacement AutoEncoder (RAE).
Like the Encoder in PAN, RAE also intends to eliminate sensitive information from the features while keeping the task-relevant information. Importantly, it assumes that the features/data relevant to an intended task do not overlap with those revealing sensitive information. As a result, it transforms the data by simply replacing the latter with features/data that are irrelevant to the intended task and do not reveal sensitive information. With that assumption, RAE eschews the hard problem of making a good tradeoff between utility and privacy, and it is also solely based on discriminative training. Furthermore, RAE does not reduce the amount of data that has to be sent to the service provider, and it would use significantly more resources in transforming the data, because RAE needs to detect sensitive features/data, replace them, and then reconstruct the (modified) raw data. In contrast, PAN only needs to run the dimensionality-reducing Encoder on the raw data and send the features to the service provider, although PAN may require significantly more computational resources to train the Encoder, which is done off-line in the cloud.
7. Concluding Remarks
This paper addresses the privacy concern that arises when mobile users send their data to an untrusted service provider for classification services. We present PAN, an adversarial framework that automatically generates deep features from the raw data with quantified guarantees in privacy and utility. We report a prototype of PAN on Android platforms and cloud servers. Evaluation using Android applications and benchmark datasets shows that PAN's Encoder output attains a notably better privacy-utility tradeoff than known methods. To our surprise, it achieves even better utility than standard DNNs that are completely ignorant of privacy. We surmise that this surprising result can be understood from the manifold perspective.
We also see three directions in which the work reported in this paper can be extended. First, the PAN framework can accommodate other choices of context-aware utility, such as sequence prediction, and of privacy quantification, such as information theory-based privacy, according to app requirements. Second, it can also integrate multiple utility discriminators and privacy attackers to train the Encoder, given appropriate datasets accompanied by utility and privacy labels. Third, our experience shows that the training of multiple adversarial models in PAN must be carefully synchronized to avoid model degradation caused by differences in their objectives and convergence speeds. Therefore, more heuristics and insights for guaranteeing and accelerating training convergence are much needed.
Acknowledgements. This work is supported in part by the National Key R&D Program of China, the Natural Science Foundation of China (NSFC), the Shaanxi Fund, the Open Fund of State Key Laboratory of Computer Architecture, the Youth Innovation Team of Shaanxi Universities, and a Natural Science Foundation (NSF) grant. The idea behind PAN was conceived during Sicong Liu's yearlong visit to Rice University with support from the China Scholarship Council, to which the authors are grateful. The authors also thank the anonymous reviewers for their constructive feedback that has made the work stronger.
References
- Deep learning with differential privacy. In Proceedings of SIGSAC, ACM, pp. 308–318. Cited by: §1, §6.
- Privacy risk in cybersecurity data sharing. In Proceedings of ACM Workshop on ISCS, pp. 57–64. Cited by: §1.
- Vgan-based image representation learning for privacy-preserving facial expression recognition. In Proceedings of CVPR Workshops, pp. 1570–1579. Cited by: §6.
- Deep discriminative manifold learning. In Proceedings of ICASSP, IEEE, pp. 2672–2676. Cited by: §5.
- HDF5 for python. Note: http://www.h5py.org/ Cited by: §4.1.
- Imagenet: a large-scale hierarchical image database. In Proceedings of CVPR, Cited by: §1, §4.1, Table 1.
- Differential privacy under continual observation. In Proceedings of STC, pp. 715–724. Cited by: 1st item.
- The algorithmic foundations of differential privacy. Journal of Foundations and Trends in Theoretical Computer Science, pp. 211–407. Cited by: §6.
- Exposed! a survey of attacks on private data. Annual Review of Statistics and Its Application 4, pp. 61–84. Cited by: §1.
- Censoring representations with an adversary. arXiv preprint arXiv:1511.05897. Cited by: §6.
- Privacy-utility trade-off for time-series with application to smart-meter data. In Proceedings of Workshops at AAAI, Cited by: §6.
- . In Proceedings of ICIP, IEEE, pp. 4034–4038. Cited by: 1st item.
- Generative adversarial nets. In Advances in Neural Information Processing Systems, pp. 2672–2680. Cited by: §3.2.
- Android.util.lrucache. Note: https://developer.android.com/reference/android/util/LruCache.html Cited by: §4.1.
- TensorFlow mobile. Note: https://www.tensorflow.org/mobile/ Cited by: §4.1.
- Data preparation. Note: https://cloud.google.com/ml-engine/docs/tensorflow/data-prep Cited by: §1, 3rd item.
- Google now launcher. Note: https://en.wikipedia.org/wiki/Google_Now Cited by: §1, 3rd item.
- Differential private noise adding mechanism: basic conditions and its application. In American Control Conference (ACC), 2017, pp. 1673–1678. Cited by: 1st item, §6.
- Context-aware generative adversarial privacy. Entropy. Cited by: §6.
- Batch normalization: accelerating deep network training by reducing internal covariate shift. arXiv preprint arXiv:1502.03167. Cited by: 1st item.
- State farm distracted driver detection. Note: https://www.kaggle.com/c/state-farm-distracted-driver-detection Cited by: §1, §4.1, Table 1.
- Adam: a method for stochastic optimization. arXiv preprint arXiv:1412.6980. Cited by: §3.2.
- The cifar-10 dataset. Note: https://goo.gl/hXmru5 Cited by: §1, §4.1, Table 1.
- Computational intelligence. pp. 47–81. Cited by: §2, 2nd item.
- The mnist database of handwritten digits. Note: https://goo.gl/t6gTEy Cited by: §1, §4.1, Table 1.
- Efficient mini-batch training for stochastic optimization. In Proceedings of SIGKDD, pp. 661–670. Cited by: §3.2.
- On-demand deep model compression for mobile devices: a usage-driven model selection framework. In Proceedings of ACM MobiSys, Cited by: §4.5.1.
- Cited by: §1, item 2, 2nd item.
- Replacement autoencoder: a privacy-preserving algorithm for sensory data analysis. In Proceedings of IEEE IoTDI, pp. 165–176. Cited by: §6.
- Privacy-preserving data mining: methods, metrics, and applications. IEEE Access 5, pp. 10562–10582. Cited by: §2.
- V-net: fully convolutional neural networks for volumetric medical image segmentation. In Proceedings of 3DV, IEEE, pp. 565–571. Cited by: 1st item.
- Adversarial image perturbation for privacy protection: a game theory perspective. In Proceedings of ICCV, pp. 1491–1500. Cited by: §6.
- A hybrid deep learning architecture for privacy-preserving mobile analytics. arXiv preprint arXiv:1703.02952. Cited by: §1, 4th item, §6.
- Scalable private learning with PATE. Proceedings of ICLR. Cited by: 2nd item, §6.
- Unsupervised representation learning with deep convolutional generative adversarial networks. arXiv preprint arXiv:1511.06434. Cited by: item 2, §6.
- Olympus: sensor privacy through utility aware obfuscation. Proceedings of PET. Cited by: §6.
- UbiEar: bringing location-independent sound awareness to the hard-of-hearing people with smartphones. Journal of IMWUT. Cited by: §1, §4.1, §4.5.1, Table 1.
- TensorFlow. Note: https://www.tensorflow.org/tutorials/ Cited by: §4.1.
- A hybrid approach to privacy-preserving federated learning. arXiv preprint arXiv:1812.03224. Cited by: 2nd item, §6.
- Har: dataset for human activity recognition. Note: https://goo.gl/m5bRo1 Cited by: §1, §4.1, Table 1.
- Towards privacy-preserving visual recognition via adversarial training: a pilot study. In Proceedings of ECCV, Cited by: §6.
- Deconvolutional networks. In Proceedings of CVPR, Cited by: 2nd item.
- An overview on data representation learning: from traditional feature learning to recent deep learning. Journal of Finance and Data Science, pp. 265–278. Cited by: item 2, §6.