Unsupervised Information Obfuscation for Split Inference of Neural Networks

Splitting network computations between the edge device and a server enables low edge-compute inference of neural networks but might expose sensitive information about the test query to the server. To address this problem, existing techniques train the model to minimize information leakage for a given set of sensitive attributes. In practice, however, the test queries might contain attributes that are not foreseen during training. We propose instead an unsupervised obfuscation method to discard the information irrelevant to the main task. We formulate the problem via an information theoretical framework and derive an analytical solution for a given distortion to the model output. In our method, the edge device runs the model up to a split layer determined based on its computational capacity. It then obfuscates the obtained feature vector based on the first layer of the server model by removing the components in the null space as well as the low-energy components of the remaining signal. Our experimental results show that our method outperforms existing techniques in removing the information of the irrelevant attributes and maintaining the accuracy on the target label. We also show that our method reduces the communication cost and incurs only a small computational overhead.




1 Introduction

In recent years, the surge in cloud computing and machine learning has led to the emergence of Machine Learning as a Service (MLaaS), where the compute capacity of the cloud is used to analyze the data generated on edge devices. One shortcoming of the MLaaS framework is the leakage of the clients’ privacy-sensitive data to the cloud server. To address this problem, several cryptography-based solutions have been proposed which provide provable security at the cost of increasing the communication cost and delay of the cloud inference by orders of magnitude (juvekar2018gazelle; rathee2020cryptflow2). Such cryptography-based solutions are applicable in use-cases where delay is tolerable, such as healthcare (microsoftAIHealth), but not in scenarios where millions of clients request fast responses at low communication cost, such as in Amazon Alexa or Apple Siri applications. A light-weight alternative to cryptographic solutions is to hide sensitive information on the edge device, e.g., by blurring images before sending them to the server (vishwamitra2017blur). This approach, however, is task-specific and is not viable for generic applications.

Another approach is split inference, which provides a generic and computationally efficient data obfuscation framework (kang2017neurosurgeon; chi2018privacy). In this approach, the service provider trains the model and splits it into two sub-models, $M_c$ and $M_s$, where $M_c$ contains the first few layers of the model and $M_s$ contains the rest. At inference time, the client runs $M_c$ on the edge device and sends the resulting feature vector $z = M_c(x)$ to the server, which then computes the target label as $y^{tar} = M_s(z)$. To protect the sensitive content of the client’s query, the model must be designed such that $z$ only contains the information related to the underlying task. This aligns well with recent privacy laws, such as GDPR (gdpr), that restrict the amount of collected information to the necessary minimum. For instance, when sending facial features for cell-phone authentication, the client does not want to disclose other information such as their mood or their makeup. We denote such hidden attributes as $y^{hid}$ in the remainder of this paper.

Figure 1: Unsupervised data obfuscation in split inference setting. $M_c$ and $M_s$ are the client and server models, respectively, and $F$ is the obfuscation function. The client computes $z' = F(M_c(x))$ and sends $z'$ to the server to predict the target attribute as $y^{tar} = M_s(z')$. The obfuscator is designed to minimize the content in $z'$ that is irrelevant to $y^{tar}$, and also to be efficient enough to run on the edge device.

Current methods for data obfuscation in split inference aim to remove the information corresponding to a known list of hidden attributes. For example, adversarial training (feutry2018learning) and noise injection (mireshghallah2020shredder) methods minimize the accuracy of an adversary model on the hidden attributes, and the information bottleneck method (osia2018deep) trains the model to minimize the mutual information between the query and the hidden attributes. The set of hidden attributes, however, can vary from one query to another. Hence, it is not feasible to foresee all types of attributes that could be considered sensitive for a specific MLaaS application. Moreover, the need to annotate inputs with all possible hidden attributes significantly increases the training cost.

In this paper, we propose an alternative solution in which, instead of removing the information that is related to a set of sensitive attributes, we discard the information that is not used by the server model to predict the target label. Our contributions are summarized in the following:

  • We propose an unsupervised obfuscation mechanism, depicted in Figure 1, and formulate a general optimization problem for finding the obfuscated feature vector $z'$. The formulation is based on minimizing the mutual information between $z$ and $z'$, under a distortion constraint on the model output, $\|M_s(z) - M_s(z')\|$. We then devise a practical solution for a relaxation of the problem using the SVD of the first layer of $M_s$.

  • We perform extensive experiments on several datasets and show that our methods provide better tradeoffs between accuracy and obfuscation compared to existing approaches such as adversarial training, despite having no knowledge of the hidden attribute at training or inference phases. We also investigate the role of the edge computation and show that, with higher edge computation, the client obtains better obfuscation at the same target accuracy.

2 Problem Statement

Let $M_c$ and $M_s$ be the client and server models, respectively, and $F$ be the obfuscation function. We consider a split inference setup, where the model is trained with a set of examples $\{x_i\}$ and their corresponding target labels $\{y_i^{tar}\}$. At inference phase, clients run $M_c$ on their data and send $z' = F(z)$ to the server, where $z = M_c(x)$. The goal of the data obfuscation is to generate $z'$ such that it contains minimal information about the sensitive attributes, yet the predicted target label for $z'$ is similar to that of $z$, i.e., $M_s(z') \approx M_s(z)$. We consider the unsupervised data obfuscation setting, where the sensitive attributes are not available at training or inference phases, i.e., the obfuscation algorithm is required to be generic and remove information about any attribute that is irrelevant to the target label.

2.1 Threat Model

Client model. Upon receiving the service, the client decides on the best tradeoff of accuracy, data obfuscation, and computational efficiency, based on their desired level of information protection and also the computational capability of the edge device. Similar mechanisms are already in use in AI-on-the-edge applications. For example, in the application of unlocking smart phones with face recognition, the client can specify the required precision in face recognition, where a lower precision will provide faster authentication at the cost of lower security.

Server model. The server is assumed to be honest-but-curious. It cooperates in providing an inference mechanism with minimum information leakage, either to comply with privacy regulations (gdpr) or to have a competitive advantage in an open market. The server performs inference of the target attribute, $y^{tar}$, but might try to extract the sensitive information from the obfuscated feature vector, $z'$, as well.

Adversary model. The adversary tries to infer sensitive attribute(s), $y^{hid}$, from the obfuscated feature vector, $z'$. We consider a strong adversary with full knowledge of the client and server models, the training data and training algorithm, and the client’s obfuscation algorithm and setting. The client and the server, however, do not know the adversary’s algorithm and are not aware of the sensitive attributes that the adversary tries to infer.

3 Related Work

Prior work has shown that representations learned by neural networks can be used to extract sensitive information (song2019overlearning) or even reconstruct the raw data (mahendran2015understanding). Current methods for data obfuscation can be categorized as follows.

Cryptography-based Solutions. A class of public-key encryption algorithms protects the data in the transmission phase (al2003certificateless), but cannot prevent data exposure to a curious server. Nevertheless, these methods can be used in conjunction with our approach to strengthen the defense against external adversaries. Another type of cryptographic method allows running inference directly on the encrypted data (rathee2020cryptflow2) at the cost of significant communication and computation overhead. As an example, using the state-of-the-art cryptographic method, performing inference on a single ImageNet image takes about half an hour and requires gigabytes of data transmission (rathee2020cryptflow2). We consider scenarios where the server provides service to millions of users (e.g., in Amazon Alexa or Apple Siri applications), and users expect low communication and fast response. Hence, classic solutions for secure function evaluation are not applicable to our scenario due to their high computational and communication cost.

Noise Injection. In this method, the client sends a noisy feature $z'$ instead of $z$ to the server, where $z'$ is drawn from a randomized mechanism parameterized by $\theta$ (e.g., a Gaussian distribution). A typical approach one could employ is differential privacy (DP) (dwork2006differential; dwork2006calibrating; dwork2014algorithmic), which guarantees that the distribution of $z'$ does not differ too much for any two inputs $x_1$ and $x_2$. Using DP, however, can lead to a large loss of accuracy (kasiviswanathan2011can; ullman2018tight; kairouz2019advances). To maintain the utility of the model, (mireshghallah2020shredder) proposed to solve the following:

$$\min_{\theta} \; CE\big(M_s(z'), y^{tar}\big) - \gamma \, CE\big(M_a(z'), y^{hid}\big), \tag{1}$$

where the first and second terms denote the cross entropy loss for the server and adversary, respectively, $M_a$ is the adversary model, and $\gamma$ weighs the two terms. In general, while noise addition improves the privacy, it has been shown to significantly reduce the accuracy (liu2019better; li2019deepobfuscator).

Information Bottleneck (IB) is proposed to obfuscate the information related to a known set of sensitive attributes, $y^{hid}$. Let $I(\cdot\,;\cdot)$ denote mutual information. The idea is to train $M_c$ so that it maximizes $I(z; y^{tar})$ while minimizing $I(z; y^{hid})$ (osia2018deep; moyer2018invariant). The optimization is formulated as follows:

$$\max_{M_c} \; I(z; y^{tar}) - \gamma \, I(z; y^{hid}). \tag{2}$$

Adversarial Training (AT) is an effective method for obfuscating the information of a known set of sensitive attributes, while maintaining the accuracy on the target label. AT solves the following min-max optimization problem:

$$\min_{M_c, M_s} \; \max_{M_a} \; CE\big(M_s(z), y^{tar}\big) - \gamma \, CE\big(M_a(z), y^{hid}\big), \tag{3}$$

where $CE$ denotes the cross-entropy loss, $M_a$ is the adversary model, and $z = M_c(x)$. The above objective can be achieved through an adversarial training method (edwards2015censoring; hamm2017minimax; xie2017controllable; li-etal-2018-towards; feutry2018learning; li2019deepobfuscator; huang2017context). Upon convergence, the model generates $z$, using which $M_a$ cannot accurately estimate $y^{hid}$, yet $M_s$ accurately predicts $y^{tar}$.

Existing obfuscation methods for split inference have several limitations. Except for differential privacy, which often significantly reduces the accuracy on the target label, the underlying assumption in the above methods is that a set of hidden attributes is provided at training time. In practice, however, it might not be feasible to foresee and identify all possible sensitive attributes and annotate the training data accordingly. This assumption also conflicts with deployment at scale: whenever a new attribute needs to be protected, the client model has to be retrained and redistributed to all edge devices that use the service. Current approaches also often provide a poor tradeoff between accuracy and preventing information leakage. Moreover, the tradeoff of accuracy and obfuscation with the client-side computation is not well studied in the split learning framework. In this paper, we characterize this tradeoff and propose to remove the content irrelevant to the main task, instead of obfuscating a predefined set of sensitive attributes. We empirically show that our method reduces the attack accuracy on hidden attributes, which are not known to the client or the server at training or inference times, at a small or no cost to the accuracy on the target label.

4 The Proposed Method

4.1 Problem Formulation

Let $z \in \mathbb{R}^n$ be the feature vector and $y^{tar}$ be the corresponding label. Our goal is to design the obfuscation function, $F$, such that $z' = F(z)$ contains the necessary and sufficient information about $y^{tar}$. Specifically, we want to (i) minimize the information that $z'$ carries about $z$, while (ii) maintaining the utility of the model for predicting $y^{tar}$ as much as possible. We formulate the problem as follows:

$$\min_{z'} \; I(z; z') \quad \text{s.t.} \quad \|M_s(z) - M_s(z')\| \le \epsilon. \tag{4}$$

The objective function in (4) minimizes the mutual information of $z$ and $z'$, while bounding the distortion to the model output. Note that our formulation is different from the information bottleneck formulation proposed by (osia2018deep; moyer2018invariant) in that it does not use $y^{hid}$ and, hence, is unsupervised with respect to hidden attributes.

Let $H(\cdot)$ denote the entropy function. We have:

$$I(z; z') = H(z') - H(z' \mid z) = H(z'),$$

where the second equality holds since the obfuscation function is a deterministic algorithm and thus $H(z' \mid z) = 0$. The objective in (4) can therefore be written as:

$$\min_{z'} \; H(z') \quad \text{s.t.} \quad \|M_s(z) - M_s(z')\| \le \epsilon. \tag{5}$$

Intuitively, a small $H(z')$ indicates that, from the server’s point of view, the incoming queries look similar. For example, in face recognition applications, only basic properties of face images are transmitted to the server and other irrelevant attributes that would make the images different in each query, such as the background or makeup, are obfuscated.

4.2 Obfuscation for Linear Layers

Our goal is to develop a low-complexity obfuscation function, $F$, that solves (5) with respect to the server’s model, $M_s$. Figure 1 shows the block diagram of the method. In its general form, the function $F$ can be viewed as an auto-encoder (AE) network that is trained with the objective of (5). Such a network would, however, be computationally complex to run on edge devices and would defeat the purpose of sending activations to the server for low-complexity inference.

To address the computational complexity problem, we design the obfuscation function with respect to the first linear layer (a convolutional or fully-connected layer) of $M_s$. For linear models, the objective in (5) can be written as:

$$\min_{z'} \; H(z') \quad \text{s.t.} \quad \|W z - W z'\| \le \epsilon, \tag{6}$$

where $W \in \mathbb{R}^{m \times n}$ is the weight matrix. In the following, we present our analysis for linear models.

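Definition 1.

Let $z \in \mathbb{R}^n$ and $W \in \mathbb{R}^{m \times n}$, and let $W = U S V^T$ denote the singular value decomposition (SVD) of $W$. The columns of $V$ provide an orthonormal basis $\{v_k\}_{k=1}^{n}$. We write:

$$z = \sum_{k=1}^{n} \alpha_k v_k, \qquad \alpha_k = v_k^T z.$$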

We have the following Lemmas.

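Lemma 1.

If $z \sim \mathcal{N}(0, \sigma^2 I)$, then the $\alpha_k$’s are independent random variables with $\alpha_k \sim \mathcal{N}(0, \sigma^2)$.

Since $\alpha_k = v_k^T z$ is a linear function of the Gaussian vector $z$, it is Gaussian with the following mean and variance:

$$\mathbb{E}[\alpha_k] = v_k^T \, \mathbb{E}[z] = 0, \qquad \mathrm{Var}(\alpha_k) = v_k^T (\sigma^2 I) \, v_k = \sigma^2.$$

Assume $j \neq k$. We have:

$$\mathbb{E}[\alpha_j \alpha_k] = v_j^T \, \mathbb{E}[z z^T] \, v_k = \sigma^2 v_j^T v_k = 0.$$

Also, $\mathbb{E}[\alpha_j] \, \mathbb{E}[\alpha_k] = 0$. Therefore, since $\mathrm{Cov}(\alpha_j, \alpha_k) = 0$ and $(\alpha_j, \alpha_k)$ is jointly Gaussian for any $j$ and $k$, the $\alpha_k$’s are independent random variables. ∎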
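
The independence claim of Lemma 1 can be sanity-checked numerically. The sketch below is only an illustration under stated assumptions: the features are drawn as isotropic Gaussian vectors, a random matrix stands in for the first server layer, and the function name `coefficient_covariance` is hypothetical. It projects the samples onto the right singular vectors and estimates the covariance of the resulting coefficients, which should be close to a scaled identity matrix.

```python
import numpy as np

def coefficient_covariance(W, sigma=1.5, num_samples=100_000, seed=0):
    """Empirical covariance of alpha = V^T z for z ~ N(0, sigma^2 I) (assumed input model)."""
    rng = np.random.default_rng(seed)
    n = W.shape[1]
    # Full SVD so that the rows of Vt span all of R^n, not only the row space of W.
    _, _, Vt = np.linalg.svd(W, full_matrices=True)
    Z = rng.normal(0.0, sigma, size=(num_samples, n))  # one feature vector per row
    A = Z @ Vt.T                                       # row i holds the coefficients of sample i
    return np.cov(A, rowvar=False)

# Hypothetical first-layer weights (m=3, n=5); any matrix works for this check.
W_demo = np.random.default_rng(1).normal(size=(3, 5))
cov = coefficient_covariance(W_demo)
```

Off-diagonal entries near zero together with joint Gaussianity are what make the coefficients independent; the diagonal entries should all be close to the assumed variance.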

Lemma 2.

For , we have .


Assume . We have:

Also, . According to Lemma 1, ’s are independent Gaussian random variables. Thus, is multivariate Gaussian for any and . Hence, ’s are independent random vectors and . ∎

Lemma 3.

For , we have .


Since , we have . Lemma 2 also shows . Hence:

The left hand side can be also written as:

Hence, . We also have:

Therefore, , and . ∎

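Lemma 4.

Let $z = \sum_j \alpha_j v_j$ and let $z'$ be obtained from $z$ by replacing $\alpha_k$ with $\alpha'_k$, i.e., $z' = z - (\alpha_k - \alpha'_k) v_k$. We have $\|W(z - z')\| = s_k |\alpha_k - \alpha'_k|$, where $s_k$ is the $k$-th singular value of $W$.

We have:

$$W(z - z') = (\alpha_k - \alpha'_k) \, U S V^T v_k = (\alpha_k - \alpha'_k) \, U S e_k = (\alpha_k - \alpha'_k) \, s_k u_k,$$

where $e_k$ is a one-hot vector with its $k$-th element set to 1 and $u_k$ is the $k$-th column of $U$. Since the $u_k$’s are orthonormal, we have $\|W(z - z')\| = s_k |\alpha_k - \alpha'_k|$. ∎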
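
Lemma 4 is easy to verify numerically. The following sketch assumes that distortion is measured with the Euclidean norm at the output of a linear layer; the random matrix, the test vector, and the helper name `single_coefficient_distortion` are placeholders chosen for illustration. It changes one coefficient at a time and compares the measured distortion with the singular value times the size of the change.

```python
import numpy as np

def single_coefficient_distortion(W, z, k, delta):
    """Change the k-th coefficient by delta; return (measured, predicted) distortion."""
    _, s, Vt = np.linalg.svd(W, full_matrices=True)
    z_mod = z - delta * Vt[k]                  # only the coefficient along v_k changes
    measured = np.linalg.norm(W @ z - W @ z_mod)
    s_k = s[k] if k < len(s) else 0.0          # directions beyond the rank lie in the null space
    return measured, s_k * abs(delta)

rng = np.random.default_rng(0)
W_demo = rng.normal(size=(3, 5))   # hypothetical weights with m=3, n=5
z_demo = rng.normal(size=5)
results = [single_coefficient_distortion(W_demo, z_demo, k, 0.7) for k in range(5)]
```

For the last two directions the predicted distortion is zero, which is the case exploited by distortion-free obfuscation.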

The following theorem provides the solution of the objective in (6).

Theorem 1.

Let as defined in Definition 1. Let where and ’s are sorted based on the singular values of . The objective in (6) is minimized by where


where , and .


Using Lemmas 1 and 3, we have , where . We have which is monotonically increasing in . Hence, can be reduced by suppressing the variance of , i.e., making ’s closer to zero.

Given a distortion budget, $\epsilon$, the question now is which $\alpha_k$ should be modified and by how much. From the entropy perspective, based on the assumptions above and Lemma 3, reducing the variance of each $\alpha_k$ reduces the entropy by the same amount for all $k$. Lemma 4, however, states that modifying $\alpha_k$ by $\delta$ causes a distortion of $s_k |\delta|$, where $s_k$ is the $k$-th singular value of $W$. Since smaller $s_k$’s cause smaller distortion, the solution is achieved by sorting the singular values and then moving the $\alpha_k$’s corresponding to the smaller singular values towards zero, one at a time, until the budget $\epsilon$ is exhausted.

The following provides the solution more specifically. If in the weight matrix, the last coefficients do not contribute to and thus can be set to zero without causing any distortion. Now, assume the coefficients in range of to are to be set to zero. The total distortion will be . Also, the distortion caused by modifying by is , which we will set to be equal to the remaining distortion, , i.e., . This completes the proof. ∎

Definition 2.

The signal content of $z$ with respect to a matrix $W$, or simply the signal content of $z$, is the solution to (6) with $\epsilon = 0$. It is denoted by $z_S$ and defined as follows:

$$z_S = \sum_{k \,:\, s_k > 0} \alpha_k v_k.$$

The remaining components of $z$ are called the null content, defined as follows:

$$z_N = z - z_S = \sum_{k \,:\, s_k = 0} \alpha_k v_k.$$

The signal content is the information that is kept after multiplying $z$ by $W$, and the null content is the discarded information. By setting $z' = z_S$, the client reduces the entropy without introducing any distortion to the output of $M_s$. We call this method distortion-free obfuscation herein. The entropy can be further reduced by removing components from the signal content as well, for which the optimal way for a desired distortion is determined by Theorem 1. We call this method distortion-bounded obfuscation in the remainder of the paper.
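
The two obfuscation modes can be sketched in a few lines of NumPy. This is a minimal illustration rather than the authors' implementation. It assumes that distortion is the Euclidean norm of the change at the output of the first linear layer and that the boundary coefficient is shrunk just enough to use up the remaining budget. The function name `obfuscate` and the random test matrix are hypothetical.

```python
import numpy as np

def obfuscate(z, W, eps=0.0, tol=1e-10):
    """Sketch of SVD-based obfuscation keeping ||W z - W z'||_2 <= eps (assumed metric)."""
    n = W.shape[1]
    _, s, Vt = np.linalg.svd(W, full_matrices=True)
    s_full = np.zeros(n)
    s_full[: len(s)] = s                  # pad: null-space directions get singular value 0
    alpha = Vt @ z                        # coefficients of z in the basis of right singular vectors
    new_alpha = alpha.copy()
    new_alpha[s_full <= tol] = 0.0        # remove the null content (causes no distortion)
    budget_sq = eps ** 2                  # remaining squared distortion budget
    for k in np.argsort(s_full):          # visit the smallest singular values first
        if s_full[k] <= tol:
            continue
        cost_sq = (s_full[k] * alpha[k]) ** 2
        if cost_sq <= budget_sq:          # zeroing this coefficient fits in the budget
            new_alpha[k] = 0.0
            budget_sq -= cost_sq
        else:                             # shrink it to spend what is left, then stop
            shrink = np.sqrt(budget_sq) / s_full[k]
            new_alpha[k] = alpha[k] - np.sign(alpha[k]) * shrink
            break
    return Vt.T @ new_alpha

rng = np.random.default_rng(0)
W_demo = rng.normal(size=(4, 8))                   # hypothetical first server layer
z_demo = rng.normal(size=8)
z_free = obfuscate(z_demo, W_demo)                 # distortion-free: only null content removed
z_bounded = obfuscate(z_demo, W_demo, eps=0.5)     # distortion-bounded
```

With `eps=0.0` the loop leaves every signal coefficient untouched, so the layer output is unchanged; a positive `eps` additionally removes the components with the smallest singular values.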

4.3 The Proposed Obfuscation Method

In the following, we present our framework for unsupervised data obfuscation in the split inference setup.

Training. The server trains the model with inputs $\{x_i\}$ and target labels $\{y_i^{tar}\}$, where, at each epoch, various fractions of the signal content of different layers are removed (one layer at a time), so that the model becomes robust to removing the components of the signal content. The model is also trained to generate feature vectors, $z$, with uncorrelated Gaussian activations as specified in Theorem 1. To learn models with decorrelated activations, we used the penalty term proposed in (cogswell2015reducing) as $\frac{1}{2}\left(\|C\|_F^2 - \|\mathrm{diag}(C)\|_2^2\right)$, where $C$ is the matrix of covariances between activation pairs, $\|\cdot\|_F$ is the Frobenius norm, and the $\mathrm{diag}(\cdot)$ operator extracts the main diagonal of a matrix into a vector. Additionally, the distribution of $z$ is forced to be close to Gaussian using the VAE approach (kingma2013auto), i.e., by learning to generate $z$ from a variational distribution with a Gaussian prior.
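
For illustration, the decorrelation penalty can be computed on a batch of activations as follows. The sketch uses plain NumPy and the batch covariance; a training implementation would use the autodiff framework of the model, and the function name `decov_penalty` is an assumption.

```python
import numpy as np

def decov_penalty(acts):
    """0.5 * (||C||_F^2 - ||diag(C)||_2^2) for a batch of activations (batch x features)."""
    centered = acts - acts.mean(axis=0, keepdims=True)
    C = centered.T @ centered / acts.shape[0]      # covariance between activation pairs
    return 0.5 * (np.sum(C ** 2) - np.sum(np.diag(C) ** 2))

# Two perfectly correlated features are penalized; two uncorrelated ones are not.
correlated = np.array([[1.0, 1.0], [-1.0, -1.0]])
uncorrelated = np.array([[1.0, 1.0], [-1.0, -1.0], [1.0, -1.0], [-1.0, 1.0]])
penalty_corr = decov_penalty(correlated)
penalty_uncorr = decov_penalty(uncorrelated)
```

Only the off-diagonal covariances contribute, so the penalty pushes activations toward being uncorrelated without constraining their individual variances.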

Inference. Upon providing the service, the server also provides a profile of the average reduction in target accuracy caused by removing a given fraction of the signal content for each split layer. The client first decides on the number of layers to be run locally on the edge device (determined based on the compute capacity) and then on the fraction of the signal content to maintain (determined based on the desired accuracy). For inference, the client computes the obfuscated feature vector $z'$ and sends it to the server. The server then performs the rest of the computation and obtains $y^{tar} = M_s(z')$.
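
The client-side choice described above can be sketched as a lookup in the server-provided profile. The profile format (retained fraction mapped to average accuracy drop), the numbers in it, and the function name are all hypothetical; the text does not specify how the profile is encoded.

```python
def choose_signal_fraction(profile, max_accuracy_drop):
    """Smallest retained fraction whose profiled accuracy drop stays within the tolerance."""
    feasible = [frac for frac, drop in profile.items() if drop <= max_accuracy_drop]
    if not feasible:
        raise ValueError("no profiled fraction satisfies the requested accuracy drop")
    return min(feasible)

# Hypothetical profile for one split layer: retained fraction -> average drop in target accuracy.
example_profile = {1.0: 0.000, 0.5: 0.004, 0.25: 0.012, 0.1: 0.050}
chosen = choose_signal_fraction(example_profile, max_accuracy_drop=0.01)
```

Picking the smallest feasible fraction corresponds to the strongest obfuscation that still meets the accuracy the client asked for.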

Our framework provides a tradeoff between accuracy, obfuscation, and computational efficiency. Specifically, by running more layers locally (more edge computation), the client can achieve a better accuracy-obfuscation tradeoff, i.e., the same obfuscation can be obtained by discarding a smaller fraction of the signal content. Moreover, for a given split layer, the client can adjust the fraction of the signal content to be removed in order to obtain a desired tradeoff between accuracy and obfuscation. In Section 5, we provide empirical validations for the aforesaid tradeoffs.

Computational and Communication cost. Performing obfuscation requires the client to compute $m'$ coefficients $\alpha_k = v_k^T z$ on the edge device, where the overhead of computing each $\alpha_k$ is equivalent to $1/m$-th of the total computation in the first layer of $M_s$. Therefore, the client performs an extra computation equivalent to a fraction $m'/m$ of the first layer of $M_s$, where $m' \ll m$ in practice. Note that the client is not required to recover $z'$, and can send only $\{\alpha_k\}_{k=1}^{m'}$ to the server, who has the basis $\{v_k\}$ and can compute $z'$ accordingly. Therefore, our obfuscation method reduces the communication cost by a factor of $n/m'$ compared to the case that the raw feature vector is sent to the server.
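
The communication saving can be illustrated with a short sketch in which the client transmits only the retained coefficients and the server evaluates its first linear layer directly from them. The layer sizes, the number of retained coefficients, and the function names are assumptions made for this example.

```python
import numpy as np

def client_encode(z, Vt, m_keep):
    """Client side: coefficients of z along the m_keep leading right singular vectors."""
    return Vt[:m_keep] @ z

def server_first_layer(alpha, U, s):
    """Server side: first-layer output from the coefficients, without rebuilding z'."""
    m_keep = len(alpha)
    return U[:, :m_keep] @ (s[:m_keep] * alpha)

rng = np.random.default_rng(0)
W_demo = rng.normal(size=(16, 64))        # hypothetical first server layer (16 outputs, 64 inputs)
z_demo = rng.normal(size=64)
U, s, Vt = np.linalg.svd(W_demo, full_matrices=True)
alpha = client_encode(z_demo, Vt, m_keep=4)
out = server_first_layer(alpha, U, s)
compression = z_demo.size / alpha.size    # values in the raw feature vector per value sent
```

Here the client sends 4 numbers instead of 64, and the server output matches what it would obtain by first reconstructing the obfuscated vector and then applying the layer.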

5 Experimental Results

5.1 Experiment Setup

Model architecture and training settings. We present the experimental results on an architecture used in prior work (song2019overlearning) which is shown in Table 1. We also use the same architecture for adversary’s model as the server model . We train the models for epochs using the Adam optimizer with an initial learning rate of . The learning rate is dropped by a factor of after and epochs.

Datasets. We perform our experiments on four visual datasets described below. Table 2 lists the target and hidden attributes of the datasets used.

  • EMNIST (cohen2017emnist) is an extended version of the MNIST dataset where the labels are augmented with writer IDs. We select a subset of EMNIST samples with a fixed number of examples per writer, and split it into training, validation, and test sets. We use the digit label and the writer ID as the target and the hidden attributes, respectively.

  • FaceScrub (facescrub2; facescrub) is a dataset of celebrity faces labeled with gender and identity. We use gender as the target and identity as the hidden attribute. In experiments, we crop images using the face region bounding boxes specified in the annotations and resize them to a fixed resolution.

  • UTKFace (utkface) is a dataset of face images labeled with gender and race, which we treat as the target and the hidden attributes, respectively. The images are cropped using the face region bounding boxes and resized to a fixed resolution.

  • CelebA (celeba) is a dataset of celebrity images. Each image is labeled with 40 binary attributes, out of which we select Smiling as the target attribute and {Male, Heavy_Makeup, High_Cheekbones, Mouth_Slightly_Open, Wearing_Lipstick, Attractive} as hidden attributes. These attributes have a nearly balanced distribution of positive and negative examples. In experiments, we crop images to the face region and resize them to a fixed resolution.

| Layer | Architecture |
|-------|--------------|
| 1 | CONV(), ReLU, Maxpool(), Batchnorm |
| 2 | CONV(), ReLU, Maxpool(), Batchnorm |
| 3 | CONV(), ReLU, Maxpool(), Batchnorm |
| 4 | FC(128), ReLU, Batchnorm |
| 5 | FC(64), ReLU, Batchnorm |
| 6 | FC(), Softmax |

Table 1: Network Architecture. Each row shows a split layer, i.e., for split layer $i$, the input of that layer is sent to the server.
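
| Dataset | Target attribute | Number of classes | Hidden attribute(s) | Number of classes |
|---------|------------------|-------------------|---------------------|-------------------|
| EMNIST | digit | 10 | writer | 100 |
| UTKFace | gender | 2 | race | 5 |
| FaceScrub | gender | 2 | identity | 530 |
| CelebA | smiling | 2 | gender, makeup, cheekbones, mouth-open, lipstick, attractive | 2 each |

Table 2: Target and hidden attributes of the datasets used.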

Measuring obfuscation. Several methods have been proposed to measure the information leakage of intermediate feature vectors in neural networks. One approach is computing the mutual information between the query $x$ and the feature vector $z$ (kraskov2004estimating). In practice, measuring the mutual information is not tractable for high-dimensional random variables, unless certain assumptions are made about the probability distribution of the random variables of interest. A more practical approach computes the reconstruction error, $\|x - \hat{x}\|$, where $\hat{x}$ is estimated using the feature vector (mahendran2015understanding). Finally, attribute leakage can be defined based on the accuracy of an adversary model that predicts the hidden label from intermediate features.

In this paper, we follow the approach of predicting hidden attributes using an adversary model. Assume that each example $x_i$ has a target label $y_i^{tar}$ and a hidden label $y_i^{hid}$. The adversary trains the model $M_a$ with $\{(z'_i, y_i^{hid})\}$, where $z'_i$ is the same feature vector that the server also receives to do inference of the target label. Note that the training of $M_a$ is used as a post hoc process to evaluate the leakage of sensitive attributes and does not influence the client or server’s processing. We refer to the accuracy of $M_s$ on $y^{tar}$ as target accuracy and the accuracy of $M_a$ on $y^{hid}$ as attack accuracy.
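
The evaluation protocol can be illustrated on synthetic data. In the toy sketch below, everything is an assumption made for illustration: the two-dimensional features, the server layer that reads only the first coordinate, and the nearest-centroid classifier that stands in for the neural adversary and server models. It only shows how target and attack accuracy are measured on the same transmitted features, before and after removing the null content.

```python
import numpy as np

def nearest_centroid_accuracy(train_x, train_y, test_x, test_y):
    """Fit one centroid per class on the training split and score the test split."""
    classes = np.unique(train_y)
    centroids = np.stack([train_x[train_y == c].mean(axis=0) for c in classes])
    dists = np.linalg.norm(test_x[:, None, :] - centroids[None, :, :], axis=2)
    return float(np.mean(classes[np.argmin(dists, axis=1)] == test_y))

rng = np.random.default_rng(0)
num = 4000
y_tar = rng.integers(0, 2, size=num)            # synthetic target labels
y_hid = rng.integers(0, 2, size=num)            # synthetic hidden labels, independent of y_tar
# Toy features: coordinate 0 encodes the target, coordinate 1 encodes the hidden attribute.
z = np.stack([2 * y_tar - 1, 2 * y_hid - 1], axis=1) + 0.3 * rng.normal(size=(num, 2))
W = np.array([[1.0, 0.0]])                      # toy server layer that reads only coordinate 0

_, s, Vt = np.linalg.svd(W, full_matrices=True)
V_signal = Vt[: np.sum(s > 1e-10)].T            # basis of the signal content
z_obf = z @ V_signal @ V_signal.T               # keep the signal content, drop the null content

half = num // 2
attack_before = nearest_centroid_accuracy(z[:half], y_hid[:half], z[half:], y_hid[half:])
attack_after = nearest_centroid_accuracy(z_obf[:half], y_hid[:half], z_obf[half:], y_hid[half:])
target_after = nearest_centroid_accuracy(z_obf[:half], y_tar[:half], z_obf[half:], y_tar[half:])
```

In this toy setting the hidden attribute lives entirely in the null space of the server layer, so the attack accuracy falls to chance level while the target accuracy is preserved. Real features will rarely separate this cleanly.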

5.2 Evaluations

Cumulative signal content. We start our analysis by computing the norm of the null and signal contents in every layer of the model. At each layer, the null content is discarded and the signal content is passed through the next layer. We compute the normalized amount of the information passed from the $i$-th layer to the next as $C_i = \|z_S^{(i)}\| / \|z^{(i)}\|$, where $z^{(i)}$ and $z_S^{(i)}$ are the activation vector and its signal content at the $i$-th layer, respectively. Figure 2 shows the cumulative amount of the signal content preserved up to the $i$-th layer, computed as $\prod_{j \le i} C_j$. The plot suggests that the model gradually removes the content irrelevant to the target label from one layer to the next, thus acting as an obfuscator.

Figure 2: Cumulative preserved signal content at different layers of the model. The model acts as an obfuscator, gradually removing the content irrelevant to the target label from one layer to the next.
Figure 3: Attack accuracy versus the split layer for three settings: (1) baseline network without data obfuscation, (2) distortion-free obfuscation, and (3) distortion-bounded obfuscation with a small, bounded drop in target accuracy.
Figure 4: Tradeoff between utility and obfuscation for the UTKFace dataset, when the network is split at the input of the FC1 layer. As we increase the number of preserved features ($m'$), utility is increased (lower target error) but obfuscation is decreased (lower attack error).
Figure 5: Comparison between our method and feature pruning. The horizontal and vertical axes show the classification error on $y^{tar}$ and $y^{hid}$, respectively. Our goal is to have low error on $y^{tar}$ and high error on $y^{hid}$. In most cases, our method significantly outperforms feature pruning, i.e., with the same utility (target error), our method achieves better obfuscation (higher attack error). For CelebA, the sensitive attribute in this study is “gender”.

Investigating tradeoffs. Figure 3 shows the attack accuracy versus the split layer for three settings: (1) without obfuscation, (2) with distortion-free obfuscation (i.e., when only the null content is removed), and (3) with distortion-bounded obfuscation such that the drop in target accuracy stays within a small bound. The results illustrate the tradeoffs between edge computation, obfuscation, and target accuracy, described in the following.

  • In all three cases, the attack accuracy is reduced as the network is split at deeper layers, i.e., when the edge device performs more computation. This observation indicates that more edge computation results in lower attack accuracy at the same target accuracy.

  • For each split layer, the attack accuracy of distortion-free obfuscation is less than that of baseline. This observation shows that even without decreasing the target accuracy, the feature vector can be modified to obtain a better obfuscation.

  • For each split layer, the distortion-bounded obfuscation further reduces the attack accuracy at the cost of a small, bounded reduction in target accuracy. For the same split layer (same edge compute), the number of preserved features, $m'$, provides a tradeoff between the target accuracy and obfuscation. We show an example of this effect in Figure 4.

Comparison to pruning. Similar to our approach, pruning network weights eliminates features that do not contribute to the classification done by $M_s$. Since pruning does not require access to the sensitive labels, it is a natural unsupervised obfuscation baseline. In the following, we compare our method with pruning in terms of target and attack accuracy. We split the network at the middle layer. For our method, we keep the top $m'$ components of $z$ from its signal content and filter out the rest. For pruning, we keep $m'$ elements in $z$ and set the rest to zero. We adopt the pruning algorithm proposed by (li2016pruning), which ranks features by the norm of the columns of the following layer’s weight matrix. After pruning, we fine-tune the model to improve its accuracy. By changing $m'$, we obtain a tradeoff between attack and target error for both pruning and our method, and show the resulting tradeoff in Figure 5. In this figure, points with higher attack error and lower target error are desired. As seen, our method achieves a better tradeoff than pruning. This is a consequence of minimizing the distortion to the model output while maximizing the amount of removed “unrelated” information in our method.
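
One way to see why truncating along singular vectors distorts the next layer less than zeroing individual features is to compare the worst-case first-layer distortion of the two operations at the same number of kept components. The sketch below is a simplified stand-in, not the experiment reported here: it uses a random weight matrix, ranks features by the L1 norm of their weight columns (an assumed reading of the pruning criterion), performs no fine-tuning, and measures distortion rather than attack accuracy.

```python
import numpy as np

def prune_by_column_norm(W, m_keep):
    """Diagonal mask keeping the m_keep input features with the largest column L1 norm."""
    keep = np.argsort(np.abs(W).sum(axis=0))[-m_keep:]
    mask = np.zeros(W.shape[1])
    mask[keep] = 1.0
    return np.diag(mask)

def project_on_top_singular_vectors(W, m_keep):
    """Projector onto the m_keep right singular vectors with the largest singular values."""
    _, _, Vt = np.linalg.svd(W, full_matrices=True)
    V_top = Vt[:m_keep].T
    return V_top @ V_top.T

def worst_case_distortion(W, P):
    """Largest ||W z - W P z|| over unit-norm z, i.e. the spectral norm of W (I - P)."""
    return np.linalg.norm(W @ (np.eye(W.shape[1]) - P), ord=2)

rng = np.random.default_rng(0)
W_demo = rng.normal(size=(32, 64))             # hypothetical layer weights
d_prune = worst_case_distortion(W_demo, prune_by_column_norm(W_demo, 8))
d_svd = worst_case_distortion(W_demo, project_on_top_singular_vectors(W_demo, 8))
```

For the SVD projector the worst-case distortion equals the largest discarded singular value, which is the smallest value any operation keeping 8 dimensions can achieve, so coordinate pruning can only match or exceed it.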

Comparison with adversarial training. Among prior work in supervised data obfuscation, adversarial training is the natural baseline. We compare our method to adversarial training, which generally provides the best tradeoff of accuracy and obfuscation compared to other methods such as information bottleneck (osia2018deep; moyer2018invariant) and noise injection (mireshghallah2020shredder). We implemented the adversarial training framework proposed by (feutry2018learning) and trained the models in multiple settings with different values of the tradeoff parameter (Eq. 3) to achieve the best performance. Figure 6 shows the tradeoff between utility and obfuscation by measuring how much the attack error increases when target error is increased by a given value. Note that, unlike our method, adversarial training assumes the hidden attribute is known at training time. Despite this, Figure 6 shows that, in most cases, our method significantly outperforms adversarial training, i.e., our method achieves better obfuscation (higher attack error) with the same utility (target error).

Figure 6: Comparison between our method and adversarial training. The horizontal and vertical axes show the classification error on $y^{tar}$ and $y^{hid}$, respectively. Our goal is to have low error on $y^{tar}$ and high error on $y^{hid}$. In most cases, our method significantly outperforms adversarial training, i.e., with the same utility (target error), our method achieves better obfuscation (higher attack error). For CelebA, the sensitive attribute in this study is “gender”.
Figure 7: Accuracy on the target (Smiling) and hidden attributes. Our method obfuscates the feature vector without the knowledge of hidden attributes at training or inference times. Adversarial training (AT) method maximizes the accuracy on Smiling, while minimizing the accuracy on Gender. As seen, AT successfully reduces accuracy on Gender attribute but, unlike our method, fails to obfuscate information of other attributes. This highlights the applicability of our method in practical settings as a generic obfuscator compared to specialized techniques such as AT.

We also run experiments with multiple (unseen) hidden labels. Specifically, we consider the CelebA model trained to detect Smiling and evaluate two methods: (1) our method, where we keep only the top $m'$ components of the signal content of the feature vector and then train one separate adversary model per hidden attribute, and (2) adversarial training, where we first adversarially train the model to obfuscate Gender, and then train one separate adversary model to predict each hidden attribute.

For both methods, the network is split at the same layer. As shown in Figure 7, our method outperforms adversarial training in both the target and attack accuracy. Specifically, our method results in a significantly lower attack accuracy on all hidden attributes compared to the baseline attack accuracy. The only exceptions are the High_Cheekbones and Mouth_Open attributes, which are highly correlated with the target attribute (a smiling person is likely to have high cheekbones and an open mouth). The correlation between target and hidden attributes causes the signal content of the server and adversary models to have large overlaps and, hence, results in high attack accuracy. Also, as seen, the adversarially trained model successfully hides the information that it has been trained to obfuscate (Gender). The model, however, fails to remove information about other attributes such as Makeup or Lipstick. The results highlight the importance of generic unsupervised obfuscation in scenarios where the sensitive attributes are not known. In such cases, unlike supervised obfuscation methods, our method successfully reduces the information leakage.

6 Conclusion

We proposed an obfuscation method for split edge-server inference of neural networks. We formulated the problem as an optimization problem based on minimizing the mutual information between the obfuscated and original features, under a distortion constraint on the model output. We derived an analytic solution for the class of linear operations on feature vectors. The obfuscation method is unsupervised with respect to sensitive attributes, i.e., it does not require the knowledge of sensitive attributes at training or inference phases. By measuring the information leakage using an adversary model, we empirically supported the effectiveness of our method when applied to models trained on various datasets. We also showed that our method outperforms existing techniques by achieving better tradeoffs between accuracy and obfuscation.