Lightweight and Unobtrusive Privacy Preservation for Remote Inference via Edge Data Obfuscation

12/20/2019 ∙ by Dixing Xu, et al. ∙ Zhejiang University, Xi'an Jiaotong-Liverpool University, Nanyang Technological University

The growing momentum of instrumenting the Internet of Things (IoT) with advanced machine learning techniques such as deep neural networks (DNNs) faces two practical challenges of limited compute power of edge devices and the need of protecting the confidentiality of the DNNs. The remote inference scheme that executes the DNNs on the server-class or cloud backend can address the above two challenges. However, it brings the concern of leaking the privacy of the IoT devices' users to the curious backend since the user-generated/related data is to be transmitted to the backend. This work develops a lightweight and unobtrusive approach to obfuscate the data before being transmitted to the backend for remote inference. In this approach, the edge device only needs to execute a small-scale neural network, incurring light compute overhead. Moreover, the edge device does not need to inform the backend on whether the data is obfuscated, making the protection unobtrusive. We apply the approach to three case studies of free spoken digit recognition, handwritten digit recognition, and American sign language recognition. The evaluation results obtained from the case studies show that our approach prevents the backend from obtaining the raw forms of the inference data while maintaining the DNN's inference accuracy at the backend.


I Introduction

The fast development of sensing and communication technologies and the wide deployment of Internet-enabled smart objects in physical environments foster the formation of the Internet of Things (IoT) as a major data generation infrastructure in the world. The tremendous amount of IoT data provides great opportunities for various applications powered by advanced machine learning (ML) technologies.

IoT is by nature a distributed system consisting of nodes equipped with sensing, computing, and communication capabilities. Edge computing is a promising hierarchical system paradigm for building scalable and efficient applications on top of IoT. In edge computing, widespread network edge devices (e.g., home gateways, set-top boxes, and personal smartphones) collect and process the data from the end devices, which are normally smart objects deeply embedded in physical environments (e.g., smart toothbrushes, smart body scales, smart wearables, and various embedded sensors). The edge devices then interact with the cloud backends of the applications to exchange processed data summaries and/or commands. Thus, by deploying certain data processing tasks at the Internet edge, the communication bandwidth usage can be reduced and the scalability of the IoT applications can be improved.

However, the implementation of the IoT edge that can leverage the latest ML technologies faces two challenges:

  • Separation of data sources and ML compute power:

    With the advances of deep learning, the depth of inference models and the compute power needed to support them have increased drastically. Thus, executing these deep inference models on IoT end or edge devices with limited compute resources may be infeasible or lead to excessively long inference times. Moreover, executing deep inference models on battery-powered edge devices (e.g., smartphones) may be undesirable due to high power consumption. A remote server-class or cloud backend with abundant ML compute power, including powerful hardware acceleration, is still desired for deep inference model execution.

  • Confidentiality of inference models: A deployable inference model often requires significant effort in model training and manual tuning. Thus, an inference model in general constitutes intellectual property in enterprise settings. Even when the edge devices can execute the model and meet timing/energy constraints, deploying the inference model to edge devices in the wild may lead to the risk of intellectual property infringement (e.g., extraction of the model from the edge device memory). Moreover, leakage of the inference model can aggravate the cybersecurity concern of adversarial examples [14]. Therefore, it is desirable to protect the confidentiality of the deep inference models.

To address the above two issues, remote inference is a natural solution, in which an edge device sends the inference data to the backend; the backend then executes the inference model and sends back the result. Existing applications already adopt remote inference. PictureThis [32], a mobile app, captures a picture of a plant using the smartphone's camera and then sends the picture to the cloud backend that runs an inference model to identify the plant. Amazon Alexa, a voice assistant, processes captured voices locally and also transmits the voice recordings to the cloud backend for further analysis and storage [2, 41]. However, remote inference inevitably incurs privacy concerns, especially when the inference data is collected in the user's private space and time, such as voice recordings in households [41]. The pictures for plant recognition may also be misused by the curious cloud backend to infer the users' locations based on the backgrounds of the pictures. In particular, the lack of privacy protection in remote inference may run against recent legislation such as the General Data Protection Regulation in the European Union.

Therefore, privacy preservation mechanisms are needed for remote inference. To this end, CryptoNets [13] has been proposed to homomorphically encrypt the inference data, perform inference on the encrypted data, and generate encrypted results. While CryptoNets provides strong protection of the confidentiality of the inference data, it incurs significant compute overhead on the edge devices [20]. Specifically, the homomorphic encryption of a grayscale image takes about ten minutes on a Raspberry Pi 2 Model B single-board computer with a quad-core ARM CPU. Differently, in this paper, we aim to design a lightweight data obfuscation approach suitable for resource-constrained edge devices to protect inference data privacy in the remote inference scheme. With the lightweight approach, the edge device spends little time and energy to obfuscate the inference data before transmitting it to the backend. Moreover, we aim to achieve another feature of unobtrusiveness, in that i) the inference model at the backend admits both original and obfuscated inference data, and ii) the edge device does not need to indicate whether obfuscation is applied. The unobtrusiveness feature provides three advantages. First, the system is backward-compatible with old edge devices that cannot be upgraded to perform the data obfuscation. Second, the edge device can easily choose to opt into or out of data obfuscation given its run-time computation and battery lifetime statuses. Third, omitting the obfuscation indication helps improve privacy protection.

In this paper, we present ObfNet, an approach to realize lightweight and unobtrusive data obfuscation at the IoT edge for remote inference. ObfNet is a small-scale neural network that can run on resource-constrained edge devices and introduces light compute overhead. ObfNet's sophisticated, many-to-one non-linear mapping from the input vector to the output vector offers a form of data obfuscation that can well protect the confidentiality of the raw forms of the input data. To achieve unobtrusiveness, we design a training procedure for ObfNet as follows. We assume that the backend has an in-service deep inference model (referred to as InfNet). The backend concatenates an untrained ObfNet with the InfNet and then trains the concatenated model using the training dataset that was used to train InfNet. During the training, only the weights of ObfNet are updated by backpropagation until convergence. The backend repeats the above procedure to generate sets of distinct ObfNets and transmits a unique set to each of the edge devices. Then, each edge device chooses an ObfNet randomly and dynamically from the received set and uses it for obfuscating the data for remote inference.

We evaluate the ObfNet approach by three case studies of 1) free spoken digit (FSD) recognition, 2) MNIST handwritten digit recognition, and 3) American sign language (ASL) recognition. The case studies show the effectiveness of ObfNet in protecting the confidentiality of the raw forms of the inference data while preserving the accuracy of the remote inference. Specifically, the obfuscated samples are unrecognizable auditorily by invited volunteers for FSD and visually for MNIST and ASL, while the obfuscation causes inference accuracy drops of generally within 1% from the original inference accuracy of about 99%. We also benchmark the ObfNet approach on a testbed consisting of i) a Coral development board equipped with Google’s edge tensor processing unit (TPU) that acts as an edge device and ii) an NVIDIA Jetson AGX Xavier equipped with a Volta graphics processing unit (GPU) that acts as the backend. Measurements on the testbed show the effectiveness of ObfNet and the advantage of remote inference in terms of processing times.

The remainder of this paper is organized as follows. Section II reviews related work. Section III states the problem and overviews our approach. Section IV presents performance evaluation via three case studies. Section V presents benchmark results on the testbed. Section VI concludes this paper.

II Related Work

This section provides a brief taxonomy of existing privacy-preserving ML approaches that are categorized into privacy-preserving training and privacy-preserving inference approaches, as illustrated in Fig. 1. The nodes in a privacy-preserving ML system often have two roles of participant and coordinator. In the context of this paper, an edge device is a participant and the backend is the coordinator.

[Figure: taxonomy of privacy-preserving ML. Privacy-preserving training: distributed machine learning [17, 37, 27, 6, 45, 3]; training data obfuscation via additive perturbation [9, 28, 34] or multiplicative projection [26, 20, 36]; training data encryption [16, 44, 33]. Privacy-preserving inference: CryptoNets [13]; partitioned DNNs [31, 42]; ObfNet (this work).]
Fig. 1: A taxonomy of privacy-preserving ML approaches.

In a privacy-preserving training process orchestrated by the coordinator, the participants collaboratively train a global model from their disjoint training datasets while the privacy of the training datasets is preserved. Distributed machine learning (DML) [17, 37, 27, 6, 45, 3] is a typical scheme of this category, in which only model weights are exchanged among the nodes. However, the local model training and the iterative weight exchanges are compute- and communication-intensive. If the training data samples are to be transmitted to the coordinator, they can be obfuscated or encrypted for data privacy protection. Obfuscation is often achieved via additive perturbation and multiplicative projection. Additive perturbation implemented via Laplacian [9], exponential [28], and median [34] mechanisms can provide differential privacy [10]. Multiplicative projection [26, 20, 36] protects the confidentiality of the raw forms of the original data. In [26, 20], the participants use distinct secret projection matrices, so the Euclidean distances among the projected data samples are no longer preserved; this can degrade the performance of distance-based ML algorithms. To address this issue, in [26], the participants project a number of public data vectors and return the results to the coordinator, which learns a regression function to preserve Euclidean distances. In [20], deep neural networks (DNNs) are used to learn the sophisticated patterns of the projected data from multiple participants. ML can also be performed on homomorphically encrypted data samples [7, 16, 44, 33]. However, homomorphic encryption incurs high compute overhead (millions of times higher than multiplicative projection [20]) and data swelling.
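As a concrete illustration of multiplicative projection, the following NumPy sketch (with illustrative dimensions, not taken from [26, 20]) shows a participant projecting its data with a secret random matrix: the coordinator sees only the projected vectors, and Euclidean distances are generally not preserved.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative dimensions; each participant keeps its own secret
# projection matrix R, and the coordinator receives only R @ x.
d, k = 16, 12
R = rng.normal(size=(k, d))                 # secret, per participant

x1 = rng.normal(size=d)
x2 = rng.normal(size=d)
p1, p2 = R @ x1, R @ x2                     # what the coordinator sees

# With distinct, non-orthonormal projections, Euclidean distances are
# generally not preserved, which hurts distance-based ML algorithms.
orig_dist = np.linalg.norm(x1 - x2)
proj_dist = np.linalg.norm(p1 - p2)
```

Because each participant's `R` is distinct and secret, the coordinator cannot align projected samples from different participants, which is exactly the issue that the regression function of [26] and the DNNs of [20] work around.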

In privacy-preserving remote inference, the participants transmit unlabeled data samples to the coordinator for inference, while the participants' privacy in the inference data should be preserved. The proposed ObfNet is a privacy-preserving remote inference approach. We now review the existing privacy-preserving remote inference approaches, including CryptoNets [13] and partitioned DNN approaches [31, 42]. CryptoNets [13] adjusts a feed-forward neural network trained with plaintext data such that it can be applied to homomorphically encrypted data to make encrypted inferences. However, the high compute overhead of homomorphic encryption renders CryptoNets impractical for edge devices. Moreover, the neural network of CryptoNets needs to use square polynomials as the activation functions, which are rare in existing neural networks that often adopt the sigmoid function or rectified linear unit (ReLU).

In [31, 42], DNN partition approaches are proposed for privacy-preserving remote inference. Specifically, a trained DNN is split into two parts. The first part, which can be considered a feature extractor, is executed by the participant, while the second part (i.e., the inference model) is executed by the coordinator. For privacy protection, various alterations are applied to the feature vector extracted by the participant, including dimension reduction and Siamese fine-tuning in [31], and nullification and additive noisification for differential privacy in [42]. The inference model is retrained using the altered feature vectors of the training data samples. A major limitation of the DNN partition approach in [42] is that a single feature extractor is shared by all participants. This renders the system vulnerable to collusion between any single participant and the curious coordinator, because the coordinator may reconstruct other participants' original inference data samples once it obtains the feature vector alteration mechanism. Moreover, the participants cannot choose to opt out of the privacy protection, whereas our ObfNet approach allows the participants to opt in or out freely. The feature extractor in [31] consists of 11 to 13 convolutional layers, which incur considerable compute overhead on edge devices.

From the above review, training data obfuscation implemented via additive perturbation or multiplicative projection is a lightweight privacy-preserving training approach that can be suitable for resource-constrained IoT edge and even end devices. In contrast, lightweight privacy-preserving inference has received limited research attention. In particular, as IoT applications may prefer to use pre-trained deep InfNets, the development of a lightweight privacy-preserving inference approach that can adopt pre-trained InfNets is meaningful. Moreover, it is desirable if the approach introduces privacy preservation unobtrusively such that no modifications are needed for legacy edge devices and backends that were designed with no privacy preservation considerations. To achieve these goals, in this paper, we design and present ObfNet.

III Problem Statement and Approach Overview

In this section, we state the privacy preservation problem in remote inference systems (Section III-A) and then present an overview of the proposed ObfNet approach (Section III-B).

III-A Problem Statement

We consider a remote inference system that consists of multiple resource-constrained edge devices and a resourceful backend. The backend can be a server program in the cloud. The edge devices send the inference data samples to the backend for inference. The backend executes a pre-trained inference neural network (InfNet) on the inference data samples. If the edge devices require the inference results, the backend sends the results to the edge devices. This remote inference scheme is advantageous when the heavyweight InfNet incurs excessively long execution times or is infeasible on the resource-constrained edge devices.

Remote inference leads to privacy concerns if the inference data samples are privacy-sensitive. In particular, the inference data samples may contain private information beyond the scope of the inference application. Therefore, in this paper, we aim to protect the confidentiality of the raw form of each inference data sample. Data form confidentiality is an immediate and basic privacy requirement in many applications. In the experiments conducted in this paper (cf. Section IV), we use humans' ability to interpret the protected inference data samples as a measure of privacy preservation. The inference results generated by the backend may also contain information about the corresponding edge devices. However, in this paper, we do not consider the privacy contained in the inference results, since the edge devices should have no expectation of such privacy if they are willing to join the remote inference system.

Remote inference has two major privacy threats:

  • Honest-but-curious backend. The backend follows the privacy preservation mechanism described in Section III-B to honestly serve the edge devices. It does not intend to tamper with any data exchanged with the edge devices. However, the backend is curious about the edge devices’ private information contained in the inference data, since the backend may benefit from the private information irrelevant to the objective of the inference application. For example, the backend may misuse the extracted private information for unauthorized purposes, e.g., targeted advertisement and political advocacy [11].

  • Potential collusion between edge devices and the backend. We assume that the edge devices are not trustworthy in that they may collude with the backend in finding out other edge devices’ privacy contained in the inference data. The colluding participants are also honest, i.e., they will faithfully transmit their inference data with or without obfuscation. We aim to maintain the privacy protection for an edge device when any or all other edge devices are colluding with the backend.

III-B Approach Overview

To address the privacy threats discussed in Section III-A, in this paper, we propose an obfuscation neural network (ObfNet) approach to obfuscate the inference data sample before being transmitted to the backend. In particular, the design of ObfNet aims to provide two properties of light weight and unobtrusiveness as discussed in Section I.

Fig. 2: ObfNet for remote inference. An edge device that desires privacy protection applies ObfNet to obfuscate its inference data sample before transmission; an edge device that does not desire privacy protection directly transmits the original inference data sample to the backend. The backend feeds both the obfuscated and the original samples to the pre-trained inference model InfNet to generate the results.

ObfNet is a small-scale neural network executed on the edge device to obfuscate the inference data samples. In our proposed approach, the backend generates multiple sets of ObfNets by following an approach detailed in the next paragraph and then transmits a unique set to each of the edge devices. An edge device that wishes to obfuscate the inference data chooses one ObfNet from the received set and feeds the inference data to the chosen ObfNet. Then, the edge device transmits the output of the ObfNet, i.e., the obfuscated inference data, to the backend for inference. The old edge devices that cannot be upgraded to perform the data obfuscation and the edge devices that do not wish to obfuscate the inference data can transmit the original inference data to the backend for inference. The backend executes the InfNet on the received inference data and sends back the inference result to the edge device. Existing cryptographic approaches can be applied to i) protect the confidentiality and integrity of the data exchanged between the edge devices and the backend and ii) authenticate the edge devices and the backend. Fig. 2 illustrates the remote inference system where each edge device can choose to opt into or out of the ObfNet-based privacy preservation.
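The device-side decision described above can be sketched as follows; this is a hypothetical helper illustrating the opt-in/opt-out flow, not the paper's implementation:

```python
import random

def prepare_sample(sample, obfnet_set, obfuscate=True):
    """Edge-device side of the scheme (hypothetical helper): either
    obfuscate the sample with a randomly chosen ObfNet from the set
    received from the backend, or send it as-is. No flag indicating
    the choice is attached, which keeps the scheme unobtrusive."""
    if not obfuscate or not obfnet_set:
        return sample                      # opt out / legacy device
    obfnet = random.choice(obfnet_set)     # random, dynamic choice
    return obfnet(sample)
```

Because the backend's InfNet accepts both forms, the return value is transmitted unchanged in either branch, and the backend cannot tell from the payload alone which branch was taken.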

Fig. 3: The procedure to generate ObfNets.

Now, we present the approach to generating the sets of ObfNets at the backend. Note that the ObfNets in any set are distinct and all sets are also distinct (i.e., any two sets do not share an identical ObfNet). Fig. 3 illustrates the approach. It has two steps as follows.

  • ObfNet design.

    The system designer designs a small-scale and application-specific neural network architecture for ObfNet. The input to ObfNet is the original inference data sample. The output of ObfNet is the obfuscated inference data sample. Note that there is no rule of thumb to design ObfNet’s architecture; similar to the design of DNNs for specific applications, the design of ObfNet also follows a trial-and-error approach using the validation results of the training process as the feedback (the training of ObfNet will be presented shortly). The designer should try to reduce the scale of ObfNet to make it affordable to resource-constrained edge devices. Moreover, the ObfNet design should meet the following requirements. First, to be unobtrusive, the dimensions of the input and output should be identical. Second, ObfNet should adopt many-to-one non-linear mapping activation functions (e.g., ReLU) to prevent the backend from estimating the exact original inference data from the obfuscated one.

  • ObfNet training.

    First, the backend initializes the weights of an ObfNet with random numbers. Then, the backend concatenates the ObfNet with the InfNet, forming a concatenated DNN, where the output of ObfNet is used as the input to InfNet. The backend trains the concatenated DNN using the training dataset that was previously used to train InfNet. During the backpropagation stage of each training epoch, the loss is backpropagated normally. However, only the weights of ObfNet are updated, while the weights of InfNet are fixed. When the training of the concatenated DNN converges, the backend retrieves the trained ObfNet from the concatenated DNN. By repeating the above procedure, the backend generates multiple distinct sets of distinct ObfNets. Note that due to the randomization of ObfNet’s initial weights and the randomization techniques (e.g., training data sampling) during the training phase, the trained ObfNets are distinct. The backend can determine the cardinality of each set according to the available storage volume of the corresponding edge device that desires data obfuscation. Finally, the backend transmits the set to the edge device.
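The training step above can be sketched in Keras as follows. The layer widths, input dimension, and placeholder data are illustrative assumptions, but the key mechanics match the described procedure: InfNet is frozen, and backpropagation through the concatenation updates only ObfNet's weights.

```python
import numpy as np
from tensorflow import keras

# Illustrative sizes only; the real MFCC input dimensions and layer
# widths are application-specific.
INPUT_DIM, NUM_CLASSES = 64, 10

# Stand-in for the backend's pre-trained InfNet.
infnet = keras.Sequential([
    keras.layers.Input(shape=(INPUT_DIM,)),
    keras.layers.Dense(128, activation="relu"),
    keras.layers.Dense(NUM_CLASSES, activation="softmax"),
])
infnet.trainable = False  # freeze InfNet: only ObfNet weights are updated

# Small-scale ObfNet whose output dimension equals its input dimension,
# so InfNet accepts both original and obfuscated samples.
obfnet = keras.Sequential([
    keras.layers.Input(shape=(INPUT_DIM,)),
    keras.layers.Dense(32, activation="relu"),
    keras.layers.Dense(INPUT_DIM, activation="relu"),
])

# Concatenate ObfNet and InfNet and train on the dataset that was
# originally used to train InfNet (random placeholders here).
concat = keras.Sequential([obfnet, infnet])
concat.compile(optimizer="adadelta", loss="sparse_categorical_crossentropy")

x = np.random.rand(32, INPUT_DIM).astype("float32")
y = np.random.randint(0, NUM_CLASSES, size=(32,))

before = [w.numpy().copy() for w in infnet.weights]
concat.fit(x, y, epochs=1, verbose=0)
after = [w.numpy() for w in infnet.weights]
# InfNet weights are unchanged after the training step; only ObfNet
# was trained.
```

Repeating this with fresh random initializations of ObfNet yields the distinct ObfNets that populate each edge device's set.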

We have a few remarks regarding the ObfNet approach. First, since InfNet is not changed during the training of ObfNet, the InfNet can classify both the original and the obfuscated inference data samples. The execution of InfNet does not require any indication of whether the input inference data sample is obfuscated. Thus, the unobtrusive requirement is achieved. Second, as the edge devices use distinct ObfNets during remote inference, the collusion between any/all other edge devices with the backend (i.e., the colluding edge devices let the backend know which ObfNets they use) will not affect the non-colluding edge devices. Third, as the ObfNet uses many-to-one non-linear activation functions, it is highly difficult (virtually impossible) for the backend to estimate the exact original inference data sample from the obfuscated one. Moreover, as each non-colluding edge device selects an ObfNet from its received set randomly and dynamically for obfuscation, the difficulty for the backend’s inverse attempt is strengthened due to the introduced uncertainty.
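The irreversibility argument rests on ReLU being many-to-one, which a short NumPy check makes concrete: distinct inputs that differ only in their negative components collapse to identical outputs, so the mapping cannot be inverted uniquely.

```python
import numpy as np

def relu(x):
    # Many-to-one: every negative input maps to the same output, 0.
    return np.maximum(x, 0.0)

a = np.array([-1.0, 2.0, -3.0])
b = np.array([-5.0, 2.0, -0.1])
# a != b, yet relu(a) == relu(b) == [0., 2., 0.]: the original vector
# cannot be uniquely recovered from the activation output.
```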

IV Case Studies

In this section, we present the applications of ObfNet to three case studies. For each case study, we present the data preparation, the architectures of the InfNet and the ObfNet, an evaluation of the impact of ObfNet on inference accuracy, and an assessment of the quality of obfuscation. The InfNets and ObfNets are implemented in Python based on the TensorFlow library [40].

IV-A Case Study 1: Free Spoken Digit (FSD) Recognition

Our first case study concerns human voice recognition. Recently, voice recognition has been integrated into various edge systems such as smartphones and voice assistants found in households and cars. In many scenarios, voice recordings are privacy sensitive. Thus, it is desirable to obfuscate the voice data for privacy protection, while preserving the performance of voice recognition. In this section, we apply the ObfNet approach to FSD recognition, which can be viewed as a minimal voice recognition task. Using this minimal task as a case study brings the advantage of easy exposition of the results and the associated insights.

IV-A1 Data preparation

We use the FSD dataset [19, 18], which consists of 2,000 WAV recordings of spoken digits from 0 to 9 in English. We split the data as 80% for training, 10% for validation, and 10% for testing. We extract the mel-frequency cepstral coefficients (MFCC) [5] as the features to represent a segment of audio signal. According to [5], MFCC is empirically shown to well represent the pertinent aspects of the short-term speech spectrum and forms a particularly compact representation compared with other features such as linear frequency cepstrum coefficients (LFCC), reflection coefficients (RC), and cepstrum coefficients derived from the linear prediction coefficients (LPCC). As the recordings are of different lengths, we apply constant padding to unify the number of MFCC feature vectors for each recording. As a result, the extracted MFCC feature vectors over time for each recording form a 2-dimensional image. Both the InfNet and the ObfNet take this image as the input.
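The constant-padding step can be sketched with NumPy; the number of MFCC coefficients and the target frame count below are illustrative values, not the paper's settings:

```python
import numpy as np

def pad_mfcc(mfcc, max_frames, value=0.0):
    """Constant-pad (or truncate) an (n_mfcc x frames) MFCC matrix to
    a fixed number of frames so every recording maps to an equally
    sized 2-D image."""
    n_mfcc, n_frames = mfcc.shape
    if n_frames >= max_frames:
        return mfcc[:, :max_frames]        # truncate long recordings
    pad = max_frames - n_frames
    return np.pad(mfcc, ((0, 0), (0, pad)), constant_values=value)

fixed = pad_mfcc(np.ones((13, 40)), 50)    # padded up to 50 frames
```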

IV-A2 Architecture of InfNet

Multilayer perceptron (MLP) and convolutional neural network (CNN) are two types of DNNs widely adopted for speech recognition and image classification [1, 22, 35]. An MLP consists of multiple fully-connected layers (or dense layers); each neuron in any hidden layer is connected to all the neurons in the previous layer. CNN incorporates the features of shared weights, local receptive fields, and spatial subsampling to ensure shift invariance [22, 23]. In this case study, we design a CNN-based InfNet and an MLP-based InfNet. Their details are as follows.

  • The CNN-based InfNet consists of three convolutional layers, one max-pooling layer, and three dense layers. Zero padding is applied to the input image in the convolutional layers and the max-pooling layer. ReLU activation is applied to the output of every convolutional and dense layer except for the last layer; ReLU rectifies a negative input to zero. The last dense layer has 10 neurons with a softmax activation function corresponding to the 10 classes of FSD. Three dropout layers with dropout rates of 0.25, 0.1, and 0.25 are applied after the max-pooling layer and in the first two dense layers; that is, 25%, 10%, and 25% of the neurons are abandoned randomly during the training process. Dropout is a regularization approach for neural networks that helps reduce interdependent learning among the neurons and is widely used during model training to avoid overfitting [38]. Fig. 4 shows the structure of the CNN-based InfNet, which has about 1.1 million parameters in total.

  • The MLP-based InfNet has five dense layers. ReLU activation is applied to the output of every hidden layer. The last dense layer has 10 neurons with a softmax activation function. To prevent overfitting, four dropout layers are applied after the hidden layers. Fig. 5 shows the structure of the MLP-based InfNet, which has about one million parameters in total.

[Figure: MFCC representation → conv layers (32, 48, 64 conv filters), max-pooling, dropout → dense layers (128, 64, 10 neurons) with dropout → softmax → classification result]
Fig. 4: Structure of the CNN-based InfNet for FSD recognition.

[Figure: MFCC representation → dense layers (800, 300, 128, 64, 10 neurons) with dropout after each hidden layer → softmax → classification result]
Fig. 5: Structure of the MLP-based InfNet for FSD recognition.
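Based on the layer widths in Fig. 5, the MLP-based InfNet can be sketched in Keras as follows; the flattened MFCC input dimension and the dropout rate are assumed placeholders, since their exact values are not restated here:

```python
from tensorflow import keras

def make_mlp_infnet(input_dim=1024):
    # input_dim is a hypothetical placeholder for the flattened MFCC
    # image; the 0.25 dropout rate is likewise an assumed value.
    model = keras.Sequential([keras.layers.Input(shape=(input_dim,))])
    for width in (800, 300, 128, 64):          # hidden layers per Fig. 5
        model.add(keras.layers.Dense(width, activation="relu"))
        model.add(keras.layers.Dropout(0.25))  # dropout after each hidden layer
    model.add(keras.layers.Dense(10, activation="softmax"))
    return model
```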

IV-A3 Architecture of ObfNet

Similar to the InfNets, we design a CNN-based ObfNet and an MLP-based ObfNet. Their details are as follows.

  • The CNN-based ObfNet consists of two convolutional layers, one max-pooling layer, and one dense layer as the output layer. The first convolutional layer filters the input image with three output filters; the second convolutional layer applies five output filters. All convolutional filters use a stride of one pixel. Batch normalization follows both convolutional layers, which is expected to mitigate the problem of internal covariate shift and improve model performance. A max-pooling layer with a stride of two is then used to reduce the data dimensionality for computational efficiency. Zero padding is added in each convolutional layer and the max-pooling layer to ensure that the filtered image has the same dimensions as the layer input. The dense layer with 900 neurons is connected after flattening the output of the max-pooling layer. ReLU activation is applied to the output of every convolutional and dense layer; this introduces the many-to-one mapping needed in our scheme, as discussed in Section III-B. Two dropout layers with dropout rates of 0.25 and 0.15 are applied after the max-pooling layer and in the dense layer, respectively. To ensure that the output of ObfNet has the same size as the input, a reshape layer is applied at the end. The CNN-based ObfNet has about 0.65 million parameters.

  • The MLP-based ObfNet has two dense hidden layers. The first layer has 200 neurons and is fully connected to the second layer of 900 neurons. ReLU activation and batch normalization are applied to the output of both layers. A reshape layer is used as the output layer. The MLP-based ObfNet has about 0.37 million parameters.
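The MLP-based ObfNet can be sketched in Keras as follows. The 30×30 input shape is an assumption chosen only so that the 900-neuron second layer reshapes cleanly back to the input dimensions; the paper's actual MFCC image size is not restated here.

```python
from tensorflow import keras

def make_mlp_obfnet(rows=30, cols=30):
    # rows x cols = 900 is an assumed input shape matching the
    # 900-neuron second hidden layer described above.
    return keras.Sequential([
        keras.layers.Input(shape=(rows, cols)),
        keras.layers.Flatten(),
        keras.layers.Dense(200, activation="relu"),          # first hidden layer
        keras.layers.BatchNormalization(),
        keras.layers.Dense(rows * cols, activation="relu"),  # 900 neurons
        keras.layers.BatchNormalization(),
        keras.layers.Reshape((rows, cols)),                  # output matches input
    ])
```

The matching input and output shapes are what make the obfuscation unobtrusive: the backend's InfNet consumes the reshaped output exactly as it would an original MFCC image.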

IV-A4 Inference accuracy of InfNet and ObfNet-InfNet

Following the procedures described in Section III-B, we train the two InfNets using the training dataset and then train the ObfNets in the four ObfNet-InfNet concatenations (each of the CNN-based and MLP-based ObfNets combined with each of the CNN-based and MLP-based InfNets). During the training phase, we adopt the AdaDelta optimizer, which introduces minimal computation overhead over stochastic gradient descent (SGD) and adapts the learning rate dynamically [43]. Note that during the training phase, only the model achieving the highest validation accuracy is yielded as the training result.

The test accuracy of both trained InfNets is 99.5%; thus, the InfNets are well trained. The four ObfNet-InfNet concatenations give distinct test accuracies. For each concatenation, we trained ten different ObfNets. Fig. 6 shows the inference accuracy of applying the ten different ObfNets before the well-trained InfNets. For one InfNet, the average test accuracies with the two ObfNet designs are 98.35% and 99.40%, respectively; for the other InfNet, they are 98.55% and 99.10%. Compared with the test accuracy of the InfNets on the original data (i.e., 99.5%), the test accuracy drops caused by the obfuscation are merely 1.15%, 0.10%, 0.95%, and 0.40% for the different combinations of the ObfNet and the InfNet. Thus, the inference accuracy is well preserved when ObfNet is employed.

Fig. 6: Test accuracy of different ObfNet-InfNet concatenations in ten tests.

IV-A5 Quality of obfuscation

To understand the quality of obfuscation, we apply the MFCC inverse using the Python package LibROSA [25] to convert the MFCC representations back to WAV audio. The audio converted from the original MFCC representations can be easily recognized by humans despite some distortions. We also designed an experiment to investigate whether humans can interpret the audio inverted from the outputs of ObfNet, i.e., the obfuscated MFCC representations. The details and results of the experiment are as follows.

We invited ten student volunteers (five males and five females) aged from 21 to 23 from Xi’an Jiaotong-Liverpool University. All volunteers have good hearing. In the experiment, we randomly selected ten original MFCC representations from the test dataset (one for each class of the FSD dataset). Then, we applied the MFCC inverse using LibROSA to convert the ten MFCC representations back to audio. The four different ObfNets used in our evaluation (two trained for each InfNet) were applied to obfuscate the ten selected MFCC representations, and the obfuscated MFCC representations were then inverted to audio using LibROSA. Therefore, in total, there were 50 audio files: ten for the original MFCC representations and 40 for the obfuscated MFCC representations. All volunteers sat in a classroom. The ten audio files inverted from the original MFCC representations were first played in a shuffled order; all volunteers could correctly recognize the FSDs. Then, the 40 audio files inverted from the obfuscated MFCC representations were played in a shuffled order, and every volunteer was required to write down the FSD label (from 0 to 9) that they perceived.

Fig. 7 shows the confusion matrix for the ten volunteers recognizing the audio inverted from the MFCC representations obfuscated by one of the ObfNets. Each row shows the distribution of the ten volunteers’ answers for an audio clip with a certain true label, and the last column shows the accuracy for that clip. From the figure, we can see that the volunteers’ answers are distributed over all labels without any consensus, which suggests that the volunteers cannot perceive useful information from the audio for recognizing the FSD. The confusion matrices for the other three ObfNets can be found in Appendix -A. The overall accuracy, defined as the number of correct answers divided by the total of 100 answers (10 volunteers × 10 audio clips), is 5%, 7%, 7%, and 4% for the four ObfNets, respectively. Thus, the volunteers’ answers appear to be random guesses, whose expected accuracy is 10%. Therefore, the ObfNets achieve satisfactory obfuscation quality. Interested readers can download the obfuscated audio samples from an online repository [30] and examine them.

Fig. 7: Confusion matrix for recognizing the audio inverted from the MFCC representations obfuscated by one of the ObfNets. Zero entries are omitted.

IV-B Case Study 2: Handwritten Digit (MNIST) Recognition

The MNIST dataset of handwritten digits [24] has been widely adopted in ML literature. In this section, we apply our ObfNet approach to MNIST. Due to the simplicity of the image samples in MNIST, the quality of the obfuscation can be readily assessed by visual inspection.

IV-B1 Data preparation

The MNIST dataset consists of 70,000 handwritten digit images with ten classes corresponding to the digits from 0 to 9, as shown in Fig. 11(a). Each image has a single channel (i.e., grayscale image). We resize each image to .

IV-B2 Architecture of InfNet

We adopt two InfNets: one CNN-based and one MLP-based. Their details are as follows.

  • The CNN-based InfNet is similar to LeNet [24]. It consists of five layers: two convolutional layers, a pooling layer, and two dense layers with ReLU activation. Fig. 8 shows the architecture. It has about 1.2 million parameters in total.

  • The MLP-based InfNet has four dense layers as illustrated in Fig. 9. It has about 0.93 million parameters in total.

[Fig. 8 diagram: digit image → 32 conv filters → 64 conv filters → max-pooling → dropout → 128-neuron dense layer → dropout → 10-neuron dense layer → softmax → classification result]
Fig. 8: Structure of the CNN-based InfNet for MNIST recognition.

[Fig. 9 diagram: digit image → three 512-neuron dense layers → 10-neuron dense layer → softmax → classification result]
Fig. 9: Structure of the MLP-based InfNet for MNIST recognition.

IV-B3 Architecture of ObfNet

An MLP-based ObfNet and a CNN-based ObfNet are adopted. Details are as follows.

  • The MLP-based ObfNet has two dense layers with ReLU activation. This two-layer design helps reduce the scale of the ObfNet. Specifically, to be unobtrusive, the ObfNet’s output must have the same size as its input. Thus, a single-layer MLP with bias needs a number of parameters quadratic in the input size, whereas a two-layer MLP with a 16-neuron hidden layer needs 23.8 times fewer parameters than the single-layer MLP. We configure the number of neurons in the first hidden layer to be 8, 16, 32, 64, 128, 256, or 512, and we will investigate the impact of the ObfNet’s scale on the accuracy of the InfNet. The numbers of parameters corresponding to the above configurations range from 0.013 to 0.804 million.

  • The CNN-based ObfNet has a convolutional layer, a pooling layer, a dropout layer, and two dense layers with ReLU activation. The convolutional layer filters the input image with 32 output filters and a stride of one pixel. A max-pooling layer with a stride of two follows to reduce the spatial dimensions. Two dense layers with ReLU activation are then connected.
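The parameter-count argument above can be checked with a few lines of arithmetic. The 28×28 flattened input size below is an assumption (MNIST's native resolution, since the resized dimension is not shown above):

```python
def mlp_params(sizes):
    """Parameter count of a dense network with bias; `sizes` lists layer widths."""
    return sum(n_in * n_out + n_out for n_in, n_out in zip(sizes, sizes[1:]))

d = 28 * 28  # assumed flattened input (and output) size

single = mlp_params([d, d])          # one dense layer mapping input to output
two_layer = mlp_params([d, 16, d])   # 16-neuron bottleneck between input and output
ratio = single / two_layer           # how much smaller the two-layer design is
```

With these numbers, `single` is 615,440 and `two_layer` is 25,888, giving a ratio of about 23.8, which matches the factor quoted in the text.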

IV-B4 Inference accuracy of InfNet and ObfNet-InfNet

The test accuracies of the two trained InfNets are 99.35% and 98.47%, respectively, which suggests that the InfNets are well trained. As discussed in Section IV-B3, we vary the number of neurons in the first hidden layer of the ObfNets and train the ObfNets following the procedure presented in Section III-B. Fig. 10 shows the test accuracy of the various ObfNet-InfNet concatenations when the number of neurons in the first hidden layer of the ObfNet varies. From Fig. 10(a), compared with the test accuracy of the InfNet alone, the concatenations exhibit test accuracy drops ranging from 0.46% to 1.43% over the various neuron-number settings. For the InfNet in Fig. 10(b), more neurons in the first hidden layer of the ObfNet result in higher test accuracy of the ObfNet-InfNet concatenation. In particular, some ObfNet-InfNet concatenations even outperform the corresponding InfNet alone. This is possible because an ObfNet-InfNet concatenation is a deeper neural network than the corresponding InfNet.

(a) Test accuracy of InfNet and two ObfNet-InfNet concatenations
(b) Test accuracy of InfNet and two ObfNet-InfNet concatenations
Fig. 10: Test accuracy of InfNets and ObfNet-InfNet concatenations for MNIST recognition.

IV-B5 Quality of obfuscation

Fig. 11 shows the obfuscation results of one of the ObfNets when the number of neurons in its first hidden layer varies. From the figure, we cannot interpret the obfuscation results as any digits. When the number of neurons is small (e.g., 8 to 32), the obfuscation results of the digit one are darker than those of the other digits. This is because most pixel values in the original inference data of digit one are zero, leading to lower pixel values in the obfuscation results. However, when more neurons are used in the first hidden layer, the overall darkness levels of the obfuscation results of all digits are equalized, suggesting better obfuscation quality. The obfuscation results of the other ObfNet are given in Appendix -B; similarly, we cannot interpret them as any digits.

(a) Original inference data
(b) Obfuscation results with 8 neurons in the first hidden layer
(c) Obfuscation results with 16 neurons in the first hidden layer
(d) Obfuscation results with 32 neurons in the first hidden layer
(e) Obfuscation results with 64 neurons in the first hidden layer
(f) Obfuscation results with 128 neurons in the first hidden layer
(g) Obfuscation results with 256 neurons in the first hidden layer
(h) Obfuscation results with 512 neurons in the first hidden layer
Fig. 11: Obfuscation results of ObfNet on MNIST.
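The obfuscation itself is just a forward pass through a small two-layer MLP that maps an image to a same-sized output. A minimal numpy sketch follows; the weights here are random and untrained, for shape illustration only (a real ObfNet's weights come from the joint training with the frozen InfNet described in Section III-B, which is what yields both obfuscation and accuracy preservation):

```python
import numpy as np

rng = np.random.default_rng(0)
d, hidden = 28 * 28, 128  # assumed MNIST-like input; 128-neuron first layer

W1 = rng.normal(0.0, 0.05, (d, hidden)); b1 = np.zeros(hidden)
W2 = rng.normal(0.0, 0.05, (hidden, d)); b2 = np.zeros(d)

def obfuscate(img):
    """Two-layer MLP ObfNet sketch: ReLU hidden layer, same-size output."""
    h = np.maximum(img.reshape(-1) @ W1 + b1, 0.0)  # ReLU activation
    return (h @ W2 + b2).reshape(img.shape)         # reshape back to image

x = rng.random((28, 28))  # stand-in for a grayscale digit image
y = obfuscate(x)          # same shape as x, visually unrelated to x
```

The same-shape output is what makes the scheme unobtrusive: the backend cannot tell from the data format whether obfuscation was applied.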

IV-C Case Study 3: American Sign Language (ASL) Recognition

In this case study, we consider an application of ASL recognition using camera-captured pictures. The ASL alphabet considered here is a set of 29 hand gestures corresponding to the 26 English letters and three special characters representing deletion, nothing, and the space delimiter. While ASL is a predominant sign language of the deaf communities in the U.S., it is also widely learned as a second language, serving as a lingua franca. Therefore, portable ASL recognition systems [12] are useful for communication between ASL users and those who do not understand ASL. Porting the ASL recognition capability to smart glasses is desirable but also challenging due to smart glasses’ limited compute power; thus, remote inference is a solution for smart glass-based ASL recognition. As the hand gesture images captured by the embedded cameras can contain privacy-sensitive information (e.g., skin color, skin texture, gender, tattoos, and the location of the shot inferred from the picture background), it is desirable to obfuscate the images. Thus, we apply ObfNet to ASL recognition.

IV-C1 Data preparation

We use an ASL dataset [21] consisting of 87,000 static hand gesture RGB images. Fig. 14(a) shows the samples corresponding to the 29 classes of the ASL alphabet. To reduce the scale of the ObfNet, we down-sample the ASL images.

IV-C2 Architecture of InfNet

As ASL hand gestures have more complex patterns than the MNIST handwritten digits, we adopt a CNN-based InfNet. Note that compared with an MLP, a CNN often deals better with multi-dimensional spatial data. The InfNet consists of three convolutional layers with 32, 64, and 128 channels, a max-pooling layer, and three dense layers. We adopt dropout after the pooling layer and the second dense layer with drop rates of 0.25 and 0.5, respectively. Fig. 12 shows the architecture. The InfNet has about 111 million parameters in total.

[Fig. 12 diagram: ASL image → three convolutional layers → max-pooling → dropout → two 1024-neuron dense layers → dropout → 29-neuron dense layer → softmax → classification result]
Fig. 12: Structure of the InfNet for the ASL dataset.

IV-C3 Architecture of ObfNet

We evaluate both an MLP-based ObfNet and a CNN-based ObfNet:

  • The MLP-based ObfNet has two dense layers with ReLU activation. We vary the number of neurons in the first dense layer and evaluate how this affects the inference accuracy. It has about 6.3 to 25.2 million parameters, depending on the number of neurons in the first dense layer.

  • The CNN-based ObfNet consists of a convolutional layer, a pooling layer, and two dense layers with ReLU activation. The convolutional layer filters the input RGB image with 32 output filters and a stride of one pixel. A max-pooling layer with a stride of two pixels follows to reduce the spatial dimensions. Two dense layers with ReLU activation are then connected. Two dropout layers with dropout rates of 0.25 and 0.4 are applied after the max-pooling layer and the second dense layer to prevent overfitting. It has about 22 to 44 million parameters, depending on the number of neurons in the first dense layer.

IV-C4 Inference accuracy of InfNet and ObfNet-InfNet

The test accuracy of the trained InfNet is 99.82%, which suggests that the InfNet is well trained. Multiple ObfNets are trained by following the procedure presented in Section III-B.

Fig. 13 shows the test accuracy of the various ObfNet-InfNet concatenations when the number of neurons in the first hidden layer of the ObfNet varies. From Fig. 13, compared with the test accuracy of the InfNet alone, the concatenation with one ObfNet has test accuracy drops ranging from 0.12% to 2.81% over the various neuron-number settings, and the concatenation with the other ObfNet has drops ranging from 1.52% to 2.35%. When the number of neurons in the first hidden layer increases from 512 to 1024, the test accuracy of the latter concatenation drops. This can be caused by overfitting, because the number of training samples is not large compared with the ObfNet’s large number of parameters. Nevertheless, with a proper configuration of the ObfNet, the smallest test accuracy drop we can achieve is 0.12%. This shows that the ObfNet introduces little test accuracy drop for ASL recognition.

Fig. 13: Test accuracy of InfNet and ObfNet-InfNet concatenations for ASL recognition.

IV-C5 Quality of obfuscation

Fig. 14 shows the visual effect of the obfuscation on the ASL samples. From Fig. 14(b) and Fig. 14(c), we cannot interpret the obfuscation results of either ObfNet as any hand gestures. Note that the obfuscated samples are still RGB images. Interestingly, the obfuscation results produced by a given ObfNet exhibit similar patterns. For instance, each obfuscated sample in Fig. 14(b) has a dark hole in the center and a greenish circular belt around it. In fact, as the ObfNet has a large number of parameters (up to tens of millions), the pattern shown in the obfuscation result is mainly determined by the ObfNet, whereas the original inference data sample, which carries a relatively limited amount of information, can be viewed as a perturbation.

(a) Original inference data
(b) Obfuscation results of one ObfNet
(c) Obfuscation results of the other ObfNet
Fig. 14: Obfuscation results of ObfNet on ASL.

V Implementation and Benchmark

This section presents the implementation of our ObfNet approach on edge and backend hardware platforms. The benchmark results obtained on these platforms provide an understanding of the feasibility of ObfNet in practice. For conciseness of presentation, we only present the results of one trained ObfNet per case study.

V-A Hardware Platforms

Our implementation uses the Coral development board [4] (referred to as Coral) and NVIDIA Jetson AGX Xavier [29] (referred to as Jetson) as the edge device and backend hardware platforms, respectively. We implement the ObfNets and InfNets of the three case study applications presented in Section IV on Coral and Jetson, respectively.

Coral is a single-board computer equipped with an NXP i.MX 8M system-on-chip and a Google Edge TPU [15]. The Edge TPU is an inference accelerator that cannot perform ML model training. Coral weighs about 136 grams including a thermal transfer plate and a heat dissipation fan, and its power consumption is low. Thus, Coral is a modern edge device platform with hardware-accelerated inference capability. Note that owing to the ObfNets’ small-scale design, they can also run on edge devices without hardware acceleration for inference. Coral runs Mendel, a lightweight GNU/Linux distribution. We deploy the ObfNets, implemented using the TensorFlow Lite library [39], on Coral.

Jetson is a computing board equipped with an 8-core ARM CPU, 16 GB of LPDDR4x memory, and a 512-core Volta GPU. The GPU can accelerate DNN training and inference. Jetson weighs about 280 grams including a thermal transfer plate. Jetson supports several configurable power ratings; in our experiments, we configure it to run at the highest rating to achieve the highest compute power. Jetson can be employed as an embedded backend to serve the edge devices of applications in a locality such as an office building or a factory floor. To support massive numbers of edge devices, a cloud backend can be used instead. Jetson runs Ubuntu. We deploy the InfNets, implemented using TensorFlow [40], on Jetson.

V-B Benchmark Results

Case study | Minimum | Average | Maximum
FSD | 2.226 | 2.253 | 2.312
MNIST | 0.221 | 0.221 | 0.224
ASL | 11.136 | 11.146 | 11.170
TABLE I: Per-sample ObfNet execution time (ms) on Coral.
Case study | Minimum | Average | Maximum
FSD | 0.229 | 0.246 | 0.289
MNIST | 0.158 | 0.174 | 0.212
ASL | 0.201 | 0.219 | 0.249
TABLE II: Per-sample InfNet execution time (ms) on Jetson.

For each case study application, we measure the per-sample execution time for obfuscation on Coral and the per-sample inference time on Jetson. To mitigate the uncertainties caused by the operating systems’ scheduling, for each tested setting, we run the ObfNet or InfNet 100 times.

V-B1 ObfNet and InfNet execution times

Table I shows Coral’s per-sample execution time for the ObfNets designed for the three case studies. We can see that the ObfNets need little processing time (i.e., a few milliseconds) on Coral. Table II shows Jetson’s per-sample execution time for the InfNets designed for the three case studies. Although the InfNets have larger scales than the ObfNets, the execution times of the InfNets are shorter than those of the ObfNets due to Jetson’s greater compute power. In TensorFlow, batch execution of inferences can improve the efficiency of utilizing the hardware acceleration. Thus, we also evaluate the impact of the batch size on the per-sample execution time of the InfNets. Fig. 15 shows the results. We can see that the per-sample execution time decreases with the batch size and then converges; the convergence is caused by the saturation of the hardware acceleration utilization. The above results show that the ObfNets and InfNets introduce little overhead to the edge device and the backend for the considered case study applications.
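The per-sample timing methodology (repeat each setting many times, amortize the batch cost over its samples) can be sketched as follows. The model is replaced with a dummy matrix multiply, so the numbers are not comparable to those in the tables; `infer` and the batch sizes are illustrative choices of ours:

```python
import time
import numpy as np

rng = np.random.default_rng(1)
W = rng.standard_normal((784, 10))

def infer(batch):
    """Stand-in for an InfNet forward pass."""
    return batch @ W

def per_sample_ms(batch_size, repeats=100):
    """Min/avg/max per-sample time (ms) over `repeats` timed runs."""
    batch = rng.standard_normal((batch_size, 784))
    times = []
    for _ in range(repeats):
        t0 = time.perf_counter()
        infer(batch)
        # Amortize the batch execution time over its samples.
        times.append((time.perf_counter() - t0) * 1e3 / batch_size)
    return min(times), sum(times) / len(times), max(times)

results = {bs: per_sample_ms(bs) for bs in (1, 8, 64)}
```

Repeating each measurement and reporting min/avg/max, as the tables do, mitigates scheduling jitter; larger batches typically amortize fixed per-call overhead, which is the effect Fig. 15 illustrates.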

(a) FSD-
(b) MNIST-
(c) ASL-
Fig. 15: InfNet’s per-sample execution time on Jetson versus batch size. Error bar represents average, maximum and minimum over 100 tests.
Model | Minimum | Average | Maximum
FSD InfNet | 13.484 | 14.318 | 15.137
MNIST InfNet | 7.606 | 8.351 | 9.095
ASL InfNet | 100.433 | 100.467 | 100.510
TABLE III: Per-sample InfNet execution time (ms) on Coral.
Fig. 16: Data sample transmission time versus network connection data rate.

V-B2 Advantage of remote inference

Inference accelerators such as the Edge TPU may enable the execution of deep InfNets on edge devices (i.e., local inference). In contrast, the remote inference scheme considered in this paper involves transmitting the inference data to the backend, which may incur time delays. In this set of benchmark experiments, we set aside the need to protect the confidentiality of the InfNets as discussed in Section I and compare local inference and remote inference in terms of total time delay.

Table III shows the execution times of the InfNets on Coral. Compared with the results in Table II, for the FSD and MNIST case study applications, the execution times on Coral are about 50x longer than those on Jetson; for ASL, the execution time is about 480x longer. The data transmission delays under the remote inference scheme are often small, because edge devices often have wideband network connections (e.g., Wi-Fi and 4G). Based on the average inference data sample sizes of the case study applications, Fig. 16 shows the per-sample transmission time versus the network connection data rate. The analysis shows that, compared with local inference, remote inference achieves a shorter total time delay when the connection data rate exceeds a modest threshold, which 4G connections normally provide. Thus, remote inference is more advantageous in terms of total time delay. The advantage of remote inference is more pronounced when the scales of the InfNets are larger or the edge devices are not equipped with inference accelerators.
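The break-even data rate follows directly from the measured execution times: remote inference wins once the transmission time fits within the compute-time savings. The sketch below uses the FSD averages from Tables II and III; the sample size is a hypothetical value of ours, since the paper's sizes are not reproduced above:

```python
# Average per-sample execution times for the FSD case (ms, Tables II and III).
t_local_ms = 14.318   # InfNet executed locally on Coral
t_remote_ms = 0.246   # InfNet executed remotely on Jetson

sample_kB = 30.0      # hypothetical MFCC sample size in kilobytes

# Remote inference wins when transmission time < t_local - t_remote.
budget_ms = t_local_ms - t_remote_ms
break_even_mbps = (sample_kB * 8 / 1000) / (budget_ms / 1000)  # megabits/s
```

Any connection faster than `break_even_mbps` makes remote inference the lower-latency option for this (hypothetical) sample size; larger InfNets widen the budget and lower the threshold further.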

VI Conclusion and Future Work

The case studies presented in this paper show that there can exist a small-scale non-linear transform in the form of a neural network, i.e., the ObfNet O, such that the transformed inference data samples are mapped to the same class labels as the original inference data samples, where the mapping is the InfNet I. Formally, I(O(x)) = I(x) holds for most x ∈ D, where D represents the inference dataset. The evaluation also shows that the ObfNet can well protect the confidentiality of the raw form of an inference data sample x, as demonstrated by the volunteers’ auditory examination of the obfuscated FSD samples and the visual examination of the obfuscated MNIST and ASL samples. Therefore, this paper presents a lightweight and unobtrusive data obfuscation approach for inference, which can be used to protect the edge devices’ data privacy in remote inference systems.

In our future work, we aim to apply the ObfNet approach to heavyweight InfNets that deal with more complex auditory and visual sensing tasks, such as full-fledged speech recognition and DNNs for ImageNet [8].

References

  • [1] O. Abdel-Hamid, L. Deng, and D. Yu (2013) Exploring convolutional neural network structures and optimization techniques for speech recognition.. In Interspeech, Vol. 11, pp. 73–5. Cited by: §IV-A2.
  • [2] Amazon Alexa. Note: https://amzn.to/2EJL5Vj Accessed: 2019-12-10. Cited by: §I.
  • [3] S. Boyd, N. Parikh, E. Chu, B. Peleato, and J. Eckstein (2011) Distributed optimization and statistical learning via the alternating direction method of multipliers. Foundations and Trends in Machine learning 3 (1), pp. 1–122. Cited by: Fig. 1, §II.
  • [4] Coral. Note: https://coral.ai/products/dev-board/ Accessed: 2019-12-10. Cited by: §V-A.
  • [5] S. Davis and P. Mermelstein (1980) Comparison of parametric representations for monosyllabic word recognition in continuously spoken sentences. IEEE transactions on acoustics, speech, and signal processing 28 (4), pp. 357–366. Cited by: §IV-A1.
  • [6] J. Dean, G. Corrado, R. Monga, K. Chen, M. Devin, M. Mao, A. Senior, P. Tucker, K. Yang, Q. V. Le, and M. Ranzato (2012) Large scale distributed deep networks. In Advances in neural information processing systems, pp. 1223–1231. Cited by: Fig. 1, §II.
  • [7] R. A. DeMillo (1978) Foundations of secure computation. Technical report Georgia Institute of Technology. Cited by: §II.
  • [8] J. Deng, W. Dong, R. Socher, L. Li, K. Li, and L. Fei-Fei (2009) ImageNet: a large-scale hierarchical image database. In 2009 IEEE Conference on Computer Vision and Pattern Recognition, pp. 248–255. Cited by: §VI.
  • [9] C. Dwork, F. McSherry, K. Nissim, and A. Smith (2006) Calibrating noise to sensitivity in private data analysis. In Theory of cryptography conference, pp. 265–284. Cited by: Fig. 1, §II.
  • [10] C. Dwork (2011) Differential privacy. Encyclopedia of Cryptography and Security, pp. 338–340. Cited by: §II.
  • [11] Facebook-Cambridge Analytica data scandal. Note: https://bit.ly/2sP8Okg Accessed: 2019-12-10. Cited by: 1st item.
  • [12] B. Fang, J. Co, and M. Zhang (2017) DeepASL: enabling ubiquitous and non-intrusive word and sentence-level sign language translation. In Proceedings of the 15th ACM Conference on Embedded Network Sensor Systems, pp. 5. Cited by: §IV-C.
  • [13] R. Gilad-Bachrach, N. Dowlin, K. Laine, K. Lauter, M. Naehrig, and J. Wernsing (2016) Cryptonets: applying neural networks to encrypted data with high throughput and accuracy. In International Conference on Machine Learning, pp. 201–210. Cited by: §I, Fig. 1, §II.
  • [14] I. J. Goodfellow, J. Shlens, and C. Szegedy (2014) Explaining and harnessing adversarial examples. arXiv preprint arXiv:1412.6572. Cited by: 2nd item.
  • [15] Google’s Edge TPU. Note: https://cloud.google.com/edge-tpu/ Accessed: 2019-12-10. Cited by: §V-A.
  • [16] T. Graepel, K. Lauter, and M. Naehrig (2012) ML confidential: machine learning on encrypted data. In International Conference on Information Security and Cryptology, pp. 1–21. Cited by: Fig. 1, §II.
  • [17] J. Hamm, A. C. Champion, G. Chen, M. Belkin, and D. Xuan (2015) Crowd-ml: a privacy-preserving learning framework for a crowd of smart devices. In 2015 IEEE 35th International Conference on Distributed Computing Systems, pp. 11–20. Cited by: Fig. 1, §II.
  • [18] Z. Jackson, C. Souza, J. Flaks, Y. Pan, N. Hereman, and A. Thite. Free spoken digit dataset. Note: https://zenodo.org/record/1342401#.XdlRd-gzY2w Accessed: 2019-12-10. Cited by: §IV-A1.
  • [19] Z. Jackson. Free spoken digit dataset. Note: https://github.com/Jakobovski/free-spoken-digit-dataset Accessed: 2019-12-10. Cited by: §IV-A1.
  • [20] L. Jiang, R. Tan, X. Lou, and G. Lin (2019) On lightweight privacy-preserving collaborative learning for internet-of-things objects. In 2019 IEEE/ACM Third International Conference on Internet-of-Things Design and Implementation (IoTDI), pp. 70–81. Cited by: §I, Fig. 1, §II.
  • [21] Kaggle. Image data set for alphabets in the American Sign Language. Note: https://www.kaggle.com/grassknoted/asl-alphabet Accessed: 2019-12-10. Cited by: §IV-C1.
  • [22] Y. LeCun and Y. Bengio (1995) Convolutional networks for images, speech, and time series. The handbook of brain theory and neural networks 3361 (10), pp. 1995. Cited by: §IV-A2.
  • [23] Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner (1998) Gradient-based learning applied to document recognition. Proceedings of the IEEE 86 (11), pp. 2278–2324. Cited by: §IV-A2.
  • [24] Y. LeCun. The MNIST database of handwritten digits. Note: http://yann.lecun.com/exdb/mnist/ Accessed: 2019-12-10. Cited by: 1st item, §IV-B.
  • [25] LibROSA. Note: https://librosa.github.io/librosa/ Accessed: 2019-12-10. Cited by: §IV-A5.
  • [26] B. Liu, Y. Jiang, F. Sha, and R. Govindan (2012) Cloud-enabled privacy-preserving collaborative learning for mobile sensing. In Proceedings of the 10th ACM Conference on Embedded Network Sensor Systems, pp. 57–70. Cited by: Fig. 1, §II.
  • [27] H. B. McMahan, E. Moore, D. Ramage, S. Hampson, and B. A. y Arcas (2016) Communication-efficient learning of deep networks from decentralized data. arXiv preprint arXiv:1602.05629. Cited by: Fig. 1, §II.
  • [28] F. McSherry and K. Talwar (2007) Mechanism design via differential privacy.. In 48th Annual IEEE Symposium on Foundations of Computer Science (FOCS’07), Vol. 7, pp. 94–103. Cited by: Fig. 1, §II.
  • [29] NVIDIA Jetson AGX Xavier. Note: https://bit.ly/2ZgTk4Y Accessed: 2019-12-10. Cited by: §V-A.
  • [30] ObfNet-showcase. Note: https://github.com/ntu-aiot/ObfNet-showcase Accessed: 2019-12-10. Cited by: §IV-A5.
  • [31] S. A. Osia, A. S. Shamsabadi, A. Taheri, K. Katevas, S. Sajadmanesh, H. R. Rabiee, N. D. Lane, and H. Haddadi (2017) A hybrid deep learning architecture for privacy-preserving mobile analytics. arXiv preprint arXiv:1703.02952. Cited by: Fig. 1, §II, §II.
  • [32] PictureThis. Note: https://www.picturethisai.com/ Accessed: 2019-12-10. Cited by: §I.
  • [33] Y. Qi and M. J. Atallah (2008) Efficient privacy-preserving k-nearest neighbor search. In 2008 The 28th International Conference on Distributed Computing Systems, pp. 311–319. Cited by: Fig. 1, §II.
  • [34] A. Roth and T. Roughgarden (2010) Interactive privacy via the median mechanism. In Proceedings of the Forty-Second ACM Symposium on Theory of Computing, pp. 765–774. Cited by: Fig. 1, §II.
  • [35] T. N. Sainath, A. Mohamed, B. Kingsbury, and B. Ramabhadran (2013) Deep convolutional neural networks for lvcsr. In 2013 IEEE international conference on acoustics, speech and signal processing, pp. 8614–8618. Cited by: §IV-A2.
  • [36] Y. Shen, C. Luo, D. Yin, H. Wen, R. Daniela, and W. Hu (2018) Privacy-preserving sparse representation classification in cloud-enabled mobile applications. Computer Networks 133, pp. 59–72. Cited by: Fig. 1, §II.
  • [37] R. Shokri and V. Shmatikov (2015) Privacy-preserving deep learning. In Proceedings of the 22nd ACM SIGSAC conference on computer and communications security, pp. 1310–1321. Cited by: Fig. 1, §II.
  • [38] N. Srivastava, G. Hinton, A. Krizhevsky, I. Sutskever, and R. Salakhutdinov (2014) Dropout: a simple way to prevent neural networks from overfitting. The journal of machine learning research 15 (1), pp. 1929–1958. Cited by: 1st item.
  • [39] TensorFlow Lite. Note: https://www.tensorflow.org/lite Accessed: 2019-12-10. Cited by: §V-A.
  • [40] TensorFlow. Note: https://www.tensorflow.org Accessed: 2019-12-10. Cited by: §IV, §V-A.
  • [41] Thousands of Amazon workers listen to Alexa users’ conversations. Note: https://time.com/5568815/amazon-workers-listen-to-alexa/ Accessed: 2019-12-10. Cited by: §I.
  • [42] J. Wang, J. Zhang, W. Bao, X. Zhu, B. Cao, and P. S. Yu (2018) Not just privacy: improving performance of private deep learning in mobile cloud. In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, pp. 2407–2416. Cited by: Fig. 1, §II, §II.
  • [43] M. D. Zeiler (2012) ADADELTA: an adaptive learning rate method. arXiv preprint arXiv:1212.5701. Cited by: §IV-A4.
  • [44] J. Z. Zhan, L. Chang, and S. Matwin (2005) Privacy preserving k-nearest neighbor classification.. International Journal of Network Security 1 (1), pp. 46–51. Cited by: Fig. 1, §II.
  • [45] M. Zinkevich, M. Weimer, L. Li, and A. J. Smola (2010) Parallelized stochastic gradient descent. In Advances in neural information processing systems, pp. 2595–2603. Cited by: Fig. 1, §II.