The need to protect input privacy in machine learning (ML) is growing rapidly in many areas, such as health care Esteva et al. (2019), autonomous vehicles Zhu et al. (2014), business Heaton et al. (2017), and communication Foerster et al. (2016). In this paper we propose a novel methodology that offers data security to data holders with privacy concerns. One of the most prominent benefits of this work is that MLaaS engineers can easily use this framework to securely implement their desired DNNs in the cloud, without any model modifications. To achieve this goal, this work relies on Trusted Execution Environments (TEEs) to provide a unified scheme for inference and training of large DNNs on private data. The primary challenge of using TEEs is that DNNs need powerful computational and memory resources, while the computational power and memory space of existing TEEs are very limited. Prior training schemes Hynes et al. (2018); Hunt et al. (2018) secure the entire model within a TEE, which prevents large models from taking advantage of the TEE and rules out ML accelerators such as GPUs. Prior inference schemes using TEEs Hanzlik et al. (2018); Ohrimenko et al. (2016); Gu et al. (2018); Tramer and Boneh (2018) exploit the fact that model parameters are fixed during inference to enable split execution between TEE and GPU. These schemes cannot be used for training, since the parameters are constantly updated during the training process. Our approach, called DarKnight, is a unified platform designed for both training and inference of DNNs without revealing the input data to the MLaaS platform. To the best of our knowledge, this is the first work that uses a single enclave for training large DNNs with performance comparable to fast GPUs.
DarKnight uses TEEs to linearly combine multiple inputs and then uses additive stream cipher noise to blind the input data before exposing it to the unguarded hardware. The scheme then uses a GPU to perform linear operations, such as convolution and dense-layer matrix multiplication, on the coded input data for high performance. By restricting GPUs to perform only linear operations, DarKnight guarantees that the computation on each input can be decoded. By limiting TEEs to perform only data encoding and non-linear operations, DarKnight enables training large DNN models that exceed the TEE memory limit. We design and implement DarKnight on an Intel SGX server equipped with a GPU. We train VGG16, VGG19 and MobileNet as large DNN models to demonstrate the efficacy of DarKnight. Furthermore, we perform several system-level optimizations to reduce the memory utilization of the TEE.
The rest of the paper is organized as follows. In Section 2, we describe the problem setup and design objectives. Section 3 explains the methodology for inference and training, in addition to the challenges facing the proposed algorithm. The experimental results are presented in Section 4. In Section 5 we draw the conclusion.
2.1 Intel SGX
Trusted execution environments such as ARM TrustZone Alves (2004), Intel SGX Costan and Devadas (2016) and Sanctum Costan et al. (2016) provide an execution environment where the computational integrity of the user's application is guaranteed by the hardware. For instance, Intel SGX provides a limited amount of secure memory within an enclave, which is protected from other processes, operating systems, hypervisors, and physical attacks.
Users can write code to execute part of their application within an enclave while allowing other parts to be executed outside of an enclave. SGX has MB as the enclave memory. If the enclave application needs more space, a time-consuming page eviction process is initiated.
While some types of side-channel attacks have been performed on SGX, Intel has been able to fix many of them Khandaker et al. (2020); Costan and Devadas (2016); Xu et al. (2015); Lee et al. (2017); Weichbrodt et al. (2016); Wang et al. (2017). In this work we assume that computations performed within SGX are invisible to outside entities. DarKnight instead focuses on exploiting this property to securely and efficiently perform inference and training.
2.2 Related Work
There are a variety of approaches for protecting input privacy during DNN training and inference. We categorize these approaches in Table 1. They can be classified as those that rely on homomorphic encryption (HE), multi-party computing (MPC), trusted execution environments (TEE), differential privacy (DiffP), and additive noise (Noise). Some of the works mentioned below combine several of these techniques; each such work is assigned to the category that describes it best. These techniques can be applied either for inference or training.
Homomorphic encryption techniques encrypt input data and then perform inference directly on encrypted data, albeit with a significant performance penalty Gentry (2009); Liu et al. (2017); Gilad-Bachrach et al. (2016); Juvekar et al. (2018).
Secure multi-party computing is another approach, where multiple non-colluding servers may use custom data exchange protocols to protect input data Shokri and Shmatikov (2015); Mohassel and Zhang (2017); Wagh et al. (2019); Mohassel and Rindal (2018). However, this approach requires multiple servers to perform training or inference.
An entirely orthogonal approach is to use differential privacy, which aims to guarantee the confidentiality of the user, if the output of the DNN for that user is exposed Abadi et al. (2016); Erlingsson et al. (2014); Team (2017).
Additive noise is another approach, mostly used for inference. In this mechanism there is a trade-off between privacy, computational complexity, and model accuracy Wang et al. (2018); Leroux et al. (2018); Mireshghallah et al. (2020).
Our idea relies on TEEs to perform private inference or training in secure hardware Hynes et al. (2018); Hunt et al. (2018); Hanzlik et al. (2018); Ohrimenko et al. (2016); Gu et al. (2018); Tramer and Boneh (2018). In particular, TEE-based training solutions focus on holding the entire model within the TEE, which protects data and model from outside attackers. However, TEEs impose severe memory and compute limitations: the size of the model that fits within a TEE is quite small, which prevents large models from taking advantage of the TEE. Furthermore, most DNN training is currently performed on GPUs, which do not support TEEs.
In particular, Slalom Tramer and Boneh (2018) is an inference framework for protecting data privacy and integrity. Slalom uses the Intel SGX enclave to secure input data received from a client, and blinds the input data with an additive stream cipher noise $\mathbf{r}$. The blinded data is then sent to an untrusted accelerator where linear operations are performed. The computed data is then returned to the enclave, which can decode the correct computational output by subtracting the precomputed $\mathbf{W}\cdot\mathbf{r}$. Here $\mathbf{W}$ is the model parameter matrix. Slalom faces several challenges. First, Slalom cannot be used for training, since it precomputes $\mathbf{W}\cdot\mathbf{r}$ offline and performs a simple subtraction to decode the output. Precomputing the blinding factors is not possible during training, since the model parameters are updated after every batch. Computing $\mathbf{W}\cdot\mathbf{r}$ inside SGX after every batch also defeats the purpose of offloading the computations. Even in inference, securely storing multiple instances of $\mathbf{r}$'s and their corresponding $\mathbf{W}\cdot\mathbf{r}$'s within the enclave occupies a substantial amount of memory for large DNNs. On the other hand, storing an encrypted version of these values outside the enclave memory leads to significant encryption and decryption costs, as these values are needed after each linear operation.
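The additive blinding idea described above can be sketched in a few lines. This is a minimal floating-point illustration, not Slalom's implementation: the names `W`, `x`, `r` and the layer sizes are assumptions chosen for the example, and a dense layer stands in for the linear operation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: one input of dimension 8, a dense layer with 4 outputs.
W = rng.standard_normal((8, 4))   # fixed model parameters (inference only)
x = rng.standard_normal(8)        # private input, visible only inside the enclave

# Offline phase (inside the enclave): draw a blinding factor and precompute r @ W.
r = rng.standard_normal(8)
Wr = r @ W

# Online phase: only the blinded input leaves the enclave.
x_blinded = x + r
y_blinded = x_blinded @ W         # linear operation on the untrusted accelerator

# Back inside the enclave: subtract the precomputed term to recover the output.
y = y_blinded - Wr
```

The precomputed term `Wr` depends on `W`, which is why this style of blinding stops working once `W` changes after every training batch.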
DarKnight supports both inference and training in a single framework. Figure 1 depicts the overall execution flow of DarKnight. An MLaaS cloud server has at least one SGX enclave and one GPU accelerator. Our goal is to rely on the GPU to perform computationally intensive operations while relying on SGX to protect input privacy. The initial model ($\mathbf{W}$) that a user wants to train is loaded into the cloud server. (1) The training image set is encrypted using keys mutually agreed with SGX. (2) SGX decrypts the images. (3) SGX calls DarKnight's blinding mechanism to seal the data. (4) The blinded data is offloaded to the GPU for linear operations. (5) The GPU performs the linear operations and returns the data to SGX. (6) SGX decodes the received computation using DarKnight's decoding strategy and applies the activation to get the next layer's hidden feature map. This process is repeated for both forward and backward propagation for each layer. The only difference between training and inference is the blinding and unblinding strategy. For training, DarKnight designs a novel blinding strategy that can blind the weight updates and hidden feature maps, which are computed with respect to a specific training input. For inference the approach is simpler.
DarKnight enables training and inference on very large models, as its blinding strategy does not need to store the model parameters ($\mathbf{W}$) within secure memory. Instead, only a few scalars need to be protected, thereby allowing efficient training of large DNN models with multiple images, as we explain below.
3.1 Privacy in Inference
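In this section we start with DarKnight's inference strategy. We consider a trained DNN, represented by model parameters $\mathbf{W}$ with $L$ layers, which is performing inference on an input $\mathbf{x}_0$ that must be protected. At a layer $l$ the inference process computes $\mathbf{y}_l = \langle \mathbf{W}_l, \mathbf{x}_l \rangle$, where $\langle \cdot, \cdot \rangle$ corresponds to the bilinear operation at that layer (e.g. matrix product, convolution, etc.). After the linear operation, an activation function $g(\cdot)$ creates the next layer input $\mathbf{x}_{l+1} = g(\mathbf{y}_l)$. Within this context, DarKnight first receives a set of $K$ inputs $\mathbf{x}_0^{(1)}, \dots, \mathbf{x}_0^{(K)}$ for a batch inference from a client. Our goal is to perform the linear calculations $\mathbf{y}_0^{(i)} = \langle \mathbf{W}_0, \mathbf{x}_0^{(i)} \rangle$, $i = 1, \dots, K$, on the GPU without exposing the inputs to the GPU. Note that the subscript $0$ in all these variables refers to the values of the first layer. From this point on, we drop the subscript for clearer notation. The blinding and unblinding strategy is similar for the rest of the layers.
Key Insight: The main idea behind DarKnight's privacy protection scheme is the fact that our multiplication operator is bilinear. Thus, instead of asking the GPU to calculate $\langle \mathbf{W}, \mathbf{x}^{(i)} \rangle$, which exposes the inputs, DarKnight blinds multiple inputs by combining them linearly and adding a random noise vector with sufficient power. Due to the bilinear property, any linear operation on $K$ blinded inputs can be recovered if $K+1$ different linear computations are performed.
DarKnight Blinding: More specifically, DarKnight creates $K+1$ inputs $\bar{\mathbf{x}}^{(1)}, \dots, \bar{\mathbf{x}}^{(K+1)}$, as follows,
$$\bar{\mathbf{x}}^{(i)} = \alpha_{i,1}\,\mathbf{x}^{(1)} + \dots + \alpha_{i,K}\,\mathbf{x}^{(K)} + \alpha_{i,K+1}\,\mathbf{r}, \quad i = 1, \dots, K+1. \qquad (1)$$
The scalars $\alpha_{i,j}$ and the noise vector $\mathbf{r}$ are randomly generated, such that the variance of the noise is sufficiently larger than that of the inputs (see the remarks below for more details). We gather the scalars $\alpha_{i,j}$ in the $(K+1) \times (K+1)$ matrix $\mathbf{A}$, with $[\mathbf{A}]_{i,j} = \alpha_{i,j}$. We assume that $\mathbf{A}$ is secured in enclave memory and is invisible to an adversary. Hence, by revealing the values $\bar{\mathbf{x}}^{(i)}$ to the external hardware, we do not expose the inputs $\mathbf{x}^{(i)}$.
The blinded data $\bar{\mathbf{x}}^{(i)}$ are sent to the GPU, which performs the following computations:
$$\bar{\mathbf{y}}^{(i)} = \langle \mathbf{W}, \bar{\mathbf{x}}^{(i)} \rangle, \quad i = 1, \dots, K+1.$$
DarKnight Unblinding: The $K+1$ outputs $\bar{\mathbf{y}}^{(i)}$ returned from the GPU must be deblinded to extract the original results $\mathbf{y}^{(i)}$. These values can be extracted as follows,
$$\bar{\mathbf{Y}} = [\bar{\mathbf{y}}^{(1)}, \dots, \bar{\mathbf{y}}^{(K+1)}] = [\mathbf{y}^{(1)}, \dots, \mathbf{y}^{(K)}, \langle \mathbf{W}, \mathbf{r} \rangle] \cdot \mathbf{A}^{\top} \;\Rightarrow\; [\mathbf{y}^{(1)}, \dots, \mathbf{y}^{(K)}, \langle \mathbf{W}, \mathbf{r} \rangle] = \bar{\mathbf{Y}} \cdot (\mathbf{A}^{\top})^{-1}. \qquad (2)$$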
A few remarks are in order. (1) Unlike prior works, DarKnight does not need to store $\langle \mathbf{W}, \mathbf{r} \rangle$ within the enclave memory, which significantly enhances our ability to infer with much larger models. (2) The size of the matrix $\mathbf{A}$ that is secured is proportional to the number of inputs that are blinded together rather than to the model size $\mathbf{W}$, which is several orders of magnitude larger. (3) The values of the matrix $\mathbf{A}$ can be computed offline in the pre-processing phase for performance improvement. We also prefer to choose a matrix $\mathbf{A}$ with a condition number close to one, so that our blinding and unblinding algorithm remains numerically stable. For this purpose, orthogonal matrices serve us best. (4) The process of deblinding $K$ inputs with one additive stream cipher requires $K+1$ computations. During deblinding we also extract $\langle \mathbf{W}, \mathbf{r} \rangle$, but that value is simply dropped. Thus DarKnight trades additional computations in order to eliminate the need to secure very large model parameters.
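The blinding and unblinding steps, together with the remark on orthogonal matrices, can be illustrated with a small NumPy sketch. It is a sketch under stated assumptions rather than the authors' implementation: a dense layer stands in for the bilinear operation, inputs are stored as rows, and the names `K`, `X`, `r`, `A` and the noise scale of 100 are illustrative choices. It demonstrates only that decoding is exact and well conditioned; it says nothing about the privacy guarantee.

```python
import numpy as np

rng = np.random.default_rng(0)

K, d_in, d_out = 4, 16, 8                 # hypothetical virtual batch and layer sizes
W = rng.standard_normal((d_in, d_out))    # model parameters, kept outside the enclave
X = rng.standard_normal((K, d_in))        # K private inputs, one per row

# --- inside the enclave: blinding ---
r = 100.0 * rng.standard_normal(d_in)     # noise with much larger variance than the inputs
Z = np.vstack([X, r])                     # (K+1, d_in): inputs followed by the noise vector
A, _ = np.linalg.qr(rng.standard_normal((K + 1, K + 1)))  # random orthogonal mixing matrix
X_blinded = A @ Z                         # row i is sum_j A[i, j] * Z[j]

# --- on the untrusted GPU: K+1 linear computations ---
Y_blinded = X_blinded @ W

# --- inside the enclave: unblinding ---
Y_all = A.T @ Y_blinded                   # A is orthogonal, so its inverse is A.T
Y = Y_all[:K]                             # the last row is r @ W and is dropped
```

Because `A` is orthogonal, `np.linalg.cond(A)` is 1 up to rounding, so decoding does not amplify floating-point error; only `A` (a 5 by 5 matrix here) has to stay in protected memory.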
Selection of $K$: Selecting a larger $K$ (the number of images that are batch inferred) decreases the overhead of the one additional noise-related computation. However, increasing $K$ requires blinding multiple images within an enclave. Our analysis indicates that we can merge and process up to four images from the ImageNet dataset at the same time without exceeding current SGX memory limitations. This result is shown in Figure 1(a). Thus we need one extra linear computation for every four image inference requests.
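The overhead argument is simple arithmetic: blinding a group of images with one shared noise vector costs one linear computation more than the number of images. A small sketch, writing `k` for the (assumed) number of images blinded together:

```python
def linear_ops_per_image(k: int) -> float:
    """Offloaded linear computations per image when k images share one noise vector."""
    return (k + 1) / k


overheads = {k: linear_ops_per_image(k) for k in (1, 2, 3, 4)}
for k, ops in overheads.items():
    print(f"k={k}: {ops:.2f} linear computations per image")
```

With four images the extra noise-related computation adds 25% to the offloaded linear work, compared with 100% when each image is blinded on its own.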
The noise vector: Here, we proposed a simple version of DarKnight that shares one noise vector $\mathbf{r}$ among all the equations. This is sufficient to blind the inputs in each equation. A more complicated version of DarKnight, explained in the Appendix, uses multiple noise vectors in each equation to decrease the mutual information between the equations in (1) (in case the attacker knows the underlying blinding method).
3.2 Privacy in Training
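Model training places a significant burden on DarKnight's input obfuscation scheme used for inference. In the training setting, the model parameters, $\mathbf{W}$, are updated each time a batch is processed.
For a model with $L$ layers that is being trained with a batch of $K$ inputs, the model parameters $\mathbf{W}_l$ at layer $l$ are updated using the well-known SGD process as:
$$\mathbf{W}_l^{\text{new}} = \mathbf{W}_l^{\text{old}} - \eta \, \nabla \mathbf{W}_l, \qquad \nabla \mathbf{W}_l = \frac{1}{K} \sum_{i=1}^{K} \langle \boldsymbol{\delta}_l^{(i)}, \mathbf{x}_l^{(i)} \rangle. \qquad (3)$$
Here $\eta$ is the learning rate, and $\boldsymbol{\delta}_l^{(i)}$ is the gradient of the loss for the $i$-th point in the training batch, with respect to the output of layer $l$.
DarKnight must protect $\mathbf{x}_l^{(i)}$ for each layer of the DNN when the layer's linear operations (convolution, matrix multiplication) are outsourced to a GPU. Recall that the decoding process for inference exploited the fact that the model parameters are the same for every input, such that $\langle \mathbf{W}, \bar{\mathbf{x}}^{(i)} \rangle = \sum_{j=1}^{K} \alpha_{i,j} \langle \mathbf{W}, \mathbf{x}^{(j)} \rangle + \alpha_{i,K+1} \langle \mathbf{W}, \mathbf{r} \rangle$. However, during the training process, we have a different $\boldsymbol{\delta}_l^{(i)}$ for each input $\mathbf{x}_l^{(i)}$. Thus, decoding the $\langle \boldsymbol{\delta}_l^{(i)}, \mathbf{x}_l^{(i)} \rangle$ from obfuscated inputs is a non-trivial challenge.
Key Insight: The key insight is that while training on a batch of inputs it is not necessary to compute $\langle \boldsymbol{\delta}_l^{(i)}, \mathbf{x}_l^{(i)} \rangle$ for each input $\mathbf{x}_l^{(i)}$. Instead, the training process only needs the cumulative parameter update for the entire batch of inputs. Hence, what must be computed is the entire $\nabla \mathbf{W}_l$, which is a summation over the inputs in the batch.
DarKnight Blinding: DarKnight exploits this insight to protect privacy without significantly increasing the encoding and decoding complexity of the blinding process. In particular, DarKnight uses a new linear encoding scheme to combine inputs (covered by noise). As shown in (3), there are $K$ inputs on which gradients are computed. Instead of calculating the $K$ products in (3), DarKnight calculates the following $K+1$ equations in the backward propagation (the layer subscript is dropped),
$$\mathrm{Eq}_j = \Big\langle \sum_{i=1}^{K} \beta_{j,i}\, \boldsymbol{\delta}^{(i)}, \; \sum_{i=1}^{K} \alpha_{j,i}\, \mathbf{x}^{(i)} + \alpha_{j,K+1}\, \mathbf{r} \Big\rangle, \quad j = 1, \dots, K+1. \qquad (4)$$
DarKnight selects the $\alpha_{j,i}$'s, $\beta_{j,i}$'s and $\gamma_j$'s such that
$$\mathbf{B}^{\top} \cdot \boldsymbol{\Gamma} \cdot \mathbf{A} = \begin{bmatrix} \mathbf{I}_K & \mathbf{0} \end{bmatrix}_{K \times (K+1)}. \qquad (5)$$
Assuming the batch size is equal to $K$, the $\beta_{j,i}$ parameters used for scaling the $\boldsymbol{\delta}$ values are gathered in the $(K+1)$ by $K$ matrix $\mathbf{B}$. The $\alpha_{j,i}$'s are gathered in the $(K+1)$ by $(K+1)$ matrix $\mathbf{A}$, which holds the scaling scalars for the intermediate features, and the $\gamma_j$'s form the diagonal of a $(K+1)$ by $(K+1)$ matrix $\boldsymbol{\Gamma}$, which gives us the proper parameters for efficient decoding.
DarKnight Unblinding: Given the constraint imposed on the $\alpha_{j,i}$'s, $\beta_{j,i}$'s and $\gamma_j$'s, extracting $\nabla \mathbf{W}$ is simple. It is easy to see that if the scalars satisfy the relation (5), we will have
$$\frac{1}{K} \sum_{j=1}^{K+1} \gamma_j \, \mathrm{Eq}_j = \frac{1}{K} \sum_{i=1}^{K} \langle \boldsymbol{\delta}^{(i)}, \mathbf{x}^{(i)} \rangle = \nabla \mathbf{W}. \qquad (6)$$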
In other words, the unblinding process only involves calculating a linear combination of the equations in (4), which are calculated on the untrusted GPU, and there is no need to compute each component individually.
DarKnight Training Complexity: It is important to note that DarKnight's training approach for blinding and unblinding is very simple. All the scaling parameters can be generated in the preprocessing phase and stored inside the TEE. The size of the $\mathbf{A}$, $\mathbf{B}$ and $\boldsymbol{\Gamma}$ matrices is proportional only to the square of the batch size that is processed at one time. Even with a batch size of 8-64 (commonly used in VGG training Canziani et al. (2016); Han et al. (2015)), these scaling values are substantially smaller than the model parameters $\mathbf{W}$. Furthermore, even with a batch size of 8-64 the TEE may choose to process only a small subset of images, called a virtual batch, at a time. The size of the virtual batch is limited by the size of the SGX memory that must hold the blinded inputs, typically 4-8 images at a time. Thus the scaling parameters for blinding are quite small.
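The training-time decoding can be checked numerically with a short sketch. Everything here is an illustrative assumption: a dense layer (so each per-example product is an outer product), inputs and output gradients stored as rows, and one particular way of picking scalars so that `B.T @ Gamma @ A` equals an identity block followed by a zero column. The source does not say how the scalars are generated, so the construction below is only one option.

```python
import numpy as np

rng = np.random.default_rng(1)

K, d_in, d_out = 4, 16, 8
X = rng.standard_normal((K, d_in))        # private layer inputs, one per row
D = rng.standard_normal((K, d_out))       # per-example gradients w.r.t. the layer output
r = 100.0 * rng.standard_normal(d_in)     # noise vector
Z = np.vstack([X, r])                     # (K+1, d_in)

# --- inside the enclave: pick scalars with B.T @ Gamma @ A == [I_K | 0] ---
A, _ = np.linalg.qr(rng.standard_normal((K + 1, K + 1)))   # orthogonal, (K+1, K+1)
gamma = rng.uniform(0.5, 2.0, size=K + 1)                  # diagonal of Gamma
B = A[:, :K] / gamma[:, None]                              # (K+1, K)
constraint = B.T @ np.diag(gamma) @ A

X_blinded = A @ Z                         # row j: sum_i A[j, i] * Z[i]
D_combined = B @ D                        # row j: sum_i B[j, i] * D[i]

# --- on the untrusted GPU: K+1 outer products, one per equation ---
eqs = np.einsum("jo,ji->joi", D_combined, X_blinded)       # (K+1, d_out, d_in)

# --- inside the enclave: a weighted sum yields the batch gradient directly ---
grad = np.einsum("j,joi->oi", gamma, eqs) / K

expected = np.einsum("ko,ki->oi", D, X) / K   # plain per-example computation
```

No individual per-example product is ever decoded: the weighted sum over the `K + 1` equations cancels every cross term and the noise term at once, which is why only the small matrices `A`, `B` and `gamma` need to stay in the enclave.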
3.3 Extending DarKnight to Verify Data Integrity with Untrusted GPU
Apart from protecting privacy, DarKnight can easily be extended to a scenario where the GPU hardware is not trusted. In this case the linear computations performed by the GPU must also be verified. In the interest of space, we only provide an insight into how DarKnight can perform data integrity checks for inference. Similar extensions for training are also possible. Recall that DarKnight creates $K+1$ blinded inputs for $K$ original inputs. To provide integrity, DarKnight creates one additional linear combination of the inputs using the same approach as in Eq. 1. This additional computation allows every output $\mathbf{y}^{(i)}$ to be extracted by the SGX enclave by solving two different sets of linear equations. With this redundant equation, we can detect an error when the two ways of extracting each $\mathbf{y}^{(i)}$ do not match. If an error is detected, the enclave may take additional corrective action, such as executing on another GPU worker or performing additional redundant computations. These actions are outside the scope of our current work.
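A redundancy check of this kind can be sketched as follows. This is a hedged illustration, not the paper's implementation: it decodes from the first equations and tests the extra one for consistency, which is equivalent to comparing two decodings. The dense layer, the random Gaussian coefficient matrix `A`, the tolerance, and the helper name `decode_and_check` are all assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(2)

K, d_in, d_out = 3, 16, 8
W = rng.standard_normal((d_in, d_out))
X = rng.standard_normal((K, d_in))
r = 100.0 * rng.standard_normal(d_in)
Z = np.vstack([X, r])                      # K+1 unknown rows: inputs plus noise

A = rng.standard_normal((K + 2, K + 1))    # K+2 equations for K+1 unknowns
X_blinded = A @ Z


def decode_and_check(Y_blinded, tol=1e-6):
    """Decode from the first K+1 equations and test the redundant one."""
    Y_all = np.linalg.solve(A[:K + 1], Y_blinded[:K + 1])
    residual = np.abs(A[K + 1] @ Y_all - Y_blinded[K + 1]).max()
    consistent = residual <= tol * max(1.0, np.abs(Y_blinded).max())
    return Y_all[:K], consistent


Y_honest = X_blinded @ W                   # what a correct GPU returns
Y, ok = decode_and_check(Y_honest)

Y_tampered = Y_honest.copy()
Y_tampered[0] *= 2.0                       # a faulty or malicious GPU result
_, ok_tampered = decode_and_check(Y_tampered)
```

With three images this gives five equations and four unknowns, which matches the DarKnight(3)+Integrity configuration used in the experiments.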
3.4 Random Noise and Quantization
The strength of the random noise added by DarKnight provides strong privacy guarantees for both training and inference. However, strong noise may also obfuscate the original signal due to floating-point errors, particularly over long-running training iterations where the errors may accumulate. On one hand, a powerful random noise gives a more robust privacy guarantee; on the other hand, a strong noise requires a stricter quantization mechanism and consequently may affect the accuracy of training and inference. Quantized DNNs have been widely studied in recent years Hubara et al. (2017); Zhou et al. (2016); Gupta et al. (2015); Galloway et al. (2017); Panda et al. (2019). In this work we assume that either appropriate quantization is already performed or the errors introduced by the random noise do not accumulate significantly during training. In the next section, we run numerical experiments to evaluate the effect of additive noise on the accuracy of inference and training.
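The floating-point concern can be made concrete with a small experiment. This sketch only illustrates the rounding effect of adding and then removing a large noise value; the noise scale of 1e6, the input range, and the helper `roundtrip_error` are assumptions for the example and do not reflect the quantization scheme actually used.

```python
import numpy as np

rng = np.random.default_rng(3)
x = rng.uniform(0.0, 1.0, size=10_000)       # stand-in for input features in [0, 1)
r = 1e6 * rng.standard_normal(10_000)        # noise several orders of magnitude larger


def roundtrip_error(dtype):
    """Max error after blinding with r and unblinding again, all in `dtype`."""
    blinded = x.astype(dtype) + r.astype(dtype)
    recovered = blinded - r.astype(dtype)
    return float(np.abs(recovered.astype(np.float64) - x).max())


err32 = roundtrip_error(np.float32)
err64 = roundtrip_error(np.float64)
print(f"float32: {err32:.3e}, float64: {err64:.3e}")
```

In single precision the low-order bits of the input are rounded away once the noise dominates the magnitude, so the recovered value is visibly off, while double precision keeps the error negligible at this noise scale.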
All the experiments were run on an Intel(R) Coffee Lake E-2174G 3.80GHz processor. This server has 64 GB of RAM and supports Intel Software Guard Extensions (SGX). The co-located GPU used for linear operations is an Nvidia GeForce GTX 1080 Ti.
In all of the following simulations, a single thread is performing the TEE’s functionality. Adding multiple threads has its complications especially because of the memory requirement for thread creation.
We used three different DNN models: VGG16 Simonyan and Zisserman (2014), VGG19 Simonyan and Zisserman (2014), and MobileNet Howard et al. (2017). MobileNet is designed as a small DNN that can be used on small devices and phones. It therefore replaces standard convolution with depth-wise convolution followed by point-wise convolution to reduce the number of linear operations. We used three well-known datasets for inference and training. One is CIFAR-10 Krizhevsky et al. (2009), which has 50000 training images evenly distributed between 10 categories. Each 32x32 image stores three bytes of RGB for each pixel. CIFAR-100 Krizhevsky et al. (2009) has 100 classes, and each class contains 600 images. The third dataset is ImageNet Russakovsky et al. (2015), an image dataset organized according to the WordNet hierarchy, with more than 1.2 million images and 1000 categories.
For the sake of comparison we implement two baselines: one fully implemented on the GPU and the other fully on SGX. For inference, we also compare our performance with Slalom Tramer and Boneh (2018), a privacy scheme that can only be used for inference. The GPU version does not provide any security guarantee. For inference, we show how our model speeds up the execution time for VGG16, VGG19 and MobileNet. Moreover, we examine the effect of different noise signals on the accuracy of inference. For training, the effect of noise on accuracy and convergence is analysed first. Furthermore, the timing of each operation is broken down for both the forward and backward pass. Our source code, GPU comparisons, and more results for more models and datasets are available at our github:
Effect of Virtual Batch Size: Figure 1(a) demonstrates the effect of merging multiple images on the speedup of inference. As we explained in Section 2.1, the memory capacity of SGX is very limited, and processing more data than fits in SGX causes performance degradation because of the complicated encryption and eviction procedure. DarKnight($K$) denotes a virtual batch size of $K$, meaning that $K$ input images are combined and blinded with a noise signal. The X-axis shows the SGX operations and the Y-axis represents the speedup relative to the baseline of DarKnight(1) for each operation. As shown, combining more images yields a speedup in all the operations up to a certain point. Merging more than that number of images degrades performance significantly. The reason behind this observation is that after adding 4 images the available enclave memory is saturated. In the rest of the inference section we use DarKnight(4), which shows the most promising performance.
Inference Speedup: Figure 1(b) compares the speedup of the inference process for VGG16, VGG19 and MobileNet. The compared configurations are: performing all the calculations on SGX, using Slalom for inference, using Slalom with integrity, using DarKnight(4), and using DarKnight(3)+Integrity. A few remarks are in order. As observed in the figure, integrity increases the execution time, since we add redundancy to the calculations to guarantee integrity. We also observe -fold speedup, compared to the fully on SGX baseline, and improvement compared to Slalom for VGG16. This advantage originates from two points. First, Slalom stores many random vectors, $\mathbf{r}$'s, and their corresponding $\mathbf{W}\cdot\mathbf{r}$. Therefore, it has less memory available for processing images and cannot simultaneously process the same number of images as our model. To avoid storing the $\mathbf{W}\cdot\mathbf{r}$'s inside SGX, Slalom stores an encrypted version of them in unprotected memory. In each iteration, and in each layer, SGX has to go through the decryption procedure, which has a substantial performance overhead. Second, Slalom does not fully utilize the SGX memory, as it has to process one image at a time sequentially. For integrity checks Slalom uses Freivalds' algorithm. However, its implementation only checks the integrity of convolution layers; integrity checks of dense layers are disregarded for simplicity. In our implementation, on the other hand, the integrity of all the linear operations, including dense layers, is checked. For integrity checks in our design we used the DarKnight(3) model, in which three images are linearly combined and covered by noise. The reason is that when integrity checks are added to the design, one extra equation is generated. In other words, we will have $K+2$ equations and $K+1$ unknowns. We simply have to make sure that the result derived from the first four equations is consistent with the fifth equation. To avoid overflowing the SGX memory, it is beneficial to reduce the number of images from four to three, hence we use the DarKnight(3) model. As depicted, our implementation of VGG16 with integrity has performance improvement, compared to Slalom, while offering more robustness by checking the integrity of all linear layers.
The same scheme is used for analyzing MobileNet's behaviour. As mentioned in section 4.1, MobileNet reduces the amount of convolution computation, which is why we expect it to show less performance improvement with our method relative to the baseline. Although its speedup over the SGX baseline is less than that of VGG16, we still observe a x speedup, compared to SGX model, and speedup over Slalom.
Table 2: Effect of noise on Top1 and Top5 inference accuracy (columns: Noise, followed by Top1 Accuracy and Top5 Accuracy for each of the three models).
Effect of Noise on Accuracy: For our simulations, we use a random Gaussian vector with iid entries, $\mathcal{N}(\mu, \sigma^2)$, as the noise vector $\mathbf{r}$. In Table 2, we investigate the effect of various means and variances of the noise on the inference accuracy of VGG16, VGG19 and MobileNet. It is worth mentioning that the accuracy of the baseline (first row of the table) could be improved with different mechanisms, which is out of the scope of this work. While a negligible accuracy loss is observed for a few of the noise signals, for most of them adding a noise signal causes no accuracy degradation. This argument holds for VGG19 and MobileNet, and also for CIFAR-10 and CIFAR-100. The noise signals we applied here are orders of magnitude larger than the values in the dataset.
We used the Intel Deep Neural Network Library (DNNL) for designing the DNN layers, including the convolution layer, ReLU, MaxPooling, and the dense layer. For evaluating training performance, two aspects are examined. One is the accuracy of training when noise is added to the data, and the other is the execution time of training.
Effect of Noise on Accuracy: As depicted in Figure 3, the accuracy of training for different noise signals is measured during the first epochs on VGG16 and VGG19. Figure 2(a) shows the accuracy of training for VGG16 on the CIFAR-10 dataset. As demonstrated, the accuracy after epoch with some of the noise signals is less than and for some no accuracy degradation is observed. The same argument holds for CIFAR-100, as depicted in 2(b), and for VGG19.
Training Execution Time: Figure 4 demonstrates the speed of training using DarKnight relative to the baseline fully implemented on SGX, for both the forward pass and backward propagation. It breaks down the execution time spent in each category and shows the speedup we get per category of operations per image. For VGG16, for example, the baseline linear operation takes and of the execution time in forward and backward pass respectively. In the forward pass, the sum of blinding/unblinding and the linear operations reduces the linear operation time on SGX by and hence the total execution time is cut by . For backward propagation, this reduction is even higher, since the total blinding/unblinding plus linear operation in DarKnight takes only of the time that the baseline spent on its linear computation. As a result, DarKnight speeds up the backward propagation by more than times. It is worth mentioning that our implementation of ReLU takes longer than the baseline because part of the unblinding function is fused into the ReLU function for performance purposes. For VGG19 the same behaviour is observed.
MLaaS has recently attracted the attention of many data holders. Among the most important concerns of data owners are data privacy and computation integrity. In this work, we address these issues using Trusted Execution Environments (TEEs). This work uses TEEs to establish a solid baseline for both training and inference by minimizing information leakage and increasing the robustness of the computation. To the best of our knowledge, this is the first work that uses TEEs for training large DNNs while achieving performance comparable to fast insecure hardware. We achieve a significant speedup by offloading the computationally expensive operations to the co-located GPU while keeping the non-linear operations inside the trusted hardware. For training, we preserve the privacy of the training points in standard SGD by linearly combining the training points and taking advantage of additive stream cipher noise. For training and inference, a novel encoding and decoding procedure is designed to reduce the overhead of the blinding and unblinding functions, which prepare the data for exposure to the unguarded hardware.
Machine learning has more applications today than ever before. The need for data privacy has emerged especially now that smart homes, autonomous cars, and personal assistants are becoming widespread. These applications need to process streams of data flowing to them in real time. Since users do not want to reveal data about their personal lives, this real-time data processing needs to preserve privacy.
The idea of input privacy using TEEs can be used by all the companies that offer MLaaS and by all their customers. This means that data holders can expose their private data to the secure part of the cloud without any privacy concerns. The cloud itself takes responsibility for protecting the data when it needs to perform computationally expensive operations. Therefore, data holders do not need any extra computation or communication power. MLaaS providers, in turn, do not need to design a different cryptographic method each time their customers have a specific security requirement. The design effort is offloaded to the TEE designer, and once the design is finished, many applications can take advantage of it. This is the first approach combining hardware and software to provide security, and as in every other area, having hardware designed for a specific task can improve performance significantly. Furthermore, MLaaS providers do not need to redesign the models or change any of the parameters to provide security.
One disadvantage is that if an MLaaS provider uses a specific TEE to preserve privacy, there will be attacks targeting that TEE. Since the data is in the clear for the computations inside SGX, an attacker who is able to compromise the TEE has full access to the raw data. Such an attacker can also modify the training data in a way that leads to noisy training.
-  (2016) Deep learning with differential privacy. In Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security, pp. 308–318. Cited by: §2.2, Table 1.
-  (2004) Trustzone: integrated hardware and software security. White paper. Cited by: §2.1.
-  (2017) Secure multiparty computation from sgx. In International Conference on Financial Cryptography and Data Security, pp. 477–497. Cited by: Table 1.
-  (2016) An analysis of deep neural network models for practical applications. arXiv preprint arXiv:1605.07678. Cited by: §3.2.
-  (2016) Intel sgx explained.. IACR Cryptology ePrint Archive 2016 (086), pp. 1–118. Cited by: §2.1.
-  (2016) Sanctum: minimal hardware extensions for strong software isolation. In 25th USENIX Security Symposium (USENIX Security 16), pp. 857–874. Cited by: §2.1.
-  (2014) Rappor: randomized aggregatable privacy-preserving ordinal response. In Proceedings of the 2014 ACM SIGSAC conference on computer and communications security, pp. 1054–1067. Cited by: §2.2, Table 1.
-  (2019) A guide to deep learning in healthcare. Nature medicine 25 (1), pp. 24–29. Cited by: §1.
Learning to communicate with deep multi-agent reinforcement learning. In Advances in neural information processing systems, pp. 2137–2145. Cited by: §1.
Attacking binarized neural networks. arXiv preprint arXiv:1711.00449. Cited by: §3.4.
Fully homomorphic encryption using ideal lattices.
Proceedings of the forty-first annual ACM symposium on Theory of computing, pp. 169–178. Cited by: §2.2, Table 1.
-  (2016) Cryptonets: applying neural networks to encrypted data with high throughput and accuracy. In International Conference on Machine Learning, pp. 201–210. Cited by: §2.2, Table 1.
-  (2018) Securing input data of deep learning inference systems via partitioned enclave execution. arXiv preprint arXiv:1807.00969. Cited by: §1, §2.2, Table 1.
-  (2015) Deep learning with limited numerical precision. In International Conference on Machine Learning, pp. 1737–1746. Cited by: §3.4.
-  (2015) Deep compression: compressing deep neural networks with pruning, trained quantization and huffman coding. arXiv preprint arXiv:1510.00149. Cited by: §3.2.
-  (2018) Mlcapsule: guarded offline deployment of machine learning as a service. arXiv preprint arXiv:1808.00590. Cited by: §1, §2.2, Table 1.
-  (2017) Deep learning for finance: deep portfolios. Applied Stochastic Models in Business and Industry 33 (1), pp. 3–12. Cited by: §1.
-  (2017) Mobilenets: efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861. Cited by: §4.1.
-  (2017) Quantized neural networks: training neural networks with low precision weights and activations. The Journal of Machine Learning Research 18 (1), pp. 6869–6898. Cited by: §3.4.
-  (2018) Chiron: privacy-preserving machine learning as a service. arXiv preprint arXiv:1803.05961. Cited by: §1, §2.2, Table 1.
-  (2018) Efficient deep learning on multi-source private data. arXiv preprint arXiv:1807.06689. Cited by: §1, §2.2, Table 1.
-  (2018) Gazelle: a low latency framework for secure neural network inference. In 27th USENIX Security Symposium (USENIX Security 18), pp. 1651–1669. Cited by: §2.2, Table 1.
-  (2020) COIN attacks: on insecurity of enclave untrusted interfaces in sgx. In Proceedings of the Twenty-Fifth International Conference on Architectural Support for Programming Languages and Operating Systems, pp. 971–985. Cited by: §2.1.
-  (2009) Learning multiple layers of features from tiny images. Online: http://www.cs.toronto.edu/~kriz/cifar.html. Cited by: §4.1.
-  (2017) Inferring fine-grained control flow inside sgx enclaves with branch shadowing. In 26th USENIX Security Symposium (USENIX Security 17), pp. 557–574. Cited by: §2.1.
-  (2018) Privacy aware offloading of deep neural networks. arXiv preprint arXiv:1805.12024. Cited by: §2.2, Table 1.
-  (2017) Oblivious neural network predictions via minionn transformations. In Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security, pp. 619–631. Cited by: §2.2, Table 1.
-  (2020) Shredder: learning noise distributions to protect inference privacy. In Proceedings of the Twenty-Fifth International Conference on Architectural Support for Programming Languages and Operating Systems, pp. 3–18. Cited by: §2.2, Table 1.
-  (2018) ABY3: a mixed protocol framework for machine learning. In Proceedings of the 2018 ACM SIGSAC Conference on Computer and Communications Security, pp. 35–52. Cited by: §2.2, Table 1.
-  (2017) Secureml: a system for scalable privacy-preserving machine learning. In 2017 IEEE Symposium on Security and Privacy (SP), pp. 19–38. Cited by: §2.2, Table 1.
-  (2016) Oblivious multi-party machine learning on trusted processors. In 25th USENIX Security Symposium (USENIX Security 16), pp. 619–636. Cited by: §1, §2.2, Table 1.
-  (2019) Discretization based solutions for secure machine learning against adversarial attacks. IEEE Access 7, pp. 70157–70168. Cited by: §3.4.
-  (2015) Imagenet large scale visual recognition challenge. International journal of computer vision 115 (3), pp. 211–252. Cited by: §4.1.
-  (2015) Privacy-preserving deep learning. In Proceedings of the 22nd ACM SIGSAC conference on computer and communications security, pp. 1310–1321. Cited by: §2.2, Table 1.
-  (2014) Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556. Cited by: §4.1.
-  (2017) Learning with privacy at scale. Apple Mach. Learn. J 1 (9). Cited by: §2.2, Table 1.
-  (2018) Slalom: fast, verifiable and private execution of neural networks in trusted hardware. arXiv preprint arXiv:1806.03287. Cited by: §1, §2.2, Table 1, §4.2.
-  (2019) Securenn: 3-party secure computation for neural network training. Proceedings on Privacy Enhancing Technologies 2019 (3), pp. 26–49. Cited by: §2.2, Table 1.
-  (2018) Not just privacy: improving performance of private deep learning in mobile cloud. In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, pp. 2407–2416. Cited by: §2.2, Table 1.
-  (2017) Leaky cauldron on the dark land: understanding memory side-channel hazards in sgx. In Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security, pp. 2421–2434. Cited by: §2.1.
-  (2016) AsyncShock: exploiting synchronisation bugs in intel sgx enclaves. In European Symposium on Research in Computer Security, pp. 440–457. Cited by: §2.1.
-  (2015) Controlled-channel attacks: deterministic side channels for untrusted operating systems. In 2015 IEEE Symposium on Security and Privacy, pp. 640–656. Cited by: §2.1.
-  (2016) Dorefa-net: training low bitwidth convolutional neural networks with low bitwidth gradients. arXiv preprint arXiv:1606.06160. Cited by: §3.4.
-  (2014-February 25) System and method for predicting behaviors of detected objects. Google Patents. Note: US Patent 8,660,734 Cited by: §1.