Security and Privacy Issues in Deep Learning

07/31/2018 ∙ by Ho Bae, et al. ∙ Seoul National University 2

With the development of machine learning, expectations for artificial intelligence (AI) technology are increasing day by day. In particular, deep learning has shown strong performance in a variety of fields. Many applications that are closely related to our daily life, such as those that make significant decisions based on predictions or classifications, can rely on a deep learning (DL) model. Hence, if a DL model produces mispredictions or misclassifications because of malicious external influences, it can cause serious difficulties in real life. Moreover, training deep learning models relies on an enormous amount of data, and the training data often includes sensitive information. Therefore, deep learning models should not expose the privacy of such data. In this paper, we review the threats to, and the defense methods developed for, the security of the models and the privacy of the data under the notion of SPAI: Secure and Private AI. We also discuss current challenges and open issues.




I Introduction

Advances in deep learning (DL) algorithms have transformed the solution of data-driven problems in various real-life applications, including the use of large amounts of patient data for health prediction services [1]; autonomous security audits from system logs [2]; and unmanned car driving powered by visual object detection [3]. However, a large body of recent literature has uncovered vulnerabilities of DL systems. It is dangerous for these applications to be built on little understanding of the security and privacy of DL systems.

Although many research studies have been published on both attacks and defense with deep learning security and privacy, they are still fragmented. Hence we review recent attempts toward Secure AI and Private AI. Addressing the need for robust artificial intelligence (AI) systems in security and privacy, we develop a perspective on SPAI: Secure and Private AI. Secure AI aims for AI systems that have high security guarantees; Private AI aims for AI systems that preserve the data privacy. Additionally, as a part of the effort to build the SPAI system, we review the fragmented findings and attempts to address the attacks and defenses in deep learning security and privacy.

Secure AI focuses on attacks on, and defenses of, AI systems, which in the case of DL means the model. Using knowledge of the structure and parameters of the model, attacks on DL models usually attempt to subvert the learning process or to induce false predictions on purpose by injecting adversarial samples. This type of attack, which can include gradient-based techniques [4, 5], is often called a white-box attack. In contrast, black-box attacks lead the target system to make false predictions without any information about the underlying model. We observe that most of these attacks exploit the prediction confidence returned by the targeted model, without knowing the model’s structure and parameters.

To defend against these attacks, methods such as adversarial training [5, 6, 7], gradient masking [8, 9, 10], GAN-based defenses [11, 10], and statistical approaches [12, 13, 14] have been proposed. Table II lists recent research on attacks on various deep learning models, with their structures and parameters, together with the defenses against these attacks.

On the other hand, Private AI aims for AI systems that preserve data privacy. DL requires users to transfer some sensitive data to remote machines because of the computational cost or the need for collaborative training. In such situations, users lose control over the data after the transfer: the data may be stolen in transit, or the service holders to whom the data is uploaded may misuse it without consent. It has also been claimed that the data used for training can be recovered from the deployed DL model alone [15]. Against such privacy threats, privacy-preserving techniques including fully homomorphic encryption (FHE) [16, 17, 18, 19, 20], differential privacy [21, 22, 23, 24, 25, 26, 27, 28], and secure multi-party computation (SMC) [29, 30] have been combined with DL frameworks. Table IV lists recent research on attacks that expose the privacy of the training and test data and on the corresponding defenses.

We review recent research on privacy and security issues associated with deep learning in several domains. Additionally, we taxonomize possible attacks and the state-of-the-art defense methods on Secure AI and Private AI. To the best of our knowledge, our work is the first attempt to taxonomize approaches to privacy in deep learning.

I-A Adversarial Examples in Real World Setting

The first adversarial attacks were performed on images in a non-targeted manner. Targeted attacks, which maximize the likelihood of a chosen target class, were developed soon afterwards. A recent targeted attack was performed on the Google Cloud Vision (GCV) API, compromising a commercial system. Such an attack is problematic for second-tier service providers that build decision-making services on the returned prediction scores. For example, a service provider that uses the GCV API for autonomous driving will fail to stop at an adversarially crafted stop sign [31]. Finlayson et al. [32] extended adversarial attacks to medical imaging tasks, presenting both white- and black-box attacks that fool medical deep learning classifiers. As such, they showed that adversarial attacks can cause harm in the medical domain. In addition to the medical domain, Carlini and Wagner [33] proposed the first adversarial examples for automatic speech recognition. They applied a white-box iterative optimization-based attack and reported a 100% success rate, which shows that the feasibility of adversarial attacks on images carries over to another domain. They reconstructed the input waveform using a conventional distortion measure and, given any audio waveform, produced audio with 99.9% similarity that is transcribed as the desired phrase.

II Background

Behind the success of deep learning lie the advances in deep neural networks (DNNs) trained with an extensive amount of data. In this section, we introduce the components and the training algorithm of a DNN. Further, we describe the recent DL models that are widely used. The building block of a neural network is an artificial neuron, which was designed to resemble a human neuron. However, because the actual biological activities inside human neurons are not yet fully understood, artificial neurons simply compute the weighted sum of the inputs followed by an activation, as follows:

$$y = \sigma\left(\sum_{i=1}^{n} w_i x_i\right) \tag{1}$$

where the input is $x = (x_1, \ldots, x_n)$, the output is $y$, the activation function is $\sigma$, and the weights are $w = (w_1, \ldots, w_n)$. The artificial neurons are used as nodes to construct layers, and by piling up these layers deep neural networks (DNNs) are constructed. The activation functions are nonlinear functions such as the sigmoid, tanh and ReLU. The nonlinearity of the activation function piles up as the number of layers grows and enables DNNs to approximate target functions without any handcrafted feature selections.
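As a minimal sketch of the artificial neuron described above, the following Python code computes a weighted sum of the inputs and applies an activation. The function names `sigmoid`, `relu`, and `neuron`, as well as the input and weight values, are illustrative assumptions and do not come from the paper.

```python
import numpy as np

def sigmoid(z):
    """Logistic sigmoid, one common choice of activation function."""
    return 1.0 / (1.0 + np.exp(-z))

def relu(z):
    """Rectified linear unit, another common activation function."""
    return np.maximum(0.0, z)

def neuron(x, w, activation=sigmoid):
    """Weighted sum of the inputs followed by a nonlinear activation."""
    return activation(np.dot(w, x))

# Illustrative (assumed) input and weights
x = np.array([1.0, -2.0, 0.5])
w = np.array([0.4, 0.1, -0.6])

y_sigmoid = neuron(x, w)                   # sigmoid of the weighted sum
y_relu = neuron(x, w, activation=relu)     # ReLU of the same weighted sum
```

Stacking many such neurons into layers, and layers into a network, gives the DNN structure discussed in the rest of this section.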

Fig. 1: General DNN training process

II-A Artificial Intelligence powered by Deep Learning

II-A1 Deep Learning Workflow: Training and Inference

The workflow of the DL contains two phases: training and inference. DNNs learn new capabilities through the training phase from the existing data, and the learned capabilities are applied to unseen data at the inference phase.

The overview of the DNN training process is described in Figure 1. DNNs are trained by iterating feedforward and backpropagation until convergence. At the feedforward stage, the input propagates along the layers to compute the output. Then, to minimize the error between the output and the actual label, the gradient descent algorithm is used:

$$w \leftarrow w - \eta \, \nabla_{w} L(w) \tag{2}$$

where $L$ is the loss function, $w$ denotes the weight parameters, and $\eta$ is the learning rate. Hence, at each backpropagation stage, each node computes the gradient and updates the weight parameters as described in Equation 2. However, iterating this process over the full batch of data (all instances in the data) is highly inefficient because the training data required for DNN training is enormous. Therefore, mini-batch stochastic gradient descent (mini-batch SGD or SGD) is widely used. After the model converges to a certain accuracy or loss value, it is used for prediction at the inference stage. At the inference phase, the model only forward propagates the input and regards the output as a prediction.
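The training loop described above can be sketched with mini-batch SGD on a toy problem. This is only an illustration under stated assumptions: the model is a single linear layer with a mean-squared-error loss rather than a DNN, and the data, the function name `minibatch_sgd`, and all hyperparameters are hypothetical.

```python
import numpy as np

def minibatch_sgd(X, y, lr=0.1, batch_size=16, epochs=200, seed=0):
    """Fit linear weights by mini-batch SGD on a mean-squared-error loss."""
    rng = np.random.default_rng(seed)
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        order = rng.permutation(len(X))            # reshuffle every epoch
        for start in range(0, len(X), batch_size):
            idx = order[start:start + batch_size]
            pred = X[idx] @ w                      # feedforward on the mini-batch
            grad = 2.0 * X[idx].T @ (pred - y[idx]) / len(idx)  # gradient of the loss
            w -= lr * grad                         # gradient descent weight update
    return w

# Synthetic, noiseless regression data (assumed for illustration)
data_rng = np.random.default_rng(1)
X = data_rng.normal(size=(256, 3))
true_w = np.array([1.5, -2.0, 0.5])
y = X @ true_w

w_hat = minibatch_sgd(X, y)          # training phase
prediction = X[:5] @ w_hat           # inference phase: forward pass only
```

In a real DNN the gradient would be obtained by backpropagation through all layers; the update rule applied to the weights is the same.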

II-A2 Different Types of Deep Neural Network Models

Fig. 2: Different DNN model structures: (a) CNN structure, (b) RNN structure [34], (c) GAN structure

  • Feed-forward Neural Network (FNN). A feed-forward neural network (FNN) is the most basic structure of the DNNs. It contains multiple layers, and the nodes between layers are fully connected while the intra-layer nodes are not connected to one another.

  • Convolutional Neural Network (CNN). A convolutional neural network (CNN) consists of one or more convolutional layers, which use convolutional operations to compute layer-wise results. This operation allows the network to learn about spatial information and hence CNNs show outstanding performances especially on vision applications [35, 36, 37].

  • Recurrent Neural Network (RNN). A recurrent neural network (RNN) is widely used to process sequential data. The RNN updates the current hidden unit and calculates the output based on the current input and the past hidden unit. RNNs have well-known problems such as the vanishing gradient problem, and variants such as long short-term memory (LSTM) [38] and the gated recurrent unit (GRU) [39] have been proposed to solve such problems.

  • Generative Adversarial Network (GAN). A generative adversarial network (GAN) framework [40] consists of a discriminator $D$ and a generator $G$. $G$ generates fake data while $D$ determines whether the generated data is real. Usually generators and discriminators are neural networks that can have various structures depending on the application. GANs are actively studied in various fields, such as image/speech synthesis and domain adaptation.

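II-B Privacy-preserving Techniques

II-B1 Homomorphic Encryption

An encryption scheme that allows computations on encrypted data without decrypting it or having access to any decryption key is called homomorphic encryption (HE). In other words, the encryption scheme satisfies the following equation:

$$Enc(a) \diamond Enc(b) = Enc(a * b) \tag{3}$$

where $Enc: \mathcal{X} \rightarrow \mathcal{Y}$ is the encryption function, $a, b \in \mathcal{X}$ are messages, and $*$ and $\diamond$ are the operations defined in $\mathcal{X}$ and $\mathcal{Y}$, respectively; a scheme with this property is called a homomorphic encryption scheme.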

Early homomorphic cryptosystems were partially homomorphic cryptosystems [41, 42, 43, 44] that showed either additive or multiplicative homomorphism [45]. However, after the work using ideal lattices by Gentry and Boneh [45] was introduced, various fully homomorphic encryption (FHE) schemes, which allow any computable function to be performed on the encrypted data, have been proposed [46, 47, 48, 49, 17, 50].
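To make the homomorphic property concrete, the sketch below uses unpadded ("textbook") RSA, which is multiplicatively homomorphic: multiplying two ciphertexts yields a ciphertext of the product of the plaintexts. This is an illustrative example of partial homomorphism that we chose ourselves, not one of the schemes surveyed here; the tiny primes and the names `enc` and `dec` are assumptions, and the scheme is insecure as written. It requires Python 3.8 or later for the modular inverse via `pow`.

```python
# Toy textbook RSA with tiny primes (illustration only, NOT secure)
p, q = 61, 53
n = p * q                      # public modulus
phi = (p - 1) * (q - 1)
e = 17                         # public exponent
d = pow(e, -1, phi)            # private exponent: modular inverse of e mod phi

def enc(m):
    """Encrypt an integer message m with 0 <= m < n."""
    return pow(m, e, n)

def dec(c):
    """Decrypt a ciphertext c."""
    return pow(c, d, n)

a, b = 7, 12
c_product = (enc(a) * enc(b)) % n   # operate on ciphertexts only
recovered = dec(c_product)          # equals (a * b) mod n
```

Here the operation on ciphertexts is multiplication modulo `n` and the operation on plaintexts is multiplication. An FHE scheme would additionally support addition, and hence arbitrary circuits, on encrypted data.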

Although FHE can benefit many applications, including cloud computing platforms and secure multi-party computation, the massive data inputs and computational workloads, as well as the nonlinearity of DL models, still make it burdensome to combine FHE with deep learning.

II-B2 Differential Privacy

Fig. 3: Overview of the differential privacy framework

Differential privacy is one of the state-of-the-art privacy-preserving models [51]; it guarantees that an attacker cannot deduce any private information with high confidence from databases or released models. In other words, differentially private algorithms prevent an attacker from learning of the existence of a particular record by adding noise to the query responses.

Here is the attack scenario that is assumed in differential privacy algorithms: An attacker is allowed to query two adjacent databases, which vary in at most one record. By sending the same query to both databases, the difference between the respective responses is considered to arise from “one record.” For example, imagine that there is a database on weights and one can query only the average value of all records. In this situation, it is impossible to grasp a specific person’s weight. However, if a new record is added and the attacker knows the former average weight, it is possible for the attacker to figure out the weight of the person added.

Differential privacy counters such privacy threats by adding noise to the response as follows:

$$\mathcal{M}(D) = f(D) + \text{noise} \tag{4}$$

where $\mathcal{M}$ is a randomized mechanism that applies the noise to the query response; $D$ is the target database, and $f$ is the original query response, which is deterministic.

$\mathcal{M}$ gives $\epsilon$-differential privacy if all adjacent $D$ and $D'$ satisfy the following:

$$\Pr[\mathcal{M}(D) \in S] \le e^{\epsilon} \, \Pr[\mathcal{M}(D') \in S] \tag{5}$$

where $D$ and $D'$ are two adjacent databases, and $S$ is a subset of $\mathrm{Range}(\mathcal{M})$. $\epsilon$ is the privacy budget that controls the privacy level; the smaller $\epsilon$ is set, the more similar $\mathcal{M}(D)$ and $\mathcal{M}(D')$ are required to be. These facts show that there is a trade-off between the data utility and the privacy level.

Since Equation 5 is a strict condition, $(\epsilon, \delta)$-differential privacy introduces the $\delta$ term, which loosens the bound by the amount of $\delta$. In other words, $\delta$ allows $\mathcal{M}$ to satisfy the differential privacy condition even if the probabilities are somewhat different. The definition of $(\epsilon, \delta)$-differential privacy holds when the following equation is satisfied:

$$\Pr[\mathcal{M}(D) \in S] \le e^{\epsilon} \, \Pr[\mathcal{M}(D') \in S] + \delta \tag{6}$$

where $\delta$ is another privacy budget which controls the privacy (confidence) levels.

Usually, the noise is sampled from the Laplace distribution or the Gaussian distribution [51]. Each distribution depends on the sensitivity and the privacy budgets. The sensitivity $\Delta f$ [51] of the query response function $f$ captures how much one record can affect the output, and it can be calculated as the maximum difference between responses on the adjacent databases:

$$\Delta f = \max_{D, D'} \lVert f(D) - f(D') \rVert \tag{7}$$

A larger sensitivity demands a larger amount of noise under the same privacy budget. There are also useful theorems stating that the composition of differentially private mechanisms is itself a differentially private mechanism. The composition theorem [23, 52], the advanced composition theorem [53, 24, 54], and the moments accountant [21] have been proposed.
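The weight-database example above can be sketched with the Laplace mechanism, which adds Laplace noise with scale equal to the sensitivity divided by the privacy budget. The following is a minimal illustration under our own assumptions: the records, the clipping bounds `lo` and `hi`, the function name `laplace_mechanism`, and the choice of adjacency (replacing one record in a database of fixed size, which gives a sensitivity of `(hi - lo) / n` for the mean) are all hypothetical.

```python
import numpy as np

def laplace_mechanism(true_answer, sensitivity, epsilon, rng):
    """Return the query answer plus Laplace noise of scale sensitivity / epsilon."""
    scale = sensitivity / epsilon
    return true_answer + rng.laplace(loc=0.0, scale=scale)

# Hypothetical database of body weights in kilograms
weights = np.array([62.0, 75.5, 81.2, 58.3, 90.1])

# Assumed public bounds; clipping makes the sensitivity of the mean finite
lo, hi = 40.0, 120.0
clipped = np.clip(weights, lo, hi)

# Replacing one record changes the mean by at most (hi - lo) / n
sensitivity = (hi - lo) / len(clipped)

rng = np.random.default_rng(0)
noisy_mean = laplace_mechanism(clipped.mean(), sensitivity, epsilon=0.5, rng=rng)
```

A smaller `epsilon` or a larger sensitivity increases the noise scale, which reflects the trade-off between data utility and privacy level discussed above.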

White-box Black-box Training Phase Inference Phase
Adversarial Attack Types ↓
TABLE I: Attack Methods against Secure AI
Attacks Defenses
White-box Gradient
Szegedy et al. [55] Masking Athalye et al. [56]
Evasion Finlayson et al. [32] Athalye and Sutskever [57]
Huang et al. [58] Buckman et al. [8]
Alfeld et al. [59] Carlini and Wagner [60]
Moosavi-Dezfooli et al. [61] Zantedeschi et al. [62]
Jagielski et al. [63] Dhillon et al. [9]
Goodfellow et al. [5] Ma et al. [64]
Kurakin et al. [65] Na et al. [66]
Dong et al. [67] Kolter and Wong [68]
Papernot et al. [69] Buckman et al. [8]
Carlini and Wagner [70] Guo et al. [71]
Baluja and Fischer [72] Ilyas et al. [73]
Athalye et al. [56] Ma et al. [64]
Biggio et al. [4] Na et al. [66]
Samangouei et al. [11]
Sharma and Chen [74]
Song et al. [10]
Black-box Xie et al. [75]
Kurakin et al. [65] Papernot et al. [76]
Ateniese et al. [77] Metzen et al. [78]
Papernot et al. [79] Papernot et al. [76]
Huang et al. [58] Sinha et al. [80]
Grosse et al. [81] He et al. [82]
Alfeld et al. [59]
Long et al. [83] Adversarial
Elsayed et al. [31] Training Goodfellow et al. [5]
Behzadan and Munir [84] Sun et al. [6]
Kurakin et al. [65] Gu et al. [7]
Kurakin et al. [85]
Tramèr et al. [86]
Samangouei et al. [11]
Song et al. [10]
Approach Steinhardt et al. [12]
Paudice et al. [13]
Paudice et al. [14]
Poisoning Biggio et al. [87] Steinhardt et al. [12]
Mei and Zhu [88] Koh and Liang [89]
Patil et al. [90] Paudice et al. [13]
Muñoz-González et al. [91] Paudice et al. [14]
Yang et al. [92]
Chen et al. [93]
TABLE II: Secure Vulnerability in AI and the Corresponding Defense Methods
Potential Threats by Role ↓ | Homomorphic Encryption | Differential Privacy | Secure Multi-party Training
Model & Service Providers [15]
Information Silos
DL Service Users
TABLE III: Potential Privacy Threats against Private AI and the Corresponding Defense Methods
Privacy-preserving Techniques
Homomorphic Encryption Gilad-Bachrach et al. [16] Hesamifard et al. [17] Chabanne et al. [18]
Bourse et al. [19] Sanyal et al. [20]
Differential Privacy gradient-level objective-level label-level
Abadi et al. [21] Chaudhuri and Monteleoni [22] Papernot et al. [25]
Xie et al. [94] Phan et al. [95] Papernot et al. [26]
McMahan et al. [27] Phan et al. [96] Triastcyn and Faltings [97]
Phan et al. [98]
Secure Multi-Party Training Barni et al. [99] Orlandi et al. [100] Shokri and Shmatikov [29]
Liu et al. [101] Liu et al. [102] Mohassel and Zhang [103]
Rouhani et al. [104] Juvekar et al. [105] Chase et al. [106]
Acar et al. [107] Riazi et al. [108] Aono et al. [30]
TABLE IV: Defending the Techniques of Private AI

III Secure AI

Fig. 4: Overview of (a) white-box attack scenario (b) black-box attack scenario

Deep learning is applied to various fields ranging from autonomous driving to medical diagnosis. Hence, if deep learning models are exposed to hostile influences that can disrupt the training process or induce unintended behaviors in pretrained models, the consequences in real life can be severe. For example, it was recently revealed that one can fool an autonomous driving system by jamming its sensors [109]. Likewise, if someone can change the input of the autonomous driving model into an adversarial example, it can even lead to the death of passengers.

Hence, we suggest the concept of Secure AI, which means the AI system with security guarantees, in order to encourage the studies on the security of the AI systems. As deep learning is one of the state-of-the-art AI algorithms, we introduce and taxonomize groups of studies on attacks on deep learning models and defenses against those attacks.

III-A Security Attacks on Deep Learning Models

Attack scenarios on deep learning models differ according to the amount of information that the attacker has about the model. In a white-box scenario, the attacker has full access to all information in the model, including the model structure and the values of all parameters; in the black-box scenario, the adversary has limited information about the model, such as the predicted label of the input.

Based on these scenarios, most attacks on the security of deep learning models are conducted with adversarial examples. Adversarial examples are manipulated inputs that an adversary designs to deceive classifiers into making incorrect predictions. In many cases, adversarial examples are generated by modifications to the original example that are too subtle to notice but still cause the classifier to make mistakes. Depending on the attacker’s goal, attacks by adversarial examples can be divided into targeted and non-targeted attacks. If the adversary aims to alter the classifier’s output to some prespecified target label, the attack is called a targeted attack; in a non-targeted attack, the adversary’s goal is to make the classifier choose any incorrect label. Generally, a non-targeted attack shows a higher success rate than a targeted one.

In addition, according to the recent survey of Papernot et al. [110], there are two types of attacks depending on which phase of the machine learning workflow is interfered with. If the adversarial example is injected in the training phase and tries to corrupt the model while it trains, the attack is called a poisoning attack, and the example used in this attack is referred to as an adversarial training example. On the other hand, adversarial examples can also be used in the inference phase to intentionally lead the model to malfunction. This attack is called an evasion attack.

Fig. 5: Two adversarial examples generated by the momentum iterative fast gradient sign method excerpted from Dong et al. [67]. Left column: the original images. Middle column: the generated adversarial perturbations. Right column: the adversarial images into which the adversarial perturbation is added.

III-A1 Evasion Attack

White-box Attack
Table: White-box targeted vs. non-targeted attack methods. Methods compared: L-BFGS [55], CW attack [70], JSMA [69], FGSM [5], Iterative FGSM [65], Momentum Iterative FGSM [67], ATN [72], UAP [61], AS attack [57], BPDA [56]. Abbreviations: L-BFGS = Limited-memory Broyden–Fletcher–Goldfarb–Shanno, FGSM = Fast Gradient Sign Method, JSMA = Jacobian-based Saliency Map Attack, CW attack = Carlini and Wagner’s attack, UAP = Universal Adversarial Perturbation, ATN = Adversarial Transformation Network, AS attack = Athalye and Sutskever’s attack, and BPDA = Backward Pass Differentiable Approximation.

The initial study on evasion attacks is that of Szegedy et al. [55], who suggested using the limited-memory Broyden–Fletcher–Goldfarb–Shanno (L-BFGS) algorithm to generate adversarial examples. The authors propose a targeted attack method, which involves solving the simple box-constrained optimization problem below:

$$\min_{r} \; c \, \lVert r \rVert_2 + \mathrm{loss}_f(x + r, l) \quad \text{subject to} \quad x + r \in [0, 1]^m \tag{8}$$

where $x$ is the untainted $m$-dimensional image, $l$ is the target label, and $r$ represents the minimum amount of noise needed to disassociate the image from its true label; $c$ is a positive constant, and $\mathrm{loss}_f$ is the loss of the classifier $f$. This method is designed to find the smallest perturbation needed for a successful attack. Sometimes it creates an inapplicable adversarial perturbation $r$, which performs only the role of blurring the image. This form of attack has a high misclassification rate but also a high computational cost, since the adversarial examples are generated by solving the optimization problem in Equation 8 via a box-constrained L-BFGS.

On the other hand, Carlini and Wagner’s attack (CW attack) [70] is based on the L-BFGS attack [55], and it modifies the optimization problem in Equation 8 as

$$\min_{r} \; D(x, x + r) + c \cdot f(x + r) \quad \text{subject to} \quad x + r \in [0, 1]^m \tag{9}$$

where $x$ is the untainted $m$-dimensional input and $r$ is the perturbation; $D$ is a distance metric that includes $L_p$ norms such as $L_0$, $L_2$, and $L_\infty$; $f$ is an objective function for which $f(x + r) \le 0$ if and only if the classifier assigns $x + r$ the target label $l$; and $c$ is a properly chosen constant. This modification enables Equation 9 to be solved by existing optimization algorithms. The use of the Adam [111] optimizer makes finding adversarial examples faster. To handle the box constraint, they use either a change of variables or projection onto the box constraints at each optimization step.

Papernot et al. [69] introduced a targeted attack method that optimizes under the $L_0$ distance, known as the Jacobian-based Saliency Map Attack (JSMA). It constructs a saliency map based on the gradient derived from feedforward propagation and modifies the input features that maximize the saliency map, in a way that increases the probability of the input being classified as the target label $l$.

Deep learning models are generally described as highly nonlinear and prone to overfitting. In contrast, Goodfellow et al. [5], who introduced the fast gradient sign method (FGSM), assert that the main vulnerability of neural networks to adversarial perturbation is caused by their linear nature. Their method linearizes the cost function around the present value and finds its maximum from the following closed-form equation:

$$x^{adv} = x + \epsilon \cdot \mathrm{sign}\left(\nabla_x J(x, y_{true})\right) \tag{10}$$

where $x^{adv}$ is the adversarial example; $x$ is the untainted input to the model, and $y_{true}$ is the true label that corresponds to $x$; $\epsilon$ decides how strong the adversarial perturbation applied to the image is, and $J$ is the cost function used to train the network. Although the proposed method can generate adversarial examples at a relatively low computational cost, it shows a low success rate.
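The FGSM step described above can be sketched on a model small enough that the input gradient has a closed form. The sketch below assumes a binary logistic-regression classifier with a cross-entropy cost instead of a deep network; the weights, the input, and the function names `input_gradient` and `fgsm` are illustrative assumptions rather than the setup of [5].

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def input_gradient(x, y, w, b):
    """Gradient of the binary cross-entropy cost with respect to the input x."""
    p = sigmoid(w @ x + b)
    return (p - y) * w

def fgsm(x, y, w, b, eps):
    """Single-step FGSM: move each feature by eps in the sign of the cost gradient."""
    return x + eps * np.sign(input_gradient(x, y, w, b))

# Assumed (hypothetical) trained weights and a correctly classified input
w = np.array([2.0, -1.0, 0.5])
b = 0.0
x = np.array([0.3, -0.2, 0.4])
y_true = 1.0

x_adv = fgsm(x, y_true, w, b, eps=0.5)
p_clean = sigmoid(w @ x + b)      # above 0.5: predicted class 1
p_adv = sigmoid(w @ x_adv + b)    # below 0.5: prediction flipped to class 0
```

For a deep network the same update applies, with the input gradient obtained by backpropagation instead of the closed form used here.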

To overcome the shortcomings of the previous two ideas, various compromises have been made, and iterative FGSM [65] is one of them. Iterative FGSM applies the FGSM for several steps with a smaller step size $\alpha$. A clip function implements per-pixel clipping of the image so that the result stays in the $\epsilon$-neighborhood of the original image. The detailed update rule is as follows:

$$x^{adv}_0 = x, \qquad x^{adv}_{N+1} = \mathrm{Clip}_{x, \epsilon}\left\{ x^{adv}_N + \alpha \cdot \mathrm{sign}\left(\nabla_x J(x^{adv}_N, y_{true})\right) \right\} \tag{11}$$

where $x^{adv}$ is the adversarial example that is iteratively optimized, and $x^{adv}_N$ is the intermediate result in the $N$-th iteration. As a result, it showed improved performance in terms of the generation throughput and the success rate.

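Using the iterative method proposed above, Dong et al. [67] added a momentum term to improve the transferability of the generated adversarial examples. It was presented in the Adversarial Attacks and Defences Competition [67] at NIPS 2017, and it won first place in both the non-targeted attack and targeted attack tracks. The main idea of the paper is as follows:

$$g_{N+1} = \mu \cdot g_N + \frac{\nabla_x J(x^{adv}_N, y_{true})}{\lVert \nabla_x J(x^{adv}_N, y_{true}) \rVert_1}, \qquad x^{adv}_{N+1} = \mathrm{Clip}_{x, \epsilon}\left\{ x^{adv}_N + \alpha \cdot \mathrm{sign}(g_{N+1}) \right\} \tag{12}$$

Compared to Equation 11, the decay factor $\mu$ accumulates the gradients of previous iterations into the momentum term $g$, which then replaces the raw gradient in the update.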
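The iterative and momentum variants can be sketched with a single loop, in which setting the decay factor to zero recovers the plain iterative method. As before, this is a simplified illustration under our own assumptions: the classifier is a logistic regression with a closed-form input gradient, clipping to the valid pixel range is omitted, and the name `iterative_fgsm` and all parameter values are hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def input_gradient(x, y, w, b):
    """Gradient of the binary cross-entropy cost with respect to the input x."""
    return (sigmoid(w @ x + b) - y) * w

def iterative_fgsm(x, y, w, b, eps, alpha, steps, mu=0.0):
    """Iterative FGSM; mu > 0 adds a momentum term, mu = 0 is the plain variant."""
    x_adv = x.copy()
    g = np.zeros_like(x)
    for _ in range(steps):
        grad = input_gradient(x_adv, y, w, b)
        # Accumulate the L1-normalised gradient into the momentum term
        g = mu * g + grad / (np.sum(np.abs(grad)) + 1e-12)
        x_adv = x_adv + alpha * np.sign(g)
        # Keep the result inside the eps-neighbourhood of the original input
        x_adv = np.clip(x_adv, x - eps, x + eps)
    return x_adv

# Assumed (hypothetical) model and input
w = np.array([2.0, -1.0, 0.5])
b = 0.0
x = np.array([0.3, -0.2, 0.4])
y_true = 1.0

x_iter = iterative_fgsm(x, y_true, w, b, eps=0.5, alpha=0.1, steps=10)
x_mom = iterative_fgsm(x, y_true, w, b, eps=0.5, alpha=0.1, steps=10, mu=1.0)
```

On this linear toy model the gradient sign never changes, so both variants reach the same point on the boundary of the allowed neighbourhood; the benefit of momentum reported in [67] concerns transferability on deep networks, which this sketch does not reproduce.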

An adversarial transformation network (ATN) [72] is another targeted attack method. An ATN is a neural network trained to generate targeted adversarial examples with minimal modification of the original input, which makes them hard to distinguish from clean examples.

Beyond adding different noise values per input for misclassification, universal adversarial perturbations [61] show the presence of universal (image-agnostic) perturbation vectors that cause natural images in a dataset to be misclassified with high probability. The main focus of the paper is to find a perturbation vector $v$ that fools the samples in the dataset. Here, $\mu$ represents the dataset that contains all of the samples.

$$\hat{k}(x + v) \neq \hat{k}(x) \quad \text{for most } x \sim \mu \tag{13}$$

The perturbation $v$ should satisfy the conditions $\lVert v \rVert_p \le \xi$ and $\Pr_{x \sim \mu}\left(\hat{k}(x + v) \neq \hat{k}(x)\right) \ge 1 - \delta$, where $\hat{k}$ is the classifier; $\xi$ limits the magnitude of the perturbation, and $\delta$ quantifies the specified fooling rate for all images.

For most adversarial attacks, the efficacy of the attack can be decreased by transformations such as viewpoint shifts and camera noise. In a physical-world system, an image to which adversarial noise has been added is rarely fed directly to the classifier; it usually passes through preprocessing steps and changes of angle first. Athalye and Sutskever [57] propose a method to overcome this limitation by generating a perturbation that remains misclassified under a variety of distortions of the input, such as random rotation, translation, and the addition of noise. In addition, they use a visual difference for an $\epsilon$-radius ball constraint instead of a distance in texture space.

The Backward Pass Differentiable Approximation (BPDA) attack method [56] has been recently proposed, and it is capable of circumventing recent gradient masking defense methods. The authors claim that defenses that rely on recently suggested gradient masking methods can be circumvented by performing the backward pass with the identity function, which approximates the true gradients.

Fig. 6: Historical timeline of white-box attacks. (Abbreviations: L-BFGS = Limited-memory Broyden–Fletcher–Goldfarb–Shanno, FGSM = Fast Gradient Sign Method, JSMA = Jacobian-based Saliency Map Attack, CW attack = Carlini and Wagner’s attack, UAP = Universal Adversarial Perturbation, ATN = Adversarial Transformation Network, AS attack = Athalye and Sutskever’s attack, and BPDA = Backward Pass Differentiable Approximation.) Red: the CW attack is an advanced version of L-BFGS. Purple: as their names indicate, FGSM is the basis of the Iterative and Momentum Iterative FGSMs. Green: the UAP and AS attack methods introduced the idea of generating a special perturbation that is robust to either image preprocessing or resource limitation. Pink: BPDA defeated a large number of recently proposed gradient masking defenses.

Black-box Attack

In the real world, accessing the model, the dataset used for training it, or both is very difficult. Although a lot of open data (images, sound, video, etc.) is available, the internal data that industry uses for training models remains secret. Moreover, models contained in mobile devices are not accessible to attackers. The black-box attack assumes a situation similar to this reality: the attacker has no information about the model or the dataset. The only available information is the input format and the output label of the target model, as when using a mobile application. The target model can be a model hosted by Amazon or Google. An attacker who wants to create an adversarial example in a gradient-based way needs a substitute model, because gradients of the target model are required but the target is not accessible. How can the attacker reproduce the target without knowing its architecture?

According to Szegedy et al. [55] and Goodfellow et al. [5], adversarial examples crafted on one neural network can attack other models, regardless of the number of layers and the number of hidden nodes, as long as the target task is the same. These authors attributed this finding to the linear nature of neural networks, in contrast to previous works that attributed the transferability to the nonlinearity of neural networks. Sigmoid or ReLU activation functions are used to provide nonlinearity. The sigmoid has the advantage of nonlinearity, but it is tricky to train with. On the other hand, ReLU is widely used because it is easy to train with, but its nonlinearity does not grow as it does with the sigmoid. Therefore, if the target task is the same, the models learn similar decision boundaries. Moreover, Papernot et al. [112] showed experimentally that transfer is possible between traditional machine learning techniques and neural networks, through both intra-technique transferability (between models of the same algorithm, such as SVMs, with different initializations) and cross-technique transferability (between different algorithms, such as an SVM and a neural network). Kurakin et al. [65] consider the case in which a model obtains its input indirectly through a camera or other sensors, and they attack a neural network using photographs of printed adversarial examples. The attack still worked, which shows that the approach is robust to such transformations.

As shown above, attacks transfer, and thus an attacker can attack a target by building a substitute for it. The attacker can choose an approximate architecture, such as a CNN, RNN, or MLP, from the input format (image or sequence). The substitute can be trained on publicly collected data similar to the data on which the target was trained. However, the cost of collecting such data is enormous. Papernot et al. [79] solved this issue using an initial synthetic dataset and a Jacobian-based data augmentation method. If the dataset of the target is MNIST, the initial synthetic dataset can be approximately 100 handcrafted digit images or a subset of the test set that was not used to train the target. The labels are obtained by feeding the data to the oracle, which is the target model. After training the substitute on these input-label pairs, the authors crafted adversarial examples using the methods of Goodfellow et al. [5] and Papernot et al. [76]. In the MNIST transferability experiment, the success rate was about 90% for epsilon in the range 0.5–0.9. However, an attacker who labels inputs by sending queries to a service such as Google or Amazon has a limited number of queries and, with a large number of instances, a high probability of being caught by a detector. To alleviate this problem, Papernot et al. [112] introduced reservoir sampling, which effectively reduced the number of data instances needed to train the substitute.
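The substitute-training loop with Jacobian-based data augmentation can be sketched as follows. This is a heavily simplified illustration and not the experimental setup of [79]: the oracle is a hypothetical linear classifier that returns labels only, the substitute is a logistic regression without a bias term, and the function names, the step size `lam`, and the number of rounds are all assumptions.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def oracle(X):
    """Hypothetical black-box target: only the predicted labels are observable."""
    secret_w = np.array([1.0, -1.0])          # unknown to the attacker
    return (X @ secret_w > 0).astype(float)

def train_substitute(X, y, lr=0.5, epochs=200):
    """Fit a logistic-regression substitute on oracle-labelled data."""
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        p = sigmoid(X @ w)
        w -= lr * X.T @ (p - y) / len(X)
    return w

def jacobian_augment(X, y, w, lam=0.1):
    """Add x + lam * sign(dF_label/dx) for every x, doubling the dataset."""
    p = sigmoid(X @ w)
    label_sign = np.where(y[:, None] == 1.0, 1.0, -1.0)
    # Derivative of the substitute's probability for the oracle-assigned label
    jac = label_sign * (p * (1.0 - p))[:, None] * w
    return np.vstack([X, X + lam * np.sign(jac)])

rng = np.random.default_rng(0)
S = rng.normal(size=(10, 2))                  # small initial synthetic dataset
for _ in range(3):                            # substitute training rounds
    labels = oracle(S)                        # query the oracle for labels
    w_sub = train_substitute(S, labels)
    S = jacobian_augment(S, labels, w_sub)    # explore near the decision boundary

w_sub = train_substitute(S, oracle(S))        # final substitute
```

Adversarial examples would then be crafted against `w_sub` with a white-box method and submitted to the oracle, relying on transferability. Each round doubles the number of oracle queries, which is the cost that reservoir sampling is meant to reduce.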

III-B Poisoning Attack

Fig. 7: The functionality of poisoning a sample; (a) The decision boundary after training with normal data, (b) The decision boundary after injecting a poisoning sample.

Whereas the evasion attack evades the decision boundary of the classifier at test time, the poisoning attack intentionally inserts malicious examples into the training set at training time, either to interfere with the learning of the model or to make attacks at test time easier. A large number of poisoning attack methods can be successfully applied to traditional machine learning models such as SVM or LASSO, but only a few exist for neural networks. The classical methods can be expressed mathematically, whereas neural networks have been difficult to poison because of their complexity. Like evasion attacks, poisoning attacks can be divided into white-box and black-box attacks, but here we describe them in terms of strong adversaries and weak adversaries, which suits poisoning attacks better. The goal of the adversary could be to completely ruin the learning of the system or to plant a backdoor so that, for example, the adversary is recognized as an authenticated person once the system is deployed.

Strong adversaries refer to adversaries with powerful permissions who, similar to a white-box attacker, have direct access to the model and training data and can manipulate the parameter values of the model or poison the training data to spoil the learning. Their main purpose is to subvert the training process by injecting malicious samples, but the accessibility can be different. The authors of [54] presented two attack scenarios of strong adversaries, which are perfect-knowledge (PK) attacks and limited-knowledge (LK) attacks. As the terms suggest, a PK attack scenario is an unrealistic setting, and hence, it is only assumed for a worst-case evaluation of the attack. On the other hand, under LK attack scenarios, the typical knowledge that the attacker possesses is described as $\theta_{LK} = (\hat{\mathcal{D}}, \mathcal{X}, \mathcal{M}, \hat{w})$, where $\mathcal{X}$ is the feature representation and $\mathcal{M}$ is the learning algorithm. The hat symbol denotes limited knowledge of a given component; $\hat{\mathcal{D}}$ is the surrogate data available to the attacker, and $\hat{w}$ is the learned parameter from $\hat{\mathcal{D}}$.

$$\mathcal{D}_c^{*} \in \arg\max_{\mathcal{D}_c' \in \Phi(\mathcal{D}_c)} \mathcal{A}(\mathcal{D}_c', \theta) = L(\hat{\mathcal{D}}_{val}, \hat{w}), \quad \text{s.t.} \quad \hat{w} \in \arg\min_{w'} \mathcal{L}(\hat{\mathcal{D}}_{tr} \cup \mathcal{D}_c', w')$$

where the surrogate data $\hat{\mathcal{D}}$ is divided into the training data $\hat{\mathcal{D}}_{tr}$ and validation data $\hat{\mathcal{D}}_{val}$, and $\mathcal{D}_c'$ is the set of poisoning points chosen from the feasible set $\Phi(\mathcal{D}_c)$. $\mathcal{A}$ is an objective function that evaluates the impact of the adversarial examples on the clean examples, and it can be defined in terms of a loss function $L(\hat{\mathcal{D}}_{val}, \hat{w})$, which measures the performance of the surrogate model using $\hat{\mathcal{D}}_{val}$. The optimization problem is a bilevel optimization, and the influence of $\mathcal{D}_c'$ is propagated using $\hat{w}$. The goal of the optimization is to ruin the system, and the label of the poison is generic. If a specific target is required, Equation (10) is changed to

$$\mathcal{A}(\mathcal{D}_c', \theta) = -L(\hat{\mathcal{D}}_{val}', \hat{w})$$

where $\hat{\mathcal{D}}_{val}'$ is the manipulated validation set, which contains the same data as $\hat{\mathcal{D}}_{val}$ but with misclassified labels for the desired output. Muñoz-González et al. [91] propose back-gradient optimization to solve Equation 14 or 15 and generate poisoning examples, and they compare it with the previous gradient-based optimization methods. Since the previous gradient-based optimization requires a strict convexity assumption on the objective function and a Hessian-vector product, the authors argue that such an approach is not applicable to complex learning algorithms including neural networks and deep learning architectures. In addition, Yang et al. [92] introduce the possibility of applying the gradient-based method to DNNs, and they develop a generative method inspired by the concept of GAN [40]. Rather than computing the gradients directly, the authors use an auto-encoder as a generator. As a result, they report a speed-up of more than 200x compared to the gradient-based method.

A weak adversary is, for example, an insider intruder who can add a few poisoned samples without having any authority over the model or the training. In contrast to previous studies that assume strong adversaries, Chen et al. [93] introduce three constraints: 1) no knowledge of the model, 2) injecting only a small fraction of the training data, and 3) poisoning data that is not detected by humans. These constraints reflect situations similar to the real world. They proposed two methods, input-instance-key strategies and pattern-key strategies, for weaker adversaries to break the security and obtain the privileges of a face recognition system. The former turns one image into a key image and makes it recognized as a targeted label. To account for variations introduced when the image passes through a camera, random noise is added to the sample. In the latter case, three strategies exist: 1) the blended injection strategy, 2) the accessory injection strategy, and 3) the blended accessory injection strategy. The first blends a kitty image or a random pattern onto the input image. However, it is unreasonable to add a specific pattern to the image captured by the camera in real situations, and thus, the second applies an accessory such as glasses or sunglasses to the input. This is easy to use at the inference stage. When training, the pixels outside the glasses have the same values as the input image, and the pixel values of the glasses are applied only to the region of the glasses. The last method combines the first and the second. Unlike previous studies in which poisoning data accounted for 20 percent of the training data, only five poisoned samples were added when 600,000 training images were used for the instance-key strategy, and approximately 50 poisoned samples were used for the pattern-key strategies. They were able to create a successful backdoor by adding a small fraction of poisoned samples.
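As a rough illustration of the blended injection strategy, the sketch below mixes a key pattern into each input with a blend ratio `alpha` and assigns the attacker's target label. The function names, the value of `alpha`, and the assumption that images and key share the same shape are all hypothetical choices for this example.

```python
import numpy as np

def blend_key(image, key, alpha=0.2):
    """Blend a key pattern into an image: alpha * key + (1 - alpha) * image."""
    image = np.asarray(image, dtype=np.float64)
    key = np.asarray(key, dtype=np.float64)
    if image.shape != key.shape:
        raise ValueError("image and key must have the same shape")
    return alpha * key + (1.0 - alpha) * image

def make_poison_set(images, key, target_label, alpha=0.2):
    """Create poisoned (image, label) pairs that all carry the target label."""
    poisoned = np.stack([blend_key(x, key, alpha) for x in images])
    labels = np.full(len(images), target_label)
    return poisoned, labels
```

A small `alpha` keeps the key pattern hard for a human to notice, which corresponds to the third constraint listed above.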

III-C Defense Techniques against Evasion Attacks

There are various types of defense techniques against evasion attacks. Most of them exploit gradient masking, which can be categorized into two groups: non-obfuscated gradient masking (which includes adversarial training) and obfuscated gradient masking. The first group augments the training set with adversarial examples, which are created by pushing clean examples slightly past the decision boundary along the gradient; the second group hands over incorrect or foggy gradients to an adversary. Both are currently placed under the category of gradient masking.

III-C1 Non-Obfuscated Gradient Masking

Currently, the most representative studies of non-obfuscated gradient masking involve adversarial training [55], [5]. Since these monumental articles, adversarial training has remained one of the most significant gradient masking methods. By augmenting the training set with correctly labeled adversarial examples, a model that is robust to an expected adversarial attack can be obtained. It is, however, still vulnerable to attacks that were not anticipated at training time. To extend the previously suggested work, Kurakin et al. [85] applied adversarial training to ImageNet [113]. Tramèr et al. [86] proposed a defense method that is also robust to black-box attacks by including adversarial examples generated from other models. A recent work on speech data [6] trained a DNN using adversarial examples along with the clean examples to increase the robustness against evasion attacks.
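The idea of augmenting each training step with correctly labeled adversarial examples can be sketched as follows. This is a toy illustration on logistic regression, assuming the fast gradient sign method as the attack; the function names, learning rate, and number of epochs are hypothetical, and the cited works train deep networks instead.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fgsm(X, y, w, b, eps):
    """Perturb X by eps along the sign of the input gradient of the loss."""
    # For logistic regression with cross-entropy, dLoss/dx = (p - y) * w.
    grad_x = (sigmoid(X @ w + b) - y)[:, None] * w[None, :]
    return X + eps * np.sign(grad_x)

def adversarial_train(X, y, eps=0.1, lr=0.5, epochs=200):
    """Gradient descent on clean examples plus their FGSM counterparts."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        X_adv = fgsm(X, y, w, b, eps)          # crafted against the current model
        X_all = np.vstack([X, X_adv])
        y_all = np.concatenate([y, y])         # adversarial copies keep true labels
        err = sigmoid(X_all @ w + b) - y_all
        w -= lr * X_all.T @ err / len(y_all)
        b -= lr * err.mean()
    return w, b
```

Because the adversarial examples are regenerated against the current parameters at every step, the model is only hardened against the attack used during training, which matches the limitation noted above.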

III-C2 Obfuscated Gradient Masking

The defense approaches based on obfuscated gradient masking generally rely on three types of obfuscated gradients: shattered gradients, stochastic gradients, and vanishing/exploding gradients. Robustness against iterative optimization attacks is a key requirement for a good defense system that is built on machine learning. Nevertheless, the existing gradient-based defense algorithms are designed around the gradient of the initial version, which leaves them vulnerable to gradient-based attacks. The authors of [56] showed that most obfuscated-gradient-based defenses are vulnerable to iterative optimization attacks [65, 114, 60], which have become the standard algorithms for evaluating defenses.

Shattered gradients are incorrect gradients that arise when a defense makes the model non-differentiable intentionally or numerically unstable unintentionally. The purpose of a shattered-gradient defense is to break the largely linear behavior that neural networks generally exhibit [57]. A recent defense of this type uses thermometer encoding [8] in neural networks to break the linearity.
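A minimal sketch of thermometer encoding is shown below. It assumes pixel values in [0, 1] and one particular bit convention (a bit is set at every position at or above the quantization bucket); the function name and the default of 10 levels are choices made for this example.

```python
import numpy as np

def thermometer_encode(x, levels=10):
    """Map each value in [0, 1] to a `levels`-bit thermometer code.

    The value is quantized into a bucket index, and every bit at a position
    greater than or equal to that index is set to 1. The discretization is
    non-differentiable, which is what shatters the gradient.
    """
    x = np.asarray(x, dtype=np.float64)
    idx = np.asarray(np.minimum((x * levels).astype(int), levels - 1))
    positions = np.arange(levels)
    return (positions >= idx[..., None]).astype(np.float32)
```

The encoded input has one extra trailing axis of length `levels`, so the first layer of the network must be widened accordingly.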

Stochastic gradients obfuscate a model through test-time randomness. The algorithm of [9] randomly sets some neurons of each layer to 0 depending on their original output values; that is, the network stochastically prunes a subset of the activations in each layer during the forward pass. The surviving activations are scaled up to normalize the dynamic range of the inputs to the subsequent layer [9]. Similarly, the authors of [71] proposed transformation approaches based on image cropping and rescaling [115], bit-depth reduction [116], JPEG compression [117], and total variance minimization [118]. The approach first drops pixels in a random manner and reconstructs images by replacing small patches, using minimum graph cuts in overlapping boundary regions to remove edge artifacts. Buckman et al. [8] demonstrate the thermometer code, which improves the robustness to adversarial attacks. Samangouei et al. [11] proposed Defense-GAN, which is a defense method similar to PixelDefend, but it uses a GAN instead of a PixelCNN.
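The pruning-and-rescaling step described for [9] might look like the following sketch for a single one-dimensional activation vector. The sampling rule (probability proportional to absolute activation, drawn with replacement) and the inverse-probability rescaling are assumptions of this example, as are the names `num_samples` and `rng`.

```python
import numpy as np

def stochastic_activation_pruning(h, num_samples, rng):
    """Randomly keep activations, favoring large magnitudes, and rescale them.

    Assumed scheme: draw `num_samples` indices with replacement with
    probability proportional to |h|, zero every activation never drawn,
    and divide each survivor by its probability of being kept.
    """
    h = np.asarray(h, dtype=np.float64)
    total = np.abs(h).sum()
    if total == 0.0:
        return h.copy()
    p = np.abs(h) / total
    drawn = rng.choice(h.size, size=num_samples, replace=True, p=p)
    keep = np.zeros(h.size, dtype=bool)
    keep[drawn] = True
    keep_prob = 1.0 - (1.0 - p) ** num_samples   # chance of being drawn at least once
    out = np.zeros_like(h)
    out[keep] = h[keep] / keep_prob[keep]
    return out
```

Because a fresh random mask is drawn on every forward pass, an attacker who differentiates through one pass obtains a gradient for a network that will not be used again.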

Vanishing/exploding gradients make the gradients of a model unusable through deep computation. The basic idea is to purify adversarially perturbed images back to clean examples by exploiting a PixelCNN as a generative model. The purified image is then fed to the unmodified classifier. A recent defense algorithm exploits PixelCNN [119] to build PixelDefend [10], which approximates the training distribution.

III-C3 Defense against Poisoning Attacks

The framework proposed in [12] takes the approach of removing outliers that are outside the applicable set. In binary classification, they aim to find the centroids of the positive and negative classes. Then, they remove points that are too far away from the corresponding centroid. To find these points, they make use of two complementary methods: a sphere defense that removes points outside a spherical radius, and a slab defense that discards points that are too far away along the line between the centroids.
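The two filters can be sketched together as below. This is a simplified illustration that assumes binary labels in {0, 1}, centroids estimated from the (possibly poisoned) data itself, and user-chosen thresholds `radius` and `slab_width`; all names are hypothetical.

```python
import numpy as np

def sphere_slab_filter(X, y, radius, slab_width):
    """Return a boolean mask of points kept by both defenses.

    Sphere: keep x if its distance to its own class centroid is <= radius.
    Slab: keep x if its offset from its own centroid, projected onto the
    direction between the two centroids, is <= slab_width in magnitude.
    """
    X = np.asarray(X, dtype=np.float64)
    y = np.asarray(y)
    mu_pos = X[y == 1].mean(axis=0)
    mu_neg = X[y == 0].mean(axis=0)
    is_pos = (y == 1)[:, None]
    own = np.where(is_pos, mu_pos, mu_neg)
    other = np.where(is_pos, mu_neg, mu_pos)
    diff = X - own
    in_sphere = np.linalg.norm(diff, axis=1) <= radius
    in_slab = np.abs(np.einsum("ij,ij->i", diff, own - other)) <= slab_width
    return in_sphere & in_slab
```

The defender would then retrain on `X[mask]` and `y[mask]`; the thresholds trade off removing poison against discarding legitimate but unusual points.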

Koh and Liang [89] use influence functions to track model predictions and identify the most influential data points that are responsible for a given prediction. They show that approximations of influence functions can still provide important information in non-convex and non-differentiable models where the theory breaks down. They also claim that by using influence functions, the defender can inspect only the data prioritized by its influence score. This method outperforms the previous approach of prioritizing the examples with the greatest training loss when removing tainted examples.

The paper by Paudice et al. [13] also suggests a defense mechanism that mitigates the effects of poisoning attacks on the basis of outlier detection. The attacker tries to have the greatest effect on the defender with a limited number of poisoning points. To mitigate this effect, they first divide the trustworthy dataset $\mathcal{D}$ into different classes, i.e., $\mathcal{D}_+$ and $\mathcal{D}_-$. Then, they use the curated data to train a distance-based outlier detector for each class. The outlier detection algorithm calculates the outlier score for each x in the original (total) data set. There are many ways to measure the outlier score, such as using SVM or LOF as a detector. The empirical cumulative distribution function (ECDF) of the training instances is used to calculate the threshold for detecting outliers. After removing all of the samples that are expected to be contaminated, the defender obtains a new data set on which to retrain the learning algorithm.

Paudice et al. [14] choose to re-label data points that are considered to be outliers instead of removing them. The label flipping attack is a special case of data poisoning that allows an attacker to control the labels of a small number of training points. This paper proposes a mechanism that considers the points farthest from the decision boundary to be malicious, and it reclassifies them. The algorithm reassigns the label of each training instance using k-NN. For each sample of the training data, they first find its k nearest neighbors using the Euclidean distance. If the number of data points with the most common label among the k nearest neighbors is equal to or greater than a given threshold, the corresponding training sample is relabeled with the most common label among those neighbors.
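The relabeling rule can be sketched in a few lines. The sketch assumes that the threshold is expressed as a fraction of the k neighbors and that neighbors vote with their original labels; the function name and default values are hypothetical.

```python
import numpy as np

def knn_relabel(X, y, k=3, threshold=0.5):
    """Relabel each point with the majority label of its k nearest neighbors.

    A point is relabeled only when the most common neighbor label reaches
    the given fraction of the k neighbors; otherwise its label is kept.
    """
    X = np.asarray(X, dtype=np.float64)
    y = np.asarray(y)
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    np.fill_diagonal(dist, np.inf)              # a point is not its own neighbor
    new_y = y.copy()
    for i in range(len(X)):
        neighbors = np.argsort(dist[i])[:k]
        labels, counts = np.unique(y[neighbors], return_counts=True)
        top = counts.argmax()
        if counts[top] / k >= threshold:
            new_y[i] = labels[top]
    return new_y
```

The pairwise distance matrix is quadratic in the number of samples, so a spatial index would be preferable for large training sets.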

IV Private AI

Deep learning algorithms, which account for most of the current AI systems, rely heavily on data. Hence, DL is always exposed to privacy threats, and it is imperative that the privacy of the training data be preserved. We therefore define Private AI as an AI system that preserves the privacy of the concerned data.

IV-A Potential Threats from Different Perspectives

Fig. 8: Private AI: Potential threats in perspectives of (a) service provider (b) information silo (c) user

IV-A1 Potential Threats in the Service Provider’s Perspective

When companies provide deep learning models and services to the public, there is a risk that the models leak private information even without revealing the original dataset. A model inversion attack occurs when an adversary uses the pre-trained deep learning model to discover the data used in the model training. Such attacks seek to exploit the correlation between the target, the unknown input, and the model output.

Recent studies show that model inversion attacks are feasible, either by recovering images used in training [120] or by performing a membership test to learn whether an individual is in a dataset or not [121]. Furthermore, deployed deep learning services are exposed to data integrity attacks as well. Because the deployed service demands the user’s data, adversaries can attempt to break the integrity of the data at the servicing server. If a data holder includes the collected data without an integrity check, the broken integrity can mislead or even ruin the model.

IV-A2 Privacy Violation in Information Silos

Information silos are a group of related but mutually exclusive data management systems. Data silos, often represented by hospitals or government agencies, hold data of a similar nature and could derive productive output through collaborative data mining, but they do not share data because of intellectual property or privacy breach issues. Secure multi-party computation (SMC) addresses the setting in which a set of parties, such as silos, with private inputs wish to compute some joint function of their inputs [122]. Similarly, the idea of secure multi-party training, which trains a joint deep learning model on private data inputs, has been emerging. In such training processes, the data privacy of each participant must be preserved in the face of adversarial behavior by other participants or by an external party.

Hitaj et al. [15] show that a distributed, federated, or decentralized deep learning approach is fundamentally broken and does not protect the training sets of honest participants from a GAN-based attack. The GAN-based adversary deceives a victim into releasing more accurate information on sensitive data.

IV-A3 Potential Threats in the User’s Perspective

Because many deep learning-based applications have been introduced in industry, the users of such services are under a serious threat of privacy invasion [16, 20]. Since deep learning models are too large and complicated [36, 37] to be computed on small devices such as mobile phones or smart speakers, most service providers require users to upload their sensitive data, such as their voice recordings or face images, and perform the computation on their (cloud) servers. The problem is that upon uploading, the users lose control of their data. In other words, the users cannot delete their data and cannot check how their data is used. As the recent Facebook privacy scandal suggests, even when there are privacy policies, it is difficult to notice or restrain excessive data exploitation. In addition, since hardware requirements for deep learning are enormous, Machine-Learning-as-a-Service (MLaaS) provided by Google, Microsoft, or Amazon has gained popularity among deep learning-based service providers. Such remote servers make it even more difficult to manage the users’ data privacy.

IV-B Defense Techniques against Potential Threats

Unlike the many attacks attempted in the domain of Secure AI, only a few attacks, as well as a few defenses for privacy-preserving deep learning, have been attempted in the field of Private AI. We attribute this to the nature of privacy-preserving techniques. Applying traditional security techniques to deep learning requires encryption and decryption phases, which makes them impractical in the real world due to the enormous computational complexity. As a result, homomorphic encryption is one of the few security techniques that can be exploited in deep learning. As one further step toward Private AI, differential privacy is actively exploited in deep learning. In the following sections, we detail Private AI approaches that adopt the most recent privacy-preserving methods.

IV-B1 Homomorphic Encryption on Deep Learning

CryptoNets [16] took the initiative of applying neural networks to inference on encrypted data. CryptoNets utilizes the leveled HE scheme YASHE’ [49] for privacy-preserving inference on a pre-trained CNN model. It demonstrated over 99% accuracy in recognizing handwritten digits (MNIST data set [123]). However, leveled HE can lead to serious degradation in model accuracy and efficiency. Because the non-polynomial activation functions are replaced with a square activation function and the precision of the weights is converted, the inference model obtains results that are quite different from those of the trained model. Hence, it is not suited for recent complicated models [36, 37]. In addition, the latency of the computation is still on the order of hundreds of seconds, although Gilad-Bachrach et al. [16] achieved a throughput of 50,000 predictions in an hour. CryptoNets’ ability to batch images together is useful in applications where the same user wants to classify a large number of samples together. In the simplest case, in which the user only wants a single image to be classified, this feature does not help.

In response, CryptoDL [17] and Chabanne et al. [18] attempted to improve CryptoNets by using low-degree polynomial approximations of the activation functions. Chabanne et al. [18] applied batch normalization to reduce the accuracy gap between the actual trained model and the converted model with an approximated activation function at the inference phase. The batch normalization technique also enabled fair predictions on a deeper model.

After a recent bootstrapping FHE technique was introduced [124], TAPAS [20] and FHE-DiNN [19] were proposed. Since the method proposed by Chillotti et al. [124] supports operations on binary data, both utilized the concept of Binary Neural Networks (BNNs) [125]. FHE-DiNN [19] utilized discretized neural networks with different weights and input dimensions to evaluate the scheme of Chillotti et al. [124] on DNNs. In comparison, TAPAS [20] binarized the weights and enabled binary operations and sparsification techniques. Both FHE-DiNN and TAPAS showed faster prediction than the approaches based on leveled HE. It is also notable that while leveled HE methods only support batch predictions, bootstrapping FHE-based methods enable predictions on single instances, which is more practical.

IV-B2 Secure Multi-Party Computation (SMC) on Deep Learning

Distributed selective SGD (DSSGD) [29] proposed collaborative deep learning protocols that let different data holders train joint deep learning models without sharing their training data. This approach is very similar to prior distributed deep learning algorithms [126, 127, 128]. With coordinated learning models and objectives, the participants train their local models and selectively exchange their gradients and parameters asynchronously at every local SGD epoch. However, DSSGD assumes a parameter server [129], and Aono et al. [30] pointed out that even a few gradients make it possible to restore the data used in training. Hence, to preserve privacy against an honest-but-curious parameter server, LWE-based homomorphic encryption was applied to the exchanged weights and gradients. The improved privacy achieved by homomorphic encryption, however, comes at the price of higher communication costs.
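The selective exchange of gradients can be illustrated with a short sketch. It assumes that a participant shares only the fraction of gradient entries with the largest magnitude, which is one possible selection rule; the function name and the `fraction` parameter are hypothetical, and no encryption is shown.

```python
import numpy as np

def select_gradients(grad, fraction):
    """Keep the largest-magnitude entries of a gradient and zero the rest.

    Assumed selection rule: share the top `fraction` of entries by
    absolute value (at least one entry is always shared).
    """
    grad = np.asarray(grad, dtype=np.float64)
    flat = grad.ravel()
    k = max(1, int(round(fraction * flat.size)))
    idx = np.argsort(np.abs(flat))[-k:]
    shared = np.zeros_like(flat)
    shared[idx] = flat[idx]
    return shared.reshape(grad.shape)
```

Only the nonzero entries would be uploaded to the parameter server, so a smaller `fraction` reveals less about the local data at the cost of slower convergence.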

IV-B3 Differential Privacy on Deep Learning

By applying differential privacy to deep learning models, the training data can be protected from inversion attacks when the model parameters are released. Hence, many studies apply differential privacy to deep learning models. Such methods treat the training datasets and the parameters of the model as the database and the responses, respectively, and prove that their algorithms satisfy either Equation 5 or 6.

Depending on where the noise is added, such approaches can be divided into three groups: gradient-level [21], objective-level [22] and label-level [25]. The gradient level approach injects noise into the gradients of the parameters in the training phase. The objective-level approach introduces the perturbed objective function by injecting the noise into the coefficients of the original objective function. The label-level approach introduces noise into the label in the knowledge transfer phase of the teacher student model.

The gradient-level approach [21] proposed a differentially private SGD algorithm that adds noise to the gradients in the batch-wise updates. It is important to estimate the accumulated privacy loss as learning progresses batch by batch. In particular, the authors of [21] proposed the moment accountant to track the cumulative privacy loss. The moment accountant algorithm considers the privacy loss as a random variable and estimates its tail bound. The resulting bounds provide a tighter level of privacy than using the basic or strong composition theorems [23, 24]. McMahan et al. [27] introduced a user-level differentially private LSTM. In language modeling, it is difficult and ineffective to preserve privacy at the word level. Therefore, McMahan et al. [27] defined user-level adjacent datasets and ensured differential privacy for users. Xie et al. [94] proposed a Differentially Private Generative Adversarial Network (DPGAN). They inject noise into the gradient of the discriminator to obtain a differentially private discriminator, and the generator, which is trained with that discriminator, also becomes differentially private based on the post-processing theorem [130].
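A single noisy update of the gradient-level approach can be sketched as follows. The sketch assumes the common recipe of clipping each per-example gradient to an L2 norm bound and adding Gaussian noise scaled by that bound; the parameter names are hypothetical, and the privacy accounting (for example, the moment accountant) is deliberately left out.

```python
import numpy as np

def dp_sgd_step(w, per_example_grads, lr, clip_norm, noise_multiplier, rng):
    """One SGD step with per-example clipping and Gaussian noise.

    per_example_grads has shape (batch, dim). Each row is scaled so that
    its L2 norm is at most clip_norm, the rows are summed, Gaussian noise
    with standard deviation noise_multiplier * clip_norm is added, and the
    result is averaged over the batch.
    """
    g = np.asarray(per_example_grads, dtype=np.float64)
    norms = np.linalg.norm(g, axis=1, keepdims=True)
    g_clipped = g / np.maximum(1.0, norms / clip_norm)
    noise = rng.normal(0.0, noise_multiplier * clip_norm, size=w.shape)
    g_avg = (g_clipped.sum(axis=0) + noise) / len(g)
    return w - lr * g_avg
```

Clipping bounds the influence of any single example on the update, which is what allows the added noise to be calibrated independently of the data.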

Fig. 9: Overview of differential privacy in deep learning framework

The objective-level approach [22] disturbs the original objective function by adding noise to its coefficients. The model trained on the disturbed objective function is then differentially private. Unlike the gradient-level approach, whose privacy loss accumulates as training progresses, the privacy loss of the objective-level approach is determined when the objective function is built and is independent of the number of epochs. To inject noise into the coefficients, the objective function should be a polynomial representation of the weights. If an objective function is not in polynomial form, the objective-level approach approximates it with a polynomial representation using approximation techniques such as the Taylor or Chebyshev expansion. Then, the noise is added to each coefficient to obtain the disturbed objective function. Chaudhuri and Monteleoni [22] proposed differentially private logistic regression, whose parameters are trained based on the perturbed objective function. The functional mechanism is applied not only to logistic regression but also to various models such as auto-encoders and convolutional deep belief networks [132]. Phan et al. [95] proposed the deep private auto-encoder (dPA) and proved that the dPA is differentially private based on the functional mechanism. Phan et al. [96] introduced the private convolutional deep belief network (pCDBN), and they utilized the Chebyshev expansion to approximate the objective function in polynomial form. Phan et al. [98] developed a novel mechanism, called the Adaptive Laplace Mechanism (AdLM). The key concept is to add more noise to the input features that are less relevant to the model output, and vice versa. Phan et al. [98] inject noise from the Laplace distribution into the Layer-wise Relevance Propagation (LRP) [133] to estimate the relevance between the output of the model and the input features. They apply an affine transformation based on the estimated relevance to distribute the noise adaptively. AdLM also applies a functional mechanism that perturbs the objective function. These differentially private operations are performed before training the model.

The label-level approach injects noise into the knowledge transfer phase of the teacher-student framework. Papernot et al. [25] proposed a semi-supervised knowledge transfer model, which is called the Private Aggregation of Teacher Ensembles (PATE) mechanism. PATE is a type of teacher-student model, and its purpose is to train a differentially private classifier (student) based on an ensemble of non-private classifiers (teachers). Figure 9 shows the overview of the PATE approach. Each teacher model learns on a disjoint training dataset, and the output of the teacher ensemble is determined by a noisy aggregation of the teachers’ predictions. The noisy aggregation produces a noisy label that satisfies DP, and then, the student model learns the noisy label from the teacher ensemble as a target label. Because the student model cannot access the training data directly and differentially private noise is injected into the aggregation process, PATE ensures safety both intuitively and in terms of DP. PATE utilizes the moment accountant to trace the cumulative privacy budget in the learning process. Later, Papernot et al. [26] extended PATE to operate in a large-scale environment by introducing a new noisy aggregation mechanism. They showed that the improved PATE outperforms the original PATE on all measures and has high utility with a low privacy budget. Triastcyn and Faltings [97] applied PATE to build a differentially private GAN framework. The discriminator of a GAN framework is a type of classifier that determines whether the input data is real or fake. By using PATE as a discriminator, the generator trained with the discriminator is also differentially private.
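The noisy aggregation step can be sketched as below for a single query. It assumes Laplace noise with scale 1/gamma added to the per-class vote counts; the function name and parameter names are hypothetical, and privacy accounting across queries is not shown.

```python
import numpy as np

def noisy_aggregate(teacher_preds, num_classes, gamma, rng):
    """Return the class with the highest noisy vote count.

    teacher_preds holds one predicted class index per teacher. Laplace
    noise with scale 1 / gamma is added to each class count before the
    argmax, so a smaller gamma means more noise and stronger privacy.
    """
    counts = np.bincount(teacher_preds, minlength=num_classes).astype(np.float64)
    noisy = counts + rng.laplace(0.0, 1.0 / gamma, size=num_classes)
    return int(np.argmax(noisy))
```

When the teachers largely agree, the noise rarely changes the winning class, which is why strong consensus costs little utility.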

V Discussion

V-A Challenges and Future Research Directions

From Section III, we confirmed that there exist attack methods that can fool or subvert the deep learning models. We reviewed two types of attack scenarios which are the white-box attack and black-box attack. In white-box attack scenarios, most adversaries generate adversarial examples by taking advantage of the gradients of the target DL model, and those examples showed very high misclassification rate. It is crucial for the success of such attacks to acquire the true gradients from the vanilla model, which is the model without any defense method applied, to find sparse or blind spots of the target model. Hence, to defend against such attack methods, many researchers proposed diverse gradient masking defense methods, and these methods showed decent achievements by involving more nonlinearity in a model or preventing the gradients of the model from being copied by an adversary. As the authors of [56] suggest, proper gradient methods show powerful defense performance.

Therefore, it is believed to be beneficial if interpretable AI approaches [134, 135, 136] can be applied to such attack or defense methods. Interpretable AI analyzes the underlying functions of a deep learning model and determines the way that the model makes predictions. With a deeper understanding of deep learning models, it will be feasible to make a system (model) robust to unseen attacks by identifying blind spots that should be considered and addressed.

In Section III-B, we reviewed some poisoning attack methods on deep learning models. The recent defenses include outlier detection to eliminate [13] or re-label [14] the suspected poisoned examples. However, it is a concern that such actions might constrain the decision boundary of the models too much. As Figure 7 suggests, the elimination or re-labeling of some data points can change a model’s decision boundary by a large amount. In addition, the degradation of the model accuracy might instead make the model susceptible to other poisoning approaches. Hence, we need metrics or evaluation methods to determine whether the model is defended well enough to be safe and sound.

In addition, we reviewed the privacy-preserving deep learning models that apply fully homomorphic encryption cryptosystems in Section IV-B1. Although the recent methods achieved a high prediction rate despite the strict encryption, their accuracy falls behind state-of-the-art model performance, and they are not compatible with deeper models. The main reason for this situation is that the FHE methods used in those papers do not support the nonlinear activation functions discussed in Section II-A1. Hence, current FHE-based prediction models use different models from the actual trained models. In other words, they train unencrypted typical models on unencrypted data, and then, the trained weights and biases are applied to a different model, in which the activation functions are replaced with simple activations such as square functions. A discrepancy between the training and inference models usually causes high degradation in the prediction accuracy. To overcome this deterioration, two approaches are possible: either train the same model from the beginning, or properly transfer the model. As the authors of [137] suggest, the knowledge learned by a DL model can be distilled into another model.

In Section IV-B3, a large number of attempts were confirmed using differential privacy to protect data privacy in deep learning training. Such methods add noise to gradients or objective functions to confuse the attacker, and give closed-form proof on the differential privacy bounds of the proposed methods. However, from the DL researchers’ perspectives, such bounds are insufficient to give practical insights on whether such a privacy bound is strong enough or not. If differential privacy researchers can provide experiments on the assumed attack scenarios or some practical evaluations or metrics, it should be much more informative.

V-B Practical Issues and Suggestions for Deployment

Variants of deep learning models might pose threats to model security and data privacy. Because deep learning models are very complicated, it is difficult to design a new model structure. Hence, once a model structure is deployed, a large number of users add some variations for their own uses and train further with their own data. In the case of U-Net [138], which is a CNN model used for image segmentation in the biomedical field, several variants [139, 140, 141] have been proposed. If such similar models are deployed in public, they are likely to be susceptible to the attacks reviewed in this paper. They might give clues for building substitute models in black-box attack scenarios, or enable easier inversion attacks based on the knowledge accumulated from the similar models. Hence, we must be careful when deploying models, especially when there is a large number of variants.

Practical considerations on the processing time and throughput are needed as well. Although FHE combined with deep learning predictions showed remarkable performances both in the privacy and utility, it lacks the considerations of practical implementations. Because predictions on FHE data and models are still too slow, parallel or distributed processing using GPUs or clusters is crucial. In particular, since GPUs have already achieved high computational speeds in deep learning training, combining GPU’s high computing power with FHE model prediction is promising. Considering those situations in which we need FHE on predictions when the computing resources of the user devices are insufficient, on-device encryption and decryption should be considered as well.

VI Summary

Deep learning has become one of the inseparable technologies in our daily lives, and the problem of security and privacy of deep learning has become an issue that can no longer be overlooked. Therefore, we defined Secure AI and Private AI, and we reviewed the related attack and defense methods.

In Secure AI, we surveyed the two types of attacks: evasion attack and poisoning attack. We categorized the attack scenarios as white-box and black-box attacks, according to the amount of information and the authority of the model that the adversary possesses. In this process, we confirmed that many research studies have been conducted with advanced and varied attack methods. On the other hand, the studies on the defense techniques are in relatively early stages. In this paper, we introduce the related studies by classifying them as gradient masking, adversarial training and statistical approaches.

Furthermore, the risk of data privacy violations is ever-present, owing both to the nature of deep learning, which relies heavily on large amounts of data, and to the era of the fourth industrial revolution, in which data itself is an enormous asset. In this paper, we described the possible threats to data privacy from the perspectives of deep learning models and service providers, information silos, and users of deep learning-based services. In addition, we named deep learning-based approaches that are concerned with data privacy Private AI. Unlike Secure AI, there are not many studies on privacy attacks using deep learning. Hence, we introduced recent studies on three defense techniques for Private AI: homomorphic encryption, differential privacy, and secure multi-party training. Finally, we discussed open problems and directions for future work.
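Of the three defense techniques, differential privacy is the easiest to illustrate in a few lines. The sketch below shows the Laplace mechanism on a counting query, which is a basic building block rather than any of the deep learning methods surveyed in this paper. The function names, the sample records, and the choice of epsilon are assumptions made for illustration.

```python
import random


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Sample Laplace(0, scale) as the difference of two i.i.d. exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_count(records, predicate, epsilon: float, rng: random.Random) -> float:
    """Release a count with epsilon-differential privacy.

    Adding or removing one record changes a count by at most 1,
    so the sensitivity is 1 and the noise scale is 1 / epsilon.
    """
    true_count = sum(1 for r in records if predicate(r))
    sensitivity = 1.0
    return true_count + laplace_noise(sensitivity / epsilon, rng)


rng = random.Random(0)                      # fixed seed only for reproducibility
ages = [23, 35, 41, 29, 62, 57, 38, 44]     # hypothetical sensitive records
noisy = private_count(ages, lambda a: a >= 40, epsilon=0.5, rng=rng)
```

A smaller epsilon gives stronger privacy but a noisier answer, which is the same privacy and utility trade-off that differentially private training methods have to manage at the level of gradients or model outputs.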

References
  • Shickel et al. [2017] B. Shickel, P. Tighe, A. Bihorac, and P. Rashidi, “Deep ehr: a survey of recent advances in deep learning techniques for electronic health record (ehr) analysis,” arXiv preprint arXiv:1706.03446, 2017.
  • Buczak and Guven [2016] A. L. Buczak and E. Guven, “A survey of data mining and machine learning methods for cyber security intrusion detection,” IEEE Communications Surveys & Tutorials, vol. 18, no. 2, pp. 1153–1176, 2016.
  • Ren et al. [2015] S. Ren, K. He, R. Girshick, and J. Sun, “Faster r-cnn: Towards real-time object detection with region proposal networks,” in Advances in neural information processing systems, 2015, pp. 91–99.
  • Biggio et al. [2013] B. Biggio, I. Corona, D. Maiorca, B. Nelson, N. Šrndić, P. Laskov, G. Giacinto, and F. Roli, “Evasion attacks against machine learning at test time,” in Joint European conference on machine learning and knowledge discovery in databases.   Springer, 2013, pp. 387–402.
  • Goodfellow et al. [2014a] I. J. Goodfellow, J. Shlens, and C. Szegedy, “Explaining and harnessing adversarial examples,” arXiv preprint arXiv:1412.6572, 2014.
  • Sun et al. [2018] S. Sun, C.-F. Yeh, M. Ostendorf, M.-Y. Hwang, and L. Xie, “Training augmentation with adversarial examples for robust speech recognition,” arXiv preprint arXiv:1806.02782, 2018.
  • Gu et al. [2018] Z. Gu, Z. Jia, and H. Choset, “Adversary a3c for robust reinforcement learning,” 2018.
  • Buckman et al. [2018] J. Buckman, A. Roy, C. Raffel, and I. Goodfellow, “Thermometer encoding: One hot way to resist adversarial examples,” in Submissions to International Conference on Learning Representations, 2018.
  • Dhillon et al. [2018] G. S. Dhillon, K. Azizzadenesheli, Z. C. Lipton, J. Bernstein, J. Kossaifi, A. Khanna, and A. Anandkumar, “Stochastic activation pruning for robust adversarial defense,” arXiv preprint arXiv:1803.01442, 2018.
  • Song et al. [2017] Y. Song, T. Kim, S. Nowozin, S. Ermon, and N. Kushman, “Pixeldefend: Leveraging generative models to understand and defend against adversarial examples,” arXiv preprint arXiv:1710.10766, 2017.
  • Samangouei et al. [2018] P. Samangouei, M. Kabkab, and R. Chellappa, “Defense-gan: Protecting classifiers against adversarial attacks using generative models,” arXiv preprint arXiv:1805.06605, 2018.
  • Steinhardt et al. [2017] J. Steinhardt, P. W. W. Koh, and P. S. Liang, “Certified defenses for data poisoning attacks,” in Advances in Neural Information Processing Systems, 2017, pp. 3517–3529.
  • Paudice et al. [2018a] A. Paudice, L. Muñoz-González, A. Gyorgy, and E. C. Lupu, “Detection of adversarial training examples in poisoning attacks through anomaly detection,” arXiv preprint arXiv:1802.03041, 2018.
  • Paudice et al. [2018b] A. Paudice, L. Muñoz-González, and E. C. Lupu, “Label sanitization against label flipping poisoning attacks,” arXiv preprint arXiv:1803.00992, 2018.
  • Hitaj et al. [2017] B. Hitaj, G. Ateniese, and F. Perez-Cruz, “Deep models under the gan: information leakage from collaborative deep learning,” in Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security.   ACM, 2017, pp. 603–618.
  • Gilad-Bachrach et al. [2016] R. Gilad-Bachrach, N. Dowlin, K. Laine, K. Lauter, M. Naehrig, and J. Wernsing, “Cryptonets: Applying neural networks to encrypted data with high throughput and accuracy,” in International Conference on Machine Learning, 2016, pp. 201–210.
  • Hesamifard et al. [2017] E. Hesamifard, H. Takabi, and M. Ghasemi, “Cryptodl: Deep neural networks over encrypted data,” arXiv preprint arXiv:1711.05189, 2017.
  • Chabanne et al. [2017] H. Chabanne, A. de Wargny, J. Milgram, C. Morel, and E. Prouff, “Privacy-preserving classification on deep neural network.” IACR Cryptology ePrint Archive, vol. 2017, p. 35, 2017.
  • Bourse et al. [2017] F. Bourse, M. Minelli, M. Minihold, and P. Paillier, “Fast homomorphic evaluation of deep discretized neural networks,” IACR Cryptology ePrint Archive, 2017.
  • Sanyal et al. [2018] A. Sanyal, M. J. Kusner, A. Gascón, and V. Kanade, “Tapas: Tricks to accelerate (encrypted) prediction as a service,” arXiv preprint arXiv:1806.03461, 2018.
  • Abadi et al. [2016a] M. Abadi, A. Chu, I. Goodfellow, H. B. McMahan, I. Mironov, K. Talwar, and L. Zhang, “Deep learning with differential privacy,” in Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security.   ACM, 2016, pp. 308–318.
  • Chaudhuri and Monteleoni [2009] K. Chaudhuri and C. Monteleoni, “Privacy-preserving logistic regression,” in Advances in Neural Information Processing Systems, 2009, pp. 289–296.
  • Dwork et al. [2006] C. Dwork, K. Kenthapadi, F. McSherry, I. Mironov, and M. Naor, “Our data, ourselves: Privacy via distributed noise generation,” in Annual International Conference on the Theory and Applications of Cryptographic Techniques.   Springer, 2006, pp. 486–503.
  • Dwork et al. [2010] C. Dwork, G. N. Rothblum, and S. Vadhan, “Boosting and differential privacy,” in 2010 IEEE 51st Annual Symposium on Foundations of Computer Science.   IEEE, 2010, pp. 51–60.
  • Papernot et al. [2016a] N. Papernot, M. Abadi, U. Erlingsson, I. Goodfellow, and K. Talwar, “Semi-supervised knowledge transfer for deep learning from private training data,” arXiv preprint arXiv:1610.05755, 2016.
  • Papernot et al. [2018] N. Papernot, S. Song, I. Mironov, A. Raghunathan, K. Talwar, and Ú. Erlingsson, “Scalable private learning with pate,” arXiv preprint arXiv:1802.08908, 2018.
  • McMahan et al. [2018] H. B. McMahan, D. Ramage, K. Talwar, and L. Zhang, “Learning differentially private recurrent language models,” 2018.
  • Ermis and Cemgil [2017] B. Ermis and A. T. Cemgil, “Differentially private variational dropout,” arXiv preprint arXiv:1712.02629, 2017.
  • Shokri and Shmatikov [2015] R. Shokri and V. Shmatikov, “Privacy-preserving deep learning,” in Proceedings of the 22nd ACM SIGSAC conference on computer and communications security.   ACM, 2015, pp. 1310–1321.
  • Aono et al. [2018] Y. Aono, T. Hayashi, L. Wang, S. Moriai et al., “Privacy-preserving deep learning via additively homomorphic encryption,” IEEE Transactions on Information Forensics and Security, vol. 13, no. 5, pp. 1333–1345, 2018.
  • Elsayed et al. [2018] G. F. Elsayed, S. Shankar, B. Cheung, N. Papernot, A. Kurakin, I. Goodfellow, and J. Sohl-Dickstein, “Adversarial examples that fool both human and computer vision,” arXiv preprint arXiv:1802.08195, 2018.
  • Finlayson et al. [2018] S. G. Finlayson, I. S. Kohane, and A. L. Beam, “Adversarial attacks against medical deep learning systems,” arXiv preprint arXiv:1804.05296, 2018.
  • Carlini and Wagner [2018] N. Carlini and D. Wagner, “Audio adversarial examples: Targeted attacks on speech-to-text,” arXiv preprint arXiv:1801.01944, 2018.
  • Sun et al. [2017] Y. Sun, L. Li, Z. Xie, Q. Xie, X. Li, and G. Xu, “Co-training an improved recurrent neural network with probability statistic models for named entity recognition,” in International Conference on Database Systems for Advanced Applications.   Springer, 2017, pp. 545–555.
  • Krizhevsky et al. [2012] A. Krizhevsky, I. Sutskever, and G. E. Hinton, “Imagenet classification with deep convolutional neural networks,” in Advances in neural information processing systems, 2012, pp. 1097–1105.
  • He et al. [2016] K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” in Proceedings of the IEEE conference on computer vision and pattern recognition, 2016, pp. 770–778.
  • Huang et al. [2017a] G. Huang, Z. Liu, K. Q. Weinberger, and L. van der Maaten, “Densely connected convolutional networks,” in Proceedings of the IEEE conference on computer vision and pattern recognition, vol. 1, no. 2, 2017, p. 3.
  • Hochreiter and Schmidhuber [1997] S. Hochreiter and J. Schmidhuber, “Long short-term memory,” Neural computation, vol. 9, no. 8, pp. 1735–1780, 1997.
  • Cho et al. [2014] K. Cho, B. Van Merriënboer, C. Gulcehre, D. Bahdanau, F. Bougares, H. Schwenk, and Y. Bengio, “Learning phrase representations using rnn encoder-decoder for statistical machine translation,” arXiv preprint arXiv:1406.1078, 2014.
  • Goodfellow et al. [2014b] I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio, “Generative adversarial nets,” in Advances in neural information processing systems, 2014, pp. 2672–2680.
  • ElGamal [1985] T. ElGamal, “A public key cryptosystem and a signature scheme based on discrete logarithms,” IEEE transactions on information theory, vol. 31, no. 4, pp. 469–472, 1985.
  • Goldwasser and Micali [1982] S. Goldwasser and S. Micali, “Probabilistic encryption & how to play mental poker keeping secret all partial information,” in Proceedings of the fourteenth annual ACM symposium on Theory of computing.   ACM, 1982, pp. 365–377.
  • Benaloh [1994] J. Benaloh, “Dense probabilistic encryption,” in Proceedings of the workshop on selected areas of cryptography, 1994, pp. 120–128.
  • Paillier [1999] P. Paillier, “Public-key cryptosystems based on composite degree residuosity classes,” in International Conference on the Theory and Applications of Cryptographic Techniques.   Springer, 1999, pp. 223–238.
  • Gentry [2009] C. Gentry, “A fully homomorphic encryption scheme,” Ph.D. dissertation, Stanford University, 2009.
  • Van Dijk et al. [2010] M. Van Dijk, C. Gentry, S. Halevi, and V. Vaikuntanathan, “Fully homomorphic encryption over the integers,” in Annual International Conference on the Theory and Applications of Cryptographic Techniques.   Springer, 2010, pp. 24–43.
  • Yagisawa [2015] M. Yagisawa, “Fully homomorphic encryption without bootstrapping.” IACR Cryptology ePrint Archive, vol. 2015, p. 474, 2015.
  • Brakerski and Vaikuntanathan [2014] Z. Brakerski and V. Vaikuntanathan, “Efficient fully homomorphic encryption from (standard) lwe,” SIAM Journal on Computing, vol. 43, no. 2, pp. 831–871, 2014.
  • Bos et al. [2013] J. W. Bos, K. Lauter, J. Loftus, and M. Naehrig, “Improved security for a ring-based fully homomorphic encryption scheme,” in IMA International Conference on Cryptography and Coding.   Springer, 2013, pp. 45–64.
  • Ducas and Micciancio [2015] L. Ducas and D. Micciancio, “Fhew: bootstrapping homomorphic encryption in less than a second,” in Annual International Conference on the Theory and Applications of Cryptographic Techniques.   Springer, 2015, pp. 617–640.
  • Dwork [2008] C. Dwork, “Differential privacy: A survey of results,” in International Conference on Theory and Applications of Models of Computation.   Springer, 2008, pp. 1–19.
  • Dwork and Lei [2009] C. Dwork and J. Lei, “Differential privacy and robust statistics,” in Proceedings of the forty-first annual ACM symposium on Theory of computing.   ACM, 2009, pp. 371–380.
  • Kairouz et al. [2013] P. Kairouz, S. Oh, and P. Viswanath, “The composition theorem for differential privacy,” arXiv preprint arXiv:1311.0776, 2013.
  • Bun and Steinke [2016] M. Bun and T. Steinke, “Concentrated differential privacy: Simplifications, extensions, and lower bounds,” in Theory of Cryptography Conference.   Springer, 2016, pp. 635–658.
  • Szegedy et al. [2013] C. Szegedy, W. Zaremba, I. Sutskever, J. Bruna, D. Erhan, I. Goodfellow, and R. Fergus, “Intriguing properties of neural networks,” arXiv preprint arXiv:1312.6199, 2013.
  • Athalye et al. [2018] A. Athalye, N. Carlini, and D. Wagner, “Obfuscated gradients give a false sense of security: Circumventing defenses to adversarial examples,” arXiv preprint arXiv:1802.00420, 2018.
  • Athalye and Sutskever [2017] A. Athalye and I. Sutskever, “Synthesizing robust adversarial examples,” arXiv preprint arXiv:1707.07397, 2017.
  • Huang et al. [2017b] S. Huang, N. Papernot, I. Goodfellow, Y. Duan, and P. Abbeel, “Adversarial attacks on neural network policies,” arXiv preprint arXiv:1702.02284, 2017.
  • Alfeld et al. [2016] S. Alfeld, X. Zhu, and P. Barford, “Data poisoning attacks against autoregressive models.” in AAAI, 2016, pp. 1452–1458.
  • Carlini and Wagner [2017a] N. Carlini and D. Wagner, “Adversarial examples are not easily detected: Bypassing ten detection methods,” in Proceedings of the 10th ACM Workshop on Artificial Intelligence and Security.   ACM, 2017, pp. 3–14.
  • Moosavi-Dezfooli et al. [2017] S.-M. Moosavi-Dezfooli, A. Fawzi, O. Fawzi, and P. Frossard, “Universal adversarial perturbations,” arXiv preprint, 2017.
  • Zantedeschi et al. [2017] V. Zantedeschi, M.-I. Nicolae, and A. Rawat, “Efficient defenses against adversarial attacks,” in Proceedings of the 10th ACM Workshop on Artificial Intelligence and Security.   ACM, 2017, pp. 39–49.
  • Jagielski et al. [2018] M. Jagielski, A. Oprea, B. Biggio, C. Liu, C. Nita-Rotaru, and B. Li, “Manipulating machine learning: Poisoning attacks and countermeasures for regression learning,” arXiv preprint arXiv:1804.00308, 2018.
  • Ma et al. [2018] X. Ma, B. Li, Y. Wang, S. M. Erfani, S. Wijewickrema, M. E. Houle, G. Schoenebeck, D. Song, and J. Bailey, “Characterizing adversarial subspaces using local intrinsic dimensionality,” arXiv preprint arXiv:1801.02613, 2018.
  • Kurakin et al. [2016a] A. Kurakin, I. Goodfellow, and S. Bengio, “Adversarial examples in the physical world,” arXiv preprint arXiv:1607.02533, 2016.
  • Na et al. [2017] T. Na, J. H. Ko, and S. Mukhopadhyay, “Cascade adversarial machine learning regularized with a unified embedding,” arXiv preprint arXiv:1708.02582, 2017.
  • Dong et al. [2017] Y. Dong, F. Liao, T. Pang, H. Su, X. Hu, J. Li, and J. Zhu, “Boosting adversarial attacks with momentum,” arXiv preprint arXiv:1710.06081, 2017.
  • Kolter and Wong [2017] J. Z. Kolter and E. Wong, “Provable defenses against adversarial examples via the convex outer adversarial polytope,” arXiv preprint arXiv:1711.00851, 2017.
  • Papernot et al. [2016b] N. Papernot, P. McDaniel, S. Jha, M. Fredrikson, Z. B. Celik, and A. Swami, “The limitations of deep learning in adversarial settings,” in Security and Privacy (EuroS&P), 2016 IEEE European Symposium on.   IEEE, 2016, pp. 372–387.
  • Carlini and Wagner [2017b] N. Carlini and D. Wagner, “Towards evaluating the robustness of neural networks,” in Security and Privacy (SP), 2017 IEEE Symposium on.   IEEE, 2017, pp. 39–57.
  • Guo et al. [2017] C. Guo, M. Rana, M. Cissé, and L. van der Maaten, “Countering adversarial images using input transformations,” arXiv preprint arXiv:1711.00117, 2017.
  • Baluja and Fischer [2017] S. Baluja and I. Fischer, “Adversarial transformation networks: Learning to generate adversarial examples,” arXiv preprint arXiv:1703.09387, 2017.
  • Ilyas et al. [2017] A. Ilyas, A. Jalal, E. Asteri, C. Daskalakis, and A. G. Dimakis, “The robust manifold defense: Adversarial training using generative models,” arXiv preprint arXiv:1712.09196, 2017.
  • Sharma and Chen [2017] Y. Sharma and P.-Y. Chen, “Attacking the madry defense model with $L_1$-based adversarial examples,” arXiv preprint arXiv:1710.10733, 2017.
  • Xie et al. [2017] C. Xie, J. Wang, Z. Zhang, Z. Ren, and A. Yuille, “Mitigating adversarial effects through randomization,” arXiv preprint arXiv:1711.01991, 2017.
  • Papernot et al. [2016c] N. Papernot, P. McDaniel, X. Wu, S. Jha, and A. Swami, “Distillation as a defense to adversarial perturbations against deep neural networks,” in Security and Privacy (SP), 2016 IEEE Symposium on.   IEEE, 2016, pp. 582–597.
  • Ateniese et al. [2015] G. Ateniese, L. V. Mancini, A. Spognardi, A. Villani, D. Vitali, and G. Felici, “Hacking smart machines with smarter ones: How to extract meaningful data from machine learning classifiers,” International Journal of Security and Networks, vol. 10, no. 3, pp. 137–150, 2015.
  • Metzen et al. [2017] J. H. Metzen, T. Genewein, V. Fischer, and B. Bischoff, “On detecting adversarial perturbations,” arXiv preprint arXiv:1702.04267, 2017.
  • Papernot et al. [2017] N. Papernot, P. McDaniel, I. Goodfellow, S. Jha, Z. B. Celik, and A. Swami, “Practical black-box attacks against machine learning,” in Proceedings of the 2017 ACM on Asia Conference on Computer and Communications Security.   ACM, 2017, pp. 506–519.
  • Sinha et al. [2017] A. Sinha, H. Namkoong, and J. Duchi, “Certifiable distributional robustness with principled adversarial training,” arXiv preprint arXiv:1710.10571, 2017.
  • Grosse et al. [2016] K. Grosse, N. Papernot, P. Manoharan, M. Backes, and P. McDaniel, “Adversarial perturbations against deep neural networks for malware classification,” arXiv preprint arXiv:1606.04435, 2016.
  • He et al. [2017] W. He, J. Wei, X. Chen, N. Carlini, and D. Song, “Adversarial example defenses: Ensembles of weak defenses are not strong,” arXiv preprint arXiv:1706.04701, 2017.
  • Long et al. [2017] Y. Long, V. Bindschaedler, and C. A. Gunter, “Towards measuring membership privacy,” arXiv preprint arXiv:1712.09136, 2017.
  • Behzadan and Munir [2017] V. Behzadan and A. Munir, “Vulnerability of deep reinforcement learning to policy induction attacks,” in International Conference on Machine Learning and Data Mining in Pattern Recognition.   Springer, 2017, pp. 262–275.
  • Kurakin et al. [2016b] A. Kurakin, I. Goodfellow, and S. Bengio, “Adversarial machine learning at scale,” arXiv preprint arXiv:1611.01236, 2016.
  • Tramèr et al. [2017] F. Tramèr, A. Kurakin, N. Papernot, I. Goodfellow, D. Boneh, and P. McDaniel, “Ensemble adversarial training: Attacks and defenses,” arXiv preprint arXiv:1705.07204, 2017.
  • Biggio et al. [2012] B. Biggio, B. Nelson, and P. Laskov, “Poisoning attacks against support vector machines,” arXiv preprint arXiv:1206.6389, 2012.
  • Mei and Zhu [2015] S. Mei and X. Zhu, “Using machine teaching to identify optimal training-set attacks on machine learners.” in AAAI, 2015, pp. 2871–2877.
  • Koh and Liang [2017] P. W. Koh and P. Liang, “Understanding black-box predictions via influence functions,” arXiv preprint arXiv:1703.04730, 2017.
  • Patil et al. [2014] K. R. Patil, X. Zhu, Ł. Kopeć, and B. C. Love, “Optimal teaching for limited-capacity human learners,” in Advances in neural information processing systems, 2014, pp. 2465–2473.
  • Muñoz-González et al. [2017] L. Muñoz-González, B. Biggio, A. Demontis, A. Paudice, V. Wongrassamee, E. C. Lupu, and F. Roli, “Towards poisoning of deep learning algorithms with back-gradient optimization,” in Proceedings of the 10th ACM Workshop on Artificial Intelligence and Security.   ACM, 2017, pp. 27–38.
  • Yang et al. [2017] C. Yang, Q. Wu, H. Li, and Y. Chen, “Generative poisoning attack method against neural networks,” arXiv preprint arXiv:1703.01340, 2017.
  • Chen et al. [2017] X. Chen, C. Liu, B. Li, K. Lu, and D. Song, “Targeted backdoor attacks on deep learning systems using data poisoning,” arXiv preprint arXiv:1712.05526, 2017.
  • Xie et al. [2018] L. Xie, K. Lin, S. Wang, F. Wang, and J. Zhou, “Differentially private generative adversarial network,” arXiv preprint arXiv:1802.06739, 2018.
  • Phan et al. [2016] N. Phan, Y. Wang, X. Wu, and D. Dou, “Differential privacy preservation for deep auto-encoders: an application of human behavior prediction.” in AAAI, vol. 16, 2016, pp. 1309–1316.
  • Phan et al. [2017a] N. Phan, X. Wu, and D. Dou, “Preserving differential privacy in convolutional deep belief networks,” Machine Learning, vol. 106, no. 9-10, pp. 1681–1704, 2017.
  • Triastcyn and Faltings [2018] A. Triastcyn and B. Faltings, “Generating differentially private datasets using gans,” arXiv preprint arXiv:1803.03148, 2018.
  • Phan et al. [2017b] N. Phan, X. Wu, H. Hu, and D. Dou, “Adaptive laplace mechanism: differential privacy preservation in deep learning,” in Data Mining (ICDM), 2017 IEEE International Conference on.   IEEE, 2017, pp. 385–394.
  • Barni et al. [2006] M. Barni, C. Orlandi, and A. Piva, “A privacy-preserving protocol for neural-network-based computation,” in Proceedings of the 8th workshop on Multimedia and security.   ACM, 2006, pp. 146–151.
  • Orlandi et al. [2007] C. Orlandi, A. Piva, and M. Barni, “Oblivious neural network computing via homomorphic encryption,” EURASIP Journal on Information Security, vol. 2007, no. 1, p. 037343, 2007.
  • Liu et al. [2016] M. Liu, H. Jiang, J. Chen, A. Badokhon, X. Wei, and M.-C. Huang, “A collaborative privacy-preserving deep learning system in distributed mobile environment,” in Computational Science and Computational Intelligence (CSCI), 2016 International Conference on.   IEEE, 2016, pp. 192–197.
  • Liu et al. [2017] J. Liu, M. Juuti, Y. Lu, and N. Asokan, “Oblivious neural network predictions via minionn transformations,” in Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security.   ACM, 2017, pp. 619–631.
  • Mohassel and Zhang [2017] P. Mohassel and Y. Zhang, “Secureml: A system for scalable privacy-preserving machine learning,” in 2017 38th IEEE Symposium on Security and Privacy (SP).   IEEE, 2017, pp. 19–38.
  • Rouhani et al. [2017] B. D. Rouhani, M. S. Riazi, and F. Koushanfar, “Deepsecure: Scalable provably-secure deep learning,” arXiv preprint arXiv:1705.08963, 2017.
  • Juvekar et al. [2018] C. Juvekar, V. Vaikuntanathan, and A. Chandrakasan, “Gazelle: A low latency framework for secure neural network inference,” arXiv preprint arXiv:1801.05507, 2018.
  • Chase et al. [2017] M. Chase, R. Gilad-Bachrach, K. Laine, K. Lauter, and P. Rindal, “Private collaborative neural network learning,” Cryptology ePrint Archive, Report 2017/762, 2017. https://eprint.iacr.org/2017/762
  • Acar et al. [2017] A. Acar, Z. B. Celik, H. Aksu, A. S. Uluagac, and P. McDaniel, “Achieving secure and differentially private computations in multiparty settings,” in Privacy-Aware Computing (PAC), 2017 IEEE Symposium on.   IEEE, 2017, pp. 49–59.
  • Riazi et al. [2018] M. S. Riazi, C. Weinert, O. Tkachenko, E. M. Songhori, T. Schneider, and F. Koushanfar, “Chameleon: A hybrid secure computation framework for machine learning applications,” in Proceedings of the 2018 on Asia Conference on Computer and Communications Security.   ACM, 2018, pp. 707–721.
  • Yan et al. [2016] C. Yan, X. Wenyuan, and J. Liu, “Can you trust autonomous vehicles: Contactless attacks against sensors of self-driving vehicle,” DEF CON, 2016.
  • Papernot et al. [2016d] N. Papernot, P. McDaniel, A. Sinha, and M. Wellman, “Towards the science of security and privacy in machine learning,” arXiv preprint arXiv:1611.03814, 2016.
  • Kingma and Ba [2014] D. P. Kingma and J. Ba, “Adam: A method for stochastic optimization,” arXiv preprint arXiv:1412.6980, 2014.
  • Papernot et al. [2016e] N. Papernot, P. McDaniel, and I. Goodfellow, “Transferability in machine learning: from phenomena to black-box attacks using adversarial samples,” arXiv preprint arXiv:1605.07277, 2016.
  • Deng et al. [2009] J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, and L. Fei-Fei, “Imagenet: A large-scale hierarchical image database,” in Computer Vision and Pattern Recognition, 2009. CVPR 2009. IEEE Conference on.   IEEE, 2009, pp. 248–255.
  • Madry et al. [2017] A. Madry, A. Makelov, L. Schmidt, D. Tsipras, and A. Vladu, “Towards deep learning models resistant to adversarial attacks,” arXiv preprint arXiv:1706.06083, 2017.
  • Graese et al. [2016] A. Graese, A. Rozsa, and T. E. Boult, “Assessing threat of adversarial examples on deep neural networks,” in Machine Learning and Applications (ICMLA), 2016 15th IEEE International Conference on.   IEEE, 2016, pp. 69–74.
  • Xu et al. [2017] W. Xu, D. Evans, and Y. Qi, “Feature squeezing: Detecting adversarial examples in deep neural networks,” arXiv preprint arXiv:1704.01155, 2017.
  • Kingma and Ba [2015] D. P. Kingma and J. Ba, “Adam: A method for stochastic optimization,” in International Conference on Learning Representations (ICLR), 2015.
  • Rudin et al. [1992] L. I. Rudin, S. Osher, and E. Fatemi, “Nonlinear total variation based noise removal algorithms,” Physica D: nonlinear phenomena, vol. 60, no. 1-4, pp. 259–268, 1992.
  • Oord et al. [2016] A. v. d. Oord, N. Kalchbrenner, and K. Kavukcuoglu, “Pixel recurrent neural networks,” arXiv preprint arXiv:1601.06759, 2016.
  • Fredrikson et al. [2015] M. Fredrikson, S. Jha, and T. Ristenpart, “Model inversion attacks that exploit confidence information and basic countermeasures,” in Proceedings of the 22nd ACM SIGSAC Conference on Computer and Communications Security.   ACM, 2015, pp. 1322–1333.
  • Shokri et al. [2017] R. Shokri, M. Stronati, C. Song, and V. Shmatikov, “Membership inference attacks against machine learning models,” in Security and Privacy (SP), 2017 IEEE Symposium on.   IEEE, 2017, pp. 3–18.
  • Lindell [2005] Y. Lindell, “Secure multiparty computation for privacy preserving data mining,” in Encyclopedia of Data Warehousing and Mining.   IGI Global, 2005, pp. 1005–1009.
  • LeCun et al. [2010] Y. LeCun, C. Cortes, and C. Burges, “Mnist handwritten digit database,” AT&T Labs [Online]. Available: http://yann.lecun.com/exdb/mnist, vol. 2, 2010.
  • Chillotti et al. [2016] I. Chillotti, N. Gama, M. Georgieva, and M. Izabachene, “Faster fully homomorphic encryption: Bootstrapping in less than 0.1 seconds,” in International Conference on the Theory and Application of Cryptology and Information Security.   Springer, 2016, pp. 3–33.
  • Courbariaux et al. [2016] M. Courbariaux, I. Hubara, D. Soudry, R. El-Yaniv, and Y. Bengio, “Binarized neural networks: Training deep neural networks with weights and activations constrained to +1 or -1,” arXiv preprint arXiv:1602.02830, 2016.
  • Dean et al. [2012] J. Dean, G. Corrado, R. Monga, K. Chen, M. Devin, M. Mao, A. Senior, P. Tucker, K. Yang, Q. V. Le et al., “Large scale distributed deep networks,” in Advances in neural information processing systems, 2012, pp. 1223–1231.
  • Abadi et al. [2016b] M. Abadi, P. Barham, J. Chen, Z. Chen, A. Davis, J. Dean, M. Devin, S. Ghemawat, G. Irving, M. Isard et al., “Tensorflow: a system for large-scale machine learning.” in OSDI, vol. 16, 2016, pp. 265–283.
  • Lee et al. [2018] S. Lee, H. Kim, J. Park, J. Jang, C.-S. Jeong, and S. Yoon, “Tensorlightning: A traffic-efficient distributed deep learning on commodity spark clusters,” IEEE Access, 2018.
  • Li et al. [2014] M. Li, D. G. Andersen, J. W. Park, A. J. Smola, A. Ahmed, V. Josifovski, J. Long, E. J. Shekita, and B.-Y. Su, “Scaling distributed machine learning with the parameter server.” in OSDI, vol. 14, 2014, pp. 583–598.
  • Dwork et al. [2014] C. Dwork, A. Roth et al., “The algorithmic foundations of differential privacy,” Foundations and Trends® in Theoretical Computer Science, vol. 9, no. 3–4, pp. 211–407, 2014.
  • Bengio et al. [2009] Y. Bengio et al., “Learning deep architectures for ai,” Foundations and trends® in Machine Learning, vol. 2, no. 1, pp. 1–127, 2009.
  • Lee et al. [2009] H. Lee, R. Grosse, R. Ranganath, and A. Y. Ng, “Convolutional deep belief networks for scalable unsupervised learning of hierarchical representations,” in Proceedings of the 26th annual international conference on machine learning.   ACM, 2009, pp. 609–616.
  • Bach et al. [2015a] S. Bach, A. Binder, G. Montavon, F. Klauschen, K.-R. Müller, and W. Samek, “On pixel-wise explanations for non-linear classifier decisions by layer-wise relevance propagation,” PloS one, vol. 10, no. 7, p. e0130140, 2015.
  • Simonyan et al. [2013] K. Simonyan, A. Vedaldi, and A. Zisserman, “Deep inside convolutional networks: Visualising image classification models and saliency maps,” arXiv preprint arXiv:1312.6034, 2013.
  • Bach et al. [2015b] S. Bach, A. Binder, G. Montavon, F. Klauschen, K.-R. Müller, and W. Samek, “On pixel-wise explanations for non-linear classifier decisions by layer-wise relevance propagation,” PloS one, vol. 10, no. 7, p. e0130140, 2015.
  • Shrikumar et al. [2017] A. Shrikumar, P. Greenside, and A. Kundaje, “Learning important features through propagating activation differences,” arXiv preprint arXiv:1704.02685, 2017.
  • Hinton et al. [2015] G. Hinton, O. Vinyals, and J. Dean, “Distilling the knowledge in a neural network,” arXiv preprint arXiv:1503.02531, 2015.
  • Ronneberger et al. [2015] O. Ronneberger, P. Fischer, and T. Brox, “U-net: Convolutional networks for biomedical image segmentation,” in International Conference on Medical image computing and computer-assisted intervention.   Springer, 2015, pp. 234–241.
  • Çiçek et al. [2016] Ö. Çiçek, A. Abdulkadir, S. S. Lienkamp, T. Brox, and O. Ronneberger, “3d u-net: learning dense volumetric segmentation from sparse annotation,” in International Conference on Medical Image Computing and Computer-Assisted Intervention.   Springer, 2016, pp. 424–432.
  • Li et al. [2017] X. Li, H. Chen, X. Qi, Q. Dou, C.-W. Fu, and P. A. Heng, “H-denseunet: Hybrid densely connected unet for liver and liver tumor segmentation from ct volumes,” arXiv preprint arXiv:1709.07330, 2017.
  • Jo et al. [2018] Y. Jo, H. Cho, S. Y. Lee, G. Choi, G. Kim, H.-s. Min, and Y. Park, “Quantitative phase imaging and artificial intelligence: A review,” arXiv preprint arXiv:1806.03982, 2018.