## 1 Introduction

Despite its advantages in application utility, the pervasiveness of machine learning exposes new vulnerabilities in software systems. Adversaries can conduct devastating attacks in which deployed machine learning models are used to reveal sensitive information in private training data Fredrikson:2015:MIA ; DBLP:conf/ijcai/SW15 ; 7958568 ; DBLP:journals/corr/PapernotMSW16 , or to cause the models to misclassify, e.g., via adversarial examples DBLP:journals/corr/GoodfellowSS14 ; DBLP:journals/corr/LiuCLS16 ; 7958570 . Efforts to address this issue typically seek one of three solutions: (1) privacy-preserving models, which do not reveal sensitive information about the subjects involved in the training data Abadi ; 7953387 ; (2) adversarial training algorithms, which augment the training data with both benign examples and adversarial examples crafted during the training process, thereby increasing robustness DBLP:journals/corr/PapernotMSW16 ; 7965897 ; 7965869 ; DBLP:journals/corr/WangGZOXGL16 ; and (3) provable robustness, in which robustness against adversarial examples is guaranteed pmlr-v70-cisse17a ; DBLP:journals/corr/abs-1711-00851 ; DBLP:journals/corr/abs-1801-09344 . On one hand, private models trained with existing privacy-preserving mechanisms are unshielded against adversarial examples. On the other hand, adversarial learning algorithms (with or without provable robustness) do not offer privacy protections for the training data. Aggressive adversaries can attack a deployed model using both privacy model attacks and adversarial examples, which poses serious risks to machine learning-based systems in critical applications, e.g., face recognition and healthcare. To be trustworthy, a model must be private and resilient to both privacy model attacks and adversarial examples. There is an urgent demand to train such a secure model with high utility. Unfortunately, learning such a model has not been studied before; it remains a largely open challenge.

In this paper, we propose a novel mechanism to: 1) preserve privacy of the training data in adversarial learning, 2) be provably and practically robust to adversarial examples, and 3) retain high model utility. Such a mechanism will greatly extend the applicability of machine learning, by fortifying the models in both privacy and security aspects. This is a non-trivial task. Existing algorithms cannot be applied to address the trade-off among model utility, privacy loss, and robustness.

Our Contributions. We first focus on establishing a solid theoretical and practical connection among privacy preservation, adversarial learning, and provable robustness. We develop a novel mechanism, called differentially private adversarial learning (DPAL), to preserve differential privacy in adversarial learning and to achieve provable robustness against adversarial examples. Differential privacy (DP) dwork2006calibrating is an elegant formulation of privacy in probabilistic terms, and provides rigorous protection for an algorithm against leaking personal information contained in its inputs. In our mechanism, privacy-preserving noise is injected into the inputs and hidden layers to achieve DP in learning private model parameters. This noise is then used to derive a novel robustness bound, by leveraging the sequential composition theory in DP Dwork:2014:AFD:2693052.2693053 . In our theoretical analysis, the noise injected into different layers is considered as a sequence of defensive mechanisms, each providing a different level of robustness; the generalized robustness condition is a composition of these levels of robustness. To our knowledge, our result establishes the first connection between DP preservation to protect the original training data and provable robustness in deep learning.

Although our robustness condition helps the model avoid misclassifying adversarial examples, it does not improve the model's decision boundary. To tackle this, we incorporate ensemble adversarial learning into our mechanism to improve the decision boundary, thus limiting the number of robustness violations. First, we introduce the concept of DP adversarial examples, which are crafted under DP guarantees. We then address the trade-off between model utility and privacy loss by designing a new DP adversarial objective function that tightens the global sensitivity, thus reducing the amount of noise injected into the function. Rigorous experiments conducted on the MNIST and CIFAR-10 datasets Lecun726791 ; krizhevsky2009learning show that our mechanism notably enhances the robustness of DP deep neural networks.

## 2 Background and Problem Definition

In this section, we revisit adversarial learning and DP, and introduce our problem definition. Let $D$ be a database that contains $N$ tuples, each of which contains data $x$ and a ground-truth label $y$. Let us consider a classification task with $K$ possible categorical outcomes; i.e., the label $y$ of a given $x$ is assigned to exactly one of the $K$ categories. Each $y$ can be considered as a one-hot vector of $K$ categories. On input $x$ and parameters $\theta$, a model outputs class scores $f: \mathbb{R}^d \rightarrow \mathbb{R}^K$ that map $d$-dimensional inputs to a vector of scores $f(x) = \{f_1(x), \ldots, f_K(x)\}$ s.t. $\forall k: f_k(x) \in [0, 1]$ and $\sum_{k=1}^{K} f_k(x) = 1$. The class with the highest score is selected as the predicted label for the data tuple, denoted $y(x) = \arg\max_{k} f_k(x)$. A loss function $L(f(x, \theta), y)$ presents the penalty for the mismatch between the predicted values $f(x, \theta)$ and the original values $y$. We briefly revisit DP and DP-preserving techniques in deep learning.

###### Definition 1

$(\epsilon, \delta)$-DP dwork2006calibrating . A randomized algorithm $A$ fulfills $(\epsilon, \delta)$-DP if, for any two databases $D$ and $D'$ differing in at most one tuple, and for all outcome sets $O \subseteq Range(A)$, we have:

$$Pr[A(D) \in O] \le e^{\epsilon} \, Pr[A(D') \in O] + \delta \quad (1)$$

Here, $\epsilon$ controls the amount by which the distributions induced by $D$ and $D'$ may differ, and $\delta$ is a broken probability. DP also applies to general metrics, including the Hamming metric as in Definition 1 and $l_p$-norms Chatzikokolakis . DP-preserving algorithms in deep learning can be categorized into two lines: 1) introducing noise into gradients of parameters Abadi ; ShokriVitaly2015 , and 2) injecting noise into objective functions Phan0WD16 ; PhanMLJ2017 ; NHPhanICDM17 .

Adversarial Learning. For some target model $f$ and input $(x, y_{true})$, i.e., $y_{true}$ is the true label of $x$, the adversary's goal is to find an adversarial example $x^{adv} = x + \alpha$, where $\alpha$ is the perturbation introduced by the attacker, such that: (1) $x^{adv}$ and $x$ are close, and (2) the model misclassifies $x^{adv}$, i.e., $y(x^{adv}) \neq y(x)$. In this paper, we consider well-known classes of $l_p$-norm bounded attacks DBLP:journals/corr/GoodfellowSS14 . Let $B_p(\mu)$ be the $l_p$-norm ball of radius $\mu$; one of the goals in adversarial learning is to minimize the risk over adversarial examples:

$$\min_{\theta} \, \mathbb{E}_{(x, y_{true}) \sim D} \Big[ \max_{\|\alpha\|_p \le \mu} L\big(f(x + \alpha, \theta), y_{true}\big) \Big]$$

where a specific attack is used to approximate solutions to the inner maximization problem, and the outer minimization problem corresponds to training the model with parameters $\theta$ over these adversarial examples $x + \alpha$. There are two basic classes of attacks. The first is single-step algorithms, in which only a single gradient computation is required. For instance, the Fast Gradient Sign Method (FGSM) DBLP:journals/corr/GoodfellowSS14 finds an adversarial example by maximizing the loss function, setting $x^{adv} = x + \mu \cdot \mathrm{sign}\big(\nabla_x L(f(x, \theta), y_{true})\big)$. The second is iterative algorithms, in which multiple gradients are computed and updated. For instance, in DBLP:journals/corr/KurakinGB16 , FGSM is applied multiple times with small steps, each of which has a size of $\mu / T_{\mu}$, where $T_{\mu}$ is the number of steps.
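For concreteness, the single-step and iterative variants above can be sketched in a few lines of numpy. This is a minimal illustration, not the paper's implementation; `loss_grad` is a hypothetical callable returning the gradient of the loss w.r.t. the input.

```python
import numpy as np

def fgsm(x, loss_grad, mu):
    """Single-step FGSM: move each input feature by mu in the
    direction (sign of the gradient) that increases the loss."""
    return x + mu * np.sign(loss_grad(x))

def iterative_fgsm(x, loss_grad, mu, steps):
    """Iterative FGSM: apply the single-step attack `steps` times,
    each with step size mu / steps."""
    x_adv = x.copy()
    for _ in range(steps):
        x_adv = fgsm(x_adv, loss_grad, mu / steps)
    return x_adv
```

Note that the iterative variant keeps the total per-step budget equal to $\mu$, matching the step size $\mu / T_{\mu}$ described above, although the cumulative perturbation can still leave the $l_\infty$ ball of radius $\mu$ unless a projection step is added.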

To improve the robustness of models, prior work has focused on two directions: 1) producing correct predictions on adversarial examples while not compromising the accuracy on legitimate inputs 7965897 ; 7965869 ; DBLP:journals/corr/WangGZOXGL16 ; 7546524 ; 7467366 ; DBLP:journals/corr/GuR14 ; papernot2017extending ; hosseini2017blocking ; and 2) detecting adversarial examples without introducing too many false positives metzen2017detecting ; DBLP:journals/corr/GrosseMP0M17 ; DBLP:journals/corr/XuEQ17 ; DBLP:journals/corr/AbbasiG17 ; DBLP:journals/corr/GaoWQ17 . Adversarial training appears to hold the greatest promise for learning robust models tramer2017ensemble . One well-known algorithm was proposed in DBLP:journals/corr/KurakinGB16a : at every training step, new adversarial examples are generated and injected into batches containing both benign and adversarial examples. Recently, several algorithms pmlr-v70-cisse17a have been proposed to derive provable robustness, in which each prediction is guaranteed to be consistent under a perturbation $\alpha$, if a robustness condition holds.

DP and Provable Robustness. Given a benign example $x$, we focus on achieving a robustness condition against $l_p$-norm attacks, as follows:

$$\forall \alpha \in B_p(1): \; f_k(x + \alpha) > \max_{i: i \neq k} f_i(x + \alpha) \quad (2)$$

where $k = y(x)$, indicating that a small perturbation $\alpha$ in the input does not change the predicted label $y(x)$. To achieve the robustness condition in Eq. 2, Lecuyer et al. Lecuyer2018 introduced an algorithm called PixelDP. By considering an input (e.g., an image) as a database in DP parlance, and individual features (e.g., pixels) as tuples, PixelDP shows that randomizing the scoring function $f(x)$ to enforce DP on a small number of pixels in an image guarantees robustness of predictions against adversarial examples that can change up to that number of pixels. To achieve this goal, random noise is injected into either the input or some hidden layer. That results in the following $(\epsilon_r, \delta_r)$-PixelDP condition:

###### Lemma 1

$(\epsilon_r, \delta_r)$-PixelDP Lecuyer2018 . Given a randomized scoring function $f(x)$ satisfying $(\epsilon_r, \delta_r)$-PixelDP w.r.t. an $l_p$-norm metric, we have:

$$\forall k, \forall \alpha \in B_p(1): \; \mathbb{E} f_k(x) \le e^{\epsilon_r} \, \mathbb{E} f_k(x + \alpha) + \delta_r \quad (3)$$

where $\mathbb{E} f_k(x)$ is the expected value of $f_k(x)$, $\epsilon_r$ is a predefined budget, and $\delta_r$ is a broken probability.
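The stability of the expected scores in Lemma 1 is what a certification check at prediction time exploits. Below is a minimal numpy sketch of the Monte Carlo estimation and an Eq. 4-style certification test; the scoring function `score_fn`, the noise scale, and the confidence-interval handling are simplified placeholders, not the PixelDP implementation.

```python
import numpy as np

def monte_carlo_expectation(score_fn, x, sigma, n, rng):
    """Estimate E[f(x)] by averaging n invocations of the randomized
    scoring function, here with Laplace noise on the input."""
    samples = [score_fn(x + rng.laplace(0.0, sigma, size=np.shape(x)))
               for _ in range(n)]
    return np.mean(samples, axis=0)

def certified_check(e_lb, e_ub, eps_r, delta_r):
    """Eq. 4-style check: the lower bound of the top expected score must
    dominate e^{2*eps_r} times the runner-up's upper bound plus a
    (1 + e^{eps_r}) * delta_r slack."""
    k = int(np.argmax(e_lb))
    runner_up = max(e_ub[i] for i in range(len(e_ub)) if i != k)
    return e_lb[k] > np.exp(2.0 * eps_r) * runner_up + (1.0 + np.exp(eps_r)) * delta_r
```

A prediction with a large margin between the top class and the runner-up passes the check; near-ties cannot be certified.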

At prediction time, a certified robustness check is implemented for each prediction. A generalized robustness condition is proposed as follows:

$$\hat{\mathbb{E}}_{lb} f_k(x) > e^{2\epsilon_r} \max_{i: i \neq k} \hat{\mathbb{E}}_{ub} f_i(x) + (1 + e^{\epsilon_r}) \delta_r \quad (4)$$

where $\hat{\mathbb{E}}_{lb} f_k(x)$ and $\hat{\mathbb{E}}_{ub} f_i(x)$ are the lower and upper bounds of the expected value, derived from a Monte Carlo estimation with $\eta$-confidence, where $n$ is the number of invocations of $f(x)$ with independent draws of the noise. Passing the check for a given input guarantees that no perturbation exists up to $l_p(1)$-norm that causes the model to change its prediction. In other words, the classification model based on the expected scores, i.e., $\arg\max_k \hat{\mathbb{E}} f_k(x)$, is provably robust to attacks of $l_p(1)$-norm on input $x$ with probability $\geq \eta$. However, PixelDP does not preserve DP in learning private parameters to protect the training data Lecuyer2018 ; that is different from our goal.

## 3 DPAL with Provable Robustness

We introduce our new DP-preserving mechanism in Alg. 1 (Appendix A). Given a deep neural network with model parameters $\theta$ (Line 2), the network is trained by optimizing the loss function over the training examples. Optimization algorithms, e.g., SGD, are applied on random training batches, each of which is a set of training examples drawn from the training data (Lines 4-16). Our network can be represented as $f(x) = g(a(x, \theta_1), \theta_2)$, where $a(x, \theta_1)$ is a feature representation learning model with $x$ as its input, and $g$ takes the output of $a(x, \theta_1)$ and returns the class scores $f(x)$. Our idea is to use an auto-encoder to simultaneously learn DP parameters $\theta_1$ and to ensure that the output of $a(x, \theta_1)$ is DP. The reasons we choose an auto-encoder are: (1) it is easier to train, given its small size; and (2) it can be reused for different predictive models. Figure 1 shows the structure of our mechanism. We use a data reconstruction function (cross-entropy), given a batch $B$ of inputs:

(5) |

where the affine transformation of the input yields the hidden layer of the auto-encoder, whose output is the reconstruction of the input. First, we derive the 1st-order polynomial approximation of the reconstruction function by applying Taylor expansion tagkey1985 . Then, the Functional Mechanism zhang2012functional is employed to inject noise into the coefficients of the approximated function:

(6) |

In the approximated function, the parameters $\theta_1$, which will be derived from the function optimization, need to be DP; they must not disclose information from the data $x$. To achieve that, the data-dependent terms are considered the coefficients of the parameters $\theta_1$. Laplace noise with scale $\Delta_R / \epsilon_1$ is injected into these coefficients, where $\Delta_R$ is the sensitivity of the approximated function, as follows (constant terms can be ignored):

(7) |
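The coefficient perturbation step can be sketched as follows. This is a hedged illustration of the Functional Mechanism pattern; the actual sensitivity value $\Delta_R$ comes from Lemma 2, and `sensitivity` here is a placeholder argument.

```python
import numpy as np

def perturb_coefficients(coeffs, sensitivity, eps, rng):
    """Functional-mechanism style perturbation: add Laplace noise with
    scale sensitivity/eps to each coefficient of the approximated
    objective, so optimizing the perturbed function preserves eps-DP."""
    scale = sensitivity / eps
    return coeffs + rng.laplace(0.0, scale, size=np.shape(coeffs))
```

The key point is that the noise is drawn once per batch of coefficients; the optimizer then works only with the perturbed function and never touches the raw data-dependent coefficients again.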

To ensure that the computation of the reconstruction does not access the original data, we further inject Laplace noise into the input (Line 6). The perturbed function now becomes:

(8) |

where the noisy inputs and coefficients are drawn once and then fixed (Lines 3 and 6). The output of the hidden layer is bounded so that the global sensitivity $\Delta_R$ can be computed:

###### Lemma 2

Denote $\beta$ as the number of hidden neurons in the hidden layer of the auto-encoder. The global sensitivity $\Delta_R$ of the approximated function over any two neighboring batches, $B$ and $B'$, is bounded; the explicit bound is given in the Appendix.

###### Lemma 3

Algorithm 1 preserves $\epsilon_1$-DP in learning $\theta_1$.

Detailed proofs of all the lemmas can be found in the Appendix. The output of the auto-encoder is the perturbed affine transformation, which is $\epsilon_2$-DP as shown in the following lemma, where $\epsilon_2$ depends on $\epsilon_1$ and the maximum 1-norm of $\theta_1$'s columns WikipediaOperatornorm .

###### Lemma 4

Algorithm 1 preserves $\epsilon_2$-DP in computing the affine transformation.

After preserving DP in learning $\theta_1$ and in computing the affine transformation, the subsequent computation of the class scores, which does not use additional information from the original data, is also DP, thanks to the post-processing property of DP. This is crucial to tighten the sensitivity of our adversarial objective function at the output layer.

DP Adversarial Learning. To integrate adversarial learning, we first craft adversarial examples from benign examples, using an ensemble of attack algorithms and a random perturbation budget at each step (Lines 9-13). Adversarial examples crafted from the training data are very similar to the original benign examples, which clearly poses privacy risks. Therefore, the adversarial examples are crafted from the perturbed inputs to ensure DP in the training procedure. The ensemble of adversarial examples further enhances the robustness of our model. Second, we propose a novel DP adversarial objective function, in which two loss functions, one for benign examples and one for adversarial examples, are combined to optimize the parameters $\theta_2$. The objective function is defined as follows:

(9) |

where a hyper-parameter balances the two loss terms, and a DP adversarial example is crafted from a perturbed benign input $\bar{x}$ via a single FGSM-style step, $x^{adv} = \bar{x} + \mu \cdot \mathrm{sign}\big(\nabla_{\bar{x}} L(f(\bar{x}, \theta), y(\bar{x}))\big)$, where $y(\bar{x})$ is the class prediction result of $f(\bar{x})$, used to avoid label leaking of the benign examples during the adversarial example crafting. Similar to benign examples, training the auto-encoder with the adversarial examples preserves $\epsilon_1$-DP. The crafting can be extended to iterative attacks as

(10) |

where, at each iteration, the label used is the prediction result on the current adversarial example.

Now we are ready to preserve DP in the objective functions for benign and adversarial examples in order to achieve DP in the combined objective (Eq. 9). Since the objective functions use the ground-truth labels, we need to protect the labels at the output layer. Let us first present the objective function for benign examples, given the class scores computed from the perturbed affine transformation through the network, with $\theta_2$ as the parameters at the last hidden layer. The cross-entropy function can be applied as follows:

Based on Taylor expansion, the cross-entropy term can be approximated as a 2nd-order polynomial function. Based on the post-processing property of DP, the term that depends only on the perturbed affine transformation is DP, since the computation of that transformation is $\epsilon_2$-DP (Lemma 4). As a result, we have that: 1) the optimization of that term does not disclose any information from the training data; and 2) it is unchanged between any two neighboring batches $B$ and $B'$. Thus, to preserve DP in the objective function for benign examples, we only need to preserve DP in the remaining term, which accesses the ground-truth labels. Given its coefficients, the sensitivity $\Delta_{L}$ of that term is computed as:

###### Lemma 5

Let $B$ and $B'$ be neighboring batches of benign examples. The sensitivity $\Delta_{L}$ of the label-dependent term satisfies an inequality linear in the number of hidden neurons in the last hidden layer; the explicit bound is given in the Appendix.

The sensitivity of our objective function is notably smaller than the state-of-the-art bound NHPhanICDM17 . This is crucial to improving our model utility under strong attacks while providing the same level of DP protection. The perturbed functions are as follows:

(11) |

###### Lemma 6

Algorithm 1 preserves $\epsilon_3$-differential privacy in the optimization of the objective function over benign examples.

Since the privacy budget accumulated from the perturbation of the auto-encoder is tiny in practice, the additional privacy budget used to preserve DP in the objective function for benign examples can be considered to be just the budget $\epsilon_3$ of the output-layer perturbation. We apply the same technique to preserve $\epsilon_3$-DP in the optimization of the loss function over the adversarial examples. Since the two perturbed functions are always optimized on two disjoint batches of benign and adversarial examples, the privacy budget used to preserve DP in the adversarial objective function is $\epsilon_3$, following the parallel composition property of DP Dwork:2014:AFD:2693052.2693053 . The total budget to learn the private parameters is the composition of the budgets above. Similar to other objective-function-based approaches NHPhanICDM17 ; zhang2012functional ; Lingchen18 , the optimization of our mechanism is repeated over multiple steps without using additional information from the original data: it only reads perturbed inputs and perturbed coefficients. Thus, the privacy budget consumption does not accumulate at each training step.
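The accounting above can be illustrated with a toy helper: sequential composition sums the budgets of mechanisms touching the same data, while parallel composition over disjoint batches costs only the maximum. This is a simplified sketch of the standard DP composition rules, not the paper's exact budget derivation.

```python
def sequential_budget(eps_list):
    """Sequential composition: mechanisms applied to the same data
    consume the sum of their budgets."""
    return sum(eps_list)

def parallel_budget(eps_list):
    """Parallel composition: mechanisms applied to disjoint subsets
    of the data consume only the maximum budget."""
    return max(eps_list)
```

In our setting, the benign-example and adversarial-example losses are optimized on disjoint batches, so their shared budget follows `parallel_budget`, while the auto-encoder and output-layer perturbations on the same data compose via `sequential_budget`.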

Provable Robustness. Now, we establish the correlation between our mechanism and provable robustness. On one hand, to derive a provable robustness condition against adversarial examples, the PixelDP mechanism randomizes the scoring function $f(x)$ by injecting robustness noise into either the input or a hidden layer, with a noise scale proportional to the sensitivity of that layer, measuring how much its output can change given a perturbation of the input. Monte Carlo estimation of the expected values and their lower and upper bounds is then used to derive the robustness condition in Eq. 4.

On the other hand, in our mechanism, the privacy noise consists of the Laplace noise injected into both the input and its affine transformation. Note that perturbing the coefficients of the reconstruction function is equivalent to perturbing the affine transformation; this helps us avoid injecting noise directly into the coefficients. The correlation between our DP preservation and provable robustness lies in the correlation between the privacy noise and the robustness noise: we can derive a provable robustness condition by projecting the privacy noise onto the scale of the robustness noise. By applying a group privacy size Dwork:2014:AFD:2693052.2693053 ; Lecuyer2018 , the scoring function satisfies an $(\epsilon_r, \delta_r)$-PixelDP condition w.r.t. the scale of the input perturbation. By applying Lemma 1, we have

(12) |

With that, we can achieve a provable robustness condition against $l_1$-norm attacks, as follows:

(13) |

with probability $\geq \eta$, derived from the Monte Carlo estimation of the expected scores. Our mechanism also perturbs the affine transformation (Eq. 8). Consequently, the scoring function also satisfies a PixelDP condition given the perturbation of the transformation. In addition to the robustness against $l_1$-norm attacks on the input, we therefore achieve an additional robustness bound of the form of Eq. 13 against $l_1$-norm attacks on the transformation. Similar to PixelDP, these robustness conditions can be achieved via randomization processes at inference time. They can be considered as two independent, provable defensive mechanisms, sequentially applied against the two attacks.

One challenging question here is: “What is the general robustness condition, given the noise injected into both the input and the affine transformation?” Intuitively, our model should be robust to attacks bounded by the composition of the two conditions. We leverage the theory of sequential composition in DP Dwork:2014:AFD:2693052.2693053 to theoretically answer this question. Consider independent mechanisms whose privacy guarantees are $\epsilon_i$-DP; each mechanism takes the input and outputs the value of $f(x)$ with Laplace noise injected only at its own position (i.e., no randomization at any other position). We aim to derive a generalized robustness condition for any composition scoring function with bounded output, defined as follows:

(14) |

Our setting clearly follows the sequential composition in DP Dwork:2014:AFD:2693052.2693053 . Therefore, we can prove that the expected value of the composition scoring function is insensitive to small perturbations of the input.

###### Lemma 7

Given independent mechanisms that are $\epsilon_i$-DP w.r.t. an $l_p$-norm metric, the expected output value of any sequential function of them, with bounded output, meets a stability property analogous to Eq. 3, composed over the budgets $\epsilon_i$ (the formal statement and proof are in the Appendix).

Given that the expected value of our scoring function is insensitive to small perturbations, we derive our sequential composition of robustness as follows:

###### Theorem 1

(Composition of Robustness) Given independent mechanisms and any sequential function of them, using the notation from Lemma 7, further let $\hat{\mathbb{E}}_{lb}$ and $\hat{\mathbb{E}}_{ub}$ be the lower and upper bounds, with $\eta$-confidence, of the Monte Carlo estimation of the expected scores. For any input $x$, if there exists a perturbation size such that

$$\hat{\mathbb{E}}_{lb} f_k(x) > e^{2 \sum_i \epsilon_i} \max_{i: i \neq k} \hat{\mathbb{E}}_{ub} f_i(x) + \Big(1 + e^{\sum_i \epsilon_i}\Big) \sum_i \delta_i \quad (15)$$

then the predicted label $k$ is robust to adversarial examples within that perturbation size, with probability $\geq \eta$, satisfying the targeted robustness condition in Eq. 2.

To apply the composition of robustness in our mechanism, the noise injections into the input and the affine transformation can be considered as two independent mechanisms, sequentially applied. When one mechanism is applied by invoking $f(x)$ with independent draws of its noise, the noise injected into the other is fixed (Lines 3 and 6), and vice versa. By applying group privacy Dwork:2014:AFD:2693052.2693053 with the appropriate group sizes, the scoring function under each perturbation satisfies the corresponding PixelDP condition. With Theorem 1, we have a generalized robustness condition as follows:

###### Proposition 1

(DPAL Robustness). For any input $x$, if the condition of Theorem 1 holds for the two mechanisms above, then the predicted label of our function is robust to small perturbations of the input with probability $\geq \eta$, satisfying the targeted robustness condition in Eq. 2.

At inference time, the failure probability $1 - \eta$ can be made arbitrarily small by increasing the number of invocations of $f(x)$ with independent draws of the noise. Similar to Lecuyer2018 , Hoeffding's inequality can be applied to bound the approximation error in the expected scores. We use sensitivity bounds based on the maximum 1-norm of $\theta_1$'s rows and columns.

Training and Verified Inferring. Our model is trained similarly to typical deep neural networks. Note that we optimize with a single draw of noise during training (Line 3). The parameters $\theta_1$ and $\theta_2$ are independently updated by applying gradient descent (Lines 16-17). At inference time, we implement a verified inference procedure as a post-processing step (Lines 18-23). Our verified inference returns a robustness size guarantee for each example $x$, which is the maximal perturbation size for which the robustness condition in Proposition 1 holds. Maximizing the robustness size is equivalent to maximizing the robustness epsilon, which is the only parameter controlling the size; all other hyper-parameters are fixed given a well-trained model:

(16) |

The prediction on an example is then guaranteed robust to attacks up to the returned robustness size. We also propose a new way to draw the independent noise at inference time, by shifting the distributions of the fixed noise draws used to train the network, for both the input and the transformation, with a parameter controlling the distribution shifts. This works better in practice without affecting the DP bounds or the robustness (details are in Appendix J).

## 4 Experimental Results

We have carried out extensive experiments on the MNIST and CIFAR-10 datasets. We consider the well-known class of norm-bounded adversaries to examine whether our mechanism can retain high model utility while providing strong DP guarantees and protection against adversarial examples, compared with existing mechanisms. Our DPAL mechanism is evaluated in comparison with state-of-the-art mechanisms in: (1) DP-preserving algorithms in deep learning, i.e., DP-SGD Abadi and AdLM NHPhanICDM17 ; and (2) provable robustness, i.e., PixelDP Lecuyer2018 . To preserve DP, DP-SGD injects random noise into gradients of parameters, while AdLM is a Functional Mechanism-based approach. PixelDP is one of the state-of-the-art mechanisms providing provable robustness using DP bounds. The baseline models share the same design in our experiments. Four white-box attacks were used, including FGSM, I-FGSM, the Momentum Iterative Method (MIM) DBLP:journals/corr/abs-1710-06081 , and MadryEtAl madry2018towards . All models share the same structure, consisting of 2 and 3 convolution layers for MNIST and CIFAR-10, respectively (detailed configurations are in Appendix J). We apply two accuracy metrics as follows:

where the denominator is the number of test cases, the correctness indicator returns 1 if the model makes a correct prediction (else, returns 0), and the robustness indicator returns 1 if the robustness size is larger than a given attack bound (else, returns 0). It is important to note that the input domain in our setting differs from the common setting, so a given attack size in the common setting corresponds to a rescaled attack size in our setting. The reason for this choice of input domain is to achieve better model utility while retaining the same global sensitivities to preserve DP. $\epsilon_1$ is used to indicate the DP budget used to protect the training data; meanwhile, the robustness budget is set to 1.0 in the training of our model.
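The two metrics can be sketched as follows; this is a hedged illustration, and the names `conventional_accuracy` and `certified_accuracy` are ours, not the paper's.

```python
import numpy as np

def conventional_accuracy(is_correct):
    """Fraction of test cases the model predicts correctly."""
    return float(np.mean(is_correct))

def certified_accuracy(is_correct, robust_sizes, attack_bound):
    """Fraction of test cases that are both correctly predicted and
    certified robust at a size exceeding the attack bound."""
    is_correct = np.asarray(is_correct, dtype=bool)
    certified = np.asarray(robust_sizes) > attack_bound
    return float(np.mean(is_correct & certified))
```

Certified accuracy is never larger than conventional accuracy, since it counts only the correct predictions whose verified robustness size exceeds the attack bound.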
