The success of deep learning networks has motivated their deployment in some safety-critical environments, such as autonomous driving and facial recognition systems. Applications in these areas make it urgent to understand the robustness and security of deep neural networks, especially their resilience to malicious, finely crafted inputs. Unfortunately, the performance of deep learning models is often so brittle that even imperceptibly modified inputs, also known as adversarial examples, are able to completely break the model (Goodfellow et al., 2015; Szegedy et al., 2013). The robustness of deep learning models under adversarial examples is well studied from both the attack (crafting powerful adversarial examples) and defense (making the model more robust) perspectives (Athalye et al., 2018; Carlini & Wagner, 2017a, b; Goodfellow et al., 2015; Madry et al., 2018; Papernot et al., 2016; Xiao et al., 2019a, 2018b, 2018c; Eykholt et al., 2018). Recently, it has been shown that defending deep learning models against adversarial examples is a very difficult task, especially under strong and adaptive attacks. Early defenses such as distillation (Papernot et al., 2016) have been broken by stronger attacks like C&W (Carlini & Wagner, 2017b). Many defense methods have been proposed recently (Guo et al., 2018; Song et al., 2017; Buckman et al., 2018; Ma et al., 2018; Samangouei et al., 2018; Xiao et al., 2018a), but their robustness improvement cannot be certified: no provable guarantees can be given to verify their robustness. In fact, most of these uncertified defenses are vulnerable under stronger attacks (Athalye et al., 2018; He et al., 2017).
There are, however, some methods in the literature seeking to give provable guarantees on robustness, such as distributionally robust optimization (Sinha et al., 2018), linear relaxations (Wong & Kolter, 2018; Mirman et al., 2018; Wang et al., 2018a; Dvijotham et al., 2018b; Weng et al., 2018; Zhang et al., 2018), interval bound propagation (Gowal et al., 2018), ReLU stability regularization (Xiao et al., 2019b), and semidefinite relaxations (Raghunathan et al., 2018a). Linear relaxations of neural networks, first proposed by Wong & Kolter (2018), are one of the most popular categories among these certified defenses. They use the dual of linear programming or several similar approaches to provide a linear relaxation of the network (referred to as a “convex adversarial polytope”), and the resulting bounds are tractable for robust optimization. However, these methods are both computationally and memory intensive, and can increase model training time by a factor of hundreds. On the other hand, interval bound propagation (IBP) is a simple and efficient method that can also be used for training verifiable neural networks (Gowal et al., 2018), and it achieved state-of-the-art verified error on many datasets despite its simplicity. However, since the IBP bounds are very loose during the initial phase of training, the training procedure can be unstable and sensitive to hyperparameters.
In this paper, we first investigate the limitation of existing linear relaxation based certified robust training methods and find that they over-penalize the induced norm of weight matrices, due to their nature of being linear. On the other hand, we interpret IBP as an augmented neural network, which learns to optimize a non-linear bound; the blessing of non-linearity gives IBP trained networks more powerful expressiveness. This explains the weakness of linear relaxation based methods and the success of IBP training on some tasks.
To improve the stability of IBP training, we propose a new certified robust training method, CROWN-IBP, which marries the expressive power and efficiency of IBP with a tight linear relaxation based verification bound, CROWN. CROWN-IBP bound propagation involves a forward pass and a backward pass, with computational cost significantly lower than that of purely linear relaxation based methods. In our experiments, we show that CROWN-IBP significantly improves training stability and further reduces verified errors by a noticeable margin compared to the existing IBP based approach (Gowal et al., 2018). On MNIST, we reach 7.46% and 12.96% verified error under $\ell_\infty$ distortions with $\epsilon = 0.3$ and $\epsilon = 0.4$, respectively, outperforming carefully tuned models in (Gowal et al., 2018), and significantly outperforming linear relaxation based methods (at $\epsilon = 0.3$, Wong et al. (2018) report over 40% verified error).
2 Related Work and Background
2.1 Robustness Verification and Relaxations of Neural Networks
Neural network robustness verification algorithms seek upper and lower bounds of an output neuron for all possible inputs within a set, typically a norm bounded perturbation. Most importantly, the margins of the outputs between the ground-truth class and any other classes determine model robustness. However, it has already been shown that finding the exact output range is a non-convex problem and NP-complete (Katz et al., 2017; Weng et al., 2018). Therefore, recent works resorted to giving relatively tight but computationally tractable bounds of the output range with necessary relaxations of the original problem. Many of these robustness verification approaches are based on linear relaxations of non-linear units in neural networks, including CROWN (Zhang et al., 2018), DeepPoly (Singh et al., 2019), Fast-Lin (Weng et al., 2018), DeepZ (Singh et al., 2018) and Neurify (Wang et al., 2018b). We refer the readers to (Salman et al., 2019) for a comprehensive survey on this topic. After linear relaxation, they essentially bound the output of a neural network $f(\cdot)$ by linear upper/lower hyperplanes:
$$A_{i,:}\,\Delta x + b_L \;\le\; f_i(x_0 + \Delta x) \;\le\; A_{i,:}\,\Delta x + b_U \qquad (1)$$
where $A_{i,:}$ is the product of the network weight matrices $W^{(l)}$ and diagonal matrices $D^{(l)}$ reflecting the ReLU relaxations; $b_L$ and $b_U$ are two bias terms unrelated to $\Delta x$. Additionally, Dvijotham et al. (2018c, a); Qin et al. (2019) solve the Lagrangian dual of the verification problem; Raghunathan et al. (2018a, b) propose semidefinite relaxations which are tighter than linear relaxation based methods, but computationally expensive. Bounds on the local Lipschitz constant of a neural network can also be used for verification (Hein & Andriushchenko, 2017; Zhang et al., 2019). Besides these deterministic verification approaches, randomized smoothing can be used to increase the robustness of any model in a probabilistic manner (Cohen et al., 2019; Lecuyer et al., 2018; Li et al., 2018; Liu et al., 2018).
2.2 Robust Optimization and Verifiable Adversarial Defense
To improve the robustness of neural networks against adversarial perturbations, a natural idea is to generate adversarial examples by attacking the network itself and then use them to augment the training set (Kurakin et al., 2017). More recently, Madry et al. (2018) showed that adversarial training can be formulated as solving a min-max robust optimization problem as in (2). Given a model with parameter $\theta$, loss function $L$, and training data distribution $\mathcal{X}$, the training algorithm aims to minimize the robust loss, which is defined as the max loss within a neighborhood $\{x + \delta \mid \delta \in S\}$ of each data point $x$, leading to the following robust optimization problem:
$$\min_\theta \; \mathbb{E}_{(x,y)\in\mathcal{X}} \Big[ \max_{\delta \in S} L(x + \delta; y; \theta) \Big] \qquad (2)$$
To solve this minimax problem, Madry et al. (2018) proposed to use projected gradient descent (PGD) attack to approximately solve the inner max and then use the loss on the perturbed example to update the model. Networks trained by this procedure achieve state-of-the-art test accuracy under strong attacks (Athalye et al., 2018; Wang et al., 2018a; Zheng et al., 2018).
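As a concrete illustration of the inner maximization, the sketch below runs an $\ell_\infty$ PGD attack on a toy linear classifier with logistic loss. The model, the function names (`logistic_loss`, `pgd_attack`) and all numbers are assumptions made for this example and are not taken from Madry et al. (2018). For a linear model the worst-case perturbation is known in closed form, which makes it easy to check that PGD never reports a loss larger than the true maximum.

```python
import math

def logistic_loss(w, b, x, y):
    """Logistic loss of a linear model for a label y in {-1, +1}."""
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    return math.log(1.0 + math.exp(-y * score))

def sign(v):
    return (v > 0) - (v < 0)

def pgd_attack(w, b, x, y, eps, step, iters):
    """Signed-gradient ascent on the loss, projected onto the l_inf ball around x."""
    x_adv = list(x)
    for _ in range(iters):
        score = sum(wi * xi for wi, xi in zip(w, x_adv)) + b
        # d loss / d score = -y / (1 + exp(y * score)); the chain rule gives the input gradient.
        coeff = -y / (1.0 + math.exp(y * score))
        grad = [coeff * wi for wi in w]
        x_adv = [xa + step * sign(g) for xa, g in zip(x_adv, grad)]
        # Projection: clip every coordinate back into [x - eps, x + eps].
        x_adv = [min(max(xa, xi - eps), xi + eps) for xa, xi in zip(x_adv, x)]
    return x_adv

# Toy model and data point (hypothetical values).
w, b = [1.0, -2.0], 0.0
x, y, eps = [1.0, 0.0], 1, 0.5
x_adv = pgd_attack(w, b, x, y, eps, step=0.1, iters=10)
# Closed-form worst case for a linear model: move each coordinate by eps against y * w.
x_worst = [xi - eps * y * sign(wi) for xi, wi in zip(x, w)]
clean_loss = logistic_loss(w, b, x, y)
pgd_loss = logistic_loss(w, b, x_adv, y)
worst_loss = logistic_loss(w, b, x_worst, y)
```

For a deep network no closed form for `x_worst` exists, so only the PGD value is available in practice.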
Despite being robust under strong attacks, models obtained by this PGD-based adversarial training do not have verified error guarantees. Due to the nonconvexity of neural networks, PGD attack can only compute the lower bound of robust loss (the inner maximization problem). Minimizing a lower bound of the inner max cannot guarantee (2) is minimized. In other words, even if PGD-attack cannot find a perturbation with large verified error, that does not mean there exists no such perturbation. This becomes problematic in safety-critical applications since those models need to be provably safe.
Verifiable adversarial training methods, on the other hand, aim to obtain a network with good robustness that can be verified. This can be done by combining adversarial training and robustness verification: instead of using PGD to find a lower bound of the inner max, certified adversarial training uses a verification method to find an upper bound of the inner max, and then updates the parameters based on this upper bound of the robust loss. Minimizing an upper bound of the inner max guarantees that the robust loss is no larger than the minimized value. There are two certified robust training methods that are related to our work and we will describe them in detail below.
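The relation between the two training styles can be summarized in one chain of inequalities. The symbols $L_{\mathrm{PGD}}$ (the loss at the perturbation found by PGD) and $L_{\mathrm{ver}}$ (the upper bound returned by a verification method) are introduced here only for illustration:

```latex
L_{\mathrm{PGD}}(x, y; \theta)
  \;\le\; \max_{\delta \in S} L(x + \delta; y; \theta)
  \;\le\; L_{\mathrm{ver}}(x, y; \theta)
```

Training on the left-hand side gives no certificate, whereas driving the right-hand side down also bounds the true robust loss in the middle.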
2.2.1 Linear Relaxation Based Verifiable Adversarial Training
One of the most popular verifiable adversarial training methods was proposed in (Wong & Kolter, 2018), using linear relaxations of neural networks to give an upper bound of the inner max. Other similar approaches include Mirman et al. (2018); Wang et al. (2018a); Dvijotham et al. (2018b). Since the bound propagation process of a convex adversarial polytope is too expensive, several methods were proposed to improve its efficiency, like Cauchy projection (Wong et al., 2018) and dynamic mixed training (Wang et al., 2018a). However, even with these speed-ups, the training process is still slow. Also, this method may significantly reduce the model’s standard accuracy (accuracy on the natural, unmodified test set). As our experiments in Section 4 show, this method tends to over-regularize the network during training. Intuitively, regularizing the linear relaxation of the network results in regularizing the norm of each row. Since these methods train the network to make this bound tight, an implicit regularization is added to the induced norm of the weight matrices.
2.2.2 Interval Bound Propagation (IBP)
Interval Bound Propagation (IBP) uses a very simple rule to compute the pre-activation outer bounds for each layer of the neural network. Unlike linear relaxation based methods, IBP does not relax ReLU neurons and does not consider the correlations between different layer weights; it treats each layer individually. Gowal et al. (2018) presented a verifiably robust training method that uses IBP to give output bounds. The motivation of (Gowal et al., 2018) is to speed up the training process of verifiably robust models.
However, IBP can be unstable to use in practice, since the bounds can be very loose, especially during the initial phase of training, posing a challenging problem to the optimizer. To help with instability, Gowal et al. (2018) use a mixture of regular and robust cross-entropy loss as the model’s training loss, controlled by a parameter $\kappa$; it can be tricky to balance the two losses. Due to the difficulty in parameter tuning, reproducing the results in (Gowal et al., 2018) is believed to be hard (see https://github.com/deepmind/interval-bound-propagation/issues/1). Achieving the best CIFAR results reported in (Gowal et al., 2018) requires training for 3200 epochs with a batch size of 1600 on 32 TPUs.
We first give notations used throughout the paper, and backgrounds on verification and robust optimization.
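We define an $L$-layer neural network recursively as:
$$f(x) = z^{(L)}, \qquad z^{(l)} = W^{(l)} h^{(l-1)} + b^{(l)}, \qquad h^{(l)} = \sigma^{(l)}(z^{(l)}), \quad \forall l \in \{1, \ldots, L-1\},$$
where $W^{(l)} \in \mathbb{R}^{n_l \times n_{l-1}}$, $b^{(l)} \in \mathbb{R}^{n_l}$, $h^{(0)}(x) = x$, $n_0$ represents the input dimension and $n_L$ is the number of classes, and $\sigma$ is an element-wise activation function. We use $z$ to represent pre-activation neuron values and $h$ to represent post-activation neuron values. Consider an input example $x_k$ with ground-truth label $y_k$; we consider the set $S(x_k, \epsilon) = \{x \mid \|x - x_k\|_\infty \le \epsilon\}$ and we desire a robust network to have the property $y_k = \arg\max_j [f(x)]_j$ for all $x \in S(x_k, \epsilon)$. We define element-wise upper and lower bounds for $z^{(l)}$ and $h^{(l)}$ as $\underline{z}^{(l)} \le z^{(l)} \le \overline{z}^{(l)}$ and $\underline{h}^{(l)} \le h^{(l)} \le \overline{h}^{(l)}$.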
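Neural network verification literature typically defines a specification vector $c \in \mathbb{R}^{n_L}$ that gives a linear combination of the neural network output: $c^\top f(x)$. In robustness verification, we typically set $c_i = 1$ where $i$ is the ground truth class label, $c_j = -1$ where $j$ is the attack target label, and all other elements of $c$ to 0. This represents the margin between class $i$ and class $j$. For an $n_L$ class classifier and a given label $y$, we define a specification matrix $C \in \mathbb{R}^{n_L \times n_L}$ as:
$$C_{i,j} = \begin{cases} 1, & \text{if } j = y,\ i \neq y \\ -1, & \text{if } i = j,\ i \neq y \\ 0, & \text{otherwise} \end{cases} \qquad (3)$$
Importantly, each element of the vector $m := C f(x) \in \mathbb{R}^{n_L}$ gives the margin between class $y$ and one of the other classes. We define the lower bound of $C f(x)$ for all $x \in S(x_k, \epsilon)$ as $\underline{m}(x_k, \epsilon)$, which is a very important quantity. Wong & Kolter (2018) showed that for cross-entropy loss we have:
$$\max_{x \in S(x_k, \epsilon)} L(f(x); y; \theta) \;\le\; L(-\underline{m}(x_k, \epsilon); y; \theta) \qquad (4)$$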
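The specification matrix can be made concrete with a small sketch. It assumes the convention described above (a row for every class other than the label, holding +1 at the label column and -1 on the diagonal, and an all-zero row for the label itself); the function names are hypothetical.

```python
def specification_matrix(num_classes, label):
    """Build C so that (C f)[i] = f[label] - f[i] for i != label; row `label` is zero."""
    C = [[0.0] * num_classes for _ in range(num_classes)]
    for i in range(num_classes):
        if i != label:
            C[i][label] = 1.0   # output of the ground-truth class
            C[i][i] = -1.0      # output of the other class, negated
    return C

def margins(C, logits):
    """Matrix-vector product C f(x): one margin per class."""
    return [sum(c * f for c, f in zip(row, logits)) for row in C]

logits = [2.0, 5.0, 1.0]          # hypothetical network output f(x)
C = specification_matrix(3, label=1)
m = margins(C, logits)            # margins of class 1 over classes 0, 1, 2
```

The prediction is correct exactly when every entry of `m` other than the label's own entry is positive.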
3.2 Analysis of IBP and Linear Relaxation based Verifiable Training Methods
Interval Bound Propagation (IBP)
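Interval Bound Propagation (IBP) uses a very simple bound propagation rule. For the input layer ($l = 0$) we have $x_L \le x \le x_U$ (element-wise). For an affine layer we have
$$\overline{z}^{(l)} = W^{(l)} \frac{\overline{h}^{(l-1)} + \underline{h}^{(l-1)}}{2} + |W^{(l)}| \frac{\overline{h}^{(l-1)} - \underline{h}^{(l-1)}}{2} + b^{(l)} \qquad (5)$$
$$\underline{z}^{(l)} = W^{(l)} \frac{\overline{h}^{(l-1)} + \underline{h}^{(l-1)}}{2} - |W^{(l)}| \frac{\overline{h}^{(l-1)} - \underline{h}^{(l-1)}}{2} + b^{(l)} \qquad (6)$$
where $|W^{(l)}|$ takes the element-wise absolute value. Note that $\overline{h}^{(0)} = x_U$ and $\underline{h}^{(0)} = x_L$. For element-wise monotonically increasing activation functions $\sigma$,
$$\overline{h}^{(l)} = \sigma(\overline{z}^{(l)}), \qquad \underline{h}^{(l)} = \sigma(\underline{z}^{(l)}) \qquad (7)$$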
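A minimal pure-Python sketch of this propagation rule is given below, assuming a fully connected ReLU network stored as a list of `(W, b)` pairs with no activation after the last layer. The helper names and the toy weights are assumptions for illustration, not part of any released implementation.

```python
def affine_bounds(W, b, lower, upper):
    """Interval bounds of W h + b for lower <= h <= upper, via the center/radius form."""
    out_lo, out_hi = [], []
    for row, bias in zip(W, b):
        center = sum(w * (u + l) / 2.0 for w, l, u in zip(row, lower, upper))
        radius = sum(abs(w) * (u - l) / 2.0 for w, l, u in zip(row, lower, upper))
        out_lo.append(center - radius + bias)
        out_hi.append(center + radius + bias)
    return out_lo, out_hi

def relu_bounds(lower, upper):
    """ReLU is monotonically increasing, so it can be applied to both bounds directly."""
    return [max(0.0, l) for l in lower], [max(0.0, u) for u in upper]

def ibp(layers, x, eps):
    """Propagate the l_inf box [x - eps, x + eps] through the network."""
    lower = [v - eps for v in x]
    upper = [v + eps for v in x]
    for idx, (W, b) in enumerate(layers):
        lower, upper = affine_bounds(W, b, lower, upper)
        if idx < len(layers) - 1:          # no activation after the output layer
            lower, upper = relu_bounds(lower, upper)
    return lower, upper

# Toy two-layer network (hypothetical weights).
layers = [
    ([[1.0, -1.0], [0.5, 0.5]], [0.0, 0.0]),
    ([[1.0, 1.0]], [0.0]),
]
out_lo, out_hi = ibp(layers, x=[1.0, 0.0], eps=0.5)
```

Each layer costs one product with `W` and one with `|W|`, which is why the overall cost is comparable to two ordinary forward passes.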
Given a fixed NN, IBP gives a very loose estimation of the output range of $f(x)$. However, during training, since the weights of the NN can be updated, we can equivalently view IBP as an augmented neural network, referred to as IBP-NN (Figure 1). Unlike a usual network, which takes an input $x_k$ with label $y_k$, IBP-NN takes two points $x_L = x_k - \epsilon$ and $x_U = x_k + \epsilon$ as input (where $x_L \le x \le x_U$, element-wise). The bound propagation process can be equivalently seen as forward propagation in a specially structured neural network, as shown in Figure 1. After the last specification layer $C$ (typically merged into $W^{(L)}$), we can obtain $\underline{m}(x_k, \epsilon)$. Then, $-\underline{m}(x_k, \epsilon)$ is sent to the softmax layer for prediction. Importantly, since $[\underline{m}(x_k, \epsilon)]_{y_k} = 0$ (because the $y_k$-th row of $C$ is always 0), the top-1 prediction of the augmented IBP network is $y_k$ if and only if all other elements of $\underline{m}(x_k, \epsilon)$ are positive, i.e., the original network will predict correctly for all $x_L \le x \le x_U$.
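The verification check and the robust loss described above can be sketched for a one-hidden-layer ReLU network as follows. This is an illustrative sketch under stated assumptions: the specification is merged into the last layer by subtracting rows, the function names are hypothetical, and the weights are toy values.

```python
import math

def interval_affine(W, b, lo, hi):
    """Interval bounds of W h + b for lo <= h <= hi (element-wise)."""
    out_lo, out_hi = [], []
    for row, bias in zip(W, b):
        center = sum(w * (u + l) / 2.0 for w, l, u in zip(row, lo, hi))
        radius = sum(abs(w) * (u - l) / 2.0 for w, l, u in zip(row, lo, hi))
        out_lo.append(center - radius + bias)
        out_hi.append(center + radius + bias)
    return out_lo, out_hi

def margin_lower_bound(W1, b1, W2, b2, x, eps, label):
    """IBP lower bound on f_label - f_i for every class i."""
    n_cls = len(W2)
    # Merge the specification into the last layer: row i becomes W2[label] - W2[i].
    Wm = [[wl - wi for wl, wi in zip(W2[label], W2[i])] for i in range(n_cls)]
    bm = [b2[label] - b2[i] for i in range(n_cls)]
    lo, hi = interval_affine(W1, b1, [v - eps for v in x], [v + eps for v in x])
    lo, hi = [max(0.0, v) for v in lo], [max(0.0, v) for v in hi]  # ReLU is monotone
    m_lo, _ = interval_affine(Wm, bm, lo, hi)
    return m_lo

def robust_cross_entropy(m_lo, label):
    """Cross-entropy of softmax(-m_lo) against the true label."""
    logits = [-m for m in m_lo]
    return math.log(sum(math.exp(v) for v in logits)) - logits[label]

def is_verified(m_lo, label):
    """Verified if every margin lower bound (except the label's own zero entry) is positive."""
    return all(m > 0 for i, m in enumerate(m_lo) if i != label)

W1, b1 = [[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0]
W2, b2 = [[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0]
x, label = [2.0, 0.5], 0
small = margin_lower_bound(W1, b1, W2, b2, x, 0.25, label)  # small perturbation radius
large = margin_lower_bound(W1, b1, W2, b2, x, 1.0, label)   # large perturbation radius
```

In this toy case the example is verified at the small radius but not at the large one, and the robust loss grows as the margin lower bound shrinks.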
|Dataset|$\epsilon$ ($\ell_\infty$ norm)|IBP verified error|Convex-adv verified error|
When we train the augmented IBP network with ordinary cross-entropy loss and desire it to predict correctly on an input $x_k$, we are implicitly doing robust optimization (Eq. (2)). We attribute the success of IBP in (Gowal et al., 2018) to the power of non-linearity: instead of using linear relaxations of neural networks (Wong & Kolter, 2018) to obtain $\underline{m}(x_k, \epsilon)$, we train a non-linear network that learns to give us a good quality $\underline{m}(x_k, \epsilon)$. Additionally, we also found that a network trained using IBP is not verifiable using linear relaxation based verification methods, including CROWN (Zhang et al., 2018), convex adversarial polytope (Wong & Kolter, 2018) and Fast-Lin (Weng et al., 2018). A purely IBP trained network has low IBP verified error, but its verified error using convex adversarial polytope (Wong & Kolter, 2018) or Fast-Lin (Weng et al., 2018) can be much higher; sometimes the network becomes unverifiable using these typically tighter bounds (see Table 1). This also indicates that IBP is a non-linear mechanism different from linear relaxation based methods.
However, IBP is a very loose bound during the initial phase of training, which makes training unstable; purely using IBP frequently leads to divergence. Gowal et al. (2018) proposed to use a schedule where $\epsilon$ is gradually increased during training, and a mixture of robust cross-entropy loss with regular cross-entropy loss as the objective to stabilize training:
$$\min_\theta \; \mathbb{E}_{(x,y)\in\mathcal{X}} \Big[ \kappa\, L(x; y; \theta) + (1 - \kappa)\, L(-\underline{m}_{\text{IBP}}(x, \epsilon); y; \theta) \Big] \qquad (8)$$
where $\kappa$ starts with 1 and slowly decreases to 0.5.
Issues with linear relaxation based training.
Since IBP hugely outperforms linear relaxation based methods in the recent work (Gowal et al., 2018) on some datasets, we want to understand what is going wrong with linear relaxation based methods. We found that the models produced by linear relaxation methods such as (Wong & Kolter, 2018) and (Wong et al., 2018) are over-regularized, especially at a larger $\epsilon$. In Figure 2 we train a small 4-layer MNIST model and we increase $\epsilon$ from 0 to 0.3 in 60 epochs. We plot the induced norm of the 2nd CNN layer during the training process on models trained using our method and (Wong et al., 2018). We find that when $\epsilon$ becomes larger (roughly at $\epsilon = 0.15$, epoch 30), the norm of the weight matrix using (Wong et al., 2018) starts to decrease, indicating that the model is forced to learn small norm weights and thus its representation power is severely reduced. Meanwhile, the verified error also starts to ramp up. Our proposed IBP based training method, CROWN-IBP, does not have this issue; the norm of the weight matrices keeps increasing during the training process, and the verified error does not significantly increase when $\epsilon$ reaches 0.3.
Another issue with current linear relaxation based training or verification methods, including convex adversarial polytope and CROWN (Zhang et al., 2018), is their high computational and memory cost, and poor scalability. For the small network in Figure 2, convex adversarial polytope (with 50 random Cauchy projections) is 8 times slower and takes 4 times more memory than CROWN-IBP (without using random projections). Convex adversarial polytope scales even worse for larger networks; see Appendix E for a comparison.
3.3 The proposed algorithm: CROWN-IBP
We have reviewed IBP and linear relaxation based methods above. IBP has great representation power due to its non-linearity, but can be tricky to tune due to its very imprecise bound at the beginning; on the other hand, linear relaxation based methods give tighter lower bounds which stabilize training, but they over-regularize the network and prevent us from achieving good accuracy.
We propose CROWN-IBP, a new training method for certified defense where we optimize the following problem ($\theta$ represents the network parameters):
$$\min_\theta \; \mathbb{E}_{(x,y)\in\mathcal{X}} \Big[ L(-\underline{m}(x, \epsilon); y; \theta) \Big], \qquad \underline{m}(x, \epsilon) = \beta\, \underline{m}_{\text{IBP}}(x, \epsilon) + (1 - \beta)\, \underline{m}_{\text{CROWN-IBP}}(x, \epsilon) \qquad (9)$$
where our lower bound of margin $\underline{m}(x, \epsilon)$ is a combination of two bounds with different natures: IBP, and a CROWN-style bound; $L$ is the cross-entropy loss. CROWN is a tight linear relaxation based lower bound which is more general and often tighter than convex adversarial polytope. Importantly, CROWN-IBP avoids the high computational cost of ordinary CROWN (or many other linear relaxation based methods, like convex adversarial polytope) by applying CROWN-style bound propagation on the final specifications only; intermediate layer bounds $\underline{z}^{(l)}$ and $\overline{z}^{(l)}$ are obtained by IBP. We start with $\beta = 0$ and use the tight bounds to stabilize initial training. Then we ramp up $\beta$ from 0 to 1 while we increase $\epsilon$ from 0 to its target value, until we reach the desired $\epsilon$ and $\beta = 1$. The network is trained using pure IBP at that point.
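The mixing of the two bounds and the joint ramp can be sketched as below. This assumes the mixing weight (called `beta` here) multiplies the IBP bound, consistent with the ramp from 0 to 1 ending in pure IBP, and it assumes a linear ramp, which is a choice made for this example only. The margin values are made-up placeholders for the outputs of the two bounding procedures.

```python
def ramp(step, start, end, lo, hi):
    """Linear ramp from lo to hi between steps `start` and `end`, clamped outside."""
    if step <= start:
        return lo
    if step >= end:
        return hi
    return lo + (hi - lo) * (step - start) / (end - start)

def mixed_margin_bound(m_ibp, m_crown, beta):
    """Element-wise convex combination of two valid lower bounds (still a lower bound)."""
    return [beta * a + (1.0 - beta) * c for a, c in zip(m_ibp, m_crown)]

# Placeholder margin lower bounds for one example (hypothetical numbers).
m_ibp = [0.0, -3.0, -1.0]      # loose early in training
m_crown = [0.0, 0.5, -0.2]     # tighter linear-relaxation based bound

schedule = []
for step in (0, 50, 100):
    beta = ramp(step, 0, 100, 0.0, 1.0)   # 0 -> 1: CROWN-style bound is phased out
    eps = ramp(step, 0, 100, 0.0, 0.3)    # 0 -> target perturbation radius
    schedule.append((step, eps, beta, mixed_margin_bound(m_ibp, m_crown, beta)))
```

Because both inputs are lower bounds of the same margin, any weight in [0, 1] yields another lower bound, so the mixed objective stays inside the robust optimization framework.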
Benefits of CROWN-IBP.
First, we compute tight linear relaxation based bounds during the early phase of training, thus greatly improving the stability and reproducibility of IBP. Second, we do not have the over-regularization problem, as the CROWN-style bound is gradually phased out during training. Third, unlike the approach used in (Gowal et al., 2018) that mixes regular cross-entropy loss with robust cross-entropy loss (Eq. (8)) to stabilize IBP, we use the mixture of two lower bounds, which is still a valid lower bound of the margin; thus, we are strictly within the robust optimization framework (Eq. (2)) and also obtain better empirical results. Fourth, because we apply the CROWN-style bound propagation only to the last layer, the computational cost is greatly reduced compared to other methods that rely purely on linear relaxation based bounds.
CROWN-IBP consists of IBP bound propagation in a forward pass and CROWN-style bound propagation in a backward pass. We discuss the details of CROWN-IBP below.
Forward Bound Propagation in CROWN-IBP.
In CROWN-IBP, we first obtain $\underline{z}^{(l)}$ and $\overline{z}^{(l)}$ for all layers by applying (5), (6) and (7). Then we obtain $\underline{m}_{\text{IBP}}(x, \epsilon) = \underline{z}^{(L)}$ (assuming $C$ is merged into $W^{(L)}$). Obtaining these bounds is similar to forward propagation in IBP-NN (Figure 1). The time complexity of IBP is comparable to two forward propagation passes of the original network.
Linear Relaxation of ReLU neurons
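Given $\underline{z}^{(l)}$ and $\overline{z}^{(l)}$ computed in the previous step, we first check if some neurons are always active ($\underline{z}_k^{(l)} > 0$) or always inactive ($\overline{z}_k^{(l)} < 0$), since these neurons are effectively linear and no relaxations are needed. For the remaining unstable neurons, Zhang et al. (2018); Wong et al. (2018) give a linear relaxation for the special case of the element-wise ReLU activation function:
$$\alpha_k z_k^{(l)} \;\le\; \sigma(z_k^{(l)}) \;\le\; \frac{\overline{z}_k^{(l)}}{\overline{z}_k^{(l)} - \underline{z}_k^{(l)}}\, z_k^{(l)} - \frac{\overline{z}_k^{(l)}\, \underline{z}_k^{(l)}}{\overline{z}_k^{(l)} - \underline{z}_k^{(l)}}, \qquad \forall k \in [n_l],\ \underline{z}_k^{(l)} < 0 < \overline{z}_k^{(l)} \qquad (10)$$
where $0 \le \alpha_k \le 1$; Zhang et al. (2018) propose to adaptively select $\alpha_k = 1$ when $\overline{z}_k^{(l)} > |\underline{z}_k^{(l)}|$ and 0 otherwise, which minimizes the relaxation error. In other words, for an input vector $z^{(l)}$, we effectively replace the ReLU layer with a linear layer, giving upper or lower bounds of the output:
$$\underline{D}^{(l)} z^{(l)} \;\le\; \sigma(z^{(l)}) \;\le\; \overline{D}^{(l)} z^{(l)} + \overline{c}^{(l)} \qquad (11)$$
where $\underline{D}^{(l)}$ and $\overline{D}^{(l)}$ are two diagonal matrices representing the “weights” of the relaxed ReLU layer. In the following we focus on conceptually presenting the algorithm, while more details of each term can be found in the Appendix.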
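For a single neuron, the relaxation can be sketched as follows. The function name and return convention are assumptions for this example: it returns the lower slope, the upper slope and the upper intercept, using the standard chord through the two interval endpoints as the upper line and the adaptive 0/1 choice for the lower slope.

```python
def relu_relaxation(lo, hi):
    """Linear bounds of ReLU on [lo, hi]: lower_slope*z <= relu(z) <= upper_slope*z + intercept."""
    if lo >= 0.0:                      # always active: ReLU is the identity
        return 1.0, 1.0, 0.0
    if hi <= 0.0:                      # always inactive: ReLU is zero
        return 0.0, 0.0, 0.0
    upper_slope = hi / (hi - lo)       # chord from (lo, 0) to (hi, hi)
    upper_intercept = -hi * lo / (hi - lo)
    lower_slope = 1.0 if hi > -lo else 0.0   # adaptive choice of the lower line
    return lower_slope, upper_slope, upper_intercept

mostly_active = relu_relaxation(-1.0, 3.0)
mostly_inactive = relu_relaxation(-3.0, 1.0)
```

Collecting the slopes of all neurons in a layer on a diagonal gives the two diagonal matrices of the relaxed layer, and the intercepts form the constant vector of the upper bound.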
Backward Bound Propagation in CROWN-IBP.
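Unlike IBP, CROWN-style bounds start computation from the last layer, so we refer to it as backward bound propagation (not to be confused with the back-propagation algorithm used to obtain gradients). Suppose we want to obtain the lower bound $[\underline{m}_{\text{CROWN-IBP}}(x, \epsilon)]_i := \underline{z}_i^{(L)}$ (we assume the specification matrix $C$ has been merged into $W^{(L)}$). The input to layer $W^{(L)}$ is $\sigma(z^{(L-1)})$, which has already been replaced by Eq. (11). CROWN-style bounds choose the lower bound of $\sigma(z_k^{(L-1)})$ (LHS of (11)) when $W_{i,k}^{(L)}$ is positive, and choose the upper bound when $W_{i,k}^{(L)}$ is negative. We then merge $W^{(L)}$ and the linearized ReLU layer together and define:
$$A_i^{(L-1)} = W_{i,:}^{(L)} D^{i,(L-1)}, \qquad D_{k,k}^{i,(L-1)} = \begin{cases} \underline{D}_{k,k}^{(L-1)}, & \text{if } W_{i,k}^{(L)} > 0 \\ \overline{D}_{k,k}^{(L-1)}, & \text{if } W_{i,k}^{(L)} \le 0 \end{cases} \qquad (12)$$
Now we have a lower bound $z_i^{(L)} \ge A_i^{(L-1)} z^{(L-1)} + \underline{b}_i^{(L-1)}$, where $\underline{b}_i^{(L-1)}$ collects all terms not related to $z^{(L-1)}$. Note that the diagonal matrix $D^{i,(L-1)}$ implicitly depends on $i$. Then, we merge $A_i^{(L-1)}$ with the next linear layer, which is straightforward by plugging in $z^{(L-1)} = W^{(L-1)} \sigma(z^{(L-2)}) + b^{(L-1)}$:
$$z_i^{(L)} \;\ge\; A_i^{(L-1)} W^{(L-1)} \sigma(z^{(L-2)}) + A_i^{(L-1)} b^{(L-1)} + \underline{b}_i^{(L-1)}$$
Then we continue to unfold the next ReLU layer $\sigma(z^{(L-2)})$ using its linear relaxations, and compute a new matrix $A_i^{(L-2)}$, with $A_i^{(L-2)} = A_i^{(L-1)} W^{(L-1)} D^{i,(L-2)}$, in a similar manner as in (12). Along with the bound propagation process, we need to compute a series of matrices, $A_i^{(L-1)}, \cdots, A_i^{(0)}$, where $A_i^{(l)} = A_i^{(l+1)} W^{(l+1)} D^{i,(l)}$, and $A_i^{(0)} = A_i^{(1)} W^{(1)}$. At this point, we have merged all layers of the network into a linear layer: $z_i^{(L)} \ge A_i^{(0)} x + \underline{b}_i$, where $\underline{b}_i$ collects all terms not related to $x$. A lower bound for $z_i^{(L)}$ with $x_L \le x \le x_U$ can then be easily given as
$$\underline{z}_i^{(L)} = \sum_{k:\,[A_i^{(0)}]_k < 0} [A_i^{(0)}]_k\, x_{U,k} + \sum_{k:\,[A_i^{(0)}]_k > 0} [A_i^{(0)}]_k\, x_{L,k} + \underline{b}_i \qquad (13)$$
For ReLU networks, convex adversarial polytope (Wong & Kolter, 2018) uses a very similar bound propagation procedure. CROWN-style bounds allow an adaptive selection of $\alpha_k$ in (10), and thus often give slightly better bounds. We give details on each term in Appendix F.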
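The forward and backward passes can be put together in a small sketch for a network with one hidden ReLU layer and a single output row (for example one row of the merged specification layer). This is an illustrative sketch under these assumptions, with hypothetical function names and toy weights; it is not the released implementation.

```python
def relu_relaxation(lo, hi):
    """(lower slope, upper slope, upper intercept) of a ReLU on [lo, hi]."""
    if lo >= 0.0:
        return 1.0, 1.0, 0.0
    if hi <= 0.0:
        return 0.0, 0.0, 0.0
    slope = hi / (hi - lo)
    return (1.0 if hi > -lo else 0.0), slope, -slope * lo

def crown_ibp_lower_bound(W1, b1, w2, b2, x_lo, x_hi):
    """Lower bound of w2 . relu(W1 x + b1) + b2 over the box x_lo <= x <= x_hi."""
    # Forward pass (IBP): interval bounds of the hidden pre-activations.
    z_lo, z_hi = [], []
    for row, bias in zip(W1, b1):
        z_lo.append(sum(w * (l if w > 0 else u) for w, l, u in zip(row, x_lo, x_hi)) + bias)
        z_hi.append(sum(w * (u if w > 0 else l) for w, l, u in zip(row, x_lo, x_hi)) + bias)
    # Backward pass: pick, per neuron, the linear ReLU bound that keeps a valid lower bound.
    a1, const = [], b2
    for w, lo, hi in zip(w2, z_lo, z_hi):
        d_low, d_up, c_up = relu_relaxation(lo, hi)
        if w > 0:
            a1.append(w * d_low)       # positive weight: use the ReLU lower bound
        else:
            a1.append(w * d_up)        # non-positive weight: use the ReLU upper bound
            const += w * c_up
    # Merge with the first affine layer: a0 = a1 W1, constants absorb a1 . b1.
    a0 = [sum(a * row[j] for a, row in zip(a1, W1)) for j in range(len(x_lo))]
    const += sum(a * bias for a, bias in zip(a1, b1))
    # Minimize the resulting linear function over the input box.
    return sum(a * (l if a > 0 else u) for a, l, u in zip(a0, x_lo, x_hi)) + const

W1, b1 = [[1.0, -1.0], [1.0, 1.0]], [0.0, 0.0]
w2, b2 = [1.0, -1.0], 0.0
lower = crown_ibp_lower_bound(W1, b1, w2, b2, [-1.0, -1.0], [1.0, 1.0])
```

Only one row vector per output is propagated backwards here, which is the source of the cost savings compared with bounding every intermediate neuron by a separate backward pass.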
Computational Cost of CROWN-IBP.
In ordinary CROWN (Zhang et al., 2018) and convex adversarial polytope (Wong & Kolter, 2018), the bound propagation process is much more expensive than in CROWN-IBP, since it needs to use (13) to compute all intermediate layers’ $\underline{z}_i^{(m)}$ and $\overline{z}_i^{(m)}$ ($m \in [L]$), by considering $W^{(m)}$ as the final layer of the network. In this case, for each layer $m$ we need a different set of $m$ $A$ matrices, defined as $A^{m,(l)}$, $l \in \{m-1, \cdots, 0\}$. This causes three computational issues:
Unlike the last layer $W^{(L)}$, an intermediate layer $W^{(m)}$ typically has a much larger output dimension $n_m \gg n_L$, thus all $A^{m,(l)}$ will have large dimensions $\mathbb{R}^{n_m \times n_l}$.
Computation of all $A^{m,(l)}$ matrices is expensive. Suppose the network has $n$ neurons for all $L-1$ intermediate and input layers and $n_L \ll n$ neurons for the output layer (assuming $L \ge 2$); the time complexity of ordinary CROWN or convex adversarial polytope is then $O(Ln^2(Ln + n_L))$. An ordinary forward propagation only takes $O(Ln^2)$ time per example, thus ordinary CROWN does not scale up to large networks for training, due to its quadratic dependency on $L$ and extra $Ln$ times overhead.
When both $W^{(l)}$ and $W^{(l-1)}$ represent convolutional layers with small kernel tensors $K^{(l)}$ and $K^{(l-1)}$, there are no efficient GPU operations to form the matrix $W^{(l)} D^{(l-1)} W^{(l-1)}$ using $K^{(l)}$ and $K^{(l-1)}$. Existing implementations either unfold at least one of the convolutional kernels to fully connected weights, or use sparse matrices to represent $W^{(l)}$ and $W^{(l-1)}$. They suffer from poor hardware efficiency on GPUs.
In CROWN-IBP, we do not have the first and second issues since we use IBP to obtain intermediate layer bounds, which is only slower than forward propagation by a constant factor. The time complexity of the backward bound propagation in CROWN-IBP is $O((L-1) n_L n^2)$, only $n_L$ times slower than forward propagation and significantly more scalable than ordinary CROWN (which is $Ln$ times slower than forward propagation, where typically $n \gg n_L$). The third issue is also not a concern, since we start from the last specification layer, which is a small fully connected layer. Suppose we need to compute $W^{(L)} D^{(L-1)} W^{(L-1)}$ and $W^{(L-1)}$ is a convolutional layer with kernel $K^{(L-1)}$; we can efficiently compute $(W^{(L-1)\top} (D^{(L-1)} W^{(L)\top}))^\top$ on GPUs using the transposed convolution operator with kernel $K^{(L-1)}$. Conceptually, the backward pass of CROWN-IBP propagates a small specification matrix backwards, replacing affine layers with their transposed operators, and activation function layers with a diagonal matrix product. This allows efficient implementation and better scalability.
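The role of the transposed operator can be illustrated in one dimension. The sketch below assumes a stride-1 "valid" convolution (cross-correlation, as in deep learning frameworks) and hypothetical helper names; it shows that multiplying a row vector by the dense convolution matrix gives the same result as a transposed convolution applied directly with the kernel, so the dense matrix never has to be formed.

```python
def conv_matrix(kernel, n_in):
    """Dense matrix W of a 1-D valid convolution: (W x)[i] = sum_t kernel[t] * x[i + t]."""
    n_out = n_in - len(kernel) + 1
    W = [[0.0] * n_in for _ in range(n_out)]
    for i in range(n_out):
        for t, k in enumerate(kernel):
            W[i][i + t] = k
    return W

def transposed_conv(a, kernel, n_in):
    """Compute the row vector a W directly from the kernel, without forming W."""
    out = [0.0] * n_in
    for i, ai in enumerate(a):
        for t, k in enumerate(kernel):
            out[i + t] += ai * k       # scatter each entry of a through the kernel
    return out

kernel, n_in = [1.0, 2.0], 4
a = [1.0, 0.0, -1.0]                   # one row of the propagated specification matrix
W = conv_matrix(kernel, n_in)
dense = [sum(a[i] * W[i][j] for i in range(len(a))) for j in range(n_in)]
fast = transposed_conv(a, kernel, n_in)
```

The dense product touches every entry of `W`, while the transposed convolution only touches the kernel entries; the same idea carries over to 2-D convolutions on GPUs.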
4.1 Setup and Models
To discourage hand-tuning on a small set of models, we use 20 different model architectures for a total of 53 models for the MNIST, Fashion-MNIST and CIFAR-10 datasets, from small CNNFC models to wide CNNFC models. The models are a mixture of networks with different depths, widths and convolution kernel sizes. Details are presented in Appendix B. We consider $\ell_\infty$ robustness and report the best, worst and median verified errors achieved over all models. Verified errors are evaluated using IBP on the test set. The median error implies that at least half of the trained models are at least as good as this error. We list hyperparameters in Appendix A. Since we focus on improving the stability and performance of IBP models, in experiments we compare three different variants of IBP (our implementations of all IBP variants used in this paper are available at https://github.com/huanzhang12/CROWN-IBP):
CROWN-IBP: our proposed method, using CROWN-style linear relaxations in a backward manner to improve IBP training stability, and a mixture of CROWN and IBP lower bounds in robust cross-entropy (CE) loss, as in Eq. (9). No regular CE loss is used.
Pure-IBP: using the lower bounds purely from IBP in the robust CE loss. No regular CE loss is used. Equivalent to Eq. (9) with $\beta$ fixed to 1.
Natural-IBP-$\kappa$: proposed by (Gowal et al., 2018), using the lower bounds provided by IBP in the robust CE loss (multiplied by $1 - \kappa$), plus $\kappa$ times a regular CE loss term as in Eq. (8). $\kappa$ is initialized as 1 and is gradually reduced during training. In our experiments we choose two final $\kappa$ values, 0.5 and 0. Note that Gowal et al. (2018) use 0.5 as the final value.
4.2 Comparisons to IBP based methods
In Table 2 we show the verified errors on test sets for CROWN-IBP and the other two IBP baselines. We also include the best verified errors reported in the literature for comparison. Numbers reported in Gowal et al. (2018) use the same training method as Natural-IBP with final $\kappa = 0.5$, although they use different hyperparameters and sometimes different $\epsilon$ for training and evaluation; we always use the same $\epsilon$ value for training and evaluation. CROWN-IBP’s best, median, and worst test verified errors are consistently better than all other IBP-based baselines across all models and $\epsilon$’s. In particular, on MNIST with $\epsilon = 0.3$ and $\epsilon = 0.4$ we achieve 7.46% and 12.96% best verified error, respectively, outperforming all previous works and significantly better than convex relaxation based training methods (Wong et al., 2018); a similar level of advantage can also be observed on Fashion-MNIST (Wong & Kolter, 2018). For small $\epsilon$ on MNIST, we find that IBP based methods tend to overfit. For example, adding an $\ell_1$ regularization term can decrease verified error at $\epsilon = 0.1$ from 5.63% to 3.60% (see Section 4.4 for more details); this explains the performance gap in Table 2 at small $\epsilon$ between CROWN-IBP and convex adversarial polytope, since the latter method provides implicit regularization. On CIFAR-10 with $\epsilon = 8/255$, CROWN-IBP is better than all other methods except (Gowal et al., 2018); however, Gowal et al. (2018) obtained the best result by using a large network trained for 3200 epochs with a fine-tuned schedule on 32 TPUs; practically, the reproducible verified error by Gowal et al. (2018) is around 71% - 72% (see notes under table). In contrast, our results can be obtained in reasonable time using a single RTX 2080 Ti GPU. We include training time comparisons in Appendix E.
4.3 Training stability
To evaluate training stability, we compare the verified errors obtained by training processes under different $\epsilon$ schedule lengths (10, 15, 30, 60). We compare the best, worst and median verified errors over all 18 models for MNIST. Our results are presented in Figure 3 (for 8 large models) and Figure 4 (for 10 small models) at several values of $\epsilon$. The upper and lower ends of an error bar are the worst and best verified error, respectively, and the lines go through median values. We can see that both Natural-IBP and CROWN-IBP can improve training stability when the schedule length is not sufficient (10, 20 epochs). When the schedule length is above 30, CROWN-IBP’s verified errors are consistently better than those of any other method. Pure-IBP cannot stably converge on all models when the schedule is short, especially for a larger $\epsilon$. We conduct additional training stability experiments on the CIFAR-10 dataset and the observations are similar (see Appendix D).
Another interesting observation is that at a small $\epsilon$, a shorter schedule improves results for large models (Figure 3). This is due to early stopping, which controls overfitting (see Section 4.4). Since we decrease the learning rate by half every 10 epochs after the schedule ends, a shorter schedule implies that the learning process stops earlier.
To further test the training stability of CROWN-IBP, we run each MNIST experiment (in Table 2) 5 times on 10 small models. The mean and standard deviation of the verified and standard errors on the test set are presented in Appendix C. Standard deviations of verified errors are very small, giving us further evidence of good stability.
4.4 Overfitting issue with small $\epsilon$
We found that on MNIST for a small $\epsilon$, the verified errors obtained by IBP based methods are not as good as those of linear relaxation based methods (Wong et al., 2018; Mirman et al., 2018). Gowal et al. (2018) thus propose to train models using a larger $\epsilon$ and evaluate them under a smaller $\epsilon$. Instead, we investigated this issue further and found that many CROWN-IBP trained models achieve very small verified errors (close to 0 and sometimes exactly 0) on the training set (see Table 3). This indicates possible overfitting during training. As we discussed in Section 3.1, linear relaxation based methods implicitly regularize the weight matrices so the network does not overfit when $\epsilon$ is small. Inspired by this finding, we want to see if adding an explicit $\ell_1$ regularization term in CROWN-IBP training helps when $\epsilon$ is small. The verified and standard errors on the training and test sets with and without regularization can be found in Table 3. We can see that with a small $\ell_1$ regularization added we can reduce the verified error on the test set significantly. This makes CROWN-IBP results comparable to the numbers reported for convex adversarial polytope (Wong et al., 2018); at $\epsilon = 0.1$, the best model using convex adversarial polytope training can achieve 3.67% certified error, while CROWN-IBP achieves 3.60% best certified error on the models presented in Table 3. The overfitting is likely caused by IBP’s strong representation power, which also explains why IBP based methods significantly outperform linear relaxation based methods at larger $\epsilon$ values. Using early stopping can also improve the verified error on the test set; see Section 4.3.
|Model name (see Appendix B)|$\lambda$: $\ell_1$ regularization|Training standard error|Training verified error|Test standard error|Test verified error|
In this paper, we propose a new certified defense method, CROWN-IBP, by combining the fast interval bound propagation (IBP) in the forward pass and a tight linear relaxation based bound, CROWN, in the backward pass. Our method enjoys the non-linear representation power and high computational efficiency provided by IBP while using the tight CROWN bound to stabilize training strictly under the robust optimization framework. Our experiments on a variety of model structures and three datasets show that CROWN-IBP consistently outperforms other IBP baselines and achieves state-of-the-art verified errors.
- Athalye et al. (2018) Athalye, A., Carlini, N., and Wagner, D. Obfuscated gradients give a false sense of security: Circumventing defenses to adversarial examples. International Conference on Machine Learning (ICML), 2018.
- Buckman et al. (2018) Buckman, J., Roy, A., Raffel, C., and Goodfellow, I. Thermometer encoding: One hot way to resist adversarial examples. International Conference on Learning Representations, 2018.
- Carlini & Wagner (2017a) Carlini, N. and Wagner, D. Adversarial examples are not easily detected: Bypassing ten detection methods. In Proceedings of the 10th ACM Workshop on Artificial Intelligence and Security, pp. 3–14. ACM, 2017a.
- Carlini & Wagner (2017b) Carlini, N. and Wagner, D. Towards evaluating the robustness of neural networks. In 2017 38th IEEE Symposium on Security and Privacy (SP), pp. 39–57. IEEE, 2017b.
- Cohen et al. (2019) Cohen, J. M., Rosenfeld, E., and Kolter, J. Z. Certified adversarial robustness via randomized smoothing. arXiv preprint arXiv:1902.02918, 2019.
- Dvijotham et al. (2018a) Dvijotham, K., Garnelo, M., Fawzi, A., and Kohli, P. Verification of deep probabilistic models. CoRR, abs/1812.02795, 2018a. URL http://arxiv.org/abs/1812.02795.
- Dvijotham et al. (2018b) Dvijotham, K., Gowal, S., Stanforth, R., Arandjelovic, R., O’Donoghue, B., Uesato, J., and Kohli, P. Training verified learners with learned verifiers. arXiv preprint arXiv:1805.10265, 2018b.
- Dvijotham et al. (2018c) Dvijotham, K., Stanforth, R., Gowal, S., Mann, T., and Kohli, P. A dual approach to scalable verification of deep networks. UAI, 2018c.
- Eykholt et al. (2018) Eykholt, K., Evtimov, I., Fernandes, E., Li, B., Rahmati, A., Xiao, C., Prakash, A., Kohno, T., and Song, D. Robust physical-world attacks on deep learning visual classification. In
- Goodfellow et al. (2015) Goodfellow, I. J., Shlens, J., and Szegedy, C. Explaining and harnessing adversarial examples. ICLR, 2015.
- Gowal et al. (2018) Gowal, S., Dvijotham, K., Stanforth, R., Bunel, R., Qin, C., Uesato, J., Mann, T., and Kohli, P. On the effectiveness of interval bound propagation for training verifiably robust models. arXiv preprint arXiv:1810.12715, 2018.
- Guo et al. (2018) Guo, C., Rana, M., Cisse, M., and van der Maaten, L. Countering adversarial images using input transformations. In ICLR, 2018.
- He et al. (2017) He, W., Wei, J., Chen, X., Carlini, N., and Song, D. Adversarial example defenses: ensembles of weak defenses are not strong. In Proceedings of the 11th USENIX Conference on Offensive Technologies, pp. 15–15. USENIX Association, 2017.
- Hein & Andriushchenko (2017) Hein, M. and Andriushchenko, M. Formal guarantees on the robustness of a classifier against adversarial manipulation. In Advances in Neural Information Processing Systems (NIPS), pp. 2266–2276, 2017.
- Katz et al. (2017) Katz, G., Barrett, C., Dill, D. L., Julian, K., and Kochenderfer, M. J. Reluplex: An efficient SMT solver for verifying deep neural networks. In International Conference on Computer Aided Verification, pp. 97–117. Springer, 2017.
- Kurakin et al. (2017) Kurakin, A., Goodfellow, I., and Bengio, S. Adversarial machine learning at scale. In International Conference on Learning Representations, 2017.
- Lecuyer et al. (2018) Lecuyer, M., Atlidakis, V., Geambasu, R., Hsu, D., and Jana, S. Certified robustness to adversarial examples with differential privacy. arXiv preprint arXiv:1802.03471, 2018.
- Li et al. (2018) Li, B., Chen, C., Wang, W., and Carin, L. Second-order adversarial attack and certifiable robustness. arXiv preprint arXiv:1809.03113, 2018.
- Liu et al. (2018) Liu, X., Cheng, M., Zhang, H., and Hsieh, C.-J. Towards robust neural networks via random self-ensemble. In European Conference on Computer Vision, pp. 381–397. Springer, 2018.
- Ma et al. (2018) Ma, X., Li, B., Wang, Y., Erfani, S. M., Wijewickrema, S., Houle, M. E., Schoenebeck, G., Song, D., and Bailey, J. Characterizing adversarial subspaces using local intrinsic dimensionality. In International Conference on Learning Representations (ICLR), 2018.
- Madry et al. (2018) Madry, A., Makelov, A., Schmidt, L., Tsipras, D., and Vladu, A. Towards deep learning models resistant to adversarial attacks. In International Conference on Learning Representations, 2018.
- Mirman et al. (2018) Mirman, M., Gehr, T., and Vechev, M. Differentiable abstract interpretation for provably robust neural networks. In International Conference on Machine Learning, pp. 3575–3583, 2018.
- Papernot et al. (2016) Papernot, N., McDaniel, P., Wu, X., Jha, S., and Swami, A. Distillation as a defense to adversarial perturbations against deep neural networks. In 2016 IEEE Symposium on Security and Privacy (SP), pp. 582–597. IEEE, 2016.
- Qin et al. (2019) Qin, C., Dvijotham, K. D., O’Donoghue, B., Bunel, R., Stanforth, R., Gowal, S., Uesato, J., Swirszcz, G., and Kohli, P. Verification of non-linear specifications for neural networks. ICLR, 2019.
- Raghunathan et al. (2018a) Raghunathan, A., Steinhardt, J., and Liang, P. Certified defenses against adversarial examples. International Conference on Learning Representations (ICLR), arXiv preprint arXiv:1801.09344, 2018a.
- Raghunathan et al. (2018b) Raghunathan, A., Steinhardt, J., and Liang, P. S. Semidefinite relaxations for certifying robustness to adversarial examples. In Advances in Neural Information Processing Systems, pp. 10900–10910, 2018b.
- Salman et al. (2019) Salman, H., Yang, G., Zhang, H., Hsieh, C.-J., and Zhang, P. A convex relaxation barrier to tight robust verification of neural networks. arXiv preprint arXiv:1902.08722, 2019.
- Samangouei et al. (2018) Samangouei, P., Kabkab, M., and Chellappa, R. Defense-GAN: Protecting classifiers against adversarial attacks using generative models. arXiv preprint arXiv:1805.06605, 2018.
- Singh et al. (2018) Singh, G., Gehr, T., Mirman, M., Püschel, M., and Vechev, M. Fast and effective robustness certification. In Advances in Neural Information Processing Systems, pp. 10825–10836, 2018.
- Singh et al. (2019) Singh, G., Gehr, T., Püschel, M., and Vechev, M. Robustness certification with refinement. ICLR, 2019.
- Sinha et al. (2018) Sinha, A., Namkoong, H., and Duchi, J. Certifying some distributional robustness with principled adversarial training. In ICLR, 2018.
- Song et al. (2017) Song, Y., Kim, T., Nowozin, S., Ermon, S., and Kushman, N. Pixeldefend: Leveraging generative models to understand and defend against adversarial examples. arXiv preprint arXiv:1710.10766, 2017.
- Szegedy et al. (2013) Szegedy, C., Zaremba, W., Sutskever, I., Bruna, J., Erhan, D., Goodfellow, I., and Fergus, R. Intriguing properties of neural networks. arXiv preprint arXiv:1312.6199, 2013.
- Wang et al. (2018a) Wang, S., Chen, Y., Abdou, A., and Jana, S. Mixtrain: Scalable training of formally robust neural networks. arXiv preprint arXiv:1811.02625, 2018a.
- Wang et al. (2018b) Wang, S., Pei, K., Whitehouse, J., Yang, J., and Jana, S. Efficient formal safety analysis of neural networks. In Advances in Neural Information Processing Systems, pp. 6369–6379, 2018b.
- Weng et al. (2018) Weng, T.-W., Zhang, H., Chen, H., Song, Z., Hsieh, C.-J., Boning, D., Dhillon, I. S., and Daniel, L. Towards fast computation of certified robustness for ReLU networks. In International Conference on Machine Learning, 2018.
- Wong & Kolter (2018) Wong, E. and Kolter, Z. Provable defenses against adversarial examples via the convex outer adversarial polytope. In International Conference on Machine Learning, pp. 5283–5292, 2018.
- Wong et al. (2018) Wong, E., Schmidt, F., Metzen, J. H., and Kolter, J. Z. Scaling provable adversarial defenses. Advances in Neural Information Processing Systems (NIPS), 2018.
- Xiao et al. (2018a) Xiao, C., Deng, R., Li, B., Yu, F., Liu, M., and Song, D. Characterizing adversarial examples based on spatial consistency information for semantic segmentation. In Proceedings of the European Conference on Computer Vision (ECCV), pp. 217–234, 2018a.
- Xiao et al. (2018b) Xiao, C., Li, B., Zhu, J.-Y., He, W., Liu, M., and Song, D. Generating adversarial examples with adversarial networks. IJCAI, 2018b.
- Xiao et al. (2018c) Xiao, C., Zhu, J.-Y., Li, B., He, W., Liu, M., and Song, D. Spatially transformed adversarial examples. ICLR, 2018c.
- Xiao et al. (2019a) Xiao, C., Yang, D., Li, B., Deng, J., and Liu, M. Meshadv: Adversarial meshes for visual recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6898–6907, 2019a.
- Xiao et al. (2019b) Xiao, K. Y., Tjeng, V., Shafiullah, N. M., and Madry, A. Training for faster adversarial robustness verification via inducing relu stability. ICLR, 2019b.
- Zhang et al. (2018) Zhang, H., Weng, T.-W., Chen, P.-Y., Hsieh, C.-J., and Daniel, L. Efficient neural network robustness certification with general activation functions. In Advances in Neural Information Processing Systems (NIPS), 2018.
- Zhang et al. (2019) Zhang, H., Zhang, P., and Hsieh, C.-J. Recurjac: An efficient recursive algorithm for bounding jacobian matrix of neural networks and its applications. AAAI Conference on Artificial Intelligence, 2019.
- Zheng et al. (2018) Zheng, T., Chen, C., and Ren, K. Distributionally adversarial attack. arXiv preprint arXiv:1808.05537, 2018.
Appendix A Hyperparameters in Experiments
All our MNIST, CIFAR-10 models are trained on a single NVIDIA 2080 Ti GPU. In all our experiments, if not mentioned otherwise, we use the following hyperparameters:
For MNIST, we train for 100 epochs with batch size 256. We use the Adam optimizer and the learning rate is . The first epoch is standard training for warming up. We gradually increase $\epsilon$ linearly per batch in our training process with a schedule length of 60. We reduce the learning rate by 50% every 10 epochs after the schedule ends. No data augmentation technique is used and the whole 28 × 28 images are used (normalized to the 0 - 1 range).
For CIFAR, we train for 200 epochs with batch size 128. We use the Adam optimizer and the learning rate is 0.1%. The first 10 epochs are standard training for warming up. We gradually increase $\epsilon$ linearly per batch in our training process with a schedule length of 120. We reduce the learning rate by 50% every 10 epochs after the schedule ends. We use random horizontal flips and random crops as data augmentation. The three channels are normalized with mean (0.4914, 0.4822, 0.4465) and standard deviation (0.2023, 0.1914, 0.2010). These numbers are per-channel statistics from the training set used in (Gowal et al., 2018).
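The per-channel normalization described above can be sketched as follows. This is only a minimal illustration: it assumes channel-last uint8 images that are first scaled to the 0 - 1 range, and the helper names `normalize_cifar` and `scaled_epsilon` are hypothetical. The second helper rests on the further assumption that the perturbation radius is specified in the 0 - 1 pixel range, in which case dividing by the standard deviation rescales it per channel.

```python
import numpy as np

# Per-channel statistics quoted in the text (assumed to be in RGB order).
CIFAR_MEAN = np.array([0.4914, 0.4822, 0.4465], dtype=np.float32)
CIFAR_STD = np.array([0.2023, 0.1914, 0.2010], dtype=np.float32)


def normalize_cifar(images_uint8):
    """Scale uint8 images of shape (N, 32, 32, 3) to [0, 1], then standardize each channel."""
    x = images_uint8.astype(np.float32) / 255.0
    # Broadcasting applies the statistics along the last (channel) axis.
    return (x - CIFAR_MEAN) / CIFAR_STD


def scaled_epsilon(eps_pixel):
    """Per-channel l_inf radius in normalized space for a radius given in [0, 1] pixel space."""
    # Subtracting the mean does not change distances; dividing by std rescales them.
    return eps_pixel / CIFAR_STD
```

Because the standard deviations differ across channels, the normalized perturbation set is a box with a different half-width per channel, which interval based bounds handle directly.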
For all experiments, we set when the schedule starts. We decrease linearly to 0 when finishes its increasing schedule and reaches . We did not tune the schedule for parameter , and it always has the same schedule length as the schedule. All verified error numbers are evaluated on the test set, using IBP, since the networks are trained using pure IBP after reaches the target. We found that CROWN (Zhang et al., 2018) or Fast-Lin (Weng et al., 2018) cannot give tight verification bounds on IBP trained models (some comparison results are given in Table 1).
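The schedules described in this appendix can be sketched as below. The function names, the `steps_per_epoch` argument and the starting value `coef_start` of the linearly decreasing coefficient are placeholders. We also assume that schedule lengths are counted in epochs, that the ramp starts right after the warm-up epochs, and that the first learning rate halving happens 10 epochs after the ramp ends; none of these details is fixed by the text above.

```python
def ramp_fraction(step, steps_per_epoch, warmup_epochs, schedule_epochs):
    """Fraction of the linear ramp completed at a given global batch index (0.0 to 1.0)."""
    start = warmup_epochs * steps_per_epoch
    length = schedule_epochs * steps_per_epoch
    done = min(max(step - start, 0), length)
    return done / length


def epsilon_at(step, steps_per_epoch, eps_target, warmup_epochs, schedule_epochs):
    """Perturbation radius: 0 during warm-up, then increased linearly per batch to eps_target."""
    frac = ramp_fraction(step, steps_per_epoch, warmup_epochs, schedule_epochs)
    return eps_target * frac


def coefficient_at(step, steps_per_epoch, coef_start, warmup_epochs, schedule_epochs):
    """Coefficient that decreases linearly to 0 over the same schedule as epsilon."""
    frac = ramp_fraction(step, steps_per_epoch, warmup_epochs, schedule_epochs)
    return coef_start * (1.0 - frac)


def learning_rate_at(epoch, base_lr, warmup_epochs, schedule_epochs, decay_every=10):
    """Constant learning rate until the ramp ends, then halved every `decay_every` epochs."""
    schedule_end = warmup_epochs + schedule_epochs
    if epoch < schedule_end:
        return base_lr
    num_halvings = (epoch - schedule_end) // decay_every
    return base_lr * (0.5 ** num_halvings)
```

Under these assumptions, the MNIST setting corresponds to `warmup_epochs=1, schedule_epochs=60` and the CIFAR setting to `warmup_epochs=10, schedule_epochs=120`.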
Appendix B Model Structure
Table 4 gives the 18 model structures used in our paper. MNIST and Fashion-MNIST use exactly the same model structures. Most CIFAR models share the same structures as the MNIST models (unless noted in the table), except that their input dimensions are different. Model A is too small for CIFAR, so we remove it from the CIFAR experiments. Models A - J are the “small models” reported in Table 2. Models K - T are the “large models” reported in Table 2. For results in Table 1, we use a small model (model structure B) for all three datasets.
Appendix C Reproducibility
|$\epsilon$||error||model A||model B||model C||model D||model E||model F||model G||model H||model I||model J|
|0.1||std. err. (%)|
|0.1||verified err. (%)|
|0.2||std. err. (%)|
|0.2||verified err. (%)|
|0.3||std. err. (%)|
|0.3||verified err. (%)|
|0.4||std. err. (%)|
|0.4||verified err. (%)|
We run CROWN-IBP on 10 small MNIST models, 5 times each, and report the mean and standard deviation of standard and verified errors in Table 5. We can observe that the results from multiple runs are very similar with small standard deviations, so reproducibility is not an issue for CROWN-IBP.
Appendix D Training Stability Experiments on CIFAR
Similar to our experiments in Section 4.3, we compare the verified errors obtained by CROWN-IBP, Natural-IBP and Pure-IBP under different schedule lengths (30, 90, 120). We present the best, worst and median verified errors over all 17 models for CIFAR-10 in Figures 5 and 6, at . The upper and lower ends of an error bar are the worst and best verified errors, respectively, and the lines go through the median values. CROWN-IBP improves training stability and consistently outperforms the other methods. Pure-IBP cannot stably converge on all models when the schedule is short, and its verified errors tend to be higher. Natural-IBP is sensitive to setting; with many models have high robust errors (as shown in Figure 5(b)).
Appendix E Training Time
In Table 6 we present the training time of CROWN-IBP, Pure-IBP and convex adversarial polytope (Wong et al., 2018) on several representative models. All timings are measured on a single RTX 2080 Ti GPU with 11 GB RAM. We can observe that CROWN-IBP is practically 2 to 7 times slower than Pure-IBP (theoretically, CROWN-IBP is up to times slower than Pure-IBP); convex adversarial polytope (Wong et al., 2018), as a representative linear relaxation based method, can be hundreds of times slower than Pure-IBP, especially on deeper networks. Note that we use 50 random Cauchy projections for (Wong et al., 2018). Using random projections alone is not sufficient to scale purely linear relaxation based methods to larger datasets; we therefore advocate a combination of non-linear IBP bounds with linear relaxation based methods as in CROWN-IBP, which offers good scalability, stability and representation power. We also note that the random projection based acceleration can be applied to the backward bound propagation (CROWN-style bound) in CROWN-IBP to further speed it up.
|Convex adv (Wong et al., 2018) (s)||1708||9263||12649||35518||160794||2372||12688||18691||6961||51145|
Appendix F Exact Forms of the CROWN-IBP Backward Bound
CROWN (Zhang et al., 2018) is a general framework that replaces non-linear functions in a neural network with linear upper and lower hyperplanes with respect to pre-activation variables, such that the entire neural network function can be bounded by a linear upper hyperplane and linear lower hyperplane for all ( is typically a norm bounded ball, or a box region):
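Written out with assumed symbols (the names $\underline{A}$, $\overline{A}$, $\underline{b}$, $\overline{b}$ for the two hyperplanes, $f$ for the network and $S$ for the input region are our notation and may differ from the original), such a pair of bounds has the form:

```latex
\underline{A}\,x + \underline{b} \;\le\; f(x) \;\le\; \overline{A}\,x + \overline{b},
\qquad \forall\, x \in S .
```

Both inequalities are understood elementwise over the network outputs.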
CROWN achieves such linear bounds by replacing non-linear functions with linear bounds, and by utilizing the fact that linear combinations of linear bounds are still linear, so these linear bounds can propagate through layers. Suppose we have a non-linear vector function , applied to an input (pre-activation) vector ; CROWN then requires the following bounds in a general form:
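As a sketch in assumed notation, let $\sigma$ denote the non-linearity, $v$ the pre-activation vector and $l \le v \le u$ its elementwise range; the matrices $\underline{A}_\sigma, \overline{A}_\sigma$ and vectors $\underline{b}_\sigma, \overline{b}_\sigma$ are our names for the relaxation. The required bounds, together with one propagation step through a following linear map $W$, can then be written as:

```latex
\begin{aligned}
\underline{A}_\sigma v + \underline{b}_\sigma \;&\le\; \sigma(v) \;\le\; \overline{A}_\sigma v + \overline{b}_\sigma,
\qquad \forall\, l \le v \le u, \\
W\sigma(v) \;&\ge\; W^{+}\bigl(\underline{A}_\sigma v + \underline{b}_\sigma\bigr)
             + W^{-}\bigl(\overline{A}_\sigma v + \overline{b}_\sigma\bigr),
\qquad W^{+} = \max(W, 0),\; W^{-} = \min(W, 0).
\end{aligned}
```

The right-hand side of the second line is again linear in $v$, which is why such bounds can be carried backward layer by layer; the upper bound follows by swapping the roles of the two relaxations.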
In general the specific bounds for different need to be given on a case-by-case basis, depending on the characteristics of and the preactivation range . In neural networks, common can be ReLU, , sigmoid, maxpool, etc. Convex adversarial polytope (Wong et al., 2018) is also a linear relaxation based technique that is closely related to CROWN, but only for ReLU layers. For ReLU such bounds are simple, where are diagonal matrices, :
where and are two diagonal matrices:
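For reference, the standard ReLU relaxation used by CROWN-style methods can be sketched numerically as below. This is an illustration under stated assumptions rather than the exact definition from this appendix: the function name and return layout are hypothetical, the diagonal matrices are stored as vectors, and the lower slope for unstable neurons uses one common heuristic (slope 1 when the upper bound exceeds the magnitude of the lower bound, otherwise 0); any slope in [0, 1] would be valid.

```python
import numpy as np


def relu_relaxation(l, u):
    """Return (d_lower, d_upper, b_upper) with
    d_lower * v <= relu(v) <= d_upper * v + b_upper for all l <= v <= u (elementwise)."""
    l = np.asarray(l, dtype=np.float64)
    u = np.asarray(u, dtype=np.float64)

    active = l >= 0                      # ReLU is the identity on [l, u]
    inactive = u <= 0                    # ReLU is zero on [l, u]
    unstable = ~active & ~inactive       # l < 0 < u

    # Upper bound for unstable neurons: the chord through (l, 0) and (u, u).
    gap = np.where(unstable, u - l, 1.0)  # dummy value avoids dividing by zero elsewhere
    chord_slope = u / gap
    d_upper = np.where(unstable, chord_slope, np.where(active, 1.0, 0.0))
    b_upper = np.where(unstable, -chord_slope * l, 0.0)

    # Lower bound: a line through the origin; heuristic slope for unstable neurons.
    d_lower = np.where(active, 1.0, 0.0)
    d_lower = np.where(unstable & (u > -l), 1.0, d_lower)
    return d_lower, d_upper, b_upper
```

Given pre-activation bounds for a layer, the three returned vectors play the role of the diagonals of the two relaxation matrices and the intercept of the upper bound; the lower bound has zero intercept.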
Note that CROWN-style bounds require knowing all pre-activation bounds and . We assume these bounds are valid for . In CROWN-IBP, these bounds are obtained by interval bound propagation (IBP). With pre-activation bounds and given (for ), we rewrite the CROWN lower bound for the special case of ReLU neurons:
Theorem F.1 (CROWN Lower Bound).
For a -layer neural network function , , we have , where