Adversarial attacks rely on creating input data points that are visually indistinguishable from ‘normal’ examples, but drastically change the prediction of the model Goodfellow et al. (2014). One remedy is to construct adversarial examples and add them to the training set Madry et al. (2017). While such models become robust to many adversarial attacks, there is no guarantee that some other adversarial scheme does not exist. To formally verify the robustness of the model against norm-bounded perturbations, one can find an outer bound on the so-called ‘adversarial polytope’ Wong and Kolter (2017). These techniques give loose bounds on the output activations, but guarantee that no adversary within a given norm can change the class label. Unfortunately, most of these techniques are computationally demanding and do not scale well to large networks, which makes them difficult to use in practice.
In this paper, we consider the framework of interval bound propagation (IBP) proposed by Gowal et al. (2018) for constructing provably robust classifiers. IBP uses interval arithmetic to minimize an upper bound on the maximum difference between any pair of logits when the input is perturbed within a norm-bounded ball. Direct application of interval arithmetic in a layer-wise fashion leads to the well-known wrapping effect Moore (1979), because the bounds are reshaped into axis-aligned bounding boxes that always encompass the adversarial polytope, see Figure 1. To overcome this limitation, the authors start from a typical classification loss to pretrain the network and gradually increase the importance of the adversarial loss together with the size of the input perturbation. Unfortunately, a too sudden change of these trade-off factors results in a lack of convergence, which makes the training process cumbersome and time consuming.
In this contribution, we show that the training procedure of IBP can be significantly simplified, which results in more stable training and faster convergence. Our key idea relies on combining the IBP loss with an additional term, which controls the size of the adversarial polytope across layers. More precisely, we minimize the size of the outer bound of the adversarial polytope at each layer jointly with the IBP cost function, see Figure 2 for an illustration. As a result, our model is less sensitive to changes of the aforementioned IBP hyper-parameters, which makes it easier to use in practice.
Our contributions are the following:
We introduce a new term to the IBP loss function. Our modification allows using larger perturbations at the initial stage of training and helps to stabilize the training. Moreover, it requires fewer epochs to obtain performance comparable to IBP. In consequence, our model can be seen as a very efficient technique for constructing provably robust models, which can be applied to large networks. The proposed idea is not limited to IBP and can be incorporated into other robust training methods, such as the convex-optimization-based approaches Wong and Kolter (2017); Dvijotham et al. (2018b). It also helps to reduce hyperparameter tuning (particularly the dynamics of ε during the training).
We give an insight into the instability of IBP and show that this effect is correlated with a lack of minimization of the interval bounds in hidden layers. Looking from a different perspective, we observe that IBP (implicitly) minimizes the interval bounds in hidden layers when the training is convergent.
The conducted experiments support the research hypothesis that the additional term in the loss function stabilizes the training, improves its efficiency and guides the network in the early stage of training. In the most challenging setting for CIFAR-10, we are able to get better results (verified test error) even using a much smaller network than the one behind IBP's best performance. We show concrete examples where IBP fails (or gets stuck in a local minimum for a long time), whereas the new loss function allows training the model in a stable fashion.
2 Related work
The study of adversarial examples began with Szegedy et al. (2013), in which the authors noticed that neural networks are fragile to targeted perturbations. Since then, numerous attacks and defenses have followed Papernot et al. (2016); Moosavi-Dezfooli et al. (2017); Xiao et al. (2018a); Kurakin et al. (2016); Tramer et al. (2017); Yuan et al. (2019). One extremely effective way to defend against adversarial examples is to generate such examples in the training stage Madry et al. (2017). In this approach, we try to mimic the adversary and simulate its behaviour. While this strategy provides practical benefits, one cannot guarantee that other attacks do not exist.
To construct provable defenses, we aim to produce certificates that no perturbation within a fixed norm can change a given class label. There are a number of works using exact solvers to verify robustness against adversarial attacks. These methods employ either integer programming approaches Lomuscio and Maganti (2017); Tjeng et al. (2018); Dutta et al. (2018); Xiao et al. (2018b); Cheng et al. (2017), or Satisfiability Modulo Theories (SMT) solvers Katz et al. (2017); Carlini et al. (2017); Ehlers (2017). A downside of these methods is their high computational complexity due to the NP-completeness of the problem. In consequence, the vast majority of these methods do not scale well to large or even medium-sized networks.
To speed up the training of verifiably robust models, one can bound the set of activations reachable through a norm-bounded perturbation Salman et al. (2019); Liu et al. (2019). Wong and Kolter (2017) bound the adversarial polytope with a convex outer approximation computed by a dual network. As an alternative, Mirman et al. (2018); Singh et al. (2018a, 2019) adapted the framework of ‘abstract transformers’ to compute an approximation of the adversarial polytope using SGD training. This allowed training the networks on entire regions of the input space at once. Interval bound propagation Gowal et al. (2018) applied interval arithmetic to propagate an axis-aligned bounding box from layer to layer. An analogous idea was used in Dvijotham et al. (2018a), in which the predictor and the verifier networks were trained simultaneously. While these methods are computationally appealing, they require careful tuning of hyper-parameters to provide tight bounds on the verification network. Finally, there are also hybrid methods, which combine exact and relaxed verifiers Bunel et al. (2018); Singh et al. (2018b).
3 Interval bound propagation
In this section we first give the main idea behind training provably robust models. Next, we recall the IBP framework based on interval arithmetic. Finally, we present our model, which is an extension of the IBP approach.
3.1 Training robust classifiers
We consider a feed-forward neural network f designed for a classification task. The network is composed of K layers given by transformations x_k = f_k(x_{k−1}), where each f_k is either an affine transformation or a nonlinear monotonic function such as ReLU or sigmoid. In the training stage, we feed the network with pairs of an input vector x and its correct class label y, and minimize the cross-entropy with softmax applied to the output logits z = f(x).
In an adversarial attack, any test vector x can be perturbed by some δ with norm bounded by ε, for a small fixed ε > 0. Thus the input to the network can be any point in the D-dimensional hyper-cube [x − ε, x + ε], centered at x with side length 2ε. This set is transformed by the neural network into a set called the adversarial polytope f([x − ε, x + ε]).
To design a provable defense against adversarial attacks, we have to ensure that the class label does not change for any output z ∈ f([x − ε, x + ε]). In other words, all inputs from the hyper-cube should be labeled as y by the neural network f. In this context, the fraction of test examples whose worst-case prediction is incorrect is called the verified test error.
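The verification condition above reduces to a simple comparison of logit bounds: the label provably cannot change if the lower bound of the true-class logit exceeds the upper bound of every other logit. Below is a minimal numpy sketch of that check; the function name and example numbers are illustrative, not from the paper.

```python
import numpy as np

def is_verified(lower, upper, y):
    """Return True when the true label y provably cannot change.

    `lower`/`upper` are element-wise bounds on the output logits over all
    inputs in the hyper-cube [x - eps, x + eps].  The check: the lower bound
    of the true logit must exceed the upper bound of every other logit.
    """
    others = np.delete(upper, y)
    return bool(lower[y] > others.max())

# Toy bounds: class 0 is verifiably robust (2.0 > max(0.2, 1.5)),
# class 2 is not (0.5 < 3.0).  Unverified examples count towards
# the verified test error.
lower = np.array([2.0, -1.0, 0.5])
upper = np.array([3.0,  0.2, 1.5])
robust_0 = is_verified(lower, upper, 0)
robust_2 = is_verified(lower, upper, 2)
```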
3.2 Verifiable robustness using IBP
Exact verification of the model robustness may be difficult even for simple neural networks. Thus we usually solve an easier task: we compute a loose outer bound of the adversarial polytope and control the class label inside this bound. In the IBP approach Gowal et al. (2018), we find the smallest bounding box at each layer that encloses the transformed bounding box from the previous layer. In other words, we bound the activations of each layer by an axis-aligned bounding box.
In the case of neural networks, such a bounding box can be computed efficiently in a layer-to-layer fashion using interval arithmetic. Representing the box by its center μ and radius r, applying an affine layer z′ = Wz + b yields the smallest bounding box for the output with center μ′ = Wμ + b and radius r′ = |W|r, where |W| is the element-wise absolute value operator. For a monotonic activation function g, we get the interval bound defined by applying g to both endpoints: [g(μ − r), g(μ + r)].
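The layer-wise propagation rules can be sketched in a few lines of numpy; this is an illustrative implementation of the interval-arithmetic rules described above, with toy weights, not code from the paper.

```python
import numpy as np

def affine_bounds(W, b, mu, r):
    """Propagate a box (center mu, radius r) through z' = W z + b.

    The smallest output box has center W mu + b and radius |W| r,
    where |W| is the element-wise absolute value.
    """
    return W @ mu + b, np.abs(W) @ r

def relu_bounds(mu, r):
    """Monotonic activations are applied directly to the two endpoints."""
    lo = np.maximum(mu - r, 0.0)
    hi = np.maximum(mu + r, 0.0)
    return (lo + hi) / 2.0, (hi - lo) / 2.0  # back to center/radius form

# Toy two-dimensional example.
W = np.array([[1.0, -2.0], [0.5, 1.0]])
b = np.array([0.0, 1.0])
mu, r = np.array([1.0, 1.0]), np.array([0.1, 0.1])

mu, r = affine_bounds(W, b, mu, r)   # center [-1.0, 2.5], radius [0.3, 0.15]
mu, r = relu_bounds(mu, r)           # ReLU clips the negative coordinate
```

Only two passes through the network are needed overall: one for the centers and one for the radii, which is what makes IBP cheap.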
To obtain provable robustness in the classification context, we consider the worst-case prediction for the whole interval bound of the final logits. More precisely, we need to ensure that the whole bounding box is classified correctly, i.e. no perturbation changes the correct class label. In consequence, the logit of the true class is set to its lower bound and the other logits are set to their upper bounds: ẑ_y = z̲_y and ẑ_i = z̄_i for i ≠ y.
Finally, one can apply softmax with the cross-entropy loss to the logit vector representing the worst-case prediction.
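Putting the worst-case construction and the loss together, a minimal numpy sketch could look as follows (function names and numbers are illustrative):

```python
import numpy as np

def worst_case_logits(lower, upper, y):
    """Worst-case logit vector: the true class takes its lower bound,
    every other class its upper bound."""
    z = upper.copy()
    z[y] = lower[y]
    return z

def cross_entropy(z, y):
    """Softmax cross-entropy on a logit vector, via a stable log-sum-exp."""
    z = z - z.max()
    return float(np.log(np.exp(z).sum()) - z[y])

lower = np.array([1.0, -2.0, 0.0])
upper = np.array([2.0, 0.5, 1.0])
z_hat = worst_case_logits(lower, upper, 0)  # [1.0, 0.5, 1.0]
loss = cross_entropy(z_hat, 0)
```

Minimizing this loss pushes the lower bound of the true logit above the upper bounds of the others, which is exactly the verification condition.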
As shown in Gowal et al. (2018), computing the interval bounds requires only two forward passes through the neural network, which makes this approach appealing from a practical perspective. Nevertheless, a direct application of the above procedure with a fixed perturbation radius ε may fail, because the propagated bounds are too loose, especially for very deep networks (see also the wrapping effect illustrated in Figure 1). To overcome this problem, Gowal et al. supplemented the above interval loss with a typical cross-entropy cost applied to the original non-interval data:
where κ is a trade-off parameter. In the initial training phase, the model uses only the classical loss function applied to non-interval data (κ = 1). Next, the weight of the interval loss is gradually increased up to its final value. Moreover, the training starts with a small perturbation radius, which is also increased in later epochs. The training process is sensitive to these hyperparameters, and finding the correct schedule for every new data set can be problematic and requires extensive experimental studies. This makes the whole training procedure time consuming, which reduces the practicality of this approach.
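Such a schedule is typically a piecewise-linear ramp; the sketch below shows the shape of the interpolation, with purely illustrative step counts rather than the schedule used by Gowal et al.:

```python
def linear_ramp(step, start_step, end_step, v0, v1):
    """Linearly interpolate a scheduled value (e.g. the perturbation radius
    eps or the trade-off weight kappa) between v0 and v1 over
    [start_step, end_step].  All concrete numbers below are illustrative."""
    if step <= start_step:
        return v0
    if step >= end_step:
        return v1
    t = (step - start_step) / (end_step - start_step)
    return v0 + t * (v1 - v0)

# eps grows from 0 towards its target while the interval loss gains weight.
eps = linear_ramp(500, 0, 1000, 0.0, 8 / 255)  # halfway through the ramp
```

Tuning the ramp endpoints per dataset is exactly the costly part the paper aims to alleviate.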
3.3 Constrained interval bound propagation
To make IBP less sensitive to the training settings and provide more training stability (particularly for bigger ε), we propose to enhance the cost function. We want to directly control the bounding boxes at each layer of the network. More precisely, in addition to the IBP loss, we minimize the size of the outer interval bound at each layer. Thus our cost function equals the IBP loss extended by an additional term that penalizes the sizes of the interval bounds across the hidden layers.
We argue that such an addition helps to circumvent limitations of the original IBP. First, gradients are calculated not only with respect to the last layer but with respect to all hidden layers. This should bring more training stability, especially at the early training stage. Second, we expect it to be easier for a model to have small interval bounds in the final layer when the bounds are constrained in the hidden layers. Indeed, our experimental results support these research hypotheses.
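A minimal sketch of the extra term, assuming the per-layer bounds are collected during the forward pass; the function name, the choice of the mean as the size measure, and the weight `lam` are illustrative assumptions, not values from the paper:

```python
import numpy as np

def bound_size_penalty(layer_bounds, lam=1e-3):
    """Extra loss term: penalize the size of the interval bound at every
    hidden layer.  `layer_bounds` is a list of (lower, upper) arrays;
    `lam` is a hypothetical trade-off weight."""
    return lam * sum(float(np.mean(u - l)) for l, u in layer_bounds)

# Two toy layers with interval widths 0.5 and 2.0 in every coordinate.
bounds = [(np.zeros(4), np.full(4, 0.5)),
          (np.zeros(2), np.full(2, 2.0))]
penalty = bound_size_penalty(bounds)  # lam * (0.5 + 2.0)
# total_loss = ibp_loss + penalty     # added to the usual IBP objective
```

Because the penalty touches every layer, its gradients reach the early layers directly instead of only through the final logit bounds.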
4 Experiments
Here we report our experiments, which show the effect of the proposed loss function and give some insight into why it is beneficial to minimize the interval bounds in hidden layers. We implement our ideas with the PyTorch library Paszke et al. (2017), and for a fair comparison we reimplement the original IBP using the same framework. We conduct the experiments on three datasets, namely CIFAR-10 Krizhevsky et al., SVHN Netzer et al. (2011) and MNIST LeCun and Cortes. The neural network architectures used in the experiments are the same as in Gowal et al. (2018): three convolutional networks called small, medium and large, see Table 1 for details. In all experiments, adversarial perturbations lie within a norm-bounded ball. Unless stated otherwise, we apply the original training procedure and hyper-parameters used in Gowal et al. (2018). For CIFAR-10 and SVHN, we use data augmentation (adding random translations and flips, and normalizing each image channel using the channel statistics from the training set).
| small | medium | large |
|---|---|---|
| Conv2d(input_filters, 16, 4, 2) | Conv2d(input_filters, 32, 3, 1) | Conv2d(input_filters, 64, 3, 1) |
| Conv2d(16, 32, 4, 1) | Conv2d(32, 32, 4, 2) | Conv2d(64, 64, 3, 1) |
| Fully_connected(100) | Conv2d(32, 64, 3, 1) | Conv2d(64, 128, 3, 2) |
| Fully_connected(10) | Conv2d(64, 64, 4, 2) | Conv2d(128, 128, 3, 1) |
|  | Fully_connected(512) | Conv2d(128, 128, 3, 1) |
Architectures used in the experiments. The convolutional layer (Conv2d) has four parameters: the numbers of input and output filters, the filter size, and the stride. For the fully connected layer, the parameter denotes the number of outputs.
4.1 Faster convergence
First, we highlight that our approach minimizes the verified test error much faster than IBP. Since the performance of both methods on MNIST is comparable, we only report the results on the most challenging cases of CIFAR-10 and SVHN with the maximal perturbation radius ε = 8/255.
It is evident from Figure 3 that the difference between the two methods is substantial. In the case of CIFAR-10, after 100 epochs the verified test error is over 20 percentage points lower, whereas the nominal error is close. The shape of the curves for SVHN is similar, but the gain in verified accuracy is slightly lower; after 50 epochs the verified error of our method is also 20 percentage points lower than the one obtained by IBP, while after 100 epochs the difference is around 10 percentage points.
We stress that we took the training schedule directly from the IBP method (in particular the changes in ε and κ), which was tuned for that method. Selecting optimal parameters for our method opens the possibility of even better results and better trade-offs between the verified and the nominal test errors. In the next section, we investigate more dynamic changes, which would make the training even faster.
4.2 More stable training
Gowal et al. stated that their method needs to increase ε slowly (from 0 to its target value) to provide stability and convergence during the training. For example, for CIFAR-10, this ‘ramp-up’ phase lasts 150 epochs. This raises a natural question: could we speed up the increase, and is our new term in the loss function helpful in this regard?
First, we show that even if we keep the original schedule, IBP may get stuck in a local minimum for a very long time. The experiment was done on CIFAR-10 with the large architecture and ε = 8/255. The test error goes down very quickly, reaching 0.2, whereas the verified test error remains at 100% for over 100 epochs, see Figure 4. On the contrary, our approach steadily minimizes the verified test error.
Next, we investigate more dynamic changes to reduce the training time. For the MNIST dataset, increasing ε 2.5 times faster results in a lack of convergence for the original IBP method, see Figure 5. On the other hand, the additional term in the loss function helps to stabilize the training and minimize the verified error.
4.3 Minimizing interval bounds in hidden layers is desired
In our approach, we add an additional term to the loss function, which helps to minimize the interval bounds in hidden layers. Interestingly, the IBP method also implicitly tries to keep the interval bounds stable in the hidden layers, see the top row of Figure 6. In fact, this is natural behaviour, because it is easier for a model to have small interval bounds in the final layer when the bounds are small in the hidden layers. Nevertheless, the additional term in our approach stabilizes the bounds faster, and their values are a few orders of magnitude lower.
To gain more insight, we also checked what happens to the interval bounds during unstable training, such as the one shown in Figure 5 for MNIST. It turns out that in such settings (more dynamic ε changes), the IBP method is unable to stabilize the interval bounds in hidden layers, see Figure 7. This observation supports our hypothesis that the verified error is highly correlated with the interval bounds in the hidden layers. In consequence, it is beneficial to explicitly encourage the model to keep the interval bounds low across layers.
4.4 Comparison with the original Interval Bound Propagation
For the sake of completeness, we provide a comparison between IBP and our approach in terms of final error rates. We report the test error, the error rate under the PGD attack and the verified bound on the error rate. All these numbers are obtained after the complete training. We stress that all hyperparameters (particularly the schedules of ε, κ and the learning rate) were left the same as in Gowal et al. (2018).
It is evident from Table 2 that without any hyperparameter tuning, our approach gives results comparable to IBP. However, the reported error levels are reached faster and the training proceeds in a more stable way, our main points described in the previous sections. We highlight the results for CIFAR-10 with the challenging ε = 8/255, where our modification lets even the smallest model beat the verified error obtained by Gowal et al. for this dataset. For the largest network, we think the original hyperparameters (number of epochs, learning rate schedule) cause overfitting for our method, i.e. the model learns quickly and spends most of the training time fitting the training set.
| Dataset/model | ε | Method | Test error | PGD error | Verified error |
|---|---|---|---|---|---|
| CIFAR-10/small | 8/255 | IBP (Gowal et al.) | 39.33% | 52.22% | 63.58% |
| CIFAR-10/medium | 8/255 | IBP (Gowal et al.) | 18.88% | 48.32% | 100% |
| CIFAR-10/large | 8/255 | IBP (Gowal et al.) | 40.55% | 56.65% | 65.89% |
| MNIST/small | 0.4 | IBP (Gowal et al.) | 2.62% | 14.14% | 20.74% |
| MNIST/medium | 0.4 | IBP (Gowal et al.) | 1.66% | 12.16% | 17.5% |
| MNIST/large | 0.4 | IBP (Gowal et al.) | 1.66% | 10.34% | 15.01% |
| SVHN/small | 8/255 | IBP (Gowal et al.) | 26.60% | 48.50% | 60.87% |
| SVHN/medium | 8/255 | IBP (Gowal et al.) | 36.58% | 48.79% | 55.95% |
| SVHN/large | 8/255 | IBP (Gowal et al.) | 20.00% | 37.06% | 52.37% |
5 Conclusion
Most techniques for training verifiably robust classifiers are computationally demanding. In this paper, we used a simple but promising technique based on interval arithmetic Gowal et al. (2018), which needs only two standard network passes to process an input perturbed within a norm-bounded ball. Although a single iteration can be performed fast, the whole training process requires careful tuning and many epochs to converge. As a remedy, we proposed to additionally minimize the size of the outer bound of the adversarial polytope across the hidden layers. This modification was motivated by the observation that IBP implicitly minimizes these bounds in the case of successful, convergent training. By adding this constraint explicitly, the model becomes less sensitive to changes of the hyper-parameters and, in consequence, we could increase the perturbation radius more dynamically to the desired value, which makes the training faster. As a result, we were able to obtain the lowest verified error in the most challenging case of CIFAR-10 with the perturbation radius ε = 8/255 using only the small architecture.
- Bunel et al.  Rudy R Bunel, Ilker Turkaslan, Philip Torr, Pushmeet Kohli, and Pawan K Mudigonda. A unified view of piecewise linear neural network verification. In Advances in Neural Information Processing Systems, pages 4790–4799, 2018.
- Carlini et al.  Nicholas Carlini, Guy Katz, Clark Barrett, and David L Dill. Provably minimally-distorted adversarial examples. arXiv preprint arXiv:1709.10207, 2017.
- Cheng et al.  Chih-Hong Cheng, Georg Nuhrenberg, and Harald Ruess. Maximum resilience of artificial neural networks. In International Symposium on Automated Technology for Verification and Analysis, pages 251–268. Springer, 2017.
- Dutta et al.  Souradeep Dutta, Susmit Jha, Sriram Sankaranarayanan, and Ashish Tiwari. Output range analysis for deep feedforward neural networks. In NASA Formal Methods Symposium, pages 121–138. Springer, 2018.
- Dvijotham et al. [2018a] Krishnamurthy Dvijotham, Sven Gowal, Robert Stanforth, Relja Arandjelovic, Brendan O’Donoghue, Jonathan Uesato, and Pushmeet Kohli. Training verified learners with learned verifiers. arXiv preprint arXiv:1805.10265, 2018a.
- Dvijotham et al. [2018b] Krishnamurthy Dvijotham, Robert Stanforth, Sven Gowal, Timothy A. Mann, and Pushmeet Kohli. A dual approach to scalable verification of deep networks. CoRR, abs/1803.06567, 2018b. URL http://arxiv.org/abs/1803.06567.
- Ehlers  Ruediger Ehlers. Formal verification of piece-wise linear feed-forward neural networks. In International Symposium on Automated Technology for Verification and Analysis, pages 269–286. Springer, 2017.
- Goodfellow et al.  Ian J Goodfellow, Jonathon Shlens, and Christian Szegedy. Explaining and harnessing adversarial examples. arXiv preprint arXiv:1412.6572, 2014.
- Gowal et al.  Sven Gowal, Krishnamurthy Dvijotham, Robert Stanforth, Rudy Bunel, Chongli Qin, Jonathan Uesato, Timothy Mann, and Pushmeet Kohli. On the effectiveness of interval bound propagation for training verifiably robust models. arXiv preprint arXiv:1810.12715, 2018.
- Grosse et al.  Kathrin Grosse, Nicolas Papernot, Praveen Manoharan, Michael Backes, and Patrick McDaniel. Adversarial examples for malware detection. In European Symposium on Research in Computer Security, pages 62–79. Springer, 2017.
- Hinton et al.  Geoffrey Hinton, Li Deng, Dong Yu, George Dahl, Abdel-rahman Mohamed, Navdeep Jaitly, Andrew Senior, Vincent Vanhoucke, Patrick Nguyen, Brian Kingsbury, et al. Deep neural networks for acoustic modeling in speech recognition. IEEE Signal processing magazine, 29, 2012.
- Katz et al.  Guy Katz, Clark Barrett, David L Dill, Kyle Julian, and Mykel J Kochenderfer. Reluplex: An efficient smt solver for verifying deep neural networks. In International Conference on Computer Aided Verification, pages 97–117. Springer, 2017.
-  Alex Krizhevsky, Vinod Nair, and Geoffrey Hinton. Cifar-10 (canadian institute for advanced research). URL http://www.cs.toronto.edu/~kriz/cifar.html.
- Krizhevsky et al.  Alex Krizhevsky, Ilya Sutskever, and Geoffrey E Hinton. Imagenet classification with deep convolutional neural networks. In Advances in neural information processing systems, pages 1097–1105, 2012.
- Kurakin et al.  Alexey Kurakin, Ian Goodfellow, and Samy Bengio. Adversarial machine learning at scale. arXiv preprint arXiv:1611.01236, 2016.
-  Yann LeCun and Corinna Cortes. MNIST handwritten digit database. URL http://yann.lecun.com/exdb/mnist/.
- Liu et al.  Chen Liu, Ryota Tomioka, and Volkan Cevher. On certifying non-uniform bound against adversarial attacks. arXiv preprint arXiv:1903.06603, 2019.
- Lomuscio and Maganti  Alessio Lomuscio and Lalit Maganti. An approach to reachability analysis for feed-forward relu neural networks. arXiv preprint arXiv:1706.07351, 2017.
- Madry et al.  Aleksander Madry, Aleksandar Makelov, Ludwig Schmidt, Dimitris Tsipras, and Adrian Vladu. Towards deep learning models resistant to adversarial attacks. arXiv preprint arXiv:1706.06083, 2017.
- Mirman et al.  Matthew Mirman, Timon Gehr, and Martin Vechev. Differentiable abstract interpretation for provably robust neural networks. In International Conference on Machine Learning, pages 3575–3583, 2018.
- Mirman et al.  Matthew Mirman, Gagandeep Singh, and Martin Vechev. A provable defense for deep residual networks. arXiv preprint arXiv:1903.12519, 2019.
- Moore  Ramon E Moore. Methods and applications of interval analysis. SIAM, 1979.
- Moosavi-Dezfooli et al.  Seyed-Mohsen Moosavi-Dezfooli, Alhussein Fawzi, Omar Fawzi, and Pascal Frossard. Universal adversarial perturbations. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 1765–1773, 2017.
- Netzer et al.  Yuval Netzer, Tao Wang, Adam Coates, Alessandro Bissacco, Bo Wu, and Andrew Y. Ng. Reading digits in natural images with unsupervised feature learning. In NIPS Workshop on Deep Learning and Unsupervised Feature Learning 2011, 2011. URL http://ufldl.stanford.edu/housenumbers/nips2011_housenumbers.pdf.
- Papernot et al.  Nicolas Papernot, Patrick McDaniel, Somesh Jha, Matt Fredrikson, Z Berkay Celik, and Ananthram Swami. The limitations of deep learning in adversarial settings. In 2016 IEEE European Symposium on Security and Privacy (EuroS&P), pages 372–387. IEEE, 2016.
- Paszke et al.  Adam Paszke, Sam Gross, Soumith Chintala, Gregory Chanan, Edward Yang, Zachary DeVito, Zeming Lin, Alban Desmaison, Luca Antiga, and Adam Lerer. Automatic differentiation in pytorch. In NIPS-W, 2017.
- Salman et al.  Hadi Salman, Greg Yang, Huan Zhang, Cho-Jui Hsieh, and Pengchuan Zhang. A convex relaxation barrier to tight robust verification of neural networks. arXiv preprint arXiv:1902.08722, 2019.
- Singh et al. [2018a] Gagandeep Singh, Timon Gehr, Matthew Mirman, Markus Puschel, and Martin Vechev. Fast and effective robustness certification. In Advances in Neural Information Processing Systems, pages 10802–10813, 2018a.
- Singh et al. [2018b] Gagandeep Singh, Timon Gehr, Markus Puschel, and Martin Vechev. Boosting robustness certification of neural networks. 2018b.
- Singh et al.  Gagandeep Singh, Timon Gehr, Markus Puschel, and Martin Vechev. An abstract domain for certifying neural networks. Proceedings of the ACM on Programming Languages, 3(POPL):41, 2019.
- Sitawarin et al.  Chawin Sitawarin, Arjun Nitin Bhagoji, Arsalan Mosenia, Mung Chiang, and Prateek Mittal. Darts: Deceiving autonomous cars with toxic signs. arXiv preprint arXiv:1802.06430, 2018.
- Szegedy et al.  Christian Szegedy, Wojciech Zaremba, Ilya Sutskever, Joan Bruna, Dumitru Erhan, Ian Goodfellow, and Rob Fergus. Intriguing properties of neural networks. arXiv preprint arXiv:1312.6199, 2013.
- Tjeng et al.  Vincent Tjeng, Kai Y Xiao, and Russ Tedrake. Evaluating robustness of neural networks with mixed integer programming. 2018.
- Tramer et al.  Florian Tramer, Alexey Kurakin, Nicolas Papernot, Ian Goodfellow, Dan Boneh, and Patrick McDaniel. Ensemble adversarial training: Attacks and defenses. arXiv preprint arXiv:1705.07204, 2017.
- Wong and Kolter  Eric Wong and J Zico Kolter. Provable defenses against adversarial examples via the convex outer adversarial polytope. arXiv preprint arXiv:1711.00851, 2017.
- Xiao et al. [2018a] Chaowei Xiao, Bo Li, Jun-Yan Zhu, Warren He, Mingyan Liu, and Dawn Song. Generating adversarial examples with adversarial networks. arXiv preprint arXiv:1801.02610, 2018a.
- Xiao et al. [2018b] Kai Y Xiao, Vincent Tjeng, Nur Muhammad Shafiullah, and Aleksander Madry. Training for faster adversarial robustness verification via inducing relu stability. arXiv preprint arXiv:1809.03008, 2018b.
- Yuan et al.  Xiaoyong Yuan, Pan He, Qile Zhu, and Xiaolin Li. Adversarial examples: Attacks and defenses for deep learning. IEEE transactions on neural networks and learning systems, 2019.
- Zhang et al.  Huan Zhang, Tsui-Wei Weng, Pin-Yu Chen, Cho-Jui Hsieh, and Luca Daniel. Efficient neural network robustness certification with general activation functions. In Advances in Neural Information Processing Systems, pages 4939–4948, 2018.