1 Introduction
In recent years it has been highlighted that state-of-the-art neural networks are highly non-robust: small changes to an input image, which are almost imperceptible for humans, change the classifier decision, and the wrong decision even comes with high confidence [24, 8]. This calls into question the use of neural networks in safety-critical systems, e.g. medical diagnosis systems or self-driving cars, and opens up possibilities to actively attack an ML system in an adversarial way [19, 15, 14]. Moreover, this non-robustness also has implications for follow-up processes like interpretability: how should we be able to interpret classifier decisions if very small changes of the input lead to different decisions?
The finding of this non-robustness initiated a competition where on the one hand increasingly sophisticated attack procedures were proposed [8, 11, 21, 5, 16] and on the other hand research focused on developing stronger defenses against these attacks [9, 8, 29, 19, 11, 2]. In the end it turned out that for all established defenses there still exists a way to attack the classifier successfully [4]. Thus, considering the high importance of robustness in safety-critical machine learning applications, we need robustness guarantees, where one can provide for each test point the radius of a ball on which the classifier will not change its decision, and thus no attack whatsoever will be able to create an adversarial example inside this ball.
Therefore recent research has focused on such provable guarantees of a classifier with respect to the $\ell_1$-norm [3], the $\ell_2$-norm [10] and the $\ell_\infty$-norm [2, 12, 20, 25, 27, 26]. Some works try to solve the combinatorial problem of computing, for each test instance, the norm of the minimal perturbation necessary to change the decision [3, 12, 25]. Unfortunately this does not scale to large networks and can take minutes to hours just for a single test point. Another line of research thus focuses on lower bounds on the norm of the minimal perturbation necessary to change the decision [10, 20, 27, 26].
Moreover, in recent years several new ways to regularize neural networks [22, 6] or new forms of losses [7] have been proposed with the idea of enforcing a large margin, that is a large distance between training instances and decision boundaries. However, these papers do not directly optimize a robustness guarantee. In spirit our paper is closest to [10, 20, 27, 17]. All of them aim at providing robustness guarantees and at the same time propose a new way to optimize the robustness guarantee during training. Currently, to our knowledge only [27] can optimize robustness wrt multiple norms, whereas [10] is restricted to the $\ell_2$-norm and [20, 17] to the $\ell_\infty$-norm.

In this paper we propose a regularization scheme for the class of ReLU networks (feedforward networks with ReLU activation functions, including convolutional and residual architectures with max- or sum-pooling layers) which provably increases the robustness of classifiers. We use the fact that these networks lead to continuous piecewise affine classifier functions and show how to get either the optimal minimal perturbation or a lower bound using the properties of the linear region in which the point lies. As a result of this analysis we propose a new regularization scheme which directly maximizes the lower bound on the robustness guarantee. This allows us to get classifiers with good test error and good robustness guarantees at the same time. While we focus on robustness with respect to the $\ell_2$-distance, our approach applies to all $\ell_p$-norms. Finally, we show in experiments on four datasets that our approach improves lower bounds as well as upper bounds on the norm of the minimal perturbation and can be integrated with adversarial training [8, 16].

2 Local properties of ReLU networks
Feedforward neural networks which use piecewise affine activation functions (e.g. ReLU, leaky ReLU) and are linear in the output layer can be rewritten as continuous piecewise affine functions [1].
Definition 2.1.
A function $f : \mathbb{R}^d \to \mathbb{R}^K$ is called piecewise affine if there exists a finite set of polytopes $\{Q_r\}_{r=1}^{M}$ (referred to as the linear regions of $f$) such that $\bigcup_{r=1}^{M} Q_r = \mathbb{R}^d$ and $f$ is an affine function when restricted to every $Q_r$.
In the following we introduce the notation required for the explicit description of the linear regions and decision boundaries of a multi-class ReLU classifier $f : \mathbb{R}^d \to \mathbb{R}^K$ (where $d$ is the input space dimension and $K$ the number of classes) which has no activation function in the output layer. The decision of $f$ at a point $x$ is given by $\arg\max_{k=1,\dots,K} f_k(x)$. The discrimination between linear regions and decision boundaries allows us to share the linear regions across the classifier components.
We denote by $\sigma : \mathbb{R} \to \mathbb{R}$, $\sigma(t) = \max\{0, t\}$, the ReLU activation function (applied componentwise), $L+1$ is the number of layers, and $W^{(l)} \in \mathbb{R}^{n_l \times n_{l-1}}$ and $b^{(l)} \in \mathbb{R}^{n_l}$ respectively are the weight matrix and offset vector of layer $l$, for $l = 1, \dots, L+1$, with $n_0 = d$ and $n_{L+1} = K$. For $x \in \mathbb{R}^d$ we define recursively the pre- and post-activation output of every layer as

$f^{(1)}(x) = W^{(1)} x + b^{(1)}, \qquad g^{(k)}(x) = \sigma\big(f^{(k)}(x)\big), \qquad f^{(k+1)}(x) = W^{(k+1)} g^{(k)}(x) + b^{(k+1)}, \quad k = 1, \dots, L,$

so that the resulting classifier is obtained as $f(x) = f^{(L+1)}(x)$.
We derive the description of the polytope $Q(x)$ in which $x$ lies and the resulting affine function when $f$ is restricted to $Q(x)$. We assume for this that $x$ does not lie on the boundary between two polytopes (which is almost always true, as the faces shared by two or more polytopes have dimension strictly smaller than $d$). Let $\Delta^{(l)}, \Sigma^{(l)} \in \mathbb{R}^{n_l \times n_l}$, for $l = 1, \dots, L$, be diagonal matrices defined elementwise as

$\Delta^{(l)}(x)_{ij} = \begin{cases} \operatorname{sign}\big(f^{(l)}_i(x)\big) & \text{if } i = j, \\ 0 & \text{else,} \end{cases} \qquad \Sigma^{(l)}(x)_{ij} = \begin{cases} 1 & \text{if } i = j \text{ and } f^{(l)}_i(x) > 0, \\ 0 & \text{else.} \end{cases}$

This allows us to write $f^{(k)}(x)$ as a composition of affine functions, that is

$f^{(k)}(x) = W^{(k)} \Sigma^{(k-1)}(x)\Big( W^{(k-1)} \Sigma^{(k-2)}(x)\big( \cdots \big( W^{(1)} x + b^{(1)} \big) \cdots \big) + b^{(k-1)} \Big) + b^{(k)}.$
We can further simplify the previous expression as $f^{(k)}(x) = V^{(k)} x + a^{(k)}$, with $V^{(k)}$ and $a^{(k)}$ given by

$V^{(k)} = W^{(k)} \Big( \prod_{l=1}^{k-1} \Sigma^{(k-l)}(x)\, W^{(k-l)} \Big), \qquad a^{(k)} = b^{(k)} + \sum_{l=1}^{k-1} \Big( \prod_{m=1}^{k-l} W^{(k+1-m)}\, \Sigma^{(k-m)}(x) \Big) b^{(l)}.$
Note that a forward pass through the network is sufficient to compute $V^{(k)}$ and $a^{(k)}$ for every $k = 1, \dots, L+1$, which results in only a small overhead compared to the usual effort necessary to compute the output of $f$ at $x$. We are then able to characterize the polytope $Q(x)$ as an intersection of $N$ half spaces,

$\Gamma_{l,i} = \big\{ z \in \mathbb{R}^d \;\big|\; \Delta^{(l)}(x)_{ii}\big( \langle V^{(l)}_i, z\rangle + a^{(l)}_i \big) \ge 0 \big\},$

for $l = 1, \dots, L$, $i = 1, \dots, n_l$, namely

$Q(x) = \bigcap_{l=1,\dots,L} \; \bigcap_{i=1,\dots,n_l} \Gamma_{l,i}.$

Note that $N = \sum_{l=1}^{L} n_l$ is also the number of hidden units of the network. Finally, we can write

$f^{(L+1)}\big|_{Q(x)}(z) = V^{(L+1)} z + a^{(L+1)},$
which represents the affine restriction of $f$ to $Q(x)$. One can further distinguish the subset of $Q(x)$ assigned to a specific class $c$, among the $K$ available ones, which is given by

$Q_c(x) = \big\{ z \in Q(x) \;\big|\; \langle V^{(L+1)}_c - V^{(L+1)}_s, z\rangle + a^{(L+1)}_c - a^{(L+1)}_s \ge 0 \;\; \forall s \ne c \big\},$

where $V^{(L+1)}_c$ is the $c$-th row of $V^{(L+1)}$. The set $Q_c(x)$ is again a polytope, as it is an intersection of polytopes, and it holds that $Q(x) = \bigcup_{c=1}^{K} Q_c(x)$.
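As a sanity check of this derivation, the following sketch (placeholder weights and names of our choosing) computes the matrix $V$ and offset $a$ of the affine restriction for a toy one-hidden-layer ReLU network from a single forward pass, and verifies that $Vx + a$ reproduces the network output:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy ReLU network: d=4 inputs, one hidden layer of 6 units, K=3 classes.
W1, b1 = rng.standard_normal((6, 4)), rng.standard_normal(6)
W2, b2 = rng.standard_normal((3, 6)), rng.standard_normal(3)

def affine_restriction(x):
    """Return (V, a) such that f(z) = V z + a on the linear region of x."""
    pre1 = W1 @ x + b1                           # pre-activation of the hidden layer
    Sigma1 = np.diag((pre1 > 0).astype(float))   # activation pattern at x
    V = W2 @ Sigma1 @ W1                         # affine restriction: slope
    a = W2 @ Sigma1 @ b1 + b2                    # affine restriction: offset
    return V, a

x = rng.standard_normal(4)
V, a = affine_restriction(x)
f_x = W2 @ np.maximum(W1 @ x + b1, 0.0) + b2     # ordinary forward pass
assert np.allclose(V @ x + a, f_x)               # affine restriction agrees at x
```

The activation pattern matrix zeroes exactly the rows that the ReLU would zero, which is why the affine map and the network coincide on the whole linear region, not just at $x$.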
3 Robustness guarantees for ReLU networks
In the following we first define the problem of the minimal adversarial perturbation and then derive robustness guarantees for ReLU networks. We call a decision of a classifier robust at $x$ if small changes of the input do not alter the decision. Formally, this can be described as optimization problem (1) [24]. If the classifier outputs class $c$ for input $x$, assuming a unique decision, the robustness of $f$ at $x$ is given by

$\min_{\delta \in \mathbb{R}^d} \|\delta\|_p \quad \text{s.t.} \quad \max_{l \ne c} f_l(x + \delta) \ge f_c(x + \delta), \quad x + \delta \in C, \qquad (1)$

where $C$ is a constraint set which the generated point $x + \delta$ has to satisfy, e.g., an image has to be in $[0,1]^d$.
The complexity of the optimization problem (1) depends on the classifier $f$, but it is typically non-convex; see [12] for a hardness result for neural networks. For standard neural networks the optimal value of (1) is very small for almost any input of the data generating distribution, which questions the use of such classifiers in safety-critical systems. The solutions of (1), $x + \delta^*$, are called adversarial samples.
For a linear classifier, $f_l(x) = \langle w_l, x\rangle + b_l$ with decision $c = \arg\max_l f_l(x)$, one can compute the solution of (1) in closed form [11]:

$\|\delta^*\|_p = \min_{l \ne c} \frac{f_c(x) - f_l(x)}{\|w_l - w_c\|_q},$

where $\|\cdot\|_q$ is the dual norm of $\|\cdot\|_p$, that is $\frac{1}{p} + \frac{1}{q} = 1$. In [10] it has been shown that box constraints can be integrated for linear classifiers, which results in simple convex optimization problems. In the following we use the intuition from linear classifiers and the particular structure derived in Section 2 to come up with robustness guarantees, that is, lower bounds on the optimal solution of (1), for ReLU networks. Moreover, we show that it is possible to compute the minimal perturbation for some of the inputs even though the general problem is NP-hard [12].
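As an illustration, a minimal numpy sketch of this closed form for the $\ell_2$ case (whose dual norm is again $\ell_2$); the weights are random placeholders. It also verifies that the corresponding perturbation indeed lands exactly on the closest decision hyperplane:

```python
import numpy as np

# Hypothetical linear multi-class classifier f_l(x) = <w_l, x> + b_l.
rng = np.random.default_rng(1)
W, b = rng.standard_normal((3, 5)), rng.standard_normal(3)
x = rng.standard_normal(5)
f = W @ x + b
c = int(np.argmax(f))

# Closed-form minimal perturbation norm for p = 2 (dual norm q = 2).
others = [l for l in range(3) if l != c]
dists = [(f[c] - f[l]) / np.linalg.norm(W[l] - W[c]) for l in others]
l_star = others[int(np.argmin(dists))]
delta_norm = min(dists)

# The minimizer moves x orthogonally onto {z : f_c(z) = f_{l*}(z)}.
w_diff = W[l_star] - W[c]
delta = (f[c] - f[l_star]) / (w_diff @ w_diff) * w_diff
assert np.isclose(np.linalg.norm(delta), delta_norm)
f_new = W @ (x + delta) + b
assert np.isclose(f_new[c], f_new[l_star])   # x + delta lies on the decision boundary
```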
Let us start by analyzing how we can solve problem (1) efficiently inside each linear region $Q(x)$. We first need the definition of two important quantities:

Lemma 3.1.
The distance $d_B(x)$ of $x$ to the boundary of the polytope $Q(x)$ is given by

$d_B(x) = \min_{l=1,\dots,L,\; i=1,\dots,n_l} \frac{\big|\langle V^{(l)}_i, x\rangle + a^{(l)}_i\big|}{\|V^{(l)}_i\|_q},$

where $V^{(l)}_i$ is the $i$-th row of $V^{(l)}$ and $\|\cdot\|_q$ is the dual norm of $\|\cdot\|_p$ ($\frac{1}{p} + \frac{1}{q} = 1$).
Proof.
Due to the polytope structure of $Q(x)$ it holds that $d_B(x)$ is the minimum distance to the hyperplanes $H_{l,i} = \{z \in \mathbb{R}^d \mid \langle V^{(l)}_i, z\rangle + a^{(l)}_i = 0\}$, which constitute the boundary of $Q(x)$. It is straightforward to check that the minimum $\ell_p$-distance of a hyperplane $H = \{z \mid \langle w, z\rangle + b = 0\}$ to $x$ is given by $\frac{|\langle w, x\rangle + b|}{\|w\|_q}$, where $\|\cdot\|_q$ is the dual norm. This can be obtained as follows:

$\min_{\delta \in \mathbb{R}^d} \|\delta\|_p \quad \text{s.t.} \quad \langle w, x + \delta\rangle + b = 0. \qquad (2)$

Introducing $c = \langle w, x\rangle + b$ we get the equivalent constraint

$\langle w, \delta\rangle = -c. \qquad (3)$

Note that by Hölder's inequality one has $|\langle w, \delta\rangle| \le \|w\|_q \|\delta\|_p$, which yields $\|\delta\|_p \ge \frac{|\langle w, \delta\rangle|}{\|w\|_q}$. Noting that $|\langle w, \delta\rangle| = |c|$, we get $\|\delta\|_p \ge \frac{|c|}{\|w\|_q}$, and equality is achieved when one has equality in Hölder's inequality. ∎
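The hyperplane-distance formula at the heart of this proof can be sketched in a few lines (function name and test values are ours). For $p = 1$ the dual norm is $\ell_\infty$, for $p = \infty$ it is $\ell_1$, and $p = 2$ is self-dual:

```python
import numpy as np

def dist_to_hyperplane(x, w, b, p):
    """l_p distance of x to the hyperplane {z : <w, z> + b = 0},
    computed via the dual norm q with 1/p + 1/q = 1."""
    q = np.inf if p == 1 else (1.0 if p == np.inf else p / (p - 1))
    return abs(w @ x + b) / np.linalg.norm(w, ord=q)

# Sanity check against the familiar Euclidean case: |<w,x>+b| / ||w||_2.
x, w, b = np.array([1.0, 2.0]), np.array([3.0, 4.0]), -1.0
assert np.isclose(dist_to_hyperplane(x, w, b, 2), abs(3*1 + 4*2 - 1) / 5.0)
```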
For the decision boundaries in $Q(x)$, with $c = \arg\max_l f_l(x)$, we define the quantity $d_D(x)$ as

$d_D(x) = \min_{l \ne c} \frac{\langle V^{(L+1)}_c - V^{(L+1)}_l, x\rangle + a^{(L+1)}_c - a^{(L+1)}_l}{\|V^{(L+1)}_c - V^{(L+1)}_l\|_q}.$

Lemma 3.2.
If $d_D(x) \le d_B(x)$, then $d_D(x)$ is the minimal distance of $x$ to the decision boundary of the ReLU network $f$.
Proof.
Proof.
First we note that $d_D(x)$ is the distance of $x$ to the decision boundary of the linear multi-class classifier $z \mapsto V^{(L+1)} z + a^{(L+1)}$, which is equal to $f$ on $Q(x)$. If $d_D(x) \le d_B(x)$, then the point realizing the minimal distance to this decision boundary lies inside $Q(x)$, and as $B(x, d_D(x)) \subseteq Q(x)$ there cannot exist another point on the decision boundary of $f$ outside of $Q(x)$ having a smaller distance to $x$. Thus $d_D(x)$ is the minimal distance of $x$ to the decision boundary of $f$. ∎
The next theorem combines both results to give lower bounds on, or in some cases the exact solution of, the optimization problem of the minimal adversarial perturbation in (1).

Theorem 3.1.
Let $d_B(x)$ and $d_D(x)$ be defined as above. Then $\min\{d_B(x), d_D(x)\}$ is a lower bound on the norm of the minimal perturbation in (1). Moreover, if $d_D(x) \le d_B(x)$, then $d_D(x)$ is the norm of the minimal adversarial perturbation.
Proof.
Proof.
As $f$ is affine on $Q(x)$ and $B(x, d_B(x)) \subseteq Q(x)$, the decision of the ReLU classifier does not change on $B(x, \min\{d_B(x), d_D(x)\})$, and thus $\min\{d_B(x), d_D(x)\}$ is a lower bound on the minimal norm of a perturbation necessary to change the class. The second statement follows directly from Lemma 3.2. ∎
In Figure 1 we illustrate the different cases for the relation of $d_B(x)$ and $d_D(x)$. On the left hand side $d_B(x) < d_D(x)$, and thus we get that on the ball $B(x, d_B(x))$ the decision does not change, whereas in the rightmost plot we have $d_D(x) \le d_B(x)$ and thus we obtain the minimal distance to the decision boundary. Using Theorem 3.1 we can provide robustness guarantees for every point, and for some points even compute the optimal robustness guarantee. Later we describe how one can improve the lower bounds by checking neighboring regions of $Q(x)$. Compared to the bounds of [27, 26] ours are slightly worse, see Table 5. However, our bounds can be directly maximized and have a clear geometrical meaning, and thus they directly motivate a regularization scheme for ReLU networks which we propose in the next section.
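To make the certificate concrete, here is a sketch (placeholder weights, $\ell_2$ case, our naming) for a one-hidden-layer network, where the region hyperplanes are simply the rows of the first-layer weight matrix. It computes the distance to the region boundary ($d_B$), the distance to the decision boundaries inside the region ($d_D$), and the resulting lower bound:

```python
import numpy as np

# Hypothetical one-hidden-layer ReLU network: 3 inputs, 8 hidden units, 4 classes.
rng = np.random.default_rng(2)
W1, b1 = rng.standard_normal((8, 3)), rng.standard_normal(8)
W2, b2 = rng.standard_normal((4, 8)), rng.standard_normal(4)
x = rng.standard_normal(3)

pre = W1 @ x + b1
Sigma = np.diag((pre > 0).astype(float))
V, a = W2 @ Sigma @ W1, W2 @ Sigma @ b1 + b2   # affine restriction on the region of x
c = int(np.argmax(V @ x + a))

# d_B: l2 distance to the region boundary (the hidden-unit hyperplanes, rows of W1).
d_B = float(np.min(np.abs(pre) / np.linalg.norm(W1, axis=1)))
# d_D: l2 distance to the decision hyperplanes of the affine restriction.
d_D = min(((V[c] - V[l]) @ x + a[c] - a[l]) / np.linalg.norm(V[c] - V[l])
          for l in range(4) if l != c)

lower_bound = min(d_B, d_D)            # always a valid l2 robustness guarantee
exact = d_D if d_D <= d_B else None    # exact minimal perturbation when d_D <= d_B
```

By the theorem, no perturbation of $\ell_2$-norm strictly below `lower_bound` can change the predicted class of this network at `x`.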
3.1 Improving lower bounds by checking neighboring regions
In order to improve the lower bounds we can use not only the linear region where lies but also some neighboring regions. The following description is just a sketch as one has to handle several case distinctions.
Let $x$ be the original point and $\{H_j\}_{j=1}^N$ the set of hyperplanes defining the polytope $Q(x)$, sorted so that $d(x, H_i) \le d(x, H_j)$ if $i < j$, where $d$ is the distance including box constraints. If we do not directly get the guaranteed optimal solution, we get an upper bound $U$ (namely the distance to the decision boundary in $Q(x)$) and a lower bound $L$ for the norm of the adversarial perturbation. If $L < U$, we can check the region that we find on the other side of the closest hyperplane $H_1$. In order to get the corresponding description of the polytope on the other side, we just have to flip the corresponding entry in the activation matrix of the layer $H_1$ belongs to and recompute the hyperplanes of the new linear region. Solving (1) on the second region we get a new upper bound if the distance of $x$ to the decision boundary in the second region is smaller than $U$. Moreover we update $L$ with the hyperplanes given by the second region. Finally, if $L \ge U$ then $U$ is the optimal solution; otherwise $L < U$ and we can repeat this process with the next closest hyperplane. At the moment we stop after checking a fixed maximal number of neighboring linear regions.

4 Large margin and region boundary regularization
Using the results of Section 3, a classifier with guaranteed robustness requires a large distance to the boundaries of the linear region as well as to the decision boundary; in some cases even the optimal guarantee (solution of (1)) can be obtained. Unfortunately, as illustrated in panels (a) and (c) of Figure 2 for simple one hidden layer networks, the linear regions are small for networks trained without regularization, and thus no meaningful guarantees can be obtained. Thus we propose a new regularization scheme which simultaneously aims, for each training point, at a large distance to the boundary of the linear region it lies in, as well as to the decision boundary. By Theorem 3.1 this directly leads to good robustness guarantees of the resulting classifier.
However, note that just maximizing the distance to the decision boundary might be misleading during training, as this does not discriminate between points which are correctly classified (correct side of the decision hyperplane) and wrongly classified (wrong side of the decision hyperplane). Thus we introduce the signed version $\tilde d_D(x)$ of $d_D(x)$, where $y$ is the true label of the point $x$,

$\tilde d_D(x) = \min_{l \ne y} \frac{\langle V^{(L+1)}_y - V^{(L+1)}_l, x\rangle + a^{(L+1)}_y - a^{(L+1)}_l}{\|V^{(L+1)}_y - V^{(L+1)}_l\|_q}. \qquad (4)$

Please note that if $\tilde d_D(x) > 0$, then $x$ is correctly classified, whereas if $\tilde d_D(x) < 0$, $x$ is wrongly classified. If $0 < \tilde d_D(x) \le d_B(x)$, then it follows from Lemma 3.2 that $\tilde d_D(x)$ is the distance to the decision hyperplane. If $\tilde d_D(x) > d_B(x)$ this need no longer be true, but $\tilde d_D(x)$ is at least a good proxy, as the denominator is an estimate of the local cross-Lipschitz constant
[10]. Finally, we propose to use the following regularization scheme:

Definition 4.1.
Let $(x, y)$ be a point of the training set and define the Maximum Margin Regularizer (MMR) for $x$, for some $\gamma_B, \gamma_D > 0$, as

$\mathrm{MMR}(x) = \max\Big(0,\, 1 - \frac{d_B(x)}{\gamma_B}\Big) + \max\Big(0,\, 1 - \frac{\tilde d_D(x)}{\gamma_D}\Big). \qquad (5)$
The MMR penalizes distances to the boundary of the polytope if $d_B(x) < \gamma_B$ and positive distances ($x$ is correctly classified) if $\tilde d_D(x) < \gamma_D$. Notice that wrongly classified points are always penalized. The part of the regularizer corresponding to $\tilde d_D(x)$ has been suggested in a similar form in [7] as a loss function for general neural networks, without motivation from robustness guarantees. They have an additional loss penalizing differences in the outputs with respect to changes at the hidden layers, which is completely different from our geometrically motivated regularizer penalizing the distance to the boundary of the linear region. The choice of $\gamma_B$ and $\gamma_D$ allows different tradeoffs between the two terms. In particular, a larger $\gamma_D$ (stronger maximization of $\tilde d_D(x)$) leads in practice to more points for which the optimal robustness guarantee (case $d_D(x) \le d_B(x)$) can be proved.

For practical reasons, we also propose a variation of our MMR regularizer in (5):
$\mathrm{MMR}_{k_B,k_D}(x) = \frac{1}{k_B}\sum_{j=1}^{k_B} \max\Big(0,\, 1 - \frac{d_B^j(x)}{\gamma_B}\Big) + \frac{1}{k_D}\sum_{j=1}^{k_D} \max\Big(0,\, 1 - \frac{\tilde d_D^j(x)}{\gamma_D}\Big), \qquad (6)$

where $d_B^j(x)$ is the distance of $x$ to the $j$-th closest hyperplane of the boundary of the polytope $Q(x)$ and $\tilde d_D^j(x)$ is the analogue for the decision boundaries. Basically, instead of only the closest decision hyperplane, we are optimizing the $k_D$ closest ones, and analogously the $k_B$ closest hyperplanes defining the linear region of $x$. This speeds up training, as more hyperplanes are moved in each update. Moreover, when deriving lower bounds using more than one linear region, one needs to consider more than just the closest boundary hyperplane. Finally, many state-of-the-art schemes for proving lower bounds [27, 26]
work well only if the activation status of most neurons is constant for small changes of the point. This basically amounts to ensuring that the hyperplanes bounding the linear region are sufficiently far away, which is exactly what our regularization scheme is aiming at. Thus our regularization scheme helps other schemes to establish better lower bounds (see Section 6). This is also the reason why the term pushing the polytope boundaries away is essential: just penalizing the distance to the decision boundary is not sufficient to prove good lower bounds, as we will show in Section 6. Compared to the regularization scheme of [27], which uses a dual feasible point of the robust loss, our approach has a direct geometric interpretation and allows us to derive the exact minimal perturbation for some fraction of the test points, varying from dataset to dataset but as high as 99.97% (see Table 4). In practice, we gradually decrease the numbers of hyperplanes considered in (6) after some epochs (note that when only the single closest hyperplane of each type is considered, formulations (5) and (6) are equivalent), so that in the end only the closest hyperplanes of each training point influence the regularizer.

Thus, denoting the cross-entropy loss by $\mathcal{L}$, the final objective of our models is
$\min_\theta \; \frac{1}{n}\sum_{i=1}^{n} \Big( \mathcal{L}\big(f(x_i), y_i\big) + \lambda\, \mathrm{MMR}(x_i) \Big), \qquad (7)$

where $\{(x_i, y_i)\}_{i=1}^n$ is the training data and $\lambda > 0$ the regularization parameter. Figure 2 shows the effect of the regularizer: compared to the unregularized case, the size of the linear regions is significantly increased.
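As a sketch of how the regularizer can be implemented given the sorted distances (the hinge form with thresholds `gamma_B`, `gamma_D` and the function name are our assumptions, not a verbatim transcription of (5)/(6)):

```python
import numpy as np

def mmr_penalty(d_B_sorted, d_D_signed_sorted, gamma_B, gamma_D, k_B=1, k_D=1):
    """Hinge-style penalty on the k_B closest region-boundary distances and the
    k_D closest signed decision-boundary distances (our naming).  Distances
    above the threshold contribute nothing; small or negative ones are pushed up."""
    t_B = np.mean(np.maximum(0.0, 1.0 - np.asarray(d_B_sorted[:k_B]) / gamma_B))
    t_D = np.mean(np.maximum(0.0, 1.0 - np.asarray(d_D_signed_sorted[:k_D]) / gamma_D))
    return float(t_B + t_D)
```

For instance, `mmr_penalty([0.1, 0.5], [0.2], gamma_B=1.0, gamma_D=1.0)` penalizes both the closest region hyperplane and the closest decision hyperplane, while a wrongly classified point (negative signed distance) always contributes a decision-boundary penalty larger than 1, matching the discussion above.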
5 Integration of box constraints into robustness guarantees
Note that in many applications the input space is not all of $\mathbb{R}^d$ but a certain subset due to constraints, e.g. images belong to $[0,1]^d$. These constraints typically increase the norm of the minimal perturbation in (1). Thus it is important to integrate the constraints into the generation of adversarial samples (upper bounds), as done e.g. in [5], but obviously we should also integrate them into the computation of the lower bounds, which in our case is based on the computation of distances to hyperplanes. The computation of the distance of $x$ to a hyperplane $\{z \in \mathbb{R}^d \mid \langle w, z\rangle + b = 0\}$ has a closed form solution:

$\min_{\delta \in \mathbb{R}^d} \|\delta\|_p \quad \text{s.t.} \quad \langle w, x + \delta\rangle + b = 0, \qquad \text{with optimal value} \quad \frac{|\langle w, x\rangle + b|}{\|w\|_q}. \qquad (8)$
The additional box constraints lead to the following optimization problem,

$\min_{\delta \in \mathbb{R}^d} \|\delta\|_p \quad \text{s.t.} \quad \langle w, x + \delta\rangle + b = 0, \quad 0 \le x + \delta \le 1, \qquad (9)$

which is convex but has no analytical solution. However, its dual is just a one-dimensional convex optimization problem which can be solved efficiently. In fact a reformulation of this problem has been considered in [10], where fast solvers are proposed. Moreover, when computing the distance to the boundary of the polytope or to the decision boundaries, one does not always need to solve the box-constrained distance problem (9). It suffices to first compute the distances (8), as they are smaller than or equal to the ones of (9), and sort them in ascending order. Then one computes the box-constrained distances in the given order and stops when the smallest computed box-constrained distance is smaller than the next unconstrained distance in the sorted list. In this way one typically needs to solve only a very small fraction of all box-constrained problems. The integration of the box constraints is important, as the lower bounds improve on average by 20%, and this can make the difference between having a certified optimal solution and just a lower bound.
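A sketch of the one-dimensional dual for the $p = 2$ case of (9) (function name is ours; we assume the hyperplane intersects the box): the minimizer has the form $z(\mu) = \Pi_{[0,1]}(x - \mu w)$, and $\mu$ is found by bisection on the monotonically non-increasing function $\langle w, z(\mu)\rangle + b$:

```python
import numpy as np

def box_constrained_dist(x, w, b, iters=100):
    """l2 distance of x to {z : <w,z> + b = 0, 0 <= z <= 1}; assumes the
    hyperplane intersects the box.  One-dimensional dual solved by bisection."""
    z = lambda mu: np.clip(x - mu * w, 0.0, 1.0)
    g = lambda mu: w @ z(mu) + b       # monotonically non-increasing in mu
    lo, hi = -1.0, 1.0
    while g(lo) < 0: lo *= 2           # expand until the root is bracketed
    while g(hi) > 0: hi *= 2
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if g(mid) > 0 else (lo, mid)
    return float(np.linalg.norm(z(0.5 * (lo + hi)) - x))
```

When the unconstrained projection already lies inside the box, the result coincides with (8); when it does not, the box-constrained distance is strictly larger, which is exactly the improvement of the lower bounds described above.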
6 Experiments
Table 1: $\ell_2$-norm robustness. Test error (TE), average lower bounds (LB) and upper bounds (UB) on the norm of the minimal adversarial perturbation.

                           FC1                   FC10                  CNN
dataset  training     TE(%)  LB    UB      TE(%)  LB    UB      TE(%)  LB    UB
MNIST    plain         1.59  0.34  0.98    1.81  0.13  0.70     0.97  0.04  1.03
         at            1.29  0.25  1.23    0.93  0.14  1.59     0.86  0.14  1.67
         KW            1.37  0.70  1.75    1.69  0.75  1.74     1.04  0.32  1.84
         MMR           1.51  0.69  1.69    1.87  0.48  1.48     1.17  0.38  1.70
         MMR+at        1.59  0.70  1.70    1.35  0.40  1.60     1.14  0.38  1.86
GTS      plain        12.24  0.33  0.57   11.25  0.08  0.48     6.73  0.06  0.43
         at           13.55  0.34  0.66   13.01  0.10  0.56     8.12  0.06  0.53
         KW           13.06  0.35  0.63   13.56  0.16  0.52     8.44  0.11  0.52
         MMR          11.15  0.69  0.69   12.82  0.64  0.67     7.40  0.09  0.59
         MMR+at       11.72  0.72  0.72   13.36  0.64  0.66    10.50  0.11  0.62
FMNIST   plain         9.61  0.18  0.53   10.53  0.05  0.44     8.86  0.03  0.32
         at            9.89  0.11  1.00    9.89  0.11  1.00     8.77  0.07  0.80
         KW            9.95  0.46  1.11   11.42  0.47  1.22    10.37  0.17  0.96
         MMR          10.22  0.50  0.85   11.73  0.68  1.18    10.30  0.17  0.88
         MMR+at       10.94  0.66  1.45   11.39  0.67  1.24    10.48  0.21  1.14
In the following we provide a variety of experiments aiming to show the state-of-the-art performance of MMR for achieving robust classifiers. We use four datasets: MNIST, German Traffic Signs (GTS) [23], Fashion MNIST [28] and CIFAR-10. In this paper we restrict ourselves to robustness wrt the $\ell_2$-distance. Unless stated otherwise, lower bounds are computed with the technique presented in [27] and upper bounds with the Carlini-Wagner (CW) attack [5] in the implementation of [18]. However, in the cases where we can compute the optimal robustness via our Theorem 3.1, we use it as both lower and upper bound. We compare five methods: plain training; adversarial training following [16], which has been shown to significantly increase robustness; the robust loss of [27], which we denote as KW; our regularization scheme MMR; and MMR together with adversarial training, again as in [16]. All schemes are evaluated on two fully connected architectures, both consisting in total of 1024 hidden units (same number of hyperplanes): FC1, which has one hidden layer, and FC10, with 10 hidden layers. Moreover, we use a convolutional network having 2 convolutional layers followed by 2 dense layers (for more details see Appendix A). All input images are scaled to be in $[0,1]^d$.
Improvement of robustness: The results on three datasets for all three networks can be found in Table 1. For CIFAR-10 we evaluate the different methods only on the convolutional network, as fully connected networks do not reach good test performance; the results are shown in Table 2. We report the test error and the average lower and upper bounds on the optimal adversarial perturbation, computed on 1000 points of the test set. For KW, MMR and MMR + adversarial training we report the solutions which achieve a test error similar to the plain model, as this is the most interesting application case: one gets more robust models with little or no loss of prediction performance. There are several interesting observations. First of all, while adversarial training often improves the upper bounds quite significantly compared to the plain setting, the lower bounds almost never improve and often even get worse. This is in contrast to the methods which optimize the robustness guarantees, KW and our MMR. For MMR we see in all cases significant improvements of the lower bounds over plain and adversarial training; for KW this is also true, but the improvements on GTS are much smaller. Notably, for the fully connected networks FC1 and FC10 on GTS and FMNIST, the lower bounds achieved by MMR and/or MMR+at are larger than the upper bounds of plain training on FMNIST, and better than plain and adversarial training as well as KW on GTS. Thus MMR is provably more robust than the competing methods in these configurations. Moreover, the lower bounds achieved by MMR are worse than those of KW only on MNIST for FC10. Also for the achieved upper bounds MMR is most of the time better than KW and always improves over adversarial training. For the CNNs the improvements of KW and MMR over plain and adversarial training in terms of lower and upper bounds are smaller than for the fully connected networks, and it is harder to maintain similar test performance. The differences between KW and MMR in the lower bounds are very small, so that for CNNs both robust methods perform on a similar level.
Table 2: $\ell_2$-norm robustness on CIFAR-10 (CNN).

training scheme   TE(%)   LB    UB
plain             25.98  0.02  0.16
at                25.36  0.04  0.42
KW                41.52  0.16  0.66
MMR               41.86  0.16  0.39
MMR+at            41.11  0.13  0.57
Table 3: Comparison of MMRfull and the partial regularizer MMR (decision boundary term only).

                          MMRfull                       MMR
dataset  model   test error   LB    UB       test error   LB    UB
MNIST    FC1        1.51%    0.69  1.69        0.93%     0.35  1.69
         FC10       1.87%    0.48  1.48        1.21%     0.20  1.62
GTS      FC1       11.15%    0.69  0.69       12.09%     0.48  0.63
         FC10      12.82%    0.64  0.67       12.41%     0.12  0.48
FMNIST   FC1       10.22%    0.50  0.85        9.83%     0.31  1.30
         FC10      11.73%    0.68  1.18       10.32%     0.13  1.15
Table 4: Fraction of test points for which the globally optimal solution of (1) is obtained.

                          MMR training                          plain training
dataset  model   test error  optimal points  opt vs LB   test error  optimal points
MNIST    FC1        1.51%        0.20%        14.11%        1.59%        0.02%
         FC10       1.87%        0.06%          –           1.81%        0.02%
         CNN        1.17%        0.00%          –           0.97%        0.00%
GTS      FC1       11.15%       99.97%         0.59%       12.27%        1.12%
         FC10      12.82%       94.86%         0.66%       11.28%        0.14%
         CNN        7.40%        0.00%          –           6.73%        0.00%
FMNIST   FC1       10.22%       11.22%         6.53%        9.61%        0.37%
         FC10      11.73%        9.90%         7.27%       10.53%        0.04%
         CNN       10.30%        0.09%          –           8.86%        0.00%
Table 5: Comparison of lower bounds on the norm of the minimal perturbation.

                                           KW [27]       Theorem 3.1   our improved
dataset  model  training    test error  lower bounds  lower bounds  lower bounds
MNIST    FC1    MMR            1.51%       0.69          0.22          0.29
         FC1    MMR+at         1.59%       0.70          0.25          0.33
         FC10   MMR            1.87%       0.48          0.31          0.33
         FC10   MMR+at         1.35%       0.40          0.21          0.26
GTS      FC1    MMR           11.15%       0.69          0.69          0.69
         FC1    MMR+at        11.72%       0.72          0.72          0.72
         FC10   MMR           12.82%       0.64          0.63          0.63
         FC10   MMR+at        13.36%       0.64          0.63          0.63
FMNIST   FC1    MMR           10.22%       0.50          0.30          0.41
         FC1    MMR+at        10.94%       0.66          0.33          0.42
         FC10   MMR           11.73%       0.68          0.56          0.64
         FC10   MMR+at        11.39%       0.67          0.53          0.60
Importance of linear region maximization: In order to highlight the importance of both parts of the MMR regularization, i) the penalization of the distance to the decision boundary and ii) the penalization of the distance to the boundary of the polytope, we train, for each dataset/architecture, models penalizing only the distance to the decision boundary, that is the second term on the r.h.s. of (5) and (6). We call this partial regularizer MMR, in contrast to the full version MMRfull. We then compare the lower and upper bounds on the solution of (1) for the MMR and MMRfull models. For a fair comparison we consider models with similar test error. We clearly see in Table 3 that the lower bounds are always significantly better when MMRfull is used, while the behavior of the upper bounds does not clearly favor one of the two. This result shows that in order to get good lower bounds one has to increase the distance of the points to the boundaries of the polytope.
Guaranteed optimal solutions via MMR: Theorem 3.1 provides a simple and efficient way to obtain in certain cases the solution of (1). Although for normally trained networks the conditions are rarely satisfied, we show in Table 4 that for the MMR models with fully connected networks we obtain the globally optimal solution of (1), that is the true robustness, for a significant fraction of the test set. Moreover, we report how much better our globally optimal solutions are compared to the lower bounds of [27]. Interestingly, for the fully connected networks we can provably get the true robustness for around 10% of the points on FMNIST and for over 94% of the points on GTS. For these cases the optimal solutions have a roughly 7% larger norm for FMNIST and 0.6% larger norm for GTS than the lower bounds. For convolutional networks we get almost no globally optimal solutions, which is a consequence of both dealing with a generally less robust model and the specific structure of the classifier induced by weight sharing. While globally optimal solutions can be obtained via mixed-integer optimization for small ReLU networks [25], this approach does not scale to larger networks. However, globally optimal solutions for larger networks achieved via our method can serve as a test for both lower and upper bounds. This is an important issue, as currently large parts of the community rely just on upper bounds of robustness obtained with attack schemes like the CW-attack. However, the next paragraph shows that also the CW-attack can quite significantly overestimate robustness for a certain fraction of the test set.
Evaluation of the CW-attack: The CW-attack [5], which we use for our upper bounds, is considered state of the art. Thus it is interesting to see how close it is to the globally optimal solution. On GTS FC1 we find for the MMR model with best test error the globally optimal solution of (1) for 12596 out of 12630 test points. We compare on this subset the optimal norm to the one obtained by the CW-attack and plot the ratios in descending order in Figure 3 (note that we truncate at 4000 points). While the CW-attack performs well in general, there are 2330 points (about 18% of the test set) where the CW-attack yields at least 10% larger norms and 1145 points (around 9%) with at least 20% larger norms. The maximal relative difference is 235%. Thus, at least on a pointwise basis, evaluating robustness with respect to an attack scheme can significantly overestimate robustness. This shows the importance of techniques to prove lower bounds. Moreover, the time to compute the adversarial examples with the CW-attack is 16327s, while our technique provides both lower bounds and optimal solutions in 1701s.
Comparison of lower bounds: In Table 5 we compare, for the fully connected models, the lower bounds computed by [27] and by our technique using Theorem 3.1 with integration of box constraints, once checking only the initial linear region where the point lies and once also checking neighboring linear regions. We see that [27] obtain better lower bounds, which is why we use their method for the evaluation of the lower bounds. Nevertheless, the gap is not too large, and while our lower bounds are worse, the robustness achieved using our MMR regularization is mostly better, as discussed for Table 1.
7 Conclusion
We have introduced a geometrically motivated regularization scheme which leads to provably better robustness than plain training and achieves better lower bounds than adversarial training. Moreover, it outperforms or matches the state of the art [27] in terms of lower and upper bounds wrt the $\ell_2$-norm. Finally, our scheme allows us to obtain the provably optimal solution in a significant fraction of cases for large fully connected networks, which can be used to test lower and upper bounds.
Acknowledgements
We would like to thank Eric Wong and Zico Kolter for providing their code and helping us to adapt it to the $\ell_2$-norm.
References

[1] R. Arora, A. Basu, P. Mianjy, and A. Mukherjee. Understanding deep neural networks with rectified linear units. In ICLR, 2018.
 [2] O. Bastani, Y. Ioannou, L. Lampropoulos, D. Vytiniotis, A. Nori, and A. Criminisi. Measuring neural net robustness with constraints. In NIPS, 2016.
 [3] N. Carlini, G. Katz, C. Barrett, and D. L. Dill. Provably minimallydistorted adversarial examples. preprint, arXiv:1709.10207v2, 2017.

[4] N. Carlini and D. Wagner. Adversarial examples are not easily detected: Bypassing ten detection methods. In ACM Workshop on Artificial Intelligence and Security, 2017.
 [5] N. Carlini and D. Wagner. Towards evaluating the robustness of neural networks. In IEEE Symposium on Security and Privacy, 2017.
 [6] M. Cisse, P. Bojanowksi, E. Grave, Y. Dauphin, and N. Usunier. Parseval networks: Improving robustness to adversarial examples. In ICML, 2017.
 [7] G. F. Elsayed, D. Krishnan, H. Mobahi, K. Regan, and S. Bengio. Large margin deep networks for classification. preprint, arXiv:1803.05598v1, 2018.
 [8] I. J. Goodfellow, J. Shlens, and C. Szegedy. Explaining and harnessing adversarial examples. In ICLR, 2015.
 [9] S. Gu and L. Rigazio. Towards deep neural network architectures robust to adversarial examples. In ICLR Workshop, 2015.
 [10] M. Hein and M. Andriushchenko. Formal guarantees on the robustness of a classifier against adversarial manipulation. In NIPS, 2017.
 [11] R. Huang, B. Xu, D. Schuurmans, and C. Szepesvari. Learning with a strong adversary. In ICLR, 2016.
 [12] G. Katz, C. Barrett, D. Dill, K. Julian, and M. Kochenderfer. Reluplex: An efficient smt solver for verifying deep neural networks. In CAV, 2017.
 [13] D. P. Kingma and J. Ba. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980, 2014.
 [14] A. Kurakin, I. J. Goodfellow, and S. Bengio. Adversarial examples in the physical world. In ICLR Workshop, 2017.
 [15] Y. Liu, X. Chen, C. Liu, and D. Song. Delving into transferable adversarial examples and blackbox attacks. In ICLR, 2017.

[16] A. Madry, A. Makelov, L. Schmidt, D. Tsipras, and A. Vladu. Towards deep learning models resistant to adversarial attacks. In ICLR, 2018.
 [17] M. Mirman, T. Gehr, and M. Vechev. Differentiable abstract interpretation for provably robust neural networks. In ICML, 2018.
 [18] N. Papernot, N. Carlini, I. Goodfellow, R. Feinman, F. Faghri, A. Matyasko, K. Hambardzumyan, Y.L. Juang, A. Kurakin, R. Sheatsley, A. Garg, and Y.C. Lin. cleverhans v2.0.0: an adversarial machine learning library. preprint, arXiv:1610.00768, 2017.
 [19] N. Papernot, P. McDaniel, X. Wu, S. Jha, and A. Swami. Distillation as a defense to adversarial perturbations against deep neural networks. In IEEE Symposium on Security & Privacy, 2016.
 [20] A. Raghunathan, J. Steinhardt, and P. Liang. Certified defenses against adversarial examples. In ICLR, 2018.
 [21] S.-M. Moosavi-Dezfooli, A. Fawzi, and P. Frossard. DeepFool: a simple and accurate method to fool deep neural networks. In CVPR, pages 2574–2582, 2016.
 [22] J. Sokolic, R. Giryes, G. Sapiro, and M. R. D. Rodrigues. Robust large margin deep neural networks. IEEE Transactions on Signal Processing, 65:4265 – 4280, 2017.
 [23] J. Stallkamp, M. Schlipsing, J. Salmen, and C. Igel. Man vs. computer: Benchmarking machine learning algorithms for traffic sign recognition. Neural Networks, 32:323–332, 2012.
 [24] C. Szegedy, W. Zaremba, I. Sutskever, J. Bruna, D. Erhan, I. Goodfellow, and R. Fergus. Intriguing properties of neural networks. In ICLR, pages 2503–2511, 2014.
 [25] V. Tjeng and R. Tedrake. Verifying neural networks with mixed integer programming. preprint, arXiv:1711.07356v1, 2017.
 [26] T. Weng, H. Zhang, H. Chen, Z. Song, C. Hsieh, L. Daniel, D. S. Boning, and I. S. Dhillon. Towards fast computation of certified robustness for relu networks. In ICML, 2018.
 [27] E. Wong and J. Z. Kolter. Provable defenses against adversarial examples via the convex outer adversarial polytope. In ICML, 2018.
 [28] H. Xiao, K. Rasul, and R. Vollgraf. FashionMNIST: a novel image dataset for benchmarking machine learning algorithms. preprint, arXiv:1708.07747, 2017.
 [29] S. Zheng, Y. Song, T. Leung, and I. J. Goodfellow. Improving the robustness of deep neural networks via stability training. In CVPR, 2016.
Appendix A Experimental details
By FC1 we denote a one hidden layer fully connected network with 1024 hidden units. By FC10 we denote a 10 hidden layers network that has 1 layer with 124 units, seven layers with 104 units and two layers with 86 units (so that the total number of units is again 1024). The convolutional architecture that we use is identical to [27], which consists of two convolutional layers with [16, 32] filters of size 4x4 and 2 fully connected layers with 100 hidden units.
For all the experiments we use batch size 100, and we train the FC models for 300 epochs and the CNNs for 100 epochs. Moreover, we use the Adam optimizer [13] with the default learning rate 0.001. We reduce the learning rate by a factor of 10 for the last 10% of epochs. For training on the CIFAR-10 dataset we apply random crops and random mirroring of the images as data augmentation.
For the FC models we use the MMR regularizer in the formulation (6) for the first 50% of the epochs and in the formulation (5) for the rest of the epochs. For CNNs we use the formulation (6) throughout, and we gradually reduce the number of closest boundary hyperplanes considered from 400 at the beginning to 100 towards the end of training. In order to find the optimal set of hyperparameters, we performed a grid search over the regularization parameter in (7) from {0.1, 0.25, 0.5, 1.0, 2.0}, and over the margin parameter from {0.25, 0.5, 0.75} for CIFAR-10 and GTS, from {0.25, 0.5, 1.0} for FMNIST, and from {1.0, 1.5, 2.0} for MNIST.

In order to make a comparison to the robust training of [27] we adapted it to the $\ell_2$-norm, and performed a grid search over the radius of the $\ell_2$-ball used in their robust loss from {0.05, 0.1, 0.2, 0.3, 0.4, 0.6}, aiming at a model with non-trivial lower bounds and little or no loss in test error.

We perform adversarial training using the PGD attack of [16]. However, since we focus on the $\ell_2$-norm, we adapted the implementation from [18] to perform a plain gradient update instead of the gradient sign (which corresponds to the $\ell_\infty$-norm and is thus irrelevant for the $\ell_2$ case) in every iteration. We use the following $\ell_2$-norms of the perturbation: 2.0 for MNIST, 1.0 for FMNIST, and 0.5 for GTS and CIFAR-10, with step sizes of 0.5, 0.25 and 0.125 respectively. We perform 40 iterations of the PGD attack for every batch. During training, every batch contains 50% adversarial and 50% clean examples.
We use the untargeted formulation of the Carlini-Wagner attack in order to evaluate the upper bounds on the $\ell_2$-norm required to change the class. We use the settings provided in the original paper [5] and in their code, including 20 restarts, 10000 iterations, learning rate 0.01 and an initial constant of 0.001.