fab-attack
Code for FAB-attack
The evaluation of robustness against adversarial manipulation of neural network-based classifiers is mainly performed with empirical attacks, as the methods for exact computation, even when available, do not scale to large networks. We propose in this paper a new white-box adversarial attack wrt the l_p-norms for p ∈ {1,2,∞}, aiming at finding the minimal perturbation necessary to change the class of a given input. It has an intuitive geometric meaning, yields high-quality results already with one restart, minimizes the size of the perturbation, so that the robust accuracy can be evaluated at all possible thresholds with a single run, and comes with almost no free parameters apart from the number of iterations and restarts. It achieves robust test accuracy better than or similar to state-of-the-art attacks, which are partially specialized to a single l_p-norm.
The discovery of the vulnerability of neural network-based classifiers to adversarial examples, that is small perturbations of the input able to modify the decision of the models, started a fast development of a variety of attack algorithms. The high effectiveness of adversarial attacks reveals the fragility of these networks, which questions their safe and reliable use in the real world, especially in safety-critical applications. Many defenses have been proposed to fix this issue GuRig2015 ; ZheEtAl2016 ; PapEtAl2016a ; HuaEtAl2016 ; BasEtAl2016 ; MadEtAl2018 , but with limited success, as new and more powerful attacks showed CarWag2017 ; AthEtAl2018 ; MosEtAl18 . In order to trust the decision of a given model, it is necessary to evaluate its exact adversarial robustness. Although this is possible for ReLU networks KatzEtAl2017 ; TjeTed2017 , these techniques do not scale to commonly used large networks. Thus, the only way to derive bounds on the true robustness is to approximately solve the minimal adversarial perturbation problem through adversarial attacks, at a much lower computational cost. Moreover, it is clear that due to the non-convexity of the minimal adversarial perturbation problem there exists no universally best attack (apart from the exact methods), since performance depends on runtime constraints, network architecture, dataset etc. However, our goal is to have an attack which performs well under a broad spectrum of conditions with a minimal amount of hyperparameter tuning.
In this paper we propose a new white-box attacking scheme which performs comparably to or better than established attacks and has the following features: first, it tries to produce adversarial samples with minimal distortion compared to the original point, measured wrt the l_p-norms for p ∈ {1,2,∞}. Compared to the quite popular PGD-attack KurGooBen2016a ; MadEtAl2018 this has the clear advantage that the method does not need to be restarted for every new threshold ε if one wants to evaluate the success rate of the attack with adversarial perturbations of l_p-norm at most ε, and thus it is particularly suitable to obtain a more complete picture of the robustness of a neural network across several thresholds ε. Second, it quickly achieves good quality in terms of average distortion resp. robust accuracy compared to the PGD-attack and other fast attacks. At the same time we show that increasing the number of restarts keeps improving the results and makes it competitive with the strongest available attacks. Third, although it comes with a few parameters, these generalize across the datasets, architectures and norms considered, so that we have an almost off-the-shelf method for which it is sufficient to specify the number of iterations and restarts. Most importantly, compared to PGD and other methods there is no stepsize parameter which potentially has to be carefully adapted to every new network.
We first introduce minimal adversarial examples before we recall the definition and properties of the projection wrt the l_p-norms of a point onto the intersection of a hyperplane and box constraints, as it is an essential part of our attack. Then we present our FAB-attack algorithm to generate minimal adversarial examples.
Let f : R^d → R^K be a classifier which assigns every input x ∈ R^d (with d the dimension of the input space) to one of K classes according to argmax_{r=1,...,K} f_r(x). In many scenarios the input of f has to satisfy a specific set of constraints C, e.g. images are represented as elements of [0,1]^d. Then, given a point x with true class c, we define the minimal adversarial example for x wrt the l_p-norm as the solution of

(1)   min_{δ ∈ R^d} ||δ||_p   s.th.   argmax_r f_r(x + δ) ≠ c,   x + δ ∈ C.
The optimization problem (1) is non-convex and NP-hard for non-trivial classifiers KatzEtAl2017 and, although for some classes of networks it can be formulated as a mixed-integer program TjeTed2017 , the computational cost of solving it is prohibitive for large, normally trained networks. Thus, the minimal adversarial perturbation is usually approximated by an attack algorithm, which can be seen as a heuristic to solve (1). We will see in the experiments that current attacks can drastically overestimate the norm of the minimal perturbation, so that seemingly robust networks are actually not robust at all.

Let w ∈ R^d and b ∈ R be the normal vector and the offset defining the hyperplane π : ⟨w, z⟩ + b = 0.
Given x ∈ R^d, we denote by the box-constrained projection wrt the l_p-norm of x onto π (the projection onto the intersection of the box and the hyperplane π) the solution of the following minimization problem:

(2)   min_{z ∈ R^d} ||z − x||_p   s.th.   ⟨w, z⟩ + b = 0,   l_i ≤ z_i ≤ u_i,  i = 1,...,d,

where l_i, u_i are lower and upper bounds on each component of z. For p ≥ 1 the optimization problem (2) is convex. Moreover, HeiAnd2017 proved that for p ∈ {1,2,∞} the solution can be obtained in O(d log d) time, that is the complexity of sorting a vector of d elements, as can the determination that the problem has no solution. This can be seen by going over to the dual of (2), which reduces to a one-dimensional convex optimization problem.
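To illustrate the reduction to a one-dimensional dual problem, the sketch below solves the p = 2 case by bisection on the scalar dual variable of the hyperplane constraint (function name, tolerance and the bisection scheme are our own illustration, not the repo's O(d log d) implementation; it assumes the intersection of hyperplane and box is nonempty):

```python
import numpy as np

def proj_l2_box_hyperplane(x, w, b, lo, hi, iters=60):
    """l_2 projection of x onto {z : <w,z>+b = 0, lo <= z <= hi}.

    The KKT conditions give z(mu) = clip(x - mu*w, lo, hi) for a scalar
    dual variable mu, and g(mu) = <w, z(mu)> + b is nonincreasing in mu,
    so its root can be found by bisection. Assumes (2) is feasible.
    """
    def g(mu):
        return w @ np.clip(x - mu * w, lo, hi) + b

    a, c = -1.0, 1.0
    while g(a) < 0:      # expand the bracket until g(a) >= 0 >= g(c)
        a *= 2.0
    while g(c) > 0:
        c *= 2.0
    for _ in range(iters):
        m = 0.5 * (a + c)
        if g(m) > 0:
            a = m
        else:
            c = m
    return np.clip(x - 0.5 * (a + c) * w, lo, hi)
```

Monotonicity of g holds because each unclipped coordinate contributes −w_i^2 to its derivative, and clipped coordinates contribute nothing.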
Since this projection is part of our iterative scheme to generate adversarial examples, we need to handle specifically the case of (2) being infeasible. In this case, defining ρ = sign(⟨w, x⟩ + b), we instead compute

(3)   min_{z ∈ R^d} ρ (⟨w, z⟩ + b)   s.th.   l_i ≤ z_i ≤ u_i,  i = 1,...,d,

whose solution is given componentwise, for every i = 1,...,d, by z_i = l_i if ρ w_i > 0, z_i = u_i if ρ w_i < 0 and z_i = x_i if w_i = 0. Assuming that the point x satisfies the box constraints (as it will in our algorithm), this is equivalent to identifying the corner of the d-dimensional box defined by the componentwise constraints on z closest to the hyperplane π. Notice that if (2) is infeasible then the objective function of (3) stays positive and the points x and z are strictly contained in the same of the two halfspaces divided by π. Finally, we define the operator

(4)   proj_p(x, π, C) := solution of (2) if (2) is feasible, solution of (3) otherwise,

which basically yields the point of the box which gets as close as possible to the hyperplane π without violating the box constraints.
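The corner solution of (3) is immediate in closed form; a minimal sketch (function name ours):

```python
import numpy as np

def closest_box_corner(x, w, b, lo, hi):
    """Componentwise solution of (3): with rho = sign(<w,x>+b), pick the
    box corner minimizing rho*(<w,z>+b), keeping x_i where w_i = 0."""
    rho = np.sign(w @ x + b)
    return np.where(rho * w > 0, lo, np.where(rho * w < 0, hi, x))
```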
We now introduce our algorithm (code at https://github.com/fra31/fab-attack) to produce minimally distorted adversarial examples, wrt any l_p-norm for p ∈ {1,2,∞}, for a given point x initially correctly classified by f as class c. The high-level idea is that we use the linearization of the classifier at the current iterate x^(i), compute the box-constrained projections of x^(i) and of x onto the approximated decision hyperplane, and take a convex combination of these projections depending on the distances of x^(i) and of x to the decision hyperplane, followed by some extrapolation step. We explain below the geometric motivation behind these steps. The attack closest in spirit is DeepFool MooFawFro2016 , which is known to be very fast but suffers from low quality. The main reason is that DeepFool just tries to find the decision boundary quickly and has no incentive to provide a solution close to x. Our scheme resolves this main problem and, together with the exact projection we use, leads to a principled way to track the decision boundary (that is the surface where the decision of f changes) close to x.
If f was a linear classifier, then the closest point to x on the decision hyperplane could be found in closed form. Although neural networks are highly non-linear, so-called ReLU networks (neural networks which use ReLU as activation function, and possibly max- and average pooling) are piecewise affine functions, and thus locally a linearization of the network is an exact description of the classifier. Let l ≠ c; then the decision boundary between classes l and c can be locally approximated, using a first order Taylor expansion at x, by the hyperplane

(5)   π_l : f_l(x) − f_c(x) + ⟨∇f_l(x) − ∇f_c(x), z − x⟩ = 0.

Moreover the l_p-distance d_p(π_l, x) of x to π_l is given by

(6)   d_p(π_l, x) = |f_l(x) − f_c(x)| / ||∇f_l(x) − ∇f_c(x)||_q,   with 1/p + 1/q = 1.

Note that if d_p(π_l, x) = 0 then x belongs to the true decision boundary. Moreover, if the local linear approximation of the network is correct, then the class s whose decision hyperplane is closest to the point x can be computed as

(7)   s = argmin_{l ≠ c} |f_l(x) − f_c(x)| / ||∇f_l(x) − ∇f_c(x)||_q.
Thus, given that the approximation holds in a large enough neighborhood, the projection of x onto π_s lies on the decision boundary (unless Problem (2) is infeasible).
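To make (6) and (7) concrete, the toy example below (our own, not the repo's batched implementation) evaluates them for a linear classifier f(z) = W z + v, for which the linearization is exact; the attack applies the same formulas to the local linearization of a network:

```python
import numpy as np

def closest_decision_hyperplane(W, v, x, c, q=1):
    """Return the class s != c minimizing the distance (6) of x to the
    decision hyperplane pi_l, i.e. eq. (7); 1/p + 1/q = 1, so q = 1
    corresponds to the l_inf distance."""
    f = W @ x + v
    best_s, best_d = None, np.inf
    for l in range(W.shape[0]):
        if l == c:
            continue
        d = abs(f[l] - f[c]) / np.linalg.norm(W[l] - W[c], ord=q)
        if d < best_d:
            best_s, best_d = l, d
    return best_s, best_d
```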
The iterative algorithm x^(i+1) = proj_p(x^(i), π_s(x^(i)), C) would be similar to DeepFool MooFawFro2016 , with the difference that our projection operator is exact, whereas they project onto the hyperplane and then clip in order to satisfy the box constraints. However, the problem with this scheme is that there is no bias towards the original target point x once the algorithm has started. Thus it typically goes further than necessary to find a point on the decision boundary, as the algorithm basically just tracks the decision boundary without aiming at the minimal adversarial perturbation. Thus we additionally consider proj_p(x, π_s(x^(i)), C) and use instead the iterative step, with α ∈ [0,1], defined as
(8)   x^(i+1) = (1 − α) · proj_p(x^(i), π_s(x^(i)), C) + α · proj_p(x, π_s(x^(i)), C),

which biases the step towards x (see Figure 1). Note that this is a convex combination of two points on π_s(x^(i)), both contained in C, and thus their convex combination also lies on π_s(x^(i)) and is contained in C. The next question is how to choose α. As we are aiming at a scheme with a minimal amount of parameters, we want an automatic selection of α based on the available geometric quantities. Let

δ^(i) = proj_p(x^(i), π_s(x^(i)), C) − x^(i),   δ_x^(i) = proj_p(x, π_s(x^(i)), C) − x.

Note that ||δ^(i)||_p and ||δ_x^(i)||_p are the distances of x^(i) and x to π_s(x^(i)) (inside C). We propose to use for the parameter α the relative magnitude of these two distances, that is

(9)   α = min( ||δ^(i)||_p / ( ||δ^(i)||_p + ||δ_x^(i)||_p ), α_max ),   with α_max ∈ [0,1].
The motivation for doing so is that if x^(i) is close to the decision boundary, then we should stay close to this point (note that π_s(x^(i)) is the approximation of f computed at x^(i) and thus it is valid in a small neighborhood of x^(i), whereas x is farther away). On the other hand we want a bias towards x in order not to move too far away from x. This is why α depends on the distances of x^(i) and x to π_s(x^(i)), but we limit with α_max the size of the step we take towards x, as the approximation of f at x^(i) is not likely to be valid around x. Finally, we use a small extrapolation step η > 1, as we noted empirically, similarly to MooFawFro2016 , that this helps to cross the decision boundary faster and get an adversarial example. This leads to the final scheme:
(10)   x^(i+1) = proj_C( (1 − α) · ( x^(i) + η · δ^(i) ) + α · ( x + η · δ_x^(i) ) ),

where α is chosen as in (9), η ≥ 1 and proj_C is just the projection onto the box C, which can be done by clipping. In Figure 1 we visualize the scheme: in black one can see the hyperplane π_s(x^(i)) and the vectors δ^(i) and δ_x^(i), in blue the step we would make going to the decision boundary with the DeepFool variant, and in red the actual step of our method. The green vector represents the bias towards the original point x that we introduce. On the left of Figure 1 we use η = 1, while on the right we use overshooting, η > 1.
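A sketch of one iteration (8)-(10) for p = 2 may help fix ideas. For brevity it uses plain hyperplane projections instead of the exact box-constrained operator proj_p (the box enters only through the final clipping), and the values of alpha_max and eta are illustrative, not the paper's tuned ones:

```python
import numpy as np

def fab_step(x_orig, x_i, w, b, alpha_max=0.1, eta=1.05, lo=0.0, hi=1.0):
    """One FAB iteration (eqs. (8)-(10)) for the l_2 norm, with the
    approximated decision hyperplane {z : <w,z>+b = 0}."""
    def proj(p):  # l_2 projection of p onto the hyperplane (no box)
        return p - (w @ p + b) / (w @ w) * w
    d_i = proj(x_i) - x_i            # delta^(i): step from the iterate
    d_x = proj(x_orig) - x_orig      # delta_x^(i): step from the original x
    n_i, n_x = np.linalg.norm(d_i), np.linalg.norm(d_x)
    alpha = min(n_i / (n_i + n_x + 1e-12), alpha_max)   # eq. (9)
    z = (1 - alpha) * (x_i + eta * d_i) + alpha * (x_orig + eta * d_x)
    return np.clip(z, lo, hi)        # proj_C, eq. (10)
```

With η = 1 and both projections inside the box, the convex combination lands exactly on the approximated hyperplane, as stated above.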
The projection of the target point x onto the intersection of π_s(x^(i)) and C is defined as

argmin_{z ∈ R^d} ||z − x||_p   s.th.   z ∈ π_s(x^(i)),   z ∈ C.

Note that, replacing z by x^(i) + δ, we can rewrite this as

argmin_{δ ∈ R^d} ||x^(i) + δ − x||_p   s.th.   x^(i) + δ ∈ π_s(x^(i)),   x^(i) + δ ∈ C.

This can be interpreted as the minimization of the distance of the next iterate x^(i) + δ to the target point x, such that x^(i) + δ lies on the intersection of the (approximate) decision hyperplane and the box C. This point of view on the projection again justifies the use of a convex combination of the two projections in our iterative scheme (10).
The described scheme finds adversarial perturbations in a few iterations. However, we are interested in minimizing their norms. Thus, once we have a new point x^(i+1), we check whether it is assigned by f to a class different from c. In this case, we apply

(11)   x^(i+1) ← (1 − β) · x + β · x^(i+1),   with β ∈ (0,1),

that is we go back towards x on the segment [x, x^(i+1)], effectively restarting the algorithm at a point which is quite close to the decision boundary. In this way, due to the bias of the method towards x, we successively find adversarial perturbations of smaller norm, meaning that the algorithm tracks the decision boundary while getting closer to x.
Our scheme finds points close to the decision boundary, but often they are slightly off, as the linear approximation is not exact and we apply the extrapolation step with η > 1. Thus, after finishing the iterations of our algorithmic scheme, we perform a last, fast step to further improve the quality of the adversarial examples. Let x_out be the closest point to x classified differently from c, say as class s ≠ c, found with the iterative scheme. It holds that f_s(x_out) − f_c(x_out) > 0 and f_s(x) − f_c(x) < 0. This means that, assuming f continuous, there exists a point x* on the segment [x, x_out] such that f_s(x*) = f_c(x*) and ||x* − x||_p < ||x_out − x||_p. If f is linear,

(12)   x* = x + ( f_c(x) − f_s(x) ) / ( f_s(x_out) − f_c(x_out) + f_c(x) − f_s(x) ) · (x_out − x).

Since f is typically non-linear, but close to linear, we compute iteratively for a few steps

(13)   x̃ = x + ( f_c(x) − f_s(x) ) / ( f_s(x_out) − f_c(x_out) + f_c(x) − f_s(x) ) · (x_out − x),

each time replacing x_out in (13) with x̃ if f_s(x̃) − f_c(x̃) ≥ 0, or x with x̃ if instead f_s(x̃) − f_c(x̃) < 0. With this kind of modified binary search one can find a better adversarial example at the cost of a few forward passes of the network.
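The final search can be sketched as follows (function name ours); writing g(z) = f_s(z) − f_c(z), the interpolation (13) is the root of the chord of g over [x, x_out], and the adversarial endpoint x_out is returned:

```python
import numpy as np

def final_search(x, x_out, g, steps=10):
    """Modified binary search of eqs. (12)-(13).

    g(z) = f_s(z) - f_c(z) with g(x) < 0 (correctly classified) and
    g(x_out) > 0 (adversarial); the sign change guarantees the
    denominator below is nonzero. Each step shrinks the segment
    [x, x_out] towards the decision boundary."""
    for _ in range(steps):
        t = -g(x) / (g(x_out) - g(x))   # root of the chord, eq. (13)
        x_tilde = x + t * (x_out - x)
        if g(x_tilde) >= 0:             # still adversarial: move x_out
            x_out = x_tilde
        else:                           # not adversarial: move x
            x = x_tilde
    return x_out
```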
So far all the steps are deterministic. However, in order to improve the results, we introduce the option of random restarts, that is the starting point x^(0) is randomly sampled in the proximity of x instead of being x itself. Most attacks benefit from random restarts MadEtAl2018 ; ZheEtAl2019 , especially when attacking models protected by gradient-masking defenses MosEtAl18 , as they allow a wider exploration of the input space. In particular, we choose to sample from the l_p-sphere centered in the original point with radius half the l_p-norm of the current best adversarial perturbation (or a given threshold if no adversarial example has been found yet).
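For the l_2 case the restart sampling can be sketched as below (our own sketch; l_1 and l_∞ need norm-specific samplers, and in practice one would additionally clip the result to the input domain):

```python
import numpy as np

def restart_point(x, best_norm, rng):
    """Sample x^(0) on the l_2 sphere centered at x with radius
    best_norm / 2, where best_norm is the l_2 norm of the current
    best adversarial perturbation (or a given threshold)."""
    u = rng.normal(size=x.shape)
    u /= np.linalg.norm(u)          # uniform direction on the l_2 sphere
    return x + 0.5 * best_norm * u
```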
Our attack, summarized in Algorithm 1, consists of two main operations: the computation of f and its gradient, and solving the projection problem (2). In particular, for each iteration we perform a forward and a backward pass of the network for the gradient step, and a forward pass for the backward step. Regarding the projection, we mentioned that its complexity is O(d log d) for an input of dimension d HeiAnd2017 , and it can be efficiently implemented so that it runs in batches on the GPU. Moreover, it does not depend on the architecture of the network, meaning that, except for shallow models, its cost is far smaller than that of the passes through the network. Overall, we can approximate the computational cost of our algorithm by the total number of calls of the classifier,

(14)   restarts × iterations × (2 forward passes + 1 backward pass).

One has to add the forward passes for the final search, which however happens just once.
The idea of exploiting the first order local approximation of the decision boundary is not novel, but rather the basis of one of the first white-box adversarial attacks, DeepFool MooFawFro2016 . While DeepFool and our FAB-attack share the strategy of using a linear approximation of the classifier and taking steps based on projections onto the decision hyperplanes, we want to point out several key differences: first, MooFawFro2016 do not solve the projection problem (2) exactly, but rather its simpler version without box constraints, clipping afterwards. Second, their gradient step does not have any bias towards the original point, that is they always have α = 0 in (10). Third, DeepFool has no backward step, final search or restarts, as it stops as soon as it finds a point with a different classification (to be fair, its main goal is to quickly provide an adversarial perturbation of average quality).
We perform an ablation study of the differences to DeepFool in Figure 2, where we show the curves of the robust accuracy as a function of the threshold ε (we define robust accuracy properly in Section 3, but in the plots lower is better), on two models, for the various attacks. In particular, we present the results of DeepFool (blue) and of FAB-attack with the following variations: without bias towards x and without backward step (magenta), without bias in the gradient step and no restarts (light green), without bias and 100 restarts (dark green), with bias and no restarts (orange), and with bias and 100 restarts, that is the full FAB-attack (red). We can see how every addition we make to the original scheme of DeepFool contributes to the significantly improved performance of FAB-attack compared to the original DeepFool.
We compare our FAB-attack to other state-of-the-art attacks on MNIST, CIFAR-10 CIFAR10 and Restricted ImageNet TsiEtAl18 . For each dataset we consider a naturally trained model (plain) and two adversarially trained ones as in MadEtAl2018 , one trained for robustness wrt the l_∞-norm (l_∞-AT) and the other wrt the l_2-norm (l_2-AT). In particular, the plain and l_∞-AT models on MNIST are those available at https://github.com/MadryLab/mnist_challenge and consist of two convolutional and two fully-connected layers. The architecture of the CIFAR-10 models, which can be found at https://github.com/fra31/fab-attack, has 8 convolutional layers (with the number of filters increasing from 96 to 384) and 2 dense layers, while on Restricted ImageNet we use the models (ResNet-50 HeZhaRen2015 ) from TsiEtAl18 , available at https://github.com/MadryLab/robust-features-code.

In Tables 1 to 9 we report the effectiveness of the attacks by evaluating upper bounds on the robust accuracy, that is, for a fixed threshold ε, the percentage of test points for which the attack could not find an adversarial perturbation with norm smaller than ε. Please note that PGD and DAA cannot evaluate more than one ε at the same time, so they need to be rerun multiple times, unlike the other methods, for which a single run is sufficient to obtain the full statistics.
We observe that, while in general it is not possible to determine a universal best method, and all of them present at least one case where they are far from the optimal performance, FAB-attack is able to compete with and even outperform, in most of the scenarios, algorithms specialized in a specific norm. In particular, on MNIST l_∞-AT we see that DAA is the best one wrt l_∞, with our method not far behind, at least with 100 restarts; at the same time FAB-attack achieves the lowest (lower is better) values for l_2, where with multiple restarts it competes with the very expensive LRA, and for l_1 FAB-attack is notably better than EAD.

On the other hand, on the adversarially trained CIFAR-10 models FAB-attack, already with a single restart, outperforms the competing methods in l_∞, is comparable to CW and LRA in l_2, and is slightly worse with 1 and 10 restarts but better with 100 restarts than EAD in l_1.

On Restricted ImageNet, for l_∞ and l_2 there is a slight prevalence of PGD or DAA, while for l_1 FAB-attack consistently outperforms the competing methods.
While a comparison of the computational cost of the various methods is not straightforward, we tried our best to provide a fair budget to each of them. First, DeepFool and SparseFool are definitely significantly faster than the others, as their primary goal is to find adversarial examples as quickly as possible, without emphasis on minimizing their norms. Second, CW and LRA are rather expensive attacks, as noted in CroRauHei2019 . Third, we can roughly compare the runtime of EAD to that of FAB-attack with 25 restarts and 100 iterations. For DAA, we can say that on MNIST, using the same parameters as in ZheEtAl2019 , the runtime of DAA-50 is comparable to that of FAB-attack with 100 restarts and 100 iterations, with the remarkable difference that DAA has to be restarted for every threshold ε. Finally, each iteration of PGD consists of one forward and one backward pass of the network. If we do not consider the additional but negligible cost of the projection onto the decision hyperplane in our method, we can compare PGD and FAB-attack wrt the total number of forward and backward passes of the model (see (14)). From this point of view, PGD-100 with 40 iterations requires, for every ε, roughly the same number of forward/backward passes as FAB-attack with 27 restarts and 100 iterations.
In summary, our attack is based on a solid geometric motivation, is very competitive both in quality and in runtime compared to state-of-the-art attacks, and is at the same time flexible, as it can be used for all the l_p-norms with p ∈ {1,2,∞}, which is not the case for most other methods.
In order to further validate the effectiveness of our FAB-attack, we test it on models obtained via the state of the art in training robust models, that is MadEtAl2018 and ZhaEtAl2019 . Since here the main focus is on producing the strongest adversarial examples rather than having an efficient attack, we introduce for these experiments a so-called targeted version of FAB-attack. Explicitly, at step 6 of Algorithm 1, instead of choosing the hyperplane on which we project as the closest one to the current iterate, we always take the one separating the original class c from a specified target class t, that is we fix s = t. Note that this is not actually a targeted attack, since we do not ensure that the final output is classified as t, although this is likely to happen. However, in this way we allow an exploration of the input space even in directions which are not usually considered by the standard algorithm, at the price of a higher computational cost, as we run the attack for all possible values of t (all the alternative classes).
With this additional option, we could improve the results of the CIFAR-10 challenge at https://github.com/MadryLab/cifar10_challenge, decreasing the robust accuracy of the proposed model to 44.51% at the challenge's threshold in l_∞-norm, that is 0.2% lower than the best result so far (we performed 10 restarts of FAB-attack with 150 iterations, already achieving 45.28%, plus 5 restarts of the targeted version).
Moreover, we tested a further modification of our algorithm: instead of performing the line search described in Section 2 only once, as the last step of the procedure (called final search in Section 2), we apply the line search every time the attack finds a new adversarial example. In this way one has intermediate solutions which lie closer to the decision boundary than in the usual approach, at the cost of a higher runtime. In Tables 10 and 11 in Section A of the Appendix we repeat the experiments of Tables 1 to 9, using the highest number of restarts, with this modified version, showing that in general there is no large improvement. However, in the case of the model of the CIFAR-10 challenge, we could further decrease the robust accuracy to 44.39% with 10 restarts of the untargeted version (which alone lead to 45.09% robust accuracy) and 5 restarts of the targeted one (150 iterations) using this modification.
Furthermore, we attacked the models on MNIST and CIFAR-10 presented in ZhaEtAl2019 and available at https://github.com/yaodongyu/TRADES, which are considered more robust than those obtained by the standard adversarial training of MadEtAl2018 . For MNIST, the lowest robust accuracy wrt l_∞ so far achieved was 95.60%, while FAB-attack is able to reduce it to 93.33% (100 restarts in the untargeted scenario plus 10 restarts of the targeted version, both with 1000 iterations). A cheaper version of FAB-attack, with 100 iterations, no random restarts and in the usual untargeted version, decreases the accuracy to 94.94%. Similarly, on CIFAR-10 our attack reduces the robust accuracy from 56.43% to 53.44% (5 restarts of the untargeted version plus 5 restarts of the targeted version, both with 100 iterations). Notably, already a weaker version of our untargeted attack, with 20 iterations and no restarts, reaches 54.64%, that is almost 2% better than the previous best result.
The models on MNIST achieve the following clean accuracies: plain 98.7%, l_∞-AT 98.5%, l_2-AT 98.6%. The models on CIFAR-10 achieve the following clean accuracies: plain 89.2%, l_∞-AT 79.4%, l_2-AT 81.2%.
For PGD wrt l_∞ we use, given a threshold ε, a stepsize proportional to ε; for PGD wrt l_2 we perform at each iteration a step in the direction of the gradient; for PGD wrt l_1 we use the gradient step suggested in TraBon2019 .
For FAB-attack we use a single choice of the parameters α_max, η and β for all the cases on MNIST and CIFAR-10, and a second one on Restricted ImageNet. Moreover, the value of ε used for sampling in the case of random restarts is chosen to be comparable to the expected size of the perturbations (for more details see Section A.1 of the Appendix).
Robust accuracy of MNIST l_∞-robust model

l_∞:
ε | DF | DAA-1 | DAA-50 | PGD-1 | PGD-100 | PGD-500 | FAB-1 | FAB-10 | FAB-100
0.2 | 95.2 | 94.6 | 93.7 | 94.9 | 93.8 | 93.8 | 94.7 | 94.3 | 93.7
0.25 | 94.7 | 92.7 | 91.1 | 93.9 | 91.5 | 91.3 | 93.3 | 91.9 | 91.6
0.3 | 93.9 | 89.5 | 87.2 | 92.0 | 88.0 | 87.5 | 91.2 | 89.3 | 88.4
0.325 | 92.5 | 72.1 | 64.2 | 82.5 | 72.4 | 70.9 | 86.6 | 83.3 | 81.3
0.35 | 89.8 | 19.7 | 11.7 | 43.4 | 20.4 | 17.5 | 51.3 | 31.3 | 23.0

l_2:
ε | CW | DF | LRA | PGD-1 | PGD-100 | PGD-500 | FAB-1 | FAB-10 | FAB-100
1 | 88.8 | 94.6 | 73.6 | 92.6 | 89.8 | 89.4 | 83.6 | 70.8 | 64.6
1.5 | 77.6 | 93.0 | 25.8 | 87.8 | 80.6 | 77.8 | 46.2 | 20.8 | 12.4
2 | 64.4 | 91.6 | 3.2 | 81.0 | 67.2 | 64.0 | 15.6 | 2.2 | 0.2
2.5 | 53.8 | 89.6 | 0.4 | 73.0 | 46.4 | 38.2 | 3.8 | 0.0 | 0.0
3 | 46.8 | 84.6 | 0.0 | 67.6 | 23.0 | 14.6 | 1.4 | 0.0 | 0.0

l_1:
ε | SparseFool | EAD | PGD-1 | PGD-50 | PGD-100 | FAB-1 | FAB-10 | FAB-100
2.5 | 96.8 | 92.2 | 94.1 | 93.9 | 93.6 | 83.1 | 66.1 | 54.2
5 | 96.5 | 76.0 | 91.0 | 89.9 | 88.0 | 59.2 | 26.2 | 14.5
7.5 | 96.4 | 49.5 | 86.2 | 82.8 | 79.3 | 48.6 | 13.3 | 5.1
10 | 96.4 | 27.4 | 80.1 | 73.3 | 66.8 | 45.0 | 9.6 | 2.4
12.5 | 96.4 | 14.6 | 74.2 | 62.0 | 52.7 | 43.1 | 8.0 | 1.1
Robust accuracy of MNIST l_2-robust model

l_∞:
ε | DF | DAA-1 | DAA-50 | PGD-1 | PGD-100 | PGD-500 | FAB-1 | FAB-10 | FAB-100
0.05 | 96.7 | 96.4 | 96.3 | 96.4 | 96.4 | 96.3 | 96.6 | 96.4 | 96.4
0.1 | 93.4 | 91.0 | 90.2 | 91.0 | 90.4 | 90.4 | 90.6 | 90.6 | 90.4
0.15 | 86.4 | 74.3 | 72.3 | 75.2 | 72.8 | 72.5 | 73.4 | 72.2 | 71.8
0.2 | 73.8 | 34.5 | 27.2 | 36.8 | 27.5 | 25.9 | 34.0 | 28.8 | 26.0
0.25 | 55.1 | 1.5 | 0.9 | 2.9 | 0.9 | 0.9 | 2.2 | 1.0 | 0.8

l_2:
ε | CW | DF | LRA | PGD-1 | PGD-100 | PGD-500 | FAB-1 | FAB-10 | FAB-100
1 | 92.6 | 93.8 | 92.6 | 93.0 | 93.0 | 93.0 | 92.8 | 92.6 | 92.6
1.5 | 84.8 | 87.2 | 83.4 | 84.2 | 83.4 | 83.4 | 83.8 | 83.6 | 83.6
2 | 70.6 | 79.0 | 68.0 | 70.2 | 68.2 | 67.8 | 70.0 | 69.2 | 68.0
2.5 | 46.4 | 67.4 | 41.6 | 47.2 | 39.4 | 38.2 | 46.8 | 42.4 | 38.6
3 | 17.2 | 54.2 | 11.2 | 22.8 | 10.6 | 10.4 | 20.0 | 12.6 | 11.6

l_1:
ε | SparseFool | EAD | PGD-1 | PGD-50 | PGD-100 | FAB-1 | FAB-10 | FAB-100
5 | 94.9 | 89.8 | 90.2 | 90.2 | 90.2 | 90.5 | 90.4 | 90.1
8.75 | 89.1 | 71.2 | 75.6 | 73.7 | 72.9 | 75.4 | 74.0 | 72.6
12.5 | 81.0 | 45.9 | 60.0 | 56.8 | 54.8 | 55.5 | 49.4 | 45.9
16.25 | 72.8 | 20.6 | 46.6 | 40.0 | 35.7 | 33.6 | 24.8 | 20.7
20 | 60.8 | 8.3 | 37.2 | 26.3 | 20.7 | 16.1 | 9.6 | 7.7
Robust accuracy of MNIST plain model

l_∞:
ε | DF | DAA-1 | DAA-50 | PGD-1 | PGD-100 | PGD-500 | FAB-1 | FAB-10 | FAB-100
0.03 | 93.2 | 91.9 | 91.9 | 92.0 | 91.9 | 91.9 | 93.8 | 93.6 | 93.6
0.05 | 83.4 | 78.2 | 76.7 | 77.1 | 75.6 | 75.2 | 77.8 | 77.4 | 77.2
0.07 | 61.5 | 59.8 | 56.3 | 45.0 | 41.8 | 41.5 | 45.2 | 43.4 | 42.8
0.09 | 33.2 | 46.7 | 41.0 | 17.6 | 14.3 | 13.7 | 16.4 | 15.4 | 15.2
0.11 | 13.1 | 34.4 | 26.2 | 4.2 | 2.8 | 2.7 | 3.4 | 2.8 | 2.4

l_2:
ε | CW | DF | LRA | PGD-1 | PGD-100 | PGD-500 | FAB-1 | FAB-10 | FAB-100
0.5 | 92.6 | 93.6 | 92.6 | 92.6 | 92.6 | 92.6 | 92.6 | 92.6 | 92.6
1 | 47.4 | 58.6 | 47.4 | 49.6 | 48.2 | 47.6 | 47.2 | 46.8 | 46.8
1.5 | 8.8 | 19.8 | 7.8 | 10.0 | 10.0 | 10.0 | 8.0 | 7.4 | 7.4
2 | 0.6 | 1.8 | 0.2 | 0.6 | 0.6 | 0.6 | 0.2 | 0.2 | 0.2
2.5 | 0.0 | 0.0 | 0.0 | 0.2 | 0.2 | 0.2 | 0.0 | 0.0 | 0.0

l_1:
ε | SparseFool | EAD | PGD-1 | PGD-50 | PGD-100 | FAB-1 | FAB-10 | FAB-100
2 | 95.5 | 93.6 | 94.2 | 93.8 | 93.6 | 94.1 | 93.6 | 93.5
4 | 88.9 | 76.7 | 80.3 | 77.5 | 76.2 | 80.0 | 76.1 | 75.4
6 | 75.8 | 48.1 | 55.9 | 49.8 | 46.7 | 54.2 | 47.1 | 43.0
8 | 60.3 | 26.6 | 36.6 | 28.5 | 25.2 | 30.9 | 23.9 | 22.8
10 | 43.8 | 11.2 | 25.8 | 15.4 | 12.7 | 15.1 | 10.3 | 8.2
Robust accuracy of CIFAR-10 adversarially trained model

l_∞:
ε | DF | DAA-1 | DAA-50 | PGD-1 | PGD-100 | PGD-500 | FAB-1 | FAB-10 | FAB-100
– | 64.1 | 67.2 | 66.3 | 62.6 | 62.6 | 62.6 | 62.7 | 62.6 | 62.6
– | 49.0 | 65.0 | 62.8 | 45.7 | 45.1 | 45.0 | 44.6 | 44.4 | 44.2
– | 36.9 | 64.2 | 60.8 | 33.5 | 32.0 | 31.6 | 27.6 | 26.9 | 26.8
– | 25.8 | 62.3 | 58.0 | 26.1 | 23.9 | 23.5 | 15.3 | 14.2 | 13.7
– | 17.6 | 61.9 | 54.8 | 20.4 | 17.7 | 17.3 | 8.8 | 8.1 | 8.0

l_2:
ε | CW | DF | LRA | PGD-1 | PGD-100 | PGD-500 | FAB-1 | FAB-10 | FAB-100
0.25 | 66.0 | 67.0 | 65.6 | 65.8 | 65.6 | 65.6 | 65.6 | 65.6 | 65.6
0.5 | 48.2 | 53.8 | 47.8 | 50.2 | 49.6 | 49.2 | 48.4 | 48.2 | 48.0
0.75 | 32.6 | 42.2 | 32.4 | 46.4 | 43.8 | 43.4 | 32.8 | 32.2 | 32.2
1 | 21.6 | 30.0 | 21.6 | 44.8 | 42.2 | 41.4 | 21.8 | 21.6 | 21.0
1.25 | 11.4 | 22.4 | 12.4 | 44.4 | 40.8 | 39.6 | 12.4 | 12.2 | 11.8

l_1:
ε | SparseFool | EAD | PGD-1 | PGD-50 | PGD-100 | FAB-1 | FAB-10 | FAB-100
3 | 69.5 | 62.2 | 58.1 | 57.9 | 57.9 | 63.7 | 63.1 | 63.0
6 | 61.6 | 45.5 | 44.2 | 43.1 | 42.9 | 49.1 | 47.6 | 46.1
9 | 53.1 | 27.7 | 32.4 | 30.9 | 30.1 | 33.6 | 30.4 | 29.0
12 | 44.4 | 17.9 | 25.0 | 24.3 | 23.1 | 24.0 | 19.8 | 17.8
15 | 37.0 | 10.4 | 20.6 | 18.2 | 16.6 | 16.5 | 12.4 | 10.7
Robust accuracy of CIFAR-10 plain model

l_∞:
ε | DF | DAA-1 | DAA-50 | PGD-1 | PGD-100 | PGD-500 | FAB-1 | FAB-10 | FAB-100
– | 62.6 | 65.7 | 64.1 | 56.5 | 56.1 | 56.1 | 57.1 | 56.1 | 56.0
– | 49.3 | 63.2 | 60.8 | 40.3 | 38.6 | 38.3 | 39.8 | 38.2 | 37.7
– | 37.3 | 62.4 | 58.5 | 26.0 | 23.9 | 23.2 | 24.2 | 22.5 | 21.5
– | 26.4 | 61.2 | 56.3 | 17.1 | 14.7 | 14.6 | 14.4 | 12.4 | 12.1
– | 19.0 | 60.2 | 54.4 | 11.9 | 9.8 | 9.5 | 7.3 | 5.9 | 5.7

l_2:
ε | CW | DF | LRA | PGD-1 | PGD-100 | PGD-500 | FAB-1 | FAB-10 | FAB-100
0.1 | 69.4 | 72.2 | 69.0 | 68.4 | 67.6 | 67.6 | 68.6 | 68.6 | 68.4
0.15 | 55.4 | 62.6 | 55.0 | 54.6 | 54.2 | 54.2 | 54.6 | 54.0 | 53.8
0.2 | 43.4 | 51.2 | 43.4 | 45.2 | 44.0 | 43.4 | 43.2 | 42.4 | 42.2
0.3 | 21.6 | 33.8 | 22.0 | 31.2 | 29.4 | 29.2 | 21.6 | 21.0 | 20.6
0.4 | 9.4 | 20.8 | 9.8 | 26.8 | 23.8 | 23.0 | 10.0 | 8.2 | 8.0

l_1:
ε | SparseFool | EAD | PGD-1 | PGD-50 | PGD-100 | FAB-1 | FAB-10 | FAB-100
2 | 72.1 | 54.7 | 54.9 | 53.9 | 53.6 | 55.5 | 52.9 | 50.5
4 | 58.6 | 24.1 | 27.4 | 26.0 | 25.3 | 31.0 | 25.3 | 23.2
6 | 45.6 | 8.9 | 14.7 | 12.3 | 11.6 | 15.7 | 10.4 | 8.3
8 | 34.3 | 3.0 | 7.7 | 6.2 | 5.6 | 7.2 | 3.4 | 2.4
10 | 27.2 | 0.7 | 4.9 | 3.5 | 2.9 | 3.3 | 1.2 | 0.8
Robust accuracy of CIFAR-10 adversarially trained model

l_∞:
ε | DF | DAA-1 | DAA-50 | PGD-1 | PGD-100 | PGD-500 | FAB-1 | FAB-10 | FAB-100
– | 66.8 | 66.9 | 66.3 | 65.5 | 65.5 | 65.5 | 65.8 | 65.8 | 65.7
– | 53.2 | 63.8 | 61.4 | 50.2 | 49.3 | 49.2 | 49.5 | 49.1 | 49.0
– | 42.9 | 63.1 | 58.4 | 38.3 | 37.3 | 37.1 | 35.7 | 35.0 | 34.7
– | 32.9 | 61.2 | 56.3 | 31.0 | 29.2 | 28.8 | 24.4 | 23.4 | 23.3
– | 24.5 | 59.8 | 54.1 | 24.9 | 21.6 | 21.4 | 15.7 | 14.7 | 14.5

l_2:
ε | CW | DF | LRA | PGD-1 | PGD-100 | PGD-500 | FAB-1 | FAB-10 | FAB-100
0.25 | 64.6 | 67.0 | 64.4 | 64.4 | 64.4 | 64.4 | 64.8 | 64.4 | 64.4
0.5 | 48.4 | 53.0 | 48.8 | 50.2 | 49.2 | 49.2 | 48.6 | 48.6 | 48.2
0.75 | 33.4 | 41.4 | 33.4 | 44.8 | 43.2 | 42.8 | 33.8 | 33.4 | 33.2
1 | 22.8 | 32.6 | 22.8 | 43.0 | 41.0 | 40.6 | 22.8 | 22.2 | 22.0
1.25 | 12.0 | 24.2 | 13.0 | 41.8 | 40.0 | 38.8 | 12.8 | 12.0 | 11.4

l_1:
ε | SparseFool | EAD | PGD-1 | PGD-50 | PGD-100 | FAB-1 | FAB-10 | FAB-100
5 | 57.8 | 36.8 | 65.5 | 65.5 | 65.5 | 43.8 | 40.2 | 38.6
8.75 | 44.7 | 19.2 | 50.2 | 49.5 | 49.3 | 26.3 | 23.1 | 21.0
12.5 | 34.9 | 7.1 | 38.3 | 38.0 | 37.3 | 14.6 | 10.8 | 8.9
16.25 | 27.6 | 3.0 | 31.0 | 29.9 | 29.2 | 6.8 | 4.7 | 3.8
20 | 20.2 | 0.9 | 24.9 | 22.4 | 21.6 | 4.3 | 1.9 | 1.2
Robust accuracy of Restricted ImageNet plain model | ||||||||
l_∞ threshold | DF | DAA-1 | DAA-10 | PGD-1 | PGD-50 | FAB-1 | FAB-10
76.6 | 74.8 | 74.8 | 74.8 | 74.8 | 78.0 | 76.8 | ||
52.0 | 51.8 | 48.2 | 38.8 | 38.4 | 49.6 | 46.2 | ||
26.8 | 46.0 | 41.0 | 13.2 | 12.8 | 28.0 | 19.8 | ||
11.2 | 43.2 | 39.4 | 4.0 | 4.0 | 17.4 | 10.2 | ||
5.0 | 41.2 | 38.2 | 1.2 | 1.2 | 12.4 | 5.4 | ||
l_2 threshold | DF | PGD-1 | PGD-50 | FAB-1 | FAB-10
0.2 | 80.2 | 76.6 | 76.4 | 76.2 | 76.2 | |||
0.4 | 58.4 | 41.0 | 40.8 | 41.6 | 41.4 | |||
0.6 | 33.8 | 16.4 | 16.2 | 18.0 | 16.8 | |||
0.8 | 18.8 | 4.2 | 4.2 | 4.6 | 4.4 | |||
1 | 8.6 | 1.8 | 1.8 | 1.2 | 1.0 | |||
l_1 threshold | SparseFool | PGD-1 | PGD-20 | FAB-1 | FAB-10
5 | 88.6 | 83.8 | 83.8 | 79.2 | 76.6 | |||
16 | 80.0 | 51.6 | 51.2 | 47.0 | 40.0 | |||
27 | 70.6 | 23.8 | 23.2 | 27.8 | 19.4 | |||
38 | 65.0 | 8.4 | 8.2 | 14.2 | 8.4 | |||
49 | 55.4 | 3.8 | 3.6 | 9.4 | 3.8 | |||
Robust accuracy of Restricted ImageNet l_∞-robust model
l_∞ threshold | DF | DAA-1 | DAA-10 | PGD-1 | PGD-50 | FAB-1 | FAB-10
75.8 | 75.0 | 75.0 | 75.0 | 75.0 | 75.0 | 74.6 | ||
53.0 | 46.2 | 46.2 | 46.6 | 46.2 | 47.8 | 46.6 | ||
32.4 | 24.6 | 23.8 | 21.0 | 20.4 | 23.6 | 21.6 | ||
19.4 | 17.0 | 14.6 | 7.0 | 7.0 | 10.0 | 7.2 | ||
10.8 | 12.8 | 11.6 | 1.4 | 1.2 | 3.0 | 1.4 | ||
l_2 threshold | DF | PGD-1 | PGD-50 | FAB-1 | FAB-10
1 | 79.4 | 76.6 | 76.6 | 76.2 | 76.2 | |||
2 | 65.0 | 48.2 | 47.0 | 49.4 | 48.8 | |||
3 | 46.8 | 23.6 | 22.6 | 24.0 | 23.2 | |||
4 | 32.8 | 9.6 | 8.8 | 10.6 | 10.2 | |||
5 | 20.4 | 3.2 | 3.0 | 3.4 | 3.2 | |||
l_1 threshold | SparseFool | PGD-1 | PGD-20 | FAB-1 | FAB-10
15 | 81.8 | 72.4 | 72.4 | 69.4 | 66.8 | |||
25 | 76.4 | 61.6 | 61.2 | 55.0 | 52.6 | |||
40 | 71.4 | 47.2 | 46.6 | 41.6 | 37.2 | |||
60 | 63.2 | 33.0 | 32.6 | 29.4 | 24.4 | |||
100 | 49.2 | 14.0 | 13.6 | 15.8 | 10.8 |
Robust accuracy of Restricted ImageNet l_2-robust model
l_∞ threshold | DF | DAA-1 | DAA-10 | PGD-1 | PGD-50 | FAB-1 | FAB-10
74.4 | 73.0 | 73.0 | 73.0 | 73.0 | 73.8 | 73.4 | ||
49.0 | 39.6 | 39.2 | 38.6 | 38.6 | 39.4 | 38.6 | ||
27.4 | 22.6 | 21.0 | 14.4 | 14.2 | 15.4 | 14.2 | ||
13.8 | 18.6 | 16.8 | 4.6 | 4.2 | 5.6 | 3.4 | ||
6.6 | 15.6 | 13.8 | 0.8 | 0.8 | 2.8 | 0.8 | ||
l_2 threshold | DF | PGD-1 | PGD-50 | FAB-1 | FAB-10
2 | 74.2 | 71.8 | 71.6 | 71.4 | 71.4 | |||
3 | 61.6 | 51.6 | 51.6 | 51.4 | 51.4 | |||
4 | 45.6 | 32.4 | 32.0 | 33.0 | 32.4 | |||
5 | 34.6 | 21.0 | 20.6 | 21.4 | 21.2 | |||
6 | 25.2 | 10.2 | 10.2 | 11.0 | 10.8 | |||
l_1 threshold | SparseFool | PGD-1 | PGD-20 | FAB-1 | FAB-10
50 | 85.4 | 82.6 | 82.2 | 78.6 | 78.4 | |||
100 | 79.6 | 67.4 | 67.0 | 61.6 | 58.8 | |||
150 | 74.4 | 51.2 | 50.8 | 46.2 | 43.2 | |||
200 | 68.6 | 38.0 | 37.8 | 32.4 | 29.2 | |||
250 | 60.0 | 25.8 | 25.0 | 23.8 | 19.8 | |||
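A practical consequence of minimizing the size of the perturbation is that all the tables above can be read off from a single run of the attack: each entry is the fraction of test points which are correctly classified and whose minimal adversarial perturbation, measured in the relevant l_p-norm, exceeds the threshold. A minimal sketch (the function name and the toy numbers are ours, not from the paper):

```python
import numpy as np

def robust_accuracy(min_pert_norms, clean_correct, thresholds):
    """Robust accuracy at each threshold eps: the fraction of test points
    that are correctly classified and whose minimal adversarial
    perturbation (in the chosen l_p-norm) is larger than eps."""
    norms = np.asarray(min_pert_norms, dtype=float)
    correct = np.asarray(clean_correct, dtype=bool)
    # Misclassified points count as non-robust at every threshold.
    return [float(np.mean(correct & (norms > eps))) for eps in thresholds]

# Toy example: minimal l_inf perturbations for 4 points, one of which
# is already misclassified on the clean input.
accs = robust_accuracy([0.12, 0.35, 0.05, 0.50],
                       [True, True, True, False],
                       thresholds=[0.1, 0.3])
print(accs)  # [0.5, 0.25]
```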
In Section 3.1 we introduced a modified version of FAB-attack which applies the final search not only at the end of the procedure but every time a new adversarial example is found. This comes at an extra computational cost of 3 forward passes of the network each time the search is performed. Here we check whether this variant improves the results on the other models used in Section 3.
In Tables 10 and 11 we compare the robust accuracy at the different thresholds obtained by the standard FAB-attack without the intermediate search (wo/IS), see Tables 1 to 9, to that of the version of FAB-attack with the intermediate search (w/IS), run with the same number of iterations and restarts. Overall, the intermediate search gives no significant improvement beyond the variance due to the random restarts.
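The search applied at each new adversarial example can be sketched as a short binary search on the segment between the original input and the current adversarial example; each probe costs one forward pass, so 3 steps match the 3 extra forward passes mentioned above. The function and the toy decision rule below are illustrative, not the paper's implementation:

```python
import numpy as np

def final_search(x_orig, x_adv, is_adversarial, steps=3):
    """Binary search on the segment between the original point and an
    adversarial one; each of the `steps` probes costs one forward pass
    and tries to shrink the perturbation while staying adversarial."""
    lo, hi = 0.0, 1.0  # hi = 1 corresponds to x_adv, assumed adversarial
    for _ in range(steps):
        mid = (lo + hi) / 2.0
        cand = (1.0 - mid) * x_orig + mid * x_adv
        if is_adversarial(cand):   # one forward pass of the network
            hi = mid               # still adversarial: move closer to x_orig
        else:
            lo = mid
    return (1.0 - hi) * x_orig + hi * x_adv

# Toy stand-in for the classifier: the decision flips as soon as the
# first coordinate exceeds 0.3.
x0, xa = np.zeros(2), np.ones(2)
x_better = final_search(x0, xa, lambda z: z[0] > 0.3)
print(x_better[0])  # 0.375
```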
Robust accuracy by FAB-100 with intermediate search
MNIST | CIFAR-10
plain | l_∞-AT | l_2-AT | plain | l_∞-AT | l_2-AT
metric | wo/IS | w/IS | wo/IS | w/IS | wo/IS | w/IS | wo/IS | w/IS | wo/IS | w/IS | wo/IS | w/IS
l_∞ | 93.6 | 92.0 | 93.7 | 94.0 | 96.4 | 96.3 | 56.0 | 55.9 | 65.7 | 65.7 | 62.6 | 62.6
77.2 | 76.7 | 91.6 | 91.6 | 90.4 | 90.3 | 37.7 | 37.9 | 49.0 | 48.9 | 44.2 | 44.2 | |
42.8 | 43.2 | 88.4 | 88.5 | 71.8 | 72.1 | 21.5 | 21.5 | 34.7 | 34.8 | 26.8 | 26.7 | |
15.2 | 14.6 | 81.3 | 80.8 | 26.0 | 25.5 | 12.1 | 12.0 | 23.3 | 23.5 | 13.7 | 13.8 | |
2.4 | 2.7 | 23.0 | 23.4 | 0.8 | 0.9 | 5.7 | 5.6 | 14.5 | 14.6 | 8.0 | 8.0 | |
l_2 | 92.6 | 92.6 | 64.6 | 66.0 | 92.6 | 92.6 | 68.4 | 68.4 | 64.4 | 64.4 | 65.6 | 65.6
46.8 | 46.8 | 12.4 | 13.2 | 83.6 | 83.6 | 53.8 | 54.0 | 48.2 | 48.2 | 48.0 | 48.0 | |
7.4 | 7.4 | 0.2 | 0.2 | 68.0 | 68.0 | 42.2 | 42.0 | 33.2 | 33.2 | 32.2 | 32.2 | |
0.2 | 0.2 | 0.0 | 0.0 | 38.6 | 39.6 | 20.6 | 20.4 | 22.0 | 22.0 | 21.0 | 21.2 | |
0.0 | 0.0 | 0.0 | 0.0 | 11.6 | 11.6 | 8.0 | 8.0 | 11.4 | 11.6 | 11.8 | 12.0 | |
l_1 | 93.5 | 93.6 | 54.2 | 53.3 | 90.1 | 90.0 | 50.5 | 50.5 | 38.6 | 38.4 | 63.0 | 63.0
75.4 | 75.0 | 14.5 | 14.3 | 72.6 | 72.4 | 23.2 | 22.6 | 21.0 | 20.7 | 46.1 | 46.1 | |
43.0 | 43.6 | 5.1 | 4.0 | 45.9 | 45.8 | 8.3 | 8.6 | 8.9 | 9.2 | 29.0 | 28.9 | |
22.8 | 22.5 | 2.4 | 2.0 | 20.7 | 20.4 | 2.4 | 2.6 | 3.8 | 3.9 | 17.8 | 17.8 | |
8.2 | 8.3 | 1.1 | 1.3 | 7.7 | 7.6 | 0.8 | 1.0 | 1.2 | 1.3 | 10.7 | 11.0 | |
Robust accuracy by FAB-10 with intermediate search
Restricted ImageNet
plain | l_∞-AT | l_2-AT
metric | wo/IS | w/IS | wo/IS | w/IS | wo/IS | w/IS
l_∞ | 76.8 | 77.2 | 74.6 | 74.6 | 73.4 | 73.2
46.2 | 46.6 | 46.6 | 46.2 | 38.6 | 38.4 | |
19.8 | 21.4 | 21.6 | 21.4 | 14.2 | 14.2 | |
10.2 | 10.8 | 7.2 | 7.2 | 3.4 | 3.8 | |
5.4 | 5.8 | 1.4 | 1.4 | 0.8 | 0.8 | |
l_2 | 76.2 | 76.2 | 76.2 | 76.2 | 71.4 | 71.4
41.4 | 41.2 | 48.8 | 48.8 | 51.4 | 51.2 | |
16.8 | 16.8 | 23.2 | 22.8 | 32.4 | 32.2 | |
4.4 | 4.4 | 10.2 | 10.0 | 21.2 | 21.0 | |
1 | 1.2 | 3.2 | 3.2 | 10.8 | 10.6 | |
l_1 | 76.6 | 77.2 | 66.8 | 67.2 | 78.4 | 78.4
40 | 39.6 | 52.6 | 51.4 | 58.8 | 58.8 | |
19.4 | 19.4 | 37.2 | 36.8 | 43.2 | 42.8 | |
8.4 | 8.2 | 24.4 | 23.8 | 29.2 | 28.8 | |
3.8 | 4.0 | 10.8 | 10.8 | 19.8 | 20.4 | |
When using random restarts, FAB-attack needs one additional parameter: the radius of the l_p-ball around the original point inside which we sample the starting point of the algorithm, at least until a sufficiently small adversarial perturbation has been found (see Algorithm 1). We use the values reported in Table 12. Note, however, that the attack usually already finds at the first run an adversarial perturbation small enough that this radius rarely comes into play.
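The sampling step can be sketched as follows; the helper is illustrative rather than the released code, and clipping to [0, 1] assumes image-like inputs. The l_∞ and l_2 draws are uniform in the ball, while the l_1 draw (rescaled Dirichlet with random signs) is a common approximation that is not exactly uniform in volume:

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_restart(x, eps, p):
    """Sample a random starting point within the l_p-ball of radius eps
    around x, for p in {1, 2, inf}."""
    d = x.size
    if p == np.inf:
        delta = rng.uniform(-eps, eps, size=d)
    elif p == 2:
        u = rng.standard_normal(d)
        u /= np.linalg.norm(u)                         # uniform direction
        delta = eps * rng.uniform() ** (1.0 / d) * u   # uniform radius in volume
    elif p == 1:
        w = rng.dirichlet(np.ones(d))                  # nonnegative, sums to 1
        delta = eps * rng.uniform() * rng.choice([-1.0, 1.0], size=d) * w
    else:
        raise ValueError("p must be 1, 2 or np.inf")
    # Project back to the input domain (assumes inputs live in [0, 1]).
    return np.clip(x + delta.reshape(x.shape), 0.0, 1.0)

x = np.full((2, 2), 0.5)
x0 = sample_restart(x, eps=0.1, p=np.inf)
assert np.abs(x0 - x).max() <= 0.1
```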
Values of the sampling radius used for random restarts
MNIST | CIFAR-10 | Restricted ImageNet
metric | plain | l_∞-AT | l_2-AT | plain | l_∞-AT | l_2-AT | plain | l_∞-AT | l_2-AT
l_∞ | 0.15 | 0.3 | 0.3 | 0.0 | 0.02 | 0.02 | 0.02 | 0.08 | 0.08
l_2 | 2.0 | 2.0 | 2.0 | 0.5 | 4.0 | 4.0 | 5.0 | 5.0 | 5.0
l_1 | 40.0 | 40.0 | 40.0 | 10.0 | 10.0 | 10.0 | 100.0 | 250.0 | 250.0