1 Introduction
Deep neural networks (DNNs) are vulnerable to adversarial examples [23, 5], which are crafted by adding small, human-imperceptible noise to legitimate examples so that a model outputs attacker-desired, inaccurate predictions. Generating adversarial examples has attracted increasing attention because it helps to identify the vulnerabilities of models before they are deployed. Adversarial samples also help various DNN algorithms assess robustness by providing more varied training data [5, 10].
With knowledge of the structure and parameters of a given model, many methods can successfully generate adversarial examples in the white-box manner, including optimization-based methods such as box-constrained L-BFGS [23], one-step gradient-based methods such as fast gradient sign [5], and iterative variants of gradient-based methods [9]. A more severe issue is that adversarial examples transfer well [23, 12, 14], i.e., the adversarial examples crafted for one model remain adversarial for others, which makes black-box attacks practical in real-world applications and poses real security issues. Transferability arises because different machine learning models learn similar decision boundaries around a data point, so adversarial examples crafted for one model are also effective against others.
However, existing attack methods exhibit low efficacy when attacking black-box models, especially those with a defense mechanism. For example, ensemble adversarial training [24] significantly improves the robustness of deep neural networks, and most existing methods cannot successfully attack the resulting models in the black-box manner. This is largely attributable to the trade-off between attack ability and transferability. In particular, the adversarial examples generated by optimization-based and iterative methods have poor transferability [10], which makes black-box attacks less effective. One-step gradient-based methods, on the other hand, generate more transferable adversarial examples, but they usually have a low success rate against the white-box model [10], which also makes them ineffective for black-box attacks. Given the difficulties of practical black-box attacks, Papernot et al. [16] use adaptive queries to train a surrogate model that fully characterizes the behavior of the target model, thereby turning black-box attacks into white-box attacks. However, this requires the full prediction confidences of the target model and a tremendous number of queries, especially for large-scale datasets such as ImageNet [19]. Such requirements are impractical in real-world applications. Therefore, we consider how to effectively attack a black-box model without knowing its architecture and parameters and, further, without querying it.
In this paper, we propose a broad class of momentum iterative gradient-based methods to boost the success rates of the generated adversarial examples. Beyond iterative gradient-based methods, which iteratively perturb the input with the gradients to maximize the loss function [5], momentum-based methods accumulate a velocity vector in the gradient direction of the loss function across iterations, for the purpose of stabilizing update directions and escaping from poor local maxima. We show that the adversarial examples generated by momentum iterative methods have higher success rates in both white-box and black-box attacks. The proposed methods alleviate the trade-off between white-box attack ability and transferability, and act as a stronger attack algorithm than one-step methods [5] and vanilla iterative methods [9].
To further improve the transferability of adversarial examples, we study several approaches for attacking an ensemble of models, because an adversarial example that fools multiple models is more likely to remain adversarial for other black-box models [12]. We show that the adversarial examples generated by the momentum iterative methods for multiple models can successfully fool robust models obtained by ensemble adversarial training [24] in the black-box manner. The findings in this paper raise new security issues for developing more robust deep learning models, and we hope that our attacks will be used as a benchmark to evaluate the robustness of various deep learning models and defense methods. In summary, we make the following contributions:

We introduce a class of attack algorithms called momentum iterative gradient-based methods, in which we accumulate gradients of the loss function at each iteration to stabilize optimization and escape from poor local maxima.

We study several ensemble approaches to attack multiple models simultaneously, which demonstrates a powerful capability of transferability by preserving a high success rate of attacks.

We are the first to show that the models obtained by ensemble adversarial training with a powerful defense ability are also vulnerable to black-box attacks.
2 Backgrounds
In this section, we provide the background knowledge and review related work on adversarial attack and defense methods. Given a classifier $f(x): x \in \mathcal{X} \to y \in \mathcal{Y}$ that outputs a label $y$ as the prediction for an input $x$, the goal of adversarial attacks is to seek an example $x^*$ in the vicinity of $x$ that is misclassified by the classifier. Specifically, there are two classes of adversarial examples: non-targeted and targeted ones. For a correctly classified input $x$ with ground-truth label $y$ such that $f(x) = y$, a non-targeted adversarial example $x^*$ is crafted by adding small noise to $x$ without changing the label, but misleads the classifier as $f(x^*) \neq y$; a targeted adversarial example aims to fool the classifier into outputting a specific label as $f(x^*) = y^*$, where $y^*$ is the target label specified by the adversary, and $y^* \neq y$. In most cases, the $L_p$ norm of the adversarial noise is required to be less than an allowed value $\epsilon$ as $\|x^* - x\|_p \leq \epsilon$, where $p$ could be $0, 1, 2, \infty$.
2.1 Attack methods
Existing approaches for generating adversarial examples can be categorized into three groups. We introduce their non-targeted versions here; the targeted versions can be simply derived.
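One-step gradient-based approaches, such as the fast gradient sign method (FGSM) [5], find an adversarial example $x^*$ by maximizing the loss function $J(x^*, y)$, where $J$ is often the cross-entropy loss. FGSM generates adversarial examples to meet the $L_\infty$ norm bound $\|x^* - x\|_\infty \leq \epsilon$ as
$$x^* = x + \epsilon \cdot \mathrm{sign}(\nabla_x J(x, y)), \tag{1}$$
where $\nabla_x J(x, y)$ is the gradient of the loss function w.r.t. $x$. The fast gradient method (FGM) is a generalization of FGSM to meet the $L_2$ norm bound $\|x^* - x\|_2 \leq \epsilon$ as
$$x^* = x + \epsilon \cdot \frac{\nabla_x J(x, y)}{\|\nabla_x J(x, y)\|_2}. \tag{2}$$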
Iterative methods [9] apply the fast gradient step multiple times with a small step size $\alpha$. The iterative version of FGSM (IFGSM) can be expressed as:
$$x^*_0 = x, \quad x^*_{t+1} = x^*_t + \alpha \cdot \mathrm{sign}(\nabla_x J(x^*_t, y)). \tag{3}$$
To make the generated adversarial examples satisfy the $L_\infty$ (or $L_2$) bound, one can clip $x^*_t$ into the $\epsilon$ vicinity of $x$ or simply set $\alpha = \epsilon / T$ with $T$ being the number of iterations. It has been shown that iterative methods are stronger white-box adversaries than one-step methods at the cost of worse transferability [10, 24].
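As a rough illustration of the three gradient-based attacks above, the following NumPy sketch applies a one-step sign attack, a one-step $L_2$-normalized attack and an iterative sign attack to a toy linear softmax classifier. The classifier, its weight matrix `W`, the helper names and the values of `eps` and `steps` are assumptions made for this example only and are not the models or settings used in the experiments; the step size follows the "step size = bound / number of iterations" choice mentioned above.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())          # shift for numerical stability
    return e / e.sum()

def loss_and_grad(W, x, y):
    """Cross-entropy loss of a toy linear softmax classifier and its gradient w.r.t. x."""
    p = softmax(W @ x)
    onehot = np.zeros_like(p)
    onehot[y] = 1.0
    return -np.log(p[y]), W.T @ (p - onehot)

def fgsm(W, x, y, eps):
    """One step of size eps along the sign of the gradient (L_inf bound)."""
    _, grad = loss_and_grad(W, x, y)
    return x + eps * np.sign(grad)

def fgm(W, x, y, eps):
    """One step of size eps along the L2-normalized gradient (L2 bound)."""
    _, grad = loss_and_grad(W, x, y)
    return x + eps * grad / (np.linalg.norm(grad) + 1e-12)

def ifgsm(W, x, y, eps, steps):
    """Repeated sign steps with alpha = eps / steps, so the total stays within eps."""
    alpha = eps / steps
    x_adv = x.copy()
    for _ in range(steps):
        _, grad = loss_and_grad(W, x_adv, y)
        x_adv = x_adv + alpha * np.sign(grad)
    return x_adv

# Toy setup (hypothetical values, for illustration only).
rng = np.random.default_rng(0)
W = rng.normal(size=(3, 8))          # 3 classes, 8 input dimensions
x = rng.normal(size=8)
y = int(np.argmax(W @ x))            # treat the current prediction as the true label
eps = 0.5

x_fgsm = fgsm(W, x, y, eps)
x_fgm = fgm(W, x, y, eps)
x_ifgsm = ifgsm(W, x, y, eps, steps=10)
clean_loss = loss_and_grad(W, x, y)[0]
```

Because the loss of this toy linear model is convex in the input, each of the three perturbations increases the loss; for deep networks no such guarantee holds, which is part of what the comparison in this paper is about.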
Optimization-based methods [23] directly optimize the distance between the real and adversarial examples subject to the misclassification of adversarial examples. Box-constrained L-BFGS can be used to solve such a problem. A more sophisticated way [1] is solving:
$$\arg\min_{x^*} \; \lambda \cdot \|x^* - x\|_p - J(x^*, y). \tag{4}$$
Since it directly optimizes the distance between an adversarial example and the corresponding real example, there is no guarantee that the $L_\infty$ ($L_2$) distance is less than the required value. Optimization-based methods also lack efficacy in black-box attacks, just like iterative methods.
2.2 Defense methods
Among many attempts [13, 3, 15, 10, 24, 17, 11], adversarial training is the most extensively investigated way to increase the robustness of DNNs [5, 10, 24]. By injecting adversarial examples into the training procedure, the adversarially trained models learn to resist the perturbations in the gradient direction of the loss function. However, they do not confer robustness to black-box attacks due to the coupling of the generation of adversarial examples and the parameters being trained. Ensemble adversarial training [24] augments the training data with adversarial samples produced not only from the model being trained, but also from other hold-out models. Therefore, the ensemble adversarially trained models are robust against one-step attacks and black-box attacks.
3 Methodology
In this paper, we propose a broad class of momentum iterative gradient-based methods to generate adversarial examples, which can fool white-box models as well as black-box models. In this section, we elaborate on the proposed algorithms. We first illustrate how to integrate momentum into iterative FGSM, which induces a momentum iterative fast gradient sign method (MIFGSM) to generate adversarial examples satisfying the $L_\infty$ norm restriction in the non-targeted attack fashion. We then present several methods for efficiently attacking an ensemble of models. Finally, we extend MIFGSM to the $L_2$ norm bound and targeted attacks, yielding a broad class of attack methods.
3.1 Momentum iterative fast gradient sign method
The momentum method [18] is a technique for accelerating gradient descent algorithms by accumulating a velocity vector in the gradient direction of the loss function across iterations. The memorization of previous gradients helps to barrel through narrow valleys, small humps and poor local minima or maxima [4]. The momentum method also shows its effectiveness in stochastic gradient descent to stabilize the updates [20]. We apply the idea of momentum to generate adversarial examples and obtain tremendous benefits.
To generate a non-targeted adversarial example $x^*$ from a real example $x$, which satisfies the $L_\infty$ norm bound, gradient-based approaches seek the adversarial example by solving the constrained optimization problem
$$\arg\max_{x^*} J(x^*, y), \quad \text{s.t.} \; \|x^* - x\|_\infty \leq \epsilon, \tag{5}$$
where $\epsilon$ is the size of adversarial perturbation. FGSM generates an adversarial example by applying the sign of the gradient to a real example only once (in Eq. (1)), under the assumption that the decision boundary is linear around the data point. However, in practice, the linear assumption may not hold when the distortion is large [12], which makes the adversarial example generated by FGSM “underfit” the model, limiting its attack ability. In contrast, iterative FGSM greedily moves the adversarial example in the direction of the sign of the gradient in each iteration (in Eq. (3)). Therefore, the adversarial example can easily drop into poor local maxima and “overfit” the model, which makes it unlikely to transfer across models.
In order to break such a dilemma, we integrate momentum into the iterative FGSM for the purpose of stabilizing update directions and escaping from poor local maxima. Therefore, the momentum-based method retains the transferability of adversarial examples when increasing iterations, and at the same time acts as a strong adversary for the white-box models like iterative FGSM. It alleviates the trade-off between attack ability and transferability, demonstrating strong black-box attacks.
The momentum iterative fast gradient sign method (MIFGSM) is summarized in Algorithm 1, whose update rules are
$$g_{t+1} = \mu \cdot g_t + \frac{\nabla_x J(x^*_t, y)}{\|\nabla_x J(x^*_t, y)\|_1}, \tag{6}$$
$$x^*_{t+1} = x^*_t + \alpha \cdot \mathrm{sign}(g_{t+1}). \tag{7}$$
Specifically, $g_t$ gathers the gradients of the first $t$ iterations with a decay factor $\mu$, defined in Eq. (6). Then the adversarial example $x^*_t$ until the $t$-th iteration is perturbed in the direction of the sign of $g_t$ with a step size $\alpha$ in Eq. (7). If $\mu$ equals $0$, MIFGSM degenerates to the iterative FGSM. In each iteration, the current gradient $\nabla_x J(x^*_t, y)$ is normalized by the $L_1$ distance (any distance measure is feasible) of itself, because we notice that the scale of the gradients in different iterations varies in magnitude.
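The update described by Eqs. (6) and (7) can be sketched in a few lines of NumPy. This is a minimal sketch and not the authors' implementation: the toy linear softmax classifier, the function names, the zero initialization of the accumulated gradient, the choice of step size as `eps / steps`, and the values `eps=0.5`, `steps=10` and `mu=1.0` are all assumptions made for illustration.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def ce_grad(W, x, y):
    """Gradient of the cross-entropy loss of a toy linear softmax classifier w.r.t. x."""
    p = softmax(W @ x)
    p[y] -= 1.0                      # softmax output minus one-hot label
    return W.T @ p

def mi_fgsm(grad_fn, x, y, eps, steps, mu):
    """Momentum iterative sign attack in the spirit of Eqs. (6) and (7)."""
    alpha = eps / steps                                    # assumed step size: eps / T
    g = np.zeros_like(x)                                   # accumulated gradient (assumed to start at 0)
    x_adv = x.copy()                                       # start from the real example
    for _ in range(steps):
        grad = grad_fn(x_adv, y)
        g = mu * g + grad / (np.abs(grad).sum() + 1e-12)   # Eq. (6): add the L1-normalized gradient
        x_adv = x_adv + alpha * np.sign(g)                 # Eq. (7): step along the sign of g
    return x_adv

# Toy setup (hypothetical values, for illustration only).
rng = np.random.default_rng(0)
W = rng.normal(size=(3, 8))
x = rng.normal(size=8)
y = int(np.argmax(W @ x))
grad_fn = lambda x_in, label: ce_grad(W, x_in, label)

x_mi = mi_fgsm(grad_fn, x, y, eps=0.5, steps=10, mu=1.0)
x_plain = mi_fgsm(grad_fn, x, y, eps=0.5, steps=10, mu=0.0)   # mu = 0: no memory of past gradients
```

With `mu=0.0` the accumulated gradient is just the current normalized gradient, so the loop reproduces the plain iterative sign attack, which matches the degenerate case noted above.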
3.2 Attacking ensemble of models
In this section, we study how to attack an ensemble of models efficiently. Ensemble methods have been broadly adopted in research and competitions for enhancing performance and improving robustness [6, 8, 2]. The idea of ensembling can also be applied to adversarial attacks: if an example remains adversarial for multiple models, it may capture an intrinsic direction that always fools these models and is more likely to transfer to other models at the same time [12], thus enabling powerful black-box attacks.
We propose to attack multiple models whose logit activations (logits are the input values to softmax) are fused together, and we call this method ensemble in logits. Because the logits capture the logarithm relationships between the probability predictions, an ensemble of models fused by logits aggregates the fine detailed outputs of all models, whose vulnerability can be easily discovered. Specifically, to attack an ensemble of $K$ models, we fuse the logits as
$$l(x) = \sum_{k=1}^{K} w_k l_k(x), \tag{8}$$
where $l_k(x)$ are the logits of the $k$-th model, and $w_k$ is the ensemble weight with $w_k \geq 0$ and $\sum_{k=1}^{K} w_k = 1$. The loss function $J(x, y)$ is defined as the softmax cross-entropy loss given the ground-truth label $y$ and the logits $l(x)$:
$$J(x, y) = -\mathbf{1}_y \cdot \log(\mathrm{softmax}(l(x))), \tag{9}$$
where $\mathbf{1}_y$ is the one-hot encoding of $y$. We summarize the MIFGSM algorithm for attacking multiple models whose logits are averaged in Algorithm 2.
For comparison, we also introduce two alternative ensemble schemes, one of which has already been studied [12]. Specifically, $K$ models can be averaged in predictions [12] as $p(x) = \sum_{k=1}^{K} w_k p_k(x)$, where $p_k(x)$ is the predicted probability of the $k$-th model given input $x$. $K$ models can also be averaged in loss as $J(x, y) = \sum_{k=1}^{K} w_k J_k(x, y)$. In these three methods, the only difference is where to combine the outputs of multiple models, but they result in different attack abilities. We empirically find that the ensemble in logits performs better than the ensemble in predictions and the ensemble in loss, among various attack methods and various models in the ensemble, which will be demonstrated in Sec. 4.3.1.
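The three fusion points can be contrasted with a small NumPy sketch that computes the cross-entropy loss of an ensemble when the weighted average is taken over logits, over predicted probabilities, or over per-model losses. The function names, the random logits standing in for model outputs, and the equal weights are assumptions made for this example; in an attack, the gradient of the chosen loss with respect to the input would then drive the update.

```python
import numpy as np

def log_softmax(z):
    z = z - z.max()                                  # shift for numerical stability
    return z - np.log(np.exp(z).sum())

def loss_ensemble_in_logits(logits, weights, y):
    """Fuse the logits first, then apply softmax cross-entropy."""
    fused = sum(w * l for w, l in zip(weights, logits))
    return -log_softmax(fused)[y]

def loss_ensemble_in_predictions(logits, weights, y):
    """Average the predicted probabilities, then take the negative log-likelihood."""
    probs = sum(w * np.exp(log_softmax(l)) for w, l in zip(weights, logits))
    return -np.log(probs[y])

def loss_ensemble_in_loss(logits, weights, y):
    """Average the per-model cross-entropy losses."""
    return sum(-w * log_softmax(l)[y] for w, l in zip(weights, logits))

# Hypothetical outputs of K = 3 models on one input with 5 classes.
rng = np.random.default_rng(0)
logits = [rng.normal(size=5) for _ in range(3)]
weights = [1 / 3, 1 / 3, 1 / 3]                      # equal ensemble weights
y = 2                                                # assumed ground-truth label

j_logits = loss_ensemble_in_logits(logits, weights, y)
j_pred = loss_ensemble_in_predictions(logits, weights, y)
j_loss = loss_ensemble_in_loss(logits, weights, y)
```

The three losses coincide when all models produce the same logits and differ otherwise, which is why the choice of fusion point can change the resulting gradients and therefore the attack.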

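Table 1: Success rates (%) of non-targeted attacks against the seven models. Each row gives the model used to craft the adversarial examples and the attack method.

| White-box model | Attack | Incv3 | Incv4 | IncResv2 | Res152 | Incv3_{ens3} | Incv3_{ens4} | IncResv2_{ens} |
|---|---|---|---|---|---|---|---|---|
| Incv3 | FGSM | 72.3 | 28.2 | 26.2 | 25.3 | 11.3 | 10.9 | 4.8 |
| Incv3 | IFGSM | 100.0 | 22.8 | 19.9 | 16.2 | 7.5 | 6.4 | 4.1 |
| Incv3 | MIFGSM | 100.0 | 48.8 | 48.0 | 35.6 | 15.1 | 15.2 | 7.8 |
| Incv4 | FGSM | 32.7 | 61.0 | 26.6 | 27.2 | 13.7 | 11.9 | 6.2 |
| Incv4 | IFGSM | 35.8 | 99.9 | 24.7 | 19.3 | 7.8 | 6.8 | 4.9 |
| Incv4 | MIFGSM | 65.6 | 99.9 | 54.9 | 46.3 | 19.8 | 17.4 | 9.6 |
| IncResv2 | FGSM | 32.6 | 28.1 | 55.3 | 25.8 | 13.1 | 12.1 | 7.5 |
| IncResv2 | IFGSM | 37.8 | 20.8 | 99.6 | 22.8 | 8.9 | 7.8 | 5.8 |
| IncResv2 | MIFGSM | 69.8 | 62.1 | 99.5 | 50.6 | 26.1 | 20.9 | 15.7 |
| Res152 | FGSM | 35.0 | 28.2 | 27.5 | 72.9 | 14.6 | 13.2 | 7.5 |
| Res152 | IFGSM | 26.7 | 22.7 | 21.2 | 98.6 | 9.3 | 8.9 | 6.2 |
| Res152 | MIFGSM | 53.6 | 48.9 | 44.7 | 98.5 | 22.1 | 21.7 | 12.9 |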

3.3 Extensions
The momentum iterative methods can be easily generalized to other attack settings. By replacing the current gradient with the accumulated gradient of all previous steps, any iterative method can be extended to its momentum variant. Here we introduce the methods for generating adversarial examples in terms of the $L_2$ norm bound attacks and the targeted attacks.
To find an adversarial example within the $\epsilon$ vicinity of a real example measured by $L_2$ distance as $\|x^* - x\|_2 \leq \epsilon$, the momentum variant of iterative fast gradient method (MIFGM) can be written as
$$x^*_{t+1} = x^*_t + \alpha \cdot \frac{g_{t+1}}{\|g_{t+1}\|_2}, \tag{10}$$
where $g_{t+1}$ is defined in Eq. (6) and $\alpha = \epsilon / T$ with $T$ standing for the total number of iterations.
For targeted attacks, the objective for finding an adversarial example misclassified as a target class $y^*$ is to minimize the loss function $J(x^*, y^*)$. The accumulated gradient is derived as
$$g_{t+1} = \mu \cdot g_t + \frac{\nabla_x J(x^*_t, y^*)}{\|\nabla_x J(x^*_t, y^*)\|_1}. \tag{11}$$
The targeted MIFGSM with an $L_\infty$ norm bound is
$$x^*_{t+1} = x^*_t - \alpha \cdot \mathrm{sign}(g_{t+1}), \tag{12}$$
and the targeted MIFGM with an $L_2$ norm bound is
$$x^*_{t+1} = x^*_t - \alpha \cdot \frac{g_{t+1}}{\|g_{t+1}\|_2}. \tag{13}$$
Therefore, we introduce a broad class of momentum iterative methods for attacks in various settings, whose effectiveness is demonstrated in Sec. 4.
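The extensions above differ from the basic method only in how the step is formed from the accumulated gradient and in the sign of the update. The following NumPy sketch folds both choices into one function; it is an illustrative sketch under stated assumptions, not the authors' code. The toy linear softmax classifier, the names `momentum_attack`, `norm` and `targeted`, the zero-initialized accumulator, the step size `eps / steps` and all numeric values are hypothetical.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def ce_grad(W, x, label):
    """Gradient of the cross-entropy loss w.r.t. x for a toy linear softmax classifier."""
    p = softmax(W @ x)
    p[label] -= 1.0
    return W.T @ p

def momentum_attack(grad_fn, x, label, eps, steps, mu, norm="linf", targeted=False):
    """Momentum iterative attack with a sign step (L_inf) or a normalized step (L2)."""
    alpha = eps / steps
    direction = -1.0 if targeted else 1.0      # targeted attacks descend the loss of the target label
    g = np.zeros_like(x)
    x_adv = x.copy()
    for _ in range(steps):
        grad = grad_fn(x_adv, label)
        g = mu * g + grad / (np.abs(grad).sum() + 1e-12)   # accumulate the L1-normalized gradient
        if norm == "linf":
            step = np.sign(g)                              # sign step
        elif norm == "l2":
            step = g / (np.linalg.norm(g) + 1e-12)         # unit-length step in L2
        else:
            raise ValueError("norm must be 'linf' or 'l2'")
        x_adv = x_adv + direction * alpha * step
    return x_adv

# Toy setup (hypothetical values, for illustration only).
rng = np.random.default_rng(0)
W = rng.normal(size=(3, 8))
x = rng.normal(size=8)
y = int(np.argmax(W @ x))
target = (y + 1) % 3                           # an arbitrary label different from y
grad_fn = lambda x_in, label: ce_grad(W, x_in, label)

x_l2 = momentum_attack(grad_fn, x, y, eps=0.5, steps=10, mu=1.0, norm="l2")
x_tgt = momentum_attack(grad_fn, x, target, eps=0.1, steps=1, mu=1.0,
                        norm="linf", targeted=True)
```

Since every $L_2$ step has length at most `alpha`, the total perturbation after `steps` iterations stays within `eps` by the triangle inequality, so no explicit clipping is used in this sketch.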
4 Experiments
In this section, we conduct extensive experiments on the ImageNet dataset [19] to validate the effectiveness of the proposed methods. We first specify the experimental settings in Sec. 4.1. Then we report the results for attacking a single model in Sec. 4.2 and an ensemble of models in Sec. 4.3. Our methods won both the NIPS 2017 Non-targeted and Targeted Adversarial Attack competitions, with the configurations introduced in Sec. 4.4.
4.1 Setup
We study seven models. Four are normally trained models: Inception v3 (Incv3) [22], Inception v4 (Incv4), Inception Resnet v2 (IncResv2) [21] and Resnet v2-152 (Res152) [7]. The other three are trained by ensemble adversarial training: Incv3_{ens3}, Incv3_{ens4} and IncResv2_{ens} [24]. We will simply refer to the last three as “adversarially trained models” without ambiguity.
It is less meaningful to study the success rates of attacks if the models cannot classify the original image correctly. Therefore, we randomly choose images belonging to the categories from the ILSVRC 2012 validation set, which are all correctly classified by them.
In our experiments, we compare our methods to one-step gradient-based methods and iterative methods. Since optimization-based methods cannot explicitly control the distance between the adversarial examples and the corresponding real examples, they are not directly comparable to ours, but they have similar properties to iterative methods, as discussed in Sec. 2.1. For clarity, we only report the results based on the $L_\infty$ norm bound for non-targeted attacks, and leave the results based on the $L_2$ norm bound and targeted attacks to the supplementary material. The findings in this paper are general across different attack settings.
4.2 Attacking a single model
We report in Table 1 the success rates of attacks against the models we consider. The adversarial examples are generated for Incv3, Incv4, IncResv2 and Res152 respectively using the FGSM, iterative FGSM (IFGSM) and MIFGSM attack methods. The success rates are the misclassification rates of the corresponding models with adversarial images as inputs. The maximum perturbation is set to among all experiments, with pixel value in . The number of iterations is for IFGSM and MIFGSM, and the decay factor is , which will be studied in Sec. 4.2.1.
From the table, we can observe that MIFGSM remains a strong white-box adversary like IFGSM, since it can attack a white-box model with a near 100% success rate. On the other hand, IFGSM yields lower success rates for black-box attacks than one-step FGSM. By integrating momentum, our MIFGSM significantly outperforms both FGSM and IFGSM in black-box attacks. It obtains more than 2 times the success rates of IFGSM in most black-box attack cases, demonstrating the effectiveness of the proposed algorithm. We show two adversarial images in Fig. 1 generated for Incv3.
It should be noted that although our method greatly improves the success rates for black-box attacks, it is still ineffective for attacking adversarially trained models (e.g., less than 16% for IncResv2_{ens}) in the black-box manner. We show later in Sec. 4.3 that ensemble-based approaches greatly improve the results. Next, we study several aspects of MIFGSM that differ from vanilla iterative methods, to further explain why it performs well in practice.
4.2.1 Decay factor
The decay factor $\mu$ plays a key role in improving the success rates of attacks. If $\mu = 0$, momentum-based iterative methods trivially reduce to vanilla iterative methods. Therefore, we study the appropriate value of the decay factor. We attack Incv3 model by MIFGSM with the perturbation , the number of iterations , and the decay factor ranging from to with a granularity . We show the success rates of the generated adversarial examples against Incv3, Incv4, IncResv2 and Res152 in Fig. 2. The curve of the success rate against a black-box model is unimodal whose maximum value is obtained at around . When $\mu = 1$, another interpretation of $g_t$ defined in Eq. (6) is that it simply adds up all previous gradients to perform the current update.
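To make the role of the decay factor explicit, the accumulation in Eq. (6) can be unrolled. The short derivation below is a sketch under two assumptions that are not stated in this section: the accumulated gradient starts at zero ($g_0 = 0$), and $\hat{g}_i$ is shorthand, introduced here, for the normalized gradient at iteration $i$, so that Eq. (6) reads $g_{t+1} = \mu \cdot g_t + \hat{g}_t$.

```latex
g_{t+1} = \mu \, g_t + \hat{g}_t, \quad g_0 = 0
\;\Longrightarrow\;
g_{t+1} = \sum_{i=0}^{t} \mu^{\,t-i} \, \hat{g}_i .
% \mu = 0:  g_{t+1} = \hat{g}_t                  (only the current gradient: vanilla iterative method)
% \mu = 1:  g_{t+1} = \sum_{i=0}^{t} \hat{g}_i   (plain sum of all gradients seen so far)
```

Values of the decay factor between these two extremes weight older gradients geometrically less than recent ones.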
4.2.2 The number of iterations
We then study the effect of the number of iterations on the success rates when using IFGSM and MIFGSM. We adopt the same hyperparameters (i.e., , ) for attacking Incv3 model with the number of iterations ranging from to , and then evaluate the success rates of adversarial examples against Incv3, Incv4, IncResv2 and Res152 models, with the results shown in Fig. 3.
It can be observed that when increasing the number of iterations, the success rate of IFGSM against a black-box model gradually decreases, while that of MIFGSM remains at a high value. The results support our argument that the adversarial examples generated by iterative methods easily overfit a white-box model and are not likely to transfer across models. Momentum-based iterative methods, in contrast, help to alleviate the trade-off between white-box attacks and transferability, thus demonstrating a strong attack ability for white-box and black-box models simultaneously.

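Table 2: Success rates (%) of attacks against an ensemble of three white-box models (Ensemble) and the hold-out black-box model (Hold-out), for the three ensemble methods.

| Hold-out model | Ensemble method | FGSM (Ensemble) | FGSM (Hold-out) | IFGSM (Ensemble) | IFGSM (Hold-out) | MIFGSM (Ensemble) | MIFGSM (Hold-out) |
|---|---|---|---|---|---|---|---|
| Incv3 | Logits | 55.7 | 45.7 | 99.7 | 72.1 | 99.6 | 87.9 |
| Incv3 | Predictions | 52.3 | 42.7 | 95.1 | 62.7 | 97.1 | 83.3 |
| Incv3 | Loss | 50.5 | 42.2 | 93.8 | 63.1 | 97.0 | 81.9 |
| Incv4 | Logits | 56.1 | 39.9 | 99.8 | 61.0 | 99.5 | 81.2 |
| Incv4 | Predictions | 50.9 | 36.5 | 95.5 | 52.4 | 97.1 | 77.4 |
| Incv4 | Loss | 49.3 | 36.2 | 93.9 | 50.2 | 96.1 | 72.5 |
| IncResv2 | Logits | 57.2 | 38.8 | 99.5 | 54.4 | 99.5 | 76.5 |
| IncResv2 | Predictions | 52.1 | 35.8 | 97.1 | 46.9 | 98.0 | 73.9 |
| IncResv2 | Loss | 50.7 | 35.2 | 96.2 | 45.9 | 97.4 | 70.8 |
| Res152 | Logits | 53.5 | 35.9 | 99.6 | 43.5 | 99.6 | 69.6 |
| Res152 | Predictions | 51.9 | 34.6 | 99.9 | 41.0 | 99.8 | 67.0 |
| Res152 | Loss | 50.4 | 34.1 | 98.2 | 40.1 | 98.8 | 65.2 |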
4.2.3 Update directions
To interpret why MIFGSM demonstrates better transferability, we further examine the update directions given by IFGSM and MIFGSM along the iterations. We calculate the cosine similarity of two successive perturbations and show the results in Fig. 4 when attacking Incv3. The update direction of MIFGSM is more stable than that of IFGSM due to the larger value of cosine similarity in MIFGSM.
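The stability measure described above can be computed with a few lines of NumPy. This is a minimal sketch: it assumes that "two successive perturbations" means the update vectors applied in consecutive iterations, and the example vectors are hypothetical values chosen only to show the computation.

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine of the angle between two update vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

def successive_cosines(updates):
    """Cosine similarity between each pair of consecutive updates."""
    return [cosine_similarity(u, v) for u, v in zip(updates[:-1], updates[1:])]

# Hypothetical sign updates: the first three agree everywhere, the last flips one coordinate.
updates = [
    np.array([1.0, -1.0, 1.0, 1.0]),
    np.array([1.0, -1.0, 1.0, 1.0]),
    np.array([1.0, -1.0, 1.0, 1.0]),
    np.array([1.0, 1.0, 1.0, 1.0]),
]
sims = successive_cosines(updates)   # one value per pair of consecutive iterations
```

A value close to 1 means two consecutive updates point in nearly the same direction, and lower values indicate that the update direction changed between iterations.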
Recall that transferability comes from the fact that models learn similar decision boundaries around a data point [12]. Although the decision boundaries are similar, they are unlikely to be the same due to the highly non-linear structure of DNNs. So there may exist some exceptional decision regions around a data point for a model (holes as shown in Fig. 4&5 in [12]), which are hard to transfer to other models. These regions correspond to poor local maxima in the optimization process, and iterative methods can easily be trapped in such regions, resulting in less transferable adversarial examples. On the other hand, the stabilized update directions obtained by the momentum methods, as observed in Fig. 4, can help to escape from these exceptional regions, resulting in better transferability for adversarial attacks. Another interpretation is that the stabilized update directions make the $L_2$ norm of the perturbations larger, which may be helpful for transferability.
4.2.4 The size of perturbation
We finally study the influence of the size of adversarial perturbation on the success rates. We attack Incv3 model by FGSM, IFGSM and MIFGSM with ranging from to with the image intensity , and evaluate the performance on the white-box model Incv3 and a black-box model Res152. In our experiments, we set the step size in IFGSM and MIFGSM to , so the number of iterations grows linearly with the size of perturbation . The results are shown in Fig. 5.
For the white-box attack, iterative methods reach a 100% success rate quickly, but the success rate of one-step FGSM decreases when the perturbation is large. This phenomenon is largely attributable to the inappropriate assumption of the linearity of the decision boundary when the perturbation is large [12]. For the black-box attacks, although the success rates of these three methods grow linearly with the size of perturbation, MIFGSM’s success rate grows faster. In other words, to attack a black-box model with a required success rate, MIFGSM can use a smaller perturbation, which is less visually distinguishable for humans.
4.3 Attacking an ensemble of models
In this section, we show the experimental results of attacking an ensemble of models. We first compare the three ensemble methods introduced in Sec. 3.2, and then demonstrate that the adversarially trained models are vulnerable to our black-box attacks.
4.3.1 Comparison of ensemble methods
We compare the ensemble methods for attacks in this section. We include four models in our study, which are Incv3, Incv4, IncResv2 and Res152. In our experiments, we keep one model as the hold-out black-box model and attack an ensemble of the other three models by FGSM, IFGSM and MIFGSM respectively, to fully compare the results of the three ensemble methods, i.e., ensemble in logits, ensemble in predictions and ensemble in loss. We set to , the number of iterations in IFGSM and MIFGSM to , in MIFGSM to , and the ensemble weights equally. The results are shown in Table 2.
It can be observed that the ensemble in logits outperforms the ensemble in predictions and the ensemble in loss consistently among all the attack methods and different models in the ensemble for both the white-box and black-box attacks. Therefore, the ensemble in logits scheme is more suitable for adversarial attacks.
Another observation from Table 2 is that the adversarial examples generated by MIFGSM transfer at a high rate, enabling strong black-box attacks. For example, by attacking an ensemble of Incv4, IncResv2 and Res152 fused in logits without Incv3, the generated adversarial examples can fool Incv3 with an 87.9% success rate. Normally trained models show their great vulnerability against such an attack.
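The hold-out protocol used in this comparison can be written down as a short Python sketch. It only enumerates the splits and equal ensemble weights; the function name `leave_one_out` is hypothetical, and crafting and evaluating the adversarial examples for each split is outside the scope of this sketch.

```python
# Model abbreviations as used in this section.
models = ["Incv3", "Incv4", "IncResv2", "Res152"]

def leave_one_out(models):
    """Yield (hold-out model, ensemble members, equal ensemble weights) for every split."""
    for held_out in models:
        ensemble = [m for m in models if m != held_out]
        weights = [1.0 / len(ensemble)] * len(ensemble)   # equal weights that sum to 1
        yield held_out, ensemble, weights

splits = list(leave_one_out(models))
for held_out, ensemble, weights in splits:
    print(f"attack {ensemble} with weights {weights}, evaluate on {held_out}")
```

Each split corresponds to one group of rows in Table 2, with the held-out model playing the role of the black-box target.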
4.3.2 Attacking adversarially trained models

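Table 3: Success rates (%) of attacks against an ensemble of six white-box models (Ensemble) and the hold-out adversarially trained model (Hold-out).

| Hold-out model | Attack | Ensemble | Hold-out |
|---|---|---|---|
| Incv3_{ens3} | FGSM | 36.1 | 15.4 |
| Incv3_{ens3} | IFGSM | 99.6 | 18.6 |
| Incv3_{ens3} | MIFGSM | 99.6 | 37.6 |
| Incv3_{ens4} | FGSM | 33.0 | 15.0 |
| Incv3_{ens4} | IFGSM | 99.2 | 18.7 |
| Incv3_{ens4} | MIFGSM | 99.3 | 40.3 |
| IncResv2_{ens} | FGSM | 36.2 | 6.4 |
| IncResv2_{ens} | IFGSM | 99.5 | 9.9 |
| IncResv2_{ens} | MIFGSM | 99.7 | 23.3 |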

To attack the adversarially trained models in the black-box manner, we include all seven models introduced in Sec. 4.1. Similarly, we keep one adversarially trained model as the hold-out target model to evaluate the performance in the black-box manner, and attack the remaining six models in an ensemble, whose logits are fused together with equal ensemble weights. The perturbation is and the decay factor is . We compare the results of FGSM, IFGSM and MIFGSM with iterations. The results are shown in Table 3.
It can be seen that the adversarially trained models also cannot defend against our attacks effectively, which can fool Incv3_{ens4} with more than 40% of the adversarial examples. Therefore, the models obtained by ensemble adversarial training, the most robust models trained on ImageNet as far as we know, are vulnerable to our attacks in the black-box manner, thus raising new security issues for developing algorithms to learn robust deep learning models.
4.4 Competitions
There are three sub-competitions in the NIPS 2017 Adversarial Attacks and Defenses Competition: the Non-targeted Adversarial Attack, Targeted Adversarial Attack and Defense Against Adversarial Attack. The organizers provide ImageNet-compatible images for evaluating the attack and defense submissions. For each attack, one adversarial example is generated for each image with the size of perturbation ranging from to (specified by the organizers), and all adversarial examples run through all defense submissions to get the final score. We won first place in both the non-targeted attack and targeted attack by the method introduced in this paper. We specify the configurations of our submissions below.
For the non-targeted attack (source code is available at https://github.com/dongyp13/NonTargetedAdversarialAttacks), we implement the MIFGSM for attacking an ensemble of Incv3, Incv4, IncResv2, Res152, Incv3_{ens3}, Incv3_{ens4}, IncResv2_{ens} and Incv3_{adv} [10]. We adopt the ensemble in logits scheme. The ensemble weights are set as equally for the first seven models and for Incv3_{adv}. The number of iterations is and the decay factor is .
For the targeted attack (source code is available at https://github.com/dongyp13/TargetedAdversarialAttacks), we build two graphs for attacks. If the size of perturbation is smaller than , we attack Incv3 and IncResv2_{ens} with ensemble weights and ; otherwise we attack an ensemble of Incv3, Incv3_{ens3}, Incv3_{ens4}, IncResv2_{ens} and Incv3_{adv} with ensemble weights and . The number of iterations is and respectively, and the decay factor is also .
5 Discussion
Taking a different perspective, we think that finding an adversarial example is analogous to training a model, and the transferability of the adversarial example is analogous to the generalizability of the model. By taking a meta view, we actually “train” an adversarial example given a set of models as training data. In this way, the improved transferability obtained by the momentum and ensemble methods is reasonable, because the generalizability of a model is usually improved by adopting the momentum optimizer or training on more data. We think that other tricks (e.g., SGD) for enhancing the generalizability of a model could also be incorporated into adversarial attacks for better transferability.
6 Conclusion
In this paper, we propose a broad class of momentum-based iterative methods to boost adversarial attacks, which can effectively fool white-box models as well as black-box models. Our methods consistently outperform one-step gradient-based methods and vanilla iterative methods in the black-box manner. We conduct extensive experiments to validate the effectiveness of the proposed methods and explain why they work in practice. To further improve the transferability of the generated adversarial examples, we propose to attack an ensemble of models whose logits are fused together. We show that the models obtained by ensemble adversarial training are vulnerable to our black-box attacks, which raises new security issues for the development of more robust deep learning models.
Acknowledgements
The work is supported by the National NSF of China (Nos. 61620106010, 61621136008, 61332007, 61571261 and U1611461), Beijing Natural Science Foundation (No. L172037), Tsinghua Tiangong Institute for Intelligent Computing and the NVIDIA NVAIL Program, and partially funded by Microsoft Research Asia and Tsinghua-Intel Joint Research Institute.
References
 [1] N. Carlini and D. Wagner. Towards evaluating the robustness of neural networks. In IEEE Symposium on Security and Privacy, 2017.
 [2] R. Caruana, A. Niculescu-Mizil, G. Crew, and A. Ksikes. Ensemble selection from libraries of models. In ICML, 2004.
 [3] Y. Dong, H. Su, J. Zhu, and F. Bao. Towards interpretable deep neural networks by leveraging adversarial examples. arXiv preprint arXiv:1708.05493, 2017.
 [4] W. Duch and J. Korczak. Optimization and global minimization methods suitable for neural networks. Neural computing surveys, 2:163–212, 1998.
 [5] I. J. Goodfellow, J. Shlens, and C. Szegedy. Explaining and harnessing adversarial examples. In ICLR, 2015.
 [6] L. K. Hansen and P. Salamon. Neural network ensembles. IEEE transactions on pattern analysis and machine intelligence, 12(10):993–1001, 1990.
 [7] K. He, X. Zhang, S. Ren, and J. Sun. Identity mappings in deep residual networks. In ECCV, 2016.

 [8] A. Krogh and J. Vedelsby. Neural network ensembles, cross validation and active learning. In NIPS, 1994.
 [9] A. Kurakin, I. Goodfellow, and S. Bengio. Adversarial examples in the physical world. arXiv preprint arXiv:1607.02533, 2016.
 [10] A. Kurakin, I. Goodfellow, and S. Bengio. Adversarial machine learning at scale. In ICLR, 2017.
 [11] Y. Li and Y. Gal. Dropout inference in Bayesian neural networks with alpha-divergences. In ICML, 2017.
 [12] Y. Liu, X. Chen, C. Liu, and D. Song. Delving into transferable adversarial examples and black-box attacks. In ICLR, 2017.
 [13] J. H. Metzen, T. Genewein, V. Fischer, and B. Bischoff. On detecting adversarial perturbations. In ICLR, 2017.
 [14] S. M. Moosavi-Dezfooli, A. Fawzi, O. Fawzi, and P. Frossard. Universal adversarial perturbations. In CVPR, 2017.
 [15] T. Pang, C. Du, and J. Zhu. Robust deep learning via reverse cross-entropy training and thresholding test. arXiv preprint arXiv:1706.00633, 2017.
 [16] N. Papernot, P. McDaniel, I. Goodfellow, S. Jha, Z. B. Celik, and A. Swami. Practical black-box attacks against machine learning. In Proceedings of the 2017 ACM on Asia Conference on Computer and Communications Security, 2017.
 [17] N. Papernot, P. McDaniel, X. Wu, S. Jha, and A. Swami. Distillation as a defense to adversarial perturbations against deep neural networks. In IEEE Symposium on Security and Privacy, 2016.
 [18] B. T. Polyak. Some methods of speeding up the convergence of iteration methods. USSR Computational Mathematics and Mathematical Physics, 4(5):1–17, 1964.
 [19] O. Russakovsky, J. Deng, H. Su, J. Krause, S. Satheesh, S. Ma, Z. Huang, A. Karpathy, A. Khosla, M. Bernstein, et al. ImageNet large scale visual recognition challenge. International Journal of Computer Vision, 115(3):211–252, 2015.
 [20] I. Sutskever, J. Martens, G. Dahl, and G. Hinton. On the importance of initialization and momentum in deep learning. In ICML, 2013.
 [21] C. Szegedy, S. Ioffe, V. Vanhoucke, and A. A. Alemi. Inception-v4, Inception-ResNet and the impact of residual connections on learning. In AAAI, 2017.
 [22] C. Szegedy, V. Vanhoucke, S. Ioffe, J. Shlens, and Z. Wojna. Rethinking the inception architecture for computer vision. In CVPR, 2016.
 [23] C. Szegedy, W. Zaremba, I. Sutskever, J. Bruna, D. Erhan, I. Goodfellow, and R. Fergus. Intriguing properties of neural networks. In ICLR, 2014.
 [24] F. Tramèr, A. Kurakin, N. Papernot, D. Boneh, and P. McDaniel. Ensemble adversarial training: Attacks and defenses. In ICLR, 2018.
A Non-targeted attacks based on $L_2$ norm bound
We first perform non-targeted attacks based on the $L_2$ norm bound. The $L_2$ distance between an adversarial example $x^*$ and a real example $x$ is defined as
$$\|x^* - x\|_2 = \sqrt{\sum_{i=1}^{N} (x^*_i - x_i)^2}, \tag{14}$$
where $N$ is the dimension of the inputs $x$ and $x^*$, so the distance measure depends on $N$. For example, if the distance in each dimension between an adversarial example and a real example is $|x^*_i - x_i| = \epsilon$, the $L_2$ norm between them is $\epsilon\sqrt{N}$ while the $L_\infty$ norm is $\epsilon$. Therefore, we set the $L_2$ norm bound in proportion to $\sqrt{N}$ in our experiments, where $N$ is the dimension of the input to a network.
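The dependence on the input dimension can be checked numerically. The following is a minimal sketch in plain Python; the helper names, the per-coordinate perturbation `eps` and the dimensions are hypothetical values chosen for illustration, not settings from the experiments.

```python
import math

def l2_norm(v):
    """Euclidean (L2) norm of a flat list of numbers."""
    return math.sqrt(sum(c * c for c in v))

def linf_norm(v):
    """Largest absolute coordinate (L-infinity norm)."""
    return max(abs(c) for c in v)

# Hypothetical per-coordinate perturbation, for illustration only.
eps = 2.0
for n in (4, 100, 10000):
    delta = [eps] * n  # every coordinate is perturbed by exactly eps
    # The L-infinity norm stays at eps, while the L2 norm grows as eps * sqrt(n).
    print(n, linf_norm(delta), l2_norm(delta), eps * math.sqrt(n))
```

The same per-pixel change therefore corresponds to very different $L_2$ budgets for inputs of different sizes, which is why a bound that scales with $\sqrt{N}$ keeps the two settings comparable.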

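Table 4: Success rates (%) of non-targeted attacks based on the $L_2$ norm bound against the white-box ensemble (Ensemble) and the hold-out black-box model (Hold-out).

| Hold-out model | Attack | Ensemble | Hold-out |
|---|---|---|---|
| Incv3 | FGM | 47.3 | 52.7 |
| | IFGM | 99.1 | 65.3 |
| | MIFGM | 99.2 | 89.7 |
| Incv4 | FGM | 47.2 | 49.3 |
| | IFGM | 99.3 | 56.7 |
| | MIFGM | 99.4 | 88.0 |
| IncResv2 | FGM | 47.3 | 50.4 |
| | IFGM | 99.4 | 54.3 |
| | MIFGM | 99.5 | 86.1 |
| Res152 | FGM | 47.6 | 46.6 |
| | IFGM | 99.0 | 44.7 |
| | MIFGM | 99.5 | 81.4 |
| Incv3_{ens3} | FGM | 51.8 | 35.4 |
| | IFGM | 99.8 | 29.5 |
| | MIFGM | 99.6 | 59.8 |
| Incv3_{ens4} | FGM | 51.2 | 37.5 |
| | IFGM | 99.2 | 36.4 |
| | MIFGM | 99.7 | 66.5 |
| IncResv2_{ens} | FGM | 54.4 | 32.4 |
| | IFGM | 99.2 | 19.9 |
| | MIFGM | 99.8 | 56.4 |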

a.1 Attacking a single model
We include seven networks in this section, which are Incv3, Incv4, IncResv2, Res152, Incv3_{ens3}, Incv3_{ens4} and IncResv2_{ens}. We generate adversarial examples for Incv3, Incv4, IncResv2 and Res152 respectively, and measure the success rates of attacks on all models. We compare three attack methods, which are the fast gradient method (FGM, defined in Eq. (2)), iterative FGM (IFGM) and momentum iterative FGM (MIFGM, defined in Eq. (10)). We set the number of iterations to in IFGM and MIFGM, and the decay factor to in MIFGM.
The results are shown in Table 5. MIFGM attacks a white-box model with a near 100% success rate, as IFGM does, and significantly outperforms FGM and IFGM in black-box attacks. These conclusions are similar to those of the $L_\infty$ norm bound experiments and consistently demonstrate the effectiveness of the proposed momentum-based iterative methods.
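To make the relation between the three attacks concrete, here is a minimal sketch on a toy logistic model. Everything in it is an assumption for illustration: the model, the function names and the numeric values are hypothetical, and the update rule (accumulate the $L_1$-normalized gradient with decay factor `mu`, then step along the $L_2$-normalized accumulated direction with step size `eps / steps`) is one reading of the momentum iterative method; Eq. (10) remains the authoritative form.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def loss_grad(x, y, w, b):
    """Gradient of the logistic cross-entropy loss with respect to the input x."""
    p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
    return [(p - y) * wi for wi in w]

def mi_fgm(x, y, w, b, eps, steps, mu):
    """Momentum iterative attack under an L2 budget eps (toy sketch, assumed form)."""
    alpha = eps / steps            # per-step size, so the total L2 shift is at most eps
    g = [0.0] * len(x)             # accumulated gradient direction
    x_adv = list(x)
    for _ in range(steps):
        grad = loss_grad(x_adv, y, w, b)
        l1 = sum(abs(c) for c in grad) or 1.0
        g = [mu * gi + c / l1 for gi, c in zip(g, grad)]
        l2 = math.sqrt(sum(c * c for c in g)) or 1.0
        x_adv = [xi + alpha * gi / l2 for xi, gi in zip(x_adv, g)]
    return x_adv

# Hypothetical toy setup: steps=1 gives a one-step FGM, mu=0.0 gives IFGM.
x, w, b = [0.2, -0.1, 0.4], [1.0, -2.0, 0.5], 0.1
adv = mi_fgm(x, 1, w, b, eps=1.0, steps=10, mu=1.0)
print(adv)
```

Because every step has $L_2$ length `alpha`, the triangle inequality keeps the adversarial example within distance `eps` of the input without an explicit projection. Clipping to the valid pixel range is omitted in this sketch.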
a.2 Attacking an ensemble of models

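Table 5: Success rates (%) of non-targeted attacks based on the $L_2$ norm bound against seven models. Adversarial examples are crafted for the model in the first column.

| Model | Attack | Incv3 | Incv4 | IncResv2 | Res152 | Incv3_{ens3} | Incv3_{ens4} | IncResv2_{ens} |
|---|---|---|---|---|---|---|---|---|
| Incv3 | FGM | 76.2 | 41.0 | 43.1 | 41.3 | 34.6 | 34.9 | 26.2 |
| | IFGM | 100.0 | 39.9 | 36.4 | 27.5 | 17.5 | 19.2 | 10.9 |
| | MIFGM | 100.0 | 67.6 | 66.3 | 56.1 | 44.4 | 45.5 | 33.9 |
| Incv4 | FGM | 47.3 | 63.1 | 37.3 | 39.0 | 35.3 | 33.9 | 27.7 |
| | IFGM | 52.8 | 100.0 | 42.0 | 33.5 | 21.9 | 19.9 | 13.8 |
| | MIFGM | 76.9 | 100.0 | 69.6 | 59.7 | 51.2 | 51.0 | 39.4 |
| IncResv2 | FGM | 48.2 | 38.9 | 60.4 | 39.8 | 36.6 | 35.5 | 30.5 |
| | IFGM | 56.0 | 47.5 | 99.6 | 36.9 | 27.5 | 22.9 | 18.7 |
| | MIFGM | 81.7 | 75.8 | 99.6 | 66.9 | 62.7 | 57.7 | 58.8 |
| Res152 | FGM | 50.8 | 40.7 | 42.0 | 75.1 | 36.5 | 36.0 | 31.6 |
| | IFGM | 47.6 | 43.9 | 43.9 | 99.4 | 32.7 | 32.3 | 25.2 |
| | MIFGM | 71.3 | 65.5 | 64.3 | 99.6 | 56.7 | 55.4 | 51.5 |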

In these experiments, we again include the Incv3, Incv4, IncResv2, Res152, Incv3_{ens3}, Incv3_{ens4} and IncResv2_{ens} models. We keep one model as the hold-out black-box model and attack an ensemble of the other six models by FGM, IFGM and MIFGM respectively. We set the number of iterations to in IFGM and MIFGM and the decay factor to in MIFGM, and use equal ensemble weights.
We show the results in Table 4. Iterative methods, including IFGM and MIFGM, obtain a near 100% success rate against an ensemble of white-box models. MIFGM also attacks a black-box model with a much higher success rate, showing the good transferability of the adversarial examples it generates. For adversarially trained models, MIFGM fools them with success rates of about 60%, revealing the great vulnerability of adversarially trained models to our black-box attacks.
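The hold-out protocol described above can be sketched as follows. This is an illustrative outline only: the function names are hypothetical, the model names are plain strings standing in for real networks, and fusing the ensemble by a weighted sum of logits is an assumption of this sketch rather than something stated in this appendix.

```python
MODELS = ["Incv3", "Incv4", "IncResv2", "Res152",
          "Incv3_ens3", "Incv3_ens4", "IncResv2_ens"]

def leave_one_out(models):
    """Yield (hold_out, ensemble, weights), with equal weights over the ensemble."""
    for held in models:
        ensemble = [m for m in models if m != held]
        weights = [1.0 / len(ensemble)] * len(ensemble)
        yield held, ensemble, weights

def fuse_logits(per_model_logits, weights):
    """Weighted sum of per-model logit vectors (assumed fusion rule)."""
    n_classes = len(per_model_logits[0])
    return [sum(w * logits[k] for w, logits in zip(weights, per_model_logits))
            for k in range(n_classes)]

for held, ensemble, weights in leave_one_out(MODELS):
    # Adversarial examples would be crafted against `ensemble` (white-box)
    # and then evaluated on `held` (black-box).
    print(held, len(ensemble), round(sum(weights), 6))
```

Each of the seven models serves once as the black-box target, which yields the seven row groups of Table 4.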
B Targeted attacks
B.1 $L_\infty$ norm bound

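Table 6: Success rates (%) of targeted attacks based on the $L_\infty$ norm bound against the white-box ensemble (Ensemble) and the hold-out black-box model (Hold-out).

| Hold-out model | Attack | Ensemble | Hold-out |
|---|---|---|---|
| Incv3 | FGSM | 0.5 | 0.5 |
| | IFGSM | 99.6 | 9.0 |
| | MIFGSM | 99.5 | 17.6 |
| Incv4 | FGSM | 0.3 | 0.4 |
| | IFGSM | 99.9 | 7.0 |
| | MIFGSM | 99.8 | 15.6 |
| IncResv2 | FGSM | 0.4 | 0.2 |
| | IFGSM | 99.9 | 7.3 |
| | MIFGSM | 99.8 | 16.1 |
| Res152 | FGSM | 0.1 | 0.5 |
| | IFGSM | 99.6 | 3.3 |
| | MIFGSM | 99.5 | 11.4 |
| Incv3_{ens3} | FGSM | 0.3 | 0.1 |
| | IFGSM | 99.7 | 0.1 |
| | MIFGSM | 99.7 | 0.5 |
| Incv3_{ens4} | FGSM | 0.2 | 0.1 |
| | IFGSM | 99.9 | 0.4 |
| | MIFGSM | 99.8 | 0.9 |
| IncResv2_{ens} | FGSM | 0.5 | 0.1 |
| | IFGSM | 99.7 | 0.1 |
| | MIFGSM | 99.8 | 0.2 |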


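Table 7: Success rates (%) of targeted attacks based on the $L_2$ norm bound against the white-box ensemble (Ensemble) and the hold-out black-box model (Hold-out).

| Hold-out model | Attack | Ensemble | Hold-out |
|---|---|---|---|
| Incv3 | FGM | 0.7 | 0.4 |
| | IFGM | 99.7 | 17.8 |
| | MIFGM | 99.5 | 21.0 |
| Incv4 | FGM | 0.7 | 0.5 |
| | IFGM | 99.9 | 15.2 |
| | MIFGM | 99.8 | 21.8 |
| IncResv2 | FGM | 0.7 | 0.7 |
| | IFGM | 99.8 | 16.4 |
| | MIFGM | 99.9 | 21.7 |
| Res152 | FGM | 0.5 | 0.4 |
| | IFGM | 99.5 | 9.2 |
| | MIFGM | 99.6 | 17.4 |
| Incv3_{ens3} | FGM | 0.6 | 0.2 |
| | IFGM | 99.9 | 0.7 |
| | MIFGM | 99.6 | 1.6 |
| Incv3_{ens4} | FGM | 0.5 | 0.2 |
| | IFGM | 99.7 | 1.7 |
| | MIFGM | 100.0 | 2.0 |
| IncResv2_{ens} | FGM | 0.6 | 0.4 |
| | IFGM | 99.6 | 0.5 |
| | MIFGM | 99.8 | 1.9 |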

Targeted attacks are much more difficult than non-targeted attacks in the black-box manner, since they require the black-box model to output the specific target label. For DNNs trained on a dataset with thousands of output categories, such as the ImageNet dataset, finding targeted adversarial examples with only one model to fool a black-box model is impossible [12]. Thus we perform targeted attacks by integrating the ensemble-based approach.
We show the results in Table 6, where the success rate is the percentage of adversarial examples that the model classifies as the target label. Similar to the experimental settings in Sec. 4.3.2, we keep one model to test the performance of black-box attacks, with the targeted adversarial examples generated for the ensemble of the other six models. We set the size of perturbation to , decay factor to and the number of iterations to for IFGSM and MIFGSM. We can see that one-step FGSM can hardly attack either the ensemble of models or the target black-box models. The success rates of the adversarial examples generated by MIFGSM are close to 100% for white-box models and higher than 10% for normally trained black-box models. Unfortunately, MIFGSM cannot effectively generate targeted adversarial examples that fool adversarially trained models, which remains an open issue for future research.
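The targeted success rate defined above can be written down directly. The sketch below uses hypothetical function names and made-up label lists. The non-targeted definition (a prediction that differs from the true label counts as a success) is the usual convention and is an assumption here, included only for contrast.

```python
def targeted_success_rate(preds, target_labels):
    """Percentage of adversarial examples classified as their target label."""
    hits = sum(p == t for p, t in zip(preds, target_labels))
    return 100.0 * hits / len(preds)

def nontargeted_success_rate(preds, true_labels):
    """Percentage of adversarial examples NOT classified as their true label (assumed convention)."""
    hits = sum(p != t for p, t in zip(preds, true_labels))
    return 100.0 * hits / len(preds)

# Made-up predictions of a black-box model on four adversarial examples.
preds = [3, 7, 7, 1]
true_labels = [3, 2, 5, 1]
target_labels = [7, 7, 9, 4]

print(nontargeted_success_rate(preds, true_labels))   # 50.0
print(targeted_success_rate(preds, target_labels))    # 25.0
```

In this toy case two examples are misclassified but only one lands on its target label, which illustrates why targeted success rates are the stricter measure.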
B.2 $L_2$ norm bound
We draw similar conclusions for targeted attacks based on the $L_2$ norm bound. In our experiments, we again include the Incv3, Incv4, IncResv2, Res152, Incv3_{ens3}, Incv3_{ens4} and IncResv2_{ens} models. We keep one model as the hold-out black-box model and attack an ensemble of the other six models with equal ensemble weights by FGM, IFGM and MIFGM respectively. We set the maximum perturbation to where $N$ is the dimension of inputs, the number of iterations to in IFGM and MIFGM, and the decay factor to in MIFGM. We report the success rates of adversarial examples against the white-box ensemble of models and the black-box target model in Table 7. MIFGM can easily fool white-box models, but it cannot effectively fool the adversarially trained models in targeted black-box attacks.