Recent research has shown that machine learning models can easily be fooled by adversarial examples, which are crafted by adding carefully designed perturbations to the inputs [5, 17]. From the perspective of attack goals, there are two types of adversarial attacks: (1) the non-targeted attack, which crafts adversarial examples misclassified as any wrong label; (2) the targeted attack, which crafts adversarial examples classified as a specified target label. The targeted attack is of greater practical interest, but it typically has a lower success rate.
In a black-box attack protocol, it is possible to exploit the transferability of crafted adversarial examples, which can also be misclassified by unseen target models [12, 9, 11]. While existing approaches are effective at generating non-targeted transferable adversarial examples, targeted adversarial examples transfer poorly. Even in a white-box attack protocol, the targeted attack has little effect on some adversarially robust models. As shown in Fig. 1, targeted attacks on three models are conducted in the top row with an ensemble attack method. Inc-v3 and AdvInc-v3 misclassify the adversarial image as the target label “cup”, while the adversarially robust model ResnextDenoiseAll still classifies it as the true label. This shows that adversarially robust models are hard to attack in a targeted manner, even in the white-box setting.
In this paper, we propose a new adversarial attack mechanism that maximizes the attack success rate by considering both targeted and non-targeted attacks while prioritizing the targeted attack. When the targeted attack fails, the adversarial example is still expected to succeed as a non-targeted attack. As shown in the bottom row of Fig. 1, a single generated adversarial example is able to fool all deployed models, and even on ResNextDenoiseAll the non-targeted attack succeeds. In our attack, the models are first divided into two groups, easy-to-attack models and robust models, and the two groups are then attacked alternately. We name this attack the bagging and stacking ensemble (BAST) attack. Our contributions are as follows:
1. We present a new adversarial attack mechanism, the at-least-non-targeted attack. To measure the effectiveness of methods under this attack protocol, we design a new evaluation criterion.
2. We propose a novel ensemble attack method, the BAST attack, to ensure that the at-least-non-targeted attack succeeds. The BAST attack outperforms state-of-the-art attack methods with respect to the at-least-non-targeted attack.
3. Experimental results on image classification models demonstrate that adversarial examples crafted by our method can fool multiple models in both white-box and black-box protocols, achieving an optimal trade-off between non-targeted and targeted attacks.
2 Related Work
In this section, we review the background on adversarial attack and defense methods. Following Sharma et al., we conduct attacks based on MI-FGSM, utilizing methods from Sec. 2.1, including input diversity and a translation-invariant method, to improve transferability. The defense models used to evaluate our proposed method include models pretrained on ImageNet and robust models trained with the defense methods in Sec. 2.2.
2.1 Attack Methods
2.1.1 Fast Gradient Sign Method
Goodfellow et al. propose the fast gradient sign method (FGSM), which crafts an adversarial example in a single step along the sign of the loss gradient:

x^{adv} = x + ε · sign(∇_x J(x, y)),

where J is the loss function, y is the ground-truth label and ε controls the perturbation magnitude.
2.1.2 Momentum-based Iterative Method
To address the problem that most adversarial attacks fool black-box models with low success rates, Dong et al. propose a momentum-based iterative algorithm, MI-FGSM. By integrating a momentum term into the iterative attack process, accumulating a velocity vector in the gradient direction, this method crafts more transferable adversarial examples:

g_{t+1} = μ · g_t + ∇_x J(x_t^{adv}, y) / ||∇_x J(x_t^{adv}, y)||_1,
x_{t+1}^{adv} = x_t^{adv} + α · sign(g_{t+1}),

where μ is the decay factor of the momentum term and α is the step size.
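The momentum update above can be sketched in a few lines of numpy. This is a minimal illustration, not the paper's implementation: `grad` stands in for the loss gradient ∇_x J obtained by backpropagation, and the function names are our own.

```python
import numpy as np

def mifgsm_step(x_adv, grad, velocity, alpha=1.0, mu=1.0, targeted=False):
    """One MI-FGSM iteration (sketch). `grad` stands in for the loss
    gradient w.r.t. the input, normally obtained by backprop."""
    # Normalize the current gradient by its L1 norm, then accumulate it
    # into the momentum (velocity) term with decay factor mu.
    g = mu * velocity + grad / (np.sum(np.abs(grad)) + 1e-12)
    # Ascend the loss for a non-targeted attack; descend it (minus sign)
    # for a targeted attack, following the sign of the velocity.
    step = -alpha * np.sign(g) if targeted else alpha * np.sign(g)
    return x_adv + step, g
```

Each call returns both the new adversarial example and the accumulated velocity, which is fed back into the next iteration.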
2.1.3 Input Diversity Method
In the input diversity method, a random transformation is applied to the input at each iteration. Their experiments show that the combination of random scaling and random zero padding has the best performance.
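A sketch of that transform is below. It is an illustration under our own assumptions (nearest-neighbour rescaling, illustrative parameter names and defaults), not the authors' code:

```python
import numpy as np

def diverse_input(x, out_size=330, prob=0.5, rng=None):
    """Random rescale + random zero padding (input-diversity sketch).
    x is an HxWxC image array; names and defaults are illustrative."""
    rng = rng or np.random.default_rng(0)
    if rng.random() > prob:          # apply the transform only with probability `prob`
        return x
    h, w, c = x.shape
    nh = int(rng.integers(h // 2, out_size))   # random rescale target size
    nw = int(rng.integers(w // 2, out_size))
    ri = np.round(np.linspace(0, h - 1, nh)).astype(int)  # nearest-neighbour resize
    ci = np.round(np.linspace(0, w - 1, nw)).astype(int)
    resized = x[ri][:, ci]
    top = int(rng.integers(0, out_size - nh + 1))   # random zero-padding offsets
    left = int(rng.integers(0, out_size - nw + 1))
    out = np.zeros((out_size, out_size, c), dtype=x.dtype)
    out[top:top + nh, left:left + nw] = resized
    return out
```

Because the rescale size and padding offsets change every iteration, the attack never overfits to one fixed input geometry.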
2.1.4 Translation-invariant Method
Dong et al. propose a translation-invariant attack method to generate adversarial examples that transfer better to defense models. To improve efficiency, the translation operation is approximated by convolving the gradient of the untranslated image with a predefined kernel W. The update rule for computing adversarial examples is as follows:

x_{t+1}^{adv} = x_t^{adv} + α · sign(W ∗ ∇_x J(x_t^{adv}, y)).
2.1.5 Ensemble Method
Ensemble methods have been widely adopted in previous research to enhance the performance of neural networks [6, 2, 7]. For example, Bagging and Stacking can both improve the accuracy and robustness of neural networks. Recently, Liu et al. propose novel ensemble-based approaches to generate adversarial examples, which improve transferability even for targeted adversarial examples.
2.2 Defense Methods
Adversarial training [17, 5] is the simplest and most widely used method to defend against adversarial attacks. It increases robustness by directly adding a considerable number of adversarial examples, generated by different attack methods, to the training set during network training. Madry et al. formulate adversarial training as a min-max optimization problem and train more robust models in this way.
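The inner maximization of that min-max game is typically solved by projected gradient ascent. A toy sketch under our own assumptions (`grad_fn` stands in for backpropagation of the training loss; names are illustrative):

```python
import numpy as np

def pgd_inner_max(x, grad_fn, eps=0.3, alpha=0.1, steps=10):
    """Inner maximization of adversarial training (sketch): projected
    gradient ascent on the loss within an L-infinity ball of radius eps.
    `grad_fn(x)` returns dL/dx and stands in for backprop."""
    x_adv = x.copy()
    for _ in range(steps):
        x_adv = x_adv + alpha * np.sign(grad_fn(x_adv))   # ascend the loss
        x_adv = np.clip(x_adv, x - eps, x + eps)          # project into the eps-ball
    return x_adv
```

The outer minimization then trains the network parameters on these worst-case examples instead of the clean inputs.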
Xie et al. propose a denoising network architecture, which enhances adversarial robustness by adding feature denoising modules. Combined with adversarial training, feature denoising networks greatly improve adversarial robustness, especially under the targeted attack protocol. However, non-targeted attacks remain a threat to these defense methods.
3 The Proposed Method
To generate adversarial examples in at-least-non-targeted protocol, we propose a novel ensemble attack method, BAST attack. In this section, we first give a brief introduction of previous ensemble methods and explain their drawbacks in at-least-non-targeted attack protocol. We then introduce the attack method for a single model, which is the base for our ensemble attack. Finally, we present our solution, BAST attack, which enables us to efficiently craft adversarial examples in at-least-non-targeted attack protocol.
3.1 Previous Ensemble Methods
In the NIPS 2017 adversarial attack competition, Dong et al. report three different ensemble methods: ensemble in logits, ensemble in predictions and ensemble in loss. The only difference among them is where the outputs of the multiple models are combined; all of them simply average the models' outputs. We focus on ensemble in loss, formulated as follows:

J(x, y) = Σ_{i=1}^{K} w_i · J_i(x, y),
where J_i is the cross-entropy loss of the i-th model and w_i is its ensemble weight.
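Since the ensemble loss is a weighted sum, its gradient is the same weighted sum of the per-model gradients. A minimal sketch (the `grads` argument stands in for the backpropagated gradient of each model's loss; the function name is ours):

```python
import numpy as np

def ensemble_loss_grad(grads, weights=None):
    """Ensemble in loss (sketch): the attack loss is a weighted sum of
    per-model cross-entropy losses, so its gradient is the weighted sum
    of the per-model gradients. Equal weights by default."""
    grads = np.asarray(grads, dtype=float)
    k = len(grads)
    w = np.full(k, 1.0 / k) if weights is None else np.asarray(weights, dtype=float)
    return np.tensordot(w, grads, axes=1)   # sum_i w_i * grad_i
```

With equal weights this is a plain average, which is exactly why a strong gradient from easy-to-attack models can drown out the robust models' gradient.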
The loss function is optimized by gradient-based algorithms such as FGSM. When the targeted attack is performed on the ensemble, the gradient always points toward the target-class boundary of the easy-to-attack models rather than that of the robust models, shown as the green arrow in Fig. 2. To fully utilize the gradient information of the robust models, we propose our method, the BAST attack.
3.2 Attack on Single Model
To achieve a high success rate and strong transferability of adversarial examples, we follow Sharma et al. and use Algorithm 1 as our baseline attack on a single model. For the targeted attack, the ground-truth label is replaced with the target class label and the plus sign in Eq. (6) is changed to a minus sign. To further improve transferability in the black-box attack scenario, random cropping is added to the random preprocessing.
3.3 BAST Attack
Motivated by ensemble methods such as Bagging and Stacking, we propose a novel ensemble method for adversarial attack, called the bagging and stacking ensemble (BAST) attack, as shown in Algorithm 2.
In our BAST attack, all models are divided into two groups: easy-to-attack models and robust models. The models in each group compose an ensemble model in a way similar to Bagging, and the same type of attack, non-targeted or targeted, is performed on each ensemble. The Bagging ensemble obtains a lower variance by averaging the predictions of independent models. From another perspective, the two groups compose an ensemble in a way similar to Stacking: in Stacking, the outputs of lower-level weak learners are used to train a meta model, while in our BAST attack, the output of the previous Bagging ensemble is fed into the next one.
Fig. 3 shows one iteration of the BAST attack. In practice, we conduct the targeted attack on the easy-to-attack models for m iterations and the non-targeted attack on the robust models for n iterations, respectively. By controlling these two hyperparameters, the performance of the BAST attack can be further improved.
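The alternating schedule can be sketched as follows. This is an illustration of the scheduling idea only, not Algorithm 2 itself: `grad_easy` and `grad_robust` stand in for the two Bagging-ensemble loss gradients, and a plain sign step replaces the full MI-FGSM update for brevity.

```python
import numpy as np

def bast_attack(x, grad_easy, grad_robust, rounds=3, m=2, n=1, alpha=1.0):
    """BAST schedule (sketch): per round, run m targeted steps on the
    easy-to-attack ensemble, then n non-targeted steps on the robust
    ensemble. Gradient callbacks stand in for backprop."""
    x_adv = x.copy()
    for _ in range(rounds):
        for _ in range(m):   # targeted: descend the target-class loss
            x_adv = x_adv - alpha * np.sign(grad_easy(x_adv))
        for _ in range(n):   # non-targeted: ascend the true-class loss
            x_adv = x_adv + alpha * np.sign(grad_robust(x_adv))
    return x_adv
```

Each group's gradient gets dedicated iterations, so the robust models' gradient is never averaged away by the easy-to-attack models as in a plain loss ensemble.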
4 Experiments
In this section, we conduct extensive experiments to validate the effectiveness of the proposed method. We first specify the experimental settings in Sec. 4.1. Then we describe a new evaluation criterion for the at-least-non-targeted attack protocol in Sec. 4.2. We further evaluate our proposed method on different adversarial models in Sec. 4.3 and Sec. 4.4.
4.1 Experimental Settings
We conduct experiments on ImageNet. The maximum perturbation is set to 16, with pixel values in [0, 255]. We ensemble three models, a normally trained model, Inception v3 (Inc-v3), an adversarially trained model, AdvInception v3 (AdvInc-v3), and an extremely robust model trained by Facebook, ResnextDenoiseAll (AdvDeRex), as the substitute models to be attacked. For the BAST attack, we set Inc-v3 and AdvInc-v3 as the easy-to-attack models and AdvDeRex as the robust model. Besides the white-box attack success rates on these three models, we also report black-box attack success rates evaluated on a normally trained model, Inception v1 (Inc-v1), and an adversarially trained model, AdvInceptionResnet v2 (AdvIncres-v2). We evaluate on 1000 images from ImageNet that are correctly classified by all of our models. For traditional ensemble attack methods, the number of iterations is 100, which is enough for the algorithm to converge. For the BAST attack, as we conduct the targeted attack on the easy-to-attack models for m iterations and then the non-targeted attack on the robust models for n iterations per round, the total number of iterations is set correspondingly for comparison.
| Method | Inc-v3 | AdvInc-v3 | AdvDeRex | Inc-v1* | AdvIncres-v2* |
4.2 Evaluation Criteria
Notice that we aim at a weighted combination of targeted and non-targeted attacks, so a new evaluation criterion is needed to verify the effectiveness of the tested methods. The evaluation score on one model is defined as:

Score = (1/n) · Σ_{i=1}^{n} s_i,
where n is the number of images for evaluation and s_i is the attack score of image i on that model. For each image i, if the targeted attack succeeds on the model, s_i is 1 point; if the targeted attack fails but the non-targeted attack succeeds, s_i is 0.5 point; otherwise, s_i is 0 points.
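The scoring rule above is simple enough to state directly in code (a minimal sketch; the per-image success flags would come from running the attack):

```python
def attack_score(results):
    """Evaluation score (sketch). `results` holds one
    (targeted_ok, nontargeted_ok) pair per image for a single model:
    targeted success scores 1, otherwise non-targeted success scores 0.5,
    otherwise 0. Returns the average over images."""
    total = 0.0
    for targeted_ok, nontargeted_ok in results:
        if targeted_ok:
            total += 1.0
        elif nontargeted_ok:
            total += 0.5
    return total / len(results)
```

For example, one targeted success and two non-targeted-only successes out of four images give (1 + 0.5 + 0.5 + 0) / 4 = 0.5.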
| Method | Inc-v3 | AdvInc-v3 | AdvDeRex | Inc-v1* | AdvIncres-v2* |
Each cell reports non-targeted success rate / targeted success rate / evaluation score (%).

| Hyperparameters | Inc-v3 | AdvInc-v3 | AdvDeRex | Inc-v1* | AdvIncres-v2* |
| (a) m=1, n=1 | 0.0/99.6/99.6 | 2.8/92.2/93.6 | 77.7/0.1/38.95 | 41.6/3.7/24.5 | 30.8/5.4/20.8 |
| (b) m=2, n=1 | 0.0/100.0/100.0 | 0.7/98.2/98.55 | 72.0/0.1/36.1 | 40.5/6.7/26.95 | 29.7/8.5/23.35 |
| (c) m=3, n=1 | 0.0/100.0/100.0 | 0.3/98.9/99.05 | 67.3/0.1/33.75 | 39.8/7.5/27.4 | 27.1/9.9/23.45 |
| (d) m=5, n=5 | 0.0/99.8/99.8 | 3.0/92.0/93.5 | 74.4/0.1/37.3 | 38.8/4.0/23.4 | 28.6/4.2/18.5 |
| (e) m=10, n=10 | 0.0/99.9/99.9 | 3.4/91.6/93.3 | 71.6/0.1/35.9 | 37.5/4.2/22.95 | 27.0/4.6/18.1 |
4.3 Main Results
We evaluate three methods: (1) the Bagging attack, where the targeted attack is performed on all three models in the bagging way, i.e., the ensemble method of Eq. (7) with the same ensemble weight for each model; (2) the Stacking attack, where the targeted attack is performed on each model alternately in the stacking way; (3) our BAST attack.
Our main results are in Table 1. We show that:
(1) Our new attack mechanism, at-least-non-targeted attack, plays a key role in attacking multiple models, especially when attacking some robust models.
Focus on the column “AdvDeRex” in Table 1. When we conduct a targeted ensemble attack, the highest targeted attack success rate on AdvDeRex is only 4.0%. This indicates that targeted adversarial examples perform poorly on robust models, so a successful non-targeted attack on these robust models is essential.
(2) Our BAST attack outperforms other methods in the at-least-non-targeted attack protocol.
The BAST attack achieves results competitive with the other two targeted ensemble attacks on the easy-to-attack models. On the robust model, however, the BAST attack largely improves the non-targeted attack success rate with little decrease in the targeted attack success rate. We show some natural ImageNet images together with the corresponding adversarial images and perturbations crafted by the BAST attack in Fig. 4. Moreover, in the black-box attack on ImageNet, the BAST attack outperforms the targeted ensemble attacks in both non-targeted and targeted success rates, indicating that it improves the transferability of the crafted adversarial examples.
4.4 Analysis of BAST Attack
In this section, we study the factors that affect the performance of the BAST attack.
We first focus on the ensemble method. Our BAST attack is inspired by bagging and stacking, so it is natural to compare it with two special ensemble methods: (1) Without-stacking, which ensembles all models only in the bagging way. Note the difference between Without-stacking and the Bagging attack in Table 1: Without-stacking is a variant of BAST, also set in the at-least-non-targeted protocol, which means the attack on the robust model is always non-targeted. (2) Without-bagging, which ensembles all models only in the stacking way.
Results are shown in Table 2. According to the evaluation score, BAST outperforms the other two methods in both the white-box and black-box protocols. One intriguing phenomenon is that the non-targeted attack success rates of the three methods on AdvDeRex are far higher than the 15.6% of the non-targeted bagging attack, indicating that performing the non-targeted attack on the bagging ensemble weakens the effect of the robust model.
We then study the effect of the hyperparameters. In one iteration of the BAST attack, we conduct the targeted attack on the easy-to-attack models for m iterations and then the non-targeted attack on the robust models for n iterations. Experiments are conducted on different combinations of m and n, with results shown in Table 3. We conclude as follows:
(1) Comparing (a) with (b) and (c), the targeted attack success rate on AdvInc-v3 increases with more targeted attack iterations, while the non-targeted attack success rate on AdvDeRex decreases. Thus the trade-off between non-targeted and targeted attack success rates can be adjusted flexibly in this way.
(2) Comparing (a) with (d) and (e), under the condition m = n, the performance degrades as m and n increase: large values of m and n limit the diversity of the ensemble.
5 Conclusion
In this paper, we have presented a new adversarial attack mechanism, the at-least-non-targeted attack, which is of greater value in practical applications. To realize it, we have proposed a novel ensemble attack method, the BAST attack. Extensive experiments have shown the effectiveness of the BAST attack, which improves the non-targeted attack success rate while maintaining targeted attack performance, thereby outperforming state-of-the-art ensemble attacks.
We gratefully thank Tianxiang Ma and Yueming Lv for their assistance with the experiments.
References
- L. Breiman (1996) Bagging predictors. Machine Learning 24, pp. 123–140.
- R. Caruana et al. (2004) Ensemble selection from libraries of models. In ICML.
- Y. Dong et al. (2018) Boosting adversarial attacks with momentum. In CVPR, pp. 9185–9193.
- Y. Dong et al. (2019) Evading defenses to transferable adversarial examples by translation-invariant attacks. In CVPR.
- I. Goodfellow, J. Shlens and C. Szegedy (2015) Explaining and harnessing adversarial examples. In ICLR.
- L. K. Hansen and P. Salamon (1990) Neural network ensembles. IEEE Trans. Pattern Anal. Mach. Intell. 12, pp. 993–1001.
- A. Krogh and J. Vedelsby (1995) Neural network ensembles, cross validation, and active learning. In NIPS.
- A. Kurakin et al. (2018) Adversarial attacks and defences competition. arXiv:1804.00097.
- Y. Liu et al. (2017) Delving into transferable adversarial examples and black-box attacks. In ICLR.
- A. Madry et al. (2018) Towards deep learning models resistant to adversarial attacks. In ICLR.
- N. Papernot et al. (2016) Practical black-box attacks against machine learning. In AsiaCCS.
- N. Papernot, P. McDaniel and I. Goodfellow (2016) Transferability in machine learning: from phenomena to black-box attacks using adversarial samples. arXiv:1605.07277.
- O. Russakovsky et al. (2015) ImageNet large scale visual recognition challenge. International Journal of Computer Vision 115, pp. 211–252.
- Y. Sharma et al. (2018) CAAD 2018: generating transferable adversarial examples. arXiv:1810.01268.
- C. Szegedy et al. (2015) Going deeper with convolutions. In CVPR, pp. 1–9.
- C. Szegedy et al. (2016) Rethinking the inception architecture for computer vision. In CVPR, pp. 2818–2826.
- C. Szegedy et al. (2014) Intriguing properties of neural networks. In ICLR.
- F. Tramèr et al. (2018) Ensemble adversarial training: attacks and defenses. In ICLR.
- D. H. Wolpert (1992) Stacked generalization. Neural Networks 5 (2), pp. 241–259.
- C. Xie et al. (2019) Feature denoising for improving adversarial robustness. In CVPR.
- C. Xie et al. (2019) Improving transferability of adversarial examples with input diversity. In CVPR.