A New Ensemble Method for Concessively Targeted Multi-model Attack

12/19/2019 · Ziwen He et al.

It is well known that deep learning models are vulnerable to adversarial examples crafted by maliciously adding perturbations to original inputs. There are two types of attacks: the targeted attack and the non-targeted attack, and researchers often pay more attention to targeted adversarial examples. However, the targeted attack has a low success rate, especially when aiming at a robust model or under a black-box attack protocol. In such cases, the non-targeted attack is the last chance to disable AI systems. Thus, in this paper, we propose a new attack mechanism which performs the non-targeted attack when the targeted attack fails. Besides, we aim to generate a single adversarial sample for different deployed models of the same task, e.g., image classification models. For this practical application, we focus on attacking ensemble models by dividing them into two groups: easy-to-attack and robust models. We alternately attack these two groups of models in the non-targeted or targeted manner. We name this a bagging and stacking ensemble (BAST) attack. The BAST attack can generate an adversarial sample that fails multiple models simultaneously: some of the models classify the adversarial sample as the target label, and the models which are not attacked successfully in the targeted manner at least give wrong labels. Experimental results show that the proposed BAST attack outperforms state-of-the-art attack methods on a newly defined criterion that considers both targeted and non-targeted attack performance.


1 Introduction

Recent research has shown that machine learning models can easily be fooled by adversarial samples, which are crafted by adding designed perturbations to the inputs [5, 17]. From the perspective of attack goals, there are two types of adversarial attacks: (1) the non-targeted attack, which crafts adversarial examples that are misclassified as any wrong label; (2) the targeted attack, which crafts adversarial examples that are classified as a specified target label. Researchers are generally more interested in the targeted attack; however, it has a lower success rate.

In a black-box attack protocol, a common strategy is to exploit the transferability of crafted adversarial examples, which can also be misclassified by unseen target models [12, 9, 11]. While existing approaches are effective at generating non-targeted transferable adversarial examples, targeted adversarial examples hardly transfer [9]. Even in a white-box attack protocol, the targeted attack has little effect on some adversarially robust models. As shown in the top row of Fig. 1, targeted attacks on three models are conducted with the ensemble attack method of [3]. Inc-v3 [16] and AdvInc-v3 [18] misclassify the adversarial image as the target label “cup”, while the adversarially robust model ResnextDenoiseAll [20] still classifies it as the true label, which indicates that adversarially robust models are hard to attack in a targeted manner, even in the white-box setting.

In this paper, we propose a new adversarial attack mechanism that maximizes the attack success rate by considering both the targeted and non-targeted attacks while prioritizing the targeted attack: when the targeted attack fails, the adversarial sample should still succeed as a non-targeted attack. Besides, as shown in the bottom row of Fig. 1, a single generated adversarial example is able to fool all deployed models; even for ResNextDenoiseAll, the non-targeted attack at least succeeds. In our attack, the models are first divided into two groups, easy-to-attack models and robust models, and the two groups are then attacked alternately. We name this attack a bagging and stacking ensemble (BAST) attack. Our contributions are as follows:

1. We present a new adversarial attack mechanism, the at-least-non-targeted attack. To evaluate methods under this attack protocol, we design a new evaluation criterion.

2. We propose a novel ensemble attack method, the BAST attack, to ensure that the at-least-non-targeted attack succeeds. The BAST attack outperforms state-of-the-art attack methods with respect to the at-least-non-targeted attack.

3. Experimental results on image classification models demonstrate that adversarial examples crafted by our proposed method are able to fool multiple models in either the white-box or the black-box protocol, achieving an optimal trade-off between the non-targeted and the targeted attack.

2 Background

In this section, we review the background of adversarial attack and defense methods. Following Sharma et al. [14], we conduct attacks based on MI-FGSM [3] and employ the methods in Sec. 2.1, including input diversity [21] and a translation-invariant method [4], to improve transferability. The defense models used to evaluate our proposed method include models pretrained on ImageNet [13] and robust models trained with the defense methods in Sec. 2.2.

2.1 Attack Methods

2.1.1 Fast Gradient Sign Method

The fast gradient sign method (FGSM) [17] performs a single-step update on the original sample x along the direction of the gradient of a loss function J(x, y), where J is often the cross-entropy loss. When the perturbation is required to satisfy the L_\infty norm bound \epsilon, the adversarial example is computed as

x^{adv} = x + \epsilon \cdot \mathrm{sign}(\nabla_x J(x, y)).    (1)
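As a concrete illustration, here is a minimal PyTorch sketch of the single-step update in Eq. (1); the model, the [0, 1] pixel range, and the cross-entropy loss are assumptions of the sketch rather than details from the paper.

```python
import torch
import torch.nn.functional as F

def fgsm_attack(model, x, y, epsilon):
    """Single-step FGSM: move x by epsilon along the sign of the loss gradient (Eq. (1))."""
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)   # J(x, y), cross-entropy by default
    loss.backward()
    x_adv = x + epsilon * x.grad.sign()   # x^{adv} = x + eps * sign(grad)
    return x_adv.clamp(0, 1).detach()     # assumes pixel values scaled to [0, 1]
```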

2.1.2 Momentum-based Iterative Method

To address the low success rate of most adversarial attacks against black-box models, Dong et al. propose a momentum-based iterative algorithm, MI-FGSM [3]. By integrating a momentum term into the iterative attack process, this method crafts more transferable adversarial examples by computing

g_{t+1} = \mu \cdot g_t + \frac{\nabla_x J(x_t^{adv}, y)}{\|\nabla_x J(x_t^{adv}, y)\|_1},    (2)

x_{t+1}^{adv} = x_t^{adv} + \alpha \cdot \mathrm{sign}(g_{t+1}).    (3)
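A minimal PyTorch sketch of the momentum-based iterative update in Eqs. (2) and (3) might look as follows; the per-sample L1 normalization, step budget, and pixel range are assumptions of the sketch.

```python
import torch
import torch.nn.functional as F

def mi_fgsm_attack(model, x, y, epsilon, num_iter=10, mu=1.0):
    """Iterative attack with a momentum term (Eqs. (2)-(3))."""
    alpha = epsilon / num_iter
    g = torch.zeros_like(x)
    x_adv = x.clone().detach()
    for _ in range(num_iter):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad, = torch.autograd.grad(loss, x_adv)
        # Eq. (2): accumulate the L1-normalized gradient in the velocity vector g.
        g = mu * g + grad / grad.abs().sum(dim=(1, 2, 3), keepdim=True)
        # Eq. (3): take a signed step and project back into the epsilon-ball.
        x_adv = x_adv.detach() + alpha * g.sign()
        x_adv = torch.max(torch.min(x_adv, x + epsilon), x - epsilon).clamp(0, 1)
    return x_adv
```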

2.1.3 Input Diversity Method

Xie et al. propose that diverse inputs can improve the transferability of adversarial examples [21]. Based on MI-FGSM [3], a random transformation is applied to the input at each iteration to realize input diversity. Their experiments show that the combination of random scaling and random zero padding performs best.
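For concreteness, a minimal sketch of such a diverse-input transform is shown below; the size range, probability, and interpolation mode are illustrative assumptions, not the exact settings of [21].

```python
import random
import torch.nn.functional as F

def diverse_input(x, low=270, high=299, prob=0.5):
    """DI-style random preprocessing: random resize plus random zero padding."""
    if random.random() > prob:          # apply the transform with probability `prob`
        return x
    rnd = random.randint(low, high)     # random intermediate size
    resized = F.interpolate(x, size=(rnd, rnd), mode='nearest')
    pad_total = high - rnd
    pad_left = random.randint(0, pad_total)
    pad_top = random.randint(0, pad_total)
    # zero-pad back to (high, high) with random offsets
    return F.pad(resized, [pad_left, pad_total - pad_left, pad_top, pad_total - pad_top])
```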

2.1.4 Translation-invariant Method

Dong et al. propose a translation-invariant attack method to generate more transferable adversarial examples against defense models [4]. To improve efficiency, the translation operation is realized by convolving the gradient of the untranslated image with a predefined kernel W. The update rule for computing adversarial examples is

x_{t+1}^{adv} = x_t^{adv} + \alpha \cdot \mathrm{sign}(W * \nabla_x J(x_t^{adv}, y)).    (4)
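A minimal sketch of the gradient-smoothing step used in Eq. (4) follows; the kernel size and sigma are illustrative assumptions rather than the values used in [4].

```python
import numpy as np
import scipy.stats as st
import torch
import torch.nn.functional as F

def gaussian_kernel(kernel_size=15, sigma=3.0, channels=3):
    """Build a depthwise Gaussian kernel W for smoothing gradients."""
    pts = np.linspace(-sigma, sigma, kernel_size)
    kern1d = st.norm.pdf(pts)
    kern2d = np.outer(kern1d, kern1d)
    kern2d /= kern2d.sum()
    w = torch.tensor(kern2d, dtype=torch.float32)
    return w.expand(channels, 1, kernel_size, kernel_size).clone()

def smooth_gradient(grad, W):
    """Convolve the gradient with W (one kernel per channel), as in Eq. (4)."""
    pad = W.shape[-1] // 2
    return F.conv2d(grad, W, padding=pad, groups=grad.shape[1])
```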

2.1.5 Ensemble Method

Ensemble methods have been widely adopted in previous research to enhance the performance of neural networks [6, 2, 7]. For example, Bagging [1] and Stacking [19] can improve both the accuracy and the robustness of neural networks. Recently, Liu et al. propose novel ensemble-based approaches to generate adversarial examples, which improve transferability even for targeted adversarial examples [9].

2.2 Defense Methods

Adversarial training [17, 5] is the simplest and most widely used method to defend against adversarial attacks. It increases robustness by adding a considerable number of adversarial examples, generated by different attack methods, to the training set during network training. Madry et al. [10] formulate adversarial training as a min-max game and train more robust models in this way.
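A minimal sketch of one PGD-style adversarial training step in this min-max spirit is shown below; the step sizes, the number of inner steps, and the [0, 1] pixel range are assumptions of the sketch, not settings from [10].

```python
import torch
import torch.nn.functional as F

def adversarial_training_step(model, optimizer, x, y,
                              epsilon=8 / 255, alpha=2 / 255, steps=7):
    """One step of PGD-style adversarial training (min-max formulation)."""
    # Inner maximization: search for a worst-case perturbation in the L-inf ball.
    x_adv = x.clone().detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad, = torch.autograd.grad(loss, x_adv)
        x_adv = x_adv.detach() + alpha * grad.sign()
        x_adv = torch.max(torch.min(x_adv, x + epsilon), x - epsilon).clamp(0, 1)
    # Outer minimization: update the model on the adversarial examples.
    optimizer.zero_grad()
    F.cross_entropy(model(x_adv), y).backward()
    optimizer.step()
```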

Xie et al. propose a denoising network architecture [20], which enhances adversarial robustness by adding feature denoising modules. Combined with adversarial training, feature denoising networks greatly improve adversarial robustness, especially under the targeted attack protocol. However, the non-targeted attack remains a threat to these defense methods.

Input: A classifier f with loss function J; a real example x and its ground-truth label y; the perturbation size \epsilon; the number of iterations N and the decay factor \mu; the random preprocess function T(\cdot); a Gaussian kernel W.
Output: An adversarial example x^{adv} with \|x^{adv} - x\|_\infty \le \epsilon.
1: \alpha = \epsilon / N; g_0 = 0; x_0^{adv} = x;
2: for t = 0 to N - 1 do
3:     Apply the random preprocess to the input x_t^{adv} to obtain T(x_t^{adv});
4:     Feed the processed image T(x_t^{adv}) to f and obtain the gradient \nabla_x J(T(x_t^{adv}), y);
5:     Convolve the gradient with W to get the smoothed gradient W * \nabla_x J(T(x_t^{adv}), y);
6:     Update g_{t+1} by accumulating the velocity vector in the gradient direction as
       g_{t+1} = \mu \cdot g_t + \frac{W * \nabla_x J(T(x_t^{adv}), y)}{\|W * \nabla_x J(T(x_t^{adv}), y)\|_1};    (5)
7:     Update x_{t+1}^{adv} by applying the sign of the accumulated gradient, clipped into the \epsilon-ball, as
       x_{t+1}^{adv} = \mathrm{Clip}_x^{\epsilon}\{ x_t^{adv} + \alpha \cdot \mathrm{sign}(g_{t+1}) \};    (6)
8: end for
9: return x^{adv} = x_N^{adv}.
Algorithm 1 Attack on a single model
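The following is a minimal PyTorch sketch of Algorithm 1; it reuses the `diverse_input` and `smooth_gradient` helpers sketched above (passed as `preprocess` and `W`), and the function and parameter names are illustrative assumptions rather than the authors' exact implementation.

```python
import torch
import torch.nn.functional as F

def single_model_attack(model, x, y, epsilon, num_iter, mu,
                        preprocess, W, targeted=False):
    """Sketch of Algorithm 1: MI-FGSM with random preprocessing and gradient smoothing."""
    alpha = epsilon / num_iter
    g = torch.zeros_like(x)
    x_adv = x.clone().detach()
    sign = -1.0 if targeted else 1.0        # targeted: descend the loss of the target label
    for _ in range(num_iter):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(preprocess(x_adv)), y)
        grad, = torch.autograd.grad(loss, x_adv)
        grad = smooth_gradient(grad, W)     # translation-invariant smoothing
        g = mu * g + grad / grad.abs().sum(dim=(1, 2, 3), keepdim=True)   # Eq. (5)
        x_adv = x_adv.detach() + sign * alpha * g.sign()                  # Eq. (6)
        x_adv = torch.max(torch.min(x_adv, x + epsilon), x - epsilon).clamp(0, 1)
    return x_adv
```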

3 The Proposed Method

To generate adversarial examples under the at-least-non-targeted protocol, we propose a novel ensemble attack method, the BAST attack. In this section, we first give a brief introduction to previous ensemble methods and explain their drawbacks under the at-least-non-targeted attack protocol. We then introduce the attack method for a single model, which is the basis of our ensemble attack. Finally, we present our solution, the BAST attack, which enables us to efficiently craft adversarial examples under the at-least-non-targeted attack protocol.

3.1 Previous Ensemble Methods

In the NIPS 2017 adversarial attack competition [8], Dong et al. [3] report three different ensemble methods: ensemble in logits, ensemble in predictions, and ensemble in loss; their only difference is where the outputs of the multiple models are combined. All three simply sum the models' outputs and average them. We focus on ensemble in loss, formulated as follows:

J(x, y) = \sum_{i} w_i J_i(x, y),    (7)

where J_i is the cross-entropy loss of the i-th model and w_i is its ensemble weight.
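For concreteness, a one-line PyTorch sketch of this weighted ensemble loss is given below; the model list, weights, and loss choice are placeholders.

```python
import torch.nn.functional as F

def ensemble_loss(models, weights, x, y):
    """Ensemble in loss (Eq. (7)): weighted sum of per-model cross-entropy losses."""
    return sum(w * F.cross_entropy(m(x), y) for m, w in zip(models, weights))
```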

The loss function is optimized by gradient-based algorithms such as FGSM. When the targeted attack is performed on the ensemble, the gradient always points toward the target-class boundary of the easy-to-attack models rather than that of the robust models, shown as the green arrow in Fig. 2. To fully utilize the gradient information of the robust models, we propose our method, the BAST attack.

Figure 2: Illustration of targeted ensemble attack versus BAST attack.
Figure 3: One iteration of the BAST attack. It has two main steps: (1) attacks are first performed on the easy-to-attack models; (2) attacks are then performed on the robust models.

3.2 Attack on Single Model

To achieve a high success rate and strong transferability of adversarial examples, we follow Sharma et al. [14] and use Algorithm 1 as our baseline attack on a single model. For the targeted attack, the ground-truth label y is replaced by the target class label and the plus sign in Eq. (6) is changed to a minus sign. To further improve transferability in the black-box attack scenario, cropping is added to the random preprocess [21].

3.3 BAST Attack

Motivated by ensemble methods such as Bagging [1] and Stacking [19], we propose a novel ensemble method for adversarial attack, called the bagging and stacking ensemble (BAST) attack, as shown in Algorithm 2.

In our BAST attack, all models are divided into two groups: easy-to-attack models and robust models. The models in each group compose an ensemble model in a way similar to Bagging, and the same type of attack, non-targeted or targeted, is performed on each ensemble; the Bagging ensemble obtains a lower variance by averaging the predictions of independent models. On the other hand, the two groups are combined in a way similar to Stacking. In Stacking, the outputs of lower-layer weak learners are used to train a meta-model, while in our BAST attack, the output of the previous Bagging ensemble step is fed into the next one.

Fig. 3 shows one iteration of the BAST attack. In practice, we conduct the targeted attack on the easy-to-attack models for m steps and the non-targeted attack on the other models for n steps, respectively. By controlling these two hyperparameters, the performance of the BAST attack can be further tuned.

Input: The logits l_1, \dots, l_K of K classifiers; ensemble weights w_1, \dots, w_K; the non-targeted attack times n and the targeted attack times m; a real example x, its ground-truth label y and a target label y^*; the perturbation size \epsilon; the number of iterations N and the decay factor \mu; the random preprocess function T(\cdot); a Gaussian kernel W.
Output: An adversarial example x^{adv} with \|x^{adv} - x\|_\infty \le \epsilon.
1: \alpha = \epsilon / N; g = 0; x^{adv} = x;
2: for t = 0 to N - 1 do
3:     for i = 0 to 1 do
4:          if i = 0 then
5:               select the easy-to-attack group, the target label y^*, the targeted update sign, and m inner steps;
6:          else
7:               select the robust group, the ground-truth label y, the non-targeted update sign, and n inner steps;
8:          end if
9:          for j = 1 to the selected number of inner steps do
10:              Apply the random preprocess to x^{adv} and feed it to every selected classifier to obtain the logits l_k;
11:              Compute the softmax cross-entropy loss J_k of each selected classifier from l_k;
12:              Compute the total loss J = \sum_k w_k J_k;
13:              Obtain the gradient \nabla_x J;
14:              Update g by Eq. (5);
15:              Update x^{adv} by Eq. (6);
16:          end for
17:     end for
18: end for
19: return x^{adv}.
Algorithm 2 BAST attack
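The following is a minimal PyTorch sketch of this alternating loop. It reuses the `smooth_gradient` helper sketched in Sec. 2.1.4; the step size `alpha`, the `preprocess` argument, and the other names are illustrative assumptions rather than the authors' exact implementation.

```python
import torch
import torch.nn.functional as F

def bast_attack(easy_models, robust_models, easy_weights, robust_weights,
                x, y_true, y_target, epsilon, num_iter, m, n, mu,
                preprocess, W):
    """Sketch of Algorithm 2: alternate m targeted steps on the easy-to-attack
    ensemble with n non-targeted steps on the robust ensemble."""
    alpha = epsilon / (num_iter * (m + n))   # step size: an assumption of this sketch
    g = torch.zeros_like(x)
    x_adv = x.clone().detach()

    def attack_steps(models, weights, label, sign, steps):
        nonlocal g, x_adv
        for _ in range(steps):
            x_adv.requires_grad_(True)
            # Ensemble in loss (Eq. (7)) over the selected group of models.
            loss = sum(w * F.cross_entropy(mod(preprocess(x_adv)), label)
                       for mod, w in zip(models, weights))
            grad, = torch.autograd.grad(loss, x_adv)
            grad = smooth_gradient(grad, W)                                   # TI smoothing
            g = mu * g + grad / grad.abs().sum(dim=(1, 2, 3), keepdim=True)   # Eq. (5)
            x_adv = x_adv.detach() + sign * alpha * g.sign()                  # Eq. (6)
            x_adv = torch.max(torch.min(x_adv, x + epsilon), x - epsilon).clamp(0, 1)

    for _ in range(num_iter):
        attack_steps(easy_models, easy_weights, y_target, -1.0, m)    # targeted on easy group
        attack_steps(robust_models, robust_weights, y_true, +1.0, n)  # non-targeted on robust group
    return x_adv
```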

4 Experiments

In this section, we conduct extensive experiments to validate the effectiveness of the proposed method. We first specify the experimental settings in Sec. 4.1. Then we describe a new evaluation criterion in at-least-non-targeted attack protocol in Sec. 4.2. We further evaluate our proposed method utilizing different adversarial models in Sec. 4.3 and Sec. 4.4.

4.1 Setup

We conduct experiments on ImageNet [13]. The maximum perturbation \epsilon is set to 16, with pixel values in [0, 255]. We ensemble three models as the substitute model to be attacked: a normally trained model, Inception v3 (Inc-v3) [16]; an adversarially trained model, AdvInception v3 (AdvInc-v3) [18]; and an extremely robust model trained by Facebook, ResnextDenoiseAll (AdvDeRex) [20]. For the BAST attack, we set Inc-v3 and AdvInc-v3 as the easy-to-attack models and AdvDeRex as the robust model. Besides white-box attack success rates on these three models, we also report black-box attack success rates evaluated on a normally trained model, Inception v1 (Inc-v1) [15], and an adversarially trained model, AdvInceptionResnet v2 (AdvIncres-v2) [18]. We evaluate on 1000 ImageNet [13] images chosen such that they are correctly classified by all of our models. For the traditional ensemble attack methods, the number of iterations is 100, which is enough for the algorithms to converge. For the BAST attack, since each iteration conducts the targeted attack on the easy-to-attack models for m steps and then the non-targeted attack on the robust model for n steps, the number of iterations is set so that the total number of gradient steps is comparable. We set m = 2 and n = 1.

Method            Inc-v3            AdvInc-v3        AdvDeRex         Inc-v1 *         AdvIncres-v2 *
Bagging attack    0.0/100.0/100.0   0.0/99.8/99.8    18.3/4.0/13.15   32.7/6.2/22.55   23.7/7.5/19.35
Stacking attack   0.0/99.5/99.5     3.0/91.2/92.7    4.9/0.1/2.55     11.3/0.9/6.55    14.0/0.5/7.5
BAST attack       0.0/100.0/100.0   0.7/98.2/98.55   72.0/0.1/36.1    40.5/6.7/26.95   29.7/8.5/23.35
Table 1: Results on ImageNet, shown as A/B/C (%), where A is the rate at which the non-targeted attack succeeds but the targeted attack fails, B is the rate at which the targeted attack succeeds, and C is the score computed by Eq. (8). The sign * indicates that the attack on this model is a black-box attack.
Figure 4: Adversarial examples and perturbations generated by BAST attack, Bagging attack and Stacking attack. (a): Natural examples and their target class labels. (b): BAST attack. The predicted labels by AdvDeRex are in the yellow patch. (c): Bagging attack. (d): Stacking attack. All adversarial images are classified by Inc-v3 and AdvInc-v3 as the target labels, while only adversarial images in (b) successfully fool AdvDeRex.

4.2 Evaluation Criteria

Recall that we aim at a weighted combination of the targeted and non-targeted attacks, so a new evaluation criterion is needed to assess the tested methods. For each model, the evaluation score is defined as

\mathrm{Score} = \frac{1}{n}\sum_{i=1}^{n} s_i \times 100\%,    (8)

with

s_i = \begin{cases} 1, & \text{if the targeted attack succeeds on image } x_i, \\ 0.5, & \text{if only the non-targeted attack succeeds on } x_i, \\ 0, & \text{otherwise,} \end{cases}    (9)

where n is the number of evaluation images and s_i is the attack score of image x_i on the model: a successful targeted attack scores 1 point, a successful non-targeted attack without a successful targeted attack scores 0.5 point, and a failed attack scores 0 points.
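A short Python sketch of this per-model score is given below; the inputs are assumed to be per-image success flags, which is an illustrative choice rather than the authors' evaluation code.

```python
def attack_score(targeted_hits, nontargeted_hits):
    """Per-model score of Eqs. (8)-(9): 1 point for a targeted success,
    0.5 for a non-targeted-only success, 0 otherwise, averaged over images."""
    total = 0.0
    for targeted, nontargeted in zip(targeted_hits, nontargeted_hits):
        if targeted:
            total += 1.0
        elif nontargeted:
            total += 0.5
    return 100.0 * total / len(targeted_hits)

# Example consistent with Table 1: for BAST on AdvDeRex, 0.1% targeted and 72.0%
# non-targeted-only successes give a score of 0.1 + 72.0 / 2 = 36.1.
```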

Method            Inc-v3            AdvInc-v3        AdvDeRex         Inc-v1 *         AdvIncres-v2 *
Without-stacking  0.0/100.0/100.0   0.0/99.9/99.9    62.5/0.1/31.35   41.3/3.7/24.35   32.0/5.0/21.0
Without-bagging   0.0/99.1/99.1     2.8/93.5/94.9    69.7/0.1/34.95   39.3/5.0/24.65   27.7/5.3/19.15
BAST              0.0/100.0/100.0   0.7/98.2/98.55   72.0/0.1/36.1    40.5/6.7/26.95   29.7/8.5/23.35
Table 2: Results for different ensemble methods, shown as A/B/C. A is the rate at which the non-targeted attack succeeds but the targeted attack fails, B is the rate at which the targeted attack succeeds, and C is the score computed by Eq. (8). The sign * indicates that the attack on this model is a black-box attack.
Hyperparameters   Inc-v3            AdvInc-v3        AdvDeRex         Inc-v1 *         AdvIncres-v2 *
(a) m=1, n=1      0.0/99.6/99.6     2.8/92.2/93.6    77.7/0.1/38.95   41.6/3.7/24.5    30.8/5.4/20.8
(b) m=2, n=1      0.0/100.0/100.0   0.7/98.2/98.55   72.0/0.1/36.1    40.5/6.7/26.95   29.7/8.5/23.35
(c) m=3, n=1      0.0/100.0/100.0   0.3/98.9/99.05   67.3/0.1/33.75   39.8/7.5/27.4    27.1/9.9/23.45
(d) m=5, n=5      0.0/99.8/99.8     3.0/92.0/93.5    74.4/0.1/37.3    38.8/4.0/23.4    28.6/4.2/18.5
(e) m=10, n=10    0.0/99.9/99.9     3.4/91.6/93.3    71.6/0.1/35.9    37.5/4.2/22.95   27.0/4.6/18.1
Table 3: Results for different combinations of m and n in the BAST attack, shown as A/B/C. A is the rate at which the non-targeted attack succeeds but the targeted attack fails, B is the rate at which the targeted attack succeeds, and C is the score computed by Eq. (8). The sign * indicates that the attack on this model is a black-box attack.

4.3 Main Results

We evaluate three methods: (1) the Bagging attack, in which the targeted attack is performed on all three models in the bagging way, i.e., the ensemble method of Eq. (7) with equal ensemble weights; (2) the Stacking attack, in which the targeted attack is performed on each model alternately in the stacking way; and (3) our BAST attack.

Our main results are in Table 1. We show that:

(1) Our new attack mechanism, at-least-non-targeted attack, plays a key role in attacking multiple models, especially when attacking some robust models.

Focus on the column “AdvDeRex” in Table 1. When we conduct the targeted ensemble attacks, the highest targeted attack success rate on AdvDeRex is only 4.0%. This indicates that targeted adversarial examples are largely ineffective against robust models, so a successful non-targeted attack on these robust models is essential.

(2) Our BAST attack outperforms other methods in the at-least-non-targeted attack protocol.

The BAST attack achieves results competitive with the other two targeted ensemble attacks on the easy-to-attack models. On the robust model, however, the BAST attack greatly improves the non-targeted attack success rate with little decrease in the targeted attack success rate. We show some natural ImageNet images and the corresponding adversarial images and perturbations crafted by the BAST attack in Fig. 4. Moreover, in the black-box attack on ImageNet, the BAST attack outperforms the targeted ensemble attacks in both non-targeted and targeted attack success rates, indicating that our BAST attack improves the transferability of the crafted adversarial examples.

4.4 Analysis of BAST Attack

In this section we study the factors that affect the performance of the BAST attack.

We first focus on the ensemble method. Our BAST attack is inspired by bagging and stacking, so it is natural to compare it with two special ensemble methods: (1) Without-stacking, which ensembles all models only in the bagging way. Note the difference between Without-stacking and the Bagging attack in Table 1: Without-stacking is a variant of BAST that also operates in the at-least-non-targeted protocol, which means the attack on the robust model is always non-targeted. (2) Without-bagging, which ensembles all models only in the stacking way.

The results are shown in Table 2. According to the evaluation score, BAST outperforms the other two methods in both the white-box and the black-box protocol. One intriguing phenomenon is that the non-targeted attack success rates of the three methods on AdvDeRex are far higher than the 15.6% achieved by a purely non-targeted bagging attack, indicating that performing the non-targeted attack on the bagging ensemble weakens the effect of the robust model.

We then study the effect of the hyperparameters. In one iteration of the BAST attack, we conduct the targeted attack on the easy-to-attack models for m steps and then the non-targeted attack on the robust model for n steps. Experiments are conducted with different combinations of m and n, and the results are shown in Table 3. We conclude as follows:

(1) Comparing (a) with (b) and (c), the targeted attack success rate on AdvInc-v3 increases with more targeted attack steps, while the non-targeted attack success rate on AdvDeRex decreases. Thus the trade-off between the non-targeted and targeted attack success rates can be adjusted flexibly in this way.

(2) Comparing (a) with (d) and (e), under the condition m = n, the performance degrades as the values of m and n increase: large m and n limit the diversity of the ensemble.

5 Conclusion

In this paper, we have presented a new adversarial attack mechanism, the at-least-non-targeted attack, which is more relevant to practical applications. To realize it, a novel ensemble attack method, the BAST attack, has been proposed. Extensive experiments have shown the effectiveness of the BAST attack, which improves the non-targeted attack success rate while keeping the targeted attack performance, thus outperforming state-of-the-art ensemble attacks.

6 Acknowledgement

We gratefully thank Tianxiang Ma and Yueming Lv for their assistance with the experiments.

References

  • [1] L. Breiman (1996) Bagging predictors. Machine Learning 24, pp. 123–140. Cited by: §2.1.5, §3.3.
  • [2] R. Caruana, A. Niculescu-Mizil, G. Crew, and A. Ksikes (2004) Ensemble selection from libraries of models. In ICML, Cited by: §2.1.5.
  • [3] Y. Dong, F. Liao, T. Pang, H. Su, J. Zhu, X. Hu, and J. Li (2018) Boosting adversarial attacks with momentum. IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 9185–9193. Cited by: Figure 1, §1, §2.1.2, §2.1.3, §2, §3.1.
  • [4] Y. Dong, T. Pang, H. Su, and J. Zhu (2019) Evading defenses to transferable adversarial examples by translation-invariant attacks. IEEE/CVF Conference on Computer Vision and Pattern Recognition. Cited by: §2.1.4, §2.
  • [5] I. J. Goodfellow, J. Shlens, and C. Szegedy (2015) Explaining and harnessing adversarial examples. ICLR. Cited by: §1, §2.2.
  • [6] L. K. Hansen and P. Salamon (1990) Neural network ensembles. IEEE Trans. Pattern Anal. Mach. Intell. 12, pp. 993–1001. Cited by: §2.1.5.
  • [7] A. Krogh and J. Vedelsby (1994) Neural network ensembles, cross validation, and active learning. In NIPS, Cited by: §2.1.5.
  • [8] A. Kurakin, I. J. Goodfellow, S. Bengio, Y. Dong, F. Liao, M. Liang, T. Pang, J. Zhu, X. Hu, C. Xie, J. Wang, Z. Zhang, Z. Ren, A. L. Yuille, S. Huang, Y. Zhao, Y. Zhao, Z. Han, J. Long, Y. Berdibekov, T. Akiba, S. Tokui, and M. Abe (2018) Adversarial attacks and defences competition. ArXiv abs/1804.00097. Cited by: §3.1.
  • [9] Y. Liu, X. Chen, C. Liu, and D. X. Song (2017) Delving into transferable adversarial examples and black-box attacks. ICLR. Cited by: §1, §2.1.5.
  • [10] A. Madry, A. Makelov, L. Schmidt, D. Tsipras, and A. Vladu (2018) Towards deep learning models resistant to adversarial attacks. ICLR. Cited by: §2.2.
  • [11] N. Papernot, P. D. McDaniel, I. J. Goodfellow, S. Jha, Z. B. Celik, and A. Swami (2016) Practical black-box attacks against machine learning. In AsiaCCS, Cited by: §1.
  • [12] N. Papernot, P. D. McDaniel, and I. J. Goodfellow (2016) Transferability in machine learning: from phenomena to black-box attacks using adversarial samples. ArXiv abs/1605.07277. Cited by: §1.
  • [13] O. Russakovsky, J. Deng, H. Su, J. Krause, S. Satheesh, S. Ma, Z. Huang, A. Karpathy, A. Khosla, M. S. Bernstein, A. C. Berg, and L. Fei-Fei (2015) ImageNet large scale visual recognition challenge. International Journal of Computer Vision 115, pp. 211–252. Cited by: §2, §4.1.
  • [14] Y. Sharma, T. Le, and M. Alzantot (2018) CAAD 2018: generating transferable adversarial examples. ArXiv abs/1810.01268. Cited by: §2, §3.2.
  • [15] C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. E. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, and A. Rabinovich (2015) Going deeper with convolutions. IEEE Conference on Computer Vision and Pattern Recognition, pp. 1–9. Cited by: §4.1.
  • [16] C. Szegedy, V. Vanhoucke, S. Ioffe, J. Shlens, and Z. Wojna (2016) Rethinking the inception architecture for computer vision. IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2818–2826. Cited by: Figure 1, §1, §4.1.
  • [17] C. Szegedy, W. Zaremba, I. Sutskever, J. Bruna, D. Erhan, I. J. Goodfellow, and R. Fergus (2014) Intriguing properties of neural networks. ICLR. Cited by: §1, §2.1.1, §2.2.
  • [18] F. Tramèr, A. Kurakin, N. Papernot, I. J. Goodfellow, D. Boneh, and P. D. McDaniel (2018) Ensemble adversarial training: attacks and defenses. ICLR. Cited by: Figure 1, §1, §4.1.
  • [19] D. H. Wolpert (1992) Stacked generalization. Neural Networks 5 (2), pp. 241–259. Cited by: §2.1.5, §3.3.
  • [20] C. Xie, Y. Wu, L. van der Maaten, A. L. Yuille, and K. He (2019) Feature denoising for improving adversarial robustness. IEEE/CVF Conference on Computer Vision and Pattern Recognition. Cited by: Figure 1, §1, §2.2, §4.1.
  • [21] C. Xie, Z. Zhang, J. Wang, Y. Zhou, Z. Ren, and A. L. Yuille (2019) Improving transferability of adversarial examples with input diversity. IEEE/CVF Conference on Computer Vision and Pattern Recognition. Cited by: §2.1.3, §2, §3.2.