Learning Transferable Adversarial Examples via Ghost Networks

12/09/2018 · Yingwei Li, et al. · Johns Hopkins University

The recent development of adversarial attacks has shown that ensemble-based methods can perform black-box attacks better than traditional, non-ensemble ones. However, those methods generally suffer from high complexity: they require a family of diverse models, and ensembling them afterward, both of which are computationally expensive. In this paper, we propose Ghost Networks to efficiently learn transferable adversarial examples. The key principle of ghost networks is to perturb an existing model, which can potentially generate a huge set of diverse models. Those models are subsequently fused by a longitudinal ensemble. Both steps require almost no extra time or space. Extensive experimental results suggest that the number of networks is essential for improving the transferability of adversarial examples, but it is less necessary to independently train different networks and then ensemble them with an intensive aggregation scheme. Instead, our work can serve as a computationally cheap plug-in that can easily improve adversarial approaches in both single-model and multi-model attacks, and is compatible with both residual and non-residual networks. In particular, by reproducing the NIPS 2017 adversarial competition, our work outperforms the No.1 attack submission by a large margin, which demonstrates its effectiveness and efficiency.


1 Introduction

In recent years, Convolutional Neural Networks (CNNs) have greatly advanced performance in various vision tasks, including image recognition [11, 14, 26], object detection [25, 7], and semantic segmentation [4, 19], etc. However, it has been observed [31, 9] that adding human-imperceptible perturbations to an input image can cause a CNN to make incorrect predictions even if the original image is correctly classified. These intentionally generated images are usually called adversarial examples [9, 15, 31].

Figure 1: An illustration of the capacity of the proposed ghost networks in learning transferable adversarial examples. The base model is ResNet-50, which is used not only to generate adversarial examples but also to generate ghost networks. The evaluation is done on Inception v3.

A recent development [18] has demonstrated that adversarial examples generated from certain models (especially those learned by iteration-based methods [15, 6]) are less transferable to other models. In other words, those adversarial examples easily overfit to a specific network and achieve a much lower attack rate in black-box settings (i.e., where attackers have no knowledge of the models they attack). To remedy this, an ensemble [18] of multiple networks has been suggested to improve transferability.

However, ensemble-based attacks suffer from expensive computational overhead, making it difficult to efficiently learn transferable adversarial examples. First, in order to acquire good (i.e., low test error) and diverse (i.e., converging at different local minima) models, people usually train them independently from scratch. Then, to leverage their complementarity, existing methods adopt an intensive aggregation scheme to fuse the outputs of those networks (e.g., logits). Consequently, the attacking methods in [17] ensemble at most 10 networks, restricted by the computational complexity.

In this work, we propose a highly efficient alternative called Ghost Networks to address this issue. The basic principle is to generate a huge number of virtual models built on a network trained from scratch (a base network or base model). The word "virtual" means that those ghost networks are neither stored nor trained, which would otherwise incur extra time and space costs. Instead, they are simply generated by imposing erosion on certain intermediate structures of the base network and are used on-the-fly. As the number of models grows, a standard ensemble [18] would clearly become problematic owing to its complexity. Accordingly, we propose the Longitudinal Ensemble, a specialized fusion method for ghost networks, which conducts an implicit ensemble across attack iterations. Consequently, transferable adversarial examples can be easily generated without sacrificing computational efficiency.

To summarize, the contributions of our work are three-fold: 1) Our work is the first to explore network erosion for learning transferable adversarial examples, rather than relying solely on multi-network ensembles. 2) We observe that the number of different networks actually used for the ensemble (intrinsic networks) is essential for transferability. However, it is less necessary to train different models independently; instead, ghost networks can be a competitive alternative with extremely low complexity. 3) Our method is generic. Though it appears to be an ensemble-based method for multi-model attacks, it can also be applied to single-model attacks where only one trained model is accessible. Furthermore, it is compatible with different network structures, attack methods, and adversarial settings.

Extensive experimental results demonstrate that our work is a computationally cheap plug-in which improves the transferability of adversarial examples. In particular, by reproducing the NIPS 2017 adversarial competition [17], our work outperforms the No.1 attack submission by a large margin, demonstrating its effectiveness and efficiency.

2 Related Work

Deep networks have been shown to be vulnerable to adversarial examples, i.e., maliciously perturbed inputs designed to mislead a model [2, 31, 9, 22].

The transferability of adversarial examples refers to the property that adversarial inputs crafted for one model are also misclassified by other models. This was first investigated by Szegedy et al. [31] on MNIST, and later led to the development of black-box attacks. Afterward, Liu et al. [18] proposed ensemble-based approaches which demonstrate transferability on large-scale datasets such as ImageNet [5].

Optimization-based methods (e.g., the Carlini-Wagner attack [3]) and iteration-based methods (e.g., I-FGSM [15]) tend to overfit a specific network structure and thus lead to weak transferability [16]. On the other hand, single-step gradient-based methods, such as the Fast Gradient Sign Method (FGSM) [9], learn more transferable adversarial examples but are less successful in white-box attacks. Taking advantage of both [15] and [9], Dong et al. [6] proposed a momentum iterative method to generate adversarial examples with stronger transferability. To further avoid overfitting to specific models, it uses an ensemble of trained-from-scratch models.

However, efficiently learning transferable adversarial examples remains a challenging task. Some works [1, 24, 35] suggest that re-training neural networks, e.g., via generative adversarial learning [8], can achieve high transferability. Moreover, Papernot et al. [23] proposed a query-based method to improve black-box attack performance; however, this requires massive information from the target model. In conclusion, acquiring and integrating information from various models to approximate the target model is the key to better transferability, yet most existing works are inefficient or inadequate for learning adversarial examples with strong transferability. Our work addresses this issue with high efficiency.

3 Ghost Networks

The goal of this work is to learn adversarial examples, with particular attention on their transferability. Given a clean image $x$, we want to find an adversarial example $x^{adv}$ which is still visually similar to $x$ after adding adversarial noise but fools the classifier. In order to improve the transferability of $x^{adv}$, we choose to simultaneously attack multiple models. However, unlike existing works [6, 18], we propose Ghost Networks, a highly efficient algorithm that both generates and fuses an ensemble of diverse models to learn transferable adversarial examples.

We introduce two strategies for generating ghost networks in Sec. 3.1 and Sec. 3.2, respectively, and then present a customized fusion method, named longitudinal ensemble, in Sec. 3.3.

3.1 Dropout Erosion

Revisiting Dropout. Dropout [27] has been one of the most popular techniques in deep learning. By randomly dropping units from the model during the training phase, dropout effectively prevents deep neural networks from overfitting. Some recent works [26, 28, 30, 29] achieve state-of-the-art performance on benchmark datasets by applying dropout to a layer of high-level features.

Let $x^{(l)}$ be the activation in the $l$-th layer. At training time, the output after the dropout layer can be mathematically defined as

$\tilde{x}^{(l)} = r^{(l)} \odot x^{(l)}, \quad r^{(l)} \sim \mathrm{Bernoulli}(p),$    (1)

where $\odot$ denotes an element-wise product and $\mathrm{Bernoulli}(p)$ denotes the Bernoulli distribution with probability $p$ of being $1$. At test time, units in $x^{(l)}$ are always present; to keep the output equal to the expected output at training time, $x^{(l)}$ is scaled by $p$.

Perturbing Dropout. Dropout provides an efficient way of approximately combining different neural network architectures and thereby prevents overfitting. Inspired by this, we propose to generate ghost networks by inserting dropout layers into the base network. In order to make these ghost networks as diverse as possible, we densely apply dropout to every block throughout the base network, so that the diversity is not limited to high-level features but applies to all feature levels.

Let $f^{(l+1)}$ be the function between the $l$-th and $(l+1)$-th layers, i.e., $x^{(l+1)} = f^{(l+1)}(x^{(l)})$. The output after applying dropout erosion is

$x^{(l+1)} = f^{(l+1)}\!\left(\frac{1}{p}\, r^{(l)} \odot x^{(l)}\right), \quad r^{(l)} \sim \mathrm{Bernoulli}(p),$    (2)

where $p$ has the same meaning as in Eq. (1), indicating the probability that an element of $x^{(l)}$ is preserved. To keep the expected input of $f^{(l+1)}$ consistent after erosion, the activation $r^{(l)} \odot x^{(l)}$ is divided by $p$.

During inference, the output feature after the $L$-th dropout layer ($1 \le l \le L$) is

$x^{(L)} = g^{(L)} \circ g^{(L-1)} \circ \cdots \circ g^{(1)}\!\left(x^{(0)}\right),$    (3)

where $g^{(l)}$ denotes the composite function of erosion and the layer mapping, more specifically, $g^{(l)}(x) = f^{(l)}\!\left(\frac{1}{p}\, r^{(l-1)} \odot x\right)$.

By combining Eq. (2) and Eq. (3), we observe that when $\Lambda = 0$ (i.e., $p = 1$), all elements in $r^{(l)}$ are equal to $1$; in this case, we do not impose any perturbation on the base network. When $\Lambda$ gradually increases to $1$ (i.e., $p$ decreases to $0$), the ratio of elements that are dropped out is $\Lambda$; in other words, only a fraction $1-\Lambda$ of the elements can be successfully back-propagated. Hence, a larger $\Lambda$ implies a heavier erosion on the base network. Therefore, we define $\Lambda = 1 - p$ to be the magnitude of erosion.

When perturbing the dropout layers, the gradient in back-propagation can be written as

$\frac{\partial x^{(L)}}{\partial x^{(0)}} = \prod_{l=1}^{L} \frac{\partial g^{(l)}}{\partial x^{(l-1)}} = \prod_{l=1}^{L} \frac{\partial f^{(l)}}{\partial \tilde{x}^{(l-1)}} \cdot \frac{1}{p}\,\mathrm{diag}\!\left(r^{(l-1)}\right), \quad \tilde{x}^{(l-1)} = \frac{1}{p}\, r^{(l-1)} \odot x^{(l-1)}.$    (4)

As shown in Eq. (4), deeper networks (with larger $L$) are influenced more easily according to the product rule. Sec. 4.2 will experimentally analyze the impact of $\Lambda$.

Generating Ghost Networks. The generation of ghost networks via perturbing the dropout layers proceeds in three steps: 1) randomly sample a set of masks $r = \{r^{(l)}\}$ from the Bernoulli distribution $\mathrm{Bernoulli}(1-\Lambda)$; 2) apply Eq. (2) to the base network with this set of masks to obtain a perturbed network; 3) repeat the sampling independently to obtain a pool of ghost networks, which can then be used for adversarial attacks.
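Below is a minimal PyTorch sketch of dropout erosion, assuming a base network whose feature blocks can be enumerated (e.g., an nn.Sequential); the wrapper names ErodedBlock and make_dropout_ghost are ours for illustration and not part of any released code.

```python
import torch
import torch.nn as nn

class ErodedBlock(nn.Module):
    """Wraps a block f^{(l)} and applies dropout erosion to its input (Eq. (2)).

    A fresh Bernoulli mask is drawn at every forward pass, so repeated calls
    behave like different ghost networks sampled from the same base model."""
    def __init__(self, block: nn.Module, erosion: float):
        super().__init__()
        self.block = block
        self.p = 1.0 - erosion          # keep probability p = 1 - Lambda

    def forward(self, x):
        mask = torch.bernoulli(torch.full_like(x, self.p))
        return self.block(mask * x / self.p)   # divide by p to keep the expectation

def make_dropout_ghost(blocks, erosion: float = 0.1) -> nn.Module:
    """Turn a sequence of feature blocks into one dropout-eroded ghost network."""
    return nn.Sequential(*[ErodedBlock(b, erosion) for b in blocks])

# Usage (illustrative): densely erode every block of a small CNN backbone.
backbone = nn.Sequential(
    nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU()),
    nn.Sequential(nn.Conv2d(16, 32, 3, padding=1), nn.ReLU()),
)
ghost = make_dropout_ghost(list(backbone.children()), erosion=0.1)
out = ghost(torch.randn(1, 3, 32, 32))   # masks are re-sampled on every call
```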

Figure 2: An illustration of skip connection (a, Eq. (5)) and skip connection erosion (b, Eq. (6)).

3.2 Skip Connection Erosion

Revisiting Skip Connection. He et al. [10, 11] proposed skip connections in CNNs, which make it feasible to train very deep neural networks.

The standard residual block in [11] is defined by

$x_{l+1} = x_l + \mathcal{F}(x_l, \mathcal{W}_l),$    (5)

where $x_l$ and $x_{l+1}$ are the input and output of the $l$-th residual block with weights $\mathcal{W}_l$, and $\mathcal{F}$ denotes the residual function. As suggested in [11], it is crucial to use the identity skip connection, i.e., the term $x_l$, to facilitate the residual learning process; otherwise the network may not converge to a good local minimum.

Perturbing Skip Connection. Following the principle of the skip connection, we propose to perturb the skip connections to generate ghost networks.

More specifically, the network weights are first learned using identity skip connections, and then switched to randomized skip connections (see Fig. 2). To this end, we apply a randomized modulating scalar $\lambda_l$ to the $l$-th residual block, by

$x_{l+1} = \lambda_l\, x_l + \mathcal{F}(x_l, \mathcal{W}_l),$    (6)

where $\lambda_l$ is drawn from a uniform distribution whose spread is controlled by the erosion magnitude $\Lambda$ (so that $\Lambda = 0$ recovers the identity skip connection). One may have observed several similar formulations on skip connections used to improve classification performance, e.g., gated inference in ConvNet-AIG [33] and the lesion study in [34]. However, our work focuses on attacking the model with a randomized perturbation on the skip connections, i.e., the model is not actually trained via Eq. (6).

During inference, the output feature $x_L$ after the $L$-th residual block can be unrolled, for any earlier block $l$ ($l < L$), as

$x_L = \left(\prod_{i=l}^{L-1} \lambda_i\right) x_l + \sum_{i=l}^{L-1} \left(\prod_{j=i+1}^{L-1} \lambda_j\right) \mathcal{F}(x_i, \mathcal{W}_i).$    (7)

The gradient in back-propagation is then written as

$\frac{\partial x_L}{\partial x_l} = \prod_{i=l}^{L-1} \lambda_i + \frac{\partial}{\partial x_l} \sum_{i=l}^{L-1} \left(\prod_{j=i+1}^{L-1} \lambda_j\right) \mathcal{F}(x_i, \mathcal{W}_i).$    (8)

Similar to the analysis in Sec. 3.1, we conclude from Eq. (7) and Eq. (8) that a larger $\Lambda$ has a greater influence on the base network, and that deeper networks are more easily influenced.

Generating Ghost Networks. The generation of ghost networks via perturbing the skip connections is similar to the procedure for perturbing the dropout layers. The only difference is that, in the first step, we sample a set of modulating scalars $\{\lambda_l\}$ from the uniform distribution, one for each skip connection.
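The following PyTorch sketch illustrates skip connection erosion on a toy residual block; the two-convolution residual function and the uniform range $[1-\Lambda,\,1+\Lambda]$ are illustrative assumptions rather than the paper's exact configuration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ErodedResidualBlock(nn.Module):
    """A residual block whose skip connection is scaled by a random lambda (Eq. (6)).

    lambda_l is re-drawn at every forward pass; the block weights would come from
    the already-trained base network and are never re-trained."""
    def __init__(self, channels: int, erosion: float = 0.2):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1)
        self.erosion = erosion          # Lambda, the magnitude of erosion

    def forward(self, x):
        residual = self.conv2(F.relu(self.conv1(x)))               # F(x_l, W_l)
        lam = 1.0 + self.erosion * (2 * torch.rand(1, device=x.device) - 1)
        return lam * x + residual                                   # lambda_l * x_l + F(...)

# Usage (illustrative): a tiny stack of eroded residual blocks.
ghost = nn.Sequential(*[ErodedResidualBlock(16, erosion=0.2) for _ in range(3)])
out = ghost(torch.randn(1, 16, 32, 32))   # a new set of {lambda_l} per call
```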

3.3 Longitudinal Ensemble

Existing iteration-based ensemble-attack approaches [6, 18] require averaging the outputs (e.g., logits, classification probabilities, or losses) of different networks. However, such a standard ensemble [18] is too costly and inefficient for us, because Ghost Networks readily provide a huge candidate pool of qualified neural models.

To remedy this, we propose the longitudinal ensemble, a specialized fusion method for Ghost Networks, which constructs an implicit ensemble of the ghost networks by randomizing the perturbations across the iterations of an adversarial attack (e.g., I-FGSM [15] or MI-FGSM [6]). Suppose we have a base model $f$, from which we generate a pool of ghost networks $\{f_1, f_2, \ldots, f_n\}$, where $n$ is the number of models. The key step of the longitudinal ensemble is that at the $t$-th iteration, we attack the ghost network $f_t$ only. In comparison, at each iteration the standard ensemble fuses the gradients of all the models in the pool, which requires much more computation. We illustrate the difference between the standard ensemble and the longitudinal ensemble in Fig. 3.

Figure 3: The illustration of the standard ensemble (a) and the proposed longitudinal ensemble (b).

The longitudinal ensemble shares the same prior as [6, 18]: if an adversarial example is generated by attacking multiple networks, it is more likely to transfer to other networks. However, the longitudinal ensemble removes duplicated computation by sampling only one model from the pool in each iteration, rather than using all of them.

Three noteworthy comments should be made here. First, the ghost networks are never stored or trained; as a result, they incur neither additional time nor space complexity. Second, it is obvious from Fig. 3 that attackers can combine the standard ensemble and the longitudinal ensemble of ghost networks. Finally, it is easy to extend the longitudinal ensemble to multi-model attacks by treating each base model as a branch (see the experimental evaluations for details).
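To make the control flow concrete, here is a minimal PyTorch sketch of the longitudinal ensemble combined with I-FGSM; the sample_ghost helper (which applies erosion only at the input for brevity), the toy classifier, and the hyper-parameter values are our simplifications for illustration, not the paper's implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def sample_ghost(base: nn.Module, erosion: float = 0.1) -> nn.Module:
    """Illustrative ghost sampler: multiply the input by a fresh Bernoulli mask.
    In practice the erosion is applied densely inside every block of the base."""
    p = 1.0 - erosion
    class Ghost(nn.Module):
        def forward(self, x):
            mask = torch.bernoulli(torch.full_like(x, p))
            return base(mask * x / p)
    return Ghost()

def longitudinal_ifgsm(base, x, y, eps=16 / 255, alpha=1 / 255, steps=10):
    """I-FGSM where iteration t attacks a freshly sampled ghost network f_t only."""
    x_adv = x.clone()
    for _ in range(steps):
        ghost = sample_ghost(base)                       # one ghost per iteration
        x_adv = x_adv.detach().requires_grad_(True)
        loss = F.cross_entropy(ghost(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        x_adv = x_adv + alpha * grad.sign()              # ascend the loss
        x_adv = torch.min(torch.max(x_adv, x - eps), x + eps)   # stay in the eps-ball
        x_adv = torch.clamp(x_adv, 0.0, 1.0)
    return x_adv.detach()

# Usage (illustrative): attack a toy classifier.
base = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10))
x, y = torch.rand(4, 3, 32, 32), torch.randint(0, 10, (4,))
x_adv = longitudinal_ifgsm(base, x, y)
```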

4 Experiments

In this section, we give a comprehensive experimental evaluation of the proposed Ghost Networks. To distinguish models trained from scratch from the ghost networks we generate, we refer to the former as the base network or base model in the rest of this paper.

Due to space limitations, we will give a more detailed evaluation in the supplementary material.

4.1 Experimental Setup

Base Networks. Nine base models are used in our experiments, including six normally trained models (available at https://github.com/tensorflow/models/tree/master/research/slim), i.e., ResNet v2-50 (Res-50) [11], ResNet v2-101 (Res-101) [11], ResNet v2-152 (Res-152) [11], Inception v3 (Inc-v3) [30], Inception v4 (Inc-v4) [28] and Inception-ResNet v2 (IncRes-v2) [28], and three adversarially-trained models [32] (available at https://github.com/tensorflow/models/tree/master/research/adv_imagenet_models), i.e., Inc-v3ens3, Inc-v3ens4 and IncRes-v2ens.

Datasets. Because it is less meaningful to attack images that are originally misclassified, we follow [36] and select images from the ILSVRC 2012 validation set that can be correctly classified by all the base models.

Attacking Methods. We employ two iteration-based attack methods to evaluate adversarial robustness, i.e., the Iterative Fast Gradient Sign Method (I-FGSM) and the Momentum Iterative Fast Gradient Sign Method (MI-FGSM). Both are variants of the Fast Gradient Sign Method (FGSM) [9] and are available in the cleverhans library [21].

I-FGSM was proposed by Kurakin et al. [15], and learns the adversarial example by

$x^{adv}_{t+1} = \mathrm{Clip}^{\epsilon}_{x}\!\left\{ x^{adv}_t + \alpha \cdot \mathrm{sign}\!\left(\nabla_x J(x^{adv}_t, y;\, \theta)\right)\right\},$    (9)

where $J(\cdot, \cdot;\, \theta)$ is the loss function of a network with parameters $\theta$, and $\mathrm{Clip}^{\epsilon}_{x}$ is the clip function which ensures that the generated adversarial example stays within the $\epsilon$-ball of the original image $x$ with ground-truth label $y$. $T$ is the iteration number and $\alpha$ is the step size. MI-FGSM was proposed by Dong et al. [6], and integrates a momentum term into the attack process to stabilize the update directions and escape from poor local maxima. At the $t$-th iteration, the accumulated gradient is calculated by

$g_{t+1} = \mu \cdot g_t + \frac{\nabla_x J(x^{adv}_t, y;\, \theta)}{\left\lVert \nabla_x J(x^{adv}_t, y;\, \theta) \right\rVert_1},$    (10)

where $\mu$ is the decay factor of the momentum term. The sign of the accumulated gradient is then used to generate the adversarial example, by

$x^{adv}_{t+1} = \mathrm{Clip}^{\epsilon}_{x}\!\left\{ x^{adv}_t + \alpha \cdot \mathrm{sign}(g_{t+1})\right\}.$    (11)
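For reference, a minimal PyTorch sketch of MI-FGSM (Eqs. (10) and (11)) is given below; the step size $\alpha = \epsilon / T$ and the toy model are illustrative choices, not the exact settings used in the paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def mi_fgsm(model, x, y, eps=16 / 255, steps=10, mu=1.0):
    """Minimal MI-FGSM: accumulate an L1-normalized gradient with momentum mu,
    then step in its sign direction and clip to the eps-ball around x."""
    alpha = eps / steps                        # illustrative step size
    x_adv, g = x.clone(), torch.zeros_like(x)
    for _ in range(steps):
        x_adv = x_adv.detach().requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        g = mu * g + grad / grad.abs().sum(dim=(1, 2, 3), keepdim=True)   # Eq. (10)
        x_adv = x_adv + alpha * g.sign()                                   # Eq. (11)
        x_adv = torch.min(torch.max(x_adv, x - eps), x + eps)
        x_adv = torch.clamp(x_adv, 0.0, 1.0)
    return x_adv.detach()

# Usage (illustrative)
model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10))
x, y = torch.rand(2, 3, 32, 32), torch.randint(0, 10, (2,))
x_adv = mi_fgsm(model, x, y)
```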

Parameter Specification. If not specified otherwise, we follow the default settings in [15], i.e., their step size $\alpha$ and total iteration number $T$, with the maximum perturbation $\epsilon$ fixed accordingly (which determines $T$ in this case). For the momentum term, the decay factor $\mu$ is set as in [6].

4.2 Analysis of Ghost Networks

As analyzed above, in order to generate adversarial examples with good transferability, there are generally two requirements for the intrinsic models. First, each individual model should have a low test error. Second, different models should be as diverse as possible (i.e., converge at different local minima). To show that the generated ghost networks are qualified for adversarial attack, we conduct an experiment on the whole ILSVRC 2012 validation set [5].

Descriptive Capacity. In order to quantitatively measure the descriptive capacity of the generated ghost networks, we plot the relationship between the magnitude of erosion $\Lambda$ and the top-1 classification accuracy.

We apply the dropout erosion of Sec. 3.1 to the non-residual networks (Inc-v3 and Inc-v4) and the skip connection erosion of Sec. 3.2 to the residual networks (Res-50, Res-101, Res-152 and IncRes-v2). Fig. 4(a) and Fig. 4(b) present the resulting accuracy curves for dropout erosion and skip connection erosion, respectively.

It is not surprising to observe that the classification accuracies of the different models are negatively correlated with the magnitude of erosion $\Lambda$. By choosing a performance drop of approximately 10% as a threshold, we determine the value of $\Lambda$ individually for each network. Although the performances of the ghost networks are slightly worse than those of the independently trained base networks, the ghost networks still preserve low error rates. As emphasized throughout this paper, it is extremely cheap to generate a huge number of ghost networks.

In our following experiments, unless otherwise specified, we set $\Lambda$ individually for Inc-v3, Inc-v4, Res-50, Res-101, Res-152 and IncRes-v2 according to this threshold.
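A sketch of how such a threshold-based calibration could be implemented is shown below; the helper make_ghost(Lambda), the candidate grid, and the encoding of the 10% threshold are our assumptions for illustration.

```python
import torch

@torch.no_grad()
def top1_accuracy(model, loader, device="cpu"):
    """Top-1 accuracy of a (ghost) model over a validation loader."""
    model.eval()
    correct = total = 0
    for images, labels in loader:
        preds = model(images.to(device)).argmax(dim=1)
        correct += (preds == labels.to(device)).sum().item()
        total += labels.numel()
    return correct / total

def calibrate_erosion(make_ghost, base_acc, loader,
                      grid=(0.05, 0.10, 0.15, 0.20, 0.25), max_drop=0.10):
    """Pick the largest erosion magnitude whose top-1 accuracy drop stays within
    max_drop (e.g., ~10%). make_ghost(Lambda) is assumed to return an eroded copy
    of the base network, and accuracy is assumed to decrease as Lambda grows."""
    best = 0.0
    for lam in grid:
        if base_acc - top1_accuracy(make_ghost(lam), loader) <= max_drop:
            best = lam
    return best
```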

Figure 4: The top-1 accuracy under dropout erosion (a) and skip connection erosion (b) for different magnitudes of erosion $\Lambda$.

Model Diversity. To measure the diversity, we use Res-50 as the backbone model. We denote the base Res-50 described in Sec. 4.1 as Res-50-A, and independently train two additional models with the same architecture, denoted by Res-50-B and Res-50-C. Meanwhile, we apply skip connection erosion to Res-50-A, then obtain three ghost networks denoted as Res-50S-A, Res-50S-B and Res-50S-C, respectively.

We employ the Jensen-Shannon Divergence (JSD) as the evaluation metric for model diversity. Concretely, we compute the pairwise similarity of the output probability distributions (i.e., the predictions after the softmax layer) for each pair of networks as in [12]. Given any image, let $p_A$ and $p_B$ denote the softmax outputs of two networks; then the JSD is defined as

$\mathrm{JSD}(p_A \,\|\, p_B) = \frac{1}{2} \mathrm{KL}(p_A \,\|\, M) + \frac{1}{2} \mathrm{KL}(p_B \,\|\, M),$    (12)

where $M$ is the average of $p_A$ and $p_B$, i.e., $M = \frac{1}{2}(p_A + p_B)$, and $\mathrm{KL}(\cdot\,\|\,\cdot)$ is the Kullback-Leibler divergence.
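The diversity metric can be computed directly from the two softmax outputs; a short sketch follows (the small epsilon for numerical stability is our choice).

```python
import torch
import torch.nn.functional as F

def kl_div(p: torch.Tensor, q: torch.Tensor) -> torch.Tensor:
    """KL(p || q) per sample; inputs are probability vectors of shape (batch, classes)."""
    eps = 1e-12  # avoid log(0)
    return (p * (p.clamp_min(eps).log() - q.clamp_min(eps).log())).sum(dim=1)

def jsd(p_a: torch.Tensor, p_b: torch.Tensor) -> torch.Tensor:
    """Jensen-Shannon divergence of Eq. (12), averaged over the batch."""
    m = 0.5 * (p_a + p_b)
    return (0.5 * kl_div(p_a, m) + 0.5 * kl_div(p_b, m)).mean()

# Usage (illustrative): diversity between two networks' softmax outputs on a batch.
logits_a, logits_b = torch.randn(8, 1000), torch.randn(8, 1000)
score = jsd(F.softmax(logits_a, dim=1), F.softmax(logits_b, dim=1))
```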

In Fig. 5, we report the JSD averaged over the ILSVRC 2012 validation set for each pair of networks. As can be seen, the diversity between ghost networks is comparable to, or even larger than, that between independently trained networks.

Figure 5: The mean diversity (JSD) of each pair of networks over the ILSVRC 2012 validation set.

Based on the analysis of descriptive capacity and model diversity, we can see that the generated ghost networks provide sufficiently accurate yet diverse descriptions of the data manifold, which benefits the learning of transferable adversarial examples, as we experimentally show below.

4.3 Single-model Attack

| Attack Method | Experiment | Res-50 (W) | Res-50 (B) | Res-101 (W) | Res-101 (B) | Res-152 (W) | Res-152 (B) | IncRes-v2 (W) | IncRes-v2 (B) | Inc-v3 (W) | Inc-v3 (B) | Inc-v4 (W) | Inc-v4 (B) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| I-FGSM [15] | Exp. S1 | 99.5 | 16.3 | 99.4 | 17.8 | 98.4 | 16.7 | 94.8 | 8.3 | 99.8 | 5.3 | 99.5 | 7.3 |
| | Exp. S2 | 98.7 | 8.4 | 78.8 | 6.1 | 92.4 | 6.4 | 95.9 | 5.7 | 67.6 | 1.7 | 39.6 | 1.9 |
| | Exp. S3 (ours) | 99.7 | 23.4 | 99.6 | 23.7 | 99.4 | 21.1 | 96.5 | 11.2 | 97.0 | 6.3 | 86.8 | 10.0 |
| | Exp. S4 | 99.6 | 28.8 | 99.7 | 29.9 | 99.6 | 25.6 | 98.7 | 13.1 | 98.9 | 6.3 | 96.2 | 9.3 |
| | Exp. S5 (ours) | 99.6 | 35.9 | 99.7 | 35.9 | 99.6 | 60.1 | 98.7 | 14.6 | 99.9 | 12.3 | 98.5 | 19.4 |
| MI-FGSM [6] | Exp. S1 | 99.4 | 29.4 | 99.2 | 31.3 | 98.3 | 29.6 | 94.0 | 20.0 | 99.8 | 13.7 | 99.5 | 18.4 |
| | Exp. S2 | 99.4 | 17.4 | 99.2 | 19.9 | 98.6 | 17.9 | 94.1 | 15.2 | 85.9 | 5.6 | 76.5 | 7.2 |
| | Exp. S3 (ours) | 99.7 | 39.4 | 99.8 | 40.1 | 99.5 | 38.0 | 95.9 | 26.8 | 98.0 | 17.6 | 90.6 | 22.4 |
| | Exp. S4 | 99.6 | 44.5 | 99.7 | 43.2 | 99.3 | 41.9 | 98.5 | 30.4 | 99.7 | 17.9 | 98.3 | 25.6 |
| | Exp. S5 (ours) | 99.6 | 50.6 | 99.7 | 51.4 | 98.6 | 64.9 | 98.3 | 33.3 | 99.8 | 28.3 | 97.8 | 37.4 |
Table 1: The attack rate (%) comparison of different methods. "W" denotes the white-box attack and "B" denotes the black-box attack. Due to the space limitation, for the black-box attack we report the average performance over the remaining test models.

First, we evaluate ghost networks in the single-model attack setting, where attackers can only access one base model trained on real data. To demonstrate the effectiveness of our method, we design five experimental settings, as follows:

Exp. S1: We attack the base model with one of the two attack methods (I-FGSM [15] or MI-FGSM [6]); these serve as the two baselines.

Exp. S2: We apply erosion to the base model once and obtain a single ghost network. An adversarial attack is then conducted on this ghost network to generate the adversarial examples.

Exp. S3: We independently apply erosion multiple times to obtain a pool of ghost networks, and then utilize the proposed longitudinal ensemble to efficiently fuse them during the adversarial attack.

Exp. S4: Similar to Exp. S3, except that we use the standard ensemble method proposed in [18] to fuse the ghost networks.

Exp. S5: A larger set of ghost networks is generated and fused in a hybrid manner: at each iteration of the attack, we perform a standard ensemble over a small group of ghost networks, and across iterations we perform a longitudinal ensemble.

We attack the six normally-trained networks and test on all nine models (the three adversarially-trained networks included). The attack rates are shown in Table 1. Due to the space limitation, we report the average performance for the black-box attack rather than each individual performance on each testing model (all individual cases are shown in Fig. 6 and the supplementary material).

As can be seen from Table 1, a single ghost network is worse than the base network (Exp. S2 vs. Exp. S1), because the descriptive power of a ghost network is inferior to that of the base network. However, by leveraging the longitudinal ensemble, our work achieves a much higher attack rate in most settings, especially for the black-box attack. For example, when attacking Res-50 in the black-box setting, Exp. S3 outperforms Exp. S1 by 7.1% with I-FGSM and by 10.0% with MI-FGSM. This observation firmly demonstrates the effectiveness of ghost networks in learning transferable adversarial examples. It should be mentioned that the computational cost remains almost the same as Exp. S1 for two reasons. First, the ghost networks used in Exp. S3 are not trained but eroded from the base model and used on-the-fly. Second, multiple ghost networks are fused via the longitudinal ensemble instead of the standard ensemble method in [18].

In fact, the proposed ghost networks can also be fused via the standard ensemble method, as shown in Exp. S4. In this case, we obtain a higher attack rate at the sacrifice of computational efficiency. For instance, Exp. S4 reports an attack rate of 28.8% when attacking Res-50 with I-FGSM in the black-box setting, an improvement of 5.4% over Exp. S3.

This observation, from another point of view, inspires us to combine the standard ensemble and the longitudinal ensemble, as shown in Exp. S5. As we can see, Exp. S5 consistently beats all the compared methods in all the black-box settings. Of course, Exp. S5 is as computationally expensive as Exp. S4; however, the additional computational overhead stems from the standard ensemble rather than from the longitudinal ensemble proposed in this work.

Note that in all the experiments presented in Table 1, we use only one individual base model. Even in the case of Exp. S3, all the fused models are ghost networks. Since the generated ghost networks are never stored or trained, no extra space complexity is needed. Therefore, one can clearly observe the benefit of ghost networks. Especially when comparing Exp. S5 and Exp. S1, ghost networks achieve a dramatic improvement in the black-box attack.

Based on the experimental results above, we arrive at a conclusion similar to [18]: the number of intrinsic models is essential for improving the transferability of adversarial examples. A different conclusion, however, is that it is less necessary to independently train different models; instead, ghost networks can be a computationally cheap alternative that achieves good performance. As the number of intrinsic models increases, the attack rate increases. We further exploit this in the multi-model attack.

Figure 6: The attack rate comparison when attacking Res-50 (a)(b) and Inc-v3 (c)(d) with the attack method I-FGSM (a)(c) and MI-FGSM (b)(d), and testing on all the base models.

In Fig. 6, we select two base models, i.e., Res-50 and Inc-v3, to attack and present their individual performances when testing on all the base models. One can easily observe the positive effect of ghost networks on improving the transferability of adversarial examples.

4.4 Multi-model Attack

In this subsection, we evaluate ghost networks in the multi-model setting, where attackers have access to multiple independently trained networks.

4.4.1 Same Architecture and Different Parameters

We first evaluate a simple setting of multi-model attack, where the base models share the same network architecture but have different weights. The same three Res-50 models as in Sec. 4.2 are used, denoted Res-50-A, Res-50-B and Res-50-C, and we denote the $i$-th ghost network generated upon base model Res-50-X as Res-50S-X$_i$.

Exp. M1: A standard ensemble of the base model Res-50-A repeated three times. This is simply equivalent to a single-model attack and serves as a weak baseline.

Exp. M2: A standard ensemble of the three base models Res-50-A, Res-50-B and Res-50-C, which serves as a strong baseline.

Exp. M3: A standard ensemble of Res-50S-A$_1$, Res-50S-A$_2$ and Res-50S-A$_3$, which simply replaces the base model in Exp. M1 with three ghost networks associated with it.

Exp. M4: A standard ensemble of Res-50S-A$_1$, Res-50S-B$_1$ and Res-50S-C$_1$, which replaces the base networks used in Exp. M2 with ghost networks, one associated with each base model.

Exp. M5: A pool of ghost networks is generated upon the single base model Res-50-A and fused in a hybrid manner: at each iteration of the attack we perform a standard ensemble over a group of these ghost networks, and across iterations we perform a longitudinal ensemble, yielding 30 intrinsic models in total (see Table 2).

Exp. M6: For each base model, we generate a pool of ghost networks. At the $t$-th iteration of the attack, we perform a standard ensemble of the $t$-th ghost networks from the three base models, and a longitudinal ensemble across iterations within each base model.

The adversarial examples generated by each method are used to test all the models, and we report the average attack rates in Table 2. It is easy to understand that Exp. M2 performs better than Exp. M1, Exp. M3 and Exp. M4, as it ensembles three independently trained models. However, comparing Exp. M5 with Exp. M2, we observe a significant improvement in attack rate: with MI-FGSM as the attack method, Exp. M5 beats Exp. M2 by 5.70%. Although Exp. M5 uses only 1 base model while Exp. M2 uses 3, Exp. M5 actually fuses 30 intrinsic models. Such a result further supports our previous claim that the number of intrinsic models is essential, but it is less necessary to obtain them by independently training from scratch. Similarly, Exp. M6 yields the best performance, as it has 3 independently trained models and 30 intrinsic models.

| Methods | Attack Rate (I-FGSM) | Attack Rate (MI-FGSM) | #Base | #Intrinsic |
| --- | --- | --- | --- | --- |
| Exp. M1 | 25.51 | 37.22 | 1 | 1 |
| Exp. M2 | 33.63 | 46.83 | 3 | 3 |
| Exp. M3 | 28.88 | 37.23 | 1 | 3 |
| Exp. M4 | 26.28 | 40.79 | 3 | 3 |
| Exp. M5 | 38.29 | 52.53 | 1 | 30 |
| Exp. M6 | 41.14 | 54.29 | 3 | 30 |
Table 2: The comparison of attack rates (%) for multi-model attack. #Base denotes the number of base models and #Intrinsic denotes the number of intrinsic models. We test on the networks described in Sec. 4.1 and report the average performances.
Settings Methods -Res-50 -Res-101 -Res-152 -IncRes-v2 -Inc-v3 -Inc-v4
Ensemble I-FGSM 98.08 98.06 98.46 99.22 98.78 99.02
I-FGSM + ours 92.86 93.04 92.62 96.02 95.46 96.82
MI-FGSM 97.62 99.46 97.86 98.98 98.32 98.84
MI-FGSM + ours 93.98 93.88 93.66 96.96 95.92 97.08
Hold-out I-FGSM 71.08 71.16 67.92 46.60 59.98 50.86
I-FGSM + ours 80.22 79.80 77.02 60.20 73.18 67.84
MI-FGSM 79.32 79.14 77.26 64.24 72.22 66.64
MI-FGSM + ours 87.14 86.14 84.64 74.18 82.06 79.18
Inc-v3ens3 I-FGSM 13.34 13.40 13.46 13.36 15.42 14.06
I-FGSM + ours 21.38 22.00 21.78 20.98 24.06 21.36
MI-FGSM 26.32 25.74 26.56 25.48 29.72 27.36
MI-FGSM + ours 34.10 34.50 35.00 33.78 39.78 36.64
Inc-v3ens4 I-FGSM 7.10 6.96 6.92 6.54 8.22 7.30
I-FGSM + ours 11.30 11.74 11.56 10.10 12.98 10.98
MI-FGSM 13.96 13.52 13.68 12.72 16.50 14.80
MI-FGSM + ours 17.82 17.68 17.78 16.06 22.16 18.82
IncRes-v2ens I-FGSM 11.36 10.92 11.34 10.94 12.40 11.52
I-FGSM + ours 18.42 18.26 18.66 17.94 20.08 17.40
MI-FGSM 22.40 22.06 22.58 22.40 25.12 23.02
MI-FGSM + ours 29.32 28.98 29.58 29.00 32.60 30.48
Table 3: The attack rate (%) comparison of multi-model attack. The sign "-" indicates the name of the hold-out model. "Ensemble" denotes the white-box attack; the rest are black-box attacks.
| Methods | TsAIL (B) | iyswim (B) | Anil Thomas (B) | Black-box Average | Inc-v3_adv (W) | IncRes-v2_ens (W) | Inc-v3 (W) | White-box Average |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| No.1 Submission | 13.60 | 43.20 | 43.90 | 33.57 | 94.40 | 93.00 | 97.30 | 94.90 |
| No.1 Submission + ours | 14.80 | 52.28 | 51.68 | 39.59 | 97.62 | 96.00 | 95.48 | 96.37 |
Table 4: The attack rate (%) comparison in the NIPS 2017 Adversarial Challenge.

4.4.2 Different Architectures

Besides the baseline comparison above, we further evaluate ghost networks in the multi-model setting following [18]. In this experiment, we attack an ensemble of five out of the six normally-trained models, then test on the ensembled networks (white-box setting) and on the hold-out network (black-box setting). We also test on the adversarially-trained networks to evaluate the transferability of the generated adversarial examples in the black-box attack.

The results are summarized in Table 3. Our method achieves attack rates on the ensembled networks (white-box setting) comparable to I-FGSM and MI-FGSM, while the performance in the black-box attack is significantly improved. For example, when holding out Res-50, our method improves the performance of I-FGSM from 71.08% to 80.22%, and that of MI-FGSM from 79.32% to 87.14%. When testing on the three adversarially-trained networks, the improvement is even more notable. These results further testify to the ability of ghost networks to learn transferable adversarial examples.

4.5 NIPS 2017 Adversarial Challenge

Finally, we evaluate our method on the benchmark of the NIPS 2017 Adversarial Challenge [17]. For performance evaluation, we use the top-3 defense submissions (black-box models), i.e., TsAIL (https://github.com/lfz/Guided-Denoise), iyswim (https://github.com/cihangxie/NIPS2017_adv_challenge_defense) and Anil Thomas (https://github.com/anlthms/nips-2017/tree/master/mmd), and three official baselines (white-box models), i.e., Inc-v3adv, IncRes-v2ens and Inc-v3. The test dataset contains images with the same 1000-class label space as ImageNet [5].

Following the experimental setting of the No.1 attack submission [6], we attack an ensemble of Inc-v3, IncRes-v2, Inc-v4, Res-152, Inc-v3ens3, Inc-v3ens4, IncRes-v2ens and Inc-v3adv [16]. The ensemble weights are set equally for the first seven networks and to a smaller weight for Inc-v3adv. The total iteration number and the step size follow the No.1 submission, and the maximum perturbation is randomly selected within the range specified by the challenge.
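As a reference for how such a weighted multi-model attack fuses its members, here is a minimal sketch of weighted logit fusion followed by one I-FGSM step; the toy models, the placeholder weights, and the step size are illustrative and do not reproduce the submission's configuration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def ensemble_logits(models, weights, x):
    """Standard (weighted) logit fusion used in multi-model attacks:
    the fused logits are a convex combination of each model's logits."""
    w = torch.tensor(weights) / sum(weights)
    return sum(wi * m(x) for wi, m in zip(w, models))

# Usage (illustrative): one I-FGSM step against the weighted ensemble.
models = [nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10)) for _ in range(3)]
weights = [1.0, 1.0, 0.5]                      # placeholder weights, not the paper's
x = torch.rand(2, 3, 32, 32).requires_grad_(True)
y = torch.randint(0, 10, (2,))
loss = F.cross_entropy(ensemble_logits(models, weights, x), y)
grad = torch.autograd.grad(loss, x)[0]
x_adv = (x + (1 / 255) * grad.sign()).clamp(0, 1).detach()
```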

The results are summarized in Table 4. Consistent with the previous experiments, we observe that by applying ghost networks the performance of the No.1 submission can be significantly improved, especially for the black-box attack. For example, the average performance of the black-box attack rises from 33.57% to 39.59%, an improvement of 6.02%. The most remarkable improvement is achieved when testing on iyswim, where ghost networks lead to an improvement of 9.08%. This suggests that our proposed method generalizes well to other defense mechanisms.

5 Conclusion

This work focuses on learning transferable adversarial examples for adversarial attack. We propose, for the first time, to exploit network erosion to generate a kind of virtual models called ghost networks. Ghost networks, together with the coupled longitudinal ensemble strategy, require almost no additional time or space consumption, and can therefore be a rather efficient tool to improve existing methods for learning transferable adversarial examples. Extensive experiments (more in the supplementary material) have firmly demonstrated the efficacy of ghost networks. Note that the ghost networks in this work are generated by perturbing the dropout layers and skip connections. However, it would be interesting to see the effect of perturbing other typical layers in neural networks, e.g., batch normalization [13] or ReLU [20]. We leave these issues as future work.

References

  • [1] S. Baluja and I. Fischer. Learning to attack: Adversarial transformation networks. In AAAI, 2018.
  • [2] B. Biggio, I. Corona, D. Maiorca, B. Nelson, N. Šrndić, P. Laskov, G. Giacinto, and F. Roli. Evasion attacks against machine learning at test time. In ECML-PKDD, 2013.
  • [3] N. Carlini and D. Wagner. Towards evaluating the robustness of neural networks. In IEEE S&P, 2017.
  • [4] L.-C. Chen, G. Papandreou, I. Kokkinos, K. Murphy, and A. L. Yuille. Deeplab: Semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected crfs. TPAMI, 40(4):834–848, 2018.
  • [5] J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, and L. Fei-Fei. Imagenet: A large-scale hierarchical image database. In CVPR, 2009.
  • [6] Y. Dong, F. Liao, T. Pang, H. Su, X. Hu, J. Li, and J. Zhu. Boosting adversarial attacks with momentum. In CVPR, 2018.
  • [7] R. Girshick. Fast r-cnn. In ICCV, 2015.
  • [8] I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio. Generative adversarial nets. In NIPS, 2014.
  • [9] I. J. Goodfellow, J. Shlens, and C. Szegedy. Explaining and harnessing adversarial examples. In ICLR, 2015.
  • [10] K. He, X. Zhang, S. Ren, and J. Sun. Deep residual learning for image recognition. In CVPR, 2016.
  • [11] K. He, X. Zhang, S. Ren, and J. Sun. Identity mappings in deep residual networks. In ECCV, 2016.
  • [12] G. Huang, Y. Li, G. Pleiss, Z. Liu, J. E. Hopcroft, and K. Q. Weinberger. Snapshot ensembles: Train 1, get m for free. In ICLR, 2017.
  • [13] S. Ioffe and C. Szegedy. Batch normalization: Accelerating deep network training by reducing internal covariate shift. In ICML, 2015.
  • [14] A. Krizhevsky, I. Sutskever, and G. E. Hinton. Imagenet classification with deep convolutional neural networks. In NIPS, 2012.
  • [15] A. Kurakin, I. Goodfellow, and S. Bengio. Adversarial examples in the physical world. In ICLR Workshop, 2017.
  • [16] A. Kurakin, I. Goodfellow, and S. Bengio. Adversarial machine learning at scale. In ICLR, 2017.
  • [17] A. Kurakin, I. Goodfellow, S. Bengio, Y. Dong, F. Liao, M. Liang, T. Pang, J. Zhu, X. Hu, C. Xie, et al. Adversarial attacks and defences competition. arXiv preprint arXiv:1804.00097, 2018.
  • [18] Y. Liu, X. Chen, C. Liu, and D. Song. Delving into transferable adversarial examples and black-box attacks. In ICLR, 2017.
  • [19] J. Long, E. Shelhamer, and T. Darrell. Fully convolutional networks for semantic segmentation. In CVPR, 2015.
  • [20] V. Nair and G. E. Hinton. Rectified linear units improve restricted boltzmann machines. In ICML, 2010.
  • [21] N. Papernot, F. Faghri, N. Carlini, I. Goodfellow, R. Feinman, A. Kurakin, C. Xie, Y. Sharma, T. Brown, A. Roy, A. Matyasko, V. Behzadan, K. Hambardzumyan, Z. Zhang, Y.-L. Juang, Z. Li, R. Sheatsley, A. Garg, J. Uesato, W. Gierke, Y. Dong, D. Berthelot, P. Hendricks, J. Rauber, R. Long, and P. McDaniel. cleverhans v2.1.0: an adversarial machine learning library. arXiv preprint arXiv:1610.00768, 2018.
  • [22] N. Papernot, P. McDaniel, S. Jha, M. Fredrikson, Z. B. Celik, and A. Swami. The limitations of deep learning in adversarial settings. In EuroS&P, 2016.
  • [23] N. Papernot, P. McDaniel, X. Wu, S. Jha, and A. Swami. Distillation as a defense to adversarial perturbations against deep neural networks. In IEEE S&P, 2016.
  • [24] O. Poursaeed, I. Katsman, B. Gao, and S. Belongie. Generative adversarial perturbations. In CVPR, 2017.
  • [25] S. Ren, K. He, R. Girshick, and J. Sun. Faster r-cnn: Towards real-time object detection with region proposal networks. In NIPS, 2015.
  • [26] K. Simonyan and A. Zisserman. Very deep convolutional networks for large-scale image recognition. In ICLR, 2015.
  • [27] N. Srivastava, G. Hinton, A. Krizhevsky, I. Sutskever, and R. Salakhutdinov. Dropout: a simple way to prevent neural networks from overfitting. JMLR, 15(1):1929–1958, 2014.
  • [28] C. Szegedy, S. Ioffe, V. Vanhoucke, and A. A. Alemi. Inception-v4, inception-resnet and the impact of residual connections on learning. In AAAI, 2017.
  • [29] C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, and A. Rabinovich. Going deeper with convolutions. In CVPR, 2015.
  • [30] C. Szegedy, V. Vanhoucke, S. Ioffe, J. Shlens, and Z. Wojna. Rethinking the inception architecture for computer vision. In CVPR, 2016.
  • [31] C. Szegedy, W. Zaremba, I. Sutskever, J. Bruna, D. Erhan, I. Goodfellow, and R. Fergus. Intriguing properties of neural networks. In ICLR, 2014.
  • [32] F. Tramèr, A. Kurakin, N. Papernot, D. Boneh, and P. McDaniel. Ensemble adversarial training: Attacks and defenses. In ICLR, 2018.
  • [33] A. Veit and S. Belongie. Convolutional networks with adaptive inference graphs. In ECCV, 2018.
  • [34] A. Veit, M. J. Wilber, and S. Belongie. Residual networks behave like ensembles of relatively shallow networks. In NIPS, 2016.
  • [35] C. Xiao, B. Li, J.-Y. Zhu, W. He, M. Liu, and D. Song. Generating adversarial examples with adversarial networks. In IJCAI, 2018.
  • [36] C. Xie, Z. Zhang, J. Wang, Y. Zhou, Z. Ren, and A. Yuille. Improving transferability of adversarial examples with input diversity. arXiv preprint arXiv:1803.06978, 2018.