, which adds well-designed tiny perturbations to legitimate inputs to cause the intended misclassification by DNN models. Such attacks could cause severe safety, economic and social problems if launched against DNNs deployed in practical applications ranging from face recognition and autonomous driving to speech authentication.
To address this critical challenge, the machine learning community has conducted extensive research on the vulnerability of DNNs, from both the attack and defense perspectives. Adversarial attack techniques were pioneered by Szegedy et al. [szegedy2013intriguing]. Since then, researchers have developed various adversarial attack algorithms targeting different types of DNN models, including convolutional neural networks, recurrent neural networks and graph neural networks, and different application scenarios, ranging from image classification and machine translation to graph classification. Among those algorithms, one popular technique is the fast gradient sign method (FGSM), which performs a one-step gradient computation to craft untargeted adversarial examples [goodfellow2014explaining]. Because the attack performance of FGSM is relatively weak, the community has proposed several iterative optimization-based techniques, including C&W, I-FGSM and PGD, that deliver state-of-the-art attack performance [carlini2017towards, goodfellow2014explaining, madry2017towards]. Furthermore, some recent work has proposed using generative models, e.g., GAN and U-Net, to generate adversarial examples [poursaeed2018generative, xiao2018generating, goodfellow2014generative, ronneberger2015u].
Although existing adversarial attack methods already exhibit a high attack success rate (ASR), especially in the white-box attack scenario, from the perspective of practical deployment they still suffer from one or more drawbacks: long adversarial example generation time, high memory cost for launching the attack, insufficient robustness against defense methods, and low transferability in the black-box attack scenario.
Aiming to overcome these drawbacks, in this paper we propose a Content-aware Adversarial Attack Generator (CAG), to achieve real-time, low-cost, enhanced-robustness and high-transferability adversarial attack. We show some adversarial images generated by CAG in Figure 1. The features and benefits of CAG are summarized as follows:
CAG is a generative model-based attack, so it can avoid time-consuming iterative optimization procedure to generate adversarial examples. Compared with the state-of-the-art iterative attacks such as PGD and C&W, CAG achieves significant speedup (at least times), and hence makes real-time attack possible.
CAG utilizes a trainable embedding layer to encode all label information into one single model, unlike prior generative model-based methods, which require a different generative model for each targeted class. In an $n$-class targeted attack scenario, the number of required generative models is reduced from $n$ to 1, thereby drastically reducing the memory cost for launching attacks.
CAG integrates class activation map (CAM) information into the training process, in contrast to many other attack methods that generate adversarial perturbations over the entire input. Consequently, CAG is able to generate adversarial perturbations that focus on the critical areas of the input, which improves the attack's robustness against state-of-the-art defense approaches.
CAG exhibits high transferability across different DNN classifier models in black-box attack scenario. CAG can generate adversarial perturbations with better generality by introducing random dropout in the perturbations-generation process. As a result, CAG’s adversarial examples have higher transferability when attacking unseen classifiers.
The rest of this paper is organized as follows. Section 2 introduces the related work on adversarial attack and defense methods. Section 3 discusses our motivation. Section 4 describes the technical details of CAG. The experimental results are presented and analyzed in Section 5. Section 6 draws the conclusions of all findings in our paper.
2 Related Work
2.1 Adversarial Attacks
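To define an adversarial attack, let $\mathcal{X}$ be the set of valid inputs from the dataset, $\mathcal{Y}$ be the set of valid class labels, and $f: \mathcal{X} \to \mathcal{Y}$ be the well-trained DNN classifier. Let $x_i \in \mathcal{X}$ denote the $i$-th benign instance and $y_i \in \mathcal{Y}$ the corresponding true label. The goal of an adversarial attack is to create $x_i' = x_i + \delta$, where $\delta$ is an imperceptible adversarial perturbation. A nontargeted attack requires $f(x_i') \neq y_i$, and a targeted attack specifies a target label $t \neq y_i$ such that $f(x_i') = t$.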
Fast gradient sign method (FGSM) is a one-step approach for fast adversarial example generation [goodfellow2014explaining]. It linearizes the loss function in the neighborhood of a legitimate input and finds the exact maximum of the linearized loss function. Correspondingly, its adversarial example generation formula is as follows:
$$x' = x + \epsilon \cdot \mathrm{sign}\big(\nabla_x J(x, y)\big),$$
where $y$ denotes the true label, $\nabla_x J(\cdot, \cdot)$ computes the gradient of the loss function with respect to the input, and $\mathrm{sign}(\cdot)$ denotes the sign function. Notice that $\epsilon$ here is the attack strength parameter that controls the balance between the attack performance and the norm of the perturbations.
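The one-step update described above can be illustrated on a toy model. The following is only a minimal sketch: it assumes a hand-written logistic-regression "classifier" with made-up weights `w`, `b` and a made-up input `x`, none of which come from the paper. For this model the input gradient of the cross-entropy loss has the closed form `(p - y) * w`.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def bce_loss(w, b, x, y):
    """Binary cross-entropy of a logistic model on a single example."""
    p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))

def input_gradient(w, b, x, y):
    """Gradient of the loss with respect to the input x: (p - y) * w."""
    p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
    return [(p - y) * wi for wi in w]

def sign(v):
    return (v > 0) - (v < 0)

def fgsm(w, b, x, y, eps):
    """One FGSM step: move every input coordinate by eps along the gradient sign."""
    grad = input_gradient(w, b, x, y)
    return [xi + eps * sign(g) for xi, g in zip(x, grad)]

# Hypothetical toy model and input (illustration only).
w, b = [2.0, -1.0, 0.5], 0.1
x, y = [0.4, 0.2, 0.9], 1

x_adv = fgsm(w, b, x, y, eps=0.1)
print("loss before:", bce_loss(w, b, x, y))
print("loss after: ", bce_loss(w, b, x_adv, y))
```

Every coordinate moves by exactly `eps`, so the perturbation has an $L_\infty$ norm of `eps`, and the loss on the true label increases after the step.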
I-FGSM & PGD
Although FGSM is fast, its attack performance is relatively weak. Researchers have proposed various approaches that achieve stronger attacks by improving the vanilla FGSM method. Kurakin et al. [kurakin2016adversarial] propose to take multiple FGSM steps (I-FGSM) with a smaller attack strength $\alpha$ in an iterative way:
$$x'_{0} = x, \qquad x'_{k+1} = \mathrm{Clip}_{x,\epsilon}\big\{x'_{k} + \alpha \cdot \mathrm{sign}\big(\nabla_x J(x'_{k}, y)\big)\big\},$$
where $x'_{k}$ is the adversarial image at the $k$-th iteration, and $\mathrm{Clip}_{x,\epsilon}$ clips the overall attack strength back to $\epsilon$ at the end of the iteration. Notice that in the case of using the $L_\infty$ norm, I-FGSM is equivalent to another popular iteration-based attack method, projected gradient descent (PGD) [madry2017towards].
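The iterative variant can be sketched in the same toy setting. As before, the logistic model, its weights, and the values chosen for `eps`, `alpha` and `steps` are illustrative assumptions, not settings from the paper. The sketch shows the two ingredients named above: several small signed-gradient steps, each followed by clipping back into the `eps`-ball around the clean input.

```python
import math

def predict(w, b, x):
    """Probability of class 1 under a toy logistic model."""
    return 1.0 / (1.0 + math.exp(-(sum(wi * xi for wi, xi in zip(w, x)) + b)))

def input_gradient(w, b, x, y):
    """Gradient of the cross-entropy loss with respect to the input."""
    p = predict(w, b, x)
    return [(p - y) * wi for wi in w]

def sign(v):
    return (v > 0) - (v < 0)

def clip(value, low, high):
    return max(low, min(high, value))

def i_fgsm(w, b, x, y, eps, alpha, steps):
    """Iterative FGSM: small steps of size alpha, clipped to |x_adv - x| <= eps."""
    x_adv = list(x)
    for _ in range(steps):
        grad = input_gradient(w, b, x_adv, y)
        x_adv = [
            clip(xa + alpha * sign(g), xi - eps, xi + eps)
            for xa, g, xi in zip(x_adv, grad, x)
        ]
    return x_adv

# Hypothetical toy model and input (illustration only).
w, b = [2.0, -1.0, 0.5], 0.1
x, y = [0.4, 0.2, 0.9], 1

x_adv = i_fgsm(w, b, x, y, eps=0.1, alpha=0.03, steps=10)
print("confidence in true class before:", predict(w, b, x))
print("confidence in true class after: ", predict(w, b, x_adv))
```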
C&W [carlini2017towards] is an optimization-based attack method. It aims to optimize the loss function as follows:
$$\min_{x'} \; \lVert x' - x \rVert_2^2 + c \cdot \max\Big(\max_{i \neq t} Z(x')_i - Z(x')_t,\; -\kappa\Big),$$
where $t$ is the targeted class, $Z(\cdot)$ denotes the softmax function, $c$ is a constant set by binary search, and $\kappa$ is an adjustable parameter that encourages the attacker to find an adversarial example that is classified as class $t$ with high confidence. By minimizing the above loss function with the Adam optimizer in an iterative way, C&W can achieve high ASR with a low perturbation norm.
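To make the roles of the constant and the confidence parameter concrete, the sketch below evaluates a C&W-style objective for a given candidate. It follows the commonly cited form from [carlini2017towards] (squared $L_2$ distance plus a weighted margin term). The function name, the argument names and the example numbers are hypothetical, and the score vector `z` is simply whatever classifier output the margin is computed on.

```python
def cw_objective(x, x_adv, z, target, c, kappa):
    """C&W-style objective: squared L2 distance + c * clamped margin.

    x, x_adv : clean and candidate adversarial inputs (flat lists)
    z        : classifier scores for x_adv, one per class
    target   : index of the targeted class
    c        : trade-off constant (found by binary search in the attack)
    kappa    : confidence parameter; the margin is clamped at -kappa
    """
    distance = sum((a - b) ** 2 for a, b in zip(x_adv, x))
    best_other = max(score for i, score in enumerate(z) if i != target)
    margin = max(best_other - z[target], -kappa)
    return distance + c * margin

x = [0.0, 0.0]
x_adv = [0.1, 0.2]

# Target class 1 is not yet the top class: the margin term is positive.
print(cw_objective(x, x_adv, z=[2.0, 0.5, 1.0], target=1, c=1.0, kappa=0.0))

# Target class 1 wins by a wide gap: the margin is clamped at -kappa.
print(cw_objective(x, x_adv, z=[0.0, 5.0, 1.0], target=1, c=1.0, kappa=2.0))
```

A larger `kappa` keeps rewarding the optimizer until the target class leads by at least that gap, which is why it yields higher-confidence adversarial examples.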
One drawback of the iterative methods mentioned above is their long generation time. Hence another way to generate adversarial examples is to use a generative model, such as GAN, Autoencoder [goodfellow2014generative] or U-Net. For instance, Xiao et al. [xiao2018generating] apply AdvGAN to craft perceptually realistic adversarial examples. Moreover, Baluja et al. [baluja2017adversarial] develop an adversarial transformation network to convert inputs into adversarial examples. Poursaeed et al. [poursaeed2018generative] propose a method named Generative Adversarial Perturbations (GAP) that uses a ResNet-based generative model [johnson2016perceptual] to perform adversarial attacks.
2.2 Adversarial Defenses
The key idea of the pixel deflection defense is to randomly replace pixels with nearby pixels [prakash2018deflecting]. To achieve the replacement, this method uses the CAMs of the top-5 predictions to guide the update of the pixels [zhou2016learning]. In this scenario, the probability of a pixel being updated is inversely proportional to the likelihood that the area contains the object. After the pixel replacement, a denoising operation is applied to recover the classification accuracy.
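The replacement step can be illustrated with a simplified sketch. The assumptions here are a single-channel image stored as nested lists, a CAM normalized to [0, 1], and an update probability of `1 - cam[r][c]` as one simple way to make updates less likely on the object. The denoising stage mentioned above is omitted, and the function and parameter names are hypothetical rather than taken from [prakash2018deflecting].

```python
import random

def pixel_deflection(image, cam, deflections, window, seed=0):
    """Replace randomly chosen pixels with a random neighbor within `window`.

    image : H x W nested list of pixel values
    cam   : H x W nested list in [0, 1]; high values mark the object
    """
    rng = random.Random(seed)
    height, width = len(image), len(image[0])
    out = [row[:] for row in image]
    for _ in range(deflections):
        r, c = rng.randrange(height), rng.randrange(width)
        # Pixels with a high CAM value are likely part of the object: skip them.
        if rng.random() < cam[r][c]:
            continue
        # Pick a neighbor inside the window, clamped to the image borders.
        nr = min(height - 1, max(0, r + rng.randint(-window, window)))
        nc = min(width - 1, max(0, c + rng.randint(-window, window)))
        out[r][c] = out[nr][nc]
    return out

image = [[r * 4 + c for c in range(4)] for r in range(4)]
background_only = [[0.0] * 4 for _ in range(4)]
deflected = pixel_deflection(image, background_only, deflections=50, window=2)
for row in deflected:
    print(row)
```

With a CAM of all ones, every sampled pixel is skipped and the image is returned unchanged, which shows how the object region is shielded from modification.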
The effects of adversarial attacks can also be mitigated by using randomization. For instance, Xie et al. [xie2017mitigating] propose to first resize the input to a random size. After that, a random padding operation pads zeros around the resized image. Though it may seem simple, this method can significantly improve the robustness of DNN models against adversarial attacks.
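The two randomized steps, resize then zero-pad, can be sketched as follows. This is a simplified illustration on a square single-channel image stored as nested lists. The nearest-neighbor interpolation, the size range, and all names are assumptions made for the example, not details from [xie2017mitigating].

```python
import random

def resize_nearest(image, new_size):
    """Nearest-neighbor resize of an H x W nested list to new_size x new_size."""
    height, width = len(image), len(image[0])
    return [
        [image[r * height // new_size][c * width // new_size] for c in range(new_size)]
        for r in range(new_size)
    ]

def random_resize_and_pad(image, min_size, out_size, seed=None):
    """Resize to a random size in [min_size, out_size], then zero-pad to out_size."""
    rng = random.Random(seed)
    new_size = rng.randint(min_size, out_size)
    resized = resize_nearest(image, new_size)
    # Random offsets decide how the zero padding is split between the sides.
    top = rng.randint(0, out_size - new_size)
    left = rng.randint(0, out_size - new_size)
    padded = [[0] * out_size for _ in range(out_size)]
    for r in range(new_size):
        for c in range(new_size):
            padded[top + r][left + c] = resized[r][c]
    return padded

image = [[1] * 4 for _ in range(4)]
for row in random_resize_and_pad(image, min_size=5, out_size=8, seed=1):
    print(row)
```

Because the size and the offsets change on every call, an attacker cannot know in advance where a given perturbation pixel will land in the classifier input.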
Another popular type of defense method is input transformation. Its key idea is to apply various transformations, such as bit-depth reduction, lossy compression and variance minimization, to adversarial examples in order to mitigate the attack effects [guo2017countering, xu2017feature]. The reported experimental results show that these methods can achieve a balance between robustness against attacks and computation overhead.
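Of the transformations listed, bit-depth reduction is the simplest to show. The sketch below assumes 8-bit pixel values in the range 0 to 255 and uniform quantization; the function name is hypothetical.

```python
def reduce_bit_depth(pixels, bits):
    """Quantize 8-bit pixel values to 2**bits evenly spaced levels."""
    levels = (1 << bits) - 1
    return [round(round(p / 255 * levels) / levels * 255) for p in pixels]

# Reducing to 3 bits leaves only 8 distinct intensity levels.
quantized = reduce_bit_depth(range(256), bits=3)
print(sorted(set(quantized)))
```

Small perturbations that do not push a pixel across a quantization boundary are erased by this step, which is the intuition behind using it as a defense.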
Despite the abundance of research on the adversarial attack methods described in Section 2, existing approaches still suffer from several inherent drawbacks, in particular from the perspective of practical deployment.
Long Generation Time Iteration-based approaches, including PGD and C&W, predominate among current state-of-the-art methods. Generating adversarial examples iteratively is time-consuming and requires extensive computational resources, especially for targeted attacks. For example, to achieve a high ASR, the C&W method takes hours to generate 100 large-size adversarial examples on a GPU. Such a long generation time makes launching an adversarial attack in a real-time setting infeasible.
High Memory Cost Using an iteration-free generative model-based attack promises to avoid long generation time [xiao2018generating, poursaeed2018generative, baluja2017adversarial]. However, in these existing works, if attackers want to launch a targeted attack on a specific class, they have to train a different generator model for each targeted class. For example, to prepare for targeted attacks on the 1000 classes of the ImageNet dataset, a total of 1000 different generator models have to be trained and stored, causing a massive memory cost.
Insufficient Robustness To date, most adversarial example generation is based on a search over the entire input instead of focusing on the critical part that contains the legitimate object content. Noticing this phenomenon, many defense methods improve their performance by integrating this information into the defense scheme. For instance, Luo et al. propose to mask out the background regions with little transformation performed on the critical areas [luo2015foveation]. Similarly, Prakash et al. propose to use pixel deflection to denoise and reconstruct the input by locally redistributing pixels under the guidance of the object position [prakash2018deflecting]. Consequently, existing adversarial attacks exhibit insufficient robustness against such well-designed defense schemes.
Low Transferability Most adversarial attack methods can achieve high ASR in the white-box attack scenario. However, in real-world applications, black-box attack is a more common environment setting. In such cases, the transferability of the generated adversarial examples is important to ensure a successful attack. However, to date on large-scale datasets and large DNN models, the existing adversarial attack approaches exhibit low transferability, thereby impeding the feasibility of launching real-life black-box attack.
Our Motivation To address the above challenges of existing adversarial attack methods, we aim to develop an adversarial attack method that can 1) generate each adversarial example in real time; 2) require only one model for different targeted classes; 3) exhibit strong robustness against state-of-the-art defense techniques; and 4) exhibit high transferability in the black-box attack scenario. To fulfill those requirements, we develop CAG, an attack method with fast generation speed, low memory cost, improved robustness and high transferability. Next, we describe the model training and attack generation schemes of CAG in detail.
4 CAG: Content-aware Adversarial Attack Generator
4.1 Overall Architecture
Figure 2 illustrates the overall architecture of CAG. To generate an adversarial image, an input tensor $T$ is first constructed based on the given clean image $x$, true label $y$ and targeted label $t$. Then a generator model $G$, in the form of a U-Net, is used to generate the perturbations $\delta$ from $T$. After that, $\delta$ is scaled to a fixed norm and added to $x$. Finally, after clipping out-of-range values, the adversarial image $x'$ is ready to mislead the classifier $f$ from the original true class $y$ to the targeted class $t$.
Fast Generation Speed using U-Net CAG utilizes U-Net as the underlying generative model. Compared with iteration-based attack methods, the U-Net-based approach needs only a single forward pass and avoids the time-consuming iterative procedure, which makes real-time generation of adversarial examples possible.
4.2 Building Input Tensors
Single Generative Model via Label Embedding As mentioned in Section 3, the main drawback of generative model-based attacks is the need for a large number of models, one per target class. To address this problem, we encode the class label information into the input tensor $T$ for U-Net. Figure 3 shows the overall procedure of constructing $T$. Here the dimension of the clean image $x$ is denoted as $H \times W \times C$, where $H$, $W$ and $C$ represent height, width and number of channels, respectively. During the training phase, an embedding layer $E$ with the size of $H \times W \times n$, where $n$ is the number of valid classes, is trained to encode the label information. Specifically, in the forward propagation pass, a targeted class $t$ is randomly selected for each training sample $x$. The target label $t$, as well as the true label $y$, are used to extract the corresponding slices $E_t$ and $E_y$ from $E$, where $E_k$ denotes the $k$-th front slice of the embedding layer $E$. Then in the backward propagation phase, $E_t$ and $E_y$ are updated to help capture more class information for this training sample. After being trained on the entire dataset, the embedding layer has learned the important class encoding information, thereby ensuring that a single U-Net model is sufficient for all target classes.
Enhanced Attack Robustness using CAM Besides using an embedding layer, the construction of the input tensor also utilizes CAM information. Classifiers make decisions based heavily on the hot areas of the CAM because they contain the most discriminative information of an image. Therefore, defense methods cannot make substantial modifications in these critical areas; otherwise they can easily cause misclassifications. Taking advantage of this behavior, we place the perturbations only on the hot areas of the CAM to enhance the robustness of our attack against many defense schemes. To achieve this gain in robustness, the position of the object in the image, which is reflected by the CAM, needs to be integrated into the input tensor. As shown in Figure 3, another component of the input tensor is $M$ with the size of $H \times W \times 1$, which is the CAM with respect to input $x$ and its true label $y$. Consequently, we denote $\oplus$ as the concatenating operation, and the final input tensor $T$ is constructed as follows:
$$T = x \oplus E_y \oplus E_t \oplus M,$$
where $E_y$ and $E_t$ are the embedding slices of the true and target labels, and the size of $T$ is $H \times W \times (C+3)$.
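The construction of the input tensor can be sketched with plain nested lists. This is an illustration under stated assumptions, not the authors' implementation: the embedding is stored as one H x W slice per class, the channel order (image, true-label slice, target-label slice, CAM) is a guess, and all names are hypothetical.

```python
def build_input_tensor(image, embedding, cam, true_label, target_label):
    """Concatenate image, two label-embedding slices and the CAM along channels.

    image     : H x W x C nested list (clean image)
    embedding : n x H x W nested list (one trainable slice per class)
    cam       : H x W nested list (CAM of the image for its true label)
    returns   : H x W x (C + 3) nested list
    """
    e_true = embedding[true_label]
    e_target = embedding[target_label]
    return [
        [
            list(image[r][c]) + [e_true[r][c], e_target[r][c], cam[r][c]]
            for c in range(len(image[0]))
        ]
        for r in range(len(image))
    ]

# Tiny example: a 2 x 2 RGB image and 4 classes.
H, W, C, n = 2, 2, 3, 4
image = [[[0.1 * (r + c + k) for k in range(C)] for c in range(W)] for r in range(H)]
embedding = [[[float(k)] * W for _ in range(H)] for k in range(n)]
cam = [[0.5] * W for _ in range(H)]

T = build_input_tensor(image, embedding, cam, true_label=1, target_label=3)
print(len(T), len(T[0]), len(T[0][0]))  # height, width, channels
```

Because the target label enters only through the selected embedding slice, the same generator can be steered toward any class by changing `target_label`.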
4.3 Training CAG
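Next, we describe the details of the CAG training procedure. Our objective is to obtain the generator $G$ and the embedding layer $E$ that achieve:
$$f(x') = t,$$
where $x'$ is the adversarial example obtained by adding the scaled output of $G$ to the clean image $x$. In this scenario, the embedding layer $E$ is treated as a model parameter that can be learned, so that $t$ can be any selected label from $\mathcal{Y}$. Therefore, we can formulate an effective loss function $\mathcal{L}$, and use existing optimization algorithms to perform training as follows.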
First, in order to keep the perturbations imperceptible, we scale the perturbations using the distance metric. In other words, we keep all the perturbations at a fixed norm to constrain the attack strength of the noise in a fixed amount.
Then we feed the generated adversarial example $x'$ to the classifier $f$ to produce the prediction $f(x')$. We define $\mathcal{L}_{cls}$ as the cross-entropy with respect to the one-hot label of the targeted class $t$. Therefore, to ensure the generated adversarial examples can fool the classifier, $\mathcal{L}_{cls}$ is formulated as:
$$\mathcal{L}_{cls} = \mathrm{CE}\big(f(x'), t\big).$$
Meanwhile, the CAM of the targeted class $t$ for $x'$ is computed and denoted as $M'$. We aim to concentrate the adversarial noise on the critical areas that contain the legitimate object content, so that $M'$ for the adversarial examples does not change significantly compared to $M$, the CAM of the clean image $x$ with respect to its true label. In other words, to keep $M'$ and $M$ similar, we need to minimize their distance. Therefore, $\mathcal{L}_{cam}$ is defined to guide the distribution of the noise:
$$\mathcal{L}_{cam} = d\big(M', M\big),$$
where $d(\cdot, \cdot)$ denotes the distance between the two maps.
Finally, the new loss function is formulated as:
$$\mathcal{L} = \mathcal{L}_{cls} + \lambda \, \mathcal{L}_{cam},$$
where $\lambda$ controls the magnitude of $\mathcal{L}_{cam}$. We then iteratively optimize the CAG as well as $E$ by minimizing $\mathcal{L}$. The details of our approach to train the CAG are summarized in Algorithm 1.
Improve Transferability via Noise Dropout It is worth noting that, before adding the noise to the clean image, we apply a dropout layer with probability $p$ to the noise in the training phase. The dropout layer mitigates over-fitting to the current classifier and thereby improves performance in the black-box attack scenario by increasing transferability. The extensive experimental results are given in the next section.
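The post-processing of the generator output (scale to a fixed norm, drop part of the noise during training, add, clip) can be sketched as below. Several details are assumptions made only for illustration: the norm is taken to be $L_\infty$, pixels are assumed to lie in [0, 1], dropped entries are simply zeroed without rescaling the kept ones, and the names and values are hypothetical.

```python
import random

def apply_perturbation(image, noise, eps, drop_prob, rng):
    """Scale noise to max-magnitude eps, randomly drop entries, add, and clip.

    image, noise : flat lists of equal length
    eps          : fixed perturbation norm (L-infinity assumed here)
    drop_prob    : probability of zeroing each noise entry (training only)
    rng          : random.Random instance controlling the dropout mask
    """
    peak = max(abs(v) for v in noise) or 1.0  # avoid division by zero
    adversarial = []
    for pixel, v in zip(image, noise):
        scaled = eps * v / peak
        if rng.random() < drop_prob:
            scaled = 0.0  # this part of the perturbation is dropped
        adversarial.append(min(1.0, max(0.0, pixel + scaled)))
    return adversarial

image = [0.0, 0.5, 1.0, 0.2]
noise = [-2.0, 1.0, 4.0, -1.0]

# Inference: no dropout, the full scaled perturbation is applied.
print(apply_perturbation(image, noise, eps=0.1, drop_prob=0.0, rng=random.Random(0)))
# Training: roughly half of the perturbation entries are dropped.
print(apply_perturbation(image, noise, eps=0.1, drop_prob=0.5, rng=random.Random(0)))
```

Since the generator cannot rely on any single perturbation entry surviving, it is pushed toward perturbations that remain effective when partially removed, which is one plausible reading of why dropout helps transferability.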
After preparing CAG and the embedding layer to perform attacks, we visualize the learned embeddings to demonstrate the effectiveness of this layer. We use the CIFAR-10 dataset as an example, so the embedding layer holds one slice for each of the 10 classes [krizhevsky2009learning]. For better visualization, t-SNE is applied to reduce each class embedding's dimension to 2 [maaten2008visualizing]. As we can see from Figure 4, at epoch 0 the class embeddings are initialized and distributed randomly. However, at epoch 500, the embeddings of similar classes are close to each other, such as car-truck, horse-deer, and dog-cat. This local proximity between similar classes suggests that our approach creates a useful set of embeddings.
We also show the attention regions using CAM for adversarial examples generated by different attack methods. As shown in Figure 5, compared with the clean images, I-FGSM and GAP achieve targeted attacks by misleading the network's attention. However, we believe that changing the attention makes adversarial images vulnerable to purpose-built defense mechanisms. Interestingly, as can be seen in the last row of Figure 5, the adversarial images generated by CAG do not suffer from this problem. Malicious perturbations are constrained to lie in the discriminative areas, so that CAG's adversarial examples are robust enough to circumvent detection and defense methods.
5 Experimental Results
5.1 Experiment Design
To evaluate the effectiveness of CAG, we conduct extensive experiments on the CIFAR-10 [krizhevsky2009learning] and ImageNet [deng2009imagenet] datasets. We perform white-box and black-box attacks by using a pool of 6 different classifiers: ResNet-18 (RN-18), ResNet-34 (RN-34); VGG-11, VGG-13; DenseNet-121 (DN-121), DenseNet-169 (DN-169) [he2016deep, simonyan2014very, huang2017densely]. The top-1 classification accuracy is above (CIFAR-10) and above (ImageNet) for all classifiers. We use ResNet-18 to generate CAMs for all experiments. We set for both datasets because a higher value enforces too much restriction and can reduce ASR. Then we train CAG using SGD with Nesterov momentum. The initial learning rate is set to and gradually decayed to using a cosine annealing curve. During training, a target label is randomly picked from all incorrect classes for each data point. On CIFAR-10, the CAG is trained for a total of epochs using the batch size of . On ImageNet, we train the CAG for epochs with batch size of . The norm of adversarial perturbations is set to for both datasets.
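The cosine annealing curve mentioned above follows a standard closed form. In the sketch below the learning-rate bounds and the number of steps are placeholders chosen for illustration, since the values used in the experiments are not reproduced here.

```python
import math

def cosine_annealing_lr(step, total_steps, lr_max, lr_min):
    """Learning rate at `step`, decaying from lr_max to lr_min along a half cosine."""
    cosine = math.cos(math.pi * step / total_steps)
    return lr_min + 0.5 * (lr_max - lr_min) * (1.0 + cosine)

# Placeholder values: decay from 0.1 to 0.001 over 100 steps.
for step in (0, 25, 50, 75, 100):
    print(step, round(cosine_annealing_lr(step, 100, 0.1, 0.001), 5))
```

The schedule changes slowly at the start and at the end and fastest in the middle, which is the usual motivation for preferring it over a linear decay.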
We compare our proposed method with other existing attack algorithms: I-FGSM, PGD, and C&W. We use Foolbox with PyTorch [rauber2017foolbox] to generate these adversarial examples. Our experiments are performed on an NVIDIA Tesla V100 GPU.
Table 2 (column headers only; row data not recoverable): Attacks / Classifiers, RN-34, VGG-11, VGG-13, DN-121, DN-169, Average.

Table 3: Memory cost and targeted ASR on ImageNet for seen classifiers (RN-18, VGG-11, DN-121) and unseen classifiers (RN-34, VGG-13, DN-169).
|Attacks / Classifiers|Memory|RN-18|VGG-11|DN-121|Average (seen)|RN-34|VGG-13|DN-169|Average (unseen)|
|---|---|---|---|---|---|---|---|---|---|
|GAP U-Net (5T)|30 MB × 5|97.98%|98.45%|97.85%|98.09%|82.97%|85.69%|88.31%|85.66%|
|GAP ResNet (5T)|30 MB × 5|91.02%|94.25%|90.58%|91.95%|76.40%|86.27%|78.33%|80.33%|
|GAP (1000T)|30 MB × 1000|N/A|N/A|N/A|N/A|N/A|N/A|N/A|N/A|
|CAG (5T)|222 MB|98.52%|97.71%|96.91%|97.71%|95.45%|94.34%|94.06%|94.62%|
|CAG (1000T)|222 MB|97.79%|97.01%|96.62%|97.14%|93.38%|94.28%|92.61%|93.42%|

Table 4 (fragment; column headers not recoverable):
|Bit Depth Reduction|4.80%|7.33%|10.41%|7.51%|10.22%|11.12%|10.18%|17.85%|12.96%|50.08%|
We first evaluate our proposed CAG on CIFAR-10 in white-box scenario. The classifier is set to be ResNet-18, and the classification accuracy on clean images achieves 93.48% for 10,000 validation images. To evaluate the targeted attack algorithms, ASR is used as the performance metric.
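For a targeted attack, ASR is commonly computed as the fraction of adversarial examples that the classifier assigns to the attacker's chosen target class. The helper below is a minimal sketch of that definition; the function name and the example labels are hypothetical.

```python
def attack_success_rate(predictions, targets):
    """Fraction of adversarial examples classified as their target class."""
    if len(predictions) != len(targets):
        raise ValueError("predictions and targets must have the same length")
    hits = sum(p == t for p, t in zip(predictions, targets))
    return hits / len(targets)

# Classifier outputs on four adversarial images vs. the intended target classes.
print(attack_success_rate([3, 1, 4, 1], [3, 1, 4, 2]))  # 3 of 4 attacks succeed
```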
Low Computation Time
We generate 10,000 adversarial examples from the CIFAR-10 validation set, and each image is targeted to a randomly chosen incorrect class. The ASR can reach on the ResNet-18. We compare our proposed CAG with other state-of-the-art targeted attack methods. Following the procedure used to evaluate CAG, we also choose attack targets at random. Since the norm for CAG is set to be , for fair comparison, we try to keep norm around similar range for I-FGSM and PGD. Therefore, and is set to and , respectively. The maximum number of iterations is set to 50. When using the C&W attack, we perform 10 iterations of binary search and run 10,000 iterations of gradient descent with learning rate at using the Adam optimizer. Because of its cost, we generate and report only the first 1,000 images, targeted at random classes, for C&W. As can be seen from Table 1, our attack achieves results comparable to I-FGSM, PGD, and C&W. However, our attack has a much lower inference time of only 1.44 seconds, compared with 12 minutes 56 seconds for PGD and more than 10 hours for the C&W attack, a more than 500-fold speedup. The ability to generate a large number of adversarial images in such a short time makes our attack method practical in real-time applications.
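The reported speedup can be checked with a few lines of arithmetic. The sketch assumes that the CAG and PGD timings both refer to the same 10,000 images, while the C&W figure is a lower bound ("more than 10 hours") for 1,000 images; the variable names are chosen for this example.

```python
cag_seconds = 1.44                      # 10,000 images
pgd_seconds = 12 * 60 + 56              # 12 min 56 s, assumed 10,000 images
cw_seconds_lower_bound = 10 * 3600      # more than 10 hours, 1,000 images

# Same image count, so the ratio of total times is the speedup.
pgd_speedup = pgd_seconds / cag_seconds

# Different image counts: compare per-image times instead.
cag_per_image = cag_seconds / 10_000
cw_per_image_lower_bound = cw_seconds_lower_bound / 1_000
cw_speedup_lower_bound = cw_per_image_lower_bound / cag_per_image

print(f"PGD -> CAG speedup: about {pgd_speedup:.0f}x")
print(f"C&W -> CAG speedup: at least {cw_speedup_lower_bound:.0f}x per image")
```

Under these assumptions the comparison with PGD alone already gives a ratio of roughly 539, consistent with the stated more than 500-fold speedup.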
CAG always has ASR greater than in white-box attack scenario. However, in the black-box setting, where attackers have no access to the architecture and parameters of the classifier, ASR is not as high as in the white-box scenario. To address this high transferability requirement, we propose to drop out part of the perturbation before adding it to the benign image during the training phase. As a result, CAG generalizes better and is less prone to over-fitting to a particular classifier. Hence, the transferability of the adversarial examples to new classifiers increases. We train 4 CAG models using ResNet-18 with dropout probability , , and . The ASR for 10,000 validation images (only 1,000 images for C&W) targeted at random incorrect classes is reported. Table 2 reveals that even without dropout, CAG still performs better in the black-box setting than the other methods. Furthermore, the transferability of adversarial examples improves with increasing dropout probability.
We also evaluate the CAG on ImageNet. In our experiments, CAG takes a long time to converge when trained with a single classifier. Thus to accelerate the training process and perform stronger attack, we train CAG with an ensemble of ResNet-18, VGG-11 and DenseNet-121. When training with an ensemble of classifiers, we observe that the CAG does not suffer from over-fitting as much as training with only one classifier. Hence, unlike the best configuration in CIFAR-10 where , we reduce the perturbation dropout to in this case.
To explicitly demonstrate the performance of our proposed method, we compare our results with GAP [poursaeed2018generative]. To create a fair comparison, we implement GAP with two architectures and keep the configuration the same as for our method. The first GAP uses a generative architecture identical to ours, so we denote it GAP U-Net. The second GAP has the same architecture as in GAP's original paper, and we denote it GAP ResNet. However, to perform targeted attacks, GAP requires one model for each targeted class. Because we do not have enough resources to train 1,000 GAP models for a comprehensive evaluation, we train 5 models for each architecture, targeted at the following randomly chosen classes: black swan, Tibetan terrier, tiger beetle, cliff dwelling, hook.
Low Memory Cost
The comparison result is shown in Table 3. 10,000 benign images are randomly picked from the validation dataset for the evaluation. We use CAG to generate adversarial examples targeted at the same 5 selected labels for a fair comparison. In addition, since our proposed CAG can perform a comprehensive targeted attack on all 1,000 classes, we also generate adversarial images across all classes. In the table, 5T means that ASR is evaluated on the pool of the same 5 targeted classes using 10,000 images from the ImageNet validation dataset. In the last row, 1000T means that the 10,000 images are targeted to any randomly selected label from all 1,000 classes. As can be seen from the table, to perform comprehensive attacks on all 1,000 classes of ImageNet, our model takes 222 MB of storage: 30 MB for the model's weights and 192 MB for the embeddings. In contrast, other generative models can take up to 30 MB × 1000 = 30 GB of storage to attack all classes. Moreover, as shown in Table 3, for seen classifiers, ASR is above 90% for all approaches. On the one hand, when targeting the 5 selected classes, adversarial images generated by GAP U-Net and CAG have comparable performance. On the other hand, the results on unseen classifiers show that CAG outperforms GAP. The ASR of CAG reaches 93.42% for 1,000 target labels in the black-box scenario. To sum up, our proposed CAG is more practical for general targeted attacks while keeping high ASR and transferability.
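The storage figures can be sanity-checked with simple arithmetic. The sketch rests on assumptions that are not stated in this passage: a 224 × 224 ImageNet input resolution, 32-bit floats, and an embedding that holds one H × W slice per class.

```python
BYTES_PER_FLOAT32 = 4
MIB = 1024 ** 2

def embedding_size_mib(height, width, num_classes):
    """Storage of an H x W x n float32 embedding, in MiB."""
    return height * width * num_classes * BYTES_PER_FLOAT32 / MIB

embedding_mib = embedding_size_mib(224, 224, 1000)
print(f"embedding: about {embedding_mib:.1f} MiB")       # close to the reported 192 MB

generator_mb = 30
print("CAG total:", generator_mb + 192, "MB")            # matches the reported 222 MB
print("per-class generators:", generator_mb * 1000 / 1000, "GB for 1000 classes")
```

Under these assumptions the embedding accounts for most of CAG's footprint, yet the total remains more than two orders of magnitude below storing one 30 MB generator per class.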
5.4 Breaking Defenses
Finally, we study the robustness of adversarial examples generated by CAG on ImageNet. We prepare CAG trained on the ensemble of ResNet-18, VGG-11 and DenseNet-121. Using the optimal setting, the dropout probability is set to . Since it is meaningless to protect images that are originally mis-classified, we evaluate 10,000 (ImageNet) images that are correctly classified by all three classifiers. We use the following configurations:
Pixel Deflection To achieve the strongest defense performance, we provide the CAMs of the true class of correctly classified images to guide the pixel deflection (unlike the original paper, which uses the CAMs of the top-5 predictions). We set the parameters following the original paper with window=10, deflections=100.
Randomization To perform this defense with optimal parameters, we keep the scale ratio the same as the ratio reported in the original paper. Thus the image size is modified from to in our implementation.
Bit-Depth Reduction In our experimental setting, we reduce images to 3 bits, as in the original paper [xie2017mitigating].
JPEG Compression We perform JPEG compression at quality level 75 out of 100.
Classification accuracy after applying the defense methods is shown in Table 4. As a result of using CAM guidance in the proposed CAG, our attack is robust against defense methods that aim to modify the non-discriminative regions, such as pixel deflection and randomization. After pixel deflection, classifier accuracy on CAG's adversarial images is still low, at 27.63% (white-box) and 33.34% (black-box). In addition, CAG's adversarial images can bypass the defense effects of input transformation. Bit-depth reduction and JPEG compression cannot raise the accuracy above 10% in the white-box setting and 14% in the black-box setting. Compared with I-FGSM, our attack leads to lower classification accuracy in almost all categories. To sum up, our attack is robust against many defense mechanisms.
In this work, we propose CAG, a generative model for performing targeted adversarial attacks. With the help of the trainable embedding layer, the supervision of CAMs and random dropout, CAG is able to produce robust adversarial examples with state-of-the-art attack performance and high transferability, while maintaining low computation time and low memory cost. CAG has many desirable properties of an adversarial attack method; it outperforms many other methods and can launch a real-time, robust attack against many modern DNN systems.
This work was partially supported by AFRL (USA) under Grant No. FA8750‐18‐2‐0058.