
Initializing Perturbations in Multiple Directions for Fast Adversarial Training

by Xunguang Wang, et al.

Recent developments in the field of deep learning have demonstrated that deep neural networks (DNNs) are vulnerable to adversarial examples. In image classification, for example, an adversarial example can fool a well-trained deep neural network by adding nearly imperceptible perturbations to a clean image. Adversarial training, one of the most direct and effective defenses, minimizes the loss on perturbed data to learn deep networks that are robust against adversarial attacks. It has been shown that the fast gradient sign method (FGSM) can be used to achieve fast adversarial training. However, FGSM-based adversarial training may eventually yield a failed model because it overfits to FGSM samples. In this paper, we propose Diversified Initialized Perturbations Adversarial Training (DIP-FAT), which seeks an initialization of the perturbation by enlarging the output distances of the target model along random directions. Owing to the diversity of the random directions, the embedded fast adversarial training with FGSM obtains more information from the adversary and is less likely to overfit. In addition to preventing overfitting, extensive results show that the proposed DIP-FAT technique also improves accuracy on clean data. The main advantage of DIP-FAT is that it achieves the best balance among clean-data accuracy, perturbed-data accuracy, and efficiency.


I Introduction

Deep Neural Networks (DNNs) have achieved great success in a variety of applications, mainly including Computer Vision [1, 2, 3], Speech Recognition [4] and Natural Language Processing. Despite their remarkable accuracy on many benchmark datasets, recent studies have shown that DNNs are vulnerable to adversarial examples, which are crafted by adding imperceptible noise to natural inputs yet cause the network to output wrong predictions. The existence of adversarial examples reveals potential vulnerabilities of deep learning. In image classification, even when adversarial examples are printed out and then captured with a camera to simulate a physical attack, many of them are still classified incorrectly [6]. Another typical physical attack places adversarial stickers on a "STOP" sign, misleading perceptual systems into misclassifying it [7]. Adversarial examples therefore pose great challenges to real-world applications, and defending against them is an important and urgent problem.

A large body of work has addressed defending against various adversarial attacks. One of the most direct methods is adversarial training [8, 9], which effectively improves the robustness of DNNs by minimizing the loss on adversarial examples generated at each step of learning. Recent works [9, 10, 11, 12, 13] have substantially extended standard adversarial training [8]. Madry et al. [9] used the more powerful PGD attack for adversarial training, so that the trained model can also resist weaker attacks such as FGSM. Another line of improvement is to increase the diversity of adversarial examples, for which Tramèr et al. [10] proposed Ensemble Adversarial Training. Both PGD and ensemble adversarial training improve robustness through better perturbations. The remaining methods [11, 12, 13] aim to reduce the training time of adversarial training.

Adversarial training methods [8, 9, 10] have clearly led to significant progress in adversarial robustness, and the PGD adversary [9] is widely recognized as the most effective choice for adversarial training. However, PGD adversarial training is computationally expensive, because PGD requires a random initialization followed by many iterative gradient computations to find each adversarial perturbation. Fortunately, many methods [11, 12, 13, 14] have been proposed to speed up adversarial training. By replacing PGD with single-step FGSM [8] to produce perturbations, Wong et al. [14] proposed the fastest of these algorithms, revisiting FGSM adversarial training, whose adversarial accuracy is very close to that of standard PGD-based training [9]. Although the authors report training a robust classifier in 6 minutes [14], we find in practice that the target model can easily fall into overfitting and then fails to resist PGD attacks. The most likely reason is that FGSM perturbations cover only a small subset of the perturbations found by PGD.

In this paper, we follow up on the study of FGSM-based training [14] and further investigate diversified random initialization of perturbations. Compared with the previous method [14], the main contributions of this paper are as follows:

  1. We propose the Random Diversified Initialization (RDI) technique to replace the simple random initialization of perturbations. RDI makes the generated perturbations more diverse and increases the difficulty of adversarial training.

  2. We solve the catastrophic overfitting of FGSM-based adversarial training, so that the target model can be trained for many epochs to fit all the data.

  3. The proposed method achieves the best balance among clean-data accuracy, perturbed-data accuracy, and efficiency; that is, within the same short training time, it trains the model with the best trade-off between accuracy on clean data and on perturbed data.

II Related Works

II-A Adversarial Attacks

Since Szegedy et al. [15] discovered adversarial examples, various attacks have been proposed that fool a trained DNN with carefully designed adversarial perturbations. There are two types of attacks: white-box attacks, which know the whole structure of the target model being attacked (e.g., FGSM [8], PGD [9] and C&W [16]), and black-box attacks, which only have access to the predictions of the target model (e.g., SBA [17] and ZOO [18]). The attacks most relevant to this work are briefly introduced as follows.

II-A1 FGSM Attack

The Fast Gradient Sign Method (FGSM) [8] is an efficient one-step method for generating adversarial examples. Given an input $x$, it quickly finds a perturbation direction that increases the training loss of the target model $f$, so that the model misclassifies $x$. Concretely, the adversarial perturbation is obtained by computing the gradient of the loss function with respect to $x$, taking its sign, and multiplying the result by a constant $\epsilon$:

$$x' = x + \epsilon \cdot \operatorname{sign}\big(\nabla_{x}\, \ell(f(x), y)\big) \tag{1}$$

where $y$ is the true class label of the input $x$, $\ell$ is the loss function, $\nabla_{x}$ denotes the gradient with respect to $x$, $\operatorname{sign}(\cdot)$ is the sign function, $\epsilon$ controls the attack strength, and $x'$ is the resulting adversarial example.
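Eq. (1) can be illustrated with a toy model, shown below. This is a minimal sketch under stated assumptions. It uses NumPy, and it replaces the DNN with a hypothetical linear softmax classifier $f(x) = xW + b$, whose input gradient can be written by hand. The helper names (`softmax`, `ce_loss`, `input_grad_ce`, `fgsm`) are illustrative and do not come from the paper.

```python
import numpy as np

# Hypothetical toy target model: logits f(x) = x @ W + b (a linear softmax
# classifier), chosen so that the input gradient has a closed form.

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)  # subtract the row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def ce_loss(x, y, W, b):
    """Mean cross-entropy loss of the toy model on the batch (x, y)."""
    p = softmax(x @ W + b)
    return -np.mean(np.log(p[np.arange(len(y)), y]))

def input_grad_ce(x, y, W, b):
    """Gradient of the mean cross-entropy loss with respect to the inputs x."""
    g = softmax(x @ W + b)
    g[np.arange(len(y)), y] -= 1.0  # d loss / d logits = softmax - one_hot(y)
    return g @ W.T / len(y)

def fgsm(x, y, W, b, eps):
    """Eq. (1): x' = x + eps * sign(grad_x loss(f(x), y))."""
    return x + eps * np.sign(input_grad_ce(x, y, W, b))

rng = np.random.default_rng(0)
W, b = rng.normal(size=(4, 3)), np.zeros(3)
x, y = rng.uniform(size=(5, 4)), np.array([0, 1, 2, 0, 1])
x_adv = fgsm(x, y, W, b, eps=0.1)
print(np.abs(x_adv - x).max())                        # approximately 0.1
print(ce_loss(x_adv, y, W, b) > ce_loss(x, y, W, b))  # True
```

For this linear model, the cross-entropy is convex in $x$, so a sign step along the gradient cannot decrease the loss. For a real DNN, the input gradient would instead come from backpropagation, for example via autograd.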

II-A2 PGD Attack

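Projected Gradient Descent (PGD) [9] is an iterative variant of FGSM that also generates adversarial perturbations. Given an input $x$, PGD obtains an adversarial example by repeating

$$x^{t+1} = \Pi_{\mathcal{B}_\epsilon(x)}\Big(x^{t} + \alpha \cdot \operatorname{sign}\big(\nabla_{x^{t}}\, \ell(f(x^{t}), y)\big)\Big) \tag{2}$$

where $\Pi_{\mathcal{B}_\epsilon(x)}$ projects $x^{t+1}$ back onto the $\epsilon$-neighborhood $\mathcal{B}_\epsilon(x) = \{x' : \|x' - x\|_\infty \le \epsilon\}$, $x^{0}$ is the original image perturbed with uniform random noise from the $\epsilon$-ball in the $\ell_\infty$ norm, and $\alpha$ is the step size. PGD is considered the strongest first-order attack.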
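The following sketch implements Eq. (2) for an $\ell_\infty$ ball. It assumes NumPy and the same kind of hypothetical linear softmax model $f(x) = xW + b$, and all function names are illustrative. Here `np.clip` against `x - eps` and `x + eps` plays the role of the projection $\Pi$. Calling `pgd` with zero steps returns only the random start $x^{0}$, so the losses before and after the iterations can be compared.

```python
import numpy as np

# Hypothetical toy target model with logits f(x) = x @ W + b.

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def ce_loss(x, y, W, b):
    p = softmax(x @ W + b)
    return -np.mean(np.log(p[np.arange(len(y)), y]))

def input_grad_ce(x, y, W, b):
    g = softmax(x @ W + b)
    g[np.arange(len(y)), y] -= 1.0  # d loss / d logits = softmax - one_hot(y)
    return g @ W.T / len(y)

def pgd(x, y, W, b, eps, alpha, steps, rng):
    """Eq. (2) for the l_inf ball of radius eps around x."""
    # x^0: the clean input plus uniform noise drawn from the eps-ball
    x_adv = x + rng.uniform(-eps, eps, size=x.shape)
    for _ in range(steps):
        # signed-gradient ascent step of size alpha ...
        x_adv = x_adv + alpha * np.sign(input_grad_ce(x_adv, y, W, b))
        # ... followed by the projection back onto the eps-neighborhood of x
        x_adv = np.clip(x_adv, x - eps, x + eps)
    return x_adv

rng = np.random.default_rng(0)
W, b = rng.normal(size=(4, 3)), np.zeros(3)
x, y = rng.uniform(size=(5, 4)), np.array([0, 1, 2, 0, 1])
# Same seed for both calls, so both runs share the same random start x^0.
x_start = pgd(x, y, W, b, eps=0.1, alpha=0.025, steps=0, rng=np.random.default_rng(1))
x_adv = pgd(x, y, W, b, eps=0.1, alpha=0.025, steps=10, rng=np.random.default_rng(1))
print(np.abs(x_adv - x).max() <= 0.1 + 1e-12)  # True: stays inside the eps-ball
print(ce_loss(x_start, y, W, b), ce_loss(x_adv, y, W, b))
```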

II-A3 ODI-PGD Attack

ODI-PGD [19] is a PGD-based attack that uses Output Diversified Initialization (ODI) [19] to produce initial perturbations. ODI pushes the output logits along a random direction and then computes the corresponding perturbations by backpropagation. Besides PGD, ODI has also strengthened the C&W [16] attack. Inspired by ODI, we also explore output-diversified initialization of perturbations. In contrast to ODI, however, the perturbations produced by our RDI are more diverse and are mainly intended to strengthen adversarial training.

II-B Adversarial Training Methods

Adversarial training [8] is a simple defense against adversarial attacks in which the model is retrained with perturbed data added to the training set at every training step, improving its robustness. As adversarial attacks have developed, adversarial training has been continuously improved as well.

II-B1 PGD Adversarial Training

PGD Adversarial Training [9] uses PGD as the adversary. Moreover, Madry et al. reformulated adversarial training as the saddle-point problem

$$\min_{\theta}\; \mathbb{E}_{(x,y)\sim\mathcal{D}}\Big[\max_{\delta \in \Delta} \ell\big(f_\theta(x+\delta), y\big)\Big] \tag{3}$$

where $\theta$ are the model parameters, $\mathcal{D}$ is the data distribution, and $\Delta = \{\delta : \|\delta\|_\infty \le \epsilon\}$ is the set of allowed perturbations. The inner maximization seeks adversarial examples that maximize the loss of the target model, while the outer minimization finds the parameters that minimize the loss on these adversarial inputs. Unlike standard adversarial training [8], which mixes clean and adversarial examples, PGD adversarial training relies only on adversarial examples to train the model. A model trained with PGD adversarial examples can defend against FGSM, but not vice versa, because PGD attacks are stronger than FGSM.
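The saddle-point problem in Eq. (3) can be made concrete with the sketch below. It alternates an approximate inner maximization (a few PGD steps, as in Eq. (2)) with one gradient-descent step on the parameters, using only the adversarial inputs. This is a minimal full-batch sketch. It assumes NumPy, a hypothetical linear softmax model, and synthetic three-class data. The hyperparameters are arbitrary and are not those of [9], and the function names are illustrative.

```python
import numpy as np

# Hypothetical toy setting: linear softmax model f(x) = x @ W + b, l_inf ball.

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def ce_loss(x, y, W, b):
    p = softmax(x @ W + b)
    return -np.mean(np.log(p[np.arange(len(y)), y]))

def grads(x, y, W, b):
    """Gradients of the mean cross-entropy loss w.r.t. x, W and b."""
    g = softmax(x @ W + b)
    g[np.arange(len(y)), y] -= 1.0
    g /= len(y)  # d(mean loss) / d(logits)
    return g @ W.T, x.T @ g, g.sum(axis=0)

def pgd(x, y, W, b, eps, alpha, steps, rng):
    """Inner maximization of Eq. (3), approximated with PGD (Eq. (2))."""
    x_adv = x + rng.uniform(-eps, eps, size=x.shape)
    for _ in range(steps):
        x_adv = x_adv + alpha * np.sign(grads(x_adv, y, W, b)[0])
        x_adv = np.clip(x_adv, x - eps, x + eps)
    return x_adv

def pgd_adversarial_training(x, y, num_classes, eps, alpha, steps, lr, epochs, rng):
    """Outer minimization of Eq. (3): gradient descent on adversarial inputs only."""
    W, b = np.zeros((x.shape[1], num_classes)), np.zeros(num_classes)
    for _ in range(epochs):
        x_adv = pgd(x, y, W, b, eps, alpha, steps, rng)
        _, gW, gb = grads(x_adv, y, W, b)
        W -= lr * gW
        b -= lr * gb
    return W, b

# Synthetic 3-class data: class c is clustered around 3 * e_c.
rng = np.random.default_rng(0)
y = rng.integers(0, 3, size=60)
x = 3.0 * np.eye(3)[y] + 0.3 * rng.normal(size=(60, 3))
W, b = pgd_adversarial_training(x, y, 3, eps=0.1, alpha=0.025, steps=7,
                                lr=0.5, epochs=100, rng=rng)
x_adv = pgd(x, y, W, b, eps=0.1, alpha=0.025, steps=7, rng=rng)
# With W = 0 every logit is 0, so the robust loss would be log(3).
print("robust loss:", ce_loss(x_adv, y, W, b))
```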

II-B2 Revisiting FGSM Adversarial Training

Revisiting FGSM Adversarial Training [14] is the fastest strategy for training a model that is robust to gradient-based attacks such as FGSM and PGD. For example, under the authors' experimental settings, a CIFAR-10 [20] classifier needs only 6 minutes and an ImageNet classifier is fully trained in 12 hours, about 10 and 38 hours less, respectively, than with free adversarial training [11]. Surprisingly, the key to this large reduction in training time is simply to replace PGD with FGSM. Despite its large speed advantage, this scheme is prone to overfitting, after which it can no longer defend against PGD attacks. This also explains why previous studies failed to use FGSM to train models that are robust to stronger attacks.

III Proposed Training Algorithm

In this section, we first give an overview of the proposed algorithm, which combines random diversified initialization of perturbations with FGSM-based adversarial training. We then describe the adversarial training procedure and the random initialization in detail.

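Input: a classifier $f_\theta$, radius $\epsilon$, step size $\alpha$, training epochs $T$, dataset size $M$
  for $t = 1, \dots, T$ do
     for $i = 1, \dots, M$ do
        $\delta = \mathrm{Uniform}(-\epsilon, \epsilon)$  // Randomly initialize the perturbation
        $\delta = \delta + \alpha \cdot \operatorname{sign}(\nabla_\delta \ell(f_\theta(x_i + \delta), y_i))$  // Perform FGSM adversarial attack
        $\delta = \max(\min(\delta, \epsilon), -\epsilon)$  // Project back onto the $\epsilon$-ball
        $\theta = \theta - \nabla_\theta \ell(f_\theta(x_i + \delta), y_i)$  // Update model weights (e.g., with SGD)
     end for
  end for
Algorithm 1 Revisiting FGSM Adversarial Training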
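Algorithm 1 can be sketched in the same toy setting. Everything below is an assumption made for illustration. The sketch uses NumPy instead of PyTorch, a hypothetical linear softmax model instead of a DNN, and synthetic data, and it does not clip to a valid pixel range. Only the per-batch structure follows Algorithm 1: a uniform random start, one FGSM step, projection, and a weight update. The step size is set to $\alpha = 1.25\epsilon$, the ratio used by Wong et al. [14]. The function name `fgsm_rs_training` is hypothetical.

```python
import numpy as np

# Hypothetical toy setting: linear softmax model f(x) = x @ W + b, l_inf ball,
# synthetic data, no clipping to a valid pixel range.

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def grads(x, y, W, b):
    """Gradients of the mean cross-entropy loss w.r.t. x, W and b."""
    g = softmax(x @ W + b)
    g[np.arange(len(y)), y] -= 1.0
    g /= len(y)
    return g @ W.T, x.T @ g, g.sum(axis=0)

def fgsm_rs_training(x, y, num_classes, eps, alpha, lr, epochs, batch_size, rng):
    """Sketch of Algorithm 1: FGSM adversarial training with random initialization."""
    n, d = x.shape
    W, b = np.zeros((d, num_classes)), np.zeros(num_classes)
    for _ in range(epochs):                                            # for t = 1..T
        for idx in np.array_split(rng.permutation(n), n // batch_size):  # mini-batches
            xb, yb = x[idx], y[idx]
            delta = rng.uniform(-eps, eps, size=xb.shape)             # delta = Uniform(-eps, eps)
            gx, _, _ = grads(xb + delta, yb, W, b)
            delta = np.clip(delta + alpha * np.sign(gx), -eps, eps)   # FGSM step, then projection
            _, gW, gb = grads(xb + delta, yb, W, b)                   # update model weights (SGD)
            W -= lr * gW
            b -= lr * gb
    return W, b

rng = np.random.default_rng(0)
y = rng.integers(0, 3, size=60)
x = 3.0 * np.eye(3)[y] + 0.3 * rng.normal(size=(60, 3))
eps = 0.1
W, b = fgsm_rs_training(x, y, 3, eps=eps, alpha=1.25 * eps, lr=0.5,
                        epochs=30, batch_size=20, rng=rng)
x_adv = x + eps * np.sign(grads(x, y, W, b)[0])  # evaluate with plain FGSM, Eq. (1)
print("FGSM accuracy:", np.mean((x_adv @ W + b).argmax(axis=1) == y))
```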

III-A FGSM-based adversarial training

In this work, we build on the algorithm called Revisiting FGSM Adversarial Training [14]. Although it uses FGSM as the adversary, its robustness is close to that of PGD-based training. Its pseudo-code is given in Algorithm 1. Because FGSM computes the gradient only once, FGSM-based training is much faster than training with the iterative PGD attack. In addition, unlike naive FGSM adversarial training [8], a random initialization of the perturbation is applied before each FGSM step, as shown in Algorithm 1. The random initialization enlarges the space of perturbations, so that the computed perturbations cover a subset of those found by PGD; this is the key to successfully defending against PGD attacks.

III-B Random Diversified Initialization for perturbations

The initial perturbations of previous attack methods [21, 22, 23] such as PGD are sampled directly from a uniform distribution over the $\epsilon$-ball, in the $\ell_\infty$ norm, of the input pixel space. Inspired by the novel random initialization strategy called Output Diversified Initialization (ODI) [19], we propose Random Diversified Initialization (RDI) for FGSM-based adversarial training.

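Like ODI, RDI encourages diversity in the output space in order to compute diversified perturbations in the input space. Given an input $x$ and a classifier $f$, the problem of initializing the perturbation is formulated as

$$\max_{\delta \in \Delta}\; w_d^{\top}\big(f(x+\delta) - f(x)\big) \tag{4}$$

where $f(x)$ denotes the output logits of $f$, $w_d \in \mathbb{R}^{C}$ defines the direction of diversification in the $C$-dimensional output space ($C$ is the number of classes of $f$), and $\Delta = \{\delta : \|\delta\|_\infty \le \epsilon\}$ is the perturbation set constrained by $\epsilon$. The maximization in Eq. (4) drives the output of $f$ away from the original output $f(x)$ along the direction $w_d$, which is sampled randomly from the uniform distribution $\mathrm{Uniform}(-1, 1)^{C}$. As in PGD, we use iterative gradient steps to update $\delta$:

$$\delta^{k+1} = \Pi_{\Delta}\Big(\delta^{k} + \eta \cdot \operatorname{sign}\big(\nabla_{\delta}\, w_d^{\top} f(x+\delta^{k})\big)\Big) \tag{5}$$

where $\delta^{k}$ is the perturbation after the $k$-th iteration, $\eta$ is the RDI step size, and $\Pi_{\Delta}$ constrains $\delta$ within the bound $\Delta$. Note that the gradient is taken with respect to $\delta$, whereas ODI computes it with respect to $x$.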
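A minimal sketch of the RDI update in Eq. (5) follows, assuming NumPy and an $\ell_\infty$ ball. The function `rdi_init` (a hypothetical name) takes a caller-supplied `grad_fn` that returns the gradient of $w_d^{\top} f$ at the perturbed input. For a DNN this gradient would come from backpropagation. Here a hypothetical linear model is plugged in instead, so the gradient is simply $W w_d$ for every sample. The loop prints the objective of Eq. (4), summed over the batch, for $K = 0, \dots, 3$.

```python
import numpy as np

def rdi_init(x, delta, w_d, grad_fn, eps, eta, num_steps):
    """Sketch of the RDI update, Eq. (5), for an l_inf ball of radius eps.

    grad_fn(x_pert, w_d) must return the gradient of w_d^T f(x_pert) with respect
    to the perturbed input; since x is fixed, this equals the gradient w.r.t. delta.
    """
    for _ in range(num_steps):                                # k = 1..K
        g = grad_fn(x + delta, w_d)
        delta = np.clip(delta + eta * np.sign(g), -eps, eps)  # ascent step, then projection onto Delta
    return delta

# Hypothetical toy model with linear logits f(x) = x @ W + b.
rng = np.random.default_rng(0)
d, num_classes = 4, 3
W, b = rng.normal(size=(d, num_classes)), np.zeros(num_classes)

def f(x):
    return x @ W + b

def linear_grad(x_pert, w_d):
    # For linear logits, the gradient of w_d^T f(x) w.r.t. each input row is W @ w_d.
    return np.tile(W @ w_d, (x_pert.shape[0], 1))

x = rng.uniform(size=(5, d))
w_d = rng.uniform(-1.0, 1.0, size=num_classes)  # random direction w_d ~ Uniform(-1, 1)^C

def objective(delta):
    # Eq. (4), summed over the batch: w_d^T (f(x + delta) - f(x))
    return np.sum((f(x + delta) - f(x)) @ w_d)

objs = []
for K in range(4):
    delta = rdi_init(x, np.zeros_like(x), w_d, linear_grad, eps=0.1, eta=0.05, num_steps=K)
    objs.append(objective(delta))
    print(K, objs[-1])
```

For this linear model, the objective grows with $K$ until every coordinate of $\delta$ reaches the boundary of $\Delta$. Here that happens after two steps, since $2\eta = \epsilon$. For a DNN, this growth is not guaranteed to be monotone.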

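Input: a classifier $f_\theta$ with $C$ classes, radius $\epsilon$, step size $\alpha$ of FGSM, training epochs $T$, dataset size $M$, number of RDI steps $K$, step size $\eta$ of RDI
  Initialize: $\delta = 0$  // Initialize the perturbation with 0
  for $t = 1, \dots, T$ do
     for $i = 1, \dots, M$ do
        $w_d = \mathrm{Uniform}(-1, 1)^{C}$  // Randomize the direction vector
        for $k = 1, \dots, K$ do
           $\delta = \max(\min(\delta + \eta \cdot \operatorname{sign}(\nabla_\delta\, w_d^{\top} f_\theta(x_i + \delta)), \epsilon), -\epsilon)$  // RDI step, Eq. (5)
        end for
        $\delta = \delta + \alpha \cdot \operatorname{sign}(\nabla_\delta \ell(f_\theta(x_i + \delta), y_i))$  // Perform FGSM adversarial attack
        $\delta = \max(\min(\delta, \epsilon), -\epsilon)$
        $\theta = \theta - \nabla_\theta \ell(f_\theta(x_i + \delta), y_i)$  // Update model weights
     end for
  end for
Algorithm 2 Diversified Initialized Perturbations Adversarial Training (DIP-FAT): Previous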
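The previous variant (Algorithm 2) can be sketched as follows. The sketch makes several assumptions. It uses NumPy and a hypothetical linear softmax model, so the RDI gradient is $W w_d$. It trains on synthetic data. It also requires a dataset size divisible by the batch size, so that a single $\delta$ of fixed shape can be carried from one mini-batch to the next, in the spirit of free adversarial training [11]. The function name `dip_fat_previous` and all hyperparameter values are illustrative.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def grads(x, y, W, b):
    """Gradients of the mean cross-entropy loss w.r.t. x, W and b (linear logits)."""
    g = softmax(x @ W + b)
    g[np.arange(len(y)), y] -= 1.0
    g /= len(y)
    return g @ W.T, x.T @ g, g.sum(axis=0)

def dip_fat_previous(x, y, num_classes, eps, alpha, eta, num_rdi_steps, lr, epochs,
                     batch_size, rng):
    """Sketch of Algorithm 2 (DIP-FAT, previous) on a linear softmax model."""
    n, d = x.shape
    assert n % batch_size == 0, "equal batch sizes let one delta be reused across batches"
    W, b = np.zeros((d, num_classes)), np.zeros(num_classes)
    delta = np.zeros((batch_size, d))  # Initialize: delta = 0; it is never reset afterwards
    for _ in range(epochs):
        for idx in np.array_split(rng.permutation(n), n // batch_size):
            xb, yb = x[idx], y[idx]
            w_d = rng.uniform(-1.0, 1.0, size=num_classes)    # randomize the direction vector
            for _ in range(num_rdi_steps):                     # RDI steps, Eq. (5)
                g = np.tile(W @ w_d, (batch_size, 1))          # grad of w_d^T f for linear logits
                delta = np.clip(delta + eta * np.sign(g), -eps, eps)
            gx, _, _ = grads(xb + delta, yb, W, b)
            delta = np.clip(delta + alpha * np.sign(gx), -eps, eps)  # FGSM step from the RDI start
            _, gW, gb = grads(xb + delta, yb, W, b)           # update model weights
            W -= lr * gW
            b -= lr * gb
    return W, b

rng = np.random.default_rng(0)
y = rng.integers(0, 3, size=60)
x = 3.0 * np.eye(3)[y] + 0.3 * rng.normal(size=(60, 3))
eps = 0.1
W, b = dip_fat_previous(x, y, 3, eps=eps, alpha=1.25 * eps, eta=eps / 2, num_rdi_steps=2,
                        lr=0.5, epochs=30, batch_size=20, rng=rng)
x_adv = x + eps * np.sign(grads(x, y, W, b)[0])  # evaluate with plain FGSM, Eq. (1)
print("FGSM accuracy:", np.mean((x_adv @ W + b).argmax(axis=1) == y))
```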

III-C Diversified Initialized Perturbations Adversarial Training

Our proposed method combines RDI with FGSM-based training: RDI initializes the perturbations, and FGSM-based training makes the model robust through adversarial learning. Depending on whether the perturbation $\delta$ is reused, we divide DIP-FAT into two types: previous training and random initialization training.

As shown in Algorithm 2, previous training initializes the perturbation of the current iteration with the perturbation from the previous iteration; this idea comes from free adversarial training [11]. The other type, random initialization training, randomly initializes the perturbation for each mini-batch before applying RDI; it is summarized in Algorithm 3. $K$ is the number of RDI steps. In principle, the larger $K$ is, the farther the logits output by $f$ move from the starting point. To control the time cost, $K$ generally does not exceed 2. In particular, in Algorithm 2, we recommend .

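Input: a classifier $f_\theta$ with $C$ classes, radius $\epsilon$, step size $\alpha$ of FGSM, training epochs $T$, dataset size $M$, number of RDI steps $K$, step size $\eta$ of RDI
  for $t = 1, \dots, T$ do
     for $i = 1, \dots, M$ do
        $\delta = \mathrm{Uniform}(-\epsilon, \epsilon)$  // Initialize perturbations in a random direction
        $w_d = \mathrm{Uniform}(-1, 1)^{C}$  // Randomize the direction vector
        for $k = 1, \dots, K$ do
           $\delta = \max(\min(\delta + \eta \cdot \operatorname{sign}(\nabla_\delta\, w_d^{\top} f_\theta(x_i + \delta)), \epsilon), -\epsilon)$  // RDI step, Eq. (5)
        end for
        $\delta = \delta + \alpha \cdot \operatorname{sign}(\nabla_\delta \ell(f_\theta(x_i + \delta), y_i))$  // Perform FGSM adversarial attack
        $\delta = \max(\min(\delta, \epsilon), -\epsilon)$
        $\theta = \theta - \nabla_\theta \ell(f_\theta(x_i + \delta), y_i)$  // Update model weights
     end for
  end for
Algorithm 3 Diversified Initialized Perturbations Adversarial Training (DIP-FAT): Random initialization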
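The random initialization variant (Algorithm 3) differs from the previous variant only in where $\delta$ starts. Here $\delta$ is redrawn uniformly from the $\epsilon$-ball for every mini-batch. A sketch under stated assumptions follows. It uses NumPy, a hypothetical linear softmax model (so the RDI gradient is $W w_d$), and synthetic data. The function name `dip_fat_random` and all hyperparameter values are illustrative.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def grads(x, y, W, b):
    """Gradients of the mean cross-entropy loss w.r.t. x, W and b (linear logits)."""
    g = softmax(x @ W + b)
    g[np.arange(len(y)), y] -= 1.0
    g /= len(y)
    return g @ W.T, x.T @ g, g.sum(axis=0)

def dip_fat_random(x, y, num_classes, eps, alpha, eta, num_rdi_steps, lr, epochs,
                   batch_size, rng):
    """Sketch of Algorithm 3 (DIP-FAT, random initialization) on a linear softmax model."""
    n, d = x.shape
    W, b = np.zeros((d, num_classes)), np.zeros(num_classes)
    for _ in range(epochs):
        for idx in np.array_split(rng.permutation(n), n // batch_size):
            xb, yb = x[idx], y[idx]
            delta = rng.uniform(-eps, eps, size=xb.shape)      # fresh random start every batch
            w_d = rng.uniform(-1.0, 1.0, size=num_classes)    # randomize the direction vector
            for _ in range(num_rdi_steps):                     # RDI steps, Eq. (5)
                g = np.tile(W @ w_d, (len(idx), 1))            # grad of w_d^T f for linear logits
                delta = np.clip(delta + eta * np.sign(g), -eps, eps)
            gx, _, _ = grads(xb + delta, yb, W, b)
            delta = np.clip(delta + alpha * np.sign(gx), -eps, eps)  # FGSM step
            _, gW, gb = grads(xb + delta, yb, W, b)           # update model weights
            W -= lr * gW
            b -= lr * gb
    return W, b

rng = np.random.default_rng(0)
y = rng.integers(0, 3, size=60)
x = 3.0 * np.eye(3)[y] + 0.3 * rng.normal(size=(60, 3))
eps = 0.1
W, b = dip_fat_random(x, y, 3, eps=eps, alpha=1.25 * eps, eta=eps / 2, num_rdi_steps=2,
                      lr=0.5, epochs=30, batch_size=20, rng=rng)
x_adv = x + eps * np.sign(grads(x, y, W, b)[0])  # evaluate with plain FGSM, Eq. (1)
print("FGSM accuracy:", np.mean((x_adv @ W + b).argmax(axis=1) == y))
```

Redrawing $\delta$ each batch means the batches no longer need equal sizes. The cost is one extra uniform sample per batch.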

IV Experiments

IV-A Experimental Setup

IV-A1 Datasets

The experiments were conducted on the MNIST [24] and CIFAR-10 [20] benchmarks. MNIST [24] is a handwritten digit recognition dataset with 70K 28×28 grayscale images, of which 60K are training examples and 10K are test examples. CIFAR-10 [20] contains 50K training samples and 10K test samples of 32×32 color images for image classification.

IV-A2 Experimental environment and acceleration techniques

All experiments in this paper were run on a single NVIDIA Tesla P100 (16 GB) using PyTorch 1.4.0. Unlike Wong et al. [14], we use only the cyclic learning rate [25, 26] to reduce the number of training epochs and do not use mixed-precision training [27]. Not all GPUs support mixed precision, so this setting is more in line with real-world conditions.
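The cyclic learning rate can be sketched as a piecewise-linear schedule that rises from 0 to a peak and falls back to 0 over training. The sketch below makes several assumptions. The function name `cyclic_lr`, the peak value `lr_max`, and the peak position `peak_frac` are hypothetical choices, not the settings of [25, 26] or of our experiments. It relies on NumPy's `np.interp`.

```python
import numpy as np

def cyclic_lr(t, total_epochs, lr_max, peak_frac=0.5):
    """Triangular cyclic schedule: rises linearly from 0 to lr_max at
    peak_frac * total_epochs, then decays linearly back to 0 at total_epochs.
    t may be fractional, so the rate can change at every iteration."""
    return float(np.interp(t, [0.0, peak_frac * total_epochs, total_epochs],
                           [0.0, lr_max, 0.0]))

# Usage: update the learning rate once per mini-batch.
epochs, batches_per_epoch = 30, 4
lrs = []
for epoch in range(epochs):
    for i in range(batches_per_epoch):
        lr = cyclic_lr(epoch + (i + 1) / batches_per_epoch, epochs, lr_max=0.2)
        lrs.append(lr)  # an optimizer step with learning rate `lr` would go here
print(max(lrs), lrs[-1])
```

Because the rate is recomputed every iteration from the fractional epoch, the schedule is independent of the number of batches per epoch.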

IV-B Verified performance on MNIST

Since FGSM-based training [14] benefits from random initialization, we investigate whether RDI also improves the robustness of DNNs. To demonstrate that RDI confers real robustness on the model, we follow the MNIST [24] experiment of [14]. To make the result more convincing, we do not randomly initialize the perturbations. We adopt Algorithm 2 with , , and keep the other parameters the same as in [14]. The verification results are given in Table I, which shows that our results are closer to those of PGD adversarial training on MNIST than the results of the FGSM method [14].

Method   Clean    PGD (ε=0.1)   PGD (ε=0.3)   FGSM (ε=0.1)
PGD      98.99%   97.55%        89.04%        97.7%
FGSM     98.50%   96.76%        88.34%        97.1%
Ours     98.80%   97.09%        88.63%        97.6%
TABLE I: Robustness of different adversarial training methods on MNIST

IV-C Verified performance on CIFAR-10

In this section, we test the proposed technique on CIFAR-10 to show the effect of diversified initialization in FGSM-based adversarial training. We trained the PreAct ResNet18 architecture with radius and . The results are given in Table II. We compare PGD [9], Free [11], FGSM [14] and our methods. PGD was trained for 40 epochs, Free for 96 epochs, and FGSM for 30 epochs. For a fair comparison, our methods were also trained for only 30 epochs.

The best adversarial accuracy (PGD, ) is achieved by Free, and the best standard accuracy on clean data (natural images) is achieved by our previous method (). Our methods perform well on clean data. Compared with the FGSM scheme, our methods improve accuracy on both clean and perturbed data at only a tolerable increase in time cost. In addition, we found that the result in is better than , which verifies that the perturbation moves progressively farther from the starting point as the number of computation steps increases.

Method              Clean    PGD()    Time (min)
PGD-7 [9]           87.30%   45.80%   1426.00
Free (m=8) [11]     85.96%   46.33%   351.68
FGSM [14]
  - previous init   86.02%   42.37%   25.21
  - random init     84.01%   45.25%   25.81
The previous:       85.02%   45.26%   40.26
The previous:       88.12%   45.93%   53.85
The random:         87.19%   44.05%   40.72
The random:         86.40%   45.57%   54.80
TABLE II: Standard and robust performance of various adversarial training methods on CIFAR-10 for and their corresponding training times

V Conclusion

In this paper, we combined diversified initialization of adversarial perturbations with FGSM-based adversarial training and proposed Diversified Initialized Perturbations Adversarial Training (DIP-FAT). Extensive experiments show that the proposed adversarial training not only solves the overfitting problem of FGSM-based training but also improves both clean-data accuracy and perturbed-data accuracy.