
Adversarial Training: embedding adversarial perturbations into the parameter space of a neural network to build a robust system

Adversarial training, in which a network is trained on both adversarial and clean examples, is one of the most trusted defense methods against adversarial attacks. However, there are three major practical difficulties in implementing and deploying this method: it is expensive in terms of extra memory and computation costs; there is an accuracy trade-off between clean and adversarial examples; and the adversarial perturbations lack diversity. Classical adversarial training uses fixed, precomputed perturbations in adversarial examples (input space). In contrast, we introduce dynamic adversarial perturbations into the parameter space of the network, by adding perturbation biases to the fully connected layers of a deep convolutional neural network. During training, using only clean images, the perturbation biases are updated in the Fast Gradient Sign direction, automatically creating and storing adversarial perturbations by recycling the gradient information already computed. The network learns and adjusts itself automatically to these learned adversarial perturbations. Thus, we can achieve adversarial training at negligible cost compared to requiring a training set of adversarial example images. In addition, when combined with classical adversarial training, our perturbation biases can alleviate the accuracy trade-off and diversify the adversarial perturbations.


1 Introduction

Neural networks have led to a series of breakthroughs in many fields, such as image classification (He et al., 2016) and natural language processing (Devlin et al., 2018). Model performance on clean examples was the main evaluation criterion for these applications until the realization of the adversarial example phenomenon by Szegedy et al. (2013); Biggio et al. (2013). Neural networks were shown to be vulnerable to adversarial perturbations: carefully computed small perturbations added to legitimate clean examples (adversarial examples, Fig. 1a) can cause misclassification in state-of-the-art machine learning models. Thus, building a deep learning system that is robust to both adversarial examples and clean examples has emerged as a critical requirement.

Researchers have proposed a number of adversarial defense strategies to increase the robustness of a deep learning system. Adversarial training, in which a network is trained on both adversarial examples ($x_{adv}$) and clean examples ($x$) with true class labels $y$, is one of the few defenses against adversarial attacks that withstands strong attacks. Adversarial examples are the summation of adversarial perturbations lying inside the input space ($\delta_{adv}$) and clean examples: $x_{adv} = x + \delta_{adv}$ (Fig. 1a). Given a classifier with a classification loss function $L$ and parameters $\theta$, the objective function of adversarial training is:

$$\min_{\theta}\; \mathbb{E}_{(x,y)}\big[\, L(\theta, x, y) + L(\theta, x_{adv}, y) \,\big] \qquad (1)$$

For example, Goodfellow et al. (2014) used adversarial training and reduced the test set error rates from 89.4% to 17.9% on adversarial examples of the MNIST dataset. Huang et al. (2015) built a robust model against adversarial examples by punishing misclassified adversarial examples.

Despite the efficacy of adversarial training in building a robust system, there are three major practical difficulties in implementing and deploying this method. Difficulty one: adversarial training is expensive in terms of memory and computation costs. Producing an adversarial example requires multiple gradient computations. In a practical scenario, we further produce more than one adversarial example for each clean example (Tramèr et al., 2017). We need at least to double the amount of memory to store those adversarial examples alongside the clean examples. In addition, during adversarial training, the network has to train on both clean and adversarial examples; hence, adversarial training requires at least twice the computation of training on clean examples alone. For example, even on reasonably-sized datasets, such as CIFAR-10 and CIFAR-100, adversarial training can take multiple days on a single GPU. As a consequence, although adversarial training remains among the most trusted defenses, it has only been within reach for research labs having hundreds of GPUs [[[ref]]]. Difficulty two: accuracy trade-off between clean examples and adversarial examples. Although adversarial training can improve robustness against adversarial examples, it sometimes hurts accuracy on clean examples; there is an accuracy trade-off between the adversarial examples and clean examples (Di et al., 2018; Raghunathan et al., 2019; Stanforth et al., 2019; Zhang et al., 2019). Because most of the test data in real applications are clean examples, test accuracy on clean examples should be as good as possible. Thus, this accuracy trade-off hinders the practical usefulness of adversarial training because it often ends up lowering performance on clean examples. Difficulty three: lack of diversity of adversarial perturbations. Even when one has sufficient computation resources to train a network on both adversarial and clean examples, it is unrealistic and expensive to introduce all unknown attack samples into the adversarial training. For example, Tramèr et al. (2017) proposed Ensemble Adversarial Training, which increases the diversity of adversarial perturbations in a training set by generating adversarial perturbations transferred from other models (they won the competition on Defenses against Adversarial Attacks). Thus, broad diversity of adversarial examples is crucial for adversarial training.

Figure 1: (a) Adversarial examples are the summation of clean examples and adversarial perturbations. (b) Adversarial training for a classical deep convolutional network. Adversarial training on both clean and adversarial examples uses the information from clean examples twice (once directly and once inside the adversarial examples), plus the information in the adversarial perturbations. Thus, the adversarial training procedure is expensive. (c) Solving the expensive training procedure. By introducing adversarial perturbation biases into the fully connected layers of a deep convolutional network, we can achieve adversarial training at negligible cost. While training only on clean examples, the adversarial perturbation biases automatically create and store adversarial perturbations by recycling the gradient information computed when updating model parameters. The network learns and adjusts itself to these injected adversarial perturbations. Thus, the adversarial perturbation bias increases robustness against adversarial examples.

To solve the above three practical difficulties, we dive into the details of adversarial training and analyze the causes of these difficulties. The cause of difficulty one: in Fig. 1b, a classical deep convolutional network is trained on both clean examples and adversarial examples during adversarial training. Since the adversarial examples are the summation of clean examples and adversarial perturbations (Fig. 1a), adversarial training uses duplicate information: from the clean examples themselves and from the clean portion of the perturbed examples. If we could use the information from clean examples only once and generate the corresponding adversarial perturbations during training on clean examples, we could reduce computation costs significantly. The cause of difficulty two: to reduce computation costs, one might train the neural network only on adversarial examples. Since adversarial examples are the summation of clean examples and adversarial perturbations, they contain overlapping information from both clean examples and adversarial perturbations. However, failing to incorporate the intact information of clean examples hurts test accuracy on clean examples (Di et al., 2018; Raghunathan et al., 2019). The cause of difficulty three: even if one has sufficient computation resources and can afford training on both clean and adversarial examples, the finite number of adversarial examples still limits the diversity of adversarial perturbations. This lack of diversity of adversarial perturbations decreases test accuracy on both clean and adversarial examples (Tramèr et al., 2017).

Here, we introduce a new adversarial perturbation bias ($b_{adv}$) into the last few fully connected layers of the deep convolutional network, replacing the normal bias term (Fig. 1c). The main novelty is that instead of using fixed, precomputed adversarial perturbations in adversarial examples (input space), we introduce dynamic adversarial perturbations into the parameter space of the network, lying inside the adversarial perturbation bias. During training on clean examples, the adversarial perturbation bias automatically creates and stores adversarial perturbations by recycling the gradient information already computed, updating itself with the Fast Gradient Sign Method (FGSM; Goodfellow et al., 2014) under the objective function:

$$\min_{\theta}\; \mathbb{E}_{(x,y)}\big[\, L(\theta, b_{adv}, x, y) \,\big] \qquad (2)$$

where $x$ is a clean example with true class $y$, $\theta$ denotes the network parameters, $b_{adv}$ is the adversarial perturbation bias, and $L$ is the classification loss function. Both the adversarial perturbations in adversarial examples and those in the adversarial perturbation biases are calculated using FGSM. The only difference is that adversarial perturbations in adversarial examples lie inside the input space, while adversarial perturbations in the adversarial perturbation bias lie inside the parameter space. Thus, the adversarial perturbation biases play a role similar to adversarial examples in the input space, but they inject the learned adversarial perturbations directly into the network parameter space. The network then learns and adjusts itself automatically to these injected adversarial perturbations, yielding a system that is robust against adversarial attacks. When training only on clean examples, the network with adversarial perturbation bias shows largely improved test accuracy on adversarial examples, with negligible extra costs. In addition, when classical adversarial training is combined with our approach, the network with adversarial perturbation bias can alleviate the accuracy trade-off and diversify the available adversarial perturbations. To show the efficacy of the network with adversarial perturbation bias on the above three difficulties, we consider three different scenarios that correspond to increasing access to computation power.

2 Related Work and background information

Attack models and algorithms: Most attack models and algorithms focus on causing misclassification by a target classifier. They find an adversarial perturbation $\delta_{adv}$ and create an adversarial example by adding it to a clean example $x$: $x_{adv} = x + \delta_{adv}$. The adversarial perturbation sneaks the clean example out of its natural class and into another. Given a fixed classifier with parameters $\theta$, a clean example $x$ with true label $y$, and a classification loss function $L$, the bounded non-targeted adversarial perturbation is computed by solving:

$$\delta_{adv} = \underset{\|\delta\|_{p}\,\le\,\epsilon}{\arg\max}\; L(\theta, x + \delta, y) \qquad (3)$$

where $\|\cdot\|_{p}$ is some $\ell_p$-norm distance metric, and $\epsilon$ is the adversarial perturbation budget.

In this work, we consider the most popular non-targeted method, the Fast Gradient Sign Method (FGSM) of Goodfellow et al. (2014), in the context of $\ell_\infty$-bounded attacks:

$$\delta_{adv} = \epsilon \cdot \mathrm{sign}\big(\nabla_{x} L(\theta, x, y)\big)$$

White box attack model: In white box attacks (Qiu et al., 2019), the adversaries have complete knowledge of the target model, including its algorithm, data distribution, and model parameters. The adversaries can generate adversarial perturbations by identifying the most vulnerable feature space of the target model. In this work, we use the white box Fast Gradient Sign Method with perturbation budget $\epsilon$ to generate adversarial perturbations.
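As an illustration, a white-box FGSM attack of this form can be sketched in a few lines of PyTorch. This is a minimal sketch, not the paper's code; `model` and `loss_fn` stand for any classifier and classification loss, and the [0, 1] pixel range is an assumption.

```python
import torch

def fgsm_attack(model, loss_fn, x, y, epsilon=0.3):
    """White-box FGSM: x_adv = x + epsilon * sign(grad_x L(theta, x, y)).

    Minimal sketch, not the paper's code; `model` and `loss_fn` stand for any
    classifier and classification loss, and the [0, 1] pixel range is assumed.
    """
    x_adv = x.clone().detach().requires_grad_(True)
    loss = loss_fn(model(x_adv), y)
    loss.backward()
    delta = epsilon * x_adv.grad.sign()          # input-space adversarial perturbation
    return (x_adv + delta).clamp(0.0, 1.0).detach()
```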

Modification of bias terms: Modifying the structure of bias terms in the fully connected layer can be beneficial. Wen and Itti (2019) used bias units to store the beneficial perturbations (opposite to the well-known adversarial perturbations). Wen and Itti (2019) showed that the beneficial perturbations, stored inside task-dependent bias units, can bias the network outputs toward the correct classification region for each task, allowing a single neural network to have multiple input to output mappings. Multiple input to output mappings alleviate the catastrophic forgetting problem (McCloskey and Cohen, 1989) in sequential learning scenarios (a previously learned mapping of an old task is erased during learning of a new mapping for a new task). Here, we leverage a similar idea. During training on clean examples, adversarial perturbation bias automatically creates and stores the adversarial perturbations. The network automatically learns how to adjust to these adversarial perturbations. Thus, it helps us to build a robust model against adversarial examples.

3 Network structure and algorithm for a neural network with adversarial perturbation bias

We propose two different structures for the adversarial perturbation bias: a naive adversarial perturbation bias and a multimodal adversarial perturbation bias. The naive adversarial perturbation bias ($b_{adv}$) is just a normal bias term of the fully connected layer; the only difference is that, during backpropagation, we update it using the Fast Gradient Sign Method. The multimodal adversarial perturbation bias is designed as a product of bias memories ($M_b$) and bias weights ($W_b$), $b_{adv} = M_b W_b$, where $n$ is the number of neurons in the fully connected layer and $m$ is the number of bias memories. The multimodal adversarial perturbation bias has more degrees of freedom to better fit a multimodal distribution, so it could yield better results than the naive adversarial perturbation bias. The update rules for the network with naive and multimodal adversarial perturbation bias are shown in Alg. 1 and Alg. 2.

   Input:  A — activations from the previous layer
           b_adv — adversarial perturbation bias
   Output: Z = A W + b_adv,      where W — normal neuron weights
Algorithm 1 Forward rules for both naive and multimodal adversarial perturbation bias

  Notation: W — normal neuron weights;  A — activations from the previous layer
            b_adv — adversarial perturbation bias (naive or multimodal)
            M_b — bias memories for the multimodal adversarial perturbation bias
            W_b — bias weights for the multimodal adversarial perturbation bias
            ε — adversarial perturbation budget for the Fast Gradient Sign Method

  During training:
    Input:  dZ — gradients from the next layer
    Output:
      dW = Aᵀ dZ                              // gradients for the normal neuron weights
      dA = dZ Wᵀ                              // gradients for the activations of the previous layer
      For the naive adversarial perturbation bias:
        b_adv ← b_adv + ε · sign(∇_{b_adv} L)   // FGSM update of the naive adversarial perturbation bias
      For the multimodal adversarial perturbation bias:
        M_b ← M_b + ε · sign(∇_{M_b} L)         // FGSM update of the bias memories of the
                                                //   multimodal adversarial perturbation bias
        dW_b = ∇_{W_b} L                        // gradients for the bias weights of the
                                                //   multimodal adversarial perturbation bias

  After training:
    For the multimodal adversarial perturbation bias:
      Keep b_adv as the product of M_b and W_b
      Delete M_b and W_b to reduce memory and parameter costs
Algorithm 2 Backward rules for naive and multimodal adversarial perturbation bias
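To make Alg. 1 and Alg. 2 concrete, here is a minimal PyTorch sketch of a fully connected layer with a naive adversarial perturbation bias. The class name `AdvBiasLinear`, the initialization, and the default budget are our assumptions for illustration; the essential point is that the bias is excluded from the ordinary optimizer and instead moved in the Fast Gradient Sign direction of its own gradient after each backward pass.

```python
import torch
import torch.nn as nn

class AdvBiasLinear(nn.Module):
    """Fully connected layer with a naive adversarial perturbation bias (sketch).

    Forward (Alg. 1): Z = A W^T + b_adv.
    Backward (Alg. 2, naive): W and A receive ordinary gradients; b_adv is not
    handed to the optimizer but nudged by epsilon * sign(dL/db_adv), recycling
    the gradient already computed on clean examples. Names and the default
    epsilon are our assumptions, not values from the paper.
    """
    def __init__(self, in_features, out_features, epsilon=0.08):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(out_features, in_features) * 0.01)
        self.b_adv = nn.Parameter(torch.zeros(out_features))  # adversarial perturbation bias
        self.epsilon = epsilon

    def forward(self, a):
        # Alg. 1: ordinary affine map, with b_adv injecting the stored perturbation
        return a @ self.weight.t() + self.b_adv

    @torch.no_grad()
    def fgsm_bias_step(self):
        # Alg. 2 (naive): b_adv <- b_adv + epsilon * sign(dL/db_adv)
        if self.b_adv.grad is not None:
            self.b_adv += self.epsilon * self.b_adv.grad.sign()
            self.b_adv.grad.zero_()
```

Excluding `b_adv` from the optimizer (see the training sketch in Section 4) is what makes the bias a stored perturbation rather than an ordinarily trained parameter.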

4 The function of adversarial perturbation bias in three different scenarios

Scenario 1, adversarial training with negligible costs: a company with modest computation resources can only afford to train the network on clean examples, but still wants a moderately robust system against adversarial examples. During training on clean examples, when updating model parameters with the objective function of Eqn. 2, in the backward pass of backpropagation the adversarial perturbation bias automatically creates and stores adversarial perturbations (calculated from clean examples). In the forward pass, the adversarial perturbation bias injects the learned adversarial perturbations into the network. As a result, the neural network learns the structure of these adversarial perturbations and adjusts itself against these injected adversarial perturbations automatically during training on clean examples. Although the network cannot be trained on adversarial examples to access the information of adversarial perturbations directly, because of the modest computation power, the adversarial perturbation bias can still provide this information to the network. Thus, the adversarial perturbation bias can improve the model's robustness against adversarial examples (see Results) and maintain the highest test accuracy on clean examples while barely increasing computation costs. For the naive adversarial perturbation bias, we do not introduce any extra computation costs beyond FGSM. For the multimodal adversarial perturbation bias, the extra computation costs amount to one additional matrix multiplication in the forward pass and two dot products in the backward pass per fully connected layer. A state-of-the-art deep convolutional architecture might have up to three fully connected layers in its classifier, yielding at most three extra matrix multiplications and six extra dot products. These extra computation costs are negligible compared to generating a huge number of extra adversarial examples and training the neural network on both clean and adversarial examples.
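A possible training loop for this scenario, reusing the hypothetical `AdvBiasLinear` layer sketched after Alg. 2 (`model` and `train_loader` are placeholders): the optimizer updates only the ordinary weights, while the perturbation biases update themselves via FGSM from the gradients that the clean-example loss (Eqn. 2) already produced.

```python
import torch
import torch.nn as nn

# Scenario 1 sketch: train only on clean examples; the perturbation biases
# recycle the gradients of the clean loss to update themselves via FGSM.
# `model` and `train_loader` are placeholders; AdvBiasLinear is the layer
# sketched after Alg. 2.
adv_layers = [m for m in model.modules() if isinstance(m, AdvBiasLinear)]
params = [p for n, p in model.named_parameters() if not n.endswith("b_adv")]
optimizer = torch.optim.SGD(params, lr=0.01, momentum=0.9)
loss_fn = nn.CrossEntropyLoss()

for x, y in train_loader:                 # clean examples only
    optimizer.zero_grad()
    loss = loss_fn(model(x), y)           # Eqn. 2: L(theta, b_adv, x, y)
    loss.backward()                       # gradients for the weights and for b_adv
    optimizer.step()                      # ordinary update of the weights
    for layer in adv_layers:
        layer.fgsm_bias_step()            # FGSM update of the perturbation biases
```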

Scenario 2, counteracting the adversarial perturbations lying inside the adversarial examples: to obtain strong robustness against adversarial examples, a company with moderate computation resources can afford to generate adversarial examples but then train the network only on those adversarial examples. Training only on adversarial examples can achieve high robustness against adversarial examples, but it decreases test accuracy on clean examples. During training on adversarial examples, when updating model parameters in the backward pass of backpropagation, the adversarial perturbation bias automatically creates and stores adversarial perturbations (calculated from the adversarial examples) by recycling the gradient information already computed. In the forward pass, the adversarial perturbation bias injects the learned adversarial perturbations into the network. There is a high chance that the adversarial perturbations (calculated from adversarial examples) lying inside the parameter space of the neural network (the adversarial perturbation bias) counteract the adversarial perturbations (calculated from clean examples) lying inside the input space (the adversarial example images). In mathematical terms, from the network's point of view the effective input is $x + \delta_{adv} + b_{adv}$, where $b_{adv}$ counteracts $\delta_{adv}$. Because of this counteraction, the adversarial perturbation bias can effectively convert some adversarial examples back into clean examples. Thus, it largely improves test accuracy on clean examples (see Results), while still maintaining high test accuracy on adversarial examples.

Scenario 3, diversifying the adversarial perturbations: a company with abundant computation resources can afford to train a network on both clean and adversarial examples. In the generation of adversarial examples, the adversarial perturbations lying inside the adversarial examples are calculated from the clean examples. In comparison, during training of the network on both clean and adversarial examples, in the backward pass the adversarial perturbation bias creates and stores adversarial perturbations calculated from both clean and adversarial images by recycling the gradient information already computed. This allows the adversarial perturbation bias to diversify the available adversarial perturbations and to inject these varied adversarial perturbations into the network. Moreover, whereas the adversarial perturbations lying inside the adversarial examples are fixed and precomputed, the adversarial perturbations lying inside the adversarial perturbation bias change dynamically at every training epoch, which further increases the variation of adversarial perturbations during training. In mathematical representation, the diversified group contains: clean examples $x$, adversarial examples $x + \delta_{adv}$, perturbed clean examples $x + b_{adv}$, and perturbed adversarial examples $x + \delta_{adv} + b_{adv}$. The adversarial perturbation bias thus diversifies the adversarial perturbations available to the neural network during adversarial training. As a result, this diversity improves test accuracy on both clean and adversarial examples (see Results).
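A sketch of the corresponding training step, combining the hypothetical `fgsm_attack` and `AdvBiasLinear` pieces from the earlier sketches; the equal weighting of the two loss terms is our assumption.

```python
# Scenario 3 sketch: train on clean and FGSM examples while the perturbation
# biases keep updating themselves, so the network sees input-space perturbations
# (fixed per batch) and parameter-space perturbations (changing every step).
# Reuses model, loss_fn, optimizer, adv_layers, train_loader and fgsm_attack
# from the earlier sketches.
for x, y in train_loader:
    x_adv = fgsm_attack(model, loss_fn, x, y, epsilon=0.3)   # input-space perturbation
    model.zero_grad()                      # also clears grads left by the attack's backward pass
    loss = loss_fn(model(x), y) + loss_fn(model(x_adv), y)   # Eqn. 1, equal weights assumed
    loss.backward()
    optimizer.step()
    for layer in adv_layers:
        layer.fgsm_bias_step()             # parameter-space perturbations from both losses
```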

5 Experiments

5.1 Dataset and network structure

MNIST (LeCun et al., 1998) is a dataset of handwritten digits with a training set of 60,000 examples and a test set of 10,000 examples. FashionMNIST (Xiao et al., 2017) is a dataset of article images, also with a training set of 60,000 examples and a test set of 10,000 examples. We use LeNet (LeCun et al., 1998) as a classifier (classical LeNet). We then create our version of LeNet (LeNet with adversarial perturbation bias) by adding adversarial perturbation biases to its fully connected layers, replacing the normal biases. We generate adversarial examples using the Fast Gradient Sign Method for both the classical LeNet and the LeNet with adversarial perturbation bias.
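For illustration, the LeNet variant with adversarial perturbation bias might be assembled along these lines, swapping the hypothetical `AdvBiasLinear` layer from Section 3 into the fully connected part; the exact layer sizes follow the common LeNet-5 layout for 28x28 inputs and are our assumption, not taken from the paper.

```python
import torch.nn as nn

class LeNetAdvBias(nn.Module):
    """LeNet-style classifier whose fully connected layers use the adversarial
    perturbation bias (AdvBiasLinear from the earlier sketch) in place of the
    normal bias term. Layer sizes follow the common LeNet-5 layout for 28x28
    inputs and are an assumption, not taken from the paper."""
    def __init__(self, num_classes=10, epsilon=0.08):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 6, kernel_size=5, padding=2), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(6, 16, kernel_size=5), nn.ReLU(), nn.MaxPool2d(2),
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            AdvBiasLinear(16 * 5 * 5, 120, epsilon), nn.ReLU(),
            AdvBiasLinear(120, 84, epsilon), nn.ReLU(),
            AdvBiasLinear(84, num_classes, epsilon),
        )

    def forward(self, x):
        return self.classifier(self.features(x))
```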

Figure 2: Performance of LeNet with adversarial perturbation bias (our method) versus the classical LeNet under three different computation power budgets, corresponding to the three scenarios. The adversarial examples are generated using FGSM. 1. Negligible costs scenario (A, B): training the neural network only on clean examples. LeNet with adversarial perturbation bias shows much higher test accuracy on adversarial examples and slightly higher test accuracy on clean examples than classical LeNet. 2. Moderate extra costs scenario (C, D): training the neural network only on adversarial examples. LeNet with adversarial perturbation bias shows largely increased test accuracy on clean examples and slightly increased test accuracy on adversarial examples compared to classical LeNet. 3. High extra costs scenario (E, F): training the neural network on both clean and adversarial examples. LeNet with adversarial perturbation bias shows slightly increased test accuracy on both clean and adversarial examples compared to classical LeNet.

5.2 Quantitative results

We test the LeNet with adversarial perturbation bias and the classical LeNet on both clean and adversarial examples from the MNIST and FashionMNIST datasets, under three different computation budgets: negligible costs (training only on clean examples), moderate extra costs (training only on adversarial examples), and high extra costs (training on both clean and adversarial examples).

Negligible cost - training only on clean examples: our method largely increases test accuracy on adversarial examples and slightly increases test accuracy on clean examples. When the neural network can only be trained on clean examples because of modest computation power, LeNet with adversarial perturbation bias achieves slightly higher test accuracy on clean examples than classical LeNet (Fig. 2B, MNIST: 99.17% vs. 99.01%, FashionMNIST: 89.54% vs. 89.17%). In addition, for test accuracy on adversarial examples (Fig. 2A), classical LeNet can only achieve 18.08% on the MNIST dataset and 11.87% on the FashionMNIST dataset. In comparison, LeNet with adversarial perturbation bias achieves 98.88% on the MNIST dataset and 53.88% on the FashionMNIST dataset. Thus, for companies with modest computation resources, adversarial perturbation bias can help a system achieve moderate robustness against adversarial examples while introducing only negligible computation costs (e.g., on FashionMNIST, our method uses only 59% of the training time of adversarial training with just one adversarial example per clean example, saving 43.51 minutes of training time over 500 training epochs on an NVIDIA Tesla V100 platform; the savings would be huge on a larger dataset such as ImageNet (Deng et al., 2009)).

Moderate extra costs - training only on adversarial examples: our method slightly increases test accuracy on adversarial examples and largely increases test accuracy on clean examples. Generating adversarial examples takes multiple gradient computations through backpropagation; as a consequence, training only on adversarial examples is slightly more expensive than training only on clean examples. On classical networks, although training only on adversarial examples can achieve high test accuracy on adversarial examples (Fig. 2C, classical LeNet: MNIST 99.01%, FashionMNIST 91.49%), it hurts test accuracy on clean examples. For example, when trained on clean examples, classical LeNet achieves a clean test accuracy of 99.01% on MNIST and 89.17% on FashionMNIST (Fig. 2B); when trained on adversarial examples, it only achieves a much worse clean test accuracy of 95.54% on MNIST and 65.64% on FashionMNIST (Fig. 2D). In comparison, LeNet with adversarial perturbation bias, trained on adversarial examples, achieves a clean test accuracy of 97.72% on MNIST and 71.54% on FashionMNIST (Fig. 2D). This is still worse than training only on clean examples, but much better than classical LeNet trained only on adversarial examples. Thus, for companies with moderate computation resources, we build a strongly robust system against adversarial examples while achieving better test accuracy on clean examples.

High extra costs - training on both clean and adversarial examples: our method diversifies the adversarial perturbations, so it slightly increases test accuracy on both clean and adversarial examples. LeNet with adversarial perturbation bias achieves slightly higher accuracy on clean examples than classical LeNet (Fig. 2F, MNIST 99.13% vs. 99.09%, FashionMNIST 89.65% vs. 89.49%). In addition, LeNet with adversarial perturbation bias achieves slightly higher accuracy on adversarial examples than classical LeNet (Fig. 2E, MNIST 97.62% vs. 97.01%, FashionMNIST 95.39% vs. 94.98%). Even for companies with abundant computation resources, it is still helpful to adopt our adversarial perturbation bias because it diversifies the adversarial perturbations.

5.3 Influence of the adversarial perturbation budgets and structure of adversarial perturbation bias

Adversarial perturbation budgets: the higher the adversarial perturbation budget, the higher the chance of successfully attacking a neural network. However, attacks with higher adversarial perturbation budgets are easier to detect by a program or by humans. For example, the large budget illustrated in Fig. 1a represents very high noise, which makes FashionMNIST images difficult to classify, even for humans; but the distribution differences between such adversarial examples and clean examples are so large that they can easily be captured by defense programs. Thus, a smaller budget makes for a good attack, since the differences caused by the adversarial perturbations are too small to be detected by most defense programs. For small adversarial perturbation budgets (Fig. 3a), by training only on clean images, LeNet with adversarial perturbation bias achieves moderate robustness against adversarial examples at negligible cost. Thus, it is really beneficial to adopt our method for companies with modest computation power that still want to achieve moderate robustness against adversarial examples. Structure of adversarial perturbation bias: under FGSM attacks ($\epsilon = 0.3$, Fig. 3b), the multimodal adversarial perturbation bias works better than the naive perturbation bias on MNIST. However, the naive adversarial perturbation bias works better than the multimodal perturbation bias on FashionMNIST. Thus, the structure of the adversarial perturbation bias is a hyperparameter to choose for each dataset. The number of bias memories of the multimodal adversarial perturbation bias is another hyperparameter: with insufficient bias memories, there are not enough degrees of freedom to learn a good multimodal distribution; with excessive bias memories, the model overfits the multimodal distribution.
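For completeness, the multimodal variant might be sketched as follows, again with hypothetical names, shapes, and initialization: the bias memories are updated by FGSM while the bias weights are trained normally, and after training the product can be collapsed into a single bias vector as in Alg. 2.

```python
import torch
import torch.nn as nn

class MultimodalAdvBiasLinear(nn.Module):
    """Fully connected layer with a multimodal adversarial perturbation bias (sketch).

    b_adv = m_b @ W_b, with m_b (bias memories) updated by FGSM and W_b (bias
    weights) trained by the optimizer; shapes and initialization are our
    assumptions. After training, m_b @ W_b can be stored as a single bias
    vector and m_b, W_b discarded, as in Alg. 2.
    """
    def __init__(self, in_features, out_features, num_memories=8, epsilon=0.08):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(out_features, in_features) * 0.01)
        self.m_b = nn.Parameter(torch.randn(num_memories) * 0.01)                 # FGSM-updated
        self.W_b = nn.Parameter(torch.randn(num_memories, out_features) * 0.01)   # optimizer-updated
        self.epsilon = epsilon

    def forward(self, a):
        b_adv = self.m_b @ self.W_b            # multimodal adversarial perturbation bias
        return a @ self.weight.t() + b_adv

    @torch.no_grad()
    def fgsm_bias_step(self):
        # FGSM update of the bias memories, recycling their already-computed gradient
        if self.m_b.grad is not None:
            self.m_b += self.epsilon * self.m_b.grad.sign()
            self.m_b.grad.zero_()
```

As with the naive variant, `m_b` would be excluded from the optimizer; the number of memories is the hyperparameter discussed above.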

Figure 3: Test accuracy on adversarial examples after training only on clean examples from the MNIST (blue) or FashionMNIST (orange) datasets. (a): Influence of different adversarial perturbation budgets ($\epsilon$) on LeNet with adversarial perturbation bias. (b): Influence of different structures of adversarial perturbation bias on LeNet under adversarial attack (FGSM, $\epsilon = 0.3$).

6 Conclusion

In this paper, we propose a new method to solve the three major practical difficulties in implementing and deploying adversarial training, by embedding adversarial perturbations into the parameter space of a neural network (the adversarial perturbation bias). The three main contributions benefit companies with different levels of computation resources: 1. Modest computation power: adversarial training with negligible costs. 2. Moderate computation power: alleviating the accuracy trade-off between clean examples and adversarial examples. 3. Abundant computation power: diversifying the adversarial perturbations available to adversarial training.

References

  • B. Biggio, I. Corona, D. Maiorca, B. Nelson, N. Šrndić, P. Laskov, G. Giacinto, and F. Roli (2013) Evasion attacks against machine learning at test time. In Joint European conference on machine learning and knowledge discovery in databases, pp. 387–402. Cited by: §1.
  • J. Deng, W. Dong, R. Socher, L. Li, K. Li, and L. Fei-Fei (2009) Imagenet: a large-scale hierarchical image database. In 2009 IEEE conference on computer vision and pattern recognition, pp. 248–255. Cited by: §5.2.
  • J. Devlin, M. Chang, K. Lee, and K. Toutanova (2018) Bert: pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805. Cited by: §1.
  • X. Di, P. Yu, and M. Tian (2018) Towards adversarial training with moderate performance improvement for neural network classification. arXiv preprint arXiv:1807.00340. Cited by: §1, §1.
  • I. J. Goodfellow, J. Shlens, and C. Szegedy (2014) Explaining and harnessing adversarial examples. arXiv preprint arXiv:1412.6572. Cited by: §1, §1, §2.
  • K. He, X. Zhang, S. Ren, and J. Sun (2016) Deep residual learning for image recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 770–778. Cited by: §1.
  • R. Huang, B. Xu, D. Schuurmans, and C. Szepesvári (2015) Learning with a strong adversary. CoRR abs/1511.03034. External Links: Link, 1511.03034 Cited by: §1.
  • Y. LeCun, L. Bottou, Y. Bengio, P. Haffner, et al. (1998) Gradient-based learning applied to document recognition. Proceedings of the IEEE 86 (11), pp. 2278–2324. Cited by: §5.1.
  • M. McCloskey and N. J. Cohen (1989) Catastrophic interference in connectionist networks: the sequential learning problem. In Psychology of learning and motivation, Vol. 24, pp. 109–165. Cited by: §2.
  • S. Qiu, Q. Liu, S. Zhou, and C. Wu (2019) Review of artificial intelligence adversarial attack and defense technologies. Applied Sciences 9 (5), pp. 909. Cited by: §2.
  • A. Raghunathan, S. M. Xie, F. Yang, J. C. Duchi, and P. Liang (2019) Adversarial training can hurt generalization. arXiv preprint arXiv:1906.06032. Cited by: §1, §1.
  • R. Stanforth, A. Fawzi, P. Kohli, et al. (2019) Are labels required for improving adversarial robustness?. arXiv preprint arXiv:1905.13725. Cited by: §1.
  • C. Szegedy, W. Zaremba, I. Sutskever, J. Bruna, D. Erhan, I. Goodfellow, and R. Fergus (2013) Intriguing properties of neural networks. arXiv preprint arXiv:1312.6199. Cited by: §1.
  • F. Tramèr, A. Kurakin, N. Papernot, I. Goodfellow, D. Boneh, and P. McDaniel (2017) Ensemble adversarial training: attacks and defenses. arXiv preprint arXiv:1705.07204. Cited by: §1, §1.
  • S. Wen and L. Itti (2019) Beneficial perturbation network for continual learning. CoRR abs/1906.10528. External Links: Link, 1906.10528 Cited by: §2.
  • H. Xiao, K. Rasul, and R. Vollgraf (2017) Fashion-mnist: a novel image dataset for benchmarking machine learning algorithms. arXiv preprint arXiv:1708.07747. Cited by: §5.1.
  • H. Zhang, Y. Yu, J. Jiao, E. P. Xing, L. E. Ghaoui, and M. I. Jordan (2019) Theoretically principled trade-off between robustness and accuracy. arXiv preprint arXiv:1901.08573. Cited by: §1.