1 Introduction
Due to their surprisingly good power to represent complex distributions, neural network (NN) models are widely used in many applications, including natural language processing, computer vision, and cybersecurity. For example, in cybersecurity, NN based classifiers are used for spam filtering, phishing detection, and face recognition [18][1]. However, the training and usage of NN classifiers rest on an underlying assumption that the environment is attack free. Therefore, such classifiers fail when adversarial examples are presented to them. Adversarial examples were first introduced in [21] in the context of image classification. That work shows that visually insignificant modifications with specially designed perturbations can drastically change the prediction results, with a nearly perfect success rate. Generally, adversarial examples can be used to mislead NN models into outputting any targeted prediction. They could be extremely harmful for many applications that utilize NNs, such as automatic cheque withdrawal in banks, traffic speed detection, and medical diagnosis in hospitals. As a result, this serious threat has inspired a new line of research that explores the vulnerability of NN classifiers and develops appropriate defensive methods.
Recently, a plethora of methods to counter adversarial examples has been introduced and evaluated. Among these methods, adversarial training defenses play an important role since they (1) effectively enhance robustness, and (2) do not limit the adversary's knowledge. However, most of them offer no control over the trade-off between classifying original and adversarial examples. For applications that are sensitive to misbehavior or operate in risky environments, it is worthwhile to strengthen the defense against adversarial examples at the cost of some performance on original examples. The ability to dynamically control this trade-off makes a defense even more valuable.
In this paper, we propose a GAN based defense against adversarial examples, dubbed GanDef. GanDef is designed based on adversarial training combined with feature learning [12][24][10]. As a GAN model, GanDef contains a classifier and a discriminator which form a minimax game. To achieve a dynamic trade-off between classifying original and adversarial examples, we also propose a variant of GanDef, GanDef-Comb, that utilizes both the classifier and the discriminator. During evaluation, we select several state-of-the-art adversarial training defenses as references, including Pure PGD training (Pure PGD) [13], Mix PGD training (Mix PGD) [7], and Logit Pairing [7]. The comparison results show that GanDef performs better than state-of-the-art adversarial training defenses in terms of test accuracy. Our contributions can be summarized as follows:

We propose the defensive method GanDef, which is based on the idea of using a discriminator to regularize the classifier's feature selection.

We mathematically prove that the solution of the proposed minimax game in GanDef contains an optimal classifier, which makes correct predictions on adversarial examples by using perturbation-invariant features.

We empirically show that the trained classifier in GanDef achieves the same level of test accuracy as state-of-the-art approaches. With the discriminator added, GanDef-Comb can dynamically control the trade-off between classifying original and adversarial examples, and it achieves the highest overall test accuracy when the ratio of adversarial examples exceeds 41.7%.
2 Background and Related Work
In this section, we introduce high-level background material on the threat model, adversarial example generators, and defensive mechanisms for a better understanding of the concepts presented in this work. We also provide relevant references for further information on each topic.
2.1 Threat Model
The adversary aims at misleading the NN model utilized by an application to achieve a malicious goal. For example, the adversary adds an adversarial perturbation to the image of a cheque. As a result, this image may mislead the NN model utilized by an ATM into cashing out a huge amount of money. During the preparation of adversarial examples, we assume that the adversary has full knowledge of the targeted NN model, which is the white-box scenario. We also assume that the adversary has limited computational power. As a result, the adversary can generate iterative adversarial examples but cannot exhaustively search all possible input perturbations.
2.2 Generating Adversarial Examples
Adversarial examples can be classified into white-box and black-box attacks based on the adversary's knowledge of the target NN classifier. Based on the generating process, they can also be classified as single-step and iterative adversarial examples.
Fast Gradient Sign Method (FGSM) is introduced by Goodfellow et al. in [6] as a single-step white-box adversarial example generator against NN image classifiers. This method tries to maximize the loss function value $J(\mathcal{C}(\hat{x}), y)$ of the NN classifier $\mathcal{C}$ to find adversarial examples:

$$\max_{\delta} \; J(\mathcal{C}(x + \delta), y) \quad \text{subject to} \quad \hat{x} = clip(x + \delta),$$

where the $clip$ function is used to ensure that the generated adversarial example is still a valid image. To keep visual similarity and enhance generation speed, this maximization problem is solved by running gradient ascent for one iteration. It simply generates the adversarial example $\hat{x}$ from the original image $x$ by adding a small perturbation $\delta$ that changes each pixel value along the gradient direction of the loss function. As a single-step generator, FGSM can generate adversarial examples efficiently. However, the quality of the generated adversarial examples is relatively low due to the linear approximation of the loss function landscape.

Basic Iterative Method (BIM) is introduced by Kurakin et al. in [8] as an iterative white-box adversarial example generator against NN image classifiers. BIM utilizes the same mathematical model as FGSM but, unlike FGSM, is an iterative attack. Instead of making the adversarial perturbation in one iteration, BIM runs the gradient ascent algorithm for multiple iterations to maximize the loss function. In each iteration, BIM applies a smaller perturbation and maps the perturbed image through the $clip$ function. As a result, BIM approximates the loss function landscape by linear spline interpolation. Therefore, it generates stronger adversarial examples than FGSM within the same neighboring area.
Projected Gradient Descent (PGD) is another iterative white-box adversarial example generator, recently introduced by Madry et al. in [13]. Similar to BIM, PGD solves the same optimization problem iteratively with the projected gradient descent algorithm. However, PGD randomly selects an initial point within a limited area around the original image and repeats this several times to generate an adversarial example. With this repeated random initialization, PGD is shown experimentally to solve the optimization problem efficiently and to generate stronger adversarial examples, since the loss landscape has a surprisingly tractable structure [13].
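To make the distinction between single-step and iterative attacks concrete, the following is a minimal numpy sketch of both styles on a toy logistic-regression "classifier". The model, its weights, and all function names here are illustrative assumptions, not the paper's implementation; a real attack would use the target NN's gradients instead of this closed form.

```python
import numpy as np

# Toy differentiable "classifier": logistic regression with fixed weights,
# standing in for the NN classifier. Its cross-entropy loss J has a
# closed-form input gradient, so the attacks below run without autograd.
w = np.array([1.0, -2.0, 0.5])

def loss(x, y):
    """Cross-entropy J for a binary label y in {0, 1}."""
    z = np.dot(w, x)
    return np.log1p(np.exp(-z)) if y == 1 else np.log1p(np.exp(z))

def loss_grad(x, y):
    """Gradient of J with respect to the input x."""
    p = 1.0 / (1.0 + np.exp(-np.dot(w, x)))
    return (p - y) * w

def fgsm(x, y, eps):
    """Single-step attack: one signed-gradient step of size eps, clipped."""
    return np.clip(x + eps * np.sign(loss_grad(x, y)), 0.0, 1.0)

def pgd(x, y, eps, step, iters, rng):
    """Iterative attack: random start, many small steps, each projected back
    into the eps-ball (BIM is the same loop without the random start)."""
    x_adv = np.clip(x + rng.uniform(-eps, eps, size=x.shape), 0.0, 1.0)
    for _ in range(iters):
        x_adv = x_adv + step * np.sign(loss_grad(x_adv, y))
        x_adv = np.clip(x_adv, x - eps, x + eps)  # stay in the eps-ball
        x_adv = np.clip(x_adv, 0.0, 1.0)          # stay a valid image
    return x_adv
```

Both attacks keep the perturbed input within the eps-ball and the valid pixel range; the iterative version simply trades generation speed for a better approximation of the loss landscape.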
2.3 Adversarial Example Defensive Methods
Many defense methods have been proposed recently. In the following, we summarize and present representative samples from three major defense classes.
Augmentation and Regularization methods aim at penalizing overconfident predictions or utilizing synthetic data during training. One of the early ideas is defensive distillation, which uses the prediction scores from an original NN (usually called the teacher) as ground truth to train another, smaller NN (usually called the student) [17][16]. It has been shown that the gradients calculated from the student model become very small, or even reach zero, and hence become useless to the adversarial example generator [16]. More recent works in this class include Fortified Networks [9] and Manifold Mixup [23]. Fortified Networks utilize denoising autoencoders to regularize the hidden states. Manifold Mixup also focuses on the hidden states but in a different way: during training, it uses interpolations of hidden states and logits to enhance the diversity of the training data. Compared with adversarial training defenses, this set of defenses has significant limitations. For example, defensive distillation is vulnerable to the Carlini attack [4], and Manifold Mixup can only defend against single-step attacks.

Protective Shell is a set of defensive methods that aim at using a shell to reject or reform adversarial examples. An example of these methods is MagNet, introduced by Meng et al. in [14]. In this work, the authors design two types of functional components: the detector and the reformer. Adversarial examples are either rejected by the detector or reformed to eliminate the perturbations. Other recent works, such as [11] and [19], build the shell in different ways. In [11], the authors inject adaptive noise into input images, which breaks the adversarial perturbations without a significant decrease in classification accuracy. In [19], a generator is utilized to produce images that are similar to the inputs; by replacing the inputs with generated images, it achieves resistance to adversarial examples. However, this set of methods usually assumes that the shell itself is a black box to the adversary, and the work in [2] has already found ways to break this assumption.
Adversarial Training is based on the straightforward idea of treating adversarial examples as blind spots of the original training data [25]. Through retraining with adversarial examples, the classifier learns the perturbation pattern and generalizes its predictions to account for such perturbations. In [6], the adversarial examples generated by FGSM are used for adversarial training, and the trained NN classifier can defend against single-step adversarial examples. Later works in [13] and [22] enhance adversarial training to defend against iterative examples such as BIM and PGD. A more recent work in [7] requires the pre-softmax logits from original and adversarial examples to be similar; the authors believe this allows the method to utilize more information during adversarial training. A common problem in existing adversarial training defenses is that the trained classifier has no control over the trade-off between correctly classifying original and adversarial examples. Our work achieves this flexibility and demonstrates its benefit.
3 GanDef: GAN based Adversarial Training
In this section, we present the design of our defensive method, GanDef. First, GanDef is introduced as a minimax game between a classifier and a discriminator. Then, we conduct a theoretical analysis of the proposed minimax game. Finally, we conduct an experimental analysis to evaluate the convergence of GanDef.
3.1 Design
Given training data pairs $(\bar{x}, y)$, where $\bar{x}$ denotes a training example (original or adversarial) and $y$ its ground truth, we try to find a classification function $\mathcal{C}$ that uses $\bar{x}$ as input and produces pre-softmax logits $\bar{t}$ such that:

$$\mathcal{C}(\bar{x}) = \bar{t}, \qquad P(y \mid \bar{x}) = softmax(\bar{t})_{y}.$$

The mapping between $\bar{t}$ and $y$ is the softmax function. Since $\bar{x}$ can be either an original example $x$ or an adversarial example $\hat{x}$, we want the classifier to model the conditional probability $P(y \mid \bar{x})$ with only non-adversarial features. To achieve this, we employ another NN and call it the discriminator $\mathcal{D}$. $\mathcal{D}$ uses the pre-softmax logits $\bar{t}$ from $\mathcal{C}$ as inputs and predicts whether the input to the classifier is $x$ or $\hat{x}$. This can be performed by maximizing the conditional probability $P(s \mid \bar{t})$, where $s$ is a Boolean variable indicating whether the source of $\bar{x}$ is original or adversarial. Finally, by combining the classifier and the discriminator, we formulate the following minimax game:

$$\min_{\mathcal{C}} \max_{\mathcal{D}} V(\mathcal{C}, \mathcal{D}) = \mathbb{E}\left[-\log P(y \mid \bar{x})\right] + \mathbb{E}\left[\log P(s \mid \bar{t})\right].$$

In this work, we envision that the classifier can be seen as a generator that produces pre-softmax logits based on selected features of the input images. The classifier and the discriminator then engage in a minimax game, which is also known as a Generative Adversarial Net (GAN) [5]. Therefore, we name our proposed defense "GAN based Adversarial Training" (GanDef). While other defenses ignore the pre-softmax logits or only compare $t$ and $\hat{t}$, utilizing a discriminator on $\bar{t}$ adds a second line of defense for when the classifier is defeated by adversarial examples.
The pseudocode of GanDef training is summarized in Algorithm 1 and is visualized in Figure 2. A summary of the notations used throughout this work is available in Table 1.
| Symbol | Description |
|---|---|
| $J$ | loss function of the NN classifier |
| $clip$ | function which regularizes pixel values of generated examples |
| $x$, $\hat{x}$, $\bar{x}$ | original, adversarial, and all training examples |
| $y$, $\hat{y}$, $\bar{y}$ | ground truth of original, adversarial, and all training examples |
| $t$, $\hat{t}$, $\bar{t}$ | pre-softmax logits of original, adversarial, and all training examples |
| $s$, $\hat{s}$, $\bar{s}$ | source indicator of original, adversarial, and all training examples |
| $\delta$ | adversarial perturbation |
| $\mathcal{C}$ | NN based classifier |
| $\mathcal{D}$ | NN based discriminator |
| $V$ | reward function of the minimax game |
| $w$ | weight parameters in the NN model |
| $\gamma$ | trade-off hyperparameter in GanDef |
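Algorithm 1 is not reproduced in this extraction, but the objectives it alternates between can be sketched as per-example losses. The numpy sketch below is our illustration of the minimax reward, not the paper's code; all function names are ours, and the trade-off weight appears as `gamma`. With `gamma = 0` the classifier loss reduces to plain adversarial training.

```python
import numpy as np

def softmax(t):
    """Numerically stable softmax over a logit vector."""
    e = np.exp(t - np.max(t))
    return e / e.sum()

def disc_ce(p_adv, s):
    """Discriminator cross-entropy on the source indicator s (1 = adversarial),
    given its predicted probability p_adv that the logits are adversarial."""
    return -(s * np.log(p_adv) + (1 - s) * np.log(1 - p_adv))

def classifier_loss(t, y, p_adv, s, gamma):
    """Per-example classifier objective: fit the label y while confusing the
    discriminator (a confused discriminator has high cross-entropy, which
    is subtracted here, lowering the classifier's loss)."""
    return -np.log(softmax(t)[y]) - gamma * disc_ce(p_adv, s)

def discriminator_loss(p_adv, s):
    """The discriminator simply minimizes its own cross-entropy on s."""
    return disc_ce(p_adv, s)
```

In an alternating training loop, each step would first update the discriminator to lower `discriminator_loss`, then update the classifier to lower `classifier_loss` on the same minibatch.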
3.2 Theoretical Analysis
With the formal definition of GanDef in place, we perform a theoretical analysis in this subsection. We show that under the current definition, where $V$ is a combination of the log-likelihoods of $P(y \mid \bar{x})$ and $P(s \mid \bar{t})$, the solution of the minimax game contains an optimal classifier which can correctly classify adversarial examples. It is worth noting that our analysis is conducted in a non-parametric setting, which means that the classifier and the discriminator have enough capacity to model any distribution.
Proposition 1
If there exists a solution $(\mathcal{C}^{*}, \mathcal{D}^{*})$ for the aforementioned minimax game such that $V(\mathcal{C}^{*}, \mathcal{D}^{*}) = H(y \mid \bar{x}) - H(s)$, then $\mathcal{C}^{*}$ is a classifier that can defend against adversarial examples.
Proof
For any fixed classification model $\mathcal{C}$, the optimal discriminator can be formulated as
$$\mathcal{D}^{*} = \arg\max_{\mathcal{D}} V(\mathcal{C}, \mathcal{D}).$$
In this case, the discriminator perfectly models the conditional distribution, and we have $P_{\mathcal{D}^{*}}(s \mid \bar{t}) = P(s \mid \bar{t})$ for all $\bar{t}$ and all $s$. Therefore, we can rewrite $V$ with the optimal discriminator and denote its second half as a negative conditional entropy:
$$V(\mathcal{C}, \mathcal{D}^{*}) = \mathbb{E}\left[-\log P(y \mid \bar{x})\right] - H(s \mid \bar{t}).$$
For the optimal classification model, the goal is to achieve the conditional probability $P(y \mid \bar{x})$, since $\mathcal{C}$ can determine $y$ by taking the softmax transformation of $\bar{t}$. Therefore, the first part of $V$ (the expectation) is larger than or equal to $H(y \mid \bar{x})$. Combined with the basic property of conditional entropy that $H(s \mid \bar{t}) \le H(s)$, we obtain the following lower bound of $V$ with the optimal classifier and discriminator:
$$V(\mathcal{C}^{*}, \mathcal{D}^{*}) \ge H(y \mid \bar{x}) - H(s).$$
This equality holds when the following two conditions are satisfied:

The classifier perfectly models the conditional distribution of $y$ given $\bar{x}$, i.e., $P_{\mathcal{C}}(y \mid \bar{x}) = P(y \mid \bar{x})$, which means that $\mathcal{C}$ is an optimal classifier.

$s$ and $\bar{t}$ are independent, i.e., $H(s \mid \bar{t}) = H(s)$, which means that adversarial perturbations do not affect the pre-softmax logits.
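Two facts carry this proof: the expectation term is minimized exactly at the true conditional distribution, where it equals the conditional entropy, and conditioning never increases entropy. Both can be checked numerically on toy discrete distributions (the distributions below are arbitrary illustrations, not from the paper):

```python
import math

def H(dist):
    """Shannon entropy (in nats) of a discrete distribution given as a list."""
    return -sum(p * math.log(p) for p in dist if p > 0)

# (1) Predicting q against a true binary conditional p has cross-entropy
# -(p*log q + (1-p)*log(1-q)), minimized exactly at q = p, where it
# equals H(y | x-bar). Check by grid search.
p = 0.7
ce = lambda q: -(p * math.log(q) + (1 - p) * math.log(1 - q))
q_best = min((k / 100 for k in range(1, 100)), key=ce)
assert abs(q_best - p) < 1e-9
assert abs(ce(q_best) - H([p, 1 - p])) < 1e-9

# (2) Conditioning never increases entropy: H(s | t) <= H(s), with equality
# iff s and t are independent (the optimal-classifier condition above).
joint = {(0, 0): 0.3, (0, 1): 0.2, (1, 0): 0.1, (1, 1): 0.4}  # P(t, s)
p_t = {t: sum(v for (a, b), v in joint.items() if a == t) for t in (0, 1)}
p_s = [sum(v for (a, b), v in joint.items() if b == s) for s in (0, 1)]
h_s_given_t = sum(
    p_t[t] * H([joint[(t, s)] / p_t[t] for s in (0, 1)]) for t in (0, 1)
)
assert h_s_given_t <= H(p_s)
```

Together these give the lower bound on $V$ and show it is attained only when the classifier is optimal and the logits carry no information about the source indicator.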
In practice, the assumption of unlimited capacity for the classifier and discriminator may not hold, and it may be hard or even impossible to build an optimal classifier whose pre-softmax logits are independent of the adversarial perturbation. Therefore, we introduce a trade-off hyperparameter $\gamma$ into the minimax function as follows:
$$\min_{\mathcal{C}} \max_{\mathcal{D}} V(\mathcal{C}, \mathcal{D}) = \mathbb{E}\left[-\log P(y \mid \bar{x})\right] + \gamma\, \mathbb{E}\left[\log P(s \mid \bar{t})\right].$$
When $\gamma = 0$, GanDef is the same as traditional adversarial training. As $\gamma$ increases, the discriminator becomes more and more sensitive to information about $s$ contained in the pre-softmax logits $\bar{t}$.
3.3 Convergence Analysis
Beyond the theoretical analysis, we also conduct an experimental analysis of the convergence of GanDef. Based on the pseudocode in Algorithm 1, we train a classifier on the MNIST dataset. To compare convergence, we also implement Pure PGD, Mix PGD, and Logit Pairing and present their test accuracies on original test images across training epochs.

As we can see from Figure 2, the convergence of GanDef is not as good as that of the other state-of-the-art adversarial training defenses. Although all these methods converge to over 95% test accuracy, GanDef shows significant fluctuation during training.
To improve the convergence of GanDef, we carefully traced back through the design process and identified the root cause of the fluctuations. The classifier's training loss includes the penalty term $\gamma\, \mathbb{E}[\log P(s \mid \bar{t})]$, which encourages the classifier to hide information about $s$ in every $\bar{t}$. Compared with Logit Pairing, which only requires similar logits from original and adversarial examples, this penalty term is too strong. Therefore, we modify the training loss of the classifier to:
$$\min_{\mathcal{C}} \; \mathbb{E}\left[-\log P(y \mid \bar{x})\right] + \gamma\, \mathbb{E}\left[\log P(s \mid \hat{t})\right].$$
Recall that $\hat{x}$, $\hat{t}$, and $s$ represent the adversarial example, its pre-softmax logits, and the source indicator, respectively. It is also worth mentioning that this modification applies only to the classifier and therefore does not affect the consistency of the previous proof. In the convergence analysis, we denote the modified version of our defensive method as GanDef V2; its convergence results are also shown in Figure 2. GanDef V2 significantly improves convergence and stability during training. Moreover, its test accuracy on original examples, as well as on several different white-box adversarial examples, is also higher than that of the initial design. Due to these improvements, we use it as the standard implementation of GanDef in the rest of this work.
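Under our reading of this modification, the only change is which logits feed the confusion penalty: the initial design applies it to every logit vector, while V2 restricts it to adversarial logits only. A minimal numpy sketch of the batch-level difference (function names and the masking formulation are ours, for illustration):

```python
import numpy as np

def classifier_loss(ce_cls, disc_ce, s, gamma, v2=True):
    """Batch classifier loss for GanDef.
    ce_cls:  per-example classification cross-entropies.
    disc_ce: per-example discriminator cross-entropies on the source
             indicator (high = confused discriminator).
    s:       1 for adversarial examples, 0 for original ones.
    The initial design subtracts the confusion reward over every example;
    V2 keeps the reward only where s == 1, a weaker penalty."""
    mask = s if v2 else np.ones_like(s)
    return np.mean(ce_cls) - gamma * np.mean(disc_ce * mask)
```

Because the V2 reward covers fewer examples, the resulting loss is never lower than the V1 loss on the same batch, which matches the intuition of a softer constraint on the classifier.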
4 Experiments and Results
In this section, we present comparative evaluation results of the adversarial training defenses introduced previously.
4.1 Datasets, NN Structures, and Hyperparameters
During the evaluation, we conduct experiments on classifying original and adversarial examples on both the MNIST and CIFAR10 datasets. To ensure the quality of the evaluation, we utilize the standard Python library CleverHans [15] and run all experiments on a Linux workstation with an NVIDIA GTX 1080 GPU. We use the adversarial example generators introduced in Section 2 and denote their outputs as FGSM, BIM, PGD1, and PGD2 examples. For the MNIST dataset, PGD1 represents a 40-iteration PGD attack, while PGD2 corresponds to an 80-iteration PGD attack. The maximum perturbation limit is 0.3, and the per-step perturbation limits for BIM and PGD examples are 0.05 and 0.01, respectively. For the CIFAR10 dataset, these two sets of adversarial examples are 7-iteration and 20-iteration PGD attacks, with the maximum and per-step perturbation limits for BIM and PGD set accordingly.
During training, the vanilla classifier uses only the original training data, while the defensive methods utilize original and PGD1 examples, except for Pure PGD, which requires only the PGD1 examples. For testing, we generate adversarial examples based on test data that was not used in training. These adversarial examples, together with the original test data, form the complete test dataset for the evaluation stage. To make a fair comparison, the defensive methods and the vanilla classifier share the same NN structures: (1) LeNet [13] for MNIST, and (2) allCNN [20] for CIFAR10. Due to page limitations, the detailed structures are shown in the Appendix. The hyperparameters of the existing defensive methods are the same as in the original papers [13][7]. During the training of Logit Pairing on CIFAR10, we found that using the same trade-off parameter as for MNIST led to divergence. To resolve the issue, we tried changing the optimizer, learning rate, initialization, and weight decay. However, none of these worked until the weight of the logit comparison loss was decreased to 0.01.
To validate the NN structures as well as the adversarial examples, we use the vanilla classifier to classify original and adversarial examples. As shown in Table 2, the test accuracy of the vanilla classifier on original examples matches the benchmark records in [3]. Moreover, its test accuracy degrades significantly on every kind of adversarial example, which shows that the adversarial example generators are working properly.
4.2 Comparative Evaluation of Defensive Approaches
As a first step, we compare GanDef with state-of-the-art adversarial training defenses in terms of test accuracy on original and white-box adversarial examples. The results are presented in Figure 3 and summarized in Table 2.
On MNIST, all defensive methods achieve around 99% test accuracy on original examples, with Pure PGD slightly better than the others. In general, the test accuracies of the defensive methods are almost identical and do not fall below that of the vanilla model. On CIFAR10, the test accuracy of the defensive methods on original data is around 83%, with Logit Pairing and GanDef slightly higher than the others. Compared with the vanilla classifier, there is about a 5% decrease in test accuracy. Similar degradation was also reported in previous work on Pure PGD, Mix PGD, and Logit Pairing [13][7].

In the evaluation on MNIST adversarial examples, there are no significant differences among the defensive methods, each achieving around 95% test accuracy. Pure PGD is the best on FGSM and BIM examples, while Logit Pairing is the best on PGD1 and PGD2 examples. On CIFAR10, the differences between the defensive methods are slightly larger. On all four kinds of white-box adversarial examples, Pure PGD is the best method, with test accuracy ranging from 48.33% (PGD2) to 56.18% (FGSM). Among the remaining defensive methods, GanDef is the best choice, with test accuracy ranging from 45.62% (PGD2) to 54.14% (FGSM).

Based on this comparison, as well as the visualization in Figure 3, it is clear that the proposed GanDef achieves the same level of performance as state-of-the-art adversarial training defenses in terms of the trained classifier's test accuracy on original and different adversarial examples.
| Dataset | Examples | Vanilla | Pure PGD | Mix PGD | Logit Pairing | GanDef | GanDef-Comb |
|---|---|---|---|---|---|---|---|
| MNIST | Original | 98.70% | 99.15% | 99.17% | 98.50% | 99.10% | 99.10% |
| MNIST | FGSM | 12.15% | 97.60% | 96.89% | 97.00% | 96.85% | 96.85% |
| MNIST | BIM | 1.07% | 94.75% | 94.58% | 95.83% | 94.28% | 94.28% |
| MNIST | PGD1 | 0.87% | 95.60% | 95.56% | 96.34% | 95.31% | 95.21% |
| MNIST | PGD2 | 0.93% | 94.14% | 93.99% | 95.42% | 93.62% | 93.38% |
| CIFAR10 | Original | 89.69% | 82.06% | 83.70% | 84.21% | 84.05% | 63.97% |
| CIFAR10 | FGSM | 18.43% | 56.18% | 52.21% | 51.63% | 54.14% | 87.61% |
| CIFAR10 | BIM | 6.76% | 49.21% | 44.39% | 44.09% | 46.64% | 76.02% |
| CIFAR10 | PGD1 | 6.48% | 51.51% | 47.11% | 46.53% | 49.21% | 80.39% |
| CIFAR10 | PGD2 | 6.44% | 48.33% | 43.48% | 43.28% | 45.62% | 73.56% |
4.3 Evaluation of GanDef-Comb
In the second phase of the evaluation, we consider GanDef-Comb, a variant of GanDef that utilizes both the classifier and the discriminator trained by GanDef. As shown in Section 3, the discriminator can serve as a second line of defense when the trained classifier fails to make correct predictions on adversarial examples. By setting different threshold values for the discriminator, GanDef can dynamically control the trade-off between classifying original and adversarial examples. In the current evaluation, the threshold is set to a fixed value.
On MNIST, the test accuracy of GanDef-Comb on original, FGSM, and BIM examples is the same as that of GanDef. On PGD1 and PGD2 examples, the test accuracy of GanDef-Comb degrades slightly (by less than 0.3%). This is because the MNIST dataset is simple enough that the classifier alone provides a near-optimal defense; the remaining misclassified corner cases are hard to patch by utilizing the discriminator. In more typical cases, the classifier suffers a much larger degradation on adversarial examples, and the benefit of utilizing the discriminator becomes obvious, as on CIFAR10. In terms of test accuracy, GanDef-Comb is significantly better than state-of-the-art adversarial training defenses at mitigating FGSM, BIM, PGD1, and PGD2 examples: it enhances test accuracy by at least 31.43% on FGSM, 26.81% on BIM, 28.88% on PGD1, and 25.23% on PGD2. Although the test accuracy of GanDef-Comb on original examples drops by about 20%, the enhancement in defending against adversarial examples benefits the overall test accuracy once the ratio of adversarial examples exceeds a certain limit.
To show the benefit of being able to control the trade-off, we design two experiments on the CIFAR10 dataset. We form a test dataset with original and adversarial examples (FGSM examples in the first experiment and PGD2 examples in the second one) and vary the ratio of adversarial examples, $r$. If misclassifying original and adversarial examples incurs similar losses, $r$ represents the probability of receiving adversarial examples. Alternatively, if original and adversarial examples arrive with similar probabilities, $r$ represents the weight placed on correctly classifying adversarial examples (with weight $1 - r$ for original examples). These two readings correspond to risky and misbehavior-sensitive running environments, respectively.
The overall test accuracy under the different experiments is shown in Figure 4. GanDef-Comb is better than state-of-the-art defenses in terms of overall test accuracy when $r$ exceeds 41.7%. In real applications, we can further enhance the overall test accuracy by changing the discriminator's threshold value. When $r$ is low, GanDef-Comb gives less attention to the discriminator (high threshold value) and achieves performance similar to the state-of-the-art defenses. When $r$ is high, GanDef-Comb relies on the discriminator (low threshold value) to detect more adversarial examples.
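The crossover ratio can be recovered from Table 2 under the assumption that overall accuracy is a linear mixture of the accuracies on original and adversarial examples. Taking the CIFAR10 PGD2 column, the GanDef-Comb and Pure PGD lines intersect near the 41.7% threshold reported above:

```python
# Overall accuracy on a mixed test set with adversarial ratio r is the
# convex combination acc(r) = (1 - r) * acc_orig + r * acc_adv.
def overall_acc(acc_orig, acc_adv, r):
    return (1 - r) * acc_orig + r * acc_adv

comb = (63.97, 73.56)   # GanDef-Comb: (original, PGD2) test accuracy, %
pure = (82.06, 48.33)   # Pure PGD:    (original, PGD2) test accuracy, %

# Intersection of the two lines in r:
# (1-r)*82.06 + r*48.33 = (1-r)*63.97 + r*73.56
gap_orig = pure[0] - comb[0]   # baseline's lead on original examples
gap_adv = comb[1] - pure[1]    # GanDef-Comb's lead on adversarial examples
r_cross = gap_orig / (gap_orig + gap_adv)   # roughly 0.418
```

Below the crossover the baseline's lead on original examples dominates; above it, GanDef-Comb's much higher adversarial accuracy wins.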
5 Conclusion
In this paper, we introduce a new defensive method against adversarial examples, GanDef, which formulates training as a minimax game between a classifier and a discriminator. Through evaluation, we show that (1) the classifier achieves the same level of defense as classifiers trained by state-of-the-art defenses, and (2) using both the classifier and the discriminator (GanDef-Comb) can dynamically control the classification trade-off and achieve higher overall test accuracy in risky or misbehavior-sensitive running environments.
6 Future Work
One unsolved problem in the proposed GanDef method is the degradation in classifying original examples when the classifier and the discriminator are combined. For future work, we will consider more sophisticated GAN models that can mitigate this degradation.
References
 [1] AbuNimeh, S., Nappa, D., Wang, X., Nair, S.: A comparison of machine learning techniques for phishing detection. In: Proceedings of the antiphishing working groups 2nd annual eCrime researchers summit. pp. 60–69. ACM (2007)
 [2] Athalye, A., Carlini, N., Wagner, D.: Obfuscated gradients give a false sense of security: Circumventing defenses to adversarial examples. arXiv preprint arXiv:1802.00420 (2018)
 [3] Benenson, R.: Classification datasets results (2018), http://rodrigob.github.io/are_we_there_yet/build/classification_datasets_results.html, [Online; accessed 06April2018]
 [4] Carlini, N., Wagner, D.: Towards evaluating the robustness of neural networks. In: 2017 IEEE Symposium on Security and Privacy (SP). pp. 39–57. IEEE (2017)
 [5] Goodfellow, I., Bengio, Y., Courville, A., Bengio, Y.: Deep learning, vol. 1. MIT Press, Cambridge (2016)
 [6] Goodfellow, I.J., Shlens, J., Szegedy, C.: Explaining and harnessing adversarial examples. International Conference on Learning Representations (2015)
 [7] Kannan, H., Kurakin, A., Goodfellow, I.: Adversarial logit pairing. arXiv preprint arXiv:1803.06373 (2018)
 [8] Kurakin, A., Goodfellow, I., Bengio, S.: Adversarial machine learning at scale. International Conference on Learning Representations (2017)
 [9] Lamb, A., Binas, J., Goyal, A., Serdyuk, D., Subramanian, S., Mitliagkas, I., Bengio, Y.: Fortified networks: Improving the robustness of deep networks by modeling the manifold of hidden representations. arXiv preprint arXiv:1804.02485 (2018)
 [10] Lample, G., Zeghidour, N., Usunier, N., Bordes, A., Denoyer, L., et al.: Fader networks: Manipulating images by sliding attributes. In: Advances in Neural Information Processing Systems. pp. 5969–5978 (2017)
 [11] Liang, B., Li, H., Su, M., Li, X., Shi, W., Wang, X.: Detecting adversarial examples in deep networks with adaptive noise reduction. arXiv preprint arXiv:1705.08378 (2017)
 [12] Louppe, G., Kagan, M., Cranmer, K.: Learning to pivot with adversarial networks. In: Advances in Neural Information Processing Systems. pp. 982–991 (2017)
 [13] Madry, A., Makelov, A., Schmidt, L., Tsipras, D., Vladu, A.: Towards deep learning models resistant to adversarial attacks. arXiv preprint arXiv:1706.06083 (2017)
 [14] Meng, D., Chen, H.: Magnet: a twopronged defense against adversarial examples. In: Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security. pp. 135–147. ACM (2017)
 [15] Papernot, N., Faghri, F., Carlini, N., Goodfellow, I., Feinman, R., Kurakin, A., Xie, C., Sharma, Y., Brown, T., Roy, A., Matyasko, A., Behzadan, V., Hambardzumyan, K., Zhang, Z., Juang, Y.L., Li, Z., Sheatsley, R., Garg, A., Uesato, J., Gierke, W., Dong, Y., Berthelot, D., Hendricks, P., Rauber, J., Long, R.: Technical report on the cleverhans v2.1.0 adversarial examples library. arXiv preprint arXiv:1610.00768 (2018)
 [16] Papernot, N., McDaniel, P.: Extending defensive distillation. arXiv preprint arXiv:1705.05264 (2017)
 [17] Papernot, N., McDaniel, P., Wu, X., Jha, S., Swami, A.: Distillation as a defense to adversarial perturbations against deep neural networks. In: Security and Privacy (SP), 2016 IEEE Symposium on. pp. 582–597. IEEE (2016)
 [18] Rowley, H.A., Baluja, S., Kanade, T.: Neural networkbased face detection. IEEE Transactions on pattern analysis and machine intelligence 20(1), 23–38 (1998)
 [19] Samangouei, P., Kabkab, M., Chellappa, R.: Defensegan: Protecting classifiers against adversarial attacks using generative models. arXiv preprint arXiv:1805.06605 (2018)
 [20] Springenberg, J.T., Dosovitskiy, A., Brox, T., Riedmiller, M.: Striving for simplicity: The all convolutional net. International Conference on Learning Representations (2017)
 [21] Szegedy, C., Zaremba, W., Sutskever, I., Bruna, J., Erhan, D., Goodfellow, I., Fergus, R.: Intriguing properties of neural networks. International Conference on Learning Representations (2014)
 [22] Tramèr, F., Kurakin, A., Papernot, N., Goodfellow, I., Boneh, D., McDaniel, P.: Ensemble adversarial training: Attacks and defenses. arXiv preprint arXiv:1705.07204 (2017)
 [23] Verma, V., Lamb, A., Beckham, C., Courville, A., Mitliagkis, I., Bengio, Y.: Manifold mixup: Encouraging meaningful onmanifold interpolation as a regularizer. arXiv preprint arXiv:1806.05236 (2018)
 [24] Xie, Q., Dai, Z., Du, Y., Hovy, E., Neubig, G.: Controllable invariance through adversarial feature learning. In: Advances in Neural Information Processing Systems. pp. 585–596 (2017)
 [25] Xu, W., Qi, Y., Evans, D.: Automatically evading classifiers. In: Proceedings of the 2016 Network and Distributed Systems Symposium (2016)
Appendix Classifier Structures
| Layer | Kernel Size | Strides | Padding | Activation | Init |
|---|---|---|---|---|---|
| Convolution | | | Same | ReLU | Default |
| MaxPool | | | | | |
| Convolution | | | Same | ReLU | Default |
| MaxPool | | | | | |
| Flatten | | | | | |
| Dense | | | | ReLU | Default |
| Dense | | | | | Default |
| Layer | Kernel Size | Strides | Padding | Activation | Init |
|---|---|---|---|---|---|
| Dropout | (drop rate) | | | | |
| Convolution | | | Same | ReLU | He |
| Convolution | | | Same | ReLU | He |
| Convolution | | | Same | ReLU | He |
| MaxPool | | | | | |
| Dropout | (drop rate) | | | | |
| Convolution | | | Same | ReLU | He |
| Convolution | | | Same | ReLU | He |
| Convolution | | | Same | ReLU | He |
| MaxPool | | | | | |
| Dropout | (drop rate) | | | | |
| Convolution | | | Valid | ReLU | He |
| Convolution | | | Same | ReLU | He |
| Convolution | | | Same | ReLU | He |
| GlobalAvgPool | | | | | |
| Dense | | | | | Default |